
Cryptography Questions: XOR && Compression

  1. Jan 24, 2008 #1
    I have been reading a bit about cryptography recently, and there are a few things that I haven't understood so far:

    #1) I read that since compression removes redundant information, it acts as a randomizing function. Does that mean that I would end up with higher quality random data if I used 'dd' to dump a few hundred megabytes of random data into a file, and then compressed the file with a good compression algorithm? Or did I misunderstand what they meant by compression? Does this only work for certain types of compression? Which is the best one to use?

    #2) I read that XORing data that is not completely random will actually decrease the randomness even further. Why is this? For example, I have a built-in hardware random number generator in my laptop, but I don't trust the multibillion-dollar corporation that manufactured it. In case they have engineered it to have some type of "backdoor", I was thinking about taking its output (perhaps every 1 KB) and XORing each 32-bit word of it with a number (which would change from block to block) from /dev/random.

    If there is a backdoor built in (i.e. the data isn't completely random), would XORing these numbers together cause this pattern to be more pronounced? That is, would I make it even worse by doing this?
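For concreteness, the mixing scheme described above might be sketched like this (os.urandom stands in for both the hardware RNG and /dev/random; this is an illustration, not a vetted implementation):

```python
import os

def xor_mix(untrusted: bytes, trusted: bytes) -> bytes:
    # XOR the two streams byte by byte. If either input is truly random
    # and independent of the other, the XOR of the two is truly random,
    # so mixing cannot make the output worse than the better input.
    return bytes(a ^ b for a, b in zip(untrusted, trusted))

hw_block = os.urandom(1024)   # stand-in for 1 KB from the hardware RNG
os_block = os.urandom(1024)   # stand-in for bytes read from /dev/random
mixed = xor_mix(hw_block, os_block)
```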

  3. Jan 24, 2008 #2

    jim mcnamara


    Staff: Mentor

    1. Compression is a kind of encryption, if that is what you mean. And no, it is not a good source of cryptographically usable random data.

    2. I seriously doubt there is a backdoor in the hardware bit stream. The math behind random number analysis isn't trivial, but there are algorithms out there to test and analyze random number generators. My understanding: it is very difficult to certify an RNG as truly random. A lot of encryption methodology depends on large prime number arithmetic as well as RNGs. Google for 'RSA algorithm', for example.

    http://www.burtleburtle.net/bob/rand/testsfor.html This has code to play with that tests RNGs. I cannot vouch for it.
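Such tests can be surprisingly simple at their core. For example, a sketch of the NIST SP 800-22 frequency (monobit) test in Python, which just checks the balance of ones versus zeros:

```python
import os
from math import erfc, sqrt

def monobit_test(data: bytes) -> float:
    # NIST SP 800-22 frequency (monobit) test: measures the balance of
    # ones vs. zeros and returns a p-value; tiny p-values indicate bias.
    n = len(data) * 8
    ones = sum(bin(byte).count("1") for byte in data)
    return erfc(abs(2 * ones - n) / sqrt(2 * n))

# p-value for 1,000,000 OS-provided random bits; a heavily biased
# stream (e.g. mostly zeros) would yield a p-value near zero.
p = monobit_test(os.urandom(125000))
```

Passing this one test proves very little, of course; suites like the one linked above run dozens of much stronger tests.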

    Cryptography is a complex subject. A nice starting point is probably Matt Blaze:
    http://www.crypto.com/ You may want to check out Bruce Schneier, too.
  4. Jan 24, 2008 #3


    Staff Emeritus
    Science Advisor
    Gold Member

    1. Perfectly compressed data is indistinguishable from random data. Since no perfect compression algorithm exists, your compressed data will never be as good as a source of true randomness, and you should not use it as one. Furthermore, every compression algorithm has worst-case input data that it cannot compress at all, and clearly the resulting "compressed" data will not be random at all.

    2. XORing does not affect the randomness at all. If you take truly random data and pass it through a linear function like XOR, it remains random. If you take non-random data and pass it through XOR, cryptanalysis can "undo" the XOR. Bottom line: it won't help either way.
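The "undo" point is easy to see: XOR is its own inverse, so anyone who learns or guesses the mask recovers the input exactly. A sketch with a hypothetical fixed mask:

```python
def xor_bytes(x: bytes, y: bytes) -> bytes:
    # Byte-wise XOR of two equal-length byte strings
    return bytes(a ^ b for a, b in zip(x, y))

plaintext = b"clearly not random"
mask = b"K" * len(plaintext)        # a guessable, fixed mask

masked = xor_bytes(plaintext, mask)
# XOR is its own inverse: applying the same mask again recovers the
# input exactly, which is why XORing non-random data buys no security.
recovered = xor_bytes(masked, mask)
```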

    - Warren
  5. Jan 24, 2008 #4
    I would recommend checking out Robert G. Brown's 'dieharder'. It's the most comprehensive RNG test suite I've been able to find.

    As far as the compression goes, I didn't mean taking non-random data and compressing it to try to get random data. I meant taking the random data from an RNG and compressing it using arithmetic coding (which, although not 'perfect', compresses really well) or some other high-quality compression algorithm to remove any redundancies. Would this help at all?

    Thanks for clearing that up. Obviously I had misunderstood something before, and now reading more about it, I see that my logic was faulty.

    Also, I had another question: if I broke the random data up into 512-bit blocks and hashed each of them with SHA-512 (using /dev/random to generate numbers for the hash function), would this increase, decrease, or do nothing to the quality of the data? Why?
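One note on the question as posed: SHA-512 is an unkeyed hash and takes no random numbers as input, so the /dev/random part would not apply. The block-by-block hashing itself might be sketched like this (os.urandom stands in for the RNG output):

```python
import hashlib
import os

BLOCK = 64                            # 512 bits = 64 bytes
random_data = os.urandom(4 * BLOCK)   # stand-in for the RNG output

# Hash each 512-bit block with (unkeyed) SHA-512. This can "whiten"
# a biased stream, but it cannot add entropy that wasn't there.
digests = [hashlib.sha512(random_data[i:i + BLOCK]).digest()
           for i in range(0, len(random_data), BLOCK)]
whitened = b"".join(digests)
```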

    Forgive me if my questions are 'ignorant', but I'm trying to learn the math right now, and it's going to take me a long while (I'm not in school anymore, and I'm still trying to get through integral calculus on my own)...

  6. Jan 24, 2008 #5


    Staff Emeritus
    Science Advisor
    Gold Member

    You cannot compress truly random data, as it has no redundancies. Furthermore, compression algorithms necessarily introduce artifacts of their own (block boundaries, padding, etc.) that could only turn truly random data into less random data.

    If the input data is already truly random, it can't get any more random. The best case scenario is that it remains truly random. The worst case is that it becomes less random.

    - Warren
  7. Jan 24, 2008 #6
    OK, thanks for the responses chroot. It looks like I need to do a bit more reading ;)
  8. Nov 17, 2008 #7
    Hi Warren,

    I would like to know why only XOR is used during random pattern generation. Why are only XOR and XNOR called linear functions?
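One way to see the linearity: over GF(2), XOR is addition, and a Boolean function f is called linear when f(x ⊕ y) = f(x) ⊕ f(y) for all inputs. A brute-force check of that identity for the XOR (parity) function (XNOR, being the complement of XOR, is affine and is usually grouped with it):

```python
import itertools

def parity(bits):
    # n-input XOR (parity): the canonical linear Boolean function
    out = 0
    for b in bits:
        out ^= b
    return out

# Verify linearity over GF(2): f(x XOR y) == f(x) XOR f(y)
# for every pair of 3-bit inputs.
for x in itertools.product((0, 1), repeat=3):
    for y in itertools.product((0, 1), repeat=3):
        xy = tuple(a ^ b for a, b in zip(x, y))
        assert parity(xy) == parity(x) ^ parity(y)
```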
