Hi, I'm writing a compression utility based on an improvement to Huffman coding, and just for giggles I thought I'd try it on true random data from random.org to see if it did anything. What I found is that the data from random.org (which is derived from atmospheric noise) definitely displays biases that can be exploited to achieve significant compression (I'm getting upwards of 55% savings).

I'm well aware of the pigeonhole principle and know that random data shouldn't be compressible, so that raises the question: is this data somehow not random? Is randomness really just a function of scale?

For example: if you take 1,000,000 randomly generated bits and slice them up into 8-bit samples, there are bound to be certain biases (e.g. more instances of the value 15 than of 16). If you slice those same 1,000,000 bits into 25-bit samples, completely different biases would result. It seems that by choosing a sample size, you automatically introduce biases into random data, which sort of makes it... not random.

The only way I can think of to create a random set without biases would be to start with equal numbers of every value for a particular sample size and shuffle them; that way, there would be no frequency differences to exploit. But even this fails as soon as you choose a different sample size.

So I'm wondering: is randomness just an illusion caused by sampling scale?
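To make the sampling idea concrete, here's a rough sketch of the experiment (in Python, with the standard library PRNG standing in for the random.org download): slice a bit stream into fixed-size samples, count symbol frequencies, and compute the empirical Shannon entropy, which is roughly the best per-sample rate any frequency-based coder like Huffman could achieve.

```python
import math
import random
from collections import Counter

def empirical_entropy(bits, sample_size):
    """Slice a bit string into fixed-size samples and return the
    empirical Shannon entropy of the samples, in bits per sample."""
    samples = [bits[i:i + sample_size]
               for i in range(0, len(bits) - sample_size + 1, sample_size)]
    n = len(samples)
    counts = Counter(samples)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# 1,000,000 pseudo-random bits (a stand-in for the random.org data).
bits = ''.join(random.choice('01') for _ in range(1_000_000))

# The symbol counts are never exactly equal, so the empirical entropy
# always lands slightly below the theoretical maximum of 8 bits --
# but only slightly, which is the crux of my question.
h = empirical_entropy(bits, 8)
print(f"{h:.4f} bits/sample out of a possible 8")
```

When I reason through the numbers, the chance frequency fluctuations only push the entropy a tiny fraction of a bit below the maximum, nowhere near enough to explain 55% savings, which is part of what confuses me.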