# A question on the interpretation of randomness in algorithmic information theory

1. Aug 15, 2010

### technodai

Hi everyone, I'd like to start by saying that I have no background or knowledge of maths other than high school. As a consequence, I have no idea if this question is stupid, trivial, nonsensical or whatever. I also don't even know if it's in the correct thread (I made the best guess I could). So I should probably apologise straight away!

I've been switched unexpectedly back to an interest in maths for reasons difficult to explain. But the nature of random data has arisen and I have been doing some reading. The question below is just one of the many I have on the issue. It's also quite long, so again, I apologise. Thanks for your time.

As I understand it, algorithmic information theory tells us that;

The 3000 page encyclopaedia contains less information than 3000 pages of random letters

This is illustrated by the fact that a description of the string of random characters would need to be at least as long as the string itself (random data cannot be compressed). The encyclopaedia however, as it is composed of words from a language which has its own rules and structure, could be described in less. As an example of one of the ways this might be achieved, we can imagine replacing every instance of the word 'the' with a single symbol, thereby removing the need for two of the three characters required to express the word itself.

This does of course seem counter intuitive to us, since the encyclopaedia contains more 'useful' data. Of course, how 'useful' data is depends upon human values and cannot be quantified objectively here. In addition to this objection though is our normal understanding of what it means for information to be truly random. To be truly random, information must contain no information (?). If this is the case, then we can assume that no matter how we choose to re-arrange the information as whole, it will make no discernible difference to the meaning of the information. It is said to be in a state of maximum entropy – the information is in a state of total disorder.

Black holes are also an example of maximum entropy, since any re-arrangement of the matter behind the event horizon can make absolutely no difference to the appearance of the black hole. In fact, since it is impossible to ever receive information from inside a black hole, we say that the black hole contains no information. A black hole therefore represents the maximum amount of entropy that can exist in space. This is further illustrated by the fact that if we add more entropy to the system, the black hole must get larger.

Now, if we compare this to our above description of the 3000 pages of random words we find a similarity. If we add more random letters, the solution that describes those random letters must increase accordingly. And yet the characters still contain no information! So then, if the shortest possible solution to describe the characters is the characters themselves, then the solution itself contains no information. Would a better description of the random characters in fact be an algorithm that generates a string of random digits, regardless of whether they are the same digits as in the first random string?

2. Aug 15, 2010

### CRGreathouse

Consider a text file containing the encyclopedia. Now compress that file with, e.g., gzip. The resulting file has the whole encyclopedia but in fewer bits, so it has more information per bit. If you look at the compressed bits you'll see that they look essentially random.

For all you know, that large 'random' string is just the compressed form of a larger encyclopedia... right?

3. Aug 15, 2010

### technodai

Okay I think I get that so far. However, the bits that make up the compressed encyclopaedia wouldn't actually be random, even if they gave the appearance of being so. Similarly the random string, if it was the compressed form of a larger encyclopedia would therefore cease to be random!

Does this say more about our inability to tell if a string is truly random than it does about whether any random string is equivalent to any other (of the same length)? Could the random string be effectively described by the random string generating algorithm?

4. Aug 15, 2010

### CRGreathouse

How could you possibly tell? I mean, if you try to unzip it with a gzip decompressor (gunzip) and it fails, you know it's not gzipped. But what if it was CRGreathouse-encoded?

But all this is a bit afield from the original question. What's being measured is entropy, not usefulness, and a random string has maximal entropy. This follows fairly quickly from the definition of information entropy.