Kolmogorov randomness

1. Feb 24, 2008

Tiiba

Are strings that are Kolmogorov-random actually random in the usual sense?

An incompressible string probably won't contain this post anywhere in it, but a random one can.

2. Feb 24, 2008

CRGreathouse

If you have a long string in which your post does not appear, you could compress it (very slightly) by using that fact. If your post has N characters, and the string (and post) are drawn from an alphabet of length A, it could be compressed by (100/A^N)%.* So if your post was "11" out of a binary alphabet, a string not containing it could be compressed by 25% by encoding in trinary: 0 = 00, 1 = 01, 2 = 10.

So I would actually expect your post to appear in a long-enough Kolmogorov-random string.

* Actually, depending on the post, probably a fair bit more, maybe as much as (100N/A^N)%.

3. Feb 24, 2008

CRGreathouse

Suppose your post was 171 characters long out of a 7-bit (128-character) alphabet. Then a Kolmogorov-random string of length 128171 = 2.15... x 10360 would be expected to have one copy of your post.

4. Feb 24, 2008

Hurkyl

Staff Emeritus
What is the "usual sense"? A Kolmogorov-random string certainly isn't a probability measure on a topological space -- and most peoples' intuitive notion of probability is notoriously ill-defined.

On the contrary, "almost all" strings contain a copy of your post -- therefore there exists a length L such that every incompressible string longer than L contains a copy of your post.

Last edited: Feb 24, 2008
5. Feb 24, 2008

Tiiba

By "random in the usual sense" I mean that any substring - including a trillion repetitions of the letter "g" - can appear in the string.

As I understand, Kolmogorov randomness is produced by simply compressing a non-random string. But if I have a random string generator, I could create a program that works as follows:

1 Read a string from the generator
2 Compress the chunk
3 If the chunk gets bigger due to the compression algorithm, place it in the output as is and prepend it with a 0; If it proves compressible, place the compressed version in the output and prepend it with a 1
4 If the amount of space saved through compression exceeds the size of the decompressor, halt.

So a truly random string would be compressible. But a Kolmogorov-random string is incompressible by definition. So is my logic wrong, or did I misunderstand a definition somewhere?

6. Feb 24, 2008

CRGreathouse

I think you'll find that the average of min(size(chunk), size(compress(chunk))) + 1 bit is actually greater than size(chunk) for uniformly random bitstrings and any compression algorithm, even though min(size(chunk), size(compress(chunk))) will of course be no larger than size(chunk).

7. Feb 24, 2008

Tiiba

Okay... Then I do this:

1) Place the whole random string in a file F.
2) Define a header of length A that instructs the decompressor to decompress the following string and insert the result at its original position.
3) Cut out all detected compressible chunks from F and place them in another file (as long as the compression saves more space than the header uses).
4) If the amount of space saved through compression exceeds the size of the decompressor, halt.

This way, there is no overhead for the incompressible chunks.