- #26

0rthodontist

Science Advisor

- 1,230

- 0

Another definition I can think of is that an infinite string is compressible if every finite prefix above a certain minimum length is compressible. Even though the summary of an infinite string might also be infinite, you can read the first n digits of the summary and derive more than the first n digits of the infinite string. So one compression of the first 3 * n digits of k is the first n digits of j, and reading the first n digits of the summary allows you to extrapolate the first 3 * n digits of the original string k, so k is "compressible" and not fully random.

I think that a compression scheme of randomness needs to take into account three things when deciding the randomness of some given data:

1. The data itself

2. Any context for the data (as in the dice example, where the context is the starting velocities and physical properties of the dice). Data + context = the "real" data set that the machine will try to compress, and statements about randomness can be made about data + context considered together, not about data alone.

3. The machine itself. What one compression machine is able to compress is not necessarily compressible by another compression machine. For example, let's say we have two compression algorithms, one designed to run on images and the other designed to run on English text. Data for an image seems mostly random to the English text compressor, and data for English text seems mostly random to the image compressor. I don't think this can be turned into a question of "does there exist a machine that can compress this" because the answer for finite data is always yes, there is always a machine that can compress all the data into one bit as you have mentioned earlier. I also don't think it can be profitably turned into a question of the existence of a machine to distinguish between many different data sets of the same type, because of the question of how you define what that type is. It has to be relative to the machine. Making it relative to the type that the data sets are considered part of just exchanges one relativeness for another, and it seems more natural (to me) to talk about the machine.

Statements about randomness of data are then really statements about randomness of (data + context) for a particular machine.

I'm not sure what data you are talking about--the dice? Anyway, I may have addressed this point depending on what you mean.Furthermore, I don't think your definition is adequate. Maybe it's impossible to summarize that data in shorter form if you live in a vacuum... but it might become possible if you're allowed to refer to the initial conditions. If so, then I don't think the data could qualify as being "random".