- #1
- 315
- 1
In minimum description length, you start with some programming language L, and some data D, and seek to find a program P in L such that evaluating P outputs D. You try to minimize the length of P, and you consider P to be a valid model of the data if the length of P is less than the length of D.
However, what if your goals are more modest--suppose instead of describing all the data you only wish to find a "feature" in the data, like identifying a shape. Let a "feature descriptor" be a program F that outputs a string S, such that S agrees with D at a large number of points, and where S contains a "dummy" symbol (a symbol not used in D) at each other position. For example, if D = 0101001111111111010100011, then one might let F be a program that outputs the string S = xxxxxx1111111111xxxxxxxxx, which agrees with D at 10 places.
What is a criterion, along the lines of minimum description length, for finding valid features in the data? One possible criterion is to say that F is a valid feature descriptor if the length of F is less than the number of places where F's output, S, agrees with D. Is this a good criterion, or does it lead to spurious features?
A good criterion for finding valid features should very rarely allow one to find any "features" in a random string of data, no matter how long. So, suppose we have an infinite string D of 0's and 1's, each randomly selected by a fair coin flip. What is the probability that, by examining D, we can write a computer program P (say, in Python) that is a valid feature descriptor by the above criterion (length of P in bits is less than the number of places where the output of P agrees with D)? Is this probability close to 0 (as we'd like) or close to 1?
However, what if your goals are more modest--suppose instead of describing all the data you only wish to find a "feature" in the data, like identifying a shape. Let a "feature descriptor" be a program F that outputs a string S, such that S agrees with D at a large number of points, and where S contains a "dummy" symbol (a symbol not used in D) at each other position. For example, if D = 0101001111111111010100011, then one might let F be a program that outputs the string S = xxxxxx1111111111xxxxxxxxx, which agrees with D at 10 places.
What is a criterion, along the lines of minimum description length, for finding valid features in the data? One possible criterion is to say that F is a valid feature descriptor if the length of F is less than the number of places where F's output, S, agrees with D. Is this a good criterion, or does it lead to spurious features?
A good criterion for finding valid features should very rarely allow one to find any "features" in a random string of data, no matter how long. So, suppose we have an infinite string D of 0's and 1's, each randomly selected by a fair coin flip. What is the probability that, by examining D, we can write a computer program P (say, in Python) that is a valid feature descriptor by the above criterion (length of P in bits is less than the number of places where the output of P agrees with D)? Is this probability close to 0 (as we'd like) or close to 1?