In minimum description length, you start with some programming language L, and some data D, and seek to find a program P in L such that evaluating P outputs D. You try to minimize the length of P, and you consider P to be a valid model of the data if the length of P is less than the length of D.(adsbygoogle = window.adsbygoogle || []).push({});

However, what if your goals are more modest--suppose instead of describing all the data you only wish to find a "feature" in the data, like identifying a shape. Let a "feature descriptor" be a program F that outputs a string S, such that S agrees with D at a large number of points, and where S contains a "dummy" symbol (a symbol not used in D) at each other position. For example, if D = 0101001111111111010100011, then one might let F be a program that outputs the string S = xxxxxx1111111111xxxxxxxxx, which agrees with D at 10 places.

What is a criterion, along the lines of minimum description length, for finding valid features in the data? One possible criterion is to say that F is a valid feature descriptor if the length of F is less than the number of places where F's output, S, agrees with D. Is this a good criterion, or does it lead to spurious features?

A good criterion for finding valid features should very rarely allow one to find any "features" in a random string of data, no matter how long. So, suppose we have an infinite string D of 0's and 1's, each randomly selected by a fair coin flip. What is the probability that, by examining D, we can write a computer program P (say, in Python) that is a valid feature descriptor by the above criterion (length of P in bits is less than the number of places where the output of P agrees with D)? Is this probability close to 0 (as we'd like) or close to 1?

**Physics Forums | Science Articles, Homework Help, Discussion**

Join Physics Forums Today!

The friendliest, high quality science and math community on the planet! Everyone who loves science is here!

The friendliest, high quality science and math community on the planet! Everyone who loves science is here!

# Minimum Description Length for Features

**Physics Forums | Science Articles, Homework Help, Discussion**