Can we determine the probability of a string using optimal encoding techniques?

Click For Summary

Discussion Overview

The discussion revolves around the possibility of determining the probability distribution of a discrete random variable from its optimal encoding. Participants explore whether an optimal encoding can reveal the underlying distribution, and they also consider related problems involving computer programs and their inputs in a binary string format.

Discussion Character

  • Exploratory
  • Technical explanation
  • Debate/contested
  • Mathematical reasoning

Main Points Raised

  • Some participants propose that if one starts with an optimal encoding of a discrete random variable, it may be possible to derive the distribution of that variable, suggesting a method involving codeword lengths and normalization.
  • Others argue that this approach relies on the assumption that the encoding is optimal with respect to an already existing distribution, implying that the structure of the code reflects the distribution rather than constructing it from scratch.
  • One participant presents a mathematical formulation relating codeword lengths to probabilities, suggesting that the lengths can be used to derive a prior distribution over binary strings under certain assumptions.
  • Another participant introduces a related problem involving computer programs and their inputs, questioning how to derive a joint probability distribution when multiple programs may accept the same input or vice versa.
  • There is a discussion on the implications of Kolmogorov complexity in determining prior probabilities of strings, with the acknowledgment that multiple programs can generate the same string, complicating the interpretation of optimal encodings.

Areas of Agreement / Disagreement

Participants express differing views on the feasibility of deriving a probability distribution from an optimal encoding, with no consensus reached on the validity of the proposed methods or assumptions. The discussion remains unresolved regarding the implications of these ideas in the context of computer programs and their inputs.

Contextual Notes

Limitations include the dependence on the assumption of optimality in encoding and the complexity introduced by multiple programs potentially generating the same string, which complicates the derivation of probabilities.

mXSCNT
Messages
310
Reaction score
1
The source coding theorem tells us that given a discrete probability distribution, there is an optimal encoding for it. Is it possible to go in the reverse direction? That is, suppose you start with an encoding of a discrete random variable X whose distribution is unknown. Assuming that this encoding is optimal, can one derive the distribution of X?

Suppose that one can determine the distribution of X from the optimal encoding. Then this would allow us to determine the a priori probability of any string in the encoding. In other words, that would let us construct a concept of probability out of nothing but a formal language!
 
Physics news on Phys.org
mXSCNT said:
The source coding theorem tells us that given a discrete probability distribution, there is an optimal encoding for it. Is it possible to go in the reverse direction? That is, suppose you start with an encoding of a discrete random variable X whose distribution is unknown. Assuming that this encoding is optimal, can one derive the distribution of X?

Yeah, sure. You know that the length of each codeword is proportional to the log of the probability corresponding element, so you can just raise all the codeword lengths to the power of 2 and then normalize. Up to round-off error (because the encoding lengths must be integers), you will end up with the original distribution.

More generally, any code, optimal or not, corresponds to an assumed probability distribution, and the average excess length of such a code (relative to the optimal one) is then the KL divergence between the assumed distribution and the true one.

mXSCNT said:
In other words, that would let us construct a concept of probability out of nothing but a formal language!

Not so fast. This whole approach is based on the assumption that the encoding is optimal with respect to the distribution, which is assumed to already exist. You aren't constructing anything here, but rather noting that the structure of the code tells you what the distribution was.
 
quadraphonics said:
Yeah, sure. You know that the length of each codeword is proportional to the log of the probability corresponding element, so you can just raise all the codeword lengths to the power of 2 and then normalize. Up to round-off error (because the encoding lengths must be integers), you will end up with the original distribution.
If I am not misinterpreting you, then
|s| = k * log(P(d(s)))
P(d(s)) = 2^(|s|/k)
where k is a normalizing constant, and d(s) is the event encoded by s. Since the string s is produced if and only if the event d(s) occurs, P(d(s)) = P(s).

Setting k = -1/2 normalizes the distribution, giving a prior distribution over binary strings, on the assumption that these strings are an optimal encoding for some random variable:
P(s) = 1/4^|s|
 
mXSCNT said:
If I am not misinterpreting you, then
|s| = k * log(P(d(s)))
P(d(s)) = 2^(|s|/k)
where k is a normalizing constant, and d(s) is the event encoded by s. Since the string s is produced if and only if the event d(s) occurs, P(d(s)) = P(s).

Yeah, that's the idea. Although, since [itex]P(d(s)) \leq 1[/itex], I'd write it as:

[tex] |s| = -k\log_2(P(d(s)))[/tex]

or even just

[tex] |s| \propto -\log_2(P(d(s)))[/tex]
 
I wish to solve the following problem using similar techniques. Suppose we have a set of computer programs written as binary strings, and a set of inputs also written as binary strings. Each computer program either terminates on an input, which means that it accepts the input, or it fails to terminate, which means that it rejects the input. I wish to answer the question, "given an input string, what is the maximum likelihood computer program that accepts that input string"?

In order to answer this question, it is necessary to derive a joint probability distribution over the computer programs and the input strings. It is not possible to assume that the computer programs are an optimal encoding of anything, because there may be more than one computer program that accepts a given input, a computer program may accept more than one input, and a computer program may reject all inputs. But perhaps it is possible to assume that the computer programs are optimal "recognizers" of the input, for some meaning of optimal.

This is related to another question: given the Kolmogorov complexity of a string, is it possible to derive the prior probability of that string, assuming that the computer programs used to generate strings are somehow optimal at doing so? In this case it's still not possible to interpret the computer programs as encoding the strings, because more than one computer program may generate the same string. But in this case each computer program corresponds to only one string, so it may be simpler in that regard.
 

Similar threads

  • · Replies 4 ·
Replies
4
Views
2K
  • · Replies 28 ·
Replies
28
Views
6K
  • · Replies 13 ·
Replies
13
Views
4K
  • · Replies 2 ·
Replies
2
Views
4K
  • · Replies 2 ·
Replies
2
Views
624
  • · Replies 264 ·
9
Replies
264
Views
33K
  • · Replies 10 ·
Replies
10
Views
5K
  • · Replies 4 ·
Replies
4
Views
4K
  • · Replies 13 ·
Replies
13
Views
10K