Can we determine the probability of a string using optimal encoding techniques?

In summary, the source coding theorem states that for any discrete probability distribution there is an optimal encoding. The codeword lengths of such an encoding can be turned back into probabilities by taking 2 to the power of minus each length and normalizing, which recovers the original distribution (up to rounding). Any encoding, optimal or not, corresponds in this way to an assumed probability distribution, and the average excess length of such a code (relative to the optimal one) is the KL divergence of the true distribution from the assumed one.
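A minimal Python sketch of that reconstruction (the symbols and codeword lengths below are illustrative, not taken from the thread):

[code]
# Recover the implied distribution from codeword lengths: weight each symbol by
# 2**(-length), then normalize.
lengths = {"a": 1, "b": 2, "c": 3, "d": 3}  # illustrative codeword lengths

weights = {sym: 2.0 ** -l for sym, l in lengths.items()}
total = sum(weights.values())
implied = {sym: w / total for sym, w in weights.items()}

print(implied)  # {'a': 0.5, 'b': 0.25, 'c': 0.125, 'd': 0.125}
[/code]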
  • #1
mXSCNT
The source coding theorem tells us that given a discrete probability distribution, there is an optimal encoding for it. Is it possible to go in the reverse direction? That is, suppose you start with an encoding of a discrete random variable X whose distribution is unknown. Assuming that this encoding is optimal, can one derive the distribution of X?

Suppose that one can determine the distribution of X from the optimal encoding. Then this would allow us to determine the a priori probability of any string in the encoding. In other words, that would let us construct a concept of probability out of nothing but a formal language!
 
  • #2
mXSCNT said:
The source coding theorem tells us that given a discrete probability distribution, there is an optimal encoding for it. Is it possible to go in the reverse direction? That is, suppose you start with an encoding of a discrete random variable X whose distribution is unknown. Assuming that this encoding is optimal, can one derive the distribution of X?

Yeah, sure. You know that the length of each codeword is proportional to the negative log of the probability of the corresponding element, so you can just take 2 to the power of minus each codeword length and then normalize. Up to round-off error (because the codeword lengths must be integers), you will end up with the original distribution.

More generally, any code, optimal or not, corresponds to an assumed probability distribution, and the average excess length of such a code (relative to the optimal one) is then the KL divergence of the true distribution from the assumed one.
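A small Python sketch of that last point (the distribution and code below are illustrative): for a code with integer lengths, the expected length under the true distribution minus the entropy equals the KL divergence of the true distribution from the distribution implied by the code.

[code]
import math

# True distribution p and a suboptimal code's lengths (both illustrative).
p       = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
lengths = {"a": 2, "b": 2, "c": 2, "d": 2}   # flat 2-bit code, optimal only for a uniform source

# Distribution implied by the code: q_i proportional to 2**(-l_i).
q_raw = {s: 2.0 ** -l for s, l in lengths.items()}
Z = sum(q_raw.values())
q = {s: w / Z for s, w in q_raw.items()}

avg_len = sum(p[s] * lengths[s] for s in p)                # expected code length under p
entropy = -sum(p[s] * math.log2(p[s]) for s in p)          # optimal average length H(p)
kl      = sum(p[s] * math.log2(p[s] / q[s]) for s in p)    # D(p || q)

print(avg_len - entropy, kl)   # both equal 0.25 bits
[/code]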

mXSCNT said:
In other words, that would let us construct a concept of probability out of nothing but a formal language!

Not so fast. This whole approach is based on the assumption that the encoding is optimal with respect to the distribution, which is assumed to already exist. You aren't constructing anything here, but rather noting that the structure of the code tells you what the distribution was.
 
  • #3
quadraphonics said:
Yeah, sure. You know that the length of each codeword is proportional to the negative log of the probability of the corresponding element, so you can just take 2 to the power of minus each codeword length and then normalize. Up to round-off error (because the codeword lengths must be integers), you will end up with the original distribution.
If I am not misinterpreting you, then
|s| = k * log(P(d(s)))
P(d(s)) = 2^(|s|/k)
where k is a normalizing constant, and d(s) is the event encoded by s. Since the string s is produced if and only if the event d(s) occurs, P(d(s)) = P(s).

Setting k = -1/2 normalizes the distribution, giving a prior distribution over binary strings, on the assumption that these strings are an optimal encoding for some random variable:
P(s) = 1/4^|s|
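A quick check of that normalization (the sum is taken over nonempty binary strings; including the empty string would give 2):

[tex]
\sum_{s \neq \epsilon} 4^{-|s|} = \sum_{n=1}^{\infty} 2^n \cdot 4^{-n} = \sum_{n=1}^{\infty} 2^{-n} = 1
[/tex]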
 
  • #4
mXSCNT said:
If I am not misinterpreting you, then
|s| = k * log(P(d(s)))
P(d(s)) = 2^(|s|/k)
where k is a normalizing constant, and d(s) is the event encoded by s. Since the string s is produced if and only if the event d(s) occurs, P(d(s)) = P(s).

Yeah, that's the idea. Although, since [itex]P(d(s)) \leq 1 [/itex], I'd write it as:

[tex]
|s| = -k\log_2(P(d(s)))
[/tex]

or even just

[tex]
|s| \propto -\log_2(P(d(s)))
[/tex]
 
  • #5
I wish to solve the following problem using similar techniques. Suppose we have a set of computer programs written as binary strings, and a set of inputs also written as binary strings. Each computer program either terminates on an input, which means that it accepts the input, or fails to terminate, which means that it rejects the input. I wish to answer the question: given an input string, what is the maximum-likelihood computer program that accepts it?

In order to answer this question, it is necessary to derive a joint probability distribution over the computer programs and the input strings. It is not possible to assume that the computer programs are an optimal encoding of anything, because there may be more than one computer program that accepts a given input, a computer program may accept more than one input, and a computer program may reject all inputs. But perhaps it is possible to assume that the computer programs are optimal "recognizers" of the input, for some meaning of optimal.

This is related to another question: given the Kolmogorov complexity of a string, is it possible to derive the prior probability of that string, assuming that the computer programs used to generate strings are somehow optimal at doing so? In this case it's still not possible to interpret the computer programs as encoding the strings, because more than one computer program may generate the same string. But in this case each computer program corresponds to only one string, so it may be simpler in that regard.
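A toy Python sketch of one way to make the second question concrete, under the assumption (not established in this thread) that each program p is weighted by 2^(-|p|): the prior of a string is then proportional to the total weight of the programs that generate it, and the shortest generating program acts as its most heavily weighted explanation. The program table below is purely illustrative; real Kolmogorov complexity is uncomputable.

[code]
# Toy model: a fixed table mapping "programs" (binary strings) to the string each
# one generates.
programs = {
    "0":    "00",
    "10":   "00",
    "11":   "01",
    "010":  "01",
    "0110": "11",
}

# Weight each program by 2**(-length); a string's unnormalized prior is the total
# weight of the programs that generate it.
prior = {}
for prog, out in programs.items():
    prior[out] = prior.get(out, 0.0) + 2.0 ** -len(prog)

Z = sum(prior.values())
prior = {s: w / Z for s, w in prior.items()}

# The shortest program generating a string is its single most heavily weighted explanation.
def shortest_program(s):
    candidates = [p for p, out in programs.items() if out == s]
    return min(candidates, key=len) if candidates else None

print(prior)                   # '00' gets the most mass: it has the shortest generators
print(shortest_program("01"))  # '11'
[/code]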
 
