Truncating probabilities based on entropy

In summary, the thread discusses entropy and its relation to the number of states needed to capture most of a probability distribution. The original poster asks how badly a one-shot truncation can fail when only the e^S most probable states are kept: the goal is to maximize the L1 distance between the original distribution and the truncated version, over all distributions with a fixed entropy S. The replies also discuss the maximum entropy achievable with a given number of states and the constraints that apply.
  • #1
Physics Monkey
Roughly speaking, I want to know how badly Shannon can fail in the one-shot setting.

The standard ideas of asymptotic information theory give a precise meaning to the entropy of a given probability distribution in terms of the best achievable compression with vanishing error in the limit of many iid variables. More generally, we have the idea of a thermodynamic limit in which, again, roughly [itex] e^S [/itex] states suffice to capture most of the probability; here S is the entropy, which grows with system size.
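For concreteness, the sharpest version of that asymptotic statement is the usual typical-set count: for [itex] N [/itex] iid copies of a variable with per-copy entropy [itex] H [/itex] (in nats), roughly

[tex] e^{N H} [/tex]

typical sequences carry probability approaching one as [itex] N \to \infty [/itex], even though the total number of possible sequences is exponentially larger.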

I have a set of questions related to these ideas but in a one-shot setting. Here is an example. I consider a fixed probability distribution over many states (maybe it has a system-size-like parameter, but let's not use that right now) and I want to know how badly I can do by keeping only [itex] e^S [/itex] states. Formally, I keep the [itex] e^S [/itex] states with the largest probabilities and then ask how big I can make the probability of choosing a state that I've not kept. For example, can I always find a probability distribution with a given entropy for which this error is arbitrarily close to one?

Any information along these general lines would be helpful to me. Also, this may be a "trivial" set of questions with "simple" answers in which case a reference suggestion would be very helpful.
 
  • #2
Physics Monkey said:
Roughly speaking, I want to know how badly Shannon can fail in the one-shot setting. [...] Can I always find a probability distribution with a given entropy for which this error is arbitrarily close to one?

Hi Monkey,

If you keep the states with the largest probabilities, that means you know them, which means you also know how badly you do by keeping just those states. For example, if the states you keep have a total probability of 0.99 of occurring, you know you have a 0.01 probability of hitting a state you didn't keep. Obviously you can make the kept probability as close to one as you want by keeping more and more states. I'm probably missing something in your question, but I don't see where the problem is.
 
  • #3
Thanks, but I think I didn't convey what I wanted. I'll try again.

Consider a set of states labeled by [itex] n=1,2,\ldots [/itex] and fix a large number [itex] S [/itex]. Let [itex] p [/itex] be a probability distribution over these states with [itex] p(n) \geq p(n+1) [/itex] and entropy [itex] S [/itex], and let [itex] p_S [/itex] be [itex] p [/itex] truncated to its largest [itex] e^S [/itex] values, i.e. [itex] p_S(n) = p(n) [/itex] for [itex] n \leq e^S [/itex] and [itex] p_S(n) = 0 [/itex] for [itex] n > e^S [/itex]. I want to maximize [itex] \|p - p_S\|_1 [/itex] over all [itex] p [/itex] with a fixed entropy [itex] S [/itex].
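For concreteness, here is a minimal numerical sketch of the quantity I want to maximize (Python; the geometric-type example distribution is just an arbitrary choice to have something to compute with):

[code]
import numpy as np

def entropy_nats(p):
    """Shannon entropy in nats, ignoring zero entries."""
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)))

def truncation_error(p):
    """Mass lost by keeping only the e^S most probable states,
    where S is the entropy of p itself; equals ||p - p_S||_1."""
    p = np.sort(p)[::-1]              # enforce p(n) >= p(n+1)
    S = entropy_nats(p)
    k = int(np.floor(np.exp(S)))      # number of states kept
    return float(np.sum(p[k:]))       # tail mass beyond the kept states

# Arbitrary example: a geometric-type distribution over 10^5 states.
n = np.arange(1, 100001)
p = 0.999 ** n
p /= p.sum()

print("entropy S       =", entropy_nats(p))
print("states kept e^S =", int(np.exp(entropy_nats(p))))
print("||p - p_S||_1   =", truncation_error(p))
[/code]

The question is then how large this tail mass can be made by varying [itex] p [/itex] while holding the entropy [itex] S [/itex] fixed.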
 
  • #4
Physics Monkey said:
[...] I want to maximize [itex] \|p - p_S\|_1 [/itex] over all [itex] p [/itex] with a fixed entropy [itex] S [/itex].

You say [itex]p[/itex] is a probability distribution, and so is [itex]p_S[/itex] as far as I understand, so when you want to maximize [itex] \|p - p_S\|_1 [/itex] I guess you are talking about a probability metric like one of the Fortet-Mourier type. But if [itex]S[/itex] is fixed and you only have probability mass functions, then where is the degree of freedom you need to maximize your expression?
 
  • #5
Physics Monkey said:
[...] I want to maximize [itex] \|p - p_S\|_1 [/itex] over all [itex] p [/itex] with a fixed entropy [itex] S [/itex].

Given any N states, the maximum entropy is ln(N) (using natural logarithms, to match the e^S convention above). You are given an entropy S subject to the constraint 0 <= S <= ln(N): the upper bound corresponds to all probabilities being equal, and the lower bound corresponds to a single probability equal to 1. We also have the constraint that the probabilities sum to 1.

From this description, and given your goal of maximizing [itex] \|p - p_S\|_1 [/itex], you will in general have some degrees of freedom whenever the entropy is strictly between 0 and ln(N). Bringing in the constraints on the vector p shrinks the solution space considerably, but it won't necessarily make the solution unique in the general case.

Intuitively, this is because you could, for example, swap probability values between different indices without changing the entropy. You also have many degrees of freedom in lowering one probability and raising another, and this can be done in a myriad of ways without violating the constraints.

Given the above, you should introduce further constraints to narrow down the solution space, and the extra constraints should relate specifically to what you are trying to do: if, for example, you just wanted 'any' distribution that fits these criteria, you could impose an ordering constraint such as [itex] p_{i+1} \geq p_i [/itex], which would shrink the solution space dramatically.

It would also be useful for readers to know some context for your problem, in order to give further suggestions.
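To illustrate the non-uniqueness point numerically, here is a rough sketch (Python; the Dirichlet sampling, the random search, and the tolerance are arbitrary choices, not anything specific to the problem) that finds several distributions with nearly the same entropy but different truncation errors:

[code]
import numpy as np

rng = np.random.default_rng(0)

def entropy_nats(p):
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)))

def truncation_error(p):
    p = np.sort(p)[::-1]
    k = int(np.floor(np.exp(entropy_nats(p))))
    return float(np.sum(p[k:]))

# Draw a reference distribution over N states, then keep searching for
# other random distributions whose entropy is (almost) the same.  They
# generally give different values of ||p - p_S||_1, so fixing the entropy
# alone does not pin down the distribution.
N, tol = 50, 0.02
p_ref = rng.dirichlet(np.ones(N))
S_target = entropy_nats(p_ref)

matches = [p_ref]
while len(matches) < 3:
    p = rng.dirichlet(np.ones(N))
    if abs(entropy_nats(p) - S_target) < tol:
        matches.append(p)

for p in matches:
    print(f"S = {entropy_nats(p):.3f}, ||p - p_S||_1 = {truncation_error(p):.4f}")
[/code]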
 

1. What is the purpose of truncating probabilities based on entropy?

The purpose of truncating probabilities based on entropy is to simplify and reduce the complexity of a probability distribution by removing low-probability events. This can help in analyzing and understanding the distribution, as well as making it easier to compute or use in calculations.

2. How is entropy used in truncating probabilities?

Entropy is a measure of uncertainty or randomness in a system. In truncating probabilities, entropy is used to decide how much of the distribution can be discarded while still capturing most of it. In practice this is done by setting a probability threshold (or by keeping only the roughly e^S most probable states, where S is the entropy, as in the discussion above), and events falling below that cutoff are truncated or removed.
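As a rough illustration of one such scheme, here is a minimal sketch (Python; the example distribution and the threshold value are arbitrary choices made up for illustration):

[code]
import numpy as np

def entropy_nats(p):
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)))

# Hypothetical distribution and an arbitrary probability cutoff.
p = np.array([0.40, 0.25, 0.15, 0.10, 0.05, 0.03, 0.015, 0.005])
threshold = 0.02

p_trunc = np.where(p >= threshold, p, 0.0)   # drop low-probability events
p_renorm = p_trunc / p_trunc.sum()           # optionally renormalize

print("original entropy :", entropy_nats(p))
print("mass discarded   :", 1.0 - p_trunc.sum())
print("entropy after    :", entropy_nats(p_renorm))
[/code]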

3. What are the potential benefits of truncating probabilities based on entropy?

Truncating probabilities based on entropy can have several benefits, including simplifying the probability distribution, reducing computation time, and improving the interpretability of the distribution. It can also help in identifying and focusing on the most relevant or significant events in the distribution.

4. Are there any drawbacks to using entropy-based truncation?

One potential drawback of entropy-based truncation is that it may remove important events or outliers that could be valuable in certain analyses. It can also be subjective, as the choice of threshold for entropy may vary depending on the context or goals of the analysis. Additionally, truncation based on entropy may not always be appropriate for all types of probability distributions.

5. How can one determine the optimal threshold for entropy-based truncation?

The optimal threshold for entropy-based truncation may vary depending on the specific context and goals of the analysis. One approach is to experiment with different threshold values and observe the impact on the resulting distribution and any subsequent analyses. Another approach is to consult with experts or use data-driven methods to determine the most suitable threshold. Ultimately, the optimal threshold may also depend on the trade-off between simplicity and accuracy in the analysis.
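A rough sketch of the first approach (Python; the distribution and the grid of candidate thresholds are arbitrary choices) is to sweep thresholds and record the trade-off between how many events survive and how much probability mass is retained:

[code]
import numpy as np

# Hypothetical distribution (same arbitrary example as above).
p = np.array([0.40, 0.25, 0.15, 0.10, 0.05, 0.03, 0.015, 0.005])

# Sweep candidate thresholds and record the simplicity/accuracy trade-off.
for threshold in [0.001, 0.01, 0.02, 0.05, 0.1]:
    kept = p[p >= threshold]
    print(f"threshold={threshold:<6} events kept={kept.size}  mass kept={kept.sum():.3f}")
[/code]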
