Truncating probabilities based on entropy

Physics Monkey
Homework Helper
Roughly speaking, I want to know how badly Shannon can fail in the one-shot setting.

The standard ideas of asymptotic information theory give a precise meaning to the entropy of a given probability distribution in terms of the best achievable compression with vanishing error in the limit of many i.i.d. variables. More generally, we have the idea of a thermodynamic limit in which, again, roughly e^S states suffice to capture most of the probability. Here S is the entropy, which grows with system size.

I have a set of questions related to these ideas but in a one-shot setting. Here is an example. I consider a fixed probability distribution over many states (maybe it has a system-size-like parameter, but let's not use that right now) and I want to know how badly I can do by keeping only e^S states. Formally, I keep the e^S states with the largest probability and then ask how big I can make the probability of drawing a state that I've not kept. For example, can I always find a probability distribution with a given entropy for which this error is arbitrarily close to one?

Any information along these general lines would be helpful to me. Also, this may be a "trivial" set of questions with "simple" answers, in which case a reference suggestion would be very helpful.
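
To make the "roughly e^S states" statement concrete, here is a small numerical sketch (Python; the geometric-like distribution is an arbitrary stand-in, not part of the question). It computes S in nats and compares e^S with the number of most-probable states needed to cover 99% of the mass.

Code:
import numpy as np

def entropy_nats(p):
    # Shannon entropy in nats, so that e^S counts "typical" states.
    p = p[p > 0]
    return -np.sum(p * np.log(p))

# Arbitrary stand-in distribution: a geometric-like law over N states.
N, q = 10_000, 0.01
p = q * (1 - q) ** np.arange(N)
p /= p.sum()                               # renormalise the truncated tail

S = entropy_nats(p)
p_sorted = np.sort(p)[::-1]                # most probable states first
states_for_99 = int(np.searchsorted(np.cumsum(p_sorted), 0.99)) + 1

print("e^S                 =", np.exp(S))
print("states for 99% mass =", states_for_99)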
 
Physics Monkey said:
Roughly speaking, I want to know how badly Shannon can fail in the one-shot setting.

Hi Monkey,

If you keep the states with the largest probabilities, it means you know them, which means you also know how badly you do by keeping just those states. I mean, if the states you keep have, say, a 0.99 probability of occurring, then you know you have a 0.01 probability of landing on a state you didn't keep. Obviously you can make that probability as close to one as you want by choosing more and more states. I'm probably missing something in your question, but I don't see where the problem is.
 
Thanks, but I think I didn't convey what I wanted. I'll try again.

Consider a set of states labeled by n = 1, 2, \ldots and fix a large number S. Let p be a probability distribution over these states with p(n) \geq p(n+1) and entropy S, and let p_S be p truncated to its e^S largest values, i.e. p_S(n) = p(n) for n \leq e^S and p_S(n) = 0 for n > e^S. I want to maximize ||p - p_S||_1 over all p with fixed entropy S.
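
In code, the quantity I want to maximize would look like this (just a sketch; I round e^S up to an integer number of kept states, and p is assumed already sorted so that p(n) \geq p(n+1)). Since p_S agrees with p on its first e^S entries and is zero afterwards, ||p - p_S||_1 is simply the discarded mass.

Code:
import numpy as np

def truncation_distance(p, S):
    # p: probabilities sorted so that p(n) >= p(n+1); S: the fixed entropy.
    # p_S keeps the first ceil(e^S) entries of p and zeroes the rest, so the
    # L1 distance ||p - p_S||_1 is the total mass that was discarded.
    k = int(np.ceil(np.exp(S)))
    return float(np.sum(p[k:]))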
 
Physics Monkey said:
Thanks, but I think I didn't convey what I wanted. I'll try again.

You say p is a probability distribution, and as far as I understand so is p_S, so when you want to maximize ||p - p_S||_1 I guess you are talking about a probability metric like a Fortet-Mourier-type metric. But if S is fixed and you only have probability mass functions, then where is the degree of freedom you need to maximize your expression?
 
Physics Monkey said:
Thanks, but I think I didn't convey what I wanted. I'll try again.

Given any N states, the maximum entropy will always be log(N) (natural log here, to match your e^S convention). You are given an entropy S, which is constrained by S <= log(N). The upper bound corresponds to all probabilities being equal, and the lower bound S = 0 corresponds to one probability being 1. We also have the constraint that the probabilities sum to 1.

From this description, and given your goal of maximizing ||p - p_S||_1, you are generally going to have some degrees of freedom whenever the entropy lies strictly between 0 and log(N). Bringing in the constraints on the vector p will make the solution space a lot smaller, but it won't necessarily make the maximizer unique in the general case.

Intuitively, the reason is that you could, for example, swap probability values between different indices without changing the entropy. You also have the many degrees of freedom that come from lowering one probability and raising another, which can be done in a myriad of ways without violating the constraints.

Given the above, you should introduce some more constraints to narrow down the solution space, and the extra constraints should relate specifically to what you are trying to do: if, for example, you just wanted 'any' distribution that fits these criteria, then you could impose an ordering constraint such as p(n+1) <= p(n) (as in your statement of the problem), which forces the solution space to shrink dramatically.
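
As a rough illustration of what exploring such a constrained family might look like numerically, here is a sketch (Python; the "one spike plus a uniform tail" family, the parameter names t and M, and the target S = 5 are all just choices I made for the example, not anything taken from your problem). The parameter t is tuned by bisection so the entropy stays at the target S, and the discarded mass is then evaluated for tails of different widths M.

Code:
import numpy as np

def entropy_nats(p):
    p = p[p > 0]
    return -np.sum(p * np.log(p))

def spike_plus_uniform(t, M):
    # One "head" state with mass 1 - t, the remaining mass t spread evenly
    # over M tail states.
    return np.concatenate(([1.0 - t], np.full(M, t / M)))

def family_entropy(t, M):
    # Closed-form entropy (in nats) of the spike-plus-uniform family.
    return -(1.0 - t) * np.log(1.0 - t) - t * np.log(t / M)

def tune_t(S, M, iters=80):
    # Bisect on t: the entropy rises from 0 at t = 0 to log(M + 1) at
    # t = M / (M + 1), so any S below log(M + 1) can be matched.
    lo, hi = 0.0, M / (M + 1.0)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if family_entropy(mid, M) < S:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def missed_mass(p, S):
    # ||p - p_S||_1: probability outside the ceil(e^S) most probable states.
    p_sorted = np.sort(p)[::-1]
    k = int(np.ceil(np.exp(S)))
    return float(np.sum(p_sorted[k:]))

S = 5.0                                   # target entropy in nats
for M in (10**3, 10**5, 10**7):           # widen the tail while holding S fixed
    t = tune_t(S, M)
    p = spike_plus_uniform(t, M)
    print(M, entropy_nats(p), missed_mass(p, S))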

It would be useful for readers to know some context for your problem in order to give further suggestions.
 
Namaste & G'day Postulate: A strongly-knit team wins on average over a less knit one Fundamentals: - Two teams face off with 4 players each - A polo team consists of players that each have assigned to them a measure of their ability (called a "Handicap" - 10 is highest, -2 lowest) I attempted to measure close-knitness of a team in terms of standard deviation (SD) of handicaps of the players. Failure: It turns out that, more often than, a team with a higher SD wins. In my language, that...
Hi all, I've been a roulette player for more than 10 years (although I took time off here and there) and it's only now that I'm trying to understand the physics of the game. Basically my strategy in roulette is to divide the wheel roughly into two halves (let's call them A and B). My theory is that in roulette there will invariably be variance. In other words, if A comes up 5 times in a row, B will be due to come up soon. However I have been proven wrong many times, and I have seen some...
Back
Top