Truncating probabilities based on entropy

Physics Monkey
Homework Helper
Roughly speaking, I want to know how badly Shannon can fail in the one-shot setting.

The standard ideas of asymptotic information theory give a precise meaning to the entropy of a given probability distribution in terms of the best achievable compression with vanishing error in the limit of many i.i.d. variables. More generally, we have the idea of a thermodynamic limit in which, again, roughly e^S states suffice to capture most of the probability. Here S is the entropy, which grows with system size.

I have a set of questions related to these ideas but in a one-shot setting. Here is an example. I consider a fixed probability distribution over many states (maybe it has a system-size-like parameter, but let's not use that right now) and I want to know how badly I can do by keeping only e^S states. Formally, I keep the e^S states with the largest probability and then ask how big I can make the probability of drawing a state that I've not kept. For example, can I always find a probability distribution with a given entropy for which this error is arbitrarily close to one?

Any information along these general lines would be helpful to me. Also, this may be a "trivial" set of questions with "simple" answers, in which case a reference suggestion would be very helpful.
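
To make the "roughly e^S states" statement concrete, here is a small numerical sketch (Python; the geometric-like distribution is an arbitrary stand-in, not part of the question). It computes S in nats and compares e^S with the number of most-probable states needed to cover 99% of the mass.

Code:
import numpy as np

def entropy_nats(p):
    # Shannon entropy in nats, so that e^S counts "typical" states.
    p = p[p > 0]
    return -np.sum(p * np.log(p))

# Arbitrary stand-in distribution: a geometric-like law over N states.
N, q = 10_000, 0.01
p = q * (1 - q) ** np.arange(N)
p /= p.sum()                               # renormalise the truncated tail

S = entropy_nats(p)
p_sorted = np.sort(p)[::-1]                # most probable states first
states_for_99 = int(np.searchsorted(np.cumsum(p_sorted), 0.99)) + 1

print("e^S                 =", np.exp(S))
print("states for 99% mass =", states_for_99)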
 
Physics Monkey said:
Roughly speaking, I want to know how badly Shannon can fail in the one-shot setting.

Hi Monkey,

If you keep the states with the largest probabilities, it means you know them, which means you also know how badly you do by keeping just those states. I mean, if the states you keep have, say, a 0.99 probability of occurring, then you know you have a 0.01 probability of landing on a state you didn't keep. Obviously you can make that probability as close to one as you want by choosing more and more states. I'm probably missing something in your question, but I don't see where the problem is.
 
Thanks, but I think I didn't convey what I wanted. I'll try again.

Consider a set of states labeled by n = 1, 2, \ldots and fix a large number S. Let p be a probability distribution over these states with p(n) \geq p(n+1) and entropy S, and let p_S be p truncated to its e^S largest values, i.e. p_S(n) = p(n) for n \leq e^S and p_S(n) = 0 for n > e^S. I want to maximize ||p - p_S||_1 over all p with fixed entropy S.
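
In code, the quantity I want to maximize would look like this (just a sketch; I round e^S up to an integer number of kept states, and p is assumed already sorted so that p(n) \geq p(n+1)). Since p_S agrees with p on its first e^S entries and is zero afterwards, ||p - p_S||_1 is simply the discarded mass.

Code:
import numpy as np

def truncation_distance(p, S):
    # p: probabilities sorted so that p(n) >= p(n+1); S: the fixed entropy.
    # p_S keeps the first ceil(e^S) entries of p and zeroes the rest, so the
    # L1 distance ||p - p_S||_1 is the total mass that was discarded.
    k = int(np.ceil(np.exp(S)))
    return float(np.sum(p[k:]))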
 
Physics Monkey said:
Thanks, but I think I didn't convey what I wanted. I'll try again.

You say p is a probability distribution, and as far as I understand so is p_S, so when you want to maximize ||p - p_S||_1 I guess you are talking about a probability metric like a Fortet-Mourier-type metric. But if S is fixed and you only have probability mass functions, then where is the degree of freedom you need to maximize your expression?
 
Physics Monkey said:
Thanks, but I think I didn't convey what I wanted. I'll try again.

Given any N states, the maximum entropy will always be log(N) (natural log here, to match your e^S convention). You are given an entropy S, which is constrained by S <= log(N). The upper bound corresponds to all probabilities being equal, and the lower bound S = 0 corresponds to one probability being 1. We also have the constraint that the probabilities sum to 1.

From this description, and given your goal of maximizing ||p - p_S||_1, you are generally going to have some degrees of freedom whenever the entropy lies strictly between 0 and log(N). Bringing in the constraints on the vector p will make the solution space a lot smaller, but it won't necessarily make the maximizer unique in the general case.

Intuitively, the reason is that you could, for example, swap probability values between different indices without changing the entropy. You also have the many degrees of freedom that come from lowering one probability and raising another, which can be done in a myriad of ways without violating the constraints.

Given the above, you should introduce some more constraints to narrow down the solution space, and the extra constraints should relate specifically to what you are trying to do: if, for example, you just wanted 'any' distribution that fits these criteria, then you could impose an ordering constraint such as p(n+1) <= p(n) (as in your statement of the problem), which forces the solution space to shrink dramatically.
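
As a rough illustration of what exploring such a constrained family might look like numerically, here is a sketch (Python; the "one spike plus a uniform tail" family, the parameter names t and M, and the target S = 5 are all just choices I made for the example, not anything taken from your problem). The parameter t is tuned by bisection so the entropy stays at the target S, and the discarded mass is then evaluated for tails of different widths M.

Code:
import numpy as np

def entropy_nats(p):
    p = p[p > 0]
    return -np.sum(p * np.log(p))

def spike_plus_uniform(t, M):
    # One "head" state with mass 1 - t, the remaining mass t spread evenly
    # over M tail states.
    return np.concatenate(([1.0 - t], np.full(M, t / M)))

def family_entropy(t, M):
    # Closed-form entropy (in nats) of the spike-plus-uniform family.
    return -(1.0 - t) * np.log(1.0 - t) - t * np.log(t / M)

def tune_t(S, M, iters=80):
    # Bisect on t: the entropy rises from 0 at t = 0 to log(M + 1) at
    # t = M / (M + 1), so any S below log(M + 1) can be matched.
    lo, hi = 0.0, M / (M + 1.0)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if family_entropy(mid, M) < S:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def missed_mass(p, S):
    # ||p - p_S||_1: probability outside the ceil(e^S) most probable states.
    p_sorted = np.sort(p)[::-1]
    k = int(np.ceil(np.exp(S)))
    return float(np.sum(p_sorted[k:]))

S = 5.0                                   # target entropy in nats
for M in (10**3, 10**5, 10**7):           # widen the tail while holding S fixed
    t = tune_t(S, M)
    p = spike_plus_uniform(t, M)
    print(M, entropy_nats(p), missed_mass(p, S))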

It would be useful for readers to know some context for your problem in order to give further suggestions.
 
Namaste & G'day Postulate: A strongly-knit team wins on average over a less knit one Fundamentals: - Two teams face off with 4 players each - A polo team consists of players that each have assigned to them a measure of their ability (called a "Handicap" - 10 is highest, -2 lowest) I attempted to measure close-knitness of a team in terms of standard deviation (SD) of handicaps of the players. Failure: It turns out that, more often than, a team with a higher SD wins. In my language, that...
Hi all, I've been a roulette player for more than 10 years (although I took time off here and there) and it's only now that I'm trying to understand the physics of the game. Basically my strategy in roulette is to divide the wheel roughly into two halves (let's call them A and B). My theory is that in roulette there will invariably be variance. In other words, if A comes up 5 times in a row, B will be due to come up soon. However I have been proven wrong many times, and I have seen some...
Back
Top