Truncating probabilities based on entropy

  • Context: Graduate 
  • Thread starter Thread starter Physics Monkey
  • Start date Start date
  • Tags Tags
    Entropy Probabilities
Click For Summary

Discussion Overview

The discussion revolves around the limitations of Shannon's information theory in the one-shot setting, particularly focusing on how truncating probabilities based on entropy affects the likelihood of missing states in a probability distribution. Participants explore the implications of keeping only the most probable states and the resulting errors in probability estimation.

Discussion Character

  • Exploratory
  • Technical explanation
  • Debate/contested
  • Mathematical reasoning

Main Points Raised

  • One participant questions how badly Shannon's theory can fail in a one-shot scenario, seeking to understand the implications of truncating a probability distribution to the largest e^S states.
  • Another participant suggests that if the states with the largest probabilities are known, the error in missing states can be quantified directly, implying that the problem may not be as complex as presented.
  • A different participant clarifies their intent by introducing a specific probability distribution and seeks to maximize the difference between the original distribution and the truncated version, raising questions about the constraints imposed by fixed entropy.
  • Further elaboration indicates that while the maximum entropy is constrained by the number of states, there are degrees of freedom in adjusting probabilities without changing the overall entropy, suggesting a complex interplay between constraints and the goal of maximizing ||p - p_S||_1.
  • One participant emphasizes the need for additional constraints to narrow down the solution space, proposing that specific conditions could significantly impact the outcomes.

Areas of Agreement / Disagreement

Participants express differing views on the complexity of the problem and the implications of truncating probabilities. While some suggest that the issue may be straightforward, others highlight the intricate nature of the constraints involved, indicating that the discussion remains unresolved.

Contextual Notes

Limitations include the dependence on the definitions of entropy and probability distributions, as well as the unresolved mathematical steps related to maximizing ||p - p_S||_1 under fixed entropy conditions.

Physics Monkey
Homework Helper
Messages
1,363
Reaction score
34
Roughly speaking, I want to know how badly Shannon can fail in the one-shot setting.

The standard ideas of asymptotic information theory give a precise meaning to the entropy of a given probability distribution in terms of best achievable compression with vanishing error in the limit of many iid variables. More generally, we have the idea of a thermodynamic limit in which again roughly e^S states suffice to capture most of the probability. S is the entropy which is growing with system size.

I have a set of questions related to these ideas but in a one-shot setting. Here is an example. I consider a fixed probability distribution over many states (maybe it has a system size like parameter, but let's not use that right now) and I want to know how badly I can do by keeping only e^S states. Formally, I keep the e^S states with largest probability and then I ask how big I can make the probability of choosing a state that I've not kept. For example, can I always find a probability distibution with a given entropy for which this error is arbitrarily close to one?

Any information along these general lines would be helpful to me. Also, this may be a "trivial" set of questions with "simple" answers in which case a reference suggestion would be very helpful.
 
Physics news on Phys.org
Physics Monkey said:
Roughly speaking, I want to know how badly Shannon can fail in the one-shot setting.

The standard ideas of asymptotic information theory give a precise meaning to the entropy of a given probability distribution in terms of best achievable compression with vanishing error in the limit of many iid variables. More generally, we have the idea of a thermodynamic limit in which again roughly e^S states suffice to capture most of the probability. S is the entropy which is growing with system size.

I have a set of questions related to these ideas but in a one-shot setting. Here is an example. I consider a fixed probability distribution over many states (maybe it has a system size like parameter, but let's not use that right now) and I want to know how badly I can do by keeping only e^S states. Formally, I keep the e^S states with largest probability and then I ask how big I can make the probability of choosing a state that I've not kept. For example, can I always find a probability distibution with a given entropy for which this error is arbitrarily close to one?

Any information along these general lines would be helpful to me. Also, this may be a "trivial" set of questions with "simple" answers in which case a reference suggestion would be very helpful.

Hi Monkey,

If you keep the states with largest probabilities it means you know them, which means that you know how badly you do by keeping just those states. I mean, if let's say the states you keep have a 0.99 probability to occur you know you have a 0.01 probability to miss a state you didn't keep. Obviously you can make that probability as close to one as you want by choosing more and more states. I'm probably missing something in your question but I don't see where is the problem.
 
Thanks, but I think I didn't convey what I wanted. I'll try again.

Consider a set of states labeled by n=1,2,... and fix a large number S. Let p be a probability distribution over these states with p(n) \geq p(n+1) with entropy S and let p_S by the probability truncated to its largest e^S values e.g. p_S(n < e^S) = p(n) and p_S(n > e^S) = 0. I want to maximize ||p - p_S ||_1 over all p with a fixed entropy S.
 
Physics Monkey said:
Thanks, but I think I didn't convey what I wanted. I'll try again.

Consider a set of states labeled by n=1,2,... and fix a large number S. Let p be a probability distribution over these states with p(n) \geq p(n+1) with entropy S and let p_S by the probability truncated to its largest e^S values e.g. p_S(n < e^S) = p(n) and p_S(n > e^S) = 0. I want to maximize ||p - p_S ||_1 over all p with a fixed entropy S.

You say p is a probability distribution and so is p_S as far as I understand, so when you want to maximize ||p - p_S ||_1 I guess you are talking about a probability metric like a Fortet-Mourier type metric but, if S is fixed and you only have probability mass functions, then where is the degree of freedom you need to maximize your expression?
 
Physics Monkey said:
Thanks, but I think I didn't convey what I wanted. I'll try again.

Consider a set of states labeled by n=1,2,... and fix a large number S. Let p be a probability distribution over these states with p(n) \geq p(n+1) with entropy S and let p_S by the probability truncated to its largest e^S values e.g. p_S(n < e^S) = p(n) and p_S(n > e^S) = 0. I want to maximize ||p - p_S ||_1 over all p with a fixed entropy S.

Given any N states, the maximum entropy will always be log_2(N). You are given an entropy S which will have the constraint S <= log_2(N). The upper-bound corresponds to all probabilities being equal and the lower bound corresponds to one probability being 1. We also have the condition that the sum of all probabilities = 1 as a constraint.

From this description, and given your goal of maximizing ||p - p_S||_1, you are going to have some degrees of freedom if the entropy is in-between 0 and log_2(N) as a general statement. However bringing in the constraint of existing vector p will make the solution space a lot smaller, but it won't necessarily make it unique for the general case.

The reason intuitively for this is that you could for example swap probability values between different indices and the entropy wouldn't change. You also have the many degrees of freedom where you can lower one probability and make another one higher and do this in a myriad of ways without violating the constraints.

Given the above, you should introduce some more constraints to narrow down your solution space and the extra constraints should relate specifically to what you are trying to do: if for example you just wanted 'any' distribution that fit this criteria then you could construct a specific constraint where p_i+1 >= p_i which would force the solution space to shrink dramatically.

It would be useful for the readers to know some context for your problem to give further suggestions.
 

Similar threads

  • · Replies 13 ·
Replies
13
Views
2K
Replies
3
Views
2K
  • · Replies 2 ·
Replies
2
Views
2K
  • · Replies 3 ·
Replies
3
Views
3K
  • · Replies 11 ·
Replies
11
Views
4K
  • · Replies 6 ·
Replies
6
Views
2K
  • · Replies 11 ·
Replies
11
Views
3K
  • · Replies 4 ·
Replies
4
Views
1K
  • · Replies 2 ·
Replies
2
Views
2K
  • · Replies 6 ·
Replies
6
Views
2K