Truncating probabilities based on entropy

In summary, the thread discusses entropy and its relation to the number of states needed to capture most of a probability distribution. The original poster asks how badly a one-shot truncation can fail when only the e^S most probable states are kept: the goal is to maximize the L1 distance between the original distribution and the truncated version, over all distributions with a fixed entropy S. The replies also discuss the maximum entropy achievable with a given number of states and the constraints that apply.
  • #1
Physics Monkey
Roughly speaking, I want to know how badly Shannon can fail in the one-shot setting.

The standard ideas of asymptotic information theory give a precise meaning to the entropy of a given probability distribution in terms of the best achievable compression with vanishing error in the limit of many iid variables. More generally, we have the idea of a thermodynamic limit in which, again, roughly [itex] e^S [/itex] states suffice to capture most of the probability; here S is the entropy, which grows with system size.
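For concreteness, the sharpest version of that asymptotic statement is the usual typical-set count: for [itex] N [/itex] iid copies of a variable with per-copy entropy [itex] H [/itex] (in nats), roughly

[tex] e^{N H} [/tex]

typical sequences carry probability approaching one as [itex] N \to \infty [/itex], even though the total number of possible sequences is exponentially larger.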

I have a set of questions related to these ideas but in a one-shot setting. Here is an example. I consider a fixed probability distribution over many states (maybe it has a system-size-like parameter, but let's not use that right now) and I want to know how badly I can do by keeping only [itex] e^S [/itex] states. Formally, I keep the [itex] e^S [/itex] states with the largest probabilities and then ask how big I can make the probability of choosing a state that I've not kept. For example, can I always find a probability distribution with a given entropy for which this error is arbitrarily close to one?

Any information along these general lines would be helpful to me. Also, this may be a "trivial" set of questions with "simple" answers in which case a reference suggestion would be very helpful.
 
  • #2
Physics Monkey said:
Roughly speaking, I want to know how badly Shannon can fail in the one-shot setting. [...] Can I always find a probability distribution with a given entropy for which this error is arbitrarily close to one?

Hi Monkey,

If you keep the states with the largest probabilities, that means you know them, which means you also know how badly you do by keeping just those states. For example, if the states you keep have a total probability of 0.99 of occurring, you know you have a 0.01 probability of hitting a state you didn't keep. Obviously you can make the kept probability as close to one as you want by keeping more and more states. I'm probably missing something in your question, but I don't see where the problem is.
 
  • #3
Thanks, but I think I didn't convey what I wanted. I'll try again.

Consider a set of states labeled by [itex] n=1,2,\ldots [/itex] and fix a large number [itex] S [/itex]. Let [itex] p [/itex] be a probability distribution over these states with [itex] p(n) \geq p(n+1) [/itex] and entropy [itex] S [/itex], and let [itex] p_S [/itex] be [itex] p [/itex] truncated to its largest [itex] e^S [/itex] values, i.e. [itex] p_S(n) = p(n) [/itex] for [itex] n \leq e^S [/itex] and [itex] p_S(n) = 0 [/itex] for [itex] n > e^S [/itex]. I want to maximize [itex] \|p - p_S\|_1 [/itex] over all [itex] p [/itex] with a fixed entropy [itex] S [/itex].
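For concreteness, here is a minimal numerical sketch of the quantity I want to maximize (Python; the geometric-type example distribution is just an arbitrary choice to have something to compute with):

[code]
import numpy as np

def entropy_nats(p):
    """Shannon entropy in nats, ignoring zero entries."""
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)))

def truncation_error(p):
    """Mass lost by keeping only the e^S most probable states,
    where S is the entropy of p itself; equals ||p - p_S||_1."""
    p = np.sort(p)[::-1]              # enforce p(n) >= p(n+1)
    S = entropy_nats(p)
    k = int(np.floor(np.exp(S)))      # number of states kept
    return float(np.sum(p[k:]))       # tail mass beyond the kept states

# Arbitrary example: a geometric-type distribution over 10^5 states.
n = np.arange(1, 100001)
p = 0.999 ** n
p /= p.sum()

print("entropy S       =", entropy_nats(p))
print("states kept e^S =", int(np.exp(entropy_nats(p))))
print("||p - p_S||_1   =", truncation_error(p))
[/code]

The question is then how large this tail mass can be made by varying [itex] p [/itex] while holding the entropy [itex] S [/itex] fixed.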
 
  • #4
Physics Monkey said:
[...] I want to maximize [itex] \|p - p_S\|_1 [/itex] over all [itex] p [/itex] with a fixed entropy [itex] S [/itex].

You say [itex]p[/itex] is a probability distribution, and so is [itex]p_S[/itex] as far as I understand, so when you want to maximize [itex] \|p - p_S\|_1 [/itex] I guess you are talking about a probability metric like one of the Fortet-Mourier type. But if [itex]S[/itex] is fixed and you only have probability mass functions, then where is the degree of freedom you need to maximize your expression?
 
  • #5
Physics Monkey said:
[...] I want to maximize [itex] \|p - p_S\|_1 [/itex] over all [itex] p [/itex] with a fixed entropy [itex] S [/itex].

Given any N states, the maximum entropy is ln(N) (using natural logarithms, to match the e^S convention above). You are given an entropy S subject to the constraint 0 <= S <= ln(N): the upper bound corresponds to all probabilities being equal, and the lower bound corresponds to a single probability equal to 1. We also have the constraint that the probabilities sum to 1.

From this description, and given your goal of maximizing [itex] \|p - p_S\|_1 [/itex], you will in general have some degrees of freedom whenever the entropy is strictly between 0 and ln(N). Bringing in the constraints on the vector p shrinks the solution space considerably, but it won't necessarily make the solution unique in the general case.

Intuitively, this is because you could, for example, swap probability values between different indices without changing the entropy. You also have many degrees of freedom in lowering one probability and raising another, and this can be done in a myriad of ways without violating the constraints.

Given the above, you should introduce further constraints to narrow down the solution space, and the extra constraints should relate specifically to what you are trying to do: if, for example, you just wanted 'any' distribution that fits these criteria, you could impose an ordering constraint such as [itex] p_{i+1} \geq p_i [/itex], which would shrink the solution space dramatically.

It would also be useful for readers to know some context for your problem, in order to give further suggestions.
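To illustrate the non-uniqueness point numerically, here is a rough sketch (Python; the Dirichlet sampling, the random search, and the tolerance are arbitrary choices, not anything specific to the problem) that finds several distributions with nearly the same entropy but different truncation errors:

[code]
import numpy as np

rng = np.random.default_rng(0)

def entropy_nats(p):
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)))

def truncation_error(p):
    p = np.sort(p)[::-1]
    k = int(np.floor(np.exp(entropy_nats(p))))
    return float(np.sum(p[k:]))

# Draw a reference distribution over N states, then keep searching for
# other random distributions whose entropy is (almost) the same.  They
# generally give different values of ||p - p_S||_1, so fixing the entropy
# alone does not pin down the distribution.
N, tol = 50, 0.02
p_ref = rng.dirichlet(np.ones(N))
S_target = entropy_nats(p_ref)

matches = [p_ref]
while len(matches) < 3:
    p = rng.dirichlet(np.ones(N))
    if abs(entropy_nats(p) - S_target) < tol:
        matches.append(p)

for p in matches:
    print(f"S = {entropy_nats(p):.3f}, ||p - p_S||_1 = {truncation_error(p):.4f}")
[/code]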
 

1. What is the purpose of truncating probabilities based on entropy?

The purpose of truncating probabilities based on entropy is to simplify and reduce the complexity of a probability distribution by removing low-probability events. This can help in analyzing and understanding the distribution, as well as making it easier to compute or use in calculations.

2. How is entropy used in truncating probabilities?

Entropy is a measure of uncertainty or randomness in a system. In truncating probabilities, entropy is used to decide how much of the distribution can be discarded while still capturing most of it. In practice this is done by setting a probability threshold (or by keeping only the roughly e^S most probable states, where S is the entropy, as in the discussion above), and events falling below that cutoff are truncated or removed.
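As a rough illustration of one such scheme, here is a minimal sketch (Python; the example distribution and the threshold value are arbitrary choices made up for illustration):

[code]
import numpy as np

def entropy_nats(p):
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)))

# Hypothetical distribution and an arbitrary probability cutoff.
p = np.array([0.40, 0.25, 0.15, 0.10, 0.05, 0.03, 0.015, 0.005])
threshold = 0.02

p_trunc = np.where(p >= threshold, p, 0.0)   # drop low-probability events
p_renorm = p_trunc / p_trunc.sum()           # optionally renormalize

print("original entropy :", entropy_nats(p))
print("mass discarded   :", 1.0 - p_trunc.sum())
print("entropy after    :", entropy_nats(p_renorm))
[/code]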

3. What are the potential benefits of truncating probabilities based on entropy?

Truncating probabilities based on entropy can have several benefits, including simplifying the probability distribution, reducing computation time, and improving the interpretability of the distribution. It can also help in identifying and focusing on the most relevant or significant events in the distribution.

4. Are there any drawbacks to using entropy-based truncation?

One potential drawback of entropy-based truncation is that it may remove important events or outliers that could be valuable in certain analyses. It can also be subjective, as the choice of threshold for entropy may vary depending on the context or goals of the analysis. Additionally, truncation based on entropy may not always be appropriate for all types of probability distributions.

5. How can one determine the optimal threshold for entropy-based truncation?

The optimal threshold for entropy-based truncation may vary depending on the specific context and goals of the analysis. One approach is to experiment with different threshold values and observe the impact on the resulting distribution and any subsequent analyses. Another approach is to consult with experts or use data-driven methods to determine the most suitable threshold. Ultimately, the optimal threshold may also depend on the trade-off between simplicity and accuracy in the analysis.
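A rough sketch of the first approach (Python; the distribution and the grid of candidate thresholds are arbitrary choices) is to sweep thresholds and record the trade-off between how many events survive and how much probability mass is retained:

[code]
import numpy as np

# Hypothetical distribution (same arbitrary example as above).
p = np.array([0.40, 0.25, 0.15, 0.10, 0.05, 0.03, 0.015, 0.005])

# Sweep candidate thresholds and record the simplicity/accuracy trade-off.
for threshold in [0.001, 0.01, 0.02, 0.05, 0.1]:
    kept = p[p >= threshold]
    print(f"threshold={threshold:<6} events kept={kept.size}  mass kept={kept.sum():.3f}")
[/code]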
