How Does Unequal Probability Affect Entropy Calculation in Information Theory?


Discussion Overview

The discussion revolves around the concept of entropy in information theory, particularly focusing on how unequal probabilities affect entropy calculations. Participants explore the implications of different probability distributions on the minimum number of bits required to represent messages, with references to examples such as a loaded die and the encoding of outcomes.

Discussion Character

  • Exploratory
  • Technical explanation
  • Conceptual clarification
  • Debate/contested

Main Points Raised

  • One participant describes the basic definition of entropy as log(M) for a message with M equally probable values, and notes that this definition changes when probabilities are unequal.
  • Another participant provides an example using an 8-sided loaded die with specific probabilities, calculating the entropy as 127/64 bits and discussing the implications for communication efficiency.
  • A later reply suggests that the formula H(0,1a,1b) = H(0,1) + P(1)H(1a,1b) reflects the relationship between average bits needed for typical sequences and the bits required for individual messages, questioning if it can be expanded to include H(0).
  • One participant expresses unfamiliarity with the mathematical notation and suggests looking at external resources for clarity on the additivity of entropy.

Areas of Agreement / Disagreement

Participants express varying levels of understanding and interpretation of entropy calculations, particularly regarding the implications of unequal probabilities. There is no clear consensus on the interpretation of the mathematical expressions or their implications.

Contextual Notes

Some participants express uncertainty about the mathematical notation and its implications, indicating potential limitations in understanding the additivity of entropy and how it applies to different probability groupings.

Who May Find This Useful

This discussion may be useful for individuals interested in information theory, particularly those exploring the concepts of entropy, probability distributions, and their applications in communication and data encoding.

Zak
I am reading a book called 'Quantum Processes, Systems, and Information', and near the beginning a basic idea of information is set out in the following way. If information is coded in 'bits', each of which has two possible states, then the number of different messages/'values' that can be encoded in an n-bit system is 2^n. If it is known that a message (to be received in the future) has M possible values, then the entropy is defined as H = log(M), where the logarithm is base 2 (because it's a binary system).

Effectively, the entropy of a message with M possible values tells you the minimum number of bits required to represent the message. However, this assumes that each possible 'value' of the message has an equal probability of being read. The book later discusses that if a message with 2 possible values is being read, and the probabilities of the two values are not equal, then the entropy is defined in a different way.
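As a minimal sketch of that first definition (my own Python illustration, not the book's; the function name is made up): for M equally likely values the entropy is log2(M), and a fixed-length binary code needs that many bits rounded up to a whole number.

```python
import math

def entropy_equiprobable(M):
    """Entropy in bits of a message with M equally likely values."""
    return math.log2(M)

for M in (2, 4, 8, 5):
    H = entropy_equiprobable(M)
    # A fixed-length binary code needs a whole number of bits, so round up.
    print(f"M = {M}: H = {H:.3f} bits, fixed-length code uses {math.ceil(H)} bits")
```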

If, say, the first value (0) has a probability of 1/3 and the second (1) a probability of 2/3, then the second value is 'split' into two values (say, 1a and 1b), each with a probability of 1/3. The entropy is then defined in the following way: if H(M) is the entropy of message M and P(x) is the probability of value x, then H(0,1a,1b) = H(0,1) + P(1)H(1a,1b).
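A quick numeric check of that grouping rule (my own sketch, not from the book; `H` below is the usual Shannon entropy, minus the sum of p log2 p):

```python
import math

def H(*probs):
    """Shannon entropy in bits of a probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Three equally likely values 0, 1a, 1b:
lhs = H(1/3, 1/3, 1/3)                    # log2(3) ≈ 1.585 bits

# Group 1a and 1b into a single value 1 with P(1) = 2/3; within the group
# each of 1a, 1b is equally likely (1/2 each):
rhs = H(1/3, 2/3) + (2/3) * H(1/2, 1/2)   # H(0,1) + P(1)·H(1a,1b)

print(lhs, rhs)   # both ≈ 1.585, so the two groupings agree
```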

Does anybody have an intuitive understanding of this definition of entropy for the second case (with unequal probabilities for each value of the message) that they could explain to me?

Danke.
 
Let's look at an 8-sided loaded die. The probabilities for each side are:
P(side 1) = 1/2
P(side 2) = 1/4
P(side 3) = 1/8
P(side 4) = 1/16
P(side 5) = 1/32
P(side 6) = 1/64
P(side 7) = 1/128
P(side 8) = 1/128

The entropy of this probability distribution will be 127/64, or about 1.98 bits.
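A short sketch that reproduces that number (my own code, not part of the original post; the entropy is minus the sum of p log2 p):

```python
import math
from fractions import Fraction

probs = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 8), Fraction(1, 16),
         Fraction(1, 32), Fraction(1, 64), Fraction(1, 128), Fraction(1, 128)]

# Each probability is a power of 2, so -log2(p) is an integer and the
# entropy comes out as an exact fraction.
entropy = sum(p * int(-math.log2(p)) for p in probs)
print(entropy, float(entropy))   # 127/64 ≈ 1.98 bits
```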

Now with 8 outcomes, one needs three bits to uniquely label each outcome.
(side 1) -> 000
(side 2) -> 001
...
(side 8) -> 111
In order to communicate the outcome with certainty, one can always do so by sending no more than three bits of information each time.

What the entropy tells us is the minimum number of bits we need to send the outcome on average.

Indeed, if we roll the die many times, we can with high probability communicate the sequence of outcomes using only 1.98 bits per roll instead of 3 bits per roll.
What makes this possible is the law of large numbers.
In particular, with a long sequence of die rolls, the sequence of outcomes is overwhelmingly likely to be one where the relative frequencies of each outcome are very close to the true probabilities (these sequences are "typical").

What gives us this saving is that the number of typical sequences is usually far smaller than the total number of possible sequences. In particular, the number of bits needed to uniquely label all typical sequences is smaller than the number of bits needed to uniquely label all possible sequences.

For a more concrete example (since this distribution is particularly easy to work with), consider this alternative encoding of each outcome:

(side 1) -> 1
(side 2) -> 01
(side 3) -> 001
(side 4) -> 0001
(side 5) -> 00001
(side 6) -> 000001
(side 7) -> 0000001
(side 8) -> 0000000

Even though sides 7 and 8 have 7-bit code words for their outcomes, these outcomes are far less probable than the most likely outcome, which needs only one bit. Indeed, if you use this scheme, you can uniquely communicate the outcome using, on average, only 1.98 bits per roll.
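A quick check of that average (my own sketch; the code-word lengths are taken from the list above):

```python
from fractions import Fraction

# probability and code-word length for each side, as listed above
code = {
    1: (Fraction(1, 2),   1),   # "1"
    2: (Fraction(1, 4),   2),   # "01"
    3: (Fraction(1, 8),   3),   # "001"
    4: (Fraction(1, 16),  4),   # "0001"
    5: (Fraction(1, 32),  5),   # "00001"
    6: (Fraction(1, 64),  6),   # "000001"
    7: (Fraction(1, 128), 7),   # "0000001"
    8: (Fraction(1, 128), 7),   # "0000000"
}

expected_length = sum(p * length for p, length in code.values())
print(expected_length, float(expected_length))   # 127/64 ≈ 1.98 bits per roll
```

The expected code length equals the entropy exactly here because every probability is a power of two; for a general distribution the entropy is a lower bound that can only be approached, for example by encoding long blocks of rolls at once.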
 
Ah yes, I think that makes a lot of sense! So does this mean that the statement H(0,1a,1b) = H(0,1) + P(1)H(1a,1b) is, in a sense, saying that over a very large number of repetitions the number of bits required to express all possible sequences equals, on average, the number of bits needed to express the typical sequences plus the number of bits required to express each individual message, weighted by its probability of occurring over many repetitions?

Does this mean, then, that the statement could actually be written as H(0,1a,1b) = H(0,1) + P(1)H(1a,1b) + P(0)H(0), where H(0) = log(1) = 0 (because '0' has only one possible value)?
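(A quick numeric check of this, my own sketch rather than anything from the book: since the entropy of a single certain outcome is log2(1) = 0, the extra P(0)H(0) term contributes nothing.)

```python
import math

def H(*probs):
    """Shannon entropy in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

original = H(1/3, 2/3) + (2/3) * H(1/2, 1/2)
expanded = H(1/3, 2/3) + (2/3) * H(1/2, 1/2) + (1/3) * H(1.0)   # H(1.0) = 0
print(original, expanded)   # identical: the P(0)H(0) term adds nothing
```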

Thanks a lot.
 
Unfortunately, I'm not familiar with the mathematical notation you're using. If I read it correctly, you're talking about the additivity of the entropy (that the total entropy is the same no matter how you group your probabilities). Wikipedia has a decent treatment of it, and maybe there you'll see what you're looking for?
 
