How Does Unequal Probability Affect Entropy Calculation in Information Theory?

In summary, the conversation discussed the concept of entropy and its relationship to information coding. It was explained that if information is coded in 'bits', each with two possible states, then the number of possible messages that can be encoded in an 'n'-bit system is 2^n. The entropy of a message with M possible values was defined as the minimum number of bits required to represent the message, assuming equal probabilities for each value. If the probabilities are unequal, the entropy is defined differently. The conversation then worked through an intuitive example using a loaded die, showing how the average number of bits per outcome can be brought down to the entropy by assigning shorter code words to more probable outcomes. It concluded with a discussion of the additivity (grouping) property of entropy.
  • #1
Zak
I am reading a book called 'Quantum Processes, Systems and Information', and in the beginning a basic idea of information is set out in the following way. If information is coded in 'bits', each of which has two possible states, then the number of different messages/'values' that can be encoded in an n-bit system is 2^n. If it is known that a message (to be received in the future) has M possible values, then the entropy is defined as H = log(M), where the logarithm is base 2 (because it's a binary system).

Effectively, the entropy of a message with M possible values tells you the minimum number of bits required to represent the message. However, this assumes that each possible 'value' of the message has an equal probability of being read. It is later discussed that if a message with two possible values is being read and the two values are not equally probable, then the entropy is defined in a different way.

If, say, the first value (0) has a probability of 1/3 and the second (1) a probability of 2/3, then the second value is 'split' into two values (say, 1a and 1b), each with probability 1/3. The entropy is then defined as follows: if H(M) is the entropy of message M and P(x) is the probability of value x, then H(0, 1a, 1b) = H(0, 1) + P(1)·H(1a, 1b).
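For what it's worth, here is a quick numerical check of that grouping identity, using the standard Shannon formula H = -Σ p·log2(p) (which is where the book is heading) and reading H(1a, 1b) as the entropy of the split within group 1, i.e. conditional probabilities of 1/2 each. The little helper function is just for illustration:

```python
import math

def H(*probs):
    """Shannon entropy (in bits) of a probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Left-hand side: three equally likely values 0, 1a, 1b.
lhs = H(1/3, 1/3, 1/3)                    # = log2(3), about 1.585 bits

# Right-hand side: entropy of the coarse grouping {0, 1},
# plus P(1) times the entropy of the 50/50 split inside group 1.
rhs = H(1/3, 2/3) + (2/3) * H(1/2, 1/2)   # about 0.918 + 0.667

print(lhs, rhs)                           # both print ~1.585
```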

Does anybody have an intuitive understanding of this definition of entropy for the second case (with unequal probabilities for each value of the message) that they could explain to me?

Thanks.
 
  • #2
Let's look at an 8-sided loaded die with the following probabilities for each side:
P(side 1) = 1/2
P(side 2) = 1/4
P(side 3) = 1/8
P(side 4) = 1/16
P(side 5) = 1/32
P(side 6) = 1/64
P(side 7) = 1/128
P(side 8) = 1/128

The entropy of this probability distribution will be 127/64, or about 1.98 bits.
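Just to make that number concrete, here is a quick check using the standard formula H = -Σ p·log2(p), with nothing beyond the probabilities listed above:

```python
import math

# Loaded-die probabilities listed above.
probs = [1/2, 1/4, 1/8, 1/16, 1/32, 1/64, 1/128, 1/128]

entropy = -sum(p * math.log2(p) for p in probs)
print(entropy)  # 1.984375, i.e. 127/64 bits
```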

Now with 8 outcomes, one needs three bits to uniquely label each outcome.
(side 1) -> 000
(side 2) -> 001
...
(side 8) -> 111
So one can always communicate the outcome with certainty by sending no more than three bits of information every time.

What the entropy tells us is the minimum number of bits we need to send the outcome on average.

Indeed, if we roll the die many times, we can with high probability communicate the sequence of outcomes using only 1.98 bits per roll instead of 3 bits per roll.
What makes this possible is the law of large numbers.
In particular, with a long sequence of die rolls, the sequence of outcomes is overwhelmingly likely to be one where the relative frequencies of each outcome are very close to the true probabilities (these sequences are "typical").

What gives us this savings of data is that the number of typical sequences is usually a whole lot smaller than the total number of possible sequences. In particular, the number of bits needed to uniquely label all typical sequences is smaller than the number of bits needed to uniquely label all possible sequences.

For a more concrete example (since this distribution is particularly easy to work with), consider this alternative encoding of each outcome:

(side 1) -> 1
(side 2) -> 01
(side 3) -> 001
(side 4) -> 0001
(side 5) -> 00001
(side 6) -> 000001
(side 7) -> 0000001
(side 8) -> 0000000

Even though sides 7 and 8 have 7-bit code words, those outcomes are far less probable than the most likely outcome, which needs only one bit. Indeed, if you use this scheme, you can uniquely communicate the outcome using, on average, only about 1.98 bits per roll.
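If it helps, here is a small Python sketch of that claim. The code table is the one above; the simulation part just illustrates the 'on average' statement (random.choices and the 100,000-roll count are my own choices for illustration):

```python
import math
import random

# Variable-length code words and probabilities from the post above.
code = {
    1: ("1",       1/2),
    2: ("01",      1/4),
    3: ("001",     1/8),
    4: ("0001",    1/16),
    5: ("00001",   1/32),
    6: ("000001",  1/64),
    7: ("0000001", 1/128),
    8: ("0000000", 1/128),
}

sides = list(code)
weights = [p for _, p in code.values()]

# Expected code-word length vs. entropy of the distribution.
avg_length = sum(p * len(word) for word, p in code.values())
entropy = -sum(p * math.log2(p) for p in weights)
print(avg_length, entropy)          # 1.984375  1.984375

# Simulate many rolls: the empirical bits-per-roll settles near the entropy.
rolls = random.choices(sides, weights=weights, k=100_000)
total_bits = sum(len(code[s][0]) for s in rolls)
print(total_bits / len(rolls))      # ~1.98 bits per roll
```

The expected length matches the entropy exactly here only because every probability is a power of 1/2, so each code word can have length exactly -log2(p); for a general distribution, the best average length of a symbol-by-symbol prefix code lies between H and H + 1.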
 
  • #3
Ah yes, I think that makes a lot of sense! So does this mean the statement 'H(0,1a,1b) = H(0,1) + P(1)H(1a,1b)' is, in a sense, saying that after a very large number of repetitions, the number of bits required on average to express the typical sequences equals the number of bits for the grouped message plus the number of bits required to express each individual message, weighted by its probability of occurring over many repetitions?

Does this mean then that the statement could actually be written as: H(0,1a,1b) = H(0,1) + P(1)H(1a,1b) + P(0)H(0) where H(0) = log(1) = 0 (because sequence '0' has only one possible value)?

Thanks a lot.
 
  • #4
Unfortunately, I'm not familiar with the mathematical notation you're using. If I read it correctly, you're talking about the additivity of the entropy (that the total entropy is the same no matter how you group your probabilities). Wikipedia has a decent treatment of it, and maybe there you'll see what you're looking for?
 

What is information theory?

Information theory is a branch of mathematics and computer science that deals with the quantification, storage, and communication of information. It was developed by Claude Shannon in the 1940s as a way to measure the amount of information in a message and to understand how it can be efficiently transmitted and received.

What is the basic concept of information theory?

The basic concept of information theory is that information can be measured and quantified in terms of bits, the smallest units of data. This means that any message, whether it is a text, image, or sound, can be broken down into a series of bits and transmitted from one point to another.

What is the relationship between information and uncertainty?

Information theory states that the amount of information in a message is directly related to the level of uncertainty or randomness in that message. A message that is highly predictable and contains less uncertainty will have a lower amount of information, while a message that is more unpredictable and contains more uncertainty will have a higher amount of information.
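As a rough numerical illustration of this point (the two distributions below are made up for the example, and H is just the standard Shannon entropy formula):

```python
import math

def H(probs):
    """Shannon entropy (in bits) of a probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# A highly predictable two-value message vs. a maximally uncertain one.
print(H([0.99, 0.01]))   # ~0.08 bits per symbol: very little information
print(H([0.5, 0.5]))     # 1.0 bit per symbol: maximum uncertainty for two values
```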

What are the main applications of information theory?

Information theory has many practical applications, including data compression, error correction, cryptography, and communication systems. It is also used in fields such as statistics, biology, and neuroscience to understand and analyze complex systems.

How has information theory impacted the field of communication?

Information theory has revolutionized the way we communicate by providing a framework for understanding how information is transmitted and received. It has led to the development of more efficient communication systems, such as the internet and wireless networks, and has paved the way for advancements in fields such as telecommunications and data storage.
