Intuition for information theory


Discussion Overview

The discussion centers around the concepts of information theory, particularly the relationship between uncertainty, information content, and the encoding of symbols in messages. Participants explore the counter-intuitive aspects of how information is quantified and the implications for understanding messages in English.

Discussion Character

  • Exploratory
  • Technical explanation
  • Conceptual clarification
  • Debate/contested

Main Points Raised

  • One participant expresses confusion about why the information content of a two-symbol message cannot simply be summed to 12 bits, despite each symbol being associated with 6 bits of information.
  • Another participant explains that the combinatorial nature of the symbols allows for 64 possible values per character slot, leading to a maximum of 12 bits of information for two slots, according to Shannon Information Theory.
  • A different participant clarifies that the statement about encoding messages refers to 6 bits per symbol, not a total of 6 bits for the message.
  • One participant introduces the concept of entropy, noting that the average information carried by each symbol in an English message is less than 6 bits due to common patterns and predictability in language.
  • Another participant highlights that the understanding of information in this context is more about randomness than meaning, which can lead to confusion.

Areas of Agreement / Disagreement

Participants express varying levels of understanding and confusion regarding the principles of information theory, particularly around the concepts of information content and entropy. There is no clear consensus on the best way to conceptualize these ideas, as different interpretations and clarifications are presented.

Contextual Notes

Participants mention the importance of common patterns in language and how they affect the perceived information content, indicating that assumptions about uniformity in symbol information may not hold in practical applications.

Who May Find This Useful

This discussion may be of interest to individuals studying information theory, linguistics, data compression, or those seeking to understand the nuances of how information is quantified and interpreted in different contexts.

nigels
Hi, although I studied info theory briefly in the past, now that I'm revisiting it I find myself scratching my head trying to understand its counter-intuitive logic.

For instance, I understand that the amount of uncertainty associated with a symbol is correlated with the amount of information the symbol carries. Hence, each symbol in English (about 64 symbols in all: upper- and lowercase letters, a space, and punctuation marks) carries about 6 bits' worth of information (2^6 = 64). However, the textbook also says that "no more than this amount of info can be contained in an English message, because we can in fact encode any such message in binary form using 6 binary digits per symbol".
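A minimal Python sketch of the fixed-length encoding the textbook describes. The 64-symbol alphabet below is a hypothetical choice (52 letters, a space, and 11 punctuation marks picked for illustration), since the exact set isn't given. Each symbol maps to a 6-bit index, so a message of n symbols always takes 6n bits.

```python
import string

# Hypothetical 64-symbol alphabet: 52 letters, a space, and 11 punctuation
# marks chosen for illustration only (the textbook's exact set is not given).
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + " " + ".,;:!?'\"-()"
assert len(ALPHABET) == 64

INDEX = {ch: i for i, ch in enumerate(ALPHABET)}
BITS_PER_SYMBOL = 6  # 2**6 == 64, so 6 bits are enough to name any symbol


def encode(message: str) -> str:
    """Encode each symbol as its 6-bit index, concatenated into one bit string."""
    return "".join(format(INDEX[ch], "06b") for ch in message)


def decode(bits: str) -> str:
    """Split the bit string into 6-bit chunks and map each back to a symbol."""
    return "".join(
        ALPHABET[int(bits[i:i + BITS_PER_SYMBOL], 2)]
        for i in range(0, len(bits), BITS_PER_SYMBOL)
    )
```

For example, encode("Hi") gives "000111100010": 6 bits for "H" plus 6 bits for "i", 12 bits for the 2-symbol message.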

What this means is that, in a 2-symbol message, if each symbol contains 6 bits of info, I can't sum the two up and say the whole message contains 12 bits of info. Why is this? What part of my intuition needs to be tweaked? How should I think about this in general?

Thank you very much for your help.
 
nigels said:
What this means is that, in a 2-symbol message, if each symbol contains 6 bits of info, I can't sum the two up and say the whole message contains 12 bits of info. Why is this? What part of my intuition needs to be tweaked? How should I think about this in general?

Why can't you sum them up?

2^6 * 2^6 = 2^12

It's simple combinatorics. If you have six bits, you have 64 possible values for each character "slot". With two slots you have 64 possible values in the first and 64 in the second, so 64*64 = 4096 possible combinations. Shannon information theory then says that you have a maximum of log2(4096) = 12 bits of information.
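A short Python sketch of that counting argument, assuming every one of the 64 values is equally likely in each slot (the uniform case, in which the maximum is reached). The number of possible messages multiplies slot by slot, so its base-2 logarithm adds: 6 bits + 6 bits = 12 bits. The names here are illustrative only.

```python
import math
from itertools import product

ALPHABET_SIZE = 64  # 2**6 possible values per character "slot"


def max_bits(num_slots: int, alphabet_size: int = ALPHABET_SIZE) -> float:
    """Maximum information, in bits, of a message with num_slots symbols.

    There are alphabet_size ** num_slots possible messages, and the log2 of
    that count equals num_slots * log2(alphabet_size).
    """
    return math.log2(alphabet_size ** num_slots)


# Brute-force check for two slots: enumerate every (first, second) pair.
two_slot_messages = sum(1 for _ in product(range(ALPHABET_SIZE), repeat=2))
```

Here two_slot_messages comes out to 4096, and max_bits(2) is twice max_bits(1).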

The interesting part is then to compare that maximum with the probabilities of the stream of characters you are _actually_ getting, to see whether your system has any (non-random) order to it. The funny thing is that "Information" should have been called "Randomness" in this usage. This leads to the confusion of folks interpreting "Information" as "Meaning", but it's the opposite...
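One way to make that comparison concrete is a minimal Python sketch that computes the Shannon entropy H = -sum(p * log2 p) of the character frequencies in a sample string and sets it beside the 6-bit maximum. The sample text is an arbitrary illustration, not data from the thread, and this first-order estimate looks only at single-character frequencies. A uniformly random stream over all 64 symbols would approach 6 bits per symbol; anything with order in it comes in lower.

```python
import math
from collections import Counter


def entropy_bits_per_symbol(text: str) -> float:
    """Shannon entropy H = -sum(p * log2 p) of the observed character frequencies."""
    counts = Counter(text)
    total = len(text)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


MAX_BITS = math.log2(64)  # uniform case: every one of 64 symbols equally likely

sample = "the theory of information is the theory of surprise"
h = entropy_bits_per_symbol(sample)
print(f"observed: {h:.2f} bits/symbol, maximum: {MAX_BITS:.0f} bits/symbol")
```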
 
nigels said:
"no more than this amount of info can be contained in an English message, because we can in fact encode any such message in binary form using 6 binary digits per symbol".

What is meant here is that no more than 6 bits PER SYMBOL can be contained in an English message, not 6 bits total.

(What I find more interesting is the amount of entropy in an English message; that is, how each symbol in a message really carries a lot less than 6 bits of info ON AVERAGE. This is because there are common patterns that repeat with high probability. For example, say you have just read a "t"; you would not be terribly surprised to read an "h" next, since so many words contain the "th" combination. However, you would be surprised if an "m" followed a "t". I find that amusing, anyway.)
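A minimal Python sketch of the "th" observation, under the assumption that a plain sample string stands in for English (nothing below is a measured figure for real English text). It estimates the empirical distribution of the character that follows a given one, and the conditional entropy H(next | previous) = -sum P(prev, next) * log2 P(next | prev) from bigram counts. On real text this conditional figure typically comes out below the single-character entropy, which is the "less than 6 bits on average" effect.

```python
import math
from collections import Counter


def conditional_entropy(text: str) -> float:
    """Estimate H(next char | previous char) in bits from bigram counts in text."""
    pair_counts = Counter(zip(text, text[1:]))
    prev_counts = Counter(text[:-1])  # how often each char appears as "previous"
    total_pairs = len(text) - 1
    h = 0.0
    for (prev, nxt), n in pair_counts.items():
        p_pair = n / total_pairs                   # P(prev, next)
        p_next_given_prev = n / prev_counts[prev]  # P(next | prev)
        h -= p_pair * math.log2(p_next_given_prev)
    return h


def next_char_distribution(text: str, prev: str) -> dict:
    """Empirical P(next | prev), e.g. how often 'h' follows 't' in text."""
    following = Counter(b for a, b in zip(text, text[1:]) if a == prev)
    total = sum(following.values())
    return {ch: n / total for ch, n in following.items()}
```

For instance, next_char_distribution("the theory", "t") gives {"h": 1.0}: in that tiny sample, every "t" is followed by an "h".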
 
navaburo said:
(What I find more interesting is the amount of entropy in an English message; that is, how each symbol in a message really carries a lot less than 6 bits of info ON AVERAGE. This is because there are common patterns that repeat with high probability. For example, say you have just read a "t"; you would not be terribly surprised to read an "h" next, since so many words contain the "th" combination. However, you would be surprised if an "m" followed a "t". I find that amusing, anyway.)

That is, in fact, the key to data compression...
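As one illustration, Huffman coding (not named in the thread; swapped in here as a standard example) gives shorter codewords to more frequent symbols, so the average drops below the fixed 6 bits per symbol. This minimal Python sketch uses only single-character frequencies from a sample string; it ignores context such as "th", which more elaborate context-modeling compressors exploit.

```python
import heapq
from collections import Counter
from itertools import count


def huffman_code(text: str) -> dict:
    """Build a Huffman code (symbol -> bit string) from character frequencies."""
    freqs = Counter(text)
    if len(freqs) == 1:
        # Degenerate case: a single distinct symbol still needs one bit.
        return {next(iter(freqs)): "0"}
    tiebreak = count()  # unique counter so the heap never compares the dicts
    heap = [(n, next(tiebreak), {ch: ""}) for ch, n in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Merge the two least frequent subtrees, prefixing 0 and 1 respectively.
        n1, _, left = heapq.heappop(heap)
        n2, _, right = heapq.heappop(heap)
        merged = {ch: "0" + code for ch, code in left.items()}
        merged.update({ch: "1" + code for ch, code in right.items()})
        heapq.heappush(heap, (n1 + n2, next(tiebreak), merged))
    return heap[0][2]


def average_bits_per_symbol(text: str, code: dict) -> float:
    """Average codeword length, in bits, when text is encoded with code."""
    return sum(len(code[ch]) for ch in text) / len(text)
```

Running average_bits_per_symbol on a sample with its own huffman_code shows how many bits per symbol that sample needs, to compare against the fixed 6.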
 
Thank you all for the wonderfully helpful responses! It all makes sense now. :)
 
