Why do we use log2(1/p(xi)) in Shannon's entropy definition?

  • Context: Undergrad
  • Thread starter: dervast
  • Tags: Definition, Entropy
SUMMARY

The discussion centers on the use of log2(1/p(xi)) in Shannon's entropy definition, which quantifies the information content of an event based on its probability. The formula H(X) = -∑ p(X=s) · log2(p(X=s)) represents the expected value of that information, where log2(1/p(xi)) measures how much information an outcome of probability p(xi) carries. The logarithmic term reflects the idea that less probable events convey more information, as demonstrated by the example of an encoded three-letter word. This highlights the relationship between probability and information content in Shannon's framework.
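As a quick worked illustration of the formula (not from the thread itself): a fair coin with ##p(\text{heads})=p(\text{tails})=\tfrac12## gives ##H = -\left(\tfrac12\log_2\tfrac12 + \tfrac12\log_2\tfrac12\right) = 1## bit, while a coin that always lands heads gives ##H = 0## bits, matching the intuition that a certain outcome carries no information.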

PREREQUISITES
  • Understanding of probability distributions
  • Familiarity with logarithmic functions
  • Basic knowledge of information theory concepts
  • Experience with Shannon's entropy definition
NEXT STEPS
  • Study the derivation of Shannon's entropy formula
  • Explore the concept of information content in probability theory
  • Learn about alternative measures of information beyond Shannon's entropy
  • Investigate applications of entropy in data compression and cryptography
USEFUL FOR

Students of information theory, data scientists, and professionals in fields involving data analysis and communication theory will benefit from this discussion.

dervast
Hi everyone, I was reading Wikipedia's article on information entropy today.
I need some help understanding why, in the entropy formula
http://upload.wikimedia.org/math/6/a/3/6a33010c16b1d526bc5daee924e3d363.png
i.e. ##H(X)=\sum_i p(x_i)\log_2\left(\frac{1}{p(x_i)}\right)##, we use the term ##\log_2\left(\frac{1}{p(x_i)}\right)##.
I have read in the article that
"An intuitive understanding of information entropy relates to the amount of uncertainty about an event associated with a given probability distribution."
So why isn't the first part of the equation, ##\sum_i p(x_i)##, enough on its own? ##p(x_i)## already denotes the amount of uncertainty of an event.
 
The entropy is an expectation value, namely ##H(X)=E[I(X)]=-\sum_{s\in S} p(X=s) \cdot \log_2(p(X=s))##; the weighted sum of the information of all outcomes. The information of an event ##s## over a binary alphabet is defined as ##I(s)=\log_2 \left( \frac{1}{p(X=s)} \right)##. The rest follows from this.
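To make the "weighted sum of information" concrete, here is a minimal Python sketch (the four-symbol distribution is made up for illustration): it computes the self-information ##\log_2(1/p)## of each outcome and then the entropy as its expectation.

```python
from math import log2

def self_information(p):
    """Information (in bits) of an outcome with probability p: log2(1/p)."""
    return log2(1.0 / p)

def entropy(probs):
    """Shannon entropy: the expected self-information, sum over s of p(s) * log2(1/p(s))."""
    return sum(p * self_information(p) for p in probs if p > 0)

# A made-up four-symbol source, just to show the computation.
probs = [0.5, 0.25, 0.125, 0.125]
print(entropy(probs))       # 1.75 bits
print(entropy([0.25] * 4))  # 2.0 bits: the uniform distribution maximizes entropy
```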

There are other possible measures of information, but the one above is the one Shannon defined and considered.

The logarithm is basically a length, and ##\frac{1}{p(X=s)}## the statistical relevance: an event which occurs for sure doesn't carry any information, but the less probable an event is, the more information it carries. Imagine an encoded three-letter word. An 'x' in the middle carries more information about this word than an 'e' does: there are only a few words like 'axe', but many like 'bet', 'get', 'jet', 'let', 'set', 'men', etc.
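A small numerical illustration of that point, using rough English letter frequencies (the exact values here are assumptions; only their ordering matters): observing a rare letter such as 'x' resolves much more uncertainty than observing a common one such as 'e'.

```python
from math import log2

# Rough English letter frequencies, for illustration only.
p_e = 0.127   # 'e' is common
p_x = 0.0015  # 'x' is rare

# Self-information log2(1/p): the rarer the letter, the more bits it carries.
print(f"info('e') = {log2(1 / p_e):.2f} bits")  # about 3 bits
print(f"info('x') = {log2(1 / p_x):.2f} bits")  # about 9 bits
```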
 
