Shannon's measure - any profound explanation?

  • Context: Graduate 
  • Thread starter: saviourmachine
  • Tags: Explanation, Measure
Discussion Overview

The discussion revolves around Shannon's measure of information, specifically its formulation and implications in the context of entropy and probability. Participants explore the mathematical structure of Shannon's entropy, its significance, and its applications in scientific modeling and inference.

Discussion Character

  • Exploratory
  • Technical explanation
  • Conceptual clarification
  • Debate/contested

Main Points Raised

  • One participant presents Shannon's entropy formula and questions the meaning behind the use of probabilities raised to their own powers, seeking an intuitive understanding beyond standard explanations.
  • Another participant describes Shannon's measure as a unique method for making inferences based on probability, introducing concepts like conditional entropy and the optimization of inferences in scientific models.
  • A different participant argues that the alternative formulation of Shannon's entropy is merely mathematically equivalent to the original and does not imply deeper significance, emphasizing the role of the logarithm in measuring information.
  • Another contribution highlights the logarithm's function in ensuring that entropy for independent distributions can be additive, linking it to the asymptotic equipartition theorem and discussing mutual information in relation to possibilities.

Areas of Agreement / Disagreement

Participants express differing views on the significance of the logarithmic formulation of Shannon's measure, with some seeing deeper implications while others view it as a straightforward mathematical equivalence. The discussion remains unresolved regarding the intuitive understanding of the measure.

Contextual Notes

Some assumptions about the nature of probability distributions and the context of entropy in scientific modeling are not fully explored, leaving room for further clarification on these concepts.

saviourmachine
Shannon's measure of information is the well known formula (in the discrete case):
[tex]H(S) = {-}\sum_{i=1}^N P(s_i) \log_{2}P(s_i)[/tex]
Of course this can be written as:
[tex]H(S) = \sum_{i=1}^N \log_{2}P(s_i)^{-P(s_i)}[/tex]
It appears to me that the multiple occurrence of the same quantity (the probability of a particular symbol) must have some profound meaning. Why is it a power that is used to formulate this 'entropy' measure? In the case of throwing a six with a die, the factors are [tex](1/6)^{-1/6} \approx 1.348[/tex] and [tex](5/6)^{-5/6} \approx 1.164[/tex]. Is there some intuitive alternative association with these numbers (without a direct connection to binary strings)?
Until now I have found only explanations that take the logarithm for granted. IMHO the logarithm only rescales the "chances to the power of chances" characteristic, so I would like an explanation that addresses that characteristic itself.

Or is it more or less arbitrary, like taking quadratic error measures instead of absolute errors (e.g. in the least mean square method)?
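
For a quick numerical check of the equivalence (a minimal Python sketch added for illustration, not part of the original post), the two formulations give the same value for the "six or not six" example:

[code]
# Minimal sketch: the sum form and the "p to the power -p" product form of
# Shannon's entropy agree, here for the two-outcome "six or not six" example.
import math

p = [1/6, 5/6]  # P(six), P(not six)

# Standard form: H = -sum_i p_i * log2(p_i)
h_sum = -sum(pi * math.log2(pi) for pi in p)

# Product form: H = log2( prod_i p_i ** (-p_i) )
h_prod = math.log2(math.prod(pi ** -pi for pi in p))

print([round(pi ** -pi, 3) for pi in p])   # factors: [1.348, 1.164]
print(round(h_sum, 4), round(h_prod, 4))   # both ~0.65 bits
[/code]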
 
A scientific model is an algorithm (mathematical procedure) for making inferences. An "inference" is an extrapolation from the state of a real object in an "observed state-space" to the state of the same object in an "unobserved state-space." Shannon's measure of information can be shown to be the unique measure of these inferences when the uncertainty of each state in the unobserved state-space is measured by the notion of probability.

Generally, Shannon's measure of the inferences is called the "conditional entropy." In the circumstance that the observed state-space contains but a single state, the conditional entropy reduces to the "entropy."

The existence and uniqueness of Shannon's measure makes it the only possible choice if one is to optimize the inferences made by a model. One optimizes these inferences by maximizing the entropy under zero or more constraints or by minimizing the conditional entropy. Optimization works extremely well in deciding which inferences a model shall make; hence the role of Shannon's measure in science.

By the way, Shannon's measure of information is not identical to the entropy but rather is a broader concept, some of whose manifestations also include the conditional entropy and the mutual information.
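
To illustrate the reduction mentioned above (a minimal Python sketch using the standard definitions, not code from the thread), the conditional entropy H(Y|X) computed from a joint distribution collapses to the plain entropy of Y when the observed state-space contains a single state:

[code]
# Minimal sketch: conditional entropy H(Y|X) = -sum_{x,y} p(x,y) log2 p(y|x),
# and its reduction to the ordinary entropy when X has only one state.
import math

def conditional_entropy(joint):
    """joint[x][y] = p(x, y); returns H(Y|X) in bits."""
    h = 0.0
    for row in joint:
        px = sum(row)                           # marginal p(x)
        for pxy in row:
            if pxy > 0:
                h -= pxy * math.log2(pxy / px)  # p(x,y) * log2 p(y|x)
    return h

# Two observed states, correlated with Y: H(Y|X) is less than H(Y)
joint = [[0.4, 0.1],
         [0.1, 0.4]]
print(round(conditional_entropy(joint), 3))   # ~0.722 bits

# A single observed state: H(Y|X) equals the plain entropy of Y
single = [[0.5, 0.5]]
print(round(conditional_entropy(single), 3))  # 1.0 bit
[/code]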
 
I think this:
[tex]H(S) = \sum_{i=1}^N \log_{2}P(s_i)^{-P(s_i)}[/tex]

is just a consequence of this:

[tex]H(S) = {-}\sum_{i=1}^N P(s_i) \log_{2}P(s_i)[/tex]

and has no other meaning than that it is mathematically equivalent. The reason the second formula works is that the measure of information of a message [tex]s_i[/tex] is [tex]-\log_2 P(s_i)[/tex]; the mean quantity of information per message, averaged over all messages, is then H(S).
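
A concrete instance of this reading (added for illustration, not from the original reply): for a source with probabilities [tex]P = (\tfrac{1}{2}, \tfrac{1}{4}, \tfrac{1}{4})[/tex], the per-message information contents are [tex]-\log_2 \tfrac{1}{2} = 1[/tex] bit and [tex]-\log_2 \tfrac{1}{4} = 2[/tex] bits, so the mean is [tex]H(S) = \tfrac{1}{2}\cdot 1 + \tfrac{1}{4}\cdot 2 + \tfrac{1}{4}\cdot 2 = 1.5[/tex] bits per message.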
 
The logarithm is there because we want the entropy for independent probability distributions [p(x,y)=p(x)p(y)] to add. A log is what converts multiplication into addition.
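
A quick check of this additivity (a minimal Python sketch, not from the thread), for an arbitrary independent pair of distributions:

[code]
# Minimal sketch: for an independent joint distribution p(x, y) = p(x) p(y),
# the entropies add: H(X, Y) = H(X) + H(Y).
import math

def entropy(p):
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

px = [0.7, 0.3]
py = [0.5, 0.25, 0.25]

# Joint distribution of the independent pair (X, Y)
pxy = [a * b for a in px for b in py]

print(round(entropy(pxy), 4))               # H(X, Y)
print(round(entropy(px) + entropy(py), 4))  # H(X) + H(Y), same value
[/code]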

The entropy is roughly "the log of the number of different possibilities". The precise formulation of this statement is called the "asymptotic equipartition theorem".

The mutual information is roughly "reduction in the log of the number of different possibilities".
 
