Shannon's measure - any profound explanation?

AI Thread Summary
Shannon's measure of information, H(S) = {-}\sum_{i=1}^N P(s_i) \log_{2}P(s_i), quantifies the uncertainty of a discrete source and underlies the optimization of inferences made by scientific models. The thread asks whether the recurring pattern of probabilities raised to their own power has a deeper meaning. Replies note that the logarithm converts multiplicative relationships into additive ones, so that independent probability distributions contribute additively, and that Shannon's measure is a broader concept whose manifestations include the conditional entropy and the mutual information.
saviourmachine
Shannon's measure of information is the well-known formula (in the discrete case):
H(S) = {-}\sum_{i=1}^N P(s_i) \log_{2}P(s_i)
Of course this can be written as:
H(S) = \sum_{i=1}^N \log_{2}\!\left(P(s_i)^{-P(s_i)}\right) = \log_{2}\prod_{i=1}^N P(s_i)^{-P(s_i)}
It appears to me that the multiple occurrence of the same quantity (the probability of a particular symbol) must have some profound meaning. Why is it a power that is used to formulate this 'entropy' measure? In the case of throwing a six with a die, the two factors are (1/6)^{-1/6} \approx 1.348 and (5/6)^{-5/6} \approx 1.164. Is there some intuitive alternative association with these numbers (without a direct connection to binary strings)?
Until now I have only found explanations that take the logarithm for granted. IMHO the logarithm only rescales the "chances to the power of chances" characteristic, so I would like an explanation that addresses that characteristic.

Or is it more or less arbitrary, like taking quadratic error measures instead of absolute errors (e.g. in the least-squares method)?
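As a numerical sanity check of the two forms above, here is a small Python sketch. It is purely illustrative; it treats the six/not-six split of a fair die as a two-symbol source, which is one reading of the numbers in the question rather than anything stated there.

Code:
import math

# Two-symbol source: "rolled a six" vs. "did not roll a six" with a fair die
p = [1/6, 5/6]

# Standard form: H(S) = -sum_i P(s_i) * log2 P(s_i)
H_standard = -sum(pi * math.log2(pi) for pi in p)

# "Chances to the power of chances" form: H(S) = log2( prod_i P(s_i)^(-P(s_i)) )
H_power = math.log2(math.prod(pi ** (-pi) for pi in p))

print(H_standard)       # ~0.650 bits
print(H_power)          # the same value
print((1/6) ** (-1/6))  # ~1.348
print((5/6) ** (-5/6))  # ~1.164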
 
A scientific model is an algorithm (mathematical procedure) for making inferences. An "inference" is an extrapolation from the state of a real object in an "observed state-space" to the state of the same object in an "unobserved state-space." Shannon's measure of information can be shown to be the unique measure of these inferences when the uncertainty of each state in the unobserved state-space is measured by the notion of probability.

Generally, Shannon's measure of the inferences is called the "conditional entropy." In the circumstance that the observed state-space contains but a single state, the conditional entropy reduces to the "entropy."

The existence and uniqueness of Shannon's measure makes it the only possible choice if one is to optimize the inferences made by a model. One optimizes these inferences by maximizing the entropy under zero or more constraints, or by minimizing the conditional entropy. Optimization works extremely well in deciding which inferences a model shall make; hence the role of Shannon's measure in science.

By the way, Shannon's measure of information is not identical to the entropy but rather is a broader concept, some of whose manifestations also include the conditional entropy and the mutual information.
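To make these quantities concrete, here is a short Python sketch computing the entropy, the conditional entropy, and the mutual information of a two-variable source. The joint distribution is an invented toy example, used only to illustrate the definitions.

Code:
import math

def entropy(dist):
    """Shannon entropy in bits of a probability distribution."""
    return -sum(p * math.log2(p) for p in dist if p > 0)

# Toy joint distribution P(x, y) over an observed X and an unobserved Y
# (illustrative numbers only).
joint = {
    ('x0', 'y0'): 0.4, ('x0', 'y1'): 0.1,
    ('x1', 'y0'): 0.1, ('x1', 'y1'): 0.4,
}

# Marginals P(x) and P(y)
px, py = {}, {}
for (x, y), p in joint.items():
    px[x] = px.get(x, 0.0) + p
    py[y] = py.get(y, 0.0) + p

# Conditional entropy H(Y|X) = sum_x P(x) * H(Y | X = x)
H_Y_given_X = sum(
    pxv * entropy([joint[(x, y)] / pxv for y in py]) for x, pxv in px.items()
)

H_Y = entropy(py.values())
mutual_information = H_Y - H_Y_given_X  # I(X;Y) = H(Y) - H(Y|X)

print(H_Y, H_Y_given_X, mutual_information)

If the observed state-space had only a single state (X constant), H(Y|X) would collapse to H(Y), matching the reduction described above.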
 
I think this:
H(S) = \sum_{i=1}^N \log_{2}\!\left(P(s_i)^{-P(s_i)}\right)

is just a consequence of this:

H(S) = {-}\sum_{i=1}^N P(s_i) \log_{2}P(s_i)

and has no meaning beyond being mathematically equivalent. The reason the second form works is that the information content of message s_i is -\log_{2} P(s_i); the mean quantity of information (per message) over all of the messages is then H(S).
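That "mean information per message" reading can be checked by sampling. A brief Python sketch (the three-symbol distribution is arbitrary, chosen only for illustration):

Code:
import math
import random

# Arbitrary three-symbol source, for illustration only
symbols = ['a', 'b', 'c']
probs = [0.5, 0.25, 0.25]

# Information content (surprisal) of each message: -log2 P(s_i)
surprisal = {s: -math.log2(p) for s, p in zip(symbols, probs)}

# H(S) as the probability-weighted mean surprisal
H = sum(p * surprisal[s] for s, p in zip(symbols, probs))

# The empirical average surprisal over many sampled messages approaches H(S)
sample = random.choices(symbols, weights=probs, k=100_000)
empirical = sum(surprisal[s] for s in sample) / len(sample)

print(H)          # 1.5 bits
print(empirical)  # close to 1.5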
 
The logarithm is there because we want the entropy for independent probability distributions [p(x,y)=p(x)p(y)] to add. A log is what converts multiplication into addition.

The entropy is roughly "the log of the number of different possibilities". The precise formulation of this statement is called the "asymptotic equipartition property".

The mutual information is roughly "reduction in the log of the number of different possibilities".
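Numerically, the additivity point looks like this. A minimal Python sketch; the marginals are arbitrary and the joint is built as p(x,y) = p(x)p(y):

Code:
import math
from itertools import product

def entropy(dist):
    """Shannon entropy in bits."""
    return -sum(p * math.log2(p) for p in dist if p > 0)

# Arbitrary independent marginals, for illustration only
px = [0.7, 0.3]
py = [0.2, 0.5, 0.3]

# Joint distribution of the independent pair: p(x, y) = p(x) * p(y)
pxy = [a * b for a, b in product(px, py)]

print(entropy(px) + entropy(py))  # sum of the individual entropies
print(entropy(pxy))               # entropy of the joint: the same number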
 