Shannon's measure - any profound explanation?

AI Thread Summary
Shannon's measure of information, H(S) = {-}\sum_{i=1}^N P(s_i) \log_{2}P(s_i), quantifies the uncertainty of a discrete source and underlies the optimization of inferences made by scientific models. The thread asks whether the recurring pattern of probabilities raised to their own power has a deeper meaning. Replies note that the logarithm converts multiplicative relationships into additive ones, so that independent probability distributions contribute additively, and that Shannon's measure is a broader concept whose manifestations include the conditional entropy and the mutual information.
saviourmachine
Shannon's measure of information is the well-known formula (in the discrete case):
H(S) = {-}\sum_{i=1}^N P(s_i) \log_{2}P(s_i)
Of course this can be written as:
H(S) = \sum_{i=1}^N \log_{2}\!\left(P(s_i)^{-P(s_i)}\right) = \log_{2}\prod_{i=1}^N P(s_i)^{-P(s_i)}
It appears to me that the multiple occurrence of the same quantity (the probability of a particular symbol) must have some profound meaning. Why is it a power that is used to formulate this 'entropy' measure? In the case of throwing a six with a die, the two factors are (1/6)^{-1/6} \approx 1.348 and (5/6)^{-5/6} \approx 1.164. Is there some intuitive alternative association with these numbers (without a direct connection to binary strings)?
Until now I have only found explanations that take the logarithm for granted. IMHO the logarithm only rescales the "chances to the power of chances" characteristic, so I would like an explanation that addresses that characteristic.

Or is it more or less arbitrary, like taking quadratic error measures instead of absolute errors (e.g. in the least-squares method)?
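As a numerical sanity check of the two forms above, here is a small Python sketch. It is purely illustrative; it treats the six/not-six split of a fair die as a two-symbol source, which is one reading of the numbers in the question rather than anything stated there.

Code:
import math

# Two-symbol source: "rolled a six" vs. "did not roll a six" with a fair die
p = [1/6, 5/6]

# Standard form: H(S) = -sum_i P(s_i) * log2 P(s_i)
H_standard = -sum(pi * math.log2(pi) for pi in p)

# "Chances to the power of chances" form: H(S) = log2( prod_i P(s_i)^(-P(s_i)) )
H_power = math.log2(math.prod(pi ** (-pi) for pi in p))

print(H_standard)       # ~0.650 bits
print(H_power)          # the same value
print((1/6) ** (-1/6))  # ~1.348
print((5/6) ** (-5/6))  # ~1.164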
 
A scientific model is an algorithm (mathematical procedure) for making inferences. An "inference" is an extrapolation from the state of a real object in an "observed state-space" to the state of the same object in an "unobserved state-space." Shannon's measure of information can be shown to be the unique measure of these inferences when the uncertainty of each state in the unobserved state-space is measured by the notion of probability.

Generally, Shannon's measure of the inferences is called the "conditional entropy." In the circumstance that the observed state-space contains but a single state, the conditional entropy reduces to the "entropy."

The existence and uniqueness of Shannon's measure makes it the only possible choice if one is to optimize the inferences made by a model. One optimizes these inferences by maximizing the entropy under zero or more constraints, or by minimizing the conditional entropy. Optimization works extremely well in deciding which inferences a model shall make; hence the role of Shannon's measure in science.

By the way, Shannon's measure of information is not identical to the entropy but rather is a broader concept, some of whose manifestations also include the conditional entropy and the mutual information.
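To make these quantities concrete, here is a short Python sketch computing the entropy, the conditional entropy, and the mutual information of a two-variable source. The joint distribution is an invented toy example, used only to illustrate the definitions.

Code:
import math

def entropy(dist):
    """Shannon entropy in bits of a probability distribution."""
    return -sum(p * math.log2(p) for p in dist if p > 0)

# Toy joint distribution P(x, y) over an observed X and an unobserved Y
# (illustrative numbers only).
joint = {
    ('x0', 'y0'): 0.4, ('x0', 'y1'): 0.1,
    ('x1', 'y0'): 0.1, ('x1', 'y1'): 0.4,
}

# Marginals P(x) and P(y)
px, py = {}, {}
for (x, y), p in joint.items():
    px[x] = px.get(x, 0.0) + p
    py[y] = py.get(y, 0.0) + p

# Conditional entropy H(Y|X) = sum_x P(x) * H(Y | X = x)
H_Y_given_X = sum(
    pxv * entropy([joint[(x, y)] / pxv for y in py]) for x, pxv in px.items()
)

H_Y = entropy(py.values())
mutual_information = H_Y - H_Y_given_X  # I(X;Y) = H(Y) - H(Y|X)

print(H_Y, H_Y_given_X, mutual_information)

If the observed state-space had only a single state (X constant), H(Y|X) would collapse to H(Y), matching the reduction described above.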
 
I think this:
H(S) = \sum_{i=1}^N \log_{2}\!\left(P(s_i)^{-P(s_i)}\right)

is just a consequence of this:

H(S) = {-}\sum_{i=1}^N P(s_i) \log_{2}P(s_i)

and has no meaning beyond being mathematically equivalent. The reason the second form works is that the information content of message s_i is -\log_{2} P(s_i); the mean quantity of information (per message) over all of the messages is then H(S).
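That "mean information per message" reading can be checked by sampling. A brief Python sketch (the three-symbol distribution is arbitrary, chosen only for illustration):

Code:
import math
import random

# Arbitrary three-symbol source, for illustration only
symbols = ['a', 'b', 'c']
probs = [0.5, 0.25, 0.25]

# Information content (surprisal) of each message: -log2 P(s_i)
surprisal = {s: -math.log2(p) for s, p in zip(symbols, probs)}

# H(S) as the probability-weighted mean surprisal
H = sum(p * surprisal[s] for s, p in zip(symbols, probs))

# The empirical average surprisal over many sampled messages approaches H(S)
sample = random.choices(symbols, weights=probs, k=100_000)
empirical = sum(surprisal[s] for s in sample) / len(sample)

print(H)          # 1.5 bits
print(empirical)  # close to 1.5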
 
The logarithm is there because we want the entropy for independent probability distributions [p(x,y)=p(x)p(y)] to add. A log is what converts multiplication into addition.

The entropy is roughly "the log of the number of different possibilities". The precise formulation of this statement is called the "asymptotic equipartition property".

The mutual information is roughly "reduction in the log of the number of different possibilities".
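Numerically, the additivity point looks like this. A minimal Python sketch; the marginals are arbitrary and the joint is built as p(x,y) = p(x)p(y):

Code:
import math
from itertools import product

def entropy(dist):
    """Shannon entropy in bits."""
    return -sum(p * math.log2(p) for p in dist if p > 0)

# Arbitrary independent marginals, for illustration only
px = [0.7, 0.3]
py = [0.2, 0.5, 0.3]

# Joint distribution of the independent pair: p(x, y) = p(x) * p(y)
pxy = [a * b for a, b in product(px, py)]

print(entropy(px) + entropy(py))  # sum of the individual entropies
print(entropy(pxy))               # entropy of the joint: the same number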
 