saviourmachine
Shannon's measure of information is the well-known formula (in the discrete case):
H(S) = {-}\sum_{i=1}^N P(s_i) \log_{2}P(s_i)
Of course this can be written as:
H(S) = \sum_{i=1}^N \log_{2}\left(P(s_i)^{-P(s_i)}\right)
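(This follows term by term from the logarithm identity a \log_{2} b = \log_{2} b^{a}:
{-}P(s_i) \log_{2}P(s_i) = \log_{2}\left(P(s_i)^{-P(s_i)}\right)
so nothing beyond elementary algebra is involved in the rewriting.)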
It appears to me that the repeated occurrence of the same quantity (the probability of a particular symbol) must have some profound meaning. Why is a power used to formulate this 'entropy' measure? In the case of throwing a six with a die, the factors are (1/6)^{-1/6} ≈ 1.348 and (5/6)^{-5/6} ≈ 1.164. Is there some intuitive alternative association with these numbers (without a direct connection to binary strings)?
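A quick numerical sketch (plain Python, using nothing beyond the formulas above) for the six/no-six case:

```python
import math

# Outcome probabilities for "throwing a six with a die": six vs. not-six.
p, q = 1/6, 5/6

# Per-symbol "chance to the power of chance" factors from the rewritten formula.
f_p = p ** (-p)   # (1/6)^(-1/6) ≈ 1.348
f_q = q ** (-q)   # (5/6)^(-5/6) ≈ 1.164

# Entropy in bits from the usual formula.
H = -(p * math.log2(p) + q * math.log2(q))   # ≈ 0.650 bits

# The product of the factors equals 2^H, so H is just the log2 of that product.
print(f_p * f_q)   # ≈ 1.569
print(2 ** H)      # ≈ 1.569
```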
Until now I have found only explanations that take the logarithm for granted. IMHO the logarithm only rescales the "chances to the power of chances" characteristic. So I would like an explanation that addresses that characteristic.
Or is it more or less arbitrary, like taking quadratic error measures instead of absolute errors (e.g. in the least mean squares method)?