# Shannon entropy of QXZ

1. Apr 23, 2013

### Karl Coryat

Shannon entropy of "QXZ"

Hello everyone. I am trying to determine the Shannon entropy of the string of letters QXZ, taking into consideration those letters' frequency in English. I am using the formula:

$$H(P) = -\sum_i p_i \log_2 p_i$$

What's puzzling me is that I expect to calculate a high entropy, since QXZ is an unexpected string given English letter frequencies -- but the p_i factor in each term, which takes very small values (e.g., 0.0008606 for Q), greatly diminishes my result. I am obviously making a wrong assumption or applying something incorrectly, because as I understand it, letters with high surprisal should increase the entropy of the string, not reduce it.
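A small sketch of the calculation being described (the X and Z frequencies are illustrative assumptions; only the Q value comes from the post), showing why each term comes out tiny:

```python
import math

# Letter probabilities: Q from the post; X and Z are assumed illustrative values
p = {"Q": 0.0008606, "X": 0.0015, "Z": 0.0007}

# Applying the ensemble-entropy formula term by term:
# each -p_i * log2(p_i) is small because p_i itself is small.
terms = {c: -pc * math.log2(pc) for c, pc in p.items()}
total = sum(terms.values())
print(terms)
print(total)  # well under 0.05 bits in total
```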

2. Apr 23, 2013

### Stephen Tashi

Shannon entropy is defined for a probability distribution. You are apparently making some sort of assumption about the probability of a string of letters and trying to apply the Shannon entropy formula to the probability of that string occurring. Shannon entropy can be computed for the probability distribution over all 3-letter strings (i.e., it applies to a set of probabilities that sum to 1.0); it doesn't apply to one realization of a 3-letter string drawn from that distribution.

Perhaps you should try Kolmogorov complexity if you want to deal with definite strings of letters.
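To illustrate the distinction: entropy takes a whole distribution as input, not a single outcome. A minimal sketch, using a uniform distribution over 26 letters as the (assumed) example:

```python
import math

def entropy(dist):
    """Shannon entropy, in bits, of a probability distribution (values sum to 1)."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

# The formula applies to the full distribution -- e.g. uniform over 26 letters:
uniform = {chr(ord('A') + i): 1 / 26 for i in range(26)}
print(entropy(uniform))  # log2(26), about 4.70 bits
```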

3. Apr 25, 2013

### awkward

Karl,

I don't know much about information theory, but I think the Shannon information content of "Q" in English text is simply $$- \log_2(P(Q)).$$ The formula you quote for H(P) is the entropy of an "ensemble" (or distribution), e.g. the entropy of a randomly selected letter in English text.
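That quantity does match the intuition in the original question. A sketch summing the per-letter information content, treating the letters as independent (the X and Z frequencies are assumed illustrative values; only the Q value comes from the thread):

```python
import math

# Letter probabilities: Q from the thread; X and Z are assumed illustrative values
p = {"Q": 0.0008606, "X": 0.0015, "Z": 0.0007}

# Information content (surprisal) of each letter: -log2(P(letter))
surprisal = {c: -math.log2(pc) for c, pc in p.items()}

# Under independence, the string's information content is the sum of the
# surprisals -- large, matching the intuition that "QXZ" is surprising.
total_bits = sum(surprisal.values())
print(total_bits)  # roughly 30 bits
```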