Dismiss Notice
Join Physics Forums Today!
The friendliest, high quality science and math community on the planet! Everyone who loves science is here!

Shannon entropy of QXZ

  1. Apr 23, 2013 #1
    Shannon entropy of "QXZ"

    Hello everyone. I am trying to determine the Shannon entropy of the string of letters QXZ, taking into consideration those letters' frequency in English. I am using the formula:

    H(P) = –Ʃ pilog2pi

    What's puzzling me is that I am expecting to calculate a high entropy, since QXZ represents an unexpected string in the context of English letter frequencies -- but the first pi term in the formula, which takes very small values (e.g., .0008606 for Q), is greatly diminishing my calculations. I am obviously making a wrong assumption here or applying something incorrectly, because as I understand it, letters with high surprisals should increase the entropy of the string, not reduce it.

    Thank you in advance for your generous help.
     
  2. jcsd
  3. Apr 23, 2013 #2

    Stephen Tashi

    User Avatar
    Science Advisor

    Shannon entropy is defined for a probablity distribution. You are apparently making some sort of assumptions about the probability of a string of letters and trying to apply the formula for Shannon entropy to the probability of that string happening. Shannon entropy can be computed for the probability distribution for all 3 letter strings. (i.e. it applies to a set of probabilities that sum to 1.0) It doesn't apply to one realization of a 3 letter strings taken from that distribution.

    Perhaps you should try Kolmogorov complexity if you want to deal with definite strings of letters.
     
  4. Apr 25, 2013 #3
    Karl,

    I don't know much about information theory, but I think the Shannon information content of "Q" in English text is simply [tex]- \log_2(P(Q))[/tex] The formula you quote for H(P) is for the entropy of an "ensemble" (or distribution), e.g. the entropy of a randomly selected letter in English text.

    Reference: "Information Theory, Inference and Learning Algorithms" by MacKay (which is available for free download) http://www.inference.phy.cam.ac.uk/itila/
     
Know someone interested in this topic? Share this thread via Reddit, Google+, Twitter, or Facebook