Dismiss Notice
Join Physics Forums Today!
The friendliest, high quality science and math community on the planet! Everyone who loves science is here!

Efficient encoding

  1. Mar 18, 2008 #1
    Say you are given a five-sided biased die that has a probability of 1/8 of coming up A, 1/8 for B, and 1/4 for each of C, D, and E. Design an optimal code for transmitting throws of this die.

    log(5) = 2.32, so 2.32 bits are required to transfer one throw of the die. But if we encode as follows:

    A: 000
    B: 001
    C: 01
    D: 10
    E: 11

    Then the average number of bits for each throw is 2.25. My question is, can the 1's and 0's be placed anywhere so long as the number of numbers is the same? For example:

    A:111
    B:000
    C:11
    D:11
    E:11

    This representation still has the same number of bits for each letter as the previous representation. Am I wrong? Is this still a valid solution
     
  2. jcsd
  3. Mar 18, 2008 #2
    The latter cannot be a valid encoding: if you see "11", you won't be able to discern if the case is C, D or E.
     
  4. Mar 18, 2008 #3

    D H

    User Avatar
    Staff Emeritus
    Science Advisor

    That applies only for equiprobable events. With the first encoding, the expected number of bits to transmit a throw of the die is 2.4 if all events have probability 1/5.

    Of course not. Your own example disproves this. So does this one:

    A:000
    B:001
    C:00
    D:01
    E:11

    With this scheme one cannot unambiguously transmit a sequence of dice throws. Is 000000 two throws of A or three throws of C?
     
  5. Mar 18, 2008 #4
    You will typically want what's called a "Prefix Code" for this type of problem. See http://en.wikipedia.org/wiki/Prefix_code . The idea with a prefix code is that no codeword is the "prefix" of any other; i.e., there is no codeword that is formed by adding digits to another codeword. This allows you to send strings of codewords and be able to parse them correctly. Consider this example:

    A: 0
    B: 10
    C: 11

    Which is a prefix code. Then, the string {A,B,C} becomes 01011, which can be uniquely decoded. By comparison, consider a non-prefix code:

    A: 1
    B: 11
    C: 111

    Then, {A,B,C} would be encoded as 111111, but when decoding, we can't tell whether this should be {A,A,A,A,A,A}, {A,B,A,B}, {B,B,B}, {C,C}, {A,B,C}, {C,B,A}, etc. So it's not terribly useful as a code.
     
Know someone interested in this topic? Share this thread via Reddit, Google+, Twitter, or Facebook

Have something to add?