Efficient encoding

Main Question or Discussion Point

Say you are given a five-sided biased die that has a probability of 1/8 of coming up A, 1/8 for B, and 1/4 for each of C, D, and E. Design an optimal code for transmitting throws of this die.

log(5) = 2.32, so 2.32 bits are required to transfer one throw of the die. But if we encode as follows:

A: 000
B: 001
C: 01
D: 10
E: 11

Then the average number of bits for each throw is 2.25. My question is, can the 1's and 0's be placed anywhere so long as the number of numbers is the same? For example:

A:111
B:000
C:11
D:11
E:11

This representation still has the same number of bits for each letter as the previous representation. Am I wrong? Is this still a valid solution

Answers and Replies

The latter cannot be a valid encoding: if you see "11", you won't be able to discern if the case is C, D or E.

D H
Staff Emeritus
Science Advisor
log(5) = 2.32, so 2.32 bits are required to transfer one throw of the die.
That applies only for equiprobable events. With the first encoding, the expected number of bits to transmit a throw of the die is 2.4 if all events have probability 1/5.

My question is, can the 1's and 0's be placed anywhere so long as the number of numbers is the same?
Of course not. Your own example disproves this. So does this one:

A:000
B:001
C:00
D:01
E:11

With this scheme one cannot unambiguously transmit a sequence of dice throws. Is 000000 two throws of A or three throws of C?

quadraphonics
Then the average number of bits for each throw is 2.25. My question is, can the 1's and 0's be placed anywhere so long as the number of numbers is the same? For example:

A:111
B:000
C:11
D:11
E:11

This representation still has the same number of bits for each letter as the previous representation. Am I wrong? Is this still a valid solution
You will typically want what's called a "Prefix Code" for this type of problem. See http://en.wikipedia.org/wiki/Prefix_code . The idea with a prefix code is that no codeword is the "prefix" of any other; i.e., there is no codeword that is formed by adding digits to another codeword. This allows you to send strings of codewords and be able to parse them correctly. Consider this example:

A: 0
B: 10
C: 11

Which is a prefix code. Then, the string {A,B,C} becomes 01011, which can be uniquely decoded. By comparison, consider a non-prefix code:

A: 1
B: 11
C: 111

Then, {A,B,C} would be encoded as 111111, but when decoding, we can't tell whether this should be {A,A,A,A,A,A}, {A,B,A,B}, {B,B,B}, {C,C}, {A,B,C}, {C,B,A}, etc. So it's not terribly useful as a code.