Is a Prefix Code Necessary for Efficient Encoding?

lordy12 · Mar 18, 2008

Say you are given a five-sided biased die that has a probability of 1/8 of coming up A, 1/8 for B, and 1/4 for each of C, D, and E. Design an optimal code for transmitting throws of this die.

log(5) = 2.32, so 2.32 bits are required to transfer one throw of the die. But if we encode as follows:

A: 000
B: 001
C: 01
D: 10
E: 11

Then the average number of bits for each throw is 2.25. My question is, can the 1's and 0's be placed anywhere so long as the number of numbers is the same? For example:

A:111
B:000
C:11
D:11
E:11

This representation still has the same number of bits for each letter as the previous representation. Am I wrong? Is this still a valid solution

dodo · Mar 18, 2008

The latter cannot be a valid encoding: if you see "11", you won't be able to discern if the case is C, D or E.

D H · Mar 18, 2008

lordy12 said:

log(5) = 2.32, so 2.32 bits are required to transfer one throw of the die.

That applies only for equiprobable events. With the first encoding, the expected number of bits to transmit a throw of the die is 2.4 if all events have probability 1/5.

My question is, can the 1's and 0's be placed anywhere so long as the number of numbers is the same?

Of course not. Your own example disproves this. So does this one:

A:000
B:001
C:00
D:01
E:11

With this scheme one cannot unambiguously transmit a sequence of dice throws. Is 000000 two throws of A or three throws of C?

quadraphonics · Mar 18, 2008

lordy12 said:

Then the average number of bits for each throw is 2.25. My question is, can the 1's and 0's be placed anywhere so long as the number of numbers is the same? For example:

A:111
B:000
C:11
D:11
E:11

This representation still has the same number of bits for each letter as the previous representation. Am I wrong? Is this still a valid solution

You will typically want what's called a "Prefix Code" for this type of problem. See http://en.wikipedia.org/wiki/Prefix_code . The idea with a prefix code is that no codeword is the "prefix" of any other; i.e., there is no codeword that is formed by adding digits to another codeword. This allows you to send strings of codewords and be able to parse them correctly. Consider this example:

A: 0
B: 10
C: 11

Which is a prefix code. Then, the string {A,B,C} becomes 01011, which can be uniquely decoded. By comparison, consider a non-prefix code:

A: 1
B: 11
C: 111

Then, {A,B,C} would be encoded as 111111, but when decoding, we can't tell whether this should be {A,A,A,A,A,A}, {A,B,A,B}, {B,B,B}, {C,C}, {A,B,C}, {C,B,A}, etc. So it's not terribly useful as a code.

Is a Prefix Code Necessary for Efficient Encoding?

Discussion Overview

Discussion Character

Main Points Raised

Areas of Agreement / Disagreement

Contextual Notes

Similar threads

Graduate Hypothesis testing: Defining H0, HA hypotheses so that ( H_A)_A' makes sense

Undergrad My basic understanding of set theory

Undergrad How do E[X] and E[|X|] relate?

Graduate Expected numbers of cards of a last color remaining

Undergrad The problem of points

Insights Remote Operated Gate Control System

Insights AI Enriched Problem Solving

Insights Thinking Outside The Box Versus Knowing What’s In The Box

Insights Why Entangled Photon-Polarization Qubits Violate Bell’s Inequality

Insights Quantum Entanglement is a Kinematic Fact, not a Dynamical Effect

Insights What Exactly is Dirac’s Delta Function? - Insight