Is a Prefix Code Necessary for Efficient Encoding?

  • Context: Undergrad 
  • Thread starter Thread starter lordy12
  • Start date Start date
Click For Summary

Discussion Overview

The discussion revolves around the necessity of a prefix code for efficient encoding of outcomes from a biased five-sided die. Participants explore the implications of different encoding schemes and their ability to uniquely identify outcomes based on the assigned binary representations.

Discussion Character

  • Technical explanation
  • Debate/contested
  • Mathematical reasoning

Main Points Raised

  • One participant proposes an encoding scheme for the die outcomes and questions whether the arrangement of bits matters if the total number of bits remains the same.
  • Another participant argues that the proposed encoding is invalid because it leads to ambiguity in decoding, as multiple outcomes could correspond to the same bit sequence.
  • A third participant clarifies that the calculation of required bits applies only to equiprobable events and provides an example of a non-prefix code that fails to allow for unique decoding.
  • Further examples are provided to illustrate the necessity of a prefix code, emphasizing that a valid encoding must ensure that no codeword is a prefix of another.

Areas of Agreement / Disagreement

Participants generally agree on the importance of prefix codes for unique decoding but disagree on the validity of certain encoding schemes proposed in the discussion. The necessity of prefix codes remains a contested topic, with different interpretations of encoding validity.

Contextual Notes

Some assumptions about the probabilities of outcomes and the implications for encoding efficiency are not fully explored. The discussion also highlights the limitations of non-prefix codes in terms of ambiguity during decoding.

lordy12
Messages
36
Reaction score
0
Say you are given a five-sided biased die that has a probability of 1/8 of coming up A, 1/8 for B, and 1/4 for each of C, D, and E. Design an optimal code for transmitting throws of this die.

log(5) = 2.32, so 2.32 bits are required to transfer one throw of the die. But if we encode as follows:

A: 000
B: 001
C: 01
D: 10
E: 11

Then the average number of bits for each throw is 2.25. My question is, can the 1's and 0's be placed anywhere so long as the number of numbers is the same? For example:

A:111
B:000
C:11
D:11
E:11

This representation still has the same number of bits for each letter as the previous representation. Am I wrong? Is this still a valid solution
 
Physics news on Phys.org
The latter cannot be a valid encoding: if you see "11", you won't be able to discern if the case is C, D or E.
 
lordy12 said:
log(5) = 2.32, so 2.32 bits are required to transfer one throw of the die.
That applies only for equiprobable events. With the first encoding, the expected number of bits to transmit a throw of the die is 2.4 if all events have probability 1/5.

My question is, can the 1's and 0's be placed anywhere so long as the number of numbers is the same?
Of course not. Your own example disproves this. So does this one:

A:000
B:001
C:00
D:01
E:11

With this scheme one cannot unambiguously transmit a sequence of dice throws. Is 000000 two throws of A or three throws of C?
 
lordy12 said:
Then the average number of bits for each throw is 2.25. My question is, can the 1's and 0's be placed anywhere so long as the number of numbers is the same? For example:

A:111
B:000
C:11
D:11
E:11

This representation still has the same number of bits for each letter as the previous representation. Am I wrong? Is this still a valid solution

You will typically want what's called a "Prefix Code" for this type of problem. See http://en.wikipedia.org/wiki/Prefix_code . The idea with a prefix code is that no codeword is the "prefix" of any other; i.e., there is no codeword that is formed by adding digits to another codeword. This allows you to send strings of codewords and be able to parse them correctly. Consider this example:

A: 0
B: 10
C: 11

Which is a prefix code. Then, the string {A,B,C} becomes 01011, which can be uniquely decoded. By comparison, consider a non-prefix code:

A: 1
B: 11
C: 111

Then, {A,B,C} would be encoded as 111111, but when decoding, we can't tell whether this should be {A,A,A,A,A,A}, {A,B,A,B}, {B,B,B}, {C,C}, {A,B,C}, {C,B,A}, etc. So it's not terribly useful as a code.
 

Similar threads

  • · Replies 1 ·
Replies
1
Views
1K
  • · Replies 6 ·
Replies
6
Views
3K
  • · Replies 1 ·
Replies
1
Views
3K
  • · Replies 8 ·
Replies
8
Views
2K
  • · Replies 7 ·
Replies
7
Views
3K
  • · Replies 66 ·
3
Replies
66
Views
8K
Replies
3
Views
3K
  • · Replies 7 ·
Replies
7
Views
3K
  • · Replies 1 ·
Replies
1
Views
3K
  • · Replies 2 ·
Replies
2
Views
2K