Limit of compression (source coding theorem)

Discussion Overview

The discussion revolves around the source coding theorem and its implications for data compression. Participants explore whether the theorem's requirement of N*H bits for encoding a message applies universally or if additional information about symbol relationships can lead to more efficient compression methods.

Discussion Character

  • Debate/contested

Main Points Raised

  • One participant asks whether the source coding theorem's limit of N*H bits applies only when just the frequencies of individual symbols are known, suggesting that knowledge of how symbols follow one another could allow better compression.
  • Another participant argues that if a symbol x is almost always followed by symbol y, this relationship is already accounted for in the entropy calculation, implying that the entropy is effectively lowered.
  • A further contribution presents a scenario involving joint probabilities and conditional probabilities, indicating that different conditional probabilities can yield the same overall entropy despite differing symbol relationships.
  • One participant acknowledges a misunderstanding regarding the application of the formula, noting that conditional probabilities require a different model for computing the entropy rate.

Areas of Agreement / Disagreement

Participants initially disagree on the applicability of the source coding theorem and on the role of conditional probabilities in determining entropy. The disagreement is resolved when the original poster accepts that the single-symbol formula assumes independently chosen symbols.

Contextual Notes

The N*H bound as discussed applies only when the symbols are chosen independently; modeling conditional probabilities between symbols requires a different source model and an entropy-rate calculation.

gop
Hi

The source coding theorem says that one needs, on average, at least N*H bits to encode a message of N symbols from a source with entropy H per symbol. This is supposedly the theoretical limit of data compression.

But is it? Or does it apply only to situations where nothing more than the frequency (probability) of each symbol is known?

For example, if I know that the symbol x is always followed by the symbol y (or that this happens with very high probability), couldn't I use this to construct a compression algorithm that needs fewer bits than N*H?
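
To pin down what that bound uses, here is a minimal sketch (the message and alphabet are made up for illustration) that computes N*H from the symbol frequencies alone:

Code:
# The N*H bound for a memoryless source, computed from symbol
# frequencies alone (illustrative message, not real data).
from collections import Counter
from math import log2

message = "xyzxyzxyzxyz" * 50           # hypothetical data, 600 symbols
counts = Counter(message)
N = len(message)
probs = [c / N for c in counts.values()]
H = -sum(p * log2(p) for p in probs)    # bits per symbol
print(f"H = {H:.3f} bits/symbol, N*H = {N * H:.1f} bits")  # 1.585, ~951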

thx
 
gop said:
For example, if I know that the symbol x is always followed by the symbol y (or that this happens with very high probability), couldn't I use this to construct a compression algorithm that needs fewer bits than N*H?

If x is (almost) always followed by y, then that lowers the entropy. It's already taken into account.
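
A standard way to make "already taken into account" precise (general identities, not tied to this example): conditioning never increases entropy, and for a source with memory the compression limit is set by the entropy rate rather than the single-symbol entropy:

$$H(X_{n+1} \mid X_n) \;\le\; H(X_{n+1}), \qquad \mathcal{H} \;=\; \lim_{N \to \infty} \frac{1}{N} H(X_1, \dots, X_N) \;\le\; H(X_1) \quad \text{(stationary source)}.$$

So if x is (almost) always followed by y, the conditional entropy of the next symbol is much smaller than the single-symbol entropy, and the limit is set by this lower entropy rate, not by the marginal frequencies.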
 
But if I have something like this

p(x,y,z)= (1/3,1/3,1/3)

then I have entropy 1.585 bits. But now I could have p(y|x) = 1/2 or p(y|x) = 1; as long as p(y|z) = 1 - p(y|x), the overall distribution p(x, y, z) stays the same, so I have the same entropy. But in the case p(y|x) = 1, I can effectively use only two symbols, say p(xy, z) = (1/2, 1/2), which has entropy 1.
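
A quick check of the two numbers above, together with the counting step the comparison relies on (it assumes, as stated, that every x is immediately followed by a y):

Code:
# Check the two entropies quoted above and the pair-merging count.
from math import log2

def entropy(probs):
    return -sum(p * log2(p) for p in probs if p > 0)

print(entropy([1/3, 1/3, 1/3]))   # 1.585 bits/symbol over {x, y, z}
print(entropy([1/2, 1/2]))        # 1.000 bit/token  over {xy, z}

# With p(y|x) = 1 and uniform marginals, every x pairs with the y that
# follows it, so N symbols collapse to 2N/3 tokens, costing about
# (2N/3) * 1 = 0.667*N bits -- well below 1.585*N.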

I guess I'm missing something obvious here but...

thx
 
gop said:
I guess I'm missing something obvious here but...

Yes, your formula doesn't apply unless the choices are made independently.
 
OK, I got it: if the symbols have conditional dependencies, I need to use another model and another way to compute the entropy rate.
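
A minimal sketch of that other model: for a first-order Markov source, the entropy rate is the stationary-weighted average of the per-state conditional entropies. The transition matrix below is one assumed choice with p(y|x) = 1 and uniform marginals, consistent with the earlier example but not spelled out in the thread.

Code:
# Entropy rate of a first-order Markov source:
#   H_rate = sum_i pi_i * H(next symbol | current symbol = i)
from math import log2

P = {                                   # assumed transition probabilities
    "x": {"y": 1.0},                    # x is always followed by y
    "y": {"x": 0.5, "z": 0.5},
    "z": {"x": 0.5, "z": 0.5},
}
pi = {"x": 1/3, "y": 1/3, "z": 1/3}     # stationary distribution of P

def cond_entropy(row):
    return -sum(p * log2(p) for p in row.values() if p > 0)

H_rate = sum(pi[s] * cond_entropy(P[s]) for s in P)
print(f"entropy rate: {H_rate:.3f} bits/symbol")   # 0.667
print(f"marginal H  : {log2(3):.3f} bits/symbol")  # 1.585

The 0.667 bits/symbol agrees with the pair-coding count above (2N/3 tokens at 1 bit each), and it is this rate, not the 1.585 from the marginal frequencies, that the compression limit refers to for a source with memory.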

thx
 
