Limit of compression (source coding theorem)

In summary, the source coding theorem states that at least N*H bits are required to encode a message of length N with per-symbol entropy H; this is the theoretical limit of lossless data compression. However, that bound as stated applies only when the symbols are chosen independently, so that the symbol probabilities capture everything there is to know. If, say, the symbol x is always (or almost always) followed by y, that dependence lowers the entropy rate, and a compression algorithm can exploit it to use fewer bits.
  • #1
gop
Hi

The source coding theorem says that one needs at least N*H bits to encode a message of length N with per-symbol entropy H. This is supposedly the theoretical limit of data compression.

But is it? Or does it only apply to situations where only the frequency (probability) of a given symbol is known?

For example, if I know that the symbol x is always followed by the symbol y (or that this happens with very high probability), couldn't I use this to construct a compression algorithm that needs fewer bits than N*H?

thx
 
  • #2
gop said:
For example, If I know that the symbol x is always followed by the symbol y (or that this happens with a very high probability) couldn't I use this to construct a compression algorithm that needs fewer bits than N*H?

If x is (almost) always followed by y, then that lowers the entropy. It's already taken into account.
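
To make this concrete, here is a minimal sketch (the transition matrix is a hypothetical choice, not given in the thread) comparing the iid per-symbol entropy with the entropy rate of a first-order Markov source in which x is always followed by y:

import math

# Hypothetical example: symbols x, y, z each occur with marginal
# probability 1/3, but x is *always* followed by y. This transition
# matrix is one assumed choice that keeps the marginals uniform.
P = {
    'x': {'y': 1.0},
    'y': {'x': 0.5, 'z': 0.5},
    'z': {'x': 0.5, 'z': 0.5},
}
pi = {'x': 1/3, 'y': 1/3, 'z': 1/3}  # stationary distribution of P

# iid entropy: treat each symbol as independent with probability 1/3.
H_iid = -sum(p * math.log2(p) for p in pi.values())  # log2(3) ≈ 1.585

# Entropy rate of the Markov source: sum_i pi_i * H(next | current = i).
H_rate = sum(
    pi[s] * -sum(p * math.log2(p) for p in P[s].values() if p > 0)
    for s in P
)

print(f"iid entropy:  {H_iid:.3f} bits/symbol")   # 1.585
print(f"entropy rate: {H_rate:.3f} bits/symbol")  # 0.667

The dependence x -> y makes the next symbol after x cost zero bits, which is exactly why the entropy rate drops below the naive per-symbol figure.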
 
  • #3
But if I have something like this

p(x, y, z) = (1/3, 1/3, 1/3)

then I have entropy log2(3) ≈ 1.585 bits per symbol. But now I could have p(y|x) = 1/2 or p(y|x) = 1; as long as p(y|z) = 1 - p(y|x), the marginal distribution p(x, y, z) stays the same, so I have the same entropy. But in the case p(y|x) = 1, every x is followed by y, so I can re-encode the stream using only two blocks, with p(xy, z) = (1/2, 1/2), which has entropy 1 bit per block.
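
A quick numeric check of this pairing argument (a sketch under the assumptions in this post: uniform marginals and every x immediately followed by y):

import math

# In a stream of N symbols with uniform marginals and p(y|x) = 1,
# there are N/3 "xy" blocks and N/3 "z" blocks, so the blocks are
# distributed (1/2, 1/2).
H_symbol = math.log2(3)                # naive rate: ≈ 1.585 bits/symbol
H_block = -2 * 0.5 * math.log2(0.5)    # 1 bit per block
blocks_per_symbol = 2 / 3              # 2N/3 blocks per N symbols

print(H_symbol)                     # 1.585
print(H_block * blocks_per_symbol)  # 0.667 bits/symbol -- beats 1.585

The 0.667 bits/symbol from the block code matches the Markov entropy rate in the sketch under post #2, which is the resolution of the apparent paradox.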

I guess I'm missing something obvious here but...

thx
 
  • #4
gop said:
I guess I'm missing something obvious here but...

Yes, your formula doesn't apply unless the choices are made independently.
 
  • #5
OK, I got it: if there are conditional dependencies between symbols, I need to use another source model and another way to compute the entropy rate.
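
For reference, a small sketch of how the entropy rate falls out of a first-order Markov model (the helper entropy_rate below is hypothetical, written for this thread, not a library function):

import numpy as np

def entropy_rate(P):
    """Entropy rate of a first-order Markov source with transition matrix P."""
    P = np.asarray(P, dtype=float)
    # Stationary distribution: left eigenvector of P for eigenvalue 1.
    vals, vecs = np.linalg.eig(P.T)
    pi = np.real(vecs[:, np.argmax(np.real(vals))])
    pi = pi / pi.sum()
    # Average the entropy of each row under the stationary distribution.
    row_H = [-sum(p * np.log2(p) for p in row if p > 0) for row in P]
    return float(pi @ row_H)

# gop's example: x always followed by y (rows/columns ordered x, y, z).
P = [[0.0, 1.0, 0.0],
     [0.5, 0.0, 0.5],
     [0.5, 0.0, 0.5]]
print(entropy_rate(P))  # ≈ 0.667, well below log2(3) ≈ 1.585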

thx
 

What is the limit of compression?

The limit of compression is the bound established by the source coding theorem: for a given source, there is a minimum average number of bits needed to represent its output losslessly. That minimum is determined by the entropy of the source, which represents the average amount of information per symbol.

How is the limit of compression calculated?

The limit of compression is calculated using Shannon's source coding theorem, which states that the entropy of the source is the best achievable compression rate. Mathematically, any reliable (lossless) compression rate R must satisfy R ≥ H(X), where H(X) is the entropy of the source, and the optimal rate approaches H(X).
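
A worked example (a hypothetical three-symbol source, not from the thread) showing a prefix code that meets the entropy bound exactly:

import math

p = {'a': 0.5, 'b': 0.25, 'c': 0.25}      # assumed source distribution
code = {'a': '0', 'b': '10', 'c': '11'}   # a prefix-free code

H = -sum(q * math.log2(q) for q in p.values())  # entropy: 1.5 bits/symbol
R = sum(p[s] * len(code[s]) for s in p)         # expected code length: 1.5

print(H, R)  # 1.5 1.5 -- R equals H(X); no lossless code can do better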

Why is the limit of compression important?

The limit of compression is important because it helps determine the best possible compression rate for a given source. It also serves as a benchmark for evaluating compression algorithms and techniques. Furthermore, understanding the limit of compression can lead to more efficient and effective compression methods.

Can the limit of compression be exceeded?

No, the limit of compression cannot be beaten for lossless compression. It is determined by the entropy of the source, which represents the minimum average amount of information needed to describe the source's output. Any code that used fewer than H(X) bits per symbol would necessarily lose information.

Does the limit of compression apply to all types of data?

Yes, the limit of compression applies to all types of data, including text, images, audio, and video. However, the limit varies with the data's characteristics, such as redundancy and predictability: the more redundant and predictable the data, the lower its entropy, and the further it can be compressed.
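
A rough empirical illustration of this point, using Python's standard-library zlib (the exact compressed sizes will vary from run to run):

import os
import zlib

redundant = b"abc" * 10_000         # 30,000 bytes, highly redundant
random_bytes = os.urandom(30_000)   # 30,000 bytes, ~8 bits/byte of entropy

print(len(zlib.compress(redundant)))     # a few hundred bytes
print(len(zlib.compress(random_bytes)))  # ≈ 30,000 bytes (essentially no gain)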
