How Does Expanding a Lossy Compressed File Increase Its Data Size?

peter.ell · May 19, 2011

I was wondering what actually happens when a lossy compressed file is expanded to a higher quality version with more data... where does that extra data come from?

Consider an mp3 with a file size of 3mb and a data rate of 256kbps which is converted to a wav file with a size of 50mb and a data rate of 1411kbps. Obviously there isn't any more audio information in the wav file than the mp3 because the wav came from the mp3, so what is that extra data, where does it come from, and what does it do if it can't code for any higher quality audio?

Thank you so much!

Jiggy-Ninja · May 19, 2011

It'd just be repeats of information already there, or there may be some algorithms that can smooth the transition between two values.

davenn · May 19, 2011

Jiggy-Ninja said:

It'd just be repeats of information already there, or there may be some algorithms that can smooth the transition between two values.

yes indeed and hence why WAV files from MP3's sound crap, 1/2 the data is missing

Dave

uart · May 20, 2011

davenn said:

yes indeed and hence why WAV files from MP3's sound crap, 1/2 the data is missing

Not strictly true and also very misleading.

Firstly the wav files made from mp3's sound precisely the same as the mp3's from which they were made. At playback time they are in fact bit-wise identical. It's the mp3's which may or may not sound inferior to the original wav files (or CD Audio) from which they were created. High quality mp3's (and other lossy formats) however, may be perceptually indistinguishable from the original material for most people most of the time.

Secondly "half the data" is not missing. All of the sample points are still present but many of the sample values will have been subtly altered so as to make the data more easily compressed. The algorithms use what's called "psycho-acoustics" to determine in what ways the data can be altered without making too much perceptual difference to how it sounds. Obviously at very low bit rates (eg 64 kbps) significant audio quality may have to be sacrificed, whereas at higher bit rates (eg 192+ kbps) the mp3's will be much truer to the originals.

Personally I listen to mp3's encoded at about 200kbps VBR and I know that in "blind" AB listening tests I absolutely cannot tell the difference between the mp3 and the original.

uart · May 20, 2011

peter.ell said:

I was wondering what actually happens when a lossy compressed file is expanded to a higher quality version with more data... where does that extra data come from?

Consider an mp3 with a file size of 3mb and a data rate of 256kbps which is converted to a wav file with a size of 50mb and a data rate of 1411kbps. Obviously there isn't any more audio information in the wav file than the mp3 because the wav came from the mp3, so what is that extra data, where does it come from, and what does it do if it can't code for any higher quality audio?

Thank you so much!

Hi Peter, there are two main categories of compression.

The first category is called lossless compression, a type in which the data when expanded is completely identical to the original (pre compression) data.

This is the type of compression used for computer data and program files, for example with common formats like zip and rar etc. Here it is critical that no data is altered in any way. For example if just one single bit (out of millions or billions) was incorrect in an executable program file it might cause the program to completely crash or produce a serious error.

The key idea with lossless compression is to make use of a certain amount of non-randomness inherent within many types of data. For example in encoding "text" type data some of the non-randomness is due to certain symbols (eg letters like e t s space etc) being much more common than others (like z q $ # % etc). So by simply re-assigning codes in such a way that, instead of all symbols using a 7 bit code, the most commonly used symbols get a very short code while more infrequently used symbols get a much longer code, we can reduce the average data size per symbol significantly. That's a bit of an oversimplification but it illustrates the general idea of compression by making use of the lack of randomness within the original data. For this reason this type of encoding (lossless) is often also referred to as entropy encoding.

The second category of compression is called lossy compression (often called perceptual based encoding) where we don't need to have the reconstructed data 100% identical to the original, where we can get away with small differences as long as they are (close to) imperceptible. Obviously this opens up many more options as to how the data is encoded and generally allows for much greater levels of compression compared to lossless encoding. This type of compression is most commonly used for things like music, pictures and movies (mp3, jpeg and mpeg for example). Note that most lossy encoders actually use a combination of both entropy (lossless) and perceptual encoding to achieve the best results.
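To make the "short codes for common symbols" idea from the lossless description concrete, here is a minimal sketch using a Huffman code, which is one well-known way of assigning variable-length codes. The sample text and the function name `huffman_code_lengths` are made up for illustration, and real formats like zip combine this kind of coding with other techniques.

```python
import heapq
from collections import Counter

def huffman_code_lengths(text):
    """Return {symbol: code length in bits} for a Huffman code built from text."""
    counts = Counter(text)
    if len(counts) == 1:
        return {sym: 1 for sym in counts}
    # Heap entries: (total count, unique tie-breaker, {symbol: depth so far}).
    heap = [(n, i, {sym: 0}) for i, (sym, n) in enumerate(counts.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        # Merge the two least frequent groups; every symbol in them gets one bit deeper.
        n1, _, d1 = heapq.heappop(heap)
        n2, _, d2 = heapq.heappop(heap)
        merged = {sym: depth + 1 for sym, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (n1 + n2, next_id, merged))
        next_id += 1
    return heap[0][2]

# Hypothetical sample text: 'e', 't', 's' and space are common, 'q' and 'z' are rare.
sample_text = "the rest of the street sees the tense jester quiz"
counts = Counter(sample_text)
lengths = huffman_code_lengths(sample_text)

fixed_bits = 7 * len(sample_text)                           # every symbol uses 7 bits
huffman_bits = sum(counts[s] * lengths[s] for s in counts)  # variable-length codes

print("bits for 'e':", lengths["e"], " bits for 'q':", lengths["q"])
print("fixed 7-bit total:", fixed_bits, " Huffman total:", huffman_bits)
```

Nothing is lost here: the decoder reverses the code table and gets the exact original text back, which is why this counts as lossless.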

Filip Larsen · May 20, 2011

On a very general and abstract level, compression works by replacing the audio samples with a much smaller data set that is enough to make a fair rendering of those samples, using some kind of fitting with a given set of quality parameters. When you uncompress, you recreate samples that are generated from that fitting (instead of the original samples).

As an example of the principle behind this, assume you have a plot of (x_i,y_i) point values (your samples) where each x increases by a fixed interval, and you notice that to a good degree of accuracy you can fit a line to those values. That is, using some fitting method you determine a line y = ax+b such that when you calculate y by inserting each x_i into that equation, the difference between that y and y_i is very small. If you want to communicate your samples to your friend, you can now, instead of sending him all the values (x_i,y_i), just tell him the parameters for the line (a and b) and the range of x-values (like the first x₁, the last x_n, and n), and your friend will be able to recreate the (x_i,y_i) plot with good accuracy.
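The line-fitting example can be sketched in a few lines of Python. The sample values below are invented (points close to y = 2x + 1), and an ordinary least-squares fit is assumed as the "fitting method"; the function names are only for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b

def reconstruct(a, b, x_first, x_last, n):
    """Recreate n evenly spaced (x, y) points from the line parameters alone."""
    step = (x_last - x_first) / (n - 1)
    return [(x_first + i * step, a * (x_first + i * step) + b) for i in range(n)]

# Hypothetical samples: evenly spaced x, y close to the line y = 2x + 1.
xs = [0.5 * i for i in range(20)]
wiggle = [0.03, -0.02, 0.01, -0.04, 0.02] * 4
ys = [2.0 * x + 1.0 + w for x, w in zip(xs, wiggle)]

a, b = fit_line(xs, ys)
message = (a, b, xs[0], xs[-1], len(xs))  # five numbers instead of 20 (x, y) pairs

rebuilt = reconstruct(*message)
max_error = max(abs(y - ry) for y, (_, ry) in zip(ys, rebuilt))

print("numbers sent:", len(message), " points recreated:", len(rebuilt))
print("largest difference from the original y values:", max_error)
```

The friend ends up with just as many points as the original plot had, even though only five numbers were sent. The recreated points are generated from the line, so the small wiggles in the original are gone for good.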

davenn · May 20, 2011

uart said:

Not strictly true and also very misleading.

not at all! I have yet to hear a WAV made from an MP3 that sounds even as good as the mp3 it was made from...
mate ... I am just commenting from lots of personal experience ... nothing untrue or misleading at all! :)

since we are dealing with basic data ... the same thing with images jpg's and RAW images from a camera. A RAW image file is roughly analogous to a WAV sound file.
once you compress a RAW to a JPG or a WAV to a MP3 you lose data, that data cannot be brought back, you can only try and fill in the gaps. The result is noticeable.

Dave

Borek · May 20, 2011

davenn said:

I have yet to hear a WAV made from an MP3 that sounds even as good as the mp3 it was made from...

once you compress a RAW to a JPG or a WAV to a MP3 you lose data, that data cannot be brought back, you can only try and fill in the gaps. The result is noticeable.

While your second statement is correct, the first one doesn't follow. WAV -> MP3 means data loss, but MP3 -> WAV means just a change in file size. What MP3 players do is decompress the MP3 stream to samples and send them to the DAC. When you convert MP3 to WAV you do exactly the same: you decompress the MP3 stream to samples. Just instead of sending them directly to the DAC you save them to a file. If you later listen to this WAV file, exactly the same samples are sent to the DAC. In no way can the output be different.
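A rough sketch of the "decode, then save instead of play" point, using Python's standard `wave` module. There is no MP3 decoder in the standard library, so `fake_decoder_output` is a hypothetical stand-in that just produces 16-bit stereo samples at 44.1 kHz, which is the usual CD-style format behind the 1411 kbps figure.

```python
import io
import math
import struct
import wave

SAMPLE_RATE = 44100    # samples per second, per channel
CHANNELS = 2
BYTES_PER_SAMPLE = 2   # 16-bit PCM

# 44100 samples/s * 16 bits * 2 channels = 1,411,200 bits per second
pcm_bitrate = SAMPLE_RATE * BYTES_PER_SAMPLE * 8 * CHANNELS

def fake_decoder_output(seconds):
    """Hypothetical stand-in for an MP3 decoder: interleaved 16-bit stereo PCM bytes."""
    frames = bytearray()
    for i in range(int(SAMPLE_RATE * seconds)):
        value = int(12000 * math.sin(2 * math.pi * 440 * i / SAMPLE_RATE))
        frames += struct.pack("<hh", value, value)  # left and right channel
    return bytes(frames)

decoded = fake_decoder_output(0.5)  # what a player would send to the DAC

# "Converting to WAV": write those same samples to a file (in memory here).
buffer = io.BytesIO()
with wave.open(buffer, "wb") as wav_out:
    wav_out.setnchannels(CHANNELS)
    wav_out.setsampwidth(BYTES_PER_SAMPLE)
    wav_out.setframerate(SAMPLE_RATE)
    wav_out.writeframes(decoded)
wav_bytes = buffer.getvalue()

# "Playing the WAV": read the samples back out.
with wave.open(io.BytesIO(wav_bytes), "rb") as wav_in:
    played = wav_in.readframes(wav_in.getnframes())

print("PCM bit rate:", pcm_bitrate, "bits per second")
print("samples identical:", played == decoded)
print("WAV bytes:", len(wav_bytes), " raw sample bytes:", len(decoded))
```

The WAV size depends only on the duration, sample rate, channel count and bit depth, plus a small header. It says nothing about how much detail the samples carry, which is why a WAV decoded from an MP3 is as large as one ripped straight from a CD.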

uart · May 20, 2011

davenn said:

not at all! I have yet to hear a WAV made from an MP3 that sounds even as good as the mp3 it was made from...
mate ... I am just commenting from lots of personal experience ... nothing untrue or misleading at all! :)

Sorry but what you are saying there is completely untrue and therefore 100% misleading. How do you think an mp3 player (or software) plays an mp3? It decodes it to the raw data (equivalent to what's stored in the wav file) and then sends that "decoded" data to the DAC's (or sound card). Like I said before, that raw data that the DAC's receive is identical in either case. In one case you're decoding the mp3 "on the fly" as it's played, while in the other case you're simply pre-decoding that very same data before it's played.

You need to try some blind listening tests. If you do so you'll very quickly find out that you cannot in fact hear a difference. The differences that you "perceive" are due to your brain fooling you because of your mistaken expectations. Don't worry, it's not just you, no one is immune to this type of self delusion (seriously, I'm not being facetious).

davenn · May 20, 2011

uart said:

Don't worry, it's not just you, no one is immune to this type of self delusion (seriously, I'm not being facetious).

yes you are ... but i will get over it even if you don't ;)

Dave

sophiecentaur · May 20, 2011

davenn said:

yes you are ... but i will get over it even if you don't ;)

Dave

I don't think uart is necessarily being facetious. He is stating a well known fact about subjective appreciation of impairments. We often tend to listen with our chequebooks and our preconceptions. Double blind testing may be the only way to settle this for you.

There is always the possibility that the particular implementations of codecs that you are using may be faulty or inappropriate.

jsgruszynski · May 24, 2011

For lossy compression, the process of compression throws away much of the fine detail (the high information part) of the original but approximately matches its features to a regular (simple, low information, standardized) family of patterns. This process is related to how a Fourier series can represent a function by a sum of sines and cosines having particular coefficients and frequencies. Essentially the compressed file only includes the coefficients and frequency parameters of the series function, along with much smaller bitmaps, and tosses out the original image entirely.

The original is later reconstructed from these coefficients and bitmap data alone by reversing the process.
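Here is a minimal sketch of "store only some series coefficients, then reverse the process". It assumes a plain one-dimensional discrete cosine transform (DCT-II) as the series and simply drops the higher-order coefficients. Real codecs such as JPEG and MP3 are more elaborate (they quantize coefficients rather than just truncating), and the sample block below is invented.

```python
import math

def dct(samples):
    """Unnormalised DCT-II: samples -> cosine-series coefficients."""
    n = len(samples)
    return [sum(s * math.cos(math.pi * (i + 0.5) * k / n)
                for i, s in enumerate(samples))
            for k in range(n)]

def idct(coeffs):
    """Inverse of dct() above: cosine-series coefficients -> samples."""
    n = len(coeffs)
    return [(coeffs[0] / 2
             + sum(coeffs[k] * math.cos(math.pi * (i + 0.5) * k / n)
                   for k in range(1, n))) * 2 / n
            for i in range(n)]

# Hypothetical block of 16 samples: a smooth curve plus a small fast ripple.
block = [100 + 50 * math.sin(i / 4) + 3 * ((-1) ** i) for i in range(16)]

coeffs = dct(block)
kept = 4
stored = coeffs[:kept]                            # what the "compressed file" holds
padded = stored + [0.0] * (len(block) - kept)     # missing terms are taken as zero
rebuilt = idct(padded)                            # reversing the process

exact = idct(coeffs)                              # with all terms kept, nothing is lost
max_error = max(abs(o - r) for o, r in zip(block, rebuilt))

print("numbers stored:", len(stored), " samples rebuilt:", len(rebuilt))
print("largest difference from the original:", max_error)
```

The rebuilt block has all 16 samples again even though only 4 numbers were stored. The extra values are computed from the stored coefficients, so they take up space without bringing back the ripple that was thrown away.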

JPEG and MPEG use a series of cosine functions of various frequencies (the discrete cosine transform, or DCT) instead of a full Fourier sine/cosine series because its coefficients are real numbers and, for typical image blocks, most of the signal ends up in a few low-frequency terms. A lot of this relies on the fact that your eye actually sucks at certain kinds of image recognition so it can be fooled, within reason, with a cheap imitation of the image.

http://en.wikipedia.org/wiki/Discrete_cosine_transform

The difference between low and high fidelity settings in a JPEG has to do with how many frequency terms of the series representation effectively survive in your file. Fewer means smaller file size but worse fidelity because you are dropping the higher order terms (each amplitude coefficient is a number that has to have space in the file), and vice versa.

You see these cosine patterns, and the corruption of the original data, when you save to JPEG with low fidelity: you get the blocky artifacts. Those blocks are the 8x8 pixel tiles that JPEG transforms separately, each rebuilt from only its low order 2-dimensional cosine terms. Keeping the higher order, higher frequency terms (finer detail within each block) increases the fidelity of the representation.

Instead of a cosine series, you could also use a fractal kernel with multiple frequencies and save the amplitudes and patch bitmaps, which is what fractal compression does.

The very short answer: the reconstructed information content comes from the latent information and redundancy of the original image that allowed simple patterns to be used in the compression to fit "well enough for the human eye". Not something for nothing actually.
