Understanding MP3: How Does It Work?

  • Thread starter: arcnets
  • Tags: Work

Discussion Overview

The discussion centers around the workings of MP3 compression, particularly how it achieves significant file size reduction while maintaining sound quality. Participants explore the technical aspects of audio encoding, psychoacoustics, and the implications of the Shannon theorem in relation to MP3 files.

Discussion Character

  • Technical explanation
  • Debate/contested
  • Exploratory

Main Points Raised

  • Some participants note that MP3 compression reduces audio data by discarding frequencies that are inaudible to humans.
  • Carlos expresses that he can discern a noticeable difference in sound quality between MP3 and WAV formats, particularly in stereo quality.
  • Warren clarifies that Shannon's theorem does not directly apply to MP3 compression, emphasizing that it involves a wavelet or FFT transform to discard psychoacoustically unimportant coefficients.
  • Participants discuss the impact of encoding bitrate on sound quality, with some suggesting that higher bitrates yield better results.
  • There is mention of the importance of the encoding software used, with recommendations for using high-quality encoders like LAME.
  • Some participants question how MP3 encoders determine which audio information can be discarded without affecting perceived quality.
  • A hypothetical scenario is proposed regarding the construction of a WAV file that could sound poor when compressed to MP3, suggesting that certain audio characteristics might challenge the compression algorithm.

Areas of Agreement / Disagreement

Participants express differing opinions on the quality of MP3 compression compared to WAV files, with some asserting that significant audio quality is lost, while others suggest that the loss is acceptable or negligible. The discussion remains unresolved regarding the extent of quality loss and the effectiveness of various encoding methods.

Contextual Notes

Participants highlight limitations in the encoding process, such as the dependency on software capabilities and the subjective nature of audio quality perception. There is also mention of unresolved technical details regarding the psychoacoustic principles that guide MP3 compression.

arcnets
Hi all,
when you store music as a WAV file (CD quality), it will take ~10 MB per minute. If you compress to MP3, even with the highest quality, you will only get ~1 MB per minute.
I know that the data rate is determined by amplitude resolution * sample rate * number of channels. With 16 bit, 44.1 kHz, and stereo, that results in ~10 MB per minute, as I said.
The Shannon theorem says that, in order to digitize an analog signal with full quality, the sample rate must be at least twice the highest frequency. So you cannot throw away any high-frequency Fourier components, or the sound will be bad.
My question is: How can the MP3 compression give a sound almost as good as the original, while throwing away ~90% of the information?
Does anyone know a simple explanation?
Thanks...
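The arithmetic in the opening post can be checked with a short Python sketch. It assumes 1 MB = 10^6 bytes, ignores the small WAV header, and treats the MP3 stream as constant bitrate; the function names are illustrative only.

```python
# Uncompressed PCM size, as described in the opening post:
# bytes per minute = (bits per sample / 8) * sample rate * channels * 60 s.

def pcm_bytes_per_minute(bits_per_sample: int, sample_rate_hz: int, channels: int) -> int:
    """Bytes needed to store one minute of uncompressed PCM audio."""
    return bits_per_sample // 8 * sample_rate_hz * channels * 60


def mp3_bytes_per_minute(bitrate_kbps: int) -> int:
    """Bytes per minute of a constant-bitrate stream (1 kbps = 1000 bit/s)."""
    return bitrate_kbps * 1000 // 8 * 60


cd = pcm_bytes_per_minute(16, 44_100, 2)
print(f"CD-quality WAV: {cd / 1e6:.2f} MB per minute")  # 10.58 MB
for kbps in (128, 256, 320):
    size = mp3_bytes_per_minute(kbps)
    print(f"{kbps} kbps MP3: {size / 1e6:.2f} MB per minute")
```

By this count, ~1 MB per minute corresponds to roughly 128 kbps, while 256 kbps (the setting mentioned later in the thread) comes to about 1.9 MB per minute.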
 
What the MP3 codec does is get rid of the parts of the audio that you and I cannot hear.
 
I can easily tell the difference between .mp3 and .wav. When you convert to mp3, the song loses much of its stereo sound/3-D quality. I compared the same songs in both formats and could tell the difference quite easily.

Carlos Hernandez
 
Carlos,
Were you using variable bitrate or constant bitrate?
 
Originally posted by BoulderHead
Carlos,
Were you using variable bitrate or constant bitrate?

Constant, I believe.
 
Some software may not offer the option of encoding with variable bitrate, but you can obtain better sound quality with that method... just a thought.
 
Originally posted by arcnets

The Shannon theorem says that, in order to digitize an analog signal with full quality, the sample rate must be at least twice the highest frequency.
Actually, Shannon's sampling theorem says nothing about quality. What it says is that to represent a signal of frequency f Hz at all, you must sample above 2f Hz. For a given sample rate, the highest representable frequency (half the sample rate) is called the Nyquist frequency.

A sine wave whose frequency is almost exactly the Nyquist frequency will, once sampled, look nothing at all like the original sine wave. All Shannon's theorem says is that its fundamental power will still appear in a frequency-domain plot.
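One way to see what the sampling theorem does and does not promise is aliasing: above half the sample rate, a tone can no longer be told apart from a lower one. The Python sketch below (standard library only, frequencies picked purely for illustration) samples a 1 kHz tone and a 43.1 kHz tone at 44.1 kHz and shows that the two sets of samples coincide.

```python
import math


def sample_cosine(freq_hz: float, sample_rate_hz: float, n_samples: int) -> list[float]:
    """Sample cos(2*pi*f*t) at t = n / fs for n = 0 .. n_samples - 1."""
    return [math.cos(2 * math.pi * freq_hz * n / sample_rate_hz) for n in range(n_samples)]


fs = 44_100.0          # CD sample rate; Nyquist frequency is fs / 2 = 22 050 Hz
f_low = 1_000.0        # well below Nyquist
f_high = fs - f_low    # 43 100 Hz, above Nyquist

a = sample_cosine(f_low, fs, 64)
b = sample_cosine(f_high, fs, 64)

# cos(2*pi*(fs - f)*n/fs) = cos(2*pi*n - 2*pi*f*n/fs) = cos(2*pi*f*n/fs),
# so the 43.1 kHz tone yields the same samples as the 1 kHz tone.
max_diff = max(abs(x - y) for x, y in zip(a, b))
print(f"largest sample difference: {max_diff:.2e}")  # only floating-point noise
```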

Now, an mp3 file does not store a sampled time-domain data stream (it stores coded transform coefficients), so Shannon's theorem doesn't directly apply to it.

What mp3 basically does is compute a wavelet (or FFT, I can't remember which) transform on blocks of samples. The result is a set of frequency-domain coefficients. The mp3 algorithm then discards, or coarsely quantizes, the coefficients that are not psychoacoustically important. Voila! You have compressed your data.

- Warren
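For reference, MPEG-1 Layer III codes its audio with neither a wavelet transform nor a plain FFT: it uses a hybrid filterbank (a 32-band polyphase filterbank followed by an MDCT), and a psychoacoustic model decides how coarsely each group of coefficients is quantized. The Python sketch below swaps in a much simpler scheme to illustrate the general "transform, then throw away coefficients" idea: an orthonormal DCT-II on one block, keeping only the largest coefficients. Ranking by magnitude is an assumption standing in for a real psychoacoustic model, and the test tones are arbitrary.

```python
import math


def dct(x: list[float]) -> list[float]:
    """Orthonormal DCT-II (direct O(N^2) form, fine for a demo)."""
    n_len = len(x)
    out = []
    for k in range(n_len):
        scale = math.sqrt((1 if k == 0 else 2) / n_len)
        out.append(scale * sum(x[n] * math.cos(math.pi / n_len * (n + 0.5) * k)
                               for n in range(n_len)))
    return out


def idct(c: list[float]) -> list[float]:
    """Inverse of dct() above (orthonormal DCT-III)."""
    n_len = len(c)
    return [sum(math.sqrt((1 if k == 0 else 2) / n_len) * c[k]
                * math.cos(math.pi / n_len * (n + 0.5) * k) for k in range(n_len))
            for n in range(n_len)]


def keep_largest(c: list[float], keep: int) -> list[float]:
    """Zero every coefficient except the `keep` largest in magnitude.
    Toy stand-in for a psychoacoustic model: it ranks by size, not audibility."""
    ranked = sorted(range(len(c)), key=lambda k: abs(c[k]), reverse=True)
    kept = set(ranked[:keep])
    return [v if k in kept else 0.0 for k, v in enumerate(c)]


fs, block = 44_100, 256
x = [math.sin(2 * math.pi * 1_000 * n / fs) + 0.3 * math.sin(2 * math.pi * 5_000 * n / fs)
     for n in range(block)]
coeffs = dct(x)
for keep in (8, 32, 128):
    y = idct(keep_largest(coeffs, keep))
    err = math.sqrt(sum((p - q) ** 2 for p, q in zip(x, y)) / sum(p * p for p in x))
    print(f"keep {keep:3d}/{block} coefficients -> relative RMS error {err:.3f}")
```

Because the DCT here is orthonormal, the squared reconstruction error equals the energy of the discarded coefficients, which is why dropping small coefficients costs little.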
 
Originally posted by BoulderHead
Some software may not offer the option of encoding with variable bitrate, but you can obtain better sound quality with that method... just a thought.

I was using the freeware CDex; it had both options available.
 
Note to Carlos: the quality of your encoding software matters. A lot.

- Warren
 
  • #10
Originally posted by chroot
Note to Carlos: the quality of your encoding software matters. A lot.

- Warren

Yeah, for example, 64 kbps encoding would sound pretty bad, but 192 kbps encoding sounds great.
 
  • #11
Originally posted by Greg Bernhardt
Yeah, for example, 64 kbps encoding would sound pretty bad, but 192 kbps encoding sounds great.
Not just that: different encoders, both running at 192 kbps, will produce different results.

- Warren
 
  • #12
What I mean by 'almost as good as the original' is:
256 kbit/s, 44100 Hz, stereo + resample.
Originally posted by Greg Bernhardt
What the MP3 codec does is get rid of the parts of the audio that you and I cannot hear.
How does it know what you and I cannot hear? And does this imply that we cannot hear ~90% of the information that is in a sound wave?
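A quick way to put numbers on the "~90%" question is to compare each MP3 bitrate with the 1411.2 kbps of CD-quality PCM. The sketch below does only that arithmetic; it does not imply that everything removed was inaudible, since part of MP3's saving comes from lossless Huffman coding of the quantized coefficients rather than from discarding sound.

```python
CD_KBPS = 16 * 44_100 * 2 / 1000  # 1411.2 kbps for 16-bit, 44.1 kHz stereo PCM


def fraction_kept(mp3_kbps: float) -> float:
    """Share of the PCM bitrate that an MP3 stream of this bitrate retains."""
    return mp3_kbps / CD_KBPS


for kbps in (64, 128, 192, 256, 320):
    kept = fraction_kept(kbps)
    print(f"{kbps:3d} kbps: keeps {kept:6.1%} of the PCM bitrate ({1 / kept:4.1f}:1)")
```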
 
  • #13
I encoded my .wav into 128, 160, 192, 256, and 320 kbps. I've tried freeware like CDex, which is supposed to be high quality, and also high-quality shareware like Media Jukebox and MusicMatch Jukebox. In every case, I noticed that you end up losing significant stereo/3-D sound from the original. And it's not just me: I searched this on the internet, and others have reported the same thing.

Carlos Hernandez
 
  • #14
Try a real encoder, like LAME.

- Warren
 
  • #15
Originally posted by chroot
Try a real encoder, like LAME.

- Warren

All the encoders I used had the LAME option.

Perhaps I just have a better ear for music and can tell differences better. Or maybe not, who knows.
 
  • #16
This page explains the difference between joint stereo and stereo.

http://www.modatic.net/audio/stereo_vs_jointstereo.php
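Joint stereo in MP3 can use mid/side (M/S) coding, which stores the sum and difference of the two channels instead of left and right (MP3 also has an intensity-stereo mode, not shown here). The Python sketch below shows that the M/S transform itself is lossless and that, for near-mono material, almost all the energy ends up in the mid channel; any loss of separation comes from how an encoder then allocates bits to the side channel. The signals are made up for illustration.

```python
import math

R2 = math.sqrt(2.0)  # 1/sqrt(2) scaling keeps the transform energy-preserving


def to_mid_side(left: list[float], right: list[float]) -> tuple[list[float], list[float]]:
    """M = (L + R) / sqrt(2), S = (L - R) / sqrt(2)."""
    mid = [(l + r) / R2 for l, r in zip(left, right)]
    side = [(l - r) / R2 for l, r in zip(left, right)]
    return mid, side


def from_mid_side(mid: list[float], side: list[float]) -> tuple[list[float], list[float]]:
    """Exact inverse: L = (M + S) / sqrt(2), R = (M - S) / sqrt(2)."""
    left = [(m + s) / R2 for m, s in zip(mid, side)]
    right = [(m - s) / R2 for m, s in zip(mid, side)]
    return left, right


def energy(xs: list[float]) -> float:
    return sum(v * v for v in xs)


left = [math.sin(2 * math.pi * 440 * t / 44_100) for t in range(1000)]
right = [0.9 * v for v in left]  # nearly mono: right is a quieter copy of left
mid, side = to_mid_side(left, right)
print(f"side/mid energy ratio: {energy(side) / energy(mid):.4f}")  # 0.0028
```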

If you're having a problem with your separation, then it's probably the software you are using. The LAME encoder that was mentioned is a command-line program. It's one of the best encoders, and it's free. If you're not comfortable with a command-line interface, then you might not like using LAME. There are many Windows front-ends available, but they generally don't expose all the features you get with the CLI version. Some software may use the LAME encoder (lame_enc.dll), but the functionality is usually crippled.

http://lame.sourceforge.net/

I think the info on the first page I linked to will solve your problem.
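Since LAME is a command-line program, here is a small Python sketch that builds its command line for the two choices discussed in this thread: VBR vs. CBR, and joint vs. simple stereo. The flags used (-V for VBR quality, -b for bitrate in kbps, -m j / -m s for joint/simple stereo) are standard LAME options, but check `lame --help` for your version; the file names are hypothetical and nothing is executed.

```python
from typing import Optional


def lame_command(src: str, dst: str, vbr_quality: Optional[int] = 2,
                 cbr_kbps: Optional[int] = None, stereo_mode: str = "j") -> list[str]:
    """Build (but do not run) a LAME command line.

    vbr_quality: VBR quality 0 (best) .. 9 (smallest), passed as -V.
    cbr_kbps:    constant bitrate in kbps, passed as -b; overrides vbr_quality.
    stereo_mode: "j" for joint stereo or "s" for simple stereo, passed as -m.
    """
    if stereo_mode not in ("j", "s"):
        raise ValueError("stereo_mode must be 'j' or 's'")
    cmd = ["lame", "-m", stereo_mode]
    if cbr_kbps is not None:
        cmd += ["-b", str(cbr_kbps)]      # constant bitrate
    elif vbr_quality is not None:
        cmd += ["-V", str(vbr_quality)]   # variable bitrate
    return cmd + [src, dst]


# Hypothetical file names; the commands are only printed.
print(" ".join(lame_command("song.wav", "song_vbr.mp3")))
print(" ".join(lame_command("song.wav", "song_cbr.mp3", cbr_kbps=320, stereo_mode="s")))
# To encode for real (LAME must be on PATH and song.wav must exist):
#   import subprocess; subprocess.run(lame_command("song.wav", "song.mp3"), check=True)
```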
 
  • #17
Originally posted by arcnets
How does it know what you and I cannot hear? And does this imply that we cannot hear ~90% of the information that is in a sound wave?

One example: when two frequencies f1 and f2 are very close and the amplitude of f1 is much smaller than the amplitude of f2, you cannot hear f1. The exact parameters (how close the two frequencies must be, or what amplitude difference matters) are the subject of psychoacoustics.
So the mp3 encoder, after computing the wavelet transform (or FFT, whatever...), can safely ignore some of the coefficients based on rules like the one above.
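Guybrush T.'s masking rule can be sketched in a few lines of Python. Closeness is measured on the Bark critical-band scale (Zwicker and Terhardt's approximation), and the two thresholds, 1 Bark and 20 dB, are illustrative assumptions rather than values taken from any MP3 encoder; a real psychoacoustic model computes a masking threshold per frequency band instead of a yes/no rule for a pair of tones.

```python
import math


def hz_to_bark(f_hz: float) -> float:
    """Zwicker & Terhardt approximation of the Bark critical-band scale."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)


def is_masked(masker_hz: float, masker_db: float, probe_hz: float, probe_db: float,
              max_bark_distance: float = 1.0, min_level_gap_db: float = 20.0) -> bool:
    """Toy rule: treat the probe tone as inaudible if it lies within
    max_bark_distance of a louder masker and is at least min_level_gap_db quieter.
    Both thresholds are assumptions chosen for illustration."""
    close = abs(hz_to_bark(masker_hz) - hz_to_bark(probe_hz)) <= max_bark_distance
    much_quieter = masker_db - probe_db >= min_level_gap_db
    return close and much_quieter


print(is_masked(1_000, 60, 1_100, 30))  # True: close in frequency, 30 dB quieter
print(is_masked(1_000, 60, 3_000, 30))  # False: too far apart on the Bark scale
```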
 
  • #18
Thanks Guybrush T.,
Yes, I understand. In the case you mention, the superposition of the two waves would be a very long-period, low-amplitude modulation in volume, which the ear ignores.
Since the lowest audible frequency is ~20 Hz, the Fourier transform will deliver sine components of 20 Hz, 40 Hz, 60 Hz, and so on. And, of course, 10000 Hz and 10020 Hz, say, are 'very close together'.
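The spacing of Fourier components is set by the sample rate and the transform block length (Δf = fs / N), not by the lower limit of hearing. The Python sketch below shows that 20 Hz spacing at 44.1 kHz corresponds to one particular block length (2205 samples, i.e. 50 ms), and computes the width of each of the 576 frequency lines that an MP3 long block produces per granule.

```python
def bin_spacing_hz(sample_rate_hz: float, block_len: int) -> float:
    """Frequency spacing of DFT bins: fs / N."""
    return sample_rate_hz / block_len


fs = 44_100
for n in (256, 1024, 2205, 4096):
    print(f"N = {n:5d}: bins every {bin_spacing_hz(fs, n):7.2f} Hz")

# A long-block MP3 granule has 576 frequency lines spread over 0 .. fs/2,
# so each line covers (fs / 2) / 576 Hz.
print(f"MP3 long-block line width at 44.1 kHz: {fs / 2 / 576:.2f} Hz")  # 38.28 Hz
```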

I have the following idea: is it possible to construct (mathematically or otherwise) a WAV file that sounds *bad* with any MP3 compression, because the wave is constructed so cleverly that MP3 throws away exactly the components that are, in fact, important to our hearing?

Maybe a voice whispering some words, barely audible, in a lot of noise?
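arcnets's test idea can be tried directly. The Python sketch below (standard library only) writes a 16-bit mono WAV containing a quiet sine tone mixed into louder white noise; a steady tone stands in for the whispered voice, and the tone frequency, amplitudes, and file name are arbitrary assumptions to adjust. Encoding the file with different encoders and bitrates, then listening for the tone, would show whether it survives; no result is implied here.

```python
import io
import math
import random
import struct
import wave


def write_tone_in_noise(dest, seconds: float = 2.0, fs: int = 44_100,
                        tone_hz: float = 1_000.0, tone_amp: float = 0.02,
                        noise_amp: float = 0.3, seed: int = 0) -> int:
    """Write a 16-bit mono WAV (path or binary file object): a quiet sine tone
    mixed into louder white noise. Returns the number of frames written."""
    rng = random.Random(seed)
    n_frames = int(seconds * fs)
    frames = bytearray()
    for n in range(n_frames):
        sample = tone_amp * math.sin(2 * math.pi * tone_hz * n / fs)
        sample += noise_amp * rng.uniform(-1.0, 1.0)
        sample = max(-1.0, min(1.0, sample))            # clip to full scale
        frames += struct.pack("<h", int(sample * 32767))
    with wave.open(dest, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)   # 2 bytes = 16 bits per sample
        w.setframerate(fs)
        w.writeframes(bytes(frames))
    return n_frames


# In-memory demo; pass a path such as "whisper_in_noise.wav" to write a file instead.
buf = io.BytesIO()
write_tone_in_noise(buf, seconds=0.5)
buf.seek(0)
with wave.open(buf, "rb") as r:
    print(r.getnchannels(), r.getframerate(), r.getnframes())  # 1 44100 22050
```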
 
  • #19
Originally posted by arcnets
I have the following idea: is it possible to construct (mathematically or otherwise) a WAV file that sounds *bad* with any MP3 compression, because the wave is constructed so cleverly that MP3 throws away exactly the components that are, in fact, important to our hearing?

Maybe a voice whispering some words, barely audible, in a lot of noise?

Actually, there are people who test various encoders using a bunch of different files. You can see what the results are for some pathological sounds like instantaneous pulses.

IIRC, mp3 has more trouble with certain kinds of clicks and pops than with anything else.
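A likely reason clicks are hard for a transform codec: a single-sample click spreads its energy evenly over every frequency bin, whereas a steady tone concentrates it in a few. The Python sketch below compares the two with a direct DFT (standard library only; block size and click position are arbitrary). In a real encoder, quantization noise added to a transient is spread across the whole transform block in time, which can be heard as "pre-echo" before the click; MP3 reduces this by switching to short blocks around transients.

```python
import cmath
import math


def dft_magnitudes(x: list[float]) -> list[float]:
    """|X[k]| from a direct O(N^2) DFT; adequate for short demo blocks."""
    n_len = len(x)
    return [abs(sum(x[n] * cmath.exp(-2j * math.pi * k * n / n_len) for n in range(n_len)))
            for k in range(n_len)]


def busy_bins(spec: list[float], floor: float = 1e-6) -> int:
    """Count bins holding non-negligible magnitude."""
    return sum(1 for v in spec if v > floor)


block = 64
click = [0.0] * block
click[10] = 1.0                                                        # single-sample click
tone = [math.sin(2 * math.pi * 4 * n / block) for n in range(block)]  # exactly 4 cycles

click_spec = dft_magnitudes(click)
tone_spec = dft_magnitudes(tone)
print("click occupies", busy_bins(click_spec), "of", block, "bins")  # all 64
print("tone occupies", busy_bins(tone_spec), "of", block, "bins")    # 2 (k = 4 and k = 60)
```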

Here's a chapter of O'Reilly's book on MP3. It might be more than what you wanted.

http://www.oreilly.com/catalog/mp3/chapter/ch02.html
 
  • #20
NateTG,
that's great. Thank you.
 
