Fourier Analysis of Real Sound Waves

Atomic_Sheep · Aug 9, 2012

If a basic sine sound wave is analysed with a Fourier transform, the result is just a spike at a certain frequency. My maths isn't the best so bear with me... if we take a real sound file and take Fourier transforms at regular intervals (I assume that's what's being done when calculating a spectrum over a range of time of a sound file), I get a spectrum that no longer has spikes and instead has broader, mountain-like peaks. My question is, since real sounds aren't stationary and taking into account the additivity of Fourier transforms (i.e. from my understanding, you can simply add Fourier transforms without, um... data loss?), is the reason for these mountain peaks rather than spikes simply the averaging process?

Rooted · Aug 9, 2012

Sorry, I'm being a bit stupid, I'm not understanding what you're asking - could you possibly post a pic of what you're describing? There's a load of java apps on t'internet that should draw it for you. Cheers!

Simon Bridge · Aug 9, 2012

An arbitrary sound will be composed of a superposition of sine waves ... the Fourier transform of an actual sound shape would therefore be a lot of spikes for the periodic components.

Don't try this with cats.
http://xkcd.com/26/

Atomic_Sheep · Aug 13, 2012

Basically, in a sound editing program there's an option to view the Fourier transform over a period of time. If you view this Fourier transform at any given time, it's, as Simon Bridge pointed out, a bunch of spikes that correspond to the individual periodic components. However, since we're dealing with a range of time, my understanding is this: say we have a sound which is composed of just one sine wave at 1000 Hz. Since real-world sounds aren't stationary, that 1000 Hz sound might go up and down a little, so over a range of time it would appear that the sound is composed of different frequencies even though it's the same one that is simply not stationary (from a stats point of view). So my question was simply whether that's most likely what is happening, i.e. do the spikes become peaks when you extend the time period of analysis? As I explain this, it makes more and more sense to me and I believe that this is indeed what is happening, but I just want to make sure.

jahaan · Aug 13, 2012

The reason for the 'mountain' instead of a spike is a process called 'windowing'.

The Fourier transform of a sine wave is indeed a spike at its base frequency. BUT that is the Fourier transform of a sine wave that goes on forever (if you compute it by hand you take the integral from minus infinity to infinity).

If however you take the transform over a finite interval of time, the peaks are 'broadened'. If you know a bit of the mathematics behind this, you know that multiplying two signals in the time domain is the same as convolving their spectra in the frequency domain. So, taking a transform over a finite interval is the same as taking the transform of an infinite sine multiplied by a function which is one inside this interval and zero outside it. That function is called a 'window'.

So, to find the spectrum of a 'real life' sine (like in a sound editing program), you need to convolve the spectrum of the window function with the spectrum of the sine. This convolution broadens the spectrum.
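For anyone who wants to see the convolution written out, here is a sketch of the simplest case. The choices are assumptions made for illustration: a cosine of frequency f_0 (rather than a sine, to keep the algebra free of factors of i), a rectangular window of length T centred on t = 0, and the convention sinc(u) = sin(πu)/(πu).

```latex
\begin{align*}
x(t) &= \cos(2\pi f_0 t)
  &\Longrightarrow\quad X(f) &= \tfrac{1}{2}\bigl[\delta(f - f_0) + \delta(f + f_0)\bigr] \\
w(t) &= \begin{cases} 1, & |t| \le T/2 \\ 0, & \text{otherwise} \end{cases}
  &\Longrightarrow\quad W(f) &= T\,\operatorname{sinc}(fT) \\
x(t)\,w(t) &
  &\Longrightarrow\quad (X * W)(f) &= \tfrac{T}{2}\bigl[\operatorname{sinc}\bigl((f - f_0)T\bigr) + \operatorname{sinc}\bigl((f + f_0)T\bigr)\bigr]
\end{align*}
```

Each spike is replaced by a sinc-shaped lobe whose first zeros sit at f_0 ± 1/T, so the main lobe is 2/T wide and narrows as the window gets longer.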

Maybe you can try this to see it for yourself in a sound editing program: take a sine wave of 1 sec, 2 sec, 4 sec, ... and compare the spectra. You should see the spectrum getting peakier (thinner) as the sine wave gets longer...

See this as well: http://en.wikipedia.org/wiki/Window_function
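If you don't have a sound editor handy, the experiment above can be roughly reproduced with NumPy. This is only a sketch: the 1000 Hz tone, the 8000 Hz sample rate, the zero-padding to 16 s (used just to get a fine frequency grid) and the helper name `peak_width_hz` are all assumptions made for illustration.

```python
import numpy as np

def peak_width_hz(duration_s, freq_hz=1000.0, sample_rate=8000, pad_to_s=16.0):
    """Width (at half the maximum magnitude) of the spectral peak of a sine burst."""
    n = int(round(duration_s * sample_rate))
    t = np.arange(n) / sample_rate
    burst = np.sin(2 * np.pi * freq_hz * t)   # sine that lasts only duration_s
    n_fft = int(pad_to_s * sample_rate)       # zero-pad for a fine frequency grid
    spectrum = np.abs(np.fft.rfft(burst, n=n_fft))
    freqs = np.fft.rfftfreq(n_fft, d=1 / sample_rate)
    above = freqs[spectrum >= spectrum.max() / 2]
    return above.max() - above.min()

widths = {d: peak_width_hz(d) for d in (1.0, 2.0, 4.0)}
for duration, width in widths.items():
    print(f"{duration:.0f} s sine -> peak about {width:.2f} Hz wide")
```

The printed widths should shrink as the sine gets longer, which is the 'peakier' behaviour described above.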

Simon Bridge · Aug 14, 2012

Windo... ah: jahaan beat me to it :)

I'll add:
In general, sound waves are traveling waves.
The Fourier transform of a traveling wave with a single frequency is still a spike. Try it.
If the frequency somehow changed continuously in time, say the source is accelerating wrt the air, the Fourier transform would just be time-dependent with the change. If you took a long sample (compared with the rate the frequency changes), the Fourier transform would look like whatever combination of discrete frequencies would make the final waveform.
You could have a wave constructed from a continuous range of frequency components though :) The Fourier transform does not have to be all spikes. You can have fun constructing frequency spectra and doing the inverse transform to see what the waveform would look like.

Besides windowing, another source of line broadening in real-world sounds would be statistical uncertainties in the measuring process.
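The point above about a frequency that drifts during a long sample can be checked numerically. The sketch below is an illustration under assumed numbers: a 1000 Hz tone whose frequency wobbles by ±20 Hz twice over a 4 s sample, an 8000 Hz sample rate, and a hypothetical helper `spectral_spread_hz` that uses the power-weighted standard deviation of frequency as a crude measure of peak width.

```python
import numpy as np

def spectral_spread_hz(signal, sample_rate):
    """Power-weighted standard deviation of frequency: a rough 'peak width'."""
    power = np.abs(np.fft.rfft(signal)) ** 2
    freqs = np.fft.rfftfreq(len(signal), d=1 / sample_rate)
    mean = np.sum(freqs * power) / np.sum(power)
    return np.sqrt(np.sum((freqs - mean) ** 2 * power) / np.sum(power))

sample_rate = 8000
t = np.arange(4 * sample_rate) / sample_rate   # 4 s sample

carrier_hz, deviation_hz, wobble_hz = 1000.0, 20.0, 0.5

# Steady tone: the frequency never changes.
steady = np.sin(2 * np.pi * carrier_hz * t)

# Wobbling tone: instantaneous frequency is
# carrier_hz + deviation_hz * sin(2*pi*wobble_hz*t); the phase is its integral.
phase = (2 * np.pi * carrier_hz * t
         - (deviation_hz / wobble_hz) * np.cos(2 * np.pi * wobble_hz * t))
wobbly = np.sin(phase)

spread_steady = spectral_spread_hz(steady, sample_rate)
spread_wobbly = spectral_spread_hz(wobbly, sample_rate)
print(f"steady tone spread:   {spread_steady:.3f} Hz")
print(f"wobbling tone spread: {spread_wobbly:.3f} Hz")
```

The steady tone should come out essentially as a single spike, while the wobbling tone spreads over a band comparable to the size of the wobble, i.e. a combination of discrete frequencies that together make up the final waveform.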

PhilDSP · Aug 14, 2012

The basic situation is that if the portion of the sound wave that you've captured in your data window is not entirely periodic within that window, then the FT or DFT will be inexact (at best) or wildly inaccurate (at worst). If a sound wave varies in time, which is very likely, then it's highly unlikely that it will be periodic within the data window. Lengthening the data window will reduce the inaccuracies as well as increase the frequency resolution (giving sharper peaks).
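A small NumPy sketch of the periodic versus non-periodic case described above. The numbers are assumptions for illustration: a 1 s window at 8000 Hz (so the DFT bins are 1 Hz apart), one tone at exactly 1000 Hz and one at 1000.5 Hz, and a hypothetical helper `bins_above` that counts how many bins hold more than 1% of the largest magnitude.

```python
import numpy as np

sample_rate = 8000
n_samples = 8000                      # 1 s window, so bins are 1 Hz apart
t = np.arange(n_samples) / sample_rate

def bins_above(freq_hz, threshold=0.01):
    """Count DFT bins whose magnitude exceeds threshold * the largest bin."""
    spectrum = np.abs(np.fft.rfft(np.sin(2 * np.pi * freq_hz * t)))
    return int(np.sum(spectrum > threshold * spectrum.max()))

periodic = bins_above(1000.0)   # a whole number of cycles fits in the window
leaky = bins_above(1000.5)      # half a cycle is left over at the end

print("bins occupied, periodic in window:    ", periodic)
print("bins occupied, not periodic in window:", leaky)
```

When the tone fits the window exactly, the energy should land in a single bin; when it doesn't, it leaks into many neighbouring bins and the spike turns into a broad peak.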

mikeph · Aug 14, 2012

Can I ask a secondary question which may potentially clear up a confusion that's been in my brain for a while: when we split the wave into subsections in time, are we using the Nyquist frequency to do this? And when Phil says "inexact (at best) or wildly inaccurate (at worst)", would the critical limit between these two be when the frequency of divisions in the sample is half the maximum frequency of sound contained within the sample?

jartsa · Aug 14, 2012

Atomic_Sheep said:

If a basic sine sound wave is analysed with a Fourier transform, the result is just a spike at a certain frequency. My maths isn't the best so bear with me... if we take a real sound file and take Fourier transforms at regular intervals (I assume that's what's being done when calculating a spectrum over a range of time of a sound file), I get a spectrum that no longer has spikes and instead has broader, mountain-like peaks. My question is, since real sounds aren't stationary and taking into account the additivity of Fourier transforms (i.e. from my understanding, you can simply add Fourier transforms without, um... data loss?), is the reason for these mountain peaks rather than spikes simply the averaging process?

A typical symphony is made of 200 million sine waves. When a computer plots a spectrum of said symphony, 200 million spikes must fit into a 1000-pixel-wide window, which will cause some spikes to fuse into some kind of mountain tops.

Discrete Fourier transform of data = spectrum of the data
discrete = spiky

Simon Bridge · Aug 15, 2012

MikeyW said:

Can I ask a secondary question which may potentially clear up a confusion that's been in my brain for a while: when we split the wave into subsections in time, are we using the Nyquist frequency to do this?

The Nyquist frequency is set by the sampling rate: it is half of it. When you choose the sampling rate you also choose the Nyquist frequency.
http://en.wikipedia.org/wiki/Nyquist_frequency

And when Phil says "inexact(at best) or wildly inaccurate (at worst)", would the critical limit between these two be when the frequency of divisions in the sample is half the maximum frequency of sound contained within the sample?

See the wiki article - to have a good reconstruction of a signal you need the Nyquist frequency to be at least a bit higher than the highest frequency in the sample ... to get this, you choose a sampling frequency more than twice that highest frequency.

For example, if the sample rate is 20 kHz, the Nyquist frequency is 10 kHz, and an 11 kHz signal will be indistinguishable from a (20-11=) 9 kHz signal and 'tother way too. You need some reason to believe it's one and not the other to do the reconstruction correctly. i.e. if you happen to know that all the signals of interest have f < 10kHz then you set your sampling rate for f_sam > 20kHz = 2xf_sig.
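The 11 kHz / 9 kHz example above can be checked directly. This is a minimal sketch assuming cosine tones that start at t = 0 and an arbitrary 200-sample stretch; with sines instead of cosines the two sets of samples come out as mirror images (opposite sign), which a magnitude spectrum cannot tell apart either.

```python
import numpy as np

sample_rate = 20_000                  # Hz, so the Nyquist frequency is 10 kHz
t = np.arange(200) / sample_rate      # sample times

tone_11k = np.cos(2 * np.pi * 11_000 * t)   # above the Nyquist frequency
tone_9k = np.cos(2 * np.pi * 9_000 * t)     # its alias at 20 - 11 = 9 kHz

max_difference = np.max(np.abs(tone_11k - tone_9k))
print("largest difference between the two sets of samples:", max_difference)
```

The difference should be at the level of floating-point rounding: once sampled at 20 kHz, the two tones are the same list of numbers.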

But I don't think that's what Phil was talking about - I'll leave it to him to explain :)

jartsa · Aug 15, 2012

Simon Bridge said:

Windo... ah: jahaan beat me to it :)

Besides windowing, another source of line broadening in real-world sounds would be statistical uncertainties in the measuring process.

Yeah but atomic sheep is asking about spikes broadening when the window becomes wider.

From post 7: the spikes become peaks when you extend the time period of analysis

a spike: |
a peak: /\

PhilDSP · Aug 15, 2012

Assuming that you keep the sample rate the same, doubling the sample length will give you twice as many samples (per window). That means you will end up with twice as many frequency buckets with your DFT. Hence, your frequency resolution will increase by a factor of 2, and your peak will be twice as sharp if the signal really does consist of a single frequency.

The Nyquist frequency is related, but in a different way. If you double the sampling frequency (and hence the Nyquist frequency) while keeping the window duration the same, you again end up with twice as many samples per window and twice as many frequency buckets. Those buckets now cover twice the frequency range, though, so the spacing between them (1 / window duration) is unchanged: you gain bandwidth rather than sharper peaks.
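The window-length relationship above (same sample rate, window twice as long) can be read straight off the DFT frequency grid. A quick sketch, assuming an 8000 Hz sample rate and windows of 1 s and 2 s; `bin_spacing_hz` is a hypothetical helper name.

```python
import numpy as np

def bin_spacing_hz(window_s, sample_rate):
    """Return (number of frequency buckets, spacing between them in Hz)."""
    n_samples = int(round(window_s * sample_rate))
    freqs = np.fft.rfftfreq(n_samples, d=1 / sample_rate)
    return len(freqs), freqs[1] - freqs[0]

short_window = bin_spacing_hz(1.0, 8000)   # 1 s window
long_window = bin_spacing_hz(2.0, 8000)    # 2 s window, same sample rate

print("1 s window: %d buckets, %.2f Hz apart" % short_window)
print("2 s window: %d buckets, %.2f Hz apart" % long_window)
```

The bucket spacing works out to sample_rate / n_samples, which is just 1 / (window duration), so the longer window should show roughly twice the buckets at half the spacing.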
