Sound waves

If a basic sine sound wave is analysed with a Fourier transform, the result is just a spike at a certain frequency. My maths isn't the best so bear with me... if we take a real sound file and take Fourier transforms at regular intervals (I assume that's what's being done when calculating a spectrum over a range of time of a sound file), I get a spectrum that no longer has spikes and instead has broader, mountain-like peaks. My question is, since real sounds aren't stationary, and taking into account the additivity of Fourier transforms (i.e. from my understanding, you can simply add Fourier transforms without, um... data loss?), is the reason for these mountain peaks rather than spikes simply the averaging process?

Sorry, I'm being a bit stupid, I'm not understanding what you're asking - could you possibly post a pic of what you're describing? There's a load of Java apps on t'internet that should draw it for you. Cheers!

Simon Bridge
Homework Helper
An arbitrary sound will be composed of a superposition of sine waves ... the Fourier transform of an actual sound shape would therefore be a lot of spikes for the periodic components.

Don't try this with cats.
http://xkcd.com/26/

Basically, in a sound editing program there's an option to view the Fourier transform over a period of time... if you view this Fourier transform at any given time, it's as Simon Bridge pointed out: a bunch of spikes that correspond to the individual periodic components. However, since we're dealing with a range of time, my understanding is that what's happening is this: say we have a sound composed of just one sine wave at 1000 Hz. Since real-world sounds aren't stationary, that 1000 Hz tone might drift up and down a little, so over a range of time it would appear that the sound is composed of different frequencies, even though it's the same one that simply isn't stationary (from a stats point of view). So my question was simply whether that's most likely what is happening, i.e. do the spikes become peaks when you extend the time period of analysis? As I explain this, it makes more and more sense to me and I believe this is indeed what is happening, but I just want to make sure.

The reason for the 'mountain' instead of a spike is a process called 'windowing'.

The Fourier transform of a sine wave is indeed a spike at its base frequency. BUT that is the Fourier transform of a sine wave that goes on forever (if you compute it by hand you take the integral from minus infinity to infinity).

If, however, you take the transform over a finite interval of time, the peaks are broadened. If you know a bit of the mathematics behind this, you know that multiplying two signals in the time domain is the same as convolving their spectra in the frequency domain. So taking a transform over a finite interval is the same as taking the transform of an infinite sine multiplied by a function which is one inside the interval and zero outside it. That function is called a 'window'.

So, to find the spectrum of a 'real life' sine (like in a sound editing program), you need to convolve the spectrum of the sine with the spectrum of the window function. This convolution broadens the spectrum.
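Written out, the windowing argument looks roughly like this. It is only a sketch under my own assumptions: a cosine of frequency f_0 instead of a sine (to keep the signs simple), a rectangular window of length T centred on t = 0, and symbols that don't appear in the posts above.

```latex
x_T(t) = \cos(2\pi f_0 t)\, w(t), \qquad
w(t) = \begin{cases} 1, & |t| \le T/2 \\ 0, & \text{otherwise} \end{cases}

W(f) = \int_{-T/2}^{T/2} e^{-2\pi i f t}\, dt = \frac{\sin(\pi f T)}{\pi f}

X_T(f) = \tfrac{1}{2}\bigl[\delta(f - f_0) + \delta(f + f_0)\bigr] * W(f)
       = \tfrac{1}{2}\bigl[W(f - f_0) + W(f + f_0)\bigr]
```

Each spike is replaced by a copy of W, whose first zeros sit at f_0 ± 1/T. The main peak is therefore about 2/T wide, so doubling the length of the sine halves the width of the 'mountain'.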

Maybe you can try this to see it for yourself in a sound editing program: take a sine wave of 1 sec, 2 sec, 4 sec, ... and compare the spectra. You should see the spectrum getting peakier (thinner) as the sine wave gets longer...

See this as well: http://en.wikipedia.org/wiki/Window_function
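If you don't have a sound editor handy, the same experiment can be sketched in Python with NumPy. Everything specific here is an assumption of mine: the 1000 Hz tone, the 8 kHz sample rate, the durations, and the helper name `half_height_width_hz`. It measures how wide the spectral peak is at half its maximum height for sines of different lengths.

```python
import numpy as np

def half_height_width_hz(duration_s, freq_hz=1000.0, sample_rate=8000, pad_factor=16):
    """Width in Hz of the spectral peak at half its maximum, for a sine of the given length."""
    n = int(duration_s * sample_rate)
    t = np.arange(n) / sample_rate
    x = np.sin(2 * np.pi * freq_hz * t)      # cutting the sine off is the rectangular window
    n_fft = pad_factor * n                   # zero-padding only draws the same spectrum on a finer grid
    mag = np.abs(np.fft.rfft(x, n=n_fft))
    freqs = np.fft.rfftfreq(n_fft, d=1 / sample_rate)
    above = freqs[mag >= 0.5 * mag.max()]    # frequencies inside the main peak
    return above.max() - above.min()

widths = {d: half_height_width_hz(d) for d in (0.25, 0.5, 1.0, 2.0)}
for d, w in widths.items():
    print(f"{d:5.2f} s -> half-height width ~ {w:.2f} Hz")
```

The printed widths should shrink roughly in proportion to 1/duration, which is the 'getting thinner' effect described above.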

Simon Bridge
Homework Helper
Windo... ah: jahaan beat me to it :)

In general, sound waves are travelling waves.
The Fourier transform of a travelling wave with a single frequency is still a spike. Try it.
If the frequency somehow changed continuously in time, say because the source is accelerating wrt the air, the Fourier transform would just be time-dependent with the change. If you took a long sample (compared with the rate at which the frequency changes), the Fourier transform would look like whatever combination of discrete frequencies would make the final waveform.
You could have a wave constructed from a continuous range of frequency components though :) The Fourier transform does not have to be all spikes. You can have fun constructing frequency spectra and doing the inverse transform to see what the waveform would look like.
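Here is one way to play with that in Python/NumPy, as a rough sketch. The spectrum is made up for illustration (a smooth Gaussian bump centred on 1000 Hz with a 50 Hz spread, at an assumed 8 kHz sample rate); none of those numbers come from the thread.

```python
import numpy as np

sample_rate = 8000                      # Hz (assumed)
n = 8000                                # one second of samples
freqs = np.fft.rfftfreq(n, d=1 / sample_rate)

# Hypothetical continuous spectrum: a Gaussian bump around 1000 Hz, not a spike.
spectrum = np.exp(-0.5 * ((freqs - 1000.0) / 50.0) ** 2)

waveform = np.fft.irfft(spectrum, n=n)  # real waveform that has this spectrum
waveform = np.fft.fftshift(waveform)    # move the burst from t = 0 to the middle of the record

envelope_peak = int(np.argmax(np.abs(waveform)))
edge_level = float(np.max(np.abs(waveform[:100])) / np.max(np.abs(waveform)))
print(envelope_peak, edge_level)
```

The result is a short burst of 1000 Hz oscillation in the middle of the record that dies away towards the edges: a broad peak in frequency goes with a waveform that is localised in time.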

Besides windowing, another source of line broadening in real-world sounds would be statistical uncertainties in the measuring process.

The basic situation is that if the portion of the sound wave that you've captured in your data window is not entirely periodic within that window, then the FT or DFT will be inexact (at best) or wildly inaccurate (at worst). If a sound wave varies in time, which is very likely, then it's highly unlikely that it will be periodic within the data window. Lengthening the data window will reduce the inaccuracies as well as increase the frequency resolution (giving sharper peaks).
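A small NumPy sketch of the 'periodic within the window' point, with numbers I picked for illustration (8 kHz sample rate, an 800-sample window, so DFT bins are 10 Hz apart). A 1000 Hz sine fits exactly 100 cycles into the window; a 1005 Hz sine fits 100.5 cycles and so is not periodic within it. The helper `significant_bins` is hypothetical and just counts bins above 1% of the largest one.

```python
import numpy as np

sample_rate = 8000          # Hz (assumed)
n = 800                     # window length in samples -> bin spacing of 10 Hz
t = np.arange(n) / sample_rate

def significant_bins(freq_hz, threshold=0.01):
    """Count DFT bins whose magnitude exceeds `threshold` times the largest bin."""
    mag = np.abs(np.fft.rfft(np.sin(2 * np.pi * freq_hz * t)))
    return int(np.sum(mag > threshold * mag.max()))

periodic_bins = significant_bins(1000.0)   # exactly 100 cycles fit in the window
leaky_bins = significant_bins(1005.0)      # 100.5 cycles: not periodic in the window
print(periodic_bins, leaky_bins)
```

The periodic case puts all its energy into a single bin, while the non-periodic case smears it across many neighbouring bins (usually called spectral leakage).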

Can I ask a secondary question which may potentially clear up a confusion that's been in my brain for a while - when we split the wave into subsections in time, are we using the Nyquist frequency to do this? And when Phil says "inexact (at best) or wildly inaccurate (at worst)", would the critical limit between these two be when the frequency of divisions in the sample is half the maximum frequency of sound contained within the sample?

A typical symphony is made of 200 million sine waves. When a computer plots a spectrum of said symphony, 200 million spikes must fit into a window 1000 pixels wide, which will cause some spikes to fuse into some kind of mountain tops.

Discrete Fourier transform of data = spectrum of the data
discrete = spiky

Simon Bridge
Homework Helper
Can I ask a secondary question which may potentially clear up a confusion that's been in my brain for a while - when we split the wave into subsections in time, are we using the Nyquist frequency to do this?
The Nyquist frequency is set by the sampling rate. When you choose the sampling rate you also choose the Nyquist frequency.
http://en.wikipedia.org/wiki/Nyquist_frequency

And when Phil says "inexact(at best) or wildly inaccurate (at worst)", would the critical limit between these two be when the frequency of divisions in the sample is half the maximum frequency of sound contained within the sample?
See the wiki article - to get a good reconstruction of a signal you need the Nyquist frequency to be at least a bit higher than the highest frequency in the signal ... to get this, you choose a sampling frequency more than twice that highest frequency.

For example, if the sample rate is 20 kHz, the Nyquist frequency is 10 kHz, and an 11 kHz signal will be indistinguishable from a (20-11=) 9 kHz signal and 'tother way too. You need some reason to believe it's one and not the other to do the reconstruction correctly. i.e. if you happen to know that all the signals of interest have f < 10kHz then you set your sampling rate for fsam > 20kHz = 2xfsig.
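The 11 kHz versus 9 kHz example can be checked numerically. This is a minimal NumPy sketch; I've assumed cosine tones and 64 samples purely for illustration.

```python
import numpy as np

sample_rate = 20_000                    # Hz, so the Nyquist frequency is 10 kHz
n = np.arange(64)                       # sample indices
t = n / sample_rate                     # sample times in seconds

tone_11k = np.cos(2 * np.pi * 11_000 * t)
tone_9k = np.cos(2 * np.pi * 9_000 * t)

max_difference = float(np.max(np.abs(tone_11k - tone_9k)))
print(f"largest sample-by-sample difference: {max_difference:.2e}")
```

The two sets of samples agree to within floating-point rounding, so nothing in the sampled data can tell the two tones apart. (With sines instead of cosines the samples match up to a sign flip, which is the same ambiguity.)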

But I don't think that's what Phil was talking about - I'll leave it to him to explain :)
