Fourier Analysis of Real Sound Waves


Discussion Overview

The discussion revolves around the Fourier analysis of real sound waves, particularly focusing on the differences between the Fourier transform of ideal sine waves and the transforms of real-world sound files. Participants explore concepts such as windowing, frequency resolution, and the implications of non-stationary signals on the resulting spectra.

Discussion Character

  • Exploratory
  • Technical explanation
  • Debate/contested
  • Mathematical reasoning

Main Points Raised

  • Some participants note that the Fourier transform of a basic sine wave results in a spike at its frequency, while real sound files yield broader peaks due to non-stationarity and averaging effects.
  • One participant suggests that the appearance of mountain peaks instead of spikes in the spectrum may be due to the averaging process over time intervals.
  • Another participant explains that windowing, which involves taking the Fourier transform over a finite interval, leads to the broadening of peaks in the spectrum.
  • It is mentioned that if a sound wave is not periodic within the data window, the Fourier transform may be inaccurate, and that lengthening the window improves frequency resolution and yields sharper peaks.
  • Some participants discuss the role of the Nyquist frequency in sampling and its relationship to the accuracy of the Fourier transform.
  • There is a mention of statistical uncertainties in measuring real-world sounds contributing to line broadening in the spectrum.

Areas of Agreement / Disagreement

Participants express various viewpoints on the effects of windowing and non-stationarity on the Fourier transform, indicating that multiple competing views remain. The discussion does not reach a consensus on the implications of these factors.

Contextual Notes

Limitations include the dependence on the definitions of periodicity and non-stationarity, as well as unresolved mathematical steps regarding the implications of windowing and sampling rates on the Fourier transform.

Atomic_Sheep
If a basic sine sound wave is analysed with a Fourier transform, the result is just a spike at a certain frequency. My maths isn't the best so bear with me... if we take a real sound file and take Fourier transforms at regular intervals (I assume that's what's being done when calculating a spectrum over a range of time of a sound file), I get a spectrum that no longer has spikes and instead has something more like mountain peaks. My question is: since real sounds aren't stationary, and taking into account the additivity of Fourier transforms (i.e. from my understanding, you can simply add Fourier transforms without um... data loss?), is the reason for these mountain peaks rather than spikes simply the averaging process?
 
Sorry, I'm being a bit stupid, I'm not understanding what you're asking - could you possibly post a pic of what you're describing? There's a load of java apps on t'internet that should draw it for you. Cheers!
 
An arbitrary sound will be composed of a superposition of sine waves ... the Fourier transform of an actual sound shape would therefore be a lot of spikes for the periodic components.

Don't try this with cats.
http://xkcd.com/26/
 
Basically, in a sound editing program there's an option to view the Fourier transform over a period of time. If you view this Fourier transform at any given instant it is, as Simon Bridge pointed out, a bunch of spikes that correspond to the individual periodic components. However, since we're dealing with a range of time, my understanding is this: say we have a sound composed of just one sine wave at 1000 Hz. Since real-world sounds aren't stationary, that 1000 Hz tone might go up and down a little, so over a range of time it would appear that the sound is composed of different frequencies even though it's the same one that is simply not stationary (from a stats point of view). So my question was whether that is most likely what is happening, i.e. the spikes become peaks when you extend the time period of analysis? As I explain this it makes more and more sense to me, and I believe this is what is happening, but I just want to make sure.
 
The reason for the 'mountain' instead of a spike is a process called 'windowing'.

The Fourier transform of a sine wave is indeed a spike at its base frequency. BUT that is the Fourier transform of a sine wave that goes on forever (if you compute it by hand you take the integral from minus infinity to infinity).

If, however, you take the transform over a finite interval of time, the peaks are 'broadened'. If you know a bit of the mathematics behind this, you know that multiplying two signals in the time domain is the same as convolving their spectra in the frequency domain. So taking a transform over a finite interval is the same as taking the transform of an infinite sine multiplied by a function which is one inside this interval and zero outside it. That function is called a 'window'.

So, to find the spectrum of a 'real life' sine (like in a sound editing program), you need to convolve the spectrum of the window function with the spectrum of the sine. This convolution broadens the spectrum.
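To make the convolution argument concrete, here is a short worked version for a rectangular window. The symbols (window length T, tone frequency f_0, window w, spectra X and W) are my own notation for this sketch, not something defined earlier in the thread.

```latex
\begin{aligned}
x_T(t) &= x(t)\,w(t), \qquad
w(t) = \begin{cases} 1, & |t| \le T/2 \\ 0, & \text{otherwise} \end{cases} \\[4pt]
X_T(f) &= (X * W)(f) = \int_{-\infty}^{\infty} X(\nu)\, W(f-\nu)\, d\nu, \qquad
W(f) = \frac{\sin(\pi f T)}{\pi f} \\[4pt]
x(t) &= \cos(2\pi f_0 t)
\;\Rightarrow\;
X(f) = \tfrac{1}{2}\bigl[\delta(f-f_0) + \delta(f+f_0)\bigr]
\;\Rightarrow\;
X_T(f) = \tfrac{1}{2}\bigl[W(f-f_0) + W(f+f_0)\bigr]
\end{aligned}
```

Each spike is replaced by a copy of W centred on it. The first zeros of W sit at f = ±1/T, so the main lobe is 2/T wide and gets narrower as the window gets longer.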

Maybe you can try this to see it for yourself in a sound editing program: take a sine wave of 1 sec, 2 sec, 4 sec, ... and compare the spectra. You should see the spectrum getting peakier (thinner) as the sine wave gets longer...

See this as well: http://en.wikipedia.org/wiki/Window_function
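If you don't have a sound editor handy, the 1 s / 2 s / 4 s experiment can be sketched in Python with NumPy. The sample rate, tone frequency, zero-padded FFT length and the "width at half the peak magnitude" measure are all assumptions chosen for this illustration.

```python
import numpy as np

fs = 8000          # sample rate in Hz (assumed for this sketch)
f0 = 1000          # tone frequency in Hz (assumed)
n_fft = 2 ** 20    # zero-padded FFT length, so every duration uses the same fine grid


def main_lobe_width(duration):
    """Width in Hz of the region around the peak that is at least half the peak magnitude."""
    n = int(fs * duration)
    t = np.arange(n) / fs
    x = np.sin(2 * np.pi * f0 * t)                # finite-length sine = sine times rectangular window
    mag = np.abs(np.fft.rfft(x, n=n_fft))         # zero-padding only interpolates the spectrum
    freqs = np.fft.rfftfreq(n_fft, d=1 / fs)
    above = freqs[mag >= mag.max() / 2]
    return above.max() - above.min()


widths = {d: main_lobe_width(d) for d in (1, 2, 4)}
for d, w in widths.items():
    print(f"{d} s of sine -> peak about {w:.2f} Hz wide at half height")
```

Each doubling of the duration should roughly halve the printed width, which is the 'peakier' effect described above.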
 
Windo... ah: jahaan beat me to it :)

I'll add:
In general, sound waves are traveling waves.
The Fourier transform of a traveling wave with a single frequency is still a spike. Try it.
If the frequency somehow changed continuously in time, say the source is accelerating wrt the air, the Fourier transform would just be time-dependent with the change. If you took a long sample (compared with the rate the frequency changes), the Fourier transform would look like whatever combination of discrete frequencies would make the final waveform.
You could have a wave constructed from a continuous range of frequency components though :) The Fourier transform does not have to be all spikes. You can have fun constructing frequency spectra and doing the inverse transform to see what the waveform would look like.
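Here is a rough numerical sketch of the "frequency changes during the sample" case. It compares a steady 1000 Hz tone with one that drifts linearly from 990 Hz to 1010 Hz over the same one-second stretch; the drift range, sample rate and half-height width measure are assumptions made up for this example.

```python
import numpy as np

fs = 8000                 # sample rate in Hz (assumed)
duration = 1.0            # length of the analysed stretch in seconds (assumed)
n_fft = 2 ** 18           # zero-padded FFT length for a fine frequency grid
t = np.arange(int(fs * duration)) / fs


def half_max_extent(x):
    """Span in Hz between the lowest and highest frequency at or above half the peak magnitude."""
    mag = np.abs(np.fft.rfft(x, n=n_fft))
    freqs = np.fft.rfftfreq(n_fft, d=1 / fs)
    above = freqs[mag >= mag.max() / 2]
    return above.max() - above.min()


steady = np.sin(2 * np.pi * 1000.0 * t)

# Instantaneous frequency f(t) = f_start + (f_end - f_start) * t / duration;
# the phase is 2*pi times the integral of f(t).
f_start, f_end = 990.0, 1010.0
phase = 2 * np.pi * (f_start * t + 0.5 * (f_end - f_start) / duration * t ** 2)
drifting = np.sin(phase)

steady_width = half_max_extent(steady)
drifting_width = half_max_extent(drifting)
print(f"steady tone:   {steady_width:.2f} Hz")
print(f"drifting tone: {drifting_width:.2f} Hz")
```

The steady tone is limited only by the window, while the drifting one smears across roughly the range it swept through, so a long sample of a wandering tone looks like a mountain rather than a spike.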

Besides windowing, another source of line broadening in real-world sounds would be statistical uncertainties in the measuring process.
 
The basic situation is that if the portion of the sound wave that you've captured in your data window is not entirely periodic within that window, then the FT or DFT will be inexact (at best) or wildly inaccurate (at worst). If a sound wave varies in time, which is very likely, then it's highly unlikely that it will be periodic within the data window. Lengthening the data window will reduce the inaccuracies as well as increase the frequency resolution (giving sharper peaks).
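A small sketch of the "not periodic within the window" effect, assuming a one-second window sampled at 1000 Hz so the DFT bins are 1 Hz apart. The two test frequencies and the 1% threshold are arbitrary choices for illustration.

```python
import numpy as np

fs = 1000                  # sample rate in Hz (assumed)
n = 1000                   # samples per window: 1 s, so DFT bins are 1 Hz apart
t = np.arange(n) / fs


def bins_above(freq_hz, threshold=0.01):
    """Count DFT bins whose magnitude exceeds `threshold` times the largest bin."""
    mag = np.abs(np.fft.rfft(np.sin(2 * np.pi * freq_hz * t)))
    return int(np.sum(mag > threshold * mag.max()))


periodic = bins_above(100.0)   # exactly 100 cycles fit in the window
leaky = bins_above(100.5)      # 100.5 cycles: the window cuts the wave mid-cycle

print("bins above 1% of peak, 100.0 Hz:", periodic)
print("bins above 1% of peak, 100.5 Hz:", leaky)
```

When a whole number of cycles fits in the window, all the energy lands in a single bin. When it doesn't, the energy leaks into many neighbouring bins, which is the inexactness described above.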
 
Can I ask a secondary question which may potentially clear up a confusion that's been in my brain for a while - when we split the wave into subsections in time, are we using the Nyquist frequency to do this? And when Phil says "inexact (at best) or wildly inaccurate (at worst)", would the critical limit between these two be when the frequency of divisions in the sample is half the maximum frequency of sound contained within the sample?
 
Atomic_Sheep said:
If a basic sine sound wave is analysed with a Fourier transform, the result is just a spike at a certain frequency. My maths isn't the best so bear with me... if we take a real sound file and take Fourier transforms at regular intervals (I assume that's what's being done when calculating a spectrum over a range of time of a sound file), I get a spectrum that no longer has spikes and instead has something more like mountain peaks. My question is: since real sounds aren't stationary, and taking into account the additivity of Fourier transforms (i.e. from my understanding, you can simply add Fourier transforms without um... data loss?), is the reason for these mountain peaks rather than spikes simply the averaging process?


A typical symphony is made of 200 million sine waves. When a computer plots a spectrum of said symphony, 200 million spikes must fit into a 1000-pixel-wide window, which will cause some spikes to fuse into some kind of mountain tops.

Discrete Fourier transform of data = spectrum of the data
discrete = spiky
 
  • #10
MikeyW said:
Can I ask a secondary question which may potentially clear up a confusion that's been in my brain for a while - when we split the wave into subsections in time, are we using the Nyquist frequency to do this?
The Nyquist frequency is set by the sampling rate. When you choose the sampling rate you also choose the Nyquist frequency.
http://en.wikipedia.org/wiki/Nyquist_frequency

And when Phil says "inexact (at best) or wildly inaccurate (at worst)", would the critical limit between these two be when the frequency of divisions in the sample is half the maximum frequency of sound contained within the sample?
See the wiki article - to have a good reconstruction of a signal you need the Nyquist frequency to be at least a bit higher than the highest frequency in the sample ... to get this, you choose a sampling frequency of more than twice that highest frequency.

For example, if the sample rate is 20 kHz, the Nyquist frequency is 10 kHz, and an 11 kHz signal will be indistinguishable from a (20-11=) 9 kHz signal and 'tother way too. You need some reason to believe it's one and not the other to do the reconstruction correctly, i.e. if you happen to know that all the signals of interest have f_sig < 10 kHz, then you set your sampling rate f_sam > 20 kHz, which is more than twice the highest signal frequency.
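The 11 kHz versus 9 kHz example can be checked numerically. This sketch samples two cosines at 20 kHz and compares the sample values; the number of samples is an arbitrary choice, and cosines are used because with sines the two sets of samples differ by a sign.

```python
import numpy as np

fs = 20_000                          # sample rate in Hz, so the Nyquist frequency is 10 kHz
t = np.arange(200) / fs              # 200 sample instants (arbitrary count)

x11 = np.cos(2 * np.pi * 11_000 * t)   # above the Nyquist frequency
x9 = np.cos(2 * np.pi * 9_000 * t)     # its alias below the Nyquist frequency

max_diff = np.max(np.abs(x11 - x9))
print("largest difference between the two sets of samples:", max_diff)
```

The difference is zero up to floating-point rounding: once sampled, nothing in the data tells you which of the two tones was actually present.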

But I don't think that's what Phil was talking about - I'll leave it to him to explain :)
 
  • #11
Simon Bridge said:
Windo... ah: jahaan beat me to it :)

Besides windowing, another source of line broadening in real-world sounds would be statistical uncertainties in the measuring process.


Yeah, but Atomic_Sheep is asking about spikes broadening when the window becomes wider.

From post 7: "the spikes become peaks when you extend the time period of analysis"

a spike: |
a peak: /\
 
  • #12
Assuming that you keep the sample rate the same, doubling the window length will give you twice as many samples per window. That means you will end up with twice as many frequency buckets in your DFT, spaced half as far apart. Hence your frequency resolution improves by a factor of 2, and a peak will be twice as sharp if the signal really consists of a single steady tone.

The sampling rate (and hence the Nyquist frequency) affects the spectrum differently. If you double the sampling rate while keeping the window duration the same, you also get twice as many samples per window and twice as many frequency buckets, but they are spread over twice the frequency range (up to the new Nyquist frequency). The spacing between buckets, 1/(window duration), is unchanged, so the resolution does not improve; only a longer window does that.
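As a quick check of how the bucket spacing depends on window length and sample rate, here is a sketch using NumPy's `rfftfreq`. The specific sample rates and durations are assumptions picked for the example.

```python
import numpy as np


def dft_layout(fs, duration):
    """Return (samples, number of buckets, bucket spacing in Hz, top frequency in Hz)."""
    n = int(round(fs * duration))
    freqs = np.fft.rfftfreq(n, d=1 / fs)       # centre frequencies of the real-FFT buckets
    return n, len(freqs), freqs[1] - freqs[0], freqs[-1]


base = dft_layout(8000, 1.0)          # reference case
longer = dft_layout(8000, 2.0)        # same sample rate, window twice as long
faster = dft_layout(16000, 1.0)       # same window length, sample rate doubled

for name, row in (("base", base), ("longer window", longer), ("faster sampling", faster)):
    print(f"{name:16s} samples={row[0]:6d} buckets={row[1]:5d} "
          f"spacing={row[2]:.2f} Hz top={row[3]:.0f} Hz")
```

Doubling the window length halves the spacing between buckets. Doubling the sample rate doubles the number of buckets too, but the extra ones extend the spectrum upward to the new Nyquist frequency and leave the spacing as it was.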
 