# How does the ear distinguish multiple different pitches?


## Main Question or Discussion Point

A chord played on a piano is multiple different pitches, each at a different frequency, played simultaneously. The ear hears these as separate sounds. The cochlea of the ear is organized by tone (tonotopic organization), from highest to lowest frequency. So a triad chord will stimulate three corresponding areas of hair cells in the cochlea separately.

My question is: how do the three tones get perceived that way? Why not as a single complex wave? It seems that the combined air-pressure waveform generated by three different notes should combine into a very complex wave pattern, through a sort of Fourier summation. Does the ear do some kind of "reverse" Fourier analysis to break this complex wave pattern up into its individual component waves?

This is even more puzzling when you listen to a whole symphony orchestra playing together. You can still distinguish the strings from the trumpets from the timpani from the flutes from the triangle, etc. I can clearly still distinguish all the different pitches and timbres, with all their associated overtones, ranging from the tuba to the piccolo. But I can't even imagine how complex the combined wave of all these instruments would appear if you put them all on the same graph. Somehow the ear is still able to sort them out and hear them as separate sounds.

And a related question regarding the stereo diaphragm: how does a single diaphragm vibrate to recreate all those different sounds, not to mention all the overtones? It must be vibrating as the sum of all those waves, in the same way the eardrum must be vibrating.

How?

Gold Member
The hairs in your ear are sized differently and resonate at different natural frequencies. A given hair vibrating tells the brain there is sound content at the corresponding frequency.

The ear essentially performs an analog Fourier transform in real time.
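The "analog Fourier transform" idea can be sketched numerically. The following is a toy illustration (the frequencies approximate a C-major triad; nothing here is specific to the ear): three pure tones are summed into one complex waveform by superposition, and each candidate frequency is then recovered by projecting the waveform onto a sinusoid at that frequency — a single hand-computed DFT bin, roughly what each resonant region of the cochlea does mechanically.

```python
import math

SR = 8000                         # sample rate (Hz); all values here are illustrative
N = 8000                          # one second of samples
TRIAD = (262.0, 330.0, 392.0)     # roughly C4, E4, G4

# The "complex wave" at the eardrum: three pure tones summed by superposition.
signal = [sum(math.sin(2 * math.pi * f * n / SR) for f in TRIAD)
          for n in range(N)]

def energy_at(freq, samples):
    """Project the waveform onto a sinusoid at `freq` (one DFT bin, by hand)."""
    re = sum(s * math.cos(2 * math.pi * freq * n / SR) for n, s in enumerate(samples))
    im = sum(s * math.sin(2 * math.pi * freq * n / SR) for n, s in enumerate(samples))
    return math.hypot(re, im) / len(samples)

for f in (262.0, 330.0, 392.0, 500.0):
    print(f, round(energy_at(f, signal), 3))   # 0.5 for each chord tone, 0.0 at 500 Hz
```

The three component frequencies come out of the "mess" cleanly because sinusoids at different frequencies are orthogonal over the analysis window.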

> The hairs in your ear are sized differently and resonate at different natural frequencies. A given hair vibrating tells the brain there is sound content at the corresponding frequency.
>
> The ear essentially performs an analog Fourier transform in real time.
I see. That makes sense. Thank you.

But then what about the diaphragm on the stereo that vibrates? I can't even imagine the level of complexity with which a single diaphragm like that has to vibrate to transmit all the sounds and pitches in a piece of music.

tnich
Homework Helper
> I see. That makes sense. Thank you.
>
> But then what about the diaphragm on the stereo that vibrates? I can't even imagine the level of complexity with which a single diaphragm like that has to vibrate to transmit all the sounds and pitches in a piece of music.
But the speaker diaphragm doesn't have to do any analysis. It just has to vibrate according to the input signal. One underlying assumption of the audio equipment is that you record a waveform and then reproduce it with a fidelity that is as high as you can make it (for the price).

To make that happen, you try to make everything in the system - the microphones, amplifiers, recording and playback equipment, recording media, and speakers - behave linearly. That is to say, if you record a signal $G(t) + H(t)$ containing sounds from two different sources, then at each step of the process the transformation $T$ applied to the signal has the property $T(G(t) + H(t)) = T(G(t)) + T(H(t)) = aG(t) + aH(t)$, where $a$ is a change in amplitude.

Speakers are just one of many parts of the system that need to adhere to that constraint, and they do so only imperfectly and over a limited range of frequencies. That is why you use a woofer, a tweeter and a midrange speaker. One diaphragm generally cannot transform the signal into pressure waves in an approximately linear fashion over the whole range of audible frequencies.

Interestingly, though, the ear does not seem to work that way. In certain cases, particularly with multiple voices singing, you can hear overtones (or undertones) that sound like an additional voice. That indicates that your ears and brain process sounds in a non-linear fashion.
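The linearity property above can be demonstrated with a toy numerical example (everything here is made up for illustration; no real audio chain is modeled): a pure-gain stage satisfies $T(G+H) = T(G) + T(H)$, while a clipping stage — one common kind of non-linearity — does not, which is exactly how new frequency components (intermodulation products) get created.

```python
def linear_amp(x, a=2.0):
    """A purely linear stage: just scales the signal."""
    return [a * s for s in x]

def clipping_amp(x, a=2.0, limit=1.5):
    """A non-linear stage: scales, then clips anything beyond +/-limit."""
    return [max(-limit, min(limit, a * s)) for s in x]

# Two "sources" (exact binary fractions, so equality checks are exact):
G = [0.0, 0.5, 1.0, 0.5, 0.0]
H = [0.0, 0.25, 0.75, 0.25, 0.0]
mix = [g + h for g, h in zip(G, H)]

# Linear: processing the mix equals mixing the processed parts.
assert linear_amp(mix) == [g + h for g, h in zip(linear_amp(G), linear_amp(H))]

# Non-linear: clipping the mix is NOT the sum of the clipped parts --
# the difference shows up as new frequency content (intermodulation).
assert clipping_amp(mix) != [g + h for g, h in zip(clipping_amp(G), clipping_amp(H))]
```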

A.T.
> I can't even imagine the level of complexity ...
You can look at the waveforms in audio software.

sophiecentaur
Gold Member
> The hairs in your ear are sized differently and resonate at different natural frequencies. A given hair vibrating tells the brain there is sound content at the corresponding frequency.
>
> The ear essentially performs an analog Fourier transform in real time.
A very close analogy to the way the cochlea works would be a piano with the dampers removed (actually, with them applied very lightly). If you shout at the piano then some of the strings will resonate but some won't. If you play a chord on a nearby piano, the same strings that were struck will resonate on the 'receiving' piano. You could imagine having a sensor on each of the 88 strings of the receiving piano, and you could then display the chord that was played on the other piano. There are many more than 88 hairs in the cochlea, and your brain is also aware of the variation in time of the 'amplitudes' of the received signals on the hairs.
"The ear essentially performs an analog Fourier transform in real time" could imply that there is something more fundamental about the time variation of a sound (the time-domain representation) than about its frequency content (the frequency domain). In fact they are of equal 'status'. Indeed, before the invention of the oscilloscope, frequency-domain information was very important for representing music: sheet music and the pins on a music box are a mixture of frequency- and time-domain information, and a chord played on the guitar or piano is purely frequency-domain.
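The undamped-piano analogy can be sketched numerically. Below is an illustrative toy model (the chord frequencies, damping and integration step are arbitrary choices, not cochlear parameters): a bank of lightly damped oscillators stands in for the "strings", a chord drives them all, and only the oscillators tuned near the chord's frequencies ring up.

```python
import math

SR = 16000                        # integration rate (Hz); all values illustrative
CHORD = (262.0, 330.0, 392.0)     # the "shouted" chord
ZETA = 0.01                       # light damping, like undamped piano strings

def peak_response(f0, seconds=0.3):
    """Drive an oscillator of natural frequency f0 with the chord and
    return the largest displacement it reaches (semi-implicit Euler)."""
    w0 = 2.0 * math.pi * f0
    x, v, dt, peak = 0.0, 0.0, 1.0 / SR, 0.0
    for n in range(int(seconds * SR)):
        drive = sum(math.sin(2 * math.pi * f * n * dt) for f in CHORD)
        v += (drive - 2 * ZETA * w0 * v - w0 * w0 * x) * dt
        x += v * dt
        peak = max(peak, abs(x))
    return peak

for f0 in (262.0, 330.0, 392.0, 500.0):
    print(f0, peak_response(f0))   # the 500 Hz "string" barely moves
```

A sensor on each oscillator reading off its swing would "display the chord", just as described above.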

> But the speaker diaphragm doesn't have to do any analysis. It just has to vibrate according to the input signal. One underlying assumption of the audio equipment is that you record a waveform and then reproduce it with a fidelity that is as high as you can make it (for the price).
>
> To make that happen, you try to make everything in the system - the microphones, amplifiers, recording and playback equipment, recording media, and speakers - behave linearly. That is to say, if you record a signal $G(t) + H(t)$ containing sounds from two different sources, then at each step of the process the transformation $T$ applied to the signal has the property $T(G(t) + H(t)) = T(G(t)) + T(H(t)) = aG(t) + aH(t)$, where $a$ is a change in amplitude.
>
> Speakers are just one of many parts of the system that need to adhere to that constraint, and they do so only imperfectly and over a limited range of frequencies. That is why you use a woofer, a tweeter and a midrange speaker. One diaphragm generally cannot transform the signal into pressure waves in an approximately linear fashion over the whole range of audible frequencies.
>
> Interestingly, though, the ear does not seem to work that way. In certain cases, particularly with multiple voices singing, you can hear overtones (or undertones) that sound like an additional voice. That indicates that your ears and brain process sounds in a non-linear fashion.
This makes sense, thank you.

What is interesting is that the eardrum itself is a membrane which should have a fundamental vibrating frequency of its own, i.e., if you tap on it like a timpani head it should vibrate at a particular frequency (I would think fairly high-pitched, given how small it is). Does that mean the ear hears that frequency better than others?

tnich
Homework Helper
> This makes sense, thank you.
>
> What is interesting is that the eardrum itself is a membrane which should have a fundamental vibrating frequency of its own, i.e., if you tap on it like a timpani head it should vibrate at a particular frequency (I would think fairly high-pitched, given how small it is). Does that mean the ear hears that frequency better than others?
The eardrum is a membrane, but it is unlike a drumhead in that it is firmly connected to a series of little bones called ossicles. So it is not free to vibrate at its fundamental frequency.

phinds
Gold Member
@Sophrosyne if you are interested in human sound mechanisms, check this out.

It took me a while to believe that the "polyphonic singing" is a real thing. Some of the examples are just flat amazing. This is just the first one I found in a Google search.

> A chord played on a piano is multiple different pitches, each at a different frequency, played simultaneously. The ear hears this as separate sounds. … My question is: how do the three tones get perceived that way? Why not as a single complex wave?
In my experience my ear is not a good frequency analyzer for continuous tones. Test: when listening to the sound of a square wave, I am unable to count the separate frequency components. And when listening to the mixture of two or three continuous pure tones, my ear experiences it primarily as a single sound. In my opinion, a chord played on a piano is not a fair test, because the hammers do not strike perfectly simultaneously, and the tones have different decay times.
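The square-wave point is easy to verify numerically even though the ear can't do it by listening: a square wave contains only odd harmonics, with amplitudes falling off as $4/(\pi k)$. A small hypothetical sketch (hand-rolled single-bin DFT; the 100 Hz fundamental is an arbitrary choice):

```python
import math

SR, N, F0 = 8000, 8000, 100.0     # one second of a (hypothetical) 100 Hz square wave

square = [1.0 if math.sin(2 * math.pi * F0 * n / SR) >= 0 else -1.0
          for n in range(N)]

def amplitude(freq):
    """Component strength at `freq` (a single hand-rolled DFT bin)."""
    re = sum(s * math.cos(2 * math.pi * freq * n / SR) for n, s in enumerate(square))
    im = sum(s * math.sin(2 * math.pi * freq * n / SR) for n, s in enumerate(square))
    return 2 * math.hypot(re, im) / N

# Odd harmonics come out near 4/(pi*k); even harmonics come out near zero
# (they are not exactly zero here because of the sampled wave's slight asymmetry).
for k in range(1, 8):
    print(k * F0, round(amplitude(k * F0), 3))
```

Counting those components by ear would mean resolving an infinite, ever-weaker ladder of partials, which is exactly why the square wave is heard as one sound.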

sophiecentaur
Gold Member
> What is interesting is that the eardrum itself is a membrane which should have a fundamental vibrating frequency of its own, i.e., if you tap on it like a timpani head it should vibrate at a particular frequency (I would think fairly high-pitched, given how small it is). Does that mean the ear hears that frequency better than others?
Firstly, the membrane has many modes of vibration, which could 'colour' the perceived sound. Secondly, the system has been 'engineered' to pass the received sound power along as efficiently as possible over a very wide frequency range (around ten octaves). This implies that the resonances are damped by the air in the ear canal at one end of the chain and by the cochlea at the other. (A significant resonance will only occur in a membrane if energy is allowed to build up on it, and that doesn't happen in the ear.)
A loudspeaker will have a large diaphragm (cone), and the cabinet (sides and internal void) will have natural resonances. These can sound dreadful unless the cabinet is damped quite drastically. It is the same idea as the sound transmission in the ear.
> In my experience my ear is not a good frequency analyzer for continuous tones. Test: when listening to the sound of a square wave, I am unable to count the separate frequency components. And when listening to the mixture of two or three continuous pure tones, my ear experiences it primarily as a single sound. In my opinion, a chord played on a piano is not a fair test, because the hammers do not strike perfectly simultaneously, and the tones have different decay times.
The analysis is very clever indeed, and we use all sorts of clues about the sound we hear when we identify its content and its source. We do both frequency and time analysis of the waveform. Actually, the time-domain representation of any sound that you see on a 'scope means very little to the casual observer, and it takes a lot of training and experience to extract much about the frequencies and waveforms from such a picture. Show someone a spectrum analyser display, though, and they could easily determine the notes and the chords (with a little help). On the other hand, at a much slower scan rate, the time-domain display can show the syllables and rhythms of speech and music.

russ_watters
Mentor
> And a related question regarding the stereo diaphragm: how does a single diaphragm vibrate to recreate all those different sounds, not to mention all the overtones? It must be vibrating as the sum of all those waves...
Yes: there is only one, albeit very complicated wave. The principle at work here is called "superposition":
http://www.acs.psu.edu/drussell/demos/superposition/superposition.html

Any two or more waves can be added together into a single, more complicated wave. This is how sound transmission and playback systems work (though different sounds are first recorded on individual channels before being combined).

Also, an object has its own natural frequency, but it can be driven to vibrate at any frequency as long as you keep applying a force to it. Objects are not limited to vibrating only at their natural frequency.
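That "driven at any frequency" point can be checked with a toy damped-oscillator simulation (all numbers here are arbitrary and illustrative): an object with a 440 Hz natural frequency, driven at 300 Hz, settles into steady-state motion at 300 Hz — the driving frequency, not its own.

```python
import math

SR = 16000                         # integration rate (Hz); values are illustrative
F_NATURAL, F_DRIVE = 440.0, 300.0  # object's natural frequency vs driving frequency
W0 = 2 * math.pi * F_NATURAL
ZETA = 0.2                         # heavy damping, so the transient dies quickly

x, v, dt = 0.0, 0.0, 1.0 / SR
samples = []
for n in range(SR):                # one simulated second, semi-implicit Euler
    drive = math.sin(2 * math.pi * F_DRIVE * n * dt)
    v += (drive - 2 * ZETA * W0 * v - W0 * W0 * x) * dt
    x += v * dt
    samples.append(x)

# Count upward zero crossings in the last half second: the steady-state
# motion follows the 300 Hz drive, not the 440 Hz natural frequency.
tail = samples[SR // 2:]
crossings = sum(1 for a, b in zip(tail, tail[1:]) if a < 0 <= b)
print(crossings)   # about 150, i.e. 300 Hz over half a second
```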

sophiecentaur
Gold Member
> That indicates that your ears and brain process sounds in a non-linear fashion.
And why not? You have to remember that the way we process information about the world around us is nothing like the way a sound recorder or a TV camera works. We evolved, rather than being designed by an engineer, and there are many apparently lunatic design aspects to the human body.
A couple of years ago I had a sonocardiogram, and I watched an image of my heart happily pumping away, with the valves apparently held in place by lots of flimsy-looking strings. It looked like a model a child could have put together with a supermarket plastic bag and bits of string. I had to look at it for a long time before I could believe that my life had been relying on that mechanism for many decades. It really works very well (even my slightly out-of-condition model)!
So trying to apply a conventional design critique to any part of us is going to confuse us. The body is always constructed differently from how we would build a replacement. Non-linearity is not a problem if you are appropriately analysing all the many signal channels that our senses use.

PeterO
Homework Helper
> A chord played on a piano is multiple different pitches, each at a different frequency, played simultaneously. The ear hears these as separate sounds. The cochlea of the ear is organized by tone (tonotopic organization), from highest to lowest frequency. So a triad chord will stimulate three corresponding areas of hair cells in the cochlea separately.
>
> My question is: how do the three tones get perceived that way? Why not as a single complex wave? It seems that the combined air-pressure waveform generated by three different notes should combine into a very complex wave pattern, through a sort of Fourier summation. Does the ear do some kind of "reverse" Fourier analysis to break this complex wave pattern up into its individual component waves?
>
> This is even more puzzling when you listen to a whole symphony orchestra playing together. You can still distinguish the strings from the trumpets from the timpani from the flutes from the triangle, etc. I can clearly still distinguish all the different pitches and timbres, with all their associated overtones, ranging from the tuba to the piccolo. But I can't even imagine how complex the combined wave of all these instruments would appear if you put them all on the same graph. Somehow the ear is still able to sort them out and hear them as separate sounds.
>
> And a related question regarding the stereo diaphragm: how does a single diaphragm vibrate to recreate all those different sounds, not to mention all the overtones? It must be vibrating as the sum of all those waves, in the same way the eardrum must be vibrating.
>
> How?
The key is:
"So a triad chord will stimulate three corresponding area of hair cells in the cochlea separately.
How do the three tones get perceived that way?"

Part of the ear is detecting the three pitches simultaneously, and the brain then interprets those sounds.
Generally the lowest pitch is what our brain interprets the note's pitch to be, and the other pitches then provide a "quality of sound" property. Indeed, even a single note on a piano consists of a number of frequencies (fundamental and overtones), and it is those frequencies and their relative intensities that let us recognise the sound as coming from a piano (provided you have previously seen and heard a piano at the same time).
Indeed:
If a double bass is recorded and then played back through a cheap sound system that simply cannot reproduce the fundamental frequency, only the overtones, our brain can recognise the "incomplete" set of "fundamental & overtones" and be "fooled" into "hearing" the lower note that was originally produced.
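This "missing fundamental" inference can be illustrated with a toy calculation (the specific note is a hypothetical choice, not data from the thread): if a small speaker delivers only the overtones of a 55 Hz bass note, the implied fundamental is simply the common divisor of the partials it does deliver.

```python
from functools import reduce
from math import gcd

# Harmonics 2..6 of a hypothetical 55 Hz double-bass note, with the
# 55 Hz fundamental itself missing, as a small speaker might deliver them:
overtones_hz = [110, 165, 220, 275, 330]

# The pitch the brain "hears" corresponds to the common spacing of the
# partials; numerically, that is their greatest common divisor:
implied_fundamental = reduce(gcd, overtones_hz)
print(implied_fundamental)   # → 55
```

The brain's actual pitch estimation is far subtler, of course, but the arithmetic shows why the overtone pattern alone is enough to pin down the missing note.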

sophiecentaur
Gold Member
> our brain can recognise the "incomplete" set of "fundamental & overtones" and be "fooled" into "hearing" the lower note that was originally produced.
Our brains are extremely good at making the most of limited information. They cope amazingly well with low levels of lighting and poor listening conditions. Of course, they sometimes don't get it right, but when you consider that the basic system evolved for the purposes of an early hunting hominid in jungle or savannah conditions, it does pretty well.

pinball1970
Gold Member
> @Sophrosyne if you are interested in human sound mechanisms, check this out.
>
> It took me a while to believe that the "polyphonic singing" is a real thing. Some of the examples are just flat amazing. This is just the first one I found in a Google search.
That is nothing short of amazing.

pinball1970
Gold Member
> @Sophrosyne if you are interested in human sound mechanisms, check this out.
>
> It took me a while to believe that the "polyphonic singing" is a real thing. Some of the examples are just flat amazing. This is just the first one I found in a Google search.
I always thought about this note from Paul Mac. It's either a top A or G - is it polyphonic? From 3.08-3.11

pinball1970
Gold Member
> In my experience my ear is not a good frequency analyzer for continuous tones. Test: when listening to the sound of a square wave, I am unable to count the separate frequency components. And when listening to the mixture of two or three continuous pure tones, my ear experiences it primarily as a single sound. In my opinion, a chord played on a piano is not a fair test, because the hammers do not strike perfectly simultaneously, and the tones have different decay times.
Agreed - strings are better for that "block" sound, or anything else that has a low percussive element to it.