# How to separate one sound from another?

For example, a song contains instruments such as guitar, piano, and violin, plus a singer's voice. How can we separate them and recover each individual sound? Humans can easily tell that a song consists of several instruments and a singing voice. But how exactly could we do that in theory?

If that can be done, I believe it may change our perspective on signal processing. It would also be very useful in many applications.

You'd usually do something like take a Fourier transform of the signal, develop a sort of "template signal" for each instrument, and match it. The worse the quality of the recording, the harder this is to do and the more advanced the techniques you need.
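
A rough sketch of the template idea, under very friendly assumptions: both "instruments" play the same note, perfectly in phase, with no noise, and their harmonic weights are made-up numbers rather than measured ones. The names `tone`, `magnitude_spectrum`, `piano_like` and `violin_like` are hypothetical. It only estimates how much of each template is present in the mix; it does not recover the separate waveforms.

```python
import numpy as np

SR = 8000   # sample rate in Hz (assumed)
N = 8000    # one second of audio, so FFT bins are exactly 1 Hz apart

def tone(f0, harmonic_amps):
    """Sum of sine harmonics of f0 with the given relative amplitudes."""
    t = np.arange(N) / SR
    return sum(a * np.sin(2 * np.pi * f0 * (k + 1) * t)
               for k, a in enumerate(harmonic_amps))

def magnitude_spectrum(x):
    """Magnitude of the real FFT, scaled so a unit sine gives a peak of 1."""
    return np.abs(np.fft.rfft(x)) / (N / 2)

# Hypothetical templates: same note (about middle C), different harmonic weights.
piano_like = tone(262, [1.0, 0.5, 0.25, 0.1])
violin_like = tone(262, [1.0, 0.9, 0.7, 0.6])

# The "recording": an idealized, in-phase, noise-free mixture.
mix = 0.7 * piano_like + 0.3 * violin_like

# Match the mixture spectrum against the templates by least squares.
templates = np.column_stack([magnitude_spectrum(piano_like),
                             magnitude_spectrum(violin_like)])
gains, *_ = np.linalg.lstsq(templates, magnitude_spectrum(mix), rcond=None)
print(gains)  # estimated contribution of each template
```

With a real recording the templates would not match this cleanly, which is where the more advanced techniques come in.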

DavidSnider
Gold Member
Isn't this akin to trying to remove a watermark from an image? You'd have to know the original function in order to separate the two "signals".

When you add together multiple signals it's impossible to know if it was 1 + 2 + 3 = 6 or 0 + 3 + 3, etc.

But I'm sure there are tricky ways to make very good guesses based on probability.
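
To make the ambiguity concrete, here is a small sketch with random stand-in signals (`s1`, `s2` and `d` are arbitrary, not real audio). Adding any signal to one source and subtracting it from the other leaves the mixture unchanged, so the mixture alone cannot single out the original pair.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two arbitrary stand-in "sources" and their single-channel mixture.
s1 = rng.standard_normal(1000)
s2 = rng.standard_normal(1000)
mix = s1 + s2

# Any signal d at all gives another pair of sources with the same sum.
d = rng.standard_normal(1000)
alt1, alt2 = s1 + d, s2 - d

same_mixture = np.allclose(alt1 + alt2, mix)
print(same_mixture)
```

Extra knowledge about what the sources can look like is what rules out most of these alternative pairs.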

> You'd usually do something like take a Fourier transform of the signal, develop a sort of "template signal" for each instrument, and match it. The worse the quality of the recording, the harder this is to do and the more advanced the techniques you need.

I think a Fourier transform is not enough. For example, if we mix a C from a piano with a C from a violin, the frequency is the same but the sound is different. I think we need more than the frequency domain to recognize it.
Template matching may work for verifying or identifying an individual sound, but not for separating a signal into its components.

diazona
Homework Helper
The distinction between e.g. a piano C and a violin C lies in the relative intensities of the higher harmonics. This pattern of relative intensities is roughly characteristic of each instrument. So if you had the waveform of a perfectly ideal piano/violin duet playing a unison C, you could probably tell that the intensities of the harmonics were the sums of the piano intensities and violin intensities, and thereby identify the instruments. But in a real recording, you'd have to deal with phase differences, various levels of interference, echoes, random noise, etc. - basically all sorts of effects that would make it very difficult to clearly identify which instruments were contributing to a waveform.
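
A small numerical sketch of that point, with made-up harmonic weights standing in for the two instruments (`piano_amps` and `violin_amps` are hypothetical, not measured). In the ideal in-phase case the harmonic magnitudes of the duet are simply the sums; a phase offset on one instrument is already enough to break that.

```python
import numpy as np

SR = 8000   # sample rate in Hz (assumed)
N = 8000    # one second, so FFT bin k corresponds to k Hz
F0 = 262    # fundamental in Hz, roughly middle C
t = np.arange(N) / SR

def tone(harmonic_amps, phase=0.0):
    """Harmonics of F0; the same phase offset is applied to every harmonic."""
    return sum(a * np.sin(2 * np.pi * F0 * (k + 1) * t + phase)
               for k, a in enumerate(harmonic_amps))

def harmonic_magnitudes(x, n_harmonics=4):
    """Spectrum magnitude read off at the first few multiples of F0."""
    spec = np.abs(np.fft.rfft(x)) / (N / 2)
    return np.array([spec[F0 * (k + 1)] for k in range(n_harmonics)])

piano_amps = np.array([1.0, 0.5, 0.25, 0.1])   # hypothetical timbre
violin_amps = np.array([1.0, 0.9, 0.7, 0.6])   # hypothetical timbre

# Ideal duet: both in phase, magnitudes are the sums of the two patterns.
in_phase = harmonic_magnitudes(tone(piano_amps) + tone(violin_amps))

# Same duet with the violin shifted by a quarter cycle: the sums no longer hold.
shifted = harmonic_magnitudes(tone(piano_amps) + tone(violin_amps, np.pi / 2))

print(in_phase)
print(shifted)
```

In a real recording the phase relationships are unknown and change over time, on top of the echoes and noise mentioned above.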

You're certainly not the only one wondering about how to do that, though ;-)

> Isn't this akin to trying to remove a watermark from an image? You'd have to know the original function in order to separate the two "signals".
>
> When you add together multiple signals it's impossible to know if it was 1 + 2 + 3 = 6 or 0 + 3 + 3, etc.
>
> But I'm sure there are tricky ways to make very good guesses based on probability.

No, this is not about that. Think about the applications of this technique. If we could separate a mixed sound into its components, we could build very effective speech recognition. But first we need voice recognition. People often confuse the two: speech recognition recognizes speech (what is being said), while voice recognition recognizes the voice (whose voice it is).

I think this is not about addition but something else, and not about probability. I believe there are several patterns that we don't know yet, because we humans can easily name the instruments that make up a song.

Well, with something like voice recognition things get a lot more complicated. The general approach is probably some form of stochastic signal analysis and machine learning (I could see using something like a genetic algorithm that mashes together base frequencies and matches them against an atlas of clean voices).

f95toli
Gold Member
> For example, a song contains instruments such as guitar, piano, and violin, plus a singer's voice. How can we separate them and recover each individual sound? Humans can easily tell that a song consists of several instruments and a singing voice. But how exactly could we do that in theory?

We are not nearly as good at this as you might think. During the production of a record, one of the main tasks of the mixing engineer is to make sure the listener can separate the main instruments. There are many ways to achieve this, the most obvious being to use equalizers (i.e. filtering) to "distribute" the various instruments throughout the spectrum. Sometimes this leaves only a small range around the fundamental of the instrument (this is quite common for acoustic guitars). Another common technique is to "duck" instruments with respect to each other, e.g. the bass is ducked with respect to the kick. Last but not least, the instruments are distributed in "space", i.e. the stereo field (two instruments that use the same part of the spectrum can be panned hard right/left).
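
For anyone unfamiliar with ducking, here is a deliberately crude sketch. The "bass" and "kick" are synthetic stand-ins, and `envelope`, `duck`, the threshold, the reduction factor and the window length are all made-up illustration values. A real compressor would smooth the gain with attack and release times instead of switching it abruptly.

```python
import numpy as np

SR = 1000                     # sample rate in Hz (assumed, kept low for the demo)
t = np.arange(2 * SR) / SR    # two seconds of samples

# Synthetic stand-ins: a steady 60 Hz "bass" and a decaying 50 Hz "kick"
# that is retriggered once per second.
bass = np.sin(2 * np.pi * 60 * t)
kick = np.exp(-20 * (t % 1.0)) * np.sin(2 * np.pi * 50 * t)

def envelope(x, window):
    """Moving average of |x|, used as a crude level detector."""
    kernel = np.ones(window) / window
    return np.convolve(np.abs(x), kernel, mode="same")

def duck(signal, sidechain, threshold=0.1, reduction=0.3, window=50):
    """Turn `signal` down to `reduction` wherever `sidechain` is loud."""
    env = envelope(sidechain, window)
    gain = np.where(env > threshold, reduction, 1.0)
    return signal * gain, gain

ducked_bass, gain = duck(bass, kick)
print(gain[20], gain[500])  # gain shortly after a kick hit vs. between hits
```

The bass is pulled down only while the kick is sounding, so the two are easier to tell apart even though they occupy almost the same part of the spectrum.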

Now, there are of course many cases where we DON'T want the listener to separate the sounds. It is quite often the case that what we perceive as a single instrument is really several layered sounds (modern pop songs often use 30-40 different tracks), e.g. a synth bass "underneath" the main bass line to make it sound "thicker" (not to mention compressed side-chains, reverbs, delays, etc.).

Hence, our ability to separate instruments is to a large extent artificial. It is much more difficult to do so if the song is just a straightforward recording; a typical example would be a symphony orchestra with all the strings playing at once.

Note that the reason why we can "remove" so much of a sound is that our brains are very good at "adding" the missing bits, and this is true in general. Even when we THINK we can separate two sounds, it is usually the case that we are really only hearing some parts of the spectrum and our brains fill in the blanks. This is incidentally also used by mp3 and other psycho-acoustic compression methods.