Randomizing phases of harmonics

  • #1
entropy1
Gold Member
1,042
60
Suppose I decompose a discrete audio signal into a set of frequency components. If I then add those harmonics back together, I get the original discrete signal. My question is: if I randomize the phases of the harmonics first and then add them, I get a different signal, but would it sound the same as the original, in which the phases were left untouched?
 

Answers and Replies

  • #2
sophiecentaur
Science Advisor
Gold Member
25,009
4,743
but would it sound the same
Not 'the same', necessarily, but our hearing has to deal with many different listening conditions and we can recognise sounds under conditions of phase dispersion. Anywhere that's enclosed will produce dispersion effects, some of them extreme (a large tiled tunnel, for instance), and you get phase and amplitude distortions of the harmonic content of any sound. We have some very clever audio signal processing in our heads, and the result is that we can often 'undo' the distortions and, at the same time, get an idea of the shape and size of the space we're listening in.

It may be of interest that a lot of compression can be achieved in audio programmes by dynamically adjusting the phases of very 'peaky' sounds to reduce the peak amplitude, allowing the RMS power (and hence the loudness) to be raised and making full use of the available transmitter power (AM, that is): more bang per buck with little further damage to audio quality.
 
  • Like
Likes entropy1 and Paul Colby
  • #3
Paul Colby
Gold Member
1,166
286
Might be a fun experiment to try with Octave or perhaps csound?
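For instance, a minimal sketch of that experiment in Python/NumPy (the same few lines would work in Octave; the file name, mono format and 16-bit depth here are assumptions):

```python
# Minimal sketch: FFT a short mono clip, keep the amplitudes, randomize the
# phases of all components, inverse-FFT and write the result out to listen to.
# "input.wav" is a placeholder; a mono 16-bit file is assumed.
import numpy as np
from scipy.io import wavfile

rate, x = wavfile.read("input.wav")
x = x.astype(np.float64)

X = np.fft.rfft(x)                          # one-sided spectrum of the real signal
phases = np.random.uniform(0.0, 2.0 * np.pi, size=X.shape)
Y = np.abs(X) * np.exp(1j * phases)         # same amplitude spectrum, random phases
Y[0] = X[0]                                 # leave the DC term alone

y = np.fft.irfft(Y, n=len(x))               # back to the time domain
y *= np.max(np.abs(x)) / np.max(np.abs(y))  # rescale to avoid clipping
wavfile.write("randomized.wav", rate, y.astype(np.int16))
```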
 
  • Like
Likes entropy1
  • #4
Baluncore
Science Advisor
2019 Award
7,969
2,854
The cochlea in our ear is a frequency-sensitive tapered transmission line with hair cells that amplify and detect movement. In effect it is performing a mechanical Fourier transform of the audio. The centre of the auditory nerve carries the high-frequency response, surrounded by progressively lower frequencies.
Above a few hundred hertz we have difficulty identifying the phase of harmonics, so we are unable to perceive a difference if the phase of the higher frequencies is shifted.
 
  • Like
Likes atyy, sophiecentaur and entropy1
  • #5
boneh3ad
Science Advisor
Insights Author
Gold Member
3,130
810
I think you need to be careful with the use of the term "harmonic" since it seems to be used here to simply indicate the frequency components of an audio signal rather than its technical definition.

Either way, the actual sound you hear is a temporal signal. Yes, it is sensed by the cochlea, which essentially performs a mechanical Fourier transform, but it is a short-time Fourier transform in that the amplitudes of the various frequencies change rapidly in time and your ear is able to resolve that. Your brain essentially then performs the inverse Fourier transform in real time and experiences the audio as a signal in time.

If you randomize all of the phases, you will get the right combination of notes but they will be out of order in time so it's likely to sound like nonsense.
 
  • Like
Likes sophiecentaur
  • #6
atyy
Science Advisor
13,991
2,271
Last edited:
  • Informative
  • Like
Likes mfb and Baluncore
  • #7
boneh3ad
Science Advisor
Insights Author
Gold Member
3,130
810
Phase deafness in the sense you describe appears to be a case of playing what amounts to a constant set of tones and just randomizing the phases of those tones. The key part is that each frequency component individually is constant in time.

This does not represent something like, say, a song, where you have many tones that change over time to make music. If you randomize the phases in "Thriller", I highly doubt you'll still recognize it; you'll instead be listening to "hliTrler".
 
  • #8
sophiecentaur
Science Advisor
Gold Member
25,009
4,743
so it's likely to sound like nonsense.
I agree with most of your post except for this bit. "Nonsense" implies not making any sense. In fact the hearing system copes very well with phase-shift variations all over the audio spectrum. Think how the voice signal processing in your mobile phone mangles what hits the microphone, yet you can not only get the words but also recognise who's actually talking to you. 'Vocoding' gets away with murder.
Your hearing doesn't seem to care about the actual values of the time varying sound pressure.

Edit: this all depends on what actual phase delays you're talking about. The anagram "hliTrler" involves shifts of several hundred milliseconds, whereas our mid-range perceived frequencies have periods of just a few ms.
 
  • #9
Baluncore
Science Advisor
2019 Award
7,969
2,854
Your brain essentially then performs the inverse Fourier transform in real time and experiences the audio as a signal in time.
A warm and wet inverse transform is not possible because the frequency information is encoded in different separated nerve fibres. Those parallel channels can be correlated. There is no IFT.

The rate of the brain chemistry limits the frequency at which it is possible to correlate phase. The brain can correlate the crest of LF waves, or the rise of the envelope of HF waves. All that can really be done is to estimate the level of stimulation from the two ears to estimate the direction of the wave.

If you randomize all of the phases, you will get the right combination of notes but they will be out of order in time so it's likely to sound like nonsense.
The cochlea is a systolic processor that does NOT scramble the order in time. A click will appear on all fibres at about the same time. The brain ignores those slight differences because phase is not required.
 
  • #10
sophiecentaur
Science Advisor
Gold Member
25,009
4,743
Your brain essentially then performs the inverse Fourier transform in real time and experiences the audio as a signal in time.
I would say that is an over-simplification. When you hear any sound (particularly musical) your temporal and frequency domain experiences are of equal importance. People want to 'nail' what our hearing is doing but it is definitely not one or the other. Engineers have a yearning to characterise it all in circuitry (same for vision) and in the temporal domain but any coding system that involves minimising bit rate (for instance) always involves a bit of both. If you could input the 'perfect sound experience' directly into the senses, you wouldn't use a serial waveform (i.e. you'd have to bypass the eardrum).
 
  • #11
boneh3ad
Science Advisor
Insights Author
Gold Member
3,130
810
I agree with most of your post except for this bit. "Nonsense" implies not making any sense. In fact the hearing system copes very well with phase-shift variations all over the audio spectrum. Think how the voice signal processing in your mobile phone mangles what hits the microphone, yet you can not only get the words but also recognise who's actually talking to you. 'Vocoding' gets away with murder.
Your hearing doesn't seem to care about the actual values of the time varying sound pressure.

Edit: this all depends on what actual phase delays you're talking about. The anagram "hliTrler" involves shifts of several hundred milliseconds, whereas our mid-range perceived frequencies have periods of just a few ms.
Right, of course this is going to depend on magnitude of the shift and perhaps I've been a bit extreme in my examples.

The other relevant point would be that the different notes that are finite length in time are effectively showing up as amplitude-modulated pulses, so if the envelope doesn't shift in time, a phase shift in the carrier signal (in this instance, the one with the higher frequency of the note being heard) is unlikely to be meaningfully perceived. The issue is that the envelope wave itself has a series of frequency components of its own that would be subject to this hypothetical random shift. To illustrate, I've constructed the plot below.

[Plot (8ASIVjr.png): three time-domain traces with identical amplitude spectra: the original tone burst, the burst with only its carrier phase shifted, and the version with every frequency component given a random phase.]


Each of those signals has an identical amplitude spectrum. The top is the original. The middle has only the carrier (the note itself) phase shifted while the envelope remains the same. The bottom has a random phase shift applied to every frequency component of the signal. The top two plots would sound the same. The bottom surely would not.

Note the drop in amplitude. Since this is a single pulse, the random phase shift distributes its power over time, so it lowers the amplitude in the time domain without reducing the spectrum. If this were a signal composed of multiple pulses and frequencies in time, that effect would be much less.
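(For anyone wanting to reproduce it, here's a rough Python/NumPy sketch of the three cases; the 200 ms Hann envelope, the 440 Hz tone and the one-second record are arbitrary choices, not the exact parameters of the figure.)

```python
# Rough sketch of the three cases above: a short tone burst, the same burst
# with only its carrier phase shifted, and the burst with every FFT bin given
# a random phase. Envelope length, tone frequency and record length are arbitrary.
import numpy as np
import matplotlib.pyplot as plt

fs = 8000
t = np.arange(fs) / fs                          # one second of samples
envelope = np.zeros_like(t)
n_burst = int(0.2 * fs)                         # 200 ms burst
envelope[:n_burst] = np.hanning(n_burst)        # smooth on/off envelope

original = envelope * np.sin(2 * np.pi * 440 * t)
carrier_shifted = envelope * np.sin(2 * np.pi * 440 * t + 1.0)   # carrier phase only

X = np.fft.rfft(original)
random_phase = np.exp(1j * np.random.uniform(0, 2 * np.pi, X.shape))
scrambled = np.fft.irfft(np.abs(X) * random_phase, n=len(original))

for i, (label, sig) in enumerate([("original", original),
                                  ("carrier phase shifted", carrier_shifted),
                                  ("all phases randomized", scrambled)]):
    plt.subplot(3, 1, i + 1)
    plt.plot(t, sig)
    plt.ylabel(label)
plt.xlabel("time (s)")
plt.show()
```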

A warm and wet inverse transform is not possible because the frequency information is encoded in different separated nerve fibres. Those parallel channels can be correlated. There is no IFT.
My point was not to suggest that the brain is actually physically converting those nerve signals back into an electrical time signal representing the audio, but that you as a conscious human experience the audio as a time-varying signal. It is sensed as individual frequencies and experienced as something that varies in time, so in some ways that is an inverse Fourier transform. Maybe in a metaphysical way more than an actual physical way, but my comment was intended to be illustrative rather than a technical discussion of brain physics.

The rate of the brain chemistry limits the frequency at which it is possible to correlate phase. The brain can correlate the crest of LF waves, or the rise of the envelope of HF waves. All that can really be done is to estimate the level of stimulation from the two ears to estimate the direction of the wave.
Right but the issue is that changes in phase across the whole spectrum can entirely scramble the temporal behavior of a signal as illustrated above. For a signal of constant tones, the cilia in your ear aren't likely to notice anything different if one or more are phase shifted, but they will be vibrating at the wrong times if these tones start and stop as you would expect in music or speech.

The cochlea is a systolic processor that does NOT scramble the order in time. A click will appear on all fibres at about the same time. The brain ignores those slight differences because phase is not required.
I think you may have missed what I was attempting to convey. I wasn't suggesting that the cochlea was somehow scrambling things in time. My point was that a random set of phase shifts across the entire spectrum of the signal will scramble the signal in time even before it gets to your ear.
 
Last edited:
  • #12
Baluncore
Science Advisor
2019 Award
7,969
2,854
My point was that a random set of phase shifts across the entire spectrum of the signal will scramble the signal in time even before it gets to your ear.
And that is why phase is not important in understanding the sounds we hear. Everything flows at the speed of sound through air, ear, cochlea, nerve and brain, so the timing is still in order; only the phase of the carrier is lost, and that is unimportant.
 
  • #13
Baluncore
Science Advisor
2019 Award
7,969
2,854
Each of those signals has an identical amplitude spectrum. The top is the original. The middle has only the carrier (the note itself) phase shifted while the envelope remains the same. The bottom has a random phase shift applied to every frequency component of the signal. The top two plots would sound the same. The bottom surely would not.
You have stretched the response over an entire one-second period for numerical Fourier analysis.
The cochlea is a bandpass filter with low Q, so it delays each frequency channel by only about one cycle at that channel's frequency. The cochlea is systolic; it does not take "one second sound grabs".
 
  • #14
boneh3ad
Science Advisor
Insights Author
Gold Member
3,130
810
You have stretched the response over an entire one-second period for numerical Fourier analysis.
The cochlea is a bandpass filter with low Q, so it delays each frequency channel by only about one cycle at that channel's frequency. The cochlea is systolic; it does not take "one second sound grabs".
But that's not the original question. The original question was about taking a discretely sampled audio signal, decomposing it into frequency components, applying a random phase shift to all of them, reconstituting it, and then playing it. Perhaps we are answering two different questions here accidentally?
 
  • #15
Baluncore
Science Advisor
2019 Award
7,969
2,854
Suppose I decompose a discrete audio signal into a set of frequency components.
But that's not the original question. The original question was about taking a discretely sampled audio signal, decomposing it into frequency components, applying a random phase shift to all of them, reconstituting it, and then playing it.
The OP does not specify the method, or the time over which the analysis is to be done.

You are assuming that a single very long period of time will be analysed with a block FFT.

I am assuming that a systolic process, like the cochlea or a digital filter element, is used for the analysis over a few cycles of each channel only. My analyser has a low Q, with a different bandwidth for each element over many octaves, quite unlike a computationally efficient FFT.

The cochlear implant device employs multi-channel analysers, not a block FFT.
https://en.wikipedia.org/wiki/Cochlear_implant#Parts
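To make the distinction concrete, here is a rough Python/SciPy sketch of that kind of channelised analysis: a bank of low-Q band-pass filters run over the signal in time, one output per channel, rather than one long block FFT. The 24 channels, the Q of 2 and the sample rate are arbitrary illustrative values, not a model of the real cochlea.

```python
# Sketch of a cochlea-style filterbank: a bank of low-Q band-pass filters
# applied to the running signal, one output channel per band, instead of a
# single block FFT over a long record. All parameters here are illustrative.
import numpy as np
from scipy.signal import butter, lfilter

fs = 16000
Q = 2.0                                   # deliberately low Q
centres = np.geomspace(100, 6000, 24)     # 24 log-spaced channel frequencies

def channel(x, f0):
    """Band-pass x around f0 with bandwidth f0/Q and return the filtered signal."""
    low = f0 * (1.0 - 0.5 / Q)
    high = f0 * (1.0 + 0.5 / Q)
    b, a = butter(2, [low / (fs / 2), high / (fs / 2)], btype="band")
    return lfilter(b, a, x)

# A click as test input: every channel responds at roughly the same time,
# so the temporal order of events is preserved across channels.
x = np.zeros(fs)
x[fs // 2] = 1.0
outputs = np.array([channel(x, f0) for f0 in centres])
print(outputs.shape)    # (24, 16000): one time series per frequency channel
```

Each low-Q channel only "remembers" a couple of cycles, which is the systolic behaviour described above.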
 
  • #16
sophiecentaur
Science Advisor
Gold Member
25,009
4,743
Perhaps we are answering two different questions here accidentally?
It wouldn't be a first for PF. :smile:
There is a problem with introducing the Fourier transform into any real-life problem. The discrete Fourier transform assumes a waveform that repeats at some rate and lasts for ever. That seldom happens, except that it applies approximately to a few musical instruments. The very structure of the mechanical / sensory part of our hearing system implies both time- and frequency-domain measurements, and the really clever bits are in the brain and nerves, which do their best with what the ear presents them. (This is the same for vision, taste and touch.)

Frankly, the only answer to the OP is that the question is a bit too naive for a good answer. Our experience of audio recording and communications is that you can seriously mangle some audio programme signals, yet we hear them as 'perfectly understandable but not hi-fi'. Those of us with golden ears claim to be able to hear the slightest distortions, though. The answer must be "it all depends", and it's actually pretty hard to predict because our model is very limited. Subjective tests are needed if you want to know just how good or bad a system is.
 
  • #17
sophiecentaur
Science Advisor
Gold Member
25,009
4,743
The bottom surely would not.
Can you be sure of that? Whatever the signal sounds like, it will still repeat at the rate of the original 'pulsed note'. Each of the components will also be modulated by the original repeat rate. What exactly are you doing with the phases ("random phase")? A small amount of phase shifting would not make it suddenly 'burst out' like that.

The problem could be some possible aliasing (?) because I think there is an implied window function in the top diagram - there's no windowing in the bottom one.
 
  • #18
entropy1
Gold Member
1,042
60
The discrete Fourier transform assumes a waveform that repeats at some rate and lasts for ever. That seldom happens
Yes, I realized that later. How do you reconstruct the original signal with an IFT from FT "blocks" (a discrete frequency domain)? However, there are things like scrambling and vocoding, so it should be possible, I guess?
 
  • #19
atyy
Science Advisor
13,991
2,271
Yes, I realized that later. How do you reconstruct the original signal with an IFT from FT "blocks" (a discrete frequency domain)? However, there are things like scrambling and vocoding, so it should be possible, I guess?
One can even "throw away" the phase information (as in a spectrogram) and still recover it if there is sufficient overlap among the frequency bands. You can find a brief discussion and references in https://www.jneurosci.org/content/20/6/2315: "An invertible spectrographic representation of sound requires the use of overlapping frequency bands, as explained in this paragraph and in more mathematical detail in the [...]. The fine temporal structure of a sound is given by the relative phase of the narrow-band signals obtained in the decomposition of the sound into frequency bands. The phase of these narrow-band signals is thrown away in a spectrographic representation, where only the amplitude envelopes of the narrow-band signals are preserved. However, the relative phase of the narrow-band signals can be recovered from the joint consideration of the amplitude envelopes, as long as there is sufficient overlap among the frequency bands (Cohen, 1995; Theunissen and Doupe, 1998)."

See also the answer by Edouard at https://dsp.stackexchange.com/questions/9877/reconstruction-of-audio-signal-from-spectrogram which mentions overlapping time windows.
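As a rough illustration of the idea, here is a Griffin-Lim-style sketch in Python/SciPy that recovers a usable phase purely from the magnitude spectrogram by iterating between the time and frequency domains. The window length and iteration count are arbitrary, and this shows the general technique rather than the specific method of the references.

```python
# Griffin-Lim-style sketch: recover a phase that is consistent with a given
# magnitude spectrogram by repeatedly resynthesising and re-analysing.
# Window length and iteration count are arbitrary; x is any mono float signal.
import numpy as np
from scipy.signal import stft, istft

def recover_from_magnitude(x, fs, nperseg=512, iterations=100):
    n = len(x)
    _, _, S = stft(x, fs=fs, nperseg=nperseg)
    mag = np.abs(S)                                   # phase thrown away here

    # Start from a random phase and iterate: impose the known magnitudes,
    # go back to the time domain, re-analyse, and keep only the new phases.
    phase = np.exp(1j * np.random.uniform(0, 2 * np.pi, S.shape))
    for _ in range(iterations):
        _, y = istft(mag * phase, fs=fs, nperseg=nperseg)
        _, _, S_est = stft(y[:n], fs=fs, nperseg=nperseg)
        phase = np.exp(1j * np.angle(S_est))

    _, y = istft(mag * phase, fs=fs, nperseg=nperseg)
    return y[:n]
```

With the default 50% overlap between analysis windows, the iteration typically converges to something that sounds very close to the original, which is the point the quoted paragraph is making about overlapping bands.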
 
Last edited:
  • Like
Likes sophiecentaur
  • #20
sophiecentaur
Science Advisor
Gold Member
25,009
4,743
Yes, I realized that later. How do you reconstruct the original signal with an IFT from FT "blocks" (a discrete frequency domain)? However, there are things like scrambling and vocoding, so it should be possible, I guess?
The transforms used on the audio data blocks in MPEG coding (raised-cosine transforms, IIRC) have to be arranged so that discontinuities between adjacent blocks aren't heard. That will mean, I presume, that phase 'jitter' of components is a consideration. But that's high-end quality. Vocoding at its worst makes everyone sound like Donald Duck and you can't tell 'em apart. Those systems are not very good with music.
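Not the actual MPEG transform, but as a toy illustration of why overlapped, tapered blocks avoid audible seams: 50%-overlapped raised-cosine (Hann) windows sum to a constant, so block-by-block processing can be cross-faded back together without discontinuities. The block length here is arbitrary.

```python
# Toy illustration (not the actual MPEG scheme): 50%-overlapped raised-cosine
# (periodic Hann) windows sum to 1, so overlapped blocks can be cross-faded
# back together without discontinuities at the block edges.
import numpy as np

N = 512                                             # arbitrary block length
n = np.arange(N)
w = 0.5 * (1.0 - np.cos(2.0 * np.pi * n / N))       # periodic Hann window

total = np.zeros(8 * N)
for start in range(0, len(total) - N + 1, N // 2):  # hop = half a block
    total[start:start + N] += w

# Away from the very first and last half-blocks the sum is 1 (up to rounding).
print(total[N:-N].round(6).min(), total[N:-N].round(6).max())   # 1.0 1.0
```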
 
  • #21
entropy1
Gold Member
1,042
60
Vocoding at its worst makes everyone sound like Donald Duck and you can't tell 'em apart. Those systems are not very good with music.
I own FL Studio and it has a Vocoder that sounds very cool :-p :oldbiggrin:
 
  • Like
Likes atyy and sophiecentaur
  • #22
sophiecentaur
Science Advisor
Gold Member
25,009
4,743
Hah. I never said DD doesn’t sound cool!
 
  • Like
Likes nasu and atyy
  • #23
34,655
10,799
But that's not the original question. The original question was about taking a discretely sampled audio signal, decomposing it into frequency components, applying a random phase shift to all of them, reconstituting it, and then playing it. Perhaps we are answering two different questions here accidentally?
OP asked about harmonics, i.e. can we hear the difference between sin(x)+0.3sin(2x) and sin(x)+0.3sin(2x+1)?
 
  • #25
Baluncore
Science Advisor
2019 Award
7,969
2,854
OP asked about harmonics, i.e. can we hear the difference between sin(x)+0.3sin(2x) and sin(x)+0.3sin(2x+1)?
That will depend on the frequency.

At high frequencies you will not be able to detect the phase difference.

If the frequencies are below a few hundred Hz you may be able to learn to recognise a consistent phase difference. That is because the different hair cells will be triggered synchronously by the sinewave components, and your brain may “learn” to correlate the relative timing.
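If anyone wants to test that on themselves, here is a quick Python sketch along the lines of the comparison in #23; the 100 Hz fundamental, the two-second duration and the file names are arbitrary choices.

```python
# Generate sin(x) + 0.3 sin(2x) and sin(x) + 0.3 sin(2x + 1) at a chosen
# fundamental and write both to WAV files so the difference can be listened
# for. Fundamental, duration and file names are arbitrary choices.
import numpy as np
from scipy.io import wavfile

fs = 44100
f0 = 100                                    # try a low and then a high fundamental
t = np.arange(2 * fs) / fs                  # two seconds

in_phase = np.sin(2 * np.pi * f0 * t) + 0.3 * np.sin(2 * np.pi * 2 * f0 * t)
shifted = np.sin(2 * np.pi * f0 * t) + 0.3 * np.sin(2 * np.pi * 2 * f0 * t + 1.0)

for name, sig in [("in_phase.wav", in_phase), ("shifted.wav", shifted)]:
    pcm = (0.5 * sig / np.max(np.abs(sig)) * 32767).astype(np.int16)
    wavfile.write(name, fs, pcm)
```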
 
  • Like
Likes sophiecentaur
