Randomizing phases of harmonics

  • #1
entropy1
Gold Member
1,042
60
Suppose I decompose a discrete audio signal into a set of frequency components. If I then add those harmonics back together, I get the original discrete signal. My question is: if I randomize the phases of the harmonics first and then add them, I get a different signal, but would it sound the same as the original, in which the phases were left untouched?
 

Answers and Replies

  • #2
sophiecentaur
Science Advisor
Gold Member
25,009
4,743
but would it sound the same
Not 'the same', necessarily, but our hearing has to deal with many different listening conditions and we can recognise sounds under conditions of phase dispersion. Anywhere that's enclosed will produce dispersion effects, some of them extreme (a large tiled tunnel, for instance), and you get phase and amplitude distortions of the harmonic content of any sound. We have some very clever audio signal processing in our heads, and the result is that we can often 'undo' the distortions and, at the same time, get an idea of the shape and size of the space we're listening in.

It may be of interest that a lot of compression can be achieved in audio programmes by dynamically adjusting the phases of very 'peaky' sounds to reduce the peak amplitude, allowing the RMS power (and hence the loudness) to be raised and making full use of the available transmitter power (AM, that is): more bang per buck with little further damage to audio quality.
 
  • Like
Likes entropy1 and Paul Colby
  • #3
Paul Colby
Gold Member
1,166
286
Might be a fun experiment to try with Octave or perhaps csound?
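For instance, a minimal sketch of that experiment in Python/NumPy (the same few lines would work in Octave; the file name, mono format and 16-bit depth here are assumptions):

```python
# Minimal sketch: FFT a short mono clip, keep the amplitudes, randomize the
# phases of all components, inverse-FFT and write the result out to listen to.
# "input.wav" is a placeholder; a mono 16-bit file is assumed.
import numpy as np
from scipy.io import wavfile

rate, x = wavfile.read("input.wav")
x = x.astype(np.float64)

X = np.fft.rfft(x)                          # one-sided spectrum of the real signal
phases = np.random.uniform(0.0, 2.0 * np.pi, size=X.shape)
Y = np.abs(X) * np.exp(1j * phases)         # same amplitude spectrum, random phases
Y[0] = X[0]                                 # leave the DC term alone

y = np.fft.irfft(Y, n=len(x))               # back to the time domain
y *= np.max(np.abs(x)) / np.max(np.abs(y))  # rescale to avoid clipping
wavfile.write("randomized.wav", rate, y.astype(np.int16))
```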
 
  • Like
Likes entropy1
  • #4
Baluncore
Science Advisor
2019 Award
7,969
2,854
The cochlea in our ear is a frequency-sensitive tapered transmission line with hair cells that amplify and detect movement. In effect it is performing a mechanical Fourier transform of the audio. The centre of the auditory nerve carries the high-frequency response, surrounded by progressively lower frequencies.
Above a few hundred hertz we have difficulty identifying the phase of harmonics, so we are unable to perceive a difference if the phase of the higher frequencies is shifted.
 
  • Like
Likes atyy, sophiecentaur and entropy1
  • #5
boneh3ad
Science Advisor
Insights Author
Gold Member
3,130
810
I think you need to be careful with the use of the term "harmonic" since it seems to be used here to simply indicate the frequency components of an audio signal rather than its technical definition.

Either way, the actual sound you hear is a temporal signal. Yes, it is sensed by the cochlea, which essentially performs a mechanical Fourier transform, but it is a short-time Fourier transform in that the amplitudes of the various frequencies change rapidly in time and your ear is able to resolve that. Your brain essentially then performs the inverse Fourier transform in real time and experiences the audio as a signal in time.

If you randomize all of the phases, you will get the right combination of notes but they will be out of order in time so it's likely to sound like nonsense.
 
  • Like
Likes sophiecentaur
  • #6
atyy
Science Advisor
13,991
2,271
Last edited:
  • Informative
  • Like
Likes mfb and Baluncore
  • #7
boneh3ad
Science Advisor
Insights Author
Gold Member
3,130
810
Phase deafness in the sense you describe appears to be a case of playing what amounts to a constant set of tones and just randomizing the phases of those tones. The key part is that each frequency component individually is constant in time.

This does not represent something like, say, a song, where you have many tones that change over time to make music. If you randomize the phases in "Thriller", I highly doubt you'll still recognize it; you'll instead be listening to "hliTrler".
 
  • #8
sophiecentaur
Science Advisor
Gold Member
25,009
4,743
so it's likely to sound like nonsense.
I agree with most of your post except for this bit. "Nonsense" implies not making any sense. In fact the hearing system copes very well with phase-shift variations all over the audio spectrum. Think how the voice signal processing in your mobile phone mangles what hits the microphone, yet you can not only get the words but also recognise who's actually talking to you. 'Vocoding' gets away with murder.
Your hearing doesn't seem to care about the actual values of the time varying sound pressure.

Edit: this all depends on what actual phase delays you're talking about. The anagram "hliTrler" involves shifts of several hundred milliseconds, whereas our mid-range perceived frequencies have periods of just a few ms.
 
  • #9
Baluncore
Science Advisor
2019 Award
7,969
2,854
Your brain essentially then performs the inverse Fourier transform in real time and experiences the audio as a signal in time.
A warm and wet inverse transform is not possible because the frequency information is encoded in different separated nerve fibres. Those parallel channels can be correlated. There is no IFT.

The rate of the brain chemistry limits the frequency at which it is possible to correlate phase. The brain can correlate the crest of LF waves, or the rise of the envelope of HF waves. All that can really be done is to estimate the level of stimulation from the two ears to estimate the direction of the wave.

If you randomize all of the phases, you will get the right combination of notes but they will be out of order in time so it's likely to sound like nonsense.
The cochlea is a systolic processor that does NOT scramble the order in time. A click will appear on all fibres at about the same time. The brain ignores those slight differences because phase is not required.
 
  • #10
sophiecentaur
Science Advisor
Gold Member
25,009
4,743
Your brain essentially then performs the inverse Fourier transform in real time and experiences the audio as a signal in time.
I would say that is an over-simplification. When you hear any sound (particularly musical) your temporal and frequency domain experiences are of equal importance. People want to 'nail' what our hearing is doing but it is definitely not one or the other. Engineers have a yearning to characterise it all in circuitry (same for vision) and in the temporal domain but any coding system that involves minimising bit rate (for instance) always involves a bit of both. If you could input the 'perfect sound experience' directly into the senses, you wouldn't use a serial waveform (i.e. you'd have to bypass the eardrum).
 
  • #11
boneh3ad
Science Advisor
Insights Author
Gold Member
3,130
810
I agree with most of your post except for this bit. "Nonsense" implies not making any sense. In fact the hearing system copes very well with phase-shift variations all over the audio spectrum. Think how the voice signal processing in your mobile phone mangles what hits the microphone, yet you can not only get the words but also recognise who's actually talking to you. 'Vocoding' gets away with murder.
Your hearing doesn't seem to care about the actual values of the time varying sound pressure.

Edit: this all depends on what actual phase delays you're talking about. The anagram "hliTrler" involves shifts of several hundred milliseconds, whereas our mid-range perceived frequencies have periods of just a few ms.
Right, of course this is going to depend on magnitude of the shift and perhaps I've been a bit extreme in my examples.

The other relevant point would be that the different notes that are finite length in time are effectively showing up as amplitude-modulated pulses, so if the envelope doesn't shift in time, a phase shift in the carrier signal (in this instance, the one with the higher frequency of the note being heard) is unlikely to be meaningfully perceived. The issue is that the envelope wave itself has a series of frequency components of its own that would be subject to this hypothetical random shift. To illustrate, I've constructed the plot below.

[Plot (8ASIVjr.png): three time-domain traces with identical amplitude spectra: the original tone burst, the burst with only its carrier phase shifted, and the version with every frequency component given a random phase.]


Each of those signals has an identical amplitude spectrum. The top is the original. The middle has only the carrier (the note itself) phase shifted while the envelope remains the same. The bottom has a random phase shift applied to every frequency component of the signal. The top two plots would sound the same. The bottom surely would not.

Note the drop in amplitude. Since this is a single pulse, the random phase shift distributes its power over time, so it lowers the amplitude in the time domain without reducing the spectrum. If this were a signal composed of multiple pulses and frequencies in time, that effect would be much less.
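(For anyone wanting to reproduce it, here's a rough Python/NumPy sketch of the three cases; the 200 ms Hann envelope, the 440 Hz tone and the one-second record are arbitrary choices, not the exact parameters of the figure.)

```python
# Rough sketch of the three cases above: a short tone burst, the same burst
# with only its carrier phase shifted, and the burst with every FFT bin given
# a random phase. Envelope length, tone frequency and record length are arbitrary.
import numpy as np
import matplotlib.pyplot as plt

fs = 8000
t = np.arange(fs) / fs                          # one second of samples
envelope = np.zeros_like(t)
n_burst = int(0.2 * fs)                         # 200 ms burst
envelope[:n_burst] = np.hanning(n_burst)        # smooth on/off envelope

original = envelope * np.sin(2 * np.pi * 440 * t)
carrier_shifted = envelope * np.sin(2 * np.pi * 440 * t + 1.0)   # carrier phase only

X = np.fft.rfft(original)
random_phase = np.exp(1j * np.random.uniform(0, 2 * np.pi, X.shape))
scrambled = np.fft.irfft(np.abs(X) * random_phase, n=len(original))

for i, (label, sig) in enumerate([("original", original),
                                  ("carrier phase shifted", carrier_shifted),
                                  ("all phases randomized", scrambled)]):
    plt.subplot(3, 1, i + 1)
    plt.plot(t, sig)
    plt.ylabel(label)
plt.xlabel("time (s)")
plt.show()
```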

A warm and wet inverse transform is not possible because the frequency information is encoded in different separated nerve fibres. Those parallel channels can be correlated. There is no IFT.
My point was not to suggest that the brain is actually physically converting those nerve signals back into an electrical time signal representing the audio, but that you as a conscious human experience the audio as a time-varying signal. It is sensed as individual frequencies and experienced as something that varies in time, so in some ways that is an inverse Fourier transform. Maybe in a metaphysical way more than an actual physical way, but my comment was intended to be illustrative rather than a technical discussion of brain physics.

The rate of the brain chemistry limits the frequency at which it is possible to correlate phase. The brain can correlate the crest of LF waves, or the rise of the envelope of HF waves. All that can really be done is to estimate the level of stimulation from the two ears to estimate the direction of the wave.
Right but the issue is that changes in phase across the whole spectrum can entirely scramble the temporal behavior of a signal as illustrated above. For a signal of constant tones, the cilia in your ear aren't likely to notice anything different if one or more are phase shifted, but they will be vibrating at the wrong times if these tones start and stop as you would expect in music or speech.

The cochlea is a systolic processor that does NOT scramble the order in time. A click will appear on all fibres at about the same time. The brain ignores those slight differences because phase is not required.
I think you may have missed what I was attempting to convey. I wasn't suggesting that the cochlea was somehow scrambling things in time. My point was that a random set of phase shifts across the entire spectrum of the signal will scramble the signal in time even before it gets to your ear.
 
Last edited:
  • #12
Baluncore
Science Advisor
2019 Award
7,969
2,854
My point was that a random set of phase shifts across the entire spectrum of the signal will scramble the signal in time even before it gets to your ear.
And that is why phase is not important in understanding the sounds we hear. Everything flows at the speed of sound through air, ear, cochlea, nerve and brain, so the timing is still in order; only the phase of the carrier is lost, and that is unimportant.
 
  • #13
Baluncore
Science Advisor
2019 Award
7,969
2,854
Each of those signals has an identical amplitude spectrum. The top is the original. The middle has only the carrier (the note itself) phase shifted while the envelope remains the same. The bottom has a random phase shift applied to every frequency component of the signal. The top two plots would sound the same. The bottom surely would not.
You have stretched the response over an entire one-second period for numerical Fourier analysis.
The cochlea is a bandpass filter with low Q, so it delays each frequency channel by only about one cycle at that channel's frequency. The cochlea is systolic; it does not take "one second sound grabs".
 
  • #14
boneh3ad
Science Advisor
Insights Author
Gold Member
3,130
810
You have stretched the response over an entire one-second period for numerical Fourier analysis.
The cochlea is a bandpass filter with low Q, so it delays each frequency channel by only about one cycle at that channel's frequency. The cochlea is systolic; it does not take "one second sound grabs".
But that's not the original question. The original question was about taking a discretely sampled audio signal, decomposing it into frequency components, applying a random phase shift to all of them, reconstituting it, and then playing it. Perhaps we are answering two different questions here accidentally?
 
  • #15
Baluncore
Science Advisor
2019 Award
7,969
2,854
Suppose I decompose a discrete audio signal into a set of frequency components.
But that's not the original question. The original question was about taking a discretely sampled audio signal, decomposing it into frequency components, applying a random phase shift to all of them, reconstituting it, and then playing it.
The OP does not specify the method, or the time over which the analysis is to be done.

You are assuming that a single very long period of time will be analysed with a block FFT.

I am assuming that a systolic process, like the cochlea or a digital filter element, is used for the analysis over a few cycles of each channel only. My analyser has a low Q, with a different bandwidth for each element over many octaves, quite unlike a computationally efficient FFT.

The cochlear implant device employs multi-channel analysers, not a block FFT.
https://en.wikipedia.org/wiki/Cochlear_implant#Parts
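To make the distinction concrete, here is a rough Python/SciPy sketch of that kind of channelised analysis: a bank of low-Q band-pass filters run over the signal in time, one output per channel, rather than one long block FFT. The 24 channels, the Q of 2 and the sample rate are arbitrary illustrative values, not a model of the real cochlea.

```python
# Sketch of a cochlea-style filterbank: a bank of low-Q band-pass filters
# applied to the running signal, one output channel per band, instead of a
# single block FFT over a long record. All parameters here are illustrative.
import numpy as np
from scipy.signal import butter, lfilter

fs = 16000
Q = 2.0                                   # deliberately low Q
centres = np.geomspace(100, 6000, 24)     # 24 log-spaced channel frequencies

def channel(x, f0):
    """Band-pass x around f0 with bandwidth f0/Q and return the filtered signal."""
    low = f0 * (1.0 - 0.5 / Q)
    high = f0 * (1.0 + 0.5 / Q)
    b, a = butter(2, [low / (fs / 2), high / (fs / 2)], btype="band")
    return lfilter(b, a, x)

# A click as test input: every channel responds at roughly the same time,
# so the temporal order of events is preserved across channels.
x = np.zeros(fs)
x[fs // 2] = 1.0
outputs = np.array([channel(x, f0) for f0 in centres])
print(outputs.shape)    # (24, 16000): one time series per frequency channel
```

Each low-Q channel only "remembers" a couple of cycles, which is the systolic behaviour described above.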
 
  • #16
sophiecentaur
Science Advisor
Gold Member
25,009
4,743
Perhaps we are answering two different questions here accidentally?
It wouldn't be a first for PF. :smile:
There is a problem with introducing the Fourier transform into any real-life problem. The discrete Fourier transform assumes a waveform that repeats at some rate and lasts for ever. That seldom happens, except that it applies approximately to a few musical instruments. The very structure of the mechanical / sensory part of our hearing system implies both time- and frequency-domain measurements, and the really clever bits are in the brain and nerves, which do their best with what the ear presents them. (This is the same for vision, taste and touch.)

Frankly, the only answer to the OP is that the question is a bit too naive for a good answer. Our experience of audio recording and communications is that you can seriously mangle some audio programme signals, yet we hear them as 'perfectly understandable but not hi-fi'. Those of us with golden ears claim to be able to hear the slightest distortions, though. The answer must be "it all depends", and it's actually pretty hard to predict because our model is very limited. Subjective tests are needed if you want to know just how good or bad a system is.
 
  • #17
sophiecentaur
Science Advisor
Gold Member
25,009
4,743
The bottom surely would not.
Can you be sure of that? Whatever the signal sounds like, it will still repeat at the rate of the original 'pulsed note'. Each of the components will also be modulated by the original repeat rate. What exactly are you doing with the phases ("random phase")? A small amount of phase shifting would not make it suddenly 'burst out' like that.

The problem could be some possible aliasing (?) because I think there is an implied window function in the top diagram - there's no windowing in the bottom one.
 
  • #18
entropy1
Gold Member
1,042
60
The discrete Fourier transform assumes a waveform that repeats at some rate and lasts for ever. That seldom happens
Yes, I realized that later. How do you reconstruct the original signal with an IFT from FT "blocks" (a discrete frequency domain)? However, there are things like scrambling and vocoding, so it should be possible, I guess?
 
  • #19
atyy
Science Advisor
13,991
2,271
Yes, I realized that later. How do you reconstruct the original signal with an IFT from FT "blocks" (a discrete frequency domain)? However, there are things like scrambling and vocoding, so it should be possible, I guess?
One can even "throw away" the phase information (as in a spectrogram) and still recover it if there is sufficient overlap among the frequency bands. You can find a brief discussion and references in https://www.jneurosci.org/content/20/6/2315: "An invertible spectrographic representation of sound requires the use of overlapping frequency bands, as explained in this paragraph and in more mathematical detail in the [...]. The fine temporal structure of a sound is given by the relative phase of the narrow-band signals obtained in the decomposition of the sound into frequency bands. The phase of these narrow-band signals is thrown away in a spectrographic representation, where only the amplitude envelopes of the narrow-band signals are preserved. However, the relative phase of the narrow-band signals can be recovered from the joint consideration of the amplitude envelopes, as long as there is sufficient overlap among the frequency bands (Cohen, 1995; Theunissen and Doupe, 1998)."

See also the answer by Edouard at https://dsp.stackexchange.com/questions/9877/reconstruction-of-audio-signal-from-spectrogram which mentions overlapping time windows.
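As a rough illustration of the idea, here is a Griffin-Lim-style sketch in Python/SciPy that recovers a usable phase purely from the magnitude spectrogram by iterating between the time and frequency domains. The window length and iteration count are arbitrary, and this shows the general technique rather than the specific method of the references.

```python
# Griffin-Lim-style sketch: recover a phase that is consistent with a given
# magnitude spectrogram by repeatedly resynthesising and re-analysing.
# Window length and iteration count are arbitrary; x is any mono float signal.
import numpy as np
from scipy.signal import stft, istft

def recover_from_magnitude(x, fs, nperseg=512, iterations=100):
    n = len(x)
    _, _, S = stft(x, fs=fs, nperseg=nperseg)
    mag = np.abs(S)                                   # phase thrown away here

    # Start from a random phase and iterate: impose the known magnitudes,
    # go back to the time domain, re-analyse, and keep only the new phases.
    phase = np.exp(1j * np.random.uniform(0, 2 * np.pi, S.shape))
    for _ in range(iterations):
        _, y = istft(mag * phase, fs=fs, nperseg=nperseg)
        _, _, S_est = stft(y[:n], fs=fs, nperseg=nperseg)
        phase = np.exp(1j * np.angle(S_est))

    _, y = istft(mag * phase, fs=fs, nperseg=nperseg)
    return y[:n]
```

With the default 50% overlap between analysis windows, the iteration typically converges to something that sounds very close to the original, which is the point the quoted paragraph is making about overlapping bands.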
 
Last edited:
  • Like
Likes sophiecentaur
  • #20
sophiecentaur
Science Advisor
Gold Member
25,009
4,743
Yes, I realized that later. How do you reconstruct the original signal with an IFT from FT "blocks" (a discrete frequency domain)? However, there are things like scrambling and vocoding, so it should be possible, I guess?
The transforms used on the audio data blocks in MPEG coding (raised-cosine transforms, IIRC) have to be arranged so that discontinuities between adjacent blocks aren't heard. That will mean, I presume, that phase 'jitter' of components is a consideration. But that's high-end quality. Vocoding at its worst makes everyone sound like Donald Duck and you can't tell 'em apart. Those systems are not very good with music.
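Not the actual MPEG transform, but as a toy illustration of why overlapped, tapered blocks avoid audible seams: 50%-overlapped raised-cosine (Hann) windows sum to a constant, so block-by-block processing can be cross-faded back together without discontinuities. The block length here is arbitrary.

```python
# Toy illustration (not the actual MPEG scheme): 50%-overlapped raised-cosine
# (periodic Hann) windows sum to 1, so overlapped blocks can be cross-faded
# back together without discontinuities at the block edges.
import numpy as np

N = 512                                             # arbitrary block length
n = np.arange(N)
w = 0.5 * (1.0 - np.cos(2.0 * np.pi * n / N))       # periodic Hann window

total = np.zeros(8 * N)
for start in range(0, len(total) - N + 1, N // 2):  # hop = half a block
    total[start:start + N] += w

# Away from the very first and last half-blocks the sum is 1 (up to rounding).
print(total[N:-N].round(6).min(), total[N:-N].round(6).max())   # 1.0 1.0
```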
 
  • #21
entropy1
Gold Member
1,042
60
Vocoding at its worst makes everyone sound like Donald Duck and you can't tell 'em apart. Those systems are not very good with music.
I own FL Studio and it has a Vocoder that sounds very cool :-p :oldbiggrin:
 
  • Like
Likes atyy and sophiecentaur
  • #22
sophiecentaur
Science Advisor
Gold Member
25,009
4,743
Hah. I never said DD doesn’t sound cool!
 
  • Like
Likes nasu and atyy
  • #23
34,655
10,799
But that's not the original question. The original question was about taking a discretely sampled audio signal, decomposing it into frequency components, applying a random phase shift to all of them, reconstituting it, and then playing it. Perhaps we are answering two different questions here accidentally?
OP asked about harmonics, i.e. can we hear the difference between sin(x)+0.3sin(2x) and sin(x)+0.3sin(2x+1)?
 
  • #25
Baluncore
Science Advisor
2019 Award
7,969
2,854
OP asked about harmonics, i.e. can we hear the difference between sin(x)+0.3sin(2x) and sin(x)+0.3sin(2x+1)?
That will depend on the frequency.

At high frequencies you will not be able to detect the phase difference.

If the frequencies are below a few hundred Hz you may be able to learn to recognise a consistent phase difference. That is because the different hair cells will be triggered synchronously by the sinewave components, and your brain may “learn” to correlate the relative timing.
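If anyone wants to test that on themselves, here is a quick Python sketch along the lines of the comparison in #23; the 100 Hz fundamental, the two-second duration and the file names are arbitrary choices.

```python
# Generate sin(x) + 0.3 sin(2x) and sin(x) + 0.3 sin(2x + 1) at a chosen
# fundamental and write both to WAV files so the difference can be listened
# for. Fundamental, duration and file names are arbitrary choices.
import numpy as np
from scipy.io import wavfile

fs = 44100
f0 = 100                                    # try a low and then a high fundamental
t = np.arange(2 * fs) / fs                  # two seconds

in_phase = np.sin(2 * np.pi * f0 * t) + 0.3 * np.sin(2 * np.pi * 2 * f0 * t)
shifted = np.sin(2 * np.pi * f0 * t) + 0.3 * np.sin(2 * np.pi * 2 * f0 * t + 1.0)

for name, sig in [("in_phase.wav", in_phase), ("shifted.wav", shifted)]:
    pcm = (0.5 * sig / np.max(np.abs(sig)) * 32767).astype(np.int16)
    wavfile.write(name, fs, pcm)
```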
 
  • Like
Likes sophiecentaur
