Combining two audio sources, digitally

Skaperen · Feb 10, 2012

I was playing around with some computer code to process some audio. As part of it, I needed to combine the audio from two different sources. Everything is in raw, uncompressed, 16-bit signed samples at a 48 kHz sample rate. So now I'm wondering: do I add voltage, or add power? The sample values are voltage (or so I've believed over the years). But in real life, adding a 2nd audio source just adds more power. If I have a 25-watt source and another 25-watt source that might happen to carry exactly the same audio waveform, combining them is going to give 50 watts, not 100 watts. But if I do this with voltage signals, what would be a 25-watt output would become a 100-watt output (assuming a sound system with enough capacity not to overload, and digital bits that won't overflow).

So what's the proper way to combine two audio sources, in digital space, that correctly behaves like real life space where adding 25 watts and 25 watts gets you no free power?

skeptic2 · Feb 10, 2012

You're correct, you need to add powers. However, I understand your data represents voltages, so you just need to apply the Pythagorean theorem: Vout = sqrt(V1^2 + V2^2).

Skaperen · Feb 12, 2012

skeptic2 said:

You're correct, you need to add powers. However, I understand your data represents voltages, so you just need to apply the Pythagorean theorem: Vout = sqrt(V1^2 + V2^2).

One of the things I want to understand is how many audio sources I could add together and still have a reasonable dynamic range. For example, consider a VoIP teleconferencing server, where a teleconference might involve dozens of people. If adding power (rather than voltage) is the correct way, then more dynamic range really is preserved (the square root being a form of range compression).

Still, I've yet to find any technical descriptions of digital audio that explicitly say (or deny) that the numbers represent voltage, and that simply adding the numbers from more than one source is wrong. When one thinks about adding power, then sqrt(V1^2 +V2^2) sure seems right. But is that what is really done?

One thing I know about radio waves is that when you have a constant power output from a transmitter and switch from a single dipole to a double dipole, even though each dipole now gets only half of the transmitter power, there's a net gain of 3 dB at the in-phase reception point. There's no free power because other directions lose power (i.e. the pattern is just rearranged). As far as I know, radiating sound from two emitters would have the same effect. If one simply added a 2nd transmitter on a 2nd antenna with an identical waveform, or likewise a 2nd amplifier and 2nd speaker, the in-phase pickup point would experience a 6 dB increase (3 dB from doubling the power sources, and 3 dB from radiation pattern effects). Presumably, adding voltages in digital audio creates the effect of being at an in-phase pickup point. Obviously, being at a perfectly out-of-phase point means you get nothing at all.

I've also been curious about the difference between a single vocalist and a choir. The latter would be a summation of power, although with not all of them in perfect sync, it's not a perfect summation either. If I have 64 audio inputs from 64 vocalists who are not in perfect sync anyway, should I add voltage or add power? If they were in perfect sync at the waveform level, adding voltage means I would get a 36 dB increase, whereas adding power should give an 18 dB increase. But by not being in perfect sync, I would get less? If the 64 sources were random noise, I'd expect 18 dB for added voltage and 9 dB for added power (power is lost). If the 64 sources were perfect sine waves at the same frequency, each at a different phase angle in 5.625-degree steps, then I expect the sum to be zero (power is lost here, too).
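The decibel figures above can be checked with a few lines of Python. This is a sketch: the function names are illustrative, it uses the usual conventions of 20·log10 for amplitude ratios and 10·log10 for power ratios, and Gaussian noise from `random.gauss` stands in for "random noise" (an assumption).

```python
import math
import random

def coherent_gain_db(n: int) -> float:
    """Level rise when n identical, in-phase signals are summed sample by sample.
    The amplitude scales by n, so the power scales by n**2: 20*log10(n) dB."""
    return 20.0 * math.log10(n)

def power_sum_gain_db(n: int) -> float:
    """Level rise when n equal powers are added: 10*log10(n) dB."""
    return 10.0 * math.log10(n)

def rms(x: list[float]) -> float:
    return math.sqrt(sum(v * v for v in x) / len(x))

def noise_sum_gain_db(n: int, num_samples: int = 10000, seed: int = 1) -> float:
    """Level rise when n independent Gaussian noise signals are added as voltages
    (plain sample addition), relative to one noise signal of the same variance."""
    rng = random.Random(seed)
    single = [rng.gauss(0.0, 1.0) for _ in range(num_samples)]
    total = [0.0] * num_samples
    for _ in range(n):
        for i in range(num_samples):
            total[i] += rng.gauss(0.0, 1.0)
    return 20.0 * math.log10(rms(total) / rms(single))

print(f"coherent voltage sum: {coherent_gain_db(64):.2f} dB")
print(f"power sum:            {power_sum_gain_db(64):.2f} dB")
print(f"noise, voltage sum:   {noise_sum_gain_db(64):.2f} dB")
```

The simulated noise case uses plain sample addition, yet its level rises by only about 10·log10(64) ≈ 18 dB, because independent signals add in power on average.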

I don't know the right answer. But neither choice I anticipate seems to be without arguments against it.

Edit ... Another thought: I've seen many illustrations of adding two or more sine waves of different frequencies, showing the result as a sum of voltages (by assuming the inputs are voltages). Is this right?

Edit2 ... It seems that op-amp summing amplifiers simply add their inputs.

Antiphon · Feb 12, 2012

Uh, no.

You add voltages. Simply add the two signals numerically. Adding powers is non-linear and nonsensical in this context.

Skaperen · Feb 12, 2012

Antiphon said:

Uh, no.

You add voltages. Simply add the two signals numerically. Adding powers is non-linear and nonsensical in this context.

Actually, I ran a test program that performed the sqrt(a^2+b^2) operation, and calculated the ratio of that result to one of the inputs (effectively both since I did the test where both inputs were identical). In the sum=a+b scenario, the ratio was a flat 2.0 across the range of all input values. In the sum=sqrt(a^2+b^2) test, the ratio was a flat 1.414 across the range of all input values. I'd agree that (a^2+b^2) is by itself non-linear, as a^2 alone would be. But sqrt(a^2) is most certainly linear, and sqrt(a^2+b^2) sure is when a==b. What about a!=b? Maybe there is intermodulation? I guess I need to run a test with a and b at different frequencies and see what Mr. Fourier thinks.
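A sketch of the kind of test described above, assuming signed floating-point samples of a 440 Hz sine at 48 kHz (the helper names are hypothetical). It reproduces the 2.0 and 1.414 magnitude ratios, but also shows that sqrt(a^2 + b^2) can never be negative: with a == b it returns sqrt(2)·|a|, a full-wave-rectified copy of the waveform, which is one concrete form of the non-linearity Antiphon mentions.

```python
import math

def voltage_sum(a: float, b: float) -> float:
    """Plain sample addition."""
    return a + b

def power_sum(a: float, b: float) -> float:
    """Sample-wise 'power addition': sqrt(a^2 + b^2)."""
    return math.sqrt(a * a + b * b)

FS = 48000
# Roughly one cycle of a 440 Hz sine at 48 kHz (109 samples), both signs present.
x = [math.sin(2 * math.pi * 440 * n / FS) for n in range(109)]

added = [voltage_sum(v, v) for v in x]    # 2 * v: keeps the waveform's shape
powered = [power_sum(v, v) for v in x]    # sqrt(2) * |v|: negative half flipped up

print(min(x), min(added), min(powered))
print(power_sum(-1.0, 0.0), voltage_sum(-1.0, 0.0))
```

The second print shows the sign loss directly: a single negative sample passed through the "power" sum comes out positive.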

So, how does this work in a realistic sense?

If I am adding signal A and signal B, where signal A is a 440 Hz sine wave tone and signal B is zero, and I feed the output to an audio amplifier, at a certain volume setting I will have 4 volts out to an 8-ohm speaker, giving 0.5 amp of current, and thus 2 watts dissipated in the speaker (presumably mostly as sound energy). Then signal B produces an identical sine wave tone in the same phase. With the amplifier and speaker setup unchanged, I now have 8 volts out to that 8-ohm speaker and 1.0 amp of current. This gives me 8 watts.

How many of the watts come from signal A, and how many come from signal B?

If I create the same scenario in real life with separate equipment, where is the free power coming from?

Antiphon · Feb 12, 2012

You're confusing orthogonal signals with coherent power.

You need to consider what happens with A at 440 Hz and B at 450 Hz. Then you get 4 watts, not 8.

Skaperen · Feb 14, 2012

Antiphon said:

You're confusing orthogonal signals with coherent power.

You need to consider what happens with A at 440 Hz and B at 450 Hz. Then you get 4 watts, not 8.

I should still get the same amount of power. There is no free power just because I happen to have A and B both be 440.

AlephZero · Feb 14, 2012

Skaperen said:

So what's the proper way to combine two audio sources, in digital space, that correctly behaves like real life space where adding 25 watts and 25 watts gets you no free power?

A digital signal doesn't have any "power" that you can measure in watts, until you feed it into an analog device. The power level you get then depends on the analog system as well as the signal level.

If you feed the same signal into two identical amps, effectively you are halving the impedance of the speaker system by running two sets of speakers in parallel. That doesn't correspond to anything in digital signal processing. (At least, not to anything as simple as just adding signals together).

If you feed double the signal voltage level into one amp (assuming it can handle that signal level) you will get 4 times the power output.

What happens when you combine signals with different frequencies, or even the same frequency and different phases, is another question. If you play back identical sine waves through two separate amps and speakers in an anechoic chamber (so echoes off the walls don't confuse the issue), at some places in the chamber you will get double the output power and at other places you may get zero, depending on the distances to the speakers and the interference patterns between the sound waves.

All of this is an issue when mixing audio where you want to position (pan) a mono signal in a stereo image. In theory, a given signal level applied to both channels (center of the sound image) should give 6 dB more sound output than the same signal applied to just one channel (extreme left or right). But in practice, in most listening environments, the measured difference is closer to 3 dB than 6 dB. On pro mixing equipment you can select the "panning law" to choose how much compensation you want to apply to correct for this. In fact you might not want to correct at all, if you think it's more natural for center-stage sounds to be louder than those at the edges, but for recordings made for home listening conditions the correction is usually about 3 dB, not 6.
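Two common pan laws can be sketched as follows. This is an illustrative sketch, not any particular console's implementation: `constant_power_pan` keeps L² + R² constant (about -3 dB per channel at center), while `linear_pan` keeps L + R constant (-6 dB per channel at center). The position convention, -1 for hard left to +1 for hard right, is an assumption.

```python
import math

def constant_power_pan(pos: float) -> tuple[float, float]:
    """Return (gain_l, gain_r) with gain_l**2 + gain_r**2 == 1 at every position."""
    theta = (pos + 1.0) * math.pi / 4.0   # maps [-1, 1] onto [0, pi/2]
    return math.cos(theta), math.sin(theta)

def linear_pan(pos: float) -> tuple[float, float]:
    """Return (gain_l, gain_r) with gain_l + gain_r == 1 at every position."""
    r = (pos + 1.0) / 2.0
    return 1.0 - r, r

def db(gain: float) -> float:
    return 20.0 * math.log10(gain)

for name, law in (("constant-power", constant_power_pan), ("linear", linear_pan)):
    left, right = law(0.0)
    print(f"{name:15s} center gain per channel: {db(left):6.2f} dB")
```

With a coherent (voltage) sum of the two channels, the constant-power law makes a centered source about 3 dB louder than a hard-panned one, while the linear law keeps them equal; which behavior sounds right depends on the listening conditions described above.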

Skaperen · Feb 15, 2012

Let me see if I understand this right this time.

In the anechoic chamber (or an infinite open air space), doubling the sound power by having 2 sound emission sources, same power, same frequency, involves an interference pattern where different places get different power, the highest being 4x and the lowest being 0x, and the average being 2x. This would be like a dual dipole radio antenna radiation pattern achieving gain in specific directions.
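The 4x / 0x / 2x figures can be checked with a simplified model (an assumption, not a full acoustic simulation): two equal, coherent sources, and a listening point where the difference in path length gives a phase difference phi. The intensity relative to one source alone is |1 + e^(i·phi)|² = 2 + 2·cos(phi); averaging it over phi spread evenly across a full cycle gives 2.

```python
import cmath
import math

def relative_intensity(phi: float) -> float:
    """Intensity of two equal, coherent waves with phase difference phi,
    relative to one wave alone: |1 + e^(i*phi)|^2 = 2 + 2*cos(phi)."""
    return abs(1 + cmath.exp(1j * phi)) ** 2

# Listening points spread so that phi covers one full cycle evenly.
N_STEPS = 3600
values = [relative_intensity(2 * math.pi * k / N_STEPS) for k in range(N_STEPS)]

print("max:", max(values))              # in-phase points
print("min:", min(values))              # out-of-phase points (zero up to rounding)
print("average:", sum(values) / N_STEPS)
```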

So I could say the speaker arrangement can give me gain in some directions, like an antenna, and so some places it will sound like 4x power, even though only 2x power is physically used, due to the gain effect.

So adding signal voltage (before it gets amplified) is really "emulating" the loudness I would get at the point of best gain from the above scenario, but is actually achieving it in all directions without an interference pattern, driving 4x power through the amplifier (assuming it can handle 4x power). But this is only experienced if adding coherent signals.

With incoherent signals, the case of 2 separate speakers would have a moving interference pattern, which would average out to 2x power (but each point goes through the range of 0x to 4x power). Adding the incoherent signals and driving a single amplifier and speaker with the result would NOT have the interference pattern; instead, every point would hear the same thing at the same time. And it would average out to 2x power over time.

So the only reason I'd have 4x actual power out is that I'm effectively choosing to create the effect of always being at one of those points of interference gain when I add 2 coherent signals (same frequency and phase). I could choose to add signals 90 degrees apart and get the effect of listening halfway between one of those peak points and a null point.
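A sketch of the resistor version of this argument, reusing the earlier numbers: treating 4 volts as an RMS value (an assumption), one source alone delivers 4²/8 = 2 W into 8 ohms. The hypothetical helper below averages the power from two equal sine waves with a given phase offset over exactly one cycle: in phase gives 8 W (4x), 90 degrees apart gives 4 W (2x), and 180 degrees apart gives 0 W.

```python
import math

def mean_power(v_rms_each: float, phase_deg: float, r_ohms: float = 8.0,
               samples: int = 4800) -> float:
    """Average power in a resistor driven by the sum of two equal sine waves.

    Each wave has RMS voltage v_rms_each; the second is shifted by phase_deg.
    The average is taken over exactly one cycle, sampled evenly."""
    peak = v_rms_each * math.sqrt(2.0)
    phi = math.radians(phase_deg)
    total = 0.0
    for n in range(samples):
        wt = 2.0 * math.pi * n / samples
        v = peak * math.sin(wt) + peak * math.sin(wt + phi)
        total += v * v / r_ohms
    return total / samples

for deg in (0, 90, 180):
    print(f"{deg:3d} degrees: {mean_power(4.0, deg):.3f} W")
```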

This is making sense to me now, so I hope I have it right, and that my explanation still fits with what you were saying.

So c = a + b, with c being the result signal, makes sense if what I want (in cases of coherence, when it happens in the input signals) is the effect of being on the center line between those two speakers.

Antiphon · Feb 15, 2012

That's basically correct. There's a time component to the argument that works without a room where you replace the speaker with a simple resistor.

If you have a 400.0 Hz sine wave and add a second 400.0 Hz sine wave you will get 4x the power. But if you change the frequency of one of the signals to 400.00000000000000000000000000000000001 Hz you will get 2x the power. At first and for many years you will get 4x the power. Then after many millions of years you will get zero. The time average over forever will give you 2x the power.
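The time-averaging argument can be sketched numerically at the thread's 48 kHz sample rate, using Antiphon's earlier 440 Hz and 450 Hz example rather than the 400 Hz one. `power_ratio` (a hypothetical helper) compares the mean square of A + B with that of A alone over a chosen window; mean square is proportional to power in a fixed resistor. With a 10 Hz difference the beat period is 0.1 s, so short windows swing between near 4x and near 0x while a whole second averages to 2x.

```python
import math

FS = 48000  # sample rate used in the thread

def tone(freq_hz: float, n: int) -> float:
    """Unit-amplitude sine sample n at the given frequency."""
    return math.sin(2.0 * math.pi * freq_hz * n / FS)

def power_ratio(f_a: float, f_b: float, start_s: float, dur_s: float) -> float:
    """Mean square of (A + B) divided by mean square of A alone over a window."""
    first = int(start_s * FS)
    count = int(dur_s * FS)
    ms_sum = ms_a = 0.0
    for n in range(first, first + count):
        a, b = tone(f_a, n), tone(f_b, n)
        ms_sum += (a + b) ** 2
        ms_a += a * a
    return ms_sum / ms_a

print(power_ratio(440, 440, 0.0, 1.0))     # same frequency and phase
print(power_ratio(440, 450, 0.0, 1.0))     # 10 Hz apart, averaged over 1 s
print(power_ratio(440, 450, 0.0, 0.005))   # start of a beat: nearly in phase
print(power_ratio(440, 450, 0.05, 0.005))  # half a beat later: nearly cancelled
```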

Skaperen · Feb 15, 2012

Antiphon said:

That's basically correct. There's a time component to the argument that works without a room where you replace the speaker with a simple resistor.

If you have a 400.0 Hz sine wave and add a second 400.0 Hz sine wave you will get 4x the power. But if you change the frequency of one of the signals to 400.00000000000000000000000000000000001 Hz you will get 2x the power. At first and for many years you will get 4x the power. Then after many millions of years you will get zero. The time average over forever will give you 2x the power.

But this is (mostly, over time spans longer than one or a few wave cycles) a spatial component, where the signals to be added are kept separate until they reach their respective separately amplified speakers?

OK, so I just need to add the voltage numbers. How much headroom do I need? The worst case is the coherent one. To allow less, I would need to understand the nature of my particular signals. In the case of a VoIP teleconference system (with maybe as many as 64 participants at one time), besides coherence being very unlikely, it might be rare for everyone to talk at once.
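A minimal sketch of headroom estimation and plain additive mixing of 16-bit streams, assuming the samples are held as Python ints (the helpers `headroom` and `mix_int16` are hypothetical). The worst case, N coherent in-phase sources, needs 20·log10(N) dB, which is log2(N) extra bits rounded up (6 bits for 64 sources); independent, uncorrelated signals rise by about 10·log10(N) dB on average. Whether to scale by a fixed gain, by 1/N, or to use a limiter is a design choice the thread does not settle; this sketch simply scales and then clips.

```python
import math

INT16_MIN, INT16_MAX = -32768, 32767

def headroom(n_sources: int) -> tuple[float, float, int]:
    """Return (worst-case dB, uncorrelated dB, extra bits for the worst case)."""
    worst_db = 20.0 * math.log10(n_sources)    # all sources coherent and in phase
    uncorr_db = 10.0 * math.log10(n_sources)   # independent signals add in power
    extra_bits = math.ceil(math.log2(n_sources))
    return worst_db, uncorr_db, extra_bits

def mix_int16(streams: list[list[int]], gain: float = 1.0) -> list[int]:
    """Sample-by-sample sum of equal-length int16 streams (voltage addition).

    Python ints do not overflow, so the sum is exact before scaling; the
    result is scaled by `gain` and then clipped (saturated) to int16."""
    mixed = []
    for frame in zip(*streams):
        s = round(sum(frame) * gain)
        mixed.append(max(INT16_MIN, min(INT16_MAX, s)))
    return mixed

worst, uncorr, bits = headroom(64)
print(f"64 sources: {worst:.1f} dB worst case ({bits} extra bits), "
      f"about {uncorr:.1f} dB if uncorrelated")
print(mix_int16([[30000, -30000, 100], [30000, -30000, 200]]))
```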

DragonPetter · Feb 15, 2012

You are sampling the voltage of 2 signals.

The signals are linear sound pressure waves to which superposition applies, and the pressure (more specifically, the force) is directly proportional to the voltage created by the transducer. The velocity of the transducer's vibration is proportional to the current. This is based on the electroacoustic reciprocity principle.

$$\frac{V}{v} = \frac{F}{I}$$

The force times its velocity will be the power, not just the force you measured, which is what your voltage data represents.

Since you are not measuring the voltage AND current of your signal, you can only know the relative force/pressure, not the power. So it makes sense to add in the units your measurements are in: volts.

All of your other concerns about power, coherence, etc. are separate from the fact that your signal is in volts; those other factors depend on spatial, time, and transducer variables that determine the power you will get from the volts you use.

Skaperen · Feb 17, 2012

DragonPetter said:

Since you are not measuring the voltage AND current of your signal, you can only know the relative force/pressure, not the power. So it makes sense to add in the units your measurements are in: volts.

All of your other concerns about power, coherence, etc. are separate from the fact that your signal is in volts; those other factors depend on spatial, time, and transducer variables that determine the power you will get from the volts you use.

My motivation was to combine two audio signals so that, in the combined form, one would HEAR a volume level corresponding to the increase in a situation that is clearly just a doubling of power. If a person is speaking or singing at a volume level equivalent to 4 watts from a speaker, and a 2nd person chimes in at the same level, we presume it involves twice the power. My motivation was to achieve exactly that in signal arithmetic. This had the apparent contradiction that adding voltages would be the equivalent of 16 watts. How could one get 16 watts from a pair of 4-watt sources? So it seemed plausible that the correct approach was adding power, not voltage. Since the actual signal numbers were voltages, they had to be converted to power to do the power arithmetic, then converted back to voltage.

That is why adding power "makes sense" (although I'm now quite confident that, even though it STILL seems to make sense, it is actually wrong).

Reality is more complex. Waves, space, and time are involved. With two people speaking on stage (without an amplification system), different listeners will hear different things. With the sounds emitted from two different places, the mixing is not as simple as adding voltage (or adding power, for that matter). One must consider the situation and work out how the arithmetic applies. Here the two audio sources are not in the same place (no more than two talkers in a conference room sharing one phone on the table are). Applying the arithmetic requires deciding where the listener or microphone is placed (with even more complications if things move over time).

Simply adding voltages makes an assumption about spatial placement. Adding them with no phase/time shift means accepting what would be heard at a point equidistant from both sound sources. At that point they really do combine to "appear" as more power. This is how waves achieve gain: at the expense of other directions (where the waves cancel out). Power (and energy) is still conserved (nothing gained, nothing lost besides the usual physical losses like heat). The wave structure just moves it around so it appears to be gained in some places and lost in others. Radio waves do this, and antennas are designed to take advantage of it. Panelized speakers are the same thing, I presume (an antenna array equivalent).

This was not about me assuming the signal values were power or anything like that. I always knew they were a measure of voltage. But I had not determined that adding voltages was correct until working through this thread. I looked at quite a number of websites about this, and not a single one even claimed, let alone explained, that adding voltages is the way to combine two power sources. Hopefully, in the future, Google and the other search engines will find this thread.

Waves are funny things.

Antiphon · Feb 19, 2012

There's no need to imagine enormous complexity here. You don't need rooms and spatial interference.

As I said before, replace the speaker with an 8-ohm resistor. All the foregoing still applies.

Skaperen · Feb 21, 2012

Antiphon said:

There's no need to imagine enormous complexity here. You don't need rooms and spatial interference.

As I said before, replace the speaker with an 8-ohm resistor. All the foregoing still applies.

The interference aspect is required to understand how adding voltage can result in the same perception of power as adding power.

Consider heat. Add the voltages and you get 4 times the heat dissipation in a resistor. But that's not the equivalent of adding a 2nd heater.
