# Propagation of Error when Taking the Average of Averages


## Main Question or Discussion Point

This is an issue I've seen asked and answered before on this forum some years ago. However, the answer doesn't quite make sense to me, so I want to see if I can get either a more satisfactory answer or a better explanation of the original.

Suppose that I have some cells that produce a luminescent protein, and I have a drug that curtails this process. To quantitatively assay how much the drug impacts protein production as a function of dose, I take the cell culture supernatant from each condition and measure it under a luminometer. I make each measurement in triplicate, and I repeat the experiment three times. For simplicity's sake, we'll refer to each triplicate measurement in each experiment as a Rep, and each experiment as a Set. So, three Sets, each with three Reps.

Now, each Set is going to have an average value (Set AVG) with a degree of uncertainty about the mean (using SEM). I want to take the average of these three set averages. This Meta-Average will have its own standard deviation (SD). To propagate the measurement uncertainty of each set average to the SD of the meta-average, I use this formula:

STDEVmeta-average = SQRT[(STDEV(avg_set1, avg_set2, avg_set3))^2 + (STDEV(sem_set1, sem_set2, sem_set3))^2]

This, as I understand it, is the method that was advocated by user viraltux in a thread back in 2012. You can find it here: https://www.physicsforums.com/threads/error-propagation-with-averages-and-standard-deviation.608932/

I like this formula, as it always generates a value that is a little bit bigger than simply taking the SD of the three Set averages. This makes sense, as the calculated variability should always be compounded a little by the initial uncertainty in my measurements. My problem, though, is that when I subtract the SD from the values generated by this formula, the differences do not correspond linearly with the sum of the SEMs from each Set. This doesn't make sense to me. If the propagated error formula takes the SD and then adds, essentially, a function of the SEM to it, then once you subtract the SD, the difference should depend entirely on the initial variance in my measurement. Can anyone explain why this is not the case?

Thank you.

mfb
Mentor
They are added in quadrature. The variances add linearly, if you subtract them you get what you expect. The standard deviation is the square root of the variance.

As an example: If you have a small standard deviation combined with a very large one the small one has a tiny effect. x +- 10 +- 1 adds to x +- 10.05 instead of +- 11.
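A quick numerical check of the example above, as a minimal Python sketch. The helper name `add_in_quadrature` is made up for illustration, and the calculation assumes the two uncertainties are independent.

```python
import math

def add_in_quadrature(*sds):
    """Combine independent standard deviations: the variances add, the SDs do not."""
    return math.sqrt(sum(sd ** 2 for sd in sds))

combined = add_in_quadrature(10, 1)
print(round(combined, 2))    # 10.05, not 11

# Subtraction only behaves as expected on variances, not on SDs.
recovered = math.sqrt(combined ** 2 - 10 ** 2)
print(round(recovered, 2))   # 1.0
```

Subtracting the SDs directly (10.05 - 10) gives 0.05, which is why the difference does not track the SEMs linearly.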

Thanks. That helps set me straight. But it exposes what I feel is an intrinsic shortcoming of the approach. My original formula quadratically sums the variance of the three set averages and the variances of their uncertainty. But I don't really care about the second one. I want to factor in how much 'wobble' I have about my means, not how consistent the magnitude of that wobble is.

Let's take an extreme case. Let's say, just for the sake of argument, that the data are thus:
Avg_set1: 10 SEM_set1: 9
Avg_set2: 14 SEM_set2: 9
Avg_set3: 11 SEM_set3: 9

The SEM is outrageously high in every case. But, because they're extremely consistent, the final calculated variance wouldn't be penalized at all. Maybe the formula should be amended like this:

STDEVmeta-average=SQRT[(STDEV(avg_set1, avg_set2, avg_set3))^2+(SUM(sem_set1, sem_set2, sem_set3))^2]

In this way, it doesn't matter how uniform the individual SEMs are. What matters is their magnitude. This makes a lot more sense to me. Are there any inherent problems with this approach?
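To make the comparison concrete, here is a small Python sketch that evaluates both formulas on the extreme example above. The variable names are illustrative only, and `statistics.stdev` is taken to correspond to the spreadsheet STDEV (sample SD, n - 1 in the denominator).

```python
import math
import statistics

set_avgs = [10, 14, 11]
set_sems = [9, 9, 9]

# Sample SD of the three set averages.
sd_of_avgs = statistics.stdev(set_avgs)

# Original formula: the second term is the SD of the SEMs,
# which is zero whenever the SEMs are identical.
sd_original = math.sqrt(sd_of_avgs ** 2 + statistics.stdev(set_sems) ** 2)

# Amended formula: the second term is the plain sum of the SEMs.
sd_amended = math.sqrt(sd_of_avgs ** 2 + sum(set_sems) ** 2)

print(round(sd_of_avgs, 2), round(sd_original, 2), round(sd_amended, 2))
# 2.08 2.08 27.08
```

With identical SEMs the original formula adds nothing to the SD of the averages, while the amended one is dominated by the summed SEMs.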

mfb
Mentor
What exactly do you want to quantify with the number?

How much you expect the true mean of the sets to be spread out? Then you should reduce the observed standard deviation based on the variance within sets (because this will increase your observed standard deviation).

How uncertain your mean is? Using the observed standard deviation of the mean values should work.

How uncertain you are about the global mean of all observations? I would pool all 9 and work with them as single group.

If you want to construct a confidence interval, you probably have to add quite a lot from the SEM to the overall result.

Something else?

It's more of the first one. I'm not sure why I would want the observed SD reduced. I mean, from a presentation standpoint, that does make my data look nicer, but from the standpoint of intellectual honesty, I don't follow. If I have a set of data, like so:

(---1---)__________(-2-)_____________________(----3----)

I have three observations, each with varying degrees of uncertainty about them (represented by the parentheses). My meta-average will be somewhere to the right of 2, and the SD of this meta-average will be a product of how far each of these is from that point. But, to my mind, that number should be compounded by the fact that we don't know exactly where 1, 2, or 3 are. The SDmeta is serving as an estimate of the biological variability of the studied system, and I would intuitively suspect such an estimate to be greater the less sure we are of our founding observations.

BvU
Homework Helper
assay how much the drug impacts protein production as a function of dose
where do we see the dose varied ?

BvU
Homework Helper
(---1---)__________(-2-)_____________________(----3----)

I have three observations, each with varying degrees of uncertainty about them (represented by the parentheses).
So the average of $2\pm1,\ 1\pm 3,$ and $3\pm 4$ is $2\pm 1$, or, if you want $2.0 \pm 0.9$ (noting there is a 35% error in the .9 if the total number of measurements is 9)

In post #1 you still had three sets of three observations each. Why don't you post the raw data instead of concocting vague examples ?
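The post does not say how the $2.0 \pm 0.9$ was obtained. One reading that reproduces it is an inverse-variance weighted mean; that is an assumption on my part, sketched here in Python with illustrative variable names.

```python
import math

values = [2.0, 1.0, 3.0]
sigmas = [1.0, 3.0, 4.0]

# Each value is weighted by 1 / sigma^2, so precise values count more.
weights = [1.0 / s ** 2 for s in sigmas]
weighted_mean = sum(w * v for w, v in zip(weights, values)) / sum(weights)

# Uncertainty of the weighted mean under the same assumption.
weighted_sigma = 1.0 / math.sqrt(sum(weights))

print(round(weighted_mean, 1), round(weighted_sigma, 1))   # 2.0 0.9
```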

mfb
Mentor
It's more of the first one.
Estimating the true spread of the mean values of the samples?
If you use the observed standard deviation, you overestimate this.

Toy example: Your samples have a true spread of +- 10, and your three individual measurements have an uncertainty of +-17 each, leading to +- 10 spread for the mean of a set (standard deviation of observed minus true). Your overall spread of the set means will be $\sqrt{\color{red}{10}^2+\color{blue}{10}^2}=14$ - not the 10 you wanted. If you know the 10 you can revert this process and find what you wanted.

With just three sets, this process is not very reliable, but your estimate will be very coarse anyway. The probability that all three are close together by random chance is large, and you cannot take this into account. What if your true spread was +-50 and you just happened to have the values within +-10?
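The arithmetic of the toy example can be sketched in Python as below. It assumes the "+- 10 spread for the mean of a set" is the usual SEM of three readings, i.e. 17 / sqrt(3), rounded; the variable names are illustrative.

```python
import math

true_spread = 10.0       # true SD between sets
rep_uncertainty = 17.0   # SD of a single measurement
n_reps = 3

# Uncertainty of one set average: about 9.8, rounded to 10 in the post.
sem_of_set = rep_uncertainty / math.sqrt(n_reps)

# Observed spread of the set averages: true spread and SEM add in quadrature.
observed_spread = math.sqrt(true_spread ** 2 + sem_of_set ** 2)

# Reverting the process: subtract the SEM variance to recover the true spread.
recovered_spread = math.sqrt(observed_spread ** 2 - sem_of_set ** 2)

print(round(sem_of_set, 1), round(observed_spread, 1), round(recovered_spread, 1))
```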

where do we see the dose varied ?
For simplicity's sake, I haven't shown the dose response. I figure the process should be the same for each dose.

So the average of $2\pm1,\ 1\pm 3,$ and $3\pm 4$ is $2\pm 1$, or, if you want $2.0 \pm 0.9$ (noting there is a 35% error in the .9 if the total number of measurements is 9)

In post #1 you still had three sets of three observations each. Why don't you post the raw data instead of concocting vague examples ?
Again, for simplicity's sake. But, as you wish.

Just looking at the control dose (i.e. no drug), my measurements are:

Set 1:
1. 12446.52
2. 12757.86
3. 11786.68

Set 2:
1. 8476.44
2. 7463.44
3. 5574.07

Set 3:
1. 15070.42
2. 16642.77
3. 16641.54

Statistics:
• Averages:
• Set 1: 12330.35
• Set 2: 7171.32
• Set 3: 16118.24
• SEMs:
• Set 1: 286.31
• Set 2: 850.48
• Set 3: 523.91
• Meta-average: 11873.30

Fixed a number, see later posts - mfb
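For anyone who wants to reproduce the statistics above, here is a minimal Python sketch using the corrected readings. It assumes the SEM is the sample SD (n - 1) divided by sqrt(3); the dictionary layout and names are illustrative.

```python
import math
import statistics

sets = {
    "Set 1": [12446.52, 12757.86, 11786.68],
    "Set 2": [8476.44, 7463.44, 5574.07],
    "Set 3": [15070.42, 16642.77, 16641.54],
}

# Average and standard error of the mean for each set.
set_avgs = {name: statistics.mean(reps) for name, reps in sets.items()}
set_sems = {
    name: statistics.stdev(reps) / math.sqrt(len(reps))
    for name, reps in sets.items()
}

# Meta-average: the mean of the three set averages.
meta_average = statistics.mean(list(set_avgs.values()))

for name in sets:
    print(name, round(set_avgs[name], 2), round(set_sems[name], 2))
print("Meta-average", round(meta_average, 2))
```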

BvU
Homework Helper
And the hypothesis is that the nine observations in these three sets are samples from one population with one average and one standard deviation ?

Yes.

BvU
Homework Helper
Any a priori estimate of the standard deviation ? Any chance of systematic errors ?
My guess is the probability to pick set 2 as an unbiased sample must be rather small - have to calculate a bit, but not now: sports

My answers are 'Wha...?' and 'Sure, I guess?' respectively. Also, if 'my guess is the probability to pick set 2 as an unbiased sample must be rather small' is a complicated way of saying 'Set 2 seems like the odd man out', yeah, but not really. The meta-average is about 12,000. Set 3 is about 4,000 above that, and set 2 is about 5,000 below. Pretty evenly dispersed.

Estimating the true spread of the mean values of the samples?
If you use the observed standard deviation, you overestimate this.

Toy example: Your samples have a true spread of +- 10, and your three individual measurements have an uncertainty of +-17 each, leading to +- 10 spread for the mean of a set (standard deviation of observed minus true). Your overall spread of the set means will be $\sqrt{\color{red}{10}^2+\color{blue}{10}^2}=14$ - not the 10 you wanted. If you know the 10 you can revert this process and find what you wanted.

With just three sets, this process is not very reliable, but your estimate will be very coarse anyway. The probability that all three are close together by random chance is large, and you cannot take this into account. What if your true spread was +-50 and you just happened to have the values within +-10?
I'm afraid I don't follow you here. When you say 'spread', are you referring to the SD of the meta-average, or are you referring to the range? When you refer to the uncertainty of the measurements, do you mean their respective SEMs? Also, how did we go from 17 to 10?

BvU
Homework Helper
I can reproduce averages and sem for 1 and 3. Typo in 2 ?

Just considering them as nine independent observations gives an average of $\ (11.9\pm 1.3)\times 1000\$ and two-thirds of the measurements are within $\pm \sigma = \pm 4000$ from the average.

But looking at them as three groups gives means that are so utterly distinct ( $12300\pm 300, \ 7300 \pm 900, \ 16100\pm 500 \$ ) that it is very hard to believe they can be from the same population.
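The pooled figures can be checked with a short Python sketch. Note that it uses the corrected Set 2 reading (8476.44), so the results may differ slightly from numbers computed before the typo was fixed; the variable names are illustrative.

```python
import math
import statistics

# All nine readings treated as independent observations.
observations = [
    12446.52, 12757.86, 11786.68,   # Set 1
    8476.44, 7463.44, 5574.07,      # Set 2 (corrected value)
    15070.42, 16642.77, 16641.54,   # Set 3
]

pooled_mean = statistics.mean(observations)
pooled_sd = statistics.stdev(observations)                 # sample SD, n - 1
pooled_sem = pooled_sd / math.sqrt(len(observations))      # uncertainty of the mean

print(round(pooled_mean), round(pooled_sd), round(pooled_sem))
```

The results land close to the quoted 11900, 4000 and 1300.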

a priori estimate of the standard deviation
means: how accurate do you expect your measurements to be ? $\pm 4000$ ?

I can reproduce averages and sem for 1 and 3. Typo in 2 ?
Yes, there was a typo. Set 2.1 should be 8476.44, not, as I wrote, 8746.44. I would go back and fix the original, but I don't see any way to do that.

how accurate do you expect your measurements to be ? $\pm 4000$ ?
Well, are we talking precision or accuracy here? But, to try to answer your questions, I didn't come with any a priori assumptions of either. When you do these experiments, you hope - and it's nice when - the experimental readouts don't differ from each other by more than 20% either way. But that rarely actually happens. There are just so - many - sources of variability in the system: how happy the cells are, which is a function of the cell culture medium, the passage number of the cells, how gently they were thawed out, etc.; how uniform the treatment doses were across experiments - you're dealing with nanomolar concentrations, so it's relatively easy to be off by a good bit; the quality of your reagents; exactly how long you let the reaction that detects the luminescence 'cook' before you throw it in the reader. Any and all of these can vary from one experiment to the next. Good experimental practice reduces the variability of each of these, but everyone has an off day, and when you're dealing with this many steps, you just can't know for sure. So, you do the next best thing, and you normalize all your readouts to the control for each experiment. That reduces a lot of the variability, but even then, you can still get quite a spread. If you were to do this experiment 10 times, a lot of that variability would even out, but I don't have 10 sets. I have 3. And considering the time and money involved in getting those 3, I'm not likely to get more.

BvU
Homework Helper
The way you describe it there are other factors, not under your control, that influence the outcome. Statistically you should obtain a Gauss distribution for a large number of repeats. With only three sets your estimate for the standard deviation of the over-all distribution will be rather inaccurate, though.
In such a case I would consider the nine observations as independent and end up with the 11900 $\pm$ 1300 and a standard deviation of 4000.

I would respectfully differ with that prognosis. I think, just by looking at the data, that the three reps in each set are a lot closer to each other than they are to the reps from different sets. If the data looked like this:

0____0___0_________0_0__0____0_____0__0

then yeah, I think a fair case could be made for the method that you're advocating. But the data don't look like that. They look like this:

0___0___0_____________________________________0_0___0_________________________________0_____0__0

Clearly, three distinct sets of three. Just trying your method for shits and giggles, I get SDs that are comparable to just taking the straight SD of my three set averages, completely ignoring the SEM of each set. Sometimes they're a little higher. Sometimes a little lower. This, as opposed to this:

STDEVmeta-average=SQRT[(STDEV(avg_set1, avg_set2, avg_set3))^2+(SUM(sem_set1, sem_set2, sem_set3))^2]

which is ALWAYS higher than straight SD, and which mfb says is not what I want for reasons I still don't understand.

Though, on second thought, I dunno. Your method might be good enough. My main concern was that by treating each of the reps as a separate independent observation, you're artificially boosting your n and reducing your SD. It's the difference between weighing nine different test subjects and weighing three test subjects three times each. But, as I said, the SDs I obtain with your method aren't radically different from straight SD without propagating error. When it's lower, it's not much lower, and when it's higher it tends to be of a greater magnitude difference than when it's lower. So it doesn't seem to fall into the trap of underestimating variance. So then, I guess the question boils down to whether or not ignoring the error about the mean when calculating the final variance, itself, underestimates the variance. My intuition is that it would. Mfb seems to disagree.

mfb
Mentor
I'm afraid I don't follow you here. When you say 'spread', are you referring to the SD of the meta-average, or are you referring to the range? When you refer to the uncertainty of the measurements, do you mean their respective SEMs? Also, how did we go from 17 to 10?
I mean the variance in the true values within each set. The numbers you would get if your individual measurements in a set were all exactly the same but the variation between sets were still there.

Given the large difference between the spread between sets and the spread within sets I would just use the set averages and ignore uncertainties on them - they are small compared to the differences between sets.
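To put rough numbers on that, here is a hedged Python sketch that subtracts the average squared SEM from the observed variance of the set averages. This method-of-moments correction is one common way to do the "reduce the observed standard deviation" step, not necessarily the exact procedure meant above, and the names are illustrative. It uses the averages and SEMs posted earlier in the thread.

```python
import math
import statistics

set_avgs = [12330.35, 7171.32, 16118.24]
set_sems = [286.31, 850.48, 523.91]

# Observed variance of the set averages (between-set plus measurement noise).
observed_var = statistics.variance(set_avgs)

# Average measurement variance of a set average (SEM squared).
within_var = statistics.mean([s ** 2 for s in set_sems])

# Estimated true between-set variance, floored at zero.
between_var = max(observed_var - within_var, 0.0)

observed_sd = math.sqrt(observed_var)
corrected_sd = math.sqrt(between_var)
print(round(observed_sd), round(corrected_sd))
```

The corrected value comes out within about 1% of the plain SD of the set averages, which is consistent with ignoring the SEMs for this data set.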