Error propagation with averages and standard deviation

rano · May 25, 2012

I was wondering if someone could please help me understand a simple problem of error propagation going from multiple measurements with errors to an average incorporating these errors. I have looked on several error propagation webpages (e.g. UC physics or UMaryland physics) but have yet to find exactly what I am looking for.

I would like to illustrate my question with some example data. Suppose we want to know the mean ± standard deviation (mean ± SD) of the mass of 3 rocks. We weigh these rocks on a balance and get:

Rock 1: 50 g
Rock 2: 10 g
Rock 3: 5 g

So we would say that the mean ± SD of these rocks is: 21.6 ± 24.6 g.

But now let's say we weigh each rock 3 times each and now there is some error associated with the mass of each rock. Let's say that the mean ± SD of each rock mass is now:

Rock 1: 50 ± 2 g
Rock 2: 10 ± 1 g
Rock 3: 5 ± 1 g

How would we describe the mean ± SD of the three rocks now that there is some uncertainty in their masses? Would it still be 21.6 ± 24.6 g? Some error propagation websites suggest that it would be the square root of the sum of the absolute errors squared, divided by N (N=3 here). But in this case the mean ± SD would only be 21.6 ± 2.45 g, which is clearly too low.

I think this should be a simple problem to analyze, but I have yet to find a clear description of the appropriate equations to use. If my question is not clear please let me know. Any insight would be very appreciated.

viraltux · May 25, 2012

rano said:

I was wondering if someone could please help me understand a simple problem of error propagation going from multiple measurements with errors to an average incorporating these errors. I have looked on several error propagation webpages (e.g. UC physics or UMaryland physics) but have yet to find exactly what I am looking for.

I think this should be a simple problem to analyze, but I have yet to find a clear description of the appropriate equations to use. If my question is not clear please let me know. Any insight would be very appreciated.

Hi rano,

You are comparing different things. In the first case you calculate the standard error of the rock mass distribution; this gives you an idea of how far from the average the weight of the next rock you sample will be.

In the second case you calculate the standard error due to measurement; this time you get an idea of how far the measured weight is from the real weight of the rock.

haruspex · May 25, 2012

viraltux said:

You are comparing different things, ...

Yes and no.
If Rano had wanted to know the variance within the sample (the three rocks selected) I would agree. But I note that the value quoted, 24.66, is as though what's wanted is the variance of weights of rocks in general. (The variance within the sample is only 20.1.)
That being so, I think the question is valid. The variance of the population is amplified by the uncertainty in the measurements.
What further confuses the issue is that Rano has presented three different standard deviations for the measurements of the three rocks. In assessing the variation of rocks in general, that's unusable. We have to make some assumption about errors of measurement in general. We can assume the same variance in measurement, regardless of rock size, or some relationship between rock size and error range.

viraltux · May 25, 2012

haruspex said:

Yes and no.
If Rano had wanted to know the variance within the sample (the three rocks selected) I would agree. But I note that the value quoted, 24.66, is as though what's wanted is the variance of weights of rocks in general. (The variance within the sample is only 20.1.)

I'm not sure where you get a variance of 20.1, but the standard error for the sample is definitely 24.66

haruspex · May 25, 2012

viraltux said:

I'm not sure where you get a variance of 20.1, but the standard error for the sample is definitely 24.66

Sorry, a bit loose in terminology. The st dev of the sample is 20.1. The variance (average square minus square average) is 405.56.
But for the st dev of the population the sample of n represents we multiply by sqrt(n/(n-1)) to get 24.66. Since Rano quotes the larger number, it seems that it's the s.d. of the population that's wanted.

viraltux · May 25, 2012

haruspex said:

Sorry, a bit loose in terminology. The st dev of the sample is 20.1. The variance (average square minus square average) is 405.56.
But for the st dev of the population the sample of n represents we multiply by sqrt(n/(n-1)) to get 24.66. Since Rano quotes the larger number, it seems that it's the s.d. of the population that's wanted.

Ah, OK, I see what's going on... It's a naming thing: the definition/estimation of the standard deviation is unfortunately a bit messy, since I see it change from book to book. Anyway, I should have said standard deviation myself instead of standard error, since the data do not represent sample means.

But whether standard error or standard deviation, the only thing we can do is estimate the values, and when it comes to estimators everyone has their favorites and their reasons for choosing them.

So 20.1 would be the maximum likelihood estimate, 24.66 would be the unbiased estimate, 17.4 would be the lowest quadratic (mean squared) error estimate and... you could actually go on.

So which estimation is the right one? all of them.
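As a rough illustration, the three figures can be reproduced from rano's data. This is only a sketch: it assumes the 17.4 value comes from dividing the sum of squared deviations by n+1 (the minimum mean squared error choice for normally distributed data), which matches the number quoted but is not stated explicitly above. The variable names are made up for the example.

```python
import math

masses = [50.0, 10.0, 5.0]  # rano's three rocks, in grams
n = len(masses)
mean = sum(masses) / n
ss = sum((m - mean) ** 2 for m in masses)  # sum of squared deviations from the mean

sd_mle = math.sqrt(ss / n)             # maximum likelihood: divide by n
sd_unbiased = math.sqrt(ss / (n - 1))  # root of the unbiased variance estimate: divide by n - 1
sd_min_mse = math.sqrt(ss / (n + 1))   # assumed reading of the 17.4 figure: divide by n + 1

print(round(sd_mle, 2), round(sd_unbiased, 2), round(sd_min_mse, 2))
# prints: 20.14 24.66 17.44
```

The only difference between the three is the divisor applied to the same sum of squares.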

You're right, rano is mixing up different things (he should explain how he measures the errors, etc.), but my point was to make him see that the numbers are different because they are measuring different things.

chiro · May 26, 2012

rano said:

I was wondering if someone could please help me understand a simple problem of error propagation going from multiple measurements with errors to an average incorporating these errors. I have looked on several error propagation webpages (e.g. UC physics or UMaryland physics) but have yet to find exactly what I am looking for.

I would like to illustrate my question with some example data. Suppose we want to know the mean ± standard deviation (mean ± SD) of the mass of 3 rocks. We weigh these rocks on a balance and get:

Rock 1: 50 g
Rock 2: 10 g
Rock 3: 5 g

So we would say that the mean ± SD of these rocks is: 21.6 ± 24.6 g.

But now let's say we weigh each rock 3 times each and now there is some error associated with the mass of each rock. Let's say that the mean ± SD of each rock mass is now:

Rock 1: 50 ± 2 g
Rock 2: 10 ± 1 g
Rock 3: 5 ± 1 g

How would we describe the mean ± SD of the three rocks now that there is some uncertainty in their masses? Would it still be 21.6 ± 24.6 g? Some error propagation websites suggest that it would be the square root of the sum of the absolute errors squared, divided by N (N=3 here). But in this case the mean ± SD would only be 21.6 ± 2.45 g, which is clearly too low.

I think this should be a simple problem to analyze, but I have yet to find a clear description of the appropriate equations to use. If my question is not clear please let me know. Any insight would be very appreciated.

Hey rano and welcome to the forums.

In general this problem can be thought of as going from values that have no variance to values that have variance.

What this means mathematically is that you introduce a variance term for each data element, which is now a random variable given by X(i) = x(i) + E, where E is a random variable. In this example x(i) is the mean of your measurements (the value before the ±).

A good choice for E would be, say, a Normal random variable with mean 0 and standard deviation 1/2, which means that about 95% of all values fall within 2 standard deviations (i.e. 1 unit either side of the mean). If instead you had ±2, you would adjust the variance accordingly.

Then you simply take the mean and variance of the sum of all the X(i)'s. This gives you a mean and variance for the sample mean, where the sample mean of your new data is defined as Sum(X(i))/n for i = 1 to n, and n is the number of items you have.
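A minimal sketch of that calculation, assuming the errors on different rocks are independent and have zero mean, so the variance of the sum is the sum of the variances. The function name and the choice of 1/2 g for every rock are illustrative assumptions, not part of the original data.

```python
import math

def sample_mean_with_error(values, sigmas):
    """Mean of X(i) = x(i) + E(i) and the SD of that mean.

    Assumes independent, zero-mean errors with standard deviations `sigmas`.
    """
    n = len(values)
    mean = sum(values) / n
    var_of_mean = sum(s ** 2 for s in sigmas) / n ** 2  # Var(sum) / n^2
    return mean, math.sqrt(var_of_mean)

# Three rocks, each with an assumed measurement SD of 1/2 g (the "±1 at 95%" reading).
mean, sd = sample_mean_with_error([50.0, 10.0, 5.0], [0.5, 0.5, 0.5])
print(f"{mean:.2f} ± {sd:.2f}")
# prints: 21.67 ± 0.29
```

Note that this number describes only how well the mean of these three particular rocks is known given the measurement error; it says nothing about the spread between rocks.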

rano · May 27, 2012

Hi viraltux and haruspex,

Thank you for considering my question. I apologize for any confusion; I am in fact interested in the standard deviation of the population as haruspex deduced. I think a different way to phrase my question might be, "how does the standard deviation of a population change when the samples of that population have uncertainty"?

From your responses I gathered two things. First, this analysis requires us to assume equal measurement error on all 3 rocks. I'm not clear, though, whether this is an absolute or relative error; i.e. is it OK to set the SD of each rock to 2 g despite the fact that their means are different (and thus their relative errors differ)? The second thing I gathered is that I'm not sure this is even a valid question, since it appears as though I am comparing two different measures. But I guess to me it is reasonable that the SD in the sample measurement should be propagated to the population SD somehow. Thank you again for your consideration.

Hi chiro,

Thank you for your response. I think it makes sense to represent each sample as a function with error (e.g. 1 SD) as a random variable. What I am struggling with is the last part of your response where you calculate the population mean and variance. Let's say our rocks all have the same standard deviation on their measurement:

Rock 1: 50 ± 2 g
Rock 2: 10 ± 2 g
Rock 3: 5 ± 2 g

My interpretation of your instruction would be to add these 3 together and divide by N (3 in this case):

Sum: (65 ± 6 g) / 3 = 21.6 ± 2 g.

But to me this doesn't make sense because the standard deviation of the population should be at least 24.6 g as calculated earlier. If you could clarify for me how you would calculate the population mean ± SD in this case I would appreciate it. Thank you again for your consideration.

viraltux · May 27, 2012

rano said:

But I guess to me it is reasonable that the SD in the sample measurement should be propagated to the population SD somehow.

But of course! OK, let's call X the random variable with the real weights, and ε the random error in the measurement. Then Y=X+ε will be the actual measurements you have, in this case Y = {50,10,5}.

You want to know how ε SD affects Y SD, right? Then we go:

Y=X+ε → V(Y) = V(X+ε) → V(Y) = V(X) + V(ε) → V(X) = V(Y) - V(ε)

And therefore we can say that the SD for the real weights considering the measurement errors is

[tex]σ_X = \sqrt{σ_Y^2 - σ_ε^2}[/tex]

What you were doing before was to compare the estimations of σ_Y and σ_ε
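A small simulation can make the subtraction concrete. This is only a sketch: it assumes the measurement error ε is independent of X (which is what lets the variances add), and the distributions, sizes and variable names are hypothetical.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

sigma_x, sigma_eps = 20.0, 2.0  # hypothetical true spread and measurement SD
n = 100_000

x = [random.gauss(100.0, sigma_x) for _ in range(n)]   # true weights X
y = [xi + random.gauss(0.0, sigma_eps) for xi in x]    # measurements Y = X + eps

var_y = statistics.pvariance(y)                        # spread of what we actually observe
sigma_x_recovered = (var_y - sigma_eps ** 2) ** 0.5    # sqrt(sigma_Y^2 - sigma_eps^2)
print(round(sigma_x_recovered, 1))
```

With this many samples the recovered value should land close to the 20 used to generate X, while the raw SD of Y is slightly larger.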

rano · May 27, 2012

Hi viraltux,

Thank you very much for your explanation. That was exactly what I was looking for. I really appreciate your help.

Dickfore · May 27, 2012

rano said:

I was wondering if someone could please help me understand a simple problem of error propagation going from multiple measurements with errors to an average incorporating these errors. I have looked on several error propagation webpages (e.g. UC physics or UMaryland physics) but have yet to find exactly what I am looking for.

I would like to illustrate my question with some example data. Suppose we want to know the mean ± standard deviation (mean ± SD) of the mass of 3 rocks. We weigh these rocks on a balance and get:

Rock 1: 50 g
Rock 2: 10 g
Rock 3: 5 g

So we would say that the mean ± SD of these rocks is: 21.6 ± 24.6 g.

But now let's say we weigh each rock 3 times each and now there is some error associated with the mass of each rock. Let's say that the mean ± SD of each rock mass is now:

Rock 1: 50 ± 2 g
Rock 2: 10 ± 1 g
Rock 3: 5 ± 1 g

How would we describe the mean ± SD of the three rocks now that there is some uncertainty in their masses? Would it still be 21.6 ± 24.6 g? Some error propagation websites suggest that it would be the square root of the sum of the absolute errors squared, divided by N (N=3 here). But in this case the mean ± SD would only be 21.6 ± 2.45 g, which is clearly too low.

I think this should be a simple problem to analyze, but I have yet to find a clear description of the appropriate equations to use. If my question is not clear please let me know. Any insight would be very appreciated.

How did you get 21.6 ± 24.6 g, and 21.6 ± 2.45 g, respectively?!
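For reference, a sketch of how the two figures appear to arise. The reading of the second figure is an assumption: 2.45 matches the root of the summed squared errors before any division by N, whereas dividing by N = 3 as described would give about 0.82.

```python
import math
import statistics

masses = [50.0, 10.0, 5.0]  # grams
errors = [2.0, 1.0, 1.0]    # quoted SD of each rock's weighings

mean = statistics.mean(masses)
sd = statistics.stdev(masses)                       # n - 1 in the denominator
quad_sum = math.sqrt(sum(e ** 2 for e in errors))   # sqrt(2^2 + 1^2 + 1^2)
quad_sum_over_n = quad_sum / len(errors)            # the same, divided by N

print(round(mean, 2), round(sd, 2), round(quad_sum, 2), round(quad_sum_over_n, 2))
# prints: 21.67 24.66 2.45 0.82
```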

viraltux · May 27, 2012

rano said:

Hi viraltux,

Thank you very much for your explanation. That was exactly what I was looking for. I really appreciate your help.

You're welcome

haruspex · May 27, 2012

rano said:

First, this analysis requires us to assume equal measurement error on all 3 rocks. I'm not clear, though, whether this is an absolute or relative error; i.e. is it OK to set the SD of each rock to 2 g despite the fact that their means are different (and thus their relative errors differ)?

Both can be valid, but you would need more data to justify the choice.
Taking the error variance to be a function of the actual weight makes it "heteroscedastic". It would also mean the answer to the question would be a function of the observed weight - i.e. you would not get just one number for the s.d. I think you should avoid this complication if you can.

haruspex · May 27, 2012

viraltux said:

But of course! OK, let's call X the random variable with the real weights, and ε the random error in the measurement. Then Y=X+ε will be the actual measurements you have, in this case Y = {50,10,5}.

You want to know how ε SD affects Y SD, right? Then we go:

Y=X+ε → V(Y) = V(X+ε) → V(Y) = V(X) + V(ε) → V(X) = V(Y) - V(ε)

And therefore we can say that the SD for the real weights considering the measurement errors is

[tex]σ_X = \sqrt{σ_Y^2 - σ_ε^2}[/tex]

What you were doing before was to compare the estimations of σ_Y and σ_ε

viraltux, there must be something wrong with that argument. The uncertainty in the weighings cannot reduce the s.d. I would believe [tex]σ_X = \sqrt{σ_Y^2 + σ_ε^2}[/tex]

viraltux · May 28, 2012

haruspex said:

viraltux, there must be something wrong with that argument. The uncertainty in the weighings cannot reduce the s.d. I would believe [tex]σ_X = \sqrt{σ_Y^2 + σ_ε^2}[/tex]

There is nothing wrong.

σ_X is the uncertainty of the real weights; the uncertainty of the measured weights will always be higher due to the error. Probably what you mean is [tex]σ_Y = \sqrt{σ_X^2 + σ_ε^2}[/tex], which is also true.

haruspex · May 28, 2012

viraltux said:

There is nothing wrong.

σ_X is the uncertainty of the real weights; the uncertainty of the measured weights will always be higher due to the error. Probably what you mean is [tex]σ_Y = \sqrt{σ_X^2 + σ_ε^2}[/tex], which is also true.

OK viraltux, I see what you've done.
For clarity, let me express the problem like this:
- We have N sets of measurements of each of M objects, which are samples from a population.
- We want to know the s.d., Su, of the sampled population.
An obvious approach is to obtain the average measurement of each object then compute a s.d for the population in the usual way from those M values. Call this result Sm (s.d. of means). Clearly this will underestimate that s.d. because it ignores the uncertainty in the M values.
There is another thing to be clarified. I'm sure you're familiar with the fact that there are two formulae for s.d. These correspond to SDEV and SDEVP in spreadsheets. SDEVP gives the s.d. of the dataset, whereas SDEV estimates the s.d. of the population of which the dataset is a (small) sample. (Strictly speaking, it gives the sq root of the unbiased estimate of its variance.) Numerically, SDEV = SDEVP * √(n/(n-1)).
As I understand your formula, it only works for the SDEVP interpretation, and all it does is provide another way of calculating Sm, namely, by taking the s.d. of the entire N * M dataset then adjusting it using the s.d. of the measurement error.
So your formula is correct, but not actually useful. What's needed is a less biased estimate of the SDEV of the population.
I'll give this some more thought...

TheBigH · May 28, 2012

Hi everyone,
I am having a similar problem, except that mine involves repeated measurements of the same constant quantity. Suppose I'm measuring the brightness of a star, a few times with a good telescope that gives small errors (generally of different sizes), and many times with a less sensitive instrument that gives larger errors (also generally of different sizes).

Clearly I can get a brightness for the star by calculating an average weighted by the inverse squares of the errors on the individual measurements, but how can I get the uncertainty on the final measurement?

I don't think the above method for propagating the errors is applicable to my problem because incorporating more data should generally reduce the uncertainty instead of increasing it, even if the data is of poor quality. I should not have to throw away measurements to get a more precise result.

Can anyone help?

viraltux · May 29, 2012

haruspex said:

...So your formula is correct, but not actually useful. What's needed is a less biased estimate of the SDEV of the population.
I'll give this some more thought...

Hi haruspex...

OK, let's go. Given a random variable X, you will never be able to calculate its σ (standard deviation) from a sample, ever, no matter what. The best you can do is to estimate that σ. Usually the estimate of a statistic is written with a hat on it, in this case [itex]\hat{σ}[/itex].

Now, though the formula I wrote is for σ, it works for any of the infinite ways to estimate σ with a [itex]\hat{σ}[/itex]. In this case, since you don't have the whole population of rocks, using SDEV or SDEVP only gives you two of those infinite ways to get a [itex]\hat{σ}[/itex] under their own mathematical assumptions, and, by the way, you can find situations where any of those infinite ways will be the best.

haruspex said:

As I understand your formula, it only works for the SDEVP interpretation,

the formula

[tex]σ_X = \sqrt{σ_Y^2 - σ_ε^2}[/tex]

is not only useful, but the one that is going to work with whatever estimation [itex]\hat{σ}[/itex] you end up using for σ.

Sooooo... yeah, that is basically it...

viraltux · May 29, 2012

TheBigH said:

Hi everyone,
I am having a similar problem, except that mine involves repeated measurements of the same constant quantity. Suppose I'm measuring the brightness of a star, a few times with a good telescope that gives small errors (generally of different sizes), and many times with a less sensitive instrument that gives larger errors (also generally of different sizes).

Clearly I can get a brightness for the star by calculating an average weighted by the inverse squares of the errors on the individual measurements, but how can I get the uncertainty on the final measurement?

I don't think the above method for propagating the errors is applicable to my problem because incorporating more data should generally reduce the uncertainty instead of increasing it, even if the data is of poor quality. I should not have to throw away measurements to get a more precise result.

Can anyone help?

Hi TheBigH,

You are absolutely right!

A way to do so is by using a Kalman filter: http://en.wikipedia.org/wiki/Kalman_filter

In your case, for your two measurements a and b (and assuming they both have the same size), you would have a reduced uncertainty given by the expression:

[tex]σ_{ab}^2 = σ_a^2 \left(1-\frac{σ_a^2}{σ_a^2+σ_b^2}\right)[/tex]
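For more than two measurements, the standard result for an inverse-variance weighted mean is that the combined variance is 1/Σ(1/σ_i²), provided the measurements are independent, unbiased and their σ_i are known. The sketch below, with hypothetical σ values and function names, checks that this agrees with the two-measurement expression above.

```python
def weighted_mean(values, sigmas):
    """Inverse-variance weighted mean of independent measurements."""
    weights = [1.0 / s ** 2 for s in sigmas]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)

def combined_variance(sigmas):
    """Variance of the inverse-variance weighted mean: 1 / sum(1 / sigma_i^2)."""
    return 1.0 / sum(1.0 / s ** 2 for s in sigmas)

sigma_a, sigma_b = 0.1, 0.5  # hypothetical: good telescope, poorer instrument

# Two-measurement expression: sigma_a^2 * (1 - sigma_a^2 / (sigma_a^2 + sigma_b^2))
two_measurement = sigma_a ** 2 * (1 - sigma_a ** 2 / (sigma_a ** 2 + sigma_b ** 2))
general = combined_variance([sigma_a, sigma_b])

print(abs(two_measurement - general) < 1e-12)  # prints: True
print(general < sigma_a ** 2)                  # prints: True
```

The second check shows the point TheBigH raised: adding the poorer measurement still lowers the combined variance below that of the good measurement alone.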

haruspex · May 29, 2012

Viraltux,
As I said, the obvious approach to the OP, and most likely the one used by rano, is to take the average of measurements for each rock sample, then find the s.d. of those averages. It seems to me that your formula does the following to get exactly the same answer:
- finds the s.d. of all the measurements as one large dataset
- adjusts by removing the s.d. contribution from the measurement errors
This is why I said it's not useful.
But I was wrong to say it requires SDEVP; it works with SDEV, and shows one needs to be careful about the sample sizes. If SDEV is used in the 'obvious' method then in the final step, finding the s.d. of the means, the sample size to use is m * n, i.e. the total number of measurements.
Working with variances (i.e. sigma-squareds) for convenience and using Vx, Vy, Ve, VPx, VPy, VPe with what I hope are the obvious meanings, your equation reads:
VPx = VPy - VPe
If there are m rocks and n weighings each, the relationship between SDEV and SDEVP yields:
Vy = VPy*mn/(mn-1)
Ve = VPe*mn/(mn-1)
So if we derive VPx by your formula we get:
VPx = VPy*mn/(mn-1) - VPe*mn/(mn-1)
Hence we must use a sample size of mn to get Vx.

I'm still not sure whether Vx is the unbiased estimate of the population variance... working on it.

viraltux · May 29, 2012

haruspex said:

It seems to me that your formula does the following to get exactly the same answer:
- finds the s.d. of all the measurements as one large dataset
- adjusts by removing the s.d. contribution from the measurement errors
This is why I said it's not useful...

I don't know what you mean by "not useful"; I'm not sure we are on the same page here.

haruspex said:

I'm still not sure whether Vx is the unbiased estimate of the population variance... working on it.

Do you think that an unbiased estimator for σ is the best possible in any situation? Because this is not true. But anyway, using your notation, Vx is unbiased if that is your concern; VPx is not.

haruspex · May 29, 2012

I'm saying it's not useful because all it does is calculate the same number that the method I'm calling the obvious one calculates, i.e. take the s.d. of the average weights of the m rocks. It just uses a different route to get there.

viraltux · May 29, 2012

haruspex said:

I'm saying it's not useful because all it does is calculate the same number that the method I'm calling the obvious one calculates, i.e. take the s.d. of the average weights of the m rocks. It just uses a different route to get there.

You mean weighing the same rock with the same device several times? Why would anyone do that? And why would anyone expect the machine to give a different weight if I take the same rock out and put it back? If that were so, the machine would be horribly calibrated.

This does not make much sense to me since you already know the error given by the manufacturer of the machine; that is the ± that I assume is the ε we are talking about.

haruspex · May 29, 2012

viraltux said:

You mean weighing the same rock with the same device several times? Why would anyone do that? And why would anyone expect the machine to give a different weight if I take the same rock out and put it back? If that were so, the machine would be horribly calibrated.

That is indeed what the OP says. Of course, it could be merely a model for the actual problem.

rano · May 29, 2012

haruspex said:

That is indeed what the OP says. Of course, it could be merely a model for the actual problem.

Thank you all for thinking about this problem. Haruspex is correct in both respects: first, that I am interested in measuring the same sample multiple times, and second, that the original scenario is simply a model. The real-life application of this is actually in biology, where for any given sample we prepare multiple "measurement replicates" to take into account operator error from pipetting small volumes. This is why in my original example the standard deviations were all different: the error comes not only from an instrument but also from human error, which can be variable.

A way to imagine this would be if you wanted to know the mean sugar concentration from 3 different brands of soda bottles (e.g. Pepsi, Sprite, etc.). You have an instrument to measure this, but, the native sugar concentration is too high for your instrument, so you need to dilute 10:1. From these bottles you take 100 mL and add it to 900 mL of water. But you know that there is significant human error in making this dilution, and thus from each sample (each brand of soda) you repeat the dilution 3 times. So now you have a total of 3x3 samples. There is variance both between brands of soda and also within a brand of soda due to sample preparation error. You will end up with a sugar concentration from each bottle with some variance that reflects error in the instrument (with usually known magnitude) and from human operator error (often unknown).

I'm not sure if this example helped clarify my question. The main point I wanted to clarify is that there is often error from sample preparation and not just from an instrument. Thank you all so much for the extensive thought into this. It really has been great to read the discussion.

haruspex · May 29, 2012

OK, think I have an answer.

Underlying r.v. X, unknown mean M, variance V (sigma-squared).
Samples subject to random measurement errors with zero mean and unknown variance Vf.
R samples of X, each measured S times.
Observed values Yij, i = 1..R, j = 1..S.
Problem: find the least biased estimator for the variance of X.

Proposal:
- compute Mi = mean(Yij) for i = 1..R
- Estimate V as the variance V' of the set Mi

To assess bias, need to calculate E(V').
Let Yij = Xi + Fij (F is error variable)
Let M' = average of Mi
E(F) = 0
Without loss of generality we can set M = 0, so E(Mi) = 0 for all i.
E(V') = E(Ʃ_i (Mi - M')²)/R
R.E(V') = E(Ʃ_i(Mi²)) - R.E(M'²) (Eqn 1)
E(Ʃ_i(Mi²)) = E(Ʃ_i(Ʃ_j Yij)²/S²)
= E(Ʃ_i(Ʃ_j(Xi+Fij))²)/S²
= E(Ʃ_i(Ʃ_j(Xi) + Ʃ_j(Fij))²)/S²
= E(Ʃ_i(S.Xi + Ʃ_j(Fij))²)/S²
= E(Ʃ_i(S².Xi² + 2.S.Xi.Ʃ_j(Fij) + (Ʃ_j(Fij))²))/S²
Since E(Fij) = 0 and the errors are independent of the Xi and of each other, this reduces to
= E(Ʃ_i(Xi²) + Ʃ_i(Ʃ_j(Fij²))/S²)
= R.V + R.Vf/S (Eqn 2)

R².S².E(M'²) = E((Ʃ_ij(Yij))²)
= E((Ʃ_ij(Xi) + Ʃ_ij(Fij))²)
= E((S.Ʃ_i(Xi) + Ʃ_ij(Fij))²)
= E(S².(Ʃ_i(Xi))² + 2.S.Ʃ_i(Xi).Ʃ_ij(Fij) + (Ʃ_ij(Fij))²)
= S².R.V + R.S.Vf
E(M'²) = V/R + Vf/(R.S) (Eqn 3)

Using (2) and (3) to substitute in (1):
E(V') = V + Vf/S - V/R - Vf/(R.S)
= (V + Vf/S)(1 - 1/R)
E(Vf') = Vf.(1 - 1/(R.S))

So least biased estimator for V = E(V').R/(R-1) - Vf'.R/(R.S-1)

So here's the procedure:
- take the SDEV of the averaged weighings in the obvious way; squaring this gives E(V').R/(R-1), = σ_Y² in viraltux's notation
- for each of the R*S measurements, form the difference from the average measurement for the corresponding 'rock' and take the SDEV of these; squaring that gives Vf'.R.S/(R.S-1), = σ_ε²
- divide the second by S before subtracting it from the first:
σ_X² = σ_Y² - σ_ε²/S

On the one hand, I do find that divide-by-S tweak a little surprising. On the other, I cannot find an error in my working, and it does fit with my intuition that subtracting the whole of σ_ε² was excessive.

viraltux, would you have the time/patience to check my working?
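One way to check the working numerically is a quick Monte Carlo run. This is a sketch under stated assumptions: X and F are taken as independent normal variables, the values of V, Vf, R and S are hypothetical, and the within-rock variance is pooled per rock with S-1 in the denominator, which is a swapped-in choice rather than the R.S/(R.S-1) factor used in the procedure above.

```python
import random

def sample_var(xs):
    """Unbiased sample variance (n - 1 in the denominator, i.e. SDEV squared)."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

random.seed(1)        # fixed seed so the run is repeatable
V, Vf = 400.0, 16.0   # hypothetical variances of X and of the measurement error F
R, S = 3, 3           # rocks sampled, weighings per rock
trials = 20_000

between_sum = within_sum = 0.0
for _ in range(trials):
    rocks = []
    for _rock in range(R):
        x = random.gauss(0.0, V ** 0.5)                                       # true value Xi
        rocks.append([x + random.gauss(0.0, Vf ** 0.5) for _w in range(S)])   # Yij
    means = [sum(row) / S for row in rocks]                    # Mi
    between_sum += sample_var(means)                           # SDEV squared of the Mi
    within_sum += sum(sample_var(row) for row in rocks) / R    # pooled within-rock variance

avg_between = between_sum / trials          # expected to sit near V + Vf/S
avg_within = within_sum / trials            # expected to sit near Vf
v_estimate = avg_between - avg_within / S   # sigma_Y^2 - sigma_eps^2 / S
print(round(avg_between, 1), round(avg_within, 1), round(v_estimate, 1))
```

If the averages come out near V + Vf/S, Vf and V respectively, that supports the divide-by-S form of the correction, at least for an error variance estimated within each rock.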

haruspex · May 30, 2012

Wait guys - I think I do have something wrong there. Give me a while to try to sort it out.

viraltux · May 30, 2012

rano said:

Thank you all for thinking about this problem. Haruspex is correct in both respects: first, that I am interested in measuring the same sample multiple times, and second, that the original scenario is simply a model. The real-life application of this is actually in biology...

Hahaha

I am laughing because I am new to PF and this is the second or third time that someone has asked for help with a problem which ends up not being the problem they wanted to solve! OK... Next time I will ask "are you suuuuuuuuure that is the problem?"

haruspex said:

OK, think I have an answer.

I do find that divide-by-S tweak a little surprising. On the other, I cannot find an error in my working, and it does fit with my intuition that subtracting the whole of σ_ε² was excessive.

viraltux, would you have the time/patience to check my working?

OK, I think we are on the same page now... In this case we have two "machine" errors: we can call σ_m the standard deviation of the sugar concentration measuring machine, and [itex]\sigma_{tn}[/itex] the s.d. when one guy in the team dilutes the solution n times. So again, the expression to estimate the s.d. among brands taking into account the errors would be:

[tex]\sigma_X = \sqrt{\sigma_Y^2-\sigma_m^2-\sigma_{tn}^2}[/tex]

Calculating σ_Y is simple: you have the data.
Calculating σ_m is simple: just ask the machine manufacturer.

For σ_tn, asking the "manufacturer" probably won't work, so the estimation is up to us. We can do that in a number of ways, but in this problem we have chosen to repeat the measurements n times for every brand, so for one particular brand we can estimate σ_tn as follows:

[tex]\sigma_{tn} = \sqrt{\sigma_Y^2-\sigma_m^2}[/tex]

Now, note that this estimate depends on n, since it is the standard deviation of the sample means of Y, which is nothing other than the standard error of the mean, that is, [itex]\sigma_{tn}=\sigma_{t}/\sqrt{n}[/itex], where n is the number of samples. Therefore, for any value of n we have:

[tex]\sigma_{t} = \sqrt{n(\sigma_Y^2-\sigma_m^2)}[/tex]

So arus, we need to estimate σ_t, which will give us an idea of how reliable you guys are at doing the dilutions. Now, this might change from person to person, but let's assume again that you are all equally reliable in the team (or screwy), so we can simply take σ_t as the team reliability. OK?

But we have several σ_t, one per brand (three in this case), and we don't want to use just those three: since what we need to estimate is the reliability of the "machine team", you can use all the historical data you have to further increase the accuracy of the estimate of σ_t!

So

- Gather all your data and calculate all the σ_t (all must have the same n).
- Use the expected value of the collection of σ_t as the estimate of σ_t.
- Use this formula to estimate σ_X (and since you already have the estimate of σ_t, you can change n if it suits you, e.g. for time or money reasons):
[tex]\sigma_X = \sqrt{\sigma_Y^2-\sigma_m^2-\sigma_{t}^2/n}[/tex]
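As a sketch, the last step is a direct transcription of the formula. The function name, the guard against a negative radicand and the numbers used are all assumptions made for illustration; they are not taken from any real data set.

```python
import math

def sigma_x(sigma_y, sigma_m, sigma_t, n):
    """Transcribes sigma_X = sqrt(sigma_Y^2 - sigma_m^2 - sigma_t^2 / n)."""
    radicand = sigma_y ** 2 - sigma_m ** 2 - sigma_t ** 2 / n
    if radicand < 0:
        # Can happen with noisy estimates: the error terms exceed the observed spread.
        raise ValueError("estimated error variances exceed the observed variance")
    return math.sqrt(radicand)

# Hypothetical numbers, for illustration only.
print(round(sigma_x(sigma_y=5.0, sigma_m=1.0, sigma_t=3.0, n=3), 3))
# prints: 4.583
```

Because the inputs are themselves estimates, the quantity under the square root can come out negative for small samples, which is why the sketch checks for it instead of returning a complex number.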

OK, so that's it... and if someone changes the problem again I resign!

rano · May 30, 2012

viraltux said:

[tex]\sigma_X = \sqrt{\sigma_Y^2-\sigma_m^2-\sigma_{t}^2/n}[/tex]

Ok thank you viraltux. Sorry about the confusion with the original problem; I should have been more specific from the beginning. My interpretation of your analysis and final equation is that:

σ_Y = error propagated standard deviation of population (i.e. SD of sugar concentration between brands of soda taking into account measurement SD)

σ_X = standard deviation of sample means from a population (i.e. SD of sugar concentration between brands of soda without taking into account measurement SD)

σ_m = standard deviation arising from an instrument

σ_t = standard deviation arising from human sample preparation error, which needs to be normalized to n

And then to find σ_Y, which I am calling the error propagated SD, we simply add σ_x², σ_m², and σ_t²/n, and then take the square root. Could you please confirm that this is the case? Thanks again for your help.

viraltux · May 30, 2012

rano said:

And then to find σ_Y, which I am calling the error propagated SD, we simply add σ_x², σ_m², and σ_t²/n, and then take the square root. Could you please confirm that this is the case? Thanks again for your help.

No, I think you mix up Y and X... maybe? again,

X are the real concentrations of sugar (you will never know these), Y is the concentration the machine returns to you (which includes the human dilution error and the machine error).

You don't need to use the formula to find out σ_Y since Y are the measurements of the machine, so just estimate its s.d. in the usual way.

You need to estimate σ_t so that you know how reliable is your team, just the same way you know how reliable is the machine with σ_m, and for that you use all the data available. If you are always equally reliable (i.e. you don't do better in winter than summer) σ_t will be a constant and once you have a good estimation you can stick to that value for future tests.

This info will help you decide the size of n: it might be that with n = 2 or even n = 1 you're fine, or maybe you are better off with five if you introduce too much error with the dilution.

Clear now?

rano · May 30, 2012

Ok thank you for the clarification. I got mixed up because a few posts back the errors were additive rather than subtractive (but I see now the variables were switched). So we would report mean ± σ_X from your equation, where σ_X is the error propagated standard deviation of the population.

This analysis has been very helpful; I've searched through 3 textbooks and numerous websites but this has been the only discussion I have seen on the subject.

viraltux · May 30, 2012

rano said:

Ok thank you for the clarification. I got mixed up because a few posts back the errors were additive rather than subtractive (but I see now the variables were switched). So we would report mean ± σ_X from your equation, where σ_X is the error propagated standard deviation of the population.

This analysis has been very helpful; I've searched through 3 textbooks and numerous websites but this has been the only discussion I have seen on the subject.

where σ_X is the [STRIKE]error propagated[/STRIKE] standard deviation of the population. (Remember X are the real values, the error has been accounted for in σ_X)

Good Luck!

viraltux · May 30, 2012

rano said:

So we would report mean ± σ_X

Oops, no, no, no, hold it right there. you want to report the estimation of E(Y) with [itex]\bar{Y}[/itex] ?

σ_X is the s.d. for the population taking into account the errors (removing them), if this is what you want then we already discussed how you get that, but if you want to report [itex]\bar{Y}[/itex] then you cannot get rid of the error and you need to report [itex]\bar{Y}±σ_Y[/itex].

Think about this example: imagine the concentration of sugar is 1 in all three samples. When we measure it we will find that the observed variation is accounted for exactly by the machine and human error, which means σ_X=0. But you cannot get rid of the errors if you are interested in the estimation of E(Y); you cannot report [itex]\bar{Y} ± 0[/itex].

In this example you could say all three have the same value because σ_X=0, but you don't know that value, so whatever it is, it will be within an error [itex]±σ_Y[/itex], which is additive. Now I see why you were insisting on the additive model. Anyway, you simply need to report:
[tex]\bar{Y} ± \sigma_Y[/tex]

which already accounts for all the human errors and machine ones. At the end it was just that, go figure, ha!

Joel Levitt · May 31, 2012

It depends on which question you are asking. Are you asking what is the average weight of the three rocks (error = 2.4g) or are you asking what is the probability of finding a rock with a specified weight (depends on the distribution that applies, we know that there are almost uncountably many tiny rocks on Earth and only one as big as the Eurasian continental mass, unless you count the planet itself as a rock)?

Applied math is a tool for answering many questions, but only after the questions have been carefully posed.
