# Standard Deviation of Averages

1. May 13, 2013

### SamBam77

I am measuring the thickness of many identically prepared objects. In order to obtain the most accurate value, I measure each object multiple times, in different locations, and average these values to get the average thickness of that object, along with the standard deviation. But I am really interested in the thickness of the average object prepared with this technique. So I produce many objects in the same way and, using the same instrument, measure each object’s average thickness.

At this point, I have a set of averages:
Object_1: µ_1 ± σ_1
Object_2: µ_2 ± σ_2
…
Object_i: µ_i ± σ_i
Where µ_i and σ_i are the average thickness and standard deviation of the ith object.

How would I go about computing the overall average thickness, and its standard deviation, from these data?

To find the overall average, I know it should be:
µ_overall = (N_1 * µ_1 + N_2 * µ_2 + … + N_i * µ_i) / (N_1 + N_2 + … + N_i)
and if each sample’s average is computed using the same number of measurements, this simplifies to just the average of the averages,
µ_overall = (µ_1 + µ_2 + … + µ_i) / i
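As a quick check of the two formulas above, here is a minimal Python sketch. The function name `weighted_mean` and the numbers are made up for illustration; they are not measurements from this thread.

```python
def weighted_mean(means, counts):
    """Combine per-object means, weighting each by its number of measurements."""
    total = sum(n * m for m, n in zip(means, counts))
    return total / sum(counts)


# Hypothetical per-object averages and measurement counts (arbitrary units).
means = [10.2, 9.8, 10.5]
counts = [5, 5, 5]

overall = weighted_mean(means, counts)
simple = sum(means) / len(means)  # plain average of the averages

# With equal counts the two agree (up to floating-point rounding).
print(overall, simple)
```

With unequal counts the two values generally differ, which is why the weighted form is the safer default.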

But what about the standard deviations? I am not sure.
Does it have something to do with the sum of the squares of the individual standard deviations (the variances)?

And a related concept question that I am not clear on in my mind at this point:
Let’s say that the measurements performed on each object, individually, have a very small standard deviation, but the averages that result from each object’s measurements vary greatly between objects.
The resulting overall standard deviation must be large, right?
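One way to make this intuition precise is the standard decomposition of the total sum of squares. This is a sketch under some assumptions: $x_{ij}$ is notation introduced here for the $j$th measurement on object $i$, $N_i$ is the number of measurements on that object, and σ_i is the sample standard deviation computed with the $N_i - 1$ divisor.

$$\sum_{i}\sum_{j=1}^{N_i}\left(x_{ij}-\mu_{\text{overall}}\right)^2 = \sum_{i}(N_i-1)\,\sigma_i^2 + \sum_{i}N_i\left(\mu_i-\mu_{\text{overall}}\right)^2$$

The first term on the right is the scatter within each object and the second is the scatter of the object averages around the overall average. So even if every σ_i is tiny, the overall standard deviation is large whenever the µ_i differ a lot.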

2. May 14, 2013

### haruspex

Why not just treat all measurements as one big dataset and take the standard deviation of that?

3. May 15, 2013

### SamBam77

Is that statistically valid? I am not sure.

I was hoping for a solution that would make use of the averages and deviations that I already calculated.

4. May 15, 2013

### haruspex

Knowing the number of samples in each dataset, and the average and s.d. of each dataset, you can recover the sum and sum of squares of each. Thus you can compute the sum and sum of squares of the entire collection.
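A minimal sketch of that bookkeeping in Python, assuming each s.d. was computed with the N-1 (sample) divisor. The function name `pooled_stats` and the example data are hypothetical, chosen only so the result can be compared against the s.d. of the combined raw data.

```python
import math
import statistics


def pooled_stats(means, sds, counts):
    """Overall mean and sample s.d. from per-dataset mean, s.d. and count.

    Assumes every s.d. uses the N-1 divisor.
    """
    total_n = sum(counts)
    total_sum = sum(n * m for m, n in zip(means, counts))
    # For one dataset: sum of squares = (N - 1) * s^2 + N * m^2
    total_sq = sum(
        (n - 1) * s ** 2 + n * m ** 2 for m, s, n in zip(means, sds, counts)
    )
    mean = total_sum / total_n
    var = (total_sq - total_n * mean ** 2) / (total_n - 1)
    # Guard against a tiny negative value caused by rounding.
    return mean, math.sqrt(max(var, 0.0))


# Hypothetical raw measurements, used only to check the summary-based result.
groups = [[1.0, 2.0, 3.0], [4.0, 6.0]]
means = [statistics.mean(g) for g in groups]
sds = [statistics.stdev(g) for g in groups]
counts = [len(g) for g in groups]

mean, sd = pooled_stats(means, sds, counts)
all_values = [x for g in groups for x in g]
print(mean, statistics.mean(all_values))
print(sd, statistics.stdev(all_values))
```

Subtracting two large sums of squares can lose precision when the spread is very small compared with the mean, so for such data it may be safer to work from the raw measurements.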

5. May 16, 2013

### Ray Vickson

I think your notation leaves out some important distinctions: in models of this type we typically have parameters $\mu$ and $\sigma$ that are unknown. We typically estimate these by using a sample mean $m = \bar{x}$ and a sample standard deviation $s = \sqrt{s^2}$, where we often use a so-called unbiased estimator $s^2$ of $\sigma^2$. (This will give a slightly biased estimate of $\sigma$ itself, but nobody worries much about that, as getting unbiased estimates of $\sigma$ is a really difficult problem.) So, in your problem, are the $\mu_i$ and $\sigma_i$ truly the unknown, unobservable parameters, or did you really mean $m_i$ and $s_i$?

How you treat the data depends on what you want to do. If you really think there can be significant differences between the objects (that is, different underlying $\mu_i$), then combining all the data into a single dataset versus aggregating the different $s_i^2$ values somehow can yield very different results. In fact, if all the $\sigma_i$ are equal to some common $\sigma$ (but, of course, the $s_i$ are different because of random fluctuations), then you are dealing with the types of questions involved in ANOVA (Analysis of Variance), which looks at differences in means by examining variances. If the two types of variance calculations give very different results, you can take that as a signal that the $\mu_i$ are not all the same.
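The comparison of the two variance calculations can be sketched in Python from the summary statistics alone. This is an illustration of the usual one-way ANOVA mean squares, not a full test: it assumes sample standard deviations (N-1 divisor), at least two objects, and more total measurements than objects. The function name `one_way_anova` and the numbers are hypothetical.

```python
def one_way_anova(means, sds, counts):
    """Return (ms_between, ms_within, F) from per-object summaries."""
    k = len(means)
    n_total = sum(counts)
    grand_mean = sum(n * m for m, n in zip(means, counts)) / n_total
    # Scatter of the object means around the grand mean.
    ss_between = sum(n * (m - grand_mean) ** 2 for m, n in zip(means, counts))
    # Scatter of the measurements around their own object mean.
    ss_within = sum((n - 1) * s ** 2 for s, n in zip(sds, counts))
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between, ms_within, ms_between / ms_within


# Hypothetical summaries: tight measurements on each object, very different means.
means = [10.0, 12.0, 14.0]
sds = [0.1, 0.1, 0.1]
counts = [5, 5, 5]

ms_between, ms_within, f_ratio = one_way_anova(means, sds, counts)
print(ms_between, ms_within, f_ratio)
```

A ratio far above 1 suggests the $\mu_i$ are not all the same. Turning it into a p-value requires the F distribution with $k-1$ and $N-k$ degrees of freedom, which is left out of this sketch.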

So, for example, if your different $i$ correspond to items produced on the same machine but at different times of the day, or different days of the week, or under different weather conditions, then you are looking to see if there are time-of-day or day-of-week or weather-related effects. (Many types of industrial process are affected quite a bit by such factors.) Or, in a quality-control scenario, you may be trying to ascertain if a process has drifted out of control, etc.

There seem to be few on-line sources of notes and explanations about ANOVA (although there are lots of on-line calculators, etc.), so looking in a book may still be your best bet. However, I did find one somewhat useful page: http://en.wikipedia.org/wiki/One-way_analysis_of_variance . The other Wiki article on ANOVA leaves a lot to the imagination.