Standard Deviation of Averages

In summary, the conversation discusses how to measure the thickness of identically prepared objects accurately by taking multiple measurements of each object, averaging them, and calculating the standard deviation. The focus is on obtaining the overall average thickness and its standard deviation from the collected data. There is also a discussion of how to handle differing standard deviations and how they may indicate differences in underlying parameters. ANOVA is mentioned as a possible method for analyzing the data and identifying factors that affect the measurements.
  • #1
SamBam77
I am measuring the thickness of many identically prepared objects. In order to obtain the most accurate value, I measure each object multiple times, in different locations, and average these values to get the average thickness of that object, along with the standard deviation. But I am really interested in the thickness of the average object prepared with this technique. So I produce many objects in the same way and, using the same instrument, measure each object’s average thickness.

At this point, I have a set of averages:
Object_1: µ_1 ± σ_1
Object_2: µ_2 ± σ_2
…
Object_i: µ_i ± σ_i
where µ_i and σ_i are the average thickness and standard deviation of the i-th object.

How would I go about computing the overall average thickness, and its standard deviation, from these data?

To find the overall average, I know it should be:
µ_overall = (N_1 * µ_1 + N_2 * µ_2 + … + N_i * µ_i) / (N_1 + N_2 + … + N_i)
and if each sample’s average is computed using the same number of measurements, this simplifies to just the average of the averages,
µ_overall = (µ_1 + µ_2 + … + µ_i) / i
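A minimal Python sketch of the weighted formula above, for anyone who wants to check it numerically. The function name `overall_mean` and the numbers in the usage lines are illustrative assumptions, not data from this thread.

```python
from typing import Sequence


def overall_mean(means: Sequence[float], counts: Sequence[int]) -> float:
    """Weighted overall mean: (N_1*mu_1 + ... + N_i*mu_i) / (N_1 + ... + N_i)."""
    if not means or len(means) != len(counts):
        raise ValueError("need at least one object and one count per mean")
    total_n = sum(counts)
    return sum(n * m for m, n in zip(means, counts)) / total_n


# Unequal counts: the object with 3 measurements pulls the result toward 2.0.
print(overall_mean([1.0, 2.0], [1, 3]))          # 1.75
# Equal counts reduce to the plain average of the averages.
print(overall_mean([1.0, 2.0, 3.0], [5, 5, 5]))  # 2.0
```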

But what about the standard deviations? I am not sure.
Does it have something to do with the sum of the squares of the individual standard deviations (the variances)?

And a related conceptual question that I am not yet clear on:
Let’s say that the measurements performed on each individual object have a very narrow standard deviation, but the averages obtained for different objects vary greatly.
The resulting overall standard deviation must be large, right?
 
  • #2
Why not just treat all measurements as one big dataset and take the standard deviation of that?
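The suggestion above can be sketched in Python as follows, assuming the raw measurements for each object are still available as lists. `pooled_stats` is a hypothetical helper and the thickness values are made up; they deliberately give each object a small internal spread (s = 0.1) but different averages (1.1 and 2.0).

```python
import statistics
from itertools import chain


def pooled_stats(measurements_per_object):
    """Mean and sample s.d. (n - 1 divisor) of all measurements as one dataset."""
    all_values = list(chain.from_iterable(measurements_per_object))
    return statistics.mean(all_values), statistics.stdev(all_values)


data = [[1.0, 1.2, 1.1], [2.0, 2.1, 1.9]]  # made-up thicknesses, two objects
mean, sd = pooled_stats(data)
print(f"pooled mean = {mean:.3f}, pooled s.d. = {sd:.3f}")
```

With these made-up numbers the pooled s.d. comes out around 0.5, roughly five times either object's own s.d., which bears on the last question in #1.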
 
  • #3
Is that statistically valid? I am not sure.

I was hoping for a solution that would make use of the averages and deviations that I already calculated.
 
  • #4
SamBam77 said:
I was hoping for a solution that would make use of the averages and deviations that I already calculated.
Knowing the number of samples in each dataset, and the average and s.d. of each dataset, you can recover the sum and sum of squares of each. Thus you can compute the sum and sum of squares of the entire collection.
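A sketch of that recipe in Python, assuming each object's s.d. was computed with the usual (n - 1) divisor, so that the sum of x is n·m and the sum of x² is (n - 1)·s² + n·m². The function name `combine_summaries` and the example summaries are hypothetical.

```python
import math
from typing import Iterable, Tuple


def combine_summaries(groups: Iterable[Tuple[int, float, float]]) -> Tuple[int, float, float]:
    """Combine per-object (n_i, m_i, s_i) into (N, mean, s.d.) of all the data.

    Assumes each s_i is a sample s.d. computed with an (n_i - 1) divisor.
    """
    total_n = 0
    total_sum = 0.0    # running sum of x over every measurement
    total_sumsq = 0.0  # running sum of x**2 over every measurement
    for n, m, s in groups:
        total_n += n
        total_sum += n * m                        # sum of x for this object
        total_sumsq += (n - 1) * s**2 + n * m**2  # sum of x**2 for this object
    if total_n < 2:
        raise ValueError("need at least two measurements in total")
    mean = total_sum / total_n
    # Sample variance of the combined data; clamp tiny negative rounding noise.
    var = max((total_sumsq - total_sum**2 / total_n) / (total_n - 1), 0.0)
    return total_n, mean, math.sqrt(var)


# Made-up summaries: two objects, 3 measurements each.
print(combine_summaries([(3, 1.1, 0.1), (3, 2.0, 0.1)]))
```

These summaries correspond to the made-up lists [1.0, 1.2, 1.1] and [2.0, 2.1, 1.9], and the result (N = 6, mean 1.55, s.d. about 0.501) is exactly what pooling those raw measurements gives, so the two routes are the same calculation. If the s_i were computed with an n_i divisor instead, replace `(n - 1) * s**2` with `n * s**2`.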
 
  • #5
I think your notation leaves out some important distinctions: in models of this type we typically have parameters ##\mu## and ##\sigma## that are unknown. We typically estimate these by using a sample mean ##m = \bar{x}## and a sample standard deviation ##s = \sqrt{s^2}##, where we often use a so-called unbiased estimator ##s^2## of ##\sigma^2##. (This will give a slightly biased estimate of ##\sigma## itself, but nobody worries much about that, as getting unbiased estimates of ##\sigma## is a really difficult problem.) So, in your problem, are the ##\mu_i## and ##\sigma_i## truly the unknown, unobservable parameters, or did you really mean ##m_i## and ##s_i##?
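To make the distinction between m, s² and s concrete, here is a small sketch using Python's standard-library `statistics` module: `statistics.variance` uses the (n - 1) divisor (the unbiased s²), `statistics.stdev` is its square root s, and `statistics.pvariance` divides by n. The helper name `sample_summary` and the measurement values are made up for illustration.

```python
import math
import statistics


def sample_summary(values):
    """Return (m, s^2, s, divide-by-n variance) for one object's measurements."""
    m = statistics.mean(values)          # sample mean m = x-bar
    s2 = statistics.variance(values)     # unbiased estimator of sigma^2 (n - 1 divisor)
    s = math.sqrt(s2)                    # s = sqrt(s^2); slightly biased for sigma
    v_n = statistics.pvariance(values)   # divide-by-n version, biased low for sigma^2
    return m, s2, s, v_n


m, s2, s, v_n = sample_summary([1.0, 1.2, 1.1])  # made-up thicknesses
print(m, s2, s, v_n)
```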

How you treat the data depends on what you want to do. If you really think there can be significant differences between the objects (that is, different underlying ##\mu_i##), then combining all the data into a single dataset and aggregating the different ##s_i^2## values somehow can yield very different results. In fact, if all the ##\sigma_i## are equal to some common ##\sigma## (but, of course, the ##s_i## are different because of random fluctuations), then you are dealing with the types of questions involved in ANOVA (analysis of variance), which looks at differences in means by examining variances. If the two types of variance calculations give very different results, you can take that as a signal that the ##\mu_i## are not all the same.
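The comparison of the two variance calculations can be sketched as a one-way ANOVA F statistic computed directly from per-object summaries (n_i, m_i, s_i). This assumes, as above, a common σ for all objects and s_i computed with an (n_i - 1) divisor; the function name and numbers are hypothetical. Turning F into a p-value requires the F distribution with (k - 1, N - k) degrees of freedom (for example `scipy.stats.f.sf`), which this sketch does not do.

```python
from typing import Sequence, Tuple


def one_way_anova_f(groups: Sequence[Tuple[int, float, float]]) -> Tuple[float, int, int]:
    """One-way ANOVA F statistic from per-object summaries (n_i, m_i, s_i).

    Assumes s_i uses an (n_i - 1) divisor and that all objects share a common sigma.
    Returns (F, df_between, df_within).
    """
    k = len(groups)
    total_n = sum(n for n, _, _ in groups)
    if k < 2 or total_n <= k:
        raise ValueError("need at least two objects and more measurements than objects")
    grand_mean = sum(n * m for n, m, _ in groups) / total_n
    # Between-object variation: how far each object's mean sits from the grand mean.
    ss_between = sum(n * (m - grand_mean) ** 2 for n, m, _ in groups)
    # Within-object variation: the pooled s_i^2 values.
    ss_within = sum((n - 1) * s ** 2 for n, _, s in groups)
    df_between, df_within = k - 1, total_n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Made-up data: tight spread within each object, very different means.
print(one_way_anova_f([(3, 1.1, 0.1), (3, 2.0, 0.1)]))
```

For these made-up summaries F is about 121.5 with (1, 4) degrees of freedom. The between and within sums of squares (1.215 and 0.04) add up to the total sum of squares of all six measurements about the grand mean (1.255), which is the formal version of the "must be large, right?" intuition in the original question.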

So, for example, if your different ##i## correspond to items produced on the same machine but at different times of the day, or different days of the week, or under different weather conditions, then you are looking to see if there are time-of-day or day-of-week or weather-related effects. (Many types of industrial processes are affected quite a bit by such factors.) Or, in a quality-control scenario, you may be trying to ascertain if a process has drifted out of control, etc.

There seem to be few on-line sources of notes and explanations about ANOVA (although there are lots of on-line calculators, etc.), so looking in a book may still be your best bet. However, I did find one somewhat useful page: http://en.wikipedia.org/wiki/One-way_analysis_of_variance . The other Wiki article on ANOVA leaves a lot to the imagination.
 

Related to Standard Deviation of Averages

What is the standard deviation of averages?

The standard deviation of averages is the standard deviation of a set of averages (means), such as the per-object average thicknesses in this thread. It measures how much those averages spread around their own overall average, rather than how much individual measurements spread.

How is the standard deviation of averages calculated?

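It is calculated like any sample standard deviation, treating each average as a single data point: square the difference between each average and the overall average, sum these squares, divide by the number of averages minus one, and take the square root. If the individual measurements are independent with standard deviation σ and each average uses n of them, the standard deviation of an average is expected to be σ/√n (the standard error of the mean).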
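Both quantities can be sketched in a few lines of Python; the function names and the example values are illustrative assumptions, and `standard_error` presumes independent measurements with a known common σ.

```python
import math
import statistics


def sd_of_averages(averages):
    """Sample s.d. (n - 1 divisor) of a list of averages, each treated as one data point."""
    return statistics.stdev(averages)


def standard_error(sigma, n):
    """Expected s.d. of an average of n independent measurements with s.d. sigma."""
    return sigma / math.sqrt(n)


print(sd_of_averages([1.1, 2.0]))   # spread of two made-up object averages
print(standard_error(0.1, 3))       # sigma = 0.1, n = 3 measurements per average
```

Comparing the two numbers is one informal way to see the issue raised in the thread: here the spread of the object averages (about 0.64) is far larger than σ/√n (about 0.058), which suggests the objects themselves differ.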

Why is the standard deviation of averages important?

The standard deviation of averages is important because it shows how precisely an average is determined and how much averages differ from one set of measurements (or, here, one object) to the next. More generally, standard deviations are used as measures of risk or uncertainty in fields such as finance and science.

What does a high standard deviation of averages indicate?

A high standard deviation of averages indicates that the averages differ substantially from one another. In this thread's setting, if each object's own measurements are tightly clustered but the per-object averages are widely spread, that points to real object-to-object differences, which is what a one-way ANOVA tests.

How does the standard deviation of averages differ from the standard deviation?

The ordinary standard deviation describes the spread of individual data points, while the standard deviation of averages describes the spread of means computed from several data sets. For independent measurements with a common standard deviation σ, averages of n measurements have standard deviation σ/√n, so the standard deviation of averages is smaller than that of the individual points.
