Is it valid to combine standard deviations from different groups?

In summary, combining standard deviations is a way to determine the overall variability of a set of data. A common approach is to square the individual standard deviations, add the squares, and take the square root of the sum. This is useful when comparing groups or populations. Samples of different sizes can be combined, though the result may not accurately represent the whole population. Other methods for combining standard deviations exist, but they are less commonly used and may be less accurate.
  • #1
regisz90
We have two groups measuring the same resistors; the nominal value is unknown. Group 1 was slower, and because of that they did not calculate the empirical standard deviation ##s_1##.
  • Group 1: N1=500 , R1=6903 , s1=unknown
  • Group 2: N2=20 , R2=6880 , s2=168.3
where ##N_1, N_2## are the numbers of measurements, ##R_1, R_2## the averages of the measured values, and ##s_1, s_2## the empirical standard deviations.

We have to give the confidence interval for the nominal value of the resistance at confidence level p=90% and prove that this is the right way to calculate it.

I found an answer somewhere, but I don't really understand why it is correct, and I also need to prove that it is right.

The solution was this: "Even if we don't know ##s_1##, we can use the estimator ##R_1##, because it is more accurate than ##R_2## (it is based on more information). However, we can only use the empirical standard deviation ##s_2##, and the confidence interval should be calculated with Student's t-distribution with ##N_2 - 1 = 19## degrees of freedom."
[attached image zkot7.png: the confidence-interval formula from that solution]

In that formula, ##s_2## is divided by the square root of ##N_1##, yet the Student's t-distribution takes its degrees of freedom from ##N_2##. Can someone explain why this is correct and why we can combine the two measurements this way, or better, show me the derivation?
 
  • #2
Degrees of freedom are for ##s_2##, so it makes sense that you would use 19.
There is an underlying assumption that the variances in the samples are equal. This seems fair, since the groups are measuring the same resistors.
I would assume that you could also fold the group 2 measurements into your estimate of the mean if you wanted to:
## \overline{R} = \frac{500(6903)+20(6880)}{520} ##
## P[ \overline{R} - \frac{s_2}{\sqrt{N_2+N_1}}t_{19,0.05} < R < \overline{R} + \frac{s_2}{\sqrt{N_2+N_1}}t_{19,0.05} ] = 90\%##
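A quick numeric check of the interval above, as a sketch in plain Python using the thread's numbers; the critical value ##t_{19,0.05} \approx 1.729## is taken from a standard t-table rather than computed:

```python
import math

# Numbers from the thread (group 1 has no standard deviation of its own)
N1, R1 = 500, 6903.0
N2, R2, s2 = 20, 6880.0, 168.3

R_bar = (N1 * R1 + N2 * R2) / (N1 + N2)   # weighted mean over all 520 measurements
se = s2 / math.sqrt(N1 + N2)              # s2 is the only available variance estimate
t_crit = 1.729                            # t_{19, 0.05}, two-sided 90%, from a t-table
lo, hi = R_bar - t_crit * se, R_bar + t_crit * se
print(f"R_bar = {R_bar:.1f}, 90% CI = ({lo:.1f}, {hi:.1f})")
```

With these numbers the combined mean is about 6902.1 and the interval is roughly (6889, 6915), comfortably containing both group averages.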

I doubt the intervals would be much different either way. In any case, I would normally argue for including more data whenever possible.
 
  • #3
To add a little more clarity to the matter:
The 90% confidence interval for any one observation, based on the observed standard deviation and mean, would be:
## P[ \overline{R} - s_2 t_{19,0.05} < R < \overline{R} + s_2 t_{19,0.05} ] = 90\%##
However, the mean of the data varies much less: its standard deviation is reduced by a factor of ##\sqrt{N}##, where N is the number of observations used to determine the mean.
The t-distribution tends toward the normal Z-distribution for large samples, but for small samples it is a fatter curve, which makes for wider confidence intervals than a true normal. This accounts for the uncertainty in the estimator of the population standard deviation, in this case ##s_2##. The degrees of freedom of the t-distribution are n − 1, where n is the number of observations used to estimate ##s_2##.
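The fatter tails show up in a small simulation (a sketch, not from the thread): with n = 20, the t-statistic falls outside the *normal* 90% critical value ±1.645 noticeably more than 10% of the time.

```python
import math
import random
import statistics

random.seed(0)
n, trials = 20, 20000
z_crit = 1.645          # two-sided 90% critical value of the standard normal
exceed = 0
for _ in range(trials):
    xs = [random.gauss(0.0, 1.0) for _ in range(n)]
    # t-statistic for the sample mean, using the sample standard deviation
    t_stat = statistics.mean(xs) / (statistics.stdev(xs) / math.sqrt(n))
    exceed += abs(t_stat) > z_crit
frac = exceed / trials
print(f"fraction outside ±1.645: {frac:.3f}")   # above 0.10: t_19 has fatter tails
```

The theoretical exceedance for ##t_{19}## at ±1.645 is about 0.12, which is why the wider critical value 1.729 is needed to get true 90% coverage.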
Does that make sense?
 
  • #4
Yes, thank you! Is there any way I can prove that $$\frac{\bar{R}-R}{s_2/\sqrt{N_1+N_2}} \sim t_{19}\,?$$
 
  • #5
It is based on the assumption that the two samples are drawn from the same population. If they are, then there is no reason not to add the 20 trials from group 2 into the overall observed mean, unless you can point to some reason they might have yielded slightly lower values.
So if you assume they form one population, you get an estimator for the mean, ##\overline{R}##, based on ##N_1 + N_2 = N## trials.
Now the question is: what about the variance of the mean?
We can only estimate based on what we know. Using our best estimate for the variance, ##s_2^2##, we again assume that all the resistors are from the same population and use the T distribution.

I am not sure what you are asking for in terms of a proof...
 
  • #6
Okay, I think I found a solid proof.
I came across the following; see Montgomery et al., Introduction to Linear Regression Analysis, 5th ed., p. 576:

"If ##Z \sim N(0,1)## and ##V \sim \chi^2_\nu##, and ##Z## and ##V## are independent, then ##\frac{Z}{\sqrt{V/\nu}} \sim t_\nu##."

Given that ##\frac{\overline{R} - \mu}{\sigma/\sqrt{520}}## is normally distributed with mean 0 and standard deviation 1, it will serve as ##Z##.
Also, ##\sqrt{\frac{s_2^2}{\sigma^2}} \sim \sqrt{\frac{\chi_{19}^2}{19}}##, which will serve as ##\sqrt{V/\nu}##.
Therefore,
$$\frac{ \frac{\overline{R} - \mu}{\sigma/\sqrt{520}} }{ \sqrt{\frac{s_2^2}{\sigma^2}} } = \frac{\overline{R} - \mu}{\sigma/\sqrt{520}} \cdot \frac{\sigma}{s_2} = \frac{\overline{R} - \mu}{s_2/\sqrt{520}} \sim t_{19}.$$

Again, the fundamental assumption is that the samples came from the same population, so they can be combined. If there is reason to question this assumption, then there may also be reason to question whether the variances are the same between samples.
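The derivation can be sanity-checked by simulation (a sketch; the "true" mean and sigma below are assumed values chosen for illustration only): draw both groups from one normal population, form the combined mean of all 520 values, estimate the standard deviation from group 2 alone, and check how often ##|T| < t_{19,0.05}##.

```python
import math
import random
import statistics

random.seed(1)
mu, sigma = 6900.0, 168.0      # assumed "true" values, for the simulation only
N1, N2 = 500, 20
t_crit = 1.729                 # t_{19, 0.05}
trials, inside = 5000, 0
for _ in range(trials):
    g1 = [random.gauss(mu, sigma) for _ in range(N1)]
    g2 = [random.gauss(mu, sigma) for _ in range(N2)]
    r_bar = statistics.mean(g1 + g2)          # mean of all 520 values
    s2 = statistics.stdev(g2)                 # standard deviation from group 2 only
    T = (r_bar - mu) / (s2 / math.sqrt(N1 + N2))
    inside += abs(T) < t_crit
print(f"coverage: {inside / trials:.3f}")     # should be close to 0.90
```

Coverage lands near 90% because ##\overline{R}## (a function of the two sample means) is independent of ##s_2## for normal data, exactly as the Z-and-V decomposition requires.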
 

What is the purpose of combining standard deviations?

The purpose of combining standard deviations is to determine the overall variability or dispersion of a set of data points. It allows us to better understand the spread of data and make more accurate conclusions about the population from which the data was collected.

How do you combine standard deviations?

To combine standard deviations from independent sources of variation, square each individual standard deviation, add the squares, and take the square root of the sum. This root-sum-of-squares approach is the most commonly used method for combining standard deviations of independent contributions; pooling the standard deviations of several samples from the same population instead weights each variance by its degrees of freedom.
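A minimal sketch of the root-sum-of-squares rule, with illustrative numbers (not from the thread):

```python
import math

# Standard deviations of two independent error sources (illustrative values)
sds = [3.0, 4.0]
combined = math.sqrt(sum(s ** 2 for s in sds))
print(combined)  # 5.0, since sqrt(3^2 + 4^2) = 5
```

This is valid only when the sources of variation are independent, so their variances add.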

When should you combine standard deviations?

Standard deviations should be combined when you have multiple sets of data with their own individual standard deviations and want to determine the overall variability of the combined data. This is especially useful when comparing different groups or populations.

Can you combine standard deviations of different sample sizes?

Yes, you can combine standard deviations from different sample sizes, typically by weighting each group's variance by its degrees of freedom (a pooled standard deviation). However, the result may not accurately represent the variability of the entire population, because larger samples have more influence on the pooled estimate than smaller ones.
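A sketch of a degrees-of-freedom-weighted pooled standard deviation; the first pair uses group 2's numbers from the thread, while the second sample is hypothetical:

```python
import math

def pooled_sd(groups):
    """Pool sample standard deviations, weighting each variance by n - 1."""
    num = sum((n - 1) * s ** 2 for n, s in groups)
    den = sum(n - 1 for n, _ in groups)
    return math.sqrt(num / den)

# Group 2 from the thread plus a made-up second sample of 10 measurements
print(pooled_sd([(20, 168.3), (10, 150.0)]))
```

The pooled value falls between the two inputs, pulled toward the larger sample's standard deviation.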

Are there other methods for combining standard deviations?

Yes, there are other approaches, such as the range rule of thumb and methods based on the standard error of the mean. These are less commonly used than the root-sum-of-squares method and may give less accurate results.
