Comparing Sample Means with Pooled Variance Estimate

SUMMARY

The discussion focuses on the use of the pooled variance estimate in hypothesis testing for comparing two sample means from independent samples with unknown but equal variance. The formula for the pooled variance is [ (n1 - 1)s1^2 + (n2 - 1)s2^2 ] / (n1 + n2 - 2). The rationale for using (n1 - 1) and (n2 - 1) instead of n1 and n2 is to obtain an unbiased estimate of the population variance: dividing the sum of squared deviations from the sample mean by N gives an estimate whose expected value is (N - 1)/N times the population variance, and dividing by (N - 1) instead removes this bias.

PREREQUISITES
  • Understanding of hypothesis testing principles
  • Familiarity with sample variance and standard deviation calculations
  • Knowledge of statistical notation and formulas
  • Basic concepts of independent samples in statistics
NEXT STEPS
  • Study the derivation of the pooled variance formula in detail
  • Learn about the implications of biased versus unbiased estimators
  • Explore the concept of degrees of freedom in statistical analysis
  • Investigate the application of pooled variance in different statistical tests
USEFUL FOR

Statisticians, data analysts, researchers conducting hypothesis tests, and anyone interested in understanding the nuances of variance estimation in statistical analysis.

adamg
Suppose you are conducting a hypothesis test to compare two sample means from independent samples, with the variance unknown, but you know it is the same for both populations. Then you use the pooled estimate of the variance given by [ (n1 - 1)s1^2 + (n2-1)s2^2 ] / (n1+n2-2)

I was just wondering why we use (n1-1) etc instead of using n1 and n2 and then dividing by n1 + n2?

thanks
 
When you calculate the sample variance using "(sum of squared differences from the sample mean)/N", it turns out that this gives a biased estimate of the population variance (and its square root is a biased estimate of the population standard deviation). Replacing "N" with "N-1" gives an unbiased estimate of the population variance, so it's usually preferred. (Its square root is still a slightly biased estimate of the standard deviation, though.) Unfortunately there is often a bit of ambiguity whenever sample variance and standard deviation are discussed, as there doesn't seem to be a universal standard for whether "N" or "N-1" is used in the definition.

In your example above I assume that s1^2 and s2^2 are based on the "N-1" calculations.
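
For anyone who wants to see the pooled formula in action, here is a minimal Python sketch. It assumes s1^2 and s2^2 are the "N-1" sample variances (which is what `statistics.variance` computes); the function name `pooled_variance` and the two small data sets are made up for illustration and are not from the question.

```python
import statistics


def pooled_variance(sample1, sample2):
    """Pooled estimate of a common variance from two independent samples."""
    n1, n2 = len(sample1), len(sample2)
    # statistics.variance divides by (n - 1), i.e. the "N-1" definition.
    s1_sq = statistics.variance(sample1)
    s2_sq = statistics.variance(sample2)
    return ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)


# Made-up illustrative data (hypothetical, not from the thread).
a = [4.1, 5.0, 5.6, 4.8, 5.3]
b = [5.9, 6.4, 5.2, 6.1]

sp_sq = pooled_variance(a, b)

# Equivalent view: add up the squared deviations of each sample from its
# own mean, then divide by the total degrees of freedom n1 + n2 - 2.
mean_a, mean_b = statistics.mean(a), statistics.mean(b)
ss = sum((x - mean_a) ** 2 for x in a) + sum((x - mean_b) ** 2 for x in b)
sp_sq_direct = ss / (len(a) + len(b) - 2)

print(sp_sq, sp_sq_direct)  # the two values agree up to rounding
```

The second calculation is the reason the weights are (n1 - 1) and (n2 - 1): multiplying each "N-1" variance by its own (n - 1) just recovers that sample's sum of squared deviations.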
 
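Here is the above in a bit more detail, where \(\mu\) is the population mean and \(\bar{x}\) is the sample mean:

\[ s_n^2 = \frac{1}{n} \sum (x_i-\bar{x})^2 \]

\[ = \frac{1}{n} \sum [ ( (x_i-\mu) - (\bar{x}-\mu) )^2 ] \]

\[ = \frac{1}{n} \sum [ (x_i-\mu)^2 - 2 (x_i-\mu)(\bar{x}-\mu) + (\bar{x}-\mu)^2 ] \]

\[ = \frac{1}{n} \sum (x_i-\mu)^2 - (\bar{x}-\mu)^2 \]

The last step uses \(\frac{1}{n} \sum (x_i-\mu) = \bar{x}-\mu\), so the cross term contributes \(-2(\bar{x}-\mu)^2\), which combines with the final \(+(\bar{x}-\mu)^2\).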

So,

\[ E[s_n^2] = \frac{1}{n} \sum E[(x_i-\mu)^2] - E[(\bar{x}-\mu)^2] \]

\[ = E[(x_i-\mu)^2] - E[(\bar{x}-\mu)^2] \]

\[ = \sigma^2 - \{\text{a term greater than or equal to zero}\} \]

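This shows that the sample variance \(s_n^2\) under-estimates the population variance \(\sigma^2\) on average: its expected value is never larger than \(\sigma^2\), although an individual sample can still come out above it.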

You can further show (assuming all the samples are independent) that \(E[(\bar{x}-\mu)^2]\) is equal to \(\sigma^2/n\), and hence

\[ E[s_n^2] = \sigma^2 - \frac{\sigma^2}{n} = \frac{n-1}{n} \sigma^2 \]
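
If you'd rather check the (n-1)/n factor numerically than take the algebra on trust, a quick simulation does it. This is only a sketch under assumptions I've picked for the demo: normally distributed data, n = 5, true variance 4, and a fixed random seed. None of these come from the thread.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

n = 5            # sample size (kept small so the bias is easy to see)
sigma_sq = 4.0   # true population variance, an assumption for this demo
trials = 50_000  # number of simulated samples

sum_biased = 0.0
sum_unbiased = 0.0
for _ in range(trials):
    sample = [random.gauss(0.0, sigma_sq ** 0.5) for _ in range(n)]
    mean = sum(sample) / n
    ss = sum((x - mean) ** 2 for x in sample)  # squared deviations from the sample mean
    sum_biased += ss / n          # the "N" definition
    sum_unbiased += ss / (n - 1)  # the "N-1" definition

mean_biased = sum_biased / trials
mean_unbiased = sum_unbiased / trials

# The first pair should be close to each other, and so should the second.
print(mean_biased, (n - 1) / n * sigma_sq)
print(mean_unbiased, sigma_sq)
```

The average of the divide-by-n estimates should settle near (n-1)/n times the true variance, while the divide-by-(n-1) average should settle near the true variance itself.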

So not only is \(s_n^2\) a biased estimator of \(\sigma^2\), its expected value is too small by a factor of precisely \((n-1)/n\). Using \(n-1\) instead of \(n\) in the denominator fixes this: the modified sample variance \(s_{n-1}^2\) has expectation \(E[s_{n-1}^2] = \sigma^2\), the population variance.
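
To tie this back to the pooled estimate in the original question, here is a short sketch of the same argument for two samples. It assumes both populations have the same variance \(\sigma^2\) and that s1^2 and s2^2 are the "N-1" variances; the symbols \(SS_1\), \(SS_2\) and \(s_p^2\) are just labels I'm introducing. Write \(SS_k\) for the sum of squared deviations of sample k from its own mean, so that \((n_k-1)s_k^2 = SS_k\) and, by the result above, \(E[SS_k] = (n_k-1)\sigma^2\). Then

\[ E[s_p^2] = \frac{E[SS_1] + E[SS_2]}{n_1+n_2-2} = \frac{(n_1-1)\sigma^2 + (n_2-1)\sigma^2}{n_1+n_2-2} = \sigma^2 \]

whereas dividing the same total by \(n_1+n_2\) gives

\[ E\left[\frac{SS_1 + SS_2}{n_1+n_2}\right] = \frac{n_1+n_2-2}{n_1+n_2} \sigma^2 \]

which is too small. Each sample mean that has to be estimated costs one degree of freedom, which is where the 2 in \(n_1+n_2-2\) comes from.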
 
