Comparing Sample Means with Pooled Variance Estimate

In summary, when a hypothesis test compares the means of two independent samples whose population variances are unknown but assumed equal, the pooled variance estimate [ (n1 - 1)s1^2 + (n2 - 1)s2^2 ] / (n1 + n2 - 2) divides by n1 + n2 - 2 rather than n1 + n2 so that it is an unbiased estimate of the common population variance. The reasoning is the same as for a single sample: dividing the sum of squared deviations by n - 1 instead of n corrects the downward bias that comes from measuring deviations about the sample mean rather than the true mean.
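A minimal Python sketch of the pooled formula, assuming s1^2 and s2^2 are the usual (n - 1)-denominator sample variances. The function names and the two small samples are illustrative only. It also checks that (n1 - 1)s1^2 + (n2 - 1)s2^2 is simply the total sum of squared deviations of each observation about its own sample mean, which is why those particular weights appear.

```python
from fractions import Fraction


def sample_variance(xs):
    """Sample variance with the (n - 1) denominator, in exact arithmetic."""
    n = len(xs)
    mean = Fraction(sum(xs), n)
    return sum((x - mean) ** 2 for x in xs) / (n - 1)


def pooled_variance(s1_sq, n1, s2_sq, n2):
    """Pooled variance: [(n1 - 1)s1^2 + (n2 - 1)s2^2] / (n1 + n2 - 2)."""
    return ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)


def within_group_ss(xs):
    """Sum of squared deviations of each value from its own sample mean."""
    mean = Fraction(sum(xs), len(xs))
    return sum((x - mean) ** 2 for x in xs)


# Hypothetical example data.
sample1 = [4, 7, 9, 12]
sample2 = [5, 6, 10]

sp_sq = pooled_variance(sample_variance(sample1), len(sample1),
                        sample_variance(sample2), len(sample2))

# The same quantity, computed as total within-group SS over n1 + n2 - 2.
direct = (within_group_ss(sample1) + within_group_ss(sample2)) / (
    len(sample1) + len(sample2) - 2)

print(sp_sq, direct)
```

Exact fractions are used so the two routes can be compared with `==` rather than a tolerance.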
  • #1
adamg
Suppose you are conducting a hypothesis test to compare two sample means from independent samples, with the variance unknown, but you know it is the same for both populations. Then you use the pooled estimate of the variance given by [ (n1 - 1)s1^2 + (n2-1)s2^2 ] / (n1+n2-2)

I was just wondering why we use (n1-1) etc instead of using n1 and n2 and then dividing by n1 + n2?

thanks
 
  • #2

When you calculate the sample variance as "(sum of squared differences from the mean)/N", it turns out that this gives a biased estimate of the population variance (and its square root is a biased estimate of the population standard deviation). Replacing "N" with "N-1" gives an unbiased estimate of the population variance, so it is usually preferred; note, however, that the square root of this unbiased variance is still a slightly biased (low) estimate of the standard deviation. Unfortunately, there is often some ambiguity whenever sample variance and standard deviation are discussed, as there is no universal convention on whether to use "N" or "N-1" in the definition.
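The bias can be checked exactly on a tiny example. This sketch assumes a toy population of 0s and 1s with equal probability (a fair coin) and enumerates every possible sample of size 3 drawn with replacement, so the averages below are exact expectations rather than simulation estimates. The helper names `var_n` and `var_n1` are illustrative. It shows the /N variance is biased low, the /(N-1) variance is unbiased, and the square root of the /(N-1) variance still underestimates the standard deviation on average.

```python
import itertools
import math
from fractions import Fraction

# Assumed toy population: values 0 and 1, equally likely.
population = [0, 1]
mu = Fraction(sum(population), len(population))                       # 1/2
sigma_sq = sum((x - mu) ** 2 for x in population) / len(population)  # 1/4


def var_n(xs):
    """Sample variance dividing by N."""
    m = Fraction(sum(xs), len(xs))
    return sum((x - m) ** 2 for x in xs) / len(xs)


def var_n1(xs):
    """Sample variance dividing by N - 1."""
    m = Fraction(sum(xs), len(xs))
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


n = 3
# Every ordered sample of size n (with replacement) is equally likely,
# so plain averages over the enumeration are exact expectations.
samples = list(itertools.product(population, repeat=n))
k = len(samples)

mean_var_n = sum(var_n(s) for s in samples) / k
mean_var_n1 = sum(var_n1(s) for s in samples) / k
mean_sd_n1 = sum(math.sqrt(var_n1(s)) for s in samples) / k

print(sigma_sq, mean_var_n, mean_var_n1)
print(math.sqrt(sigma_sq), mean_sd_n1)
```

Here the /N average comes out as (n-1)/n times the population variance, the /(N-1) average matches it exactly, and the average standard deviation (roughly 0.433) falls short of the true value 0.5.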

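In your example above, I assume that s1^2 and s2^2 are based on the "N-1" calculation, in which case (n1-1)s1^2 and (n2-1)s2^2 are simply the sums of squared deviations within each sample.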
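To tie this back to the pooled formula: write [tex]SS_1 = (n_1-1)s_1^2[/tex] and [tex]SS_2 = (n_2-1)s_2^2[/tex] for the sums of squared deviations of each sample about its own mean. Since each N-1 sample variance is unbiased for the common [tex]\sigma^2[/tex],

[tex]\[E[SS_1 + SS_2] = (n_1-1)\sigma^2 + (n_2-1)\sigma^2 = (n_1+n_2-2)\sigma^2 \][/tex]

so dividing the combined sum of squares by [tex]n_1+n_2-2[/tex] gives an unbiased estimate, while dividing by [tex]n_1+n_2[/tex] gives

[tex]\[E\left[\frac{SS_1+SS_2}{n_1+n_2}\right] = \frac{n_1+n_2-2}{n_1+n_2}\sigma^2 < \sigma^2 \][/tex]

In other words, one degree of freedom is used up for each of the two estimated sample means.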
 
  • #3
Here is the above in a bit more detail:

[tex]\[ s_n^2 = 1/n \sum (x_i-\bar{x})^2 \][/tex]

[tex]\[= 1/n \sum [ ( (x_i-\mu) - (\bar{x}-\mu) )^2 ]\][/tex]

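[tex]\[= 1/n \sum [(x_i-\mu)^2 - 2 (x_i-\mu)(\bar{x}-\mu) + (\bar{x}-\mu)^2 ] \][/tex]

Since [tex]1/n \sum (x_i-\mu) = \bar{x}-\mu[/tex], the cross term averages to [tex]-2(\bar{x}-\mu)^2[/tex], so

[tex]\[= 1/n \sum (x_i-\mu)^2 - (\bar{x}-\mu)^2 \][/tex]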
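The identity reached above is purely algebraic: for any data and any number [tex]\mu[/tex], [tex]1/n \sum (x_i-\bar{x})^2 = 1/n \sum (x_i-\mu)^2 - (\bar{x}-\mu)^2[/tex]. A small Python spot-check, using arbitrary illustrative data and values of mu, with exact fractions so equality holds exactly:

```python
from fractions import Fraction


def lhs(xs):
    """s_n^2: mean squared deviation about the sample mean."""
    n = len(xs)
    xbar = Fraction(sum(xs), n)
    return sum((x - xbar) ** 2 for x in xs) / n


def rhs(xs, mu):
    """(1/n) * sum (x_i - mu)^2 minus (xbar - mu)^2, for any number mu."""
    n = len(xs)
    xbar = Fraction(sum(xs), n)
    return sum((x - mu) ** 2 for x in xs) / n - (xbar - mu) ** 2


data = [3, 8, 1, 10, 4]  # arbitrary illustrative data
for mu in [Fraction(0), Fraction(5), Fraction(-7, 3)]:
    # Exact rational arithmetic, so this is equality, not approximate equality.
    assert lhs(data) == rhs(data, mu)
print("identity holds for every mu tried")
```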

So,

[tex]\[E[s_n^2] = 1/n \sum E[(x_i-\mu)^2] - E[(\bar{x}-\mu)^2 ] \][/tex]

[tex]\[= E[(x_i-\mu)^2] - E[(\bar{x}-\mu)^2 ] \][/tex]

[tex]\[= \sigma^2 - \{\rm{term\ greater\ than\ or\ equal\ to\ zero}\} \][/tex]

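This shows that the sample variance [tex]s_n^2[/tex] underestimates the population variance [tex]\sigma^2[/tex] on average: its expected value is never larger than [tex]\sigma^2[/tex], even though an individual sample can give a value above it.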

You can further show (assuming the observations are independent) that [tex]E[(\bar{x}-\mu)^2 ] [/tex] is equal to [tex]\sigma^2/n[/tex], and hence

[tex]\[E[s_n^2] = \sigma^2 - \sigma^2/n = \frac{n-1}{n} \sigma^2 \][/tex].
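The step [tex]E[(\bar{x}-\mu)^2] = \sigma^2/n[/tex] can be filled in by expanding the square of the average; independence makes every cross term [tex]E[(x_i-\mu)(x_j-\mu)][/tex] with [tex]i \neq j[/tex] vanish, leaving the n diagonal terms, each equal to [tex]\sigma^2[/tex]:

[tex]\[E[(\bar{x}-\mu)^2] = E\left[\left(\frac{1}{n}\sum_i (x_i-\mu)\right)^2\right] = \frac{1}{n^2}\sum_i\sum_j E[(x_i-\mu)(x_j-\mu)] = \frac{1}{n^2}\, n\sigma^2 = \frac{\sigma^2}{n} \][/tex]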

So not only is [tex]s_n^2[/tex] a biased estimator of [tex]\sigma^2[/tex], its expectation is too small by a factor of precisely [tex](n-1)/n[/tex]. Using [tex](n-1)[/tex] instead of [tex]n[/tex] in the denominator multiplies the estimate by [tex]n/(n-1)[/tex], which removes this bias and makes the expectation of the modified sample variance, [tex]E(s_{n-1}^2)[/tex], equal to the population variance [tex]\sigma^2[/tex].
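The factor [tex](n-1)/n[/tex] can be confirmed exactly for several sample sizes. This sketch assumes a toy population of the values 1, 2 and 6, each drawn with probability 1/3, and averages over every possible i.i.d. sample, so the results are exact expectations. The helpers `ss` and `expected` are illustrative names.

```python
import itertools
from fractions import Fraction

# Assumed toy population: 1, 2 and 6, each drawn with probability 1/3.
population = [1, 2, 6]
mu = Fraction(sum(population), len(population))
sigma_sq = sum((x - mu) ** 2 for x in population) / len(population)


def ss(xs):
    """Sum of squared deviations about the sample mean."""
    xbar = Fraction(sum(xs), len(xs))
    return sum((x - xbar) ** 2 for x in xs)


def expected(stat, n):
    """Exact E[stat] over all equally likely i.i.d. samples of size n."""
    samples = list(itertools.product(population, repeat=n))
    return sum(stat(s) for s in samples) / len(samples)


for n in range(2, 6):
    e_sn = expected(lambda s: ss(s) / len(s), n)          # divide by n
    e_sn1 = expected(lambda s: ss(s) / (len(s) - 1), n)   # divide by n - 1
    assert e_sn == Fraction(n - 1, n) * sigma_sq
    assert e_sn1 == sigma_sq
    print(n, e_sn, e_sn1)
```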
 

1. What is the purpose of comparing sample means with pooled variance estimate?

The purpose of comparing sample means with pooled variance estimate is to determine if there is a significant difference between the means of two independent samples. This can help researchers make conclusions about the population means based on the sample data.

2. How is the pooled variance estimate calculated?

The pooled variance estimate is a weighted average of the two sample variances, each weighted by its degrees of freedom (n1 - 1 and n2 - 1): s_p^2 = [ (n1 - 1)s1^2 + (n2 - 1)s2^2 ] / (n1 + n2 - 2).

3. What is the assumption of equal variances in comparing sample means with pooled variance estimate?

The assumption is that the two populations from which the samples are drawn have the same variance, so that s1^2 and s2^2 estimate the same quantity and can sensibly be combined into one estimate.

4. Can the pooled variance estimate be used for dependent samples?

No, the pooled variance estimate is only applicable to independent samples. For dependent (paired) samples, other methods such as the paired t-test should be used.

5. How is the decision made when comparing sample means with pooled variance estimate?

The decision is made by comparing the test statistic t = (x̄1 - x̄2) / sqrt( s_p^2 (1/n1 + 1/n2) ) with a critical value from the t-distribution with n1 + n2 - 2 degrees of freedom. For a two-sided test, if |t| exceeds the critical value t_{α/2, n1+n2-2}, the null hypothesis of equal means is rejected in favor of the alternative hypothesis of unequal means.
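The test described in this FAQ can be sketched in a few lines of Python. This is a minimal illustration, not a library routine: `pooled_t_test` is a hypothetical name, and to stay within the standard library the critical value is supplied by the caller (for example from a t-table) rather than computed.

```python
import math


def pooled_t_test(sample1, sample2, t_crit):
    """Pooled (equal-variance) two-sample t statistic and a two-sided decision.

    t_crit is the upper alpha/2 critical value of the t-distribution with
    n1 + n2 - 2 degrees of freedom, supplied by the caller.
    """
    n1, n2 = len(sample1), len(sample2)
    mean1 = sum(sample1) / n1
    mean2 = sum(sample2) / n2
    # (n - 1)-denominator sample variances.
    s1_sq = sum((x - mean1) ** 2 for x in sample1) / (n1 - 1)
    s2_sq = sum((x - mean2) ** 2 for x in sample2) / (n2 - 1)
    df = n1 + n2 - 2
    sp_sq = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / df
    # Standard error of the difference in means under equal variances.
    se = math.sqrt(sp_sq * (1 / n1 + 1 / n2))
    t = (mean1 - mean2) / se
    reject = abs(t) > t_crit  # two-sided decision
    return t, df, reject
```

For instance, `pooled_t_test([4, 7, 9, 12], [5, 6, 10], 2.571)` returns df = 5 and t of about 0.42, so equal means are not rejected at the two-sided 5% level (2.571 being the tabulated critical value for 5 degrees of freedom). If SciPy is available, `scipy.stats.ttest_ind(a, b, equal_var=True)` computes the same pooled statistic with a p-value, and `scipy.stats.t.ppf(0.975, df)` gives the critical value.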
