Comparing Sample Means with Pooled Variance Estimate

AI Thread Summary
In hypothesis testing for comparing two sample means with unknown but equal variances, the pooled variance estimate uses (n1-1) and (n2-1) in the numerator to avoid bias in estimating the population variance. Using N instead of N-1 leads to an underestimation of the population variance, as sample variance calculated with N is biased. The adjustment to N-1 corrects this bias, ensuring that the expected value of the sample variance equals the true population variance. This adjustment is crucial for accurate statistical inference. Therefore, employing the pooled variance formula with (n1-1) and (n2-1) is the preferred method in these scenarios.
adamg
Suppose you are conducting a hypothesis test to compare two sample means from independent samples, with the variance unknown, but you know it is the same for both populations. Then you use the pooled estimate of the variance given by [ (n1 - 1)s1^2 + (n2-1)s2^2 ] / (n1+n2-2)

I was just wondering why we use (n1-1) etc instead of using n1 and n2 and then dividing by n1 + n2?

thanks
 
adamg said:
Suppose you are conducting a hypothesis test to compare two sample means from independent samples, with the variance unknown, but you know it is the same for both populations. Then you use the pooled estimate of the variance given by [ (n1 - 1)s1^2 + (n2-1)s2^2 ] / (n1+n2-2)

I was just wondering why we use (n1-1) etc instead of using n1 and n2 and then dividing by n1 + n2?

thanks

When you calculate the sample variance as "(sum of squared differences from the mean)/N", it turns out that this gives a biased estimate of the population variance. Replacing "N" with "N-1" gives an unbiased estimate of the population variance, so it's usually preferred (its square root is still a slightly biased estimate of the population standard deviation, but that's a separate issue). Unfortunately there is often some ambiguity whenever sample variance and standard deviation are discussed, as there doesn't seem to be a universal standard on whether to use "N" or "N-1" in the definition.

In your example above I assume that s1^2 and s2^2 are based on the "N-1" calculations.
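A quick Monte Carlo sketch makes the bias visible (the data and seed here are my own illustration, not from the thread): drawing many small samples from a population with known variance and averaging both versions of the sample variance.

```python
# Sketch: compare the "divide by N" and "divide by N-1" sample variances
# against a known population variance, via simulation.
import numpy as np

rng = np.random.default_rng(0)
sigma2 = 4.0           # true population variance (chosen for illustration)
n = 5                  # small sample size, where the bias is most visible
trials = 200_000

samples = rng.normal(loc=0.0, scale=np.sqrt(sigma2), size=(trials, n))
var_n = samples.var(axis=1, ddof=0)    # divide by N   (biased)
var_n1 = samples.var(axis=1, ddof=1)   # divide by N-1 (unbiased)

print(var_n.mean())    # close to (n-1)/n * sigma2 = 3.2, i.e. too small
print(var_n1.mean())   # close to sigma2 = 4.0
```

The `ddof` ("delta degrees of freedom") argument in NumPy is exactly this N vs N-1 choice: `ddof=0` divides by N, `ddof=1` by N-1.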
 
Here is the above in a bit more detail :

\[ s_n^2 = \frac{1}{n} \sum_i (x_i-\bar{x})^2 \]

\[= \frac{1}{n} \sum_i \left[ \left( (x_i-\mu) - (\bar{x}-\mu) \right)^2 \right]\]

\[= \frac{1}{n} \sum_i \left[(x_i-\mu)^2 - 2 (x_i-\mu)(\bar{x}-\mu) + (\bar{x}-\mu)^2 \right] \]

\[= \frac{1}{n} \sum_i (x_i-\mu)^2 - (\bar{x}-\mu)^2 \]

(the last step uses \sum_i (x_i-\mu) = n(\bar{x}-\mu), so the cross term sums to -2(\bar{x}-\mu)^2 and partially cancels the +(\bar{x}-\mu)^2 term)

So,

\[E[s_n^2] = \frac{1}{n} \sum_i E[(x_i-\mu)^2] - E[(\bar{x}-\mu)^2 ] \]

\[= E[(x-\mu)^2] - E[(\bar{x}-\mu)^2 ] \]

\[= \sigma^2 - \{\text{term greater than or equal to zero}\} \]

This shows that the sample variance s_n^2 systematically underestimates the population variance \sigma^2.

You can further show (assuming all the samples are independent) that E[(\bar{x}-\mu)^2] is equal to \sigma^2/n and hence,
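For completeness, a sketch of that step (using independence of the x_i, each with variance \sigma^2):

\[ E[(\bar{x}-\mu)^2] = \mathrm{Var}(\bar{x}) = \mathrm{Var}\!\left(\frac{1}{n}\sum_i x_i\right) = \frac{1}{n^2}\sum_i \mathrm{Var}(x_i) = \frac{n\sigma^2}{n^2} = \frac{\sigma^2}{n} \]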

\[E[s_n^2] = \sigma^2 - \sigma^2/n = \frac{n-1}{n} \sigma^2 \]

So not only is s_n^2 a biased estimator of \sigma^2, it is too small by a factor of precisely (n-1)/n. Clearly, using (n-1) instead of n in the denominator fixes this and makes the expectation of the modified sample variance, E[s_{n-1}^2], equal to the population variance \sigma^2.
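Tying it back to the original question, here is a small sketch of the pooled estimate itself (the data are made up for illustration): since (n-1)s^2 with the N-1 convention is just the sum of squared deviations from that sample's own mean, the pooled formula is the combined sum of squares divided by the combined degrees of freedom, n1 + n2 - 2.

```python
# Sketch: the pooled variance estimate [(n1-1)s1^2 + (n2-1)s2^2] / (n1+n2-2),
# where s1^2 and s2^2 use the N-1 (ddof=1) convention.
import numpy as np

x1 = np.array([2.1, 2.5, 3.0, 2.8, 2.2])   # hypothetical sample 1
x2 = np.array([3.1, 2.9, 3.5, 3.3])        # hypothetical sample 2

n1, n2 = len(x1), len(x2)
s1_sq = x1.var(ddof=1)
s2_sq = x2.var(ddof=1)

pooled = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)

# Equivalent form: total sum of squared deviations from each sample's
# own mean, divided by the total degrees of freedom.
ss = ((x1 - x1.mean())**2).sum() + ((x2 - x2.mean())**2).sum()
assert np.isclose(pooled, ss / (n1 + n2 - 2))
print(pooled)
```

Seen this way, the (n1-1) and (n2-1) weights aren't arbitrary: each sample "spends" one degree of freedom estimating its own mean, leaving n1 + n2 - 2 in total, which is exactly what the denominator divides by.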
 