
Pooled variance

by adamg
Tags: pooled, variance
adamg
#1
May21-05, 12:04 PM
Suppose you are conducting a hypothesis test to compare two sample means from independent samples, with the variance unknown, but you know it is the same for both populations. Then you use the pooled estimate of the variance given by [ (n1 - 1)s1^2 + (n2-1)s2^2 ] / (n1+n2-2)

I was just wondering why we use (n1 - 1) and (n2 - 1) instead of n1 and n2, and then divide by n1 + n2 - 2 rather than n1 + n2?

thanks
uart
#2
May22-05, 11:45 AM
When you calculate the sample variance as "(sum of squared differences from the mean)/N", it turns out that this gives a biased estimate of the population variance (and its square root is a biased estimate of the population standard deviation). Replacing "N" with "N-1" gives an unbiased estimate of the population variance, so it's usually preferred (its square root is still slightly biased as an estimate of the standard deviation, though less so). Unfortunately there is often some ambiguity whenever sample variance and standard deviation are discussed, as there is no universal convention on whether to use "N" or "N-1" in the definition.
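A quick simulation makes the bias visible. This is a minimal sketch that assumes a normal population with σ = 2 and samples of size n = 5 (both arbitrary choices for illustration); the helper name `average_variance_estimates` is hypothetical. For reference, Python's standard library exposes the same two conventions as `statistics.pvariance` (divides by N) and `statistics.variance` (divides by N-1).

```python
import random

def average_variance_estimates(n=5, sigma=2.0, trials=100_000, seed=0):
    """Average the divide-by-n and divide-by-(n-1) sample variances over many
    simulated samples of size n from a Normal(0, sigma**2) population."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    total_biased = 0.0
    total_unbiased = 0.0
    for _ in range(trials):
        sample = [rng.gauss(0.0, sigma) for _ in range(n)]
        mean = sum(sample) / n
        ss = sum((x - mean) ** 2 for x in sample)  # sum of squared deviations
        total_biased += ss / n          # the "N" version
        total_unbiased += ss / (n - 1)  # the "N-1" version
    return total_biased / trials, total_unbiased / trials

biased, unbiased = average_variance_estimates()
print(f"divide by n:     {biased:.3f}")    # theory: (n-1)/n * sigma^2 = 3.2
print(f"divide by n - 1: {unbiased:.3f}")  # theory: sigma^2 = 4.0
```

With these settings the "N" average should settle near 3.2 rather than the true σ² = 4.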

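In your example above I assume that s1^2 and s2^2 are based on the "N-1" calculations. In that case (n1-1)s1^2 and (n2-1)s2^2 are just the sums of squared deviations of each sample from its own mean, with expectations (n1-1)σ^2 and (n2-1)σ^2. Dividing their total by n1+n2-2 therefore gives an unbiased estimate of the common variance σ^2, whereas dividing by n1+n2 would underestimate it, because each sample uses up one degree of freedom in estimating its own mean.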
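Here is a minimal sketch of the pooled estimate from the question, assuming s1^2 and s2^2 use the "N-1" divisor; the function names are hypothetical helpers for illustration.

```python
def sample_variance(xs):
    """Unbiased ("N-1") sample variance: sum of squared deviations / (n - 1)."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)

def pooled_variance(xs, ys):
    """Pooled estimate [(n1 - 1)s1^2 + (n2 - 1)s2^2] / (n1 + n2 - 2)."""
    n1, n2 = len(xs), len(ys)
    s1_sq, s2_sq = sample_variance(xs), sample_variance(ys)
    # (n1 - 1) * s1_sq is just the sum of squared deviations of sample 1, etc.
    return ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)

xs, ys = [1, 2, 3, 4], [2, 4, 6]
print(pooled_variance(xs, ys))
```

For these two made-up samples the sums of squared deviations are 5 and 8, so the pooled estimate is 13/5 = 2.6; dividing the same total by n1 + n2 = 7 would give the smaller value 13/7.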
uart
#3
May23-05, 12:01 PM
Here is the above in a bit more detail :

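[tex]s_n^2 = \frac{1}{n} \sum_{i=1}^n (x_i-\bar{x})^2[/tex]

[tex]= \frac{1}{n} \sum_{i=1}^n \left[ (x_i-\mu) - (\bar{x}-\mu) \right]^2[/tex]

[tex]= \frac{1}{n} \sum_{i=1}^n \left[ (x_i-\mu)^2 - 2 (x_i-\mu)(\bar{x}-\mu) + (\bar{x}-\mu)^2 \right][/tex]

Since [tex]\sum_{i=1}^n (x_i-\mu) = n(\bar{x}-\mu)[/tex], the middle term sums to [tex]-2n(\bar{x}-\mu)^2[/tex] and the last term to [tex]n(\bar{x}-\mu)^2[/tex], so

[tex]= \frac{1}{n} \sum_{i=1}^n (x_i-\mu)^2 - (\bar{x}-\mu)^2[/tex]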
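The decomposition above is an algebraic identity that holds for any value of μ, not just the true mean, so it can be spot-checked with exact rational arithmetic. This is a minimal sketch; the data values, the choice μ = 9/2, and the name `check_decomposition` are arbitrary illustrations.

```python
from fractions import Fraction

def check_decomposition(xs, mu):
    """Return both sides of
    (1/n) sum (x_i - xbar)^2  =  (1/n) sum (x_i - mu)^2 - (xbar - mu)^2
    using exact fractions so there is no rounding error."""
    xs = [Fraction(x) for x in xs]
    mu = Fraction(mu)
    n = len(xs)
    xbar = sum(xs) / n
    lhs = sum((x - xbar) ** 2 for x in xs) / n
    rhs = sum((x - mu) ** 2 for x in xs) / n - (xbar - mu) ** 2
    return lhs, rhs

lhs, rhs = check_decomposition([3, 7, 8, 1, 5], mu=Fraction(9, 2))
print(lhs, rhs)
```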

So,

[tex]E[s_n^2] = \frac{1}{n} \sum_{i=1}^n E[(x_i-\mu)^2] - E[(\bar{x}-\mu)^2][/tex]

and since every [tex]x_i[/tex] has the same distribution, with [tex]E[(x_i-\mu)^2] = \sigma^2[/tex],

[tex]= \sigma^2 - E[(\bar{x}-\mu)^2][/tex]

[tex]= \sigma^2 - \{\text{a term} \ge 0\}[/tex]

This shows that, on average, the sample variance [tex]s_n^2[/tex] underestimates the population variance [tex]\sigma^2[/tex]; that is, it is biased downward (an individual sample can still overestimate it).

You can further show (assuming all the samples are independent) that [tex]E[(\bar{x}-\mu)^2] = \mathrm{Var}(\bar{x}) = \sigma^2/n[/tex], because the variance of a sum of independent variables is the sum of their variances, and hence

[tex]E[s_n^2] = \sigma^2 - \frac{\sigma^2}{n} = \frac{n-1}{n}\,\sigma^2[/tex]
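As an exact check (not a proof), the sketch below enumerates every equally likely sample of size n drawn with replacement from a fair six-sided die, which is assumed here purely as a convenient example population (μ = 7/2, σ² = 35/12), and averages with exact fractions. The function name `exact_expectations` is hypothetical.

```python
from fractions import Fraction
from itertools import product

FACES = range(1, 7)  # fair six-sided die: an illustrative population only

def exact_expectations(n):
    """Enumerate all len(FACES)**n equally likely samples of size n and return
    E[(xbar - mu)^2], E[s_n^2] (divide-by-n variance) and sigma^2 exactly."""
    mu = Fraction(sum(FACES), len(FACES))
    sigma_sq = sum((Fraction(f) - mu) ** 2 for f in FACES) / len(FACES)
    e_mean_err = Fraction(0)
    e_sn_sq = Fraction(0)
    count = 0
    for sample in product(FACES, repeat=n):
        xbar = Fraction(sum(sample), n)
        e_mean_err += (xbar - mu) ** 2
        e_sn_sq += sum((x - xbar) ** 2 for x in sample) / n
        count += 1
    return e_mean_err / count, e_sn_sq / count, sigma_sq

n = 3
e_mean_err, e_sn_sq, sigma_sq = exact_expectations(n)
print(sigma_sq, e_mean_err, e_sn_sq)  # 35/12 35/36 35/18
```

The output matches σ²/n = 35/36 and (n-1)/n · σ² = 35/18 exactly.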

So not only is [tex]s_n^2[/tex] a biased estimator of [tex]\sigma^2[/tex], its expectation is too small by a factor of precisely [tex](n-1)/n[/tex]. Using [tex](n-1)[/tex] instead of [tex]n[/tex] in the denominator fixes this: the modified sample variance [tex]s_{n-1}^2 = \frac{1}{n-1}\sum_{i=1}^n (x_i-\bar{x})^2 = \frac{n}{n-1}\,s_n^2[/tex] has expectation [tex]E[s_{n-1}^2] = \sigma^2[/tex], the population variance.
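Applying the same result to each sample answers the original pooling question: the expected sum of squared deviations within a sample of size n is (n-1)σ², so the combined sum should be divided by n1 + n2 - 2. This minimal sketch checks that exactly, again assuming a fair die as the common population (σ² = 35/12) and the arbitrary sizes n1 = 2, n2 = 3; the helper names are hypothetical.

```python
from fractions import Fraction
from itertools import product

FACES = range(1, 7)  # fair die as an illustrative common population

def expected_sum_sq_dev(n):
    """Exact E[sum (x_i - xbar)^2] over all equally likely die samples of size n."""
    total = Fraction(0)
    for sample in product(FACES, repeat=n):
        xbar = Fraction(sum(sample), n)
        total += sum((x - xbar) ** 2 for x in sample)
    return total / len(FACES) ** n

def pooled_expectations(n1, n2):
    """Expected pooled estimate with divisor n1 + n2 - 2 versus n1 + n2."""
    # Linearity of expectation: E[SS1 + SS2] = E[SS1] + E[SS2].
    e_ss = expected_sum_sq_dev(n1) + expected_sum_sq_dev(n2)
    return e_ss / (n1 + n2 - 2), e_ss / (n1 + n2)

unbiased_pooled, naive_pooled = pooled_expectations(2, 3)
print(unbiased_pooled, naive_pooled)  # 35/12 7/4
```

Only the n1 + n2 - 2 divisor reproduces σ² = 35/12; the n1 + n2 divisor falls short by the factor (n1 + n2 - 2)/(n1 + n2) = 3/5.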

