Comparing Sample Means with Pooled Variance Estimate

AI Thread Summary
In hypothesis testing for comparing two sample means with unknown but equal variances, the pooled variance estimate uses (n1-1) and (n2-1) in the numerator to avoid bias in estimating the population variance. Using N instead of N-1 leads to an underestimation of the population variance, as sample variance calculated with N is biased. The adjustment to N-1 corrects this bias, ensuring that the expected value of the sample variance equals the true population variance. This adjustment is crucial for accurate statistical inference. Therefore, employing the pooled variance formula with (n1-1) and (n2-1) is the preferred method in these scenarios.
adamg
Suppose you are conducting a hypothesis test to compare two sample means from independent samples, with the variance unknown, but you know it is the same for both populations. Then you use the pooled estimate of the variance given by [ (n1 - 1)s1^2 + (n2-1)s2^2 ] / (n1+n2-2)

I was just wondering why we use (n1-1) etc instead of using n1 and n2 and then dividing by n1 + n2?

thanks
 
adamg said:
Suppose you are conducting a hypothesis test to compare two sample means from independent samples, with the variance unknown, but you know it is the same for both populations. Then you use the pooled estimate of the variance given by [ (n1 - 1)s1^2 + (n2-1)s2^2 ] / (n1+n2-2)

I was just wondering why we use (n1-1) etc instead of using n1 and n2 and then dividing by n1 + n2?

thanks

When you calculate the sample variance using "(sum of squared differences from the mean)/N", it turns out that this gives a biased estimate of the population variance (and its square root a biased estimate of the population standard deviation). Replacing "N" with "N-1" gives an unbiased estimate of the population variance, so it's usually preferred (the square root of this is still slightly biased for the standard deviation, though less so). Unfortunately there is often a bit of ambiguity whenever the sample variance and standard deviation are discussed, as there doesn't seem to be a universal convention on whether to use "N" or "N-1" in the definition.

In your example above I assume that s1^2 and s2^2 are based on the "N-1" calculations.
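For anyone who wants to check this numerically, here's a quick Python sketch (the data and variable names are my own, just for illustration). NumPy's `ddof=1` gives the "N-1" sample variances, which are then combined with the pooled formula from the original post:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two independent samples drawn from populations with the same
# variance (sigma^2 = 4) but different means.
x1 = rng.normal(loc=0.0, scale=2.0, size=15)
x2 = rng.normal(loc=1.0, scale=2.0, size=25)

n1, n2 = len(x1), len(x2)

# "N-1" (unbiased) sample variances
s1_sq = np.var(x1, ddof=1)
s2_sq = np.var(x2, ddof=1)

# Pooled estimate of the common population variance
pooled = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)
print(pooled)
```

Note that the pooled estimate is just a weighted average of the two sample variances, so it always lands between them.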
 
Here is the above in a bit more detail :

\[ s_n^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i-\bar{x})^2 \]

\[= \frac{1}{n} \sum_{i=1}^{n} \left[ \left( (x_i-\mu) - (\bar{x}-\mu) \right)^2 \right]\]

\[= \frac{1}{n} \sum_{i=1}^{n} \left[(x_i-\mu)^2 - 2 (x_i-\mu)(\bar{x}-\mu) + (\bar{x}-\mu)^2 \right] \]

Since \frac{1}{n}\sum (x_i-\mu) = \bar{x}-\mu, the middle term sums to -2(\bar{x}-\mu)^2, leaving

\[= \frac{1}{n} \sum_{i=1}^{n} (x_i-\mu)^2 - (\bar{x}-\mu)^2 \]

So,

\[E[s_n^2] = \frac{1}{n} \sum_{i=1}^{n} E[(x_i-\mu)^2] - E[(\bar{x}-\mu)^2] \]

\[= E[(x-\mu)^2] - E[(\bar{x}-\mu)^2] \]

\[= \sigma^2 - \{\rm{term\ greater\ than\ or\ equal\ to\ zero}\} \]

This shows that, in expectation, the sample variance s_n^2 under-estimates the population variance \sigma^2.

You can further show (assuming all the samples are independent) that E[(\bar{x}-\mu)^2] is equal to \sigma^2/n, and hence,

\[E[s_n^2] = \sigma^2 - \sigma^2/n = \frac{n-1}{n} \sigma^2 \].

So not only is s_n^2 a biased estimator of \sigma^2, its expectation is too small by a factor of precisely (n-1)/n. Clearly, using (n-1) instead of n in the denominator fixes this and makes the expectation of the modified sample variance, E[s_{n-1}^2], equal to the population variance \sigma^2.
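A quick simulation makes the (n-1)/n factor visible. The sketch below (my own, just to illustrate the point) draws many samples of size n from a population with known \sigma^2 and averages the "divide by n" and "divide by n-1" variances over the replications; the first averages out near (n-1)/n \cdot \sigma^2, the second near \sigma^2 itself:

```python
import numpy as np

rng = np.random.default_rng(1)

n = 5            # small sample size makes the bias obvious
sigma_sq = 9.0   # true population variance (sigma = 3)
reps = 200_000   # number of simulated samples

samples = rng.normal(loc=0.0, scale=3.0, size=(reps, n))

biased = samples.var(axis=1, ddof=0).mean()    # divide by n
unbiased = samples.var(axis=1, ddof=1).mean()  # divide by n-1

print(biased, (n - 1) / n * sigma_sq)  # biased mean is close to (n-1)/n * sigma^2
print(unbiased, sigma_sq)              # unbiased mean is close to sigma^2
```

With n = 5 the "divide by n" estimator comes in around 20% low, exactly the (n-1)/n = 4/5 factor derived above.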
 