Comparing Sample Means with Pooled Variance Estimate

AI Thread Summary
In hypothesis testing for comparing two sample means with unknown but equal variances, the pooled variance estimate uses (n1-1) and (n2-1) in the numerator to avoid bias in estimating the population variance. Using N instead of N-1 leads to an underestimation of the population variance, as sample variance calculated with N is biased. The adjustment to N-1 corrects this bias, ensuring that the expected value of the sample variance equals the true population variance. This adjustment is crucial for accurate statistical inference. Therefore, employing the pooled variance formula with (n1-1) and (n2-1) is the preferred method in these scenarios.
adamg
Suppose you are conducting a hypothesis test to compare two sample means from independent samples, with the variance unknown, but you know it is the same for both populations. Then you use the pooled estimate of the variance given by [ (n1 - 1)s1^2 + (n2-1)s2^2 ] / (n1+n2-2)

I was just wondering why we use (n1-1) etc instead of using n1 and n2 and then dividing by n1 + n2?

thanks
 
adamg said:
Suppose you are conducting a hypothesis test to compare two sample means from independent samples, with the variance unknown, but you know it is the same for both populations. Then you use the pooled estimate of the variance given by [ (n1 - 1)s1^2 + (n2-1)s2^2 ] / (n1+n2-2)

I was just wondering why we use (n1-1) etc instead of using n1 and n2 and then dividing by n1 + n2?

thanks

When you calculate the sample variance as "(sum of squared differences from the mean)/N", it turns out that this gives a biased estimate of the population variance. Replacing "N" with "N-1" gives an unbiased estimate of the population variance, so it's usually preferred (its square root is still a slightly biased estimate of the population standard deviation, but that's a separate issue). Unfortunately there is often some ambiguity whenever sample variance and standard deviation are discussed, as there doesn't seem to be a universal standard on whether to use "N" or "N-1" in the definition.

In your example above I assume that s1^2 and s2^2 are based on the "N-1" calculations.
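A quick Monte Carlo sketch makes the bias visible (the data and seed here are my own illustration, not from the thread): drawing many small samples from a population with known variance and averaging both versions of the sample variance.

```python
# Sketch: compare the "divide by N" and "divide by N-1" sample variances
# against a known population variance, via simulation.
import numpy as np

rng = np.random.default_rng(0)
sigma2 = 4.0           # true population variance (chosen for illustration)
n = 5                  # small sample size, where the bias is most visible
trials = 200_000

samples = rng.normal(loc=0.0, scale=np.sqrt(sigma2), size=(trials, n))
var_n = samples.var(axis=1, ddof=0)    # divide by N   (biased)
var_n1 = samples.var(axis=1, ddof=1)   # divide by N-1 (unbiased)

print(var_n.mean())    # close to (n-1)/n * sigma2 = 3.2, i.e. too small
print(var_n1.mean())   # close to sigma2 = 4.0
```

The `ddof` ("delta degrees of freedom") argument in NumPy is exactly this N vs N-1 choice: `ddof=0` divides by N, `ddof=1` by N-1.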
 
Here is the above in a bit more detail :

\[ s_n^2 = \frac{1}{n} \sum_i (x_i-\bar{x})^2 \]

\[= \frac{1}{n} \sum_i \left[ \left( (x_i-\mu) - (\bar{x}-\mu) \right)^2 \right]\]

\[= \frac{1}{n} \sum_i \left[(x_i-\mu)^2 - 2 (x_i-\mu)(\bar{x}-\mu) + (\bar{x}-\mu)^2 \right] \]

\[= \frac{1}{n} \sum_i (x_i-\mu)^2 - (\bar{x}-\mu)^2 \]

(the last step uses \sum_i (x_i-\mu) = n(\bar{x}-\mu), so the cross term sums to -2(\bar{x}-\mu)^2 and partially cancels the +(\bar{x}-\mu)^2 term)

So,

\[E[s_n^2] = \frac{1}{n} \sum_i E[(x_i-\mu)^2] - E[(\bar{x}-\mu)^2 ] \]

\[= E[(x-\mu)^2] - E[(\bar{x}-\mu)^2 ] \]

\[= \sigma^2 - \{\text{term greater than or equal to zero}\} \]

This shows that the sample variance s_n^2 systematically underestimates the population variance \sigma^2.

You can further show (assuming all the samples are independent) that E[(\bar{x}-\mu)^2] is equal to \sigma^2/n and hence,
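For completeness, a sketch of that step (using independence of the x_i, each with variance \sigma^2):

\[ E[(\bar{x}-\mu)^2] = \mathrm{Var}(\bar{x}) = \mathrm{Var}\!\left(\frac{1}{n}\sum_i x_i\right) = \frac{1}{n^2}\sum_i \mathrm{Var}(x_i) = \frac{n\sigma^2}{n^2} = \frac{\sigma^2}{n} \]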

\[E[s_n^2] = \sigma^2 - \sigma^2/n = \frac{n-1}{n} \sigma^2 \]

So not only is s_n^2 a biased estimator of \sigma^2, it is too small by a factor of precisely (n-1)/n. Clearly, using (n-1) instead of n in the denominator fixes this and makes the expectation of the modified sample variance, E[s_{n-1}^2], equal to the population variance \sigma^2.
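Tying it back to the original question, here is a small sketch of the pooled estimate itself (the data are made up for illustration): since (n-1)s^2 with the N-1 convention is just the sum of squared deviations from that sample's own mean, the pooled formula is the combined sum of squares divided by the combined degrees of freedom, n1 + n2 - 2.

```python
# Sketch: the pooled variance estimate [(n1-1)s1^2 + (n2-1)s2^2] / (n1+n2-2),
# where s1^2 and s2^2 use the N-1 (ddof=1) convention.
import numpy as np

x1 = np.array([2.1, 2.5, 3.0, 2.8, 2.2])   # hypothetical sample 1
x2 = np.array([3.1, 2.9, 3.5, 3.3])        # hypothetical sample 2

n1, n2 = len(x1), len(x2)
s1_sq = x1.var(ddof=1)
s2_sq = x2.var(ddof=1)

pooled = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)

# Equivalent form: total sum of squared deviations from each sample's
# own mean, divided by the total degrees of freedom.
ss = ((x1 - x1.mean())**2).sum() + ((x2 - x2.mean())**2).sum()
assert np.isclose(pooled, ss / (n1 + n2 - 2))
print(pooled)
```

Seen this way, the (n1-1) and (n2-1) weights aren't arbitrary: each sample "spends" one degree of freedom estimating its own mean, leaving n1 + n2 - 2 in total, which is exactly what the denominator divides by.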
 