# Large samples confidence interval for difference in means

1. Feb 20, 2009

### kingwinner

The following distinguishes TWO cases for the large-sample confidence interval for a difference in means:
http://www.geocities.com/asdfasdf23135/stat11.JPG

where Sp^2 is the pooled estimate of the common variance, n1 is the sample size from the first population, n2 is the sample size from the second population, and z_alpha/2 is the 100(1-alpha/2)th percentile of the standard normal distribution.
==========================

It seems to me that case 1 is a special case of case 2 in which the population variances are equal. If so, the formula for case 2 should reduce to the formula for case 1 when the population variances are equal. However, I can't see how it does.
[aside: I am trying to cut down on the number of formulas that I have to memorize. Instead of two different formulas, if case 2 contains case 1, then I only have to memorize the general case 2 formula which is nice.]

Could somebody please show me how I can reduce case 2 to case 1?
Any help would be appreciated!

Last edited: Feb 21, 2009
2. Feb 21, 2009

Suppose you knew the two population variances. The confidence interval for $$\mu_1 - \mu_2$$ would look like this.

$$(\overline X_1 - \overline X_2) \pm z_{\frac{\alpha}2} \sqrt{\, \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2} }$$

This is true for any $$\sigma_1^2$$ and $$\sigma_2^2$$. If the two variances are the same, call the common value $$\sigma^2$$. The confidence interval simplifies to

$$(\overline X_1 - \overline X_2) \pm z_{\frac{\alpha}2} \sqrt{\,\sigma^2 \left(\frac 1 {n_1} + \frac 1 {n_2} \right) }$$

so if you actually know the real variances, the two formulae are equivalent.
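As a quick numerical check of that simplification, here is a minimal Python sketch. The function names and the example numbers are made up for illustration, and it assumes the population variances really are known.

```python
from math import sqrt, isclose
from statistics import NormalDist


def z_interval_known_variances(xbar1, xbar2, var1, var2, n1, n2, alpha=0.05):
    """Interval for mu1 - mu2 when both population variances are known."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # upper alpha/2 critical value
    half_width = z * sqrt(var1 / n1 + var2 / n2)
    diff = xbar1 - xbar2
    return diff - half_width, diff + half_width


def z_interval_common_variance(xbar1, xbar2, var, n1, n2, alpha=0.05):
    """Same interval, written for a single known common variance."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * sqrt(var * (1 / n1 + 1 / n2))
    diff = xbar1 - xbar2
    return diff - half_width, diff + half_width


# Hypothetical numbers: equal variances, unequal sample sizes.
general = z_interval_known_variances(10.0, 8.5, 4.0, 4.0, 40, 60)
special = z_interval_common_variance(10.0, 8.5, 4.0, 40, 60)
same = all(isclose(a, b) for a, b in zip(general, special))
print(general, special, same)
```

The two functions agree up to floating-point rounding whenever `var1 == var2`, which is all the algebra above says.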

However, in practice you don't know the real variances, and you have to make do with the sample variances. Even if you are willing to assume the population variances are equal, there is no reason to expect the sample variances to be equal, so instead of being used individually they are pooled to obtain $$s^2_p$$. Then the interval is

\begin{align*} (\overline X_1 - \overline X_2) &\pm z_{\frac{\alpha}2} \sqrt{\, \frac{s_p^2}{n_1} + \frac{s_p^2}{n_2} } \Rightarrow \\ (\overline X_1 - \overline X_2) &\pm z_{\frac{\alpha}2} \sqrt{\, s_p^2 \left(\frac 1 {n_1} + \frac 1 {n_2} \right)} \end{align*}
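To see the two versions side by side on data, here is a rough Python sketch. It assumes the usual pooled estimate, $$s_p^2 = \frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2}$$, and that the unpooled interval simply plugs $$s_1^2$$ and $$s_2^2$$ into the known-variance formula. The linked image isn't reproduced here, so check both against your notes. The function names and the data are made up for illustration.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def pooled_variance(s1_sq, s2_sq, n1, n2):
    """Weighted average of the two sample variances (assumed standard form)."""
    return ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)


def large_sample_intervals(sample1, sample2, alpha=0.05):
    """Return the pooled and unpooled z-intervals for mu1 - mu2."""
    n1, n2 = len(sample1), len(sample2)
    diff = mean(sample1) - mean(sample2)
    # statistics.variance is the sample variance (divides by n - 1).
    s1_sq, s2_sq = variance(sample1), variance(sample2)
    z = NormalDist().inv_cdf(1 - alpha / 2)

    sp_sq = pooled_variance(s1_sq, s2_sq, n1, n2)
    half_pooled = z * sqrt(sp_sq * (1 / n1 + 1 / n2))
    half_unpooled = z * sqrt(s1_sq / n1 + s2_sq / n2)

    return {
        "pooled": (diff - half_pooled, diff + half_pooled),
        "unpooled": (diff - half_unpooled, diff + half_unpooled),
    }


# Tiny made-up samples, only to show the call; the z-interval itself
# is meant for large samples.
result = large_sample_intervals([5.1, 4.8, 6.0, 5.5, 5.2], [4.2, 4.9, 3.8])
print(result["pooled"])
print(result["unpooled"])
```

Both intervals are centred on the same difference of sample means and differ only in width. When the two sample sizes are equal, the two square roots coincide and the intervals are identical.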

3. Feb 21, 2009

### kingwinner

So we have,

Case 1:
The variances are unknown but known to be equal.

Case 2:
The variances are unknown and not known to be equal (they could be equal, but we just don't have this additional information).

So they are really two separate cases. Am I right?

But what is the real point of using the "pooled estimate"? Is it going to give a better estimate? (i.e. when the population variances are unknown but known to be equal, will the case 1 formula give a narrower (better) interval than the case 2 formula?)

Thanks!