Large samples confidence interval for difference in means

kingwinner · Feb 20, 2009

The following distinguishes TWO cases for the large-sample confidence interval for a difference in means:
http://www.geocities.com/asdfasdf23135/stat11.JPG

where Sp^2 is the pooled estimate of the common variance, n1 is the sample size from the first population, n2 is the sample size from the second population, and z_alpha/2 is the 100(1-alpha/2)th percentile of the standard normal distribution.
==========================

It seems to me that case 1 is a special case of case 2 in which the population variances are equal. If so, the formula for case 2 should reduce to the formula for case 1 when the population variances are equal. However, I can't see how it does.
[aside: I am trying to cut down on the number of formulas that I have to memorize. If case 2 contains case 1, then instead of two different formulas I only have to memorize the general case 2 formula, which would be nice.]

Could somebody please show me how I can reduce case 2 to case 1?
Any help would be appreciated!

statdad · Feb 21, 2009

Suppose you knew the two population variances. The confidence interval for [tex]\mu_1 - \mu_2[/tex] would look like this.

[tex] (\overline X_1 - \overline X_2) \pm z_{\frac{\alpha}2} \sqrt{\, \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2} }[/tex]

This is true for any [tex]\sigma_1^2[/tex] and [tex]\sigma_2^2[/tex]. If the two variances are the same, call the common value [tex]\sigma^2[/tex]. The confidence interval simplifies to

[tex] (\overline X_1 - \overline X_2) \pm z_{\frac{\alpha}2} \sqrt{\,\sigma^2 \left(\frac 1 {n_1} + \frac 1 {n_2} \right) }[/tex]

so if you actually know the real variances, the two formulae are equivalent.
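As a quick numerical check of the simplification above, here is a minimal Python sketch. The function names and all the sample values (means, variance, sample sizes) are hypothetical, chosen only for illustration; the two functions simply transcribe the two known-variance formulas.

```python
import math
from statistics import NormalDist

def z_interval_known_var(xbar1, xbar2, var1, var2, n1, n2, alpha=0.05):
    """General known-variance interval for mu1 - mu2."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # z_{alpha/2}
    half = z * math.sqrt(var1 / n1 + var2 / n2)
    diff = xbar1 - xbar2
    return diff - half, diff + half

def z_interval_common_var(xbar1, xbar2, var, n1, n2, alpha=0.05):
    """Simplified interval when both populations share the variance `var`."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half = z * math.sqrt(var * (1 / n1 + 1 / n2))
    diff = xbar1 - xbar2
    return diff - half, diff + half

# Hypothetical numbers: both population variances equal to 4.0.
general = z_interval_known_var(10.2, 9.5, 4.0, 4.0, 50, 80)
common = z_interval_common_var(10.2, 9.5, 4.0, 50, 80)
print(general)
print(common)
```

With equal variances passed to the general function, the two printed intervals should agree up to floating-point rounding.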

However, in practice you don't know the real variances, and you have to make do with the sample variances. Even if you are willing to assume the population variances are equal, there is no reason to expect the sample variances to be equal, so instead of being used individually they are pooled to obtain [tex]s^2_p[/tex]. Then the interval is

[tex] \begin{align*} (\overline X_1 - \overline X_2) &\pm z_{\frac{\alpha}2} \sqrt{\, \frac{s_p^2}{n_1} + \frac{s_p^2}{n_2} } \Rightarrow \\ (\overline X_1 - \overline X_2) &\pm z_{\frac{\alpha}2} \sqrt{\, s_p^2 \left(\frac 1 {n_1} + \frac 1 {n_2} \right)} \end{align*}[/tex]
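To see how the pooled interval compares with one that uses the sample variances individually, here is a rough Python sketch. It assumes the usual textbook definition of the pooled variance, s_p^2 = ((n1-1)s1^2 + (n2-1)s2^2)/(n1+n2-2), which is not written out in this thread; the function names and the sample values are hypothetical.

```python
import math
from statistics import NormalDist

def pooled_variance(s1_sq, s2_sq, n1, n2):
    """Pooled variance estimate (assumed textbook definition)."""
    return ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)

def half_width_pooled(s1_sq, s2_sq, n1, n2, alpha=0.05):
    """Half-width of the interval that uses s_p^2 for both samples."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    sp_sq = pooled_variance(s1_sq, s2_sq, n1, n2)
    return z * math.sqrt(sp_sq * (1 / n1 + 1 / n2))

def half_width_unpooled(s1_sq, s2_sq, n1, n2, alpha=0.05):
    """Half-width of the interval that uses s1^2 and s2^2 individually."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return z * math.sqrt(s1_sq / n1 + s2_sq / n2)

# Hypothetical sample variances 3.5 and 5.0.
# Equal sample sizes: the two half-widths coincide.
print(half_width_pooled(3.5, 5.0, 60, 60), half_width_unpooled(3.5, 5.0, 60, 60))
# Unequal sample sizes: they generally differ.
print(half_width_pooled(3.5, 5.0, 40, 100), half_width_unpooled(3.5, 5.0, 40, 100))
```

Under that definition, when n1 = n2 the pooled variance is just the average of the two sample variances, so the two formulas give the same interval; with unequal sample sizes they generally do not.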

kingwinner · Feb 21, 2009

So we have,

Case 1:
The variances are unknown but known to be equal.

Case 2:
The variances are unknown and not known to be equal (they could be equal, but we just don't have this additional information).

So they are really two separate cases. Am I right?

But what is the real point of using the "pooled estimate"? Is it going to give a better estimate? (i.e. when the population variances are unknown but known to be equal, will the case 1 formula give a narrower (better) interval than the case 2 formula?)

Thanks!
