Large samples confidence interval for difference in means

SUMMARY

This discussion clarifies the distinction between two cases of the large-sample confidence interval for a difference in means, specifically the use of a pooled variance estimate. Case 1 applies when the population variances are assumed equal, while Case 2 makes no such assumption. The formulas for both cases are presented, showing that when the population variances are known, Case 1 is a special case of Case 2. The pooled estimate of variance, denoted Sp², is used in the Case 1 interval when the population variances are unknown but assumed equal.

PREREQUISITES
  • Understanding of confidence intervals in statistics
  • Familiarity with the concept of pooled variance
  • Knowledge of standard normal distribution and z-scores
  • Ability to perform calculations involving sample means and variances
NEXT STEPS
  • Study the derivation of the pooled variance formula in statistical analysis
  • Learn about the implications of using unequal variances in confidence intervals
  • Explore the application of the Central Limit Theorem in large sample statistics
  • Investigate software tools for statistical analysis, such as R or Python's SciPy library
USEFUL FOR

Statisticians, data analysts, and researchers who are involved in hypothesis testing and confidence interval estimation will benefit from this discussion.

kingwinner
The following distinguishes TWO cases for large samples confidence interval for difference in means:
http://www.geocities.com/asdfasdf23135/stat11.JPG

where Sp^2 is the pooled estimate of the common variance, n1 is the sample size from the first population, n2 is the sample size from the second population, and z_alpha/2 is the 100(1-alpha/2)th percentile of the standard normal distribution.
==========================

It seems to me that case 1 is a special case of case 2 with the population variances being equal. If so, the formula for case 2 should reduce to the formula for case 1 when the population variances are equal. However, I cannot see how it does.
[aside: I am trying to cut down on the number of formulas that I have to memorize. Instead of two different formulas, if case 2 contains case 1, then I only have to memorize the general case 2 formula which is nice.]

Could somebody please show me how I can reduce case 2 to case 1?
Any help would be appreciated!
 
Suppose you knew the two population variances. The confidence interval for $\mu_1 - \mu_2$ would look like this:

$$
(\overline X_1 - \overline X_2) \pm z_{\frac{\alpha}2} \sqrt{\, \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2} }
$$

This is true for any $\sigma_1^2$ and $\sigma_2^2$. If the two variances are the same, call the common value $\sigma^2$. The confidence interval simplifies to

$$
(\overline X_1 - \overline X_2) \pm z_{\frac{\alpha}2} \sqrt{\,\sigma^2 \left(\frac 1 {n_1} + \frac 1 {n_2} \right) }
$$

so if you actually know the real variances, the two formulae are equivalent.
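
As a quick numerical check of that equivalence, here is a minimal Python sketch. The function names and the numbers (z = 1.96, a common variance of 4.0, sample sizes 40 and 55) are made up for illustration and are not from the original question.

```python
import math

def half_width_general(z, var1, var2, n1, n2):
    """Half-width of the interval with two (known) population variances."""
    return z * math.sqrt(var1 / n1 + var2 / n2)

def half_width_common(z, var, n1, n2):
    """Half-width of the interval with one (known) common variance."""
    return z * math.sqrt(var * (1 / n1 + 1 / n2))

# Hypothetical inputs chosen only for illustration.
z = 1.96
sigma2 = 4.0
n1, n2 = 40, 55

# Plug the same variance into both slots of the general formula.
general = half_width_general(z, sigma2, sigma2, n1, n2)
common = half_width_common(z, sigma2, n1, n2)

# The two half-widths agree up to floating-point rounding.
print(math.isclose(general, common))
```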

However, in practice you don't know the real variances, and you have to make do with the sample variances. Even if you are willing to assume the population variances are equal, there is no reason to expect the sample variances to be equal, so instead of using them individually they are pooled to obtain $s_p^2$. Then the interval is

\begin{align*}
(\overline X_1 - \overline X_2) &\pm z_{\frac{\alpha}2} \sqrt{\, \frac{s_p^2}{n_1} + \frac{s_p^2}{n_2} } \Rightarrow \\
(\overline X_1 - \overline X_2) &\pm z_{\frac{\alpha}2} \sqrt{\, s_p^2 \left(\frac 1 {n_1} + \frac 1 {n_2} \right)}
\end{align*}
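
For anyone who wants to try the pooled interval on data, here is a rough sketch in plain Python. It assumes the usual degrees-of-freedom weighting for the pooled variance, $s_p^2 = \frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2}$, which is not spelled out above. The function names `pooled_variance` and `pooled_ci` are hypothetical, not from any library.

```python
import math
from statistics import NormalDist, mean, variance

def pooled_variance(x1, x2):
    """Pooled estimate of a common variance, weighted by degrees of freedom."""
    n1, n2 = len(x1), len(x2)
    s1_sq, s2_sq = variance(x1), variance(x2)  # sample variances (n - 1 divisor)
    return ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)

def pooled_ci(x1, x2, alpha=0.05):
    """Large-sample z interval for mu_1 - mu_2 using the pooled variance."""
    n1, n2 = len(x1), len(x2)
    z = NormalDist().inv_cdf(1 - alpha / 2)  # z_{alpha/2}
    half = z * math.sqrt(pooled_variance(x1, x2) * (1 / n1 + 1 / n2))
    diff = mean(x1) - mean(x2)
    return diff - half, diff + half
```

Calling `pooled_ci(sample1, sample2)` on two lists of numbers returns the lower and upper limits. The z quantile is only appropriate when both samples are large.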
 
So we have,

Case 1:
The variances are unknown but known to be equal.

Case 2:
The variances are unknown and not known to be equal (they could be equal, but we just don't have this additional information).

So they are really two separate cases. Am I right?


But what is the real point of using the "pooled estimate"? Is it going to give a better estimate? (i.e. when the population variances are unknown but known to be equal, will the case 1 formula give a narrower (better) interval than case 2?)

Thanks!
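
One observation on the width question that can be checked directly, assuming the case 2 formula uses $\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}$ under the square root and the pooled variance is weighted by degrees of freedom (both are assumptions here, since the linked image is not reproduced in the text). With equal sample sizes $n_1 = n_2 = n$, the pooled variance is just the average of the two sample variances:

$$
s_p^2 = \frac{(n-1)s_1^2 + (n-1)s_2^2}{2n-2} = \frac{s_1^2 + s_2^2}{2}
$$

so the quantity under the square root in case 1 becomes

$$
s_p^2 \left(\frac 1 n + \frac 1 n \right) = \frac{s_1^2 + s_2^2}{n} = \frac{s_1^2}{n} + \frac{s_2^2}{n}
$$

which matches case 2 term for term. Under those assumptions the two intervals coincide when the sample sizes are equal, so any difference in width can only arise when $n_1 \ne n_2$.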
 
