Convergence of 2 sample means with 95% confidence

SUMMARY

The discussion centers on deriving an equation for the convergence of two sample means, ##\hat{x_1}## and ##\hat{x_2}##, within a 95% confidence interval. The user attempts to establish a relationship between sample standard deviation and population standard deviation, leading to a flawed conclusion regarding the relationship between sample sizes ##n## and ##N##. Key insights reveal that inequalities must be handled carefully, particularly when multiplying or dividing by negative values, and that the Central Limit Theorem (CLT) plays a crucial role in understanding the distribution of sample means.

PREREQUISITES
  • Understanding of sample means and standard deviations
  • Familiarity with the Central Limit Theorem (CLT)
  • Knowledge of confidence intervals and their calculations
  • Basic principles of statistical inequalities
NEXT STEPS
  • Study the derivation of confidence intervals in statistics
  • Learn about the implications of the Central Limit Theorem on sample means
  • Explore the application of Chernoff Bounds in statistical estimation
  • Investigate the differences between sample variance and population variance calculations
USEFUL FOR

Statisticians, data analysts, and researchers involved in statistical modeling and hypothesis testing will benefit from this discussion, particularly those working with sample means and confidence intervals.

fahraynk
I tried to derive an equation for one sample mean to converge to another sample mean within a 95% confidence interval, but I know I am wrong. Can someone tell me what I did wrong, and what is the correct formula?

Suppose:

##\hat{x_1},\hat{\sigma_1},N## are the sample mean and standard deviation calculated from ##N## samples,

##\hat{x_2},\hat{\sigma_2},n## are the sample mean and standard deviation calculated from ##n## samples, with ##n\leq N##,

##\mu##, ##\delta## are the true mean and true standard deviation of the population.

If ##d(\hat{x_1},\hat{x_2})## is a euclidean distance function on the sample means, then:

$$
d(\hat{x_1},\hat{x_2})\leq d(\hat{x_1},\mu)+ d(\hat{x_2},\mu)\leq 4\frac{\sigma_1}{\sqrt{N}}+4\frac{\sigma_2}{\sqrt{n}}
$$

This holds with ##95\%## confidence because ##\mu\in [\hat{x_1}-\frac{2\sigma_1}{\sqrt{N}},\hat{x_1}+\frac{2\sigma_1}{\sqrt{N}}]## with 95% confidence, and similarly for ##\hat{x_2}## with ##n## samples.

My first question is: what is the relationship between the sample standard deviation and the population standard deviation?

When I take many samples, the standard deviation of the samples changes very little, so I assume the relationship ##\sigma_1=\sigma_2=\delta## :

$$
d(\hat{x_1},\hat{x_2})\leq 4\left(\frac{\sigma_1}{\sqrt{N}}+\frac{\sigma_2}{\sqrt{n}}\right)=4\delta\frac{\sqrt{N}+\sqrt{n}}{\sqrt{N}\sqrt{n}}=4\delta\frac{\frac{\sqrt{N}}{\sqrt{n}}+1}{\sqrt{N}}\\
\implies \sqrt{N}\,d(\hat{x_1},\hat{x_2})-4\delta\leq\frac{4\delta\sqrt{N}}{\sqrt{n}}\\
\implies \sqrt{n}\leq\frac{4\delta\sqrt{N}}{\sqrt{N}\,d(\hat{x_1},\hat{x_2})-4\delta}
$$

But this can't be true, because if I choose ##d(\hat{x_1},\hat{x_2})=0## then ##\sqrt{n}\leq-\sqrt{N}##, but ##n## and ##N## must be positive.

What is wrong here?

Also, I am sure there must be a simple way to do this. What I really want to know is how to get ##n## as a function of ##d(\hat{x_1},\hat{x_2})## and ##\phi##, where ##\phi## is a confidence level, like 95% confidence.
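A quick Monte Carlo sketch of the setup above, to see how the bound behaves in practice. The normal population, its parameters, and the sizes ##N=400##, ##n=100## are illustrative assumptions, not part of the derivation:

```python
import numpy as np

# Check how often d(x1_hat, x2_hat) stays within
# 4*sigma1/sqrt(N) + 4*sigma2/sqrt(n) over repeated sampling.
rng = np.random.default_rng(0)
mu, delta = 10.0, 3.0      # assumed true mean and standard deviation
N, n = 400, 100            # assumed sample sizes, n <= N
trials = 10_000

hits = 0
for _ in range(trials):
    s1 = rng.normal(mu, delta, N)
    s2 = rng.normal(mu, delta, n)
    d = abs(s1.mean() - s2.mean())
    bound = 4 * s1.std(ddof=1) / np.sqrt(N) + 4 * s2.std(ddof=1) / np.sqrt(n)
    hits += d <= bound
print(f"bound held in {hits / trials:.3f} of trials")
```

In this setup the bound holds in essentially all trials, which suggests the ##4\sigma## version is considerably looser than a 95% statement.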
 
I don't like the look of this. As I understand your scenario, ##n## is not a variable, and ##d## is a random variable that you cannot "choose".
Mathematically, your problem comes in the last step. If we set ##d=0##, the penultimate line is
$$-4\delta \leq \frac{4\delta\sqrt{N}}{\sqrt{n}} \implies \sqrt{n} \leq \frac{4\delta\sqrt{N}}{-4\delta}$$
You have to be careful with inequalities. It's not as simple as "swapping terms" in an equation. When you multiply or divide both sides of an inequality by a negative quantity (##-4\delta##), the direction of the inequality is reversed. So it should be
$$\sqrt{n} \geq \frac{4\delta\sqrt{N}}{-4\delta}$$
And more generally, if ##\sqrt{N}\,d - 4\delta## is negative, moving it to the denominator of the RHS reverses the direction of the inequality.
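A small numeric illustration of that sign flip; the values of ##\delta##, ##N##, ##n## are arbitrary:

```python
import math

delta, N, n = 3.0, 400.0, 100.0   # arbitrary illustrative values
d = 0.0
lhs = math.sqrt(N) * d - 4 * delta       # = -12, negative when d = 0
bound = 4 * delta * math.sqrt(N) / lhs   # = -20
# Naively "swapping terms" claims sqrt(n) <= bound ...
naive = math.sqrt(n) <= bound            # fails: 10 <= -20 is False
# ... but dividing by the negative lhs reverses the inequality:
correct = math.sqrt(n) >= bound          # holds: 10 >= -20
print(naive, correct)
```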
 
fahraynk said:
<Snip>
If ##d(\hat{x_1},\hat{x_2})## is a euclidean distance function on the sample means, then:

$$
d(\hat{x_1},\hat{x_2})\leq d(\hat{x_1},\mu)+ d(\hat{x_2},\mu)\leq 4\frac{\sigma_1}{\sqrt{N}}+4\frac{\sigma_2}{\sqrt{n}}
$$

With ##95##% confidence because : ##\mu\in [\hat{x_1}-\frac{2\sigma_1}{\sqrt{N}},\hat{x_1}+\frac{2\sigma_1}{\sqrt{N}}]## with 95% confidence<Snip>
What is wrong here?

Also, I am sure there must be a simple way to do this. What I really want to know is how to get ##n## as a function of ##d(\hat{x_1},\hat{x_2})## and ##\phi##, where ##\phi## is a confidence level, like 95% confidence.

Not sure, but maybe because ##\mu## is also in a similar interval about ##\hat{x_2}##?
 
As an idea (which I have put off developing), maybe you can use correlation as an inner product (with some adjustments), then find the norm generated by the inner product and define a distance based on that norm. I will do it too... some day.
 
fahraynk said:
My first question is, What is the relationship between sample standard deviation and population standard deviation?

Some red flags here: I see you're dividing by ##n##, when in fact for the sample variance you'd divide by ##n-1##. A similar issue is that while you can get an unbiased estimate of the variance, your standard deviation estimate will still be biased, due to (negative) convexity issues (Jensen's inequality).
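Both points can be sketched by simulation (the normal population, ##\sigma=2##, and the small sample size are assumptions chosen to make the biases visible):

```python
import numpy as np

# ddof=1 (dividing by n-1) gives an unbiased variance estimate, yet
# the square root of that estimate still underestimates sigma on
# average, because sqrt is concave.
rng = np.random.default_rng(1)
sigma, n, trials = 2.0, 5, 200_000
samples = rng.normal(0.0, sigma, size=(trials, n))

var_biased = samples.var(axis=1, ddof=0).mean()    # divides by n
var_unbiased = samples.var(axis=1, ddof=1).mean()  # divides by n-1
std_estimate = np.sqrt(samples.var(axis=1, ddof=1)).mean()

print(var_biased, var_unbiased, std_estimate)  # ~3.2, ~4.0, ~1.9
```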

Other issues, in addition to what was raised above: I don't see why your mean estimates have a normal distribution -- this isn't stated anywhere. Sure, the CLT would tell you that the normal approximation works for large enough ##n##, but I don't see the sufficiency of the size of ##n## addressed anywhere.

There's also a ruler problem, in that you're using estimates of the standard deviation to measure estimates of the mean -- but how do you know your standard deviation (or variance) estimates are any good? There are a lot of issues lurking in here... I think this is why books on statistics are long.

- - - -
If I were trying to develop some kind of estimate from scratch, I'd probably start with some kind of bounded random variable and apply Chernoff bounds or concentration inequalities. That way you don't need variance information, only the mean. Once I had this down, if feeling adventurous, it could be applied to more general random variables (that still have their first two moments) with the help of the method of truncation.
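A sketch of that route, using the standard Hoeffding bound for a variable bounded in ##[a,b]##: ##P(|\bar{x}-\mu|\geq t)\leq 2e^{-2nt^2/(b-a)^2}##. Inverting for ##n## at confidence ##1-\alpha## gives a sample size with no variance estimate needed (the numbers below are illustrative):

```python
import math

def hoeffding_n(a: float, b: float, t: float, alpha: float) -> int:
    """Smallest n with 2*exp(-2*n*t^2/(b-a)^2) <= alpha, i.e.
    guaranteeing P(|sample mean - mu| >= t) <= alpha by Hoeffding."""
    return math.ceil((b - a) ** 2 * math.log(2 / alpha) / (2 * t ** 2))

# Illustrative numbers: values in [0, 1], accuracy t = 0.05,
# 95% confidence (alpha = 0.05).
n_needed = hoeffding_n(0.0, 1.0, 0.05, 0.05)
print(n_needed)  # 738
```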
 
fahraynk said:
I tried to derive an equation for one sample mean to converge to another sample mean within a 95% confidence interval,

Your description doesn't define a specific mathematical problem.

You might intend asking about a scenario where independent random samples are taken of a random variable. After taking ##n## samples, the sample mean is the random variable ##\mu_n##. After taking ##k## more samples, the sample mean is the random variable ##\mu_{n+k}## where the first ##n## of those samples are the same as those used to compute ##\mu_n##.

Or you might intend to ask about the situation where ##\mu_n## and ##\mu_{n+k}## are computed from two groups of samples that need not have any common samples.

You might be taking ##n## and ##k## as given and asking for an interval length ##L## such that there is a ##0.95## probability that ##|\mu_n - \mu_{n+k}| < L/2##.

Or you might be taking ##L## and ##n## as given and asking for value of ##k## such that there is a 0.95 probability that ##|\mu_n - \mu_{n+k}| < L/2##

Or you might have in mind some question involving the relationship of ##\mu_n## and ##\mu_{n+k}## with the mean ##\mu## of the random variable being sampled.
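The first two readings really are different, and a short simulation makes that concrete (the standard normal population and the sizes ##n=100##, ##k=50## are illustrative assumptions):

```python
import numpy as np

# Contrast mu_{n+k} computed by extending the same n samples with
# mu_{n+k} computed from a fresh, disjoint group of samples.
rng = np.random.default_rng(2)
n, k, trials = 100, 50, 50_000
x = rng.normal(size=(trials, n + k))   # one growing sample per trial
y = rng.normal(size=(trials, n + k))   # an independent second group

nested = np.abs(x[:, :n].mean(axis=1) - x.mean(axis=1))
disjoint = np.abs(x[:, :n].mean(axis=1) - y.mean(axis=1))
print(nested.mean(), disjoint.mean())  # nested differences are smaller
```

The shared samples correlate ##\mu_n## and ##\mu_{n+k}##, so the nested version typically differs by less than the disjoint one.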
 
It seems that if ##N## and ##n## were both large enough, we could use the CLT to argue that they must be close to each other, under certain assumptions on the sampling, as @Stephen Tashi described in his post.
 
Doesn't weak convergence allow us to say that ##\hat x_n## is Cauchy, so that for ##k, j > N##, ##|\hat x_k - \hat x_j| < \epsilon##?
 
