Convergence of 2 sample means with 95% confidence

In summary: And what do you mean by "converge"? Do you mean ##\lim_{n \to \infty} \mu_n = \mu## in probability? In mean-square? Almost surely? Or something else? For that matter, do you mean the same sort of convergence for ##\mu_{n+k}## to ##\mu##? If you want to know about convergence of the difference between the two sample means to 0, do you want that convergence to be in probability, in mean-square, almost surely, or something else? Without knowing the details of the problem you have in mind, it's impossible to know how to fix it. You can't
  • #1
fahraynk
I tried to derive an equation for one sample mean to converge to another sample mean within a 95% confidence interval, but I know I am wrong. Can someone tell me what I did wrong, and what is the correct formula?

Suppose:

##\hat{x_1},\hat{\sigma_1},N## are a sample mean and sample standard deviation calculated from ##N## samples,

##\hat{x_2},\hat{\sigma_2},n## are a sample mean and sample standard deviation calculated from ##n## samples, with ##n\leq N##,

##\mu##, ##\delta## are the true mean and true standard deviation of the population.

If ##d(\hat{x_1},\hat{x_2})## is a Euclidean distance function on the sample means, then:

$$
d(\hat{x_1},\hat{x_2})\leq d(\hat{x_1},\mu)+ d(\hat{x_2},\mu)\leq 4\frac{\sigma_1}{\sqrt{N}}+4\frac{\sigma_2}{\sqrt{n}}
$$

With ##95##% confidence, because ##\mu\in [\hat{x_1}-\frac{2\sigma_1}{\sqrt{N}},\hat{x_1}+\frac{2\sigma_1}{\sqrt{N}}]## with 95% confidence.

My first question is: what is the relationship between sample standard deviation and population standard deviation?

When I take many samples, the standard deviation of the samples changes very little, so I assume the relationship ##\sigma_1=\sigma_2=\delta## :

$$
\begin{aligned}
d(\hat{x_1},\hat{x_2}) &\leq 4\left(\frac{\sigma_1}{\sqrt{N}}+\frac{\sigma_2}{\sqrt{n}}\right)=4\delta\frac{\sqrt{N}+\sqrt{n}}{\sqrt{N}\sqrt{n}}=4\delta\frac{\frac{\sqrt{N}}{\sqrt{n}}+1}{\sqrt{N}}\\
&\Rightarrow \sqrt{N}d(\hat{x_1},\hat{x_2})-4\delta\leq\frac{4\delta\sqrt{N}}{\sqrt{n}}\\
&\Rightarrow \sqrt{n}\leq\frac{4\delta\sqrt{N}}{\sqrt{N}d(\hat{x_1},\hat{x_2})-4\delta}
\end{aligned}
$$

But this can't be true, because if I choose ##d(\hat{x_1},\hat{x_2})=0## then ##\sqrt{n}\leq-\sqrt{N}##, but ##n## and ##N## must be positive.

What is wrong here?

Also, I am sure there must be a simple way to do this. What I really want to know is how to get ##n## as a function of ##d(\hat{x_1},\hat{x_2})## and ##\phi##, where ##\phi## is a confidence level, like 95% confidence.
 
  • #2
I don't like the look of this. As I understand your scenario, n is not a variable, and d is a random variable that you cannot "choose".
Mathematically, your problem comes in the last step. If we set d=0, the penultimate line is
-4δ ≤ 4δ√N/√n ⇒
√n ≤ 4δ√N/(-4δ)
You have to be careful with inequalities. It's not as simple as "swapping terms" in an equation. When you multiply or divide both sides of an inequality by a negative quantity (-4δ), the direction of the inequality is reversed. So it should be
√n ≥ 4δ√N/(-4δ)
And more generally, if √N·d - 4δ is negative, moving it to the denominator of the RHS reverses the direction of the inequality.
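To make the sign issue concrete, here is a small sketch that rearranges √N·d - 4δ ≤ 4δ√N/√n for n while tracking the sign of √N·d - 4δ. The helper name `n_constraint` and the numeric values are made up for illustration; δ > 0 and N > 0 are assumed.

```python
import math

def n_constraint(d, delta, N):
    """Rearrange sqrt(N)*d - 4*delta <= 4*delta*sqrt(N)/sqrt(n) for n.

    Returns ("upper", bound) when the inequality caps n from above,
    or ("none", None) when it holds for every positive n.
    Hypothetical helper; assumes delta > 0 and N > 0.
    """
    a = math.sqrt(N) * d - 4 * delta  # the quantity that ends up in the denominator
    if a <= 0:
        # Left side is non-positive, right side is positive:
        # the inequality is true for all n > 0, so n is not constrained.
        return ("none", None)
    # a > 0: dividing by a keeps the direction, so sqrt(n) <= 4*delta*sqrt(N)/a
    return ("upper", (4 * delta * math.sqrt(N) / a) ** 2)

print(n_constraint(d=0.0, delta=1.0, N=100))  # d = 0: no constraint on n
print(n_constraint(d=1.0, delta=1.0, N=100))  # positive denominator: upper bound on n
```

With d = 0 the denominator is negative, so the correctly flipped inequality √n ≥ -√N is satisfied by every n and the apparent contradiction disappears.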
 
  • #3
fahraynk said:
<Snip>
If ##d(\hat{x_1},\hat{x_2})## is a euclidean distance function on the sample means, then:

$$
d(\hat{x_1},\hat{x_2})\leq d(\hat{x_1},\mu)+ d(\hat{x_2},\mu)\leq 4\frac{\sigma_1}{\sqrt{N}}+4\frac{\sigma_2}{\sqrt{n}}
$$

With ##95##% confidence because : ##\mu\in [\hat{x_1}-\frac{2\sigma_1}{\sqrt{N}},\hat{x_1}+\frac{2\sigma_1}{\sqrt{N}}]## with 95% confidence
<Snip>
What is wrong here?

Also, I am sure there must be a simple way to do this. What I really want to know is how to get ##n## as a function of ##d(\hat{x_1},\hat{x_2})## and ##\phi##, where ##\phi## is a confidence level, like 95% confidence.

Not sure, but maybe because ##\mu## is also in a similar interval about ##\hat{x_2} ##?
 
  • #4
As an idea (which I have put off developing), maybe you can use correlation as an inner product (with some adjustments), then find the norm generated by the inner product and define a distance based on the norm. I will do it too... some day.
 
  • #5
fahraynk said:
My first question is, What is the relationship between sample standard deviation and population standard deviation?

Some red flags here: I see you're dividing by ##n## when in fact for the sample variance you'd divide by ##n-1##. A similar issue is that while you can get unbiased estimates of variance, you'll have a biased standard deviation estimate, because the square root is concave (Jensen's inequality).
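A quick simulation makes both points visible. This is only a sketch under assumptions not in the thread: normally distributed data with true standard deviation 1, a deliberately small sample size of 5, and a fixed seed.

```python
import random
import statistics

random.seed(0)
TRUE_SD = 1.0   # assumed population standard deviation
n = 5           # small sample size, so the bias is easy to see
trials = 20000

var_n, var_n1, sd_n1 = [], [], []
for _ in range(trials):
    xs = [random.gauss(0.0, TRUE_SD) for _ in range(n)]
    var_n.append(statistics.pvariance(xs))  # divides by n
    var_n1.append(statistics.variance(xs))  # divides by n - 1
    sd_n1.append(statistics.stdev(xs))      # square root of the n - 1 variance

mean_var_n = statistics.fmean(var_n)
mean_var_n1 = statistics.fmean(var_n1)
mean_sd_n1 = statistics.fmean(sd_n1)

print("average variance, divisor n:    ", round(mean_var_n, 3))
print("average variance, divisor n - 1:", round(mean_var_n1, 3))
print("average std dev, divisor n - 1: ", round(mean_sd_n1, 3))
```

In theory the divisor-##n## variance averages ##(n-1)/n = 0.8## of the true variance, the divisor-##(n-1)## variance averages 1, and the standard deviation still averages about 0.94 here, so it underestimates even when the variance estimate is unbiased.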

Other issues, in addition to what was raised above: I don't see why your mean estimates have a normal distribution. This isn't stated anywhere. Sure, the CLT would tell you that the normal approximation works for large enough ##n##, but I don't see whether ##n## is large enough addressed anywhere.

There's also a ruler problem, in that you're using estimates of standard deviation to measure estimates of the mean. But how do you know your std dev (or variance) estimates are any good? There are a lot of issues lurking in here... I think this is why books on statistics are long.

- - - -
If I was trying to develop some kind of estimate from scratch, I'd probably start with some kind of bounded random variable and apply Chernoff bounds or concentration inequalities. That way you don't need variance information, only the mean. Once I had this down, if feeling adventurous, this may be applied to more general random variables (that still have first 2 moments) with the help of the method of truncation.
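As one concrete instance of this idea, here is a sketch using Hoeffding's inequality (a specific concentration inequality, chosen here for illustration and not named in the post). For i.i.d. samples known to lie in ##[a, b]##, it gives ##P(|\hat{x} - \mu| \geq t) \leq 2\exp(-2nt^2/(b-a)^2)##, which can be solved for ##n##. The function name `hoeffding_n` and the example numbers are assumptions.

```python
import math

def hoeffding_n(t, alpha, a, b):
    """Smallest n for which Hoeffding's inequality guarantees
    P(|sample mean - true mean| >= t) <= alpha
    for i.i.d. samples bounded in [a, b]. No variance information is used.
    Hypothetical helper for illustration.
    """
    if t <= 0 or not (0 < alpha < 1) or b <= a:
        raise ValueError("need t > 0, 0 < alpha < 1 and a < b")
    # Solve 2 * exp(-2 * n * t**2 / (b - a)**2) <= alpha for n.
    return math.ceil((b - a) ** 2 * math.log(2 / alpha) / (2 * t ** 2))

# Samples in [0, 1], tolerance 0.1, 95% guarantee.
print(hoeffding_n(t=0.1, alpha=0.05, a=0.0, b=1.0))  # 185
```

The bound is conservative compared with a normal approximation, but it holds for every sample size and needs only the range of the variable.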
 
  • #6
fahraynk said:
I tried to derive an equation for one sample mean to converge to another sample mean within a 95% confidence interval,

Your description doesn't define a specific mathematical problem.

You might intend to ask about a scenario where independent random samples are taken of a random variable. After taking ##n## samples, the sample mean is the random variable ##\mu_n##. After taking ##k## more samples, the sample mean is the random variable ##\mu_{n+k}##, where the first ##n## of those samples are the same as those used to compute ##\mu_n##.

Or you might intend to ask about the situation where ##\mu_n## and ##\mu_{n+k}## are computed from two groups of samples that need not have any common samples.

You might be taking ##n## and ##k## as given and asking for an interval length ##L## such that there is a ##0.95## probability that ##| \mu_n - \mu_{n+k}| < L/2##.

Or you might be taking ##L## and ##n## as given and asking for a value of ##k## such that there is a ##0.95## probability that ##|\mu_n - \mu_{n+k}| < L/2##.

Or you might have in mind some question involving the relationship of ##\mu_n## and ##\mu_{n+k}## with the mean ##\mu## of the random variable being sampled.
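To show how much the choice of scenario matters, here is a rough sketch for the case where ##n## and ##k## are given and ##L## is wanted. It assumes i.i.d. samples with a known standard deviation ##\sigma## and a normal approximation for the sample means, neither of which is stated in the thread. If the first ##n## samples are shared, ##\mathrm{Var}(\mu_n - \mu_{n+k}) = \sigma^2\left(\frac{1}{n} - \frac{1}{n+k}\right)##; if the two groups are independent, it is ##\sigma^2\left(\frac{1}{n} + \frac{1}{n+k}\right)##. The function name `interval_length` is hypothetical.

```python
import math
from statistics import NormalDist

def interval_length(n, k, sigma, conf=0.95, nested=True):
    """Length L with P(|mu_n - mu_{n+k}| < L/2) = conf (approximately).

    Assumes i.i.d. samples, known sigma, and normally distributed sample means.
    nested=True:  the n+k samples include the first n samples.
    nested=False: the two groups share no samples.
    """
    z = NormalDist().inv_cdf(0.5 + conf / 2)  # about 1.96 for conf = 0.95
    if nested:
        var = sigma ** 2 * (1 / n - 1 / (n + k))
    else:
        var = sigma ** 2 * (1 / n + 1 / (n + k))
    return 2 * z * math.sqrt(var)

print(interval_length(100, 300, 1.0, nested=True))
print(interval_length(100, 300, 1.0, nested=False))
```

The shared-sample interval is always the shorter of the two, because the common samples make the two means positively correlated.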
 
  • #7
It seems that if ##N## and ##n## were both large enough, we could use the CLT to somehow argue that the two sample means must be close to each other, under certain assumptions on sampling, as @Stephen Tashi described in his post.
 
  • #8
Doesn't weak convergence allow us to say that ##\hat x_n## is Cauchy, so that for ##k, j > N## we have ##|\hat x_k - \hat x_j| < \epsilon##?
 

1. What is the significance of having a 95% confidence level in the convergence of 2 sample means?

The 95% confidence level describes the interval-building procedure rather than any single interval: if the sampling were repeated many times, about 95% of the intervals constructed this way would contain the true population parameter. For two sample means, a 95% confidence interval for their difference is built so that, over repeated sampling, about 95% of such intervals contain the true difference between the two population means.

2. How is the confidence interval calculated for the convergence of 2 sample means?

The confidence interval for the difference of 2 sample means is calculated using the formula: (mean of sample 1 - mean of sample 2) ± (critical value * standard error). The critical value is determined by the desired confidence level and, for small samples, the sample sizes. The standard error combines the standard deviation and size of each sample; for independent samples it is ##\sqrt{s_1^2/n_1 + s_2^2/n_2}##.
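As a minimal sketch of that formula, the snippet here uses a normal critical value, which is only reasonable for large independent samples; for small samples a t critical value would be needed instead. The function name `diff_of_means_ci` and the data are invented for illustration.

```python
import math
from statistics import NormalDist, fmean, stdev

def diff_of_means_ci(sample1, sample2, conf=0.95):
    """Approximate confidence interval for mean(sample1) - mean(sample2).

    Assumes independent samples and a normal critical value
    (a large-sample approximation, not a t-based interval).
    """
    diff = fmean(sample1) - fmean(sample2)
    # Standard error: sqrt(s1^2/n1 + s2^2/n2), with n - 1 sample variances.
    se = math.sqrt(stdev(sample1) ** 2 / len(sample1)
                   + stdev(sample2) ** 2 / len(sample2))
    z = NormalDist().inv_cdf(0.5 + conf / 2)  # critical value, about 1.96 at 95%
    return diff - z * se, diff + z * se

a = [1.0, 2.0, 3.0, 4.0, 5.0]   # made-up data
b = [2.0, 3.0, 4.0, 5.0, 6.0]
low, high = diff_of_means_ci(a, b)
print(low, high)
```

The interval is centred on the observed difference of -1, and its half-width shrinks as either sample grows.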

3. What is the role of sample size in the convergence of 2 sample means with 95% confidence?

The sample size determines how precise the comparison of the 2 sample means is. A larger sample size reduces the standard error, which shrinks roughly in proportion to ##1/\sqrt{n}##, resulting in a narrower confidence interval at the same 95% confidence level.

4. Can the convergence of 2 sample means with 95% confidence be affected by outliers?

Yes, outliers can significantly affect the comparison of 2 sample means at 95% confidence. Outliers can shift the sample means and inflate the sample standard deviations, which increases the standard error and leads to a wider confidence interval and a less precise estimate of the difference.

5. How is the convergence of 2 sample means with 95% confidence used in scientific research?

The convergence of 2 sample means with 95% confidence is commonly used in scientific research to determine whether there is a significant difference between two groups or variables. It allows researchers to make conclusions about the true difference between the two populations based on the data from the two samples.
