Chebyshev's inequality, confidence intervals, etc.

AI Thread Summary
Chebyshev's inequality states that, for any distribution with finite variance, at least 1 − 1/k² of observations lie within k standard deviations of the mean, providing a conservative bound that applies to every such distribution. In contrast, the confidence-interval figures assume a normal distribution, allowing tighter statements about where a given percentage of observations will fall, such as 95% within ±1.96 standard deviations. The differing percentages arise because Chebyshev's inequality is more conservative, applying to a far wider range of distributions, while the normal-based intervals exploit the specific shape of the normal distribution for more precise statements. Understanding the assumptions behind each method is crucial, as they dictate the applicability and accuracy of the results, and this distinction guides the choice of method based on the distribution's characteristics.
Vital
Hello.

I am bewildered by so many different notions of probability distribution percentages, i.e. the proportion of values that lie within a certain number of standard deviations of the mean.

(1) There is Chebyshev's inequality:
- for any distribution with finite variance, the proportion of the observations within k standard deviations of the arithmetic mean is at least 1 − 1/k² for all k > 1. Below, X is the mean and s is the standard deviation. (A quick computation of these bounds is sketched after the list.)

k = 1.25 => X ± 1.25s => proportion 1 - 1/(1.25)^2 = 36% => 36% of observations lie within 36% from the mean (hence 18% below the mean and 18% above the mean)
k = 1.50 => X ± 1.5s => proportion 56% => 56% of observations lie within 56% from the mean (hence 28% below the mean and 28% above the mean)
k = 2.0 => X ± 2.0s => proportion 75% => 75% of observations lie within 75% from the mean (hence 37.5% below the mean and 37.5% above the mean)
k = 2.50 => X ± 2.5s => proportion 84% => 84% of observations lie within 84% from the mean (hence 42% below the mean and 42% above the mean)
k = 3.0 => X ± 3.0s => proportion 89% => 89% of observations lie within 89% from the mean (hence 44.5% below the mean and 44.5% above the mean)
k = 4.0 => X ± 4.0s => proportion 94% => 94% of observations lie within 94% from the mean (hence 47% below the mean and 47% above the mean)
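As a quick numerical check, here is a minimal Python sketch that computes these lower bounds directly from 1 − 1/k²:

Code:
# Chebyshev's lower bound on the proportion of observations lying
# within k standard deviations of the mean: 1 - 1/k^2, valid for k > 1.
for k in [1.25, 1.5, 2.0, 2.5, 3.0, 4.0]:
    bound = 1 - 1 / k**2
    print(f"k = {k:4}: at least {bound:.1%} of observations within ±{k} standard deviations")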

(2) Confidence intervals:
A confidence interval is a range of values around the expected outcome within which we expect the actual outcome to fall some specified percentage of the time. For example, a 95% confidence interval is a range that we expect the random variable to be in 95% of the time. (The multipliers below are checked numerically in the sketch after the list.)
μ ± 1.65σ for 90 percent of the observations
μ ± 1.96σ for 95 percent of the observations
μ ± 2.58σ for 99 percent of the observations.
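Those multipliers are the two-sided quantiles of the standard normal distribution; a quick check of them, assuming SciPy is available:

Code:
from scipy.stats import norm

# For two-sided coverage c under a normal distribution, the multiplier
# is the (1 + c)/2 quantile of the standard normal.
for coverage in [0.90, 0.95, 0.99]:
    z = norm.ppf((1 + coverage) / 2)
    print(f"{coverage:.0%} of observations lie within mu ± {z:.3f} sigma")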

The two approaches above give completely different percentages of observations within a certain number of standard deviations of the mean. Under Chebyshev's inequality, 94% of observations lie within ±4 standard deviations, while under the confidence-interval approach 99% lie within ±2.58 standard deviations.

Please help me understand how these differ from each other, and why they give such different percentages. Please do me a favour and don't go too deep down a rabbit hole by using complicated math formulas in your explanation.

Thank you very much.)
 
No need to go down a rabbit hole. It is pretty simple actually.

The confidence interval numbers assume that the distribution is normal. Making that assumption, you can get pretty tight intervals.

The Chebyshev inequality makes a much weaker assumption: it assumes only that the variance is finite. Many distributions have finite variance but are much broader than the normal distribution, so you should expect the Chebyshev intervals to be wider than the normal-distribution confidence intervals.

In statistics you should ALWAYS pay close attention to the assumptions. They are exceptionally important.
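To put a number on that, here is a minimal sketch comparing the multiplier needed for 95% coverage under each assumption (the 1.96 value is the normal figure quoted earlier in the thread):

Code:
import math

coverage = 0.95

# Normal assumption: the two-sided 95% multiplier is about 1.96.
k_normal = 1.96

# Chebyshev (finite variance only): solve 1 - 1/k^2 = 0.95 for k.
k_chebyshev = math.sqrt(1 / (1 - coverage))

print(f"Normal assumption:   mu ± {k_normal:.2f} sigma")
print(f"Chebyshev guarantee: mu ± {k_chebyshev:.2f} sigma")

So to guarantee 95% coverage for an arbitrary finite-variance distribution you need roughly ±4.47 standard deviations, versus ±1.96 under normality.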
 
Dale said:
No need to go down a rabbit hole. It is pretty simple actually.

The confidence interval numbers assume that the distribution is normal. Making that assumption, you can get pretty tight intervals.

The Chebyshev inequality makes a much weaker assumption: it assumes only that the variance is finite. Many distributions have finite variance but are much broader than the normal distribution, so you should expect the Chebyshev intervals to be wider than the normal-distribution confidence intervals.

In statistics you should ALWAYS pay close attention to the assumptions. They are exceptionally important.
Thank you very much. It is much clearer now. So if a problem states that the distribution is normal, then I can use confidence intervals; but when the distribution is not assumed to be normal, then I should use Chebyshev's inequality to define the interval around the mean. I hope I understood that correctly.)
 
Yes, although if you know that the distribution is something specific other than normal, then you can construct confidence intervals for that specific distribution. That will give you results equal to or better than the Chebyshev bound, which assumes only an unknown distribution with finite variance.
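As a sketch of that idea, assume for illustration a variable known to follow an exponential distribution with mean 1, and compare its exact central 95% interval with the distribution-free Chebyshev interval that guarantees 95%:

Code:
import math
from scipy.stats import expon

# Hypothetical example: data known to follow an exponential distribution
# with mean 1 (its standard deviation is also 1).
dist = expon()

# Exact central 95% interval from the distribution's own quantiles.
lo, hi = dist.ppf(0.025), dist.ppf(0.975)

# Chebyshev interval guaranteeing at least 95% for ANY finite-variance distribution.
k = math.sqrt(1 / (1 - 0.95))
mean, sigma = dist.mean(), dist.std()

print(f"Exact 95% interval:                  [{lo:.2f}, {hi:.2f}]")
print(f"Chebyshev 95% interval (mu ± {k:.2f} sigma): [{mean - k * sigma:.2f}, {mean + k * sigma:.2f}]")

The exact interval is much narrower, which is the point above: knowing the distribution always lets you do at least as well as the distribution-free bound.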
 
Vital said:
Hello.

I am bewildered by so many different notions of probability distribution percentages, i.e. the proportion of values that lie within a certain number of standard deviations of the mean.

(1) There is Chebyshev's inequality:
- for any distribution with finite variance, the proportion of the observations within k standard deviations of the arithmetic mean is at least 1 − 1/k² for all k > 1. Below, X is the mean and s is the standard deviation.

k = 1.25 => X ± 1.25s => proportion 1 - 1/(1.25)^2 = 36% => 36% of observations lie within 36% from the mean (hence 18% below the mean and 18% above the mean)
*************************************
(2) Confidence intervals:
A confidence interval is a range of values around the expected outcome within which we expect the actual outcome to fall some specified percentage of the time. For example, a 95% confidence interval is a range that we expect the random variable to be in 95% of the time.
μ ± 1.65σ for 90 percent of the observations
*****************************************
The two approaches above give completely different percentages of observations within a certain number of standard deviations of the mean. Under Chebyshev's inequality, 94% of observations lie within ±4 standard deviations, while under the confidence-interval approach 99% lie within ±2.58 standard deviations.

Please help me understand how these differ from each other, and why they give such different percentages. Please do me a favour and don't go too deep down a rabbit hole by using complicated math formulas in your explanation.

Thank you very much.)

You have mis-stated the results. For Chebyshev with ##k = 1.25## it follows that at least 36% of the observations lie within 1.25 standard deviations of the mean (not within "36% from the mean", as you said). Also, you cannot say that 18% lie above and 18% lie below the mean; you can only say that at least 36% lie within the range above and below the mean. (I suspect that I could construct an asymmetric example where 30% lie above the mean and 6% lie below.)

The point about Chebyshev is that it applies universally, to any distribution whatsoever with a finite mean and variance. Of course, if you know the actual form of the distribution you can do much better: Chebyshev is a type of worst-case bound, and when you have a given distribution you are no longer looking at the worst case.
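As an illustration of that worst case, here is a sketch of a constructed three-point distribution (an assumption for illustration only) whose coverage within ±2 standard deviations barely exceeds the 75% Chebyshev guarantee, far below the roughly 95% a normal distribution gives:

Code:
import numpy as np

rng = np.random.default_rng(1)

# Three-point distribution: P(X = -1) = P(X = +1) = 0.12, P(X = 0) = 0.76.
# Its mean is 0 and its standard deviation is sqrt(0.24), about 0.49, so the
# ±2 standard deviation band is about ±0.98 -- just short of the points at ±1.
values = rng.choice([-1.0, 0.0, 1.0], size=1_000_000, p=[0.12, 0.76, 0.12])

mean, std = values.mean(), values.std()
coverage = np.mean(np.abs(values - mean) <= 2 * std)
print(f"Coverage within ±2 standard deviations: {coverage:.1%} (Chebyshev bound: at least 75%)")

Roughly 76% of the samples land inside the band, so for this distribution Chebyshev's 75% guarantee is nearly tight, while the normal-based 95% figure would be badly wrong.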
 
Ray Vickson said:
You have mis-stated the results. For Chebyshev with ##k = 1.25## it follows that at least 36% of the observations lie within 1.25 standard deviations of the mean (not within "36% from the mean", as you said). [snip]
Thank you very much. But I am not sure what you mean when you say that 36% is not correct.
1 - 1/1.25^2 = 36%, hence around 36% fall within ±1.25 standard deviations of the mean. Why is this incorrect?
 
Vital said:
Thank you very much. But I am not sure what you mean when you say that 36% is not correct.
1 - 1/1.25^2 = 36%, hence around 36% fall within ±1.25 standard deviations of the mean. Why is this incorrect?

I agree that ##1 - 1/1.25^2 = 0.36,## but that does NOT mean that points in the interval ##(\mu - 1.25 \sigma , \mu + 1.25 \sigma)## are within 36% from the mean. In fact, to even speak of "% from the mean" is meaningless: the concept of "%" must be in reference to some standard (or base) amount, which you have not specified. Even if you take the standard deviation ##\sigma## as that base amount, the interval above is actually "within 125% of a standard deviation from the mean." There are no intervals of length 36% here. The 36% applies to the probabilities, not to the "distances".
 
