Chebyshev inequality, confidence intervals, etc.


Discussion Overview

The discussion revolves around the differences between Chebyshev's inequality and confidence intervals in statistics, particularly focusing on how they define the proportion of values within certain standard deviations from the mean. Participants explore the implications of different assumptions regarding probability distributions and the resulting percentages of observations.

Discussion Character

  • Exploratory
  • Technical explanation
  • Debate/contested

Main Points Raised

  • Some participants explain Chebyshev's inequality, stating that it applies to any distribution with finite variance and provides a minimum proportion of observations within k standard deviations of the mean.
  • Others argue that confidence intervals assume a normal distribution and yield tighter intervals, leading to different percentages of observations compared to Chebyshev's inequality.
  • A participant highlights that Chebyshev's inequality is a worst-case bound and that knowing the specific distribution allows for better results than those provided by Chebyshev.
  • One participant challenges the interpretation of the results, asserting that the statement about percentages being "within" certain distances from the mean is misleading and requires clarification on what is meant by "percent from the mean."
  • Another participant expresses confusion over the correction regarding the interpretation of the 36% figure from Chebyshev's inequality, seeking clarification on the distinction between probabilities and distances from the mean.

Areas of Agreement / Disagreement

Participants generally agree on the definitions and applications of Chebyshev's inequality and confidence intervals, but there is disagreement regarding the interpretation of the results, particularly the meaning of percentages in relation to distances from the mean. The discussion remains unresolved on certain points of clarification.

Contextual Notes

There are limitations in the discussion regarding the assumptions made about the distributions and the definitions used in interpreting percentages. The distinction between probabilities and distances from the mean is also not fully resolved.

Vital
Hello.

I am bewildered by so many different notions of probability distribution percentages, i.e. the proportion of values that lie within certain standard deviations from the mean.

(1) There is a Chebyshev's inequality:
- for any distribution with finite variance, the proportion of the observations within k standard deviations of the arithmetic mean is at least 1 − 1/k^2 for all k > 1. Below, X is the mean and s is the standard deviation.

k = 1.25 => X ± 1.25s => proportion 1 - 1/(1.25)^2 = 36% => 36% of observations lie within 36% from the mean (hence 18% below the mean and 18% above the mean)
k = 1.50 => X ± 1.5s => proportion 56% => 56% of observations lie within 56% from the mean (hence 28% below the mean and 28% above the mean)
k = 2 => X ± 2.0s => proportion 75% => 75% of observations lie within 75% from the mean (hence 37.5% below the mean and 37.5% above the mean)
k = 2.50 => X ± 2.5s => proportion 84% => 84% of observations lie within 84% from the mean (hence 42% below the mean and 42% above the mean)
k = 3.0 => X ± 3.0s => proportion 89% => 89% of observations lie within 89% from the mean (hence 44.5% below the mean and 44.5% above the mean)
k = 4.0 => X ± 4.0s => proportion 94% => 94% of observations lie within 94% from the mean (hence 47% below the mean and 47% above the mean)
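The "proportion" figures in the list above can be checked with a few lines of Python. This is a minimal sketch; the helper name `chebyshev_bound` is hypothetical. Note that it only evaluates the guaranteed minimum share 1 − 1/k^2 of observations inside X ± ks; the width of that interval is k standard deviations, not a percentage.

```python
def chebyshev_bound(k: float) -> float:
    """Guaranteed minimum share of observations within k standard deviations
    of the mean, for any distribution with finite variance.

    The bound 1 - 1/k^2 is vacuous (clipped to 0) for k <= 1.
    """
    return max(0.0, 1.0 - 1.0 / k**2)


for k in (1.25, 1.5, 2.0, 2.5, 3.0, 4.0):
    # The percentage is a share of observations; the interval is mean ± k*s.
    print(f"k = {k:4.2f}: at least {chebyshev_bound(k):.0%} of observations in mean ± {k}*s")
```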

(2) Confidence intervals:
A confidence interval is a range of values around the expected outcome within which we expect the actual outcome to fall some specified percentage of the time. For example, a 95% confidence interval is a range that we expect the random variable to fall in 95% of the time.
μ ± 1.65σ for 90 percent of the observations
μ ± 1.96σ for 95 percent of the observations
μ ± 2.58σ for 99 percent of the observations.
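Under the normality assumption these percentages follow from the error function: for a normal variable, P(|X − μ| < zσ) = erf(z/√2). A minimal sketch using only Python's standard `math.erf` (the helper name `normal_coverage` is hypothetical):

```python
import math


def normal_coverage(z: float) -> float:
    """Share of a normal distribution inside mu ± z*sigma."""
    # For a standard normal Z, P(|Z| < z) = erf(z / sqrt(2)).
    return math.erf(z / math.sqrt(2))


for z in (1.65, 1.96, 2.58):
    print(f"mu ± {z}*sigma covers {normal_coverage(z):.2%} of a normal distribution")
```

Because 1.65 and 2.58 are rounded up from 1.645 and 2.576, the computed coverages come out slightly above 90% and 99%.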

The two approaches above give very different percentages of observations within a given number of standard deviations of the mean. With Chebyshev's inequality, 94% of observations lie within ±4 standard deviations, while with the confidence interval approach, 99% lie within ±2.58 standard deviations.
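The comparison in the previous paragraph uses two different values of k (4 for Chebyshev, 2.58 for the normal case). Evaluating both at the same k makes the gap easier to see. This sketch defines two hypothetical helpers from scratch so it runs on its own:

```python
import math


def chebyshev_bound(k: float) -> float:
    # Guaranteed minimum for ANY distribution with finite variance.
    return max(0.0, 1.0 - 1.0 / k**2)


def normal_coverage(k: float) -> float:
    # Exact share when the distribution is known to be normal.
    return math.erf(k / math.sqrt(2))


print("   k   Chebyshev (at least)   normal (exact)")
for k in (1.65, 1.96, 2.58, 4.0):
    print(f"{k:4.2f}   {chebyshev_bound(k):>12.1%}          {normal_coverage(k):>10.3%}")
```

At every k the Chebyshev figure is a floor that must hold for every finite-variance distribution, so it can never exceed the exact figure for the normal distribution.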

Please help me understand how these differ from each other, and why they give such different percentages. Please do me a favour and don't go too far down the rabbit hole with complicated math formulas in your explanation.

Thank you very much.)
 
No need to go down a rabbit hole. It is pretty simple actually.

The confidence interval numbers assume that the data are normally distributed. Making that assumption, you can get pretty tight intervals.

The Chebyshev inequality makes a much weaker assumption. It assumes only that the variance is finite. Many distributions have finite variance but are much broader than the normal distribution. So you expect that the Chebyshev intervals will be wider than the normal distribution confidence intervals.

In statistics you should ALWAYS pay close attention to the assumptions. They are exceptionally important in statistics.
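To illustrate a finite-variance distribution that is broader than the normal, here is a sketch using the Laplace (double-exponential) distribution; that choice of example is mine, not from the thread. For a Laplace distribution with scale b, σ = b√2 and P(|X − μ| < t) = 1 − e^(−t/b), so the share within kσ of the mean is 1 − e^(−k√2) whatever b is. The helper names are hypothetical.

```python
import math


def laplace_coverage(k: float) -> float:
    """Share of a Laplace distribution within k standard deviations of its mean.

    With scale b, sigma = b*sqrt(2) and P(|X - mu| < t) = 1 - exp(-t/b);
    substituting t = k*sigma gives 1 - exp(-k*sqrt(2)).
    """
    return 1.0 - math.exp(-k * math.sqrt(2))


def normal_coverage(k: float) -> float:
    # Share of a normal distribution within k standard deviations of its mean.
    return math.erf(k / math.sqrt(2))


k = 1.96
print(f"normal   : {normal_coverage(k):.1%}")
print(f"Laplace  : {laplace_coverage(k):.1%}")
print(f"Chebyshev: at least {1 - 1 / k**2:.1%}")
```

At k = 1.96 the Laplace share is about 93.7%, below the 95% that a normal-based interval would promise, yet still above Chebyshev's guaranteed 74%.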
 
Dale said:
No need to go down a rabbit hole. It is pretty simple actually.
[snip]
Thank you very much. It is much clearer now. So if a problem states that the distribution is normal, then I can use confidence intervals. But when the distribution is assumed to be non-normal, I should use Chebyshev's inequality to define the interval around the mean. I hope I understood that correctly.
 
Yes, although if you know that the distribution is something specific rather than normal, then you can construct confidence intervals for that specific distribution. That will give you equal or better results than the Chebyshev assumption of an unknown distribution.
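As an example of a specific non-normal distribution (the exponential distribution, chosen here for illustration), the exact share within kσ of the mean can be computed directly and compared with Chebyshev's floor. With rate 1, the mean and standard deviation are both 1, and the interval is clipped at 0 because the variable is never negative. The helper name is hypothetical.

```python
import math


def exponential_coverage(k: float) -> float:
    """Share of an exponential distribution (rate 1) within k sigma of its mean.

    With rate 1, mu = sigma = 1 and the CDF is 1 - exp(-x) for x >= 0.
    The interval (1 - k, 1 + k) is clipped at 0 because X is never negative.
    """
    lower = max(0.0, 1.0 - k)
    upper = 1.0 + k
    return math.exp(-lower) - math.exp(-upper)


for k in (1.25, 2.0, 3.0):
    cheb = 1.0 - 1.0 / k**2
    print(f"k = {k}: exponential {exponential_coverage(k):.1%} vs Chebyshev at least {cheb:.1%}")
```

Knowing the distribution turns the Chebyshev floor (36% at k = 1.25) into an exact figure (about 89% for the exponential case).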
 
Vital said:
k = 1.25 => X ± 1.25s => proportion 1 - 1/(1.25)^2 = 36% => 36% of observations lie within 36% from the mean (hence 18% below the mean and 18% above the mean)
[snip]

You have mis-stated the results. For Chebychev with ##k = 1.25## it follows that 36% of the observations lie within 25% from the mean (not 36% from the mean as you said). Also, you cannot say that 18% lie above and 18% lie below the mean; you can only say that 36% lie within the range above and below the mean. (I suspect that I could construct an asymmetric example where 30% lie above the mean and 6% lie below.)
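A concrete asymmetric case (not the 30%/6% split suggested above, which would need its own construction) is the exponential distribution with rate 1, where μ = σ = 1. This sketch splits the mass inside μ ± 1.25σ into the part below the mean and the part above it; the distribution choice and the `cdf` helper are illustrative assumptions.

```python
import math


def cdf(x: float) -> float:
    # CDF of the exponential distribution with rate 1 (mu = sigma = 1).
    return 1.0 - math.exp(-x) if x > 0 else 0.0


mu, sigma, k = 1.0, 1.0, 1.25
below = cdf(mu) - cdf(mu - k * sigma)  # mass between mu - k*sigma and mu
above = cdf(mu + k * sigma) - cdf(mu)  # mass between mu and mu + k*sigma
print(f"below the mean: {below:.1%}, above the mean: {above:.1%}, total: {below + above:.1%}")
```

The split comes out at roughly 63% below and 26% above the mean, so the mass inside the interval need not divide evenly, and 36% is only a lower bound on the total.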

The point about Chebychev is that it applies universally, to any distribution whatsoever with a finite mean and variance. Of course, if you know the actual form of the distribution you can do much better: Chebychev is a type of worst-case bound, and when you have a given distribution you are no longer looking at the worst case.
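To see why Chebyshev is a worst-case bound, consider the standard textbook three-point construction (not from this thread) that attains it exactly: put mass 1/(2k^2) at each of −k and +k and the rest at 0. That distribution has mean 0 and variance 1, and exactly 1 − 1/k^2 of its mass lies strictly within k standard deviations of the mean. A sketch with exact fractions; the function name is hypothetical:

```python
from fractions import Fraction


def worst_case_distribution(k: Fraction) -> dict:
    """Three-point distribution with mean 0 and variance 1 that attains Chebyshev's bound.

    Mass 1/(2k^2) sits at each of -k and +k; the remaining mass sits at 0.
    """
    p = 1 / (2 * k**2)
    return {-k: p, Fraction(0): 1 - 2 * p, k: p}


k = Fraction(5, 4)  # k = 1.25
dist = worst_case_distribution(k)
mean = sum(x * p for x, p in dist.items())
variance = sum((x - mean) ** 2 * p for x, p in dist.items())
strictly_inside = sum(p for x, p in dist.items() if abs(x - mean) < k)
print(mean, variance, strictly_inside)  # prints: 0 1 9/25
```

Since 9/25 = 36% exactly, no general statement can promise more than 36% within 1.25σ; only extra knowledge about the distribution can.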
 
Ray Vickson said:
You have mis-stated the results. For Chebychev with ##k = 1.25## it follows that 36% of the observations lie within 25% from the mean (not 36% from the mean as you said). [snip]
Thank you very much. But I am not sure what you mean when you say that 36% is not correct.
1 - 1/1.25^2 = 36%, hence around 36% fall within ±1.25 standard deviations of the mean. Why is this incorrect?
 
Vital said:
Thank you very much. But I am not sure what you mean when you say that 36% is not correct.
1 - 1/1.25^2 = 36%, hence around 36% fall within ±1.25 standard deviations of the mean. Why is this incorrect?

I agree that ##1 - 1/1.25^2 = 0.36,## but that does NOT mean that points in the interval ##(\mu - 1.25 \sigma , \mu + 1.25 \sigma)## are within 36% from the mean. In fact, to even speak of "% from the mean" is using meaningless words. The concept of "%" must be in reference to some standard (or base) amount, which you have not specified. Even if you use the standard deviation ##\sigma## as that base amount, the interval above is actually "within 125% of the mean." There are no intervals of length 36% here. The 36% applies to the probabilities, not to the "distances".
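The distinction between distance and probability can be made explicit in code. In this sketch the numbers μ = 100 and σ = 10 are hypothetical, chosen only for illustration, as is the helper name `chebyshev_interval`. The interval endpoints are in the units of the data (1.25σ = 12.5 units on each side), while the 36% is a unitless share of observations.

```python
def chebyshev_interval(mu: float, sigma: float, k: float):
    """Return the interval mu ± k*sigma and the guaranteed minimum share inside it."""
    lo, hi = mu - k * sigma, mu + k * sigma  # a DISTANCE statement, in data units
    min_prob = 1.0 - 1.0 / k**2  # a PROBABILITY statement, a unitless share
    return (lo, hi), min_prob


(lo, hi), p = chebyshev_interval(mu=100.0, sigma=10.0, k=1.25)
print(f"at least {p:.0%} of observations lie in ({lo}, {hi})")
```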
 