Accuracy of the Normal Approximation to the Binomial Distribution

SUMMARY

The accuracy of the normal approximation to the binomial distribution is commonly judged by a rule of thumb: the expected numbers of successes and failures, ##np## and ##n(1-p)##, should both exceed 5. The approximation is accurate near the mean but much less reliable in the extreme tails. For hypothesis testing, the discussion settles on a practical criterion: the approximation is adequate when it leads to the same significance decision as the exact binomial distribution. Stein's method is suggested for bounding the error of Gaussian approximations, and the binomial case of the central limit theorem is noted for its pedagogical value.

PREREQUISITES
  • Understanding of binomial distribution and its properties
  • Familiarity with normal distribution and its applications
  • Knowledge of hypothesis testing methodologies
  • Basic proficiency in statistical software, such as R
NEXT STEPS
  • Explore Stein's method for bounding Gaussian approximation errors
  • Learn about the central limit theorem and its implications for statistical analysis
  • Investigate the differences between binomial and normal distributions in hypothesis testing
  • Utilize R to compute binomial probabilities and compare them with normal approximations
USEFUL FOR

Statisticians, data analysts, and researchers involved in probability theory and hypothesis testing will benefit from this discussion, particularly those interested in the applications of the normal approximation to the binomial distribution.

Adeimantus
What is the preferred method of measuring how accurate the normal approximation to the binomial distribution is? I know that the rule of thumb is that the expected number of successes and failures should both be >5 for the approximation to be adequate. But what is a useful definition of "adequate"? The approximation is good for values near the mean, and terrible for values in the extreme tails. How much do those extreme values matter?

Thank you.
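(A minimal R sketch of what "good near the mean, terrible in the tails" looks like numerically; the values n = 20 and p = 0.3 are only illustrative, chosen so that np = 6 just clears the rule of thumb.)

```r
# Exact binomial CDF vs. the continuity-corrected normal approximation.
n <- 20; p <- 0.3                                  # illustrative values: np = 6, n(1-p) = 14
mu <- n * p; sigma <- sqrt(n * p * (1 - p))

x <- 0:n
exact  <- pbinom(x, n, p)                          # exact CDF
approx <- pnorm(x + 0.5, mean = mu, sd = sigma)    # normal approximation with continuity correction

round(data.frame(x, exact, approx, abs_err = abs(exact - approx)), 4)

# The absolute error is small everywhere, but the relative error in the far
# tails is large, and it is relative error that matters for small p-values.
rel_err <- abs(exact - approx) / exact
round(rel_err[x <= 3], 2)                          # relative error in the lower tail
```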
 
Adeimantus said:
What is the preferred method of measuring how accurate the normal approximation to the binomial distribution is? I know that the rule of thumb is that the expected number of successes and failures should both be >5 for the approximation to be adequate. But what is a useful definition of "adequate"? The approximation is good for values near the mean, and terrible for values in the extreme tails. How much do those extreme values matter?

Thank you.
If you are doing a hypothesis test based on a p-value, then it matters a lot.
 
That is a good point. So in that case, how would you decide when the approximation is good enough?
 
Adeimantus said:
That is a good point. So in that case, how would you decide when the approximation is good enough?
It seems to me that it would be better to use the binomial distribution itself for the test in that case.
 
Okay, that makes sense. In cases where you would want to use the approximation, how do you quantify how good the approximation is?
 
Adeimantus said:
Okay, that makes sense. In cases where you would want to use the approximation, how do you quantify how good the approximation is?
If the decision about significance of the result is the same for the approximation and the actual distribution, then the approximation is good enough.
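(One way to check that criterion directly, as an R sketch; the values n = 30, p0 = 0.1, alpha = 0.05 and the one-sided alternative are illustrative assumptions, not values from the discussion.)

```r
# For every possible outcome x, compare the exact one-sided binomial test with
# the normal-approximation z-test and flag outcomes where the two lead to
# opposite decisions about significance at level alpha.
n <- 30; p0 <- 0.1; alpha <- 0.05
x <- 0:n

p_exact <- sapply(x, function(k) binom.test(k, n, p0, alternative = "greater")$p.value)

z <- (x - n * p0) / sqrt(n * p0 * (1 - p0))
p_norm <- 1 - pnorm(z)                     # one-sided z-test, no continuity correction

disagree <- (p_exact < alpha) != (p_norm < alpha)
x[disagree]                                # outcomes (if any) on which the conclusion flips
```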
 
tnich said:
If the decision about significance of the result is the same for the approximation and the actual distribution, then the approximation is good enough.
Good point. This implies that it is the accuracy of the cumulative distribution function at important and frequently used confidence values that really matters.
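(For instance, one can ask how much tail probability the binomial actually puts beyond the usual normal cutoffs; a sketch in R, with n = 25 and p = 0.4 as illustrative values.)

```r
# Nominal tail areas at the standard normal critical values vs. the exact
# binomial tail probability beyond those same cutoffs.
n <- 25; p <- 0.4
mu <- n * p; sigma <- sqrt(n * p * (1 - p))

for (alpha in c(0.05, 0.025, 0.01)) {
  cutoff <- qnorm(1 - alpha, mean = mu, sd = sigma)   # normal critical value
  exact_tail <- 1 - pbinom(floor(cutoff), n, p)       # exact P(X > cutoff)
  cat(sprintf("nominal %.3f   exact %.4f\n", alpha, exact_tail))
}
```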
 
tnich said:
If the decision about significance of the result is the same for the approximation and the actual distribution, then the approximation is good enough.

Okay, but it will never be exactly the same, right? So even there, don't you need to consider how big the range of outcomes is on which the two distributions lead to opposite conclusions? And then reason, I suppose, that if this range is small, the approximation is good enough. In other words, the normal distribution will lead you to the same conclusion as the binomial most of the time; you just have to specify how often is often enough.

Also, this might be a dumb question, but is hypothesis testing the main application of this limit theorem? Is that what Laplace and de Moivre were using it for, or were they more interested in the values of the density/mass function?
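(To put a number on "how often": continuing the same illustrative setup as the earlier sketch (n = 30, null p0 = 0.1, one-sided test at alpha = 0.05), one can compute the probability of landing on an outcome where the exact test and the normal approximation disagree, both under the null and under a few possible true values of p.)

```r
# Probability of observing an outcome on which the exact binomial test and the
# normal-approximation z-test reach opposite conclusions (illustrative setup).
n <- 30; p0 <- 0.1; alpha <- 0.05
x <- 0:n

p_exact  <- sapply(x, function(k) binom.test(k, n, p0, alternative = "greater")$p.value)
p_norm   <- 1 - pnorm((x - n * p0) / sqrt(n * p0 * (1 - p0)))
disagree <- (p_exact < alpha) != (p_norm < alpha)

sum(dbinom(x[disagree], n, p0))          # chance of a flipped conclusion under the null
sapply(c(0.1, 0.15, 0.2),                # ... and under a few possible true values of p
       function(p) sum(dbinom(x[disagree], n, p)))
```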
 
Adeimantus said:
Okay, but it will never be exactly the same, right?
Right. My point is, why would you want to use an approximation when you can easily calculate or look up exact values for the binomial distribution? Exact values are unimpeachable.

Adeimantus said:
Is that what Laplace and de Moivre were using it for, or were they more interested in the values of the density/mass function?
I think a normal distribution table would have been quite useful for computing approximate binomial probabilities by hand. Of course, de Moivre would have realized that at extreme values, where the approximation was not good, doing the computation with the actual binomial distribution would be fairly simple.
You could also use the normal approximation to calculate the expected value of a function of a binomial random variable as long as the function was not heavily weighted at the extremes, but the actual binomial distribution would not be any more difficult to use for that.
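(As a quick illustration of that last point, a sketch in R; the functions and parameter values below are made up for the example, not taken from the discussion.)

```r
# Expected value of a function of a binomial random variable: exact sum over
# the binomial pmf vs. the same sum with the pmf replaced by the
# continuity-corrected normal approximation.
n <- 40; p <- 0.25
mu <- n * p; sigma <- sqrt(n * p * (1 - p))
x <- 0:n

pmf_exact  <- dbinom(x, n, p)
pmf_approx <- pnorm(x + 0.5, mu, sigma) - pnorm(x - 0.5, mu, sigma)

g_mild    <- function(x) sqrt(x + 1)     # not heavily weighted at the extremes
g_extreme <- function(x) exp(x / 2)      # shifts the weight toward the upper tail

c(mild    = sum(g_mild(x)    * pmf_approx) / sum(g_mild(x)    * pmf_exact),
  extreme = sum(g_extreme(x) * pmf_approx) / sum(g_extreme(x) * pmf_exact))
# The first ratio is very close to 1; the second is noticeably off.
```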
 
I agree that since tables of binomial probabilities are readily available, it makes sense to use them, especially since they are now accessible with a single function call in R, for instance. But then I'm left wondering, why does anyone care about the limit theorem? Does it have a use? Perhaps it has more theoretical value than anything else.
 
Adeimantus said:
I agree that since tables of binomial probabilities are readily available, it makes sense to use them, especially since they are now accessible with a single function call in R, for instance. But then I'm left wondering, why does anyone care about the limit theorem? Does it have a use? Perhaps it has more theoretical value than anything else.

As is often the case, it depends. It's worth remarking that the sum of two independent binomial random variables isn't necessarily binomial (though it is if they share the same success probability), but the sum of two independent Gaussians is always Gaussian. If you are interested in bounding the error of a Gaussian approximation in general, look into Stein's method. Other reasons: in math, people tend to like the exponential function a lot more than binomial coefficients; the Gaussian is entirely characterized by its first two moments; among continuous distributions on ##(-\infty, \infty)## with a given finite variance it has maximum entropy; and so on.

The Gaussian is one of the most important distributions in probability. In general, the various central limit theorem proofs are not easy; it just so happens that the binomial → normal case is an easy one, so it has pedagogic value.
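(A side note, not from the posts above: even without Stein's method, the Berry-Esseen theorem gives an explicit, if conservative, bound on the error of the normal approximation to the standardized binomial,

$$\sup_x \left| P\!\left(\frac{X - np}{\sqrt{np(1-p)}} \le x\right) - \Phi(x) \right| \;\le\; \frac{C\left(p^2 + (1-p)^2\right)}{\sqrt{n\,p(1-p)}},$$

where ##X \sim \text{Binomial}(n, p)##, ##\Phi## is the standard normal CDF, and ##C## is an absolute constant known to be less than 1/2. This makes precise the sense in which the approximation improves as ##np(1-p)## grows, which is what the ##np > 5## and ##n(1-p) > 5## rule of thumb is gesturing at.)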
 
StoneTemplePython said:
As is often the case, it depends. It's worth remarking that the sum of two independent binomial random variables isn't necessarily binomial (though it is if they share the same success probability), but the sum of two independent Gaussians is always Gaussian. If you are interested in bounding the error of a Gaussian approximation in general, look into Stein's method. Other reasons: in math, people tend to like the exponential function a lot more than binomial coefficients; the Gaussian is entirely characterized by its first two moments; among continuous distributions on ##(-\infty, \infty)## with a given finite variance it has maximum entropy; and so on.

The Gaussian is one of the most important distributions in probability. In general, the various central limit theorem proofs are not easy; it just so happens that the binomial → normal case is an easy one, so it has pedagogic value.

Excellent points. Also, thank you for suggesting Stein's method. I looked it up on Wikipedia, and that may be exactly what I'm looking for!
 
