# Accuracy of the Normal Approximation to the Binomial Distribution

What is the preferred method of measuring how accurate the normal approximation to the binomial distribution is? I know the rule of thumb is that the expected numbers of successes and failures, ##np## and ##n(1-p)##, should both be greater than 5 for the approximation to be adequate. But what is a useful definition of "adequate"? The approximation is good for values near the mean and terrible for values in the extreme tails. How much do those extreme values matter?

Thank you.
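One common way to put a number on "adequate" is the worst-case CDF error: the largest gap, over all integers ##k##, between the exact ##P(X \le k)## and its normal approximation. The sketch below is a minimal illustration under stated assumptions: it uses the continuity-corrected approximation ##\Phi\big((k + \tfrac12 - np)/\sqrt{np(1-p)}\big)##, and the function names and test cases (all chosen with ##np = 5##, right at the rule-of-thumb boundary) are illustrative, not a standard.

```python
from math import comb, sqrt
from statistics import NormalDist


def binom_cdf_table(n, p):
    """Exact binomial CDF values P(X <= k) for k = 0, ..., n."""
    cdf, running = [], 0.0
    for k in range(n + 1):
        running += comb(n, k) * p**k * (1 - p) ** (n - k)
        cdf.append(running)
    return cdf


def max_cdf_error(n, p):
    """Worst absolute error in P(X <= k), over integers k, of the
    continuity-corrected normal approximation Phi((k + 0.5 - np) / sigma)."""
    approx = NormalDist(mu=n * p, sigma=sqrt(n * p * (1 - p)))
    cdf = binom_cdf_table(n, p)
    return max(abs(cdf[k] - approx.cdf(k + 0.5)) for k in range(n + 1))


if __name__ == "__main__":
    # Every case has np = 5, i.e. it sits right at the rule-of-thumb boundary.
    for n, p in [(10, 0.5), (50, 0.1), (100, 0.05), (500, 0.01)]:
        print(f"n={n:4d}  p={p:<5}  max CDF error = {max_cdf_error(n, p):.4f}")
```

The symmetric case (##p = 0.5##) comes out much better than the skewed ones with the same ##np##, which is one reason a single threshold on ##np## is only a rough guide.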

tnich
Homework Helper
If you are doing a hypothesis test (a p-value is a tail probability), then it matters a lot.
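To see why the tails matter for a p-value, compare relative rather than absolute errors. A minimal sketch, assuming a one-sided upper-tail probability ##P(X \ge k)## and the continuity-corrected normal approximation (function names are illustrative): even at ##n = 100##, ##p = 0.5##, far beyond the ##np > 5## rule, the approximation is within a few percent at ##k = 60## but off by more than an order of magnitude at ##k = 90##.

```python
from math import comb, erfc, sqrt


def upper_tail_exact(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), by direct summation of the pmf."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))


def upper_tail_normal(k, n, p):
    """Continuity-corrected normal approximation to P(X >= k).
    Uses erfc so that tiny tail probabilities are not lost to cancellation."""
    z = (k - 0.5 - n * p) / sqrt(n * p * (1 - p))
    return 0.5 * erfc(z / sqrt(2))


if __name__ == "__main__":
    n, p = 100, 0.5  # np = n(1 - p) = 50, well above the rule-of-thumb threshold
    for k in (60, 70, 80, 90):
        exact, approx = upper_tail_exact(k, n, p), upper_tail_normal(k, n, p)
        print(f"k={k}: exact={exact:.3e}  normal={approx:.3e}  ratio={approx / exact:.2f}")
```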

That is a good point. So in that case, how would you decide when the approximation is good enough?

tnich
Homework Helper
It seems to me that it would be better to use the exact binomial distribution for the test in that case.
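Computing the exact test takes only a few lines. A minimal sketch of a two-sided exact binomial p-value, assuming the common convention of summing the probabilities of all outcomes that are no more likely than the observed count under ##H_0: p = p_0## (other two-sided conventions exist, for example doubling the smaller one-sided tail). The function names and the `rel_tol` tie tolerance are illustrative choices.

```python
from math import comb


def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def exact_two_sided_p(x, n, p0, rel_tol=1e-7):
    """Two-sided exact p-value for x successes in n trials under H0: p = p0.
    Sums P(X = k) over every k that is no more likely than x; rel_tol keeps
    floating-point noise from splitting outcomes that are genuinely tied."""
    px = binom_pmf(x, n, p0)
    total = sum(
        pk
        for pk in (binom_pmf(k, n, p0) for k in range(n + 1))
        if pk <= px * (1 + rel_tol)
    )
    return min(1.0, total)


if __name__ == "__main__":
    print(exact_two_sided_p(8, 10, 0.5))  # 0.109375: outcomes 0, 1, 2, 8, 9, 10 (112/1024)
```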

Okay, that makes sense. In cases where you would want to use the approximation, how do you quantify how good the approximation is?

tnich
Homework Helper
If the decision about significance of the result is the same for the approximation and the actual distribution, then the approximation is good enough.
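That criterion is easy to check directly for a given observation. A minimal sketch, assuming a one-sided test of ##H_0: p = p_0## against ##p > p_0## that rejects when the upper-tail p-value is at most ##\alpha##; the helper names and the default ##\alpha = 0.05## are illustrative.

```python
from math import comb, erfc, sqrt


def p_exact(x, n, p0):
    """Exact one-sided p-value P(X >= x) under H0: p = p0."""
    return sum(comb(n, k) * p0**k * (1 - p0) ** (n - k) for k in range(x, n + 1))


def p_normal(x, n, p0):
    """Continuity-corrected normal approximation to P(X >= x)."""
    z = (x - 0.5 - n * p0) / sqrt(n * p0 * (1 - p0))
    return 0.5 * erfc(z / sqrt(2))


def same_decision(x, n, p0, alpha=0.05):
    """True when the exact and approximate tests both reject, or both fail to reject."""
    return (p_exact(x, n, p0) <= alpha) == (p_normal(x, n, p0) <= alpha)


if __name__ == "__main__":
    for x in (55, 58, 59, 60):
        print(x, round(p_exact(x, 100, 0.5), 4), round(p_normal(x, 100, 0.5), 4),
              same_decision(x, 100, 0.5))
```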

FactChecker
Gold Member
Good point. This implies that what really matters is the accuracy of the cumulative distribution function at the important, frequently used significance levels.

> If the decision about significance of the result is the same for the approximation and the actual distribution, then the approximation is good enough.

Okay, but it will never be exactly the same, right? So even there, don't you need to consider how big the range is where the two distributions lead to opposite conclusions? And then reason, I suppose, that if this range is small, the approximation is good enough. In other words, you will reach the same conclusion with the normal distribution as with the binomial most of the time; you just have to specify how often is often enough.
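One way to make "how often is often enough" concrete for a one-sided test at level ##\alpha##: find the exact and the normal-approximation critical values, then add up the null probability of the outcomes between them, which are exactly the outcomes where the two tests disagree. The same calculation gives the true type I error rate of the approximate test. A minimal sketch with illustrative names, assuming a test of ##H_0: p = p_0## against ##p > p_0##; the disagreement probability under an alternative would be computed the same way with a different ##p## in the sum.

```python
from math import comb, erfc, sqrt


def pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def p_exact(x, n, p0):
    """Exact upper-tail p-value P(X >= x) under H0."""
    return sum(pmf(k, n, p0) for k in range(x, n + 1))


def p_normal(x, n, p0):
    """Continuity-corrected normal approximation to P(X >= x)."""
    z = (x - 0.5 - n * p0) / sqrt(n * p0 * (1 - p0))
    return 0.5 * erfc(z / sqrt(2))


def critical_value(tail, n, p0, alpha):
    """Smallest x with tail(x) <= alpha; the test rejects H0 when X >= x.
    Returns n + 1 (never reject) if no such x exists."""
    return next((x for x in range(n + 1) if tail(x, n, p0) <= alpha), n + 1)


def disagreement(n, p0, alpha):
    """Exact and normal critical values, P(decisions differ | H0), and the
    true size P(normal test rejects | H0) of the approximate test."""
    k_ex = critical_value(p_exact, n, p0, alpha)
    k_nm = critical_value(p_normal, n, p0, alpha)
    lo, hi = sorted((k_ex, k_nm))
    differ = sum(pmf(k, n, p0) for k in range(lo, hi))
    size = sum(pmf(k, n, p0) for k in range(k_nm, n + 1))
    return k_ex, k_nm, differ, size


if __name__ == "__main__":
    for n, p0 in [(100, 0.5), (30, 0.1), (200, 0.03)]:
        for alpha in (0.05, 0.01):
            k_ex, k_nm, differ, size = disagreement(n, p0, alpha)
            print(f"n={n} p0={p0} alpha={alpha}: critical {k_ex} vs {k_nm}, "
                  f"P(disagree)={differ:.4f}, true size={size:.4f}")
```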

Also, this might be a dumb question, but is hypothesis testing the main application of this limit theorem? Is that what Laplace and de Moivre were using it for, or were they more interested in the values of the density/mass function?

tnich
Homework Helper
> Okay, but it will never be exactly the same, right?
Right. My point is, why would you want to use an approximation, when you can easily calculate or look up exact values for the binomial distribution? Exact values are unimpeachable.

> Is that what Laplace and de Moivre were using it for, or were they more interested in the values of the density/mass function?
I think a normal distribution table would have been quite useful for computing approximate binomial probabilities by hand. Of course, de Moivre would have realized that at extreme values, where the approximation was not good, doing the computation with the actual binomial distribution would be fairly simple, since a probability far out in the tail is dominated by just a few terms.
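For reference, the hand computation with a normal table typically looks like this. It is the continuity-corrected form of the de Moivre–Laplace approximation; the ##\pm\tfrac12## correction is a standard refinement rather than something stated in the thread. With ##X \sim \mathrm{Bin}(n, p)##, ##q = 1 - p## and integers ##a \le b##:

$$P(a \le X \le b) \approx \Phi\left(\frac{b + \tfrac{1}{2} - np}{\sqrt{npq}}\right) - \Phi\left(\frac{a - \tfrac{1}{2} - np}{\sqrt{npq}}\right),$$

and for a single value, the density form is

$$P(X = k) \approx \frac{1}{\sqrt{2\pi npq}} \exp\left(-\frac{(k - np)^2}{2npq}\right).$$

Both are accurate near ##k \approx np## and lose relative accuracy as ##|k - np|## grows, which is where summing the few dominant binomial terms directly is the better option.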
You could also use the normal approximation to calculate the expected value of a function of a binomial random variable as long as the function was not heavily weighted at the extremes, but the actual binomial distribution would not be any more difficult to use for that.
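A rough check of the expected-value remark, as a minimal sketch with illustrative names: replace each ##P(X = k)## by the normal probability of ##(k - \tfrac12, k + \tfrac12]## and compare with the exact ##E[f(X)]##. For ##n = 100##, ##p = 0.5##, the function ##f(x) = x^2## (exact value ##npq + (np)^2 = 2525##) barely notices the substitution, while ##f(x) = 3^x## (exact value ##(1 + 2p)^n = 2^{100}##) puts its weight far out in the upper tail and is overestimated substantially. The `interval_prob` helper works with ##\operatorname{erfc}## so that tiny tail probabilities are not lost to rounding.

```python
from math import comb, erfc, sqrt


def interval_prob(a, b, mu, sigma):
    """P(a < Y <= b) for Y ~ Normal(mu, sigma^2), computed from whichever tail
    avoids subtracting two numbers that are both close to 1."""
    za, zb = (a - mu) / sigma, (b - mu) / sigma
    if za >= 0:  # interval above the mean: difference of upper-tail probabilities
        return 0.5 * (erfc(za / sqrt(2)) - erfc(zb / sqrt(2)))
    return 0.5 * (erfc(-zb / sqrt(2)) - erfc(-za / sqrt(2)))  # Phi(zb) - Phi(za)


def expect_exact(f, n, p):
    """E[f(X)] for X ~ Binomial(n, p), summed over the exact pmf."""
    return sum(f(k) * comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1))


def expect_normal(f, n, p):
    """E[f(X)] with each P(X = k) replaced by the normal probability of (k - 1/2, k + 1/2]."""
    mu, sigma = n * p, sqrt(n * p * (1 - p))
    return sum(f(k) * interval_prob(k - 0.5, k + 0.5, mu, sigma) for k in range(n + 1))


if __name__ == "__main__":
    n, p = 100, 0.5
    for name, f in [("x^2", lambda x: x**2), ("3^x", lambda x: 3.0**x)]:
        exact, approx = expect_exact(f, n, p), expect_normal(f, n, p)
        print(f"E[{name}]: exact={exact:.6g}  normal={approx:.6g}  ratio={approx / exact:.4f}")
```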

I agree that since tables of binomial probabilities are readily available, it makes sense to use them, especially since they are now a single function call away in R (e.g. `pbinom`). But then I'm left wondering: why does anyone care about the limit theorem? Does it have a use? Perhaps it has more theoretical value than anything else.

StoneTemplePython
Gold Member

As is often the case, it depends. It's worth remarking that the sum of two independent binomial random variables isn't necessarily binomial (it is when both have the same success probability ##p##), but the sum of two independent Gaussians is always Gaussian. If you are interested in bounding the error of a Gaussian approximation in general, look into Stein's method. Other reasons: in math, people tend to like the exponential function a lot more than binomial coefficients; the Gaussian is entirely characterized by its first two moments; among continuous distributions on ##(-\infty, \infty)## with a given finite variance, it has maximum entropy; and so on.
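A quick numerical check of the first remark, as a minimal sketch with illustrative parameters ##X \sim \mathrm{Bin}(10, 0.2)## and ##Y \sim \mathrm{Bin}(10, 0.7)##: the pmf of ##X + Y## is obtained by convolution. Because ##X + Y## takes every value from 0 to 20 with positive probability, the only binomial candidate is ##\mathrm{Bin}(20, \bar p)## with ##\bar p## fixed by the mean, ##20\bar p = 9##. That candidate has variance ##20 \cdot 0.45 \cdot 0.55 = 4.95##, while ##\mathrm{Var}(X + Y) = 1.6 + 2.1 = 3.7##, so ##X + Y## is not binomial.

```python
from math import comb


def binom_pmf_list(n, p):
    """[P(X = 0), ..., P(X = n)] for X ~ Binomial(n, p)."""
    return [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]


def convolve(a, b):
    """pmf of X + Y for independent X, Y whose pmfs a, b live on 0, 1, 2, ..."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def mean_var(pmf):
    """Mean and variance of a pmf on 0, 1, 2, ..."""
    m = sum(k * q for k, q in enumerate(pmf))
    return m, sum((k - m) ** 2 * q for k, q in enumerate(pmf))


if __name__ == "__main__":
    total = convolve(binom_pmf_list(10, 0.2), binom_pmf_list(10, 0.7))
    N = len(total) - 1  # support is 0..20, so a binomial match would need n = 20
    m, v = mean_var(total)
    p_bar = m / N  # the only p giving Binomial(N, p) the same mean
    print(f"mean={m:.4f}  Var(X+Y)={v:.4f}  "
          f"Var of Binomial({N}, {p_bar:.2f})={N * p_bar * (1 - p_bar):.4f}")
```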

The Gaussian is one of the most important distributions in probability. In general, proofs of the various central limit theorems are not easy; it just so happens that the binomial-to-normal case (the de Moivre–Laplace theorem) is an easy one, so it has pedagogical value.
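To put a number on how quickly the binomial-to-normal limit takes hold, one classical quantitative result is the Berry–Esseen inequality. For Bernoulli summands, it bounds the Kolmogorov distance ##\sup_x |P(X \le x) - \Phi((x - np)/\sqrt{npq})|## by ##C\,(p^2 + q^2)/\sqrt{npq}##. Below is a minimal sketch comparing the actual distance (no continuity correction) with that bound. The constant ##C = 0.4748## is a published value for identically distributed summands and should be treated as an assumption here, as should the illustrative function names.

```python
from math import comb, sqrt
from statistics import NormalDist

C_BE = 0.4748  # assumed Berry-Esseen constant for i.i.d. summands


def kolmogorov_distance(n, p):
    """sup over x of |P(X <= x) - Phi((x - np) / sigma)| for X ~ Binomial(n, p),
    without continuity correction. The CDF is a step function and Phi is
    increasing, so the sup is reached at a jump k: compare Phi(z_k) with the
    CDF value just before the jump and just after it."""
    mu, sigma = n * p, sqrt(n * p * (1 - p))
    std = NormalDist()
    worst, before = 0.0, 0.0  # before = P(X <= k - 1)
    for k in range(n + 1):
        after = before + comb(n, k) * p**k * (1 - p) ** (n - k)  # P(X <= k)
        phi = std.cdf((k - mu) / sigma)
        worst = max(worst, abs(after - phi), abs(before - phi))
        before = after
    return worst


def berry_esseen_bound(n, p):
    """C * E|B - p|^3 / (sigma^3 sqrt(n)) for Bernoulli(p) summands B,
    which simplifies to C (p^2 + q^2) / sqrt(n p q)."""
    q = 1 - p
    return C_BE * (p * p + q * q) / sqrt(n * p * q)


if __name__ == "__main__":
    for n, p in [(10, 0.5), (100, 0.5), (50, 0.1), (500, 0.01)]:
        print(f"n={n} p={p}: distance={kolmogorov_distance(n, p):.4f}  "
              f"bound={berry_esseen_bound(n, p):.4f}")
```

The bound shrinks like ##1/\sqrt{n}## and grows as ##p## moves away from ##1/2##, which matches the intuition behind the ##np > 5## rule of thumb, though the bound itself is conservative.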