Accuracy of the Normal Approximation to Binomial

Adeimantus · Sep 5, 2018

What is the preferred method of measuring how accurate the normal approximation to the binomial distribution is? I know that the rule of thumb is that the expected number of successes and failures should both be >5 for the approximation to be adequate. But what is a useful definition of "adequate"? The approximation is good for values near the mean, and terrible for values in the extreme tails. How much do those extreme values matter?

Thank you.

tnich · Sep 5, 2018

If you are doing a significance test, then it matters a lot, since a p-value is a tail probability.

Adeimantus · Sep 5, 2018

That is a good point. So in that case, how would you decide when the approximation is good enough?

tnich · Sep 5, 2018

It seems to me that it would be better to use the exact binomial distribution for the test in that case.

Adeimantus · Sep 5, 2018

Okay, that makes sense. In cases where you would want to use the approximation, how do you quantify how good the approximation is?

tnich · Sep 5, 2018

If the decision about significance of the result is the same for the approximation and the actual distribution, then the approximation is good enough.

FactChecker · Sep 5, 2018

Good point. This implies that what really matters is the accuracy of the cumulative distribution function at the commonly used significance levels.
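One way to put numbers on this is to compare the exact binomial CDF with its normal approximation directly. The sketch below is only an illustration and not something posted in the thread: the function names are made up for the example, the continuity correction (evaluating the normal CDF at k + 0.5) is an assumption about how the approximation is applied, and the maximum is taken over integer k only.

```python
import math

def binom_pmf(k, n, p):
    """Exact P(X = k) for X ~ Binomial(n, p)."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def normal_cdf(x, mu, sigma):
    """P(Y <= x) for Y ~ Normal(mu, sigma^2)."""
    return 0.5 * math.erfc(-(x - mu) / (sigma * math.sqrt(2)))

def normal_sf(x, mu, sigma):
    """Upper tail P(Y > x); erfc keeps precision far out in the tail."""
    return 0.5 * math.erfc((x - mu) / (sigma * math.sqrt(2)))

def max_cdf_error(n, p, continuity=True):
    """Largest |P(X <= k) - normal approximation| over k = 0..n."""
    mu, sigma = n * p, math.sqrt(n * p * (1 - p))
    shift = 0.5 if continuity else 0.0  # assumed continuity correction
    cum, worst = 0.0, 0.0
    for k in range(n + 1):
        cum += binom_pmf(k, n, p)
        worst = max(worst, abs(cum - normal_cdf(k + shift, mu, sigma)))
    return worst

def upper_tail_ratio(k, n, p):
    """Normal-approximate P(X >= k) divided by the exact P(X >= k)."""
    mu, sigma = n * p, math.sqrt(n * p * (1 - p))
    exact = sum(binom_pmf(i, n, p) for i in range(k, n + 1))
    return normal_sf(k - 0.5, mu, sigma) / exact

if __name__ == "__main__":
    for n, p in [(20, 0.5), (20, 0.1), (100, 0.1)]:
        print(n, p, max_cdf_error(n, p), upper_tail_ratio(n, n, p))
```

The absolute error (`max_cdf_error`) answers "how far off is any cumulative probability", while the ratio in the tail (`upper_tail_ratio`) shows why a small absolute error can still mean a p-value that is off by a large factor.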

Adeimantus · Sep 6, 2018

tnich said:

If the decision about significance of the result is the same for the approximation and the actual distribution, then the approximation is good enough.

Okay, but it will never be exactly the same, right? So even there, don't you need to consider how big the range is where the two distributions lead to opposite conclusions? Then you could reason, I suppose, that if this range is small, the approximation is good enough. In other words, you will be led to the same conclusion with the normal distribution as with the binomial most of the time. You just have to specify how often is often enough.
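The "range where the two distributions lead to opposite conclusions" can be enumerated directly. The following is a minimal sketch under stated assumptions, none of which come from the thread: a one-sided test of H0: p = p0 against p > p0, rejection when the p-value is at most alpha, a continuity-corrected normal tail, and a hypothetical helper name `disagreement`.

```python
import math

def binom_pmf(k, n, p):
    """Exact P(X = k) for X ~ Binomial(n, p)."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def exact_p_value(k, n, p0):
    """Exact one-sided p-value P(X >= k) under H0: p = p0."""
    return sum(binom_pmf(i, n, p0) for i in range(k, n + 1))

def normal_p_value(k, n, p0):
    """Normal approximation to P(X >= k), with continuity correction."""
    mu, sigma = n * p0, math.sqrt(n * p0 * (1 - p0))
    return 0.5 * math.erfc((k - 0.5 - mu) / (sigma * math.sqrt(2)))

def disagreement(n, p0, alpha=0.05):
    """Outcomes k where the two tests decide differently, and the
    probability under H0 of observing one of those outcomes."""
    ks, prob = [], 0.0
    for k in range(n + 1):
        exact_rejects = exact_p_value(k, n, p0) <= alpha
        approx_rejects = normal_p_value(k, n, p0) <= alpha
        if exact_rejects != approx_rejects:
            ks.append(k)
            prob += binom_pmf(k, n, p0)
    return ks, prob

if __name__ == "__main__":
    for n, p0, alpha in [(20, 0.5, 0.05), (10, 0.1, 0.01)]:
        print(n, p0, alpha, disagreement(n, p0, alpha))
```

The returned probability is one candidate for "how often is often enough": it is the chance, when H0 is true, of landing on an outcome where the approximate and exact tests disagree.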

Also, this might be a dumb question, but is hypothesis testing the main application of this limit theorem? Is that what Laplace and de Moivre were using it for, or were they more interested in the values of the density/mass function?

tnich · Sep 6, 2018

Adeimantus said:

Okay, but it will never be exactly the same, right?

Right. My point is, why would you want to use an approximation, when you can easily calculate or look up exact values for the binomial distribution? Exact values are unimpeachable.

Adeimantus said:

Is that what Laplace and de Moivre were using it for, or were they more interested in the values of the density/mass function?

I think a normal distribution table would have been quite useful for computing approximate binomial probabilities by hand. Of course, de Moivre would have realized that at extreme values, where the approximation is not good, doing the computation with the actual binomial distribution is fairly simple, since only a few terms contribute.

You could also use the normal approximation to calculate the expected value of a function of a binomial random variable, as long as the function is not heavily weighted at the extremes, but the actual binomial distribution would not be any more difficult to use for that.
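The point about expectations can be checked numerically. This is a rough sketch rather than anyone's stated method: it assumes the normal approximation is applied by giving each integer k the normal probability of the interval (k - 0.5, k + 0.5), and the function names are invented for the example.

```python
import math

def exact_expectation(g, n, p):
    """E[g(X)] for X ~ Binomial(n, p), summed over the exact pmf."""
    return sum(g(k) * math.comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(n + 1))

def normal_expectation(g, n, p):
    """E[g(X)] with each pmf value replaced by the normal probability
    of the interval (k - 0.5, k + 0.5)."""
    mu, sigma = n * p, math.sqrt(n * p * (1 - p))

    def cdf(x):
        return 0.5 * math.erfc(-(x - mu) / (sigma * math.sqrt(2)))

    return sum(g(k) * (cdf(k + 0.5) - cdf(k - 0.5)) for k in range(n + 1))

if __name__ == "__main__":
    n, p = 20, 0.5
    # A function spread over the bulk of the distribution.
    bulk = lambda k: k
    # A function concentrated entirely on the extreme outcome k = n.
    extreme = lambda k: 1.0 if k == n else 0.0
    for name, g in [("bulk", bulk), ("extreme", extreme)]:
        print(name, exact_expectation(g, n, p), normal_expectation(g, n, p))
```

Comparing the two printed pairs shows the contrast described above: the approximation tracks the exact value closely for a function weighted toward the middle, and much less closely for one weighted at the extreme.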

Adeimantus · Sep 6, 2018

I agree that since tables of binomial probabilities are readily available, it makes sense to use them, especially now that they are accessible with a single function call in R, for instance. But then I'm left wondering, why does anyone care about the limit theorem? Does it have a use? Perhaps it has more theoretical value than anything else.

StoneTemplePython · Sep 6, 2018

As is often the case, it depends. It's worth remarking that the sum of two independent binomial random variables isn't necessarily binomial (it is only when they share the same success probability), but the sum of two independent Gaussians is always Gaussian. If you are interested in bounding the error of a Gaussian approximation in general, look into Stein's method. Other reasons: in math, people tend to like the exponential function a lot more than binomial coefficients; the Gaussian is entirely characterized by its first two moments; among continuous distributions on ##(-\infty, \infty)## with a given finite variance, it has maximum entropy; and so on.

The Gaussian is one of the most important distributions in probability. In general, the various central limit theorem proofs are not easy. It just so happens that the theorem for binomial -> normal is an easy one, so it has pedagogic value.
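As a complement to the pointer to Stein's method, and not something stated in the thread: the classical Berry-Esseen theorem gives an explicit, if conservative, bound in the binomial case. Writing ##q = 1 - p##, ##S_n \sim \text{Binomial}(n, p)## as a sum of i.i.d. Bernoulli variables ##X_i##, and ##\Phi## for the standard normal CDF, the third absolute central moment is ##\rho = E|X_1 - p|^3 = pq(p^2 + q^2)## and the variance is ##\sigma^2 = pq##, so

$$\sup_x \left| P\!\left(\frac{S_n - np}{\sqrt{npq}} \le x\right) - \Phi(x) \right| \le \frac{C\,\rho}{\sigma^3 \sqrt{n}} = \frac{C\,(p^2 + q^2)}{\sqrt{npq}}$$

where ##C## is an absolute constant (published estimates put it a little below 0.5; treat the exact value as an assumption here). For example, with ##n = 100## and ##p = 0.5## the right-hand side is ##C \cdot 0.5 / 5 = 0.1\,C##. The bound applies to the approximation without a continuity correction, and it grows as ##p## approaches 0 or 1, which matches the intuition behind the rule of thumb on expected successes and failures.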

Adeimantus · Sep 10, 2018

Excellent points. Also, thank you for suggesting Stein's method. I looked it up on Wikipedia, and that may be exactly what I'm looking for!
