Accuracy of the Normal Approximation to Binomial

In summary, the common rule of thumb for the normal approximation to the binomial distribution is that the expected numbers of successes and failures should both be greater than 5. However, a useful definition of "adequate" is still needed to decide when the approximation is good enough. The approximation is good for values near the mean but poor for extreme values in the tails. For hypothesis testing, what matters most is the accuracy of the cumulative distribution function at commonly used confidence levels. Hypothesis testing is one application of this limit theorem, but it also has theoretical value. The Gaussian distribution is significant because it is fully characterized by its first two moments and, among continuous distributions on the real line with a given finite variance, has maximum entropy. The central limit theorem for the binomial-to-normal case has an easy proof, which gives it pedagogic value.
  • #1
Adeimantus
What is the preferred method of measuring how accurate the normal approximation to the binomial distribution is? I know that the rule of thumb is that the expected number of successes and failures should both be >5 for the approximation to be adequate. But what is a useful definition of "adequate"? The approximation is good for values near the mean, and terrible for values in the extreme tails. How much do those extreme values matter?

Thank you.
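
To make the "good near the mean, terrible in the tails" observation concrete, here is a minimal sketch that compares the exact binomial CDF with the normal approximation. The values n = 20 and p = 0.3, the continuity correction of 0.5, and the helper names are illustrative assumptions, not something specified in the question.

```python
import math

def binom_cdf(k, n, p):
    """Exact P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def normal_approx_cdf(k, n, p):
    """Normal approximation to P(X <= k), using a continuity correction of 0.5."""
    mu = n * p
    sigma = math.sqrt(n * p * (1 - p))
    z = (k + 0.5 - mu) / sigma
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def relative_error(k, n, p):
    """Relative error of the approximate CDF against the exact CDF at k."""
    exact = binom_cdf(k, n, p)
    return abs(normal_approx_cdf(k, n, p) - exact) / exact

n, p = 20, 0.3  # illustrative: np = 6 and n(1 - p) = 14, both above 5
for k in (6, 0):  # k = 6 is the mean, k = 0 is the extreme lower tail
    print(k, binom_cdf(k, n, p), normal_approx_cdf(k, n, p), relative_error(k, n, p))
```

With these values the relative error at the mean is a few percent, while at k = 0 the approximation overstates the tail probability several times over, even though the rule of thumb is satisfied.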
 
  • #2
Adeimantus said:
What is the preferred method of measuring how accurate the normal approximation to the binomial distribution is? I know that the rule of thumb is that the expected number of successes and failures should both be >5 for the approximation to be adequate. But what is a useful definition of "adequate"? The approximation is good for values near the mean, and terrible for values in the extreme tails. How much do those extreme values matter?

Thank you.
If you are doing a significance test (computing a p-value), then it matters a lot.
 
  • #3
That is a good point. So in that case, how would you decide when the approximation is good enough?
 
  • #4
Adeimantus said:
That is a good point. So in that case, how would you decide when the approximation is good enough?
It seems to me that it would be better to use the binomial distribution for the p-test in that case.
 
  • #5
Okay, that makes sense. In cases where you would want to use the approximation, how do you quantify how good the approximation is?
 
  • #6
Adeimantus said:
Okay, that makes sense. In cases where you would want to use the approximation, how do you quantify how good the approximation is?
If the decision about significance of the result is the same for the approximation and the actual distribution, then the approximation is good enough.
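
As a rough sketch of this criterion, the code below checks whether an upper-tail test reaches the same decision under the exact binomial distribution and under the normal approximation. The one-sided test, the continuity correction, and the function names are assumptions made for illustration.

```python
import math

def exact_upper_tail(k, n, p0):
    """Exact p-value P(X >= k) for X ~ Binomial(n, p0)."""
    return sum(math.comb(n, i) * p0**i * (1 - p0)**(n - i) for i in range(k, n + 1))

def normal_upper_tail(k, n, p0):
    """Normal approximation to P(X >= k), with a continuity correction of 0.5."""
    mu = n * p0
    sigma = math.sqrt(n * p0 * (1 - p0))
    z = (k - 0.5 - mu) / sigma
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def same_decision(k, n, p0, alpha):
    """True if both p-values fall on the same side of the significance level."""
    reject_exact = exact_upper_tail(k, n, p0) < alpha
    reject_normal = normal_upper_tail(k, n, p0) < alpha
    return reject_exact == reject_normal

print(same_decision(10, 20, 0.3, 0.05))   # both p-values are below 0.05
print(same_decision(11, 30, 0.2, 0.025))  # the two p-values straddle 0.025
```

In the second call the exact p-value is slightly above 0.025 while the approximate one is below it, so the two methods disagree for that particular outcome.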
 
  • #7
tnich said:
If the decision about significance of the result is the same for the approximation and the actual distribution, then the approximation is good enough.
Good point. This implies that it is the accuracy of the cumulative distribution function at important and frequently used confidence values that really matters.
 
  • #8
tnich said:
If the decision about significance of the result is the same for the approximation and the actual distribution, then the approximation is good enough.

Okay, but it will never be exactly the same, right? So even there, don't you need to consider how big the range is where the two distributions lead to opposite conclusions? If that range is small, I suppose, then the approximation is good enough. In other words, you will be led to the same conclusion with the normal distribution as with the binomial most of the time. You just have to specify how often is often enough.

Also, this might be a dumb question, but is hypothesis testing the main application of this limit theorem? Is that what Laplace and DeMoivre were using it for, or were they more interested in the values of the density/mass function?
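
One way to quantify "how big the range is" is to compare the critical values that the two distributions produce and to add up the probability, under the null hypothesis, of landing between them. The sketch below assumes a one-sided upper-tail test with a continuity correction; the function names are hypothetical.

```python
import math

def binom_pmf(k, n, p0):
    return math.comb(n, k) * p0**k * (1 - p0)**(n - k)

def exact_tail(k, n, p0):
    """Exact P(X >= k)."""
    return sum(binom_pmf(i, n, p0) for i in range(k, n + 1))

def normal_tail(k, n, p0):
    """Normal approximation to P(X >= k), continuity-corrected."""
    z = (k - 0.5 - n * p0) / math.sqrt(n * p0 * (1 - p0))
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def critical_value(tail, n, p0, alpha):
    """Smallest k whose upper-tail p-value is below alpha (n + 1 if there is none)."""
    for k in range(n + 1):
        if tail(k, n, p0) < alpha:
            return k
    return n + 1

def disagreement(n, p0, alpha):
    """Outcomes where the two tests decide differently, and their total null probability."""
    k_exact = critical_value(exact_tail, n, p0, alpha)
    k_normal = critical_value(normal_tail, n, p0, alpha)
    low, high = sorted((k_exact, k_normal))
    outcomes = list(range(low, high))
    mass = sum(binom_pmf(k, n, p0) for k in outcomes)
    return outcomes, mass

print(disagreement(30, 0.2, 0.025))
print(disagreement(20, 0.3, 0.05))
```

For n = 30, p0 = 0.2 and alpha = 0.025 the only disagreeing outcome is k = 11, which occurs with probability of roughly 1.6% under the null hypothesis; for n = 20, p0 = 0.3 and alpha = 0.05 the two critical values coincide. Whether a given disagreement probability is "often enough" remains a judgment call.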
 
  • #9
Adeimantus said:
Okay, but it will never be exactly the same, right?
Right. My point is, why would you want to use an approximation, when you can easily calculate or look up exact values for the binomial distribution? Exact values are unimpeachable.

Adeimantus said:
Is that what Laplace and DeMoivre were using it for, or were they more interested in the values of the density/mass function?
I think a normal distribution table would have been quite useful for computing approximate binomial probabilities by hand. Of course, DeMoivre would have realized that at extreme values where the approximation was not good, doing the computation using the actual binomial distribution would be fairly simple.
You could also use the normal approximation to calculate the expected value of a function of a binomial random variable as long as the function was not heavily weighted at the extremes, but the actual binomial distribution would not be any more difficult to use for that.
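
A small sketch of the expected-value remark, under assumptions of my own choosing: the normal weights are obtained by integrating the normal density over unit intervals centred on each integer, and the two test functions (the identity and 4^k) are only examples of a mild function and one heavily weighted at the upper extreme.

```python
import math

def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def exact_expectation(g, n, p):
    """E[g(X)] computed from the exact Binomial(n, p) probabilities."""
    return sum(g(k) * math.comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1))

def normal_expectation(g, n, p):
    """E[g(X)] with each binomial probability replaced by a normal interval probability."""
    mu = n * p
    sigma = math.sqrt(n * p * (1 - p))
    total = 0.0
    for k in range(n + 1):
        weight = normal_cdf((k + 0.5 - mu) / sigma) - normal_cdf((k - 0.5 - mu) / sigma)
        total += g(k) * weight
    return total

def relative_error(g, n, p):
    exact = exact_expectation(g, n, p)
    return abs(normal_expectation(g, n, p) - exact) / exact

n, p = 20, 0.3  # illustrative values
print(relative_error(lambda k: k, n, p))       # mild function: small error
print(relative_error(lambda k: 4.0**k, n, p))  # tail-heavy function: large error
```

The exact sum is no longer than the approximate one, which matches the point that the binomial distribution is just as easy to use here.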
 
  • #10
I agree that since tables of binomial probabilities are readily available, it makes sense to use them. Especially since they are now accessible with the call of a function in R, for instance. But then I'm left wondering, why does anyone care about the limit theorem? Does it have a use? Perhaps it has more theoretical value than anything else.
 
  • #11
Adeimantus said:
I agree that since tables of binomial probabilities are readily available, it makes sense to use them. Especially since they are now accessible with the call of a function in R, for instance. But then I'm left wondering, why does anyone care about the limit theorem? Does it have a use? Perhaps it has more theoretical value than anything else.

As is often the case, it depends. It's worth remarking that the sum of two independent binomial random variables isn't necessarily binomial, but the sum of two independent Gaussians is Gaussian. If you are interested in bounding the error of a Gaussian approximation in general, look into Stein's method. Other reasons: in math, people tend to like the exponential function a lot more than binomial coefficients, and the Gaussian is entirely characterized by its first two moments (and among continuous distributions on ##(-\infty, \infty)## with a given finite variance, it has maximum entropy), and so on.

The Gaussian is one of the most important distributions in probability. In general, the various central limit theorem proofs are not easy... it just so happens that the theorem for binomial -> normal is an easy one, so it has pedagogic value.
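
The remark about sums of binomials can be checked numerically. The sketch below convolves two binomial probability mass functions and measures how far the result is from the only binomial it could possibly equal (same support and same mean, assuming both success probabilities lie strictly between 0 and 1). The function names and parameter values are illustrative assumptions.

```python
import math

def binom_pmf(n, p):
    """List of P(X = k) for k = 0..n, X ~ Binomial(n, p)."""
    return [math.comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

def convolve(a, b):
    """PMF of the sum of two independent integer-valued variables."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def distance_from_binomial(n1, p1, n2, p2):
    """Largest pointwise gap between the sum's PMF and Binomial(n1 + n2, mean / (n1 + n2))."""
    total = convolve(binom_pmf(n1, p1), binom_pmf(n2, p2))
    n = n1 + n2
    mean = sum(k * q for k, q in enumerate(total))
    candidate = binom_pmf(n, mean / n)
    return max(abs(a - b) for a, b in zip(total, candidate))

print(distance_from_binomial(10, 0.3, 10, 0.3))  # equal p: the sum is Binomial(20, 0.3)
print(distance_from_binomial(10, 0.2, 10, 0.8))  # unequal p: the sum is not binomial
```

When the success probabilities match, the gap is at the level of floating-point rounding; when they differ, the sum has the right mean but a smaller variance than any binomial on the same support, and the gap is clearly nonzero.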
 
  • #12
StoneTemplePython said:
As is often the case, it depends. It's worth remarking that the sum of two independent binomial random variables isn't necessarily binomial, but the sum of two independent Gaussians is Gaussian. If you are interested in bounding the error of a Gaussian approximation in general, look into Stein's method. Other reasons: in math, people tend to like the exponential function a lot more than binomial coefficients, and the Gaussian is entirely characterized by its first two moments (and among continuous distributions on ##(-\infty, \infty)## with a given finite variance, it has maximum entropy), and so on.

The Gaussian is one of the most important distributions in probability. In general, the various central limit theorem proofs are not easy... it just so happens that the theorem for binomial -> normal is an easy one, so it has pedagogic value.

Excellent points. Also, thank you for suggesting Stein's method. I looked it up on Wikipedia, and that may be exactly what I'm looking for!
 

1. What is the normal approximation to binomial distribution?

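The normal approximation to the binomial distribution is a method for estimating binomial probabilities, that is, the probability of a given number of successes in a series of independent trials with a fixed success probability p. It replaces the binomial distribution for n trials with a normal (bell-shaped) distribution that has the same mean, np, and the same variance, np(1-p). It does not assume that the data themselves are normally distributed; it only allows for easier calculations when the two distributions are close.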

2. When is the normal approximation to binomial distribution appropriate?

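A common rule of thumb is that the expected numbers of successes and failures, np and n(1-p), should both be greater than 5. Another frequently quoted guideline asks for a large number of trials (typically more than 30) and a probability of success that is not too extreme (between 0.1 and 0.9). Both reflect the same fact: the normal distribution becomes a better fit as the number of trials grows and as the distribution becomes less skewed. Neither rule says how accurate the approximation is for a particular purpose.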

3. How accurate is the normal approximation to binomial distribution?

The accuracy of the normal approximation to binomial distribution depends on the sample size and the probability of success. Generally, the larger the sample size and the closer the probability of success is to 0.5, the more accurate the approximation will be. It is important to note that the approximation is not exact and there will always be some level of error.

4. What are some limitations of using the normal approximation to binomial distribution?

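One limitation is that the normal approximation may not be accurate when the probability of success is very small or very large. A second is that, even when it works well near the mean, it can be poor in the extreme tails, which matters for significance tests. In these cases, the binomial distribution should be used instead. Additionally, the binomial model itself, and therefore the approximation, assumes that the trials are independent, which may not always be the case in real-world situations.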

5. How can I determine if the normal approximation to binomial distribution is appropriate for my data?

To determine if the normal approximation to the binomial distribution is appropriate, you can check the number of trials and the probability of success. If the expected numbers of successes and failures are both greater than 5, or the number of trials is large (typically more than 30) and the probability of success is not too extreme (between 0.1 and 0.9), then the approximation may be appropriate. A more direct check is to compare the exact binomial probabilities with the approximate ones for the values you care about, especially in the tails, and to use the exact distribution whenever the two would lead to different conclusions.
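
The two rules of thumb mentioned on this page can be written down as simple checks. This is only a sketch: the function names and default thresholds are assumptions taken from the guidelines as quoted here, and the rules are screening heuristics rather than accuracy guarantees.

```python
def expected_counts_rule(n, p, threshold=5):
    """Expected successes and expected failures both exceed the threshold."""
    return n * p > threshold and n * (1 - p) > threshold

def sample_size_rule(n, p, min_n=30, p_low=0.1, p_high=0.9):
    """More than min_n trials and a success probability that is not too extreme."""
    return n > min_n and p_low <= p <= p_high

for n, p in [(20, 0.3), (40, 0.1), (100, 0.5)]:
    print(n, p, expected_counts_rule(n, p), sample_size_rule(n, p))
```

The first two cases show that the rules can disagree: n = 20, p = 0.3 passes the expected-counts rule but not the sample-size rule, and n = 40, p = 0.1 does the opposite. That is one more reason to compare against exact binomial probabilities when the decision matters.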
