# Why are normal distributions so frequent?

1. Feb 23, 2014

### carllacan

Why are there so many physical processes which are described (with more or less accuracy) by a normal distribution?

2. Feb 23, 2014

### Filip Larsen

3. Feb 23, 2014

### D H

Staff Emeritus
A more nefarious reason: It's easy. The normal distribution is extremely amenable to analysis. People oftentimes use a normal distribution when they shouldn't. I myself have committed that statistical crime.

4. Feb 23, 2014

### PeroK

Something short of a crime, though. A misdemeanor, at most, surely!

5. Feb 23, 2014

### slakker

The worst crime I've seen is when companies use it to rank employee performance. Sorry for the Dilbert moment. :)

6. Feb 23, 2014

### carllacan

I'm reading up on the CLT and I'm getting more and more confused. I'm getting the idea that, according to it, every physical experiment would end up giving a normal distribution, but that is obviously false. Can someone clear my head?

7. Feb 23, 2014

### FactChecker

Not the experiment itself, but the average of many results of the experiment. Suppose you have an experiment whose results satisfy fairly general conditions (finite mean and variance, etc.). When you take an average of many results of that experiment, the average will not exactly equal the mean, but the difference between the average and the mean will be close to normally distributed. The more data you average, the closer you can expect the average to be to the mean.
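Here's a minimal sketch of that in Python. It assumes an exponential "experiment" (mean 1, variance 1), which is heavily skewed and clearly not normal; the sample size, number of repetitions and seed are arbitrary. Each average is rescaled as sqrt(n) * (average - mean) / sigma and the fraction of values within 1 and 2 standard deviations is compared with the normal figures.

```python
import random
import statistics

# Sketch, assuming an exponential "experiment" (mean 1, variance 1), which is
# strongly skewed and clearly not normal. n, trials and the seed are arbitrary.
random.seed(0)
MU, SIGMA = 1.0, 1.0

def sample_mean(n: int) -> float:
    """Average of n independent results of the exponential experiment."""
    return statistics.fmean([random.expovariate(1.0) for _ in range(n)])

def standardized_means(n: int, trials: int) -> list[float]:
    """Repeat the n-result average many times, rescaled as sqrt(n) * (avg - mu) / sigma."""
    return [(sample_mean(n) - MU) * n ** 0.5 / SIGMA for _ in range(trials)]

z = standardized_means(n=50, trials=20_000)

# A standard normal puts about 68.3% of its mass within +-1 and 95.4% within +-2.
within_1 = sum(abs(v) < 1 for v in z) / len(z)
within_2 = sum(abs(v) < 2 for v in z) / len(z)
print(f"mean {statistics.fmean(z):.3f}, stdev {statistics.stdev(z):.3f}")
print(f"fraction within 1: {within_1:.3f}, within 2: {within_2:.3f}")
```

For contrast, a single exponential result (n = 1) lands within 1 standard deviation of its mean with probability 1 - e^-2, about 0.86, far from the normal 0.683; it's only the averages that look normal.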

This also explains why so many random errors are assumed to be normally distributed. If the error of a result may be the summation of many, many unknown factors, but you know that on average the error has a certain mean and variance, then a normal distribution is a natural choice. Common exceptions are when you know that the errors are never negative (chi-squared, log-normal, etc.), when the error tends to be proportional to the expected result, or when you know something about the frequency content of the random errors (white noise, pink noise, brown noise, etc.).
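To illustrate the "error proportional to the result" exception, here's a sketch under made-up numbers: each measurement is a true value of 10 multiplied by 50 independent factors (1 + eps), with eps uniform in [-0.1, 0.1]. The errors then multiply rather than add, so the measurements are positive and skewed, while their logarithm is a sum of many small terms and comes out close to symmetric (roughly log-normal behaviour).

```python
import math
import random
import statistics

# Sketch with assumed numbers: 50 multiplicative error factors (1 + eps),
# eps uniform in [-0.1, 0.1], applied to a true value of 10.
random.seed(1)

def skewness(xs: list[float]) -> float:
    """Sample skewness: third central moment divided by the cubed population stdev."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    return statistics.fmean([(x - m) ** 3 for x in xs]) / s ** 3

true_value = 10.0
measurements = []
for _ in range(20_000):
    value = true_value
    for _ in range(50):
        value *= 1.0 + random.uniform(-0.1, 0.1)  # error scales with the value
    measurements.append(value)

logs = [math.log(v) for v in measurements]
print(f"skewness of measurements:      {skewness(measurements):.2f}")
print(f"skewness of log(measurements): {skewness(logs):.2f}")
```

A symmetric distribution such as the normal has skewness 0, so a clearly positive value for the raw measurements is a sign that a normal model would be a poor fit there.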

8. Feb 24, 2014

### homeomorphic

Another idea is that the normal distribution is the distribution that assumes the least information, given a specific mean and variance (maximum entropy).

Here's a silly example that shows where that could go wrong and also serves as a baby example of how it works.

Say we are arguing over the existence of God. Because we are in complete ignorance and have no information about it, we should assume that it's 50/50 odds or probability 0.5 there is a God, probability 0.5 there isn't.

So, this is clearly nonsense, but there's a kind of logic to it. We really shouldn't make any assumption at all about the probabilities, in the absence of any information. However, distributing the probability evenly between the two possibilities assumes the least. So, if we are forced to make assumptions, the way to minimize what we are assuming is to distribute the probabilities evenly.
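The "spread the probability evenly" step can be checked numerically for the two-outcome case. A small sketch (the grid step of 0.01 is arbitrary): the Shannon entropy of the distribution (p, 1 - p) is largest at p = 0.5.

```python
import math

def binary_entropy(p: float) -> float:
    """Shannon entropy (in bits) of a two-outcome distribution (p, 1 - p)."""
    if p in (0.0, 1.0):
        return 0.0  # a certain outcome carries no uncertainty
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

# Scan p = 0.00, 0.01, ..., 1.00 and find where the entropy peaks.
grid = [i / 100 for i in range(101)]
best_p = max(grid, key=binary_entropy)
print(best_p, binary_entropy(best_p))  # 0.5 1.0
```

Any lopsided choice of p has lower entropy, which is the sense in which it "assumes more".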

So, that's entropy maximization with no constraint other than the list of possible outcomes (giving the uniform distribution). If you specify the mean and variance, allow a continuous distribution, and do the analogous thing, you get the normal curve. One way to think of the central limit theorem is that repeated trials destroy information when they are averaged: the peculiarities of any one trial are averaged away and only the overall trend remains. (From this perspective, it's not clear that the maximum entropy, that is, the normal curve, is actually reached; this is one of the subtleties that have to be addressed when proving the CLT this way.)
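A quick way to see the continuous version is to compare closed-form differential entropies (in nats) of a few distributions that all have the same variance. The choice of comparison distributions is just for illustration; the entropy formulas in the comments are standard results.

```python
import math

# Differential entropy (nats) of several distributions, each scaled so its
# variance is sigma**2. The comparison set is an arbitrary choice.
sigma = 1.0

entropies = {
    # Normal: h = 0.5 * ln(2 * pi * e * sigma^2)
    "normal": 0.5 * math.log(2 * math.pi * math.e * sigma**2),
    # Uniform on an interval of width w: variance w^2 / 12, h = ln(w)
    "uniform": math.log(math.sqrt(12) * sigma),
    # Laplace with scale b: variance 2 * b^2, h = 1 + ln(2b)
    "laplace": 1 + math.log(2 * sigma / math.sqrt(2)),
    # Exponential with rate lam: variance 1 / lam^2, h = 1 - ln(lam)
    "exponential": 1 - math.log(1 / sigma),
}

# Print from highest to lowest entropy.
for name, h in sorted(entropies.items(), key=lambda kv: -kv[1]):
    print(f"{name:12s} {h:.4f}")
```

The normal comes out on top (about 1.419 nats for sigma = 1), ahead of the Laplace, uniform and exponential; no other density with that variance can beat it.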

A sneaky trick here is that you can artificially fix the mean and variance to be whatever you want, say 0 and 1, by a shift and a change of scale, even if the distributions were not originally that way, and this is exploited in the central limit theorem.
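In symbols, the rescaling is the usual textbook one (the names X_1, …, X_n, μ, σ and S_n are introduced here for illustration): for independent, identically distributed X_i with mean μ and variance σ²,

$$
S_n = X_1 + \cdots + X_n, \qquad \mathbb{E}[S_n] = n\mu, \qquad \operatorname{Var}(S_n) = n\sigma^2,
$$

$$
Z_n = \frac{S_n - n\mu}{\sigma\sqrt{n}}, \qquad \mathbb{E}[Z_n] = 0, \qquad \operatorname{Var}(Z_n) = 1 .
$$

So every Z_n already has mean 0 and variance 1, and the central limit theorem says that the distribution of Z_n approaches the standard normal N(0, 1) as n grows.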

What's the relation between probability and information?

Here's a video series that explains it.

Entropy maximization is a slightly broader reason than the central limit theorem.

From this perspective, it becomes clear that as soon as we know anything about the distribution more than the mean and variance, using the normal curve would CERTAINLY be a mistake because you have to take advantage of all the information you have for probability to work. The converse is not clear. If you only know the mean and variance and nothing else, it may or may not be a mistake to use the normal curve. The normal curve just minimizes the possible amount of mistake made, in some sense.

Last edited by a moderator: Sep 25, 2014
9. Feb 24, 2014

### carllacan

Very interesting. And why is it that the normal distribution has the most entropy?

10. Feb 24, 2014

### homeomorphic

At the moment, I haven't figured out a much better answer than "you do an ugly calculation and that's what happens." I have a few thoughts, but they'd probably be sort of incoherent without a lot of work on my part, so I think I will pass on sharing them.
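For what it's worth, one standard way to skip the calculus-of-variations calculation is Gibbs' inequality (relative entropy is never negative). A sketch, with p the normal density N(μ, σ²), q any density with the same mean μ and variance σ², and h the differential entropy:

$$
-\ln p(x) = \tfrac{1}{2}\ln(2\pi\sigma^2) + \frac{(x-\mu)^2}{2\sigma^2}
$$

$$
-\int q(x)\ln p(x)\,dx = \tfrac{1}{2}\ln(2\pi\sigma^2) + \frac{\sigma^2}{2\sigma^2} = \tfrac{1}{2}\ln(2\pi e\sigma^2) = h(p)
$$

$$
0 \le D(q\,\|\,p) = \int q(x)\ln\frac{q(x)}{p(x)}\,dx = -h(q) + h(p)
$$

So h(q) ≤ h(p), with equality only when q = p. The key point is that ln p is quadratic in x, so the integral of q ln p only "sees" the mean and variance of q.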

11. Feb 24, 2014

### homeomorphic

I realized a minor correction is needed here. If you get information that is consistent with the normal, then the normal still might be the right choice. It's new contradictory information that you need to worry about. So, you don't really need the maximum entropy principle to say that you should correct for new information because that's just true without much further thought. The maximum entropy principle just underscores it. Also, you might have other information that is equivalent to knowing the mean and variance.

12. Feb 24, 2014

### Choppy

Personally, I like the way that Taylor (An Introduction to Error Analysis) explains it.

Consider a quantity with a true value u that you want to measure.

If you have only a single source of error with magnitude E, and no bias in your measurements, you'll measure values of u + E and u - E with equal probability.

If you have 2 sources of error, each with magnitude E, and no bias in your measurements, you'll measure values of:

- u - 2E (0.25)
- u (0.50)
- u + 2E (0.25)

The quantities in parentheses are the probabilities.

Extend the argument now to N sources of error. Your possible measurements will have values between u - NE and u + NE, spaced 2E apart. Since each error is an independent binary outcome (+E or -E), the number of positive errors follows a binomial distribution, which lets you calculate the probability of any result between these two values.
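Here's a sketch of that counting in Python (the function and parameter names are made up for illustration): if k of the N errors are positive, the result is u + (2k - N)E, which happens with probability C(N, k) / 2^N. Using exact fractions makes it easy to check the two-source table.

```python
from fractions import Fraction
from math import comb

def measurement_distribution(n_sources: int, u: float = 0.0, e: float = 1.0) -> dict[float, Fraction]:
    """Exact distribution of u plus n_sources independent, equally likely +-e errors.

    With k errors positive and n_sources - k negative, the result is
    u + (2k - n_sources) * e, with probability C(n_sources, k) / 2**n_sources.
    """
    total = 2 ** n_sources
    return {
        u + (2 * k - n_sources) * e: Fraction(comb(n_sources, k), total)
        for k in range(n_sources + 1)
    }

print(measurement_distribution(2))
# {-2.0: Fraction(1, 4), 0.0: Fraction(1, 2), 2.0: Fraction(1, 4)}
```

With u = 0 and E = 1 this reproduces the 0.25 / 0.50 / 0.25 table; larger n_sources gives the full binomial shape.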

Then you just consider the limit as your number of sources of error N approaches infinity and the magnitude of your error E approaches zero (with E shrinking like 1/√N, so that the overall spread stays fixed). If you plot it out, you can see that you're approaching a normal distribution.
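A numerical version of that plot, as a sketch: it assumes E = 1/√N, so the total error always has standard deviation √N·E = 1. The binomial probabilities are divided by the 2E spacing so they behave like a density, and the largest gap from the standard normal density is reported.

```python
import math

def max_density_gap(n_sources: int) -> float:
    """Largest gap between the scaled binomial and the standard normal density.

    Assumes each of n_sources errors is +-E with E = 1/sqrt(n_sources), so the
    total error has variance n_sources * E**2 = 1. Possible totals are 2E apart,
    so probability / (2E) approximates a density.
    """
    e = 1 / math.sqrt(n_sources)
    gap = 0.0
    for k in range(n_sources + 1):
        x = (2 * k - n_sources) * e  # total error when k errors are positive
        prob = math.comb(n_sources, k) / 2 ** n_sources
        normal_pdf = math.exp(-x * x / 2) / math.sqrt(2 * math.pi)
        gap = max(gap, abs(prob / (2 * e) - normal_pdf))
    return gap

for n in (2, 10, 100, 1000):
    print(n, round(max_density_gap(n), 5))
```

The gap shrinks steadily as N grows, which is the binomial-to-normal limit in this symmetric case.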

So a normal distribution results from any situation that's subject to a large number of very small, random variations. In that sense, it's not surprising that the normal distribution is so common.

13. Feb 25, 2014

### Stephen Tashi

"Normal distribution" refers to a family of distributions and not all members of that family have the same entropy. Is isn't clear what you mean by "the" normal distribution.

14. Feb 25, 2014

### FactChecker

In homeomorphic's post, he is given a mean and variance. So the normal distribution is uniquely determined.

15. Mar 4, 2014

### stevmg

Is there a "proof" that the normal distribution is, in fact, the binomial distribution as n approaches infinity?

If so, that would explain a lot.

16. Mar 4, 2014

### Hornbein

The normal distribution results whenever a large number of small, independent factors are summed up. That's fairly common. It often shows up with measurements because the large factors have already been dealt with.

It is "robust" in that deviations from the assumptions often don't make that much difference.

In real life there is seldom an exact match to the normal. Usually the differences show up in the tails of the distributions. This is one reason that confidence intervals are usually set at 95%, as this avoids the tails.
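To put a number on "avoiding the tails", here's a small sketch using Python's standard `statistics.NormalDist` (the three confidence levels are just for illustration). It computes how many standard deviations a two-sided interval reaches: about 1.96 for 95%, 2.58 for 99% and 3.29 for 99.9%. The higher levels lean on the shape of the curve further out, where real data most often depart from the normal.

```python
from statistics import NormalDist

def two_sided_z(level: float) -> float:
    """Half-width, in standard deviations, of a central normal interval with this coverage."""
    # A two-sided interval leaves (1 - level) / 2 of the probability in each tail.
    return NormalDist().inv_cdf(1 - (1 - level) / 2)

for level in (0.95, 0.99, 0.999):
    z = two_sided_z(level)
    print(f"{level:.1%} interval: mean +- {z:.2f} sd, {1 - level:.1%} left in the tails")
```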

17. Mar 4, 2014

### homeomorphic

Of course there is. That's just a special case of what we've been talking about because a binomial distribution comes from a repeated experiment, involving 2 possible outcomes, usually called "success" and "failure".

There are three ways to prove it that I know of in the case of the binomial distribution. One (historically the first) involves approximating the binomial coefficients using Stirling's approximation. A shorter, non-rigorous version of this that doesn't involve Stirling's approximation is more insightful, and to my mind, Stirling's approximation should really be thought of as a consequence of the central limit theorem applied to i.i.d. Poisson random variables, since the algebraic proof of it is extremely ugly.
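The Poisson route to Stirling's approximation can be checked numerically. A sketch of the standard argument (not necessarily the exact one intended above): a sum S of n i.i.d. Poisson(1) variables is Poisson(n), so P(S = n) = e^(-n) n^n / n!. A local version of the CLT says this is close to the N(n, n) density at its mean, 1/√(2πn). Rearranging gives n! ≈ √(2πn) (n/e)^n. The code uses `math.lgamma` to stay in log space and avoid overflow.

```python
import math

def poisson_peak(n: int) -> float:
    """P(S = n) for S ~ Poisson(n), i.e. e^(-n) * n^n / n!, computed in log space."""
    return math.exp(-n + n * math.log(n) - math.lgamma(n + 1))

def normal_peak(n: int) -> float:
    """Density of N(n, n) at its mean: 1 / sqrt(2 * pi * n)."""
    return 1 / math.sqrt(2 * math.pi * n)

for n in (10, 100, 1000):
    # This ratio equals sqrt(2*pi*n) * (n/e)^n / n!, i.e. Stirling's formula over the true n!.
    print(n, poisson_peak(n) / normal_peak(n))
```

The ratio climbs toward 1 from below, with a relative error of roughly 1/(12n).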

The shorter, heuristic version is explained here:

http://stats.stackexchange.com/ques...nation-is-there-for-the-central-limit-theorem

There is also a proof using entropy, as I've mentioned, and finally, there's also a proof using characteristic functions, which are basically Fourier transforms.
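The characteristic-function proof is easiest to see in the coin-flip (binomial) case. A sketch, assuming fair ±1 flips X_i: one flip has E[exp(itX)] = cos(t), independence multiplies these, and dividing the sum by √n replaces t with t/√n. So the standardized sum has characteristic function cos(t/√n)^n, which approaches e^(-t²/2), the characteristic function of the standard normal.

```python
import math

def coin_sum_char_fn(t: float, n: int) -> float:
    """Characteristic function of (X_1 + ... + X_n) / sqrt(n), X_i = +-1 with prob 1/2.

    For a symmetric +-1 variable the characteristic function is real: cos(t).
    """
    return math.cos(t / math.sqrt(n)) ** n

t = 1.5  # arbitrary evaluation point
target = math.exp(-t * t / 2)  # characteristic function of N(0, 1) at t
for n in (1, 10, 100, 10_000):
    print(n, coin_sum_char_fn(t, n), target)
```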

The first proof I mentioned is only for the binomial, the other two prove the central limit theorem in general.