Where does the normal distribution come from?

SUMMARY

The normal distribution defines a probability measure on the sets of the Borel σ-algebra on the real numbers. Its significance is underscored by the Central Limit Theorem (CLT), which generalizes the normal approximation of the binomial distribution and is foundational in probability theory alongside the Law of Large Numbers (LLN) and the Law of the Iterated Logarithm (LIL). The normal distribution can be derived through combinatorial approximation of the binomial, through convolution arguments, and even through physical reasoning (entropy maximization), with the Lindeberg CLT offering the most general framework with minimal assumptions. For a comprehensive understanding, refer to the de Moivre-Laplace Theorem and resources on convolution in probability.

PREREQUISITES
  • Understanding of Borel σ-algebra and measurable spaces
  • Familiarity with the Central Limit Theorem (CLT)
  • Knowledge of binomial distribution and its properties
  • Basic concepts of convolution in probability theory
NEXT STEPS
  • Study the de Moivre-Laplace Theorem for foundational insights into normal distribution
  • Explore the Lindeberg Central Limit Theorem for advanced applications
  • Learn about convolution and its role in deriving probability distributions
  • Investigate entropy maximization and its connection to the normal distribution
USEFUL FOR

Statisticians, mathematicians, data scientists, and anyone interested in the theoretical foundations of probability distributions and their applications in statistical analysis.

V0ODO0CH1LD
Okay, so I guess my first question is whether the main utility of the normal distribution ##f(x)## is to provide a probability measure on the measurable space ##(\mathbb{R},\mathcal{B})## (where ##\mathcal{B}## is the Borel σ-algebra on the real numbers) by defining the measure as
$$P(A)=\int_A f(x)\,dx \qquad \forall A\in\mathcal{B}.$$
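(For concreteness, here is a quick numerical sketch of that measure for the standard normal, just approximating the integral with a midpoint Riemann sum; nothing here is canonical, it only illustrates the definition:)

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of the normal distribution N(mu, sigma^2)."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def prob(a, b, mu=0.0, sigma=1.0, n=10_000):
    """P([a, b]) approximated by a midpoint Riemann sum of the density."""
    h = (b - a) / n
    return h * sum(normal_pdf(a + (i + 0.5) * h, mu, sigma) for i in range(n))

# P([-1, 1]) under the standard normal is about 0.6827
p = prob(-1.0, 1.0)
```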

Then my second question is: where does the normal distribution come from? What is the derivation of it that uses the least amount of assumptions (i.e. assume I am NOT throwing darts at a target)? Can you derive it from combinatorics alone? Or are there some assumptions that have to be made?

Thanks
 
The normal distribution is probably the most important probability distribution there is. Its importance comes mainly from the celebrated central limit theorem (CLT), one of the fundamental limit theorems of probability theory, together with the Law of Large Numbers (LLN) and the Law of the Iterated Logarithm (LIL).

The CLT was first known as the de Moivre-Laplace Theorem, which approximates the binomial distribution by the normal distribution. A proof can be found on Wikipedia: http://en.wikipedia.org/wiki/De_Moivre–Laplace_theorem. I think the proof on that page is the closest you can get to answering your question.

It was later discovered that not only can the binomial distribution be approximated by the normal distribution, but that something far more general holds; this is the CLT.
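The de Moivre-Laplace approximation is easy to check numerically; here is a rough sketch (my own illustration, not from the Wikipedia proof) comparing the binomial pmf for a fair coin with the normal density of matching mean ##np## and variance ##np(1-p)##:

```python
import math

def binom_pmf(n, k, p):
    """Exact binomial probability P(X = k) for X ~ Bin(n, p)."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def normal_pdf(x, mu, sigma):
    return math.exp(-((x - mu)**2) / (2 * sigma**2)) / (sigma * math.sqrt(2 * math.pi))

n, p = 1000, 0.5
mu, sigma = n * p, math.sqrt(n * p * (1 - p))

# largest pointwise gap between the binomial pmf and the normal density
# across the bulk of the distribution (within about 3 standard deviations)
err = max(abs(binom_pmf(n, k, p) - normal_pdf(k, mu, sigma))
          for k in range(450, 551))
```

For ##n = 1000## the two agree to several decimal places, which is the content of the local de Moivre-Laplace theorem.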
 
Well, you wouldn't think it would be purely combinatorial if the result is a continuous distribution.

If you want to know where the normal distribution comes from, intuitively speaking, my answer is that it's like an exponentiated version of the old summation formula ##1 + 2 + 3 + \dots + n = n(n+1)/2##.

Notice that if ##n## is large, this is asymptotically just ##n^2/2##, so you can start to see what I mean by an exponentiated version of it.

The idea is to find the peak of the binomial probability mass function and then do some kind of approximation around that point, using the relationship between binomial coefficients (which can be seen combinatorially). For an explanation of how this works, along with other nice stuff about the CLT, see here:

http://stats.stackexchange.com/ques...nation-is-there-for-the-central-limit-theorem
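That peak-approximation idea can be sketched numerically. For ##p = 1/2## the binomial pmf is proportional to the binomial coefficients, and the ratio of ##\binom{n}{m+j}## to the peak value ##\binom{n}{m}## at ##m = n/2## is approximately ##e^{-2j^2/n}##, which is exactly a Gaussian in ##j## with variance ##n/4##. (This is my own numerical check of the idea, not the argument from the linked post.)

```python
import math

n = 1000       # number of fair-coin tosses
m = n // 2     # peak of the binomial pmf

def ratio(j):
    """Exact ratio C(n, m+j) / C(n, m) of binomial coefficients."""
    return math.comb(n, m + j) / math.comb(n, m)

# quadratic (Gaussian) approximation around the peak: exp(-2 j^2 / n)
worst = max(abs(ratio(j) - math.exp(-2 * j**2 / n)) for j in range(31))
```

The agreement within a few standard deviations of the peak is already very close for ##n = 1000##, and the summation formula above is precisely what turns the product of coefficient ratios into the quadratic exponent.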

When I first encountered the CLT, I thought there should be some sort of graphical proof, but, as the post points out, the exact shape of the curve seems mysterious without at least a small amount of calculation. However, if you have a good intuition for convolution (when adding two independent random variables, the density of the sum is the convolution of their densities), such as the intuition I gained from a signal processing class, you can actually picture something like the normal distribution forming through purely visual reasoning, although, again, the exact shape of the curve remains mysterious. A nice feature of this argument is that it's vaguely suggestive of the rigorous proof, which runs along similar lines using convolution, but in the frequency domain (characteristic functions/Fourier transforms).
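Here is a rough numerical version of the convolution picture, starting from a discrete uniform pmf (any starting distribution with finite variance would do): repeatedly convolving it with itself produces a bell shape very close to the Gaussian with matching mean and variance.

```python
import math

def convolve(f, g):
    """Discrete convolution: pmf of the sum of two independent variables."""
    out = [0.0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] += a * b
    return out

# start from a uniform pmf on {0, ..., 9} and take the sum of 8 copies
base = [0.1] * 10
dist = base
for _ in range(7):
    dist = convolve(dist, base)

# Gaussian with the same mean and variance as the 8-fold sum
n_copies = 8
mean = n_copies * 4.5
var = n_copies * (sum(k * k for k in range(10)) / 10 - 4.5**2)
gauss = [math.exp(-(k - mean)**2 / (2 * var)) / math.sqrt(2 * math.pi * var)
         for k in range(len(dist))]
worst = max(abs(a - b) for a, b in zip(dist, gauss))
```

Already after 8 convolutions the pointwise gap is tiny, even though nothing Gaussian was put in by hand.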

It also seems possible to arrive at the normal distribution through some physical reasoning, along with a little optimization problem (entropy maximization). This isn't at the forefront of my consciousness, so it would take me a while to dig it back up (well, to be honest, I never fully figured it out in the first place), so I'll point you to where I got this idea (Susskind lectures in statistical mechanics):



You might have to be sort of astute to catch what I'm talking about, though.
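For reference, the entropy-maximization argument can be sketched as a standard constrained optimization (this is the textbook calculation, not a transcription of the lecture): among all densities with fixed mean ##\mu## and variance ##\sigma^2##, maximize the differential entropy
$$H[f] = -\int f(x)\ln f(x)\,dx$$
subject to ##\int f = 1##, ##\int x f = \mu##, and ##\int x^2 f = \sigma^2 + \mu^2##. Setting the variation of the Lagrangian to zero,
$$\frac{\partial}{\partial f}\Big[-f\ln f + \lambda_0 f + \lambda_1 x f + \lambda_2 x^2 f\Big] = -\ln f - 1 + \lambda_0 + \lambda_1 x + \lambda_2 x^2 = 0,$$
gives ##f(x) = \exp(\lambda_0 - 1 + \lambda_1 x + \lambda_2 x^2)##, an exponential of a quadratic. Choosing the multipliers to satisfy the three constraints yields exactly the normal density ##f(x) = \frac{1}{\sigma\sqrt{2\pi}}\,e^{-(x-\mu)^2/2\sigma^2}##.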

If you want the most general version, with the fewest assumptions (notably, the identically distributed hypothesis can be weakened), look up the Lindeberg CLT. It's not a simple proof, though.
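To see why dropping the identically distributed hypothesis still works, here is a small simulation (my own toy example, not from any proof of the Lindeberg CLT): the summands below are independent uniforms with growing, unequal variances, yet the standardized sum still looks normal. The Lindeberg condition holds here because each ##|X_i| \le \sqrt{n}## while ##s_n## grows like ##n##.

```python
import math
import random

random.seed(0)

def standardized_sum(n):
    """Sum of independent but NOT identically distributed uniforms,
    X_i ~ Uniform[-sqrt(i), sqrt(i)] with Var(X_i) = i/3,
    divided by s_n = sqrt(sum of variances)."""
    s2 = sum(i / 3 for i in range(1, n + 1))
    total = sum(random.uniform(-math.sqrt(i), math.sqrt(i))
                for i in range(1, n + 1))
    return total / math.sqrt(s2)

samples = [standardized_sum(200) for _ in range(2000)]
# fraction within one standard deviation; for a standard normal
# this should be near Phi(1) - Phi(-1), about 0.683
frac = sum(1 for z in samples if abs(z) <= 1) / len(samples)
```

This is only evidence, of course; the actual Lindeberg proof controls exactly how much any single summand can contribute to the total variance.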
 
