Where does the normal distribution come from?

In summary, the normal distribution is a central probability distribution in statistics because of its connection to the central limit theorem (CLT). It can be reached in several ways: via the de Moivre-Laplace theorem, as a limit of the binomial distribution, or via convolution arguments carried out in the frequency domain with characteristic functions. There is no purely combinatorial derivation of a continuous distribution, but the bell shape can be understood as an exponentiated version of a simple summation formula applied near the peak of the binomial coefficients. The Lindeberg CLT gives a more general version of the theorem under weaker assumptions.
  • #1
V0ODO0CH1LD
Okay, so I guess my first question is whether the main utility of the normal density ##f(x)## is to provide a probability measure on the measurable space ##(\mathbb{R},\mathcal{B})## (where ##\mathcal{B}## is the Borel σ-algebra on the real numbers) by defining the measure as
[tex] P(A)=\int_Af(x)\,dx\;\;\; \forall{}A\in\mathcal{B}. [/tex]
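For concreteness, here is a small numerical sketch of that measure (my own illustration, assuming the standard normal density and an arbitrarily chosen interval ##A=[a,b]##; nothing here is specific to the question):

[code]
# Sketch: P(A) = integral of the standard normal density f over A = [a, b].
# The standard normal and the interval endpoints are illustrative assumptions.
from scipy import stats, integrate

f = stats.norm(loc=0.0, scale=1.0).pdf          # standard normal density f(x)
a, b = -1.0, 2.0                                # an example Borel set A = [a, b]

p_quad, _ = integrate.quad(f, a, b)             # numerical integration of f over A
p_cdf = stats.norm.cdf(b) - stats.norm.cdf(a)   # same probability via the CDF

print(p_quad, p_cdf)                            # both ~0.8186
[/code]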

Then my second question is: where does the normal distribution come from? What derivation of it uses the fewest assumptions (i.e. assume I am NOT throwing darts at a target)? Can you derive it from combinatorics alone, or are there some assumptions that have to be made?

Thanks
 
  • #2
The normal distribution is probably the most important probability distribution there is. Its main importance comes from the celebrated central limit theorem (CLT). It is one of the important theorems of probability theory (together with the Law of Large Numbers (LLN) and the Law of the Iterated Logarithm (LIL)).

The earliest form of the CLT is the de Moivre-Laplace theorem, which approximates the binomial distribution by the normal distribution. A proof can be found on Wikipedia (http://en.wikipedia.org/wiki/De_Moivre–Laplace_theorem); I think the proof on that page is the closest you can get to answering your question.

It was later found that not only can the binomial distribution be approximated by the normal distribution, but that something far more general holds; this is the CLT.
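As a quick numerical illustration of the de Moivre-Laplace approximation (a sketch with arbitrarily chosen n and p, not part of the theorem itself):

[code]
# Sketch: de Moivre-Laplace -- Binomial(n, p) pmf vs. the N(np, np(1-p)) density.
# The values n = 100 and p = 0.3 are arbitrary examples.
import numpy as np
from scipy import stats

n, p = 100, 0.3
mu, sigma = n * p, np.sqrt(n * p * (1 - p))

k = np.arange(0, n + 1)
binom_pmf = stats.binom.pmf(k, n, p)
normal_approx = stats.norm.pdf(k, loc=mu, scale=sigma)

# largest pointwise error of the approximation over all k; it shrinks as n grows
print(np.max(np.abs(binom_pmf - normal_approx)))
[/code]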
 
  • #3
Well, you wouldn't think it would be purely combinatorial if the result is a continuous distribution.

If you want to know where the normal distribution comes from, intuitively speaking, my answer is that it's like an exponentiated version of the old summation formula, ##1 + 2 + 3 + \cdots + n = n(n+1)/2##.

Notice that if ##n## is large, this is approximately ##n^2/2##, so you can start to see what I mean by an exponentiated version of this.

The idea is to find the peak of the binomial probability mass function and then do some kind of approximation around that point, using the relationship between binomial coefficients (which can be seen combinatorially). For an explanation of how this works, along with other nice stuff about the CLT, see here:

http://stats.stackexchange.com/ques...nation-is-there-for-the-central-limit-theorem
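To make the "exponentiated summation formula" idea concrete, here is a standard back-of-the-envelope version of that peak approximation (my own sketch, for ##p=1/2## and even ##n##, using ##\ln(1+x)\approx x## on the ratio of successive binomial coefficients):

[tex]
\ln\frac{\binom{n}{\,n/2+j\,}}{\binom{n}{\,n/2\,}}
= \sum_{i=1}^{j}\ln\frac{n/2-i+1}{n/2+i}
\approx -\frac{1}{n}\sum_{i=1}^{j}(4i-2)
= -\frac{2j^2}{n},
\qquad\text{so}\qquad
\binom{n}{\,n/2+j\,}\approx\binom{n}{\,n/2\,}\,e^{-2j^2/n}.
[/tex]

The summation formula ##1+2+\cdots+j = j(j+1)/2## is exactly what turns the sum of nearly linear terms into a quadratic, and exponentiating the quadratic gives the bell shape; the exponent ##-2j^2/n## is ##-j^2/(2\sigma^2)## with ##\sigma^2 = n/4##, the variance of a Binomial(##n##, 1/2).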

When I first encountered the CLT, I thought there should be some sort of graphical proof, but, as the post points out, the exact shape of the curve seems mysterious without at least a small amount of calculation. However, if you have a good intuition for convolution (when adding two independent random variables, the density of the sum is the convolution of their densities), such as the intuition I gained from a signal processing class, you can actually picture something like the normal distribution forming through purely visual reasoning, although, again, the exact shape of the curve remains mysterious. A nice feature of this argument is that it is vaguely suggestive of the rigorous proof, which runs along similar lines using convolution, but in the frequency domain (with characteristic functions/Fourier transforms).
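If you want to see the convolution picture without any signal processing background, here is a rough numerical sketch (the uniform starting density, the grid step, and the number of convolutions are all arbitrary choices):

[code]
# Sketch: repeatedly convolving a Uniform(0, 1) density with itself.
# Discrete convolution times the grid step approximates density convolution.
import numpy as np

dx = 0.01
base = np.ones(int(1 / dx))          # Uniform(0, 1) density sampled on a grid
base /= base.sum() * dx              # normalize so it integrates to 1

density = base.copy()
for _ in range(5):                   # after the loop: density of a sum of 6 uniforms
    density = np.convolve(density, base) * dx

x = np.arange(density.size) * dx
mean = np.sum(x * density) * dx
var = np.sum((x - mean) ** 2 * density) * dx
print(mean, var)                     # ~3.0 and ~0.5, i.e. 6*(1/2) and 6*(1/12)
[/code]

Plotting `density` against a normal density with that mean and variance already looks very close after a handful of convolutions, even though each step is just "smearing" one rectangle against the running shape.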

It also seems possible to arrive at the normal distribution through some physical reasoning, along with a little optimization problem (entropy maximization). This isn't at the forefront of my consciousness, so it would take me a while to dig it back up (well, to be honest, I never fully figured it out in the first place), so I'll point you to where I got this idea: Susskind's lectures on statistical mechanics.



You might have to be sort of astute to catch what I'm talking about, though.
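For reference, the optimization problem being alluded to is, as far as I understand it, the standard maximum-entropy characterization: among all densities with a given mean and variance, the normal density maximizes the differential entropy,

[tex]
\max_{p}\;-\int p(x)\ln p(x)\,dx
\quad\text{subject to}\quad
\int p(x)\,dx = 1,\;\;
\int x\,p(x)\,dx = \mu,\;\;
\int (x-\mu)^2 p(x)\,dx = \sigma^2.
[/tex]

Setting up Lagrange multipliers forces ##\ln p(x)## to be a quadratic in ##x##, i.e. ##p(x)\propto e^{\lambda_1 x+\lambda_2 x^2}## with ##\lambda_2<0##, and matching the constraints recovers ##N(\mu,\sigma^2)##.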

If you want the most general version, with the fewest assumptions (notably, the identically distributed hypothesis can be weakened), look up the Lindeberg CLT. It's not a simple proof, though.
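For reference, the weakened assumption is the Lindeberg condition: for independent (not necessarily identically distributed) ##X_1, X_2, \dots## with means ##\mu_k##, variances ##\sigma_k^2## and ##s_n^2=\sum_{k=1}^{n}\sigma_k^2##, one requires

[tex]
\lim_{n\to\infty}\frac{1}{s_n^2}\sum_{k=1}^{n}
\mathbb{E}\!\left[(X_k-\mu_k)^2\,\mathbf{1}\{|X_k-\mu_k|>\varepsilon s_n\}\right]=0
\qquad\text{for every }\varepsilon>0,
[/tex]

and under this condition ##\frac{1}{s_n}\sum_{k=1}^{n}(X_k-\mu_k)## converges in distribution to ##N(0,1)##.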
 

1. What is the normal distribution?

The normal distribution is a commonly observed probability distribution that follows a bell-shaped curve. It is often used to describe the distribution of continuous data in natural phenomena.

2. Where did the concept of the normal distribution originate?

The concept of the normal distribution can be traced back to the 18th century, when Abraham de Moivre obtained the bell curve as a limiting form of the binomial distribution. Carl Friedrich Gauss later made extensive use of it in his theory of observational errors, which is why it is also called the Gaussian distribution; the name "normal distribution" came into common use later, in the late 19th century, popularized by statisticians such as Karl Pearson.

3. How is the normal distribution related to the central limit theorem?

The central limit theorem states that when many independent random variables with finite variance are added together, their suitably standardized sum tends toward a normal distribution. This means that the normal distribution is often used to approximate the sum (or average) of a large number of independent and identically distributed random variables, even if the individual variables do not follow a normal distribution.
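A rough simulation sketch of this statement (the exponential distribution, the sample size, and the number of trials are arbitrary choices):

[code]
# Sketch: standardized sums of i.i.d. exponential variables look close to normal.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, trials = 50, 100_000
samples = rng.exponential(scale=1.0, size=(trials, n))   # Exp(1): mean 1, variance 1

# standardize the row sums: (S_n - n*mu) / (sigma * sqrt(n))
z = (samples.sum(axis=1) - n * 1.0) / (1.0 * np.sqrt(n))

# distance from the standard normal (Kolmogorov-Smirnov statistic);
# it is already small for n = 50 and shrinks as n grows
print(stats.kstest(z, "norm").statistic)
[/code]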

4. What are the characteristics of a normal distribution?

A normal distribution is characterized by its mean, which is the center of the curve, and its standard deviation, which measures the spread of the data around the mean. The curve is symmetric, with 50% of the data falling above the mean and 50% falling below. It also follows the 68-95-99.7 rule, where approximately 68% of the data falls within one standard deviation of the mean, 95% falls within two standard deviations, and 99.7% falls within three standard deviations.
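The 68-95-99.7 percentages come straight from the normal CDF; a quick check (a sketch using scipy) looks like this:

[code]
# Sketch: the 68-95-99.7 rule from the standard normal CDF.
from scipy.stats import norm

for k in (1, 2, 3):
    prob = norm.cdf(k) - norm.cdf(-k)   # P(|X - mu| <= k * sigma)
    print(k, round(prob, 4))            # 0.6827, 0.9545, 0.9973
[/code]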

5. How is the normal distribution used in statistics and scientific research?

The normal distribution is widely used in statistics and scientific research due to its many desirable properties. It allows for the calculation of probabilities and confidence intervals, and it is often used in hypothesis testing and parameter estimation. Many natural phenomena, such as human height and IQ scores, also tend to follow a normal distribution, making it a useful tool for data analysis and modeling.
