Need help with normal distribution

thetexan · Jul 16, 2015

I need help understanding the normal distribution. I am self-studying statistics to help me in my role teaching Excel as a business tool.

I understand taking a data set and creating a frequency distribution. What I don't understand is the normal distribution. Why should any data set, regardless of what it is, necessarily end up with a nice bell-shaped curve? It seems to me that any data set, however you distribute the individual data points, will produce whatever it produces, and every resulting curve will be different from the others. I don't understand how we go from an arbitrary set of random data to a distribution that results in a bell curve.

I need to understand this before I can understand the importance of the normal distribution.

tex

Dr. Courtney · Jul 16, 2015

Understanding comes in cycles. My experience with stats is that students can often use the normal distribution for a while before they gain a deeper understanding of why it works.

https://en.wikipedia.org/wiki/Normal_distribution

The wiki page is pretty good. Understand what you can and keep moving forward. Students can make progress by taking it on the authority of experts while their own understanding catches up.

After you have made a lot of histograms and compared them with the bell curve, you will have developed a better intuition for when a distribution will be Gaussian and when it won't be. Jump in and get to work.
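A minimal sketch of that histogram exercise, written in Python rather than Excel. It splits a data set into equal-width bins and lists each bin's observed count next to the count a bell curve with the same mean and standard deviation would predict. The two data sets (normally distributed values around 170, exponentially distributed values with mean 5) are made up for illustration, and `compare_to_normal` is a hypothetical helper name. In Excel, the same comparison can be built from a FREQUENCY table plus NORM.DIST with the cumulative argument set to TRUE.

```python
import random
from statistics import NormalDist, fmean, stdev

def compare_to_normal(data, bins=10):
    """Split data into equal-width bins and pair each bin's observed count
    with the count a normal curve (same mean and sd as the data) predicts."""
    lo, hi = min(data), max(data)
    width = (hi - lo) / bins
    edges = [lo + k * width for k in range(bins)] + [hi]
    bell = NormalDist(fmean(data), stdev(data))
    rows = []
    for k in range(bins):
        left, right = edges[k], edges[k + 1]
        last = k == bins - 1
        # bins are [left, right); the last bin also includes the maximum value
        observed = sum(1 for x in data if left <= x < right or (last and x == right))
        # expected count = sample size * probability the bell curve puts in the bin
        expected = len(data) * (bell.cdf(right) - bell.cdf(left))
        rows.append((left, right, observed, expected))
    return rows

random.seed(1)
samples = {
    "roughly normal (gauss)": [random.gauss(170, 10) for _ in range(5000)],
    "skewed (exponential)": [random.expovariate(1 / 5) for _ in range(5000)],
}
for name, data in samples.items():
    print(name)
    for left, right, observed, expected in compare_to_normal(data):
        print(f"  {left:7.1f} to {right:7.1f}: observed {observed:5d}, "
              f"bell curve predicts {expected:7.1f}")
```

For the exponential data, the first bin should hold far more values than the bell curve predicts, which is the kind of mismatch that builds intuition for when a distribution is not Gaussian.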

SteamKing · Jul 16, 2015

Tex, you should direct your self-study of statistics to something called the "Central Limit Theorem":

https://en.wikipedia.org/wiki/Central_limit_theorem

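While a limited data sample can follow a variety of different probability distributions, the CLT says that the mean (or sum) of a sufficiently large sample of independent, identically distributed values with finite variance approximately follows the normal distribution. That normal distribution is centered on the mean of the sampled values, and its standard deviation equals their standard deviation divided by the square root of the sample size.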
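A minimal simulation sketch of what the CLT does and does not say, assuming exponentially distributed values as a stand-in for skewed data (the distribution, seed, and sizes are illustrative choices, not part of the theorem). It compares the skewness of raw values with the skewness of averages of n = 30 values, and checks the spread of those averages against σ/√n.

```python
import random
from statistics import fmean, pstdev

def skewness(values):
    """Skewness: near 0 for a symmetric bell curve, near 2 for exponential data."""
    m = fmean(values)
    s = pstdev(values)
    return sum((x - m) ** 3 for x in values) / (len(values) * s ** 3)

random.seed(42)
population_mean = 10.0  # this exponential distribution has mean = sd = 10
n = 30                  # number of values averaged in each sample
trials = 10_000         # number of raw values, and number of sample means

# A large pile of raw skewed values stays skewed, no matter how many we draw.
raw = [random.expovariate(1 / population_mean) for _ in range(trials)]

# Averages of n values each: these are what the CLT says become bell-shaped.
means = [fmean([random.expovariate(1 / population_mean) for _ in range(n)])
         for _ in range(trials)]

print("skewness of raw values:  ", round(skewness(raw), 2))
print("skewness of sample means:", round(skewness(means), 2))
print("sd of sample means:      ", round(pstdev(means), 3))
print("sigma / sqrt(n):         ", round(population_mean / n ** 0.5, 3))
```

Theory puts the skewness of exponential data at 2 and that of an average of 30 such values at 2/√30, about 0.37, so the raw values stay lopsided however many are drawn while the averages move toward the symmetric bell curve.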

Like a lot of things, statistically speaking, the CLT and its proof are quite subtle, so it may take some time to understand it and its implications.

thetexan · Jul 16, 2015

Thanks, I'm studying the CLT. What I'm getting from it is that if you take a large enough sample, any data will tend toward the same normal distribution (the height and shape of the bell curve differing based on several factors). This seems counterintuitive to me...so far. (I hope I'll get it eventually.)

Let's take a sample of billions of geographic points on the Earth, each measuring the terrain height at that point. Surely the frequency distribution of the data will be heavily weighted toward the low-height end, since there is by far more flat Earth than mountainous Earth. Simply increasing the sample to trillions of points won't change that fact. But if I read the CLT and understand it with my very uneducated capability, the resulting distribution should be close to a bell-shaped normal distribution. If this is right, then I am lost as to why, and that is my dilema.

tex

SteamKing · Jul 16, 2015

thetexan said:

Thanks, I'm studying the CLT. What I'm getting from it is that if you take a large enough sample, any data will tend toward the same normal distribution (the height and shape of the bell curve differing based on several factors). This seems counterintuitive to me...so far. (I hope I'll get it eventually.)

A lot of statistics is counterintuitive. That's why it must be studied very carefully. IMO, intuition is overrated, though it does serve a purpose sometimes. The key is to know when enough is enough.

Let's take a sample of billions of geographic points on the Earth, each measuring the terrain height at that point. Surely the frequency distribution of the data will be heavily weighted toward the low-height end, since there is by far more flat Earth than mountainous Earth. Simply increasing the sample to trillions of points won't change that fact. But if I read the CLT and understand it with my very uneducated capability, the resulting distribution should be close to a bell-shaped normal distribution. If this is right, then I am lost as to why, and that is my dilema.

tex

Dilemma with 2 ems.

There are a lot of qualifications behind the CLT.

I think the key one in your example of measuring the height of Earth's terrain is whether you expect the height of a mountain to fluctuate randomly, or whether the random variable in this process is the location where the measurement is made. Finding your position on the globe with repeatable certainty is not as easy as you think, let alone determining how high something is relative to a completely arbitrary datum.

A different example would better illuminate what the CLT is about. The illustrated examples in the Wiki article on the CLT, e.g., rolling dice or flipping coins, are a better place to start seeing the CLT in action than measuring the heights of mountains (and they make it much easier to do your own experiments).
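A minimal dice experiment along those lines, sketched in Python. Instead of simulating rolls, it computes the exact probability of every total for 1, 2, 5, and 10 fair six-sided dice by adding one die at a time, then measures the largest gap between those probabilities and a bell curve with the matching mean (3.5 per die) and variance (35/12 per die). The helper name `dice_sum_distribution` is hypothetical.

```python
from statistics import NormalDist

def dice_sum_distribution(num_dice, faces=6):
    """Exact probability of each total for num_dice fair dice, built by
    adding one die at a time (a discrete convolution)."""
    dist = {0: 1.0}
    for _ in range(num_dice):
        new_dist = {}
        for total, p in dist.items():
            for face in range(1, faces + 1):
                new_dist[total + face] = new_dist.get(total + face, 0.0) + p / faces
        dist = new_dist
    return dist

gaps = {}
for num_dice in (1, 2, 5, 10):
    dist = dice_sum_distribution(num_dice)
    # one die: mean 3.5, variance 35/12; for independent dice both add up
    bell = NormalDist(3.5 * num_dice, (num_dice * 35 / 12) ** 0.5)
    # totals are one apart, so the bell curve's height at a total
    # approximates the probability of that total
    gaps[num_dice] = max(abs(p - bell.pdf(total)) for total, p in dist.items())
    print(f"{num_dice:2d} dice: largest gap from the bell curve = {gaps[num_dice]:.4f}")
```

A single die is flat and fits the bell curve poorly; as dice are added, the largest gap should shrink, which is the CLT at work on sums of independent values.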

mathman · Jul 16, 2015

The central limit theorem has a condition that the samples be independent. The height measurements are not - the height at any point will, most of the time, be similar to the height at nearby points.
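A minimal sketch of why independence matters, under a hypothetical correlation model: measurements k steps apart have correlation rho ** k, loosely mimicking nearby terrain heights being similar (this is an illustrative assumption, not a real terrain model). It computes the exact variance of the average of n such measurements. With rho = 0 (independent measurements) the variance is σ²/n, the scaling behind the usual σ/√n spread of a sample mean; with strong correlation the average's variance shrinks much more slowly.

```python
def variance_of_mean(n, rho, sigma=1.0):
    """Variance of the average of n measurements, each with standard deviation
    sigma, assuming measurements k steps apart have correlation rho ** k."""
    # Var(mean) = sigma^2 / n^2 * sum over all pairs (i, j) of rho ** |i - j|;
    # there are n pairs with i == j and 2 * (n - k) pairs that are k apart.
    pair_total = n + 2 * sum((n - k) * rho ** k for k in range(1, n))
    return sigma ** 2 * pair_total / n ** 2

for rho in (0.0, 0.5, 0.99):
    row = ", ".join(f"n={n}: {variance_of_mean(n, rho):.4f}" for n in (10, 100, 1000))
    print(f"rho={rho}: {row}")
```

Even at n = 1000, the rho = 0.99 case should still leave a variance many times larger than the independent case, so a huge pile of strongly correlated measurements carries far less information than its size suggests.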

SteamKing · Jul 16, 2015

mathman said:

The central limit theorem has a condition that the samples be independent. The height measurements are not - the height at any point will, most of the time, be similar to the height at nearby points.

Unless, of course, you're near the edge of a cliff.
