Need help with normal distribution

In summary, the conversation revolves around the concept of understanding the normal distribution in statistics, specifically in relation to creating a frequency distribution and the importance of the normal distribution. The conversation also touches on the Central Limit Theorem and its implications, as well as the counterintuitive nature of statistics. The main dilemma discussed is the idea that any data set, regardless of its distribution, will eventually tend towards a bell-shaped normal distribution with a large enough sample size, which seems counterintuitive in certain scenarios.
  • #1
thetexan
I need help understanding the normal distribution. I am self-studying statistics to help me in my role of teaching Excel as a business tool.

I understand taking a data set and creating a frequency distribution. What I don't understand is the normal distribution. Why should any data set, regardless of what it is, necessarily end up with a nice bell-shaped curve? It seems to me that any data set, however the individual data points are distributed, will produce whatever it produces, and that every resulting curve will be different from the others. I don't understand how we go from any random set of data to a distribution that results in a bell curve.

I need to understand this before I can understand the importance of the normal distribution.

tex
 
  • #2
Understanding comes in cycles. My experience with stats is that students can often use the normal distribution for a while before they gain a deeper understanding of why it works.

https://en.wikipedia.org/wiki/Normal_distribution

The wiki page is pretty good. Understand what you can and keep moving forward. Students can make progress by taking it on the authority of experts while their own understanding catches up.

After you have made a lot of histograms and compared them with the bell curve, you will have developed a better intuition for when a distribution will be Gaussian and when it won't be. Jump in and get to work.
 
  • #3
Tex, you should direct your self-study of statistics to something called the "Central Limit Theorem":

https://en.wikipedia.org/wiki/Central_limit_theorem

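While a single data set can follow any of a variety of probability distributions, the CLT says that if you take many independent samples of a sufficiently large size n from a population with finite variance and compute the mean of each one, those sample means will be approximately normally distributed. Their distribution is centered on the population mean, with a standard deviation equal to the population standard deviation divided by √n. The raw data keep whatever shape they have; it is the distribution of the sample means that approaches the bell curve.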

Like a lot of things in statistics, the CLT and its proof are quite subtle, so it may take some time to understand it and its implications.
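A small simulation can make the CLT easier to see. The sketch below is only an illustration under stated assumptions: it uses an exponential distribution as a stand-in for "skewed data", and the sample size of 50, the trial counts, and the helper names `skewness` and `sample_means` are all choices made for this example, not part of the thread. It compares the skewness of the raw values with the skewness of many sample means.

```python
import random
import statistics


def skewness(values):
    """Mean cubed z-score: about 0 for a symmetric shape, positive for a long right tail."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return statistics.fmean(((v - mean) / sd) ** 3 for v in values)


def sample_means(n, trials, rng):
    """Draw `trials` independent samples of size n from Exp(1); return each sample's mean."""
    return [
        statistics.fmean(rng.expovariate(1.0) for _ in range(n))
        for _ in range(trials)
    ]


rng = random.Random(0)  # fixed seed so the run is repeatable

# Assumption: Exp(1) plays the role of a heavily skewed data set.
raw = [rng.expovariate(1.0) for _ in range(20_000)]
means = sample_means(n=50, trials=4_000, rng=rng)

print("skewness of raw data:    ", round(skewness(raw), 2))
print("skewness of sample means:", round(skewness(means), 2))
print("std dev of sample means: ", round(statistics.pstdev(means), 3))
```

In theory the raw exponential data have a skewness of 2 no matter how many values are collected, while the means of samples of 50 have a skewness of 2/√50 (about 0.28) and a standard deviation of 1/√50 (about 0.141). A run should land near those values: the raw histogram stays lopsided, and the histogram of means is already close to a bell.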
 
  • #4
Thanks, I'm studying the CLT. What I'm getting from it is that if you take a large enough sample, any data will tend toward the same normal distribution (the height and shape of the bell curve differing based on several factors). This seems counterintuitive to me...so far. (I hope I'll get it eventually.)

Let's take a sample of billions of geographic points on the Earth, each measuring the terrain height at that point. Surely the frequency distribution of the data will be heavily weighted toward the low end, since there is by far more flat Earth than mountainous Earth. Simply increasing the data sample to trillions of points won't change that fact. But if I read the CLT and understand it with my very uneducated capability, the resulting distribution should be close to a bell-shaped normal distribution. If this is right then I am lost as to why, and that is my dilema.

tex
 
  • #5
thetexan said:
Thanks, I'm studying the CLT. What I'm getting from it is that if you take a large enough sample, any data will tend toward the same normal distribution (the height and shape of the bell curve differing based on several factors). This seems counterintuitive to me...so far. (I hope I'll get it eventually.)

A lot of statistics is counterintuitive. That's why it must be studied very carefully. IMO, intuition is overrated, though it does serve a purpose sometimes. The key is to know when enough is enough.

thetexan said:
Let's take a sample of billions of geographic points on the Earth, each measuring the terrain height at that point. Surely the frequency distribution of the data will be heavily weighted toward the low end, since there is by far more flat Earth than mountainous Earth. Simply increasing the data sample to trillions of points won't change that fact. But if I read the CLT and understand it with my very uneducated capability, the resulting distribution should be close to a bell-shaped normal distribution. If this is right then I am lost as to why, and that is my dilema.

tex

Dilemma with 2 ems.

There are a lot of qualifications behind the CLT.

I think the key question with your example of measuring the height of Earth's terrain is this: do you expect the height of a mountain to fluctuate randomly, or is the random variable in this process the location where the measurement is made? Finding your position on the globe with repeatable certainty is not as easy as you might think, let alone determining how high something is relative to a completely arbitrary datum.

I think using a different example would better illuminate what the CLT is about. I think the illustrated examples in the Wiki article on the CLT, e.g., rolling dice or flipping coins, would be a better place to start seeing the CLT in action than measuring the heights of mountains (and much easier to do your own experiments).
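The dice experiment suggested above is easy to try on a computer. The following is a minimal sketch, not taken from the Wiki article: the helper names `roll_sums` and `text_histogram`, the choice of 1, 2, and 10 dice, and the 10,000 trials are all assumptions made for illustration. It prints a rough text histogram of the sum of the dice for each case.

```python
import random
from collections import Counter


def roll_sums(num_dice, trials, rng):
    """Roll `num_dice` fair six-sided dice `trials` times; return the sum of each roll."""
    return [
        sum(rng.randint(1, 6) for _ in range(num_dice))
        for _ in range(trials)
    ]


def text_histogram(values, width=50):
    """Return one line per integer value, with a bar scaled so the tallest is `width` long."""
    counts = Counter(values)
    peak = max(counts.values())
    lines = []
    for total in range(min(counts), max(counts) + 1):
        bar = "#" * round(width * counts[total] / peak)
        lines.append(f"{total:3d} {bar}")
    return "\n".join(lines)


rng = random.Random(1)  # fixed seed so the run is repeatable
for num_dice in (1, 2, 10):
    print(f"\nSum of {num_dice} dice, 10,000 trials")
    print(text_histogram(roll_sums(num_dice, 10_000, rng)))
```

One die should give a roughly flat histogram, two dice a triangle, and ten dice something that already looks like a bell centered near 35. The single-die data never become bell-shaped however many times you roll; it is the sum (or average) of several independent rolls that does.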
 
  • #6
The central limit theorem has a condition that the samples be independent. The height measurements are not - the height at any point will, most of the time, be similar to the height at nearby points.
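The effect of the independence condition can be sketched with a deliberately extreme toy case. Everything here is an assumption made for illustration: heights are modeled as exponential random numbers, and "dependent" means that all 50 readings in a sample are copies of one reading, as if they were taken at practically the same spot. Real terrain is far less extreme, so this shows only the direction of the effect.

```python
import random
import statistics


def skewness(values):
    """Mean cubed z-score: about 0 for a symmetric shape, positive for a long right tail."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return statistics.fmean(((v - mean) / sd) ** 3 for v in values)


def mean_of_independent(n, rng):
    """Average of n independent Exp(1) draws."""
    return statistics.fmean(rng.expovariate(1.0) for _ in range(n))


def mean_of_dependent(n, rng):
    """Average of n readings that are all copies of a single Exp(1) draw."""
    height = rng.expovariate(1.0)
    return statistics.fmean([height] * n)


rng = random.Random(3)  # fixed seed so the run is repeatable
independent = [mean_of_independent(50, rng) for _ in range(4_000)]
dependent = [mean_of_dependent(50, rng) for _ in range(4_000)]

print("skewness, independent readings:", round(skewness(independent), 2))
print("skewness, dependent readings:  ", round(skewness(dependent), 2))
```

With independent readings the averages are close to symmetric. With fully dependent readings, averaging does nothing and the averages stay as skewed as the original values.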
 
  • #7
mathman said:
The central limit theorem has a condition that the samples be independent. The height measurements are not - the height at any point will, most of the time, be similar to the height at nearby points.

Unless, of course, you're near the edge of a cliff. :eek: :sorry: :))
 

FAQ: Need help with normal distribution

What is a normal distribution?

A normal distribution, also known as a Gaussian distribution, is a probability distribution that is commonly used to describe the distribution of a continuous variable. It is bell-shaped and symmetrical, with most values clustered around the mean and decreasing in frequency as they move away from the mean.

How is a normal distribution represented mathematically?

A normal distribution is fully specified by its mean (μ) and standard deviation (σ). Its probability density function is $f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}}$, where e is Euler's number (the base of the natural logarithm) and x is the value of the variable.
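As a quick check of the density formula, it can be written out directly in code. This is a minimal sketch; the function name `normal_pdf` and its default arguments are choices made for this example.

```python
import math


def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of a normal distribution with mean mu and standard deviation sigma at x."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    z = (x - mu) / sigma  # distance from the mean in standard deviations
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


# The curve peaks at the mean and is symmetric around it.
print(normal_pdf(0.0))
print(normal_pdf(1.0) == normal_pdf(-1.0))
```

For the standard normal (μ = 0, σ = 1) the peak height is 1/√(2π), roughly 0.399. Note that the density is a height of the curve, not a probability; probabilities come from areas under it.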

What are the characteristics of a normal distribution?

A normal distribution has several important characteristics, including: symmetry around the mean, with the mean, median, and mode all being equal; a bell-shaped curve; and the empirical rule, which states that approximately 68% of the data falls within one standard deviation of the mean, 95% falls within two standard deviations, and 99.7% falls within three standard deviations.
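The percentages in the empirical rule can be reproduced from the error function, since for a normal variable the probability of falling within k standard deviations of the mean is erf(k/√2). The helper name `prob_within` below is just a label chosen for this sketch.

```python
import math


def prob_within(k):
    """Probability that a normal variable lies within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {prob_within(k):.4f}")
```

This gives about 0.6827, 0.9545, and 0.9973, which is where the rounded figures of 68%, 95%, and 99.7% come from.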

What is the importance of the normal distribution in statistics?

The normal distribution is important in statistics because it is used to model many real-world phenomena, such as heights, weights, and test scores. It also allows for the use of common statistical methods, such as the z-score and hypothesis testing. Additionally, many statistical tests assume a normal distribution, making it a fundamental concept in statistics.

How is the normal distribution used in data analysis?

The normal distribution is used in data analysis to determine the probability of a certain value occurring within a given range. It is also used to identify outliers and to compare two or more sets of data. By understanding the characteristics of a normal distribution, scientists can make informed decisions about their data and draw meaningful conclusions from their analyses.
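As a rough illustration of these uses, Python's standard library includes `statistics.NormalDist`. The test-score figures below (mean 100, standard deviation 15) are hypothetical numbers invented for this example, as are the helper names `prob_between` and `z_score`.

```python
from statistics import NormalDist

# Hypothetical test scores: mean 100, standard deviation 15 (assumed for illustration).
scores = NormalDist(mu=100, sigma=15)


def prob_between(dist, low, high):
    """Probability that a value from `dist` falls between low and high."""
    return dist.cdf(high) - dist.cdf(low)


def z_score(dist, x):
    """How many standard deviations x lies above (+) or below (-) the mean."""
    return (x - dist.mean) / dist.stdev


print(f"P(85 < score < 115) = {prob_between(scores, 85, 115):.4f}")
print(f"z-score of 130      = {z_score(scores, 130):.1f}")
print(f"P(score > 130)      = {1 - scores.cdf(130):.4f}")
```

A score of 130 sits two standard deviations above the mean, so under this model only a small fraction of scores would exceed it. Whether that makes it an outlier depends on the threshold the analyst chooses, and the conclusion is only as good as the assumption that the data are close to normal.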
