# Creating a histogram and then applying a gaussian fit. Help!

Ok, I need to take the data:

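| D1 | D2 |
|-----|-----|
| 6.5 | 3 |
| 6 | 4 |
| 6.7 | 5.5 |
| 7 | 3.8 |
| 6.3 | 4.5 |
| 8.6 | 5.8 |
| 5.5 | 4 |
| 7 | 3.5 |
| 7 | 4.5 |
| 7 | 5 |
| 6.5 | 4 |
| 6.8 | 3.2 |
| 7 | 3.6 |
| 6 | 4.5 |
| 6 | 2.8 |
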
and make a histogram of the data, centered on 0 (i.e. 0 must not be the edge of a bin), and then fit a Gaussian to the data.

This is the normal distribution equation:
$$f(x) = A\,\exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$

And I am to normalize the data using (D1-D2)/D1
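A minimal Python sketch of that normalization step, using the D1/D2 values from the table above. It only computes (D1 - D2)/D1 row by row; the function name `normalize` is an illustrative choice.

```python
# D1 and D2 columns copied from the table above
D1 = [6.5, 6, 6.7, 7, 6.3, 8.6, 5.5, 7, 7, 7, 6.5, 6.8, 7, 6, 6]
D2 = [3, 4, 5.5, 3.8, 4.5, 5.8, 4, 3.5, 4.5, 5, 4, 3.2, 3.6, 4.5, 2.8]


def normalize(d1, d2):
    """Return (D1 - D2) / D1 for each pair, which equals 1 - D2/D1."""
    return [(a - b) / a for a, b in zip(d1, d2)]


D = normalize(D1, D2)
for a, b, d in zip(D1, D2, D):
    print(f"{a:4.1f}  {b:4.1f}  {d:.4f}")
```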

I normally would not ask for help with something like this, but it is for a 1-unit seminar and this was all the information I was given. I have never even had any statistics education. I tried looking up how to create a histogram and all of that, but reached a state of utter confusion when it came to "bins."

Can someone at least point me in the right direction?

EnumaElish
Homework Helper
After you calculate the "normalized" data, D = 1 - D2/D1, you are to create multiple bins that will make the D's look like the "bell curve." It can help to experiment with different bin sizes to make the histogram as similar to the normal curve as possible. I guess "fitting a gaussian to the data" is one way of saying "calculate the average and the standard deviation for the D's."
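Read that way, the "fit" comes down to two numbers. A minimal Python sketch, assuming the same D = (D1 - D2)/D1 values; `statistics.stdev` divides by n - 1 (the sample standard deviation), which is also what Excel's STDEV computes.

```python
import statistics

# D1 and D2 columns from the original table
D1 = [6.5, 6, 6.7, 7, 6.3, 8.6, 5.5, 7, 7, 7, 6.5, 6.8, 7, 6, 6]
D2 = [3, 4, 5.5, 3.8, 4.5, 5.8, 4, 3.5, 4.5, 5, 4, 3.2, 3.6, 4.5, 2.8]
D = [(a - b) / a for a, b in zip(D1, D2)]

mu = statistics.mean(D)      # estimate of the Gaussian's centre (mu)
sigma = statistics.stdev(D)  # sample standard deviation, n - 1 in the denominator
print(f"mu = {mu:.4f}, sigma = {sigma:.4f}")
```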

What exactly is a bin? And would you recommend doing this on Excel, or is there a superior program available online?

EnumaElish
Homework Helper
A bin is a subinterval. Say, your data were randomly distributed between -10 and +10. Then you could represent this as a single bin, which would appear as a single column representing "all data." This would not be a very informative histogram. Alternatively, you could have 20 bins each with unit length, i.e. [-10 to -9], ..., [+9 to +10]. Excel is as good as any for this problem.

See http://en.wikipedia.org/wiki/Histogram
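Applied to this problem, "centered around 0" can be read as putting the bin edges at ±w/2, ±3w/2, ... so that 0 sits in the middle of a bin. A minimal Python sketch under that reading; the function name `centered_histogram` and the width 0.1 are illustrative choices, not part of the assignment, so try other widths too.

```python
import math
from collections import Counter

# D1 and D2 columns from the original table
D1 = [6.5, 6, 6.7, 7, 6.3, 8.6, 5.5, 7, 7, 7, 6.5, 6.8, 7, 6, 6]
D2 = [3, 4, 5.5, 3.8, 4.5, 5.8, 4, 3.5, 4.5, 5, 4, 3.2, 3.6, 4.5, 2.8]
D = [(a - b) / a for a, b in zip(D1, D2)]


def centered_histogram(data, width):
    """Count values into bins of the given width centred on multiples of width.

    Bin k covers [(k - 0.5) * width, (k + 0.5) * width), so 0 is the centre of
    bin 0 and never an edge (up to floating-point rounding at the boundaries).
    Returns (bin centre, count) pairs, including empty bins inside the range.
    """
    counts = Counter(math.floor(x / width + 0.5) for x in data)
    k_lo, k_hi = min(counts), max(counts)
    return [(k * width, counts[k]) for k in range(k_lo, k_hi + 1)]


for centre, n in centered_histogram(D, 0.1):
    print(f"{centre:5.2f} | {'#' * n}")
```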

Last questions methinks:

1. How do I apply the equation for the normal distribution to my data in Excel?

I suppose you should estimate the mean and standard deviation of the data and use those estimates as the parameters "mu" and "sigma" in your equation, with A = 1/(sigma * sqrt(2 * pi)).
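With that choice of A the equation is the normal probability density. A minimal Python sketch (the function name `gaussian` is illustrative); the Riemann-sum check at the end shows that this A makes the total area under the curve 1. The mu and sigma used in the check are arbitrary illustrative values, not estimates from this data.

```python
import math


def gaussian(x, mu, sigma):
    """A * exp(-(x - mu)^2 / (2 sigma^2)) with A = 1 / (sigma * sqrt(2 pi))."""
    A = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return A * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))


# Sanity check: with this A the area under the curve is 1.
# Crude Riemann sum over mu +/- 6 sigma, using illustrative parameters.
mu, sigma = 0.4, 0.1
step = sigma / 100
area = sum(gaussian(mu + i * step, mu, sigma) * step for i in range(-600, 601))
print(f"area = {area:.6f}")
```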

By the way, why use (D1-D2)/D1 instead of (D2-D1)/D1, unless you deliberately want to change the sign of the difference?

EnumaElish
Homework Helper
Use the Excel functions AVERAGE and STDEV to calculate these parameters from the data. Then use NORMDIST to evaluate the normal distribution (the density, or with cumulative=TRUE the cumulative probability):

From Excel help:
NORMDIST returns the normal distribution for the specified mean and standard deviation. This function has a very wide range of applications in statistics, including hypothesis testing.

Syntax

NORMDIST(x,mean,standard_dev,cumulative)

X is the value for which you want the distribution.

Mean is the arithmetic mean of the distribution.

Standard_dev is the standard deviation of the distribution.

Cumulative is a logical value that determines the form of the function. If cumulative is TRUE, NORMDIST returns the cumulative distribution function; if FALSE, it returns the probability density function.
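To see what NORMDIST computes in each mode, here is a rough Python stand-in (the name `normdist` is hypothetical; it simply mirrors the four arguments listed above). With cumulative False it returns the height of the bell curve at x; with cumulative True it returns the probability that a value is at most x, computed from the error function `math.erf`.

```python
import math


def normdist(x, mean, standard_dev, cumulative):
    """Rough stand-in for Excel's NORMDIST(x, mean, standard_dev, cumulative)."""
    z = (x - mean) / standard_dev
    if cumulative:
        # Cumulative distribution function: probability that a value is <= x
        return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    # Probability density function: height of the bell curve at x
    return math.exp(-0.5 * z * z) / (standard_dev * math.sqrt(2.0 * math.pi))


print(normdist(1.96, 0, 1, True))   # close to 0.975
print(normdist(0, 0, 1, False))     # peak height, 1/sqrt(2*pi)
```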

Alright, I have my histogram, as well as my Standard Deviation and Mean for the data.

My final question is: how do I get Excel to actually overlay the normal distribution curve on my existing histogram?

Do I have to use NORMDIST to return the normal distribution first? And if so, what does the parameter 'X' represent?
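One common way to do the overlay: evaluate the fitted density at each bin centre (so X is the bin centre, or any point along the horizontal axis where you want the curve drawn), then multiply by the number of data points N and the bin width w so the curve is in the same units (counts per bin) as the histogram bars. A minimal Python sketch of those numbers, assuming bins of width 0.1 centred on multiples of 0.1 (an illustrative choice). In Excel the equivalent would be a column of NORMDIST(centre, mean, sd, FALSE)*N*w next to the counts, plotted as a line on the same chart.

```python
import math
import statistics

# D1 and D2 columns from the original table
D1 = [6.5, 6, 6.7, 7, 6.3, 8.6, 5.5, 7, 7, 7, 6.5, 6.8, 7, 6, 6]
D2 = [3, 4, 5.5, 3.8, 4.5, 5.8, 4, 3.5, 4.5, 5, 4, 3.2, 3.6, 4.5, 2.8]
D = [(a - b) / a for a, b in zip(D1, D2)]

mu = statistics.mean(D)
sigma = statistics.stdev(D)
width = 0.1  # illustrative bin width
n = len(D)


def density(x):
    # Same value as NORMDIST(x, mu, sigma, FALSE)
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


# Bin k is centred on k * width, so 0 is a bin centre, not an edge
ks = [math.floor(d / width + 0.5) for d in D]
rows = []
for k in range(min(ks), max(ks) + 1):
    centre = k * width
    observed = ks.count(k)
    expected = n * width * density(centre)  # scale the density to counts per bin
    rows.append((centre, observed, expected))
    print(f"{centre:5.2f}  observed {observed:2d}  fitted {expected:5.2f}")
```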

EnumaElish