Creating a histogram and then applying a Gaussian fit: help

  • Context: Undergrad 
  • Thread starter: absolute3
  • Tags: Fit, Gaussian, Histogram

Discussion Overview

The discussion revolves around creating a histogram from a given dataset and fitting a Gaussian distribution to it. Participants explore the process of normalizing data, defining bins for the histogram, and applying statistical functions in Excel to achieve these tasks. The conversation includes technical aspects of histogram creation and Gaussian fitting, as well as the use of specific equations and software tools.

Discussion Character

  • Homework-related
  • Technical explanation
  • Mathematical reasoning

Main Points Raised

  • One participant presents a dataset and seeks guidance on creating a histogram and fitting a Gaussian curve, expressing confusion about the concept of bins.
  • Another participant explains that bins are subintervals and suggests experimenting with different bin sizes to achieve a bell curve appearance in the histogram.
  • There is a discussion about the normalization of data using the formula D = 1 - D2/D1, with one participant questioning the choice of this formula over an alternative.
  • Participants suggest using Excel for the task and provide insights on how to apply the normal distribution equation, including estimating mean and standard deviation.
  • One participant inquires about overlaying the normal distribution curve on the histogram and seeks clarification on the parameters needed for the NORMDIST function in Excel.
  • Another participant clarifies that 'X' in the NORMDIST function corresponds to the normalized variable, referred to as D.

Areas of Agreement / Disagreement

Participants generally agree on the approach to creating the histogram and fitting a Gaussian curve, but there are differing views on the normalization formula and the specifics of using Excel functions. The discussion remains unresolved regarding the best practices for overlaying the normal distribution curve.

Contextual Notes

Participants express varying levels of familiarity with statistics and Excel, which may affect their understanding of the concepts discussed. There is also a lack of consensus on the normalization method and its implications.

Who May Find This Useful

This discussion may be useful for students or individuals learning about data visualization, statistical analysis, and the application of Excel in creating histograms and fitting distributions.

absolute3
Ok, I need to take the data:

Code:
D1	   D2
6.5	   3
6	   4
6.7	   5.5
7	   3.8
6.3	   4.5
8.6	   5.8
5.5	   4
7	   3.5
7	   4.5
7	   5
6.5	   4
6.8	   3.2
7	   3.6
6	   4.5
6	   2.8

and make a histogram of the data (centered around 0, i.e. 0 will not be an edge of a bin) and then fit a Gaussian to it.

This is the normal distribution equation:
http://img67.imageshack.us/img67/6014/equationma0.jpg
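The linked image may no longer load. Judging from the parameters named later in the thread (A, mu, sigma, with A = 1/(sigma·sqrt(2·pi))), it presumably shows the standard normal (Gaussian) density:

```latex
f(x) = A \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right),
\qquad A = \frac{1}{\sigma\sqrt{2\pi}}
```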

And I am to normalize the data using (D1-D2)/D1
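As an illustration only (Python rather than Excel), the normalization is a row-by-row calculation on the table above. The list names `d1`, `d2` and `d` are just illustrative:

```python
# Data from the table above: columns D1 and D2
d1 = [6.5, 6, 6.7, 7, 6.3, 8.6, 5.5, 7, 7, 7, 6.5, 6.8, 7, 6, 6]
d2 = [3, 4, 5.5, 3.8, 4.5, 5.8, 4, 3.5, 4.5, 5, 4, 3.2, 3.6, 4.5, 2.8]

# Normalized variable D = (D1 - D2) / D1, which is the same as 1 - D2/D1
d = [(a - b) / a for a, b in zip(d1, d2)]

for value in d:
    print(f"{value:.4f}")
```

In Excel the same thing is one formula per row, e.g. =(A2-B2)/A2 filled down, assuming D1 is in column A and D2 in column B.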

I normally would not ask for help with something like this, but it is for a 1-unit seminar and this was all the information I was given. I have never had any statistics education. I tried looking up how to create a histogram, but reached a state of utter confusion when it came to "bins."

Can someone at least point me in the right direction?
 
After you calculate the "normalized" data, D = 1 - D2/D1, you are to create multiple bins that will make the D's look like the "bell curve." It can help to experiment with different bin sizes to make the histogram as similar to the normal curve as possible. I guess "fitting a gaussian to the data" is one way of saying "calculate the average and the standard deviation for the D's."
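Under that reading, "fitting" amounts to estimating mu and sigma from the D values. A minimal sketch using Python's standard `statistics` module; `statistics.stdev` is the sample (n - 1) standard deviation, which is what Excel's STDEV computes:

```python
from statistics import mean, stdev

d1 = [6.5, 6, 6.7, 7, 6.3, 8.6, 5.5, 7, 7, 7, 6.5, 6.8, 7, 6, 6]
d2 = [3, 4, 5.5, 3.8, 4.5, 5.8, 4, 3.5, 4.5, 5, 4, 3.2, 3.6, 4.5, 2.8]
d = [(a - b) / a for a, b in zip(d1, d2)]  # normalized variable D

mu = mean(d)      # arithmetic mean, like Excel's AVERAGE
sigma = stdev(d)  # sample standard deviation, like Excel's STDEV
print(f"mean = {mu:.4f}, standard deviation = {sigma:.4f}")
```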
 
EnumaElish said:
After you calculate the "normalized" data, D = 1 - D2/D1, you are to create multiple bins that will make the D's look like the "bell curve." It can help to experiment with different bin sizes to make the histogram as similar to the normal curve as possible. I guess "fitting a gaussian to the data" is one way of saying "calculate the average and the standard deviation for the D's."


What exactly is a bin? And would you recommend doing this in Excel, or is there a better program available online?
 
A bin is a subinterval. Say, your data were randomly distributed between -10 and +10. Then you could represent this as a single bin, which would appear as a single column representing "all data." This would not be a very informative histogram. Alternatively, you could have 20 bins each with unit length, i.e. [-10 to -9], ..., [+9 to +10]. Excel is as good as any for this problem.

See http://en.wikipedia.org/wiki/Histogram
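For the original requirement that 0 be a bin center rather than a bin edge, one simple choice (an assumption, not the only valid one) is to put the edges at odd multiples of half the bin width, i.e. at (k + 1/2)·h. A sketch with a hypothetical helper `centered_histogram` and an illustrative width h = 0.1:

```python
import math
from collections import Counter


def centered_histogram(values, width):
    """Hypothetical helper: count values into bins whose centers are integer
    multiples of width, so 0 is always a bin center and never an edge.
    Bin k covers [(k - 0.5) * width, (k + 0.5) * width).
    Values lying exactly on an edge may fall either way due to float rounding."""
    counts = Counter(math.floor(v / width + 0.5) for v in values)
    lo, hi = min(counts), max(counts)
    # (bin center, count) pairs, including empty bins between the extremes
    return [(k * width, counts[k]) for k in range(lo, hi + 1)]


d1 = [6.5, 6, 6.7, 7, 6.3, 8.6, 5.5, 7, 7, 7, 6.5, 6.8, 7, 6, 6]
d2 = [3, 4, 5.5, 3.8, 4.5, 5.8, 4, 3.5, 4.5, 5, 4, 3.2, 3.6, 4.5, 2.8]
d = [(a - b) / a for a, b in zip(d1, d2)]

for center, count in centered_histogram(d, 0.1):
    print(f"{center:+.2f}: {'#' * count}")
```

Trying a few widths (0.05, 0.1, 0.15) is the "experiment with different bin sizes" step: too narrow leaves many empty bins, too wide collapses everything into one or two columns.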
 
EnumaElish said:
A bin is a subinterval. Say, your data were randomly distributed between -10 and +10. Then you could represent this as a single bin, which would appear as a single column representing "all data." This would not be a very informative histogram. Alternatively, you could have 20 bins each with unit length, i.e. [-10 to -9], ..., [+9 to +10]. Excel is as good as any for this problem.

See http://en.wikipedia.org/wiki/Histogram

Last questions methinks:

1. How do I apply the equation for normal distribution to my data in Excel?
 
I suppose you should estimate the mean and standard deviation and use your estimates as the parameters "mu" and "sigma" in your equation, with A = 1/(sigma·sqrt(2·pi)).
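Written out, the curve is f(x) = A·exp(-(x - mu)^2 / (2·sigma^2)) with A = 1/(sigma·sqrt(2·pi)). A minimal sketch of that formula as a plain function (the name `gaussian` is illustrative), with a numerical check that the choice of A makes the area under the curve equal to 1:

```python
import math


def gaussian(x, mu, sigma):
    """Normal density with mean mu and standard deviation sigma."""
    a = 1.0 / (sigma * math.sqrt(2.0 * math.pi))  # normalizing constant A
    return a * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))


# Check: area under the curve is ~1 (Riemann sum over mu +/- 8 sigma)
mu, sigma = 0.0, 1.0
step = 0.001
n_steps = 16000  # 16 sigma / step
area = sum(gaussian(mu - 8 * sigma + i * step, mu, sigma) * step
           for i in range(n_steps))
print(f"area = {area:.6f}")
```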

By the way, why use (D1-D2)/D1 instead of (D2-D1)/D1, unless you deliberately want to change the sign of the difference?
 
absolute3 said:
Last questions methinks:

1. How do I apply the equation for normal distribution to my data in Excel?
Use the Excel functions AVERAGE and STDEV to calculate these parameters from the data. Then use NORMDIST to evaluate the normal density (or, with cumulative set to TRUE, the cumulative probability):

From Excel help:
NORMDIST returns the normal distribution for the specified mean and standard deviation. This function has a very wide range of applications in statistics, including hypothesis testing.

Syntax

NORMDIST(x,mean,standard_dev,cumulative)

X is the value for which you want the distribution.

Mean is the arithmetic mean of the distribution.

Standard_dev is the standard deviation of the distribution.

Cumulative is a logical value that determines the form of the function. If cumulative is TRUE, NORMDIST returns the cumulative distribution function; if FALSE, it returns the probability density function.
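For readers working outside Excel, Python's standard `statistics.NormalDist` (Python 3.8+) offers the same two forms: `pdf` corresponds to NORMDIST(x, mean, standard_dev, FALSE) and `cdf` to NORMDIST(x, mean, standard_dev, TRUE). A sketch with a hypothetical helper `normdist` that mirrors the Excel argument order:

```python
from statistics import NormalDist


def normdist(x, mean, standard_dev, cumulative):
    """Hypothetical helper mirroring Excel's NORMDIST argument order."""
    dist = NormalDist(mu=mean, sigma=standard_dev)
    return dist.cdf(x) if cumulative else dist.pdf(x)


print(normdist(0.0, 0.0, 1.0, True))   # half the mass lies below the mean
print(normdist(0.0, 0.0, 1.0, False))  # peak density, 1/sqrt(2*pi)
```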
 
Alright, I have my histogram, as well as my Standard Deviation and Mean for the data.

My final question is how do I get Excel to actually overlay the normal distribution curve on my existing histogram?

Do I have to use NORMDIST to return the normal distribution first? And if so, what does the parameter 'X' represent?
 
X is equivalent to your "normalized" variable, which I named D above.
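One step the thread leaves implicit: NORMDIST(D, mean, sd, FALSE) is a density, so to sit on top of a histogram of counts it has to be scaled by (number of data points) × (bin width). A sketch of that scaling, assuming the same illustrative centered bins of width 0.1; in Excel the equivalent would be, e.g., a column of NORMDIST values at the bin centers multiplied by COUNT of the data and by the bin width, plotted as a line on the histogram chart:

```python
import math
from collections import Counter
from statistics import NormalDist, mean, stdev

d1 = [6.5, 6, 6.7, 7, 6.3, 8.6, 5.5, 7, 7, 7, 6.5, 6.8, 7, 6, 6]
d2 = [3, 4, 5.5, 3.8, 4.5, 5.8, 4, 3.5, 4.5, 5, 4, 3.2, 3.6, 4.5, 2.8]
d = [(a - b) / a for a, b in zip(d1, d2)]  # normalized variable D

width = 0.1  # illustrative bin width; bin centers are multiples of width
counts = Counter(math.floor(v / width + 0.5) for v in d)
fit = NormalDist(mean(d), stdev(d))  # Gaussian with the estimated mu, sigma

# Expected count in a bin is approximately N * width * pdf(bin center)
n = len(d)
for k in range(min(counts), max(counts) + 1):
    center = k * width
    expected = n * width * fit.pdf(center)
    print(f"{center:.2f}  observed={counts[k]:2d}  fitted={expected:5.2f}")
```

Without the n·width factor the curve would be drawn on a density scale and look far too flat next to bars showing counts.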
 
