Fitting data to a Gaussian Distribution in Excel

In summary: This can happen if there is not enough precision in the calculations. To prevent this from happening, you might want to try using the Random Number Generator with your mean and standard deviation to generate data that would fit to a Gaussian, and then plot that over your original data.
  • #1
QuantumCurt
Education Advisor
726
166
Hey everyone,

I'm hoping I can get some input on this. As some of you may recall, I'm currently in an internship at Fermilab, and I've hit a snag in plotting my data. I'm measuring the intensity of the NuMI neutrino beam at multiple different points to look for discrepancies in the measured intensity and plotting them graphically. Essentially I'm taking the measured intensity and creating normal distributions, then finding the difference between the normal distributions and plotting the differences as normal distributions. I'm good up to this point, but now I need to take these differences and fit them to a Gaussian Distribution. This apparently isn't something that Excel has built in, but can be done using the proper procedures.

I've been following http://homepages.gac.edu/~smiller3/courses/371-f14/fitting.pdf which has worked very well up to a point. I'm stuck on step 8. I'm trying to optimize the distribution using the Solver function in Excel 2013, and I keep getting a division by zero error for my R² value, and I can't figure out why. Could anyone provide some input or direct me toward a better tutorial for doing this? It would be very appreciated! :smile:
 
Last edited by a moderator:
Physics news on Phys.org
  • #2
Can you share the data set which is giving you trouble, or is it too big?
 
  • Like
Likes QuantumCurt
  • #3
I am not familiar with Solver and this is a quess but If the optimization algorithm in the Solver function is using some sort of gradient search procedure you have to make sure that the differences of quantities used by or generated the algorithm do not equal zero. this can happen if there is not enough precision in the calculations.
 
  • Like
Likes QuantumCurt
  • #4
The data set isn't that large. What I did was take the initial measured intensity and the measured intensity in one of the toroids. I plotted normal distributions, and found the difference between them. Then I took the difference and plotted a new normal distribution. The x-axis is the change in measured intensity, and the y-axis is the count frequency of each intensity difference. My bins are set up from 4 standard deviations below the mean and are in steps of one half of a standard deviation up to 4.5 standard deviations above the mean.

Here's a workbook containing the data. If I should post it in a different format, let me know.

Mu and sigma² were just arbitrary numbers, the optimizer should correct them appropriately. R² was calculated using the sums of my residuals and squares, which are both at the bottoms of their respective columns.
 

Attachments

  • NuMIBeamIntensityGaussian.xlsx
    10.3 KB · Views: 1,423
  • #5
gleem said:
I am not familiar with Solver and this is a quess but If the optimization algorithm in the Solver function is using some sort of gradient search procedure you have to make sure that the differences of quantities used by or generated the algorithm do not equal zero. this can happen if there is not enough precision in the calculations.

This is actually my first time using it, so I'm not very familiar with it either. What you're saying is quite possible.
 
  • #6
I have known data sets and associated fitting functions that required 128 bit precision to avoid divide overflows.
 
  • #7
I'm wondering if I should use the Random Number Generator with my mean and standard deviation to generate data that would fit to a Gaussian, and then plot that over my original data.

Since the data itself does not fit a Gaussian, but is close to a Gaussian, it seems like a plausible option. And it would be very simple. Any thoughts?
 
  • #8
Okay, never mind. That didn't really work out at all...lol
 
  • #9
I've figured out a bit more. I'm getting a division by zero error because my sigma² is somehow becoming zero when I try to optimize the data for the Gaussian fit. On the left is the data after being optimized, and on the right is all of the original data prior to optimization. I traced the error out and it's because the sigma² became zero, which doesn't make sense, because it definitely should not be zero as far as I can tell. I even tried bringing it out to 18 decimal places and it was still just zero.

Anyone have any other thoughts on this?

zeroerror.PNG
 
  • #10
QuantumCurt said:
I've figured out a bit more. I'm getting a division by zero error because my sigma² is somehow becoming zero when I try to optimize the data for the Gaussian fit. On the left is the data after being optimized, and on the right is all of the original data prior to optimization. I traced the error out and it's because the sigma² became zero, which doesn't make sense, because it definitely should not be zero as far as I can tell. I even tried bringing it out to 18 decimal places and it was still just zero.

Anyone have any other thoughts on this?

I've taken a look at your data and the values of μ and σ you are using for your trial Gaussian fit.

It occurs to me that your starting values for μ and σ lie outside the range of the x values for your sample data set. The μ value represents the mean value of x, i.e. where the Gaussian curve's peak is located. This value must lie somewhere between x = -0.01799295 and x = 0.03830881. Similarly, the value of σ controls if the Gaussian curve ir relatively broad or narrow. For a typical Gaussian curve, a distance of 3σ on each side of x = μ should encompass at least 99% of the area under the Gaussian curve, so if you took 6σ = 0.03830881 - (-0.01799295) = 0.05630176, then σ ≈ 0.009383627. I would try these estimates of μ and σ when starting the optimization process. Some scaling of the y-values will probably be required as well.

This diagram shows how a Gaussian curve changes with different values of μ and σ:

normal_pdf.png
 
Last edited:
  • Like
Likes QuantumCurt
  • #11
Sorry for the late reply. Through a combination of your advice and utilizing Chi² within my calculation, I think I've managed to figure this out.

It didn't even occur to me that the values of mu and sigma² I was using weren't within the range, so that was a large part of my problem.

I think I've got this issue sorted out now. Thanks for the help!
 
  • #12
QuantumCurt said:
Sorry for the late reply. Through a combination of your advice and utilizing Chi² within my calculation, I think I've managed to figure this out.

It didn't even occur to me that the values of mu and sigma² I was using weren't within the range, so that was a large part of my problem.

I think I've got this issue sorted out now. Thanks for the help!

You got to remember non-linear least squares always needs accurate initial guesses. I usually graph the function with my initial guesses and data to make sure the curve is close.
 

What is a Gaussian distribution?

A Gaussian distribution, also known as a normal distribution, is a statistical distribution that is shaped like a bell curve. It is commonly used to model natural phenomena, such as human height or IQ, as well as errors in scientific measurements.

Why would I want to fit data to a Gaussian distribution in Excel?

Fitting data to a Gaussian distribution in Excel can help you analyze and understand your data better. It can also help you make predictions and identify any anomalies or outliers in your data.

How do I fit data to a Gaussian distribution in Excel?

To fit data to a Gaussian distribution in Excel, you can use the "Data Analysis" tool. First, select your data and open the "Data Analysis" tool. Then, choose "Histogram" as the analysis type and select the option for "Normal distribution" in the "Input Range" field. Click "OK" and Excel will generate a histogram and a best-fit line for your data.

What do the parameters mean in a Gaussian distribution curve?

The Gaussian distribution curve has two main parameters: mean and standard deviation. The mean represents the central value of the distribution, where the curve reaches its peak. The standard deviation represents the spread of the data points around the mean. A larger standard deviation indicates a wider spread of data points, while a smaller standard deviation indicates a narrower spread.

How do I interpret the results of a Gaussian distribution fit in Excel?

The results of a Gaussian distribution fit in Excel can be interpreted by looking at the histogram and the best-fit curve. If the curve closely matches the shape of the histogram, then the data is likely to be normally distributed. However, if there are significant differences between the curve and the histogram, it may indicate that the data does not follow a Gaussian distribution.

Similar threads

  • General Math
Replies
6
Views
782
  • MATLAB, Maple, Mathematica, LaTeX
Replies
31
Views
4K
  • MATLAB, Maple, Mathematica, LaTeX
Replies
6
Views
4K
  • Other Physics Topics
Replies
5
Views
1K
  • MATLAB, Maple, Mathematica, LaTeX
Replies
14
Views
2K
  • STEM Educators and Teaching
Replies
5
Views
646
  • MATLAB, Maple, Mathematica, LaTeX
Replies
6
Views
3K
  • MATLAB, Maple, Mathematica, LaTeX
Replies
5
Views
4K
  • Set Theory, Logic, Probability, Statistics
Replies
10
Views
2K
  • Set Theory, Logic, Probability, Statistics
Replies
28
Views
3K
Back
Top