Fitting data to a Gaussian Distribution in Excel

Click For Summary

Discussion Overview

The discussion revolves around fitting data to a Gaussian distribution using Excel, specifically addressing challenges encountered during the optimization process with the Solver function. Participants explore the steps involved in plotting measured intensity data, calculating differences, and fitting these differences to a Gaussian model.

Discussion Character

  • Technical explanation
  • Debate/contested
  • Mathematical reasoning

Main Points Raised

  • One participant describes their process of measuring neutrino beam intensity and creating normal distributions, but encounters a division by zero error when using the Solver function to optimize the Gaussian fit.
  • Another participant suggests that the division by zero error may arise from the optimization algorithm requiring non-zero differences in calculations, potentially due to insufficient precision.
  • There is a proposal to use a Random Number Generator to create data that fits a Gaussian distribution, although this approach is later dismissed by the original poster.
  • A participant points out that the starting values for the mean (μ) and variance (σ²) used in the Gaussian fit may be outside the range of the data, which could lead to issues during optimization.
  • One participant emphasizes the importance of accurate initial guesses for non-linear least squares fitting and suggests graphing the function with initial guesses to ensure a close fit to the data.

Areas of Agreement / Disagreement

Participants express varying levels of familiarity with the Solver function and optimization techniques, leading to multiple suggestions and hypotheses about the source of the division by zero error. The discussion does not reach a consensus on a single solution, as participants explore different aspects of the problem.

Contextual Notes

Some limitations include the potential for errors due to the precision of calculations and the necessity for initial parameter estimates to be within the data range for effective optimization.

QuantumCurt
Education Advisor
Messages
726
Reaction score
166
Hey everyone,

I'm hoping I can get some input on this. As some of you may recall, I'm currently in an internship at Fermilab, and I've hit a snag in plotting my data. I'm measuring the intensity of the NuMI neutrino beam at multiple different points to look for discrepancies in the measured intensity and plotting them graphically. Essentially I'm taking the measured intensity and creating normal distributions, then finding the difference between the normal distributions and plotting the differences as normal distributions. I'm good up to this point, but now I need to take these differences and fit them to a Gaussian Distribution. This apparently isn't something that Excel has built in, but can be done using the proper procedures.

I've been following http://homepages.gac.edu/~smiller3/courses/371-f14/fitting.pdf which has worked very well up to a point. I'm stuck on step 8. I'm trying to optimize the distribution using the Solver function in Excel 2013, and I keep getting a division by zero error for my R² value, and I can't figure out why. Could anyone provide some input or direct me toward a better tutorial for doing this? It would be very appreciated! :smile:
 
Last edited by a moderator:
Physics news on Phys.org
Can you share the data set which is giving you trouble, or is it too big?
 
  • Like
Likes   Reactions: QuantumCurt
I am not familiar with Solver and this is a quess but If the optimization algorithm in the Solver function is using some sort of gradient search procedure you have to make sure that the differences of quantities used by or generated the algorithm do not equal zero. this can happen if there is not enough precision in the calculations.
 
  • Like
Likes   Reactions: QuantumCurt
The data set isn't that large. What I did was take the initial measured intensity and the measured intensity in one of the toroids. I plotted normal distributions, and found the difference between them. Then I took the difference and plotted a new normal distribution. The x-axis is the change in measured intensity, and the y-axis is the count frequency of each intensity difference. My bins are set up from 4 standard deviations below the mean and are in steps of one half of a standard deviation up to 4.5 standard deviations above the mean.

Here's a workbook containing the data. If I should post it in a different format, let me know.

Mu and sigma² were just arbitrary numbers, the optimizer should correct them appropriately. R² was calculated using the sums of my residuals and squares, which are both at the bottoms of their respective columns.
 

Attachments

gleem said:
I am not familiar with Solver and this is a quess but If the optimization algorithm in the Solver function is using some sort of gradient search procedure you have to make sure that the differences of quantities used by or generated the algorithm do not equal zero. this can happen if there is not enough precision in the calculations.

This is actually my first time using it, so I'm not very familiar with it either. What you're saying is quite possible.
 
I have known data sets and associated fitting functions that required 128 bit precision to avoid divide overflows.
 
I'm wondering if I should use the Random Number Generator with my mean and standard deviation to generate data that would fit to a Gaussian, and then plot that over my original data.

Since the data itself does not fit a Gaussian, but is close to a Gaussian, it seems like a plausible option. And it would be very simple. Any thoughts?
 
Okay, never mind. That didn't really work out at all...lol
 
I've figured out a bit more. I'm getting a division by zero error because my sigma² is somehow becoming zero when I try to optimize the data for the Gaussian fit. On the left is the data after being optimized, and on the right is all of the original data prior to optimization. I traced the error out and it's because the sigma² became zero, which doesn't make sense, because it definitely should not be zero as far as I can tell. I even tried bringing it out to 18 decimal places and it was still just zero.

Anyone have any other thoughts on this?

zeroerror.PNG
 
  • #10
QuantumCurt said:
I've figured out a bit more. I'm getting a division by zero error because my sigma² is somehow becoming zero when I try to optimize the data for the Gaussian fit. On the left is the data after being optimized, and on the right is all of the original data prior to optimization. I traced the error out and it's because the sigma² became zero, which doesn't make sense, because it definitely should not be zero as far as I can tell. I even tried bringing it out to 18 decimal places and it was still just zero.

Anyone have any other thoughts on this?

I've taken a look at your data and the values of μ and σ you are using for your trial Gaussian fit.

It occurs to me that your starting values for μ and σ lie outside the range of the x values for your sample data set. The μ value represents the mean value of x, i.e. where the Gaussian curve's peak is located. This value must lie somewhere between x = -0.01799295 and x = 0.03830881. Similarly, the value of σ controls if the Gaussian curve ir relatively broad or narrow. For a typical Gaussian curve, a distance of 3σ on each side of x = μ should encompass at least 99% of the area under the Gaussian curve, so if you took 6σ = 0.03830881 - (-0.01799295) = 0.05630176, then σ ≈ 0.009383627. I would try these estimates of μ and σ when starting the optimization process. Some scaling of the y-values will probably be required as well.

This diagram shows how a Gaussian curve changes with different values of μ and σ:

normal_pdf.png
 
Last edited:
  • Like
Likes   Reactions: QuantumCurt
  • #11
Sorry for the late reply. Through a combination of your advice and utilizing Chi² within my calculation, I think I've managed to figure this out.

It didn't even occur to me that the values of mu and sigma² I was using weren't within the range, so that was a large part of my problem.

I think I've got this issue sorted out now. Thanks for the help!
 
  • #12
QuantumCurt said:
Sorry for the late reply. Through a combination of your advice and utilizing Chi² within my calculation, I think I've managed to figure this out.

It didn't even occur to me that the values of mu and sigma² I was using weren't within the range, so that was a large part of my problem.

I think I've got this issue sorted out now. Thanks for the help!

You got to remember non-linear least squares always needs accurate initial guesses. I usually graph the function with my initial guesses and data to make sure the curve is close.
 

Similar threads

  • · Replies 6 ·
Replies
6
Views
1K
  • · Replies 6 ·
Replies
6
Views
5K
  • · Replies 31 ·
2
Replies
31
Views
5K
  • · Replies 5 ·
Replies
5
Views
5K
  • · Replies 14 ·
Replies
14
Views
4K
  • · Replies 5 ·
Replies
5
Views
3K
  • · Replies 1 ·
Replies
1
Views
3K
  • · Replies 10 ·
Replies
10
Views
2K
  • · Replies 6 ·
Replies
6
Views
6K
Replies
28
Views
4K