I need to fit a tail-heavy Gaussian curve

SUMMARY

This discussion focuses on fitting a tail-heavy, Gaussian-like curve to scattering data using a least-squares approach. The user, Grogs, is struggling with the fit, as the Gaussian function does not adequately represent the data beyond about 1.5 standard deviations from the mean. Suggestions include alternative functions such as the double exponential, and a mixture (sum) of two Gaussian distributions with different widths, which the EM algorithm can separate. The importance of conducting a normality test, such as the Anderson-Darling test, is emphasized to assess the underlying distribution of the data.

PREREQUISITES
  • Understanding of Gaussian functions and their parameters
  • Familiarity with least squares fitting techniques
  • Knowledge of statistical normality tests, specifically the Anderson-Darling test
  • Experience with data modeling and analysis in a physics context
NEXT STEPS
  • Research the double exponential function for fitting tail-heavy distributions
  • Learn about the EM algorithm for identifying mixtures in data
  • Explore mixtures of two Gaussian distributions (a narrow peak plus a broad component) for complex data fitting
  • Investigate statistical software options like JMP for conducting normality tests
USEFUL FOR

Researchers and data analysts in physics, particularly those working with scattering data and seeking to improve model fitting techniques for non-standard distributions.

Grogs
I need to fit a tail-heavy "Gaussian" curve

Hi, it's been a long time since I've been around PF.



Homework Statement



This isn't a homework problem per se, but I've been trying to fit some scattering data with a Gaussian function using a least-squares approach, and it's not working so well. Doing the fit is no problem, but the Gaussian doesn't follow the data very well. The agreement is good within ~1.5 standard deviations of the mean, but the data are too tail-heavy and the agreement is lousy beyond that.

I need to find a function that will fit the data better. Something with a shape parameter that would let me vary the kurtosis (tail heaviness) would be ideal, but I can't think of anything that fits the bill. I'm hoping that there's a stats whiz around who can point me in the right direction.


Homework Equations



My Gaussian fit function: f(\theta) = A\exp\left(-\frac{\theta^{2}}{2s^{2}}\right) where A and s are the fit parameters (mean = 0).

I also tried using: f(\theta) = B + A\exp\left(-\frac{\theta^{2}}{2s^{2}}\right)
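One quick diagnostic, sketched here under the assumption that every count is positive: for the zero-mean Gaussian above, \ln f = \ln A - \theta^{2}/2s^{2} is a straight line in \theta^{2}, so an ordinary least-squares line through the points (\theta^{2}, \ln f) recovers A and s, and heavy tails show up as residuals that rise systematically at large \theta^{2}. Fitting in log space weights the tails much more heavily than a direct least-squares fit on the counts, so its parameters will generally differ from a nonlinear fit. The function name `fit_log_gaussian` is hypothetical.

```python
import math

def fit_log_gaussian(thetas, counts):
    """Fit f(theta) = A*exp(-theta^2 / (2 s^2)) by ordinary least squares
    on ln f versus theta^2. Assumes every count is positive."""
    xs = [t * t for t in thetas]
    ys = [math.log(c) for c in counts]
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx                   # equals -1 / (2 s^2) for a true Gaussian
    intercept = y_bar - slope * x_bar   # equals ln A
    if slope >= 0:
        raise ValueError("ln f does not decrease with theta^2; not Gaussian-like")
    A = math.exp(intercept)
    s = math.sqrt(-1.0 / (2.0 * slope))
    # Log-space residuals: a systematic rise at large theta^2 signals heavy tails.
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    return A, s, residuals

# Usage with synthetic, noise-free Gaussian data (A = 100, s = 2).
thetas = [0.5 * k for k in range(-10, 11)]
counts = [100.0 * math.exp(-t * t / 8.0) for t in thetas]
A, s, res = fit_log_gaussian(thetas, counts)
print(round(A, 6), round(s, 6))
```

On real detector data, plotting `res` against \theta^{2} makes the departure beyond ~1.5 standard deviations visible at a glance.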

The Attempt at a Solution



I tried adding a constant to the Gaussian fit, but then the fit ends up being too large at the tails.

TIA,

Grogs
 


Some questions:

1. What is the mean? That you tried a zero mean fit and then a non-zero mean fit suggests you aren't thinking enough about your model.

2. Can the data ever be negative? For example, time-to-failure. Things don't fail before they are made; time-to-failure is inherently non-negative.

3. What made you think the underlying distribution might be gaussian?

4. Have you done a normality test? Testing whether a distribution is normal is a fairly simple procedure. For example, the Anderson-Darling test.
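Outside a statistics package, the Anderson-Darling statistic itself is short to compute. This is a minimal sketch, assuming the angles are individual samples (not binned counts), treating the mean as known to be zero, and estimating the width from the data. Rather than quoting critical values (which depend on which parameters were estimated), it simply contrasts a Gaussian sample with a heavy-tailed Laplace sample; larger A^{2} means a worse match to the normal. The helper names are hypothetical.

```python
import math
import random

def normal_cdf(z):
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def normal_sf(z):
    """Upper tail 1 - Phi(z), computed directly to avoid cancellation."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def anderson_darling_normal(samples, mean=0.0):
    """A^2 for a normal with known mean and width estimated from the data."""
    n = len(samples)
    sigma = math.sqrt(sum((x - mean) ** 2 for x in samples) / n)
    z = sorted((x - mean) / sigma for x in samples)
    total = 0.0
    for i in range(n):
        # (2i - 1) with 1-based i becomes (2i + 1) with 0-based i.
        total += (2 * i + 1) * (math.log(normal_cdf(z[i])) + math.log(normal_sf(z[n - 1 - i])))
    return -n - total / n

rng = random.Random(1)
gauss = [rng.gauss(0.0, 1.0) for _ in range(5000)]
# Laplace (double-exponential) sample: exponential magnitude with a random sign.
laplace = [rng.expovariate(1.0) * rng.choice((-1.0, 1.0)) for _ in range(5000)]
a_gauss = anderson_darling_normal(gauss)
a_laplace = anderson_darling_normal(laplace)
print(f"A^2 Gaussian sample: {a_gauss:.2f}")
print(f"A^2 Laplace sample:  {a_laplace:.2f}")
```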
 


Hi, thanks for the quick response.

D H said:
Some questions:

1. What is the mean? That you tried a zero mean fit and then a non-zero mean fit suggests you aren't thinking enough about your model.

The mean is zero. The data I'm modeling is essentially the angle a neutron scatters in the x-y plane, which is going to be a symmetric function. f(theta) is the number of counts I get in a detector at angle theta. The incoming particles I modeled are all traveling along the x-axis, so that's why I know the mean scattering angle is 0. Adding the constant shouldn't affect the mean angle, just the magnitude, right?

D H said:
Can the data ever be negative? For example, time-to-failure. Things don't fail before they are made; time-to-failure is inherently non-negative.

Yes it can. The scattering angle can be either positive or negative in this case. I could just invoke symmetry and fit |theta| without making much of a difference if I need to though.

D H said:
What made you think the underlying distribution might be gaussian?

I read through a lot of previous work of this type in journal articles during my literature review (this work is part of my dissertation) and they typically used a Gaussian to fit the scattering function. My detector geometry is a bit different than anything I found in the literature review though, which I suspect is what is causing the non-Normality.

D H said:
Have you done a normality test? Testing whether a distribution is normal is a fairly simple procedure. For example, the Anderson-Darling test.

Not yet. I can run the data through JMP tomorrow when I get back to my work and see what it tells me, but I can pretty well tell by inspection that it's not Normally distributed in the tails. Values at 5 sigma are something like .001 of the maximum, which is way too high for a Gaussian. That's why I decided to see if I could find something other than a Gaussian that would fit it better.
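The "way too high" claim is easy to quantify. A Gaussian normalized to its peak is \exp(-\theta^{2}/2s^{2}), which at \theta = 5s is \exp(-12.5), about 3.7 \times 10^{-6}. Taking the rough 0.001 figure quoted above at face value, the data sit a few hundred times above a Gaussian there. A minimal check:

```python
import math

def gaussian_peak_ratio(n_sigma):
    """Value of a zero-mean Gaussian at n_sigma standard deviations, relative to its peak."""
    return math.exp(-0.5 * n_sigma ** 2)

gauss_5 = gaussian_peak_ratio(5.0)
observed_5 = 1e-3  # rough value quoted in the post ("something like .001")
print(f"Gaussian at 5 sigma: {gauss_5:.3e}")
print(f"observed / Gaussian: {observed_5 / gauss_5:.0f}")
```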

Thanks again,

Grogs
 


Grogs said:
Adding the constant shouldn't affect the mean angle, just the magnitude, right?
Whoa! I misread your f(\theta) = B + A\exp\left(-\frac{\theta^{2}}{2s^{2}}\right). That says there is a non-zero intensity at all angles. I assume that even with your broad tails, the intensity eventually tails off to zero. So that non-zero B is inviting some non-physics into your model.

Grogs said:
I read through a lot of previous work of this type in journal articles during my literature review (this work is part of my dissertation) and they typically used a Gaussian to fit the scattering function. My detector geometry is a bit different than anything I found in the literature review though, which I suspect is what is causing the non-Normality.
One possibility: See if you find a mapping that maps your detector geometry to something a bit more "normal" (i.e., what you found in the literature), do your statistics in that space, and then translate back to your geometry.

In your space, that would translate to some weird member of the exponential family,

f(\theta) = A\exp(-g(|\theta|))

Another possibility is the double exponential:

f(\theta) = A\exp\left(-\,\frac{|\theta|} b\right)

Note that the absolute values above force the distribution to be symmetric (which makes sense given the description of your setup).
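Both forms above fit one family that has the kurtosis knob asked for in the original post. Taking g(|\theta|) = (|\theta|/s)^{p} gives what is usually called the generalized normal (or exponential power) distribution: p = 2 is the Gaussian, p = 1 is the double exponential, and p < 2 gives heavier tails. This is a suggestion layered on top of the thread, not something either poster proposed, and the helper names are hypothetical. For this family the excess kurtosis is \Gamma(5/p)\Gamma(1/p)/\Gamma(3/p)^{2} - 3, computed below:

```python
import math

def gen_normal(theta, A, s, p):
    """f(theta) = A * exp(-(|theta|/s)**p); p = 2 is Gaussian (s = sqrt(2)*sigma), p = 1 is the double exponential."""
    return A * math.exp(-((abs(theta) / s) ** p))

def excess_kurtosis(p):
    """Excess kurtosis of the generalized normal with shape p (0 for a Gaussian)."""
    return math.gamma(5.0 / p) * math.gamma(1.0 / p) / math.gamma(3.0 / p) ** 2 - 3.0

for p in (2.0, 1.5, 1.0, 0.8):
    print(f"p = {p}: excess kurtosis = {excess_kurtosis(p):.3f}")
```

In a least-squares fit, p would simply be a third free parameter alongside A and s.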
 


Thanks, D H. I'll take a look at the exponential and see if it does any better. Another thought that occurred to me was that what I'm looking at may be the convolution of two different Gaussians - one with a small maximum and a very broad distribution and a second with a very large peak and a narrow distribution.

Switching back to the previous geometries isn't really an option, unfortunately. They were using a 2D surface tally and I'm using an array of volumetric ones in a cylindrical configuration. I also have a time window on the detectors, i.e., the particles have to arrive at a certain time to be counted, which complicates things as well. It's just too hard to do analytically, which is why we've had to resort to using a trusted simulation program to find the right answer and then develop an empirical model. You are right that there does need to be some reasonable explanation of why a particular fit works, though. Otherwise I could just run a cubic spline through the data, get a great fit, and move on. I'm hoping that the form of the fit will at least give me a hint of the underlying physics.
 


Grogs said:
Another thought that occurred to me was that what I'm looking at may be the convolution of two different Gaussians - one with a small maximum and a very broad distribution and a second with a very large peak and a narrow distribution.
A mixture.
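The distinction matters for the tails. Convolving two zero-mean Gaussians of widths s_1 and s_2 gives another Gaussian of width \sqrt{s_1^{2} + s_2^{2}}, so a convolution cannot produce excess kurtosis. A mixture, f(\theta) = A_1\exp(-\theta^{2}/2s_1^{2}) + A_2\exp(-\theta^{2}/2s_2^{2}), can: with area weights w_k and variances v_k, its excess kurtosis is 3\sum w_k v_k^{2}/(\sum w_k v_k)^{2} - 3, which is positive whenever the widths differ. A small sketch of both facts (helper names and numbers are illustrative assumptions):

```python
import math

def two_gaussian_mixture(theta, A1, s1, A2, s2):
    """Narrow + broad zero-mean Gaussians added together (a mixture, not a convolution)."""
    return A1 * math.exp(-theta ** 2 / (2 * s1 ** 2)) + A2 * math.exp(-theta ** 2 / (2 * s2 ** 2))

def mixture_excess_kurtosis(A1, s1, A2, s2):
    """Excess kurtosis of the normalized mixture; area weights are proportional to A_k * s_k."""
    w1, w2 = A1 * s1, A2 * s2
    total = w1 + w2
    w1, w2 = w1 / total, w2 / total
    m2 = w1 * s1 ** 2 + w2 * s2 ** 2          # second moment
    m4 = 3 * (w1 * s1 ** 4 + w2 * s2 ** 4)    # fourth moment (each Gaussian contributes 3 s^4)
    return m4 / m2 ** 2 - 3.0

def convolved_width(s1, s2):
    """Width of the Gaussian obtained by convolving two zero-mean Gaussians."""
    return math.hypot(s1, s2)

# Tall narrow peak plus a low, broad pedestal (illustrative numbers only).
print(mixture_excess_kurtosis(A1=1.0, s1=1.0, A2=0.01, s2=5.0))
print(convolved_width(1.0, 5.0))
```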

The EM algorithm is very good at pulling out mixtures. Maybe a bit overkill, though.
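For a symmetric problem like this one, EM for a two-component mixture is fairly short. This is a minimal sketch, not anything from the thread: it assumes both components have zero mean, lets each angle carry a count of events, and updates only the mixing weight and the two variances. The initial values and fixed iteration count are arbitrary choices; in practice one would stop once the log-likelihood stops improving. The function name `em_zero_mean_mixture` is hypothetical.

```python
import math
import random

def normal_pdf(x, var):
    return math.exp(-x * x / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def em_zero_mean_mixture(angles, counts, var_init=(0.5, 10.0), w_init=0.5, iters=150):
    """EM for w*N(0, v1) + (1 - w)*N(0, v2), with each angle weighted by its count."""
    v1, v2 = var_init
    w = w_init
    n_total = float(sum(counts))
    for _ in range(iters):
        # E-step: responsibility r of component 1 for each angle, accumulated with counts.
        s_r = s_r_x2 = s_x2 = 0.0
        for x, c in zip(angles, counts):
            p1 = w * normal_pdf(x, v1)
            p2 = (1.0 - w) * normal_pdf(x, v2)
            r = p1 / (p1 + p2)
            s_r += c * r
            s_r_x2 += c * r * x * x
            s_x2 += c * x * x
        # M-step: update the weight and both variances (means stay fixed at zero).
        w = s_r / n_total
        v1 = s_r_x2 / s_r
        v2 = (s_x2 - s_r_x2) / (n_total - s_r)
    return w, math.sqrt(v1), math.sqrt(v2)

# Synthetic check: 80% narrow (sigma = 1) and 20% broad (sigma = 5), one count per sample.
rng = random.Random(0)
samples = [rng.gauss(0.0, 1.0) if rng.random() < 0.8 else rng.gauss(0.0, 5.0) for _ in range(6000)]
w, s1, s2 = em_zero_mean_mixture(samples, [1] * len(samples))
print(f"narrow weight {w:.3f}, widths {s1:.3f} and {s2:.3f}")
```

For binned detector data, `angles` would be bin centers and `counts` the tallies; binning inflates each variance by roughly (bin width)^{2}/12.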
 
