I need to fit a tail-heavy Gaussian curve

Grogs · Dec 3, 2009


Hi, it's been a long time since I've been around PF.

Homework Statement

This isn't a homework problem per se, but I've been trying to fit some scattering data with a Gaussian function using a least-squares approach, and it's not working so well. Doing the fit is no problem, but the Gaussian doesn't follow the data very well. The agreement is good within ~1.5 standard deviations of the mean, but the data are too tail-heavy and the agreement is lousy beyond that.

I need to find a function that will fit the data better. Something with a scaling parameter that would let me vary the kurtosis (tail heaviness) would be ideal, but I can't think of anything that fits the bill. I'm hoping that there's a stats whiz around who can point me in the right direction.

Homework Equations

My Gaussian fit function: f(\theta) = A\exp(-\theta^{2}/(2s^{2})), where A and s are the fit parameters (mean = 0).

I also tried using: f(\theta) = B + A\exp(-\theta^{2}/(2s^{2}))

The Attempt at a Solution

I tried adding a constant to the Gaussian fit, but then the fit ends up being too large at the tails.
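For reference, here is a minimal sketch of the two fits described above. It assumes SciPy's `curve_fit` for the least-squares step, and it uses made-up synthetic counts (a narrow peak plus a weak, broad component) in place of the real scattering data, so the numbers are illustrative only.

```python
import numpy as np
from scipy.optimize import curve_fit

def gaussian(theta, A, s):
    """Zero-mean Gaussian: A * exp(-theta^2 / (2 s^2))."""
    return A * np.exp(-theta**2 / (2.0 * s**2))

def gaussian_plus_const(theta, A, s, B):
    """Same Gaussian with a constant offset B."""
    return B + gaussian(theta, A, s)

# Hypothetical stand-in for the detector counts: a narrow peak plus a
# weak, broad component, which gives heavier-than-Gaussian tails.
theta = np.linspace(-10.0, 10.0, 201)
counts = gaussian(theta, 1000.0, 1.0) + gaussian(theta, 10.0, 4.0)

# Unweighted least-squares fits of both model functions.
popt_g, _ = curve_fit(gaussian, theta, counts, p0=[counts.max(), 1.0])
popt_c, _ = curve_fit(gaussian_plus_const, theta, counts,
                      p0=[counts.max(), 1.0, 0.0])

# Compare model and data beyond |theta| = 5.
tail = np.abs(theta) > 5.0
print("Gaussian fit (A, s):", popt_g)
print("Gaussian + constant fit (A, s, B):", popt_c)
print("largest model/data ratio in the tails (plain Gaussian):",
      np.max(gaussian(theta[tail], *popt_g) / counts[tail]))
```

With data of this kind, the plain Gaussian falls well below the counts in the tails, while the constant offset cannot decay with angle.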

TIA,

Grogs

D H · Dec 3, 2009

Some questions:

1. What is the mean? That you tried a zero mean fit and then a non-zero mean fit suggests you aren't thinking enough about your model.

2. Can the data ever be negative? For example, time-to-failure. Things don't fail before they are made; time-to-failure is inherently non-negative.

3. What made you think the underlying distribution might be gaussian?

4. Have you done a normality test? Testing whether a distribution is normal is a fairly simple procedure. For example, the Anderson-Darling test.
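A rough sketch of what such a test could look like in Python, assuming SciPy's `scipy.stats.anderson` and assuming the raw per-event values are available (the test works on samples, not on binned counts). The sample here is synthetic and heavy-tailed by construction (Student's t with 3 degrees of freedom), purely for illustration.

```python
import numpy as np
from scipy import stats

# Hypothetical heavy-tailed sample standing in for per-event measurements.
rng = np.random.default_rng(0)
sample = rng.standard_t(df=3, size=5000)

# Anderson-Darling test against the normal distribution.
result = stats.anderson(sample, dist="norm")

# For dist="norm" the critical values correspond to significance levels
# of 15, 10, 5, 2.5 and 1 percent, so the last entry is the 1% level.
crit_1pct = result.critical_values[-1]
reject_normality = result.statistic > crit_1pct

print("A-D statistic:", result.statistic)
print("1% critical value:", crit_1pct)
print("reject normality at 1%:", reject_normality)
```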

Grogs · Dec 3, 2009

Hi, thanks for the quick response.

D H said:

Some questions:

1. What is the mean? That you tried a zero mean fit and then a non-zero mean fit suggests you aren't thinking enough about your model.

The mean is zero. The data I'm modeling is essentially the angle a neutron scatters in the x-y plane, which is going to be a symmetric function. f(theta) is the number of counts I get in a detector at angle theta. The incoming particles I modeled are all traveling along the x-axis, so that's why I know the mean scattering angle is 0. Adding the constant shouldn't affect the mean angle, just the magnitude, right?

D H said:

Can the data ever be negative? For example, time-to-failure. Things don't fail before they are made; time-to-failure is inherently non-negative.

Yes it can. The scattering angle can be either positive or negative in this case. I could just invoke symmetry and fit |theta| without making much of a difference if I need to though.

D H said:

What made you think the underlying distribution might be gaussian?

I read through a lot of previous work of this type in journal articles during my literature review (this work is part of my dissertation) and they typically used a Gaussian to fit the scattering function. My detector geometry is a bit different than anything I found in the literature review though, which I suspect is what is causing the non-Normality.

D H said:

Have you done a normality test? Testing whether a distribution is normal is a fairly simple procedure. For example, the Anderson-Darling test.

Not yet. I can run the data through JMP tomorrow when I get back to work and see what it tells me, but I can pretty well tell by inspection that it's not Normally distributed in the tails. Values at 5 sigma are something like 0.001 of the maximum, which is way too high for a Gaussian. That's why I decided to see if I could find something other than a Gaussian that would fit it better.
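As a quick check of that claim: a Gaussian at 5 sigma is exp(-12.5), about 3.7e-6 of its peak. A short sketch of the arithmetic, taking the quoted 0.001 as a rough reading rather than a measured value:

```python
import math

n_sigma = 5.0
observed_ratio = 1e-3  # rough value quoted in the post, not a measurement

# Ratio f(5s) / f(0) for f(theta) = A * exp(-theta^2 / (2 s^2)).
gaussian_ratio = math.exp(-n_sigma**2 / 2.0)

# How many times larger the observed tail is than the Gaussian prediction.
excess = observed_ratio / gaussian_ratio

# Distance (in units of s) at which a Gaussian has dropped to observed_ratio.
sigma_at_observed = math.sqrt(-2.0 * math.log(observed_ratio))

print(f"Gaussian at 5 sigma: {gaussian_ratio:.3e} of the peak")
print(f"observed / Gaussian: {excess:.0f}")
print(f"Gaussian reaches 1e-3 at {sigma_at_observed:.2f} sigma")
```

So the observed tail is a couple of hundred times higher than a Gaussian with the same s would allow.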

Thanks again,

Grogs

D H · Dec 3, 2009

Grogs said:

Adding the constant shouldn't affect the mean angle, just the magnitude, right?

Whoa! I misread your f(\theta) = B + A\exp(-\theta^{2}/(2s^{2})). That says there is a non-zero intensity at all angles. I assume that even with your broad tails, the intensity eventually tails off to zero. So that non-zero B is inviting some non-physics into your model.

Grogs said:

I read through a lot of previous work of this type in journal articles during my literature review (this work is part of my dissertation) and they typically used a Gaussian to fit the scattering function. My detector geometry is a bit different than anything I found in the literature review though, which I suspect is what is causing the non-Normality.

One possibility: See if you can find a mapping that maps your detector geometry to something a bit more "normal" (i.e., what you found in the literature), do your statistics in that space, and then translate back to your geometry.

In your space, that would translate to some weird member of the exponential family,

f(\theta) = A\exp(-g(|\theta|))

Another possibility is the double exponential:

f(\theta) = A\exp\left(-\,\frac{|\theta|} b\right)

Note that the absolute value in the above forces the distribution to be symmetric (which makes sense given the description of your setup).
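One concrete choice of g with a tunable tail-heaviness parameter is the stretched-exponential (generalized normal) shape f(\theta) = A\exp(-(|\theta|/b)^{p}). Here p = 2 is a Gaussian with s = b/\sqrt{2}, p = 1 is the double exponential above, and p < 2 gives heavier-than-Gaussian tails. This form is not proposed in the thread; it is only an illustration. A rough Python sketch, assuming SciPy's `curve_fit` and synthetic noise-free data in place of the real counts:

```python
import numpy as np
from scipy.optimize import curve_fit

def stretched_exp(theta, A, b, p):
    """A * exp(-(|theta| / b)^p); p = 2 is Gaussian, p = 1 double exponential."""
    return A * np.exp(-(np.abs(theta) / b) ** p)

# Hypothetical data: an exact double-exponential shape (p = 1).
theta = np.linspace(-10.0, 10.0, 201)
counts = stretched_exp(theta, 1000.0, 2.0, 1.0)

# Bounds keep b and p positive so the power is always well defined.
popt, _ = curve_fit(
    stretched_exp, theta, counts,
    p0=[900.0, 1.5, 1.5],
    bounds=([0.0, 1e-6, 0.1], [np.inf, np.inf, 10.0]),
)
A_fit, b_fit, p_fit = popt
print("A, b, p =", A_fit, b_fit, p_fit)
```

On real data, the fitted p would indicate whether the shape is closer to Gaussian or to double exponential, though a good fit by itself does not explain the physics.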

Grogs · Dec 4, 2009

Thanks, D H. I'll take a look at the exponential and see if it does any better. Another thought that occurred to me was that what I'm looking at may be the convolution of two different Gaussians - one with a small maximum and a very broad distribution and a second with a very large peak and a narrow distribution.

Switching back to the previous geometries isn't really an option, unfortunately. They were using a 2D surface tally and I'm using an array of volumetric ones in a cylindrical configuration. I also have a time window on the detectors, i.e., the particles have to arrive at a certain time to be counted, which complicates things as well. It's just too hard to do analytically, which is why we've had to resort to using a trusted simulation program to find the right answer and then develop an empirical model. You are right that there does need to be some reasonable explanation of why a particular fit works, though. Otherwise I could just do something like a cubic spline, get a great fit, and move on. I'm hoping that the form of the fit will at least give me a hint of the underlying physics.

D H · Dec 4, 2009

Grogs said:

Another thought that occurred to me was that what I'm looking at may be the convolution of two different Gaussians - one with a small maximum and a very broad distribution and a second with a very large peak and a narrow distribution.

A mixture.

The EM algorithm is very good at pulling out mixtures. Maybe a bit overkill, though.
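To spell out the distinction: the sum of a narrow and a broad Gaussian is a mixture, whereas the convolution of two Gaussians is just another Gaussian (the variances add), so it could not produce heavy tails. Below is a minimal EM sketch for a mixture of two zero-mean Gaussians, assuming NumPy and per-event angles rather than binned counts. The function name, starting guesses, and synthetic sample are all hypothetical.

```python
import numpy as np

def em_two_zero_mean_gaussians(x, n_iter=200):
    """EM for w*N(0, s1^2) + (1-w)*N(0, s2^2); returns (w, s1, s2)."""
    s = np.std(x)
    w, s1, s2 = 0.5, 0.5 * s, 2.0 * s  # crude starting guesses
    for _ in range(n_iter):
        # E-step: probability that each point came from component 1.
        # The common 1/sqrt(2*pi) factor cancels in the ratio.
        p1 = w * np.exp(-0.5 * (x / s1) ** 2) / s1
        p2 = (1.0 - w) * np.exp(-0.5 * (x / s2) ** 2) / s2
        r = p1 / (p1 + p2)
        # M-step: update the weight and both widths (means fixed at 0).
        w = r.mean()
        s1 = np.sqrt(np.sum(r * x**2) / np.sum(r))
        s2 = np.sqrt(np.sum((1.0 - r) * x**2) / np.sum(1.0 - r))
    return w, s1, s2

# Synthetic sample: 90% narrow (s = 1) and 10% broad (s = 5).
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0.0, 1.0, 9000), rng.normal(0.0, 5.0, 1000)])

w, s_narrow, s_wide = em_two_zero_mean_gaussians(x)
print("weight of narrow component:", w)
print("narrow width:", s_narrow, "broad width:", s_wide)
```

If only binned counts are available, a least-squares fit of the sum of two zero-mean Gaussians to the counts is a simpler alternative to EM.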
