# I need to fit a tail-heavy Gaussian curve

In summary, Grogs is trying to fit scattering data with a zero-mean Gaussian by least squares, but the data are too tail-heavy: the fit agrees within about 1.5 standard deviations of the mean and is poor beyond that. Adding a constant offset made the fit too large in the tails. The alternatives discussed are a double exponential, a more general exponential-family form, and a mixture of a narrow and a broad Gaussian. An unusual detector geometry and a detector time window rule out reusing the setups from the literature, and Grogs wants a fit whose form has a reasonable physical explanation.
Grogs

Hi, it's been a long time since I've been around PF.

## Homework Statement

This isn't a homework problem per se, but I've been trying to fit some scattering data with a Gaussian function using a least squares approach and it's not working so well. Doing the fit is no problem, but the Gaussian doesn't follow the data very well. The agreement is good within ~1.5 standard deviations of the mean, but the data is too tail heavy and the agreement is lousy beyond that.

I need to find a function that will fit the data better. Something with a scaling parameter that would let me vary the kurtosis (tail heaviness) would be ideal, but I can't think of anything that fits the bill. I'm hoping that there's a stats whiz around who can point me in the right direction.

## Homework Equations

My Gaussian fit function: $f(\theta) = A\exp\left(-\theta^{2}/(2s^{2})\right)$ where A and s are the fit parameters (mean = 0).

I also tried using: $f(\theta) = B + A\exp\left(-\theta^{2}/(2s^{2})\right)$
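A minimal sketch of these two least-squares fits in Python, assuming SciPy's `curve_fit`. The data here are a synthetic stand-in (a narrow peak plus a weak broad component), not the actual detector counts, and the function names are illustrative only.

```python
import numpy as np
from scipy.optimize import curve_fit


def gaussian(theta, A, s):
    """Zero-mean Gaussian: A * exp(-theta^2 / (2 s^2))."""
    return A * np.exp(-theta**2 / (2.0 * s**2))


def gaussian_plus_const(theta, B, A, s):
    """Same Gaussian with a constant offset B."""
    return B + gaussian(theta, A, s)


# ASSUMPTION: synthetic stand-in for the counts, chosen to have heavy tails.
theta = np.linspace(-10.0, 10.0, 201)
counts = 1000.0 * np.exp(-theta**2 / 2.0) + 20.0 * np.exp(-theta**2 / 32.0)

# Least-squares fits; p0 holds rough initial guesses for the parameters.
(A1, s1), _ = curve_fit(gaussian, theta, counts, p0=[counts.max(), 1.0])
(B2, A2, s2), _ = curve_fit(gaussian_plus_const, theta, counts,
                            p0=[0.0, counts.max(), 1.0])

# Compare data and plain-Gaussian fit beyond 4 fitted standard deviations.
tail = np.abs(theta) > 4.0 * abs(s1)
tail_ratio = np.median(counts[tail] / gaussian(theta[tail], A1, s1))

print("plain Gaussian: A=%.1f s=%.3f" % (A1, abs(s1)))
print("with offset:    B=%.2f A=%.1f s=%.3f" % (B2, A2, abs(s2)))
print("median data/fit ratio in the tails:", tail_ratio)
```

A data/fit ratio far above 1 in the tails is the symptom described above: the fit tracks the peak but falls off much faster than the data.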

## The Attempt at a Solution

I tried adding a constant to the Gaussian fit, but then the fit ends up being too large at the tails.

TIA,

Grogs

Some questions:

1. What is the mean? That you tried a zero mean fit and then a non-zero mean fit suggests you aren't thinking enough about your model.

2. Can the data ever be negative? For example, time-to-failure. Things don't fail before they are made; time-to-failure is inherently non-negative.

3. What made you think the underlying distribution might be gaussian?

4. Have you done a normality test? Testing whether a distribution is normal is a fairly simple procedure. For example, the Anderson-Darling test.
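As a rough illustration of question 4, here is a sketch of an Anderson-Darling normality test using `scipy.stats.anderson`, reading the `statistic` and `critical_values` fields as exposed in long-standing SciPy releases. The samples are synthetic. The test works on individual observations, so binned detector counts would first have to be expanded into one angle per count, which is an assumption about how the data are stored.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# ASSUMPTION: synthetic samples standing in for individual scattering angles.
normal_sample = rng.normal(0.0, 1.0, size=5000)
heavy_sample = rng.standard_t(df=3, size=5000)  # heavy-tailed comparison

normal_result = stats.anderson(normal_sample, dist="norm")
heavy_result = stats.anderson(heavy_sample, dist="norm")

# For dist="norm" the last critical value corresponds to the 1% level.
crit_1pct = heavy_result.critical_values[-1]

print("normal sample statistic:      ", normal_result.statistic)
print("heavy-tailed sample statistic:", heavy_result.statistic)
print("1% critical value:            ", crit_1pct)
```

A statistic above the critical value rejects normality at that significance level.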

Hi, thanks for the quick response.

D H said:
Some questions:

1. What is the mean? That you tried a zero mean fit and then a non-zero mean fit suggests you aren't thinking enough about your model.

The mean is zero. The data I'm modeling is essentially the angle a neutron scatters in the x-y plane, which is going to be a symmetric function. f(theta) is the number of counts I get in a detector at angle theta. The incoming particles I modeled are all traveling along the x-axis, so that's why I know the mean scattering angle is 0. Adding the constant shouldn't affect the mean angle, just the magnitude, right?

D H said:
Can the data ever be negative? For example, time-to-failure. Things don't fail before they are made; time-to-failure is inherently non-negative.

Yes it can. The scattering angle can be either positive or negative in this case. I could just invoke symmetry and fit |theta| without making much of a difference if I need to though.

D H said:
What made you think the underlying distribution might be gaussian?

I read through a lot of previous work of this type in journal articles during my literature review (this work is part of my dissertation) and they typically used a Gaussian to fit the scattering function. My detector geometry is a bit different than anything I found in the literature review though, which I suspect is what is causing the non-Normality.

D H said:
Have you done a normality test? Testing whether a distribution is normal is a fairly simple procedure. For example, the Anderson-Darling test.

Not yet. I can run the data through JMP tomorrow when I get back to my work and see what it tells me, but I can pretty well tell by inspection that it's not Normally distributed in the tails. Values at 5 sigma are something like .001 of the maximum, which is way too high for a Gaussian. That's why I decided to see if I could find something other than a Gaussian that would fit it better.
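As a quick check of that estimate: for a pure Gaussian of the form $f(\theta) = A\exp\left(-\theta^{2}/(2s^{2})\right)$, the value at five standard deviations relative to the peak is

$$\frac{f(5s)}{f(0)} = \exp\left(-\frac{25}{2}\right) \approx 3.7\times10^{-6}$$

so an observed ratio of about 0.001 is roughly 270 times larger than a Gaussian would give.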

Thanks again,

Grogs

Grogs said:
Adding the constant shouldn't affect the mean angle, just the magnitude, right?
Whoa! I misread your $f(\theta) = B + A\exp\left(-\theta^{2}/(2s^{2})\right)$. That says there is a non-zero intensity at all angles. I assume that even with your broad tails, the intensity eventually tails off to zero. So that non-zero B is inviting some non-physics into your model.

Grogs said:
I read through a lot of previous work of this type in journal articles during my literature review (this work is part of my dissertation) and they typically used a Gaussian to fit the scattering function. My detector geometry is a bit different than anything I found in the literature review though, which I suspect is what is causing the non-Normality.
One possibility: See if you can find a mapping that maps your detector geometry to something a bit more "normal" (i.e., what you found in the literature), do your statistics in that space, and then translate back to your geometry.

In your space, that would translate to some weird member of the exponential family,

$$f(\theta) = A\exp(-g(|\theta|))$$

Another possibility is the double exponential:

$$f(\theta) = A\exp\left(-\,\frac{|\theta|} b\right)$$

Note that the absolute value in the above forces the distribution to be symmetric (which makes sense given the description of your setup).
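One concrete member of the family $A\exp(-g(|\theta|))$ is $g(|\theta|) = (|\theta|/b)^{\beta}$, which is an assumption added here for illustration rather than something proposed in the thread. The exponent $\beta$ acts as a tail-heaviness knob: $\beta = 2$ gives a Gaussian shape (with $b = \sqrt{2}\,s$) and $\beta = 1$ gives the double exponential above. A sketch of fitting it with SciPy's `curve_fit`, on synthetic data:

```python
import numpy as np
from scipy.optimize import curve_fit


def stretched_exp(theta, A, b, beta):
    """A * exp(-(|theta| / b)**beta); beta=2 is Gaussian-shaped, beta=1 is double exponential."""
    return A * np.exp(-(np.abs(theta) / b) ** beta)


# ASSUMPTION: synthetic, noise-free double-exponential data with b = 1.5.
theta = np.linspace(-10.0, 10.0, 201)
counts = 500.0 * np.exp(-np.abs(theta) / 1.5)

# Start from a Gaussian-like guess (beta = 2) and let the fit adjust the shape.
popt, _ = curve_fit(
    stretched_exp, theta, counts,
    p0=[counts.max(), 1.0, 2.0],
    bounds=([0.0, 1e-6, 0.1], [np.inf, np.inf, 10.0]),
)
A_fit, b_fit, beta_fit = popt
print("A=%.1f b=%.3f beta=%.3f" % (A_fit, b_fit, beta_fit))
```

A fitted $\beta$ well below 2 indicates tails heavier than Gaussian. Whether that exponent has a physical interpretation for a given detector geometry is a separate question.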

Thanks, D H. I'll take a look at the exponential and see if it does any better. Another thought that occurred to me was that what I'm looking at may be the convolution of two different Gaussians - one with a small maximum and a very broad distribution and a second with a very large peak and a narrow distribution.

Switching back to the previous geometries isn't really an option unfortunately. They were using a 2D surface tally and I'm using an array of volumetric ones in a cylindrical configuration. I also have a time window on the detectors, i.e., the particles have to arrive at a certain time to be counted, which complicates things as well. It's just too hard to do analytically, which is why we've had to resort to using a trusted simulation program to find the right answer and then develop an empirical model. You are right that there does need to be some reasonable explanation of why a particular fit works though. Otherwise I could just fit something like a cubic spline, get a great fit, and move on. I'm hoping that the form of the fit will at least give me a hint of the underlying physics.

Grogs said:
Another thought that occurred to me was that what I'm looking at may be the convolution of two different Gaussians - one with a small maximum and a very broad distribution and a second with a very large peak and a narrow distribution.
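A mixture. (A weighted sum of two Gaussians, that is, rather than a convolution: convolving two Gaussians just gives another Gaussian.)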

The EM algorithm is very good at pulling out mixtures. Maybe a bit overkill, though.
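A minimal sketch of EM for this case, assuming a two-component mixture with both means fixed at zero (as the symmetry of the setup suggests) and synthetic individual scattering angles rather than binned counts. The function name and starting values are illustrative choices, not part of the thread.

```python
import numpy as np


def em_two_zero_mean_gaussians(x, n_iter=500):
    """EM for w*N(0, s1^2) + (1-w)*N(0, s2^2), both means fixed at 0."""
    x2 = np.asarray(x, dtype=float) ** 2
    # Crude start: one component narrower and one broader than the overall spread.
    w, var1, var2 = 0.5, 0.25 * x2.mean(), 4.0 * x2.mean()
    for _ in range(n_iter):
        # E-step: responsibility of component 1 for each sample.
        p1 = w * np.exp(-x2 / (2.0 * var1)) / np.sqrt(2.0 * np.pi * var1)
        p2 = (1.0 - w) * np.exp(-x2 / (2.0 * var2)) / np.sqrt(2.0 * np.pi * var2)
        r1 = p1 / (p1 + p2)
        # M-step: responsibility-weighted updates of the weight and variances.
        w = r1.mean()
        var1 = np.sum(r1 * x2) / np.sum(r1)
        var2 = np.sum((1.0 - r1) * x2) / np.sum(1.0 - r1)
    return w, np.sqrt(var1), np.sqrt(var2)


# ASSUMPTION: synthetic sample, 90% narrow (sigma = 1) and 10% broad (sigma = 4).
rng = np.random.default_rng(1)
n = 20000
is_narrow = rng.random(n) < 0.9
x = np.where(is_narrow, rng.normal(0.0, 1.0, n), rng.normal(0.0, 4.0, n))

w, s_narrow, s_broad = em_two_zero_mean_gaussians(x)
print("weight=%.3f s_narrow=%.3f s_broad=%.3f" % (w, s_narrow, s_broad))
```

With binned counts per detector angle, a direct least-squares fit of the sum of two Gaussians is a simpler alternative to EM.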

## 1. How do I determine if my data follows a Gaussian curve?

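A quick check is to plot a histogram (or a normal Q-Q plot) and see whether it follows a bell-shaped curve, paying particular attention to the tails. You can also calculate the skewness and the excess kurtosis of your data; both are 0 for a Gaussian (equivalently, the kurtosis is 3), so values far from that indicate non-normality. A formal normality test such as Anderson-Darling gives a more objective answer.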
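A sketch of the skewness and kurtosis check with `scipy.stats`, on synthetic samples. Note that `scipy.stats.kurtosis` returns the excess kurtosis by default, so a Gaussian gives a value near 0 rather than 3.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

# ASSUMPTION: synthetic samples; the Laplace (double exponential) sample is heavy-tailed.
normal_x = rng.normal(size=100_000)
laplace_x = rng.laplace(size=100_000)

normal_skew = stats.skew(normal_x)
normal_kurt = stats.kurtosis(normal_x)    # excess kurtosis, about 0 for a Gaussian
laplace_kurt = stats.kurtosis(laplace_x)  # excess kurtosis, 3 in theory for Laplace

print("normal:  skew=%.3f excess kurtosis=%.3f" % (normal_skew, normal_kurt))
print("laplace: excess kurtosis=%.3f" % laplace_kurt)
```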

## 2. What does it mean if my curve is tail-heavy?

A tail-heavy curve is one whose tails fall off more slowly than those of a Gaussian with the same width, so values far from the mean occur more often than a Gaussian predicts. A positive excess kurtosis is a common sign of this.

## 3. How do I fit a Gaussian curve to my data?

To fit a Gaussian curve to your data, you can use software such as R or Python, which provide least-squares curve-fitting functions. You supply the model function and initial guesses for its parameters (amplitude, mean, and standard deviation), and the fit returns the best-fit values.

## 4. What if my data does not fit a Gaussian curve?

If your data does not fit a Gaussian curve, it may follow a different distribution, such as a heavy-tailed, skewed, or bimodal one. You may need to explore other model functions (for example a double exponential or a mixture of Gaussians) or transformations in order to model your data accurately.

## 5. Can I still use a Gaussian curve if my data is tail-heavy?

You can, but only where it agrees with the data. Adjusting the amplitude, mean, or width of a Gaussian does not change how quickly its tails fall off, so a single Gaussian can describe the central region while still underestimating the tails. If the tails matter, consider a heavier-tailed model function or a mixture instead.
