Fit a Poisson on Gaussian-distributed data

AI Thread Summary
Using a Poisson function to fit Gaussian-distributed data can be problematic, as Poisson distributions are designed for specific processes that model rates, where the mean and variance are equal. If the sample mean and variance of the data are close, a Poisson model may be appropriate; otherwise, a Gaussian fit might be better. Transformations like logarithmic or Box-Cox can help make data more symmetric, but should be applied carefully to avoid misinterpretation. The discussion emphasizes the importance of understanding the underlying process before selecting a fitting model. Ultimately, exploring other distributions like Gamma or Chi-square may provide better fits for data that appears Gaussian but has distinct characteristics.
ChrisVer
Hi, I have a simple/fast question...
Can you reliably fit a Poisson function to data that seem to be Gaussian distributed (although only because the mean is large)?
 
For the question to be interpreted in any specific way, you need to describe the data (precisely).
 
Hey ChrisVer.

Poisson distributions (and Poisson processes) are constructed from very specific first principles: they describe counts of events occurring at some rate, and arise as a limit of the Binomial distribution (many trials, each with a small success probability, at a fixed expected count).

Usually these processes model rates and similar phenomena - you might want to tell us what you are trying to do so we can give further feedback.
 
If you have reason to think that the process is a Poisson process, you may want to check the sample variance. Poisson only has one parameter, λ, which is both the mean and variance. If the sample mean and variance are close, you can probably model it as Poisson. Otherwise, Gaussian would probably give a better fit.

You might also want to check the sample skewness. It should be close to λ^(-1/2), i.e. 1/√λ.

PS. I am not sure how to define "close to" for the sample variance and skew. Maybe you can Google a confidence interval.
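
One way to make "close to" concrete is the index-of-dispersion check: if the data are Poisson, (n-1)·s²/mean is approximately chi-square distributed with n-1 degrees of freedom. The sketch below is only an illustration of that idea. The function name `poisson_moment_check`, the dictionary keys, and the normal approximation used for the z-score are assumptions made here, not something from this thread.

```python
import math
import statistics


def poisson_moment_check(counts):
    """Compare sample moments of count data with what a Poisson model implies."""
    n = len(counts)
    mean = statistics.fmean(counts)
    var = statistics.variance(counts)  # unbiased sample variance s^2
    if mean <= 0 or var <= 0:
        raise ValueError("need a positive mean and non-constant data")

    # Sample skewness: third central moment over (second central moment)^(3/2)
    m2 = sum((x - mean) ** 2 for x in counts) / n
    m3 = sum((x - mean) ** 3 for x in counts) / n

    # Dispersion statistic: roughly chi-square with n-1 dof for Poisson data
    d = (n - 1) * var / mean

    return {
        "mean": mean,
        "variance": var,
        "dispersion_index": var / mean,  # close to 1 for Poisson data
        # Rough normal approximation to the chi-square (assumes large n)
        "dispersion_z": (d - (n - 1)) / math.sqrt(2 * (n - 1)),
        "skewness": m3 / m2 ** 1.5,
        # lambda^(-1/2), with lambda estimated by the sample mean
        "poisson_skewness": mean ** -0.5,
    }
```

A `dispersion_z` far from zero (say beyond ±2 or so) would suggest that the variance does not match the mean in the way a Poisson model requires. The check only makes sense for genuine counts, not for rescaled or continuous values.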
 
The problem with the Gaussian is that it's symmetric around the mean, which was not the case for my histograms.
I thought about fitting it to get the variance, but in the end I chose to integrate it and find the +/- 34%.
 
ChrisVer said:
The problem with the Gaussian is that it's symmetric around the mean, which was not the case for my histograms.
I thought about fitting it to get the variance, but in the end I chose to integrate it and find the +/- 34%.

Have you tried transforming your histograms? Taking the logarithm of the data, for example, often makes things more symmetric.
 
That's an example of a histo...
 

Attachment: EX.jpg
ChrisVer said:
That's an example of a histo...

Try a Box-Cox transformation to make it more normal.
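
For reference, the Box-Cox transform maps positive data x to (x^λ - 1)/λ, with the logarithm as the λ = 0 case, and λ is usually chosen by maximizing the profile log-likelihood under a normal model for the transformed values. The following is a minimal standard-library sketch of that recipe. The function names, the grid of λ values, and the moment-based `skewness` helper are assumptions chosen for illustration, and a coarse grid search stands in for a proper optimizer.

```python
import math


def boxcox(x, lam):
    """Box-Cox transform of strictly positive values for a given lambda."""
    if abs(lam) < 1e-12:
        return [math.log(v) for v in x]  # lambda = 0 is the log transform
    return [(v ** lam - 1.0) / lam for v in x]


def boxcox_loglik(x, lam):
    """Profile log-likelihood of lambda (normal model for transformed data)."""
    n = len(x)
    y = boxcox(x, lam)
    mean = sum(y) / n
    var = sum((v - mean) ** 2 for v in y) / n
    # Variance term plus the Jacobian of the transformation
    return -0.5 * n * math.log(var) + (lam - 1.0) * sum(math.log(v) for v in x)


def best_lambda(x, grid=None):
    """Pick the lambda on a coarse grid that maximizes the log-likelihood."""
    if grid is None:
        grid = [i / 20 for i in range(-40, 41)]  # -2.0 ... 2.0 in steps of 0.05
    return max(grid, key=lambda lam: boxcox_loglik(x, lam))


def skewness(x):
    """Moment-based sample skewness, used to see how symmetric the data are."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m3 = sum((v - mean) ** 3 for v in x) / n
    return m3 / m2 ** 1.5
```

Comparing `skewness(x)` with `skewness(boxcox(x, best_lambda(x)))` gives a quick sense of whether the transform helped. Any interval computed on the transformed scale has to be mapped back through the inverse transform before it is quoted on the original axis.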
 
You should probably tell us what you are trying to do before using arbitrary transformations, test statistics and inferences.

Transforming data out of context is not a good idea, and depending on what conclusions you are trying to draw, it can actually be detrimental to getting a useful inference.
 
What I wanted to do was:
not to add up bins in order to find the +/- 34.1% errors around the red line, since that can be binning dependent,
but instead to fit a function to the distribution and integrate that function around the red line to get the +/- 34.1%.
Obviously the distribution is not Gaussian, but looks more like a Poisson...
I was thinking about rescaling the x-axis [since a Poisson accepts integer entries while I have floats], doing the fit, integrating, and then scaling everything [together with the obtained variances] back to the original x-axis.

The point is that I have other distributions which look pretty much like Gaussians, and I wanted to make sure I could use a Poisson to fit them too [since the code should do that, I wouldn't want to check the distribution every time and decide what to fit it with].
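
The "fit a function, then integrate around the red line" step described above can be sketched independently of which function is fitted. The code below takes any fitted density as a Python callable and walks outward from a chosen central value until 34.1% of the total area is enclosed on each side. The name `side_widths`, its parameters, and the trapezoid integration are assumptions made for illustration; the central value (the red line) and the fitted curve have to come from the actual analysis.

```python
import math


def side_widths(pdf, center, lo, hi, frac=0.341, steps=20000):
    """Return (minus, plus): distances below and above `center` that each
    enclose `frac` of the total area of `pdf` on the interval [lo, hi]."""

    def area(a, b):
        # Composite trapezoid rule on [a, b]
        h = (b - a) / steps
        inner = sum(pdf(a + i * h) for i in range(1, steps))
        return (0.5 * (pdf(a) + pdf(b)) + inner) * h

    # Normalize, in case the fitted curve does not have unit area
    target = frac * area(lo, hi)

    def walk(direction, limit):
        # Accumulate area outward from the center until the target is reached
        h = abs(limit - center) / steps
        acc = 0.0
        x = center
        f_prev = pdf(x)
        for _ in range(steps):
            x_next = x + direction * h
            f_next = pdf(x_next)
            piece = 0.5 * (f_prev + f_next) * h
            if acc + piece >= target:
                # Linear interpolation inside the final step
                return abs(x - center) + h * (target - acc) / piece
            acc += piece
            x, f_prev = x_next, f_next
        raise ValueError("less than the requested fraction lies on this side")

    return walk(-1, lo), walk(+1, hi)
```

For a standard normal density centered at 0 this gives roughly one standard deviation on each side, while a skewed density gives unequal widths. If the central value sits so far to one side that less than 34.1% of the area lies beyond it, the function raises an error rather than returning a misleading width.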
 
If you just want to fit probabilities to data then that is understandable, but I ask because if the data come from a particular process (from which the distribution constraints should be derived), then you would typically construct the model for specific reasons before you fit it.

I'd look at the Gamma, Chi-square and other generalizations of these distributions. You'll find they can deal with these bumps and skewness, and you can estimate the parameters of these distributions and do goodness-of-fit tests.
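
As a rough illustration of the Gamma suggestion, the sketch below fits a Gamma distribution to positive continuous data by the method of moments (shape = mean²/variance, scale = variance/mean) and compares its log-likelihood with that of a Gaussian with the same mean and standard deviation. The chi-square distribution with k degrees of freedom is the special case shape = k/2, scale = 2. The function names and the choice of moment matching instead of a maximum-likelihood fit are assumptions made here, and a log-likelihood comparison is only a crude stand-in for a proper goodness-of-fit test.

```python
import math
import statistics


def fit_gamma_moments(data):
    """Method-of-moments estimates (shape, scale) for a Gamma distribution."""
    mean = statistics.fmean(data)
    var = statistics.variance(data)
    return mean * mean / var, var / mean


def gamma_logpdf(x, shape, scale):
    """Log density of Gamma(shape, scale) at x > 0."""
    return ((shape - 1.0) * math.log(x) - x / scale
            - math.lgamma(shape) - shape * math.log(scale))


def normal_logpdf(x, mu, sigma):
    """Log density of a Gaussian with mean mu and standard deviation sigma."""
    return (-0.5 * math.log(2.0 * math.pi * sigma * sigma)
            - (x - mu) ** 2 / (2.0 * sigma * sigma))


def compare_fits(data):
    """Fit both models and return parameters plus total log-likelihoods."""
    shape, scale = fit_gamma_moments(data)
    mu = statistics.fmean(data)
    sigma = statistics.stdev(data)
    return {
        "shape": shape,
        "scale": scale,
        "gamma_skewness": 2.0 / math.sqrt(shape),  # tends to 0 for large shape
        "loglik_gamma": sum(gamma_logpdf(x, shape, scale) for x in data),
        "loglik_normal": sum(normal_logpdf(x, mu, sigma) for x in data),
    }
```

Because the Gamma skewness 2/√shape shrinks as the shape grows, a single Gamma fit can cover both the visibly skewed histograms and the ones that look nearly Gaussian, which is the situation described earlier in the thread, and it does so without rescaling the x-axis to integers.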
 