Fit a Poisson on Gaussian distributed data


Discussion Overview

The discussion centers around the appropriateness of fitting a Poisson distribution to data that appears to be Gaussian distributed, particularly in the context of statistical modeling and error estimation. Participants explore the characteristics of the data and the implications of using different distributions for fitting.

Discussion Character

  • Exploratory
  • Technical explanation
  • Debate/contested
  • Mathematical reasoning

Main Points Raised

  • One participant questions the reliability of using a Poisson function for data that seems Gaussian, suggesting that the interpretation of the question depends on a precise description of the data.
  • Another participant explains that Poisson distributions are derived from specific principles and typically model rates, prompting a request for more context regarding the data and its intended use.
  • A suggestion is made to check the sample variance and skewness to determine if a Poisson model is appropriate, noting that Poisson has a unique relationship between mean and variance.
  • Concerns are raised about the symmetry of Gaussian distributions, with one participant indicating that their histograms do not exhibit this symmetry, leading to a preference for integration over fitting.
  • Transformations such as logarithmic or Box-Cox are proposed to achieve more symmetric data, although one participant cautions against arbitrary transformations without context.
  • A participant describes their goal of fitting a function to the distribution to estimate errors without binning, expressing uncertainty about the applicability of Poisson fitting for distributions resembling Gaussians.
  • Another participant emphasizes the importance of understanding the underlying process before fitting distributions and suggests exploring other generalized distributions that may better accommodate the data's characteristics.

Areas of Agreement / Disagreement

Participants express differing views on the suitability of fitting a Poisson distribution to data that appears Gaussian. There is no consensus on the best approach, and multiple competing perspectives on transformation and fitting methods remain present.

Contextual Notes

Participants highlight limitations related to the definitions of "close" for variance and skewness, as well as the potential pitfalls of transforming data without a clear understanding of the underlying processes.

Who May Find This Useful

This discussion may be useful for researchers or practitioners interested in statistical modeling, particularly those dealing with data fitting and distribution selection in the context of experimental or observational data.

ChrisVer
Hi, I have a simple/fast question...
Can you reliably fit a Poisson function to data that seem to be Gaussian distributed (although that is only because the mean is large)?
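To make the "large mean" remark concrete, here is a minimal sketch comparing the Poisson pmf with a Gaussian of the same mean and variance. The helper name `max_pmf_gap` and the chosen values of λ are illustrative assumptions, not something from the thread.

```python
import numpy as np
from scipy import stats

def max_pmf_gap(lam):
    """Largest absolute gap between the Poisson(lam) pmf and the
    N(lam, lam) pdf, evaluated on the integers k = 0, 1, 2, ..."""
    k = np.arange(0, int(lam + 10 * np.sqrt(lam)) + 1)
    poisson_pmf = stats.poisson.pmf(k, lam)
    normal_pdf = stats.norm.pdf(k, loc=lam, scale=np.sqrt(lam))
    return float(np.max(np.abs(poisson_pmf - normal_pdf)))

# Hypothetical means, chosen only to show the trend.
gaps = {lam: max_pmf_gap(lam) for lam in (2, 20, 200)}
for lam, gap in gaps.items():
    print(f"lambda = {lam:>3}: max |pmf - pdf| = {gap:.5f}")
```

The gap shrinks as λ grows, which is why a Poisson with a large mean looks Gaussian. The Poisson still has only one free parameter, so it cannot match data whose variance differs from its mean.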
 
For the question to be interpreted in any specific way, you need to describe the data (precisely).
 
Hey ChrisVer.

Poisson distributions (and Poisson processes) are constructed from very specific first principles, where they represent rates as a limit of a binomial distribution (with certain properties).

Usually these processes model rates and similar phenomena - you might want to tell us what you are trying to do so we can give further feedback.
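For reference, the binomial limit mentioned above can be written out explicitly. The symbols n (number of trials), k (number of events) and λ (expected count, held fixed with success probability λ/n) are the usual textbook notation rather than anything defined in this thread.

$$
\lim_{n \to \infty} \binom{n}{k} \left(\frac{\lambda}{n}\right)^{k} \left(1 - \frac{\lambda}{n}\right)^{n-k} = \frac{\lambda^{k} e^{-\lambda}}{k!}, \qquad k = 0, 1, 2, \dots
$$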
 
If you have reason to think that the process is a Poisson process, you may want to check the sample variance. Poisson only has one parameter, λ, which is both the mean and variance. If the sample mean and variance are close, you can probably model it as Poisson. Otherwise, Gaussian would probably give a better fit.

You might also want to check the sample skewness. It should be close to λ^(-1/2), i.e. 1/√λ.

PS. I am not sure how to define "close to" for the sample variance and skew. Maybe you can Google a confidence interval.
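The mean, variance and skewness checks above can be sketched as follows. The function name `poisson_moment_check` and the synthetic sample are assumptions for illustration; the sketch only reports the numbers and does not settle what "close" should mean.

```python
import numpy as np
from scipy import stats

def poisson_moment_check(samples):
    """Compare sample moments with what a Poisson would predict."""
    samples = np.asarray(samples, dtype=float)
    mean = samples.mean()
    var = samples.var(ddof=1)          # unbiased sample variance
    return {
        "mean": mean,
        "variance": var,
        "dispersion": var / mean,      # about 1 for Poisson data
        "skewness": stats.skew(samples),
        "poisson_skewness": mean ** -0.5,  # lambda^(-1/2), with lambda estimated by the mean
    }

# Synthetic Poisson counts (hypothetical lambda), used only to exercise the check.
rng = np.random.default_rng(0)
report = poisson_moment_check(rng.poisson(lam=50.0, size=20_000))
for name, value in report.items():
    print(f"{name:>17}: {value:.4f}")
```

A dispersion well away from 1, or a skewness far from λ^(-1/2), would argue against a plain Poisson model.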
 
The problem with the Gaussian is that it is symmetric around the mean, which was not the case for my histograms.
I thought about fitting one to get the variance, but in the end I chose to integrate the histogram and find the ±34% bounds.
 
ChrisVer said:
The problem with the Gaussian is that it is symmetric around the mean, which was not the case for my histograms.
I thought about fitting one to get the variance, but in the end I chose to integrate the histogram and find the ±34% bounds.

Have you tried transforming your data? Taking the logarithm, for example, often makes the distribution more symmetric.
 
That's an example of a histo...
 

Attachment: EX.jpg
ChrisVer said:
That's an example of a histo...

Try a Box-Cox transformation to make it more normal.
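A minimal sketch of the Box-Cox suggestion, assuming SciPy is available and the data are strictly positive (which `scipy.stats.boxcox` requires). The lognormal sample is synthetic and only stands in for a right-skewed histogram. Note that the Box-Cox parameter, called `lmbda` here, is unrelated to the Poisson λ.

```python
import numpy as np
from scipy import stats

# Synthetic right-skewed, strictly positive data (an assumption for illustration).
rng = np.random.default_rng(1)
data = rng.lognormal(mean=1.0, sigma=0.5, size=5_000)

# With no lmbda argument, boxcox picks the exponent by maximum likelihood
# and returns the transformed data together with that exponent.
transformed, lmbda = stats.boxcox(data)

skew_before = stats.skew(data)
skew_after = stats.skew(transformed)
print(f"Box-Cox lmbda = {lmbda:.3f}")
print(f"skewness before = {skew_before:.3f}, after = {skew_after:.3f}")
```

Any interval computed on the transformed scale has to be mapped back through the inverse transform (`scipy.special.inv_boxcox`) before it is quoted on the original axis.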
 
You should probably tell us what you are trying to do before using arbitrary transformations, test statistics and inferences.

Transforming data out of context is not a good idea and, depending on what conclusions you are trying to draw, it can actually be detrimental to getting a useful inference.
 
What I wanted to do was:
not to sum bins in order to find the ±34.1% errors around the red line, since that can be binning dependent,
but instead to fit a function to the distribution and integrate that function around the red line to get the ±34.1%.
Obviously the distribution is not Gaussian; it looks more like a Poisson...
I was thinking about rescaling the x-axis [since a Poisson accepts only integer values while I have floats], doing the fit, integrating, and then scaling everything [together with the obtained variances] back to the original x-axis.

The point is that I have other distributions which look pretty much like Gaussians, and I wanted to make sure I could use a Poisson to fit them too [since the code should handle that automatically; I wouldn't want to check the distribution every time and decide what to fit it with].
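One binning-free option that needs no fitted function at all is to read the ±34.1% bounds straight off the empirical CDF of the raw values. This is a swapped-in alternative, not the fit-and-integrate approach described above. The helper name `interval_around`, the synthetic gamma sample, and the use of the median as a stand-in for the red line are all assumptions.

```python
import numpy as np

def interval_around(samples, reference, coverage=0.341):
    """Distances below and above `reference` that each enclose `coverage`
    of the probability, taken from the empirical CDF (no binning)."""
    samples = np.sort(np.asarray(samples, dtype=float))
    # Empirical CDF evaluated at the reference value.
    p_ref = np.searchsorted(samples, reference, side="right") / samples.size
    # Clip so that a reference near a tail does not ask for an impossible quantile.
    p_lo = max(p_ref - coverage, 0.0)
    p_hi = min(p_ref + coverage, 1.0)
    lower, upper = np.quantile(samples, [p_lo, p_hi])
    return reference - lower, upper - reference

# Synthetic skewed floats standing in for the real data.
rng = np.random.default_rng(2)
data = rng.gamma(shape=4.0, scale=2.0, size=50_000)
err_minus, err_plus = interval_around(data, reference=np.median(data))
print(f"-{err_minus:.3f} / +{err_plus:.3f}")
```

Because it works on the unbinned values, the same code handles both the skewed and the Gaussian-looking distributions, and asymmetric errors come out naturally.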
 
If you want to fit probabilities to data then that is understandable - but I ask because if it is based on a particular process (from which the distribution constraints should be derived), then you typically construct the model for specific reasons before you fit it.

I'd look at the Gamma, chi-square, and generalizations of these distributions. You'll find they can deal with these bumps and skewness, and you can estimate their parameters and do goodness-of-fit tests.
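As a rough sketch of the Gamma suggestion, assuming SciPy: fit the unbinned values by maximum likelihood, run a Kolmogorov-Smirnov check, and integrate the fitted distribution around a reference value through its CDF. The synthetic sample, the fixed location `floc=0`, and the use of the fitted median as the reference are assumptions for illustration.

```python
import numpy as np
from scipy import stats

# Synthetic skewed floats standing in for the real data.
rng = np.random.default_rng(3)
data = rng.gamma(shape=4.0, scale=2.0, size=5_000)

# Maximum-likelihood fit on the raw values; floc=0 pins the location at zero.
shape, loc, scale = stats.gamma.fit(data, floc=0)
fitted = stats.gamma(shape, loc=loc, scale=scale)

# Goodness of fit. The p-value is optimistic because the parameters
# were estimated from the same data that is being tested.
ks = stats.kstest(data, fitted.cdf)

# Integrate the fitted density around the reference via the CDF and its inverse.
reference = fitted.median()
p_ref = fitted.cdf(reference)
lower = fitted.ppf(max(p_ref - 0.341, 0.0))
upper = fitted.ppf(min(p_ref + 0.341, 1.0))

print(f"shape = {shape:.3f}, scale = {scale:.3f}, KS p-value = {ks.pvalue:.3f}")
print(f"reference = {reference:.3f}  -{reference - lower:.3f} / +{upper - reference:.3f}")
```

A gamma with a large shape parameter is itself nearly Gaussian, so one family can cover both the skewed and the symmetric-looking cases without rescaling floats to integers.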
 
