Fit a Poisson on Gaussian distributed data

In summary: the histograms are skewed rather than Gaussian, so a Poisson fit is only appropriate if the sample mean and variance are close. Otherwise, the replies suggest a log or Box-Cox transformation to make the data more normal, or fitting a Gamma or Chi-square distribution instead.
  • #1
ChrisVer
Hi, I have a simple/fast question...
Can you reliably fit a Poisson function to data that seem to be Gaussian distributed (although that is only because the mean is large)?
 
  • #2
For the question to be interpreted in any specific way, you need to describe the data (precisely).
 
  • #3
Hey ChrisVer.

Poisson distributions (and Poisson processes) are constructed from very specific first principles: they model event rates and arise as a limit of the Binomial distribution (many trials, each with a small success probability, at a fixed expected count).

Usually these processes model rates and similar phenomena - you might want to tell us what you are trying to do so we can give further feedback.
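
For reference, the Binomial limit mentioned above can be written out explicitly. This is the standard textbook statement, not something derived in this thread; the symbols n (number of trials), k (observed count) and p = λ/n (success probability per trial) are introduced here for illustration:

```latex
\lim_{n \to \infty} \binom{n}{k} \left(\frac{\lambda}{n}\right)^{k} \left(1 - \frac{\lambda}{n}\right)^{n-k} = \frac{\lambda^{k} e^{-\lambda}}{k!}
```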
 
  • #4
If you have reason to think that the process is a Poisson process, you may want to check the sample variance. Poisson only has one parameter, λ, which is both the mean and variance. If the sample mean and variance are close, you can probably model it as Poisson. Otherwise, Gaussian would probably give a better fit.

You might also want to check the sample skewness. It should be close to $\lambda^{-1/2}$, that is, 1/√λ.

PS. I am not sure how to define "close to" for the sample variance and skew. Maybe you can Google a confidence interval.
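
A minimal sketch of the check described in this post, assuming NumPy and SciPy are available. The function name `poisson_moment_check` and the synthetic sample are made up for illustration; nothing here comes from the original poster's data. It only prints the numbers side by side and does not attempt to define "close to".

```python
import numpy as np
from scipy import stats

def poisson_moment_check(counts):
    """Compare sample moments with what a Poisson of the same mean would give."""
    counts = np.asarray(counts, dtype=float)
    mean = counts.mean()
    var = counts.var(ddof=1)               # unbiased sample variance
    return {
        "mean": mean,
        "variance": var,
        "dispersion": var / mean,          # about 1 for Poisson data
        "skewness": stats.skew(counts),    # sample skewness
        "poisson_skewness": mean ** -0.5,  # lambda^(-1/2), taking lambda ~ mean
    }

rng = np.random.default_rng(0)
sample = rng.poisson(lam=50.0, size=20_000)  # synthetic data, not from the thread
check = poisson_moment_check(sample)
for name, value in check.items():
    print(f"{name:>17s}: {value:.4f}")
```

A dispersion well away from 1, or a skewness far from the Poisson value, would suggest that a plain Poisson is not a good model.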
 
  • #5
The problem with the Gaussian is that it's symmetric around the mean, which was not the case for my histograms.
I thought about fitting it to get the variance, but in the end I chose to integrate it and find the +/- 34%.
 
  • #6
ChrisVer said:
The problem with the Gaussian is that it's symmetric around the mean, which was not the case for my histograms.
I thought about fitting it to get the variance, but in the end I chose to integrate it and find the +/- 34%.

Have you tried transforming your data? Taking the logarithm, for example, often makes things more symmetric.
 
  • #7
That's an example of a histo...
 

Attachment: EX.jpg
  • #8
ChrisVer said:
That's an example of a histo...

Try a Box-Cox transformation to make it more normal.
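
A rough sketch of what that could look like with `scipy.stats.boxcox`, on made-up right-skewed data (the Gamma sample is an assumption for illustration, not the histogram from the attachment). Box-Cox needs strictly positive values, and the log transform suggested earlier is its λ = 0 special case.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# Synthetic right-skewed, strictly positive data (Box-Cox requires x > 0).
data = rng.gamma(shape=2.0, scale=3.0, size=5_000)

# With lmbda left unset, SciPy estimates the exponent by maximum likelihood
# and returns the transformed data together with that estimate.
transformed, lmbda = stats.boxcox(data)

skew_before = stats.skew(data)
skew_after = stats.skew(transformed)
print(f"estimated Box-Cox lambda: {lmbda:.3f}")
print(f"skewness before: {skew_before:.3f}, after: {skew_after:.3f}")
```

Any interval computed on the transformed axis has to be mapped back through the inverse transform (`scipy.special.inv_boxcox`) before it is quoted in the original units.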
 
  • #9
You should probably tell us what you are trying to do before using arbitrary transformations, test statistics and inferences.

Transforming data out of context is not a good idea, and depending on what conclusions you are trying to draw, it can actually be detrimental to getting a useful inference.
 
  • #10
What I wanted to do was:
not sum bins to find the +/- 34.1% errors around the red line, since that can be binning-dependent,
but instead fit a function to the distribution and integrate that function around the red line to get the +/- 34.1%.
Obviously the distribution is not Gaussian, but looks more like a Poisson...
I was thinking about rescaling the x-axis [since the Poisson takes integer values while I have floats], doing the fit, integrating, and then scaling everything [together with the obtained variances] back to the original x-axis.

The point is that I have other distributions which look pretty much like Gaussians, and I wanted to make sure I could use a Poisson to fit them too [since the code should do that, I wouldn't want to check the distribution every time and decide what to fit it with].
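
One way the rescale, fit, integrate and scale-back idea could be sketched is by moment matching, assuming the floats behave like s·N with N Poisson distributed. Then mean = s·λ and variance = s²·λ, which fixes both s and λ. The function `scaled_poisson_interval`, the use of the median as a stand-in for the red line, and the synthetic data are all assumptions for illustration, not the poster's actual code.

```python
import numpy as np
from scipy import stats

def scaled_poisson_interval(x, ref, half_prob=0.341):
    """Moment-match x ~ s * Poisson(lam); return (lo, hi, s, lam) around ref."""
    x = np.asarray(x, dtype=float)
    mean, var = x.mean(), x.var(ddof=1)
    s = var / mean             # x-axis scale, from var = s^2 * lam and mean = s * lam
    lam = mean ** 2 / var      # Poisson mean on the rescaled (integer) axis
    c = stats.poisson.cdf(ref / s, lam)         # fitted probability below ref
    p_lo = np.clip(c - half_prob, 0.0, 1.0)
    p_hi = np.clip(c + half_prob, 0.0, 1.0)
    lo = s * stats.poisson.ppf(p_lo, lam)       # scale back to the original axis
    hi = s * stats.poisson.ppf(p_hi, lam)
    return lo, hi, s, lam

rng = np.random.default_rng(2)
x = 0.25 * rng.poisson(lam=40.0, size=50_000)   # synthetic floats, true s = 0.25
ref = np.median(x)                              # stand-in for the red line
lo, hi, s, lam = scaled_poisson_interval(x, ref)
print(f"s = {s:.3f}, lambda = {lam:.2f}, interval = [{lo:.2f}, {hi:.2f}]")
```

Because the scale s absorbs the variance, this also goes through on Gaussian-looking data with a positive mean, but the interval edges move in steps of s and the fitted skewness is tied to λ^(-1/2), so it is worth comparing against the histogram before trusting it.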
 
  • #11
If you want to fit probabilities to data then that is understandable - but I ask because if it is based on a particular process (from which the distribution constraints should be derived) then you would typically construct the model for specific reasons before you fit it.

I'd look at the Gamma, Chi-square and other generalized distributions of this kind. You'll find they can deal with these bumps and skewness, and you can estimate their parameters and do goodness-of-fit tests.
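
As a hedged illustration of that suggestion, here is a sketch that fits a Gamma distribution with `scipy.stats.gamma.fit`, runs a Kolmogorov-Smirnov test against the fitted curve, and reads off the interval holding 34.1% of the fitted probability on each side of a reference point. The synthetic sample, the choice of the sample mean as the reference, and pinning the location at 0 are assumptions made for the example.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
data = rng.gamma(shape=4.0, scale=2.5, size=10_000)  # synthetic skewed sample

# Fit shape and scale by maximum likelihood, with the location pinned at 0.
shape, loc, scale = stats.gamma.fit(data, floc=0)
fitted = stats.gamma(shape, loc=loc, scale=scale)

# Kolmogorov-Smirnov goodness of fit against the fitted distribution.
# Caveat: the p-value is optimistic, since the parameters came from the same data.
ks = stats.kstest(data, fitted.cdf)

# Interval with 34.1% of the fitted probability on each side of the reference.
ref = data.mean()                      # stand-in for the "red line"
c = fitted.cdf(ref)
lo = fitted.ppf(max(c - 0.341, 0.0))
hi = fitted.ppf(min(c + 0.341, 1.0))
print(f"shape = {shape:.2f}, scale = {scale:.2f}, KS statistic = {ks.statistic:.4f}")
print(f"interval around {ref:.2f}: [{lo:.2f}, {hi:.2f}]")
```

Working from the fitted CDF rather than from histogram bins is what removes the binning dependence; the interval comes out asymmetric whenever the fitted distribution is skewed.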
 

1. What is a Poisson distribution?

A Poisson distribution is a probability distribution that is used to model the number of events that occur within a fixed time interval or within a specified region of space, given that these events occur with a known average rate and independently of the time since the last event. It is often used to model count data, such as the number of customers arriving at a store or the number of defects in a product.

2. How does a Poisson distribution fit on Gaussian distributed data?

A Poisson distribution with a large mean λ is well approximated by a Gaussian (also known as normal) distribution with mean λ and variance λ, so count data with a large mean can look Gaussian while still being Poisson. The reverse does not hold in general: the Poisson is discrete and forces the variance to equal the mean, whereas a Gaussian is continuous and its variance is a separate parameter. A Poisson fit to Gaussian-looking data is therefore only reasonable when the data are counts and the sample variance is close to the sample mean.
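
A quick numerical check of how close a Poisson with mean λ is to a Gaussian with mean λ and variance λ. The helper `max_pmf_gap` and the two values of λ are chosen only for illustration; the gap is measured relative to the peak of the Poisson pmf.

```python
import numpy as np
from scipy import stats

def max_pmf_gap(lam):
    """Largest |Poisson pmf - Gaussian pdf| over the bulk, relative to the pmf peak."""
    half_width = int(6 * np.sqrt(lam)) + 1
    k = np.arange(max(0, int(lam) - half_width), int(lam) + half_width + 1)
    pmf = stats.poisson.pmf(k, lam)
    pdf = stats.norm.pdf(k, loc=lam, scale=np.sqrt(lam))
    return np.max(np.abs(pmf - pdf)) / pmf.max()

gap_small = max_pmf_gap(5.0)     # visibly skewed Poisson
gap_large = max_pmf_gap(500.0)   # nearly Gaussian Poisson
print(f"relative gap at lambda=5:   {gap_small:.4f}")
print(f"relative gap at lambda=500: {gap_large:.4f}")
```

The gap shrinks as λ grows, which is why histograms of counts with a large mean look Gaussian.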

3. What is the difference between a Poisson distribution and a Gaussian distribution?

The main difference between a Poisson distribution and a Gaussian distribution is that the Poisson models discrete data (e.g. counts), while the Gaussian models continuous data. Additionally, the Poisson has a single parameter λ that sets both its mean and its variance, and it is skewed to the right with skewness $\lambda^{-1/2}$, whereas the Gaussian is a symmetric bell-shaped curve whose mean and standard deviation are set independently. For large λ the Poisson skewness shrinks and the two shapes become hard to tell apart.

4. How do you fit a Poisson distribution on Gaussian distributed data?

To fit a Poisson distribution, the data must be counts (non-negative whole numbers). Rounding continuous data to the nearest whole number, or applying a floor or ceiling function, produces integers, but it does not by itself make the data Poisson: the sample variance should still be close to the sample mean. Once the data are counts, maximum likelihood estimation gives the single parameter λ, and for a Poisson that estimate is simply the sample mean.
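
A minimal sketch of the maximum likelihood step, assuming the data are already counts. It minimizes the negative log-likelihood numerically with `scipy.optimize.minimize_scalar` and compares the result with the sample mean, which is the closed-form Poisson estimate. The synthetic counts and variable names are made up for the example.

```python
import numpy as np
from scipy import optimize, stats

rng = np.random.default_rng(4)
counts = rng.poisson(lam=12.0, size=5_000)   # synthetic counts

def neg_log_likelihood(lam):
    """Negative Poisson log-likelihood of the counts for a trial rate lam."""
    return -stats.poisson.logpmf(counts, lam).sum()

# Bounded 1-D minimization over a range that safely contains the sample mean.
result = optimize.minimize_scalar(
    neg_log_likelihood,
    bounds=(1e-6, float(counts.max())),
    method="bounded",
)
lam_numeric = result.x
lam_closed_form = counts.mean()   # closed-form Poisson MLE
print(f"numerical MLE: {lam_numeric:.4f}, sample mean: {lam_closed_form:.4f}")
```

In practice the numerical search is unnecessary for a plain Poisson; it is shown only to make the link between the likelihood and the sample mean concrete.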

5. What are some common applications of fitting a Poisson on Gaussian distributed data?

Poisson models are commonly applied to count data in fields such as finance, biology, and engineering. For example, they can be used to model the number of stock market trades in a given time period, the number of bacteria in a petri dish, or the number of failures in a manufacturing process. In quality control, counts that are very unlikely under the fitted Poisson can be flagged as unexpected values.
