Binned Maximum Likelihood fit in python?

Click For Summary

Discussion Overview

The discussion revolves around implementing Maximum Likelihood fitting methods for both binned and unbinned data in Python, specifically seeking alternatives to the curve_fit function from Scipy. Participants explore various approaches, share code snippets, and address issues encountered during implementation.

Discussion Character

  • Technical explanation
  • Debate/contested
  • Exploratory

Main Points Raised

  • One participant expresses a desire to transition from Least-squares fits to Maximum Likelihood fits for binned and unbinned data, citing difficulties with existing Scipy documentation.
  • Another participant suggests that the functionality may exist within Scipy or specialized libraries, referencing a Stack Exchange post for further guidance.
  • A participant challenges the original poster to clarify the issues faced with their implementation, emphasizing the need for specificity in describing problems encountered with the fitting process.
  • Concerns are raised about the correctness of the original poster's code, particularly regarding the definition of the probability density function (PDF) for the exponential distribution.
  • Participants discuss the limitations of existing methods and the need for a more sophisticated approach to fitting, particularly for binned data.
  • Links to external resources and examples are shared, though some participants express that these do not fully meet their needs for custom fitting functions.
  • One participant reports success in achieving convergence after normalizing their PDF, indicating progress in their fitting efforts.
  • Another participant encourages the original poster to share their experiences with the rv_histogram.fit method for binned data, suggesting it may function similarly to continuous fits.

Areas of Agreement / Disagreement

Participants do not reach a consensus on the best approach for implementing Maximum Likelihood fits for binned data. There are multiple competing views regarding the adequacy of existing methods and the correctness of the original poster's code.

Contextual Notes

Participants note potential issues with the original poster's code, including the definition of the PDF and the handling of scale parameters. There is also mention of the need for further resources or education on statistical methods in Python.

ORF
Messages
169
Reaction score
18
TL;DR
Is there any built-in function to perform Binned Maximum Likelihood fit in python standard libraries?
Hi,

I have been using Python for a while now, but so far for Least-squares fits using curve_fit from Scipy.

I would like to start using Likelihood method to fit binned and unbinned data. I found some documentation in Scipy of how to implement unbinned likelihood fit, but I have not managed to make it work for a simple exponential...

[CODE lang="python" title="Unbinned likelihood fit"]
from scipy.stats import rv_continuous
import numpy as np

class myfunc_gen(rv_continuous):

"Exp distribution"

def _pdf(self, x,a):

return np.exp(x*a)

myfunc = myfunc_gen(name='exp')

a = 1.
x = myfunc.rvs(a, size=10)
a1, loc1, scale1 = myfunc.fit(x, a, floc=0, fscale=1)[/CODE]

I found that Pandas has some fit capabilities, but still quite limiting.

Question: is there any functionality in python equivalent to curve_fit from Scipy for Binned/Unbinned likelihood fits?

Thank you for your time.

Cheers,
ORF
 
Technology news on Phys.org
This is exactly what scipy-stats-rv-continuous-fit is for. Saying 'it doesn't work' is not going to find a solution, you need to be more specific:

Was the result not what you expected? Was it close but not accurate enough? Did it fail to execute? Does it run too slowly? Did your computer catch fire?
 
Hi,

pbuk said:
Was the result not what you expected? Was it close but not accurate enough? Did it fail to execute? Does it run too slowly? Did your computer catch fire?

In this case the result is wrong.
For other simple fitting function it complains either convergence is very slow or directly it reaches 100 iterations and it stops without converging. Probably there is something wrong with my code.

Still, this method is for unbinned data. Is there any method for binned data?

Thank you for your time.

Cheers,
ORF
 
ORF said:
In this case the result is wrong ... Probably there is something wrong with my code.
There is quite a lot wrong with your code. Take a look at how the expon distribution is defined in scipy's source. The key parts are:
Python:
    def _pdf(self, x):
        # expon.pdf(x) = exp(-x)
        return np.exp(-x)
Note that there is no scale parameter in there, _pdf must be defined with a scale factor of 1: you add the scale factor when creating an instance of the class or when calling its methods.

And note that the exponential PDF is not ## e^x ##.

Python:
expon = expon_gen(a=0.0, name='expon')
The exponential distribution is supported in ## [0, \infty) ## but the default support for a rv_continuous distribution is ## (-\infty, \infty) ##. This can be overriden with the (unhelpfully) named parameters a and b: https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.rv_continuous.html

ORF said:
Still, this method is for unbinned data. Is there any method for binned data?
rv_histogram.fit?

I think you probably need a course or a book on statistical methods with Python, unfortunately I can't recommend any.
 
  • Like
Likes   Reactions: ORF
Thanks, but that example is exactly as the one in the documentation (replacing beta distro by exp). I would like to do unbinned and binned likelihood fits using a custom pdf/fitting function.

Thank you for your time.
 
pbuk said:
There is quite a lot wrong with your code. Take a look at how the expon distribution is defined in scipy's source. The key parts are:
Python:
    def _pdf(self, x):
        # expon.pdf(x) = exp(-x)
        return np.exp(-x)
Note that there is no scale parameter in there, _pdf must be defined with a scale factor of 1: you add the scale factor when creating an instance of the class or when calling its methods.

And note that the exponential PDF is not ## e^x ##.

Python:
expon = expon_gen(a=0.0, name='expon')
The exponential distribution is supported in ## [0, \infty) ## but the default support for a rv_continuous distribution is ## (-\infty, \infty) ##. This can be overriden with the (unhelpfully) named parameters a and b: https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.rv_continuous.htmlrv_histogram.fit?

I think you probably need a course or a book on statistical methods with Python, unfortunately I can't recommend any.
Hi, thanks for your very complete explanation. After normalizing the pdf it converges nicely.

I hope I can continue from it.

Thanks,
ORF
 
  • Like
Likes   Reactions: pbuk
  • #10
ORF said:
I hope I can continue from it.
Do let us know how you get on with rv_histogram.fit for your binned data - I have never used it, but it looks like it should work similarly to the continuous fit once configured properly.
 

Similar threads

  • · Replies 6 ·
Replies
6
Views
7K
  • · Replies 6 ·
Replies
6
Views
3K
  • · Replies 4 ·
Replies
4
Views
2K
  • · Replies 6 ·
Replies
6
Views
7K
  • · Replies 1 ·
Replies
1
Views
3K
  • · Replies 1 ·
Replies
1
Views
3K
  • · Replies 1 ·
Replies
1
Views
1K
Replies
3
Views
2K
Replies
7
Views
2K