Python Binned Maximum Likelihood fit in python?

AI Thread Summary
The discussion revolves around using Python for fitting data using the likelihood method, specifically for both binned and unbinned data. The user initially attempts to implement an exponential distribution using Scipy's `rv_continuous` but encounters issues with convergence and incorrect results. Suggestions include examining the Scipy source code for the exponential distribution to correct the probability density function (PDF) definition, emphasizing that the PDF should reflect the correct mathematical form. The conversation also touches on the limitations of Pandas for fitting and the need for a method similar to `curve_fit` for maximum likelihood fitting. Resources and links to relevant code examples are shared, including a reference to `rv_histogram.fit` for binned data. After correcting the PDF normalization, the user reports improved convergence and expresses interest in further exploring fitting methods.
ORF
Messages
169
Reaction score
18
TL;DR Summary
Is there any built-in function to perform Binned Maximum Likelihood fit in python standard libraries?
Hi,

I have been using Python for a while now, but so far for Least-squares fits using curve_fit from Scipy.

I would like to start using Likelihood method to fit binned and unbinned data. I found some documentation in Scipy of how to implement unbinned likelihood fit, but I have not managed to make it work for a simple exponential...

[CODE lang="python" title="Unbinned likelihood fit"]
from scipy.stats import rv_continuous
import numpy as np

class myfunc_gen(rv_continuous):

"Exp distribution"

def _pdf(self, x,a):

return np.exp(x*a)

myfunc = myfunc_gen(name='exp')

a = 1.
x = myfunc.rvs(a, size=10)
a1, loc1, scale1 = myfunc.fit(x, a, floc=0, fscale=1)[/CODE]

I found that Pandas has some fit capabilities, but still quite limiting.

Question: is there any functionality in python equivalent to curve_fit from Scipy for Binned/Unbinned likelihood fits?

Thank you for your time.

Cheers,
ORF
 
Technology news on Phys.org
This is exactly what scipy-stats-rv-continuous-fit is for. Saying 'it doesn't work' is not going to find a solution, you need to be more specific:

Was the result not what you expected? Was it close but not accurate enough? Did it fail to execute? Does it run too slowly? Did your computer catch fire?
 
Hi,

pbuk said:
Was the result not what you expected? Was it close but not accurate enough? Did it fail to execute? Does it run too slowly? Did your computer catch fire?

In this case the result is wrong.
For other simple fitting function it complains either convergence is very slow or directly it reaches 100 iterations and it stops without converging. Probably there is something wrong with my code.

Still, this method is for unbinned data. Is there any method for binned data?

Thank you for your time.

Cheers,
ORF
 
ORF said:
In this case the result is wrong ... Probably there is something wrong with my code.
There is quite a lot wrong with your code. Take a look at how the expon distribution is defined in scipy's source. The key parts are:
Python:
    def _pdf(self, x):
        # expon.pdf(x) = exp(-x)
        return np.exp(-x)
Note that there is no scale parameter in there, _pdf must be defined with a scale factor of 1: you add the scale factor when creating an instance of the class or when calling its methods.

And note that the exponential PDF is not ## e^x ##.

Python:
expon = expon_gen(a=0.0, name='expon')
The exponential distribution is supported in ## [0, \infty) ## but the default support for a rv_continuous distribution is ## (-\infty, \infty) ##. This can be overriden with the (unhelpfully) named parameters a and b: https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.rv_continuous.html

ORF said:
Still, this method is for unbinned data. Is there any method for binned data?
rv_histogram.fit?

I think you probably need a course or a book on statistical methods with Python, unfortunately I can't recommend any.
 
  • Like
Likes ORF
Thanks, but that example is exactly as the one in the documentation (replacing beta distro by exp). I would like to do unbinned and binned likelihood fits using a custom pdf/fitting function.

Thank you for your time.
 
pbuk said:
There is quite a lot wrong with your code. Take a look at how the expon distribution is defined in scipy's source. The key parts are:
Python:
    def _pdf(self, x):
        # expon.pdf(x) = exp(-x)
        return np.exp(-x)
Note that there is no scale parameter in there, _pdf must be defined with a scale factor of 1: you add the scale factor when creating an instance of the class or when calling its methods.

And note that the exponential PDF is not ## e^x ##.

Python:
expon = expon_gen(a=0.0, name='expon')
The exponential distribution is supported in ## [0, \infty) ## but the default support for a rv_continuous distribution is ## (-\infty, \infty) ##. This can be overriden with the (unhelpfully) named parameters a and b: https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.rv_continuous.htmlrv_histogram.fit?

I think you probably need a course or a book on statistical methods with Python, unfortunately I can't recommend any.
Hi, thanks for your very complete explanation. After normalizing the pdf it converges nicely.

I hope I can continue from it.

Thanks,
ORF
 
  • Like
Likes pbuk
  • #10
ORF said:
I hope I can continue from it.
Do let us know how you get on with rv_histogram.fit for your binned data - I have never used it, but it looks like it should work similarly to the continuous fit once configured properly.
 

Similar threads

Replies
6
Views
7K
Replies
6
Views
3K
Replies
4
Views
2K
Replies
6
Views
7K
Replies
1
Views
2K
Replies
1
Views
1K
Back
Top