How Can You Draw from a PDF in Python Without the CDF?

ergospherical · Feb 21, 2024

Some python function f(x) defines an (unnormalised) pdf between x_min and x_max
and say we want to draw x randomly from this distribution

if we had the CDF F(x) and its inverse F^{-1}(x), we could take values y uniformly in [0,1], and then our random values of x would be x = F^{-1}(y).

but without the CDF is there a direct way, maybe even built into some python library?

f95toli · Feb 21, 2024

What do you mean by direct?
Would rejection sampling work?
https://en.wikipedia.org/wiki/Rejection_sampling

Greg Bernhardt · Feb 21, 2024

How is this related to PDFs?

Frabjous · Feb 21, 2024

Greg Bernhardt said:

How is this related to PDFs?

Probability density function

Greg Bernhardt · Feb 21, 2024

Frabjous said:

Probability density function

oh boy, I had the wrong acronym

pbuk · Feb 21, 2024

ergospherical said:

but without the CDF is there a direct way, maybe even built into some python library?

Yes, a generalised version of rejection sampling (which allows for support over ## [-\infty, +\infty] ##) as pointed to by @f95toli is available as scipy.stats.sampling.RatioUniforms as well as other methods including some based on numerical integration and inversion of the PDF to get the CDF which is then interpolated.

ergospherical · Feb 23, 2024

That is an interesting idea. I'd gone ahead and implemented a method based on the inverse cumulative density function, which is a bit cumbersome, namely

- define the pdf, i.e. pdf = lambda x: __some function of x__
- calculate the cdf, i.e. cdf = lambda x: quad(pdf, 0, x)[0]
- invert the cdf, i.e. return fsolve(lambda x: cdf(x) - random.random(), init_guess)[0]

where for init_guess I can simply feed in what is roughly the expected value.

I think it works, but I'm curious if rejection sampling would be faster (?). In particular, I'm now sampling from ##[0, \infty]## so I would need a generalized implementation.

ergospherical · Feb 23, 2024

Also, if the pdf is nasty then the numerical integration can go wrong...

phyzguy · Feb 23, 2024

Rejection sampling is usually quite fast. I would start there and only move to other methods if rejection sampling doesn't work.

pbuk · Feb 24, 2024

ergospherical said:

Also, if the pdf is nasty then the numerical integration can go wrong...

Yes, but a difficult integration often goes with a very small acceptance area so naive rejection sampling may be impossibly slow.

There is of course a trade-off depending on how many samples you want: if it is only a handful there is less to be gained by spending time on the integration, but if you want a few million and the acceptance is one in ## 10^9 ##...

The prebuilt SciPy routines are pretty good and easy to use, I'd experiment to see what works best.

How Can You Draw from a PDF in Python Without the CDF?

Similar threads

Hot Threads

Recent Insights