How Can You Draw from a PDF in Python Without the CDF?

AI Thread Summary
To draw random samples from a probability density function (PDF) defined between x_min and x_max without a cumulative distribution function (CDF), rejection sampling is a viable method. SciPy offers a generalized version of rejection sampling through the RatioUniforms function, which can handle distributions over the entire real line. Alternatively, one can implement a sampling method based on the inverse CDF, though this approach involves defining the PDF, calculating the CDF using numerical integration, and then inverting it to find random samples. While this method can be cumbersome and may struggle with difficult integrations, rejection sampling is generally faster, especially for larger sample sizes. However, if the PDF has a small acceptance area, naive rejection sampling may be inefficient. The choice of method may depend on the number of samples required and the characteristics of the PDF, with SciPy's prebuilt routines being recommended for ease of use.
ergospherical
Science Advisor
Homework Helper
Education Advisor
Insights Author
Messages
1,097
Reaction score
1,384
Some python function f(x) defines an (unnormalised) pdf between x_min and x_max
and say we want to draw x randomly from this distribution

if we had the CDF F(x) and its inverse F^{-1}(x), we could take values y uniformly in [0,1], and then our random values of x would be x = F^{-1}(y).

but without the CDF is there a direct way, maybe even built into some python library?
 
Computer science news on Phys.org
How is this related to PDFs?
 
Greg Bernhardt said:
How is this related to PDFs?
Probability density function
 
  • Haha
Likes Greg Bernhardt
Frabjous said:
Probability density function
oh boy, I had the wrong acronym :biggrin:
 
ergospherical said:
but without the CDF is there a direct way, maybe even built into some python library?
Yes, a generalised version of rejection sampling (which allows for support over ## [-\infty, +\infty] ##) as pointed to by @f95toli is available as scipy.stats.sampling.RatioUniforms as well as other methods including some based on numerical integration and inversion of the PDF to get the CDF which is then interpolated.
 
  • Like
Likes ergospherical
That is an interesting idea. I'd gone ahead and implemented a method based on the inverse cumulative density function, which is a bit cumbersome, namely

- define the pdf, i.e. pdf = lambda x: __some function of x__
- calculate the cdf, i.e. cdf = lambda x: quad(pdf, 0, x)[0]
- invert the cdf, i.e. return fsolve(lambda x: cdf(x) - random.random(), init_guess)[0]

where for init_guess I can simply feed in what is roughly the expected value.

I think it works, but I'm curious if rejection sampling would be faster (?). In particular, I'm now sampling from ##[0, \infty]## so I would need a generalized implementation.
 
Also, if the pdf is nasty then the numerical integration can go wrong...
 
Rejection sampling is usually quite fast. I would start there and only move to other methods if rejection sampling doesn't work.
 
  • #10
ergospherical said:
Also, if the pdf is nasty then the numerical integration can go wrong...
Yes, but a difficult integration often goes with a very small acceptance area so naive rejection sampling may be impossibly slow.

There is of course a trade-off depending on how many samples you want: if it is only a handful there is less to be gained by spending time on the integration, but if you want a few million and the acceptance is one in ## 10^9 ##...

The prebuilt SciPy routines are pretty good and easy to use, I'd experiment to see what works best.
 

Similar threads

Back
Top