How Can You Draw from a PDF in Python Without the CDF?

Click For Summary
SUMMARY

This discussion focuses on generating random samples from a probability density function (PDF) in Python without utilizing the cumulative distribution function (CDF). Participants highlight the use of rejection sampling, specifically the generalized version available in scipy.stats.sampling.RatioUniforms, as a viable method. Additionally, a method involving numerical integration and inversion of the PDF to derive the CDF is discussed, though it may be cumbersome and less efficient for complex PDFs. The consensus suggests starting with rejection sampling for efficiency, especially when dealing with large sample sizes.

PREREQUISITES
  • Understanding of probability density functions (PDFs)
  • Familiarity with Python programming
  • Knowledge of numerical integration techniques
  • Experience with the SciPy library, particularly scipy.stats
NEXT STEPS
  • Explore scipy.stats.sampling.RatioUniforms for rejection sampling implementation
  • Learn about numerical integration methods in Python using scipy.integrate.quad
  • Investigate the performance of rejection sampling versus numerical integration for various PDFs
  • Study the concept of acceptance-rejection sampling and its efficiency trade-offs
USEFUL FOR

This discussion is beneficial for data scientists, statisticians, and Python developers interested in sampling techniques from complex probability distributions without relying on CDFs.

ergospherical
Science Advisor
Homework Helper
Education Advisor
Insights Author
Messages
1,100
Reaction score
1,387
Some python function f(x) defines an (unnormalised) pdf between x_min and x_max
and say we want to draw x randomly from this distribution

if we had the CDF F(x) and its inverse F^{-1}(x), we could take values y uniformly in [0,1], and then our random values of x would be x = F^{-1}(y).

but without the CDF is there a direct way, maybe even built into some python library?
 
Computer science news on Phys.org
Greg Bernhardt said:
How is this related to PDFs?
Probability density function
 
  • Haha
Likes   Reactions: Greg Bernhardt
Frabjous said:
Probability density function
oh boy, I had the wrong acronym :biggrin:
 
  • Like
Likes   Reactions: Frabjous
ergospherical said:
but without the CDF is there a direct way, maybe even built into some python library?
Yes, a generalised version of rejection sampling (which allows for support over ## [-\infty, +\infty] ##) as pointed to by @f95toli is available as scipy.stats.sampling.RatioUniforms as well as other methods including some based on numerical integration and inversion of the PDF to get the CDF which is then interpolated.
 
  • Like
Likes   Reactions: ergospherical
That is an interesting idea. I'd gone ahead and implemented a method based on the inverse cumulative density function, which is a bit cumbersome, namely

- define the pdf, i.e. pdf = lambda x: __some function of x__
- calculate the cdf, i.e. cdf = lambda x: quad(pdf, 0, x)[0]
- invert the cdf, i.e. return fsolve(lambda x: cdf(x) - random.random(), init_guess)[0]

where for init_guess I can simply feed in what is roughly the expected value.

I think it works, but I'm curious if rejection sampling would be faster (?). In particular, I'm now sampling from ##[0, \infty]## so I would need a generalized implementation.
 
Also, if the pdf is nasty then the numerical integration can go wrong...
 
Rejection sampling is usually quite fast. I would start there and only move to other methods if rejection sampling doesn't work.
 
  • #10
ergospherical said:
Also, if the pdf is nasty then the numerical integration can go wrong...
Yes, but a difficult integration often goes with a very small acceptance area so naive rejection sampling may be impossibly slow.

There is of course a trade-off depending on how many samples you want: if it is only a handful there is less to be gained by spending time on the integration, but if you want a few million and the acceptance is one in ## 10^9 ##...

The prebuilt SciPy routines are pretty good and easy to use, I'd experiment to see what works best.
 

Similar threads

  • · Replies 8 ·
Replies
8
Views
2K
  • · Replies 4 ·
Replies
4
Views
2K
  • · Replies 5 ·
Replies
5
Views
2K
  • · Replies 15 ·
Replies
15
Views
2K
  • · Replies 5 ·
Replies
5
Views
3K
  • · Replies 50 ·
2
Replies
50
Views
6K
  • · Replies 8 ·
Replies
8
Views
2K
  • · Replies 5 ·
Replies
5
Views
6K
  • · Replies 1 ·
Replies
1
Views
2K
  • · Replies 2 ·
Replies
2
Views
2K