Obtaining samples of an RV with a given CDF

  • Context: Undergrad 
  • Thread starter Thread starter chroot
  • Start date Start date
  • Tags Tags
    Cdf
Click For Summary

Discussion Overview

The discussion revolves around obtaining samples from a random variable with a given cumulative distribution function (CDF) using computational methods in MATLAB. Participants explore various approaches to improve the efficiency and accuracy of sampling, particularly in the context of discrete and continuous distributions.

Discussion Character

  • Technical explanation
  • Mathematical reasoning
  • Debate/contested

Main Points Raised

  • One participant describes using interpolation of the inverse function to sample from the CDF but notes performance issues and limitations with horizontal segments in the CDF.
  • Another participant suggests using specialized routines for discrete distributions and recommends vectorization to reduce computational overhead.
  • A later reply emphasizes the continuous nature of the CDF, which is not analytically described, and suggests performing a binary search for efficient sampling.
  • Further suggestions include adjusting x values for better spacing of CDF points and precomputing interpolations to reduce overhead during sampling.
  • Concerns are raised about the potential problems with CDF/quantile interpolation, particularly regarding endpoints and infinite tails.

Areas of Agreement / Disagreement

Participants express differing views on the best methods for sampling from the CDF, with no consensus on a single approach. Various techniques are proposed, but the discussion remains unresolved regarding the most effective solution.

Contextual Notes

Limitations include the dependence on the specific characteristics of the CDF, such as whether it is continuous or discrete, and the challenges posed by horizontal segments in the CDF. The discussion also highlights the need for careful consideration of computational efficiency.

chroot
Staff Emeritus
Science Advisor
Gold Member
Messages
10,270
Reaction score
45
Hey guys,

I know the cdf of a random process, and I want to obtain samples of it computationally. (In MATLAB, specifically.) How can I go about doing this?

The naive approach I took was to put the cdf into a variable, and then use interpolation of the inverse function. More specifically, say the distribution was this:

Code:
angles = [ 0 10 20 30 40 50 60 70 80 90 ];
cdf = [ 0 0.3 0.4 0.6 0.8 0.88 0.92 0.94 0.99 1.0 ];

To find a sample of this random process, I do something like this:

Code:
angle = interp1(cdf, angles, rand, 'spline');

This seems to work reasonably well, but it has some flaws.

First, it's pretty slow, and is a bottleneck in my program. Yes, linear interpolation is a little faster, but still much slower than what I'd like. I tried a quick 'qinterp1' function I found at the MATLAB file exchange, but could not get it to behave correctly.

Second, it will not work if the cdf is ever horizontal. In other words, the cdf [ 0 0.5 1.0 1.0 ... ] won't work, because interp1 can only work when the x values are distinct.

Is there some better way to do this? I remember seeing a mathematical explanation of how to get samples of a random process with a known cdf in a statistics class, but I suspect it was different than what I'm doing. If I'm doing it the right way conceptually, is there a way to speed up the computation? I have to do this many billions of times, so even just doing some precomputation would be enormously beneficial.

- Warren
 
Physics news on Phys.org
If you're simulating from a discrete distribution then there are other specialized routines on the file exchange that might help.

Also if the distributions don't change at every step then you could vectorize to reduce overhead, e.g. with "rand(1e6,1)" instead of "rand".
 
Actually, the cdf is continuous, but is not described analytically. Instead I took samples of the cdf.

- Warren
 
Pick a random number, r, uniformly from (0,1) and then perform a binary search in your CDF array for the highest index whose entry is <= r.
 
chroot said:
Actually, the cdf is continuous, but is not described analytically. Instead I took samples of the cdf.

Ok. Looks you're using equally spaced x points. Instead try choosing the x values so that the cdf points are more equally spaced (more work, but this is only done in the preprocessing). At the places where the cdf is horizontal choose the single x value you think is most appropriate, and add more cdf points nearby to make the spline more accurate.

Also with each call of interp1 there would be a lot of overhead for computing the spline. You might be able to speed things up significantly by precomputing it as described in the documentation for interp1.

In general cdf/quantile interpolation can be problematic though, and extra care should be taken with the endpoints especially if the tails are infinite.

Hope this helps.
 

Similar threads

  • · Replies 16 ·
Replies
16
Views
3K
  • · Replies 36 ·
2
Replies
36
Views
5K
  • · Replies 3 ·
Replies
3
Views
4K
  • · Replies 15 ·
Replies
15
Views
4K
  • · Replies 3 ·
Replies
3
Views
2K
Replies
4
Views
10K
  • · Replies 4 ·
Replies
4
Views
2K
  • · Replies 4 ·
Replies
4
Views
1K
  • · Replies 7 ·
Replies
7
Views
3K
  • · Replies 0 ·
Replies
0
Views
3K