Generating a random sample with a standard deviation


Discussion Overview

The discussion revolves around generating a random list of numbers that conform to a bell curve, characterized by a specific mean and standard deviation. Participants explore various methods and techniques for achieving this, including programming approaches and mathematical concepts.

Discussion Character

  • Exploratory, Technical explanation, Mathematical reasoning

Main Points Raised

  • One participant describes their current method of generating a bell curve using a predefined list of numbers but expresses dissatisfaction with this approach.
  • Another participant suggests the Box-Muller transform as a potential starting point for generating normally distributed random numbers.
  • A different participant outlines two key problems in pseudo-random number generation: minimizing autocorrelation and achieving the desired sample distribution, recommending established references for further reading.
  • One participant proposes a method involving the cumulative distribution function (CDF) and its inverse for generating samples from a known distribution, emphasizing the need for numerical solutions when the inverse is not analytically solvable.
  • A similar point is reiterated regarding the use of the inverse transform method for sampling from a distribution, highlighting its effectiveness when the CDF is invertible.
  • Concerns are raised about the analytical solvability of the inverse function of a CDF, suggesting reliance on computational methods for numerical solutions.

Areas of Agreement / Disagreement

Participants present multiple competing views and methods for generating random samples that conform to a bell curve. There is no consensus on a single approach, and the discussion remains unresolved regarding the best method to use.

Contextual Notes

Some methods discussed depend on the properties of specific distributions, and the feasibility of analytical solutions for CDF inverses varies. The discussion does not resolve these limitations.

gamow99
I'm trying to write a computer program that generates a random list of numbers, but where the random numbers form a bell curve, that is, there is a mean and a standard deviation from that mean. I'm not interested in some function that just gets the job done; rather, I'm trying to understand how to generate a random list of numbers that are not entirely random but instead conform to a bell curve. I have already done the following in Python:

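import random

# Lookup table: values near 5 are repeated more often than values far from 5.
list5 = [5] * 8
list4 = [4, 5, 6] * 4
list3 = [3, 4, 5, 6, 7] * 2
list2 = list(range(2, 9))    # 2..8
list1 = list(range(1, 11))   # 1..10
list6 = list1 + list2 + list3 + list4 + list5

value = random.choice(list6)  # pick one entry of the table uniformly at random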

So in the above, 5 appears 16 times as often as 1, 9 and 10, 8 times as often as 2 and 8, 4 times as often as 3 and 7, and twice as often as 4 and 6, which roughly forms a bell curve; then I just select randomly from list6. But I don't like that solution.
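Counting the entries of list6 shows exactly which frequencies this lookup-table approach produces. A minimal sketch using the standard library's collections.Counter; the names counts and draw are illustrative, not from the post:

```python
import random
from collections import Counter

# Rebuild the lookup table from the post.
list5 = [5] * 8
list4 = [4, 5, 6] * 4
list3 = [3, 4, 5, 6, 7] * 2
list2 = list(range(2, 9))
list1 = list(range(1, 11))
list6 = list1 + list2 + list3 + list4 + list5

# How many times each value appears in the table.
counts = Counter(list6)
for v in sorted(counts):
    print(v, counts[v])
# Prints one pair per line: 1 1, 2 2, 3 4, 4 8, 5 16, 6 8, 7 4, 8 2, 9 1, 10 1

# A uniform pick from the table returns value v with probability counts[v] / len(list6).
draw = random.choice(list6)
```

Each step toward the centre doubles the frequency, which is why the histogram is peaked, but the result can only ever take the ten tabulated values.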
 
There are two problems in generating pseudo-random numbers that can be handled independently. The first is that the sequence of numbers should have as little detectable autocorrelation as possible. The second is to get the desired sample distribution. Once the first problem is solved for generating uniformly distributed numbers in [0,1), there are several ways to use that to solve the second problem.

There has been a great deal of work done to solve both problems. An excellent reference is Knuth, The Art of Computer Programming, Vol 2: Seminumerical Algorithms. Chapter 3, Random Numbers. (Knuth's series of books is almost a bible for computer programmers.)

I do not advise you to write your own uniform random number generator unless you are prepared to learn a lot of number theory.
The easiest, most versatile, brute-force method to solve the second problem is to use "rejection sampling". See https://en.wikipedia.org/wiki/Rejection_sampling. For the special case of the normal distribution there are several other techniques. A popular one is to use the Box-Muller transformation (see http://www.design.caltech.edu/erik/Misc/Gaussian.html). MathWorks uses other techniques in their MATLAB normrnd function, which they document reasonably well (see https://www.mathworks.com/company/newsletters/articles/normal-behavior.html).
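Both techniques named above can be built in a few lines on top of a uniform generator. The sketch below assumes Python's random module as the uniform source; box_muller and rejection_normal are hypothetical helper names. The rejection sampler uses a uniform proposal on a finite window of ±6σ around the mean, so strictly it samples a normal curve truncated at 6σ (the discarded tails hold roughly 2 in a billion of the probability), and with that window about 21% of proposals are accepted (√(2π)/12 ≈ 0.21).

```python
import math
import random
import statistics

def box_muller(mu, sigma, rng=random):
    """Return two independent N(mu, sigma^2) samples via the basic Box-Muller transform."""
    u1 = 1.0 - rng.random()           # in (0, 1], so log(u1) is finite
    u2 = rng.random()                 # in [0, 1)
    radius = math.sqrt(-2.0 * math.log(u1))
    angle = 2.0 * math.pi * u2
    z0 = radius * math.cos(angle)     # z0 and z1 are independent standard normals
    z1 = radius * math.sin(angle)
    return mu + sigma * z0, mu + sigma * z1

def rejection_normal(mu, sigma, width=6.0, rng=random):
    """Rejection sampling from a bell curve cut off at mu +/- width*sigma.

    Proposal: uniform on [mu - width*sigma, mu + width*sigma].
    A proposed x is accepted with probability f(x) / f_max,
    which for the bell curve is exp(-(x - mu)**2 / (2 * sigma**2)).
    """
    lo, hi = mu - width * sigma, mu + width * sigma
    while True:
        x = rng.uniform(lo, hi)
        if rng.random() < math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2)):
            return x

# Usage: 10,000 samples each with mean 10 and standard deviation 2.
random.seed(42)
bm = [z for _ in range(5000) for z in box_muller(10.0, 2.0)]
rs = [rejection_normal(10.0, 2.0) for _ in range(10000)]
for name, xs in (("Box-Muller", bm), ("rejection", rs)):
    print(name, round(statistics.mean(xs), 2), round(statistics.stdev(xs), 2))
```

Box-Muller produces values in pairs, so a program that needs one value at a time would typically keep the second one for the next call. If only the result matters, Python's own random.gauss and random.normalvariate already return normal samples.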
 
Suppose $f(x) = f(x;\mu,\sigma)$ is your curve with known mean $\mu$ and standard deviation $\sigma$, and $f(x) \ge 0$. Find $C = \int_{-\infty}^{\infty} f(x)\,dx$. Draw a 3-digit (say) random number and put a decimal point before it; call this fraction $R$. Find $x$ by solving $\frac{1}{C}\int_{-\infty}^{x} f(t)\,dt = R$. Then $x$ is a sample from $f(x)$.
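The CDF recipe above can be carried out numerically even when no closed-form CDF is available. A minimal sketch, assuming the curve is tabulated on a finite interval [lo, hi] that holds essentially all of its area, the integrals are approximated with the trapezoidal rule on a grid, and R comes from random.random() rather than a 3-digit number (a 3-digit R can only produce 1000 distinct values of x). make_sampler and bell are hypothetical names.

```python
import bisect
import math
import random
import statistics

def make_sampler(f, lo, hi, n=10_000):
    """Inverse-CDF sampler for a non-negative (possibly unnormalized) curve f on [lo, hi]."""
    h = (hi - lo) / n
    xs = [lo + i * h for i in range(n + 1)]
    ys = [f(x) for x in xs]
    # Running integral of f by the trapezoidal rule: cdf[i] approximates the area from lo to xs[i].
    cdf = [0.0]
    for i in range(n):
        cdf.append(cdf[-1] + 0.5 * h * (ys[i] + ys[i + 1]))
    c = cdf[-1]                       # C, the total area under f
    cdf = [v / c for v in cdf]        # normalized: runs from 0.0 to 1.0

    def sample(rng=random):
        r = rng.random()              # R, uniform in [0, 1)
        # Locate the grid interval with cdf[i] <= r < cdf[i + 1] ...
        i = bisect.bisect_right(cdf, r) - 1
        # ... and solve cdf(x) = r by linear interpolation inside it.
        t = (r - cdf[i]) / (cdf[i + 1] - cdf[i])
        return xs[i] + t * h

    return sample

def bell(x, mu=5.0, sigma=1.5):
    """Bell curve without the 1/(sigma*sqrt(2*pi)) factor; dividing by C handles the scaling."""
    return math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))

# Usage: tabulate mu +/- 6 sigma and draw 20,000 samples.
random.seed(0)
sampler = make_sampler(bell, 5.0 - 6 * 1.5, 5.0 + 6 * 1.5)
samples = [sampler() for _ in range(20_000)]
print(round(statistics.mean(samples), 2), round(statistics.stdev(samples), 2))
```

Because the CDF is normalized by C, f never needs its normalizing constant, and the same sampler works for any non-negative curve, not only the bell curve.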
 
ssd said:
Suppose $f(x) = f(x;\mu,\sigma)$ is your curve with known mean $\mu$ and standard deviation $\sigma$, and $f(x) \ge 0$. Find $C = \int_{-\infty}^{\infty} f(x)\,dx$. Draw a 3-digit (say) random number and put a decimal point before it; call this fraction $R$. Find $x$ by solving $\frac{1}{C}\int_{-\infty}^{x} f(t)\,dt = R$. Then $x$ is a sample from $f(x)$.
If the cumulative distribution function is invertible, this is a great method. It's called the inverse transform method (see https://en.wikipedia.org/wiki/Inverse_transform_sampling )
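Why the inverse transform method works, assuming $F$ is continuous and strictly increasing so that $F^{-1}$ exists and $F^{-1}(u) \le x \iff u \le F(x)$:

$$
U \sim \mathrm{Uniform}(0,1), \quad X = F^{-1}(U)
\quad\Longrightarrow\quad
P(X \le x) = P\bigl(F^{-1}(U) \le x\bigr) = P\bigl(U \le F(x)\bigr) = F(x),
$$

so $X$ has exactly the CDF $F$. The last step uses $P(U \le u) = u$ for $0 \le u \le 1$.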
 
More often than not, the inverse of a CDF cannot be written analytically in terms of simple functions (the normal CDF is an example), so we have to use a computer program to solve for it numerically.
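For the normal curve, the CDF itself can be written with the error function, $\Phi(x) = \tfrac12\bigl(1 + \operatorname{erf}\bigl(\tfrac{x-\mu}{\sigma\sqrt2}\bigr)\bigr)$, but its inverse has no such form, so it has to be found numerically. A minimal sketch that solves $\Phi(x) = u$ by bisection using math.erf; normal_cdf, normal_inv_cdf and sample_normal are hypothetical names, and the ±40σ starting bracket and 100 iterations are illustrative choices rather than tuned values.

```python
import math
import random
import statistics

def normal_cdf(x, mu=0.0, sigma=1.0):
    """Normal CDF written with the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def normal_inv_cdf(p, mu=0.0, sigma=1.0, iterations=100):
    """Solve normal_cdf(x) = p for x (0 < p < 1) by bisection."""
    lo, hi = mu - 40.0 * sigma, mu + 40.0 * sigma   # wide bracket around the mean
    for _ in range(iterations):                     # each step halves the bracket
        mid = 0.5 * (lo + hi)
        if normal_cdf(mid, mu, sigma) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def sample_normal(mu, sigma, rng=random):
    """Inverse transform sampling: feed a uniform draw through the inverse CDF."""
    u = rng.random()
    while u == 0.0:            # random() may return exactly 0.0; we need 0 < u < 1
        u = rng.random()
    return normal_inv_cdf(u, mu, sigma)

# Usage: compare with the standard library's inverse CDF, then draw samples.
print(normal_inv_cdf(0.975), statistics.NormalDist().inv_cdf(0.975))
random.seed(7)
samples = [sample_normal(5.0, 1.5) for _ in range(5000)]
print(round(statistics.mean(samples), 2), round(statistics.stdev(samples), 2))
```

A fixed iteration count is used instead of a width tolerance so the loop always terminates, even when the mean is large and the bracket can no longer shrink in floating point. On Python 3.8 and later, statistics.NormalDist(mu, sigma).inv_cdf(u) provides a faster library implementation of the same inverse.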
 
