How can I generate random integers without bias using programming?

leroyjenkens · May 23, 2012

In my book, it says the way to produce a random integer from, for example, 1-50 is to use srand() % 50 + 1.
But wouldn't that give "1" the chance of showing up more often than other numbers?
If srand is 0, then the random result is 1. If srand is 50, then the random result is also 1. The other numbers only have one chance of showing up, while "1" has two. Or am I missing something?

Thanks.

Mark44 · May 24, 2012

Yes, you are missing something, or rather, a couple of things. One, srand is used only to seed the random number generator. It doesn't return a value. To generate a random number, use the rand function, which returns an int value.

Two, assuming that an int is 32 bits, the rand function will return a value in the range -2147483648 through +2147483647 inclusive. Applying the modulus operator gives you a number in the range 0 through 49. Adding 1 gives you a number in the range 1 through 50. For each of the numbers 0 through 49, there are lots of possible outputs of rand() % 50 that produce it. For example, rand() could produce ... -99, -49, 1, 51, 101, 151, 201, ..., and rand() % 50 would evaluate to 1.

AlephZero · May 24, 2012

Mark44 said:

Two, assuming that an int is 32 bits, the rand function will return a value in the range -2147483648 through +2147483647 inclusive.

The C++ specification says rand() returns a number between 0 and RAND_MAX, which may be as small as 32767.

Just taking the modulus will not give a truly random distribution. if RAND_MAX was 32767, there would be is a slight bias in favor of numbers between 0 and 17. Think what happens if the random number is between 32750 and 32767.

For anything serious, don't use the rand() library function anyway - you don't know what its properties are, and they are different for different implementations of the standard library. Find a good quality RNG that produces floating point numbers r with ##0 \le r \lt 1##, and then do (int)(r*50) + 1

Hurkyl · May 24, 2012

A good quality RNG that produces integers is just as good as one that produces floating point. Probably better, due to greater precision.

In any case, if your RNG has a small range and you are worried about edge effects, there are a couple of things you can do:

You could sample with rejection. e.g. if your RNG produces numbers in [0, 32767], you could use the algorithm

Code:

int rand50() {
    while(1) {
        int x = rand();
        if(x < 32750) { return x % 50 + 1; }
    }
}

(replace rand() with whatever rng you want)

You could aggregate several values together to diminish the effects. e.g.

Code:

int rand50() {
    while(1) {
        uint32_t x = rand();
        uint32_t y = rand();
        uint32_t z = (x * 32768) + y;
        return z % 50 + 1;
    }
}

(replace rand() with whatever rng you want)

AlephZero · May 24, 2012

Hurkyl said:

A good quality RNG that produces integers is just as good as one that produces floating point. Probably better, due to greater precision.

The precision may be irrelevant here, because taking the modulus is homing in on the low-order bits of the integer. For example in the OP's case, whether the modulus creates an odd or even number depends only on ONE bit of the 32-bit integer. There may be an unwanted pattern of "odds and evens" in the sequence of random numbers, even if each number has equal probabilty of occurring.

Dividing the whole range of the RNG into N intervals avoids that problem. At worst, it homes in on the the high-order bits rather that the low-order, and that is less likely to behave badly.

In any case, for a 32-bit C implementation with IEEE arithmetic, a double can store a wider range of integers exactly (up to 2^52) than an int.

Hurkyl · May 24, 2012

AlephZero said:

The precision may be irrelevant here, because taking the modulus is homing in on the low-order bits of the integer. For example in the OP's case, whether the modulus creates an odd or even number depends only on ONE bit of the 32-bit integer. There may be an unwanted pattern of "odds and evens" in the sequence of random numbers, even if each number has equal probabilty of occurring.

What you are reacting to is a specific deficiency of certain types of low-quality random number generators -- e.g. a Linear Congruential Generator (LCG) with power-of-2 modulus.

But whether the result is even or odd will always be determined by a single bit of information about the random number. And for a high-quality RNG, simply taking the low bit will have exactly the same quality as any other practical method of using it to decide between 0 and 1.

In any case, for a 32-bit C implementation with IEEE arithmetic, a double can store a wider range of integers exactly (up to 2^52) than an int.

I honestly don't remember the last time I programmed in a C/C++ environment that didn't have a 64-bit integer type available.

AlephZero · May 24, 2012

Hurkyl said:

What you are reacting to is a specific deficiency of certain types of low-quality random number generators -- e.g. a Linear Congruential Generator (LCG) with power-of-2 modulus.

Sure, but several widely used C implementations use exactly that, according to Wikipedia. A quick and simple LCG is perfectly adequate for many purposes provided you use it the right way. IMO the OP's book is the wrong way. Usinng the standard library rand(), a better way is

Code:

r = (int)(rand()/(RAND_MAX + 1.0)) * 50 + 1;

With a more sophisticated RNG that might not be necessary, but the defensive programming does no harm.

How can I generate random integers without bias using programming?

What are random integers in programming?

Why do we use random integers in programming?

How do you generate random integers in programming?

What is the difference between random integers and pseudo-random integers?

How can you ensure that random integers are truly random?

Similar threads

Hot Threads

Recent Insights