How can I generate random integers without bias using programming?


Discussion Overview

The discussion revolves around methods for generating random integers without bias in programming, particularly focusing on the use of the C standard library's random number generation functions. Participants explore the implications of using the modulus operator with the rand function and propose alternative approaches to achieve a uniform distribution of random integers.

Discussion Character

  • Technical explanation
  • Debate/contested
  • Mathematical reasoning

Main Points Raised

  • One participant questions whether using srand() % 50 + 1 introduces bias, suggesting that the number "1" could appear more frequently than others based on the output of srand.
  • Another participant clarifies that srand is used to seed the random number generator, while the rand function is responsible for generating random integers. They note that the range of rand can lead to biases when using the modulus operator.
  • A different participant points out that if RAND_MAX is small, there could be a bias favoring certain numbers due to the distribution of outputs from rand().
  • Some participants propose using rejection sampling or aggregating multiple random values to mitigate bias when generating random integers.
  • Concerns are raised about the low-order bits of integers affecting the distribution of odd and even numbers, suggesting that this could lead to patterns in the output.
  • Several participants discuss the quality of random number generators, with some advocating for high-quality RNGs over the standard library's rand function, citing potential deficiencies in certain implementations.

Areas of Agreement / Disagreement

Participants express differing views on the effectiveness and reliability of the rand function and its implementations. There is no consensus on the best method for generating random integers without bias, as various approaches and concerns are presented.

Contextual Notes

Limitations include the potential biases introduced by the modulus operation, the varying properties of different random number generators, and the specific deficiencies of low-quality RNGs. The discussion does not resolve these issues, leaving open questions about the best practices for random number generation.

leroyjenkens
In my book, it says the way to produce a random integer from, for example, 1-50 is to use srand() % 50 + 1.
But wouldn't that give "1" the chance of showing up more often than other numbers?
If srand is 0, then the random result is 1. If srand is 50, then the random result is also 1. The other numbers only have one chance of showing up, while "1" has two. Or am I missing something?

Thanks.
 
Yes, you are missing something, or rather, a couple of things. One, srand is used only to seed the random number generator. It doesn't return a value. To generate a random number, use the rand function, which returns an int value.

Two, assuming that an int is 32 bits, the rand function will return a value in the range -2147483648 through +2147483647 inclusive. Applying the modulus operator gives you a number in the range 0 through 49. Adding 1 gives you a number in the range 1 through 50. For each of the numbers 0 through 49, there are lots of possible outputs of rand() % 50 that produce it. For example, rand() could produce ... -99, -49, 1, 51, 101, 151, 201, ..., and rand() % 50 would evaluate to 1.
 
Mark44 said:
Two, assuming that an int is 32 bits, the rand function will return a value in the range -2147483648 through +2147483647 inclusive.

The C++ specification says rand() returns a number between 0 and RAND_MAX, which may be as small as 32767.

Just taking the modulus will not give a truly uniform distribution. If RAND_MAX were 32767, there would be a slight bias in favor of numbers between 0 and 17. Think about what happens if the random number is between 32750 and 32767.
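The size of that bias is easy to verify by brute force. Here is a small sketch (my own illustration, assuming a hypothetical RAND_MAX of exactly 32767, i.e. 32768 equally likely outputs):

```c
/* Count how often each residue mod 50 occurs among the 32768
   possible outputs of a hypothetical rand() with RAND_MAX == 32767.
   Since 32768 = 655 * 50 + 18, residues 0..17 each occur 656 times
   and residues 18..49 each occur 655 times: the "extra" outputs
   32750..32767 all land on residues 0..17. */
void residue_counts(int counts[50]) {
    for (int r = 0; r < 50; r++) counts[r] = 0;
    for (int v = 0; v <= 32767; v++) counts[v % 50]++;
}
```

After the +1 shift this is exactly the bias toward 1 through 18 described above.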

For anything serious, don't use the rand() library function anyway - you don't know what its properties are, and they are different for different implementations of the standard library. Find a good quality RNG that produces floating point numbers r with ##0 \le r \lt 1##, and then do (int)(r*50) + 1
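The scaling step suggested there can be written as a standalone helper (a sketch of my own; the double r is assumed to come from whatever high-quality RNG is chosen, uniform on [0, 1)):

```c
/* Map a uniform double r in [0, 1) to an integer in 1..50.
   Because r < 1, r * 50.0 < 50, so the result never exceeds 50. */
int scale_to_1_50(double r) {
    return (int)(r * 50.0) + 1;
}
```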
 
A good quality RNG that produces integers is just as good as one that produces floating point. Probably better, due to greater precision.

In any case, if your RNG has a small range and you are worried about edge effects, there are a couple of things you can do:

You could sample with rejection. e.g. if your RNG produces numbers in [0, 32767], you could use the algorithm
Code:
int rand50() {
    /* 32750 = 655 * 50 is the largest multiple of 50 not exceeding
       the 32768 possible outputs; rejecting x >= 32750 leaves every
       residue mod 50 equally likely. */
    while (1) {
        int x = rand();
        if (x < 32750) { return x % 50 + 1; }
    }
}
(replace rand() with whatever rng you want)
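The same idea can be written with the cutoff computed from RAND_MAX instead of hard-coded; this generalized version is my own sketch, not from the thread:

```c
#include <stdlib.h>

/* Rejection sampling for a uniform integer in 1..n, assuming rand()
   is uniform on [0, RAND_MAX].  limit is the largest multiple of n
   that fits in the output range, so accepted values are exactly
   uniform over the residues 0..n-1. */
int uniform_1_to_n(int n) {
    /* RAND_MAX + 1 as unsigned long, to avoid overflow if
       RAND_MAX == INT_MAX */
    unsigned long span = (unsigned long)RAND_MAX + 1UL;
    unsigned long limit = span - (span % (unsigned long)n);
    unsigned long x;
    do {
        x = (unsigned long)rand();
    } while (x >= limit);
    return (int)(x % (unsigned long)n) + 1;
}
```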

You could aggregate several values together to diminish the effects. e.g.
Code:
int rand50() {
    /* Combine two 15-bit rand() outputs into one 30-bit value; the
       residual remainder bias shrinks by a factor of about 32768. */
    uint32_t x = rand();
    uint32_t y = rand();
    uint32_t z = (x * 32768) + y;
    return z % 50 + 1;
}
(replace rand() with whatever rng you want)
 
Hurkyl said:
A good quality RNG that produces integers is just as good as one that produces floating point. Probably better, due to greater precision.

The precision may be irrelevant here, because taking the modulus is homing in on the low-order bits of the integer. For example in the OP's case, whether the modulus creates an odd or even number depends only on ONE bit of the 32-bit integer. There may be an unwanted pattern of "odds and evens" in the sequence of random numbers, even if each number has equal probability of occurring.

Dividing the whole range of the RNG into N intervals avoids that problem. At worst, it homes in on the high-order bits rather than the low-order ones, and those are less likely to behave badly.
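The interval idea can be sketched for the [0, 32767] example (my own illustration; the widening to 64 bits just avoids overflow in the multiply, and the slight non-uniformity from 32768 not being divisible by 50 still remains):

```c
#include <stdint.h>

/* Divide the RNG's output range [0, 32767] into 50 equal-width
   buckets, so the result depends on the high-order bits of x
   rather than its low-order bits. */
int bucket_1_to_50(uint32_t x) {
    return (int)((uint64_t)x * 50u / 32768u) + 1;
}
```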

In any case, for a 32-bit C implementation with IEEE arithmetic, a double can store a wider range of integers exactly (up to 2^52) than an int.
 
AlephZero said:
The precision may be irrelevant here, because taking the modulus is homing in on the low-order bits of the integer. For example in the OP's case, whether the modulus creates an odd or even number depends only on ONE bit of the 32-bit integer. There may be an unwanted pattern of "odds and evens" in the sequence of random numbers, even if each number has equal probability of occurring.
What you are reacting to is a specific deficiency of certain types of low-quality random number generators -- e.g. a Linear Congruential Generator (LCG) with power-of-2 modulus.

But whether the result is even or odd will always be determined by a single bit of information about the random number. And for a high-quality RNG, simply taking the low bit will have exactly the same quality as any other practical method of using it to decide between 0 and 1.
In any case, for a 32-bit C implementation with IEEE arithmetic, a double can store a wider range of integers exactly (up to 2^52) than an int.
I honestly don't remember the last time I programmed in a C/C++ environment that didn't have a 64-bit integer type available.
 
Hurkyl said:
What you are reacting to is a specific deficiency of certain types of low-quality random number generators -- e.g. a Linear Congruential Generator (LCG) with power-of-2 modulus.
Sure, but several widely used C implementations use exactly that, according to Wikipedia. A quick and simple LCG is perfectly adequate for many purposes, provided you use it the right way. IMO the OP's book is the wrong way. Using the standard library rand(), a better way is
Code:
r = (int)(rand()/(RAND_MAX + 1.0) * 50) + 1;
With a more sophisticated RNG that might not be necessary, but the defensive programming does no harm.
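As a quick sanity check on that expression (wrapped here in a hypothetical helper for illustration), the smallest and largest possible rand() outputs land on 1 and 50, and nothing can fall outside that range because rand()/(RAND_MAX + 1.0) is strictly less than 1:

```c
#include <stdlib.h>

/* The division-based mapping from the post above, parameterized on
   the raw rand() output x so the edge cases can be checked directly. */
int map50(long x) {
    return (int)(x / (RAND_MAX + 1.0) * 50) + 1;
}
```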
 
