How to check if a list of numbers is random?

ORF · Sep 13, 2017

Hello

I have used a random number generator to create a list of uniformly random numbers, between 0 and 1.

The usual check that I do is sorting the list, and histograming the difference between the following and the previous one. The shape of the histogram should follow an negative exponential, and the slope will be the size of the list. If the size of the list is around 1e5, no strange effect is observed. If the size of the list is around 1e8, empty bins appear, following a certain repetitive pattern (empty bin - filled bin - empty bin - filled bin - empty bin - filled bin - 3 empty bins - filled bin ... )

I'm not sure about this test, because if the size of the list is huge, the exponential slope will be also huge, and I don't know how to face precision issues. For a list of 1e8 numbers, 1% are repeated (a std::map is used for storing the numbers, so sorting and controlling repetitions is easy).

Question: how to check (in a better way) if a list of numbers is uniformly random?

Thank you for your time.

Regards.

FactChecker · Sep 13, 2017

A good way is to apply the ARIMA time series analysis of Box-Jenkins. It looks for any autocorrelation between the series and any number of lagged values of the same series. So it does what you are trying to do for one lag, but it does it for any number of lags. (You might have every number heavily influenced by the number 10 places earlier, even if it is completely independent of the immediately preceding number.) And it can give you statistics of how statistically significant the autocorrelations are.

PS. It only tests the autocorrelations. You need to use another test to verify that the distribution is close enough to uniform. The Chi-squared goodness of fit test would work well for that.

hilbert2 · Sep 13, 2017

Convert the number sequence ##(x_1 , x_2 , x_3 ,\dots )## to a 2d point sequence ##(x_1 , x_2 ),(x_3 ,x_4 ), (x_5 ,x_6),\dots## and plot the points on an xy-plane. If they're not random or uniformly distributed, you will probably see some obviously "non-random" patterns in the image.

See this blog post I've written: https://physicscomputingblog.com/2017/01/15/random-numbers-in-computation/

phyzguy · Sep 13, 2017

ORF said:

Hello

I have used a random number generator to create a list of uniformly random numbers, between 0 and 1.

The usual check that I do is sorting the list, and histograming the difference between the following and the previous one. The shape of the histogram should follow an negative exponential, and the slope will be the size of the list. If the size of the list is around 1e5, no strange effect is observed. If the size of the list is around 1e8, empty bins appear, following a certain repetitive pattern (empty bin - filled bin - empty bin - filled bin - empty bin - filled bin - 3 empty bins - filled bin ... )

I'm not sure about this test, because if the size of the list is huge, the exponential slope will be also huge, and I don't know how to face precision issues. For a list of 1e8 numbers, 1% are repeated (a std::map is used for storing the numbers, so sorting and controlling repetitions is easy).

Question: how to check (in a better way) if a list of numbers is uniformly random?

Thank you for your time.

Regards.

If you generated the numbers using the random number generator on a computer, they are almost certainly not random, but have been generated using a pseudo-random sequence generator. These numbers will repeat after a certain (usually very large) number of draws. If you require truly random numbers, this is not an easy problem. Most approaches that I know use dedicated hardware based on some sort of stochastic physical process to generate the numbers. Having said that, pseudo-random numbers are usually adequate for almost all applications. You may be able to increase the repeat period of the numbers by using a different seed.

ORF · Sep 13, 2017

Hello

@FactChecker : thank you for the tool! I will try to implement it.
@hilbert2 : thank you for the advice. Sadly, there is nothing strange in that 2D plot.
@phyzguy : you mean a different seed after, let's say, 1e6 random numbers?

Thank you all for your time :)

Regards

phyzguy · Sep 13, 2017

Well, I didn't mean to change the seed part way through the computation, but that's not a bad idea. I meant that the repeat period can be dependent on the seed, and some seeds have longer repeat periods than others.

FactChecker · Sep 13, 2017

ORF said:

Hello

@FactChecker : thank you for the tool! I will try to implement it.

The free software R has an ARIMA program. https://stat.ethz.ch/R-manual/R-devel/library/stats/html/arima.html

@hilbert2 : thank you for the advice. Sadly, there is nothing strange in that 2D plot.

I am not surprised. If you are using a modern uniform random number generator that is part of a computer language standard library, they probably pass all but the most sophisticated tests.

@phyzguy : you mean a different seed after, let's say, 1e6 random numbers?

I don't think this is good idea. It is very unlikely that your choice of the next seed will be statistically better than the continuation of the pseudo-random number sequence from the first seed. And it will throw another variable in that you will need to keep track of. One place where you want different seeds is for different random number sequences where you want to guarantee that two or more simulated options get the same "luck of the draw". For instance, if you were simulating a battle comparing force A versus B fighting the bad guys, there may be some very critical random numbers (e.g. how close a nuclear bomb comes to a target or whether a critical bridge gets captured intact) where you want A and B to have the same luck. This could be in the middle of a large simulation where A is on the 10,000th random number and B is on the 50,000th random number. Then you need a separate random number sequence, nukeAccuracy, that you can use the same luck (i.e. random number) just for those major events in both A and B.

jim mcnamara · Sep 13, 2017

PRNG's are cyclic, they will repeat after (usually) a large number iterations. The cyclic nature of these algorithms is used in GPS, for example. So simply because a random number algorithm is cyclic does not preclude it's use, or mean that it is defective. You! get to pick the one that meets your needs.

Some references:
Pseudo Random Number Generators for Statistical Applications, L. E. Cannon (1971)
Generation of Pseudo-Random Numbers L. E. Howell 1982

Note - outside of cryptography PNRG's are pretty much a done deal. So do not reinvent 40 year old tech. Unless of course this is just part of homework or general messing around. Cryptographically sound PNRG's are periodically revisited.
Read Schneier: https://www.schneier.com/blog/archives/2006/06/random_number_g.html

jbriggs444 · Sep 14, 2017

jim mcnamara said:

PRNG's are cyclic, they will repeat after (usually) a large number iterations.

This is true for any decent PRNG you are likely to actually use. A PRNG with finite bounded state must repeat. However, a PRNG with a state that is allowed to grow without bound need not repeat. As a simple example, consider a "PRNG" that generates the hexadecimal digits of pi. (Or one which simply outputs Champernown's constant).

.Scott · Sep 14, 2017

You are asking how to determine whether the sequence is "uniformly" random. There are several things that a random sequence can provide. In some cases you may need a sequence that is random in the sense that it cannot be predicted. Since you are asking about "uniformity", I will assume that that is what you need and that is what I will address.

By uniformity, you seem to mean something like "all values from 0 to 1 have an equal chance to be selected". But I am guessing that the value you are receiving from your random number generator is a floating point number. If that is the case, what your random number generator is probably doing is this:

1) Generate a random integer value from 0 to N (or perhaps 1 to N) where N is very large - perhaps on the order of 2^64.
2) Convert the value to floating point.
3) Scale it to 0 to 1.

Note that there are a lot more floating point value from 0 to 0.1 than there are from 0.1 to 1.0. So "uniformly" does mean an equal chance for all numbers, it means that the chance of the random value occurring within a sub-interval of the interval 0 to 1 must be the width of that sub-interval.

Let's say that the core random value is in the range 1 to (2^64)-1 (and, if your seed is 64 bits long, this is actually likely). That means that the range of 0 to 2^-30 (about a billionth) will only be receiving about 2^34 codes. Since there are more than 34 bits in the mantissa of a "double" floating point value, you are assured that certain values will never be returned by the random number generator.

In contrast, the higher range values (0.1 to 1.0) will have fewer bits than the core random value - and so they will be hit multiple times before the core random value is repeated.

But, of course, if it is on the order of 2^64, it will not repeat. You don't have enough time, processors, and electric power to call the random number generator 2^64 (about 2*10^19) times.

Detecting that lack of uniformity near zero will be hard to detect. The best way to make sure that it is generating the right random value is to look at the code for the generator. If that is not possible, it is not difficult to write a random number generator to your own spec.

The fact is, uniformity is the easiest thing for the random number generator to do. In most cases, the problem is that it is too uniform. For example, you can use a shift and taps register to generate a full length 33 bit code - and then reject all code that start with a zero. This will give you a precise uniformity over the range 0 to (2^32)-1. But "truly" random numbers are not "precisely" uniform.

ORF · Sep 14, 2017

Hello

The problem was the precision: if you generate a huge list of numbers, but you don't have enough precision (ie, the numbers will be truncated), the list will be bias.

Using "long double" instead of "double" solved the problem.

Thank you all for your time. You have taught me a lot about this topic :)

Regards.
PS: long double random number generator,
http://www.cplusplus.com/reference/random/uniform_real_distribution/uniform_real_distribution/

How to check if a list of numbers is random?

Discussion Overview

Discussion Character

Main Points Raised

Areas of Agreement / Disagreement

Contextual Notes

Similar threads

Use of AI (ML/DL) in Science

Sweetspot of data compression

Other than just FizzBuzz to test programmer candidates

How to show RS(U+TRS)* is equivalent to (R+SUT)SU?

HTML/CSS Problems with DNS records

Insights Revisiting the Velocity-Time Function

Insights Remote Operated Gate Control System

Insights AI Enriched Problem Solving

Insights Thinking Outside The Box Versus Knowing What’s In The Box

Insights Why Entangled Photon-Polarization Qubits Violate Bell’s Inequality

Insights Quantum Entanglement is a Kinematic Fact, not a Dynamical Effect