# How to compute the probability that a given set of numbers is random

1. Aug 21, 2011

### mrspeedybob

Suppose you have a set of digits, for the sake of simplicity we'll make them binary, how would you determine the probability that the set is random? For example, given the following 3 strings of numbers...

1111111111111111111111111111111111111111111111111111111111111111
0010010000111111011010101000100010000101101000110000100011010011
1101011111001000000000011101111011111010010101011111101010110110

The first appears non random while the second and third appear random. In reality, only the third actually is random. Given a truly random string of 1s and 0s all 3 are equally likely. So If I didn't know the origin of the numbers how would I compute a probability that they were generated randomly?

For the sake of simplicity assume that if they are random then the probability of 1 is the same as the probability of 0 and that each digit selected without consideration of any other digit.

2. Aug 21, 2011

### SW VandeCarr

You can't always know. It really depends on how the sequence is generated. The decimal expansion of an irrational number appears random, but is fully determined by an algorithm. A truly random sequence cannot generated by an algorithm unless it includes a randomizing process. It is currently accepted that only quantum processes are truly random, but other processes are effectively random in that no algorithm can be specified.

Randomness can be checked by evaluating the output of a process. Random processes can often be described by their distributions. So if for a string of digits such that every digit is equally likely, then the process should converge to a uniform distribution. However there is always a vanishingly small probability that it may not or that your choice of a distribution is wrong.

Last edited: Aug 21, 2011
3. Aug 21, 2011

### alexfloo

The "probability that the sequence came from some distribution" actually isn't well-defined. In some cases the "probability that distribution D *would* generate the sequence" is a good proxy, but that won't help in this case. This is because the distribution you're interested in is, by definition, the distribution for which all sequences have the same probability.

If you are willing to assume that the dependencies are the same for each digit (if digit 3 has some dependency on digit 1, then digit n+3 has that same dependency on digit n+1), then each subsequence of n digits (for all n) is identically distributed (but not necessarily independent). You can check whether the proportion of 1s is close to 1/2, and you can furthermore check whether the proportion of each 2-character sequence is close to 1/4.

In general, (subject to the assumption above) if the proportion of each n-subsequence is close to 2-n, (with additional weight given to smaller n, since we have a larger sample of them) then it is likely to be random.

This is not a formal procedure, but could form the basis of a formal hypothesis test (and if indeed it works, this has probably been done before).

4. Aug 21, 2011

### Hurkyl

Staff Emeritus
You can't.

What you can do is consider an alternate hypothesis, and do a statistical test to obtain evidence for or against the random generation hypothesis vs the alternate hypothesis.

Ideally, you'd want to involve prior probabilities too -- e.g. if you have very high confidence that a particular source is trustworthy and claims a sequence is random, that should affect whatever inference you make after gaining evidence from the statistical test.

Another thing you can do is to decide upon a statistical test before looking at the sequences (e.g. these tests), and then use the test to obtain a level of "confidence". In general this isn't so great, but it's a decent heuristic if you have nothing else to do.

Share this great discussion with others via Reddit, Google+, Twitter, or Facebook