How to compute the probability that a given set of numbers is random

mrspeedybob
Messages
869
Reaction score
65
Suppose you have a set of digits, for the sake of simplicity we'll make them binary, how would you determine the probability that the set is random? For example, given the following 3 strings of numbers...

1111111111111111111111111111111111111111111111111111111111111111
0010010000111111011010101000100010000101101000110000100011010011
1101011111001000000000011101111011111010010101011111101010110110



The first appears non random while the second and third appear random. In reality, only the third actually is random. Given a truly random string of 1s and 0s all 3 are equally likely. So If I didn't know the origin of the numbers how would I compute a probability that they were generated randomly?

For the sake of simplicity assume that if they are random then the probability of 1 is the same as the probability of 0 and that each digit selected without consideration of any other digit.
 
Physics news on Phys.org
mrspeedybob said:
Suppose you have a set of digits, for the sake of simplicity we'll make them binary, how would you determine the probability that the set is random? For example, given the following 3 strings of numbers...

1111111111111111111111111111111111111111111111111111111111111111
0010010000111111011010101000100010000101101000110000100011010011
1101011111001000000000011101111011111010010101011111101010110110
The first appears non random while the second and third appear random. In reality, only the third actually is random. Given a truly random string of 1s and 0s all 3 are equally likely. So If I didn't know the origin of the numbers how would I compute a probability that they were generated randomly?

For the sake of simplicity assume that if they are random then the probability of 1 is the same as the probability of 0 and that each digit selected without consideration of any other digit.

You can't always know. It really depends on how the sequence is generated. The decimal expansion of an irrational number appears random, but is fully determined by an algorithm. A truly random sequence cannot generated by an algorithm unless it includes a randomizing process. It is currently accepted that only quantum processes are truly random, but other processes are effectively random in that no algorithm can be specified.

Randomness can be checked by evaluating the output of a process. Random processes can often be described by their distributions. So if for a string of digits such that every digit is equally likely, then the process should converge to a uniform distribution. However there is always a vanishingly small probability that it may not or that your choice of a distribution is wrong.
 
Last edited:
The "probability that the sequence came from some distribution" actually isn't well-defined. In some cases the "probability that distribution D *would* generate the sequence" is a good proxy, but that won't help in this case. This is because the distribution you're interested in is, by definition, the distribution for which all sequences have the same probability.

If you are willing to assume that the dependencies are the same for each digit (if digit 3 has some dependency on digit 1, then digit n+3 has that same dependency on digit n+1), then each subsequence of n digits (for all n) is identically distributed (but not necessarily independent). You can check whether the proportion of 1s is close to 1/2, and you can furthermore check whether the proportion of each 2-character sequence is close to 1/4.

In general, (subject to the assumption above) if the proportion of each n-subsequence is close to 2-n, (with additional weight given to smaller n, since we have a larger sample of them) then it is likely to be random.

This is not a formal procedure, but could form the basis of a formal hypothesis test (and if indeed it works, this has probably been done before).
 
mrspeedybob said:
So If I didn't know the origin of the numbers how would I compute a probability that they were generated randomly?
You can't.

What you can do is consider an alternate hypothesis, and do a statistical test to obtain evidence for or against the random generation hypothesis vs the alternate hypothesis.

Ideally, you'd want to involve prior probabilities too -- e.g. if you have very high confidence that a particular source is trustworthy and claims a sequence is random, that should affect whatever inference you make after gaining evidence from the statistical test.


Another thing you can do is to decide upon a statistical test before looking at the sequences (e.g. these tests), and then use the test to obtain a level of "confidence". In general this isn't so great, but it's a decent heuristic if you have nothing else to do.
 
Hi all, I've been a roulette player for more than 10 years (although I took time off here and there) and it's only now that I'm trying to understand the physics of the game. Basically my strategy in roulette is to divide the wheel roughly into two halves (let's call them A and B). My theory is that in roulette there will invariably be variance. In other words, if A comes up 5 times in a row, B will be due to come up soon. However I have been proven wrong many times, and I have seen some...
Thread 'Detail of Diagonalization Lemma'
The following is more or less taken from page 6 of C. Smorynski's "Self-Reference and Modal Logic". (Springer, 1985) (I couldn't get raised brackets to indicate codification (Gödel numbering), so I use a box. The overline is assigning a name. The detail I would like clarification on is in the second step in the last line, where we have an m-overlined, and we substitute the expression for m. Are we saying that the name of a coded term is the same as the coded term? Thanks in advance.
Back
Top