Dismiss Notice
Join Physics Forums Today!
The friendliest, high quality science and math community on the planet! Everyone who loves science is here!

How to compute the probability that a given set of numbers is random

  1. Aug 21, 2011 #1
    Suppose you have a set of digits, for the sake of simplicity we'll make them binary, how would you determine the probability that the set is random? For example, given the following 3 strings of numbers...


    The first appears non random while the second and third appear random. In reality, only the third actually is random. Given a truly random string of 1s and 0s all 3 are equally likely. So If I didn't know the origin of the numbers how would I compute a probability that they were generated randomly?

    For the sake of simplicity assume that if they are random then the probability of 1 is the same as the probability of 0 and that each digit selected without consideration of any other digit.
  2. jcsd
  3. Aug 21, 2011 #2
    You can't always know. It really depends on how the sequence is generated. The decimal expansion of an irrational number appears random, but is fully determined by an algorithm. A truly random sequence cannot generated by an algorithm unless it includes a randomizing process. It is currently accepted that only quantum processes are truly random, but other processes are effectively random in that no algorithm can be specified.

    Randomness can be checked by evaluating the output of a process. Random processes can often be described by their distributions. So if for a string of digits such that every digit is equally likely, then the process should converge to a uniform distribution. However there is always a vanishingly small probability that it may not or that your choice of a distribution is wrong.
    Last edited: Aug 21, 2011
  4. Aug 21, 2011 #3
    The "probability that the sequence came from some distribution" actually isn't well-defined. In some cases the "probability that distribution D *would* generate the sequence" is a good proxy, but that won't help in this case. This is because the distribution you're interested in is, by definition, the distribution for which all sequences have the same probability.

    If you are willing to assume that the dependencies are the same for each digit (if digit 3 has some dependency on digit 1, then digit n+3 has that same dependency on digit n+1), then each subsequence of n digits (for all n) is identically distributed (but not necessarily independent). You can check whether the proportion of 1s is close to 1/2, and you can furthermore check whether the proportion of each 2-character sequence is close to 1/4.

    In general, (subject to the assumption above) if the proportion of each n-subsequence is close to 2-n, (with additional weight given to smaller n, since we have a larger sample of them) then it is likely to be random.

    This is not a formal procedure, but could form the basis of a formal hypothesis test (and if indeed it works, this has probably been done before).
  5. Aug 21, 2011 #4


    User Avatar
    Staff Emeritus
    Science Advisor
    Gold Member

    You can't.

    What you can do is consider an alternate hypothesis, and do a statistical test to obtain evidence for or against the random generation hypothesis vs the alternate hypothesis.

    Ideally, you'd want to involve prior probabilities too -- e.g. if you have very high confidence that a particular source is trustworthy and claims a sequence is random, that should affect whatever inference you make after gaining evidence from the statistical test.

    Another thing you can do is to decide upon a statistical test before looking at the sequences (e.g. these tests), and then use the test to obtain a level of "confidence". In general this isn't so great, but it's a decent heuristic if you have nothing else to do.
Share this great discussion with others via Reddit, Google+, Twitter, or Facebook