How to compute the probability that a given set of numbers is random

Click For Summary

Discussion Overview

The discussion revolves around the challenge of computing the probability that a given set of binary numbers is random. Participants explore various methods and considerations related to randomness, including the implications of different generation processes and statistical testing.

Discussion Character

  • Exploratory
  • Technical explanation
  • Debate/contested
  • Mathematical reasoning

Main Points Raised

  • Some participants suggest that determining the randomness of a sequence depends on how it was generated, noting that sequences from algorithms may appear random but are not truly random.
  • It is proposed that randomness can be evaluated by examining the distribution of digits, with the expectation that a random process should converge to a uniform distribution.
  • One participant argues that the "probability that the sequence came from some distribution" is not well-defined, and that checking the proportion of 1s and 2-character sequences can provide insights into randomness.
  • Another viewpoint emphasizes the need for statistical tests to compare the random generation hypothesis against alternate hypotheses, incorporating prior probabilities for more informed inference.
  • There is mention of the importance of deciding on statistical tests before analyzing the sequences to avoid bias in the results.

Areas of Agreement / Disagreement

Participants express varying opinions on the feasibility of computing randomness probabilities, with some asserting it is impossible without knowing the origin of the numbers. There is no consensus on a definitive method for assessing randomness, and multiple competing views remain on how to approach the problem.

Contextual Notes

Limitations include the dependence on the assumptions about digit dependencies and the nature of the generation process. The discussion does not resolve the complexities involved in defining and measuring randomness.

mrspeedybob
Messages
869
Reaction score
65
Suppose you have a set of digits, for the sake of simplicity we'll make them binary, how would you determine the probability that the set is random? For example, given the following 3 strings of numbers...

1111111111111111111111111111111111111111111111111111111111111111
0010010000111111011010101000100010000101101000110000100011010011
1101011111001000000000011101111011111010010101011111101010110110



The first appears non random while the second and third appear random. In reality, only the third actually is random. Given a truly random string of 1s and 0s all 3 are equally likely. So If I didn't know the origin of the numbers how would I compute a probability that they were generated randomly?

For the sake of simplicity assume that if they are random then the probability of 1 is the same as the probability of 0 and that each digit selected without consideration of any other digit.
 
Physics news on Phys.org
mrspeedybob said:
Suppose you have a set of digits, for the sake of simplicity we'll make them binary, how would you determine the probability that the set is random? For example, given the following 3 strings of numbers...

1111111111111111111111111111111111111111111111111111111111111111
0010010000111111011010101000100010000101101000110000100011010011
1101011111001000000000011101111011111010010101011111101010110110
The first appears non random while the second and third appear random. In reality, only the third actually is random. Given a truly random string of 1s and 0s all 3 are equally likely. So If I didn't know the origin of the numbers how would I compute a probability that they were generated randomly?

For the sake of simplicity assume that if they are random then the probability of 1 is the same as the probability of 0 and that each digit selected without consideration of any other digit.

You can't always know. It really depends on how the sequence is generated. The decimal expansion of an irrational number appears random, but is fully determined by an algorithm. A truly random sequence cannot generated by an algorithm unless it includes a randomizing process. It is currently accepted that only quantum processes are truly random, but other processes are effectively random in that no algorithm can be specified.

Randomness can be checked by evaluating the output of a process. Random processes can often be described by their distributions. So if for a string of digits such that every digit is equally likely, then the process should converge to a uniform distribution. However there is always a vanishingly small probability that it may not or that your choice of a distribution is wrong.
 
Last edited:
The "probability that the sequence came from some distribution" actually isn't well-defined. In some cases the "probability that distribution D *would* generate the sequence" is a good proxy, but that won't help in this case. This is because the distribution you're interested in is, by definition, the distribution for which all sequences have the same probability.

If you are willing to assume that the dependencies are the same for each digit (if digit 3 has some dependency on digit 1, then digit n+3 has that same dependency on digit n+1), then each subsequence of n digits (for all n) is identically distributed (but not necessarily independent). You can check whether the proportion of 1s is close to 1/2, and you can furthermore check whether the proportion of each 2-character sequence is close to 1/4.

In general, (subject to the assumption above) if the proportion of each n-subsequence is close to 2-n, (with additional weight given to smaller n, since we have a larger sample of them) then it is likely to be random.

This is not a formal procedure, but could form the basis of a formal hypothesis test (and if indeed it works, this has probably been done before).
 
mrspeedybob said:
So If I didn't know the origin of the numbers how would I compute a probability that they were generated randomly?
You can't.

What you can do is consider an alternate hypothesis, and do a statistical test to obtain evidence for or against the random generation hypothesis vs the alternate hypothesis.

Ideally, you'd want to involve prior probabilities too -- e.g. if you have very high confidence that a particular source is trustworthy and claims a sequence is random, that should affect whatever inference you make after gaining evidence from the statistical test.


Another thing you can do is to decide upon a statistical test before looking at the sequences (e.g. these tests), and then use the test to obtain a level of "confidence". In general this isn't so great, but it's a decent heuristic if you have nothing else to do.
 

Similar threads

  • · Replies 11 ·
Replies
11
Views
3K
  • · Replies 15 ·
Replies
15
Views
2K
  • · Replies 7 ·
Replies
7
Views
3K
  • · Replies 11 ·
Replies
11
Views
4K
  • · Replies 6 ·
Replies
6
Views
2K
  • · Replies 2 ·
Replies
2
Views
2K
  • · Replies 8 ·
Replies
8
Views
3K
  • · Replies 2 ·
Replies
2
Views
2K
  • · Replies 20 ·
Replies
20
Views
2K
  • · Replies 9 ·
Replies
9
Views
3K