# Definition of randomness (sidebar: truth is random)

## Main Question or Discussion Point

I find the definition of random number at mathworld.wolfram.com rather unsatisfying.

Furthermore, dictionary.com will never help me decide what it means for an infinite sequence of numbers to be random.

I once read an article whose idea I have adapted here, though I forget (and cannot find) the article itself. The rough idea is that a subset of N is random (i.e., it is a random list of numbers) if there is no (finite) formula corresponding to that set.

{1,1,1,1,...} is not random by this definition: the formula f(n)=1 for all n perfectly specifies each term of the sequence.

This might be a bit too model-theoretic for some, but even for the layperson there is an interesting conclusion involving Tarski's undefinability-of-truth theorem for arithmetic.

This is a response to someone on IIDB named Alf about QM and randomness. Later, I challenge Alf to (1) well-define randomness and (2) prove that anything whatsoever in nature is random by that definition.

Also, I note that the set of tautologies corresponds fairly naturally (via Goedel numbering) to a RANDOM subset of N; thus one could, intuitively though probably incorrectly, conclude that truth is random.

Here goes:
I, however, do not think rolling dice is random; at best it is a model for randomness. It's a rather poor example of something random, as is a pseudo-random number generator. In fact, I would go so far as to say that we have not yet both (1) well-defined what randomness is and (2) been able to prove that randomness exists in nature.

It is safe to consider defining randomness only in terms of infinite sequences of numbers (which are functions from N to some set of numbers: N, Q, or R). An infinite sequence models the potentially infinite repetition of an experiment such as rolling a die. For example, {1,2,5,4,3,5,4,6,1,2,2,3,2,6,6,...} is a list of all the realized outcomes of rolling a die repeatedly. Thus I will define randomness for an arbitrary quantifiable event to mean that the sequence of numbers representing it is a list of random numbers, where the definition of a set of random numbers follows.

I would define a sequence of real (for example) numbers to be random if the corresponding set is undefinable in (R,<,+,*). I tried to find sites with a definition of definability in a model such as (R,<,+,*); what I found are
http://plato.stanford.edu/entries/model-theory/ (keyword search for "definable") and
http://planetmath.org/encyclopedia/Definable2.html [Broken]. Essentially, a set of real numbers is random if no formula specifies it. Such a formula could be said to "predict" what will be in the set, and thus the inherently unpredictable nature of randomness is tied into this definition. Another, more intuitive definition is that a sequence is random if there is no finite formula for the function that it is (again, a sequence is a function from N to R, for example). This entails that if a sequence is random and a formula gives the right output in R for each input in N, then that formula is essentially as big as N itself. Likewise, if there is a finite formula for the sequence, then the sequence is not random. The sequence {1,1,1,1,1,1,1,1,1,...} is obviously not random: define f(n)=1 for all n. A finite formula.
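To make the "finite formula" idea concrete, here is a minimal Python sketch of a pseudo-random number generator (a textbook linear congruential generator; the constants are the classic illustrative ones, not anything special). The entire infinite, random-looking sequence is pinned down by a few lines, so it is not random in the sense defined above.

```python
# A linear congruential generator: x_{n+1} = (a*x_n + c) mod m.
# The whole infinite "random-looking" sequence is determined by this
# finite description, so it is NOT random by the definition above.
def lcg_term(n, seed=1, a=1103515245, c=12345, m=2**31):
    """Return the n-th term of the LCG sequence: a finite formula for f: N -> N."""
    x = seed
    for _ in range(n):
        x = (a * x + c) % m
    return x

first_terms = [lcg_term(n) for n in range(5)]
```

The point is that `lcg_term` plays exactly the role of the finite formula f: it predicts every term, so the output fails this definition of randomness no matter how statistically noisy it looks.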

(I think you'll find the gist of this quite interesting.) Do random sequences exist? Yes, and I believe I have a nontrivial example that is quite interesting (even for the layperson). Take the set of all wffs (well-formed formulas) in 1st-order logic. Encode each formula by a number, called the Goedel number of the formula (if you want to look it up), in essentially the following way: assign to each logical symbol si [s sub i] a number ai [a sub i]. Then read off the symbols appearing in the formula to obtain a finite sequence of numbers (a1,...,an). Then use unique prime factorization to map the wff to a natural number, its encoding, by [formula]:=(p1^a1)*...*(pn^an). Here, given a formula, [formula] is notation for the Goedel number of that formula, and pi is the i-th prime number (2,3,5,7,11,...). Thus we have a map from the set of all wffs to N, and by the uniqueness of prime factorization each formula receives a unique number. Now consider the set of all tautologies (more precisely, for Tarski's theorem to apply, take the sentences true in the standard model of arithmetic, not merely the logical validities), properly contained in the set of all wffs. Call this set of wffs T. T' is the set of all Goedel numbers of wffs in T. T is the set of all wffs that are in some sense true; i.e., T is the set of all truths, and T' is the set of all of the encodings of all truths. Theorem: T' is undefinable in (N,<,+,*). This is the so-called Tarski "undefinability of truth" theorem. It says that if you have a finite formula F (finiteness is redundant, since all wffs are finite), then it is not the case that n e T' iff F(n) is true.
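The prime-power encoding just described can be sketched in a few lines of Python. The symbol-to-code assignment below is an arbitrary illustration (any injective assignment works), not a standard table:

```python
import math

# Goedel numbering by prime factorization: a formula, read as a list of
# symbol codes (a1, ..., an), is encoded as 2^a1 * 3^a2 * 5^a3 * ...
# The symbol codes here are an arbitrary illustrative assignment.
SYMBOL_CODES = {'A': 1, 'x': 2, '=': 3, '(': 4, ')': 5, ',': 6}

def primes(n):
    """Return the first n prime numbers (simple trial division)."""
    found = []
    candidate = 2
    while len(found) < n:
        if all(candidate % p for p in found):
            found.append(candidate)
        candidate += 1
    return found

def godel_number(formula):
    """Encode a formula (a string of symbols) as a single natural number."""
    codes = [SYMBOL_CODES[s] for s in formula]
    return math.prod(p ** a for p, a in zip(primes(len(codes)), codes))
```

For instance, the formula `x=x` has codes (2,3,2) and encodes as 2^2 * 3^3 * 5^2 = 2700; by unique prime factorization, the original symbol string can always be recovered from the number.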

Now consider a sequence of all Goedel numbers of all tautologies. I don't care what order the tautologies are in, though it is possible to list them all because the whole set of wffs is countable. Thus 316 might be the first number in this sequence, where 316 is the Goedel number of some tautology (e.g., for all x, x=x). The second number might be 847826, which is the Goedel number of some other tautology. So you have this sequence of Goedel numbers of all the tautologies. This sequence is random.

Thus one could pull a stunt and say truth is random. I wouldn't go that far, because I've only made a statement about arithmetic. But "truth is random" is, on some superficial intuitive level, the conclusion; i.e., the set of all true statements is not definable; i.e., the set of all true statements is not predictable by any finite formula.

There certainly are a bunch of tautologies so this is a genuine example of randomness.

I think I have well-defined randomness. Your burden now is to prove that anything in nature corresponds to this, or to any, well-defined notion of randomness.

Note that saying electron behavior is given by some probability distribution is not sufficient to prove randomness of said behavior. {(heads,.5),(tails,.5)} is a probability distribution, but as we've seen, a coin flip, like a die roll, is NOT random (it is chaotic).


0rthodontist
Well, the PlanetMath article involves some math that I don't know yet (stuck at the "models" symbol), but I can give a vague, philosophical answer that may be a misinterpretation. From a philosophical standpoint you can abstractly define anything--it depends on what you start with. If you start with R, <, +, *, then there are sequences that you can't define in a finite number of symbols. But if you start with T' instead of R, then it looks like you can define T' in a finite number of symbols. If you start with T', <, +, *, can you define R in a finite number of symbols?

Edit: Well I have done a little more reading and maybe that last one is a dumb question, because R is not a subset of T'? But still, it looks like there is an easy algorithm to determine the elements of T' in order, namely go through each number and factor it and see if its factorization corresponds to a tautology through evaluating every possible assignment of truth values. So T' doesn't look random from that perspective.

The thing is that if you are just vaguely talking about a formula for the nth term of a sequence, it can be given. Even if you have some guaranteed random sequence, by whatever "random" means, there is probably some way of determining what the next item in the sequence is. If you start speaking your sequence out loud at a rate of one item per second, then I can define the n'th term of the sequence as the term you say in the n'th second.

0rthodontist said:
Well, the PlanetMath article involves some math that I don't know yet (stuck at the "models" symbol), but I can give a vague, philosophical answer that may be a misinterpretation. From a philosophical standpoint you can abstractly define anything--it depends on what you start with. If you start with R, <, +, *, then there are sequences that you can't define in a finite number of symbols. But if you start with T' instead of R, then it looks like you can define T' in a finite number of symbols. If you start with T', <, +, *, can you define R in a finite number of symbols?

Edit: Well I have done a little more reading and maybe that last one is a dumb question, because R is not a subset of T'? But still, it looks like there is an easy algorithm to determine the elements of T' in order, namely go through each number and factor it and see if its factorization corresponds to a tautology through evaluating every possible assignment of truth values. So T' doesn't look random from that perspective.
From mathworld.wolfram.com: "A theory is decidable iff there is an algorithm which can determine whether or not any sentence r is a member of the theory." Consider the incompleteness theorem. I think it means that there is actually no effective algorithm or procedure for deciding whether a formula of arithmetic is a tautology. I'm still unclear on this point. Maybe someone who knows more about this can throw us a bone?

The thing is that if you are just vaguely talking about a formula for the nth term of a sequence, it can be given. Even if you have some guaranteed random sequence, by whatever "random" means, there is probably some way of determining what the next item in the sequence is. If you start speaking your sequence out loud at a rate of one item per second, then I can define the n'th term of the sequence as the term you say in the n'th second.
But your formula would only necessarily fit the numbers spoken so far and is not guaranteed to predict the next one. Also, any finite sequence cannot be random, which is essentially what I think you're saying.

0rthodontist
All right, I was confused about the term "first order logic," I was thinking of propositional logic. You are right, you can't just assign truth values to determine if a statement in first order logic is true.

But your T' still does not seem random to me. If it is impossible even in theory to decide whether some integers are elements of T', that doesn't strike me as randomness so much as some kind of ephemeral nonreality... No one can list the elements of T' in increasing order. They have to stop whenever they reach a number that corresponds to an undecidable statement.

Also, even if people can't list ALL the elements of T', they can still list many of them. If we have the same algorithm on our computers to try to decide whether x is an element of T', an algorithm which perhaps gives up after a while, then our results--which are guaranteed elements of T', though we may leave out some elements in between--are guaranteed to be the same. If you can recognize even SOME deterministic pattern in a sequence, and if you can tell what even only some of the as-yet-unseen elements of the sequence are, then it doesn't sound right calling it random.

Hurkyl
Staff Emeritus
Gold Member
One of the funny things about probability is that random variables never actually take on values. So, if you insist on things taking values, then those things can't be random! (Of course, there are interpretations of QM in which observables don't actually take on values, in which case it would truly be accurate to say that they are "random")

There is a theorem of computability theory that says that "incompressible" and "random" strings "look the same". (The following is about finite strings)

We could talk about describing strings of digits. For example, we want to describe a string s of binary digits.

Suppose we have a program (specifically, a Turing machine) M, and some binary string w. Suppose that M(w) = s. (That is, if I run M on w, I get s as the answer) Then, we can use the pair <M, w> as a description of s. (For some reasonable way to encode a program and its input as a binary string)

We can define the descriptive complexity (or Kolmogorov complexity) K(s) of a string s by:

K(s) := min { |<M, w>| | <M, w> is a description of s }

Where |s| is simply the length of the string s.

A string s is said to be compressible if K(s) < |s|.

Note that K(s) depends on the method we use to encode a program and its input. Descriptive complexity is not well-defined for any individual string -- it only makes sense to speak about its bulk properties.

For example, you can prove that at least half of the strings of a given size cannot be compressed by more than one bit (for any particular scheme for describing strings): there simply aren't enough short descriptions to go around.
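The counting behind this claim can be checked directly: there are only 2^0 + 2^1 + ... + 2^(k-1) = 2^k - 1 binary descriptions of length less than k, so fewer than half of the 2^n strings of length n can have a description shorter than n - 1 (i.e., be compressed by two or more bits). A small sketch of the arithmetic:

```python
# Counting argument: there are 2**n strings of length n, but only
# 2**0 + 2**1 + ... + 2**(k-1) = 2**k - 1 descriptions of length < k.
# So fewer than half of the length-n strings can have a description
# shorter than n - 1 (be compressed by 2 or more bits).
def num_strings(n):
    """Number of binary strings of length exactly n."""
    return 2 ** n

def num_short_descriptions(k):
    """Number of binary descriptions of length strictly less than k."""
    return 2 ** k - 1

for n in range(2, 20):
    # Descriptions of length < n - 1 cover at most 2**(n-1) - 1 strings,
    # strictly fewer than half of the 2**n strings of length n:
    assert num_short_descriptions(n - 1) < num_strings(n) / 2
```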

But, for any individual string s, it is meaningless to ask whether s is compressible. (e.g. I can devise a scheme in which s is denoted by "0", and the binary representation of every other description <M, w> begins with a "1")

We say that a property P holds for almost all strings if the fraction

|{s | |s| = n and not P(s) }| / |{s | |s| = n}|

goes to 0 as n goes to infinity.

If we have a computable property P (that is, one that can be computed by a program, aka Turing machine), then we have:

Theorem: Suppose P holds for almost all strings. Then P(s) is true for all incompressible strings s, except for finitely many exceptions.

(This theorem actually has a slightly stronger statement, but I don't think it's necessary for this discussion)

In some sense, we would expect a property true for "almost all" strings to hold for almost all random strings. So, incompressible strings exhibit the same set of properties.

So, if you're willing to trade "randomness" for "incompressible", it is possible to speak about finite, random strings. But you are only allowed to speak about them in bulk, since the notion doesn't make sense for any particular string. (Or, I suppose, you could fix a particular representation of descriptive complexity, and speak relative to that)

(I've had this discussion in real life before. Can you tell? )

I'll have to think some more about infinite strings before I can decide what I want to say about them.

0rthodontist said:
All right, I was confused about the term "first order logic," I was thinking of propositional logic. You are right, you can't just assign truth values to determine if a statement in first order logic is true.

But your T' still does not seem random to me. If it is impossible even in theory to decide whether some integers are elements of T', that doesn't strike me as randomness so much as some kind of ephemeral nonreality... No one can list the elements of T' in increasing order. They have to stop whenever they reach a number that corresponds to an undecidable statement.

Also, even if people can't list ALL the elements of T', they can still list many of them. If we have the same algorithm on our computers to try to decide whether x is an element of T', an algorithm which perhaps gives up after a while, then our results--which are guaranteed elements of T', though we may leave out some elements in between--are guaranteed to be the same. If you can recognize even SOME deterministic pattern in a sequence, and if you can tell what even only some of the as-yet-unseen elements of the sequence are, then it doesn't sound right calling it random.
That's a good point. I guess I just differ there, in that I feel that if a sequence contains a nonrandom subsequence, it does not follow that the sequence is nonrandom.

That's basically more along the lines of the original article that inspired me to define randomness by the undefinable subsets.

Hurkyl
Yikes, that's a big quote!

Before I really say more, I'm pretty sure that ZFC has a definable model -- that is, the only sets that exist are those which uniquely satisfy some existential set-theoretic formula. (I may need to restate the axiom of choice to provide an actual logical mapping from sets to choice functions, though. I'm not sure)

If so, I think that there are no random subsets of N in this model.

I think you meant to say "sequence of natural numbers" and not "subset of the natural numbers" -- but my claim would hold for that too.

Your example of a "random" sequence fails, because it is the unique sequence s satisfying:

s is a sequence of natural numbers, and s(n) is the n-th smallest integer that is a Gödel number of a WFF that holds in every model of the natural numbers.

I think you need to use something weaker. For example, you could define:

A sequence s is said to be random if and only if it is not computable. That is, there does not exist a Turing machine M such that s(n) = M(n) for all indices n.

0rthodontist
I think that compressibility is a good idea for randomness, but the compression scheme is very important for determining if something is random; something is random only in relation to a compression scheme. If some algorithm, or philosophically some person, can't reduce and summarize data very much, it means that the data is random _for that person alone_. Someone else might be able to compress the data very much, even large sets of data that the first person can't do anything with. No string and no set of strings (sufficiently small with respect to the length of the strings) is random by itself; it is only random with respect to the person or algorithm trying to analyze it.

Hurkyl
I suppose you could get by with just requiring the formula to be in a weaker language. E.G.

A sequence s of natural numbers is random if and only if there does not exist a unary relation P for which P(s) and E!x:P(x) are both in Th(N).

(I wonder if that is the same as the computable definition I gave?)

It strikes me that there are some quirky things that should be observed -- I don't know if you can eliminate them.

I strongly suspect that in any model of N, there exists a sequence s and a unary relation P in the language of arithmetic, such that P(s) is true in the model, E!x:P(x) is in Th(N), but P(s) is independent of Th(N).

It might even be possible that for any random sequence s, we can pick a model of N where the above can happen for s!

0rthodontist
On an intuitive level I just don't see how these definitions on the basis of noncomputability or satisfying/not satisfying formulas are connected with the idea of randomness. Undecidability is not randomness.

Hurkyl
Intuitively, randomness is unpredictability.

If a sequence is not computable, then there is no algorithm to predict its terms!

If a sequence is computable, then there is an algorithm to predict its terms.

So, intuitively, random sequences are precisely those that are not computable.

Alternatively, a random sequence is one that doesn't have any special properties.

Any formula that determines the terms of the random sequence would be considered a special property.

So, intuitively, random sequences are precisely those whose terms aren't determined by a formula.

0rthodontist
Randomness is unpredictability _of data_. If some terms of a sequence are noncomputable then _there is no data there_.

Anyway, and not to soften that point because I really think that's the heart of the issue, what about computation schemes that include oracles for undecidable problems?

Hurkyl
A sequence is computable if and only if there exists a (deterministic) program M such that M(n) is the n-th term of the sequence.

A noncomputable sequence just means that there does not exist a program that computes its terms. It doesn't mean that the sequence doesn't exist at all. :tongue:

An example of a noncomputable sequence comes from the Busy Beaver problem:

s(n) := the largest number that can be printed by a Turing machine with n states.
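The busy-beaver function itself cannot be computed, but any single machine can of course be simulated. Here is a minimal sketch: the transition table is the known 2-state, 2-symbol busy-beaver champion (which, under the "count the ones" convention, writes 4 ones in 6 steps before halting); the `'H'` halt marker and the dict-as-tape representation are my own conveniences.

```python
# (state, symbol_read) -> (symbol_to_write, head_move, next_state);
# 'H' marks the halting state. This is the 2-state, 2-symbol champion.
RULES = {
    ('A', 0): (1, +1, 'B'),
    ('A', 1): (1, -1, 'B'),
    ('B', 0): (1, -1, 'A'),
    ('B', 1): (1, +1, 'H'),
}

def run(rules, start='A', max_steps=10_000):
    """Simulate a Turing machine on a blank tape; return (ones_written, steps)."""
    tape, head, state, steps = {}, 0, start, 0
    while state != 'H' and steps < max_steps:
        write, move, state = rules[(state, tape.get(head, 0))]
        tape[head] = write
        head += move
        steps += 1
    return sum(tape.values()), steps

ones, steps = run(RULES)  # the champion halts with 4 ones after 6 steps
```

What is noncomputable is not any one simulation like this, but the function n -> (best result over all n-state machines): deciding which machines ever halt is the halting problem.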

If you have an oracle, then you could talk about oracle-computable functions. The theory is all the same, you just have more computable functions. (And, of course, with a sufficiently powerful class of oracles, any sequence would be computable)

0rthodontist
Just because there's a term there doesn't mean there's data there. I'm basically winging this definition, but data has to be able to be input to something. If you can't figure out what the term is you can't input it to anything, nothing can plot it as a data point in any graph, it can't be used for any computational purpose. (well the abstraction of the term could but the abstraction of the term isn't the term). Data has to be real in a way that a computer can compute.

Hurkyl
Nobody said you can't figure it out -- just that there's no program that can do it.

For example, I could sit there and use my Geiger counter to figure out when my (infinitely large collection of) atoms decay, which gives me plenty of data to work with, but it might not be possible to write a computer program whose output is the same sequence of numbers.

Hurkyl
Or even just how long it takes a single atom to decay. I could use that to generate a sequence of integers (take the binary representation of the time it took to decay). It might not be possible for a computer program to spit out that sequence of numbers.

Hurkyl said:
Before I really say more, I'm pretty sure that ZFC has a definable model -- that is, the only sets that exist are those which uniquely satisfy some existential set-theoretic formula. (I may need to restate the axiom of choice to provide an actual logical mapping from sets to choice functions, though. I'm not sure)

I think you need to use something weaker. For example, you could define:

A sequence s is said to be random if and only if it is not computable. That is, there does not exist a Turing machine M such that s(n) = M(n) for all indices n.
Very interesting. Thanks for the feedback.

I would like to study this model of set theory where x is a set iff f(x), where f is an existential 1st-order formula. That certainly would mean that no subsets of N are lists of random numbers. Do you have more information on how this model is constructed? I thought that if we found any model of ZFC, then ZFC would be consistent, and that it is not known whether ZFC is consistent. Am I wrong? Is this f built up constructively, or is it a pure existence theorem?

Your "weaker" definition of random still captures the idea that no function with a finite formula perfectly predicts the sequence, right? What's the difference between my definition of a random sequence and yours; i.e., are there sequences random in my definition that are not random in yours? Maybe T' is not a random subset of N. Could you elaborate on why, in my definition, T' is not a random subset of N?

So, to state my definition of a random sequence more eloquently: first it depends on what a random subset A of N is. A is a random subset of N iff, by definition, A is not definable in the structure (N,<,+,*) [which may be redundant, for I might be able to define < in terms of +]; i.e., there is no 1st-order formula f such that n e A iff f(n). Then a sequence from N onto a set of numbers S is random iff S is a random subset of N. By this definition, since T' is a random subset of N, any sequence enumerating T' is a random sequence.

Your idea seems to be that there is no effective procedure for computing the sequence. Which definition better encapsulates the unpredictability of randomness? If yours, then I'd readily adopt it.

0rthodontist: well, an infinite sequence is just a model of taking an experiment and repeating it forever. The sequence contains all of the data. I hope this is relevant to your question. "Thus I will define randomness for an arbitrary quantifiable event to mean that the sequence of numbers representing it is a list of random numbers where the definition of a set of random numbers follows."

Hurkyl
I would like to study this model of set theory where x is a set iff f(x), where f is an existential 1st order formula. That certainly would mean that no subsets of N are lists of random numbers. Do you have more information on how this model is constructed for I thought that if we found any model for ZFC, then ZFC would be consistent and that it was not known whether ZFC is consistent. Am I wrong? Is this f built up in a constructive way or is it a pure existence of f theorem?
I really ought to work through the details of this sometime. Maybe I'm fudging something I shouldn't.

If you allow me to adopt a suitable version of the axiom of choice (for simplicity, I will posit a well-ordering < on the entire universe of sets), the idea goes as follows.

For example, if Ax:P(x, y), then P(x, y) is clearly true for all definable x.

If Ex:P(x, y), then if y is definable, there is clearly a definable x satisfying the equation. Specifically, take x to be the smallest set (from the global well-ordering) satisfying P(x, y).

In other words,

Ex:P(x, y) ==> E!x: [ P(x, y) and (Az: P(z, y) => (x < z or x = z)) ]

I suspect the construction would be somewhat akin to constructing a term model for a set of sentences: I take my objects to be the formulas of the type E!x:P(x), modulo the relation that ZFC can prove the two sets equal.

I think I would have to assume the consistency of ZFC to construct this model -- maybe that will allay your concerns.

Alternatively, I assume (a first-order version of) ZFC to be consistent, from which I can conclude it has a model. I then select the substructure consisting only of objects that satisfy a formula of the form E!x:P(x), and then attempt to prove that this is actually a submodel of ZFC.

So to more eloquently state my definition of random sequence, first it depends on what a random subset A of N is. A is a random subset of N iff by definition A is not definable in terms of the structure (N,<,+,*)
Ah, I misunderstood -- I took "formula" to mean a formula of ZFC, not of the theory of interest. Your definition is almost the same as the one I gave.

As I said, I think there's an issue here with choice of model, which makes me unhappy.

I don't like your definition in terms of subsets of N at all, though -- surely there can be random sequences that pass through each natural number at least once?

I really prefer defining things in terms of computability -- I think that's just a bias of mine, I don't necessarily know if it's better. And as I said, it's yet unclear to me just what the difference between the two methods would be.

0rthodontist
Hurkyl said:
Or even just how long it takes a single atom to decay. I could use that to generate a sequence of integers (take the binary representation of the time it took to decay). It might not be possible for a computer program to spit out that sequence of numbers.
Does noncomputability really cover the case where there isn't any information to solve the problem? Is the question of whether there is a fully connected subset of size 5 in a graph that I will not specify to you, really noncomputable?

But note that the notion of compressibility does fine for this. There is no program that can compress the set of integer sequences produced in that manner, unless there is a breakthrough in physics, which would lead both to a program that could do that and to a good reason to consider those numbers nonrandom.

matt grime
Homework Helper
Is the word 'computable' being used in its proper (Turing machine) sense here at all? (I know Hurkyl was using it in this way.)

Given any graph, there is an algorithm for computing whether K_5 is contained as a subgraph. It might just take longer than the universe will exist to finish for a graph you actually give me. But that isn't what 'noncomputable' usually means.

0rthodontist
No, I do not mean a graph that the algorithm is provided with. I mean a graph, as I said, that I will not specify. No algorithm could do that, but does that mean it is noncomputable? Just as no (known) algorithm could predict the integers produced from measuring the decay time of an atom, but it has no data from which to start. It doesn't seem like either of those cases satisfies the formal meaning of noncomputable.

Hurkyl
Let me try restating the point.

We can have a Turing machine that spits out an infinite sequence of 0's and 1's.

There are countably many Turing machines.

There are uncountably many infinite sequences of 0's and 1's.

Therefore, there are uncountably many infinite sequences that absolutely positively, can never be seen as the output of a Turing machine. Those sequences are called noncomputable.
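The cardinality argument can be made concrete with a finite diagonalization sketch: given any (finite, for illustration) list of sequence-generating functions, we can build a 0/1 sequence that provably differs from every one of them. The particular functions below are arbitrary stand-ins for an enumeration of Turing machines.

```python
# Diagonalization sketch: given an enumeration f_0, f_1, ... of 0/1
# sequences, the sequence d(n) = 1 - f_n(n) differs from every f_k
# (at index k). With only countably many machines, some sequence is
# always left out -- that leftover sequence is noncomputable.
machines = [
    lambda n: 0,      # the all-zeros sequence
    lambda n: 1,      # the all-ones sequence
    lambda n: n % 2,  # the alternating sequence 0,1,0,1,...
]

def diagonal(n):
    """A sequence guaranteed to differ from machines[n] at position n."""
    return 1 - machines[n](n)

# diagonal disagrees with every listed sequence somewhere:
for k, f in enumerate(machines):
    assert diagonal(k) != f(k)
```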

If I have an atom, and I measure the time it takes to decay, this allows me (in principle) to generate a sequence of 0's and 1's. It may be that this sequence is noncomputable.

In such a case, this means that it is impossible, even a posteriori, to devise a Turing machine that prints out the time it took to decay.

0rthodontist
Well, in practice you can only measure to a certain amount of precision, so your bit strings from atom decay will never really be "infinite" (and isn't the purpose of your example to illustrate an example of randomness by your definition under real conditions?)

Also, and more than likely I am misunderstanding or misinformed, but is it true that every Turing machine either does not terminate or outputs only a finite-length string? So you couldn't output the infinite string 1111..., though that string is clearly not random.

When you look at this from the perspective of compressibility rather than computability, it is all clear. The bit string from the atom decay is random because no one can compress (summarize) that data into a shorter form.

When you apply the compressibility perspective to the tossing of dice, then you can say two things that satisfy both sides of the issue: that dice are somehow random yet also determined. If you are provided only with the output of the dice tossing, then that data is not compressible, so it is random. But if you are also provided with more information about the situation the dice were tossed in--velocity and rotation at which they left the hand, bouncing properties of the dice against the surface, as well as the actual outcome of tossing the dice--then you can compress that data, because parts of it can be derived from other parts, so it is not fully random. So you can say that the physical system of tossing dice is nonrandom, while the outcome alone of tossing dice is random, which is satisfying from two perspectives at once.
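The contrast between structured and unstructured data is easy to demonstrate with an off-the-shelf compressor. Here zlib is just a convenient stand-in for "some compression scheme," not Kolmogorov complexity, and the seeded PRNG stands in for noisy measurement data:

```python
import random
import zlib

# A highly patterned string compresses dramatically...
patterned = b"123456" * 1000
compressed_patterned = zlib.compress(patterned)
assert len(compressed_patterned) < len(patterned) // 10

# ...while typical output of a seeded PRNG barely compresses at all.
# (Note: being PRNG output, it IS compressible in principle -- down to
# the generator plus its seed -- zlib just can't see that structure,
# which is exactly the "random for that observer alone" point above.)
rng = random.Random(42)
noisy = bytes(rng.randrange(256) for _ in range(6000))
compressed_noisy = zlib.compress(noisy)
```

The PRNG case illustrates the scheme-relativity argument: relative to zlib the data looks incompressible, while relative to a scheme that knows the generator it collapses to a few bytes.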

Hurkyl
Well, there are (at least) two ways you can use a Turing machine to output an infinite string.

(1) Give the Turing machine a special output tape, to which it can do nothing but write a number and move the head to the right.

Then, the output of the Turing machine will be a (possibly infinite) sequence. That sequence is defined by saying the n-th term is the n-th symbol written to the output tape.

(2) Have a Turing machine that takes a number as input and returns a symbol as output.

This defines a sequence by saying that the n-th term is the result of running the Turing machine on input <n>.
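In programming terms, the two conventions correspond roughly to a generator that streams symbols forever versus a function from index to symbol (this analogy is my own gloss, using an ordinary program in place of a Turing machine):

```python
import itertools

# Mode (1): stream the sequence term by term, like an output tape
# written left to right. Here: the characteristic sequence of the evens.
def stream_terms():
    n = 0
    while True:
        yield 1 if n % 2 == 0 else 0
        n += 1

# Mode (2): a machine that, given n, returns the n-th symbol directly.
def nth_term(n):
    return 1 if n % 2 == 0 else 0

# Both conventions define the same computable infinite sequence:
first_ten_streamed = list(itertools.islice(stream_terms(), 10))
first_ten_indexed = [nth_term(n) for n in range(10)]
assert first_ten_streamed == first_ten_indexed
```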

When you look at this from the perspective of compressibility rather than computability, it is all clear. The bit string from the atom decay is random because no one can compress (summarize) that data into a shorter form.
How are you going to define "compressibility" for infinite strings?

The best definition I could imagine is:

An infinite string is said to be compressible if and only if there exists a Turing machine that can print it. (And thus the compressed form is simply the description of said Turing machine)

And thus, the compressible infinite strings are precisely those that are computable.

Furthermore, I don't think your definition is adequate. Maybe it's impossible to summarize that data in shorter form if you live in a vacuum... but it might become possible if you're allowed to refer to the initial conditions. If so, then I don't think the data could qualify as being "random".

Last edited: