# Definition of randomness (sidebar: truth is random)

1. Feb 24, 2006

### phoenixthoth

I find the definition of random number at mathworld.wolfram.com rather unsatisfying.

Furthermore, dictionary.com will never help me decide what it means for an infinite sequence of numbers to be random.

I once read an article whose idea I have adapted a bit here; I have forgotten (and cannot find) the article itself. The rough idea is that a subset of N is random (i.e., it is a random list of numbers) if there is no (finite) formula corresponding to that set.

{1,1,1,1,...} is not random by this definition: f(n)=1 for all n specifies each term in this sequence perfectly.

Well, I think this might be a bit too model-theoretic for some, but for the layperson there is an interesting conclusion involving Tarski's undefinability-of-truth theorem for arithmetic.

This is a response to someone on IIDB named Alf about QM and randomness. Later, I challenge Alf to (1) well-define randomness and (2) prove that anything whatsoever in nature is random by that definition.

Also, I note that the set of tautologies corresponds fairly naturally (via Goedel numbering) to a RANDOM subset of N; thus one could intuitively, though probably incorrectly, conclude that truth is random.

Here goes:
I, however, do not think rolling dice is random; at best it is a model for randomness. It's a rather poor example of something random, as is a pseudo-random number generator. In fact, I would go so far as to say that we have not yet both (1) well-defined what randomness is and (2) been able to prove that randomness exists in nature.

It is safe to consider defining randomness only for infinite sequences of numbers (functions from N to some set of numbers: N, Q, or R). An infinite sequence models the potentially infinite repetition of an experiment such as rolling a die. For example, {1,2,5,4,3,5,4,6,1,2,2,3,2,6,6,...} is a list of all the realized outcomes of rolling a die repeatedly. Thus I will define randomness for an arbitrary quantifiable event to mean that the sequence of numbers representing it is a list of random numbers, where the definition of a set of random numbers follows.

I would define a sequence to be a random set of real (for example) numbers if that set is undefinable in (R,<,+,*). I tried to find sites with a definition of definability in a model such as (R,<,+,*); what I found are
http://plato.stanford.edu/entries/model-theory/ (keyword search for "definable") and
http://planetmath.org/encyclopedia/Definable2.html [Broken] . Essentially, a set of real numbers is random if no formula specifies it. That formula could be said to "predict" what will be in the set, and thus an aspect of the inherently unpredictable nature of randomness is tied into this definition. Another, more intuitive, definition is that a sequence is random if there is no finite formula for the function that it is (again, a sequence is a function from N to R, for example). This entails that if a sequence is random, then any formula giving each n in N the right output in R is essentially as big as N itself. Likewise, if there is a finite formula for the sequence, then the sequence is not random. The sequence {1,1,1,1,1,1,1,1,1,....} is obviously not random: define f(n)=1 for all n. A finite formula.

(I think you'll find the gist of this quite interesting.) Do random sequences exist? Yes, and I believe I have a nontrivial example that is quite interesting (even for the layperson).

Take the set of all wffs (well-formed formulas) in 1st-order logic. Encode each formula by a number in the following way (essentially), called the Goedel number of the formula if you want to look it up: assign to each logical symbol si [s sub i] a number ai [a sub i]. Then read off the symbols appearing in the formula to get a finite sequence of numbers (a1,...,an), and use unique prime factorization to map the wff to a natural number, its encoding: [formula] := (p1^a1)*...*(pn^an). Here [formula] denotes the Goedel number of the formula and pi is the i-th prime number (2, 3, 5, 7, 11, ...). Thus we have a map from the set of all wffs to N, and each formula yields a unique number (by the uniqueness of prime factorization).

Now consider the set of all tautologies, properly contained in the set of all wffs. Call this set of wffs T, and let T' be the set of all Goedel numbers of wffs in T. T is the set of all wffs that are in some sense true; i.e., T is the set of all truths, and T' is the set of the encodings of all truths.

Theorem: T' is undefinable in (N,<,+,*). This is the so-called Tarski "undefinability of truth" theorem. It says that for any finite formula F (where "finite" is redundant, because all wffs are finite), it is not the case that n e T' iff F(n) is true.
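That encoding can be sketched in a few lines of Python (my own illustration; the symbol-to-number table below is an arbitrary choice, not a standard assignment):

```python
import math

# Hypothetical symbol codes a_i; any injective assignment works.
SYMBOL_CODES = {"∀": 1, "x": 2, "=": 3, "(": 4, ")": 5}

def primes(n):
    """Return the first n primes by trial division."""
    found = []
    candidate = 2
    while len(found) < n:
        if all(candidate % p for p in found):
            found.append(candidate)
        candidate += 1
    return found

def goedel_number(formula):
    """Map a formula (a string of symbols) to (p1^a1)*...*(pn^an)."""
    codes = [SYMBOL_CODES[s] for s in formula]
    return math.prod(p ** a for p, a in zip(primes(len(codes)), codes))

# Each wff gets a unique natural number, recoverable by factoring:
print(goedel_number("∀x(x=x)"))
```

Uniqueness of prime factorization guarantees that distinct formulas get distinct numbers, which is all the argument needs.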

Now consider a sequence of all Goedel numbers of all tautologies. I don't care what order the tautologies are in, though it is possible to list them all because the whole set of wffs is countable. Thus 316 might be the first number in this sequence, where 316 is the Goedel number of some tautology (e.g., for all x, x=x). The second number might be 847826, which is the Goedel number of some other tautology. So you have this sequence of Goedel numbers of all the tautologies. This sequence is random.

Thus one could pull a stunt and say truth is random. I wouldn't go that far, because I've only made a statement about arithmetic. But "truth is random" is, on some superficial intuitive level, the conclusion; i.e., the set of all true statements is not definable; i.e., the set of all true statements is not predictable by any finite formula.

There are certainly plenty of tautologies, so this is a genuine example of randomness.

I think I have well-defined randomness. Your burden now is to prove that anything in nature corresponds to this, or to any other well-defined notion of randomness.

Note that saying electron behavior is given by some probability distribution is not sufficient to prove randomness of said behavior. {(heads,.5),(tails,.5)} is a probability distribution, but as we've seen, flipping a coin (like rolling a die) is NOT random (it is chaotic).

Last edited by a moderator: May 2, 2017
2. Feb 24, 2006

### 0rthodontist

Well, the PlanetMath article involves some math that I don't know yet (I'm stuck at the "models" symbol), but I can give a vague, philosophical answer that may be a misinterpretation. From a philosophical standpoint you can abstractly define anything--it depends on what you start with. If you start with R, <, +, *, then there are sequences that you can't define in a finite number of symbols. But if you start with T' instead of R, then it looks like you can define T' in a finite number of symbols. If you start with T', <, +, *, can you define R in a finite number of symbols?

Edit: Well, I have done a little more reading, and maybe that last one is a dumb question, because R is not a subset of T'? But still, it looks like there is an easy algorithm to determine the elements of T' in order: go through each number, factor it, and see whether its factorization corresponds to a tautology by evaluating every possible assignment of truth values. So T' doesn't look random from that perspective.

The thing is that if you are just vaguely talking about a formula for the nth term of a sequence, it can be given. Even if you have some guaranteed random sequence, by whatever "random" means, there is probably some way of determining what the next item in the sequence is. If you start speaking your sequence out loud at a rate of one item per second, then I can define the n'th term of the sequence as the term you say in the n'th second.

Last edited: Feb 24, 2006
3. Feb 24, 2006

### phoenixthoth

From mathworld.wolfram.com: "A theory is decidable iff there is an algorithm which can determine whether or not any sentence r is a member of the theory." Consider the incompleteness theorem. I think it means that there actually is no effective algorithm or procedure for deciding whether a formula about arithmetic is a tautology. I'm still unclear on this point. Maybe someone who knows more about this can throw us a bone?

But your formula would only necessarily fit the first n-1 numbers; it is not guaranteed to predict the (n+1)-th number. Also, no finite sequence can be random, which is essentially what I think you're saying.

4. Feb 24, 2006

### 0rthodontist

All right, I was confused about the term "first order logic," I was thinking of propositional logic. You are right, you can't just assign truth values to determine if a statement in first order logic is true.

But your T' still does not seem random to me. If it is impossible even in theory to decide whether some integers are elements of T', that doesn't strike me as randomness so much as some kind of ephemeral nonreality... No one can list the elements of T' in increasing order. They have to stop whenever they reach a number that corresponds to an undecidable statement.

Also, even if people can't list ALL the elements of T', they can still list many of them. If we have the same algorithm on our computers to try to decide whether x is an element of T', an algorithm which perhaps gives up after a while, then our results--which are guaranteed elements of T', though we may leave out some elements in between--are guaranteed to be the same. If you can recognize even SOME deterministic pattern in a sequence, and if you can tell what even only some of the as-yet-unseen elements of the sequence are, then it doesn't sound right calling it random.

Last edited: Feb 24, 2006
5. Feb 24, 2006

### Hurkyl

Staff Emeritus
One of the funny things about probability is that random variables never actually take on values. So, if you insist on things taking values, then those things can't be random! (Of course, there are interpretations of QM in which observables don't actually take on values, in which case it would truly be accurate to say that they are "random")

There is a theorem of computability theory that says that "incompressible" and "random" strings "look the same". (The following is about finite strings)

We could talk about describing strings of digits. For example, we want to describe a string s of binary digits.

Suppose we have a program (specifically, a Turing machine) M, and some binary string w. Suppose that M(w) = s. (That is, if I run M on w, I get s as the answer) Then, we can use the pair <M, w> as a description of s. (For some reasonable way to encode a program and its input as a binary string)

We can define the descriptive complexity (or Kolmogorov complexity) K(s) of a string s by:

K(s) := min { |<M, w>| : <M, w> is a description of s }

Where |s| is simply the length of the string s.

A string s is said to be compressible if K(s) < |s|.

Note that K(s) depends on the method we use to encode a program and its input. Descriptive complexity is not well-defined for any individual string -- it only makes sense to speak about bulk properties.
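As a concrete illustration (mine, not part of the theory above), one can fix a particular compression scheme and speak of compressibility relative to it; here zlib stands in for the fixed <M, w> encoding, since K(s) itself is not computable:

```python
import zlib

def compressible(s: bytes) -> bool:
    """Compressible relative to one fixed scheme: zlib plays the role
    of the <M, w> encoding, since K(s) itself is not computable."""
    return len(zlib.compress(s)) < len(s)

# A highly regular string compresses under this scheme:
print(compressible(b"10" * 500))
```

Relative to this one scheme, regular strings compress and typical strings do not; swapping the scheme changes which individual strings count as compressible, which is exactly the point above.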

For example, you can prove that at least half of the strings of a given size cannot be compressed. (for any particular scheme for describing strings)
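The counting argument behind that claim can be sketched concretely (my own illustration, under the convention that a description must be strictly shorter than the string): there are simply fewer short descriptions than strings of a given length.

```python
def short_descriptions(n, c=1):
    """Number of binary strings of length at most n - c: an upper bound
    on how many length-n strings can be described with at least c bits
    saved, since 2^0 + 2^1 + ... + 2^(n-c) = 2^(n-c+1) - 1."""
    return 2 ** (n - c + 1) - 1

n = 8
total = 2 ** n
# At most 255 of the 256 length-8 strings have a description shorter
# than 8 bits, so at least one is incompressible; at most 127 have a
# description at least two bits shorter, so at least half do not.
print(short_descriptions(n, 1), short_descriptions(n, 2), total)
```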

But, for any individual string s, it is meaningless to ask whether s is compressible. (e.g. I can devise a scheme in which s is denoted by "0", and the binary representation of every other description <M, w> begins with a "1")

We can say if a property P holds for almost all strings: the fraction

|{s : |s| = n and not P(s)}| / |{s : |s| = n}|

goes to 0 as n goes to infinity.
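For instance (an example of mine), take P(s) = "s contains at least one 0". Only the all-ones string of each length fails P, so the fraction failing is 1/2^n, which goes to 0:

```python
from itertools import product

def fraction_failing(P, n):
    """Fraction of length-n binary strings s for which P(s) is false."""
    strings = ["".join(bits) for bits in product("01", repeat=n)]
    return sum(1 for s in strings if not P(s)) / len(strings)

contains_zero = lambda s: "0" in s
# The failing fraction shrinks as 1/2^n:
print([fraction_failing(contains_zero, n) for n in (2, 4, 8)])
# -> [0.25, 0.0625, 0.00390625]
```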

If we have a computable property P (that is, one that can be computed by a program, aka Turing machine), then we have:

Theorem: Suppose P holds for almost all strings. Then P(s) is true for all incompressible strings s, except for finitely many exceptions.

(This theorem actually has a slightly stronger statement, but I don't think it's necessary for this discussion)

In some sense, we would expect a property true for "almost all" strings to hold for almost all random strings. So, incompressible strings exhibit the same set of properties.

So, if you're willing to trade "randomness" for "incompressible", it is possible to speak about finite, random strings. But you are only allowed to speak about them in bulk, since the notion doesn't make sense for any particular string. (Or, I suppose, you could fix a particular representation of descriptive complexity, and speak relative to that)

(I've had this discussion in real life before. Can you tell? )

I'll have to think some more about infinite strings before I can decide what I want to say about them.

Last edited: Feb 24, 2006
6. Feb 24, 2006

### phoenixthoth

That's a good point. I guess I just differ there, in that I feel that if a sequence contains a nonrandom subsequence, it does not follow that the sequence is nonrandom.

7. Feb 24, 2006

### phoenixthoth

That's basically more along the lines of the original article that inspired me to define randomness by the undefinable subsets.

Last edited: Feb 24, 2006
8. Feb 24, 2006

### Hurkyl

Staff Emeritus
Yikes, that's a big quote!

Before I really say more, I'm pretty sure that ZFC has a definable model -- that is, one in which the only sets that exist are those which uniquely satisfy some existential set-theoretic formula. (I may need to restate the axiom of choice to provide an actual logical mapping from sets to choice functions, though. I'm not sure)

If so, I think that there are no random subsets of N in this model.

I think you meant to say "sequence of natural numbers" and not "subset of the natural numbers" -- but my claim would hold for that too.

Your example of a "random" sequence fails, because it is the unique sequence s satisfying:

s is a sequence of natural numbers, and s(n) is the n-th smallest integer that is a Gödel number of a WFF that holds in every model of the natural numbers.

I think you need to use something weaker. For example, you could define:

A sequence s is said to be random if and only if it is not computable. That is, there does not exist a Turing machine M such that s(n) = M(n) for all indices n.

Last edited: Feb 24, 2006
9. Feb 24, 2006

### 0rthodontist

I think that compressibility is a good idea for randomness, but the compression scheme matters a great deal: something is random only in relation to a compression scheme. If some algorithm, or philosophically some person, can't reduce and summarize data very much, it means that the data is random _for that person alone_. Someone else might be able to compress that data very well, or even large sets of data that the first person can't do anything with. No string and no set of strings (sufficiently small with respect to the length of the strings) is random by itself; it is only random with respect to the person or algorithm trying to analyze it.

10. Feb 24, 2006

### Hurkyl

Staff Emeritus
I suppose you could get by with just requiring the formula to be in a weaker language. E.G.

A sequence s of natural numbers is random if and only if there does not exist a unary relation P for which P(s) and E!x:P(x) are both in Th(N).

(I wonder if that is the same as the computable definition I gave?)

It strikes me that there are some quirky things that should be observed -- I don't know if you can eliminate them.

I strongly suspect that in any model of N, there exists a sequence s and a unary relation P in the language of arithmetic, such that P(s) is true in the model, E!x:P(x) is in Th(N), but P(s) is independent of Th(N).

It might even be possible that for any random sequence s, we can pick a model of N where the above can happen for s!

Last edited: Feb 24, 2006
11. Feb 24, 2006

### 0rthodontist

On an intuitive level I just don't see how these definitions on the basis of noncomputability or satisfying/not satisfying formulas are connected with the idea of randomness. Undecidability is not randomness.

12. Feb 24, 2006

### Hurkyl

Staff Emeritus
Intuitively, randomness is unpredictability.

If a sequence is not computable, then there is no algorithm to predict its terms!

If a sequence is computable, then there is an algorithm to predict its terms.

So, intuitively, random sequences are precisely those that are not computable.

Alternatively, a random sequence is one that doesn't have any special properties.

Any formula that determines the terms of the random sequence would be considered a special property.

So, intuitively, random sequences are precisely those whose terms aren't determined by a formula.

13. Feb 24, 2006

### 0rthodontist

Randomness is unpredictability _of data_. If some terms of a sequence are noncomputable then _there is no data there_.

Anyway, and not to soften that point, because I really think that's the heart of the issue: what about computation schemes that include oracles for undecidable problems?

Last edited: Feb 24, 2006
14. Feb 24, 2006

### Hurkyl

Staff Emeritus
A sequence is computable if and only if there exists a (deterministic) program M such that M(n) is the n-th term of the sequence.

A noncomputable sequence just means that there does not exist a program that computes its terms. It doesn't mean that the sequence doesn't exist at all. :tongue:

An example of a noncomputable sequence comes from the Busy Beaver problem:

s(n) := the largest number that can be printed by a Turing machine with n states.
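That sequence can be probed from below with a brute-force search (a sketch of mine, using the common convention that the "number printed" is the count of 1s left on the tape): enumerate every n-state, 2-symbol machine, run each under a fixed step cap, and record the best halting score. The cap is what makes this only a lower bound; no single cap works for all n, which is exactly the uncomputability.

```python
from itertools import product

def busy_beaver_lower_bound(n_states, max_steps=100):
    """Brute-force lower bound on the busy beaver score: the most 1s
    any halting n-state, 2-symbol Turing machine leaves on an initially
    blank tape, among machines that halt within max_steps steps."""
    HALT = n_states  # states 0..n_states-1 are real; n_states means halt
    # An action is (symbol to write, head move, next state).
    actions = list(product((0, 1), (-1, 1), range(n_states + 1)))
    keys = list(product(range(n_states), (0, 1)))  # (state, symbol read)
    best = 0
    for table in product(actions, repeat=len(keys)):
        rules = dict(zip(keys, table))
        tape, head, state = {}, 0, 0
        for _ in range(max_steps):
            write, move, nxt = rules[(state, tape.get(head, 0))]
            tape[head] = write
            head += move
            state = nxt
            if state == HALT:
                best = max(best, sum(tape.values()))
                break
    return best

print(busy_beaver_lower_bound(1), busy_beaver_lower_bound(2))
# Matches the known values Sigma(1) = 1 and Sigma(2) = 4.
```

For small n the cap is generous enough that the bound is exact; the point is that no computable choice of cap stays exact as n grows.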

If you have an oracle, then you could talk about oracle-computable functions. The theory is all the same, you just have more computable functions. (And, of course, with a sufficiently powerful class of oracles, any sequence would be computable)

Last edited: Feb 24, 2006
15. Feb 24, 2006

### 0rthodontist

Just because there's a term there doesn't mean there's data there. I'm basically winging this definition, but data has to be able to be input to something. If you can't figure out what the term is, you can't input it to anything: nothing can plot it as a data point on any graph, and it can't be used for any computational purpose. (Well, the abstraction of the term could, but the abstraction of the term isn't the term.) Data has to be real in a way that a computer can compute.

16. Feb 25, 2006

### Hurkyl

Staff Emeritus
Nobody said you can't figure it out -- just that there's no program that can do it.

For example, I could sit there and use my Geiger counter to figure out when my (infinitely large collection of) atoms decay, which gives me plenty of data to work with, but it might not be possible to write a computer program whose output is the same sequence of numbers.

17. Feb 25, 2006

### Hurkyl

Staff Emeritus
Or even just how long it takes a single atom to decay. I could use that to generate a sequence of integers (take the binary representation of the time it took to decay). It might not be possible for a computer program to spit out that sequence of numbers.

18. Feb 25, 2006

### phoenixthoth

Very interesting. Thanks for the feedback.

I would like to study this model of set theory where x is a set iff f(x), where f is an existential 1st-order formula. That would certainly mean that no subsets of N are lists of random numbers. Do you have more information on how this model is constructed? I thought that if we found any model for ZFC, then ZFC would be consistent, and that it is not known whether ZFC is consistent. Am I wrong? Is this f built up in a constructive way, or is it a pure existence-of-f theorem?

Your "weaker" definition of random is still capturing the idea that no function with finite formula perfectly predicts the sequence, right? What's the difference between my definition of a random sequence in the sense that are there random sequences in my definition that are not random in yours? Maybe T' is not a random subset of N. Could you eleborate on why in my definition T' is not a random subset of N?

So, to state my definition of a random sequence more eloquently: first, it depends on what a random subset A of N is. A is a random subset of N iff, by definition, A is not definable in the structure (N,<,+,*) [which may be redundant, for I might be able to define < in terms of +]; i.e., there is no 1st-order formula f such that n e A iff f(n). Then a sequence NxS, where S is a set of numbers, is random iff S is a random subset of N. By this definition, since T' is a random subset of N, NxT' is a random sequence.

Your idea seems to be that there is no effective procedure for computing the sequence NxS. Which definition better encapsulates the unpredictability of randomness??? If yours, then I'd readily adopt it.

Orthodontist, well, an infinite sequence is just a model of taking an experiment and repeating it forever. The sequence contains all of the data. I hope this is relevant to your question. "Thus I will define randomness for an arbitrary quantifiable event to mean that the sequence of numbers representing it is a list of random numbers where the definition of a set of random numbers follows."

19. Feb 25, 2006

### Hurkyl

Staff Emeritus
I really ought to work through the details of this sometime. Maybe I'm fudging something I shouldn't.

If you allow me to adopt a suitable version of the axiom of choice (for simplicity, I will posit a well ordering < on the entire universe of sets)

For example, if Ax:P(x, y), then P(x, y) is clearly true for all definable x.

If Ex:P(x, y), then if y is definable, there is clearly a definable x satisfying the equation. Specifically, take x to be the smallest set (from the global well-ordering) satisfying P(x, y).

In other words,

Ex:P(x, y) ==> E!x: [ P(x, y) and (Az: P(z, y) => (x < z or x = z)) ]

I suspect the construction would be somewhat akin to constructing a term model for a set of sentences: I take my objects to be the formulas of the type E!x:P(x), modulo the relation that ZFC can prove the two sets equal.

I think I would have to assume the consistency of ZFC to construct this model -- maybe that will allay your concerns.

Alternatively, I assume (a first-order version of) ZFC to be consistent, from which I can conclude it has a model. I then select the substructure consisting only of objects that satisfy a formula of the form E!x:P(x), and then attempt to prove that this is actually a submodel of ZFC.

Ah, I misunderstood -- I took "formula" to mean a formula of ZFC, not of the theory of interest. Your definition is almost the same as the one I gave.

As I said, I think there's an issue here with choice of model, which makes me unhappy.

I don't like your definition in terms of subsets of N at all, though -- surely there can be random sequences that pass through each natural number at least once?

I really prefer defining things in terms of computability -- I think that's just a bias of mine, I don't necessarily know if it's better. And as I said, it's yet unclear to me just what the difference between the two methods would be.

20. Feb 25, 2006

### 0rthodontist

Does noncomputability really cover the case where there isn't any information to solve the problem? Is the question of whether there is a fully connected subset of size 5 in a graph that I will not specify to you, really noncomputable?

But note that the notion of compressibility does fine for this. There is no program that can compress the set of integer sequences produced in that manner, unless there is a breakthrough in physics, which would lead both to a program that could do that and to a good reason to consider those numbers nonrandom.

Last edited: Feb 25, 2006