# Paradox from Axiom of Choice

I heard this interesting paradox, which I haven't been able to find anywhere online!
Now, bear with me while I set it up!

Suppose a professor has a countably infinite number of students. This professor secretly assigns to each student a real number in the interval [0,1], and thus ends up with an infinite sequence (s1, s2, s3, s4, ... ), where sk is the real number assigned to student #k.

The professor then tells the class that he will offer each student the opportunity to guess the number he has been assigned. Starting with k=1, the professor will call student #k into his office, and show him the infinite sequence he made earlier, but replacing sk with "___", so as to obscure it. That is, student k will see: (s1, ..., sk-1,___,sk+1, sk+2, sk+3, ...). He will then ask student #k to guess sk (the only number in the sequence he hasn't seen!). After this, student #k must swear not to tell any of the other students what he has seen, and then student #k+1 is called in.

Now assuming the students have the ability to meet before hand, and also have the ability to memorize an uncountably infinite amount of things, is there a strategy that the students can adopt such that all but finitely many guess correctly?

The answer is YES! (And no, I have not left anything out of the problem!)

Proof:

The students all gather in the common room the night before they are to be called in. On tiny sheets of paper, they write down every possible sequence of real numbers that the professor could have chosen. i.e., they write down all possible sequences of the form (x1, x2, x3, ...) where xj is a real number in the interval [0,1].

Now, they gather up a bunch of old shoe-boxes and start sorting all these sequences with the following rule: two sequences (x1, x2, x3, ...) and (y1, y2, y3, ...) will be put in the same shoe box IF AND ONLY IF they differ by only a finite number of terms. (Differing by only a finite number of terms is an equivalence relation, so this is all well-defined).

Now by the Axiom of Choice, the students pick for each box a single "representative" sequence in that box. The students make sure that they all agree upon the choices of representatives. As the evening grows on, the students memorize all the information about which sequences are in which boxes, and which sequence is the representative for each box.

Now, starting with k=1, student #k is called into the office. He sees an infinite sequence with the k-th term omitted, like so:

(s1, ... s,k-1,___,sk+1, sk+2, sk+3, ...)

Student #k notes that whatever his own score happens to be, he knows which shoe-box the sequence (s1, s2, s3, ...) must belong to. We'll call this box Beta. Now student #k recalls what the representative sequence for Beta was. Call this sequence (b1, b2, b3, ...). (This sequence will, of course, differ from (s1, s2, s3, ...) in only a finite number of terms). Student #k then guesses that sk is bk, and then leaves.

When student #k+1 is called in, he will undoubtedly realize also that (s1, s2, s3, ...) belongs in box Beta. He will also recall that (b1, b2, b3, ...) was the representative sequence for box Beta. And thus he will guess bk+1. The same is true for every student.

But (b1, b2, b3, ...) differs from (s1, s2, s3, ...) in only a finite number of places, so all but finitely many students will guess correctly!

So, what do you all think? Counterintuitive, eh?

EnumaElish
Homework Helper
How exactly do they "write down all possible sequences of the form (x1, x2, x3, ...) where xj is a real number in the interval [0,1]"? Why does the prof assign real numbers and not rationals, even though there are countably many students?

they write down every possible sequence of real numbers that the professor could have chosen. i.e., they write down all possible sequences of the form (x1, x2, x3, ...) where xj is a real number in the interval [0,1].
This is not possible. You assume the set of "every possible sequence of real numbers that the professor could have chosen" is countable.

How exactly do they "write down all possible sequences of the form (x1, x2, x3, ...) where xj is a real number in the interval [0,1]"? Why does the prof assign real numbers and not rationals, even though there are countably many students?

Well, "write down" is more in a metaphorical sense. The paradox uses students, because it illustrates how counterintuitive the result is.

The professor assigns real numbers because that makes it all the more paradoxical. We could just as well have used numbers from the set {0,1}, or integers, or rationals, or integers between 0 and 100.

The paradox is that the probability of any given student guessing the number they've been given is 0. When student #k walks into the professors office, he has no clue about whether the professor has assigned him the number 0.222443333.... or pi/4, or any other real number in [0,1].

The probability that any particular student will guess incorrectly is 1. From this, we can conclude that the probability that any group of n students will all guess incorrectly is also 1. But this means that the probability that at least one student guesses correctly in some finite set of students as large as you like is exactly 0.

Eg: "What's the probability that at least one of the first 1,000,000,000,000,000 students to be called into the office guesses correctly?"

"Zero!"

And yet the conclusion is that there is some finite number N, such that ALL students after student N guess correctly!

This is not possible. You assume the set of "every possible sequence of real numbers that the professor could have chosen" is countable.

It's not literally possible, but neither is writing down any countable set of things!

Here's what I mean formally:

Consider the set A of infinite sequences of real numbers between 0 and 1.
Define an equivalence relation R on A by the following:
(x1, x2, x3, ...) R (y1, y2, y3, ...) <==> xi $$\neq$$ yi for only finitely many i.

The equivalence relation R induces a collection of equivalence classes (which will be uncountable, of course). These equivalence classes are the shoe-boxes.

We then use the axiom of choice to select a representative for each of these equivalence classes.

Student #k notes that whatever his own score happens to be, he knows which shoe-box the sequence (s1, s2, s3
I’m not sure he would, because he’s trying to select a member of an uncountable set composed of partitions by progressively going through it.

I’m not sure he would, because he’s trying to select a member of an uncountable set composed of partitions by progressively going through it.

We can formally define this too.

Once we have the set of equivalence classes, which we denote A/R, we have a map f from A onto A/R that sends each sequence to its equivalence class in A/R.

In other words, define f((x1,x2,x3,...)) = [(x1,x2,x3,...)]

where "[ ... ]" notes "the equivalence class of". We already know that each element belongs to one and only one equivalence class, so f is well-defined.

We also have a function g from A/R into A, which is the "representative of" function. In other words,

g([...blah...]) = (y1,y2,y3, ...)

where (y1,y2,y3, ...) is the chosen (by the axiom of choice) representative for the equivalence class [...blah...].

Now what student #k can formally do is take the censored not-quite-a-sequence: (s1,s2, ...,sk-1,___,sk+1, sk+2, ...) and map it to the sequence: (s1,s2, ...,sk-1,0,sk+1, sk+2, ...).

then g(f(s1,s2, ...,sk-1,0,sk+1, sk+2, ...)) will always pick out the same representative sequence (b1,b2,b3,...), regardless of the value of k.

Of course you can make a function that maps a member to its partition. I don’t see how that’s relevant. In this case it’s not a function a human can perform. And your whole silly paradox is about humans doing something humans shouldn’t be able to do.

The paradox is that the probability of any given student guessing the number they've been given is 0. When student #k walks into the professors office, he has no clue about whether the professor has assigned him the number 0.222443333.... or pi/4, or any other real number in [0,1].

why? i think:if 0.222443333 is in the sequence then 0.222443333 is not assigned him...

Hurkyl
Staff Emeritus
Gold Member
Of course you can make a function that maps a member to its partition. I don’t see how that’s relevant. In this case it’s not a function a human can perform. And your whole silly paradox is about humans doing something humans shouldn’t be able to do.
If you don't like to consider thought "experiments" that colorfully illustrate a mathematical situation, then you are in the wrong forum. :tongue:

Hurkyl
Staff Emeritus
Gold Member
Calling it a paradox is sensationalizing a little -- it's not a paradox of logic, instead it's a demonstration of an expected failure of intuition. "Pseudo-paradox" is a better term.

For the other posters, I just wanted to confirm that the OP's logic is correct. If you are having trouble seeing it, it might help to think of it this way: the students have come up with a naming scheme for sequences. They can convert the name back into a sequence, but introducing finitely many errors.

The professor gives every student enough information to determine the name of the professor's sequence.

why? i think:if 0.222443333 is in the sequence then 0.222443333 is not assigned him...

The professor does not necessarily have rules to assigning numbers!

He could even assign the sequence as follows if he wanted:

(s1, s2, s3, ...) = (1,1,1,1,1,1,1,...)

(Which means he assigns 1 to each student)

ok ,but could you explain why the answer would be "no" ?

Last edited:
EnumaElish
Homework Helper
Calling it a paradox is sensationalizing a little -- it's not a paradox of logic, instead it's a demonstration of an expected failure of intuition. "Pseudo-paradox" is a better term.

For the other posters, I just wanted to confirm that the OP's logic is correct. If you are having trouble seeing it, it might help to think of it this way: the students have come up with a naming scheme for sequences. They can convert the name back into a sequence, but introducing finitely many errors.

The professor gives every student enough information to determine the name of the professor's sequence.
This makes sense to me as pure logic; and for practical intuition, I'd like to think that each "name" is actually a sequence of rationals that can be converted to a sequence of reals with finitely many errors.

Here's how you can understand why I call it a paradox:

Here's the problem stated in purely formal terms:

Let A be the set of all real sequences. If S = (s1, s2, s3, s4, ... ) denotes an element of A, then define Si = (s1, s2, ... sk-1, sk+1, sk+2, ...). (Notice the omission of sk).

Does there exists a function f: A x N --> R such that for all S in A, f(Si, i) = si for all but finitely many i?

The reason one might think the answer is supposed to be "no" is the following argument:

Look, suppose I could construct such a function f. Consider then the question of when f(S26, 26) = s26 (we have simply let i=26).

For every sequence S for which f(S26, 26) = s26, I can construct uncountably many sequences for which f(S26, 26) $$\neq$$ s26 simply by changing s26 to any other real number! So f is going to be almost surely wrong about the value of s26, or any other particular si for that matter.

Since the probability that f(Si, i) = si is 0 for any i, the probability that f(Si, i) = si for all i in any finite collection I of natural numbers is also 0. If the probability is 0 for any given finite set I (of which there are countably many), then the probability is also 0 for any infinite set J (since every infinite set J contains some finite set I).​

It might seem straightforward to go from this, and conclude that the probability is 0 that there is some infinite set of natural numbers J such that f(Si, i) = si for all i $$\in$$ J. The problem is we can't conclude this, as there are an uncountable number such sets J.

Now there is an even MORE convincing argument that the answer should be no:

If we fix i, the probability that f(Si, i) $$\neq$$ si is 1. Let En denote the event that f(Si, i) $$\neq$$ si for i=1,2,3,...n. As the En are a decreasing sequence of events, the probability that f(Si, i) $$\neq$$ si for ALL i is equal to the limit as n-->infinity of the probability of En. As the probability of En = 1 for all n, the probability is 1 that f(Si, i) $$\neq$$ si.​

I'm actually not sure why the above argument doesn't work, to be honest! I'm not very good at probability, so I'm not sure exactly where this argument fails...

well i think that would be true if you ignored 'THEY CAN MEET BEFORE HAND'
if your first proof is right then the the probability that f(Si, i) = si is not 0(and is not nesecerily 1).
also i think your first proof is wrong because the students won't find wich shoe box professor's sequence must belong to unless they know the missing number.
for example if sequences in each shoe box differs in only (n) numbers, student k won't be sure about S(k) (for different values of S(k) the result might be different,because S(k) might be nth number that differs).

Last edited:
If you don't like to consider thought "experiments" that colorfully illustrate a mathematical situation, then you are in the wrong forum. :tongue:

I have no problem with thought experiments. I was just pointing out this is in no way a paradox. If we include the human element the way he deals with R and N isn’t possible, making it a non-paradox. If we consider it as a purely mathematical entity I don’t see any paradox either, counter intuitive maybe. But intuition never seemed to have much relevance to what is or isn’t a mathematical truth anyways…

Hurkyl
Staff Emeritus
Gold Member
You can clean your formalization a bit by introducing functions
gi : A ---> A x N : S --> (Si, i)

I don't have time to really follow your train of logic, but I have fivemain reservations:
1. You have not defined the probability space you are using. What are the outcomes? The events? How do you compute the probability of an event?
2. You are assuming properties of finite things are preserved when you generalize to the infinite
3. I'm not sure how you are connecting probability to the existence of a function f
4. A probability 0 event is still an event
5. If we have a way to interpret probabilities as a limit of a proportion, the fact the limit is zero doesn't mean the numerators of those proportions is zero.

Incidentally, have you seen the numbered ball pseudo-paradox? I think it's one of the more striking examples of a quantity discontinuous under the limit from finite to infinite:

You have infinitely many balls in a bucket, labeled by the natural numbers. (One per number). You have a box that starts empty.

At each finite step of the algorithm, you take the two lowest numbered balls from the bucket and place them in the box. Then you take the lowest numbered ball from the box and throw it away.

Define the limit of the box set to be the set of all balls that were placed into the box but not removed from the box.

Clearly, in the limit has size zero -- despite the fact the box has N balls after N steps of the algorithm. "The number of balls in the box" is not a continuous function here.​

Last edited:
Limitkiller: The students do not need to know every number of the professor's sequence (s1, s2, s3, ...) in order to know which box (equivalence class) it belongs to. This is because the students have set up the boxes such that all sequences that differ in only a finite number of terms will be in the same box.

For example, if a student sees the following:

(s1, s2, s3, ___, s5, s6, ...)​

Then he will know the professor's sequence is in the same box as:

(s1, s2, s3, 0.421555, s5, s6, ...)​

and

(s1, s2, s3, pi/4, s5, s6, ...)​

and

(s1, s2, s3, 0.2, s5, s6, ...)​

JonF: I would qualify this as a "veridical paradox" (wiki: http://en.wikipedia.org/wiki/Paradox). I have proven that it is true, and yet it still seems perfectly absurd, even when considered as a purely formal question of mathematics. For example, the Banach-Tarski paradox (http://en.wikipedia.org/wiki/Banach–Tarski_paradox) has been proven true, yet it still seems absurd as it violates even firmly-held intuition.

I'm still trying to find out why my intuition about probability (which I admit is weak to begin with) fails in this particular case, at it seems almost straightforward that the answer should be "no" upon first examination.

Hurkyl: I actually stumbled across the numbered ball paradox while trying to find a reference to the (pseudo-)paradox that is the subject of this thread.

The wikipedia article on it is pretty good: http://en.wikipedia.org/wiki/Ross–Littlewood_paradox

I think labeling it as a pseudo-paradox is actually inaccurate. One resolution mentioned of the paradox is similar to your argument, but subtly (and I think importantly different):

The problem is ill-posed. To be precise, according to the problem statement, an infinite number of operations will be performed before noon, and then asks about the state of affairs at noon. But, as in Zeno's paradoxes, if infinitely many operations have to take place (sequentially) before noon, then noon is a point in time that can never be reached. On the other hand, to ask how many balls will be left at noon is to assume that noon will be reached. Hence there is a contradiction implicit in the very statement of the problem, and this contradiction is the assumption that one can somehow 'complete' an infinite number of steps. This is the solution favored by mathematician and philosopher Jean Paul Van Bendegem.

Your explanation relies on a definition of a "limit box" in a particular way (that is, the set of all balls placed in the box, but not subsequently removed).

We could just as well have definite the "limit size" to be the limit of the number of balls in the box, which goes to infinity (not 0), but this is a very superficial observation.

I can show your definition of a "limit box" is more fundamentally flawed. We can modify the problem only very slightly to make your limit box definition yield anything we like. Instead of selecting the balls from the bucket, we could re-use the balls we have taken out in the following way:

Step 1: Label two balls "1" and "2" and put them in the box. Take out ball "1", and relabel it ball "3".
Step 2: Label a ball "4" and put it and ball "3" in the box. Take out ball "2" and relabel it "5"
Step 3: Label a ball "6" and put it and ball "5" in the box. Take out ball "3" and relabel it "7"
...

As you can see, this time the "limit box" according to your definition will contain infinitely many balls, namely those which were initially labeled "1" or an even number. But this time we find out that none of the balls in the limit box are labeled with a number!

You see, this new problem is exactly analogous (isomorphic, in a way) to the original scenario. Notice how if we think of each current ball number in my new scenario as a ball in the original scenario, that we can clearly conclude there are no numbered balls in the limiting box. However there are balls in the limiting box! But to conclude there are balls in the limiting box, we must posit the existence of unlabeled balls, which is ridiculous!

You should interpret what has been done to show that: "Two different yet equally appropriate applications of your definition to the same hypothetical process yields a limiting box with (seemingly contradictory) emergent properties." You can see by the way the problem reacts to rephrasing that it is not just a counterintuitive result, but a genuine paradox.

Indeed, if I were to pick a definition of limit, I think that the limit of the number of balls in the box is probably the option with the least potential for contradiction! :tongue2:

Now on to the problem at hand!

In order to formalize the probability calculations, I will change the problem slightly to say that the professor will assign to each student an integer in the set {0,1,2,...,9}. The proof of the paradoxical claim will remain almost exactly the same, except real sequences will be replaces by sequences of integers in the set {0,1,2,...,9}.

Let us then define A to be the set of infinite sequences of integers between 0 and 9 that do not terminate in all 9's. Now we can construct a bijection between A and the interval [0,1] by sending a sequence (s1, s2, s3, ...) to the real number 0.s1s2s3... [NOTE: the restriction that the sequences in A not terminate in 9's will not affect the probability calculations I am about to do, as only a countable number of sequences are omitted].

Now we can use the Lebesgue measure on the unit interval [0,1] to make a probability measure on A in the obvious way.

Now, suppose we have the claimed function f : A x N --> {0,1,2,...,9} where f(Si, i) = si for all but finitely many i. Suppose further that f has the property (as is used in the proof) that f(Si, i) = f(Ti, i) for all sequences S,T where S and T differ by only finitely many terms (*).

Now, fix k $$\in$$ {0,1,...9} and j $$\in$$ N. Consider the subset Dk,j of A which contains exactly those sequences S with sj = k. This corresponds to a finite union of intervals in [0,1]. The measure of this union is 1/10, so the probability of Dk,j = 1/10. We can see this is true for any fixed k and any fixed j.

Now, fix i and define the function fi on A by fi(S) = f(Si, i). We want to know the probability that fi(S) = si. This is the sum of the probabilities (from k = 0 to 9) of the probabilities of (Dk,i $$\cap$$ {S $$\in$$ A | fi(S) = si}). But this is just the sum of the probabilities of (Dk,i $$\cap$$ {S $$\in$$ A | fi(S) = k}). However, the two events in each term of the sum are independent (the probability of {S $$\in$$ A | fi(S) = k} is the same regardless of what the actual value of si due to *.) Thus the probability that fi(S) = si is 1/10 of the sum from k = 0 to 9 of the probability of {S $$\in$$ A | fi(S) = k}. Thus the probability that fi(S) = si is exactly 1/10. (This is all assuming i is fixed).

Uh.... I think I'll take a break for now!

Last edited by a moderator:
Let Ei be the set of sequences S in A such that fi(S) = si. We just calculated that the probability of Ei = 1/10 for any i.

Let Ci = Eic (the complement of Ei).
Then for each S in A, S is in only finitely many Ci.

That's crazy! The probability of each Ci is 9/10, and there are an infinite number of them, and you mean to tell me that any sequence in A is in only finitely many of them?

That's like claiming that you've stacked an infinite number of paper plates on top of each other, each with a 1 inch hole in it, but that above every point of the table, if you go up far enough in the stack, it's only holes above that point. That's a contradiction!

Now, how it arises must have to do with the fundamental assumptions required to use probability. I'll bet you that the functions fi are not measurable. In fact, that must be it! If the functions fi were measurable, then everything else would have worked!

OK, so now does it makes sense why it seems paradoxical? I have just fully formulated an argument using probability to show why such a function f cannot exist. And the only reason my argument fails is because such a function f which satisfies the requirements will fail to be measurable, and thus the axioms required for probability no longer hold!

Last edited:
Limitkiller: The students do not need to know every number of the professor's sequence (s1, s2, s3, ...) in order to know which box (equivalence class) it belongs to. This is because the students have set up the boxes such that all sequences that differ in only a finite number of terms will be in the same box.

For example, if a student sees the following:

(s1, s2, s3, ___, s5, s6, ...)​

Then he will know the professor's sequence is in the same box as:

(s1, s2, s3, 0.421555, s5, s6, ...)​

and

(s1, s2, s3, pi/4, s5, s6, ...)​

and

(s1, s2, s3, 0.2, s5, s6, ...)​
then next student will choose another box,because he will choose the box that s4 differs in the sequences in that box...right?

then next student will choose another box,because he will choose the box that s4 differs in the sequences in that box...right?

No, because the boxes are sorted as follows: All sequences that differ by a finite number of terms are put in the same box.

In other words, even if I change 1,000,000,000 of the terms of the professor's sequence, it will still belong in the same box.

No, because the boxes are sorted as follows: All sequences that differ by a finite number of terms are put in the same box.

In other words, even if I change 1,000,000,000 of the terms of the professor's sequence, it will still belong in the same box.
since there are infinite number of students ,there must be infinite changing terms in each box (1 per student(?)) in order to cover all the students .

Hurkyl
Staff Emeritus
Gold Member
I think labeling it as a pseudo-paradox is actually inaccurate. One resolution mentioned of the paradox is similar to your argument, but subtly (and I think importantly different):
It's been a while since I learned the difference between the two terms, but strictly speaking, a paradox is a contradiction derived from logic, and a pseudo-paradox is a contradiction derived through a mixture of intuition with logic. (e.g. see the mathworld page)

Your explanation relies on a definition of a "limit box" in a particular way (that is, the set of all balls placed in the box, but not subsequently removed).
Many pseudo-paradoxes rely on some sort of imprecision -- lack of definitions, sloppy analysis, ignoring relevant hypotheses. As soon as you state something precise, all of the apparent contradictions go away -- at worst you are left with something merely surprising rather than apparently contradictory.

But to conclude there are balls in the limiting box, we must posit the existence of unlabeled balls, which is ridiculous!
Here's a good example of an apparent contradiction due to being sloppy. You computed two different limit boxes, and then implicitly assumed the two boxes were equal. But there is no justification for such an assumption....

(I assume when you said "take out a ball and relabel it", you meant to relabel it without actually removing it from the box -- otherwise none of these balls would satisfy the property 'not removed from the box')

Hurkyl
Staff Emeritus