# One thing I don't understand about Cantor's diagonal argument

• B
Cantor's diagonal argument, essentially, proves (or demonstrates, as I'm not exactly sure if it's considered a mathematically rigorous proof) that the set of all real numbers is uncountable, ie. essentially larger than the set of natural numbers.

It does this by, essentially, proof by contradiction: We start with the assumption that the set of all real numbers is countable, we list all these numbers (which should be possible if they are countable), and we proceed to construct a number that's not in the set. Thus our original assumption that we listed all the numbers was proven incorrect, and thus the premise that the set of real numbers is countable is incorrect.

But thinking about the argument, I can't help but get the feeling that it's circular: It assumes what it's trying to prove.

What do I mean by this?

If we start with the assumption that the set of all real numbers is countable, that means that we can assume that we can create a bijection between this set and the set of natural numbers. In other words, that we can "remap" all the numbers to the natural numbers (which would essentially be their position in the list), and this shouldn't make a difference. If the set is countable, then it shouldn't really matter how we represent the values, as long as they are unique (ie. it's a true bijection.)

Thus we can represent every number in the set with consecutive natural numbers. We can do it eg. in base-2, starting with the least-significant digit first, like:

0: 000000...
1: 100000...
2: 010000...
3: 110000...
4: 001000...
5: 101000...

and so on and so forth. But now if we construct a new number by taking the first digit of the first number and inverting it, the second digit of the second number and inverting it and so on, we get a new number like:

X: 111111...

which indeed is not in our set... because it's not a number at all. A base-2 number with an infinite amount of 1-digits is just infinity, which isn't a number (natural or real) by definition. Thus the diagonal argument didn't actually create a new number that's not in the set, because it merely created infinity, an invalid "number".

"But", you might argue, "the argument is about proving that the real numbers between 0 and 1 are uncountable!" That doesn't really make a difference because we can simply make those be binary values between 0.0 and 1.0:

0: 1.000000...
1: 0.000000...
2: 0.100000...
3: 0.010000...
4: 0.110000...
5: 0.001000...

and now the diagonal argument constructs the number:

X: 0.111111...

which is just 1.0, which already exists in our original set! The diagonal argument failed to create a new number that's not in the set.

"But", you may once again argue, "you can't represent irrational numbers like pi or square root of 2 like that because they have an infinite representation and they can't be mapped to a natural number because... oh..."

Which is precisely where the circularity of the argument arises: It already assumes that there are numbers that cannot be enumerated like this... and proceeds to prove that there are numbers that cannot be enumerated like this.

PeroK
Homework Helper
Gold Member
2020 Award
Cantor's diagonal argument, essentially, proves (or demonstrates, as I'm not exactly sure if it's considered a mathematically rigorous proof) that the set of all real numbers is uncountable, ie. essentially larger than the set of natural numbers.

It does this by, essentially, proof by contradiction: We start with the assumption that the set of all real numbers is countable, we list all these numbers (which should be possible if they are countable), and we proceed to construct a number that's not in the set. Thus our original assumption that we listed all the numbers was proven incorrect, and thus the premise that the set of real numbers is countable is incorrect.

But thinking about the argument, I can't help but get the feeling that it's circular: It assumes what it's trying to prove.

What do I mean by this?

If we start with the assumption that the set of all real numbers is countable, that means that we can assume that we can create a bijection between this set and the set of natural numbers. In other words, that we can "remap" all the numbers to the natural numbers (which would essentially be their position in the list), and this shouldn't make a difference. If the set is countable, then it shouldn't really matter how we represent the values, as long as they are unique (ie. it's a true bijection.)

Thus we can represent every number in the set with consecutive natural numbers. We can do it eg. in base-2, starting with the least-significant digit first, like:

0: 000000...
1: 100000...
2: 010000...
3: 110000...
4: 001000...
5: 101000...

and so on and so forth. But now if we construct a new number by taking the first digit of the first number and inverting it, the second digit of the second number and inverting it and so on, we get a new number like:

X: 111111...

which indeed is not in our set... because it's not a number at all. A base-2 number with an infinite amount of 1-digits is just infinity, which isn't a number (natural or real) by definition. Thus the diagonal argument didn't actually create a new number that's not in the set, because it merely created infinity, an invalid "number".

"But", you might argue, "the argument is about proving that the real numbers between 0 and 1 are uncountable!" That doesn't really make a difference because we can simply make those be binary values between 0.0 and 1.0:

0: 1.000000...
1: 0.000000...
2: 0.100000...
3: 0.010000...
4: 0.110000...
5: 0.001000...

and now the diagonal argument constructs the number:

X: 0.111111...

which is just 1.0, which already exists in our original set! The diagonal argument failed to create a new number that's not in the set.

"But", you may once again argue, "you can't represent irrational numbers like pi or square root of 2 like that because they have an infinite representation and they can't be mapped to a natural number because... oh..."

Which is precisely where the circularity of the argument arises: It already assumes that there are numbers that cannot be enumerated like this... and proceeds to prove that there are numbers that cannot be enumerated like this.
The diagonal argument doesn't quite work in base 2. You need to represent the numbers in another base.

All you have is a failed attempt at a proof.

PeterDonis
Mentor
2020 Award
Cantor's diagonal argument, essentially, proves (or demonstrates, as I'm not exactly sure if it's considered a mathematically rigorous proof) that the set of all real numbers is uncountable, ie. essentially larger than the set of natural numbers.

Cantor's diagonal argument is a mathematically rigorous proof, but not of quite the proposition you state. It is a mathematically rigorous proof that the set of all infinite sequences of binary digits is uncountable. That set is not the same as the set of all real numbers.

It can be proven that there is a bijection between the set of all infinite sequence of binary digits and the set of all real numbers, which, combined with Cantor's diagonal argument, shows that the set of all real numbers is uncountable. But AFAIK Cantor never actually produced any proof that such a bijection exists.

Cantor did publish a different proof that the set of real numbers is uncountable, not using the diagonal argument:

https://en.wikipedia.org/wiki/Cantor's_first_set_theory_article

WWGD and etotheipi
FactChecker
Gold Member
But thinking about the argument, I can't help but get the feeling that it's circular: It assumes what it's trying to prove.

What do I mean by this?

If we start with the assumption that the set of all real numbers is countable,
A circular argument is where you assume what you are trying to prove. That is invalid. A proof by contradiction is where you assume what you are trying to disprove. That is completely different and it is valid.
The Cantor diagonal proof is a valid proof by contradiction.

A circular argument is where you assume what you are trying to prove. That is invalid. A proof by contradiction is where you assume what you are trying to disprove. That is completely different and it is valid.
The Cantor diagonal proof is a valid proof by contradiction.
I don't think you understood my post. I did not say that the argument is circular because it starts with the assumption that the set of real numbers is countable. The reason why I think it's circular is stated at the end of my post, not the part which you quoted.

Cantor's diagonal argument is a mathematically rigorous proof, but not of quite the proposition you state. It is a mathematically rigorous proof that the set of all infinite sequences of binary digits is uncountable. That set is not the same as the set of all real numbers.
Technically speaking you are completely right, as it indeed deals with infinite strings of digits (without taking a stance on whether the set of them has a bijection to the set of real numbers). But still, I don't think that changes the essence of my question.

If we start with the assumption that the set of all possible infinite strings is countable (which is the assumption we are trying to disprove), then it ought to make no difference if we remap every element in this set to the natural numbers. This is because if you argue that you can't do this kind of remapping without changing something essential about the situation, you are arguing that there is no such bijection between the set of all infinite strings and the set of natural numbers... which is the very thing we are trying to prove. By saying "you can't do this remapping" you would be already assuming what you are trying to prove, which would be a circular argument.

I'm certain that I'm missing something here, that I have misunderstood something, but I'm not sure what.

PeroK
Homework Helper
Gold Member
2020 Award
Technically speaking you are completely right, as it indeed deals with infinite strings of digits (without taking a stance on whether the set of them has a bijection to the set of real numbers). But still, I don't think that changes the essence of my question.

If we start with the assumption that the set of all possible infinite strings is countable (which is the assumption we are trying to disprove), then it ought to make no difference if we remap every element in this set to the natural numbers. This is because if you argue that you can't do this kind of remapping without changing something essential about the situation, you are arguing that there is no such bijection between the set of all infinite strings and the set of natural numbers... which is the very thing we are trying to prove. By saying "you can't do this remapping" you would be already assuming what you are trying to prove, which would be a circular argument.

I'm certain that I'm missing something here, that I have misunderstood something, but I'm not sure what.

Cantor's proof has a subtlety. Many rationals can be expressed in two ways. E.g. ##0.4999 \dots = 0.500 \dots##. The proof in base 10 needs to take account of this and avoid the possibility of getting a string of nines and/or a string of zeros. This is easy to do. For example, you simply never pick 0 or 9, as there are always other digits to choose from.

If you try in base 2, the problem of avoiding this gets harder, which is what you found.

You might be able to repair the proof using base 2, but better to stick to base 10. A proof is a proof.

FactChecker and etotheipi
If you try in base 2, the problem of avoiding this gets harder, which is what you found.
Yeah, I suppose that if I used my remapping trickery in base 3 (or higher) rather than base 2, then it would be more obvious that the argument is sound, as it's now easy to create a new number that doesn't exist in the set (and isn't just infinity).

But then... how do I know that the generated number wasn't actually in the original set that got remapped, and this diagonalization technique didn't simply replicate one of those numbers? Or does this even matter?

The argument, as I have changed it, is now essentially saying "if you remap the set of all infinite strings of digits to the set of all finite strings of digits, there are no infinite strings of digits in this new set", which seems to be self-evidently true, almost tautological?

I think there's something fundamentally wrong with my idea of remapping the supposedly-countable set to the natural numbers, but my very limited mathematical knowledge doesn't allow me to fully understand the basic problem in it.

PeroK
Homework Helper
Gold Member
2020 Award
But then... how do I know that the generated number wasn't actually in the original set that got remapped, and this diagonalization technique didn't simply replicate one of those numbers? Or does this even matter?

In the diagonal argument it is important that a) the string is not in the list and b) the string is not equal to another (different) string in the list when the two strings are mapped to real numbers. a) is easy; b) is the subtlety.

There is no need for any further trickery!

FactChecker
Gold Member
I don't think you understood my post. I did not say that the argument is circular because it starts with the assumption that the set of real numbers is countable. The reason why I think it's circular is stated at the end of my post, not the part which you quoted.
The issue with multiple representations of a number is not an assumption that creates a circular argument. It is a different problem that is easily avoided by using the base-10 representation. As @PeroK indicated, that gives you much more freedom to generate unlisted numbers while avoiding the problem of multiple representations. It is not "trickery"; it is a more natural approach. Another thing I like about using the decimal representation is that it gives you so much freedom to create vastly more unlisted numbers than listed numbers. So it prepares you for the introduction of transfinite numbers beyond ##\aleph_0##.

Last edited:
pbuk
Gold Member
The argument, as I have changed it, is now essentially saying "if you remap the set of all infinite strings of digits to the set of all finite strings of digits, there are no infinite strings of digits in this new set", which seems to be self-evidently true, almost tautological?
No, you have that the wrong way round; the argument is that if you try to remap the set of all infinite strings of digits to the set of all finite strings of digits I can always show that you have not succeeded by creating an infinite string of digits which you have not yet mapped.

No, you have that the wrong way round; the argument is that if you try to remap the set of all infinite strings of digits to the set of all finite strings of digits I can always show that you have not succeeded by creating an infinite string of digits which you have not yet mapped.
Actually no, because that was precisely one of the questions I asked in that particular post: "How do we know that this new generated number wasn't actually in the original set, which we remapped to the natural numbers?"

Anyway, I may have expressed myself poorly in that paragraph which you are quoting. I was not trying to say "there's a flaw in Cantor's diagonal argument because (the reason I described)", but instead, "there's probably a flaw in my argument because (the reason I described), I just don't know exactly what this flaw is."

What I'm trying to say is that if I make the assumption that the set of all infinite strings is countable, and using this assumption assume that I can "remap" all these strings to the natural numbers, we don't even need the diagonalization operation to create a number that's not in this new set. We simply need any infinite string that contains an infinite amount of 1-digits (in base 3 or higher), and we have a string that doesn't exist in this new set.

However, something doesn't feel right here. We have no way of knowing if this string didn't already exist in the original set, and was simply remapped to some natural number. Thus the argument would just say, like "we remap all the original numbers, like 1/3, to the natural numbers... and now there's no 1/3 in this new set!" Which is completely tautological. Of course there isn't, because we just remapped it to a natural number. But it isn't any sort of proof either. It doesn't prove anything.

There must be some fundamental mistake I'm doing in this whole thing that I'm not grasping.

PeroK
Homework Helper
Gold Member
2020 Award
Of course there isn't, because we just remapped it to a natural number.

There must be some fundamental mistake I'm doing in this whole thing that I'm not grasping.

Are you imagining that a typical item is the list is:
$$0.n_1n_2n_3 \dots$$
Where ##n_1n_2n_3 \dots## is a natural number?

FactChecker
Gold Member
Here is a point that might simplify things. It is perfectly ok to have numbers on the list multiple times in different representations. If that list does not contain all numbers, then clearly a cleaner list, without repeats, would also not contain all numbers. Where you want to be careful is when you generate the unlisted number. Then you want to avoid generating any number that might have other representations which might be on the list. That is easy to avoid in base 10. Just generate a number that does not end in infinite 0's or 9's. Throw in other digits 1,2,..,8, (also different from the digit on the list in that position) periodically.

Last edited:
pbuk
PeterDonis
Mentor
2020 Award
If we start with the assumption that the set of all possible infinite strings is countable

We start with the assumption that we have a sequence of infinite strings of binary digits. We do not assume that this sequence contains all possible infinite strings of binary digits. We simply ask the question: can we, given any such sequence, construct an infinite string of binary digits that cannot be in the sequence? The diagonal argument proves, by construction, that the answer to this question is yes.

This argument, by itself, does not prove that the set of infinite strings of binary digits is uncountable. To prove that, we need an additional argument as follows:

Premise #1: If the set of infinite strings of binary digits is countable, then there must exist some sequence of infinite strings of binary digits that contains all of them.

Premise #2: There does not exist any such sequence (by the argument above).

Conclusion: The set of infinite strings of binary digits is not countable.

Premise #1 does contain (as the first clause of a hypothetical) the assumption that the set of infinite strings of binary digits is countable--but Premise #1, as noted above, is not the premise used in the diagonalization construction. That construction--the proof that the answer to the question asked above is yes--makes no such assumption. So there is no circularity anywhere.

pbuk
PeterDonis
Mentor
2020 Award
it ought to make no difference if we remap every element in this set to the natural numbers

The easiest way to avoid any issues with "mapping" (which, as others have noted, can be avoided by using some other representation for numbers than sequences of binary digits to do the mapping, but that involves a more complicated argument) is to note that there is an independent proof of the theorem that there is a bijection between infinite sequences of binary digits and the real numbers (not the natural numbers). So once we have proven that the set of infinite sequences of binary digits is uncountable, we can use that theorem about the existence of a bijection to prove that the set of real numbers is uncountable, without ever having to deal with any issues of how exactly we "map" infinite sequences of binary digits to numbers.

If we start with the assumption that the set of all possible infinite strings is countable

We start with the assumption that we have a sequence of infinite strings of binary digits. We do not assume that this sequence contains all possible infinite strings of binary digits.
But we have to assume that the list is countable, and in fact remappable to the natural numbers, else we wouldn't be able to list it, and most importantly we would not be able to say "take the first digit of the first string, the second digit of the second string, etc." The very fact that we are talking about "the first string", "the second string" and so on requires that we assume the list of strings is countable, ie. enumerable, ie. mappable to the natural numbers.

PeterDonis
Mentor
2020 Award
we have to assume that the list is countable

We have to assume that we have a countable list of infinite binary strings, yes, since that's the definition of a "sequence". But we do not have to assume that that list contains all infinite binary strings. So we are not assuming that the set of all infinite binary strings is countable. We are only assuming that the set of infinite binary strings that appears in our list is countable. In other words, we are assuming that there exists a countable set of infinite binary strings. But this assumption is obviously true, since we know the set of all possible infinite binary strings is not finite, and any infinite set contains a countable subset.

FactChecker
Gold Member
One thing that would invalidate the proof is if we assumed what we want to prove. We want to prove that the countable list does not contain all the reals. We have never assumed that -- we proved it.

PeroK
Homework Helper
Gold Member
2020 Award
But we have to assume that the list is countable, and in fact remappable to the natural numbers, else we wouldn't be able to list it, and most importantly we would not be able to say "take the first digit of the first string, the second digit of the second string, etc." The very fact that we are talking about "the first string", "the second string" and so on requires that we assume the list of strings is countable, ie. enumerable, ie. mappable to the natural numbers.

Here we assume the opposite (negation) of what we want to prove. We assume the reals are countable; obtain a contradiction and conclude they must be uncountable.

This is called a proof by contradiction. For example, you can prove there are infinitely many primes using a proof by contradiction:

1) Assume there is only a finite number of primes.
2) Produce a contradiction (something you already know to be false, or that contradicts the assumption 1).
3) Conclude that the number of primes cannot be finite.

@PeroK
There are two ways (both closely related) to look at it. The one you have described is one way. The other is described at the beginning of post#16:
We start with the assumption that we have a sequence of infinite strings of binary digits. We do not assume that this sequence contains all possible infinite strings of binary digits. We simply ask the question: can we, given any such sequence, construct an infinite string of binary digits that cannot be in the sequence? The diagonal argument proves, by construction, that the answer to this question is yes.

P.S. pointing this out for benefit of OP

Last edited:
In answer to the original question: Cantor's proof is indeed a proof and it is entirely rigorous.

In outline, it shows that IF there existed a one-to-one correspondence between the positive integers Z+ and the real numbers, then it is always possible to find a real number that no integer in Z+ corresponds to.

Which instantly shows that the IF clause is not possible. That's what it would mean for the set of real numbers and the set of integers to have "the same size". Further, since the integers form a subset of the real numbers, this shows that the set of real numbers is of a "larger size" than the set of integers.

The diagonal argument doesn't quite work in base 2. You need to represent the numbers in another base.

All you have is a failed attempt at a proof.
Cantor used base 2 in his 1891 paper.

From: https://en.wikipedia.org/wiki/Cantor's_diagonal_argument#

An illustration of Cantor's diagonal argument (in base 2) for the existence of uncountable sets. The sequence at the bottom cannot occur anywhere in the enumeration of sequences above.

PeroK
PeterDonis
Mentor
2020 Award
In outline, it shows that IF there existed a one-to-one correspondence between the positive integers Z+ and the real numbers, then it is always possible to find a real number that no integer in Z+ corresponds to.

As already noted, the diagonal argument does not work directly with real numbers; it works with infinite sequences of binary digits. That avoids a number of complications which have been discussed in this thread.