# Proof of uniqueness of limits for a sequence of real numbers

## Homework Statement

The proposition that I intend to prove is the following. (From Terence Tao "Analysis I" 3rd ed., Proposition 6.1.7, p. 128).

$Proposition$. Let $(a_n)^\infty_{n=m}$ be a real sequence starting at some integer index m, and let $l\neq l'$ be two distinct real numbers. Then, it is not possible for $(a_n)^\infty_{n=m}$ to converge to $l$ while also converging to $l'$.

## Homework Equations, Definitions, and Propositions

$Definition.$ Let $x, y$ be real numbers. The distance $d(x, y)$ between $x$ and $y$ is defined by $$d(x, y) := |x - y|.$$

$Proposition$. Let $x, y, z$ be real numbers. We have
(a) $$d(x, y) = d(y, x),$$
(b) $$d(x, z) \leq d(x, y) + d(y, z).$$

$Definition.$ Let $l$ be a real number. A sequence $(a_n)^\infty_{n=m}$ of real numbers converges to $l$, written $\lim_{n \rightarrow \infty} {a_n} = l$, if and only if for every real $\epsilon > 0$ there is an $N \geq m$ such that for all $n$
$$n \geq N \Rightarrow d(a_n, l) \leq \epsilon.$$

## The Attempt at a Solution

$Proof.$ Suppose for the sake of a contradiction that $\lim_{n \rightarrow \infty} {a_n} = l$ and that $\lim_{n \rightarrow \infty} {a_n} = l'$. We then have,
$$(\forall \epsilon > 0)(\exists N \geq m)(\forall n)(n \geq N \Rightarrow d(a_n, l) \leq \epsilon)$$
and
$$(\forall \epsilon > 0)(\exists N' \geq m)(\forall n)(n \geq N' \Rightarrow d(a_n, l') \leq \epsilon).$$

If we let $n' := \max(N, N')$, then we have
$$(\forall \epsilon > 0)(\exists n' \geq m)(\forall n)(n \geq n' \Rightarrow d(a_n, l) \leq \epsilon \land d(a_n, l') \leq \epsilon).$$
Thus, by the triangle inequality and symmetry of distance we have
$$d(l, l') \leq d(a_n, l) + d(a_n, l') \leq \epsilon + \epsilon = 2 \epsilon.$$

Hence, $d(l, l') \leq 2 \epsilon.$ Since $l \neq l'$, we have $d(l, l') > 0$. If we choose $\epsilon = \frac {d(l, l')} 3$, we then arrive at $d(l, l') \leq \frac {2d(l, l')} 3$, a contradiction. Therefore, it is not possible to converge to both $l$ and $l'$.

My questions are:

(i) First and foremost: is the proof of the proposition correct?

(ii) I am a little puzzled by the "If we let $n' := \max(N, N')$" part. Why is one allowed to do this? And how does one come up with the idea of taking the $\max$ of two numbers in order to proceed with the proof? I have also seen the $\min$ being used in proofs for limits of functions.

(iii) I see that in the proof supplied by Terence Tao, he adds the "If we choose $\epsilon = \frac {d(l, l')} 3$" part at the beginning of the proof. Why so? Surely he did not know this at the beginning of the proof, did he? I only saw this "opportunity" once I was at the end of the proof. I have seen such "covering the tracks" procedures in many proofs. Why do we do this? Is it not allowed to leave the proof the way I have?

fresh_42
Mentor

(i) First and foremost: is the proof of the proposition correct?
Yes. You can even drop the contradiction: with the same arguments, you arrive at $d(l,l')=0$ as the only possibility, which means $l=l'$.
(ii) I am a little puzzled by the "If we let $n' := \max(N, N')$" part. Why is one allowed to do this?
You need only a single number $n'$ above which both of your inequalities hold. So any number at least as large as both $N$ and $N'$ will do, because the rest of the proof doesn't use the fact that all sequence elements from some index on are close to the limits. For the triangle inequality a single $a_n$ will do, and your choice of $n'$ guarantees that one exists.
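Spelled out, the order of the choices matters: $\epsilon$ is fixed first, and only then do $N$ and $N'$ (which may depend on $\epsilon$) exist. A sketch of that step, in the notation of the first post:

$$\begin{aligned}
&\text{Fix } \epsilon > 0. \text{ Convergence to } l \text{ and to } l' \text{ gives } N, N' \geq m \text{ with}\\
&\qquad n \geq N \Rightarrow d(a_n, l) \leq \epsilon, \qquad n \geq N' \Rightarrow d(a_n, l') \leq \epsilon.\\
&\text{Put } n' := \max(N, N'). \text{ Then } n' \geq N \text{ and } n' \geq N', \text{ so both bounds hold at } n = n':\\
&\qquad d(l, l') \leq d(l, a_{n'}) + d(a_{n'}, l') \leq \epsilon + \epsilon = 2\epsilon.
\end{aligned}$$

Any index larger than $n'$ would serve equally well; the maximum is just the first index at which both conditions are guaranteed.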
And how does one come up with the idea of taking the $\max$ of two numbers in order to proceed with the proof?
It is simply the smallest number from which on we no longer have to deal with any runaway elements such as $a_k = 500 \cdot l$. We want to stay within a small open neighborhood of the limit in which all remaining sequence elements can be found. The finitely many starting elements outside of this neighborhood aren't of interest. We consider a neighborhood of radius $\varepsilon$. Only after that comes the key point: $\varepsilon$ isn't fixed. We can make the neighborhood smaller and smaller and still have the same behavior of the sequence, so the limit point can't get away.
I have also seen the $\min$ being used in proofs for limits of functions.
(iii) I see that in the proof supplied by Terence Tao, he adds the "If we choose $\epsilon = \frac {d(l, l')} 3$" part at the beginning of the proof. Why so?
In the contradiction version, where we assumed $l \neq l'$ and thus $d(l,l')\neq 0$, we only need the part $1 \leq \frac{2}{3}$ of $1 \cdot d(l,l') \leq \frac{2}{3} d(l,l')$. In the version without contradiction, we need $(1-\frac{2}{3})\cdot d(l,l') \leq 0$ and thus $d(l,l') = 0$. That is, we use either that $1 \leq \frac{2}{3}$ is false or, in the second case, that $1-\frac{2}{3} > 0$. As you see, many other quotients than $\frac{2}{3}$ would do the same job. We just need one of them, so why not simply take the one above? It doesn't matter how precise or narrow we are. We only need a contradiction or a positive factor, so there is no need to think a lot about it.
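As a quick check of which factors work (the constant $c$ is a name introduced only for this illustration), take $\epsilon = c \cdot d(l,l')$ with $c > 0$. The bound $d(l,l') \leq 2\epsilon$ then reads

$$d(l,l') \leq 2c \, d(l,l') \implies (1 - 2c)\, d(l,l') \leq 0,$$

which forces $d(l,l') = 0$ whenever $1 - 2c > 0$, that is, whenever $0 < c < \frac{1}{2}$. The choice $c = \frac{1}{3}$ gives the factor $\frac{2}{3}$ used above; $c = \frac{1}{4}$ would give $\frac{1}{2}$ and work just as well. With the non-strict inequality in the definition of convergence, $c = \frac{1}{2}$ itself is not enough, since it only yields $d(l,l') \leq d(l,l')$.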
Surely, he did not know this at the beginning of the proof did he?
In this case? He did. Terry "sees" the entire proof beforehand. It is simply a matter of practice. But why bother? Just make a suitable choice when needed.
I only saw this "opportunity" once I was at the end of the proof. I have seen such "covering the tracks" procedures in many proofs. Why do we do this? Is it not allowed to leave the proof the way I have?
It is allowed, so what do you mean by this? Maybe you find some answers here: https://www.physicsforums.com/insights/10-math-tips-save-time-avoid-mistakes/ (section 8).

It is allowed, so what do you mean by this? Maybe you find some answers here: https://www.physicsforums.com/insights/10-math-tips-save-time-avoid-mistakes/ (section 8).
I did not mean "allowed". I meant to ask why it is customary in proofs to "move" certain discoveries which one made along the way to the beginning or to other parts of the proof. In this case, why did Tao add "If we choose $\epsilon = \frac {d(l, l')} 3$" to the beginning of the proof, whereas I discovered it at the end and left it there?

In this case? He did. Terry "sees" the entire proof beforehand. It is simply a matter of practice.
Are you telling me that he "saw" clearly that $\epsilon = \frac {d(l, l')} 3$? Like as in he knew from the beginning that two thirds of the distance will be the contradiction. He knew that the moment he read the proposition and attempted the proof? What I "saw" was that somewhere the contradiction will have to do with the distance and the epsilon. But certainly not that it will be two thirds or any fraction less than 1.

PeroK
Homework Helper
Gold Member
I did not mean "allowed". I meant to ask why it is customary in proofs to "move" certain discoveries which one made along the way to the beginning or to other parts of the proof. In this case, why did Tao add "If we choose $\epsilon = \frac {d(l, l')} 3$" to the beginning of the proof, whereas I discovered it at the end and left it there?

Are you telling me that he "saw" clearly that $\epsilon = \frac {d(l, l')} 3$? Like as in he knew from the beginning that two thirds of the distance will be the contradiction. He knew that the moment he read the proposition and attempted the proof? What I "saw" was that somewhere the contradiction will have to do with the distance and the epsilon. But certainly not that it will be two thirds or any fraction less than 1.
It's a good point, but you are confusing a proof with the process of discovering a proof. Let me draw an analogy with writing a computer program. In the end your program is neat, logical and bug-free (hopefully). But, that doesn't mean you didn't go through several versions of your code and some serious de-bugging to get there!

fresh_42
Mentor
I did not mean "allowed". I meant to ask why it is customary in proofs to "move" certain discoveries which one made along the way to the beginning or to other parts of the proof. In this case, why did Tao add "If we choose $\epsilon = \frac {d(l, l')} 3$" to the beginning of the proof, whereas I discovered it at the end and left it there?
This is a matter of taste and style. Some prepare what is needed at the start and have it available when needed, others mention it at the later point. I find the first version is more elegant than the latter. Gather what you have and what will be needed, and then proceed only by conclusions from that base. The other version looks a bit as if it came to mind while being written down. The vast majority of proofs are drafted on scratch paper, sorted and lined up, and then written into a final version. However, probably not in this case, as it is a standard proof which most mathematicians have seen dozens of times and applied themselves in this or a similar way even more often.
Are you telling me that he "saw" clearly that $\epsilon = \frac {d(l, l')} 3$?
Yes. Firstly, because I know of Tao's incredible talent and secondly for the reasons just mentioned.
Like as in he knew from the beginning that two thirds of the distance will be the contradiction
Sure. The triangle inequality is the key. This means two summands. Choosing a third should therefore do. No need to think a lot about it.
He knew that the moment he read the proposition and attempted the proof?
This proof is a warm-up if at all. He didn't need to attempt the proof, he could write it down in its final version without thinking a lot.
What I "saw" was that somewhere the contradiction will have to do with the distance and the epsilon. But certainly not that it will be two thirds or any fraction less than 1.
That's a matter of practice. Try it and draw a picture: two circles with radius $\varepsilon$ around two different points $l,l'$, then a couple of dots in these circles to represent the sequence. You will see that with a radius of one third of the distance between the centers, the circles share no sequence elements, at least none of those from index $n'$ on. At most you'll find some exceptions among the points $\{a_m, \ldots , a_{N}\}$ or $\{a_m,\ldots ,a_{N'}\}$, which also illustrates why $n'$ was chosen the way it was. Some people are able to imagine such a picture without actually drawing it. The written version is simply the formal way of stating what can be seen in the picture.
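For readers who prefer to experiment, the picture can be imitated numerically on the real line, where the "circles" are closed intervals. This is only a sketch under stated assumptions: the candidate limits $0$ and $1$, the sequence $a_n = 1/n$, and the helper name `intervals_overlap` are all chosen here for illustration, and the final loop is a finite spot check, not a proof.

```python
from fractions import Fraction

def intervals_overlap(l, l_prime, eps):
    # Closed intervals [l - eps, l + eps] and [l_prime - eps, l_prime + eps]
    # share a point exactly when the centers are at most 2 * eps apart.
    return abs(l - l_prime) <= 2 * eps

# Two hypothetical candidate limits (illustrative values only).
l, l_prime = Fraction(0), Fraction(1)
d = abs(l - l_prime)

# Radius d/3: the two neighborhoods are disjoint.
assert not intervals_overlap(l, l_prime, d / 3)
# Radius d/2: the neighborhoods touch, so this radius gives no contradiction.
assert intervals_overlap(l, l_prime, d / 2)

def a(n):
    # Example sequence a_n = 1/n, which converges to 0.
    return Fraction(1, n)

eps = d / 3
# Finite spot check (not a proof): for 3 <= n < 1000 every term lies within
# eps of l, and no term lies within eps of l_prime.
near_l = all(abs(a(n) - l) <= eps for n in range(3, 1000))
near_l_prime = any(abs(a(n) - l_prime) <= eps for n in range(3, 1000))
print(near_l, near_l_prime)
```

Exact rational arithmetic via `Fraction` is used so that the boundary case $a_3 = \frac{1}{3}$ is compared without floating-point rounding.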

It's a good point, but you are confusing a proof with the process of discovering a proof. Let me draw an analogy with writing a computer program. In the end your program is neat, logical and bug-free (hopefully). But, that doesn't mean you didn't go through several versions of your code and some serious de-bugging to get there!
I understand the computer program analogy, however, when you write a computer program, then you write comments in order for other participants working on the same program to be able to understand the line of code. Whereas in the case of a proof when parts discovered and needed later in a proof are added to the beginning, i.e. $\epsilon = \frac {d(l, l')} 3$ is just added there and not mentioned why, i.e. no comment as in the programming example. Also, the computer program is not intended as a pedagogical method to teach a student, whereas a textbook like Tao's Analysis is targeted at the honors undergraduate student. So in my view, a textbook should be there to teach. Am I right? Moving certain parts of a proof somewhere else strips away this pedagogical purpose of a textbook. If I had not been able to prove the proposition, I would have read $\epsilon = \frac {d(l, l')} 3$ on the first line of the proof and wondered, "How did you know? Where does this come from?" In the way I did it, by contrast, the reader (the student, the one who is learning) clearly "sees" why and how, because the discovery is left where it was made and needed in the proof.

Let me quote fresh_42 here.
This is a matter of taste and style. Some prepare what is needed at the start and have it available when needed, others mention it at the later point. I find the first version is more elegant than the latter.
This is in fact the argument I keep hearing whenever a mathematician covers his tracks. I agree. It does look more elegant and tidy. However, textbooks are not publications of papers. They are intended to teach a learner and by covering the tracks it seems, at least to me, that it makes it more confusing because one has to "unravel" the thought process rather than presenting it.

PeroK
Homework Helper
Gold Member
I understand the computer program analogy, however, when you write a computer program, then you write comments in order for other participants working on the same program to be able to understand the line of code. Whereas in the case of a proof when parts discovered and needed later in a proof are added to the beginning, i.e. $\epsilon = \frac {d(l, l')} 3$ is just added there and not mentioned why, i.e. no comment as in the programming example.
When I finished studying maths and went into computer programming that is exactly what I thought! Why don't maths books comment their work like programmers do (or are supposed to)? I also wondered about using more meaningful long-name variables in maths.

There is certainly a parallel between the "old-style" programming - some of the people I met when I started were proud of how hard their code was to understand - and some maths texts which perhaps take a sink or swim attitude. I gave up on my first Real Analysis book for this reason.

That said, some of your concerns here seem to be fairly minor. Would it really make so much difference if Tao said "and, if you are wondering why I chose that particular $\epsilon$, then you'll soon see"?

StoneTemplePython
Gold Member
2019 Award
There's some good advice on this thread. It may perhaps feel more motivated if Tao had said:

for any $\epsilon \gt 0$ ... work through the proof, then at the end, you have

$$d(l, l') \leq d(a_n, l) + d(a_n, l') \leq \epsilon + \epsilon = 2 \epsilon.$$
then finish with:

1.) since this is for any positive $\epsilon$ let's make a smart selection and consider the result if we set it
$\epsilon := \frac {d(l, l')} 3$

giving you
$d(l, l') \leq \frac {2d(l, l')} 3 \implies d(l, l')\leq 0$, so by non-negativity and positive-definiteness of a distance function $d(l, l') = 0$ and $l = l'$. But then $\epsilon = \frac {d(l, l')} 3 = 0$, which contradicts the fact that $\epsilon$ is positive.
- - - -
but this is the same as him saying at the beginning: for any $\epsilon \gt 0$, and spoiler alert, I'm mostly interested in one case here, i.e. when $\epsilon = \frac {d(l, l')} 3$, again on the assumption $l \neq l'$, so let's proceed examining that specific case, rather than the general positive $\epsilon$. (He's seen this 'plot' so many times the spoilers at the end come instantly to mind.)

It really doesn't make a difference. If you read it the way I've stated above, I don't think you'll interpret it as "covering your tracks".

Personally, I prefer not having the spoiler alert, working it through, and then making a wise selection at the end. (Perhaps this is for the same reason I don't like knowing spoilers before I see a movie or read a novel for the first time?) Ultimately this really is a matter of taste. In some sense you need to make the proof 'your own' and write it -- and interpret it-- in a way that fits you. Either way you go at it, I'd probably get comfortable with this proof, then ultimately discard the contradiction, recognizing that a real non-negative number $x$ where we know $x \lt \epsilon$ for every $\epsilon \gt 0$ means $x = 0$.
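The closing observation can be checked in two lines. A sketch, written with $\leq$ to match the definition of convergence in the first post (the strict version is analogous): let $x \geq 0$ be a real number with $x \leq \epsilon$ for every $\epsilon > 0$, and suppose $x > 0$. Then $\epsilon := \frac{x}{2}$ is an admissible choice, and

$$x \leq \frac{x}{2} \implies \frac{x}{2} \leq 0 \implies x \leq 0,$$

which contradicts $x > 0$. Hence $x = 0$. Applied to the proof at hand: the bound $d(l, l') \leq 2\epsilon$ holds for every $\epsilon > 0$, and every positive real has the form $2\epsilon$, so $x = d(l, l')$ satisfies the hypothesis and therefore $d(l, l') = 0$, that is, $l = l'$.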

- - - -
The parallels between math and programming are a very nice addition to this thread and something not contemplated enough in my view.

PeroK
Homework Helper
Gold Member

To prove a sequence has at most one limit we use a proof by contradiction. We assume two different limits. Note that there must be a positive distance between these two limits. We then use the definition of convergence to show that the sequence must eventually stay close to the first limit - in particular less than half the distance between the limits - and we show that it must eventually stay close to the second limit - again less than half the distance between the limits. Then, using the triangle inequality (although it's also fairly obvious), we see that a sequence cannot simultaneously be this close to both limits. This contradiction shows that the two limits must be one and the same.

Note: any distance less than half the distance will do. We choose one third of the distance.

The full, formal proof is then as follows:

If I wrote an analysis book in that style, would you be my first customer?

When I finished studying maths and went into computer programming that is exactly what I thought! Why don't maths books comment their work like programmers do (or are supposed to)? I also wondered about using more meaningful long-name variables in maths.

There is certainly a parallel between the "old-style" programming - some of the people I met when I started were proud of how hard their code was to understand - and some maths texts which perhaps take a sink or swim attitude. I gave up on my first Real Analysis book for this reason.
I actually wanted to write that variable names in programming languages are also often self-explanatory (if the programmer chose good names, such as int score rather than int x), unlike in mathematics. But I do not really have problems with variable naming in mathematics, because a proposition is never as long as the number of lines of code in a program.

That said, some of your concerns here seem to be fairly minor. Would it really make so much difference if Tao said "and, if you are wondering why I chose that particular $\epsilon$, then you'll soon see"?
It just seems that, placed at the beginning, it comes unjustified, because we read from left to right and top to bottom, and because that particular choice of epsilon is needed at the end and that decision is made at the end. Now, as some have mentioned, Tao and many mathematicians have seen this proof or similar ones, so they know what to look for and so on. But that is like me spoiling the solution to a problem for someone who is trying to solve it, just because I have already solved that problem or have a knowledge advantage. That is not pedagogical. What I am arguing is that we should keep the thought process as linear as possible. One step at a time. Of course I do not want to see the scratch paper or the failed attempts of different approaches. What I want to see is that choices are made where they are used, at least in textbooks. I am sure that somewhere along the way I will not be able to prove a proposition, and I'll have to read the proof or at least look at it in the book. And then I am left to unravel the thought process of the author rather than being guided by the author step by step.

To prove a sequence has at most one limit we use a proof by contradiction. We assume two different limits. Note that there must be a positive distance between these two limits. We then use the definition of convergence to show that the sequence must eventually stay close to the first limit - in particular less than half the distance between the limits - and we show that it must eventually stay close to the second limit - again less than half the distance between the limits. Then, using the triangle inequality (although it's also fairly obvious), we see that a sequence cannot simultaneously be this close to both limits. This contradiction shows that the two limits must be one and the same.

Note: any distance less than half the distance will do. We choose one third of the distance.

The full, formal proof is then as follows:

If I wrote an analysis book in that style, would you be my first customer?
Yes, if you add variable notation, limit notation and so on, then I would. And I would still attempt to prove the proposition on my own. But in the case where I had no clue what to do, it would help.

There's some good advice on this thread. It may perhaps feel more motivated if Tao had said:

for any $\epsilon \gt 0$ ... work through the proof, then at the end, you have

then finish with:

1.) since this is for any positive $\epsilon$ let's make a smart selection and consider the result if we set it
$\epsilon := \frac {d(l, l')} 3$

giving you
$d(l, l') \leq \frac {2d(l, l')} 3 \implies d(l, l')\leq 0$, so by non-negativity and positive-definiteness of a distance function $d(l, l') = 0$ and $l = l'$. But then $\epsilon = \frac {d(l, l')} 3 = 0$, which contradicts the fact that $\epsilon$ is positive.
- - - -
Yes this seems to me a better approach. At least for people who get stuck and have to read the proof. I.e. being guided by the author.

but this is the same as him saying at the beginning: for any $\epsilon \gt 0$, and spoiler alert, I'm mostly interested in one case here, i.e. when $\epsilon = \frac {d(l, l')} 3$, again on the assumption $l \neq l'$, so let's proceed examining that specific case, rather than the general positive $\epsilon$. (He's seen this 'plot' so many times the spoilers at the end come instantly to mind.)

It really doesn't make a difference. If you read it the way I've stated above, I don't think you'll interpret it as "covering your tracks".
But he is not saying "spoiler alert I am only interested in this epsilon" he says "let $\epsilon = \frac {d(l, l')} 3$" and you the reader figure out what this means, why I chose this and so on.
Personally, I prefer not having the spoiler alert, working it through, and then making a wise selection at the end. (Perhaps this is for the same reason I don't like knowing spoilers before I see a movie or read a novel for the first time?) Ultimately this really is a matter of taste. In some sense you need to make the proof 'your own' and write it -- and interpret it-- in a way that fits you. Either way you go at it, I'd probably get comfortable with this proof, then ultimately discard the contradiction, recognizing that a real non-negative number $x$ where we know $x \lt \epsilon$ for every $\epsilon \gt 0$ means $x = 0$.
Yes, it is a matter of taste. But which "taste" is better suited for students (the targeted audience) who are learning and do not yet know this stuff?

fresh_42
Mentor
Yes, it is a matter of taste. But which "taste" is better suited for students (the targeted audience) who are learning and do not yet know this stuff?
I'm not sure whether your assumption on targeted audience can be made in such a general way. I often use books to look up things and then I don't want to get bored by a nice fairy tale of how the author had his eureka moment. This is in my opinion the reason to attend lectures on the matter where there is room to explain things. I certainly wouldn't buy a book which was written the way @PeroK described in post #9: What a waste of time!

As to your second point about learning this stuff: What do you want to learn? How one arrives at certain results, or how one writes mathematics in the most rigorous way? As I said, I think the first is a matter for the lectures and a lot of practice, whilst the second is what usually is meant, if mathematicians speak of the beauty of their passion. I'd prefer to listen in the first case, and to read in the second.

Of course the truth lies - as always - somewhere in the middle. I just wanted to demonstrate that there are as many good reasons for the other point of view.

I'm not sure whether your assumption on targeted audience can be made in such a general way. I often use books to look up things and then I don't want to get bored by a nice fairy tale of how the author had his eureka moment. This is in my opinion the reason to attend lectures on the matter where there is room to explain things. I certainly wouldn't buy a book which was written the way @PeroK described in post #9: What a waste of time!
I quote from the preface of Tao's Analysis I book
This text originated from the lecture notes I gave teaching the honours undergraduate-level real analysis sequence at the University of California, Los Angeles, in 2003...
and it continues

Typically, an introductory sequence in real analysis assumes that the students are already familiar with real numbers, ...
Obviously, the targeted audience are students at the undergraduate level who have not learned real analysis in a rigorous way and not people like you who want to "just look up" something. Hence, it is supposed to be pedagogical, i.e. teaching a student real analysis. And if you want more evidence of this then I can quote more prefaces that I have on my shelf like from Spivak Calculus, Linear Algebra Friedberg, Rudin PMA, and so on. And each of them will mention their audience as undergraduate students of mathematics.

I don't want to get bored by a nice fairy tale of how the author had his eureka moment.
When did I speak about eureka moments? My criticism was with the way proofs are presented. You mentioned earlier that you like the style where the proof writer just puts everything in the beginning rather than where it is needed. This is a matter of taste, which proof presentation one likes more. And I agree with you. I like that style also more in terms of presentation. But this is not a beauty show for proof presentation. It is a text book for teaching students and that should be the highest priority. Not how well and beautiful you can alter your thought processes and erase your stuff and pretend that you came up with everything in the beginning of the proof. Leave the stuff where it has to be used in text books. That is what I am arguing.

@PeroK described in post #9: What a waste of time!
Yes, I should not have agreed with that type of book, because his example included two proofs for each proposition: an "explanatory" one and a formal one.
I was not saying that we need that. I was saying only that we should not alter the proofs in text books. Now, I am not saying publish your scratch work as your proofs. I am saying let the proofs be formal, but if you happen to discover a fact somewhere in the middle of the proof and you are using it there, then leave it there and do not move it to the beginning of the proof in your finalized polished proof. That way the reader (the one who is learning, as I established above) can see where it comes from and does not have to wait until that particular fact, which has been transported to the beginning, is finally used at the end. It is far more pedagogical this way because you "discover" the "tricks", if you will, through the deduction itself and they are not presented magically at the beginning (again, put yourself in the situation of a person giving the proof for the first time and not in the position of a Fields medalist). I am not sure why my point of view about proofs in textbooks is so wrong?

As to your second point about learning this stuff: What do you want to learn? How one arrives at certain results, or how one writes mathematics in the most rigorous way? As I said, I think the first is a matter for the lectures and a lot of practice, whilst the second is what usually is meant, if mathematicians speak of the beauty of their passion. I'd prefer to listen in the first case, and to read in the second.

Of course the truth lies - as always - somewhere in the middle. I just wanted to demonstrate that there are as many good reasons for the other point of view.
What I want to learn from Analysis I by Tao is real analysis in the rigorous way explained by one of the greatest mathematicians of our time. That is why I picked Tao's book and not a book by some other author. And this includes also learning how a mathematician like Tao comes to the conclusions and how he thinks about the problem at hand. When I read his proof it seemed that he knew at the beginning, in the first line, that epsilon is one third of the distance, as if by magic.