Proof of Banach's fixed point theorem

Homework Helper

Homework Statement

I was a bit surprised to find out that one of the exercises in Munkres is actually a proof to the Banach fixed point theorem, unless I'm mistaken. The exercise follows:

If (X, d) is a complete metric space, and f : X --> X a contraction mapping, there is a unique point x0 in X such that f(x0) = x0.

The Attempt at a Solution

Somehow one needs to combine completeness of X with properties of the contraction mapping f, i.e. it is uniformly continuous, for example...

I've beed playing around for some while, trying to follow some ideas, but this seems non-trivial. When I peaked at the proof (I didn't look at it, though), it seemed a bit complicated.

Any discrete hints on solving this one?

Ah yes, the Banach fixed point theorem. It has a beautiful proof, but it's not that easy to find...

The exercise says that you should compare to Exercise 7, of section 28. Specifically, take a look at (a). I believe 7.a and this problem have the same proof...

Homework Helper
Yes, I know, but I didn't solve that exercise :) It's equivalent, it requires that X be a compact metric space, and a compact metric space is also sequentially compact, hence complete...I'll try to follow the hint from the book.

Hmm, I do not like the hint of 7.(a). Let me give you another one:

Put $$f^1=f$$ and $$f^{n+1}=f^n\circ f$$. Take x arbitrary, you must show that $$(f^n(x))_n$$ is a Cauchy sequence, and converges to a fixed point.

The choice of this sequence may seem awkward, but it's actually very natural. Take a calculator and enter an arbitrary number. If you keep pushing the "cosine" or "sine" or something like that, then eventually you will get a certain number. And if you keep pressing cosine, then this number will not change. This is your fixed point. This should illustrate the choice of the sequence $$(f^n(x))_n$$.

Homework Helper
Hmm, I do not like the hint of 7.(a). Let me give you another one:

Put $$f^1=f$$ and $$f^{n+1}=f^n\circ f$$. Take x arbitrary, you must show that $$(f^n(x))_n$$ is a Cauchy sequence, and converges to a fixed point.

The choice of this sequence may seem awkward, but it's actually very natural. Take a calculator and enter an arbitrary number. If you keep pushing the "cosine" or "sine" or something like that, then eventually you will get a certain number. And if you keep pressing cosine, then this number will not change. This is your fixed point. This should illustrate the choice of the sequence $$(f^n(x))_n$$.

Hmm, OK thanks I'll think about it...

Homework Helper
I have attached a small sketch I drew while I was thinking about this... From this it is obvious that d(f^n(x), f^n(y)) --> 0, but I wonder if it motivates the question if the sequence (f^n(x))n is Cauchy, for a fixed x, too? I mean, since for any x, y, the sketch will "behave" that way, so there should be a correlation between this and the Cauchy property for a fixed point x..? Just curious.

And btw, the motivation behind this is as follows:

if x0 = f(x0) then the sequence (f^n(x0))n converges to x0. But how to understand the fact that the same sequence converges to its fixed point x0 for an arbitrary start point x? The Cauchy property says that for some ε > 0 there is some integer N such that for all m, n >= N we have d((f^n(x)), (f^m(x))) < ε. So, after applying f again and again, the difference can be made arbitrarily small... Does this imply in an informal manner that f(x) = x, i.e. that the "application" of f doesn't change our input much?

I'm just trying to motivate myself before trying to solve this, maybe it will make things easier.

Attachments

• Clipboard02.jpg
4.9 KB · Views: 366
Homework Helper
And btw I assume the Banach theorem is important since the class of contraction maps and complete metric spaces is pretty "big", right?

Edit: oh yes, and btw the illustration in the figure I gave in the post above could look different, of course... But in all cases one can visualize the convergence of the distances to 0 and some kind of non-linear behaviour. Perhaps I'm thinking too much like an engineer

I have attached a small sketch I drew while I was thinking about this... From this it is obvious that d(f^n(x), f^n(y)) --> 0, but I wonder if it motivates the question if the sequence (f^n(x))n is Cauchy, for a fixed x, too? I mean, since for any x, y, the sketch will "behave" that way, so there should be a correlation between this and the Cauchy property for a fixed point x..? Just curious.

Well, what you've shown now is that the sequences $$(f^n(x))_n$$ and $$(f^n(y))_n$$ converge to the same point (for all x and y), if they converge. But I fear that this will not bring you closer to the answer...

I'll give you a small start. To show that $$(f^n(x))_n$$ is Cauchy, you need to show that $$d(f^n(x),f^m(x))\rightarrow 0$$ if $$n,m\rightarrow +\infty$$.
But instead of doing this, try focusing on $$d(f^n(x),f^{n+1}(x))$$. If you apply the definiton of contraction mapping to this, then you get an inequality between $$d(f^n(x),f^{n+1}(x))$$ and $$d(x,f(x))$$.

And btw I assume the Banach theorem is important since the class of contraction maps and complete metric spaces is pretty "big", right?

Edit: oh yes, and btw the illustration in the figure I gave in the post above could look different, of course... But in all cases one can visualize the convergence of the distances to 0 and some kind of non-linear behaviour. Perhaps I'm thinking too much like an engineer

Oh, the Banach theorem is extremely important. You can use it for several things. One of the most important applications is to show that there always exist a solution to differential and integral equations. Basically, if you want to show that there exists a unique solution to an equation like

$$f(x)=\int_0^x{f(x)e^{-x^2}dx}$$

then one applies the Banach theorem to

$$T:\mathcal{C}(\mathbb{R},\mathbb{R})\rightarrow \mathcal{C}(\mathbb{R},\mathbb{R}): f\rightarrow \int_0^x{f(x)e^{-x^2}dx}$$

Banach's theorem gives us a unique fixed point of T, which corresponds to the unique solution of the integral equation.

Another application of Banach fixed point theorem is in fractals. Basically, a fractal is some kind of fixed point to a suitable contraction

Homework Helper
Well, we can conclude that d(f^n(x), f^n+1(x)) < α d(x, f(x)), for a fixed x, and for any integer n, where α < 1 is the "contraction coefficient". I wrote the whole chain of inequalities down for a few values of n, and it appears obvious that d(f^n(x), f^n+1(x)) --> 0. This is correct, for a first step, right?

Well, we can conclude that d(f^n(x), f^n+1(x)) < α d(x, f(x)), for a fixed x, and for any integer n, where α < 1 is the "contraction coefficient". I wrote the whole chain of inequalities down for a few values of n, and it appears obvious that d(f^n(x), f^n+1(x)) --> 0. This is correct, for a first step, right?

$$d(f^n(x),f^{n+1}(x))<\alpha^n d(x,f(x))$$

Homework Helper
Hm, interesting...

We have d(f(x), f^2(x)) <= α d(x, f(x)). Then, d(f^2(x), f^3(x)) <= α d(f(x), f^2(x)) <= α^2 d(x, f(x)). Where does the "<=" sign drop to "<"? Man, I feel stupid...

Oh, no your right. I'm sorry. It isn't a strict inequality. So it had to be

$$d(f^n(x),f^{n+1}(x))\leq \alpha^n d(x,f(x))$$

Don't feel stupid, this is actually a very hard problem if you have to find it yourself...

So, that's the first step. Now, we need to find an estimation to $$d(f^n(x),f^m(x))$$ somehow, and we will have to use our first step. Use the triangle inequality for this.

Homework Helper
OK.

So just to make sure again, our goal right now is to show that d(f^n(x), f^m(x)) --> 0 as m, n --> ∞, right?

Yes, that is our goal. Then you have proven that we are dealing with a Cauchy sequence (and that was the hardest part of the proof)

Homework Helper
Well, let n, m be positive integers. We're interested in all the integers between them, i.e. in applying our result to all consecutive pairs of integers.

By the triangle inequality we have (I assumed first that m > n, and the notation of the "exponents" will be catastrophic, but I think the point is clear):

d(f^m(x), f^n(x)) <= d(f^m-1(x), f^m-2(x)) + ... + d(d^m-k(x), d^n(x)).

Now we can apply our previous result to the right side of the inequality in order to obtain:

d(f^m(x), f^n(x)) <= α^m-2 d(x, f(x)) + ... + α^n d(x, f(x)) = d(x, f(x))*(α^m-2 + ... + α^n).

As n, m --> ∞, the factor (α^m-2 + ... + α^n) gets smaller and smaller, hence we conclude that d(f^m(x), f^n(x)) --> 0. Basically, it should be the same thing, whatever m, n we chose, either m > n, or m <= n, the proof should be the same.

Yes, that is entirely correct!

So, you have just shown that $$(f^n(x))_n$$ is a Cauchy sequence. By completeness, this sequence converges, thus we can define

$$x_0=\lim_{n\rightarrow +\infty}{f^n(x)}$$.

Now you need to show that $$f(x_0)=x_0$$.

Homework Helper
Yes, that is entirely correct!

So, you have just shown that $$(f^n(x))_n$$ is a Cauchy sequence. By completeness, this sequence converges, thus we can define

$$x_0=\lim_{n\rightarrow +\infty}{f^n(x)}$$.

Now you need to show that $$f(x_0)=x_0$$.

Excellent! OK, I'm working on it...

Homework Helper
OK.

So, we know f^n(x) --> x0. Hence f(f^n(x)) --> f(x0). But it seems that the members of the sequence f(f^n(x)) are the same as the members of the sequence f^n(x), only "shifted" for one "potention"... So we conclude f(f^n(x)) --> x0. And hence f(x0) = x0 follows.

Correct!!!! Note that this didn't use contractions or anything at all. Thus if $$(f^n(x))_n$$ converges, then it is always a fixed point!

So, that leaves us with the uniqueness of the fixed point. Assume two fixed points x and y. What can be said about d(f(x),f(y))??

Homework Helper
Correct!!!! Note that this didn't use contractions or anything at all. Thus if $$(f^n(x))_n$$ converges, then it is always a fixed point!

So, that leaves us with the uniqueness of the fixed point. Assume two fixed points x and y. What can be said about d(f(x),f(y))??

Well, we easily arrive at a contradiction if we use the contraction property on two assumed fixed points x and y.

Thanks a lot! But I wouldn't manage to do it without your help! Although it seems nice and easy now, as always...

Yes, the proof of Banach's fixed point theorem is trivial, once you know it But actually coming up with it, is really hard. All I can say is: Banach was a genious

Homework Helper
Yes, the proof of Banach's fixed point theorem is trivial, once you know it But actually coming up with it, is really hard. All I can say is: Banach was a genious

Yes, the key point seemed to conclude that the sequence (f^n(x))n is Cauchy. Btw just another subtle point I'm not 100% sure about: we said that d(f^n(x), d^m(x)) --> 0 is equivalent to saying that (f^n(x))n is Cauchy, for some fixed x in X. If (f^n(x))n is Cauchy, then for any ε > 0 there is an integer N such that for all m, n >= N we have d(f^n(x), f^m(x)) < ε. So this is equivalent to saying that d(f^n(x), d^m(x)) --> 0 as m, n --> ∞, right? I mean, it's pretty obvious, only I didn't use that specific line of reasoning before. :)

Ah yes, I was probably a bit to fast there.

It is indeed true that: $$(x_n)_n$$ is a Cauchy-sequence if and only if $$d(x_n,x_m)\rightarrow 0$$ with $$n,m\rightarrow +\infty$$. I don't think it's that difficult to check.

In the beginning, most textbooks and students use the definition to check for a Cauchy sequence. Probably because it is more rigorous, and because they don't want to bother with limits with two variables.

But as time goes, the definition of Cauchy sequence becomes very cumbersome to use (try to prove the Banach fixed point theorem with the definition of Cauchy sequences, it becomes very ugly!). So after a while, most textbooks switch to the other characterization of Cauchy sequences, because it's much easier to use! So if you deal with Cauchy sequences a lot, you're going to see these kinds of arguments a lot of times!

Homework Helper
Ah yes, I was probably a bit to fast there.

It is indeed true that: $$(x_n)_n$$ is a Cauchy-sequence if and only if $$d(x_n,x_m)\rightarrow 0$$ with $$n,m\rightarrow +\infty$$. I don't think it's that difficult to check.

In the beginning, most textbooks and students use the definition to check for a Cauchy sequence. Probably because it is more rigorous, and because they don't want to bother with limits with two variables.

But as time goes, the definition of Cauchy sequence becomes very cumbersome to use (try to prove the Banach fixed point theorem with the definition of Cauchy sequences, it becomes very ugly!). So after a while, most textbooks switch to the other characterization of Cauchy sequences, because it's much easier to use! So if you deal with Cauchy sequences a lot, you're going to see these kinds of arguments a lot of times!

OK, thanks a lot! I will prove this equivalence (i.e. the other direction, I believe I have already shown the proof of one in an informal manner) for convenience.

This thread was very instructive indeed!