A question on proving the chain rule

In summary, the chain rule expresses the derivative of a composite function f∘g at a point of its domain as the product f'(g(x))g'(x).
  • #1
"Don't panic!"
I'm currently reviewing my knowledge of calculus and trying to include rigorous(ish) proofs in my personal notes, as I don't like accepting things in maths at face value. I've constructed a proof of the chain rule and was wondering if people wouldn't mind checking it and letting me know whether it is correct (and what improvements may be needed). Thanks for your time.

From the definition of the derivative of a differentiable function [itex]f:\mathbb{R}\rightarrow\mathbb{R}[/itex] (one-dimensional case), we have that [tex]f'(x)=\frac{df}{dx}=\lim_{\Delta x\rightarrow 0}\frac{f(x+\Delta x)-f(x)}{\Delta x}= \lim_{\Delta x\rightarrow 0}\frac{\Delta f}{\Delta x}[/tex]
This implies that [tex]\frac{\Delta f}{\Delta x}= f'(x) +\varepsilon (x)\quad\Rightarrow\quad\Delta f = \left(f'(x)+\varepsilon (x)\right)\Delta x [/tex]
where [itex]\varepsilon (x)[/itex] is some error function which accounts for the difference between the actual (finite) change in [itex]f[/itex] and its linear approximation [itex]f'(x)\Delta x[/itex]. Furthermore, [itex]\varepsilon (x)[/itex] satisfies the property [tex]\lim_{\Delta x\rightarrow 0}\varepsilon (x)=0[/tex] such that as [itex]\Delta x \rightarrow 0,\quad\frac{\Delta f}{\Delta x}\rightarrow f'(x)[/itex].

Now, consider a function [itex]y=f\circ g(x)=f(g(x))[/itex] and let [itex]u=g(x)[/itex]. We have then, that [tex]\Delta u = g(x+\Delta x)-g(x)=\left(g'(x)+\varepsilon_{1}(x)\right)\Delta x[/tex] [tex]\Delta y = f(u+\Delta u)-f(u)=\left(f'(u)+\varepsilon_{2}(u)\right)\Delta u[/tex]
Note that [itex]\varepsilon_{2}(u)\rightarrow 0[/itex] as [itex]\Delta u\rightarrow 0[/itex]. However, since [itex]\Delta u\rightarrow 0[/itex] as [itex]\Delta x\rightarrow 0[/itex], this implies that [itex]\varepsilon_{2}(u)\rightarrow 0[/itex] as [itex]\Delta x\rightarrow 0[/itex].
And so,
[tex]f(u+\Delta u)-f(u)=\left(f'(u)+\varepsilon_{2}(u)\right)\Delta u[/tex] [tex]\Rightarrow f(g(x+\Delta x))-f(g(x))=f\circ g(x+\Delta x)-f\circ g(x)\\ \qquad\qquad\qquad\qquad\qquad\quad\;\;=\left(f'(g(x))+\varepsilon_{2}(g(x))\right)\cdot\left(g'(x)+\varepsilon_{1}(x)\right)\Delta x\\ \qquad\qquad\qquad\qquad\qquad\quad\;\;=f'(g(x))f'(g(x))g'(x)\Delta x +\left(f'(g(x)) \varepsilon_{1}+g'(x)\varepsilon_{2}+\varepsilon_{1}\varepsilon_{2}\right)\Delta x\\ \qquad\qquad\qquad\qquad\qquad\quad\;\;=f'(g(x))g'(x)\Delta x +\varepsilon_{3}\Delta x[/tex]

where [itex]\varepsilon_{3}\equiv f'(g(x)) \varepsilon_{1}+g'(x)\varepsilon_{2}+\varepsilon_{1}\varepsilon_{2}[/itex]. We see from this that as [itex]\Delta x\rightarrow 0,\quad\varepsilon_{3}\rightarrow 0[/itex]. Hence,
[tex]\lim_{\Delta x\rightarrow 0}\frac{f\circ g(x+\Delta x)-f\circ g(x)}{\Delta x}= (f\circ g)'(x)=f'(g(x))g'(x)[/tex]
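As a quick numerical sanity check of the final formula (not a substitute for the proof - the test functions [itex]f=\sin[/itex], [itex]g=\exp[/itex] below are just an arbitrary choice of mine), here's a short Python sketch showing the error term [itex]\varepsilon_{3}[/itex] shrinking with [itex]\Delta x[/itex]:
[code]
import math

# Quick numerical check of the result above, using arbitrary smooth test
# functions f = sin and g = exp (so f' = cos, g' = exp). Not a proof --
# just a sanity check that epsilon_3 -> 0 as Delta x -> 0.
f, fp = math.sin, math.cos
g, gp = math.exp, math.exp

x = 0.7
exact = fp(g(x)) * gp(x)  # f'(g(x)) g'(x)

for dx in [1e-1, 1e-2, 1e-3, 1e-4]:
    quotient = (f(g(x + dx)) - f(g(x))) / dx  # Delta(f o g)/Delta x
    eps3 = quotient - exact                   # the epsilon_3 error term
    print(f"dx = {dx:g}: quotient = {quotient:.6f}, eps3 = {eps3:.2e}")
[/code]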
 
  • #2
I think you have a small hiccup in line 4 from the bottom - there seem to be too many f's in there.

Otherwise: Symbolically we write d(f°g)/dx = df/dg * dg/dx...
 
  • #3
Whoops, yes you're right, thanks for pointing it out. It should be, [tex]=f'(g(x))g'(x)\Delta x + \left(f'(g(x))\varepsilon_{1} + g'(x)\varepsilon_{2} + \varepsilon_{1} \varepsilon_{2} \right)\Delta x[/tex]
 
  • #4
Your proof attempt gets really confusing when you introduce u and y. You say that y is a function, and then you set it equal to the number f(g(x)). After that point, I can't tell what's a function and what's a number. In those cases where the notation is supposed to represent the value of a function at a point in its domain, the notation hides what point that is.

Edit: I recommend a more explicit notation, e.g.
$$\Delta f(x,h)=f(x+h)-f(x) =(f'(x)+\varepsilon_f(x,h)) h.$$ For your approach to work, I think you will have to prove that ##h\mapsto \varepsilon_f(x,h)## is continuous.
 
Last edited:
  • #5
Yeah, I realized in hindsight that although I originally used u and y for notational convenience, it does rather confuse things.

Is there a better approach?... I've seen some proofs which start by defining a new variable [tex]v=\frac{g(x+\Delta x)-g(x)}{\Delta x}- g'(x)[/tex] and stating that [itex]v[/itex] depends on the number [itex]\Delta x[/itex], and further, that as [itex]\Delta x\rightarrow 0,\quad v\rightarrow 0[/itex] (I assume this is true because [itex]g'(x)[/itex] is just a number, so [itex]\lim_{\Delta x\rightarrow 0} g'(x)=g'(x)[/itex], and therefore, as [itex]\lim_{\Delta x\rightarrow 0} \frac{g(x+\Delta x)-g(x)}{\Delta x}= g'(x)[/itex], the two terms cancel?!). Here is the link to the one I was referencing in particular: http://kruel.co/math/chainrule.pdf
 
  • #6
The proof in that pdf is pretty good, but there's a detail that I think it doesn't explain well enough. When we get the result
$$\frac{f(g(x+h))-f(g(x))}{h} =(f'(g(x))+w)(g'(x)+v),$$ it's sufficient to show that ##\lim_{h\to 0}(f'(g(x))+w)=f'(g(x))## and ##\lim_{h\to 0}(g'(x)+v)=g'(x)##. The latter equality is pretty obvious, but the former is not. It would be more accurate to write ##w(g(x),g'(x)h+v(x,h)h)## instead of ##w##. But how do we know that $$\lim_{h\to 0}w\big(g(x),g'(x)h+v(x,h)h\big)=0?$$ An approach that I would consider as good as any is to do what the pdf does, but improve this part of the proof.
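Before proving anything, it may help to see what this quantity looks like numerically. Here's a small Python sketch (the test functions ##f=\sin##, ##g=\exp## are just an arbitrary choice for illustration):
[code]
import math

# Numerical look at w(a, k) = (f(a+k) - f(a))/k - f'(a) evaluated at
# a = g(x), k = g(x+h) - g(x) = g'(x)h + v(x,h)h, for test functions of
# my choosing. The printout suggests the limit is 0; the point of the
# discussion is proving that rigorously.
f, fp = math.sin, math.cos
g = math.exp

def w(a, k):
    return (f(a + k) - f(a)) / k - fp(a)

x = 0.7
for h in [1e-1, 1e-2, 1e-3, 1e-4]:
    k = g(x + h) - g(x)  # nonzero here since exp is strictly increasing
    print(f"h = {h:g}: w(g(x), k) = {w(g(x), k):.2e}")
[/code]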
 
  • #7
Thanks for taking a look at it, appreciate it.

Fredrik said:
But how do we know that
$$\lim_{h\to 0}w\big(g(x),g'(x)h+v(x,h)h\big)=0$$

I assume it would involve using the definition of the limit or something like that (i.e. for any number [itex]\varepsilon >0[/itex], find a [itex]\delta >0[/itex] such that if [itex]\vert h-0\vert = \vert h\vert < \delta[/itex] then [itex]\vert w(g(x),g'(x)h+v(x,h)h) - 0\vert = \vert w(g(x),g'(x)h+v(x,h)h) \vert < \varepsilon[/itex])?
 
  • #8
"Don't panic!" said:
I assume it would involve using the definition of the limit or something like that (i.e. for any number [itex]\varepsilon >0[/itex], find a [itex]\delta >0[/itex] such that if [itex]\vert h-0\vert = \vert h\vert < \delta[/itex] then [itex]\vert w(g(x),g'(x)h+v(x,h)h) - 0\vert = \vert w(g(x),g'(x)h+v(x,h)h) \vert < \varepsilon[/itex])?
Yes, an ε-δ proof is the straightforward way to do it. But it's not super easy. (Still, I don't know an easier way). Let ##\varepsilon>0## be arbitrary. The definition of ##w## and the assumption that ##f## is differentiable at ##g(x)## tell us that there's a ##\delta_1>0## such that the following implication holds
$$0<|k|<\delta_1\ \Rightarrow\ |w(g(x),k)|<\varepsilon.$$ Because of this, it will be sufficient to find a ##\delta>0## such that the following implication holds
$$0<|h|<\delta\ \Rightarrow\ |g'(x)h+v(x,h)h|<\delta_1.$$ This is a bit tricky, but it's certainly doable. Here's a choice that works: (I recommend that you try to complete the proof yourself before you look at the spoiler).

Let ##\delta_2>0## be such that
$$0<|h|<\delta_2\ \Rightarrow\ |v(x,h)|<\frac{\delta_1}{2}.$$ Now define ##\delta## by
$$\delta=\min\left\{\frac{\delta_1}{2|g'(x)|},\delta_2,1\right\}.$$

Edit: Apparently I missed a factor of ##h##, and that ruined this attempted proof. See below for a new attempt.
 
Last edited:
  • #9
Fredrik said:
Because of this, it will be sufficient to find a ##\delta>0## such that the following implication holds
$$0<|h|<\delta\ \Rightarrow\ |g'(x)h+v(x,h)h|<\delta_1.$$

I have to admit, I've been a little stuck. Is the above true because [itex] k=g(x+h)-g(x)=g'(x)h+v(x,h)h[/itex] and therefore [itex]\vert k\vert < \delta_{1}[/itex] is equivalent to [itex]\vert g(x+h)-g(x)\vert = \vert g'(x)h+v(x,h)h\vert < \delta_{1}[/itex]?

My initial attempt was to note that [itex]g[/itex] is differentiable at [itex]x[/itex], and so for any number [itex]\varepsilon_{2} >0[/itex] there is some [itex]\delta_{2} >0[/itex] such that [tex]\Biggr\vert\frac{g(x+h)-g(x)}{h}-g'(x)\Biggr\vert = \vert v(x,h)\vert < \varepsilon_{2} \qquad\text{whenever}\qquad 0<\vert h\vert <\delta_{2}[/tex] I was then considering using the triangle inequality, such that [tex]\vert g'(x)h+v(x,h)h \vert \leq \vert g'(x)h\vert +\vert v(x,h)h\vert = \vert h \vert\left(\vert g'(x)\vert +\vert v(x,h)\vert \right) < \vert h \vert\left(\vert g'(x)\vert + \varepsilon_{2}\right)[/tex] But I'm really not sure this is right, and I'm a bit unsure how to proceed?!
 
  • #10
"Don't panic!" said:
Is the above true because [itex] k=g(x+h)-g(x)=g'(x)h+v(x,h)h[/itex] and therefore [itex]\vert k\vert < \delta_{1}[/itex] is equivalent to [itex]\vert g(x+h)-g(x)\vert = \vert g'(x)h+v(x,h)h\vert < \delta_{1}[/itex]?
I will need to clarify a few things. I didn't make it clear which of my statements are "for all" statements. In particular, I should have said this: There's a ##\delta_1>0## such that the following implication holds for all ##k## such that ##g(x)+k## is in the domain of ##f##:
$$0<|k|<\delta_1\ \Rightarrow\ |w(g(x),k)|<\varepsilon.$$ Note that this statement is saying nothing other than that ##f## is differentiable at ##g(x)##. Also note that because of the "for all", ##k## is a dummy variable; that symbol can be replaced by any other without changing the meaning of the statement.

This implies that for all ##h\in\mathbb R## such that

(a) ##x+h## is in the domain of ##g##,
(b) ##g(x)+g'(x)h+v(x,h)h## is in the domain of ##f##,
(c) ##0<|g'(x)h+v(x,h)h|<\delta_1##,

we have ##|w(g(x),g'(x)h+v(x,h)h)|<\varepsilon##.

Is that clear enough? The key is the "for all". Note for example how the statement "For all ##x##, we have ##x^2\geq 0##" implies that ##1^2\geq 0##.

"Don't panic!" said:
My initial attempt was to note that [itex]g[/itex] is differentiable at [itex]x[/itex], and so for any number [itex]\varepsilon_{2} >0[/itex] there is some [itex]\delta_{2} >0[/itex] such that [tex]\Biggr\vert\frac{g(x+h)-g(x)}{h}-g'(x)\Biggr\vert = \vert v(x,h)\vert < \varepsilon_{2} \qquad\text{whenever}\qquad 0<\vert h\vert <\delta_{2}[/tex] I was then considering using the triangle inequality, such that [tex]\vert g'(x)h+v(x,h)h \vert \leq \vert g'(x)h\vert +\vert v(x,h)h\vert = \vert h \vert\left(\vert g'(x)\vert +\vert v(x,h)\vert \right) < \vert h \vert\left(\vert g'(x)\vert + \varepsilon_{2}\right)[/tex] But I'm really not sure this is right, and I'm a bit unsure how to proceed?!
I used the triangle inequality like that. Since the goal was to get ##<\delta_1## on the right, I decided to make sure that each of the two terms on the right is less than ##\frac{\delta_1}{2}##.
 
  • #11
Thanks for the additional info.

Here is my attempt so far.

Let [itex]\varepsilon >0[/itex] be any (positive) number. Then as [itex]g[/itex] is differentiable at [itex]x[/itex], there is a number [itex]\delta_{2} >0[/itex], such that [tex]0< \vert h\vert < \delta_{2} \qquad\Rightarrow\qquad\vert v(x,h)\vert < \frac{\delta_{1}}{2 \vert h\vert}[/tex] Further note that the [itex]\lim_{h\rightarrow 0} g'(x) = g'(x)[/itex] exists (as [itex]g'(x)[/itex] is just a number) and so there is a number [itex]\delta_{3} >0[/itex], such that, [tex]0< \vert h\vert < \delta_{3} \qquad\Rightarrow\qquad\vert g'(x) - g'(x)\vert < \frac{\delta_{1}}{\vert h\vert}[/tex] which, upon using the triangle inequality implies that [itex]\vert g'(x) \vert < \frac{\delta_{1}}{2 \vert h\vert}[/itex], as [tex]\vert g'(x) - g'(x)\vert \leq \vert g'(x) \vert +\vert - g'(x)\vert = \vert g'(x) \vert +\vert g'(x)\vert = 2 \vert g'(x) \vert < \frac{\delta_{1}}{\vert h\vert} [/tex]
(I'm a bit unsure on this bit, feel I may have done something a bit 'dodgy'?!).

As such, if we let [itex]\delta= \text{min}\bigg\lbrace \frac{\delta_{1}}{2 \vert h\vert}, \delta_{2}, \delta_{3} \bigg\rbrace[/itex] (I hope I'm using this notation correctly, i.e. that it means that [itex]\delta[/itex] assumes the smallest of the three elements in the set?), and suppose that [itex]0 <\vert h \vert < \delta[/itex], then [tex]\vert g'(x)h + v(x,h)h \vert \leq \vert g'(x)h \vert + \vert v(x,h)h \vert \\ \qquad\qquad\qquad\qquad = \vert h \vert \left(\vert g'(x) \vert + \vert v(x,h) \vert\right) \\ \qquad\qquad\qquad\qquad < \vert h \vert \left( \frac{\delta_{1}}{2 \vert h\vert} + \frac{\delta_{1}}{2 \vert h\vert} \right) \\ \qquad\qquad\qquad\qquad = \frac{\delta_{1}}{2} + \frac{\delta_{1}}{2} = \delta_{1}[/tex]
And therefore, there is a number [itex]\delta >0 [/itex], such that [tex]0 < \vert h \vert < \delta\qquad\Rightarrow\qquad \vert g'(x)h + v(x,h)h \vert < \delta_{1} [/tex]

Sorry if this is complete rubbish, I'm fairly new to doing rigorous [itex]\varepsilon[/itex] - [itex]\delta[/itex] proofs.
 
  • #12
"Don't panic!" said:
Let [itex]\varepsilon >0[/itex] be any (positive) number. Then as [itex]g[/itex] is differentiable at [itex]x[/itex], there is a number [itex]\delta_{2} >0[/itex], such that [tex]0< \vert h\vert < \delta_{2} \qquad\Rightarrow\qquad\vert v(x,h)\vert < \frac{\delta_{1}}{2 \vert h\vert}[/tex]
I see that you've noticed that I missed a factor of ##|h|## when I described how to proceed. Unfortunately, the solution isn't to put an ##|h|## in the denominator on the right. That right-hand side must be independent of ##h##. I will need to think about how to correct my attempt to complete the proof.

If you want to prove that ##|g'(x)-g'(x)|## is less than some positive real number ##r##, then all you have to say is ##|g'(x)-g'(x)|=0<r##. No need to mention limits.
 
  • #13
OK, new attempt. We want to prove that ##\lim_{h\to 0}w\big(g(x),g'(x)h+v(x,h)h\big)=0##. For all non-zero ##h##, we have ##g'(x)h+v(x,h)h=g(x+h)-g(x)##. This means that it will be sufficient to prove that ##\lim_{h\to 0}w\big(g(x),g(x+h)-g(x)\big)=0##. This makes things easier.

Let ##\varepsilon>0## be arbitrary. Let ##\delta_1## be a positive real number such that the following implication holds for all ##k## such that ##g(x)+k## is in the domain of ##f##:
$$0<|k|<\delta_1\ \Rightarrow\ |w(g(x),k)|<\varepsilon.$$ Such a ##\delta_1## exists because ##f## is differentiable at ##g(x)##. Let ##\delta## be a positive real number such that the following implication holds for all ##h## such that ##x+h## is in the domain of ##g##:
$$0<|h|<\delta\ \Rightarrow\ |g(x+h)-g(x)|<\delta_1.$$ Such a ##\delta## exists because ##g## is continuous at ##x##. Now let ##h## be an arbitrary real number such that

(a) ##0<|h|<\delta##.
(b) ##x+h## is in the domain of ##g##.
(c) ##g(x+h)## is in the domain of ##f##.
(d) ##g(x+h)\neq g(x)##.

By definition of ##\delta##, we have ##|g(x+h)-g(x)|<\delta_1##. By definition of ##\delta_1##, this implies that ##\big|w\big(g(x),g(x+h)-g(x)\big)\big|<\varepsilon##. Since ##h## is an arbitrary real number that satisfies (a)-(d), this means that ##\lim_{h\to 0}w\big(g(x),g(x+h)-g(x)\big)=0##.

Note that (b)-(d) only ensure that ##h## is in the domain of the map ##t\mapsto w(g(x),g(x+t)-g(x))##.

Edit: There's an issue that the above doesn't account for. Condition (d) is supposed to ensure that there's no division by 0, but if ##g## is constant on an open neighborhood of ##x##, (d) is insufficient. We may have to handle the case of a constant ##g## separately. Even if ##g## isn't constant, there could be a non-zero ##h## such that ##g(x+h)=g(x)##. If there's a finite number of such ##h##, we can just choose ##\delta## small enough. If there's a whole sequence of them that goes to 0, then we can't do that, but we should be able to prove that this implies that ##g## isn't differentiable at ##x##, contradicting our assumptions.
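For concreteness, the classic example of this last behaviour (my addition, just for illustration) is ##g(x)=x^2\sin(1/x)## with ##g(0)=0##: it's differentiable at 0 with ##g'(0)=0##, yet ##g(1/(n\pi))=g(0)## for a whole sequence going to 0. A quick Python sketch:
[code]
import math

# The classic example of the edge case above: g(x) = x^2 sin(1/x), g(0) = 0.
# g is differentiable at 0 with g'(0) = 0, yet g vanishes at h_n = 1/(n*pi),
# a sequence accumulating at 0, so Delta g = 0 arbitrarily close to 0.
def g(x):
    return x**2 * math.sin(1.0 / x) if x != 0 else 0.0

# The difference quotient at 0 tends to g'(0) = 0:
for h in [1e-1, 1e-3, 1e-5]:
    print(f"h = {h:g}: g(h)/h = {g(h) / h:.2e}")

# And g(h_n) = 0 in exact arithmetic (here only roundoff-sized, since
# math.sin(n*pi) is not exactly 0 in floating point):
for n in [10, 1000]:
    h = 1.0 / (n * math.pi)
    print(f"n = {n}: g(h_n) = {g(h):.1e}")
[/code]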
 
Last edited:
  • #14
Thanks for looking into it further - I appreciate all the time you've put into this. It's always the things that seem 'intuitive' that are the hardest to prove rigorously!
 
  • #15
I need to do these things once in a while, or I'll forget how to do them. So the exercise was welcome. I think I still like a straightforward ε-δ proof the best. Instead of the w and the v in the proof above, I'll use the notation
$$f(x+h)=f(x)+f'(x)h+R_f(x,h),$$ and a similar notation for ##g##. Note that ##R_f(g(x),k)## is well-defined even when ##k=0##. We will use this to solve the division by zero issue. If we temporarily define ##k=g'(x)h+R_g(x,h)##, we have
$$f(g(x+h))=f(g(x)+g'(x)h+R_g(x,h))=f(g(x)+k)=f(g(x))+f'(g(x))k+R_f(g(x),k).$$ This implies that
\begin{align}
&\left|\frac{(f\circ g)(x+h)-(f\circ g)(x)}{h}-f'(g(x))g'(x)\right| =\left|\frac{f'(g(x))k+R_f(g(x),k)}{h}-f'(g(x))g'(x)\right|\\
&=\left|\frac{f'(g(x))g'(x)h+f'(g(x))R_g(x,h)+R_f(g(x),k)}{h}-f'(g(x))g'(x)\right| =\left|\frac{f'(g(x))R_g(x,h)+R_f(g(x),k)}{h}\right|\\
&\leq|f'(g(x))|\left|\frac{R_g(x,h)}{h}\right|+\left|\frac{R_f(g(x),k)}{h}\right|.
\end{align}
Because of this, it's sufficient to find a ##\delta>0## such that if ##0<|h|<\delta##, then each of the two terms above is less than ##\frac\varepsilon 2##. Choose ##\delta_1>0## such that
$$0<|k|<\delta_1\ \Rightarrow\ \left|\frac{R_f(g(x),k)}{k}\right|<\frac{\varepsilon}{4|g'(x)|}.$$ Then choose ##\delta>0## such that if ##0<|h|<\delta##, then all of the following inequalities hold:
\begin{align}
&\left|\frac{R_g(x,h)}{h}\right| <\frac{\varepsilon}{2|f'(g(x))|},\\
&\left|\frac{R_g(x,h)}{h}\right| <|g'(x)|,\\
&|k|=|g'(x)h+R_g(x,h)|=|g(x+h)-g(x)|<\delta_1.
\end{align} Now let ##h## be an arbitrary real number such that ##x+h## is in the domain of ##g##, ##g(x+h)## is in the domain of ##f##, and ##0<|h|<\delta##. We have
$$|f'(g(x))|\left|\frac{R_g(x,h)}{h}\right|<\frac{\varepsilon}{2}.$$ If ##k=0##, we have
$$\left|\frac{R_f(g(x),k)}{h}\right|=0<\frac\varepsilon 2.$$ If ##k\neq 0##, we have
\begin{align}
&\left|\frac{R_f(g(x),k)}{h}\right| =\left|\frac{R_f(g(x),k)}{k}\right|\left|\frac k h\right| =\left|\frac{R_f(g(x),k)}{k}\right| \left|\frac{g'(x)h+R_g(x,h)}{h}\right|\\
&\leq \left|\frac{R_f(g(x),k)}{k}\right| \left(|g'(x)|+\left|\frac{R_g(x,h)}{h}\right|\right) < \frac{\varepsilon}{4|g'(x)|}2|g'(x)|=\frac\varepsilon 2.
\end{align} These results imply that the following implication holds for all real numbers ##h## such that ##x+h## is in the domain of ##g## and ##g(x+h)## is in the domain of ##f##:
$$0<|h|<\delta\ \Rightarrow\ \left|\frac{(f\circ g)(x+h)-(f\circ g)(x)}{h}-f'(g(x))g'(x)\right|<\varepsilon.$$
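If anyone wants to see the two bounds in action, here's a rough Python sketch (with test functions of my choosing; it just tabulates the two terms from the triangle-inequality step, nothing more):
[code]
import math

# Numerical illustration of the decomposition above, with test functions
# f = sin, g = exp (my choice): R_f(x,h) = f(x+h) - f(x) - f'(x)h, and the
# two terms bounded by eps/2 in the proof are
#   |f'(g(x))| |R_g(x,h)/h|   and   |R_f(g(x),k)/h|,  k = g(x+h) - g(x).
f, fp = math.sin, math.cos
g, gp = math.exp, math.exp

def R(func, deriv, a, h):
    return func(a + h) - func(a) - deriv(a) * h

x = 0.7
for h in [1e-1, 1e-2, 1e-3]:
    k = g(x + h) - g(x)
    term1 = abs(fp(g(x))) * abs(R(g, gp, x, h) / h)
    term2 = abs(R(f, fp, g(x), k) / h)
    print(f"h = {h:g}: term1 = {term1:.2e}, term2 = {term2:.2e}")
[/code]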
 
  • #16
The easiest proof I found was by treating the initial function as parametric. Thus a third variable is introduced, making the proof neater.
 
  • #17
MidgetDwarf said:
The easiest proof I found was by treating the initial function as parametric. Thus a third variable is introduced, making the proof neater.
I haven't seen that proof. Can you show us?
 
  • #18
i always prove it as follows:

if f(t) = y(x(t)), y(0) = x(0) = 0, and if x(t) ≠ 0 for all small t ≠ 0, then ∆y/∆t = (∆y/∆x)(∆x/∆t), so in the limit,

dy/dt = dy/dx . dx/dt.

If on the other hand x(t) = 0 for t converging to 0, then dx/dt = 0, and we only have to prove that dy/dt = 0. But ∆y/∆t = 0 for those t for which x(t) = 0, and for the others, the previous argument works.
 
  • #19
I haven't done this for about 50 years, but let's see: We start out with two functions f and g, both of which are differentiable. Then we seek the derivative of f(g(x)):
[tex] \frac{\mathrm{df}}{\mathrm{dx}} = \lim_{\Delta x\rightarrow0}\frac{\Delta f}{\Delta x} = \lim_{\Delta x\rightarrow0}\frac{\Delta f}{\Delta g} \frac{\Delta g}{\Delta x} [/tex]
Since g is differentiable, we have [tex]
\begin{equation}
\Delta x \rightarrow 0 \Rightarrow \Delta g \rightarrow 0
\end{equation} [/tex]
and, of course, [tex]
\begin{equation}
\lim_{\Delta x\rightarrow0}\frac{\Delta g}{\Delta x} = \frac{\mathrm{dg}}{\mathrm{dx}}
\end{equation}
[/tex]
we end up with [tex]
\begin{equation}
\lim_{\Delta g\rightarrow0}\frac{\Delta f}{\Delta g}\frac{\mathrm{dg}}{\mathrm{dx}} = \frac{\mathrm{df}}{\mathrm{dg}}\frac{\mathrm{dg}}{\mathrm{dx}}
\end{equation}
[/tex]
 
  • #20
I like your limit proof, Fredrik.
Thanks for the input from you all. It would be interesting to see that three-variable proof if possible?!

Interestingly, I found this other proof of the chain rule: http://tutorial.math.lamar.edu/Classes/CalcI/DerivativeProofs.aspx (at the very bottom of the page). It seems to be pretty rigorous, but have a look and see what you think.
 
  • #21
mathwonk said:
i always prove it as follows:

if f(t) = y(x(t)), y(0) = x(0) = 0, and if x(t) ≠ 0 for all small t ≠ 0, then ∆y/∆t = (∆y/∆x)(∆x/∆t), so in the limit,

dy/dt = dy/dx . dx/dt.

If on the other hand x(t) = 0 for t converging to 0, then dx/dt = 0, and we only have to prove that dy/dt = 0. But ∆y/∆t = 0 for those t for which x(t) = 0, and for the others, the previous argument works.
I don't understand this. First you make assumptions about the functions, losing generality. (Perhaps the generality can be recovered by using this theorem to prove the more general result). Then you're talking about "small t". There's some fancy stuff hidden in that remark. What you're really saying is (I think) that there's an open neighborhood of 0 on which x takes non-zero values except at 0. Only a person who has studied topology will understand this. And someone who understands it will still have to consider the possibility that it isn't true. What if x(t)=0 for infinitely many values of t, in every open neighborhood of 0? Then we'd have to prove that either x'(0)=0, or x isn't differentiable at 0.

There's also some fancy stuff hidden in the "so in the limit" comment at the end. It's far from obvious that this statement is true. I'd say that the key part of the proof is to show that it is.
 
Last edited:
  • #22
Svein said:
we end up with [tex]
\begin{equation}
\lim_{\Delta g\rightarrow0}\frac{\Delta f}{\Delta g}\frac{\mathrm{dg}}{\mathrm{dx}} = \frac{\mathrm{df}}{\mathrm{dg}}\frac{\mathrm{dg}}{\mathrm{dx}}
\end{equation}
[/tex]
This is the part that's hard to prove. It takes a lot more than just a statement.
 
  • #23
i don't know if you will like this any better, but these are notes i wrote for my class on this topic years ago:

Here is the traditional proof of the chain rule which appeared in calculus books at the turn of the century before being "lost".

Trivial lemma: if the domain of a function f is the union of two sets, and the restriction of f to each of those sets converges to 0 as x approaches a, then f itself converges to 0 as x approaches a.

Now assume z(y(x)) is a composite of two differentiable functions and that on every deleted neighborhood of a, ∆y = 0 somewhere. Then clearly dy/dx = 0 at a. Hence to prove the chain rule there means to show that ∆z/∆x approaches 0. On the set where ∆y = 0, ∆z/∆x also equals 0, so this set poses no problem. On the set where ∆y ≠ 0, we have ∆z/∆x = (∆z/∆y)(∆y/∆x), so the result follows by the product rule for limits. From this point of view, the so-called "problem set" is the easier one to deal with.

This result was traditionally proved correctly in turn-of-the-century English language books, such as Pierpont's Theory of Functions of a Real Variable, and in 19th century European books such as that of Tannery [see the article by Carslaw, in vol XXIX of B.A.M.S.], but unfortunately not in the first three editions of the influential book Pure Mathematics, by G.H. Hardy. Although Hardy reinstated the classical proof in later editions, modern books usually deal with the problem by giving the slightly more sophisticated linear approximation proof, or by making what to me are somewhat artificial constructions. The classical proof seems to have merit, so I recall it here.

The point is simply that in proving a function has limit L, one only needs to prove it at points where the function does not already have value L. Thus to someone who says that the usual argument for the chain rule for y(u(x)) does not work for x's where ∆u = 0, one can simply reply that these points are irrelevant.

Assume f is differentiable at g(a), g is differentiable at a, and on every neighborhood of a there are points x where g(x) = g(a). We claim the derivative of f(g(x)) at a equals f'(g(a))g'(a).

Proof:

1) Clearly under these hypotheses, g'(a) = 0. Consequently,

2) the chain rule holds at a if and only if lim ∆f/∆x = 0 as x approaches a.

3) Note that ∆f = ∆f/∆x = 0 at all x such that g(x) = g(a).

4) In general, to prove that lim h(x) = L as x approaches a, it suffices to prove it for the restriction of h to those x such that h(x) ≠ L.

5) Thus in arguing that ∆f/∆x approaches 0, we may restrict to x such that g(x) ≠ g(a), where the usual argument applies.
 
  • #24
Fredrik said:
This is the part that's hard to prove. It takes a lot more than just a statement.

I was going to explain my meaning, but when I read the post above, I felt that mathwonk had already presented my arguments. So I'll refrain.
 
  • #25
Since this is a bit confusing and I would like it to be better understood, I will add a bit more detail:

Classical proof of the chain rule. (I have corrected several typos since first posting this.)

Chain rule: if y is a function of x, which is a function of t, and if the functions y and x are differentiable, at x0 = x(t0) and at t0 respectively, then y(x(t)) is differentiable at t0, and at t0, dy/dt = dy/dx.dx/dt, where dy/dx is evaluated at x0.

The usual proof of the chain rule, in the generic case, is to factor ∆y/∆t = (∆y/∆x)(∆x/∆t) and take the limit as t-->t0. Since the limit of ∆x/∆t is then assumed to exist and to equal dx/dt, and since thus x(t) is also continuous at t0, letting t-->t0 causes also x-->x0, so the limit of ∆y/∆x as t-->t0 equals its limit as x-->x0, which is assumed to exist and to equal dy/dx. Then by the product rule for limits, the limit of ∆y/∆t also exists and equals the product dy/dx.dx/dt.

This argument assumes that the fractions in the factorization make sense, i.e. that for ∆t ≠ 0 we also have ∆x ≠ 0. Thus there is one special case to argue, when this does not happen. I.e. we must also argue the validity of the formula and the limit when, on every deleted interval about t0, there is a t for which ∆x = 0.

This argument is based on a simple principle. Namely, if we partition a deleted interval about t0 into two disjoint subsets, A and B, both of which contain points arbitrarily near t0, and if g is a function defined on some open interval containing t0, then the limit of g(t) exists as t-->t0 and equals L, if and only if the limits of both restrictions of g, to A and to B, also exist and equal L as t-->t0, with t constrained to lie first in A and then in B.

Now we apply this rule to the function ∆y/∆t, defined on a deleted interval about t0. We partition this interval into the set A of those points t ≠ t0 where ∆x ≠ 0, and the set B of those points t ≠ t0 where ∆x = 0.

If B contains points arbitrarily near t0, then since the limit dx/dt exists, the limit of ∆x/∆t restricted to the set B also exists and has the same value. Since on the set B, by definition, ∆x is constantly zero, the limit of that restriction is zero, as is therefore the derivative dx/dt.

Hence the equation we wish to prove, dy/dt = dy/dx.dx/dt, has zero on the right side, so it suffices to prove that we have limit zero also on the left hand side. I.e. we must prove the limit of ∆y/∆t exists and equals zero as t-->t0, for which it suffices to prove this for both restrictions of ∆y/∆t, to the set A and to the set B.

Now by definition of the set B, on this set ∆x = 0 and thus also ∆y = 0, so the function ∆y/∆t is identically zero on the set B, whence this restriction certainly has limit zero. But on the set A, we can factor ∆y/∆t = (∆y/∆x)(∆x/∆t) and argue that the limit exists and (since we know that ∆x/∆t-->dx/dt = 0) equals zero by the product rule as before. This concludes the proof.
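To see the special case in action, here is a rough Python sketch (my own toy example, not part of the proof) with x(t) = t^2 sin(1/t), x(0) = 0 and y = sin, so that dx/dt = 0 at t0 = 0 and ∆x = 0 on a whole sequence t_n --> t0:
[code]
import math

# Illustration of the proof's special case with x(t) = t^2 sin(1/t),
# x(0) = 0 (so dx/dt = 0 at t0 = 0) and y = sin. The chain rule predicts
# d/dt y(x(t)) = 0 at 0, even though Delta x = 0 on a sequence t_n -> 0
# (the set B of the argument above).
def x(t):
    return t**2 * math.sin(1.0 / t) if t != 0 else 0.0

y = math.sin

t0 = 0.0
# Points from the set A (Delta x != 0) and near the set B (t_n = 1/(n*pi)):
for t in [1e-1, 1e-3, 1e-5, 1.0 / (1000 * math.pi)]:
    dq = (y(x(t)) - y(x(t0))) / (t - t0)  # Delta y / Delta t
    print(f"t = {t:.3e}: Delta y / Delta t = {dq:.2e}")
[/code]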
 
Last edited:
  • #26
@Fredrik: did this help clear up the misunderstanding?
 
  • #27
mathwonk said:
@Fredrik: did this help clear up the misunderstanding?
Sorry mathwonk, I've been really busy and haven't had time to really examine your posts yet. I will take a closer look tomorrow or Saturday.
 
  • #28
@mathwonk: I read your post #23 soon after I wrote the above, but I didn't have time to read post #25 at the same time. I didn't come back here to do that for some time, but I have now. After reading post #23, I understood the idea of the proof, and after reading post #25, I also understood how you're using the lemma mentioned in post #23.

My only concern is that there's still some work required to show rigorously that ##\Delta y/\Delta x\to dy/dx## as ##t\to t_0##. I would be able to do it, but someone who's new at it would probably find it very difficult.
 
  • #29
that is because x(t) is assumed differentiable, hence also continuous, so as t-->t0, also x-->x0.

and we know by hypothesis that Δy/Δx→dy/dx as x-->x0.
 
  • #30
mathwonk said:
that is because x(t) is assumed differentiable, hence also continuous, so as t-->t0, also x-->x0.

and we know by hypothesis that Δy/Δx→dy/dx as x-->x0.
Right, but this isn't a proof. It's a couple of observations that you can use as the starting point of a proof.

I think this argument can be misleading. Someone might interpret it as a simple application of the transitivity of the logical implication operation. What I mean is that it might be interpreted as

We know that ##t\to t_0\Rightarrow x\to x_0\Rightarrow \Delta y/\Delta x\to dy/dx##, so we can conclude that ##t\to t_0\Rightarrow \Delta y/\Delta x\to dy/dx##.​

Of course, a statement like "##f(x)\to A## as ##x\to a##" doesn't mean that ##x\to a\Rightarrow f(x)\to A##. The strings "##x\to a##" and "##f(x)\to A##" aren't even statements.
 
  • #31
you lost me. i think i have given a complete proof, with ample detail.

added Later:

@Fredrik: Sorry to be so slow. Did you want something like this?

Lemma: If (1) f(x)-->L as x-->x0, and (2) x(t)-->x0 as t-->t0, then f (x(t))-->L as t-->t0.
proof: given e>0 choose d1>0 so that |x-x0|<d1 implies | f(x)-L | < e. (ok by (1)).

Then choose d>0 so that |t-t0| < d implies |x(t)-x0| < d1. (ok by (2)).

Then |t-t0| < d implies |f(x(t)) - L| < e. QED.

(I admit I tend to take for granted that someone can fill in such details on his own. But i do recall being puzzled by exactly such matters, hmmm, maybe some 50 years ago.)
 
Last edited:
  • #32
Yes, that's what I had in mind. That completes the proof.
 
  • #33
here is another nice way to make logical sense of vague statements like "t-->t0 implies x-->x0".

think of t as a sequence {tn}, and tn-->t0 as the statement "the sequence {tn} converges to t0 (as n goes to infinity)".

then the statement " tn-->t0 implies x(tn)-->x0" is an implication. and since also "xn-->x0 implies f(xn)-->L" is an implication,

one can compose them and get that "tn-->t0 implies f(x(tn))-->L".
 
  • #34
mathwonk said:
think of t as a sequence {tn}, and tn-->t0 as the statement "the sequence {tn} converges to t0 (as n goes to infinity)".
Mathematically, this is not equivalent to an ε - δ argument. A sequence contains a countable set of points; the ε - δ argument talks about all points (usually an uncountable set). The argument would make sense in ℚ, but then the limit might not be in ℚ.
 
  • #35
As a matter of fact the sequence approach is equivalent to the epsilon delta approach, for convergence in the reals, as you can show as an easy exercise. i.e.

lemma: the following are equivalent:
1) for every e>0 there is a d>0 such that |x-x0| < d implies |f(x)-L| < e.

2) for every sequence {xn} converging to x0, the sequence {f(xn)} converges to L.

the point is that there is a countable sequence of rationals, {1/n} if you like, converging to 0, and if (1) fails, these can be used (as d's) to construct a sequence for which (2) fails. the proof that if (1) is true then (2) is true is even easier.

sequential convergence is equivalent to more general convergence in any "first countable space", such as any metric space. here is a little article on them from wikipedia:

http://en.wikipedia.org/wiki/First-countable_space
 
