# A question on proving the chain rule

1. Jan 19, 2015

### "Don't panic!"

I'm currently reviewing my knowledge of calculus and trying to include rigorous(ish) proofs in my personal notes, as I don't like accepting things in maths at face value. I've constructed a proof of the chain rule and was wondering if people wouldn't mind checking it and letting me know whether it is correct (and what improvements may need to be made). Thanks for your time.

From the definition of the derivative of a differentiable function $f:\mathbb{R}\rightarrow\mathbb{R}$ (one-dimensional case), we have that $$f'(x)=\frac{df}{dx}=\lim_{\Delta x\rightarrow 0}\frac{f(x+\Delta x)-f(x)}{\Delta x}= \lim_{\Delta x\rightarrow 0}\frac{\Delta f}{\Delta x}$$
This implies that $$\frac{\Delta f}{\Delta x}= f'(x) +\varepsilon (x)\quad\Rightarrow\quad\Delta f = \left(f'(x)+\varepsilon (x)\right)\Delta x$$
where $\varepsilon (x)$ is some error function which accounts for the difference between the actual (finite) change in $f$ and its linear approximation $f'(x)\Delta x$. Furthermore, $\varepsilon (x)$ satisfies the property $$\lim_{\Delta x\rightarrow 0}\varepsilon (x)=0$$ such that as $\Delta x \rightarrow 0,\quad\frac{\Delta f}{\Delta x}\rightarrow f'(x)$.
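As a sanity check on this error-term formulation, one can watch $\varepsilon$ shrink as $\Delta x\rightarrow 0$ numerically. This is only an illustration, not part of the proof; the choice $f(x)=x^2$ and the step sizes are mine:

```python
# Numerical sketch (illustration only): for f(x) = x**2, the error term
# eps(x, dx) = (f(x + dx) - f(x))/dx - f'(x) works out to exactly dx,
# so it visibly shrinks to 0 as dx -> 0.
def f(x):
    return x * x

def fprime(x):
    return 2.0 * x

def eps(x, dx):
    # difference quotient minus the exact derivative
    return (f(x + dx) - f(x)) / dx - fprime(x)

x0 = 1.5
errors = [abs(eps(x0, 10.0 ** -n)) for n in range(1, 6)]
# the error decreases monotonically as dx shrinks
assert all(later < earlier for earlier, later in zip(errors, errors[1:]))
```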

Now, consider a function $y=f\circ g(x)=f(g(x))$ and let $u=g(x)$. We have then, that $$\Delta u = g(x+\Delta x)-g(x)=\left(g'(x)+\varepsilon_{1}(x)\right)\Delta x$$ $$\Delta y = f(u+\Delta u)-f(u)=\left(f'(u)+\varepsilon_{2}(u)\right)\Delta u$$
Note that $\varepsilon_{2}(u)\rightarrow 0$ as $\Delta u\rightarrow 0$. However, since $\Delta u\rightarrow 0$ as $\Delta x\rightarrow 0$, this implies that $\varepsilon_{2}(u)\rightarrow 0$ as $\Delta x\rightarrow 0$.
And so,
$$f(u+\Delta u)-f(u)=\left(f'(u)+\varepsilon_{2}(u)\right)\Delta u$$ $$\Rightarrow f(g(x+\Delta x))-f(g(x))=f\circ g(x+\Delta x)-f\circ g(x)\\ \qquad\qquad\qquad\qquad\qquad\quad\;\;=\left(f'(g(x))+\varepsilon_{2}(g(x))\right)\cdot\left(g'(x)+\varepsilon_{1}(x)\right)\Delta x\\ \qquad\qquad\qquad\qquad\qquad\quad\;\;=f'(g(x))f'(g(x))g'(x)\Delta x +\left(f'(g(x)) \varepsilon_{1}+g'(x)\varepsilon_{2}+\varepsilon_{1}\varepsilon_{2}\right)\Delta x\\ \qquad\qquad\qquad\qquad\qquad\quad\;\;=f'(g(x))g'(x)\Delta x +\varepsilon_{3}\Delta x$$

where $\varepsilon_{3}\equiv f'(g(x)) \varepsilon_{1}+g'(x)\varepsilon_{2}+\varepsilon_{1}\varepsilon_{2}$. We see from this that as $\Delta x\rightarrow 0,\quad\varepsilon_{3}\rightarrow 0$. Hence,
$$\lim_{\Delta x\rightarrow 0}\frac{f\circ g(x+\Delta x)-f\circ g(x)}{\Delta x}= (f\circ g)'(x)=f'(g(x))g'(x)$$
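A quick numerical cross-check of the conclusion (my own choices of $f=\sin$, $g(x)=x^2$, evaluation point, and step size; an illustration, not a substitute for the proof):

```python
import math

# Check that the difference quotient of f∘g approaches f'(g(x)) * g'(x)
# for f(u) = sin(u), g(x) = x**2, so the exact value is cos(x**2) * 2x.
def g(x):
    return x * x

def fg(x):
    return math.sin(g(x))

x0, h = 0.7, 1e-6
quotient = (fg(x0 + h) - fg(x0)) / h
exact = math.cos(g(x0)) * 2.0 * x0
assert abs(quotient - exact) < 1e-4
```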

2. Jan 19, 2015

### Svein

I think you have a small hiccup in line 4 from the bottom - there seem to be too many $f'$ factors in there.

Otherwise: Symbolically we write d(f°g)/dx = df/dg * dg/dx...

3. Jan 19, 2015

### "Don't panic!"

Whoops, yes you're right, thanks for pointing it out. It should be, $$=f'(g(x))g'(x)\Delta x + \left(f'(g(x))\varepsilon_{1} + g'(x)\varepsilon_{2} + \varepsilon_{1} \varepsilon_{2} \right)\Delta x$$

4. Jan 19, 2015

### Fredrik

Staff Emeritus
Your proof attempt gets really confusing when you introduce u and y. You say that y is a function, and then you set it equal to the number f(g(x)). After that point, I can't tell what's a function and what's a number. In those cases where the notation is supposed to represent the value of a function at a point in its domain, the notation hides what point that is.

Edit: I recommend a more explicit notation, e.g.
$$\Delta f(x,h)=f(x+h)-f(x) =(f'(x)+\varepsilon_f(x,h)) h.$$ For your approach to work, I think you will have to prove that $h\mapsto \varepsilon_f(x,h)$ is continuous.

Last edited: Jan 19, 2015
5. Jan 20, 2015

### "Don't panic!"

Yeah, I realised in hindsight that although I originally used $u$ and $y$ for notational convenience, it does rather confuse things.

Is there a better approach? I've seen some proofs which start by defining a new variable $$v=\frac{g(x+\Delta x)-g(x)}{\Delta x}- g'(x)$$ and stating that $v$ depends on the number $\Delta x$, and further, that as $\Delta x\rightarrow 0,\quad v\rightarrow 0$ (I assume this is true because $g'(x)$ is just a number, so $\lim_{\Delta x\rightarrow 0} g'(x)=g'(x)$, and since $\lim_{\Delta x\rightarrow 0} \frac{g(x+\Delta x)-g(x)}{\Delta x}= g'(x)$, the two terms cancel?). Here is the link to the one I was referencing in particular: http://kruel.co/math/chainrule.pdf
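For what it's worth, the claimed behaviour of that $v$ is easy to watch numerically. This is a sketch with my own choice $g(x)=x^3$; nothing here is taken from the pdf itself:

```python
# v(x, dx) = (g(x + dx) - g(x))/dx - g'(x); for g(x) = x**3 this is
# 3*x*dx + dx**2, which indeed tends to 0 as dx -> 0.
def g(x):
    return x ** 3

def gprime(x):
    return 3.0 * x * x

def v(x, dx):
    return (g(x + dx) - g(x)) / dx - gprime(x)

x0 = 2.0
vals = [abs(v(x0, 10.0 ** -n)) for n in range(1, 6)]
# v shrinks monotonically as dx does
assert all(later < earlier for earlier, later in zip(vals, vals[1:]))
```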

6. Jan 20, 2015

### Fredrik

Staff Emeritus
The proof in that pdf is pretty good, but there's a detail that I think it doesn't explain well enough. When we get the result
$$\frac{f(g(x+h))-f(g(x))}{h} =(f'(g(x))+w)(g'(x)+v),$$ it's sufficient to show that $\lim_{h\to 0}(f'(g(x))+w)=f'(g(x))$ and $\lim_{h\to 0}(g'(x)+v)=g'(x)$. The latter equality is pretty obvious, but the former is not. It would be more accurate to write $w(g(x),g'(x)h+v(x,h)h)$ instead of $w$. But how do we know that $$\lim_{h\to 0}w\big(g(x),g'(x)h+v(x,h)h\big)=0?$$ An approach that I would consider as good as any is to do what the pdf does, but improve this part of the proof.
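The question can also be probed numerically (a hedged sketch: $f=\exp$, $g=\sin$, the evaluation point, and the step sizes are my choices, and this only illustrates, not proves, that the limit is 0):

```python
import math

# w(a, k) = (f(a + k) - f(a))/k - f'(a), here with f = exp; feed in
# k = g(x + h) - g(x) with g = sin and watch w shrink as h -> 0.
# These h all give g(x + h) != g(x), so there is no division by zero.
def w(a, k):
    return (math.exp(a + k) - math.exp(a)) / k - math.exp(a)

def g(x):
    return math.sin(x)

x0 = 0.3
ws = [abs(w(g(x0), g(x0 + h) - g(x0))) for h in (1e-1, 1e-2, 1e-3)]
assert all(later < earlier for earlier, later in zip(ws, ws[1:]))
```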

7. Jan 20, 2015

### "Don't panic!"

Thanks for taking a look at it, appreciate it.

I assume it would involve using the definition of the limit or something like that (i.e. for any number $\varepsilon >0$, find a $\delta >0$ such that if $\vert h-0\vert = \vert h\vert < \delta$ then $\vert w(g(x),g′(x)h+v(x,h)h) - 0\vert = \vert w(g(x),g′(x)h+v(x,h)h) \vert < \varepsilon$)?

8. Jan 20, 2015

### Fredrik

Staff Emeritus
Yes, an ε-δ proof is the straightforward way to do it. But it's not super easy. (Still, I don't know an easier way.) Let $\varepsilon>0$ be arbitrary. The definition of $w$ and the assumption that $f$ is differentiable at $g(x)$ tell us that there's a $\delta_1>0$ such that the following implication holds
$$0<|k|<\delta_1\ \Rightarrow\ |w(g(x),k)|<\varepsilon.$$ Because of this, it will be sufficient to find a $\delta>0$ such that the following implication holds
$$0<|h|<\delta\ \Rightarrow\ |g'(x)h+v(x,h)h|<\delta_1.$$ This is a bit tricky, but it's certainly doable. Here's a choice that works: (I recommend that you try to complete the proof yourself before you look at the spoiler).

Let $\delta_2>0$ be such that
$$0<|h|<\delta_2\ \Rightarrow\ |v(x,h)|<\frac{\delta_1}{2}.$$ Now define $\delta$ by
$$\delta=\min\left\{\frac{\delta_1}{2|g'(x)|},\delta_2,1\right\}.$$

Edit: Apparently I missed a factor of $h$, and that ruined this attempted proof. See below for a new attempt.

Last edited: Jan 22, 2015
9. Jan 21, 2015

### "Don't panic!"

I have to admit, I've been a little stuck. Is the above true because $k=g(x+h)-g(x)=g'(x)h+v(x,h)h$ and therefore $\vert k\vert < \delta_{1}$ is equivalent to $\vert g(x+h)-g(x)\vert = \vert g'(x)h+v(x,h)h\vert < \delta_{1}$?

My initial attempt was to note that $g$ is differentiable at $x$, and so for any number $\varepsilon_{2} >0$ there is some $\delta_{2} >0$ such that $$\Biggr\vert\frac{g(x+h)-g(x)}{h}-g'(x)\Biggr\vert = \vert v(x,h)\vert < \varepsilon_{2} \qquad\text{whenever}\qquad 0<\vert h\vert <\delta_{2}$$ I was then considering using the triangle inequality, such that $$\vert g'(x)h+v(x,h)h \vert \leq \vert g'(x)h\vert +\vert v(x,h)h\vert = \vert h \vert\left(\vert g'(x)\vert +\vert v(x,h)\vert \right) < \vert h \vert\left(\vert g'(x)\vert + \varepsilon_{2}\right)$$ But I'm really not sure this is right, and I'm a bit unsure how to proceed.

10. Jan 21, 2015

### Fredrik

Staff Emeritus
I will need to clarify a few things. I didn't make it clear which of my statements are "for all" statements. In particular, I should have said this: There's a $\delta_1>0$ such that the following implication holds for all $k$ such that $g(x)+k$ is in the domain of $f$:
$$0<|k|<\delta_1\ \Rightarrow\ |w(g(x),k)|<\varepsilon.$$ Note that this statement is saying nothing other than that $f$ is differentiable at $g(x)$. Also note that because of the "for all", $k$ is a dummy variable; that symbol can be replaced by any other without changing the meaning of the statement.

This implies that for all $h\in\mathbb R$ such that

(a) $x+h$ is in the domain of $g$,
(b) $g(x)+g'(x)h+v(x,h)h$ is in the domain of $f$,
(c) $0<|g'(x)h+v(x,h)h|<\delta_1$,

we have $|w(g(x),g'(x)h+v(x,h)h)|<\varepsilon$.

Is that clear enough? The key is the "for all". Note for example how the statement "For all $x$, we have $x^2\geq 0$" implies that $1^2\geq 0$.

I used the triangle inequality like that. Since the goal was to get $<\delta_1$ on the right, I decided to make sure that each of the two terms on the right is less than $\frac{\delta_1}{2}$.

11. Jan 22, 2015

### "Don't panic!"

Here is my attempt so far.

Let $\varepsilon >0$ be any (positive) number. Then as $g$ is differentiable at $x$, there is a number $\delta_{2} >0$, such that $$0< \vert h\vert < \delta_{2} \qquad\Rightarrow\qquad\vert v(x,h)\vert < \frac{\delta_{1}}{2 \vert h\vert}$$ Further note that the $\lim_{h\rightarrow 0} g'(x) = g'(x)$ exists (as $g'(x)$ is just a number) and so there is a number $\delta_{3} >0$, such that, $$0< \vert h\vert < \delta_{3} \qquad\Rightarrow\qquad\vert g'(x) - g'(x)\vert < \frac{\delta_{1}}{\vert h\vert}$$ which, upon using the triangle inequality implies that $\vert g'(x) \vert < \frac{\delta_{1}}{2 \vert h\vert}$, as $$\vert g'(x) - g'(x)\vert \leq \vert g'(x) \vert +\vert - g'(x)\vert = \vert g'(x) \vert +\vert g'(x)\vert = 2 \vert g'(x) \vert < \frac{\delta_{1}}{\vert h\vert}$$
(I'm a bit unsure on this bit, feel I may have done something a bit 'dodgy'?!).

As such, if we let $\delta= \text{min}\bigg\lbrace \frac{\delta_{1}}{2 \vert h\vert}, \delta_{2}, \delta_{3} \bigg\rbrace$ (I hope I'm using this notation correctly, i.e. that it means that $\delta$ assumes the smallest of the three elements in the set?), and suppose that $0 <\vert h \vert < \delta$, then $$\vert g'(x)h + v(x,h)h \vert \leq \vert g'(x)h \vert + \vert v(x,h)h \vert \\ \qquad\qquad\qquad\qquad = \vert h \vert \left(\vert g'(x) \vert + \vert v(x,h) \vert\right) \\ \qquad\qquad\qquad\qquad < \vert h \vert \left( \frac{\delta_{1}}{2 \vert h\vert} + \frac{\delta_{1}}{2 \vert h\vert} \right) \\ \qquad\qquad\qquad\qquad = \frac{\delta_{1}}{2} + \frac{\delta_{1}}{2} = \delta_{1}$$
And therefore, there is a number $\delta >0$, such that $$0 < \vert h \vert < \delta\qquad\Rightarrow\qquad \vert g'(x)h + v(x,h)h \vert < \delta_{1}$$

Sorry if this is complete rubbish, I'm fairly new to doing rigorous $\varepsilon$ - $\delta$ proofs.

12. Jan 22, 2015

### Fredrik

Staff Emeritus
I see that you've noticed that I missed a factor of $|h|$ when I described how to proceed. Unfortunately, the solution isn't to put an $|h|$ in the denominator on the right. That right-hand side must be independent of $h$. I will need to think about how to correct my attempt to complete the proof.

If you want to prove that $|g'(x)-g'(x)|$ is less than some positive real number $r$, then all you have to say is $|g'(x)-g'(x)|=0<r$. No need to mention limits.

13. Jan 22, 2015

### Fredrik

Staff Emeritus
OK, new attempt. We want to prove that $\lim_{h\to 0}w\big(g(x),g'(x)h+v(x,h)h\big)=0$. For all non-zero $h$, we have $g'(x)h+v(x,h)h=g(x+h)-g(x)$. This means that it will be sufficient to prove that $\lim_{h\to 0}w\big(g(x),g(x+h)-g(x)\big)=0$. This makes things easier.

Let $\varepsilon>0$ be arbitrary. Let $\delta_1$ be a positive real number such that the following implication holds for all $k$ such that $g(x)+k$ is in the domain of $f$:
$$0<|k|<\delta_1\ \Rightarrow\ |w(g(x),k)|<\varepsilon.$$ Such a $\delta_1$ exists because $f$ is differentiable at $g(x)$. Let $\delta$ be a positive real number such that the following implication holds for all $h$ such that $x+h$ is in the domain of $g$:
$$0<|h|<\delta\ \Rightarrow\ |g(x+h)-g(x)|<\delta_1.$$ Such a $\delta$ exists because $g$ is continuous at $x$. Now let $h$ be an arbitrary real number such that

(a) $0<|h|<\delta$.
(b) $x+h$ is in the domain of $g$.
(c) $g(x+h)$ is in the domain of $f$.
(d) $g(x+h)\neq g(x)$.

By definition of $\delta$, we have $|g(x+h)-g(x)|<\delta_1$. By definition of $\delta_1$, this implies that $\big|w\big(g(x),g(x+h)-g(x)\big)\big|<\varepsilon$. Since $h$ is an arbitrary real number that satisfies (a)-(d), this means that $\lim_{h\to 0}w\big(g(x),g(x+h)-g(x)\big)=0$.

Note that (b)-(d) only ensure that $h$ is in the domain of the map $t\mapsto w(g(x),g(x+t)-g(x))$.

Edit: There's an issue that the above doesn't account for. Condition (d) is supposed to ensure that there's no division by 0, but if $g$ is constant on an open neighborhood of $x$, (d) is insufficient. We may have to handle the case of a constant $g$ separately. Even if $g$ isn't constant, there could be a non-zero $h$ such that $g(x+h)=g(x)$. If there's a finite number of such $h$, we can just choose $\delta$ small enough. If there's a whole sequence of them that goes to 0, then we can't do that, but we should be able to prove that this implies that $g$ isn't differentiable at $x$, contradicting our assumptions.

Last edited: Jan 22, 2015
14. Jan 22, 2015

### "Don't panic!"

Thanks for looking into it further. Appreciate all the time you've put into this - it always seems to be the things that seem 'intuitive' that are the hardest to prove rigorously!

15. Jan 22, 2015

### Fredrik

Staff Emeritus
I need to do these things once in a while, or I'll forget how to do them. So the exercise was welcome. I think I still like a straightforward ε-δ proof the best. Instead of the w and the v in the proof above, I'll use the notation
$$f(x+h)=f(x)+f'(x)h+R_f(x,h),$$ and a similar notation for $g$. Note that $R_f(g(x),k)$ is well-defined even when $k=0$. We will use this to solve the division by zero issue. If we temporarily define $k=g'(x)h+R_g(x,h)$, we have
$$f(g(x+h))=f(g(x)+g'(x)h+R_g(x,h))=f(g(x)+k)=f(g(x))+f'(g(x))k+R_f(g(x),k).$$ This implies that
\begin{align}
&\left|\frac{(f\circ g)(x+h)-(f\circ g)(x)}{h}-f'(g(x))g'(x)\right| =\left|\frac{f'(g(x))k+R_f(g(x),k)}{h}-f'(g(x))g'(x)\right|\\
&=\left|\frac{f'(g(x))g'(x)h+f'(g(x))R_g(x,h)+R_f(g(x),k)}{h}-f'(g(x))g'(x)\right| =\left|\frac{f'(g(x))R_g(x,h)+R_f(g(x),k)}{h}\right|\\
&\leq|f'(g(x))|\left|\frac{R_g(x,h)}{h}\right|+\left|\frac{R_f(g(x),k)}{h}\right|.
\end{align}
Because of this, it's sufficient to find a $\delta>0$ such that if $0<|h|<\delta$, then each of the two terms above is less than $\frac\varepsilon 2$. Choose $\delta_1>0$ such that
$$0<|k|<\delta_1\ \Rightarrow\ \left|\frac{R_f(g(x),k)}{k}\right|<\frac{\varepsilon}{4|g'(x)|}.$$ Then choose $\delta>0$ such that if $0<|h|<\delta$, then all of the following inequalities hold:
\begin{align}
&\left|\frac{R_g(x,h)}{h}\right| <\frac{\varepsilon}{2|f'(g(x))|},\\
&\left|\frac{R_g(x,h)}{h}\right| <|g'(x)|,\\
&|k|=|g'(x)h+R_g(x,h)|=|g(x+h)-g(x)|<\delta_1.
\end{align} Now let $h$ be an arbitrary real number such that $x+h$ is in the domain of $g$, $g(x+h)$ is in the domain of $f$, and $0<|h|<\delta$. We have
$$|f'(g(x))|\left|\frac{R_g(x,h)}{h}\right|<\frac{\varepsilon}{2}.$$ If $k=0$, we have
$$\left|\frac{R_f(g(x),k)}{h}\right|=0<\frac\varepsilon 2.$$ If $k\neq 0$, we have
\begin{align}
&\left|\frac{R_f(g(x),k)}{h}\right| =\left|\frac{R_f(g(x),k)}{k}\right|\left|\frac k h\right| =\left|\frac{R_f(g(x),k)}{k}\right| \left|\frac{g'(x)h+R_g(x,h)}{h}\right|\\
&\leq \left|\frac{R_f(g(x),k)}{k}\right| \left(|g'(x)|+\left|\frac{R_g(x,h)}{h}\right|\right) < \frac{\varepsilon}{4|g'(x)|}2|g'(x)|=\frac\varepsilon 2.
\end{align} These results imply that the following implication holds for all real numbers $h$ such that $x+h$ is in the domain of $g$ and $g(x+h)$ is in the domain of $f$:
$$0<|h|<\delta\ \Rightarrow\ \left|\frac{(f\circ g)(x+h)-(f\circ g)(x)}{h}-f'(g(x))g'(x)\right|<\varepsilon.$$
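A numerical companion to this remainder notation (a sketch only; the concrete choices $f=\cos$ and $g(x)=x^2$ are mine) checks that both terms in the bound really do vanish with $h$:

```python
import math

# R_f(a, k) = f(a + k) - f(a) - f'(a)*k with f = cos (so f' = -sin), and
# R_g(x, h) = g(x + h) - g(x) - g'(x)*h with g(x) = x**2 (so R_g = h**2).
# Both |f'(g(x))| * |R_g(x,h)/h| and |R_f(g(x),k)/h| are O(h) here.
def R_f(a, k):
    return math.cos(a + k) - math.cos(a) + math.sin(a) * k

def R_g(x, h):
    return (x + h) ** 2 - x ** 2 - 2.0 * x * h

x0 = 0.9
for h in (1e-2, 1e-4):
    k = (x0 + h) ** 2 - x0 ** 2          # k = g(x + h) - g(x)
    term1 = abs(-math.sin(x0 * x0)) * abs(R_g(x0, h) / h)
    term2 = abs(R_f(x0 * x0, k) / h)
    assert term1 + term2 < 10.0 * h      # both terms shrink like h
```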

16. Jan 22, 2015

### MidgetDwarf

The easiest proof I found treats the initial function as parametric. A third variable is introduced, making the proof neater.

17. Jan 22, 2015

### Fredrik

Staff Emeritus
I haven't seen that proof. Can you show us?

18. Jan 23, 2015

### mathwonk

I always prove it as follows:

If $f(t) = y(x(t))$, with $y(0) = x(0) = 0$, and if $x(t) \neq 0$ for all small $t\neq 0$, then $\frac{\Delta y}{\Delta t} = \frac{\Delta y}{\Delta x}\cdot\frac{\Delta x}{\Delta t}$, so in the limit,

$$\frac{dy}{dt} = \frac{dy}{dx}\cdot\frac{dx}{dt}.$$

If on the other hand $x(t) = 0$ for $t$ converging to 0, then $dx/dt = 0$, and we only have to prove that $dy/dt = 0$. But $\Delta y/\Delta t = 0$ for those $t$ for which $x(t) = 0$, and for the others the previous argument works.

19. Jan 23, 2015

### Svein

I haven't done this for about 50 years, but let's see: We start out with two functions f and g, both of which are differentiable. Then we seek the derivative of $f(g(x))$:
$$\frac{df}{dx} = \lim_{\Delta x\rightarrow0}\frac{\Delta f}{\Delta x} = \lim_{\Delta x\rightarrow0}\frac{\Delta f}{\Delta g}\cdot\frac{\Delta g}{\Delta x}$$
Since g is differentiable (hence continuous), we have $$\Delta x \rightarrow 0 \Rightarrow \Delta g \rightarrow 0$$
and, of course, $$\lim_{\Delta x\rightarrow0}\frac{\Delta g}{\Delta x} = \frac{dg}{dx}$$
so we end up with $$\lim_{\Delta g\rightarrow0}\frac{\Delta f}{\Delta g}\cdot\frac{dg}{dx} = \frac{df}{dg}\cdot\frac{dg}{dx}$$
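Numerically this symbolic manipulation checks out whenever $\Delta g \neq 0$ (which is exactly the gap discussed earlier in the thread). A sketch with my own choices $f(u)=\sqrt{u}$ and $g(x)=1+x^2$:

```python
import math

# Multiply and divide by delta_g as in the post above; this is valid here
# because g(x) = 1 + x**2 is strictly increasing near x = 2, so delta_g != 0.
def g(x):
    return 1.0 + x * x

def f(u):
    return math.sqrt(u)

x0, dx = 2.0, 1e-6
dg = g(x0 + dx) - g(x0)
chained = (f(g(x0) + dg) - f(g(x0))) / dg * (dg / dx)
exact = x0 / math.sqrt(1.0 + x0 * x0)   # derivative of sqrt(1 + x**2)
assert abs(chained - exact) < 1e-4
```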

20. Jan 23, 2015

### "Don't panic!"

I like your limit proof, Fredrik.
Thanks for the input from you all. It would be interesting to see that three-variable proof, if possible!

Interestingly, I found another proof of the chain rule at http://tutorial.math.lamar.edu/Classes/CalcI/DerivativeProofs.aspx (at the very bottom of the page). It seems to be pretty rigorous, but have a look and see what you think.