# A question on proving the chain rule


#### "Don't panic!"

I'm currently reviewing my knowledge of calculus and trying to include rigorous(ish) proofs in my personal notes, as I don't like accepting things in maths at face value. I've constructed a proof of the chain rule and was wondering if people wouldn't mind checking it and letting me know whether it is correct (and what improvements might be needed). Thanks for your time.

From the definition of the derivative of a differentiable function $f:\mathbb{R}\rightarrow\mathbb{R}$ (one-dimensional case), we have that $$f'(x)=\frac{df}{dx}=\lim_{\Delta x\rightarrow 0}\frac{f(x+\Delta x)-f(x)}{\Delta x}= \lim_{\Delta x\rightarrow 0}\frac{\Delta f}{\Delta x}$$
This implies that $$\frac{\Delta f}{\Delta x}= f'(x) +\varepsilon (x)\quad\Rightarrow\quad\Delta f = \left(f'(x)+\varepsilon (x)\right)\Delta x$$
where $\varepsilon (x)$ is some error function which accounts for the difference between the actual (finite) change in $f$ and its linear approximation $f'(x)\Delta x$. Furthermore, $\varepsilon (x)$ satisfies the property $$\lim_{\Delta x\rightarrow 0}\varepsilon (x)=0$$ such that as $\Delta x \rightarrow 0,\quad\frac{\Delta f}{\Delta x}\rightarrow f'(x)$.
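As a quick numerical sanity check of this setup (illustration only; the concrete $f$ here is chosen just for the demo, it isn't part of the proof), the error term $\varepsilon = \frac{\Delta f}{\Delta x} - f'(x)$ should shrink as $\Delta x\rightarrow 0$. For $f(x)=x^2$ at $x=1$, where $f'(1)=2$:

```python
# Numerical check (illustration only): for f(x) = x**2 at x = 1,
# the error eps(dx) = (f(x+dx) - f(x))/dx - f'(x) should shrink with dx.
def f(x):
    return x * x

fprime = 2.0  # f'(1) = 2 for f(x) = x^2
x = 1.0

errors = []
for dx in [0.1, 0.01, 0.001]:
    eps = (f(x + dx) - f(x)) / dx - fprime
    errors.append(abs(eps))

print(errors)
```

For this particular $f$ we have $\frac{\Delta f}{\Delta x} = 2x + \Delta x$, so the error is exactly $\Delta x$ (up to floating-point rounding), which is what the printed values show.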

Now, consider a function $y=f\circ g(x)=f(g(x))$ and let $u=g(x)$. We have then, that $$\Delta u = g(x+\Delta x)-g(x)=\left(g'(x)+\varepsilon_{1}(x)\right)\Delta x$$ $$\Delta y = f(u+\Delta u)-f(u)=\left(f'(u)+\varepsilon_{2}(u)\right)\Delta u$$
Note that $\varepsilon_{2}(u)\rightarrow 0$ as $\Delta u\rightarrow 0$. However, since $\Delta u\rightarrow 0$ as $\Delta x\rightarrow 0$, this implies that $\varepsilon_{2}(u)\rightarrow 0$ as $\Delta x\rightarrow 0$.
And so,
$$f(u+\Delta u)-f(u)=\left(f'(u)+\varepsilon_{2}(u)\right)\Delta u$$ $$\Rightarrow f(g(x+\Delta x))-f(g(x))=f\circ g(x+\Delta x)-f\circ g(x)\\ \qquad\qquad\qquad\qquad\qquad\quad\;\;=\left(f'(g(x))+\varepsilon_{2}(g(x))\right)\cdot\left(g'(x)+\varepsilon_{1}(x)\right)\Delta x\\ \qquad\qquad\qquad\qquad\qquad\quad\;\;=f'(g(x))f'(g(x))g'(x)\Delta x +\left(f'(g(x)) \varepsilon_{1}+g'(x)\varepsilon_{2}+\varepsilon_{1}\varepsilon_{2}\right)\Delta x\\ \qquad\qquad\qquad\qquad\qquad\quad\;\;=f'(g(x))g'(x)\Delta x +\varepsilon_{3}\Delta x$$

where $\varepsilon_{3}\equiv f'(g(x)) \varepsilon_{1}+g'(x)\varepsilon_{2}+\varepsilon_{1}\varepsilon_{2}$. We see from this that as $\Delta x\rightarrow 0,\quad\varepsilon_{3}\rightarrow 0$. Hence,
$$\lim_{\Delta x\rightarrow 0}\frac{f\circ g(x+\Delta x)-f\circ g(x)}{\Delta x}= (f\circ g)'(x)=f'(g(x))g'(x)$$
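The conclusion is easy to sanity-check numerically (illustration only; the choices $f=\sin$, $g(x)=x^2$ are mine, just for the demo): a finite-difference quotient of $f\circ g$ should approach $f'(g(x))g'(x)$.

```python
import math

# Numerical check (illustration only) of (f o g)'(x) = f'(g(x)) g'(x)
# for f(u) = sin(u), g(x) = x**2 at x = 0.7.
f = math.sin
g = lambda x: x * x
fprime = math.cos          # f'(u) = cos(u)
gprime = lambda x: 2 * x   # g'(x) = 2x

x = 0.7
chain = fprime(g(x)) * gprime(x)          # chain-rule value
h = 1e-6
finite = (f(g(x + h)) - f(g(x))) / h      # finite-difference estimate

print(chain, finite)
```

The two numbers agree to roughly six decimal places, consistent with the $O(h)$ error of the one-sided difference quotient.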

I think you have a small hiccup in line 4 from the bottom - there seem to be too many f' in there.

Otherwise: Symbolically we write d(f°g)/dx = df/dg * dg/dx...

Whoops, yes you're right, thanks for pointing it out. It should be, $$=f'(g(x))g'(x)\Delta x + \left(f'(g(x))\varepsilon_{1} + g'(x)\varepsilon_{2} + \varepsilon_{1} \varepsilon_{2} \right)\Delta x$$

Your proof attempt gets really confusing when you introduce u and y. You say that y is a function, and then you set it equal to the number f(g(x)). After that point, I can't tell what's a function and what's a number. In those cases where the notation is supposed to represent the value of a function at a point in its domain, the notation hides what point that is.

Edit: I recommend a more explicit notation, e.g.
$$\Delta f(x,h)=f(x+h)-f(x) =(f'(x)+\varepsilon_f(x,h)) h.$$ For your approach to work, I think you will have to prove that ##h\mapsto \varepsilon_f(x,h)## is continuous.

Yeah, I realized in hindsight that although I originally used u and y for notational convenience, it does rather confuse things.

Is there a better approach?... I've seen some proofs which start by defining a new variable $$v=\frac{g(x+\Delta x)-g(x)}{\Delta x}- g'(x)$$ and stating that $v$ depends on the number $\Delta x$, and further, that as $\Delta x\rightarrow 0,\quad v\rightarrow 0$ (I assume this is true because $g'(x)$ is just a number, so $\lim_{\Delta x\rightarrow 0} g'(x)=g'(x)$, and since $\lim_{\Delta x\rightarrow 0} \frac{g(x+\Delta x)-g(x)}{\Delta x}= g'(x)$, the two terms cancel?!). Here is the link to the one I was referencing in particular http://kruel.co/math/chainrule.pdf

The proof in that pdf is pretty good, but there's a detail that I think it doesn't explain well enough. When we get the result
$$\frac{f(g(x+h))-f(g(x))}{h} =(f'(g(x))+w)(g'(x)+v),$$ it's sufficient to show that ##\lim_{h\to 0}(f'(g(x))+w)=f'(g(x))## and ##\lim_{h\to 0}(g'(x)+v)=g'(x)##. The latter equality is pretty obvious, but the former is not. It would be more accurate to write ##w(g(x),g'(x)h+v(x,h)h)## instead of ##w##. But how do we know that $$\lim_{h\to 0}w\big(g(x),g'(x)h+v(x,h)h\big)=0?$$ An approach that I would consider as good as any is to do what the pdf does, but improve this part of the proof.

Thanks for taking a look at it, appreciate it.

Fredrik said:
But how do we know that
$$\lim_{h\to 0}w\big(g(x),g'(x)h+v(x,h)h\big)=0?$$

I assume it would involve using the definition of the limit or something like that (i.e. for any number $\varepsilon >0$, find a $\delta >0$ such that if $\vert h-0\vert = \vert h\vert < \delta$ then $\vert w(g(x),g'(x)h+v(x,h)h) - 0\vert = \vert w(g(x),g'(x)h+v(x,h)h) \vert < \varepsilon$)?

"Don't panic!" said:
I assume it would involve using the definition of the limit or something like that (i.e. for any number $\varepsilon >0$, find a $\delta >0$ such that if $\vert h-0\vert = \vert h\vert < \delta$ then $\vert w(g(x),g'(x)h+v(x,h)h) - 0\vert = \vert w(g(x),g'(x)h+v(x,h)h) \vert < \varepsilon$)?
Yes, an ε-δ proof is the straightforward way to do it. But it's not super easy. (Still, I don't know an easier way.) Let ##\varepsilon>0## be arbitrary. The definition of ##w## and the assumption that ##f## is differentiable at ##g(x)## tell us that there's a ##\delta_1>0## such that the following implication holds
$$0<|k|<\delta_1\ \Rightarrow\ |w(g(x),k)|<\varepsilon.$$ Because of this, it will be sufficient to find a ##\delta>0## such that the following implication holds
$$0<|h|<\delta\ \Rightarrow\ |g'(x)h+v(x,h)h|<\delta_1.$$ This is a bit tricky, but it's certainly doable. Here's a choice that works: (I recommend that you try to complete the proof yourself before you look at the spoiler).

Let ##\delta_2>0## be such that
$$0<|h|<\delta_2\ \Rightarrow\ |v(x,h)|<\frac{\delta_1}{2}.$$ Now define ##\delta## by
$$\delta=\min\left\{\frac{\delta_1}{2|g'(x)|},\delta_2,1\right\}.$$

Edit: Apparently I missed a factor of ##h##, and that ruined this attempted proof. See below for a new attempt.

Fredrik said:
Because of this, it will be sufficient to find a ##\delta>0## such that the following implication holds
$$0<|h|<\delta\ \Rightarrow\ |g'(x)h+v(x,h)h|<\delta_1.$$

I have to admit, I've been a little stuck. Is the above true because $k=g(x+h)-g(x)=g'(x)h+v(x,h)h$ and therefore $\vert k\vert < \delta_{1}$ is equivalent to $\vert g(x+h)-g(x)\vert = \vert g'(x)h+v(x,h)h\vert < \delta_{1}$?

My initial attempt was to note that $g$ is differentiable at $x$, so for any number $\varepsilon_{2} >0$ there is some $\delta_{2} >0$ such that $$\Biggr\vert\frac{g(x+h)-g(x)}{h}-g'(x)\Biggr\vert = \vert v(x,h)\vert < \varepsilon_{2} \qquad\text{whenever}\qquad 0<\vert h\vert <\delta_{2}$$ I was then considering using the triangle inequality: $$\vert g'(x)h+v(x,h)h \vert \leq \vert g'(x)h\vert +\vert v(x,h)h\vert = \vert h \vert\left(\vert g'(x)\vert +\vert v(x,h)\vert \right) < \vert h \vert\left(\vert g'(x)\vert + \varepsilon_{2}\right)$$ But I'm really not sure this is right and a bit unsure how to proceed?!

"Don't panic!" said:
Is the above true because $k=g(x+h)-g(x)=g'(x)h+v(x,h)h$ and therefore $\vert k\vert < \delta_{1}$ is equivalent to $\vert g(x+h)-g(x)\vert = \vert g'(x)h+v(x,h)h\vert < \delta_{1}$?
I will need to clarify a few things. I didn't make it clear which of my statements are "for all" statements. In particular, I should have said this: There's a ##\delta_1>0## such that the following implication holds for all ##k## such that ##g(x)+k## is in the domain of ##f##:
$$0<|k|<\delta_1\ \Rightarrow\ |w(g(x),k)|<\varepsilon.$$ Note that this statement is saying nothing other than that ##f## is differentiable at ##g(x)##. Also note that because of the "for all", ##k## is a dummy variable; that symbol can be replaced by any other without changing the meaning of the statement.

This implies that for all ##h\in\mathbb R## such that

(a) ##x+h## is in the domain of ##g##,
(b) ##g(x)+g'(x)h+v(x,h)h## is in the domain of ##f##,
(c) ##0<|g'(x)h+v(x,h)h|<\delta_1##,

we have ##|w(g(x),g'(x)h+v(x,h)h)|<\varepsilon##.

Is that clear enough? The key is the "for all". Note for example how the statement "For all ##x##, we have ##x^2\geq 0##" implies that ##1^2\geq 0##.

"Don't panic!" said:
My initial attempt was to note that $g$ is differentiable at $x$, so for any number $\varepsilon_{2} >0$ there is some $\delta_{2} >0$ such that $$\Biggr\vert\frac{g(x+h)-g(x)}{h}-g'(x)\Biggr\vert = \vert v(x,h)\vert < \varepsilon_{2} \qquad\text{whenever}\qquad 0<\vert h\vert <\delta_{2}$$ I was then considering using the triangle inequality: $$\vert g'(x)h+v(x,h)h \vert \leq \vert g'(x)h\vert +\vert v(x,h)h\vert = \vert h \vert\left(\vert g'(x)\vert +\vert v(x,h)\vert \right) < \vert h \vert\left(\vert g'(x)\vert + \varepsilon_{2}\right)$$ But I'm really not sure this is right and a bit unsure how to proceed?!
I used the triangle inequality like that. Since the goal was to get ##<\delta_1## on the right, I decided to make sure that each of the two terms on the right is less than ##\frac{\delta_1}{2}##.

Here is my attempt so far.

Let $\varepsilon >0$ be any (positive) number. Then as $g$ is differentiable at $x$, there is a number $\delta_{2} >0$, such that $$0< \vert h\vert < \delta_{2} \qquad\Rightarrow\qquad\vert v(x,h)\vert < \frac{\delta_{1}}{2 \vert h\vert}$$ Further note that $\lim_{h\rightarrow 0} g'(x) = g'(x)$ exists (as $g'(x)$ is just a number) and so there is a number $\delta_{3} >0$, such that, $$0< \vert h\vert < \delta_{3} \qquad\Rightarrow\qquad\vert g'(x) - g'(x)\vert < \frac{\delta_{1}}{\vert h\vert}$$ which, upon using the triangle inequality implies that $\vert g'(x) \vert < \frac{\delta_{1}}{2 \vert h\vert}$, as $$\vert g'(x) - g'(x)\vert \leq \vert g'(x) \vert +\vert - g'(x)\vert = \vert g'(x) \vert +\vert g'(x)\vert = 2 \vert g'(x) \vert < \frac{\delta_{1}}{\vert h\vert}$$
(I'm a bit unsure on this bit, feel I may have done something a bit 'dodgy'?!).

As such, if we let $\delta= \text{min}\bigg\lbrace \frac{\delta_{1}}{2 \vert h\vert}, \delta_{2}, \delta_{3} \bigg\rbrace$ (I hope I'm using this notation correctly, i.e. that it means that $\delta$ assumes the smallest of the three elements in the set?), and suppose that $0 <\vert h \vert < \delta$, then $$\vert g'(x)h + v(x,h)h \vert \leq \vert g'(x)h \vert + \vert v(x,h)h \vert \\ \qquad\qquad\qquad\qquad = \vert h \vert \left(\vert g'(x) \vert + \vert v(x,h) \vert\right) \\ \qquad\qquad\qquad\qquad < \vert h \vert \left( \frac{\delta_{1}}{2 \vert h\vert} + \frac{\delta_{1}}{2 \vert h\vert} \right) \\ \qquad\qquad\qquad\qquad = \frac{\delta_{1}}{2} + \frac{\delta_{1}}{2} = \delta_{1}$$
And therefore, there is a number $\delta >0$, such that $$0 < \vert h \vert < \delta\qquad\Rightarrow\qquad \vert g'(x)h + v(x,h)h \vert < \delta_{1}$$

Sorry if this is complete rubbish, I'm fairly new to doing rigorous $\varepsilon$ - $\delta$ proofs.

"Don't panic!" said:
Let $\varepsilon >0$ be any (positive) number. Then as $g$ is differentiable at $x$, there is a number $\delta_{2} >0$, such that $$0< \vert h\vert < \delta_{2} \qquad\Rightarrow\qquad\vert v(x,h)\vert < \frac{\delta_{1}}{2 \vert h\vert}$$
I see that you've noticed that I missed a factor of ##|h|## when I described how to proceed. Unfortunately, the solution isn't to put an ##|h|## in the denominator on the right. That right-hand side must be independent of ##h##. I will need to think about how to correct my attempt to complete the proof.

If you want to prove that ##|g'(x)-g'(x)|## is less than some positive real number ##r##, then all you have to say is ##|g'(x)-g'(x)|=0<r##. No need to mention limits.

OK, new attempt. We want to prove that ##\lim_{h\to 0}w\big(g(x),g'(x)h+v(x,h)h\big)=0##. For all non-zero ##h##, we have ##g'(x)h+v(x,h)h=g(x+h)-g(x)##. This means that it will be sufficient to prove that ##\lim_{h\to 0}w\big(g(x),g(x+h)-g(x)\big)=0##. This makes things easier.

Let ##\varepsilon>0## be arbitrary. Let ##\delta_1## be a positive real number such that the following implication holds for all ##k## such that ##g(x)+k## is in the domain of ##f##:
$$0<|k|<\delta_1\ \Rightarrow\ |w(g(x),k)|<\varepsilon.$$ Such a ##\delta_1## exists because ##f## is differentiable at ##g(x)##. Let ##\delta## be a positive real number such that the following implication holds for all ##h## such that ##x+h## is in the domain of ##g##:
$$0<|h|<\delta\ \Rightarrow\ |g(x+h)-g(x)|<\delta_1.$$ Such a ##\delta## exists because ##g## is continuous at ##x##. Now let ##h## be an arbitrary real number such that

(a) ##0<|h|<\delta##.
(b) ##x+h## is in the domain of ##g##.
(c) ##g(x+h)## is in the domain of ##f##.
(d) ##g(x+h)\neq g(x)##.

By definition of ##\delta##, we have ##|g(x+h)-g(x)|<\delta_1##. By definition of ##\delta_1##, this implies that ##\big|w\big(g(x),g(x+h)-g(x)\big)\big|<\varepsilon##. Since ##h## is an arbitrary real number that satisfies (a)-(d), this means that ##\lim_{h\to 0}w\big(g(x),g(x+h)-g(x)\big)=0##.

Note that (b)-(d) only ensure that ##h## is in the domain of the map ##t\mapsto w(g(x),g(x+t)-g(x))##.

Edit: There's an issue that the above doesn't account for. Condition (d) is supposed to ensure that there's no division by 0, but if ##g## is constant on an open neighborhood of ##x##, (d) is insufficient. We may have to handle the case of a constant ##g## separately. Even if ##g## isn't constant, there could be a non-zero ##h## such that ##g(x+h)=g(x)##. If there's a finite number of such ##h##, we can just choose ##\delta## small enough. If there's a whole sequence of them that goes to 0, then we can't do that, but we should be able to prove that this implies that ##g## isn't differentiable at ##x##, contradicting our assumptions.
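A standard concrete example of the troublesome behaviour described here (my addition, not from the thread) is ##g(x)=x^2\sin(1/x)## with ##g(0)=0##: it is differentiable at ##0## with ##g'(0)=0##, yet ##g(h)=g(0)=0## along the whole sequence ##h=1/(n\pi)\to 0##.

```python
import math

# Standard example: g(x) = x**2 * sin(1/x), g(0) = 0.
# g is differentiable at 0 with g'(0) = 0, yet g(h) = g(0) = 0
# at h = 1/(n*pi) for every positive integer n -- a sequence -> 0.
def g(x):
    return x * x * math.sin(1.0 / x) if x != 0 else 0.0

# Difference quotient (g(h) - g(0))/h = h*sin(1/h) -> 0, so g'(0) = 0:
for h in [1e-2, 1e-4, 1e-6]:
    assert abs(g(h) / h) <= abs(h)   # |h sin(1/h)| <= |h|

# Nonzero points where g vanishes, arbitrarily close to 0
# (exactly zero in exact arithmetic; tiny here because pi is a float):
zeros = [1.0 / (n * math.pi) for n in (1, 10, 100)]
print([abs(g(h)) for h in zeros])
```

So the "whole sequence of them that goes to 0" case really can occur for a differentiable ##g##, which is why the classical proofs below handle it separately.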

Thanks for looking into it further. Appreciate all the time you've put into this - it always seems to be the things that seem 'intuitive' that are the hardest to prove rigorously!

I need to do these things once in a while, or I'll forget how to do them. So the exercise was welcome. I think I still like a straightforward ε-δ proof the best. Instead of the w and the v in the proof above, I'll use the notation
$$f(x+h)=f(x)+f'(x)h+R_f(x,h),$$ and a similar notation for ##g##. Note that ##R_f(g(x),k)## is well-defined even when ##k=0##. We will use this to solve the division by zero issue. If we temporarily define ##k=g'(x)h+R_g(x,h)##, we have
$$f(g(x+h))=f(g(x)+g'(x)h+R_g(x,h))=f(g(x)+k)=f(g(x))+f'(g(x))k+R_f(g(x),k).$$ This implies that
\begin{align}
&\left|\frac{(f\circ g)(x+h)-(f\circ g)(x)}{h}-f'(g(x))g'(x)\right| =\left|\frac{f'(g(x))k+R_f(g(x),k)}{h}-f'(g(x))g'(x)\right|\\
&=\left|\frac{f'(g(x))g'(x)h+f'(g(x))R_g(x,h)+R_f(g(x),k)}{h}-f'(g(x))g'(x)\right| =\left|\frac{f'(g(x))R_g(x,h)+R_f(g(x),k)}{h}\right|\\
&\leq|f'(g(x))|\left|\frac{R_g(x,h)}{h}\right|+\left|\frac{R_f(g(x),k)}{h}\right|.
\end{align}
Because of this, it's sufficient to find a ##\delta>0## such that if ##0<|h|<\delta##, then each of the two terms above is less than ##\frac\varepsilon 2##. Choose ##\delta_1>0## such that
$$0<|k|<\delta_1\ \Rightarrow\ \left|\frac{R_f(g(x),k)}{k}\right|<\frac{\varepsilon}{4|g'(x)|}.$$ Then choose ##\delta>0## such that if ##0<|h|<\delta##, then all of the following inequalities hold:
\begin{align}
&\left|\frac{R_g(x,h)}{h}\right| <\frac{\varepsilon}{2|f'(g(x))|},\\
&\left|\frac{R_g(x,h)}{h}\right| <|g'(x)|,\\
&|k|=|g'(x)h+R_g(x,h)|=|g(x+h)-g(x)|<\delta_1.
\end{align} Now let ##h## be an arbitrary real number such that ##x+h## is in the domain of ##g##, ##g(x+h)## is in the domain of ##f##, and ##0<|h|<\delta##. We have
$$|f'(g(x))|\left|\frac{R_g(x,h)}{h}\right|<\frac{\varepsilon}{2}.$$ If ##k=0##, we have
$$\left|\frac{R_f(g(x),k)}{h}\right|=0<\frac\varepsilon 2.$$ If ##k\neq 0##, we have
\begin{align}
&\left|\frac{R_f(g(x),k)}{h}\right| =\left|\frac{R_f(g(x),k)}{k}\right|\left|\frac k h\right| =\left|\frac{R_f(g(x),k)}{k}\right| \left|\frac{g'(x)h+R_g(x,h)}{h}\right|\\
&\leq \left|\frac{R_f(g(x),k)}{k}\right| \left(|g'(x)|+\left|\frac{R_g(x,h)}{h}\right|\right) < \frac{\varepsilon}{4|g'(x)|}2|g'(x)|=\frac\varepsilon 2.
\end{align} These results imply that the following implication holds for all real numbers ##h## such that ##x+h## is in the domain of ##g## and ##g(x+h)## is in the domain of ##f##:
$$0<|h|<\delta\ \Rightarrow\ \left|\frac{(f\circ g)(x+h)-(f\circ g)(x)}{h}-f'(g(x))g'(x)\right|<\varepsilon.$$
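As a numerical illustration of this last proof (the concrete choices ##f=\exp##, ##g=\sin## are mine, just for the demo), both terms in the final bound, ##|R_g(x,h)/h|## and ##|R_f(g(x),k)/h|##, really do shrink as ##h\to 0##:

```python
import math

# Illustration (not part of the proof), with f = exp and g = sin:
# R_f(x,h) = f(x+h) - f(x) - f'(x)*h, and similarly for g.
def R(func, dfunc, x, h):
    return func(x + h) - func(x) - dfunc(x) * h

x = 0.5
rg_over_h = []
rf_over_h = []
for h in [1e-2, 1e-3, 1e-4]:
    k = math.sin(x + h) - math.sin(x)                      # k = g(x+h) - g(x)
    rg_over_h.append(abs(R(math.sin, math.cos, x, h) / h))
    rf_over_h.append(abs(R(math.exp, math.exp, math.sin(x), k) / h))

# Both terms in the bound above vanish linearly in h:
print(rg_over_h, rf_over_h)
```

Both lists decrease by roughly a factor of ten per step, matching the fact that each remainder is ##O(h^2)## for these smooth functions.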

The easiest proof I found was by treating the initial function as parametric. Thus, a third variable is introduced, making the proof neater.

MidgetDwarf said:
The easiest proof I found was by treating the initial function as parametric. Thus, a third variable is introduced, making the proof neater.
I haven't seen that proof. Can you show us?

I always prove it as follows:

if f(t) = y(x(t)), y(0) = x(0) = 0, and if x(t) ≠ 0 for all small t ≠ 0, then ∆y/∆t = (∆y/∆x)·(∆x/∆t), so in the limit,

dy/dt = dy/dx · dx/dt.

If on the other hand x(t) = 0 for t converging to 0, then dx/dt = 0, and we only have to prove that dy/dt = 0. But ∆y/∆t = 0 for those t for which x(t) = 0, and for the others, the previous argument works.

I haven't done this for about 50 years, but let's see: We start out with two functions f and g, both of which are differentiable. Then we seek the derivative of f(g(x)):
$$\frac{\mathrm{df}}{\mathrm{dx} } = \lim_{\Delta x\rightarrow0}\frac{\Delta f}{\Delta x} = \lim_{\Delta x\rightarrow0}\frac{\Delta f}{\Delta g} \frac{\Delta g}{\Delta x}$$
Since g is differentiable, we have $$\Delta x \rightarrow 0 \Rightarrow \Delta g \rightarrow 0$$
and, of course, $$\lim_{\Delta x\rightarrow0}\frac{\Delta g}{\Delta x} = \frac{\mathrm{dg}}{\mathrm{dx} }$$
we end up with $$\lim_{\Delta g\rightarrow0}\frac{\Delta f}{\Delta g}\frac{\mathrm{dg}}{\mathrm{dx} } = \frac{\mathrm{df}}{\mathrm{dg} }\frac{\mathrm{dg}}{\mathrm{dx} }$$

I like your limit proof Fredrik.
Thanks for the input from you all. It would be interesting to see that 3 variable proof if possible?!

Interestingly I found this other proof for the chain rule http://tutorial.math.lamar.edu/Classes/CalcI/DerivativeProofs.aspx (at the very bottom of the page) it seems to be pretty rigorous, but have a look and see what you think.

mathwonk said:
i always prove it s follows:

if f(t) = y(x(t)), y(0) = x(0) = 0, and if x(t) ≠ 0 for all small t ≠ 0, then ∆y/∆t = (∆y/∆x)·(∆x/∆t), so in the limit,

dy/dt = dy/dx · dx/dt.

If on the other hand x(t) = 0 for t converging to 0, then dx/dt = 0, and we only have to prove that dy/dt = 0. But ∆y/∆t = 0 for those t for which x(t) = 0, and for the others, the previous argument works.
I don't understand this. First you make assumptions about the functions, losing generality. (Perhaps the generality can be recovered by using this theorem to prove the more general result). Then you're talking about "small t". There's some fancy stuff hidden in that remark. What you're really saying is (I think) that there's an open neighborhood of 0 on which x takes non-zero values except at 0. Only a person who has studied topology will understand this. And someone who understands it will still have to consider the possibility that it isn't true. What if x(t)=0 for infinitely many values of t, in every open neighborhood of 0? Then we'd have to prove that either x'(0)=0, or x isn't differentiable at 0.

There's also some fancy stuff hidden in the "so in the limit" comment at the end. It's far from obvious that this statement is true. I'd say that the key part of the proof is to show that it is.

Svein said:
we end up with $$\lim_{\Delta g\rightarrow0}\frac{\Delta f}{\Delta g}\frac{\mathrm{dg}}{\mathrm{dx} } = \frac{\mathrm{df}}{\mathrm{dg} }\frac{\mathrm{dg}}{\mathrm{dx} }$$
This is the part that's hard to prove. It takes a lot more than just a statement.

I don't know if you will like this any better, but these are notes I wrote for my class on this topic years ago:

Here is the traditional proof of the chain rule which appeared in calculus books at the turn of the century before being "lost".

Trivial lemma: if the domain of a function f is the union of two sets and the restriction of f to each of those sets converges to 0 as x approaches a, then f itself converges to 0 as x approaches a.

Now assume z(y(x)) is a composite of two differentiable functions and that on every deleted neighborhood of a, ∆y = 0 somewhere. Then clearly dy/dx = 0 at a. Hence proving the chain rule there means showing that ∆z/∆x approaches 0. On the set where ∆y = 0, ∆z/∆x also equals 0, so this set poses no problem. On the set where ∆y ≠ 0, we have ∆z/∆x = (∆z/∆y)(∆y/∆x), so the result follows by the product rule for limits.

From this point of view, the so called "problem set" is the easier one to deal with.

This result was traditionally proved correctly in turn-of-the-century English language books, such as Pierpont's Theory of Functions of a Real Variable, and in 19th century European books such as that of Tannery [see the article by Carslaw, in vol XXIX of B.A.M.S.], but unfortunately not in the first three editions of the influential book Pure Mathematics, by G. H. Hardy. Although Hardy reinstated the classical proof in later editions, modern books usually deal with the problem by giving the slightly more sophisticated linear approximation proof, or making what to me are somewhat artificial constructions. The classical proof seems to have merit, so I recall it here.

The point is simply that in proving a function has limit L, one only needs to prove it at points where the function does not already have value L. Thus to someone who says that the usual argument for the chain rule for y(u(x)) does not work for x's where ∆u = 0, one can simply reply that these points are irrelevant.

Assume f is differentiable at g(a), g is differentiable at a, and on every neighborhood of a there are points x where g(x) = g(a). We claim the derivative of f(g(x)) at a equals f'(g(a))·g'(a).

Proof:

1) Clearly under these hypotheses, g'(a) = 0. Consequently,

2) the chain rule holds at a if and only if lim ∆f/∆x = 0 as x approaches a.

3) Note that ∆f = 0, and hence ∆f/∆x = 0, at all x such that g(x) = g(a).

4) In general, to prove that lim h(x) = L as x approaches a, it suffices to prove it for the restriction of h to those x such that h(x) ≠ L.

5) Thus in arguing that ∆f/∆x approaches 0, we may restrict to x such that g(x) ≠ g(a), where the usual argument applies.

Fredrik said:
This is the part that's hard to prove. It takes a lot more than just a statement.

I was going to explain my meaning, but when I read the post above, I felt that mathwonk had already presented my arguments. So I'll refrain.

Since this is a bit confusing and I would like it to be better understood, I will add a bit more detail:

Classical proof of the chain rule. (I have corrected several typos since first posting this.)

Chain rule: if y is a function of x, which is a function of t, and if the functions y and x are differentiable, at x0 = x(t0), and at t0, respectively, then y(x(t)) is differentiable at t0, and at t0, dy/dt = dy/dx.dx/dt, where the derivative of x is evaluated at x0.

The usual proof of the chain rule, in the generic case, is to factor ∆y/∆t = (∆y/∆x)·(∆x/∆t), and take the limit as t-->t0. Since the limit of ∆x/∆t is then assumed to exist and to equal dx/dt, and since thus x(t) is also continuous at t0, letting t-->t0 causes also x-->x0, so the limit of ∆y/∆x as t-->t0 equals its limit as x-->x0, which is assumed to exist and to equal dy/dx. Then by the product rule for limits, the limit of ∆y/∆t also exists and equals the product dy/dx·dx/dt.

This argument assumes that the fractions in the factorization make sense, i.e. that for ∆t ≠ 0, we also have ∆x ≠ 0. Thus there is one special case to argue, when this does not happen. I.e. we must also argue the validity of the formula and the limit when, on every deleted interval about t0, there is a t for which ∆x = 0.

This argument is based on a simple principle. Namely, if we partition a deleted interval about t0 into two disjoint subsets, A and B, both of which contain points arbitrarily near t0, and if g is a function defined on some open interval containing t0, then the limit of g(t) exists as t-->t0 and equals L, if and only if the limits of both restrictions of g, to A and to B, also exist and equal L, as t-->t0, but with t constrained to lie first in A and then in B.

Now we apply this rule to the function ∆y/∆t, defined on a deleted interval about t0. We partition this interval into the set A of those points t ≠ t0 where ∆x ≠ 0, and the set B of those points t ≠ t0 where ∆x = 0.

If B contains points arbitrarily near t0, then since the limit dx/dt exists, the limit also exists when we restrict to the set B, and that restricted limit has the same value. Since on the set B, by definition, ∆x has value constantly zero, the limit of that restriction is also zero, as is therefore the derivative dx/dt.

Hence the equation we wish to prove, dy/dt = dy/dx.dx/dt, equals zero on the right side, so it suffices to prove we have limit zero also on the left hand side.

I.e. we must prove the limit of ∆y/∆t exists and equals zero as t-->t0, for which it suffices to prove this for both restrictions of ∆y/∆t, to the set A and to the set B. Now by definition of the set B, on this set ∆x = 0 and thus also ∆y = 0, so the function ∆y/∆t is identically zero on the set B, whence this restriction definitely has limit zero.

But on the set A, we can factor ∆y/∆t = (∆y/∆x)·(∆x/∆t), and argue that the limit exists and (since we know that ∆x/∆t --> dx/dt = 0) equals zero by the product rule as before. This concludes the proof.

@Fredrik: did this help the misunderstanding?

mathwonk said:
@Fredrik: did this help the misunderstanding?
Sorry mathwonk, I've been really busy and haven't had time to really examine your posts yet. I will take a closer look tomorrow or Saturday.

@mathwonk: I read your post #23 soon after I wrote the above, but I didn't have time to read post #25 at the same time. I didn't come back here to do that for some time, but I have now. After reading post #23, I understood the idea of the proof, and after reading post #25, I also understood how you're using the lemma mentioned in post #23.

My only concern is that there's still some work required to show rigorously that ##\Delta y/\Delta x\to dy/dx## as ##t\to t_0##. I would be able to do it, but someone who's new at it would probably find it very difficult.

That is because x(t) is assumed differentiable, hence also continuous, so as t-->t0, also x-->x0.

And we know by hypothesis that Δy/Δx→dy/dx as x-->x0.

mathwonk said:
that is because x(t) is assumed differentiable, hence also continuous, so as t-->t0, also x-->x0.

and we know by hypothesis that Δy/Δx→dy/dx as x-->x0.
Right, but this isn't a proof. It's a couple of observations that you can use as the starting point of a proof.

I think this argument can be misleading. Someone might interpret it as a simple application of the transitivity of the logical implication operation. What I mean is that it might be interpreted as

We know that ##t\to t_0\Rightarrow x\to x_0\Rightarrow \Delta y/\Delta x\to dy/dx##, so we can conclude that ##t\to t_0\Rightarrow \Delta y/\Delta x\to dy/dx##.​

Of course, a statement like "##f(x)\to A## as ##x\to a##" doesn't mean that ##x\to a\Rightarrow f(x)\to A##. The strings "##x\to a##" and "##f(x)\to A##" aren't even statements.

You lost me. I think I have given a complete proof, with ample detail.

@ Fredrik: Sorry to be so slow. Did you want something like this?

Lemma: If (1) f(x)-->L as x-->x0, and (2) x(t)-->x0 as t-->t0, then f (x(t))-->L as t-->t0.
proof: given e>0 choose d1>0 so that |x-x0|<d1 implies | f(x)-L | < e. (ok by (1)).

Then choose d>0 so that |t-t0| < d implies |x(t)-x0| < d1. (ok by (2)).

Then |t-t0| < d implies |f(x(t)) - L| < e. QED.

(I admit I tend to take for granted that someone can fill in such details on his own. But i do recall being puzzled by exactly such matters, hmmm, maybe some 50 years ago.)

Yes, that's what I had in mind. That completes the proof.

Here is another nice way to make logical sense of vague statements like "t-->t0 implies x-->x0".

Think of t as a sequence {tn}, and tn-->t0 as the statement "the sequence {tn} converges to t0 (as n goes to infinity)".

Then the statement "tn-->t0 implies x(tn)-->x0" is an implication, and since "xn-->x0 implies f(xn)-->L" is also an implication,

one can compose them and get that "tn-->t0 implies f(x(tn))-->L".

mathwonk said:
think of t as a sequence {tn}, and tn-->t0 as the statement "the sequence {tn} converges to t0 (as n goes to infinity)".
Mathematically, this is not equivalent to an ε - δ argument. A sequence contains a countable set of points; the ε - δ argument talks about all points (usually an uncountable set). The argument would make sense in ℚ, but then the limit might not be in ℚ.

As a matter of fact, the sequence approach is equivalent to the epsilon-delta approach for convergence in the reals, as you can show as an easy exercise, i.e.

lemma: the following are equivalent:
1) for every e>0 there is a d>0 such that |x-x0| < d implies |f(x)-L| < e.

2) for every sequence {xn} converging to x0, the sequence {f(xn)} converges to L.

The point is that there is a countable sequence of rationals, {1/n} if you like, converging to 0, and if (1) fails, these can be used (as d's) to construct a sequence for which (2) fails. The proof that if (1) is true then (2) is true is even easier.
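A concrete illustration of the sequential criterion (a standard example, my addition, not from the thread): f(x) = sin(1/x) has no limit as x-->0, and condition (2) detects this because two sequences tending to 0 give different limits of f(xn).

```python
import math

# Sequential criterion demo: f(x) = sin(1/x) has no limit as x -> 0.
# Two sequences x_n -> 0 along which f takes different constant values:
f = lambda x: math.sin(1.0 / x)

xs_a = [1.0 / (2 * n * math.pi + math.pi / 2) for n in (1, 5, 50)]  # f = 1 here
xs_b = [1.0 / (n * math.pi) for n in (1, 5, 50)]                    # f = 0 here

vals_a = [f(x) for x in xs_a]
vals_b = [f(x) for x in xs_b]
print(vals_a, vals_b)  # approximately [1, 1, 1] and [0, 0, 0]
```

Since the two sequences yield different limits (1 and 0), no single L can satisfy condition (2), so by the lemma the ε-δ limit cannot exist either.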

Sequential convergence is equivalent to more general convergence in any "first countable space", such as any metric space. Here is a little article on them from Wikipedia:

http://en.wikipedia.org/wiki/First-countable_space