"Don't panic!"
- 600
- 8
I'm currently reviewing my knowledge of calculus and trying to include rigorous(-ish) proofs in my personal notes, as I don't like accepting things in maths at face value. I've constructed a proof of the chain rule and was wondering if people wouldn't mind checking it and letting me know whether it is correct (and what improvements may need to be made). Thanks for your time.
From the definition of the derivative of a differentiable function f:\mathbb{R}\rightarrow\mathbb{R} (one-dimensional case), we have that f'(x)=\frac{df}{dx}=\lim_{\Delta x\rightarrow 0}\frac{f(x+\Delta x)-f(x)}{\Delta x}= \lim_{\Delta x\rightarrow 0}\frac{\Delta f}{\Delta x}
This implies that \frac{\Delta f}{\Delta x}= f'(x) +\varepsilon (x)\quad\Rightarrow\quad\Delta f = \left(f'(x)+\varepsilon (x)\right)\Delta x
where \varepsilon (x) is some error function which accounts for the difference between the actual (finite) change in f and its linear approximation f'(x)\Delta x. Furthermore, \varepsilon (x) satisfies the property \lim_{\Delta x\rightarrow 0}\varepsilon (x)=0 such that as \Delta x \rightarrow 0,\quad\frac{\Delta f}{\Delta x}\rightarrow f'(x).
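As a numerical aside (not part of the proof), the behaviour of the error term can be checked for a concrete choice of function, say f(x) = x^3 at x = 2, where f'(2) = 12:

```python
# Numerical sanity check (illustrative only): for f(x) = x**3 at x = 2,
# the error term eps = (f(x+dx) - f(x))/dx - f'(x) shrinks as dx -> 0,
# matching the property lim_{dx -> 0} eps = 0 used above.
def f(x):
    return x ** 3

def fprime(x):
    return 3 * x ** 2

x = 2.0
errors = []
for dx in (1e-1, 1e-2, 1e-3):
    eps = (f(x + dx) - f(x)) / dx - fprime(x)
    errors.append(abs(eps))

# The error decreases as dx shrinks.
assert errors[0] > errors[1] > errors[2]
```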
Now, consider a function y=f\circ g(x)=f(g(x)) and let u=g(x). We then have that
\Delta u = g(x+\Delta x)-g(x)=\left(g'(x)+\varepsilon_{1}(x)\right)\Delta x
\Delta y = f(u+\Delta u)-f(u)=\left(f'(u)+\varepsilon_{2}(u)\right)\Delta u
Note that \varepsilon_{2}(u)\rightarrow 0 as \Delta u\rightarrow 0; and since \Delta u\rightarrow 0 as \Delta x\rightarrow 0, it follows that \varepsilon_{2}(u)\rightarrow 0 as \Delta x\rightarrow 0.
And so,
f(u+\Delta u)-f(u)=\left(f'(u)+\varepsilon_{2}(u)\right)\Delta u
\Rightarrow\quad f(g(x+\Delta x))-f(g(x))=f\circ g(x+\Delta x)-f\circ g(x)
=\left(f'(g(x))+\varepsilon_{2}(g(x))\right)\cdot\left(g'(x)+\varepsilon_{1}(x)\right)\Delta x
=f'(g(x))g'(x)\Delta x +\left(f'(g(x))\varepsilon_{1}+g'(x)\varepsilon_{2}+\varepsilon_{1}\varepsilon_{2}\right)\Delta x
=f'(g(x))g'(x)\Delta x +\varepsilon_{3}\Delta x
where \varepsilon_{3}\equiv f'(g(x)) \varepsilon_{1}+g'(x)\varepsilon_{2}+\varepsilon_{1}\varepsilon_{2}. We see from this that as \Delta x\rightarrow 0,\quad\varepsilon_{3}\rightarrow 0. Hence,
\lim_{\Delta x\rightarrow 0}\frac{f\circ g(x+\Delta x)-f\circ g(x)}{\Delta x}= (f\circ g)'(x)=f'(g(x))g'(x)
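Independently of whether the proof itself holds up, the conclusion can be sanity-checked numerically. Here is a sketch for one hypothetical concrete choice, f(u) = sin(u) and g(x) = x^2, so the result above gives (f\circ g)'(x) = \cos(x^2)\cdot 2x:

```python
import math

# Illustrative choice (my own, not from the proof): f(u) = sin(u), g(x) = x**2.
# The chain rule predicts (f o g)'(x) = cos(x**2) * 2*x.
def comp(x):
    return math.sin(x ** 2)

x = 1.3
dx = 1e-6
# Central finite difference approximation to the derivative of f o g.
numeric = (comp(x + dx) - comp(x - dx)) / (2 * dx)
exact = math.cos(x ** 2) * 2 * x

assert abs(numeric - exact) < 1e-6
```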
Here is the traditional proof of the chain rule which appeared in calculus books at the turn of the century before being "lost".

Trivial lemma: if the domain of a function f is the union of two sets, and the restriction of f to each of those sets converges to 0 as x approaches a, then f itself converges to 0 as x approaches a.

Now assume z(y(x)) is a composite of two differentiable functions and that in every deleted neighborhood of a, ∆y = 0 somewhere. Then clearly dy/dx = 0 at a, so the chain rule predicts dz/dx = (dz/dy)·0 = 0 there; hence to prove the chain rule at a means to show that ∆z/∆x approaches 0. On the set where ∆y = 0, ∆z/∆x also equals 0, so this set poses no problem. On the set where ∆y ≠ 0, we have ∆z/∆x = (∆z/∆y)(∆y/∆x), so the result follows by the product rule for limits and the lemma. From this point of view, the so-called "problem set" is the easier one to deal with.

This result was traditionally proved correctly in turn-of-the-century calculus books.
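The hypothesis "∆y = 0 somewhere in every deleted neighborhood of a" is not vacuous; the standard example (my addition, not from the answer above) is y(x) = x² sin(1/x) with y(0) = 0, which is differentiable at 0 with y'(0) = 0 yet vanishes at every x = 1/(nπ):

```python
import math

# y(x) = x**2 * sin(1/x), extended by y(0) = 0: differentiable at 0 with
# y'(0) = 0, yet y vanishes at x = 1/(n*pi) for every positive integer n,
# so Delta-y = 0 somewhere in every deleted neighborhood of 0.
def y(x):
    return x ** 2 * math.sin(1 / x) if x != 0 else 0.0

# y vanishes (up to floating-point roundoff) at x_n = 1/(n*pi)...
for n in range(1, 6):
    xn = 1 / (n * math.pi)
    assert abs(y(xn)) < 1e-15

# ...and the difference quotient at 0 still tends to y'(0) = 0.
assert abs(y(1e-8) / 1e-8) < 1e-7
```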