The chain rule: If we have a composite function y(u(x)), and both component functions y(u) and u(x) are differentiable, then we claim that y(u(x)) is also differentiable, and that its derivative equals y’(u).u’(x) = y’(u(x)).u’(x).
I.e. we claim the limit of (∆y/∆x), as ∆x --> 0, equals y’(u).u’(x).
We are assuming that (∆u/∆x) --> u’(x) as ∆x --> 0, and also that (∆y/∆u) --> y’(u) as ∆u --> 0.
Moreover, by definition of continuity, and since differentiable functions are continuous, we know ∆u --> 0 whenever ∆x --> 0, hence both (∆u/∆x) --> u’(x) and (∆y/∆u) --> y’(u), as ∆x --> 0.
Thus by the product rule for limits, (∆y/∆x) = (∆y/∆u)(∆u/∆x) --> y’(u).u’(x), as ∆x --> 0,
as long as the factorization (∆y/∆x) = (∆y/∆u)(∆u/∆x) makes sense for all small values of ∆x.
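To see the factorization at work in the harmless case, here is a small numerical sketch. The functions u(x) = x^2 and y(u) = sin(u), the point x = 1, and all the names in the code are my own illustrative choices, not part of the argument; for this u, ∆u is nonzero for every small nonzero ∆x, so the factorization always makes sense.

```python
import math

def u_smooth(x):
    # Illustrative inner function (an assumption): u(x) = x^2
    return x * x

def y_outer(v):
    # Illustrative outer function (an assumption): y(u) = sin(u)
    return math.sin(v)

def quotients(x, dx):
    """Return (dy/dx, dy/du, du/dx) as difference quotients at x."""
    du = u_smooth(x + dx) - u_smooth(x)
    dy = y_outer(u_smooth(x + dx)) - y_outer(u_smooth(x))
    return dy / dx, dy / du, du / dx

x0 = 1.0
# Value predicted by the chain rule: y'(u(x0)) . u'(x0) = cos(x0^2) . 2 x0
exact = math.cos(x0 * x0) * 2.0 * x0

for dx in (1e-2, 1e-4, 1e-6):
    dy_dx, dy_du, du_dx = quotients(x0, dx)
    # dy/dx and the product (dy/du)(du/dx) agree, and both approach `exact`
    print(dx, dy_dx, dy_du * du_dx, exact)
```

As ∆x shrinks, the two middle columns agree with each other and drift toward the last column, which is what the product rule for limits predicts.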
The factorization (∆y/∆x) = (∆y/∆u)(∆u/∆x) fails to make sense only if there are arbitrarily small values of ∆x for which ∆u = 0.
Now in that case, it follows that the fraction ∆u/∆x equals zero for all those ∆x’s, and since this fraction has limit u’(x), the only possible value of that limit is zero. I.e. in the only case where the proof does not work, we know that u’(x) = 0.
Thus for the theorem to hold in that case as well, we only need to prove that the derivative of y(u(x)) with respect to x equals y’(u).u’(x) = y’(u).0 = 0. I.e. all we have to do is prove that in this case the fraction ∆y/∆x still approaches zero, even though we cannot always factor it into a product of fractions.
The secret is to notice that we can still factor it as ∆y/∆x = (∆y/∆u)(∆u/∆x), for those ∆x’s where ∆u ≠ 0. I.e. there are two kinds of ∆x’s, those for which ∆u = 0, and those for which ∆u ≠ 0.
But when ∆u = 0, it is trivial that the fraction ∆y/∆x = 0, since ∆y is then the difference of the values of y at two equal values of u, so of course it equals zero. And in the case where ∆u ≠ 0, we can still factor the fraction as ∆y/∆x = (∆y/∆u)(∆u/∆x), and use the product argument as before: (∆y/∆u) --> y’(u) and (∆u/∆x) --> u’(x) = 0, so the product approaches y’(u).0 = 0.
Thus for both types of values of ∆x in this case, we have ∆y/∆x --> 0 = y’(u).u’(x), and so the chain rule holds in all cases.
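Here is a small sketch of the two kinds of ∆x’s, under assumptions of my own. The textbook example of an inner function with ∆u = 0 for arbitrarily small ∆x is u(x) = x^2.sin(1/x) at x = 0, but its zeros at ∆x = 1/(n.pi) are not exact in floating point, so the code below uses a hypothetical stand-in: u is 0 at x = 0 and at exact powers of two, and x^2 elsewhere. Since |u(x)| ≤ x^2, this u is differentiable at 0 with u’(0) = 0. The outer function y(u) = sin(u) + 3u is likewise just an illustrative choice, with y’(0) = 4.

```python
import math

def u_zeros(x):
    """Hypothetical inner function: 0 at x = 0 and at exact powers of two, x^2 elsewhere."""
    if x == 0.0:
        return 0.0
    mantissa, _ = math.frexp(x)  # x = mantissa * 2**e with 0.5 <= |mantissa| < 1
    return 0.0 if abs(mantissa) == 0.5 else x * x

def y_of(v):
    """Hypothetical outer function with y'(0) = 4."""
    return math.sin(v) + 3.0 * v

def composite_quotient(dx):
    """Return (du, dy/dx) for the composite y(u(x)) at x = 0."""
    du = u_zeros(dx) - u_zeros(0.0)
    dy = y_of(u_zeros(dx)) - y_of(u_zeros(0.0))
    return du, dy / dx

for n in range(4, 12, 2):
    # 2^-n is a power of two (du == 0); 3 * 2^-n is not (du != 0)
    for dx in (2.0 ** -n, 3.0 * 2.0 ** -n):
        du, q = composite_quotient(dx)
        kind = "du == 0" if du == 0.0 else "du != 0"
        print(f"dx={dx:.6g}  {kind}  dy/dx={q:.6g}")
```

For the ∆x’s with ∆u = 0 the quotient ∆y/∆x is exactly zero, and for the others it is roughly 4.∆x, so along both kinds of ∆x’s it approaches 0 = y’(u).u’(x).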
This result was traditionally proved correctly in turn-of-the-20th-century English-language books, such as Pierpont's Theory of functions of a real variable, and in 19th century European books such as that of Tannery [see the article by Carslaw, in vol XXIX of B.A.M.S.], but unfortunately not in the first three editions of the influential book Pure Mathematics, by G. H. Hardy. Although Hardy reinstated the classical proof in later editions, modern books usually deal with the problem either by giving the slightly more sophisticated linear approximation proof, or by making what to me are somewhat artificial constructions.
Summary:
The point is simply that in proving a function has limit L, one only needs to prove it at points where the function does not already have value L. Thus to someone who says that the usual argument for the chain rule for y(u(x)) does not work for ∆x's where ∆u = 0, one can simply reply that these points are irrelevant: there the quotient ∆y/∆x already equals the limit 0.