
Help Understanding Stewart Chain Rule Proof [Picture Provided]

  1. Mar 8, 2012 #1
    Hi I've spent a long time trying to understand this chain rule proof but I just can't get it...

    I have attached 2 pictures: the second one is an intuitive chain rule proof that turns out to be bogus and the first is the correct proof. So I am trying to understand first of all what does the second proof actually mean, and secondly what did they do to fix the problem presented by the first invalid proof.

    Basically any and all help for me to gain insight into this proof would be appreciated. Don't be afraid to write too much or too little because I am very lost about what they are doing. What does the "property" that they lay out (resulting in equation 7) before they start the actual proof actually mean and how does it help their proof? Why introduce epsilon?

    I wish there was someone here in person to walk me through it but I guess I will have to try my best to get it over the internet.

    Thanks a lot!

    Last edited: Mar 8, 2012
  3. Mar 9, 2012 #2


    Science Advisor
    Homework Helper

    Here is the traditional proof of the chain rule (Stewart's first proof, made correct):

    We have a composite function y(u(x)), and assume both component functions y(u) and u(x) are differentiable, and we claim that also y(u(x)) is differentiable, and that its derivative equals y’(u).u’(x) = y’(u(x)).u’(x).

    All we have to do is show the limit of (∆y/∆x) as ∆x -->0, equals y’(u).u’(x).

    We are assuming that (∆u/∆x)-->u’(x) as ∆x -->0, and also that (∆y/∆u)-->y’(u) as ∆u -->0.

    Of course one needs to know the meaning of a limit. I.e. (∆y/∆u)-->y’(u) as ∆u -->0, means that the fraction (∆y/∆u) gets really close to the number y’(u) as long as ∆u is really small but not zero, and the same for the other limit.

    Unfortunately since the function u(x) is not assumed to be “one to one”, it can happen that two different values of x give the same value of u, and then we would have ∆u = 0 even though ∆x ≠ 0.

    So if we try to break up the fraction ∆y/∆x into a product (∆y/∆u)(∆u/∆x) and use the product rule for limits, we have a problem since this product may not really equal ∆y/∆x for all ∆x that is really small but not zero. I.e. if there is a small non zero ∆x such that ∆u = 0, then ∆y/∆x does not equal (∆y/∆u)( ∆u/∆x), since the fraction (∆y/∆u) does not make sense.

    Now we get to start from as small a ∆x as we want in this limit, so if there is ever a ∆x so small that ∆u ≠ 0 for that ∆x and also for all smaller ∆x, there is no problem. So the only case where we have not proved the chain rule is when there is a sequence of ∆x’s approaching zero, and for all of them we still have ∆u = 0.
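    To see that the troublesome case really can occur, here is a minimal sketch in Python with exact rational arithmetic. The functions u and y below are hypothetical examples chosen for illustration, not anything from Stewart: u(x) = x² · (distance from 1/x to the nearest integer), with u(0) = 0, is differentiable at 0 and vanishes at every x = 1/n, so at a = 0 there are arbitrarily small ∆x with ∆u = 0.

```python
from fractions import Fraction

def u(x):
    """Hypothetical inner function: x^2 * (distance from 1/x to nearest integer), u(0) = 0."""
    if x == 0:
        return Fraction(0)
    t = 1 / Fraction(x)
    return Fraction(x) ** 2 * abs(t - round(t))

def y(v):
    """Hypothetical outer function y(u) = 3u + u^2, so y'(0) = 3."""
    return 3 * v + v * v

a = Fraction(0)

# Increments dx = 1/n shrink toward zero, yet each one gives du = 0 exactly.
bad = [Fraction(1, n) for n in (10, 100, 1000, 10**6)]
du_values = [u(a + dx) - u(a) for dx in bad]

def naive_quotient(dx):
    """The factored form (dy/du) * (du/dx); it is undefined whenever du == 0."""
    du = u(a + dx) - u(a)
    dy = y(u(a + dx)) - y(u(a))
    return (dy / du) * (du / dx)

try:
    naive_quotient(bad[0])
    factored_ok = True
except ZeroDivisionError:
    # dy/du is 0/0 here, so the product form cannot even be evaluated.
    factored_ok = False
```

    Every entry of du_values is zero although none of the increments is, and the factored product fails with a division by zero for those increments.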

    Now in that case, it follows that the fraction ∆u/∆x equals zero for all those ∆x’s, and since this fraction has a limit, the only possible limit is zero. I.e. in the only case where the proof does not work, we know that u’(x) = 0. Thus for the theorem to hold in that case, we only need to prove that y’(x) = y’(u).u’(x) = y’(u).0 = 0. I.e. all we have to do is prove that in this case the fraction ∆y/∆x still approaches zero even though we cannot always factor it into a product of fractions.

    The secret is to notice that we can still factor it as ∆y/∆x = (∆y/∆u)(∆u/∆x),

    as long as ∆u ≠ 0. I.e. there are two kinds of ∆x’s, those for which ∆u = 0, and those for which ∆u ≠ 0. But when ∆u = 0 we do not need to factor it, i.e. it is trivial then that the fraction ∆y/∆x = 0, since the top is the difference of the values of y at the same two values of u, so of course it equals zero. I.e. ∆u = 0 means the two values of u are the same, so y has the same value at both of them so ∆y = 0, hence also ∆y/∆x = 0.

    And in the case where ∆u ≠ 0, we can still factor the fraction as ∆y/∆x = (∆y/∆u)(∆u/∆x), and use the other product argument. I.e. as long as ∆x is really small, if ∆u ≠ 0, then the fraction ∆y/∆x = (∆y/∆u)(∆u/∆x). And since u’(x) = 0 in this case, (∆u/∆x) is a small number, and (∆y/∆u) is close to the finite number y’(u), so the product (∆y/∆u)(∆u/∆x), is a small number.

    And in the case where ∆u = 0, things are actually even better. I.e. although we cannot factor the fraction, it does not matter because then ∆u= 0 implies also ∆y = 0, so the fraction ∆y/∆x is as close to zero as it can get, since it equals zero.

    Thus in the “bad” case where ∆x is small and non zero, but ∆u = 0, the chain rule still holds because both sides of the equation equal zero.
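    The two-case argument can be sketched numerically. This is only an illustration under assumed example functions (the same hypothetical u and y as one might pick to force ∆u = 0 near a = 0): the quotient is set to 0 when ∆u = 0 and factored otherwise. For these particular functions |∆u/∆x| ≤ |∆x|/2 and ∆y/∆u = 3 + ∆u ≤ 4 when |∆x| ≤ 1, so the quotient is bounded by 2|∆x| in both cases.

```python
from fractions import Fraction

def u(x):
    """Hypothetical inner function: x^2 * (distance from 1/x to nearest integer), u(0) = 0."""
    if x == 0:
        return Fraction(0)
    t = 1 / Fraction(x)
    return Fraction(x) ** 2 * abs(t - round(t))

def y(v):
    """Hypothetical outer function y(u) = 3u + u^2, so y'(0) = 3."""
    return 3 * v + v * v

def difference_quotient(dx, a=Fraction(0)):
    """dy/dx for y(u(x)) at a, computed by the two-case argument."""
    du = u(a + dx) - u(a)
    if du == 0:
        # Same u-value twice, so dy = 0 and the quotient is exactly 0.
        return Fraction(0)
    dy = y(u(a + dx)) - y(u(a))
    # du != 0, so the usual factorization is legitimate.
    return (dy / du) * (du / dx)

# A mix of increments with du == 0 (1/10, 1/1000) and with du != 0 (2/25, 2/2001).
steps = [Fraction(1, 10), Fraction(2, 25), Fraction(1, 1000), Fraction(2, 2001)]
quotients = [difference_quotient(dx) for dx in steps]

# In both cases the quotient is squeezed toward y'(0) * u'(0) = 3 * 0 = 0.
bounded = all(abs(q) <= 2 * dx for q, dx in zip(quotients, steps))
```

    The point of the design is that the branch for ∆u = 0 never needs a limit argument at all; only the other branch uses the product rule for limits.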

    Thus Stewart’s second proof is unnecessary. It works because he has managed to take the denominators out of the argument. But he has also managed to make the argument less understandable.

    This result was traditionally proved correctly in turn-of-the-century English-language books, such as Pierpont's Theory of functions of a real variable, and in 19th-century European books such as that of Tannery [see the article by Carslaw, in vol XXIX of B.A.M.S.], but unfortunately not in the first three editions of the influential book Pure Mathematics, by G. H. Hardy. Although Hardy reinstated the classical proof in later editions, modern books usually deal with the problem by giving the slightly more sophisticated linear approximation proof, or making what to me are somewhat artificial constructions.

    The point is simply that in proving a function has limit L, one only needs
    to prove it at points where the function does not already have value L.
    Thus to someone who says that the usual argument for the chain rule for
    y(u(x)), does not work for x's where ∆u = 0, one can simply reply that
    these points are irrelevant.

    Assume f is differentiable at g(a), g is differentiable at a, and on every
    neighborhood of a there are points x where g(x) = g(a). We claim the
    derivative of f(g(x)) at a equals f'(g(a))(g'(a)).
    1) Clearly under these hypotheses, g'(a) = 0.
    2) the chain rule holds at a if and only if lim∆f/∆x = 0 as x approaches a.
    3) Note that ∆f = ∆f/∆x = 0 at all x such that g(x) = g(a).
    4) In general, to prove that lim h(x) = L, as x approaches a, it suffices
    to prove it for the restriction of h to those x such that h(x) ≠ L.
    5) Thus in arguing that ∆f/∆x approaches 0, we may restrict to x such that
    g(x) ≠ g(a), where the usual argument applies.
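
    Point 4) can be spelled out in ε-δ form. The following is a sketch; the set name S is introduced here only for illustration.

```latex
Let $S = \{\, x : h(x) \neq L \,\}$ and suppose that for every $\varepsilon > 0$
there is a $\delta > 0$ such that
\[
  x \in S,\quad 0 < |x - a| < \delta \;\Longrightarrow\; |h(x) - L| < \varepsilon .
\]
If $x \notin S$, then $|h(x) - L| = 0 < \varepsilon$ automatically, so the same
$\delta$ works for every $x$:
\[
  0 < |x - a| < \delta \;\Longrightarrow\; |h(x) - L| < \varepsilon ,
  \qquad\text{i.e.}\qquad \lim_{x \to a} h(x) = L .
\]
```

    In the chain rule setting, h is the quotient ∆f/∆x and L = 0, so S is exactly the set of x with ∆f ≠ 0, which is contained in the set where g(x) ≠ g(a).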
    Last edited: Mar 9, 2012
  4. Mar 9, 2012 #3


    Science Advisor
    Homework Helper

    The proof in Stewart is the linear approximation proof, that just says the same thing but takes out the division step.

    I.e. saying that deltay/deltax -->y'(a) as deltax -->0, is the same as saying that

    deltay/deltax - y'(a) -->0, i.e. that [deltay - y'(a)deltax]/deltax -->0.

    Now if we multiply by deltax, we get that {[deltay - y'(a)deltax]/deltax}.deltax

    = [deltay - y'(a)deltax] = e.deltax, where e = [deltay - y'(a)deltax]/deltax, so that e-->0.

    Thus we can state that y’(a) is the derivative of y without using denominators, by saying that

    deltay = y’(a)deltax + e.deltax, where e-->0 as deltax does.
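
    As a small sanity check of this denominator-free statement, here is a sketch with an assumed example, y(x) = x³ at a = 2 (my choice, not from the book), using exact fractions so the identity can be tested with ==.

```python
from fractions import Fraction

def y(x):
    """Assumed example function y(x) = x^3."""
    return x ** 3

def y_prime(x):
    """Its derivative, y'(x) = 3x^2."""
    return 3 * x ** 2

a = Fraction(2)

def error_term(dx):
    """e = [deltay - y'(a) deltax] / deltax, defined for deltax != 0."""
    dy = y(a + dx) - y(a)
    return (dy - y_prime(a) * dx) / dx

# Increments 1/10, 1/100, ..., 1/100000 and the corresponding e values.
steps = [Fraction(1, 10 ** k) for k in range(1, 6)]
errors = [error_term(dx) for dx in steps]

# The linear approximation form holds exactly: deltay = y'(a) deltax + e deltax.
for dx, e in zip(steps, errors):
    assert y(a + dx) - y(a) == y_prime(a) * dx + e * dx
```

    For this example e works out to 6.deltax + deltax², which visibly goes to zero with deltax.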

    so if y(x) = y(u(x)), in order to show that dy/dx (a) = y’(u(a)).(u’(a)),

    all we have to do is show that we can write

    y(u(x)) – y(u(a)) = [y’(u(a)).(u’(a))].deltax + E.deltax, where E is something that goes to zero as deltax does.

    This is a messy substitution using what we are given,

    i.e. y(u(x)) – y(u(a)) = y’(u(a)). delta u + e delta u, where e-->0, because y is differentiable wrt u.

    But u is also differentiable wrt x, so we can plug the same sort of thing in for delta u:

    y(u(x)) – y(u(a)) = y’(u(a)). {u’(a).delta x + e1.delta x} + e{u’(a).delta x + e1.delta x},

    where e-->0 as delta u does, and e1-->0 as delta x does. Fortunately u is continuous in x, so delta u goes to zero when delta x does, hence e-->0 also when delta x does.

    Now just expand and collect terms getting:

    y(u(x)) – y(u(a)) = y’(u(a)). u’(a).delta x + {e1.y’(u(a)) + e.u’(a) + e.e1}.delta x

    = y’(u(a)). u’(a).delta x + E. delta x, where E = {e1.y’(u(a)) + e.u’(a) + e.e1}, and that does go to zero as delta x does.

    Hence the multiplier of delta x must be the derivative dy/dx. i.e. dy/dx = y’(u(a)). u’(a).
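
    The bookkeeping can be checked exactly on an example. This is a sketch under assumed functions u(x) = x² and y(u) = u³ at a = 1 (hypothetical choices). Multiplying out the two brackets gives four terms, and the cross term is e.e1, so E = e1.y’(u(a)) + e.u’(a) + e.e1. The convention e = 0 when delta u = 0 is what keeps the formula valid without any division by delta u.

```python
from fractions import Fraction

def u(x):
    """Assumed inner function u(x) = x^2, so u'(x) = 2x."""
    return x * x

def y(v):
    """Assumed outer function y(u) = u^3, so y'(u) = 3u^2."""
    return v ** 3

a = Fraction(1)
du_da = 2 * a            # u'(a)
dy_du = 3 * u(a) ** 2    # y'(u(a))

def decomposition(dx):
    """Return (delta y, E) for the increment dx != 0."""
    du = u(a + dx) - u(a)
    dy = y(u(a + dx)) - y(u(a))
    e1 = (du - du_da * dx) / dx
    # Convention: e = 0 when delta u = 0, so no division by zero ever occurs.
    e = (dy - dy_du * du) / du if du != 0 else Fraction(0)
    E = e1 * dy_du + e * du_da + e * e1
    return dy, E

steps = [Fraction(1, 10 ** k) for k in range(1, 6)]
Es = []
for dx in steps:
    dy, E = decomposition(dx)
    # delta y = y'(u(a)) u'(a) delta x + E delta x, exactly.
    assert dy == dy_du * du_da * dx + E * dx
    Es.append(E)
```

    The list Es shrinks toward zero along with delta x, which is exactly what is needed to read off y’(u(a)).u’(a) as the derivative.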

    The one good thing about this more complicated proof is that this idea also works in several variables, where you cannot just divide by the vector variable. Also it reminds you that a derivative is really a linear approximation to the original function, which is important to know.

    But the idea that the original simpler proof does not work is just wrong, and may be evidence that a lot of calculus book writers just copy their stuff from other recent best sellers without thinking it through or doing historical research on the topic. Or to be fair, maybe they are aware of it but choose for pedagogical reasons not to mention it.
    Last edited: Mar 9, 2012
  5. Aug 2, 2012 #4

    d_y = domain of y(u)
    d_u = domain of u(x)
    r_u = range of u(x)
    [tex]\forall_{a\in d_y,\;b\in d_u,\;c\in r_u}\exists_{k_1, k_2, k_3\in\ \mathbb R}\;s.t.\;\left\{\lim_{x\to\ a}\frac{y(x)-y(a)}{y-a}=k_1 \wedge \lim_{x\to\ b}\frac{u(x)-u(b)}{x-b}=k_2 \wedge \lim_{x\to\ c}\frac{y(x)-y(c)}{y-c}=k_3\right\}\\\longrightarrow\;\forall_{a\in\ d_y,\;b\in\ d_u,\;c\in\ r_u}\left\{\frac{d}{dx}y(u(x)) = \left(\lim_{x\to\ a}\frac{y(x)-y(a)}{y-a}\cdot\lim_{x\to\ b}\frac{u(x)-u(b)}{x-b}\right) = \left(\frac{d}{dx}y(u(x))\cdot\lim_{x\to\ b}\frac{u(x)-u(b)}{x-b}\right)\right\}[/tex]

    ... did I interpret this first part correctly? I didn't know how to write the derivative of y(u(x)) in limit definition form.

    ... do you say ∆y instead of ∆y(u(x)) because if we can show that lim ∆x->0(∆y/∆x) = y’(u).u’(x) for an arbitrary domain of y, then obviously the equality will hold true for the domain made up by the range of u(x)?

    ... okay, I think this is consistent with what I typed in the "want to show"

    ... okay because u(x) being one-to-one would imply each input x would give you a unique u(x) value, and we didn't assume this.

    [tex]\small\text{... okay because we have} \frac{∆y}{∆x}\text{and we want to multiply by} \frac{∆u}{∆u} \text{but} \frac{∆u}{∆u} \text{is undefined if ∆u=0}\\\small\text{and we can't guarantee ∆u≠0 since we didn't assume u(x) is 1-to-1}\\
    \small\text{Also, if ∆u=0 for a nonzero ∆x that means there is at least one number p in the domain of u(x) where}\\\lim_{x\to\ p}\frac{u(x)-u(p)}{x-p} = \lim_{x\to\ p+∆x}\frac{u(x)-u(p+∆x)}{x-(p+∆x)}[/tex]

    I don't know if that's relevant or not (or even true for that matter)



    is y'(x) the same as y'(u(x))? I thought we were trying to show y'(u(x)) = y'(u)*u'(x)? I was on board before because we were just doing ∆y/∆x which I assumed was considering u(x) to be the input into y. So I thought it was just saying "change in y(u(x)) resulting from a tiny change of x". But with y'(x) it seems like we're saying "change in y(u(x)) resulting from a tiny change in u(x)"?

    sorry; i'm stuck. :frown:
  6. Aug 3, 2012 #5
    edit: in the denominators of the limits in the "want to show" section i meant to have x-a and x-c not y-a and y-c
  7. Aug 5, 2012 #6
    can anyone help get me unstuck?
  8. Aug 6, 2012 #7
    I know it took me 5 months from the time I posted the initial question to when I posted my follow-up, but now I got stuck. :(

    I want to understand but I can't go anywhere with the proof at this point