Why Multiply by the Derivative of the Inner Function in the Chain Rule?

Quincy · Oct 13, 2009

Chain Rule - intuitive "Proof"

Suppose y = f(u) and u = g(x). Then dy/dx = dy/du * du/dx.

In an intuitive "proof" of the chain rule, it has this step: dy/dx = [tex]\lim_{\Delta x \to 0} \frac {\Delta y}{\Delta x} = \lim_{\Delta x \to 0} \left( \frac {\Delta y}{\Delta u} \cdot \frac {\Delta u}{\Delta x} \right)[/tex]

My question is, why multiply by [tex]\frac {\Delta u}{\Delta u}[/tex]? I know mathematically, it's because [tex]\frac {\Delta u}{\Delta u}[/tex] = 1, and multiplying by 1 doesn't change the function, but I'm looking for a philosophical reason. I found a quote by the user mathwonk in an old thread, which says: "it seems plausible that the best linear approximation to a composite function, is obtained by composing the best approximations to the component functions. On the other hand for a linear function, composing means simply multiplying." Can someone expand on this?...

LCKurtz · Oct 13, 2009

A change Δx in x causes a change Δu in u which causes a change Δy in y. The derivatives are limits of difference quotients so you would expect Δy/Δu [itex]\rightarrow[/itex] dy/du and Δu/Δx [itex]\rightarrow[/itex] du/dx. Multiplying the numerator and denominator by Δu is a convenient and seemingly appropriate way to involve the intermediate variable. That's about as philosophical as I get. And I gather that you do know that method isn't really a proof.
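
To make the chain of changes concrete, here is a small numerical sketch in Python. The choice y = sin(u) with u = x², the point x = 1, and the helper name `difference_quotients` are illustrative assumptions, not anything taken from the thread. It shows that Δy/Δx and (Δy/Δu)·(Δu/Δx) agree for each Δx and that both approach dy/du · du/dx. Note that the code divides by Δu, so it silently assumes Δu ≠ 0, which is exactly the gap that keeps this argument from being a real proof.

```python
import math

def difference_quotients(x, dx):
    """Return (Δy/Δx, Δy/Δu, Δu/Δx) for y = sin(u), u = x**2 (illustrative choice)."""
    u0, u1 = x**2, (x + dx)**2
    du = u1 - u0                       # change in the intermediate variable
    dy = math.sin(u1) - math.sin(u0)   # resulting change in y
    return dy / dx, dy / du, du / dx   # assumes du != 0

x = 1.0
exact = math.cos(x**2) * 2 * x         # dy/du * du/dx from the chain rule
for dx in (1e-1, 1e-2, 1e-3, 1e-4):
    dy_dx, dy_du, du_dx = difference_quotients(x, dx)
    print(f"dx={dx:g}  dy/dx={dy_dx:.6f}  "
          f"(dy/du)*(du/dx)={dy_du * du_dx:.6f}  exact={exact:.6f}")
```

The two middle columns match for every Δx because Δu cancels, and both drift toward the last column as Δx shrinks.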

Quincy · Oct 13, 2009

LCKurtz said:

Multiplying the numerator and denominator by Δu is a convenient and seemingly appropriate way to involve the intermediate variable.

But why is it appropriate?

LCKurtz said:

And I gather that you do know that method isn't really a proof.

Yes, that's why I put proof in quotation marks.

LCKurtz · Oct 13, 2009

Quincy said:

But why is it appropriate?

Because I need a Δy/Δu and Δu/Δx in the equation and multiplying by Δu/Δu doesn't change the equation.

HallsofIvy · Oct 13, 2009

Surely you've seen that done before in mathematics? When you get common denominators to add fractions, you multiply numerator and denominator by the same thing- that's exactly the same idea. When you complete the square in a quadratic, you add and subtract the same thing. Almost the same idea.

Tac-Tics · Oct 13, 2009

Quincy said:

I found a quote by the user mathwonk in an old thread, which says: "it seems plausible that the best linear approximation to a composite function, is obtained by composing the best approximations to the component functions. On the other hand for a linear function, composing means simply multiplying." Can someone expand on this?...

Interesting quote. I'm glad I came across this. (Thanks mathwonk!)

I'm not sure what more can be said. But for my own sake, and maybe yours as well, I'll do a little explaining to think the idea through.

Differentiable functions are locally linear. That means, you take any differentiable function f and a point p. There exist constants a and b such that f(x) ≈ ax + b as long as x is pretty damn close to p. (I'll write = from here on, but it always means "approximately, near the point".) The derivative of f at point p is simply a. (That's almost TOO convenient...).
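
As a rough illustration of "locally linear", the Python sketch below builds the line ax + b for f = sin at p = 1 and checks how far it strays from f nearby. The function sin, the point p = 1, and the helper name `tangent_line` are assumptions picked for the example. The constant b is chosen so that the line passes through (p, f(p)).

```python
import math

def tangent_line(f, df, p):
    """Return (a, b) with a = f'(p) and b chosen so that a*p + b == f(p)."""
    a = df(p)
    b = f(p) - a * p
    return a, b

p = 1.0
a, b = tangent_line(math.sin, math.cos, p)
for h in (1e-1, 1e-2, 1e-3):
    x = p + h
    error = abs(math.sin(x) - (a * x + b))
    # error/h shrinking toward 0 is what "locally linear" means here
    print(f"h={h:g}  error={error:.3e}  error/h={error / h:.3e}")
```

The error shrinks faster than the distance h itself, which is why the line is a good stand-in for f only very close to p.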

Combine this idea with the chain rule. Let f and g be differentiable functions. We note that f . g is differentiable too. We pick a point p. We want to find the derivative of f . g at p. We use our rule above. Since f . g is differentiable, it is locally linear. Which means, (f . g)(x) = f(g(x)) = a x + b for some a and b. Our goal is to determine the value of a.

Well, since f and g are both differentiable, we know that they too are locally linear. So let's will some more variables into existence, using the same rule:

g(x) = a' x + b'

(The primes are not differentiation. The a's are the derivatives at point p and the b's are the constants).

So the derivative of g at point p is a'.

Next, we use our rule on f. But it's a little different this time! We're not taking the derivative of f at p. Instead, we're taking the derivative of f at the point g(p)! But a point is a point, and fixing g(p), we use our rule to conclude that

f(x) = a'' x + b'' for all x's that are pretty damn close to g(p).

So now we have a' (the derivative of g at point p) and a'' (the derivative of f at point g(p)). Let's compose f and g!

(f . g)(x) = f(g(x))

If x is close to p, then we can expand g(x) as a' x + b':

(f . g)(x) = f(a' x + b') for all x's pretty damn close to p.

Now, since g is continuous, g(x) is pretty damn close to g(p) whenever x is pretty damn close to p, so we can expand f:

(f . g)(x) = a'' (a' x + b') + b'' = (a'' a') x + (a'' b' + b'') for all x pretty damn close to p.

And we draw our conclusion. The derivative of f . g at point p is simply the first-order term, a'' a'. Exciting. What does that mean? Pulling from the definitions above, a'' is the derivative of f at point g(p), and a' is the derivative of g at point p. That is exactly what the chain rule is.
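
The substitution above can be sketched numerically. This Python snippet is only an illustration under assumed choices: g(x) = x², f = sin, p = 1, and a hypothetical helper `tangent_line`. The names `a1, b1` stand for a', b' and `a2, b2` stand for a'', b''. It composes the two lines and compares the resulting slope with a finite-difference estimate of the derivative of f(g(x)) at p.

```python
import math

def tangent_line(func, dfunc, point):
    """Return (slope, intercept) of the tangent line to func at point."""
    slope = dfunc(point)
    return slope, func(point) - slope * point

def g(x):
    return x**2

def dg(x):
    return 2 * x

f, df = math.sin, math.cos
p = 1.0

a1, b1 = tangent_line(g, dg, p)       # a', b': line for g near p
a2, b2 = tangent_line(f, df, g(p))    # a'', b'': line for f near g(p)

# Compose the two lines: a2*(a1*x + b1) + b2 = (a2*a1)*x + (a2*b1 + b2)
a = a2 * a1
b = a2 * b1 + b2

# Central-difference estimate of the derivative of f(g(x)) at p
h = 1e-6
numeric = (f(g(p + h)) - f(g(p - h))) / (2 * h)

print(f"composed slope a''*a' = {a:.8f}")
print(f"numeric derivative    = {numeric:.8f}")
```

The slope of the composed line is just the product of the two slopes, and it agrees with the numerical derivative to within the accuracy of the finite difference.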

OK. That wasn't as simple as I hoped. But I hope you get the picture a little better when you replace the f'(g(x)) clutter with constants. I think, in particular, you can see where the multiplication comes in. It's linear. You substitute. You shave off a few bits and toss it into your constant, and multiply a few coefficients.

Quincy · Oct 13, 2009

Thanks!
