Why Multiply by the Derivative of the Inner Function in the Chain Rule?

Quincy
Chain Rule - intuitive "Proof"

Suppose y = f(u) and u = g(x). Then dy/dx = dy/du · du/dx.

In an intuitive "proof" of the chain rule, there is this step: \frac{dy}{dx} = \lim_{\Delta x \to 0} \frac{\Delta y}{\Delta x} = \lim_{\Delta x \to 0} \frac{\Delta y}{\Delta u} \cdot \frac{\Delta u}{\Delta x}
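Written out, the factoring is just a cancellation, valid whenever \Delta u \neq 0:

\frac{\Delta y}{\Delta u} \cdot \frac{\Delta u}{\Delta x} = \frac{\Delta y \, \Delta u}{\Delta u \, \Delta x} = \frac{\Delta y}{\Delta x}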

My question is, why multiply by \frac{\Delta u}{\Delta u}? I know that mathematically it works because \frac{\Delta u}{\Delta u} = 1, and multiplying by 1 doesn't change the expression, but I'm looking for a philosophical reason. I found a quote by the user mathwonk in an old thread, which says: "it seems plausible that the best linear approximation to a composite function, is obtained by composing the best approximations to the component functions. On the other hand for a linear function, composing means simply multiplying." Can someone expand on this?...
 


A change Δx in x causes a change Δu in u, which in turn causes a change Δy in y. The derivatives are limits of difference quotients, so you would expect Δy/Δu → dy/du and Δu/Δx → du/dx. Multiplying the numerator and denominator by Δu is a convenient and seemingly appropriate way to involve the intermediate variable. That's about as philosophical as I get. And I gather that you do know that method isn't really a proof.
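Here's a quick numerical check of that picture; the functions u = g(x) = x² and y = f(u) = sin u, and the point x = 1, are just choices for illustration:

```python
import math

# Illustrative choice (not from the thread): u = g(x) = x^2, y = f(u) = sin(u).
# The chain rule predicts dy/dx at x = 1 to be cos(1) * 2.
def g(x):
    return x ** 2

def f(u):
    return math.sin(u)

x0 = 1.0
exact = math.cos(g(x0)) * (2 * x0)       # f'(g(x0)) * g'(x0)

for dx in (1e-1, 1e-3, 1e-5):
    du = g(x0 + dx) - g(x0)              # the change in u caused by dx
    dy = f(g(x0 + dx)) - f(g(x0))        # the change in y caused by that du
    # (dy/du) * (du/dx) is literally dy/dx whenever du != 0,
    # and each factor tends to its own derivative as dx -> 0.
    print(f"dx={dx:g}  (dy/du)*(du/dx)={(dy / du) * (du / dx):.6f}  exact={exact:.6f}")
```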
 


LCKurtz said:
Multiplying the numerator and denominator by Δu is a convenient and seemingly appropriate way to involve the intermediate variable.
But why is it appropriate?


LCKurtz said:
And I gather that you do know that method isn't really a proof.
Yes, that's why I put proof in quotation marks.
 


Quincy said:
But why is it appropriate?

Because I need a Δy/Δu and a Δu/Δx in the equation, and multiplying by Δu/Δu doesn't change the equation.
 


Surely you've seen that done before in mathematics? When you get common denominators to add fractions, you multiply numerator and denominator by the same thing; that's exactly the same idea. When you complete the square in a quadratic, you add and subtract the same thing. Almost the same idea.
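For instance: x^2 + 6x = x^2 + 6x + 9 - 9 = (x + 3)^2 - 9. Nothing changed, but the useful form appears.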
 


Quincy said:
I found a quote by the user mathwonk in an old thread, which says: "it seems plausible that the best linear approximation to a composite function, is obtained by composing the best approximations to the component functions. On the other hand for a linear function, composing means simply multiplying." Can someone expand on this?...

Interesting quote. I'm glad I came across this. (Thanks mathwonk!)

I'm not sure what more can be said. But for my own sake, and maybe yours as well, I'll do a little explaining to think the idea through.

Differentiable functions are locally linear. That means: take any differentiable function f and a point p. There exist constants a and b such that f(x) ≈ ax + b as long as x is pretty damn close to p, with an error that shrinks faster than the distance from p. The derivative of f at point p is simply a. (That's almost TOO convenient...).
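Concretely, matching the value and the slope of f at p pins the constants down: a = f'(p) and b = f(p) - f'(p)p, so the approximation is just the tangent line f(x) ≈ f(p) + f'(p)(x - p).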

Combine this idea with the chain rule. Let f and g be differentiable functions. We note that f . g is differentiable too. We pick a point p. We want to find the derivative of f . g at p. We use our rule above. Since f . g is differentiable, it is locally linear. Which means, (f . g)(x) = f(g(x)) ≈ a x + b for some a and b. Our goal is to determine the value of a.

Well, since f and g are both differentiable, we know that they too are locally linear. So let's will some more variables into existence, using the same rule:

g(x) ≈ a' x + b'

(The primes are not differentiation. The a's are the derivatives at point p and the b's are the constants).

So the derivative of g at point p is a'.

Next, we use our rule on f. But it's a little different this time! We're not taking the derivative of f at p. Instead, we're taking the derivative of f at the point g(p)! But a point is a point, and fixing g(p), we use our rule to conclude that

f(x) ≈ a'' x + b'' for all x's that are pretty damn close to g(p).
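(Matching value and slope at g(p), those constants are a'' = f'(g(p)) and b'' = f(g(p)) - f'(g(p)) g(p).)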

So now we have a' (the derivative of g at point p) and a'' (the derivative of f at point g(p)). Let's compose f and g!

(f . g)(x) = f(g(x))

If x is close to p, then we can expand g(x) as a' x + b':

(f . g)(x) ≈ f(a' x + b') for all x's pretty damn close to p.

Now, in turn, g(x) is pretty damn close to g(p), so we can expand f:

(f . g)(x) ≈ a'' (a' x + b') + b'' = (a'' a') x + (a'' b' + b'') for all x pretty damn close to p.

And we draw our conclusion. The derivative of f . g at point p is simply the first-order coefficient, a'' a'. Exciting. What does that mean? Pulling from the definitions above, a'' is the derivative of f at the point g(p), and a' is the derivative of g at the point p. That is exactly what the chain rule says.

OK. That wasn't as simple as I hoped. But I hope you get the picture a little better when you replace the f'(g(x)) clutter with constants. I think, in particular, you can see where the multiplication comes in. It's all linear: you substitute, you shave off a few bits and toss them into your constant, and you multiply a few coefficients.
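And if a numerical check helps, here's the whole argument run with concrete functions; g(x) = x³ and f(u) = e^u are my own picks for illustration:

```python
import math

# Illustrative functions (my choice): g(x) = x^3, f(u) = exp(u).
def g(x):
    return x ** 3

def f(u):
    return math.exp(u)

p = 0.5

# Best linear approximation of g at p:  g(x) ~ a1*x + b1
a1 = 3 * p ** 2                 # a'  = g'(p)
b1 = g(p) - a1 * p              # b'  = g(p) - g'(p)*p

# Best linear approximation of f at g(p):  f(u) ~ a2*u + b2
a2 = math.exp(g(p))             # a'' = f'(g(p))
b2 = f(g(p)) - a2 * g(p)        # b'' = f(g(p)) - f'(g(p))*g(p)

# Composing the two linear maps multiplies the slopes and folds the
# leftovers into the new constant, just like in the post:
slope = a2 * a1                 # a'' * a'
intercept = a2 * b1 + b2        # a'' * b' + b''

# Compare with a central-difference derivative of f(g(x)) at p.
h = 1e-6
numeric = (f(g(p + h)) - f(g(p - h))) / (2 * h)

print(f"a'' * a' = {slope:.9f}")
print(f"numeric  = {numeric:.9f}")
# The composed linear map also passes through the right point:
print(f"linear map at p = {slope * p + intercept:.9f}  vs f(g(p)) = {f(g(p)):.9f}")
```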
 


Thanks!
 