Why Multiply by the Derivative of the Inner Function in the Chain Rule?


Discussion Overview

The discussion centers on the intuitive and philosophical reasoning behind the chain rule in calculus, specifically why one multiplies by the derivative of the inner function when applying it. Participants explore intuitive "proofs" and the implications of differentiable functions being locally linear.

Discussion Character

  • Exploratory
  • Technical explanation
  • Conceptual clarification

Main Points Raised

  • One participant questions the philosophical justification for multiplying by Δu/Δu in the chain rule, noting that while it mathematically equals 1, they seek a deeper understanding.
  • Another participant explains that multiplying by Δu/Δu is a convenient method to involve the intermediate variable, but does not provide a definitive rationale for its appropriateness.
  • A different participant draws parallels to common mathematical practices, such as finding common denominators or completing the square, suggesting that the reasoning is similar in nature.
  • One participant reflects on a quote regarding the best linear approximation to composite functions, discussing the local linearity of differentiable functions and how this relates to the chain rule.
  • The same participant elaborates on the local linearity of functions and how derivatives can be understood through the composition of functions, ultimately leading to the conclusion that the derivative of a composite function is the product of the derivatives of the individual functions.

Areas of Agreement / Disagreement

Participants express various viewpoints on the appropriateness of the multiplication by Δu/Δu, but there is no consensus on a single philosophical justification. The discussion remains exploratory, with multiple perspectives presented.

Contextual Notes

The discussion includes assumptions about the nature of differentiable functions and their local linearity, but these assumptions are not universally agreed upon or fully resolved within the conversation.

Quincy
Chain Rule - intuitive "Proof"

Suppose y = f(u) and u = g(x); then dy/dx = dy/du * du/dx.

An intuitive "proof" of the chain rule contains this step: [tex]\frac{dy}{dx} = \lim_{\Delta x \to 0} \frac{\Delta y}{\Delta x} = \lim_{\Delta x \to 0} \left( \frac{\Delta y}{\Delta u} \cdot \frac{\Delta u}{\Delta x} \right)[/tex]

My question is, why multiply by [tex]\frac {\Delta u}{\Delta u}[/tex]? I know mathematically, it's because [tex]\frac {\Delta u}{\Delta u}[/tex] = 1, and multiplying by 1 doesn't change the function, but I'm looking for a philosophical reason. I found a quote by the user mathwonk in an old thread, which says: "it seems plausible that the best linear approximation to a composite function, is obtained by composing the best approximations to the component functions. On the other hand for a linear function, composing means simply multiplying." Can someone expand on this?...
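
As a quick numerical illustration of that step (a minimal sketch; the particular functions y = sin(u), u = x^2, the point x = 1.3, and the step sizes are just example choices, not from the thread):

[code]
# Numerical check that dy/dx and (dy/du) * (du/dx) agree, and that both
# approach the chain-rule value f'(g(x)) * g'(x) = cos(x^2) * 2x as dx shrinks.
import math

def g(x):          # inner function: u = g(x) = x^2
    return x * x

def f(u):          # outer function: y = f(u) = sin(u)
    return math.sin(u)

x = 1.3
exact = math.cos(x * x) * 2 * x   # chain rule: f'(g(x)) * g'(x)

for dx in (1e-1, 1e-3, 1e-5):
    du = g(x + dx) - g(x)          # change in u caused by the change in x
    dy = f(g(x + dx)) - f(g(x))    # change in y caused by the change in u
    print(dx, dy / dx, (dy / du) * (du / dx), exact)
[/code]

The two middle columns agree to rounding, since multiplying and dividing by Δu changes nothing; the point is that each factor separately tends to dy/du and du/dx.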
 


A change Δx in x causes a change Δu in u which causes a change Δy in y. The derivatives are limits of difference quotients so you would expect Δy/Δu [itex]\rightarrow[/itex] dy/du and Δu/Δx [itex]\rightarrow[/itex] du/dx. Multiplying the numerator and denominator by Δu is a convenient and seemingly appropriate way to involve the intermediate variable. That's about as philosophical as I get. And I gather that you do know that method isn't really a proof.
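
To spell out the gap: the rewriting rests on the exact algebraic identity

[tex]\frac{\Delta y}{\Delta x} = \frac{\Delta y}{\Delta u} \cdot \frac{\Delta u}{\Delta x},[/tex]

which only makes sense when [itex]\Delta u \neq 0[/itex]. If g returns to the value g(x) at points arbitrarily close to x (for instance, if g is constant), then Δu = 0 for some small Δx and the factorization is undefined; the rigorous proof has to route around this case.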
 


LCKurtz said:
Multiplying the numerator and denominator by Δu is a convenient and seemingly appropriate way to involve the intermediate variable.
But why is it appropriate?


LCKurtz said:
And I gather that you do know that method isn't really a proof.
Yes, that's why I put proof in quotation marks.
 


Quincy said:
But why is it appropriate?

Because I need a Δy/Δu and Δu/Δx in the equation and multiplying by Δu/Δu doesn't change the equation.
 


Surely you've seen that done before in mathematics? When you get common denominators to add fractions, you multiply numerator and denominator by the same thing; that's exactly the same idea. When you complete the square in a quadratic, you add and subtract the same thing. Almost the same idea.
 


Quincy said:
I found a quote by the user mathwonk in an old thread, which says: "it seems plausible that the best linear approximation to a composite function, is obtained by composing the best approximations to the component functions. On the other hand for a linear function, composing means simply multiplying." Can someone expand on this?...

Interesting quote. I'm glad I came across this. (Thanks mathwonk!)

I'm not sure what more can be said. But for my own sake, and maybe yours as well, I'll do a little explaining to think the idea through.

Differentiable functions are locally linear. That means: take any differentiable function f and a point p. There exist constants a and b such that f(x) ≈ ax + b as long as x is pretty damn close to p, with an error that shrinks faster than the distance from x to p. The derivative of f at point p is simply a. (That's almost TOO convenient...). Below I'll write these local linearizations with a plain "=", but each one really carries such a first-order error term.
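
For reference, the precise version of this statement (with a = f'(p) and b = f(p) - f'(p)p) is

[tex]f(x) = f(p) + f'(p)(x - p) + E(x), \qquad \lim_{x \to p} \frac{E(x)}{x - p} = 0.[/tex]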

Combine this idea with the chain rule. Let f and g be differentiable functions. We note that f . g is differentiable too. We pick a point p. We want to find the derivative of f . g at p. We use our rule above. Since f . g is differentiable, it is locally linear. Which means, (f . g)(x) = f(g(x)) = a x + b for some a and b. Our goal is to determine the value of a.

Well, since f and g are both differentiable, we know that they too are locally linear. So let's will some more variables into existence, using the same rule:

g(x) = a' x + b'

(The primes are not differentiation. The a's are the derivatives at point p and the b's are the constants).

So the derivative of g at point p is a'.

Next, we use our rule on f. But it's a little different this time! We're not taking the derivative of f at p. Instead, we're taking the derivative of f at the point g(p)! But a point is a point, and fixing g(p), we use our rule to conclude that

f(x) = a'' x + b'' for all x's that are pretty damn close to g(p).

So now we have a' (the derivative of g at point p) and a'' (the derivative of f at point g(p)). Let's compose f and g!

(f . g)(x) = f(g(x))

If x is close to p, then we can expand g(x) as a' x + b':

(f . g)(x) = f(a' x + b') for all x's pretty damn close to p.

Now, again, since x is pretty damn close to p, g(x) is pretty damn close to g(p), so we can expand f:

(f . g)(x) = a'' (a' x + b') + b'' = (a'' a') x + (a'' b' + b'') for all x pretty damn close to p.

And we draw our conclusion. The derivative of f . g at point p is simply the coefficient of x, namely a'' a'. Exciting. What does that mean? Pulling from the definitions above, a'' is the derivative of f at point g(p), and a' is the derivative of g at point p. That is exactly what the chain rule is.

OK. That wasn't as simple as I hoped. But I hope you get the picture a little better when you replace the f'(g(x)) clutter with constants. I think, in particular, you can see where the multiplication comes in. It's linear. You substitute. You shave off a few bits, toss them into your constant, and multiply a few coefficients.
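
A concrete instance of the computation above (the functions and the point are just an example, not from the thread): take g(x) = x^2 and f(u) = sin(u) at p = 1. Near p = 1, g(x) ≈ 2x - 1, so a' = 2 and b' = -1; near g(p) = 1, f(u) ≈ cos(1) u + (sin(1) - cos(1)), so a'' = cos(1). Composing the two linearizations:

[tex](f \circ g)(x) \approx \cos(1)(2x - 1) + \sin(1) - \cos(1) = \underbrace{2\cos(1)}_{a''\,a'}\,x + \sin(1) - 2\cos(1),[/tex]

and the coefficient of x is exactly what the chain rule gives: (f ∘ g)'(1) = f'(g(1)) g'(1) = 2cos(1).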
 


Thanks!
 
