Where does the summation come from?


Discussion Overview

The discussion revolves around the generalized chain rule in calculus, specifically focusing on the derivation and understanding of the summation that arises when differentiating composite functions of multiple variables. Participants explore the implications of this rule, its proof, and its educational presentation.

Discussion Character

  • Exploratory
  • Technical explanation
  • Conceptual clarification
  • Debate/contested

Main Points Raised

  • One participant expresses confusion about the origin of the summation in the generalized chain rule, questioning whether it relates to the product rule.
  • Another participant identifies the concept as the generalized chain rule and remarks that it is rarely taught in standard curricula.
  • Some participants suggest that the summation can be understood as applying the chain rule independently to each inner function and then summing the results.
  • A participant proposes a rigorous proof involving mappings and the standard chain rule, indicating a mathematical approach to understanding the rule.
  • There is a discussion about the intuitive versus rigorous explanations of the generalized chain rule, with some preferring simpler explanations.
  • One participant shares a handwavy argument that illustrates how moving in multiple directions affects the function, leading to the summation in the derivative expression.

Areas of Agreement / Disagreement

Participants express varying levels of understanding and familiarity with the generalized chain rule, with some agreeing on its significance while others remain uncertain about its presentation in educational settings. No consensus is reached on the best way to explain the summation or its derivation.

Contextual Notes

Some participants note that the generalized chain rule is not commonly taught in standard calculus courses, leading to confusion about its application and the summation involved. There are also references to the complexity of proofs and the varying approaches to understanding the rule.

Zap
TL;DR
Applying the chain rule in a way that is not often encountered.
I want to take the derivative of a composite function that looks like

$$f( g(x), h(x) ).$$

I know from Wolfram that the answer is

$$\frac{ df( g(x), h(x) ) }{ dx } = \frac{ dg(x) }{ dx }\frac{ \partial f( g(x), h(x) ) }{ \partial g(x) } + \frac{ dh(x) }{ dx }\frac{ \partial f( g(x), h(x) ) }{ \partial h(x) }.$$

We can generalize the result to get

$$\frac{ df( g_{1}(x), g_{2}(x), ..., g_{n}(x) ) }{ dx } = \sum_{i=1}^{n}{ \frac{ dg_{i}(x) }{ dx }\frac{ \partial f( g_{1}(x), g_{2}(x), ..., g_{n}(x) ) }{ \partial g_{i}(x) } }.$$
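A quick numerical sanity check of the n-variable formula, as a sketch only: the particular f, the three inner functions, the evaluation point, and the helper names below are all hypothetical choices made for illustration. It compares a central finite difference of the composite function against the chain-rule sum.

```python
import math

def central_diff(fn, x, step=1e-6):
    """Approximate fn'(x) with a symmetric (central) difference quotient."""
    return (fn(x + step) - fn(x - step)) / (2 * step)

# Hypothetical outer function f(y1, y2, y3) = y1 * y2 + sin(y3)
def f(y1, y2, y3):
    return y1 * y2 + math.sin(y3)

# Partial derivatives of f with respect to each argument, written out by hand.
def f_partials(y1, y2, y3):
    return [y2, y1, math.cos(y3)]

# Hypothetical inner functions g_i(x) and their derivatives g_i'(x).
gs = [math.exp, math.sin, lambda x: x ** 2]
dgs = [math.exp, math.cos, lambda x: 2 * x]

def composite(x):
    """x -> f(g_1(x), g_2(x), g_3(x))."""
    return f(*(g(x) for g in gs))

def chain_rule_sum(x):
    """Sum over i of g_i'(x) times the i-th partial of f at (g_1(x), ..., g_n(x))."""
    partials = f_partials(*(g(x) for g in gs))
    return sum(dg(x) * p for dg, p in zip(dgs, partials))

x0 = 0.7
lhs = central_diff(composite, x0)
rhs = chain_rule_sum(x0)
print(f"finite difference: {lhs:.8f}")
print(f"chain-rule sum:    {rhs:.8f}")
```

Each term of the sum comes from one inner function; dropping any of them makes the two numbers disagree.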

However, I am struggling to understand why. Where does the summation come from? It's like the product rule was applied somewhere, but I'm not seeing where.

Can anyone help me to understand this?

Thanks,
Zap.
 
Whoops. I just found out this is called the generalized chain rule ... I guess in school they teach you a special case of the chain rule, which confusingly is referred to as THE chain rule ... weird.

Anyway, it's good enough for me to know that this has a special name called the generalized or general chain rule, and I don't have to say anything more than "according to the generalized chain rule" when presenting this result.

I briefly checked out the proof of the generalized chain rule, and it was a bit involved. So, my curiosity quickly plummeted.

This is a good site explaining the general version of the chain rule:
https://web.ma.utexas.edu/users/m408m/Display14-5-4.shtml#:~:text=The General Version of the,function of s and t.

That's pretty cool. I wonder why they don't teach us the general chain rule in school. I'm actually a little bit upset about that. What the heck did I pay all that money for? To be lied to?

I digress. I will change the name of this post to "The Generalized Chain Rule."
 
Do you want an intuitive reason why it is so, or a rigorous proof?

EDIT: Oh I just read the second post haha
 
The rigorous proof is too much ... I thought it was something simpler.
 
This is somewhat similar to the total derivative (using the chain rule independently for the two inner functions), but then setting the independent variables equal.
 
That is a much better way of looking at it!
 
I love the generalized chain rule. Was anyone taught this in school? I took calculus I, II, III and differential equations without ever seeing it.
 
It can be proved as follows. Consider the mappings
$$A:\mathbb{R}\to\mathbb{R}^2,\quad A(x)=(g(x),h(x))^T$$
and $$B=f\circ A,\quad f:\mathbb{R}^2\to\mathbb{R}.$$
Then by the standard chain rule you have
$$dB(x)=df(A(x))\circ dA(x).$$
This is it
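To make explicit where the summation hides in this proof: the differentials are linear maps, represented by Jacobian matrices, and composing them is a matrix product. Assuming the components of A are ordered as (g(x), h(x)) so that B(x) = f(g(x), h(x)), and naming f's two arguments y and z (a labeling choice, not from the post), the 1×2 row of f times the 2×1 column of A is exactly a two-term sum:

$$B'(x) = \begin{pmatrix} \dfrac{\partial f}{\partial y}(A(x)) & \dfrac{\partial f}{\partial z}(A(x)) \end{pmatrix} \begin{pmatrix} g'(x) \\ h'(x) \end{pmatrix} = \frac{\partial f}{\partial y}(A(x))\, g'(x) + \frac{\partial f}{\partial z}(A(x))\, h'(x)$$

With n inner functions the row and column have n entries, and the same product gives the n-term sum.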
 
Notice that your first example is just using the simple chain rule one variable at a time (on g and on h) as a function of x and then summing the results. This appears in a multitude of different combinations and situations. So they just teach the simple one and leave all the particular combinations up to you to figure out. I would be surprised if you did not run into this in some of your calculus classes. Maybe you used it and didn't realize it.
 
I'm invoking the crud out of this chain rule. You can also nest this general chain rule inside itself indefinitely. I would write that out, but it's too frustrating to write it all in LaTeX. The summation is still not obvious to me unless I pick a specific function, such as f( g(x), h(x) ) = g(x)h(x), where the product rule applies, or f( g(x), h(x) ) = g(x) + h(x), where the sum rule applies.
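One way to read those two examples in the other direction: the general rule holds for every differentiable f, and the product and sum rules fall out of it as special cases rather than being the source of the summation. Naming f's arguments y and z (a labeling choice, not from the post), take f(y,z) = yz, so that ##\partial f/\partial y = z## and ##\partial f/\partial z = y##:

$$\frac{d}{dx}\big[g(x)h(x)\big] = g'(x)\,h(x) + h'(x)\,g(x),$$

which is the product rule. For f(y,z) = y + z both partials equal 1:

$$\frac{d}{dx}\big[g(x)+h(x)\big] = g'(x) + h'(x),$$

which is the sum rule.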
 
Here's how I would think about it. It's a kind of handwavy argument, but it's basically how a full proof works.

Suppose you have f(y,z), and you want to know how much it moves if you move ##\Delta y## in the y direction, and nothing in the z direction. By just normal derivative rules, that's
$$f(y+\Delta y,z) \approx f(y,z) + \left(\frac{\partial f}{\partial y}(y, z)\right) \Delta y$$

Now suppose we also move a little bit, ##\Delta z##, in the z direction.

$$f(y+\Delta y,z+\Delta z) \approx f(y+\Delta y, z) + \left( \frac{\partial f}{\partial z} (y+\Delta y,z)\right) \Delta z$$

Now we notice a neat trick. Let's assume f is twice differentiable, just to keep things simple:

$$
\left( \frac{\partial f}{\partial z} (y+\Delta y,z)\right) \Delta z \approx \left( \frac{\partial f}{\partial z} (y,z)\right) \Delta z + \left( \frac{ \partial^2 f}{\partial y \partial z}(y,z)\right) \Delta y \Delta z$$

But the ##\Delta y \Delta z## is a second order term, so we can drop it. Plugging this back into that previous equation,

$$f(y+\Delta y,z+\Delta z) \approx f(y+\Delta y, z) + \left( \frac{\partial f}{\partial z} (y,z)\right) \Delta z$$

And plugging in our approximation for ##f(y+\Delta y,z)## we get

$$f(y+\Delta y,z+\Delta z) \approx f(y, z) + \left( \frac{\partial f}{\partial y} (y,z)\right) \Delta y + \left( \frac{\partial f}{\partial z} (y,z)\right) \Delta z$$

If you're still a bit confused by what's going on, the point is that if you move a little in the y direction, you have to add some ##\frac{\partial f}{\partial y} \Delta y##. If you then move in the z direction, you have to add some ##\frac{\partial f}{\partial z} \Delta z##. You might worry about the point at which you are evaluating those derivatives, but if f is sufficiently differentiable and your move is small enough, it turns out it doesn't matter: you can evaluate them at your original point.
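A small numerical illustration of why dropping the ##\Delta y \Delta z## cross term is harmless, as a sketch under hypothetical choices (the function f(y,z) = sin(y)e^z, the base point, and the step sizes are all made up for this check). It measures how far the first-order estimate, with both partials evaluated at the original point, misses the exact change. If everything neglected is second order, halving the step should cut the error by roughly a factor of 4.

```python
import math

# Hypothetical example function f(y, z) = sin(y) * exp(z) and its partials.
def f(y, z):
    return math.sin(y) * math.exp(z)

def f_y(y, z):
    return math.cos(y) * math.exp(z)

def f_z(y, z):
    return math.sin(y) * math.exp(z)

def linearization_error(y, z, dy, dz):
    """Exact change minus the estimate f_y*dy + f_z*dz, with both partials at (y, z)."""
    exact = f(y + dy, z + dz) - f(y, z)
    estimate = f_y(y, z) * dy + f_z(y, z) * dz
    return abs(exact - estimate)

y0, z0 = 0.3, -0.2
errors = []
step = 0.1
for _ in range(5):
    # Move the same amount in both directions, as in the argument above.
    errors.append(linearization_error(y0, z0, step, step))
    step /= 2

# Ratio of consecutive errors; roughly 4 means the error scales like step**2.
ratios = [errors[i] / errors[i + 1] for i in range(len(errors) - 1)]
print(ratios)
```

The ratios come out close to 4, which is what "second order" means in practice: the cross term and the curvature terms shrink much faster than the ##\Delta y## and ##\Delta z## terms that are kept.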

Cool. Now if we have ##f(g(x),h(x))##, if I move x by ##\Delta x## then I move the y input by ##\Delta y \approx g'(x) \Delta x## and the z input by ##\Delta z \approx h'(x) \Delta x##. So

$$f(g(x+\Delta x),h(x+ \Delta x))\approx f(g(x),h(x)) + \left( \frac{\partial f}{\partial y}(g(x),h(x))\right) g'(x) \Delta x + \left(\frac{\partial f}{\partial z}(g(x),h(x))\right) h'(x) \Delta x$$

The derivative is literally the coefficient I multiply ##\Delta x## by in order to approximate how much ##f(g(x),h(x))## moves, so adding up the coefficients of ##\Delta x## in the final two terms gives the derivative:
$$\frac{d}{dx} f(g(x),h(x)) = \frac{\partial f}{\partial y}(g(x),h(x))\, g'(x) + \frac{\partial f}{\partial z}(g(x),h(x))\, h'(x)$$
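The "coefficient of ##\Delta x##" idea in the post above is exactly what forward-mode automatic differentiation with dual numbers keeps track of. This is a swapped-in technique, not something from the thread, and the class and function names and the choice f(y,z) = yz + y, g = sin, h = exp are hypothetical. Each number carries a value plus its coefficient of ##\Delta x##. When f combines its two inputs, their separate coefficients get added together, and that addition is the summation in the chain rule.

```python
import math
from dataclasses import dataclass

@dataclass
class Dual:
    """A number val + der*eps with eps**2 = 0; `der` is the coefficient of Delta x."""
    val: float
    der: float

    def __add__(self, other):
        # Coefficients of Delta x from the two inputs simply add.
        return Dual(self.val + other.val, self.der + other.der)

    def __mul__(self, other):
        # (a + b eps)(c + d eps) = ac + (ad + bc) eps, since eps**2 = 0
        return Dual(self.val * other.val, self.val * other.der + self.der * other.val)

def dsin(u):
    """sin lifted to dual numbers: the inner coefficient is scaled by cos (1-D chain rule)."""
    return Dual(math.sin(u.val), math.cos(u.val) * u.der)

def dexp(u):
    """exp lifted to dual numbers."""
    return Dual(math.exp(u.val), math.exp(u.val) * u.der)

# Hypothetical outer function f(y, z) = y * z + y
def f(y, z):
    return y * z + y

x0 = 0.5
x = Dual(x0, 1.0)   # seed: x moves by exactly Delta x
y = dsin(x)         # y input: value g(x0), coefficient g'(x0)
z = dexp(x)         # z input: value h(x0), coefficient h'(x0)
out = f(y, z)

# Chain-rule sum written out by hand: f_y * g'(x0) + f_z * h'(x0)
f_y = math.exp(x0) + 1.0   # df/dy = z + 1 at z = h(x0)
f_z = math.sin(x0)         # df/dz = y at y = g(x0)
by_formula = f_y * math.cos(x0) + f_z * math.exp(x0)
print(out.der, by_formula)
```

The two printed numbers agree up to rounding, even though the dual-number code never wrote down a sum of partial derivatives explicitly.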
 