Where does the summation come from?

Zap · Feb 8, 2021

I want to take the derivative of a composite function that looks like

$$f( g(x), h(x) ).$$

I know from Wolfram that the answer is

$$\frac{ df( g(x), h(x) ) }{ dx } = \frac{ dg(x) }{ dx }\frac{ df( g(x), h(x) ) }{ dg(x) } + \frac{ dh(x) }{ dx }\frac{ df( g(x), h(x) ) }{ dh(x) }.$$

We can generalize the result to get

$$\frac{ df( g_{1}(x), g_{2}(x), \dots, g_{n}(x) ) }{ dx } = \sum_{i=1}^{n}{ \frac{ dg_{i}(x) }{ dx }\frac{ df( g_{1}(x), g_{2}(x), \dots, g_{n}(x) ) }{ dg_{i}(x) } }.$$

However, I am struggling to understand why. Where does the summation come from? It's like the product rule was applied somewhere, but I'm not seeing where.

Can anyone help me to understand this?

Thanks,
Zap.

Zap · Feb 8, 2021

Whoops. I just found out this is called the generalized chain rule ... I guess in school they teach you a special case of the chain rule, which confusingly is referred to as THE chain rule ... weird.

Anyway, it's good enough for me to know that this has a special name called the generalized or general chain rule, and I don't have to say anything more than "according to the generalized chain rule" when presenting this result.

I briefly checked out the proof of the generalized chain rule, and it was a bit involved. So, my curiosity quickly plummeted.

This is a good site explaining the general version of the chain rule :
https://web.ma.utexas.edu/users/m408m/Display14-5-4.shtml#:~:text=The General Version of the,function of s and t.

That's pretty cool. I wonder why they don't teach us the general chain rule in school. I'm actually a little bit upset about that. What the heck did I pay all that money for? To be lied to?

I digress. I will change the name of this post to "The Generalized Chain Rule."

AndreasC · Feb 8, 2021

Do you want an intuitive reason why it is so, or a rigorous proof?

EDIT: Oh I just read the second post haha

Zap · Feb 8, 2021

The rigorous proof is too much ... I thought it was something simpler.

mfb · Feb 8, 2021

This is somewhat similar to the total derivative (using the chain rule independently for the two inner functions), but then setting the independent variables equal.
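One way to spell out the idea above, as a sketch only: give the two inner functions separate variables ##s## and ##t## (names introduced here purely for illustration), apply the ordinary one-variable chain rule to each, and then set ##s = t = x##. Here ##y## and ##z## denote the first and second arguments of ##f##.

$$F(s,t)=f(g(s),h(t)),\qquad \frac{\partial F}{\partial s}=\frac{\partial f}{\partial y}\,g'(s),\qquad \frac{\partial F}{\partial t}=\frac{\partial f}{\partial z}\,h'(t)$$

$$dF=\frac{\partial F}{\partial s}\,ds+\frac{\partial F}{\partial t}\,dt$$

Setting ##s=t=x##, so that ##ds=dt=dx##, gives

$$\frac{dF}{dx}=\frac{\partial f}{\partial y}\,g'(x)+\frac{\partial f}{\partial z}\,h'(x),$$

with the partial derivatives of ##f## evaluated at ##(g(x),h(x))##. The sum is already present in the total differential; each independent variable contributes one term.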

Ralph Dratman · Feb 8, 2021

That is a much better way of looking at it!

Zap · Feb 11, 2021

I love the generalized chain rule. Was anyone taught this in school? I took calculus I, II, III and differential equations without ever seeing it.

PeroK · Feb 11, 2021

Zap said:

I love the generalized chain rule. Was anyone taught this in school? I took calculus I, II, III and differential equations without ever seeing it.

I did an Insight on how to take the second derivative in general:

https://www.physicsforums.com/insights/how-to-solve-second-order-partial-derivatives/

wrobel · Feb 12, 2021

It can be proved as follows. Consider the mappings
$$A:\mathbb{R}\to\mathbb{R}^2,\quad A(x)=(g(x),h(x))^T$$
and $$B=f\circ A,\quad f:\mathbb{R}^2\to\mathbb{R}.$$
Then by the standard chain rule you have
$$dB=df\circ dA.$$
That is it.
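Written out in coordinates, the composition of differentials above is a matrix product. A sketch, taking ##A(x)=(g(x),h(x))^T## to match the order of arguments in the original question and writing ##y, z## for the two arguments of ##f## (labels assumed here for illustration):

$$df=\begin{pmatrix}\dfrac{\partial f}{\partial y} & \dfrac{\partial f}{\partial z}\end{pmatrix},\qquad dA=\begin{pmatrix} g'(x)\\ h'(x)\end{pmatrix}$$

$$B'(x)=\begin{pmatrix}\dfrac{\partial f}{\partial y} & \dfrac{\partial f}{\partial z}\end{pmatrix}\begin{pmatrix} g'(x)\\ h'(x)\end{pmatrix}=\frac{\partial f}{\partial y}\,g'(x)+\frac{\partial f}{\partial z}\,h'(x)$$

The partial derivatives are evaluated at ##A(x)##. Seen this way, the summation in the question is just the sum that appears in a row-times-column product.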

FactChecker · Feb 12, 2021

Notice that your first example is just using the simple chain rule one variable at a time (on g and on h) as a function of x and then summing the results. This appears in a multitude of different combinations and situations. So they just teach the simple one and leave all the particular combinations up to you to figure out. I would be surprised if you did not run into this in some of your calculus classes. Maybe you used it and didn't realize it.

Zap · Feb 20, 2021

I'm invoking the crud out of this chain rule. You can also nest this general chain rule inside of itself forever. I would write that out, but it's too frustrating to write all that out in LaTeX. The summation is still not obvious to me unless I assume the function requires the product rule, such as in f( g(x), h(x) ) = g(x)h(x), or is itself a sum, such as f( g(x), h(x) ) = g(x) + h(x).
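The two examples mentioned above can be checked directly against the general formula. A short worked sketch, writing ##y## and ##z## for the first and second arguments of ##f## (labels chosen here for illustration):

$$f(y,z)=yz:\qquad \frac{\partial f}{\partial y}=z,\quad \frac{\partial f}{\partial z}=y \quad\Longrightarrow\quad \frac{d}{dx}\big[g(x)h(x)\big]=h(x)\,g'(x)+g(x)\,h'(x)$$

$$f(y,z)=y+z:\qquad \frac{\partial f}{\partial y}=1,\quad \frac{\partial f}{\partial z}=1 \quad\Longrightarrow\quad \frac{d}{dx}\big[g(x)+h(x)\big]=g'(x)+h'(x)$$

So the product rule and the sum rule drop out as special cases of the general chain rule, rather than the other way around. The summation does not depend on ##f## being a product or a sum; each argument of ##f## contributes its own term.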

Office_Shredder · Feb 20, 2021

Here's how I would think about it. It's a kind of handwavy argument, but it's basically how a full proof works.

Suppose you have f(y,z), and you want to know how much it moves if you move ##\Delta y## in the y direction, and nothing in the z direction. By just normal derivative rules, that's
$$f(y+\Delta y,z) \approx f(y,z) + \left(\frac{\partial f}{\partial y}(y, z)\right) \Delta y$$

Now suppose we also want to move ##\Delta z## in the z direction.

$$f(y+\Delta y,z+\Delta z) \approx f(y+\Delta y, z) + \left( \frac{\partial f}{\partial z} (y+\Delta y,z)\right) \Delta z$$

Now we notice a neat trick. Let's assume f is twice differentiable, just to keep things simple:

$$
\left( \frac{\partial f}{\partial z} (y+\Delta y,z)\right) \Delta z \approx \left( \frac{\partial f}{\partial z} (y,z)\right) \Delta z + \left( \frac{ \partial^2 f}{\partial y \partial z}(y,z)\right) \Delta y \Delta z$$

But the ##\Delta y \Delta z## is a second order term, so we can drop it. Plugging this back into that previous equation,

$$f(y+\Delta y,z+\Delta z) \approx f(y+\Delta y, z) + \left( \frac{\partial f}{\partial z} (y,z)\right) \Delta z$$

And plugging in our approximation for ##f(y+\Delta y,z)## we get

$$f(y+\Delta y,z+\Delta z) \approx f(y, z) + \left( \frac{\partial f}{\partial y} (y,z)\right) \Delta y + \left( \frac{\partial f}{\partial z} (y,z)\right) \Delta z$$

If you're still a bit confused by what's going on, the point is that if you move a little in the y direction, you have to add some ##\frac{\partial f}{\partial y} \Delta y##. If you then move in the z direction, you have to add some ##\frac{\partial f}{\partial z} \Delta z##. You should be worried about the point at which you are evaluating those derivatives, but if f is sufficiently differentiable and your move is small enough, it turns out it doesn't matter: you can evaluate both at your original point.

Cool. Now if we have ##f(g(x),h(x))## and I move x by ##\Delta x##, then I move the y input by ##\Delta y \approx g'(x) \Delta x## and the z input by ##\Delta z \approx h'(x) \Delta x##. So

$$f(g(x+\Delta x),h(x+ \Delta x))\approx f(g(x),h(x)) + \left( \frac{\partial f}{\partial y}(g(x),h(x))\right) g'(x) \Delta x + \left(\frac{\partial f}{\partial z}(g(x),h(x))\right) h'(x) \Delta x$$

The derivative is literally the term I multiply ##\Delta x## by in order to approximate how much ##f(g(x),h(x))## moves, so the derivative is the sum of the coefficients of ##\Delta x## in the final two terms.
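The approximation argument above can also be checked numerically. The following is a minimal sketch, not part of the original posts: the particular choices ##f(y,z)=y^2\sin z##, ##g(x)=e^x##, ##h(x)=x^3## and all function names are assumptions made purely for illustration. It compares the two-term chain rule sum with a central finite-difference estimate of the derivative of ##f(g(x),h(x))##.

```python
import math

# Illustrative (assumed) outer function f(y, z) and its partial derivatives.
def f(y, z):
    return y * y * math.sin(z)

def df_dy(y, z):
    return 2.0 * y * math.sin(z)

def df_dz(y, z):
    return y * y * math.cos(z)

# Illustrative (assumed) inner functions g(x), h(x) and their derivatives.
def g(x):
    return math.exp(x)

def g_prime(x):
    return math.exp(x)

def h(x):
    return x ** 3

def h_prime(x):
    return 3.0 * x ** 2

def chain_rule_derivative(x):
    """Sum of one term per argument of f, each evaluated at (g(x), h(x))."""
    y, z = g(x), h(x)
    return df_dy(y, z) * g_prime(x) + df_dz(y, z) * h_prime(x)

def finite_difference_derivative(x, dx=1e-6):
    """Central difference estimate of d/dx f(g(x), h(x))."""
    forward = f(g(x + dx), h(x + dx))
    backward = f(g(x - dx), h(x - dx))
    return (forward - backward) / (2.0 * dx)

if __name__ == "__main__":
    x0 = 0.7
    print("chain rule sum:   ", chain_rule_derivative(x0))
    print("finite difference:", finite_difference_derivative(x0))
```

The two printed values should agree to several decimal places. Dropping either term from `chain_rule_derivative` breaks the agreement, which is a concrete way to see that both contributions are needed.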
