# The Generalized Chain Rule

Summary:
Applying the chain rule in a way that is not often encountered.
I want to take the derivative of a composite function that looks like

$$f( g(x), h(x) ).$$

I know from Wolfram that the answer is

$$\frac{ df( g(x), h(x) ) }{ dx } = \frac{ dg(x) }{ dx }\frac{ df( g(x), h(x) ) }{ dg(x) } + \frac{ dh(x) }{ dx }\frac{ df( g(x), h(x) ) }{ dh(x) }.$$

We can generalize the result to get

$$\frac{ df( g_{1}(x), g_{2}(x), ..., g_{n}(x) ) }{ dx } = \sum_{i=1}^{n}{ \frac{ dg_{i}(x) }{ dx }\frac{ df( g_{1}(x), g_{2}(x), ..., g_{n}(x) ) }{ dg_{i}(x) } }.$$
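A quick numerical sanity check of the summation formula, as a minimal sketch. The functions `f` and `gs` below are made-up examples (not from the thread), and every derivative is estimated with central finite differences, so agreement is only approximate.

```python
import math

def central_diff(fn, x, eps=1e-6):
    """Central finite-difference estimate of d(fn)/dx at x."""
    return (fn(x + eps) - fn(x - eps)) / (2 * eps)

def chain_rule_sum(f, gs, x, eps=1e-6):
    """Sum over i of g_i'(x) times the partial derivative of f in slot i."""
    args = [g(x) for g in gs]
    total = 0.0
    for i, g in enumerate(gs):
        # Vary only the i-th argument of f, holding the other slots fixed.
        def f_slot(u, i=i):
            shifted = list(args)
            shifted[i] = u
            return f(*shifted)
        total += central_diff(g, x, eps) * central_diff(f_slot, args[i], eps)
    return total

# Hypothetical example: f(u, v, w) = u*v + sin(w) with three inner functions.
def f(u, v, w):
    return u * v + math.sin(w)

gs = [math.sin, math.exp, lambda x: x ** 2]

x0 = 0.7
# Left-hand side: differentiate the composite directly.
direct = central_diff(lambda x: f(*(g(x) for g in gs)), x0)
# Right-hand side: the sum from the generalized chain rule.
via_sum = chain_rule_sum(f, gs, x0)
print(direct, via_sum)  # the two numbers should agree to several digits
```

Each term of the sum measures how much the composite changes through one inner function alone, which is why the contributions add.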

However, I am struggling to understand why. Where does the summation come from? It's like the product rule was applied somewhere, but I'm not seeing where.

Can anyone help me to understand this?

Thanks,
Zap.

Whoops. I just found out this is called the generalized chain rule ... I guess in school they teach you a special case of the chain rule, which confusingly is referred to as THE chain rule ... weird.

Anyway, it's good enough for me to know that this has a special name called the generalized or general chain rule, and I don't have to say anything more than "according to the generalized chain rule" when presenting this result.

I briefly checked out the proof of the generalized chain rule, and it was a bit involved. So, my curiosity quickly plummeted.

This is a good site explaining the general version of the chain rule :
https://web.ma.utexas.edu/users/m408m/Display14-5-4.shtml#:~:text=The General Version of the,function of s and t.

That's pretty cool. I wonder why they don't teach us the general chain rule in school. I'm actually a little bit upset about that. What the heck did I pay all that money for? To be lied to?

I digress. I will change the name of this post to "The Generalized Chain Rule."

AndreasC
Gold Member
Do you want an intuitive reason why it is so, or a rigorous proof?

EDIT: Oh I just read the second post haha

The rigorous proof is too much ... I thought it was something simpler.

mfb
Mentor
This is somewhat similar to the total derivative (using the chain rule independently for the two inner functions), but then setting the independent variables equal.

That is a much better way of looking at it!

I love the generalized chain rule. Was anyone taught this in school? I took calculus I, II, III and differential equations without ever seeing it.

wrobel
It can be proved as follows. Consider the mappings
$$A:\mathbb{R}\to\mathbb{R}^2,\quad A(x)=(g(x),h(x))^T$$
and $$B=f\circ A,\quad f:\mathbb{R}^2\to\mathbb{R}.$$
Then by the standard chain rule you have
$$dB=df\circ dA.$$
That is all there is to it.
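To see where the sum hides in ##dB = df\circ dA##, here is a minimal sketch that takes ##A(x) = (g(x), h(x))## so that ##f\circ A = f(g(x),h(x))##. The concrete `g`, `h` and `f` are assumptions chosen for illustration, and the Jacobians are finite-difference estimates. ##df## is a 1x2 row, ##dA## is a 2x1 column, and their product is exactly the two-term sum.

```python
import math

# Hypothetical example functions (not from the thread).
def g(x):
    return math.sin(x)

def h(x):
    return x ** 2

def f(y, z):
    return y * math.exp(z)

def A(x):
    """The inner map A: R -> R^2."""
    return (g(x), h(x))

def dA(x, eps=1e-6):
    """2x1 Jacobian (column) of A at x, by central differences."""
    hi, lo = A(x + eps), A(x - eps)
    return [(a - b) / (2 * eps) for a, b in zip(hi, lo)]

def df(y, z, eps=1e-6):
    """1x2 Jacobian (row) of f at (y, z), by central differences."""
    return [
        (f(y + eps, z) - f(y - eps, z)) / (2 * eps),
        (f(y, z + eps) - f(y, z - eps)) / (2 * eps),
    ]

x0 = 0.5
row = df(*A(x0))
col = dA(x0)
# Row times column: this matrix product is the summation in the chain rule.
dB = sum(r * c for r, c in zip(row, col))

# Closed form for this particular example, B(x) = sin(x) * exp(x^2).
exact = math.cos(x0) * math.exp(x0 ** 2) + math.sin(x0) * math.exp(x0 ** 2) * 2 * x0
print(dB, exact)
```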

FactChecker
Gold Member
Notice that your first example is just using the simple chain rule one variable at a time (on g and on h) as a function of x and then summing the results. This appears in a multitude of different combinations and situations. So they just teach the simple one and leave all the particular combinations up to you to figure out. I would be surprised if you did not run into this in some of your calculus classes. Maybe you used it and didn't realize it.

I'm invoking the crud out of this chain rule. You can also nest this general chain rule inside of itself forever. I would write that out, but it's too frustrating to write all that out in LaTeX. The summation is still not obvious to me unless I assume the function requires the product rule, such as in f( g(x), h(x) ) = g(x)h(x), or something like f( g(x), h(x) ) = g(x) + h(x).
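For what it's worth, those two examples can be worked through directly. Writing ##f(u,v)## with ##u = g(x)## and ##v = h(x)## (the names ##u## and ##v## are just labels introduced here), the general formula gives

$$f(u,v)=uv:\quad \frac{\partial f}{\partial u}=v,\quad \frac{\partial f}{\partial v}=u \quad\Rightarrow\quad \frac{d}{dx}\big[g(x)h(x)\big] = g'(x)\,h(x) + h'(x)\,g(x)$$

$$f(u,v)=u+v:\quad \frac{\partial f}{\partial u}=1,\quad \frac{\partial f}{\partial v}=1 \quad\Rightarrow\quad \frac{d}{dx}\big[g(x)+h(x)\big] = g'(x) + h'(x)$$

So the product rule and the sum rule both drop out as special cases of the general chain rule, rather than the chain rule needing the product rule.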

Office_Shredder
Staff Emeritus
Gold Member
Here's how I would think about it. It's a kind of handwavy argument, but it's basically how a full proof works.

Suppose you have f(y,z), and you want to know how much it moves if you move ##\Delta y## in the y direction, and nothing in the z direction. By just normal derivative rules, that's
$$f(y+\Delta y,z) \approx f(y,z) + \left(\frac{\partial f}{\partial y}(y, z)\right) \Delta y$$

Now suppose we also want to move a little bit, ##\Delta z##, in the z direction.

$$f(y+\Delta y,z+\Delta z) \approx f(y+\Delta y, z) + \left( \frac{\partial f}{\partial z} (y+\Delta y,z)\right) \Delta z$$

Now we notice a neat trick. Let's assume f is twice differentiable just to keep things simple:

$$\left( \frac{\partial f}{\partial z} (y+\Delta y,z)\right) \Delta z \approx \left( \frac{\partial f}{\partial z} (y,z)\right) \Delta z + \left( \frac{ \partial^2 f}{\partial y \partial z}(y,z)\right) \Delta y \Delta z$$

But the ##\Delta y \Delta z## is a second order term, so we can drop it. Plugging this back into that previous equation,

$$f(y+\Delta y,z+\Delta z) \approx f(y+\Delta y, z) + \left( \frac{\partial f}{\partial z} (y,z)\right) \Delta z$$

And plugging in our approximation for ##f(y+\Delta y,z)## we get

$$f(y+\Delta y,z+\Delta z) \approx f(y, z) + \left( \frac{\partial f}{\partial y} (y,z)\right) \Delta y + \left( \frac{\partial f}{\partial z} (y,z)\right) \Delta z$$

If you're still a bit confused by what's going on, the point is if you move a little in the y direction, you have to add some ##\frac{\partial f}{\partial y} \Delta y##. If you then move in the z direction, you have to add some ##\frac{\partial f}{\partial z} \Delta z##. You should be worried about at what point you are evaluating those derivatives, but if f is sufficiently differentiable and your move is small enough, it turns out it doesn't matter, you can do it at your original point.

Cool. Now if we have ##f(g(x),h(x))## and I move x by ##\Delta x##, then I move the y input by ##\Delta y = g'(x) \Delta x## and the z input by ##\Delta z = h'(x) \Delta x##. So

$$f(g(x+\Delta x),h(x+ \Delta x))\approx f(g(x),h(x)) + \left( \frac{\partial f}{\partial y}(g(x),h(x))\right) g'(x) \Delta x + \left(\frac{\partial f}{\partial z}(g(x),h(x))\right) h'(x) \Delta x$$

The derivative is literally the term I multiply ##\Delta x ## by in order to approximate how much ##f(g(x),h(x))## moves by, so I have computed the derivative in that approximation by adding the final two terms.
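The claim that the dropped terms are second order can be checked numerically. This is a rough sketch with made-up `g`, `h` and `f` (assumptions, not from the thread) and their hand-computed derivatives. If the linear estimate is right, shrinking ##\Delta x## by a factor of 10 should shrink the error by roughly a factor of 100.

```python
import math

# Hypothetical example functions and their exact derivatives.
def g(x):
    return math.sin(x)

def h(x):
    return math.exp(x)

def f(y, z):
    return y * z + z ** 2

def g_prime(x):
    return math.cos(x)

def h_prime(x):
    return math.exp(x)

def f_y(y, z):
    return z            # partial derivative of f with respect to y

def f_z(y, z):
    return y + 2 * z    # partial derivative of f with respect to z

def linear_estimate(x, dx):
    """f(g(x), h(x)) plus the two first-order correction terms."""
    y, z = g(x), h(x)
    return f(y, z) + f_y(y, z) * g_prime(x) * dx + f_z(y, z) * h_prime(x) * dx

def error(x, dx):
    """Gap between the true composite value and the linear estimate."""
    return abs(f(g(x + dx), h(x + dx)) - linear_estimate(x, dx))

x0 = 0.3
steps = (1e-1, 1e-2, 1e-3)
errors = [error(x0, dx) for dx in steps]
# Ratio of successive errors; second-order behaviour means ratios near 100.
ratios = [errors[i] / errors[i + 1] for i in range(len(errors) - 1)]
print(errors)
print(ratios)
```

An error that scales like ##\Delta x^2## is what lets the coefficient of ##\Delta x## be read off as the derivative.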
