Where does the summation come from?

  • #1
Zap
TL;DR Summary
Applying the chain rule in a way that is not often encountered.
I want to take the derivative of a composite function that looks like

$$f( g(x), h(x) ).$$

I know from Wolfram that the answer is

$$\frac{ df( g(x), h(x) ) }{ dx } = \frac{ dg(x) }{ dx }\frac{ df( g(x), h(x) ) }{ dg(x) } + \frac{ dh(x) }{ dx }\frac{ df( g(x), h(x) ) }{ dh(x) }.$$

We can generalize the result to get

$$\frac{ df( g_{1}(x), g_{2}(x), ..., g_{n}(x) ) }{ dx } = \sum_{i=1}^{n}{ \frac{ dg_{i}(x) }{ dx }\frac{ df( g_{1}(x), g_{2}(x), ..., g_{n}(x) ) }{ dg_{i}(x) } }.$$

However, I am struggling to understand why. Where does the summation come from? It's like the product rule was applied somewhere, but I'm not seeing where.

Can anyone help me to understand this?

Thanks,
Zap.
 
  • #2
Whoops. I just found out this is called the generalized chain rule ... I guess in school they teach you a special case of the chain rule, which confusingly is referred to as THE chain rule ... weird.

Anyway, it's good enough for me to know that this has a special name called the generalized or general chain rule, and I don't have to say anything more than "according to the generalized chain rule" when presenting this result.

I briefly checked out the proof of the generalized chain rule, and it was a bit involved. So, my curiosity quickly plummeted.

This is a good site explaining the general version of the chain rule :
https://web.ma.utexas.edu/users/m408m/Display14-5-4.shtml#:~:text=The General Version of the,function of s and t.

That's pretty cool. I wonder why they don't teach us the general chain rule in school. I'm actually a little bit upset about that. What the heck did I pay all that money for? To be lied to?

I digress. I will change the name of this post to "The Generalized Chain Rule."
 
  • #3
Do you want an intuitive reason why it is so, or a rigorous proof?

EDIT: Oh I just read the second post haha
 
  • #4
The rigorous proof is too much ... I thought it was something simpler.
 
  • #5
This is somewhat similar to the total derivative (using the chain rule independently for the two inner functions), but then setting the independent variables equal.
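
As a sketch of that idea, give the two inner functions their own independent variables, here called ##s## and ##t## (names introduced only for this illustration), and write ##y## and ##z## for the first and second arguments of ##f##. The total differential of ##F(s,t) = f(g(s), h(t))## is

$$dF = \frac{\partial f}{\partial y}\, g'(s)\, ds + \frac{\partial f}{\partial z}\, h'(t)\, dt.$$

Setting ##s = t = x##, so that ##ds = dt = dx##, and dividing by ##dx## gives

$$\frac{d}{dx} f(g(x), h(x)) = \frac{\partial f}{\partial y}(g(x), h(x))\, g'(x) + \frac{\partial f}{\partial z}(g(x), h(x))\, h'(x).$$

The two terms of the sum are the two independent applications of the ordinary chain rule.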
 
  • #6
That is a much better way of looking at it!
 
  • #7
I love the generalized chain rule. Was anyone taught this in school? I took calculus I, II, III and differential equations without ever seeing it.
 
  • #9
It can be proved as follows. Consider the mappings
$$A:\mathbb{R}\to\mathbb{R}^2,\quad A(x)=(g(x),h(x))^T$$
and $$B=f\circ A,\quad f:\mathbb{R}^2\to\mathbb{R}$$
then by the standard chain rule you have
$$dB=df\circ dA$$
This is it
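
Written out in components, as a sketch: let ##A_1, A_2## be the components of ##A## and ##\partial_1 f, \partial_2 f## the partial derivatives of ##f## in its first and second arguments (notation assumed here, not from the post above). Then ##dA## is a column, ##df## is a row, and their composition is a row-by-column product:

$$dB = \begin{pmatrix} \partial_1 f & \partial_2 f \end{pmatrix} \begin{pmatrix} A_1'(x) \\ A_2'(x) \end{pmatrix} = \partial_1 f \, A_1'(x) + \partial_2 f \, A_2'(x),$$

with both partial derivatives evaluated at ##A(x)##. The summation in the generalized chain rule is exactly the sum that appears in this matrix product.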
 
  • #10
Notice that your first example is just using the simple chain rule one variable at a time (on g and on h) as a function of x and then summing the results. This appears in a multitude of different combinations and situations. So they just teach the simple one and leave all the particular combinations up to you to figure out. I would be surprised if you did not run into this in some of your calculus classes. Maybe you used it and didn't realize it.
 
  • #11
I'm invoking the crud out of this chain rule. You can also nest this general chain rule inside of itself forever. I would write that out, but it's too frustrating to write all of it in LaTeX. The summation is still not obvious to me unless I assume the function requires the product rule, such as in f( g(x), h(x) ) = g(x)h(x), or something like f( g(x), h(x) ) = g(x) + h(x).
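
For what it's worth, those two cases can be worked through the general formula directly. Writing ##y## and ##z## for the two arguments of ##f## (labels assumed here for illustration), the product ##f(y,z) = yz## has ##\partial f/\partial y = z## and ##\partial f/\partial z = y##, so

$$\frac{d}{dx}\big[g(x)h(x)\big] = g'(x)\,h(x) + h'(x)\,g(x),$$

which is the product rule. The sum ##f(y,z) = y + z## has both partial derivatives equal to 1, so

$$\frac{d}{dx}\big[g(x) + h(x)\big] = g'(x) + h'(x).$$

In both cases the two-term sum comes from the general rule, with the product rule and sum rule falling out as special cases rather than being needed as inputs.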
 
  • #12
Here's how I would think about it in a kind of handwavy argument but it's basically how a full proof works.

Suppose you have f(y,z), and you want to know how much it moves if you move ##\Delta y## in the y direction, and nothing in the z direction. By just normal derivative rules, that's
$$f(y+\Delta y,z) \approx f(y,z) + \left(\frac{\partial f}{\partial y}(y, z)\right) \Delta y$$

Now suppose we also want to move a little bit, ##\Delta z##, in the z direction.

$$f(y+\Delta y,z+\Delta z) \approx f(y+\Delta y, z) + \left( \frac{\partial f}{\partial z} (y+\Delta y,z)\right) \Delta z$$

Now we notice a neat trick. Let's assume f is twice differentiable just to keep things simple.

$$
\left( \frac{\partial f}{\partial z} (y+\Delta y,z)\right) \Delta z \approx \left( \frac{\partial f}{\partial z} (y,z)\right) \Delta z + \left( \frac{ \partial^2 f}{\partial y \partial z}(y,z)\right) \Delta y \Delta z$$

But the ##\Delta y \Delta z## is a second order term, so we can drop it. Plugging this back into that previous equation,

$$f(y+\Delta y,z+\Delta z) \approx f(y+\Delta y, z) + \left( \frac{\partial f}{\partial z} (y,z)\right) \Delta z$$

And plugging in our approximation for ##f(y+\Delta y,z)## we get

$$f(y+\Delta y,z+\Delta z) \approx f(y, z) + \left( \frac{\partial f}{\partial y} (y,z)\right) \Delta y + \left( \frac{\partial f}{\partial z} (y,z)\right) \Delta z$$

If you're still a bit confused by what's going on, the point is if you move a little in the y direction, you have to add some ##\frac{\partial f}{\partial y} \Delta y##. If you then move in the z direction, you have to add some ##\frac{\partial f}{\partial z} \Delta z##. You should be worried about at what point you are evaluating those derivatives, but if f is sufficiently differentiable and your move is small enough, it turns out it doesn't matter, you can do it at your original point.

Cool. Now if we have ##f(g(x),h(x))##, if I move x by ##\Delta x## then I move the y input by ##\Delta y = g'(x) \Delta x## and the z input by ##\Delta z = h'(x) \Delta x##. So

$$f(g(x+\Delta x),h(x+ \Delta x))\approx f(g(x),h(x)) + \left( \frac{\partial f}{\partial y}(g(x),h(x))\right) g'(x) \Delta x + \left(\frac{\partial f}{\partial z}(g(x),h(x))\right) h'(x) \Delta x$$

The derivative is literally the term I multiply ##\Delta x## by in order to approximate how much ##f(g(x),h(x))## moves by, so I have computed the derivative in that approximation by adding the final two terms.
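
If it helps to see the approximation hold numerically, here is a minimal Python sketch. The particular `f`, `g`, and `h` are hypothetical choices made only for illustration, and the derivatives are estimated with central finite differences rather than computed exactly.

```python
import math

# Hypothetical example functions, chosen only for illustration.
def f(y, z):
    return y * y * math.sin(z)

def g(x):
    return x ** 3

def h(x):
    return math.exp(x)

def central_diff(func, t, eps=1e-6):
    """Central finite-difference estimate of func'(t)."""
    return (func(t + eps) - func(t - eps)) / (2 * eps)

def chain_rule_derivative(x):
    """d/dx f(g(x), h(x)) as the sum of the two chain-rule terms."""
    y, z = g(x), h(x)
    df_dy = central_diff(lambda u: f(u, z), y)  # partial in the y slot
    df_dz = central_diff(lambda v: f(y, v), z)  # partial in the z slot
    return df_dy * central_diff(g, x) + df_dz * central_diff(h, x)

def direct_derivative(x):
    """Differentiate the composite x -> f(g(x), h(x)) directly."""
    return central_diff(lambda t: f(g(t), h(t)), x)

x0 = 0.7
print(chain_rule_derivative(x0), direct_derivative(x0))
```

The two printed numbers should agree to several decimal places, which is the statement that adding the ##\partial f/\partial y## and ##\partial f/\partial z## contributions reproduces the derivative of the composite.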
 

1. What is the Generalized Chain Rule?

The Generalized Chain Rule gives the derivative of a composite function whose outer function takes several arguments, each of which is itself a function of the same variable, as in f( g(x), h(x) ).

2. How is the Generalized Chain Rule different from the Chain Rule?

The ordinary Chain Rule covers a composite f( g(x) ) in which the outer function has a single argument. The Generalized Chain Rule extends it to an outer function with several arguments, producing one chain-rule term per argument and summing them.

3. When should the Generalized Chain Rule be used?

The Generalized Chain Rule should be used whenever the outer function depends on more than one inner function of the variable being differentiated, as commonly happens in multivariable calculus and in physics and engineering problems.

4. What is the formula for the Generalized Chain Rule?

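The formula for the Generalized Chain Rule is:
$$\frac{d}{dx} f( g_{1}(x), ..., g_{n}(x) ) = \sum_{i=1}^{n} \frac{\partial f}{\partial y_{i}}( g_{1}(x), ..., g_{n}(x) )\, \frac{ dg_{i}(x) }{ dx },$$
where ##y_i## denotes the i-th argument of f. The product rule d(uv)/dx = u(dv/dx) + v(du/dx) is the special case f(u, v) = uv.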

5. How do you apply the Generalized Chain Rule in practice?

To apply the Generalized Chain Rule, first identify the outer function and each of its inner functions. For each inner function, take the partial derivative of the outer function with respect to the corresponding argument, evaluate it at the inner functions, and multiply it by the derivative of that inner function. Finally, add these products together.
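
A minimal Python sketch of the sum-over-inner-functions rule for any number of inner functions follows. The helper names (`central_diff`, `generalized_chain_rule`) and the example functions are hypothetical, and derivatives are approximated by central finite differences, so this is a numerical illustration rather than a symbolic implementation.

```python
import math

def central_diff(func, t, eps=1e-6):
    """Central finite-difference estimate of func'(t)."""
    return (func(t + eps) - func(t - eps)) / (2 * eps)

def generalized_chain_rule(f, inner, x):
    """Sum over i of (partial of f in slot i) * g_i'(x).

    f takes a list of n numbers; inner is a list of n one-variable functions.
    """
    values = [g_i(x) for g_i in inner]
    total = 0.0
    for i, g_i in enumerate(inner):
        def f_slot_i(u, i=i):
            # Vary only the i-th argument, holding the others fixed.
            args = list(values)
            args[i] = u
            return f(args)
        total += central_diff(f_slot_i, values[i]) * central_diff(g_i, x)
    return total

# Hypothetical example: f(a, b, c) = a*b + c with inner functions sin, cos, x^2.
inner = [math.sin, math.cos, lambda x: x * x]
f = lambda a: a[0] * a[1] + a[2]

approx = generalized_chain_rule(f, inner, 0.5)
exact = math.cos(1.0) + 1.0  # d/dx [sin x cos x + x^2] = cos 2x + 2x at x = 0.5
print(approx, exact)
```

Each pass through the loop contributes one term of the summation, so the loop has as many iterations as the outer function has arguments.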
