How Do I Apply the Chain Rule for Second Order Partial Derivatives?

  • Context: Graduate
  • Thread starter: Chuck37
  • Tags: Chain rule

Discussion Overview

The discussion revolves around applying the chain rule to compute first and second order partial derivatives of a function F(u,v) that is evaluated through an intermediate function, [x,y,z] = G(u,v). Participants explore the formulation of the gradient and Hessian matrix, as well as the complexities involved in deriving second order derivatives using various notations and approaches.

Discussion Character

  • Exploratory
  • Technical explanation
  • Mathematical reasoning
  • Debate/contested

Main Points Raised

  • One participant outlines the first order partial derivatives using the chain rule and expresses interest in extending this to second order derivatives.
  • Another participant proposes an extended chain rule for second order derivatives but struggles to express it in matrix notation.
  • A different participant suggests using summation/index notation to derive the second derivative, emphasizing the complexity of the resulting terms.
  • Concerns are raised about interpreting certain notations, such as f_{,jm}, and whether they represent second order derivatives.
  • One participant questions the dimensionality of a resulting matrix from a gradient and Jacobian multiplication, seeking clarification on the terms involved.
  • Another participant corrects a previous claim about the number of terms in the second derivative calculation, asserting that there are additional terms that must be considered.
  • Discussions include whether different approaches to the problem yield the same results or if they contradict each other.
  • Participants express a desire to clarify their understanding of the matrix/vector notation used in the context of the problem.

Areas of Agreement / Disagreement

Participants exhibit a mix of agreement and disagreement regarding the correct formulation of second order derivatives, with some asserting that additional terms must be included while others believe their simpler approaches suffice. The discussion remains unresolved on whether the different methods lead to the same conclusions.

Contextual Notes

Some participants express uncertainty about the assumptions underlying their calculations and the implications of different notations. There is also a lack of consensus on the necessity of computing cross partials in F(x) for the problem at hand.

Chuck37
I have a function F(u,v) that I need to get first and second order partial derivatives for (Gradient and Hessian). F(u,v) is very ugly, so I'm thinking of it like F(x,y,z) where I have another function [x,y,z]=G(u,v).

So, I got my first orders, e.g.:

dF/du = dF/dx*dx/du + dF/dy*dy/du + dF/dz*dz/du

Defining X=[x y z] and U=[u v] I can formulate this in vector notation:

dF/dU = dF/dX * Jacobian(X(u,v))

at least I think I can. It seems to be working.
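That vector-notation identity can be sanity-checked numerically. The sketch below uses made-up stand-ins for F and G (the thread never gives the actual functions) and central finite differences, comparing the chain-rule gradient against a direct numerical gradient of the composite F(G(u,v)):

```python
import math

# Hypothetical example functions, NOT the poster's actual F and G:
def F(x, y, z):
    return x**2 * y + math.sin(z)

def G(u, v):
    return [u * v, u + v**2, u**2 - v]

EPS = 1e-6

def partial(fun, args, i):
    """Central-difference partial derivative of fun w.r.t. args[i]."""
    hi, lo = list(args), list(args)
    hi[i] += EPS
    lo[i] -= EPS
    return (fun(*hi) - fun(*lo)) / (2 * EPS)

def chain_gradient(u, v):
    """dF/dU via the chain rule: dF/dX (1x3) times the Jacobian (3x2)."""
    x, y, z = G(u, v)
    g = [partial(F, (x, y, z), j) for j in range(3)]           # dF/dX
    J = [[partial(lambda a, b: G(a, b)[j], (u, v), i)          # dx_j/du_i
          for i in range(2)] for j in range(3)]
    return [sum(g[j] * J[j][i] for j in range(3)) for i in range(2)]

def direct_gradient(u, v):
    """dF/dU by differentiating the composite F(G(u, v)) directly."""
    composite = lambda a, b: F(*G(a, b))
    return [partial(composite, (u, v), i) for i in range(2)]

u, v = 0.7, 1.3
```

At any test point the two gradients agree to finite-difference accuracy, which supports the "it seems to be working" observation.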

Now I need the second orders of F with respect to [u,v]. What I really need is the 2x2 Hessian matrix. I'm not totally sure how to proceed. I plowed through and got all my partials of F with respect to [x,y,z], but I'm not sure how to apply the chain rule or its equivalent either in scalar or matrix/vector notations.

Can anyone help? (If nothing else, how do you write out ddF/dudv in terms of partials of F(x,y,z) and G(u,v)?)
 
I got notification that someone replied, but I don't see anything here. I wonder if it got moderated away for some reason?

I think I worked out this much, sort of an extended chain rule for partials (in lousy text notation):

ddF/dudv = (ddF/dxdx*dx/du*dx/dv + dF/dx*ddx/dudv) + (same for y) + (same for z)

It's a pain but I can compute all these terms. I can't really see a way to formulate this in terms of gradients, Jacobians and Hessians.
 
Okay, Chuck!
I did make a reply, but it was riddled with errors, so I deleted it.

I'm working on a better one! :smile:
 
The simplest way to do this is by using summation/index notation.

Suppose the index i=1,2,3. Then [itex]F_{i}[/itex] denotes a three-dimensional vector.
If the index j=1,2, then [itex]G_{ij}[/itex] is a 3*2 matrix.

The notation [tex]F_{,i}, i=1,2,3[/tex] means the vector with components:
[tex]F_{,1}=\frac{\partial{F}}{\partial{x}_{1}}[/tex]
and so on (the comma in front of an index means you differentiate with respect to the x with that index!)

We may therefore identify [itex]G_{i,j}[/itex] as an i*j MATRIX with, for example, the (1,2)-entry being:
[tex]G_{1,2}=\frac{\partial{G}_{1}}{\partial{x}_{2}}[/tex]

Furthermore, a product with a repeated index is SUMMED over that index, for example (i=1,2,3):
[tex]u_{i}v_{i}=u_{1}v_{1}+u_{2}v_{2}+u_{3}v_{3}[/tex]


Thus, in your case, we have:
[tex]F(u_{i})=f(x_{j}(u_{i})), i=1,2, j=1,2,3[/tex]
Differentiating this wrt the u_i yields, with the chain rule:
[tex]F_{,i}=f_{,j}x_{j,i}\qquad(*)[/tex]

Now, by setting n=1,2, m=1,2,3, we may find the full second derivative (a matrix!) of F as follows:
[tex]F_{,in}=f_{,jm}x_{j,i}x_{m,n}+f_{,j}x_{j,in}[/tex]
Note that the first product, running over BOTH j and m, consists of 9 terms, whereas the latter product, running merely over the j's, consists of 3 terms.

Each of these 12 "terms" is itself a 2*2 matrix in "i" and "n".
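The index-notation formula can be checked numerically by writing the sums over j and m as explicit loops. The sketch below uses made-up stand-ins for f and the x_j(u,v) (the thread's actual functions aren't given) and central finite differences, comparing the 9 + 3 term formula against a direct numerical Hessian of the composite:

```python
import math

# Hypothetical example functions, NOT from the thread:
def f(x1, x2, x3):
    return x1**2 * x2 + math.sin(x3)

def x(u1, u2):
    return [u1 * u2, u1 + u2**2, u1**2 - u2]

EPS = 1e-4

def d1(fun, args, i):
    """Central-difference first partial of fun w.r.t. args[i]."""
    hi, lo = list(args), list(args)
    hi[i] += EPS
    lo[i] -= EPS
    return (fun(*hi) - fun(*lo)) / (2 * EPS)

def d2(fun, args, i, j):
    """Central-difference second partial of fun w.r.t. args[i], args[j]."""
    pp, pm, mp, mm = (list(args) for _ in range(4))
    pp[i] += EPS; pp[j] += EPS
    pm[i] += EPS; pm[j] -= EPS
    mp[i] -= EPS; mp[j] += EPS
    mm[i] -= EPS; mm[j] -= EPS
    return (fun(*pp) - fun(*pm) - fun(*mp) + fun(*mm)) / (4 * EPS**2)

def hessian_by_index_formula(u1, u2):
    """F_{,in} = f_{,jm} x_{j,i} x_{m,n} + f_{,j} x_{j,in}, sums explicit."""
    X = x(u1, u2)
    xj = lambda j: (lambda a, b: x(a, b)[j])
    H = [[0.0, 0.0], [0.0, 0.0]]
    for i in range(2):
        for n in range(2):
            for j in range(3):                 # 9 terms over j and m
                for m in range(3):
                    H[i][n] += (d2(f, X, j, m)
                                * d1(xj(j), (u1, u2), i)
                                * d1(xj(m), (u1, u2), n))
            for j in range(3):                 # 3 terms over j alone
                H[i][n] += d1(f, X, j) * d2(xj(j), (u1, u2), i, n)
    return H

def hessian_direct(u1, u2):
    """Direct numerical Hessian of the composite f(x(u1, u2))."""
    comp = lambda a, b: f(*x(a, b))
    return [[d2(comp, (u1, u2), i, n) for n in range(2)] for i in range(2)]
```

The two 2*2 matrices agree to finite-difference accuracy.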
 
How is it working out, Chuck?
If you have any questions to the above, just post them.
 
arildno said:
How is it working out, Chuck?
If you have any questions to the above, just post them.

I'm still absorbing it a little. I think it makes sense, but I won't waste your time until I give it the proper attention. Thanks very much for your help. I'll post my questions a little later if it doesn't sink in.
 
Is there a way to think of this as plain old matrix and vector multiplies? For example, your second to the last equation I can view as a multiply of the gradient of f() w.r.t X (a 1x3 row vector) multiplied by the Jacobian of X w.r.t [u,v] (a 3x2 matrix). Multiplying these results in a 1x2 vector gradient of f() w.r.t [u,v].

Can the last equation be thought of in similar terms? I'm having a hard time seeing it.

Part of my confusion is that I'm not certain how to interpret this:

[tex] f_{,jm}[/tex]

Is that a second order derivative?

Thanks.
 
We have:
[tex]f_{,jm}=\frac{\partial^{2}f}{\partial{x}_{j}\partial{x}_{m}}[/tex]
which you certainly can regard as a 3*3 matrix.

Thus, the first set of terms can, essentially, be regarded as a triple matrix product of dimensions (2*3)(3*3)(3*2), yielding a (2*2) matrix.

This could also, in your terminology, be regarded as the product of the transpose of the "Jacobian", the Hessian of "f", and the Jacobian.

The last set of terms involves the gradient of "f" multiplied with the respective "Hessians" of the x-variables with respect to "u" and "v".
 
A few questions. If the last term is a gradient times a Jacobian (vector times matrix), how can it come out to be a 4x4 matrix in the end? Seems like it would result in a vector.

The other thing I'm trying to resolve is that I believe I solved this problem, though not in cleaner matrix format, and it was done without having to compute any "cross partials" in F(x), e.g. ddF/dxdy. For example, as in my previous post, I believe:

ddF/dudv = (ddF/dxdx*dx/du*dx/dv + dF/dx*ddx/dudv) + (same for y) + (same for z)

So I'd need to compute all the second order partials in x, e.g. ddx/dudv, but not the full Hessian in F(x).

It looks like your solution requires the full Hessian in F(x), so I wonder if the solutions contradict each other, or if they are simply different ways of getting to the same answers, or if some terms cancel out?
 
  • #10
ddF/dudv = (ddF/dxdx*dx/du*dx/dv + dF/dx*ddx/dudv) + (same for y) + (same for z)

This is simply wrong; there are a couple of other terms involved in the first parenthesis, for example:
[tex]\frac{\partial^{2}F}{\partial{y}\partial{x}}\frac{\partial{y}}{\partial{u}}\frac{\partial{x}}{\partial{v}}[/tex]

In total, you will have twelve terms, as stated.
 
  • #11
I believe you are correct. Those terms must be small in my present application since everything is working properly without them... Nonetheless, I do want to make it right.

Can you answer my other question about your second term. How does it end up as a 4x4 matrix?

Thanks for the help.
 
  • #12
It does NOT end up as a 4*4 matrix, but as a 2*2 matrix.
We have, by the mentioned summation notation:
[tex]f_{,j}x_{j,in}=\frac{\partial{f}}{\partial{x}_{1}}\frac{\partial^{2}x_{1}}{\partial{u}_{i}\partial{u}_{n}}+\frac{\partial{f}}{\partial{x}_{2}}\frac{\partial^{2}x_{2}}{\partial{u}_{i}\partial{u}_{n}}+\frac{\partial{f}}{\partial{x}_{3}}\frac{\partial^{2}x_{3}}{\partial{u}_{i}\partial{u}_{n}},i,n=1,2[/tex]
 
  • #13
Thanks for all the help. In my matrix vector terminology I think this is the best I can do.

In slightly abusive notation:

H{F(u,v)} = J{X(u,v)}'*H{F(x,y,z)}*J{X(u,v)} + G(x)*H{x(u,v)} + G(y)*H{y(u,v)} + G(z)*H{z(u,v)}

where H is the Hessian, J is Jacobian and G is the (3x1) gradient of F(x,y,z). X(u,v) is x,y,z as a function of u,v.

I couldn't see a way to write the last 3 terms as an elegant matrix/vector multiply.

I think this is what you have been saying all along, I just wanted to get it in a familiar notation.

Note:
J{X(u,v)} is (3x2)
H{F(x,y,z)} is (3x3)
G is (3x1) (each component, e.g. G(x)=G(1), is a scalar)
H{x(u,v)} is (2x2)
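This final matrix form can also be checked numerically. The sketch below assembles J'*H{F}*J plus the three gradient-times-Hessian correction terms, using made-up stand-ins for F and X (the thread's actual functions aren't given) and central finite differences, and compares against a direct numerical Hessian of the composite:

```python
import math

# Hypothetical example functions, NOT the poster's actual F and X:
def F(x, y, z):
    return x**2 * y + math.sin(z)

def X(u, v):
    return [u * v, u + v**2, u**2 - v]

EPS = 1e-4

def d1(fun, args, i):
    """Central-difference first partial of fun w.r.t. args[i]."""
    hi, lo = list(args), list(args)
    hi[i] += EPS; lo[i] -= EPS
    return (fun(*hi) - fun(*lo)) / (2 * EPS)

def d2(fun, args, i, j):
    """Central-difference second partial of fun w.r.t. args[i], args[j]."""
    pp, pm, mp, mm = (list(args) for _ in range(4))
    pp[i] += EPS; pp[j] += EPS
    pm[i] += EPS; pm[j] -= EPS
    mp[i] -= EPS; mp[j] += EPS
    mm[i] -= EPS; mm[j] -= EPS
    return (fun(*pp) - fun(*pm) - fun(*mp) + fun(*mm)) / (4 * EPS**2)

def matmul(A, B):
    return [[sum(A[r][k] * B[k][c] for k in range(len(B)))
             for c in range(len(B[0]))] for r in range(len(A))]

def hessian_by_matrix_formula(u, v):
    """H{F(u,v)} = J' * H{F(X)} * J + sum_k G(k) * H{x_k(u,v)}."""
    pt = X(u, v)
    Xk = lambda k: (lambda a, b: X(a, b)[k])
    J = [[d1(Xk(k), (u, v), i) for i in range(2)] for k in range(3)]   # 3x2
    Jt = [[J[k][i] for k in range(3)] for i in range(2)]               # 2x3
    HF = [[d2(F, pt, j, m) for m in range(3)] for j in range(3)]       # 3x3
    H = matmul(matmul(Jt, HF), J)                                      # 2x2
    for k in range(3):   # the three gradient-times-Hessian terms
        gk = d1(F, pt, k)
        Hk = [[d2(Xk(k), (u, v), i, n) for n in range(2)]
              for i in range(2)]
        for i in range(2):
            for n in range(2):
                H[i][n] += gk * Hk[i][n]
    return H

def hessian_direct(u, v):
    """Direct numerical Hessian of the composite F(X(u, v))."""
    comp = lambda a, b: F(*X(a, b))
    return [[d2(comp, (u, v), i, n) for n in range(2)] for i in range(2)]
```

The assembled 2x2 Hessian matches the direct one to finite-difference accuracy, consistent with the dimension bookkeeping above.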
 
  • #14
That's right! :smile:
 
