How do I apply Chain Rule to get the desired result?


Discussion Overview

The discussion revolves around the application of the chain rule in the context of calculating the directional derivative of a function. Participants explore the mathematical formulation and interpretation of the directional derivative, particularly how it relates to the gradient of the function and the direction vector.

Discussion Character

  • Technical explanation
  • Mathematical reasoning
  • Debate/contested

Main Points Raised

  • One participant references a textbook definition of the directional derivative and attempts to apply the chain rule to derive the result, expressing confusion about the relationship between the partial derivative and the gradient.
  • Another participant critiques the application of the chain rule, suggesting that the form used is incorrect and emphasizes the need for clarity in the notation of partial derivatives.
  • A later reply discusses the nature of partial derivatives as functions of multiple variables and proposes an alternative notation to clarify their meaning, while also providing a detailed breakdown of how to analyze the example using the chain rule.
  • Some participants express uncertainty about the correct application of the chain rule and the interpretation of derivatives in the context of vector functions.

Areas of Agreement / Disagreement

Participants do not reach a consensus on the correct application of the chain rule or the interpretation of the directional derivative. Multiple competing views and interpretations remain present throughout the discussion.

Contextual Notes

There are unresolved issues regarding the correct application of the chain rule, the notation of partial derivatives, and the assumptions made about the variables involved in the discussion.

bwest121
I'm reading a textbook that says:

"The directional derivative in direction ##u## is the derivative of the function ##f( \mathbf x + \alpha \mathbf u)## with respect to ##\alpha##, evaluated at ##\alpha=0##. Using the chain rule, we can see that ##\frac {\partial}{\partial \alpha} f( \mathbf x + \alpha \mathbf u)## evaluates to ##\mathbf u^\intercal \nabla_\mathbf x f(\mathbf x)## when ##\alpha = 0##."

I understand that the directional derivative is the dot product of the gradient function and the direction vector. However, I don't fully see how to get the result through using the chain rule.

Here's my attempt:
$$\frac {\partial}{\partial \alpha} f(\mathbf x + \alpha\mathbf u) = \frac {\partial f}{\partial \alpha} \cdot \frac {\partial (\mathbf x + \alpha\mathbf u)}{\partial \alpha}$$

I know that ##\frac {\partial (\mathbf x + \alpha\mathbf u)}{\partial \alpha} = \mathbf u##, either by applying the limit definition of the derivative or by decomposing the ##(\mathbf x + \alpha\mathbf u)## vector and applying ##\frac{\partial}{\partial\alpha}## to each component, which eliminates the components of ##\mathbf x## and leaves only ##\mathbf u##. Thus, I'll be dotting ##\mathbf u## with ##\frac {\partial f}{\partial \alpha}##, i.e., ##\mathbf u^\intercal \frac {\partial f}{\partial \alpha}.## However, how does $$\frac {\partial f}{\partial \alpha} = \nabla_\mathbf x f(\mathbf x)?$$
 
Physics news on Phys.org
You have used the chain rule in the (wrong) form ##\frac{df}{dx} = \frac{df}{dx}\frac{dy}{dx}##. The chain rule is ##\frac{df}{dx} = \frac{df}{dy}\frac{dy}{dx}##. If you have several intermediate variables ##y##, you get a sum over those variables, and the derivatives of ##f## become partial derivatives.
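Written out for ##n## intermediate variables (the labels ##y_i## and ##n## are introduced here for illustration), the chain rule described above reads:

$$\frac{d}{d\alpha} f\big(y_1(\alpha), \dots, y_n(\alpha)\big) = \sum_{i=1}^{n} \frac{\partial f}{\partial y_i}\,\frac{dy_i}{d\alpha}$$

As a sketch of how this gives the textbook's result: with ##y_i(\alpha) = x_i + \alpha u_i##, each ##\frac{dy_i}{d\alpha} = u_i##, so

$$\frac{d}{d\alpha} f(\mathbf x + \alpha \mathbf u) = \sum_{i=1}^{n} \left.\frac{\partial f}{\partial y_i}\right|_{\mathbf y = \mathbf x + \alpha \mathbf u} u_i = \mathbf u^\intercal \nabla f(\mathbf x + \alpha \mathbf u),$$

which at ##\alpha = 0## is ##\mathbf u^\intercal \nabla_\mathbf x f(\mathbf x)##.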
 
bwest121 said:
...
However, how does $$\frac {\partial f}{\partial \alpha} = \nabla_\mathbf x f(\mathbf x)?$$

The main issue is your understanding of a partial derivative. A scalar function of a vector ##\mathbf{x}## in three dimensions is really a function of three variables, ##f(x, y, z)##. For each of these variables, you can take the partial derivative with respect to that variable, holding the others fixed. The result is another function of the same three variables. There are various notations for these functions, but the usual ones are ##f_x, f_y, f_z## or ##\frac{\partial f}{\partial x}, \frac{\partial f}{\partial y}, \frac{\partial f}{\partial z}##.

Both of these notations create a problem that, I feel, is rarely discussed: they tie the definition of the partial derivative functions to a particular choice of variable names. If you start changing variables, it can be difficult to understand what the partial derivatives actually mean.

There are two alternatives that make things clearer. With ##f## defined as a function of ##(x, y, z)##, then:

##f_x = \frac{\partial f}{\partial x} = ## "the partial derivative of ##f## wrt its first argument", which could be written ##f_1##, say.

Now, if you defined a function ##g(x, y, z) = f(x^2, 2xy, x+z)##, then what is ##g_x##?

The solution is to see the chain rule as:

##g_x = ## "the partial derivative of ##f## wrt its first argument times the partial derivative of its first argument with respect to ##x##" + "the partial derivative of ##f## wrt its second argument times the partial derivative of its second argument with respect to ##x##" + "the partial derivative of ##f## wrt its third argument times the partial derivative of its third argument with respect to ##x##".

Now, in my new notation this is quite clear:

##g_x = 2x\,f_1 + 2y\,f_2 + f_3##, where ##f_1, f_2, f_3## are evaluated at ##(x^2, 2xy, x + z)##.

Or, in the more usual notation this is:

##g_x = 2x\,f_x + 2y\,f_y + f_z##, again with ##f_x, f_y, f_z## evaluated at ##(x^2, 2xy, x + z)##, not at ##(x, y, z)##.

I think this is worth remembering, as it can be very useful for clearing up any confusion over partial derivatives.
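To make the ##g(x, y, z) = f(x^2, 2xy, x+z)## example concrete, here is a minimal Python sketch. The choice ##f(a, b, c) = ab + \sin c## is a hypothetical example, and `f_1`, `f_2`, `f_3` are illustrative names for the partials of ##f## with respect to its first, second and third arguments. It checks the chain-rule formula against a finite difference, and also shows what goes wrong if the partials are evaluated at ##(x, y, z)## instead of at ##f##'s actual arguments.

```python
import math

# Hypothetical example function f(a, b, c) = a*b + sin(c); any smooth f would do.
def f(a, b, c):
    return a * b + math.sin(c)

# Partial derivatives of f with respect to its 1st, 2nd and 3rd arguments.
def f_1(a, b, c):
    return b

def f_2(a, b, c):
    return a

def f_3(a, b, c):
    return math.cos(c)

def g(x, y, z):
    return f(x**2, 2 * x * y, x + z)

def g_x_chain_rule(x, y, z):
    # Evaluate f's partials at the arguments f actually receives.
    args = (x**2, 2 * x * y, x + z)
    return f_1(*args) * 2 * x + f_2(*args) * 2 * y + f_3(*args) * 1

def g_x_wrong_point(x, y, z):
    # Pitfall: evaluating f's partials at (x, y, z) instead of at f's arguments.
    return f_1(x, y, z) * 2 * x + f_2(x, y, z) * 2 * y + f_3(x, y, z)

def g_x_numeric(x, y, z, h=1e-6):
    # Central difference in x with y and z held fixed.
    return (g(x + h, y, z) - g(x - h, y, z)) / (2 * h)

if __name__ == "__main__":
    point = (1.0, 2.0, 0.5)
    print(g_x_chain_rule(*point), g_x_numeric(*point), g_x_wrong_point(*point))
```

At the point ##(1, 2, 0.5)## the chain-rule value is ##12 + \cos 1.5 \approx 12.0707##, and the finite difference agrees closely with it, while the "wrong point" version gives about ##8.88##.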

Finally, here is how I would analyse your example. With ##\mathbf x## and ##\mathbf u## fixed, define:

##g(\alpha) = f(\mathbf x + \alpha \mathbf u) = f(x + \alpha u_x, y + \alpha u_y, z + \alpha u_z)##

And:

##\frac{dg}{d \alpha} = f_x u_x + f_y u_y + f_z u_z = \mathbf{ \nabla}f \cdot \mathbf{u}##, with each partial derivative of ##f## evaluated at ##\mathbf x + \alpha \mathbf u##.

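Since you want the derivative evaluated at ##\mathbf x = (x, y, z)##, you set ##\alpha = 0##, which gives ##\nabla f(\mathbf x) \cdot \mathbf u = \mathbf u^\intercal \nabla_\mathbf x f(\mathbf x)##.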
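A numerical sanity check of this argument can be sketched in Python. The field ##f(x, y, z) = x^2 y + e^z##, the point ##\mathbf x = (1, 2, 0)## and the unit vector ##\mathbf u = (0.6, 0, 0.8)## are all hypothetical choices for illustration. The sketch compares the chain-rule expression ##f_x u_x + f_y u_y + f_z u_z## (partials at ##\mathbf x + \alpha \mathbf u##) with a finite difference of ##g(\alpha) = f(\mathbf x + \alpha \mathbf u)##.

```python
import math

# Hypothetical scalar field f(x, y, z) = x^2 * y + e^z, chosen only for illustration.
def f(p):
    x, y, z = p
    return x**2 * y + math.exp(z)

def grad_f(p):
    # Analytic gradient (f_x, f_y, f_z) of the example f.
    x, y, z = p
    return (2 * x * y, x**2, math.exp(z))

def shifted(x, u, alpha):
    # The point x + alpha * u, componentwise.
    return tuple(xi + alpha * ui for xi, ui in zip(x, u))

def g(alpha, x, u):
    # g(alpha) = f(x + alpha * u), with x and u held fixed.
    return f(shifted(x, u, alpha))

def dg_dalpha_chain_rule(alpha, x, u):
    # Chain rule: f_x u_x + f_y u_y + f_z u_z, partials evaluated at x + alpha * u.
    return sum(gi * ui for gi, ui in zip(grad_f(shifted(x, u, alpha)), u))

def dg_dalpha_numeric(alpha, x, u, h=1e-6):
    # Central finite difference of g at alpha.
    return (g(alpha + h, x, u) - g(alpha - h, x, u)) / (2 * h)

if __name__ == "__main__":
    x = (1.0, 2.0, 0.0)
    u = (0.6, 0.0, 0.8)
    print(dg_dalpha_chain_rule(0.0, x, u), dg_dalpha_numeric(0.0, x, u))
```

At ##\alpha = 0## the gradient is ##(4, 1, 1)##, so both values should come out near ##\mathbf u^\intercal \nabla f(\mathbf x) = 3.2##.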
bwest121 said:
Here's my attempt:
$$\frac {\partial}{\partial \alpha} f(\mathbf x + \alpha\mathbf u) = \frac {\partial f}{\partial \alpha} \cdot \frac {\partial (\mathbf x + \alpha\mathbf u)}{\partial \alpha}$$
This is wrong. It is not $$\frac {\partial f}{\partial \alpha} $$
The simple, one-variable version is ##\frac{df}{dx} = \frac{df}{du}\frac{du}{dx}##. Notice the ##\frac{df}{du}## rather than ##\frac{df}{dx}##.
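A quick numerical check of the one-variable rule, as a minimal Python sketch: the functions ##f(u) = \sin u## and ##u(x) = x^2## are hypothetical examples, not from the thread. It compares ##\frac{df}{du}\frac{du}{dx}##, with ##\frac{df}{du}## evaluated at ##u(x)##, against a central finite difference of the composite function.

```python
import math

# Hypothetical example: f(u) = sin(u) composed with u(x) = x^2.
def f(u):
    return math.sin(u)

def df_du(u):
    # Derivative of f with respect to its own argument u.
    return math.cos(u)

def u_of(x):
    return x**2

def du_dx(x):
    return 2 * x

def df_dx_chain_rule(x):
    # df/dx = (df/du evaluated at u(x)) * du/dx
    return df_du(u_of(x)) * du_dx(x)

def df_dx_numeric(x, h=1e-6):
    # Central finite difference of the composite x -> f(u(x)).
    return (f(u_of(x + h)) - f(u_of(x - h))) / (2 * h)

if __name__ == "__main__":
    x = 1.3
    print(df_dx_chain_rule(x), df_dx_numeric(x))
```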
 
PeroK said:
...
And, as you want the derivative evaluated at ##\mathbf x = (x, y, z)## you take ##\alpha = 0##.
Thank you so much. I very much appreciate you taking the time to provide such a thorough explanation. :)