How do I apply the Chain Rule to get the desired result?

bwest121 · Jan 15, 2017

I'm reading a textbook that says:

"The directional derivative in direction ##u## is the derivative of the function ##f( \mathbf x + \alpha \mathbf u)## with respect to ##\alpha##, evaluated at ##\alpha=0##. Using the chain rule, we can see that ##\frac {\partial}{\partial \alpha} f( \mathbf x + \alpha \mathbf u)## evaluates to ##\mathbf u^\intercal \nabla_\mathbf x f(\mathbf x)## when ##\alpha = 0##."

I understand that the directional derivative is the dot product of the gradient and the direction vector. However, I don't fully see how to get that result using the chain rule.

Here's my attempt:
$$\frac {\partial}{\partial \alpha} f(\mathbf x + \alpha\mathbf u) = \frac {\partial f}{\partial \alpha} \cdot \frac {\partial (\mathbf x + \alpha\mathbf u)}{\partial \alpha}$$

I know that ##\frac {\partial (\mathbf x + \alpha\mathbf u)}{\partial \alpha} = \mathbf u##, either by applying the limit definition of the derivative or by decomposing the ##(\mathbf x + \alpha\mathbf u)## vector and applying ##\frac{\partial}{\partial\alpha}## to each component, which eliminates the components of ##\mathbf x## and leaves only ##\mathbf u##. Thus, I'll be dotting ##\mathbf u## with ##\frac {\partial f}{\partial \alpha}##, i.e., ##\mathbf u^\intercal \frac {\partial f}{\partial \alpha}##. However, how does $$\frac {\partial f}{\partial \alpha} = \nabla_\mathbf x f(\mathbf x)?$$

Orodruin · Jan 15, 2017

You have used the chain rule on the (wrong) form df/dx = (df/dx)(dy/dx). The chain rule is df/dx = (df/dy)(dy/dx). If you have several variables y you get a sum over the variables and the derivatives of f will be the partial derivatives.
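
Written out for this problem, the sum might look as follows. This is only a sketch: the intermediate variable ##\mathbf y = \mathbf x + \alpha \mathbf u## and the component index ##i## are notation assumed here, not taken from the textbook.

$$\frac{\partial}{\partial \alpha} f(\mathbf y(\alpha)) = \sum_i \frac{\partial f}{\partial y_i} \frac{\partial y_i}{\partial \alpha} = \sum_i \frac{\partial f}{\partial y_i} u_i, \qquad y_i = x_i + \alpha u_i$$

At ##\alpha = 0## we have ##\mathbf y = \mathbf x##, so the right-hand side becomes ##\sum_i u_i \frac{\partial f}{\partial x_i} = \mathbf u^\intercal \nabla_\mathbf x f(\mathbf x)##.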

PeroK · Jan 15, 2017

The main issue is your understanding of a partial derivative. A scalar function of a vector ##\mathbf{x}## is actually a function of several variables; in three dimensions, ##f(x, y, z)##. Now, for each of these variables, you can take the partial derivative wrt that variable, leaving the others fixed. The result is another function of the three variables. There are various notations for these functions, but normally it's ##f_x, f_y, f_z## or ##\frac{\partial f}{\partial x}, \frac{\partial f}{\partial y}, \frac{\partial f}{\partial z}##.

Both these notations create something of a problem (that is rarely discussed, I feel). They tie the definition of these partial derivative functions to a particular choice of variable. And, if you start changing variables in some way, it can be difficult to understand what the partial derivatives actually mean.

There are two alternatives that make things clearer. With ##f## defined as a function of ##(x, y, z)##, then:

##f_x = \frac{\partial f}{\partial x} = ## "the partial derivative of ##f## wrt its first argument", which could be written ##f_1##, say.

Now, if you defined a function ##g(x, y, z) = f(x^2, 2xy, x+z)##, then what is ##g_x##?

The solution is to see the chain rule as:

##g_x = ## "the partial derivative of ##f## wrt its first argument times the partial derivative of its first argument with respect to ##x##" + "the partial derivative of ##f## wrt its second argument times the partial derivative of its second argument with respect to ##x##" + "the partial derivative of ##f## wrt its third argument times the partial derivative of its third argument with respect to ##x##".

Now, in my new notation this is quite clear:

##g_x = f_1 \cdot 2x + f_2 \cdot 2y + f_3##

Or, in the more usual notation this is:

##g_x = f_x \cdot 2x + f_y \cdot 2y + f_z##, where each partial derivative of ##f## is evaluated at ##(x^2, 2xy, x+z)##.
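
If you want to convince yourself numerically, here is a minimal Python sketch. The concrete choice ##f(a, b, c) = ab + c^2## and all the function names are assumptions made purely for illustration; any smooth function of three arguments should behave the same way.

```python
# Hypothetical test function f(a, b, c) = a*b + c**2 (chosen only for illustration).
def f(a, b, c):
    return a * b + c ** 2

# Partial derivatives of f with respect to its first, second and third argument.
def f1(a, b, c):
    return b

def f2(a, b, c):
    return a

def f3(a, b, c):
    return 2 * c

# g(x, y, z) = f(x^2, 2xy, x + z), as in the post above.
def g(x, y, z):
    return f(x ** 2, 2 * x * y, x + z)

# Chain rule: each partial of f is evaluated at (x^2, 2xy, x + z) and
# multiplied by the x-derivative of the corresponding argument.
def g_x_chain(x, y, z):
    args = (x ** 2, 2 * x * y, x + z)
    return f1(*args) * 2 * x + f2(*args) * 2 * y + f3(*args) * 1

# Central finite difference in x, as an independent check.
def g_x_numeric(x, y, z, h=1e-6):
    return (g(x + h, y, z) - g(x - h, y, z)) / (2 * h)

if __name__ == "__main__":
    point = (1.5, -0.7, 2.0)
    print("chain rule:       ", g_x_chain(*point))
    print("finite difference:", g_x_numeric(*point))
```

The two printed values should agree to several decimal places, which is what the "first argument, second argument, third argument" reading of the chain rule predicts.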

I think this is worth remembering, as it can be very useful in clearing up any confusion over partial derivatives.

Finally, here is how I would analyse your example. With ##\mathbf x## and ##\mathbf u## fixed, we define:

##g(\alpha) = f(\mathbf x + \alpha \mathbf u) = f(x + \alpha u_x, y + \alpha u_y, z + \alpha u_z)##

And:

##\frac{dg}{d \alpha} = f_x u_x + f_y u_y + f_z u_z = \mathbf{ \nabla}f \cdot \mathbf{u}##

The partial derivatives here are evaluated at ##\mathbf x + \alpha \mathbf u##, so, as you want the derivative evaluated at ##\mathbf x = (x, y, z)##, you take ##\alpha = 0##.
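
The same kind of numerical check works for the directional derivative itself. Again, this is only a sketch under assumptions: the test function ##f(x, y, z) = xy + \sin z##, the point, the direction, and the helper names are all made up for illustration.

```python
import math

# Hypothetical scalar field f(x, y, z) = x*y + sin(z).
def f(x, y, z):
    return x * y + math.sin(z)

# Its gradient (f_x, f_y, f_z), worked out by hand.
def grad_f(x, y, z):
    return (y, x, math.cos(z))

# g(alpha) = f(x + alpha * u), with the point x and direction u held fixed.
def g(alpha, x, u):
    return f(*(xi + alpha * ui for xi, ui in zip(x, u)))

# dg/dalpha at alpha = 0, estimated by a central finite difference.
def directional_numeric(x, u, h=1e-6):
    return (g(h, x, u) - g(-h, x, u)) / (2 * h)

# The chain-rule result: grad f(x) dotted with u.
def directional_gradient(x, u):
    return sum(gi * ui for gi, ui in zip(grad_f(*x), u))

if __name__ == "__main__":
    x = (1.0, 2.0, 0.5)
    u = (0.6, 0.0, 0.8)  # a unit vector
    print("finite difference:", directional_numeric(x, u))
    print("grad f . u:       ", directional_gradient(x, u))
```

Both numbers should match closely, since differentiating ##g(\alpha)## at ##\alpha = 0## is exactly the sum ##f_x u_x + f_y u_y + f_z u_z## evaluated at ##\mathbf x##.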

FactChecker · Jan 15, 2017

bwest121 said:

Here's my attempt:
$$\frac {\partial}{\partial \alpha} f(\mathbf x + \alpha\mathbf u) = \frac {\partial f}{\partial \alpha} \cdot \frac {\partial (\mathbf x + \alpha\mathbf u)}{\partial \alpha}$$

This is wrong. The first factor is not ##\frac {\partial f}{\partial \alpha}##.
The simple, one-variable version is ##\frac{df}{dx} = \frac{df}{du} \cdot \frac{du}{dx}##. Notice the ##df/du## rather than ##df/dx##.

bwest121 · Jan 15, 2017

Thank you so much. I very much appreciate you taking the time to provide such a thorough explanation. :)
