Directional derivative and the gradient - confused.

Tomer · Sep 1, 2011

Hello, thanks for reading!

I am slightly confused. According to the definition of the directional derivative, calculated at the point x in the direction y,
f'(x;y) = [itex]\lim_{h\rightarrow0}\frac{f(\vec{x} + h\vec{y})-f(\vec{x})}{h}[/itex]
According to this definition, the directional derivative does not seem to depend on the size of the vector y, which makes intuitive sense.

It is also true, for differentiable functions, that f'(x;y) = [itex]\nabla f(\vec{x})\cdot\vec{y}[/itex]
However, this definition seems to depend on the size of y.
What am I missing?

Furthermore, I can't seem to prove that if f'(x;y) > 0 for a certain x, y then f'(x;-y) < 0 (which I think is true, and can also be verified using the gradient relation).

This is probably incredibly dumb but I thought of it and I can't seem to understand it.

Hootenanny · Sep 1, 2011

Tomer said:

Hello, thanks for reading!

I am slightly confused. According to the definition of the directional derivative, calculated at the point x in the direction y,
f'(x;y) = [itex]\lim_{h\rightarrow0}\frac{f(\vec{x} + h\vec{y})-f(\vec{x})}{h}[/itex]
According to this definition, the directional derivative does not seem to depend on the size of the vector y, which makes intuitive sense.

It is also true, for differentiable functions, that f'(x;y) = [itex]\nabla f(\vec{x})\cdot\vec{y}[/itex]
However, this definition seems to depend on the size of y.
What am I missing?

Both definitions you have quoted are only for unit vectors, i.e. when [itex]|\vec{y}| = 1[/itex]. Indeed, both do depend on the length of the vector [itex]\vec{y}[/itex]. If you want to use a non-normalized vector [itex]\vec{v}[/itex] with [itex]|\vec{v}|\neq0[/itex], then you must use the following definition

[tex]\nabla_{\vec{v}} f(\vec{x}) = \lim_{h\rightarrow0^{+}} \frac{f(\vec{x}+h\vec{v}) - f(\vec{x})}{h|\vec{v}|}[/tex]

or the equivalent gradient definition.

Tomer said:

Furthermore, I can't seem to prove that if f'(x;y) > 0 for a certain x, y then f'(x;-y) < 0 (which I think is true, and can also be verified using the gradient relation).

Well, what exactly have you tried?

Tomer · Sep 1, 2011

Thanks for replying.

Well, I'm reading Apostle (volume 2), and here's the definition he gives for the directional derivative at the point [itex]\vec{x}[/itex] and direction [itex]\vec{v}[/itex]:

Given a scalar field f: S -> R, where S [itex]\subseteq R^{n}[/itex], let [itex]\vec{x}[/itex] be an interior point of S and let [itex]\vec{v}[/itex] be an arbitrary point in [itex]R^{n}[/itex]. The derivative of f at [itex]\vec{x}[/itex] with respect to [itex]\vec{v}[/itex] is denoted by the symbol
f'([itex]\vec{x}[/itex]; [itex]\vec{v}[/itex]) and is defined by the equation:

f'([itex]\vec{x}[/itex]; [itex]\vec{v}[/itex]) = [itex]\lim_{h\rightarrow0} \frac{f(\vec{x} + h\vec{v}) - f(\vec{x})}{h}[/itex]

He doesn't mention anywhere the fact that [itex]\vec{v}[/itex] is a unit vector. Is it wrong then? I made sure I copied the definition 1-1.

When I follow your definition (or the world's :-) ) I see the dependence of course.

About what I tried:

Well, I tried playing with the one-sided limits, and I actually seem to get the unwanted result: that the two derivatives in the directions [itex]\vec{v}[/itex] and [itex]-\vec{v}[/itex] are equal:

f'([itex]\vec{x}[/itex]; [itex]\vec{v}[/itex]) = [itex]\lim_{h\rightarrow0^{+}} \frac{f(\vec{x} + h\vec{v}) - f(\vec{x})}{h}[/itex] = [itex]\lim_{h\rightarrow0^{-}} \frac{f(\vec{x} + h(-\vec{v})) - f(\vec{x})}{h}[/itex] = f'([itex]\vec{x}[/itex]; [itex]-\vec{v}[/itex])
(if we ignore the ||v|| factor for a second; it shouldn't affect the result)

The first and last equalities follow from the fact that both one-sided limits exist and are equal to the limit itself.
The middle equality follows from a direct substitution h --> -h.

So, did I shake the foundations of maths or am I missing something? 0_0

Hootenanny · Sep 1, 2011

Tomer said:

Well, I'm reading Apostle (volume 2), and here's the definition he gives for the directional derivative at the point [itex]\vec{x}[/itex] and direction [itex]\vec{v}[/itex]:

Given a scalar field f: S -> R, where S [itex]\subseteq R^{n}[/itex], let [itex]\vec{x}[/itex] be an interior point of S and let [itex]\vec{v}[/itex] be an arbitrary point in [itex]R^{n}[/itex]. The derivative of f at [itex]\vec{x}[/itex] with respect to [itex]\vec{v}[/itex] is denoted by the symbol
f'([itex]\vec{x}[/itex]; [itex]\vec{v}[/itex]) and is defined by the equation:

f'([itex]\vec{x}[/itex]; [itex]\vec{v}[/itex]) = [itex]\lim_{h\rightarrow0} \frac{f(\vec{x} + h\vec{v}) - f(\vec{x})}{h}[/itex]

He doesn't mention anywhere the fact that [itex]\vec{v}[/itex] is a unit vector. Is it wrong then? I made sure I copied the definition 1-1.

When I follow your definition (or the world's :-) ) I see the dependence of course.

As quoted, the definition is obviously false.

Tomer said:

Well, I tried playing with the one-sided limits, and I actually seem to get the unwanted result: that the two derivatives in the directions [itex]\vec{v}[/itex] and [itex]-\vec{v}[/itex] are equal:

f'([itex]\vec{x}[/itex]; [itex]\vec{v}[/itex]) = [itex]\lim_{h\rightarrow0^{+}} \frac{f(\vec{x} + h\vec{v}) - f(\vec{x})}{h}[/itex] = [itex]\lim_{h\rightarrow0^{-}} \frac{f(\vec{x} + h(-\vec{v})) - f(\vec{x})}{h}[/itex] = f'([itex]\vec{x}[/itex]; [itex]-\vec{v}[/itex])
(if we ignore the ||v|| factor for a second; it shouldn't affect the result)

The first and last equalities follow from the fact that both one-sided limits exist and are equal to the limit itself.
The middle equality follows from a direct substitution h --> -h.

So, did I shake the foundations of maths or am I missing something? 0_0

In the second equality, you forgot to change the sign of the h in the denominator.
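Spelled out as a worked step (a sketch in the notation above; the symbol k = -h is introduced here only for bookkeeping and is not in the original post), the substitution in the left-hand limit gives:

[tex]\lim_{h\rightarrow0^{-}} \frac{f(\vec{x} + h\vec{v}) - f(\vec{x})}{h} = \lim_{k\rightarrow0^{+}} \frac{f(\vec{x} + k(-\vec{v})) - f(\vec{x})}{-k} = -\lim_{k\rightarrow0^{+}} \frac{f(\vec{x} + k(-\vec{v})) - f(\vec{x})}{k}[/tex]

So, provided the two-sided limit exists, f'([itex]\vec{x}[/itex]; [itex]\vec{v}[/itex]) = -f'([itex]\vec{x}[/itex]; [itex]-\vec{v}[/itex]), which is the sign change you were expecting.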

Tomer · Sep 1, 2011

How disturbing. Apostle uses it again and again and all the extensions are defined similarly.

Aside from that - I realize the equalities are trivial, which only strengthens the case that this definition is false. h must remain strictly positive then.
I therefore cannot see a way to manipulate the "-" sign inside the function. Do you think you could give me a tip? This is not homework; I'm learning the material on my own.

Tomer · Sep 1, 2011

Oh - heck, I just realized I have made a stupid mistake. I forgot that the denominator h also changes sign to -h, which gives me the minus I wanted!
I'm sorry, that was pretty dumb, as I suspected :-)

I'm still confused about the definition though. So h must remain positive? I realize this makes sense, I just cannot fathom how Apostle would write a false definition. Since the gradient was also defined through partial derivatives, which are defined through the relation I've written, I would expect everything to remain consistent, but then there's this thing with the dependence of the directional derivative on the size of the direction vector.

Hootenanny · Sep 1, 2011

Tomer said:

Oh - heck, I just realized I have made a stupid mistake. I forgot that the denominator h also changes sign to -h, which gives me the minus I wanted!
I'm sorry, that was pretty dumb, as I suspected :-)

Not a problem, we all make stupid mistakes. Even I didn't catch it first time through.

Tomer said:

I'm still confused about the definition though. So h must remain positive? I realize this makes sense, I just cannot fathom how Apostle would write a false definition. Since the gradient was also defined through partial derivatives, which are defined through the relation I've written, I would expect everything to remain consistent, but then there's this thing with the dependence of the directional derivative on the size of the direction vector.

Yes, it is convention that h is non-negative. You can easily understand why. If we did not put a restriction on the sign of h, then the directional derivative with respect to [itex]\vec{u}[/itex] could be in the direction [itex]-\vec{u}[/itex] or equally [itex]\vec{u}[/itex]. Do you see?

Something to point out: the directional derivative does not depend on the length of the vector. It is only your incorrect definition that does. The directional derivative is defined such that the direction vector is normalized and hence independent of length.

As for Apostol's "incorrect" definition, do not worry about it. He could have said elsewhere that the vector must have unit length and you have just missed it. The other option is that your edition contains a typo that propagated through the text.

Tomer · Sep 1, 2011

Apostol :-)

Alright, I'll assume v is a unit vector from now on.

Thank you very much!

Stephen Tashi · Sep 2, 2011

Hootenanny said:

Something to point out: the directional derivative does not depend on the length of the vector. It is only your incorrect definition that does. The directional derivative is defined such that the direction vector is normalized and hence independent of length.

How standard is the definition of directional derivative? There are pages on the web (e.g. Wolfram's) that agree with you and there are pages that use Apostol's definition. Since Apostol's notation includes the v, it can be regarded as being more general than the definition that assumes v is a unit vector. I'm curious whether Apostol has any internal inconsistency in his book with respect to directional derivatives.

Tomer · Sep 2, 2011

Stephen Tashi said:

How standard is the definition of directional derivative? There are pages on the web (e.g. Wolfram's) that agree with you and there are pages that use Apostol's definition. Since Apostol's notation includes the v, it can be regarded as being more general than the definition that assumes v is a unit vector. I'm curious whether Apostol has any internal inconsistency in his book with respect to directional derivatives.

That's exactly what bothers me.
Since the gradient itself is also defined through partial derivatives, which are defined through directional derivatives in the direction of the axes - I would think the gradient is somehow also "altered" to be consistent with the rest (which is pretty disturbing as well). I also followed the proofs roughly and found myself agreeing with them (not that that means much). I made sure of it - he doesn't mention anywhere that v is a unit vector.

However, the really disturbing thing is, as I mentioned, the contradiction between his definition of the directional derivative not depending on ||v|| (a definition in which, by the way, "h" isn't even restricted to be positive, which seems totally wrong!) and the gradient relation [itex]\nabla f\cdot\vec{v}[/itex], which obviously does depend on it.

Since reading Hootenanny's answer, I just assume v is a unit vector.

Stephen Tashi · Sep 2, 2011

In the derivative of a real-valued function of a real number, h isn't required to be positive. f(x+h) - f(x) and h both change "direction" when h is given a different sign, so it doesn't matter. Is there something about the case of a function of a vector that is different?

lavinia · Sep 3, 2011

Tomer said:

Hello, thanks for reading!

I am slightly confused. According to the definition of the directional derivative, calculated at the point x in the direction y,
f'(x;y) = [itex]\lim_{h\rightarrow0}\frac{f(\vec{x} + h\vec{y})-f(\vec{x})}{h}[/itex]
According to this definition, the directional derivative does not seem to depend on the size of the vector y, which makes intuitive sense.

Your definition depends on the length of the vector y. It is equal to df(y), which is linear in y. The direction that y determines is represented by the unit vector that points along y. Your definition is the directional derivative when y is that unit vector.
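A quick numerical check of that linearity, as a minimal sketch only. The test function f(x, y) = x^2 + 3y, the point, the vectors, and the helper name `derivative_along` are all assumptions picked for illustration; none of them come from Apostol. The limit is approximated with a central difference and compared against the gradient relation.

```python
def f(x, y):
    # Hypothetical smooth test function (an assumption for illustration).
    return x**2 + 3.0 * y

def grad_f(x, y):
    # Analytic gradient of the test function above: (2x, 3).
    return (2.0 * x, 3.0)

def derivative_along(func, point, v, h=1e-6):
    # Central-difference approximation of lim_{h->0} [f(p + h v) - f(p)] / h.
    # Note that v is NOT normalized here.
    px, py = point
    vx, vy = v
    forward = func(px + h * vx, py + h * vy)
    backward = func(px - h * vx, py - h * vy)
    return (forward - backward) / (2.0 * h)

point = (1.0, 2.0)
v = (3.0, 4.0)        # length 5, deliberately not a unit vector
two_v = (6.0, 8.0)    # same direction, twice as long

d_v = derivative_along(f, point, v)
d_2v = derivative_along(f, point, two_v)

gx, gy = grad_f(*point)
grad_dot_v = gx * v[0] + gy * v[1]

print(d_v, grad_dot_v)  # the two values should agree closely
print(d_2v)             # should be about twice d_v
```

Doubling the vector doubles the result, so the limit as written and the gradient relation depend on the length of y in the same way.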

Tomer · Sep 5, 2011

Stephen Tashi said:

In the derivative of a real-valued function of a real number, h isn't required to be positive. f(x+h) - f(x) and h both change "direction" when h is given a different sign, so it doesn't matter. Is there something about the case of a function of a vector that is different?

I'd be happy if someone would clear that up. I'm also confused.

HallsofIvy · Sep 5, 2011

The difference is that in higher-dimensional spaces (two, three, or more dimensions) we have more "degrees of freedom". In one dimension, we can only approach a point "from the left" or "from the right". In two dimensions, we can approach along infinitely many lines, or along curves such as parabolas or spirals. You may have seen examples of functions of two variables where approaching along the x or y axes gives one limit but approaching along a slanted line gives another. There are even examples where approaching a point along any straight line gives the same limit, but approaching along a parabola gives a different result.

Here is an example from Salas, Hille, and Etgen, edition 4, page 860:
[tex]f(x,y)= \frac{x^2y}{x^4+ y^2}[/tex]
as (x,y) goes to (0,0).
Along any line y= mx, that becomes
[tex]f(x, mx)= \frac{x^2(mx)}{x^4+ (mx)^2}= \frac{mx^3}{x^4+ m^2x^2}= \frac{mx}{x^2+ m^2}[/tex]
The limit of that, as x goes to 0, is 0 no matter what m is.
(The vertical line cannot be written "y= mx" but it is x= 0 so the function becomes
[tex]f(0, y)= \frac{0}{0+ y^2}= 0[/tex]
for all non-zero y so the limit is still 0.)

But if we approach (0, 0) along the curve [itex]y= x^2[/itex] we have
[tex]f(x, x^2)= \frac{x^2(x^2)}{x^4+ (x^2)^2}= \frac{x^4}{2x^4}[/tex]
which, for x not 0, is just 1/2. The limit as x approaches 0 is 1/2, not 0.
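For anyone who wants to see this numerically, here is a small sketch. The helper names `along_line` and `along_parabola`, the slope m = 2, and the sample values of x are assumptions made only for this illustration.

```python
def f(x, y):
    # The example function above; it is undefined at (0, 0) itself.
    return x**2 * y / (x**4 + y**2)

def along_line(m, x):
    # Value of f on the line y = m x.
    return f(x, m * x)

def along_parabola(x):
    # Value of f on the curve y = x^2.
    return f(x, x**2)

# Approach the origin along a line with slope 2 and along the parabola.
for x in (1e-1, 1e-2, 1e-3, 1e-4):
    print(x, along_line(2.0, x), along_parabola(x))
```

The line column shrinks towards 0 as x does, while the parabola column stays at about 1/2, so no single limit exists at the origin.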

Notice, by the way, that the standard definition of the "derivative of f at [itex]x_0[/itex]",
[tex]\lim_{h\to 0}\frac{f(x_0+h)- f(x_0)}{h}[/tex]
can be restated as
[tex]\lim_{x\to x_0}\frac{f(x)- f(x_0)}{d(x, x_0)}[/tex]
where [itex]x= x_0+ h[/itex] and, in one dimension, [itex]d(x, x_0)= |x- x_0|[/itex] is just the distance between x and [itex]x_0[/itex].

That is, we must have a "vector space" structure on the range space (values of f) because we must be able to subtract [itex]f(x)- f(x_0)[/itex] and multiply by the number, [itex]1/d(x, x_0)[/itex]. But we do NOT need a "vector space structure" on the domain space! We only need to be able to calculate the distance between two points- we need a metric space. That's why, normally, we do not think of the arguments of functions as vectors but just talk about "functions of several variables".

Hootenanny · Sep 5, 2011

Couldn't have said it better myself, Halls.

Just to clarify a further point regarding the unit direction vector. It is entirely legitimate to take a derivative along a vector, and this vector needn't be normalised. In that case, the value of the resulting derivative scales with the length of the vector along which we are taking the derivative. On the other hand, when taking the directional derivative, or the derivative in a given direction, the reference vector is assumed to be normalised, since we are only interested in the direction and do not want to "scale" the derivative with the length of the reference vector.

The distinction between directional derivative and the derivative along a vector is an important one, and I should have made this point earlier. Unfortunately, the two terms are often used synonymously.
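To make the distinction concrete, here is a rough sketch under stated assumptions: the test function f(x, y) = xy + y^2 and the function names `derivative_along` and `directional_derivative` are hypothetical, chosen just for this post, and the limits are approximated by central differences.

```python
import math

def f(x, y):
    # Hypothetical test function (an assumption for illustration).
    return x * y + y**2

def derivative_along(func, point, v, h=1e-6):
    # Derivative along the vector v as given, with no normalization.
    px, py = point
    vx, vy = v
    forward = func(px + h * vx, py + h * vy)
    backward = func(px - h * vx, py - h * vy)
    return (forward - backward) / (2.0 * h)

def directional_derivative(func, point, v, h=1e-6):
    # Derivative in the direction of v: normalize v first, then differentiate.
    norm = math.hypot(v[0], v[1])
    if norm == 0.0:
        raise ValueError("direction vector must be non-zero")
    unit = (v[0] / norm, v[1] / norm)
    return derivative_along(func, point, unit, h)

point = (1.0, 1.0)
v = (1.0, 2.0)
w = (3.0, 6.0)  # same direction as v, three times as long

along_v = derivative_along(f, point, v)
along_w = derivative_along(f, point, w)
dir_v = directional_derivative(f, point, v)
dir_w = directional_derivative(f, point, w)

print(along_v, along_w)  # along_w should be about three times along_v
print(dir_v, dir_w)      # these two should agree closely
```

The derivative along the vector changes when the vector is rescaled, whereas the normalized version only sees the direction.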

Stephen Tashi · Sep 5, 2011

I agree that for a function of several variables, there can be different values of limits at a point depending on how we approach that point.

But my "point" is that when a definition refers to "limit" in the singular, it is defining something in the case when the value of the limit is independent of how we approach. With respect to the scalar h approaching zero, if there is more than one value of the limit depending on the direction of approach then the definition doesn't apply and there is no defined value of whatever-we-were-defining at such a point.

I gather that Apostol's definition of the directional derivative of a scalar valued function f(.) calculated at the point x in the direction y is [itex] f_y(x) = \lim_{h \rightarrow 0} \frac{f(\vec{x} + h\vec{y})-f(\vec{x})}{h}[/itex].

There are various facts about this definition:
1. The definition does not assert that the scalar [itex] h > 0 [/itex]
2. The definition implies that [itex]f_{y}(x) = -f_{-y}(x)[/itex] when both limits exist.
3. The value of [itex] f_{y}(x) [/itex] depends on the magnitude of the vector [itex]y[/itex]

As I see it, none of these facts contradict the others.

For example, the left-hand and right-hand limits can be equal
[tex]\lim_{h \rightarrow 0^+} \frac{f(\vec{x} + h\vec{y})-f(\vec{x})}{h} = \lim_{h \rightarrow 0^-} \frac{f(\vec{x} + h\vec{y})-f(\vec{x})}{h} [/tex]
without contradicting the fact that
[tex] \lim_{h \rightarrow 0} \frac{f(\vec{x} + h(-\vec{y}))-f(\vec{x})}{h} = - \lim_{h \rightarrow 0} \frac{f(\vec{x} + h\vec{y})-f(\vec{x})}{h} [/tex]

Hootenanny · Sep 5, 2011

Actually, upon re-reading Apostol's definition as posted above, he does not mention the term "directional derivative". It is Tomer that asserts that this is the directional derivative. Apostol only refers to the derivative with respect to a vector. In this case, Apostol's definition is fine as long as we are mindful of the distinction between directional derivative and the derivative with respect to a vector.

Hootenanny · Sep 5, 2011

Stephen Tashi said:

[tex]\lim_{h \rightarrow 0^+} \frac{f(\vec{x} + h\vec{y})-f(\vec{x})}{h} = \lim_{h \rightarrow 0^-} \frac{f(\vec{x} + h\vec{y})-f(\vec{x})}{h} [/tex]

And
[tex]\lim_{h \rightarrow 0^+} \frac{f(\vec{x} + h\vec{y})-f(\vec{x})}{h} = \lim_{h \rightarrow 0^-} \frac{f(\vec{x} + h\vec{y})-f(\vec{x})}{h} = - \lim_{h \rightarrow 0^+} \frac{f(\vec{x} + h(-\vec{y}))-f(\vec{x})}{h}[/tex]
So Stephen, do these expressions represent the directional derivative with respect to [itex]\vec{y}[/itex] or [itex]-\vec{y}[/itex]?

Stephen Tashi · Sep 5, 2011

Hootenanny said:

So Stephen, do these expressions represent the directional derivative with respect to [itex]\vec{y}[/itex] or [itex]-\vec{y}[/itex]?

The derivative with respect to [itex] \vec y [/itex] is:
[tex]f_{\vec y}(\vec x) = \lim_{h \rightarrow 0} \frac{f(\vec{x} + h\vec{y})-f(\vec{x})}{h} [/tex]

The derivative with respect to [itex] -\vec y [/itex] is:
[tex]f_{-\vec y}(\vec x) = \lim_{h \rightarrow 0} \frac{f(\vec{x} + h(-\vec{y}))-f(\vec{x})}{h} [/tex]

Tomer · Sep 5, 2011

You're right, Hootenanny. I wasn't aware of the difference and wasn't careful with my words!
