Deeper understanding of the gradient and directional derivative

autodidude · Dec 14, 2012

Why does the formula for the gradient (for functions of 2 variables, the partial with respect to x times i plus the partial with respect to y times j) give the direction of greatest increase?

i.e. the direction of maximum increase at some point on a surface is given by [tex]f_xi+f_yj[/tex]

And why, when you multiply each partial derivative by the corresponding component of a vector and add the results, does it give the derivative of the surface in the direction of that vector?

i.e. the derivative of f(x,y) in the direction of <a, b> is [tex]af_x+bf_y[/tex]

The proofs don't offer an understanding of why the formula does what it does - at least for me.

My lack of understanding of this may have something to do with the fact that I still don't get intuitively why a change in the x direction plus change in y direction gives the total change.

Stephen Tashi · Dec 14, 2012

autodidude said:

My lack of understanding of this may have something to do with the fact that I still don't get intuitively why a change in the x direction plus change in y direction gives the total change.

Partial derivatives make that approximation by approximating the shape of a function of two variables locally as a plane. Hold one corner of a book against a table. Is it clear that your finger rises the same amount above the table by running it over the book from the corner on the table to the diagonally opposite corner as it would if you ran it along two sides of the book to reach the opposite corner?

algebrat · Dec 14, 2012

One way to simplify the understanding is to work all this out for the geometry of a linear function, z=cx+dy. If you move one unit along x, then one unit along y, you've traced the edge of a parallelogram situated above a unit square, and you've risen c+d units.

There, that's why the total is the sum. Of course, this is just the linear approximation in general, for small dx and dy.

You can replace the unit square in the x-y plane with a rectangle of side lengths a and b. Then just imagine the parallelogram again.
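The parallelogram picture can be checked numerically. The following is only a minimal sketch: the slopes c = 2 and d = 3 and the function names are arbitrary choices for illustration, not anything from the thread.

```python
def plane(x, y, c=2.0, d=3.0):
    """Height of the linear surface z = c*x + d*y (c, d are example slopes)."""
    return c * x + d * y


def rise_along_edges(a, b, c=2.0, d=3.0):
    """Rise when moving a units along x, then b units along y, from the origin."""
    rise_x = plane(a, 0.0, c, d) - plane(0.0, 0.0, c, d)  # equals c*a
    rise_y = plane(a, b, c, d) - plane(a, 0.0, c, d)      # equals d*b
    return rise_x + rise_y


def rise_along_diagonal(a, b, c=2.0, d=3.0):
    """Rise when moving straight from the origin to the corner (a, b)."""
    return plane(a, b, c, d) - plane(0.0, 0.0, c, d)


# Unit square: both routes rise c + d = 5.
print(rise_along_edges(1.0, 1.0), rise_along_diagonal(1.0, 1.0))
```

For a plane the two routes agree for any a and b, which is why the rise is c·a + d·b. For a curved surface the same statement only holds approximately, for small a and b.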

But I forgot to try to help with your other question, why does it give the direction of greatest increase. Hmm... I guess if you considered all the unit vectors, the one giving the steepest increase would be parallel to the gradient. Or, you might try to play with visualizing the rise and run of triangle for variously oriented planes. Not sure if there's a better way, anyone got some tricks for why gradient is direction of steepest ascent?

Runei · Dec 15, 2012

To find out why, we start with the directional derivative.

Directional Derivative
I expect you understand the partial derivatives and have a somewhat intuitive understanding of them. Otherwise we would have to start there.

The idea is to find out the rate of change in an arbitrary direction. So let's say we look at the point P(x₀, y₀) of the function f(x,y).

We want to know the rate of change in the direction of the vector u = <A, B>
How do we do this? Let's define u to be a unit vector.

Well the change in the function is of course Δf = f(x₀+Ah,y₀+Bh)-f(x₀,y₀)

This is the fundamental idea that we need to grasp. The vector controls the increase in x and y by its A and B components. The h variable is a continuous variable. In fact, the function
L(h) = f(x₀+Ah,y₀+Bh) describes all the values of the function f(x,y) that lie on the line through P parallel to the vector u.

If we accept this, then the rate of change can be defined by how much f(x,y) is changing, with respect to changes to this variable h.

So we have Δf/h = (f(x₀+Ah,y₀+Bh)-f(x₀,y₀))/h

When we take the limit, we of course end up with df/dh, which is the derivative. But let's find out what that fella actually looks like. Well to do this we have to use the dreaded Chain Rule!

Chain Rule
You said you had a problem understanding that a change due to change in y plus a change due to change in x is equal to the total change. Well let's look at that now.

If we want to approximate the change that happens in a function f, how can we do it?

Well if we know the partial derivatives, we know the rate of change in each direction (x and y). So we can compute the approximate change by the formula

Δf ≈ ∂f/∂x Δx + ∂f/∂y Δy

Why is this? Well, split the move into two steps: first change x, then change y.
Δf1 = f(x₀+Δx, y₀) - f(x₀, y₀) and
Δf2 = f(x₀+Δx, y₀+Δy) - f(x₀+Δx, y₀)

Then Δf = Δf1 + Δf2 = f(x₀+Δx, y₀+Δy) - f(x₀, y₀) exactly, because the middle value f(x₀+Δx, y₀) cancels.

Only x changes in the first step, so Δf1 ≈ ∂f/∂x Δx. Only y changes in the second step, so Δf2 ≈ ∂f/∂y Δy, except that this partial derivative is taken at (x₀+Δx, y₀) rather than at (x₀, y₀).

Does that matter? Not if the partial derivatives are continuous and the change in x is very, very small. Then ∂f/∂y at (x₀+Δx, y₀) will almost (but not quite) be equal to ∂f/∂y at (x₀, y₀), and the difference disappears in the limit.
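A quick numerical check of Δf ≈ ∂f/∂x Δx + ∂f/∂y Δy may help here. This is a sketch under assumptions: the test function f(x, y) = x²y + y³ and the point (1, 2) are made up for illustration, and the partial derivatives are written out by hand.

```python
def f(x, y):
    """Example surface, chosen only for illustration."""
    return x**2 * y + y**3


def f_x(x, y):
    """Partial derivative of f with respect to x."""
    return 2 * x * y


def f_y(x, y):
    """Partial derivative of f with respect to y."""
    return x**2 + 3 * y**2


def exact_change(x0, y0, dx, dy):
    """Total change, built from a step in x followed by a step in y."""
    step_in_x = f(x0 + dx, y0) - f(x0, y0)
    step_in_y = f(x0 + dx, y0 + dy) - f(x0 + dx, y0)
    return step_in_x + step_in_y


def linear_estimate(x0, y0, dx, dy):
    """Estimate of the change using the partials at the starting point."""
    return f_x(x0, y0) * dx + f_y(x0, y0) * dy


for step in (1e-1, 1e-2, 1e-3):
    exact = exact_change(1.0, 2.0, step, step)
    estimate = linear_estimate(1.0, 2.0, step, step)
    print(step, exact, estimate, abs(exact - estimate))
```

The gap between the two columns shrinks roughly like the square of the step, so it vanishes faster than the step itself. That is what makes the approximation exact in the limit.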

So, since we can write Δf ≈ ∂f/∂x Δx + ∂f/∂y Δy, we can divide everything by h (in our case Δx and Δy are actually Ah and Bh respectively) to get

Δf/h ≈ ∂f/∂x A + ∂f/∂y B

When we take the limit we end up with

df/dh = ∂f/∂x A + ∂f/∂y B
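To see the limit happen, one can compare the difference quotient Δf/h with ∂f/∂x A + ∂f/∂y B for shrinking h. The sketch below assumes an example function f(x, y) = sin(x)·e^y, the point (0.5, 0.2) and the unit vector <3/5, 4/5>, all picked arbitrarily.

```python
import math


def f(x, y):
    """Example surface, chosen only for illustration."""
    return math.sin(x) * math.exp(y)


def difference_quotient(x0, y0, A, B, h):
    """(f(x0 + A*h, y0 + B*h) - f(x0, y0)) / h for a finite step h."""
    return (f(x0 + A * h, y0 + B * h) - f(x0, y0)) / h


def directional_derivative(x0, y0, A, B):
    """A*f_x + B*f_y, with the partials of the example f written by hand."""
    f_x = math.cos(x0) * math.exp(y0)
    f_y = math.sin(x0) * math.exp(y0)
    return A * f_x + B * f_y


A, B = 3 / 5, 4 / 5  # a unit vector, since (3/5)^2 + (4/5)^2 = 1
target = directional_derivative(0.5, 0.2, A, B)
for h in (1e-1, 1e-2, 1e-3, 1e-4):
    quotient = difference_quotient(0.5, 0.2, A, B, h)
    print(h, quotient, abs(quotient - target))
```

The error column should shrink roughly in proportion to h, which is the numerical face of "when we take the limit".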

Gradient
We now have the directional derivative. The next question would be - In which direction do we find the greatest rate of change?

Well we can choose to view the above equation as the dot product between two vectors.

so df/dh = <∂f/∂x , ∂f/∂y> · <A, B> = v · u, where v = <∂f/∂x , ∂f/∂y>

The dot product is also determined by

<∂f/∂x , ∂f/∂y> · <A, B> = |v||u|cos(θ), where θ is the angle between v and u

Since |u| = 1 (because it is a unit vector) we have

df/dh = |v|cos(θ)

When is this expression the largest? Well, it is the largest when cos(θ) = 1, that is, when the angle θ is zero. When is it zero? It is zero when the two vectors point in the same direction. Thus the greatest rate of change is in the direction of the vector v. This vector we call the gradient and denote by ∇f.
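The claim can also be tested by brute force: try many unit vectors and keep the one with the largest directional derivative. This is only a sketch. The gradient <3, 4> is an assumed example value and the function name is made up.

```python
import math


def steepest_direction_by_search(grad_x, grad_y, samples=3600):
    """Scan unit vectors <cos t, sin t> and return the best angle and rate."""
    best_angle, best_rate = 0.0, -math.inf
    for k in range(samples):
        theta = 2 * math.pi * k / samples
        # Directional derivative = gradient dotted with the unit vector.
        rate = grad_x * math.cos(theta) + grad_y * math.sin(theta)
        if rate > best_rate:
            best_angle, best_rate = theta, rate
    return best_angle, best_rate


angle, rate = steepest_direction_by_search(3.0, 4.0)
print(angle, math.atan2(4.0, 3.0))  # search result vs. direction of the gradient
print(rate, math.hypot(3.0, 4.0))   # best rate vs. length of the gradient
```

Up to the resolution of the scan, the winning angle is the angle of the gradient itself and the winning rate is its length |v|.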

I hope this helped :)

autodidude · Dec 16, 2012

Thanks Runei, I think I still have to ponder it a bit more...it's one of those things where I got to read a whole bunch of different approaches and just keep thinking about it 'til it clicks.

An intuitive approach that I'm also thinking about is this:

http://betterexplained.com/articles/understanding-pythagorean-distance-and-the-gradient/

HallsofIvy · Dec 17, 2012

Another way to look at it: the "directional derivative", the rate of change of f(x,y,z) in the direction of unit vector [itex]\vec{v}[/itex], is given by [itex]\nabla f\cdot\vec{v}[/itex]. One way to find where that is largest is to note that any unit vector can be written in terms of "direction cosines". That is, [itex]\vec{v}= < \cos(\theta_x), \cos(\theta_y), \cos(\theta_z)>[/itex] where [itex]\theta_x[/itex] is the angle [itex]\vec{v}[/itex] makes with the x-axis, [itex]\theta_y[/itex] is the angle [itex]\vec{v}[/itex] makes with the y-axis, and [itex]\theta_z[/itex] is the angle [itex]\vec{v}[/itex] makes with the z-axis. Take derivatives with respect to the angles and set them equal to 0 to see that maximizing that function requires that the angles are also the direction angles of [itex]\nabla f[/itex]. Also, since a dot product with a unit vector is the length of the projection of the vector on the unit vector, it is easy to see that the dot product is largest when the unit vector is parallel to the given vector. And that implies the derivative is largest in the direction of the gradient vector.

Also note that the dot product of two non-zero vectors is 0 if and only if the vectors are perpendicular. It follows that the gradient vector is always perpendicular to a constant value surface.
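The perpendicularity is easy to check in the two-variable analogue, where the constant value surface becomes a level curve. The sketch below assumes the example f(x, y) = x² + 4y² and its level curve f = 4, which is the ellipse (2cos t, sin t). None of this comes from the thread.

```python
import math


def f(x, y):
    """Example function, chosen only for illustration."""
    return x**2 + 4 * y**2


def gradient(x, y):
    """Gradient of the example f, written out by hand."""
    return (2 * x, 8 * y)


def point_on_level_curve(t):
    """Point on the level curve f = 4, parametrized by t."""
    return (2 * math.cos(t), math.sin(t))


def tangent_to_level_curve(t):
    """Derivative of the parametrization with respect to t."""
    return (-2 * math.sin(t), math.cos(t))


def gradient_dot_tangent(t):
    """Dot product of the gradient with the tangent at parameter t."""
    x, y = point_on_level_curve(t)
    gx, gy = gradient(x, y)
    tx, ty = tangent_to_level_curve(t)
    return gx * tx + gy * ty


for t in (0.0, 0.3, 1.0, 2.5, 4.0):
    print(t, f(*point_on_level_curve(t)), gradient_dot_tangent(t))
```

Moving along the curve does not change f, so the directional derivative along the tangent is zero, and a zero dot product is exactly the perpendicularity statement.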

Chestermiller · Dec 21, 2012

I think that this is a less formal version of what Halls of Ivy was (correctly) saying. If f(x,y) is a scalar function of x and y in the x-y plane, then the change in f between the point x,y and the point x + dx, y + dy is given by:

[tex]df = f_x\,dx + f_y\,dy[/tex]

The right hand side of this equation can be expressed as the dot product of two vectors:

[tex]df = (f_x \mathbf{i}+f_y\mathbf{j})\cdot(dx\mathbf{i}+dy \mathbf{j})[/tex]

The vector [tex](f_x \mathbf{i}+f_y\mathbf{j})[/tex] is called the "gradient of f", and the vector [tex](dx\mathbf{i}+dy \mathbf{j})[/tex] is a differential position vector drawn between the point x,y and the point x + dx, y + dy. The differential position vector can also be expressed as:

[tex]\mathbf{ds}=(dx\mathbf{i}+dy \mathbf{j})= ds (\cos{\theta}\mathbf{i} +\sin{\theta}\mathbf{j})[/tex]

where

[tex]ds=\sqrt{(dx)^2+(dy)^2}[/tex]

[tex]\theta=\arctan{(\frac{dy}{dx})}[/tex]

Physically, θ is the angle between the differential position vector [tex](dx\mathbf{i}+dy \mathbf{j})[/tex] and the x axis.

In terms of ds and θ, the equation for df now becomes:

[tex]df = ds(f_x \mathbf{i}+f_y\mathbf{j})\cdot(\cos{\theta}\mathbf{i}+\sin{\theta} \mathbf{j})=ds(f_x\cos{\theta}+f_y\sin{\theta})[/tex]
or equivalently
[tex]\frac{df}{ds}= f_x\cos{\theta}+f_y\sin{\theta}[/tex]

Now suppose we hold the length ds of the differential position vector constant, and ask the question "in what direction θ will the change df be a maximum over the specified distance ds?" We can answer this question by taking the derivative of df/ds with respect to θ, and setting the derivative equal to zero:

[tex]-f_x\sin{\theta}+f_y\cos{\theta}=0[/tex]

The solutions to this equation satisfy [tex]\tan{\theta}=\frac{f_y}{f_x}[/tex]; the one that gives the maximum (rather than the minimum) is:

[tex]\sin{\theta}=\frac{f_y}{\sqrt{(f_x)^2+(f_y)^2}}[/tex]

[tex]\cos{\theta}=\frac{f_x}{\sqrt{(f_x)^2+(f_y)^2}}[/tex]

If we substitute these relationships into the equation for df/ds to get the maximum value of df/ds over all possible orientations of the differential displacement vector, we obtain:

[tex]\frac{df}{ds}=\sqrt{(f_x)^2+(f_y)^2}[/tex]

This shows that the maximum df over all possible orientations of the differential position vector is equal to ds times the magnitude of the vector gradient of f, [tex](f_x \mathbf{i}+f_y\mathbf{j})[/tex].
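A small numerical check of this result, and of the quadrant issue hidden in the arctangent, might look like the following. It is a sketch with made-up partials f_x = -3, f_y = 4, and it uses `math.atan2` so that the angle lands in the same quadrant as the gradient.

```python
import math


def rate_at(f_x, f_y, theta):
    """df/ds = f_x*cos(theta) + f_y*sin(theta) in the direction theta."""
    return f_x * math.cos(theta) + f_y * math.sin(theta)


def steepest_ascent(f_x, f_y):
    """Angle of the gradient (quadrant-aware) and df/ds in that direction."""
    theta = math.atan2(f_y, f_x)
    return theta, rate_at(f_x, f_y, theta)


f_x, f_y = -3.0, 4.0  # assumed example values of the partial derivatives
theta, rate = steepest_ascent(f_x, f_y)
print(rate, math.hypot(f_x, f_y))   # maximum rate vs. gradient magnitude

naive_theta = math.atan(f_y / f_x)  # one-argument arctan drops the quadrant
print(rate_at(f_x, f_y, naive_theta))  # the other critical point: the minimum
```

Both angles satisfy the zero-derivative condition, since they differ by π. With f_x negative, the plain arctangent picks the direction of steepest descent, which is why the sine and cosine formulas are the safer way to state the answer.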
