Intuition for gradient vector of multivariable functions

lys04 · Feb 2, 2024

In Homework Statement

Delta2 · Feb 2, 2024

The gradient is a vector generalization of the first derivative of a function of a single variable. Your question
"What I don't get is why (-2,1) becomes the direction we should move from (1,1) in order to get the greatest increase of f" is really good and right at the heart of the concept of gradient.

For a function of one variable we can say, as a first-order approximation, that ##\Delta f \approx f'(x) \Delta x##.
If f is a function of several variables this becomes (again, it is an approximation)

$$\Delta f \approx \nabla f (\vec{x})\cdot \vec{\Delta x}.$$

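We have a dot product there. For a step ##\vec{\Delta x}## of fixed length, the dot product equals ##|\nabla f|\,|\vec{\Delta x}|\cos\theta##, where ##\theta## is the angle between the two vectors. It is therefore largest when ##\theta = 0##, that is, when ##\vec{\Delta x}## has the same direction as ##\nabla f## (and most negative when it points the opposite way).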
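A quick numerical check of this argument, as a sketch only. The thread does not show the original f, so the function `f(x, y) = y - x**2` below is a hypothetical stand-in, chosen only because its gradient at (1,1) is (-2,1). The helper `best_direction` is likewise an illustrative name. It takes steps of one fixed small length in many directions and reports the direction with the largest increase.

```python
import math

def f(x, y):
    # Hypothetical function (not from the thread); grad f = (-2x, 1),
    # so grad f(1, 1) = (-2, 1).
    return y - x**2

def grad_f(x, y):
    return (-2.0 * x, 1.0)

def best_direction(x0, y0, step=1e-3, n=3600):
    """Try n equally spaced unit directions and return the one that
    increases f the most for a step of fixed length `step`."""
    best_angle, best_gain = 0.0, -math.inf
    for k in range(n):
        theta = 2.0 * math.pi * k / n
        gain = f(x0 + step * math.cos(theta), y0 + step * math.sin(theta)) - f(x0, y0)
        if gain > best_gain:
            best_angle, best_gain = theta, gain
    return (math.cos(best_angle), math.sin(best_angle)), best_gain

gx, gy = grad_f(1.0, 1.0)
norm = math.hypot(gx, gy)
unit_grad = (gx / norm, gy / norm)

direction, gain = best_direction(1.0, 1.0)
print("best direction found:", direction)
print("unit gradient:       ", unit_grad)
print("gain vs |grad|*step: ", gain, norm * 1e-3)
```

The direction found by brute force should agree with the normalized gradient up to the angular resolution of the search, and the gain should be close to ##|\nabla f|## times the step length.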

IMPORTANT: The symbol ##\Delta## here is the difference operator, not the Laplace operator.

mathwonk · Feb 3, 2024

Visually, if you look at a map of the level curves of f in the x,y plane, you can see that the direction of zero increase (to first order) is tangent to the level curve, since that is the direction in which the function remains constant to first order. To obtain zero as the rate of increase in that direction, you must dot the tangent direction with a vector perpendicular to the level curve. Hence the gradient is perpendicular to the level curve, and so it points towards either the greatest or the least rate of increase. Since dotting a vector with itself gives a positive result, the gradient must point in the direction in which the increase is (most) positive, as Delta2 says.
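To see the level-curve picture in numbers, here is a small sketch. It again assumes the hypothetical function `f(x, y) = y - x**2` (not taken from the thread), whose level curve through (1,1) is the parabola y = x^2 with tangent direction (1,2) there. The helper name `change_along` is made up for this example.

```python
import math

def f(x, y):
    # Hypothetical function (not from the thread); grad f(1, 1) = (-2, 1).
    return y - x**2

grad = (-2.0, 1.0)     # gradient of this f at (1, 1)
tangent = (1.0, 2.0)   # tangent to the level curve y = x^2 at (1, 1)

# Perpendicularity: the dot product of gradient and tangent vanishes.
dot = grad[0] * tangent[0] + grad[1] * tangent[1]

def change_along(direction, h):
    """Change in f after a step of length h from (1, 1) along `direction`."""
    n = math.hypot(direction[0], direction[1])
    ux, uy = direction[0] / n, direction[1] / n
    return f(1.0 + h * ux, 1.0 + h * uy) - f(1.0, 1.0)

h = 1e-3
along_tangent = change_along(tangent, h)    # only second order in h
along_gradient = change_along(grad, h)      # first order in h, positive

print("grad . tangent =", dot)
print("change along tangent: ", along_tangent)
print("change along gradient:", along_gradient)
```

Along the tangent the change is of order h^2, which is what "constant to first order" means, while along the gradient it is of order h and positive.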

The thing that is confusing to me is the fact that I tend to think of the gradient as defined by the coordinates (∂f/∂x, ∂f/∂y), as you emphasized in your original post. These coordinates actually have no intrinsic meaning at all. It is just a fact that if we want to know a vector, it suffices to know its projections onto any two independent axes. To understand the gradient, we need to use its definition as giving a linear approximation, as Delta2 focused on. This makes sense as soon as one has a good notion of length in the space, even before choosing coordinates. Then one introduces coordinates simply to make computations.
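One way to illustrate that the coordinates are only a computational device: compute the "partials" along two different pairs of perpendicular unit axes and reassemble the vector each time. The sketch below assumes the hypothetical function `f(x, y) = y - x**2` (not from the thread) and orthonormal axes. The helper names and the 30 degree rotation are arbitrary choices for the example.

```python
import math

def f(x, y):
    # Hypothetical function (not from the thread); grad f(1, 1) = (-2, 1).
    return y - x**2

def directional_derivative(p, u, h=1e-6):
    """Central-difference rate of change of f at p along the unit vector u."""
    forward = f(p[0] + h * u[0], p[1] + h * u[1])
    backward = f(p[0] - h * u[0], p[1] - h * u[1])
    return (forward - backward) / (2.0 * h)

def gradient_in_basis(p, e1, e2):
    """Rebuild the gradient vector from its components along the
    orthonormal axes e1, e2 (assumed perpendicular unit vectors)."""
    d1 = directional_derivative(p, e1)
    d2 = directional_derivative(p, e2)
    return (d1 * e1[0] + d2 * e2[0], d1 * e1[1] + d2 * e2[1])

p = (1.0, 1.0)

# Standard axes: the components are the usual partial derivatives.
standard = gradient_in_basis(p, (1.0, 0.0), (0.0, 1.0))

# Axes rotated by 30 degrees: different components, same vector.
angle = math.radians(30.0)
e1 = (math.cos(angle), math.sin(angle))
e2 = (-math.sin(angle), math.cos(angle))
rotated = gradient_in_basis(p, e1, e2)

print("from standard axes:", standard)
print("from rotated axes: ", rotated)
```

The individual components differ between the two sets of axes, but the reassembled vector is the same in both cases.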

As another argument that partials don't necessarily tell you that much about the rate of change of the function, recall that both partials can exist even for a function that does not even have a gradient! That is, the gradient is a vector that has the approximation property alluded to by Delta2. So even if the partials exist, the vector they define may not have that property.

In other words, the graph of a function f(x,y) of 2 variables is a surface in 3-space. Given a point on that surface and a direction in the x,y plane, we say the directional derivative exists in that direction iff the curve we get by cutting the surface along that direction is a smooth curve with a tangent line. But even if this is true in all directions, there is no guarantee that all these tangent lines lie in the same plane. If not, there is no tangent plane, and no gradient. If there is, the gradient of f is obtained by projecting the gradient of the function f(x,y)-z, which is perpendicular to the graph of f, into the x,y plane.
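A standard textbook example of this failure, offered here as an illustration rather than something from the thread, is g(x,y) = x^2 y/(x^2 + y^2) with g(0,0) = 0. Every directional derivative exists at the origin and both partials are 0, yet the rate of change along the diagonal is not 0, so the tangent lines do not lie in one plane. The sketch below checks this numerically. The function and helper names are chosen for the example.

```python
import math

def g(x, y):
    # Example function: all directional derivatives exist at the origin,
    # but there is no tangent plane there.
    if x == 0.0 and y == 0.0:
        return 0.0
    return x * x * y / (x * x + y * y)

def directional_derivative(u, t=1e-6):
    """One-sided difference quotient of g at the origin along unit vector u."""
    return (g(t * u[0], t * u[1]) - g(0.0, 0.0)) / t

partial_x = directional_derivative((1.0, 0.0))
partial_y = directional_derivative((0.0, 1.0))

diag = (1.0 / math.sqrt(2.0), 1.0 / math.sqrt(2.0))
actual = directional_derivative(diag)

# What the vector of partials would predict if it were a true gradient.
predicted = partial_x * diag[0] + partial_y * diag[1]

print("partials:", partial_x, partial_y)
print("actual rate along diagonal:   ", actual)
print("predicted from partials (dot):", predicted)
```

The vector of partials predicts a zero rate of change in every direction, while the actual rate along the diagonal is 1/(2√2), so the vector built from the partials does not have the approximation property.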
