Why Does the Gradient Point Towards the Greatest Increase?

kidsasd987 · May 14, 2016

Hi, I am looking for a proof that explains why the gradient is a vector that points in the direction of greatest increase of a scalar function at a given point p.

http://math.stackexchange.com/quest...always-be-directed-in-an-increasing-direction

I understand the proof here. But the idea there is that ##df = \nabla f \cdot d\vec{l}## is maximized when ##\nabla f## and ##d\vec{l}## point in the same direction. That means we first have to consider the direction of ##d\vec{l}## in order to verify where ##\nabla f## points.
Suppose we have a multivariable function ##f(x_1, x_2, x_3, \ldots, x_n)##,
and let's say that the derivative with respect to ##x_j## is negative at ##p_0##
(while the derivatives with respect to all the other variables are positive).

This indicates that f increases toward the left of the graph of f vs. ##x_j## (in the negative ##x_j## direction from ##p_0##).
Then ##df = \nabla f \cdot d\vec{l}## is maximized when the ##x_j## term is ##\frac{\partial f}{\partial x_j}(-dx_j)##, because that gives a positive increment ##df##, since the derivative is negative at ##p_0##.

Also, we can do this because ##d\vec{l}## is a vector quantity, so we can choose its direction as we like.
But the total differential doesn't take this into account. It just multiplies each partial derivative by a small increment ##dx_i## of the corresponding variable, and those increments all have the same sign.

I think this is a contradiction.

Twigg · May 15, 2016

I think what's confusing you is the use of ##dx_{j}##. In this context, it doesn't mean a small positive infinitesimal quantity, as it would when integrating over an area, as in ##dA = dx\,dy##. Here, ##dx_{j}## represents a displacement (positive or negative) in the ##x_{j}## coordinate. The author of the post you linked likely uses this notation because it agrees with the theory of differential forms. It works as long as you remember that the dx's aren't the fundamental changes and treat them instead as parametric differentials.

In other words, let ##\vec{c} = (c_{1}, \ldots, c_{n})## be the point at which you want to take the gradient of ##f(x_{1}, \ldots, x_{n})## (which must be smooth and non-critical at ##\vec{c}##). Consider an arbitrary smooth regular curve through ##\vec{c}## given by ##\gamma(\epsilon)##, defined over a closed interval ##-a \leq \epsilon \leq +a## with ##\gamma(0) = \vec{c}##. For an infinitesimally small increment ##d\epsilon## in the curve parameter ##\epsilon##, the displacement from ##\vec{c}## along the curve is given by the differential ##d\vec{\ell} = \sum_{i} \dot{\gamma}_{i}(0) \vec{e}_{i} d\epsilon##. In that sense, each component differential ##dx_{i} = \dot{\gamma}_{i}(0) d\epsilon## can be either positive or negative, depending on the velocity of the curve ##\gamma## as it passes through ##\vec{c}##.
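
To connect this with the case in the original question, here is a short worked version of the argument. It is only a sketch, and it assumes that ##f## is differentiable at ##\vec{c}##, that ##d\epsilon > 0##, and that the curve has unit speed at ##\epsilon = 0##, i.e. ##|\dot{\gamma}(0)| = 1## (the unit-speed normalization is an added assumption, used so that different directions are compared over the same step length). By the chain rule and the Cauchy-Schwarz inequality,

$$ df = \sum_{i} \frac{\partial f}{\partial x_{i}}(\vec{c})\, \dot{\gamma}_{i}(0)\, d\epsilon = \nabla f(\vec{c}) \cdot \dot{\gamma}(0)\, d\epsilon \leq |\nabla f(\vec{c})|\, d\epsilon $$

with equality exactly when ##\dot{\gamma}(0) = \nabla f(\vec{c}) / |\nabla f(\vec{c})|##. For that curve the component differentials are

$$ dx_{j} = \dot{\gamma}_{j}(0)\, d\epsilon = \frac{1}{|\nabla f(\vec{c})|} \frac{\partial f}{\partial x_{j}}(\vec{c})\, d\epsilon $$

so ##dx_{j}## is negative whenever ##\partial f / \partial x_{j}## is negative, and the corresponding term ##\frac{\partial f}{\partial x_{j}} dx_{j}## in ##df## is still non-negative. The differentials do not all need the same sign; each one takes the sign of its own partial derivative.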

kidsasd987 · May 15, 2016

Thank you for your reply.
But it still seems unclear to me somehow. Well, I may be asking stupid questions, but everything gets suspicious when I think too much about something.

1. I thought that the increments ##dx_i## should be all positive or all negative, because that would give the maximum ##df##.

It is obvious that the increment ##df## depends on the direction of the increment hv at a given point ##p_0##.
If we look at the graph of f vs. ##x_j##, because the derivative is negative, f should increase in the negative direction.
For each coordinate ##x_1## to ##x_n##, we are free to choose the direction (either negative or positive) that increases the total differential ##df##.
But the differentials ##dx_i## have to have the same sign to maximize ##df##.

2. By the parametrized differential ##dx##, do you mean ##dx = \lim_{\epsilon \to 0} |\epsilon|##?

Charles Link · May 15, 2016

In a very simplistic approach, ## df=\nabla f \cdot ds ## where ## ds=dx \hat{i} +dy \hat{j} +dz \hat{k} ##. This implies ## df=|\nabla f| |ds| \cos \theta ## where ## \theta ## is the angle between ## \nabla f ## and ## ds ##. The maximum occurs when ## \theta=0 ##.
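
The ## \cos \theta ## argument can also be checked numerically. The following is a minimal sketch, not a proof: the function `f`, the point `p0`, and the sample size are all made-up choices for illustration, picked so that one partial derivative is negative as in the original question. It compares the rate of change ## \nabla f \cdot \hat{u} ## over many random unit directions ## \hat{u} ## with the rate along ## \nabla f ## itself, and with the "all increments positive" direction.

```python
import numpy as np

# Hypothetical example function (an assumption for illustration only).
def f(p):
    x, y, z = p
    return x**2 - 3.0 * y + np.sin(z)

# Its analytic gradient; the y-component is always negative.
def grad_f(p):
    x, y, z = p
    return np.array([2.0 * x, -3.0, np.cos(z)])

p0 = np.array([1.0, 2.0, 0.5])      # arbitrary, non-critical point
g = grad_f(p0)
u_grad = g / np.linalg.norm(g)      # unit vector along the gradient

# Sample many random unit directions.
rng = np.random.default_rng(0)
dirs = rng.normal(size=(10000, 3))
dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)

# Directional derivative for each direction: grad f . u = |grad f| cos(theta).
rates = dirs @ g
best_random = rates.max()
rate_along_grad = u_grad @ g        # equals |grad f|

# Direction in which every coordinate increment has the same (positive) sign.
u_pos = np.ones(3) / np.sqrt(3.0)
rate_all_positive = u_pos @ g

# Finite-difference check of the rate along the gradient direction.
h = 1e-6
fd_rate = (f(p0 + h * u_grad) - f(p0)) / h

print("rate along gradient:    ", rate_along_grad)
print("best of random samples: ", best_random)
print("rate, all-positive step:", rate_all_positive)
print("finite-difference rate: ", fd_rate)
```

No sampled direction beats the gradient direction, and the unit vector along ## \nabla f ## has a negative y-component, so the step of greatest increase moves backwards in the coordinate whose partial derivative is negative.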
