# Where does the gradient operator come from?

## Main Question or Discussion Point

Can someone explain why the gradient of a function is just a vector made up of partial derivatives of the function?

haushofer
That's the definition. The rationale is that in ordinary calculus we look at the derivative of a function to say something sensible about extrema. If you consider a function of several variables which are independent (!), you stack all its partial derivatives in one vector* and call it the gradient.

* Technically, a gradient is a so-called one-form and not a vector, a distinction which becomes important when you want to use non-Cartesian coordinates or consider curved manifolds.

Delta2
Homework Helper
Gold Member
The main idea comes from functions of one variable $f(x)$, where the derivative with respect to $x$ at the point $x$, that is $f'(x)$, is the slope of the tangent line to the graph of $f$ at the point $(x,f(x))$.
So suppose we want to make an operator that

takes as input a function $f(x)$ and gives as output a vector such that
1) the magnitude of the output vector is the slope of the function with respect to that variable,
2) the direction of the output vector is the direction of the line of the variable (in one variable, or one dimension, we just have a single direction, a single line).

Then the operator is just $\vec{i}\frac{\partial}{\partial x}$

The most straightforward generalization of this operator to 3 dimensions is just the gradient operator. Think about it: each component of the output vector of the gradient is now the slope of the function $f(x,y,z)$ with respect to one variable. That is, the $\vec{i}$ component gives us the slope of $f$ with respect to $x$, the $\vec{j}$ component gives us the slope with respect to $y$, and the $\vec{k}$ component gives us the slope with respect to $z$.
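To make the "slope with respect to each variable" idea concrete, here is a minimal numerical sketch. The helper `numerical_gradient` and the example function `f` are made up for illustration (they are not from any library); each component is estimated with a central difference while the other variables are held fixed.

```python
def numerical_gradient(f, point, eps=1e-6):
    """Approximate the gradient of f at `point` with central differences."""
    grad = []
    for i in range(len(point)):
        forward = list(point)
        backward = list(point)
        forward[i] += eps
        backward[i] -= eps
        # Slope of f along the i-th coordinate axis, other variables held fixed.
        grad.append((f(*forward) - f(*backward)) / (2 * eps))
    return grad


def f(x, y, z):
    # Example function chosen only for illustration.
    return x**2 + 3 * y + x * z


grad_at_point = numerical_gradient(f, (1.0, 2.0, 3.0))
# Analytically the partial derivatives are (2x + z, 3, x) = (5, 3, 1) here.
print(grad_at_point)
```

The three printed numbers should agree with the hand-computed partial derivatives up to rounding error.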

fresh_42
Mentor
Can someone explain why the gradient of a function is just a vector made up of partial derivatives of the function?
I think it helps to consider the partial derivatives as the basis in tangent space. Derivatives are always directional, and the gradient is a particular derivative expressed in this basis.

It is a straightforward generalization of the single-variable derivative to a multivariable function. Recall that if $f:\mathbb{R}\rightarrow\mathbb{R}$, then
$$f'(x) = \lim_{h\to 0} \frac{f(x+h) - f(x)}{h}$$
If we try to generalize this to two-variable functions, we have a problem with the denominator:
$$f'(x,y) = \lim_{(h,k)\to 0} \frac{f(x+h, y+k) - f(x,y)}{(h,k)}$$
The denominator is a displacement vector, and there is no consistent way to divide by a vector. However, what we want to do is divide by the length of the displacement vector, which is what we really mean when we divide by h in 1 dimension too. So let's do that instead:
$$f'(x,y) = \lim_{(h,k)\to 0} \frac{f(x+h, y+k) - f(x,y)}{||(h,k)||}$$
This looks like a derivative. But recall that in 1-dimension, the derivative was defined so that it provided the best linear approximation to f around the point at which it was taken. It was not really a number: it was a tool for finding the correct tangent line.
That is what we want in a multivariable derivative: a tool for finding the best linear approximation to f(x,y) at the point the derivative is applied. In 1 dimension, the derivative was applied to the displacement h to get the linear approximation: $f(x + h) \approx f(x) + f'(x)\cdot h$.
A linear 2-variable approximation based on the displacement vector (h, k) would then be $f(x+h, y+k) \approx f(x, y) + u\cdot h + v\cdot k$, where u and v are numbers that do not depend on (h, k).
So $f'(x,y)$ somehow has to give us two numbers, u and v, to apply to the displacement vector (h,k) in order to get our linear approximation. We know the dot product gives us that expression, so f'(x,y) must be the vector (u, v). Applying f'(x,y) to a particular displacement vector (h,k) must then be done with the dot product.
But hold on. Our definition doesn't look like it is going to give us a vector:
$$f'(x,y) = \lim_{(h,k)\to 0} \frac{f(x+h, y+k) - f(x,y)}{||(h,k)||}$$
If $f(x,y)$ is a scalar-valued function, then the right side is a single number, not a 2-component vector. So whatever that is, it is not what we want. To get what we want, mathematicians made a slight adjustment to the definition. Recalling that for single variable functions, we have:
$$f'(x) = \lim_{h\to 0} \frac{f(x+h) - f(x)}{h}$$
we can do some sleight of hand to unify the two sides. If $f'(x)$ exists, then $\lim_{h\to 0} f'(x) = f'(x)$, since $f'(x)$ does not depend on $h$, so
$$\lim_{h\to 0} f'(x) = \lim_{h\to 0} \frac{f(x+h) - f(x)}{h}$$
These two sides are just numbers, so we can do algebra:
$$0 = \lim_{h\to 0} \frac{f(x+h) - f(x)}{h} - \lim_{h\to 0} f'(x)$$
$$0 = \lim_{h\to 0} \frac{f(x+h) - f(x)}{h} - \lim_{h\to 0} \frac{f'(x)\cdot h}{h}$$
$$0 = \lim_{h\to 0} \frac{f(x+h) - f(x) - f'(x)\cdot h}{h}$$
It follows that this is an equivalent definition of the derivative, where we can see the linear approximation that it provides directly in the numerator. This motivates us to try the following definition for multivariable derivatives: $f'(x,y)$ is, at each point $(x,y)$, the unique linear function $L(h,k)$ such that:
$$0 = \lim_{(h,k)\to 0} \frac{f(x+h, y+k) - f(x,y) - L(h,k)}{||(h,k)||}$$
That is, the derivative at each point is the unique linear function $L$ for which the error $f(x+h, y+k) - f(x,y) - L(h,k)$ vanishes faster than the length of the displacement vector. Recalling that L is a linear function of the displacement vector (h,k), it must be the case that $L(h,k) = u\cdot h + v\cdot k$.
Use this fact in the limit, and find out what u and v must be equal to. You will answer your own question then. :-)
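If you want to check your answer numerically, here is a rough sketch. The function `f`, the point `(x0, y0)` and the helper `remainder_ratio` are arbitrary choices for illustration. It evaluates the quotient from the limit above for shrinking displacements, once with $u$ and $v$ set to the partial derivatives and once with a deliberately wrong $u$.

```python
import math


def f(x, y):
    # Illustrative function; any differentiable f would do.
    return x**2 * y + math.sin(y)


def remainder_ratio(u, v, x, y, h, k):
    """|f(x+h, y+k) - f(x, y) - (u*h + v*k)| / ||(h, k)||"""
    return abs(f(x + h, y + k) - f(x, y) - (u * h + v * k)) / math.hypot(h, k)


x0, y0 = 1.0, 2.0
fx = 2 * x0 * y0            # partial derivative of f with respect to x at (x0, y0)
fy = x0**2 + math.cos(y0)   # partial derivative of f with respect to y at (x0, y0)

good_ratios = []
bad_ratios = []
for t in (1e-1, 1e-2, 1e-3, 1e-4):
    h, k = 0.6 * t, 0.8 * t  # displacement of length t in a fixed direction
    good_ratios.append(remainder_ratio(fx, fy, x0, y0, h, k))
    bad_ratios.append(remainder_ratio(fx + 1.0, fy, x0, y0, h, k))
    print(f"t={t:g}  partials: {good_ratios[-1]:.2e}  wrong u: {bad_ratios[-1]:.2e}")
```

With the partial derivatives the quotient shrinks roughly in proportion to the displacement length, while with the wrong $u$ it stalls near $0.6$ (the $h$-component of the unit direction), so the limit is not zero.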

Delta2
Homework Helper
Gold Member
Another way to answer is that vectors contain information about something. For example, the velocity vector tells us how fast something is moving and the direction in which it is moving.

The gradient vector of a function just gives us information about each of the first partial derivatives (each component is the first partial derivative with respect to the corresponding variable). The direction of the gradient gives us some information about which partial derivative is greater in magnitude. For example, if the partial derivative with respect to x is much greater than the partial derivatives with respect to y and z, then the gradient vector will tend to point in the x-direction.

jambaugh
Gold Member
Another way to generalize is using the Gâteaux differential and derivative definitions. View a derivative of a function at a point as a linear operator mapping the differentials of the input variables to differentials of the output variables.

So if $y=f(x)$ is a single variable function, then $dy = f'(x)dx$ is the simple 1-dimensional linear operator we call "multiplication by a number".
But in higher dimensions, where $\mathbf{x}$ is a vector, say $\mathbf{x}=\langle x,y,z\rangle$ and we have a scalar valued function of this vector:
$u = f(\mathbf{x}) = f(x,y,z)$
the derivative must map a vector differential to a scalar differential:
$$du = f'(\mathbf{x})[ d\mathbf{x}]$$
(I'm using square brackets here as function notation for a specifically linear function. Note we have a distinct linear function $f'(\mathbf{x})$ at each value of $\mathbf{x}$ which then acts on the differential $d\mathbf{x}$.)

In this case the derivative must be something called a dual vector, or a linear functional. It is also a vector but living in the dual space of the space where $d\mathbf{x}$ lives. But we don't like to get into this deeper aspect of linear algebra in the Calc III courses and we also have a nice way of avoiding explicit reference to dual spaces (when we work in finite dimensions) by use of something called the Riesz Representation Theorem. It says, effectively, that we can always (in finite dimensional inner product spaces) define the action of a dual vector on a vector as the action of taking the dot product $\bullet$ with some specific vector. In technical terms:

"For $\phi$ a linear functional there exists some vector $\mathbf{v}$ such that $\phi[\mathbf{x}] = \mathbf{v}\bullet \mathbf{x}$."

In the context of defining the derivative it means we can define the derivative functional as the dot product of some vector.

$$du = f'(\mathbf{x})[d\mathbf{x}] \equiv \nabla f (\mathbf{x})\bullet d\mathbf{x}$$
(In short, $f'(\mathbf{x}) = \nabla f(\mathbf{x})\bullet$.)
This vector $\nabla f$, used to express the derivative, is defined as the gradient vector. It is not ideal, because we don't always have a dot product (or just one of them) defined on our space, and the gradient depends on the definition of the dot product while this generalized derivative does not. But it is very convenient if you don't want to whip out the full toolbox of linear algebra with its dual spaces and such.

Do note that the gradient operator is something more: it is the operator mapping a function to its gradient vector, $\nabla: f \mapsto \nabla f$. But it can be applied to vector-valued functions as well and can be combined with other vector operations to define, for example, the curl and divergence operators.

Now, as to why the gradient is just the vector of partial derivatives: this is not always the case. It depends specifically on the fact that you are working with rectilinear coordinates where the coordinate basis is orthonormal. You'll find, for example, that if you re-express a quantity in polar coordinates, its gradient vector is no longer simply the vector of partial derivatives. However, it does obey a very simple "chain rule", so you can work out gradients in any coordinate system simply by remembering:
$$\nabla f(u,v,\ldots) = \frac{\partial f}{\partial u}\nabla u + \frac{\partial f}{\partial v}\nabla v + \ldots$$
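
As a worked instance of this rule, here is a sketch for plane polar coordinates, with $r=\sqrt{x^2+y^2}$ and $\theta$ (the usual polar angle) standing in for $u$ and $v$. Computing the Cartesian gradients of the coordinate functions gives
$$\nabla r = \left(\frac{x}{r}, \frac{y}{r}\right) = \hat{\mathbf{r}}, \qquad \nabla\theta = \left(-\frac{y}{r^2}, \frac{x}{r^2}\right) = \frac{1}{r}\hat{\boldsymbol{\theta}}$$
so the chain rule above yields
$$\nabla f(r,\theta) = \frac{\partial f}{\partial r}\hat{\mathbf{r}} + \frac{1}{r}\frac{\partial f}{\partial \theta}\hat{\boldsymbol{\theta}}$$
The extra factor $1/r$ is exactly why the gradient in polar coordinates is not simply the vector of partial derivatives.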

lavinia
Gold Member
Can someone explain why the gradient of a function is just a vector made up of partial derivatives of the function?
Assuming you are asking this question for Euclidean 3 space, how do you define the gradient of a function if not the vector of partials with respect to the Euclidean coordinate axes? Are you using some other definition?

If you are thinking of a gradient as the direction and rate at which a physical quantity $f(x,y,z)$ changes most rapidly, then this turns out to be the vector of partial derivatives, $(\partial f/\partial x, \partial f/\partial y, \partial f/\partial z)$.

To show this one needs to show:

1) The function changes most rapidly in the normal direction to the surfaces $f(x,y,z)=k$ on which $f$ is constant.
2) $\nabla f=(\partial f/\partial x, \partial f/\partial y, \partial f/\partial z)$ is normal to the surface $f(x,y,z)=k$.
3) The directional derivative of $f$ in the direction of $\nabla f$ is the length of $\nabla f$.

The last two properties follow from the Chain Rule.
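
These claims can also be sanity-checked numerically. The sketch below uses a two-variable function `f`, a sample point and a helper `directional_derivative` that are all just illustrative choices. It scans unit directions in the plane, estimates the directional derivative in each by a central difference, and compares the steepest one with the vector of partials.

```python
import math


def f(x, y):
    # Illustrative function; its partials are (2x + 3y, 3x).
    return x**2 + 3 * x * y


def directional_derivative(f, x, y, angle, eps=1e-6):
    """Central-difference rate of change of f along the unit vector (cos, sin)."""
    dx, dy = math.cos(angle), math.sin(angle)
    return (f(x + eps * dx, y + eps * dy) - f(x - eps * dx, y - eps * dy)) / (2 * eps)


x0, y0 = 1.0, 2.0
gx, gy = 2 * x0 + 3 * y0, 3 * x0   # vector of partial derivatives at (x0, y0)
grad_norm = math.hypot(gx, gy)
grad_angle = math.atan2(gy, gx)

# Scan 3600 evenly spaced directions and keep the one with the largest rate.
angles = [2 * math.pi * i / 3600 for i in range(3600)]
best_angle = max(angles, key=lambda a: directional_derivative(f, x0, y0, a))
best_rate = directional_derivative(f, x0, y0, best_angle)

print("steepest direction (rad):", best_angle, "vs angle of partials:", grad_angle)
print("steepest rate:", best_rate, "vs length of partials vector:", grad_norm)
```

Up to the resolution of the angle grid, the steepest direction matches the direction of the vector of partials, and the rate of change in that direction matches its length.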
