Why is grad(f) a covariant vector

1. Jun 5, 2014

Benjam:n

Take R2. Take a function f(x,y) defined on R2 which maps every point to a real number. The gradient of this at any point is a vector which points in the direction of steepest incline. The magnitude of the vector is the value of the derivative of the function in that direction. Both of these things are very real. This vector is solid and is surely there, so why doesn't it transform contravariantly? I had a go at exploring this: take x as the Cartesian coordinates and x bar as the polar coordinates, and define the function f(x,y) = x^2 + y^2. If you work out the gradient vector and transform it contravariantly, it does give you (2r, 0), which is the gradient vector relative to polars.
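The polar example can be checked numerically. The sketch below is plain Python; the helper name `polar_components` and the sample point are made up for illustration. It applies both transformation rules to the Cartesian gradient components (2x, 2y) of f = x^2 + y^2: the covariant rule uses the Jacobian ∂(x, y)/∂(r, θ), the contravariant rule its inverse ∂(r, θ)/∂(x, y).

```python
import math

def polar_components(fx, fy, r, theta):
    """Transform Cartesian components (fx, fy) at the point (r, theta) to polar,
    once covariantly and once contravariantly."""
    c, s = math.cos(theta), math.sin(theta)
    # Covariant rule: chain rule with d(x, y)/d(r, theta), as for partial derivatives
    cov = (c * fx + s * fy, r * (-s * fx + c * fy))
    # Contravariant rule: inverse Jacobian d(r, theta)/d(x, y), as for displacements
    contra = (c * fx + s * fy, (-s * fx + c * fy) / r)
    return cov, contra

r, theta = 2.0, 0.7                        # arbitrary sample point
x, y = r * math.cos(theta), r * math.sin(theta)

# f(x, y) = x^2 + y^2 has Cartesian gradient components (2x, 2y)
cov, contra = polar_components(2 * x, 2 * y, r, theta)
print(cov)     # approximately (4.0, 0.0), i.e. (2r, 0)
print(contra)  # approximately (4.0, 0.0) as well
```

The two rules differ only in the θ row, by a factor of r², so for this particular f they agree because the θ component is zero.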

2. Jun 5, 2014

Fredrik

Staff Emeritus
The gradient of a function $f:\mathbb R^n\to\mathbb R$ is the function $\nabla f:\mathbb R^n\to\mathbb R^n$ defined by
$$\nabla f(x)=(f_{,1}(x),\dots,f_{,n}(x)),$$ for all $x\in\mathbb R^n$. For each $i\in\{1,\dots,n\}$, $f_{,i}$ denotes the $i$th partial derivative of $f$. In differential geometry, partial derivatives are defined using both a coordinate system and the conventional type of partial derivatives. For example, if $x:U\to\mathbb R^n$ is a coordinate system on $U\subseteq\mathbb R^n$, and $p\in U$, then for all $i\in\{1,\dots,n\}$, we have
$$\frac{\partial}{\partial x^i}\bigg|_p f= (f\circ x^{-1})_{,i}(x(p)).$$ This statement defines the notation on the left.
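To make the definition concrete, here is a minimal Python sketch that evaluates $(f\circ x^{-1})_{,i}(x(p))$ by a central difference. The names `coord_partial`, `polar`, `polar_inv` and the test function $f(x,y)=xy$ are hypothetical choices for illustration, and the result is only accurate up to finite-difference error. Passing the identity map for both the chart and its inverse reproduces the ordinary partial derivatives, since the identity map is also a coordinate system.

```python
import math

def coord_partial(f, chart, chart_inv, p, i, h=1e-6):
    """Approximate (d/dx^i)|_p f = (f o x^{-1})_{,i}(x(p)) by a central difference.

    f: function on points of R^2; chart: p -> x(p); chart_inv: coordinates -> point."""
    u = list(chart(p))              # the coordinate n-tuple x(p)
    up, um = u[:], u[:]
    up[i] += h
    um[i] -= h
    g = lambda q: f(chart_inv(q))   # f o x^{-1}, a function of the coordinates
    return (g(up) - g(um)) / (2 * h)

# Polar chart, valid away from the ray {y = 0, x <= 0} where atan2 jumps
def polar(p):
    return (math.hypot(p[0], p[1]), math.atan2(p[1], p[0]))

def polar_inv(q):
    return (q[0] * math.cos(q[1]), q[0] * math.sin(q[1]))

identity = lambda q: tuple(q)       # the identity map I, also a coordinate system

f = lambda p: p[0] * p[1]           # hypothetical example function f(x, y) = xy
p = (2.0, 1.0)
print([coord_partial(f, polar, polar_inv, p, i) for i in range(2)])    # about [4/sqrt(5), 3]
print([coord_partial(f, identity, identity, p, i) for i in range(2)])  # about [1, 2] = (f_x, f_y)
```

Here $f\circ x^{-1}(r,\theta)=r^2\cos\theta\sin\theta$, so the exact polar values are $2xy/r=4/\sqrt5$ and $x^2-y^2=3$.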

The conventional partial derivatives in a gradient can be interpreted as partial derivatives in the sense of differential geometry, if we use the fact that the identity map $I$, defined by $I(x)=x$ for all $x\in\mathbb R^n$, is a coordinate system. We have
$$\frac{\partial}{\partial I^i}\bigg|_p f = (f\circ I^{-1})_{,i}(I(p)) = f_{,i}(p).$$ To see how partial derivatives in the sense of differential geometry transform under a change of coordinates $x\to y$, we need to use the chain rule:
\begin{align}
\frac{\partial}{\partial y^i}\bigg|_p f &=(f\circ y^{-1})_{,i}(y(p)) = (f\circ x^{-1}\circ x\circ y^{-1})_{,i}(y(p))= (f\circ x^{-1})_{,j} \big((x\circ y^{-1})(y(p))\big)\, (x\circ y^{-1})^j{}_{,i}(y(p))\\
& = (x\circ y^{-1})^j{}_{,i}(y(p)) \frac{\partial}{\partial x^j}\bigg|_p f.
\end{align} Is the transformation
$$\frac{\partial}{\partial x^i}\bigg|_p \to \frac{\partial}{\partial y^i}\bigg|_p =(x\circ y^{-1})^j{}_{,i}(y(p)) \frac{\partial}{\partial x^j}\bigg|_p$$ covariant or contravariant? Well, "covariant" means that the components transform the same way as the basis vectors, but the partial derivative functionals $\frac{\partial}{\partial x^i}\big|_p$ are the basis vectors (of the tangent space at p) associated with the coordinate system x. So the transformation is by definition covariant.
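A numerical sanity check of this transformation law, as a sketch: take $x$ to be the identity chart (Cartesian coordinates) and $y$ to be polar coordinates, so that $x\circ y^{-1}$ is the map $(r,\theta)\mapsto(r\cos\theta,r\sin\theta)$. The function `grad_fd`, the test function and the sample point are hypothetical; derivatives are central differences, so the two sides agree only up to discretization error.

```python
import math

def grad_fd(g, u, h=1e-6):
    """Central-difference partial derivatives of g: R^2 -> R at the point u."""
    out = []
    for i in range(2):
        up, um = list(u), list(u)
        up[i] += h
        um[i] -= h
        out.append((g(up) - g(um)) / (2 * h))
    return out

def polar_inv(q):            # x o y^{-1}: polar coordinates -> Cartesian coordinates
    r, t = q
    return (r * math.cos(t), r * math.sin(t))

def jacobian_polar_inv(q):   # J[j][i] = (x o y^{-1})^j_{,i}(y(p))
    r, t = q
    return [[math.cos(t), -r * math.sin(t)],
            [math.sin(t),  r * math.cos(t)]]

f = lambda p: p[0] * p[1] ** 2 + 3 * p[0]    # hypothetical test function

q = (1.5, 0.4)                               # y(p) = (r, theta)
p = polar_inv(q)                             # the point itself (x = identity chart)
d_x = grad_fd(f, p)                          # d/dx^j|_p f
d_y = grad_fd(lambda u: f(polar_inv(u)), q)  # d/dy^i|_p f, straight from the definition
J = jacobian_polar_inv(q)
# Transformation law: d/dy^i f = sum_j J[j][i] * d/dx^j f
predicted = [sum(J[j][i] * d_x[j] for j in range(2)) for i in range(2)]
print(d_y, predicted)  # the two lists should agree to many decimal places
```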

I guess this changes the question to why the coordinate n-tuple $(x^1(p),\cdots,x^n(p))$ that a coordinate system x associates with a point $p\in\mathbb R^n$ transforms contravariantly. It doesn't always. Under the coordinate change $x\to y$, $x(p)$ changes to
$$y(p)=(y\circ x^{-1}\circ x)(p)= (y\circ x^{-1})(x(p)).$$ To proceed from here, an assumption is necessary. We assume that $y\circ x^{-1}$ is a linear bijection from $\mathbb R^n$ to $\mathbb R^n$ (for example a rotation or a Lorentz transformation). The $i$th component of the matrix equation corresponding to the above (see https://www.physicsforums.com/showthread.php?t=694922 if you don't understand that concept) is
$$(y(p))^i = (y\circ x^{-1})^i{}_j (x(p))^j.$$ Let T be an arbitrary linear bijection from $\mathbb R^n$ to $\mathbb R^n$. For all $x\in\mathbb R^n$ (apologies for using the symbol x for a second purpose), we have
\begin{align}
&T^i(x)=T^i{}_j x^j\\
&T^i_{,k}(x)=T^i{}_j \delta^j_k =T^i{}_k.
\end{align} This implies that $(y\circ x^{-1})^i{}_{,j}(x(p)) =(y\circ x^{-1})^i{}_j$. So we have
$$y^i(p)=(y(p))^i =(y\circ x^{-1})^i{}_j (x(p))^j = (y\circ x^{-1})^i{}_{,j}(x(p))\, x^j(p).$$ As you can see, the numbers $(y\circ x^{-1})^i{}_{,j}(x(p))$ that appear in this transformation equation are not the same as the numbers $(x\circ y^{-1})^j{}_{,i}(y(p))$ that appear in the transformation equation for the components of the gradient. However, we have
\begin{align}
\delta^j_k &=I^j{}_{,k}(x(p))= (x\circ y^{-1}\circ y\circ x^{-1})^j{}_{,k}(x(p)) =(x\circ y^{-1})^j{}_{,i}(y(p)) (y\circ x^{-1})^i{}_{,k}(x(p)).
\end{align} This is how we see that coordinate n-tuples transform contravariantly, i.e. using the inverse of the matrix that's used to transform the basis vectors.
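The two facts used above, that the Jacobian of a linear map is its matrix and that the matrices $(x\circ y^{-1})^j{}_{,i}(y(p))$ and $(y\circ x^{-1})^i{}_{,k}(x(p))$ multiply to $\delta^j_k$, can be checked numerically. This is a sketch in plain Python; the matrix `T` is a hypothetical linear bijection standing in for $y\circ x^{-1}$, and the Jacobians are central-difference approximations.

```python
def jacobian_fd(F, u, h=1e-6):
    """Central-difference Jacobian of F at u, returned as J[i][k] = F^i_{,k}(u)."""
    cols = []
    for k in range(len(u)):
        up, um = list(u), list(u)
        up[k] += h
        um[k] -= h
        Fp, Fm = F(up), F(um)
        cols.append([(a - b) / (2 * h) for a, b in zip(Fp, Fm)])
    return [[col[i] for col in cols] for i in range(len(cols[0]))]

def mat_vec(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

def mat_mul(A, B):
    return [[sum(A[i][m] * B[m][k] for m in range(len(B))) for k in range(len(B[0]))]
            for i in range(len(A))]

# Hypothetical linear bijection standing in for y o x^{-1}, and its exact inverse
T = [[2.0, 1.0], [1.0, 1.0]]
T_inv = [[1.0, -1.0], [-1.0, 2.0]]
y_of_x = lambda u: mat_vec(T, u)       # y o x^{-1}
x_of_y = lambda v: mat_vec(T_inv, v)   # x o y^{-1}

xp = [0.5, -1.2]      # x(p), an arbitrary sample
yp = y_of_x(xp)       # y(p) = T x(p)

B = jacobian_fd(y_of_x, xp)   # (y o x^{-1})^i_{,k}(x(p)): acts on coordinate n-tuples
A = jacobian_fd(x_of_y, yp)   # (x o y^{-1})^j_{,i}(y(p)): acts on basis vectors
print(B)              # approximately T: the Jacobian of a linear map is its matrix
print(mat_mul(A, B))  # approximately [[1, 0], [0, 1]], i.e. delta^j_k
```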

Last edited by a moderator: May 6, 2017
3. Jun 6, 2014

WWGD

I wonder if the general definition of the gradient as (vector-space) duals to vector fields (using the Riemannian metric as a non-degenerate bilinear form to make the isomorphism V → V* natural) is done to address this issue, i.e., to make the gradient locally independent (within a chart) of coordinate changes. Does anyone know?
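For what it's worth, here is a small numerical illustration of the metric construction described above, assuming the Euclidean metric on $\mathbb R^2$, which in polar coordinates is $g=\mathrm{diag}(1,r^2)$. Raising the index of the covariant components $(\partial f/\partial r,\partial f/\partial\theta)$ with $g^{-1}$ gives contravariant components, and in this sketch they coincide with the Cartesian gradient arrow transformed contravariantly. The example function $f(x,y)=x$ and the sample point are arbitrary choices.

```python
import math

r, theta = 2.0, 0.7
c, s = math.cos(theta), math.sin(theta)

# f(x, y) = x (hypothetical example). Cartesian gradient components: (1, 0)
fx, fy = 1.0, 0.0

# Covariant components df_i = df/d(r, theta), by the chain rule
df = (c * fx + s * fy, -r * s * fx + r * c * fy)

# Euclidean metric in polar coordinates: g = diag(1, r^2), so g^{-1} = diag(1, 1/r^2)
g_inv = ((1.0, 0.0), (0.0, 1.0 / r**2))
grad = tuple(sum(g_inv[i][j] * df[j] for j in range(2)) for i in range(2))

# The Cartesian arrow (fx, fy) pushed to polar coordinates with the inverse Jacobian
arrow = (c * fx + s * fy, (-s * fx + c * fy) / r)

print(df)     # covariant: (cos theta, -r sin theta)
print(grad)   # contravariant, after raising the index with g
print(arrow)  # same numbers as grad
```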