# How is the gradient covariant?

## Main Question or Discussion Point

Hi. Take basic Cartesian coordinates and suppose we want the gradient of a scalar function of x, y, and z. We can use the simplest basis there is, three orthogonal unit vectors, and work out the gradient of the scalar function in it. Now, without rescaling the coordinate system or altering it in any other way, just double the length of each basis vector. If it is the same scalar function as before, then each component of the gradient is reduced by half.

Isn't that contravariance?

stevendaryl
Staff Emeritus
There are two different meanings of "gradient" whose differences are glossed over when you're dealing with cartesian coordinates, but which need to be kept distinct when you're dealing with general coordinates and coordinate changes.

Most fundamentally,
• Let ##\phi(\vec{r})## be a scalar field---a function that returns a scalar (a real or complex number) for each point in space.
• Let ##\vec{r}(t)## be a path through space as a function of a real-valued parameter ##t## (this doesn't have to be time, it could be any quantity that increases continuously along the path).
• Let ##\overrightarrow{V}(t)## be the corresponding "velocity vector", ##\overrightarrow{V} = \frac{d \vec{r}}{dt}##.
Then we can define the "directional derivative" of ##\phi## along vector ##\overrightarrow{V}## to be the rate of change of ##\phi## along the path ##\vec{r}(t)##:

Directional derivative: ##\nabla_{\overrightarrow{V}}(\phi) \equiv \frac{d}{dt} \phi(\vec{r}(t))##

In terms of the directional derivative, we can define two different mathematical objects that might be called the "gradient" of ##\phi##:

Covector gradient: Define ##(\nabla \phi)## to be an operator that takes a vector ##\overrightarrow{V}## and returns the directional derivative ##\nabla_{\overrightarrow{V}}(\phi)##

Vector gradient: Define ##\overrightarrow{\nabla \phi}## to be a vector such that ##\overrightarrow{\nabla \phi} \cdot \vec{V} = \nabla_{\overrightarrow{V}}(\phi)##

The components of the covector gradient are given by: ##(\nabla \phi)_j = \frac{\partial \phi}{\partial x^j}##. The components of the vector gradient are given by: ##(\overrightarrow{\nabla \phi})^j = \sum_{i} g^{ij} \frac{\partial \phi}{\partial x^i} ##, where ##g^{ij}## is the inverse metric tensor. In Cartesian coordinates, ##g^{ij} = 0## unless ##i=j##, and ##g^{jj} = 1##, so there's not much difference between the covector gradient and the vector gradient. However, under a coordinate change, the components of the covector gradient transform differently than the components of the vector gradient.
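To connect this back to the original question, here is a minimal numerical sketch of what happens when every basis vector is doubled. It is only an illustration: the test field `phi`, the sample point, the vector `V_old`, and the helper `partials` are arbitrary choices that do not come from the thread, and the derivatives are finite-difference approximations. The assumption is that doubling the basis vectors halves the coordinates of a fixed point (##x'^j = x^j/2##) and gives the metric ##g'_{ij} = 4\delta_{ij}##, so ##g'^{ij} = \delta^{ij}/4##.

```python
SCALE = 2.0  # each new basis vector is SCALE times the old unit vector


def phi(x, y, z):
    """Arbitrary smooth test field, written in the original unit-basis coordinates."""
    return x**2 + 3.0 * y + y * z


def phi_new(xp, yp, zp):
    """The same scalar field, written in coordinates measured against the doubled basis."""
    return phi(SCALE * xp, SCALE * yp, SCALE * zp)


def partials(f, point, h=1e-6):
    """Central-difference partial derivatives of f at point (covector components)."""
    result = []
    for i in range(len(point)):
        up = list(point)
        dn = list(point)
        up[i] += h
        dn[i] -= h
        result.append((f(*up) - f(*dn)) / (2.0 * h))
    return result


point_old = [1.0, 2.0, 0.5]
point_new = [c / SCALE for c in point_old]  # same point, new coordinates

# Covector gradient: plain partial derivatives in each coordinate system.
covector_old = partials(phi, point_old)
covector_new = partials(phi_new, point_new)

# Vector gradient: raise the index with the inverse metric.
vector_old = list(covector_old)                       # g^{ij} = delta^{ij}
vector_new = [c / SCALE**2 for c in covector_new]     # g'^{ij} = delta^{ij} / 4

# A fixed direction vector has halved components in the doubled basis.
V_old = [0.3, -1.0, 2.0]
V_new = [c / SCALE for c in V_old]

# Directional derivative: covector components contracted with vector components.
dd_old = sum(c * v for c, v in zip(covector_old, V_old))
dd_new = sum(c * v for c, v in zip(covector_new, V_new))

print("covector components:", covector_old, "->", covector_new)
print("vector components:  ", vector_old, "->", vector_new)
print("directional derivative:", dd_old, "vs", dd_new)
```

Under these assumptions the covector components double while the vector components halve, which is the behaviour described in the question, and the directional derivative comes out the same in both bases.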

plob
Hi, thank you. I'm sure this is probably right,

but I don't really have the background to grasp the last part. My fault, sorry.

Is there perhaps a more intuitive way of understanding it?

stevendaryl
Staff Emeritus

plob
Hi steven, yes

stevendaryl
Staff Emeritus
Suppose you have some weird coordinate system with two coordinates, ##u## and ##v##, and you want to know the distance between point ##A## and point ##B##. If ##u## and ##v## were Cartesian coordinates, the distance would be given by: ##D = \sqrt{\delta u^2 + \delta v^2}##, where ##\delta u## is the change in the ##u## coordinate in going from ##A## to ##B##, and ##\delta v## is the change in the ##v## coordinate. But what about polar coordinates? In terms of ##r## and ##\theta##, the distance between ##A## and ##B## is given approximately (when ##A## and ##B## are very close together) by:

##D = \sqrt{\delta r^2 + r^2 \delta \theta^2}##

In general, for points that are close together, the distance will be given by: ##D^2 = g_{uu} \delta u^2 + g_{uv} \delta u \delta v + g_{vu} \delta v \delta u + g_{vv} \delta v^2##. Those four numbers, ##g_{uu}, g_{uv}, g_{vu}, g_{vv}## are the components of the "metric tensor" in the ##u-v## coordinate system.

In Cartesian coordinates ##x,y##, it's trivial: ##g_{xx} = 1, g_{xy} = 0, g_{yx} = 0, g_{yy} = 1##. But in polar coordinates, it's a little more interesting: ##g_{rr} = 1, g_{r\theta} = 0, g_{\theta r} = 0, g_{\theta \theta} = r^2##.
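A quick numerical check of the polar distance formula, as a sketch only: the sample point and the small displacements are arbitrary values chosen for illustration, and the helper names `polar_to_cartesian` and `metric_distance` are made up here. It compares the metric-based distance ##\sqrt{\delta r^2 + r^2 \delta \theta^2}## with the ordinary Cartesian distance between the same two nearby points.

```python
import math


def polar_to_cartesian(r, theta):
    """Convert polar coordinates to Cartesian (x, y)."""
    return (r * math.cos(theta), r * math.sin(theta))


def metric_distance(r, dr, dtheta):
    """Distance from D^2 = g_rr dr^2 + g_thetatheta dtheta^2, with g_rr = 1, g_thetatheta = r^2."""
    return math.sqrt(dr**2 + r**2 * dtheta**2)


# Point A and a nearby point B (values are illustrative assumptions).
r, theta = 2.0, 0.7
dr, dtheta = 1e-4, 2e-4

ax, ay = polar_to_cartesian(r, theta)
bx, by = polar_to_cartesian(r + dr, theta + dtheta)

exact = math.hypot(bx - ax, by - ay)     # straight-line Cartesian distance
approx = metric_distance(r, dr, dtheta)  # metric-tensor approximation
relative_error = abs(exact - approx) / exact

print("Cartesian distance:", exact)
print("metric distance:   ", approx)
print("relative error:    ", relative_error)
```

The two values should agree closely for small displacements, with the mismatch shrinking as ##A## and ##B## move closer together.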

The metric tensor is how you compute dot-products of two vectors: ##\vec{A} \cdot \vec{B} = (A^u)(B^u) g_{uu} + (A^u)(B^v) g_{uv} + (A^v)(B^u) g_{vu} + (A^v)(B^v) g_{vv}##. For cartesian coordinates, since the components of ##g## are pretty trivial, then it simplifies a lot: ##\vec{A} \cdot \vec{B} = A^x B^x + A^y B^y##.
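As a rough sanity check, the polar version of this dot product can be compared with the plain Cartesian one. This is a sketch under stated assumptions: the components `A` and `B` and the point `(r, theta)` are arbitrary, the function names are invented for illustration, and the components are taken with respect to the coordinate basis vectors ##\partial \vec{r}/\partial r## and ##\partial \vec{r}/\partial \theta## (the second of which has length ##r##).

```python
import math


def polar_dot(A, B, r):
    """Dot product from polar components using g_rr = 1, g_thetatheta = r^2."""
    g_rr, g_tt = 1.0, r**2
    return g_rr * A[0] * B[0] + g_tt * A[1] * B[1]


def to_cartesian(A, r, theta):
    """Cartesian components of a vector given its polar coordinate-basis components."""
    e_r = (math.cos(theta), math.sin(theta))           # d(position)/dr
    e_t = (-r * math.sin(theta), r * math.cos(theta))  # d(position)/dtheta, length r
    return (A[0] * e_r[0] + A[1] * e_t[0],
            A[0] * e_r[1] + A[1] * e_t[1])


# Illustrative values only.
r, theta = 3.0, 1.1
A = (1.5, -0.4)  # (A^r, A^theta)
B = (0.2, 0.9)   # (B^r, B^theta)

dot_polar = polar_dot(A, B, r)

Ax, Ay = to_cartesian(A, r, theta)
Bx, By = to_cartesian(B, r, theta)
dot_cartesian = Ax * Bx + Ay * By

print("dot product via polar metric:", dot_polar)
print("dot product in Cartesian:    ", dot_cartesian)
```

Both routes should give the same number, since the metric components are just the dot products of the coordinate basis vectors with each other.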

Viewed as a 2x2 matrix, the metric tensor ##g## has an inverse. Its components are denoted by raised indices: ##g^{ij}##. You can use the inverse metric tensor to take a "dot" product of two covectors, or to convert a covector into a vector.

So let's take the case of the gradient in polar coordinates. The covector form is: ##\nabla \phi## with components ##(\nabla \phi)_r = \frac{\partial \phi}{\partial r}## and ##(\nabla \phi)_\theta = \frac{\partial \phi}{\partial \theta}##. To convert it into a vector, you use the inverse of the metric tensor. In this case,

##g^{rr} = \frac{1}{g_{rr}} = 1##
##g^{\theta \theta} = \frac{1}{g_{\theta \theta}} = \frac{1}{r^2}##
(the other two components are zero).

So the vector form of the gradient is: ##\overrightarrow{\nabla \phi}## with components

##(\overrightarrow{\nabla \phi})^r = g^{rr} \frac{\partial \phi}{\partial r} = \frac{\partial \phi}{\partial r}##
##(\overrightarrow{\nabla \phi})^\theta = g^{\theta \theta} \frac{\partial \phi}{\partial \theta} = \frac{1}{r^2} \frac{\partial \phi}{\partial \theta} ##

To get the directional derivative, you take the dot-product with a direction vector ##\overrightarrow{V}##:
##\overrightarrow{\nabla \phi} \cdot \overrightarrow{V}##

but computing the dot-product in curvilinear coordinates involves the metric tensor again:

##\overrightarrow{\nabla \phi} \cdot \overrightarrow{V} = g_{rr} (\overrightarrow{\nabla \phi})^r (\overrightarrow{V})^r + g_{\theta \theta} (\overrightarrow{\nabla \phi})^\theta (\overrightarrow{V})^\theta##

The metric tensor just cancels out the use of the inverse metric tensor in forming the vector gradient, so the result is just:

##\overrightarrow{\nabla \phi} \cdot \overrightarrow{V} = \frac{\partial \phi}{\partial r} V^r + \frac{\partial \phi}{\partial \theta} V^\theta##

This shows that the covariant form of the gradient is more natural; the vector form uses the inverse metric tensor to create the vector, and then uses the metric tensor again to get the result. In the final result, the metric tensor components drop out.
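The cancellation can also be checked numerically. The following is a minimal sketch, not a definitive implementation: the field `phi`, the point `(r0, theta0)`, and the direction `V` are arbitrary choices for illustration, the helper names are made up, and all derivatives are central finite differences. It computes the directional derivative three ways: directly from the covector components, through the vector gradient and the metric dot product, and as the rate of change of ##\phi## along a short path.

```python
import math


def phi(r, theta):
    """Arbitrary test field in polar coordinates."""
    return r**2 * math.sin(theta)


def covector_gradient(r, theta, h=1e-6):
    """(d phi/dr, d phi/dtheta) by central differences."""
    d_r = (phi(r + h, theta) - phi(r - h, theta)) / (2.0 * h)
    d_t = (phi(r, theta + h) - phi(r, theta - h)) / (2.0 * h)
    return (d_r, d_t)


def vector_gradient(r, theta):
    """Raise the index with the inverse metric: g^{rr} = 1, g^{theta theta} = 1/r^2."""
    d_r, d_t = covector_gradient(r, theta)
    return (d_r, d_t / r**2)


def metric_dot(A, B, r):
    """Dot product with the polar metric: g_rr = 1, g_thetatheta = r^2."""
    return A[0] * B[0] + r**2 * A[1] * B[1]


r0, theta0 = 2.0, 0.6
V = (0.5, -0.3)  # (V^r, V^theta), illustrative values

# 1. Covector gradient contracted directly with V (no metric involved).
cov = covector_gradient(r0, theta0)
via_covector = cov[0] * V[0] + cov[1] * V[1]

# 2. Vector gradient dotted with V (inverse metric, then metric).
via_vector = metric_dot(vector_gradient(r0, theta0), V, r0)

# 3. Rate of change of phi along the path r = r0 + V^r t, theta = theta0 + V^theta t.
eps = 1e-6
along_path = (phi(r0 + eps * V[0], theta0 + eps * V[1])
              - phi(r0 - eps * V[0], theta0 - eps * V[1])) / (2.0 * eps)

print(via_covector, via_vector, along_path)
```

All three numbers should agree up to finite-difference error, and the factors of ##r^2## and ##1/r^2## in the second route cancel exactly as described above.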

gerald V
Please allow me a follow-up to this thread, which I found after struggling with the matter.

What you say, stevendaryl, seems quite clear to me. I would just be grateful for confirmation that, consequently, the total derivative of any scalar ##S##, expressed through the differentials of the coordinates as ##\mbox{d}S = \frac{\partial S}{\partial x^\mu} \mbox{d}x^\mu##, looks trivial in any coordinate system, in the sense that the metric is not visible. In particular, in polar coordinates ##\mbox{d}S = \frac{\partial S}{\partial r} \mbox{d}r + \frac{\partial S}{\partial \theta} \mbox{d}\theta##.

I got quite confused because most of the literature says that the gradient in polar coordinates is ##\mbox{col}(\frac{\partial S}{\partial r} , \; \frac{1}{r}\frac{\partial S}{\partial \theta} )##. But I think the reason is that they refer to a normalized basis (unit vectors in all directions) in place of the natural basis given by the coordinates.

Infrared
Gold Member
Forgive me for not reading most of the thread, but I'll try to answer @gerald V's last post.

Yes, the derivative ##dS=\frac{\partial S}{\partial x^i}dx^i## is independent of a metric. You do not need a metric to define the derivative of a scalar function.

I think you are right about why you found that formula in the literature. If you just follow the usual formula for the gradient vector in coordinates, you would get ##\nabla S=\frac{\partial S}{\partial r}\frac{\partial}{\partial r}+\frac{1}{r^2}\frac{\partial S}{\partial \theta}\frac{\partial}{\partial \theta},## so if you expressed ##\nabla S## in the basis ##\{\frac{\partial}{\partial r},\frac{\partial}{\partial \theta}\},## you would get the vector ##\begin{pmatrix}\frac{\partial S}{\partial r}\\ \frac{1}{r^2}\frac{\partial S}{\partial\theta}\end{pmatrix}.##

But people often like to use a basis of unit vectors, and from ##g_{rr}=1, g_{\theta\theta}=r^2##, we see that while ##\frac{\partial}{\partial r}## has unit length, the vector ##\frac{\partial}{\partial\theta}## has length ##r##. So the second component should be multiplied by ##r## if you want the components in the rescaled basis of unit vectors ##\{\frac{\partial}{\partial r},\frac{1}{r}\frac{\partial}{\partial \theta}\}##, which gives ##\frac{1}{r}\frac{\partial S}{\partial\theta}##, the form you found in the literature. Sometimes, especially in physics contexts, you might see these unit vectors called ##\hat{r}## and ##\hat{\theta}##, respectively.
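Here is a small numerical sketch of that rescaling. The scalar `S_cart`, the sample point, and the helper `central` are hypothetical choices made for illustration, and the derivatives are finite-difference approximations. It builds the components in the coordinate basis, rescales the second one by ##r##, and compares the result with the ordinary Cartesian gradient projected onto ##\hat{r}## and ##\hat{\theta}##.

```python
import math


def S_cart(x, y):
    """Arbitrary test scalar in Cartesian coordinates."""
    return x * x * y + 3.0 * y


def S_polar(r, theta):
    """The same scalar expressed in polar coordinates."""
    return S_cart(r * math.cos(theta), r * math.sin(theta))


def central(f, a, b, h=1e-6):
    """Central-difference partial derivatives of f(a, b) with respect to a and b."""
    d_a = (f(a + h, b) - f(a - h, b)) / (2.0 * h)
    d_b = (f(a, b + h) - f(a, b - h)) / (2.0 * h)
    return (d_a, d_b)


r, theta = 1.5, 0.8
dS_dr, dS_dtheta = central(S_polar, r, theta)

# Components of the gradient vector in the coordinate basis {d/dr, d/dtheta}.
coord_basis = (dS_dr, dS_dtheta / r**2)

# d/dtheta has length r, so in the unit basis the second component picks up a factor r.
unit_basis = (coord_basis[0], r * coord_basis[1])

# Independent check: Cartesian gradient projected onto r_hat and theta_hat.
gx, gy = central(S_cart, r * math.cos(theta), r * math.sin(theta))
r_hat = (math.cos(theta), math.sin(theta))
theta_hat = (-math.sin(theta), math.cos(theta))
projected = (gx * r_hat[0] + gy * r_hat[1],
             gx * theta_hat[0] + gy * theta_hat[1])

print("unit-basis components:  ", unit_basis)
print("projected Cartesian grad:", projected)
```

The second unit-basis component equals ##\frac{1}{r}\frac{\partial S}{\partial\theta}##, and both components should match the projections of the Cartesian gradient up to finite-difference error.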

Edit: I just noticed that the rest of this thread is from 2018. In the future, consider starting a new thread instead.
