# Dual vector clarifications

## Main Question or Discussion Point

Hello,

I'm reading Sean Carroll's Spacetime and Geometry. When discussing dual vectors, he presents the gradient as the "simplest example of a dual vector" in spacetime. This confuses me because I learned the gradient to be an operator which takes a scalar as input and outputs a vector. The way Carroll presented dual vectors, it seemed that they should do the opposite: take a vector as input and output a scalar. Can someone clear this up for me?

Nugatory
Mentor
There is more than one way to think of the gradient - your interpretation that it turns a scalar field into a vector is fine, but not the only way.

You can also interpret the gradient as a machine that you insert a vector into, and out pops a number telling how much the value of the scalar field changes along that vector. Think about a traditional contour map of the earth's surface: the contour lines are curves of equal height above sea level and your gradient is a vector perpendicular to these curves, but the same information is contained in the dual vector that tells you how much the elevation changes if you move an infinitesimal distance in a given direction. This latter form turns out to be much more convenient and mathematically tractable in GR.

There is more than one way to think of the gradient - your interpretation that it turns a scalar field into a vector is fine, but not the only way.
Fair enough.

You can also interpret the gradient as a machine that you insert a vector into, and out pops a number telling how much the value of the scalar field changes along that vector.
Can you be a bit more precise with this? My confusion persists because, in this case, you are still taking the gradient of a scalar field and getting a vector field as an output -- or is that statement wrong?

Perhaps Carroll is using a different definition of the gradient. He defines the gradient $d \phi$ (not $\nabla \phi$) by $$d\phi = \frac {\partial \phi}{\partial x^\mu} \hat{\theta}^{ (\mu)},$$ where $\hat{\theta}^{(\mu)}$ is one of the dual basis vectors. Summing over this doesn't look like the definition of the gradient I've seen before -- is this just a redefinition?

Nugatory
Mentor
Can you be a bit more precise with this? My confusion persists because, in this case, you are still taking the gradient of a scalar field and getting a vector field as an output -- or is that statement wrong?
We start with a scalar field, and build the gradient dual vector object from that using Carroll's definition. That's a machine that we feed an input vector into, and a scalar comes out, namely the scalar change in the value of the scalar field in the direction of that input vector. If $\phi_{\mu}$ are the components of the gradient dual vector, then $\phi_{\mu}A^{\mu}$ is the scalar that comes out when it acts on a vector $A$.

If you can get hold of a copy of MTW, look at the discussion of one-forms in the first few sections - the gradient as a one-form is one of their examples.
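To make the "machine" picture concrete, here is a minimal sketch of the contraction $\phi_{\mu}A^{\mu}$. The function $\phi = xy^2$, the sample point, and the vector $A$ are illustrative choices, not taken from Carroll, and the helper names are hypothetical.

```python
# Sketch: the gradient of phi = x * y**2 used as a machine that eats a vector.
# phi, the sample point and the vector A are illustrative assumptions.

def phi(x, y):
    return x * y**2

def d_phi(x, y):
    """Components (phi_x, phi_y) of the gradient dual vector at (x, y)."""
    return (y**2, 2 * x * y)

def apply_one_form(components, vector):
    """Contract dual-vector components with vector components: phi_mu A^mu."""
    return sum(w * a for w, a in zip(components, vector))

point = (1.0, 2.0)
A = (3.0, -1.0)

# The scalar that comes out when the dual vector acts on A: 4*3 + 4*(-1) = 8.
rate = apply_one_form(d_phi(*point), A)

# Cross-check: finite-difference change of phi along A, per unit step.
eps = 1e-6
numeric = (phi(point[0] + eps * A[0], point[1] + eps * A[1]) - phi(*point)) / eps
```

Here `rate` and `numeric` should agree up to the finite-difference error, which is the sense in which the dual vector "tells how much the scalar field changes along that vector".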

MTW
What is that?

Maybe I'm confused here -- is the gradient operator the one-form, or is $\nabla \phi$ the one-form? It seems like you are saying that the latter is the one-form, but that doesn't make sense to me because it is represented as a linear combination of the basis vectors, not the dual basis vectors (which is what Carroll's definition appears to use).

For example, if I take the gradient of $\phi = xy^2,$ I get $\nabla \phi = y^2 \hat{x} + 2xy \hat{y},$ which is a contravariant vector, not a one-form -- right?

Fredrik
Staff Emeritus
Gold Member
The gradient of a function $f:\mathbb R^n\to\mathbb R$ is the function $\nabla f:\mathbb R^n\to\mathbb R^n$ defined by
$$\nabla f(x)=(f_{,1}(x),\dots,f_{,n}(x))$$ for all $x$, where $f_{,i}$ denotes the partial derivative of $f$ with respect to the $i$th variable. The corresponding n-tuple in differential geometry is
$$\left(\frac{\partial}{\partial x^1}\bigg|_p f,\dots,\frac{\partial}{\partial x^n}\bigg|_p f\right).$$ This n-tuple "transforms covariantly" under a change of coordinates $x\to y$, just like the n-tuple of components of a cotangent vector. This is perhaps what Carroll had in mind. It's not a cotangent vector. It just transforms like a cotangent vector's n-tuple of components. That makes them both "covariant vectors" (I hate that term so much that I can barely type it) to someone who uses the old and terrible definitions.

I think I've got it: it's a dual vector because its coordinates transform like a dual vector's. Sorry that I didn't make that (fairly obvious) connection before.

What's wrong with the ideas of covariant and contravariant vectors?

Fredrik
Staff Emeritus
Gold Member
What's wrong with the ideas of covariant and contravariant vectors?
It's not wrong, it's just very ugly compared to the modern approach. The biggest problem is that the people who use the old fashioned definitions don't seem to understand them well enough to explain them. They never even mention that in order to discuss the "transformation" of an n-tuple, you have to associate an n-tuple with each coordinate system. (The "transformation" is just the relationship between the n-tuples associated with two different coordinate systems).

They often make claims about specific n-tuples that should really be made about either the string of text that represents the n-tuple, or the function that associates n-tuples with coordinate systems.

lavinia
Gold Member
Perhaps Carroll is using a different definition of the gradient. He defines the gradient $d \phi$ (not $\nabla \phi$) by $$d\phi = \frac {\partial \phi}{\partial x^\mu} \hat{\theta}^{ (\mu)},$$ where $\hat{\theta}^{(\mu)}$ is one of the dual basis vectors. Summing over this doesn't look like the definition of the gradient I've seen before -- is this just a redefinition?
$d\phi$ is the differential of the scalar field $\phi$ (in mathematics a scalar field is simply called a function).
The differential of a function is a linear map from the tangent space at a point $x$ into the real numbers (the field of scalars). Its value on a vector $v$ is the derivative of $\phi(c(t))$ at the parameter value where $c$ passes through $x$, where $c(t)$ is any curve whose derivative at $x$ is $v$.
Since it is a linear map, it lies in the dual space to the vector space of tangent vectors at $x$. But it is not a vector, so it is not the gradient of the function. Rather, it is the differential of the function.

When one has a metric, there exists a vector called the gradient of $\phi$ whose inner product with $v$ is the same as $d\phi(v)$. So taking inner products with the gradient is again an element of the dual space. With a different metric, there will be a different gradient vector.
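The metric dependence can be sketched numerically. In the example below, the components of $d\phi$, the tangent vector, and both metrics are made-up numbers for illustration, and the helper names are hypothetical.

```python
# Sketch: one differential d_phi, two metrics, two different gradient vectors.
# All numeric values are illustrative assumptions.

def inverse_2x2(g):
    (a, b), (c, d) = g
    det = a * d - b * c
    return ((d / det, -b / det), (-c / det, a / det))

def raise_index(g, covector):
    """Gradient vector with components grad^i = g^{ij} (d phi)_j."""
    g_inv = inverse_2x2(g)
    return tuple(sum(g_inv[i][j] * covector[j] for j in range(2)) for i in range(2))

def inner(g, u, v):
    """Inner product <u, v> = g_ij u^i v^j."""
    return sum(g[i][j] * u[i] * v[j] for i in range(2) for j in range(2))

d_phi = (4.0, 4.0)   # components of d phi at some point
v = (3.0, -1.0)      # an arbitrary tangent vector at that point

euclidean = ((1.0, 0.0), (0.0, 1.0))
stretched = ((2.0, 0.0), (0.0, 0.5))   # a second, hypothetical metric

grad_euclidean = raise_index(euclidean, d_phi)   # (4.0, 4.0)
grad_stretched = raise_index(stretched, d_phi)   # (2.0, 8.0)

# d_phi(v) does not involve any metric at all.
d_phi_of_v = sum(w * a for w, a in zip(d_phi, v))
```

The two gradient vectors differ, yet `inner(euclidean, grad_euclidean, v)` and `inner(stretched, grad_stretched, v)` both reproduce `d_phi_of_v`.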

Fredrik
Staff Emeritus
Gold Member
Perhaps Carroll is using a different definition of the gradient. He defines the gradient $d \phi$ (not $\nabla \phi$) by $$d\phi = \frac {\partial \phi}{\partial x^\mu} \hat{\theta}^{ (\mu)},$$ where $\hat{\theta}^{(\mu)}$ is one of the dual basis vectors. Summing over this doesn't look like the definition of the gradient I've seen before -- is this just a redefinition?
This is how I would define $d\phi$: For each $p\in M$, define $(d\phi)_p$ by $(d\phi)_p(v)=v(\phi)$ for all $v\in T_pM$. The map $p\mapsto (d\phi)_p$ with domain $M$ is denoted by $d\phi$.

This definition makes $(d\phi)_p$ a cotangent vector at $p$, and $d\phi$ a cotangent vector field. Cotangent vector fields are also called 1-forms.

Using this definition, we can see that the ordered basis dual to $\big(\frac{\partial}{\partial x^i}\big|_p\big)_{i=1}^n$ is $\big((\mathrm d x^i)_p\big)_{i=1}^n$.
$$(dx^i)_p\left(\frac{\partial}{\partial x^j}\bigg|_p\right) = \frac{\partial}{\partial x^j}\bigg|_p x^i =(x^i\circ x^{-1})_{,j}(x(p)) =(x\circ x^{-1})^i{}_{,j}(x(p)) = I^i{}_{,j}(x(p)) =\delta^i_j.$$ This implies (read post 11 (and maybe 23 and 24) in this thread if you don't see why) that
$$(d\phi)_p = \left((d\phi)_p\left(\frac{\partial}{\partial x^i}\bigg|_p\right)\right) (dx^i)_p =\left(\frac{\partial}{\partial x^i}\bigg|_p \phi\right) (dx^i)_p.$$ So the n-tuple I mentioned in post #7 is the n-tuple of components of a cotangent vector. It's the n-tuple of components of $(df)_p$.

Note that what I said here and in post #7 explains why $df$ is called the gradient of $f$.
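As a worked instance of the component formula above, take the function $\phi = xy^2$ that came up earlier in the thread, in Cartesian coordinates $(x, y)$ on $\mathbb R^2$. This is only a sketch under that choice of chart:

$$d\phi = \frac{\partial \phi}{\partial x}\,dx + \frac{\partial \phi}{\partial y}\,dy = y^2\,dx + 2xy\,dy,$$
$$d\phi(v) = y^2\,v^x + 2xy\,v^y \quad\text{for}\quad v = v^x\frac{\partial}{\partial x} + v^y\frac{\partial}{\partial y}.$$

The components $(y^2, 2xy)$ coincide with those of $\nabla\phi$ only because the chart is Cartesian and the metric is Euclidean. The basis elements they multiply are $dx$ and $dy$, not $\hat{x}$ and $\hat{y}$.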

stevendaryl
Staff Emeritus
To me, understanding why vectors are different from covectors is helped by considering a case where it is impossible to convert between the two.

Instead of the usual space of locations in the universe, let's consider an abstract space from thermodynamics. Suppose you have a balloon filled with air. This balloon has a certain volume $V$, but the volume isn't constant. It depends on the temperature and air pressure of the room that it is placed into. So we can describe $V$ as a scalar function $V(P,T)$ in a two-dimensional space with coordinates $P,T$.

Now, in this 2-D space, we can easily define two different types of vector-like objects.
1. We can describe the dependency of $V$ on "location" by the one-form $\vec{\nabla} V$ with components $V_P = \frac{\partial V}{\partial P}$ and $V_T = \frac{\partial V}{\partial T}$.
2. We can describe the change in "location" of the balloon as a function of time by the tangent vector $\vec{U}$ with components $U^P = \frac{dP}{dt}$, $U^T = \frac{dT}{dt}$
To see that $\vec{\nabla} V$ and $\vec{U}$ are very different types of objects, ask yourself what it would mean for them to be in the "same direction". Obviously, two vectors are in the same direction if they are linear multiples of each other. So in terms of components, that means that there is some real number $\alpha$ such that
• $V_P = \alpha U^P$
• $V_T = \alpha U^T$
And that is completely impossible. To see that, just look at the units. If we measure volume in liters, pressure in atmospheres, time in seconds and temperature in degrees, then
1. The units of $V_P$ are $\frac{liter}{atmosphere}$
2. The units of $U^P$ are $\frac{atmosphere}{second}$
3. The units of $V_T$ are $\frac{liter}{degree}$
4. The units of $U^T$ are $\frac{degree}{second}$

From 1&2, we would conclude that $\alpha$ has units $\frac{second\ \cdot\ liter}{atmosphere^2}$.
From 3&4, we would conclude that $\alpha$ has units $\frac{second\ \cdot\ liter}{degree^2}$.

There is no single scaling factor $\alpha$ that could possibly work to make $\vec{\nabla V} = \alpha\ \vec{U}$. To make sense of these two vector-like objects being in "the same direction", you would need a way to convert degrees into atmospheres.

Here's another impossibility for this 2-D space: computing the "length" of a vector (of either type). Naively, if you have a two-component vector $\vec{U}$ then you could define the length to be $|\vec{U}| = \sqrt{(U^P)^2 + (U^T)^2}$. But that doesn't make any sense, because $U^P$ and $U^T$ have different units.

But there is one operation that you can do with vectors that DOES make sense: You can multiply a vector by a covector:

$\vec{\nabla V}\ \cdot \ \vec{U} = V_P U^P + V_T U^T$

This quantity has units $\frac{liter}{second}$, and has a clear interpretation: It is equal to $\frac{dV}{dt}$, the rate of change of the volume $V$ as both the pressure and temperature change with time.
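A quick numerical sketch of this pairing is given below. It assumes an ideal gas, $V = nRT/P$, and a made-up path $P(t), T(t)$; neither assumption comes from the discussion above, and temperature is taken in kelvin.

```python
# Sketch: pairing the covector (V_P, V_T) with the tangent vector (U^P, U^T).
# The ideal-gas law and the path below are assumptions made for illustration.

N_R = 0.0821  # n*R for one mole, in liter*atmosphere/kelvin

def volume(P, T):
    return N_R * T / P

def d_volume(P, T):
    """Components (V_P, V_T) = (dV/dP, dV/dT) of the one-form."""
    return (-N_R * T / P**2, N_R / P)

def path(t):
    """Hypothetical room conditions over time: (P(t), T(t))."""
    return (1.0 + 0.01 * t, 300.0 + 0.5 * t)

# Tangent vector of the path: (dP/dt, dT/dt) in atmosphere/second, kelvin/second.
U = (0.01, 0.5)

t0 = 10.0
V_P, V_T = d_volume(*path(t0))
dV_dt = V_P * U[0] + V_T * U[1]   # liters per second

# Cross-check with a central finite difference of V along the path.
h = 1e-6
numeric = (volume(*path(t0 + h)) - volume(*path(t0 - h))) / (2 * h)
```

Each term of `dV_dt` carries liters per second, so the sum is meaningful even though no metric has been chosen.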

This is getting pretty long-winded, already, but I thought it would wrap up the discussion if I showed how the "impossibilities" are resolved by a metric tensor.

Although we can't convert $\vec{\nabla V}$ into $\vec{U}$ with a real number $\alpha$, we could relate them by a tensor $g$. Suppose we had a tensor $g$ with 4 components $g_{PP}, g_{PT}, g_{TP}, g_{TT}$. Then we could use that tensor to convert a vector $\vec{U}$ into a covector $\vec{\tilde{U}} = g(\vec{U})$ by letting $\tilde{U}_i = \sum_j g_{ij} U^j$ (where $i$ and $j$ run through the set $P,T$). Then using the tensor $g$, we could say
• $\vec{\nabla V}$ is in the same direction as $\vec{U}$ if $\vec{\nabla V} = \alpha g(\vec{U})$.
We could also use $g$ to define a "length" of a vector $\vec{U}$:
• $|\vec{U}| = \sqrt{g(\vec{U}) \cdot \vec{U}}$
where $\cdot$ is the operator that multiplies a covector by a vector.

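Finally, $g$ would provide a way to convert degrees into atmospheres: the conversion factor $F = \sqrt{\frac{g_{TT}}{g_{PP}}}$ has units $\frac{atmosphere}{degree}$, because $g_{PP}(U^P)^2$ and $g_{TT}(U^T)^2$ must carry the same units for the length formula to make sense.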
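The index lowering and the length formula can be sketched as follows. The metric components are invented for illustration only (nothing in thermodynamics singles them out), and the helper names are hypothetical.

```python
# Sketch: a hypothetical metric g on the (P, T) plane.
# The component values are made up purely to illustrate the formulas.
import math

AXES = ("P", "T")
g = {("P", "P"): 9.0, ("P", "T"): 0.0,
     ("T", "P"): 0.0, ("T", "T"): 4.0}

def lower(g, U):
    """Convert a vector to a covector: U_i = sum_j g_ij U^j."""
    return {i: sum(g[(i, j)] * U[j] for j in AXES) for i in AXES}

def pair(covector, vector):
    """Multiply a covector by a vector (no metric needed)."""
    return sum(covector[i] * vector[i] for i in AXES)

def length(g, U):
    """|U| = sqrt(g(U) . U)."""
    return math.sqrt(pair(lower(g, U), U))

def same_direction(g, W, U):
    """True if the covector W is a positive multiple of g(U)."""
    L = lower(g, U)
    parallel = math.isclose(W["P"] * L["T"], W["T"] * L["P"])
    return parallel and pair(W, U) > 0

U = {"P": 1.0, "T": 2.0}
U_lowered = lower(g, U)        # {"P": 9.0, "T": 8.0}
W = {"P": 18.0, "T": 16.0}     # equals 2 * g(U)
```

With these numbers `length(g, U)` evaluates to `5.0`, and `same_direction(g, W, U)` holds because `W` is twice the lowered vector.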

lavinia
Gold Member
In general one can use a metric to switch between dual vectors and vectors. The gradient is an example of switching from a dual vector field to a vector field.

One can also use the metric to switch from a vector to a dual vector by taking inner products. (The dual of a vector field might not be the differential of a scalar field.)

Often calculus is done in Euclidean space with the standard metric that makes the coordinate axes perpendicular to each other. With this metric the gradient is just the vector of partial derivatives of the function with respect to the coordinate directions. So books often write

$$\frac{d}{dt} f(c(t)) = \nabla f \cdot c'(t)$$

The dot product here is just the Euclidean inner product.

But if there were a different inner product ∇f would be a different vector.
Then the relation would be

$$\frac{d}{dt} f(c(t)) = \langle \nabla f, c'(t) \rangle,$$ where $\langle \cdot , \cdot \rangle$ is the inner product.

A finite dimensional vector space and the space of its dual vectors are linearly isomorphic but there is no "canonical" i.e. natural or obvious isomorphism between them. A metric defines one possible isomorphism. A different metric defines another.

Seemingly, another way to define an isomorphism is to choose a basis for the vector space and then associate to it the basis of dual vectors. But this is equivalent to choosing a metric where the basis vectors are orthonormal (perpendicular and of length 1).

- The gradient is not coordinate dependent; it is a tensor field. But it does depend on the metric.

Thanks very much for this. Very helpful!