# Covariance and Contravariance

Hey, I posted earlier about the tensor covariant derivative, and the help was great, that makes sense to me now.

However, I am getting really, really stuck on the concept of covariant vectors vs. contravariant vectors. I've looked through as many resources as I can - Wikipedia, MathWorld, the NASA tensor pdf (which is otherwise great), Schaum's tensor outline - and I'm getting nowhere. My next step is MTW, though I am sorta intimidated by it so I haven't looked yet. I just can't understand, maybe I'm stupid.

Everyone seems to be concerned with how the components of the vector transform under a change of coordinates, but to me this feels like sophistry. The vector and manifold exist independent of coordinates - a change of coordinates is just redrawing lines over the manifold, nothing actually changes. Similarly, the change of coordinates does absolutely nothing to the vector itself, it just affects how it is written.

So then how are covariant and contravariant vectors actually different? It seems to me like they are just vectors; forget any of this contravariant/covariant business.

Now another point a lot of these books bring up is how you can convert a covariant vector to a contravariant vector and vice versa by taking the inner product with the metric tensor. However, this seems to me like a hackish abuse of the inner product. Instead of the metric eating 2 vectors like a good inner product, it eats 1 and then chills out.

Anyways, I would appreciate any help with this!

CompuChip
Homework Helper
Technically speaking, contravariant vectors are not vectors. Suppose we have some vector space V. Then a vector is an element v of V. In quantum mechanics we would write |v> and in the tensor formalism we can write $v^\mu$ to indicate that we can write it out in components, in some basis. Of course, as you say, the choice of coordinates is arbitrary, and for any coordinate system the vector is really the same thing. But the components (the numbers we write down to specify the vector, relative to the coordinate system) do change. So if we have some matrix $\Lambda$ which gives the change of coordinates (as you are used to in linear algebra), whose components relative to these bases we write as $\Lambda_\nu{}^\lambda$, it turns out that the components of the vector are related by a special formula: $v^\mu$ in the new basis is given by $\sum_\nu \Lambda_\nu{}^\mu v^\nu$.

A covariant vector is, however, not an element of the vector space V, but an element of the dual space V*. In other words, it is a linear form on V: you stick in a vector and get out a number, and if you stick in a linear combination of vectors you can calculate the number by looking at the separate parts of that linear combination. Now if we are in a special vector space, in which there is an inner product, it turns out that there exists for every vector v in V, an element v* of V* with the property that applying v* to any w in V gives the inner product between v and w. In quantum mechanics, we would write <v| instead of v* and in the tensor formalism we write $v_\mu$ for the components relative to some basis.

Note that, though the same letter is used for both, they are actually different things which live in different spaces. Again, the $v_\mu$ are just numbers which depend on the coordinate system that you (arbitrarily) choose. Of course, we can just as well choose some other coordinate system by a change of basis $\Lambda_\mu{}^\nu$. But because $v_\mu w^\mu$ (v*(w), in the old notation) is a number which does not have anything at all to do with the coordinates, the way in which it must transform is well-defined. In fact, an easy calculation shows that it must transform with the inverse of the change-of-coordinate matrix, which we - somewhat confusingly but very conveniently in practice - denote by $\Lambda^\mu{}_\nu$.
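The coordinate-independence of $v_\mu w^\mu$ is easy to check numerically. A minimal sketch (the matrix $\Lambda$ and the components are made up for illustration; the transpose is just the matrix-algebra shadow of the index placement):

```python
import numpy as np

# A made-up invertible change-of-basis matrix Lambda.
Lam = np.array([[2.0, 1.0],
                [1.0, 1.0]])
Lam_inv = np.linalg.inv(Lam)

v = np.array([3.0, -1.0])  # contravariant components v^mu
f = np.array([0.5, 2.0])   # covariant components f_mu

# Contravariant components transform with Lambda; covariant components
# transform with the (transposed) inverse, so the pairing survives.
v_bar = Lam @ v
f_bar = Lam_inv.T @ f

# f_mu v^mu is the same number in every coordinate system.
print(f @ v)          # 1.5 - 2.0 = -0.5
print(f_bar @ v_bar)  # -0.5 again
```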

So you have really two different objects in two different spaces, and an arbitrary set of coordinates. But indeed, as you say, certain things are unchanged when we choose another arbitrary set of coordinates. For example, the actual vector must stay the same; therefore, changing coordinates means that the components of the vector w.r.t. the coordinates must change as well. But also, these things from different spaces "interact" with each other (you can apply one to the other and get a number, which is independent of the coordinates); therefore the way that we can write down the components of both when going to another coordinate system is related.

Finally, since the two spaces V and V* are isomorphic, we can go from elements v of the former to unique elements v* of the latter and vice-versa. In quantum mechanics, this amounts to going from bras to kets (& v.v.) and in the tensor formalism this amounts to going from contravariant to covariant "vectors" (& v.v.). It turns out that the way to do it is by the metric (which has to do with the inner product <.|.>, and that has to satisfy certain rules such as <v|w> = <w|v>*, or, equivalently, $v_\mu w^\mu = w^\mu v_\mu = v^\mu w_\mu$) and this can be shown (e.g. in a linear algebra setting, or with bra-ket notation, or in tensor notation, whichever you prefer). If you want you can view it this way: the transformation is fixed and well-defined; the index notation just turns out to have a very nice property, where this transformation just amounts to raising and lowering indices, or - equivalently - writing the metric in between and summing over all repeated indices. You can view this purely as a formal trick however, with a very convenient notation; the underlying principles can all be derived without this notation from linear algebra and functional analysis.

I hope that makes things a bit clearer.

Fredrik
Staff Emeritus
Gold Member
Technically speaking, contravariant vectors are not vectors.
Very good post. Just a minor correction. They are vectors, but not tangent vectors. (It's obvious from the rest of your post that you are aware of this. I'm just nitpicking a bit, because I think I would find that first sentence a bit confusing if I didn't know this stuff already).

Hello maze.

You have already received excellent responses. But perhaps I can help a little, because I think you will be having the same problems as me when I first began to look at the subject.

Bearing in mind that I am also still learning about tensors and so am no expert, I will attempt an answer to your confusion about the inner product. This is a bit rambling and not at all rigorous, but I think it gives the general idea; corrections from those better versed are of course welcome to me as well.

The inner product is an operation between two vectors. These vectors live in the same space. As a covector lives in a different space it needs a metric to express it in the same space as the vector. An inner product can then be made between the two vectors.
The relationship between vectors, covectors and the metric is that applying the metric to a vector gives the corresponding covector. To do this calculation in matrix form you write the vector as a column and the resulting covector as a row.

In Euclidean space the metric is a diagonal matrix with unit entries, so the covector is in a sense "equal" to the vector and the calculation can be done without the metric - although strictly we should think of the covector as being produced by the metric acting on the vector, so that we can work out the inner product. When we first learn about inner products we learn using Euclidean space, and at that level the distinction is unimportant.
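The Euclidean special case is quick to see in a sketch (numpy; the components are made up): with the identity metric, "lowering" does nothing to the components, which is why introductory treatments can ignore the distinction entirely.

```python
import numpy as np

# In Euclidean coordinates the metric matrix is the identity.
G = np.eye(3)

v = np.array([1.0, 2.0, 3.0])
w = np.array([4.0, 0.0, -1.0])

# Lowering an index with the identity metric leaves the components unchanged,
v_lower = G @ v
assert np.array_equal(v_lower, v)

# which is why the familiar dot product never mentions the metric at all.
assert v @ G @ w == v @ w
```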

I too had a lot of trouble grappling with the distinction between covariant and contravariant vectors. I didn't understand it fully until I found an explanation in terms of the linear-algebraic aspects of tensor analysis, exactly like CompuChip's post.

In my opinion, any introduction to tensors that doesn't talk about dual spaces and linear functionals is bound to fail to adequately explain the distinction between tangent and cotangent vectors. All these explanations that attempt to simplify things and leave out such notions in the end only caused me a whole lot of pain that could have been avoided if they had just presented the subject for what it is in the first place.

Hurkyl
Staff Emeritus
Gold Member
I advise completely avoiding the term "covariant" and "contravariant" when possible -- there are actually two conventions regarding those terms, and they are exact opposites!

Spivak's Differential Geometry is a good reference for this material.

When a particular vector space is most interesting (e.g. the tangent space to a point on your manifold), it is customary to reserve the word "vector" specifically for elements of that vector space, and use other names for elements of related vector spaces. For example, elements of the dual space (e.g. the cotangent space) are called "covectors", and arbitrary elements of the tensor algebra are called "tensors".

Now another point a lot of these books bring up is how you can convert a covariant vector to a contravariant vector and vice versa by taking the inner product with the metric tensor. However, this seems to me like a hackish abuse of the inner product. Instead of the metric eating 2 vectors like a good inner product, it eats 1 and then chills out.
This is something you'll just have to get used to -- the ability to do this is, in one deep sense, the very reason why things like sets and vector spaces are useful notions.

You, incidentally, use this property very often -- it's exactly what you do when you define a function via something like
f(x) := 3x.
Normally, multiplication is a function of two variables, the multiplier and multiplicand. But if you choose a specific value for the multiplier, you can plug it in, yielding a function of one variable.

In general, if you have any function $f:A \times B \to C$ and you choose a specific element a of A, you can produce a new function $g : B \to C$ defined by $g(b) = f(a, b)$. By letting $C^B$ denote the set of all functions from B to C, this process can be described as a function $h: A \to C^B$; h is defined so that h(a) is the function given by $h(a)(b) = f(a, b)$.

For vector spaces, you have the same property relating linear functions $A \otimes B \to C$ to linear functions $A \to L(B, C)$, where $L(B, C)$ is the vector space of all linear transformations from B to C. (Note that the dual space $V^*$ is, by definition, given by $V^* = L(V, \mathbb{R})$, for real vector spaces)
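The currying construction above can be sketched in a few lines of Python (the names f, g, h mirror the paragraph; plain multiplication stands in for a generic two-argument function):

```python
from typing import Callable

# f : A x B -> C; here plain multiplication of real numbers.
def f(a: float, b: float) -> float:
    return a * b

# h : A -> C^B, the "curried" version: h(a) is itself a function B -> C
# with h(a)(b) = f(a, b).
def h(a: float) -> Callable[[float], float]:
    def g(b: float) -> float:
        return f(a, b)
    return g

triple = h(3.0)     # the one-variable function "multiply by 3", i.e. f(x) := 3x
print(triple(5.0))  # 15.0
```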

Last edited:
Ok, first of all thank you to everyone who has posted in this thread. The explanations are really helping, especially CompuChip's, but everyone else's as well. I'm going to try to write out how I understand things now, both for my own benefit, and so that you can correct me if I make mistakes.

If you have a surface M, the space of all tangent vectors to M at a point p forms a vector space, $V_p$. If the surface is expressed in a particular coordinate system, $x_1,x_2,...$, then the tangent vectors to the coordinate curves at p form one basis for $V_p$. However, if you changed the coordinate system of M to $\bar{x}_1,\bar{x}_2,...$, the tangent vectors to the new coordinate curves would form a different basis of $V_p$. You can convert back and forth between these bases by using the chain rule, and in general the transformation of the vector components $v^i$ of v looks like a matrix multiplication, $\bar{v}^i = \Lambda^{i}_{k} v^{k}$.

The vector space $V_p$ admits an inner product, $\left<\textbf{v},\textbf{w}\right>$. In order for the inner product to correspond properly to length and angles in euclidean space, the inner product is given in particular coordinates by
$$\left<\textbf{v},\textbf{w}\right>=\left[v^1, v^2, ...\right] \left(J J^T\right)^{-1} \left[w^1, w^2, ...\right]^T$$

where J is the Jacobian for the transformation from euclidean coordinates to the coordinate system v and w are written in.

To generalize from a surface to a more general manifold that is not necessarily embedded in euclidean space, you have to be a little more careful about how you build the tangent vector space. One way to do this is to use equivalence classes of curves as a substitute for tangent vectors when building $V_p$. You also don't have a mapping from euclidean coordinates to your coordinates or the Jacobian, but that's OK because the manifold will come equipped with an inner product already in the form of the metric, and you can just replace $\left(J J^T\right)^{-1}$ with the metric matrix G.
$$\left<\textbf{v},\textbf{w}\right>=v^i G_{i j} w^j$$
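A concrete numerical sketch of this formula (the metric and components here are made up; any symmetric positive-definite G works):

```python
import numpy as np

# A made-up symmetric positive-definite metric matrix G_ij.
G = np.array([[1.0, 0.0],
              [0.0, 4.0]])

v = np.array([1.0, 2.0])
w = np.array([3.0, -1.0])

# <v, w> = v^i G_ij w^j, as a matrix sandwich and as an explicit double sum.
inner = v @ G @ w
inner_sum = sum(v[i] * G[i, j] * w[j] for i in range(2) for j in range(2))

print(inner)  # -5.0  (= 1*1*3 + 4*2*(-1))
assert np.isclose(inner, inner_sum)
```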

If I were inventing differential geometry, I would probably have just stopped right here, seeing as how you can calculate pretty much anything you want about tangent vectors with what we have so far. However, apparently it is interesting and useful to also consider the space of all linear functionals on $V_p$ as well (why is this?).

Now, let's actually think about linear functionals for a minute. If f is a linear functional, that means f(v) eats a vector and spits out a number and does so linearly. That's all fine and good, but suppose we have 2 linear functionals, f and g. Is there a sense in which we could "add" the functionals? Yes! If you define (f + g)(v) = f(v)+g(v), then "f+g" is another linear functional. More broadly, with addition between linear functionals defined this way, the space of all linear functionals actually forms a vector space, call it $V^*_p$ (one could check all the vector space axioms - it's all good).

To actually calculate f(v), first express v in terms of a basis, $\textbf{v}=v^1 \textbf{e}_1 + v^2 \textbf{e}_2 + ...$. Then we can figure out how f acts on v if we know how f acts on each element of the basis.
$$f\left(\textbf{v}\right) = f\left(v^1 \textbf{e}_1 + v^2 \textbf{e}_2 + ...\right)=v^1 f\left(\textbf{e}_1\right) + v^2 f\left(\textbf{e}_2\right) + ...=\left<\textbf{v},\textbf{e}_1\right> f\left(\textbf{e}_1\right) + \left<\textbf{v},\textbf{e}_2\right> f\left(\textbf{e}_2\right) + ...$$.

Remembering that linear functionals are actually vectors in $V^*_p$, the values $f_i = f\left(\textbf{e}_i\right)$ can be thought of as the "components" of f in the basis of functionals $\textbf{h}_1\left(\textbf{v}\right)=\left<\textbf{v},\textbf{e}_1\right>$, $\textbf{h}_2\left(\textbf{v}\right)=\left<\textbf{v},\textbf{e}_2\right>$, ... Thus $f=f_1 \textbf{h}_1 + f_2 \textbf{h}_2 + ...$ This makes a particularly nice way to write f(v) if we know the components of f and v in their corresponding bases: $f\left(\textbf{v}\right)=f_i v^i$.

Above we wrote v in a certain basis $\textbf{e}_1,\textbf{e}_2, ...$, and that corresponded to the following basis for f: $h_1\left(\textbf{v}\right)=\left<\textbf{v},\textbf{e}_1\right>$, $h_2\left(\textbf{v}\right)=\left<\textbf{v},\textbf{e}_2\right>$, ... If the coordinate system is changed, the bases for v and f are both changed, and so the components of v and f will change, but how? We already saw that under a coordinate transform, the components of v change according to a matrix multiplication $\bar{v}^i = \Lambda^{i}_{k}v^{k}$. What about the components of f? If you work through it, you will find that the components of f transform inversely: $\bar{f}_i = \left(\Lambda^{-1}\right)^{k}_{i}f_{k}$ (is there an easier way to see this besides crunching the matrix algebra?).
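To the parenthetical question: yes - the invariance of the number $f_i v^i$ itself forces the inverse law, with no matrix crunching needed, and a numerical sketch makes that vivid (matrix and components made up for illustration):

```python
import numpy as np

# A made-up invertible (and deliberately non-orthogonal) change of basis.
Lam = np.array([[2.0, 1.0, 0.0],
                [0.0, 1.0, 1.0],
                [1.0, 0.0, 1.0]])
Lam_inv = np.linalg.inv(Lam)

v = np.array([1.0, 2.0, 3.0])   # components v^i
f = np.array([0.5, -1.0, 2.0])  # components f_i

v_bar = Lam @ v

# Transforming f with the (transposed) inverse keeps f(v) = f_i v^i fixed ...
f_bar_good = Lam_inv.T @ f
assert np.isclose(f_bar_good @ v_bar, f @ v)

# ... while transforming f the same way as v changes the value of f(v).
f_bar_bad = Lam @ f
assert not np.isclose(f_bar_bad @ v_bar, f @ v)
```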

Finally, since
$$f\left(\textbf{v}\right) =\left<\textbf{v},\textbf{e}_1\right> f\left(\textbf{e}_1\right) + \left<\textbf{v},\textbf{e}_2\right> f\left(\textbf{e}_2\right) + ...$$,

and since
$$\left<\textbf{v},\textbf{w}\right>=v^i G_{i j} w^j$$,

we can think of $v_j = v^i G_{i j}$ as the coordinate representation of the linear functional that computes $\left<\textbf{v},\textbf{w}\right>$.
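This "lowering" step is one line of numpy (metric and components made up; the point is that the lowered components compute the inner product by a plain contraction):

```python
import numpy as np

# Made-up metric and tangent-vector components.
G = np.array([[2.0, 1.0],
              [1.0, 3.0]])
v = np.array([1.0, -1.0])
w = np.array([4.0, 2.0])

# Lower the index: v_j = v^i G_ij  (G is symmetric, so G @ v does the job).
v_lower = G @ v
print(v_lower)  # [ 1. -2.]

# The resulting covariant components compute <v, .> by simple contraction:
assert np.isclose(v_lower @ w, v @ G @ w)
```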

Hurkyl
Staff Emeritus
Gold Member
Bleh, I'm quite conditioned to use the "(coordinate representations of) tangent vectors are columns" and "cotangent vectors are rows" convention. Hopefully I won't make any errors using your convention, but if you see something weird, that might be the reason.

the manifold will come equipped with an inner product already in the form of the metric, and you can just replace $\left(J J^T\right)^{-1}$ with the metric matrix G.
Since we're building things from scratch, we want to be careful to actually define what kind of object G is!

We've defined the tangent space V so that it's a vector space, so it seems clear that we could simply define a "tangent linear transformation", which is a linear transformation from V to V. So we can let G be such a thing, and we're happy.

Now, we can define the inner product: the inner product G represents ought to be $\langle v, w \rangle := v G w^T$. But now we see a problem: what is the transpose of a tangent vector?!?! Okay, we can fall back on coordinates: we might simply try defining $\langle v, w \rangle := \sum_{ij} v^i G_i^j w^j$.

So let's check: does this all make sense? We need to check coherence with respect to a change of basis. Let $[\mathbf{v}]_B$ denote the coordinate representation of v with respect to the basis B. Let $\Lambda$ be the change-of-basis matrix between B and B' (so $[\mathbf{v}]_{B'} = [\mathbf{v}]_{B} \Lambda$)

(Or, if you prefer index notation, $\bar{v}^i = v^j \Lambda_j^i$)

Exercise: Determine how the coordinate representation of G must transform...
1. ... based on the fact G is a linear transformation
2. ... based on the fact $\langle \cdot, \cdot \rangle$ is an inner product

(Hrm. I have more to say, but I think I'll give you time to reflect upon this first)

Last edited:
So let's check: does this all make sense? We need to check coherence with respect to a change of basis. Let $[\mathbf{v}]_B$ denote the coordinate representation of v with respect to the basis B. Let $\Lambda$ be the change-of-basis matrix between B and B' (so $[\mathbf{v}]_{B'} = [\mathbf{v}]_{B} \Lambda$)

(Or, if you prefer index notation, $\bar{v}^i = v^j \Lambda_j^i$)

Exercise: Determine how the coordinate representation of G must transform...
1. ... based on the fact G is a linear transformation
2. ... based on the fact $\langle \cdot, \cdot \rangle$ is an inner product

(Hrm. I have more to say, but I think I'll give you time to reflect upon this first)

Ok, just thinking of everything as boxes of numbers being summed over, you could say that
$$\left<\textbf{v},\textbf{w}\right>=v^i G^j_i w^j = \bar{v}^k \left(\Lambda^{-1}\right)^i_k G_i^j \left(\Lambda^{-1}\right)_l^j \bar{w}^l$$

Since $\left<\textbf{v},\textbf{w}\right> = \bar{v}^i \bar{G}^j_i \bar{w}^j$ as well, then $\bar{G}_k^l=\left(\Lambda^{-1}\right)^i_k G_i^j \left(\Lambda^{-1}\right)^j_l$

I'm pretty sure all the steps I took are correct (just substituting things in), but yet something screwy is going on with the indices (they don't match).

Also, I didn't use any of the properties of the inner product in these steps. The inner product would force G to be symmetric, positive definite, and possibly other nice things.

Last edited:
Hurkyl
Staff Emeritus
Gold Member
then $\bar{G}_k^l=\left(\Lambda^{-1}\right)^i_k G_i^j \left(\Lambda^{-1}\right)^j_l$
Now, how does that compare with the result of exercise 1?

I'm pretty sure all the steps I took are correct (just substituting things in), but yet something screwy is going on with the indicies (they don't match).
That's a hint of things to come.

CompuChip
Homework Helper

If I were inventing differential geometry, I would probably have just stopped right here, seeing as how you can calculate pretty much anything you want about tangent vectors with what we have so far. However, apparently it is interesting and useful to also consider the space of all linear functionals on $V_p$ as well (why is this?).
That has to do with Hurkyl's post about converting multilinear forms to lower "degree" linear forms. We know that the inner product is linear in - in particular - the second slot. So if we momentarily fix v, we can consider a function $f: V \to \mathbb{R}: \vec w \mapsto \langle \vec v, \vec w \rangle$ and this turns out to be a linear form. Hence, we can apply the full machinery of functional analysis here, and we can compare to the quantum mechanical formalism of bras and kets which we by now (hopefully) already understand very well. It is common in physics to look at the same thing in a different way, and try to discover parallels with what we already know. You could even turn it around and say that if we understand this view correctly we might better understand the QM formalism.

Now, let's actually think about linear functionals for a minute. If f is a linear functional, that means f(v) eats a vector and spits out a number
No, it means that f eats a vector and spits out a number f(v). I suppose you meant: let's fix a vector v in the vector space and consider the unique linear functional we can build out of that by the construction given earlier, which - in a sense - still depends on v, so let's denote it by f(v). Then f(v) is a linear functional, it eats a vector (e.g. w) and spits out a number (f(v))(w) which happens to be equal to <v, w> by construction.

The space of all linear functionals actually forms a vector space, call it $V^*_p$ (one could check all the vector space axioms - its all good).
In some cases it turns out to be isomorphic to V itself. And very generally it turns out that, repeating this construction, the dual of the dual V** is always isomorphic to V (at least in finite dimensions). In physics this shows up in the fact that we go back and forth between bras and kets (e.g. the bra corresponding to the ket corresponding to a bra is just the bra itself).

To actually calculate f(v)
Note that now you are switching back to f being the linear functional, so f(v) is already a number.

Remembering that linear functionals are actually vectors in $V^*_p$, the values $f_i = f\left(\textbf{e}_i\right)$ can be thought of as the "components" of f in the basis of functionals $\textbf{h}_1\left(\textbf{v}\right)=\left<\textbf{v},\textbf{e}_1\right>$, $\textbf{h}_2\left(\textbf{v}\right)=\left<\textbf{v},\textbf{e}_2\right>$, ... Thus $f=f_1 \textbf{h}_1 + f_2 \textbf{h}_2 + ...$ This makes a particularly nice way to write f(v) if we know the components of f and v in their corresponding basis'es: $f\left(\textbf{v}\right)=f_i v^i$.
You are messing things up a bit. The $h_i(\textbf{v})$ you define are just numbers, they aren't a basis for anything. I suppose you meant to explain that one can define a basis $\{ f^i \}$ of linear functionals such that $$f^i \in V^*, \quad f^i(\textbf{e}_j) = \delta^i_j$$ (extended linearly to all of V) for the chosen basis $\{ \textbf{e}_i \}$ of V. This basis is called the basis dual to $\{ \textbf{e}_i \}$, and indeed we can write $f = c_i f^i$ (note that the $c_i$ are just numbers, and the $f^i$ are linear functionals). If you continue this to the rest of your post, you will find the result you stated (so you messed up the notation, but the idea was fine).

Finally, I'd like to add that you can see the metric itself as the inner product. So the metric $g_{\mu\nu}$ is a tensor with two covariant indices (which just means: it is a bilinear map $g: V \times V \to \mathbb{R}$). You plug in two vectors and get a number: $g(\vec v, \vec w) = \langle \vec v, \vec w \rangle$ in functional notation, in tensor notation: $g_{\mu\nu} v^\mu w^\nu = v^\mu w_\mu$ (the metric "lowers the index"; if you want just view $v^\mu w_\mu$ as the notation for the inner product, forgetting about linear forms) and because it is symmetric, we can also view this as not g(v, w) but g(w, v) and get $v_\mu w^\mu$. But we can also plug in just one vector, so we get an object with just one slot for a vector which - when filled - produces a number. But that's exactly what we called a linear functional:
$$g(\vec v, \cdot) \in V^*: \vec w \mapsto \langle \vec v, \vec w\rangle \in \mathbb{R}$$.
In index notation, this is written as $g_{\mu\nu} v^\mu$. This is an object with one lower index, and objects with lower indices are linear maps ("co-vectors"), so indeed the notation is consistent.

One can extend this formalism to objects with k upper indices and l lower indices: if you plug in k linear forms and l vectors you get a number (and if you plug in less than that, you get a new map of "lower rank").
Perhaps it would be very instructive if you read chapter 1 of Sean Carroll's lecture notes on General Relativity, as he explains this quite rigorously (for a physicist) with clear physical applications in mind.

@Hurkl:
Ahh, I didn't realize those were supposed to be separate exercises. I'll go over it again at the next opportunity, which will be in a day or so.

@CompuChip:
There is a lot of material in your post that will take a while to go through. I will have to give some serious thought to the issues of isomorphism and other points you mention. However a couple of things came up that I can address right away.

First, my understanding of quantum mechanics is not at this high of a level, so a lot of the connections you are making related to bra and ket vectors are going over my head.

Second, I think a couple of the problems you've identified are not actually problems but rather misunderstandings due to my poor choice of wording.

When I said f(v) in that section, I meant the general function, for any vector, not the value of f for a particular vector v. This is similar to how, if you want to talk about a function of a single variable, you might say "f(x)", even though technically f(x) is a particular value, not the function as a whole. I probably should have said $f\left(\cdot\right)$. Again, the same terminology thing is going on when I talked about the (dual) basis vectors $h_i\left(\cdot\right)$ for the linear function space $V^*$.

If there is a real problem or subtlety I'm missing here, please let me know, but I don't think there is. I do think that my reasoning is correct, despite being poorly worded.

CompuChip
Homework Helper
There is a lot of material in your post that will take a while to go through. I will have to give some serious thought to the issues of isomorphism and other points you mention.
I hope it's not too much new material, just the same things already discussed in this thread, maybe seen from another point of view. If they help you understand it better, good; if they confuse you more, forget about it.

First, my understanding of quantum mechanics is not at this high of a level, so a lot of the connections you making related to bra and ket vectors are going over my head.
OK, that's too bad, I was hoping it would clear things up a bit. Anyway, whenever you decide to do QM you will already understand bras and kets :)

Second, I think a couple of the problems you've identified are not actually problems but rather misunderstandings due to my poor choice of wording.
[...]
If there is a real problem or subtlety I'm missing here, please let me know, but I don't think there is. I do think that my reasoning is correct, despite being poorly worded.
As I already said, I think you got the idea very well. I don't even know if your notation was wrong or whether it was correct but just confusing for me. But indeed, you got the point.

$$\bar{G}_k^l=\left(\Lambda^{-1}\right)^i_k G_i^j \left(\Lambda^{-1}\right)^j_l$$
Now, how does that compare with the result of exercise 1?

That's a hint of things to come.

So, based on linearity,
$$G\left(\left[\textbf{v}\right]_B\right) = G\left[\textbf{v}\right]_B = G\Lambda^{-1}\left[\textbf{v}\right]_{B'}$$

Therefore $\bar{G} = G \Lambda^{-1}$.
So $\bar{G}=\Lambda^{-1}G\Lambda^{-1}$ AND $\bar{G}=G \Lambda^{-1}$, so $G \Lambda^{-1} = \Lambda^{-1} G \Lambda^{-1}$, which would imply $G = \Lambda^{-1} G$. I don't see how this could possibly hold for any possible coordinate transform matrix.

Last edited:
Scratch that - the manipulations there don't actually determine $\bar{G}$.

Hurkyl
Staff Emeritus
Gold Member
Recall the basic computational formula for the coordinate representation [T] of a linear transformation T:
[T(v)] = [v] [T]

(Incidentally, this is the main reason I prefer the convention where vectors are column vectors; so that this identity isn't 'backwards')

Or, with indices, if w = T(v), then $w^i = T^i_j v^j$

If you want to try working it out again, then don't read below this point.

Okay, the calculations work out to:
$$[v]_{B'} [G]_{B'} = [G(v)]_{B'} = [G(v)]_B \Lambda = [v]_B [G]_B \Lambda = [v]_{B'} \Lambda^{-1} [G]_B \Lambda$$
and so
$$[v]_{B'} \left( [G]_{B'} - \Lambda^{-1} [G]_B \Lambda \right) = 0$$
which implies, because this is true for all coordinate tuples $[v]_{B'}$,
$$[G]_{B'} = \Lambda^{-1} [G]_B \Lambda$$

Your observation was correct -- this is different than the constraint we needed on G in order for our naive attempt at defining an inner product to work!

The notion of an inner product clearly makes sense -- but it is not so clear that inner products bear a nice relation to linear transformations of tangent vectors. Now that we've uncovered a contradiction, how would you proceed in developing differential geometry?
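The clash can be exhibited numerically. A sketch with a made-up metric matrix and a non-orthogonal shear for $\Lambda$ (column-vector convention, $[\mathbf v]_{B'} = \Lambda [\mathbf v]_B$):

```python
import numpy as np

G = np.array([[2.0, 0.0],
              [0.0, 1.0]])    # made-up "metric" matrix in basis B
Lam = np.array([[1.0, 1.0],
                [0.0, 1.0]])  # a shear: invertible but not orthogonal
Lam_inv = np.linalg.inv(Lam)

v = np.array([1.0, 2.0])
w = np.array([3.0, 1.0])
v_bar, w_bar = Lam @ v, Lam @ w

# Transform G as a bilinear form: the value of <v, w> is preserved.
G_form = Lam_inv.T @ G @ Lam_inv
assert np.isclose(v_bar @ G_form @ w_bar, v @ G @ w)

# Transform G as a linear transformation (similarity): the value is not.
G_linmap = Lam @ G @ Lam_inv
assert not np.isclose(v_bar @ G_linmap @ w_bar, v @ G @ w)
```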

Last edited:
Okay, the calculations work out to:
$$[v]_{B'} [G]_{B'} = [G(v)]_{B'} = [G(v)]_B \Lambda = [v]_B [G]_B \Lambda = [v]_{B'} \Lambda^{-1} [G]_B \Lambda$$
and so
$$[v]_{B'} \left( [G]_{B'} - \Lambda^{-1} [G]_B \Lambda \right) = 0$$
which implies, because this is true for all coordinate tuples $[v]_{B'}$,
$$[G]_{B'} = \Lambda^{-1} [G]_B \Lambda$$

I think there is a problem with your derivation. How can we necessarily know that
$$[G(v)]_{B'} = [G(v)]_B \Lambda$$
?

I think that step is in error, because the contradiction it would raise isn't just a surface issue that you avoid and develop differential geometry around. If you follow the contradiction back to the assumptions, it would literally mean that there exists no function of tangent vectors satisfying all the properties of an inner product! (Except the euclidean inner product: then G would be I and there would be no contradiction.)

Also, on the previous page there was the issue of indices acting screwy and not matching when changing coordinates. I'm pretty sure that is because the indices on G were wrong to start out with. If you use $\left<\textbf{v},\textbf{w}\right>=v^i G_{i j} w^j$ instead of $v^i G^i_j w^j$, then everything works out fine. Also, having both indices lower like this is proper, since G is a linear functional on tangent vectors in V for both inputs.

Here is a summary of the situation as I understand it. (I'm really starting to warm up to your notation, and will use it here. Also, I'm using column vectors instead of row vectors.)

Coordinate representation of the inner product:
Let B = $\{\textbf{e}_1, \textbf{e}_2, ...\}$ be a basis for the tangent space V. Then
$$\left<\textbf{v},\textbf{w}\right> = \left<v^1\textbf{e}_1+v^2\textbf{e}_2+...,w^1\textbf{e}_1+w^2\textbf{e}_2+...\right> = \sum_{i j} v^i w^j \left<\textbf{e}_i,\textbf{e}_j\right>$$

Define
$$\left[G\right]_B=\left[\begin{matrix}\left<\textbf{e}_1,\textbf{e}_1\right> & \left<\textbf{e}_1,\textbf{e}_2\right> & ... \\ \left<\textbf{e}_2,\textbf{e}_1\right> & \left<\textbf{e}_2,\textbf{e}_2\right> & ... \\ \vdots & \vdots & \ddots \end{matrix}\right]$$

Then
$$\left<\textbf{v},\textbf{w}\right> =\left(\left[\textbf{v}\right]_B\right)^T \left[G\right]_B \left[\textbf{w}\right]_B$$.

Change of coordinates:
Let B' be another basis for V. Then the coordinate vector representation in one basis can be transformed into the coordinate vector representation in another basis via a linear transformation:
$$[\mathbf{v}]_{B'} = \Lambda [\mathbf{v}]_{B}$$

Transformation of G based on inner product properties:
$$\left<\textbf{v},\textbf{w}\right>= \left(\left[\textbf{v}\right]_B\right)^T \left[G\right]_B \left[\textbf{w}\right]_B = \left(\Lambda^{-1}\left[\textbf{v}\right]_{B'}\right)^T \left[G\right]_B \Lambda^{-1} \left[\textbf{w}\right]_{B'} = \left(\left[\textbf{v}\right]_{B'}\right)^T \left(\Lambda^{-1}\right)^T \left[G\right]_B \Lambda^{-1} \left[\textbf{w}\right]_{B'}$$
So
$$\left[G\right]_{B'} = \left(\Lambda^{-1}\right)^T \left[G\right]_B \Lambda^{-1}$$

(Questionable) Transformation of G based on linearity:
$$[G]_{B'}[v]_{B'} = [G(v)]_{B'} = \Lambda [G(v)]_B = [G]_B \Lambda [v]_B = \Lambda [G]_B \Lambda^{-1} [v]_{B'}$$
=>
$$\left( [G]_{B'} - \Lambda [G]_B \Lambda^{-1} \right)[v]_{B'} = 0$$
for all $[v]_{B'}$
=>
$$[G]_{B'} = \Lambda [G]_B \Lambda^{-1}$$

Incidentally, this coincides with the result from inner product properties if $\Lambda$ is orthogonal. Perhaps $[G(v)]_{B'} = \Lambda [G(v)]_B$ only if $\Lambda$ is orthogonal?
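A quick numerical check (numpy, hypothetical numbers) that the two transformation rules really do disagree for a general invertible $\Lambda$, but coincide when $\Lambda$ is orthogonal:

```python
import numpy as np

G = np.array([[1.0, 0.5],
              [0.5, 2.0]])

# Non-orthogonal change of basis: the two rules give different answers
L = np.array([[2.0, 1.0],
              [0.0, 3.0]])
Linv = np.linalg.inv(L)
assert not np.allclose(L @ G @ Linv, Linv.T @ G @ Linv)

# Orthogonal change of basis (a rotation): R^{-1} = R^T, so the rules coincide
t = 0.7
R = np.array([[np.cos(t), -np.sin(t)],
              [np.sin(t),  np.cos(t)]])
assert np.allclose(R @ G @ np.linalg.inv(R),
                   np.linalg.inv(R).T @ G @ np.linalg.inv(R))
```

The similarity transform $\Lambda [G]_B \Lambda^{-1}$ is how a (1,1)-tensor transforms, while $(\Lambda^{-1})^T [G]_B \Lambda^{-1}$ is how a (0,2)-tensor transforms; they only agree when $\Lambda^T = \Lambda^{-1}$.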

Hurkyl
Staff Emeritus
Gold Member
If we assume G is a linear transformation (i.e. that its coordinate representation is an ordinary matrix), then G(v) is a tangent vector!

I've been exhausted lately, I dunno if I can spend as much time on this trying to ask leading questions, so I'll go straight for the point.

Because we see interesting objects transforming differently than simple things (like tangent vectors or linear transformations of tangent vectors), that shows that we must be interested in a greater variety of objects.

The correspondence between indices and matrix algebra is that superscripts are row indices, and subscripts are column indices. You are aware that G should really have two lower indices (of course, this is a little circular, because the index convention was developed, I assume, precisely because it respects this distinction), and so it shouldn't be a matrix. Or at least, it shouldn't be an n by n matrix -- in one of the standard ways of representing such things, the coordinate representation of G would, in fact, be a 1 by n^2 matrix (actually, it would be partitioned; n partitions of n columns each), and the inner product would be computed by

$$\langle x, y \rangle = [G]_B \left([x]_B \otimes [y]_B\right)$$

where, for example, we use $\otimes$ to denote the Kronecker product.

Aside: can you find a good product [G]_B [x]_B that's based on viewing [G] not as a 1xn^2 matrix, but instead as a 1xn matrix of 1xn matrices? Can you find a second one? (There are only two "good" ones; the two choices are obvious in index notation.) For either of those products, what kind of object is [G]_B [x]_B?

Can you define a reasonable product Gx? What kind of object would it be?

You kept wanting to write $[v]^T_B$, the (matrix) transpose of the coordinate representation of v. Such a thing, of course, is not a (column, coordinate) vector. What sort of role does it play in the matrix algebra? What kind of object could have such a coordinate representation?

Coming up with an interpretation of $v^T$ is harder -- defining a 'transpose' operation on vectors is equivalent to choosing a metric! For example, in special relativity, the 'right' transpose to use is

$$\left[ \begin{array}{c} c t \\ x \\ y \\ z \end{array} \right]^T = \left[ \begin{array}{cccc} ct & -x & -y & -z \end{array} \right]$$

which gives the Minkowski inner product as $\langle v, w \rangle = v^T w$.
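The 1-by-n² representation of G is easy to check numerically. Here is a numpy sketch (with hypothetical entries) showing that the flattened row of G contracted with the Kronecker product of the two coordinate vectors reproduces the usual $x^T G y$:

```python
import numpy as np

# Hypothetical Gram-matrix entries G_{ij} for a 2D metric
Gmat = np.array([[1.0, 0.5],
                 [0.5, 2.0]])

# Flatten G into a 1 x n^2 row: n partitions of n columns each
G_row = Gmat.reshape(1, -1)

x = np.array([3.0, 2.0])
y = np.array([1.0, -1.0])

# <x, y> = [G]_B ([x]_B kron [y]_B)
ip_kron = (G_row @ np.kron(x, y))[0]

# Same number as the familiar matrix sandwich
assert np.isclose(ip_kron, x @ Gmat @ y)
```

Row-major flattening puts $G_{ij}$ at position $i n + j$, which is exactly where `np.kron` puts $x_i y_j$, so the single matrix product sums $G_{ij} x^i y^j$ over both indices.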

The correspondence between indices and matrix algebra is that superscripts are row indices, and subscripts are column indices. You are aware that G should really have two lower indices (of course, this is a little circular, because the index convention was developed, I assume, precisely because it respects this distinction), and so it shouldn't be a matrix. Or at least, it shouldn't be an n by n matrix -- in one of the standard ways of representing such things, the coordinate representation of G would, in fact, be a 1 by n^2 matrix (actually, it would be partitioned; n partitions of n columns each), and the inner product would be computed by

$$\langle x, y \rangle = [G]_B \left([x]_B \otimes [y]_B\right)$$

where, for example, we use $\otimes$ to denote the Kronecker product.

Ahhh, I see! You are requiring strict enforcement of the idea that functionals correspond to row vectors and tangent vectors correspond to column vectors in the matrix representation. I was just using the matrix notation as a shorthand for summation, with no deeper meaning.

Aside: can you find a good product [G]_B [x]_B that's based on viewing [G] not as a 1xn^2 matrix, but instead as a 1xn matrix of 1xn matrices? Can you find a second one? (There are only two "good" ones; the two choices are obvious in index notation.) For either of those products, what kind of object is [G]_B [x]_B? Can you define a reasonable product Gx? What kind of object would it be?

Sure,

$$\left[\begin{matrix}\left[\begin{matrix}G_{1 1} \\ G_{1 2} \\ \vdots \end{matrix}\right] \\ \left[\begin{matrix}G_{2 1} \\ G_{2 2} \\ \vdots \end{matrix}\right] \\ \vdots \end{matrix}\right]\left[\begin{matrix}x_1 \\ x_2 \\ \vdots \end{matrix}\right]=\left[\begin{matrix}x_1\left[\begin{matrix}G_{1 1} \\ G_{1 2} \\ \vdots \end{matrix}\right] + x_2\left[\begin{matrix}G_{2 1} \\ G_{2 2} \\ \vdots \end{matrix}\right] + ...\end{matrix}\right]$$

OR

$$=\left[\begin{matrix}\left[\begin{matrix}G_{1 1} \\ G_{1 2} \\ \vdots\end{matrix}\right]\cdot\left[\begin{matrix}x_1 \\ x_2 \\ \vdots\end{matrix}\right] \\ \left[\begin{matrix}G_{2 1} \\ G_{2 2} \\ \vdots\end{matrix}\right]\cdot\left[\begin{matrix}x_1 \\ x_2 \\ \vdots\end{matrix}\right] \\ \vdots \end{matrix}\right]$$

You kept wanting to write $[v]^T_B$, the (matrix) transpose of the coordinate representation of v. Such a thing, of course, is not a (column, coordinate) vector. What sort of role does it play in the matrix algebra? What kind of object could have such a coordinate representation?

In this new system where rows and columns actually have meaning, I'm not sure what it would represent. If you want a "real" row vector, you can't just transpose a box of numbers; you need to do something like the Minkowski transpose or whatnot, depending on the space you're in.

Hurkyl
Staff Emeritus
Gold Member
Almost right, but G (with lower indices) is a row vector of row vectors, not a column vector of column vectors.

Yeah I suppose that is right.

Ben Niehoff
Gold Member
I'm jumping in a bit late here, but here is a simple, non-rigorous way to visualize the difference between vectors and co-vectors, that helped me understand what all the fuss is about:

First, take ordinary, Euclidean space. For maximal simplicity, take a flat, 2-dimensional plane. Ordinarily, we draw a coordinate system on this plane using two lines crossing at right angles. However, in this case, use an oblique coordinate system instead: Let the X axis run horizontally as normal, but let the Q axis (we won't call it Y) run at 120 degrees from X; that is, pointed toward 10 o'clock. You can imagine the entire plane covered in a grid of lines parallel to the X and Q axes, forming a field of rhombuses. Using this oblique coordinate system, there are two ways to assign coordinates to points:

First, we can take the most obvious way: we can walk out, say, 3 units along the X axis, and then 2 units along a line parallel to the Q axis, and mark this point (3,2). This is most analogous to how we think about our usual X-Y system. But there is another method:

Alternatively, we could walk out 3 units perpendicular to the Q axis (that is, at 30 degrees), and then walk 2 units perpendicular to the X axis (i.e., at 90 degrees), and label this point (3,2). This is an equally valid convention, and if followed consistently, the labeled points on our plane will behave as a vector space.

In the ordinary X-Y system, "along the X axis" and "perpendicular to the Y axis" indicate the same direction; but in an oblique system, we must choose one convention or the other. It so happens that this is precisely the distinction between vectors and covectors. Suppose V is the point labelled (3,2) by our first method above; then, to find the covector corresponding to V, we must find the label of this exact same point, but using the second convention! You can verify that this is achieved by multiplying by the metric (just use the law of cosines to find the line element in oblique coordinates).

Of course, this ignores the deeper idea that covectors are linear functionals on the tangent space, but it does give you a geometrical interpretation of what covectors represent. In general: vectors measure a quantity along a direction, whereas covectors measure a density of parallel surfaces.

Bing. The light bulb finally goes on. I've struggled to understand this for quite some time, and was starting to get close, but your post makes it so clear now. Thanks Ben!

Ben Niehoff
Gold Member
I'm glad it helped. I actually made a slight error, but the general concept is the same. This paragraph is wrong:

Alternatively, we could walk out 3 units perpendicular to the Q axis (that is, at 30 degrees), and then walk 2 units perpendicular to the X axis (i.e., at 90 degrees), and label this point (3,2). This is an equally valid convention, and if followed consistently, the labeled points on our plane will behave as a vector space.

Instead, construct the covector components as follows: From our point (call it P), drop perpendiculars to the X and Q axes. The X component of the covector is the value where the perpendicular meets the X axis. Likewise, the Q component is where the other perpendicular meets the Q axis.

If the opening angle of the oblique system is $\gamma$, then the line element is

$$ds^2 = dx^2 + dy^2 + 2 \, dx \, dy \cos \gamma$$

Note the + sign, because when applying the law of cosines, gamma is actually on the exterior of the triangle.

For a vector with components $(a,b)$, its corresponding covector will have components $(a + b \cos \gamma, b + a \cos \gamma)$. Contracting the covector with the vector (i.e., taking the dot product), we get

$$||P||^2 = a \cdot (a + b \cos \gamma) + b \cdot (b + a \cos \gamma) = a^2 + b^2 + 2ab \cos \gamma$$

as expected.
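This oblique-coordinate construction can be verified numerically. A numpy sketch, using the 120-degree example from above (the numbers here just instantiate the formulas already given):

```python
import numpy as np

gamma = 2 * np.pi / 3   # 120-degree opening angle of the oblique system
cg = np.cos(gamma)      # = -1/2

# Metric for unit-marked oblique axes:
# ds^2 = dx^2 + dy^2 + 2 dx dy cos(gamma)
G = np.array([[1.0, cg],
              [cg, 1.0]])

a, b = 3.0, 2.0
v = np.array([a, b])    # vector components

# Lowering the index with the metric gives the covector components
cov = G @ v
assert np.allclose(cov, [a + b*cg, b + a*cg])

# Contracting covector with vector reproduces the squared length
assert np.isclose(cov @ v, a**2 + b**2 + 2*a*b*cg)
```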

All of this assumes that the axes are marked in unit distances. If they are not, then the metric will be different, and the covector components will not have such a simple geometric interpretation (there will be scale factors involved).

In three dimensions, the equivalent construction would be again to drop perpendiculars to the coordinate lines (not planes). In this case, to locate the point you would mark the axes, take planes perpendicular to the axes at the marked points, and then find the point where the three planes mutually intersect.

I have heard this as well in the meantime, but thank you for the post!

In general: vectors measure a quantity along a direction, whereas covectors measure a density of parallel surfaces.

Can you explain this sentence a bit?

In general: vectors measure a quantity along a direction, whereas covectors measure a density of parallel surfaces.
Can you explain this sentence a bit?

I'm still new to this, but my understanding of the situation is as follows. I'm going to use the example of a 2D surface embedded in 3D to explain the intuition.

Imagine a 2D surface sitting in 3D space, (x,y,z) = f(u,v). At any point, level curves for u and v make a pair of curves that cross. If you take the tangent to each of these curves, the 2 vectors you get form a basis for the tangent plane to the surface at that point.

On the other hand, suppose you start drawing a lot of level curves for u near the point: f(u0,v), f(u0+1,v), f(u0+2,v), etc., i.e., the constant-u contours. Now take a vector perpendicular to the initial u0-contour, and make its length proportional to the density of u-contour lines near the point. In other words, the vector is measuring locally how fast u increases as you walk across the surface. This vector, and the corresponding one for the constant-v curves, form a different basis for the tangent plane. Note that the vectors perpendicular to the level curves are not necessarily in the same direction as the tangent vectors (they are only the same if the level curves are perpendicular as they pass through the point).

Ok, so now we have 2 different ways of describing the tangent plane, and it turns out they are related. You can show that the second way (level curves) describes the space of linear functionals on the first (tangent vectors).

Maybe this all makes sense to you already, but I had to take a second to think about what linear functionals are. A linear functional is a function that takes any vector from a vector space and returns a number, and does it in a way that is linear (f(av+bw) = af(v)+bf(w)). How many possible linear functionals are there for a given vector space? There are certainly a lot of different linear functionals you could think up. If you think of the collection of all possible linear functionals, what would the "space" of linear functionals on a vector space look like? The result is that if the initial vector space is, say, R2, then the space of linear functionals on R2 is another 2D vector space. So the linear functionals form a vector space that basically looks like the original space.

A natural "basis" for the space of linear functionals is given by the inner product with the basis of the original space. If vectors x1 and x2 form the basis for R2, then e1(.)=< . ,x1> and e2(.)=< . ,x2> form the basis for the set of all linear functionals on R2. Remember that e1 and e2 are basis elements for the function space, so they are functions themselves. In this way, each vector, (a,b), in the original space has a corresponding linear functional, a< . , x1>+b< . , x2> in the function space.
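This vector-to-functional correspondence can be sketched in a few lines of numpy (the basis vectors here are hypothetical, chosen non-orthogonal to make the point):

```python
import numpy as np

# A hypothetical non-orthogonal basis x1, x2 for R^2
x1 = np.array([1.0, 0.0])
x2 = np.array([0.5, 1.0])

# The basis functionals e1(.) = <., x1> and e2(.) = <., x2>
def e1(v): return np.dot(v, x1)
def e2(v): return np.dot(v, x2)

# The functional corresponding to the vector (a, b), i.e. a*x1 + b*x2
a, b = 3.0, 2.0
def f(v): return a * e1(v) + b * e2(v)

# It acts on any w exactly as the inner product with a*x1 + b*x2
w = np.array([1.0, -1.0])
assert np.isclose(f(w), np.dot(a*x1 + b*x2, w))
```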

So how does this relate back to vectors vs. sets of level curves? Well, lets say you have 2 vectors in the tangent plane, and you want to compute the result of plugging the second vector into the linear functional that corresponds to the first one? If you crunch through the matrix representation of the inner product, you'll find that the way to do this is by the following steps:
1) write the second vector in the basis of the tangent vectors to curves
2) write the first vector (the one that corresponds to the linear functional) in the basis of the perpendicular-to-level-curves vectors, then
3) blindly multiply out the components and add it up like a nice dot product in an orthogonal basis (a1*b1 + a2*b2).

Note that this simple component-by-component multiplication and summation holds even if the basis isn't orthogonal (polar coordinates, for example), and the reason it still works is because of the work you do writing the vectors in the two different bases.
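The three steps above can be sketched numerically (numpy, with a made-up non-orthogonal basis):

```python
import numpy as np

# Non-orthogonal tangent-space basis, in ambient orthonormal coordinates
e1 = np.array([1.0, 0.0])
e2 = np.array([0.5, 1.0])
E = np.column_stack([e1, e2])   # columns are the basis vectors
G = E.T @ E                     # Gram matrix <e_i, e_j>

# Two vectors, given by components in the tangent-vector basis
v = np.array([3.0, 2.0])   # plays the role of the linear functional <., v>
w = np.array([1.0, -1.0])

# Step 2: lower v's index -- its components in the
# perpendicular-to-level-curves (dual) basis
v_low = G @ v

# Step 3: blindly multiply component-by-component and add up
ip = np.dot(v_low, w)

# Matches the honest ambient dot product of the two vectors
assert np.isclose(ip, (E @ v) @ (E @ w))
```

The mismatch between the two bases is exactly what the metric absorbs, which is why the final sum looks like a dot product in an orthonormal basis.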