# A Covariant derivative definition in Wald

1. May 6, 2016

### JonnyG

I'm working through Wald's "General Relativity" right now. My questions are actually about the math, but I figure that a few of you that frequent this part of the forums may have read this book and so will be in a good position to answer my questions. I have two questions:

1) Wald first defines a derivative operator $\nabla$ which maps smooth tensors of type $(k,l)$ to smooth tensors of type $(k, l + 1)$. He defines the operator by its properties. The one that is bothering me is the fourth one. He writes: "Consistency with the notion of tangent vectors as directional derivatives on scalar fields: For all $f \in C^{\infty} M$ and all $t^a \in V_p$, $$t(f) = t^a \nabla_a f$$" What does the notation $t^a \nabla_a f$ mean? It looks like a contraction to me, but Wald explicitly says that the index attached to $\nabla$ is just for notational convenience. I was thinking, if $t$ were a vector field on $M$ (the smooth manifold in question), then $tf \in C^{\infty} M$ and so $t^a \nabla_a f$ could be interpreted as $\nabla_a tf$, which would make sense. But $t$ is merely a tangent vector, so in this case I am confused.

2) My second question is this: After Wald lists the five properties that the covariant derivative operator is required to satisfy, he shows that such an operator does indeed exist. He says let $\psi$ be a smooth coordinate map on $M$ and let $\{\partial/\partial x^\mu\}$ and $\{dx^\mu\}$ be bases for the tangent space and cotangent space, respectively. Then given a smooth tensor field $T^{a_1 \cdots a_k}_{b_1 \cdots b_l}$, take its components $T^{\mu_1 \cdots \mu_k}_{\nu_1 \cdots \nu_l}$ in the given coordinate basis and define $\partial_c T^{a_1 \cdots a_k}_{b_1 \cdots b_l}$ to be the tensor whose components are the partial derivatives $\partial(T^{\mu_1 \cdots \mu_k}_{\nu_1 \cdots \nu_l} )/ \partial x^{\sigma}$.

I understand that because $T$ is a smooth tensor field, its components are smooth real-valued maps, and consequently we can take their partial derivatives. But with respect to which variable do we differentiate? The components of $\partial_c T$ (which is a type $(k, l+1)$ tensor) are supposed to be the partial derivatives of the component functions of $T$, but if you ask me to take the partial derivative of a component function of $T$, I ask, which of the $n$ variables do I differentiate with respect to?

I hope my misunderstandings are clear. If they aren't, please let me know and I will clear it up.

EDIT: In regards to my first question, I just realized that $\nabla f$ is dual to $df$, and $t^a$ is a vector, and by the isomorphism $V_p \simeq V_p^{**}$, the vector $t^a$ acts on the dual vector $df$. So, though $t^a \nabla_a f$ isn't a contraction, writing it as one is indeed notationally convenient because it's a quick way to say that $t^a$ is acting on $df$. Is this correct?

EDIT 2: I am going to take a quick stab at my second question by trying to answer it with an example. Please let me know if it is correct. Let us take a simple tensor field $T$ of type $(2,2)$ on a smooth $3$-manifold. Suppose $T$ has the simple expansion, $T = f \Big( \partial/\partial x^1 \otimes \partial/\partial x^2 \otimes dx^1 \otimes dx^2 \Big)$ where $f$ is a smooth real valued function on $M$. Then $\nabla T = \frac{\partial f}{\partial x^1} \Big(\partial/\partial x^1 \otimes \partial/\partial x^2 \otimes dx^1 \otimes dx^2 \otimes dx^1 \Big)+ \frac{\partial f}{\partial x^2} \Big( \partial/\partial x^1 \otimes \partial/\partial x^2 \otimes dx^1 \otimes dx^2 \otimes dx^2 \Big) + \frac{\partial f}{\partial x^3} \Big( \partial/\partial x^1 \otimes \partial/\partial x^2 \otimes dx^1 \otimes dx^2 \otimes dx^3 \Big)$. Is this what Wald means?

Last edited: May 6, 2016
2. May 6, 2016

### andrewkirk

Yes, I think that's what Wald means. So when he writes $t^a\nabla_af$ he means $t^a(\nabla f)_a$. The notation saves him writing two parentheses. I wonder is the saving worth it, given the confusion it causes?

He could have just written $t^a\nabla f_a$ which would mean the same thing without needing parentheses, employing an evaluation rule in which operators are evaluated from left to right except when overruled by a precedence rule or parentheses, and operators are 'pushed up the stack' (like on those lovely old-fashioned reverse-polish calculators) when the next symbol to their right is not a legal operand. Then the $t^a$ is pushed up because it can't operate on $\nabla$. Then, when $\nabla$ operates on $f$, it generates an operand that $t^a$ can use.

3. May 6, 2016

### JonnyG

@andrewkirk I think I am wrong when I say that Wald means $t^a \nabla_a f$ to mean that $t$ acts on $df$. I mean, why not just write $t(df)$ if that's what he meant? It isn't making sense to me. But you say that you think that is what Wald means...I think I am missing something here?

4. May 6, 2016

### JonnyG

@andrewkirk Sorry Andrew, but you may have to spell it out for me. How is $(\nabla f)_a$ a dual vector?

5. May 6, 2016

### andrewkirk

The reason I suggested that interpretation is because your OP quoted Wald as saying that the index attached to $\nabla$ is just for notational convenience, which rules out the most natural interpretation: that the subscript $a$ is not just a notational convenience but that instead $\nabla_a$ denotes the vector $\partial/\partial x^a$. With that interpretation $\nabla_a f$ is a scalar and $t^a\nabla_a f$ denotes the Einstein sum $t^a(\nabla_af)$.

I think that interpretation is the most likely one as it is specifically about directional derivatives. Perhaps it would be best to dismiss his comment about the subscript $a$ just being a notational convenience.

6. May 6, 2016

### JonnyG

Oh wait. $f$ is a smooth map, so it is a smooth tensor of type $(0,0)$. Thus $\nabla f$ is a smooth tensor of type $(0,1)$, meaning it is a dual vector. So $t$ can actually act on $\nabla f$. So when Wald says that "consistency with the notion of tangent vectors as directional derivatives on scalar fields", he means that we retain the notion that tangent vectors act as directional derivative operators not only on scalar fields, but also on $\nabla f$.

I don't know how I missed that...by the way, am I correct in my answer to my second question?

Last edited: May 6, 2016
7. May 6, 2016

### andrewkirk

I don't understand your answer to 2. The role of $f$ is unclear. You appear to be either multiplying it by a tensor or trying to apply it to a tensor (which is illegal). What Wald means is

$$\nabla T=\left(\frac{\partial}{\partial x^\alpha} T^{\mu_1,...,\mu_k}_{\nu_1,...,\nu_l}\right)\left(\partial_{\mu_1}\otimes...\otimes\partial_{\mu_k}\otimes dx^{\nu_1}\otimes...\otimes dx^{\nu_l}\otimes dx^{\alpha}\right)$$

8. May 6, 2016

### JonnyG

And that is using Einstein summation notation, so you are summing over $\alpha$ right? That is what I wrote in my answer, but I suppose I was using bad notation. The $f$ was meant to denote the function component of the tensor field $T$.

Thanks for your help. I really appreciate it!

9. May 7, 2016

### Orodruin

Staff Emeritus
But this object does not transform properly under coordinate transformations. I do not have my copy of Wald with me so I cannot check what it actually says at the moment.

Regarding the OP question, it is the most convenient place to introduce an index in abstract index notation. For the other options you would need parentheses or some weird rule of how to read indices on covariant derivatives. Simply writing $\nabla f_a$ and giving priority to the $\nabla$ before the index is also not without problems. For example it becomes unclear if the abstract index actually belongs to $\nabla f$ or to $f$, making it a dual vector.

An alternative is using the notation $f_{;a}$, which also works perfectly for higher order tensors.

I anyway think the meaning of $t^a \nabla_a f$ is clear. It is the contraction between the dual vector $\nabla f$ and the tangent vector $t$. When $f$ is a scalar you can write this in a number of different ways, but here you are after a notation which will also serve you well for higher order tensors.
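
As a minimal sketch of that contraction in a coordinate basis (the chart $x^\mu$ and the component names $t^\mu$ are assumptions for illustration, not notation taken from Wald at that point in the text): the dual vector $\nabla f$ has components $\partial f/\partial x^\mu$, so

$$t^a\nabla_a f = t^\mu\,(\nabla f)_\mu = \sum_{\mu=1}^n t^\mu\,\frac{\partial f}{\partial x^\mu}\bigg|_p = t(f),$$

where the last equality is just the expansion $t = t^\mu\,\partial/\partial x^\mu$ acting on $f$. For instance, on $\mathbb{R}^2$ with $f = x^2 y$ and $t = 3\,\partial_x - \partial_y$ at $p = (1,2)$, this gives $t(f) = 3\cdot(2xy) - x^2 = 3\cdot 4 - 1 = 11$.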

10. May 7, 2016

### andrewkirk

Ah, what a silly mistake. Of course you are correct. What I wrote takes no account of curvature. It's only a 'comma derivative' (partial derivative of the coordinate-based tensor components), whereas what's required is a semi-colon derivative.

The correct formula is something like:

\begin{align*}\nabla T&=\left(\frac{\partial}{\partial x^\alpha}
T^{\mu_1,...,\mu_k}_{\nu_1,...,\nu_l}
+\sum_{r=1}^k T^{\mu_1,...,\mu_{r-1},\beta,\mu_{r+1},...,\mu_k}_{\nu_1,...,\nu_l}\Gamma^{\mu_r}_{\beta\alpha}
-\sum_{r=1}^l T^{\mu_1,...,\mu_k}_{\nu_1,...,\nu_{r-1},\beta,\nu_{r+1},...,\nu_l}\Gamma^\beta_{\nu_r\alpha}
\right)\\
&\qquad\times\left(\partial_{\mu_1}\otimes...\otimes\partial_{\mu_k}\otimes dx^{\nu_1}\otimes...\otimes dx^{\nu_l}\otimes dx^{\alpha}\right)
\end{align*}

The mess of Christoffel symbols is the difference between the comma (partial, coordinate) and the semi-colon (covariant) derivative.

11. May 7, 2016

### JonnyG

@andrewkirk We haven't gotten to the Christoffel symbols yet in the book. Is there a way to express this without using those symbols? Still though, I don't see how your original answer isn't a tensor field. $T$ is expressed as a linear combination of elementary $(k, l + 1)$ tensor fields with smooth functions for coefficients. What's wrong with it?

@Orodruin Allow me to quote Wald exactly:

"Our first important task is to show that derivative operators exist. Let $\psi$ be a coordinate system and let $\{\partial/\partial x^\mu\}$ and $\{dx^\mu\}$ be the associated coordinate bases. Then in the region covered by these coordinates we may define a derivative operator $\partial_c$, called an ordinary derivative, as follows. For any smooth tensor field $T^{a_1 \cdots a_k}_{b_1 \cdots b_l}$ we take its components $T^{\mu_1 \cdots \mu_k}_{\nu_1 \cdots \nu_l}$ in this coordinate basis and define $\partial_c T^{a_1 \cdots a_k}_{b_1 \cdots b_l}$ to be the tensor whose components in this coordinate basis are the partial derivatives $\partial(T^{\mu_1 \cdots \mu_k}_{\nu_1 \cdots \nu_l}) / \partial x^{\sigma}$."

12. May 7, 2016

### Orodruin

Staff Emeritus
The components defined as partial derivatives of the components of another tensor do not have the correct transformation properties under general coordinate transformations.
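
To make that concrete, here is a sketch for the simplest case: a vector field $V$ with components $V^\beta$ in coordinates $x$ and $V'^\nu = \frac{\partial x'^\nu}{\partial x^\beta}V^\beta$ in coordinates $x'$ (the primed chart is an assumption introduced for illustration). Differentiating with the chain and product rules gives

$$\frac{\partial V'^\nu}{\partial x'^\mu} = \frac{\partial x^\alpha}{\partial x'^\mu}\frac{\partial x'^\nu}{\partial x^\beta}\frac{\partial V^\beta}{\partial x^\alpha} + \frac{\partial x^\alpha}{\partial x'^\mu}\frac{\partial^2 x'^\nu}{\partial x^\alpha\,\partial x^\beta}V^\beta .$$

The first term is the transformation law of a type $(1,1)$ tensor. The second term spoils it unless the second derivatives of the coordinate change vanish, which happens when the change of coordinates is linear (affine).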

13. May 7, 2016

### stevendaryl

Staff Emeritus
Well, the axioms for $\nabla$ imply that it obeys the product rule for derivatives: $\nabla (X Y) = (\nabla X) Y + X (\nabla Y)$. Now, let's apply that to a vector field $A$. We can pick out a basis $e_\mu$ and write $A = \sum_\mu e_\mu A^\mu$. So applying the product rule tells us:

$\nabla A = \sum_\mu ((\nabla e_\mu) A^\mu + e_\mu (\nabla A^\mu))$

Now, the assumption that $\nabla$ is just the partial derivative would mean that $\nabla e_\mu = 0$. But you can choose any (or just about any) tetrad of vector fields to act as your basis. So there is no way for $\nabla e_\mu$ to always be zero, for every basis.

Now you can perhaps pick a specific basis, and declare that $\nabla e_\mu$ is zero for that basis, but in general, it will be nonzero for any other basis (unless it is a linear transformation of the special basis).

So the axioms for $\nabla$ specify it uniquely only after you say what $\nabla e_\mu$ is (for one basis; its value for other bases would then be computable). That's where the Christoffel symbols come in: $\nabla e_\mu = \Gamma^\alpha_{\beta \mu} e_\alpha \otimes \omega^\beta$

(where $\omega^\beta$ is the covector basis).
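
Carrying this one step further, as a sketch that assumes $e_\mu$ is a coordinate basis with dual basis $\omega^\beta = dx^\beta$, so that $\nabla A^\mu = \partial_\beta A^\mu\,\omega^\beta$ for the scalar functions $A^\mu$:

$$\nabla A = \sum_\mu\left(A^\mu\,\Gamma^\alpha_{\beta\mu}\,e_\alpha\otimes\omega^\beta + \partial_\beta A^\mu\,e_\mu\otimes\omega^\beta\right) = \left(\partial_\beta A^\alpha + \Gamma^\alpha_{\beta\mu}A^\mu\right)e_\alpha\otimes\omega^\beta ,$$

where the dummy index $\mu$ in the second term was relabelled to $\alpha$. In this index convention the components of $\nabla A$ are the partial derivatives of the components plus a term that is linear in $A$ and contains no derivatives of $A$.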

14. May 7, 2016

### andrewkirk

Here's an alternative. We start by writing the covariant derivative as a sum of derivatives in each of the coordinate directions:

\begin{align*}\nabla T&=T_{;\alpha}\otimes dx^\alpha\\
&=\bigg[\left(\frac{\partial}{\partial x^\alpha}
T^{\mu_1,...,\mu_k}_{\nu_1,...,\nu_l}
+\sum_{r=1}^k T^{\mu_1,...,\mu_{r-1},\beta,\mu_{r+1},...,\mu_k}_{\nu_1,...,\nu_l}\Gamma^{\mu_r}_{\beta\alpha}
-\sum_{r=1}^l T^{\mu_1,...,\mu_k}_{\nu_1,...,\nu_{r-1},\beta,\nu_{r+1},...,\nu_l}\Gamma^\beta_{\nu_r\alpha}\right)
\partial_{\mu_1}\otimes...\otimes\partial_{\mu_k}\otimes dx^{\nu_1}\otimes...\otimes dx^{\nu_l}
\bigg]\otimes dx^{\alpha}
\end{align*}
where $T_{;\alpha}$ is the derivative of $T$ in direction $\partial_\alpha$.

Focus on the coefficient in parentheses

$$\left(\frac{\partial}{\partial x^\alpha} T^{\mu_1,...,\mu_k}_{\nu_1,...,\nu_l} +\sum_{r=1}^k T^{\mu_1,...,\mu_{r-1},\beta,\mu_{r+1},...,\mu_k}_{\nu_1,...,\nu_l}\Gamma^{\mu_r}_{\beta\alpha} -\sum_{r=1}^l T^{\mu_1,...,\mu_k}_{\nu_1,...,\nu_{r-1},\beta,\nu_{r+1},...,\nu_l}\Gamma^\beta_{\nu_r\alpha}\right)$$

The first term is simply the partial derivative of the tensor components, which is intuitive. The second and third terms represent the way that the coordinate bases change as we move in direction $\partial_\alpha$.

The second term covers how the basis $\partial_{\mu_1},...,\partial_{\mu_k}$ changes. The Christoffel symbol $\Gamma^{\mu_r}_{\beta\alpha}$ is equal to $dx^{\mu_r}\left(\partial_{\beta;\alpha}\right)$. The item $\partial_{\beta;\alpha}$ is the derivative of $\partial_\beta$ in direction $\partial_\alpha$, which represents the way that the direction of $\partial_\beta$ changes as we move in direction $\partial_\alpha$.

Similarly, the third term covers how the basis $dx^{\nu_1},...,dx^{\nu_l}$ changes, and the Christoffel symbol $\Gamma^{\beta}_{\nu_r\alpha}$ is equal to $-\partial_{\nu_r}\left(\left(dx^\beta\right)_{;\alpha}\right)$. The item $\left(dx^\beta\right)_{;\alpha}$, usually written as just $dx^\beta{}_{;\alpha}$, is the derivative of $dx^{\beta}$ in direction $\partial_\alpha$, which represents the way that $dx^\beta$ changes as we move in direction $\partial_\alpha$. The minus sign comes from applying the product rule to the constant $dx^\beta\left(\partial_{\nu_r}\right)=\delta^\beta_{\nu_r}$, which gives $0=\partial_{\nu_r}\left(dx^\beta{}_{;\alpha}\right)+dx^\beta\left(\partial_{\nu_r;\alpha}\right)$.

So the coefficient can be written without Christoffel symbols as:

$$\left(\frac{\partial}{\partial x^\alpha} T^{\mu_1,...,\mu_k}_{\nu_1,...,\nu_l} +\sum_{r=1}^k T^{\mu_1,...,\mu_{r-1},\beta,\mu_{r+1},...,\mu_k}_{\nu_1,...,\nu_l}dx^{\mu_r}\left(\partial_{\beta;\alpha}\right) +\sum_{r=1}^l T^{\mu_1,...,\mu_k}_{\nu_1,...,\nu_{r-1},\beta,\nu_{r+1},...,\nu_l}\partial_{\nu_r}\left(dx^\beta{}_{;\alpha}\right)\right)$$

OR, if you feel more comfortable with $\nabla$ than with semi-colons, as

$$\left(\frac{\partial}{\partial x^\alpha} T^{\mu_1,...,\mu_k}_{\nu_1,...,\nu_l} +\sum_{r=1}^k T^{\mu_1,...,\mu_{r-1},\beta,\mu_{r+1},...,\mu_k}_{\nu_1,...,\nu_l}dx^{\mu_r}\left(\nabla_\alpha\partial_{\beta}\right) +\sum_{r=1}^l T^{\mu_1,...,\mu_k}_{\nu_1,...,\nu_{r-1},\beta,\nu_{r+1},...,\nu_l}\partial_{\nu_r}\left(\nabla_\alpha dx^\beta\right)\right)$$

Last edited: May 8, 2016
15. May 7, 2016

### strangerep

It's possible to have zero curvature, but nonzero $\Gamma$'s. If the curvature vanishes, then it's possible to find some other coordinate system in which the $\Gamma$'s also vanish.

The 2nd and 3rd components just correct for the fact that the components in the 1st term do not transform tensorially in general.

16. May 8, 2016

### vanhees71

Sure, you know this from the usual Euclidean 3D vector analysis. For curvilinear coordinates (like spherical or cylindrical) the Christoffel symbols do not vanish. Note that in the usual cases of orthogonal curvilinear coordinates one uses not the holonomic basis but the orthonormal basis. In any case the Christoffel symbols don't vanish if the basis vectors depend on position.

17. May 8, 2016

### stevendaryl

Staff Emeritus
I think that the point is to show that interpreting $\nabla$ as the partial-derivative operator (in some specific basis) satisfies all the axioms that Wald gives for the covariant derivative. So this shows that a covariant derivative exists (that is, something satisfies the axioms).
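
A quick sketch of how two of those axioms check out for the ordinary derivative $\partial_c$ of a fixed chart $x^\mu$ (the component names here are assumptions for illustration). Consistency with directional derivatives holds because

$$t^a\partial_a f = t^\mu\,\frac{\partial f}{\partial x^\mu} = t(f),$$

and the torsion-free condition holds because the components of $\partial_a\partial_b f$ in that chart are second partial derivatives, which commute for smooth $f$:

$$\frac{\partial^2 f}{\partial x^\mu\,\partial x^\nu} = \frac{\partial^2 f}{\partial x^\nu\,\partial x^\mu} .$$

Linearity, the Leibniz rule and commutation with contraction follow in the same way from the corresponding properties of partial derivatives of the component functions.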

18. May 8, 2016

### Orodruin

Staff Emeritus
That would make more sense. Of course, this is just one particular covariant derivative and it is dependent on which coordinate system is used to define it, but I agree that it would make a perfectly fine covariant derivative - as long as one does not make the mistake of thinking its Christoffel symbols vanish in other coordinate systems.

19. May 8, 2016

### andrewkirk

True. I forgot about that. Polar coordinates in 2D Euclidean space is a simple example with nonzero Christoffel symbols but no curvature.
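
For reference, a sketch of that example, assuming Cartesian coordinates $x = r\cos\theta$, $y = r\sin\theta$ and the flat derivative operator for which $\partial_x$ and $\partial_y$ are constant. The polar coordinate basis is

$$\partial_r = \cos\theta\,\partial_x + \sin\theta\,\partial_y,\qquad \partial_\theta = -r\sin\theta\,\partial_x + r\cos\theta\,\partial_y .$$

Differentiating the Cartesian components with respect to $r$ and $\theta$ gives

$$\nabla_r\partial_r = 0,\qquad \nabla_r\partial_\theta = \nabla_\theta\partial_r = \frac{1}{r}\,\partial_\theta,\qquad \nabla_\theta\partial_\theta = -r\,\partial_r ,$$

so the only nonzero symbols are $\Gamma^r_{\theta\theta} = -r$ and $\Gamma^\theta_{r\theta} = \Gamma^\theta_{\theta r} = 1/r$. In the Cartesian chart of the same flat plane all of them vanish.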

20. May 8, 2016

### JonnyG

Okay I think I have got it. Please correct me if I am wrong. Let $M$ be a smooth manifold and let $p \in M$. Then, given a smooth chart $(U, \phi)$ about $p$, we can define a partial derivative operator $\partial$ (like the one we previously discussed) that satisfies the properties given by Wald. However, if $T$ is a smooth tensor field, then $\partial T$ is actually not a tensor field, because its components do not transform properly when changing coordinates. But, it can be shown that $\nabla T = \partial T + \Gamma^c_{ab}$, where $\Gamma^c_{ab}$ is a tensor field. And actually, $\nabla T$ IS a smooth tensor field, and thus is a covariant derivative operator. So the Christoffel symbol acts as a sort of correction factor.

I have three more questions (sorry guys):

1) If $T$ is a smooth tensor field then its components are smooth real-valued functions. i.e. they are smooth functions from the manifold into $\mathbb{R}$. But if we take the partial derivatives of the components, then $\frac{\partial T^{\mu_1 \cdots \mu_k}_{\nu_1 \cdots \nu_l}}{\partial x^\alpha}$ is the component of at least one of the terms in $\nabla T$. But $\frac{\partial T^{\mu_1 \cdots \mu_k}_{\nu_1 \cdots \nu_l}}{\partial x^\alpha}$ is not a smooth function from the manifold into $\mathbb{R}$ - it is a smooth function from $\mathbb{R}$ into $\mathbb{R}$ (in the case where the manifold were embedded in Euclidean space). But the components of $\nabla T$ are supposed to be smooth functions from the manifold into $\mathbb{R}$. What's up with this?

2) I understand that the Christoffel symbols act as correction factors. But what exactly is it correcting? What geometry does it describe?

3) The fifth property that Wald requires the covariant derivative operator to satisfy is the torsion free property. That is, $\nabla_a \nabla_b f = \nabla_b \nabla_a f$. I thought the indices on the derivative operator were kind of dummy indices? So what does it really mean to interchange the order of the $\nabla_a$ and the $\nabla_b$? If they are dummy indices, then interchanging the order is trivial, but that doesn't seem to be the case.
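
One way to unpack question 3, as a sketch only: $\nabla_b f$ is a dual vector field and $\nabla_a\nabla_b f$ is the type $(0,2)$ tensor obtained by applying $\nabla$ to it, so $a$ and $b$ label the two slots of that tensor and are not summed over. Assuming the component formula for dual vectors from the general expression earlier in the thread, $\omega_{\nu;\alpha} = \partial_\alpha\omega_\nu - \Gamma^\beta_{\nu\alpha}\,\omega_\beta$, and taking $\omega_\nu = \partial_\nu f$:

$$\nabla_\alpha\nabla_\nu f = \partial_\alpha\partial_\nu f - \Gamma^\beta_{\nu\alpha}\,\partial_\beta f, \qquad \nabla_\alpha\nabla_\nu f - \nabla_\nu\nabla_\alpha f = -\left(\Gamma^\beta_{\nu\alpha} - \Gamma^\beta_{\alpha\nu}\right)\partial_\beta f .$$

Read this way, the condition $\nabla_a\nabla_b f = \nabla_b\nabla_a f$ for all $f$ says that this $(0,2)$ tensor is symmetric, which in a coordinate basis amounts to $\Gamma^\beta_{\nu\alpha} = \Gamma^\beta_{\alpha\nu}$.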

Last edited: May 8, 2016