# Question about covariant derivatives

1. Jun 4, 2014

### space-time

Why is it that the covariant derivative of a covariant tensor does not seem to follow the product rule like contravariant tensors do when taking the covariant derivatives of those?

Here is a visual of what I mean: This is the covariant derivative of a contravariant vector. As you can see, it perfectly follows the product rule.

$$\nabla_n V^m = \frac{\partial V^m}{\partial y^n} + \Gamma^m{}_{nr} V^r$$

where the Christoffel symbol is simply a mixed derivative.

Whereas with a covariant vector:

$$\nabla_b V_a = \frac{\partial V_a}{\partial y^b} - \Gamma^c{}_{ba} V_c$$

As you can see with the covariant vector, all of the terms are the same ones that appear if you use the product rule when differentiating the covariant transformation of the vector with respect to b. However, that minus sign would not appear when you use the product rule.

Can anyone explain to me then, why the covariant derivatives of covariant tensors like this have that minus sign in the middle instead of a plus sign (which would go in accordance with the product rule)?

2. Jun 4, 2014

### Staff: Mentor

Because a derivative is not a product. Why do you think it should follow the product rule? What would that even mean in this case?

3. Jun 4, 2014

### space-time

The covariant transformation of a vector Va would be something like:

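$$V_a = \frac{\partial x^c}{\partial y^a} V_c$$

Differentiating this with respect to $y^b$ would require one to use the product rule as such:

$$\frac{\partial V_c}{\partial y^b} \frac{\partial x^c}{\partial y^a} + \frac{\partial^2 x^c}{\partial y^b \, \partial y^a} V_c$$

The mixed derivative is the Christoffel symbol. As you can see, the product rule was used here.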

4. Jun 4, 2014

### Staff: Mentor

What sort of "covariant transformation" is this supposed to be? It looks like you're just lowering an index, which is supposed to be done with the metric tensor: $V_a = g_{ac} V^c$.

No, the Christoffel symbols are constructed from derivatives of the metric tensor:

$$\Gamma^a{}_{bc} = \frac{1}{2} g^{ad} \left( \frac{\partial}{\partial x^b} g_{cd} + \frac{\partial}{\partial x^c} g_{bd} - \frac{\partial}{\partial x^d} g_{bc} \right)$$
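
As a quick illustration of this formula (a standard textbook example, not taken from the thread), consider plane polar coordinates $(r, \theta)$ with metric components $g_{rr} = 1$, $g_{\theta\theta} = r^2$, $g_{r\theta} = 0$. The only nonzero metric derivative is $\partial_r g_{\theta\theta} = 2r$, so the formula above should give:

$$\begin{aligned}
\Gamma^r{}_{\theta\theta} &= \frac{1}{2} g^{rr} \left( \partial_\theta g_{\theta r} + \partial_\theta g_{\theta r} - \partial_r g_{\theta\theta} \right) = \frac{1}{2} (1) (-2r) = -r \\
\Gamma^\theta{}_{r\theta} = \Gamma^\theta{}_{\theta r} &= \frac{1}{2} g^{\theta\theta} \left( \partial_r g_{\theta\theta} + \partial_\theta g_{r\theta} - \partial_\theta g_{r\theta} \right) = \frac{1}{2} \frac{1}{r^2} (2r) = \frac{1}{r}
\end{aligned}$$

All other components vanish. The symbols are nonzero here even though the plane is flat, which shows that they depend on the choice of coordinates as well as on the geometry.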

5. Jun 4, 2014

### space-time

http://mathworld.wolfram.com/CovariantDerivative.html

This is what I am talking about in this link. You can see what I mean by the plus and minus issue by looking at the covariant derivative of the contravariant tensor and then looking at the covariant derivative for the covariant vector.

No I am not trying to move any indices. Tensors can be expressed as products of partial derivatives and other tensors.

Yes, the Christoffel symbol is constructed from metric tensors and derivatives of metric tensors, but the Christoffel symbol also acts as a correction term when transforming between frames of reference.

6. Jun 4, 2014

### pervect

Staff Emeritus
A derivative operator must, by definition, satisfy the Leibniz rule (product rule):

$$\nabla_e \left[ A^{p...}{}_{q...} \, B^{r...}{}_{s...} \right]= \left[ A^{p...}{}_{q...} \right] \nabla_e B^{r...}{}_{s...} + \left[ B^{r...}{}_{s...} \right] \nabla_e A^{p...}{}_{q...}$$

p..., q..., r..., s... represent multiple indices here.

Aside from the above, a derivative operator, applied to a scalar function, must be consistent with the notion of tangent vectors as directional derivative operators, hence the difference between any two derivative operators applied to a scalar function must be zero.

There are a few other properties they have, but they won't be needed, so I won't discuss them. I suppose I should mention that one of them is "no torsion". See Wald "General Relativity" for the full details, and for a longer discussion than I'm going to give if I'm too terse.

You can prove that the difference between two derivative operators satisfying the properties given by Wald is a tensor field $C^c{}_{ab}$. I won't do this here; I'll refer you to the textbook, since you seem familiar with this already. I'll simply state that, for two different derivative operators $\nabla_a$ and $\tilde{\nabla}_a$,

$\left( \nabla_a - \tilde{\nabla_a} \right) \omega_b = -C^c{}_{ab} \omega_c$

Having two derivative operators might be confusing. One derivative operator of interest might be the ordinary partial derivative (partial derivative operators satisfy all the axioms); the other might be one that satisfies the metric compatibility condition, $\nabla_a g_{bc} = 0$. Then we can use this formula to find the difference between an ordinary partial derivative operator and a derivative operator that satisfies the metric compatibility condition.

The latter (the derivative operator that satisfies the metric compatibility condition) is the one that is used and implied in GR. Note that we've assumed the torsion-free condition earlier as well.

Your formula boils down to finding what you need to add to the ordinary derivative operator to make it satisfy the metric compatibility condition, while still retaining the needed properties (such as following the Leibniz rule) that all derivative operators must have.

Now, if you consider $\left( \nabla_a - \tilde{\nabla}_a \right) \left( \omega_b t^b \right)$, which must be equal to zero (because of the uniqueness of derivative operators applied to a scalar), and then apply the Leibniz rule, you can show that

$\nabla_a t^b = \tilde{\nabla_a} t^b + C^b{}_{ac} t^c$

which implies the different sign.
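
The intermediate steps can be sketched as follows. This is a reconstruction using only the covector formula and the Leibniz rule stated in this post; the relabeling of dummy indices in the second line is the only extra ingredient.

$$\begin{aligned}
0 &= \left( \nabla_a - \tilde{\nabla}_a \right) \left( \omega_b t^b \right) = t^b \left( \nabla_a - \tilde{\nabla}_a \right) \omega_b + \omega_b \left( \nabla_a - \tilde{\nabla}_a \right) t^b \\
&= -C^c{}_{ab} \, \omega_c t^b + \omega_b \left( \nabla_a - \tilde{\nabla}_a \right) t^b = \omega_b \left[ \left( \nabla_a - \tilde{\nabla}_a \right) t^b - C^b{}_{ac} \, t^c \right]
\end{aligned}$$

Since this must hold for every covector $\omega_b$, the bracket has to vanish, which gives the plus sign for the vector case.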

7. Jun 4, 2014

### stevendaryl

Staff Emeritus
The notation $\nabla_\mu A^\nu$ drives me bonkers, because it suggests an operation on a component of a vector. But it isn't, it's an operation on vectors that returns another vector, and then we take the $\nu$ component of that. So it really should be written as $(\nabla_\mu A)^\nu$. I think that some of the confusion about connection coefficients would be cleared up if better notation were used.

We can write a vector $A$ in terms of basis vectors $e_\nu$ as follows:

$A = A^\nu e_\nu$

Now, if we operate with the derivative-like operator $\nabla_\mu$, it's clear by the product rule that:

$\nabla_\mu A = (\nabla_\mu A^\nu) e_\nu + A^\nu (\nabla_\mu e_\nu)$

At this point, we apply two facts about $\nabla_\mu$:
1. If $\Phi$ is a scalar, then $\nabla_\mu \Phi = \partial_\mu \Phi$
2. On basis vectors, by definition: $\nabla_\mu e_\nu = \Gamma^\lambda_{\mu \nu} e_\lambda$

Then operating on a vector $A$ with $\nabla_\mu$ gives:
$\nabla_\mu A = (\partial_\mu A^\nu) e_\nu + A^\nu (\Gamma^\lambda_{\mu \nu}e_\lambda) = (\partial_\mu A^\nu + A^\lambda\Gamma^\nu_{\mu \lambda}) e_\nu$
(where I renamed dummy indices in the last expression to pull out a common factor of $e_\nu$)

So I see this as

$(\nabla_\mu A)^\nu = \partial_\mu A^\nu + A^\lambda \Gamma^\nu_{\mu \lambda}$

It's the $\nu$ component of the derivative, rather than derivative of the $\nu$ component. Since components are scalars, $\nabla_\mu A^\nu = \partial_\mu A^\nu$.

Getting to your question of why the minus sign, it comes from this: For CO-vectors $W$, you can do exactly the same thing, except first you write it in terms of basis co-vectors: $W = W_\nu \omega^\nu$, where $\omega^\nu$ is a basis covector. For co-vectors, the fundamental rule is:

$\nabla_\mu \omega^\nu = - \Gamma^\nu_{\mu \lambda} \omega^\lambda$

leading to the equation:

$(\nabla_\mu W)_\nu = \partial_\mu W_\nu - \Gamma^\lambda_{\mu \nu} W_\lambda$
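
For completeness, the step from the rule for basis covectors to the component formula might be written out like this. It mirrors the vector calculation earlier in this post and uses the same two facts; only the dummy-index relabeling in the last equality is added.

$$\begin{aligned}
\nabla_\mu W &= (\nabla_\mu W_\nu) \, \omega^\nu + W_\nu \, (\nabla_\mu \omega^\nu) \\
&= (\partial_\mu W_\nu) \, \omega^\nu - W_\nu \, \Gamma^\nu_{\mu \lambda} \, \omega^\lambda = \left( \partial_\mu W_\nu - \Gamma^\lambda_{\mu \nu} W_\lambda \right) \omega^\nu
\end{aligned}$$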

Why the minus sign? Well, the fundamental relationship between basis vectors and basis co-vectors is $\omega^\nu \cdot e_\mu = \delta^\nu_\mu$, where $\delta^\nu_\mu$ is 1 if $\nu = \mu$ and is 0 otherwise. The minus sign is exactly what you need in order to have:

$\nabla_\mu (\omega^\lambda \cdot e_\nu) = 0$ (which it should be, since it is equal to $\delta^\lambda_\nu$, which is a constant, so its derivative is zero).
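
As a check, here is a short sketch of that cancellation, assuming the product rule applies to the pairing $\omega^\lambda \cdot e_\nu$ and using the two rules for basis vectors and basis covectors given above ($\sigma$ is just a dummy index):

$$\begin{aligned}
\nabla_\mu (\omega^\lambda \cdot e_\nu) &= (\nabla_\mu \omega^\lambda) \cdot e_\nu + \omega^\lambda \cdot (\nabla_\mu e_\nu) \\
&= -\Gamma^\lambda_{\mu \sigma} \, (\omega^\sigma \cdot e_\nu) + \Gamma^\sigma_{\mu \nu} \, (\omega^\lambda \cdot e_\sigma) = -\Gamma^\lambda_{\mu \nu} + \Gamma^\lambda_{\mu \nu} = 0
\end{aligned}$$

With a plus sign in the covector rule, the two terms would add to $2\Gamma^\lambda_{\mu \nu}$ instead of cancelling.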

8. Jun 4, 2014

### WannabeNewton

I can't tell you how many times I've managed to confuse myself during a calculation precisely because of that notational issue! This is but one of the places where the index-free notation reigns supreme.

9. Jun 4, 2014

### Staff: Mentor

I understand all this, but none of it explains where your formulas in post #3 come from.

I don't think this is true in general, although of course some tensors can be expressed this way. Why do you think *all* tensors can be expressed this way?

I don't think this is correct either. You can certainly construct a coordinate transformation in order to force the Christoffel symbols to have certain values (for example, you can always force them all to be zero at the origin of a local inertial frame), but the symbols themselves are not part of the coordinate transformation.

10. Jun 4, 2014

### Staff: Mentor

So have I. I've tried to train myself to put the parentheses around the derivative expression, the way stevendaryl did in his post, but I don't always remember.

11. Jun 4, 2014

### pervect

Staff Emeritus
It tends to confuse me too.

I think of $\nabla_a A^b$ as a rank two tensor, i.e. using index notation. Then if we specify a specific vector u and write $\nabla_u A^b$, we really mean $\nabla_u A^b = u^a \left(\nabla_a A^b\right)$.

I don't believe this is equivalent to what you wrote, $(\nabla_\mu A)^\nu$, however :(

It's possible I'm still confused, but at the time I write this it still looks OK and your expression looks funny.

12. Jun 4, 2014

### WannabeNewton

I do believe that you are correct and that a more appropriate but cumbersome notation would be $(\nabla A)^{\nu}{}_{\mu} = \nabla_{\mu}A^{\nu}$, but I'm not sure about the relative ordering of the indices on the LHS.

Well it's good to know I'm not the only one haha.

13. Jun 4, 2014

### Staff: Mentor

I don't think so either. I think stevendaryl's notation for $\nabla_u$ would be $\left( u^a \nabla_a A \right)^b$. This expression has only one free index, so it's a vector, whereas $\left( \nabla_a A \right)^b$ is, as you say, a rank-2 tensor (actually a 1-1 tensor since one index is upper and one is lower). Note that the first expression I wrote is just $u^a$ contracted with the second expression.

14. Jun 5, 2014

### stevendaryl

Staff Emeritus
I agree, but there is a way to think of connection coefficients (I'm never clear about the distinction between connection coefficients and Christoffel symbols--is one a special case of the other?) in terms of coordinate transforms:

Suppose you want to compute $\nabla_\mu A^\nu$ at a point $\mathcal{P}$. Since it's a tensor, you can compute it this way:
1. Let $x^i$ be a locally inertial coordinate system.
2. Let $L^i_\mu = \dfrac{\partial x^i}{\partial x^\mu}$ be the local transformation matrix between coordinates $x^i$ and $x^\mu$. Let $L^\mu_i$ be the inverse transformation matrix.
3. In terms of the $x^i$, we just have: $\nabla_i A^j = \partial_i A^j$
4. We can transform to the original coordinate system via: $\nabla_\mu A^\nu = L^\nu_j L^i_\mu \partial_i A^j$
5. Since $A^j = L^j_\lambda A^\lambda$, we end up getting an additional term due to differentiating $L^j_\lambda$.
6. So $\nabla_\mu A^\nu = \partial_\mu A^\nu + (L^\nu_j \partial_\mu L^j_\lambda) A^\lambda$

So we conclude: $\Gamma^\nu_{\mu \lambda} = L^\nu_j \partial_\mu L^j_\lambda$

So the $\Gamma$ can be thought of as the additional terms you get when you transform to a locally inertial coordinate system, compute the partial derivative, and transform back.
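
Steps 4 to 6 can be spelled out in one line of algebra. This is a sketch that assumes $L^\nu_j$ denotes the inverse matrix, so that $L^\nu_j L^j_\lambda = \delta^\nu_\lambda$, and uses the chain rule $L^i_\mu \partial_i = \partial_\mu$:

$$\begin{aligned}
\nabla_\mu A^\nu &= L^\nu_j L^i_\mu \, \partial_i \left( L^j_\lambda A^\lambda \right) = L^\nu_j \, \partial_\mu \left( L^j_\lambda A^\lambda \right) \\
&= L^\nu_j L^j_\lambda \, \partial_\mu A^\lambda + \left( L^\nu_j \, \partial_\mu L^j_\lambda \right) A^\lambda = \partial_\mu A^\nu + \left( L^\nu_j \, \partial_\mu L^j_\lambda \right) A^\lambda
\end{aligned}$$

The second term is exactly the product-rule contribution from differentiating the transformation matrix.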

15. Jun 5, 2014

### stevendaryl

Staff Emeritus
The thing that is confusing about it all is that while $A$ is a vector, the components of $A$ are scalars. Similarly, I would say that $\nabla A$ is a rank two tensor, while $\nabla_a A$, which is a single "row" of that tensor, is a vector, and $\nabla_a A^b$, which is a single "cell" of that row, is a scalar.

The problem is that people, for efficiency of notation, leave out the basis vectors. So when they write $A^b$ they sometimes mean a particular component of vector $A$, and sometimes mean the entire vector $A$. To me, if you mean the entire vector $A$, it's clearer to either just use $A$ (no indices at all), or to explicitly use basis vectors: $A = A^b e_b$, $\nabla = \omega^a \nabla_a$ (where $e_b$ and $\omega^a$ are basis vectors and co-vectors). In the expression $A^b e_b$, it's clear that $A^b$ means a component, not the entire vector.

If the basis vectors are made explicit, then $\nabla A$, the rank-two tensor, would be written as: $(\omega^a \nabla_a)(A^b e_b)$. Then it's clear that the meaning is:

$(\omega^a \nabla_a)(A^b e_b) = \omega^a (\nabla_a A^b) e_b + \omega^a A^b (\nabla_a e_b) = \omega^a e_b (\partial_a A^b) + \omega^a A^b \Gamma^c_{a b} e_c = (\omega^a e_b) (\partial_a A^b + \Gamma^b_{a c} A^c)$

In the above, the meaning of $\nabla_a A^b$ is the derivative of a component of a vector, not a component of the derivative of a vector, so you can replace $\nabla_a$ by the partial derivative.

Explicitly mentioning the basis vectors makes things clear, if long-winded. I guess I should say "unambiguous" rather than "clear", because it makes the expressions longer, and length works against clarity as much as ambiguity does.

16. Jun 5, 2014

### Staff: Mentor

As I understand it, yes, "connection coefficients" is the general term, which applies in any basis, and "Christoffel symbols" is the specific term for the gammas we've been writing, which assume a coordinate basis. If we allow a more general (non-coordinate) basis, there are extra terms in the formula for the connection coefficient (in addition to the derivatives of the metric) involving the commutation coefficients (because the basis vectors may not all commute). MTW discusses this in some detail in a fairly early chapter.

Ah, I see. And if we were computing the covariant derivative of a covector $\nabla_{\mu} A_{\nu}$ this way, we would end up computing the derivative of the inverse matrix, $\partial_{\mu} L^{\lambda}_j$, which would lead to the change in sign.
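
A sketch of that covector calculation, under the same assumptions as the vector case ($L^\lambda_j$ is the inverse of $L^j_\lambda$, and $A_j = L^\lambda_j A_\lambda$ in the locally inertial coordinates):

$$\begin{aligned}
\nabla_\mu A_\nu &= L^i_\mu L^j_\nu \, \partial_i \left( L^\lambda_j A_\lambda \right) = \partial_\mu A_\nu + L^j_\nu \left( \partial_\mu L^\lambda_j \right) A_\lambda \\
0 = \partial_\mu \left( L^\lambda_j L^j_\nu \right) \quad &\Rightarrow \quad L^j_\nu \, \partial_\mu L^\lambda_j = - L^\lambda_j \, \partial_\mu L^j_\nu = -\Gamma^\lambda_{\mu \nu}
\end{aligned}$$

so that $\nabla_\mu A_\nu = \partial_\mu A_\nu - \Gamma^\lambda_{\mu \nu} A_\lambda$. The sign flips because the derivative of an inverse matrix carries a minus sign.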

17. Jun 5, 2014

### WannabeNewton

Connection coefficients apply to arbitrary affine connections over vector bundles or more generally fibre bundles. Christoffel symbols are the special case wherein the affine connection arises from a metric and, in GR, is torsion-free.

By the way $\nabla = e^{\mu}\nabla_{\mu}$ is a complete abuse of notation; $\nabla_{\mu}$ is not a covector.

18. Jun 5, 2014

### bcrowell

Staff Emeritus
Index-free notation becomes extremely awkward for complicated expressions. Abstract index notation gives you the best of both worlds: the coordinate-independence of index-free notation and the convenience of index notation.

19. Jun 5, 2014

### Bill_K

What the heck, everyone else has taken a turn proving this, so I might as well too (especially since my way is simpler.)

The minus sign is there precisely because of the product rule, as we will now show. For arbitrary vectors $A^a$ and $B_a$, their inner product is a scalar, and the covariant derivative of a scalar must equal the partial derivative.

Define the usual Christoffel symbol $\Gamma$ by the contravariant case, $A^a{}_{;b} = A^a{}_{,b} + \Gamma^a{}_{cb} A^c$, and just for laughs suppose there's something different in the covariant case, $A_{a;b} = A_{a,b} + \Omega^c{}_{ab} A_c$. Then

$$(A^a B_a)_{;c} = \left( A^a{}_{,c} + \Gamma^a{}_{dc} A^d \right) B_a + A^d \left( B_{d,c} + \Omega^e{}_{dc} B_e \right) = (A^a B_a)_{,c} + A^d B_e \left( \Gamma^e{}_{dc} + \Omega^e{}_{dc} \right)$$

In order for the covariant derivative of a scalar to equal the partial derivative for all A and B, the last two terms must cancel:

$$\Omega^e{}_{dc} = - \Gamma^e{}_{dc}.$$

Thus the minus sign.

20. Jun 5, 2014

### stevendaryl

Staff Emeritus
I agree that it's not a covector, since its "components" are differential operators, rather than numbers, but writing it that way makes it clear that $\nabla A$ can be contracted with two vectors to produce a scalar. It is true that if $A$ is a vector, then $\nabla A$ = $\omega^a \otimes e_b (\nabla_a A)^b$, so I don't see any downside to writing it as an operator $\omega^a \nabla_a$ acting on a vector $e_b A^b$. I might be convinced that it's a bad notation if you give me an example of where it would cause confusion. Do you have one in mind?

21. Jun 5, 2014

### stevendaryl

Staff Emeritus
That's sort of what I said, in terms of basis vectors and co-vectors: Since $\omega^i (e_j) = \delta^i_j$, which is a constant (for a particular $i$ and $j$), it must be that $\nabla_k (\omega^i (e_j)) = 0$. So if $\nabla_k e_j = \Gamma^l_{kj} e_l$, then $\nabla_k \omega^i$ has to produce the opposite sign.