# Why does MTW keep calling the "product rule" the "chain rule"?

#### SiennaTheGr8

> Perhaps if you ignore mathematical niceties

I do!

#### PAllen

> Perhaps if you ignore mathematical niceties like:
>
> The product rule requires that the functions share a domain.
>
> The chain rule requires the range of one function to be a subset of the domain of the other.
>
> Function composition is fundamentally different from the simple product of functions.

It's perfectly easy to make the derivation of the product rule from the multivariate chain rule completely rigorous.
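PAllen's point can be made concrete: the product $f(t)g(t)$ is the composition $t \mapsto m(f(t), g(t))$ with the multiplication map $m(x, y) = xy$, so the multivariate chain rule gives $\frac{d}{dt}(fg) = \partial_x m\, f' + \partial_y m\, g' = g f' + f g'$. The Python sketch below checks this numerically with central differences; the test functions $\sin$ and $\exp$, the point $t = 0.7$, and the step size are arbitrary choices for illustration, not anything from the thread.

```python
import math


def derivative(func, t, h=1e-5):
    """Central-difference approximation of d(func)/dt at t."""
    return (func(t + h) - func(t - h)) / (2 * h)


def multiply(x, y):
    """The multiplication map m(x, y) = x * y, viewed as a function of two variables."""
    return x * y


def chain_rule_product(f, g, t, h=1e-5):
    """d/dt m(f(t), g(t)) via the multivariate chain rule:
    (dm/dx) * f'(t) + (dm/dy) * g'(t), with the partials of m taken numerically."""
    x, y = f(t), g(t)
    dm_dx = (multiply(x + h, y) - multiply(x - h, y)) / (2 * h)  # equals y
    dm_dy = (multiply(x, y + h) - multiply(x, y - h)) / (2 * h)  # equals x
    return dm_dx * derivative(f, t, h) + dm_dy * derivative(g, t, h)


f, g, t = math.sin, math.exp, 0.7
direct = derivative(lambda s: f(s) * g(s), t)    # differentiate the product directly
via_chain = chain_rule_product(f, g, t)          # chain rule through m(x, y) = x * y
product_rule = math.cos(t) * g(t) + f(t) * g(t)  # f'g + fg' by hand (here g' = g)
print(direct, via_chain, product_rule)
```

All three numbers agree to within the finite-difference error, which is the sense in which the product rule is a special case of the chain rule.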

#### vanhees71

Gold Member
It's fascinating how one can get hooked on a debate about the somewhat sloppy naming of a standard mathematical theorem already known from high school, as if there were nothing else of real interest in this great textbook on GR...

#### fresh_42

Mentor
2018 Award
> It's fascinating how one can get hooked on a debate about the somewhat sloppy naming of a standard mathematical theorem already known from high school, as if there were nothing else of real interest in this great textbook on GR...

The reason is: some like the thought "Look, even the big ones make mistakes!" (not that they had ever claimed otherwise), and others jump in to save their reputation (not that that would be necessary). And I have been fascinated by the mathematical questions of whether the chain rule can be associated with chain complexes, or, more simply, why the bilinear, associative $f\circ g$ as a multiplication instruction doesn't automatically show that the chain rule is an instance of the product rule. But I admit, wrong forum.

#### vanhees71

Gold Member
Yes, well, I have written enough manuscripts to be very lenient about trivial typos or just bad formulations. The overall concept of MTW and the presentation of the material, however, are outstanding.

#### Orodruin

Staff Emeritus
Homework Helper
Gold Member
2018 Award
> Yes, well, I have written enough manuscripts to be very lenient about trivial typos or just bad formulations. The overall concept of MTW and the presentation of the material, however, are outstanding.

That moment when you are about to explain to your students that a time integral is missing in a course book and show the "correct" version in your own book, just to realise that, although you remembered the dt unlike the other book, the integral sign with the limits is missing from your book as well ...


#### fresh_42

Mentor
2018 Award
> Yes, well, I have written enough manuscripts to be very lenient about trivial typos or just bad formulations. The overall concept of MTW and the presentation of the material, however, are outstanding.

I forgot to mention the "internet effect". I remember a thread that I thought would be closed instantly, or at least after a few answers. It felt as if it lasted eternally. The question was whether zero is a real number, or some nonsense like that.

The internet effect: The more ridiculous the subject, the longer the thread! (*)

_________
(*) I do not claim that the reverse is true. So go ahead with QM interpretation discussions!

#### vanhees71

Gold Member
Though QM interpretation discussions do tend to confirm your observed rule about the length of threads ;-))). SCNR.

#### FreeThinking

Ok. That's why I asked the question, rather than assuming they were wrong. That's the kind of thing I meant when I asked, "What am I missing?" So the two rules are connected in this way. I just never spotted that in all my Googling.

I will say that the connection seems a bit obscure to me, personally, because I've been reading a lot of physics books for a very long time and I don't remember any other books saying "chain rule" and then writing equations that were simply the "product rule" without any sign of a "chain" in them. I've only seen this in MTW. But, at least, this reference shows that there is some justification for the connection, so now I consider myself better educated on the matter.

Thank you very much, SiennaTheGr8 .

#### FreeThinking

***** WARNING: LONG POST *****

NOTE: The questions I ask in this post are rhetorical. It was just an easy way to describe what I was thinking at the time. Please don't waste your time trying to answer them. Today I have answers to most of them anyway.

> > ... it would definitely be helpful if the authors would at least get the terminology correct and/or give more steps in the derivation. Their cavalier use of the terms "covariant" and "directional" derivatives has caused me no end of grief.
>
> What in particular is confusing you about those terms? Can you give an example from MTW (or another textbook, if that's easier) of a usage you find confusing?

I want to be clear that, despite how it may sometimes sound, I am in no way, shape, or form blaming MTW or any other author or book for my difficulties in understanding this subject. Every reader/student has a different level of background, understanding, and skill. Every author must decide for themselves who their target audience is. I just happen to be on the outer fringes of the target audience of MTW, so their book is very challenging for me. It is simply a fact, not a complaint, that MTW, and others as well, seem to have used the terms "covariant derivative" and "directional derivative", as well as other terms, in a way that has confused me. I am not talking about the occasional typographical mistakes, but only what appears to be deliberate uses.

For examples of what I mean, look at the following:

MTW page 208 and 209, which is section 8.5, Parallel transport, covariant derivative, connection coefficients, geodesics. On page 208, 2nd paragraph, they say that the gradient of a tensor field is $\boldsymbol \nabla \boldsymbol T$. So that defines the gradient operator as $\boldsymbol \nabla$. In elementary calculus, including 3-D, Cartesian vector calculus, I was taught that the gradient operator is defined as $\boldsymbol \nabla \equiv \left ( \frac {\partial} {\partial x^i} \right ) {\boldsymbol {\hat x}}_i$, a differential operator that is a vector. We write it that way to emphasize that the partial derivative does not operate on the basis vector. A very important point is that the gradient operator has to include the basis vectors of the coordinate system being used because they will be used to form the dot product with the basis vectors in the tangent vector of the curve to produce the total derivative, with respect to the parameter of the curve, of the given scalar field along a given curve. Later I learned that this doesn't work well in an arbitrary, curvilinear coordinate system, so it is convenient to change the definition of the gradient operator to be a one-form $\boldsymbol \nabla \equiv \left ( \frac {\partial} {\partial \xi^\gamma} \right ) {\boldsymbol {\widetilde \xi}}^\gamma$.
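The Cartesian picture described here, where the gradient dotted with the curve's tangent vector gives the derivative of a scalar field along the curve, can be checked with a short numerical sketch. The scalar field $\phi = x^2 y + z$, the helix $P(\lambda) = (\cos\lambda, \sin\lambda, \lambda)$ and the step sizes are hypothetical choices; both printed numbers should agree with $d\phi/d\lambda = -2\cos\lambda\sin^2\lambda + \cos^3\lambda + 1$.

```python
import math


def phi(x, y, z):
    """A sample scalar field (hypothetical choice)."""
    return x * x * y + z


def curve(lam):
    """A sample curve P(lambda) (hypothetical choice): a helix."""
    return (math.cos(lam), math.sin(lam), lam)


def tangent(lam):
    """Tangent vector u = dP/dlambda, worked out by hand for the helix."""
    return (-math.sin(lam), math.cos(lam), 1.0)


def gradient(field, point, h=1e-6):
    """Cartesian gradient components d(field)/dx^i by central differences."""
    comps = []
    for i in range(3):
        plus, minus = list(point), list(point)
        plus[i] += h
        minus[i] -= h
        comps.append((field(*plus) - field(*minus)) / (2 * h))
    return comps


lam, h = 0.4, 1e-6
# d(phi)/d(lambda) computed directly along the curve
along_curve = (phi(*curve(lam + h)) - phi(*curve(lam - h))) / (2 * h)
# gradient dotted with the tangent vector
grad_dot_u = sum(gi * ui for gi, ui in zip(gradient(phi, curve(lam)), tangent(lam)))
print(along_curve, grad_dot_u)
```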

So, if that's true, then letting $\boldsymbol T = T^\rho_\lambda {\boldsymbol e}_\rho \otimes {\boldsymbol \omega}^\lambda$, we can write $\boldsymbol \nabla {\boldsymbol T} = \left (\frac {\partial{\boldsymbol T}} {\partial \xi^\gamma} \right ) \otimes {\boldsymbol {\widetilde \xi}}^\gamma = \left ( {\nabla}_\gamma T^\rho_\lambda \right ) {\boldsymbol e}_\rho \otimes {\boldsymbol \omega}^\lambda \otimes {\boldsymbol {\widetilde \xi}}^\gamma = \left ( T^\rho_{\lambda;\gamma} \right ) {\boldsymbol e}_\rho \otimes {\boldsymbol \omega}^\lambda \otimes {\boldsymbol {\widetilde \xi}}^\gamma$.
Note that the nabla symbol without the subscript is bolded while the one with the subscript is not bolded. The expressions $\nabla_\gamma T^\beta_\alpha$ and $T^\beta_{\alpha ; \gamma}$ are what I understood to be the covariant derivative of the components of the given tensor. Nothing else is the covariant derivative, just those two ways of expressing it. In particular, $\boldsymbol \nabla$ is definitely NOT the covariant derivative. You cannot take the covariant derivative of a tensor, only of the components of a tensor. That was my understanding when I arrived at MTW's front door. Yet, in the next paragraph they say, "First define the "covariant derivative" $\boldsymbol \nabla_{\boldsymbol u} \boldsymbol T$ of $\boldsymbol T$ along a curve $P(\lambda)$, whose tangent vector is $\boldsymbol u = \frac {dP}{d\lambda}$." The expression $\boldsymbol \nabla_{\boldsymbol u} \boldsymbol T$ is not a covariant derivative. It contains a covariant derivative, but it is not itself a covariant derivative, at least according to my understanding of the definition of a covariant derivative. In fact, it is not even a gradient, per the definition above given by MTW themselves. It is a directional derivative. It contains a gradient, which contains a covariant derivative, but neither the directional derivative nor the gradient is itself a covariant derivative.
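For reference, here is a sketch of the standard component form of the "gamma correction terms" for a $(1,1)$ tensor in a general basis $\{\boldsymbol e_\alpha\}$ with dual basis $\{\boldsymbol\omega^\alpha\}$ (which takes the place of $\boldsymbol{\widetilde\xi}^\gamma$ above when the basis is not a coordinate basis). It assumes the index convention $\boldsymbol\nabla_{\boldsymbol e_\gamma}\boldsymbol e_\beta = \Gamma^\alpha{}_{\beta\gamma}\,\boldsymbol e_\alpha$ and reads the comma as the directional derivative along $\boldsymbol e_\gamma$; it is not a quotation from MTW. It shows how the component expression $T^\rho{}_{\lambda;\gamma}$, the tensor $\boldsymbol\nabla\boldsymbol T$, and the directional derivative $\boldsymbol\nabla_{\boldsymbol u}\boldsymbol T$ are related.

```latex
\begin{aligned}
\boldsymbol\nabla_{\boldsymbol e_\gamma}\boldsymbol e_\beta &= \Gamma^\alpha{}_{\beta\gamma}\,\boldsymbol e_\alpha,
\qquad
\boldsymbol\nabla_{\boldsymbol e_\gamma}\boldsymbol\omega^\alpha = -\Gamma^\alpha{}_{\beta\gamma}\,\boldsymbol\omega^\beta,\\
T^\rho{}_{\lambda;\gamma} &= T^\rho{}_{\lambda,\gamma}
  + \Gamma^\rho{}_{\mu\gamma}\,T^\mu{}_\lambda
  - \Gamma^\mu{}_{\lambda\gamma}\,T^\rho{}_\mu,
\qquad T^\rho{}_{\lambda,\gamma} \equiv \partial_{\boldsymbol e_\gamma} T^\rho{}_\lambda,\\
\boldsymbol\nabla\boldsymbol T &= T^\rho{}_{\lambda;\gamma}\,
  \boldsymbol e_\rho\otimes\boldsymbol\omega^\lambda\otimes\boldsymbol\omega^\gamma,
\qquad
\boldsymbol\nabla_{\boldsymbol u}\boldsymbol T = u^\gamma\,T^\rho{}_{\lambda;\gamma}\,
  \boldsymbol e_\rho\otimes\boldsymbol\omega^\lambda .
\end{aligned}
```

The middle line follows from expanding $\boldsymbol\nabla_{\boldsymbol e_\gamma}\bigl(T^\rho{}_\lambda\,\boldsymbol e_\rho\otimes\boldsymbol\omega^\lambda\bigr)$ with the first line and relabeling dummy indices; the last line shows that $\boldsymbol\nabla_{\boldsymbol u}\boldsymbol T$ is $\boldsymbol\nabla\boldsymbol T$ with its last (derivative) slot filled by $\boldsymbol u$.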

Now one could say, "Well, they're defining the covariant derivative to be the directional derivative along the curve, etc." Or, one could also say, "Well, let's not get too hung up on the exact wording, it should be clear what they mean from the equations." What I would say is, "Fine. If we're going to define terms differently from other books and/or ignore the wording & just look at the math, I can do that. But it would be helpful to me if that plan were explicitly stated ahead of time." Over the past six months or so, my understanding of the terms & notation has improved greatly. So now as I review MTW trying to pick up where I left off before all the confusion set in, I'm beginning to see how to interpret their wording correctly and so I'm now less confused about what they're saying. But that's due to my very hard-won, expanded insight that required a lot of study & self-help outside of MTW. Again, I'm not criticizing them, I'm just pointing out that this was a point of confusion for me when I first encountered it because I was still not sure of MTW's definition of things.

Also, starting with the last two paragraphs at the bottom of page 208, we establish that $\boldsymbol {e}_\beta$ and $\boldsymbol {\omega}^\alpha$ are general bases dual to each other. Continuing onto page 209, equation 8.19a says that ${\boldsymbol \nabla}_\gamma \equiv {\boldsymbol \nabla}_{{\boldsymbol e}_\gamma}$ . Then further down the page, equation 8.20 defines $T^\beta_{\alpha,\gamma} \equiv {\boldsymbol \nabla}_\gamma T^\beta_\alpha \equiv \partial_{{\boldsymbol e}_\gamma} T^\beta_\alpha \equiv \partial_\gamma T^\beta_\alpha$. With a general basis, not a local Lorentz frame, why are we defining the directional derivative ${\boldsymbol \nabla}_{{\boldsymbol e}_\gamma} \equiv {\boldsymbol \nabla}_\gamma$ to be a partial derivative $T^\beta_{\alpha,\gamma} \equiv \partial_\gamma T^\beta_\alpha$? If we were using a coordinate basis, say $\left \lbrace {\boldsymbol {\xi}_\gamma} \right \rbrace$, it would make sense since ${\boldsymbol {\xi}_\gamma} \equiv {\boldsymbol \nabla}_{{\boldsymbol e}_\gamma}$, the directional derivative operator along the coordinate curve ${\boldsymbol {\xi}_\gamma}$. Perhaps if we stare at this section long enough, it might dawn on us what they actually mean. I get the gist of the section. I understand the gamma correction terms, but I'm just not sure what justifies the way they write some of the equations. While writing this post, I tried to work through equation 8.19a & 8.20, but I'm still not getting the same result they seem to get. So, this confused me when I first encountered it and it still seems to be confusing me now.

Continuing on to page 210, 1st paragraph, equation 8.22: And we're back now with something that exactly matches what I would expect for the directional derivative of the tensor field $\boldsymbol T$, but again MTW calls it the covariant derivative. Ok, is this just what they call it, or have I misunderstood something? Considering some of the ways they have defined the directional derivative, as described in my previous paragraphs, as a newbie I was just not sure what I was not understanding. So not only did the words not match, even the equations did not seem totally consistent to me.

On page 253, it says: Any "rule" $\boldsymbol \nabla$, for producing new vector fields from old, ... is called by differential geometers a "symmetric covariant derivative." So maybe their constantly calling the bold nabla the covariant derivative is standard. But then, what do we call the part with the semicolon? You know, the part with the gamma correction terms? Then on page 255, last paragraph in the box, labeled "B.", it says: "The machine $\boldsymbol \nabla$ differs from a tensor in two ways. ...(2) $\boldsymbol \nabla$ is not a linear machine (whereas a tensor must be linear!)". What? I thought the whole point of the covariant derivative was to have a derivative that produced results that could be used as the components of a tensor. If the covariant derivative is a tensor or has the components of a tensor, it's linear. Yes? No? Which is it? So, that really sent me scrambling all the way back to Schaum's Outline on Vector Analysis by Murray R. Spiegel, where I first encountered the covariant derivative as that thing with the gamma correction terms. Yes, well, OK. The thing with the gamma correction terms is supposed to transform like the components of a tensor. So what does MTW mean when they say $\boldsymbol \nabla$ is not a tensor?
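One concrete sense in which the bare $\boldsymbol\nabla$ does not behave like a tensor is the Leibniz (product) rule it obeys when a vector field is multiplied by a function: $\boldsymbol\nabla_{\boldsymbol u}(f\boldsymbol v) = f\,\boldsymbol\nabla_{\boldsymbol u}\boldsymbol v + (\partial_{\boldsymbol u} f)\,\boldsymbol v$, so $f$ cannot simply be pulled out of the $\boldsymbol v$ slot the way it could be pulled out of a tensor's slot. The Python sketch below checks this numerically, assuming flat 2-D space with Cartesian components so that every $\Gamma$ vanishes and $\boldsymbol\nabla_{\boldsymbol u}$ reduces to differentiating each component along $\boldsymbol u$; the fields $f$, $\boldsymbol v$ and the vector $\boldsymbol u$ are hypothetical. Whether this is exactly what Box 10.3 has in mind is an assumption here.

```python
import math


def f(x, y):
    """A sample scalar field (hypothetical)."""
    return x * x + math.sin(y)


def v(x, y):
    """A sample vector field, Cartesian components (hypothetical)."""
    return (x * y, math.cos(x))


def nabla_u(field, point, u, h=1e-6):
    """Derivative of each component of field along u, by central differences.
    With Cartesian components in flat space all Gamma vanish, so this is nabla_u."""
    x, y = point
    plus = field(x + h * u[0], y + h * u[1])
    minus = field(x - h * u[0], y - h * u[1])
    return tuple((p - m) / (2 * h) for p, m in zip(plus, minus))


def fv(x, y):
    """The product field f * v."""
    return tuple(f(x, y) * c for c in v(x, y))


point, u = (0.3, -1.2), (0.5, 2.0)
lhs = nabla_u(fv, point, u)                               # nabla_u (f v)
df_u = nabla_u(lambda x, y: (f(x, y),), point, u)[0]      # u(f), the derivative of f along u
rhs = tuple(f(*point) * a + df_u * b
            for a, b in zip(nabla_u(v, point, u), v(*point)))  # f nabla_u v + u(f) v
naive = tuple(f(*point) * a for a in nabla_u(v, point, u))     # pulling f out of the slot
print(lhs, rhs, naive)
```

`lhs` matches `rhs` but not `naive`: the extra $(\partial_{\boldsymbol u} f)\,\boldsymbol v$ term is exactly what a linear (tensorial) slot would not produce.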

By the time I reached page 271, equation 11.8, I was too confused about what the bold nabla symbol meant. Trying to understand (11.8), I tried several different definitions of $\boldsymbol \nabla$, but nothing worked. Somewhere around here and slightly beyond, I was not understanding the math at all, so I Googled. One of the things that popped up was Carroll's Lecture Notes. On pages 75 through 77 I found enough information to eventually figure out a consistent set of terms & notations for all these derivatives that actually seems to give me the same answers as Carroll and makes some sense of MTW. I don't know if my set of definitions is the same as the mainstream or if they even make sense to anyone else, but they make sense to me. The task(s) I'm working on now is to go back through MTW, identify all the places that confused me before, and reread & rewrite MTW's text & math in my notation to see if it makes MTW any more sensible to me. Preliminary experience indicates that MTW can be rewritten to make more sense to me, but there are still a lot of places that don't. So, I may have finally found the right definitions and can start making some progress.

This post is not meant to be a rigorous proof that MTW is inconsistent or sloppy in their notation. Probably anyone who has any business reading MTW would not be confused, because they are either smart enough to figure it out without a lot of hand-holding (that ain't me) or they've already mastered the math (also not me) & MTW is just intended to show them how to apply it to general relativity. So most readers here will probably wonder how I could have gotten so confused. But for this particular, self-guided hobbyist, it was enough to hold me up for quite a while. And now that I'm in review & recovery mode, I see better what they mean, but I still feel that some of their wording & even the math is inconsistent in some places. I'm confident that that fact means I still don't know what I'm doing.

Finally, I'd like to echo one sentiment of an earlier poster:
> The overall concept of MTW and the presentation of the material, however, are outstanding.

If I did not agree, I would never have spent so much time trying to understand it. I have found no other book that covers as much material, even if the coverage is a challenge for the likes of me. Well, it keeps me off the streets at night.

#### FreeThinking

> Also, starting with the last two paragraphs at the bottom of page 208, we establish that $\boldsymbol {e}_\beta$ and $\boldsymbol {\omega}^\alpha$ are general bases dual to each other. Continuing onto page 209, equation 8.19a says that ${\boldsymbol \nabla}_\gamma \equiv {\boldsymbol \nabla}_{{\boldsymbol e}_\gamma}$ . Then further down the page, equation 8.20 defines $T^\beta_{\alpha,\gamma} \equiv {\boldsymbol \nabla}_\gamma T^\beta_\alpha \equiv \partial_{{\boldsymbol e}_\gamma} T^\beta_\alpha \equiv \partial_\gamma T^\beta_\alpha$. With a general basis, not a local Lorentz frame, why are we defining the directional derivative ${\boldsymbol \nabla}_{{\boldsymbol e}_\gamma} \equiv {\boldsymbol \nabla}_\gamma$ to be a partial derivative $T^\beta_{\alpha,\gamma} \equiv \partial_\gamma T^\beta_\alpha$? If we were using a coordinate basis, say $\left \lbrace {\boldsymbol {\xi}_\gamma} \right \rbrace$, it would make sense since ${\boldsymbol {\xi}_\gamma} \equiv {\boldsymbol \nabla}_{{\boldsymbol e}_\gamma}$, the directional derivative operator along the coordinate curve ${\boldsymbol {\xi}_\gamma}$. Perhaps if we stare at this section long enough, it might dawn on us what they actually mean. ... I tried to work through equation 8.19a & 8.20, but I'm still not getting the same result they seem to get.

Ok, I've stared at it a while longer, and here's what I'm seeing:

Based on how MTW defines things, as described above, I get $\boldsymbol \nabla_\gamma T^\beta_\alpha = \Lambda^\mu_\gamma T^\beta_{\alpha,\mu}$, using $\boldsymbol e_\gamma = \Lambda^\sigma_\gamma \boldsymbol \xi_\sigma$ where $\boldsymbol \xi_\sigma$ are the coordinate basis vectors. But if I replace the $\boldsymbol e_\gamma$ with $\boldsymbol \xi_\gamma$, I get $\boldsymbol \nabla_\gamma T^\beta_\alpha = T^\beta_{\alpha,\gamma}$ which seems to be what MTW says it should be.
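The factor $\Lambda^\mu{}_\gamma$ found here is what one gets whenever the comma is read, as in the (8.20) definition quoted above, as the directional derivative $\partial_{\boldsymbol e_\gamma}$ along a basis vector that is not a coordinate basis vector. A minimal numerical sketch, assuming the flat Euclidean plane in polar coordinates $(r, \theta)$ and the orthonormal (non-coordinate) basis $\boldsymbol e_{\hat r} = \partial_r$, $\boldsymbol e_{\hat\theta} = \frac{1}{r}\partial_\theta$, so that the only nontrivial $\Lambda$ factor is $1/r$: the derivative of a scalar component function along $\boldsymbol e_{\hat\theta}$ comes out as $\frac{1}{r}\,\partial f/\partial\theta$, not $\partial f/\partial\theta$. The function $f = x^2 y$ is a hypothetical stand-in for one component $T^\beta{}_\alpha$.

```python
import math


def f_cartesian(x, y):
    """Hypothetical stand-in for one component function T^beta_alpha."""
    return x * x * y


def f_polar(r, theta):
    """The same function expressed in polar coordinates."""
    return f_cartesian(r * math.cos(theta), r * math.sin(theta))


def along(vec, point, h=1e-6):
    """Directional derivative of f_cartesian along a vector given in Cartesian components."""
    x, y = point
    return (f_cartesian(x + h * vec[0], y + h * vec[1])
            - f_cartesian(x - h * vec[0], y - h * vec[1])) / (2 * h)


r, theta, h = 2.0, 0.3, 1e-6
point = (r * math.cos(theta), r * math.sin(theta))
e_theta_hat = (-math.sin(theta), math.cos(theta))   # unit vector, equal to (1/r) d/dtheta
comma_theta_hat = along(e_theta_hat, point)         # f_{,theta-hat} = derivative along e_theta-hat
partial_theta = (f_polar(r, theta + h) - f_polar(r, theta - h)) / (2 * h)  # df/dtheta
print(comma_theta_hat, partial_theta / r, partial_theta)
```

The first two numbers agree and the third is larger by the factor $r = 2$; with a coordinate basis ($\boldsymbol e_\theta = \partial_\theta$) the factor would disappear and the comma would be the plain partial derivative.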

But, I see several problems with this:
• MTW has used $\boldsymbol \nabla$ in such a way that it generates gamma correction terms when applied to a general tensor. But applying it to just the components of a tensor does not generate those terms unless we interpret it as the semicolon operator, which they do not seem to do in (8.20).
• MTW just defined $\boldsymbol e_\beta$ to be a general basis, not necessarily a coordinate basis. Yet in (8.20) the $\Lambda^\sigma_\gamma$ needed to define the general basis is nowhere to be found. It is as if MTW has suddenly changed $\boldsymbol e_\beta$ to be a coordinate basis.

This is a case where the math itself confuses me even if we ignore the text. Which is why, when I arrive at other places in MTW that use the nabla symbol, I'm never sure what they mean at that particular point. I have to work the problem multiple ways until I stumble on the same result.

So, this is a question I would like to have answered: How is one to think about this? Is it a typo? Have they just switched back to using e as a coordinate basis? Is $\boldsymbol \nabla_\gamma$ intended to be just the simple, elementary, partial derivative at this particular point in the text? Or, which I always consider to be the most likely case, what am I not understanding?

#### PeterDonis

Mentor
> Based on how MTW defines things, as described above, I get $\boldsymbol \nabla_\gamma T^\beta_\alpha = \Lambda^\mu_\gamma T^\beta_{\alpha,\mu}$, using $\boldsymbol e_\gamma = \Lambda^\sigma_\gamma \boldsymbol \xi_\sigma$ where $\boldsymbol \xi_\sigma$ are the coordinate basis vectors.

I don't understand what you're doing here. The Lorentz transformation $\Lambda$ doesn't appear anywhere in the section of MTW you're referring to, and anyway you don't use a Lorentz transformation to transform from local inertial coordinates to general curvilinear coordinates.

> But if I replace the $\boldsymbol e_\gamma$ with $\boldsymbol \xi_\gamma$, I get $\boldsymbol \nabla_\gamma T^\beta_\alpha = T^\beta_{\alpha,\gamma}$ which seems to be what MTW says it should be.

I don't understand what you're doing here either. It doesn't help that you're throwing in your own notation $\boldsymbol \xi_\gamma$, which doesn't appear anywhere in MTW. MTW always uses $\boldsymbol e$ for the basis vectors, not $\boldsymbol \xi$.

> MTW has used $\boldsymbol \nabla$ in such a way that it generates gamma correction terms when applied to a general tensor. But applying it to just the components of a tensor does not generate those terms unless we interpret it as the semicolon operator, which they do not seem to do in (8.20).

You are confused. You don't apply the $\boldsymbol \nabla$ operator to the components of a tensor.

$\boldsymbol \nabla$, by itself, with no subscripts, is a differential operator that takes an $(m, n)$ tensor (a tensor with $m$ upper indexes and $n$ lower indexes, or, in MTW's coordinate-free terminology, a tensor with $m$ slots that accept 1-forms and $n$ slots that accept vectors) to an $(m, n+1)$ tensor. (This is all explained in section 3.5 of MTW.) In other words, if I have a tensor $\boldsymbol T$, then $\boldsymbol \nabla \boldsymbol T$ is another tensor with one more lower index. Applying $\boldsymbol \nabla$ by itself to the components of a tensor makes no sense.

If I want to express $\boldsymbol \nabla \boldsymbol T$ in component notation, and $\boldsymbol T$ is a $(1, 1)$ tensor, i.e., in components it is $T^\alpha{}_\beta$, then the components of $\boldsymbol \nabla \boldsymbol T$ will be $T^\alpha{}_{\beta ; \gamma}$.

MTW also use the notation $\boldsymbol \nabla_{\boldsymbol u}$, i.e., $\boldsymbol \nabla$ with a subscript, to denote a different operator, the directional derivative along the 4-vector $\boldsymbol u$. In component notation, $\boldsymbol \nabla_{\boldsymbol u} \boldsymbol T$ is $u^\gamma T^\alpha{}_{\beta ; \gamma}$.

In neither case described above do we apply the operator $\boldsymbol \nabla$ (with or without a subscript) to the components of a tensor.
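A small worked check of these component statements, written as a Python sketch under stated assumptions: the flat Euclidean plane in polar coordinates $(r, \theta)$ with the coordinate basis $\partial_r, \partial_\theta$, whose only nonzero connection coefficients are $\Gamma^r{}_{\theta\theta} = -r$ and $\Gamma^\theta{}_{r\theta} = \Gamma^\theta{}_{\theta r} = 1/r$, and the constant Cartesian unit field $\boldsymbol V = \boldsymbol e_x$, whose polar components are $V^r = \cos\theta$, $V^\theta = -\sin\theta/r$. The $(1,0)$ tensor $\boldsymbol V$ turns into the $(1,1)$ array $V^\alpha{}_{;\gamma} = V^\alpha{}_{,\gamma} + \Gamma^\alpha{}_{\mu\gamma}V^\mu$ (the $(1,1)$ case above just adds one more $\Gamma$ term for the lower index), and filling the derivative slot with $\boldsymbol u$ gives $(\boldsymbol\nabla_{\boldsymbol u}\boldsymbol V)^\alpha = u^\gamma V^\alpha{}_{;\gamma}$. For a constant field the partial derivatives are nonzero, but the $\Gamma$ terms cancel them, so every covariant component should come out as zero.

```python
import math


def V(r, theta):
    """Polar-coordinate components (V^r, V^theta) of the constant Cartesian field e_x."""
    return (math.cos(theta), -math.sin(theta) / r)


def christoffel(r):
    """G[a][b][c] = Gamma^a_{bc} for flat polar coordinates (index 0 = r, 1 = theta)."""
    G = [[[0.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]]]
    G[0][1][1] = -r          # Gamma^r_{theta theta}
    G[1][0][1] = 1.0 / r     # Gamma^theta_{r theta}
    G[1][1][0] = 1.0 / r     # Gamma^theta_{theta r}
    return G


def partials(r, theta, h=1e-6):
    """dV[a][c] = V^a_{,c}: partial derivative of component a with respect to coordinate c."""
    d_r = [(p - m) / (2 * h) for p, m in zip(V(r + h, theta), V(r - h, theta))]
    d_th = [(p - m) / (2 * h) for p, m in zip(V(r, theta + h), V(r, theta - h))]
    return [[d_r[a], d_th[a]] for a in range(2)]


def covariant(r, theta):
    """V^a_{;c} = V^a_{,c} + Gamma^a_{mc} V^m: components of the (1,1) tensor nabla V."""
    dV, G, comps = partials(r, theta), christoffel(r), V(r, theta)
    return [[dV[a][c] + sum(G[a][m][c] * comps[m] for m in range(2))
             for c in range(2)] for a in range(2)]


def nabla_u(r, theta, u):
    """(nabla_u V)^a = u^c V^a_{;c}: fills the derivative slot with u."""
    cov = covariant(r, theta)
    return [sum(cov[a][c] * u[c] for c in range(2)) for a in range(2)]


r, theta = 1.5, 0.8
print(partials(r, theta))              # nonzero: plain partial derivatives of the components
print(covariant(r, theta))             # ~0: the Gamma terms cancel them
print(nabla_u(r, theta, (0.7, -2.0)))  # ~0 for any direction u
```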

> MTW just defined $\boldsymbol e_\beta$ to be a general basis, not necessarily a coordinate basis. Yet in (8.20) the $\Lambda^\sigma_\gamma$ needed to define the general basis is nowhere to be found.

I don't know where you're getting this from. You don't use a Lorentz transformation to go to general curvilinear coordinates. See above.

> This is a case where the math itself confuses me even if we ignore the text.

Have you encountered covariant derivatives in other textbooks? Have they confused you there?

For example, Carroll discusses covariant derivatives in his lecture notes. Were you able to follow his presentation?

#### FreeThinking

Peter: My apologies. I was trying to be brief and may have pulled an MTW on you. Just ignore it for now and don't spend any more time on it. I'm working on a longer version that will hopefully explain things more clearly. I've got things going on so it may be a few days before I can post it. I want to make sure I get it right this time.

#### FreeThinking

I have a question.

MTW says that the covariant derivative is a machine with slots that accept inputs and produce an output. Looking specifically at page 255, Box 10.3, part A, sub-parts 3 through 5, here's how I interpret what they're saying there:

$\boldsymbol \nabla$ is a machine, called the covariant derivative, with 3 slots. If we plug certain types of objects into the appropriate slots, we get out new machines depending on which slots we fill. These new machines also have slots that accept the proper kind of object. Depending on which slots we fill, we get different outputs: One selection gives us a directional derivative, another selection gives us a gradient, and filling all the slots gives us a number.

The key point of all of this is that the machine called the "directional derivative" and the machine called the "gradient" are both instances of the more general machine called "covariant derivative". A "directional derivative" is a "covariant derivative", but a "covariant derivative" is not necessarily a "directional derivative".

It's analogous to how a car, a truck, and a bus are each an instance of a motor vehicle, but not every motor vehicle is a car, not every one is a truck, etc.

So, when MTW writes the term "covariant derivative" but then writes a mathematical expression that looks all the world like a directional derivative, this practice is consistent with their definition of the "covariant derivative" being the generator of other kinds of derivatives.
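The "fill different slots, get different machines" picture can be imitated with ordinary functions. In the Python sketch below, `nabla` returns the components of $\boldsymbol\nabla\boldsymbol v$ at a point (two slots still empty), `fill_direction` fills the derivative slot with $\boldsymbol u$ to give $\boldsymbol\nabla_{\boldsymbol u}\boldsymbol v$, and `fill_form` fills the remaining slot with a 1-form $\boldsymbol\sigma$ to give a single number. The function names, the field $\boldsymbol v$, $\boldsymbol u$, and $\boldsymbol\sigma$ are hypothetical, and flat 2-D space with Cartesian components is assumed so that every $\Gamma$ vanishes.

```python
import math


def v(x, y):
    """A sample vector field with Cartesian components (hypothetical)."""
    return (x * y, math.exp(x) - y)


def nabla(field, point, h=1e-6):
    """grad[a][c] = d field^a / d x^c: components of the (1,1) tensor nabla v at point.
    With Cartesian components in flat space every Gamma is zero."""
    x, y = point
    cols = []
    for dx, dy in ((h, 0.0), (0.0, h)):
        plus, minus = field(x + dx, y + dy), field(x - dx, y - dy)
        cols.append([(p - m) / (2 * h) for p, m in zip(plus, minus)])
    return [[cols[c][a] for c in range(2)] for a in range(2)]


def fill_direction(grad, u):
    """Fill the derivative slot with u: returns the vector nabla_u v."""
    return [sum(grad[a][c] * u[c] for c in range(2)) for a in range(2)]


def fill_form(vec, sigma):
    """Fill the remaining 1-form slot with sigma: returns a number."""
    return sum(s * w for s, w in zip(sigma, vec))


point, u, sigma = (0.2, 1.1), (3.0, -0.5), (1.0, 2.0)
grad = nabla(v, point)                  # the "gradient": two slots still empty
directional = fill_direction(grad, u)   # the "directional derivative": one slot empty
number = fill_form(directional, sigma)  # all slots filled: a real number
print(grad, directional, number)
```

All three outputs come from the same `nabla` object; only the number of filled slots differs.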

Is this the correct view to take of how and why MTW keeps calling expressions that are the directional derivative by the name "covariant derivative"?

#### fresh_42

Mentor
2018 Award
All derivatives are directional derivatives per construction. However, we can consider the direction as a variable, a slot to be filled. This makes it a covariant derivative, since the direction hasn't been specified yet. The tricky point here, and what MTW tried to describe with the slot machine, is the fact that a derivative can be considered from many different perspectives, each resulting in a different object.

It is the path from the narrow high-school perspective of $f'(x)$ as a "slope" to $D_p f(v)$ as a covariant derivative. As you can see, we can consider the differential process $D$, the evaluation at a certain point $p$, or in a certain direction $v$, or all of them to get a number $D_pf(v)$. Even the function $f$ can be considered as a variable for the process $D$.
It is always more or less the same thing, only differing in the point of view. But the objects are different as well. We have e.g. $D_p(f+g)(v) =D_p(f)(v)+D_p(g)(v)$ and $D_p(f)(v+w)=D_p(f)(v)+D_p(f)(w)$, but the same cannot be done on the location level $p$. We also have the product rule $D_p(f\cdot g)(v)=f(p) \cdot D_p(g)(v)+D_p(f)(v)\cdot g(p)$, but this is not true on the direction level $v$. So depending on what you consider variable, you get different results from the slot machine $D$. And in the end, you can even consider all these on the component level with different coordinate systems.
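These properties can be spot-checked numerically. The Python sketch below approximates $D_p f(v)$ by a central difference for functions on $\mathbb R^2$ (the functions, points, and directions are hypothetical) and checks that $D$ is additive in $f$ and in $v$ and obeys the product rule $D_p(f\cdot g)(v) = f(p)\,D_p(g)(v) + D_p(f)(v)\,g(p)$, while $D_{p+q}f(v) \ne D_pf(v) + D_qf(v)$ in general.

```python
import math


def D(func, p, v, h=1e-6):
    """Central-difference approximation of D_p func(v): derivative of func at p in direction v."""
    plus = func(p[0] + h * v[0], p[1] + h * v[1])
    minus = func(p[0] - h * v[0], p[1] - h * v[1])
    return (plus - minus) / (2 * h)


def f(x, y):
    """Hypothetical sample function."""
    return math.sin(x) * y


def g(x, y):
    """Hypothetical sample function."""
    return x * x + math.exp(y)


p, q = (0.4, 0.9), (1.3, -0.2)
v, w = (1.0, 2.0), (-0.5, 0.3)
vw = (v[0] + w[0], v[1] + w[1])
pq = (p[0] + q[0], p[1] + q[1])

sum_in_f = D(lambda x, y: f(x, y) + g(x, y), p, v) - (D(f, p, v) + D(g, p, v))
sum_in_v = D(f, p, vw) - (D(f, p, v) + D(f, p, w))
leibniz = D(lambda x, y: f(x, y) * g(x, y), p, v) - (f(*p) * D(g, p, v) + D(f, p, v) * g(*p))
sum_in_p = D(f, pq, v) - (D(f, p, v) + D(f, q, v))
print(sum_in_f, sum_in_v, leibniz, sum_in_p)  # first three ~0, the last one is not
```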

Here's a list I once gathered:
and "slope" wasn't even mentioned. If you want to read more, have a look at

