# Transformation properties of derivative of a scalar field

1. Oct 11, 2012

### ianhoolihan

Hi all,

I'm a part III student and taking the QFT course. The following seems "trivial" but when I went and asked the lecturer, the comment was that they too hate such nitty gritty details!

The problem is page 12 of Tong's notes: http://www.damtp.cam.ac.uk/user/tong/qft/one.pdf

All we're doing is making an active transformation of a scalar field $x\to x'=\Lambda x$ such that $\phi(x)\to\phi'(x) = \phi(\Lambda^{-1}x)$. Correct me if I'm wrong, but an active rotation in this sense means we keep the axes fixed, and rotate the field. (Q1: if we're not changing the axes, what then does $x\to x' = \Lambda x$ even mean?) I can accept why the $\Lambda^{-1}$ appears, but I think this is a more accurate formulation:

$$\phi(x) \rightarrow \phi'(x') = \phi (\Lambda^{-1} x') = \phi (x)$$

So the previous statement should really be $\phi(x)\to\phi'(x') = \phi(\Lambda^{-1}x')$. (I.e. I've taken $\phi' = \phi \circ \Lambda^{-1}$ in some sense.) It's equivalent to the former, except one has now taken $x$ to mean $x'$, which I think confuses the (following) situation.
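
A minimal numerical sketch of the rule $\phi'(x) = \phi(\Lambda^{-1}x)$, assuming a boost in 1+1 dimensions with $c=1$ and a made-up Gaussian field (the names `boost`, `phi`, `phi_prime` and the velocity 0.6 are illustrative choices, not taken from Tong's notes). It checks that the value the original field has at a point $x_0$ is the value the transformed field has at $\Lambda x_0$:

```python
import math

def boost(v):
    """2x2 Lorentz boost matrix acting on (t, x), in units with c = 1."""
    gamma = 1.0 / math.sqrt(1.0 - v * v)
    return [[gamma, gamma * v], [gamma * v, gamma]]

def apply(matrix, point):
    """Multiply a 2x2 matrix into a 2-component point."""
    return tuple(sum(matrix[i][j] * point[j] for j in range(2)) for i in range(2))

def phi(point):
    """Hypothetical scalar field: a Gaussian bump centred on (t, x) = (0, 1)."""
    t, x = point
    return math.exp(-(t ** 2 + (x - 1.0) ** 2))

v = 0.6
lam = boost(v)        # Lambda
lam_inv = boost(-v)   # Lambda^{-1}: the boost with the opposite velocity

def phi_prime(point):
    """Transformed field phi'(x) = phi(Lambda^{-1} x)."""
    return phi(apply(lam_inv, point))

x0 = (0.0, 1.0)             # where phi has its peak
x0_moved = apply(lam, x0)   # Lambda x0

print(phi(x0), phi_prime(x0_moved))   # equal up to rounding: the peak has moved to Lambda x0
print(phi_prime(x0))                  # a different number: phi' is not the same function as phi
```

In this sketch the coordinates never change; only the function does, which is one way to read "keep the axes fixed and rotate the field".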

Onto derivatives. The statement given is simply that
$$(\partial_\mu \phi)(x) = (\Lambda^{-1})^\nu{}_\mu (\partial_\nu \phi) (y).$$

My first problem is that things are undefined: is $\partial_\nu$ on the right $\partial /\partial x^\nu$ or $\partial / \partial (x')^\nu$ or $\partial / \partial y^\nu$?

I have two different approaches to this:

A:
$$\partial_\mu \phi (x) \rightarrow \partial_{\mu'} \phi'(x') = \frac{\partial x^\nu}{\partial x^{\mu'}}\partial_\nu \phi(\Lambda^{-1}x') =(\Lambda^{-1})^\nu{}_{\mu'} \partial_\nu \phi(\Lambda^{-1}x')$$

where all I've done is change coordinates of the partial derivative. I was going to say that my problem would then be that if I evaluated the derivative, I'd get another $\Lambda^{-1}$ from the chain rule, but that would only have been if I hadn't used the more correct notation of $x'$ instead of $x$ in $\phi(\Lambda^{-1}x')$. This is, of course, $\phi(x)$, so evaluating the derivative involves no chain rule. Hence, by this method,

$$\partial_\mu \phi (x) \rightarrow \partial_{\mu'} \phi'(x') = (\Lambda^{-1})^\nu{}_{\mu'} \partial_\nu \phi(x)$$

This is similar to Tong's notes ($\partial_\nu \phi(x) = (\partial_\nu \phi)(x)$ as no chain rule) except that I have an $x$ instead of a $y$ in the final term.

B.

The next approach is to use the chain rule, and assuming that $\partial_\mu \to \partial_\mu$ i.e. the coordinate basis of differentiation does not change, and ignore my prior statements about $x$ versus $x'$ (which may have been incorrect). Then, letting $y= \Lambda^{-1} x$,

$$\partial_\mu \phi (x) \rightarrow \partial_\mu \phi(y) = \partial_\mu y^\nu (\partial_\nu \phi) (y) = (\Lambda^{-1})^\nu{}_\mu (\partial_\nu \phi) (y)$$

where I mean $(\partial_\nu \phi) (y)$ in the sense of $d f (g(x)) = d (g(x)) f' (g(x))$. This seems OK, except that it goes against some of my statements previously, and I'm also not sure if Tong means what I do by $(\partial_\nu \phi) (y)$.

I'd much appreciate a few quick comments on which scheme is correct, or, indeed, if both are wrong! I'm on the A-team...

Cheers.

2. Oct 11, 2012

### Fredrik

Staff Emeritus
I haven't looked at the problem involving the derivative yet. I will only try to explain the basics in this post. This stuff is easier when you have studied differential geometry and are used to thinking in terms of coordinate systems.

Let M be spacetime. Let p be an event, i.e. a point in spacetime. Let x and y denote two global coordinate systems. This means in particular that x and y are functions from spacetime into $\mathbb R^4$. Let f be a real-valued function with domain M.

Using the coordinate systems x and y, we can write
$$f(p)=f\circ x^{-1}(x(p))=f\circ y^{-1}(y(p)).$$ Now let's introduce the notations
$$\phi=f\circ x^{-1},\quad \phi'=f\circ y^{-1},\quad x=x(p),\quad x'=y(p).$$ Yes, I'm using the symbol x for two different things. Keep that in mind when you read the following. We have
$$f(p)=\phi(x)=\phi'(x').$$ Note that the second equality here follows trivially from our definitions. So $\phi(x)\to\phi'(x')$ wouldn't be any kind of "transformation".

I would call f a scalar field and $\phi$ and $\phi'$ coordinate representations of f with respect to the coordinate systems x and y respectively.

A Lorentz transformation is a change of coordinates. We like to write stuff like $x'=\Lambda x$, so $\Lambda$ denotes a change from the unprimed to the primed coordinates. In particular, when $\Lambda$ takes x(p) as input, the output will be y(p). So $\Lambda=y\circ x^{-1}$, and $\Lambda^{-1}=(y\circ x^{-1})^{-1}=x\circ y^{-1}$. This implies that
$$\phi'(x)=f\circ y^{-1}\circ x(p) =\underbrace{f\circ x^{-1}}_{=\phi} \circ \underbrace{x\circ y^{-1}}_{=\Lambda^{-1}} \circ x(p) =\phi(\Lambda^{-1}x).$$ The "active" transformation is the substitution $x\to x'=\Lambda x$, and the "passive" transformation is the inverse of this, i.e. $\phi(x)\to\phi(\Lambda^{-1}x)=\phi'(x)$.
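
The bookkeeping with $f$, the coordinate systems and the coordinate representations can be sketched in a few lines of code. This is only a toy model under stated assumptions: events are represented by pairs of made-up "intrinsic" labels, spacetime is 1+1 dimensional, and the particular `f`, `x_chart` and boost velocity are hypothetical. It checks $f(p)=\phi(x(p))=\phi'(y(p))$ and the identity $\phi'(u)=\phi(\Lambda^{-1}u)$, and takes no position on which transformation deserves the name "active":

```python
import math

# Events p are modelled as pairs of "intrinsic" labels (a, b); this
# representation is a hypothetical stand-in for points of spacetime M.

def f(p):
    """Scalar field on M: a real number attached to each event."""
    a, b = p
    return math.sin(a) + b * b

def x_chart(p):
    """Coordinate system x: M -> R^2."""
    a, b = p
    return (a + b, a - b)

def x_chart_inv(u):
    """Inverse of x_chart."""
    return ((u[0] + u[1]) / 2.0, (u[0] - u[1]) / 2.0)

V = 0.6
GAMMA = 1.0 / math.sqrt(1.0 - V * V)

def lam(u):
    """Lambda = y o x^{-1}, here a boost in 1+1 dimensions (c = 1)."""
    return (GAMMA * (u[0] + V * u[1]), GAMMA * (u[1] + V * u[0]))

def lam_inv(u):
    """Lambda^{-1} = x o y^{-1}."""
    return (GAMMA * (u[0] - V * u[1]), GAMMA * (u[1] - V * u[0]))

def y_chart(p):
    """Coordinate system y, defined so that y(p) = Lambda x(p)."""
    return lam(x_chart(p))

def y_chart_inv(u):
    """Inverse of y_chart."""
    return x_chart_inv(lam_inv(u))

def phi(u):
    """phi = f o x^{-1}."""
    return f(x_chart_inv(u))

def phi_prime(u):
    """phi' = f o y^{-1}."""
    return f(y_chart_inv(u))

p = (0.3, -1.2)   # an arbitrary event
u = (0.7, 2.5)    # an arbitrary coordinate pair

print(f(p), phi(x_chart(p)), phi_prime(y_chart(p)))   # three equal numbers, up to rounding
print(phi_prime(u), phi(lam_inv(u)))                  # phi'(u) = phi(Lambda^{-1} u)
```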

Last edited: Oct 11, 2012

3. Oct 11, 2012

### Fredrik

Staff Emeritus
$\partial_\mu$ means $\displaystyle\frac{\partial}{\partial x^\mu}$.

4. Oct 11, 2012

### ianhoolihan

I do know some differential geometry, and your explanation gets a big thumbs up! I guess that was what I was going for with the $\phi \circ \Lambda^{-1}$ comment, just not as thorough.

That said, I disagree with the very last bit --- according to those notes, $x\to x' = \Lambda x$ corresponds to an active transformation if we transform the field as well --- and this gives $\phi ' = \phi(\Lambda^{-1} x)$. I think passive would mean we rotate the axis as opposed to the field, in which case it would be $\phi' = \phi(\Lambda x)$.

On second thought, this seems off. How about this: in an active transformation, we transform the field such that the value $\phi(x)$ is taken from the position $x$ to the position $\Lambda x$. But we don't actually change the coordinates. Hence it makes sense to talk about the new field $\phi'$ at a position $x$, not $x'$. Hence $\phi'(x)$.

With this reckoning, it also means $\partial_\mu$ stays as it is, i.e. does not go to $\partial_{\mu '}$. Hence the derivative of the transformed field is $\partial_\mu \phi' (x)$. Using your argument, this is $\partial_\mu \phi (\Lambda^{-1} x) = (\Lambda^{-1})^\nu{}_\mu \partial_\nu \phi (x)$ by the chain rule.

How close am I this time?

PS --- is there an equally sexy way of formulating this in terms of push forwards and differential maps in differential geometry?

5. Oct 11, 2012

### Fredrik

Staff Emeritus
The very last bit was the claim that an active transformation of the scalar field by a Lorentz transformation $\Lambda$ is the substitution $x\to\Lambda x$ in the expression $\phi(x)$. Now that I've thought about it some more, I think that was wrong.

Because of the identities $f(p)=f\circ x^{-1}(x(p))=f\circ y^{-1}(y(p))$, I find it hard to think of anything that it makes sense to think of as a transformation of a scalar field under a coordinate change, except maybe the substitution $x\to y$ in the expression $f\circ x^{-1}(x(p))$, which of course does absolutely nothing. Because of this, I figured there has to be something that these books call active transformation of the field. When I wrote my previous post, I didn't see anything it could refer to other than the substitution $x\to\Lambda x$ in the expression $\phi(x)$. With hindsight, that was kind of dumb.

Now I'm thinking that since the idea of transformation of the scalar field (i.e. f) doesn't really make sense, they're probably talking about what I would call a transformation of the coordinate representation. This would be a substitution $x\to y$ in the expression $f\circ x^{-1}$. So to transform the coordinate representation of f means to use the given $\phi$ and $\Lambda$ to find $\phi'$. This is easy.
$$\phi'=f\circ y^{-1} =f\circ x^{-1}\circ x\circ y^{-1} =\phi\circ\Lambda^{-1}.$$ But is $\phi\to\phi\circ\Lambda^{-1}$ an "active" or a "passive" transformation of $\phi$? I'm not sure why these terms are mentioned in this context at all. I guess, since this transformation is associated with the transformation $x\to\Lambda x$, which is an active transformation by $\Lambda$, we can think of $\phi\to\phi\circ\Lambda^{-1}$ as an active transformation of the coordinate representation of the field by $\Lambda$ (or if we are sloppy with the language, as an active transformation of the field by $\Lambda$).

The passive transformation "of the field" (actually of its coordinate representation) by $\Lambda$ would then be the transformation $\phi\to\phi\circ\Lambda$.

Does this sound better?

Note however that in the sloppy notation that's used in QFT books, $\phi$ usually means $\phi(x)$, so I think you are more likely to see $\phi'=\phi$ (meaning $\phi'(x')=\phi(x)$) than $\phi'=\phi\circ\Lambda^{-1}$.

I didn't quite follow your argument, but these are some of my thoughts on derivatives. When a QFT book writes $\partial_\mu\phi$, this really means $\phi_{,\mu}(x(p))$, which is equal to $\frac{\partial}{\partial x^\mu}\!\big|_p\, f$. (See this post for a brief explanation of the notation). If you make the substitution $x\to y$ in this expression, we get
$$\frac{\partial}{\partial y^\mu}\bigg|_p f =(f\circ y^{-1})_{,\mu}(y(p)) =\cdots= (\Lambda^{-1})^\nu{}_\mu (f\circ x^{-1})_{,\nu}(x(p))=(\Lambda^{-1})^\nu{}_\mu \frac{\partial}{\partial x^\nu}\bigg|_p f.$$ If you want to see (most of) the details I omitted above, scroll down to my next post in the thread I linked to above.

The equality between the two expressions closest to the dots above can also be written as
$$\phi'{}_{,\mu}(y(p))=(\Lambda^{-1})^\nu{}_\mu \phi_{,\nu}(x(p)) =\Lambda_\mu{}^\nu\phi_{,\nu}(x(p)).$$ See this post if you don't understand what I did with the $\Lambda$. I'm not sure how to write this in the sloppy notation. I think
$$(\partial_\mu\phi)'(x')=\Lambda_\mu{}^\nu \partial_\nu\phi(x),$$ would make the most sense, but I wouldn't be surprised to see the left-hand side written as $\partial'_\mu\phi(x')$ or $\partial'_\mu\phi'(x')$. (I don't expect the notation to always make sense).

I suspect that a lot of physicists are thinking like this: $\partial_\mu\phi$ transforms to $\partial'{}_\mu\phi'$, but $\phi'=\phi$ so we only have to worry about the derivative, which transforms covariantly, i.e. $\partial'{}_\mu=\Lambda_\mu{}^\nu\partial_\nu$, so $$\partial_\mu\phi \to \partial'{}_\mu\phi'=\Lambda_\mu{}^\nu\partial_\nu\phi.$$ This is more of a mnemonic for the correct result than an actual calculation.

I haven't really thought about that.

Last edited: Oct 11, 2012

6. Oct 12, 2012

### ianhoolihan

Thanks Fredrik. I'll have to do some reading on this today, specifically on what is meant by "active" and "passive", and get back to you. Also, we were disagreeing (I think) on the derivative point. Your contention was that $\partial_\mu \to \partial_{\mu'} = (\Lambda^{-1})^\nu{}_{\mu'}\partial_\nu$ whereas mine was that $\partial_\mu \to \partial_\mu$ and the $\Lambda^{-1}$ came from using the chain rule on $\phi'(x)$ i.e.
$$\partial_\mu \phi'(x^\alpha) = \partial_\mu \phi ((\Lambda^{-1})^\alpha{}_\nu x^\nu) = \partial_\mu ((\Lambda^{-1})^\alpha{}_\nu x^\nu) (\partial_\mu \phi) (\Lambda^{-1}x) =...$$
a problem... the indices do not work. Hmmm, so maybe that means this way is wrong?

7. Oct 12, 2012

### Fredrik

Staff Emeritus
This is how I would take the partial derivative of $\phi'=\phi\circ\Lambda^{-1}$ using the chain rule:
$$\phi'{},_\mu(x)=(\phi\circ\Lambda^{-1}),_\mu(x)=\phi,_\nu(\Lambda^{-1}x)\,(\Lambda^{-1})^\nu{},_\mu(x) =\Lambda_\mu{}^\nu \phi,_\nu\!(\Lambda^{-1}x).$$ So
$$\partial_\mu \phi'(x) =\Lambda_\mu{}^\nu \partial_\nu\phi(\Lambda^{-1}x).$$ Since this holds for all x, it will hold if we replace x by $\Lambda x$. This yields
$$\partial'{}_\mu \phi'(x') =\Lambda_\mu{}^\nu \partial_\nu\phi(x).$$ Note that I put a prime on the ∂ on the left. I will try to explain why. Consider the expression $\frac{d}{dx}(ax^2)$. The notation means "the value of the derivative of the function $t\mapsto at^2$ at x". The x in the d/dx tells us both what function we're taking the derivative of, and at what point in the domain the derivative is to be evaluated. Similarly, if we want to take the μth partial derivative of $\phi'$, and evaluate the result at x', we should write $\partial/\partial x'{}^\mu$, not $\partial/\partial x{}^\mu$, and the simplified notation for the former is $\partial'_\mu$, not $\partial{}_\mu$.
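
The chain-rule result $\partial_\mu \phi'(x) =\Lambda_\mu{}^\nu \partial_\nu\phi(\Lambda^{-1}x)$ can be checked numerically with finite differences. The sketch below assumes a boost in 1+1 dimensions and a made-up field `phi`; it uses $\Lambda_\mu{}^\nu=(\Lambda^{-1})^\nu{}_\mu$, and the helper `partial` is a hypothetical central-difference routine, not anything from the notes:

```python
import math

V = 0.6
GAMMA = 1.0 / math.sqrt(1.0 - V * V)
# (Lambda^{-1})^nu_mu for a boost, stored as LAM_INV[nu][mu]
LAM_INV = [[GAMMA, -GAMMA * V], [-GAMMA * V, GAMMA]]

def lam_inv(x):
    """Apply Lambda^{-1} to the coordinate pair x."""
    return [sum(LAM_INV[n][m] * x[m] for m in range(2)) for n in range(2)]

def phi(x):
    """Hypothetical scalar field of (t, x)."""
    return math.sin(x[0]) * math.exp(0.5 * x[1]) + x[0] * x[1] ** 2

def phi_prime(x):
    """phi' = phi o Lambda^{-1}."""
    return phi(lam_inv(x))

def partial(func, mu, x, h=1e-6):
    """Central-difference estimate of the mu-th partial derivative of func at x."""
    up = list(x)
    dn = list(x)
    up[mu] += h
    dn[mu] -= h
    return (func(up) - func(dn)) / (2.0 * h)

x = [0.4, -0.9]
y = lam_inv(x)

# Left-hand side: differentiate phi' at x.
lhs = [partial(phi_prime, mu, x) for mu in range(2)]
# Right-hand side: differentiate phi at Lambda^{-1} x, then contract with Lambda^{-1}.
rhs = [sum(LAM_INV[nu][mu] * partial(phi, nu, y) for nu in range(2)) for mu in range(2)]

print(lhs)
print(rhs)   # agrees with lhs to within the finite-difference error
```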

Compare this to what I found here:
I see now that (because of what I said about the prime above) $\partial'_\mu\phi'(x')$ is the correct way to write the left-hand side of the first equality in the quote.

We found above (in this post, before the quote) that $\partial'{}_\mu \phi'(x') =\Lambda_\mu{}^\nu \partial_\nu\phi(x),$ but we also need to argue for the fact that the left-hand side is the transformed version of $\partial_\mu\phi(x)$, i.e. that it's correct to put a prime on each of $\partial$, $\phi$ and $x$ when we go to another coordinate system. The justification for that is what I did in my previous post (the one I'm quoting above). The μth partial derivative of f with respect to the coordinate system x is $\frac{\partial}{\partial x^\mu}\!\big|_p\, f$, and it's natural to define the "transformed derivative" as what you get by substituting y for x in that expression, and it follows from the definitions that this is equal to $\partial'{}_\mu \phi'(x')$.

Last edited: Oct 12, 2012

8. Oct 12, 2012

### Fredrik

Staff Emeritus
Summary

The function f is what should be called a scalar field here. $\phi=f\circ x^{-1}$ and $\phi'=f\circ y^{-1}$ are just its coordinate representations with respect to the coordinate systems x and y respectively. Similarly, x(p) and y(p) (also denoted by x and x' respectively) are coordinate representations of the point p. The "transformation" that we're talking about is a change of coordinate systems from x to y. To see how a specific expression changes under that transformation, what we have to do is to first rewrite it using coordinate-independent stuff like the function f and the coordinate system x (i.e. make all references to the coordinate system explicit instead of hidden as in $\phi$), and then make the substitution $x\to y$.

The expressions f and f(p) (the field and its value at p) do not contain x, so they remain unchanged. The coordinate representation of the field changes from $f\circ x^{-1}$ to $f\circ y^{-1}$, i.e. from $\phi$ to $\phi'$. The coordinate representation of p changes from $x(p)$ to $y(p)$ i.e. from x to x'. The expression $(f\circ x^{-1})(x(p))$ changes to $(f\circ y^{-1})(y(p))$. Since the former is by definition equal to $\phi(x)$ and the latter is by definition equal to $\phi'(x')$, we can also say that $\phi(x)$ changes to $\phi'(x')$. But both of these are equal to f(p), so the changes of $f\circ x^{-1}$ and $x(p)$ are canceling each other out in the transformation of $\phi(x)$.

The μth partial derivative of f with respect to the coordinate system x, evaluated at p, changes from $\frac{\partial}{\partial x^\mu}\!\big|_p\, f=\partial_\mu\phi(x)$ to $$\frac{\partial}{\partial y^\mu}\!\bigg|_p\, f =(f\circ y^{-1}),_\mu(y(p))=\partial'_\mu\phi'(x').$$ The prime on the derivative symbol is explained in post #7, just before the quote.

Last edited: Oct 12, 2012

9. Oct 14, 2012

### ianhoolihan

Re: Summary

OK, things got busy, so my apologies for the delay.

After reading the Wiki on active and passive transformations, I am happy to say that a passive transformation is just a trivial change of coordinates. That is, the basis vectors are changed to different ones, or the axes are changed, if you like to think of it that way. This is trivial, in that it doesn't actively change the object, just how it is described: $\phi(x)\to \phi'(x') = \phi(x)$. The alternative is an active transformation where the basis (axes) remains fixed, and the coordinate representation of the thing is transformed: $\phi(x) \to \phi'(x) \neq \phi(x)$. The implication that $x\to x'=\Lambda x$ is incorrect, as it really means the coordinate representation at $x$ is moved to $x'=\Lambda x$. This does actively and physically change the object. To clarify, if I have two fields $\phi_a(x)$ and $\phi_b(x)$, and I do a passive transformation on $\phi_a(x) \to \phi_a'(x')$ then when you sort out the different coordinate systems, $\phi_a'(x')$ and $\phi_b(x)$ are still in the same relative state (e.g. same "distance" from each other). In contrast, if I do an active transformation on $\phi_a(x)\to \phi_a'(x)$ such as a translation, the "distance" between $\phi_a'(x)$ and $\phi_b(x)$ does change.

With a scalar field, how do we write the active transformation? We could think "transform the field at $x$ to that at $x'=\Lambda x$ by acting on $\phi$ with $\Lambda$". But this only works in the vector case, since $\Lambda$ is not defined to act on a scalar. An equivalent approach is to transform the basis vectors by the inverse, and then take these new axes to be your original $x$ coordinates. For example, say I wanted to rotate $\phi$ by $\pi/2$ with the corresponding $\Lambda$. In 2D, one would have $\phi'( (1,0)) = \phi((0,1))$. Instead, I could do a passive transformation $x\to x' = \Lambda^{-1} x$, so that $(1,0)\to (1,0)' = (0,1)$ (i.e. the axes rotated clockwise by $\pi/2$). $\phi$ remains unchanged. But now I say the new $\phi'$ is defined by the action on $x'$ coordinates, and we let $x'=x$. That is $\phi'(x)\equiv \phi(x') = \phi(\Lambda^{-1} x)$. One can see that the original statement $\phi'( (1,0)) = \phi(\Lambda^{-1}(1,0)) = \phi((0,1))$ holds.
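
The quarter-turn example can be made concrete as follows. This sketch assumes that $\Lambda$ is the clockwise rotation by $\pi/2$, which is the choice for which $\Lambda^{-1}(1,0)=(0,1)$ as in the numbers above; the linear field `phi` is a made-up example chosen so that its values at $(1,0)$ and $(0,1)$ differ:

```python
def rot_clockwise_quarter(p):
    """Lambda: rotate a point of the plane clockwise by pi/2 (an assumed reading of the example)."""
    x, y = p
    return (y, -x)

def rot_counterclockwise_quarter(p):
    """Lambda^{-1}: rotate a point counterclockwise by pi/2."""
    x, y = p
    return (-y, x)

def phi(p):
    """Hypothetical field taking different values at (1, 0) and (0, 1)."""
    x, y = p
    return 2.0 * x + 5.0 * y

def phi_prime(p):
    """Actively rotated field: phi'(x) = phi(Lambda^{-1} x)."""
    return phi(rot_counterclockwise_quarter(p))

# The value that phi had at (0, 1) is now found at Lambda (0, 1) = (1, 0).
print(rot_clockwise_quarter((0, 1)))
print(phi_prime((1, 0)), phi((0, 1)))
```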

OK, I'm not sure how clear that all was!

Now to derivatives. Since the basis doesn't change, all we have is
$$\frac{\partial }{\partial x^\mu} \phi (x) \to \frac{\partial }{\partial x^\mu} \phi' (x) = \frac{\partial }{\partial x^\mu} \phi ((\Lambda^{-1}) x) = \frac{\partial y^\nu}{\partial x^\mu} \frac{\partial }{\partial y^\nu} \phi (y) =(\Lambda^{-1})^\nu{}_\mu \frac{\partial }{\partial y^\nu} \phi (y)$$

where I used $y\equiv \Lambda^{-1} x$. However this is not what I want --- I want a $\partial /\partial x$ in the last bit, like your equation. How do you evaluate your chain rule? It makes little sense to me....?

Cheers

10. Oct 14, 2012

### Fredrik

Staff Emeritus
Re: Summary

I assume that you're referring to the second and third equality in this line:
$$\phi'{},_\mu(x)=(\phi\circ\Lambda^{-1}),_\mu(x)=\phi,_\nu(\Lambda^{-1}x)\,(\Lambda^{-1})^\nu{},_\mu(x) =\Lambda_\mu{}^\nu \phi,_\nu\!(\Lambda^{-1}x).$$ The second equality is the chain rule in the form $(f\circ g),_\mu(x)=f,_\nu(g(x))g^\nu{},_\mu(x)$, nothing more, nothing less. $g^\nu$ is the notation I use for the map that takes x to the $\nu$th component of g(x). This post explains why I like this form of the chain rule.

To understand the last equality above, consider the following. Let T be any linear operator. Let $T^\mu$ denote the map $x\mapsto (Tx)^\mu$, i.e. the map that takes x to the $\mu$th component of $Tx$. We have $$T^\mu(x)=(Tx)^\mu=(T(x^\nu e_\nu))^\mu=x^\nu (Te_\nu)^\mu =T^\mu{}_\nu x^\nu,$$ where I have defined $T^\mu{}_\nu=(Te_\nu)^\mu$. These are the components (=matrix elements) of T with respect to the basis in which $x$ has components $x^\nu$. (See this post for a little more about this concept).

Now, what is the $\nu$th partial derivative of $T^\mu$? It's obviously going to be a constant, since $T^\mu$ is a first-degree polynomial. For all x, we have $T^\mu{},_\nu(x)=T^\mu{}_\nu$. (Note that this is just the $\mathbb R^n$ version of the statement: If $f:\mathbb R\to\mathbb R$ is defined by f(x)=ax for all x, then f'(x)=a for all x).

Now consider the special case $T=\Lambda^{-1}$. We get $(\Lambda^{-1})^\mu{},_\nu(x)=(\Lambda^{-1})^\mu{}_\nu$ for all x. And this right-hand side can also be written as $\Lambda_\nu{}^\mu$. (I linked to a post that explains that above).
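
The statement that the partial derivatives of the component maps of a linear operator are just its matrix elements can be illustrated numerically. In this sketch the operator `T` on $\mathbb R^3$ is a made-up example, and `components` and `jacobian` are hypothetical helper names; the point is only that the finite-difference Jacobian is the same at every $x$ and equals $T^\mu{}_\nu=(Te_\nu)^\mu$:

```python
def components(T, n):
    """Matrix T^mu_nu = (T e_nu)^mu of a linear map T on R^n, indexed as [mu][nu]."""
    cols = []
    for nu in range(n):
        e_nu = [1.0 if k == nu else 0.0 for k in range(n)]
        cols.append(T(e_nu))
    # cols[nu][mu] = (T e_nu)^mu, so transpose to index as [mu][nu]
    return [[cols[nu][mu] for nu in range(n)] for mu in range(n)]

def jacobian(g, x, h=1e-6):
    """Central-difference estimate of g^mu,_nu(x), indexed as [mu][nu]."""
    n = len(x)
    jac = [[0.0] * n for _ in range(n)]
    for nu in range(n):
        up = list(x)
        dn = list(x)
        up[nu] += h
        dn[nu] -= h
        g_up, g_dn = g(up), g(dn)
        for mu in range(n):
            jac[mu][nu] = (g_up[mu] - g_dn[mu]) / (2.0 * h)
    return jac

def T(x):
    """A hypothetical linear operator on R^3."""
    return [2.0 * x[0] - x[2], x[0] + 3.0 * x[1], 0.5 * x[1] + x[2]]

matrix = components(T, 3)
jac_a = jacobian(T, [0.1, 0.2, 0.3])
jac_b = jacobian(T, [-4.0, 7.0, 1.5])

print(matrix)
print(jac_a)   # same entries as matrix, up to rounding
print(jac_b)   # and again at a different point
```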

Last edited: Oct 14, 2012

11. Oct 14, 2012

### Fredrik

Staff Emeritus
I assume that this is the article. By its definitions, the coordinate transformation $x\to y$ is a passive transformation (of spacetime), since it's just a change of coordinates (rather than a map from spacetime onto itself).

The term "active" is only used in two places in the pdf you linked to. Just after (1.26) on page 11, he says that we're dealing with an active transformation (of the field). It seems to me that the only thing he can mean by that, is something that's entirely obvious in the notation and terminology I've been using: that the substitution $x\to y$ changes $f\circ x^{-1}$ to $f\circ y^{-1}$, which is a different function. To see what function it is, we find its value at an arbitrary point u.
$$\phi'(u)=f\circ y^{-1}(u)=f\circ x^{-1}\circ x\circ y^{-1}(u)=\phi(\Lambda^{-1}u).$$ I used that $\Lambda=y\circ x^{-1}$ (i.e. that $\Lambda$ takes $x(p)$ to $y(p)$).

I don't see how his comment that we get $\Lambda^{-1}$ rather than $\Lambda$ "because we're dealing with an active transformation" explains anything. I would describe what he's doing like this: We're choosing to consider the transformation $\phi\to\phi'$, induced by the coordinate change $x\to y$, and this can be thought of as an "active transformation" of $\phi$, since $\phi'\neq\phi$. Note that if we had obtained $\phi'=\phi\circ\Lambda$ instead, we still would have had $\phi'\neq\phi$, making it an "active transformation" of $\phi$. So it really doesn't seem to make sense to say that the appearance of $\Lambda^{-1}$ instead of $\Lambda$ in the formula for $\phi'$ is explained by the fact that we're doing an active transformation.

Maybe he meant something completely different, which does make sense, but if I were you, I wouldn't spend too much time looking for a meaning where there might not be one. It's possible that he just messed up.

The Wikipedia article uses rotations of $\mathbb R^2$ to illustrate what they mean by active and passive transformations. For example, $x\mapsto Rx$ is considered an active transformation (by the rotation R) of the components of x, while a passive transformation of the components of x is the transformation $x_i\mapsto x_i'$ where the $x_i'$ are defined by $x=x_ie_i=x_i'Re_i$. It's not hard to turn this into a formula for $x_i'$. First let's expand $Re_i$ in basis vectors:
$$x_i'Re_i=x_i'(Re_i)_je_j=x_i'R_{ji}e_j.$$ This is equal to $x_je_j$, so we must have
$$x_j=R_{ji}x_i'.$$ Multiply by $(R^{-1})_{kj}$ (and sum over j).
$$(R^{-1})_{kj}x_j=\delta_{ki} x_i'=x_k'.$$ So we have
$$x_i'=(R^{-1})_{ij}x_j$$ for the passive transformation, and
$$x_i'=R_{ij}x_j$$ for the active transformation. This makes "passive transformations" of component matrices a pretty useless concept in my opinion. A passive transformation (of the component matrix) by R is just an active transformation by $R^{-1}$.
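
A short numerical sketch of the two formulas, assuming a rotation of $\mathbb R^2$ by an arbitrary angle (the angle 0.7 and the vector components are made-up values). It rebuilds the vector from the passive components and the rotated basis $Re_i$, and compares the passive components with an active transformation by $R^{-1}$:

```python
import math

def rotation(theta):
    """2x2 rotation matrix for a counterclockwise rotation by theta."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]

def matvec(m, v):
    """Multiply a 2x2 matrix into a 2-component list."""
    return [sum(m[i][j] * v[j] for j in range(2)) for i in range(2)]

def transpose(m):
    return [[m[j][i] for j in range(2)] for i in range(2)]

theta = 0.7
R = rotation(theta)
R_inv = transpose(R)          # for a rotation the inverse is the transpose

x = [1.5, -0.4]               # components x_i in the basis e_i

active = matvec(R, x)         # x_i' = R_ij x_j
passive = matvec(R_inv, x)    # x_i' = (R^{-1})_ij x_j

# Rebuild the vector from the passive components and the rotated basis:
# sum_i x_i' (R e_i), where the components of R e_i are column i of R.
rebuilt = [sum(passive[i] * R[j][i] for i in range(2)) for j in range(2)]

print(active)
print(passive)
print(rebuilt)                          # matches x up to rounding
print(matvec(rotation(-theta), x))      # active transformation by R^{-1}: matches passive
```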

12. Oct 14, 2012

### Muphrid

My background isn't in QFT, but what I've read on classical theories of gravity on a flat background seems extensively grounded in the same sort of math. I think I can shed some light on this topic.

Let $f(x) = x'$ represent such an active remapping of positions. The usual convention seems to be that $\phi'(x') = \phi(x)$. This is equivalent to the form given to you, where they note that $\phi'(x) = \phi(f^{-1}(x))$. It's enough to recognize that for Lorentz rotations, the passive transformation is always the inverse of some active transformation. I'm not sure how to generalize this to the case of an arbitrary transformation--I expect that a passive transformation of this kind must always be the adjoint of an active transformation, but that doesn't say very much for the nonlinear function $f(x)$.

At any rate, I like to stick with $\phi'(x') = \phi(x)$. What follows may not be entirely within the realm of the usual QFT way of doing things, but the math is sufficiently similar that something useful should be gleaned from it, I hope. You can analyze the action of derivatives by using the chain rule. Let $\nabla$ be the 3+1d vector derivative operator. The chain rule gives

$$a \cdot \nabla \phi(x) = a \cdot \nabla \phi'(x') = [a \cdot \nabla f(x)] \cdot \nabla' \phi'(x')$$

For any vector $a$. Define $\underline f(a) \equiv a \cdot \nabla f(x)$ as a linear operator on the vector $a$ which is the Jacobian of the transformation. (Note that rotations/boosts are themselves linear and, as such, equal to their own Jacobians.) This leads to the nice result,

$$a \cdot \nabla \phi(x) = \underline f(a) \cdot \nabla' \phi'(x')$$

Or, the form which I prefer, which is

$$a \cdot \nabla \phi(x) = a \cdot \overline f(\nabla') \phi'(x') \implies \nabla = \overline f(\nabla')$$

where $\overline f$ is the adjoint linear operator to the Jacobian. Note that for Lorentz boosts/rotations, the adjoint is equal to the inverse. (This is always true of orthogonal operators. Lorentz boosts are orthogonal with respect to the Minkowski metric.)

This form gives us the basic tensor transformation law for cotangent vectors (remembering that $\nabla$ is formed from cotangent vectors). The corresponding law for tangent vectors is derived by taking $x = x(\lambda)$ for some affine parameter $\lambda$.

$$\frac{dx'}{d\lambda} = \frac{dx}{d\lambda} \cdot \nabla f(x) = \underline f\left(\frac{dx}{d\lambda}\right) \implies \frac{dx}{d\lambda} = \underline f^{-1} \left(\frac{dx'}{d\lambda}\right)$$

This finishes the derivation of the transformation laws for tensors. In the case of the derivative of a scalar field, we see that

$$\nabla \phi(x) = \overline f[\nabla' \phi'(x')]$$

...I think that's right. The book I originally learned all this from preferred to switch $x, x'$, which I thought was unduly confusing, and it doesn't seem consistent with what the PDF linked does, either.
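
For what it's worth, the relation $\nabla \phi(x) = \overline f[\nabla' \phi'(x')]$ can be tested numerically for a nonlinear remapping. The sketch below makes several simplifying assumptions that are not in the discussion above: it works in the Euclidean plane rather than 3+1 dimensions (so the adjoint of the Jacobian is simply its transpose), and the cubic shear `f` and the field `phi` are made-up examples:

```python
import math

def f(x):
    """Hypothetical nonlinear remapping x -> x' of the plane (a cubic shear)."""
    return [x[0] + x[1] ** 3, x[1]]

def f_inv(xp):
    """Inverse of f."""
    return [xp[0] - xp[1] ** 3, xp[1]]

def phi(x):
    """Hypothetical scalar field."""
    return math.cos(x[0]) + x[0] * x[1]

def phi_prime(xp):
    """Transformed field, defined by phi'(x') = phi(x)."""
    return phi(f_inv(xp))

def grad(func, x, h=1e-6):
    """Central-difference gradient of a scalar function at x."""
    out = []
    for i in range(len(x)):
        up = list(x)
        dn = list(x)
        up[i] += h
        dn[i] -= h
        out.append((func(up) - func(dn)) / (2.0 * h))
    return out

def jacobian_f(x):
    """Analytic Jacobian of f, indexed as [j][i] = d f^j / d x^i."""
    return [[1.0, 3.0 * x[1] ** 2], [0.0, 1.0]]

x = [0.3, 0.8]
xp = f(x)
J = jacobian_f(x)
grad_prime = grad(phi_prime, xp)

# Adjoint (here: transpose) of the Jacobian applied to grad' phi':
# component i is sum_j J[j][i] * grad'_j.
pulled_back = [sum(J[j][i] * grad_prime[j] for j in range(2)) for i in range(2)]

print(grad(phi, x))
print(pulled_back)   # agrees with grad(phi, x) to within the finite-difference error
```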

13. Oct 15, 2012

### ianhoolihan

Re: Summary

I just skimmed through this before lectures, and to clarify, I was confused by notation: I see $(\Lambda^{-1})^\nu{},_\mu = 0$ whereas $(\Lambda^{-1}x)^\nu{},_\mu = (\Lambda^{-1})^\nu{}_\mu$.

14. Oct 15, 2012

### ianhoolihan

OK, I still don't like this. In the linked post, you state the equality between
$$\frac{\partial (f\circ g)(x)}{\partial x_i}=\sum_{j=1}^m\frac{\partial f(g(x))}{\partial g_j}\frac{\partial g_j(x)}{\partial x_i},$$
and
$$(f\circ g)_{,i}(x)=f_{,j}(g(x))g_{j,i}(x)$$
which I disagree with, as I've always thought $f,_\mu \equiv \partial f / \partial x^\mu$, so that the second expression is
$$(f\circ g)_{,i}(x)=f_{,j}(g(x))g_{j,i}(x) =\sum_{j=1}^m\frac{\partial f(g(x))}{\partial x_j}\frac{\partial g_j(x)}{\partial x_i}$$
which is incorrect. Furthermore, I do not see what is wrong with my expression,
$$\frac{\partial }{\partial x^\mu} \phi (x) \to \frac{\partial }{\partial x^\mu} \phi' (x) = \frac{\partial }{\partial x^\mu} \phi ((\Lambda^{-1}) x) = \frac{\partial y^\nu}{\partial x^\mu} \frac{\partial }{\partial y^\nu} \phi (y) =(\Lambda^{-1})^\nu{}_\mu \frac{\partial }{\partial y^\nu} \phi (y)$$

I still think it is much more transparent to write $(\Lambda^{-1}x)^\mu{},_\nu = ((\Lambda^{-1})^\mu{}_\alpha x^\alpha),_\nu = (\Lambda^{-1})^\mu{}_\alpha \delta^\alpha_\nu = (\Lambda^{-1})^\mu{}_\nu$. We agree on this part, so I'll leave it.

OK, I agree, and that was what I was trying to say with my earlier post --- yours is just more elegant!

Agreed, again.

Errmm, I'll leave this.

So, for now, we are in agreement, except for this bit about the chain rule.

Cheers.

15. Oct 15, 2012

### vanhees71

Somehow this is all written down in a very complicated way. First of all we have to recall the transformation rule for scalar fields under Lorentz transformations (i.e., boosts and rotations and all possible compositions of those):
$$\phi'(x')=\phi(x)=\phi(\Lambda^{-1} x'), \quad x'=\Lambda x \; \Leftrightarrow \; x=\Lambda^{-1} x'.$$
Here $\Lambda$ is the Lorentz-transformation matrix ${\Lambda^{\mu}}_{\nu}$ fulfilling $\Lambda^{-1}=g \Lambda^T g.$

Now we have
$$\partial_{\mu}' \phi'(x')=\partial_{\mu}' \phi(x)=\frac{\partial x^{\nu}}{\partial x'^{\mu}} \partial_{\nu} \phi(x)={(\Lambda^{-1})^{\nu}}_{\mu} \partial_{\nu} \phi(x).$$
In short, $\partial_{\mu} \phi$ transforms under Lorentz transformations as a covariant vector field, which is what was to be shown.

BTW: This is why the derivative of a scalar field with respect to the contravariant vector components, $x^{\mu}$, leads to a lower index for a covariant vector, the four-dimensional gradient, $\partial_{\mu} \phi$.
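
The condition $\Lambda^{-1}=g \Lambda^T g$ quoted above is easy to check numerically. The sketch below assumes the metric $g=\mathrm{diag}(1,-1,-1,-1)$ (the identity holds equally for the opposite overall sign) and builds a sample $\Lambda$ as a rotation about the z axis composed with a boost along x; the angle, the velocity and the helper names are arbitrary illustrative choices:

```python
import math

def matmul(a, b):
    """Product of two square matrices stored as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def transpose(a):
    return [list(row) for row in zip(*a)]

def boost_x(v):
    """Boost along x with velocity parameter v, in units with c = 1."""
    gamma = 1.0 / math.sqrt(1.0 - v * v)
    return [[gamma, gamma * v, 0.0, 0.0],
            [gamma * v, gamma, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0, 1.0]]

def rotation_z(theta):
    """Rotation about the z axis by angle theta."""
    c, s = math.cos(theta), math.sin(theta)
    return [[1.0, 0.0, 0.0, 0.0],
            [0.0, c, -s, 0.0],
            [0.0, s, c, 0.0],
            [0.0, 0.0, 0.0, 1.0]]

# Metric diag(+,-,-,-); the sign convention is an assumption of this sketch.
G = [[1.0, 0.0, 0.0, 0.0],
     [0.0, -1.0, 0.0, 0.0],
     [0.0, 0.0, -1.0, 0.0],
     [0.0, 0.0, 0.0, -1.0]]

LAM = matmul(rotation_z(0.4), boost_x(0.6))
LAM_INV_CANDIDATE = matmul(G, matmul(transpose(LAM), G))   # g Lambda^T g
PRODUCT = matmul(LAM_INV_CANDIDATE, LAM)                    # should be the identity matrix

for row in PRODUCT:
    print(row)
```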

16. Oct 15, 2012

### ianhoolihan

OK, we are trying to prove, not "recall" the rule. Also, I'm not sure we agree on what is "active".

Passive:
$$\quad\phi'(x')=\phi(x)=\phi(\Lambda^{-1} x'), \quad x'=\Lambda x \; \Leftrightarrow \; x=\Lambda^{-1} x'.$$
This is a trivial change of coordinates.

Active:
$$\quad\phi'(x')=\phi'(x)=\phi(\Lambda^{-1} x)$$
Here we keep the coordinates the same, but change $\phi\to \phi'$. I think Fredrik has explained it well in his last posts.

I think Fredrik and I also agree that $\partial'_\mu = \partial_\mu$ as the coordinates do not change in an active transformation. The $\Lambda^{-1}$ comes from the chain rule.

17. Oct 15, 2012

### Fredrik

Staff Emeritus
The reason why my calculations are much longer than yours is that I'm explaining why the "transformed" versions of $\phi(x)$ and $\partial_\mu\phi(x)$ are equal to $\phi'(x')$ and $\partial_\mu'\phi'(x')$ respectively. If you take that as given, or as "obvious" (it's not to me), then the rest is fairly easy, as you noted.

Last edited: Oct 15, 2012

18. Oct 15, 2012

### Fredrik

Staff Emeritus
$f,_\mu$ denotes the $\mu$th partial derivative of f. (Note that this is a function that can be found from knowledge of f alone). So $f,_\mu(g(x))$ denotes the value of $f,_\mu$ at $g(x)$. This is the point of the notation, it makes it perfectly clear what function we're dealing with, and at what point in its domain we are to evaluate it.

So the notation $f,_\mu(g(x))$ can't possibly mean $\partial f(g(x))/\partial x^\mu$, because a) the latter expression denotes the value of the $\mu$th partial derivative of $f\circ g$ at x, which in the comma notation is denoted by $(f\circ g),_\mu(x)$, and b) the function that's being evaluated in the former expression is $f,_\mu$ which has nothing to do with g.
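
The distinction can be mimicked in code by treating $,_\mu$ as an operator that takes a function and returns a function. This is a rough sketch with made-up `f` and `g` on $\mathbb R^2$ and a hypothetical finite-difference helper `partial`; it shows that $f,_\mu(g(x))$ and $(f\circ g),_\mu(x)$ are different numbers, and that the latter is what the chain rule $f,_\nu(g(x))\,g^\nu{},_\mu(x)$ reproduces:

```python
import math

def partial(func, mu, h=1e-6):
    """Return the function func,_mu (central-difference estimate)."""
    def derivative(x):
        up = list(x)
        dn = list(x)
        up[mu] += h
        dn[mu] -= h
        return (func(up) - func(dn)) / (2.0 * h)
    return derivative

def compose(f, g):
    """The function f o g."""
    return lambda x: f(g(x))

def f(x):
    """Hypothetical scalar function on R^2."""
    return math.sin(x[0]) * x[1]

def g(x):
    """Hypothetical map R^2 -> R^2."""
    return [x[0] * x[1], x[0] + 2.0 * x[1]]

def g_component(nu):
    """The map g^nu taking x to the nu-th component of g(x)."""
    return lambda x: g(x)[nu]

x = [0.5, 1.2]
mu = 0

f_mu_at_gx = partial(f, mu)(g(x))             # f,_mu(g(x)): differentiate f, then evaluate at g(x)
fg_mu_at_x = partial(compose(f, g), mu)(x)    # (f o g),_mu(x): differentiate the composite at x

# Chain rule: (f o g),_mu(x) = f,_nu(g(x)) g^nu,_mu(x), summed over nu.
chain_rule = sum(partial(f, nu)(g(x)) * partial(g_component(nu), mu)(x) for nu in range(2))

print(f_mu_at_gx, fg_mu_at_x, chain_rule)
```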

The stuff after the arrow looks fine to me. I just don't know why $\phi$ would be the only thing that transforms when we change the coordinate system.

I'm OK with this too, but I would prefer the $\partial_\mu$ notation over the comma notation here. This is a bit nitpicky, but to use the comma notation here is like writing $(f(x)),_\mu$ instead of $f,_\mu(x)$, and I find that kind of ugly because $,_\mu$ is supposed to be an operator that takes a function to a function, and f(x) isn't a function, it's a number in the range of the function f.

I suppose we could say the same about the expression $\partial_\mu(f(x))$. The operator is supposed to act on f, not on f(x). But I find this less annoying, because in this context we have defined $\partial_\mu$ as an abbreviation of $\partial/\partial x^\mu$, and the x in the denominator of $\frac{\partial}{\partial x^\mu}f(x)$ has a purpose. It reminds us that the function we're taking a partial derivative of is $x\mapsto f(x)$, as opposed to say $z\mapsto f(x)$. For this reason, I find the $\frac{\partial}{\partial x^\mu}f(x)$ notation (and therefore also the $\partial_\mu f(x)$ notation) useful enough to be tolerable.

Last edited: Oct 15, 2012

19. Oct 15, 2012

### Muphrid

Some of this really involves the invoking of a convention. The easiest thing is to choose the convention for active transformations and then verify the corresponding passive transformation law.

Thus, there's no harm in taking for granted that $\phi'(x') = \phi(x)$ for active transformations $x'=f(x)$. From here, we just need to derive the passive transformation law. Consider instead $\phi(x \cdot e^\mu e_\mu)$. A passive transformation transforms the basis vectors without transforming the vector $x$ itself. Let $\overline f^{-1} ({e^\mu}') = e^\mu$, and let $\phi(x) = \Phi(x \cdot e^0, x \cdot e^1, \ldots{})$, so then

$$\phi(x) = \Phi(x \cdot e^0, x \cdot e^1, \ldots{}) = \Phi(x \cdot \overline f^{-1}({e^0}'), x \cdot \overline f^{-1}({e^1}'), \ldots{}) = \Phi(\underline f^{-1}(x) \cdot {e^0}', \underline f^{-1}(x) \cdot {e^1}', \ldots{})$$

Now, define $\phi'(x) = \Phi(x \cdot {e^0}', \ldots{})$. It is then guaranteed that $\phi'(\underline f^{-1}(x)) = \phi(x)$, which allows us to conclude that the passive transformation has $x' = \underline f^{-1}(x)$. We have constructed it to be so, and this should be persuasive (though I won't presume to call it proof) that passive transformations are naturally the inverses of active ones. $x$ doesn't really change, but there exists an $x'$ that would come from the active transformation that corresponds to the passive one.

Ultimately, though, making the statement that $\phi'(x') = \phi(x)$ is a necessary convention, in my opinion, while finding out how $x'$ relates to $x$ under the two kinds of transformations is really the matter at hand.

20. Oct 15, 2012

### Fredrik

Staff Emeritus
In my opinion, this shows why the terms "active" and "passive" shouldn't be used at all in this context. You're calling this a "passive" transformation probably because I said that by the Wikipedia article's definition, a coordinate change is a passive transformation. However, if M denotes the spacetime manifold, and x and y are coordinate systems that map M onto $\mathbb R^4$, as in all of my posts above, then while a coordinate change $x\to y$ could be called a passive transformation (of p, or of M), the function $\Lambda=y\circ x^{-1}$ that induces this change on coordinate 4-tuples has to be considered an active transformation of the components of x(p) since it takes x(p) to y(p).

And this is just the start of the confusion, since the pdf talks about active vs. passive transformations of the field without much of an explanation.

If I was the king of the universe, I think I would permanently retire that confusing terminology, at least from the context of transformation of field components.

21. Oct 16, 2012

### ianhoolihan

OK, reading the wiki again, the example makes it clear what is active and passive. In a passive transformation, the geometric thing of the vector does not change, only its coordinate representation, and trivially so. The basis is transformed, and the coordinate representation of the vector is transformed by the inverse, so the net effect is zilch --- a trivial coordinate transformation. In an active transformation, the geometric thing of the vector is itself rotated, which is represented by a transformation of the corresponding coordinate representation, but not of the basis vectors. The net effect is not zilch.

Now, to your method of things. Since I don't want to confuse myself, I'll denote the map that takes a point in the manifold $p$ to a subset $V$ of $\mathbf{R}^n$ by $\varphi_x$. That is, $\varphi_x$ corresponds to your map $x$. So the coordinates of $x(p)$ for me are the coordinates of $\varphi_x(p)$.

Define some function $f:\ M\to \mathbf{R}$. As you say
$$f(p) = f\circ \varphi_x^{-1} \circ \varphi_x (p) = \phi(x)$$
where $\phi \equiv f\circ \varphi_x^{-1}$ and $x \equiv \varphi_x(p)$.

Now, by passive transformation, I think we are introducing a new coordinate system $x'$ on $V$ as above, such that $x \to x' = \Lambda x$. As you suggest, we can define $\Lambda = \varphi_{x'}\circ \varphi_x^{-1}\ : V\to V$ where $\varphi_{x'}\ : M \to V$. Now, all we have is
$$\phi(x) = f \circ \varphi_x^{-1} \circ \Lambda^{-1} \circ \Lambda \circ \varphi_x (p)= \phi'(x')$$
where $\phi' = \phi \circ \Lambda^{-1}$ and $x'= \Lambda (x)$. Trivially, $\phi'(x') = \phi(x)$.

Now, for an active transformation, as you've said in a prior post, one has $\varphi_x\to \varphi_{x'}$ in $\phi = f\circ \varphi_x ^{-1}$, but not in the argument $x(p)=\varphi_x(p)$. Then
$$\phi=f \circ \varphi_x^{-1} \to \phi' = f \circ \varphi_{x'}^{-1} = f \circ \varphi_x^{-1}\circ \varphi_x \circ \varphi_{x'}^{-1} = \phi \circ \Lambda^{-1}.$$
Both $\phi$ and $\phi'$ act on $x$, and clearly $\phi(x)\neq \phi'(x)=\phi(\Lambda^{-1} x)$. Hey presto, we're done! And I think we agree?
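
As a sanity check on the two statements above, here is a minimal numerical sketch. The rotation angle, the sample field `phi`, and the sample point are hypothetical choices, and a 2D rotation stands in for $\Lambda$. It checks that $\phi' = \phi\circ\Lambda^{-1}$ gives $\phi'(x') = \phi(x)$ at the transformed point $x' = \Lambda x$, while $\phi'(x) \neq \phi(x)$ at the same point.

```python
import math

THETA = 0.3  # hypothetical rotation angle standing in for Lambda

def rotate(x, theta):
    """Rotate the 2-component point x by the angle theta."""
    c, s = math.cos(theta), math.sin(theta)
    return [c * x[0] - s * x[1], s * x[0] + c * x[1]]

def lam(x):
    return rotate(x, THETA)

def lam_inv(x):
    return rotate(x, -THETA)

def phi(x):
    # Hypothetical scalar field, chosen so that it is not rotation invariant.
    return x[0] + 2.0 * x[1] ** 2

def phi_prime(x):
    # The transformed field: phi composed with the inverse transformation.
    return phi(lam_inv(x))

x = [1.0, 2.0]
x_prime = lam(x)

value_at_x_prime = phi_prime(x_prime)  # should reproduce phi(x)
value_at_x = phi_prime(x)              # generally differs from phi(x)

print(phi(x), value_at_x_prime, value_at_x)
```

The first two printed numbers agree to rounding error, and the third differs, which is the content of $\phi'(x')=\phi(x)$ versus $\phi'(x)=\phi(\Lambda^{-1}x)$.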

As for derivatives, my point in the prior post was regarding notation --- I'd always had the $\partial_\mu \equiv \partial / \partial x^\mu$. Your post had me a bit confused with the intricacies of the comma notation, but I think we're sorted now. However, you ask

When you do the same:
As above, I do not think we change the coordinate system in an active transformation. So, for now I stick by
$$\frac{\partial }{\partial x^\mu} \phi (x) \to \frac{\partial }{\partial x^\mu} \phi' (x) = \frac{\partial }{\partial x^\mu} \phi (\Lambda^{-1} x) = \frac{\partial x'^\nu}{\partial x^\mu} \frac{\partial }{\partial x'^\nu} \phi (x') =(\Lambda^{-1})^\nu{}_\mu \frac{\partial }{\partial x'^\nu} \phi (x')$$
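
The chain-rule step in this line can be checked numerically. The sketch below is only an illustration under stated assumptions: a 2D rotation stands in for $\Lambda$, the field `phi` and the point are hypothetical, derivatives are estimated by central differences, and $x'$ is read as shorthand for $\Lambda^{-1}x$, as the middle of the line suggests.

```python
import math

THETA = 0.3  # hypothetical angle; the rotation stands in for Lambda
C, S = math.cos(THETA), math.sin(THETA)
LAMBDA_INV = [[C, S], [-S, C]]  # inverse rotation, indexed LAMBDA_INV[nu][mu]

def apply(matrix, x):
    """Matrix-vector product: result[nu] = sum over mu of matrix[nu][mu] * x[mu]."""
    return [sum(matrix[nu][mu] * x[mu] for mu in range(2)) for nu in range(2)]

def phi(x):
    # Hypothetical scalar field.
    return x[0] ** 2 * x[1] + 3.0 * x[1]

def phi_prime(x):
    # Actively transformed field: phi'(x) = phi(Lambda^{-1} x).
    return phi(apply(LAMBDA_INV, x))

def partial(func, mu, point, h=1e-5):
    """Central-difference estimate of the mu-th partial derivative of func at point."""
    up, down = list(point), list(point)
    up[mu] += h
    down[mu] -= h
    return (func(up) - func(down)) / (2 * h)

x = [1.0, 2.0]
x_prime = apply(LAMBDA_INV, x)  # here x' means Lambda^{-1} x

# Left side: d/dx^mu of phi'(x), differentiating the composed function directly.
lhs = [partial(phi_prime, mu, x) for mu in range(2)]

# Right side: (Lambda^{-1})^nu_mu times the nu-th derivative of phi at x'.
rhs = [
    sum(LAMBDA_INV[nu][mu] * partial(phi, nu, x_prime) for nu in range(2))
    for mu in range(2)
]

print(lhs, rhs)
```

A rotation is used rather than a 1+1 boost because its matrix is not symmetric, so the check would fail if the indices on $(\Lambda^{-1})^\nu{}_\mu$ were contracted the wrong way round.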

PS --- if you become king of the world, can you magic me into an academic position at a university?

22. Oct 16, 2012

### Muphrid

You keep talking about the coordinate system not changing in an active transformation. I'm not sure if you mean to say that the coordinate lines are the same or that the basis vectors are the same.

Let $x = x^\mu e_\mu$ and $x' = f(x) = {x'}^\mu e_\mu$. In this picture, the basis vectors aren't changing. I assume this is what you mean by the coordinate system not changing. Nevertheless, the coordinates used for $\nabla'$ are different than those used for $\nabla$. I use this technique all the time to convert between coordinate systems (by the equivalence of passive and active transformations).

Then, you can use the transformation law for the vector derivative:

$$e_\mu \cdot \nabla \phi(x) = e_\mu \cdot \overline f[\nabla' \phi'(x')]$$

Or, in index notation,

$$\partial_\mu \phi(x) = {f^\nu}_\mu {\partial'_\nu} \phi'(x')$$

Which is clearly similar to what you've written, though the method is general, not particular to a boost.
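
To illustrate the claim of generality, here is a rough numerical check of the index form for a linear map that is neither a boost nor a rotation. The matrix `F`, the field `phi`, and the sample point are hypothetical, and the convention $\phi'(x') = \phi(x)$ with $x' = f(x)$ is taken for granted, as above.

```python
F = [[2.0, 1.0], [0.0, 1.0]]       # hypothetical f^nu_mu, indexed F[nu][mu]
F_INV = [[0.5, -0.5], [0.0, 1.0]]  # inverse of F, worked out by hand

def apply(matrix, x):
    """Matrix-vector product: result[nu] = sum over mu of matrix[nu][mu] * x[mu]."""
    return [sum(matrix[nu][mu] * x[mu] for mu in range(2)) for nu in range(2)]

def phi(x):
    # Hypothetical scalar field.
    return x[0] ** 2 * x[1] + 3.0 * x[1]

def phi_prime(x_prime):
    # Defined so that phi_prime(f(x)) equals phi(x).
    return phi(apply(F_INV, x_prime))

def partial(func, mu, point, h=1e-5):
    """Central-difference estimate of the mu-th partial derivative of func at point."""
    up, down = list(point), list(point)
    up[mu] += h
    down[mu] -= h
    return (func(up) - func(down)) / (2 * h)

x = [1.0, 2.0]
x_prime = apply(F, x)

# Left side: derivative of phi with respect to x^mu at x.
lhs = [partial(phi, mu, x) for mu in range(2)]

# Right side: f^nu_mu times the derivative of phi' with respect to x'^nu at x'.
rhs = [
    sum(F[nu][mu] * partial(phi_prime, nu, x_prime) for nu in range(2))
    for mu in range(2)
]

print(lhs, rhs)  # both approximately [4, 4]
```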

23. Oct 16, 2012

### ianhoolihan

Your notation etc is unfamiliar. However, to clarify, by "the same coordinate system", I mean that the basis vectors do not change. You'll note that the derivative is $\partial /\partial x'$ in the last bit.

24. Oct 16, 2012

### Muphrid

Yeah, part of the goal of the notation is to avoid indices as much as possible. Unfortunately, indices are very, very ingrained in most discussions of this math. At any rate, though, we seem to agree that the basis isn't changing, so I think all your other results are valid.

25. Oct 16, 2012

### Fredrik

Staff Emeritus
That's OK, but I think a better way to improve the notation would be to rename my coordinate systems x and y to y and z respectively, or y and y'. If you don't like y, then how about S and S'? The source of the confusion was that I used x for two different things. If you really want to improve the notation for the coordinate systems, it would be best to use a notation that doesn't involve x at all. I'll use y and z in this post. We have x=y(p), x'=z(p), $\phi=f\circ y^{-1}$ and so on.

You're quoting a calculation of mine that's similar to the part of what you did that I said was fine. What I was objecting to was the idea that the thing on the left of the arrow would transform to the thing on the right of the arrow when we change coordinate systems $y\to z$. Did you mean something else by the arrow?