Transformation properties of derivative of a scalar field

ianhoolihan · Oct 11, 2012

Hi all,

I'm a Part III student taking the QFT course. The following seems "trivial", but when I went and asked the lecturer, the response was that they too hate such nitty-gritty details!

The problem is page 12 of Tong's notes: http://www.damtp.cam.ac.uk/user/tong/qft/one.pdf

All we're doing is making an active transformation of a scalar field, ##x\to x'=\Lambda x##, such that ##\phi(x)\to\phi'(x) = \phi(\Lambda^{-1}x)##. Correct me if I'm wrong, but an active rotation in this sense means we keep the axes fixed and rotate the field. (Q1: if we're not changing the axes, what then does ##x\to x' = \Lambda x## even mean?) I can accept why the ##\Lambda^{-1}## appears, but I think this is a more accurate formulation:

$$\phi(x) \rightarrow \phi'(x') = \phi (\Lambda^{-1} x') = \phi (x) $$

So the previous statement should really be ##\phi(x)\to\phi'(x') = \phi(\Lambda^{-1}x')##. (I.e. I've taken ##\phi' = \phi \circ \Lambda^{-1}## in some sense.) It's equivalent to the former, except that the former uses ##x## to mean ##x'##, which I think confuses the (following) situation.

Onto derivatives. The statement given is simply that
$$(\partial_\mu \phi)(x) = (\Lambda^{-1})^\nu{}_\mu (\partial_\nu \phi) (y).$$

My first problem is that things are undefined: is ##\partial_\nu## on the right ##\partial /\partial x^\nu##, ##\partial / \partial (x')^\nu##, or ##\partial / \partial y^\nu##?

I have two different approaches to this:

A:
$$\partial_\mu \phi (x) \rightarrow \partial_{\mu'} \phi'(x') = \frac{\partial x^\nu}{\partial x^{\mu'}}\partial_\nu \phi(\Lambda^{-1}x') =(\Lambda^{-1})^\nu{}_{\mu'} \partial_\nu \phi(\Lambda^{-1}x')$$

where all I've done is change coordinates of the partial derivative. I was going to say that my problem would then be that, if I evaluated the derivative, I'd get another ##\Lambda^{-1}## from the chain rule; but that would only have been the case if I hadn't used the more correct notation of ##x'## instead of ##x## in ##\phi(\Lambda^{-1}x')##. This is, of course, ##\phi(x)##, so evaluating the derivative involves no chain rule. Hence, by this method,

$$\partial_\mu \phi (x) \rightarrow \partial_{\mu'} \phi'(x') = (\Lambda^{-1})^\nu{}_{\mu'} \partial_\nu \phi(x)$$

This is similar to Tong's notes (##\partial_\nu \phi(x) = (\partial_\nu \phi)(x)## as no chain rule) except that I have an ##x## instead of a ##y## in the final term.

B.

The next approach is to use the chain rule, assuming that ##\partial_\mu \to \partial_\mu##, i.e. that the coordinate basis of differentiation does not change, and ignoring my prior statements about ##x## versus ##x'## (which may have been incorrect). Then, letting ##y= \Lambda^{-1} x##,

$$\partial_\mu \phi (x) \rightarrow \partial_\mu \phi(y) = (\partial_\mu y^\nu)\, (\partial_\nu \phi) (y) = (\Lambda^{-1})^\nu{}_\mu (\partial_\nu \phi) (y)$$

where I mean ##(\partial_\nu \phi) (y)## in the sense of ##d f (g(x)) = d (g(x))\, f' (g(x))##. This seems OK, except that it contradicts some of my previous statements, and I'm also not sure whether Tong means what I do by ##(\partial_\nu \phi) (y)##.

I'd much appreciate a few quick comments on which scheme is correct, or, indeed, if both are wrong! I'm on the A-team...

Cheers.

Fredrik · Oct 11, 2012

I haven't looked at the problem involving the derivative yet. I will only try to explain the basics in this post. This stuff is easier when you have studied differential geometry and are used to thinking in terms of coordinate systems.

Let M be spacetime. Let p be an event, i.e. a point in spacetime. Let x and y denote two global coordinate systems. This means in particular that x and y are functions from spacetime into ##\mathbb R^4##. Let f be a real-valued function with domain M.

Using the coordinate systems x and y, we can write
$$f(p)=f\circ x^{-1}(x(p))=f\circ y^{-1}(y(p)).$$ Now let's introduce the notations
$$\phi=f\circ x^{-1},\quad \phi'=f\circ y^{-1},\quad x=x(p),\quad x'=y(p).$$ Yes, I'm using the symbol x for two different things. Keep that in mind when you read the following. We have
$$f(p)=\phi(x)=\phi'(x').$$ Note that the second equality here follows trivially from our definitions. So ##\phi(x)\to\phi'(x')## wouldn't be any kind of "transformation".

I would call f a scalar field and ##\phi## and ##\phi'## coordinate representations of f with respect to the coordinate systems x and y respectively.

A Lorentz transformation is a change of coordinates. We like to write stuff like ##x'=\Lambda x##, so ##\Lambda## denotes a change from the unprimed to the primed coordinates. In particular, when ##\Lambda## takes x(p) as input, the output will be y(p). So ##\Lambda=y\circ x^{-1}##, and ##\Lambda^{-1}=(y\circ x^{-1})^{-1}=x\circ y^{-1}##. This implies that
$$\phi'(x)=f\circ y^{-1}\circ x(p) =\underbrace{f\circ x^{-1}}_{=\phi} \circ \underbrace{x\circ y^{-1}}_{=\Lambda^{-1}} \circ x(p) =\phi(\Lambda^{-1}x).$$ The "active" transformation is the substitution ##x\to x'=\Lambda x##, and the "passive" transformation is the inverse of this, i.e. ##\phi(x)\to\phi(\Lambda^{-1}x)=\phi'(x)##.
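The definitions in this post can be checked numerically. The sketch below rests on assumptions that are not in the post: spacetime is taken to be 1+1 dimensional, points p are stored directly as arrays so that the chart x is the identity, the chart y is x followed by a boost with an arbitrarily chosen rapidity (so ##\Lambda=y\circ x^{-1}##), and f is an arbitrary test function. It confirms that ##\phi(x)=\phi'(x')=f(p)## and that ##\phi'=\phi\circ\Lambda^{-1}##.

```python
import numpy as np

# Hypothetical setup: points p of 1+1 spacetime are stored as arrays in R^2,
# so the chart x is the identity map; RAPIDITY is an arbitrary choice.
RAPIDITY = 0.7
Lam = np.array([[np.cosh(RAPIDITY), np.sinh(RAPIDITY)],
                [np.sinh(RAPIDITY), np.cosh(RAPIDITY)]])
Lam_inv = np.linalg.inv(Lam)

def f(p):
    """An arbitrary scalar field on spacetime (illustrative choice)."""
    return np.sin(p[0]) * np.exp(-p[1] ** 2)

def x_chart(p):       # x: M -> R^2
    return p

def x_chart_inv(u):   # x^{-1}: R^2 -> M
    return u

def y_chart(p):       # y = Lambda o x, so Lambda = y o x^{-1}
    return Lam @ x_chart(p)

def y_chart_inv(u):   # y^{-1} = x^{-1} o Lambda^{-1}
    return x_chart_inv(Lam_inv @ u)

def phi(u):           # coordinate representation w.r.t. x: f o x^{-1}
    return f(x_chart_inv(u))

def phi_prime(u):     # coordinate representation w.r.t. y: f o y^{-1}
    return f(y_chart_inv(u))

p = np.array([0.3, -0.5])
u = np.array([1.2, 0.4])
print(np.isclose(phi(x_chart(p)), phi_prime(y_chart(p))))  # same value f(p) in both charts
print(np.isclose(phi_prime(u), phi(Lam_inv @ u)))          # phi' = phi o Lambda^{-1}
```

Both lines print `True`. The point of the exercise is that nothing happens to f itself; only its coordinate representation changes.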

Fredrik · Oct 11, 2012

##\partial_\mu## means ##\displaystyle\frac{\partial}{\partial x^\mu}##.

ianhoolihan · Oct 11, 2012

I do know some differential geometry, and your explanation gets a big thumbs up! I guess that was what I was going for with the ##\phi \circ \Lambda^{-1}## comment, just not as thorough.

That said, I disagree with the very last bit: according to those notes, ##x\to x' = \Lambda x## corresponds to an active transformation if we transform the field as well, and this gives ##\phi ' = \phi(\Lambda^{-1} x)##. I think passive would mean we rotate the axes as opposed to the field, in which case it would be ##\phi' = \phi(\Lambda x)##.

On second thought, this seems off. How about this: in an active transformation, we transform the field such that the value ##\phi(x)## is taken from the position ##x## to the position ##\Lambda x##. But we don't actually change the coordinates. Hence it makes sense to talk about the new field ##\phi'## at a position ##x##, not ##x'##. Hence ##\phi'(x)##.

With this reckoning, it also means ##\partial_\mu## stays as it is, i.e. does not go to ##\partial_{\mu '}##. Hence the derivative of the transformed field is ##\partial_\mu \phi' (x)##. Using your argument, this is ##\partial_\mu \phi (\Lambda^{-1} x) = (\Lambda^{-1})^\mu{}_\nu \partial_\nu \phi (x)## by the chain rule.

How close am I this time?

PS --- is there an equally sexy way of formulating this in terms of push forwards and differential maps in differential geometry?

Fredrik · Oct 11, 2012

ianhoolihan said:

That said, I disagree with the very last bit

The very last bit was the claim that an active transformation of the scalar field by a Lorentz transformation ##\Lambda## is the substitution ##x\to\Lambda x## in the expression ##\phi(x)##. Now that I've thought about it some more, I think that was wrong.

Because of the identities ##f(p)=f\circ x^{-1}(x(p))=f\circ y^{-1}(y(p))##, I find it hard to think of anything that it makes sense to think of as a transformation of a scalar field under a coordinate change, except maybe the substitution ##x\to y## in the expression ##f\circ x^{-1}(x(p))##, which of course does absolutely nothing. Because of this, I figured there has to be something that these books call active transformation of the field. When I wrote my previous post, I didn't see anything it could refer to other than the substitution ##x\to\Lambda x## in the expression ##\phi(x)##. With hindsight, that was kind of dumb.

Now I'm thinking that since the idea of transformation of the scalar field (i.e. f) doesn't really make sense, they're probably talking about what I would call a transformation of the coordinate representation. This would be a substitution ##x\to y## in the expression ##f\circ x^{-1}##. So to transform the coordinate representation of f means to use the given ##\phi## and ##\Lambda## to find ##\phi'##. This is easy.
$$\phi'=f\circ y^{-1} =f\circ x^{-1}\circ x\circ y^{-1} =\phi\circ\Lambda^{-1}.$$ But is ##\phi\to\phi\circ\Lambda^{-1}## an "active" or a "passive" transformation of ##\phi##? I'm not sure why these terms are mentioned in this context at all. I guess, since this transformation is associated with the transformation ##x\to\Lambda x##, which is an active transformation by ##\Lambda##, we can think of ##\phi\to\phi\circ\Lambda^{-1}## as an active transformation of the coordinate representation of the field by ##\Lambda## (or if we are sloppy with the language, as an active transformation of the field by ##\Lambda##).

The passive transformation "of the field" (actually of its coordinate representation) by ##\Lambda## would then be the transformation ##\phi\to\phi\circ\Lambda##.

Does this sound better?

Note however that in the sloppy notation that's used in QFT books, ##\phi## usually means ##\phi(x)##, so I think you are more likely to see ##\phi'=\phi## (meaning ##\phi'(x')=\phi(x)##) than ##\phi'=\phi\circ\Lambda^{-1}##.

ianhoolihan said:

With this reckoning, it also means ##\partial_\mu## stays as it is, i.e. does not go to ##\partial_{\mu '}##. Hence the derivative of the transformed field is ##\partial_\mu \phi' (x)##.

I didn't quite follow your argument, but these are some of my thoughts on derivatives. When a QFT book writes ##\partial_\mu\phi##, this really means ##\phi_{,\mu}(x(p))##, which is equal to ##\frac{\partial}{\partial x^\mu}\!\big|_p\, f##. (See this post for a brief explanation of the notation.) If we make the substitution ##x\to y## in this expression, we get
$$\frac{\partial}{\partial y^\mu}\bigg|_p f =(f\circ y^{-1})_{,\mu}(y(p)) =\cdots= (\Lambda^{-1})^\nu{}_\mu (f\circ x^{-1})_{,\nu}(x(p))=(\Lambda^{-1})^\nu{}_\mu \frac{\partial}{\partial x^\nu}\bigg|_p f.$$ If you want to see (most of) the details I omitted above, scroll down to my next post in the thread I linked to above.

The equality between the two expressions closest to the dots above can also be written as
$$\phi'{}_{,\mu}(y(p))=(\Lambda^{-1})^\nu{}_\mu \phi_{,\nu}(x(p)) =\Lambda_\mu{}^\nu\phi_{,\nu}(x(p)).$$ See this post if you don't understand what I did with the ##\Lambda##. I'm not sure how to write this in the sloppy notation. I think
$$(\partial_\mu\phi)'(x')=\Lambda_\mu{}^\nu \partial_\nu\phi(x),$$ would make the most sense, but I wouldn't be surprised to see the left-hand side written as ##\partial'_\mu\phi(x')## or ##\partial'_\mu\phi'(x')##. (I don't expect the notation to always make sense).
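The index manipulation used here, ##(\Lambda^{-1})^\nu{}_\mu=\Lambda_\mu{}^\nu##, can be checked numerically. A small sketch, assuming the signature (+,-,-,-) and a concrete, hypothetical Lorentz transformation built from a boost along x and a rotation about z (chosen so that the matrix is not symmetric and the check is not trivial). Lowering the first index and raising the second with ##\eta## should give the transpose of ##\Lambda^{-1}##:

```python
import numpy as np

eta = np.diag([1.0, -1.0, -1.0, -1.0])  # Minkowski metric, signature (+,-,-,-) assumed
eta_inv = np.linalg.inv(eta)

def boost_x(w):
    B = np.eye(4)
    B[0, 0] = B[1, 1] = np.cosh(w)
    B[0, 1] = B[1, 0] = np.sinh(w)
    return B

def rotation_z(theta):
    R = np.eye(4)
    R[1, 1] = R[2, 2] = np.cos(theta)
    R[1, 2], R[2, 1] = -np.sin(theta), np.sin(theta)
    return R

# A generic (non-symmetric) Lorentz transformation; Lam[a, b] stores Lambda^a_b.
Lam = boost_x(0.4) @ rotation_z(0.9)
assert np.allclose(Lam.T @ eta @ Lam, eta)  # defining property of a Lorentz transformation

# Lambda_mu^nu = eta_{mu a} Lambda^a_b eta^{b nu}, stored as lowered[mu, nu].
lowered = np.einsum('ma,ab,bn->mn', eta, Lam, eta_inv)

# (Lambda^{-1})^nu_mu, stored with the same [mu, nu] index order.
inv_transposed = np.linalg.inv(Lam).T

print(np.allclose(lowered, inv_transposed))  # True
```

The identity follows from ##\Lambda^T\eta\Lambda=\eta##, so it holds for any Lorentz transformation, not just this example.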

I suspect that a lot of physicists are thinking like this: ##\partial_\mu\phi## transforms to ##\partial'{}_\mu\phi'##, but ##\phi'=\phi## so we only have to worry about the derivative, which transforms covariantly, i.e. ##\partial'{}_\mu=\Lambda_\mu{}^\nu\partial_\nu##, so $$\partial_\mu\phi \to \partial'{}_\mu\phi'=\Lambda_\mu{}^\nu\partial_\nu\phi.$$ This is more of a mnemonic for the correct result than an actual calculation.

ianhoolihan said:

PS --- is there an equally sexy way of formulating this in terms of push forwards and differential maps in differential geometry?

I haven't really thought about that.

ianhoolihan · Oct 12, 2012

Thanks Fredrik. I'll have to do some reading on this today and get back to you, specifically on what is meant by "active" and "passive". Also, we were disagreeing (I think) on the derivative point. Your contention was that ##\partial_\mu \to \partial_{\mu'} = (\Lambda^{-1})^\nu{}_{\mu'}\partial_\nu##, whereas mine was that ##\partial_\mu \to \partial_\mu## and that the ##\Lambda^{-1}## came from using the chain rule on ##\phi'(x)##, i.e.
$$\partial_\mu \phi'(x^\alpha) = \partial_\mu \phi ((\Lambda^{-1})^\alpha{}_\nu x^\nu) = \partial_\mu ((\Lambda^{-1})^\alpha{}_\nu x^\nu) (\partial_\mu \phi) (\Lambda^{-1}x) =...$$
But there's a problem: the indices do not work. Hmm, so maybe that means this way is wrong?

I will think about it.

Fredrik · Oct 12, 2012

This is how I would take the partial derivative of ##\phi'=\phi\circ\Lambda^{-1}## using the chain rule:
$$\phi'{},_\mu(x)=(\phi\circ\Lambda^{-1}),_\mu(x)=\phi,_\nu(\Lambda^{-1}x)\,(\Lambda^{-1})^\nu{},_\mu(x) =\Lambda_\mu{}^\nu \phi,_\nu\!(\Lambda^{-1}x).$$ So
$$\partial_\mu \phi'(x) =\Lambda_\mu{}^\nu \partial_\nu\phi(\Lambda^{-1}x).$$ Since this holds for all x, it will hold if we replace x by ##\Lambda x##. This yields
$$\partial'{}_\mu \phi'(x') =\Lambda_\mu{}^\nu \partial_\nu\phi(x).$$ Note that I put a prime on the ∂ on the left. I will try to explain why. Consider the expression ##\frac{d}{dx}(ax^2)##. The notation means "the value of the derivative of the function ##t\mapsto at^2## at x". The x in the d/dx tells us both what function we're taking the derivative of, and at what point in the domain the derivative is to be evaluated. Similarly, if we want to take the μth partial derivative of ##\phi'##, and evaluate the result at x', we should write ##\partial/\partial x'{}^\mu##, not ##\partial/\partial x{}^\mu##, and the simplified notation for the former is ##\partial'_\mu##, not ##\partial{}_\mu##.
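A finite-difference sketch of the result ##\partial'_\mu \phi'(x') = (\Lambda^{-1})^\nu{}_\mu\, \partial_\nu\phi(x)##, under stated assumptions: the matrix `Lam` and the function `phi` are arbitrary illustrative choices. The chain-rule step holds for any invertible linear map, so `Lam` is deliberately a generic (non-Lorentz, non-symmetric) matrix; the Lorentz property is only needed for the further rewrite as ##\Lambda_\mu{}^\nu##.

```python
import numpy as np

# Hypothetical choices for illustration: any invertible linear map works for
# the chain-rule identity checked here.
Lam = np.array([[1.3, 0.4],
                [-0.2, 0.9]])
Lam_inv = np.linalg.inv(Lam)

def phi(u):
    return np.sin(u[0]) * np.cos(2.0 * u[1]) + u[0] * u[1] ** 2

def phi_prime(u):            # phi' = phi o Lambda^{-1}
    return phi(Lam_inv @ u)

def grad(func, u, h=1e-6):
    """Central-difference gradient: component mu is func_{,mu}(u)."""
    g = np.zeros(len(u))
    for mu in range(len(u)):
        e = np.zeros(len(u))
        e[mu] = h
        g[mu] = (func(u + e) - func(u - e)) / (2 * h)
    return g

x = np.array([0.5, -0.3])
x_prime = Lam @ x

lhs = grad(phi_prime, x_prime)           # partial'_mu phi'(x'), derivative taken at x'
rhs = Lam_inv.T @ grad(phi, x)           # (Lambda^{-1})^nu_mu partial_nu phi(x)
print(np.allclose(lhs, rhs, atol=1e-6))  # True
```

Using `Lam.T` in place of `Lam_inv.T` gives a different vector here, which is one way to see that it is ##\Lambda^{-1}## (transposed) that belongs in the formula.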

Compare this to what I found here:

Fredrik said:

The equality between the two expressions closest to the dots above can also be written as
$$\phi'{}_{,\mu}(y(p))=(\Lambda^{-1})^\nu{}_\mu \phi_{,\nu}(x(p)) =\Lambda_\mu{}^\nu\phi_{,\nu}(x(p)).$$ See this post if you don't understand what I did with the ##\Lambda##. I'm not sure how to write this in the sloppy notation. I think
$$(\partial_\mu\phi)'(x')=\Lambda_\mu{}^\nu \partial_\nu\phi(x),$$ would make the most sense, but I wouldn't be surprised to see the left-hand side written as ##\partial'_\mu\phi(x')## or ##\partial'_\mu\phi'(x')##. (I don't expect the notation to always make sense).

I see now that (because of what I said about the prime above) ##\partial'_\mu\phi'(x')## is the correct way to write the left-hand side of the first equality in the quote.

We found above (in this post, before the quote) that ##\partial'{}_\mu \phi'(x') =\Lambda_\mu{}^\nu \partial_\nu\phi(x),## but we also need to argue for the fact that the left-hand side is the transformed version of ##\partial_\mu\phi(x)##, i.e. that it's correct to put a prime on each of ##\partial##, ##\phi## and ##x## when we go to another coordinate system. The justification for that is what I did in my previous post (the one I'm quoting above). The μth partial derivative of f with respect to the coordinate system x is ##\frac{\partial}{\partial x^\mu}\!\big|_p\, f##, and it's natural to define the "transformed derivative" as what you get by substituting y for x in that expression, and it follows from the definitions that this is equal to ##\partial'{}_\mu \phi'(x')##.

Fredrik · Oct 12, 2012

Summary

The function f is what should be called a scalar field here. ##\phi=f\circ x^{-1}## and ##\phi'=f\circ y^{-1}## are just its coordinate representations with respect to the coordinate systems x and y respectively. Similarly, x(p) and y(p) (also denoted by x and x' respectively) are coordinate representations of the point p. The "transformation" that we're talking about is a change of coordinate systems from x to y. To see how a specific expression changes under that transformation, what we have to do is to first rewrite it using coordinate-independent stuff like the function f and the coordinate system x (i.e. make all references to the coordinate system explicit instead of hidden as in ##\phi##), and then make the substitution ##x\to y##.

The expressions f and f(p) (the field and its value at p) do not contain x, so they remain unchanged. The coordinate representation of the field changes from ##f\circ x^{-1}## to ##f\circ y^{-1}##, i.e. from ##\phi## to ##\phi'##. The coordinate representation of p changes from ##x(p)## to ##y(p)## i.e. from x to x'. The expression ##(f\circ x^{-1})(x(p))## changes to ##(f\circ y^{-1})(y(p))##. Since the former is by definition equal to ##\phi(x)## and the latter is by definition equal to ##\phi'(x')##, we can also say that ##\phi(x)## changes to ##\phi'(x')##. But both of these are equal to f(p), so the changes of ##f\circ x^{-1}## and ##x(p)## are canceling each other out in the transformation of ##\phi(x)##.

The μth partial derivative of f with respect to the coordinate system x, evaluated at p, changes from ##\frac{\partial}{\partial x^\mu}\!\big|_p\, f=\partial_\mu\phi(x)## to $$\frac{\partial}{\partial y^\mu}\!\bigg|_p\, f =(f\circ y^{-1}),_\mu(y(p))=\partial'_\mu\phi'(x').$$ The prime on the derivative symbol is explained in post #7, just before the quote.

ianhoolihan · Oct 14, 2012

OK, things got busy, so my apologies for the delay.

After reading the Wikipedia article on active and passive transformations, I am happy to say that a passive transformation is just a trivial change of coordinates. That is, the basis vectors are changed to different ones (or the axes are changed, if you like to think of it that way). This is trivial in that it doesn't actually change the object, just how it is described: ##\phi(x)\to \phi'(x') = \phi(x)##. The alternative is an active transformation, where the basis (axes) remains fixed and the coordinate representation of the thing is transformed: ##\phi(x) \to \phi'(x) \neq \phi(x)##. The notation ##x\to x'=\Lambda x## is misleading here, as it really means the coordinate representation at ##x## is moved to ##x'=\Lambda x##. This does actively and physically change the object.

To clarify, if I have two fields ##\phi_a(x)## and ##\phi_b(x)##, and I do a passive transformation ##\phi_a(x) \to \phi_a'(x')##, then once you sort out the different coordinate systems, ##\phi_a'(x')## and ##\phi_b(x)## are still in the same relative state (e.g. the same "distance" from each other). In contrast, if I do an active transformation ##\phi_a(x)\to \phi_a'(x)##, such as a translation, the "distance" between ##\phi_a'(x)## and ##\phi_b(x)## does change.

With a scalar field, how do we write the active transformation? We could think "transform the field at ##x## to that at ##x'=\Lambda x## by acting on ##\phi## with ##\Lambda##". But this only works in the vector case: ##\Lambda## is not defined to act on a scalar. An equivalent approach, however, is to transform the basis vectors by the inverse, and then take these new axes to be your original ##x## coordinates. For example, say I wanted to rotate ##\phi## clockwise by ##\pi/2## using the corresponding ##\Lambda##. In 2D, one would then have ##\phi'( (1,0)) = \phi((0,1)) ##. Instead, I could do a passive transformation ##x\to x' = \Lambda^{-1} x##, so that ##(1,0)\to (1,0)' = (0,1) ## (i.e. the axes rotated clockwise by ##\pi/2##). ##\phi## remains unchanged. But now I say the new ##\phi'## is defined by the action on ##x'## coordinates, and we let ##x'=x##. That is, ##\phi'(x)\equiv \phi(x') = \phi(\Lambda^{-1} x)##. One can see that the original statement ##\phi'( (1,0)) = \phi(\Lambda^{-1}(1,0)) = \phi((0,1)) ## holds.
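The 2D example can be checked directly. A minimal sketch, assuming ##\Lambda## is the clockwise rotation by ##\pi/2## (the direction for which ##\Lambda^{-1}(1,0)=(0,1)##) and using a hypothetical Gaussian bump centred at ##(0,1)## as the field. If ##\phi'(x)=\phi(\Lambda^{-1}x)##, the peak should end up at ##\Lambda(0,1)=(1,0)##:

```python
import numpy as np

# Assumed example: Lam is a clockwise rotation of the plane by pi/2,
# so Lam @ (0, 1) = (1, 0) and Lam_inv @ (1, 0) = (0, 1).
theta = -np.pi / 2  # negative angle = clockwise
Lam = np.array([[np.cos(theta), -np.sin(theta)],
                [np.sin(theta),  np.cos(theta)]])
Lam_inv = Lam.T     # rotations are orthogonal, so the inverse is the transpose

center = np.array([0.0, 1.0])

def phi(u):
    """Hypothetical field: a Gaussian bump centred at (0, 1)."""
    return np.exp(-np.sum((u - center) ** 2))

def phi_rotated(u):
    """Actively rotated field, phi'(u) = phi(Lam^{-1} u)."""
    return phi(Lam_inv @ u)

print(phi_rotated(np.array([1.0, 0.0])))      # peak value, now found at (1, 0)
print(np.allclose(Lam @ center, [1.0, 0.0]))  # True: Lam carries (0, 1) to (1, 0)
```

The first line prints the peak value 1 (up to floating-point rounding in the rotation matrix), confirming ##\phi'((1,0))=\phi((0,1))##.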

OK, I'm not sure how clear that all was!

Now to derivatives. Since the basis doesn't change, all we have is
$$\frac{\partial }{\partial x^\mu} \phi (x) \to \frac{\partial }{\partial x^\mu} \phi' (x) = \frac{\partial }{\partial x^\mu} \phi ((\Lambda^{-1}) x) = \frac{\partial y^\nu}{\partial x^\mu} \frac{\partial }{\partial y^\nu} \phi (y) =(\Lambda^{-1})^\nu{}_\mu \frac{\partial }{\partial y^\nu} \phi (y) $$

where I used ##y\equiv \Lambda^{-1} x##. However, this is not what I want: I want a ##\partial /\partial x## in the last bit, like your equation. How do you evaluate your chain rule? It makes little sense to me...?

Cheers

Fredrik · Oct 14, 2012

ianhoolihan said:

How do you evaluate your chain rule? It makes little sense to me...?

I assume that you're referring to the second and third equality in this line:
$$\phi'{},_\mu(x)=(\phi\circ\Lambda^{-1}),_\mu(x)=\phi,_\nu(\Lambda^{-1}x)\,(\Lambda^{-1})^\nu{},_\mu(x) =\Lambda_\mu{}^\nu \phi,_\nu\!(\Lambda^{-1}x).$$ The second equality is the chain rule in the form ##(f\circ g),_\mu(x)=f,_\nu(g(x))\,g^\nu{},_\mu(x)##, nothing more, nothing less. ##g^\nu## is the notation I use for the map that takes x to the ##\nu##th component of g(x). This post explains why I like this form of the chain rule.
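To make the "slot" reading of this chain rule concrete: ##f,_\nu(g(x))## is the derivative of f with respect to its ##\nu##th argument, evaluated at the point g(x). The sketch below checks ##(f\circ g),_\mu(x)=f,_\nu(g(x))\,g^\nu{},_\mu(x)## by finite differences; f, g and the point x are made-up examples (g is deliberately nonlinear so that the evaluation point matters).

```python
import numpy as np

# Hypothetical maps for illustration: f: R^2 -> R and a nonlinear g: R^2 -> R^2.
def f(a):
    return a[0] ** 2 * a[1]

def g(x):
    return np.array([x[0] + x[1] ** 2, np.sin(x[0])])

def partial(func, u, i, h=1e-6):
    """Central difference for func_{,i}(u): derivative w.r.t. argument slot i, at u."""
    e = np.zeros(len(u))
    e[i] = h
    return (func(u + e) - func(u - e)) / (2 * h)

x = np.array([0.7, -0.4])
gx = g(x)

results = []
for i in range(2):
    lhs = partial(lambda u: f(g(u)), x, i)              # (f o g)_{,i}(x)
    rhs = sum(partial(f, gx, j)                          # f_{,j} evaluated at g(x)
              * partial(lambda u: g(u)[j], x, i)          # g^j_{,i}(x)
              for j in range(2))
    results.append(np.isclose(lhs, rhs))

print(all(results))  # True
```

Note that `partial(f, gx, j)` differentiates f in its own j-th slot and only then plugs in g(x); it is not a derivative with respect to ##x^j##.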

To understand the last equality above, consider the following. Let T be any linear operator. Let ##T^\mu## denote the map ##x\mapsto (Tx)^\mu##, i.e. the map that takes x to the ##\mu##th component of ##Tx##. We have $$T^\mu(x)=(Tx)^\mu=(T(x^\nu e_\nu))^\mu=x^\nu (Te_\nu)^\mu =T^\mu{}_\nu x^\nu,$$ where I have defined ##T^\mu{}_\nu=(Te_\nu)^\mu##. These are the components (=matrix elements) of T with respect to the basis in which ##x## has components ##x^\nu##. (See this post for a little more about this concept).

Now, what is the ##\nu##th partial derivative of ##T^\mu##? It's obviously going to be a constant, since ##T^\mu## is a first-degree polynomial. For all x, we have ##T^\mu{},_\nu(x)=T^\mu{}_\nu##. (Note that this is just the ##\mathbb R^n## version of the statement: if ##f:\mathbb R\to\mathbb R## is defined by f(x)=ax for all x, then f'(x)=a for all x.)

Now consider the special case ##T=\Lambda^{-1}##. We get ##(\Lambda^{-1})^\mu{},_\nu(x)=(\Lambda^{-1})^\mu{}_\nu## for all x. And this right-hand side can also be written as ##\Lambda_\nu{}^\mu##. (I linked to a post that explains that above).
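A quick numerical illustration of the two facts used above, with a randomly generated matrix standing in for T (an assumption for illustration only): the components defined by ##T^\mu{}_\nu=(Te_\nu)^\mu## are just the matrix entries of T, and the Jacobian of ##x\mapsto Tx## equals that matrix at every point x.

```python
import numpy as np

rng = np.random.default_rng(0)
T = rng.normal(size=(4, 4))   # hypothetical linear operator; T[mu, nu] = T^mu_nu
basis = np.eye(4)             # column nu is the basis vector e_nu

# T^mu_nu = (T e_nu)^mu: column nu of the component matrix is the image of e_nu.
components = np.column_stack([T @ basis[:, nu] for nu in range(4)])

def jacobian(func, x, h=1e-6):
    """Central-difference Jacobian: J[mu, nu] = func^mu_{,nu}(x)."""
    n = len(x)
    J = np.zeros((n, n))
    for nu in range(n):
        e = np.zeros(n)
        e[nu] = h
        J[:, nu] = (func(x + e) - func(x - e)) / (2 * h)
    return J

# The Jacobian of x -> T x is the same constant matrix wherever it is evaluated.
for x in (np.zeros(4), rng.normal(size=4), 10 * rng.normal(size=4)):
    assert np.allclose(jacobian(lambda u: T @ u, x), components, atol=1e-6)

print(np.allclose(components, T))  # True
```

With ##T=\Lambda^{-1}## this is exactly the statement ##(\Lambda^{-1})^\mu{},_\nu(x)=(\Lambda^{-1})^\mu{}_\nu## for all x.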

Fredrik · Oct 14, 2012

I assume that this is the article. By its definitions, the coordinate transformation ##x\to y## is a passive transformation (of spacetime), since it's just a change of coordinates (rather than a map from spacetime onto itself).

The term "active" is only used in two places in the pdf you linked to. Just after (1.26) on page 11, he says that we're dealing with an active transformation (of the field). It seems to me that the only thing he can mean by that, is something that's entirely obvious in the notation and terminology I've been using: that the substitution ##x\to y## changes ##f\circ x^{-1}## to ##f\circ y^{-1}##, which is a different function. To see what function it is, we find its value at an arbitrary point u.
$$\phi'(u)=f\circ y^{-1}(u)=f\circ x^{-1}\circ x\circ y^{-1}(u)=\phi(\Lambda^{-1}u).$$ I used that ##\Lambda=y\circ x^{-1}## (i.e. that ##\Lambda## takes ##x(p)## to ##y(p)##).

I don't see how his comment that we get ##\Lambda^{-1}## rather than ##\Lambda## "because we're dealing with an active transformation" explains anything. I would describe what he's doing like this: We're choosing to consider the transformation ##\phi\to\phi'##, induced by the coordinate change ##x\to y##, and this can be thought of as an "active transformation" of ##\phi##, since ##\phi'\neq\phi##. Note that if we had obtained ##\phi'=\phi\circ\Lambda## instead, we still would have had ##\phi'\neq\phi##, making it an "active transformation" of ##\phi##. So it really doesn't seem to make sense to say that the appearance of ##\Lambda^{-1}## instead of ##\Lambda## in the formula for ##\phi'## is explained by the fact that we're doing an active transformation.

Maybe he meant something completely different, which does make sense, but if I were you, I wouldn't spend too much time looking for a meaning where there might not be one. It's possible that he just messed up.

The Wikipedia article uses rotations of ##\mathbb R^2## to illustrate what they mean by active and passive transformations. For example, ##x\mapsto Rx## is considered an active transformation (by the rotation R) of the components of x, while a passive transformation of the components of x is the transformation ##x_i\mapsto x_i'## where the ##x_i'## are defined by ##x=x_ie_i=x_i'Re_i##. It's not hard to turn this into a formula for ##x_i'##. First let's expand ##Re_i## in basis vectors:
$$x_i'Re_i=x_i'(Re_i)_je_j=x_i'R_{ji}e_j.$$ This is equal to ##x_je_j##, so we must have
$$x_j=R_{ji}x_i'.$$ Multiply by ##(R^{-1})_{kj}## (and sum over j).
$$(R^{-1})_{kj}x_j=\delta_{ki} x_i'=x_k'.$$ So we have
$$x_i'=(R^{-1})_{ij}x_j$$ for the passive transformation, and
$$x_i'=R_{ij}x_j$$ for the active transformation. This makes "passive transformations" of component matrices a pretty useless concept in my opinion. A passive transformation (of the component matrix) by R is just an active transformation by ##R^{-1}##.
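A small sketch of this conclusion in ##\mathbb R^2##, with an arbitrarily chosen angle and vector (assumptions for illustration). It checks that the passive components ##x_i'=(R^{-1})_{ij}x_j## really satisfy ##x=x_i'Re_i##, and that they coincide with an active rotation by the opposite angle, i.e. by ##R^{-1}##:

```python
import numpy as np

def rotation(angle):
    """Counterclockwise rotation of R^2 by `angle` radians."""
    return np.array([[np.cos(angle), -np.sin(angle)],
                     [np.sin(angle),  np.cos(angle)]])

theta = 0.6                      # arbitrary angle, chosen for illustration
R = rotation(theta)
e = np.eye(2)                    # column i is the basis vector e_i
x = np.array([2.0, -1.0])        # components x_i of a fixed vector

# Passive: same vector, new components x_i' with respect to the basis R e_i.
x_passive = np.linalg.solve(R, x)            # x_i' = (R^{-1})_{ij} x_j
reconstructed = sum(x_passive[i] * (R @ e[:, i]) for i in range(2))

# Active: a different vector R x, expressed in the original basis.
x_active = R @ x

print(np.allclose(reconstructed, x))                  # True: x = x_i' R e_i
print(np.allclose(x_passive, rotation(-theta) @ x))   # True: passive by R = active by R^{-1}
```

`np.linalg.solve` is used instead of forming ##R^{-1}## explicitly; for a rotation, `R.T @ x` would give the same result.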

Muphrid · Oct 14, 2012

My background isn't in QFT, but what I've read on classical theories of gravity on a flat background seems extensively grounded in the same sort of math. I think I can shed some light on this topic.

Let [itex]f(x) = x'[/itex] represent such an active remapping of positions. The usual convention seems to be that [itex]\phi'(x') = \phi(x)[/itex]. This is equivalent to the form given to you, where they note that [itex]\phi'(x) = \phi(f^{-1}(x))[/itex]. It's enough to recognize that for Lorentz rotations, the passive transformation is always the inverse of some active transformation. I'm not sure how to generalize this to the case of an arbitrary transformation--I expect that a passive transformation of this kind must always be the adjoint of an active transformation, but that doesn't say very much for the nonlinear function [itex]f(x)[/itex].

At any rate, I like to stick with [itex]\phi'(x') = \phi(x)[/itex]. What follows may not be entirely within the realm of the usual QFT way of doing things, but the math is sufficiently similar that something useful should be gleaned from it, I hope. You can analyze the action of derivatives by using the chain rule. Let [itex]\nabla[/itex] be the 3+1d vector derivative operator. The chain rule gives

[tex]a \cdot \nabla \phi(x) = a \cdot \nabla \phi'(x') = [a \cdot \nabla f(x)] \cdot \nabla' \phi'(x')[/tex]

for any vector [itex]a[/itex]. Define [itex]\underline f(a) \equiv a \cdot \nabla f(x)[/itex] as a linear operator on the vector [itex]a[/itex] which is the Jacobian of the transformation. (Note that rotations/boosts are themselves linear and, as such, equal to their own Jacobians.) This leads to the nice result,

[tex]a \cdot \nabla \phi(x) = \underline f(a) \cdot \nabla' \phi'(x')[/tex]

Or, the form which I prefer, which is

[tex]a \cdot \nabla \phi(x) = a \cdot \overline f(\nabla') \phi'(x') \implies \nabla = \overline f(\nabla')[/tex]

where [itex]\overline f[/itex] is the adjoint linear operator to the Jacobian. Note that for Lorentz boosts/rotations, the adjoint is equal to the inverse. (This is always true of orthogonal operators. Lorentz boosts are orthogonal with respect to the Minkowski metric.)

This form gives us the basic tensor transformation law for cotangent vectors (remembering that [itex]\nabla[/itex] is formed from cotangent vectors). The corresponding law for tangent vectors is derived by taking [itex]x = x(\lambda)[/itex] for some affine parameter [itex]\lambda[/itex].

[tex]\frac{dx'}{d\lambda} = \frac{dx}{d\lambda} \cdot \nabla f(x) = \underline f\left(\frac{dx}{d\lambda}\right) \implies \frac{dx}{d\lambda} = \underline f^{-1} \left(\frac{dx'}{d\lambda}\right)[/tex]

This finishes the derivation of the transformation laws for tensors. In the case of the derivative of a scalar field, we see that

[tex]\nabla \phi(x) = \overline f[\nabla' \phi'(x')][/tex]

...I think that's right. The book I originally learned all this from preferred to switch [itex]x, x'[/itex], which I thought was unduly confusing, and it doesn't seem consistent with what the PDF linked does, either.
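A component-level sketch of these laws for a nonlinear remapping, under assumptions not in the post: 2D, no metric involved (covector components are paired directly with vector components), and a hypothetical invertible map `F` with an arbitrary field `phi`. In this plain component form, the adjoint [itex]\overline f[/itex] of the Jacobian acts as its transpose. The code checks the cotangent law [itex]\nabla \phi(x) = \overline f[\nabla' \phi'(x')][/itex], the tangent law [itex]a' = \underline f(a)[/itex], and the invariance of [itex]a \cdot \nabla \phi[/itex]:

```python
import numpy as np

# Hypothetical nonlinear remapping x' = F(x), with an explicit inverse.
def F(x):
    return np.array([x[0] + x[1] ** 3, x[1]])

def F_inv(u):
    return np.array([u[0] - u[1] ** 3, u[1]])

def phi(x):
    """Arbitrary scalar field in the unprimed description."""
    return np.cos(x[0]) * x[1] + x[0] ** 2

def phi_prime(u):
    """Primed description, phi'(x') = phi(x), i.e. phi' = phi o F^{-1}."""
    return phi(F_inv(u))

def grad(func, u, h=1e-6):
    """Central-difference gradient of a scalar function: component k is func_{,k}(u)."""
    g = np.zeros(len(u))
    for k in range(len(u)):
        e = np.zeros(len(u))
        e[k] = h
        g[k] = (func(u + e) - func(u - e)) / (2 * h)
    return g

def jacobian(func, u, h=1e-6):
    """Central-difference Jacobian of a vector function: J[i, k] = func^i_{,k}(u)."""
    J = np.zeros((len(u), len(u)))
    for k in range(len(u)):
        e = np.zeros(len(u))
        e[k] = h
        J[:, k] = (func(u + e) - func(u - e)) / (2 * h)
    return J

x = np.array([0.4, 0.8])
x_prime = F(x)
J = jacobian(F, x)   # components of underline-f (the Jacobian) at x

# Cotangent law: grad phi(x) = J^T grad' phi'(x'); J^T plays the role of overline-f.
cotangent_ok = np.allclose(grad(phi, x), J.T @ grad(phi_prime, x_prime), atol=1e-6)

# Tangent law a' = J a, and invariance of the pairing a . grad phi.
a = np.array([1.5, -0.7])
a_prime = J @ a
pairing_ok = np.isclose(a @ grad(phi, x), a_prime @ grad(phi_prime, x_prime))

print(cotangent_ok, pairing_ok)  # True True
```

For a Lorentz transformation with the Minkowski metric taken into account, the adjoint would additionally involve ##\eta##; this sketch deliberately avoids that by working with raw components.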

ianhoolihan · Oct 15, 2012

Fredrik said:

I assume that you're referring to the second and third equality in this line:
$$\phi'{},_\mu(x)=(\phi\circ\Lambda^{-1}),_\mu(x)=\phi,_\nu(\Lambda^{-1}x)\,(\Lambda^{-1})^\nu{},_\mu(x) =\Lambda_\mu{}^\nu \phi,_\nu\!(\Lambda^{-1}x).$$ The second equality is the chain rule in the form ##(f\circ g),_\mu(x)=f,_\nu(g(x))\,g^\nu{},_\mu(x)##, nothing more, nothing less. ##g^\nu## is the notation I use for the map that takes x to the ##\nu##th component of g(x). This post explains why I like this form of the chain rule.

I just skimmed through this before lectures, and to clarify, I was confused by notation: I see ##(\Lambda^{-1})^\nu{},_\mu = 0## whereas ##(\Lambda^{-1}x)^\nu{},_\mu = (\Lambda^{-1})^\nu{}_\mu##.

I'll read everything else later.

ianhoolihan · Oct 15, 2012

Fredrik said:

I assume that you're referring to the second and third equality in this line:
$$\phi'{},_\mu(x)=(\phi\circ\Lambda^{-1}),_\mu(x)=\phi,_\nu(\Lambda^{-1}x)\,(\Lambda^{-1})^\nu{},_\mu(x) =\Lambda_\mu{}^\nu \phi,_\nu\!(\Lambda^{-1}x).$$ The second equality is the chain rule in the form ##(f\circ g),_\mu(x)=f,_\nu(g(x))\,g^\nu{},_\mu(x)##, nothing more, nothing less. ##g^\nu## is the notation I use for the map that takes x to the ##\nu##th component of g(x). This post explains why I like this form of the chain rule.

OK, I still don't like this. In the linked post, you state the equality between
$$\frac{\partial (f\circ g)(x)}{\partial x_i}=\sum_{j=1}^m\frac{\partial f(g(x))}{\partial g_j}\frac{\partial g_j(x)}{\partial x_i},$$
and
$$(f\circ g)_{,i}(x)=f_{,j}(g(x))g_{j,i}(x)$$
which I disagree with, as I've always thought ##f,_\mu \equiv \partial f / \partial x^\mu##, so that the second expression is
$$(f\circ g)_{,i}(x)=f_{,j}(g(x))g_{j,i}(x) =\sum_{j=1}^m\frac{\partial f(g(x))}{\partial x_j}\frac{\partial g_j(x)}{\partial x_i}$$
which is incorrect. Furthermore, I do not see what is wrong with my expression,
$$\frac{\partial }{\partial x^\mu} \phi (x) \to \frac{\partial }{\partial x^\mu} \phi' (x) = \frac{\partial }{\partial x^\mu} \phi ((\Lambda^{-1}) x) = \frac{\partial y^\nu}{\partial x^\mu} \frac{\partial }{\partial y^\nu} \phi (y) =(\Lambda^{-1})^\nu{}_\mu \frac{\partial }{\partial y^\nu} \phi (y)$$

Fredrik said:

Now consider the special case ##T=\Lambda^{-1}##. We get ##(\Lambda^{-1})^\mu{},_\nu(x)=(\Lambda^{-1})^\mu{}_\nu## for all x. And this right-hand side can also be written as ##\Lambda_\nu{}^\mu##. (I linked to a post that explains that above).

I still think it is much more transparent to write ##(\Lambda^{-1}x)^\mu{},_\nu = ((\Lambda^{-1})^\mu{}_\alpha x^\alpha),_\nu = (\Lambda^{-1})^\mu{}_\alpha \delta^\alpha_\nu = (\Lambda^{-1})^\mu{}_\nu##. We agree on this part, so I'll leave it.

Fredrik said:

I assume that this is the article. By its definitions, the coordinate transformation ##x\to y## is a passive transformation (of spacetime), since it's just a change of coordinates (rather than a map from spacetime onto itself).

The term "active" is only used in two places in the pdf you linked to. Just after (1.26) on page 11, he says that we're dealing with an active transformation (of the field). It seems to me that the only thing he can mean by that is something that's entirely obvious in the notation and terminology I've been using: that the substitution ##x\to y## changes ##f\circ x^{-1}## to ##f\circ y^{-1}##, which is a different function. To see what function it is, we find its value at an arbitrary point u.
$$\phi'(u)=f\circ y^{-1}(u)=f\circ x^{-1}\circ x\circ y^{-1}(u)=\phi(\Lambda^{-1}u).$$ I used that ##\Lambda=y\circ x^{-1}## (i.e. that ##\Lambda## takes ##x(p)## to ##y(p)##).
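A toy numerical check of this identity, under assumptions that are mine and not from the thread: M is modelled by ##\mathbb R^2##, the chart x is a hypothetical affine map, the chart y is ##\Lambda\circ x## with ##\Lambda## a rotation (standing in for a Lorentz transformation), and f is an arbitrary test function. The sketch also shows that ##\phi## and ##\phi'## are different functions, while both give ##f(p)## when fed the coordinates of p in their own chart.

```python
import math

# Hypothetical toy model: points p of M are tuples, x_chart is affine,
# y_chart = Lambda o x_chart with Lambda a rotation by angle a.
a = 0.5
c, s = math.cos(a), math.sin(a)
Lam = [[c, -s], [s, c]]
Lam_inv = [[c, s], [-s, c]]  # inverse (= transpose) of the rotation

def apply(M, v):
    return (M[0][0] * v[0] + M[0][1] * v[1], M[1][0] * v[0] + M[1][1] * v[1])

def x_chart(p):
    return (p[0] + 1.0, 2.0 * p[1])

def x_chart_inv(u):
    return (u[0] - 1.0, u[1] / 2.0)

def y_chart(p):
    return apply(Lam, x_chart(p))

def y_chart_inv(u):
    return x_chart_inv(apply(Lam_inv, u))

def f(p):
    # Scalar field on M, defined without reference to any chart.
    return p[0] ** 2 + 3.0 * p[1]

def phi(u):
    return f(x_chart_inv(u))        # phi = f o x^{-1}

def phi_prime(u):
    return f(y_chart_inv(u))        # phi' = f o y^{-1}

p = (0.3, -0.8)
u = x_chart(p)
print(phi_prime(u), phi(apply(Lam_inv, u)))  # phi'(u) = phi(Lambda^{-1} u)
print(phi(u), phi_prime(u))                  # different functions at the same u
print(phi_prime(y_chart(p)), f(p))           # same value at the same point p
```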

OK, I agree, and that was what I was trying to say with my earlier post --- yours is just more elegant!

Fredrik said:

I don't see how his comment that we get ##\Lambda^{-1}## rather than ##\Lambda## "because we're dealing with an active transformation" explains anything. I would describe what he's doing like this: We're choosing to consider the transformation ##\phi\to\phi'##, induced by the coordinate change ##x\to y##, and this can be thought of as an "active transformation" of ##\phi##, since ##\phi'\neq\phi##. Note that if we had obtained ##\phi'=\phi\circ\Lambda## instead, we still would have had ##\phi'\neq\phi##, making it an "active transformation" of ##\phi##. So it really doesn't seem to make sense to say that the appearance of ##\Lambda^{-1}## instead of ##\Lambda## in the formula for ##\phi'## is explained by the fact that we're doing an active transformation.

Maybe he meant something completely different, which does make sense, but if I were you, I wouldn't spend too much time looking for a meaning where there might not be one. It's possible that he just messed up.

Agreed, again.

Fredrik said:

The Wikipedia article uses rotations of ##\mathbb R^2## to illustrate what they mean by active and passive transformations. For example, ##x\mapsto Rx## is considered an active transformation (by the rotation R) of the components of x, while a passive transformation of the components of x is the transformation ##x_i\mapsto x_i'## where the ##x_i'## are defined by ##x=x_ie_i=x_i'Re_i##. It's not hard to turn this into a formula for ##x_i'##. First let's expand ##Re_i## in basis vectors:
$$x_i'Re_i=x_i'(Re_i)_je_j=x_i'R_{ji}e_j.$$ This is equal to ##x_je_j##, so we must have
$$x_j=R_{ji}x_i'.$$ Multiply by ##(R^{-1})_{kj}## (and sum over j).
$$(R^{-1})_{kj}x_j=\delta_{ki} x_i'=x_k'.$$ So we have
$$x_i'=(R^{-1})_{ij}x_j$$ for the passive transformation, and
$$x_i'=R_{ij}x_j$$ for the active transformation. This makes "passive transformations" of component matrices a pretty useless concept in my opinion. A passive transformation (of the component matrix) by R is just an active transformation by ##R^{-1}##.
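A small sketch of the two rotation formulas, assuming an arbitrary angle of 0.9 rad and arbitrary components. The passive components are obtained from ##x_j=R_{ji}x_i'## using ##R^{-1}=R^T## for rotations, and the last line checks that a passive transformation by R coincides with an active one by ##R^{-1}##.

```python
import math

def rot(a):
    # Rotation of R^2 by angle a (counterclockwise).
    return [[math.cos(a), -math.sin(a)], [math.sin(a), math.cos(a)]]

def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(2)) for i in range(2)]

R = rot(0.9)
x = [2.0, -0.5]          # components x_j in the standard basis e_j

# Active by R: the vector moves, the basis stays.
x_active = matvec(R, x)

# Passive by R: the vector stays, the basis becomes R e_i.
# x_j = R_ji x'_i  =>  x' = R^{-1} x = R^T x for a rotation.
R_T = [[R[j][i] for j in range(2)] for i in range(2)]
x_passive = matvec(R_T, x)

# Rebuild the vector as sum_i x'_i (R e_i); (R e_i)_j = R_ji is column i of R.
rebuilt = [sum(x_passive[i] * R[j][i] for i in range(2)) for j in range(2)]

print(rebuilt)                          # the original components of x
print(x_passive, matvec(rot(-0.9), x))  # passive by R == active by R^{-1}
```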

Errmm, I'll leave this.

So, for now, we are in agreement, except for this bit about the chain rule.

Cheers.

vanhees71 · Oct 15, 2012

Somehow this is all written down in a very complicated way. First of all we have to recall the transformation rule for scalar fields under Lorentz transformations (i.e., boosts and rotations and all possible compositions of those):
[tex]\phi'(x')=\phi(x)=\phi(\Lambda^{-1} x'), \quad x'=\Lambda x \; \Leftrightarrow \; x=\Lambda^{-1} x'.[/tex]
Here [itex]\Lambda[/itex] is the Lorentz-transformation matrix [itex]{\Lambda^{\mu}}_{\nu}[/itex] fulfilling [itex]\Lambda^{-1}=g \Lambda^T g.[/itex]
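One quick way to convince oneself of ##\Lambda^{-1}=g\Lambda^Tg## is to test it numerically. The sketch below assumes signature (+,-,-,-) and a hypothetical Lorentz transformation built from a boost along ##x^1## and a rotation about ##x^3##. The identity follows from the defining property ##\Lambda^Tg\Lambda=g## together with ##g^2=1##; for comparison, the plain transpose is not the inverse.

```python
import math

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def transpose(A):
    return [list(row) for row in zip(*A)]

def distance_from_identity(M):
    return max(abs(M[i][j] - (1.0 if i == j else 0.0))
               for i in range(len(M)) for j in range(len(M)))

# Minkowski metric, signature (+,-,-,-); g is its own inverse.
g = [[1, 0, 0, 0], [0, -1, 0, 0], [0, 0, -1, 0], [0, 0, 0, -1]]

# Hypothetical Lorentz transformation: boost along x^1 (rapidity 0.8)
# composed with a rotation about x^3 (angle 0.4 rad).
ch, sh = math.cosh(0.8), math.sinh(0.8)
boost = [[ch, sh, 0, 0], [sh, ch, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
ca, sa = math.cos(0.4), math.sin(0.4)
rotation = [[1, 0, 0, 0], [0, ca, -sa, 0], [0, sa, ca, 0], [0, 0, 0, 1]]
Lam = matmul(boost, rotation)

Lam_inv = matmul(matmul(g, transpose(Lam)), g)                   # g Lambda^T g
err = distance_from_identity(matmul(Lam_inv, Lam))               # rounding level
naive_err = distance_from_identity(matmul(transpose(Lam), Lam))  # not small
print(err, naive_err)
```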

Now we have
[tex]\partial_{\mu}' \phi'(x')=\partial_{\mu}' \phi(x)=\frac{\partial x^{\nu}}{\partial x'^{\mu}} \partial_{\nu} \phi(x)={(\Lambda^{-1})^{\nu}}_{\mu} \partial_{\nu} \phi(x).[/tex]
In short [itex]\partial_{\mu} \phi[/itex] transforms under Lorentz transformations as a covariant vector field, and that's what was to be shown.

BTW: This is why the derivative of a scalar field with respect to the contravariant vector components, [itex]x^{\mu}[/itex], leads to a lower index for a covariant vector, the four-dimensional gradient, [itex]\partial_{\mu} \phi[/itex].
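To illustrate the "covariant vector" statement: contracting ##\partial_\mu\phi## with a contravariant vector ##v^\mu## should give the same number before and after the transformation, because ##(\Lambda^{-1})^\nu{}_\mu\Lambda^\mu{}_\alpha=\delta^\nu_\alpha##. A finite-difference sketch, assuming 1+1 dimensions, a boost of rapidity 0.6, and an arbitrary test field and vector (none of these choices come from the thread):

```python
import math

def boost(eta):
    return [[math.cosh(eta), math.sinh(eta)], [math.sinh(eta), math.cosh(eta)]]

def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(2)) for i in range(2)]

def phi(v):
    # Arbitrary smooth test field in 1+1 dimensions.
    return math.exp(-v[0] ** 2) * math.cos(v[1]) + 0.5 * v[0] * v[1]

def grad(F, v, h=1e-6):
    # Central differences for F,_mu(v): lower-index components.
    out = []
    for mu in range(2):
        vp, vm = list(v), list(v)
        vp[mu] += h
        vm[mu] -= h
        out.append((F(vp) - F(vm)) / (2 * h))
    return out

Lam, Lam_inv = boost(0.6), boost(-0.6)

def phi_prime(xp):
    return phi(matvec(Lam_inv, xp))    # phi'(x') = phi(Lambda^{-1} x')

x = [0.3, 1.1]
v = [1.0, 0.25]                        # contravariant vector v^mu at x
xp, vp = matvec(Lam, x), matvec(Lam, v)

before = sum(a * b for a, b in zip(grad(phi, x), v))          # d_mu phi(x) v^mu
after = sum(a * b for a, b in zip(grad(phi_prime, xp), vp))   # d'_mu phi'(x') v'^mu
print(before, after)   # agree up to finite-difference error
```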

ianhoolihan · Oct 15, 2012

OK, we are trying to prove, not "recall" the rule. Also, I'm not sure we agree on what is "active".

Passive:
$$\quad\phi'(x')=\phi(x)=\phi(\Lambda^{-1} x'), \quad x'=\Lambda x \; \Leftrightarrow \; x=\Lambda^{-1} x'.$$
This is a trivial change of coordinates.

Active:
$$\quad\phi'(x')=\phi'(x)=\phi(\Lambda^{-1} x)$$
Here we keep the coordinates the same, but change ##\phi\to \phi'##. I think Fredrik has explained it well in his last posts.

I think Fredrik and I also agree that ##\partial'_\mu = \partial_\mu##, as the coordinates do not change in an active transformation. The ##\Lambda^{-1}## comes from the chain rule.

Fredrik · Oct 15, 2012

vanhees71 said:

Somehow this is all written down in a very complicated way. First of all we have to recall the transformation rule for scalar fields under Lorentz transformations (i.e., boosts and rotations and all possible compositions of those):
[tex]\phi'(x')=\phi(x)=\phi(\Lambda^{-1} x'), \quad x'=\Lambda x \; \Leftrightarrow \; x=\Lambda^{-1} x'.[/tex]

The reason why my calculations are much longer than yours is that I'm explaining why the "transformed" versions of ##\phi(x)## and ##\partial_\mu\phi(x)## are equal to ##\phi'(x')## and ##\partial_\mu'\phi'(x')## respectively. If you take that as given, or as "obvious" (it's not to me), then the rest is fairly easy, as you noted.

Fredrik · Oct 15, 2012

ianhoolihan said:

OK, I still don't like this. In the linked post, you state the equality between
$$\frac{\partial (f\circ g)(x)}{\partial x_i}=\sum_{j=1}^m\frac{\partial f(g(x))}{\partial g_j}\frac{\partial g_j(x)}{\partial x_i},$$
and
$$(f\circ g)_{,i}(x)=f_{,j}(g(x))g_{j,i}(x)$$
which I disagree with, as I've always thought ##f,_\mu \equiv \partial f / \partial x^\mu##, so that the second expression is
$$(f\circ g)_{,i}(x)=f_{,j}(g(x))g_{j,i}(x) =\sum_{j=1}^m\frac{\partial f(g(x))}{\partial x_j}\frac{\partial g_j(x)}{\partial x_i}$$
which is incorrect.

##f,_\mu## denotes the ##\mu##th partial derivative of f. (Note that this is a function that can be found from knowledge of f alone). So ##f,_\mu(g(x))## denotes the value of ##f,_\mu## at ##g(x)##. This is the point of the notation, it makes it perfectly clear what function we're dealing with, and at what point in its domain we are to evaluate it.

So the notation ##f,_\mu(g(x))## can't possibly mean ##\partial f(g(x))/\partial x^\mu##, because a) the latter expression denotes the value of the ##\mu##th partial derivative of ##f\circ g## at x, which in the comma notation is denoted by ##(f\circ g),_\mu(x)##, and b) the function that's being evaluated in the former expression is ##f,_\mu## which has nothing to do with g.
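The distinction between ##f,_\mu(g(x))## and ##(f\circ g),_\mu(x)## can be seen with concrete numbers. The functions below are hypothetical examples chosen for the illustration: ##f,_\mu## is coded from f alone, and the derivative of the composite is computed by finite differences, then compared with the chain-rule form ##f,_\nu(g(x))g^\nu{},_\mu(x)##.

```python
# Hypothetical example functions: f : R^2 -> R and g : R^2 -> R^2.
def f(u):
    return u[0] ** 2 * u[1]

def f_partials(u):
    # f,_0 and f,_1 as functions; they depend on f alone, not on g.
    return [2 * u[0] * u[1], u[0] ** 2]

def g(x):
    return [x[0] + x[1] ** 2, 3 * x[0]]

def g_jacobian(x):
    # g_jacobian(x)[nu][mu] = g^nu,_mu(x)
    return [[1.0, 2 * x[1]], [3.0, 0.0]]

def fd_grad(F, x, h=1e-6):
    out = []
    for mu in range(2):
        xp, xm = list(x), list(x)
        xp[mu] += h
        xm[mu] -= h
        out.append((F(xp) - F(xm)) / (2 * h))
    return out

x = [1.0, 2.0]
composite = fd_grad(lambda v: f(g(v)), x)   # (f o g),_mu(x), numerically
f_at_gx = f_partials(g(x))                  # f,_mu evaluated at g(x)
J = g_jacobian(x)
chain = [sum(f_at_gx[nu] * J[nu][mu] for nu in range(2)) for mu in range(2)]
print(f_at_gx)    # [30.0, 25.0]
print(chain)      # [105.0, 120.0]
print(composite)  # approximately [105, 120]
```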

ianhoolihan said:

Furthermore, I do not see what is wrong with my expression,
$$\frac{\partial }{\partial x^\mu} \phi (x) \to \frac{\partial }{\partial x^\mu} \phi' (x) = \frac{\partial }{\partial x^\mu} \phi ((\Lambda^{-1}) x) = \frac{\partial y^\nu}{\partial x^\mu} \frac{\partial }{\partial y^\nu} \phi (y) =(\Lambda^{-1})^\nu{}_\mu \frac{\partial }{\partial y^\nu} \phi (y)$$

The stuff after the arrow looks fine to me. I just don't know why ##\phi## would be the only thing that transforms when we change the coordinate system.

ianhoolihan said:

I still think it is much more transparent to write ##(\Lambda^{-1}x)^\mu{},_\nu = ((\Lambda^{-1})^\mu{}_\alpha x^\alpha),_\nu = (\Lambda^{-1})^\mu{}_\alpha \delta^\alpha_\nu = (\Lambda^{-1})^\mu{}_\nu##.

I'm OK with this too, but I would prefer the ##\partial_\mu## notation over the comma notation here. This is a bit nitpicky, but to use the comma notation here is like writing ##(f(x)),_\mu## instead of ##f,_\mu(x)##, and I find that kind of ugly because ##,_\mu## is supposed to be an operator that takes a function to a function, and f(x) isn't a function, it's a number in the range of the function f.

I suppose we could say the same about the expression ##\partial_\mu(f(x))##. The operator is supposed to act on f, not on f(x). But I find this less annoying, because in this context we have defined ##\partial_\mu## as an abbreviation of ##\partial/\partial x^\mu##, and the x in denominator of ##\frac{\partial}{\partial x^\mu}f(x)## has a purpose. It reminds us that the function we're taking a partial derivative of is ##x\mapsto f(x)##, as opposed to say ##z\mapsto f(x)##. For this reason, I find the ##\frac{\partial}{\partial x^\mu}f(x)## notation (and therefore also the ##\partial_\mu f(x)## notation) useful enough to be tolerable.

Muphrid · Oct 15, 2012

Some of this really involves the invoking of a convention. The easiest thing is to choose the convention for active transformations and then verify the corresponding passive transformation law.

Thus, there's no harm in taking for granted that ##\phi'(x') = \phi(x)## for active transformations ##x'=f(x)##. From here, we just need to derive the passive transformation law. Consider instead ##\phi(x \cdot e^\mu e_\mu)##. A passive transformation transforms the basis vectors without transforming the vector ##x## itself. Let ##\overline f^{-1} ({e^\mu}') = e^\mu##, and let ##\phi(x) = \Phi(x \cdot e^0, x \cdot e^1, \ldots{})##, so then

$$\phi(x) = \Phi(x \cdot e^0, x \cdot e^1) = \Phi(x \cdot \overline f^{-1}({e^0}'), x \cdot \overline f^{-1}({e^1}')) = \Phi(\underline f^{-1}(x) \cdot {e^0}', \underline f^{-1}(x) \cdot {e^1}')$$

Now, define ##\phi'(x) = \Phi(x \cdot {e^0}', \ldots{})##. It is then guaranteed that ##\phi'(\underline f^{-1}(x)) = \phi(x)##, which allows us to conclude that the passive transformation has ##x' = \underline f^{-1}(x)##. We have constructed it to be so, and this should be persuasive (though I won't presume to call it proof) that passive transformations are naturally the inverses of active ones. ##x## doesn't really change, but there exists an ##x'## that would come from the active transformation that corresponds to the passive one.

Ultimately, though, making the statement that ##\phi'(x') = \phi(x)## is a necessary convention, in my opinion, while finding out how ##x'## relates to ##x## under the two kinds of transformations is really the matter at hand.

Fredrik · Oct 15, 2012

ianhoolihan said:

OK, we are trying to prove, not "recall" the rule. Also, I'm not sure we agree on what is "active".

Passive:
$$\quad\phi'(x')=\phi(x)=\phi(\Lambda^{-1} x'), \quad x'=\Lambda x \; \Leftrightarrow \; x=\Lambda^{-1} x'.$$
This is a trivial change of coordinates.

In my opinion, this shows why the terms "active" and "passive" shouldn't be used at all in this context. You're calling this a "passive" transformation probably because I said that by the Wikipedia article's definition, a coordinate change is a passive transformation. However, if M denotes the spacetime manifold, and x and y are coordinate systems that map M onto ##\mathbb R^4##, as in all of my posts above, then while a coordinate change ##x\to y## could be called a passive transformation (of p, or of M), the function ##\Lambda=y\circ x^{-1}## that induces this change on coordinate 4-tuples has to be considered an active transformation of the components of x(p) since it takes x(p) to y(p).

And this is just the start of the confusion, since the pdf talks about active vs. passive transformations of the field without much of an explanation.

If I was the king of the universe, I think I would permanently retire that confusing terminology, at least from the context of transformation of field components.

ianhoolihan · Oct 16, 2012

OK, reading the wiki again, the example makes it clear what is active and passive. In a passive transformation, the geometric thing (the vector) does not change, only its coordinate representation, and trivially so. The basis is transformed, and the coordinate representation of the vector is transformed by the inverse, so the net effect is zilch --- a trivial coordinate transformation. In an active transformation, the geometric thing (the vector) is itself rotated, which is represented by a transformation of the corresponding coordinate representation, but not of the basis vectors. The net effect is not zilch.

Now, to your method of things. Since I don't want to confuse myself, I'll denote the map that takes a point ##p## in the manifold to a point in a subset ##V## of ##\mathbf{R}^n## by ##\varphi_x##. That is, ##\varphi_x## corresponds to your map ##x##. So the coordinates of ##x(p)## for me are the coordinates of ##\varphi_x(p)##.

Define some function ##f:\ M\to \mathbf{I}##. As you say
$$f(p) = f\circ \varphi_x^{-1} \circ \varphi_x (p) = \phi(x)$$
where ##\phi \equiv f\circ \varphi_x^{-1}## and ##x \equiv \varphi_x(p)##.

Now, by passive transformation, I think we are introducing a new coordinate system ##x'## on ##V## as above, such that ##x \to x' = \Lambda x##. As you suggest, we can define ##\Lambda = \varphi_{x'}\circ \varphi_x^{-1}\ : V\to V## where ##\varphi_{x'}\ : M \to V##. Now, all we have is
$$\phi(x) = f \circ \varphi_x^{-1} \circ \Lambda^{-1} \circ \Lambda \circ \varphi_x (p)= \phi'(x')$$
where ##\phi' = \phi \circ \Lambda^{-1} ## and ##x'= \Lambda (x)##. Trivially, ##\phi'(x') = \phi(x)##.

Now, for an active transformation, as you've said in a prior post, one has ##\varphi_x\to \varphi_{x'}## in ##\phi = f\circ \varphi_x ^{-1}##, but not in the argument ##x(p)=\varphi_x(p)##. Then
$$\phi=f \circ \varphi_x^{-1} \to \phi' = f \circ \varphi_{x'}^{-1} = f \circ \varphi_x^{-1}\circ \varphi_x \circ \varphi_{x'}^{-1} = \phi \circ \Lambda^{-1}.$$
Both ##\phi## and ##\phi'## act on ##x##, and clearly ##\phi(x)\neq \phi'(x)=\phi(\Lambda^{-1} x)##. Hey presto, we're done! And I think we agree?

As for derivatives, my point in the prior post was regarding notation --- I'd always taken ##\partial_\mu \equiv \partial / \partial x^\mu##. Your post had me a bit confused with the intricacies of the comma notation, but I think we're sorted now. However, you ask

The stuff after the arrow looks fine to me. I just don't know why ##\phi## would be the only thing that transforms when we change the coordinate system.

When you do the same:

$$\phi'{},_\mu(x)=(\phi\circ\Lambda^{-1}),_\mu(x)=\phi,_\nu(\Lambda^{-1}x)\,(\Lambda^{-1})^\nu{},_\mu(x) =\Lambda_\mu{}^\nu \phi,_\nu\!(\Lambda^{-1}x).$$

As above, I do not think we change the coordinate system in an active transformation. So, for now I stick by
$$\frac{\partial }{\partial x^\mu} \phi (x) \to \frac{\partial }{\partial x^\mu} \phi' (x) = \frac{\partial }{\partial x^\mu} \phi (\Lambda^{-1} x) = \frac{\partial x'^\nu}{\partial x^\mu} \frac{\partial }{\partial x'^\nu} \phi (x') =(\Lambda^{-1})^\nu{}_\mu \frac{\partial }{\partial x'^\nu} \phi (x')$$
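As a numerical sanity check of this formula (a sketch under assumptions not in the thread: 1+1 dimensions, a boost of rapidity 0.7, an arbitrary test field), one can compare a finite-difference derivative of ##\phi\circ\Lambda^{-1}## at x with ##(\Lambda^{-1})^\nu{}_\mu## contracted against the derivative of ##\phi## evaluated at the point ##\Lambda^{-1}x##, which is the point written ##x'## (or ##y##) on the right-hand side above.

```python
import math

# Hypothetical 1+1-dimensional setting: boost with rapidity eta;
# Lambda^{-1} is the boost with rapidity -eta.
def boost(eta):
    c, s = math.cosh(eta), math.sinh(eta)
    return [[c, s], [s, c]]

def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(2)) for i in range(2)]

def phi(v):
    # Arbitrary smooth scalar field, chosen only for the test.
    return math.sin(v[0]) * math.exp(0.3 * v[1]) + v[0] * v[1] ** 2

def grad(F, v, h=1e-6):
    # Central-difference approximation of F,_mu(v) for mu = 0, 1.
    out = []
    for mu in range(2):
        vp, vm = list(v), list(v)
        vp[mu] += h
        vm[mu] -= h
        out.append((F(vp) - F(vm)) / (2 * h))
    return out

Lam_inv = boost(-0.7)
x = [0.4, -1.2]

def phi_prime(v):
    return phi(matvec(Lam_inv, v))       # phi' = phi o Lambda^{-1}

lhs = grad(phi_prime, x)                 # d/dx^mu phi(Lambda^{-1} x)
dphi = grad(phi, matvec(Lam_inv, x))     # d_nu phi at the point Lambda^{-1} x
rhs = [sum(Lam_inv[nu][mu] * dphi[nu] for nu in range(2)) for mu in range(2)]
print(lhs)
print(rhs)   # agrees with lhs up to finite-difference error
```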

PS --- if you become king of the world, can you magic me into an academic position at a university?

Muphrid · Oct 16, 2012

You keep talking about the coordinate system not changing in an active transformation. I'm not sure if you mean to say that the coordinate lines are the same or that the basis vectors are the same.

Let ##x = x^\mu e_\mu## and ##x' = f(x) = {x'}^\mu e_\mu##. In this picture, the basis vectors aren't changing. I assume this is what you mean by the coordinate system not changing. Nevertheless, the coordinates used for ##\nabla'## are different than those used for ##\nabla##. I use this technique all the time to convert between coordinate systems (by the equivalence of passive and active transformations).

Then, you can use the transformation law for the vector derivative:

$$e_\mu \cdot \nabla \phi(x) = e_\mu \cdot \overline f[\nabla' \phi'(x')]$$

Or, in index notation,

$$\partial_\mu \phi(x) = {f^\nu}_\mu {\partial'_\nu} \phi'(x')$$

Which is clearly similar to what you've written, though the method is general, not particular to a boost.
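Since the law is stated for general maps, not only boosts, here is a numerical sketch with a nonlinear map. Assumptions of mine: the map is the Cartesian-to-polar map (the same one Muphrid uses later in the thread), ##\phi'## is defined by ##\phi'(x')=\phi(x)##, and ##{f^\nu}_\mu## is read as the Jacobian ##\partial x'^\nu/\partial x^\mu##.

```python
import math

# Hypothetical nonlinear map x' = f(x): Cartesian (x^1, x^2) -> (r, theta).
def f_map(x):
    return [math.hypot(x[0], x[1]), math.atan2(x[1], x[0])]

def f_inv(xp):
    return [xp[0] * math.cos(xp[1]), xp[0] * math.sin(xp[1])]

def phi(x):
    # Arbitrary smooth scalar field in Cartesian coordinates.
    return x[0] * x[1] + x[0] ** 3

def phi_prime(xp):
    # phi'(x') = phi(x): the same field expressed in (r, theta).
    return phi(f_inv(xp))

def grad(F, v, h=1e-6):
    out = []
    for mu in range(2):
        vp, vm = list(v), list(v)
        vp[mu] += h
        vm[mu] -= h
        out.append((F(vp) - F(vm)) / (2 * h))
    return out

def jacobian(F, v, h=1e-6):
    # jac[nu][mu] = dF^nu/dv^mu, i.e. f^nu_mu in the index notation above.
    jac = [[0.0, 0.0], [0.0, 0.0]]
    for mu in range(2):
        vp, vm = list(v), list(v)
        vp[mu] += h
        vm[mu] -= h
        Fp, Fm = F(vp), F(vm)
        for nu in range(2):
            jac[nu][mu] = (Fp[nu] - Fm[nu]) / (2 * h)
    return jac

x = [1.2, 0.7]
lhs = grad(phi, x)                          # d_mu phi(x)
J = jacobian(f_map, x)
dprime = grad(phi_prime, f_map(x))          # d'_nu phi'(x')
rhs = [sum(J[nu][mu] * dprime[nu] for nu in range(2)) for mu in range(2)]
print(lhs, rhs)
```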

ianhoolihan · Oct 16, 2012

Your notation etc is unfamiliar. However, to clarify, by "the same coordinate system", I mean that the basis vectors do not change. You'll note that the derivative is ##\partial /\partial x'## in the last bit.

Muphrid · Oct 16, 2012

Yeah, part of the goal of the notation is to avoid indices as much as possible. Unfortunately, indices are very, very ingrained in most discussions of this math. At any rate, though, we seem to agree that the basis isn't changing, so I think all your other results are valid.

Fredrik · Oct 16, 2012

ianhoolihan said:

Now, to your method of things. Since I don't want to confuse myself, I'll denote the map that takes a point in the manifold ##p## to a subset ##V## of ##\mathbf{R}^n## by ##\varphi_x##. That is, ##\varphi_x## corresponds to your map ##x##.

That's OK, but I think a better way to improve the notation would be to rename my coordinate systems x and y to y and z respectively, or y and y'. If you don't like y, then how about S and S'? The source of the confusion was that I used x for two different things; if you really want to improve the notation for the coordinate systems, it would be best to use a notation for the coordinate systems that doesn't involve x at all. I'll use y and z in this post. We have x=y(p), x'=z(p), ##\phi=f\circ y^{-1}## and so on.

ianhoolihan said:

When you do the same:

You're quoting a calculation of mine that's similar to the part of what you did that I said was fine. What I was objecting to was the idea that the thing on the left of the arrow would transform to the thing on the right of the arrow when we change coordinate systems ##y\to z##. Did you mean something else by the arrow?

ianhoolihan · Oct 17, 2012

I do not think we change coordinate systems in an active transformation, in the sense that the basis (or axis) does not change. And I also explained that, for an active transformation, ##x(p)\to x(p)##, i.e. it does not change. Maybe 'change coordinate system' means more in this context. Anyway, what I was getting at was that you are considering ##\partial / \partial x ^\mu \phi '## (since the argument of ##\partial_\mu \phi '(x)## is ##x##), so you too have not changed ##\partial /\partial x^\mu \to \partial / \partial (x')^\mu##. So, in this sense, you have not changed coordinate system either.

Fredrik · Oct 17, 2012

But I did change coordinate systems. What I did answers the question "How does ##\partial_\mu\phi(x)## change when we change the coordinate system from y to z?". To find the answer, we first rewrite ##\partial_\mu\phi(x)## using variables such that the only one that isn't completely coordinate independent is y, and then we just make the substitution ##y\to z##.
$$\partial_\mu\phi(x)=\phi,_\mu(x)=(f\circ y^{-1}),_\mu(y(p))\to (f\circ z^{-1}),_\mu(z(p))=\phi',_\mu(x')=\partial'_\mu\phi'(x').$$ All this talk about active/passive stuff gets really confusing in this context, for several reasons, one of them being that the substitution ##y\to z## is a coordinate change, but a change of basis for ##\mathbb R^4## is also a coordinate change. The former is a coordinate change on M, and the latter is a coordinate change on ##\mathbb R^4##. In addition to that, for each p in M, the change ##y\to z## induces a change of basis (and therefore coordinates) on the tangent space of M at p. Because of these things, you have to be very careful when you use the active/passive terminology.

Note that when we change y to z, ##y(p)## changes to ##z(p)##, i.e. x changes to x'. Is x→x' an active or a passive transformation? That question doesn't make sense, because we haven't specified a transformation matrix or a basis for ##\mathbb R^4##. x and x' are equal to their own coordinate 4-tuples with respect to the standard basis ##\{e_\mu\}##. But there's also a basis ##\{e'_\mu\}## such that x' is equal to the coordinate 4-tuple of x with respect to ##\{e'_\mu\}##. So x' can be interpreted as a coordinate 4-tuple in at least two different ways. It's the coordinates in the standard basis of a new member of ##\mathbb R^4##, or it's the coordinates of x in a new basis.

Since the basis ##\{e'_\mu\}## is determined by z, the change ##\{e_\mu\}\to \{e'_\mu\}## is a coordinate change on ##\mathbb R^4## that's associated with the coordinate change ##y\to z##. So it's natural to let these two bases be the ones relative to which we use the term "passive transformation". But we can still define ##\Lambda## either by ##x'=\Lambda x## or (inequivalently) by ##e'_\mu=\Lambda e_\mu##. If we do the former, then ##x\to x'## is an active transformation by ##\Lambda## (and a passive transformation by ##\Lambda^{-1}##), and if we do the latter, then ##x\to x'## is a passive transformation by ##\Lambda## (and an active transformation by ##\Lambda^{-1}##).

I hope this will help you see why I find it hard to answer comments like "I do not think we change coordinate systems in an active transformation". The above shows that the change ##x\to x'## that's induced by the coordinate change ##y\to z## can be thought of as an active transformation in two different ways (and as a passive transformation in two different ways).

The comment "for an active transformation ##y(p)\to y(p)##" just looks wrong. (I changed your x to y, because I have changed my notation for the coordinate systems from x,y to y,z). Did you mean "passive"? If we interpret y(p) as a coordinate 4-tuple, it changes both under active and passive transformations, but if we interpret it as a point in ##\mathbb R^4##, it changes under active transformations but not under passive transformations.

ianhoolihan · Oct 17, 2012

Fredrik said:

Note that when we change y to z, ##y(p)## changes to ##z(p)##, i.e. x changes to x'. Is x→x' an active or a passive transformation?

I'll reply to this quickly, as I've got to shoot off. The point is, I disagree, and that's what I've been trying to say. In an active transformation, the coordinate system does not change, only the function --- as before ##y(p) \to y(p)##. In a passive one, ##y(p)\to z(p)## and the corresponding change in ##\phi \to \phi'## equates to a trivial change in coordinates.

To quote wiki:

Put differently, a passive transformation refers to observation of the same event from two different coordinate frames.[1] On the other hand, the active transformation is a new mapping of all points from the same coordinate frame.

Sorry about missing a sign somewhere in the equations from a prior post --- I can't seem to odit it.

Muphrid · Oct 17, 2012

I think Fredrik is right to say that it's kind of a meaningless distinction. Under any transformation ##x' = f(x)##, you can identify a new set of tangent and cotangent basis vectors

$${e_\mu}' = \underline f(e_\mu), \quad {e^\mu}' = \overline f^{-1}(e^\mu)$$

Again, ##\underline f## is the Jacobian and ##\overline f## is the transpose (adjoint). The picture of what's going on is equally valid if one insists on using the same basis vectors and different components or transformed basis vectors and the same components.

This equivalence is why I use "active" transformations even for something as simple as going from Cartesian to polar coordinates. Let me demonstrate:

Let ##x' = f(x) = r e_1 + \theta e_2 = e_1 \sqrt{(x^1)^2 + (x^2)^2} + e_2 \arctan(x^2/x^1)##.

Let's find the Jacobian of this transformation.

$$\begin{align*}
\underline f(e_1) &= e_1 \cdot \nabla f(x) = \frac{\partial x'}{\partial x^1} = e_1 \cos \theta - e_2 \frac{\sin \theta}{r} \\
\underline f(e_2) &= e_2 \cdot \nabla f(x) = \frac{\partial x'}{\partial x^2} = e_1 \sin \theta + e_2 \frac{\cos \theta}{r}
\end{align*}$$

Where I've used the transformation to avoid some problematic mixing of exponents and indices. Let's find the adjoint:

$$\begin{align*}
\overline f(e^1) &= e^1 \cos \theta + e^2 \sin \theta \\
\overline f(e^2) &= \frac{1}{r} (-e^1 \sin \theta + e^2 \cos \theta)\end{align*}$$

Pay close attention here. ##\overline f(e^1) = e^r##, the vector in the ##r## direction on the original, untransformed 2d plane. Similarly, ##\overline f(e^2) = e^\theta##.

Also notice that ##e^r \cdot e^r = 1, e^r \cdot e^\theta = 0, e^\theta \cdot e^\theta = 1/r^2##. These are exactly the metric coefficients you'd expect for ##g^{\mu \nu}##.
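These claims can be checked numerically. A sketch assuming the sample point ##x=(1.5,2)## (so r = 2.5) and finite differences: the columns of the Jacobian should reproduce ##\underline f(e_1)## and ##\underline f(e_2)##, and its rows are the components of ##\overline f(e^1)=e^r## and ##\overline f(e^2)=e^\theta##, whose dot products should give ##1##, ##0## and ##1/r^2##. Indices 1, 2 in the post are 0, 1 in the code.

```python
import math

# Polar map x' = f(x) = r e_1 + theta e_2.
def f_map(x):
    return [math.hypot(x[0], x[1]), math.atan2(x[1], x[0])]

def jacobian(F, v, h=1e-6):
    # jac[nu][mu] = dF^nu/dv^mu: column mu is f(e_mu), row nu is fbar(e^nu).
    jac = [[0.0, 0.0], [0.0, 0.0]]
    for mu in range(2):
        vp, vm = list(v), list(v)
        vp[mu] += h
        vm[mu] -= h
        Fp, Fm = F(vp), F(vm)
        for nu in range(2):
            jac[nu][mu] = (Fp[nu] - Fm[nu]) / (2 * h)
    return jac

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1]

x = [1.5, 2.0]                    # sample point, r = 2.5
r, theta = f_map(x)
J = jacobian(f_map, x)

# Columns: underline-f(e_1) and underline-f(e_2) versus the formulas above.
col1, col2 = [J[0][0], J[1][0]], [J[0][1], J[1][1]]
expected1 = [math.cos(theta), -math.sin(theta) / r]
expected2 = [math.sin(theta), math.cos(theta) / r]

# Rows: the adjoint images e^r = fbar(e^1) and e^theta = fbar(e^2).
e_r, e_theta = J[0], J[1]
print(dot(e_r, e_r), dot(e_r, e_theta), dot(e_theta, e_theta), 1 / r ** 2)
```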

So, I can work with this transformation in a couple ways. I can use ##r,\theta## with ##e^r, e^\theta## as position-dependent basis vectors, or I can stay completely in the primed space, using ##e^1, e^2## as basis vectors and use the Jacobian as I must to get correct results.

In this example, I have explicitly mapped position vectors to new positions, but I've done so in order to replicate the results of what would otherwise be just a change from the cartesian to polar coordinate system.

ianhoolihan · Oct 17, 2012

Muphrid said:

I think Fredrik is right to say that it's kind of a meaningless distinction.

...

The picture of what's going on is equally valid if one insists on using the same basis vectors and different components or transformed basis vectors and the same components.

I disagree. Active transformations lead to physically observable effects. Passive ones do not --- they are trivial changes of coordinates.

In a passive transformation, both components and basis are changed (inversely). In an active transformation, only one of those is changed --- either the components or the basis, yes. Maybe this is what Fredrik stated, and I misunderstood him? I thought he meant that an active transformation was the same as a passive one, in the sense I've just described.

Kane

Fredrik · Oct 17, 2012

ianhoolihan said:

I'll reply to this quickly, as I've got to shoot off. The point is, I disagree, and that's what I've been trying to say. In an active transformation, the coordinate system does not change, only the function --- as before ##y(p) \to y(p)##. In a passive one, ##y(p)\to z(p)## and the corresponding change in ##\phi \to \phi'## equates to a trivial change in coordinates.

You need to distinguish between the point y(p) and its coordinate 4-tuple with respect to a basis (even when they happen to have the same components due to a choice of basis). The active/passive terminology is only used about transformations of coordinate 4-tuples. It simply doesn't apply to transformations of points. However, a transformation of the points induces both an active and a passive transformation of the coordinate 4-tuples.

The coordinate change ##y\to z## obviously induces the change ##y(p)\to z(p)##. And this induces an active transformation by ##z\circ y^{-1}## of the coordinate 4-tuple of y(p) with respect to the standard basis, and it induces a passive transformation by ##y\circ z^{-1}## of the coordinate 4-tuple of y(p) with respect to the standard basis.

When you say that in an active transformation, we have ##y(p)\to y(p)##, I'm not sure I even understand what you're saying. The active/passive terminology simply doesn't apply to transformations of the point y(p), and both active and passive transformations of a corresponding coordinate 4-tuple will change that coordinate 4-tuple. An active transformation by ##\Lambda## is a passive transformation by ##\Lambda^{-1}## and vice versa.

ianhoolihan said:

I can't seem to odit it.

There's a time limit for odits. I think it's currently set to 11 hours and 40 minutes (=700 minutes).

ianhoolihan · Oct 17, 2012

Fredrik said:

An active transformation by ##\Lambda## is a passive transformation by ##\Lambda^{-1}## and vice versa.

There's a time limit for odits. I think it's currently set to 11 hours and 40 minutes (=700 minutes).

For now this: why then is a passive transformation not physically observable, while an active one is?

Now I need just about an odit of sleep!

Fredrik · Oct 17, 2012

ianhoolihan said:

Active transformations lead to physically observable effects. Passive ones do not

If you need a more intuitive way to think about these things, I suggest that you think of an active transformation by a rotation matrix R as a physical rotation by R (say a counterclockwise rotation by an angle of π/4) of the object on which we're going to do measurements, and the corresponding passive transformation by R as a physical rotation by ##R^{-1}## (a clockwise rotation by π/4) of the laboratory around the object (while the object is held fixed relative to the Earth).

In both cases, there's a physical change. The point is that the changes are equivalent, as far as physics experiments are concerned (unless of course we're doing experiments with something like a compass needle; in those cases, you have to imagine these things taking place in intergalactic space or something).

In the passive case, the orientation of the object relative to the Earth (or some other fixed stuff outside the laboratory) doesn't change. But we would still change our description of its orientation, if we describe it relative to the walls of the laboratory (the new basis vectors). In the active case, our description of the orientation of the object relative to the walls changes in exactly the same way as in the passive case.
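A small numerical sketch of the rotation picture above, in pure Python. The angle π/4, the marked point, and the helper names (`rot`, `components_in_basis`, `active_vs_passive`) are illustrative choices, not anything from the thread. It rotates the object by R with the walls held fixed, then separately holds the object fixed, turns the walls (the basis vectors) by R^-1, and re-expresses the same point relative to the turned walls. Under these assumptions the two coordinate pairs should agree.

```python
import math

def rot(theta):
    """2x2 counterclockwise rotation matrix by angle theta, as nested lists."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]

def matvec(m, v):
    """Matrix-vector product for 2x2 matrices."""
    return [m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1]]

def components_in_basis(e1, e2, p):
    """Solve c1*e1 + c2*e2 = p for (c1, c2) using Cramer's rule."""
    det = e1[0] * e2[1] - e2[0] * e1[1]
    c1 = (p[0] * e2[1] - e2[0] * p[1]) / det
    c2 = (e1[0] * p[1] - p[0] * e1[1]) / det
    return [c1, c2]

def active_vs_passive(theta, p):
    """Return (active, passive) coordinates of a marked point p on the object.

    active:  rotate the object by R(theta); the walls (standard basis) stay fixed.
    passive: keep the object fixed; rotate the walls by R(-theta) = R^-1
             and re-express the same point in the rotated basis.
    """
    active = matvec(rot(theta), p)
    R_inv = rot(-theta)
    e1_new = matvec(R_inv, [1.0, 0.0])
    e2_new = matvec(R_inv, [0.0, 1.0])
    passive = components_in_basis(e1_new, e2_new, p)
    return active, passive

# Counterclockwise rotation of the object by pi/4 vs. clockwise rotation of the lab.
a, b = active_vs_passive(math.pi / 4, [1.0, 0.0])
print(a, b)  # the two coordinate pairs agree up to rounding
```

Algebraically, the new basis matrix is R^-1, so the new components are (R^-1)^-1 p = R p, which is exactly the actively rotated coordinate pair.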

Edit: Note that this last bit is consistent with what I've been saying about active and passive transformations of a coordinate 4-tuple ##(x^\mu)_{\mu=0}^3##:

Active transformation by ##\Lambda##: ##x^\mu\to\Lambda^\mu{}_\nu x^\nu##.
Passive transformation by ##\Lambda##: ##x^\mu\to(\Lambda^{-1})^\mu{}_\nu x^\nu##.
Passive transformation by ##\Lambda^{-1}##: ##x^\mu\to\Lambda^\mu{}_\nu x^\nu##. (This is the same as the active transformation by ##\Lambda##, as suggested by the informal argument above).
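The three rules above can be checked numerically. This is a sketch assuming a 1+1-dimensional boost (so the "4-tuple" is a pair ##(x^0, x^1)##) with an arbitrary rapidity; the helper names `boost`, `inverse`, `active` and `passive` are hypothetical and simply encode the rules as stated.

```python
import math

def boost(rapidity):
    """1+1 Lorentz boost matrix Lambda^mu_nu acting on (x^0, x^1)."""
    ch, sh = math.cosh(rapidity), math.sinh(rapidity)
    return [[ch, sh], [sh, ch]]

def inverse(m):
    """Inverse of a 2x2 matrix."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]

def apply(m, x):
    """x^mu -> m^mu_nu x^nu (sum over nu)."""
    return [sum(m[mu][nu] * x[nu] for nu in range(2)) for mu in range(2)]

def active(L, x):
    """Active transformation by L: x^mu -> L^mu_nu x^nu."""
    return apply(L, x)

def passive(L, x):
    """Passive transformation by L: x^mu -> (L^-1)^mu_nu x^nu."""
    return apply(inverse(L), x)

L = boost(0.5)
x = [1.0, 0.25]
print(active(L, x))
print(passive(inverse(L), x))  # same tuple as the active result, up to rounding
```

The check is deliberately trivial: once the passive rule is written as "apply the inverse matrix", a passive transformation by ##\Lambda^{-1}## is the active one by ##\Lambda## by construction.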

Muphrid · Oct 17, 2012

ianhoolihan said:

I disagree. Active transformations lead to physically observable effects. Passive ones do not --- they are trivial changes of coordinates.

In a passive transformation, both component and basis are changed (inversely). In an active transformation, only one of those is changed --- either the component or the basis, yes. Maybe this is what Fredrik stated, and I misunderstood him? I thought he meant that an active transformation was the same as a passive one, in the sense I've just described.


I think I've discovered the problem. Consider a tangent vector ##u = u^1 e_1 + u^2 e_2##. We can transform this vector into the primed space.

$$u' = {u^1}' e_1 + {u^2}' e_2 = u^1 {e_1}' + u^2 {e_2}'$$

(Incidentally, I think I now see why some authors prefer ##\phi'(x) = \phi(x')##. It makes talking about the transformation laws hideous, but it would keep all the primes on one side of the above statement.)

At any rate, we can now transform back to get the original ##u##:

$$u = u^1 e_1 + u^2 e_2 = {u^1}' \underline f^{-1}(e_1) + {u^2}' \underline f^{-1}(e_2)$$

There's a certain symmetry here, which I think can be expressed as follows: the vector ##u'## can be expressed in terms of either (a) new components, same basis vectors or (b) same components, new basis vectors. This is what I was saying earlier.

However, you've also been talking about the untransformed vector ##u##, which clearly can be described in terms of either (a) same components, same basis vectors or (b) new components, new basis vectors. The latter is what you expect in a passive transformation, while the former may be what you expect in an active transformation, since you're generally not even interested in the untransformed vector at all (we don't tend to think about it, at least).
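Here is a numerical sketch of the two descriptions of ##u'## and of the back-transformation ##u = {u^1}' \underline f^{-1}(e_1) + {u^2}' \underline f^{-1}(e_2)##. It assumes, purely for illustration, that the linear map ##\underline f## is a 2D rotation by an arbitrary angle and that ##e_1, e_2## is the standard basis; the post does not specify ##\underline f##, and the helper names are hypothetical.

```python
import math

def rot(theta):
    """Hypothetical choice of the linear map f: a 2D rotation by theta."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]

def apply(m, v):
    """Apply a 2x2 matrix to a 2-component vector."""
    return [m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1]]

def combine(c, basis):
    """Return c^1 b_1 + c^2 b_2 for a basis (b_1, b_2)."""
    return [c[0] * basis[0][0] + c[1] * basis[1][0],
            c[0] * basis[0][1] + c[1] * basis[1][1]]

theta = 0.7
f, f_inv = rot(theta), rot(-theta)
e = [[1.0, 0.0], [0.0, 1.0]]        # e_1, e_2 (standard basis)
u_comp = [2.0, -1.0]                 # u^1, u^2
u = combine(u_comp, e)

u_prime_comp = apply(f, u_comp)      # new components u^i'
e_prime = [apply(f, b) for b in e]   # new basis vectors e_i' = f(e_i)

# u' as (a) new components with the old basis, or (b) old components with the new basis
u_prime_a = combine(u_prime_comp, e)
u_prime_b = combine(u_comp, e_prime)

# Transforming back: u = u^1' f^-1(e_1) + u^2' f^-1(e_2)
u_back = combine(u_prime_comp, [apply(f_inv, b) for b in e])

print(u_prime_a, u_prime_b)  # agree up to rounding
print(u, u_back)             # agree up to rounding
```

Both descriptions of ##u'## coincide because ##\underline f## is linear: ##u^1 \underline f(e_1) + u^2 \underline f(e_2) = \underline f(u)##.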

ianhoolihan · Oct 18, 2012

Fredrik said:

In both cases, there's a physical change. The point is that the changes are equivalent, as far as physics experiments are concerned (unless of course we're doing experiments with something like a compass needle; in those cases, you have to imagine these things taking place in intergalactic space or something).

Fredrik, we still disagree, I think. Active = observable, passive = unobservable.

Muphrid said:

There's a certain symmetry here, which I think can be expressed as follows: the vector ##u'## can be expressed in terms of either (a) new components, same basis vectors or (b) same components, new basis vectors. This is what I was saying earlier.

However, you've also been talking about the untransformed vector ##u##, which clearly can be described in terms of either (a) same components, same basis vectors or (b) new components, new basis vectors. The latter is what you expect in a passive transformation, while the former may be what you expect in an active transformation, since you're generally not even interested in the untransformed vector at all (we don't tend to think about it, at least).

If by "former" you mean the former paragraph, then yes, that's what I mean. An active transformation is not the inverse of a passive one (which, I admit, is what is usually bandied around).

See this post for an example of active and passive transformations being observable and unobservable, respectively: https://www.physicsforums.com/showpost.php?p=4110601&postcount=5
