# Noether's theorem

1. Jul 10, 2008

### Fredrik

Staff Emeritus
I'm trying to understand Wikipedia's proof of Noether's theorem for a field theory on Minkowski space. Link. Their proof is clearly just the one from Goldstein (starting on page 588 in the second edition) with details omitted, but I can't understand Goldstein either. I'm going to ask a couple of basic questions first, and see if the answers will help me figure out what they're doing. If that doesn't work, I'll post more specific questions about the derivation.

1. Isn't the theorem supposed to be about what happens when the action is invariant under a transformation of the fields? Then why are Goldstein and Wikipedia messing with the set that the integration is performed over? And why would Goldstein postulate that the form of the Lagrangian is invariant?

2. Is it correct to say that the assumption we start with is that there's an n-parameter family of fields $\epsilon\mapsto\phi_\epsilon$ with $\phi_0=\phi$ and such that each $\phi_\epsilon$ satisfies both the Euler-Lagrange equations and the boundary conditions that we have imposed on $\phi$?

3. If the answers to 1 and 2 are "no" and "yes" respectively, then why doesn't the derivation (and the final result) look like this?

$$0=\frac{d}{d\epsilon^r}\Big|_0 S[\phi_\epsilon]=\dots=\int d^4x \partial_\mu\bigg(\frac{\partial\mathcal{L}}{\partial(\partial_\mu\phi)}\frac{d}{d\epsilon^r}\Big|_0\phi_\epsilon \bigg)$$

The conserved currents would be

$$j_r^\mu=\frac{\partial\mathcal{L}}{\partial(\partial_\mu\phi)}\frac{d}{d\epsilon^r}\Big|_0\phi_\epsilon$$

4. If the answers to 1 and 2 are "yes" and "no" respectively, then what kind of variation is causing the $\mathcal L$ term to appear in the conserved current? (It can't be a variation of the fields, can it?)

2. Jul 10, 2008

### masudr

We have some action that is the integral of the Lagrangian over spacetime. We postulate that all the physics is contained in the action, and so any transformation that leaves the action invariant (not the Lagrangian - it is more general to consider the action, since that is the physical thing) must lead to the same physics.

We then consider transformations of both the set the integration is performed over (i.e. the coordinates) and the set of fields, restricted to those that leave the action invariant. For instance, we have constructed the fields to be representations of the Poincaré group and the Lagrangian to be a scalar, so we know that any translations/boosts/rotations will leave the physics invariant. Then there are the internal transformations, such as rotations among the fields (or, for example, local U(1) for QED etc.).

I hope this answers your point 1. I don't really understand what you mean by point 2, and as a result I don't understand the points you have raised in 3 and 4. I don't have access to Goldstein, but may I ask what issue you have with the derivation in Wikipedia?

3. Jul 10, 2008

### Fredrik

Staff Emeritus
Thanks for the reply. I have to get some sleep now, so I can't explain my issues with the Wikipedia derivation right now. (It would take a while). You said you don't understand my point 2, so I'll try to at least explain that one better right now. Point 2 is my attempt to define what a variation of the field is. It's kind of hard to explain what I'm doing, but I think that if I show you how I would derive the Euler-Lagrange equations using this terminology and notation, things will be a lot clearer.

We start by assuming that there exists a field $\phi$ that minimizes the action. Then let $\phi_0=\phi$ and let $\phi_\epsilon$ be a field for each $\epsilon$ in some interval that contains 0. The map $\epsilon\mapsto\phi_\epsilon$ is assumed to be continuous, but the fields $\phi_\epsilon$ are arbitrary except for that. Since $\phi$ minimizes the action, we must have

$$0=\frac{d}{d\epsilon}\bigg|_0 S[\phi_\epsilon]=\int d^4x \frac{d}{d\epsilon}\bigg|_0 \mathcal L (\phi_\epsilon(x),\partial_\mu\phi_\epsilon(x),x) =\int d^4x \bigg(\frac{\partial\mathcal L}{\partial\phi}\frac{d}{d\epsilon}\bigg|_0\phi_\epsilon+\frac{\partial\mathcal L}{\partial(\partial_\mu\phi)}\frac{d}{d\epsilon}\bigg|_0\partial_\mu\phi_\epsilon\bigg)$$

You are probably familiar with the standard trick here, so I won't write out the details (unless someone asks for them). We get

$$=\int d^4x \bigg( \frac{\partial\mathcal L}{\partial\phi} -\frac{\partial\mathcal L}{\partial(\partial_\mu\phi)}\bigg)\frac{d}{d\epsilon}\bigg|_0\phi_\epsilon$$

Now the function in parentheses must be zero since the function on the right is arbitrary, so we're done here.
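For anyone who wants the omitted step spelled out: the standard trick is an integration by parts. This is only a sketch, assuming that $\frac{d}{d\epsilon}$ and $\partial_\mu$ can be interchanged and that every $\phi_\epsilon$ obeys the same boundary conditions as $\phi$, so that the total divergence integrates to zero:

$$\frac{\partial\mathcal L}{\partial(\partial_\mu\phi)}\frac{d}{d\epsilon}\bigg|_0\partial_\mu\phi_\epsilon =\partial_\mu\bigg(\frac{\partial\mathcal L}{\partial(\partial_\mu\phi)}\frac{d}{d\epsilon}\bigg|_0\phi_\epsilon\bigg) -\partial_\mu\bigg(\frac{\partial\mathcal L}{\partial(\partial_\mu\phi)}\bigg)\frac{d}{d\epsilon}\bigg|_0\phi_\epsilon$$

The first term on the right is a total divergence, so only the second term survives under the integral. Note that in the surviving term the derivative $\partial_\mu$ acts on $\partial\mathcal L/\partial(\partial_\mu\phi)$.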

Note that if we multiply the equation with $\epsilon$ and define

$$\delta =\epsilon\frac{d}{d\epsilon}\bigg|_0$$

this changes the look of the entire calculation into a form you may be more familiar with. Note that my way of writing things makes it 100% clear that there's no need to assume that $\epsilon$ or anything else is an infinitesimal, and that none of the equalities are approximations that only hold to first order in $\epsilon$. They are all exact.

I really dislike the "standard" way of doing this calculation. There's no need to mention infinitesimals, to calculate things only to first order in $\epsilon$, or to say that $\delta\phi$ is a "small" variation of the field without defining what that means. If it's at all possible, I would like to do the derivation of Noether's theorem in a similar way.
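The recipe can be tried numerically in a one-dimensional analogue (a particle path $q(t)$ instead of a field). What follows is a minimal sketch under assumptions that are not part of the discussion above: a unit-mass harmonic oscillator Lagrangian, the interval $[0,1]$, the family $q_\epsilon=q+\epsilon\eta$ with $\eta(t)=\sin(\pi t)$, Simpson's rule for the integral, and a central difference for $\frac{d}{d\epsilon}\big|_0$. All function names are hypothetical.

```python
import math

T1, T2 = 0.0, 1.0  # assumed integration interval


def action(q, qdot, n=2000):
    """Approximate S[q] on [T1, T2] with Simpson's rule (n must be even)."""
    h = (T2 - T1) / n

    def lagrangian(t):
        # Assumed example: unit-mass harmonic oscillator, L = qdot^2/2 - q^2/2
        return 0.5 * qdot(t) ** 2 - 0.5 * q(t) ** 2

    total = lagrangian(T1) + lagrangian(T2)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * lagrangian(T1 + i * h)
    return total * h / 3


def family(eps, base, base_dot):
    """Return (q_eps, qdot_eps), where q_eps = base + eps * eta and q_0 = base."""
    def q(t):
        # eta(t) = sin(pi t) vanishes at T1 and T2, so the endpoints stay fixed
        return base(t) + eps * math.sin(math.pi * t)

    def qdot(t):
        return base_dot(t) + eps * math.pi * math.cos(math.pi * t)

    return q, qdot


def dS_deps_at_zero(base, base_dot, h=1e-4):
    """Central-difference estimate of d/d(eps) S[q_eps] at eps = 0."""
    return (action(*family(h, base, base_dot))
            - action(*family(-h, base, base_dot))) / (2 * h)


# q(t) = cos(t) solves q'' = -q; q(t) = 1 - t^2 does not.
on_shell = dS_deps_at_zero(math.cos, lambda t: -math.sin(t))
off_shell = dS_deps_at_zero(lambda t: 1.0 - t * t, lambda t: -2.0 * t)
print(on_shell, off_shell)
```

The derivative should come out numerically zero for the path that solves the equation of motion and clearly nonzero for the one that doesn't. The central difference is only a numerical stand-in for the exact derivative at $\epsilon=0$; nothing in the definition itself requires $\epsilon$ to be small.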

I have made some progress since I started this thread. I was at least able to get the same conserved current as Wikipedia, but there are still a couple of steps in the derivation that I don't know how to justify. I'll elaborate on that tomorrow.

Last edited: Jul 10, 2008

4. Jul 11, 2008

### Fredrik

Staff Emeritus
Here's one thing I don't understand: Why do Goldstein, Wikipedia and many other sources talk about variations of the fields and variations of the coordinates as two different things?

$\phi(x)\rightarrow\epsilon\phi(x)$ would be considered a variation of the field, and $\phi(x)\rightarrow\phi(\epsilon x)$ would be considered a variation of the coordinates. It might seem that the most general variation of $\phi(x)$ is $\phi(x)\rightarrow\phi_a(f_b(x))$ where a and b are different parameters, but if we define $\phi_{ab}=\phi_a\circ f_b$ we see that even the combined variation of both the field and the coordinates can be expressed as a variation $\phi(x)\rightarrow\phi_{ab}(x)$ of the field only, so why not just write the last change as $\phi(x)\rightarrow\phi_\epsilon(x)$? (This would make $\epsilon$ an ordered set of indices $\epsilon=(\epsilon^1,\epsilon^2)=(a,b)$, but this is nothing new. a and b can have several components too. If a has n components and b m components, $\epsilon$ has m+n components. By the way, this is where the index r comes from in post #1. It goes from 1 to the number of independent parameters, e.g. 10 for Poincaré group transformations).

I suppose that when we consider actions that are defined as integrals over some proper subset $\Omega$ of $\mathbb R^n$, it makes sense to talk about a variation of the coordinates, in the form $\Omega\rightarrow\Omega'$, but I have definitely seen people (books) distinguish between variations of the fields and variations of the coordinates in the context of fields on Minkowski space.

(This is not the full elaboration about what I have a problem with in Wikipedia's derivation. It's just all I have time for right now. I'll return later).

Last edited: Jul 11, 2008

5. Jul 11, 2008

### haushofer

Because you don't only want to know the variation as a result of a coordinate change; you want to have a general variation. The physics is not really in the coordinates, these are just labels of the physics. Remember that the field configuration doesn't depend only on the coordinates.

Often, if you consider field variations as a result of a coordinate variation, you already have the explicit form of the field which is kinematically allowed. But these allowed configurations are given by the action principle.

So, a coordinate variation naturally induces a field variation, but to get the explicit expression for this we need the field configuration. To obtain the equations of motion, however, we don't induce the variation $\delta\phi$ by a coordinate transformation; the invariance of the action under a coordinate transformation is already familiar to us, because we postulate that the action is a scalar under coordinate transformations and this variation gives us zero by construction. So that doesn't give us any new information.

If we only want to have the equations of motion, we can keep the coordinates fixed and consider the arbitrary field variation. If we then set this variation of the action to zero, we get the kinematically allowed field configurations.

I hope this helps; I also find this a little difficult sometimes, and maybe my understanding of this isn't correct either.

6. Jul 11, 2008

### Fredrik

Staff Emeritus
Yes, but my point is that a variation of only the field is the most general variation of the quantity $\phi(x)$. As I tried to explain in #4, there's nothing "more general" about the transformation $\phi(x)\rightarrow\phi'(x')$. The set of all such transformations is the same as the set of all transformations of type $\phi(x)\rightarrow\phi'(x)$.

Maybe I've been misled by the term "principle of least action", or maybe I have misunderstood what an "action" is. I have always thought that an action is a function(al) that takes a field (or several fields, or the components of one or more fields) to a number. If that definition is correct, then the only kind of variation that it makes sense to consider is $\phi(x)\rightarrow\phi'(x)$. If we're considering other variations, we're not really minimizing the action. We're minimizing something else. It would be like trying to find the minimum of the function $f:\mathbb R\rightarrow\mathbb R$ defined by $f(x)=(a+1)^2+x^2$ for all x, by considering variations of both x and a!

If I'm wrong about the definition of an action (or about what "principle of least action" means), then I suppose it does make sense to consider variations of the Lagrangian and the set we integrate over. In that case, we shouldn't write $S[\phi]$, we should write $S[\phi,\mathcal L,\Omega]$, where $\mathcal L$ is a multi-variable polynomial and $\Omega$ is the set we integrate over.

7. Jul 11, 2008

### masudr

I don't personally see a problem with Fredrik's statement that there's nothing more general about the transformation of coordinates as well as fields, since we can combine the two transformations.

I think the real point of separating them is to clearly see the difference between internal and external transformations. The external ones are due to symmetries of spacetime (i.e. the fact that the physics is unchanged after coordinate transformations) and lead to things like conserved stress-energy tensors. The internal ones are due to symmetries of the fields we are considering (e.g. U(1) symmetry of a complex scalar field) and lead to things like conserved charges.

While mathematically they may all be treated as one (or maybe even not, but I haven't had the time to fully carry through Fredrik's approach for myself; I assume it leads to the same results), the physical separation between internal and external is useful to think about. Perhaps this is why they are treated separately in common texts about Noether's theorem for fields. I may be wrong, and perhaps someone more knowledgeable will come by to offer some hints.

8. Jul 11, 2008

### Fredrik

Staff Emeritus
I really want to understand this, so I'm going to post what I believe is the Goldstein/Wikipedia derivation in my notation (see #3 and #4) and explain what parts I can't justify. I hope that someone can explain those parts to me.

$$S[\phi]=S[\phi_{ab}]=\int_\Omega d^4x\mathcal L(\phi_{ab}(x),\partial_\mu\phi_{ab}(x),x)$$

Here we make the change of variables $y=f_b(x)$, where $f_b$ is the same function as in #4. Note that we can write

$$y=f_b(x)\approx f_0(x)+b^s\frac{d}{db^s}\bigg|_0 f_b(x)=x+\delta x$$

Here and in the rest of this post, $\approx$ means "equal to first order in the parameters". The equality on the right is the definition of what I mean by $\delta x$. Recall that $b\mapsto f_b$ is supposed to be a variation of the coordinates and $a\mapsto\phi_a$ a variation of the field.

My first problem with Wikipedia's derivation is that I haven't been able to get the correct result without first assuming that $\partial_\mu(\delta x)=0$, i.e. that the part of the transformation that we consider a transformation of the coordinates is a pure translation. This seems weird to me. At the very least, all the Lorentz transformations should be considered transformations of the coordinates. If we can't do that, then I don't see how it can ever make sense to talk about a transformation of the coordinates as opposed to a transformation of the field.

I will write the Jacobian determinant associated with the coordinate change as

$$|f'_b(x)|=|f'_b(f_b^{-1}(y))|$$

Note that

$$\partial_\mu\phi_{ab}(x)=(\phi_a\circ f_b),_\mu(x) =\phi_a,_\nu(f_b(x))f_b^\nu,_\mu(x) \approx \phi_a,_\nu(y)\partial_\mu(x^\nu+\delta x^\nu)=\phi_a,_\mu(y)=\partial_\mu\phi_a(y)$$

so

$$S[\phi_{ab}]\approx\int_{f_b(\Omega)}d^4y\frac{1}{|f'_b(f_b^{-1}(y))|}\mathcal L(\phi_a(y),\partial_\mu\phi_a(y),y-\delta x)$$

This is where the magic happens. Here they simply throw away the determinant and change the set that we integrate over back to $\Omega$. This I can't understand at all. Once we have done that, we can change the name of the variable we integrate over back to x, so we get

$$\approx\int_\Omega d^4x\mathcal L(\phi_a(x),\partial_\mu\phi_a(x),x-\delta x)$$

From this point on, the calculation is pretty straightforward, so I won't bother typing the details.

9. Jul 11, 2008

### Fredrik

Staff Emeritus
If we keep throwing away all terms that are second order and higher in the parameters, we get

$$0=S[\phi_{ab}]-S[\phi]\approx\int_\Omega d^4x\partial_\mu\bigg(\frac{\partial\mathcal L}{\partial(\partial_\mu\phi)}(\delta\phi+\delta x^\nu\partial_\nu\phi)-\delta x^\mu\mathcal L\bigg)$$

where $\delta\phi$ is

$$\delta\phi(x)=a^r\frac{d}{da^r}\bigg|_0\phi_a(x)$$

This is the correct result. It requires some thought to see why the integrand must be 0 and not just the integral, but it can be done.

Now if I instead use the method I outlined in #1, I get the same thing except for the last term. The calculation is a lot shorter and much less "magical". I don't know what to make of that. It seems to me that

$$\int_\Omega d^4x \partial_\mu(\delta x^\mu\mathcal L)=0$$

at least in the limit $\Omega\rightarrow\mathbb R^4$, so why do we include the last term at all?

If it isn't obvious, it's the stuff in red in this post and the last that I would like someone to explain to me.

10. Jul 12, 2008

### Fredrik

Staff Emeritus
Let's try an easier question. I have realized that I have the same problem even with the Lagrangian for a single non-relativistic point particle.

Invariance of the action under translations in time should give us conservation of energy, right? But how do we even begin to prove this? I want to write the invariance condition as

$$\frac{d}{da}\bigg|_0 S[q_a]=0$$

where $q_a$ is defined by $q_a(t)=q(t+a)$, but I don't see how to get anything useful from that. All I get is

$$0=\frac{d}{da}\bigg|_0 S[q_a]=\dots=\int dt \frac{d}{dt}\bigg(\frac{\partial L}{\partial\dot q}\dot q\bigg)$$

and the integral on the right-hand side is always zero because of the boundary conditions (right?), so this doesn't tell us anything.

So let's try the method that Wikipedia and Goldstein seem to be using in their proof (for fields in Minkowski space).

$$S[q_a]=\int_{t_1}^{t_2} dt L(q_a(t),\dot q_a(t),t)=\int_{t_1}^{t_2} dt L(q(t+a),\dot q(t+a),t)$$

Change variables, t'=t+a.

$$=\int_{t_1+a}^{t_2+a} dt' L(q(t'),\dot q(t'),t'-a)$$

Drop the primes and magically change the integration interval back to (t1,t2).

$$\approx\int_{t_1}^{t_2} dt L(q(t),\dot q(t),t-a) \approx\int_{t_1}^{t_2} dt \bigg( L(q(t),\dot q(t),t)-a\frac{\partial L}{\partial t}\bigg) \approx\int_{t_1}^{t_2} dt \bigg( L(q(t),\dot q(t),t)+a\frac{d}{dt}\bigg(\frac{\partial L}{\partial\dot q}\dot q-L\bigg)\bigg)$$

The invariance condition they're using is that $S[q_a]=S[q]$ (or maybe just $S[q_a]\approx S[q]$).

$$0=S[q_a]-S[q] \approx a\int_{t_1}^{t_2} dt\frac{d}{dt}\bigg(\frac{\partial L}{\partial\dot q}\dot q-L\bigg)$$

Even if this is the correct result, I don't see how to prove that the integrand is =0. It's also weird that if the Lagrangian doesn't have an explicit time dependence, the entire integral is =0 because of the boundary conditions, so again we get the result 0=0 which tells us nothing.

The result we're supposed to find is of course

$$0=\frac{d}{dt}\bigg(\frac{\partial L}{\partial\dot q}\dot q-L\bigg) =\frac{d}{dt}\bigg(m\dot q^2-(\frac{1}{2}m\dot q^2-V(q))\bigg)=\frac{d}{dt}\bigg(\frac{1}{2}m\dot q^2+V(q)\bigg)$$
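As a numerical sanity check on that expected result, here is a minimal sketch. The specific choices are assumptions for illustration only and don't come from the discussion above: $m=1$, the potential $V(q)=q^4/4$, the initial data, and a classical fourth-order Runge-Kutta integrator for $m\ddot q=-V'(q)$. The names `noether_charge` and `rk4_step` are hypothetical.

```python
M = 1.0  # assumed mass


def V(q):
    return 0.25 * q ** 4  # assumed potential, chosen only for illustration


def dV(q):
    return q ** 3


def lagrangian(q, qdot):
    return 0.5 * M * qdot ** 2 - V(q)


def noether_charge(q, qdot):
    # (dL/dqdot) * qdot - L, with dL/dqdot = M * qdot
    return M * qdot * qdot - lagrangian(q, qdot)


def rk4_step(q, v, dt):
    """One classical Runge-Kutta step for M q'' = -V'(q)."""
    def acc(x):
        return -dV(x) / M

    k1q, k1v = v, acc(q)
    k2q, k2v = v + 0.5 * dt * k1v, acc(q + 0.5 * dt * k1q)
    k3q, k3v = v + 0.5 * dt * k2v, acc(q + 0.5 * dt * k2q)
    k4q, k4v = v + dt * k3v, acc(q + dt * k3q)
    q_new = q + dt * (k1q + 2 * k2q + 2 * k3q + k4q) / 6
    v_new = v + dt * (k1v + 2 * k2v + 2 * k3v + k4v) / 6
    return q_new, v_new


q, v = 1.0, 0.0          # assumed initial position and velocity
dt, steps = 1e-3, 5000   # integrate from t = 0 to t = 5
charges, lagrangians = [], []
for _ in range(steps):
    charges.append(noether_charge(q, v))
    lagrangians.append(lagrangian(q, v))
    q, v = rk4_step(q, v, dt)

charge_drift = max(charges) - min(charges)
lagrangian_spread = max(lagrangians) - min(lagrangians)
print(charge_drift, lagrangian_spread)
```

Along the numerical solution, $\frac{\partial L}{\partial\dot q}\dot q-L$ should stay constant up to integration error, while $L$ itself varies by an amount of order one. This only illustrates the statement to be proved; it says nothing about how to justify the steps of the derivation.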

11. Jul 13, 2008

### Fredrik

Staff Emeritus
I think I finally figured it out, no thanks to you guys.

We can think of the Lagrangian as a functional too, for each fixed x. $\phi\mapsto\mathcal L_x[\phi]$. Our theory is invariant under the transformation $\phi\mapsto\phi_\epsilon$ at $\epsilon=0$ if there exists a function $f:\mathbb R^4\rightarrow\mathbb R^4$ such that

$$\frac{d}{d\epsilon}\bigg|_0\mathcal L_x[\phi_\epsilon]=\partial_\mu f^\mu(x)$$

So the starting assumption should be that this is true. When $\phi$ satisfies the Euler-Lagrange equations, it's always true that

$$\frac{d}{d\epsilon}\bigg|_0\mathcal L_x[\phi_\epsilon]=\partial_\mu\bigg(\frac{\partial\mathcal L}{\partial(\partial_\mu\phi)}\frac{d}{d\epsilon}\bigg|_0\phi_\epsilon\bigg)$$

So the assumption implies that

$$\partial_\mu\bigg(\frac{\partial\mathcal L}{\partial(\partial_\mu\phi)}\frac{d}{d\epsilon}\bigg|_0\phi_\epsilon-f^\mu\bigg)=0$$

Translations have a special significance because the Lagrangian isn't invariant under translations, but it's easy to show that the action is. When $b\mapsto\phi_b(x)=\phi(x+b)$, we have $f^\mu_\nu=\delta^\mu_\nu\mathcal L$. (The additional index on f comes from the fact that translations are four different symmetries, not just one).
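A sketch of where $f^\mu_\nu=\delta^\mu_\nu\mathcal L$ comes from, assuming that the Lagrangian has no explicit dependence on $x$ (an assumption that isn't stated above). Differentiating with respect to the translation parameter $b^\nu$ is then the same as differentiating with respect to $x^\nu$:

$$\frac{d}{db^\nu}\bigg|_0\mathcal L_x[\phi_b] =\frac{d}{db^\nu}\bigg|_0\mathcal L\big(\phi(x+b),\partial_\rho\phi(x+b)\big) =\partial_\nu\Big(\mathcal L\big(\phi(x),\partial_\rho\phi(x)\big)\Big) =\partial_\mu\big(\delta^\mu_\nu\mathcal L\big)$$

So the Lagrangian changes by a total divergence, which is why the action is unchanged when the boundary contributions vanish.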

So when the symmetry group we're considering consists of translations and something else, and that "something else" leaves not only the action invariant, but also the Lagrangian, the conserved currents are

$$j_{r}^\mu=\frac{\partial\mathcal L}{\partial(\partial_\mu\phi)}\frac{d}{da^r}\bigg|_0\phi_a$$

and

$$j_{\nu}^\mu=\frac{\partial\mathcal L}{\partial(\partial_\mu\phi)}\partial_\nu\phi-\delta^\mu_\nu\mathcal L$$

where the $a^r$ are the parameters of the "something else" transformations.
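As an illustration of the second formula, consider a free real scalar field. This example is not part of the discussion above; it assumes the metric signature $(+,-,-,-)$ and the Lagrangian

$$\mathcal L=\frac{1}{2}\partial_\mu\phi\,\partial^\mu\phi-\frac{1}{2}m^2\phi^2,\qquad \frac{\partial\mathcal L}{\partial(\partial_\mu\phi)}=\partial^\mu\phi$$

With these assumptions the translation currents become

$$j^\mu_\nu=\partial^\mu\phi\,\partial_\nu\phi-\delta^\mu_\nu\mathcal L,\qquad j^0_0=\frac{1}{2}\dot\phi^2+\frac{1}{2}(\nabla\phi)^2+\frac{1}{2}m^2\phi^2$$

so the time component of the $\nu=0$ current is the energy density, as one would hope for the current associated with translations in time.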

I still can't make sense of what Wikipedia and Goldstein are doing, but I care a lot less now that I have found both a satisfactory statement of one version of the theorem and a satisfactory proof.

Last edited: Jul 13, 2008

12. Aug 16, 2009

### Fredrik

Staff Emeritus
I have to bump this, because I've been linking to this thread and I spotted a mistake. The expression after "We get" in #3 should be

$$=\int d^4x \bigg( \frac{\partial\mathcal L}{\partial\phi} -\partial_\mu\bigg(\frac{\partial\mathcal L}{\partial(\partial_\mu\phi)}\bigg)\bigg)\frac{d}{d\epsilon}\bigg|_0\phi_\epsilon$$