I know that ##\frac{\partial}{\partial (\partial_{\mu}\phi)} \big( \partial_{\mu} \phi\ \partial^{\mu} \phi \big) = \partial_{\mu} \phi##.

Now, I need to prove this to myself.

So, here goes nothing.

##\frac{\partial}{\partial (\partial_{\mu}\phi)} \big( \partial_{\mu} \phi\ \partial^{\mu} \phi \big)##
## = \frac{\partial}{\partial (\partial_{\mu}\phi)} \big( \eta^{\mu\nu}\partial_{\mu} \phi\ \partial_{\nu} \phi \big)##
##= \eta^{\mu\nu}\ \partial_{\nu} \phi + \eta_{\mu\nu} \eta^{\mu\nu}\ \partial_{\mu} \phi##,

where I first differentiated the factor ##\partial_{\mu}\phi## with respect to ##\partial_{\mu}\phi## and then I differentiated the factor ##\partial_{\nu}\phi## with respect to ##\partial_{\mu}\phi##.

Am I correct so far?

Your indices in the second term don't match those in the first, which indicates that something is wrong.

Try using the following form

$$\frac{\partial}{\partial (\partial_{\alpha}\phi)} \big( \partial_{\mu} \phi\ \partial^{\mu} \phi \big)$$

This helps you avoid the problem where you have ill-defined expressions like ##\eta_{\mu\nu}\eta^{\mu\nu}\partial_\mu\phi##.
The latter can mean two things:
$$\eta_{\mu\nu}\left(\eta^{\mu\nu}\partial_\mu\phi\right)=\eta_{\mu\nu}\partial^\nu\phi=\partial_\mu\phi$$
or it can mean (using ##\eta_{\mu\nu}\eta^{\mu\nu}=D##, with ##D## the number of spacetime dimensions, here ##D=4##)
$$\left(\eta_{\mu\nu}\eta^{\mu\nu}\right)\partial_\mu\phi=4\partial_\mu\phi$$
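
To make the ambiguity concrete, here is a minimal numerical sketch of the two readings. It assumes the signature (+, -, -, -) and ##D=4##; the sample components in `dphi` and the helper `contract` are made up for illustration.

```python
# Assumptions: metric signature (+, -, -, -), D = 4, made-up sample values.
ETA = [[1, 0, 0, 0],
       [0, -1, 0, 0],
       [0, 0, -1, 0],
       [0, 0, 0, -1]]  # eta_{mu nu}; numerically the same matrix as eta^{mu nu}

dphi = [2.0, 3.0, 5.0, 7.0]  # sample components of partial_mu phi

def contract(metric, vec):
    """Return metric[a][b] * vec[b] summed over b (raises or lowers one index)."""
    return [sum(metric[a][b] * vec[b] for b in range(4)) for a in range(4)]

# Reading 1: raise the index first, then lower it again.
reading_1 = contract(ETA, contract(ETA, dphi))

# Reading 2: contract the two metrics with each other first (giving D), then multiply.
trace = sum(ETA[m][n] * ETA[m][n] for m in range(4) for n in range(4))
reading_2 = [trace * c for c in dphi]

print(reading_1)  # [2.0, 3.0, 5.0, 7.0]
print(reading_2)  # [8.0, 12.0, 20.0, 28.0]
```

The two results differ by the factor ##D##, which is why an index should never appear more than twice in a single term.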

Firstly, I need to say that ##\frac{\partial}{\partial (\partial_{\mu}\phi)} \big( \frac{1}{2}\partial_{\mu} \phi\ \partial^{\mu} \phi \big) = \partial^{\mu} \phi##.
I made a mistake in my first post of placing the ##\mu## index on the RHS downstairs, instead of upstairs. I also made a mistake of forgetting the factor of ##\frac{1}{2}##.

Now,

##\frac{\partial}{\partial (\partial_{\alpha}\phi)} \big( \frac{1}{2}\partial_{\mu} \phi\ \partial^{\mu} \phi \big)##
##=\frac{1}{2}\eta^{\mu\nu}\frac{\partial}{\partial (\partial_{\alpha}\phi)} \big( \partial_{\mu} \phi\ \partial_{\nu} \phi \big)##
##=\frac{1}{2}\eta^{\mu\nu}(\eta_{\alpha\mu}\partial_{\nu}\phi+\eta_{\alpha\nu}\partial_{\mu}\phi)##
##=\frac{1}{2}(\eta_{\alpha\mu}\eta^{\mu\nu}\partial_{\nu}\phi+\eta_{\alpha\nu}\eta^{\nu\mu}\partial_{\mu}\phi)##
##=\eta_{\alpha\mu}\eta^{\mu\nu}\partial_{\nu}\phi##
##=\delta^{\nu}_{\alpha}\partial_{\nu}\phi##
##=\partial_{\alpha}\phi##

Am I correct?

Why do you use the following?
$$\frac{\partial\left(\partial_\mu\phi\right)}{\partial\left(\partial_\alpha\phi\right)}=\eta_{\mu\alpha}$$

A hint that something is wrong is that the indices don't match. You can check this by performing a transformation ##x^\mu\to x^{\prime\mu}##: you should find that there is one upper and one lower index. [*]

It's equal to ##\delta^\alpha_\mu\,\,\,\left(=\eta^\alpha_{\,\,\, \mu}\right)## as far as I can tell.

[*]: Oddly it seems that this would turn out correct if you contract the metric so I'm starting to doubt myself.


##\frac{\partial}{\partial (\partial_{\alpha}\phi)} \big( \frac{1}{2}\partial_{\mu} \phi\ \partial^{\mu} \phi \big)##
##=\frac{1}{2}\eta^{\mu\nu}\frac{\partial}{\partial (\partial_{\alpha}\phi)} \big( \partial_{\mu} \phi\ \partial_{\nu} \phi \big)##
##=\frac{1}{2}\eta^{\mu\nu}({\eta^{\alpha}}_{\mu}\partial_{\nu}\phi+{\eta^{\alpha}}_{\nu}\partial_{\mu}\phi)##
##=\frac{1}{2}({\eta^{\alpha}}_{\mu}\eta^{\mu\nu}\partial_{\nu}\phi+{\eta^{\alpha}}_{\nu}\eta^{\nu\mu}\partial_{\mu}\phi)##
##={\eta^{\alpha}}_{\mu}\eta^{\mu\nu}\partial_{\nu}\phi##
##=\eta^{\alpha\nu}\partial_{\nu}\phi##
##=\partial^{\alpha}\phi##

It appears that in the answer, the index ##\alpha## should be upstairs, not downstairs. I made this mistake in my previous post.
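
As a sanity check, the result can also be probed numerically. The sketch below treats the four components ##\partial_\mu\phi## as independent variables `p[mu]`, assumes the signature (+, -, -, -), and compares a central-difference gradient of ##\frac{1}{2}\partial_\mu\phi\,\partial^\mu\phi## with ##\partial^\alpha\phi##. The sample values and function names are made up for illustration.

```python
# Assumptions: signature (+, -, -, -); p[mu] stands for partial_mu phi and the
# four components are treated as independent variables.
ETA_UP = [[1.0, 0.0, 0.0, 0.0],
          [0.0, -1.0, 0.0, 0.0],
          [0.0, 0.0, -1.0, 0.0],
          [0.0, 0.0, 0.0, -1.0]]  # eta^{mu nu}

def kinetic_term(p):
    """(1/2) eta^{mu nu} p_mu p_nu, i.e. (1/2) partial_mu phi partial^mu phi."""
    return 0.5 * sum(ETA_UP[m][n] * p[m] * p[n]
                     for m in range(4) for n in range(4))

def numerical_gradient(f, p, h=1e-6):
    """Central-difference estimate of df/dp_alpha for each alpha."""
    grad = []
    for a in range(4):
        plus = list(p)
        minus = list(p)
        plus[a] += h
        minus[a] -= h
        grad.append((f(plus) - f(minus)) / (2 * h))
    return grad

p = [2.0, 3.0, 5.0, 7.0]  # made-up sample values for partial_mu phi

# partial^alpha phi = eta^{alpha nu} partial_nu phi
p_upper = [sum(ETA_UP[a][n] * p[n] for n in range(4)) for a in range(4)]

grad = numerical_gradient(kinetic_term, p)
print(grad)     # numerically close to p_upper
print(p_upper)  # [2.0, -3.0, -5.0, -7.0]
```

The gradient agrees with the raised components `p_upper`, not with `p` itself: the spatial components pick up a sign, which is exactly the difference between ##\partial^\alpha\phi## and ##\partial_\alpha\phi##.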

Try to see why the ##\alpha## index is upstairs and also why you can write ##\eta^\alpha_{\,\,\,\mu}=\delta^\alpha_\mu##, where delta is the Kronecker delta.

Other than that it is correct.

Try to see why the ##\alpha## index is upstairs
Shouldn't the index ##\alpha## be upstairs because we are differentiating the product of an upstairs index and a downstairs index with respect to a downstairs index?

and also why you can write ##\eta^\alpha_{\,\,\,\mu}=\delta^\alpha_\mu##, where delta is the Kronecker delta.
I think ##\eta^\alpha_{\,\,\,\mu}=\delta^\alpha_\mu## because ##\eta^\alpha_{\,\,\, \mu} = \eta^{\alpha\nu}\eta_{\nu\mu} = \delta^\alpha_\mu##.

Am I correct?

The latter part is perfect.
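
For what it's worth, the identity ##\eta^{\alpha\nu}\eta_{\nu\mu}=\delta^\alpha_\mu## is easy to confirm by explicit contraction. This is only a sketch, assuming the signature (+, -, -, -), for which the inverse metric has the same components as the metric.

```python
# Assumption: signature (+, -, -, -). The inverse of diag(1, -1, -1, -1) is
# the same matrix, so eta^{alpha nu} and eta_{nu mu} share their components.
ETA_DOWN = [[1, 0, 0, 0],
            [0, -1, 0, 0],
            [0, 0, -1, 0],
            [0, 0, 0, -1]]            # eta_{nu mu}
ETA_UP = [row[:] for row in ETA_DOWN]  # eta^{alpha nu}

# mixed[alpha][mu] = eta^{alpha nu} eta_{nu mu}, summed over nu
mixed = [[sum(ETA_UP[a][n] * ETA_DOWN[n][m] for n in range(4))
          for m in range(4)]
         for a in range(4)]

identity = [[1 if a == m else 0 for m in range(4)] for a in range(4)]

print(mixed == identity)  # True
```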

The first part becomes clearest when you explicitly change coordinates. In other words, how does the following object transform?

$$\partial_\mu\phi(x)=\frac{\partial\phi(x)}{\partial x^\mu}$$

Apply a coordinate transformation ##x^\mu\to x^{\prime\mu}## and look at the way the Jacobian shows up.
The Jacobian is given by the following (or its inverse; which one doesn't matter much, since we only consider invertible transformations):

$$\frac{\partial x^{\prime\mu}}{\partial x^{\nu}}$$

Under the coordinate transformation ##x^{\mu}\rightarrow x'^{\mu}={\Lambda^{\mu}}_{\nu}x^{\nu}##,

##\partial_{\mu}\phi(x)~=~\frac{\partial\phi(x)}{\partial x^{\mu}}~##

##\rightarrow \frac{\partial\phi(\Lambda^{-1}x)}{\partial x^{\mu}}=\frac{\partial (\Lambda^{-1}x)^{\nu}}{\partial x^{\mu}}\frac{\partial\phi(\Lambda^{-1}x)}{\partial (\Lambda^{-1}x)^{\nu}}=\frac{\partial}{\partial x^{\mu}}\Big( {(\Lambda^{-1})^{\nu}}_{\rho}x^{\rho} \Big)(\partial_{\nu}\phi)(\Lambda^{-1}x)={(\Lambda^{-1})^{\nu}}_{\rho}\delta^{\rho}_{\mu}(\partial_{\nu}\phi)(\Lambda^{-1}x)={(\Lambda^{-1})^{\nu}}_{\mu}(\partial_{\nu}\phi)(\Lambda^{-1}x)##.

How does this help?

Well, now use it in the derivative w.r.t. ##\partial_\mu\phi##.
You should get something of the form
$$\frac{\partial\left(\partial_\mu\phi\right)}{\partial\left(\partial_\alpha\phi\right)}\to \left(\Lambda^{-1}\right)^\nu_{\,\,\, \mu}\Lambda^\alpha_{\,\,\, \beta}\frac{\partial\left(\partial_\nu\phi(x^\prime)\right)}{\partial\left(\partial_\beta\phi(x^\prime)\right)}$$

This means that the object transforms as a (1,1)-tensor, i.e. it has one upper and one lower index.
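
A small numerical sketch of this transformation law: applying one factor of ##\Lambda## and one factor of ##\Lambda^{-1}## to the Kronecker delta returns the Kronecker delta. The particular Lorentz transformation (a boost along x composed with a rotation about z), the sample parameters, and the helper names are all assumptions chosen for illustration.

```python
import math

def matmul(a, b):
    """Plain 4x4 matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def boost_x(beta):
    """Lorentz boost along x with velocity beta (units with c = 1)."""
    g = 1.0 / math.sqrt(1.0 - beta * beta)
    return [[g, -g * beta, 0.0, 0.0],
            [-g * beta, g, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0, 1.0]]

def rotation_z(theta):
    """Spatial rotation about the z axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[1.0, 0.0, 0.0, 0.0],
            [0.0, c, -s, 0.0],
            [0.0, s, c, 0.0],
            [0.0, 0.0, 0.0, 1.0]]

# lam[a][b] = Lambda^a_b; its inverse is built from the inverse factors.
lam = matmul(rotation_z(0.3), boost_x(0.6))
lam_inv = matmul(boost_x(-0.6), rotation_z(-0.3))  # lam_inv[n][m] = (Lambda^{-1})^n_m

delta = [[1.0 if a == m else 0.0 for m in range(4)] for a in range(4)]

# (1,1) law: transformed^a_m = Lambda^a_b * delta^b_n * (Lambda^{-1})^n_m
transformed = [[sum(lam[a][b] * delta[b][n] * lam_inv[n][m]
                    for b in range(4) for n in range(4))
                for m in range(4)]
               for a in range(4)]

max_dev = max(abs(transformed[a][m] - delta[a][m])
              for a in range(4) for m in range(4))
print(max_dev)  # tiny, at the level of floating-point rounding
```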

This might help with the details https://www.physicsforums.com/threads/kronecker-delta-as-tensor-proof.320692/

Ok. So, under the coordinate transformation ##x^{\mu} \rightarrow {\Lambda^{\mu}}_{\nu}x^{\nu}##,

##\frac{\partial(\partial_{\mu}\phi(x))}{\partial(\partial_{\alpha}\phi(x))} \rightarrow \frac{\partial({(\Lambda^{-1})^{\nu}}_{\mu}(\partial_{\nu}\phi)(\Lambda^{-1}x))}{\partial({(\Lambda^{-1})^{\beta}}_{\alpha}(\partial_{\beta}\phi)(\Lambda^{-1}x))}=\frac{\partial({(\Lambda^{-1})^{\nu}}_{\mu}(\partial_{\nu}\phi)(\Lambda^{-1}x))}{\partial({(\Lambda)_{\alpha}}^{\beta}(\partial_{\beta}\phi)(\Lambda^{-1}x))}##

How should I now take ##{(\Lambda)_{\alpha}}^{\beta}## to the numerator?

You can take ##\Lambda## outside of the derivatives. But I suggest you look at a text on (special) relativity.

I suppose you're studying relativistic field theory? This means that knowing how to quickly read and interpret indices (upper/lower) can help you focus on the physics content instead of the manipulations of expressions.

Let me finish the steps of my derivation:

##\frac{\partial({(\Lambda^{-1})^{\nu}}_{\mu}(\partial_{\nu}\phi)(\Lambda^{-1}x))}{\partial({(\Lambda)_{\alpha}}^{\beta}(\partial_{\beta}\phi)(\Lambda^{-1}x))}=\frac{\partial((\partial_{\beta}\phi)(\Lambda^{-1}x))}{\partial({(\Lambda)_{\alpha}}^{\beta}(\partial_{\beta}\phi)(\Lambda^{-1}x))}\frac{\partial({(\Lambda^{-1})^{\nu}}_{\mu}(\partial_{\nu}\phi)(\Lambda^{-1}x))}{\partial((\partial_{\beta}\phi)(\Lambda^{-1}x))}={(\Lambda^{-1})^{\nu}}_{\mu}{(\Lambda^{-1})^{\beta}}_{\alpha}\frac{\partial((\partial_{\nu}\phi)(\Lambda^{-1}x))}{\partial((\partial_{\beta}\phi)(\Lambda^{-1}x))}={(\Lambda^{-1})^{\nu}}_{\mu}{(\Lambda)_{\alpha}}^{\beta}\frac{\partial((\partial_{\nu}\phi)(\Lambda^{-1}x))}{\partial((\partial_{\beta}\phi)(\Lambda^{-1}x))}##

I suppose you're studying relativistic field theory? This means that knowing how to quickly read and interpret indices (upper/lower) can help you focus on the physics content instead of the manipulations of expressions.
Hmm. I guess that's very important. I was just trying to practice my skills in tensor manipulations since I'm still new to this kind of math.

Wait! The order of indices on ##{(\Lambda)_{\alpha}}^{\beta}## in my final result is not the same as the order of indices in your ##{\Lambda^{\alpha}}_{\beta}##.

Did I make a mistake in the second step of my calculation in the previous post?
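
One way to probe this numerically, as a rough sketch rather than a proof: write ##q_\alpha={(\Lambda^{-1})^{\beta}}_{\alpha}p_\beta## for the quantity in the denominator, with ##p_\beta=\partial_\beta\phi##. The map is linear, so ##\partial p_\beta/\partial q_\alpha## is whichever matrix recovers ##p## from ##q##. The boost parameters and sample values below are assumptions chosen so that the arithmetic is exact.

```python
# Assumption: boost along x with beta = 0.6, so gamma = 1.25, gamma*beta = 0.75.
LAM = [[1.25, -0.75, 0.0, 0.0],
       [-0.75, 1.25, 0.0, 0.0],
       [0.0, 0.0, 1.0, 0.0],
       [0.0, 0.0, 0.0, 1.0]]      # LAM[a][b] = Lambda^a_b
LAM_INV = [[1.25, 0.75, 0.0, 0.0],
           [0.75, 1.25, 0.0, 0.0],
           [0.0, 0.0, 1.0, 0.0],
           [0.0, 0.0, 0.0, 1.0]]  # LAM_INV[b][a] = (Lambda^{-1})^b_a

p = [2.0, 3.0, 5.0, 7.0]  # made-up components p_beta = partial_beta phi

# q_alpha = (Lambda^{-1})^beta_alpha p_beta
q = [sum(LAM_INV[b][a] * p[b] for b in range(4)) for a in range(4)]

# Candidate 1: dp_beta/dq_alpha = Lambda^alpha_beta
p_from_lam = [sum(LAM[a][b] * q[a] for a in range(4)) for b in range(4)]

# Candidate 2: dp_beta/dq_alpha = (Lambda^{-1})^beta_alpha
p_from_inv = [sum(LAM_INV[b][a] * q[a] for a in range(4)) for b in range(4)]

print(p_from_lam)  # [2.0, 3.0, 5.0, 7.0]
print(p_from_inv)  # [9.875, 10.125, 5.0, 7.0]
```

Only the first candidate recovers ##p##, which suggests that differentiating with respect to ##{(\Lambda^{-1})^{\beta}}_{\alpha}\partial_\beta\phi## brings out ##{\Lambda^{\alpha}}_{\beta}##, the inverse of the matrix in the denominator, rather than ##{(\Lambda^{-1})^{\beta}}_{\alpha}## itself.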