I know that ##\frac{\partial}{\partial (\partial_{\mu}\phi)} \big( \partial_{\mu} \phi\ \partial^{\mu} \phi \big) = \partial_{\mu} \phi##.

Now, I need to prove this to myself.

So, here goes nothing.

##\frac{\partial}{\partial (\partial_{\mu}\phi)} \big( \partial_{\mu} \phi\ \partial^{\mu} \phi \big)##
## = \frac{\partial}{\partial (\partial_{\mu}\phi)} \big( \eta^{\mu\nu}\partial_{\mu} \phi\ \partial_{\nu} \phi \big)##
##= \eta^{\mu\nu}\ \partial_{\nu} \phi + \eta_{\mu\nu} \eta^{\mu\nu}\ \partial_{\mu} \phi##,

where I first differentiated the factor ##\partial_{\mu}\phi## with respect to ##\partial_{\mu}\phi## and then I differentiated the factor ##\partial_{\nu}\phi## with respect to ##\partial_{\mu}\phi##.

Am I correct so far?

The indices in your second term don't match those in the first, which is an indication that something is wrong.

Try using the following form

$$\frac{\partial}{\partial (\partial_{\alpha}\phi)} \big( \partial_{\mu} \phi\ \partial^{\mu} \phi \big)$$

This helps you avoid the problem where you have ill-defined expressions like ##\eta_{\mu\nu}\eta^{\mu\nu}\partial_\mu\phi##.
The latter can mean two things
$$\eta_{\mu\nu}\left(\eta^{\mu\nu}\partial_\mu\phi\right)=\eta_{\mu\nu}\partial^\nu\phi=\partial_\mu\phi$$
or it can mean (using ##\eta_{\mu\nu}\eta^{\mu\nu}=D## with ##D## the number of spacetime dimensions, here ##D=4##)
$$\left(\eta_{\mu\nu}\eta^{\mu\nu}\right)\partial_\mu\phi=D\,\partial_\mu\phi=4\,\partial_\mu\phi$$
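
To make the ambiguity concrete, here is a minimal numerical sketch of the two readings. It assumes the signature (+,-,-,-), and the gradient components in `dphi` are made-up numbers used only for illustration.

```python
import numpy as np

# Minkowski metric eta_{mu nu}; signature (+,-,-,-) is an assumption.
eta = np.diag([1.0, -1.0, -1.0, -1.0])
eta_inv = np.linalg.inv(eta)  # eta^{mu nu}

# Full contraction eta_{mu nu} eta^{mu nu} gives the spacetime dimension D.
D = np.einsum("mn,mn->", eta, eta_inv)

# Hypothetical components d_mu phi, for illustration only.
dphi = np.array([0.3, -1.2, 0.7, 2.0])

# Reading 1: eta_{alpha nu} (eta^{nu mu} d_mu phi) raises the index, then lowers it again.
reading_1 = np.einsum("an,nm,m->a", eta, eta_inv, dphi)

# Reading 2: (eta_{mu nu} eta^{mu nu}) d_alpha phi multiplies every component by D.
reading_2 = D * dphi

print(D, reading_1, reading_2)
```

The two readings differ by a factor of ##D##, which is why an index should never appear more than twice in one term.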

spaghetti3451
Firstly, I need to say that ##\frac{\partial}{\partial (\partial_{\mu}\phi)} \big( \frac{1}{2}\partial_{\mu} \phi\ \partial^{\mu} \phi \big) = \partial^{\mu} \phi##.
In my first post, I mistakenly placed the ##\mu## index on the RHS downstairs instead of upstairs. I also forgot the factor of ##\frac{1}{2}##.

Now,

##\frac{\partial}{\partial (\partial_{\alpha}\phi)} \big( \frac{1}{2}\partial_{\mu} \phi\ \partial^{\mu} \phi \big)##
##=\frac{1}{2}\eta^{\mu\nu}\frac{\partial}{\partial (\partial_{\alpha}\phi)} \big( \partial_{\mu} \phi\ \partial_{\nu} \phi \big)##
##=\frac{1}{2}\eta^{\mu\nu}(\eta_{\alpha\mu}\partial_{\nu}\phi+\eta_{\alpha\nu}\partial_{\mu}\phi)##
##=\frac{1}{2}(\eta_{\alpha\mu}\eta^{\mu\nu}\partial_{\nu}\phi+\eta_{\alpha\nu}\eta^{\nu\mu}\partial_{\mu}\phi)##
##=\eta_{\alpha\mu}\eta^{\mu\nu}\partial_{\nu}\phi##
##=\delta^{\nu}_{\alpha}\partial_{\nu}\phi##
##=\partial_{\alpha}\phi##

Am I correct?

Why do you use the following?
$$\frac{\partial\left(\partial_\mu\phi\right)}{\partial\left(\partial_\alpha\phi\right)}=\eta_{\mu\alpha}$$

A hint that something is wrong is that the indices don't correspond. You can check this by performing a transformation ##x^\mu\to x^{\prime\mu}##: you should find that there is one upper and one lower index. [*]

It's equal to ##\delta^\alpha_\mu\,\,\,\left(=\eta^\alpha_{\,\,\, \mu}\right)## as far as I can tell.

[*]: Oddly it seems that this would turn out correct if you contract the metric so I'm starting to doubt myself.

spaghetti3451

##\frac{\partial}{\partial (\partial_{\alpha}\phi)} \big( \frac{1}{2}\partial_{\mu} \phi\ \partial^{\mu} \phi \big)##
##=\frac{1}{2}\eta^{\mu\nu}\frac{\partial}{\partial (\partial_{\alpha}\phi)} \big( \partial_{\mu} \phi\ \partial_{\nu} \phi \big)##
##=\frac{1}{2}\eta^{\mu\nu}({\eta^{\alpha}}_{\mu}\partial_{\nu}\phi+{\eta^{\alpha}}_{\nu}\partial_{\mu}\phi)##
##=\frac{1}{2}({\eta^{\alpha}}_{\mu}\eta^{\mu\nu}\partial_{\nu}\phi+{\eta^{\alpha}}_{\nu}\eta^{\nu\mu}\partial_{\mu}\phi)##
##={\eta^{\alpha}}_{\mu}\eta^{\mu\nu}\partial_{\nu}\phi##
##=\eta^{\alpha\nu}\partial_{\nu}\phi##
##=\partial^{\alpha}\phi##

It appears that in the answer, the index ##\alpha## should be upstairs, not downstairs. I made this mistake in my previous post.
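
A finite-difference check can confirm the index placement. This is only a sketch: it treats the four components ##\partial_\mu\phi## as independent variables `v`, assumes the signature (+,-,-,-), and uses made-up numbers for `v`.

```python
import numpy as np

eta_inv = np.diag([1.0, -1.0, -1.0, -1.0])  # eta^{mu nu}, signature (+,-,-,-) assumed

def kinetic_term(v):
    """(1/2) eta^{mu nu} v_mu v_nu, with v_mu standing in for d_mu phi."""
    return 0.5 * np.einsum("mn,m,n->", eta_inv, v, v)

def numerical_gradient(f, v, h=1e-6):
    """Central-difference estimate of d f / d v_alpha for each alpha."""
    grad = np.zeros_like(v)
    for alpha in range(v.size):
        step = np.zeros_like(v)
        step[alpha] = h
        grad[alpha] = (f(v + step) - f(v - step)) / (2 * h)
    return grad

v = np.array([0.3, -1.2, 0.7, 2.0])  # hypothetical values of d_mu phi
raised = eta_inv @ v                 # d^alpha phi = eta^{alpha nu} d_nu phi
gradient = numerical_gradient(kinetic_term, v)

print(gradient)  # compare with `raised` (upper index) and `v` (lower index)
```

The numerical gradient agrees with `raised` and not with `v`: the spatial components come out with flipped sign, which is exactly the difference between ##\partial^{\alpha}\phi## and ##\partial_{\alpha}\phi##.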

Try to see why the ##\alpha## index is upstairs and also why you can write ##\eta^\alpha_{\,\,\,\mu}=\delta^\alpha_\mu##, where delta is the Kronecker delta.

Other than that it is correct.

spaghetti3451
Try to see why the ##\alpha## index is upstairs

Shouldn't the index ##\alpha## be upstairs because we are differentiating the product of an upstairs index and a downstairs index with respect to a downstairs index?

and also why you can write ##\eta^\alpha_{\,\,\,\mu}=\delta^\alpha_\mu##, where delta is the Kronecker delta.

I think ##\eta^\alpha_{\,\,\,\mu}=\delta^\alpha_\mu## because ##\eta^\alpha_{\,\,\, \mu} = \eta^{\alpha\nu}\eta_{\nu\mu} = \delta^\alpha_\mu##.

Am I correct?

The latter part is perfect.
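
If it helps, the identity ##\eta^{\alpha\nu}\eta_{\nu\mu}=\delta^\alpha_\mu## can be checked numerically in a couple of lines. The sketch below assumes the signature (+,-,-,-); the variable names are mine.

```python
import numpy as np

eta_lower = np.diag([1.0, -1.0, -1.0, -1.0])  # eta_{nu mu}
eta_upper = np.linalg.inv(eta_lower)          # eta^{alpha nu}, the matrix inverse

# Mixed components eta^alpha_mu = eta^{alpha nu} eta_{nu mu}
eta_mixed = np.einsum("an,nm->am", eta_upper, eta_lower)

print(eta_mixed)
```

The result is the identity matrix because ##\eta^{\mu\nu}## is defined as the inverse of ##\eta_{\mu\nu}##, so this holds for any invertible metric and not only for the Minkowski one.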

The first part becomes clearest when you change coordinates explicitly. In other words, how does the following object transform?

$$\partial_\mu\phi(x)=\frac{\partial\phi(x)}{\partial x^\mu}$$

Apply a coordinate transformation ##x^\mu\to x^{\prime\mu}## and look at the way the Jacobian shows up.
The Jacobian is given by the following (or its inverse; which one doesn't matter much, since we only consider invertible transformations):

$$\frac{\partial x^{\prime\mu}}{\partial x^{\nu}}$$

Under the coordinate transformation ##x^{\mu}\rightarrow x'^{\mu}={\Lambda^{\mu}}_{\nu}x^{\nu}##,

##\partial_{\mu}\phi(x)~=~\frac{\partial\phi(x)}{\partial x^{\mu}}~##

##\rightarrow \frac{\partial\phi(\Lambda^{-1}x)}{\partial x^{\mu}}=\frac{\partial (\Lambda^{-1}x)^{\nu}}{\partial x^{\mu}}\frac{\partial\phi(\Lambda^{-1}x)}{\partial (\Lambda^{-1}x)^{\nu}}=\frac{\partial}{\partial x^{\mu}}\Big( {(\Lambda^{-1})^{\nu}}_{\rho}x^{\rho} \Big)(\partial_{\nu}\phi)(\Lambda^{-1}x)={(\Lambda^{-1})^{\nu}}_{\rho}\delta^{\rho}_{\mu}(\partial_{\nu}\phi)(\Lambda^{-1}x)={(\Lambda^{-1})^{\nu}}_{\mu}(\partial_{\nu}\phi)(\Lambda^{-1}x)##.
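
The chain-rule result above can be sanity-checked numerically. In this sketch the scalar field `phi`, the boost speed, the rotation angle and the sample point are all made up for illustration; `Lam[mu, nu]` stores ##{\Lambda^{\mu}}_{\nu}##.

```python
import numpy as np

def phi(x):
    """Hypothetical scalar field, chosen only for illustration."""
    return np.sin(x[0]) + x[1] * x[2] + x[3] ** 2

def grad_phi(x):
    """Analytic components d_nu phi of the field above."""
    return np.array([np.cos(x[0]), x[2], x[1], 2.0 * x[3]])

# Boost along x (speed 0.6, c = 1) followed by a rotation about z,
# so that the matrix is not symmetric and index order matters.
beta, theta = 0.6, 0.3
gamma = 1.0 / np.sqrt(1.0 - beta**2)
boost = np.eye(4)
boost[0, 0] = boost[1, 1] = gamma
boost[0, 1] = boost[1, 0] = -gamma * beta
rotation = np.eye(4)
rotation[1, 1] = rotation[2, 2] = np.cos(theta)
rotation[1, 2] = -np.sin(theta)
rotation[2, 1] = np.sin(theta)
Lam = boost @ rotation
Lam_inv = np.linalg.inv(Lam)  # Lam_inv[nu, mu] = (Lambda^{-1})^nu_mu

def phi_transformed(x):
    """phi'(x) = phi(Lambda^{-1} x)."""
    return phi(Lam_inv @ x)

x = np.array([0.4, -0.3, 1.1, 0.8])  # arbitrary spacetime point
h = 1e-6
# Left-hand side: central differences of phi' with respect to each x^mu.
lhs = np.array([
    (phi_transformed(x + h * e) - phi_transformed(x - h * e)) / (2 * h)
    for e in np.eye(4)
])
# Right-hand side: (Lambda^{-1})^nu_mu (d_nu phi)(Lambda^{-1} x), summed over nu.
rhs = np.einsum("nm,n->m", Lam_inv, grad_phi(Lam_inv @ x))

print(np.max(np.abs(lhs - rhs)))
```

Note that the sum runs over the first index of `Lam_inv`, matching ##{(\Lambda^{-1})^{\nu}}_{\mu}## with ##\nu## contracted.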

How does this help?

Well, now use it in the derivative w.r.t. ##\partial_\mu\phi##.
You should get something of the form
$$\frac{\partial\left(\partial_\mu\phi\right)}{\partial\left(\partial_\alpha\phi\right)}\to \left(\Lambda^{-1}\right)^\nu_{\,\,\, \mu}\Lambda^\alpha_{\,\,\, \beta}\frac{\partial\left(\partial_\nu\phi(x^\prime)\right)}{\partial\left(\partial_\beta\phi(x^\prime)\right)}$$

This means that the object transforms as a (1,1)-tensor, i.e. it has one upper and one lower index.

This might help with the details https://www.physicsforums.com/threads/kronecker-delta-as-tensor-proof.320692/

Ok. So, under the coordinate transformation ##x^{\mu} \rightarrow {\Lambda^{\mu}}_{\nu}x^{\nu}##,

##\frac{\partial(\partial_{\mu}\phi(x))}{\partial(\partial_{\alpha}\phi(x))} \rightarrow \frac{\partial({(\Lambda^{-1})^{\nu}}_{\mu}(\partial_{\nu}\phi)(\Lambda^{-1}x))}{\partial({(\Lambda^{-1})^{\beta}}_{\alpha}(\partial_{\beta}\phi)(\Lambda^{-1}x))}=\frac{\partial({(\Lambda^{-1})^{\nu}}_{\mu}(\partial_{\nu}\phi)(\Lambda^{-1}x))}{\partial({(\Lambda)_{\alpha}}^{\beta}(\partial_{\beta}\phi)(\Lambda^{-1}x))}##
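
The identity ##{(\Lambda^{-1})^{\beta}}_{\alpha}={(\Lambda)_{\alpha}}^{\beta}## used in the last equality above can be checked numerically for a sample Lorentz transformation. This is a sketch under assumptions: signature (+,-,-,-), and a boost and rotation with parameters I picked arbitrarily.

```python
import numpy as np

eta = np.diag([1.0, -1.0, -1.0, -1.0])  # same entries with upper or lower indices

# Sample Lorentz transformation: boost along x times rotation about z.
beta, theta = 0.6, 0.3
gamma = 1.0 / np.sqrt(1.0 - beta**2)
boost = np.eye(4)
boost[0, 0] = boost[1, 1] = gamma
boost[0, 1] = boost[1, 0] = -gamma * beta
rotation = np.eye(4)
rotation[1, 1] = rotation[2, 2] = np.cos(theta)
rotation[1, 2] = -np.sin(theta)
rotation[2, 1] = np.sin(theta)
Lam = boost @ rotation  # Lam[mu, nu] = Lambda^mu_nu, deliberately not symmetric

# Lambda_alpha^beta = eta_{alpha mu} Lambda^mu_nu eta^{nu beta}
Lam_low_up = np.einsum("am,mn,nb->ab", eta, Lam, eta)

Lam_inv = np.linalg.inv(Lam)  # Lam_inv[beta, alpha] = (Lambda^{-1})^beta_alpha

# Defining property of a Lorentz transformation: Lambda^T eta Lambda = eta.
preserves_metric = np.allclose(Lam.T @ eta @ Lam, eta)
# (Lambda^{-1})^beta_alpha = Lambda_alpha^beta means a transpose in matrix form.
inverse_matches = np.allclose(Lam_inv, Lam_low_up.T)

print(preserves_metric, inverse_matches)
```

The transpose in the last comparison is the matrix-language trace of the swapped index order.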

How should I now take ##{(\Lambda)_{\alpha}}^{\beta}## to the numerator?

You can take ##\Lambda## outside of the derivatives. But I suggest you look at a text on (special) relativity.

I suppose you're studying relativistic field theory? This means that knowing how to quickly read and interpret indices (upper/lower) can help you focus on the physics content instead of the manipulations of expressions.

Let me finish the steps of my derivation:

##\frac{\partial({(\Lambda^{-1})^{\nu}}_{\mu}(\partial_{\nu}\phi)(\Lambda^{-1}x))}{\partial({(\Lambda)_{\alpha}}^{\beta}(\partial_{\beta}\phi)(\Lambda^{-1}x))}=\frac{\partial((\partial_{\beta}\phi)(\Lambda^{-1}x))}{\partial({(\Lambda)_{\alpha}}^{\beta}(\partial_{\beta}\phi)(\Lambda^{-1}x))}\frac{\partial({(\Lambda^{-1})^{\nu}}_{\mu}(\partial_{\nu}\phi)(\Lambda^{-1}x))}{\partial((\partial_{\beta}\phi)(\Lambda^{-1}x))}={(\Lambda^{-1})^{\nu}}_{\mu}{(\Lambda^{-1})^{\beta}}_{\alpha}\frac{\partial((\partial_{\nu}\phi)(\Lambda^{-1}x))}{\partial((\partial_{\beta}\phi)(\Lambda^{-1}x))}={(\Lambda^{-1})^{\nu}}_{\mu}{(\Lambda)_{\alpha}}^{\beta}\frac{\partial((\partial_{\nu}\phi)(\Lambda^{-1}x))}{\partial((\partial_{\beta}\phi)(\Lambda^{-1}x))}##

I suppose you're studying relativistic field theory? This means that knowing how to quickly read and interpret indices (upper/lower) can help you focus on the physics content instead of the manipulations of expressions.

Hmm. I guess that's very important. I was just trying to practice my skills in tensor manipulations since I'm still new to this kind of math.

Wait! The order of the indices on ##{(\Lambda)_{\alpha}}^{\beta}## in my final result is not the same as the order of the indices in your ##{\Lambda^{\alpha}}_{\beta}##.

Did I make a mistake in the second step of my calculation in the previous post?
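
One way to test the two index placements numerically is to use the fact that ##\frac{\partial(\partial_\nu\phi)}{\partial(\partial_\beta\phi)}=\delta^\beta_\nu## and see which prefactor maps the Kronecker delta back to itself. The sketch below assumes the signature (+,-,-,-) and an arbitrarily chosen boost plus rotation; it is a consistency check, not a derivation.

```python
import numpy as np

eta = np.diag([1.0, -1.0, -1.0, -1.0])

# Sample Lorentz transformation: boost along x times rotation about z.
beta, theta = 0.6, 0.3
gamma = 1.0 / np.sqrt(1.0 - beta**2)
boost = np.eye(4)
boost[0, 0] = boost[1, 1] = gamma
boost[0, 1] = boost[1, 0] = -gamma * beta
rotation = np.eye(4)
rotation[1, 1] = rotation[2, 2] = np.cos(theta)
rotation[1, 2] = -np.sin(theta)
rotation[2, 1] = np.sin(theta)
Lam = boost @ rotation         # Lam[alpha, beta] = Lambda^alpha_beta
Lam_inv = np.linalg.inv(Lam)   # Lam_inv[nu, mu] = (Lambda^{-1})^nu_mu
Lam_low_up = eta @ Lam @ eta   # Lam_low_up[alpha, beta] = Lambda_alpha^beta
delta = np.eye(4)              # delta[beta, nu] = delta^beta_nu

# (Lambda^{-1})^nu_mu Lambda^alpha_beta delta^beta_nu, output indexed [mu, alpha]
with_upper_lower = np.einsum("nm,ab,bn->ma", Lam_inv, Lam, delta)
# (Lambda^{-1})^nu_mu Lambda_alpha^beta delta^beta_nu, output indexed [mu, alpha]
with_lower_upper = np.einsum("nm,ab,bn->ma", Lam_inv, Lam_low_up, delta)

print(np.allclose(with_upper_lower, delta), np.allclose(with_lower_upper, delta))
```

Only the combination with ##{\Lambda^{\alpha}}_{\beta}## reproduces the Kronecker delta. That is consistent with the general rule that if ##y_\alpha={(\Lambda^{-1})^{\beta}}_{\alpha}x_\beta## then ##\partial x_\beta/\partial y_\alpha={\Lambda^{\alpha}}_{\beta}##, so the step where ##{(\Lambda^{-1})^{\beta}}_{\alpha}## appears as that derivative looks like the place to recheck.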