How is photon momentum compatible with special relativity?

Ahmed1029 · Nov 28, 2022

In relativity, the momentum of a body is given by ##p=mv/\sqrt{1-v^2/c^2}##, but if the mass is exactly zero and the velocity is exactly ##c##, how is the photon momentum even defined? I don't think this problem can be resolved by simply stating the other formula relating energy to momentum, since it was derived from this definition in the first place.

Dale · Nov 28, 2022

The correct general formula is ##m^2 c^2= E^2/c^2 - p^2##

Ahmed1029 said:

I don't think this problem can be resolved by simply stating the other formula relating energy to momentum, since it was derived from this definition in the first place.

You have it backwards. The general formula I wrote is the one that is used to derive your formula, not the other way around.

Orodruin · Nov 28, 2022

Ahmed1029 said:

##p=mv/\sqrt{1-v^2/c^2}##

That’s only valid if ##m > 0##.

Ahmed1029 · Nov 28, 2022

Dale said:

The correct general formula is ##m^2 c^2= E^2/c^2 - p^2##

You have it backwards. The general formula I wrote is the one that is used to derive your formula, not the other way around.

My understanding is that when you have the position 4-vector ##(ct,x,y,z)##, you get the momentum by differentiating each component with respect to the proper time and multiplying by the mass, so that each component necessarily contains the mass multiplied by the Lorentz factor, which would then be undefined for a photon.

Ahmed1029 · Nov 28, 2022

Orodruin said:

That’s only valid if ##m > 0##.

Is it a postulate, or can it be shown?

Orodruin · Nov 28, 2022

Ahmed1029 said:

My understanding is that when you have the position 4-vector ##(ct,x,y,z)##, you get the momentum by differentiating each component with respect to the proper time and multiplying by the mass, so that each component necessarily contains the mass multiplied by the Lorentz factor, which would then be undefined for a photon.

This is only valid for the case ##m>0## because only then do you have a proper time to differentiate with respect to. The more general statement is that the 4-momentum is proportional to the tangent 4-vector of an affinely parametrised world line (timelike or null).

Ahmed1029 · Nov 28, 2022

Orodruin said:

This is only valid for the case ##m>0## because only then do you have a proper time to differentiate with respect to. The more general statement is that the 4-momentum is proportional to the tangent 4-vector of an affinely parametrised world line (timelike or null).

Okay so now I know this formulation doesn't apply to the photon, but then how do we know that the general formula can be extended to include the photon? Do we start with the formulation using m and gamma and then substitute p in its place and do whatever we like disregarding the initial formula completely? Or is there another derivation that says nothing about mass and nothing about the gamma factor?

Orodruin · Nov 28, 2022

Ahmed1029 said:

Okay so now I know this formulation doesn't apply to the photon, but then how do we know that the general formula can be extended to include the photon? Do we start with the formulation using m and gamma and then substitute p in its place and do whatever we like disregarding the initial formula completely? Or is there another derivation that says nothing about mass and nothing about the gamma factor?

I just told you how it generalises. Instead of using proper time you use an affine parameter. For a timelike worldline proper time is an affine parameter and we call the proportionality constant "mass".

Edit: Because of how proper time is defined, this proportionality constant then also satisfies ##E^2 - p^2 = m^2## (in units where ##c=1##).

Ahmed1029 · Nov 28, 2022

Orodruin said:

I just told you how it generalises. Instead of using proper time you use an affine parameter. For a timelike worldline proper time is an affine parameter and we call the proportionality constant "mass".

I don't understand since I don't know what an affine parameter is. Is there a simpler way to say this?

vanhees71 · Nov 28, 2022

Photons can only be understood by relativistic quantum field theory, and to get the possible field equations, you have to consider the symmetry group of Minkowski spacetime. In quantum theory this symmetry group is represented by unitary representations, and the simplest ones are the irreducible representations.

Considering those symmetry transformations that are smoothly connected with the identity, you have the proper orthochronous Poincare group, which can be built from the translations of space-time, rotations of space, and Lorentz boosts. The generators of these transformations then represent the corresponding conserved quantities from Noether's theorem (translations in time and space are related to energy and momentum, rotations to angular momentum, and the invariance under Lorentz boosts leads to the theorem that the center of energy of any closed system moves with constant velocity).

Now you can find the unitary irreducible representations of the Poincare group (as was first derived by Wigner in 1939). The upshot is that there are two major classes of such representations that also admit a so-called local realization of the Poincare group and a local relativistic QFT, where "local" means that there are local observables and all interactions are local, i.e., the Hamilton density ##\mathcal{H}(x)## commutes with all other local observables at space-like separation of the arguments, i.e., ##[\mathcal{H}(x),\mathcal{A}(y)]=0## for all four-vectors ##(x-y)## being spacelike. These two major classes are characterized by the Casimir operator ##p_{\mu} p^{\mu}=m^2## (using natural units with ##\hbar=c=1##) with ##m^2>0## (leading to QFTs describing particles with non-zero invariant mass) or ##m^2=0## (leading to QFTs describing "particles" (or rather "quanta") with 0 invariant mass).

Since now ##p^0=E## is the energy of a particle, you have ##E^2-\vec{p}^2=m^2##. For ##m=0## you have ##E=|\vec{p}|##, and the speed of these particles is ##|\vec{v}|=|\vec{p}|/E=1##, i.e., particles with zero invariant mass move with the speed of light with respect to any inertial frame of reference, which makes them different from the massive particles.

Only for massive particles, you can express ##\vec{p}## in terms of ##\vec{v}=\vec{p}/E## since only then ##|\vec{v}|<1## and only then you can write
$$E^2-\vec{p}^2=E^2(1-\vec{v}^2)=m^2 \; \Rightarrow \; E=\frac{m}{\sqrt{1-\vec{v}^2}}$$
and
$$\vec{p}=E \vec{v}=\frac{m \vec{v}}{\sqrt{1-\vec{v}^2}}.$$
That's why massive particles always have a speed less than the speed of light with respect to any inertial frame of reference, and that's why for massless particles you cannot express ##E## and ##\vec{p}## in terms of the speed, because it's always 1 (i.e., they always move with the speed of light).
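A minimal numerical sketch of this point, assuming natural units (##c=1##) and using hypothetical helper names that are not part of the discussion: for a fixed momentum the relation ##E^2-\vec{p}^2=m^2## gives a well-defined energy and speed all the way down to ##m=0##, while the formula in terms of ##\vec{v}## is only usable for ##m>0##.

```python
import math

def energy(p, m):
    """Energy from E^2 - p^2 = m^2 in natural units (c = 1)."""
    return math.sqrt(p * p + m * m)

def speed(p, m):
    """Speed |v| = |p| / E; equals 1 when m = 0 and p > 0."""
    return p / energy(p, m)

def momentum_from_speed(m, v):
    """p = m v / sqrt(1 - v^2); only usable for m > 0 and |v| < 1."""
    return m * v / math.sqrt(1.0 - v * v)

p = 2.0  # fixed momentum magnitude, arbitrary illustrative value
for m in (1.0, 0.1, 0.01, 0.0):
    # E and |v| stay finite and well defined as m goes to zero.
    print(f"m={m:<5} E={energy(p, m):.6f} |v|={speed(p, m):.8f}")
```

Calling `momentum_from_speed(0.0, 1.0)` would raise a `ZeroDivisionError`, which is the numerical counterpart of the undefined expression in the original question.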

It also turns out that in another respect it's not so trivial to derive the properties of massless particles from those of massive particles by letting the mass go to zero: Besides having a momentum, quantum particles also have a spin. For massive relativistic particles the spin is pretty similar to the spin within non-relativistic quantum mechanics. It just no longer commutes with ##\vec{p}##, and thus it becomes a bit more complicated to handle.

This changes drastically for massless particles. Massless particles indeed do not make sense in non-relativistic QM, because one can show that massless realizations of the Galilei group do not lead to a dynamical quantum description that can be interpreted physically in any sense.

In relativistic QFT, however, it does make sense to have massless particles, but they behave a bit differently than you might naively think. This matters when the spin of a massless particle is ##\geq 1##, and photons are described by quantum fields leading to massless particles with spin 1 (i.e., massless vector fields). These are, however, special compared to the analogous case of massive vector fields.

First of all, it turns out that there are only two instead of three spin degrees of freedom, i.e., electromagnetic fields and their quanta, the photons, have only two and not three polarization degrees of freedom. These can be characterized by the total angular momentum component in the direction of their momenta, i.e., the helicities, which can only take the values ##\pm 1## but not the value ##0## (which is allowed for massive spin-1 particles). Also, for massless particles the helicity is invariant under arbitrary Lorentz boosts, while for massive particles you can flip the sign of the helicity by transforming from one inertial reference frame to another. This is because you can always move faster than the massive particle in the original frame, and in the corresponding new frame the helicity looks flipped in sign. This is impossible for massless particles, which always move with the speed of light, so you cannot overtake them in any inertial reference frame, i.e., the helicities of massless particles don't change under Poincare transformations.
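The overtaking argument can be sketched with the one-dimensional relativistic velocity-addition formula. This is a simplified classical illustration, not a QFT calculation: it assumes motion and boost along the same axis, units with ##c=1##, a spin projection that is unchanged by a boost along that axis, and hypothetical function names.

```python
def boosted_velocity(v, u):
    """Velocity of a particle moving at v along z, as seen from a frame
    moving at u along z (relativistic velocity addition, c = 1, |u| < 1)."""
    return (v - u) / (1.0 - u * v)

def helicity_sign(spin_z, v):
    """Sign of the spin projection onto the direction of motion,
    for motion along +z or -z (assumes v != 0)."""
    direction = 1 if v > 0 else -1
    spin_sign = 1 if spin_z > 0 else -1
    return direction * spin_sign

# Massive particle at v = 0.5, observer overtakes it with u = 0.8:
v_massive = boosted_velocity(0.5, 0.8)      # negative: direction reversed
print(helicity_sign(+1, 0.5), helicity_sign(+1, v_massive))

# Massless particle at v = 1: no observer with |u| < 1 can overtake it.
v_massless = boosted_velocity(1.0, 0.999)   # still 1
print(helicity_sign(+1, 1.0), helicity_sign(+1, v_massless))
```

For the massive particle the sign flips once ##u>v##, while for ##v=1## the boosted velocity stays at 1 for every ##|u|<1##, so the sign never changes.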

Finally, in this approach to relativistic QT via symmetry principles, you have to construct the position observable out of this formalism. As it turns out, this is no problem for massive particles (it's also no problem in non-relativistic QT to derive the position operators, which is no surprise, since in non-relativistic QT the particles must necessarily have a mass, as stated already above), but for massless particles with a spin ##\geq 1##, you cannot find such a position observable, and this implies that it doesn't make any sense at all to think about photons as simple massless classical point-particles, because photons by construction cannot in any clear physical way be localized at all.

Ahmed1029 · Nov 28, 2022

Wow that's a lot to consider! But if it's that complicated, why is it always assumed trivial to get the momentum of a photon in books of special relativity by treating them classically? Is it just a trick?

Dale · Nov 28, 2022

Ahmed1029 said:

My understanding is that when you have the position 4-vector ##(ct,x,y,z)##, you get the momentum by differentiating each component with respect to the proper time and multiplying by the mass, so that each component necessarily contains the mass multiplied by the Lorentz factor, which would then be undefined for a photon.

You are already assuming a non-zero mass if you take that approach. It can be used as a motivation, but not as a general definition. Instead, first define the particle's four-momentum $$p^\mu =(E/c,\vec p)$$ where ##E## is the particle's total energy and ##\vec p## is the particle's momentum. Then ##p^\mu p_\mu## is an invariant which we will define as the particle's invariant mass $$p^\mu p_\mu = m^2 c^2 = E^2/c^2 - p^2$$

Now, if a particle is massless (##m=0##) then ##E=cp## in all frames. And since ##v=c^2 p/E## we immediately get ##v=c^2 p/(c p) = c##. So massless particles must travel at ##c##.

On the other hand, if ##0<m## then there exists some frame where ##p'=0## and in that frame ##E'=mc^2##. If we boost from that frame to a frame moving at ##v## relative to that then we get $$p^\mu = \left( \frac{m c}{\sqrt{1-\frac{v^2}{c^2}}}, \frac{m \vec v}{\sqrt{1-\frac{v^2}{c^2}}} \right)$$ in which the spacelike term is the formula you used, derived from the general formula.
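A short numerical sketch of this approach, starting from ##(E/c,\vec p)## rather than from ##m## and ##\gamma##. The helper names and the sample values are illustrative assumptions, not part of the argument itself.

```python
import math

C = 299_792_458.0  # speed of light in m/s

def invariant_mass(E, p):
    """Mass from m^2 c^2 = E^2/c^2 - p^2, with p the momentum magnitude."""
    m2c2 = (E / C) ** 2 - p ** 2
    # Clamp tiny negative values caused by floating-point rounding.
    return math.sqrt(max(m2c2, 0.0)) / C

def speed(E, p):
    """Speed from v = c^2 p / E; no mass appears in this formula."""
    return C ** 2 * p / E

# Massless case: E = c p, with an arbitrary illustrative momentum.
p_photon = 1.0e-27                 # kg m/s
E_photon = C * p_photon            # J

# Massive case: boost from the rest frame (E = m c^2, p = 0) to speed v.
m = 9.109e-31                      # kg, roughly the electron mass
v = 0.6 * C
gamma = 1.0 / math.sqrt(1.0 - (v / C) ** 2)
E_massive = gamma * m * C ** 2
p_massive = gamma * m * v

print(speed(E_photon, p_photon) / C)          # close to 1
print(speed(E_massive, p_massive) / C)        # close to 0.6
print(invariant_mass(E_massive, p_massive))   # close to m
```

The same two functions handle both cases; the Lorentz factor only enters when constructing the massive example.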

Ahmed1029 · Nov 28, 2022

Dale said:

You are already assuming a non-zero mass if you take that approach. It can be used as a motivation, but not as a general definition. Instead, first define the particle's four-momentum $$p^\mu =(E/c,\vec p)$$ where ##E## is the particle's total energy and ##\vec p## is the particle's momentum. Then ##p^\mu p_\mu## is an invariant which we will define as the particle's invariant mass $$p^\mu p_\mu = m^2 c^2 = E^2/c^2 - p^2$$

Now, if a particle is massless (##m=0##) then ##E=cp## in all frames. And since ##v=c^2 p/E## we immediately get ##v=c^2 p/(c p) = c##. So massless particles must travel at ##c##.

On the other hand, if ##0<m## then there exists some frame where ##p=0## and in that frame ##E=mc^2=E_0##. If we boost from that frame to a frame moving at ##v## relative to that then we get $$p'^\mu = \left( \frac{m c}{\sqrt{1-\frac{v^2}{c^2}}}, \frac{m \vec v}{\sqrt{1-\frac{v^2}{c^2}}} \right)$$ in which the spacelike term is the formula you used, derived from the general formula.

Yes! That's probably what I was looking for! Thanks!

Dale · Nov 28, 2022

vanhees71 said:

Only for massive particles, you can express ##\vec{p}## in terms of ##\vec{v}=\vec{p}/E## since only then ##|\vec{v}|<1## and only then you can write
$$E^2-\vec{p}^2=E^2(1-\vec{v}^2)=m^2 \; \Rightarrow \; E=\frac{m}{\sqrt{1-\vec{v}^2}}$$
and
$$\vec{p}=E \vec{v}=\frac{m \vec{v}}{\sqrt{1-\vec{v}^2}}.$$

I like this step better than mine!

Orodruin · Nov 28, 2022

vanhees71 said:

Only for massive particles, you can express ##\vec{p}## in terms of ##\vec{v}=\vec{p}/E## since only then ##|\vec{v}|<1## and only then you can write
$$E^2-\vec{p}^2=E^2(1-\vec{v}^2)=m^2 \; \Rightarrow \; E=\frac{m}{\sqrt{1-\vec{v}^2}}$$

Note that it is only the implication that requires ##m > 0##. The relation ##E^2(1-v^2) = m^2## still holds in the case ##v=1##, but it then implies ##m=0##.

PeterDonis · Nov 28, 2022

Ahmed1029 said:

I don't understand since I don't know what an affine parameter is. Is there a simpler way to say this?

Do you know what a parameterized curve is? All that means is that you label each point on the curve with a real number in a continuous way. "Affine" parameter just means there are some additional mathematical conditions on the curve parameter that we don't really need to go into here; all we need to know is that those conditions are necessary for what follows to work.

If you parameterize a curve in spacetime (flat spacetime since we are talking about SR here), and you pick an inertial frame, then you can express the coordinates of points on the curve as functions of the curve parameter. If we call the parameter ##\lambda## and the coordinates ##x^\mu##, then we have ##x^\mu (\lambda)##. We can then take the derivative of ##x^\mu (\lambda)## with respect to ##\lambda## to form a 4-vector. If we call the timelike and spacelike components of this 4-vector in our chosen inertial frame ##(\omega, \vec{k})##, then taking the norm of this 4-vector gives ##\sqrt{\omega^2 - k^2} = N##, where ##N## is a constant to be determined.

So far everything we've said is valid for any kind of curve in spacetime. However, the next step will be different for timelike vs. null curves. For a timelike curve, any affine parameter we pick will be a measure of arc length along the curve, i.e., of proper time; different affine parameters just correspond to different choices of units of time (e.g., seconds vs. years) and/or different choices of the "zero point" of time (e.g., do we set our clocks to zero at the start of our experiment, or at some agreed-upon time in the past like the starting point of our common calendar system?). So we can treat ##\lambda## as proper time ##\tau##, and the 4-vector we get by differentiating as above is then ##d x^\mu / d\tau##, usually called the 4-velocity. If we multiply by the rest mass ##m##, we get the 4-momentum. For this case, we can interpret the 4-vector components ##(\omega, \vec{k})## as ##(\gamma, \gamma \vec{v})##, where ##\gamma = 1 / \sqrt{1 - v^2}## (and I am using units in which ##c = 1## for simplicity). Multiplying by ##m## then gives the 4-momentum vector ##(E, \vec{p})##. The norm ##N## of this vector is then just the rest mass ##m##.

For null curves, we cannot use arc length as an affine parameter because arc length is zero between any two points on the curve. We can, however, use coordinate time in any inertial frame as an affine parameter. If we do this, we find that the tangent vector ##d x^\mu / d\lambda## already is the 4-momentum vector expressed in the inertial frame whose coordinate time we are using; i.e., the 4-vector components ##(\omega, \vec{k})## are the components of the 4-momentum vector ##(E, \vec{p})## in that frame. The norm of this vector, of course, is zero, since the norm of any vector tangent to a null curve is zero; we can interpret this as the rest mass of the object whose worldline the curve is being zero.

In both cases, squaring the norm of the 4-momentum vector gives ##E^2 - p^2 = m^2##.
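A minimal sketch of the construction above for two straight worldlines, assuming units with ##c=1##, motion along the ##x## axis, and a simple finite-difference derivative. The function names are hypothetical and chosen only for this example.

```python
def tangent(curve, lam, h=1e-6):
    """Central-difference derivative dx^mu/dlambda of a curve
    lam -> (t, x, y, z)."""
    ahead, behind = curve(lam + h), curve(lam - h)
    return tuple((a - b) / (2 * h) for a, b in zip(ahead, behind))

def minkowski_norm_squared(vec):
    """omega^2 - k^2 with signature (+, -, -, -) and c = 1."""
    omega, kx, ky, kz = vec
    return omega ** 2 - (kx ** 2 + ky ** 2 + kz ** 2)

V = 0.6
GAMMA = 1.0 / (1.0 - V ** 2) ** 0.5   # Lorentz factor, 1.25 for V = 0.6

def timelike(tau):
    """Inertial worldline at speed V along x, parameterized by proper time."""
    return (GAMMA * tau, GAMMA * V * tau, 0.0, 0.0)

def null(lam):
    """Light ray along x, parameterized by coordinate time."""
    return (lam, lam, 0.0, 0.0)

# Timelike: tangent is (gamma, gamma v, 0, 0), norm squared close to 1.
print(minkowski_norm_squared(tangent(timelike, 2.0)))
# Null: tangent is (1, 1, 0, 0), norm squared is 0.
print(minkowski_norm_squared(tangent(null, 2.0)))
```

Multiplying the timelike tangent by ##m## gives ##(E, \vec{p})## with ##E^2 - p^2 = m^2##, while the null tangent has zero norm whatever overall factor it is scaled by.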

Ahmed1029 · Nov 28, 2022

Wow, the geometric interpretation is much more powerful than the algebraic one and makes much more sense.

Orodruin · Nov 28, 2022

Ahmed1029 said:

Wow, the geometric interpretation is much more powerful than the algebraic one and makes much more sense.

This is true for most aspects of relativity.

Dale · Nov 28, 2022

Ahmed1029 said:

Wow, the geometric interpretation is much more powerful than the algebraic one and makes much more sense.

That is probably the single best take-away message that you could get from a typical relativity thread on this forum.

vanhees71 · Nov 29, 2022

Ahmed1029 said:

Wow that's a lot to consider! But if it's that complicated, why is it always assumed trivial to get the momentum of a photon in books of special relativity by treating them classically? Is it just a trick?

This is due to the fact that textbook writers tend to copy each other all the time. In QT it's particularly difficult to find a good introduction, and usually one uses a historical approach, starting with "old quantum theory", i.e., the ad-hoc ideas used from 1900 to 1925 to get an idea of how to describe quantum phenomena. That's why you always find the "naive photon picture", i.e., photons as massless classical particles, although such a thing is mathematically challenging at the very least, or the Bohr-Sommerfeld model for atoms, which only works for hydrogen, etc. The problem with modern quantum theory is that it's quite abstract, and it's hard to find an intuitive introduction without talking a bit about the historical development to heuristically introduce this abstract picture, i.e., Hilbert space, self-adjoint operators, pure and mixed states, and all that. An alternative approach is the use of symmetries as a kind of modern "correspondence principle", because from a mathematical point of view that's a common ground of classical and quantum physics, i.e., the same symmetry principles that determine the laws of classical physics also determine the laws of quantum theory.

In classical physics, instead of photons you have electromagnetic fields and the corresponding wave equation together with gauge invariance. The "particle-like features" of electromagnetic wave fields do not follow from a naive particle picture a la Newton but from the so-called eikonal approximation, which leads to an equation that's similar to the Hamilton-Jacobi partial differential equation of classical mechanics. The ingenious use of this principle in a kind of "reverse engineering" led Schrödinger to his non-relativistic wave equation. He asked which wave equation leads to the Hamilton-Jacobi equation when using an eikonal approximation and was led to his equation.

For more on using the eikonal approximation as a substitute for the "naive photon picture" in general relativity and cosmology, see
https://itp.uni-frankfurt.de/~hees/pf-faq/gr-edyn.pdf
