# Motivation for the usage of 4-vectors in special relativity

I recently had someone ask me why we use 4-vectors in special relativity and what is the motivation for introducing them in the first place. This is the response I gave:

From Einstein's postulates (i.e. 1. the principle of relativity: the laws of physics are identical (invariant) in all inertial frames of reference, and 2. the speed of light in vacuum is the same for all inertial observers, regardless of the motion of the observer or the source) we are naturally led to the Lorentz transformations, which relate the coordinates of one inertial reference frame to those of another. In doing so one finds that time is in fact a frame-dependent quantity and, furthermore, that spatial and temporal coordinates are mixed together under such transformations. This shows that space and time are not independent but should be considered as a 4-dimensional continuum, which we call spacetime.
We are then naturally led to consider 4-dimensional vectors, since these span the entire space and, furthermore, they transform under Lorentz transformations in such a way that the equations describing physical phenomena keep the same form in every inertial frame (Lorentz invariance), a requirement of Einstein's postulates. An additional argument for their usage is that in special relativity it is the spacetime interval that is an invariant quantity, and not the traditional Pythagorean line element as in classical mechanics. The lengths of 4-vectors (measured with the spacetime interval) are preserved under Lorentz transformations, whereas the lengths of 3-vectors are not, hence we should construct physical equations out of 4-vector quantities (and of course, in general, scalars and tensors).
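
A minimal numerical sketch of the last claim, assuming units with ##c=1## folded into the time coordinate and a boost along ##x## only. The function names `boost_x`, `interval2` and `length3_sq` are made up for this illustration. It compares the spacetime interval and the ordinary 3-length of one displacement before and after a boost:

```python
import math

def boost_x(event, beta):
    """Lorentz boost along x with speed beta = v/c; event = (ct, x, y, z)."""
    gamma = 1.0 / math.sqrt(1.0 - beta * beta)
    ct, x, y, z = event
    return (gamma * (ct - beta * x), gamma * (x - beta * ct), y, z)

def interval2(event):
    """Squared spacetime interval from the origin, signature (+, -, -, -)."""
    ct, x, y, z = event
    return ct * ct - x * x - y * y - z * z

def length3_sq(event):
    """Squared Euclidean length of the spatial part only."""
    _, x, y, z = event
    return x * x + y * y + z * z

event = (5.0, 3.0, 1.0, 2.0)
boosted = boost_x(event, 0.6)

# The interval agrees in both frames; the 3-length does not.
print("interval^2:", interval2(event), interval2(boosted))
print("3-length^2:", length3_sq(event), length3_sq(boosted))
```

The interval comes out the same in both frames (up to rounding), while the squared 3-length changes, which is the sense in which 4-vector lengths are preserved and 3-vector lengths are not.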

I'm now starting to doubt my own understanding a bit and I'm worried I may have conveyed incorrect information. Would someone be able to clarify whether what I've written is correct or not (and if it's not correct, explain why)?


stevendaryl
Staff Emeritus

Closely related yes. Apologies for that. I just wanted to check really that I've understood the concept of four-vectors correctly?!

PeterDonis
Mentor
2020 Award
I just wanted to check really that I've understood the concept of four-vectors correctly?
What you've written looks fine to me.

PAllen
What you've written looks fine to me.
Agreed.

vanhees71
Gold Member
Yes, I think too that this is a very good answer.

Perhaps one can add on a more advanced level that the symmetry analysis leading to the Lorentz transformations (as, e.g., given in Einstein's original paper of 1905) can be mathematically most elegantly formalized along the lines of Minkowski's analysis (1907) by introducing a "pseudo-scalar product" (i.e., a non-degenerate bilinear form) with signature (1,3) (or (3,1) if you prefer the east-coast convention) and identifying the Lorentz transformations as the pseudo-orthogonal group ##\text{O}(1,3)## (or the equivalent group ##\mathrm{O}(3,1)##). More precisely, the true symmetry group of special-relativistic spacetime realized in nature is the subgroup ##\text{SO}(1,3)^{\uparrow}##, the proper orthochronous Lorentz group, i.e. the subgroup that is continuously connected with the identity. It is only this subgroup because the weak interaction breaks parity P, time reversal T, as well as the "grand reflection" PT. So far no violation of CPT (grand reflection together with charge conjugation) has been found, in accordance with the corresponding CPT theorem, which follows from the assumption of a local microcausal quantum field theory as the correct formulation of relativistic quantum theory.
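
As a rough illustration of the group conditions (my own sketch, not part of the argument above): a matrix ##\Lambda## belongs to ##\text{O}(1,3)## if ##\Lambda^T \eta \Lambda = \eta## with ##\eta = \mathrm{diag}(1,-1,-1,-1)##, and to ##\text{SO}(1,3)^{\uparrow}## if in addition ##\det\Lambda = 1## and ##\Lambda^0{}_0 \geq 1##. All helper names below are hypothetical.

```python
import math

ETA = [[1, 0, 0, 0], [0, -1, 0, 0], [0, 0, -1, 0], [0, 0, 0, -1]]
PARITY = [[1, 0, 0, 0], [0, -1, 0, 0], [0, 0, -1, 0], [0, 0, 0, -1]]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def transpose(a):
    return [list(row) for row in zip(*a)]

def det(m):
    """Determinant by Laplace expansion along the first row."""
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** j * m[0][j] * det([r[:j] + r[j + 1:] for r in m[1:]])
               for j in range(len(m)))

def boost_x(beta):
    g = 1.0 / math.sqrt(1.0 - beta * beta)
    return [[g, -g * beta, 0, 0], [-g * beta, g, 0, 0],
            [0, 0, 1, 0], [0, 0, 0, 1]]

def rotation_z(theta):
    c, s = math.cos(theta), math.sin(theta)
    return [[1, 0, 0, 0], [0, c, -s, 0], [0, s, c, 0], [0, 0, 0, 1]]

def preserves_metric(lam, tol=1e-12):
    """True if lam^T eta lam equals eta, i.e. lam is in O(1,3)."""
    m = matmul(transpose(lam), matmul(ETA, lam))
    return all(abs(m[i][j] - ETA[i][j]) < tol
               for i in range(4) for j in range(4))

def is_proper_orthochronous(lam):
    return preserves_metric(lam) and det(lam) > 0 and lam[0][0] >= 1

lam = matmul(rotation_z(0.3), boost_x(0.6))
print(preserves_metric(lam), is_proper_orthochronous(lam))
print(preserves_metric(PARITY), is_proper_orthochronous(PARITY))
```

A boost followed by a rotation passes both checks, while the parity matrix preserves ##\eta## but has determinant ##-1##, so it lies in ##\text{O}(1,3)## but outside ##\text{SO}(1,3)^{\uparrow}##.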

stevendaryl
Staff Emeritus
Although SR proposes a much tighter connection between space and time than Newtonian physics, I would like to point out that the 4-D view can be useful in nonrelativistic physics, as well. For some examples:

1. If you have a scalar function $\phi(\vec{r},t)$, and you want to compute the rate of change of $\phi$ as experienced by a moving particle, you can succinctly write it as: $\dfrac{d \phi}{dt} = (\partial_\mu \phi)V^\mu$, where $\partial_0 \equiv \dfrac{\partial}{\partial t}$ and $\partial_j = \dfrac{\partial}{\partial x^j}$, and $V^0 = \dfrac{dt}{dt} = 1$ and $V^j = \dfrac{dx^j}{dt}$

2. Charge density $\rho$ and current density $j$ can be combined into a 4-vector $J^\mu$ where $J^0 = \rho$, and $J^i = j^i$. Then conservation of charge can be written as $\partial_\mu J^\mu = 0$

3. The g-forces due to using accelerated coordinates and the Coriolis and Centrifugal forces due to using rotating coordinates can both be seen as "connection coefficients": The path of a test particle in an accelerated, rotating or curvilinear coordinate system can be given by:

$\dfrac{d V^\mu}{dt} + \Gamma^\mu_{\nu \lambda} V^\nu V^\lambda = 0$

where $\Gamma^\mu_{\nu \lambda}$ are the connection coefficients characteristic of your coordinate system.

The use of 4-vectors $V^\mu$ and 4-covectors $\partial_\mu \phi$ makes sense in Newtonian physics as well as SR, since in both cases, there are 4 variables governing the dynamics: $x,y,z,t$. But what Newtonian physics lacks is a 4-D metric, which would allow you to convert between 4-vectors and 4-covectors.
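
Example 1 above can be checked numerically. This is only a sketch under assumptions of my own: the field `phi` and the trajectory `path` are arbitrary made-up choices, and all derivatives are estimated by central differences. It compares the rate of change of ##\phi## along the path with the contraction ##(\partial_\mu \phi) V^\mu##:

```python
import math

def phi(t, x, y, z):
    """Hypothetical scalar field, chosen only for illustration."""
    return math.sin(x - 2.0 * t) + y * z

def path(t):
    """Hypothetical particle trajectory, returned as (t, x, y, z)."""
    return (t, 0.5 * t * t, math.cos(t), 3.0 * t)

def grad4(f, point, h=1e-6):
    """Central-difference estimate of (d/dt, d/dx, d/dy, d/dz) f at point."""
    out = []
    for mu in range(4):
        up, dn = list(point), list(point)
        up[mu] += h
        dn[mu] -= h
        out.append((f(*up) - f(*dn)) / (2.0 * h))
    return out

def four_velocity(t, h=1e-6):
    """V^mu = dx^mu/dt along the path; V^0 should come out as 1."""
    return [(a - b) / (2.0 * h) for a, b in zip(path(t + h), path(t - h))]

t0, h = 1.2, 1e-6
# Left side: d(phi)/dt measured directly along the trajectory.
lhs = (phi(*path(t0 + h)) - phi(*path(t0 - h))) / (2.0 * h)
# Right side: the 4-covector contracted with the 4-velocity.
rhs = sum(d * v for d, v in zip(grad4(phi, path(t0)), four_velocity(t0)))
print(lhs, rhs)
```

The two numbers agree to within the finite-difference error, and no metric is needed anywhere: the contraction of a covector with a vector is defined without one.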

What you've written looks fine to me.
Thanks. It's reassuring to know that I've understood the concept correctly.

Perhaps one can add on a more advanced level that the symmetry analysis leading to the Lorentz transformations (as, e.g., given in Einstein's original paper of 1905) can be mathematically most elegantly formalized along the lines of Minkowski's analysis (1907) by introducing a "pseudo-scalar product" (i.e., a non-degenerate bilinear form)
In this approach, does one assume a 4D spacetime from the outset and require that the "pseudo-scalar product" be invariant under an appropriate set of linear transformations (those that preserve the speed of light under coordinate transformations), i.e. the Lorentz transformations?

Although SR proposes a much tighter connection between space and time than Newtonian physics, I would like to point out that the 4-D view can be useful in nonrelativistic physics, as well
In Newtonian physics, time is absolute and thus completely independent from space and so isn't it more of a parameter? By this I mean that one can construct a 3 dimensional space in which a set of particles are located and then the physical trajectories of the particles can then be parametrised by time.

I watched a video lecture of Prof. Shankar's introductory lectures on special relativity, in which he mentions that the reason why time is considered as a coordinate (and as such a 4th dimension) is exactly because it transforms non-trivially under Lorentz transformations, and its value in the transformed frame is a linear combination of spatial and temporal coordinates relative to the untransformed frame (analogous to how the coordinates in a rotated coordinate frame are linear combinations of the coordinates in the unrotated frame). To me this suggests that time is considered as a parameter in Newtonian mechanics, but maybe I have misunderstood something here?

PeterDonis
Mentor
2020 Award
To me this suggests that time is considered as a parameter in Newtonian mechanics
Not necessarily; it can also suggest that Newtonian mechanics uses a different transformation, the Galilean transformation, which, for the simple case of a boost in the ##x## direction, is ##x' = x - vt##, ##t' = t##. That is, the time coordinate is unchanged by the Galilean transformation, so every event can be labeled with a "time" that is an invariant, unchanged by a change of frame. You can consider this invariant as a parameter, or you can consider it as a fourth coordinate that just stays the same when you change frames. Either approach works.

What you can't do in Newtonian mechanics is define an invariant "spacetime interval" between two events; you can only define an invariant "spatial distance" between two events that both have the same time coordinate.
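
A small sketch of this point, assuming a Galilean boost along ##x## only; the events `a`, `b`, `c` and the helper names are invented for the example:

```python
import math

def galilean_boost(event, v):
    """Boost along x with speed v: t' = t, x' = x - v t."""
    t, x, y, z = event
    return (t, x - v * t, y, z)

def spatial_distance(e1, e2):
    """Euclidean distance between the spatial parts of two events."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(e1[1:], e2[1:])))

a = (0.0, 0.0, 0.0, 0.0)
b = (0.0, 3.0, 4.0, 0.0)  # simultaneous with a
c = (2.0, 3.0, 4.0, 0.0)  # two time units after a

v = 1.0
a2, b2, c2 = (galilean_boost(e, v) for e in (a, b, c))

# Time differences never change under the boost.
print(c[0] - a[0], c2[0] - a2[0])
# Spatial distance between simultaneous events is unchanged ...
print(spatial_distance(a, b), spatial_distance(a2, b2))
# ... but between non-simultaneous events it depends on the frame.
print(spatial_distance(a, c), spatial_distance(a2, c2))
```

So the time difference is invariant for any pair of events, and the spatial distance is invariant only for the simultaneous pair.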

What you can't do in Newtonian mechanics is define an invariant "spacetime interval" between two events; you can only define an invariant "spatial distance" between two events that both have the same time coordinate.
Is this what distinguishes the two then? I think what confuses me slightly is why spacetime isn't used in Newtonian mechanics (at least in the way I was taught and in the textbooks that I've read)?! Is the point that in Newtonian mechanics the two, space and time, are completely independent, and can therefore be studied in isolation, whereas in special relativity they become inextricably linked, since the Lorentz transformation mixes up spatial and temporal coordinates, and so one can no longer think of space and time as separate entities, but necessarily as part of a single 4-dimensional entity, spacetime?

PeterDonis
Mentor
2020 Award
Is this what distinguishes the two then?
It's part of what distinguishes the two.

I think what confuses me slightly is why spacetime isn't used in Newtonian mechanics
It isn't in most treatments, because the "spacetime" view of Newtonian mechanics (look up "Newton-Cartan theory" to get more info) is harder to use and doesn't add any predictive power to standard Newtonian mechanics.

It isn't in most treatments, because the "spacetime" view of Newtonian mechanics (look up "Newton-Cartan theory" to get more info) is harder to use and doesn't add any predictive power to standard Newtonian mechanics.
Ah ok. I think the thing I find difficult is to justify why we mainly consider just 3D space in Newtonian mechanics and time only plays a role when we consider the evolution of systems, or the kinematics, or dynamics, etc.?!
I think I've understood the reasoning for why space and time becomes spacetime in special relativity (and the two must be dealt with "hand-in-hand"), right?! But I'm unsure now how to argue in the Newtonian case why it is justifiable to separate the two completely and why time (at least in some sense) can be thought of as a parameter, and is not thought of as a 4th dimension?

Sorry if I'm being a bit thick here, I just seem to be having a mental block on the subject.

stevendaryl
Staff Emeritus
In Newtonian physics, time is absolute and thus completely independent from space and so isn't it more of a parameter? By this I mean that one can construct a 3 dimensional space in which a set of particles are located and then the physical trajectories of the particles can then be parametrised by time
But mathematically, functions can depend on 4 such parameters: $x,y,z,t$. And similarly, a particle follows a trajectory in which there is a corresponding "velocity" with respect to all 4 coordinates. The fact that time is absolute in Newtonian physics just amounts to a constraint, that $\frac{dx^0}{dt} = 1$. Or another way of saying it is that for Newtonian physics, we're only interested in coordinate transformations in which $(x^0)' = x^0$.

I watched a video lecture of Prof. Shankar's introductory lectures on special relativity, in which he mentions that the reason why time is considered as a coordinate (and as such a 4th dimension) is exactly because it transforms non-trivially under Lorentz transformations, and its value in the transformed frame is a linear combination of spatial and temporal coordinates relative to the untransformed frame (analogous to how the coordinates in a rotated coordinate frame are linear combinations of the coordinates in the unrotated frame). To me this suggests that time is considered as a parameter in Newtonian mechanics, but maybe I have misunderstood something here?
A galilean transformation mixes up time and space, as well: $x' = x - v t$. So even in nonrelativistic physics, if you want to consider changes of reference frame, then you need to consider time as a coordinate in order for quantities to transform like vectors under a galilean transform. To say that a quantity transforms as a vector under a coordinate change is to say that $(V^\mu)' = \frac{\partial (x^\mu)'}{\partial x^\nu} V^\nu$. That isn't true of 3-vectors under galilean transformations, but it is true of 4-vectors.
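
To make this concrete, here is a sketch in 1+1 dimensions (one time and one space coordinate, my simplification). For ##t' = t##, ##x' = x - vt## the Jacobian ##\partial (x^\mu)'/\partial x^\nu## is the matrix with rows ##(1, 0)## and ##(-v, 1)##. The names below are invented for the illustration.

```python
def galilean_jacobian(v):
    """Jacobian d(t', x')/d(t, x) of the boost t' = t, x' = x - v t."""
    return [[1.0, 0.0],
            [-v, 1.0]]

def apply(matrix, vector):
    return [sum(m * c for m, c in zip(row, vector)) for row in matrix]

u = 5.0  # particle velocity dx/dt in the original frame
v = 2.0  # boost velocity

# 4-vector style: V = (dt/dt, dx/dt) = (1, u), transformed with the full Jacobian.
V = [1.0, u]
V_prime = apply(galilean_jacobian(v), V)

# 3-vector style: only the spatial block dx'/dx = 1 acts on u.
spatial_only = galilean_jacobian(v)[1][1] * u

true_velocity = u - v  # what the boosted observer actually measures
print(V_prime, spatial_only, true_velocity)
```

The full Jacobian acting on ##(1, u)## reproduces the measured velocity ##u - v##, while the spatial block alone leaves ##u## unchanged, which is the sense in which velocity fails to transform as a 3-vector under a boost.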

A galilean transformation mixes up time and space, as well: $x' = x - v t$. So even in nonrelativistic physics, if you want to consider changes of reference frame, then you need to consider time as a coordinate in order for quantities to transform like vectors under a galilean transform
I think the point he was trying to get at was that ##t'=t##, i.e. the time coordinate is completely independent of the spatial coordinates.

To say that a quantity transforms as a vector under a coordinate change is to say that $(V^\mu)' = \frac{\partial (x^\mu)'}{\partial x^\nu} V^\nu$.
Is this why we don't use spacetime in constructions of Newtonian mechanics in which we use 3-vectors to describe physical phenomena then, since they don't transform as vectors under (linear) spacetime coordinate transformations?

That isn't true of 3-vectors under galilean transformations, but it is true of 4-vectors.
Is this another reason why we need to use 4-vectors in special relativity, because 3-vectors don't transform as vectors under Lorentz transformations?

vanhees71
Gold Member
In this approach, does one assume a 4D spacetime from the outset and require that the "pseudo-scalar product" be invariant under an appropriate set of linear transformations (those that preserve the speed of light under coordinate transformations), i.e. the Lorentz transformations?
Well, that's how I introduce the spacetime concept of special relativity, starting with the motivation from the observation that the speed of light is independent of the observer's motion or the source's motion in any (inertial) reference frame (invoking some experimental facts like Michelson+Morley) and using Einstein's two postulates. But then I don't use the lengthy calculation as in Einstein's paper (which, however, everybody should carefully read and understand, because it's a very clear derivation and exhibits the physics very clearly) but just tell the students that Minkowski had the brilliant insight that all of this can be described with a 4D pseudo-Euclidean affine manifold called spacetime. Then you have the Lorentz transformations nearly for free (at least the special ones, i.e., boosts in the direction of or rotation around one spatial axis).

The most convincing approach from a group-theoretical symmetry point of view is to just start from Galileo's principle of inertia, i.e., the existence of a preferred class of reference frames, the inertial frames, the assumption that any inertial observer finds a Euclidean space as his space (including homogeneity and isotropy and implying the rotation group SO(3) as a subgroup of the space-time symmetry group) and homogeneity in time. Then you can look for the symmetry group of possible space-times. The quite surprising result is that there are just two structures left, namely either the Galilei-Newton spacetime or the Einstein-Minkowski spacetime (special relativity).

stevendaryl
Staff Emeritus
Is this why we don't use spacetime in constructions of Newtonian mechanics in which we use 3-vectors to describe physical phenomena then, since they don't transform as vectors under (linear) spacetime coordinate transformations?
Yes, that's right. But my point is that "relativity" in the sense of "the laws of physics are the same in any inertial frame" is as true in Newtonian physics as in Special Relativity, but you can't see that symmetry as a vector transformation unless you use 4-vectors, rather than 3-vectors.

Is this another reason why we need to use 4-vectors in special relativity, because 3-vectors don't transform as vectors under Lorentz transformations?
Right, but 3-vectors don't transform as vectors under Galilean transformations, either. At least some important 3-vectors (such as momentum and velocity).

vanhees71
Gold Member
Well, you can also easily bring the homogeneous part of the Galilei transformation into a 4D-vector notation. This, however, reveals that it is not a (pseudo-)orthogonal transformation, and thus in Newtonian spacetime it doesn't make sense to introduce a pseudo-scalar product as in SRT.

The homogeneous Galilei transformation (applied to time and spatial coordinates wrt. a Cartesian basis) reads
$$t'=t, \quad \vec{x}'=\hat{R} \vec{x}-\vec{v} t,$$
where ##\hat{R} \in \text{SO}(3)## and ##\vec{v} \in \mathbb{R}^3##. This can be written as a matrix acting on a "four vector"
$$\begin{pmatrix} t' \\ \vec{x}' \end{pmatrix} = \hat{\Lambda}(\hat{R},\vec{v}) \begin{pmatrix} t \\ \vec{x} \end{pmatrix}, \quad \hat{\Lambda}(\hat{R},\vec{v}) = \begin{pmatrix} 1 & 0 \\ -\vec{v} & \hat{R}\end{pmatrix}.$$
You get also the group multiplication rule by just multiplying these Galilei-transformation matrices
$$\hat{\Lambda}(\hat{R}_2,\vec{v}_2) \hat{\Lambda}(\hat{R}_1,\vec{v}_1) = \hat{\Lambda}(\hat{R}_2 \hat{R}_1,\hat{R}_2 \vec{v}_1+\vec{v}_2).$$
Admittedly, it doesn't help much as a calculational tool and thus usually one doesn't treat the Galilei transformations in this way.
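
For anyone who wants to see the multiplication rule at work, here is a quick numerical check. It is only a sketch: the rotations are taken about the ##z## axis and the angles and velocities are arbitrary test values of my own choosing.

```python
import math

def rot_z(theta):
    """3x3 rotation about the z axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def matvec(m, vec):
    return [sum(x * y for x, y in zip(row, vec)) for row in m]

def galilei(rot, v):
    """4x4 matrix with block form [[1, 0], [-v, R]] acting on (t, x, y, z)."""
    return [[1.0, 0.0, 0.0, 0.0]] + [[-v[i]] + list(rot[i]) for i in range(3)]

R1, v1 = rot_z(0.4), [1.0, 0.0, 2.0]
R2, v2 = rot_z(-1.1), [0.5, -3.0, 0.0]

# Left side: product of the two transformation matrices.
lhs = matmul(galilei(R2, v2), galilei(R1, v1))
# Right side: single transformation with composed rotation and velocity.
v21 = [p + q for p, q in zip(matvec(R2, v1), v2)]
rhs = galilei(matmul(R2, R1), v21)

max_diff = max(abs(lhs[i][j] - rhs[i][j]) for i in range(4) for j in range(4))
print(max_diff)
```

The largest entrywise difference is at the level of floating-point rounding, consistent with the composition rule.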

Well, that's how I introduce the spacetime concept of special relativity, starting with the motivation from the observation that the speed of light is independent of the observer's motion or the source's motion in any (inertial) reference frame (invoking some experimental facts like Michelson+Morley) and using Einstein's two postulates. But then I don't use the lengthy calculation as in Einstein's paper (which, however, everybody should carefully read and understand, because it's a very clear derivation and exhibits the physics very clearly) but just tell the students that Minkowski had the brilliant insight that all of this can be described with a 4D pseudo-Euclidean affine manifold called spacetime. Then you have the Lorentz transformations nearly for free (at least the special ones, i.e., boosts in the direction of or rotation around one spatial axis).
Is this done by requiring that, given two inertial frames ##S## and ##S'##, a beam of light traverses the same distance relative to both frames and, furthermore, that it should propagate outwards from a source equally in all directions (required by the 2nd postulate of special relativity)? This requirement is given mathematically by $$ds^{2}=c^{2}dt^{2}-dx^{2}-dy^{2}-dz^{2}=0=c^{2}dt'^{2}-dx'^{2}-dy'^{2}-dz'^{2}=ds'^{2}$$ (where we do not assume, a priori, that ##t=t'##), i.e. light should propagate out from a source like the surface of a sphere and so the distance travelled by the light is equal to the radius of the corresponding sphere. Now, by transitivity of equality, it follows that $$c^{2}dt^{2}-dx^{2}-dy^{2}-dz^{2}=c^{2}dt'^{2}-dx'^{2}-dy'^{2}-dz'^{2}$$ The line element is thus invariant in the case where it vanishes. Given that, in this limiting case, we require that ##ds^{2}=0=ds'^{2}##, then in general we must have that $$ds'^{2}=F(t,x,y,z)ds^{2}$$ However, by the relativity principle we should have that $$ds^{2}=F(t,x,y,z)ds'^{2}$$ since otherwise one would be able to distinguish between the two frames. As such, we have that $$F(t,x,y,z)=\pm 1$$ but since in the case where the two frames are at rest relative to one another we must have that ##c^{2}dt'^{2}=c^{2}dt^{2}##, we must choose ##F(t,x,y,z)=1##. Thus in general, $$ds^{2}=c^{2}dt^{2}-dx^{2}-dy^{2}-dz^{2}$$ Hence it is this "pseudo-Euclidean" line element that we require to be invariant when transforming from one inertial frame to another, since this ensures that the speed of light is independent of the observer. The fact that both spatial and temporal coordinates combine to form an invariant quantity, i.e. the line element above, suggests that we should consider a 4-dimensional space, which we call "spacetime", since this line element then naturally becomes the metric describing the geometry of such a spacetime.
Furthermore, the fact that the line element is constructed from both spatial and temporal coordinates suggests that they are coordinates of a single 4-dimensional spacetime, i.e. space and time form a single continuum and not two separate, independent continua as was the case in Newtonian mechanics.

Would this be a correct understanding at all?

Also, is it perfectly valid to introduce the notion of spacetime in the way I did in my original post too?

Is it also correct to say that Einstein's second postulate implies homogeneity and isotropy of space and homogeneity in time (since otherwise the speed of light would not be constant and observer independent)?!

Right, but 3-vectors don't transform as vectors under Galilean transformations, either. At least some important 3-vectors (such as momentum and velocity).
How does one show that 3-vectors do not transform as vectors, both under Galilean and Lorentz transformations?
(In what way don't momentum and velocity transform as vectors?)

Can't one show that Newton's laws are invariant under Galilean transformations though? Since, $$\mathbf{x}'=\mathbf{x}-\mathbf{v}t\Rightarrow \dot{\mathbf{x}}'=\dot{\mathbf{x}}-\mathbf{v}\Rightarrow\ddot{\mathbf{x}}'=\ddot{\mathbf{x}}$$ (where I have used that ##\mathbf{v}## is a constant velocity). Hence ##\mathbf{a}'=\mathbf{a}##, such that ##\mathbf{F}'=m\mathbf{a}'=m\mathbf{a}=\mathbf{F}##

What you can't do in Newtonian mechanics is define an invariant "spacetime interval" between two events; you can only define an invariant "spatial distance" between two events that both have the same time coordinate
Is this an important reason why time is treated independently of space, because one can only define an invariant interval under Galilean transformations in 3D space (and not spacetime) and even then, only when both events are simultaneous?!

PeterDonis
Mentor
2020 Award
Is this an important reason why time is treated independently of space
In Newtonian physics, yes.

robphy
Homework Helper
Gold Member
What you can't do in Newtonian mechanics is define an invariant "spacetime interval" between two events; you can only define an invariant "spatial distance" between two events that both have the same time coordinate.
The Galilean line element is ##ds^2=dt^2##... So the Galilean spacetime square interval between any two events is just the square of the time-difference between those events. This is Galilean invariant. (Tensorially, this uses ##t_a t_b##... Following http://www.socsci.uci.edu/~dmalamen/bio/GR.pdf page 249)

The spatial distance on a constant time slice is defined using ##h^{ab}##, which acts like a metric on the constant-time slice.

Admittedly both ##t_a t_b## and ##h^{ab}## are degenerate... But that doesn't prevent Galilean invariant quantities from being defined. However, more care is needed, as Malament notes.

vanhees71
Gold Member
The proper orthochronous subgroup does not only act on timelike vectors or intervals. It acts on all vectors or intervals. The fact that it excludes transformations that act discretely (parity inversion and time reversal) does not mean it only acts on timelike vectors or timelike intervals. The concept of "spacelike separated events" makes perfect sense with reference to this subgroup of transformations; spacelike intervals are transformed to spacelike intervals (with the same invariant spacelike length); timelike intervals are transformed to timelike intervals (with the same invariant timelike length); and null intervals are transformed to null intervals. There is no issue at all.
Right, but for the orthochronous, and only the orthochronous, transformations the time ordering of non-spacelike separated events (i.e., timelike or lightlike separated events) stays the same under the transformation. Thus two events can only be causally connected (i.e., the "earlier" event can be the cause of the "later" event) if they are non-spacelike separated. That's the fundamental reason for the strict "speed limit" in SRT, i.e., no signal can go faster than the limiting speed (which according to present empirical evidence is indeed the speed of light).
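
The statement about time ordering can be illustrated with a short sketch. It assumes units with ##c=1##, boosts along ##x## only, and two example separations that I picked by hand:

```python
import math

def boosted_dt(dt, dx, beta):
    """Time separation after a boost along x with speed beta (c = 1)."""
    gamma = 1.0 / math.sqrt(1.0 - beta * beta)
    return gamma * (dt - beta * dx)

# Sample boost speeds from -0.9 to 0.9 in steps of 0.1.
BETAS = [b / 10.0 for b in range(-9, 10)]

timelike = (2.0, 1.0)   # dt > |dx|
spacelike = (1.0, 2.0)  # dt < |dx|

# Does the later event stay later in every sampled frame?
timelike_order_kept = all(boosted_dt(*timelike, b) > 0 for b in BETAS)
spacelike_order_kept = all(boosted_dt(*spacelike, b) > 0 for b in BETAS)
print(timelike_order_kept, spacelike_order_kept)  # prints: True False
```

For the timelike pair the sign of the time separation never flips, whereas for the spacelike pair any boost with ##\beta > 0.5## reverses the order, so such a pair cannot be related as cause and effect.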

vanhees71
Gold Member
Is this done by requiring that, given two inertial frames ##S## and ##S'##, a beam of light traverses the same distance relative to both frames and, furthermore, that it should propagate outwards from a source equally in all directions (required by the 2nd postulate of special relativity)? This requirement is given mathematically by $$ds^{2}=c^{2}dt^{2}-dx^{2}-dy^{2}-dz^{2}=0=c^{2}dt'^{2}-dx'^{2}-dy'^{2}-dz'^{2}=ds'^{2}$$ (where we do not assume, a priori, that ##t=t'##), i.e. light should propagate out from a source like the surface of a sphere and so the distance travelled by the light is equal to the radius of the corresponding sphere. Now, by transitivity of equality, it follows that $$c^{2}dt^{2}-dx^{2}-dy^{2}-dz^{2}=c^{2}dt'^{2}-dx'^{2}-dy'^{2}-dz'^{2}$$ The line element is thus invariant in the case where it vanishes. Given that, in this limiting case, we require that ##ds^{2}=0=ds'^{2}##, then in general we must have that $$ds'^{2}=F(t,x,y,z)ds^{2}$$ However, by the relativity principle we should have that $$ds^{2}=F(t,x,y,z)ds'^{2}$$ since otherwise one would be able to distinguish between the two frames. As such, we have that $$F(t,x,y,z)=\pm 1$$ but since in the case where the two frames are at rest relative to one another we must have that ##c^{2}dt'^{2}=c^{2}dt^{2}##, we must choose ##F(t,x,y,z)=1##. Thus in general, $$ds^{2}=c^{2}dt^{2}-dx^{2}-dy^{2}-dz^{2}$$ Hence it is this "pseudo-Euclidean" line element that we require to be invariant when transforming from one inertial frame to another, since this ensures that the speed of light is independent of the observer. The fact that both spatial and temporal coordinates combine to form an invariant quantity, i.e. the line element above, suggests that we should consider a 4-dimensional space, which we call "spacetime", since this line element then naturally becomes the metric describing the geometry of such a spacetime.
Furthermore, the fact that the line element is constructed from both spatial and temporal coordinates suggests that they are coordinates of a single 4-dimensional spacetime, i.e. space and time form a single continuum and not two separate, independent continua as was the case in Newtonian mechanics.

Would this be a correct understanding at all?

Also, is it perfectly valid to introduce the notion of spacetime in the way I did in my original post too?

Is it also correct to say that Einstein's second postulate implies homogeneity and isotropy of space and homogeneity in time (since otherwise the speed of light would not be constant and observer independent)?!
Yes, I think that's a very good motivation (if not a proof). I like particularly the discussion about the overall factor ##F(t,x,y,z)##.

Of course, if you claim the invariance of ##\mathrm{d}s^2## only for ##\mathrm{d}s^2=0##, i.e., tangent vectors of lightlike world lines you get a larger group than the Lorentz group, including conformal symmetry. It should be the symmetry group of free electromagnetic fields or any classical relativistic field theory involving only massless fields. In QFT, of course, conformal symmetry is pretty fragile, because in the interacting case it's usually anomalously broken through quantization of the classical theory.
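
The simplest example of such an extra transformation is a dilation ##x^\mu \to \lambda x^\mu##. The sketch below is only an illustration with made-up displacements: it shows that a dilation keeps null displacements null but rescales every other interval by ##\lambda^2##, so it preserves the light cone without being a Lorentz transformation.

```python
def interval2(event):
    """Squared interval with signature (+, -, -, -); event = (ct, x, y, z)."""
    ct, x, y, z = event
    return ct * ct - x * x - y * y - z * z

def dilate(event, lam):
    """Uniform rescaling of all four coordinates by lam."""
    return tuple(lam * c for c in event)

null = (5.0, 3.0, 4.0, 0.0)      # 25 - 9 - 16 = 0
timelike = (2.0, 1.0, 0.0, 0.0)  # 4 - 1 = 3
lam = 3.0

print(interval2(null), interval2(dilate(null, lam)))
print(interval2(timelike), interval2(dilate(timelike, lam)))
```

The null displacement keeps a vanishing interval, while the timelike one goes from 3 to 27, i.e. it picks up the factor ##\lambda^2 = 9##.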