# Minkowski Space Metric

## Main Question or Discussion Point

I've never seen a satisfactory explanation of the metrics used in a calculation of distance in Minkowski space. In Euclidean space, the distance is:
ds^2 = dx^2 + dy^2 + dz^2
But in Minkowski space, the distance is
ds^2 = (dt * c)^2 - dx^2 - dy^2 - dz^2
Why are the signs reversed? This implies that space (or time depending on your convention) is imaginary.

Dale
Mentor
That is one way to look at it, but it faded into disuse quite some time ago. Now, the usual approach is not to consider the time coordinate to be imaginary, but to consider the minus sign to be in the metric. So (in units where c=1):

##ds^2 = g_{\mu\nu} dx^{\mu} dx^{\nu} = -dt^2 + dx^2 + dy^2 + dz^2##

This can, as you suggested, be achieved by ##dx = (i~dt,dx,dy,dz)## and

##g = \left(
\begin{array}{cccc}
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1 \\
\end{array}
\right)##

But it can also be achieved by ##dx = (dt,dx,dy,dz)## and

##g = \left(
\begin{array}{cccc}
-1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1 \\
\end{array}
\right)##

The usual modern approach is the latter
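
As a quick numerical check that the two bookkeeping schemes above give the same ##ds^2##, here is a minimal Python sketch. The function names and the sample displacement are illustrative assumptions, not anything standard, and units with c=1 are used.

```python
# Two bookkeeping schemes for the same interval (units where c = 1).
# Function names and sample numbers are illustrative only.

def interval_imaginary_time(dt, dx, dy, dz):
    """Identity metric, with the time component written as i*dt."""
    components = [1j * dt, dx, dy, dz]
    # Plain sum of squares (no complex conjugation); the i squares to -1.
    return sum(c * c for c in components)

def interval_metric(dt, dx, dy, dz):
    """Real components, with the minus sign carried by g = diag(-1, 1, 1, 1)."""
    g_diag = [-1, 1, 1, 1]
    components = [dt, dx, dy, dz]
    return sum(g_mm * c * c for g_mm, c in zip(g_diag, components))

ds2_a = interval_imaginary_time(2.0, 1.0, 0.5, 0.0)
ds2_b = interval_metric(2.0, 1.0, 0.5, 0.0)
print(ds2_a, ds2_b)
```

The first result is a complex number whose imaginary part vanishes; its real part agrees with the second result.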

Fredrik
Staff Emeritus
Gold Member
The ##dx^\mu## notation is from differential geometry. In the context of SR, we can talk about matrices instead. The Euclidean inner product (i.e. the dot product) on the space of 4×1 matrices is given by ##\langle x,y\rangle=x^Ty##. If you insist on using this formula in SR, you have to make some components of x and y imaginary. A nicer way is to modify the definition to ##\langle x,y\rangle =x^Tg y##, where g is defined in DaleSpam's post.
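
A small sketch of the modified inner product ##x^Tg y##, written with plain Python lists rather than a matrix library. The name `inner` and the sample vectors are assumptions made for illustration; the metric is the (-+++) one from the post above, with c=1.

```python
# Minkowski inner product <x, y> = x^T g y for 4-component column vectors.
# Signature (-+++), units where c = 1. Names and numbers are illustrative.

G = [
    [-1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
]

def inner(x, y, g=G):
    """Return x^T g y, summing over both indices explicitly."""
    return sum(x[m] * g[m][n] * y[n] for m in range(4) for n in range(4))

x = [3.0, 1.0, 2.0, 0.0]
y = [1.0, 0.0, 4.0, 2.0]

print(inner(x, y), inner(y, x))  # symmetric, because g is symmetric
print(inner(x, x))               # negative here, so x is timelike in this signature
```

Replacing `G` with the identity matrix recovers the ordinary dot product, which is the only change between the Euclidean and Minkowski cases.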

Nugatory
Mentor
Why are the signs reversed? This implies that space (or time depending on your convention) is imaginary.
The different sign on the ##t## coordinate means that the Minkowski metric describes a space-time in which the distance between points on the line corresponding to the path of a light beam is zero. Experiments confirm that this model accurately describes the universe that we live in, so that's the model that we use. Thus, your "Why?" question comes down to "Why is the universe built this way and not some other way?" - and science isn't going to give you a satisfactory answer to that question.

As DaleSpam points out above, the modern style of moving the sign difference into the metric tensor reduces the embarrassing appearance of "imaginary" (better to say "complex" instead) numbers in the formulas. The older style, in which sooner or later you find yourself treating ##ict## (with ##i=\sqrt{-1}##) as a coordinate, was used mostly because it made the Lorentz transformations look like the already familiar problem of rotating the coordinate axes in Euclidean space. That helped people who were familiar with the mathematical underpinnings of classical mechanics make the jump to special relativity (it's worth noting that Goldstein introduces relativistic mechanics this way) but it's something you'll have to unlearn when you move on to general relativity.

Chalnoth
Also, whether to put the minus sign on the time coordinate or the spatial coordinates is a matter of convention, and both conventions are in wide use.

ghwellsjr
Gold Member
I've never seen a satisfactory explanation of the metrics used in a calculation of distance in Minkowski space. In Euclidean space, the distance is:
ds^2 = dx^2 + dy^2 + dz^2
But in Minkowski space, the distance is
ds^2 = (dt * c)^2 - dx^2 - dy^2 - dz^2
Why are the signs reversed? This implies that space (or time depending on your convention) is imaginary.
Since you have the option of using either convention, when you are actually calculating the distance between two events and it comes out imaginary in one convention, you can switch to the other convention. Then you can think of the distance as being either an actual spatial distance or an actual time interval, depending on which convention gives a positive value under the square root. In the first case, it can be measured with a ruler at rest in an Inertial Reference Frame where the two events occur at the same time; in the second case, it can be measured with an inertial clock that is present at both events. Another commonly used term for this distance is the Spacetime Interval, which in the first case is called "Spacelike" and in the second case is called "Timelike". If the Spacetime Interval evaluates to zero, it cannot be measured with either a ruler or a clock, and it is called "Null".
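
The case distinction can be sketched in a few lines of Python. This is only an illustration: the function names, the tolerance `tol`, and the choice to return the timelike result as c times the clock reading are assumptions, and the (+---) convention of the original question is used.

```python
# Classify the separation between two events and return the measurable size.
# Convention: s^2 = (c*dt)^2 - dx^2 - dy^2 - dz^2, as in the original question.

def interval_squared(dt, dx, dy, dz, c=1.0):
    return (c * dt) ** 2 - dx ** 2 - dy ** 2 - dz ** 2

def classify(dt, dx, dy, dz, c=1.0, tol=1e-12):
    s2 = interval_squared(dt, dx, dy, dz, c)
    if abs(s2) <= tol:
        return "null", 0.0                # neither ruler nor clock applies
    if s2 > 0:
        return "timelike", s2 ** 0.5      # c times the inertial clock reading
    return "spacelike", (-s2) ** 0.5      # ruler length in the simultaneous frame

print(classify(5.0, 3.0, 0.0, 0.0))
print(classify(3.0, 5.0, 0.0, 0.0))
print(classify(1.0, 1.0, 0.0, 0.0))
```

Taking the square root of `-s2` in the spacelike branch is the code equivalent of switching to the other sign convention.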

Chalnoth
The different sign on the ##t## coordinate means that the Minkowski metric describes a space-time in which the distance between points on the line corresponding to the path of a light beam is zero. Experiments confirm that this model accurately describes the universe that we live in, so that's the model that we use. Thus, your "Why?" question comes down to "Why is the universe built this way and not some other way?" - and science isn't going to give you a satisfactory answer to that question.
I think we can do a bit better than that.

For example, one way to understand why we formulate the dimensions in this way, with the time coordinate taking the opposite sign of the spatial coordinates, is that when we write down distances in this way, the shortest path something will take between two events is the path it actually takes. This isn't terribly useful in just Minkowski space, as this just means that things travel in straight lines. But in General Relativity this can be used to compute the paths of orbits, or of light rays being deflected by a gravitational field, or anything else you might care to estimate the path of. Everything takes the shortest distance when you use the metric, which has different signs between the spatial and time components, as the measure of distance. With the added constraints that light always takes a path which has space-time distance equal to zero and objects with mass cannot travel space-like distances (with the metric convention RCopernicus used, s^2 must be greater than zero).

pervect
Staff Emeritus
I've never seen a satisfactory explanation of the metrics used in a calculation of distance in Minkowski space. In Euclidean space, the distance is:
ds^2 = dx^2 + dy^2 + dz^2
But in Minkowski space, the distance is
ds^2 = (dt * c)^2 - dx^2 - dy^2 - dz^2
Why are the signs reversed? This implies that space (or time depending on your convention) is imaginary.
The name "distance" may be confusing you. In special relativity, distance is observer dependent, due to Lorentz contraction. What is independent of the observer, and hence an invariant, is the Lorentz interval.

We sometimes, especially in analogies, refer to space-like Lorentz intervals as distances, or call them "proper distances". But the Lorentz interval is still a separate concept. Its distinguishing feature is that it's the same for all observers, and the formula (with its minus sign) calculates this quantity that is the same for all observers. Without the minus sign, the quantity we calculate would not be the same for all observers, and hence would not be of as much interest.

We'll get back to the similarities of the Lorentz interval with Euclidean distance later, but for now it's important to recognize that they are different things, before we point out their underlying similarity.

Note that the Lorentz interval being equal to zero is equivalent to a lightlike separation between a pair of points, and vice versa. So, saying that a zero Lorentz interval is the same for all observers is equivalent to saying that the geometry of space-time is such that a lightlike separation between points is independent of the observer. If this sounds like it's on the right track, good! If it seems a bit vague, read on.

The more formal justification of the Lorentz interval follows from the Lorentz transform itself. You can verify mathematically that a consequence of the Lorentz transform is that it leaves the Lorentz interval unchanged. The Lorentz transformations don't leave distances unchanged, nor do they leave times unchanged. Apart from functions of it, the only scalar quantity built from the coordinate differences between two events that the Lorentz transforms leave unchanged is the Lorentz interval.

There are several ways of motivating the Lorentz transforms: you can use Einstein's original approach, or my favorite, the k-calculus approach due to Bondi. But the point is that you start out with the axioms of relativity, including that the speed of light is the same for all observers, plus whatever auxiliary assumptions your particular approach to special relativity needs (isotropy is a common one), and at the end you wind up with the Lorentz transform. I can't really get more specific than that in a short post; I will just suggest that if you don't understand how the Lorentz transformations came about, and my explanation is too brief, there is a lot of literature out there you can read to fill in the gaps. After you've derived this transform, you notice an interesting property it has: it leaves this quantity that we call the Lorentz interval unchanged.

If you compare this to Euclidean geometry, the invariance of the Lorentz interval under Lorentz transformations is similar to the invariance of distance under rotations in Euclidean space. So the Lorentz interval is a bit like the concept that distance used to be in Euclidean space, because it's independent of the observer.

So you have this useful analogy between the transforms induced by changes in velocity (called Lorentz boosts) and rotations in standard Euclidean space. They both leave something underlying unchanged. In the case of Euclidean space, the important thing that is unchanged by a rotation is Euclidean distance. In the case of Minkowski space, the important thing that is unchanged by a boost is the Lorentz interval.
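
The analogy can be checked numerically. The following Python sketch uses the standard boost along x in units where c=1; the function names and sample numbers are made up for illustration, and only one space dimension is carried through the boost.

```python
import math

def boost_x(t, x, v):
    """Lorentz boost along x with velocity v (units where c = 1, |v| < 1)."""
    gamma = 1.0 / math.sqrt(1.0 - v * v)
    return gamma * (t - v * x), gamma * (x - v * t)

def rotate_xy(x, y, angle):
    """Ordinary rotation in the x-y plane."""
    return (x * math.cos(angle) - y * math.sin(angle),
            x * math.sin(angle) + y * math.cos(angle))

# Boost: t and x each change, but t^2 - x^2 does not.
t, x = 2.0, 1.5
t2, x2 = boost_x(t, x, 0.6)
print(t, t2, x, x2)
print(t * t - x * x, t2 * t2 - x2 * x2)

# Rotation: x and y each change, but x^2 + y^2 does not.
px, py = 3.0, 4.0
qx, qy = rotate_xy(px, py, 0.7)
print(px ** 2 + py ** 2, qx ** 2 + qy ** 2)
```

The two invariants differ only in the sign between the squared terms, which is the minus sign the original question asks about.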

Dale
Mentor
Also, whether to put the minus sign on the time coordinate or the spatial coordinates is a matter of convention, and both conventions are in wide use.
Yes, both are in wide use. My personal preference is to use ##ds^2## for the (-+++) convention and ##d\tau^2## for the (+---) convention.

ghwellsjr
Gold Member
NOTE: the Lorentz interval that pervect was talking about in his post is the same as the Spacetime Interval that I was talking about in my post.

And remember, between any pair of arbitrary events, it is either a pure spatial distance, or a pure time interval, or neither, which is why it is called "null". Only a flash of light can be present at both events in the null case and for that reason it is also called "lightlike".

I've never seen a satisfactory explanation of the metrics used in a calculation of distance in Minkowski space. In Euclidean space, the distance is:
ds^2 = dx^2 + dy^2 + dz^2
But in Minkowski space, the distance is
ds^2 = (dt * c)^2 - dx^2 - dy^2 - dz^2
Why are the signs reversed? This implies that space (or time depending on your convention) is imaginary.
As others already explained, it's just a matter of convention. The way it was written the first time (I think) is probably easier to understand:

"[..] the invariants of the Lorentz group.
We know that the substitutions of this group [..] are linear substitutions which do not affect the quadratic form
x^2 + y^2 + z^2 - t^2."
- https://en.wikisource.org/wiki/Translation:On_the_Dynamics_of_the_Electron_(July)#.C2.A7_9._.E2.80.94_Hypotheses_on_gravitation

Hello Dale Spam,

As you have written, I would like to know where the values of (-1,0,0,0) come from. Are they parameters?

The different sign on the ##t## coordinate means that the Minkowski metric describes a space-time in which the distance between points on the line corresponding to the path of a light beam is zero. Experiments confirm that this model accurately describes the universe that we live in, so that's the model that we use. Thus, your "Why?" question comes down to "Why is the universe built this way and not some other way?" - and science isn't going to give you a satisfactory answer to that question.

As DaleSpam points out above, the modern style of moving the sign difference into the metric tensor reduces the embarrassing appearance of "imaginary" (better to say "complex" instead) numbers in the formulas. The older style, in which sooner or later you find yourself treating ##ict## (with ##i=\sqrt{-1}##) as a coordinate, was used mostly because it made the Lorentz transformations look like the already familiar problem of rotating the coordinate axes in Euclidean space. That helped people who were familiar with the mathematical underpinnings of classical mechanics make the jump to special relativity (it's worth noting that Goldstein introduces relativistic mechanics this way) but it's something you'll have to unlearn when you move on to general relativity.
Hello Nugatory,

Thank you for this wonderful, lucid answer. I would like to clarify: is the minus sign on the ##ct## term there because the time ##t## is treated as imaginary, with ##i=\sqrt{-1}##? Is that so?

your "Why?" question comes down to "Why is the universe built this way and not some other way?" - and science isn't going to give you a satisfactory answer to that question.
This is off topic, I'm swimming in deep waters here and maybe this is obvious, but as a note I thought relativity was the embodiment of having a universal speed limit. And that the problem with "why this way" starts out of that observation (e.g. "why locality and hence causality").

Nugatory
Mentor
This is off topic, I'm swimming in deep waters here and maybe this is obvious, but as a note I thought relativity was the embodiment of having a universal speed limit.
The essential basis of relativity is not the universal speed limit, it is the invariance of the speed of light; the universal speed limit (and much else) follows from light-speed invariance.

But with that said...
You've just moved the "Why?" question around. Why do we live in a universe that has a universal speed limit instead of one that does not? There's a fine and consistent mathematical model for describing a universe in which the speed of light is not invariant and there is no universal speed limit; it's called classical mechanics and there's nothing wrong with it except that observation tells us that it's not the way the universe works.

I think we can do a bit better than that.

For example, one way to understand why we formulate the dimensions in this way, with the time coordinate taking the opposite sign of the spatial coordinates, is that when we write down distances in this way, the shortest path something will take between two events is the path it actually takes. This isn't terribly useful in just Minkowski space, as this just means that things travel in straight lines. But in General Relativity this can be used to compute the paths of orbits, or of light rays being deflected by a gravitational field, or anything else you might care to estimate the path of. Everything takes the shortest distance when you use the metric, which has different signs between the spatial and time components, as the measure of distance. With the added constraints that light always takes a path which has space-time distance equal to zero and objects with mass cannot travel space-like distances (with the metric convention RCopernicus used, s^2 must be greater than zero).
I'm afraid your answer doesn't make much sense. I can claim that ds^2 = dx^2 - dy^2 describes a shorter path than ds^2 = dx^2 + dy^2, but I have no justification for arbitrarily flipping the sign on one of my dimensions. So why is Minkowski able to get away with it with time? Is not time orthogonal in every way to space?

Dale
Mentor
The flipped sign is what sets up the causal structure of spacetime. It separates spacetime and four-vectors into timelike, spacelike, and lightlike regions.

Clocks measure timelike intervals and rods measure spacelike intervals. If you gave them both the same sign then you would have a theory where you could measure time with rods and simply turn towards the past as easily as turning left.

Dale
Mentor
I would like to know where the values of (-1,0,0,0) come from.
They come directly from the line element:
##ds^2=g_{\mu\nu}dx^{\mu}dx^{\nu}=-dt^2+dx^2+dy^2+dz^2##

There are no cross terms, so all of the off diagonal entries are 0. Then the diagonal entries are the corresponding coefficients: (-1,1,1,1)

pervect
Staff Emeritus
Responding to Chalnoth:

Interesting, why single it out for special attention then?

At the risk of being repetitive, I'll summarize my original longer post, which I'm concerned may have just gotten lost in the mass of replies.

While one can pretend that the sign flip in the expression for ds^2 did just "fall out of the air", and explore the consequence of saying "this magic quantity, which we won't tell you where it came from, is the same for all inertial observers" in detail and comparing the predictions made by this claim, I think it is more in the spirit of the original question to ask, historically, where did this quantity come from.

The process is a bit long, as I outlined in a longer post - one starts with special relativity and derives the Lorentz transform, then one looks at special quantities that are unchanged by the Lorentz transform and singles them out for special interest, eventually winding up with a geometrical interpretation of the special quantities that have this special property, which is called invariance.

So the quantity in question has a bit of a history, and if one wants to know where it came from, one needs to study the history.

If the real underlying question is "why study relativity at all", it might be good to remember that the end goal of science is to make predictions that agree with experiment, and focus on the experimental results. Starting out with preconceived notions of "how Nature ought to work" winds up in frustration at best, and at worst ends up with one sticking to the (incorrect) preconceived notions because one is happier with them than one is with the sometimes messy results that are actually measured.

ChrisVer
Gold Member
I prefer the minus in the time component.
One reason is that a Lorentz transformation contains both boosts and rotations. The boosts are the transformations involving the time component, and they are written in terms of hyperbolic functions.
If instead you had a 4D Euclidean space, you would have just the rotations. Rotations are written in terms of trigonometric functions. Changing the 0-component into an imaginary one then gives you hyperbolic functions, and so boosts.

I haven't ever gone through this to see if it works or not... but subconsciously it leads me to write the metric as diag(-+++)
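
For what it's worth, the imaginary-angle idea can be checked numerically. The sketch below is only an illustration under assumptions: it rotates the pair ##(it, x)## by the imaginary angle ##i\phi## using the ordinary 2D rotation formulas, and compares the result with the standard boost for ##v=\tanh\phi## in units where c=1. All variable names and sample values are made up.

```python
import cmath
import math

phi = 0.8        # rapidity; an arbitrary sample value
t, x = 2.0, 1.5  # sample event, units where c = 1

# Circular functions of an imaginary angle are hyperbolic functions.
cos_i = cmath.cos(1j * phi)   # cosh(phi)
sin_i = cmath.sin(1j * phi)   # i * sinh(phi)

# Ordinary 2D rotation formulas applied to the pair (i*t, x).
w = 1j * t
w_rot = cos_i * w - sin_i * x
x_rot_complex = sin_i * w + cos_i * x

# Read the real time back off the i*t slot; the x slot is already real.
t_rot = (w_rot / 1j).real
x_rot = x_rot_complex.real

# Standard boost with v = tanh(phi), gamma = cosh(phi), for comparison.
v = math.tanh(phi)
gamma = math.cosh(phi)
t_boost = gamma * (t - v * x)
x_boost = gamma * (x - v * t)

print(t_rot, t_boost)
print(x_rot, x_boost)
```

The two printed pairs agree to rounding error, so a Euclidean rotation by an imaginary angle reproduces the boost.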

(Side remark:
Thus, your "Why?" question comes down to "Why is the universe built this way and not some other way?" - and science isn't going to give you a satisfactory answer to that question.
Then scientists should never ask the question why, and just start collecting and believing in data.)

Responding to Chalnoth:
While one can pretend that the sign flip in the expression for ds^2 did just "fall out of the air", and explore the consequence of saying "this magic quantity, which we won't tell you where it came from, is the same for all inertial observers" in detail and comparing the predictions made by this claim, I think it is more in the spirit of the original question to ask, historically, where did this quantity come from.
Actually, that is precisely the question I'm asking. I accept the reality that the square-root of minus one accurately describes an observation. What is harder to accept is that geometry allows us to do this. I can make a shortest path by postulating that ds^2 = dx^2 - dy^2, but these dimensions are orthogonal to each other so I can't just arbitrarily flip the sign to make the distance shorter. So why are you able to do this for the dimension of time? (Yes, I understand they're different things, but the whole idea of an invariant distance is to put them in a form where you can add them together: ds^2 = cti^2 + x^2 + y^2 +z^2).

One poster has claimed: we don't know, it just works that way. I suppose that's good enough for the Quantum Mechanics, but I'm left with the sense that we just threw in a minus sign to make the formula fit the observation and I feel the same sense of dissatisfaction I feel when I use Gravitational Constant in a formula.

Nugatory
Mentor
Actually, that is precisely the question I'm asking. I accept the reality that the square-root of minus one accurately describes an observation. What is harder to accept is that geometry allows us to do this. I can make a shortest path by postulating that ds^2 = dx^2 - dy^2, but these dimensions are orthogonal to each other so I can't just arbitrarily flip the sign to make the distance shorter. So why are you able to do this for the dimension of time? (Yes, I understand they're different things, but the whole idea of an invariant distance is to put them in a form where you can add them together: ds^2 = cti^2 + x^2 + y^2 +z^2).
The time coordinate is different from the three spatial coordinates because we can always rotate the spatial axes in such a way that only one of ##dx##, ##dy##, ##dz## is non-zero, or make any one of them negative, while still maintaining the orthogonality of all four axes. We can't do the same thing with the time axis because (as others have already said in this thread) that would be tantamount to turning in some direction and being able to look backwards in time.

The difference in sign between the three spatial components and the one temporal component of the Minkowski metric is capturing this basic difference between the spatial coordinates on the one hand and the temporal coordinate on the other. Indeed, the fact that both the (-1,1,1,1) and (1,-1,-1,-1) conventions work equally well is a pretty strong hint that no matter how we talk about them, they're different.

One poster has claimed: we don't know, it just works that way. I suppose that's good enough for the Quantum Mechanics, but I'm left with the sense that we just threw in a minus sign to make the formula fit the observation and I feel the same sense of dissatisfaction I feel when I use Gravitational Constant in a formula.
I didn't say that I was any more satisfied with this answer than you...
I said we are stuck with it, because the universe isn't responding to our complaints :)

robphy
Homework Helper
Gold Member
What is harder to accept is that geometry allows us to do this. I can make a shortest path by postulating that ds^2 = dx^2 - dy^2, but these dimensions are orthogonal to each other so I can't just arbitrarily flip the sign to make the distance shorter. So why are you able to do this for the dimension of time? (Yes, I understand they're different things, but the whole idea of an invariant distance is to put them in a form where you can add them together: ds^2 = cti^2 + x^2 + y^2 +z^2).
Suppose I told you about an experiment plotted on a position-vs-time graph.
Starting at a common event, have an infinite set of inertial observers with different velocities along a line in space
run and stop when their wristwatch reads 1 minute. Call the set of their stopping events a "circle" in this graph.
What is the equation of this "circle" in t and x variables? (Note that every inertial observer making a graph of this experiment will have identical looking graphs... with identical asymptotes.)
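
One way to explore this numerically is sketched below. It assumes the standard time-dilation result ##t=\gamma\tau## for an inertial observer leaving the origin, in units where c=1; the function name and the sample velocities are illustrative choices.

```python
import math

TAU = 1.0  # wristwatch reading at the stopping event (one minute, c = 1)

def stopping_event(v, tau=TAU):
    """Event (t, x) at which an observer with velocity v reads proper time tau."""
    gamma = 1.0 / math.sqrt(1.0 - v * v)
    t = gamma * tau      # assumed: time dilation, t = gamma * tau
    return t, v * t      # position after coordinate time t at velocity v

events = [stopping_event(v) for v in (-0.9, -0.5, 0.0, 0.5, 0.9)]
for t, x in events:
    print(f"t = {t:.4f}   x = {x:+.4f}   t^2 - x^2 = {t * t - x * x:.4f}")
```

Every stopping event satisfies ##t^2 - x^2 = \tau^2##, so the "circle" is a hyperbola whose asymptotes are the light rays ##x=\pm t##.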

pervect
Staff Emeritus
Actually, that is precisely the question I'm asking. I accept the reality that the square-root of minus one accurately describes an observation. What is harder to accept is that geometry allows us to do this. I can make a shortest path by postulating that ds^2 = dx^2 - dy^2, but these dimensions are orthogonal to each other so I can't just arbitrarily flip the sign to make the distance shorter. So why are you able to do this for the dimension of time? (Yes, I understand they're different things, but the whole idea of an invariant distance is to put them in a form where you can add them together: ds^2 = cti^2 + x^2 + y^2 +z^2).

One poster has claimed: we don't know, it just works that way. I suppose that's good enough for the Quantum Mechanics, but I'm left with the sense that we just threw in a minus sign to make the formula fit the observation and I feel the same sense of dissatisfaction I feel when I use Gravitational Constant in a formula.
The full explanation of where it came from would involve deriving the Lorentz transform.

That's too much work for a post. Any SR book should go into the full details. As I said before, I'm particularly fond of the so-called k-calculus approach, if you look for books by Bondi like "Relativity and common sense", you'll see that approach applied. https://www.amazon.com/dp/0486240215/?tag=pfamazon01-20&tag=pfamazon01-20

But perhaps I can say something short and motivational instead: rather than trying to derive the transform, I'll just point out one of its properties.

Consider a spherical wavefront propagating at a velocity "c". Let's describe it in a frame S with coordinates (t,x,y,z). The points on this wavefront form an expanding sphere. At time t, the radius of the sphere will be ct. This implies that ##x^2 + y^2 + z^2 = (ct)^2##, or ##x^2 + y^2 + z^2 - (ct)^2 = 0##.

Now, relativity says that light will propagate isotropically in a sphere in any inertial frame of reference. Let's consider two specific inertial frames of reference, S and S'. If both S and S' are inertial, S' must be moving with some constant velocity v relative to S.

The first point is that any event in space-time will have exactly one unique set of coordinates in S, and a different set of unique coordinates in S'. As a consequence there will be a 1:1 mapping from coordinates in S to coordinates in S'.

Proof:
Given there is a 1:1 mapping from events<->S, and from events<->S'
We can invert the order and find a 1:1 mapping from S to events, because a 1:1 mapping must be invertible.

Then we construct the map S->events. Composing it with our map from events to S', we get
S -> events -> S'

This is the desired map from S to S'

This 1:1 mapping from S to S' is the Lorentz transform. But rather than derive it in detail I'm going to make a much simpler remark.

In frame S', describing the same wavefront from the same event, presumed to happen at t=t'=0, what do we get? There's nothing particularly special about either S or S', so hopefully it's clear that the description must involve simply replacing x with x', y with y', z with z', and t with t'. Thus we have

##x^2 + y^2 + z^2 - (ct)^2 = 0## in S

before the transform, and after the transform we must have

##x'^2 + y'^2 + z'^2 - (ct')^2 = 0## in S'.

Giving the quantity ##x^2 + y^2 + z^2 - (ct)^2## a name, the Lorentz interval, we've demonstrated that if the Lorentz interval is zero in S, it must also be zero in S'. It turns out that there is a more general result, that the value of the Lorentz interval is preserved even when it's not zero. I'm afraid you'll have to wade through the full details of the Lorentz transform to prove that. But if we are looking for preserved quantities, we have narrowed the field down a lot by noting that a zero value of the Lorentz interval in S must yield a zero value in S'.

Note that our proof relied on the constancy and isotropy of the speed of light, the idea that if it is a spherical wavefront in S, it must be a spherical wavefront in S'. This is one of the assumptions in relativity.
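
To see what the requirement buys us, here is a small Python sketch comparing two candidate maps from S to S'. It is restricted to one space dimension (y=z=0) with c=1, and the function names are illustrative. The Lorentz map keeps events on the wavefront at zero interval; the Galilean map, used here only as a contrast, does not.

```python
import math

def lorentz(t, x, v):
    """Lorentz boost along x (units where c = 1, |v| < 1)."""
    gamma = 1.0 / math.sqrt(1.0 - v * v)
    return gamma * (t - v * x), gamma * (x - v * t)

def galilean(t, x, v):
    """Galilean change of frame: time is left untouched."""
    return t, x - v * t

def interval(t, x):
    """x^2 - (ct)^2 with y = z = 0 and c = 1."""
    return x * x - t * t

v = 0.6
wavefront_events = [(1.0, 1.0), (2.0, -2.0)]  # points with x^2 - t^2 = 0 in S

for t, x in wavefront_events:
    print(interval(*lorentz(t, x, v)), interval(*galilean(t, x, v)))
```

Under the Lorentz map the interval stays at zero (up to rounding), while under the Galilean map it becomes non-zero, so the wavefront would no longer be a sphere expanding at c in S'.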
