Showing that Lorentz transformations are the only ones possible

Fredrik · Nov 22, 2012

TrickyDicky said:

If I understand it correctly this proves that the most general transformations that take straight lines to straight lines are the linear fractional ones.
To get to the linear case one still needs to impose the condition mentioned above about the continuity of the transformation, right?

It's sufficient to assume that the map that takes straight lines to straight lines is defined on the entire vector space, rather than a proper subset. It's not necessary to assume that the map is continuous. (If you want the map to be linear, rather than linear plus a translation, you must also assume that it takes 0 to 0).
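Here is a small illustration of why the domain matters, using a hypothetical example map that is not from the thread. The linear fractional map f(x,y) = (x/(1+x), y/(1+x)) sends collinear points to collinear points wherever it is defined. However, it is undefined on the whole line x = -1, so it is not a map on all of ℝ². The sketch uses exact rational arithmetic so that the collinearity test is exact.

```python
from fractions import Fraction

def f(p):
    """Hypothetical linear fractional map (x, y) -> (x/(1+x), y/(1+x)).

    It is undefined wherever 1 + x == 0.
    """
    x, y = p
    d = 1 + x
    if d == 0:
        raise ZeroDivisionError("f is not defined on the line x = -1")
    return (x / d, y / d)

def collinear(p, q, r):
    """Exact collinearity test: the determinant of (q - p, r - p) vanishes."""
    return (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0]) == 0

# Three points on the line y = 2x + 1.
pts = [(Fraction(t), 2 * Fraction(t) + 1) for t in (0, 3, 7)]
images = [f(p) for p in pts]
print(collinear(*pts), collinear(*images))

# The map cannot be extended to the whole plane.
try:
    f((Fraction(-1), Fraction(5)))
except ZeroDivisionError as e:
    print(e)
```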

Fredrik · Nov 22, 2012

DrGreg said:

I've just realized there's a simple geometric proof, for Fredrik's special case, for the case of the whole of ##\mathbb{R}^2##, which I suspect would easily extend to higher dimensions.

Let ##T : \mathbb{R}^2 \rightarrow \mathbb{R}^2## be a bijection that maps straight lines to straight lines. It must map parallel lines to parallel lines, otherwise two points on different parallel lines would both be mapped to the intersection of the non-parallel image lines, contradicting bijectivity. So it maps parallelograms to parallelograms. But, if you think about it, that's pretty much the defining property of linearity (assuming T(0)=0).

There are a few I's to dot and T's to cross to turn the above into a rigorous proof, but I think I'm pretty much there -- or have I omitted too many steps in my thinking? (I think you may have to assume T is continuous to extend the additive property of linearity to the scalar multiplication property.)

I've been examining the proof in Berger's book more closely. His strategy is very close to yours, but there's a clever trick at the end that allows us to drop the assumption of continuity. Consider the following version of the theorem:

Suppose that X=ℝ². If T:X→X is a bijection that takes straight lines to straight lines and 0 to 0, then T is linear.

For this theorem, the steps are as follows:

1. If K and L are two different lines through 0, then T(K) and T(L) are two different lines through 0.
2. If K and L are two parallel lines, then T(K) and T(L) are two parallel lines.
3. For all x,y such that {x,y} is linearly independent, T(x+y)=Tx+Ty. (This is done by considering a parallelogram as you suggested).
4. For all vectors x and all real numbers a, T(ax)=aTx. (Note that this result implies that T(x+y)=Tx+Ty when {x,y} is linearly dependent).
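Step 3 comes down to intersecting two lines. The image of the parallelogram has corners 0, Tx and Ty, and its fourth corner is where the line through Tx parallel to Ty meets the line through Ty parallel to Tx. A minimal sketch, assuming only that Tx and Ty are linearly independent, with hypothetical values for them: solve Ty + rTx = Tx + sTy by Cramer's rule and confirm that r = s = 1, so the fourth corner is Tx + Ty.

```python
from fractions import Fraction as Fr

def det2(a, b):
    """Determinant of the 2x2 matrix whose columns are a and b."""
    return a[0] * b[1] - a[1] * b[0]

def fourth_corner(u, w):
    """Intersect the line {u + s*w} with the line {w + r*u}.

    u and w must be linearly independent. Returns (r, s, point).
    """
    if det2(u, w) == 0:
        raise ValueError("u and w are linearly dependent")
    # w + r*u = u + s*w is the system r*u + s*(-w) = u - w; use Cramer's rule.
    rhs = (u[0] - w[0], u[1] - w[1])
    minus_w = (-w[0], -w[1])
    d = det2(u, minus_w)
    r = det2(rhs, minus_w) / d
    s = det2(u, rhs) / d
    return r, s, (u[0] + s * w[0], u[1] + s * w[1])

# Hypothetical images of x and y; any linearly independent pair works.
Tx = (Fr(2), Fr(1))
Ty = (Fr(-1), Fr(3))
r, s, corner = fourth_corner(Tx, Ty)
print(r, s, corner)
```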

The strategy for step 4 is as follows: Let x be an arbitrary vector and a an arbitrary real number. If either x or a is zero, we have T(ax)=0=aTx. If both are non-zero, we have to be clever. Since Tx is on the same straight line through 0 as T(ax), there's a real number b such that T(ax)=bTx. We need to prove that b=a. Let B be the map ##t\mapsto tx##. Let C be the map ##t\mapsto tTx##. Let f be the restriction of T to the line through x and 0. Define ##\sigma:\mathbb R\to\mathbb R## by ##\sigma=C^{-1}\circ f\circ B##. Since
$$\sigma(a)=C^{-1}\circ f\circ B(a) =C^{-1}(f(B(a))) =C^{-1}(T(ax)) =C^{-1}(bTx)=b,$$ what we need to do is to prove that σ is the identity map. Berger does this by proving that σ is a field isomorphism. Since both the domain and codomain are ℝ, this makes it an automorphism of ℝ, and by the lemma that micromass proved so elegantly above, that implies that it's the identity map.
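For reference, here is a sketch of the standard textbook argument for that lemma (not necessarily identical to micromass's proof). Let σ be a field automorphism of ℝ. Continuity is never used; the order is recovered from the fact that non-negative reals are squares:

$$\begin{aligned}
&\sigma(1)=\sigma(1)^2 \text{ and } \sigma(1)\neq 0 \Rightarrow \sigma(1)=1, \text{ so } \sigma(n)=n \text{ for } n\in\mathbb Z, \text{ and } q\,\sigma(p/q)=\sigma(p)=p \Rightarrow \sigma(p/q)=p/q,\\
&a\ge 0 \Rightarrow a=c^2 \text{ for some } c \Rightarrow \sigma(a)=\sigma(c)^2\ge 0, \text{ so } a\le b \Rightarrow \sigma(b)-\sigma(a)=\sigma(b-a)\ge 0,\\
&\text{if } \sigma(a)<q<a \text{ with } q\in\mathbb Q, \text{ then } q\le a \Rightarrow q=\sigma(q)\le\sigma(a), \text{ a contradiction; the case } a<q<\sigma(a) \text{ is similar.}
\end{aligned}$$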

TrickyDicky · Nov 22, 2012

Fredrik said:

It's sufficient to assume that the map that takes straight lines to straight lines is defined on the entire vector space, rather than a proper subset. It's not necessary to assume that the map is continuous. (If you want the map to be linear, rather than linear plus a translation, you must also assume that it takes 0 to 0).

What I meant is that one must impose that the transformation must map finite coordinates to finite coordinates, which I think is equivalent to what you are saying here.

strangerep · Nov 22, 2012

micromass said:

Here is a proof for the plane.

Thank you Micromass.
Your posts deserve to be polished and turned into a library item, so I'll mention a couple of minor typos I noticed:

[...] again a perspectivity [...]

Even though this is a synonym, I presume it should be "projectivity", since that's the word you used earlier.

Also,

[...] verticles [...]

Fredrik · Nov 23, 2012

Just out of curiosity, do people use the term "line" for curves that aren't straight? Do we really need to say "straight line" every time?

micromass · Nov 23, 2012

strangerep said:

Even though this is a synonym, I presume it should be "projectivity", since that's the word you used earlier.

Ah yes, thank you! It should indeed be projectivity.
A perspectivity is something slightly different. I don't know why I used that term...

lugita15 · Nov 23, 2012

Fredrik said:

Just out of curiosity, do people use the term "line" for curves that aren't straight? Do we really need to say "straight line" every time?

Yes, at least historically, "line" was used to mean any curve. I think Euclid defined a line to be a "breadthless length", and defined a straight line to be a line that "lies evenly with itself", whatever that means.


Fredrik · Nov 24, 2012

I think I have completely understood how to prove the following theorem using the methods described in Berger's book.

If ##T:\mathbb R^2\to\mathbb R^2## is a bijection that takes lines to lines and 0 to 0, then ##T## is linear.

I have broken it up into ten parts. Most of them are very easy, but there are a few tricky ones.

Notation: If L is a line, then I will write TL instead of T(L).

1. If K is a line through 0, then so is TK.
2. If K,L are lines through 0 such that K≠L, then TK≠TL. (Note that this implies that if {x,y} is linearly independent, then so is {Tx,Ty}).
3. If K is parallel to L, then TK is parallel to TL.
4. For all x,y such that {x,y} is linearly independent, T(x+y)=Tx+Ty.
5. If x=0 or a=0, then T(ax)=aTx.
6. If x≠0 and a≠0, then there's a b such that T(ax)=bTx. (Note that this implies that for each x≠0, there's a map σ such that T(ax)=σ(a)Tx. The following steps determine the properties of σ for an arbitrary x≠0).
7. σ is a bijection from ℝ into ℝ.
8. σ is a field homomorphism.
9. σ is the identity map. (Combined with 5-6, this implies that T(ax)=aTx for all a,x).
10. For all x,y such that {x,y} is linearly dependent, T(x+y)=Tx+Ty.

I won't explain all the details of part 8, because they require a diagram. But I will describe the idea. If you want to understand part 8 completely, you need to look at the diagrams in Berger's book.

Notation: I will denote the line through x and y by [x,y].

1. Since T takes lines to lines, TK is a line. Since T0=0, 0 is on TK.
2. Suppose that TK=TL. Let x be an arbitrary non-zero point on TK. Since x is also on TL, ##T^{-1}(x)## is in both K and L. Since K and L are different lines through 0, their only common point is 0, so ##T^{-1}(x)=0##, i.e. x=T0=0. This contradicts that x≠0.
3. If K=L, then obviously TK=TL. If K≠L, then K and L don't intersect, and since T is injective, TK and TL don't intersect either (this is the same argument as in part 2). Two lines in the plane that don't intersect are parallel.
4. Let x,y be arbitrary vectors such that {x,y} is linearly independent. Part 2 tells us that {Tx,Ty} is linearly independent. Define
   * K=[0,x] (This is the range of ##t\mapsto tx##).
   * L=[0,y] (This is the range of ##t\mapsto ty##).
   * K'=[x+y,y] (This is the range of ##t\mapsto y+tx##, so this line is parallel to K).
   * L'=[x+y,x] (This is the range of ##t\mapsto x+ty##, so this line is parallel to L).

   Since x+y is at the intersection of K' and L', T(x+y) is at the intersection of TK' and TL'. We will show that Tx+Ty is also at that intersection. Since x is on L', Tx is on TL'. Since L' is parallel to L, TL' is parallel to TL (the line spanned by {Ty}). These two results imply that TL' is the range of the map B defined by B(t)=Tx+tTy. Similarly, TK' is the range of the map C defined by C(t)=Ty+tTx. Since Tx and Ty are linearly independent, TK' and TL' are not parallel, so they intersect at exactly one point, and there's a unique pair (r,s) such that T(x+y)=C(r)=B(s). The latter equality can be written as Ty+rTx=Tx+sTy. This is equivalent to (r-1)Tx+(1-s)Ty=0, and since {Tx,Ty} is linearly independent, this implies r=s=1. So T(x+y)=B(1)=Tx+Ty.
5. Let x be an arbitrary vector and a an arbitrary real number. If either of them is zero, we have T(ax)=0=aTx.
6. Let x be non-zero but otherwise arbitrary, and let K be the line through 0 and x. Since 0, x and ax are all on K, the points 0, Tx and T(ax) are all on the line TK. Since Tx≠0, this implies that there's a b such that T(ax)=bTx. (What we did here proves this statement when a≠0 and x≠0, and part 5 shows that it's also true when a=0 or x=0).
7. The map σ can be defined explicitly in the following way. Define B by B(t)=tx for all t. Define C by C(t)=tTx for all t. Let K be the range of B. Then the range of C is TK. Define ##\sigma=C^{-1}\circ T|_K\circ B##. This map is a bijection (ℝ→ℝ), since it's the composition of three bijections (ℝ→K→TK→ℝ). To see that this is the σ that was discussed in the previous step, let b be the real number such that T(ax)=bTx, and note that
$$\sigma(a)=C^{-1}\circ T|_K\circ B(a) =C^{-1}(T(B(a))) =C^{-1}(T(ax)) =C^{-1}(bTx)=b.$$
8. Let a,b be arbitrary real numbers. Using the diagrams in Berger's book, we can show that there are two lines K and L such that (a+b)x is at the intersection of K and L. This implies that the point at the intersection of TK and TL is T((a+b)x)=σ(a+b)Tx. Then we use the diagram (and its image under T) to argue that T(ax)+T(bx) must also be at that same intersection. This expression can be written (σ(a)+σ(b))Tx, so these results tell us that
$$(\sigma(a)+\sigma(b)-\sigma(a+b))Tx=0.$$ Since Tx≠0, this implies that σ(a+b)=σ(a)+σ(b). Then we use similar diagrams to show that σ(ab)=σ(a)σ(b), and that if a<b, then σ(a)<σ(b). (The book doesn't include a diagram for that last part, but it's easy to imagine one).
9. By parts 7 and 8, σ is an automorphism of ℝ, so this follows from the lemma that says that the only automorphism of ℝ is the identity.
10. Suppose that {x,y} is linearly dependent. If x=0, then T(x+y)=Ty=Tx+Ty. Otherwise, let k be the real number such that y=kx. Part 9 tells us that T(x+y)=T((1+k)x)=(1+k)Tx=Tx+kTx=Tx+T(kx)=Tx+Ty.
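The map σ from part 7 can be evaluated numerically for concrete maps, which shows what part 8 contributes. A minimal sketch, with two hypothetical maps that are not from the thread: an invertible linear map, whose σ comes out as the identity, and the radial map T(v)=|v|v. The radial map is a bijection that fixes 0 and takes every line through 0 to itself, so parts 1, 2 and 5-7 go through for it. But it bends lines that miss 0, so the diagram argument of part 8 fails. For a unit vector x its σ is a ↦ a|a|, which is a bijection of ℝ but is not additive.

```python
import math

def sigma(T, x, a):
    """sigma(a) = C^{-1}(T(a x)), where C(t) = t * T(x).

    Assumes T(a x) lies on the line through 0 and T(x), as in part 6,
    so the projection onto T(x) recovers the coefficient exactly.
    """
    tx = T(x)
    p = T((a * x[0], a * x[1]))
    return (p[0] * tx[0] + p[1] * tx[1]) / (tx[0] ** 2 + tx[1] ** 2)

def linear_T(v):
    # Hypothetical invertible linear map.
    return (2 * v[0] + v[1], 3 * v[1])

def radial_T(v):
    # Hypothetical bijection v -> |v| v: fixes 0 and every line through 0,
    # but sends most other lines to curves.
    r = math.hypot(v[0], v[1])
    return (r * v[0], r * v[1])

x_lin = (1.0, 2.0)
x_unit = (0.6, 0.8)  # a unit vector (up to rounding)
for a in (-2.0, 0.5, 3.0):
    print(a, sigma(linear_T, x_lin, a), sigma(radial_T, x_unit, a))
```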

friend · Nov 25, 2012

This is a very interesting thread. Sorry I'm late to the conversation. I appreciate all the contributions. But I'm getting a little lost.

The OP's question was what kind of transformation keeps the following invariant:

$$c^2t^2 - x^2 - y^2 - z^2 = 0$$
$$c^2t'^2 - x'^2 - y'^2 - z'^2 = 0$$

But Mentz114 in post 3 interprets this to mean that the transformation preserves

$$-dt'^2 + dx'^2 = -dt^2 + dx^2.$$

And Fredrik in post 8 interprets this to mean

If Λ is linear and g(Λx,Λx)=g(x,x) for all x∈ℝ⁴, then Λ is a Lorentz transformation.

And modifies this in post 9 to be

If Λ is surjective, and g(Λ(x),Λ(y))=g(x,y) for all x,y∈ℝ⁴, then Λ is a Lorentz transformation.

Are these all the same answer in different forms? Or is there a side question being addressed about linearity? Thank you.

Fredrik · Nov 25, 2012

friend said:

And Fredrik in post 8 interprets this to mean

If Λ is linear and g(Λx,Λx)=g(x,x) for all x∈ℝ⁴, then Λ is a Lorentz transformation.

And modifies this in post 9 to be

If Λ is surjective, and g(Λ(x),Λ(y))=g(x,y) for all x,y∈ℝ⁴, then Λ is a Lorentz transformation.

Those aren't interpretations of the original condition. I would interpret the OP's assumption as saying that g(Λx,Λx)=0 for all x∈ℝ⁴ such that g(x,x)=0 (i.e. for all x on the light cone). This assumption isn't strong enough to imply that Λ is a Lorentz transformation, so I described two similar but stronger assumptions that are strong enough. The two statements you're quoting here are theorems I can prove.
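A quick way to see that the light-cone condition alone is too weak is a hypothetical example not taken from the post: Λ = 2I, a uniform rescaling. It takes every x with g(x,x)=0 to a vector with g(Λx,Λx)=0, but g(Λx,Λx)=4g(x,x) in general, so it is not a Lorentz transformation. A minimal sketch, assuming c = 1 and the signature (+,−,−,−) of the OP's expression, with a standard boost along the x axis for comparison:

```python
import math

def g(x, y):
    """Minkowski form with c = 1 and signature (+, -, -, -)."""
    return x[0] * y[0] - x[1] * y[1] - x[2] * y[2] - x[3] * y[3]

def mat_vec(M, x):
    """Apply a 4x4 matrix (list of rows) to a 4-vector."""
    return [sum(M[i][j] * x[j] for j in range(4)) for i in range(4)]

def boost_x(v):
    """Boost along the x axis with speed v, |v| < 1."""
    gam = 1 / math.sqrt(1 - v * v)
    return [[gam, -gam * v, 0, 0],
            [-gam * v, gam, 0, 0],
            [0, 0, 1, 0],
            [0, 0, 0, 1]]

scale = [[2 if i == j else 0 for j in range(4)] for i in range(4)]  # Lambda = 2I
boost = boost_x(0.6)

null = [5, 3, 4, 0]      # g(null, null) = 25 - 9 - 16 = 0
timelike = [2, 1, 0, 0]  # g(timelike, timelike) = 4 - 1 = 3

for name, M in (("2I", scale), ("boost", boost)):
    Mn, Mt = mat_vec(M, null), mat_vec(M, timelike)
    print(name, g(Mn, Mn), g(Mt, Mt))
```

Both maps keep the null vector null; only the boost also preserves g on the timelike vector.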

There is another approach to relativity that's been discussed in a couple of other threads recently. In this approach, the speed of light isn't mentioned at all. (Note that the g in my theorems is the Minkowski metric, so the speed of light is mentioned there). Instead, we interpret the principle of relativity as a set of mathematically precise statements, and see what we get if we take those statements as axioms. The axioms are telling us that the set of functions that change coordinates from one inertial coordinate system to another is a group, and that each of them takes straight lines to straight lines.

The problem I'm interested in is this: If space and time are represented in a theory of physics as a mathematical structure ("spacetime") with underlying set ℝ⁴, then what is the structure? When ℝ⁴ is the underlying set, it's natural to assume that those functions are defined on all of ℝ⁴. The axioms will then include the statement that those functions are bijections from ℝ⁴ into ℝ⁴. (Strangerep is considering something more general, so he is replacing this with something weaker).

The theorems we've been discussing lately tell us that a bijection ##T:\mathbb R^4\to\mathbb R^4## takes straight lines to straight lines if and only if there's an ##a\in\mathbb R^4## and a linear ##\Lambda:\mathbb R^4\to\mathbb R^4## such that ##T(x)=\Lambda x+a## for all ##x\in\mathbb R^4##. The set of inertial coordinate transformations with a=0 is a subgroup, and it has a subgroup of its own that consists of all the proper and orthochronous transformations with a=0.

What we find when we use the axioms is that this subgroup is either the group of Galilean boosts and proper and orthochronous rotations, or it's isomorphic to the restricted (i.e. proper and orthochronous) Lorentz group. In other words, we find that "spacetime" is either the spacetime of Newtonian mechanics, or the spacetime of special relativity. Those are really the only options when we take "spacetime" to be a structure with underlying set ℝ⁴.

Of course, if we had lived in 1900, we wouldn't have been very concerned with mathematical rigor in an argument like this. We would have been trying to guess the structure of spacetime in a new theory, and in that situation, there's no need to prove that theorem about straight lines. We can just say "let's see if there are any theories in which Λ is linear", and move on.

In 2012 however, I think it makes more sense to do this rigorously all the way from the axioms that we wrote down as an interpretation of the principle of relativity, because this way we know that there are no other spacetimes that are consistent with those axioms.

member 11137 · Nov 26, 2012

Fredrik said:

Of course, if we had lived in 1900, we wouldn't have been very concerned with mathematical rigor in an argument like this. We would have been trying to guess the structure of spacetime in a new theory, and in that situation, there's no need to prove that theorem about straight lines. We can just say "let's see if there are any theories in which Λ is linear", and move on.

In 2012 however, I think it makes more sense to do this rigorously all the way from the axioms that we wrote down as an interpretation of the principle of relativity, because this way we know that there are no other spacetimes that are consistent with those axioms.

OK. Thank you for all these explanations. But don't you think that the "obsession" with preservation of straight lines is entirely due to our false and old-fashioned definition of what an inertial observer is? What do I mean? An inertial observer is not an observer without acceleration, but an observer on which no force is acting. And this is not the same thing within a generalized theory of relativity, where ##F=\frac{d}{dt}(mv)=m\dot v+\dot m v##, so F = 0 does not mean that the acceleration is 0.

Fredrik · Nov 26, 2012

Those formulas do imply that ##F=0\Leftrightarrow \dot v=0##.

$$\gamma=\frac{1}{\sqrt{1-v^2}},\qquad m=\gamma m_0$$
$$\dot\gamma=-\frac{1}{2}(1-v^2)^{-\frac{3}{2}}(-2v\dot v)=\gamma^3v\dot v$$
$$\dot m=\dot\gamma m_0=\gamma^3v\dot v m_0$$
\begin{align}
F &=\frac{d}{dt}(mv)=\dot m v+m\dot v=\gamma^3v^2\dot v m_0+\gamma m_0\dot v =\gamma m_0\dot v(\gamma^2v^2+1)\\
& =\gamma m_0\dot v\left(\frac{v^2}{1-v^2}+\frac{1-v^2}{1-v^2}\right) =\gamma^3 m_0\dot v
\end{align}
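The closed form can be spot-checked numerically. A minimal sketch, in units with c = 1 as in the derivation, assuming a hypothetical velocity profile v(t) = 0.5 sin t: differentiate the momentum γm₀v with a central difference and compare it with γ³m₀v̇. Since γ³m₀ > 0, the two sides vanish together.

```python
import math

def gamma(v):
    return 1 / math.sqrt(1 - v * v)

def v_of_t(t):
    # Hypothetical velocity profile with |v| < 1 (c = 1).
    return 0.5 * math.sin(t)

def vdot_of_t(t):
    # Exact time derivative of v_of_t.
    return 0.5 * math.cos(t)

def force_numeric(t, m0=1.0, h=1e-5):
    """F = d(m v)/dt with m = gamma * m0, by a central difference."""
    def momentum(s):
        v = v_of_t(s)
        return gamma(v) * m0 * v
    return (momentum(t + h) - momentum(t - h)) / (2 * h)

def force_formula(t, m0=1.0):
    """F = gamma^3 * m0 * vdot, the closed form derived above."""
    return gamma(v_of_t(t)) ** 3 * m0 * vdot_of_t(t)

for t in (0.0, 0.7, 1.3, 2.9):
    print(t, force_numeric(t), force_formula(t))
```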

A complete specification of a theory of physics must include a specification of what measuring devices to use to test the theory's predictions. In particular, a theory about space, time and motion must describe how to measure lengths. It's not enough to just describe a meter stick, because the properties of a stick will to some degree depend on what's being done to it. So the theory must also specify the ideal conditions under which the measuring devices are expected to work the best. It's going to be very hard to specify a theory without ever requiring that an accelerometer displays 0. I don't even know if it can be done.

So non-accelerated motion is probably always going to be an essential part of all theories of physics. In all of our theories, motion is represented by curves in the underlying set of a structure called "spacetime". I will denote that set by M. A coordinate system is a function from a subset of M into ℝ⁴. If ##C:(a,b)\to M## is a curve in M, U is a subset of M that contains the range of C, and ##x:U\to\mathbb R^4## is a coordinate system, then ##x\circ C## is a curve in ℝ⁴. So each coordinate system takes curves in spacetime to curves in ℝ⁴. If such a curve is a straight line, then the object has constant velocity in that coordinate system. If a coordinate system takes all the curves that represent non-accelerating motion to straight lines, then it assigns a constant velocity to every non-accelerating object. Those are the coordinate systems we call "inertial". There's nothing particularly old-fashioned about that.

Edit: Fixed four (language/typing/editing) mistakes in the last paragraph.

member 11137 · Nov 26, 2012

Fredrik said:

Those formulas do imply that ##F=0\Leftrightarrow \dot v=0##.


Ok, well-done and -explained (thanks). But all this concerns only special relativity. Where do you see that the question asked by the OP (and recalled by friend) is imposing linearity? For me it only imposes Christoffel's work; see the other discussion "O-S model of star collapse" post 109, Foundations of the GTR by A. Einstein and translated by Bose, [793], (25). My impression (perhaps false) is that SR is based on a coherent but circular way of thinking, including "linearity" for easily understandable historical reasons. The preservation of a length element (which is the initial question here) does not impose a flat geometry. Don't you think so?

TrickyDicky · Nov 26, 2012

Fredrik said:

The problem I'm interested in is this: If space and time are represented in a theory of physics as a mathematical structure ("spacetime") with underlying set ℝ⁴, then what is the structure? When ℝ⁴ is the underlying set, it's natural to assume that those functions are defined on all of ℝ⁴. The axioms will then include the statement that those functions are bijections from ℝ⁴ into ℝ⁴.

I find this confusing. If you start by assuming a spacetime structure that admits bijections from ℝ⁴ into ℝ⁴ (that is E^4 or M^4) as the underlying structure because it seems natural to you, you are already imposing linearity on the transformations that respect the relativity principle. This leaves only the two possible transformations you mention below. The second postulate of SR is what allows us to pick which of the two is the right transformation.

But if you follow this path, it is completely superfluous to prove anything about mapping straight lines to straight lines, i.e. to derive the most general transformation that does that and then restrict it to the linear ones with a plausible physical assumption, since you are already starting with linear transformations.

Fredrik said:

What we find when we use the axioms is that this subgroup is either the group of Galilean boosts and proper and orthochronous rotations, or it's isomorphic to the restricted (i.e. proper and orthochronous) Lorentz group. In other words, we find that "spacetime" is either the spacetime of Newtonian mechanics, or the spacetime of special relativity. Those are really the only options when we take "spacetime" to be a structure with underlying set ℝ⁴.

Just a minor correction: the Lorentz transformations are locally isomorphic to the restricted group.

Fredrik · Nov 26, 2012

TrickyDicky said:

I find this confusing. If you start by assuming a spacetime structure that admits bijections from ℝ⁴ into ℝ⁴ (that is E^4 or M^4) as the underlying structure because it seems natural to you, you are already imposing linearity on the transformations that respect the relativity principle. This leaves only the two possible transformations you mention below.

How am I "already imposing linearity"? I'm starting with "takes straight lines to straight lines", because that is the obvious property of inertial coordinate transformations, and then I'm using the theorem to prove that (when spacetime is ℝ⁴) an inertial coordinate transformation is the composition of a linear map and a translation. I don't think linearity is obvious. It's just an algebraic condition with no obvious connection to the concept of inertial coordinate transformations.

TrickyDicky said:

The second postulate of SR is what allows us to pick which of the two is the right transformation.

Right, if we add that to our assumptions, we can eliminate the Galilean group as a possibility. But I would prefer to just say this: These are the two theories that are consistent with a) the idea that ℝ⁴ is the underlying set of "spacetime", and b) our interpretation of the principle of relativity as a set of mathematically precise statements about transformations between global inertial coordinate systems. Now that we have two theories, we can use experiments to determine which one of them makes the better predictions.

TrickyDicky said:

Just a minor correction: the Lorentz transformations are locally isomorphic to the restricted group.

How is that a correction? It seems like an unrelated statement.

member 11137 · Nov 26, 2012

Fredrik said:

How am I "already imposing linearity"? I'm starting with "takes straight lines to straight lines", because that is the obvious property of inertial coordinate transformations, and then I'm using the theorem to prove that (when spacetime is ℝ⁴) an inertial coordinate transformation is the composition of a linear map and a translation. I don't think linearity is obvious. It's just an algebraic condition with no obvious connection to the concept of inertial coordinate transformations.

Right, if we add that to our assumptions, we can eliminate the Galilean group as a possibility. But I would prefer to just say this: These are the two theories that are consistent with a) the idea that ℝ⁴ is the underlying set of "spacetime", and b) our interpretation of the principle of relativity as a set of mathematically precise statements about transformations between global inertial coordinate systems. Now that we have two theories, we can use experiments to determine which one of them makes the better predictions.

The experiment for the actual discussion here is the Michelson-Morley experiment.

Fredrik said:

How is that a correction? It seems like an unrelated statement.

Intuitively (I am not a specialist), this means that the isomorphism holds true only locally (over short distances around the observer). There is not really a global inertial coordinate system (except on paper, in theory). And (as far as I understand the generalized version of the theory) this is a crucial point. Among other things, this forced us (Weyl's work) to introduce the concepts of parallel transport and connection.

TrickyDicky · Nov 26, 2012

Fredrik said:

How am I "already imposing linearity"?

The assumption of a spacetime that is globally ℝ⁴ (not just locally, which is the weaker assumption) means your underlying geometry is flat (Minkowskian, Euclidean), do you agree?
Given that space, the transformations that leave inertial coordinates invariant in the sense of the first postulate of SR must automatically be linear transformations, do you agree? Maybe this is not as obvious as I think, but I think it is correct.

Fredrik said:

How is that a correction? It seems like an unrelated statement.

Well, it just seemed important to make it more precise that the isomorphism you were talking about is local.

Fredrik · Nov 26, 2012

Blackforest said:

But all this concerns only special relativity.

And pre-relativistic classical mechanics. It concerns all theories with ℝ⁴ as spacetime. I think it's pretty cool that there are only two such theories that are consistent with a straightforward interpretation of the principle of relativity.

Blackforest said:

Where do you see that the question asked by the OP (and recalled by friend) is imposing linearity? For me it only imposes Christoffel's work; see the other discussion "O-S model of star collapse" post 109, Foundations of the GTR by A. Einstein and translated by Bose, [793], (25).

Someone who tries to argue that a transformation that satisfies the OP's condition must be a Lorentz transformation has probably already assumed that spacetime is ℝ⁴, and that the theory will involve global (i.e. defined on all of spacetime) inertial coordinate systems. That a transformation between two global inertial coordinate systems is a bijection and takes straight lines to straight lines is just a consequence of the definition of "global inertial coordinate system". The 4-dimensional version of the theorem I stated and proved in #98 shows that a bijection that takes straight lines to straight lines is affine (i.e. a composition of a linear map and a translation). So when we begin to consider the OP's condition, it's already a matter of determining which affine maps satisfy it. And the condition implies that 0 is taken to 0, so there's no translation involved, i.e. the transformation is linear.

Blackforest said:

My impression (perhaps false) is that SR is based on a coherent but circular way of thinking including "linearity" for easy understandable historical reasons.

I don't think there's anything circular about it. It's perhaps naive to think that we should be able to use ℝ⁴ as our spacetime, and talk about global inertial coordinate systems. But it makes sense to first find all such theories, and then ask what other theories are worth considering. I might take a look at that problem when I have worked out all the details of the ℝ⁴ case.

Fredrik · Nov 26, 2012

TrickyDicky said:

The assumption of a spacetime that is globally ℝ⁴ (not just locally, which is the weaker assumption) means your underlying geometry is flat (Minkowskian, Euclidean), do you agree?

I don't agree. We don't have a geometry at that stage, because until we have chosen an inner product (or something similar), ℝ⁴ is just a set. (And in the case of Galilean transformations, we will never define anything like an inner product on ℝ⁴). The lines that we call "straight" are straight in the Euclidean sense, but we're not considering them because they're straight in the Euclidean sense, but because they describe motion with a constant velocity. We don't need an inner product to see that they do.

TrickyDicky said:

Given that space, the transformations that leave inertial coordinates invariant in the sense of the first postulate of SR must automatically be linear transformations, do you agree? Maybe this is not as obvious as I think, but I think it is correct.

They must automatically be affine maps, but it takes a non-trivial theorem* to see that, and you specifically said that there's no need to prove that theorem.

*) This theorem is essentially "the fundamental theorem of affine geometry", stated in terms of vector spaces instead of affine spaces.

TrickyDicky said:

Well, it just seemed important to make it more precise that the isomorphism you were talking about is local.

But it's not. This is the 1+1-dimensional version of what I said, with all the details made explicit: For each K>0, the group ##G_K=\{\Lambda(v)|v\in (-c,c)\}##, where ##c=1/\sqrt{K}## and
$$\Lambda(v)=\frac{1}{\sqrt{1-Kv^2}}\begin{pmatrix}1 & -Kv\\ -v & 1\end{pmatrix}$$ is isomorphic to the restricted Lorentz group.

There's nothing local about this. In fact, when K=1, this group is the restricted Lorentz group, and the isomorphism is the identity map.
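The closure and isomorphism claims can be checked with exact arithmetic. A minimal sketch, assuming the matrices act on column vectors (t, x), with a hypothetical K = 1/4 (so c = 2) and velocities chosen so that every square root is rational. The velocity-addition rule Λ(u)Λ(v) = Λ((u+v)/(1+Kuv)) and the conjugating matrix S = diag(1/√K, 1), which amounts to using t/√K = ct as the time coordinate, are standard facts supplied here rather than taken from the post.

```python
from fractions import Fraction as Fr
from math import isqrt

def exact_sqrt(q):
    """Square root of a non-negative Fraction whose numerator and denominator are squares."""
    n, d = isqrt(q.numerator), isqrt(q.denominator)
    if n * n != q.numerator or d * d != q.denominator:
        raise ValueError(f"{q} is not a rational square")
    return Fr(n, d)

def lam(v, K):
    """Lambda(v) = (1/sqrt(1 - K v^2)) [[1, -K v], [-v, 1]]."""
    gam = 1 / exact_sqrt(1 - K * v * v)
    return [[gam, -gam * K * v], [-gam * v, gam]]

def matmul(A, B):
    """Product of two 2x2 matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

K = Fr(1, 4)                   # hypothetical; c = 1/sqrt(K) = 2
u, v = Fr(8, 5), Fr(6, 5)      # both in (-c, c)
w = (u + v) / (1 + K * u * v)  # velocity addition, here 70/37, again in (-c, c)

# Closure: the product of two elements of G_K is the element with velocity w.
print(matmul(lam(u, K), lam(v, K)) == lam(w, K), w)

# Isomorphism onto the K = 1 group: S lam(v, K) S^-1 == lam(sqrt(K) v, 1).
rK = exact_sqrt(K)
S = [[1 / rK, 0], [0, 1]]
S_inv = [[rK, 0], [0, 1]]
print(matmul(matmul(S, lam(v, K)), S_inv) == lam(rK * v, Fr(1)))
```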

TrickyDicky · Nov 26, 2012

Fredrik said:

I don't agree. We don't have a geometry at that stage, because until we have chosen an inner product (or something similar), ℝ⁴ is just a set. .

Sorry, aren't we assuming inner product spaces? How can we even talk about transformation matrices otherwise?

Fredrik said:

But it's not.

Well, it's not with your assumption of flat inner product space, but if you consider general manifolds the restricted Lorentz group is locally isomorphic to the Lorentz group.

Fredrik · Nov 26, 2012

TrickyDicky said:

Sorry, aren't we assuming inner product spaces? How can we even talk about transformation matrices otherwise?

I'm not even mentioning matrices until later in the argument, after I've determined that we're dealing with linear operators. To associate a matrix with a linear operator, we only need a basis.

TrickyDicky · Nov 26, 2012

You wanted to prove that linear transformations are the only ones possible if one wants to use the first postulate of SR rigorously. You bring in an ℝ⁴ vector space because you consider natural the assumption that the space must be globally ℝ⁴, not just locally like in general manifolds, and in this space you need to perform matrix multiplications like ##T(x)=\Lambda x##. That looks like a matrix product to me, so we are starting with an ℝ⁴ vector space with an inner product structure, no? That is called a Euclidean structure IMO.

Samshorn · Nov 26, 2012

Here's a web page that talks about how Einstein and others justified the linearity of the transformations, and the extra assumptions necessary to exclude linear fractional transformations: http://www.mathpages.com/home/kmath659/kmath659.htm

Fredrik · Nov 26, 2012

TrickyDicky said:

You wanted to prove that linear transformations are the only ones possible if one wants to use the first postulate of SR rigorously. You bring in an ℝ⁴ vector space because you consider natural the assumption that the space must be globally ℝ⁴, not just locally like in general manifolds, and in this space you need to perform matrix multiplications like ##T(x)=\Lambda x##. That looks like a matrix product to me, so we are starting with an ℝ⁴ vector space with an inner product structure, no? That is called a Euclidean structure IMO.

I'm not using the principle of relativity to prove that they're linear. The notation ##T(x)=\Lambda x+a## doesn't mean that ##\Lambda## is a matrix at this point. It only means that I'm using the standard convention of not writing out parentheses when the map is known to be linear. We don't need an inner product to associate matrices with linear operators. We only need a basis for that. If U and V are vector spaces with bases ##A=\{u_i\}## and ##B=\{v_i\}## respectively, then the ij component of ##T:U\to V## with respect to the pair of bases (A,B) is defined as ##(Tu_j)_i##. The matrix associated with T (and the pair (A,B)) has ##(Tu_j)_i## (the ith component of ##Tu_j## with respect to B) on row i, column j.
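A minimal sketch of this definition, assuming ℝ² with two arbitrarily chosen, non-orthonormal bases and a hypothetical operator T given as a plain function rather than a matrix. The components of ##Tu_j## with respect to B are found by solving a linear system (Cramer's rule here), so no inner product is used anywhere:

```python
from fractions import Fraction as F

def T(w):
    # Hypothetical linear operator on R^2, written as a function, not as a matrix.
    return (3 * w[0] + w[1], w[0] - 2 * w[1])

def coords(w, basis):
    # Components (a1, a2) of w with respect to {v1, v2}: solve a1*v1 + a2*v2 = w
    # by Cramer's rule. Only determinants appear; no inner product is needed.
    (p, q), (r, s) = basis  # v1 = (p, q), v2 = (r, s)
    det = p * s - r * q
    if det == 0:
        raise ValueError("vectors do not form a basis")
    return (F(w[0] * s - r * w[1], det), F(p * w[1] - w[0] * q, det))

def matrix_of(T, A, B):
    # Row i, column j holds (T u_j)_i, the i-th component of T(u_j) w.r.t. B.
    cols = [coords(T(u), B) for u in A]
    return [[cols[j][i] for j in range(len(A))] for i in range(len(B))]

A = [(1, 0), (1, 1)]  # basis of the domain (deliberately not orthonormal)
B = [(2, 1), (0, 1)]  # basis of the codomain (deliberately not orthonormal)
M = matrix_of(T, A, B)
print(M)
```

Here M comes out as [[3/2, 2], [-1/2, -3]]. Choosing different bases gives a different matrix for the same operator T.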

* Spacetime is a structure with underlying set M.
* We intend to use curves in M to represent motion.
* There's a special set of curves in M that we can use to represent the motion of non-accelerating objects.
* M can be bijectively mapped onto ℝ⁴.
* A coordinate system on a subset ##U\subset M## is an injective map from U into ℝ⁴.
* A global coordinate system on M is a coordinate system with domain M.
* A global inertial coordinate system is a global coordinate system that takes the curves that represent non-accelerating motion to straight lines.
* If x and y are global coordinate systems, then ##x\circ y^{-1}## represents a change of coordinates. I call these functions coordinate transformations. When both x and y are global inertial coordinate systems, I call ##x\circ y^{-1}## an inertial coordinate transformation. (I'm getting tired of saying "global" all the time).
* These definitions imply that an inertial coordinate transformation is a bijection that takes straight lines to straight lines.
* The fundamental theorem of affine geometry tells us that this implies that inertial coordinate transformations are affine maps.
* This implies that an inertial coordinate transformation that takes 0 to 0 is linear.
* The principle of relativity tells us among other things that the set of inertial coordinate transformations is a group.
* This group has a subgroup G that consists of the proper and orthochronous inertial coordinate transformations that take 0 to 0.
* We interpret the principle of relativity as imposing a number of other conditions on G.
* Since the members of G are linear (we know this because they are affine and take 0 to 0), we can write an arbitrary member of G as a matrix. (This requires only a basis, not an inner product, and all vector spaces have a basis).
* The conditions inspired by the principle of relativity determine a bunch of relationships between the components of that matrix.
* Those relationships tell us that the group is either the restricted Galilean group without translations, or isomorphic to the restricted Lorentz group. (Restricted = proper and orthochronous).
* This implies that the group of all inertial coordinate transformations is either the Galilean group or the Poincaré group.
* We therefore define spacetime as a structure that has ℝ⁴ as the underlying set, and somehow singles out exactly one of these two groups as "special".
* A nice way to define a structure that singles out the Poincaré group is to define spacetime as the pair (ℝ⁴,g), where g is a Lorentzian metric whose isometry group is the Poincaré group.
* There's no equally nice way to handle the Galilean case. I think we either have to define spacetime as (ℝ⁴,G,g), where G is the Galilean group and g is the metric on "space", or define it as a fiber bundle (an ℝ³ bundle over ℝ, where each copy of ℝ³ is equipped with the Euclidean inner product). The former option is ugly. The latter is difficult to understand, unless you already understand fiber bundles of course.
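Restricted to boosts along one axis in 1+1 dimensions, with c = 1, a few of the group-theoretic claims in this list can be checked numerically. This is only a sketch, not the derivation outlined above: it checks that each family of boosts is closed under composition (Lorentz velocities combine as ##(u+v)/(1+uv)##, Galilean ones as ##u+v##) and that only the Lorentz boosts satisfy ##\Lambda^T\eta\Lambda=\eta## with ##\eta=\mathrm{diag}(1,-1)##:

```python
import math

def lorentz(v):
    # Homogeneous Lorentz boost acting on (t, x), units with c = 1, |v| < 1.
    g = 1.0 / math.sqrt(1.0 - v * v)
    return [[g, -g * v], [-g * v, g]]

def galilei(v):
    # Homogeneous Galilean boost: t' = t, x' = x - v t.
    return [[1.0, 0.0], [-v, 1.0]]

def matmul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def transpose(P):
    return [[P[j][i] for j in range(2)] for i in range(2)]

def close(P, Q, tol=1e-12):
    return all(math.isclose(P[i][j], Q[i][j], abs_tol=tol) for i in range(2) for j in range(2))

eta = [[1.0, 0.0], [0.0, -1.0]]  # Minkowski metric diag(1, -1)
u, v = 0.6, 0.5

# Closure: composing two boosts gives another boost from the same family.
print(close(matmul(lorentz(u), lorentz(v)), lorentz((u + v) / (1 + u * v))))  # True
print(close(matmul(galilei(u), galilei(v)), galilei(u + v)))                  # True

# Lorentz boosts preserve eta; Galilean boosts do not.
print(close(matmul(transpose(lorentz(v)), matmul(eta, lorentz(v))), eta))     # True
print(close(matmul(transpose(galilei(v)), matmul(eta, galilei(v))), eta))     # False
```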

friend · Nov 26, 2012

I'm given to understand that

dτ²=dt²-dx² = dt'²-dx'²

when (t',x') are the Lorentz transformation of (t,x).

Perhaps it's instructive to consider under what circumstances we would want dτ to be invariant under coordinate changes. Maybe those requirements are the driving force behind the necessity of the Lorentz transformations.

For example, the most obvious use of dτ is in the calculation of the line integral,

$$\int_{\tau_0}^{\tau} d\tau' = \tau - \tau_0,$$
which is the length of a line measured in terms of segments marked off along the line. Then, of course, we can always place this line in an arbitrarily oriented coordinate system and express τ in terms of those coordinates.

So the question is, when do we want to use the coordinates (t,x), and when would we want τ-τ₀ to be invariant under changes of those coordinates?

Usually, we specify a curve in space by parameterizing the space coordinates with an arbitrary variable, call it "t". But since the x and t coordinates are arbitrarily assigned, the length of the curve can depend on the (t,x) coordinates. But if you specify that the length of the curve is invariant, then this requires the Lorentz transformations between coordinate systems.
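A numerical sketch of this point, assuming c = 1 and a hypothetical accelerated worldline ##x(t)=0.5\sin t##. The proper time ##\int\sqrt{dt^2-dx^2}## is approximated by summing over short straight segments, once in the original coordinates and once after a boost with v = 0.3. A Lorentz boost preserves ##dt^2-dx^2## for every segment, so the two sums agree up to rounding; a Galilean boost changes the result:

```python
import math

def lorentz_event(t, x, v):
    # Lorentz boost of a single event (c = 1).
    g = 1.0 / math.sqrt(1.0 - v * v)
    return g * (t - v * x), g * (x - v * t)

def galilei_event(t, x, v):
    # Galilean boost of a single event.
    return t, x - v * t

def proper_time(events):
    # Polygonal approximation of the line integral of d(tau) = sqrt(dt^2 - dx^2).
    total = 0.0
    for (t0, x0), (t1, x1) in zip(events, events[1:]):
        dt, dx = t1 - t0, x1 - x0
        total += math.sqrt(dt * dt - dx * dx)  # assumes every segment is timelike
    return total

# Hypothetical worldline x(t) = 0.5 sin t, sampled on 0 <= t <= 10.
N = 10_000
events = [(10.0 * k / N, 0.5 * math.sin(10.0 * k / N)) for k in range(N + 1)]
v = 0.3

tau = proper_time(events)
tau_lorentz = proper_time([lorentz_event(t, x, v) for t, x in events])
tau_galilei = proper_time([galilei_event(t, x, v) for t, x in events])
print(tau, tau_lorentz, tau_galilei)
```

The first two numbers agree to roughly machine precision, while the Galilean value differs noticeably. This only shows that Lorentz boosts preserve the integral; it doesn't by itself show that they are the only transformations that do.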

But what requires the length of the curve to be invariant? Perhaps if we have a more fundamental requirement like

$$\int_{\tau_0}^{\tau} f(\tau' - \tau_0)\,d\tau' = a,$$
this will require the length τ-τ₀ to be invariant under coordinate changes in (t,x). For example, ##f## might be a probability distribution along a path, so that its integral along the path must be 1 in any coordinate system.

Did I get this all right? I would appreciate comments. Thank you.

micromass · Nov 26, 2012

TrickyDicky said:

You wanted to prove that linear transformations are the only ones possible if one wants to use the first postulate of SR rigorously. You bring in an ℝ⁴ vector space because you consider it natural to assume that the space must be globally ℝ⁴, not just locally as with general manifolds, and in this space you need to perform matrix multiplications like ##T(x)=\Lambda x##. That looks like a matrix product to me, so we are starting with an ℝ⁴ vector space with an inner product structure, no? That is called a Euclidean structure IMO.

Why do you think we need inner products to define matrix products??

TrickyDicky · Nov 26, 2012

micromass said:

Why do you think we need inner products to define matrix products??

No, it's not needed. I thought Fredrik was assuming Euclidean geometry, but he wasn't.

Fredrik · Nov 27, 2012

Fredrik said:

* Spacetime is a structure with underlying set M.
* We intend to use curves in M to represent motion.
* There's a special set of curves in M that we can use to represent the motion of non-accelerating objects.
* M can be bijectively mapped onto ℝ⁴.
* A coordinate system on a subset ##U\subset M## is an injective map from U into ℝ⁴.
* A global coordinate system on M is a coordinate system with domain M.
* A global inertial coordinate system is a global coordinate system that takes the curves that represent non-accelerating motion to straight lines.
* If x and y are global coordinate systems, then ##x\circ y^{-1}## represents a change of coordinates. I call these functions coordinate transformations. When both x and y are global inertial coordinate systems, I call ##x\circ y^{-1}## an inertial coordinate transformation. (I'm getting tired of saying "global" all the time).
* These definitions imply that an inertial coordinate transformation is a bijection that takes straight lines to straight lines.

I have some concerns about this part. Maybe there is some circularity in the argument after all. It doesn't seem obvious* that the "special" curves in spacetime that represent non-accelerated motion should include curves that correspond to infinite speed in some inertial coordinate system. If we leave them out, then what I call an inertial coordinate transformation will be a map that takes finite-speed straight lines to finite-speed straight lines. Of course, inertial coordinate transformations in SR (i.e. Poincaré transformations) can take infinite-speed lines to finite-speed lines and vice versa. If inertial coordinate transformations can't do this, there's no relativity of simultaneity. So if we leave out the infinite-speed lines from the start, we will come to the conclusion that there's only one possibility: The group is the Galilean group. (Hm, maybe there will actually be infinitely many possibilities, distinguished by what exactly they're doing to infinite-speed lines).

Do we have a reason to include infinite-speed lines other than that we know what we want the final answer to be?

*) Recall that the main reason why we need spacetime to include that special set of curves is that they (or at least some of them) are to represent the motions of "observers" that are minimally disturbed by what's being done to them. (An "observer" here is not necessarily conscious. It could be a measuring device).
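A sketch of the behaviour described in this post, assuming c = 1 and v = 3/5 (chosen so that γ = 5/4 is rational and exact arithmetic can be used). The Lorentz boost is linear, yet it maps the infinite-speed line t = 0 to a line with the finite speed ##-1/v## in the new coordinates, while the Galilean boost maps it to another infinite-speed line:

```python
from fractions import Fraction as F

v = F(3, 5)
g = F(5, 4)  # gamma for v = 3/5, since (5/4)^2 * (1 - 9/25) = 1
assert g * g * (1 - v * v) == 1

def lorentz(t, x):
    # Linear Lorentz boost, c = 1: t' = g (t - v x), x' = g (x - v t).
    return g * (t - v * x), g * (x - v * t)

def galilei(t, x):
    # Galilean boost: t' = t, x' = x - v t.
    return t, x - v * t

def speed_of_image(boost, p, q):
    # Speed dx'/dt' of the straight line through the images of events p and q.
    # None means infinite speed: the image events are simultaneous (dt' = 0).
    (t0, x0), (t1, x1) = boost(*p), boost(*q)
    return None if t1 == t0 else (x1 - x0) / (t1 - t0)

# Two events on the line t = 0, an infinite-speed line in the unprimed coordinates.
p, q = (F(0), F(0)), (F(0), F(1))
print(speed_of_image(lorentz, p, q))  # -5/3: a finite (faster-than-light) speed
print(speed_of_image(galilei, p, q))  # None: still simultaneous

# The Lorentz boost is nevertheless linear: T(a e1 + b e2) = a T(e1) + b T(e2).
a, b = F(2), F(-7, 3)
e1, e2 = (F(1), F(4)), (F(-2), F(5, 2))
lhs = lorentz(a * e1[0] + b * e2[0], a * e1[1] + b * e2[1])
rhs = tuple(a * s + b * w for s, w in zip(lorentz(*e1), lorentz(*e2)))
print(lhs == rhs)  # True
```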

strangerep · Nov 27, 2012

Fredrik said:

Do we have a reason to include infinite-speed lines other than that we know what we want the final answer to be?

*) Recall that the main reason why we need spacetime to include that special set of curves is that they (or at least some of them) are to represent the motions of "observers" that are minimally disturbed by what's being done to them. (An "observer" here is not necessarily conscious. It could be a measuring device).

This sort of thing is one reason why I prefer to start from inertial observers defined as those that feel no acceleration. If one finds the maximal dynamical group applicable to the zero-acceleration equations of motion, the problematic case you mentioned can be handled by taking a limit afterwards.

TrickyDicky · Nov 27, 2012

Fredrik said:

I have some concerns about this part. Maybe there is some circularity in the argument after all. It doesn't seem obvious* that the "special" curves in spacetime that represent non-accelerated motion should include curves that correspond to infinite speed in some inertial coordinate system. If we leave them out, then what I call an inertial coordinate transformation will be a map that takes finite-speed straight lines to finite-speed straight lines. Of course, inertial coordinate transformations in SR (i.e. Poincaré transformations) can take infinite-speed lines to finite-speed lines and vice versa. If inertial coordinate transformations can't do this, there's no relativity of simultaneity. So if we leave out the infinite-speed lines from the start, we will come to the conclusion that there's only one possibility: The group is the Galilean group. (Hm, maybe there will actually be infinitely many possibilities, distinguished by what exactly they're doing to infinite-speed lines).

Do we have a reason to include infinite-speed lines other than that we know what we want the final answer to be?

*) Recall that the main reason why we need spacetime to include that special set of curves is that they (or at least some of them) are to represent the motions of "observers" that are minimally disturbed by what's being done to them. (An "observer" here is not necessarily conscious. It could be a measuring device).

Why do you think relativity of simultaneity implies nonlinear transformations (taking finite to infinite coordinates and vice versa)?

AFAIK RoS has always been explained with the usual linear Lorentz transformations.
