## Showing that Lorentz transformations are the only ones possible

It is always assumed that the transformation is linear (at least if the origin is mapped to the origin, otherwise affine). But what is the physical reason for this assumption?

 Quote by bcrowell You can't have a be anything but gamma, because v is defined by the action of the LT on the positive t axis. ... Actually this only rules out $-1 < a < 1$. What rules out all values of a<1 is the definition of v.
Ben, I think we're talking across each other so I'll let it go now.

 Quote by bcrowell Fredrik's calculation is unnecessarily complicated.
In what way? What part of it can be simplified?

 Quote by Erland It is always assumed that the transformation is linear (at least if the origin is mapped to the origin, otherwise affine). But what is the physical reason for this assumption?
The idea is that for each inertial (=non-accelerating) observer, there's a coordinate system in which the observer's own motion is described by the time axis, and the motion of any non-accelerating object is described by a straight line. So a function that changes coordinates from one of these coordinate systems to another must take straight lines to straight lines.
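As a quick numeric illustration of this (my own sketch, not from the post, assuming the standard 1+1D Lorentz boost with c = 1): a boost takes the straight worldline x = ut to another straight worldline.

```python
import numpy as np

def boost(v):
    """Standard 1+1D Lorentz boost matrix acting on (t, x), with c = 1."""
    g = 1.0 / np.sqrt(1.0 - v**2)
    return np.array([[g, -g*v],
                     [-g*v, g]])

# Worldline of an object moving at u = 0.3: events (t, x) = (t, u*t)
u = 0.3
ts = np.array([0.0, 1.0, 2.0, 3.0])
events = np.stack([ts, u*ts], axis=1)      # one event per row

# Transform every event with a boost of velocity v = 0.5
imgs = events @ boost(0.5).T

# The images should still lie on a straight line through the origin,
# i.e. x'/t' must be the same for every event with t' != 0
slopes = imgs[1:, 1] / imgs[1:, 0]
assert np.allclose(slopes, slopes[0])
print("transformed velocity:", slopes[0])
```

The common slope reproduces the relativistic velocity-addition value (u - v)/(1 - uv), as it must.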

 Quote by Fredrik The idea is that for each inertial (=non-accelerating) observer, there's a coordinate system in which the observer's own motion is described by the time axis, and the motion of any non-accelerating object is described by a straight line. So a function that changes coordinates from one of these coordinate systems to another must take straight lines to straight lines.
Hmm, are you saying something like this: a map between vector spaces that takes lines to lines must be linear, or affine?
Well, that's certainly not true in one dimension, where the map f(x)=x^3 maps the entire line onto itself without being linear or affine.
But perhaps in higher dimensions...? Is there a theorem of this kind?
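Erland's 1D counterexample is easy to check numerically (a throwaway sketch of my own, not from the thread): f(x) = x³ is a strictly increasing bijection of the real line onto itself, so it trivially "takes the line to the line", yet it is neither linear nor affine.

```python
f = lambda x: x**3

# Strictly increasing on a sample grid, hence a bijection of the line
xs = [x/10 for x in range(-20, 21)]
assert all(f(a) < f(b) for a, b in zip(xs, xs[1:]))

# But not affine: an affine map g has g(x+h) - g(x) independent of x
assert f(1 + 1) - f(1) != f(0 + 1) - f(0)   # 7 != 1
print("f is monotone (hence onto the line) but not affine")
```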

 Quote by bob900 In a book ("The special theory of relativity by David Bohm") that I'm reading, it says that if (x,y,z,t) are coordinates in frame A, and (x',y',z',t') are coordinates in frame B moving with v in relation to A, if we have (for a spherical wavefront) $c^2t^2 - x^2 - y^2 - z^2 = 0$ and we require that in frame B, $c^2t'^2 - x'^2 - y'^2 - z'^2 = 0$ then it can be shown that the only possible transformations (x,y,z,t) -> (x',y',z',t') which leave the above relationship invariant are the Lorentz transformations (aside from rotations and reflections). I'm wondering how exactly can this be shown?
I don't think Bohm said this! The Lorentz group is a subgroup of a bigger group called the conformal group. It is the conformal group that preserves the light-cone structure.

Sam
Didn't the paper that Ben mentioned in another thread, http://arxiv.org/abs/physics/0302045, go through all this? The assumptions that paper made were (skimming):
* replacing v with -v must invert the transform
* isotropy
* homogeneity of space and time

with a few tricks along the way:
* adding a third frame
* noting that x=vt implies x'=0

The result was pretty much that there must be some invariant velocity that is the same for all observers. (There were some arguments about the sign of a constant before this, to establish that it was positive.) The remaining step is to identify this with the speed of light.

 Quote by bob900 So if given just the following pieces of information : 1. $c^2 t^2 - x^2 - y^2 - z^2 = 0$ 2. $c^2 t'^2 - x'^2 - y'^2 - z'^2 = 0$ is it "difficult" or actually impossible to show that the Lorentz transformation is the only possibility (aside from rotation $x^2+y^2+z^2=x'^2+y'^2+z'^2$ and t=t', and reflection x=-x', t=-t', etc.)? That I know how to do - what I'm trying to see is if the book is wrong in saying that you only need 1 and 2 above. Here's a quote from the book :
Now Bohm is making sense.

see post #9 in

 Quote by Erland It is always assumed that the transformation is linear (at least if the origin is mapped to the origin, otherwise affine). But what is the physical reason for this assumption?
The most common reason is so-called homogeneity of space and time. By this, the authors mean that position-dependent (and time-dependent) dilations (scale changes) are ruled out arbitrarily.

Personally, I prefer a different definition of spacetime homogeneity: i.e., that it should look the same wherever and whenever you are. IOW, it must be a space of constant curvature.
This includes such things as de Sitter spacetime, and admits a larger class of possibilities.

But another way that various authors reach the linearity assumption is to start with the most general transformations preserving inertial motion, which are fractional-linear transformations. (These are the most general transformations which map straight lines to straight lines -- see note #1.) They then demand that the transformations must be well-defined everywhere, which forces the denominator in the FL transformations to be restricted to a constant, leaving us with affine transformations.
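To make this concrete (notation mine, not from the post): the fractional-linear transformations have the form
$$x'^{\mu} ~=~ \frac{A^{\mu}{}_{\nu}\, x^{\nu} + b^{\mu}}{c_{\nu}\, x^{\nu} + d} ~,$$
and demanding that the map be well-defined for every ##x## forces ##c_{\nu}=0## (otherwise the denominator vanishes on the whole hyperplane ##c_{\nu}x^{\nu}=-d##), leaving the affine map ##x'^{\mu} = (A^{\mu}{}_{\nu}\,x^{\nu} + b^{\mu})/d##.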

In the light of modern cosmology, these arbitrary restrictions are becoming questionable.

--------
Note #1: a simpler version of Fock's proof can be found in Appendix B of this paper:
http://arxiv.org/abs/gr-qc/0703078 by Guo et al.

An even simpler proof for the case of 1+1D can also be found in Appendix 1 of this paper:
http://arxiv.org/abs/physics/9909009 by Stepanov. (Take the main body of this paper with a large grain of salt, but his Appendix 1 seems to be ok, though it still needs the reader to fill in some of the steps -- speaking from personal experience. :-)

 Quote by Erland Hmm, are you saying something like that a map between vector spaces that takes lines to lines must be linear, or affine? Well, that's certainly not true in one dimension, where the map f(x)=x^3 maps the entire line onto itself without being linear, or affine. But perhaps in higher dimensions...? Is there a theorem of this kind?
The only book I know that suggests there is such a theorem left the proof as an exercise. I tried to prove it a couple of years ago, but got stuck and put it aside. I just tried again, and I still don't see how to do it. It's pretty annoying.

Three distinct vectors x,y,z are said to be collinear if they're on the same straight line. So x,y,z are collinear if and only if they're all different and there's a number a such that ##z=x+a(y-x)##, right? Note that the right-hand side is ##(1-a)x+ay##. So three vectors are collinear if and only if they're all different and (any) one of them can be expressed as this special type of linear combination of the other two.

A transformation ##T:U\to V## is said to preserve collinearity if for all collinear x,y,z in U, Tx,Ty,Tz are collinear.

It's trivial to prove that linear maps preserve collinearity. Since ##T(ax+by)=aTx+bTy## for all a,b, we have ##T((1-a)x+ay)=(1-a)Tx+aTy## for all a.

I still haven't been able to prove that if T preserves collinearity, T is linear. Suppose that T preserves collinearity. Let x,y be arbitrary vectors and a,b arbitrary numbers. One idea I had was to rewrite ##T(ax+by)=T(ax+(1-a)z)##. All I have to do is to define ##z=by/(1-a)##. But this is a lot less rewarding than I hoped. All we can say now is that there's a number c such that
$$T(ax+by)=cTx+(1-c)Tz =cTx+(1-c)T\left(\frac{by}{1-a}\right).$$ The fact that we can't even carry the numbers a,b over to the right-hand side is especially troubling. I don't know, maybe I've misunderstood a definition or something.
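The trivial direction is easy to confirm numerically (a quick sketch of my own, with arbitrary random vectors and an arbitrary linear map):

```python
import numpy as np

rng = np.random.default_rng(0)

# Three distinct collinear vectors in R^3: z = (1-a)x + a*y
x = rng.standard_normal(3)
y = rng.standard_normal(3)
a = 0.7
z = (1 - a)*x + a*y

# An arbitrary linear map T, represented by a 3x3 matrix
T = rng.standard_normal((3, 3))

# Tz equals (1-a)Tx + a*Ty, so Tx, Ty, Tz are again collinear
assert np.allclose(T @ z, (1 - a)*(T @ x) + a*(T @ y))
print("linear maps preserve collinearity (numeric check)")
```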

The book I'm talking about is "Functional analysis: Spectral theory" by Sunder. It can be downloaded legally from the author's web page. Scroll down to the first horizontal line to find the download link. See exercise 1.3.1 (2) on page 9 (in the pdf, it may be on another page in the actual book). Edit: Direct link to the pdf.

 Quote by Fredrik The only book I know that suggests that there is such a theorem left the proof as an exercise. I tried to prove it a couple of years ago, but got stuck and put it aside. I just tried again, and I still don't see how to do it. It's pretty annoying. [...]
I guess you didn't see my previous post #27, huh? :-)

 Quote by strangerep I guess you didn't see my previous post #27, huh? :-)
Not until after I posted. I'm checking out those appendices now. I guess Sunder's exercise is just wrong then. No wonder I found it so hard to solve.

 Quote by Fredrik I guess Sunder's exercise is just wrong then.
He's restricting himself to the case of a vector space and linear transformations between them. But the more general case involves differentiable coordinate transformations on a more general manifold -- which is a different problem.

Edit: looking at his exercise, I think he means "##x,y,z## in ##V##", meaning that ##x,y,z## are vectors in ##V##. So the "straight line" also includes the origin. That makes his exercise almost trivial because "being on a straight line" means that the vectors are all simple multiples of each other (i.e., they're on the same ray), and linear transformations preserve this.

But this is somewhat tangential to the current issue since in relativity we want something more general which preserves continuous inertial motion.

 Quote by strangerep He's restricting himself to the case of a vector space and linear transformations between them. But the more general case involves differentiable coordinate transformations on a more general manifold -- which is a different problem. Edit: looking at his exercise, I think he means "##x,y,z## in ##V##", meaning that ##x,y,z## are vectors in ##V##. So the "straight line" also includes the origin.
Exercise 1.3.1 (2) is asking the reader to prove that if T (defined on ##\mathbb R^n##) takes straight lines to straight lines, then T is linear. The exercise also says something about mapping the domain onto W, but W is not defined. If he meant that W is the domain, he's also assuming that T is surjective.

I think he just meant that x,y,z are on the same line, not that they're all on the same line through the origin.

 Quote by strangerep Note #1: a simpler version of Fock's proof can be found in Appendix B of this paper: http://arxiv.org/abs/gr-qc/0703078 by Guo et al. An even simpler proof for the case of 1+1D can also be found in Appendix 1 of this paper: http://arxiv.org/abs/physics/9909009 by Stepanov. (Take the main body of this paper with a large grain of salt, but his Appendix 1 seems to be ok, though it still needs the reader to fill in some of the steps -- speaking from personal experience. :-)
I started reading this, but so far I don't understand any of it. In the first one, the first thing the authors say after the word "Proof:" makes absolutely no sense to me. I don't understand anything in the first equation. I don't even understand if he's multiplying numbers with vectors (in that case, why does the last term look like a number?) or if it's a function taking a vector as input. It never ceases to amaze me how badly written published articles can be.

In the second one, I apply the chain rule to ∂f/∂t' and there appears a factor of ∂x/∂t' that I don't see how to deal with, so I don't understand (35). I guess I need to refresh my memory about partial derivatives of multivariable inverses.

 Quote by Fredrik I started reading this [Guo?], but so far I don't understand any of it. In the first one, the first thing the authors say after the word "Proof:" makes absolutely no sense to me. I don't understand anything in the first equation. I don't even understand if he's multiplying numbers with vectors (in that case, why does the last term look like a number?) or if it's a function taking a vector as input. It never ceases to amaze me how badly written published articles can be.
Yeah, it took me several months (elapsed time) before I understood what's going on here. You should see the original version in Fock's textbook -- it's even more obscure.

The crucial idea here is that the straight line is being parameterized in terms of an arbitrary real ##\lambda##. Also think of ##x_0^i## as an arbitrary point on the line so that ##\lambda## and ##v^i## generate the whole line. Then they adopt a confusing notation that ##x## is an abbreviation for the 3-vector with components ##x^i##. Using a bold font would have been more helpful.

But persevering with their notation, ##x = x(\lambda) = x_0 + \lambda v##. Since we want the transformed ##x'^{\,i}## to be a straight line also, in general parameterized by a different ##\lambda'## and ##v'##, we can write
$$x'^{\,i}(x) ~=~ x'^{\,i}(x_0) ~+~ \lambda'(\lambda) \, v'^{\,i}$$
where the first term on the RHS is to be understood as what ##x_0## is mapped into. I.e., think of ##x'^{\,i}## as a mapping. It might have been more transparent if they'd written ##x'^{\,i}_0## and then explained why this can be expressed as ##x'^{\,i}(x_0)##.

Confusing? Yes, I know that only too well. I guess it becomes second nature when one is working in this way all the time. Fock also does a lot of this sort of thing.

 In the second one [Stepanov], I apply the chain rule to ∂f/∂t' and there appears a factor of ∂x/∂t' that I don't see how to deal with, so I don't understand (35).
Denoting partial derivatives by suffices in the same way as Stepanov does,
$$dx' = df = f_x dx + f_t dt = (f_x u + f_t) dt ~;~~~~~~ dt' = dg = g_x dx + g_t dt = (g_x u + g_t) dt ~;$$
and so Stepanov's (35) is obtained by
$$u' ~=~ dx'/dt' ~=~ \frac{f_x u + f_t}{g_x u + g_t} ~.$$
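A quick numeric sanity check of this formula (my own sketch, using the 1+1D Lorentz boost with c = 1 as a concrete choice of f and g):

```python
import math

# Take x' = f(x, t) = gamma*(x - v*t)  and  t' = g(x, t) = gamma*(t - v*x)
v = 0.5
gamma = 1.0 / math.sqrt(1.0 - v**2)
f_x, f_t = gamma, -gamma*v          # partial derivatives of f
g_x, g_t = -gamma*v, gamma          # partial derivatives of g

u = 0.3                             # velocity in the unprimed frame
u_prime = (f_x*u + f_t) / (g_x*u + g_t)

# Stepanov's (35) should reproduce the relativistic velocity-addition rule
assert math.isclose(u_prime, (u - v)/(1 - u*v))
print("u' =", u_prime)
```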

[Edit: I have more detailed writeups of both proofs where I try to fill in some of these gaps, but they're in ordinary latex, not PF latex. If you get stuck, I could maybe post a pdf.]

 Quote by strangerep Yeah, it took me several months (elapsed time) before I understood what's going on here. You should see the original version in Fock's textbook -- it's even more obscure. The crucial idea here is that the straight line is being parameterized in terms of an arbitrary real ##\lambda##. Also think of ##x_0^i## as an arbitrary point on the line so that ##\lambda## and ##v^i## generate the whole line. Then they adopt a confusing notation that ##x## is an abbreviation for the 3-vector with components ##x^i##. Using a bold font would have been more helpful. But persevering with their notation, ##x = x(\lambda) = x_0 + \lambda v##. Since we want the transformed ##x'^{\,i}## to be a straight line also, in general parameterized by a different ##\lambda'## and ##v'##, we can write $$x'^{\,i}(x) ~=~ x'^{\,i}(x_0) ~+~ \lambda'(\lambda) \, v'^{\,i}$$ where the first term on the RHS is to be understood as what ##x_0## is mapped into. I.e., think of ##x'^{\,i}## as a mapping. It might have been more transparent if they'd written ##x'^{\,i}_0## and then explained why this can be expressed as ##x'^{\,i}(x_0)##. Confusing? Yes, I know that only too well. I guess it becomes second nature when one is working in this way all the time. Fock also does a lot of this sort of thing.
Thanks for explaining. I think I understand now. This notation is so bad it's almost funny. The coordinate transformation takes the straight line ##t\mapsto x_0+tv## to a straight line ##t\mapsto \Lambda(x_0)+tu##, where ##\Lambda## denotes the coordinate transformation and u denotes a tangent vector to the new straight line. That much is clear. Now it would make sense to write ##x_0'## instead of ##\Lambda(x_0)##, but these guys denote the components of this vector by ##x'^i(x_0)##!?!?! I guess for all y, x'(y) should be read as "the primed coordinates of the event whose unprimed coordinates are y".

It doesn't make a lot of sense to put a prime on the λ, but I guess they're doing it as a reminder that if the old straight line is the map B defined by ##B(\lambda)=x_0+\lambda v##, then the new straight line isn't necessarily ##\Lambda\circ B##. It could be ##\Lambda\circ B\circ f##, where f is a "reparametrization". I really don't like that they write v' for the vector I denoted by u, because it suggests that ##v'=\Lambda v##.

I realized something interesting when I looked at the statement of the theorem they're proving. They're saying that if ##\Lambda## takes straight lines to straight lines, there's a 4×4 matrix A, two 4×1 matrices y,z, and a number c, such that
$$\Lambda(x)=\frac{Ax+y}{z^Tx+c}.$$
If we just impose the requirement that ##\Lambda(0)=0##, we get y=0. And if z≠0, there's always an x such that the denominator is 0. So if we also require that ##\Lambda## must be defined on all of ##\mathbb R^4##, then the theorem says that ##\Lambda## must be linear. Both of these requirements are very natural if what we're trying to do is to explain e.g. what the principle of relativity suggests about theories of physics that use ##\mathbb R^4## as a model of space and time.
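A small sketch of this point (my own, with hypothetical concrete values for A, z, c): a fractional-linear map with y = 0 still takes straight lines to straight lines, but whenever z ≠ 0 its denominator vanishes on an entire hyperplane, so it cannot be defined on all of ##\mathbb R^4##.

```python
import numpy as np

def fl(x, A, y, z, c):
    """Fractional-linear map Lambda(x) = (A @ x + y) / (z @ x + c)."""
    return (A @ x + y) / (z @ x + c)

# Hypothetical values, chosen only so the denominator stays nonzero below
A = np.eye(4)
y = np.zeros(4)                 # imposing Lambda(0) = 0 forces y = 0
z = np.array([0.1, 0.0, 0.0, 0.0])
c = 1.0

# fl maps the straight line x0 + t*v to another straight line:
x0 = np.array([1.0, 2.0, 3.0, 4.0])
v  = np.array([1.0, 0.0, 1.0, 0.0])
pts = np.stack([fl(x0 + t*v, A, y, z, c) for t in (0.0, 0.3, 0.9)])
d1, d2 = pts[1] - pts[0], pts[2] - pts[0]
assert np.linalg.matrix_rank(np.stack([d1, d2]), tol=1e-8) == 1  # parallel

# But with z != 0 the denominator vanishes on a whole hyperplane, e.g. at
x_bad = -c * z / (z @ z)        # satisfies z @ x_bad + c = 0
assert abs(z @ x_bad + c) < 1e-12
print("FL maps take lines to lines, but are singular somewhere when z != 0")
```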

 Quote by strangerep Denoting partial derivatives by suffices in the same way as Stepanov does, $$dx' = df = f_x dx + f_t dt = (f_x u + f_t) dt ~;~~~~~~ dt' = dg = g_x dx + g_t dt = (g_x u + g_t) dt ~;$$ and so Stepanov's (35) is obtained by $$u' ~=~ dx'/dt' ~=~ \frac{f_x u + f_t}{g_x u + g_t} ~.$$
Cool. This doesn't look rigorous, because dx and dt are independent variables when they first appear in this calculation, and then you use dx/dt=u. But it's certainly enough to convince me that the result is correct.

 Quote by strangerep [Edit: I have more detailed writeups of both proofs where I try to fill in some of these gaps, but they're in ordinary latex, not PF latex. If you get stuck, I could maybe post a pdf.]
Thanks for the offer. I'm not sure I'll have the time to look at this. I have to go to bed now, and I will be very busy in the near future. Actually, I think that for now, I'll just try to figure out the best way to use the two additional assumptions I suggested above to simplify the problem.