# A question concerning the Lorentz transformation derivation

1. Mar 4, 2008

### ehj

I have been studying the derivation of the Lorentz transformation, but I can't quite get a proper answer as to why you can start with the assumption that:
x' = ax + bt
t' = cx + dt
The assumption is that "the transformations must be linear". So my first question is: what does that mean? That a, b, c and d are constants? And the second question: how do you argue for that?

2. Mar 4, 2008

### StatusX

The quick answer is that, just as in Newtonian physics, observers set up their coordinate systems so that free particles travel along straight lines in spacetime, so any transformation between coordinate systems must map straight lines to straight lines. Assuming we define the origins of the two systems to coincide, this only leaves transformations of the above type. You can derive it more rigorously by explicitly constructing a coordinate system by sending out beams of light, or something like that, as described in, e.g., MTW (Misner, Thorne and Wheeler's *Gravitation*).

3. Mar 5, 2008

### Fredrik

Staff Emeritus
Books often take Einstein's postulates as a starting point and pretend to derive the Lorentz transformation from them, but Einstein's postulates aren't mathematically well-defined, so you can't really derive anything from them, not rigorously anyway. The problem with them is that they use the concept "inertial frame" without defining it first. And you can't get around that by simply defining what an inertial frame is first, because a precise definition is going to include both of Einstein's postulates in one way or another. The postulates are a part of the definition.

OK fine, so why don't we just define inertial frames in a way that includes Einstein's postulates? That's actually what you're trying to do right now. The whole point of the "derivations" found in relativity textbooks is to help you guess a useful definition. So there really is no reason to do anything rigorously at this point. Just guess that the Lorentz transformation is linear and see if you get something useful from that. When you're done with the non-rigorous calculations, you will have found a set of statements that define the theory rigorously. (And then it's up to the experimentalists to verify that the predictions of the theory agree with experiments).

These are two ways to define SR rigorously:

1. Space and time can be represented mathematically by the smooth manifold called Minkowski space. Each isometry of the metric can be interpreted as a change of coordinates from a coordinate system used by a physical observer moving at constant velocity to another coordinate system of the same kind.

2. Space and time can be represented mathematically by the set $\mathbb{R}^4$. A change of coordinates from a coordinate system used by a physical observer moving at constant velocity to another coordinate system of the same kind always takes the form $x\mapsto\Lambda x+a$, where $\Lambda$ is linear and satisfies $\Lambda^T\eta\Lambda=\eta$.
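To make the condition $\Lambda^T\eta\Lambda=\eta$ concrete, here is a minimal numerical sketch in 1+1 dimensions. It assumes units where c = 1, coordinates ordered as (t, x), and the sign convention $\eta=\mathrm{diag}(-1,1)$; the helper names are made up for this illustration.

```python
import math

def boost(v):
    """2x2 Lorentz boost acting on (t, x), in units where c = 1 (assumed)."""
    gamma = 1.0 / math.sqrt(1.0 - v * v)
    return [[gamma, -gamma * v],
            [-gamma * v, gamma]]

def matmul(A, B):
    """Plain matrix product of two nested-list matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

def transpose(A):
    return [list(row) for row in zip(*A)]

# Metric with signature (-, +); this sign choice is an assumption.
ETA = [[-1.0, 0.0],
       [0.0, 1.0]]

def preserves_metric(L, tol=1e-12):
    """Check L^T * ETA * L == ETA up to rounding error."""
    M = matmul(matmul(transpose(L), ETA), L)
    return all(abs(M[i][j] - ETA[i][j]) < tol
               for i in range(2) for j in range(2))

# A Galilean transformation t' = t, x' = x - v t, for comparison.
galilean = [[1.0, 0.0],
            [-0.6, 1.0]]

print(preserves_metric(boost(0.6)))  # True
print(preserves_metric(galilean))    # False
```

The boost passes the test and the Galilean transformation does not, which is the sense in which the condition singles out Lorentz transformations among linear maps.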

Another approach is to start by looking at the properties of inertial frames in pre-relativistic physics, and figure out how to change them to get SR. Even before it was discovered that the speed of light is the same in all inertial frames, it made sense to let $\mathbb{R}^4$ represent space and time. You could define a coordinate system as a function from $\mathbb{R}^4$ to $\mathbb{R}^4$. (Because a coordinate system is supposed to assign four numbers to each event). If x and y are coordinate systems, then $x\circ y^{-1}$ represents a coordinate change.

Let's look at the properties of coordinate changes between inertial frames in pre-relativistic physics:

1. They are smooth functions (i.e. they can be differentiated as many times as you'd like).
2. They take straight lines to straight lines. (Otherwise different observers wouldn't agree about whether a certain object is moving at constant velocity).
3. They preserve simultaneity (i.e. they take hyperplanes that are orthogonal to the time axis to hyperplanes that are orthogonal to the time axis).

All you have to do to get from this to SR is to change the last one to

3'. The origin-preserving part also preserves the light-cone at the origin.

What I mean by "origin-preserving part" is this: Taylor expand the function and separate the constant term from the others. The sum of the others is what I call the origin-preserving part of the function. (There may be a better word for it, but I can't think of one right now).
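The difference between property 3 and property 3' can be checked numerically. The following is only a sketch in 1+1 dimensions, assuming units where c = 1; the function names and the sample events are chosen for illustration.

```python
import math

def galilean(t, x, v):
    """Pre-relativistic change of coordinates to a frame moving at velocity v."""
    return t, x - v * t

def lorentz(t, x, v):
    """Lorentz boost in 1+1 dimensions, in units where c = 1 (assumed)."""
    gamma = 1.0 / math.sqrt(1.0 - v * v)
    return gamma * (t - v * x), gamma * (x - v * t)

def same_time(e1, e2, tol=1e-12):
    """True if two events (t, x) have the same time coordinate."""
    return abs(e1[0] - e2[0]) < tol

def on_light_cone(e, tol=1e-12):
    """True if the event (t, x) lies on the light cone at the origin."""
    t, x = e
    return abs(t * t - x * x) < tol

v = 0.6
A, B = (1.0, 0.0), (1.0, 5.0)  # two events that are simultaneous before the change
P = (2.0, 2.0)                 # an event on the light cone at the origin

galilean_keeps_simultaneity = same_time(galilean(*A, v), galilean(*B, v))
lorentz_keeps_simultaneity = same_time(lorentz(*A, v), lorentz(*B, v))
galilean_keeps_light_cone = on_light_cone(galilean(*P, v))
lorentz_keeps_light_cone = on_light_cone(lorentz(*P, v))

print(galilean_keeps_simultaneity, galilean_keeps_light_cone)  # True False
print(lorentz_keeps_simultaneity, lorentz_keeps_light_cone)    # False True
```

Each transformation keeps exactly one of the two properties, which is why swapping 3 for 3' is enough to change the theory.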

Now if you take these 3 things to be your starting point, it's not hard to show that the origin-preserving part of a coordinate change between inertial frames (i.e. a homogeneous Lorentz transformation) must be linear. I have actually already written down that proof in a note to myself, so I'll just paste it here:

Consider a straight line through the origin, and two events x and y on that line. We have y=kx for some real number k. A homogeneous Lorentz transformation $\Lambda$ between inertial frames takes straight lines through the origin to straight lines through the origin, so we must have

$$\Lambda(y)=k'\Lambda(x)$$

for some real number k', but we also have

$$\Lambda(y)=\Lambda(kx)$$

so

$$\Lambda(kx)=k'\Lambda(x)$$

If we Taylor expand this around x=0 and compare the first order terms, we see that k'=k, so we have proved that

$$\Lambda(kx)=k\Lambda(x)$$

for any k and any x.

Now Taylor expand both sides of this around x=0, and compare terms of the same order. We see that only the first-order terms can be non-zero. This implies that

$$\Lambda^\mu(x)=\Lambda^\mu{}_\nu x^\nu$$

where the $\Lambda$ on the right-hand side is a matrix. So the function on the left-hand side must definitely be linear.
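
To spell out the term-matching step, here is a sketch in one dimension (the coefficients $c_n$ are introduced only for this illustration, and the constant term is absent because the origin is preserved). Write

$$\Lambda(x)=c_1x+c_2x^2+c_3x^3+\dots$$

Then

$$\Lambda(kx)-k\Lambda(x)=\sum_{n\geq 2}c_n\left(k^n-k\right)x^n=0$$

for all k and all x. Each power of x must vanish separately, and $k^n-k\neq 0$ for, say, k=2 and $n\geq 2$, so $c_n=0$ for every $n\geq 2$. Only the first-order term survives. The four-dimensional case works the same way, with more indices.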

Last edited: Mar 5, 2008
4. Mar 5, 2008

### tiny-tim

Hi ehj!

StatusX is absolutely right: "any transformation between coordinate systems must map straight lines to straight lines." In other words, a uniform velocity for one observer must still be a uniform velocity for any other observer.

I'll just add: if you're still not happy, try for yourself to find a new definition for x´ and t´ so that a uniform velocity for the first observer is still a uniform velocity for the second observer.

In other words: so that whenever x = constant.t, then x´ = another-constant.t´.

You should find that only a Lorentz-type formula works.

5. Mar 5, 2008

### ehj

tiny-tim can you show me that?
I can't follow Fredrik's math :I

6. Mar 5, 2008

### Fredrik

Staff Emeritus
I realize that even though I gave you a very thorough answer about the linearity of the Lorentz transformations, I didn't really answer your first question. I see the other posters didn't either, so here it is: A function $f:U\to V$, where U and V are vector spaces, is said to be linear if

$$f(ax+by)=af(x)+bf(y)$$

for all vectors x,y and for all numbers a,b.
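
If it helps to see the definition in action, here is a small sketch that tests the condition on a handful of sample vectors. The two maps and the sample values are made up for illustration, and passing on samples only suggests linearity; it doesn't prove it.

```python
def is_linear_on_samples(f, samples, scalars, tol=1e-9):
    """Test f(a*x + b*y) == a*f(x) + b*f(y) for every sample pair and scalar pair."""
    for x in samples:
        for y in samples:
            for a in scalars:
                for b in scalars:
                    combined = tuple(a * xi + b * yi for xi, yi in zip(x, y))
                    lhs = f(combined)
                    rhs = tuple(a * p + b * q for p, q in zip(f(x), f(y)))
                    if any(abs(l - r) > tol for l, r in zip(lhs, rhs)):
                        return False
    return True

def matrix_map(vec):
    """A map (t, x) -> (t', x') given by a constant 2x2 matrix (arbitrary entries)."""
    t, x = vec
    return (2.0 * t + 3.0 * x, -1.0 * t + 0.5 * x)

def quadratic_map(vec):
    """A map with an x^2 term, so it should fail the test."""
    t, x = vec
    return (t + x * x, x)

samples = [(1.0, 2.0), (-3.0, 0.5), (0.0, 1.0)]
scalars = [-2.0, 0.5, 3.0]

print(is_linear_on_samples(matrix_map, samples, scalars))     # True
print(is_linear_on_samples(quadratic_map, samples, scalars))  # False
```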

As any textbook on linear algebra will tell you (and also my post in this thread), every linear function from $\mathbb{R}^n$ to $\mathbb{R}^m$ corresponds to an m x n matrix, and vice versa. So f(x) can always be expressed as Tx, where T is a matrix.

The assumption that a (1+1-dimensional) Lorentz transformation must be linear is therefore equivalent to assuming that it has the form

$$\begin{pmatrix}t \\ x\end{pmatrix} \mapsto \begin{pmatrix}d & c\\b & a\end{pmatrix} \begin{pmatrix}t \\ x\end{pmatrix}$$

which is exactly what your system of equations says.

The most important part of the answer to your second question has been mentioned in all the replies in this thread, including mine: Lorentz transformations must take straight lines to straight lines.

This implies that they satisfy the condition $\Lambda(x+y)=\Lambda x + \Lambda y$. However, it doesn't imply that they satisfy $\Lambda(ax)=a\Lambda x$. So "takes straight lines to straight lines" doesn't imply linearity!

The remaining condition can be thought of as a consequence of the first postulate. If it doesn't hold, then the velocity associated with $\Lambda^{-1}$ wouldn't be the negative of the velocity associated with $\Lambda$ when $\Lambda$ is a pure boost. And this would suggest that there's something different about one of the two inertial frames involved, which contradicts (some interpretations of) the first postulate.

(What I mean by the velocity associated with a Lorentz transformation is explained in this thread).

Last edited: Mar 5, 2008
7. Mar 5, 2008

### tiny-tim

umm … no!

You have to try it!

You see, you're not convinced that the equations have to be linear, and the only way of really convincing yourself is by trying to find some non-linear equations that will work, and eventually giving up 'cos it ain't possible!

(btw, I forgot to mention that the reason for requiring that "a uniform velocity for one observer must still be a uniform velocity for any other observer" is because of good ol' Newton's first law, which makes uniform velocity a fundamental part of physics. So everyone must agree on what a uniform velocity is.

To put it another way, can you design a physics in which different observers disagree about whether something has uniform velocity?)

8. Mar 5, 2008

### Fredrik

Staff Emeritus
What part of it is causing you problems? Is it when I Taylor expand both sides of an equation and match terms of the same order? What I'm doing there is basically the same as what I'm doing here: Suppose we know that

$$a+bx+cx^2=d+ex+fx^2$$

for all x. Then we must have a=d, b=e and c=f.

9. Mar 5, 2008

### ehj

Well the thing is I'm in high school atm so my math knowledge is limited (in comparison). I've never been taught anything about Taylor expansions, matrices or functions of vectors. It might be too difficult a subject for me, but the derivation I read ( http://www-personal.umich.edu/~lorenzon/classes/2007/Handouts/lorentz-transformations.pdf ) required only basic algebra to perform, but arguing for the linear transformations seems to be a bit more difficult -_-.
tiny-tim. Do you mean that I should conclude from the result that the assumptions must be correct?

Last edited: Mar 5, 2008
10. Mar 5, 2008

### Fredrik

Staff Emeritus
Some people might point out that you can't actually prove a statement by failing to prove its converse.

But I agree with you anyway. The main point of my long post above was that it isn't possible to prove the linearity since the "postulates" we started with are ill-defined. So the best we can do is to convince ourselves that the non-rigorous steps we take are reasonable, and the exercise you suggest will definitely help.

11. Mar 5, 2008

### Fredrik

Staff Emeritus
OK, I see. Then you should probably just try to understand what I started with: the fact that the "postulates" we started with aren't well-defined enough to be used as a starting point in a rigorous mathematical proof. This means that there's no way to prove every step rigorously. There's absolutely nothing wrong with just skipping every step you find difficult in this derivation, and then simply take the end result (the Lorentz transformation equations) as the definition of the special theory of relativity. The calculations you did to get there were not meant as a proof. They were just helping you guess the form of the Lorentz transformation. Once you've made a guess that seems reasonable, that guess is your theory, and only experiments can show if it's any good.

Just in case you aren't familiar with Einstein's postulates, which I've been referring to a few times, here they are:

1. The laws of physics are the same in all inertial frames.
2. The speed of light is the same in all inertial frames.

12. Mar 5, 2008

### tiny-tim

That argument was at the bottom of page 1:
I agree that's not very clear, but what it means is that if a particle is moving uniformly (not accelerating) in (x, y, z, t), then it should also not be accelerating in (x', y', z', t') - otherwise there would be a force in the second frame, but no force in the first frame.

And we want to use frames in which the laws of physics are the same!

Frankly, we can define anything we like, but some definitions are more useful than others, and we prefer to define "inertial" frames in which the laws of physics are "natural"!

If by "correct" you mean "necessary, with no alternatives", then no.

The object is to understand why everybody uses those assumptions, and what their significance is!

Maybe, just as Einstein came along and said Newton needed adjusting, one day someone else will say Einstein needs adjusting. But until then, you have to understand Einstein, or you won't know what everyone else is talking about!

13. Mar 5, 2008

### ehj

The problem for me was that I didn't see how the equations presented in the first post satisfied the fact that we want an observed particle to move with uniform velocity in both frames, although I think I can see that now. I'll try showing what I did and you can comment on it :)

x' = ax + bt
t' = cx + dt
An object moving with a velocity v1 traces out a worldline, giving us a set of events satisfying x = v1*t. If this is used in the above equations we get:
x' = t(v1*a+b)
t' = t(v1*c+d)
If you isolate t in one and substitute into the other you get:
x' = t'*(v1*a+b)/(v1*c+d)
Which means we get a constant velocity in the ' - frame if (v1*a+b)/(v1*c+d) is constant (which it is, since a, b, c, d and v1 are all constants). So the equations can explain a uniform velocity, which is what I need to know =P..
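
The same check can be done numerically. This is only a sketch: the values of a, b, c, d and v1 are arbitrary numbers picked for illustration, and the division assumes t' is not zero (so t ≠ 0 and v1*c + d ≠ 0).

```python
def transform(x, t, a, b, c, d):
    """Apply x' = a*x + b*t and t' = c*x + d*t."""
    return a * x + b * t, c * x + d * t

# Arbitrary illustrative constants, not the actual Lorentz coefficients.
a, b, c, d = 2.0, -1.0, 0.5, 3.0
v1 = 0.4

# Velocity in the primed frame predicted by the algebra above.
expected = (v1 * a + b) / (v1 * c + d)

# Sample several events on the worldline x = v1 * t and compute x'/t'.
ratios = []
for t in [1.0, 2.0, 5.0, 10.0]:
    x_prime, t_prime = transform(v1 * t, t, a, b, c, d)
    ratios.append(x_prime / t_prime)

print(expected)
print(ratios)
```

Every entry of `ratios` agrees with `expected` up to rounding, so the object moves with one constant velocity in the primed frame as well.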

Last edited: Mar 5, 2008
14. Mar 5, 2008

### peter0302

I think the easiest way to derive special relativity is to assume all motion is governed by a four-dimensional velocity vector of constant magnitude 'c'. All of special relativity flows easily from that one postulate and Newton's definitions of force, momentum, and energy.

15. Mar 5, 2008

### robphy

This wouldn't be a complete formulation.
What defines the "magnitude" of your vector?
As written, you wouldn't rule out obtaining Galilean relativity.

16. Mar 5, 2008

### tiny-tim

Yes, you're right!

Yes, that's good!

You've correctly proved that, with these linear equations for x´ and t´, constant x/t leads to constant x´/t´.

(btw, you must get into the habit of getting brackets in the right place - sooner or later, you'll make a mistake through not being careful enough with brackets - for example, your (v1*a+b)/(v1*c+d) should be v1*(a+b)/(v1*(c+d).)

mmm … that line was right, but messy.

You could have said:

"Divide one line by the other", and then the next line would have been: x´/t´ = (a+b)/(c+d).

You see the difference?

17. Mar 5, 2008

### peter0302

The magnitude is postulated to be 'c'. Or a unit vector in natural units.

a^2+b^2=1

Then define:
T == proper time
t == observer time
s == observer space
a == dT / dt
b == ds / dt

Then add Newton's definitions for momentum, energy, and force. And, again, you can get all of S.R. formally.
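
As a sketch of how that postulate leads somewhere familiar (assuming units where c = 1, and writing $v=ds/dt$, a symbol not defined above), the constraint gives time dilation directly:

$$\left(\frac{dT}{dt}\right)^2+\left(\frac{ds}{dt}\right)^2=1\quad\Longrightarrow\quad\frac{dT}{dt}=\sqrt{1-v^2}$$

so $dt=dT/\sqrt{1-v^2}$, which is the usual factor $\gamma=1/\sqrt{1-v^2/c^2}$ once c is restored. Taking the positive root assumes that T and t increase together.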

18. Mar 5, 2008

### robphy

I wouldn't call such a procedure "easy"....
It's certainly not physically natural or intuitive.

19. Mar 5, 2008

### ehj

tiny-tim
I'm not following you with the brackets. I looked mine through again and they seem to be correct imo, and what you wrote doesn't seem to be the same as what I wrote. And by "dividing one line by the other" you must assume that t' is different from 0, which I don't see any reason for.

20. Mar 5, 2008

### tiny-tim

Hi ehj!

Oops!

Yes, you were right. I misread it.