
## Showing that Lorentz transformations are the only ones possible

 Quote by Fredrik I realized something interesting when I looked at the statement of the theorem they're proving. They're saying that if ##\Lambda## takes straight lines to straight lines, there's a 4×4 matrix A, two 4×1 matrices y,z, and a number c, such that $$\Lambda(x)=\frac{Ax+y}{z^Tx+c}.$$ If we just impose the requirement that ##\Lambda(0)=0##, we get y=0. And if z≠0, there's always an x such that the denominator is 0. So if we also require that ##\Lambda## must be defined on all of ##\mathbb R^4##, then the theorem says that ##\Lambda## must be linear.
Yes, that's what I tried to explain in earlier posts.
 Both of these requirements are very natural if what we're trying to do is to explain e.g. what the principle of relativity suggests about theories of physics that use ##\mathbb R^4## as a model of space and time.
But it gets trickier if you take a more physics-first approach to the foundations: by itself the relativity principle doesn't give you (flat) ##\mathbb R^4## as a model of space and time -- you've got to make some other assumptions about omnipresent rigid rods and standard clocks which might not be so reasonable in the large.
 For now, I'll just try to figure out the best way to use the two additional assumptions I suggested above to simplify the problem.
If you mean "just assume linearity", the best physicist-oriented proof I've seen is in Rindler's SR textbook.
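The fractional-linear form quoted above is easy to probe numerically. The sketch below is illustrative only: the matrix A, vectors y, z and constant c are arbitrary example values, not anything from the thread. It checks that such a map sends collinear points to collinear points, and that setting y = 0 and z = 0 reduces it to a linear map fixing the origin.

```python
# Numeric sketch (illustrative): a fractional-linear map
#   x -> (A x + y) / (z^T x + c)
# sends collinear points to collinear points, and reduces to a linear map
# fixing the origin when y = 0 and z = 0. A, y, z, c are arbitrary examples.

def fl_map(A, y, z, c, x):
    """Apply the fractional-linear map (A x + y) / (z.x + c) in R^2."""
    denom = sum(zi * xi for zi, xi in zip(z, x)) + c
    Ax = [sum(A[i][j] * x[j] for j in range(2)) for i in range(2)]
    return [(Ax[i] + y[i]) / denom for i in range(2)]

def collinear(p, q, r, tol=1e-9):
    """Cross-product test: True if the 2D points p, q, r lie on one line."""
    return abs((q[0]-p[0])*(r[1]-p[1]) - (q[1]-p[1])*(r[0]-p[0])) < tol

A = [[2.0, 1.0], [0.5, 3.0]]
y = [1.0, -2.0]
z = [0.3, 0.7]
c = 5.0

# Three points on the line t -> (1, 2) + t*(3, -1):
pts = [[1 + 3*t, 2 - t] for t in (0.0, 0.5, 2.0)]
imgs = [fl_map(A, y, z, c, p) for p in pts]

print(collinear(*imgs))                                    # True
print(fl_map(A, [0.0, 0.0], [0.0, 0.0], 1.0, [0.0, 0.0]))  # [0.0, 0.0]
```

Imposing ##\Lambda(0)=0## forces y = 0, and requiring the map to be defined everywhere forces z = 0, exactly as in the quoted argument.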

Mentor
 Quote by strangerep If you mean "just assume linearity", the best physicist-oriented proof I've seen is in Rindler's SR textbook.
I meant that I would like to prove that if ##\Lambda## is a permutation of ##\mathbb R^4## (or ##\mathbb R^2##) that takes straight lines to straight lines, and 0 to 0, then ##\Lambda## is linear. I think I know how to do the rest after that, at least in 1+1 dimensions.
 It can be shown by geometric inspection that the Lorentz transformation is the only transformation accounting for the invariant speed of light. We begin with a graphical representation of three examples of observers moving at arbitrarily selected speeds with respect to the black inertial frame of reference. The speed of light in the black inertial reference system is already known to have the value c and is represented by the world line of a single photon (the green line slanted at 45 degrees in the black frame).

Next, we ask what orientation the X1 axis of each observer must have for the speed of light to be invariant among the inertial frames. By trial-and-error inspection, the only admissible orientations of the X1 axis are those for which the photon world line bisects the angle between the X1 axis and the X4 axis, as shown below. Based on this result, we wish to derive the coordinate transformations between any two arbitrarily selected frames.

Again by geometric inspection, we identify a right triangle to which we can apply the Pythagorean theorem. Notice that we have selected two of the moving observer frames entirely arbitrarily, and then found a new black inertial frame in which these two frames move in opposite directions with the same speed. This is a perfectly general situation, since for any pair of observers moving relative to each other, such a reference frame can always be found. Having derived the time dilation, the length contraction follows easily by similar-triangle inspection.
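The bisection property described in this post can be cross-checked with one line of algebra: a boost leaves the photon world line ct = x invariant, i.e. (1, 1) is an eigenvector of the boost matrix. A minimal numeric sketch (illustrative; the β values are arbitrary):

```python
import math

# Illustrative check: a 1+1D Lorentz boost maps the photon world line onto
# itself, so the null direction (1, 1) in (ct, x) coordinates stays null.

def boost(beta):
    """2x2 boost matrix acting on (ct, x) for velocity v = beta*c."""
    g = 1.0 / math.sqrt(1.0 - beta**2)
    return [[g, -g*beta], [-g*beta, g]]

def apply(M, v):
    return [M[0][0]*v[0] + M[0][1]*v[1],
            M[1][0]*v[0] + M[1][1]*v[1]]

for beta in (0.2, 0.5, 0.9):
    ct, x = apply(boost(beta), [1.0, 1.0])
    print(ct == x)  # True: the image still satisfies ct' = x'
```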

 Quote by strangerep Yeah, it took me several months (elapsed time) before I understood what's going on here. You should see the original version in Fock's textbook -- it's even more obscure. The crucial idea here is that the straight line is being parameterized in terms of an arbitrary real ##\lambda##. Also think of ##x_0^i## as an arbitrary point on the line so that ##\lambda## and ##v^i## generate the whole line. Then they adopt a confusing notation that ##x## is an abbreviation for the 3-vector with components ##x^i##. Using a bold font would have been more helpful. But persevering with their notation, ##x = x(\lambda) = x_0 + \lambda v##. Since we want the transformed ##x'^{\,i}## to be a straight line also, in general parameterized by a different ##\lambda'## and ##v'##, we can write $$x'^{\,i}(x) ~=~ x'^{\,i}(x_0) ~+~ \lambda'(\lambda) \, v'^{\,i}$$ where the first term on the RHS is to be understood as what ##x_0## is mapped into. I.e., think of ##x'^{\,i}## as a mapping. It might have been more transparent if they'd written ##x'^{\,i}_0## and then explained why this can be expressed as ##x'^{\,i}(x_0)##. Confusing? Yes, I know that only too well. I guess it becomes second nature when one is working in this way all the time. Fock also does a lot of this sort of thing.
Ok, but a little bit further down in the proof, the author seems to use this, which is based upon a particular representation of a particular line, to draw conclusions about other lines at other positions; this is where he introduces a function f(x,v), and I don't understand this at all.

And still, the conclusion of the theorem seems wrong to me. It is nowhere stated that we must have n>1, and for n=1 the function f(x)=x^3+x seems to contradict the theorem, since it is a differentiable bijection from R (a line) onto itself, with a differentiable inverse, but f does not have the required form.

 Quote by Erland Ok, but a little bit further down in the proof, the author [Guo et al] seems to use this, which is based upon a particular representation of a particular line, to draw conclusions about other lines at other positions; this is where he introduces a function f(x,v), and I don't understand this at all.
From their equation
$$v^j v^k \, \frac{\partial^2 x'^{\,i}}{\partial x^j \partial x^k} (x_0 + \lambda v) ~=~ v^j\,\frac{\partial x'^{\,i}}{\partial x^j} \, \frac{\,\frac{d^2 \lambda'}{d\lambda^2}\,}{d\lambda'/d\lambda} ~,$$
we see that ##\frac{d^2 \lambda'}{d\lambda^2}/\frac{d\lambda'}{d\lambda}## at ##(x^i)## depends not only on ##x^i## but also on ##v^i##. Therefore, there must exist a function ##f(x,v)## such that
$$v^j v^k \, \frac{\partial^2 x'^{\,i}}{\partial x^j \partial x^k} ~=~ v^j \, \frac{\partial x'^{\,i}}{\partial x^j} \,f(x,v) ~.$$
Strictly, ##f(x,v)## also depends on ##\lambda##, but this dependence is suppressed in the notation here, since we only need the fact that ##f## depends at least on ##x## and ##v##.
 And still, the conclusion of the theorem seems wrong to me. It is nowhere stated that we must have n>1, and for n=1, the function f(x)=x^3+x seems to contradict the theorem, [...]
No, that's ##n=2##, not ##n=1##.
Think of the (x,y) plane. A straight line on this plane can be expressed as
$$y ~=~ y(x) ~=~ y_0 + s x$$ for some constants ##y_0## and ##s##.
Alternatively, the same straight line can be expressed in terms of a parameter ##\lambda## and constants ##v_x, v_y## as
$$y = y(\lambda) ~=~ y_0 + \lambda v_y ~,~~~~~ x = x(\lambda) ~=~ \lambda v_x ~,$$ and eliminating ##\lambda## gives the previous form, with ##s = v_y/v_x##.
That's what's going on here: straight lines are expressed in the parametric form. Your cubic cannot be expressed in this form, hence is in no sense a straight line.
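To see concretely that the graph of the cubic is not a straight line in the plane, one can run the standard cross-product collinearity test on three of its points. This is an illustrative sketch; the sample points are arbitrary:

```python
# Three points on the graph of f(x) = x**3 + x fail the collinearity test,
# so the graph is not a straight line in the (x, y) plane.
f = lambda x: x**3 + x
p, q, r = [(x, f(x)) for x in (0.0, 1.0, 2.0)]  # (0,0), (1,2), (2,10)
cross = (q[0]-p[0])*(r[1]-p[1]) - (q[1]-p[1])*(r[0]-p[0])
print(cross)  # 6.0 -- nonzero, hence not collinear
```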

 Quote by strangerep From their equation $$v^j v^k \, \frac{\partial^2 x'^{\,i}}{\partial x^j \partial x^k} (x_0 + \lambda v) ~=~ v^j\,\frac{\partial x'^{\,i}}{\partial x^j} \, \frac{\,\frac{d^2 \lambda'}{d\lambda^2}\,}{d\lambda'/d\lambda} ~,$$ we see that ##\frac{d^2 \lambda'}{d\lambda^2}/\frac{d\lambda'}{d\lambda}## at ##(x^i)## depends not only on ##x^i## but also on ##v^i##. Therefore, there must exist a function ##f(x,v)## such that $$v^j v^k \, \frac{\partial^2 x'^{\,i}}{\partial x^j \partial x^k} ~=~ v^j \, \frac{\partial x'^{\,i}}{\partial x^j} \,f(x,v) ~.$$ Strictly, ##f(x,v)## also depends on ##\lambda##, but this dependence is suppressed in the notation here, since we only need the fact that ##f## depends at least on ##x## and ##v##.
It is precisely this I don't understand. If we are talking about a single line and its image, then ##v## is a constant vector, a direction vector of the line, and then it doesn't seem meaningful to take a function depending upon it.
If, on the other hand, we are talking about several, perhaps all, lines and their images, then the problem is that the parametric equations of the lines are not unique: we can freely choose between points on the line and parallel direction vectors, and it is hard to see how we can associate one such choice for the image line with one for the original line in a consistent way. How can ##f(x,v)## then be well defined?
 Quote by strangerep No, that 's ##n=2##, not ##n=1##. [---] Your cubic cannot be expressed in this form, hence is in no sense a straight line.
No, I am not talking about the curve ##y=f(x)=x^3+x## in ##R^2##. I talk about ##f## as a transformation from ##R^1## to itself. In ##R^1##, there is only one line, ##R^1## itself, and it is mapped onto itself by ##f##.

 Quote by Erland No, I am not talking about the curve ##y=f(x)=x^3+x## in ##R^2##. I talk about ##f## as a transformation from ##R^1## to itself. In ##R^1##, there is only one line, ##R^1## itself, and it is mapped onto itself by ##f##.
Remember that for the one-dimensional case it doesn't make sense to single out mappings of straight lines to straight lines since they all are "straight lines": curvature for one-dimensional objects is only extrinsic, unlike what happens in higher-dimensional spaces.
So even if you want to restrict the function to the real line, you need the 2-dimensional representation, as strangerep pointed out, if you want to make any distinction between linearity and non-linearity of lines (curves).

Mentor
 Quote by TrickyDicky Remember that for the one dimensional case it doesn't make sense to single out mappings of straight lines to straight lines since they all are "straight lines",
That's precisely why it's disturbing that the theorem doesn't assume that the dimension of the vector space is at least 2. Since every ##f:\mathbb R\to\mathbb R## takes straight lines to straight lines, the theorem says that there are numbers a,b such that
$$f(x)=ax+b$$
for all x in the domain. Actually it says that there are numbers a,b,c,d such that
$$f(x)=\frac{ax+b}{cx+d}$$
for all x in the domain, but since we're considering an f with domain ℝ, we must have c=0, and this allows us to define a'=a/d, b'=b/d. Since there are lots of other functions from ℝ to ℝ, the theorem is wrong.

It's possible that the only problem with the theorem is that it left out a statement that says that the dimension of the vector space must be at least 2, but then the proof should contain a step that doesn't work in 1 dimension. (I still haven't studied the proof, so I have no opinion).
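Fredrik's n = 1 counterexample can be stated numerically: an affine map ax + b has constant increments over equal steps, and f(x) = x^3 + x does not. A small illustrative check:

```python
# f(x) = x**3 + x is a bijection of the real line (strictly increasing),
# but it is not of the form a*x + b: its increments over unit steps vary.
f = lambda x: x**3 + x
slopes = [f(x + 1) - f(x) for x in (0, 1, 2)]
print(slopes)  # [2, 8, 20] -- not constant, so f is not affine
```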
 One-dimensional vector spaces? That would be scalars. In linear algebra, vector spaces are assumed to be of dimension 2 or higher, aren't they?

Mentor
 Quote by TrickyDicky One dimensional vector spaces? That would be scalars, in linear algebra the vector spaces are assumed to be of dimension 2 or higher, aren't they?
No, they can even be 0-dimensional. That would be a set with only one member. (Denote that member by 0. Define addition and scalar multiplication by 0+0=0, and a0=0 for all scalars a. The triple ({0},addition,scalar multiplication) satisfies the definition of a vector space). 0-dimensional vector spaces are considered "trivial". ℝ is a 1-dimensional real vector space.

 Quote by Fredrik No, they can even be 0-dimensional. That would be a set with only one member. (Denote that member by 0. Define addition and scalar multiplication by 0+0=0, and a0=0 for all scalars a. The triple ({0},addition,scalar multiplication) satisfies the definition of a vector space). 0-dimensional vector spaces are considered "trivial". ℝ is a 1-dimensional real vector space.
Sure, I'm not saying they can't be defined in those dimensions; by "assumed" I referred to the dimensions usually found in linear transformations involving velocities.
Mentor
 I think most theorems in linear algebra hold for any finite-dimensional vector space. But I'm sure there are some that only hold when the dimension is ≥2, and some that only hold when it's ≥3.

 Quote by Erland [...] If, on the other hand, we are talking about several, perhaps all, lines and their images, then the problem is that the parametric equations of the lines are not unique, we can freely choose between points on the line and parallel direction vectors, and it is hard to see how we can associate one such choice for the image line with one for the original line in a consistent way. How can then ##f(x,v)## be well defined?
We're talking about all lines and their images. The idea is that, for any given line, pick a parameterization, and find mappings such that the image is still a (straight) line, in some parameterization of the same type. The ##f(x,v)## is defined in terms of whatever parameterization we chose initially.

 No, I am not talking about the curve ##y=f(x)=x^3+x## in ##R^2##. I talk about ##f## as a transformation from ##R^1## to itself. In ##R^1##, there is only one line, ##R^1## itself, and it is mapped onto itself by ##f##.
But that case is irrelevant to the physics applications here, since there's only one component ##x^i## (which I'll just write as ##x##); the notion of velocity cannot be defined, because one needs at least ##n=2## to write ##dx/dt##.

In your ##n=1## objection, ##x'## is parallel (or antiparallel) to ##x##. Afaict, this means that the 2nd derivatives in the proof such as
$$\frac{\partial^2 x'{^i}}{\partial x^j \, \partial x^k}$$
always vanish. Probably this is a degenerate case, though I haven't tracked it through to find precisely where this affects things. The authors are interested in ##dx/dt## which is an ##n\ge 2## case, hence probably didn't bother with that subtlety. Maybe the proof should have a caveat about ##n\ge 2##, but for the intended physics applications, this doesn't change anything.

BTW, note that Stepanov's proof does not use the parameterization technique used by Guo et al, but rather works directly with 1+1D spacetime, requiring that the condition of zero acceleration is preserved. This is more physically intuitive, and less prone to subtle oversights.
 I may as well go ahead and complete the derivation of the Lorentz transformation (boost). Continuing from the previous time dilation derivation (post #37), we identify congruent triangles from which an easy derivation of the length contraction follows.
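As a numeric companion to the geometric derivation (illustrative only; a standard 1+1D boost with β = 0.6 is assumed), the time dilation factor comes out as γ = 1.25:

```python
import math

# A clock at rest at x = 0 ticks at ct = 1 in its own frame. Boosting to a
# frame in which the clock moves with speed beta*c gives ct' = gamma.

def gamma(beta):
    return 1.0 / math.sqrt(1.0 - beta**2)

beta = 0.6
g = gamma(beta)                 # 1/sqrt(1 - 0.36) = 1.25
ct, x = 1.0, 0.0                # one tick of the clock at rest
ct_prime = g*ct - g*beta*x      # boosted time coordinate of the tick
print(ct_prime)                 # approximately 1.25
```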

 Quote by strangerep The most common reason is so-called homogeneity of space and time. By this, the authors mean that position-dependent (and time-dependent) dilations (scale changes) are ruled out arbitrarily. Personally, I prefer a different definition of spacetime homogeneity: i.e., that it should look the same wherever and whenever you are. IOW, it must be a space of constant curvature. This includes such things as deSitter spacetime, and admits a larger class of possibilities. But another way that various authors reach the linearity assumption is to start with the most general transformations preserving inertial motion, which are fractional-linear transformations. (These are the most general transformations which map straight lines to straight lines -- see note #1.) They then demand that the transformations must be well-defined everywhere, which forces the denominator in the FL transformations to be restricted to a constant, leaving us with affine transformations. In the light of modern cosmology, these arbitrary restrictions are becoming questionable. -------- Note #1: a simpler version of Fock's proof can be found in Appendix B of this paper: http://arxiv.org/abs/gr-qc/0703078 by Guo et al. An even simpler proof for the case of 1+1D can also be found in Appendix 1 of this paper: http://arxiv.org/abs/physics/9909009 by Stepanov. (Take the main body of this paper with a large grain of salt, but his Appendix 1 seems to be ok, though it still needs the reader to fill in some of the steps -- speaking from personal experience. :-)
I think this post exposes the central problem. Lorentz transformations are strongly related to a pragmatic necessity: inertial observers must have the impression that the essential properties of space are preserved (one particular example is the length element).

Conversely, does it mean that non-inertial observers must use transformations different from the Lorentz ones? If so, which ones?
Mentor

Anyone see a simple proof of the following less general statement? If ##\Lambda:\mathbb R^n\to\mathbb R^n## is a bijection that takes straight lines to straight lines, and takes 0 to 0, then ##\Lambda## is linear. Feel free to add assumptions about differentiability of ##\Lambda## if you think that's necessary.

I've got almost nothing so far. I can see that given an arbitrary vector x and an arbitrary real number t, there's a real number s such that ##\Lambda(tx)=s\Lambda(x)##. This means that there's a function ##s:\mathbb R^n\times\mathbb R\to\mathbb R## such that ##\Lambda(tx)=s(x,t)\Lambda(x)## for all x,t. For all x, we have ##0=\Lambda(0)=\Lambda(0x)=s(x,0)\Lambda(x)##. This implies that ##s(x,0)=0## for all ##x\neq 0##. We should be able to choose our s such that s(0,0)=0 as well.

I don't see how to proceed from here, and I don't really see how to begin with the evaluation of ##\Lambda(x+y)## where x,y are arbitrary. One idea I had was to let r be a number such that x+y is on the line through rx and ry. (If x,y are non-zero, there's always such a number. And if one of x,y is zero, there's nothing to prove.) Then there's a number t such that $$\Lambda(x+y)=(1-t)\Lambda(rx)+t\Lambda(ry)=(1-t)s(x,r)\Lambda(x)+ts(y,r)\Lambda(y).$$ But I don't see how to use this. If we want to turn the above into a "For all x,y" statement, we must write t(x,y) instead of t.

By the way, one of the reasons why I think there should be a simple proof is that this was an exercise in the book I linked to in post #27. Unfortunately the author didn't even mention that the map needs to take 0 to 0, so there's definitely something wrong with the exercise, but perhaps that omission is the only thing wrong with it. The author also assumed that the map is a surjection (onto a vector space W), rather than a bijection.
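A tiny sanity check on the statement of this conjecture (illustrative; the map below is an arbitrary example): a nonlinear bijection of the plane that fixes the origin must bend some straight line, so it cannot serve as a counterexample.

```python
# The bijection (x, y) -> (x, y + x**2) fixes the origin but bends the x-axis:
# the images of three points on the x-axis are not collinear.
L = lambda p: (p[0], p[1] + p[0]**2)
p, q, r = [L((t, 0.0)) for t in (0.0, 1.0, 2.0)]  # (0,0), (1,1), (2,4)
cross = (q[0]-p[0])*(r[1]-p[1]) - (q[1]-p[1])*(r[0]-p[0])
print(cross)  # 2.0 -- nonzero, so this map does not take lines to lines
```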

 Quote by Fredrik Anyone see a simple proof of the following less general statement? If ##\Lambda:\mathbb R^n\to\mathbb R^n## is a bijection that takes straight lines to straight lines, and takes 0 to 0, then ##\Lambda## is linear. Feel free to add assumptions about differentiability of ##\Lambda## if you think that's necessary.
A priori, per definition, a bijection is a surjection and an injection. I don't see why this should imply the linearity of that bijection.

 By the way, one of the reasons why I think there should be a simple proof is that this was an exercise in the book I linked to in post #27. Unfortunately the author didn't even mention that the map needs to take 0 to 0, so there's definitely something wrong with the exercise, but perhaps that omission is the only thing wrong with it. The author also assumed that the map is a surjection (onto a vector space W), rather than a bijection.
The exercise (1.3.1) on page 9 is not so complicated: if T is a linear transformation and x, y and z are collinear vectors, then you have α, β and λ (for example in ℝ) such that α·x = β·y = λ·z. Consequently, T(α·x) = T(β·y) = T(λ·z), and linearity implies α·T(x) = β·T(y) = λ·T(z), so that T(x), T(y) and T(z) are also collinear.
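The direction argued here (linear maps preserve collinearity) is also easy to check on an example; the particular map and line below are arbitrary illustrations:

```python
# An arbitrary linear map T on R^2 sends three collinear points to three
# collinear points (the cross-product test gives exactly zero).
T = lambda p: (2*p[0] + p[1], p[0] - p[1])
line = [(1 + 2*t, 3 - t) for t in (0.0, 1.0, 2.5)]  # points on one line
p, q, r = [T(v) for v in line]
cross = (q[0]-p[0])*(r[1]-p[1]) - (q[1]-p[1])*(r[0]-p[0])
print(cross)  # 0.0 -- the images are collinear
```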

Now I think we are very far from the initial question, which was to prove the uniqueness of the Lorentz transformations. There are several levels in the different contributions proposed so far: 1) at one level, contributions try to re-derive the Lorentz transformations (LTs), but this does not answer the initial question; 2) at the other level, indications are given concerning the logic going from the preservation of the length element (post 1) to the LTs. An answer to the initial question would thus consist in testing the uniqueness of the logic followed.
