Showing that Lorentz transformations are the only ones possible

The discussion centers on demonstrating that Lorentz transformations are the only transformations that preserve the invariant interval of spacetime, represented by the equation c²t² - x² - y² - z² = 0 holding in both frames A and B. It is suggested that to prove this, one should consider linear functions of the form x(x',t') and t(x',t') with unknown coefficients derived from physical assumptions. The conversation highlights that merely preserving the interval is insufficient; additional constraints are necessary to rule out arbitrary transformations. The participants also debate whether the conditions stated in a referenced book are adequate for proving the exclusivity of Lorentz transformations, with some arguing that more general transformations exist. Ultimately, the consensus is that while Lorentz transformations are significant, they are not the only possible transformations when considering broader mathematical frameworks.
  • #61
Erland said:
Strangerep, I must admire your patience. Yes, I suppose one must spend months if one is to have any chance of understanding this proof by Guo et al. A proof that to me seems to be utter gibberish. Even if their reasoning probably is correct, they have utterly failed to communicate it in an intelligible way.
But since you claim you now understand it, I keep asking you about it. I hope that's okay... What do you mean by "pick a parametrization"? How is this picking done? Surely, such parametrizations cannot be picked in a completely arbitrary manner, not even depending continuously upon the lines (or their positions)?

The only way I can understand this is to consider a map from lines to lines, but not lines as point sets, but as parametrized lines. If (x0,v) determines a parametrization x=x0+λv of a line, this is mapped to M(x0,v)=(y0,w) where y0=T(x0) and w=T(x0+v)-T(x0), where T is the coordinate transformation.

But even so, f(x,v) should be a function of x0, v and λ, not of x. And I don't understand how they can claim that f depends linearly upon v. This seems outright false, since we have the factors ##v^iv^j##, which form a quadratic expression in v, not a linear one. And then they deduce an equation (B3) in a way that I don't understand either.

So, there is not much I understand in this proof. :confused:
Here's my take on that part of the proof. I think I've made it to eq. (B3), but like you (if I understand you correctly), I have ##x_0## where they have ##x##. I'll write t,s instead of λ,λ' because it's easier to type, and I'll write u instead of v' because I'm going to use primes for derivatives, so I don't want any other primes. I will denote the map that takes straight lines to straight lines by ##\Lambda##, because that's a fairly common notation for a change of coordinates, and because seeing it written as x' really irritates me.

Let x be an arbitrary vector. Let v be an arbitrary non-zero vector. The map ##t\mapsto x+tv## (with domain ℝ) is a straight line. (Note that my x is their x0). By assumption, ##\Lambda## takes this to a straight line. So ##\Lambda(x)## is on that line, and for all t in ℝ, ##\Lambda(x+tv)## is on that line too. This implies that there's a non-zero vector u (in the codomain of ##\Lambda##) such that for each t, there's an s such that ##\Lambda(x+tv)=\Lambda(x)+su##.

Since we're dealing with a finite-dimensional vector space, let's define a norm on it and require u to be a unit vector. Now the number s is completely determined by the properties of ##\Lambda## along the straight line ##t\mapsto x+tv##, which is completely determined by x and v. It would therefore be appropriate to write the last term of ##\Lambda(x)+su## as s(x,v,t)u(x,v), but that would clutter the notation, so I will just write s(t)u. We will have to remember that they also depend on x and v. I will write the partial derivative of s with respect to t as s'. So, for all t, we have
$$\Lambda(x+tv)=\Lambda(x)+s(t)u.\qquad (1)$$ Now take the ith component of (1) and Taylor expand both sides around t=0. I will use the notation ##{}_{,j}## for the jth partial derivative. The first-order terms must be equal:
$$t\Lambda^i{}_{,j}(x)v^j=ts'(0)u^i.$$ This implies that
$$u^i=\frac{\Lambda^i{}_{,j}(x) v^j}{s'(0)}.$$ Now differentiate both sides of the ith component of (1) twice with respect to t, and then set t=0.
$$\Lambda^i{}_{,jk}(x)v^jv^k =s''(0)u^i=\frac{s''(0)}{s'(0)}\Lambda^i{}_{,j}(x) v^j.\qquad(2)$$ Now it's time to remember that s(t) really means s(x,v,t). The value of s''(0)/s'(0) depends on x and v, and is fully determined by the values of those two variables. So there's a function f such that ##f(x,v)=s''(0)/s'(0)##. Let's postpone the discussion of whether f must be linear in the second variable, and first consider what happens if it is linear in the second variable. Then we can write ##f(x,v)=v^i f_{,i}(x,0)=2f_{i}(x)v^i##, where I have defined ##f_i## by ##f_i(x)=f_{,i}(x,0)/2##. The reason for the factor of 2 will be obvious below. Now we can write (2) as
\begin{align}
\Lambda^i{}_{,jk}(x)v^jv^k &=2f_k(x)\Lambda^i{}_{,j}(x) v^j v^k\\
&=f_k(x)\Lambda^i{}_{,j}(x) v^j v^k +f_k(x)\Lambda^i{}_{,j}(x) v^j v^k\\
&=f_k(x)\Lambda^i{}_{,j}(x) v^j v^k +f_j(x)\Lambda^i{}_{,k}(x) v^k v^j\\
&=\big(f_k(x)\Lambda^i{}_{,j}(x)+f_j(x)\Lambda^i{}_{,k}(x)\big)v^k v^j.\qquad (3)
\end{align} All I did to get the third line from the second was to swap the dummy indices j and k in the second term. Since (3) holds for all x and all v≠0, it implies that
$$\Lambda^i{}_{,jk}(x)=f_k(x)\Lambda^i{}_{,j}(x)+f_j(x)\Lambda^i{}_{,k}(x).\qquad (4)$$ This is my version of their (B3). Since my x is their x0, it's not exactly the same. The fact that they have x (i.e. my x+tv) in the final result suggests that they didn't set t=0 like I did. So I think their result is equivalent to mine even though it looks slightly different.

Let's get back to the linearity of f in the second variable. I don't have a perfect argument for it yet, but I'm fairly sure that it can be proved using arguments similar to this (even though this one doesn't quite go all the way): (2) is an equality of the form
$$v^T M v= g(v)m^Tv,$$ where M is an n×n matrix and m is an n×1 matrix (like v). The equality is supposed to hold for all v. For all ##a\in\mathbb R##, we have
$$g(av)m^Tv =\frac{g(av)Mg(av)}{a} =\frac{1}{a}(av)^TM(av) =av^TMv =ag(v)m^Tv.$$ So at least we have ##g(av)=ag(v)## for all v such that ##m^Tv\neq 0##.
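(If it helps, here's a quick symbolic sanity check of (4): a fractional-linear map ##\Lambda(x)=(Ax+b)/(c\cdot x+d)## satisfies it exactly, with ##f_j(x)=-c_j/(c\cdot x+d)##. The particular matrix entries below are arbitrary test values, nothing taken from Guo et al.)

Code:
import sympy as sp

x1, x2 = sp.symbols('x1 x2')
x = sp.Matrix([x1, x2])
A = sp.Matrix([[2, 1], [0, 3]])       # arbitrary invertible matrix
b = sp.Matrix([1, -1])
c = sp.Matrix([1, 2])
d = 5

D = (c.T * x)[0] + d                  # denominator c.x + d
Lam = (A * x + b) / D                 # the fractional-linear map
f = [-c[j] / D for j in range(2)]     # candidate f_j(x) = -c_j/D

for i in range(2):
    for j in range(2):
        for k in range(2):
            lhs = sp.diff(Lam[i], x[j], x[k])
            rhs = f[k]*sp.diff(Lam[i], x[j]) + f[j]*sp.diff(Lam[i], x[k])
            assert sp.cancel(lhs - rhs) == 0   # eq. (4) holds identically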
 
  • #62
Fredrik, I am impressed!

Yes, I think you did what Guo et al intended, only in a clear, understandable way.

For the rest of the linearity of f wrt. v, this would follow quite easily if we could prove that
$$v^T M w= g(v)m^Tw$$ holds also for ##v\neq w##.

But how can we prove this? Some parallelogram-law-like argument, perhaps?
 
  • #63
Fredrik said:
$$g(av)m^Tv =\frac{g(av)Mg(av)}{a} =\frac{1}{a}(av)^TM(av) =av^TMv =ag(v)m^Tv.$$
The 2nd expression seems wrong (but also unnecessary, since the rest looks right if you just skip over it).

The earlier part of your argument is certainly an improvement over the original.

[Erland, I'll assume there's no longer any need for me to answer your post #58, unless you tell me otherwise.]
 
  • #64
strangerep said:
The 2nd expression seems wrong (but also unnecessary, since the rest looks right if you just skip over it).
Yes, that looks weird. This is what I scribbled on paper:
$$g(av)m^Tv =\frac{g(av)m^T(av)}{a}=\frac{(av)^T M(av)}{a}=a v^TM v =ag(v)m^T v.$$ I guess I ended up typing something else.
 
  • #65
Erland said:
For the rest of the linearity of f wrt. v, this would follow quite easily if we could prove that
$$v^T M w= g(v)m^Tw$$ holds also for ##v\neq w##.

But how can we prove this? Some parallelogram-law-like argument, perhaps?
You mean something like inserting v=u+w and v=u-w (where u and w are arbitrary), and subtracting one of the equalities from the other? I think we need to know that g is linear before we can get something useful from that kind of trick.
 
  • #66
I've been thinking about the linearity some more, and I'm starting to doubt that it's possible to prove that g is linear, i.e. that f(x,v) is linear in v. I mean, the function probably is linear, since the theorem ends up with what I trust is the correct conclusion, but it doesn't look possible to prove it just from the statement ##v^TMv=g(v)m^Tv## for all v. Not if we don't know anything about M or m. Since ##M_{jk}=\Lambda^i{}_{,\, jk}(x)##, we have ##M^T=M##, but that doesn't seem to help. I'm pretty confused right now.

By the way, I got a tip that my simplified version of the theorem is more or less "the fundamental theorem of affine geometry". See e.g. page 52 of "Geometry" by Marcel Berger. Link. Unfortunately I can't see the whole proof, but I can see that it's long and complicated.
 
  • #67
Fredrik said:
I've been thinking about the linearity some more, and I'm starting to doubt that it's possible to prove that g is linear, i.e. that f(x,v) is linear in v. I mean, the function probably is linear, since the theorem ends up with what I trust is the correct conclusion, but it doesn't look possible to prove it just from the statement ##v^TMv=g(v)m^Tv## for all v. Not if we don't know anything about M or m. Since ##M_{jk}=\Lambda^i{}_{,\, jk}(x)##, we have ##M^T=M##, but that doesn't seem to help. I'm pretty confused right now.
I believe the key point is to understand what is dependent on what. The mapping ##\Lambda## goes between two different copies of ##R^n## -- physically these correspond to different frames of reference. I'll call the copies ##V## and ##V'## (even though you dislike primes for this purpose -- I can't think of a better notation right now). A line in ##V## is expressed as ##L(x) = x_0 + \lambda v##, and a line in ##V'## is expressed as ##L'(x') = x'_0 + \lambda' v'## (component indices suppressed). The mapping is expressed as
$$
x ~\to~ x' = \Lambda(x) ~.
$$ When Guo et al write partial derivatives like ##\partial x'/\partial x## it should be thought of in terms of ##\partial \Lambda/\partial x##. This does not depend on ##v## since it refers to the entire mapping between the spaces ##V## and ##V'##.

Once this subtlety is seen, it becomes trivial (imho) that ##f(x,v)## is linear in ##v##, but I suspect I still haven't explained it adequately. :-(

Then, to pass from ##f(x,v)## to their ##f_i## functions, we just make an ansatz for ##f(x,v)## of the form
$$
f(x,v) ~=~ \sum_j f_j v^j
$$ and substitute it accordingly. The 2 terms in Guo's (B3) arise because on the LHS the partial derivatives commute.
 
  • #68
strangerep said:
Once this subtlety is seen, it becomes trivial (imho) that ##f(x,v)## is linear in ##v##, but I suspect I still haven't explained it adequately. :-(
Not trivial at all, imho. Please, show us!
 
  • #69
strangerep said:
I believe the key point is to understand what is dependent on what. The mapping ##\Lambda## goes between two different copies of ##R^n## -- physically these correspond to different frames of reference. I'll call the copies ##V## and ##V'## (even though you dislike primes for this purpose -- I can't think of a better notation right now). A line in ##V## is expressed as ##L(x) = x_0 + \lambda v##, and a line in ##V'## is expressed as ##L'(x') = x'_0 + \lambda' v'## (component indices suppressed).
I don't mind primes for this purpose. The only thing I really disliked about the article's notation was that they denoted the coordinate transformation by ##x'## instead of (something like) ##\Lambda##.

I don't understand your notation L(x) and L'(x'). Don't you mean L(λ) and L'(λ') (with x=L(λ) and x'=L'(λ')), i.e. that L and L' are maps that take a real number to a point in a 1-dimensional subspace? I would call both those functions and those 1-dimensional subspaces "lines".

strangerep said:
When Guo et al write partial derivatives like ##\partial x'/\partial x## it should be thought of in terms of ##\partial \Lambda/\partial x##. This does not depend on ##v## since it refers to the entire mapping between the spaces ##V## and ##V'##.

Once this subtlety is seen, it becomes trivial (imho) that ##f(x,v)## is linear in ##v##, but I suspect I still haven't explained it adequately. :-(
I agree with Erland. It looks far from trivial to me too. Note that I do understand that the partial derivatives do not depend on v. I made that explicit by putting them into matrices M and m that are treated as constants. (They obviously depend on my x, i.e. Guo's ##x_0##). The fact that ##M_{jk}=\Lambda^i_{,\,jk}(x)## only tells me that M is symmetric.

Eq. (2) in post #61 is
$$\Lambda^i{}_{,\,jk}(x)v^jv^k =f(x,v)\Lambda^i{}_{,\,j}(x) v^j.$$ Are we really supposed to deduce that f(x,v) is linear in v only from this? Here's my biggest problem with that idea: What if v is orthogonal (with respect to the Euclidean inner product) to the vector whose jth component is ##\Lambda^i{}_{,\,j}(x)##? (This is my m.) Then the right-hand side above is =0, and f isn't even part of the equation.

The orthogonal complement of m isn't just some insignificant set. It's an (n-1)-dimensional subspace. I don't see a reason to think that ##v\mapsto f(x,v)## is linear on that subspace.
 
  • #70
Fredrik said:
Eq. (2) in post #61 is
$$\Lambda^i{}_{,\,jk}(x)v^jv^k =f(x,v)\Lambda^i{}_{,\,j}(x) v^j.$$ Are we really supposed to deduce that f(x,v) is linear in v only from this? Here's my biggest problem with that idea: What if v is orthogonal (with respect to the Euclidean inner product) to the vector whose jth component is ##\Lambda^i{}_{,\,j}(x)##? (This is my m.) Then the right-hand side above is =0, and f isn't even part of the equation.

The orthogonal complement of m isn't just some insignificant set. It's an (n-1)-dimensional subspace. I don't see a reason to think that ##v\mapsto f(x,v)## is linear on that subspace.
True, but in this case the equation above holds for all ##i##. And, since the matrix ##\Lambda^i{}_{,\,j}(x)## is assumed to be invertible for all ##x##, not all its rows can be orthogonal to ##v##.

Still, I cannot deduce that ##f(x,v)## is linear in ##v##. I cannot get rid of the ##v##-dependence when I want to show that two matrices must be equal...

Let us, for a fixed ##x##, denote the matrix ##\Lambda^i{}_{,\,j}(x)## by ##A##, and let ##B(v)## be the ##n\times n##-matrix whose element in position ##ij## is ##\Lambda^i{}_{,\,jk}(x)v^k##, where each element in ##B(v)## is a linear function of ##v##. Finally, let ##g(v)=f(x,v)##, as before.
We then have the vector equation

##B(v)v=g(v)Av##.

If we could prove that ##B(v)=g(v)A##, we would be done, but the ##v##-dependence seems to destroy such a proof.
 
  • #71
Fredrik said:
Don't you mean L(λ) and L'(λ') (with x=L(λ) and x'=L'(λ')), [...]
More or less. I was trying to find a notation that made it more obvious that the ##L'## stuff was in a different space. I need to think about the notation a bit more to come up with something better.
[...]Note that I do understand that the partial derivatives do not depend on v. I made that explicit by putting them into matrices M and m that are treated as constants.
OK, then let's dispose of the easy part, assuming that the partial derivatives do not depend on v, and using an example that's easy to relate to your M,m notation.

First off, suppose I give you this equation:
$$
az^2 ~=~ b z f(z) ~,
$$where ##z## is a real variable and ##a,b## are real constants (i.e., independent of ##z##). Then I ask you to determine the most general form of the function ##f## (assuming it's analytic).

We express it as a Taylor series: ##f(z) = f_0 + z f_1 + z^2 f_2 + \dots## where the ##f_i## coefficients are real constants. Substituting this into the main equation, we get
$$
az^2 ~=~ b z (f_0 + z f_1 + z^2 f_2 + \dots)
$$ Then, since ##z## is a variable, we may equate coefficients of like powers of ##z## on both sides. This implies ##f_1 = a/b## but all the other ##f_i## are zero. Hence ##f(z) \propto z## is the most general form of ##f## allowed.
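(In sympy the coefficient matching can be done mechanically; here the Taylor series is truncated at third order just for illustration.)

Code:
import sympy as sp

z, a, b = sp.symbols('z a b')
f0, f1, f2, f3 = sp.symbols('f0 f1 f2 f3')
f = f0 + f1*z + f2*z**2 + f3*z**3            # truncated Taylor ansatz for f(z)

expr = sp.expand(a*z**2 - b*z*f)             # a z^2 - b z f(z) must vanish identically
eqs = [expr.coeff(z, n) for n in range(5)]   # coefficients of z^0 ... z^4
print(sp.solve(eqs, [f0, f1, f2, f3], dict=True))
# [{f0: 0, f1: a/b, f2: 0, f3: 0}], i.e. f(z) is proportional to z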

Now extend this example to 2 independent variables ##z_1, z_2## and suppose we are given an equation like
$$
A^{ij} z_i z_j ~=~ b^k z_k f(z_1,z_2) ~,
$$ (in a hopefully-obvious index notation), where ##A,b## are independent of ##z##. Now we're asked to find the most general (analytic) form of ##f##. Since ##z_1, z_2## are independent variables, we may expand ##f## as a 2D Taylor series, substitute it into the above equation, and equate coefficients for like powers of the independent variables. We get an infinite set of equations for the coefficients of ##1, z_1, z_2, z_1^2, z_1 z_2, z_2^2, \dots~## but only the terms from the expansion of ##f## corresponding to ##f^j z_j## can possibly match up with a nonzero coefficient on the LHS.

[Erland: Does that explain it enough? All the ##v^i## are independent variables, because we're trying to find a mapping whose input constraint involves a set of arbitrary lines.]
 
  • #72
strangerep said:
First off, suppose I give you this equation:
$$
az^2 ~=~ b z f(z) ~,$$ [...] Then I ask you to determine the most general form of the function ##f## (assuming it's analytic).
If it's analytic, then I agree that what you're doing proves the linearity. But I don't think it's obvious that our f(x,v) is analytic in v.

Erland said:
True, but in this case the equation above holds for all ##i##. And, since the matrix ##\Lambda^i{}_{,\,j}(x)## is assumed to be invertible for all ##x##, not all its rows can be orthogonal to ##v##.
Hm, that would solve one of our problems at least. What I wrote as ##v^TMv=g(v)m^Tv## is really n equalities, not just one. I should have kept the i index around to make that explicit. I'll put it downstairs: ##v^T M_i v =g(v)m_i^T v##. What you're saying is that when v≠0, there's always an i such that ##m_i^Tv\neq 0##. So if you're right, we can do this:

Let v be non-zero, but otherwise arbitrary. Let a be an arbitrary real number. For all i, we have
$$g(av)m_i^Tv=\frac{g(av)m_i^T(av)}{a} =\frac{(av)^TM_i(av)}{a} =a v^T M_i v =ag(v)m_i^Tv.$$ So now we just choose i such that ##m_i^T v\neq 0## and cancel that factor from both sides to get g(av)=ag(v).

Unfortunately, I still don't see how to prove that g(u+v)=g(u)+g(v) for all u,v.

You may have to remind me of some calculus. The square matrix that has the ##m_i^T## as its rows is the Jacobian matrix of ##\Lambda##. We need those rows to be linearly independent, so we need the Jacobian determinant of ##\Lambda## to be non-zero. But what's the problem with a function whose Jacobian determinant is zero? I haven't thought about these things in a while.
 
  • #73
Fredrik said:
If it's analytic, then I agree that what you're doing proves the linearity. But I don't think it's obvious that our f(x,v) is analytic in v.
Well, that needs more care. I think one only needs the assumption that the desired mapping be analytic in a neighborhood of the origin, but that's a subject for another post.

Unfortunately, I still don't see how to prove that g(u+v)=g(u)+g(v) for all u,v.
Having shown that ##f(x,v)## is of the form ##f_k v^k##, isn't that enough to continue to Guo's eq(165) and beyond?
You may have to remind me of some calculus. The square matrix that has the ##m_i^T## as its rows is the Jacobian matrix of ##\Lambda##. We need those rows to be linearly independent, so we need the Jacobian determinant of ##\Lambda## to be non-zero. But what's the problem with a function whose Jacobian determinant is zero? I haven't thought about these things in a while.
Since we're talking about transformations between inertial observers, we must be trying to find a group of transformations, hence they must be invertible. This should probably be inserted in the statement of the theorem.
 
  • #74
strangerep said:
Having shown that ##f(x,v)## is of the form ##f_k v^k##, isn't that enough to continue to Guo's eq(165) and beyond?
I suppose we can move on, but I don't think we have shown that.

strangerep said:
Since we're talking about transformations between inertial observers, we must be trying to find a group of transformations, hence they must be invertible. This should probably be inserted in the statement of the theorem.
Right, but for ##\Lambda## to be invertible, isn't it sufficient that its Jacobian matrix at x is ≠0 for all x? The condition on ##\Lambda## that we need to be able to prove that ##f(x,av)=af(x,v)## for all x,v and all real numbers a, is that its Jacobian determinant at x is non-zero for all x. To put it another way, it's sufficient to know that the rows of the Jacobian matrix are linearly independent.
 
  • #75
Fredrik said:
I suppose we can move on, but I don't think we have shown that.
Wait -- if you don't follow that, then we can't move on. Are you able to do the 2-variable example in my earlier post #71 explicitly, and show that the ##f(z)## there is indeed of the form ##f_j z^j## ?
 
  • #76
strangerep said:
Wait -- if you don't follow that, then we can't move on. Are you able to do the 2-variable example in my earlier post #71 explicitly, and show that the ##f(z)## there is indeed of the form ##f_j z^j## ?
Yes, if f is analytic, but we don't even know if it's differentiable.
 
  • #77
Fredrik said:
Yes, if f is analytic, but we don't even know if it's differentiable.
I think this follows from continuity of the mapping from ##\lambda## to ##\lambda'## (in terms of which ##f## was defined).

Edit: Adding a bit more detail... It's also physically reasonable to require that inertial observers with velocities ##v## and ##v+\epsilon## should not map to pathologically different inertial observers in the target space, else small error margins in one frame do not remain "small" in any sense under the mapping. Expressing this principle in a mathematically precise way, we say that open sets in ##v## space must map to open sets in ##v'## space, and vice versa. IOW, the mapping must be continuous wrt ##v##, in standard topology.
 
  • #78
Of course it is so that a square matrix is invertible iff its rows are linearly independent iff its determinant is ≠0. If we assume that ##\Lambda## is an invertible transformation such that both itself and its inverse are ##C^1## everywhere, then the Jacobian matrix of ##\Lambda## is invertible everywhere.

strangerep, I agree that you have proved that f(x,v) is linear in v if it is analytic, as a function of v, in a neighbourhood of the origin, but I agree with Fredrik that this is not obvious. Analyticity is a quite strong condition and I can't see any physical reason for it.
 
  • #79
Erland said:
strangerep, I agree that you have proved that f(x,v) is linear in v if it is analytic, as a function of v, in a neighbourhood of the origin, but I agree with Fredrik that this is not obvious. Analyticity is a quite strong condition and I can't see any physical reason for it.
Are you ok with the physical motivation that the mapping of the original projective space (of lines) to the target projective space (of lines) should be continuous?

Except for the point about analyticity, are you ok with the rest of the proof now?
 
  • #80
strangerep said:
Are you ok with the physical motivation that the mapping of the original projective space (of lines) to the target projective space (of lines) should be continuous?
Yes, this is a reasonable assumption. So, analyticity follows from this?
strangerep said:
Except for the point about analyticity, are you ok with the rest of the proof now?
Up to the point we have discussed hitherto, yes. I have to read the rest of the proof.

Btw. It is indeed sufficient to prove analyticity in a neighbourhood of v=0. For then, strangerep's argument shows linearity for "small" vectors, and then Fredrik's argument showing homogeneity shows linearity also for "large" vectors.
 
  • #81
By the way, if anybody is interested: the theorem also holds without any smoothness or continuity assumptions. So if U and V are open in ##\mathbb{R}^n## and if ##\varphi:U\rightarrow V## is a bijection that takes lines to lines, then it is of the form described in the paper (which is called a projectivity).

This result is known as the local form of the fundamental theorem of projective geometry.
A general proof can be found here: rupertmccallum.com/thesis11.pdf

In my opinion, that proof is much easier than Guo's "proof" and more general. Sadly, I don't think the paper is very readable. If anybody is interested, then I'll write up a complete proof.
 
  • #82
I'm definitely interested in some of it, but I'm not sure if I will need the most general theorem. I'm mainly interested in proving this:
Suppose that X is a vector space over ℝ such that 2 ≤ dim X < ∞. If T:X→X is a bijection that takes straight lines to straight lines, then there's a y in X, and a linear L:X→X such that T(x)=Lx+y for all x in X.​
I have started looking at the approach based on affine spaces. (Link). I had to refresh my memory about group actions and what an affine space is, but I think I've made it to the point where I can at least understand the statement of the theorem ("the fundamental theorem of affine geometry"). Translated to vector space language, it says the following:
Suppose that X is a vector space over K, and that X' is a vector space over K'. Suppose that 2 ≤ dim X = dim X' < ∞. If T:X→X' is a bijection that takes straight lines to straight lines, then there's a y in X', an isomorphism σ:K→K', and a σ-linear L:X→X' such that T(x)=Lx+y for all x in X.​
Immediately after stating the theorem, the author suggests that it can be used to prove that the only automorphism of ℝ is the identity, and that the only continuous automorphisms of ℂ are the identity and complex conjugation. That's another result that I've been curious about for a while, so if it actually follows from the fundamental theorem of affine geometry, then I think I want to study that instead of the special case I've been thinking about.

But now you're mentioning the fundamental theorem of projective geometry, so I have to ask: why do we need to go to projective spaces?

Also, if you (or anyone) can tell me how that statement about automorphisms of ℝ and ℂ follows from the fundamental theorem of affine geometry, I would appreciate it.
 
  • #83
micromass said:
By the way, if anybody is interested [...]
YES! YES! YES! (Thank God someone who knows more math than me has taken pity on us and decided to participate in this thread... :-)
the theorem also holds without any smoothness or continuity assumptions. So if U and V are open in ##\mathbb{R}^n## and if ##\varphi:U\rightarrow V## is a bijection that takes lines to lines, then it is of the form described in the paper (which is called a projectivity).
Hmmm. On Wiki, "projectivity" redirects to "collineation", but there's not enough useful detail on projective linear transformations and "automorphic collineations". :-(
This result is known as the local form of the fundamental theorem of projective geometry.
A general proof can be found here: rupertmccallum.com/thesis11.pdf
Coincidentally, I downloaded McCallum's thesis yesterday after doing a Google search for fundamental theorems in projective geometry. But I quickly realized it's not an easy read, hence not something I can digest easily.
In my opinion, that proof is much easier than Guo's "proof" and more general. Sadly, I don't think the paper is very readable. If anybody is interested, then I'll write up a complete proof.
YES, PLEASE! If you can derive those fractional-linear transformations in a way that physicists can understand, I'd certainly appreciate it -- I haven't been able to find such a proof at that level, despite searching quite hard. :-(

[Edit: I'm certainly interested in the more general projective case, although Fredrik is not.]
 
  • #84
I've just realized there's a simple geometric proof, for Fredrik's special case, for the case of the whole of ##\mathbb{R}^2##, which I suspect would easily extend to higher dimensions.

Let ##T : \mathbb{R}^2 \rightarrow \mathbb{R}^2## be a bijection that maps straight lines to straight lines. It must map parallel lines to parallel lines, otherwise two points on different parallel lines would both be mapped to the intersection of the non-parallel image lines, contradicting bijectivity. So it maps parallelograms to parallelograms. But, if you think about it, that's pretty much the defining property of linearity (assuming T(0)=0).

There are a few I's to dot and T's to cross to turn the above into a rigorous proof, but I think I'm pretty much there -- or have I omitted too many steps in my thinking? (I think you may have to assume T is continuous to extend the additive property of linearity to the scalar multiplication property.)
 
  • #85
DrGreg said:
I've just realized there's a simple geometric proof, for Fredrik's special case, for the case of the whole of ##\mathbb{R}^2##, which I suspect would easily extend to higher dimensions.

Let ##T : \mathbb{R}^2 \rightarrow \mathbb{R}^2## be a bijection that maps straight lines to straight lines. It must map parallel lines to parallel lines, otherwise two points on different parallel lines would both be mapped to the intersection of the non-parallel image lines, contradicting bijectivity. So it maps parallelograms to parallelograms. But, if you think about it, that's pretty much the defining property of linearity (assuming T(0)=0).

There are a few I's to dot and T's to cross to turn the above into a rigorous proof, but I think I'm pretty much there -- or have I omitted too many steps in my thinking? (I think you may have to assume T is continuous to extend the additive property of linearity to the scalar multiplication property.)
This idea is similar to the proof of the fundamental theorem of affine geometry in the book I linked to. The author is breaking it up into five steps. I think these are the steps, in vector space language:

Step 1: Show that T takes linearly independent sets to linearly independent sets.
Step 2: Show that T takes parallel lines to parallel lines.
Step 3: Show that T(x+y)=T(x)+T(y) for all x,y in X.
Step 4: Define an isomorphism σ:K→K'.
Step 5: Show that T(ax)=σ(a)T(x) for all a in K.

For my special case, we can skip step 4 and simplify step 5 to "Show that T(ax)=aT(x) for all a in K". I've been thinking that I should just try to prove these statements myself, using the book for hints, but I haven't had time to do a serious attempt yet.
 
  • #86
Fredrik said:
I'm definitely interested in some of it, but I'm not sure if I will need the most general theorem. I'm mainly interested in proving this:
If X is a finite-dimensional vector space over ℝ, and T:X→X is a bijection that takes straight lines to straight lines, then there's a y in X, and a linear L:X→X such that T(x)=Lx+y for all x in X.​

OK, I'll try to type out the proof for you in this special case.

I have started looking at the approach based on affine spaces. (Link). I had to refresh my memory about group actions and what an affine space is, but I think I've made it to the point where I can at least understand the statement of the theorem ("the fundamental theorem of affine geometry"). Translated to vector space language, it says the following:
Suppose that X is a vector space over K, and that X' is a vector space over K'. Suppose that dim X = dim X' ≥ 2. If T:X→X' is a bijection that takes straight lines to straight lines, then there's a y in X', an isomorphism σ:K→K', and a σ-linear L:X→X' such that T(x)=Lx+y for all x in X.​
(I don't know if these vector spaces need to be finite-dimensional).

Ah, but this is far more general since it deals with arbitrary fields and stuff. The proof will probably be significantly harder than the ##\mathbb{R}## case.

Immediately after stating the theorem, the author suggests that it can be used to prove that the only automorphism of ℝ is the identity, and that the only continuous automorphisms of ℂ are the identity and complex conjugation. That's another result that I've been curious about for a while, so if it actually follows from the fundamental theorem of affine geometry, then I think I want to study that instead of the special case I've been thinking about.

I don't think you can use the fundamental theorem to prove that ##\mathbb{R}## has only one automorphism. I agree the author makes you think that. But what he actually wants to do is prove that the only line-preserving maps ##\mathbb{R}^n\rightarrow\mathbb{R}^n## are the affine maps. The fundamental theorem deals with semi-affine maps: so there is an automorphism of the field. So in order to prove the case of ##\mathbb{R}^n## he needs a lemma that states that there is only one automorphism on ##\mathbb{R}##. It is not a result that (I think) follows from the fundamental theorem.

That said, the proof that ##\mathbb{R}## has only one automorphism is not very hard. Let ##\sigma:\mathbb{R}\rightarrow \mathbb{R}## be an automorphism. So:

  • ##\sigma## is bijective
  • ##\sigma(x+y)=\sigma(x)+\sigma(y)##
  • ##\sigma(xy)=\sigma(x)\sigma(y)##

So ##\sigma(0)=\sigma(0+0)=\sigma(0)+\sigma(0)##, so ##\sigma(0)=0##.
Likewise, ##\sigma(1)=\sigma(1\cdot 1)=\sigma(1)\sigma(1)##, so ##\sigma(1)=1## (unless ##\sigma(1)=0##, which is impossible because of injectivity).

Take ##n\in \mathbb{N}##. Then we can write ##n=\sum_{k=1}^n 1##. So
$$\sigma(n)=\sigma\left(\sum_{k=1}^n 1\right)=\sum_{k=1}^n \sigma(1)=\sum_{k=1}^n 1=n.$$

Now, we know that ##0=\sigma(0)=\sigma(n+(-n))=\sigma(n)+\sigma(-n)##. It follows that ##\sigma(-n)=-\sigma(n)=-n##.

So we have proven that ##\sigma## is fixed on ##\mathbb{Z}##.

Take ##p\in\mathbb{Z}## with ##p\neq 0##. Then ##1=\sigma(1)=\sigma(p\tfrac{1}{p})= \sigma(p)\sigma(\tfrac{1}{p})=p\sigma(\tfrac{1}{p})##. So ##\sigma(1/p)=1/p##.
So, for ##p,q\in \mathbb{Z}## with ##q\neq 0##: ##\sigma(p/q)=\sigma(p)\sigma(1/q)=p/q##. So this proves that ##\sigma## is fixed on ##\mathbb{Q}##.

Take ##x>0## in ##\mathbb{R}##. Then there exists a nonzero ##y\in \mathbb{R}## with ##y^2=x##. But then ##\sigma(x)=\sigma(y)^2\geq 0##, and ##\sigma(x)\neq 0## by injectivity (since ##x\neq 0##), so ##\sigma(x)>0##.
Take ##x<y## in ##\mathbb{R}##. Then ##y-x>0##. So ##\sigma(y-x)=\sigma(y)-\sigma(x)>0##. Thus ##\sigma(x)<\sigma(y)##. So ##\sigma## preserves the ordering.

Assume that there exists an ##x\in \mathbb{R}## such that ##\sigma(x)\neq x##. Assume (for example) that ##\sigma(x)<x##. Then there exists a ##q\in \mathbb{Q}## such that ##\sigma(x)<q<x##. But since ##\sigma## preserves orderings and rationals, it follows that ##\sigma(x)>q##, which is a contradiction. So ##\sigma(x)=x##.

This proves that the identity is the only automorphism on ##\mathbb{R}##.

Now, for automorphisms on ##\mathbb{C}##. Let ##\tau## be a continuous automorphism on ##\mathbb{C}##. Completely analogously, we prove that ##\tau## is fixed on ##\mathbb{Q}##. Since ##\tau## is continuous and since ##\mathbb{Q}## is dense in ##\mathbb{R}##, it follows that ##\tau## is fixed on ##\mathbb{R}##.

Now, since ##i^2=-1##, it follows that ##\tau(i)^2=-1##. So ##\tau(i)=i## or ##\tau(i)=-i##. In the first case ##\tau(a+ib)=\tau(a)+\tau(i)\tau(b)=a+ib##. In the second case: ##\tau(a+ib)=a-ib##.
So there are only two continuous automorphisms on ##\mathbb{C}##.

But now you're mentioning the fundamental theorem of projective geometry, so I have to ask: why do we need to go to projective spaces?

We don't really need projective spaces. We can prove the result without referring to it. But the result is often stated in this form because it is more general.
Also, one of the advantages of projective spaces is that ##\varphi(\mathbf{x})=\frac{A\mathbf{x}+B}{C\mathbf{x}+D}## is everywhere defined, even if the denominator is 0 (in that case, the result will be a point at infinity).
 
  • #87
Fredrik said:
This idea is similar to the proof of the fundamental theorem of affine geometry in the book I linked to. The author is breaking it up into five steps. I think these are the steps, in vector space language:

Step 1: Show that T takes linearly independent sets to linearly independent sets.
Step 2: Show that T takes parallel lines to parallel lines.
Step 3: Show that T(x+y)=T(x)+T(y) for all x,y in X.
Step 4: Define an isomorphism σ:K→K'.
Step 5: Show that T(ax)=σ(a)T(x) for all a in K.

For my special case, we can skip step 4 and simplify step 5 to "Show that T(ax)=aT(x) for all a in K". I've been thinking that I should just try to prove these statements myself, using the book for hints, but I haven't had time to do a serious attempt yet.
Maybe I need to spell this bit out. I think if T is continuous and your Step 3 is true and ##K = \mathbb{R}## then you can prove ##T(a\mathbf{x})=aT(\mathbf{x})## as follows.

It's clearly true for a = 2 (put x=y in step 3).

By induction it's true for any integer a (y = (a-1)x).

By rescaling it's true for any rational a.

By continuity of T and density of ##\mathbb{Q}## in ##\mathbb{R}## it's true for all real a.
 
  • #88
micromass said:
But what he actually wants to do is prove that the only line-preserving maps ##\mathbb{R}^n\rightarrow\mathbb{R}^n## are the affine maps. The fundamental theorem deals with semi-affine maps: so there is an automorphism of the field. So in order to prove the case of ##\mathbb{R}^n## he needs a lemma that states that there is only one automorphism on ##\mathbb{R}##. It is not a result that (I think) follows from the fundamental theorem.

That said, the proof that ##\mathbb{R}## has only one automorphism is not very hard.
...
Now, for automorphisms on ##\mathbb{C}##.
...
Thank you micromass. That was exceptionally clear. I didn't even have to grab a pen. :smile: This saved me a lot of time.

DrGreg said:
Maybe I need to spell this bit out. I think if T is continuous and your Step 3 is true and ##K = \mathbb{R}## then you can prove ##T(a\mathbf{x})=aT(\mathbf{x})## as follows.

It's clearly true for a = 2 (put x=y in step 3).

By induction it's true for any integer a (y = (a-1)x).

By rescaling it's true for any rational a.

By continuity of T and density of ##\mathbb{Q}## in ##\mathbb{R}## it's true for all real a.
Interesting idea. Thanks for posting it. I will however still be interested in a proof that doesn't rely on the assumption that T is continuous.
 
  • #89
Here is a proof for the plane. I think the same method of proof directly generalizes to higher dimensions, but it might get annoying to write down.

DEFINITION: A projectivity is a function ##\varphi## on ##\mathbb{R}^2## such that
$$
\varphi(x,y)=\left(\frac{Ax+By+C}{Gx+Hy+I},\frac{Dx+Ey+F}{Gx+Hy+I}\right)
$$
where ##A,B,C,D,E,F,G,H,I## are real numbers such that the matrix
$$
\left(\begin{array}{ccc} A & B & C\\ D & E & F\\ G & H & I\end{array}\right)
$$
is invertible. This invertibility condition tells us exactly that ##\varphi## is invertible. The inverse is again a projectivity and its matrix is given by the inverse of the above matrix.

We can see this easily as follows:
Recall that a homogeneous coordinate is defined as a triple ##[x:y:z]## with not all of ##x##, ##y## and ##z## zero. Furthermore, if ##\alpha\neq 0##, then we define ##[\alpha x: \alpha y : \alpha z]=[x:y:z]##.

There exists a bijection between ##\mathbb{R}^2## and the homogeneous coordinates ##[x:y:z]## with nonzero ##z##. Indeed, with ##(x,y)## in ##\mathbb{R}^2## we can associate ##[x:y:1]##, and with ##[x:y:z]## (##z## nonzero) we can associate ##(x/z,y/z)##.

We can now look at ##\varphi## on homogeneous coordinates. We define ##\varphi [x:y:z] = \varphi(x/z,y/z)##. Clearly, if ##\alpha\neq 0##, then ##\varphi [\alpha x:\alpha y:\alpha z]=\varphi [x:y:z]##. So the map is well defined.

In fact, in homogeneous coordinates our ##\varphi## is just matrix multiplication:
$$
\varphi[x:y:z] = \left(\begin{array}{ccc} A & B & C\\ D & E & F\\ G & H & I\end{array}\right)\left(\begin{array}{c} x\\ y \\ z\end{array}\right)
$$
Now we see clearly that ##\varphi## has an inverse given by
$$
\varphi^{-1} [x:y:z] = \left(\begin{array}{ccc} A & B & C\\ D & E & F\\ G & H & I\end{array}\right)^{-1}\left(\begin{array}{c} x\\ y \\ z\end{array}\right)
$$
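(If anyone wants to see this concretely: here is a small numerical illustration of the homogeneous-coordinate picture, with an arbitrary invertible matrix. The projectivity is just matrix multiplication on ##[x:y:1]##, and collinear points stay collinear.)

Code:
import numpy as np

M = np.array([[2.0, 1.0, 0.5],
              [0.0, 3.0, -1.0],
              [1.0, 0.0, 4.0]])        # arbitrary invertible 3x3 matrix

def phi(p):
    """Apply the projectivity to a point p of R^2 via homogeneous coordinates."""
    x, y, z = M @ np.array([p[0], p[1], 1.0])
    return np.array([x / z, y / z])    # assumes z != 0, i.e. p is not sent to infinity

def collinear(p, q, r, tol=1e-9):
    """Zero-area test for three points of R^2."""
    return abs((q[0]-p[0])*(r[1]-p[1]) - (q[1]-p[1])*(r[0]-p[0])) < tol

# three points on the line y = 2x + 1
p, q, r = np.array([0.0, 1.0]), np.array([1.0, 3.0]), np.array([2.0, 5.0])
print(collinear(p, q, r), collinear(phi(p), phi(q), phi(r)))   # True True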




LEMMA: Let ##x,y,z,t \in \mathbb{R}^2## be four distinct points such that no three of them lie on the same line. Let ##x',y',z',t' \in \mathbb{R}^2## also be four points such that no three of them lie on the same line. Then there exists a projectivity ##\varphi## such that ##\varphi(x)=x^\prime##, ##\varphi(y)=y^\prime##, ##\varphi(z)=z^\prime##, ##\varphi(t)=t^\prime##.

We write in homogeneous coordinates:
$$x=[x_1:x_2:x_3],~y=[y_1:y_2:y_3],~z=[z_1:z_2:z_3],~t=[t_1:t_2:t_3].$$

Since ##\mathbb{R}^3## has dimension 3, we can find ##\alpha,\beta,\gamma## in ##\mathbb{R}## such that
$$(t_1,t_2,t_3)=(\alpha x_1,\alpha x_2,\alpha x_3)+(\beta y_1,\beta y_2,\beta y_3)+ (\gamma z_1, \gamma z_2,\gamma z_3).$$

The vectors ##(\alpha x_1,\alpha x_2,\alpha x_3)##, ##(\beta y_1,\beta y_2,\beta y_3)##, ##(\gamma z_1, \gamma z_2,\gamma z_3)## form a basis for ##\mathbb{R}^3## (because of the condition that no three of ##x,y,z,t## lie on one line).

We can do the same for the ##x',y',z',t'## and we again obtain a basis ##(\alpha^\prime x_1^\prime,\alpha^\prime x_2^\prime,\alpha^\prime x_3^\prime)##, ##(\beta^\prime y_1^\prime,\beta^\prime y_2^\prime,\beta^\prime y_3^\prime)##, ##(\gamma^\prime z_1^\prime, \gamma^\prime z_2^\prime,\gamma^\prime z_3^\prime)## such that
$$(t_1^\prime, t_2^\prime,t_3^\prime)=(\alpha^\prime x_1^\prime,\alpha^\prime x_2^\prime,\alpha^\prime x_3^\prime)+(\beta^\prime y_1^\prime,\beta^\prime y_2^\prime,\beta^\prime y_3^\prime)+(\gamma^\prime z_1^\prime, \gamma^\prime z_2^\prime,\gamma^\prime z_3^\prime).$$

By linear algebra, we know that there exists an invertible matrix ##T## that maps the first basis onto the second. This implies directly that the associated projectivity sends ##x## to ##x'##, ##y## to ##y'## and ##z## to ##z'##. Since
$$(t_1,t_2,t_3)=(\alpha x_1,\alpha x_2,\alpha x_3)+(\beta y_1,\beta y_2,\beta y_3)+ (\gamma z_1, \gamma z_2,\gamma z_3),$$
we get after applying ##T## that
$$T(t_1,t_2,t_3)=(\alpha^\prime x_1^\prime,\alpha^\prime x_2^\prime,\alpha^\prime x_3^\prime)+(\beta^\prime y_1^\prime,\beta^\prime y_2^\prime,\beta^\prime y_3^\prime)+(\gamma^\prime z_1^\prime, \gamma^\prime z_2^\prime,\gamma^\prime z_3^\prime),$$
and thus ##T(t_1,t_2,t_3)=(t_1^\prime,t_2^\prime, t_3^\prime)##. Thus the projectivity also sends ##t## to ##t'##.
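(Here is a numpy sketch of this construction, for the curious. The two quadruples of points below are arbitrary examples with no three points collinear.)

Code:
import numpy as np

def homog(p):
    return np.array([p[0], p[1], 1.0])

def projectivity_matrix(src, dst):
    """3x3 matrix of a projectivity taking the four points src to the four points dst
    (each quadruple is assumed to have no three points on one line)."""
    def scaled_basis(pts):
        x, y, z, t = (homog(p) for p in pts)
        B = np.column_stack([x, y, z])
        coeffs = np.linalg.solve(B, t)       # t = alpha*x + beta*y + gamma*z
        return B * coeffs                    # columns alpha*x, beta*y, gamma*z
    return scaled_basis(dst) @ np.linalg.inv(scaled_basis(src))

def apply_proj(M, p):
    x, y, z = M @ homog(p)
    return (x / z, y / z)

src = [(0, 0), (1, 0), (0, 1), (1, 1)]
dst = [(0, 0), (2, 1), (1, 3), (4, 5)]
M = projectivity_matrix(src, dst)
print([apply_proj(M, p) for p in src])       # reproduces dst up to rounding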



THEOREM: Let ##U\subseteq \mathbb{R}^2## be open and let ##\varphi:U\rightarrow \mathbb{R}^2## be injective. Assume that ##\varphi## sends lines to lines; then it is a projectivity.

We can of course assume that ##U## contains an equilateral triangle ##ABC##. Let ##P## be the centroid of ##ABC##.
By the previous lemma, there exists a projectivity ##\psi## such that ##\psi(\varphi(A))=A, ~\psi(\varphi(B))=B, ~\psi(\varphi(C))=C, ~\psi(\varphi(P))=P##. So we see that ##\sigma:=\psi\circ\varphi## sends lines to lines and that ##\sigma(A)=A,~\sigma(B)=B,~\sigma(C)=C,~\sigma(P)=P##. We will prove that ##\sigma## is the identity.

HINT: look at Figure 2.1, p.19 of the McCallum paper.

Define ##E## as the midpoint of ##AC##. Then ##E## is the intersection of ##AC## and ##PB##. But these lines are fixed by ##\sigma##. Thus ##\sigma(E)=E##. Let ##D## be the midpoint of ##BC## and ##F## the midpoint of ##AB##. Likewise, it follows that ##\sigma(D)=D## and ##\sigma(F)=F##.

Thus ##\sigma## preserves the vertices of the equilateral triangles ##AFE##, ##FBD##, ##DEF## and ##EDC##. Since ##\sigma## preserves parallelism, we see easily that ##\sigma## preserves the midpoints and centroids of the smaller triangles. So we can subdivide the triangles into even smaller triangles whose vertices are preserved. We keep doing this process and eventually we find a set ##S## dense in the triangle such that ##\sigma## is fixed on that dense set. If ##\sigma## were continuous, then ##\sigma## would be the identity on the triangle.

To prove continuity, we show that certain rhombuses are preserved. Look at Figure 2.3 on page 20 of McCallum. We have shown that the vertices of arbitrary triangles are preserved. Putting two such triangles together gives a rhombus. We will show that ##\sigma## sends the interior of any rhombus ##ABCD## into the rhombus ##ABCD##. Since the rhombus can be made arbitrarily small around an arbitrary point, it would follow that ##\sigma## is continuous.

By composing with a suitable linear map, we restrict to the following situation:

LEMMA: Let ##A=(0,0)##, ##B=(1,0)##, ##C=(1,1)## and ##D=(0,1)## and let ##\Sigma## be the square ##ABCD##. Suppose that ##\sigma:\Sigma\rightarrow \mathbb{R}^2## sends lines to lines and suppose that ##\sigma## is fixed on ##A,B,C## and ##D##. Then ##\sigma(\Sigma)\subseteq \Sigma##.

Take ##S## on ##CB##. We can make a construction analogous to 2.4, p.22 in McCallum. So we let ##TS## be horizontal, ##TU## have slope ##-1## and ##VU## be vertical. We define ##Q## as the intersection of ##AS## and ##VU##. If ##S## has coordinates ##(1,s)## for some ##s##, then we can easily check that ##Q## has coordinates ##(s,s^2)##. In particular, ##Q## lies in the upper half-plane (= everything above ##AB##).

Since ##S## is on ##CB## and since ##C## and ##B## are fixed, we see that ##\sigma(S)\in CB##. Let's say that ##\sigma(S)=(1,t)## for some ##t##. The line ##TS## is horizontal and ##\sigma## maps it to a horizontal line, so ##\sigma(T)## has the form ##(0,t)##. The line ##TU## has slope ##-1## (it is parallel to the fixed diagonal ##DB##), so ##\sigma(U)## has the form ##(t,0)##. Finally, it follows that ##\sigma(Q)## has the form ##(t,t^2)##. In particular, ##\sigma(Q)## is in the upper half-plane.

So we have proven that if ##S## is on ##CB##, then the ray ##AS## emanating from ##A## is sent into the upper half-plane. Let ##P## be an arbitrary point in the square; then it is an element of a ray ##AS## for some ##S##. This ray is taken to the upper half-plane. So ##\sigma(P)## is in the upper half-plane.

So this proves that the square ABCD is sent by ##\sigma## into the upper half-plane. Similar constructions show that the square is also sent into the lower, left and right half-planes. So taking all of these things together: ABCD is sent into ABCD. This proves the lemma.
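(The coordinate claim ##Q=(s,s^2)## is quick to check with sympy:)

Code:
import sympy as sp

s, x, y = sp.symbols('s x y')

S = (1, s)                                               # S on CB (the line x = 1)
T = (0, s)                                               # horizontal through S meets AD (x = 0)
U = (sp.solve(sp.Eq(y, T[1] - x), x)[0].subs(y, 0), 0)   # slope -1 line through T meets AB (y = 0)
AS = s*x                                                 # line through A = (0,0) and S: y = s*x
Q = (U[0], AS.subs(x, U[0]))                             # intersect AS with the vertical x = U[0]
print(U, Q)                                              # (s, 0) (s, s**2)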

So, right now we have shown that ##\sigma## is the identity on some small equilateral triangle in ##U##. So ##\varphi## is a projectivity on some small open subset ##U^\prime## of ##U## (namely on the interior of the triangle). We prove now that ##\varphi## is a projectivity on all of ##U##.

Around any point ##P## in ##U##, we can find some equilateral triangle. And we proved for such triangles that ##\varphi## is a projectivity and thus analytic. The uniqueness of analytic continuation now proves that ##\varphi## is a projectivity on all of ##U##.
 
  • #90
Nice proof!
If I understand it correctly, this proves that the most general transformations that take straight lines to straight lines are the linear fractional ones.
To get to the linear case one still needs to impose the condition mentioned above about the continuity of the transformation, right?
Classically (Pauli, for instance) this was done by just assuming Euclidean (Minkowskian) space as the underlying geometry.
 
