I'm trying to find a way to simplify a complicated proof. The worst step of the proof involves a product of five 4×4 matrices. I'm hoping, perhaps naively, that if I could understand why the result of this operation is so simple, I may be able to explain the proof to others without actually doing the matrix multiplication.
The theorem is a statement about subgroups of ##\mathrm{GL}(\mathbb R^4)##. I will denote the standard basis for ##\mathbb R^4## by ##\{e_0,e_1,e_2,e_3\}##, and label rows and columns of matrices from 0 to 3. (This convention is often used in relativity.) The goal is to find all groups ##G\subset\mathrm{GL}(\mathbb R^4)## such that the subgroup of G that consists of all matrices of the form
$$\begin{pmatrix}* & * & * & *\\ 0 & * & * & *\\ 0 & * & * & *\\ 0 & * & * & *\end{pmatrix}$$ is equal to the group of all matrices of the form
$$\begin{pmatrix}1 & 0 & 0 & 0\\ 0 & & & \\ 0 & & R & \\ 0 & & & \end{pmatrix}$$ where R is a member of SO(3). I will use the notation ##U(R)## for such a matrix. The theorem says that this assumption implies that G is either the group of Galilean boosts or the group of Lorentz boosts.
I will describe some of the key steps of the proof. The first step is the observation that for all ##\Lambda\in G##, there exist ##R,R'\in\mathrm{SO}(3)## such that the 20,30,21,31 components of ##\Lambda## are all zero. (This is the lower left 2×2 corner). This isn't particularly hard. If we write
$$\Lambda=\begin{pmatrix}a & b^T\\ c & D\end{pmatrix}$$ where a is a number, b,c are 3×1 matrices, and D is a 3×3 matrix, we have
$$U(R)\Lambda U(R') =\begin{pmatrix}a & b^TR'\\ Rc & RDR'\end{pmatrix}.$$ So all we need to do is to choose R,R' such that the last two rows of R are orthogonal to c, and the first column of R' is orthogonal to the last two rows of RD.
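For anyone who wants to see the block formula in action without grinding through it by hand, here is a quick numerical sanity check in Python (standard library only; the helper names `matmul`, `U`, `rot12` and all the numerical entries are made up for illustration):

```python
import math

def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def U(R):
    """Embed a 3x3 rotation R as the 4x4 block matrix diag(1, R)."""
    return [[1, 0, 0, 0]] + [[0] + row for row in R]

def rot12(t):
    """Rotation by angle t in the 1-2 plane (a member of SO(3))."""
    return [[math.cos(t), -math.sin(t), 0],
            [math.sin(t),  math.cos(t), 0],
            [0, 0, 1]]

# An arbitrary Lambda = [[a, b^T], [c, D]] with made-up entries.
a = 2.0
b = [0.3, -1.1, 0.7]
c = [1.2, 0.5, -0.4]
D = [[1.0, 0.2, 0.0],
     [0.1, 0.9, 0.3],
     [0.4, 0.0, 1.1]]
Lam = [[a] + b] + [[c[i]] + D[i] for i in range(3)]

R, Rp = rot12(0.4), rot12(-1.3)
lhs = matmul(matmul(U(R), Lam), U(Rp))

# Check the block formula: the 00 entry a is untouched, the first
# column below it is Rc, and the first row right of it is b^T R'.
Rc = [sum(R[i][k] * c[k] for k in range(3)) for i in range(3)]
bR = [sum(b[k] * Rp[k][j] for k in range(3)) for j in range(3)]
assert abs(lhs[0][0] - a) < 1e-12
assert all(abs(lhs[i + 1][0] - Rc[i]) < 1e-12 for i in range(3))
assert all(abs(lhs[0][j + 1] - bR[j]) < 1e-12 for j in range(3))
```

The point of the block form is visible here: the scalar a is invariant under the whole ##U(R)XU(R')## action, and the column c and row ##b^T## transform independently of each other.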
The next step is to show that if the lower left is all zeroes, then so is the upper right. This is simple once you have seen the trick. (I will post it on request). The proof then focuses on the subgroup that consists of matrices such that the lower left and upper right are both all zeroes. It contains a matrix of the form
$$M=\begin{pmatrix} a & b & 0 & 0\\ -vd & d & 0 & 0\\ 0 & 0 & k & 0\\ 0 & 0 & 0 & \pm k\end{pmatrix}.$$ This is where things get interesting (because it gets so complicated that you can fill many pages with nothing but matrix multiplication). Define
$$F(t)=\begin{pmatrix}1 & 0 & 0 & 0\\ 0 & \cos t & -\sin t & 0\\ 0 & \sin t & \cos t & 0\\ 0 & 0 & 0 & 1\end{pmatrix}$$ Now the idea is to consider the matrix ##M^{-1}F(t)M##, which doesn't have the pretty form that M does, and then bring it to the pretty form by a transformation ##X\mapsto U(R)XU(R')##. This is where the magic happens. We choose R,R' in the simplest possible way to ensure that the lower left of ##U(R)M^{-1}F(t)MU(R')## will be all zeroes, and the result turns out to be even simpler than M! This is the magic I would like to be able to explain.
It turns out that ##U(R)M^{-1}F(t)MU(R')## is of the form
$$\begin{pmatrix}f & g & 0 & 0\\ -wf & f & 0 & 0\\ 0 & 0 & 1 & 0\\ 0 & 0 & 0 & 1\end{pmatrix}.$$ Now the question is, how did this happen? How did we get rid of the lower left diagonal elements and why are the upper left diagonal elements equal?
I've been doing the matrix multiplication in Mathematica. (I will post the code on request). Its result for ##M^{-1}F(t)M## is
$$\left(
\begin{array}{cccc}
\frac{a+b v \cos (t)}{a+b v} & \frac{b-b \cos (t)}{a+b v} & \frac{b k \sin (t)}{a d+b v d} & 0 \\
\frac{2 a v \sin ^2\left(\frac{t}{2}\right)}{a+b v} & \frac{b v+a \cos (t)}{a+b v} & -\frac{a k \sin (t)}{a d+b v d} & 0 \\
-\frac{d v \sin (t)}{k} & \frac{d \sin (t)}{k} & \cos (t) & 0 \\
0 & 0 & 0 & 1
\end{array}
\right)$$ If we write this as
$$\begin{pmatrix}r & q^T\\ p & S\end{pmatrix},$$ we have
$$U(R)M^{-1}F(t)MU(R') =\begin{pmatrix}r & q^TR'\\ Rp & RSR'\end{pmatrix}.$$ Since p is in the 1-2 plane, we choose R to be a rotation in the 1-2 plane. And then we can choose R' to be a rotation in the 1-2 plane as well, since all we're trying to do is to "zero out" the lower left. When R and R' are chosen this way, we end up with the very simple result above.
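The whole construction can be checked numerically. The sketch below (Python, standard library only; the values of a, b, v, d, k, t are arbitrary made-up numbers, taking the +k sign in M) builds M and F(t), computes ##M^{-1}F(t)M##, spot-checks two entries against the Mathematica output above, and then finds the two 1-2 plane rotation angles explicitly:

```python
import math

def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

# Made-up sample values; nothing below depends on the specific numbers.
a, b, v, d, k, t = 1.3, 0.4, 0.7, 1.1, 0.9, 0.8

M = [[a, b, 0, 0],
     [-v * d, d, 0, 0],
     [0, 0, k, 0],
     [0, 0, 0, k]]
# M is block diagonal, so its inverse can be written down directly;
# the upper-left 2x2 block has determinant d*(a + b*v).
det = d * (a + b * v)
Minv = [[d / det, -b / det, 0, 0],
        [v * d / det, a / det, 0, 0],
        [0, 0, 1 / k, 0],
        [0, 0, 0, 1 / k]]
F = [[1, 0, 0, 0],
     [0, math.cos(t), -math.sin(t), 0],
     [0, math.sin(t), math.cos(t), 0],
     [0, 0, 0, 1]]

N = matmul(matmul(Minv, F), M)  # N = M^{-1} F(t) M

# Spot-check two entries against the closed form from Mathematica.
assert abs(N[0][0] - (a + b * v * math.cos(t)) / (a + b * v)) < 1e-12
assert abs(N[2][0] - (-d * v * math.sin(t) / k)) < 1e-12
# p = (N[1][0], N[2][0], N[3][0]) lies in the 1-2 plane:
assert abs(N[3][0]) < 1e-12

def U12(s):
    """U(R) for R a rotation by angle s in the 1-2 plane."""
    return [[1, 0, 0, 0],
            [0, math.cos(s), -math.sin(s), 0],
            [0, math.sin(s), math.cos(s), 0],
            [0, 0, 0, 1]]

# Choose R to rotate p onto the 1-axis, zeroing components 20 and 30.
theta = -math.atan2(N[2][0], N[1][0])
A = matmul(U12(theta), N)
# Choose R' so that component 21 of the final product vanishes;
# component 31 then vanishes automatically, because the last row of
# U(R)N is already (0, 0, 0, 1).
psi = math.atan2(-A[2][1], A[2][2])
final = matmul(A, U12(psi))

# The lower-left 2x2 corner (components 20, 30, 21, 31) is now zero.
assert all(abs(final[i][j]) < 1e-9 for i in (2, 3) for j in (0, 1))
```

One thing this makes explicit is why rotations in the 1-2 plane suffice: the last row and column of ##M^{-1}F(t)M## are already ##(0,0,0,1)##, so a single orthogonality condition determines R', and the remaining conditions hold for free.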
The theorem (stated in a very awkward way) and its proof (sans matrix multiplication details) can, as far as I know, only be found in this book. Unfortunately, it's not possible to view all the pages, so I had to go to a library to check it out. I think I understand the proof well enough to answer questions about it.
One final comment: the raw result of the computation doesn't actually have all zeroes in the upper right. Instead, we use the fact that those entries must be zero (by the lemma that says that if the lower left is all zeroes, then so is the upper right) to determine a relationship between the variables, and use that to further simplify the nonzero components.