# I Some clarifications needed about SR notation

1. Sep 9, 2016

### JulienB

Hi everybody! The last chapter of my course, "Advanced Mechanics and Special Relativity", covers Lorentz transformations, but my teacher's script does not explain much about the notation used, and it's getting quite confusing for me without understanding it fully.

So far we've considered an inertial frame $\Sigma'$ moving "away" from an inertial frame $\Sigma$ at velocity $v$ in the $x$-direction. We derived the Lorentz transformation for that change of inertial frame:

$\Lambda = \begin{pmatrix} \cosh \eta & - \sinh \eta & 0 & 0 \\ - \sinh \eta & \cosh \eta & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}$

so that $(ct', x', y', z') = \Lambda (ct, x, y, z)$ with $\tanh \eta = \frac{v}{c}$. Our Minkowski metric is defined as

$\eta = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & -1 & 0 & 0 \\ 0 & 0 & -1 & 0 \\ 0 & 0 & 0 & -1 \end{pmatrix}$.

Then without any explanation he suddenly introduces that $\Lambda^T \eta \Lambda = \eta$ can equivalently be written as $\Lambda_{\rho}^{\mu} \eta_{\mu \nu} \Lambda_{\kappa}^{\nu} = \eta_{\rho \kappa}$! From this point he uses only this notation without describing what it means...
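Numerically, at least, that claimed identity does check out; here is a quick NumPy sketch I tried (the rapidity value is chosen arbitrarily, and all variable names are mine):

```python
import numpy as np

rap = 0.5  # rapidity; tanh(rap) = v/c, value chosen arbitrarily
ch, sh = np.cosh(rap), np.sinh(rap)

# Boost along x, exactly as in the matrix above
Lam = np.array([[ ch, -sh, 0.0, 0.0],
                [-sh,  ch, 0.0, 0.0],
                [0.0, 0.0, 1.0, 0.0],
                [0.0, 0.0, 0.0, 1.0]])

# Minkowski metric, signature (+, -, -, -)
eta = np.diag([1.0, -1.0, -1.0, -1.0])

# The claimed identity: Lambda^T eta Lambda = eta
print(np.allclose(Lam.T @ eta @ Lam, eta))  # True
```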

So I've done some research to try to make sense of it, but it seems rather complicated. Here are some assumptions I have made with the help of Google and some definitions given in my script:

1. $x^{\mu}$ is known as the contravariant 4-vector and can be defined as $x^{\mu} = (x^0, x^1, x^2, x^3) = (ct, x, y, z)$ (at least in the case of our example above). So $\mu$ seems to refer to the components of the vector. Is that right? I would also assume without any certainty at all that the index up means the coordinates are not given using the Minkowski metric, because:

2. $x_{\mu} = \eta_{\mu \nu} x^{\nu}$ is known as the covariant 4-vector, and when doing the matrix multiplication I get $x_{\mu} = (ct, -x, -y, -z)$. That seems to be the spacetime coordinates with the Minkowski metric applied, since it is the multiplication of $\eta$ with $x^{\nu}$. Is that assumption correct?

3. I've noticed that notations such as $\eta_{\mu \nu} x^{\nu}$ use the Einstein summation convention, and in fact mean $\sum_{\nu = 0}^{3} \eta_{\mu \nu} x^{\nu}$. But I actually don't see the difference from a regular matrix multiplication. I have read a few times that there is one; will I only realise it later down the road? What confuses me the most, though, are the indices. $\mu$ seems to refer to the rows of the matrix $\eta$, and $\nu$ to the columns. Why would we even write them if we are not dealing with components? Or do we write them just to show that the index is down, so that we are not in the Minkowski metric (if my first assumption about those indices was right in the first place...)?

4. I get the same problem with the notation for the matrix $\Lambda_{\nu}^{\mu}$. The indices seem to refer to the components of the matrix ($\mu$ would be the rows and $\nu$ the columns?), but again I don't understand why we would write them, and what it means when, say, I have two matrices $\Lambda_{\nu}^{\mu}$ and $\Lambda_{\kappa}^{\rho}$...
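To test my matrix-multiplication assumption numerically, I played with NumPy's `einsum`, which mirrors the index notation directly (the sample numbers are arbitrary):

```python
import numpy as np

eta = np.diag([1.0, -1.0, -1.0, -1.0])  # Minkowski metric, (+,-,-,-)
x_up = np.array([2.0, 1.0, 0.5, -3.0])  # contravariant components x^mu

# Einstein summation: x_mu = sum_nu eta_{mu nu} x^nu
x_down = np.einsum('mn,n->m', eta, x_up)

# For one repeated index summed between a matrix and a vector,
# this is exactly matrix-times-vector:
print(np.allclose(x_down, eta @ x_up))  # True

# With this metric the spatial components change sign:
print(x_down)  # [2., -1., -0.5, 3.]
```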

Those are my first questions about the notation in SR... My teacher's script directly derives relations and then goes on to infinitesimal Lorentz transformations (using this notation, of course). Apart from what I wrote above, the internet gave me very complicated answers on the matter. Special relativity represents only 3 chapters out of over 60 in that course, so I'm not gonna start (yet) reading a book about SR just to understand the notation when I have an exam coming up in 3 weeks...

Thank you very much in advance for your help, and sorry my message was so long... I think some examples illustrating the uses of this notation would be very helpful for me to understand it.

Julien.

2. Sep 9, 2016

### JulienB

Well... I'm getting it slowly. The indices of the matrices simply refer to the components of the matrix, and I assume it will be the same for the vectors. My confusion came from the fact that vectors and components are not distinguished in the script (they both use the same notation). The summation now makes total sense, and my new interpretation of the matrix product would be:

$\Lambda_{\rho}^{\mu} \eta_{\mu \nu} \Lambda_{\kappa}^{\nu} = \sum_{\mu = 0}^{3} \sum_{\nu = 0}^{3} \Lambda_{\rho}^{\mu} \eta_{\mu \nu} \Lambda_{\kappa}^{\nu}$

and it gives the expected result $\eta_{\rho \kappa}$.
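To convince myself, I wrote the double sum out as explicit loops in a short NumPy sketch (rapidity chosen arbitrarily; all names mine):

```python
import numpy as np

rap = 0.3  # arbitrary rapidity
ch, sh = np.cosh(rap), np.sinh(rap)
Lam = np.array([[ ch, -sh, 0.0, 0.0],
                [-sh,  ch, 0.0, 0.0],
                [0.0, 0.0, 1.0, 0.0],
                [0.0, 0.0, 0.0, 1.0]])
eta = np.diag([1.0, -1.0, -1.0, -1.0])

# result[rho, kappa] = sum_mu sum_nu Lam[mu, rho] * eta[mu, nu] * Lam[nu, kappa]
result = np.zeros((4, 4))
for rho in range(4):
    for kappa in range(4):
        for mu in range(4):
            for nu in range(4):
                result[rho, kappa] += Lam[mu, rho] * eta[mu, nu] * Lam[nu, kappa]

print(np.allclose(result, eta))  # True: the double sum reproduces eta
```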

Julien.

3. Sep 10, 2016

### vanhees71

It is of UTMOST importance to introduce a proper notation! You should NOT write $\Lambda_{\rho}^{\mu}$ but ${\Lambda^{\mu}}_{\rho}$. The horizontal position of the indices is important. The vertical position of an index is also very important, because it tells you whether you have co- (lower index) or contravariant (upper index) vector or tensor components.

The time-position four-vector (I use natural units with $c=1$ in the following) has contravariant components $(x^{\mu})=(t,x,y,z)^{T}$ (it's a column vector). Then in the boosted frame it has components
$$x'=\Lambda x$$
or in the index (or Ricci) notation
$$x^{\prime \mu}={\Lambda^{\mu}}_{\rho} x^{\rho},$$
where the summation over repeated indices is implied (Einstein summation convention). Here it is important that you are allowed to sum over such an index pair only if one is an upper and the other a lower index. An expression where this is not true must be wrong due to some error!

Now a Lorentz transformation is one that transforms from one Minkowski pseudo-orthonormal basis to another, i.e., for any two four-vectors you have
$$x' \cdot y'=x \cdot y \; \Leftrightarrow \; \eta_{\mu \nu} x^{\prime \mu} y^{\prime \nu}=\eta_{\rho \sigma} x^{\rho} y^{\sigma}.$$
Now you use the Lorentz transformation
$$\eta_{\mu \nu} x^{\prime \mu} y^{\prime \nu} = \eta_{\mu \nu} {\Lambda^{\mu}}_{\rho} {\Lambda^{\nu}}_{\sigma} x^{\rho} y^{\sigma} \stackrel{!}{=} \eta_{\rho \sigma} x^{\rho} y^{\sigma}.$$
Since this must hold for all four-vectors $x$ and $y$ for $\Lambda$ to be a matrix of a Lorentz transformation, you get
$$\eta_{\mu \nu} {\Lambda^{\mu}}_{\rho} {\Lambda^{\nu}}_{\sigma}=\eta_{\rho \sigma}.$$
In matrix notation this means
$$\Lambda^T \eta \Lambda=\eta$$
or since $\eta^2=1$
$$(\eta \Lambda^T \eta) \Lambda=1 \; \Rightarrow \; \Lambda^{-1} = \eta \Lambda^T \eta.$$
In the correct Ricci notation one should note that $(\eta_{\mu \nu})=(\eta^{\mu \nu})=\mathrm{diag}(1,-1,-1,-1)$. Then the latter equation reads
$${(\Lambda^{-1})^{\mu}}_{\nu} = \eta^{\mu \sigma} \eta_{\nu \rho} {\Lambda^{\rho}}_{\sigma} = {\Lambda_{\nu}}^{\mu}.$$
Here you clearly see how important it is to keep the horizontal as well as the vertical place of the indices of vector (tensor) components and Lorentz-transformation matrix elements straight!
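A quick numerical sanity check of the inverse formula (a NumPy sketch; the rapidity is arbitrary, and the variable names are of course not part of the physics):

```python
import numpy as np

rap = 0.7  # arbitrary rapidity
ch, sh = np.cosh(rap), np.sinh(rap)
Lam = np.array([[ ch, -sh, 0.0, 0.0],
                [-sh,  ch, 0.0, 0.0],
                [0.0, 0.0, 1.0, 0.0],
                [0.0, 0.0, 0.0, 1.0]])
eta = np.diag([1.0, -1.0, -1.0, -1.0])

# Lambda^{-1} = eta Lambda^T eta
Lam_inv = eta @ Lam.T @ eta
print(np.allclose(Lam_inv @ Lam, np.eye(4)))  # True
```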

4. Sep 10, 2016

### JulienB

Hi @vanhees71 and first thank you for your answer and for correcting my mistakes about writing the indices.

Though your post was definitely very helpful, a few things are still leaving me confused.

If $x^{\mu}$ is a column vector, then shouldn't it be written $(t\ x\ y\ z)^T$? As I understand it, a vector written with commas is a column vector, and its transpose is then a row vector.

Okay, starting with the matrix notation:

$(x') = \Lambda (x) = \begin{pmatrix} \sum_{\rho} {\Lambda^{0}}_{\rho} x^{\rho} \\ \sum_{\rho} {\Lambda^{1}}_{\rho} x^{\rho} \\ \sum_{\rho} {\Lambda^{2}}_{\rho} x^{\rho} \\ \sum_{\rho} {\Lambda^{3}}_{\rho} x^{\rho} \end{pmatrix}$

Then it makes total sense to me that the components of $(x')$ are $x'^{\mu} = \sum_{\rho} {\Lambda^{\mu}}_{\rho} x^{\rho} = {\Lambda^{\mu}}_{\rho} x^{\rho}$. No problem here (hopefully).

Now I didn't have that information unfortunately. Let me try to derive it myself:

$x' \cdot y' = x'_{\mu} y'^{\mu}$
$= x'^0 y'^0 - x'^1 y'^1 - x'^2 y'^2 - x'^3 y'^3$
$= \gamma^2 (x^0 - \beta x^1)(y^0 - \beta y^1) - \gamma^2 (x^1 - \beta x^0)(y^1 - \beta y^0) - x^2 y^2 - x^3 y^3$ (here I do the proof only for a boost in x-direction)
$= x^0 y^0 - x^1 y^1 - x^2 y^2 -x^3 y^3$
$= x \cdot y$

Okay that took me quite a while already :) Then since $x \cdot y = x_{\mu} y^{\mu}$ with $x_{\mu} = \eta_{\mu \nu} x^{\nu}$, it follows:

$\eta_{\mu \nu} x'^{\mu} y'^{\nu} = \eta_{\rho \sigma} x^{\rho} y^{\sigma}$.

Okay that's quite clear.
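The invariance I just derived can also be spot-checked numerically for an x-boost; here is a small Python sketch I used ($\beta$ and the sample vectors are arbitrary, and the helper names are mine):

```python
import numpy as np

beta = 0.6  # arbitrary v/c
gamma = 1.0 / np.sqrt(1.0 - beta**2)

def boost_x(v):
    """Boost along x: v'^0 = gamma (v^0 - beta v^1), v'^1 = gamma (v^1 - beta v^0)."""
    v0, v1, v2, v3 = v
    return np.array([gamma * (v0 - beta * v1),
                     gamma * (v1 - beta * v0),
                     v2, v3])

def minkowski(a, b):
    """a . b = a^0 b^0 - a^1 b^1 - a^2 b^2 - a^3 b^3."""
    return a[0]*b[0] - a[1]*b[1] - a[2]*b[2] - a[3]*b[3]

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([0.5, -1.0, 2.0, 0.0])

# The Minkowski product is unchanged by the boost:
print(np.isclose(minkowski(boost_x(x), boost_x(y)), minkowski(x, y)))  # True
```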

That now is not so obvious to me. I don't understand the relation between $\Lambda^T$ and the expression above in Ricci notation. And isn't $\Lambda^T = \Lambda$ anyway? I am probably missing an essential point here.

May I suggest a "system" to go from matrix notation to Ricci notation? It is based on the last equation, let me know if that holds:

$\Lambda^{-1} = \eta \Lambda^{T} \eta$
1. I arbitrarily attribute indices to $\Lambda^{-1}$, say ${(\Lambda^{-1})^{\mu}}_{\nu}$. The rest of the equation will depend on those.
2. The components of the first $\eta$ and $\Lambda^{T}$ (= $\Lambda$ I suppose) have to multiply each other following matrix multiplication rules, hence this expression becomes in Ricci notation:
$\eta_{? \rho} {\Lambda^{\rho}}_{?}$.
3. The ? in the lower index of the $\Lambda$ can be freely defined as long as it is consistent with the second $\eta$, where it must be up in order to respect the summation convention and in 2nd position because of matrix multiplication rules. So the expression becomes:
$\eta_{? \rho} {\Lambda^{\rho}}_{\sigma} \eta^{? \sigma}$
4. The first question mark corresponds to the rows of the matrix, therefore it should be $\mu$. The 2nd one corresponds to the columns, so it should be $\nu$:
${(\Lambda^{-1})^{\mu}}_{\nu} = \eta_{\mu \rho} {\Lambda^{\rho}}_{\sigma} \eta^{\nu \sigma}$

That's a bit far-fetched, but I'm in search of some comfort in this new notation. I'm sure it's gonna come with some practice though.

Yes, that is now clear. And thank you very much for all your efforts at explaining this new notation to me.

Julien.

5. Sep 11, 2016

### vanhees71

I strongly discourage a notation where the variables are not separated by commas. For me $(t,x,y,z)$ is a row and $(t,x,y,z)^T$ a column.

Well, in the expression ${\Lambda^T}\eta$ you sum over the first indices of the objects, i.e., it means
$$(\Lambda^T \eta)_{\mu \nu} = {\Lambda^{\rho}}_{\mu} \eta_{\rho \nu}.$$
Also note that a general Lorentz-transformation matrix is not necessarily symmetric. Pure boosts are, however, symmetric.

Usually the direct use of the index notation is clearer than the matrix-vector notation, because it immediately tells you whether you have co- or contravariant components. To see the transformation properties, start with the contravariant components of a vector. They transform by definition like the $x^{\mu}$, i.e.,
$$A^{\prime \mu}={\Lambda^{\mu}}_{\nu} A^{\nu}.$$
Now we ask how covariant components transform. By definition you have
$$A_{\mu}'=\eta_{\mu \rho} A^{\prime \rho}=\eta_{\mu \rho} {\Lambda^{\rho}}_{\sigma} A^{\sigma}=\eta_{\mu \rho} {\Lambda^{\rho}}_{\sigma} \eta^{\sigma \nu} A_{\nu}= {(\Lambda^{-1})^{\nu}}_{\mu} A_{\nu}.$$
This means that the $A_{\mu}$ transform contragrediently to the $A^{\mu}$, as they should.
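One can verify this contragredient behavior numerically as well (a NumPy sketch; the rapidity and sample components are arbitrary):

```python
import numpy as np

rap = 0.4  # arbitrary rapidity
ch, sh = np.cosh(rap), np.sinh(rap)
Lam = np.array([[ ch, -sh, 0.0, 0.0],
                [-sh,  ch, 0.0, 0.0],
                [0.0, 0.0, 1.0, 0.0],
                [0.0, 0.0, 0.0, 1.0]])
eta = np.diag([1.0, -1.0, -1.0, -1.0])
Lam_inv = eta @ Lam.T @ eta              # Lambda^{-1} = eta Lambda^T eta

A_up = np.array([1.0, 2.0, -1.0, 0.5])   # contravariant components A^mu
A_down = eta @ A_up                      # covariant components A_mu

# A'^mu = Lambda^mu_nu A^nu   vs.   A'_mu = (Lambda^{-1})^nu_mu A_nu
A_up_prime = Lam @ A_up
A_down_prime = Lam_inv.T @ A_down        # sum over the first (upper) index

# Lowering the transformed contravariant components must give the same result:
print(np.allclose(A_down_prime, eta @ A_up_prime))  # True
```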

The invariance of the Minkowski product is the key for the understanding of the formalism. It's also directly motivated from Einstein's two postulates. For details see my FAQ article on SR:

http://th.physik.uni-frankfurt.de/~hees/pf-faq/srt.pdf

6. Sep 11, 2016

### pervect

Staff Emeritus
It's harder to write, but I suppose for utmost clarity one can use the latex on PF to format a column vector, i.e.

$$\begin{pmatrix} t \\ x \\ y \\ z \end{pmatrix}$$

The LaTeX code for this is \begin{pmatrix} t \\ x \\ y \\ z \end{pmatrix}; it appears to me the OP knows how to invoke LaTeX math mode already. The LaTeX for formatting $\Lambda^\mu{}_\rho$ with the horizontal spacing is \Lambda^\mu{}_\rho.

I'm really not sure why the horizontal spacing is critical, though it's the way I've always seen it written. MTW describes the index spacing convention as "northwest to southeast".

7. Sep 11, 2016

### robphy

If one wrote $A^{\mu}_{\nu}$, upon raising the index with the metric, does one obtain $A^{\mu\nu}$ or $A^{\nu\mu}$? (It gets worse for higher-index structures.) That's why MTW also uses the slot notation. In addition, more mathematically-conscious treatments make explicit the mapping from specific orderings of vectors and covectors to the reals.