# How to learn advanced/abstract notation

1. Aug 28, 2016

### Xilor

Hi, while self-studying physics, I keep bumping into books that transition into more abstract math, using notation systems unfamiliar to me. This is often accompanied by explanations consisting of 'we can write this as', 'it is easy to show that', or 'therefore', which are not particularly helpful. And as books usually build on earlier sections, this is usually the point where I have to abandon a work. And of course, the next book re-explains all the sections I did understand, until again suddenly descending into some rune-language.

Are there any good methods/places to become familiar with these notation systems? Unfortunately I don't have access to professors who can explain the tricky bits when questions arise.
My focus would mostly be on SR/GR, and the problem usually starts appearing around the point where more abstract objects containing multiple elements, such as matrices, start appearing in formulas with millions of indices.

2. Aug 28, 2016

### Lucas SV

To be honest, the best place to learn mathematical notation from is mathematics itself. Personally, the first time I experienced material that emphasized the importance of proof was in a YouTube series on analysis.
If you start practising proofs in which you use sets, the symbol $\in$, $\subset$, $\cup$, $\cap$, logical notation $\Rightarrow$, $\forall$, $\exists$, you should start getting it at some point. I'm not sure it is this kind of notation you were talking about though.

The main mathematics of GR is differential geometry, which can be very technical. However, for a first look at GR, you do not need to know many technicalities. A book that teaches GR without going into the geometric details, or describing spacetime as a manifold, is Weinberg's "Gravitation and Cosmology". This was actually the first book I learnt GR from, and it allowed me to learn it really early on. Later I needed to learn manifolds anyway, though.

If you have examples of notations you are struggling with, feel free to post. Perhaps make a list. Look up the LaTeX symbols required, as in http://web.ift.uib.no/Teori/KURS/WRK/TeX/symALL.html, or a post in PF (the FAQ talks about LaTeX and writing equations).

Last edited: Aug 28, 2016
3. Aug 28, 2016

### Xilor

Unfortunately it's more the technical side that I'm interested in now. Conceptually most of GR seems clear, but I'm trying to work myself towards the point where I could understand and perhaps work with the field equations etc.

For example, I was pointed to the Sean Carroll lectures and quickly got stuck on some of the earliest sections.
Up to (1.8) I don't have any problems following. At (1.9) I start having trouble. It took me some time here to figure out that the $x^\mu$ and $x^\nu$ actually refer to vectors using all of the dimensions, rather than just meaning one value of one dimension. But OK, I got through it.
At (1.10), I have no clue why the prime goes on top of the index and I don't know why the delta is missing now, but conceptually still no problems here.
From (1.12) I can tell what (1.11) is supposed to be, but would not have figured it out otherwise. What is happening with these indices? Why is the $\mu$ flying over to the matrix when he just referred to multiplying by $x^\mu$? What is its meaning up there at the matrix?
(1.13) makes me confident that I have no idea what is going on. What are these 'T' indices, why is this what we 'would like', how can I tell that whatever he's doing here relates to things being invariant? Where did the delta-x's suddenly come from and why do we have two of them? Are the T's even behind the numbers they belong to, or are they in front? What happened to all those other indices we were using earlier? Does this no longer apply to all dimensions? What?
Then he transforms whatever that was, using a 'therefore' and an 'or', into whatever (1.15) is, and I know I should just abandon this.

Another example, in this book by Schutz:
http://202.38.64.11/~jmy/documents/...rse in General Relativity(Second Edition).pdf
It starts out innocently enough, until (1.2) happens. A sudden transformation into some notation I don't understand. Okay, alpha and beta here are just 0,1,2,3 apparently, and I know what is supposed to come out of it so I have some vague idea. But when he then gets to (1.3), there is no way I can still follow, and again might as well abandon.

4. Aug 28, 2016

### Lucas SV

Ok, I may have come across these notes when I started learning tensorial notation. I think I had similar struggles. Anyway I will try to answer each problem.

(1.9) is the first example of a tensor equation written in components. The key to understanding it is the Einstein summation convention. As explained in the notes, (1.3) has the same content. Indeed, if you expand (1.9) by summing over both the $\mu$ and $\nu$ indices, you will see this (do it!). It is useful to note that although you are summing 16 terms, since $\mu$ and $\nu$ each range over four values, most of the terms in the sum are $0$ because $\eta$ is diagonal.
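If a computer check helps alongside the pen-and-paper expansion, here is a minimal Python sketch of it. It assumes units with $c=1$, the signature $(-,+,+,+)$ and an arbitrary sample separation; the function names are made up for illustration and are not from Carroll's notes.

```python
# Minkowski metric eta_{mu nu}, signature (-,+,+,+), units with c = 1 (assumed).
ETA = [
    [-1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
]


def interval_by_summation(dx):
    """s^2 = eta_{mu nu} dx^mu dx^nu with the implied double sum written out."""
    total = 0
    for mu in range(4):
        for nu in range(4):
            # 16 terms in all; the 12 off-diagonal ones vanish because eta is diagonal.
            total += ETA[mu][nu] * dx[mu] * dx[nu]
    return total


def interval_explicit(dx):
    """s^2 = -(dt)^2 + (dx)^2 + (dy)^2 + (dz)^2, the fully written-out form."""
    dt, dx1, dy, dz = dx
    return -dt**2 + dx1**2 + dy**2 + dz**2


delta_x = [2, 1, 3, 5]  # arbitrary sample separation (dt, dx, dy, dz)
print(interval_by_summation(delta_x), interval_explicit(delta_x))
```

Both numbers agree for any separation you plug in, which is all the summation convention is claiming here.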

The primed index in (1.10) is just a convention. Some authors put the prime on the index to say that $x$ is written in the new coordinate system, while others write $x'^\mu$. In the convention used by Carroll, $\mu$ and $\mu'$ should be considered as different indices in the same way $\mu$ and $\nu$ are considered as different indices.

For (1.11), it is again the Einstein summation convention. Do the following exercise: expand (1.11) using the summation convention. Then expand (1.12) using the componentwise definition of a matrix acting on a vector ($\Lambda$ is a matrix with components $\Lambda^{\alpha}_{\ \beta}$, while $x$ is a vector). Compare your results.
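To check your hand expansion, a rough Python sketch of the same comparison follows. The boost along $x$ with velocity $3/5$ (so $\gamma = 5/4$) is just an assumed example of a $\Lambda$, and the function names are hypothetical. Exact fractions are used so the two results can be compared without rounding.

```python
from fractions import Fraction

V = Fraction(3, 5)      # assumed boost velocity along x, in units of c
GAMMA = Fraction(5, 4)  # 1 / sqrt(1 - V^2) for V = 3/5

# LAMBDA[a][b] holds the component Lambda^a_b: first index = row, second = column.
LAMBDA = [
    [GAMMA, -GAMMA * V, 0, 0],
    [-GAMMA * V, GAMMA, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
]


def transform_components(lam, x):
    """Index form: one equation per free index mu', each summing over the dummy index nu."""
    x_new = []
    for mu_prime in range(4):
        total = 0
        for nu in range(4):
            total += lam[mu_prime][nu] * x[nu]
        x_new.append(total)
    return x_new


def matvec(matrix, vector):
    """Matrix form: the ordinary 'row times column' matrix-vector product."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


x = [4, 1, 2, 3]  # arbitrary sample components (t, x, y, z)
print(transform_components(LAMBDA, x) == matvec(LAMBDA, x))
```

The two functions do the same arithmetic in the same order, which is the point: the index equation and the matrix equation are two spellings of one computation.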

$T$ means the matrix transpose (look it up if you don't know it!), so it is not an index. You can take the transpose of a column vector and it becomes a row vector (vectors are also matrices).

Carroll switched notation back from components to matrix notation in (1.13). The first line of (1.13) (unprimed) is the same as (1.9) when written in components (prove this!). He equates it to the primed version because $s^2$, as a scalar, is meant to be invariant under coordinate transformations by $\Lambda$. Then he uses the transformation law for the change in $x$, as given in (1.12), in order to find the orthogonality condition (1.15).
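As a sanity check on what that condition demands of $\Lambda$, here is a small Python sketch that tests $\Lambda^T \eta \Lambda = \eta$ numerically. The boost with velocity $3/5$ and the "stretch" matrix are assumed examples of mine, and the helper names are hypothetical.

```python
from fractions import Fraction

ETA = [
    [-1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
]


def transpose(m):
    """Swap rows and columns."""
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    """Plain matrix product: entry (i, j) is the sum over k of a[i][k] * b[k][j]."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def preserves_interval(lam):
    """True when Lambda^T . eta . Lambda equals eta, i.e. s^2 is left unchanged."""
    return matmul(matmul(transpose(lam), ETA), lam) == ETA


v, gamma = Fraction(3, 5), Fraction(5, 4)  # assumed example boost along x
boost = [
    [gamma, -gamma * v, 0, 0],
    [-gamma * v, gamma, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
]
stretch = [  # rescales time only; not expected to satisfy the condition
    [2, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
]
print(preserves_interval(boost), preserves_interval(stretch))
```

The boost passes and the stretch fails, so the condition really does single out a special class of matrices rather than holding for any linear change of coordinates.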

5. Aug 28, 2016

### Lucas SV

Well, what I meant by technical is harder than tensor notation. The approach by Weinberg is the same as chapter one of Carroll, except that instead of going to manifolds in chapter 2, Weinberg keeps using components and equations like those in chapter 1 of Carroll throughout the whole book. But although this may be a shortcut (you will see what I mean once you get past chapter 1), it is pretty old-fashioned, and nowadays people would expect you to know about manifolds and index-free notation, which appear in chapter 2. Still, Weinberg's book is a great book, with lots of physical insight, and it will certainly teach you how to do calculations in GR and apply them to different circumstances.

A lecture series that may help with understanding the notation (it certainly did help me) is Susskind's.

6. Aug 28, 2016

### Lucas SV

This actually helps with understanding Carroll's (1.13), and is related to what I called the "componentwise definition of a matrix acting on a vector". So Schutz's equation (1.2) is very important. To convince yourself that it is true, do as many exercises of matrix multiplication of the following form as you can:
Compute the number $u^T\cdot A \cdot v$, where $A$ is an $n\times n$ matrix, and both $u$ and $v$ are $n\times 1$ matrices (a.k.a. column vectors). $T$ is the transpose I already described.

Pick any matrix you like and any two vectors you like and do the computation. Play around with this. After you are comfortable with some examples prove that (1.2) is true for the case of $n=2$. Then move on to $n=3$. Soon enough you will understand the pattern.
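If you want to check your hand computations, a minimal Python sketch along these lines may help. The matrix and vectors are arbitrary sample values and the function names are made up; the first function follows the matrix recipe, the second the double sum over components.

```python
def bilinear_matrix(u, a, v):
    """u^T . A . v the matrix way: form the column A.v first, then dot it with u."""
    a_v = [sum(a[i][j] * v[j] for j in range(len(v))) for i in range(len(a))]
    return sum(u[i] * a_v[i] for i in range(len(u)))


def bilinear_components(u, a, v):
    """The same number as a double sum over i and j of A_ij * u_i * v_j."""
    total = 0
    for i in range(len(u)):
        for j in range(len(v)):
            total += a[i][j] * u[i] * v[j]
    return total


A = [[1, 2], [3, 4]]  # arbitrary sample matrix
u = [5, 6]            # arbitrary sample vectors
v = [7, 8]
print(bilinear_matrix(u, A, v), bilinear_components(u, A, v))
```

Changing `A`, `u` and `v` to larger sizes works without touching the functions, so you can test the $n=3$ and $n=4$ cases the same way.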

7. Aug 28, 2016

### Xilor

So here at (1.11) I'm mostly confused why it is written as $x^{\mu'} = \Lambda^{\mu}_{\nu} x^{\nu}$ rather than $x' = \Lambda_{\nu} x^{\nu}$. The latter seems the same to me as (1.12), so what does this other information indicate?

Ah alright, that makes the T make sense. So then should I read these sections of 1.13 as:
a. Take the vector with the 4 values (x,y,z,t), transform that vector using a transposed form of the matrix we were using (wait isn't that the same since it was 4x4 and had a diagonal?), then multiply it by the vector again.
b. Same thing as the last one but now with the coords found earlier. (which we want to result in the same interval if the interval is invariant)
c. Take same coords as a. Transform them with that matrix we had used to find x' ($\Lambda$), transform with the transposed matrix again like in a. and b. , then transform with the transposed form of $\Lambda$. So basically the first transform would provide us with x' again, then the second step would be the same result as that what we had after the first transform in step 2. Then I guess the third step is supposed to take us back to the place we were right before multiplying by the last vector, so that the result is the same as in step 1 in the end.
So we need a kind of matrix for $\Lambda$ that would make that possible. So $\Lambda$ needs to be a matrix that when transposed will basically undo its transformation from x to x', but that needs to take into account that we transformed with the other matrix in between. Is that a correct interpretation of this step? (and basically what 1.14 says?).

Then on (1.15), if that just means the same again but using the summation notation then that mostly makes sense again. I still can't really read it, because the way this convention is done properly still eludes me. I for example have no clue how to figure out the order in which these operations are supposed to be done in this system. The whole transposing thing of $\eta$ is still confusing too. Or does the T mean a transpose of everything that happened before, rather than using a transposed form of whatever matrix it is in front?

I'll have a look at his book if I end up struggling through Carroll's, for sure. The lecture series sounds good too, thanks for the suggestions!

So my problem here with (1.2) was not that I don't believe they're the same or struggle with vector/matrix transformations. It's more that I couldn't really parse it to mean anything, let alone the right thing. But thanks to your previous comments, I'm guessing the meaning of it is that it's going to output (with both alpha and beta being 0,1,2,3 using t=0, x=1, y=2, z=3) 16 vectors M, each dealing with one of the possible combinations of dimensions. Is that correct? And for each we need to do a multiplication using only the values of those dimensions. Let's take alpha is 1 and beta is 2: we're going to have a vector M which is (0,x,0,0) * (0,0,y,0) = (0,0,0,0). And for alpha is 0 and beta is 0 we have (t,0,0,0) * (t,0,0,0) = (t^2,0,0,0). Is that correct?
On second thought. That doesn't seem right, after adding everything together we'd have a vector (t^2,x^2,y^2,z^2), rather than a single number, which is presumably what we want. And we also have t^2 instead of -t^2. How is it even possible to get the negative sign in here when nothing in 1.2 makes a reference to a negative sign?
If M is a number, then why the notation and wouldn't we get a different result?
If M is a matrix, then how exactly does everything even work, it doesn't seem to be defined as anything, so wouldn't it be just 4x4 zeroes?
Seems I'm still confused.

Perhaps the lectures will help. The question was initially more about figuring out how to be able to learn these kinds of things on my own accord anyway. It's amazing having someone knowledgeable help out, but if that's the solution for every roadblock, it's pretty hard to get further.

8. Aug 28, 2016

### Lucas SV

Yes you certainly need to struggle to learn those things.

It is $x^{\mu'}=\Lambda^{\mu'}_\nu x^\nu$. This actually stands for four equations, one for each value of $\mu'=0,1,2,3$. Each equation involves only components, i.e. numbers, and its RHS has four terms. The index $\mu'$, which appears once on each side, is called a free index. The index $\nu$, which appears twice on the RHS, is called a dummy index. You sum over dummy indices but not over free indices. If you recall your maths class on analytic geometry, you will have learnt that a system of linear equations can be written in matrix form. You can think of $x^{\mu'}=\Lambda^{\mu'}_\nu x^\nu$ as the system of four linear equations with four unknowns $x^\nu$, and of $x'=\Lambda\cdot x$ as the corresponding matrix form.

I'm not sure I follow, but it is true that (1.14) is a condition on the matrix $\Lambda$ that must hold in order for $s^2$ to be invariant under $\Lambda$ transformations. This condition is extremely important in SR; it is called the orthogonality condition. Any matrix $\Lambda$ satisfying this condition is called a Lorentz transformation.

Are you trying to show (1.2) from Schutz? The short answer is: it depends on $M$. If so, what we really want to compute is the product (which in mathematical terms is called a bilinear form):
$$\langle u,v \rangle = u^T\cdot A \cdot v$$
where $u$ and $v$ are elements of $\mathbb{R}^n$ and $A$ is a fixed linear operator acting on the space $\mathbb{R}^n$. If you are not too familiar with vector spaces and linear operators, just think of $A$ as a matrix.
I will write this equation for the case of $\mathbb{R}^2$
$$\langle \begin{pmatrix} u_1 \\ u_2 \end{pmatrix} , \begin{pmatrix} v_1 \\ v_2 \end{pmatrix} \rangle = \begin{pmatrix} u_1 & u_2 \end{pmatrix} \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \\ \end{pmatrix} \begin{pmatrix} v_1 \\ v_2 \end{pmatrix}$$
Please compute this equation. Then you will find an equation just like 1.2 (which by the way, does not use Einstein summation convention), except the indices range over two values instead of four. This is why I gave you the exercises, which I still suggest doing regardless of you knowing how to compute matrix multiplication. The point is to show the relationship between matrix multiplication in the special case of a bilinear form, and the expression for the bilinear form in components.

The $s^2$ appearing in SR is just a bilinear form in which $M$ happens to be $\eta$. Schutz was trying to explain why this is the case, by starting with an arbitrary $M$ and deriving what the components of $M$ should be in order for $s^2$ to satisfy (1.1).

So, generically speaking, whenever you see an object with two indices it is the components of a matrix, and whenever you see an object with one index, it is the components of a vector. You will learn all this once you move to tensors and transformation laws. If you really can't follow even after some exercises, go directly to the material on tensors before coming back to the physics. And again, the Susskind lectures help.

I think if you really understand (1.2) you will find all the rest much easier to understand also.

9. Aug 29, 2016

### Lucas SV

10. Aug 30, 2016

### Xilor

Aha. So all the $\mu'$ is saying is that after calculating the vector (or 4 equal vectors?), we take the value of that vector at index i, and plug it into the same index into x? So read it kind of like this: $x^{\mu'}=(\Lambda_\nu x^\nu)^{\mu'}$. Is that it? Unfortunately I've not taken this class you mention, so maybe this is a bit of learning to run before walking.

So I did your thing, and I got:

[u1 * (A11v1+ A12v2) + [u1 * (A21v1+ A22v2) + u2 * (A11v1 + A12v2)] + u2 *(A21v1 + A22v2)]

So I'm assuming you mean that we end up with something that is functionally similar to (1.2), with A representing M, and the dimensional values being like the values of u and v. It still doesn't tell me anything about A or M though, so how could it claim these are equal? I suppose M would have to be the standard SR matrix

$$\begin{pmatrix} -1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}$$

But it doesn't say that we should use that anywhere. So how is one supposed to deduce that?
In your example, using just the first two columns/rows. We'd get
[u1 * (-1*v1 + 0*v2) + [u1 * (0*v1 + 1*v2) + u2 * (-1*v1 + 0*v2)] + u2 * (0*v1 + 1*v2)]
=
-v1u1 + u1v2 - v1u2+ u2v2

Wait, that doesn't check out anyway...

Last edited: Aug 30, 2016
11. Aug 30, 2016

### Lucas SV

Yes, maybe it is.

I don't see where you get the two middle terms from. You should get
$$\langle u, v \rangle = u_1(A_{11}v_1+A_{12}v_2)+u_2(A_{21}v_1+A_{22}v_2)$$
The RHS of (1.2) is not too different from this, except that Schutz gave the name $M$ to $A$, and the underlying vector space has four dimensions instead of two.

The $A$ in the definition of the bilinear product is meant to be arbitrary. Actually I should be more careful with the wording and say that this is the definition of a bilinear product with respect to $A$. So for any matrix $A$ you pick (you are free to choose), the bilinear product is defined as in post #8. Notice, from the equation I just wrote, that the bilinear product is a linear combination of products of components of the vectors $u$ and $v$. Any arbitrary such linear combination will be a bilinear product for some matrix $A$.

Now let us return to Schutz. The argument goes as follows. Assume (1.1) holds. Assume we make a linear change of coordinates. Then, since $\Delta s^2$ is a linear combination of products of components of $\Delta x$ (by which I mean the four-vector) with components of $\Delta x$, it is the case that $\Delta \bar{s}^2$ is a linear combination of products of components of $\Delta \bar{x}$ with components of $\Delta \bar{x}$. Therefore $\Delta \bar{s}^2$ must be a bilinear form with respect to some matrix $M$. This is the same as saying that there exists a matrix $M$ such that (1.2) is true. Then this is used to figure out the components of $M$ up to the point where (1.5) is proved.

By the way, I understand what you mean by the notation $\Lambda_\nu x^\nu$. But this notation is a little awkward and people don't really use it. Either they write in full component form or in index-free notation. What you are suggesting is to mix the two, which again is non-standard (there are exceptions to this statement in gauge theory, but you still have a way to go to get there).

Also you had a good idea to only take the first two components to get your last equation (which would have been correct if you had gotten the bilinear form right). This is actually called a 1+1 spacetime.
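For that two-component case, here is a small Python sketch of what the correct bilinear form gives. It assumes the upper-left $2\times 2$ block of the SR matrix, $\mathrm{diag}(-1, 1)$, and the helper names are hypothetical. The first function lists which products $u_i v_j$ survive, the second evaluates the form for sample numbers.

```python
ETA_2D = [[-1, 0], [0, 1]]  # assumed: upper-left 2x2 block of the SR matrix


def nonzero_terms(a):
    """List (coefficient, u-component, v-component) for every term with A_ij != 0."""
    terms = []
    for i in range(len(a)):
        for j in range(len(a[i])):
            if a[i][j] != 0:
                terms.append((a[i][j], f"u{i + 1}", f"v{j + 1}"))
    return terms


def bilinear(u, a, v):
    """<u, v> = sum over i and j of A_ij * u_i * v_j."""
    return sum(a[i][j] * u[i] * v[j] for i in range(len(u)) for j in range(len(v)))


print(nonzero_terms(ETA_2D))
print(bilinear([3, 4], ETA_2D, [5, 6]))
```

Only the $u_1 v_1$ and $u_2 v_2$ products survive, with coefficients $-1$ and $+1$, so the cross terms never appear when the matrix is diagonal.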

Last edited: Aug 30, 2016
12. Aug 31, 2016

### Xilor

It was because I took the outer product instead of the dot product. It makes sense now.

Thanks for the explanation, I finally get what (1.2) is saying. I think I should probably take a few steps backwards before approaching the rest of this though, it's clearly still above my level.

13. Aug 31, 2016

### Staff: Mentor

There is no particular meaning to that. It is just a common convention in the SR literature to denote that a particular value is in a different reference frame (the primed frame) than the other frame (the unprimed frame).