# I Intuition behind elementary operations on matrices

Tags:
1. May 20, 2017

### Mr Real

For finding the inverse of a matrix A, we convert the expression A = I A (where I is identity matrix), such that we get I = B A ( here B is inverse of matrix A) by employing elementary row or column operations. But why do these operations work? Why does changing elements of a complete row by another corresponding row work but e.g. we can't add/subtract the same number in the equality?
Is there some proof of these operations? What is the principle/intuition behind these operations? Can they be explained by any first principles?

2. May 20, 2017

### Stephen Tashi

You have to be clear about what you mean by "work".

If the only requirement for a procedure to "work" is that it transforms an equation into another equation then we can add and subtract to both sides of a matrix equation. Each "side" of a matrix equation , when completely simplified, is a matrix. For example, suppose we have the equation $C = D$ where $C$ and $D$ are 2x3 matrices. If we add 7 to the entry in the 2nd row 3rd column on both sides of the equation, we get another equation. So adding 7 to the corresponding entry of both matrices does "work" in that sense.

Suppose we have the equation $C = (F)(G)$ where $C,F,G$ are 2x3 matrices. It is not true that we always transform this to an equivalent equation if we add 7 to the 2nd row 3rd column of $C$ and do the same to only to the matrix $F$. That isn't surprising. By analogy, if you have the equation in real variables $0 = (x-2)(x+1)$ we don't preserve the solution set if if we add 7 to the left hand side and add 7 only to factor $(x-2)$ on the right hand side.

With an equation in real variables, you can multiply both sides by the same number. For example, we can transform the equation $0 = (x/7 - 1/7)(x-2)$ to an equivalent equation by multiplying both sides by $7$. When we multiply the right side by 7, we are permitted to multiply only the factor $(x/7 - 1/7)$ by 7. By analogy, if we have the matrix equation $C = (F)(G)$ we can multiply both sides of the equation by a matrix $E$, obtaining an equivalent equation $(E)(C) = (E)(F)(G)$. In evaluating the right hand side, we can compute it as $(EF)(G)$

The reason an elementary row operation "works" is that it is the same as multiplying both sides of a matrix equation of the form $C = (F)(G)$ by a matrix $E$. Have you studied how to find the "elementary matrix" that performs a given elementary row operation?

3. May 20, 2017

### Mr Real

Okay, i understood what all you said above. But I still have some doubts. Consider C = (F)(G), now why does doing operations like replacing a complete row by another row of the same matrix or changing R2 to R1 - R3 (R refers to row) give an equivalent equation? What's the logic behind this?
Also while doing a row operation and changing C and G by the same amount (e.g. by replacing a row), it wouldn't give an equivalent equation but changing C and F by the same amount would. Why is it so?

4. May 20, 2017

### Staff: Mentor

5. May 20, 2017

### Stephen Tashi

The operations you described may not give an equilvalent equation - depending on what things in the equation are the variables. Not all manipulations with rows are "elementary row operations". An example of an elementary row operation is to replace R2 by R1 + R2.

That elementary row operation changes the matrix $A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$ to $\begin{pmatrix} a & b\\ a+c && b+d \end{pmatrix}$.

The same effect can be accomplished by the multiplication $\begin{pmatrix} 1 &0 \\ 1 & 1 \end{pmatrix} \begin{pmatrix} a & b \\ c& d \end{pmatrix}$. That's why I asked you if you have studied the "elementary matrices" that correspond to "elementary row operations".

When you perform an elementary row operation on a matrix equation $C = (F)(G)$ you are implementing a short-cut for multiplying both sides of the equation by an invertible matrix $E$.

6. May 21, 2017

### Mr Real

Can you provide an example of an equation of the form: C = (F)(G) where if we change R2 to R1 - R3, it doesn't give an equivalent equation.
And no, I didn't know about elementary matrices. Just read a bit about it and have understood these operations a little better. Will continue doing so.
Thanks for your response.
Mr R

7. May 21, 2017

### Staff: Mentor

Without giving it very much thought, I don't believe that replacing R2 by R1 - R3 is a valid row operation.
There are three row operations. Each of these row operations can be represented by an elementary matrix.
1. Replace a row by a nonzero multiple of itself. $R_i <-- kR_i$
2. Switch two rows. $R_i <--> R_j$
3.Replace a row by itself plus a constant multiple of another row. $R_i <-- R_i + kR_j$

Each row operation results in a changed matrix, but one that is equivalent to the one you started with. In other words, the three row operations don't change the solution set of the matrix you started with.

8. May 21, 2017

### Mr Real

Okay, so one of the terms needs to be the row that is to be changed.
So won't replacing R2 by R1+ 7R2 be fine ?
Just a set of some other queries regarding this: so we can't replace, let's say, R1 by R1 - R2 + R3 (as it has 3 terms), by R1 - R1/R2 ?
Thanks for your response.
Mr R

9. May 21, 2017

### Staff: Mentor

This isn't an elementary row operation, but it could be done by two of these operations.
1. $R_2 <-- 7R_2$
2. $R_2 <-- R_2 + R_1$
For the first, this could be done in two steps, similar to what I did above.
For your second, dividing one row by another isn't a valid operation. The elementary row operations involve only the addition of two rows, multiplication of a row by a nonzero constant, and swapping two rows. They don't involve multiplying one row by another, or dividing one row by another, taking the square root of a row, or other such operation.

10. May 21, 2017

### Stephen Tashi

An "equation" is statement involving variables. By contrast, an "equality" is simply a statement that two constant expressions are equal. For two equations to be "equivalent" they must have the same set of solutions.

The matrix equation $\begin{pmatrix}x&0&0 \\0&y&0 \\0&0&z \end{pmatrix} = \begin{pmatrix} 1&0&0 \\0&1&0 \\0&0&1 \end{pmatrix} \begin{pmatrix} 3&0&0 \\0&4&0 \\0&0&5 \end{pmatrix}$ has solution $x = 3, y =4, z = 5$.

The matrix equation $\begin{pmatrix} x&0&0 \\x&0&-z \\0&0&z \end{pmatrix} = \begin{pmatrix} 1&0&0 \\1&0&-1 \\0&0&1 \end{pmatrix} \begin{pmatrix} 3&0&0 \\0&4&0 \\0&0&5 \end{pmatrix}$ has additional solutions such as $x = 3, y = 13, z = 5$ because the equation allows $y$ to be arbitrary.

It is correct that the row operation you mention does change an "equality" into another equality.

$\begin{pmatrix}3&0&0 \\0&4&0 \\0&0&5 \end{pmatrix} = \begin{pmatrix} 1&0&0 \\0&1&0 \\0&0&1 \end{pmatrix} \begin{pmatrix} 3&0&0 \\0&4&0 \\0&0&5 \end{pmatrix}$

$\begin{pmatrix}3&0&0 \\3&0&-5 \\0&0&5 \end{pmatrix} = \begin{pmatrix} 1&0&0 \\1&0&-1 \\0&0&1 \end{pmatrix}\begin{pmatrix} 3&0&0 \\0&4&0 \\0&0&5 \end{pmatrix}$

However changing from one equality to another is not sufficient to prove that such a series of steps reaches a desired goal. For example one could begin with a matrix equality and apply the operation "Change all rows to rows of zeroes". This would produce another equality, but serve no useful purpose.

The goal we are pursuing in manipulating the equality $A = (I)(A)$ is to reach another equality of the form $I = (B)(A)$. For this goal to be reached, we must avoid producing an equality like
$\begin{pmatrix}3&0&0 \\3&0&-5 \\0&0&5 \end{pmatrix} = \begin{pmatrix} 1&0&0 \\1&0&-1 \\0&0&1 \end{pmatrix}\begin{pmatrix} 3&0&0 \\0&4&0 \\0&0&5 \end{pmatrix}$ because this leaves us no way to proceed that will change the left hand side to the identity matrix. The fact that the second column of entries in the left hand size is all zeroes prevents us from getting the second row $0,1,0$ of the identity matrix.

11. May 21, 2017

### mathwonk

matrices represent linear transformations between two vector spaces. there are several natural equivalence relations in these terms. one is left equivalence, where two transformations, between the same two spaces, are "left - equivalent" iff they become equal after composing one of them with an isomorphism of the target space. by convention of writing compositions, this means composing on the left with an isomorphism. this turns out to be the same as saying the two transformations have the same row space, also iff they have the same kernel.

in terms of the matrices of the linear transformations, they are left equivalent iff multiplying one of them from the left by an invertible matrix changes it into the other. since every isomorphism can be written as a composition of a sequence of "elementary matrices", this is the same as saying one of the matrices can be changed into the other by a sequence of elementary row operations. I.e. left multiplying by an elementary matrix just performs an elementary row operation.

since any matrix can be changed into any other matrix with the same kernel by a sequence of row operations, then any square matrix with zero kernel, can be changed into the identity matrix by such a sequence. Necessarily such a sequence will have as product the inverse of the matrix. This is why we can compute an inverse by these operations.

another equivalence relation is right equivalence, where two matrices are R-equivalent iff one can be changed into the other by right multiplication by an invertible matrix, or equivalently by a sequence of column operations. this happens iff the two matrices have the same image, i.e. same column space.

the 3rd equivalence relation for arbitrary matrices makes two matrices equivalent iff one can be changed into the other by using both row and column operations. for square matrices, the key equivalence relation is similarity, where we change one into the other by conjugating ti by an invertible matrix.

there are normal foms for matrices under all these equivalences. the normal form for row equivalence is the row educed echelon form, for column operations it is the reduced version of the columns, i.e. iff their transposes have the same row reduced echelon form. the row reduced echelon form of every invertible matrix for example is the identity matrix.

for 2 sided equivalence, the normal form is diagonal, with only 1's and zeroes on the diagonal, so two matrices are 2-sided equivalent iff they have the same rank. finally two square matrices are similar iff they have the same rational canonical form, iff their characteristic matrices are 2 sided equivalent over the polynomial ring, i.e. iff their characteristic matrices have the same diagonal form, (when the diagonal entries are made to successively divide one another.) these forms can be computed by the euclidean algorithm.

there is another normal form for similarity, the jordan form, but it cannot be computed by hand unless one is lucky enough to be able to factor the characteristic polynomial, which in general is unfeasible, but in cooked examples found in books, it can be done by the rational root theorem, since in those examples the characteristic polynomial is rigged to have all its irreducible factors linear polynomials.

Last edited: May 21, 2017
12. May 21, 2017

### WWGD

I don't know if this is what you are looking for, but , formally, the set of all elementary matrices generates the space of all invertible matrices through the operation of multiplication. This means if you start with the identity matrix ( for convenience; you can start with any other invertible matrix) and apply a finite number of elementary transformations, you will end up with any invertible matrix.

13. May 21, 2017

### Mr Real

But I have studied that an "expression" is a statement involving variables and an equation is an expression that has an equality sign.
How is the second equation an equality? The product of the two matrices in RHS does not give the matrix in LHS ( the second row of the matrix in LHS should be [0 0 0] instead of [3 0 -5].
How did you come to know just by seeing the equality that the second column elements are all zero that the matrix on the left cannot be converted into an identity matrix? How does second column elements all being zero stop us from converting it to an identity matrix? Is it some property related to matrices?
Thanks
Mr R

Last edited: May 21, 2017
14. May 21, 2017

### Stephen Tashi

An equation has an equality sign, but an equation implies there are variables, for which we wish to find solutions. For example, "3 = 2 + 1" does not say anything about a variable being involved. You could pose a problem by saying "Find x such that 3 = 2 + 1" and force the reader to consider a variable. In that case the solution set for the "equation" 3 = 2 + 1 would be all numbers.

Second "equation" or second "equality" ? In either case, check your multiplication.
For example the computation for the first element in the second row is $(1)(3) + (0)(0)+ (-1)(0) = 3$.

In performing elementary row operations, once you get a column of zeroes, no amount of adding rows together or multiplying rows by numbers will get rid of the zeroes. It's just something you'll learn from experience.

15. May 21, 2017

### mathwonk

i apologize if my answer was too long, but i was trying to be complete. WWGD's nice answer for example is my second sentence, second paragraph. post 11.

maybe this point indeed is what the OP wants to grasp. i.e. we are trying to transform a matrix into a simple form by multiplying on the left by a suitable invertible matrix. Now elementary matrices are chosen because they are basic building blocks for all invertible matrices. Thus we can achieve the same result of left multiplying by any invertible matrix, in stages, by performing a sequence of elementary row operations. So elementary row operations are a tool for accomplishing the left multiplication of a matrix by any invertible matrix. In particular, if starting from an invertible matrix, we can reach the identity only by left multiplying by its left inverse. hence if we can find any sequence of operations that results in the identity, then those must have as their product, the inverse matrix.

as to why some of your operations are not acceptible, all operations must be invertible, i.e. it must be possible to reverse their effect by some other operation. but some of your operations lose information which cannot be recovered by another operation, as steven tashi is explaining to you.

Last edited: May 21, 2017
16. May 22, 2017

### Mr Real

Yeah, you are right, it was my mistake.

17. May 22, 2017

### Mr Real

Actually, I've just started to learn about matrices, so many of the terms used in your previous answer were alien to me but I understood this answer.
But how is an operation like converting R2 to R1 - R3 not invertible but converting it to R2 - R3 is invertible? Can you explain this with an example?
Thanks
Mr R

18. May 22, 2017

### Staff: Mentor

How would you get back R2 in your matrix ? You can't add or subtract any rows now, you've lost R2 permanently.

However the R2-R3 replacement for R2 means you can revert back to the old R2 just by adding R3 to it.

19. May 22, 2017

### Mr Real

Yeah, I get it now. So, would this work for all elementary row/column operations and for any matrix?
Thanks
Mr R

20. May 22, 2017

### PeroK

In this context matrices are just a convenient way of representing a set of equations. As row operations involve at most two rows, you can explain it all using just two equations. These can have any number of variables, but let's have three in this example:

$a_1x + b_1y + c_1z = d_1$
$a_2x + b_2y + c_2z = d_2$

Now, let's suppose we have found $(x, y, z)$ that solve these equations. It's clear that if:

$a_1x + b_1y + c_1z = d_1$

Then:

$k(a_1x + b_1y + c_1z) = k(d_1)$

Where $k$ is some non-zero constant. And, likewise, if this second equation holds (with $k$ in it) then so does the first without the $k$.

That's why you can multiply a row by a constant.

It's even clearer that if:

$a_1x + b_1y + c_1z = d_1$
$a_2x + b_2y + c_2z = d_2$

Then:

$a_2x + b_2y + c_2z = d_2$
$a_1x + b_1y + c_1z = d_1$

That's just the same equations written in a different order, so must have the same solutions.

Finally, if

$a_1x + b_1y + c_1z = d_1$
$a_2x + b_2y + c_2z = d_2$

Then adding these together gives:

$(a_1+a_2)x + (b_1+b_2)y + (c_1+c_2)z = d_1+d_2$

And, if we put this together with either of the original equations we have:

$a_1x + b_1y + c_1z = d_1$
$(a_1+a_2)x + (b_1+b_2)y + (c_1+c_2)z = d_1+d_2$

This has exactly the same solutions as the original equations. That's why you can add one row to another. Adding a mutiple of one row to another isn't really a new operation, just a combination of these ones.