Similar Matrices and Change of Basis

Math Amateur
I am spending time revising vector spaces. I am using Dummit and Foote: Abstract Algebra (Chapter 11) and also the book Linear Algebra by Stephen Freidberg, Arnold Insel and Lawrence Spence.

On page 419, D&F define similar matrices as follows: View attachment 3047

They then state the following: View attachment 3048

BUT? ... how exactly does it follow that $$ P^{-1} M_{\mathcal{B}}^{\mathcal{B}} (\phi) P = M_{\mathcal{E}}^{\mathcal{E}} (\phi)$$

Can anyone show me how this result is obtained?

I must be missing something obvious because no indication is given of how this result is obtained ... ... ?

Peter

***EDIT***

(1) Reflecting ... ... I am beginning to think that $$M_{\mathcal{B}}^{\mathcal{B}} (\phi)$$ and $$M_{\mathcal{E}}^{\mathcal{E}} (\phi)$$ are equal to the identity matrix $$I$$ ... ... ? ... ... but then, what is the point of essentially writing $$P^{-1} I P = I$$?

(2) Further reflecting ... ... It may be that the above formula makes more sense in the overall context of what D&F say about the change of basis or transition matrix ... ?

In the light of (2), I am providing the relevant text on similar matrices and the change of basis or transition matrix for MHB members interested in the post ... ... see below ... ... View attachment 3049
 
Here is the thing I want you to remember, and take to heart:

A similarity transform, and a change-of-basis, are the same thing (essentially).

We have two ways of looking at this: the linear transformation view, and the matrix view. One is "abstract", and one is "concrete".

Suppose $\rho \in \text{Hom}_{\ F}(V,V)$ is an isomorphism (or linear automorphism).

This means it is invertible, that is, there exists $\rho^{-1} \in \text{Hom}_{\ F}(V,V)$ such that:

$\rho \circ \rho^{-1} = \rho^{-1} \circ \rho = 1_V$

Now suppose $\phi \in \text{Hom}_{\ F}(V,V)$ is any linear endomorphism. Clearly:

$\rho^{-1}\phi\rho$ is also a linear endomorphism. What might this do?

Some things you will have to prove, before you are fully prepared to really comprehend this:

1) $\phi \in \text{Hom}_{\ F}(V,V)$ is injective if and only if for every linearly independent subset $S \subseteq V,\ \phi(S)$ is linearly independent.

2) $\phi \in \text{Hom}_{\ F}(V,V)$ is surjective if and only if for every set $T$ with $\text{span}(T) = V$, we have that $\text{span}(\phi(T)) = V$ as well.

Taken together, these two statements imply:

3) $\phi \in \text{Hom}_{\ F}(V,V)$ is an isomorphism (of vector spaces) if and only if $\phi$ maps a basis to a basis.

Note that these conditions reduce "total behavior" of $\phi$ (on all of $V$) to behavior on certain kinds of subsets (which in most of the simpler cases, are FINITE). So what we have is "labor-saving criteria". We can test for injectivity, surjectivity, or bijectivity on certain "well-chosen" subsets.

It's condition (3) that matters, here. Essentially, $\rho$ replaces one basis with another. Then $\phi$ "does its thing" (whatever linear transform it does), on the "new basis". Finally, $\rho^{-1}$ "returns us to our original basis". So, in a sense, $\phi$ and $\rho^{-1}\phi\rho$ represent "the same transformation" (hence the name "similar"), just "different bases" (which can be thought of, naively, as "coordinate systems" or "terminologies", since an isomorphism is essentially a "re-naming scheme").
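In symbols (just a sketch of the bookkeeping, and my indexing conventions here may differ from D&F's): write $[T]_{\mathcal{E}}^{\mathcal{B}}$ for the matrix of a linear map with respect to the basis $\mathcal{E}$ on the domain and $\mathcal{B}$ on the codomain. Since $\phi = 1_V \circ \phi \circ 1_V$, and the matrix of a composition is the product of the matrices, we get

$$[\phi]_{\mathcal{E}}^{\mathcal{E}} = [1_V]_{\mathcal{B}}^{\mathcal{E}}\,[\phi]_{\mathcal{B}}^{\mathcal{B}}\,[1_V]_{\mathcal{E}}^{\mathcal{B}} = P^{-1}[\phi]_{\mathcal{B}}^{\mathcal{B}}P$$

where $P = [1_V]_{\mathcal{E}}^{\mathcal{B}}$ converts $\mathcal{E}$-coordinates to $\mathcal{B}$-coordinates and $P^{-1} = [1_V]_{\mathcal{B}}^{\mathcal{E}}$ converts them back. This is the matrix shadow of the $\rho^{-1}\phi\rho$ picture above, and it is exactly the identity asked about in the original post.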

Now let's look at the concrete side of things. Realize that, abstractly, "vectors are vectors", they don't care how we label them. When we attach NUMBERS (that is, field element entries) to a vector, what those numbers MEAN is "up to us". Imagine the Euclidean plane as a blank piece of paper, with just the one dot marking the 0-vector (0,0). The coordinate axes we draw, and the unit lengths we assign on them, are OUR CHOICES, they don't come with the space. We USUALLY draw them perpendicular, and "scaled the same", but this is a bit arbitrary, on our part.

As a pair, (2,3) is just a pair of numbers. As a VECTOR, we usually mean:

$(2,3) = 2v_1 + 3v_2$, where $\{v_1,v_2\}$ is a basis. WE HAVE TO SAY what $v_1,v_2$ ARE.

For example, in the polynomial space $P_1(\Bbb R) = \{a_0 + a_1t: a_0,a_1 \in \Bbb R\}$, if our basis is $\{1,t\}$, then:

$(2,3)$ means $2 + 3t$.

If our basis is $\{1-t,1+t\}$, then $(2,3) = 2(1-t) + 3(1+t) = 2 - 2t + 3 + 3t = 5 + t$, which is a different polynomial.

A matrix in one basis may have a totally different "appearance" in another basis. For example, it may be upper-triangular in one basis (everything below the main diagonal is 0), and not so in a different basis. Some bases may be easier to work with than others, depending on what kinds of calculations we are doing.

I am going to give you an example of how this works. Study it well.

Suppose we have the linear transformation $T:\Bbb R^2 \to \Bbb R^2$ given by:

$T(x,y) = (2x+4y,3x+3y)$

In the basis $\mathcal{B} = \{(1,0),(0,1)\} = \{e_1,e_2\}$ (this is called the standard basis), we have the matrix:

$[T]_{\mathcal{B}}^{\mathcal{B}} = \begin{bmatrix}2&4\\3&3 \end{bmatrix}$ (verify this!).

Note that the first column of this matrix is $[T(e_1)]_{\mathcal{B}}$, and the second column is $[T(e_2)]_{\mathcal{B}}$. This is no accident: the way the standard basis vectors (expressed IN that basis) "pick out columns" is a function of how matrix multiplication works (if we "hit them on the other side", as row-vectors, they "pick out rows").
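If you want to see this numerically, here is a minimal NumPy sketch (the function and variable names are my own choices, not anything from D&F):

```python
import numpy as np

def T(x, y):
    # The example transformation: T(x, y) = (2x + 4y, 3x + 3y)
    return np.array([2*x + 4*y, 3*x + 3*y])

# The columns of [T]_B^B are T(e1) and T(e2), written in the standard basis B
A = np.column_stack([T(1, 0), T(0, 1)])
print(A)  # [[2 4]
          #  [3 3]]
```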

Now $\mathcal{E} = \{(-4,3),(1,1)\} = \{v_1,v_2\}$ is ALSO a basis for $\Bbb R^2$:

It is linearly independent:

If $c_1v_1 + c_2 v_2 = 0$ that is, if: $c_1(-4,3) + c_2(1,1) = (0,0)$, so that:

$c_2 - 4c_1 = 0$
$3c_1 + c_2 = 0$

Subtracting the first equation from the second gives $(3c_1 + c_2) - (c_2 - 4c_1) = 0 - 0 = 0$, that is: $7c_1 = 0\implies c_1 = 0$. From either equation we must then have $c_2 = 0$ as well, so $\{v_1,v_2\}$ is linearly independent. What this means is that neither $v_1$ nor $v_2$ (i.e., neither of the non-empty proper subsets $\{v_1\},\{v_2\}$ of $\mathcal{E}$) is expressible in terms of the other: we need them BOTH to describe linear combinations of the two.

It spans $\Bbb R^2$:

Given $(a,b) \in \Bbb R^2$ we have:

$(a,b) = \frac{1}{7}(7a,7b) = \frac{1}{7}[(4a-4b,3b-3a) + (3a+4b,3a+4b)]$

$= \dfrac{b-a}{7}(-4,3) + \dfrac{3a+4b}{7}(1,1)$

so any point in $\Bbb R^2$ is expressible as a linear combination of $v_1,v_2$.
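Here is a quick numerical check of both claims, in the same hedged NumPy style as above (the sample point $(a,b)$ is arbitrary):

```python
import numpy as np

v1, v2 = np.array([-4, 3]), np.array([1, 1])

# A nonzero determinant confirms {v1, v2} is linearly independent, hence a basis of R^2
print(np.linalg.det(np.column_stack([v1, v2])))  # approximately -7.0

# Check the span formula (a, b) = ((b - a)/7) v1 + ((3a + 4b)/7) v2 at a sample point
a, b = 5.0, -2.0
c1, c2 = (b - a) / 7, (3*a + 4*b) / 7
print(np.allclose(c1*v1 + c2*v2, [a, b]))  # True
```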

Why would we choose to use such an unusual basis?

Let us calculate the matrix $[T]_{\mathcal{E}}^{\mathcal{E}}$. We will do this 2 ways.

First, we calculate $T(v_1),T(v_2)$.

$T(v_1) = T((-4,3)) = (2(-4) + 4(3),3(-4) + 3(3)) = (4,-3) = -v_1$.

In the basis $\mathcal{E}$ this is the linear combination:

$(-1)v_1 + 0v_2$ so $[T(v_1)]_{\mathcal{E}} = (-1,0)$.

$T(v_2) = T((1,1)) = (2(1) + 4(1),3(1) + 3(1)) = (6,6) = 6v_2$. So $[T(v_2)]_{\mathcal{E}} = (0,6)$ and by our definition of $[T]_{\mathcal{E}}^{\mathcal{E}}$ we have:

$[T]_{\mathcal{E}}^{\mathcal{E}} = \begin{bmatrix}-1&0\\0&6\end{bmatrix}$.
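The same first route, done numerically (again just a sketch; `E` below has $v_1,v_2$ as its columns, a name I am choosing here):

```python
import numpy as np

A = np.array([[2, 4], [3, 3]])   # [T]_B^B from above
E = np.array([[-4, 1], [3, 1]])  # columns are v1 and v2 in standard coordinates

# E-coordinates of T(v1) and T(v2): solve E @ c = A @ v_i for c
cols = [np.linalg.solve(E, A @ v) for v in E.T]
print(np.column_stack(cols))  # approximately [[-1, 0], [0, 6]]
```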

Next, we take "the long way around". First, we need to find a matrix $P$ that takes $\mathcal{E}$-coordinates to $\mathcal{B}$-coordinates. Such a matrix would take $[v_1]_{\mathcal{E}}$ to $[v_1]_{\mathcal{B}}$, that is, when multiplied by $(1,0)^T$ it would yield $(-4,3)^T$, and similarly, for $v_2$, it would take $(0,1)^T$ to $(1,1)^T$.

It doesn't take much thought to see that this matrix is:

$P = \begin{bmatrix}-4&1\\3&1 \end{bmatrix}$.

Having applied $P$ to our $\mathcal{E}$-coordinates, we are now in $\mathcal{B}$-coordinates, and may just multiply by our "old matrix" for $T$, to get $T(v)$ in $\mathcal{B}$-coordinates:

$[T]_{\mathcal{B}}^{\mathcal{B}}P([v]_{\mathcal{E}}) = [T]_{\mathcal{B}}^{\mathcal{B}}([v]_{\mathcal{B}}) = [T(v)]_{\mathcal{B}}$

Now the inverse coordinate transformation matrix is just going to be the inverse matrix $P^{-1}$ (why?). This is:

$P^{-1} = \frac{-1}{7}\begin{bmatrix}1&-1\\-3&-4 \end{bmatrix}$, and we have:

$P^{-1}[T]_{\mathcal{B}}^{\mathcal{B}}P([v]_\mathcal{E}) = P^{-1}([T(v)]_{\mathcal{B}}) = [T(v)]_{\mathcal{E}}$

that is:

$P^{-1}[T]_{\mathcal{B}}^{\mathcal{B}}P = [T]_{\mathcal{E}}^{\mathcal{E}}$

since that IS the matrix which takes $[v]_{\mathcal{E}}$ to $[T(v)]_{\mathcal{E}}$.
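A quick check that the matrix written above really is $P^{-1}$ (a minimal NumPy sketch, with variable names of my own choosing):

```python
import numpy as np

P = np.array([[-4, 1], [3, 1]])
P_inv = (-1/7) * np.array([[1, -1], [-3, -4]])  # the claimed inverse
print(np.allclose(P @ P_inv, np.eye(2)))  # True
```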

Seeing is believing:

$P^{-1}[T]_{\mathcal{B}}^{\mathcal{B}}P = \frac{-1}{7}\begin{bmatrix}1&-1\\-3&-4 \end{bmatrix}\begin{bmatrix}2&4\\3&3 \end{bmatrix}\begin{bmatrix}-4&1\\3&1 \end{bmatrix}$

$= \frac{-1}{7}\begin{bmatrix}1&-1\\-3&-4 \end{bmatrix}\begin{bmatrix}4&6\\-3&6 \end{bmatrix}$

$=\frac{-1}{7}\begin{bmatrix}7&0\\0&-42\end{bmatrix} = \begin{bmatrix}-1&0\\0&6\end{bmatrix}$
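The same triple product, checked numerically (again just a sketch in NumPy, not anything from D&F):

```python
import numpy as np

A = np.array([[2, 4], [3, 3]])   # [T]_B^B
P = np.array([[-4, 1], [3, 1]])  # takes E-coordinates to B-coordinates

A_E = np.linalg.inv(P) @ A @ P   # should equal [T]_E^E
print(np.round(A_E, 6))          # approximately [[-1, 0], [0, 6]]
```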

In this "unusual basis" we see that what $T$ does, is change the sign of the $v_1$ coordinate, and magnify the $v_2$ coordinate by a factor of $6$, that is, it is the composition of an axis flip, and an axis stretch, something that is not at all apparent when using the "standard axes".
 
Really helpful post ... So much more guidance and clarity than Dummit and Foote's explanation ... Thank you ...

Peter
 