Similar Matrices and Change of Basis

Math Amateur
I am spending time revising vector spaces. I am using Dummit and Foote: Abstract Algebra (Chapter 11) and also the book Linear Algebra by Stephen Freidberg, Arnold Insel and Lawrence Spence.

On page 419, D&F define similar matrices as follows: View attachment 3047

They then state the following: View attachment 3048

BUT? ... how exactly does it follow that $$ P^{-1} M_{\mathcal{B}}^{\mathcal{B}} (\phi) P = M_{\mathcal{E}}^{\mathcal{E}} (\phi)$$

Can anyone show me how this result is obtained?

I must be missing something obvious because no indication is given of how this result is obtained ... ... ?

Peter

***EDIT***

(1) Reflecting ... ... I am beginning to think that $$M_{\mathcal{B}}^{\mathcal{B}} (\phi)$$ and $$M_{\mathcal{E}}^{\mathcal{E}} (\phi)$$ are equal to the identity matrix $$I$$ ... ... ? ... ... but then, what is the point of essentially writing $$P^{-1} I P = I$$?

(2) Further reflecting ... ... It may be that the above formula makes more sense in the overall context of what D&F say about the change of basis or transition matrix ... ?

In the light of (2), I am providing the relevant text on similar matrices and the change of basis or transition matrix for MHB members interested in the post ... ... see below ... ... View attachment 3049
 
Here is the thing I want you to remember, and take to heart:

A similarity transform, and a change-of-basis, are the same thing (essentially).

We have two ways of looking at this: the linear transformation view, and the matrix view. One is "abstract", and one is "concrete".

Suppose $\rho \in \text{Hom}_{\ F}(V,V)$ is an isomorphism (or linear automorphism).

This means it is invertible, that is, there exists $\rho^{-1} \in \text{Hom}_{\ F}(V,V)$ such that:

$\rho \circ \rho^{-1} = \rho^{-1} \circ \rho = 1_V$

Now suppose $\phi \in \text{Hom}_{\ F}(V,V)$ is any linear endomorphism. Clearly:

$\rho^{-1}\phi\rho$ is also a linear endomorphism. What might this do?

Some things you will have to prove, before you are fully prepared to really comprehend this:

1) $\phi \in \text{Hom}_{\ F}(V,V)$ is injective if and only if for every linearly independent subset $S \subseteq V,\ \phi(S)$ is linearly independent.

2) $\phi \in \text{Hom}_{\ F}(V,V)$ is surjective if and only if for every set $T$ with $\text{span}(T) = V$, we have that $\text{span}(\phi(T)) = V$ as well.

Taken together, these two statements imply:

3) $\phi \in \text{Hom}_{\ F}(V,V)$ is an isomorphism (of vector spaces) if and only if $\phi$ maps a basis to a basis.

Note that these conditions reduce "total behavior" of $\phi$ (on all of $V$) to behavior on certain kinds of subsets (which in most of the simpler cases, are FINITE). So what we have is "labor-saving criteria". We can test for injectivity, surjectivity, or bijectivity on certain "well-chosen" subsets.

It's condition (3) that matters, here. Essentially, $\rho$ replaces one basis with another. Then $\phi$ "does its thing" (whatever linear transform it does), on the "new basis". Finally, $\rho^{-1}$ "returns us to our original basis". So, in a sense, $\phi$ and $\rho^{-1}\phi\rho$ represent "the same transformation" (hence the name "similar"), just "different bases" (which can be thought of, naively, as "coordinate systems" or "terminologies", since an isomorphism is essentially a "re-naming scheme").
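In symbols (just a sketch of the bookkeeping, and my indexing conventions here may differ from D&F's): write $[T]_{\mathcal{E}}^{\mathcal{B}}$ for the matrix of a linear map with respect to the basis $\mathcal{E}$ on the domain and $\mathcal{B}$ on the codomain. Since $\phi = 1_V \circ \phi \circ 1_V$, and the matrix of a composition is the product of the matrices, we get

$$[\phi]_{\mathcal{E}}^{\mathcal{E}} = [1_V]_{\mathcal{B}}^{\mathcal{E}}\,[\phi]_{\mathcal{B}}^{\mathcal{B}}\,[1_V]_{\mathcal{E}}^{\mathcal{B}} = P^{-1}[\phi]_{\mathcal{B}}^{\mathcal{B}}P$$

where $P = [1_V]_{\mathcal{E}}^{\mathcal{B}}$ converts $\mathcal{E}$-coordinates to $\mathcal{B}$-coordinates and $P^{-1} = [1_V]_{\mathcal{B}}^{\mathcal{E}}$ converts them back. This is the matrix shadow of the $\rho^{-1}\phi\rho$ picture above, and it is exactly the identity asked about in the original post.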

Now let's look at the concrete side of things. Realize that, abstractly, "vectors are vectors", they don't care how we label them. When we attach NUMBERS (that is, field element entries) to a vector, what those numbers MEAN is "up to us". Imagine the Euclidean plane as a blank piece of paper, with just the one dot marking the 0-vector (0,0). The coordinate axes we draw, and the unit lengths we assign on them, are OUR CHOICES, they don't come with the space. We USUALLY draw them perpendicular, and "scaled the same", but this is a bit arbitrary, on our part.

As a pair, (2,3) is just a pair of numbers. As a VECTOR, we usually mean:

$(2,3) = 2v_1 + 3v_2$, where $\{v_1,v_2\}$ is a basis. WE HAVE TO SAY what $v_1,v_2$ ARE.

For example, in the polynomial space $P_1(\Bbb R) = \{a_0 + a_1t: a_0,a_1 \in \Bbb R\}$, if our basis is $\{1,t\}$, then:

$(2,3)$ means $2 + 3t$.

If our basis is $\{1-t,1+t\}$, then $(2,3) = 2(1-t) + 3(1+t) = 2 - 2t + 3 + 3t = 5 + t$, which is a different polynomial.

A matrix in one basis may have a totally different "appearance" in another basis. For example, it may be upper-triangular in one basis (everything below the main diagonal is 0), and not so in a different basis. Some bases may be easier to work with than others, depending on what kinds of calculations we are doing.

I am going to give you an example of how this works. Study it well.

Suppose we have the linear transformation $T:\Bbb R^2 \to \Bbb R^2$ given by:

$T(x,y) = (2x+4y,3x+3y)$

In the basis $\mathcal{B} = \{(1,0),(0,1)\} = \{e_1,e_2\}$ (this is called the standard basis), we have the matrix:

$[T]_{\mathcal{B}}^{\mathcal{B}} = \begin{bmatrix}2&4\\3&3 \end{bmatrix}$ (verify this!).

Note that the first column of this matrix is $[T(e_1)]_{\mathcal{B}}$, and the second column is $[T(e_2)]_{\mathcal{B}}$. This is no accident: the way the standard basis vectors (expressed IN that basis) "pick out columns" is a function of how matrix multiplication works (if we "hit them on the other side", as row-vectors, they "pick out rows").
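If you want to see this numerically, here is a minimal NumPy sketch (the function and variable names are my own choices, not anything from D&F):

```python
import numpy as np

def T(x, y):
    # The example transformation: T(x, y) = (2x + 4y, 3x + 3y)
    return np.array([2*x + 4*y, 3*x + 3*y])

# The columns of [T]_B^B are T(e1) and T(e2), written in the standard basis B
A = np.column_stack([T(1, 0), T(0, 1)])
print(A)  # [[2 4]
          #  [3 3]]
```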

Now $\mathcal{E} = \{(-4,3),(1,1)\} = \{v_1,v_2\}$ is ALSO a basis for $\Bbb R^2$:

It is linearly independent:

If $c_1v_1 + c_2 v_2 = 0$ that is, if: $c_1(-4,3) + c_2(1,1) = (0,0)$, so that:

$c_2 - 4c_1 = 0$
$3c_1 + c_2 = 0$

Subtracting the first equation from the second gives $(3c_1 + c_2) - (c_2 - 4c_1) = 0 - 0 = 0$, that is: $7c_1 = 0\implies c_1 = 0$. From either equation we must then have $c_2 = 0$ as well, so $\{v_1,v_2\}$ is linearly independent. What this means is that neither $v_1$ nor $v_2$ (i.e., neither of the non-empty proper subsets $\{v_1\},\{v_2\}$ of $\mathcal{E}$) is expressible in terms of the other: we need them BOTH to describe linear combinations of the two.

It spans $\Bbb R^2$:

Given $(a,b) \in \Bbb R^2$ we have:

$(a,b) = \frac{1}{7}(7a,7b) = \frac{1}{7}[(4a-4b,3b-3a) + (3a+4b,3a+4b)]$

$= \dfrac{b-a}{7}(-4,3) + \dfrac{3a+4b}{7}(1,1)$

so any point in $\Bbb R^2$ is expressible as a linear combination of $v_1,v_2$.
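Here is a quick numerical check of both claims, in the same hedged NumPy style as above (the sample point $(a,b)$ is arbitrary):

```python
import numpy as np

v1, v2 = np.array([-4, 3]), np.array([1, 1])

# A nonzero determinant confirms {v1, v2} is linearly independent, hence a basis of R^2
print(np.linalg.det(np.column_stack([v1, v2])))  # approximately -7.0

# Check the span formula (a, b) = ((b - a)/7) v1 + ((3a + 4b)/7) v2 at a sample point
a, b = 5.0, -2.0
c1, c2 = (b - a) / 7, (3*a + 4*b) / 7
print(np.allclose(c1*v1 + c2*v2, [a, b]))  # True
```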

Why would we choose to use such an unusual basis?

Let us calculate the matrix $[T]_{\mathcal{E}}^{\mathcal{E}}$. We will do this 2 ways.

First, we calculate $T(v_1),T(v_2)$.

$T(v_1) = T((-4,3)) = (2(-4) + 4(3),3(-4) + 3(3)) = (4,-3) = -v_1$.

In the basis $\mathcal{E}$ this is the linear combination:

$(-1)v_1 + 0v_2$ so $[T(v_1)]_{\mathcal{E}} = (-1,0)$.

$T(v_2) = T((1,1)) = (2(1) + 4(1),3(1) + 3(1)) = (6,6) = 6v_2$. So $[T(v_2)]_{\mathcal{E}} = (0,6)$ and by our definition of $[T]_{\mathcal{E}}^{\mathcal{E}}$ we have:

$[T]_{\mathcal{E}}^{\mathcal{E}} = \begin{bmatrix}-1&0\\0&6\end{bmatrix}$.
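The same first route, done numerically (again just a sketch; `E` below has $v_1,v_2$ as its columns, a name I am choosing here):

```python
import numpy as np

A = np.array([[2, 4], [3, 3]])   # [T]_B^B from above
E = np.array([[-4, 1], [3, 1]])  # columns are v1 and v2 in standard coordinates

# E-coordinates of T(v1) and T(v2): solve E @ c = A @ v_i for c
cols = [np.linalg.solve(E, A @ v) for v in E.T]
print(np.column_stack(cols))  # approximately [[-1, 0], [0, 6]]
```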

Next, we take "the long way around". First, we need to find a matrix $P$ that takes $\mathcal{E}$-coordinates to $\mathcal{B}$-coordinates. Such a matrix would take $[v_1]_{\mathcal{E}}$ to $[v_1]_{\mathcal{B}}$, that is, when multiplied by $(1,0)^T$ it would yield $(-4,3)^T$, and similarly, for $v_2$, it would take $(0,1)^T$ to $(1,1)^T$.

It doesn't take much thought to see that this matrix is:

$P = \begin{bmatrix}-4&1\\3&1 \end{bmatrix}$.

Having applied $P$ to our $\mathcal{E}$-coordinates, we are now in $\mathcal{B}$-coordinates, and may just multiply by our "old matrix" for $T$, to get $T(v)$ in $\mathcal{B}$-coordinates:

$[T]_{\mathcal{B}}^{\mathcal{B}}P([v]_{\mathcal{E}}) = [T]_{\mathcal{B}}^{\mathcal{B}}([v]_{\mathcal{B}}) = [T(v)]_{\mathcal{B}}$

Now the inverse coordinate transformation matrix is just going to be the inverse matrix $P^{-1}$ (why?). This is:

$P^{-1} = \frac{-1}{7}\begin{bmatrix}1&-1\\-3&-4 \end{bmatrix}$, and we have:

$P^{-1}[T]_{\mathcal{B}}^{\mathcal{B}}P([v]_\mathcal{E}) = P^{-1}([T(v)]_{\mathcal{B}}) = [T(v)]_{\mathcal{E}}$

that is:

$P^{-1}[T]_{\mathcal{B}}^{\mathcal{B}}P = [T]_{\mathcal{E}}^{\mathcal{E}}$

since that IS the matrix which takes $[v]_{\mathcal{E}}$ to $[T(v)]_{\mathcal{E}}$.
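A quick check that the matrix written above really is $P^{-1}$ (a minimal NumPy sketch, with variable names of my own choosing):

```python
import numpy as np

P = np.array([[-4, 1], [3, 1]])
P_inv = (-1/7) * np.array([[1, -1], [-3, -4]])  # the claimed inverse
print(np.allclose(P @ P_inv, np.eye(2)))  # True
```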

Seeing is believing:

$P^{-1}[T]_{\mathcal{B}}^{\mathcal{B}}P = \frac{-1}{7}\begin{bmatrix}1&-1\\-3&-4 \end{bmatrix}\begin{bmatrix}2&4\\3&3 \end{bmatrix}\begin{bmatrix}-4&1\\3&1 \end{bmatrix}$

$= \frac{-1}{7}\begin{bmatrix}1&-1\\-3&-4 \end{bmatrix}\begin{bmatrix}4&6\\-3&6 \end{bmatrix}$

$=\frac{-1}{7}\begin{bmatrix}7&0\\0&-42\end{bmatrix} = \begin{bmatrix}-1&0\\0&6\end{bmatrix}$
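The same triple product, checked numerically (again just a sketch in NumPy, not anything from D&F):

```python
import numpy as np

A = np.array([[2, 4], [3, 3]])   # [T]_B^B
P = np.array([[-4, 1], [3, 1]])  # takes E-coordinates to B-coordinates

A_E = np.linalg.inv(P) @ A @ P   # should equal [T]_E^E
print(np.round(A_E, 6))          # approximately [[-1, 0], [0, 6]]
```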

In this "unusual basis" we see that what $T$ does, is change the sign of the $v_1$ coordinate, and magnify the $v_2$ coordinate by a factor of $6$, that is, it is the composition of an axis flip, and an axis stretch, something that is not at all apparent when using the "standard axes".
 
Really helpful post ... So much more guidance and clarity than Dummit and Foote's explanation ... Thank you ...

Peter
 