# Matrix Representations of Linear Transformations

Let ##i\in\{1,\dots,m\}## and ##j\in\{1,\dots,n\}## be arbitrary. ##Te_j## is by definition of ##T## an element of Y. Since we use the notation ##y_i## for the ith component of an arbitrary ##y\in Y## with respect to B, it’s natural to use the notation ##(Te_j)_i## for the ith component of ##Te_j## with respect to B. The mn (m times n) scalars ##(Te_j)_i## with ##i\in\{1,\dots,m\}## and ##j\in\{1,\dots,n\}## are called the components, or matrix elements, of T with respect to (A,B). The m×n matrix

$$\begin{pmatrix}(Te_1)_1 & \cdots & (Te_n)_1\\ \vdots & \ddots & \vdots\\ (Te_1)_m & \dots & (Te_n)_m\end{pmatrix}$$ is called the matrix representation of T with respect to (A,B). It is often denoted by the same symbol as the linear transformation, in this case T. In situations where you would prefer to use different notations for the linear transformation and its matrix representation, a notation like ##[T]## or ##[T]_{B,A}## can be used for the latter.

The standard notation for the scalar on row i, column j of a matrix ##T## is ##T_{ij}##. In this notation, we have ##T_{ij}=(Te_j)_i##. This is the formula you need to remember. An alternative notation for the scalar on row i, column j is ##T^i_j##. In this notation, we have ##T^i_j=(Te_j)^i##. You may find it easier to remember this version of the formula, since the index that’s upstairs on the left is upstairs on the right.

Given an m×n matrix M, there’s a simple way to define a linear transformation ##T:X\to Y## such that the matrix representation of T with respect to (A,B) is M. We define T to be the unique linear ##T:X\to Y## such that ##Te_j=\sum_{i=1}^m M_{ij}f_i## for all ##j\in\{1,\dots,n\}##. (Such a T exists and is unique because a linear map is determined by its values on a basis.) In the alternative notation, we would write ##Te_j=\sum_{i=1}^m M^i_j f_i##. You may find it easier to remember this version of the formula, since the summation is over an index that appears once upstairs and once downstairs.
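The construction can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper name `linear_map_from_matrix`, the sample matrix, and the sample basis `f` of ##\mathbb R^2## are all made up for the example, and the input vector is given by its components with respect to A.

```python
def linear_map_from_matrix(M, f):
    """Return T with T(e_j) = sum_i M[i][j] * f[i], extended linearly.

    M is a list of m rows of length n (the given matrix).
    f is a list of m tuples, an ordered basis of Y (hypothetical example data).
    The returned function takes the components of x with respect to A.
    """
    m, n = len(M), len(M[0])
    dim = len(f[0])

    def T(x_components):
        # y_i = sum_j M_ij x_j are the components of Tx with respect to B
        y = [sum(M[i][j] * x_components[j] for j in range(n)) for i in range(m)]
        # assemble the vector Tx = sum_i y_i f_i
        return tuple(sum(y[i] * f[i][k] for i in range(m)) for k in range(dim))

    return T


M = [[1, 2, 0],
     [0, 1, -1]]
f = [(1, 1), (0, 2)]  # a non-standard ordered basis of R^2, chosen arbitrarily
T = linear_map_from_matrix(M, f)

print(T((1, 0, 0)))  # image of e_1: 1*f_1 + 0*f_2 = (1, 1)
print(T((0, 1, 0)))  # image of e_2: 2*f_1 + 1*f_2 = (2, 4)
```

Note that column j of M holds the components of ##Te_j## with respect to B, not the vector ##Te_j## itself; the two coincide only when B is the standard basis.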

The following observation provides some motivation for the definitions. Let ##x\in X## be arbitrary. Define ##y\in Y## by ##y=Tx##. Let ##x_1,\dots,x_n## be the components of x with respect to A, and let ##y_1,\dots,y_m## be the components of y with respect to B. We have

\begin{align*}
y &=\sum_{i=1}^m y_i f_i\\
Tx &=T\left(\sum_{j=1}^n x_j e_j\right) =\sum_{j=1}^n x_jTe_j =\sum_{j=1}^n x_j \left(\sum_{i=1}^m(Te_j)_i f_i\right) =\sum_{j=1}^n \sum_{i=1}^m x_j (Te_j)_i f_i\\
&=\sum_{i=1}^m \left(\sum_{j=1}^n x_j (Te_j)_i\right) f_i.
\end{align*}

Since ##\{f_1,\dots,f_m\}## is linearly independent, these results and the equality y=Tx imply that

$$\sum_{j=1}^n x_j (Te_j)_i= y_i$$ for all ##i\in\{1,\dots,m\}##. If you recall that the definition of matrix multiplication is ##(AB)_{ij}=\sum_k A_{ik}B_{kj}##, you can recognize the above as the ith row of the matrix equation

$$\begin{pmatrix}(Te_1)_1 & \cdots & (Te_n)_1\\ \vdots & \ddots & \vdots\\ (Te_1)_m & \dots & (Te_n)_m\end{pmatrix} \begin{pmatrix}x_1\\ \vdots\\ x_n\end{pmatrix} =\begin{pmatrix}y_1\\ \vdots\\ y_m\end{pmatrix}.$$

The following is a simple example of how to find a matrix representation of a linear transformation. Define ##S:\mathbb R^3\to\mathbb R^2## by ##S(x,y,z)=(3z-x,2y)##. This S is linear. Let ##C=(g_1,g_2,g_3)## and ##D=(h_1,h_2)## be the standard ordered bases for ##\mathbb R^3## and ##\mathbb R^2## respectively. We will denote the matrix of S with respect to (C,D) by ##[ S]##.

\begin{align*}
Sg_1 &=S(1,0,0)=(-1,0) =-1h_1+0h_2\\
Sg_2 &=S(0,1,0) =(0,2)=0h_1+2h_2\\
Sg_3 &=S(0,0,1) =(3,0)=3h_1+0h_2\\
[ S] &=\begin{pmatrix}(Sg_1)_1 & (Sg_2)_1 & (Sg_3)_1\\ (Sg_1)_2 & (Sg_2)_2 & (Sg_3)_2\end{pmatrix} =\begin{pmatrix}-1 & 0 & 3\\ 0 & 2 & 0 \end{pmatrix}.
\end{align*}

Note that for all ##x,y,z\in\mathbb R##,

$$[ S]\begin{pmatrix}x\\ y\\ z\end{pmatrix} = \begin{pmatrix}-1 & 0 & 3\\ 0 & 2 & 0 \end{pmatrix} \begin{pmatrix}x\\ y\\ z\end{pmatrix} =\begin{pmatrix}-x+3z\\ 2y\end{pmatrix}.$$
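The same computation can be checked numerically. The following is a minimal sketch in plain Python; the helper names `matrix_of` and `apply` are invented for this example, and it relies on D being the standard basis, so that the components of ##Sg_j## are simply its entries.

```python
def S(v):
    # the map S(x, y, z) = (3z - x, 2y) from the example above
    x, y, z = v
    return (3 * z - x, 2 * y)


def matrix_of(T, domain_basis):
    # Column j holds the components of T(g_j). With the standard basis on the
    # codomain, those components are just the entries of T(g_j).
    columns = [T(g) for g in domain_basis]
    m = len(columns[0])
    return [[col[i] for col in columns] for i in range(m)]


def apply(M, v):
    # ordinary matrix-vector product: (Mv)_i = sum_j M_ij v_j
    return tuple(sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M)))


C = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]  # standard ordered basis of R^3
S_matrix = matrix_of(S, C)

print(S_matrix)                    # [[-1, 0, 3], [0, 2, 0]]
print(apply(S_matrix, (1, 2, 3)))  # (8, 4), the same as S((1, 2, 3))
```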

If Y is an inner product space, and B is an orthonormal ordered basis, the easiest way to find the matrix elements is often to use the inner product. We will use the physicist’s convention for inner products. This means that the term “inner product” is defined so that when we’re dealing with a vector space over ℂ (i.e. when the set of scalars is ℂ rather than ℝ), the map ##v\mapsto\langle u,v\rangle## is linear and the map ##u\mapsto\langle u,v\rangle## is antilinear (i.e. conjugate linear). Let ##i\in\{1,\dots,m\}## and ##y\in Y## be arbitrary. Let ##y_1,\dots,y_m## be the components of y with respect to B.

$$\left\langle f_i,y\right\rangle =\left\langle f_i,\sum_{j=1}^m y_j f_j \right\rangle =\sum_{j=1}^m y_j \left\langle f_i,f_j\right\rangle =\sum_{j=1}^m y_j \delta_{ij} =y_i.$$ Since i and y are arbitrary, this implies that for all ##i\in\{1,\dots,m\}## and all ##j\in\{1,\dots,n\}##,

$$\left\langle f_i,Te_j \right\rangle=(Te_j)_i .$$ If X=Y, it’s convenient to choose B=A, and to speak of the matrix representation of T with respect to A instead of with respect to (A,A), or (A,B). The formula for ##T_{ij}## can now be written as

$$T_{ij}=(Te_j)_i=\left\langle e_i,Te_j \right\rangle.$$ One final comment for those of you who have studied quantum mechanics. (If you haven’t, just ignore this). In bra-ket notation, we would usually write the ith basis vector as ##\left|i\right\rangle##. This turns the last formula above into

$$T_{ij} =\left\langle i\right|T\left|j\right\rangle.$$
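The inner product formula is easy to try out numerically. The sketch below assumes ##X=Y=\mathbb C^2## with the usual inner product in the physicist’s convention; the orthonormal basis `A` and the operator `swap` are arbitrary choices made for illustration, and the function names are hypothetical.

```python
import math


def inner(u, v):
    # physicist's convention: antilinear in the first slot, linear in the second
    return sum(a.conjugate() * b for a, b in zip(u, v))


def matrix_elements(T, basis):
    # T_ij = <e_i, T e_j> for an orthonormal ordered basis (e_1, ..., e_n)
    return [[inner(e_i, T(e_j)) for e_j in basis] for e_i in basis]


def swap(v):
    # example operator on C^2: exchange the two entries
    return (v[1], v[0])


s = 1 / math.sqrt(2)
A = [(s + 0j, s * 1j), (s + 0j, -s * 1j)]  # an orthonormal ordered basis of C^2

T_matrix = matrix_elements(swap, A)
for row in T_matrix:
    print(row)
```

Up to floating point rounding, the result is the matrix with rows (0, -i) and (i, 0). With respect to the standard basis the same operator has rows (0, 1) and (1, 0), which shows how much the matrix depends on the choice of basis.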

Great work Fredrik!

Maybe I’m wrong and it’s my browser but it seems there are non compiled latex lines in the text.

Fixed, thanks!

Thank you for the nice article!

I hope it will help beginning students to avoid the kind of confusion that I used to experience. (I think that part of this confusion is due to the fact that in physics literature, one usually doesn’t distinguish between an operator and its matrix representation and, often, one also omits the specification of the underlying bases. For trained readers this is usually not a problem, but for students just coming from an LA course and looking to apply the theory in physics problems, I believe this can cause unnecessary difficulties.)

One typo:

In the line starting with: “We just define T to be the unique linear ##T:X→Y## such that (…)” you probably meant to write

$$T e_j = \sum_{i=1}^m T_{ij}f_i$$

since at this point in the text you have not yet assumed that ##X = Y##, etc.

Two suggestions:

1. I think it would make the article even better if you would also discuss a second example, this time of an operator acting on an abstract (but still finite dimensional) vector space, such as a space of polynomials or so. This way, it becomes clear that a vector / matrix and its representation w.r.t. a basis are really two different things, and it also showcases the power of matrix representations when doing computations with abstract operators.

2. Perhaps, alluding to your remark on QM at the end, it would be nice if you would write a follow-up on how this generalizes quite easily to bounded linear operators on separable Hilbert spaces. Then, you could also comment on what happens when you replace a bounded operator with an unbounded (differential) operator, which is typically the case physicists encounter when studying QM.

Hopefully you do not consider these comments an interference, but rather an expression of my enthusiasm for the subject and the attention that it has recently received on PF.

As pointed out by Krylov

“Given an m×n matrix M, there’s a simple way to define a linear transformation ##T:X\to Y## such that the matrix representation of T with respect to (A,B) is M. We just define T to be the unique linear ##T:X\to Y## such that ##Te_j=\sum_{i=1}^n T_{ij}e_i## for all ##j\in\{1,\dots,n\}##.” only works if ##X=Y##. A matrix determines a linear transformation for each choice of basis for ##X## and ##Y##. Without a choice of bases, the matrix does not determine a linear transformation.

But given a matrix the unity vectors in both spaces always define a natural basis to which the matrix is a linear transformation.

Good catch. That line isn’t present in the last draft that I discussed with other people (in February 2013) before I turned it into a FAQ post (in June 2013…I’m pretty slow apparently), so I must have put it in later and not proofread it well enough.

Your comments are welcome, and I like your suggestions. Unfortunately I don’t have a lot of time to improve this post right now. If you would like to do it, I’m more than OK with that.

The LaTeX can be improved. When I wrote this in 2013, LaTeX behaved differently here. There was no automatic numbering of equations for example. I would like to make sure that only those equations that should be numbered are numbered. Removing all the numbers is also an option. Also, the equation that begins with Tx= wasn’t split over two lines before. It needs an explicit line break followed by an alignment symbol. (I could edit the post when it was a normal FAQ post. I don’t think I can now that it’s an Insights post).

I also think the bases selected in ##X## and ##Y## both have to be ordered bases for there to be an isomorphism between ##L(X,Y)##, the linear maps between ##X## and ##Y##, and ##M_{n\times m}(R)##, where ##R## is the ring, ##M_{n\times m}(R)## is the space of matrices with coefficients in the ring, and ##X,Y## are (free, of course) ##R##-modules (both right- or left-, I think). I think this is the most general scope of the isomorphism.

Yes if one already has two bases then the matrix defines a linear map. But there is no natural given basis for a vector space. You need to select one. Not sure what you mean by the unity vectors.

Physicists probably write them ##e_i = (δ_{ij})_j##. I learned unit vectors. Ok, it’s not the i-th basis vector but the coordinate representation of the i-th basis vector. But that is hair-splitting. To awake the impression that a matrix isn’t a linear transformation is negligent. There is always a basis to which the matrix is a linear transformation. And in the finite dimensional case even without the use of the axiom of choice. I just wanted to avoid someone saying: “But I’ve read on the internet that a matrix isn’t a linear transformation.” The discussion distinguishing between the vectors themselves and their coordinate representation is in my opinion something for specialists and logicians.

Hmmm. The point I was trying to make is that a matrix determines a continuum of linear transformations each of which depends on a choice of basis.

Good point. I remember I had my difficulties, too, when I first learned the concept. All of a sudden there surfaced matrices ##T## and ##T^{-1}## surrounding my original ##A##….or even worse ##T## and ##S^{-1}##

But a matrix does not necessarily describe a linear transformation (sorry if this is not what you mean). It can represent the adjacency conditions of a graph, a Markov process, etc. If you mean that there is a bijection ( isomorphism) between linear maps and matrices, then I agree.

That’s a good one. But to be honest, e.g. Markov processes didn’t come to my mind in a thread about linear transformations.

It reminds me of a test I once recorded. The student could perfectly define a linear transformation and was asked for an example. The professor would have been satisfied with a rotation or just a matrix. Unfortunately, the student couldn’t name one. I remember this because I still wonder what the professor would have said to my example. I would have answered: 0. (And 1 next.)

But of course you are completely right: a matrix is nothing other than elements of any set arranged in a rectangle. Or a movie …

I don’t know if this gets you into philosophy, but isn’t a linear transformation expressed in different bases essentially the same linear transformation? That is, given ##L## in any one basis, doesn’t the set ##\{S^{-1}LS\}##, with ##S## ranging over all invertible matrices, represent just one linear transformation?

Yes it is the same. But its matrix representation changes by a conjugation – at least for a linear map of a vector space into itself.
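A minimal sketch of this conjugation, using exact arithmetic in plain Python. The matrices `L` and `S` are made-up examples, the helper names are hypothetical, and the convention assumed here is that the columns of `S` are the new basis vectors written in the old basis, so that the new matrix is ##S^{-1}LS##.

```python
from fractions import Fraction


def matmul(P, Q):
    # (PQ)_ij = sum_k P_ik Q_kj
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]


def inverse_2x2(S):
    # explicit inverse of a 2x2 matrix; assumes the determinant is nonzero
    (a, b), (c, d) = S
    det = Fraction(a * d - b * c)
    return [[d / det, -b / det], [-c / det, a / det]]


L = [[2, 1],
     [0, 3]]  # matrix of a map R^2 -> R^2 with respect to the old basis
S = [[1, 1],
     [0, 1]]  # columns: new basis vectors (1, 0) and (1, 1) in old coordinates

L_new = matmul(inverse_2x2(S), matmul(L, S))
print(L_new == [[2, 0], [0, 3]])  # True: the new basis consists of eigenvectors
```

The two matrices differ, but they describe the same map; quantities such as the trace and the determinant are unchanged by the conjugation.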

Perhaps the insight should explain this by showing how the matrix changes for a change of basis.

Really? Take ##P_2##, the vector space of, say, real polynomials of order ##\le 2##, including the zero polynomial. There are no unit vectors here. (We cannot even normalize to unity because there is no norm chosen yet.) Let’s pick a basis, perhaps ##\mathcal{A} := \{1, x, x^2\}## and consider ##p \in P_2## defined by ##p(x) = 6 - x^2##. Then its coordinate vector is ##[p]_{\mathcal{A}} = [6, 0, -1] \in \mathbb{R}^3##. However, with respect to the basis ##\mathcal{B} = \{2 - x, x, -x^2\}## we have ##[p]_{\mathcal{B}} = [3, 3, 1]_{\mathcal{B}} \in \mathbb{R}^3##. Also, the representation of the first basis vector in ##\mathcal{A}## with respect to ##\mathcal{B}## is ##[1]_{\mathcal{B}} = [\tfrac{1}{2},\tfrac{1}{2},0] \in \mathbb{R}^3##, etc.
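These coordinates can be verified by expanding in the second basis, whose elements are ##2-x##, ##x## and ##-x^2##:

$$3\,(2-x)+3\,x+1\,(-x^2)=6-x^2,\qquad \tfrac{1}{2}\,(2-x)+\tfrac{1}{2}\,x+0\,(-x^2)=1.$$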

Especially when learning LA, it is very important that students, in mathematics and physics alike, distinguish between ##p##, ##[p]_{\mathcal{A}}## and ##[p]_{\mathcal{B}}##. (Later on, they may learn that these vectors are related through isomorphisms, but that is not how one starts.) It is also crucial when doing computations, for example in numerical analysis.

Nobody awoke this impression, but as we have just seen, one has to be precise, especially when dealing with vector spaces different from ##\mathbb{R}^n## or ##\mathbb{C}^n##.

In combination with your earlier comments on unit vectors, you seem to suggest that for infinite dimensional vector spaces every matrix defines a linear transformation on the space. This is already false for separable Hilbert spaces. Take the sequence space ##\ell_2## with the canonical basis (yes indeed, the one consisting of the unit vectors ##\{(\delta_{mn})_{m = 1}^{\infty} \,:\, n \in \mathbb{N}\}##) and consider the infinite matrix ##M = (\delta_{mn}n)_{m,n=1}^{\infty}##. Then ##x = [n^{-1}]_{n=1}^{\infty}## is in ##\ell_2## but ##Mx = [1]_{n=1}^{\infty}## is not.
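Spelled out, with ##\|\cdot\|## denoting the usual ##\ell_2## norm, the computation behind this claim is:

$$\|x\|^2=\sum_{n=1}^\infty \frac{1}{n^2}=\frac{\pi^2}{6}<\infty,\qquad (Mx)_m=\sum_{n=1}^\infty \delta_{mn}\,n\cdot\frac{1}{n}=1,\qquad \|Mx\|^2=\sum_{m=1}^\infty 1=\infty.$$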

In fact, by Parseval’s identity there is no orthonormal basis of ##\ell_2## with respect to which ##M## represents a linear operator.

No, it is not, as we have already seen in the example on ##P_2##. It is part of any decent first course on linear algebra. On the other hand, if by “specialists” are meant people that actually know what they are talking about, then I agree.

Yes, I’m quite irritated. Your post lacks any, well… insight, and is one of those that sometimes makes me wonder whether I’m wasting my time here.

I have made some minor edits. (Greg showed me how). I fixed the mistake that Krylov found, removed the equation numbers, and made some minor changes to the language.

I got a comment about my usage of the term “n-tuple”. I have always felt that it’s unnecessary to say “ordered n-tuple”, since no one uses the term “n-tuple” to refer to a set of cardinality n. How do you guys feel about this? Do you feel that my usage is like saying “line” instead of “straight line”, or that it’s plain wrong?

Haven’t met anyone either. If it’s not ordered you won’t say tuple. Even the notation in round brackets implies it’s ordered, imao.