Why Didn't the Eigenvector Approach Work for This Matrix?

jonjacson
Hi
If I have this matrix:
$$\begin{pmatrix}0&1\\1&0\end{pmatrix}$$
and I want to find its eigenvectors and eigenvalues, I can try it using the definition of an eigenvector which is:

##A\mathbf x = \lambda \mathbf x##, where ##\mathbf x## is an eigenvector and ##\lambda## the corresponding eigenvalue.

But if I try this directly with a column vector ##\begin{pmatrix}a\\b\end{pmatrix}##, I fail to get the right answer; instead I get:

$$\begin{pmatrix}b\\a\end{pmatrix} = \lambda \begin{pmatrix}a\\b\end{pmatrix}$$

There is no ##\lambda## able to make this correct, unless it is zero, which is not the right answer. Why is it that this approach didn't work?

I have to use the identity matrix and the determinant of ##A - \lambda I## to get the right result.

Thanks!
 
jonjacson said:
$$\begin{pmatrix}b\\a\end{pmatrix} = \lambda \begin{pmatrix}a\\b\end{pmatrix}$$

There is no ##\lambda## able to make this correct, unless it is zero, which is not the right answer. Why is it that this approach didn't work?
The claim that no ##\lambda## works is false. The resulting set of two equations in two unknowns is not hard to solve. Try it!
 
jonjacson said:
There is no ##\lambda## able to make this correct, unless it is zero
You can check that this must be wrong by comparing it with the outcome of
jonjacson said:
I have to use the identity matrix and the determinant of ##A - \lambda I## to get the right result
where you find ##\lambda^2-1 = 0 \Rightarrow \lambda = \pm 1## and can then determine the corresponding eigenvectors from ##\begin{pmatrix}b\\a\end{pmatrix} = \lambda \begin{pmatrix}a\\b\end{pmatrix}##!
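Spelled out, the determinant step is
$$\det(A-\lambda I) = \det\begin{pmatrix}-\lambda & 1\\ 1 & -\lambda\end{pmatrix} = \lambda^{2} - 1 = 0 \;\Rightarrow\; \lambda = \pm 1.$$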
 
Oops, OK, you are right. I get the same results, and then for the eigenvectors I get:

##\lambda = 1 \Rightarrow \begin{pmatrix}a\\a\end{pmatrix}##; I simply get ##a = b##, so the eigenvector is ##\begin{pmatrix}1\\1\end{pmatrix}##.

##\lambda = -1 \Rightarrow \begin{pmatrix}a\\-a\end{pmatrix}##; I get ##a = -b##, so the eigenvector is ##\begin{pmatrix}1\\-1\end{pmatrix}##.
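As a quick sanity check, here is a minimal numpy sketch (note that numpy returns the eigenvectors normalized to unit length):

Code:
import numpy as np

# The matrix from the original post.
A = np.array([[0.0, 1.0],
              [1.0, 0.0]])

# np.linalg.eig returns the eigenvalues and, as the columns of the second
# array, the corresponding eigenvectors scaled to unit length.
eigenvalues, eigenvectors = np.linalg.eig(A)

print(eigenvalues)   # 1 and -1 (in some order)
print(eigenvectors)  # columns proportional to (1, 1) and (1, -1), up to sign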
 
Well done! Apart from normalization, you have found the eigenvectors of the first of the Pauli matrices; they play an important role in quantum mechanics for particles with spin.
 
Yes, that is what I was reading about.

Thanks.
 
You could just picture the map: from the matrix columns, it is obvious that it interchanges the standard unit vectors on the x and y axes. Hence it is a reflection of the plane in the line ##x=y##, so it leaves invariant both that line and its orthogonal complement, acting as the identity on the line ##x=y## and as minus the identity on the line ##x=-y##. But the algebraic method is more sure, if less geometric.
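A small numerical illustration of this geometric picture (again just a numpy sketch):

Code:
import numpy as np

A = np.array([[0, 1],
              [1, 0]])

e1, e2 = np.array([1, 0]), np.array([0, 1])
print(A @ e1, A @ e2)   # [0 1] [1 0] -- the standard basis vectors are swapped

v_plus = np.array([1, 1])    # lies on the line x = y
v_minus = np.array([1, -1])  # lies on the line x = -y
print(A @ v_plus)    # [1 1]  -- fixed, so A acts as the identity on x = y
print(A @ v_minus)   # [-1 1] -- negated, so A acts as minus the identity on x = -y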
 
mathwonk said:
You could just picture the map: from the matrix columns, it is obvious that it interchanges the standard unit vectors on the x and y axes. Hence it is a reflection of the plane in the line ##x=y##, so it leaves invariant both that line and its orthogonal complement, acting as the identity on the line ##x=y## and as minus the identity on the line ##x=-y##. But the algebraic method is more sure, if less geometric.

This is a good idea for transformations, but it seems harder for other situations. Can you (or anyone else, of course) see a geometric interpretation for, e.g., a correlation or covariance matrix?
 
Sorry, I don't mean to hijack the thread; it is just that I am curious about a related issue: the interpretation of the eigenvalues of correlation/covariance matrices. These supposedly describe directions of maximal variability of the data, but I just cannot see it at this point. Since the OP seems satisfied with the answers given, I thought it may make sense to extend the thread beyond the original scope.
 
WWGD said:
Sorry, I don't mean to hijack the thread; it is just that I am curious about a related issue: the interpretation of the eigenvalues of correlation/covariance matrices. These supposedly describe directions of maximal variability of the data, but I just cannot see it at this point. Since the OP seems satisfied with the answers given, I thought it may make sense to extend the thread beyond the original scope.

I'm not totally sure I understand your question, as there are a lot of possible interpretations here. In all cases I assume we're working with centered data (read: zero mean, by column). I also assume we're operating over the reals.

If you have your data in a matrix ##\mathbf A##, and you have some arbitrary vector ##\mathbf x## with ##\big \vert \big \vert \mathbf x \big \vert \big \vert_2^{2} = 1##, then to maximize ##\big \vert \big \vert \mathbf{Ax} \big \vert \big \vert_2^{2}## you'd allocate entirely to ##(\lambda_1, \mathbf v_1)##, the largest eigenpair of ##\mathbf A^T \mathbf A## (a.k.a. the largest singular value (squared) of ##\mathbf A## and the associated right singular vector). This is a quadratic form interpretation of your question. Typically people prove this with a diagonalization argument or a Lagrange multiplier argument. I assume the eigenvalues of this symmetric positive (semi)definite matrix are well ordered, so ##\lambda_1 \geq \lambda_2 \geq \dots \geq \lambda_n \geq 0##, where
##\mathbf A = \bigg[\begin{array}{c|c|c|c} \mathbf a_1 & \mathbf a_2 & \cdots & \mathbf a_{n} \end{array}\bigg]##
That is, ##\mathbf a_j## refers to the ##j##th feature column of ##\mathbf A##.

Using the interpretation of matrix-vector multiplication as a weighted sum of the columns of the matrix, we see that ##\mathbf{Ax} = x_1\,\mathbf a_1 + x_2\,\mathbf a_2 + \dots + x_n\,\mathbf a_n##.

Thus when someone asks you to do a constrained maximization of ##\big \vert \big \vert \mathbf{Ax} \big \vert \big \vert_2^{2}##, what they are saying is: come up with the linear combination of the features of the data matrix ##\mathbf A## that has maximal length, subject to the constraint ##x_1^2 + x_2^2 + \dots + x_n^2 = 1## (or some other constant ##> 0##, but we use one for simplicity here). Since all features are zero mean (i.e. you centered your data), what you have done is extract the linear combination of your features with the highest second moment / variance, again subject to that constraint.
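To make that concrete, here is a small numpy sketch on synthetic data (the data matrix, sample size, and feature count are just made up for illustration): the unit vector maximizing ##\big \vert \big \vert \mathbf{Ax} \big \vert \big \vert_2^{2}## is the top right singular vector, and the maximum value is ##\lambda_1 = \sigma_1^2##.

Code:
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(200, 5))   # synthetic data: 200 samples, 5 features
A = A - A.mean(axis=0)          # center each feature column (zero mean)

# Top right singular vector v1 of A, i.e. the top eigenvector of A^T A.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
v1 = Vt[0]

# ||A v1||^2 equals sigma_1^2 = lambda_1, the largest eigenvalue of A^T A.
print(np.linalg.norm(A @ v1) ** 2, s[0] ** 2)

# Any other unit vector x gives a value no larger than lambda_1.
x = rng.normal(size=5)
x /= np.linalg.norm(x)
print(np.linalg.norm(A @ x) ** 2 <= s[0] ** 2 + 1e-9)   # True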

Here is another interpretation.

If you wanted to low-rank approximate your matrix ##\mathbf A## -- say at rank one -- and you were using ##\big \vert \big \vert \mathbf A \big \vert \big \vert_F^{2}## as your ruler (i.e. the sum of the squared values of everything in ##\mathbf A##, which is a generalization of the L2 norm on vectors), you'd also allocate entirely to ##\lambda_1##. Here ##\big \vert \big \vert \mathbf A \big \vert \big \vert_F^{2} = \operatorname{trace}\big(\mathbf A^T \mathbf A\big) = \lambda_1 + \lambda_2 + \dots + \lambda_n##, and the associated eigenvectors are mutually orthonormal, so we have a clean partition. For each additional eigenvalue you allocate to, you increase the rank of your approximation by one; thus for a rank-two approximation you'd allocate to ##\lambda_1## and ##\lambda_2##, and so forth.
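And a short numpy sketch of the low-rank side (same kind of synthetic, centered data as above; the squared singular values ##\sigma_i^2## play the role of the ##\lambda_i##):

Code:
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(200, 5))
A = A - A.mean(axis=0)          # centered data matrix

U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Rank-one approximation: keep only the largest singular triple.
A1 = s[0] * np.outer(U[:, 0], Vt[0])

# The squared Frobenius norm splits across the lambda_i = sigma_i^2 ...
print(np.linalg.norm(A, 'fro') ** 2, np.sum(s ** 2))

# ... and the squared error of the rank-one approximation is what is left over,
# namely lambda_2 + ... + lambda_n.
print(np.linalg.norm(A - A1, 'fro') ** 2, np.sum(s[1:] ** 2))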
 