Why Didn't the Eigenvector Approach Work for This Matrix?

  • Context: High School
  • Thread starter: jonjacson
  • Tags: Eigenvector, Elementary

Discussion Overview

The discussion revolves around the process of finding eigenvectors and eigenvalues for a specific matrix, particularly the matrix \(\begin{pmatrix}0&1\\1&0\end{pmatrix}\). Participants explore the application of the eigenvector definition and the determinant method, addressing challenges encountered during the calculations. The conversation also touches on geometric interpretations and extends to the interpretation of eigenvalues in correlation and covariance matrices.

Discussion Character

  • Technical explanation
  • Conceptual clarification
  • Debate/contested
  • Mathematical reasoning
  • Experimental/applied

Main Points Raised

  • One participant attempts to find eigenvectors and eigenvalues using the definition \(A x = \lambda x\) but encounters difficulties, suggesting that no suitable \(\lambda\) exists unless it is zero.
  • Another participant challenges this assertion, indicating that the resulting equations can be solved and that the initial claim is incorrect.
  • A participant later confirms the eigenvalues found using the determinant method, stating \(\lambda^2 - 1 = 0\) leads to \(\lambda = \pm 1\) and provides corresponding eigenvectors.
  • One participant notes the significance of the eigenvectors in quantum mechanics, specifically relating to the Pauli matrices.
  • There is a discussion about the geometric interpretation of the matrix, describing it as a reflection in the plane and its effects on standard unit vectors.
  • Another participant expresses curiosity about extending the discussion to the interpretation of eigenvalues in correlation and covariance matrices, questioning their meaning in terms of data variability.
  • A detailed explanation is provided regarding the maximization of variance in data represented by a covariance matrix, including the role of eigenvalues and eigenvectors in this context.

Areas of Agreement / Disagreement

Participants generally agree on the eigenvalues and eigenvectors derived from the determinant method, but there is initial disagreement regarding the validity of the direct approach to finding eigenvectors. The discussion on the interpretation of eigenvalues in correlation matrices remains unresolved, with differing perspectives on their significance.

Contextual Notes

Some assumptions about the definitions and properties of eigenvalues and eigenvectors are not explicitly stated, and the discussion includes various interpretations that may depend on specific contexts, such as the treatment of data in correlation matrices.

Who May Find This Useful

Readers interested in linear algebra, particularly eigenvalues and eigenvectors, as well as those exploring applications in quantum mechanics and data analysis, may find this discussion beneficial.

jonjacson
Hi
If I have this matrix:
##\begin{pmatrix}0&1\\1&0\end{pmatrix}##
and I want to find its eigenvectors and eigenvalues, I can try it using the definition of an eigenvector, which is:

A x = λ x, where x is an eigenvector.

But if I try this directly I fail to get the right answer. For example, using a column eigenvector (a b), I get:

(b a) = λ (a b). (These are column vectors.)

There is no lambda able to make this correct, unless it is zero, which is not the right answer. Why didn't this approach work?

I have to use the identity matrix and the determinant of A − λI to get the right result.

Thanks!
 
jonjacson said:
(b a) = λ (a b). (These are column vectors.)

There is no lambda able to make this correct, unless it is zero, which is not the right answer. Why didn't this approach work?
The part I bolded is false. The set of two equations in two unknowns is not hard to solve. Try it!
 
jonjacson said:
There is no lambda able to make this correct, unless it is zero
You can check that this must be wrong by comparing it with the outcome of
jonjacson said:
I have to use the identity matrix, and the determinant of A - λ I, to get the right result
where you find ##\lambda^2-1 = 0 \Rightarrow \lambda = \pm 1## and can then determine the corresponding eigenvectors from (b a) = λ (a b) !
 
Oops, OK, you are right. I get the same results, and then for the eigenvectors I get:

Lambda = 1 ---> (a a): I simply get a = b, so the eigenvector is (1 1).

Lambda = -1 ---> (a -a): I get a = -b, so the eigenvector is (1 -1).
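These values are easy to confirm numerically. Here is a quick illustrative check (a sketch added for the reader, not part of the original exchange), using numpy's `eigh`, which is appropriate since the matrix is symmetric:

```python
import numpy as np

# The matrix from the original question.
A = np.array([[0.0, 1.0],
              [1.0, 0.0]])

# eigh handles symmetric matrices and returns eigenvalues in ascending order.
eigenvalues, eigenvectors = np.linalg.eigh(A)
print(eigenvalues)  # [-1.  1.]

# Each column of `eigenvectors` is a normalized eigenvector:
# a multiple of (1, -1) for lambda = -1 and of (1, 1) for lambda = +1.
for lam, v in zip(eigenvalues, eigenvectors.T):
    assert np.allclose(A @ v, lam * v)  # A v = lambda v
```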
 
Well done! Apart from normalization, you have found the eigenvectors of the first of the Pauli matrices; they play an important role in quantum mechanics for particles with spin.
 
Yes, that is what I was reading about.

Thanks.
 
You could just picture the map: from the matrix columns, it is obvious that it interchanges the standard unit vectors on the x and y axes. Hence it is a reflection of the plane in the line x = y, so it leaves invariant both that line and its orthocomplement, acting as the identity on the line x = y and as minus the identity on the line x = -y. But the algebraic method is more sure, if less geometric.
 
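The geometric picture is easy to verify directly (an illustrative sketch, not from the thread): the matrix swaps the standard basis vectors, fixes every vector on the line x = y, and negates every vector on the line x = -y.

```python
import numpy as np

A = np.array([[0.0, 1.0],
              [1.0, 0.0]])

# The columns show that the map interchanges the standard unit vectors.
e1, e2 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
assert np.allclose(A @ e1, e2) and np.allclose(A @ e2, e1)

# A reflection in the line x = y: identity on that line ...
assert np.allclose(A @ np.array([1.0, 1.0]), [1.0, 1.0])
# ... and minus the identity on the orthogonal line x = -y.
assert np.allclose(A @ np.array([1.0, -1.0]), [-1.0, 1.0])
```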
mathwonk said:
You could just picture the map: from the matrix columns, it is obvious that it interchanges the standard unit vectors on the x and y axes. Hence it is a reflection of the plane in the line x = y, so it leaves invariant both that line and its orthocomplement, acting as the identity on the line x = y and as minus the identity on the line x = -y. But the algebraic method is more sure, if less geometric.

This is a good idea for transformations but it seems harder for other situations. Can you (or anyone else, of course) see some geometric interpretations for e.g. a correlation or covariance matrix?
 
Sorry, I don't mean to hijack the thread, it is just that I am curious about a related issue: the interpretation of eigenvalues in correlation/covariance matrices. These supposedly describe directions of maximal variability of the data, but I just cannot see it at this point. I thought since the OP seems satisfied with the answers given, it may make sense to extend the thread beyond the original scope.
 
WWGD said:
Sorry, I don't mean to hijack the thread, it is just that I am curious about a related issue: the interpretation of eigenvalues in correlation/covariance matrices. These supposedly describe directions of maximal variability of the data, but I just cannot see it at this point. I thought since the OP seems satisfied with the answers given, it may make sense to extend the thread beyond the original scope.

I'm not totally sure I understand your question, as there are a lot of possible interpretations here. In all cases I assume we're working with centered (read: zero mean, by column) data. I also assume we're operating over the reals.

If you have your data in a matrix ##\mathbf A##, and you have some arbitrary vector ##\mathbf x## with ##\big \vert \big \vert \mathbf x \big \vert \big \vert_2^{2} = 1##, then to maximize ##\big \vert \big \vert \mathbf{Ax} \big \vert \big \vert_2^{2}## you'd allocate entirely to ##(\lambda_1, \mathbf v_1)##, the largest eigenpair of ##\mathbf A^T \mathbf A## (a.k.a. the largest singular value, squared, of ##\mathbf A## and the associated right singular vector). This is a quadratic form interpretation of your question. Typically people prove this with a diagonalization argument or a Lagrange multiplier argument. I assume the eigenvalues of this symmetric positive (semi)definite covariance matrix are ordered, so ##\lambda_1 \geq \lambda_2 \geq \dots \geq \lambda_n \geq 0##. Here
##\mathbf A =
\bigg[\begin{array}{c|c|c|c}
\mathbf a_1 & \mathbf a_2 &\cdots & \mathbf a_{n}
\end{array}\bigg]
##
That is, ##\mathbf a_j## refers to the ##j##th feature column of ##\mathbf A##.

Using the interpretation of matrix-vector multiplication as a scaled sum over the columns of the matrix, we see that ##\mathbf {Ax} = x_1 \mathbf a_1 + x_2 \mathbf a_2 + \dots + x_n \mathbf a_n##.

Thus when someone asks you to do a constrained maximization of ##\big \vert \big \vert \mathbf{Ax} \big \vert \big \vert_2^{2}##, what they are saying is: come up with the linear combination of features from data matrix ##\mathbf A## that has maximal length, subject to the constraint ##x_1^2 + x_2^2 + \dots + x_n^2 = 1## (or some other constant > 0, but we use one for simplicity here). Since all features are zero mean (i.e. you centered your data), what you have done is extract the vector with the highest second moment / variance from your features, again subject to the constraint ##x_1^2 + x_2^2 + \dots + x_n^2 = 1##.
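As a concrete sketch of this maximization (my own illustration with synthetic random data, not from the original post): for a centered data matrix, no unit vector beats the top right singular vector of ##\mathbf A##, and the maximum of ##\big \vert \big \vert \mathbf{Ax} \big \vert \big \vert_2^{2}## equals ##\sigma_1^2 = \lambda_1##.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data with deliberately unequal spread per feature, then centered.
A = rng.normal(size=(200, 3)) @ np.diag([3.0, 1.0, 0.5])
A = A - A.mean(axis=0)

# v1, the top right singular vector, maximizes ||A x||^2 over unit vectors x;
# the maximum equals sigma_1^2, the top eigenvalue of A^T A.
_, s, Vt = np.linalg.svd(A, full_matrices=False)
v1 = Vt[0]
best = np.linalg.norm(A @ v1) ** 2
assert np.isclose(best, s[0] ** 2)

# No random unit vector does better.
for _ in range(1000):
    x = rng.normal(size=3)
    x /= np.linalg.norm(x)
    assert np.linalg.norm(A @ x) ** 2 <= best + 1e-9
```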

Here is another interpretation.

If you wanted a low-rank approximation of your matrix ##\mathbf A## -- say rank one -- and you were using ##\big \vert \big \vert \mathbf A \big \vert \big \vert_F^{2}## as your ruler (i.e. sum up the squared value of everything in ##\mathbf A##, which is a generalization of the L2 norm on vectors), you'd also allocate entirely to ##\lambda_1##, where ##\big \vert \big \vert \mathbf A \big \vert \big \vert_F^{2} = \operatorname{trace}\big(\mathbf A^T \mathbf A\big) = \lambda_1 + \lambda_2 + \dots + \lambda_n##. The associated eigenvectors are mutually orthonormal, so we have a clean partition, and each eigenvalue you allocate to increases the rank of your approximation by one; thus for a rank-2 approximation you'd allocate to ##\lambda_1## and ##\lambda_2##, and so forth.
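This partition can also be checked numerically (an illustrative sketch with arbitrary random data): truncating the SVD after the top singular value gives a rank-one approximation whose squared Frobenius error is exactly the sum of the discarded ##\lambda_i = \sigma_i^2##.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(6, 4))

U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Best rank-1 approximation: keep only the top singular triple.
A1 = s[0] * np.outer(U[:, 0], Vt[0])

# The squared Frobenius error equals the sum of the discarded
# eigenvalues of A^T A, i.e. the remaining squared singular values.
err2 = np.linalg.norm(A - A1, 'fro') ** 2
assert np.isclose(err2, np.sum(s[1:] ** 2))

# And ||A||_F^2 = trace(A^T A) = sum of all the squared singular values.
assert np.isclose(np.linalg.norm(A, 'fro') ** 2, np.sum(s ** 2))
```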
 
