How to prove the Cayley-Hamilton Theorem?

  • Context: Graduate
  • Thread starter: Nipon Waiyaworn
  • Tags: Theorem
SUMMARY

The discussion centers on proving the Cayley-Hamilton Theorem, which states that every square matrix satisfies its own characteristic polynomial. Participants emphasize the importance of understanding minimal polynomials and generalized eigenspaces. Key strategies include proving the theorem for non-defective matrices first and then extending the proof to defective matrices using upper triangular matrices or perturbation methods. The discussion references "Linear Algebra Done Wrong" by Treil for a comprehensive walkthrough and highlights the significance of non-commutative algebra in the proof process.

PREREQUISITES
  • Understanding of the Cayley-Hamilton Theorem
  • Familiarity with minimal polynomials and generalized eigenspaces
  • Knowledge of non-commutative algebra
  • Basic concepts of linear algebra, particularly eigenvalues and eigenvectors
NEXT STEPS
  • Study the proof of the Cayley-Hamilton Theorem for non-defective matrices
  • Explore perturbation methods for extending the theorem to defective matrices
  • Read chapter 9 of "Linear Algebra Done Wrong" by Treil for detailed explanations
  • Investigate the role of non-commutative algebra in polynomial factorization
USEFUL FOR

Mathematicians, students of linear algebra, and anyone interested in advanced matrix theory and the applications of the Cayley-Hamilton Theorem.

Nipon Waiyaworn
Hello everybody. I have to prove the following theorem:
The Cayley-Hamilton Theorem
[Attached image: statement of the Cayley-Hamilton Theorem, i.e. every square matrix ##A## satisfies its own characteristic polynomial: if ##p(t) = \det(tI - A)##, then ##p(A) = 0##.]

help me please...
 
How far did you get? As far as I can see, it is not as simple as it might appear; that is to say, it takes several steps. What do you know about minimal polynomials and generalized eigenspaces? Have you done a Google search? I'm almost sure that the proof can be found online.
 
What happens if you substitute A for t in the determinant equation?
 
mathman said:
What happens if you substitute A for t in the determinant equation?
[Attached image: an attempted computation substituting ##A## for ##t##; as noted in post #6 below, the second equal sign in this picture is false.]
 
fresh_42 said:
How far did you get? As far as I can see, it is not as simple as it might appear; that is to say, it takes several steps. What do you know about minimal polynomials and generalized eigenspaces? Have you done a Google search? I'm almost sure that the proof can be found online.
I found Chris Bernhart's document, but I don't understand it. How is the inverse of ##I-tA## expressed as a power series in ##t## with coefficients in ##M(n,n)##?
And how is ##\operatorname{adj}(I-tA)## expressed as a power series in ##t## with coefficients in ##M(n,n)##?
Finally, I don't understand the last paragraph. Could you help me and explain it, please?
[Attached images: excerpts from the document referred to above.]
 
the "standard equation" formula in line 16 of your post implies the result. since it shows that (tI-A) divides the polynomial det(tI-A). I.e. non commutative algebra shows that this occurs if and only if t=A is a root of the polynomial det(tI-A), just as in high school algebra of polynomials. technically this formula shows that (tI-A) divides it from the left and hence t=A is a left root, but the formula also holds on the right. I.e. the root - factor theorem in non commutative polynomial rings takes into account that one may possibly get different results substituting values from the left or from the right, but that does not occur here. (mathman's suggestion is famously wrong however. i.e. the second equal sign in nipon's picture is false.)
 
You might want to start simple here.

assume the field is ##\mathbb C## (or if all eigenvalues are real, and the matrix is real, then you can use ##\mathbb R##)

Now assume the matrix is not defective (i.e. the ##n \times n## matrix has ##n## linearly independent eigenvectors). Can you prove Cayley-Hamilton for the non-defective case?

Then to extend this to defective matrices, you either cleverly work with upper triangular matrices (which has some subtleties), or perturb the eigenvalues by very small amounts, and show that your matrix is approximated up to any arbitrary level of precision by a diagonalizable one... and re-use the argument from before.

There's a nice walk-through in chapter 9 of Linear Algebra Done Wrong.

https://www.math.brown.edu/~treil/papers/LADW/book.pdf

I'd recommend working through the whole book at some point -- it's very well done.

I guess you could also use formal power series and classical adjoints... but that seems rather unpleasant.
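
For what it's worth, here is a small NumPy sketch of the non-defective case (my own illustration, not taken from LADW): a random matrix almost surely has ##n## distinct eigenvalues, its characteristic polynomial vanishes at each eigenvalue, and therefore ##p(A) = V\,\operatorname{diag}(p(\lambda_i))\,V^{-1}## is (numerically) the zero matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
A = rng.standard_normal((n, n))      # almost surely diagonalizable with distinct eigenvalues

coeffs = np.poly(A)                  # coefficients of det(tI - A), highest degree first

def poly_at_matrix(coeffs, M):
    """Evaluate a polynomial (highest-degree coefficient first) at a square matrix (Horner)."""
    P = np.zeros_like(M)
    for c in coeffs:
        P = P @ M + c * np.eye(len(M))
    return P

lam, V = np.linalg.eig(A)
print(np.max(np.abs(np.polyval(coeffs, lam))))    # p(lambda_i) ~ 0 for every eigenvalue
print(np.linalg.norm(poly_at_matrix(coeffs, A)))  # p(A) ~ 0, up to floating-point round-off
```

The perturbation step described above is then about extending this conclusion from the (dense) set of diagonalizable matrices to defective ones.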
 
StoneTemplePython said:
You might want to start simple here.

assume the field is ##\mathbb C## (or if all eigenvalues are real, and the matrix is real, then you can use ##\mathbb R##)

Now assume the matrix is not defective (i.e. the ##n \times n## matrix has ##n## linearly independent eigenvectors). Can you prove Cayley-Hamilton for the non-defective case?

The field does not need to be the complex numbers. One needs the eigenvalues to be distinct when one extends the scalar field to the complex numbers. For instance the real matrix ##\begin{pmatrix}0&-1\\1&0\end{pmatrix}## has characteristic polynomial ##(t-i)(t+i)##.
 
lavinia said:
The field does not need to be the complex numbers. One needs the eigenvalues to be distinct when one extends the scalar field to the complex numbers. For instance the real matrix ##\begin{pmatrix}0&-1\\1&0\end{pmatrix}## has characteristic polynomial ##(t-i)(t+i)##.

I'm not sure what you're trying to get at here. My suggestion to OP was to start simple. To my mind this means diagonalizable matrices over an algebraically closed field... ##\mathbb C## is very natural in this context.

I have no idea what you're referring to when you say that "one needs the eigenvalues to be distinct" when using complex numbers. The example you give is skew-Hermitian, which is a normal matrix, and non-defective per the spectral theorem. I can think of plenty of larger ##n \times n## permutation matrices that have repeated eigenvalues (i.e. algebraic multiplicity ##\gt 1##). Yet these too are non-defective because they are a special case of a unitary matrix and hence normal.

After dealing with diagonalizable matrices, the natural next step for OP is to look at Cayley-Hamilton in the context of defective matrices, which is a bit more work.

There are of course other approaches, but I'm inclined to start simple and build which is what LADW does.
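
A quick numerical illustration of the permutation-matrix remark (my own check, not part of the argument): the ##4 \times 4## permutation matrix of a product of two transpositions has eigenvalues ##\pm 1##, each repeated, yet a full set of independent eigenvectors.

```python
import numpy as np

# Permutation matrix of (1 2)(3 4): swaps the first two and the last two coordinates.
P = np.array([[0., 1., 0., 0.],
              [1., 0., 0., 0.],
              [0., 0., 0., 1.],
              [0., 0., 1., 0.]])

lam, V = np.linalg.eig(P)
print(np.round(np.sort(lam.real), 6))   # [-1, -1, 1, 1]: repeated eigenvalues
print(np.linalg.matrix_rank(V))         # 4: independent eigenvectors, so P is non-defective
```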
 
  • #10
StoneTemplePython said:
I'm not sure what you're trying to get at here. My suggestion to OP was to start simple. To my mind this means diagonalizable matrices over an algebraically closed field... ##\mathbb C## is very natural in this context.

I have no idea what you're referring to when you say that "one needs the eigenvalues to be distinct" when using complex numbers. The example you give is skew-Hermitian, which is a normal matrix, and non-defective per the spectral theorem. I can think of plenty of larger ##n \times n## permutation matrices that have repeated eigenvalues (i.e. algebraic multiplicity ##\gt 1##). Yet these too are non-defective because they are a special case of a unitary matrix and hence normal.

After dealing with diagonalizable matrices, the natural next step for OP is to look at Cayley-Hamilton in the context of defective matrices, which is a bit more work.

There are of course other approaches, but I'm inclined to start simple and build which is what LADW does.
I was just trying to say that you can extend scalars and get to the situation that you are talking about. I thought your point was that using the complex numbers allows the characteristic polynomial to be factored into linear factors. If these factors are distinct, as you want to assume in order to "start simple", then the theorem follows immediately.

The example I gave was a rotation of the two-dimensional real plane. It is an irreducible transformation. But tensoring with the complex numbers makes it into a linear transformation of a two-dimensional complex vector space where it has two one-dimensional eigenspaces with distinct eigenvalues.

If all you care about is a linear basis made of eigenvectors with unique eigenvalues as a starting point, then there is no need for the complex numbers per se. In fact, if the eigenvectors form a basis then there is no need to assume that the eigenvalues are distinct.

- Interestingly, one might push your approach to get a complete proof of Cayley-Hamilton for all matrices. I think that the complex ##n \times n## matrices with distinct eigenvalues are dense in the ##n \times n## complex matrices under any reasonable norm, say the sup norm on the matrix entries. This means that Cayley-Hamilton is true along a sequence of matrices converging to any given complex matrix. One then uses extension of scalars to get the general case. I will try to see if this idea works.
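
A rough numerical sketch of this density idea (my own illustration; the real content is the continuity of the map ##B \mapsto p_B(B)##, where ##p_B## is the characteristic polynomial of ##B##): perturb a defective matrix so that its eigenvalues become distinct, observe that Cayley-Hamilton holds there, and let the perturbation shrink.

```python
import numpy as np

def char_poly_at_self(B):
    """Evaluate B's own characteristic polynomial at B (Horner's scheme)."""
    coeffs = np.poly(B)                  # coefficients of det(tI - B), highest degree first
    P = np.zeros_like(B)
    for c in coeffs:
        P = P @ B + c * np.eye(len(B))
    return P

A = np.array([[2., 1.],
              [0., 2.]])                 # defective: one eigenvalue, only one eigenvector

for eps in [1e-1, 1e-3, 1e-6]:
    A_eps = A + np.diag([0., eps])       # eigenvalues 2 and 2 + eps are distinct: diagonalizable
    print(eps, np.linalg.norm(char_poly_at_self(A_eps)))

# B -> p_B(B) is continuous and vanishes on the dense set of matrices with distinct
# eigenvalues, so it must also vanish at the defective limit A:
print(np.linalg.norm(char_poly_at_self(A)))
```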
 
  • #11
What about fields which are not of characteristic zero?
 
  • #12
StoneTemplePython said:
You might want to start simple here.

assume the field is ##\mathbb C## (or if all eigenvalues are real, and the matrix is real, then you can use ##\mathbb R##)

Now assume the matrix is not defective (i.e. the ##n \times n## matrix has ##n## linearly independent eigenvectors). Can you prove Cayley-Hamilton for the non-defective case?

Then to extend this to defective matrices, you either cleverly work with upper triangular matrices (which has some subtleties), or perturb the eigenvalues by very small amounts, and show that your matrix is approximated up to any arbitrary level of precision by a diagonalizable one... and re-use the argument from before.

There's a nice walk-through in chapter 9 of Linear Algebra Done Wrong.

https://www.math.brown.edu/~treil/papers/LADW/book.pdf

I'd recommend working through the whole book at some point -- it's very well done.

I guess you could also use formal power series and classical adjoints... but that seems rather unpleasant.
Thank you. ขอบคุณครับ (Thank you.)
 
  • #13
fresh_42 said:
What about fields which are not of characteristic zero?

I think one can use the structure theorem for modules of finite type over a principal ideal domain. This works for any vector space but seems a bit abstracted away from the original question.
 
  • #14
lavinia said:
I think one can use the structure theorem for modules of finite type over a principal ideal domain. This works for any vector space but seems a bit abstracted away from the original question.
I only thought that the nice density argument, which I really liked, fails in these cases, and a field extension would be of no help. But I'm not sure whether there's a topological workaround. The Zariski topology comes to mind for algebraic equations. In my book it's done with generalized eigenspaces and I assumed that the theorem works for any field, although I haven't checked it. (If authors assume certain conditions on the characteristic, it's usually somewhere else in the book.)
 
  • #15
fresh_42 said:
I only thought that the nice density argument, which I really liked, fails in these cases, and a field extension would be of no help. But I'm not sure whether there's a topological workaround. The Zariski topology comes to mind for algebraic equations. In my book it's done with generalized eigenspaces and I assumed that the theorem works for any field, although I haven't checked it. (If authors assume certain conditions on the characteristic, it's usually somewhere else in the book.)

I would be very interested to see how this might work.
 
  • #16
lavinia said:
I would be very interested to see how this might work.
I'm afraid I don't know what you mean. I think that a topological argument doesn't work in a finite case; it's just that I don't want to claim anything where topology is involved as long as I'm not sure. Greub's proof is only linear algebra. He shows that the minimal polynomial of a transformation divides the characteristic polynomial ... and as I thought, his overall assumption at the beginning of the chapter is characteristic zero. But where would it be needed for Cayley-Hamilton?
 
  • #17
Do you understand my post #6? I.e. in high-school algebra we learn the root-factor theorem: in particular, if ##(x-a)## divides a polynomial, then ##x=a## is a root of that polynomial. Now the characteristic determinant is a polynomial with matrix coefficients. By essentially the same proof as in high school, one can prove the root-factor theorem for polynomials with coefficients chosen from a non-commutative ring. The only difference is that since the ring elements do not commute, you must distinguish whether you plug in the element on the left or on the right. I.e. you evaluate from the left by substituting for ##X## in ##X^r c##, and you evaluate from the right by substituting for ##X## in ##c X^r##. But the same result holds. I.e. ##X=A## is a left root of a polynomial if and only if ##(X-A)## divides the polynomial from the left.

Then use the standard equation above in post #5; oops, it should rather be: ##(tI-A)\cdot\operatorname{adj}(tI-A) = \det(tI-A)\cdot I##.

Then use the fact that this equation between matrices with polynomial entries holds also as an equation between polynomials with matrix coefficients.

Thus the linear factor ##(tI-A)## divides the characteristic polynomial ##\det(tI-A)## from the left, as a polynomial with matrix coefficients. Hence setting ##t = A## gives a left root of the characteristic polynomial. This proof requires understanding some non-commutative algebra, but it is the simplest, most direct proof, since it shows that if you understand what is being said, then the standard equation immediately gives the proof. In particular the nature of the field is irrelevant.

If you do not understand this, I want to admit that it took me 40 years to grasp it after it was shown to me. But you don't have to wait that long.

I have written this up in notes for several courses on my webpage (math 8000 and math 4050). In particular, there are 4 proofs of Cayley-Hamilton in the first 9 pages of these 8000 notes. This proof is the 4th one, on pages 8 and 9 (but the argument given there on p. 9 is wrong! It is easily fixed, however, as explained below in my next post).

http://alpha.math.uga.edu/%7Eroy/80006c.pdf

Or see Fundamental Concepts of Higher Algebra, by A. Adrian Albert, University of Chicago Press, 1956, page 84, Theorem 23.

https://www.abebooks.com/servlet/Se...damental+concepts+of+higher+algebra&kn=&isbn=
 
  • #18
mathwonk said:
http://alpha.math.uga.edu/%7Eroy/80006c.pdf

These are great notes. I have only read the beginning. The part that I read explains what I learned as an application of the structure theorem for torsion modules of finite type over a principal ideal domain to the special case of a linear endomorphism of a vector space. Truly wonderful.
 
  • #19
Thank you very much for the feedback! I should clarify one point of the non-commutative argument above. In the equation

##(tI-A)\cdot\operatorname{adj}(tI-A) = \det(tI-A)\cdot I##
one might think, as in the trivial high-school argument, that since setting ##t=A## on the left side of the equation gives zero, setting ##t=A## also gives zero on the right side of the equation. This, however, is not so clear here. The problem is that with polynomials with non-commutative coefficients, one assumes the variable commutes with all coefficients. However, after setting ##t=A##, it is no longer true that ##A## commutes with all coefficients. Thus it is not clear that substitution commutes with multiplication.

[Edit: Hence the argument for Cayley-Hamilton on page 9 of my linked notes is nonsense! My apologies. I.e. since substitution does NOT commute with multiplication of polynomials, even though setting ##t=A## gives zero on the left side of the standard equation, it does NOT follow that it also gives zero on the right. However, I must have been getting really tired when I wrote that argument, since I had previously just gone to all the trouble of proving the remainder theorem and its corollary, which DOES imply the desired result. I.e. it does say that since ##(tI-A)## divides the characteristic polynomial (say from the left), the remainder after dividing by ##(tI-A)##, i.e. zero, does equal the evaluation of that polynomial at ##t=A##, also from the left. I.e. the characteristic polynomial does equal zero when evaluated at ##A## from the left. (It also does so from the right, which is clear since the characteristic polynomial actually has scalar coefficients.)]

Continuing: ...
Thus the other aspect of the high-school argument is OK. I.e. one usually proves the division theorem whereby the remainder after dividing ##f(t)## by the polynomial ##(t-a)## equals the result of substituting ##t=a## in ##f##. It follows then that the result of substituting equals zero if and only if ##(t-a)## divides the polynomial exactly. Here this is also true. I.e. in the non-commutative case, the remainder after left division by ##(t-A)## equals the result of left substitution of ##t=A##. So the same corollary holds again. I.e. the standard equation displays the fact that ##(t-A)## does divide the characteristic polynomial. Hence setting ##t=A## in that polynomial does give zero.

The proof of this division theorem is very easy. Namely, try it first for the monomial ##t^n##. One has as usual the equation

##t^n - A^n = (t-A)(t^{n-1} + A\,t^{n-2} + A^2 t^{n-3} + \dots + A^{n-1})##. Hence ##t^n = (t-A)(t^{n-1} + A\,t^{n-2} + A^2 t^{n-3} + \dots + A^{n-1}) + A^n##. Notice that here the remainder of this division of ##t^n## by ##(t-A)## equals ##A^n##, which is the result of substituting ##t=A## into ##t^n##.

This so far is symmetrical, but when we ramp up to having another coefficient, say ##C\,t^n## or ##t^n C##, it then matters on which side we substitute ##t=A##.

I.e. then ##t^n C - A^n C = (t-A)(t^{n-1}C + \dots + A^{n-1}C)##, so ##t^n C = (t-A)(t^{n-1}C + \dots + A^{n-1}C) + A^n C##. I.e. dividing ##t^n C## from the left by ##(t-A)## leaves as remainder the result of substituting ##t=A## from the left. (I hope I got this right; I'm not checking my work here.)

Now that we have it for ##t^n C##, we get it by addition for all polynomials.
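
If it helps to see the bookkeeping, here is a small NumPy check of this left-division identity (my own sketch; a polynomial with matrix coefficients is represented simply as a list of coefficient matrices, lowest degree first). It verifies that ##t^n C = (t-A)\,q(t) + A^n C## with ##q(t) = t^{n-1}C + A\,t^{n-2}C + \dots + A^{n-1}C## for random matrices ##A## and ##C##:

```python
import numpy as np

rng = np.random.default_rng(1)
d, n = 3, 4                                   # 3x3 matrix coefficients, monomial t^n C with n = 4
A = rng.standard_normal((d, d))
C = rng.standard_normal((d, d))

def left_mult_by_t_minus_A(q, A):
    """Compute (t - A) * q(t), with A multiplying each coefficient of q on the LEFT.
    Polynomials are lists of matrix coefficients, lowest degree first."""
    out = [np.zeros_like(A) for _ in range(len(q) + 1)]
    for k, qk in enumerate(q):
        out[k + 1] = out[k + 1] + qk          # contribution of t * q_k t^k
        out[k] = out[k] - A @ qk              # contribution of -A * q_k t^k
    return out

# Quotient q(t): coefficient of t^k is A^{n-1-k} C, for k = 0, ..., n-1.
q = [np.linalg.matrix_power(A, n - 1 - k) @ C for k in range(n)]

# (t - A) q(t) + A^n C should equal t^n C; the constant remainder A^n C is exactly
# the LEFT evaluation of t^n C at t = A.
result = left_mult_by_t_minus_A(q, A)
result[0] = result[0] + np.linalg.matrix_power(A, n) @ C

target = [np.zeros_like(A) for _ in range(n)] + [C]                 # the polynomial t^n C
print(max(np.linalg.norm(r - s) for r, s in zip(result, target)))   # ~ 0
```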
 
