How to prove the Cayley-Hamilton Theorem?

  • Context: Graduate
  • Thread starter: Nipon Waiyaworn
  • Tags: Theorem
SUMMARY

The discussion centers on proving the Cayley-Hamilton Theorem, which states that every square matrix satisfies its own characteristic polynomial. Participants emphasize the importance of understanding minimal polynomials and generalized eigenspaces. Key strategies include proving the theorem for non-defective matrices first and then extending the proof to defective matrices using upper triangular matrices or perturbation methods. The discussion references "Linear Algebra Done Wrong" by Treil for a comprehensive walkthrough and highlights the significance of non-commutative algebra in the proof process.

PREREQUISITES
  • Understanding of the Cayley-Hamilton Theorem
  • Familiarity with minimal polynomials and generalized eigenspaces
  • Knowledge of non-commutative algebra
  • Basic concepts of linear algebra, particularly eigenvalues and eigenvectors
NEXT STEPS
  • Study the proof of the Cayley-Hamilton Theorem for non-defective matrices
  • Explore perturbation methods for extending the theorem to defective matrices
  • Read chapter 9 of "Linear Algebra Done Wrong" by Treil for detailed explanations
  • Investigate the role of non-commutative algebra in polynomial factorization
USEFUL FOR

Mathematicians, students of linear algebra, and anyone interested in advanced matrix theory and the applications of the Cayley-Hamilton Theorem.

Nipon Waiyaworn
Hello everybody. I have to prove the following theorem:
The Cayley-Hamilton Theorem
[Attached image: statement of the Cayley-Hamilton Theorem, i.e. every square matrix ##A## satisfies its own characteristic polynomial: if ##p(t) = \det(tI - A)##, then ##p(A) = 0##.]

help me please...
 
How far did you get? As far as I can see, it is not as simple as it might appear; that is to say, it takes several steps. What do you know about minimal polynomials and generalized eigenspaces? Have you done a Google search? I'm almost sure that the proof can be found online.
 
What happens if you substitute A for t in the determinant equation?
 
mathman said:
What happens if you substitute A for t in the determinant equation?
[Attached image: an attempted computation substituting ##A## for ##t##; as noted in post #6 below, the second equal sign in this picture is false.]
 
fresh_42 said:
How far did you get? As far as I can see, it is not as simple as it might appear; that is to say, it takes several steps. What do you know about minimal polynomials and generalized eigenspaces? Have you done a Google search? I'm almost sure that the proof can be found online.
I found Chris Bernhart's document, but I don't understand it. How is the inverse of ##I-tA## expressed as a power series in ##t## with coefficients in ##M(n,n)##?
And how is ##\operatorname{adj}(I-tA)## expressed as a power series in ##t## with coefficients in ##M(n,n)##?
Finally, I don't understand the last paragraph. Could you help me and explain it, please?
[Attached images: excerpts from the document referred to above.]
 
the "standard equation" formula in line 16 of your post implies the result. since it shows that (tI-A) divides the polynomial det(tI-A). I.e. non commutative algebra shows that this occurs if and only if t=A is a root of the polynomial det(tI-A), just as in high school algebra of polynomials. technically this formula shows that (tI-A) divides it from the left and hence t=A is a left root, but the formula also holds on the right. I.e. the root - factor theorem in non commutative polynomial rings takes into account that one may possibly get different results substituting values from the left or from the right, but that does not occur here. (mathman's suggestion is famously wrong however. i.e. the second equal sign in nipon's picture is false.)
 
You might want to start simple here.

assume the field is ##\mathbb C## (or if all eigenvalues are real, and the matrix is real, then you can use ##\mathbb R##)

Now assume the matrix is not defective (i.e. the ##n \times n## matrix has ##n## linearly independent eigenvectors). Can you prove Cayley-Hamilton for the non-defective case?

Then to extend this to defective matrices, you either cleverly work with upper triangular matrices (which has some subtleties), or perturb the eigenvalues by very small amounts, and show that your matrix is approximated up to any arbitrary level of precision by a diagonalizable one... and re-use the argument from before.

There's a nice walk-through in chapter 9 of Linear Algebra Done Wrong.

https://www.math.brown.edu/~treil/papers/LADW/book.pdf

I'd recommend working through the whole book at some point -- it's very well done.

I guess you could also use formal power series and classical adjoints... but that seems rather unpleasant.
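
For what it's worth, here is a small NumPy sketch of the non-defective case (my own illustration, not taken from LADW): a random matrix almost surely has ##n## distinct eigenvalues, its characteristic polynomial vanishes at each eigenvalue, and therefore ##p(A) = V\,\operatorname{diag}(p(\lambda_i))\,V^{-1}## is (numerically) the zero matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
A = rng.standard_normal((n, n))      # almost surely diagonalizable with distinct eigenvalues

coeffs = np.poly(A)                  # coefficients of det(tI - A), highest degree first

def poly_at_matrix(coeffs, M):
    """Evaluate a polynomial (highest-degree coefficient first) at a square matrix (Horner)."""
    P = np.zeros_like(M)
    for c in coeffs:
        P = P @ M + c * np.eye(len(M))
    return P

lam, V = np.linalg.eig(A)
print(np.max(np.abs(np.polyval(coeffs, lam))))    # p(lambda_i) ~ 0 for every eigenvalue
print(np.linalg.norm(poly_at_matrix(coeffs, A)))  # p(A) ~ 0, up to floating-point round-off
```

The perturbation step described above is then about extending this conclusion from the (dense) set of diagonalizable matrices to defective ones.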
 
StoneTemplePython said:
You might want to start simple here.

assume the field is ##\mathbb C## (or if all eigenvalues are real, and the matrix is real, then you can use ##\mathbb R##)

Now assume the matrix is not defective (i.e. the ##n \times n## matrix has ##n## linearly independent eigenvectors). Can you prove Cayley-Hamilton for the non-defective case?

The field does not need to be the complex numbers. One needs the eigenvalues to be distinct when one extends the scalar field to the complex numbers. For instance the real matrix ##\begin{pmatrix}0&-1\\1&0\end{pmatrix}## has characteristic polynomial ##(t-i)(t+i)##.
 
lavinia said:
The field does not need to be the complex numbers. One needs the eigenvalues to be distinct when one extends the scalar field to the complex numbers. For instance the real matrix ##\begin{pmatrix}0&-1\\1&0\end{pmatrix}## has characteristic polynomial ##(t-i)(t+i)##.

I'm not sure what you're trying to get at here. My suggestion to OP was to start simple. To my mind this means diagonalizable matrices over an algebraically closed field... ##\mathbb C## is very natural in this context.

I have no idea what you're referring to when you say that "one needs the eigenvalues to be distinct" when using complex numbers. The example you give is skew-Hermitian, which is a normal matrix, and non-defective per the spectral theorem. I can think of plenty of larger ##n \times n## permutation matrices that have repeated eigenvalues (i.e. algebraic multiplicity ##\gt 1##). Yet these too are non-defective because they are a special case of a unitary matrix and hence normal.

After dealing with diagonalizable matrices, the natural next step for OP is to look at Cayley-Hamilton in the context of defective matrices, which is a bit more work.

There are of course other approaches, but I'm inclined to start simple and build which is what LADW does.
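
A quick numerical illustration of the permutation-matrix remark (my own check, not part of the argument): the ##4 \times 4## permutation matrix of a product of two transpositions has eigenvalues ##\pm 1##, each repeated, yet a full set of independent eigenvectors.

```python
import numpy as np

# Permutation matrix of (1 2)(3 4): swaps the first two and the last two coordinates.
P = np.array([[0., 1., 0., 0.],
              [1., 0., 0., 0.],
              [0., 0., 0., 1.],
              [0., 0., 1., 0.]])

lam, V = np.linalg.eig(P)
print(np.round(np.sort(lam.real), 6))   # [-1, -1, 1, 1]: repeated eigenvalues
print(np.linalg.matrix_rank(V))         # 4: independent eigenvectors, so P is non-defective
```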
 
  • #10
StoneTemplePython said:
I'm not sure what you're trying to get at here. My suggestion to OP was to start simple. To my mind this means diagonalizable matrices over an algebraically closed field... ##\mathbb C## is very natural in this context.

I have no idea what you're referring to when you say that "one needs the eigenvalues to be distinct" when using complex numbers. The example you give is skew-Hermitian, which is a normal matrix, and non-defective per the spectral theorem. I can think of plenty of larger ##n \times n## permutation matrices that have repeated eigenvalues (i.e. algebraic multiplicity ##\gt 1##). Yet these too are non-defective because they are a special case of a unitary matrix and hence normal.

After dealing with diagonalizable matrices, the natural next step for OP is to look at Cayley-Hamilton in the context of defective matrices, which is a bit more work.

There are of course other approaches, but I'm inclined to start simple and build which is what LADW does.
I was just trying to say that you can extend scalars and get to the situation that you are talking about. I thought your point was that using the complex numbers allows the characteristic polynomial to be factored into linear factors. If these factors are distinct, as you want to assume in order to "start simple", then the theorem follows immediately.

The example I gave was a rotation of the two-dimensional real plane. It is an irreducible transformation. But tensoring with the complex numbers makes it into a linear transformation of a two-dimensional complex vector space where it has two one-dimensional eigenspaces with distinct eigenvalues.

If all you care about is a linear basis made of eigenvectors with unique eigenvalues as a starting point, then there is no need for the complex numbers per se. In fact, if the eigenvectors form a basis then there is no need to assume that the eigenvalues are distinct.

- Interestingly, one might push your approach to get a complete proof of Cayley-Hamilton for all matrices. I think that the complex ##n \times n## matrices with distinct eigenvalues are dense in the ##n \times n## complex matrices under any reasonable norm, say the sup norm on the matrix entries. This means that Cayley-Hamilton is true along a sequence of matrices converging to any given complex matrix. One then uses extension of scalars to get the general case. I will try to see if this idea works.
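
A rough numerical sketch of this density idea (my own illustration; the real content is the continuity of the map ##B \mapsto p_B(B)##, where ##p_B## is the characteristic polynomial of ##B##): perturb a defective matrix so that its eigenvalues become distinct, observe that Cayley-Hamilton holds there, and let the perturbation shrink.

```python
import numpy as np

def char_poly_at_self(B):
    """Evaluate B's own characteristic polynomial at B (Horner's scheme)."""
    coeffs = np.poly(B)                  # coefficients of det(tI - B), highest degree first
    P = np.zeros_like(B)
    for c in coeffs:
        P = P @ B + c * np.eye(len(B))
    return P

A = np.array([[2., 1.],
              [0., 2.]])                 # defective: one eigenvalue, only one eigenvector

for eps in [1e-1, 1e-3, 1e-6]:
    A_eps = A + np.diag([0., eps])       # eigenvalues 2 and 2 + eps are distinct: diagonalizable
    print(eps, np.linalg.norm(char_poly_at_self(A_eps)))

# B -> p_B(B) is continuous and vanishes on the dense set of matrices with distinct
# eigenvalues, so it must also vanish at the defective limit A:
print(np.linalg.norm(char_poly_at_self(A)))
```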
 
  • #11
What about fields which are not of characteristic zero?
 
  • #12
StoneTemplePython said:
You might want to start simple here.

assume the field is ##\mathbb C## (or if all eigenvalues are real, and the matrix is real, then you can use ##\mathbb R##)

Now assume the matrix is not defective (i.e. the ##n \times n## matrix has ##n## linearly independent eigenvectors). Can you prove Cayley-Hamilton for the non-defective case?

Then to extend this to defective matrices, you either cleverly work with upper triangular matrices (which has some subtleties), or perturb the eigenvalues by very small amounts, and show that your matrix is approximated up to any arbitrary level of precision by a diagonalizable one... and re-use the argument from before.

There's a nice walk-through in chapter 9 of Linear Algebra Done Wrong.

https://www.math.brown.edu/~treil/papers/LADW/book.pdf

I'd recommend working through the whole book at some point -- it's very well done.

I guess you could also use formal power series and classical adjoints... but that seems rather unpleasant.
Thank you. ขอบคุณครับ (Thank you.)
 
  • #13
fresh_42 said:
What about fields which are not of characteristic zero?

I think one can use the structure theorem for modules of finite type over a principal ideal domain. This works for any vector space but seems a bit abstracted away from the original question.
 
  • #14
lavinia said:
I think one can use the structure theorem for modules of finite type over a principal ideal domain. This works for any vector space but seems a bit abstracted away from the original question.
I only thought that the nice density argument, which I really liked, fails in these cases, and a field extension would be of no help. But I'm not sure whether there's a topological workaround. The Zariski topology comes to mind for algebraic equations. In my book it's done with generalized eigenspaces and I assumed that the theorem works for any field, although I haven't checked it. (If authors assume certain conditions on the characteristic, it's usually somewhere else in the book.)
 
  • #15
fresh_42 said:
I only thought that the nice density argument, which I really liked, fails in these cases, and a field extension would be of no help. But I'm not sure whether there's a topological workaround. The Zariski topology comes to mind for algebraic equations. In my book it's done with generalized eigenspaces and I assumed that the theorem works for any field, although I haven't checked it. (If authors assume certain conditions on the characteristic, it's usually somewhere else in the book.)

I would be very interested to see how this might work.
 
  • #16
lavinia said:
I would be very interested to see how this might work.
I'm afraid I don't know what you mean. I think that a topological argument doesn't work in a finite case; it's just that I don't want to claim anything where topology is involved as long as I'm not sure. Greub's proof is only linear algebra. He shows that the minimal polynomial of a transformation divides the characteristic polynomial ... and as I thought, his overall assumption at the beginning of the chapter is characteristic zero. But where would it be needed for Cayley-Hamilton?
 
  • #17
Do you understand my post #6? I.e. in high-school algebra we learn the root-factor theorem: in particular, if ##(x-a)## divides a polynomial, then ##x=a## is a root of that polynomial. Now the characteristic determinant is a polynomial with matrix coefficients. By essentially the same proof as in high school, one can prove the root-factor theorem for polynomials with coefficients chosen from a non-commutative ring. The only difference is that since the ring elements do not commute, you must distinguish whether you plug in the element on the left or on the right. I.e. you evaluate from the left by substituting for ##X## in ##X^r c##, and you evaluate from the right by substituting for ##X## in ##c X^r##. But the same result holds. I.e. ##X=A## is a left root of a polynomial if and only if ##(X-A)## divides the polynomial from the left.

Then use the standard equation above in post #5; oops, it should rather be: ##(tI-A)\cdot\operatorname{adj}(tI-A) = \det(tI-A)\cdot I##.

Then use the fact that this equation between matrices with polynomial entries holds also as an equation between polynomials with matrix coefficients.

Thus the linear factor ##(tI-A)## divides the characteristic polynomial ##\det(tI-A)## from the left, as a polynomial with matrix coefficients. Hence setting ##t = A## gives a left root of the characteristic polynomial. This proof requires understanding some non-commutative algebra, but it is the simplest, most direct proof, since it shows that if you understand what is being said, then the standard equation immediately gives the proof. In particular the nature of the field is irrelevant.

If you do not understand this, I want to admit that it took me 40 years to grasp it after it was shown to me. But you don't have to wait that long.

I have written this up in notes for several courses on my webpage (math 8000 and math 4050). In particular, there are 4 proofs of Cayley-Hamilton in the first 9 pages of these 8000 notes. This proof is the 4th one, on pages 8 and 9 (but the argument given there on p. 9 is wrong! It is easily fixed, however, as explained below in my next post).

http://alpha.math.uga.edu/%7Eroy/80006c.pdf

Or see Fundamental Concepts of Higher Algebra, by A. Adrian Albert, University of Chicago Press, 1956, page 84, Theorem 23.

https://www.abebooks.com/servlet/Se...damental+concepts+of+higher+algebra&kn=&isbn=
 
  • #18
mathwonk said:
http://alpha.math.uga.edu/%7Eroy/80006c.pdf

These are great notes. I have only read the beginning. The part that I read explains what I learned as an application of the structure theorem for torsion modules of finite type over a principal ideal domain to the special case of a linear endomorphism of a vector space. Truly wonderful.
 
  • #19
Thank you very much for the feedback! I should clarify one point of the non-commutative argument above. In the equation

##(tI-A)\cdot\operatorname{adj}(tI-A) = \det(tI-A)\cdot I##
one might think, as in the trivial high-school argument, that since setting ##t=A## on the left side of the equation gives zero, setting ##t=A## also gives zero on the right side of the equation. This, however, is not so clear here. The problem is that with polynomials with non-commutative coefficients, one assumes the variable commutes with all coefficients. However, after setting ##t=A##, it is no longer true that ##A## commutes with all coefficients. Thus it is not clear that substitution commutes with multiplication.

[Edit: Hence the argument for Cayley-Hamilton on page 9 of my linked notes is nonsense! My apologies. I.e. since substitution does NOT commute with multiplication of polynomials, even though setting ##t=A## gives zero on the left side of the standard equation, it does NOT follow that it also gives zero on the right. However, I must have been getting really tired when I wrote that argument, since I had previously just gone to all the trouble of proving the remainder theorem and its corollary, which DOES imply the desired result. I.e. it does say that since ##(tI-A)## divides the characteristic polynomial (say from the left), the remainder after dividing by ##(tI-A)##, i.e. zero, does equal the evaluation of that polynomial at ##t=A##, also from the left. I.e. the characteristic polynomial does equal zero when evaluated at ##A## from the left. (It also does so from the right, which is clear since the characteristic polynomial actually has scalar coefficients.)]

Continuing: ...
Thus the other aspect of the high-school argument is OK. I.e. one usually proves the division theorem whereby the remainder after dividing ##f(t)## by the polynomial ##(t-a)## equals the result of substituting ##t=a## in ##f##. It follows then that the result of substituting equals zero if and only if ##(t-a)## divides the polynomial exactly. Here this is also true. I.e. in the non-commutative case, the remainder after left division by ##(t-A)## equals the result of left substitution of ##t=A##. So the same corollary holds again. I.e. the standard equation displays the fact that ##(t-A)## does divide the characteristic polynomial. Hence setting ##t=A## in that polynomial does give zero.

The proof of this division theorem is very easy. Namely, try it first for the monomial ##t^n##. One has as usual the equation

##t^n - A^n = (t-A)(t^{n-1} + A\,t^{n-2} + A^2 t^{n-3} + \dots + A^{n-1})##. Hence ##t^n = (t-A)(t^{n-1} + A\,t^{n-2} + A^2 t^{n-3} + \dots + A^{n-1}) + A^n##. Notice that here the remainder of this division of ##t^n## by ##(t-A)## equals ##A^n##, which is the result of substituting ##t=A## into ##t^n##.

This so far is symmetrical, but when we ramp up to having another coefficient, say ##C\,t^n## or ##t^n C##, it then matters on which side we substitute ##t=A##.

I.e. then ##t^n C - A^n C = (t-A)(t^{n-1}C + \dots + A^{n-1}C)##, so ##t^n C = (t-A)(t^{n-1}C + \dots + A^{n-1}C) + A^n C##. I.e. dividing ##t^n C## from the left by ##(t-A)## leaves as remainder the result of substituting ##t=A## from the left. (I hope I got this right; I'm not checking my work here.)

Now that we have it for ##t^n C##, we get it by addition for all polynomials.
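
If it helps to see the bookkeeping, here is a small NumPy check of this left-division identity (my own sketch; a polynomial with matrix coefficients is represented simply as a list of coefficient matrices, lowest degree first). It verifies that ##t^n C = (t-A)\,q(t) + A^n C## with ##q(t) = t^{n-1}C + A\,t^{n-2}C + \dots + A^{n-1}C## for random matrices ##A## and ##C##:

```python
import numpy as np

rng = np.random.default_rng(1)
d, n = 3, 4                                   # 3x3 matrix coefficients, monomial t^n C with n = 4
A = rng.standard_normal((d, d))
C = rng.standard_normal((d, d))

def left_mult_by_t_minus_A(q, A):
    """Compute (t - A) * q(t), with A multiplying each coefficient of q on the LEFT.
    Polynomials are lists of matrix coefficients, lowest degree first."""
    out = [np.zeros_like(A) for _ in range(len(q) + 1)]
    for k, qk in enumerate(q):
        out[k + 1] = out[k + 1] + qk          # contribution of t * q_k t^k
        out[k] = out[k] - A @ qk              # contribution of -A * q_k t^k
    return out

# Quotient q(t): coefficient of t^k is A^{n-1-k} C, for k = 0, ..., n-1.
q = [np.linalg.matrix_power(A, n - 1 - k) @ C for k in range(n)]

# (t - A) q(t) + A^n C should equal t^n C; the constant remainder A^n C is exactly
# the LEFT evaluation of t^n C at t = A.
result = left_mult_by_t_minus_A(q, A)
result[0] = result[0] + np.linalg.matrix_power(A, n) @ C

target = [np.zeros_like(A) for _ in range(n)] + [C]                 # the polynomial t^n C
print(max(np.linalg.norm(r - s) for r, s in zip(result, target)))   # ~ 0
```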
 
