The Cayley-Hamilton Theorem

- Thread starter Nipon Waiyaworn

- #1

Nipon Waiyaworn

help me please...

- #2

fresh_42

Mentor


- #3

mathman

Science Advisor


What happens if you substitute A for t in the determinant equation?
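To make the hint concrete, here is a small numerical check with an arbitrary ##2 \times 2## matrix of my own choosing (not from the thread); for a ##2 \times 2## matrix the characteristic polynomial is ##t^2 - \operatorname{tr}(A)\,t + \det(A)##:

```python
import numpy as np

# Arbitrary example matrix (my own choice, for illustration only)
A = np.array([[2.0, 1.0],
              [0.0, 3.0]])

# Characteristic polynomial of a 2x2 matrix: p(t) = t^2 - tr(A) t + det(A)
tr = np.trace(A)          # 5.0
det = np.linalg.det(A)    # 6.0

# Substituting A for t: p(A) = A^2 - tr(A) A + det(A) I should vanish
P = A @ A - tr * A + det * np.eye(2)
print(P)   # the zero matrix, up to floating-point round-off
```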

- #4

Nipon Waiyaworn

mathman said:
What happens if you substitute A for t in the determinant equation?

- #5

Nipon Waiyaworn

I found Chris Bernhart's document, but I don't understand it. How is the inverse of ##I-tA## represented as a power series in ##t## with coefficients in ##M(n,n)##?

And how is ##\operatorname{adj}(I-tA)## represented as a power series in ##t## with coefficients in ##M(n,n)##?

Finally, I don't understand the last paragraph. Could you please explain it?
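For what it's worth, the series in question is the matrix geometric series ##(I-tA)^{-1} = I + tA + t^2A^2 + \cdots##, whose coefficients are the matrices ##A^k \in M(n,n)##. A hedged sympy sketch (my own illustration, not from the document) comparing the exact inverse with a truncated series:

```python
import sympy as sp

t = sp.symbols('t')
A = sp.Matrix([[1, 2], [0, 1]])   # arbitrary example matrix
I = sp.eye(2)

# Truncated geometric series: I + tA + t^2 A^2 + ... + t^5 A^5
N = 6
series = sum((t**k * A**k for k in range(1, N)), I)

# The exact inverse agrees with the series up to order t^N
exact = (I - t * A).inv()
diff = (exact - series).applyfunc(lambda e: sp.series(e, t, 0, N).removeO())
print(diff)   # the zero matrix
```

Each coefficient of the series is a matrix in ##M(n,n)##, which is exactly the sense in which the inverse is a "power series in ##t## with coefficients in ##M(n,n)##".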

- #6

mathwonk

Science Advisor

Homework Helper


- #7

StoneTemplePython

Science Advisor

Gold Member

2019 Award


Assume the field is ##\mathbb C## (or, if the matrix is real and all of its eigenvalues are real, you can use ##\mathbb R##).

Now assume the matrix is not defective (i.e. the ##n \times n## matrix has ##n## linearly independent eigenvectors). Can you prove Cayley-Hamilton for the non-defective case?

Then, to extend this to defective matrices, you either cleverly work with upper triangular matrices (which has some subtleties), or perturb the eigenvalues by very small amounts and show that your matrix is approximated up to any arbitrary level of precision by a diagonalizable one... and re-use the argument from before.

There's a nice walk-through in chapter 9 of Linear Algebra Done Wrong.

https://www.math.brown.edu/~treil/papers/LADW/book.pdf

I'd recommend working through the whole book at some point -- it's very well done.

I guess you could also use formal power series and classical adjoints... but that seems rather unpleasant.
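The non-defective case can also be sanity-checked numerically: if ##A = PDP^{-1}##, then ##p(A) = P\,p(D)\,P^{-1}##, and ##p(D)## is diagonal with entries ##p(\lambda_i) = 0##. A minimal numpy sketch, assuming a generic (hence almost surely diagonalizable) random matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))   # generic, so almost surely diagonalizable

# Coefficients of the characteristic polynomial, highest degree first:
# p(t) = t^3 + c[1] t^2 + c[2] t + c[3]
c = np.poly(A)

# Evaluate p at the matrix A itself, using Horner's scheme with matrices
P = np.zeros_like(A)
for coeff in c:
    P = P @ A + coeff * np.eye(3)

print(np.max(np.abs(P)))   # tiny (floating-point round-off), i.e. p(A) = 0
```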

- #8

lavinia

Science Advisor

Gold Member


StoneTemplePython said:
You might want to start simple here. Assume the field is ##\mathbb C## (or, if the matrix is real and all eigenvalues are real, you can use ##\mathbb R##). Now assume the matrix is not defective (i.e. the ##n \times n## matrix has ##n## linearly independent eigenvectors). Can you prove Cayley-Hamilton for the non-defective case?

The field does not need to be the complex numbers. One needs the eigenvalues to be distinct when one extends the scalar field to the complex numbers. For instance the real matrix ##\begin{pmatrix}0&-1\\1&0\end{pmatrix}## has characteristic polynomial ##(t-i)(t+i)##.

- #9

StoneTemplePython

Science Advisor

Gold Member

2019 Award


lavinia said:
The field does not need to be the complex numbers. One needs the eigenvalues to be distinct when one extends the scalar field to the complex numbers. For instance the real matrix ##\begin{pmatrix}0&-1\\1&0\end{pmatrix}## has characteristic polynomial ##(t-i)(t+i)##.

I'm not sure what you're trying to get at here. My suggestion to the OP was to start simple. To my mind this means diagonalizable matrices over an algebraically closed field... ##\mathbb C## is very natural in this context.

I have no idea what you're referring to when you say that "one needs the eigenvalues to be distinct" when using complex numbers. The example you give is skew-Hermitian, which is a normal matrix and non-defective per the spectral theorem. I can think of plenty of larger ##n \times n## permutation matrices that have repeated eigenvalues (i.e. algebraic multiplicity ##\gt 1##). Yet these too are non-defective, because they are a special case of a unitary and hence normal matrix.

After dealing with diagonalizable matrices, the natural next step for the OP is to look at Cayley-Hamilton in the context of defective matrices, which is a bit more work.

There are of course other approaches, but I'm inclined to start simple and build, which is what LADW does.

- #10

lavinia

Science Advisor

Gold Member


StoneTemplePython said:
I'm not sure what you're trying to get at here. My suggestion to the OP was to start simple. To my mind this means diagonalizable matrices over an algebraically closed field... ##\mathbb C## is very natural in this context. I have no idea what you're referring to when you say that "one needs the eigenvalues to be distinct" when using complex numbers.

I was just trying to say that you can extend scalars and get to the situation that you are talking about. I thought your point was that using the complex numbers allows the characteristic polynomial to be factored into linear factors. If these factors are distinct, as you want to assume in order to "start simple", then the theorem follows immediately.

The example I gave was a rotation of the 2-dimensional real plane. It is an irreducible transformation. But tensoring with the complex numbers makes it into a linear transformation of a two-dimensional complex vector space, where it has two one-dimensional eigenspaces with distinct eigenvalues.

If all you care about as a starting point is a basis made of eigenvectors with unique eigenvalues, then there is no need for the complex numbers per se. In fact, if the eigenvectors form a basis, then there is no need to assume that the eigenvalues are distinct.

Interestingly, one might push your approach to get a complete proof of Cayley-Hamilton for all matrices. I think that the complex ##n \times n## matrices with distinct eigenvalues are dense in the ##n \times n## complex matrices under any reasonable norm, say the sup norm on the matrix entries. This means that Cayley-Hamilton is true for a Cauchy sequence of matrices that converge to any given complex matrix. One then uses extension of scalars to get the general case. I will try to see if this idea works.
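The density idea in the last paragraph can be illustrated numerically (a hedged sketch with a Jordan block of my own choosing): a small diagonal perturbation splits the repeated eigenvalue, each perturbed matrix satisfies its own characteristic polynomial, and by continuity the limit matrix does too.

```python
import numpy as np

def char_poly_at(A):
    """Evaluate A's own characteristic polynomial at A (Horner's scheme)."""
    c = np.poly(A)                       # coefficients, highest degree first
    P = np.zeros_like(A)
    for coeff in c:
        P = P @ A + coeff * np.eye(A.shape[0])
    return P

# Defective example: a 2x2 Jordan block with repeated eigenvalue 1
J = np.array([[1.0, 1.0],
              [0.0, 1.0]])

for eps in (1e-1, 1e-3, 1e-6):
    A_eps = J + np.diag([0.0, eps])      # eigenvalues 1 and 1 + eps: distinct
    print(eps, np.max(np.abs(char_poly_at(A_eps))))   # ~0 for every eps

print(np.max(np.abs(char_poly_at(J))))   # the limit matrix also gives ~0
```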

Last edited:

- #11

fresh_42

Mentor


What about fields which are not of characteristic zero?

Last edited:

- #12

Nipon Waiyaworn

StoneTemplePython said:
Assume the field is ##\mathbb C## (or, if the matrix is real and all eigenvalues are real, you can use ##\mathbb R##). Now assume the matrix is not defective (i.e. the ##n \times n## matrix has ##n## linearly independent eigenvectors). Can you prove Cayley-Hamilton for the non-defective case? Then, to extend this to defective matrices, you either cleverly work with upper triangular matrices (which has some subtleties), or perturb the eigenvalues by very small amounts and show that your matrix is approximated up to any arbitrary level of precision by a diagonalizable one... and re-use the argument from before. There's a nice walk-through in chapter 9 of Linear Algebra Done Wrong. I'd recommend working through the whole book at some point -- it's very well done. I guess you could also use formal power series and classical adjoints... but that seems rather unpleasant.

Thank you (ขอบคุณครับ)!

- #13

lavinia

Science Advisor

Gold Member


fresh_42 said:
What about fields which are not of characteristic zero?

I think one can use the structure theorem for modules of finite type over a principal ideal domain. This works for any vector space, but it seems a bit abstracted away from the original question.

- #14

fresh_42

Mentor


lavinia said:
I think one can use the structure theorem for modules of finite type over a principal ideal domain. This works for any vector space, but it seems a bit abstracted away from the original question.

I only thought that the nice density argument, which I really liked, fails in these cases, and a field extension would be of no help. But I'm not sure whether there's a topological workaround; the Zariski topology comes to mind for algebraic equations. In my book it's done with generalized eigenspaces, and I assumed that the theorem works for any field, although I haven't checked it. (If authors assume certain conditions on the characteristic, it's usually somewhere else in the book.)

- #15

lavinia

Science Advisor

Gold Member


fresh_42 said:
I only thought that the nice density argument, which I really liked, fails in these cases, and a field extension would be of no help. But I'm not sure whether there's a topological workaround; the Zariski topology comes to mind for algebraic equations. In my book it's done with generalized eigenspaces, and I assumed that the theorem works for any field, although I haven't checked it.

I would be very interested to see how this might work.

- #16

fresh_42

Mentor


lavinia said:
I would be very interested to see how this might work.

I'm afraid I don't know what you mean. I think that a topological argument doesn't work in a finite case; it's just that I don't claim anything when topology is involved, as long as I'm not sure. Greub's proof is only linear algebra. He shows that the minimal polynomial of a transformation divides the characteristic polynomial... and, as I thought, his overall assumption at the beginning of the chapter is characteristic zero. But where would it be needed for Cayley-Hamilton?

- #17

mathwonk

Science Advisor

Homework Helper


do you understand my post #6? i.e. in high school algebra we learn the root-factor theorem: in particular, if (x-a) divides a polynomial, then x=a is a root of that polynomial. Now the characteristic determinant is a polynomial with matrix coefficients. By essentially the same proof as in high school, one can prove the root-factor theorem for polynomials with coefficients chosen from a non-commutative ring. The only difference is that since the ring elements do not commute, you must distinguish whether you plug in the element on the left or on the right. I.e. you evaluate from the left by substituting for X in X^r.c, and you evaluate from the right by substituting for X in c.X^r. But the same result holds. I.e. X=A is a left root of a polynomial if and only if (X-A) divides the polynomial from the left.

Then use the standard equation above in post #5, oops, it should be rather: [tI-A].adjoint[tI-A] = det[tI-A].I.

then use the fact that this equation between matrices with polynomial entries holds also as an equation between polynomials with matrix entries.

Thus the linear factor [tI-A] divides the characteristic polynomial det[tI-A] from the left, as a polynomial with matrix entries. Hence setting t = A gives a left root of the characteristic polynomial. This proof requires understanding some non-commutative algebra, but it is the simplest, most direct proof, since it shows that if you understand what is being said, then the standard equation immediately gives the proof. In particular the nature of the field is irrelevant.

If you do not understand this, I want to admit that it took me 40 years to grasp it after it was shown to me. But you don't have to wait that long.

I have written this up in notes for several courses on my webpage (math 8000 and math 4050). In particular, there are 4 proofs of cayley hamilton in the first 9 pages of these 8000 notes. This proof is the 4th one, on pages 8 and 9: (but the argument given there on p. 9 is wrong! but easily fixed, as explained below in my next post.)

http://alpha.math.uga.edu/~roy/80006c.pdf

or see Fundamental concepts of higher algebra, by A. Adrian Albert, 1956, University of Chicago press, page 84, theorem 23.

https://www.abebooks.com/servlet/Se...damental+concepts+of+higher+algebra&kn=&isbn=
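For readers who want to see the standard equation in action, here is a hedged sympy check for a concrete ##2 \times 2## example of my own choosing (sympy's `adjugate` is the classical adjoint):

```python
import sympy as sp

t = sp.symbols('t')
A = sp.Matrix([[1, 2], [3, 4]])    # arbitrary example matrix
M = t * sp.eye(2) - A              # tI - A

# The standard equation: [tI - A] . adjoint[tI - A] = det[tI - A] . I
lhs = (M * M.adjugate()).expand()
rhs = (M.det() * sp.eye(2)).expand()
print(lhs == rhs)   # True
```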


Last edited:

- #18

lavinia

Science Advisor

Gold Member


These are great notes. I have only read the beginning. The part that I read explains what I learned as an application of the structure theorem for torsion modules of finite type over a principal ideal domain to the special case of a linear endomorphism of a vector space. Truly wonderful.

Last edited:

- #19

mathwonk

Science Advisor

Homework Helper


thank you very much for the feedback!

i should clarify one point of the non-commutative argument above. in the equation

[tI-A].adjoint[tI-A] = det[tI-A].I.

one might think, as in the trivial high school argument, that since setting t=A on the left side of the equation gives zero, setting t=A also gives zero on the right side of the equation. this however is not so clear here. The problem is that with polynomials with non-commutative coefficients, one assumes the variable commutes with all coefficients. However, after setting t=A, it is no longer true that A commutes with all coefficients. Thus it is not clear that substitution commutes with multiplication.

[Edit: Hence the argument for cayley hamilton on page 9 of my linked notes is nonsense! My apologies. I.e. since substitution does NOT commute with multiplication of polynomials, even though setting t=A gives zero on the left side of the standard equation, it does NOT follow that it also gives zero on the right. However, I must have been getting really tired when I wrote that argument since I had previously just gone to all the trouble of proving the remainder theorem and its corollary, which DOES imply the desired result. I.e. that does say that since (tI-A) divides the characteristic polynomial (say from the left), then the remainder after dividing by tI-A, i.e. zero, does equal the evaluation of that polynomial at t=A also from the left. I.e. characteristic poly does equal zero when evaluated at A from the left. (It also does so from the right, which is clear since the characteristic polynomial actually has scalar coefficients.)]

continuing: ......

Thus the other aspect of the high school argument is ok. I.e. one usually proves the division theorem whereby the remainder after dividing f(t) by the polynomial (t-a) equals the result of substituting t=a in f. It follows then that the result of substituting equals zero if and only if (t-a) divides the polynomial exactly. Here this is also true. I.e. in the non commutative case, the remainder after left division by (t-A) equals the result of left substitution of t=A. So the same corollary holds again. I.e. the standard equation displays the fact that (t-A) does divide the characteristic polynomial. Hence setting t=A in that polynomial does give zero.

The proof of this division theorem is very easy. Namely, try it first for the monomial ##t^n##. One has as usual the equation

##t^n - A^n = (t-A)(t^{n-1} + A\,t^{n-2} + A^2\,t^{n-3} + \cdots + A^{n-1})##. Hence ##t^n = (t-A)(t^{n-1} + A\,t^{n-2} + \cdots + A^{n-1}) + A^n##. Notice that here the remainder of this division of ##t^n## by ##(t-A)## equals ##A^n##, which is the result of substituting ##t=A## into ##t^n##.

This so far is symmetrical, but when we ramp up to having another coefficient, say ##C\,t^n## or ##t^n\,C##, it then matters on which side we substitute ##t=A##.

i.e. then ##t^n C - A^n C = (t-A)(t^{n-1}C + \cdots + A^{n-1}C)##, so ##t^n C = (t-A)(t^{n-1}C + \cdots + A^{n-1}C) + A^n C##. I.e. dividing ##t^n C## from the left by ##(t-A)## leaves as remainder ##A^n C##, the result of substituting ##t=A## from the left. (I hope I got this right, I'm not checking my work here.)

Now that we have it for t^n.C we get it by addition for all polynomials.
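Since the variable ##t## is assumed to commute with all coefficients, the telescoping factorization above can be checked symbolically with a scalar symbol ##t## and a concrete matrix. A hedged sympy sketch for ##n = 3## (the matrix is my own arbitrary example):

```python
import sympy as sp

t = sp.symbols('t')
A = sp.Matrix([[0, 1], [2, 3]])    # arbitrary example matrix
I = sp.eye(2)

# t^3 I - A^3 = (t I - A)(t^2 I + t A + A^2): the telescoping factorization
lhs = (t**3 * I - A**3).expand()
rhs = ((t * I - A) * (t**2 * I + t * A + A**2)).expand()
print(lhs == rhs)   # True
```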


Last edited:
