Cayley-Hamilton Theorem

1. Nov 10, 2012

Benn

In my linear algebra course, we just finished proving the Cayley-Hamilton theorem (if p(x) = det(A - xI), then p(A) = 0).

The theorem seems obvious: if you plug A into p, you get det(A - AI) = det(0) = 0. But, of course, you can't do that. This is especially clear when you consider what A - xI looks like entry by entry: its diagonal entries are a_ii - x, and you can't subtract a matrix from a real number.

Is there any way to salvage the idea of just plugging in A? Or is it just a coincidence that it seems so obvious?

2. Nov 10, 2012

bins4wins

We know p(x) is the characteristic polynomial of A. The meaning of p(A) is to substitute A into the characteristic polynomial, not into the formula det(A - xI). Also, note that the 0 in the expression p(A) = 0 isn't the scalar 0; it's the zero matrix.
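To make the distinction concrete, here is a worked 2 x 2 case (a sketch only; the entries a, b, c, d are illustrative names, not from the thread). Expanding the determinant first gives an ordinary polynomial in x:

$$p(x) = \det\begin{pmatrix} a - x & b \\ c & d - x \end{pmatrix} = x^{2} - (a + d)x + (ad - bc).$$

Substituting A into the expanded polynomial, with the constant term multiplied by I, gives a matrix rather than a number:

$$p(A) = A^{2} - (a + d)A + (ad - bc)I = \begin{pmatrix} a^{2} + bc & ab + bd \\ ac + cd & bc + d^{2} \end{pmatrix} - \begin{pmatrix} a^{2} + ad & ab + bd \\ ac + cd & ad + d^{2} \end{pmatrix} + \begin{pmatrix} ad - bc & 0 \\ 0 & ad - bc \end{pmatrix} = \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}.$$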

3. Nov 10, 2012

Benn

Yes, thank you.

I understand that. But I wasn't sure if it was just a coincidence that the theorem seemed so obvious when we considered the characteristic polynomial to be det (A-xI), or if there was some way to make rigorous the idea of plugging in A.

4. Nov 10, 2012

bins4wins

Personally, I find the notation p(A) fairly loose in terms of rigor, since p is defined as a function whose domain is the reals. The most rigor you can bring to plugging in A is to define exactly what that means, namely substituting A into the expanded characteristic polynomial.

5. Nov 10, 2012

Benn

If $p(x) = c_{n}x^{n} + \dots + c_{1}x + c_{0}$ where $p$ is defined from $\mathbb{R}$ to $\mathbb{R}$ and $c_{i} \in \mathbb{R}$, then $p: \{ m \times m \text{ matrices} \} \rightarrow \{ m \times m \text{ matrices} \}$ is defined by $p(A) = c_{n}A^{n} + \dots + c_{1}A + c_{0}I$. ... I'm completely happy with that.
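As a quick sanity check of that definition, here is a minimal Python sketch that evaluates a polynomial at a square matrix using Horner's rule. The helper names (`mat_mul`, `identity`, `poly_at_matrix`) and the sample matrix are assumptions made for illustration. The coefficients use the 2 x 2 formula det(A - xI) = x^2 - trace(A) x + det(A), so the sketch only covers that case.

```python
def mat_mul(X, Y):
    """Multiply two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def identity(n):
    """Return the n x n identity matrix."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]


def poly_at_matrix(coeffs, A):
    """Evaluate c_n A^n + ... + c_1 A + c_0 I by Horner's rule.

    coeffs lists the coefficients highest degree first: [c_n, ..., c_1, c_0].
    """
    n = len(A)
    I = identity(n)
    result = [[0] * n for _ in range(n)]
    for c in coeffs:
        # result <- result * A + c * I
        result = mat_mul(result, A)
        result = [[result[i][j] + c * I[i][j] for j in range(n)]
                  for i in range(n)]
    return result


# Illustrative 2 x 2 matrix (an assumption, not from the thread).
A = [[1, 2], [3, 4]]

# For a 2 x 2 matrix, det(A - xI) = x^2 - trace(A) x + det(A).
trace = A[0][0] + A[1][1]
det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
char_coeffs = [1, -trace, det]

print(poly_at_matrix(char_coeffs, A))  # the 2 x 2 zero matrix
```

Note that the constant term enters as c_0 I, which is exactly the step that has no counterpart when A is substituted directly into det(A - xI).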

I must not have been clear in my question. I'm asking for a proof using the idea of 'plugging A into det (A - xI)', or an explanation of why there isn't one, not a clarification of the statement of the theorem.

6. Nov 10, 2012

Ray Vickson

Well, for one thing, p(A) is a matrix (that happens to have all entries = 0), while det(I·A - A) is a scalar (that happens to equal zero).

More generally, suppose we have p(x) = det(xB + C) = det(Bx + C) for n x n matrices B and C, and suppose we happen to have p(A) = 0. Does it follow that det(AB + C) = 0 or that det(BA + C) = 0? Conversely, if either det(AB + C) = 0 or det(BA + C) = 0, does it follow that p(A) = 0? I am not 100% sure of the answers, but I have my doubts that any of the answers is "yes". (If this turns out to be right, then it would be a sheer accident that you happen to get the correct result that p(A) = 0 in the special case that B = I and C = -A.)
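One way to probe these questions is to try small cases. The Python sketch below uses hypothetical 2 x 2 diagonal matrices chosen only for illustration: B = I and C = diag(-1, -2), so that p(x) = det(xB + C) = (x - 1)(x - 2) = x^2 - 3x + 2. Because B = I here, AB = BA, so a single determinant covers both variants. For these particular choices neither implication holds, which is consistent with the doubts expressed above.

```python
def mat_mul(X, Y):
    """Multiply two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def det2(M):
    """Determinant of a 2 x 2 matrix."""
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]


# Hypothetical choices: p(x) = det(xB + C) = (x - 1)(x - 2) = x^2 - 3x + 2.
B = [[1, 0], [0, 1]]
C = [[-1, 0], [0, -2]]


def p_of_matrix(A):
    """p(A) = A^2 - 3A + 2I, the expanded polynomial evaluated at A."""
    A_sq = mat_mul(A, A)
    return [[A_sq[i][j] - 3 * A[i][j] + (2 if i == j else 0)
             for j in range(2)] for i in range(2)]


def det_AB_plus_C(A):
    """The scalar det(AB + C), i.e. A substituted before taking det."""
    AB = mat_mul(A, B)
    return det2([[AB[i][j] + C[i][j] for j in range(2)] for i in range(2)])


# Case 1: p(A) is the zero matrix, yet det(AB + C) is nonzero.
A_first = [[2, 0], [0, 1]]
print(p_of_matrix(A_first), det_AB_plus_C(A_first))

# Case 2: det(AB + C) is zero, yet p(A) is not the zero matrix.
A_second = [[1, 0], [0, 3]]
print(p_of_matrix(A_second), det_AB_plus_C(A_second))
```

These two cases only show that the implications fail for this B and C; they say nothing about why the special case of the characteristic polynomial works.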

RGV

Last edited: Nov 10, 2012