# Eigenvalues of the product of two matrices

## Main Question or Discussion Point

Hello everyone,

Before I ask my question, be informed that I haven't had any formal course in linear algebra, so please forgive me if the question has a well-known answer.

I have two symmetric matrices, A and B. We know the eigenvalues and eigenvectors of A, and B. Now I need to calculate eigenvalues of the product of the two matrices, AB. The questions:

1. Is there a relation between the eigenvalues of AB and those of A and B?

2. If the answer to the above question is no, Can they be of any help in finding the eigenvalues of AB?

Thanks

HallsofIvy
Homework Helper
In general, no, unless they happen to have the same eigenvectors. If $\lambda$ is an eigenvalue of A and $\mu$ is an eigenvalue of B, both corresponding to eigenvector v, then we can say
$$(AB)v= A(Bv)= A(\mu v)= \mu Av= \mu(\lambda v)=(\mu\lambda)v$$
That is, the eigenvalues of AB (and BA) are the products of corresponding eigenvalues of A and B separately. But that is only true if A and B have the same eigenvectors. If not, there will be no relationship.
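A quick numerical sketch of this special case (hypothetical 4x4 matrices built to share an eigenvector basis; NumPy assumed):

```python
import numpy as np

# A and B share the same eigenvectors (the columns of a common orthogonal
# matrix Q), so the eigenvalues of AB are the products of the
# corresponding eigenvalues of A and B.
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((4, 4)))  # random orthonormal basis
lam = np.array([1.0, 2.0, 3.0, 4.0])   # eigenvalues of A
mu = np.array([5.0, 6.0, 7.0, 8.0])    # eigenvalues of B
A = Q @ np.diag(lam) @ Q.T
B = Q @ np.diag(mu) @ Q.T

eig_AB = np.sort(np.linalg.eigvals(A @ B).real)
assert np.allclose(eig_AB, np.sort(lam * mu))
```

With a different basis for B, the assertion fails, which is the "no relationship" case described above.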

Thanks, HallsofIvy,

In fact I need to solve the following equation for the scalar $\omega^{2}$:

$$\det(A-\omega^{2}B)=0$$

This is equivalent to finding the eigenvalues of $B^{-1}A$. I was hoping to find the eigenvalues without multiplying the matrices, because my matrices are very large and very sparse.
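For what it's worth, this generalized symmetric problem can be solved directly, e.g. with SciPy's `eigh`, without ever forming $B^{-1}A$ (a small hypothetical sketch; assumes B is positive definite):

```python
import numpy as np
from scipy.linalg import eigh

# det(A - w^2 B) = 0 is the generalized symmetric eigenproblem
# A x = w^2 B x; eigh solves it without forming B^{-1} A.
A = np.array([[6.0, 2.0], [2.0, 4.0]])
B = np.array([[2.0, 0.0], [0.0, 1.0]])
w2, X = eigh(A, B)          # values of omega^2, in ascending order
# verify: det(A - w2[i] B) = 0 for each eigenvalue found
for v in w2:
    assert abs(np.linalg.det(A - v * B)) < 1e-9
```

For this toy pencil the characteristic polynomial is $2(\omega^2-2)(\omega^2-5)$, so `w2` comes back as `[2, 5]`.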

Not an expert on linear algebra, but anyway:
I think you can get bounds on the modulus of the eigenvalues of the product. There are very short, 1 or 2 line, proofs, based on considering scalars x'Ay (where x and y are column vectors and prime is transpose), that real symmetric matrices have real eigenvalues and that the eigenspaces corresponding to distinct eigenvalues are orthogonal. Somehow I think it can also be shown further that all real symmetric matrices have their own orthonormal basis of eigenvectors ("spectral theorem"?). This enables any eigenvector of AB to be expanded as a linear combination of orthonormal eigenvectors of B. Arguing along such lines you can (I think?) show that the modulus of any eigenvalue of AB cannot exceed the product of the moduli of the two largest eigenvalues, respectively, of A and of B. There is a whole theory of "matrix norms" which I do not know well, which can be used to produce such arguments.

If I am recalling correctly (that A and B each has an orthonormal basis of eigenvectors), then there is an orthogonal transformation mapping each member of one basis onto a different member of the other, which may possibly have consequences relevant to your question.

AlephZero
Homework Helper
In fact I need to solve the following equation for the scalar $\omega^{2}$:

$$\det(A-\omega^{2}B)=0$$

This is equivalent to finding the eigenvalues of $B^{-1}A$. I was hoping to find the eigenvalues without multiplying the matrices, because my matrices are very large and very sparse.
Solving this by forming ##B^{-1}A## is not a good idea even for small matrices, because ##B^{-1}A## is not even symmetric when ##A## and ##B## are symmetric.

If one of ##A## or ##B## is positive definite, a slightly better idea for small matrices is to do a Cholesky factorization, say ## A = LL^T##, find the eigenpairs of ##L^{-1}BL^{-T}## and then transform the results back to the original problem.
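A small sketch of that reduction (hypothetical 2x2 matrices; `scipy.linalg` assumed), checking that the eigenvalues of ##L^{-1}BL^{-T}## match those of the pencil ##By = \lambda Ay##:

```python
import numpy as np
from scipy.linalg import cholesky, eigh, solve_triangular

# With A = L L^T, the symmetric matrix C = L^{-1} B L^{-T} has the same
# eigenvalues as the generalized problem B y = lambda A y, and
# y = L^{-T} x recovers the eigenvectors.
A = np.array([[4.0, 1.0], [1.0, 3.0]])   # positive definite
B = np.array([[2.0, 0.5], [0.5, 1.0]])
L = cholesky(A, lower=True)
Linv_B = solve_triangular(L, B, lower=True)        # L^{-1} B
C = solve_triangular(L, Linv_B.T, lower=True).T    # L^{-1} B L^{-T}

lam_C = np.sort(np.linalg.eigvalsh(C))
lam_gen = np.sort(eigh(B, A, eigvals_only=True))   # B y = lambda A y
assert np.allclose(lam_C, lam_gen)
```

The two triangular solves avoid explicitly inverting L, which is the idiomatic way to apply the transformation.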

If A and B are large sparse matrices, what you really want is a good implementation of the Lanczos method, for example ARPACK, which AFAIK is included in Matlab (or Google will find the Fortran code). But don't try to implement Lanczos yourself - the math description of the algorithm looks simple enough, but there are some practical issues involved in getting it to work reliably which you probably don't want to know about!
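As a sketch of what that looks like in practice, SciPy's `eigsh` wraps the same ARPACK code; with `sigma=0` it uses shift-invert to pull out the eigenvalues nearest zero, i.e. the smallest ones (toy 1D-Laplacian-style matrices assumed, not from the thread):

```python
import numpy as np
from scipy.sparse import diags, identity
from scipy.sparse.linalg import eigsh

# eigsh wraps ARPACK's Lanczos routines. sigma=0 selects shift-invert
# mode, finding the eigenvalues of A x = lam B x nearest zero (the
# smallest ones) without ever forming B^{-1} A.
n = 1000
A = diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n)).tocsc()  # stiffness-like
B = identity(n, format='csc')                                    # mass-like
vals, vecs = eigsh(A, k=5, M=B, sigma=0, which='LM')
vals = np.sort(vals)
# analytic smallest eigenvalue of this tridiagonal: 4 sin^2(pi/(2(n+1)))
expected = 4 * np.sin(np.pi / (2 * (n + 1))) ** 2
assert abs(vals[0] - expected) < 1e-12
```

The shift-invert solve factors A - sigma*B internally, which is why the matrices are given in a sparse format that supports sparse factorization.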

If I am recalling correctly (that A and B each has an orthonormal basis of eigenvectors), then there is an orthogonal transformation mapping each member of one basis onto a different member of the other, which may possibly have consequences relevant to your question.
Yes, I have recently learned that the eigenvectors corresponding to distinct eigenvalues of a symmetric matrix are orthogonal. The proof is easy:

$Ax_{1}=\lambda_{1} x_{1} \Rightarrow x^{T}_{2}Ax_{1}=\lambda_{1} x^{T}_{2} x_{1}\:\:\: (1)$

$Ax_{2}=\lambda_{2} x_{2} \Rightarrow x^{T}_{1}Ax_{2}=\lambda_{2} x^{T}_{1} x_{2}$

Transposing the latter, we have

$x^{T}_{2}A^{T}x_{1}=\lambda_{2} x^{T}_{2} x_{1}\:\:\: (2)$

If A is symmetric, then (1) and (2) yield $x^{T}_{2} x_{1} =0$ or $\lambda_{1} = \lambda_{2}$.

Some methods use this important property, but I don't remember which ones exactly. About the bounds, I tried a little but my conclusion was different from yours. According to my rough calculations (about which I'm not sure), the upper bound is the ratio of the largest eigenvalue of A to the smallest of B, and the lower bound is the ratio of the smallest to the largest. This is not so important, because it's quite easy to get the largest and smallest eigenvalues of a standard or a generalized eigenvalue problem using the "power method" and "inverse iteration".
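A minimal power-method sketch (hypothetical 2x2 matrix; inverse iteration is the same loop with a linear solve in place of the matrix-vector product):

```python
import numpy as np

# Repeated multiplication by A amplifies the component of x along the
# dominant eigenvector, so the Rayleigh quotient converges to the
# eigenvalue of largest modulus.
def power_method(A, iters=500):
    x = np.ones(A.shape[0])
    for _ in range(iters):
        x = A @ x
        x /= np.linalg.norm(x)
    return x @ A @ x  # Rayleigh quotient estimate

A = np.array([[2.0, 1.0], [1.0, 3.0]])
lam_max = power_method(A)
assert abs(lam_max - max(np.linalg.eigvalsh(A))) < 1e-8
```

Replacing `A @ x` with a solve against A (or against a shifted matrix) gives inverse iteration, which converges to the smallest eigenvalue instead.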

If A and B are large sparse matrices, what you really want is a good implementation of the Lanczos method, for example ARPACK, which AFAIK is included in Matlab (or Google will find the Fortran code). But don't try to implement Lanczos yourself - the math description of the algorithm looks simple enough, but there are some practical issues involved in getting it to work reliably which you probably don't want to know about!
Thanks,

Since the time I posted my question, I have learned a little bit more about the problem. My original post was about the generalized eigenvalue problem. There are vector iteration methods to solve such problems. However, for large matrices it's very expensive to calculate all the eigenvalues; we often need only a number of the smallest ones. I found the "Subspace Iteration Method" a good one.

As for the Lanczos method, I'm still confused despite reading dozens of papers and websites. I don't see any relation between the eigenvalues of the small tridiagonal matrix generated by the Lanczos method and the eigenvalues of the original matrix. For example, if the original matrix is 10000 by 10000 and we use the method with m=50, are the 50 eigenvalues approximations to the 50 smallest eigenvalues of the original matrix? The results I get are very different. Could you please explain how we use the Lanczos method?

Thanks for the help.

AlephZero
Homework Helper
You are right, subspace iteration is a good method. It was pretty much the standard method before the implementation details of Lanczos were sorted out (in the 1980s).

The way normal people "use" Lanczos is to get some code that works, and figure out how to input the matrices to it. You don't need to understand exactly how it works, any more than you need to know how your computer calculates square roots in order to use the sqrt() function in a programming language.

Lanczos is similar to subspace iteration, in the sense that the subspace spanned by the Lanczos vectors converges to a good approximation to the lowest modes of the eigenproblem. Using the Lanczos vectors as basis vectors, the matrices are tridiagonal (unlike subspace, where they start out as full matrices and become almost diagonal as the iterations converge).

With subspace, you choose a fixed number of vectors and change all of them at each iteration, but with Lanczos you start with one (random) vector and successively add vectors one at a time. This gives a big speed improvement over subspace, because instead of N units of time to process all N vectors in each iteration, you only need 1 unit of time to create one new vector.

If you want to find say the smallest 50 eigenvalues of a typical FE model with say 50,000 degrees of freedom, subspace should be perfectly adequate for the job, even though Lanczos might run 20 times faster. But if you want 500 eigenvalues of a model with 500,000 DOF, subspace will probably struggle to work at all.

Seems I have misunderstood the Lanczos method. To me, it looks more like a tridiagonalization algorithm; I don't see the expected iteration for finding the eigenvalues. For an NxN matrix A, you construct an mxm matrix T (non-iteratively) and find its eigenvalues. If m=N, yes, you get the same eigenvalues as A. But for m<<N, what do we get? And how should we iterate to improve the eigenvalues? If we increase m, what is the relation between the variables of step m+1 and those of step m?

AlephZero
Homework Helper
In Lanczos, each iteration adds another Lanczos vector and increases the size of m by 1.

The eigenvalues of the 1x1, 2x2, 3x3, .... tridiagonal matrices converge to the eigenvalues of the full matrix. IIRC the convergence criterion is based on the eigenvectors of the tridiagonal matrix. You check whether an eigenvector of the size m+1 eigenproblem is (nearly) the same as a vector from the size m eigenproblem, with a zero term appended to it, which means the new Lanczos vector is orthogonal to the eigenvector of the NxN matrix. Because of the way the Lanczos vectors are constructed, all the later Lanczos vectors will also be orthogonal to it.
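A toy Lanczos sketch illustrating this (with full reorthogonalization, which production codes avoid or do selectively; hypothetical 200x200 matrix with one well-separated eigenvalue):

```python
import numpy as np

def lanczos_ritz(A, m, rng):
    """Plain Lanczos with full reorthogonalization; returns Ritz values."""
    n = A.shape[0]
    Q = np.zeros((n, m))
    alpha, beta = np.zeros(m), np.zeros(m - 1)
    q = rng.standard_normal(n)
    Q[:, 0] = q / np.linalg.norm(q)
    for j in range(m):
        w = A @ Q[:, j]
        alpha[j] = Q[:, j] @ w
        # reorthogonalize against all previous Lanczos vectors
        w -= Q[:, :j + 1] @ (Q[:, :j + 1].T @ w)
        if j < m - 1:
            beta[j] = np.linalg.norm(w)
            Q[:, j + 1] = w / beta[j]
    T = np.diag(alpha) + np.diag(beta, 1) + np.diag(beta, -1)
    return np.linalg.eigvalsh(T)   # Ritz values of the m x m tridiagonal

rng = np.random.default_rng(1)
V, _ = np.linalg.qr(rng.standard_normal((200, 200)))
spectrum = np.concatenate([np.linspace(0.0, 1.0, 199), [5.0]])
A = V @ np.diag(spectrum) @ V.T
ritz = lanczos_ritz(A, 25, rng)
# the extreme, well-separated eigenvalue converges long before m ~ N
assert abs(ritz[-1] - 5.0) < 1e-8
```

The interior Ritz values are still poor at m=25; it is the extreme ends of the spectrum that converge first, which is why the method is used with shift-invert when the smallest eigenvalues are wanted.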

If I directly construct the largest tridiagonal matrix that I can handle, do I lose anything by not constructing the smaller ones?
When I test the method on a matrix with known eigenvalues, m needs to be large enough to get good approximations of the eigenvalues of the original matrix. By large enough, I mean m>N/2, which is not possible in practice. Subspace iteration uses a small number of vectors but compensates by iterating to get good approximations of the lowest eigenvalues. I still can't recognize the expected iteration in the Lanczos method. The iteration I see is in fact Gram-Schmidt orthogonalization.

I think I understand it now. The iteration is in fact in the dimension of T. For convergence, what matters is "m" itself rather than the ratio m/N. I had tested it with a smaller N and expected to get good approximations with m=N/10.

Thanks for helping me understand.

Let A = P'EP
Let B = Q'DQ
where P'P = Q'Q = I

Now:
AB= P'EPQ'DQ

What we would need for the eigenvalues to be a direct product is PQ' = I and P'Q = I. But (PQ')' = QP' = I = P'Q
or Q(P'P) = P'QP = Q
or P = I
which is a different finding than previously indicated.

AlephZero,

I have a question about the subspace iteration method, specifically about the projection of the matrices onto the approximate eigenvectors at iteration j. One can iterate without projecting and still obtain the eigenvectors after some number of iterations; the projection is an important step of the algorithm because it is expected to reduce the number of required iterations. However, the updated vectors are not orthogonal (except in the last iteration(s)), so the projection is not an orthogonal projection, while the Rayleigh-Ritz approximation used in the algorithm is based on orthogonal projection. Perhaps for this reason, when I bypass the approximation and instead only orthogonalize the updated vectors, convergence is achieved in a lower number of iterations. What do you think I am doing wrong here?

Thanks

Edit: The convergence problem was solved. I found (by myself) that we need to reorder the Ritz eigenvectors to make sure each vector goes to the right place for the next iteration. This seems very important, yet none of the papers mention it.

AlephZero
Homework Helper
Edit: The convergence problem was solved. I found (by myself) that we need to reorder the Ritz eigenvectors to make sure each vector goes to the right place for the next iteration. This seems very important, yet none of the papers mention it.
I'm glad you solved it, because I didn't entirely understand your terminology!

The take-home lesson from this is: there is a reason why eigenvalues and eigenvectors are sometimes called eigenpairs. You just discovered why.

Sorry. For your reference and perhaps my future questions: for the generalized eigenvalue problem $KX=MX\Omega^{2}$, the subspace iteration algorithm is as follows:

1. To find the p smallest eigenpairs, start from a set of q=min(2p, p+8) starting vectors ${X}_{1}$.

For k=2,3,...

2. Solve $K\overline{X}_{k+1}=M X_{k}$ for $\overline{X}_{k+1}$

3. Find the projection of K and M on vectors $\overline{X}_{k+1 }$ :

$\overline{K}_{k+1}= \overline{X}^{T}_{k+1} K \overline{X}_{k+1}$
$\overline{M}_{k+1}= \overline{X}^{T}_{k+1} M \overline{X}_{k+1}$

4. solve the eigensystem of $\overline{K}_{k+1}Q_{k+1}=\overline{M}_{k+1}Q_{k+1}\Omega^{2}_{k+1}$
The above eigensystem is much smaller than the original one and can be handled directly.

5. Find the improved approximation of the eigenvectors for the original eigensystem:
$X_{k+1}= \overline{X}_{k+1}Q_{k+1}$

6. repeat till convergence is achieved.
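For concreteness, here is a small dense sketch of steps 1-6 (toy diagonal K and identity M; `lu_factor`/`eigh` from SciPy assumed; an illustration, not production code):

```python
import numpy as np
from scipy.linalg import eigh, lu_factor, lu_solve

# Minimal dense subspace iteration for K X = M X Omega^2,
# following the numbered steps above.
def subspace_iteration(K, M, p, iters=50):
    n = K.shape[0]
    q = min(2 * p, p + 8)
    rng = np.random.default_rng(0)
    X = rng.standard_normal((n, q))       # step 1: starting vectors
    lu = lu_factor(K)                     # factor K once, reuse each step
    for _ in range(iters):
        Xbar = lu_solve(lu, M @ X)        # step 2: solve K Xbar = M X
        Kbar = Xbar.T @ K @ Xbar          # step 3: project K and M
        Mbar = Xbar.T @ M @ Xbar
        w2, Q = eigh(Kbar, Mbar)          # step 4: small eigenproblem
        X = Xbar @ Q                      # step 5: improved vectors
    return w2[:p]                         # p smallest values of omega^2

K = np.diag(np.arange(1.0, 21.0))         # toy 20x20 problem
M = np.eye(20)
w2 = subspace_iteration(K, M, p=4)
assert np.allclose(w2, [1.0, 2.0, 3.0, 4.0])
```

Note that `eigh` returns Ritz vectors that are Mbar-orthonormal, so step 5 automatically keeps the iterate M-orthonormal, which is what prevents all columns from collapsing onto mode 1.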

If we jump from step 2 to step 6, convergence is still achieved; steps 3 to 5 are there to improve the algorithm, and my question was about those steps. My previous comment about the order is not correct. The order becomes important when, after convergence is achieved, we want to choose the p lowest eigenpairs. Also, the convergence criterion checks the pth eigenvalue, so it's important to order them before the error check at each iteration.

AlephZero
Homework Helper
You made a typo in your TeX code in step 1, and I think you also made a typo in step 2:
Sorry. For your reference and perhaps my future questions, the generalized eigenvalue problem $KX=MX\Omega^{2}$, the subspace iteration algorithm is as follows:

1. To find the p smallest eigenpairs, start from a set of q=min(2p, p+8) starting vectors ${X}_{1}$.

For k=2,3,...

2. Solve $K\overline{X}_{k+1}=M X_{k}$ for $\overline{X}_{k+1}$
If you leave out steps 3 to 5, isn't this just the inverse power iteration method? If you don't orthogonalize the vectors at step 4, won't they all converge to mode 1?

Yes, you are right. In fact I replaced those steps with an orthogonalization step, which is quite fast. It also converges to the smallest eigenvalues but requires a larger number of iterations. The projection is NOT an orthogonal projection, yet it is very effective.

Let A = P'EP
Let B = Q'DQ
where P'P = Q'Q = I

Now:
AB= P'EPQ'DQ

What we would need for the eigenvalues to be a direct product is PQ' = I and P'Q = I. But (PQ')' = QP' = I = P'Q
or Q(P'P) = P'QP = Q
or P = I
which is a different finding than previously indicated.
CORRECTION:
It is true that for each corresponding eigenvalue to be a direct product, one must have PQ' = I, implying that PQ'Q = Q, or P = Q (as Q'Q = I), which is the prior statement (namely, the eigenvectors must be the same) for each eigenvalue k. However, it is clear that:

Det(AB) = Det(A)*Det(B) = Det(P'EPQ'DQ) = Det(P'PQ'QED) = Det(I*I*ED) = Det(ED) = Det(E)*Det(D)

So, in total, the product of all the eigenvalues of A*B is equal to the product of all the eigenvalues of both A and B, from which one may be able to establish search bounds for individual eigenvalues.
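A quick numerical check of that determinant identity (random symmetric 4x4 matrices, hypothetical; the eigenvalues of AB may come in complex conjugate pairs, but their product stays real):

```python
import numpy as np

# The product of all eigenvalues of AB equals
# det(A)*det(B) = det(E)*det(D).
rng = np.random.default_rng(2)
A = rng.standard_normal((4, 4))
A = (A + A.T) / 2
B = rng.standard_normal((4, 4))
B = (B + B.T) / 2
lhs = np.prod(np.linalg.eigvals(A @ B))   # complex pairs multiply out real
rhs = np.prod(np.linalg.eigvalsh(A)) * np.prod(np.linalg.eigvalsh(B))
assert np.isclose(lhs.real, rhs, atol=1e-9) and abs(lhs.imag) < 1e-9
```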

Note, if the Kth eigenvalue is negative, I claim that multiplying all entries in the Kth (or possibly some other) row by minus one makes the sign of that negative eigenvalue positive (or it remains negative and another eigenvalue becomes negative), since this modification is known to change only the sign of the determinant (the absolute values of all eigenvalues can, in general, change under this row negation, but their product remains constant in magnitude). If the matrix can be diagonalized, this sign change can occur only through a change of sign in one (or an odd number) of the eigenvalues. Interestingly, in one matrix product instance, even without any sign-change operations and with both A and B having positive eigenvalues, the product matrix AB had an even number of negative eigenvalues! This occurrence is apparent without direct knowledge of the eigenvalues, through the characteristic polynomial, where the term giving the negative of the trace of the eigenvalue matrix noticeably declined in absolute value.

Note, for all positive eigenvalues, the log of the determinant may serve as a bound on the log of the largest possible eigenvalue (provided the remaining eigenvalues are at least 1). If we know the largest eigenvalue (via the power method), then (log Det - log Max Ei) is an upper bound on the log of the second largest eigenvalue.

If one of ##A## or ##B## is positive definite, a slightly better idea for small matrices is to do a Cholesky factorization, say ## A = LL^T##, find the eigenpairs of ##L^{-1}BL^{-T}## and then transform the results back to the original problem.
Is there a relation between the eigenvalues/eigenvectors of $L^{-1}AL^{-T}$ and those of $A$?

Note, for all positive eigenvalues, the log of the determinant may serve as a bound on the log of the largest possible eigenvalue (provided the remaining eigenvalues are at least 1). If we know the largest eigenvalue (via the power method), then (log Det - log Max Ei) is an upper bound on the log of the second largest eigenvalue.
If the eigenvalues of A and B are all larger than 1, then det(AB) > det(A). That is, the largest eigenvalue of AB has an upper bound larger than that of A. In this case, could the largest eigenvalue of AB be smaller than that of A? (I think it is possible.)

Here is a relationship between A = LL' and the diagonal matrix D of eigenvalues of A.

Assume there exists a matrix P such that P'P = I and PAP' = D.

Then, P(A^2)P' = D^2

or, P(LL'LL')P' = D^2

or, (PL)L'L(PL)' = D^2

Note, (D^-1/2)*(PL)*(L'P')*(D^-1/2) = (D^-1/2)*(PAP')*(D^-1/2) = (D^-1/2)*(D)*(D^-1/2) = I, so (D^-1/2)*(PL) is orthogonal. Combined with the previous line, [(D^-1/2)(PL)]*(L'L)*[(D^-1/2)(PL)]' = (D^-1/2)*(D^2)*(D^-1/2) = D. So the rows of (D^-1/2)*(PL) give the eigenvectors of the matrix L'L, and its eigenvalues equal D.
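A short numerical check of the conclusion (random SPD matrix, hypothetical; NumPy assumed): L'L and A = LL' have the same eigenvalues:

```python
import numpy as np

# L L^T and L^T L are similar via L, so they share eigenvalues,
# which is the claim derived above.
rng = np.random.default_rng(3)
G = rng.standard_normal((4, 4))
A = G @ G.T + 4 * np.eye(4)        # symmetric positive definite
L = np.linalg.cholesky(A)          # A = L L^T
assert np.allclose(np.linalg.eigvalsh(L.T @ L), np.linalg.eigvalsh(A))
```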

AlephZero
Homework Helper
Is there a relation between the eigenvalues/eigenvectors of $L^{-1}AL^{-T}$ and those of $A$?
Er, $L^{-1}AL^{-T}$ is the unit matrix!

If you meant
Is there a relation between the eigenvalues/eigenvectors of $L^{-1}BL^{-T}$ and those of $A$?
The eigenvalues are the same and the eigenvectors are related.
If ## L^{-1} B L^{-T} x = \lambda x ##, let ## y = L^{-T} x##.
## L^{-1} B y = \lambda L^T y ##
## L L^{-1} B y = \lambda L L^T y##
## B y = \lambda A y##.

Er, $L^{-1}AL^{-T}$ is the unit matrix!

If you meant

The eigenvalues are the same and the eigenvectors are related.
If ## L^{-1} B L^{-T} x = \lambda x ##, let ## y = L^{-T} x##.
## L^{-1} B y = \lambda L^T y ##
## L L^{-1} B y = \lambda L L^T y##
## B y = \lambda A y##.
Sorry, I knew that. When I asked the question I had another problem in mind, which I explain below:

I have a symmetric positive-definite matrix A whose entries are very large, with the diagonal elements the largest. If I want to solve the system of equations $Ax_{k+1}=x_{k}$ (as is required in most eigenvalue problems), I need to scale my matrix first, otherwise the round-off error grows large. A nice idea is to scale it with the inverse of the diagonal elements, but in order to preserve the symmetry, the new equation takes the following form:

$\hat{A}y_{k+1}=z_{k}$

with $\hat{A}=D^{-1}AD^{-1}$, $y_{k+1}=D x_{k+1}$ and $z_{k}=D^{-1}x_{k}$,

where D is a diagonal matrix whose entries are the square roots of the diagonal entries of A. Now all the diagonal entries of $\hat{A}$ are 1.0 and the off-diagonal ones have magnitudes less than 1.0.

If there were a relation between the eigenvalues of $A$ and $\hat{A}$, it would make the job easier. But without such a relation, in each iteration I have to convert between x and y in order to find the eigenpairs of A directly.
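For what it's worth, the conversion can be phrased as a generalized pencil: $A x = \lambda x$ is equivalent to $\hat{A}(Dx) = \lambda D^{-2}(Dx)$, so the eigenvalues of A are still recoverable after scaling, just not from $\hat{A}$ alone. A sketch (hypothetical 3x3 matrix; SciPy assumed):

```python
import numpy as np
from scipy.linalg import eigh

# With D = diag(sqrt(diag(A))), Ahat = D^{-1} A D^{-1} has unit diagonal,
# and A x = lam x becomes the pencil Ahat y = lam D^{-2} y with y = D x.
A = np.array([[1.0e6, 2.0e5, 0.0],
              [2.0e5, 4.0e6, 1.0e5],
              [0.0,   1.0e5, 9.0e6]])
d = np.sqrt(np.diag(A))
Ahat = A / np.outer(d, d)                      # D^{-1} A D^{-1}
assert np.allclose(np.diag(Ahat), 1.0)         # unit diagonal after scaling
lam = eigh(Ahat, np.diag(1.0 / np.diag(A)), eigvals_only=True)
assert np.allclose(np.sort(lam), np.linalg.eigvalsh(A))
```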

Another question is about the size of the matrices you deal with. You talk about decomposing a matrix and inverting the triangular factor as if they were feasible, but in finite element problems the matrices are large and sparse; it's quite usual to have matrices of order 100,000. I store only the nonzero entries of my matrix, and of course I can't perform a complete $LL^{T}$ decomposition because the L matrix is not necessarily sparse. Am I missing something here?

AlephZero
Homework Helper
"Matrix balancing" is a well known technique for improving the behaviour of some non-symmetric eigenproblems, but the usual method is ##DAD^{-1}##, not ##D^{-1}AD^{-1}##:

http://web.eecs.utk.edu/~dongarra/etemplates/node205.html
http://www.mathworks.co.uk/help/techdoc/ref/balance.html

You shouldn't need to do this with symmetric positive definite matrices though, especially if the matrix is diagonally dominant.

If A is a sparse matrix, the L matrix is also sparse, provided the rows and columns are ordered in a sensible way. For example, if A is a banded matrix, L has the same bandwidth as A. Look up "skyline storage" for a fairly simple sparse storage method that is more general than banded matrices.

There are many algorithms to reorder the rows and columns of a matrix to reduce the size of L. Modern algorithms treat this as a problem in graph theory and may do a symbolic (non-numerical) decomposition of the matrix to attempt to minimize the amount of fill-in in the L matrix. Two simple, fast, old methods which work pretty well for finite element problems are Grooms and Cuthill-McKee.
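As a sketch, SciPy ships a reverse Cuthill-McKee implementation (a relative of the methods named above); scrambling a banded matrix with a random numbering and then reordering it recovers a small bandwidth, and hence little fill-in in L:

```python
import numpy as np
from scipy.sparse import diags
from scipy.sparse.csgraph import reverse_cuthill_mckee

def bandwidth(A):
    # largest distance of any nonzero from the diagonal
    r, c = A.nonzero()
    return int(np.max(np.abs(r - c)))

n = 50
band = diags([1.0, 4.0, 1.0], [-1, 0, 1], shape=(n, n)).tocsr()
p = np.random.default_rng(0).permutation(n)
scrambled = band[p][:, p]                 # simulate a "random" node numbering
perm = reverse_cuthill_mckee(scrambled, symmetric_mode=True)
restored = scrambled[perm][:, perm]
assert bandwidth(restored) < bandwidth(scrambled)
```

This mirrors the remark below about assembling elements in a "logical" order: the reordering undoes the damage done by an arbitrary node numbering.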

You can often get a pretty good ordering for a FE model simply by assembling the elements in a "logical" order working through the structure systematically, rather than using the "random" node numbers produced by a mesh generator. Many FE preprocessors will include a reordering algorithm.