# Continuity of Matrix Inverse

## Main Question or Discussion Point

When doing some self-study in probability, I have seen a number of authors state, without proof or justification, that the inverse of a matrix is continuous. For instance, a passage in a popular econometrics text (White (2001)) reads: "The matrix inverse function is continuous at every point that represents a nonsingular matrix" (p. 16). After poring through a number of references on linear algebra, I have yet to find even a definition of what it means for a function on a matrix to be continuous, let alone how I would go about showing that the inverse satisfies this property. I subsequently have two questions:
(1) What does it mean for a function accepting a matrix as an argument to be continuous?
Any help is greatly appreciated.

You have to give the set of matrices a topology for continuity to mean something; you should be able to do that by making a metric space out of the set of matrices, using any matrix norm. From that, you can define continuous functions of matrices. Once you've done that, you should be able to use the epsilon-delta stuff to test continuity of matrix functions.

HallsofIvy
The statement "the function $f(M) = M^{-1}$ is continuous", or, for that matter, that any function is continuous, makes no sense without some kind of topology. Have those authors defined a topology for matrices? In other words, have they defined a "distance" between matrices in order to make "close to" precise? Have they defined a "norm"? That can be used to define a topology the way absolute value is used to define distance in the real numbers.

Thanks for the replies. There has been no mention of a matrix topology, and a norm has never been defined when this claim of continuity was made. This is actually why I was so confused and felt that I was missing something entirely (that is, was the result so obvious that it needed no further explanation?). Unfortunately, this is a common frustration with the exposition in many econometrics books. Do either of you happen to know of a reference that covers analysis-type issues on matrices (e.g., this continuity question)? I am going to try to work out some examples on my own, but I always find it somewhat comforting to have some examples for reference.
Thanks again.

They are probably talking about the standard norm on a finite-dimensional vector space. In fact, all norms on a finite-dimensional vector space are equivalent, in the sense that any two norms are related by

$$C\|a\|_2 \leq \|a\|_1 \leq D\|a\|_2$$

where $C$ and $D$ are positive constants. Hence they all give the same topology.
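As a quick numerical sketch of this equivalence (random data is purely illustrative), one can check the standard inequality $\|a\|_2 \le \|a\|_1 \le \sqrt{n}\,\|a\|_2$ in $\mathbb{R}^n$, i.e. $C = 1$ and $D = \sqrt{n}$:

```python
# Norm equivalence in R^n: the 1-norm and 2-norm bound each other by
# dimension-dependent constants (C = 1, D = sqrt(n)), so they induce
# the same topology.
import numpy as np

rng = np.random.default_rng(0)
n = 5
for _ in range(1000):
    a = rng.standard_normal(n)
    n1 = np.linalg.norm(a, 1)   # ||a||_1
    n2 = np.linalg.norm(a, 2)   # ||a||_2
    assert n2 <= n1 + 1e-12                  # C = 1
    assert n1 <= np.sqrt(n) * n2 + 1e-12     # D = sqrt(n)
print("norm equivalence holds on 1000 random vectors")
```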

So pick any norm you want, be it the max of the absolute values of all entries, or the sup of $\|Ax\|$ where $x$ ranges over the unit sphere and $\|x\|$ is the Euclidean norm. The latter is the one used in Rudin's analysis book; he gives a very clear proof of the continuity of the inverse.
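A small numerical sketch (not a proof, using made-up matrices) of both points: the operator norm $\sup_{\|x\|=1}\|Ax\|$ is the largest singular value, and under it a shrinking perturbation of a nonsingular $A$ produces a shrinking change in $A^{-1}$:

```python
# Operator norm and continuity of the inverse, checked numerically on
# hypothetical data.  np.linalg.norm(A, 2) is exactly the largest
# singular value, i.e. sup{|Ax| : |x| = 1}.
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4)) + 4.0 * np.eye(4)   # comfortably nonsingular

# approximate sup{|Ax| : |x| = 1} from below by sampling the unit sphere
xs = rng.standard_normal((4, 20000))
xs /= np.linalg.norm(xs, axis=0)
sampled = np.max(np.linalg.norm(A @ xs, axis=0))
op_norm = np.linalg.norm(A, 2)
assert sampled <= op_norm + 1e-9   # sampling never exceeds the true sup

# smaller perturbations of A give smaller changes in A^{-1}
diffs = []
for eps in [1e-2, 1e-4, 1e-6]:
    E = eps * rng.standard_normal((4, 4))
    diffs.append(np.linalg.norm(np.linalg.inv(A + E) - np.linalg.inv(A), 2))
assert diffs[0] > diffs[1] > diffs[2]
```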

tim_lou,
Thanks for the help. Are you referring to "baby" Rudin (Principles of Mathematical Analysis), or his other analysis book? I am away from my books, but I look forward to going over this when I return.

Tom

I think there's something called Cramer's rule that tells you how to construct the inverse of a matrix in terms of the determinant of the matrix and the cofactors of the matrix. Since both of these are polynomials in the entries of the original matrix, the entries of the inverse matrix are just ratios of polynomial functions of the entries of the original matrix. And such functions are of course continuous wherever the denominator is nonzero.

(You probably have to be a little more careful, making sure you aren't dividing by 0, etc. But I think the requirement that the determinant of the original matrix is not 0 will help.)

Hope that gives you a start.
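A short sketch of this argument in code (the helper `inverse_via_adjugate` is mine, not from any text): each entry of the inverse is a cofactor divided by $\det(A)$, a ratio of polynomials in the entries of $A$, hence continuous wherever $\det(A) \neq 0$.

```python
# Cramer's-rule construction of the inverse: (A^{-1})_{ij} = C_{ji} / det(A),
# where C_{ji} is a cofactor (a polynomial in the entries of A).
import numpy as np

def inverse_via_adjugate(A):
    """Inverse from cofactors; each entry is polynomial / det(A)."""
    n = A.shape[0]
    cof = np.empty_like(A, dtype=float)
    for i in range(n):
        for j in range(n):
            minor = np.delete(np.delete(A, i, axis=0), j, axis=1)
            cof[i, j] = (-1) ** (i + j) * np.linalg.det(minor)
    return cof.T / np.linalg.det(A)   # adjugate = transpose of cofactor matrix

A = np.array([[2.0, 1.0], [1.0, 3.0]])           # det(A) = 5, nonsingular
assert np.allclose(inverse_via_adjugate(A), np.linalg.inv(A))
```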

You could proceed by induction. Suppose $M$ and $W$ are nearby $k \times k$ matrices, and split them into blocks, say,

$$M = \left(\begin{matrix}a_m & b_m^* \\ c_m & D_m\end{matrix}\right)$$

$$W = \left(\begin{matrix}a_w & b_w^* \\ c_w & D_w\end{matrix}\right)$$

where for each matrix $a$ is a scalar, $b^*$ a row vector, $c$ a column vector, and $D$ an invertible $(k-1) \times (k-1)$ matrix. The inverses can be explicitly calculated in terms of the blocks:

$$\left(\begin{matrix}a & b^* \\ c & D\end{matrix}\right)^{-1} = \left(\begin{matrix}(a-b^*D^{-1}c)^{-1} & -a^{-1}b^*(D-a^{-1}cb^*)^{-1} \\ -D^{-1}c(a-b^*D^{-1}c)^{-1} & (D-a^{-1}cb^*)^{-1}\end{matrix}\right)$$

Now we just have to verify that each block of $M^{-1}-W^{-1}$ can be made sufficiently small. Choose $M$ and $W$ close enough that

$$a_m = a_w(1+O(\epsilon))$$

$$b^*_m = b^*_w(I+O(\epsilon))$$

$$c_m = (I+O(\epsilon))c_w$$

$$D_m^{-1} = D_w^{-1}(I+O(\epsilon))$$

The last such statement, about $D^{-1}$, follows from the inductive hypothesis: $D_m$ and $D_w$ are close $(k-1) \times (k-1)$ invertible matrices, so $D_m^{-1}$ and $D_w^{-1}$ are close as well.

Then the first element is:

$$(M^{-1}-W^{-1})_{1,1} = (a_m-b_m^*D_m^{-1}c_m)^{-1} - (a_w-b_w^*D_w^{-1}c_w)^{-1}$$

$$= (a_w(1+O(\epsilon))-b_w^*(I+O(\epsilon))D_w^{-1}(I+O(\epsilon))(I+O(\epsilon))c_w)^{-1} - (a_w-b_w^*D_w^{-1}c_w)^{-1}$$

$$= (a_w-b_w^*D_w^{-1}c_w)^{-1}(1+O(\epsilon)) - (a_w-b_w^*D_w^{-1}c_w)^{-1}$$

$$=O(\epsilon)(a_w-b_w^*D_w^{-1}c_w)^{-1} = O(\epsilon) \cdot const$$

So $(M^{-1}-W^{-1})_{1,1}$ can be made arbitrarily small. The other blocks $(M^{-1}-W^{-1})_{1,2}$, $(M^{-1}-W^{-1})_{2,1}$, and $(M^{-1}-W^{-1})_{2,2}$ follow similarly, though you will need to invoke the Sherman-Morrison formula (about the inverse of a rank-1 update of a matrix):

$$(A+xy^*)^{-1} = A^{-1}-\frac{A^{-1}xy^*A^{-1}}{1+y^*A^{-1}x}$$

to deal with the (1,2) and (2,2) blocks.
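A quick numerical check of the Sherman-Morrison formula itself (deterministic toy data, chosen so the denominator is exactly 2, just to see the identity hold):

```python
# Sherman-Morrison: (A + x y*)^{-1} = A^{-1} - A^{-1} x y* A^{-1} / (1 + y* A^{-1} x),
# valid whenever the denominator is nonzero.
import numpy as np

n = 5
A = 5.0 * np.eye(n)          # simple nonsingular A
x = np.ones((n, 1))
y = np.ones((n, 1))

Ainv = np.linalg.inv(A)
denom = 1.0 + (y.T @ Ainv @ x).item()           # 1 + y* A^{-1} x = 2 here
sm = Ainv - (Ainv @ x @ y.T @ Ainv) / denom     # Sherman-Morrison update

direct = np.linalg.inv(A + x @ y.T)             # inverse computed directly
assert np.allclose(sm, direct)
```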

Oh, here is an easier way using the Sherman Morrison formula more directly,

$$(A+\epsilon xy^*)^{-1} = A^{-1}-\epsilon \frac{A^{-1}xy^*A^{-1}}{1+\epsilon y^*A^{-1}x} = A^{-1} + \epsilon M$$

where $\epsilon M$ slightly perturbs $A^{-1}$, with $\|M\| \lesssim \|xy^*\|\,\|A^{-1}\|^2$ for small $\epsilon$. More generally,

$$\left(A + \epsilon \sum_{j=1}^{n} x_j y_j^*\right)^{-1} = \left( \left(A + \epsilon \sum_{j=1}^{n-1} x_j y_j^*\right) + \epsilon x_n y_n^*\right)^{-1} = \left(A + \epsilon \sum_{j=1}^{n-1} x_j y_j^*\right)^{-1} + \epsilon M_n$$

$$= \left(A + \epsilon \sum_{j=1}^{n-2} x_j y_j^*\right)^{-1} + \epsilon M_{n-1} + \epsilon M_n = \cdots = A^{-1} + \epsilon \left(M_1 + M_2 + \cdots + M_n\right) = A^{-1} + \epsilon M$$

with $\|M\| < c\,\|\sum_{j=1}^{n} x_j y_j^*\|$. You could calculate the exact constant $c$ if you are careful about keeping track of the errors, but I'm too lazy. The key thing is that the constant depends only on $A$ and $n$, not on the $x_j$'s and $y_j$'s. But any matrix can be written as a sum of rank-1 matrices (i.e., $W = \sum_j x_j y_j^*$), so for every $W$ of norm at most 1 there exists an $M$ such that $\left(A + \epsilon W\right)^{-1} = A^{-1} + \epsilon M$ with $\|M\| < c$.

Then for $\|A-B\| < \epsilon$, we can rewrite $B$ as a perturbation of $A$: $B = A + \epsilon W$ with $\|W\| \le 1$. So by the above, we have
$$\|A^{-1} - B^{-1}\| = \|A^{-1} - \left(A^{-1} + \epsilon M\right)\| = \epsilon \|M\| < \epsilon c$$
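To see this $O(\epsilon)$ bound numerically (a sketch with a made-up, well-conditioned $A$): the ratio $\|A^{-1}-B^{-1}\|/\epsilon$ settles down to a constant as $\epsilon \to 0$, which is exactly the statement $\|A^{-1}-B^{-1}\| < \epsilon c$.

```python
# The inverse difference is O(eps): perturb A by eps*W with ||W|| = 1
# and watch ||A^{-1} - B^{-1}|| / eps approach a constant.
import numpy as np

rng = np.random.default_rng(3)
A = np.diag([4.0, 3.0, 2.0, 1.0])   # well-conditioned, ||A^{-1}|| = 1
W = rng.standard_normal((4, 4))
W /= np.linalg.norm(W, 2)           # ||W|| = 1

ratios = []
for eps in [1e-2, 1e-4, 1e-6]:
    B = A + eps * W
    diff = np.linalg.norm(np.linalg.inv(A) - np.linalg.inv(B), 2)
    ratios.append(diff / eps)

# the ratio stabilizes (the limit is ||A^{-1} W A^{-1}||), so diff = O(eps)
assert abs(ratios[-1] - ratios[-2]) < 0.05 * ratios[-1]
```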
