
Continuity of Matrix Inverse

  1. Dec 14, 2008 #1
    When doing some self-study in probability, I have seen a number of authors state, without proof or justification, that the inverse of a matrix is continuous. For instance, a passage in a popular econometrics text (White (2001)) reads: "The matrix inverse function is continuous at every point that represents a nonsingular matrix" (p. 16). After poring through a number of references on linear algebra, I have yet to find even a definition of what it means for a function of a matrix to be continuous, let alone how I would go about showing that the inverse satisfies this property. I therefore have two questions:
    (1) What does it mean for a function accepting a matrix as an argument to be continuous?
    (2) Do you know of any references in which I could learn more about such functions?
    Any help is greatly appreciated.
  3. Dec 14, 2008 #2
    You have to give the set of matrices a topology for continuity to mean something; you should be able to do that by making a metric space out of the set of matrices, using any matrix norm. From that, you can define continuous functions of matrices. Once you've done that, you should be able to use the epsilon-delta stuff to test continuity of matrix functions.
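    Spelled out, and assuming some matrix norm ||.|| has been fixed (the choice of norm is an assumption here, not something the quoted text specifies), the epsilon-delta definition would read roughly as follows:

    [tex]f \text{ is continuous at } A \iff \forall \epsilon > 0 \ \exists \delta > 0 : \ ||B - A|| < \delta \implies ||f(B) - f(A)|| < \epsilon[/tex]

    For the inverse, [itex]f(A) = A^{-1}[/itex] and the domain is the set of nonsingular matrices, so B has to range over nonsingular matrices as well.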
  4. Dec 14, 2008 #3


    Science Advisor

    The statement that "the function [itex]f(M) = M^{-1}[/itex] is continuous", or, for that matter, that any function is continuous, makes no sense without some kind of topology. Have those authors defined a topology for matrices? In other words, have they defined a "distance" between matrices in order to make "close to" precise? Have they defined a "norm"? A norm can be used to define a topology the way absolute value is used to define distance in the real numbers.
  5. Dec 15, 2008 #4
    Thanks for the replies. There has been no mention of a matrix topology, and a norm has never been defined when this claim of continuity was made. This is actually why I was so confused and felt that I was missing something entirely (that is, was the result so obvious that it needed no further explanation?). Unfortunately, this is a common frustration with the exposition in many econometrics books. Does either of you happen to know of a reference that covers analysis-type issues on matrices (e.g., this continuity question)? I am going to try to work out some examples on my own, but I always find it somewhat comforting to have some examples for reference.
    Thanks again.
  6. Dec 18, 2008 #5
    They are probably talking about the standard norm on a finite-dimensional vector space. In fact, all norms on a finite-dimensional vector space are equivalent, in the sense that any two norms are related by
    [itex]C|a|_2 \le |a|_1 \le D|a|_2[/itex], where C and D are positive constants (the subscripts just label the two norms). Hence they all give the same topology.

    So pick any norm you want, be it the max of the absolute values of all entries, or the sup of |Ax| where x is on the unit sphere and |x| is the Euclidean norm... The latter is the one used in Rudin's analysis book. He gives a very clear proof of the continuity of the inverse.
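    To make the equivalence concrete, here is a small sketch comparing the two norms just mentioned on one matrix. The function names are made up for illustration, and the operator norm is only estimated by power iteration on A^T A (an assumption of this sketch, not how Rudin proceeds). For an n-by-n matrix the max-entry norm m and the operator norm s should satisfy m <= s <= n*m.

```python
import math


def max_entry_norm(a):
    """Largest absolute value among the entries of a (given as a list of rows)."""
    return max(abs(v) for row in a for v in row)


def operator_norm(a, iterations=200):
    """Estimate sup |Ax| over Euclidean unit vectors x.

    Uses power iteration on A^T A. Sketch-level assumption: the start
    vector is neither in the kernel of A nor orthogonal to the top
    singular direction.
    """
    rows, cols = len(a), len(a[0])
    x = [1.0 + 0.1 * j for j in range(cols)]  # arbitrary nonzero start vector
    for _ in range(iterations):
        ax = [sum(a[i][j] * x[j] for j in range(cols)) for i in range(rows)]
        y = [sum(a[i][j] * ax[i] for i in range(rows)) for j in range(cols)]
        size = math.sqrt(sum(v * v for v in y))
        if size == 0.0:
            return 0.0  # A x = 0; correct for the zero matrix
        x = [v / size for v in y]  # x is now a unit vector
    ax = [sum(a[i][j] * x[j] for j in range(cols)) for i in range(rows)]
    return math.sqrt(sum(v * v for v in ax))


A = [[1.0, 2.0], [3.0, 4.0]]
m, s = max_entry_norm(A), operator_norm(A)
print(m, s, m <= s <= len(A) * m)
```

    The particular constants (1 and n here) depend on which pair of norms you compare; the point is only that some such constants exist.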
  7. Dec 18, 2008 #6
    Thanks for the help. Are you referring to "baby" Rudin (Principles of Mathematical Analysis), or his other analysis book? I am away from my books, but I look forward to going over this when I return.

  8. Dec 18, 2008 #7
    I think there's something called Cramer's rule that tells you how to construct the inverse of a matrix in terms of the determinant of the matrix and the cofactors of the matrix. Since both of these objects are polynomials in the entries of the original matrix, each entry of the inverse matrix is a ratio of two such polynomials (a cofactor divided by the determinant). And these functions are continuous wherever the denominator is nonzero.

    (You probably have to be a little more careful, making sure you aren't dividing by 0, etc. But I think the requirement that the determinant of the original matrix is not 0 will help.)

    Hope that gives you a start.
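    In case it helps to see the construction, here is a rough sketch of the cofactor formula in code. The function names are my own, the determinant is computed by naive Laplace expansion (fine for tiny matrices only), and exact fractions are used so the result can be compared exactly. Each entry of the inverse comes out as a cofactor divided by the determinant.

```python
from fractions import Fraction


def minor(a, i, j):
    """Matrix a with row i and column j removed."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(a) if k != i]


def det(a):
    """Determinant by Laplace expansion along the first row."""
    if not a:
        return 1  # determinant of the empty (0-by-0) matrix
    return sum((-1) ** j * a[0][j] * det(minor(a, 0, j)) for j in range(len(a)))


def inverse_by_cofactors(a):
    """Inverse as the transposed cofactor matrix divided by the determinant."""
    d = det(a)
    if d == 0:
        raise ValueError("matrix is singular")
    n = len(a)
    # Entry (i, j) of the inverse is cofactor (j, i) divided by det(a).
    return [[Fraction((-1) ** (i + j) * det(minor(a, j, i))) / d
             for j in range(n)] for i in range(n)]


print(inverse_by_cofactors([[2, 1], [5, 3]]))
```

    The only division is by the determinant, which is where the nonsingularity requirement comes in.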
  9. Dec 19, 2008 #8
    You could proceed by induction. Suppose M and W are matrices that are nearby, and split the matrices into blocks, say,

    [tex]M = \left(\begin{matrix}a_m & b_m^* \\ c_m & D_m\end{matrix}\right)[/tex]

    [tex]W = \left(\begin{matrix}a_w & b_w^* \\ c_w & D_w\end{matrix}\right)[/tex]

    where for each matrix a is a scalar, b* a row vector, c a column vector, and D is an invertible k-1 by k-1 matrix. The inverses can be explicitly calculated in terms of the blocks:

    [tex]\left(\begin{matrix}a & b^* \\ c & D\end{matrix}\right)^{-1} = \left(\begin{matrix}(a-b^*D^{-1}c)^{-1} & -a^{-1}b^*(D-a^{-1}cb^*)^{-1} \\ -D^{-1}c(a-b^*D^{-1}c)^{-1} & (D-a^{-1}cb^*)^{-1}\end{matrix}\right)[/tex]
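    A quick way to sanity-check a block formula like this is to multiply it back against the original matrix. The sketch below does that for a 3-by-3 example with a scalar a, a 1x2 row b*, a 2x1 column c, and a 2x2 block D. The helper names are made up, and the code assumes a, D, and both Schur complements are invertible. The lower-left block is written as -D^{-1} c (a - b* D^{-1} c)^{-1}, with D^{-1} on the left of c, so that the dimensions match.

```python
from fractions import Fraction


def matmul(x, y):
    """Product of two matrices given as lists of rows."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]


def inv2(m):
    """Inverse of a 2x2 matrix (assumed nonsingular)."""
    (p, q), (r, s) = m
    d = p * s - q * r
    return [[s / d, -q / d], [-r / d, p / d]]


def block_inverse(a, b, c, d):
    """Inverse of [[a, b], [c, d]]: a scalar, b 1x2, c 2x1, d 2x2."""
    d_inv = inv2(d)
    s = a - matmul(matmul(b, d_inv), c)[0][0]        # a - b* D^{-1} c (scalar)
    t = [[d[i][j] - c[i][0] * b[0][j] / a for j in range(2)]
         for i in range(2)]                          # D - a^{-1} c b*
    t_inv = inv2(t)
    top_right = [-v / a for v in matmul(b, t_inv)[0]]       # -a^{-1} b* T^{-1}
    bottom_left = [-row[0] / s for row in matmul(d_inv, c)]  # -D^{-1} c s^{-1}
    return [[1 / s] + top_right,
            [bottom_left[0]] + t_inv[0],
            [bottom_left[1]] + t_inv[1]]


F = Fraction
a, b, c, d = F(2), [[F(1), F(0)]], [[F(0)], [F(1)]], [[F(3), F(1)], [F(1), F(4)]]
m = [[a] + b[0], c[0] + d[0], c[1] + d[1]]
print(matmul(m, block_inverse(a, b, c, d)))  # should be the identity matrix
```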

    Now we just have to verify that each block of [itex]M^{-1}-W^{-1}[/itex] can be made sufficiently small. Choose M and W close enough that

    [tex]a_m = a_w(1+O(\epsilon))[/tex]

    [tex]b^*_m = b^*_w(I+O(\epsilon))[/tex]

    [tex]c_m = (I+O(\epsilon))c_w[/tex]

    [tex]D_m^{-1} = D_w^{-1}(I+O(\epsilon))[/tex]

    The last such statement about [itex]D^{-1}[/itex] follows from the inductive hypothesis: [itex]D_m[/itex] and [itex]D_w[/itex] are close k-1 by k-1 invertible matrices, so [itex]D_m^{-1}[/itex] and [itex]D_w^{-1}[/itex] are close as well.

    Then the first element is:

    [tex](M^{-1}-W^{-1})_{1,1} = (a_m-b_m^*D_m^{-1}c_m)^{-1} - (a_w-b_w^*D_w^{-1}c_w)^{-1}[/tex]

    [tex]= (a_w(1+O(\epsilon))-b_w^*(I+O(\epsilon))D_w^{-1}(I+O(\epsilon))(I+O(\epsilon))c_w)^{-1} - (a_w-b_w^*D_w^{-1}c_w)^{-1} [/tex]

    [tex]= (a_w-b_w^*D_w^{-1}c_w)^{-1}(1+O(\epsilon)) - (a_w-b_w^*D_w^{-1}c_w)^{-1}[/tex]

    [tex]=O(\epsilon)(a_w-b_w^*D_w^{-1}c_w)^{-1} = O(\epsilon) \cdot const[/tex]

    So [itex](M^{-1}-W^{-1})_{1,1}[/itex] can be made arbitrarily small. The other blocks [itex](M^{-1}-W^{-1})_{1,2}[/itex], [itex](M^{-1}-W^{-1})_{2,1}[/itex], and [itex](M^{-1}-W^{-1})_{2,2}[/itex] follow similarly, though you will need to invoke the Sherman-Morrison formula (about the inverse of a rank-1 update of a matrix):

    [tex](A+xy^*)^{-1} = A^{-1}-\frac{A^{-1}xy^*A^{-1}}{1+y^*A^{-1}x}[/tex]

    to deal with the (1,2) and (2,2) blocks.
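    If you want to convince yourself of the Sherman-Morrison formula before using it, a small exact check is easy to write. This is just a sketch for real matrices (so y* is the transpose of y), with made-up function names, and it assumes the denominator 1 + y* A^{-1} x is nonzero.

```python
from fractions import Fraction


def inv2(m):
    """Inverse of a 2x2 matrix (assumed nonsingular)."""
    (p, q), (r, s) = m
    d = p * s - q * r
    return [[s / d, -q / d], [-r / d, p / d]]


def sherman_morrison(a_inv, x, y):
    """Return (A + x y^T)^{-1} given A^{-1}; x and y are flat lists."""
    n = len(x)
    u = [sum(a_inv[i][k] * x[k] for k in range(n)) for i in range(n)]  # A^{-1} x
    v = [sum(y[k] * a_inv[k][j] for k in range(n)) for j in range(n)]  # y^T A^{-1}
    denom = 1 + sum(y[k] * u[k] for k in range(n))                     # 1 + y^T A^{-1} x
    if denom == 0:
        raise ValueError("A + x y^T is singular")
    return [[a_inv[i][j] - u[i] * v[j] / denom for j in range(n)]
            for i in range(n)]


F = Fraction
A = [[F(2), F(0)], [F(0), F(1)]]
x, y = [F(1), F(1)], [F(1), F(2)]
updated = [[A[i][j] + x[i] * y[j] for j in range(2)] for i in range(2)]
print(sherman_morrison(inv2(A), x, y) == inv2(updated))
```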
    Last edited: Dec 19, 2008
  10. Dec 19, 2008 #9
    Oh, here is an easier way, using the Sherman-Morrison formula more directly:

    [tex](A+\epsilon xy^*)^{-1} = A^{-1}-\epsilon \frac{A^{-1}xy^*A^{-1}}{1+\epsilon y^*A^{-1}x} = A^{-1} + \epsilon M[/tex]

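    where [itex]\epsilon M[/itex] slightly perturbs [itex]A^{-1}[/itex], with [itex]||M|| \le ||xy^*|| \, ||A^{-1}||^2 / |1+\epsilon y^*A^{-1}x|[/itex], which stays bounded as [itex]\epsilon \to 0[/itex]. More generally,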

    [tex]\left(A + \epsilon \sum_{j=1}^{n} x_j y_j^*\right)^{-1} = \left( \left(A + \epsilon \sum_{j=1}^{n-1} x_j y_j^*\right) + \epsilon x_n y_n^*\right)^{-1} = \left(A + \epsilon \sum_{j=1}^{n-1} x_j y_j^*\right)^{-1} + \epsilon M_n[/tex]

    [tex]= \left(A + \epsilon \sum_{j=1}^{n-2} x_j y_j^*\right)^{-1} + \epsilon M_{n-1} + \epsilon M_n = \cdots = A^{-1} + \epsilon \left(M_1 + M_2 + \cdots + M_n\right) = A^{-1} + \epsilon M[/tex]

    with [itex]||M|| < c ||\sum_{j=1}^{n} x_j y_j^*||[/itex]. You could calculate the exact constant c if you are careful about keeping track of the errors, but I'm too lazy. The key thing is that the constant depends only on A and n, but not on the [itex]x_j[/itex]'s and [itex]y_j[/itex]'s. But any matrix can be written as a sum of rank-1 matrices (i.e., [itex]W = \sum_j x_j y_j^*[/itex]), so for all W of norm less than or equal to 1 there exists an M such that [itex]\left(A + \epsilon W\right)^{-1} = A^{-1} + \epsilon M[/itex] with ||M|| < c.

    Then for [itex]||A-B|| < \epsilon[/itex], we can rewrite B as a perturbation of A: [itex]B = A + \epsilon W[/itex] with ||W|| less than or equal to 1. So by the above, we have
    [tex]||A^{-1} - B^{-1}|| = ||A^{-1} - \left(A^{-1} + \epsilon M\right)|| = \epsilon ||M|| < \epsilon c[/tex]
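    For what it's worth, the constant can be made explicit by a different route than the rank-1 argument. This is only a sketch, and it assumes the norm is submultiplicative (||XY|| ≤ ||X|| ||Y||, as for the operator norm), which the thread has not fixed. Start from the identity

    [tex]B^{-1} - A^{-1} = B^{-1}(A - B)A^{-1}[/tex]

    which gives

    [tex]||B^{-1} - A^{-1}|| \le ||B^{-1}|| \, ||A - B|| \, ||A^{-1}||[/tex]

    If [itex]||A^{-1}|| \, ||A-B|| < 1[/itex], then B is invertible with [itex]||B^{-1}|| \le ||A^{-1}|| / (1 - ||A^{-1}|| \, ||A-B||)[/itex] (write [itex]B = A(I - A^{-1}(A-B))[/itex] and expand the second factor's inverse as a geometric series), so

    [tex]||B^{-1} - A^{-1}|| \le \frac{||A^{-1}||^2 \, ||A-B||}{1 - ||A^{-1}|| \, ||A-B||}[/tex]

    and the right-hand side goes to 0 as B approaches A.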
    Last edited: Dec 19, 2008
  11. Dec 21, 2008 #10


    Science Advisor
    Homework Helper

    Isn't it obvious that an n-by-n matrix is equivalent to a sequence of n^2 numbers, and hence to a point in R^(n^2)? Hence there is a natural topology, as usual on Euclidean space.

    Then Cramer's rule says that in these coordinates the inverse matrix function is a quotient of polynomials with nonzero denominator, hence continuous.