# Rank of sample covariance matrix

1. Dec 9, 2009

### fmilano

I was reading Turk and Pentland paper 'Eigenfaces for recognition' and they assert that, if M < N, the maximum rank of a covariance matrix is M - 1, being M the number of samples and NxN the size of the covariance matrix.

Is there any simple demonstration of this fact?

Federico

2. Dec 14, 2009

### brian44

This seems incorrect to me, it should be M.

The reason is we usually capture the covariance matrix using X*X' where X is an NxM matrix of N features by M observations (usually we normalize X so that each row [feature] has mean 0 and variance 1, or some similar process, so that we are getting the standard covariance estimate).

Then of course the rank of X*X' can be at most M for M < N because you may view applying the transformation X*X' as applying the transformation X' followed by applying the transformation X. Then because X' is an MxN matrix taking R^N to R^M, so its range (i.e. rank) cannot be greater than the dimension of R^M, namely M. Then, X takes R^M to a subspace of R^N, with dimension no greater than M (this follows from linear algebra, but you may simply think about it like this: X is taking the entire subspace of R^M (the range of X') to somewhere in R^N, since it is a linear function, it cannot fill up more of R^N then what is put into it, namely it cannot output more than the image of X on the subspace of R^M, so again the dimension of the range is no greater than M). Thus X*X' can have rank no greater than M.
(Note also I use X' to denote transpose of X and R^n, assuming real number system).

Assuming this is what they mean by covariance matrix, it is easy to come up with a counter-example to the rank = M-1 claim, unless there is more information missing, i.e. some other constraint given on the variables. In fact generate a random NXM matrix X and take the covariance matrix, or equivalently normalize it properly and take X*X' and most of the time the rank will be M.

3. Dec 14, 2009

### DrGreg

brian44's argument shows that the rank can be no greater than M but nevertheless it could be less (as he hints), depending on what the data actually is.

I had a look at Turk and Pentland's paper (Eigenfaces for recognition) and in their case the M columns of X (or A in their notation) are not independent because they have subtracted the mean column from each column (i.e. the sum along each row is zero). In their notation, Φi = Γi - Ψ. This constraint reduces the rank by one.