Rank of sample covariance matrix

In summary, Turk and Pentland's paper 'Eigenfaces for recognition' asserts that if the number of samples (M) is less than the dimension N of the N x N covariance matrix, then the rank of the covariance matrix is at most M - 1. This is because the columns of the data matrix are not linearly independent: the mean column has been subtracted from each column. One way to check this is to generate a random N x M matrix, subtract the mean column from each column, and compute the rank of the resulting covariance matrix. Another is to follow the setup in Turk and Pentland's paper.
  • #1
fmilano
I was reading Turk and Pentland's paper 'Eigenfaces for recognition', and they assert that, if M < N, the maximum rank of a covariance matrix is M - 1, where M is the number of samples and N x N is the size of the covariance matrix.

Is there a simple proof of this fact?

Thanks in advance,

Federico
 
  • #2
This seems incorrect to me; it should be M.

The reason is that we usually compute the covariance matrix as X*X', where X is an N x M matrix of N features by M observations. (Usually we normalize X so that each row [feature] has mean 0 and variance 1, or apply some similar process, so that we get the standard covariance estimate.)

Then of course the rank of X*X' can be at most M when M < N, because applying the transformation X*X' is the same as applying X' followed by X. X' is an M x N matrix taking R^N to R^M, so its rank (the dimension of its range) cannot be greater than the dimension of R^M, namely M. Then X maps R^M to a subspace of R^N of dimension no greater than M. This follows from linear algebra, but you can think of it like this: X takes the range of X' (a subspace of R^M) to somewhere in R^N, and since X is linear, it cannot fill up more of R^N than what is put into it; the image of a subspace of dimension at most M has dimension at most M. Thus X*X' has rank no greater than M.
(Note: I use X' to denote the transpose of X, and R^n for real n-dimensional space, assuming real-valued data.)
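The argument above is an instance of the general inequality rank(AB) ≤ min(rank(A), rank(B)). Written compactly for the N x M matrix X with M < N:

```latex
\operatorname{rank}(XX^{\top}) \le \min\{\operatorname{rank}(X),\, \operatorname{rank}(X^{\top})\} = \operatorname{rank}(X) \le \min\{N, M\} = M
```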

Assuming this is what they mean by the covariance matrix, it is easy to come up with a counterexample to the rank = M - 1 claim, unless some information is missing, e.g. some other constraint on the variables. In fact, generate a random N x M matrix X and take its covariance matrix (or, equivalently, normalize it properly and take X*X'), and most of the time the rank will be M.
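A quick numerical check of the rank of X*X' formed directly from a random matrix, with no mean subtraction, can be sketched in Python with NumPy. The sizes N = 50, M = 10 and the random seed are arbitrary assumptions for illustration; `np.linalg.matrix_rank` counts singular values above its default tolerance.

```python
import numpy as np

# Sketch: N features (rows) by M observations (columns), with M < N.
# Sizes and seed are arbitrary choices for illustration.
N, M = 50, 10
rng = np.random.default_rng(0)
X = rng.standard_normal((N, M))

# Uncentered N x N product X X^T: no mean is subtracted from the rows.
S = X @ X.T
rank_uncentered = np.linalg.matrix_rank(S)

# For generic random data the rank equals M.
print(S.shape, rank_uncentered)
```

Without centering, nothing forces the columns of X to be dependent, so the upper bound M from the argument above is attained for generic data.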
 
  • #3
brian44's argument shows that the rank can be no greater than M, but it could nevertheless be less (as he hints), depending on what the data actually are.

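I had a look at Turk and Pentland's paper (Eigenfaces for recognition), and in their case the M columns of X (A in their notation) are not independent, because they have subtracted the mean column from each column, so the sum along each row is zero. In their notation, Φ_i = Γ_i - Ψ, where Ψ = (1/M) Σ_n Γ_n is the average face. Summing over i gives Σ_i Φ_i = 0, so A times the all-ones vector is the zero vector, the columns of A are linearly dependent, and rank(A) ≤ M - 1. Since rank(AA') = rank(A), this constraint reduces the maximum rank of the covariance matrix AA' by one, to M - 1.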
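The effect of mean subtraction can be sketched the same way. This assumes NumPy; the variable names Gamma, Psi and A mirror the paper's notation (Γ, Ψ, A), while the sizes and seed are arbitrary choices.

```python
import numpy as np

# Columns are M samples (e.g. flattened images) of dimension N, with M < N.
N, M = 50, 10
rng = np.random.default_rng(1)
Gamma = rng.standard_normal((N, M))      # columns Gamma_1 ... Gamma_M

Psi = Gamma.mean(axis=1, keepdims=True)  # mean column (average face)
A = Gamma - Psi                          # columns Phi_i = Gamma_i - Psi

# The columns of A sum to the zero vector (A @ ones = 0 up to round-off),
# so they are linearly dependent.
columns_sum_to_zero = np.allclose(A @ np.ones(M), 0.0)

C = A @ A.T                              # N x N (unnormalized) covariance
rank_A = np.linalg.matrix_rank(A)
rank_C = np.linalg.matrix_rank(C)

# Both ranks equal M - 1 for generic data.
print(columns_sum_to_zero, rank_A, rank_C)
```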
 

Related to Rank of sample covariance matrix

What is the rank of a sample covariance matrix?

The rank of a sample covariance matrix is the number of linearly independent rows (equivalently, columns) of the matrix. In other words, it is the largest number of variables that can be chosen so that none of them is an exact linear combination of the others (after the means are subtracted).
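A minimal illustration, assuming NumPy and made-up data in which one variable is exactly the sum of two others: the 3 x 3 covariance matrix then has rank 2. An explicit absolute tolerance (1e-10, an arbitrary choice) is passed to `matrix_rank` because round-off leaves the theoretically zero eigenvalue at a tiny value that the default tolerance may not reliably classify as zero.

```python
import numpy as np

# Hypothetical data: three variables on 20 samples, where x3 = x1 + x2 exactly.
rng = np.random.default_rng(2)
x1 = rng.standard_normal(20)
x2 = rng.standard_normal(20)
x3 = x1 + x2

data = np.vstack([x1, x2, x3])  # 3 variables (rows) x 20 samples (columns)
cov = np.cov(data)              # np.cov treats each row as a variable by default
rank = np.linalg.matrix_rank(cov, tol=1e-10)

# Only two of the three variables are linearly independent.
print(cov.shape, rank)
```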

Why is the rank of a sample covariance matrix important?

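The rank of a sample covariance matrix is important because it gives the number of linearly independent directions in which the data actually vary, i.e. the dimension of the subspace spanned by the centered data. This intrinsic dimensionality matters in methods such as PCA, and a rank below the number of variables means the covariance matrix is singular and cannot be inverted.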

How is the rank of a sample covariance matrix calculated?

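Because a covariance matrix is symmetric and positive semidefinite, its rank equals the number of non-zero eigenvalues. Equivalently, it equals the number of non-zero singular values of the centered data matrix, which can be computed with the singular value decomposition (SVD). In floating-point arithmetic, "non-zero" means larger than a small tolerance, since round-off leaves the theoretical zeros at tiny values.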
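Both approaches can be sketched with NumPy. The helper names `rank_from_eigenvalues` and `rank_from_svd` and the relative threshold `rtol = 1e-10` are hypothetical choices for illustration, not a standard API.

```python
import numpy as np

def rank_from_eigenvalues(cov, rtol=1e-10):
    """Count eigenvalues of a symmetric PSD matrix above rtol * largest eigenvalue."""
    eigvals = np.linalg.eigvalsh(cov)  # real eigenvalues, ascending order
    return int(np.sum(eigvals > rtol * eigvals.max()))

def rank_from_svd(data, rtol=1e-10):
    """Count singular values of the centered data above rtol * largest.

    `data` is N variables (rows) x M samples (columns).
    """
    centered = data - data.mean(axis=1, keepdims=True)
    s = np.linalg.svd(centered, compute_uv=False)
    return int(np.sum(s > rtol * s.max()))

rng = np.random.default_rng(3)
data = rng.standard_normal((30, 8))  # N = 30 variables, M = 8 samples
cov = np.cov(data)                   # 30 x 30 sample covariance

# For generic data both counts equal M - 1.
print(rank_from_eigenvalues(cov), rank_from_svd(data))
```

Using a threshold relative to the largest eigenvalue or singular value keeps the test independent of the overall scale of the data.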

Can the rank of a sample covariance matrix be greater than the number of variables?

No. An N x N sample covariance matrix has rank at most N, because the rank of a matrix cannot exceed the smaller of its number of rows and columns. When it is computed from M mean-centered samples, its rank is also at most M - 1, so overall the rank is at most min(N, M - 1).
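A small NumPy check of the bound min(N, M - 1) over a few arbitrary shapes (the shapes, seed and absolute tolerance 1e-10 are assumptions for illustration). `np.cov` treats each row as a variable and subtracts the row means.

```python
import numpy as np

rng = np.random.default_rng(4)
results = []
for n_vars, n_samples in [(20, 5), (5, 20), (10, 10)]:
    data = rng.standard_normal((n_vars, n_samples))  # rows = variables
    cov = np.cov(data)                               # n_vars x n_vars
    rank = np.linalg.matrix_rank(cov, tol=1e-10)
    bound = min(n_vars, n_samples - 1)
    results.append((n_vars, n_samples, rank, bound))
    print(f"N={n_vars:2d}  M={n_samples:2d}  rank={rank}  min(N, M-1)={bound}")
```

For generic random data the bound is attained in every case; structured data (for example, duplicated variables) can give a lower rank.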

How does the rank of a sample covariance matrix affect principal component analysis (PCA)?

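The rank of a sample covariance matrix is a crucial factor in PCA: it equals the number of principal components with non-zero variance. If the rank r is less than the number of variables N, the centered data lie in an r-dimensional subspace, so the first r principal components capture all of the variance and the remaining N - r components have zero variance. All variables can still be reconstructed exactly from those r components.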
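A sketch of PCA on rank-deficient data, assuming NumPy: it keeps only the components whose eigenvalues exceed a small relative threshold (1e-10, an arbitrary choice) and checks that they reconstruct the centered data up to round-off. The sizes and seed are also arbitrary.

```python
import numpy as np

rng = np.random.default_rng(5)
N, M = 40, 6                                   # N variables, M samples, M < N
data = rng.standard_normal((N, M))
centered = data - data.mean(axis=1, keepdims=True)

cov = centered @ centered.T / (M - 1)          # N x N sample covariance
eigvals, eigvecs = np.linalg.eigh(cov)         # ascending order
eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]  # sort descending

# Number of principal components with non-zero variance.
r = int(np.sum(eigvals > 1e-10 * eigvals[0]))
W = eigvecs[:, :r]                             # N x r principal directions
scores = W.T @ centered                        # r x M principal-component scores
reconstructed = W @ scores                     # back to N x M

# r equals M - 1, and the r components reproduce the centered data.
print(r, np.allclose(reconstructed, centered))
```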
