# Low Rank Approximation

1. Dec 5, 2013

### TheOldHag

I'm studying low rank approximation by way of SVD and I'm having trouble understanding how the result matrix has lower rank. For instance, in the link the calculation performed on page 11 resulting in the so-called low rank approximation on page 12. This matrix on page 12 doesn't appear to me to be of rank 2 as one would think. I would expect only 2 linearly independent row vector. When I put it back into a calculator it is still rank 9.

2. Dec 5, 2013

### TheOldHag

This is hard for me to word.

Basically, this explanation of low-rank approximation says the new m x n matrix should have rank r is not something I'm seeing in the Latent Semantic Indexing examples out there. They are coming up with new matrices that to me look like they have higher rank than r.

http://en.wikipedia.org/wiki/Low-rank_approximation

3. Dec 5, 2013

### AlephZero

The matrix on page 12 is only printed to 2 significant figures, so if you copied and pasted those numbers into a calculator, most likely the matrix wound be "rank 9" because of the rounding errors introduced.

The matrix on page is actually the product W (12 x 2) . S (2 x 2) . P' (2 x 9) where the ( ) are the dimensions of each matrix and the relevant parts of the full W S and P matrices are shaded on page 11.

Of course the full W and P matrices on page 11 are only shown to two significant figures as well. If you check whether W and P are orthonormal using those rounded values, you will probably find the "zero" terms are of order 0.0001 or worse.

If you want to do a numerical check for orthogonality, you really need to calculate the SVD yourself to 16 significant figures in double precision arithmetic, and work with that.

FWIW it should be obvious that if you took only one SVD component, the resulting matrix
W (12 x 1) . S (1 x 1) . P' (1 x 9)
would be of rank 1.

4. Dec 5, 2013

### TheOldHag

I think your last point helps understand this better and the rounding error makes perfect sense.

As far as feature extraction in general or latent semantic indexing in general I'm not sure how this reduced dimensionality results in reduced computational cost down the road since you are still left with a term-document matrix that is no longer sparse but having fewer dimensions. Are there further steps to this feature extraction process beyond this point. For instance, determine the basis and then maps things in term of the basis. Seems to be a step missing in these papers on reading since they just end with this new matrix that doesn't appear much simpler on the surface.

5. Dec 5, 2013

### AlephZero

I don't know anything about semantic feature extraction specifically, but the math in your link looks quite similar to methods that are used in other areas to reduce "large" models to manageable size. Numerical methods based on the SVD decomposition don't seem to feature much in first courses in numerical methods, but they are very useful in practice.

I assume those examples show the matrices in full just to demonstrate what is going on. In practice, it is unlikely you would compute the full matrices. For example if you only want to compute a subset of the singular values (say 1000 SVs of a matrix of dimension maybe 10,000,000 x 20,000, for a large text data base with a realistic sized vocabulary), you don't need to calculate all the SVs , plus the full orthonormal matrices, and then throw most of that data away.

6. Dec 6, 2013

### TheOldHag

I appreciate your help. I'm pass that bit of confusion now.