Understanding Low Rank Approximation with SVD: A Comprehensive Guide

  • Context: Graduate
  • Thread starter: TheOldHag
  • Tags: Approximation, rank

Discussion Overview

The discussion revolves around the concept of low rank approximation using Singular Value Decomposition (SVD), focusing on the rank of resulting matrices and the implications for feature extraction in contexts like Latent Semantic Indexing. Participants explore the mathematical properties and practical applications of these approximations, as well as the challenges posed by rounding errors and dimensionality reduction.

Discussion Character

  • Exploratory
  • Technical explanation
  • Debate/contested
  • Mathematical reasoning

Main Points Raised

  • One participant expresses confusion about the rank of the low rank approximation matrix, suggesting it appears to have a higher rank than expected based on the SVD method.
  • Another participant notes that the rank of the new matrix may be affected by rounding errors due to significant figures in the printed values.
  • A participant explains that the low rank approximation is derived from the product of matrices W, S, and P', and emphasizes the importance of using high precision in calculations to avoid misinterpretations of rank.
  • There is a discussion about the implications of reduced dimensionality for computational efficiency, with questions raised about further steps in feature extraction beyond the initial low rank approximation.
  • One participant mentions that while the mathematics appears similar to other numerical methods, SVD is not commonly emphasized in introductory courses, yet it is practically useful.

Areas of Agreement / Disagreement

Participants do not reach a consensus on the rank of the approximation matrices or the implications of dimensionality reduction. There are multiple competing views regarding the effects of rounding errors and the practical applications of SVD in feature extraction.

Contextual Notes

Participants highlight limitations related to rounding errors and the significance of using precise calculations when working with SVD. The discussion also touches on the potential complexity of the resulting matrices and the need for further steps in the feature extraction process that are not always clearly outlined in the literature.

TheOldHag
I'm studying low rank approximation by way of SVD and I'm having trouble understanding how the result matrix has lower rank. For instance, in the link below, the calculation performed on page 11 results in the so-called low rank approximation on page 12. The matrix on page 12 doesn't appear to me to be of rank 2, as one would expect; I would expect only 2 linearly independent row vectors. When I put it back into a calculator, it is still rank 9.

http://lsa.colorado.edu/papers/dp1.LSAintro.pdf
 
This is hard for me to word.

Basically, this explanation of low-rank approximation says the new m x n matrix should have rank r, but that is not what I'm seeing in the Latent Semantic Indexing examples out there. They come up with new matrices that look to me like they have higher rank than r.

http://en.wikipedia.org/wiki/Low-rank_approximation
 
The matrix on page 12 is only printed to 2 significant figures, so if you copied and pasted those numbers into a calculator, most likely the matrix would be "rank 9" because of the rounding errors introduced.
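A quick numerical sketch of this effect, using a synthetic 12 x 9 matrix in NumPy (the actual term-document matrix from the paper is not copied here):

```python
import numpy as np

# Synthetic stand-in for the paper's 12 x 9 term-document matrix
rng = np.random.default_rng(1)
A = rng.random((12, 9))
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Exact rank-2 approximation from the two largest singular values
A2 = U[:, :2] @ np.diag(s[:2]) @ Vt[:2, :]
print(np.linalg.matrix_rank(A2))          # 2

# Round every entry to two decimal places, as in a printed table:
# the tiny perturbations push the numerical rank back up
A2_rounded = np.round(A2, 2)
print(np.linalg.matrix_rank(A2_rounded))  # typically full rank again
```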

The matrix on page 12 is actually the product W (12 x 2) . S (2 x 2) . P' (2 x 9), where the ( ) are the dimensions of each matrix and the relevant parts of the full W, S, and P matrices are shaded on page 11.

Of course the full W and P matrices on page 11 are only shown to two significant figures as well. If you check whether W and P are orthonormal using those rounded values, you will probably find the "zero" terms are of order 0.0001 or worse.

If you want to do a numerical check for orthogonality, you really need to calculate the SVD yourself to 16 significant figures in double precision arithmetic, and work with that.
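For example, a sketch of that check on a synthetic matrix, comparing the double-precision factors with their two-decimal rounding:

```python
import numpy as np

# Synthetic stand-in for the W matrix in the paper
rng = np.random.default_rng(2)
A = rng.random((12, 9))
U, s, Vt = np.linalg.svd(A, full_matrices=False)

I = np.eye(U.shape[1])

# Computed in double precision, U'U is the identity to near machine precision
err_exact = np.max(np.abs(U.T @ U - I))

# Rounded to two decimals, the "zero" entries are only zero to a few digits
U_rounded = np.round(U, 2)
err_rounded = np.max(np.abs(U_rounded.T @ U_rounded - I))

print(err_exact)    # close to machine epsilon
print(err_rounded)  # several orders of magnitude larger
```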

FWIW it should be obvious that if you took only one SVD component, the resulting matrix
W (12 x 1) . S (1 x 1) . P' (1 x 9)
would be of rank 1.
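The same one-component construction, sketched in NumPy:

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.random((12, 9))
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# One SVD component is an outer product scaled by a singular value,
# so the result has exactly one linearly independent row
A1 = s[0] * np.outer(U[:, 0], Vt[0, :])
print(A1.shape)                   # (12, 9)
print(np.linalg.matrix_rank(A1))  # 1
```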
 
I think your last point helps me understand this better, and the rounding error makes perfect sense.

As far as feature extraction or latent semantic indexing in general goes, I'm not sure how this reduced dimensionality results in reduced computational cost down the road, since you are still left with a term-document matrix that is no longer sparse but has fewer dimensions. Are there further steps to this feature extraction process beyond this point? For instance, determine the basis and then map things in terms of the basis. There seems to be a step missing in these papers, since they just end with this new matrix, which doesn't appear much simpler on the surface.
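To make my guess concrete, here is roughly what I had in mind, sketched in NumPy: keep only each document's k coordinates in the reduced basis rather than the dense m x n approximation (this uses the "folding in" formula I've seen in LSI write-ups; the variable names are mine, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.random((12, 9))                     # toy term-document matrix
U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 2
Uk, sk = U[:, :k], s[:k]

# Coordinates of the existing documents in the k-dimensional basis
docs_k = np.diag(1.0 / sk) @ Uk.T @ A       # shape (k, n)

# A new query vector (length m) is folded in the same way
q = rng.random(12)
q_k = np.diag(1.0 / sk) @ Uk.T @ q          # shape (k,)

# Similarity scores are then computed between length-k vectors,
# not length-m ones -- that is where the saving would come from
sims = docs_k.T @ q_k                       # one score per document
print(docs_k.shape, q_k.shape, sims.shape)  # (2, 9) (2,) (9,)
```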
 
I don't know anything about semantic feature extraction specifically, but the math in your link looks quite similar to methods that are used in other areas to reduce "large" models to manageable size. Numerical methods based on the SVD decomposition don't seem to feature much in first courses in numerical methods, but they are very useful in practice.

I assume those examples show the matrices in full just to demonstrate what is going on. In practice, it is unlikely you would compute the full matrices. For example, if you only want to compute a subset of the singular values (say 1000 SVs of a matrix of dimension maybe 10,000,000 x 20,000, for a large text database with a realistically sized vocabulary), you don't need to calculate all the SVs plus the full orthonormal matrices and then throw most of that data away.
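For instance, a sketch with SciPy's iterative solver, which returns only the k largest singular triplets and never forms the full factors (toy dimensions, much smaller than the figures above):

```python
import numpy as np
from scipy.sparse import random as sparse_random
from scipy.sparse.linalg import svds

# Toy stand-in for a large sparse term-document matrix
A = sparse_random(2000, 500, density=0.01, random_state=0, format="csr")

# Compute only the k largest singular triplets, not the full SVD
k = 20
U, s, Vt = svds(A, k=k)

print(U.shape, s.shape, Vt.shape)  # (2000, 20) (20,) (20, 500)
```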
 
I appreciate your help. I'm past that bit of confusion now.
 
