SVD Low-Rank Approximation Algorithm

In summary, the conversation discusses the use of the SVD for low-rank approximation, its mathematical properties, and efficient ways to compute it for large matrices. The recommendation is to use existing linear algebra software such as LAPACK or ARPACK, and not to write code for the Lanczos method without doing extensive research first. The possibility of using Apache Mahout for machine learning and dimensionality reduction is also discussed, as is the use of GPUs for faster matrix calculations.
  • #1
TheOldHag
I'm looking for a concise description of an algorithm for low-rank approximation via SVD. I've seen a number of articles referring to the Lanczos method and finding eigenvalues, but nothing going all the way to determining all the matrices involved in the low-rank SVD of a given matrix.

Any pointers are appreciated.
 
  • #2
You can probably split that question into two parts.

1. What are the mathematical (linear algebra) properties of the SVD: how it can be used for least squares fitting and pseudo-inverse matrices, what it means to ignore (or set to zero) some of the singular values, and so on.

2. How to calculate the SVD efficiently, especially for large matrices.

A (very short and cryptic) answer to (1) is that the SVD of a matrix ##A## is closely related to the eigenvalues and eigenvectors of the matrices ##A^TA## and ##AA^T##. Books and course notes on numerical methods and/or linear algebra should have the details.
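A quick numerical sanity check of that relationship, sketched in Python with NumPy purely as an illustration (the language and library choice are mine): the singular values of ##A## are the square roots of the eigenvalues of ##A^TA##.

```python
import numpy as np

# Check: singular values of A == square roots of eigenvalues of A^T A,
# for any rectangular A. (Small random matrix, purely illustrative.)
rng = np.random.default_rng(0)
A = rng.standard_normal((6, 4))

sing_vals = np.sort(np.linalg.svd(A, compute_uv=False))   # sorted ascending
eig_vals = np.linalg.eigvalsh(A.T @ A)                     # ascending by default

print(sing_vals)
print(np.sqrt(np.clip(eig_vals, 0.0, None)))               # matches to rounding error
```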

For (2), you probably don't really want to know. If you want to calculate SVDs, use the routines from a library like LAPACK, or ARPACK for very large problems.
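Both routes are available from Python through SciPy, for example; the following is only a sketch of how the library calls look. scipy.linalg.svd is LAPACK-backed, and scipy.sparse.linalg.svds computes a few singular triplets of a sparse matrix iteratively (ARPACK by default).

```python
import numpy as np
from scipy.linalg import svd                   # dense SVD, LAPACK under the hood
from scipy.sparse import random as sparse_random
from scipy.sparse.linalg import svds           # truncated SVD, ARPACK by default

# Dense case: full SVD of a modest matrix.
A = np.random.rand(500, 300)
U, s, Vt = svd(A, full_matrices=False)

# Sparse case: only the 5 largest singular triplets of a larger sparse matrix.
# (Sizes and density are illustrative, not taken from this thread.)
S = sparse_random(20_000, 10_000, density=1e-4, format="csr", random_state=0)
U5, s5, Vt5 = svds(S, k=5)
print(sorted(s5, reverse=True))
```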

IIRC the "Numerical Recipes" book has a reasonable description of (1), and some (state of the art in the 1960s) computer code for (2). Old versions of "NR" are free, from here. http://www.nr.com/oldverswitcher.html

Lanczos methods are more or less the current state of the art for very large eigenproblems, but the difference between the basic math (which is simple enough) and computer code that actually works lies in some subtle details about dealing with rounding errors, etc. The math behind Lanczos dates back to the 1940s or 50s, but it took until the 1980s before anybody figured out how to make it work reliably as a numerical method. (In fact Wilkinson's classic book of the 1960s, "The Algebraic Eigenvalue Problem", "proved" that it was nice in theory but useless in practice!) You most definitely don't want to write your own code for the Lanczos method, unless you do a LOT of research first.
 
  • #3
Thanks for the note. Is that to say ARPACK is for large sparse matrices? I'm looking at something on the order of 1,000,000 rows and columns for text mining. The columns would be document term vectors, and since most documents will only have a small number of terms relative to the total term vocabulary, this would be a pretty sparse matrix. Would ARPACK be better for this, or would LAPACK be good enough?
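For scale, here is a back-of-the-envelope on the storage; the figure of roughly 200 distinct terms per document is only an assumption for illustration.

```python
# Back-of-the-envelope storage estimate (pure Python, illustrative numbers).
n_docs, n_terms, terms_per_doc = 1_000_000, 1_000_000, 200
nonzeros = n_docs * terms_per_doc                    # ~2e8 stored entries

# CSR-style sparse storage: one 8-byte value and one 4-byte column index
# per nonzero, plus one row pointer per row.
bytes_sparse = nonzeros * (8 + 4) + (n_docs + 1) * 4
bytes_dense = n_docs * n_terms * 8                   # float64, fully dense

print(f"sparse: ~{bytes_sparse / 1e9:.1f} GB   dense: ~{bytes_dense / 1e12:.0f} TB")
```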
 
  • #4
When you say "...1,000,000 rows and columns...", do you mean 1,000,000 rows AND 1,000,000 columns? In other words, a matrix of 10^12 elements (as a dense matrix)?

Do you have any idea how long it's going to take to calculate an SVD for that matrix?

AlephZero is right on the money when he says "You most definitely don't want to write your own code for the Lanczos method, unless you do a LOT of research first."

You need to use already existing linear algebra software, written by people with years of experience doing this. Writing code to handle sparse matrices would be even more difficult.

Just to give you an example, Mathematica in its current incarnation has highly optimized code for this, including special routines for sparse matrices.

I created a 10000x10000 tridiagonal matrix with random floating point numbers on the 3 diagonals and had Mathematica calculate the first 5 singular values, treating the matrix as a dense matrix. It took 336 seconds on a fast Intel quad core machine with a solid state hard drive.

Converting the dense matrix to a sparse matrix, Mathematica took 0.203 seconds to calculate the first 5 singular values.

If the time scaled up linearly with the number of elements (which it doesn't), calculating the first 5 singular values of your 1,000,000x1,000,000 dense matrix would take 3,360,000 seconds, or roughly 39 days.
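A rough analogue of that experiment can be sketched in Python with SciPy (only a sketch; the timings will of course differ from the Mathematica numbers quoted above).

```python
import time
import numpy as np
from scipy.sparse import diags
from scipy.sparse.linalg import svds

# Random tridiagonal 10000x10000 matrix, analogous to the experiment above.
n = 10_000
rng = np.random.default_rng(1)
bands = [rng.random(n - 1), rng.random(n), rng.random(n - 1)]
A_sparse = diags(bands, offsets=[-1, 0, 1], format="csr")

t0 = time.perf_counter()
svds(A_sparse, k=5, return_singular_vectors=False)    # sparse, top 5 only
print("sparse:", time.perf_counter() - t0, "s")

t0 = time.perf_counter()
np.linalg.svd(A_sparse.toarray(), compute_uv=False)   # dense full SVD (~800 MB, may take minutes)
print("dense: ", time.perf_counter() - t0, "s")
```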

Just the storage for your proposed matrix, as a dense matrix, would take 1000 gigabytes if you only allocated 1 byte per element of the matrix.

Your only hope is to create your matrix as a sparse matrix, and use something like Mathematica or Matlab or some similar package. You definitely don't have enough time to write your own software from scratch. I think even starting with something like LAPACK (actually you would probably want EISPACK rather than LAPACK; even those are old) and interfacing it (with plenty of debugging) would probably consume more time than you have for your entire project.
 
  • #5
Yes, ... apparently my 2 classes in Linear Algebra 10 years ago aren't quite enough. I did an introductory applied class and then went through 'Done Right' for a second time just recently. I'm going to try my hand at Golub over the next 3 months.

In the meantime, I think you are correct regarding doing it yourself. Brick wall comes to mind. Apparently Apache Mahout provides algorithms for machine learning which includes some dimensional reduction stuff. I'm a bit skeptical regarding the performance but they also provide a stochastic implementation of SVD and rank reduction. This looks more up my alley since Mahout is in the general ecosystem of Lucene and text indexing.

Any comments are appreciated.

What do you think of these?

https://cwiki.apache.org/confluence/display/mahout/dimensional+Reduction

https://cwiki.apache.org/confluence/display/MAHOUT/Stochastic+Singular+Value+Decomposition
 
  • #6
Your first reference mentions dense matrices with 10^7 rows and columns having 100 trillion non-zero entries! Are people really working with such large matrices? I guess so; he says "Each of these examples deal with cases of matrices which tend to be tremendously large (often millions to tens of millions to hundreds of millions of rows or more, by sometimes a comparable number of columns), but also rather sparse."

But, do you have access to a supercomputer for your project, or are you going to do the number crunching on a personal computer? If you will use a desktop sized computer, simply creating and storing these large matrices and intermediate results of your calculations will be problematic.

Using sparse matrix techniques will be necessary, I think. A fairly recent improvement in matrix arithmetic speed is the use of the graphics processor unit (GPU) for matrix calculations. Look up "CUDA" in connection with Nvidia graphics cards. Mathematica can use CUDA, and other packages may as well.

You might try communicating with some of the workers in the field whose references you have located and ask them about the practical problems.
 
  • #7
Thanks for the comments. I'm a bit skeptical of Mahout's claims regarding the size of the matrices that need to be worked with. There are document collections in the millions, with vocabularies of around a million terms; however, the term-document matrix for a collection would be extremely sparse and irregular. I'm guessing that the fact that it calculates a low-rank approximation might help: the recommended rank for text clustering is somewhere between 200 and 400, so the resulting factor matrices would only need that many columns (or rows) to be stored, with the rest discarded in the SVD computation. The other dimension would still be large, though.
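As a sketch of what that rank-200-to-400 truncation looks like with off-the-shelf tools (SciPy here rather than Mahout, with the matrix sizes scaled down so it actually runs on a desktop; all numbers are illustrative):

```python
from scipy.sparse import random as sparse_random
from scipy.sparse.linalg import svds

# Scaled-down stand-in for a term-document matrix (rows = terms, cols = docs).
n_terms, n_docs, k = 50_000, 20_000, 300
A = sparse_random(n_terms, n_docs, density=5e-4, format="csr", random_state=0)

# Truncated SVD: only the k largest singular triplets are computed and stored.
U, s, Vt = svds(A, k=k)           # U: n_terms x k,  s: k,  Vt: k x n_docs

# The factors cost O((n_terms + n_docs) * k) storage, not n_terms * n_docs.
print(U.shape, s.shape, Vt.shape)
```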

As for storage, most text indexes are comparable in size to the collection itself, which is big.

In response to your question, these matrices are popping up in the context of text indexing. Linear algebra is at the mathematical core of text indexing.
 
  • #8
The Electrician said:
(actually you would probably want EISPACK rather than LAPACK; even those are old)

EISPACK was a brilliant library based on 1960s-vintage numerical analysis, and running on 1960s or 1970s computer hardware there was nothing to match it. But if your matrices are bigger than about 500 rows and columns, forget about it.

LAPACK is newer, and much more solid, and the LAPACK project is now getting into 21st century high performance computing (it started in the 20th).

TheOldHag said:

My experience over the last 20 or 30 years says, the devil is always in the detail. There wasn't anything that raised my eyebrows much (either in joy or disbelief) but without spending a month playing with it, I don't think that counts for much either way.
 
  • #9
AlephZero said:
My experience over the last 20 or 30 years says, the devil is always in the detail. There wasn't anything that raised my eyebrows much (either in joy or disbelief) but without spending a month playing with it, I don't think that counts for much either way.

I agree with your sentiment. But do you think LAPACK would be able to handle these massively sized matrices? Mahout advertises it, but I have some skepticism until I dig in. Their documentation is horrible and jumbled compared to LAPACK's, from what I have seen. Every other sentence in large parts of their documentation is pushing you to install a new open source project before you can proceed. Very disconnected.
 

1. What is the SVD Low-Rank Approximation Algorithm?

The SVD Low-Rank Approximation Algorithm is a mathematical method used to approximate a matrix by reducing its rank. It utilizes the Singular Value Decomposition (SVD) technique to decompose a matrix into singular values and corresponding vectors, and then approximates the original matrix by keeping only the most significant singular values and their corresponding vectors.
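In symbols, keeping only the ##k## largest singular values gives
$$A \approx A_k = U_k \Sigma_k V_k^T = \sum_{i=1}^{k} \sigma_i\, u_i v_i^T,$$
where ##\sigma_1 \ge \sigma_2 \ge \dots## are the singular values and ##u_i##, ##v_i## the corresponding left and right singular vectors. By the Eckart–Young theorem, ##A_k## is the closest rank-##k## matrix to ##A## in both the spectral and Frobenius norms.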

2. When is the SVD Low-Rank Approximation Algorithm used?

The SVD Low-Rank Approximation Algorithm is commonly used in data compression, noise reduction, and dimensionality reduction. It can also be used in machine learning tasks such as recommender systems, image and signal processing, and natural language processing.

3. How does the SVD Low-Rank Approximation Algorithm work?

The algorithm works by first decomposing the original matrix A into three factors, A = UΣV^T, where U and V are orthogonal matrices and Σ is a diagonal matrix with the singular values of A on its diagonal. These singular values measure the importance of each direction in the original matrix. The algorithm then approximates the original matrix by keeping only the most significant singular values and the corresponding columns of U and V.
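A minimal sketch of that truncation step (dense NumPy, written for clarity rather than for scale):

```python
import numpy as np

def low_rank_approx(A, k):
    """Best rank-k approximation of A, obtained by truncating its SVD."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

A = np.random.rand(8, 5)
A2 = low_rank_approx(A, 2)
print(np.linalg.matrix_rank(A2))      # 2
print(np.linalg.norm(A - A2))         # equals sqrt of the sum of the discarded sigma_i^2
```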

4. What are the benefits of using the SVD Low-Rank Approximation Algorithm?

One of the main benefits of this algorithm is that it can reduce the size of a matrix without losing too much information. This makes it useful for compressing large datasets or reducing the dimensionality of high-dimensional data. It can also help to remove noise from data and improve the performance of machine learning models by reducing the complexity of the data.

5. What are the limitations of the SVD Low-Rank Approximation Algorithm?

Although the SVD itself can be applied to any rectangular matrix, computing it exactly requires significant computational resources, which makes the full decomposition impractical for very large datasets; truncated or iterative methods are needed instead. The quality of a low-rank approximation also depends on how quickly the singular values decay: if they decay slowly, keeping only a few of them discards a significant amount of information.
