Singular Value Decomposition: How Can We Use SVD Effectively in Practice?

In summary, this thread discusses a question about singular value decomposition and how to interpret the matrix U in relation to particular words. A link to a tutorial paper is provided, and the discussion works through the matrix and its components, points out errors in the paper's interpretation, and reconstructs what the correct interpretation should be.
  • #1
member 428835
Hello again PF!

I had a question on singular value decomposition. My question relates to a pepper I have been reading, and I'm wondering if any of you would be able to glance over the paper and let me know your thoughts?

The link is here: http://www.ling.ohio-state.edu/~kbaker/pubs/Singular_Value_Decomposition_Tutorial.pdf

My question is on page 22. Specifically, how does the author interpret matrix U with respect to the words car, doctor, hospital, wheel, and nurse?

I'm trying to get a good practical understanding of SVD. Any help is greatly appreciated!

josh
 
  • #2
The very top of the page is marked "ROUGH DRAFT - BEWARE suggestions <email address elided>". That certainly is true with regard to pages 21 and 22. The row and column labels are missing on matrix A on page 21, and what he said about the eigenvalues on page 22 is inconsistent.

Let me try to reconstruct that. This is a guess. A "ROUGH DRAFT", if you will. The matrix A is an occurrence matrix. Each element indicates how many times a given word occurs in a given document. With labels attached, here's my guess as to that matrix:
[tex]
\begin{array}{lrrrrr}
& \text{\#1} & \text{\#2} & \text{\#3} & \text{\#4} & \text{\#5} \\
\text{doctor} & 2 & 0 & 8 & 6 & 0 \\
\text{car} & 1 & 6 & 0 & 1 & 7 \\
\text{hospital} & 5 & 0 & 7 & 4 & 0 \\
\text{nurse} & 7 & 0 & 8 & 5 & 0 \\
\text{wheel} & 0 & 10 & 0 & 0 & 7
\end{array}[/tex]
The rows represent different words whose occurrence we want to study; the columns represent documents in which the words appear. I've made a guess as to which word corresponds to which row. I've just labeled the columns #1, #2, etc. The names of those documents aren't relevant here. (Obviously they are relevant in an analysis of those documents.)

Some obvious things that jump out, just looking at the matrix:
  • Doctors, nurses, and hospitals are deeply connected with one another. You can see this in documents #1, #3, and #4.
  • Cars and wheels are deeply connected with one another. Two of the documents, #2 and #5, only mention cars and wheels.
  • A bit weaker: nurses and cars are connected when doctors are scarce. Documents #1 and #4 mention cars, while document #3 doesn't; document #1 barely mentions doctors yet does mention cars, whereas document #3 mentions doctors a lot but doesn't mention cars at all.

The following link computes an SVD of that array, courtesy WolframAlpha.

Note that except for some sign changes, the U matrix in that document and the U matrix from WolframAlpha are the same. Negate an eigenvector and you still have an eigenvector.

The author of the paper completely messed up the interpretation of the matrix. The first column vector corresponds to the largest singular value, so it's the most significant eigenvector. That column says that doctors, nurses, and hospitals go hand in hand: a document that contains one of those words is most likely to contain all three. Cars and wheels are a bit irrelevant in this first eigenvector. The second eigenvector says that some documents contain the words car and wheel and none of the other words. The third eigenvector represents documents that don't contain the words doctor or wheel but do contain the word nurse, and to a lesser extent the words car and hospital. The fourth and fifth eigenvectors? Now we're in the noise. The corresponding singular values are small, particularly for a 5x5 matrix.
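
If you want to reproduce this yourself without WolframAlpha, here's a minimal numpy sketch using my guessed matrix above (the column signs may come out flipped relative to the paper, which is fine, since negating a singular vector gives another valid singular vector):

[code=python]
import numpy as np

# Occurrence matrix guessed above: rows = words, columns = documents #1..#5.
words = ["doctor", "car", "hospital", "nurse", "wheel"]
A = np.array([
    [2, 0, 8, 6, 0],   # doctor
    [1, 6, 0, 1, 7],   # car
    [5, 0, 7, 4, 0],   # hospital
    [7, 0, 8, 5, 0],   # nurse
    [0, 10, 0, 0, 7],  # wheel
], dtype=float)

# Full SVD: A = U @ diag(s) @ Vt, singular values sorted largest first.
U, s, Vt = np.linalg.svd(A)

print("singular values:", np.round(s, 2))
for j in range(U.shape[1]):
    print(f"column {j + 1} of U:")
    for word, u in zip(words, U[:, j]):
        print(f"  {word:9s} {u: .3f}")
[/code]

Reading the first couple of columns of U this way should reproduce the interpretation above: the first column groups doctor/hospital/nurse, the second groups car/wheel.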
 
  • #3
joshmccraney said:
Hello again PF!

I had a question on singular value decomposition. My question relates to a pepper I have been reading, and I'm wondering if any of you would be able to glance over the paper and let me know your thoughts?

josh

You should not try to read peppers or other vegetables. :biggrin:
 
  • #4
SteamKing said:
You should not try to read peppers or other vegetables. :biggrin:

What's wrong if I "red peppers" bahahaha? Do they hurt your eyes?
 
  • #5
Thanks a ton DH! A few follow-up questions for you, if that's okay?

First, the matrix ##U##: is it still sized as ##\text{words} \times \text{documents}##? It seems like the document number is no longer preserved. Is this correct?

Also, what does ##V^T## give us? Would it be ##\text{documents} \times \text{words}##? Or are words omitted?

Thanks a ton!
 
  • #6
If ##A = U\Sigma V^T##, then ##A^T = V\Sigma U^T##. So you can interpret the ##V## matrix that way.

Instead of showing you clusters of words that occur in individual documents, it is showing you the "clusters of documents" that contain individual words. That probably isn't so interesting, because in a real-life-sized example you would have many more words than documents. If you want to know which documents contain a particular word, you already know that from the ##A## matrix.
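
A one-line numerical check of that identity, as a numpy sketch with an arbitrary matrix standing in for the occurrence matrix:

[code=python]
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 3))  # any words-by-documents matrix will do

U, s, Vt = np.linalg.svd(A, full_matrices=False)

# If A = U Sigma V^T, then A^T = V Sigma U^T: same factors, with U and V swapping roles.
print(np.allclose(A.T, Vt.T @ np.diag(s) @ U.T))  # True
[/code]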
 
  • #7
joshmccraney said:
Thanks a ton DH! A few follow-up questions for you, if that's okay?

First, the matrix ##U##: is it still sized as ##\text{words} \times \text{documents}##? It seems like the document number is no longer preserved. Is this correct?

Also, what does ##V^T## give us? Would it be ##\text{documents} \times \text{words}##? Or are words omitted?
Suppose ##A## is an ##N \times M## matrix, where in this case ##N## is the number of words and ##M## is the number of documents. Then ##AA^T## will be an ##N \times N## matrix, which in this case is number of words by number of words. ##AA^T## shows how words are related by their mutual appearance (or lack thereof) across all of the documents. The ##U## matrix contains the eigenvectors of ##AA^T##.

The flip side of this is the matrix ##A^TA##. This is an ##M \times M## matrix, i.e., number of documents by number of documents. ##A^TA## shows how documents are related by using common words (or using disjoint words) across all of the words. The ##V## matrix contains the eigenvectors of ##A^TA##.
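
A small numpy sketch checking both relationships (random data standing in for an occurrence matrix; with the reduced SVD these are the eigenvectors belonging to the nonzero eigenvalues):

[code=python]
import numpy as np

rng = np.random.default_rng(2)
N, M = 5, 3                      # N words, M documents
A = rng.standard_normal((N, M))  # stand-in occurrence matrix

U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Columns of U are eigenvectors of A A^T (N x N), with eigenvalues s**2 ...
print(np.allclose(A @ A.T @ U, U * s**2))        # True
# ... and columns of V (the rows of Vt) are eigenvectors of A^T A (M x M).
print(np.allclose(A.T @ A @ Vt.T, Vt.T * s**2))  # True
[/code]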
 
  • #8
D H said:
Suppose ##A## is an ##N \times M## matrix, where in this case ##N## is the number of words and ##M## is the number of documents. Then ##AA^T## will be an ##N \times N## matrix, which in this case is number of words by number of words. ##AA^T## shows how words are related by their mutual appearance (or lack thereof) across all of the documents. The ##U## matrix contains the eigenvectors of ##AA^T##.

The flip side of this is the matrix ##A^TA##. This is an ##M \times M## matrix, i.e., number of documents by number of documents. ##A^TA## shows how documents are related by using common words (or using disjoint words) across all of the words. The ##V## matrix contains the eigenvectors of ##A^TA##.

Perfect, I get it now (I think)! Thanks so much. What would happen if we examined the row vectors of ##U## rather than the columns? Would they be anything significant?

How about ##V^T##? Do we still look at columns?

Thanks!
 
  • #9
You look at the columns of ##V##, not ##V^T##. That's one of the reasons the matrix is returned as ##V## rather than ##V^T##.
 
  • #10
Thanks!

Suppose we have the SVD of an ##n \times n## matrix, ##A = U \Sigma V^T##, where each factor in the decomposition is an ##n \times n## matrix. Let's say we are going to truncate for data reduction, so we zero out the last ##n-2## diagonal entries of ##\Sigma## and leave only the first two. Let's call this truncated matrix ##\Sigma '##. How will this help with data reduction? Generally, won't the matrix ##U \Sigma ' V^T## still be an ##n \times n## matrix that doesn't necessarily have any columns of zeros?
 
  • #11
If you set an entry of ##\Sigma## to zero, the corresponding column of ##U## and row of ##V^T## are multiplied by 0 in the product ##U\Sigma V^T##, so you can delete them from ##U## and ##V^T##.

If you only keep 2 non-zero entries in ##\Sigma##, you only need the corresponding 2 columns of ##U##, and the corresponding 2 rows of ##V^T## (or the 2 columns of ##V##).

You are right that ##U\Sigma V^T## is still a fully populated matrix, but you don't need to store it explicitly. When you need to use it in a calculation, you can use the truncated versions of ##U##, ##\Sigma## and ##V## instead.

This is analogous to not calculating the inverse of a matrix explicitly. If you want to solve some linear equations ##Ax = b##, the matrix ##A^{-1}## is usually fully populated even when ##A## is almost all zeros (e.g. ##A## is a tridiagonal matrix). You don't form the matrix ##A^{-1}## and then multiply it by ##b##. You do some form of Gaussian elimination that preserves the sparseness of ##A##.
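
Here is a rough numpy sketch of that idea, with a random matrix standing in for ##A##: keep the two largest singular values, then apply the rank-2 approximation through its truncated factors instead of ever forming the dense ##n \times n## product.

[code=python]
import numpy as np

rng = np.random.default_rng(0)
n, k = 100, 2

A = rng.standard_normal((n, n))   # stand-in for whatever A is in practice

U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Keep only the k largest singular values and the matching columns/rows.
Uk, sk, Vtk = U[:, :k], s[:k], Vt[:k, :]

# The rank-k approximation is still a dense n x n matrix if you form it explicitly...
Ak = Uk @ np.diag(sk) @ Vtk

# ...but you never have to form it.  To apply it to a vector, use the factors,
# which need only 2*n*k + k numbers of storage instead of n*n.
x = rng.standard_normal(n)
y = Uk @ (sk * (Vtk @ x))
print(np.allclose(y, Ak @ x))  # True
[/code]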
 

What is Singular Value Decomposition?

Singular Value Decomposition (SVD) is a mathematical technique used to decompose a matrix into three matrices, namely U, Σ, and V. It is commonly used for dimensionality reduction, data compression, and solving linear systems of equations.

Why is Singular Value Decomposition important?

Singular Value Decomposition is important because it allows us to find the underlying structure of a matrix and extract the most important features. It is also used in various applications such as image processing, data mining, and signal processing.

What are the applications of Singular Value Decomposition?

Singular Value Decomposition has a wide range of applications in various fields such as data compression, image processing, recommendation systems, and natural language processing. It is also used in solving linear systems of equations and in matrix factorization methods.

How does Singular Value Decomposition work?

Singular Value Decomposition works by decomposing a matrix A into three matrices, U, Σ, and V. The matrix U contains the left singular vectors, Σ is a diagonal matrix containing the singular values, and V contains the right singular vectors. The singular values represent the importance of each feature in the original matrix.
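
For example, here is a minimal numpy sketch (for illustration only) showing the shapes of the three factors and that they reconstruct the original matrix exactly:

[code=python]
import numpy as np

# Any rectangular matrix works; here a small 4 x 3 example.
A = np.arange(12, dtype=float).reshape(4, 3)

U, s, Vt = np.linalg.svd(A, full_matrices=False)

print(U.shape, s.shape, Vt.shape)           # (4, 3) (3,) (3, 3)
print(s)                                    # singular values, largest first
print(np.allclose(A, U @ np.diag(s) @ Vt))  # True: A is recovered exactly
[/code]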

What are the advantages of using Singular Value Decomposition?

Some of the advantages of using Singular Value Decomposition include its ability to handle large datasets, its efficiency in capturing the most important features, and its usefulness in solving linear systems of equations. It also helps in reducing the noise and redundancy in a dataset, making it easier to interpret and analyze the data.
