# Why use SVD?

Gold Member
Hi PF!

Can you please tell me why it is beneficial to use SVD in data processing? My understanding is, given a lot of data, if we arrange it in a matrix, we can filter out the important pieces from the unimportant pieces.

Evidently, any matrix ##A## can be decomposed into ##U \Sigma V^T## where ##U## is an orthonormal basis for the row space of ##A## and ##V^T## is an orthogonal basis for column space of ##A##. ##\Sigma## is a diagonal matrix whose trace is the eigenvalues of ##A## in descending order. Apparently the values of ##\Sigma## report the "importance" of each component of ##A##.

I've read some articles dealing with the unit sphere and they make some sense, but I'm still not sure how to interpret what these orthogonal and diagonal vectors give me and how they filter data.

Can anyone add to this intuition and perhaps give me an example (not how to calculate SVD but how to interpret it)?

Thanks!

## Answers and Replies

Related Linear and Abstract Algebra News on Phys.org
AlephZero
Science Advisor
Homework Helper
##\Sigma## is a diagonal matrix whose trace is the eigenvalues of ##A## in descending order.
No, the diagonal terms of ##\Sigma## are the eigenvalues of ##A^TA##. The SVD works for rectangular matrices as well as square ones, but eigenvalues and eigenvectors are not defined for a rectangular matrix. ##A^TA## is always a square matrix.

There is nothing special about the order of the diagonals in ##\Sigma##, but usually if you want to ignore the "small" terms so it makes practical sense to sort them.

The general idea for applications is that if you want to solve an over- or under-determined system of equations ##Ax=b##, one idea is to use a least squares solution which in principle means you solve the equations ##A^TAx=A^Tb##. If you form the SVD of ##A## (which is always a numerically well-conditioned operation) ##A^TA = V \Sigma^2 V^T## and its inverse is ##V \Sigma^{-2} V^T##, so the formal solution to the least squares equations are ##x = V \Sigma^{-1} U^T b ##.

If the least squares problem was poorly conditioned, some of the diagonals of ##\Sigma## will be small (compared with the largest value) or exactly zero. The small terms in ##\Sigma## become large terms in ##\Sigma^{-1}## and contribute a large amount of "noise" to the value of ##V \Sigma^{-1} U^T##, which results in wild excursions of the solution for ##x##, in an attempt to squeeze the last drop of accuracy from the solution. It often makes more sense to just ignore the contribution from the "small" singular values, by taking ##\sigma_{ii}^{-1} = 0## when ##\sigma_{ii}## is small.

Because ##\Sigma## is diagonal, you can express ##V \Sigma^{-1} U^T## as the sum of contributions ##\displaystyle \sum_{i = 1}^n v_i\sigma_{ii}^{-1}u_i##, where ##v_i## amd ##u_i## are the columns of ##V## and ##U##.

Last edited:
1 person
WWGD
Science Advisor
Gold Member
2019 Award
@Aleph:
What do you mean by a poorly-conditioned problem?

Gold Member
Hey Aleph, so if we're given data for several velocity components, lets say ##v_x,v_y,v_z## (the x,y,z velocity components respectively), and perhaps we want to compute ##v_i v_j##, for all ##i## and ##j##, how would SVD help?

thanks so much!

Hi PF!
I've read some articles dealing with the unit sphere and they make some sense, but I'm still not sure how to interpret what these orthogonal and diagonal vectors give me and how they filter data.
Hi josh,

You mentioned you read about unit sphere and SVD, can you explain geometrically what you understand/don't understand?

AlephZero
Science Advisor
Homework Helper
@Aleph:
What do you mean by a poorly-conditioned problem?
Informally, a situation where the calculated answer is very sensitive to small changes in the data (for example changes of the same magnitude as the errors in measured data).

Common-sense logic says such changes don't have any practical significance - or if they are significant, you need more accurate data to calculate them reliably.

More formally, see http://en.wikipedia.org/wiki/Condition_number

AlephZero
Science Advisor
Homework Helper
Hey Aleph, so if we're given data for several velocity components, lets say ##v_x,v_y,v_z## (the x,y,z velocity components respectively), and perhaps we want to compute ##v_i v_j##, for all ##i## and ##j##, how would SVD help?
Sorry, I don't understand what you want to do from that description.

Gold Member
Sorry, I don't understand what you want to do from that description.
sorry.

suppose we take several velocity measurements in all three cartesian (x,y,z) directions and want to calculate ##\langle v_i v_j \rangle## where ##v_i## is the velocity component in the ##i^{th}## direction and ##i=1,2,3## for the 3 cartesian x,y,z axis, respectively.

my question is, how does SVD help us out? (i believe it should since ##v_i v_j## is a ##3 \times 3## matrix).

Last edited:
Gold Member
Hi josh,

You mentioned you read about unit sphere and SVD, can you explain geometrically what you understand/don't understand?
my understanding is, suppose matrix ##A## maps the unit circle from ##R^2## into ##R^3## as an off-axis, yet planar, ellipse. then ##V^T## takes two vectors to ##i## and ##j##. ##\Sigma## take ##i,j## into 3-space and tilts the xy axis and elongates the old ##i,j## into the shortest and longest distances from the center of the ellipse. lastly, ##U## takes the tilted ##xy## axis and makes it orthogonal to the ##z## axis in 3 space. it seems some slight revolution may occur.

my understanding is, suppose matrix ##A## maps the unit circle from ##R^2## into ##R^3## as an off-axis, yet planar, ellipse. then ##V^T## takes two vectors to ##i## and ##j##. ##\Sigma## take ##i,j## into 3-space and tilts the xy axis and elongates the old ##i,j## into the shortest and longest distances from the center of the ellipse. lastly, ##U## takes the tilted ##xy## axis and makes it orthogonal to the ##z## axis in 3 space. it seems some slight revolution may occur.
I suppose the ##A## matrix you use is a 3x2. And yes, A maps a circle to a ellipse, but I will try to clarify the process of how a circle is mapped to the ellipse geometrically using the SVD. So, a circle in ##R^2## is the set of unit vector, each one get mapped to another vector in ##R^3##. So, if you start with a unit vector ##p##, applying ##A##, you will get ##q##=##A####p##. If you write ##A## as ##U####\Sigma####V^T##, then, you have ##q##=##U####\Sigma####V^T####p##, in particular, let ##r##=##V^T####p##, ##s##=##\Sigma####r##. You will see ##r## is a ##R^2## vector (since ##V^T## is 2x2), while ##s## is ##R^3##. It turns out that any orthogonal matrix is a rotation matrix, for example, ##V^T## being a 2x2 orthogonal matrix, maps ##p## to ##r## by rotating ##p## about the origin (0,0) within the ##R^2## plane, and any vector is mapped by ##V^T## by rotating for the same angle about the origin. So, a circle in ##R^2## is mapped to another circle in the plane by rotation. And ##r## is still a unit vector. Now, applying ##\Sigma## to any unit vector in the new circle will map it to an ##R^3## vector, with the 3rd component always zero while the length is not preserved in general. So, for the circle, ##\Sigma## actually elongates or compresses the circle into a ellipse depending on the actual magnitude of its diagonal component, but the ellipse is still sitting in an ##R^2## plane in the ##R^3## space. The final 3x3 orthogonal matrix ##U## is again a rotation matrix, though it performs a 3-D rotation about some spatial axis. So, the ellipse will become tilted after that. The idea is the same for any mxn matrix: rotation -> elongate/compress -> rotation.

As for the velocity analysis you mentioned, it is not exactly clear what you are trying to do from the brief description. But I guess you are interested in using Principal component analysis? I am not very familiar with PCA, you might want to look it up from Wiki, I think it mentions SVD there.

Stephen Tashi
Science Advisor
My view of the SVD:

Think of a typical printed data table. The table has row headings and column headings. An entry gives a number that repersents some sort of result for the row and column entry that go with it.

A remakably simple type of data table is one where the row and column entries are some numbers (not necessarily integers and not necessarily in any particular order) and the entry in the table gives their product. This type of table is so simple that it would hardly be worth storing the whole table in a computer program. You need only store the row entries and column entries and you can quickly generate the data from that information.

Many tables of data are as not simple as the above type. A frequently occuring idea in mathematics is to express a complicated thing in terms of simple components. It is natural to ask if a complicated data table can be written as a linear combination of simple data tables. If you consider a matrix to be a table of data, its SVD tells you how to do that.