Nearly Singular Matrix Inversion

  • Thread starter: EngWiPy
  • Tags: Inversion, Matrix

SUMMARY

The discussion centers on the challenges of inverting nearly singular matrices using Singular Value Decomposition (SVD). The matrix inversion is defined as H^{-1} = VΣ^{-1}U^H, where Σ contains singular values. A critical issue arises when singular values approach zero, necessitating a threshold to determine which values to consider for inversion. Participants recommend setting the threshold relative to the largest singular value, with effective ranges typically between 10^{-12} and 10^{-3}. The accuracy of the inversion is paramount for applications such as data detection in communication systems, where even slight inaccuracies can lead to erroneous results.

PREREQUISITES
  • Understanding of Singular Value Decomposition (SVD)
  • Knowledge of matrix inversion techniques
  • Familiarity with numerical precision issues in floating-point arithmetic
  • Basic concepts of Minimum Mean Squared Error (MMSE) equalization
NEXT STEPS
  • Research methods for determining optimal thresholds in SVD for matrix inversion
  • Explore numerical stability techniques in matrix computations
  • Learn about the implications of measurement noise on matrix inversion accuracy
  • Investigate advanced SVD applications in machine learning and data modeling
USEFUL FOR

Data scientists, engineers in signal processing, and researchers involved in communication systems who require accurate matrix inversion techniques for effective data detection and modeling.

EngWiPy
Hi,

I have a matrix that is nearly singular, so I am using the singular value decomposition (SVD) approach for inversion. In this case, an N-by-N matrix can be written as:

\mathbf{H}=\mathbf{U}\Sigma\mathbf{V}^H

Then the inverse can be written as:

\mathbf{H}^{-1}=\mathbf{V}\Sigma^{-1}\mathbf{U}^H

where

\Sigma=\text{diag}(\sigma_1,\ldots,\sigma_N)

The problem with singular matrices is that some of the singular values are zero, and hence the inverse of \Sigma will have some undefined entries. To overcome this problem, if a singular value is less than a threshold, its inverse is forced to 0.
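
For concreteness, here is a minimal NumPy sketch of the thresholded inversion just described; the function name and the default threshold are illustrative placeholders, not recommendations:

```python
import numpy as np

def svd_inverse(H, threshold=1e-12):
    """Invert H via SVD, forcing the reciprocal of any singular
    value below `threshold` to zero, as described above."""
    U, s, Vh = np.linalg.svd(H)          # H = U @ diag(s) @ Vh
    s_inv = np.zeros_like(s)
    keep = s > threshold
    s_inv[keep] = 1.0 / s[keep]
    # H^{-1} ~= V @ diag(s_inv) @ U^H
    return Vh.conj().T @ np.diag(s_inv) @ U.conj().T
```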

The question is: how do I choose the threshold such that the matrix is inverted with high accuracy?

Thanks
 
As a starter, you should make your threshold relative to the largest singular value. How low should you go? It depends in part on how you are representing the reals. For example, you are almost guaranteed to get a mess if you use double precision and set the threshold to less than 10^{-16} times the largest singular value. Even a value of 10^{-14} is a bit dubious. You are still bound to run into precision loss problems. A more reasonable scale factor is typically somewhere between 10^{-12} and 10^{-3}.

Another factor is what your matrix H represents. Those small singular values say that you don't really have N degrees of freedom. How many do you think you have? If you know, or think you know, that the answer is n (rather than N), then another approach is to take just the n largest singular values as having any real physical meaning and discard the rest.
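
A minimal sketch of both suggestions, assuming NumPy; `rel_tol` and the function name are illustrative placeholders:

```python
import numpy as np

def svd_inverse_relative(H, rel_tol=1e-6, rank=None):
    """Thresholded SVD inverse with the cutoff taken relative to
    the largest singular value, or with an explicit rank cutoff
    when the number of physical degrees of freedom n is known."""
    U, s, Vh = np.linalg.svd(H)          # s is sorted in descending order
    if rank is not None:
        keep = np.arange(s.size) < rank  # keep only the n largest SVs
    else:
        keep = s > rel_tol * s[0]        # cutoff relative to sigma_max
    s_inv = np.zeros_like(s)
    s_inv[keep] = 1.0 / s[keep]
    return Vh.conj().T @ np.diag(s_inv) @ U.conj().T
```

For what it's worth, NumPy's built-in np.linalg.pinv applies the same kind of relative cutoff through its rcond argument.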
 
D H said:
As a starter, you should make your threshold relative to the largest singular value. …

Again, what does "small" singular values mean? Less than 10^{-6}, for example? The matrix represents an equivalent communication channel where the components are correlated due to the type of processing at the receiver.

I am using the first approach you mentioned: I set the threshold as th*sigma_max, but for some matrices th=10^{-4} works fine, while for others it doesn't. For some matrices I tried th=0.1, 0.01, 0.001, ..., 1e-6, but none of them works.

How can I solve this issue? Is there another way to invert such matrices that is more stable and accurate?

Thanks
 
There is no clear-cut answer, and you are asking for what are often conflicting goals: stability and accuracy.

By way of analogy, suppose you are given 2001 measurements of some scalar quantity taken at 2001 different points in time. Your job: develop some polynomial model that describes how the quantity varies with time. You could develop a 2000th-degree polynomial and hit each of those sampled values exactly. That ridiculous polynomial might be accurate (you can't get any more accurate), but it's absolutely worthless for interpolation and worth even less for extrapolation. Accuracy is not the be-all and end-all.

What you should be looking for instead is the lowest-degree polynomial that still does a reasonable job of modeling your data. If you know how good those measurements are, you can use this as one measure of when to stop adding coefficients: stop when the residuals look like measurement noise. If you don't know the uncertainties in the measurements, another approach is to reserve some of those sampled values as evaluation points. You don't use those reserved points in the regression; you instead use them to evaluate the quality of the generated model.
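
As an illustration of the hold-out idea, here is a hypothetical sketch with synthetic data; the signal, the noise level, and the split are all made up for the example:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 2001 noisy samples of a low-degree signal.
t = np.linspace(0.0, 1.0, 2001)
y = 1.0 - 2.0 * t + 0.5 * t**3 + 0.05 * rng.standard_normal(t.size)

# Reserve every fifth sample for evaluation; fit on the rest.
holdout = np.arange(t.size) % 5 == 0
t_fit, y_fit = t[~holdout], y[~holdout]
t_val, y_val = t[holdout], y[holdout]

# Grow the degree until the held-out error stops improving.
for degree in range(1, 8):
    coeffs = np.polyfit(t_fit, y_fit, degree)
    rms = np.sqrt(np.mean((np.polyval(coeffs, t_val) - y_val) ** 2))
    print(f"degree {degree}: holdout RMS error {rms:.4f}")
```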

This is a fairly standard approach not just in curve fitting but throughout machine learning. (Curve fitting is just a rather simple form of machine learning.) Machine learning is oftentimes plagued by not knowing the quality of the measured values. It's much better to take those uncertainties into account in your modeling if you do know something about measurement errors. It's also a good idea to keep some data in reserve for evaluation.
 
D H said:
There is no clear-cut answer, and you are asking for what are often conflicting goals: stability and accuracy. …

I need to invert the matrix to detect transmitted symbols. In particular, I want to find this matrix:

\mathbf{G}=\mathbf{H}^H(\mathbf{H}\mathbf{H}^H+N_0\mathbf{H})^{-1}

where N_0 H is the covariance matrix of the noise. Here H is nearly singular. G is crucial for data detection: if it is not accurate enough, the detection process will be erroneous. I agree the inverse need not be 100% accurate, but it must be accurate enough that detection works as it would with a perfect inversion.
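
A sketch of how G could be formed with the thresholded SVD inverse discussed above; the relative tolerance is an arbitrary placeholder, and the N_0 H term simply follows the formula as stated:

```python
import numpy as np

def mmse_matrix(H, N0, rel_tol=1e-6):
    """Form G = H^H (H H^H + N0 H)^{-1}, with the inner inverse
    computed by the thresholded SVD approach."""
    A = H @ H.conj().T + N0 * H          # N0*H: noise covariance as stated above
    U, s, Vh = np.linalg.svd(A)
    keep = s > rel_tol * s[0]
    s_inv = np.zeros_like(s)
    s_inv[keep] = 1.0 / s[keep]
    A_inv = Vh.conj().T @ np.diag(s_inv) @ U.conj().T
    return H.conj().T @ A_inv
```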
 
S_David said:
I am using the first approach you mentioned: I set the threshold as th*sigma_max, but for some matrices th=10^{-4} works fine, while for others it doesn't. …

A practical approach is to make a plot of the normalized singular values, sorted in descending order, on a log scale.

Often, the plot has two distinct parts. The largest SVs decrease fairly quickly, and the smaller ones all have about the same value. The plot will show a clear "kink" between the two parts.

Set the threshold to filter out all the "small" SVs. If your matrix represents measured data, the numerical value of "small" will depend very much on the accuracy and resolution of the measurements. If you have some source of "noise" in the measurement system which has different amplitudes for different measurements, either you need to choose the filter value for each set of data, or (better) find a way to eliminate or at least stabilize the noise level in the measurement system.
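
A sketch of this diagnostic, assuming NumPy and Matplotlib; the automatic cutoff at the largest log-scale drop is only a heuristic (not a rule from this thread) and assumes no singular value is exactly zero:

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_sv_spectrum(H):
    """Plot the normalized singular values on a log scale to look
    for the 'kink' between the physical and noise-level SVs."""
    s = np.linalg.svd(H, compute_uv=False)      # sorted descending
    plt.semilogy(np.arange(1, s.size + 1), s / s[0], "o-")
    plt.xlabel("singular value index")
    plt.ylabel("normalized singular value")
    plt.grid(True)
    plt.show()
    # Crude automatic cutoff: the largest drop between consecutive
    # values on the log scale.
    drops = np.diff(np.log10(s / s[0]))
    return np.argmin(drops) + 1                 # index of the first "small" SV
```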
 
AlephZero said:
A practical approach is to make a plot of the normalized singular values, sorted in descending order, on a log scale. …

My channel is random (it changes), so how do I know which values are "small" enough to discard each time?

I just discovered that reducing th to 0.1 enhanced the performance, but the results were not the same as those of a journal paper I am trying to simulate. When I set th=1e-5, the results are almost the same as the paper's.

A question comes to my mind here: how can we know whether an approximate inversion is accurate enough, given that we don't know the exact inverse? Basically, I am transmitting signals over a wireless channel, and these signals interfere at the receiver. So, I am using a minimum mean squared error (MMSE) equalizer to reduce the effect of interference and noise as follows:

MSE=E\left[\|\mathbf{d}-\mathbf{G}\mathbf{y}\|^2\right]

where d is the vector of transmitted signals and y is the vector of received signals. A related question: does improved detection performance necessarily mean that the inversion is accurate enough?
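
Two sanity checks one could use, sketched under illustrative assumptions: the pseudo-inverse identity A A^+ A = A, and a Monte-Carlo estimate of the MSE in which white noise and BPSK symbols are assumed purely for the example:

```python
import numpy as np

def residual_check(A, A_pinv):
    """For an exact (pseudo-)inverse, A @ A_pinv @ A equals A,
    so this relative residual should be close to zero."""
    return np.linalg.norm(A @ A_pinv @ A - A) / np.linalg.norm(A)

def empirical_mse(G, H, N0, trials=10_000, rng=None):
    """Monte-Carlo estimate of E[||d - G y||^2]; white noise is
    assumed here only for illustration."""
    rng = rng or np.random.default_rng()
    N = H.shape[0]
    total = 0.0
    for _ in range(trials):
        d = rng.choice([-1.0, 1.0], size=N)             # e.g. BPSK symbols
        y = H @ d + np.sqrt(N0) * rng.standard_normal(N)
        total += np.sum(np.abs(d - G @ y) ** 2)
    return total / trials
```

A low empirical MSE says the equalizer works for detection, which is the practical goal here, but it does not by itself prove the inversion is accurate in a matrix-norm sense; the residual check addresses that separately.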
 
