# Row selection of matrix and the condition number

1. Jan 21, 2014

### divB

Hi,

Given an over-determined system of linear equations y=A c, the condition number of matrix A essentially says how good vector c can be restored from measurements y.

Changing the order of rows clearly does not change the condition number.

But is there information/literature on how to best choose a subset of rows to improve the condition number?

Say, e.g., vector c is small (30) but a large number of measurements are available (>>1000) and the goal is to reduce the number of measurements as much as possible (e.g. I only want to use 50 measurements). From experiments I find I get best results when just picking a random set of 50 rows. But is this really the case? Is there a way to proof this/understand this?

Given the fact I know the structure of the matrix A, is there a chance to find an optimal selection of rows?

Thanks
divB

2. Jan 21, 2014

### Office_Shredder

Staff Emeritus
The condition number of a matrix basically is a measure of how badly skewed its rows are from being orthogonal. If you pick ~n vectors from $\mathbb{R}^n$ in a uniform way (which if you start with >> 1000 vectors and are restricting yourself to 30 you are probably doing something close to this), then there is a very high probability that they will all be nearly orthogonal to each other, meaning you will have a small condition number.

I don't recall for this kind of problem specifically but I know there are examples of high dimensional geometry/probability calculations applied to them where you can do slightly better by picking your vectors in a very controlled manner than by picking randomly. But the improvement is often small and the time it takes to pick the vectors is typically quite large; meaning it might be workable as a pre-processing step if you are using the same matrix a lot of times, but unless you have a lot of computing power to spare on each calculation is not a great strategy over random selection.

I question restricting your rows as a valid strategy to begin with though. What you are really seeing is a subset of the effect that if you have a very tall matrix (something much bigger than n x n), with very high probability it is almost an embedding. So your matrix A starts with a very good condition number, and when you restrict the rows I can only imagine you are getting a worse condition number (but luckily still quite good). You should be able to solve for c from A and y (or estimate the best guess) using all the information and the pseudo-inverse of A

http://en.wikipedia.org/wiki/Moore–Penrose_pseudoinverse#Linear_least-squares

in this case you will get an exact solution if y and c are perfect, and if y is a bit noisy, you will find the best fit for c that you can get. If your problem is that you can't compute the pseudoinverse of A because of size issues then your current strategy is probably the best.

3. Jan 21, 2014

### divB

Thanks, this helps a little bit.

Regarding the restriction strategy: Of course, the more rows the better. In this scenario, each row corresponds to a measurement and taking measurements are expensive. To be precice, the main goal is to reduce the number of measurements (=rows) as much as possible such that the recovery performance is still maximized. My goal is to understand this and maybe come up with a better selection because the random choice of measurements is also expensive ;-)

Is there any way I can analytically get a handle on this if I assume certain things?

As examples, suppose the matrix has Q=3 rows, N=128 measurements are available, M=10 rows should be taken.

(1) Suppose the Matrix is NxQ WGN. Then the rows should really not matter, right?

(2) Then suppose I take the first Q columns of the NxN DFT matrix. The condition number of this matrix is one. However, if I take the first 10 rows, the condition number is 240 (similarly for any consecutive block of 10 rows). If I take random 10 rows, the average is 8. If I take 10 measurements linearly spaced apart from row 1...128 I get 1.15. Better than random!

Shouldn't it be possible to analyze that using the the inner product (=measure of orthogonality) or maybe cross correlation between any two rows?
Condition nr of 1 means that all rows are perfectly orthogonal? Why don't I get an inner product of 0 for arbitrary rows?

(3) Now assume the same matrix as (2) but each row is multiplied by a constant. For simplicity, I assume the constants are WGN. The empiric results are the same as (2) (consecutive worst, then random, then linearly spaced).

There should be any way to relate the information to the condition nr. of the resulting matrix but I am stuck (my real matrix is one step more complicated).

Do you have an idea how I could proceed?

4. Jan 22, 2014

### Office_Shredder

Staff Emeritus
OK, I want to make sure I have a handle on your situation:

When you say you have thousands of rows, do you mean there is an extremely large but finite number of distinct measurements that you can take? For example you can't just take the measurement which is "find the third coordinate" but there are thousands of different kinds of measurements which you are capable of taking.

Are the measurements you are capable of taking the same each time, or different? This will make a huge difference as pre-processing a measurement matrix used thousands of times is more feasible than trying to do an operation on it thousands of times.

Why is randomly selecting which measurements to take expensive? I can ask matlab to give me 10000 random numbers and it takes me .0005 seconds to do it. Or do you simply mean that the number of measurements you have to take to get good results if you randomly select measurements is an expensive number of measurements to take.

5. Jan 23, 2014

### divB

Maybe it makes more sense if I become even more explicit: It is about sampling a signal (I do not know if you have heard about Compressed Sensing - it is similar like that).

You can think of each row corresponding to a sample at Nyquist rate, e.g. at 500 MHz. An ADC doing that at 12 bit costs 60\$, is very power hungry etc. Due to a specific "preprocessing" it is possible to get much less (but still arbitrary) rows at an arbitrarily lower, and thus cheaper rate. E.g. 100kHz with 12 bit are essentially "free". It's not exactly like that but I hope it gives an idea about the rationale.

Regarding randomness: This is due to practical implementation issues: Preprocessing a signal with some random data is just MUCH MORE expensive than with a fixed pattern or even fixed row selection. E.g., think of a simple RF mixer versus a PLL driven by arbitrary random frequencies which need to be synchronized with some digital backend etc. etc.

Thanks,
divB

Exactly. If you are familiar with adaptive filters, system identification or Wiener filter (actually a special case of exactly that - Least Squares): A signal is fed through a system, the rows correspond to the measurements of the output and the system coefficients are to be found

If I understand it correctly - each time different.

Again the system identification analogy: If the unknown system has 3 coefficients, I could just take the first 3 output samples. But usually it's a much better idea to get 1000 to improve the estimate.
In my case taking a sample is expensive, so the question is which one, in a certain time perdiod to pick when I want to limit myself to say 10 measurements.

Last edited: Jan 23, 2014
6. Jan 24, 2014

### Office_Shredder

Staff Emeritus
I don't understand what you mean when you say that taking a random measurement is more expensive than measuring a fixed row.

My interest in these types of problems is mostly in studying high dimensional geometry, so when you say something like
I don't really know any of what you mean haha.

7. Jan 26, 2014

### divB

It's just an implementation issue because the matrix operation corresponds to something "physical" on a piece of hardware.

;-) Nevermind, that was the reason I wanted to "hide" details ...

Just one thing more:

Are you sure this shouldn't mean columns?

8. Feb 16, 2014

### divB

May I push this thread? In particular, is this a correct statement?

First of all, shouldn't this mean columns? What is the interpretation of rows and columns in this sense?
Second, it sounds a little bit counter-intuitive: Shouldn't only the first n measurements (n being the #columns=unknowns) be orthogonal and the rest as linear dependent as possible in order to support the other measurements with low error?

9. Feb 17, 2014

### lurflurf

This is a very big question. As a first step you might like to read about the singular value decomposition. It addresses your problem. Given some large matrix we want a smaller matrix that approximates the larger matrix well in some sense. Depending on the particular problem the smaller matrix might be as a good or better than naive use of the larger matrix if we had a lot of noise in our measurements.

10. Feb 17, 2014

### divB

Thank you.
And does this also hold for overdetermined systems?
Is it better have the columns or the rows as orthogonal as possible in an overdetermined system?

Or asked differently: Suppose I want to give a good example of a stable 10x5 equation system. How would I choose the matrix? Of course, I would use the matrix with the smallest condition number. But that's difficult. Can't I just say something like: I would pick the matrix such that its columns are as orthogonal as possible? Or its rows? Or both?

11. Feb 18, 2014

### lurflurf

Yes it holds for overdetermined systems that is the usual case. If the condition number is induced by the Euclidean norm it is the ratio of the singular values. The truncated matrix is chosen to have condition number as small as possible. That is we throw out the lowest so many values. Before throwing out equations there is a change of basis so our remaining equations are combinations of the original equations. That is a more efficient use of available information.