# Large weighted least squares system

1. Apr 17, 2015

### vibe3

I have a large weighted least squares system with millions of observations. I can't store the least squares matrix in memory, so I only store the normal equations matrix:

$$A^T W A$$

where $A$ is dense and n-by-p and $W = diag(w_1,...,w_n)$. It takes a very long time to build this normal equations matrix due to the large number of observations. However, I also want to iteratively re-weight the observations to reduce the effect of outliers, and so at the next iteration, the weight matrix will be the product of the original weights $W$ with some new weights $V = diag(v_1,...,v_n)$. Therefore I need to construct the matrix:

$$A^T W' A$$

where $W' = diag(w_1 v_1, w_2 v_2, ..., w_n v_n)$.

Does anyone know of a slick (fast) way to determine the matrix $A^T W' A$ given the matrix $A^T W A$ and the new weights $V$? I am hoping I don't need to start over from scratch to build $A^T W' A$ for each iteration, since it takes so long.

I did search the literature and didn't find much on this topic.
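
For reference, the matrix has the rank-one structure $A^T W A = \sum_{i=1}^n w_i\, a_i a_i^T$ (where $a_i$ is the $i$-th row of $A$), so the new weights $v_i$ multiply each term individually. Right now I build it by streaming row blocks of $A$ from disk, roughly like this (interface names are just illustrative):

```python
import numpy as np

def build_normal_matrix(chunks, p):
    """Accumulate A^T W A by streaming row blocks of A from disk.

    `chunks` yields (A_k, w_k) pairs, where A_k is an m_k-by-p block of
    rows of A and w_k the corresponding weights (illustrative interface).
    """
    ATA = np.zeros((p, p))
    for A_k, w_k in chunks:
        # Each block contributes sum_i w_i a_i a_i^T = A_k^T diag(w_k) A_k
        ATA += A_k.T @ (w_k[:, None] * A_k)
    return ATA
```

At each robust-reweighting iteration I'd currently have to redo this whole loop with $w_i v_i$ in place of $w_i$, which is what I'm trying to avoid.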

2. Apr 17, 2015

### SteamKing

Staff Emeritus
Yeah, forget forming the normal equations altogether. You should use a more modern least-squares algorithm based on either a QR or an SVD decomposition.

If the matrices are too large to store in memory at one time, then use an out-of-core solver.
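
Schematically, for a problem that does fit in memory, the idea is to work with $\sqrt{W}\,A$ directly instead of ever forming $A^T W A$. A toy sketch (small sizes just to show the shape of the computation):

```python
import numpy as np

# Weighted least squares via QR on sqrt(W) A, avoiding the normal equations.
rng = np.random.default_rng(0)
n, p = 1000, 5                        # toy sizes for illustration
A = rng.standard_normal((n, p))
x_true = rng.standard_normal(p)
b = A @ x_true + 0.01 * rng.standard_normal(n)
w = rng.uniform(0.5, 2.0, n)          # observation weights

sw = np.sqrt(w)
Aw = sw[:, None] * A                  # sqrt(W) A
bw = sw * b                           # sqrt(W) b

Q, R = np.linalg.qr(Aw)               # thin QR
x = np.linalg.solve(R, Q.T @ bw)      # solve R x = Q^T sqrt(W) b
```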

3. Apr 17, 2015

### vibe3

My matrix $A$ has 8 million rows (observations) and 2000 columns (parameters). The normal equations matrix $A^T W A$ is therefore 2000-by-2000, which can easily be handled by LAPACK to solve for the unknown parameters.
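
For what it's worth, that solve step is the cheap part: once $A^T W A$ and $A^T W b$ are accumulated it's just a small symmetric positive-definite system, something like this sketch (illustrative variable names, e.g. via a Cholesky routine):

```python
from scipy.linalg import cho_factor, cho_solve

# ATA = accumulated A^T W A (p-by-p), ATb = accumulated A^T W b (length p).
# Cholesky-factor the small SPD normal matrix and back-substitute.
c, low = cho_factor(ATA)
x = cho_solve((c, low), ATb)
```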

I can't even store the full matrix $A$ in memory to attempt a QR or SVD on it. How do you propose I use these decompositions? Do you have a reference?

4. Apr 17, 2015

### MarneMath

If you can't store it in memory, then don't. If you're familiar with MapReduce, rent an Amazon cluster and use Hadoop to run the QR decomposition in MapReduce land. It'll generally compute faster, and memory won't be an issue.

Also, relating to SteamKing's comment, here's a paper on the out-of-core approach.

http://www.tau.ac.il/~stoledo/Bib/Pubs/pp01-qr.pdf

It's specifically aimed at cases where the matrix cannot be stored in memory, only on disk.

(Although personally, MapReduce on a cluster is easier for me to handle.)
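
The trick that makes QR feasible at this scale (and roughly what the MapReduce versions do) is a blocked, tall-and-skinny QR: you never need more than one row block plus the running $p$-by-$p$ $R$ factor in memory at a time. A rough sequential sketch, with an illustrative block interface:

```python
import numpy as np

def blocked_qr_R(chunks, p):
    """Compute the R factor of sqrt(W) A by streaming row blocks.

    `chunks` yields (A_k, w_k) row blocks of A with their weights
    (illustrative interface).  Only a p-by-p R is kept in memory.
    """
    R = np.zeros((0, p))
    for A_k, w_k in chunks:
        Awk = np.sqrt(w_k)[:, None] * A_k        # sqrt(W_k) A_k
        # QR of the running R stacked on top of the new block
        R = np.linalg.qr(np.vstack([R, Awk]), mode='r')
    # R satisfies R^T R = A^T W A (rows may differ in sign from a one-shot QR)
    return R
```

In the MapReduce setting the per-block factorizations run in parallel and the resulting small $R$ factors are combined the same way.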

5. Apr 17, 2015

### SteamKing

Staff Emeritus
It's not clear that forming the normal equations from that many observations will produce a useful result. Forming $A^T W A$ squares the condition number of the problem, so round-off error could give you a garbage regression.

Not at the moment. With such a large problem to solve, a full QR or SVD may not be practical. You may have to resort to an iterative solution.

The point is that the numerical techniques that work well for fitting a regression line through a handful of data points may not be ideally suited when you have millions of data points and thousands of parameters.
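
If you do go the iterative route, one standard option is LSQR, where each iteration only streams over the data and the normal matrix is never formed. A rough sketch in Python/SciPy, assuming a `row_blocks()` callable that yields $(A_k, w_k, b_k)$ row blocks of $A$ with their weights and right-hand side (all names illustrative):

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, lsqr

def solve_weighted_ls(row_blocks, n, p):
    """Solve min || sqrt(W)(A x - b) ||_2 iteratively with LSQR,
    streaming A from disk and never forming A^T W A explicitly."""

    def matvec(x):                      # sqrt(W) A x, one block at a time
        return np.concatenate(
            [np.sqrt(w) * (Ak @ x) for Ak, w, _ in row_blocks()])

    def rmatvec(y):                     # A^T sqrt(W) y, one block at a time
        out, i = np.zeros(p), 0
        for Ak, w, _ in row_blocks():
            m = Ak.shape[0]
            out += Ak.T @ (np.sqrt(w) * y[i:i + m])
            i += m
        return out

    op = LinearOperator((n, p), matvec=matvec, rmatvec=rmatvec)
    bw = np.concatenate([np.sqrt(w) * bk for _, w, bk in row_blocks()])
    return lsqr(op, bw)[0]
```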