# Least Square Solution

1. May 27, 2012

### Hiche

Assume we have this random matrix:$$\begin{pmatrix} 1 & 0 & 2\\ 2 & 0 & 4\\ 0 & 1 & 2 \end{pmatrix}$$
The least square solution can be found though: $A^TAx = A^Tb$. What is exactly meant by least square solution and is there a proof of some sort on how the formula was computed?

2. May 27, 2012

### algebrat

I'll answer the first part of your question, "what is meant by the least squares solution?" My explanation will lead right up to the point where we want b (in Ax=b) to be the projection of b onto the column space of A (since it may not lie in the column space already). The projection of b onto column space of A is Ax, where x is the solution for A^TAx=A^Tb. The last will always have a solution. I guess we could say Ax-b will have shortest length. By the way, for this x, Ax will be orthogonal to b-Ax, which when people prove this, the often include diagrams displaying this.

Suppose we have data sets of two measurements for each event: (x,u), (y,v), (z,w). We expect a linear relation, m and b, so u=mx+b, v=my+b, w=mz+b. But there may be no such line for any m and b. So we want a best fit. We want to find some m and b which minimize the sum of squares: (u-(mx+b))^2+(v-(my+b))^2+(w-(mz+b))^2. Let X be the matrix with columns (x,y,z) and (1,1,1), and let M be the vector (m,b), and U be the vector (u,v,w). Then if there had been a line going through all the data sets, we would have had a solution to XM=U. Now suppose there was possibly no line through all data sets. Then the column space of X does not contain U. Finding the closest point to U in the column space corresponds both to finding the projection of U onto the column space of X, and also minimizing the sum of squares of errors. So the M we want will correspond with XM being the projection of U onto the column space of X. Apparently, and this is the part i can remember, the projection of U onto the column space of X is given by XM, where M is the solution of X^TXM=X^TU.

Last edited: May 27, 2012
3. May 28, 2012

### HallsofIvy

Staff Emeritus
You have left out one important part of your question! "ATAx= A^Tb" is a "least squares solution" to what problem??

The "problem" is, of course, to solve Ax= b. If this were a basic algebra course, we would say "divide both sides by A". In matrix algebra, that would be "multiply both sides by the inverse of A". Here, we cannot do that because A does NOT have an inverse- its determinant is 0.

In terms of more general linear transformations from a vector space U to a vector space V, Ax= b has a unique solution if and only if A is both "one to one" and "onto". One can show that a linear transformation, from U to V, always maps U into a subspace of V. If A is not "one to one" and "onto", it maps U into some proper subspace. In that case, Ax= b will have a solution (perhaps an infinite number of solutions) if and only if b happens to lie in that subspace.

If A is not invertible, so that it maps U into some proper subspace of V, and b is not in that subspace, there is no solution to Ax= b. In that case, we have to ask "what is the best we can do?"- can we find and x so that Ax is closest to b?

To talk about "closest", we have to have a notion of distance, of course, so we have to have a "metric", perhaps a metric that can be derived from a "norm", and perhaps a norm that can be derived from a an "inner product". Now, lets go back from "linear transformation" to "n by n matrix" because an n by n matrix is a linear transformation from Rn to Rn- and there is a "natural" inner product on Rn, the "dot product".

Given inner products on two vector spaces, U and V, and a linear transformation, A, from U to V, the "adjoint" of A, AT, is defined as the linear transformation from V back to U such that for any u in U and any v in V, <Au, v>= <u, ATv> (notice that since Au is in V and ATv is in U, the first of those inner products is in V and the second in V). When U and V are Rn, the "adjoint" of matrix A is its adjoint, which is why I use the notation "AT".

Now, using the norm defined by that inner product ($|u|= \sqrt{<u, u>}$) and the metric defined by that norm (distance from u to v is $|u- v|= \sqrt{<u-v, u-v>}$), we get the usual "geometric" fact that the shortest distance from a point p to a subspace is along the line through p perpendicular to the subspace. That is, if b is the vector in Rn closest to Ax, which is in the subspace, we must have Ax- b perpendicular to the subspace which, in turn, means if v is any vector in the subspace, that is v= Au for some u, we must have <v, Ax- b>= <Au, Ax- b>= 0. That inner product is, of course, in the "range" of A, but we can use the "adjoint" (transpose) to cast that back to the domain: <Au, Ax- b>= <u, AT(Ax- b)>= <u, ATAx- ATb>= 0.

But u could be any member the domain so we must have ATAx- ATb= 0 so that ATAx= ATb.

Of course, this is called the "least squares" solution because we are minmizing the "distance" given by $\sqrt{<u, u>}$ which involves squares.

Last edited: May 28, 2012