# How were these line fitting equations derived?

I've run into these formulas many times but I've never seen how they were derived.

Say you want to fit a line to some data. Your data is a bunch of $(x_i, y_i)$ pairs.

Through some kind of magic, the slope m and the y-intercept b are given by these formulas:

$$m = \frac{n\sum x_iy_i - \sum x_i\sum y_i}{n\sum x_i^2 - \left(\sum x_i\right)^2}$$
$$b = \frac{\sum y_i\sum x_i^2 - \sum x_i\sum x_iy_i}{n\sum x_i^2 - \left(\sum x_i\right)^2}$$

My attempt at deriving them:

I set up a matrix equation Ax = y and use the pseudoinverse to find the least-squares solution: x = inv(A'*A)*A'*y, where A' is A transpose.

Annnnnd now I'm stuck. :) How would you go from that to those two formulas? Or maybe I'm headed in the wrong direction...
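For a numeric sanity check, here is that pseudoinverse route written out in plain Python (a sketch with made-up data points; for an A whose rows are [x_i, 1], everything collapses to sums, and x = inv(A'*A)*A'*y reduces to exactly those two formulas):

```python
# Least-squares line fit via x = inv(A'*A)*A'*y, written out by hand.
# Made-up data points; for A with rows [x_i, 1] everything reduces to sums.
pts = [(0.0, 1.1), (1.0, 2.9), (2.0, 5.2), (3.0, 6.8)]

n = len(pts)
Sx = sum(x for x, _ in pts)
Sy = sum(y for _, y in pts)
Sxx = sum(x * x for x, _ in pts)
Sxy = sum(x * y for x, y in pts)

# A'*A = [[Sxx, Sx], [Sx, n]] and A'*y = [Sxy, Sy]; invert the 2x2
# and multiply out to get the slope m and intercept b in closed form.
det = n * Sxx - Sx * Sx
m = (n * Sxy - Sx * Sy) / det
b = (Sxx * Sy - Sx * Sxy) / det
print(m, b)
```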


## Answers and Replies

HallsofIvy
Science Advisor
Homework Helper
Finding a, b so that y= ax+ b gives the "least squares" best fit to the set of points $(x_i, y_i)$ means "solving" the equations $y_i= ax_i+ b$ for all i. That is equivalent to the matrix equation Ax= y or
$$\begin{bmatrix}x_1 & 1 \\ x_2 & 1 \\ \cdot & \cdot \\ \cdot & \cdot \\ \cdot & \cdot \\ x_n & 1\end{bmatrix}\begin{bmatrix} a \\ b\end{bmatrix}= \begin{bmatrix} y_1 \\ y_2 \\ \cdot \\ \cdot \\ \cdot \\ y_n\end{bmatrix}$$

Now that is a linear transformation from $R^2$ to $R^n$ and so its image is a 2 dimensional subspace of $R^n$. If the vector
$$\begin{bmatrix} y_1 \\ y_2 \\ \cdot \\ \cdot \\ \cdot \\ y_n\end{bmatrix}$$
happens to lie in that subspace, that is, if the points happen to lie on a straight line, then there is an exact solution. If not, then you are looking for the <a, b> that comes closest. Geometrically, you can find that solution vector by dropping a perpendicular from y to that two dimensional subspace. If $\overline{x}$ gives that "optimal" solution, that is, if $A\overline{x}$ is closest to y, then $A\overline{x}- y$ is perpendicular to that two dimensional subspace, the space of all $Ax$. That means that, for every x, the inner product $<Ax, A\overline{x}- y>= 0$.

Now the "adjoint" of A, $A^+$ (for real matrices this is just the transpose $A^T$), has the property that $<Ax, y>= <x, A^+y>$, so here we have $<Ax, A\overline{x}- y>= <x, A^+(A\overline{x}- y)>= <x, A^+A\overline{x}- A^+y>= 0$. But since that is true for all x in $R^2$, we must have $A^+A\overline{x}- A^+y= 0$, or $A^+A\overline{x}= A^+y$.

Now we can solve for $\overline{x}$ by multiplying both sides by $(A^+A)^{-1}$:
$\overline{x}= (A^+A)^{-1}A^+y$. That is the "pseudoinverse" you refer to.

Because, as before,
$$A= \begin{bmatrix}x_1 & 1 \\ x_2 & 1 \\ \cdot & \cdot \\ \cdot & \cdot \\ \cdot & \cdot \\ x_n & 1\end{bmatrix}, \qquad A^+= \begin{bmatrix}x_1 & x_2 & \cdot\cdot\cdot & x_n \\ 1 & 1 & \cdot\cdot\cdot & 1\end{bmatrix}$$

So that
$$A^+A= \begin{bmatrix}\sum_{i= 1}^n x_i^2 & \sum_{i=1}^n x_i \\ \sum_{i=1}^n x_i & n\end{bmatrix}$$
and
$$A^+ y= \begin{bmatrix}\sum_{i=1}^n x_iy_i \\ \sum_{i=1}^n y_i\end{bmatrix}$$

So you want to solve the matrix equation

$$\begin{bmatrix}\sum_{i= 1}^n x_i^2 & \sum_{i=1}^n x_i \\ \sum_{i=1}^n x_i & n\end{bmatrix}\begin{bmatrix}a \\ b\end{bmatrix}= \begin{bmatrix}\sum_{i=1}^n x_iy_i \\ \sum_{i=1}^n y_i\end{bmatrix}$$

That should be easy- it's just a 2 by 2 matrix, and the inverse of

$$\begin{bmatrix}A & B \\ C & D\end{bmatrix}$$
is
$$\frac{1}{AD- BC}\begin{bmatrix}D & -B \\ -C & A\end{bmatrix}$$
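Applying that 2 by 2 inverse literally, as a quick sketch in Python (the data points are my own, chosen to lie exactly on y = 2x + 1, so the fit should recover slope 2 and intercept 1):

```python
def solve2x2(M, v):
    # Inverse of [[A, B], [C, D]] is (1/(AD - BC)) * [[D, -B], [-C, A]].
    (A, B), (C, D) = M
    det = A * D - B * C
    return ((D * v[0] - B * v[1]) / det, (-C * v[0] + A * v[1]) / det)

pts = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]  # on y = 2x + 1
n = len(pts)
Sxx = sum(x * x for x, _ in pts)
Sx = sum(x for x, _ in pts)
Sxy = sum(x * y for x, y in pts)
Sy = sum(y for _, y in pts)

# [[sum x^2, sum x], [sum x, n]] [a, b]' = [sum xy, sum y]'
a, b = solve2x2(((Sxx, Sx), (Sx, n)), (Sxy, Sy))
print(a, b)  # recovers slope 2 and intercept 1
```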

What an amazing reply! Thanks so much. It's so simple now that I see it, haha.

Borek
Mentor
I seem to remember it can be derived just by finding minimum value of

$$\sum(y_i - a x_i - b)^2$$

that is, solving

$$\frac { \partial \sum(y_i - a x_i - b)^2 } { \partial a } = 0$$

and

$$\frac { \partial \sum(y_i - a x_i - b)^2 } { \partial b } = 0$$
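Spelling that out (carrying out the two derivatives; it ends at the same system the matrix route produces):

$$\frac { \partial } { \partial a } \sum(y_i - a x_i - b)^2 = -2\sum x_i(y_i - a x_i - b) = 0$$

$$\frac { \partial } { \partial b } \sum(y_i - a x_i - b)^2 = -2\sum (y_i - a x_i - b) = 0$$

which rearrange to

$$a\sum x_i^2 + b\sum x_i = \sum x_i y_i, \qquad a\sum x_i + nb = \sum y_i$$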

HallsofIvy
Science Advisor
Homework Helper
Yeah, but my explanation was much cooler!:rofl:

More seriously, it also has the advantage that the same argument can be applied directly to "least squares parabola", "least squares cubic", etc.

For a least squares parabola, use
$$\begin{bmatrix}x_1^2 & x_1 & 1 \\ x_2^2 & x_2 & 1 \\ \cdot & \cdot & \cdot \\ \cdot & \cdot & \cdot \\ \cdot & \cdot & \cdot \\ x_n^2 & x_n & 1\end{bmatrix}$$
instead of
$$\begin{bmatrix}x_1 & 1 \\ x_2 & 1 \\ \cdot & \cdot \\ \cdot & \cdot \\ \cdot & \cdot \\ x_n & 1\end{bmatrix}$$
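The same normal-equations recipe one degree up, sketched in Python (the helper `solve`, a small Gaussian elimination, is my own; points are chosen to lie exactly on y = x^2 - 2x + 3):

```python
def solve(M, v):
    # Gaussian elimination with partial pivoting on an n x n system M x = v.
    n = len(M)
    M = [row[:] + [v[i]] for i, row in enumerate(M)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

# Rows of A are [x^2, x, 1], so the normal equations A^T A c = A^T y are 3x3
# and are built from the moments S[k] = sum x^k and T[k] = sum x^k y.
pts = [(x, x * x - 2 * x + 3) for x in (0.0, 1.0, 2.0, 3.0)]
S = [sum(x ** k for x, _ in pts) for k in range(5)]
T = [sum((x ** k) * y for x, y in pts) for k in range(3)]
AtA = [[S[4], S[3], S[2]], [S[3], S[2], S[1]], [S[2], S[1], S[0]]]
Aty = [T[2], T[1], T[0]]
a, b, c = solve(AtA, Aty)
print(a, b, c)  # recovers 1, -2, 3
```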

Borek
Mentor
$$\sum(y_i - a x_i^2 - b x_i - c)^2$$

Still doable :tongue2: