How were these line fitting equations derived?

  • Thread starter samh
  • #1

Main Question or Discussion Point

I've run into these formulas many times but I've never seen how they were derived.

Say you want to fit a line to some data. Your data is a bunch of (xi,yi) pairs.

Through some kind of magic, the slope m and the y-intercept b are given by these formulas:

[tex]m= \frac{n\sum x_iy_i- \sum x_i\sum y_i}{n\sum x_i^2- \left(\sum x_i\right)^2}[/tex]
[tex]b= \frac{\sum x_i^2\sum y_i- \sum x_i\sum x_iy_i}{n\sum x_i^2- \left(\sum x_i\right)^2}[/tex]

My attempt at deriving them:

I set up a matrix equation Ax=b and use the pseudoinverse to find the least-squares solution: x = inv(A'*A)*A'*b where A' is A transpose.

Annnnnd now I'm stuck. :) How would you go from that to those two formulas? Or maybe I'm headed in the wrong direction...
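
For concreteness, here's a small NumPy sketch I put together (the data is made up) to check numerically that the pseudoinverse solution matches those two formulas:

[code]
import numpy as np

# made-up (x_i, y_i) data
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 1.9, 3.2, 3.8, 5.1])
n = len(x)

# design matrix A with rows [x_i, 1]
A = np.column_stack([x, np.ones(n)])

# least-squares solution via the pseudoinverse: inv(A'A) A' y
m_ls, b_ls = np.linalg.inv(A.T @ A) @ A.T @ y

# the closed-form formulas
det = n * np.sum(x**2) - np.sum(x)**2
m = (n * np.sum(x * y) - np.sum(x) * np.sum(y)) / det
b = (np.sum(x**2) * np.sum(y) - np.sum(x) * np.sum(x * y)) / det

print(m_ls, b_ls)  # same values as below
print(m, b)
[/code]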
 

Answers and Replies

  • #2
HallsofIvy
Science Advisor
Homework Helper
Finding a, b so that y= ax+ b gives the "least squares" best fit to the set of points [itex](x_i, y_i)[/itex] means "solving" the equations [itex]y_i= ax_i+ b[/itex] for all i. That is equivalent to the matrix equation Ax= y or
[tex]\begin{bmatrix}x_1 & 1 \\ x_2 & 1 \\ \cdot & \cdot \\ \cdot & \cdot \\ \cdot & \cdot \\ x_n & 1\end{bmatrix}\begin{bmatrix} a \\ b\end{bmatrix}= \begin{bmatrix} y_1 \\ y_2 \\ \cdot \\ \cdot \\ \cdot \\ y_n\end{bmatrix}[/tex]

Now that is a linear transformation from [itex]R^2[/itex] to [itex]R^n[/itex] and so its image is a 2 dimensional subspace of [itex]R^n[/itex]. If the vector
[tex]\begin{bmatrix} y_1 \\ y_2 \\ \cdot \\ \cdot \\ \cdot \\ y_n\end{bmatrix}[/tex]
happens to lie in that subspace, that is, if the points happen to lie on a straight line, then there would be an exact solution. If not, then you are looking for the <a, b> that comes closest to that. Geometrically, you can find that solution vector by dropping a perpendicular from y to that two dimensional subspace. If [itex]\overline{x}[/itex] gives that "optimal" solution, that is, if [itex]A\overline{x}[/itex] is closest to y, then [itex]A\overline{x}- y[/itex] is perpendicular to that two dimensional subspace, the space of all [itex]Ax[/itex]. That means that, for any x, the inner product [itex]<Ax, A\overline{x}- y>= 0[/itex].

Now the "adjoint" of A, [itex]A^+[/itex], has the property that [itex]<Ax, y>= <x, A^+y>[/itex] so here we have [itex]<Ax, A\overline{x}- y>= <x, A^+(A\overline{x}- y)>[/itex][itex]=<x, A^+A\overline{x}- A^+y>= 0[/itex]. But since that is true for all x in R2, we must have [itex]A^+A\overline{x}- A^+y= 0[/itex] or [itex]A^+A\overline{x}= A^+y[/itex].

Now we can solve for [itex]\overline{x}[/itex] by multiplying both sides by [itex](A^+A)^{-1}[/itex]:
[itex]\overline{x}= (A^+A)^{-1}A^+y[/itex]. That is the "pseudoinverse" you refer to.

Because, as before,
[tex]A= \begin{bmatrix}x_1 & 1 \\ x_2 & 1 \\ \cdot & \cdot \\ \cdot & \cdot \\ \cdot & \cdot \\ x_n & 1\end{bmatrix}[/tex]
we have
[tex]A^+= \begin{bmatrix}x_1 & x_2 & \cdot\cdot\cdot & x_n \\ 1 & 1 & \cdot\cdot\cdot & 1\end{bmatrix}[/tex]

So that
[tex]A^+A= \begin{bmatrix}\sum_{i= 1}^n x_i^2 & \sum_{i=1}^n x_i \\ \sum_{i=1}^n x_i & n\end{bmatrix}[/tex]
and
[tex]A^+ y= \begin{bmatrix}\sum_{i=1}^n x_iy_i \\ \sum_{i=1}^n y_i\end{bmatrix}[/tex]

So you want to solve the matrix equation
[tex]\begin{bmatrix}\sum_{i= 1}^n x_i^2 & \sum_{i=1}^n x_i \\ \sum_{i=1}^n x_i & n\end{bmatrix}\begin{bmatrix}a \\ b\end{bmatrix}= \begin{bmatrix}\sum_{i=1}^n x_iy_i \\ \sum_{i=1}^n y_i\end{bmatrix}[/tex]

That should be easy: it's just a 2 by 2 matrix, and the inverse of
[tex]\begin{bmatrix}A & B \\ C & D\end{bmatrix}[/tex]
is
[tex]\frac{1}{AD- BC}\begin{bmatrix}D & -B \\ -C & A\end{bmatrix}[/tex]
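
Writing that inverse out for our matrix gives exactly the formulas in your first post (with a playing the role of your m):
[tex]\begin{bmatrix}a \\ b\end{bmatrix}= \frac{1}{n\sum x_i^2- \left(\sum x_i\right)^2}\begin{bmatrix}n & -\sum x_i \\ -\sum x_i & \sum x_i^2\end{bmatrix}\begin{bmatrix}\sum x_iy_i \\ \sum y_i\end{bmatrix}[/tex]
so
[tex]a= \frac{n\sum x_iy_i- \sum x_i\sum y_i}{n\sum x_i^2- \left(\sum x_i\right)^2},\qquad b= \frac{\sum x_i^2\sum y_i- \sum x_i\sum x_iy_i}{n\sum x_i^2- \left(\sum x_i\right)^2}[/tex]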
 
  • #3
What an amazing reply! Thanks so much. It's so simple now that I see it, haha.
 
  • #4
Borek
Mentor
I seem to remember it can be derived just by finding minimum value of

[tex]\sum(y_i - a x_i - b)^2[/tex]

that is, solving

[tex]\frac { \partial \sum(y_i - a x_i - b)^2 } { \partial a } = 0[/tex]

and

[tex]\frac { \partial \sum(y_i - a x_i - b)^2 } { \partial b } = 0[/tex]
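
Expanding those derivatives gives the same normal equations as in post #2:

[tex]\frac { \partial } { \partial a } \sum(y_i - a x_i - b)^2 = -2\sum x_i(y_i - a x_i - b) = 0[/tex]

[tex]\frac { \partial } { \partial b } \sum(y_i - a x_i - b)^2 = -2\sum (y_i - a x_i - b) = 0[/tex]

which rearrange to

[tex]a\sum x_i^2 + b\sum x_i = \sum x_iy_i, \qquad a\sum x_i + nb = \sum y_i[/tex]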
 
  • #5
HallsofIvy
Science Advisor
Homework Helper
Yeah, but my explanation was much cooler! :rofl:

More seriously, it also has the advantage that the same argument can be applied directly to "least squares parabola", "least squares cubic", etc.

For a least squares parabola, use
[tex]\begin{bmatrix}x_1^2 & x_1 & 1 \\ x_2^2 & x_2 & 1 \\ \cdot & \cdot & \cdot \\ \cdot & \cdot & \cdot \\ \cdot & \cdot & \cdot \\ x_n^2 & x_n & 1\end{bmatrix}[/tex]
instead of
[tex]\begin{bmatrix}x_1 & 1 \\ x_2 & 1 \\ \cdot & \cdot \\ \cdot & \cdot \\ \cdot & \cdot \\ x_n & 1\end{bmatrix}[/tex]
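
If you want to try the parabola case numerically, here is a small NumPy sketch (made-up data again); it's the same recipe, just one more column in A:

[code]
import numpy as np

# made-up data roughly following a parabola
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.0, 2.1, 5.2, 9.8, 17.1])

# design matrix with rows [x_i^2, x_i, 1]
A = np.column_stack([x**2, x, np.ones(len(x))])

# least-squares coefficients for y = a x^2 + b x + c
a, b, c = np.linalg.inv(A.T @ A) @ A.T @ y
print(a, b, c)
[/code]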
 
  • #6
Borek
Mentor
[tex]
\sum(y_i - a x_i^2 - b x_i - c)^2
[/tex]

Still doable :tongue2:
 
