How were these line fitting equations derived?

  • Thread starter samh
  • #1

Main Question or Discussion Point

I've run into these formulas many times but I've never seen how they were derived.

Say you want to fit a line to some data. Your data is a bunch of (xi,yi) pairs.

Through some kind of magic, the slope m and the y-intercept b are given by these formulas:

[tex]m= \frac{n\sum x_iy_i- \sum x_i\sum y_i}{n\sum x_i^2- \left(\sum x_i\right)^2}[/tex]
[tex]b= \frac{\sum x_i^2\sum y_i- \sum x_i\sum x_iy_i}{n\sum x_i^2- \left(\sum x_i\right)^2}[/tex]

My attempt at deriving them:

I set up a matrix equation Ax=b and use the pseudoinverse to find the least-squares solution: x = inv(A'*A)*A'*b where A' is A transpose.

Annnnnd now I'm stuck. :) How would you go from that to those two formulas? Or maybe I'm headed in the wrong direction...
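
For concreteness, here's a small NumPy sketch I put together (the data is made up) to check numerically that the pseudoinverse solution matches those two formulas:

[code]
import numpy as np

# made-up (x_i, y_i) data
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 1.9, 3.2, 3.8, 5.1])
n = len(x)

# design matrix A with rows [x_i, 1]
A = np.column_stack([x, np.ones(n)])

# least-squares solution via the pseudoinverse: inv(A'A) A' y
m_ls, b_ls = np.linalg.inv(A.T @ A) @ A.T @ y

# the closed-form formulas
det = n * np.sum(x**2) - np.sum(x)**2
m = (n * np.sum(x * y) - np.sum(x) * np.sum(y)) / det
b = (np.sum(x**2) * np.sum(y) - np.sum(x) * np.sum(x * y)) / det

print(m_ls, b_ls)  # same values as below
print(m, b)
[/code]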
 

Answers and Replies

  • #2
HallsofIvy
Science Advisor
Homework Helper
Finding a, b so that y= ax+ b gives the "least squares" best fit to the set of points [itex](x_i, y_i)[/itex] means "solving" the equations [itex]y_i= ax_i+ b[/itex] for all i. That is equivalent to the matrix equation Ax= y or
[tex]\begin{bmatrix}x_1 & 1 \\ x_2 & 1 \\ \cdot & \cdot \\ \cdot & \cdot \\ \cdot & \cdot \\ x_n & 1\end{bmatrix}\begin{bmatrix} a \\ b\end{bmatrix}= \begin{bmatrix} y_1 \\ y_2 \\ \cdot \\ \cdot \\ \cdot \\ y_n\end{bmatrix}[/tex]

Now that is a linear transformation from [itex]R^2[/itex] to [itex]R^n[/itex] and so its image is a 2 dimensional subspace of [itex]R^n[/itex]. If the vector
[tex]\begin{bmatrix} y_1 \\ y_2 \\ \cdot \\ \cdot \\ \cdot \\ y_n\end{bmatrix}[/tex]
happens to lie in that subspace, that is, if the points happen to lie on a straight line, then there would be an exact solution. If not, then you are looking for the <a, b> that comes closest to that. Geometrically, you can find that solution vector by dropping a perpendicular from y to that two dimensional subspace. If [itex]\overline{x}[/itex] gives that "optimal" solution, that is, if [itex]A\overline{x}[/itex] is closest to y, then [itex]A\overline{x}- y[/itex] is perpendicular to that two dimensional subspace, the space of all [itex]Ax[/itex]. That means that, for any x, the inner product [itex]<Ax, A\overline{x}- y>= 0[/itex].

Now the "adjoint" of A, [itex]A^+[/itex], has the property that [itex]<Ax, y>= <x, A^+y>[/itex] so here we have [itex]<Ax, A\overline{x}- y>= <x, A^+(A\overline{x}- y)>[/itex][itex]=<x, A^+A\overline{x}- A^+y>= 0[/itex]. But since that is true for all x in R2, we must have [itex]A^+A\overline{x}- A^+y= 0[/itex] or [itex]A^+A\overline{x}= A^+y[/itex].

Now we can solve for [itex]\overline{x}[/itex] by multiplying both sides by [itex](A^+A)^{-1}[/itex]:
[itex]\overline{x}= (A^+A)^{-1}A^+y[/itex]. That is the "pseudoinverse" you refer to.

Because, as before,
[tex]A= \begin{bmatrix}x_1 & 1 \\ x_2 & 1 \\ \cdot & \cdot \\ \cdot & \cdot \\ \cdot & \cdot \\ x_n & 1\end{bmatrix}[/tex]
we have
[tex]A^+= \begin{bmatrix}x_1 & x_2 & \cdot\cdot\cdot & x_n \\ 1 & 1 & \cdot\cdot\cdot & 1\end{bmatrix}[/tex]

So that
[tex]A^+A= \begin{bmatrix}\sum_{i= 1}^n x_i^2 & \sum_{i=1}^n x_i \\ \sum_{i=1}^n x_i & n\end{bmatrix}[/tex]
and
[tex]A^+ y= \begin{bmatrix}\sum_{i=1}^n x_iy_i \\ \sum_{i=1}^n y_i\end{bmatrix}[/tex]

So you want to solve the matrix equation
[tex]\begin{bmatrix}\sum_{i= 1}^n x_i^2 & \sum_{i=1}^n x_i \\ \sum_{i=1}^n x_i & n\end{bmatrix}\begin{bmatrix}a \\ b\end{bmatrix}= \begin{bmatrix}\sum_{i=1}^n x_iy_i \\ \sum_{i=1}^n y_i\end{bmatrix}[/tex]

That should be easy: it's just a 2 by 2 matrix, and the inverse of
[tex]\begin{bmatrix}A & B \\ C & D\end{bmatrix}[/tex]
is
[tex]\frac{1}{AD- BC}\begin{bmatrix}D & -B \\ -C & A\end{bmatrix}[/tex]
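
Writing that inverse out for our matrix gives exactly the formulas in your first post (with a playing the role of your m):
[tex]\begin{bmatrix}a \\ b\end{bmatrix}= \frac{1}{n\sum x_i^2- \left(\sum x_i\right)^2}\begin{bmatrix}n & -\sum x_i \\ -\sum x_i & \sum x_i^2\end{bmatrix}\begin{bmatrix}\sum x_iy_i \\ \sum y_i\end{bmatrix}[/tex]
so
[tex]a= \frac{n\sum x_iy_i- \sum x_i\sum y_i}{n\sum x_i^2- \left(\sum x_i\right)^2},\qquad b= \frac{\sum x_i^2\sum y_i- \sum x_i\sum x_iy_i}{n\sum x_i^2- \left(\sum x_i\right)^2}[/tex]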
 
  • #3
What an amazing reply! Thanks so much. It's so simple now that I see it, haha.
 
  • #4
Borek
Mentor
I seem to remember it can be derived just by finding minimum value of

[tex]\sum(y_i - a x_i - b)^2[/tex]

that is, solving

[tex]\frac { \partial \sum(y_i - a x_i - b)^2 } { \partial a } = 0[/tex]

and

[tex]\frac { \partial \sum(y_i - a x_i - b)^2 } { \partial b } = 0[/tex]
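
Expanding those derivatives gives the same normal equations as in post #2:

[tex]\frac { \partial } { \partial a } \sum(y_i - a x_i - b)^2 = -2\sum x_i(y_i - a x_i - b) = 0[/tex]

[tex]\frac { \partial } { \partial b } \sum(y_i - a x_i - b)^2 = -2\sum (y_i - a x_i - b) = 0[/tex]

which rearrange to

[tex]a\sum x_i^2 + b\sum x_i = \sum x_iy_i, \qquad a\sum x_i + nb = \sum y_i[/tex]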
 
  • #5
HallsofIvy
Science Advisor
Homework Helper
Yeah, but my explanation was much cooler! :rofl:

More seriously, it also has the advantage that the same argument can be applied directly to "least squares parabola", "least squares cubic", etc.

For a least squares parabola, use
[tex]\begin{bmatrix}x_1^2 & x_1 & 1 \\ x_2^2 & x_2 & 1 \\ \cdot & \cdot & \cdot \\ \cdot & \cdot & \cdot \\ \cdot & \cdot & \cdot \\ x_n^2 & x_n & 1\end{bmatrix}[/tex]
instead of
[tex]\begin{bmatrix}x_1 & 1 \\ x_2 & 1 \\ \cdot & \cdot \\ \cdot & \cdot \\ \cdot & \cdot \\ x_n & 1\end{bmatrix}[/tex]
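
If you want to try the parabola case numerically, here is a small NumPy sketch (made-up data again); it's the same recipe, just one more column in A:

[code]
import numpy as np

# made-up data roughly following a parabola
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.0, 2.1, 5.2, 9.8, 17.1])

# design matrix with rows [x_i^2, x_i, 1]
A = np.column_stack([x**2, x, np.ones(len(x))])

# least-squares coefficients for y = a x^2 + b x + c
a, b, c = np.linalg.inv(A.T @ A) @ A.T @ y
print(a, b, c)
[/code]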
 
  • #6
Borek
Mentor
[tex]
\sum(y_i - a x_i^2 - b x_i - c)^2
[/tex]

Still doable :tongue2:
 
