How were these line fitting equations derived?

  • Thread starter: samh
  • Tags: Fitting Line
AI Thread Summary
The discussion focuses on deriving the line fitting equations used in least squares regression. The user attempts to establish a matrix equation and apply the pseudoinverse to find the optimal slope and intercept for fitting a line to data points. They detail the steps involved in setting up the linear transformation and the geometric interpretation of finding the closest solution. The conversation also touches on extending the least squares method to fit parabolas and other polynomial forms. Ultimately, the derivation is clarified through matrix manipulation and calculus, showcasing the versatility of the least squares approach.
samh
I've run into these formulas many times but I've never seen how they were derived.

Say you want to fit a line to some data. Your data is a bunch of (xi,yi) pairs.

Through some kind of magic, the slope m and the y-intercept b are given by these formulas:

http://img262.imageshack.us/img262/9250/eb49c97d171bcd52e220911.png
http://img407.imageshack.us/img407/5681/bdea06a3287d543f035cac4.png

My attempt at deriving them:

I set up a matrix equation Ax=b and use the pseudoinverse to find the least-squares solution: x = inv(A'*A)*A'*b where A' is A transpose.

Annnnnd now I'm stuck. :) How would you go from that to those two formulas? Or maybe I'm headed in the wrong direction...
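
To make the pseudoinverse step concrete, here is a minimal numerical sketch, assuming numpy; the data values and variable names are made up for illustration:

import numpy as np

# made-up sample data points (x_i, y_i)
xs = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
ys = np.array([1.1, 1.9, 3.2, 3.8, 5.1])

# design matrix A with one row [x_i, 1] per data point
A = np.column_stack([xs, np.ones_like(xs)])

# least-squares solution x = inv(A'*A)*A'*b, with the data vector ys playing the role of b
m, b = np.linalg.inv(A.T @ A) @ A.T @ ys
print(m, b)  # slope and intercept of the fitted line
# np.linalg.lstsq(A, ys, rcond=None)[0] returns the same solution more robustly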
 
Finding a, b so that y= ax+ b gives the "least squares" best fit to the set of points (x_i, y_i) means "solving" the equations y_i= ax_i+ b for all i. That is equivalent to the matrix equation Ax= y or
\begin{bmatrix}x_1 & 1 \\ x_2 & 1 \\ \cdot & \cdot \\ \cdot & \cdot \\ \cdot & \cdot \\ x_n & 1\end{bmatrix}\begin{bmatrix} a \\ b\end{bmatrix}= \begin{bmatrix} y_1 \\ y_2 \\ \cdot \\ \cdot \\ \cdot \\ y_n\end{bmatrix}

Now that is a linear transformation from R^2 to R^n and so its image is a 2 dimensional subspace of R^n. If the vector
\begin{bmatrix} y_1 \\ y_2 \\ \cdot \\ \cdot \\ \cdot \\ y_n\end{bmatrix}
happens to lie in that subspace, that is, if the points happen to lie on a straight line, then there would be an exact solution. If not, then you are looking for the <a, b> that comes closest to it. Geometrically, you can find that solution vector by dropping a perpendicular from y to that two dimensional subspace. If \overline{x} gives that "optimal" solution, that is, if A\overline{x} is closest to y, then A\overline{x}- y is perpendicular to that two dimensional subspace, the space of all Ax. That means that, for every x, the inner product \langle Ax, A\overline{x}- y\rangle= 0.

Now the "adjoint" of A, A^+, has the property that \langle Ax, y\rangle= \langle x, A^+y\rangle, so here we have \langle Ax, A\overline{x}- y\rangle= \langle x, A^+(A\overline{x}- y)\rangle= \langle x, A^+A\overline{x}- A^+y\rangle= 0. But since that is true for all x in R^2, we must have A^+A\overline{x}- A^+y= 0, or A^+A\overline{x}= A^+y.

Now we can solve for \overline{x} by multiplying both sides by (A^+A)^{-1}:
\overline{x}= (A^+A)^{-1}A^+y. That is the "pseudoinverse" you refer to.

Because, as before,
A= \begin{bmatrix}x_1 & 1 \\ x_2 & 1 \\ \cdot & \cdot \\ \cdot & \cdot \\ \cdot & \cdot \\ x_n & 1\end{bmatrix}
we have
A^+= \begin{bmatrix}x_1 & x_2 & \cdot\cdot\cdot & x_n \\ 1 & 1 & \cdot\cdot\cdot & 1\end{bmatrix}

So that
A^+A= \begin{bmatrix}\sum_{i=1}^n x_i^2 & \sum_{i=1}^n x_i \\ \sum_{i=1}^n x_i & n\end{bmatrix}
and
A^+y= \begin{bmatrix}\sum_{i=1}^n x_iy_i \\ \sum_{i=1}^n y_i\end{bmatrix}

So you want to solve the matrix equation
\begin{bmatrix}\sum_{i=1}^n x_i^2 & \sum_{i=1}^n x_i \\ \sum_{i=1}^n x_i & n\end{bmatrix}\begin{bmatrix}a \\ b\end{bmatrix}= \begin{bmatrix}\sum_{i=1}^n x_iy_i \\ \sum_{i=1}^n y_i\end{bmatrix}

That should be easy: it's just a 2 by 2 matrix, and the inverse of
\begin{bmatrix}A & B \\ C & D\end{bmatrix}
is
\frac{1}{AD- BC}\begin{bmatrix}D & -B \\ -C & A\end{bmatrix}
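
Carrying out that 2 by 2 inversion and multiplying through gives the closed-form expressions, which should match the formulas in the linked images, assuming they are in the usual textbook form:

a = \frac{n\sum_{i=1}^n x_iy_i - \sum_{i=1}^n x_i\sum_{i=1}^n y_i}{n\sum_{i=1}^n x_i^2 - \left(\sum_{i=1}^n x_i\right)^2}, \qquad b = \frac{\sum_{i=1}^n x_i^2\sum_{i=1}^n y_i - \sum_{i=1}^n x_i\sum_{i=1}^n x_iy_i}{n\sum_{i=1}^n x_i^2 - \left(\sum_{i=1}^n x_i\right)^2}

with a playing the role of the slope m and b the intercept.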
 
What an amazing reply! Thanks so much. It's so simple now that I see it, haha.
 
I seem to remember it can be derived just by finding minimum value of

\sum(y_i - a x_i - b)^2

that is, solving

\frac { \partial \sum(y_i - a x_i - b)^2 } { \partial a } = 0

and

\frac { \partial \sum(y_i - a x_i - b)^2 } { \partial b } = 0
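
Carrying those derivatives through recovers the same normal equations as the matrix derivation above:

-2\sum_{i=1}^n x_i(y_i - a x_i - b) = 0, \qquad -2\sum_{i=1}^n (y_i - a x_i - b) = 0

which rearrange to

a\sum_{i=1}^n x_i^2 + b\sum_{i=1}^n x_i = \sum_{i=1}^n x_iy_i, \qquad a\sum_{i=1}^n x_i + nb = \sum_{i=1}^n y_i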
 
Yeah, but my explanation was much cooler!:smile:

More seriously, it also has the advantage that the same argument can be applied directly to "least squares parabola", "least squares cubic", etc.

For a least squares parabola, use
\begin{bmatrix}x_1^2 & x_1 & 1 \\ x_2^2 & x_2 & 1 \\ \cdot & \cdot & \cdot \\ \cdot & \cdot & \cdot \\ \cdot & \cdot & \cdot \\ x_n^2 & x_n & 1\end{bmatrix}
instead of
\begin{bmatrix}x_1 & 1 \\ x_2 & 1 \\ \cdot & \cdot \\ \cdot & \cdot \\ \cdot & \cdot \\ x_n & 1\end{bmatrix}
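
The same kind of numerical check works for the parabola. A minimal sketch, assuming numpy, with made-up data:

import numpy as np

xs = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
ys = np.array([0.9, 2.1, 5.2, 9.8, 17.1])

# design matrix with one row [x_i^2, x_i, 1] per point
A = np.column_stack([xs**2, xs, np.ones_like(xs)])

# same normal-equations solution as the line case, now a 3 by 3 system
a, b, c = np.linalg.inv(A.T @ A) @ A.T @ ys
print(a, b, c)  # np.polyfit(xs, ys, 2) returns the same coefficients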
 
\sum(y_i - a x_i^2 - b x_i - c)^2

Still doable :-p
 