Least Squares Derivation question


Discussion Overview

The discussion revolves around the derivation of the least squares method for finding the best-fit line through a set of data points. Participants explore the mathematical steps involved in deriving the slope (m) and intercept (c) of the line, focusing in particular on the transformation of the equation for m and the implications of using the means of the x and y values.

Discussion Character

  • Technical explanation
  • Mathematical reasoning
  • Debate/contested

Main Points Raised

  • One participant expresses confusion about the transition from the equation for m as derived from residuals to the form presented in the book, questioning why x_i's become (x_i - \overline{x})'s.
  • Another participant suggests that substituting c with \overline{y} - m\overline{x} leads to a circular dependency that requires re-solving for m.
  • A different viewpoint indicates that the book's formulation assumes a model with zero mean for y, leading to a simplified model that estimates m through the origin.
  • Further elaboration is provided on the mathematical steps to derive the expression for m, including manipulation of the equations and consideration of sample variance.
  • One participant seeks clarification on specific lines of derivation, indicating that while they understand most of the process, they find certain transitions unclear.

Areas of Agreement / Disagreement

Participants do not reach a consensus on the derivation process, with multiple interpretations and methods presented. Some participants agree on the steps taken but express confusion about specific transformations, indicating that the discussion remains unresolved.

Contextual Notes

Participants highlight the potential for confusion due to assumptions made in the derivation, particularly regarding the treatment of means and the implications of different models used in the least squares method.

Xyius
So I am learning how to use the method of least squares to find the best-fit straight line, and there is one part of the derivation I do not fully understand. I was wondering if anyone could help me out.

So basically we start out with the residuals of the equation of a straight line:

y_i-mx_i-c
And now we take the sum of the squares of these residuals and find the values of m and c that minimize it by setting the partial derivatives to zero.

S=\Sigma (y_i-mx_i-c)^2
\frac{\partial S}{\partial m}=-2\Sigma x_i(y_i-mx_i-c)=0
\frac{\partial S}{\partial c}=-2\Sigma (y_i-mx_i-c)=0

Now from the second equation it is easy to see that
c=\overline{y}-m\overline{x}
since
\overline{y}=\frac{1}{n}\Sigma y_i
and
\overline{x}=\frac{1}{n}\Sigma x_i

Solving the first equation for m, we get:
m=\frac{\Sigma x_i y_i -c\Sigma x_i}{\Sigma x_i ^2}

The part I do not understand is the book says that m is equal to..

m=\frac{\Sigma (x_i-\overline{x})y_i}{\Sigma (x_i-\overline{x})^2}
I feel like I must be missing something simple due to the book's lack of explanation. Can anyone help me get this formula from the one I got for m? Why did the x_i's turn into (x_i-\overline{x})'s? Did they make c=0 for some reason? Any help would be appreciated!
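As a quick numerical sanity check (a minimal sketch with made-up data, nothing from the book), the book's centered formula agrees with the slope you get by solving the two normal equations directly:

```python
# Toy data (assumed for illustration only).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8]

n = len(x)
xbar = sum(x) / n

# Book's formula: m = sum((x_i - xbar) * y_i) / sum((x_i - xbar)^2)
m_book = sum((xi - xbar) * yi for xi, yi in zip(x, y)) \
       / sum((xi - xbar) ** 2 for xi in x)

# Solving the two normal equations jointly for m:
#   m * sum(x_i^2) + c * sum(x_i) = sum(x_i * y_i)
#   m * sum(x_i)   + c * n        = sum(y_i)
sx = sum(x)
sy = sum(y)
sxx = sum(xi * xi for xi in x)
sxy = sum(xi * yi for xi, yi in zip(x, y))
m_normal = (n * sxy - sx * sy) / (n * sxx - sx * sx)

print(abs(m_book - m_normal) < 1e-9)  # True: the two expressions agree
```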
 
Xyius said:
m=\frac{\Sigma x_i y_i -c\Sigma x_i}{\Sigma x_i ^2}

The part I do not understand is the book says that m is equal to..
m=\frac{\Sigma (x_i-\overline{x})y_i}{\Sigma (x_i-\overline{x})^2}

In the first equation, if you replace c by \overline{y} - m \overline{x} then you get m's on both sides and you have to solve for m again. Maybe that's how to do it.
 
Not quite right - it looks like they assumed a model with zero y-mean. Another way of looking at this is as a model in x about the x-mean, passing through the origin. With the latter formulation, you only have one parameter, the slope (m), to estimate:

y_i = m \, (x_i-\bar{x})

Hence, setting the derivative of S(m) to zero yields

\sum y_i \, (x_i-\bar{x}) = m \cdot \sum (x_i-\bar{x})^2

Solve for m to yield the text's answer.
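A small sketch of that one-parameter view (made-up data, chosen so y is exactly linear in x): centering x and fitting through the origin recovers the slope regardless of the intercept.

```python
# One-parameter model y_i = m * (x_i - xbar), fitted through the origin.
# Minimizing S(m) = sum((y_i - m*(x_i - xbar))^2) gives
#   m = sum(y_i * (x_i - xbar)) / sum((x_i - xbar)^2)
x = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 3.0, 5.0, 7.0]  # exactly y = 2x + 1 (made-up data)

xbar = sum(x) / len(x)
m = sum(yi * (xi - xbar) for xi, yi in zip(x, y)) \
  / sum((xi - xbar) ** 2 for xi in x)

print(m)  # prints 2.0 -- the intercept drops out of the slope estimate
```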
 
m=\frac{\Sigma x_i y_i -c\Sigma x_i}{\Sigma x_i ^2}
m = \frac{\Sigma x_i y_i - (\overline y - m \overline x) \Sigma x_i}{\Sigma x_i ^2}
m = \frac{ \Sigma x_i y_i -\overline y \Sigma x_i + m \overline x \Sigma x_i}{\Sigma x_i^2}
m \Sigma x_i^2 - m \overline x \Sigma x_i = \Sigma x_i y_i - \overline y \Sigma x_i
m= \frac{ \Sigma x_i y_i - \overline y \Sigma x_i}{\Sigma x_i^2 - \overline x \Sigma x_i}

= \frac{ \Sigma x_i y_i - \frac{\Sigma y_i}{N} \Sigma x_i}{\Sigma x_i^2 - (\frac{\Sigma x_i}{N}) \Sigma x_i }

The numerator is equal to \Sigma x_i y_i - \Sigma y_i (\frac{\Sigma x_i}{N})
= \Sigma x_i y_i - \Sigma y_i \overline x
= \Sigma (x_i - \overline x) y_i

To deal with the denominator, consider an estimator for the sample variance:

\sigma^2 = \frac { \Sigma (x_i - \overline x)^2}{N}
= \frac{ \Sigma ( x_i^2 - 2 x_i \overline x + \overline x \overline x)}{N}
= \frac{ \Sigma x_i^2 - 2 \overline x \Sigma x_i + \Sigma \overline x \overline x }{N}
= \frac {\Sigma x_i^2}{N} - 2 \overline x \frac{\Sigma x_i}{N} + \frac{ N \overline x \overline x}{N}
= \frac {\Sigma x_i^2}{N} - 2 \overline x \overline x + \overline x \overline x
= \frac{ \Sigma x_i^2}{N} - \overline x \overline x

This establishes that
\frac {\Sigma (x_i - \overline x)^2}{N} = \frac{\Sigma x_i^2}{N} - \overline x \overline x

So
\Sigma (x_i - \overline x)^2 = \Sigma x_i^2 - N \overline x \overline x
= \Sigma x_i^2 - N ( \frac{\Sigma x_i}{N} \frac{\Sigma x_i}{N})
= \Sigma x_i^2 - \frac{\Sigma x_i \Sigma x_i}{N}
= \Sigma x_i^2 - \overline x \Sigma x_i

which matches the denominator above, so m=\frac{\Sigma (x_i-\overline{x})y_i}{\Sigma (x_i-\overline{x})^2}, as the text states.
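Both identities used above can be checked numerically (a sketch on arbitrary made-up data; any values work):

```python
# Made-up data for illustration; the identities hold for any values.
x = [1.3, 2.7, 3.1, 4.9, 5.2]
y = [0.4, 1.1, 2.8, 3.3, 4.7]

N = len(x)
xbar = sum(x) / N
ybar = sum(y) / N

# Numerator identity: sum(x_i y_i) - ybar * sum(x_i) == sum((x_i - xbar) * y_i)
lhs_num = sum(xi * yi for xi, yi in zip(x, y)) - ybar * sum(x)
rhs_num = sum((xi - xbar) * yi for xi, yi in zip(x, y))

# Denominator identity: sum(x_i^2) - xbar * sum(x_i) == sum((x_i - xbar)^2)
lhs_den = sum(xi ** 2 for xi in x) - xbar * sum(x)
rhs_den = sum((xi - xbar) ** 2 for xi in x)

print(abs(lhs_num - rhs_num) < 1e-9, abs(lhs_den - rhs_den) < 1e-9)
```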
 
The only thing that confuses me, Stephen Tashi, is lines 4 and 5. Where do you get those relations, and how does "m" turn into the expression in line 5? Otherwise, everything after it is fine and I understand completely. :]
 
m = \frac{ \Sigma x_i y_i -\overline y \Sigma x_i + m \overline x \Sigma x_i}{\Sigma x_i^2}
Multiply both sides of the equation by \Sigma x_i^2 and then subtract m\overline x \Sigma x_i from both sides.
m \Sigma x_i^2 - m \overline x \Sigma x_i = \Sigma x_i y_i - \overline y \Sigma x_i

m( \Sigma x_i^2 - \overline x \Sigma x_i) = \Sigma x_i y_i - \overline y \Sigma x_i

Divide both sides by \Sigma x_i^2 - \overline x \Sigma x_i

m= \frac{ \Sigma x_i y_i - \overline y \Sigma x_i}{\Sigma x_i^2 - \overline x \Sigma x_i}
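As a sketch (with made-up numbers), one can confirm that the resulting m, together with c = \overline{y} - m\overline{x}, indeed satisfies the original implicit equation it was solved from:

```python
# Made-up data for illustration.
x = [1.0, 2.0, 4.0, 7.0]
y = [2.0, 3.0, 5.0, 9.0]

n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
sx = sum(x)
sxx = sum(xi * xi for xi in x)
sxy = sum(xi * yi for xi, yi in zip(x, y))

# Final rearranged formula for m, then c from the second normal equation.
m = (sxy - ybar * sx) / (sxx - xbar * sx)
c = ybar - m * xbar

# Substitute back into the original m = (sum x_i y_i - c sum x_i) / sum x_i^2.
m_back = (sxy - c * sx) / sxx
print(abs(m - m_back) < 1e-9)  # True: the rearrangement is consistent
```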
 
Thank you very much!
 
