Least squares line - understanding formulas

SUMMARY

This discussion focuses on understanding the formulas for the slope and y-intercept in simple linear regression: slope = [ n×Σxy - ΣxΣy ] / [ n×Σx^2 - (Σx)^2 ] and intercept = [ Σy/n ] - slope × [ Σx / n ]. Participants seek the mathematical intuition behind each component of these formulas, including the roles of covariance and variance. The conversation emphasizes that while the individual terms have little standalone significance, together they produce the line that minimizes the sum of squared errors.

PREREQUISITES
  • Understanding of basic statistics, including correlation and regression concepts.
  • Familiarity with the terms covariance and variance in statistical analysis.
  • Knowledge of the least squares method for regression analysis.
  • Basic proficiency in mathematical notation and operations involving summation.
NEXT STEPS
  • Study the derivation of the least squares regression formulas in detail.
  • Learn about the Pearson correlation coefficient and its relationship to slope calculations.
  • Explore the concept of residuals and their significance in regression analysis.
  • Investigate the differences between one-pass and two-pass calculations in regression.
USEFUL FOR

Statisticians, data analysts, and students of statistics who are looking to deepen their understanding of linear regression and the mathematical foundations behind regression formulas.

Vital
Hello.

I have listened to a great lecture that gave helpful intuitive insight into correlation and regression (basic stuff). But there are formulas that I cannot grasp intuitively and whose origin I don't know. To remember them, I would like to understand what is happening in each part of each formula and why these particular mathematical combinations produce the desired result; that is, I would like to understand both mathematically and intuitively what is happening in those formulas.
I will be grateful for your patience and your help.

The first one is for the slope and the second is for the y-intercept
(both are used for the variables in the simple linear relationship
y = y-intercept + slope × x).

slope = [ n×Σxy - ΣxΣy ] / [ n×Σx^2 - (Σx)^2 ]

I have "whys" about each part of this formula:
numerator
- why we take the sum of xy
- why we then multiply that sum by n (the number of elements) and what is the meaning and role of the result
- why we subtract from the previous result the sum of x multiplied by the sum of y
denominator
- why we take the sum of x squared
- why we then multiply it by n
- why we take the sum of all x and then square the result
- why we subtract the second from the first
formula
- why we use [n×Σxy - ΣxΣy] for numerator and [n×Σx^2 - (Σx)^2 ] for the denominator,
how do they work together, and what is the intuition behind the process?
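A minimal Python sketch of the slope formula exactly as written above, assuming the data are two equal-length sequences of numbers (the function name `slope_one_pass` and the sample data are hypothetical, chosen only for illustration). It shows that all four sums the formula needs (Σx, Σy, Σxy, Σx^2) can be accumulated in one loop over the data.

```python
def slope_one_pass(xs, ys):
    """Least squares slope via [n*Sxy - Sx*Sy] / [n*Sxx - Sx**2]."""
    n = 0
    sx = sy = sxy = sxx = 0.0
    # One loop accumulates every sum the formula needs.
    for x, y in zip(xs, ys):
        n += 1
        sx += x        # Σx
        sy += y        # Σy
        sxy += x * y   # Σxy
        sxx += x * x   # Σx^2
    numerator = n * sxy - sx * sy
    denominator = n * sxx - sx ** 2
    if denominator == 0:
        raise ValueError("slope is undefined when all x values are equal")
    return numerator / denominator


xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
print(slope_one_pass(xs, ys))  # close to 1.99
```

The guard on the denominator matters: if every x is the same, there is no unique line, and the formula would divide by zero.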

The second one is for the y intercept:
intercept = [ Σy/n ] - slope x [ Σx / n]

Same questions here. And finally, what is more confusing is that [ Σx / n] is called a margin of error. Why is it called a margin of error if it looks like the formula for the average value of x, given n elements? Thank you.
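A short sketch of the intercept formula, assuming the same kind of input as before (the name `least_squares_line` is hypothetical). In this code, Σx / n and Σy / n are simply the averages of x and y; rearranging the intercept formula shows that the fitted line always passes through the point (average x, average y).

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least squares line y = intercept + slope*x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    slope = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
    x_mean = sx / n   # Σx / n: the ordinary average of x
    y_mean = sy / n   # Σy / n: the ordinary average of y
    # The fitted line passes through (x_mean, y_mean), so the intercept
    # is whatever makes y_mean = intercept + slope * x_mean hold.
    intercept = y_mean - slope * x_mean
    return slope, intercept


print(least_squares_line([1, 2, 3, 4, 5], [3, 5, 7, 9, 11]))  # (2.0, 1.0)
```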
 
The slope formula has been manipulated to be easier to calculate (one pass through the data rather than two). It is closely related to the Pearson correlation coefficient. You can see a fairly intuitive initial definition of the Pearson correlation coefficient which is then manipulated to be close to your slope formula here.
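One way to see the connection to the Pearson correlation coefficient r is the identity slope = r × s_y / s_x, where s_x and s_y are the sample standard deviations. The sketch below (function names are hypothetical, and the linked derivation is not reproduced here) computes the slope both that way and with the one-pass formula so the two can be compared.

```python
import math


def centered_sums(xs, ys):
    """Sums of squared and cross deviations from the means."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    syy = sum((y - y_mean) ** 2 for y in ys)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    return n, sxx, syy, sxy


def pearson_r(xs, ys):
    _, sxx, syy, sxy = centered_sums(xs, ys)
    return sxy / math.sqrt(sxx * syy)


def slope_via_r(xs, ys):
    # slope = r * (s_y / s_x); the (n - 1) in both standard deviations cancels.
    n, sxx, syy, _ = centered_sums(xs, ys)
    s_x = math.sqrt(sxx / (n - 1))
    s_y = math.sqrt(syy / (n - 1))
    return pearson_r(xs, ys) * s_y / s_x


def slope_one_pass(xs, ys):
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return (n * sxy - sx * sy) / (n * sxx - sx ** 2)


xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
print(slope_via_r(xs, ys), slope_one_pass(xs, ys))  # both close to 1.99
```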

For the intercept, I don't know what slope x means. Whatever it is, I assume that the same sort of manipulation has been done as for the slope formula.

I have not heard the sample average called a "margin of error" before, so I can't help you there. The usual use of the term "margin of error" in statistics does not have that definition.
 
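Qualitatively, the slope is the covariance of x and y divided by the variance of x.

[ n×Σx^2 - (Σx)^2 ] is n² times the (population) variance of x, ##\frac{1}{n}\Sigma x^2 - \left(\frac{1}{n}\Sigma x\right)^2##.

The numerator is the same thing with XY in place of X², i.e. n² times the covariance; the factors of n² cancel in the ratio. If y = x for every point, the covariance equals the variance and the slope is 1. If there is no correlation, the covariance is zero and so is the slope.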
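A quick numerical check of the covariance / variance reading, assuming population (divide-by-n) definitions; the function name `covariance_form` and the data are hypothetical. It returns the numerator and denominator of the slope formula alongside the covariance and variance so the n² relationship can be compared directly.

```python
def covariance_form(xs, ys):
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Population variance of x and covariance of x and y (divide by n).
    var_x = sum((x - x_mean) ** 2 for x in xs) / n
    cov_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) / n
    # Numerator and denominator of the one-pass slope formula.
    numer = n * sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys)
    denom = n * sum(x * x for x in xs) - sum(xs) ** 2
    return numer, denom, cov_xy, var_x


xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
numer, denom, cov_xy, var_x = covariance_form(xs, ys)
n = len(xs)
print(denom, n * n * var_x)  # 50 50.0
print(numer, n * n * cov_xy)  # both close to 99.5
print(cov_xy / var_x)         # the slope, close to 1.99
```

Running the same function on data with y = x gives equal covariance and variance (slope 1), and on xs = [-1, 0, 1], ys = [1, 0, 1] gives zero covariance (slope 0).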
 
@Vital I think your approach here is not going to be fruitful. To my knowledge there is no “why” for the individual terms, there is only a “why” for the whole formula. The individual terms are only there because together they achieve the goal of the overall formula, they individually have no particular importance.

The purpose of the overall formula is to calculate the ##m## and ##b## that minimize the sum of squared residuals for ##y=mx +b##. Specifically, we want to find ##m## and ##b## such that ##\frac{\partial}{\partial m}\Sigma r^2=0## and ##\frac{\partial}{\partial b}\Sigma r^2=0##, where ##r## is the residual error ##r=y-(mx+b)##. All of the formulas you are looking into are just what you get when you solve these equations.
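Working the two conditions out explicitly (a short sketch, with ##n## the number of points and every sum running over all data points) recovers both formulas from the original question. Start from the sum of squared residuals:

$$\Sigma r^2 = \Sigma (y - mx - b)^2$$

Setting the derivative with respect to ##b## to zero gives the intercept:

$$\frac{\partial}{\partial b}\Sigma r^2 = -2\Sigma (y - mx - b) = 0 \;\Rightarrow\; \Sigma y = m\Sigma x + nb \;\Rightarrow\; b = \frac{\Sigma y}{n} - m\frac{\Sigma x}{n}$$

Setting the derivative with respect to ##m## to zero gives:

$$\frac{\partial}{\partial m}\Sigma r^2 = -2\Sigma x(y - mx - b) = 0 \;\Rightarrow\; \Sigma xy = m\Sigma x^2 + b\Sigma x$$

Substituting the expression for ##b## and multiplying through by ##n##:

$$n\Sigma xy = mn\Sigma x^2 + \Sigma x\Sigma y - m(\Sigma x)^2 \;\Rightarrow\; m = \frac{n\Sigma xy - \Sigma x\Sigma y}{n\Sigma x^2 - (\Sigma x)^2}$$

So the "n×" factors and the "ΣxΣy" and "(Σx)^2" terms appear only because ##b## was eliminated from the second equation.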
 
Dale said:
@Vital I think your approach here is not going to be fruitful. To my knowledge there is no “why” for the individual terms, there is only a “why” for the whole formula.
I agree. It started with a very intuitive formula, but then got manipulated so that the parts are not intuitive. The reason for the manipulation was to make it a single-pass calculation through the data, which is easier than the original two-pass formula (a first pass through the data to get the average followed by a second pass to total all the deviations from that average).
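A sketch of the two approaches described above (function names and data are hypothetical): the two-pass version computes the averages first and then totals the deviations from them, which is the intuitive covariance / variance form; the one-pass version only accumulates raw sums.

```python
def slope_two_pass(xs, ys):
    n = len(xs)
    # First pass: the averages.
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Second pass: total the deviations from those averages.
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    return sxy / sxx


def slope_one_pass(xs, ys):
    n = sx = sy = sxy = sxx = 0
    for x, y in zip(xs, ys):  # a single pass over the data
        n += 1
        sx += x
        sy += y
        sxy += x * y
        sxx += x * x
    return (n * sxy - sx * sy) / (n * sxx - sx ** 2)


xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
print(slope_two_pass(xs, ys), slope_one_pass(xs, ys))  # both close to 1.99
```

One trade-off worth knowing: the one-pass form subtracts two large, nearly equal quantities, so in floating point it can lose precision when the x values are large compared with their spread; the two-pass form is usually more robust, at the cost of reading the data twice.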
 
Thank you very much for your answers and guidance.
 
