Least squares line - understanding formulas

Vital · Aug 19, 2019

Hello.

I have listened to a great lecture, which gave helpful intuitive insight into correlation and regression (basic stuff). But there are formulas that I cannot grasp intuitively and whose origin I don't know. To remember them, I would like to understand, both mathematically and intuitively, what is happening in each part of each formula and why these mathematical combinations give the desired result.
I will be grateful for your patience and your help.

The first one is for the slope, and the second is for the y-intercept
(both formulas below give the coefficients of the simple linear relationship
y = y-intercept + slope multiplied by x).

slope = [ n×Σxy - ΣxΣy ] / [ n×Σx^2 - (Σx)^2 ]

I have "whys" about each part of this formula:
numerator
- why we take the sum of xy
- why we then multiply that sum by n (the number of elements) and what is the meaning and role of the result
- why we subtract from the previous result the sum of x multiplied by the sum of y
denominator
- why we take the sum of x squared
- why we then multiply it by n
- why we take the sum of all x and then square the result
- why we subtract the second from the first
formula
- why we use [n×Σxy - ΣxΣy] for numerator and [n×Σx^2 - (Σx)^2 ] for the denominator,
how do they work together, and what is the intuition behind the process?

The second one is for the y intercept:
intercept = [ Σy/n ] - slope x [ Σx / n]

Same questions here. And finally, what is more confusing is that [ Σx / n ] is called a margin of error. Why is it called a margin of error when it looks like the formula for the average value of x, given n elements? Thank you.

FactChecker · Aug 19, 2019

The slope formula has been manipulated to be easier to calculate (one pass through the data rather than two). It is closely related to the Pearson correlation coefficient. You can see a fairly intuitive initial definition of the Pearson correlation coefficient which is then manipulated to be close to your slope formula here.
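As a rough sketch of that manipulation (the notation ##\bar x = \Sigma x / n## and ##\bar y = \Sigma y / n## is assumed here and is not taken from the lecture), the deviation-from-the-mean form can be expanded into the sums that appear in the slope formula:

```latex
\begin{aligned}
\sum_i (x_i-\bar x)(y_i-\bar y)
  &= \sum_i x_i y_i - \bar y \sum_i x_i - \bar x \sum_i y_i + n\,\bar x\,\bar y \\
  &= \sum_i x_i y_i - n\,\bar x\,\bar y
   = \frac{n\sum xy - \sum x \sum y}{n}, \\[4pt]
\sum_i (x_i-\bar x)^2
  &= \sum_i x_i^2 - n\,\bar x^2
   = \frac{n\sum x^2 - \left(\sum x\right)^2}{n}, \\[4pt]
\text{slope}
  &= \frac{\sum_i (x_i-\bar x)(y_i-\bar y)}{\sum_i (x_i-\bar x)^2}
   = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2}.
\end{aligned}
```

The factor of ##1/n## is the same in the numerator and the denominator, so it cancels in the ratio. The deviation form on the left is the one that reads intuitively: how x and y vary together, divided by how much x varies on its own.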

For the intercept, I don't know what slope x means. Whatever it is, I assume that the same sort of manipulation has been done as for the slope formula.

I have not heard the sample average called a "margin of error" before, so I can't help you there. The usual use of the term "margin of error" in statistics does not have that definition.

BWV · Aug 19, 2019

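Qualitatively, the slope is the covariance of x and y divided by the variance of x.

[ n×Σx^2 - (Σx)^2 ] is the variance of x, up to a factor of n^2: dividing it by n^2 gives Σx^2/n - (Σx/n)^2, the mean of the squares minus the square of the mean.

The covariance in the numerator is the same thing with XY instead of X^2, and the factors of n^2 cancel in the ratio. If y = x exactly, covariance = variance and the slope is 1. If there is no correlation, then the covariance is zero and so is the slope.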
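A minimal Python sketch of this relationship, assuming a small made-up data set (the function names and the data are illustrative only): it computes the slope once from the sums in the original formula and once as covariance divided by variance, and then the intercept as mean(y) minus slope times mean(x).

```python
# Illustrative data, made up for this sketch: y = 2x + 1 exactly.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.0, 5.0, 7.0, 9.0, 11.0]


def slope_from_sums(xs, ys):
    """Slope from the thread's formula: [n*Sxy - Sx*Sy] / [n*Sxx - Sx^2]."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    return (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)


def slope_from_cov_var(xs, ys):
    """Slope as (population) covariance of x and y over variance of x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    var_x = sum((x - mean_x) ** 2 for x in xs) / n
    return cov_xy / var_x


def intercept(xs, ys, slope):
    """Intercept = mean(y) - slope * mean(x)."""
    n = len(xs)
    return sum(ys) / n - slope * (sum(xs) / n)


m = slope_from_sums(xs, ys)
print(m, slope_from_cov_var(xs, ys), intercept(xs, ys, m))
```

For this data both slope functions return 2 and the intercept is 1, which matches the line the points were generated from.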

Dale · Aug 19, 2019

@Vital I think your approach here is not going to be fruitful. To my knowledge there is no “why” for the individual terms, there is only a “why” for the whole formula. The individual terms are only there because together they achieve the goal of the overall formula; individually they have no particular importance.

The purpose of the overall formula is to calculate the ##m## and ##b## that minimize the sum of squared residuals ##\Sigma r^2## for the line ##y=mx +b##, where ##r=y-(mx+b)## is the residual error. Specifically, we want to find ##m## and ##b## such that ##\frac{\partial}{\partial m}\Sigma r^2=0## and ##\frac{\partial}{\partial b}\Sigma r^2=0##. All of those formulas you are looking into are just what you get when you solve these two equations.
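A sketch of how solving those two equations leads to the formulas in the first post (the name ##S## for the sum of squared residuals is introduced here for convenience):

```latex
\begin{aligned}
S(m,b) &= \sum_i \bigl(y_i - m x_i - b\bigr)^2 \\[4pt]
\frac{\partial S}{\partial b} &= -2\sum_i \bigl(y_i - m x_i - b\bigr) = 0
  \;\Longrightarrow\; \sum y = m\sum x + n\,b
  \;\Longrightarrow\; b = \frac{\sum y}{n} - m\,\frac{\sum x}{n} \\[4pt]
\frac{\partial S}{\partial m} &= -2\sum_i x_i\bigl(y_i - m x_i - b\bigr) = 0
  \;\Longrightarrow\; \sum xy = m\sum x^2 + b\sum x \\[4pt]
\text{substituting } b:\quad
n\sum xy &= n\,m\sum x^2 + \sum x \sum y - m\Bigl(\sum x\Bigr)^2
  \;\Longrightarrow\;
  m = \frac{n\sum xy - \sum x\sum y}{n\sum x^2 - \left(\sum x\right)^2}
\end{aligned}
```

The first condition gives the intercept formula directly: the fitted line must pass through the point (average of x, average of y). The second condition, with ##b## eliminated, gives the slope formula.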

FactChecker · Aug 19, 2019

Dale said:

@Vital I think your approach here is not going to be fruitful. To my knowledge there is no “why” for the individual terms, there is only a “why” for the whole formula.

I agree. It started with a very intuitive formula, but then got manipulated so that the parts are not intuitive. The reason for the manipulation was to make it a single-pass calculation through the data, which is easier than the original two-pass formula (a first pass through the data to get the average followed by a second pass to total all the deviations from that average).
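To illustrate the difference, here is a minimal Python sketch (function names and data are made up for this example). The one-pass version only keeps running totals, so it works even when the data can be read just once; the two-pass version needs the averages before it can total the deviations.

```python
def fit_one_pass(points):
    """Accumulate n, Sx, Sy, Sxy, Sxx in a single loop over (x, y) pairs."""
    n = 0
    sum_x = sum_y = sum_xy = sum_xx = 0.0
    for x, y in points:  # works even if `points` is a one-shot iterator
        n += 1
        sum_x += x
        sum_y += y
        sum_xy += x * y
        sum_xx += x * x
    slope = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
    intercept = sum_y / n - slope * (sum_x / n)
    return slope, intercept


def fit_two_pass(points):
    """First get the averages, then total the deviations from them."""
    points = list(points)  # the data must be kept to read it twice
    n = len(points)
    # Pass 1: averages of x and y.
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    # Pass 2: totals of the deviations from those averages.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Illustrative data, made up for this sketch.
data = [(1.0, 2.0), (2.0, 3.0), (3.0, 5.0), (4.0, 4.0), (5.0, 6.0)]

print(fit_one_pass(iter(data)))  # slope about 0.9, intercept about 1.3
print(fit_two_pass(data))        # same values up to rounding
```

One caveat: when the values are large compared with their spread, the one-pass totals can lose precision in floating point because two large, nearly equal numbers are subtracted, so the two-pass form is often the safer choice on a computer.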

Vital · Aug 20, 2019

Thank you very much for your answers and guidance.
