How Do OLS and LAD Regression Methods Differ?

  • Thread starter: oleandora
  • Tags: Difference, OLS
Summary:
Ordinary least squares (OLS/LS) and least absolute deviation (LAD) regression differ primarily in what they optimize: LS minimizes the sum of squared residuals, while LAD minimizes the sum of absolute residuals. The calculation steps for the coefficients are therefore different, and LS admits simple closed-form formulas for its estimates while LAD does not. The two methods can yield similar results on well-behaved data but may differ substantially in the presence of outliers. LS is widely used because of its familiarity and its optimality properties under normally distributed errors, but it is less robust than LAD, which is somewhat more resilient to outliers. Understanding these differences matters when choosing a regression technique for a given data set.
oleandora
Hi,
I'm wondering what the difference is between the least squares method and the least absolute deviation method.
Assume we have y = ax + b + s, where s is the deviation.
Are the steps to calculate a and b even different?
I have read that the two methods are almost the same, but I could hardly find a really good explanation of LAD.
Thank you
 
oleandora said:
I'm wondering what the difference is between the least squares method and the least absolute deviation method.
Assume we have y = ax + b + s, where s is the deviation.
Are the steps to calculate a and b even different?
I have read that the two methods are almost the same, but I could hardly find a really good explanation of LAD.

I assume that you are referring specifically to linear regressions. The difference between least squares and least absolute deviation is what is being optimized when the line is fit to the data. Yes, the mechanics of the LS and LAD (also called "L-1") fitting procedures are quite different.

While regression procedures which optimize different error functions sometimes produce similar results on a given set of data, they can also yield substantially different results. You can see an example of this in my posting, http://matlabdatamining.blogspot.com/2007/10/l-1-linear-regression.html .


-Will Dwinnell
http://matlabdatamining.blogspot.com/
 
The two methods are quite different in concept. In least squares the estimates are obtained by minimizing the sum of squared differences between the data and the fit (also described as minimizing the sum of the squares of the residuals).

If we call the estimates ##\widehat{\alpha}## and ##\widehat{\beta}##, then for least squares

$$S(\widehat{\alpha}, \widehat{\beta}) = \min_{(\alpha,\beta) \in \mathbb{R}^2} \sum \bigl(y - (\alpha + \beta x)\bigr)^2$$

while for L1

$$S(\widehat{\alpha}, \widehat{\beta}) = \min_{(\alpha, \beta) \in \mathbb{R}^2} \sum \bigl|y - (\alpha + \beta x)\bigr|$$
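
To make the two objectives concrete, here is a minimal sketch (assuming NumPy and SciPy are available; the data and variable names are made up for illustration) that fits the same simulated line under both criteria. The LS coefficients come from the closed-form polynomial fit, while the LAD coefficients have to be found by numerically minimizing the sum of absolute residuals.

Python:
import numpy as np
from scipy.optimize import minimize

# Simulated data from y = 2x + 1 plus noise (purely illustrative values).
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 50)
y = 2.0 * x + 1.0 + rng.normal(scale=1.0, size=x.size)

# Least squares: minimize the sum of squared residuals.
# np.polyfit solves this in closed form for a degree-1 polynomial.
beta_ls, alpha_ls = np.polyfit(x, y, 1)

# LAD (L1): minimize the sum of absolute residuals numerically.
def sad(params):
    alpha, beta = params
    return np.sum(np.abs(y - (alpha + beta * x)))

res = minimize(sad, x0=[alpha_ls, beta_ls], method="Nelder-Mead")
alpha_lad, beta_lad = res.x

print("LS :", alpha_ls, beta_ls)
print("LAD:", alpha_lad, beta_lad)

On well-behaved data like this the two sets of coefficients come out very close, which is why the two methods can look "almost the same".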

Two benefits of least squares (though not the most important ones):
- the underlying calculations are easier to show with pencil and paper than they are for L1
- it is possible to write down closed-form formulas for the two least-squares estimates (see just below); no such formulas exist for L1
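
For reference, the standard closed-form least-squares formulas referred to in the second point are

$$\widehat{\beta} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}, \qquad \widehat{\alpha} = \bar{y} - \widehat{\beta}\,\bar{x},$$

where ##\bar{x}## and ##\bar{y}## are the sample means. Nothing comparable exists for L1; its estimates are usually computed iteratively, for example by linear programming.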

Estimates from both methods have asymptotic normal distributions under fairly general conditions.

The least squares estimates are the classical estimates when normality of the error distribution is assumed: they have certain optimality properties in that case, and, if you are interested in looking only at certain types of estimates, they are BLUE (Best Linear Unbiased Estimates) of the underlying parameters.

Least squares is so widely used because people are familiar with it. Its biggest downside is that fits from least squares are incredibly non-robust (sensitive to outliers and leverage points). L1 fits also suffer from this, but not quite as seriously as least squares.
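
A quick way to see the robustness difference is to plant a single gross outlier in otherwise perfect data and compare the fitted slopes. The sketch below (again illustrative only, assuming NumPy and SciPy) reuses the fitting approach from the earlier snippet:

Python:
import numpy as np
from scipy.optimize import minimize

# Ten points lying exactly on y = 2x + 1, then one value pushed far off the line.
x = np.arange(10, dtype=float)
y = 2.0 * x + 1.0
y[-1] += 50.0  # single gross outlier at the right-hand end

# Least-squares fit (closed form): the slope is dragged well away from 2.
beta_ls, alpha_ls = np.polyfit(x, y, 1)

# LAD fit (numerical minimization of the sum of absolute residuals):
# the slope stays essentially at the true value of 2.
lad = minimize(lambda p: np.sum(np.abs(y - (p[0] + p[1] * x))),
               x0=[alpha_ls, beta_ls], method="Nelder-Mead")
alpha_lad, beta_lad = lad.x

print("LS  slope:", beta_ls)   # roughly 4.7 for these numbers
print("LAD slope:", beta_lad)  # close to 2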

Regression based on ranks, as well as regression based on Huber's M-estimates, is more robust; with computing power steadily increasing and its cost falling, these methods are ever more reasonable alternatives (a short sketch of a Huber-type fit is below).
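
For anyone who wants to try the M-estimate route without coding the machinery by hand, a minimal sketch (assuming the statsmodels package is installed, with made-up data) could look like this:

Python:
import numpy as np
import statsmodels.api as sm

# Illustrative data: a line plus noise, with a few injected outliers.
rng = np.random.default_rng(1)
x = np.linspace(0.0, 10.0, 40)
y = 2.0 * x + 1.0 + rng.normal(scale=1.0, size=x.size)
y[::15] += 25.0  # push every 15th point far off the line

X = sm.add_constant(x)  # design matrix with an intercept column

# Huber M-estimate, fitted by iteratively reweighted least squares.
huber_fit = sm.RLM(y, X, M=sm.robust.norms.HuberT()).fit()
print("Huber:", huber_fit.params)  # [intercept, slope], little affected by the outliers

# Ordinary least squares for comparison.
ols_fit = sm.OLS(y, X).fit()
print("OLS:  ", ols_fit.params)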
 
