# Analytical linear regression: is it possible?

1. Aug 5, 2011

### striphe

I've been told that there exists no perfect mathematical method of obtaining a line of best fit from a population of data.

This doesn't make a whole lot of sense to me, so I have made an attempt at doing so (see Google Docs link).

Is there a way of determining if the formula has any merit?

Last edited by a moderator: May 5, 2017
2. Aug 5, 2011

### chiro

In linear algebra, there is a method known as least squares that finds the best linear approximation to some system.

The method uses the idea that the best approximation is the linear object at the closest distance to the data, and this is expressed through orthogonality (if you are familiar with the equation of a linear object, the shortest distance from a point to that object is based on the inner product of the point with the "normal" of the object).

I can't really say much about that formula, and it's 11pm and I'm tired, but chances are that you may be able to prove it via a least squares formalism.
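For a straight line fitted by least squares on the vertical errors, the minimization has a closed-form solution, so no iteration is required. The following is a minimal sketch in plain Python; the function name `ols_line` and the sample data are illustrative assumptions and do not come from the thread or from the linked formula.

```python
def ols_line(points):
    """Return (slope, intercept) minimizing the sum of squared vertical errors."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    # Centered sums of squares and cross products.
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    if sxx == 0:
        raise ValueError("all x values are equal; the slope is undefined")
    slope = sxy / sxx
    # The fitted line always passes through the mean point (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up data lying exactly on y = 2x + 1.
slope, intercept = ols_line([(0, 1), (1, 3), (2, 5)])
```

Because the data here are exactly collinear, the fit recovers the line they lie on; with noisy data the same formulas give the least squares compromise.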

Last edited by a moderator: May 5, 2017
3. Aug 5, 2011

### Stephen Tashi

There is no single definition for what makes a curve fit "best" or "perfect", so unless you give a mathematical definition for those terms, nobody can say if your formula accomplishes a "perfect" or "best" fit.

For various definitions of "best", there are known ways to attain a "best" fit. The most common criterion for "best" is that the curve minimize the sum of the squares of the errors in the dependent variable. But there are no mathematical theorems that say this is the only criterion for "best". An example of a method that pursues a different definition of "best" is "total least squares regression".
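To make the difference between two such criteria concrete, here is a small sketch comparing ordinary least squares (squared vertical errors) with total least squares for a line (squared perpendicular distances). The helper names and the four data points are assumptions chosen for illustration. The total least squares direction is computed as the principal axis of the centered data, using the standard two-dimensional closed form.

```python
import math


def centered_sums(points):
    """Return the centroid and the centered sums sxx, syy, sxy."""
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    sxx = sum((x - cx) ** 2 for x, _ in points)
    syy = sum((y - cy) ** 2 for _, y in points)
    sxy = sum((x - cx) * (y - cy) for x, y in points)
    return cx, cy, sxx, syy, sxy


def ols_slope(points):
    """Slope minimizing the sum of squared vertical errors."""
    _, _, sxx, _, sxy = centered_sums(points)
    return sxy / sxx


def tls_angle(points):
    """Angle (radians) of the line minimizing the sum of squared
    perpendicular distances; the line passes through the centroid.
    If sxx == syy and sxy == 0 the direction is not unique."""
    _, _, sxx, syy, sxy = centered_sums(points)
    return 0.5 * math.atan2(2.0 * sxy, sxx - syy)


data = [(0, 0), (1, 2), (2, 1), (3, 3)]   # made-up data
slope_ols = ols_slope(data)               # 0.8
slope_tls = math.tan(tls_angle(data))     # approximately 1.0
```

Both lines pass through the centroid (1.5, 1.5), but they have different slopes, so "best" really does depend on which error is being measured.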

4. Aug 5, 2011

### striphe

Say I state that the line of best fit is the line that achieves the least total absolute error, where each error is measured perpendicular to the line (that is, at right angles to it).

Is there some way of determining if this is a more robust definition of best?

5. Aug 5, 2011

### chiro

This is the basic idea in least squares approximation.

6. Aug 5, 2011

### Pyrrhus

Alternative definitions to OLS have already been studied. The literature is very rich. Just do a Google search.

7. Aug 5, 2011

### Stephen Tashi

To clarify that remark, the basic idea of least squares approximation is to be a method that is, in some sense, more robust than minimizing the total of the absolute errors. The line that minimizes the absolute errors need not be the same as the line that minimizes the mean square error.

8. Aug 5, 2011

### Stephen Tashi

striphe,

I don't think your formula can be applied to data sets such as (-1,-1), (0,0), (1,1), since it involves division by zero.

Last edited: Aug 5, 2011
9. Aug 6, 2011

### striphe

The issue has to do with determining the gradient between the mean and an individual that has the same x and y values as the mean. It could be anything and everything, as they are in the same position.

As a result, I've had to change the definitions of y, x and n so that they do not include any individuals that have the same co-ordinates as the mean. Ignoring these individuals once the mean is calculated is the best policy (see the Google Doc link in the first post for more details).

You must understand that minimising the absolute error is different from minimising the absolute perpendicular error. The distance between the line and an individual is measured at 90 degrees to the line, which makes it the minimum distance between the line and the individual.

Does this technique that I have described exist in the literature?

10. Aug 6, 2011

### Stephen Tashi

You haven't explained what you are trying to minimize. Are you minimizing the size of the largest perpendicular distance between a data point (x,y) and the line you calculate? Or are you minimizing the average of those distances taken over all data points?

You also haven't explained why you think your formula minimizes whatever it is that you are trying to minimize.

I don't know if your formula exists in the literature. It isn't the mainstream way of fitting lines to data. Have you actually applied this formula to any real-world examples? I doubt many people would want to use your formula, since it is so strongly influenced by small errors in $y$ when $(x,y)$ is near $( \bar{x} , \bar{y} )$.

11. Aug 6, 2011

### Stephen Tashi

Apply your formula to the dataset (-10.0, -10.0), (-0.1, -0.4), (10.1,10.4)

12. Aug 6, 2011

### striphe

I've clearly jumped the gun on this one; the formulation doesn't match up with the minimum sum of absolute errors. The best example of this is to compare the data sets [(-10,0), (-1,-1), (1,1), (10,0)] and [(-20,0), (-1,-1), (1,1), (20,0)]. The formula gives them both the same line of best fit, but you would intuitively expect the latter to have a line of best fit with a lower gradient.

I still do not see how the minimum sum of absolute perpendicular errors isn't a preferable method, as it isn't determined by the relationship the individuals have with the x axis but by the relationship that the individuals have with each other.

13. Aug 7, 2011

### Stephen Tashi

Then you haven't thought clearly about the question of why one method should be preferable to another.

If you are doing measurements where you are confident that x can be measured precisely and the y measurement is the one that is subject to "random errors" then measuring error along the y axis instead of perpendicular to the regression line makes more sense.

If you are dealing with some situation where percentage errors are what matters (like calibrating a measuring instrument whose specs state a max percentage error in its reading) then percentage error is more important than absolute error.

That said, I agree that it would be interesting to investigate how to find lines that minimize the sum of the absolute perpendicular errors. However, notice that more than one line may have that property. Intuitively, if you have a line that runs through the data points and doesn't hit any of them, you can move that line perpendicular to itself and you won't change the sum of the perpendicular errors as long as you don't cross any data points. So "the line" that minimizes the sum of the absolute perpendicular errors may be one of infinitely many lines that have that property.
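The shifting argument can be checked numerically. The sketch below writes a line in normal form, x·cos(φ) + y·sin(φ) = ρ, so that changing ρ moves the line perpendicular to itself. The function name and the four points are assumptions for illustration. Note that the sum stays constant only while equally many points lie on each side of the line, which is the case in this example, and the example does not claim that this direction is the minimizing one.

```python
import math


def sum_abs_perp(points, phi, rho):
    """Sum of perpendicular distances from the points to the line
    x*cos(phi) + y*sin(phi) = rho."""
    c, s = math.cos(phi), math.sin(phi)
    return sum(abs(x * c + y * s - rho) for x, y in points)


# Made-up data: two points on y = 0 and two points on y = 1.
points = [(0, 0), (10, 0), (0, 1), (10, 1)]
phi = math.pi / 2  # the normal points along +y, so the line is y = rho

total_low = sum_abs_perp(points, phi, 0.25)      # line y = 0.25
total_high = sum_abs_perp(points, phi, 0.75)     # line y = 0.75, same total
total_crossed = sum_abs_perp(points, phi, 1.5)   # line has crossed two points
```

Moving the line between y = 0 and y = 1 leaves the total unchanged, while moving it past the upper pair of points increases it.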

As I said, I haven't searched for whether algorithms to solve this problem have been written up. This is the type of problem that computers can solve numerically, by trial and error if need be. If you are interested in the problem, you should do some searching. The least squares perpendicular error problem is called "total least squares" curve fitting. You might find something if you search for "total absolute error" regression or curve fitting.
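A rough sketch of such a trial-and-error search is given below. It is not a published algorithm, only one simple way to do it: scan the direction of the line on a grid of angles and, for each direction, pick the offset as the median of the projected points, since the median minimizes a sum of absolute deviations. The function names and the grid size `steps` are assumptions, and the answer is only as accurate as the angle grid.

```python
import math
from statistics import median


def sum_abs_perp(points, phi, rho):
    """Sum of perpendicular distances to the line x*cos(phi) + y*sin(phi) = rho."""
    c, s = math.cos(phi), math.sin(phi)
    return sum(abs(x * c + y * s - rho) for x, y in points)


def search_lad_perp(points, steps=3600):
    """Grid search over the normal angle phi in [0, pi).
    Returns (cost, phi, rho) for the best line found."""
    best = None
    for k in range(steps):
        phi = math.pi * k / steps
        c, s = math.cos(phi), math.sin(phi)
        # For a fixed direction, the best offset is the median projection.
        rho = median(x * c + y * s for x, y in points)
        cost = sum_abs_perp(points, phi, rho)
        if best is None or cost < best[0]:
            best = (cost, phi, rho)
    return best


# The data set suggested earlier in the thread.
cost, phi, rho = search_lad_perp([(-10.0, -10.0), (-0.1, -0.4), (10.1, 10.4)])
```

A finer grid, or a local refinement around the best angle, would tighten the result; as noted above, the minimizing line need not be unique.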

14. Aug 7, 2011

### SW VandeCarr

Typically there are two choices for fitting models to data: Least Squares (LSE) and Maximum Likelihood (MLE) estimation. The latter is considered better for parameter estimation and for nonlinear data. LSE is often preferred for linear data, particularly when the data are relatively sparse, but still sufficient for hypothesis testing.

Last edited by a moderator: May 5, 2017
15. Aug 7, 2011

### striphe

The thing is, the line of best fit has to go through the mean of the population. I think you will find there are instances where multiple lines can exist, but for the most part they don't.

16. Aug 7, 2011

### SW VandeCarr

If you overlooked my post, that's why MLE is often preferred. The single most likely line/curve, given the data, is selected by an iterative process which maximizes the likelihood function.
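As a side note on how the two estimators relate: if the errors in $y$ are assumed to be independent and normally distributed with constant variance $\sigma^2$ (an assumption that is not stated in the thread), the log-likelihood of a straight-line model $y = a + b x$ is

$$
\log L(a, b, \sigma) = -\frac{n}{2}\log\left(2\pi\sigma^2\right) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}\left(y_i - a - b x_i\right)^2 .
$$

Maximizing over $a$ and $b$ is then the same as minimizing the sum of squared errors, and the maximum has a closed form:

$$
\hat{b} = \frac{\sum_{i}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i}(x_i - \bar{x})^2}, \qquad \hat{a} = \bar{y} - \hat{b}\,\bar{x} .
$$

Under other error distributions the likelihood generally has to be maximized numerically.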

17. Aug 8, 2011

### I like Serena

Yes, this is often a more robust definition of best.

In practice you often have outliers in your measurements, which can have various causes.
These points are weighted inordinately heavily by a least squares fit.
To reduce this effect, the method you propose (least absolute errors) is used.

This is documented in numerical literature.
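The effect of a single outlier can be seen in a small sketch. It compares the least squares line with a least absolute errors line, both measured vertically. The brute-force search relies on the known property that some line minimizing the sum of absolute vertical errors passes through at least two data points; the function names and the data are assumptions made for illustration.

```python
from itertools import combinations


def ols_line(points):
    """(slope, intercept) minimizing the sum of squared vertical errors."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    slope = sxy / sxx
    return slope, my - slope * mx


def sum_abs_vertical(points, slope, intercept):
    """Sum of absolute vertical errors for the line y = slope*x + intercept."""
    return sum(abs(y - (slope * x + intercept)) for x, y in points)


def lad_line(points):
    """Brute force: try every line through two data points with distinct x
    and keep the one with the smallest sum of absolute vertical errors."""
    best = None
    for (x1, y1), (x2, y2) in combinations(points, 2):
        if x1 == x2:
            continue  # vertical lines cannot be written as y = slope*x + intercept
        slope = (y2 - y1) / (x2 - x1)
        intercept = y1 - slope * x1
        cost = sum_abs_vertical(points, slope, intercept)
        if best is None or cost < best[0]:
            best = (cost, slope, intercept)
    if best is None:
        raise ValueError("need at least two points with distinct x values")
    return best[1], best[2]


# Made-up data: four points on y = x plus one outlier at (4, 10).
data = [(0, 0), (1, 1), (2, 2), (3, 3), (4, 10)]
ols = ols_line(data)   # approximately (2.2, -1.2), pulled toward the outlier
lad = lad_line(data)   # (1.0, 0.0), follows the four collinear points
```

The brute-force search costs on the order of n³ operations, so it is only meant for small illustrations like this one.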

18. Aug 8, 2011

### SW VandeCarr

The OP asked about the possibility of analytic linear regression. I've answered his/her question. Is there any reason why this thread keeps on going? Please read the first sentence in the second paragraph:

http://www.itl.nist.gov/div898/handbook/apr/section4/apr412.htm

With small samples of linear data, LSE is better, but the fully analytic MLE is better in most other cases. LSE is not fully analytic in that it is (usually) a linear approximation to the MLE.

Last edited: Aug 8, 2011