The least squares approximation - best fit lines revisited

Discussion Overview

The discussion revolves around the least squares method for finding best fit lines in the context of linear relationships between two random variables, y and x. Participants explore the effectiveness of this method compared to alternative approaches, particularly focusing on the implications of noise in the data and the underlying distribution of errors.

Discussion Character

  • Debate/contested
  • Exploratory
  • Technical explanation

Main Points Raised

  • Some participants question whether the least squares method is indeed the best approach for fitting a line to data, suggesting that minimizing absolute differences may yield different results.
  • One participant introduces the concept of noise in measurements, proposing that the relationship should be modeled as Y = aX + b + N, where N represents noise, and suggests choosing a and b to maximize the likelihood of the observed noise, which for Gaussian noise reduces to least squares.
  • Another participant argues that least squares is optimal when errors are Gaussian, while other methods may be preferable if the error distribution differs, such as using least absolute error for a two-sided exponential error distribution.
  • There is a discussion about the implications of different methods for polynomial approximations, noting that minimizing worst-case error may be more beneficial than least squares in certain contexts.
  • Some participants emphasize the importance of empirical testing to determine which method yields the best results for specific applications, acknowledging that theoretical predictions may not always align with practical outcomes.

Areas of Agreement / Disagreement

Participants express differing views on the appropriateness of the least squares method versus alternative approaches, indicating that there is no consensus on which method is universally superior. The discussion remains unresolved regarding the best approach under varying conditions.

Contextual Notes

Participants note that the effectiveness of different methods may depend on the underlying distribution of errors and the characteristics of the noise present in the data. There is also mention of the central limit theorem in relation to the normality of error distributions.

Who May Find This Useful

This discussion may be of interest to those involved in data analysis, statistical modeling, and experimental design, particularly in fields where fitting models to data is critical, such as physics, engineering, and applied mathematics.

cosmicminer
We all know the least squares method to find the best fit line for a collection of random data.
But I wonder if it is the best method.

Suppose we have two random variables y and x that appear to have a linear relation of the type y = ax+b.
What we want is, given the next x signal, to predict the value of the y signal as closely as possible.
The well known method tells us to use our experimental readings and minimize the sum of squared deviations - so the values of a and b are easily computed.
That is, we seek the minimum of

F = sum of [ ( Yi - aXi - b) ^ 2 ]

which works out easily - see your maths book.

But what if I go for the minimum of

F1 = sum of [ | Yi - aXi - b | ]

instead ?

This one has no analytical solution, but that doesn't matter because it is very easy to work out using any crude numerical approach.
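A rough sketch of such a crude numerical approach, next to the closed-form least squares line. It relies on the fact that some minimizer of F1 passes through at least two of the data points, so for a small data set it is enough to try every pair. The function names and the sample data (with one deliberate outlier) are made up for illustration.

```python
from itertools import combinations


def least_squares_line(xs, ys):
    """Closed-form (a, b) minimizing F = sum((y - a*x - b)**2)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, mean_y - a * mean_x


def abs_error(a, b, xs, ys):
    """F1 = sum(|y - a*x - b|) for a candidate line."""
    return sum(abs(y - a * x - b) for x, y in zip(xs, ys))


def least_abs_line(xs, ys):
    """Brute-force (a, b) minimizing F1 by trying the line through every pair of points."""
    best = None
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        if x1 == x2:
            continue  # a vertical line cannot be written as y = a*x + b
        a = (y2 - y1) / (x2 - x1)
        b = y1 - a * x1
        err = abs_error(a, b, xs, ys)
        if best is None or err < best[0]:
            best = (err, a, b)
    return best[1], best[2]


# Illustrative data: four points on y = x plus one outlier.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [0.0, 1.0, 2.0, 3.0, 14.0]

a_ls, b_ls = least_squares_line(xs, ys)
a_lad, b_lad = least_abs_line(xs, ys)
print("least squares:        a =", a_ls, " b =", b_ls)
print("least absolute error: a =", a_lad, " b =", b_lad)
```

On this sample the least squares line is y = 3x - 2, pulled toward the outlier, while the least absolute error line is y = x, passing through the four collinear points. The pair search costs on the order of n^3 operations, so it is only meant for small data sets.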

So in general we get two different -or somewhat different- best fit lines.
Which one is the best best ?

After all, if we go back to the zero-mean normal distribution, the second moment E[x^2] is the variance (sigma^2), and the first absolute moment E|x| is also proportional to the standard deviation (E|x| = sigma * sqrt(2/pi)).
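For reference, the sqrt(2/pi) factor follows from a direct integration, assuming a normal density with zero mean and standard deviation sigma:

[tex]E|x| = 2\int_0^\infty x \, \frac{1}{\sigma\sqrt{2\pi}} \, e^{-x^2/(2\sigma^2)} \, dx = \frac{2}{\sigma\sqrt{2\pi}} \cdot \sigma^2 = \sigma\sqrt{\frac{2}{\pi}}[/tex]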

In a real problem using one method and then the other, the results are not likely to be identical.
In terms of probabilistic inference which method is better ?
And I don't believe in calculus books, because maybe what they wanted was to have an analytical solution and print it !
 
You've overlooked a part of the model: the noise! While you might have the relationship

Y = aX + b

what you are measuring is the value

Z = Y + N

where N is the noise. So you don't have (X, Y) pairs, but instead have (X, Z) pairs.

Given an a and b, you can compute the noise from an (X, Z) pair via:

N = Z - aX - b

So, a reasonable way to attack the problem is to find the (a, b) pair that gives the most likely noise.

In other words, you want to maximize P(N).

If you model the noise on a gaussian, then you have:

P(N) = P exp(Q N²)

for some constants P and Q. But, since we can calculate N from our (X, Z) data points, given an assumption on a and b:

P(N) = P exp(Q (Z - aX - b)²)

But wait, we have several (independent) data points, say n of them, so we should really be looking at:

[tex] \prod_i P(N_i) = \prod_i P e^{Q (Z_i - aX_i - b)^2} = P^n e^{Q \sum_i (Z_i - aX_i - b)^2}[/tex]

So we want to maximize this expression over all a and b. Look familiar? (Recall that Q < 0)
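Spelling out that last step, under the assumption that the noise has the usual Gaussian density with standard deviation sigma (sigma is not named above; it fixes the constant Q): taking the logarithm of the product over the data points gives

[tex]\ln \prod_i P(N_i) = \text{const} + Q \sum_i (Z_i - aX_i - b)^2, \qquad Q = -\frac{1}{2\sigma^2} < 0[/tex]

The constant does not depend on a or b, and Q is negative, so maximizing the likelihood is the same as minimizing the sum of squares. Setting the derivatives with respect to a and b to zero gives the usual normal equations:

[tex]\sum_i X_i (Z_i - aX_i - b) = 0, \qquad \sum_i (Z_i - aX_i - b) = 0[/tex]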


If you don't like my introduction of the Z variable, you should say that you are modelling your Y's as:

Y = aX + b + N

instead of simply being Y = aX + b. (Since that relationship is clearly not true, from the data!)


So there's the theoretical reason. (P.S. least squares is often given in stats books too)

Of course, empirical testing is important -- you might find that the noise in your particular channel is not gaussian and is better handled with a least-absolute-error approximation instead of least-squared error.


Of course, I have not shown that the resulting estimator for future Z's is a good one -- there's no guarantee that the best a and b yield the best estimator for Z in terms of X. I'd have to spend a bit more time to figure that one out.
 
So you are saying the least squares method is strictly speaking the correct one and not the absolute differences.
Looks like a sound argument you made - though the difference in real applications is likely to be too small to be readily obvious.
What might be a source of random variables of this type (other than computer generated random numbers) ?
 
Least squares is the best method if the error source is Gaussian. A method other than least squares is "better" if the underlying random process is not Gaussian. For example, minimizing the sum of the absolute errors produces the "best" fit (in a maximum likelihood sense) if the underlying error process is a two-sided exponential,
[tex]P(x) = k \exp\left(-\left|\frac{x-x_0}{\sigma}\right|\right)[/tex]
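A small sketch of that claim in the simplest setting, estimating a single location x_0 rather than a full line. It assumes unit scale (sigma = 1), for which the normalizing constant is k = 1/(2 sigma), and it uses a plain grid search over candidate values. The data and function names are illustrative only.

```python
import math
import statistics


def laplace_log_likelihood(x0, data, sigma=1.0):
    """Log-likelihood under P(x) = k * exp(-|x - x0| / sigma), with k = 1 / (2 * sigma)."""
    return sum(math.log(1.0 / (2.0 * sigma)) - abs(x - x0) / sigma for x in data)


def gaussian_log_likelihood(mu, data, sigma=1.0):
    """Log-likelihood under a normal density with mean mu and standard deviation sigma."""
    return sum(
        -0.5 * math.log(2.0 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2.0 * sigma ** 2)
        for x in data
    )


# Illustrative sample with one large outlier.
data = [1.0, 2.0, 3.0, 4.0, 100.0]

# Crude grid of candidate locations: 0.0, 0.1, ..., 100.0.
candidates = [i / 10 for i in range(0, 1001)]

best_laplace = max(candidates, key=lambda c: laplace_log_likelihood(c, data))
best_gauss = max(candidates, key=lambda c: gaussian_log_likelihood(c, data))

print("two-sided exponential picks:", best_laplace, "(median:", statistics.median(data), ")")
print("Gaussian picks:             ", best_gauss, "(mean:", statistics.mean(data), ")")
```

Maximizing the two-sided exponential likelihood minimizes the sum of absolute errors and lands on the median (3.0 here), while the Gaussian likelihood minimizes the sum of squares and lands on the mean (22.0 here).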

Another kind of problem is finding the "best" representation of some function [itex]f(x)[/itex]. Suppose you are asked to find some polynomial approximation of [itex]\sin(x)[/itex] over some interval. Least squares will produce one answer. A "better" answer is produced by minimizing the worst-case error over the interval, because that is the error I care about as a user of your approximation.
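To make the contrast concrete, here is a sketch for the smallest interesting case: a degree-1 polynomial (a straight line) approximating sin(x) on [0, pi/2]. The interval and degree are choices made for this illustration. The line case is used because, for a concave function, the minimax line has a simple closed form: it has the slope of the chord and sits halfway between the chord and the parallel tangent. The least squares line is computed on a dense grid as a stand-in for the continuous fit.

```python
import math

lo, hi = 0.0, math.pi / 2
n = 2001
xs = [lo + (hi - lo) * i / (n - 1) for i in range(n)]
ys = [math.sin(x) for x in xs]

# Least squares line on the grid (closed form).
mean_x = sum(xs) / n
mean_y = sum(ys) / n
sxx = sum((x - mean_x) ** 2 for x in xs)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
a_ls = sxy / sxx
b_ls = mean_y - a_ls * mean_x

# Minimax line for concave sin on [lo, hi]:
# slope of the chord, intercept midway between chord and parallel tangent.
a_mm = (math.sin(hi) - math.sin(lo)) / (hi - lo)
c = math.acos(a_mm)  # point where the tangent slope cos(c) equals the chord slope
b_mm = 0.5 * ((math.sin(lo) - a_mm * lo) + (math.sin(c) - a_mm * c))


def max_error(a, b):
    """Worst-case absolute error of the line on the grid."""
    return max(abs(y - a * x - b) for x, y in zip(xs, ys))


def sum_sq(a, b):
    """Sum of squared errors of the line on the grid."""
    return sum((y - a * x - b) ** 2 for x, y in zip(xs, ys))


print("least squares: max error", max_error(a_ls, b_ls), " sum of squares", sum_sq(a_ls, b_ls))
print("minimax:       max error", max_error(a_mm, b_mm), " sum of squares", sum_sq(a_mm, b_mm))
```

Each line wins on its own criterion: the least squares line has the smaller sum of squares, and the minimax line has the smaller worst-case error, with the same error magnitude at both endpoints and at the tangent point.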
 
If you believe or know that the least squares estimator is not appropriate for whatever reason (e.g. non-normal errors), you should use a maximum likelihood estimator. You need to set up the likelihood function and then maximize it. It may not always have an analytic solution, though.

With normal (gaussian) errors, the least squares estimator is identical to the maximum likelihood estimator.
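As a sketch of what "set up the likelihood, then maximize it" can look like when there is no analytic solution, the example below estimates a single location parameter under Cauchy-distributed noise with unit scale. The Cauchy choice, the data, and the function names are assumptions made for this illustration and are not from the thread. A plain grid search is used because this likelihood can have several local maxima.

```python
import math
import statistics


def neg_log_likelihood(c, data):
    """Negative log-likelihood for Cauchy noise, density 1 / (pi * (1 + (x - c)**2))."""
    return len(data) * math.log(math.pi) + sum(math.log1p((x - c) ** 2) for x in data)


def grid_mle(data, step=0.001):
    """Crude maximum likelihood estimate: scan candidates between min and max of the data."""
    lo, hi = min(data), max(data)
    steps = int(round((hi - lo) / step))
    candidates = (lo + i * step for i in range(steps + 1))
    return min(candidates, key=lambda c: neg_log_likelihood(c, data))


# Illustrative sample with one wild reading.
data = [-1.0, 0.0, 0.5, 1.0, 50.0]

estimate = grid_mle(data)
print("maximum likelihood estimate:", estimate)
print("sample mean (least squares):", statistics.mean(data))
print("negative log-likelihood at estimate:", neg_log_likelihood(estimate, data))
print("negative log-likelihood at mean:    ", neg_log_likelihood(statistics.mean(data), data))
```

The scan is restricted to the range of the data because the log-likelihood is increasing below the smallest reading and decreasing above the largest one. For a real problem a proper optimizer started from several points would be the more usual choice.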
 
What might be a source of random variables of this type (other than computer generated random numbers) ?
Anything, really, due to the central limit theorem.

Very loosely speaking, if there are enough independent factors that contribute to noise, the result will look normally distributed.
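A quick simulation of that loose statement, with made-up ingredients: each noise value is the sum of 12 independent uniform contributions, none of which is normal on its own. The number of contributions, the sample size, and the seed are arbitrary choices for the sketch.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable


def noisy_sample(k=12):
    """Sum of k independent uniform(-0.5, 0.5) contributions (variance k / 12)."""
    return sum(random.random() - 0.5 for _ in range(k))


samples = [noisy_sample() for _ in range(20000)]

mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
sd = var ** 0.5

# For a normal distribution about 68.3% of values fall within one standard deviation.
within_one_sd = sum(abs(s - mean) <= sd for s in samples) / len(samples)

print("mean:", mean)
print("standard deviation:", sd)
print("fraction within one standard deviation:", within_one_sd)
```

The fraction within one standard deviation should come out close to the normal value of about 0.683, even though every individual contribution is uniform.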

I do want to stress again that empiricism is useful -- practice has a nasty habit of defying theory's predictions, especially on the fine details! If you have the time and ability, it would almost certainly be worth trying out different methods to see which one gives the best results for your application.

Or, if you're really ambitious, you could carry out a detailed analysis of the noise to guide the design of a better way to estimate your y's.
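One possible starting point for such an analysis, offered only as a sketch: look at the excess kurtosis of the residuals, which is 0 for a Gaussian and 3 for a two-sided exponential. The example below uses synthetic noise instead of real residuals, and the sample size and seed are arbitrary.

```python
import random

random.seed(1)  # fixed seed so the run is repeatable


def excess_kurtosis(values):
    """Fourth central moment over squared variance, minus 3 (0 for a Gaussian)."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((v - m) ** 2 for v in values) / n
    m4 = sum((v - m) ** 4 for v in values) / n
    return m4 / (m2 * m2) - 3.0


# Synthetic stand-ins for residuals Z - aX - b from a fitted line.
gaussian_noise = [random.gauss(0.0, 1.0) for _ in range(50000)]
# The difference of two independent exponentials is a two-sided exponential.
laplace_noise = [random.expovariate(1.0) - random.expovariate(1.0) for _ in range(50000)]

print("Gaussian noise excess kurtosis:             ", excess_kurtosis(gaussian_noise))
print("two-sided exponential noise excess kurtosis:", excess_kurtosis(laplace_noise))
```

Residuals with an excess kurtosis well above zero hint at heavier tails than Gaussian, which is the situation where a least-absolute-error fit is worth trying.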
 
