The least squares approximation - best fit lines revisited

In summary, the thread discusses the least squares method for finding the best-fit line through a collection of random data and asks whether it is really the best method. It raises the alternative of minimizing the sum of absolute differences instead of the sum of squared differences, and works through the theoretical justification for each: least squares is the maximum likelihood fit under Gaussian noise, while least absolute deviations is optimal under double-exponential (Laplace) noise. The thread also stresses empirical testing and consideration of the underlying random process, concluding that the best method depends on the application and recommending experimenting with different methods or analyzing the noise to find a better approach.
  • #1
cosmicminer
We all know the least squares method to find the best fit line for a collection of random data.
But I wonder if it is the best method.

Suppose we have two random variables y and x that appear to have a linear relation of the type y = ax+b.
What we want is, given the next x signal, to predict the value of the y signal as closely as possible.
The well-known method tells us to use our experimental readings and minimize the squared-error functional, so the values of a and b are easily computed.
That is, we seek the minimum of

[tex]F = \sum_i (Y_i - aX_i - b)^2[/tex]

which works out easily - see your maths book.

But what if I go for the minimum of

[tex]F_1 = \sum_i |Y_i - aX_i - b|[/tex]

instead?

This one has no analytical solution, but that doesn't matter because it is very easy to work out using any crude numerical approach.
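For concreteness, here is a minimal sketch of such a crude numerical approach in Python, using the derivative-free Nelder-Mead method from SciPy (the absolute value makes the objective non-smooth); the data and starting point are invented purely for illustration:

[code]
import numpy as np
from scipy.optimize import minimize

# Illustrative data: y = 2x + 1 plus noise (invented values).
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 50)
y = 2.0 * x + 1.0 + rng.normal(0.0, 1.0, size=x.size)

def f1(params):
    """Sum of absolute residuals: F1 = sum_i |Yi - a*Xi - b|."""
    a, b = params
    return np.sum(np.abs(y - a * x - b))

# Nelder-Mead is crude but robust for this non-smooth objective.
result = minimize(f1, x0=[0.0, 0.0], method="Nelder-Mead")
a_lad, b_lad = result.x
print(f"least-absolute-deviations fit: a = {a_lad:.3f}, b = {b_lad:.3f}")
[/code]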

So in general we get two different - or somewhat different - best fit lines.
Which one is the best "best"?

After all, if we go back to the zero-mean normal distribution, the second moment [itex]E[X^2][/itex] is the variance [itex]\sigma^2[/itex], and the absolute moment [itex]E[|X|][/itex] is also proportional to the standard deviation ([itex]E[|X|] = \sigma \sqrt{2/\pi}[/itex]).
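As a quick sanity check of those two moments, a throwaway Monte Carlo snippet (the sample size and sigma are arbitrary):

[code]
import numpy as np

# Compare sample moments of a zero-mean normal against sigma^2
# and sigma * sqrt(2 / pi).
rng = np.random.default_rng(1)
sigma = 2.5
samples = rng.normal(0.0, sigma, size=1_000_000)

print(np.mean(samples ** 2), sigma ** 2)                      # both ~6.25
print(np.mean(np.abs(samples)), sigma * np.sqrt(2 / np.pi))   # both ~1.995
[/code]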

In a real problem using one method and then the other, the results are not likely to be identical.
In terms of probabilistic inference which method is better ?
And I don't entirely trust the calculus books, because maybe what they wanted was a result with an analytical solution they could print!
 
  • #2
You've overlooked a part of the model: the noise! While you might have the relationship

Y = aX + b

what you are measuring is the value

Z = Y + N

where N is the noise. So you don't have (X, Y) pairs, but instead have (X, Z) pairs.

Given an a and b, you can compute the noise from an (X, Z) pair via:

N = Z - aX - b

So, a reasonable way to attack the problem is to find the (a, b) pair that gives the most likely noise.

In other words, you want to maximize P(N).

If you model the noise as Gaussian, then you have:

P(N) = P exp(Q N²)

for some constants P and Q. But, since we can calculate N from our (X, Z) data points, given an assumption on a and b:

P(N) = P exp(Q (Z - aX - b)²)

But wait, we have several (independent) data points: we should really be looking at:

[tex]
\prod_i P(N_i) = \prod_i P e^{Q (Z_i - aX_i - b)^2}
= P^n e^{Q \sum_i (Z_i - aX_i - b)^2}
[/tex]

So we want to maximize this expression over all a and b. Look familiar? (Recall that Q < 0)
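Making that last step explicit: taking the logarithm (which preserves the maximizer) turns the product into a sum,

[tex]
\log \prod_i P(N_i) = n \log P + Q \sum_i (Z_i - aX_i - b)^2
[/tex]

and since Q < 0, maximizing over a and b is exactly minimizing [itex]\sum_i (Z_i - aX_i - b)^2[/itex] - the least-squares objective.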


If you don't like my introduction of the Z variable, you should say that you are modelling your Y's as:

Y = aX + b + N

instead of simply being Y = aX + b. (Since that relationship is clearly not true, from the data!)


So there's the theoretical reason. (P.S. least squares is often given in stats books too)

Of course, empirical testing is important -- you might find that your particular channel does not carry Gaussian noise and is better handled with least-absolute-error approximation instead of least-squared-error.
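As an illustration of that kind of empirical test, here is a sketch in Python comparing the two fits under synthetic Gaussian and Laplace (double-exponential) noise; the slope, intercept, and noise scales are invented for demonstration:

[code]
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(42)
x = np.linspace(0.0, 10.0, 200)
true_a, true_b = 2.0, 1.0

def fit_ls(x, z):
    """Least squares, via the closed-form solution in polyfit."""
    a, b = np.polyfit(x, z, deg=1)
    return a, b

def fit_lad(x, z):
    """Least absolute deviations, via numerical minimization."""
    obj = lambda p: np.sum(np.abs(z - p[0] * x - p[1]))
    return tuple(minimize(obj, x0=[0.0, 0.0], method="Nelder-Mead").x)

for name, noise in [("gaussian", rng.normal(0.0, 1.0, x.size)),
                    ("laplace ", rng.laplace(0.0, 1.0, x.size))]:
    z = true_a * x + true_b + noise
    print(name, "LS :", fit_ls(x, z))
    print(name, "LAD:", fit_lad(x, z))
[/code]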


Of course, I have not shown that the resulting estimator for future Z's is a good one -- there's no guarantee that the best a and b yield the best estimator for Z in terms of X. I'd have to spend a bit more time to figure that one out.
 
  • #3
So you are saying the least squares method is, strictly speaking, the correct one, and not the absolute differences.
That looks like a sound argument - though the difference in real applications is probably too small to be readily obvious.
What might be a source of random variables of this type (other than computer-generated random numbers)?
 
  • #4
Least squares is the best method if the error source is Gaussian. A method other than least squares is "better" if the underlying random process is not Gaussian. For example, minimizing the sum of the absolute errors produces the "best" fit (in a maximum likelihood sense) if the underlying error process is a two-sided exponential (Laplace) distribution,
[tex]P(x) = k \exp\left(-\left|\frac{x-x_0}{\sigma}\right|\right)[/tex]

Another kind of problem is finding the "best" representation of some function [itex]f(x)[/itex]. Suppose you are asked to find some polynomial approximation of [itex]\sin(x)[/itex] over some interval. Least squares will produce one answer. A "better" answer is produced by minimizing the worst-case error over the interval, because that is the error I care about as a user of your approximation.
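As a concrete illustration, here is a short Python sketch of that idea, using Chebyshev interpolation as a convenient stand-in for the true minimax polynomial (it is only near-minimax; an exact answer would need a Remez exchange algorithm):

[code]
import numpy as np
from numpy.polynomial import chebyshev

# Approximate sin(x) on [0, pi] with degree-5 polynomials and compare
# worst-case errors of a least-squares fit vs. a Chebyshev fit.
xs = np.linspace(0.0, np.pi, 1001)
ys = np.sin(xs)

ls_poly = np.polynomial.Polynomial.fit(xs, ys, deg=5)  # least squares
ch_poly = chebyshev.Chebyshev.interpolate(np.sin, deg=5,
                                          domain=[0.0, np.pi])

print("least-squares max error:", np.max(np.abs(ls_poly(xs) - ys)))
print("chebyshev     max error:", np.max(np.abs(ch_poly(xs) - ys)))
[/code]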
 
  • #5
If you believe or know that the least squares estimator is suboptimal for whatever reason (e.g. non-normal errors), you should use a maximum likelihood estimator. You need to set up the likelihood function and then maximize it. It may not always have an analytic solution, though.

With normal (Gaussian) errors, the least squares estimator is identical to the maximum likelihood estimator.
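A minimal sketch of that recipe in Python, assuming (purely for illustration) heavy-tailed Student-t errors, for which the maximum likelihood fit has no closed form:

[code]
import numpy as np
from scipy.optimize import minimize
from scipy.stats import t as student_t

# Invented data: a line plus heavy-tailed t(df=3) noise.
rng = np.random.default_rng(7)
x = np.linspace(0.0, 10.0, 200)
y = 2.0 * x + 1.0 + student_t.rvs(df=3, size=x.size, random_state=rng)

def neg_log_likelihood(params):
    """Negative log-likelihood of the residuals under t(df=3) errors."""
    a, b = params
    return -np.sum(student_t.logpdf(y - a * x - b, df=3))

# Maximize the likelihood by minimizing its negative log.
result = minimize(neg_log_likelihood, x0=[0.0, 0.0], method="Nelder-Mead")
print("maximum likelihood fit:", result.x)
[/code]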
 
  • #6
"What might be a source of random variables of this type (other than computer-generated random numbers)?"
Anything, really, due to the central limit theorem.

Very loosely speaking, if there are enough independent factors that contribute to noise, the result will look normally distributed.
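A toy demonstration of that, summing a few dozen independent uniform "noise factors" per sample (the counts are arbitrary):

[code]
import numpy as np

# Each sample is a sum of 30 independent uniform contributions; the
# skewness and excess kurtosis of the sums should be near 0, as for
# a normal distribution.
rng = np.random.default_rng(3)
sums = rng.uniform(-1.0, 1.0, size=(100_000, 30)).sum(axis=1)

standardized = (sums - sums.mean()) / sums.std()
print("skewness       :", np.mean(standardized ** 3))
print("excess kurtosis:", np.mean(standardized ** 4) - 3.0)
[/code]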

I do want to stress again that empiricism is useful -- practice has a nasty habit of defying theory's predictions, especially on the fine details! If you have the time and ability, it would almost certainly be worth trying out different methods to see which one gives the best results for your application.

Or, if you're really ambitious, you could carry out a detailed analysis of the noise to guide the design of a better way to estimate your y's.
 

1. What is the least squares approximation?

The least squares approximation is a mathematical method used to find the best-fit line for a set of data points. It minimizes the sum of the squared vertical distances between the data points and the line, which makes it the most accurate fit in the maximum likelihood sense when the errors are Gaussian.

2. How is the best fit line determined using the least squares method?

The best fit line is determined by finding the values of the slope and y-intercept that minimize the sum of the squared distances between the data points and the line. This is achieved by taking the partial derivatives of the sum-of-squares function with respect to the slope and y-intercept, setting them equal to 0, and solving.
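Carrying that derivation through gives the standard closed-form solution for n data points:

[tex]
a = \frac{n \sum_i x_i y_i - \sum_i x_i \sum_i y_i}{n \sum_i x_i^2 - \left(\sum_i x_i\right)^2},
\qquad b = \bar{y} - a \bar{x}
[/tex]

where a is the slope and b the y-intercept.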

3. What types of data can be analyzed using the least squares approximation?

The least squares approximation can be used for any type of data that has two variables and a linear relationship. This includes data sets from various fields such as economics, physics, and social sciences.

4. How accurate is the best fit line obtained through least squares approximation?

The accuracy of the best fit line depends on the amount of variation in the data. If the data points are closely clustered around the line, the fit will be more accurate. However, if there is a lot of variation in the data, the fit may not be as accurate.

5. Are there any limitations to using the least squares approximation?

The least squares approximation assumes that the relationship between the variables is linear. If the relationship is not linear, the fit may not be accurate. Additionally, because squaring amplifies large residuals, the method is sensitive to data sets with many outliers or extreme values.
