Linear Regression, Linear Least Squares, Least Squares, Non-linear Least Squares

hotvette · Oct 21, 2005

It seems to me that Linear Regression and Linear Least Squares are often used interchangeably, but I believe there are subtle differences between the two. From what I can tell (for simplicity let's assume the uncertainty is in y only), Linear Regression refers to the general case of fitting a straight line to a set of data, but the measure of optimal fit can be almost anything (e.g. sum of vertical differences, sum of absolute values of vertical differences, max vertical difference, sum of squares of vertical differences, etc.), whereas Linear Least Squares refers to a specific measure of optimal fit, namely, the sum of the squares of the vertical differences.

Actually, it seems to me that Linear Least Squares doesn't necessarily mean that you are fitting a straight line to the data, it just means that the modelling function is linear in the unknowns (e.g. [itex]y = ax^2 + bx + c[/itex] is linear in a, b, and c). Perhaps it is established convention that Linear Least Squares does, in fact, refer to fitting a straight line, whereas Least Squares is the more general case?
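
To illustrate the "linear in the unknowns" point, here is a minimal Python sketch that fits [itex]y = ax^2 + bx + c[/itex] with an ordinary linear least squares solve. The data values and variable names are made up for illustration; NumPy's `numpy.linalg.lstsq` is assumed to be available.

```python
import numpy as np

# Hypothetical noise-free data generated from y = 2x^2 - 3x + 1
x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0, 3.0])
y = 2.0 * x**2 - 3.0 * x + 1.0

# Design matrix: one column per unknown (a, b, c).
# The model is non-linear in x but linear in a, b, c,
# so the fit reduces to a linear system.
A = np.column_stack([x**2, x, np.ones_like(x)])

# Minimize the sum of squared vertical differences ||A p - y||^2
coeffs, residuals, rank, sing_vals = np.linalg.lstsq(A, y, rcond=None)
a, b, c = coeffs
print(a, b, c)
```

No iteration or starting guess is needed here, which is the practical difference from the non-linear case.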

Lastly, Non-linear Least Squares refers to cases where the modelling function is not linear in the unknowns (e.g. [itex]y = e^{-ax^b}[/itex], where a,b are sought).

Is my understanding correct on this?

Dr Transport · Oct 21, 2005

You can "linearize" [tex]y = e^{-ax^b}[/tex] by fitting [tex]\ln(y) = -ax^b[/tex], so that you are fitting [tex]-ax^b[/tex] as opposed to an exponential. This is done all the time; the result may not be linear, but it is a much easier function to fit in the long run. It is still non-linear in the strictest sense.
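
One way to take the transformation a step further, assuming [itex]a > 0[/itex], [itex]x > 0[/itex] and [itex]0 < y < 1[/itex] so that both logarithms exist, is [itex]\ln(-\ln(y)) = \ln(a) + b\ln(x)[/itex], which is linear in the unknowns [itex]\ln(a)[/itex] and b. A minimal Python sketch with made-up, noise-free data (the values `a_true`, `b_true` and all variable names are illustrative assumptions):

```python
import numpy as np

# Hypothetical "true" parameters, used only to generate test data
a_true, b_true = 0.5, 1.5
x = np.linspace(0.5, 3.0, 10)       # x > 0 so log(x) exists
y = np.exp(-a_true * x**b_true)     # 0 < y < 1 so log(-log(y)) exists

# Transformed variables: v = ln(a) + b * u
u = np.log(x)
v = np.log(-np.log(y))

# Straight-line fit in (u, v) by linear least squares
A = np.column_stack([np.ones_like(u), u])
(ln_a, b_fit), *_ = np.linalg.lstsq(A, v, rcond=None)
a_fit = np.exp(ln_a)
print(a_fit, b_fit)
```

Note that this minimizes the squared error in the transformed variable, not in y itself, so with noisy data the result can differ from a direct non-linear fit.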

You are correct in saying that least squares is the more general case. In theory you can fit any polynomial, exponential, logarithmic, etc., by using least squares.

Dr Transport · Oct 22, 2005

In poking around some of my numerical analysis texts I was reminded of the Levenberg-Marquardt method of fitting curves to data. It seems to be one of the more robust nonlinear least squares methods.

http://www.library.cornell.edu/nr/bookcpdf/c15-5.pdf

Take a look...

hotvette · Oct 27, 2005

Thanks. I agree re [itex]y = e^{-ax^b}[/itex]. Bad choice on my part. I've read several web articles on the Levenberg-Marquardt method but don't quite follow them. I've seen what appear to be 2 versions. One is for general unconstrained optimization, where you are minimizing an objective function, which for least squares would be [itex]\epsilon^2 = \Sigma(f(x_i)-y_i)^2[/itex]:

[tex][x^{k+1}] = [x^k] - \alpha [H(x^k) + \beta I]^{-1}[J(x^k)][/tex]

Here x is the list of unknown parameters, [itex]H(x^k)[/itex] is the Hessian matrix of second derivatives of the objective function [itex]\epsilon^2[/itex] with respect to the unknown parameters, [itex]\alpha, \beta[/itex] are parameters that control the stability of the iterative solution, and [itex]J(x^k)[/itex] is the Jacobian of first derivatives (i.e. the gradient) of the objective function with respect to the unknown parameters. I've actually been successful in using this for non-linear least squares problems, but convergence is extremely sensitive to [itex]\alpha, \beta[/itex]. In this version, the values for y are used within the objective function. Thus, we have a single function of n unknowns (n depending on the complexity of the fitting function) that we are trying to minimize.

I've also seen articles that specifically discuss using Levenberg-Marquardt for non-linear least squares, with a solution technique that does NOT require 2nd derivatives and is completely analogous to the linear least squares solution:

[tex]a^{k+1} = a^k - \alpha {[J^TJ + \beta I]^{-1}J^Tf(x)}[/tex]

Here a is the vector of unknown parameters and J is the Jacobian of [itex]y_i = f_i(x_i)[/itex] with respect to the unknown parameters. In this form, the 2nd derivative isn't used, and I've seen comments to the effect that the 2nd derivatives can lead to unstable situations. This version I can't get to work at all, and I suspect there is something wrong with my interpretation.

I'd appreciate some help with where I'm going astray with the 2nd version. It seems strange to me that one implementation uses 2nd derivatives and the other doesn't. Many thanks!
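
A sketch of how the two versions relate, assuming the least squares objective [itex]\epsilon^2[/itex] given earlier in this post and writing [itex]r_i(a) = f(x_i; a) - y_i[/itex] for the residuals. The symbols r and g are introduced here for illustration and are not from the thread. With J the Jacobian of the residuals, [itex]J_{ij} = \partial r_i / \partial a_j[/itex], the gradient g and Hessian H of [itex]\epsilon^2 = \Sigma r_i^2[/itex] are:

[tex]g = 2J^Tr, \qquad H = 2J^TJ + 2\Sigma_i r_i \nabla^2 r_i[/tex]

The first version uses the full H, and what it calls the "Jacobian" of the objective function is g. The second term of H is the only one containing 2nd derivatives. If it is dropped, the Newton step becomes:

[tex]H^{-1}g \approx [J^TJ]^{-1}J^Tr[/tex]

Adding the damping term [itex]\beta I[/itex] (up to a rescaling of [itex]\beta[/itex] by the factor of 2) then gives an update of the same shape as the second version. The dropped term is small when the residuals are small, so the approximation should be reasonable near a good fit.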

hotvette · Oct 27, 2005

I found the discrepancy. The [itex]f(x)[/itex] in:

[tex]a^{k+1} = a^k - \alpha {[J^TJ + \beta I]^{-1}J^Tf(x)}[/tex]

is really [itex]f(x)-y[/itex]. I was just using [itex]y[/itex]. With that fixed, I used it successfully, and the convergence wasn't nearly as dependent on [itex]\alpha[/itex] and [itex]\beta[/itex] as it was using the first method. I'm amazed that 2 different ways of supposedly using the same method can produce such different results, not in the final answer, but in the complexity of setup (i.e. computing 2nd derivatives vs not) and the stability of the solution. I guess that comment I read about 2nd derivatives contributing to instability was right. The net result seems to be that the non-linear least squares version of Levenberg-Marquardt is much more stable than the general version for unconstrained optimization. Amazing.
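
The corrected update can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a reference implementation: the test data are made up, the function names `model`, `jacobian` and `lm_fit` are hypothetical, and the rule that shrinks or grows [itex]\beta[/itex] after each trial step is a common heuristic added here, not something described in the thread.

```python
import numpy as np

def model(x, a):
    """Fitting function from the thread: y = exp(-a[0] * x**a[1])."""
    return np.exp(-a[0] * x**a[1])

def jacobian(x, a):
    """First derivatives of the model with respect to a[0] and a[1]."""
    f = model(x, a)
    xb = x**a[1]
    return np.column_stack([-xb * f, -a[0] * xb * np.log(x) * f])

def lm_fit(x, y, a0, alpha=1.0, beta=1e-2, tol=1e-12, max_iter=200):
    a = np.asarray(a0, dtype=float)
    r = model(x, a) - y              # residual is f(x) - y, not y alone
    cost = r @ r
    for _ in range(max_iter):
        J = jacobian(x, a)
        g = J.T @ r
        if np.linalg.norm(g) < tol:  # stationary point reached
            break
        # Solve [J^T J + beta I] step = J^T (f(x) - y); no 2nd derivatives
        step = np.linalg.solve(J.T @ J + beta * np.eye(a.size), g)
        a_new = a - alpha * step
        with np.errstate(over="ignore", invalid="ignore"):
            r_new = model(x, a_new) - y
            cost_new = r_new @ r_new
        if cost_new < cost:          # accept the step, reduce damping
            a, r, cost = a_new, r_new, cost_new
            beta *= 0.1
        else:                        # reject the step, increase damping
            beta *= 10.0
    return a

# Hypothetical noise-free data from a = 0.5, b = 1.5
x = np.linspace(0.5, 3.0, 20)
y = model(x, np.array([0.5, 1.5]))
a_fit = lm_fit(x, y, a0=[1.0, 1.0])
print(a_fit)
```

Replacing `r` with `y` in the line that forms `g` reproduces the mistake described above: the iteration then no longer drives [itex]f(x)-y[/itex] toward zero.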
