Linear Regression, Linear Least Squares, Least Squares, Non-linear Least Squares

It seems to me that Linear Regression and Linear Least Squares are often used interchangeably, but I believe there are subtle differences between the two. From what I can tell (for simplicity, let's assume the uncertainty is in y only), Linear Regression refers to the general case of fitting a straight line to a set of data, where the measure of optimal fit can be almost anything (e.g. sum of vertical differences, sum of absolute values of vertical differences, maximum vertical difference, sum of squares of vertical differences, etc.), whereas Linear Least Squares refers to one specific measure of optimal fit, namely the sum of the squares of the vertical differences.

Actually, it seems to me that Linear Least Squares doesn't necessarily mean that you are fitting a straight line to the data; it just means that the modelling function is linear in the unknowns (e.g. y = ax^2 + bx + c is linear in a, b, and c). Perhaps it is established convention that Linear Least Squares does, in fact, refer to fitting a straight line, whereas Least Squares is the more general case?
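
For concreteness, here is the kind of thing I mean as a little toy sketch in Python/NumPy (my own made-up example, not from any reference): y = ax^2 + bx + c is quadratic in x, but because it is linear in a, b and c it can be solved in one shot as a linear least squares problem.

Python:
import numpy as np

# Made-up data from y = 2x^2 - 3x + 1 plus a little noise
rng = np.random.default_rng(0)
x = np.linspace(-2.0, 2.0, 50)
y = 2.0 * x**2 - 3.0 * x + 1.0 + 0.1 * rng.standard_normal(x.size)

# Each column of the design matrix multiplies one unknown, so the problem
# is linear in (a, b, c) even though the model is quadratic in x
A = np.column_stack([x**2, x, np.ones_like(x)])

# Solve min ||A p - y||^2 in closed form
p, *_ = np.linalg.lstsq(A, y, rcond=None)
print(p)   # should come out close to (2, -3, 1)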

Lastly, Non-linear Least Squares refers to cases where the modelling function is not linear in the unknowns (e.g. y = e^{-ax^b}, where a and b are sought).

Is my understanding correct on this?
 
You can "linearize" y = e^{-ax^b} so that you are fitting an -x^b as opposed to an exponential by fitting \ln(y) = -ax^b. This is done all the time, maybe not linear, but a much easier function to fit in the long run. It is still non-linear in the strictest sense.

You are correct in saying that least squares is the more general case. In theory you can fit any polynomial, exponential, logarithmic, etc., by using least squares.
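
For example, here is a rough sketch with made-up numbers (it assumes a, x > 0 so that 0 < y < 1): taking logs twice turns y = e^{-ax^b} into an ordinary straight-line fit, \ln(-\ln y) = \ln a + b \ln x. The estimates won't exactly match a true non-linear least squares fit, because the transformation reweights the errors, but it is a cheap answer or a starting guess.

Python:
import numpy as np

# Made-up data from y = exp(-a x^b) with a = 0.5, b = 1.5
rng = np.random.default_rng(1)
x = np.linspace(0.5, 3.0, 40)
y = np.exp(-0.5 * x**1.5) * (1.0 + 0.01 * rng.standard_normal(x.size))

# ln(-ln y) = ln a + b ln x  -- a straight line in ln x
Y = np.log(-np.log(y))
A = np.column_stack([np.ones_like(x), np.log(x)])

coef, *_ = np.linalg.lstsq(A, Y, rcond=None)
ln_a, b = coef
print(np.exp(ln_a), b)   # roughly 0.5 and 1.5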
 
In poking around some of my numerical analysis texts I was reminded of the Levenberg-Marquardt method of fitting curves to data. It seems to be one of the more robust nonlinear least squares methods.

http://www.library.cornell.edu/nr/bookcpdf/c15-5.pdf

Take a look...
 
Thanks. I agree re y = e^{-ax^b}; bad choice on my part. I've read several web articles on the Levenberg-Marquardt method but don't seem to quite follow. I've seen what appear to be two versions. One is for general unconstrained optimization, where you are minimizing an objective function, which for least squares would be \epsilon^2 = \sum_i \left( f(x_i) - y_i \right)^2. The update is:

x^{k+1} = x^k - \alpha \left[ H(x^k) + \beta I \right]^{-1} J(x^k)

where x is the list of unknown parameters, H(x^k) is the Hessian matrix of second derivatives of the objective function \epsilon^2 with respect to the unknown parameters, \alpha and \beta are parameters that control the stability of the iterative solution, and J(x^k) is the Jacobian (i.e. the gradient of first derivatives) of the objective function with respect to the unknown parameters. I've actually been successful in using this for non-linear least squares problems, but convergence is extremely sensitive to \alpha and \beta. In this version the values of y are used inside the objective function, so we have a single objective function in n unknowns (depending on the complexity of the fitting function) that we are trying to minimize.
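
Roughly, here is the kind of iteration I mean, as a toy sketch in Python/NumPy (made-up data from the y = e^{-ax^b} example above; the derivatives of \epsilon^2 are taken by finite differences just to keep it short, \alpha is fixed at 1, and \beta is adjusted up or down after each step depending on whether \epsilon^2 actually dropped, since holding \alpha and \beta fixed is so touchy):

Python:
import numpy as np

# Toy data from y = exp(-a x^b) with true (a, b) = (0.5, 1.5)
rng = np.random.default_rng(2)
x = np.linspace(0.5, 3.0, 40)
y = np.exp(-0.5 * x**1.5) + 0.005 * rng.standard_normal(x.size)

def eps2(p):
    # objective: sum of squared vertical differences
    a, b = p
    return np.sum((np.exp(-a * x**b) - y) ** 2)

def grad_hess(p, h=1e-4):
    # central finite differences for the gradient and Hessian of eps2
    n = len(p)
    g = np.zeros(n)
    H = np.zeros((n, n))
    for i in range(n):
        ei = np.zeros(n); ei[i] = h
        g[i] = (eps2(p + ei) - eps2(p - ei)) / (2.0 * h)
        for j in range(n):
            ej = np.zeros(n); ej[j] = h
            H[i, j] = (eps2(p + ei + ej) - eps2(p + ei - ej)
                       - eps2(p - ei + ej) + eps2(p - ei - ej)) / (4.0 * h**2)
    return g, H

p = np.array([1.0, 1.0])     # starting guess for (a, b)
beta = 1e-2
for _ in range(200):
    g, H = grad_hess(p)
    step = np.linalg.solve(H + beta * np.eye(len(p)), g)
    if eps2(p - step) < eps2(p):
        p = p - step         # step helped: accept it and trust the model more
        beta *= 0.5
    else:
        beta *= 10.0         # step hurt: damp harder (more like steepest descent)

print(p)   # should end up near (0.5, 1.5)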

I've also seen articles specifically about using Levenberg-Marquardt for non-linear least squares, using a solution technique that does NOT require 2nd derivatives and is completely analogous to the linear least squares solution:

a^{k+1} = a^k - \alpha \left[ J^T J + \beta I \right]^{-1} J^T f(x)

where a is the list of unknown parameters and J is the Jacobian of the model values f(x_i) with respect to the unknown parameters. In this form the 2nd derivatives aren't used, and I've seen comments to the effect that the 2nd derivatives can lead to unstable situations. This version I can't get to work at all, and I suspect there is something wrong with my interpretation.

I'd appreciate some help with where I'm going astray with the 2nd version. It seems strange to me that one implementation uses 2nd derivatives and the other doesn't. Many thanks!
 
I found the discrepancy. The f(x) in:

a^{k+1} = a^k - \alpha \left[ J^T J + \beta I \right]^{-1} J^T f(x)

is really f(x) - y; I was just using y. With that fix I used it successfully, and the convergence wasn't nearly as dependent on \alpha and \beta as it was with the first method. I'm amazed that two different ways of supposedly applying the same method can produce such different results, not in the final answer, but in the complexity of the setup (i.e. computing 2nd derivatives vs. not) and the stability of the solution. I guess that comment I read about 2nd derivatives contributing to instability was right. The net result seems to be that the non-linear least squares version of Levenberg-Marquardt is much more stable than the general version for unconstrained optimization. Amazing.
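
For the record, here is the same toy problem (made-up data from y = e^{-ax^b}) set up in this residual form, with the Jacobian written out analytically and \alpha, \beta simply held fixed at illustrative values. The line to notice is the one computing r = f(x) - y rather than just y:

Python:
import numpy as np

# Same toy model: y = exp(-a x^b) with true (a, b) = (0.5, 1.5)
rng = np.random.default_rng(3)
x = np.linspace(0.5, 3.0, 40)
y = np.exp(-0.5 * x**1.5) + 0.005 * rng.standard_normal(x.size)

def model(p):
    a, b = p
    return np.exp(-a * x**b)

def jacobian(p):
    # derivatives of the model values f(x_i) with respect to a and b
    a, b = p
    f = np.exp(-a * x**b)
    return np.column_stack([-x**b * f,                    # d f / d a
                            -a * x**b * np.log(x) * f])   # d f / d b

p = np.array([1.0, 1.0])     # starting guess for (a, b)
alpha, beta = 1.0, 1e-3      # illustrative values for this toy problem
for _ in range(100):
    r = model(p) - y         # residuals f(x) - y, NOT just y
    J = jacobian(p)
    step = np.linalg.solve(J.T @ J + beta * np.eye(len(p)), J.T @ r)
    p = p - alpha * step

print(p)   # should end up near (0.5, 1.5)

Since J^T J + \beta I is positive definite for \beta > 0, the step is always a descent direction for \epsilon^2, which I suspect is part of why this form is so much better behaved than the Hessian version.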
 