# Substitution to turn a non-linear least squares problem into a linear one

1. Dec 11, 2012

### bers

Hi,
I want to solve an overdetermined non-linear equation with the method of least squares. Assume it's f(x) = 1 + ax + a^2 + b, and I want to estimate a and b. This is non-linear, as I said, so the derivatives of the squared residuals involve a^3 terms and are difficult to solve.
Now I thought about substituting c = a^2 + b, which would turn this into a linear system that LOOKS pretty easy to solve. But when calculating the derivatives of the residuals, do I need to take into account that dc/da != 0 (which would make the terms quite complicated again), or can I simply proceed as if c had been the original variable?
If I am allowed to go this way (I guess I am), what are the conditions - when am I allowed to substitute to turn a non-linear least-squares system into a linear one? Can you point me to a fundamental rule for this? I searched Google quite a bit, but have not found anything adequate.

Thanks
bers

2. Dec 11, 2012

### HallsofIvy

You can approximate a non-linear problem, about a fixed set of values, by replacing any non-linear formula with its linear approximation. You are allowed to do this at any point, but how accurate the approximation is depends on how close to linear the function is at that point. If you are allowed to choose the point, then take the derivative and choose the point where the derivative (or its magnitude) is smallest. If, as often happens, you are given the point, check the derivative to see whether "linearizing" will give reasonably accurate results.
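As a small illustration of this, here is a Python sketch (with a made-up function $g(a) = a^2$ and a made-up expansion point, not anything from the thread) of how a tangent-line approximation is accurate near the expansion point and degrades away from it:

```python
def lin_approx(g, dg, a0):
    """First-order (tangent-line) approximation of g around the point a0."""
    return lambda a: g(a0) + dg(a0) * (a - a0)

g = lambda a: a**2      # the non-linear function being linearized
dg = lambda a: 2 * a    # its derivative
approx = lin_approx(g, dg, a0=3.0)

# Close to a0 the error is small; it grows quadratically with |a - a0|.
err_near = abs(g(3.1) - approx(3.1))  # ≈ 0.01
err_far = abs(g(4.0) - approx(4.0))   # ≈ 1.0
```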

3. Dec 11, 2012

### hotvette

Least squares using non-linear functions can be rather straightforward. It's really just a series of linearized least-squares problems (based on an initial guess of the solution) that (hopefully) converge iteratively to a solution. Difficulties can arise if the starting point is far from the solution and/or the matrices are ill-conditioned.
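The iterative scheme described here is commonly known as Gauss-Newton. A minimal sketch, assuming NumPy and the thread's example model $f(x) = 1 + ax + a^2 + b$ (the starting guess and iteration count are arbitrary choices, not from the post):

```python
import numpy as np

def gauss_newton(x, y, a=0.0, b=0.0, iters=20):
    """Fit f(x) = 1 + a*x + a^2 + b by repeatedly solving a linearized
    least-squares problem around the current guess (a, b)."""
    for _ in range(iters):
        r = 1 + a * x + a**2 + b - y            # residuals at current guess
        J = np.column_stack([x + 2 * a,          # dr/da
                             np.ones_like(x)])   # dr/db
        # Linearized step: solve J @ delta ≈ -r in the least-squares sense.
        delta, *_ = np.linalg.lstsq(J, -r, rcond=None)
        a += delta[0]
        b += delta[1]
    return a, b
```

Note that if the starting guess is poor, plain Gauss-Newton can diverge; damped variants (e.g. Levenberg-Marquardt) address exactly the difficulty mentioned above.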

Last edited: Dec 11, 2012
4. Dec 12, 2012

### bers

Hi,
thanks for your answers. I do think, however, I did not point out two important things clearly enough:

1. I am aware of many iterative algorithms for this problem. However, I'm interested in a non-iterative solution. (Would you call this an ad-hoc solution?) Anyway, I want to have a fixed-run-time algorithm for the problem I mentioned, and we all know that linear problems can be solved this way.

2. I am also not interested in an approximation. I am interested whether the substitution will allow me to find the exact solution in a linear way.

So let me summarize what I got to so far. Let's talk about $f(x) = 1 + ax + a^2 + b$. Using some data $x_i$ and $y_i$, I can define the sum of squared residuals

$R = \sum_{i=1}^n ||f(x_i) - y_i||^2$

Substituting $c = a^2 + b$ into $f$, I find

$R = \sum_{i=1}^n ||1 + ax_i + c - y_i||^2$

Now, I need to calculate the derivatives with respect to a and c. Not knowing whether total or partial derivatives are the way to go, let's try both.

$\partial R / \partial a = \sum_{i=1}^n 2 (1 + ax_i + c - y_i) \cdot x_i = 0$
$\partial R / \partial c = \sum_{i=1}^n 2 (1 + ax_i + c - y_i) \cdot 1 = 0$

These are pretty easy to solve non-iteratively, because they are linear in a and c, although my original problem is not. That is my main concern. Will the solution I achieve from this system of equations be exact? In the manual examples I calculated, they were, but I cannot be sure this is always the case.
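For concreteness, these two linear normal equations can be solved non-iteratively with an ordinary linear least-squares routine. A NumPy sketch with made-up data (the $x_i$, $y_i$ values are placeholders, not from the post):

```python
import numpy as np

# Hypothetical data; any (x_i, y_i) pairs work.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.2, 3.1, 4.9, 7.2])

# Model 1 + a*x + c: design-matrix columns multiply a and c.
A = np.column_stack([x, np.ones_like(x)])
rhs = y - 1                   # move the constant 1 to the right-hand side

(a, c), *_ = np.linalg.lstsq(A, rhs, rcond=None)
b = c - a**2                  # undo the substitution c = a^2 + b
```

The lstsq solution satisfies exactly the two partial-derivative equations above (they are the normal equations of this linear system).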

If we go for total derivatives, we need to account for c depending on a, with $\partial c / \partial a = 2a$ and $\partial a / \partial c$ involving roots.

$dR / da = \sum_{i=1}^n 2 (1 + ax_i + c - y_i) \cdot (x_i + \partial c / \partial a) = 0$
$dR / dc = \sum_{i=1}^n 2 (1 + ax_i + c - y_i) \cdot (1 + \partial a / \partial c)= 0$

These are no longer linear, and much more complex to solve. Testing some values fulfilling $f(x_i) = y_i$ with the correct values for a and b will of course make the first factor $1 + \dots - y_i$ equal 0, so the derivatives are 0 in both cases.

So, still I am not sure which way is the right one...

5. Dec 12, 2012

### hotvette

Seems to me your first approach is perfectly valid. But the problem is even simpler if you let c = 1 + a^2 + b. Your equation is then f(x) = ax + c, which is very easy to solve for a, c from a least squares standpoint. Then you can determine b = c - 1 - a^2.
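A sketch of this simpler substitution in Python (the data here are synthetic, generated from assumed true values a = 2, b = 3 just to exercise the recovery step):

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0])
y = 1 + 2.0 * x + 2.0**2 + 3.0   # exact data with a = 2, b = 3

# Fit the linear model f(x) = a*x + c.
A = np.column_stack([x, np.ones_like(x)])
(a, c), *_ = np.linalg.lstsq(A, y, rcond=None)

b = c - 1 - a**2                 # undo the substitution c = 1 + a^2 + b
```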

6. Dec 12, 2012

### bers

Yes, I confirmed that this works using distorted data ($y_i = f(x_i)$ + some error) with a non-zero residual. I obtained the same estimates as with Matlab and a non-linear iterative procedure (and got wrong results with the total-derivative method, by the way).

So that would mean I can obtain a linear formulation for a lot of non-linear problems, such as f(x) = bx + ax + a^2. For example, I could do c = a^2 and then d = b + sqrt(c) to obtain f(x) = dx + c. What I am wondering is:
- Why can't I find any mention of these substitutions on Google or elsewhere?
- Under which conditions can I do this substitution? So, what "kinds of" non-linearities can be (perfectly) "linearized", and which ones can't?
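The double substitution described above can be checked numerically. A Python sketch with made-up data; note one caveat that bears on the "which non-linearities" question: c = a^2 discards the sign of a, so sqrt(c) recovers a only under the assumption a >= 0:

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0])
a_true, b_true = 2.0, -1.0
y = b_true * x + a_true * x + a_true**2   # f(x) = bx + ax + a^2

# Fit the linear model f(x) = d*x + c, with c = a^2 and d = b + sqrt(c).
A = np.column_stack([x, np.ones_like(x)])
(d, c), *_ = np.linalg.lstsq(A, y, rcond=None)

a = np.sqrt(c)   # assumes a >= 0: the substitution c = a^2 loses the sign
b = d - a        # undo d = a + b
```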

7. Dec 12, 2012

### hotvette

I believe the problem you stated is really a linear problem. Let's say the problem was posed in reverse: I give you data and ask you to fit it to the equation f(x) = ax + c. After you give me the result, I tell you I want to re-define c = 1 + a^2 + b. The fact that a valid algebraic manipulation is performed on the parameters doesn't turn a linear problem into a non-linear one.

It seems to me (though I have no idea how to prove it mathematically) that if you can obtain a solvable algebraic expression for the unknown parameters, you'll be OK.