# Least-squares fitting of an inverse function

1. Aug 22, 2015

### kandelabr

I have a set of data points that I must fit to an inverse function like this:

y(x) = a/(x+b) + c

My problem is that least-squares fitting with this equation is extremely unstable and heavily dependent on the initial guess. No matter how accurate the starting parameters are, the algorithm often goes berserk (trying 3.5e+58 instead of 1.2, etc.), and it doesn't matter which algorithm I use.

I guess there must be some mathematical preconditioning voodoo or something that I could use, but I can't find anything that works for me.

Any ideas?

Thanks!

2. Aug 22, 2015

### Dr. Courtney

Try parameterizing it differently, but in a way that is equivalent.

Maybe: y(x) = a/(x/b + 1) + c

Try some different ones until you find something stable.

3. Aug 22, 2015

### SteamKing

Staff Emeritus
It depends sometimes on the data you are trying to fit.

If the same data set contains 3.5E+58 and 1.2, you're probably going to run into problems with scaling the data. Sometimes, these problems can be minimized, but you'll have to provide more information about the problems you are experiencing and the data you are using.

4. Aug 23, 2015

### HallsofIvy

By the way, that is NOT an "inverse" function, it is a "rational" function. The word "inverse", applied to functions, means the inverse under composition (the function that undoes the original), not the reciprocal.

5. Aug 27, 2015

### kandelabr

Whoa, sorry for late reply. Let me answer all of you at once.

I have tried, and voila, it's much better. I have also found out that I can fix one parameter and only fit the other. That makes everything very simple.
This is a calibration of a somewhat simplified capacitive displacement sensor. It turned out that those simplifications bring complications with everything else, so we abandoned this type of sensing and went straight to strain gauges. :) (we'll still be using the same calibration procedure in the non-simplified versions)

You're right. But it's what results contain, not my data set. I just wanted to point out that if you get a value this high but you're expecting something 'normal', you know something went wrong.

You're also right. I have no idea where I got that name from. It's an inverse of y(x) = x, though :)

Thank you all for your answers. I'll be having trouble in the future, so we'll discuss more later :)

6. Aug 27, 2015

### DrStupid

With

$x \cdot y = a + b \cdot c - b \cdot y + c \cdot x$

(obtained by multiplying $y = a/(x+b) + c$ through by $x + b$) the model is linear in the unknowns, so you don't need an iterative algorithm at all. But as it results in another minimum, you would need to check whether it works with your data.
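As a minimal sketch (made-up data, not from the thread), the linearized model $x y = a' - b y + c x$ with $a' = a + b c$ can be fitted in a few lines of numpy:

```python
import numpy as np

# Noiseless samples of y = a/(x + b) + c with (illustrative) a=2, b=0.5, c=1
x = np.linspace(1.0, 10.0, 50)
y = 2.0 / (x + 0.5) + 1.0

# Linearized model: x*y = a'*1 + b*(-y) + c*x, solved in a least-squares sense
F = np.column_stack([np.ones_like(x), -y, x])
ap, b, c = np.linalg.lstsq(F, x * y, rcond=None)[0]
a = ap - b * c  # recover a from a' = a + b*c
```

No initial guess is needed; the fit is a single linear solve.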

7. Aug 27, 2015

### kandelabr

What do you mean by "don't need an algorithm"? I'm solving an overdetermined system of equations using least squares. Am I missing something?

8. Aug 28, 2015

### JJacquelin

A direct method to solve the problem (no iterative process, no initial guess required) is shown in the attachment.
It refers to §5 of the paper:

Since this paper is in French, the equations are fully written out in the attachment, so one can apply the method without reading the referenced paper.

Sorry, I didn't manage to attach the file. If you're interested, send me a private message.

9. Aug 28, 2015

### kandelabr

Looks interesting, I'll check it out.

Thanks!

10. Aug 28, 2015

### DrStupid

In this case the least-squares method results in a system of three linear equations that can be solved explicitly.

11. Aug 28, 2015

### kandelabr

I actually tried that before: I chose random points and solved the system numerically with numpy, but the problem is that the measurements have some noise, so different points give different results. I even averaged the results over multiple random selections of points, but it doesn't work that way.

12. Aug 28, 2015

### SteamKing

Staff Emeritus
If you want to use the entire data set, noise and all, you can form the normal equations and solve them, as is done for a linear regression. The problem is that, numerically, for some types of data the solution of the normal equations may be unstable, particularly if you have data covering several orders of magnitude or more.

In those cases, forming the normal equations may not be advisable. You can still use all the data by writing an equation for each data point and solving the resulting rectangular system using QR factorization or singular value decomposition.
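To illustrate the difference, here is a small sketch (assumed data, using the linearized model from earlier in the thread). `np.linalg.lstsq` solves the rectangular system directly via SVD, while forming the normal equations squares the condition number:

```python
import numpy as np

# Rectangular system F p = z for the linearized model x*y = a' - b*y + c*x,
# built from a noiseless example curve (a=3000, b=1, c=107, so a'=3107)
x = np.linspace(5.0, 15.0, 200)
y = 3000.0 / (x + 1.0) + 107.0
F = np.column_stack([np.ones_like(x), -y, x])
z = x * y

# Normal equations: fast, but can lose precision when F is badly scaled
p_normal = np.linalg.solve(F.T @ F, F.T @ z)

# SVD-based solver works on the rectangular system without forming F'F
p_svd = np.linalg.lstsq(F, z, rcond=None)[0]
```

For well-behaved data both give the same answer; the SVD route degrades more gracefully as the data get badly scaled.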

13. Oct 20, 2016

### MarcJ

Hello all,

I found this thread because I'm trying to do the same thing: measure a distance using a capacitive sensor. As an extra, I'm doing this as the data comes in, so a streaming algorithm would be best. Normal linear regression can easily be used with streaming data, since only summations are involved (i.e., you don't need to go over all the data for each incoming sample; you can just push a new sample in and pop the old one out). That's what we've been using so far.
Now this is really helpful, because it would allow us to upgrade the algorithm from linear to hyperbolic regression (I know, it would be linear regression over a hyperbolic function). I have tried the simple approach as described here, but that doesn't take into account the offset $c$. I read the French paper (as far as I could), but I haven't been able to extrapolate it to this case. §5 is a bit too general for my abilities...

@JJacquelin or @DrStupid, could you send or transcribe the set of equations you mentioned? That would benefit a lot of people, I think.
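For reference, the running-sums idea behind streaming linear regression can be sketched like this (a minimal illustration, not the actual embedded code; the fixed-size window and class name are my own assumptions):

```python
from collections import deque

class StreamingLinReg:
    """Fit y = slope*x + intercept over a sliding window using only running sums."""

    def __init__(self, window):
        self.window = window
        self.samples = deque()
        self.sx = self.sy = self.sxx = self.sxy = 0.0

    def push(self, x, y):
        # add the new sample's contribution to the sums
        self.samples.append((x, y))
        self.sx += x; self.sy += y
        self.sxx += x * x; self.sxy += x * y
        # pop the oldest sample and subtract its contribution
        if len(self.samples) > self.window:
            ox, oy = self.samples.popleft()
            self.sx -= ox; self.sy -= oy
            self.sxx -= ox * ox; self.sxy -= ox * oy

    def fit(self):
        # closed-form least-squares line from the running sums
        n = len(self.samples)
        det = n * self.sxx - self.sx * self.sx
        slope = (n * self.sxy - self.sx * self.sy) / det
        intercept = (self.sy - slope * self.sx) / n
        return slope, intercept
```

Only the sums and the window contents are stored, so the per-sample cost is constant regardless of how much data has streamed through.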

14. Oct 20, 2016

### DrStupid

But it answers your question. Let me try to explain it with matrix notation:

With $a = \left( {\begin{array}{*{20}c} {a_0 } \\ {a_1 } \\ \vdots \\ {a_m } \\ \end{array}} \right)$ , $f = \left( {\begin{array}{*{20}c} {f_0 \left( {x_0 } \right)} & {f_1 \left( {x_0 } \right)} & \cdots & {f_m \left( {x_0 } \right)} \\ {f_0 \left( {x_1 } \right)} & {f_1 \left( {x_1 } \right)} & {} & {} \\ \vdots & {} & \ddots & {} \\ {f_0 \left( {x_n } \right)} & {} & {} & {f_m \left( {x_n } \right)} \\ \end{array}} \right)$ and $F = \left( {\begin{array}{*{20}c} {F\left( {x_0 } \right)} \\ {F\left( {x_1 } \right)} \\ \vdots \\ {F\left( {x_n } \right)} \\ \end{array}} \right)$

(where x can be a vector itself) the equation

$F\left( {x_k } \right) \approx \sum\limits_i {a_i f_i \left( {x_k } \right)}$

can be written as

$F \approx f \cdot a$

and the minimization problem

$\left( {f \cdot a - F} \right)^2 \Rightarrow Min.$

results in

$a = \left( {f^T f} \right)^{ - 1} f^T F$

You can derive the elements of the pseudoinverse $\left( {f^T f} \right)^{ - 1} f^T$ manually or use the matrix equation directly. Even Excel can do that. But if you use Excel you might prefer the built-in functionality for multilinear regression (the LINEST function).

15. Oct 20, 2016

### MarcJ

Thanks Doctor!

Thing is, I'm not using Excel or even high-level libraries. This runs on an embedded platform where memory is tight. All code is in C. So even for the linear regression we're using now, I have to spell out the equations and make sure nothing overflows. That works quite well, but the system would benefit from using all the information in the signal. We know the relationship: $$C(d) = C_0 + \epsilon_0 \epsilon_r \frac{A}{d}$$ We only change $d$, and fringe effects are insignificant here, so the expression holds very well. $C_0$ is mostly due to wiring capacitance and is relatively large, so it cannot simply be ignored. We don't know ahead of time where we start on the curve.

For after-the-fact analysis I can use Numpy's polyfit or Scipy's curve_fit. The real-time process needs explicit equations, and my linear algebra is too rusty by now to expand $\left( {f^T f} \right)^{ - 1} f^T$ (it's been over 15 years...). I hope you can help me here.

16. Oct 20, 2016

### DrStupid

That's not good for the solution above, because the pseudoinverse $\left( {f^T f} \right)^{ - 1} f^T$ has one column per data point, so its size grows with the data set. But there is an equivalent solution that needs only running sums and a small matrix with one row and column per parameter:

With $F_k : = F\left( {x_k } \right)$ and $f_k : = \left( {f_0 \left( {x_k } \right),f_1 \left( {x_k } \right), \ldots ,f_m \left( {x_k } \right)} \right)^T$ the minimization problem

$\sum\limits_k {\left( {f_k^T a - F_k } \right)^2 } \Rightarrow Min.$

results in

$a = \left( {\sum\limits_k {f_k f_k^T } } \right)^{ - 1} \sum\limits_k {f_k F_k }$

With three parameters this means

$\sum\limits_k {f_k F_k } = \left( {\begin{array}{*{20}c} {\sum\limits_k {f_{1,k} \cdot F_k } } \\ {\sum\limits_k {f_{2,k} \cdot F_k } } \\ {\sum\limits_k {f_{3,k} \cdot F_k } } \\ \end{array}} \right)$

and

$\sum\limits_k {f_k f_k^T } = \left( {\begin{array}{*{20}c} {\sum\limits_k {f_{1,k} f_{1,k} } } & {\sum\limits_k {f_{1,k} f_{2,k} } } & {\sum\limits_k {f_{1,k} f_{3,k} } } \\ {\sum\limits_k {f_{1,k} f_{2,k} } } & {\sum\limits_k {f_{2,k} f_{2,k} } } & {\sum\limits_k {f_{2,k} f_{3,k} } } \\ {\sum\limits_k {f_{1,k} f_{3,k} } } & {\sum\limits_k {f_{2,k} f_{3,k} } } & {\sum\limits_k {f_{3,k} f_{3,k} } } \\ \end{array}} \right): = \left( {\begin{array}{*{20}c} {S_{11} } & {S_{12} } & {S_{13} } \\ {S_{12} } & {S_{22} } & {S_{23} } \\ {S_{13} } & {S_{23} } & {S_{33} } \\ \end{array}} \right)$

and the inverse is

$\left( {\begin{array}{*{20}c} {S_{11} } & {S_{12} } & {S_{13} } \\ {S_{12} } & {S_{22} } & {S_{23} } \\ {S_{13} } & {S_{23} } & {S_{33} } \\ \end{array}} \right)^{ - 1} = \frac{1}{K} \cdot \left( {\begin{array}{*{20}c} {S_{22} S_{33} - S_{23} S_{23} } & {S_{13} S_{23} - S_{12} S_{33} } & {S_{12} S_{23} - S_{13} S_{22} } \\ {S_{13} S_{23} - S_{12} S_{33} } & {S_{11} S_{33} - S_{13} S_{13} } & {S_{12} S_{13} - S_{11} S_{23} } \\ {S_{12} S_{23} - S_{13} S_{22} } & {S_{12} S_{13} - S_{11} S_{23} } & {S_{11} S_{22} - S_{12} S_{12} } \\ \end{array}} \right)$

with

$K = S_{11} \left( {S_{22} S_{33} - S_{23} S_{23} } \right) + S_{12} \left( {S_{13} S_{23} - S_{12} S_{33} } \right) + S_{13} \left( {S_{12} S_{23} - S_{13} S_{22} } \right)$

The implementation in C shouldn't be very difficult.
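As a sketch of that implementation (written here in Python rather than C, to match the rest of the thread; the accumulate-and-solve structure translates directly), using the basis from the earlier linearization, $f_1 = 1$, $f_2 = -y$, $f_3 = x$, with target $F = xy$; the function name and variable names are my own:

```python
import numpy as np

def fit_running_sums(xs, ys):
    """Fit y = a/(x+b) + c via the linearized running-sum normal equations."""
    S11 = S12 = S13 = S22 = S23 = S33 = 0.0
    T1 = T2 = T3 = 0.0
    for x, y in zip(xs, ys):
        f1, f2, f3, F = 1.0, -y, x, x * y     # basis values and target
        S11 += f1 * f1; S12 += f1 * f2; S13 += f1 * f3
        S22 += f2 * f2; S23 += f2 * f3; S33 += f3 * f3
        T1 += f1 * F; T2 += f2 * F; T3 += f3 * F
    # explicit inverse of the symmetric 3x3 matrix (adjugate over determinant)
    K = (S11 * (S22 * S33 - S23 * S23)
         + S12 * (S13 * S23 - S12 * S33)
         + S13 * (S12 * S23 - S13 * S22))
    ap = ((S22 * S33 - S23 * S23) * T1
          + (S13 * S23 - S12 * S33) * T2
          + (S12 * S23 - S13 * S22) * T3) / K
    b = ((S13 * S23 - S12 * S33) * T1
         + (S11 * S33 - S13 * S13) * T2
         + (S12 * S13 - S11 * S23) * T3) / K
    c = ((S12 * S23 - S13 * S22) * T1
         + (S12 * S13 - S11 * S23) * T2
         + (S11 * S22 - S12 * S12) * T3) / K
    return ap - b * c, b, c   # recover a from a' = a + b*c
```

Only six S sums and three T sums are kept in memory, so the data can be processed one sample at a time.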

Last edited: Oct 20, 2016

17. Oct 21, 2016

### DrStupid

Yesterday I was so busy with my formulas that I didn’t pay attention to your specific problem. Now I realize that this is just a simple linear regression:

$y = m \cdot x + n$

with $y = C$, $x = \frac{1}{d}$, $m = \varepsilon _0 \varepsilon _r A$ and $n = C_0$.

Am I missing something?
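If the distance $d$ were known exactly, this observation indeed reduces the calibration to an ordinary linear regression in $1/d$. A minimal sketch (the sensor constants below are purely illustrative, not from the thread):

```python
import numpy as np

# C(d) = C0 + k/d with k = eps0*epsr*A; all values are made up for illustration
d = np.linspace(0.5e-3, 5e-3, 50)          # distances in metres
C = 4.7e-12 + 8.85e-12 * 1.0 * 2e-4 / d    # C0 + eps0*epsr*A/d

# linear regression of C against 1/d recovers slope k and intercept C0
k, C0 = np.polyfit(1.0 / d, C, 1)
```

The following post explains why this doesn't apply directly to the sensor at hand.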

18. Oct 24, 2016

### MarcJ

I'm getting closer, but we're not there yet. Let me first reply to your thoughts. We don't know the exact distance $d$ in the system, but we can make a well-defined move ($\Delta d$). That's what I meant when I said we don't know our starting point on the curve. So we need $b$ to account for an unknown offset in distance...

Somehow, I got hung up on how the product term $xy$ would fit into the calculations. Then I realized it's perfectly straightforward. This is what I'm doing now: we're trying to find the coefficients in $$y = \frac{a}{x + b} + c$$ which can be written as $$xy = a + bc - by + cx$$ Replacing the independent part $a + bc$ with $a'$, we need to minimize the errors $\varepsilon_i$ in the system $$x_i y_i = a' - b y_i + c x_i + \varepsilon_i$$
which can be written as \begin{align} \begin{pmatrix} x_1 y_1 \\ x_2 y_2 \\ \vdots \\ x_n y_n \end{pmatrix} &= \begin{pmatrix} 1 & -y_1 & x_1 \\ 1 & -y_2 & x_2 \\ \vdots & \vdots & \vdots \\ 1 & -y_n & x_n \end{pmatrix} \begin{pmatrix} a' \\ b \\ c \end{pmatrix} + \begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{pmatrix} \nonumber \\ \vec {xy} &= \mathbf F \vec a + \vec \varepsilon \nonumber \end{align}
Minimizing $\vec \varepsilon$: $$\hat {\vec a} = \left(\mathbf F^T \mathbf F\right)^{-1} \mathbf F^T \vec{xy}$$
So I tried this with numpy on data that is similar to sensor data, and it works! Even better: we may be able to offload these calculations to the application processor, where we do have a Python environment available. The crucial lines are:
Code (Text):
import numpy as np

def fit_hyperbola(x, y):
    # solve the linearized system x*y = a' - b*y + c*x in a least-squares sense
    xy = np.array([xi * yi for (xi, yi) in zip(x, y)])
    F = np.array([[1, -yi, xi] for (xi, yi) in zip(x, y)])
    est = np.linalg.inv(F.T.dot(F)).dot(F.T).dot(xy)
    # est = [a', b, c] with a' = a + b*c, so a = a' - b*c
    return [est[0] - est[1] * est[2], est[1], est[2]]
Now the real world steps in, adding noise to the data. We know that the noise has a normal (Gaussian) distribution and does not scale with the amplitude of the signal. You'd think that in a large data set of, say, 1000 samples, the noise would average out. Unfortunately, it doesn't when the noise is significant. I analyzed part of a curve without noise ($a = 3000, b = 1, c = 107, 5 \leq x \leq 15$), the numbers taken from an earlier curve-fitting experiment.
Code (Text):
def f_inv(x, a, b, c):
    return a / (x + b) + c

a = 3000
b = 1
c = 107

# data set
x = np.linspace(15, 5, 1000)
y = np.array([f_inv(i, a, b, c) for i in x])
Running the data set through the code above, the same coefficients are found. However, when I add noise comparable to a real-world signal, I see a significant error.
Code (Text):
y += 10 * np.random.randn(*y.shape)
I'll see if I can upload some plots, and I can send the script if you like. Any idea how to improve the fit in the presence of noise?

BTW, typing the equations does take quite some time. Thanks for your efforts!

19. Oct 25, 2016

### MarcJ

I have cleaned up the Python code and created some plots for the values of $a$, $b$ and $c$ mentioned above. Noise is simply added to the 'ideal' data, has a mean of 0 and a standard deviation of 0, 1 and 10.

What I infer is that, at higher noise levels, the fitted coefficients are significantly off even though the mean of the residuals stays low, apparently because positive and negative errors cancel out. I was hoping that the regression would lead to minimal errors over the entire data set...

#### Attached Files:

• hypfit.py.txt (2.7 KB)
20. Oct 25, 2016

### DrStupid

As integrals are usually less sensitive to noise, I tried

$\int {xy\,dx} = \left( {a + bc} \right) \cdot \left( {x - x_0 } \right) + c \cdot \frac{{x^2 - x_0^2 }}{2} - b \cdot \int {y\,dx}$

(using Simpson's rule for integration) and got a much better result. Compared to a combination of a linear regression for a and c and a golden section search for b, this linearisation appears to be almost as accurate but faster.