# Curve-fitting to data with horizontal/vertical error bars

1. Oct 19, 2012

### mikeph

Hello

I've measured some data, let's say f±Δf as a function of x±Δx, and I know the form of f(x) but not the specific parameters, so it will be something like f(x) = (A/x)*exp(-B/x + C), I think.

I'm comfortable enough fitting the data (x,f) to the curve and finding A,B,C, but can anyone point me in the right direction to find errors for A,B,C? Is this even possible for non-linear fitting? Or is there an alternative statistical approach?

I'm going to have to use my best fit (let's say f') to calculate f'(x) for some other (precise) values of x and I'd like to know the errors of the resulting output, even if there are no errors in the input.

Thanks to anyone who can point me in the right direction.

Mike

2. Oct 19, 2012

### HallsofIvy

The obvious thing to do would be to fit three curves- to the values given, the upper error, and the lower error.

3. Oct 22, 2012

### mikeph

Thanks for the reply, but I'm not sure that would give me accurate error data. For example, if I fit to y=mx+c, and I fit to the upper/lower errors, would the gradient of each of the three fits not be identical? I don't see why the analysis should be biased towards the possibility that either all the errors are positive or all are negative, which seems to be the case if I only fit to (x+Δx, f+Δf) and (x-Δx, f-Δf).

Is it not equally likely that the first half of my data has positive errors and the second half has negative errors, resulting in a (negative) error of the gradient?
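The point about identical gradients can be checked numerically. The following is a minimal sketch, assuming Python with NumPy (neither is mentioned in the thread), made-up straight-line data, and constant error bars `dx` and `dy`; the helper name `fit_line` is hypothetical.

```python
import numpy as np

# Hypothetical straight-line data with constant error bars (all values made up).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
dx, dy = 0.1, 0.3


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = m*x + c; returns (m, c)."""
    m, c = np.polyfit(xs, ys, 1)
    return m, c


m_mid, c_mid = fit_line(x, y)            # fit to the measured values
m_up, c_up = fit_line(x + dx, y + dy)    # fit to the "upper error" points
m_lo, c_lo = fit_line(x - dx, y - dy)    # fit to the "lower error" points

print(f"slopes:     {m_lo:.4f} {m_mid:.4f} {m_up:.4f}")
print(f"intercepts: {c_lo:.4f} {c_mid:.4f} {c_up:.4f}")
```

With constant error bars, shifting every point by the same amount only translates the line, so the three slopes agree and only the intercept moves. That is why this procedure cannot say anything about the uncertainty of the gradient.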

4. Oct 22, 2012

### digfarenough

Deming regression may be a useful starting point. Roughly, instead of minimizing the summed squared residuals in the y direction, you minimize the perpendicular distance from the points to your line, thus taking into account both x and y error (scaling the errors if the errors in the x and y directions are unequal).

I don't know what software you're using, but in MATLAB I would do a nonlinear least squares fit (e.g. function lsqnonlin) and apply the Deming SSR in the residual function.

Worth a shot! That doesn't directly address your final question about knowing the errors of the outputs, though.
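As a rough illustration of the idea (in Python with SciPy rather than MATLAB's `lsqnonlin`, which is a swap made here, not the poster's code), one way to put both error directions into the residual function is to treat the unknown "true" x values as extra fit parameters and penalize scaled deviations in x and in y together. Everything below is an assumption for illustration: the synthetic data, the error sizes `sx` and `sy`, and the names `model` and `eiv_residuals`. Note also that in f(x) = (A/x)*exp(-B/x + C) the parameters A and C only appear through the product A*exp(C), so they cannot be determined separately; the sketch fits K = A*exp(C) and B instead.

```python
import numpy as np
from scipy.optimize import least_squares


def model(x, K, B):
    # Thread's model with K = A*exp(C): A and C only enter through this product.
    return (K / x) * np.exp(-B / x)


def eiv_residuals(theta, x_obs, y_obs, sx, sy):
    """Scaled residuals in both directions; theta = [K, B, x_true_1..x_true_n]."""
    K, B = theta[:2]
    x_true = theta[2:]
    rx = (x_obs - x_true) / sx                   # horizontal misfit, in units of sx
    ry = (y_obs - model(x_true, K, B)) / sy      # vertical misfit, in units of sy
    return np.concatenate([rx, ry])


# Synthetic data (assumed values, for illustration only).
rng = np.random.default_rng(0)
K_true, B_true = 5.0, 2.0
x_exact = np.linspace(1.0, 10.0, 25)
sx, sy = 0.05, 0.02
x_obs = x_exact + rng.normal(0.0, sx, x_exact.size)
y_obs = model(x_exact, K_true, B_true) + rng.normal(0.0, sy, x_exact.size)

# Start the latent x values at the observed ones.
theta0 = np.concatenate([[4.0, 1.5], x_obs])
fit = least_squares(eiv_residuals, theta0, args=(x_obs, y_obs, sx, sy))
K_fit, B_fit = fit.x[:2]
print("K =", K_fit, "B =", B_fit)
```

For a straight line with a constant ratio of error variances this objective reduces to the Deming one; for a curved model it is the natural generalization, at the cost of one extra unknown per data point.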

5. Oct 22, 2012

### Staff: Mentor

That should be an output of the fitting routine, once you have fixed the variable you want to minimize.
It is the basic concept - how much does the minimized variable change with changing parameters?
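For instance, SciPy's `curve_fit` returns a parameter covariance matrix alongside the best-fit values. The sketch below assumes Python with SciPy (not named in the thread), synthetic data, vertical errors only, and the reparametrization K = A*exp(C), since A and C are not separately identifiable in the proposed f(x).

```python
import numpy as np
from scipy.optimize import curve_fit


def model(x, K, B):
    # f(x) = (A/x)*exp(-B/x + C) written with K = A*exp(C).
    return (K / x) * np.exp(-B / x)


# Synthetic data with known vertical error sy (assumed values).
rng = np.random.default_rng(1)
x = np.linspace(1.0, 10.0, 25)
sy = 0.02
y = model(x, 5.0, 2.0) + rng.normal(0.0, sy, x.size)

# absolute_sigma=True treats sy as a real standard deviation, not a relative weight.
popt, pcov = curve_fit(
    model, x, y, p0=[4.0, 1.5],
    sigma=np.full(x.size, sy), absolute_sigma=True,
)
perr = np.sqrt(np.diag(pcov))  # one-sigma parameter uncertainties

print("K = %.3f +/- %.3f" % (popt[0], perr[0]))
print("B = %.3f +/- %.3f" % (popt[1], perr[1]))
```

The covariance comes from the curvature of the minimized sum of squares around its minimum, which is the "how much does the minimized variable change with changing parameters" idea. It ignores the horizontal errors unless they are built into the quantity being minimized.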

6. Oct 22, 2012

### haruspex

A Monte Carlo approach would be to generate datasets from the actual data by adding errors according to the presumed distributions. Compute A, B and C for each dataset and extract the distributions of these.
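A minimal sketch of this procedure, assuming Python with SciPy, Gaussian errors with known standard deviations `sx` and `sy`, and synthetic stand-in "measurements" (all assumptions, not from the thread). The model is written with K = A*exp(C) because A and C only appear through that product.

```python
import numpy as np
from scipy.optimize import curve_fit


def model(x, K, B):
    # f(x) = (A/x)*exp(-B/x + C) written with K = A*exp(C).
    return (K / x) * np.exp(-B / x)


rng = np.random.default_rng(2)
sx, sy = 0.05, 0.02

# Stand-in for the measured data (synthetic, assumed values).
x_exact = np.linspace(1.0, 10.0, 25)
x_meas = x_exact + rng.normal(0.0, sx, x_exact.size)
y_meas = model(x_exact, 5.0, 2.0) + rng.normal(0.0, sy, x_exact.size)

# Generate perturbed datasets around the measured points and refit each one.
n_sets = 300
samples = np.empty((n_sets, 2))
for i in range(n_sets):
    x_sim = x_meas + rng.normal(0.0, sx, x_meas.size)
    y_sim = y_meas + rng.normal(0.0, sy, y_meas.size)
    samples[i], _ = curve_fit(model, x_sim, y_sim, p0=[4.0, 1.5])

param_mean = samples.mean(axis=0)
param_std = samples.std(axis=0, ddof=1)  # spread of the refitted parameters
print("K: %.3f +/- %.3f" % (param_mean[0], param_std[0]))
print("B: %.3f +/- %.3f" % (param_mean[1], param_std[1]))
```

The rows of `samples` can also be histogrammed or used to estimate the correlation between the parameters, which a pair of separate error bars would hide.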

7. Oct 22, 2012

### Stephen Tashi

I think the search keywords you want are "asymptotic linearized confidence interval". I recall reading about them, but tonight I haven't found a good link that explains the topic.

Saying that you want the "errors" or "error bars" in the parameters is not specific. Perhaps you want to find the standard deviations of the parameters A,B,C about their means. We have no data to compute this (even in linear curve fitting). After all, your data consists of samples of (x,y) not samples of A,B,C, so how can we say A,B,C have a mean or variance? Yet curve fitting software packages claim to give such information for parameters. How do they do it?

I'm not certain. I'll make a conjecture based on reading about "asymptotic linear confidence intervals".

Express the value of each parameter as a known function of the data. For example, when we do the least squares fit of a linear function, the slope and intercept are computed as a function of the data values.

Let's call the parameter $p$ and say
$p = F(X_1,X_2,...X_n, Y_1, Y_2,...Y_n)$ where the $(X_i ,Y_i)$ are the data.

You may not know the symbolic expression for $F$ , but you have a numerical method for computing it, namely your curve fitting algorithm. So you could approximate the partial derivatives of $F$ numerically.

Let's say that your particular curve fit found that $p = p_0$ when the specific data was $X_i = x_i, Y_i = y_i$.

Find (symbolically or numerically) the differential expression that approximates a change in $p_0$ as a function of changes in the $x_i, y_i$.

$p_0 + \delta p = F(x_1,x_2,...) + \delta x_1 \frac{\partial F}{\partial X_1} + \delta x_2 \frac{\partial F}{\partial X_2}+ \delta y_1 \frac{\partial F}{\partial Y_1} + \delta y_2 \frac{\partial F}{\partial Y_2} + ...$

$\delta p = \delta x_1 \frac{\partial F}{\partial X_1} + \delta x_2 \frac{\partial F}{\partial X_2} + \delta y_1 \frac{\partial F}{\partial Y_1} + \delta y_2 \frac{\partial F}{\partial Y_2} + ...$

Assume $p_0 = F(x_1,x_2,...y_1,y_2..)$ is a good estimate for the mean value of $p$

Assume the $\delta x_i$ are independent, identically distributed, mean zero, gaussian random errors. Assume the $\delta y_i$ are also. The above approximation expresses the random variable $\delta p$ as a linear function of the independent mean zero normal random variables $\delta x_i, \delta y_i$. You can compute the variance of $\delta p$ if you know the variance of the $\delta x_i$ and the $\delta y_i$.

Let's assume the $\delta y_i$ have a variance that is estimated by the variance of the residuals.

How do we find the variance of the $\delta x_i$? You could assume that there are no measurement errors in the $X_i$ and set the $\delta x_i = 0$. If you can't assume that, perhaps we can use the linear approximation trick again (but I'm not really sure if this makes sense.) The curve fit (using specific values of the parameters) expresses the prediction of $Y_i$ as a function of the $X_i$ so $Y_i = G(X_1, X_2,...)$.

Approximate using:

$Y_i + \delta y_i = G(x_1,x_2,..) + \delta x_1 \frac{\partial G}{\partial X_1} + \delta x_2 \frac{\partial G}{\partial X_2} + ...$

$\delta y_i = \delta x_1 \frac{\partial G}{\partial X_1} + \delta x_2 \frac{\partial G}{\partial X_2} + ...$

We have assumed the variance of the $\delta y_i$ is the variance of the residuals. Use the above equation to solve for the variance of the $\delta x_i$.

To me, the above process is rather circular and suspicious. It involves many assumptions and I'm not sure I stated all of them. However, it's the best I can do to reconstruct how standard deviations could be estimated for use in "asymptotic linearized confidence intervals" for parameters in a curve fit. If anyone knows better, please comment!
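The first part of this conjecture (numerical partial derivatives of $F$, then the variance of a linear combination of independent errors) can be sketched as follows. This assumes Python with SciPy, known standard deviations `sx` and `sy` for the data errors (rather than estimating them from residuals), synthetic data, and K = A*exp(C) in place of the non-identifiable pair A, C. The names `fit_params` and `linearized_param_std` are hypothetical.

```python
import numpy as np
from scipy.optimize import curve_fit


def model(x, K, B):
    # f(x) = (A/x)*exp(-B/x + C) written with K = A*exp(C).
    return (K / x) * np.exp(-B / x)


def fit_params(x, y, start=(4.0, 1.5)):
    """F(X_1..X_n, Y_1..Y_n): the fitted parameters as a function of the data."""
    popt, _ = curve_fit(model, x, y, p0=start)
    return popt


def linearized_param_std(x, y, sx, sy, h=1e-3):
    """Var(delta p) = sum_i (dF/dX_i)^2 sx^2 + (dF/dY_i)^2 sy^2, by finite differences."""
    p_hat = fit_params(x, y)
    var = np.zeros_like(p_hat)
    for i in range(x.size):
        xp = x.copy()
        xp[i] += h                       # nudge one x value
        yp = y.copy()
        yp[i] += h                       # nudge one y value
        dF_dx = (fit_params(xp, y, start=p_hat) - p_hat) / h
        dF_dy = (fit_params(x, yp, start=p_hat) - p_hat) / h
        var += (dF_dx * sx) ** 2 + (dF_dy * sy) ** 2
    return p_hat, np.sqrt(var)


# Synthetic data (assumed values, for illustration only).
rng = np.random.default_rng(3)
sx, sy = 0.05, 0.02
x_exact = np.linspace(1.0, 10.0, 25)
x = x_exact + rng.normal(0.0, sx, x_exact.size)
y = model(x_exact, 5.0, 2.0) + rng.normal(0.0, sy, x_exact.size)

p_hat, p_std = linearized_param_std(x, y, sx, sy)
print("K = %.3f +/- %.3f" % (p_hat[0], p_std[0]))
print("B = %.3f +/- %.3f" % (p_hat[1], p_std[1]))
```

With `sx = 0` this should come out close to what a least-squares routine reports from its covariance matrix, which gives a quick sanity check on the linearization.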

-----

8. Oct 23, 2012

### mikeph

Wow, thanks everyone. I'll work my way through the replies.

edit- I'm leaning towards upgrading my curve fitter to the Deming regression, and then using a Monte Carlo approach to get some idea of the standard deviation, mainly because I have a lot of computer power and I already understand the basics of the approach. I'm having a read of "asymptotic linearized confidence interval" and might try to see if it can be implemented in the future, or if something else fails.

One thing that slightly troubles me about the Monte Carlo approach: if I make a measurement of some physical parameter X, and obtain the result x±Δx, then when we generate our dataset for the simulation, we're assuming that the mean is x and the s.d. is Δx. But X is the mean, not x... in reality x could be very far from X by sheer bad luck, and our entire analysis depends on this.

I'm not sure if this is actually a problem, it just doesn't quite seem right.

Last edited: Oct 23, 2012

9. Oct 23, 2012

### ImaLooser

Yes, that's correct. That's why we have error bars. The answer is probably between the bars. If we have bad luck, then the answer is not between the bars.

10. Oct 23, 2012

### mikeph

Yes... the error in f could, in general, depend on the mean of x. But it cannot possibly depend on the measured value of x, which is random. If I use the Monte Carlo approach, then it will.

I suppose I'd better try to take as many measurements as possible.

11. Oct 23, 2012

### Stephen Tashi

"Deming regression" might be the same as "total least squares regression" if you need another search phrase for it.

I think "asymptotic linearized confidence intervals" make the same assumption.

(You haven't explained exactly what you intended to do by a Monte-Carlo method.)

If you want to do anything mathematically respectable, you need a specific probability model for how the data is generated. You also should understand that "error bars" have a common misinterpretation. Many people think that if they see an "error bar" around a particular value, they can say there is a certain probability that the "true" value is within the interval defined by the error bar. In general, this is not correct unless a Bayesian prior has been given for the quantity. (Study the difference in meaning between a "confidence interval" and a Bayesian "credible interval".)

12. Oct 23, 2012

### haruspex

Maybe it would be better to generate the datasets by best fit + random errors?
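This variant (often called a parametric bootstrap) could look roughly like the sketch below, which also carries the parameter scatter through to predictions at new, exact x values, as asked in the first post. Assumptions made here, none from the thread: Python with SciPy, Gaussian errors with known `sx` and `sy`, synthetic stand-in data, the measured x values used as the "true" ones when simulating, and K = A*exp(C) in place of the non-identifiable pair A, C.

```python
import numpy as np
from scipy.optimize import curve_fit


def model(x, K, B):
    # f(x) = (A/x)*exp(-B/x + C) written with K = A*exp(C).
    return (K / x) * np.exp(-B / x)


rng = np.random.default_rng(4)
sx, sy = 0.05, 0.02

# Stand-in for the measured data (synthetic, assumed values).
x_exact = np.linspace(1.0, 10.0, 25)
x_meas = x_exact + rng.normal(0.0, sx, x_exact.size)
y_meas = model(x_exact, 5.0, 2.0) + rng.normal(0.0, sy, x_exact.size)

# Step 1: best fit to the actual data.
popt, _ = curve_fit(model, x_meas, y_meas, p0=[4.0, 1.5])
y_best = model(x_meas, *popt)

# Step 2: simulate datasets as best fit + random errors, refit, and predict.
x_new = np.array([2.5, 5.0, 7.5])   # precise x values where f' is needed
n_sets = 300
preds = np.empty((n_sets, x_new.size))
for i in range(n_sets):
    x_sim = x_meas + rng.normal(0.0, sx, x_meas.size)
    y_sim = y_best + rng.normal(0.0, sy, y_best.size)
    p_i, _ = curve_fit(model, x_sim, y_sim, p0=popt)
    preds[i] = model(x_new, *p_i)

pred_mean = preds.mean(axis=0)
pred_std = preds.std(axis=0, ddof=1)  # uncertainty of f'(x_new) from the fit alone
for xv, m, s in zip(x_new, pred_mean, pred_std):
    print("f'(%.1f) = %.4f +/- %.4f" % (xv, m, s))
```

Simulating from the fitted curve avoids adding noise on top of y values that are already noisy, which is the difference from perturbing the measured points directly.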