Curve-fitting to data with horizontal/vertical error bars

  • Thread starter mikeph
  • #1
mikeph

Main Question or Discussion Point

Hello

I've measured some data, let's say f±Δf as a function of x±Δx, and I know the form of f(x) but not the specific parameters, so it will be something like f(x) = (A/x)*exp(-B/x + C), I think.

I'm comfortable enough fitting the data (x,f) to the curve and finding A,B,C, but can anyone point me in the right direction to find errors for A,B,C? Is this even possible for non-linear fitting? Or is there an alternative statistical approach?

I'm going to have to use my best fit (let's say f') to calculate f'(x) for some other (precise) values of x, and I'd like to know the errors of the resulting output even when there are no errors in the input.

Thanks to anyone who can point me in the right direction.

Mike
 

Answers and Replies

  • #2
HallsofIvy
Science Advisor
Homework Helper
The obvious thing to do would be to fit three curves: one to the values given, one to the upper error bounds, and one to the lower error bounds.
 
  • #3
mikeph
Thanks for the reply, but I'm not sure that would give me accurate error estimates... e.g. if I fit to y = mx + c and then fit to the upper/lower errors, wouldn't the gradients of the three fits be identical? I don't see why the analysis should be biased towards the possibility that the errors are either all positive or all negative, which seems to be the case if I only fit to (x+Δx, f+Δf) and (x-Δx, f-Δf).

Is it not equally likely that the first half of my data has positive errors and the second half has negative errors, resulting in a (negative) error of the gradient?
 
  • #4
Deming regression may be a useful starting point. Roughly: instead of minimizing the summed squared residuals in the y direction, you minimize the perpendicular distances from the points to your curve, thus taking both the x and y errors into account (with the residuals scaled appropriately if the error variances in the x and y directions are unequal).

I don't know what software you're using, but in MATLAB I would do a nonlinear least squares fit (e.g. with the function lsqnonlin) and implement the Deming-style residuals in the objective function.
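As a rough sketch of what I mean (xdata, ydata, dx, dy are placeholder vectors for your measurements and their errors; one common trick is to treat the unknown "true" x values as extra fit parameters, so each point contributes a scaled x residual and a scaled y residual):

[code]
% Model: f(x) = (A/x)*exp(-B/x + C), with p = [A B C].
model = @(p, x) (p(1)./x) .* exp(-p(2)./x + p(3));

% Augmented parameter vector q = [A; B; C; xTrue(1..n)].
resid = @(q) [ (xdata(:) - q(4:end)) ./ dx(:);                  % scaled x residuals
               (ydata(:) - model(q(1:3), q(4:end))) ./ dy(:) ]; % scaled y residuals

q0   = [1; 1; 1; xdata(:)];     % rough initial guess for [A B C] and the true x
qfit = lsqnonlin(resid, q0);    % minimizes the sum of squared residuals
pfit = qfit(1:3);               % fitted A, B, C
[/code]

This is effectively a weighted total least squares: lsqnonlin minimizes the sum of squares of everything in the residual vector, so the x and y misfits are penalized together.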

Worth a shot! That doesn't directly address your final question about knowing the errors of the outputs, though.
 
  • #5
Worth a shot! That doesn't directly address your final question about knowing the errors of the outputs, though.
That should be an output of the fitting routine, once you have fixed the quantity you want to minimize.
That is the basic concept: how much does the minimized quantity change as the parameters change?
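For instance (a minimal sketch building on the MATLAB suggestion in #4; xdata, ydata, dy and the initial guess are placeholders), lsqnonlin can return the Jacobian at the solution, from which the standard linearized parameter covariance follows:

[code]
model = @(p, x) (p(1)./x) .* exp(-p(2)./x + p(3));
resid = @(p) (model(p, xdata) - ydata) ./ dy;    % y residuals, weighted by errors
[pfit, resnorm, ~, ~, ~, ~, J] = lsqnonlin(resid, [1 1 1]);

% Linearized covariance: s^2 * inv(J'*J), with s^2 the residual variance.
n = numel(ydata);  m = numel(pfit);
covP = resnorm / (n - m) * inv(full(J' * J));    % J is returned sparse
dp   = sqrt(diag(covP));                         % 1-sigma errors on A, B, C
[/code]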
 
  • #6
haruspex
Science Advisor
Homework Helper
Insights Author
Gold Member
A Monte Carlo approach would be to generate datasets from the actual data by adding errors according to the presumed distributions. Compute A, B and C for each dataset and extract the distributions of these.
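A minimal sketch of that in MATLAB (assuming Gaussian errors with standard deviations dx and dy, and a fitFunc wrapper around whatever fitting routine is in use that returns [A B C]; all names are placeholders):

[code]
nTrials = 1000;
params  = zeros(nTrials, 3);                     % one row of [A B C] per trial
for k = 1:nTrials
    xSim = xdata + dx .* randn(size(xdata));     % perturb x within its errors
    ySim = ydata + dy .* randn(size(ydata));     % perturb y within its errors
    params(k, :) = fitFunc(xSim, ySim);          % refit the synthetic dataset
end
pStd = std(params);                              % Monte Carlo spread of A, B, C
[/code]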
 
  • #7
Stephen Tashi
Science Advisor
but can anyone point me in the right direction to find errors for A,B,C? Is this even possible for non-linear fitting?
I think the search keywords you want are "asymptotic linearized confidence interval". I recall reading about them, but tonight I haven't found a good link that explains the topic.

Saying that you want the "errors" or "error bars" in the parameters is not specific. Perhaps you want to find the standard deviations of the parameters A,B,C about their means. We have no data to compute this (even in linear curve fitting). After all, your data consists of samples of (x,y) not samples of A,B,C, so how can we say A,B,C have a mean or variance? Yet curve fitting software packages claim to give such information for parameters. How do they do it?

I'm not certain. I'll make a conjecture based on reading about "asymptotic linearized confidence intervals".

Express the value of each parameter as a known function of the data. For example, when we do the least squares fit of a linear function, the slope and intercept are computed as a function of the data values.

Let's call the parameter [itex] p [/itex] and say
[itex] p = F(X_1,X_2,...X_n, Y_1, Y_2,...Y_n) [/itex] where the [itex] (X_i ,Y_i) [/itex] are the data.

You may not know the symbolic expression for [itex] F [/itex] , but you have a numerical method for computing it, namely your curve fitting algorithm. So you could approximate the partial derivatives of [itex] F [/itex] numerically.

Let's say that your particular curve fit found that [itex] p = p_0 [/itex] when the specific data was [itex] X_i = x_i, Y_i = y_i [/itex].

Find (symbolically or numerically) the differential expression that approximates a change in [itex] p_0 [/itex] as a function of changes in the [itex] x_i, y_i [/itex].


[itex] p_0 + \delta p = F(x_1,x_2,...,y_1,y_2,...) + \delta x_1 \frac{\partial F}{\partial X_1} + \delta x_2 \frac{\partial F}{\partial X_2} + ... + \delta y_1 \frac{\partial F}{\partial Y_1} + \delta y_2 \frac{\partial F}{\partial Y_2} + ... [/itex]

[itex] \delta p = \delta x_1 \frac{\partial F}{\partial X_1} + \delta x_2 \frac{\partial F}{\partial X_2} + ... + \delta y_1 \frac{\partial F}{\partial Y_1} + \delta y_2 \frac{\partial F}{\partial Y_2} + ... [/itex]

Assume [itex] p_0 = F(x_1,x_2,...,y_1,y_2,...) [/itex] is a good estimate for the mean value of [itex] p [/itex].

Assume the [itex] \delta x_i [/itex] are independent, identically distributed, mean-zero Gaussian random errors, and assume the [itex] \delta y_i [/itex] are as well. The above approximation expresses the random variable [itex] \delta p [/itex] as a linear function of the independent, mean-zero normal random variables [itex] \delta x_i , \delta y_i [/itex]. You can compute the variance of [itex] \delta p [/itex] if you know the variances of the [itex] \delta x_i [/itex] and the [itex] \delta y_i [/itex].
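To make that step explicit: for a linear combination of independent random variables the variances add, so the approximation gives

[itex] Var(\delta p) = \sum_i \left( \frac{\partial F}{\partial X_i} \right)^2 Var(\delta x_i) + \sum_i \left( \frac{\partial F}{\partial Y_i} \right)^2 Var(\delta y_i) [/itex]

The partial derivatives can be approximated numerically by refitting with one data value nudged. A rough MATLAB sketch (fitFunc, pfit and the data vectors are the same placeholders as in the earlier sketches; shown for the first parameter, A):

[code]
h    = 1e-4;                            % finite-difference step
dFdY = zeros(1, numel(ydata));
for i = 1:numel(ydata)
    yPert    = ydata;
    yPert(i) = yPert(i) + h;            % nudge one y value
    pPert    = fitFunc(xdata, yPert);   % refit the nudged dataset
    dFdY(i)  = (pPert(1) - pfit(1)) / h;            % dA/dY_i
end
varA = sum(dFdY(:).^2 .* dy(:).^2);     % y-error contribution to Var(A)
[/code]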

Let's assume the [itex] \delta y_i [/itex] have a variance that is estimated by the variance of the residuals.

How do we find the variance of the [itex] \delta x_i [/itex]? You could assume that there are no measurement errors in the [itex] X_i [/itex] and set the [itex] \delta x_i = 0 [/itex]. If you can't assume that, perhaps we can use the linear approximation trick again (but I'm not really sure if this makes sense.) The curve fit (using specific values of the parameters) expresses the prediction of [itex] Y_i [/itex] as a function of the [itex] X_i [/itex] so [itex] Y_i = G(X_1, X_2,...) [/itex].

Approximate using:

[itex] Y_i + \delta y_i = G(x_1,x_2,..) + \delta x_1 \frac{\partial G}{\partial X_1} + \delta x_2 \frac{\partial G}{\partial X_2} + ... [/itex]

[itex] \delta y_i = \delta x_1 \frac{\partial G}{\partial X_1} + \delta x_2 \frac{\partial G}{\partial X_2} + ... [/itex]

We have assumed the variance of the [itex] \delta y_i [/itex] is the variance of the residuals. Use the above equation to solve for the variance of the [itex] \delta x_i [/itex].

To me, the above process is rather circular and suspicious. It involves many assumptions, and I'm not sure I stated all of them. However, it's the best I can do to reconstruct how standard deviations could be estimated for use in "asymptotic linearized confidence intervals" for parameters in a curve fit. If anyone knows better, please comment!

 
  • #8
mikeph
Wow, thanks everyone. I'll work my way through the replies.

edit- I'm leaning towards upgrading my curve fitter to the Deming regression, and then using a Monte Carlo approach to get some idea of the standard deviations, mainly because I have a lot of computing power and I already understand the basics of the approach. I'm reading up on "asymptotic linearized confidence intervals" and might try implementing them in the future, or sooner if the other approaches fail.

One thing that slightly troubles me about the Monte Carlo approach: if I make a measurement of some physical parameter X, and obtain the result x±Δx, then when we generate our dataset for the simulation, we're assuming that the mean is x and the s.d. is Δx. But X is the mean, not x... in reality x could be very far from X by sheer bad luck, and our entire analysis depends on this.

I'm not sure if this is actually a problem, it just doesn't quite seem right.
 
  • #9
One thing that slightly troubles me about the Monte Carlo approach: if I make a measurement of some physical parameter X, and obtain the result x±Δx, then when we generate our dataset for the simulation, we're assuming that the mean is x and the s.d. is Δx. But X is the mean, not x... in reality x could be very far from X by sheer bad luck, and our entire analysis depends on this.
Yes, that's correct. That's why we have error bars. The answer is probably between the bars. If we have bad luck, then the answer is not between the bars.
 
  • #10
mikeph
Yes... the error in f could, in general, depend on the mean of x. But it cannot possibly depend on the measured value of x, which is random. If I use the Monte Carlo approach, then it will.

I suppose I'd better try to take as many measurements as possible.
 
  • #11
Stephen Tashi
Science Advisor
edit- I'm leaning towards upgrading my curve fitter to the Deming regression
"Deming regresssion" might be the same as "total least squares regression" if you need another search phrase for it.

One thing that slightly troubles me about the Monte Carlo approach: if I make a measurement of some physical parameter X, and obtain the result x±Δx, then when we generate our dataset for the simulation, we're assuming that the mean is x and the s.d. is Δx.
I think "asymptotic linearized confidence intervals" make the same assumption.

But X is the mean, not x... in reality x could be very far from X by sheer bad luck, and our entire analysis depends on this.
(You haven't explained exactly what you intended to do by a Monte-Carlo method.)

If you want to do anything mathematically respectable, you need a specific probability model for how the data is generated. You should also understand that "error bars" have a common misinterpretation. Many people think that if they see an "error bar" around a particular value, they can say there is a certain probability that the "true" value lies within the interval defined by the error bar. In general, this is not correct unless a Bayesian prior has been given for the quantity. (Study the difference in meaning between a "confidence interval" and a Bayesian "credible interval".)
 
  • #12
haruspex
Science Advisor
Homework Helper
Insights Author
Gold Member
One thing that slightly troubles me about the Monte Carlo approach: if I make a measurement of some physical parameter X, and obtain the result x±Δx, then when we generate our dataset for the simulation, we're assuming that the mean is x and the s.d. is Δx. But X is the mean, not x... in reality x could be very far from X by sheer bad luck, and our entire analysis depends on this.

I'm not sure if this is actually a problem, it just doesn't quite seem right.
Maybe it would be better to generate the datasets by best fit + random errors?
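That variant is essentially a parametric bootstrap; reusing the placeholder names from the earlier sketches, it only changes where the synthetic y values are centred:

[code]
yBest = model(pfit, xdata);                      % best-fit curve at the data x
for k = 1:nTrials
    ySim = yBest + dy .* randn(size(ydata));     % scatter about the fit, not the data
    params(k, :) = fitFunc(xdata, ySim);
end
[/code]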
 
