Curve-fitting to data with horizontal/vertical error bars

In summary, the thread discusses methods for estimating the errors in the parameters A, B, and C of a non-linear fit. Suggestions include fitting three curves (to the data and to the upper/lower error bounds), Deming regression, and a Monte Carlo approach. Asymptotic linearized confidence intervals are raised as a possible solution, though the procedure for computing them is only partially reconstructed. The posters also discuss the difficulty that the data consists of samples of (x,y) rather than samples of A, B, C, and the idea of expressing the parameters as a known function of the data so that the variance of the errors can be propagated.
  • #1
mikeph
Hello

I've measured some data, let's say f±Δf as a function of x±Δx, and I know the form of f(x) but not the specific parameters, so it will be something like f(x) = (A/x)*exp(-B/x + C), I think.

I'm comfortable enough fitting the data (x,f) to the curve and finding A,B,C, but can anyone point me in the right direction to find errors for A,B,C? Is this even possible for non-linear fitting? Or is there an alternative statistical approach?

I'm going to have to use my best fit (let's say f') to calculate f'(x) for some other (precise) values of x and I'd like to know the errors of the resulting output, even if there are no errors in the input.

Thanks to anyone who can point me in the right direction.

Mike
 
  • #2
The obvious thing to do would be to fit three curves: one to the values given, one to the upper error, and one to the lower error.
 
  • #3
Thanks for the reply, but I'm not sure that would give me accurate error data... e.g. if I fit to y = mx + c, and I fit to the upper/lower errors, would the gradient of each of the three fits not be identical? I don't see why the analysis should be biased towards the possibility that either all the errors are positive or all are negative, which seems to be the case if I only fit to (x+Δx, f+Δf) and (x−Δx, f−Δf).

Is it not equally likely that the first half of my data has positive errors and the second half has negative errors, resulting in a (negative) error of the gradient?
 
  • #4
Deming regression may be a useful starting point. Roughly: instead of minimizing the summed squared residuals in the y direction, you minimize the perpendicular distance from the points to your line, thus taking into account both the x and y errors (scaling the errors if the errors in the two directions are unequal).

I don't know what software you're using, but in MATLAB I would do a nonlinear least squares fit (e.g. function lsqnonlin) and apply the Deming SSR in the residual function.

Worth a shot! That doesn't directly address your final question about knowing the errors of the outputs, though.
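
If it helps, here is a minimal Python sketch of that idea using scipy.odr, which does orthogonal distance regression (the generalization of Deming regression to non-linear models). The data arrays and the starting guess beta0 are placeholders for your own values. One caveat specific to your model: A and C enter only through A·exp(C), so they are not separately identifiable; in practice you would fix one of them.

[code=python]
import numpy as np
from scipy import odr

def model_f(beta, x):
    # f(x) = (A/x) * exp(-B/x + C); note A and C only appear as A*exp(C),
    # so consider fixing C = 0 to make the parameters identifiable.
    A, B, C = beta
    return (A / x) * np.exp(-B / x + C)

# Placeholder data: x, y measurements with error bars dx, dy.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([0.8, 1.1, 0.9, 0.7, 0.6])
dx = np.full_like(x, 0.05)
dy = np.full_like(y, 0.10)

data = odr.RealData(x, y, sx=dx, sy=dy)       # sx, sy weight both directions
fit = odr.ODR(data, odr.Model(model_f), beta0=[1.0, 1.0, 0.0]).run()
print("A, B, C:", fit.beta)                   # best-fit parameters
print("std errors:", fit.sd_beta)             # linearized parameter errors
[/code]

The fit output's sd_beta is a linearized estimate of the parameter uncertainties, which is the same idea discussed in later replies.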
 
  • #5
digfarenough said:
Worth a shot! That doesn't directly address your final question about knowing the errors of the outputs, though.
That should be an output of the fitting routine, once you have fixed the variable you want to minimize.
It is the basic concept: how much does the minimized variable change with changing parameters?
 
  • #6
A Monte Carlo approach would be to generate datasets from the actual data by adding errors according to the presumed distributions. Compute A, B and C for each dataset and extract the distributions of these.
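
A minimal sketch of that procedure in Python, assuming independent Gaussian errors with known standard deviations dx and dy (all names and data are placeholders):

[code=python]
import numpy as np
from scipy.optimize import curve_fit

def f(x, A, B, C):
    return (A / x) * np.exp(-B / x + C)

# Placeholder measurements and presumed error sizes.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([0.8, 1.1, 0.9, 0.7, 0.6])
dx, dy = 0.05, 0.10

rng = np.random.default_rng(seed=1)
fits = []
for _ in range(2000):
    xs = x + rng.normal(0.0, dx, size=x.shape)   # perturb x by its errors
    ys = y + rng.normal(0.0, dy, size=y.shape)   # perturb y likewise
    try:
        p, _ = curve_fit(f, xs, ys, p0=[1.0, 1.0, 0.0])
        fits.append(p)
    except RuntimeError:          # skip the occasional non-converging fit
        continue

fits = np.array(fits)
# Caveat: A and C are only jointly identifiable (A*exp(C)), so their
# individual spreads are inflated; B's spread is meaningful.
print("means:", fits.mean(axis=0))
print("std devs:", fits.std(axis=0))
[/code]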
 
  • #7
MikeyW said:
but can anyone point me in the right direction to find errors for A,B,C? Is this even possible for non-linear fitting?

I think the search keywords you want are "asymptotic linearized confidence interval". I recall reading about them, but tonight I haven't found a good link that explains the topic.

Saying that you want the "errors" or "error bars" in the parameters is not specific. Perhaps you want to find the standard deviations of the parameters A,B,C about their means. We have no data to compute this (even in linear curve fitting). After all, your data consists of samples of (x,y) not samples of A,B,C, so how can we say A,B,C have a mean or variance? Yet curve fitting software packages claim to give such information for parameters. How do they do it?

I'm not certain. I'll make a conjecture based on reading about "asymptotic linearized confidence intervals".

Express the value of each parameter as a known function of the data. For example, when we do the least squares fit of a linear function, the slope and intercept are computed as a function of the data values.

Let's call the parameter p and say
[itex] p = F(X_1, X_2, \dots, X_n, Y_1, Y_2, \dots, Y_n) [/itex], where the [itex] (X_i, Y_i) [/itex] are the data.

You may not know the symbolic expression for [itex] F [/itex] , but you have a numerical method for computing it, namely your curve fitting algorithm. So you could approximate the partial derivatives of [itex] F [/itex] numerically.

Let's say that your particular curve fit found that [itex] p = p_0 [/itex] when the specific data was [itex] X_i = x_i, Y_i = y_i [/itex].

Find (symbolically or numerically) the differential expression that approximates a change in [itex] p_0 [/itex] as a function of changes in the [itex] x_i, y_i [/itex].


[itex] p_0 + \delta p = F(x_1, x_2, \dots, y_1, y_2, \dots) + \delta x_1 \frac{\partial F}{\partial X_1} + \delta x_2 \frac{\partial F}{\partial X_2} + \dots + \delta y_1 \frac{\partial F}{\partial Y_1} + \delta y_2 \frac{\partial F}{\partial Y_2} + \dots [/itex]

[itex] \delta p = \delta x_1 \frac{\partial F}{\partial X_1} + \delta x_2 \frac{\partial F}{\partial X_2} + \dots + \delta y_1 \frac{\partial F}{\partial Y_1} + \delta y_2 \frac{\partial F}{\partial Y_2} + \dots [/itex]

Assume [itex] p_0 = F(x_1, x_2, \dots, y_1, y_2, \dots) [/itex] is a good estimate for the mean value of [itex] p [/itex].

Assume the [itex] \delta x_i [/itex] are independent, identically distributed, mean-zero Gaussian random errors, and assume the [itex] \delta y_i [/itex] are as well. The above approximation expresses the random variable [itex] \delta p [/itex] as a linear function of the independent, mean-zero normal random variables [itex] \delta x_i, \delta y_i [/itex]. You can compute the variance of [itex] \delta p [/itex] if you know the variances of the [itex] \delta x_i [/itex] and the [itex] \delta y_i [/itex].
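
To spell out that variance computation: if the [itex] \delta x_i [/itex] share a common variance [itex] \sigma_x^2 [/itex] and the [itex] \delta y_i [/itex] a common variance [itex] \sigma_y^2 [/itex], then independence gives

[itex] \operatorname{Var}(\delta p) = \sigma_x^2 \sum_i \left( \frac{\partial F}{\partial X_i} \right)^2 + \sigma_y^2 \sum_i \left( \frac{\partial F}{\partial Y_i} \right)^2 [/itex]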

Let's assume the [itex] \delta y_i [/itex] have a variance that is estimated by the variance of the residuals.

How do we find the variance of the [itex] \delta x_i [/itex]? You could assume that there are no measurement errors in the [itex] X_i [/itex] and set the [itex] \delta x_i = 0 [/itex]. If you can't assume that, perhaps we can use the linear approximation trick again (but I'm not really sure if this makes sense.) The curve fit (using specific values of the parameters) expresses the prediction of [itex] Y_i [/itex] as a function of the [itex] X_i [/itex] so [itex] Y_i = G(X_1, X_2,...) [/itex].

Approximate using:

[itex] y_i + \delta y_i = G(x_1, x_2, \dots) + \delta x_1 \frac{\partial G}{\partial X_1} + \delta x_2 \frac{\partial G}{\partial X_2} + \dots [/itex]

[itex] \delta y_i = \delta x_1 \frac{\partial G}{\partial X_1} + \delta x_2 \frac{\partial G}{\partial X_2} + \dots [/itex]

We have assumed the variance of the [itex] \delta y_i [/itex] is the variance of the residuals. Use the above equation to solve for the variance of the [itex] \delta x_i [/itex].

To me, the above process is rather circular and suspicious. It involves many assumptions, and I'm not sure I stated all of them. However, it's the best I can do to reconstruct how standard deviations could be estimated for use in "asymptotic linearized confidence intervals" for parameters in a curve fit. If anyone knows better, please comment!
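
For concreteness, here is a rough numerical sketch of the linearization step above: approximate the partial derivatives of [itex] F [/itex] by refitting with each data value nudged, then propagate given error variances. It inherits all the assumptions above, takes the x and y variances as known (rather than estimating them from residuals as described), and fixes C = 0 so that the remaining parameters A, B are identifiable. All data are placeholders.

[code=python]
import numpy as np
from scipy.optimize import curve_fit

def f(x, A, B):
    return (A / x) * np.exp(-B / x)     # C fixed at 0 for identifiability

def fit_params(xs, ys):
    # Numerically evaluates p = F(X_1..X_n, Y_1..Y_n): the fit itself.
    p, _ = curve_fit(f, xs, ys, p0=[1.0, 1.0])
    return p

# Placeholder data and presumed (known) error variances.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([0.8, 1.1, 0.9, 0.7, 0.6])
var_x = np.full_like(x, 0.05**2)
var_y = np.full_like(y, 0.10**2)

h = 1e-5            # step size; must exceed the fitter's own tolerance
var_p = np.zeros(2)
for i in range(len(x)):
    # partial of F w.r.t. X_i, by central differences
    xp, xm = x.copy(), x.copy()
    xp[i] += h; xm[i] -= h
    dF_dXi = (fit_params(xp, y) - fit_params(xm, y)) / (2 * h)

    # partial of F w.r.t. Y_i
    yp, ym = y.copy(), y.copy()
    yp[i] += h; ym[i] -= h
    dF_dYi = (fit_params(x, yp) - fit_params(x, ym)) / (2 * h)

    # Independent errors: the variances of the linear terms add.
    var_p += dF_dXi**2 * var_x[i] + dF_dYi**2 * var_y[i]

print("linearized std errors of A, B:", np.sqrt(var_p))
[/code]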

 
  • #8
Wow, thanks everyone. I'll work my way through the replies.

edit: I'm leaning towards upgrading my curve fitter to Deming regression, and then using a Monte Carlo approach to get some idea of the standard deviation, mainly because I have a lot of computing power and I already understand the basics of the approach. I'm having a read about "asymptotic linearized confidence intervals" and might try implementing them in the future, or if the other approaches fail.

One thing that slightly troubles me about the Monte Carlo approach: if I make a measurement of some physical parameter X, and obtain the result x±Δx, then when we generate our dataset for the simulation, we're assuming that the mean is x and the s.d. is Δx. But X is the mean, not x... in reality x could be very far from X by sheer bad luck, and our entire analysis depends on this.

I'm not sure if this is actually a problem, it just doesn't quite seem right.
 
  • #9
MikeyW said:
One thing that slightly troubles me about the Monte Carlo approach: if I make a measurement of some physical parameter X, and obtain the result x±Δx, then when we generate our dataset for the simulation, we're assuming that the mean is x and the s.d. is Δx. But X is the mean, not x... in reality x could be very far from X by sheer bad luck, and our entire analysis depends on this.

Yes, that's correct. That's why we have error bars. The answer is probably between the bars. If we have bad luck, then the answer is not between the bars.
 
  • #10
Yes... the error in f could, in general, depend on the mean of x. But it cannot possibly depend on the measured value of x, which is random. If I use the Monte Carlo approach, then it will.

I suppose I'd better try to take as many measurements as possible.
 
  • #11
MikeyW said:
edit- I'm leaning towards upgrading my curve fitter to the Deming regression
"Deming regresssion" might be the same as "total least squares regression" if you need another search phrase for it.

One thing that slightly troubles me about the Monte Carlo approach: if I make a measurement of some physical parameter X, and obtain the result x±Δx, then when we generate our dataset for the simulation, we're assuming that the mean is x and the s.d. is Δx.

I think "asymptotic linearized confidence intervals" make the same assumption.

But X is the mean, not x... in reality x could be very far from X by sheer bad luck, and our entire analysis depends on this.

(You haven't explained exactly what you intended to do by a Monte-Carlo method.)

If you want to do anything mathematically respectable, you need a specific probability model for how the data is generated. You should also understand that "error bars" have a common misinterpretation. Many people think that if they see an error bar around a particular value, they can say there is a certain probability that the "true" value is within the interval defined by the error bar. In general, this is not correct unless a Bayesian prior has been given for the quantity. (Study the difference in meaning between a "confidence interval" and a Bayesian "credible interval".)
 
  • #12
MikeyW said:
One thing that slightly troubles me about the Monte Carlo approach: if I make a measurement of some physical parameter X, and obtain the result x±Δx, then when we generate our dataset for the simulation, we're assuming that the mean is x and the s.d. is Δx. But X is the mean, not x... in reality x could be very far from X by sheer bad luck, and our entire analysis depends on this.

I'm not sure if this is actually a problem, it just doesn't quite seem right.
Maybe it would be better to generate the datasets by best fit + random errors?
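
A sketch of that variant (a parametric bootstrap) in Python, resampling around the fitted curve instead of around the raw measurements; names and data are placeholders as before:

[code=python]
import numpy as np
from scipy.optimize import curve_fit

def f(x, A, B, C):
    return (A / x) * np.exp(-B / x + C)

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([0.8, 1.1, 0.9, 0.7, 0.6])
dy = 0.10

p_best, _ = curve_fit(f, x, y, p0=[1.0, 1.0, 0.0])
y_fit = f(x, *p_best)                 # resample around the fitted curve...
rng = np.random.default_rng(seed=2)
fits = []
for _ in range(2000):
    ys = y_fit + rng.normal(0.0, dy, size=y.shape)   # ...not the raw data
    try:
        p, _ = curve_fit(f, x, ys, p0=p_best)
        fits.append(p)
    except RuntimeError:              # skip non-converging fits
        continue
print("std devs:", np.std(fits, axis=0))   # parameter spread
[/code]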
 

1. What is curve-fitting to data with horizontal/vertical error bars?

Curve-fitting to data with horizontal/vertical error bars is a statistical method used to analyze the relationship between two variables by fitting a curve to a set of data points. This method takes into account the uncertainty or variability in both the horizontal (independent) and vertical (dependent) variables, which is represented by the error bars.

2. Why is curve-fitting to data with horizontal/vertical error bars important?

Curve-fitting to data with horizontal/vertical error bars is important because it allows for a more accurate representation of the relationship between two variables. By considering the uncertainty in both variables, it provides a more realistic understanding of the data and improves the reliability of any conclusions drawn from the analysis.

3. What types of functions can be used for curve-fitting to data with horizontal/vertical error bars?

There are various types of functions that can be used for curve-fitting with error bars, such as linear, quadratic, exponential, and logarithmic functions. The choice of function depends on the nature of the data and the research question being investigated.

4. How are the error bars determined in curve-fitting to data with horizontal/vertical error bars?

The error bars in curve-fitting are typically determined by calculating the standard deviation or confidence intervals of the data points. These values are then used to create the error bars, which represent the variability or uncertainty in the data.

5. What are some limitations of curve-fitting to data with horizontal/vertical error bars?

One limitation of curve-fitting with error bars is that the usual error estimates rely on linearized approximations, which can be inaccurate for strongly non-linear models. Additionally, curve-fitting with error bars may not accurately represent the variability in the data if the sample size is small or if there are outliers present in the data.
