Uncertainty of a non-linear least squares fit

In summary: Bayesian statistics quantifies the probability of a hypothesis given the data. It lets you state prior assumptions about the probability of things and use them to compute the quantities you actually want, such as the probability that the true value of a parameter lies in a given interval. Frequentist statistics, by contrast, computes the probability of the data given assumed parameter values; a frequentist confidence interval does not state the probability that the true value lies in that interval. For a non-linear least squares fit, parameter standard errors are typically obtained from the residual variance e^2/(n - p) together with the covariance matrix built from the Jacobian of the model.
  • #1
oliverroth
Hi,

I have some experimental data as a function of time t and temperature T. I have done a least squares fit of the data with a function f = f(a1, a2, t, T) (the function is non-linear in a1 and a2!). Optimizing e^2 = sum((yi - f(a1, a2, ti, Ti))^2) with Matlab's fminsearch gave me a1, a2 and the squared residual error (e^2).

Now I need some estimation of the quality of the fit (something comparable to R^2 in linear regressions). What can I use for this purpose?

I think I can remember that e^2/(n-2) (n = sample size) can be used to estimate the uncertainty of the fit. Am I right? If so, what is this quantity called and how can I interpret it (e.g. what statistical test is applicable?)?

Or is it necessary/better to calculate the covariance matrix?
I found somewhere that it can be calculated with:
e^2/(n-2) * C^-1, with Cij = sum(df/dai * df/daj).

I guess, I have to evaluate the differentials df(t,T)/da1 and df(t,T)/da2 at the optimized a1 and a2 and sum over all combinations of ti and Ti: Cij=sum(sum(df(ti,Ti)/dai*df(ti,Ti)/daj)); right??
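For a model with parameters a = (a1, a2) and n data points, the matrix C with Cij = sum of (df(ti,Ti)/dai)(df(ti,Ti)/daj) is exactly C = J'J, where J is the n-by-2 Jacobian of the model evaluated at the optimum. Note that this is a single sum over i = 1..n, since each data point carries its own (ti, Ti) pair, not a sum over all combinations. A minimal Python sketch of the calculation (the exponential model is a made-up placeholder; the MATLAB version is analogous):

```python
import numpy as np

# Placeholder two-parameter model; substitute your own f(a1, a2, t, T).
def f(a, t, T):
    a1, a2 = a
    return a1 * np.exp(-a2 * t / T)

def fit_covariance(a_opt, t, T, y, eps=1e-6):
    """Asymptotic covariance of the fitted parameters:
    cov = s^2 * (J^T J)^-1, with s^2 = e^2 / (n - p)."""
    n, p = len(y), len(a_opt)
    r = y - f(a_opt, t, T)                     # residuals at the optimum
    J = np.empty((n, p))                       # J[i, j] = df(ti, Ti)/daj
    for j in range(p):
        da = np.zeros(p)
        da[j] = eps
        # central finite difference for the partial derivative w.r.t. aj
        J[:, j] = (f(a_opt + da, t, T) - f(a_opt - da, t, T)) / (2 * eps)
    s2 = (r @ r) / (n - p)                     # e^2 / (n - 2) for p = 2
    return s2 * np.linalg.inv(J.T @ J)         # C = J^T J, cov = s^2 * C^-1
```

The square roots of the diagonal elements of the returned matrix are the parameter standard errors.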



Thanks for any help !
 
  • #2
oliverroth said:
Hi,
I think I can remember that e^2/(n-2) (n=sample volume) can be used to estimate the uncertainty of the fit. Am I right? If so, how is this quantity called and how can I interpret it (e.g. what statistical test is applicable?)?
(e^2)/(n - 1) is the usual "unbiased estimator of the population variance" for raw samples, if we assume your errors are independent random samples from a normal distribution. For residuals of a fit with p estimated parameters, the unbiased estimator is e^2/(n - p), which is your e^2/(n - 2) for p = 2.

Using a statistical test would be appropriate if you were making some sort of decision, but you haven't said what decision that would be.

oliverroth said:
Or is it necessary/better to calculate the covariance matrix?
I found somewhere that it can be calculated with:
e^2/(n-2) * C^-1, with Cij = sum(df/dai * df/daj).

I guess, I have to evaluate the differentials df(t,T)/da1 and df(t,T)/da2 at the optimized a1 and a2 and sum over all combinations of ti and Ti: Cij=sum(sum(df(ti,Ti)/dai*df(ti,Ti)/daj)); right??

I don't know what procedure you are referring to. Post a link to something like it and perhaps someone can comment.



 
  • #3
I just want to have a simple measure for the uncertainty of the fit. I am actually just interested in one of the 2 parameters and I want to have something like a1 = xx ± error.

I have to calculate e^2 anyway and so I thought, I could use it for this purpose. But it doesn’t even have the same units as the parameter a1 and I have no idea how to interpret it.

So possibly it is better to calculate the covariance matrix cov = e^2/(n-2) * C^-1; C = J'*J, where J is the Jacobian matrix (http://www.orbitals.com/self/least/least.htm). The calculation should be correct but again I don't know exactly what to do with it. Can I simply use the diagonal element of cov, i.e. a1 = xx ± √cov(1,1)?
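Under the usual asymptotic assumptions, yes: the standard error of a1 is the square root of the corresponding diagonal element of cov. A hedged Python sketch using scipy's curve_fit, which returns that covariance matrix directly; the model and the synthetic (t, T, y) data here are made-up placeholders standing in for the experimental values:

```python
import numpy as np
from scipy.optimize import curve_fit

# Placeholder two-parameter model; replace with your own f(a1, a2, t, T).
def model(x, a1, a2):
    t, T = x           # x carries both independent variables, row-wise
    return a1 * np.exp(-a2 * t / T)

# Synthetic data standing in for the experimental measurements.
rng = np.random.default_rng(1)
t = np.linspace(1.0, 10.0, 40)
T = np.full_like(t, 300.0)
y = model((t, T), 2.0, 60.0) + 0.02 * rng.standard_normal(t.size)

# curve_fit returns the optimum and cov = s^2 * (J'J)^-1, so the
# standard error of a1 is sqrt(cov[0, 0]).
popt, pcov = curve_fit(model, np.vstack((t, T)), y, p0=[1.0, 30.0])
a1_err = np.sqrt(pcov[0, 0])    # report a1 = popt[0] +/- a1_err
```

This is the same e^2/(n-2) * (J'J)^-1 construction as in the linked page, just computed by the library instead of by hand.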
 
  • #4
I looked at the paper in that link. In my opinion, the bottom line is that you do have to compute Jacobians and covariance matrices to get intervals for the parameters you are estimating, if you follow the method it prescribes.

I don't know whether you are rushing to produce a report or whether you have the time and inclination to understand exactly what you are doing. Actually understanding what is going on in non-linear least squares curve fitting is complicated and I don't claim to have a comprehensive grasp of it. (From the point of view of explaining the probability and statistics involved, the paper in that link isn't well written.)

Some elementary points:

1. It's best to use unambiguous terminology. In some contexts "uncertainty" amounts to the standard deviation of a random variable. In others it refers to entropy. It isn't clear what you mean by the "uncertainty of the fit". Likewise, although "confidence" has a technical definition in statistics, most laymen are thinking of their own misinterpretation of that word when they use it. For example, if we say that 1.90 plus or minus 1.3 is a "90% confidence interval" for a parameter, this does NOT mean that there is a 90% chance that the true value of the parameter is in that interval.

2. As I understand the paper in the link, it is not computing "confidence intervals" in the ordinary sense of that terminology. It is computing "asymptotic linearized confidence intervals". It doesn't bother to include those adjectives or explain the ideas behind them.

3. When a common sense person has data, he naturally wants information about the probability that certain ideas are true or the probability that the true values of things are in certain intervals. Unless he makes enough assumptions to use Bayesian statistics, he can never get such answers. The usual kind of statistics ("frequentist" statistics) does not solve that type of problem. It doesn't quantify the probability of some idea given the data. Instead it computes the probability of the data given some idea. The terminology of frequentist statistics ("confidence", "significance", etc.) strongly suggests to laymen that they are getting answers about the probability of various ideas or the location of various parameters given the data they have. In fact, they are actually getting numbers based on computing the probability of data when assuming certain ideas and locations are true (i.e. true with probability 1). There is a distinction between "The probability of A given B" versus "The probability of B given A", which I hope is obvious.
 
  • #5


I would like to first commend you on your thorough analysis and use of appropriate statistical methods in your research. It is clear that you have put a lot of effort into your data analysis and are seeking to fully understand the uncertainty in your results.

To answer your question, there are a few ways to estimate the quality of a non-linear least squares fit. One commonly used metric is the coefficient of determination, also known as R-squared. This measures the proportion of variability in the data that is explained by the model. However, R-squared is not always appropriate for non-linear models and can be biased in certain cases.

Another option is to use the residual variance (often called the reduced chi-squared statistic when the residuals are weighted by known measurement uncertainties), which is essentially what you mentioned in your question. It takes into account the number of fitted parameters p and the sample size n: s^2 = e^2/(n - p), with p = 2 in your case. This statistic is standard in non-linear least squares fitting and is usually a more reliable measure of fit quality than R-squared.
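The e^2/(n - p) quantity from the paragraph above takes only a couple of lines to compute; a minimal sketch, assuming unweighted residuals:

```python
import numpy as np

def residual_variance(y, y_fit, p):
    """Unbiased estimate of the residual variance, s^2 = e^2 / (n - p),
    where e^2 is the sum of squared residuals and p is the number of
    fitted parameters (p = 2 for a model with a1 and a2)."""
    r = np.asarray(y) - np.asarray(y_fit)
    return (r @ r) / (len(r) - p)
```

Note that s^2 has the units of y squared, which is why it cannot be quoted directly as an error bar on a1.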

Alternatively, you can also calculate the covariance matrix as you mentioned. This matrix can provide information about the uncertainty in your parameter estimates and can be used to calculate confidence intervals for your parameters. However, this approach may be more complex and may require some additional assumptions about the distribution of your data.

In summary, there are multiple ways to estimate the quality of a non-linear least squares fit, and it may be beneficial to use a combination of these methods to fully understand the uncertainty in your results. I would recommend consulting with a statistician or conducting further research to determine the most appropriate approach for your specific data and model.
 

1. What is the meaning of "uncertainty" in a non-linear least squares fit?

Uncertainty refers to the amount of error or variation in the data points that are being fitted to a non-linear model. It is a measure of how confident we can be in the accuracy of the fitted parameters.

2. How is uncertainty calculated in a non-linear least squares fit?

Uncertainty is typically calculated using the covariance matrix, which measures the relationship between the fitted parameters. It takes into account the errors in the data points and the sensitivity of the model to changes in the parameters.

3. What factors can affect the uncertainty in a non-linear least squares fit?

There are several factors that can affect the uncertainty in a non linear least squares fit, including the number of data points, the distribution of the data, and the complexity of the model being fitted. Additionally, the presence of outliers or systematic errors can also increase uncertainty.

4. How can we use uncertainty in a non-linear least squares fit to evaluate the quality of the fit?

Uncertainty can be used to calculate confidence intervals for the fitted parameters, which can give us a range of values within which the true value is likely to fall. If these intervals are large, it may indicate that the fit is not very reliable and may need to be improved.
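A common recipe for such an interval is the asymptotic linearized interval a1 ± t·se discussed earlier in the thread, with the critical value t taken from Student's t distribution on n - p degrees of freedom. A small Python sketch using scipy:

```python
from scipy import stats

def confidence_interval(a_hat, se, n, p, level=0.95):
    """Asymptotic (linearized) confidence interval a_hat +/- t * se,
    with t from Student's t distribution on n - p degrees of freedom."""
    tcrit = stats.t.ppf(0.5 * (1 + level), df=n - p)
    return a_hat - tcrit * se, a_hat + tcrit * se
```

For large n - p the critical value approaches the familiar 1.96 for a 95% interval.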

5. Can we reduce uncertainty in a non-linear least squares fit?

Yes, there are several ways to reduce uncertainty in a non linear least squares fit. This includes increasing the number of data points, improving the quality of the data, and using more advanced fitting techniques. It is also important to carefully consider the choice of model and any assumptions made during the fitting process.
