To what extent is the fit to experimental data good?


Discussion Overview

The discussion revolves around the evaluation of the fit of a theoretical function to experimental spectral data, specifically focusing on the use of confidence intervals, residual plots, and criteria for assessing the quality of the fit. Participants explore various statistical methods and considerations relevant to both linear and non-linear regression models.

Discussion Character

  • Exploratory
  • Technical explanation
  • Debate/contested
  • Mathematical reasoning

Main Points Raised

  • Some participants suggest that confidence intervals and residual plots are important for assessing the fit, while others propose additional metrics like chi-squared.
  • Concerns are raised about the reasonableness of confidence intervals, particularly regarding the acceptable range of values and their implications for fit quality.
  • Participants discuss the criteria for evaluating residual plots, noting that symmetry around zero is important but questioning how large deviations can be tolerated.
  • There is a debate about whether residual plots, which are mostly discussed in the context of linear regression, apply equally to non-linear models such as Gaussian functions.
  • Questions arise about the types of residual plots that should be constructed, including residuals versus frequency, fitted values, or experimental intensities.
  • One participant shares their experience with simulated data, noting discrepancies in relative residual plots and seeking clarification on their interpretation.
  • Another participant expresses skepticism about the quality of the fit based on the appearance of the residuals and the statistical significance of the results.

Areas of Agreement / Disagreement

Participants express differing views on the adequacy of various statistical measures for assessing fit quality, and there is no consensus on the criteria for acceptable confidence intervals or residual plots. The discussion remains unresolved regarding the best practices for evaluating non-linear regression models.

Contextual Notes

Limitations include potential uncertainties in the experimental data, the need for clear definitions of acceptable fit criteria, and the dependence on the specific context of the data being analyzed.

Leonid92
I have an experimental spectrum in which the y-axis is intensity and the x-axis is frequency. Int is the array of experimental intensities (y-axis) and w is the array of frequencies (x-axis). I know the form of the theoretical function that should describe the obtained spectrum. I define the function explicitly in Matlab using the syntax:

fun = @(p,w)p(1).*exp(-2*((w-p(2))./p(3)).^2);

I set the initial guess for the parameters as:

p0 = [a, b, c];

where a, b and c are specific values that I chose as initial values for the parameters p(1), p(2) and p(3), respectively. Then I perform the fit using the following code:

[p,resnorm,residual,exitflag,output,lambda,J] = lsqcurvefit(fun, p0, w, Int);

According to Matlab's help for the lsqcurvefit function, residual is calculated as fun(p,w)-Int at the solution p. After that, I find the 95% confidence intervals:

conf = nlparci(p,residual,'jacobian',J);

The next step is plotting the experimental data together with the fit function; this step is not important here, so I will skip it. The final step is building the residual plot:

plot(w,residual,'.')
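For readers who want to see what the steps above amount to numerically, here is a minimal sketch in Python (standard library only), not a reproduction of Matlab's internals. Assumptions, stated loudly: it uses plain Gauss-Newton instead of lsqcurvefit's default trust-region-reflective algorithm, so it needs a starting guess close to the answer; the intervals use the normal quantile 1.96 where nlparci, as I understand it, uses a Student t quantile (practically the same for hundreds of points); the simulated amplitude, centre, width and noise level are invented; and all function names are hypothetical. The residual keeps lsqcurvefit's sign convention, fit minus data.

```python
import math
import random


def model(p, w):
    """Same shape as the Matlab fun: p[0]*exp(-2*((w - p[1])/p[2])**2)."""
    return p[0] * math.exp(-2.0 * ((w - p[1]) / p[2]) ** 2)


def jacobian_row(p, w):
    """Analytic derivatives of model() with respect to p[0], p[1], p[2]."""
    u = (w - p[1]) / p[2]
    e = math.exp(-2.0 * u * u)
    return [e, 4.0 * p[0] * e * u / p[2], 4.0 * p[0] * e * u * u / p[2]]


def solve(m, v):
    """Solve m x = v by Gaussian elimination with partial pivoting (small systems only)."""
    n = len(v)
    a = [row[:] + [v[i]] for i, row in enumerate(m)]
    for k in range(n):
        piv = max(range(k, n), key=lambda i: abs(a[i][k]))
        a[k], a[piv] = a[piv], a[k]
        for i in range(k + 1, n):
            f = a[i][k] / a[k][k]
            for j in range(k, n + 1):
                a[i][j] -= f * a[k][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (a[i][n] - sum(a[i][j] * x[j] for j in range(i + 1, n))) / a[i][i]
    return x


def normal_matrix(p, w):
    """Jacobian rows and J^T J at parameters p."""
    rows = [jacobian_row(p, wi) for wi in w]
    jtj = [[sum(r[i] * r[k] for r in rows) for k in range(3)] for i in range(3)]
    return rows, jtj


def fit(w, y, p0, iters=100):
    """Plain Gauss-Newton least squares; assumes p0 is already near the optimum."""
    p = list(p0)
    for _ in range(iters):
        rows, jtj = normal_matrix(p, w)
        r = [yi - model(p, wi) for wi, yi in zip(w, y)]  # data minus model
        jtr = [sum(row[i] * ri for row, ri in zip(rows, r)) for i in range(3)]
        step = solve(jtj, jtr)
        p = [pi + si for pi, si in zip(p, step)]
        if max(abs(s) for s in step) < 1e-10:
            break
    return p


def conf_intervals(w, y, p, z=1.96):
    """Residuals (fit minus data, as in lsqcurvefit) and approximate 95% intervals."""
    residual = [model(p, wi) - yi for wi, yi in zip(w, y)]
    mse = sum(r * r for r in residual) / (len(w) - len(p))
    _, jtj = normal_matrix(p, w)
    ci = []
    for k in range(3):
        unit = [1.0 if i == k else 0.0 for i in range(3)]
        var_k = mse * solve(jtj, unit)[k]  # k-th diagonal entry of mse * inv(J^T J)
        half = z * math.sqrt(var_k)
        ci.append((p[k] - half, p[k] + half))
    return residual, ci


# Usage on simulated data; every number here is an illustrative assumption.
rng = random.Random(1)
w = [float(i) for i in range(501)]
true_p = [5.0, 250.0, 40.0]  # p[2] = 40 gives the same curve as c = 20 in exp(-(x-b)^2/(2c^2))
Int = [model(true_p, wi) + rng.gauss(0.0, 0.05) for wi in w]
p_fit = fit(w, Int, [4.5, 248.0, 38.0])
residual, ci = conf_intervals(w, Int, p_fit)
for name, value, (low, high) in zip(["p(1)", "p(2)", "p(3)"], p_fit, ci):
    print(f"{name} = {value:.3f}, 95% CI [{low:.3f}, {high:.3f}]")
```

The interval half-widths come from the diagonal of mse times inv(J'J), which is why nlparci needs both the residual and the Jacobian J returned by lsqcurvefit.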

I have 5 questions:

1) Is it enough to look at 95% confidence intervals and residual plots to determine whether the theoretical function fits the experimental data well? Or are there other quantities that should be calculated in order to say whether the fit is good or bad?

2) What are the criteria for deciding that the calculated 95% confidence intervals are reasonable? For example, I obtained the best-fit value 1560 for parameter p(1), with a calculated 95% confidence interval of [1400, 1720], i.e. an error of ±160. But if the calculated 95% confidence interval were, for example, [1200, 1920], i.e. an error of ±360, would it still be good? Where is the limit? How can I be sure that the calculated 95% confidence interval is acceptable?

3) What are the criteria for a good residual plot? I mean, what deviation of the experimental data from the fit function is acceptable? Everywhere it is written that the residual plot must be symmetric around zero, but again, the deviation can be very large; is that OK?

4) I found two sites where residual plots are discussed by default for the linear regression model: http://www.r-tutor.com/elementary-statistics/simple-linear-regression/residual-plot and http://statisticsbyjim.com/regression/check-residual-plots-regression-analysis/. So why do these authors consider only the linear regression model when talking about residual plots? What about non-linear regression models? For example, the Gaussian function is non-linear.

5) What types of residual plots should one make: residuals vs frequency, residuals vs fitted values, or residuals vs experimental intensities? Or all of them?

I will be very grateful for any help or advice.
 
Hello Leonid, welcome!
Leonid92 said:
this step is not important here
We start at loggerheads :wink:: I wholeheartedly disagree: such a plot can show you whether you have done something stupid or not !
Leonid92 said:
I know the form of the theoretical function that should describe the obtained spectrum
Maybe so, but do you also know for sure there is no background ? And no Lorentzian component in your peak ?

1) you could look at chi-squared
2) looks like pretty bad statistics to me: a 5% sigma for p(1) !?
Leonid92 said:
How can I be sure that the calculated 95% confidence interval is acceptable?
By looking at the plot you deemed unimportant :rolleyes:
3) your judgment is important: is it all random noise or is there a trend ? Did your Int spectrum have reasonable w-bin width ?
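One way to turn "is it all random noise or is there a trend?" into a number is a runs test (Wald-Wolfowitz) on the signs of the residuals, taken in order of w: a trend or a wrong line shape leaves long stretches of one sign and therefore too few runs. A minimal Python sketch using the large-sample normal approximation; the function name is hypothetical, exact zeros are simply skipped, and treating |z| above roughly 2 as suspicious is a rule of thumb rather than a hard limit.

```python
import math


def runs_test_z(residuals):
    """Wald-Wolfowitz runs test on residual signs (normal approximation).

    Strongly negative z: too few sign changes, i.e. systematic structure.
    Strongly positive z: too many sign changes, i.e. suspicious alternation.
    """
    signs = [r > 0 for r in residuals if r != 0]
    n_pos = sum(signs)
    n_neg = len(signs) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need residuals of both signs")
    n = n_pos + n_neg
    runs = 1 + sum(1 for a, b in zip(signs, signs[1:]) if a != b)
    mean = 2.0 * n_pos * n_neg / n + 1.0
    var = 2.0 * n_pos * n_neg * (2.0 * n_pos * n_neg - n) / (n * n * (n - 1.0))
    return (runs - mean) / math.sqrt(var)


# Smooth, wavy residuals of the kind a wrong line shape tends to leave behind
wavy = [math.sin(i / 10.0) for i in range(200)]
print(runs_test_z(wavy))  # far below -2: the signs come in long blocks
```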
 
Thank you for reply!
1) You mean chi-squared divided by the number of degrees of freedom? It should be close to 1, right?
2) I just gave an example; of course I always make the plot, but in this case I decided not to dwell on that step.
3) As far as I know, there is no random noise in the spectrum. What do you mean by a reasonable w-bin width?
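The quantity asked about in 1) can be sketched as follows: chi-squared divided by the number of degrees of freedom N - P (data points minus fitted parameters). It assumes you have an uncertainty sigma_i for every point; with correct sigmas and an adequate model the value should be near 1, with a typical scatter of about sqrt(2/(N - P)). The function name and the toy numbers are made up for illustration.

```python
import math


def reduced_chi_squared(y, y_fit, sigma, n_params):
    """chi^2 / (N - P) using per-point uncertainties sigma_i."""
    chi2 = sum(((yi - fi) / si) ** 2 for yi, fi, si in zip(y, y_fit, sigma))
    dof = len(y) - n_params
    return chi2 / dof, math.sqrt(2.0 / dof)  # value, and its typical spread for a good fit


value, spread = reduced_chi_squared([1.0, 2.0, 3.0], [1.1, 1.9, 3.2], [0.1, 0.1, 0.2], 1)
print(value, spread)
```

Values well above 1 point to an inadequate model or underestimated uncertainties; values well below 1 usually mean the uncertainties were overestimated.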
 
Leonid92 said:
Everywhere it is written that the residual plot must be symmetric around zero, but again, the deviation can be very large

This is true. The deviation of the residuals depends on the estimated uncertainty of the data, which you should have some way of determining. The usefulness of the confidence level depends on how well you know these uncertainties.
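A minimal Python sketch of the point above, assuming you have an uncertainty estimate sigma_i for each point: divide each residual by its sigma. For roughly Gaussian errors with correctly estimated sigmas, about 95% of these normalized residuals should lie within ±2, so "how large a deviation is acceptable" gets a concrete answer. The function names and toy numbers are hypothetical.

```python
def normalized_residuals(residuals, sigma):
    """Residual divided by the point's uncertainty (sigma may be a scalar or a list)."""
    if isinstance(sigma, (int, float)):
        sigma = [float(sigma)] * len(residuals)
    return [r / s for r, s in zip(residuals, sigma)]


def fraction_within(values, limit=2.0):
    """Share of values with |value| <= limit (about 0.95 for limit=2 if errors are Gaussian)."""
    return sum(1 for v in values if abs(v) <= limit) / len(values)


z = normalized_residuals([0.1, -0.3, 0.05, 0.5], 0.1)
print(fraction_within(z))  # only 2 of these 4 toy points lie within +-2 sigma
```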

Can you show us the data/fit and the residual plot?
 
gleem said:
Can you show us the data/fit and the residual plot?
Sorry for the late reply. In Matlab, I generated data with Gaussian behavior, using y = 5*exp(-(x-250).^2/(2*20^2))+noise. I then fit the Gaussian a*exp(-(x-b).^2/(2*c^2)) to these simulated data, computed the 95% confidence interval for each parameter, and computed the residuals and the relative residuals in two ways: relative residual1 = abs(residual / y_experimental) and relative residual2 = abs(residual / y_fit). Visually the fit is good, the residual plot is symmetric around zero, and the 95% confidence intervals for each parameter also look good (for example, the best-fit value of a is 4.988157 and its 95% confidence interval is [4.94977, 5.026544]). The best-fit parameter values and the corresponding 95% confidence intervals are shown in the attached plot. But the relative residual plots look very bad. What am I doing wrong? Why are the relative residual plots bad? How can I express the residual (deviation) in %?
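[Attached plot: a-7-000000-b-300-000000-c-70-000000.jpg (simulated data, Gaussian fit, best-fit parameters and 95% confidence intervals)]
[Attached plot: Residuals-a-7-000000-b-300-000000-c-70-000000.jpg]
[Attached plot: Relative-Residuals-a-7-000000-b-300-000000-c-70-000000.jpg]
[Attached plot: Relative-Residuals1-a-7-000000-b-300-000000-c-70-000000.jpg]
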
Sorry, I made a mistake in the plot titles: instead of "w0", there is parameter "c".
 
I looked at the plots before reading what you posted (old habit :rolleyes:) and had to suppress a feeling of being cheated: can't be real, too good to be true. The residuals all nicely gaussian distributed, no skewness, kurtosis, background, etc.

Then read
Leonid92 said:
I have an experimental spectrum

Was reassured when I found text in #5. Of course a fit of something like that reproduces the original values faithfully and accurately.

as a side note: you have about 500 points, so the accuracy of the error estimates themselves is around 4%; no point in quoting 9 digits: 2 or 3 is enough.

Never mind.

95% confidence level is about two standard deviations, so you have a = 4.99 ##\pm## 0.02, b = 249.9 ##\pm## 0.1, c = 20.2 ##\pm## 0.1.

[edited: I forgot to divide by 2 -- fixed now]

In other words: you retrieved the original peak position to within 0.05% and the amplitude and width to within 0.5% . If this were real data, I'd start to worry frantically about calibrating all kinds of equipment to get comparable systematic errors !
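The arithmetic behind converting a 95% interval into ± one standard error can be sketched like this: the half-width of a 95% interval is about 1.96 standard errors (2 is the usual round number), and dividing by the best-fit value gives the relative uncertainty. The inputs are the interval for a from the simulated-data post and the two hypothetical intervals from the opening post; the function name is made up.

```python
def interval_to_sigma(lower, upper, z=1.96):
    """Centre, approximate standard error and relative standard error of a symmetric CI."""
    centre = 0.5 * (lower + upper)
    sigma = 0.5 * (upper - lower) / z
    return centre, sigma, sigma / abs(centre)


for lower, upper in [(4.94977, 5.026544), (1400.0, 1720.0), (1200.0, 1920.0)]:
    centre, sigma, rel = interval_to_sigma(lower, upper)
    print(f"{centre:.4g} +- {sigma:.2g}  ({100 * rel:.2g}% relative)")
```

For the opening post's examples this gives a relative standard error of roughly 5% for the ±160 interval and roughly 12% for the ±360 one.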
Leonid92 said:
But the relative residual plots look very bad. What am I doing wrong? Why are the relative residual plots bad? How can I express the residual (deviation) in %?
They are not, and you are doing nothing wrong.

The relative deviations don't tell you much: the noise-to-signal ratio is huge in the tails since there is no signal there.
So in pic 3 you get 1 (signal = 0, noise = measurement) and in pic 4 you get close to ##\infty##; witness the scale of ##10^{30}## at 12 ##\sigma## from the peak.
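A small numerical illustration of why the relative residuals explode in the tails: far from the peak the model value is essentially zero, so dividing by the fit gives astronomically large numbers, while dividing by the measured value gives about 1 because the measurement there is pure noise. The peak parameters match the simulated Gaussian (a = 5, b = 250, c = 20); the fixed noise value of 0.05 is an assumption standing in for one random draw.

```python
import math


def gaussian(x, a=5.0, b=250.0, c=20.0):
    """The simulated peak: a*exp(-(x - b)^2 / (2 c^2))."""
    return a * math.exp(-((x - b) ** 2) / (2.0 * c * c))


noise = 0.05  # assumed size of one noise sample
for k in (0, 2, 4, 8, 12):  # distance from the peak in units of c
    y_fit = gaussian(250.0 + 20.0 * k)
    y_exp = y_fit + noise
    residual = y_fit - y_exp  # lsqcurvefit sign convention: fit minus data
    rel_exp = abs(residual / y_exp)  # tends to 1 where there is no signal
    rel_fit = abs(residual / y_fit)  # blows up as y_fit -> 0
    print(f"{k:2d} c: |r/y_exp| = {rel_exp:.3g}, |r/y_fit| = {rel_fit:.3g}")
```

That is why residuals are more usefully compared with the estimated uncertainty of the data, as gleem suggested, than with the data or the fit themselves.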

So now show what you do have
Leonid92 said:
I have an experimental spectrum
 
BvU said:
They are not, and you are doing nothing wrong.
Thanks a lot for the answer! So I don't need to calculate relative residuals.
My supervisor doesn't allow me to post our experimental data on the Internet.
 
Makes sense.
Normally, fitting a single peak of known shape on a smooth background can yield good results even with less abundant data, e.g.
[Attached figure: functions.png (example fit of a single peak on a smooth background)]

picture from https://root.cern.ch/root/html/guides/primer/ROOTPrimer.pdf p 43

(but: turns out to be fake data too ?:) !
and way too many digits (but that's because it's program output, not for publication)
 

  • #10
Leonid what is the nature of your data that you want to test?
 
  • #11
gleem said:
Leonid what is the nature of your data that you want to test?
My experimental data are magnetic resonance spectra.
 
  • #12
Plenty of literature on the analysis of those ! Google is your friend !
Any specific questions that might be answered without visuals ?
You sure you want to fit Gaussians and not Lorentzians ?
 
  • #13
BvU said:
You sure you want to fit Gaussians and not Lorentzians ?

Yes, I need Lorentzians. Thank you for the link!
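Since the thread ends with switching from Gaussians to Lorentzians, here is a minimal Python sketch comparing the two line shapes at equal height and equal full width at half maximum (FWHM). It is parameterised by FWHM for easy comparison, which differs from the p(3) convention in the Matlab fun at the top; the function names are hypothetical. The main practical difference is in the tails, which are much heavier for the Lorentzian, so a Gaussian fit to Lorentzian-shaped data tends to leave structured residuals there.

```python
import math


def lorentzian(w, amp, w0, fwhm):
    """Peak height amp at w0, full width at half maximum fwhm."""
    hw = 0.5 * fwhm
    return amp * hw * hw / ((w - w0) ** 2 + hw * hw)


def gaussian_fwhm(w, amp, w0, fwhm):
    """Gaussian with the same height and FWHM: amp*exp(-4 ln2 (w - w0)^2 / fwhm^2)."""
    return amp * math.exp(-4.0 * math.log(2.0) * (w - w0) ** 2 / fwhm ** 2)


for d in (0.5, 1.0, 2.0, 5.0):  # distance from the centre in units of the FWHM
    print(f"{d} FWHM: Lorentzian {lorentzian(d, 1.0, 0.0, 1.0):.3g}, "
          f"Gaussian {gaussian_fwhm(d, 1.0, 0.0, 1.0):.3g}")
```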
 
