To what extent is the fit to experimental data good?

In summary: The conversation discusses fitting experimental data to a theoretical function using Matlab. The process involves setting initial parameter values and using the "lsqcurvefit" function to fit the data. The 95% confidence intervals and residual plots are used to judge the quality of the fit. The conversation also touches on the importance of considering a background and a Lorentzian component in the data. Finally, the original poster asks what residuals and relative residuals mean for the goodness of fit: if they are small the fit is good, but how small is small enough, and in which cases are they not small?
  • #1
Leonid92
I have an experimental spectrum in which the y-axis is intensity and the x-axis is frequency. Int is the array of experimental intensities (y-axis); w is the array of frequencies (x-axis). I know the form of the theoretical function that must describe the measured spectrum. I define the function explicitly in Matlab, using the syntax:

fun = @(p,w)p(1).*exp(-2*((w-p(2))./p(3)).^2);

I set the initial guess for the parameters as:

p0 = [a, b, c];

where a, b and c are the specific values I chose as initial values for parameters p(1), p(2) and p(3), respectively. Then I run the fit using the following code:

[p,resnorm,residual,exitflag,output,lambda,J] = lsqcurvefit(fun, p0, w, Int);

According to Matlab's help for the lsqcurvefit function, residual is calculated as fun(p,w)-Int at the solution p. After that, I find the 95% confidence intervals:

conf = nlparci(p,residual,'jacobian',J);

The next step is plotting the experimental data and the fit function; this step is not important here, so I will skip it. The final step is building the residual plot:

plot(w,residual,'.')

I have 5 questions:

1) Is it enough to consider the 95% confidence intervals and residual plots in order to decide whether the theoretical function fits the experimental data well? Or are there other quantities that should be calculated before saying that the fit is good or bad?

2) What is the criterion for a reasonable 95% confidence interval? For example, I obtained the best-fit value 1560 for parameter p(1), and the calculated 95% confidence interval is 1400 to 1720, i.e. the error is ±160. But if the calculated 95% confidence interval were, for example, 1200 to 1920, i.e. an error of ±360, would it still be good? Where is the limit? How can I be sure that the calculated 95% confidence interval is acceptable?

3) What is the criterion for a good residual plot? I mean, how large a deviation of the experimental data from the fit function is acceptable? Everywhere it is written that the residual plot must be symmetric about zero, but the deviations can still be very large. Is that OK?

4) I found two sites where residual plots are discussed only for the linear regression model. Here are the links: http://www.r-tutor.com/elementary-statistics/simple-linear-regression/residual-plot and http://statisticsbyjim.com/regression/check-residual-plots-regression-analysis/ So the question is: why do these authors consider only the linear regression model when talking about residual plots? What about non-linear regression models? For example, a Gaussian is a non-linear function.

5) What type of residual plot does one need to build: residuals vs frequency, residuals vs fitted values, or residuals vs experimental intensities? Or all of them?

I will be very grateful for any help or advice.
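
The fragments in this post can be put together as one runnable sketch. The measured spectrum is not shown in the thread, so the data below are synthetic, and the "true" parameters, noise level and initial guess are hypothetical values chosen only for illustration. It assumes the Optimization Toolbox (lsqcurvefit) and the Statistics and Machine Learning Toolbox (nlparci) are installed. The call to full(J) is a defensive choice, since lsqcurvefit can return a sparse Jacobian.

```matlab
% Synthetic stand-in for the measured spectrum (assumption: real data not available).
rng(0);                                             % reproducible noise
w     = linspace(0, 500, 500);                      % frequency axis
fun   = @(p,w) p(1).*exp(-2*((w - p(2))./p(3)).^2); % model from the post
pTrue = [5, 250, 40];                               % hypothetical "true" parameters
Int   = fun(pTrue, w) + 0.1*randn(size(w));         % Gaussian noise, sigma = 0.1

p0 = [4, 240, 50];                                  % hypothetical initial guess
[p, resnorm, residual, exitflag, output, lambda, J] = lsqcurvefit(fun, p0, w, Int);
conf = nlparci(p, residual, 'jacobian', full(J));   % 95% confidence intervals, one row per parameter

% Data with fit, then residuals against frequency and against fitted values.
fitted = fun(p, w);
subplot(3,1,1); plot(w, Int, '.', w, fitted, '-'); xlabel('w'); ylabel('Int');
subplot(3,1,2); plot(w, residual, '.');            xlabel('w'); ylabel('residual');
subplot(3,1,3); plot(fitted, residual, '.');       xlabel('fitted value'); ylabel('residual');
```

Plotting residuals against w tends to reveal a misplaced or misshapen peak, while plotting them against the fitted values tends to reveal noise that grows with signal level.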
 
  • #2
Hello Leonid, :welcome: !
Leonid92 said:
this step is not important here
We start at loggerheads :wink:: I wholeheartedly disagree: such a plot can show you whether you have done something stupid or not !
Leonid92 said:
I know the form of the theoretical function that must describe the measured spectrum
Maybe so, but do you also know for sure there is no background ? And no Lorentzian component in your peak ?

1) you could look at chi squared
2) looks like pretty bad statistics to me: a 5% sigma for p(1), the peak amplitude !?
Leonid92 said:
How can I be sure that the calculated 95% confidence interval is acceptable?
By looking at the plot you deemed unimportant :rolleyes:
3) your judgment is important: is it all random noise or is there a trend ? Did your Int spectrum have reasonable w-bin width ?
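
One way to back up the visual judgment "random noise or trend" with a number is the Durbin-Watson statistic of the residuals, taken in order of increasing w. This is a swapped-in technique, not something proposed in the thread, and the residual vectors below are hypothetical. Only base Matlab functions are used.

```matlab
% Hypothetical residuals: pure noise versus noise plus a slow trend.
rng(1);
n      = 500;
rNoise = 0.1*randn(1, n);
rTrend = rNoise + 0.1*sin(2*pi*(1:n)/n);   % leftover structure, e.g. a missed background

% Durbin-Watson statistic: near 2 for uncorrelated residuals,
% noticeably below 2 when neighbouring residuals share a smooth trend.
dw = @(r) sum(diff(r).^2) / sum(r.^2);

fprintf('DW, noise only : %.2f\n', dw(rNoise));
fprintf('DW, with trend : %.2f\n', dw(rTrend));
```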
 
  • #3
Thank you for the reply!
1) You mean chi-squared normalized to the number of degrees of freedom? It should be near 1, is that right?
2) I just gave an example; I always plot the fit, of course, but in this case I decided not to dwell on that step.
3) As far as I know, there is no random noise in the spectrum. What do you mean by a reasonable w-bin width?
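
For reference, the normalized chi-squared asked about in 1) can be written out as follows. The per-point uncertainty \sigma_i is an assumption here: the thread does not say how it would be obtained. N is the number of data points and P = 3 is the number of fitted parameters of the model in post #1.

```latex
\chi^2_\nu \;=\; \frac{1}{N-P}\sum_{i=1}^{N}
  \left(\frac{\mathrm{Int}_i - \mathrm{fun}(p, w_i)}{\sigma_i}\right)^{2},
\qquad
\chi^2_\nu \approx 1 \pm \sqrt{\frac{2}{N-P}}
\quad\text{for a correct model with correctly estimated } \sigma_i .
```

Values well above 1 point to a poor model or underestimated uncertainties; values well below 1 point to overestimated uncertainties.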
 
  • #4
Leonid92 said:
Everywhere it is written that the residual plot must be symmetric about zero, but the deviations can still be very large

This is true. The deviation of the residuals depends on the estimated uncertainty of the data, which you should have some way of determining. The usefulness of the confidence level depends on how well you know these uncertainties.

Can you show us the data/fit and the residual plot?
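
The point about uncertainties can be illustrated with a small sketch: residuals only look "large" or "small" relative to the assumed uncertainty of each point. The residual vector and noise level below are hypothetical stand-ins, not data from the thread.

```matlab
rng(2);
N         = 500;
sigmaTrue = 0.1;                          % noise level used to generate the stand-in residuals
residual  = sigmaTrue * randn(1, N);      % stand-in for the residual vector from lsqcurvefit

% Fraction of standardized residuals inside +/- k for an assumed uncertainty s.
fracWithin = @(s, k) mean(abs(residual ./ s) < k);

fracRight = fracWithin(sigmaTrue, 2);     % expected near 0.95 when the uncertainty is right
fracSmall = fracWithin(sigmaTrue / 3, 2); % uncertainty underestimated: far fewer points inside

fprintf('within 2 sigma: %.2f (correct sigma), %.2f (sigma underestimated)\n', fracRight, fracSmall);
```

The same residuals pass or fail the "within two sigma" check depending only on the uncertainty that is assumed, which is why the uncertainty estimate matters as much as the fit itself.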
 
  • #5
gleem said:
This is true. The deviation of the residuals depends on the estimated uncertainty of the data, which you should have some way of determining. The usefulness of the confidence level depends on how well you know these uncertainties.

Can you show us the data/fit and the residual plot?
Sorry for the late reply. In Matlab, I generated data that follow a Gaussian, i.e. I generated data using the function y = 5*exp(-(x-250).^2/(2*20^2)) + noise. Then I fit a Gaussian function { a*exp(-(x-b).^2/(2*c^2)) } to these simulated data, calculate the 95% confidence interval for each parameter, and calculate residuals and relative residuals in 2 ways { relative residual1 = abs(residual ./ y_experimental); relative residual2 = abs(residual ./ y_fit) }. Visually the fit is good, the residual plot is symmetric about zero, and the 95% confidence intervals for each parameter are also good (for example, the best-fit value is a = 4.988157 and its 95% confidence interval is [4.94977, 5.026544]). You can see the best-fit parameter values and the corresponding 95% confidence intervals in the plot that I attach to this post. But the relative residual plots are very bad. What am I doing wrong? Why are the relative residual plots bad? How can I find the residual (deviation) in %?
[Attached image: a-7-000000-b-300-000000-c-70-000000.jpg]
[Attached image: Residuals-a-7-000000-b-300-000000-c-70-000000.jpg]
[Attached image: Relative-Residuals-a-7-000000-b-300-000000-c-70-000000.jpg]
[Attached image: Relative-Residuals1-a-7-000000-b-300-000000-c-70-000000.jpg]
 

  • #6
Sorry, I made a mistake in the titles of the plots: instead of "w0", there is parameter "c".
 
  • #7
I looked at the plots before reading what you posted (old habit :rolleyes:) and had to suppress a feeling of being cheated: can't be real, too good to be true. The residuals are all nicely Gaussian distributed, with no skewness, kurtosis, background, etc.

Then read
Leonid92 said:
I have an experimental spectrum

Was reassured when I found text in #5. Of course a fit of something like that reproduces the original values faithfully and accurately.

As a side note: you have about 500 points, so the accuracy of the error estimates is around 4%; there is no point in quoting 9 digits: 2 or 3 is enough.

Never mind.

95% confidence level is about two standard deviations, so you have a = 4.99 ##\pm## 0.02, b = 249.9 ##\pm## 0.1, c = 20.2 ##\pm## 0.1.

[edited: I forgot to divide by 2 -- fixed now]

In other words: you retrieved the original peak position to within 0.05% and the amplitude and width to within 0.5% . If this were real data, I'd start to worry frantically about calibrating all kinds of equipment to get comparable systematic errors !
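
The conversion from a 95% confidence interval to a one-sigma uncertainty can be sketched with the numbers quoted in post #5. It assumes the normal approximation (95% is about ±1.96 sigma), which is reasonable with roughly 500 points; nlparci itself uses a Student t quantile, which is nearly the same for that many degrees of freedom.

```matlab
% Numbers quoted in post #5 for parameter a.
a    = 4.988157;
ci95 = [4.94977, 5.026544];

halfWidth = diff(ci95) / 2;      % half-width of the 95% interval
sigmaA    = halfWidth / 1.96;    % normal approximation: 95% is about +/- 1.96 sigma
relSigma  = 100 * sigmaA / a;    % one-sigma uncertainty in percent

fprintf('a = %.2f +/- %.2f (%.1f%%)\n', a, sigmaA, relSigma);
```

This reproduces the a = 4.99 ± 0.02 quoted above, a relative uncertainty of about 0.4%.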
Leonid92 said:
But the relative residual plots are very bad. What am I doing wrong? Why are the relative residual plots bad? How can I find the residual (deviation) in %?
They are not, and you are doing nothing wrong.

The relative deviations don't say much that is useful: the noise-to-signal ratio is huge in the tails since there is no signal.
So in pic 3 you get 1 (signal = 0, noise = measurement) and in pic 4 you get close to ##\infty##; witness the scale of 10^30 at 12 ##\sigma## from the peak.
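
The behaviour of the two relative residuals can be reproduced with a short sketch based on the generating function from post #5. The noise level is an assumption (the thread does not state it), and the true curve is used as a stand-in for a near-perfect fit.

```matlab
rng(3);
x          = linspace(0, 500, 501);
model      = @(x) 5*exp(-(x - 250).^2 / (2*20^2));  % generating function from post #5
sigmaNoise = 0.05;                                  % assumed noise level (not given in the thread)
y          = model(x) + sigmaNoise*randn(size(x));  % simulated "experimental" data
yFit       = model(x);                              % stand-in for a near-perfect fit

residual = yFit - y;
rel1 = abs(residual ./ y);       % relative to the data
rel2 = abs(residual ./ yFit);    % relative to the fit

peak = abs(x - 250) < 20;        % within one sigma of the peak
tail = abs(x - 250) > 200;       % more than ten sigma away

fprintf('median rel1: peak %.3g, tail %.3g\n', median(rel1(peak)), median(rel1(tail)));
fprintf('median rel2: peak %.3g, tail %.3g\n', median(rel2(peak)), median(rel2(tail)));
```

Near the peak both ratios are at the percent level. In the tails the data are essentially pure noise, so the first ratio sits at 1 and the second divides by an almost-zero fit value and explodes.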

So now show what you do have
Leonid92 said:
I have an experimental spectrum
 
  • #8
Thanks a lot for the answer! So I don't need to calculate relative residuals.
My supervisor doesn't allow me to post our experimental data on the Internet.
 
  • #9
Makes sense.
Normally, fitting a single peak of known shape on a smooth background can yield good results even with less abundant data, e.g.
[Attached image: functions.png, from https://root.cern.ch/root/html/guides/primer/ROOTPrimer.pdf p 43]

(but: turns out to be fake data too ?:) !
and way too many digits (but that's because it's program output, not for publication)
 

  • #10
Leonid, what is the nature of the data that you want to test?
 
  • #11
gleem said:
Leonid, what is the nature of the data that you want to test?
My experimental data are magnetic resonance spectra.
 
  • #12
Plenty of literature on the analysis of those ! Google is your friend !
Any specific questions that might be answered without visuals ?
You sure you want to fit Gaussians and not Lorentzians ?
 
  • #13
BvU said:
Plenty of literature on the analysis of those ! Google is your friend !
Any specific questions that might be answered without visuals ?
You sure you want to fit Gaussians and not Lorentzians ?

Yes, I need Lorentzians. Thank you for the link!
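
For completeness, the workflow from post #1 carries over to a Lorentzian line if the model function is swapped. The sketch below is only an illustration: the parametrization (amplitude, centre, full width at half maximum, constant background), the synthetic data and all numerical values are assumptions, since the real spectra are not shown. It needs the same toolboxes as lsqcurvefit and nlparci above.

```matlab
% Lorentzian line on a constant background (assumed model form, not given in the thread).
% p = [amplitude, centre, FWHM, background]
lorentz = @(p,w) p(1) .* (p(3)/2).^2 ./ ((w - p(2)).^2 + (p(3)/2).^2) + p(4);

rng(4);
w     = linspace(0, 500, 500);
pTrue = [5, 250, 40, 0.5];                          % hypothetical values
Int   = lorentz(pTrue, w) + 0.05*randn(size(w));    % synthetic spectrum

p0 = [4, 240, 60, 0];                               % hypothetical initial guess
[p, ~, residual, ~, ~, ~, J] = lsqcurvefit(lorentz, p0, w, Int);
conf = nlparci(p, residual, 'jacobian', full(J));   % 95% confidence intervals

plot(w, Int, '.', w, lorentz(p, w), '-'); xlabel('w'); ylabel('Int');
```

Because the width enters only squared, the fitted p(3) can come out with either sign; abs(p(3)) is the quantity to report.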
 

1. What is the definition of "good fit" in relation to experimental data?

The term "good fit" refers to how closely a mathematical model or hypothesis matches the observed data from an experiment. It is a measure of how well the model can explain or predict the results of the experiment.

2. How is the fit to experimental data evaluated?

The fit to experimental data is evaluated by comparing the predicted values from the model to the actual values obtained from the experiment. This can be done through statistical tests, such as the chi-square test or the coefficient of determination (R-squared), which provide a quantitative measure of the fit.

3. What factors can affect the fit to experimental data?

The fit to experimental data can be affected by various factors, such as the quality and quantity of the data, the complexity of the model, and the assumptions made in the model. Other factors, such as measurement errors, outliers, and experimental variability, can also impact the fit.

4. How can one improve the fit to experimental data?

To improve the fit to experimental data, one can adjust the parameters of the model or explore alternative models. It is also important to carefully evaluate the data and make sure it is of high quality. In some cases, it may be necessary to collect more data or perform additional experiments to obtain a better fit.

5. What does a poor fit to experimental data indicate?

A poor fit to experimental data can indicate that the model is not a good representation of the underlying phenomenon or that there are errors or limitations in the data. It may also suggest that there are other factors or variables that are not accounted for in the model. In any case, a poor fit indicates that the model should be re-evaluated and potentially revised or refined.
