To what extent is the fit to experimental data good?

Leonid92 · Feb 12, 2019

I have an experimental spectrum in which the y-axis is intensity and the x-axis is frequency. Int is the array of experimental intensities (y-axis) and w is the array of frequencies (x-axis). I know the form of the theoretical function that must describe the obtained spectrum. I define the function explicitly in Matlab, using the syntax:

fun = @(p,w)p(1).*exp(-2*((w-p(2))./p(3)).^2);

I set the initial guess for the parameters as:

p0 = [a, b, c];

where a, b and c are the specific values I chose as initial values for the parameters p(1), p(2) and p(3), respectively. Then I do the fit using the following code:

[p,resnorm,residual,exitflag,output,lambda,J] = lsqcurvefit(fun, p0, w, Int);

According to Matlab's help for the lsqcurvefit function, residual is calculated as fun(p,w)-Int at the solution p. After that, I find the 95% confidence intervals:

conf = nlparci(p,residual,'jacobian',J);

The next step is plotting the experimental data and the fitted function; this step is not important here, so I will skip it. The final step is plotting the residuals:

plot(w,residual,'.')

I have 5 questions:

1) Is it enough to consider the 95% confidence intervals and the residual plots to decide whether the theoretical function fits the experimental data well? Or are there other quantities that should be calculated to judge whether the fit is good or bad?

2) What is the criterion for the calculated 95% confidence intervals being reasonable? For example, I obtained the value 1560 for parameter p(1), and the calculated 95% confidence interval is [1400, 1720], i.e. the error is ±160. But if the calculated 95% confidence interval were, for example, [1200, 1920], i.e. an error of ±360, would that still be good? Where is the limit? How can I be sure that the calculated 95% confidence interval is acceptable?

3) What is the criterion for a good residual plot? I mean, what deviation of the experimental data from the fitted function is acceptable? Everywhere it is written that the residual plot must be symmetric around zero, but again, the deviation can be very large. Is that OK?

4) I found two sites where residual plots are discussed only for the linear regression model. Here are the links: http://www.r-tutor.com/elementary-statistics/simple-linear-regression/residual-plot http://statisticsbyjim.com/regression/check-residual-plots-regression-analysis/ So the question is: why do these authors consider only the linear regression model when talking about residual plots? What about non-linear regression models? For example, a Gaussian is a non-linear function.

5) What type of residual plot does one need to build: residuals vs frequency, residuals vs fitted values, or residuals vs experimental intensities? Or all of them?

I will be very grateful for any help or advice.

BvU · Feb 12, 2019

Hello Leonid, welcome!

Leonid92 said:

this step is not important here

We start at loggerheads: I wholeheartedly disagree. Such a plot can show you whether you have done something stupid or not!

Leonid92 said:

I know the form of the theoretical function that must describe the obtained spectrum

Maybe so, but do you also know for sure there is no background? And no Lorentzian component in your peak?

1) You could look at chi-squared.
2) Looks like pretty bad statistics to me: a 5% sigma for the peak amplitude p(1)!?

Leonid92 said:

How can I be sure that the calculated 95% confidence interval is acceptable?

By looking at the plot you deemed unimportant

3) Your judgment is important: is it all random noise or is there a trend? Did your Int spectrum have a reasonable w-bin width?

Leonid92 · Feb 12, 2019

Thank you for the reply!
1) Do you mean chi-squared normalized to the number of degrees of freedom? It should be near 1, is that right?
2) I just gave an example. I always plot the data and the fit, of course, but in this case I decided not to focus on that step.
3) As far as I know, there is no random noise in the spectrum. What do you mean by a reasonable w-bin width?
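
For reference, a minimal sketch of the reduced chi-squared mentioned in 1), using the model from the first post. Everything numerical here is an assumption for illustration: the frequency grid, the parameter values standing in for a fit result, and above all `sigma`, the 1-sigma uncertainty of each intensity value, which the thread does not give. Without a trustworthy `sigma` the statistic cannot be interpreted.

```matlab
% Sketch: reduced chi-squared for a 3-parameter fit.
% ASSUMPTION: sigma is the known 1-sigma uncertainty of every intensity
% value; the value below is purely illustrative.
rng(0);                                    % reproducible synthetic example
w     = linspace(0, 500, 501).';           % hypothetical frequency axis
sigma = 0.1;                               % assumed measurement uncertainty
fun   = @(p,w) p(1).*exp(-2*((w-p(2))./p(3)).^2);  % model from the first post
p     = [5, 250, 40];                      % stand-in for fitted parameters
Int   = fun(p, w) + sigma*randn(size(w));  % synthetic "measurement"

residual = fun(p, w) - Int;                % same sign convention as lsqcurvefit
nu       = numel(Int) - numel(p);          % degrees of freedom
chi2red  = sum((residual./sigma).^2) / nu; % should be near 1 for a good fit
fprintf('reduced chi-squared = %.2f\n', chi2red);
```

A value well above 1 suggests either a wrong model or underestimated uncertainties; a value well below 1 suggests overestimated uncertainties.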

gleem · Feb 12, 2019

Leonid92 said:

Everywhere it is written that the residual plot must be symmetric around zero, but again, the deviation can be very large

This is true. The spread of the residuals depends on the estimated uncertainty of the data, which you should have some way of determining. The usefulness of the confidence level depends on how well you know these uncertainties.

Can you show us the data/fit and the residual plot?

Leonid92 · Feb 27, 2019

gleem said:

This is true. The spread of the residuals depends on the estimated uncertainty of the data, which you should have some way of determining. The usefulness of the confidence level depends on how well you know these uncertainties.

Can you show us the data/fit and the residual plot?

Sorry for the late reply. In Matlab, I generated data that follow a Gaussian, using the function y = 5*exp(-(x-250).^2/(2*20^2))+noise. Then I fit the Gaussian function a*exp(-(x-b).^2/(2*c^2)) to these simulated data, calculate the 95% confidence interval for each parameter, and calculate the residuals and the relative residuals in two ways: relative residual1 = abs(residual / y_experimental) and relative residual2 = abs(residual / y_fit).

Visually the fit is good, the residual plot is symmetric around zero, and the 95% confidence intervals for each parameter are also good (for example, the best-fit value is a = 4.988157 and its 95% confidence interval is [4.94977, 5.026544]). You can see the best-fit parameter values and the corresponding 95% confidence intervals in the plots that I attach to this post.

But the relative residual plots are very bad. What am I doing wrong? Why are the relative residual plots bad? How can I find the residual (deviation) in %?

Relative-Residuals-a-7-000000-b-300-000000-c-70-000000.jpg

Relative-Residuals1-a-7-000000-b-300-000000-c-70-000000.jpg
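
The experiment described in this post can be reproduced roughly as follows. This is a sketch under stated assumptions, not the poster's actual script: the noise level `noiseSigma`, the grid `x`, the random seed and the starting guess `p0` are not given in the thread and are chosen here for illustration only. It needs the Optimization Toolbox (`lsqcurvefit`) and the Statistics and Machine Learning Toolbox (`nlparci`).

```matlab
% Sketch: simulate a noisy Gaussian, fit it, and compute both kinds of
% relative residual described in the post.
rng(1);                                    % reproducible noise
x          = (0:500).';                    % ASSUMED grid (about 500 points)
noiseSigma = 0.1;                          % ASSUMED noise level
y = 5*exp(-(x-250).^2/(2*20^2)) + noiseSigma*randn(size(x));

gauss = @(p,x) p(1)*exp(-(x-p(2)).^2/(2*p(3)^2));  % a, b, c as in the post
p0    = [4, 240, 25];                      % ASSUMED starting guess
opts  = optimoptions('lsqcurvefit', 'Display', 'off');
[p, resnorm, residual, ~, ~, ~, J] = lsqcurvefit(gauss, p0, x, y, [], [], opts);

% 95% confidence intervals; full() converts a possibly sparse Jacobian
ci = nlparci(p, residual, 'jacobian', full(J));

relRes1 = abs(residual ./ y);              % relative to the measured values
relRes2 = abs(residual ./ gauss(p, x));    % relative to the fitted values

fprintf('a = %.3f  [%.3f, %.3f]\n', p(1), ci(1,1), ci(1,2));
fprintf('largest relative residual (vs fit): %.3g\n', max(relRes2));
```

The absolute residuals stay at the noise level everywhere, while `relRes2` becomes enormous far from the peak because the fitted curve is practically zero there.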

Leonid92 · Feb 27, 2019

Sorry, I made a mistake in the plot titles: the parameter labelled "w0" there is the parameter "c".

BvU · Feb 28, 2019

I looked at the plots before reading what you posted (old habit) and had to suppress a feeling of being cheated: can't be real, too good to be true. The residuals are all nicely Gaussian distributed, with no skewness, kurtosis, background, etc.

Then read

Leonid92 said:

I have an experimental spectrum

Was reassured when I found the text in #5. Of course a fit of something like that reproduces the original values faithfully and accurately.

As a side note: you have about 500 points, so the accuracy of the error estimates is around 4%. There is no point in quoting 9 digits: 2 or 3 is enough.

Never mind.

95% confidence level is about two standard deviations, so you have a = 4.99 ##\pm## 0.02, b = 249.9 ##\pm## 0.1, c = 20.2 ##\pm## 0.1.

[edited: I forgot to divide by 2 -- fixed now]
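
As a worked check of that conversion, using the interval quoted for a in the earlier post and assuming the interval is symmetric and the number of degrees of freedom is large enough that the 95% multiplier is close to 1.96:

$$\sigma_a \approx \frac{5.026544 - 4.94977}{2 \times 1.96} = \frac{0.076774}{3.92} \approx 0.0196 \approx 0.02$$

Relative to the best-fit value this is ##0.0196 / 4.988 \approx 0.4\%##.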

In other words: you retrieved the original peak position to within 0.05% and the amplitude and width to within 0.5%. If this were real data, I'd start to worry frantically about calibrating all kinds of equipment to get comparable systematic errors!

Leonid92 said:

But the relative residual plots are very bad. What am I doing wrong? Why are the relative residual plots bad? How can I find the residual (deviation) in %?

They are not, and you are doing nothing wrong.

The relative deviations don't say much that is useful: the noise-to-signal ratio is huge in the tails since there is no signal.
So in pic 3 you get 1 (signal = 0, noise = measurement) and in pic 4 you get close to ##\infty##; witness the scale of 10^30 at 12 ##\sigma## from the peak.
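
To put a number on the tail, take the generating values from the simulation (a = 5, with the distance from the peak equal to 12 times the width c). The fitted curve there is

$$y_\text{fit}(b \pm 12c) = a\,e^{-(12c)^2/(2c^2)} = 5\,e^{-72} \approx 5 \times 5.4 \times 10^{-32} \approx 2.7 \times 10^{-31}$$

The noise amplitude is not stated in the thread; assuming, for illustration only, a residual of order 0.1 at that point, the ratio is

$$\frac{|\text{residual}|}{y_\text{fit}} \sim \frac{0.1}{2.7 \times 10^{-31}} \approx 4 \times 10^{29}$$

which is the order of magnitude seen on the axis of the relative residual plot.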

So now show what you do have

Leonid92 said:

I have an experimental spectrum

Leonid92 · Feb 28, 2019

BvU said:

I looked at the plots before reading what you posted (old habit) and had to suppress a feeling of being cheated: can't be real, too good to be true. The residuals are all nicely Gaussian distributed, with no skewness, kurtosis, background, etc.

Then read

Was reassured when I found the text in #5. Of course a fit of something like that reproduces the original values faithfully and accurately.

As a side note: you have about 500 points, so the accuracy of the error estimates is around 4%. There is no point in quoting 9 digits: 2 or 3 is enough.

Never mind.

95% confidence level is about two standard deviations, so you have a = 4.99 ##\pm## 0.02, b = 249.9 ##\pm## 0.1, c = 20.2 ##\pm## 0.1.

[edited: I forgot to divide by 2 -- fixed now]

In other words: you retrieved the original peak position to within 0.05% and the amplitude and width to within 0.5%. If this were real data, I'd start to worry frantically about calibrating all kinds of equipment to get comparable systematic errors!

They are not, and you are doing nothing wrong.

The relative deviations don't say much that is useful: the noise-to-signal ratio is huge in the tails since there is no signal.
So in pic 3 you get 1 (signal = 0, noise = measurement) and in pic 4 you get close to ##\infty##; witness the scale of 10^30 at 12 ##\sigma## from the peak.

So now show what you do have

Thanks a lot for the answer! So I don't need to calculate relative residuals.
My supervisor doesn't allow me to post our experimental data on the Internet.

BvU · Feb 28, 2019

Makes sense.
Normally, fitting a single peak of known shape on a smooth background can yield good results even with less abundant data, e.g.

picture from https://root.cern.ch/root/html/guides/primer/ROOTPrimer.pdf p. 43

(but that turns out to be fake data too, and with way too many digits, though that's because it's program output, not for publication)

gleem · Feb 28, 2019

Leonid, what is the nature of the data that you want to test?

Leonid92 · Feb 28, 2019

gleem said:

Leonid, what is the nature of the data that you want to test?

My experimental data are magnetic resonance spectra.

BvU · Feb 28, 2019

Plenty of literature on the analysis of those! Google is your friend!
Any specific questions that might be answered without visuals?
Are you sure you want to fit Gaussians and not Lorentzians?

Leonid92 · Mar 1, 2019

BvU said:

Plenty of literature on the analysis of those! Google is your friend!
Any specific questions that might be answered without visuals?
Are you sure you want to fit Gaussians and not Lorentzians?

Yes, I need Lorentzians. Thank you for the link!
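
For completeness, a Lorentzian can be written as a function handle in the same style as the Gaussian in the first post. This is only one common parameterization, assumed here for illustration: p(1) is the peak height, p(2) the centre and p(3) the half width at half maximum. The name `lorentz` and the test values are hypothetical.

```matlab
% Sketch: Lorentzian line shape as a model function handle.
% ASSUMED parameterization: p(1) = peak height, p(2) = centre,
% p(3) = half width at half maximum (HWHM).
lorentz = @(p,w) p(1) ./ (1 + ((w - p(2))./p(3)).^2);

% Quick sanity check with illustrative values: one HWHM away from the
% centre the curve has dropped to half the peak height.
pTest    = [2, 10, 3];
atCentre = lorentz(pTest, 10);   % peak height
atHalf   = lorentz(pTest, 13);   % centre + HWHM
fprintf('peak = %.2f, at centre + HWHM = %.2f\n', atCentre, atHalf);
```

The handle has the same (p, w) signature as fun in the first post, so it can be passed to lsqcurvefit in its place with a three-element initial guess.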
