Scaling covariance for an error calculation


Discussion Overview

The discussion revolves around the use of covariance scaling in error calculations during linear fitting of data, specifically in the context of fitting programs like Python's lmfit and scipy. Participants explore the implications of setting the scale_covar parameter to True or False, and how this affects the reported errors on fit parameters. The conversation touches on the appropriateness of these methods in experimental physics and the conventions surrounding error reporting in published papers.

Discussion Character

  • Debate/contested
  • Technical explanation
  • Conceptual clarification

Main Points Raised

  • One participant notes that setting scale_covar=True rescales the reported parameter errors as if the reduced chi-square were 1, while setting it to False computes the parameter errors directly from the stated y errors.
  • Another participant suggests that significant scaling indicates potential issues with the uncertainties in the data.
  • Some participants propose that scaling should be used when there is uncertainty about the correctness of the errors, while others argue that it may indicate a problem with the model or data.
  • Concerns are raised about the validity of using a straight line fit and whether the uncertainties on the data points are underestimated.
  • A participant shares their specific data and fitting results, highlighting discrepancies in error reporting based on the scaling parameter.
  • Discussion includes references to Anscombe's quartet as an example of how different datasets can yield identical straight-line fits even when the fit is the wrong approach, emphasizing the need for careful consideration of the fitting method.
  • Some participants recommend being conservative and reporting larger uncertainties if there is doubt about the fit's validity.

Areas of Agreement / Disagreement

Participants express differing views on when to use covariance scaling, with no consensus reached on a definitive guideline. Some agree that significant scaling suggests issues with the data or model, while others remain uncertain about the appropriate conditions for applying scaling.

Contextual Notes

Participants mention limitations in their understanding of when to apply scaling and the potential for errors in their data points or fitting models. The discussion reflects a lack of clarity on established conventions in the field regarding error reporting in fits.

Who May Find This Useful

This discussion may be useful for researchers and practitioners in experimental physics, data analysis, and statistical modeling who are grappling with error calculations and fitting methodologies.

Malamala said:
Hello! I just discovered (maybe a bit late) that most fitting programs (Python's lmfit or scipy, for example) have a parameter, turned on by default, that allows scaling of the covariance matrix when calculating the errors (usually called scale_covar or something similar). After some reading I figured out (hopefully correctly) that setting that parameter on (scale_covar=True) basically means rescaling the errors on the data until the reduced chi-square equals 1, and reporting the errors on the parameters using these adjusted values. I have noticed that, when doing so, scaling all the y errors by the same amount leaves the errors on the fit parameters unchanged. On the other hand, if I set the parameter off (scale_covar=False), scaling the errors on y changes the errors on the fit parameters too.
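[Moderator-style aside: this behaviour is easy to check with scipy, which exposes the same choice through curve_fit's absolute_sigma flag, with the opposite sense to lmfit's scale_covar (absolute_sigma=True corresponds to scale_covar=False). A minimal sketch with made-up data:]

```python
import numpy as np
from scipy.optimize import curve_fit

def line(x, a, b):
    return a + b * x

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([0.1, 1.2, 1.9, 3.2, 3.9])
yerr = np.array([0.2, 0.2, 0.2, 0.2, 0.2])

# absolute_sigma=False (the default): the covariance is rescaled so that
# chi^2/ndf = 1, so multiplying all y errors by a constant leaves the
# reported parameter errors unchanged.
_, cov_scaled = curve_fit(line, x, y, sigma=yerr, absolute_sigma=False)
_, cov_scaled2 = curve_fit(line, x, y, sigma=3 * yerr, absolute_sigma=False)

# absolute_sigma=True: the y errors are taken at face value, so scaling
# them scales the parameter errors too.
_, cov_abs = curve_fit(line, x, y, sigma=yerr, absolute_sigma=True)
_, cov_abs2 = curve_fit(line, x, y, sigma=3 * yerr, absolute_sigma=True)

print(np.sqrt(np.diag(cov_scaled)))   # identical to cov_scaled2's errors
print(np.sqrt(np.diag(cov_abs2)) / np.sqrt(np.diag(cov_abs)))  # ratio ~ [3, 3]
```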

In my case I need to do a linear fit to some data. If I use scale_covar=True (which is the default) I get something around (ignoring decimals) ##25 \pm 1##. If I set it to False I get half the error, ##25 \pm 0.5##. I am quite confident about the errors on my points and the fit looks good. Which value should I report? And in general, for any fit, when should I set this parameter to True and when to False?

Lastly, I don't remember ever reading an experimental physics paper that says which of these two methods was used to get the errors from a fit; they just state the errors on the parameters. Is there a generally agreed-upon way of doing this, such that everyone sets that parameter (in their fitting program) to True or False? And if so, what is the convention?

In principle, I would like to know: if I were to publish my data in a journal (say PRL) without discussing this covariance-scaling business (as no one seems to), should I report (in my linear-fit case) 0.5 or 1 for the error?

Thank you!
 
If that scaling is significant something went wrong with your uncertainties.
The particle data group scales up the uncertainty if the measurements are incompatible (not sure what exactly their threshold is) and points out that it did so in the few cases where this is necessary.
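[The PDG-style scale factor mentioned here can be sketched in a few lines. The actual PDG procedure has more ingredients (e.g. which measurements enter the average), so this is only the core idea, with made-up numbers: compute the weighted average, then inflate its error by ##S = \sqrt{\chi^2/(N-1)}## when ##S > 1##:]

```python
import math

def pdg_average(values, errors):
    """Weighted average with a PDG-style scale factor S = sqrt(chi2/(N-1))."""
    w = [1.0 / e**2 for e in errors]
    mean = sum(wi * v for wi, v in zip(w, values)) / sum(w)
    err = 1.0 / math.sqrt(sum(w))
    chi2 = sum(((v - mean) / e)**2 for v, e in zip(values, errors))
    s = math.sqrt(chi2 / (len(values) - 1))
    if s > 1:  # inflate only when the measurements scatter more than expected
        err *= s
    return mean, err, s

# Three mutually incompatible measurements of the same quantity:
mean, err, s = pdg_average([10.1, 10.9, 9.2], [0.2, 0.2, 0.2])
print(mean, err, s)  # the scale factor is large, so err >> 0.2/sqrt(3)
```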
 
mfb said:
If that scaling is significant something went wrong with your uncertainties.
The particle data group scales up the uncertainty if the measurements are incompatible (not sure what exactly their threshold is) and points out that it did so in the few cases where this is necessary.
Thank you for your reply. So should I use scaling when I am not sure that my errors are correct? But in my case I am not sure what would be incompatible with what: I just have 5 data points with errors on them, and I want to fit a straight line to them. I don't have two sets of measurements to compare, so I can't say that something is incompatible. Regardless of that, could you please explain when one should use scaling and when not? I am not sure I understand that. Thank you!
 
Maybe a straight line is a bad assumption or you underestimate your uncertainties on these points.
If in doubt be conservative and give the larger uncertainty.
 
mfb said:
Maybe a straight line is a bad assumption or you underestimate your uncertainties on these points.
If in doubt be conservative and give the larger uncertainty.
But I am still not sure I understand when I should use scaling and when not. Could you explain that a bit (or point me towards some readings)? Thank you!
 
If it is significant you should first try to find errors in your data points or in the model you use to describe them. You might be fitting nonsense; if you are not, the scaling shouldn't be a factor of 2.
Check Anscombe's quartet: four datasets that all give the same straight-line fit, but in three of the cases that fit is clearly the wrong approach. Here are more examples.
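[The quartet is easy to check numerically. With the standard Anscombe values, an unweighted least-squares fit returns (to two decimals) the same intercept ##\approx 3.00## and slope ##\approx 0.50## for all four sets, even though a plot shows only the first is sensibly described by a line:]

```python
def ols_line(x, y):
    """Unweighted least-squares intercept and slope for y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) \
        / sum((xi - mx)**2 for xi in x)
    return my - b * mx, b

x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]   # sets 1-3 share these x
x4   = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]        # set 4 has its own x
ys = [
    [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68],
    [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74],
    [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73],
    [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89],
]
for i, y in enumerate(ys):
    a, b = ols_line(x4 if i == 3 else x123, y)
    print(f"set {i + 1}: intercept {a:.2f}, slope {b:.2f}")
```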

If you can't find any reason why your ##\chi^2/ndf## is so bad then it might be acceptable to use the scaled up uncertainties to get something, but you should discuss this explicitly because it means something went wrong somewhere.
 
mfb said:
If it is significant you should first try to find errors in your data points or in the model you use to describe them. You might be fitting nonsense; if you are not, the scaling shouldn't be a factor of 2.
Check Anscombe's quartet: four datasets that all give the same straight-line fit, but in three of the cases that fit is clearly the wrong approach. Here are more examples.

If you can't find any reason why your ##\chi^2/ndf## is so bad then it might be acceptable to use the scaled up uncertainties to get something, but you should discuss this explicitly because it means something went wrong somewhere.
Thank you again for your reply! Here is my actual data that I am trying to fit:
##x = [-0.312, -0.217, -0.081, 0., 0.211]##
##y = [-8.050, -5.278 , -3.510, 0., 5.521]##
##y_{err} = [0.121, 0.218, 0.421, 0.115, 0.305]##

I also attached the plot with the fit I get. (I am using the lmfit package in python). When I use scaled covariance I am getting these parameters for the fit:
line1intercept: -0.02108874 +/- 0.20201635
line1slope: 25.6603677 +/- 0.95263756
When I don't use scaled covariance I am getting this:
line1intercept: -0.02108874 +/- 0.09520060
line1slope: 25.6603677 +/- 0.44893231

The reduced chi-square is 4.50291419. Is there something I am doing wrong? To me the fit looks pretty good (and there are theoretical motivations for a straight-line fit, too). Thank you again for the help.

[Attached plot of the data with the linear fit: Screen Shot 2019-12-16 at 02.35.22.png]
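[For a straight line the weighted fit has a closed form, so these numbers can be checked by hand. This sketch, in plain Python with the standard weighted-least-squares formulas, reproduces the quoted parameters and shows that the two error estimates differ by exactly ##\sqrt{\chi^2/ndf} \approx \sqrt{4.50} \approx 2.12##, which is why the slope error roughly doubles:]

```python
import math

x    = [-0.312, -0.217, -0.081, 0.0, 0.211]
y    = [-8.050, -5.278, -3.510, 0.0, 5.521]
yerr = [0.121, 0.218, 0.421, 0.115, 0.305]

# Closed-form weighted least squares for y = a + b*x.
w   = [1.0 / e**2 for e in yerr]
S   = sum(w)
Sx  = sum(wi * xi for wi, xi in zip(w, x))
Sy  = sum(wi * yi for wi, yi in zip(w, y))
Sxx = sum(wi * xi**2 for wi, xi in zip(w, x))
Sxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
delta = S * Sxx - Sx**2

b  = (S * Sxy - Sx * Sy) / delta    # slope
a  = (Sxx * Sy - Sx * Sxy) / delta  # intercept
db = math.sqrt(S / delta)           # slope error, unscaled (scale_covar=False)
da = math.sqrt(Sxx / delta)         # intercept error, unscaled

chi2 = sum(((yi - a - b * xi) / e)**2 for xi, yi, e in zip(x, y, yerr))
chi2_red = chi2 / (len(x) - 2)      # two fitted parameters

print(f"slope = {b:.4f} +/- {db:.4f} (unscaled) "
      f"or +/- {db * math.sqrt(chi2_red):.4f} (scaled)")
print(f"reduced chi-square = {chi2_red:.2f}")
```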
 
The central data point is a bit over 3 standard deviations away from the fit. If you think this is a good data point and nothing went wrong with it use the scaled up uncertainties. Most likely something went wrong there, so it is better to be conservative.
 
mfb said:
The central data point is a bit over 3 standard deviations away from the fit. If you think this is a good data point and nothing went wrong with it use the scaled up uncertainties. Most likely something went wrong there, so it is better to be conservative.
Thank you for this! I dug a bit deeper into my problem, specifically into where my errors come from. I have a counting measurement and I fit the points with a curve. From there I extract the centers of the peaks for different measurements, and the plot I showed previously shows the differences between such centers. So the errors there come from the errors on the peak centers, i.e. from the errors given by the fitting program (the same as before, lmfit). I attach such a counting fit below. I have noticed that I have exactly the same problem here as before: scaling my errors gives me twice the errors on the parameters (mainly the peak center) compared to when I don't scale them. So the issue I mentioned above is, I think, just a propagation of the issue from here. In the plot below, the errors are just Poisson, i.e. the square root of the number of counts. The fit is also motivated on theoretical grounds. Do you know why I have this problem here in the first place, when in principle both the fit and the errors should be right? Thank you so much for your help!

[Attached plot of the counting data with the peak fit: Screen Shot 2019-12-16 at 20.22.42.png]
 
Your uncertainties are clearly smaller than the spread of the measurements: something makes them spread by more than the square root of the counts. In addition, the fit doesn't do a good job on the two larger peaks, and it overestimates the flat region to the right of the last peak.
 
