Weighting data based on the errors
  • #1
kelly0303
Hello! I have some data (counts) with a Poisson error associated with it, and I want to fit the data. I am trying to weight the data inversely proportionally to the errors, so that data points with large errors matter less to the fit. However, using the error on its own doesn't seem right. If I have a point with a value ##100 \pm 10## and another one with the value ##10000 \pm 100##, the first one has a smaller error, but the second one should (I think) be more important for the fit, as its relative error is much smaller. So, should I weight each data point by the inverse of its percentage error, i.e. the first point would have a weight of ##10## while the other a weight of ##100##? Is this the right way to do it? Thank you!
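To make the two options concrete: in a standard chi-square fit, each residual is divided by its absolute error ##\sigma_i## (for Poisson counts, ##\sigma_i = \sqrt{N_i}##), not by the relative error. A small sketch of this, using a made-up straight-line model rather than the Voigt profile from the thread (all numbers are illustrative only):

```python
import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(0)

# Hypothetical model: a straight line in count space.
def line(x, a, b):
    return a * x + b

x = np.linspace(1, 10, 20)
true = line(x, 50.0, 100.0)
counts = rng.poisson(true).astype(float)   # simulated Poisson counts
sigma = np.sqrt(counts)                    # Poisson error bar on each point

# Standard chi-square weighting: curve_fit divides each residual by its
# ABSOLUTE error sigma_i, which is equivalent to a residual weight 1/sigma_i.
popt, pcov = curve_fit(line, x, counts, sigma=sigma, absolute_sigma=True)
perr = np.sqrt(np.diag(pcov))              # 1-sigma parameter uncertainties
```

Note that with Poisson errors the absolute weighting already does what the question is after: the point ##100 \pm 10## enters with residual weight 1/10 and the point ##10000 \pm 100## with 1/100, which is the statistically correct treatment even though the second point has the smaller relative error.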
 
  • #2
Maximum Likelihood Estimation is an approach that may accomplish what you want to do. With the scant details you have given, I can't say whether it would be the best possible approach.
 
  • #3
tnich said:
Maximum Likelihood Estimation is an approach that may accomplish what you want to do. With the scant details you have given, I can't say whether it would be the best possible approach.
Thank you for your reply! To give more details: I have some data points to which I want to fit a Voigt profile + background. For each data point I have the energy, the counts, and the error (square root of the number of counts). In principle, I am just using an already built Python fitting module (lmfit), and one of its options is "weights". I assumed that I have to use the errors as the weights in the fit (1/errors); I am just not sure whether I should use the absolute or the relative error (or something else entirely).
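For lmfit specifically, the `weights` array multiplies the residual (data minus model), so passing `weights = 1/err` with the absolute errors reproduces the usual chi-square fit. A rough equivalent of the Voigt-plus-constant-background fit written with scipy only (the peak parameters, background level, and data below are made up for illustration, not the thread's actual data):

```python
import numpy as np
from scipy.special import wofz
from scipy.optimize import curve_fit

def voigt(x, amp, center, sigma, gamma):
    # Voigt profile via the Faddeeva function wofz:
    # V(x) = Re[w(z)] / (sigma * sqrt(2*pi)), z = (x - x0 + i*gamma) / (sigma*sqrt(2))
    z = ((x - center) + 1j * gamma) / (sigma * np.sqrt(2.0))
    return amp * np.real(wofz(z)) / (sigma * np.sqrt(2.0 * np.pi))

def model(x, amp, center, sigma, gamma, bg):
    # Voigt peak on top of a flat background
    return voigt(x, amp, center, sigma, gamma) + bg

rng = np.random.default_rng(1)
energy = np.linspace(-10, 10, 200)
true = model(energy, 5000.0, 0.0, 1.0, 1.0, 50.0)
counts = rng.poisson(true).astype(float)
err = np.sqrt(np.maximum(counts, 1.0))     # guard against zero error bars

# sigma=err with absolute_sigma=True is the same weighting lmfit applies
# when given weights = 1/err.
p0 = [4000.0, 0.5, 1.5, 0.5, 40.0]
popt, pcov = curve_fit(
    model, energy, counts, p0=p0, sigma=err, absolute_sigma=True,
    bounds=([0.0, -5.0, 0.01, 0.0, 0.0], [1e5, 5.0, 5.0, 5.0, 500.0]),
)
```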
 
  • #4
I don't know what algorithm lmfit uses, or how it uses the weight data, so I don't really know how to answer your question. Since the only error information you have is derived solely from your count data, it would not provide any more information to the fit algorithm than the count data would by itself. I would be tempted not to enter weight data in that case.

I can tell you that the data with low counts is the important data for fitting a pdf curve, especially when it is a Voigt profile. In that case you don't know whether you have a Cauchy distribution, a Gaussian distribution, or some combination of the two. So estimating the tails of the distribution is the important aspect, and that depends on fitting the experimental data you have for the tails. Your fit algorithm may take that into account or it may not.
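The point about the tails can be seen numerically: far from the center, a Lorentzian (Cauchy) profile is enormously higher than a Gaussian of comparable width, so it is the tail region that tells the two components of a Voigt profile apart. A quick check, with widths chosen arbitrarily:

```python
import numpy as np

# Unit-area Gaussian and Lorentzian with comparable widths,
# evaluated ten widths away from the peak center.
x = 10.0
sigma, gamma = 1.0, 1.0
gauss = np.exp(-x**2 / (2.0 * sigma**2)) / (sigma * np.sqrt(2.0 * np.pi))
lorentz = gamma / (np.pi * (x**2 + gamma**2))

# The Lorentzian tail is larger by many orders of magnitude,
# which is why the low-count tail data constrains the line shape.
ratio = lorentz / gauss
```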
 
  • #5
tnich said:
I don't know what algorithm lmfit uses, or how it uses the weight data, so I don't really know how to answer your question. Since the only error information you have is derived solely from your count data, it would not provide any more information to the fit algorithm than the count data would by itself. I would be tempted not to enter weight data in that case.

I can tell you that the data with low counts is the important data for fitting a pdf curve, especially when it is a Voigt profile. In that case you don't know whether you have a Cauchy distribution, a Gaussian distribution, or some combination of the two. So estimating the tails of the distribution is the important aspect, and that depends on fitting the experimental data you have for the tails. Your fit algorithm may take that into account or it may not.
Thank you for your reply. So you are saying that fitting the background right (i.e. the tail) is more important than the peak itself for a Voigt profile? If that's the case, then using the inverse error as the weight will help; I had just thought it made more sense to focus on fitting the peak (and extracting its parameters) rather than the background.
 

What is weighting data based on the errors?

Weighting data based on the errors is a statistical technique that adjusts the influence of individual data points in an analysis according to their level of uncertainty. Points with large error bars contribute less to the result, so the noisiest measurements do not dominate the fit.

Why is weighting data based on the errors important?

Weighting data based on the errors is important because it allows for a more accurate representation of the data by taking into account the degree of uncertainty associated with each data point. This can lead to more reliable and robust conclusions from the analysis.

How is weighting data based on the errors calculated?

The calculation depends on the specific statistical method being used. In weighted least squares, for example, each data point's squared residual is multiplied by a weight, commonly ##w_i = 1/\sigma_i^2## where ##\sigma_i## is the point's standard deviation, so that the more reliable data points constrain the fit more strongly.
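As a concrete instance of that recipe: when averaging repeated measurements of a single quantity, inverse-variance weights ##w_i = 1/\sigma_i^2## give the minimum-variance combined estimate. A minimal sketch with made-up numbers:

```python
import numpy as np

# Two hypothetical measurements of the same quantity with different error bars.
values = np.array([98.0, 102.0])
errors = np.array([10.0, 5.0])

# Inverse-variance weights: w_i = 1 / sigma_i^2
w = 1.0 / errors**2

# Weighted mean and its propagated uncertainty.
weighted_mean = np.sum(w * values) / np.sum(w)
mean_err = 1.0 / np.sqrt(np.sum(w))
```

The more precise measurement (102.0 with error 5.0) carries four times the weight of the other, so the combined value lands closer to it, and the combined uncertainty is smaller than either individual error bar.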

What are the benefits of weighting data based on the errors?

Weighting data based on the errors can lead to more accurate and precise results in statistical analyses, and it reduces the influence of noisy points that could otherwise skew the result. It also allows the measurement uncertainties to propagate into the uncertainties of the fitted parameters.

Are there any limitations to weighting data based on the errors?

While weighting data based on the errors can improve the accuracy of an analysis, it is not a perfect solution. It relies on accurate estimates of the uncertainty associated with each data point, which may not always be available. Additionally, weighting data can also introduce some bias into the analysis, so it is important to carefully consider the appropriate weighting method for each dataset.
