Statistical weighting of data to improve fitting


Homework Help Overview

The discussion revolves around performing a weighted fit of a data set consisting of vectors (x, y) with a specified uncertainty in the y values. Participants are exploring the implications of using statistical weighting in the context of fitting data, particularly when all measurements have the same error.

Discussion Character

  • Exploratory, Assumption checking, Conceptual clarification

Approaches and Questions Raised

  • Participants discuss the calculation of weights based on the uncertainty in the y values and question the validity of using different weights when the error is uniform across all data points. There is also a consideration of how to identify outliers and their impact on the fitting process.

Discussion Status

The discussion is ongoing, with various perspectives on the necessity and method of applying weights to the data. Some participants suggest that using different weights may not be appropriate if the variances are equal, while others propose alternative fitting methods that could be more robust against outliers.

Contextual Notes

There are concerns regarding the precision of the error stated as 0.001, with participants questioning its validity and implications for the fitting process. Additionally, there is mention of the need to clarify the units used in calculations to avoid confusion.

roam

Homework Statement


I am trying to perform a weighted fit of a data set ##(x,y)## shown below. The only information I have are the two vectors ##x## and ##y## and the uncertainty present in the ##y## values (##=0.001##).

[Attached plot of the data set ##(x, y)##]

Homework Equations



The Attempt at a Solution


Statistical weighting of data is usually taken to be ##1/\sigma^2##, where ##\sigma^2## is the variance in the (normally distributed) error in the value of ##y## at the data point ##(x_i, y_i)##. In this case, what equation do I need to use to calculate the ##\sigma## to get the required weight for the corresponding ##y## value?

Any explanation would be greatly appreciated.
 
No need to use different weights if all points have the same error.
But is this really what you are asking?
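This point can be checked numerically. Below is a minimal sketch (the true line, the 20 ##x## values, and the random seed are all made up for illustration; only ##\sigma = 0.001## comes from the problem) showing that a weighted straight-line fit with equal weights ##1/\sigma^2## gives exactly the same line as the ordinary unweighted fit:

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = 0.001                                      # stated uncertainty, same for every y
x = np.linspace(0.0, 1.0, 20)                      # illustrative x values
y = 0.5 + 2.0 * x + rng.normal(0, sigma, x.size)   # synthetic "measurements"

# weighted least squares for y = a + b*x with weights w_i = 1/sigma_i^2
w = np.full_like(y, 1.0 / sigma**2)
A = np.column_stack([np.ones_like(x), x])          # design matrix
AtW = A.T * w                                      # A^T diag(w)
a, b = np.linalg.solve(AtW @ A, AtW @ y)           # normal equations

# with equal sigmas this coincides with the unweighted fit:
b_ols, a_ols = np.polyfit(x, y, 1)
```

The constant weight cancels out of the normal equations, which is why it changes nothing here.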
 
However, I find it hard to believe the error in the ##y## measurement is 0.001 -- neither 0.001 m (too coarse) nor 0.001 mm (from the deviations)

Check the errors in slope and intercept -- is the intercept significantly different from zero?
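For checking those, here is a sketch of the standard error formulas for the slope and intercept of an unweighted straight-line fit (the data below are synthetic and purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0.0, 5.0, 25)
y = 0.02 + 1.3 * x + rng.normal(0, 0.1, x.size)    # synthetic data

b, a = np.polyfit(x, y, 1)            # slope b, intercept a
r = y - (a + b * x)                   # residuals
n = x.size
s2 = np.sum(r**2) / (n - 2)           # residual variance (2 fitted parameters)
Sxx = np.sum((x - x.mean())**2)

b_err = np.sqrt(s2 / Sxx)                            # standard error on slope
a_err = np.sqrt(s2 * (1.0/n + x.mean()**2 / Sxx))    # standard error on intercept

t_intercept = a / a_err               # "is the intercept significantly nonzero?"
```

If ##|t|## for the intercept is small (say, below 2), the intercept is not significantly different from zero.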
 
BvU said:
No need to use different weights if all points have the same error.
But is this really what you are asking ?

Yes, I am required to use different weights, so the data would conform better to the model. We give less weight to the less precise measurements (e.g. outliers) and vice versa. I am wondering how you would calculate the required ##\sigma## and thus the weight? :confused:

They say the precision was 0.001 cm.
 
roam said:
Yes, I am required to use different weights, so the data would conform better to the model.

Well, the weights are ##1\over \sigma^2##, so be my guest.

roam said:
We give less weight to the less precise measurements (e.g. outliers) and vice versa.

You can't do that without cheating. Unless you have an external argument to reject a particular measurement (doors banging, earthquake, whatever) you have no moral right to degrade one measurement wrt another.

roam said:
I am wondering how you would calculate the required ##\sigma## and thus the weight? :confused:

##1\over \sigma^2##

roam said:
They say the precision was 0.001 cm.

'They say' means what? Is it 0.00001000 m exactly, or 0.00001 ##\pm## 0.000005 m?

Oh, and don't mix units: meters in your formula, cm for the error, mm along the axis, and a completely confusing ##\times 10^{-6}## in the blue above.

From the complete error treatment you get an internal error, which you can compare with the external error (##\sigma##) -- a chi-squared analysis.
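That comparison can be sketched as follows (the residuals below are synthetic stand-ins for what a real fit would leave over; the stated ##\sigma = 0.001## comes from the problem, everything else is made up). A reduced chi-squared near 1 means the stated error is consistent with the observed scatter:

```python
import numpy as np

rng = np.random.default_rng(2)
sigma_stated = 0.001            # the quoted measurement uncertainty
n = 50
# pretend these are the residuals left over after a straight-line fit
residuals = rng.normal(0, sigma_stated, n)

chi2 = np.sum((residuals / sigma_stated)**2)
chi2_red = chi2 / (n - 2)       # two fitted parameters (slope, intercept)

# external (scatter-based) error estimate, to compare with sigma_stated
sigma_external = residuals.std(ddof=2)
```

A reduced chi-squared much larger than 1 would mean the stated sigma is too small (or the model is wrong); much smaller than 1 would mean it is overstated.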
 
BvU said:
There are formulas for errors in linear least squares (we had a very thorough thread here)
Found this in older threads. From there:
BvU said:
Kirchner (Berkeley) gives a derivation and the expressions here
Reading Kirchner thoroughly is a good investment.
 
roam said:
Statistical weighting of data is usually taken to be ##1/\sigma^2##, where ##\sigma^2## is the variance in the (normally distributed) error in the value of ##y## at the data point ##(x_i, y_i)##. In this case, what equation do I need to use to calculate the ##\sigma## to get the required weight for the corresponding ##y## value?

If the underlying uncertainty model has equal (theoretical) variances, it would be wrong to use different weights at the different points. If you want to try to "improve" the fit, you need to be able to identify outliers (data points that, somehow, do not belong), and either omit them altogether or (as you say) give them different weights. More on this below.

Sometimes the least-squares fit can be overly sensitive to outliers, so an alternative method that is resistant to outliers and robust to departures from normality may be more useful. That is usually achieved by performing an L1 fit instead of a least-squares fit. That is, instead of minimizing the total squared error ##S_2 = \sum_i (a + b x_i - y_i)^2##, you minimize the total absolute error ##S_1 = \sum_i |a + b x_i - y_i|##. This problem is mathematically harder than the least-squares problem, but nowadays it is readily solved; it can be tackled either as (1) a linear programming problem, or (2) a convergent sequence of weighted least-squares problems (where the weights are adjusted between iterations until some convergence criterion is satisfied).

So one source of weights in a least-squares procedure is simply to mimic the least-absolute-deviations problem! It does not have much to do with "statistics"; it has more to do with the structure of the solution algorithms.

For more on L1 regression, see https://en.wikipedia.org/wiki/Least_absolute_deviations, and for its relation to weighted least squares, see https://en.wikipedia.org/wiki/Iteratively_reweighted_least_squares
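A minimal sketch of option (2), iteratively reweighted least squares driving a line fit toward the L1 solution (the data, the seed, and the deliberate outlier are all made up for illustration; the simple ##1/|r|## reweighting shown is only one common choice):

```python
import numpy as np

def l1_line_fit(x, y, n_iter=100, eps=1e-8):
    """Fit y ~ a + b*x by minimizing sum_i |a + b*x_i - y_i| via IRLS."""
    A = np.column_stack([np.ones_like(x), x])
    w = np.ones_like(y)                        # start from ordinary least squares
    for _ in range(n_iter):
        AtW = A.T * w                          # A^T diag(w)
        a, b = np.linalg.solve(AtW @ A, AtW @ y)
        r = y - (a + b * x)
        w = 1.0 / np.maximum(np.abs(r), eps)   # down-weight large residuals
    return a, b

rng = np.random.default_rng(3)
x = np.linspace(0.0, 10.0, 30)
y = 1.0 + 0.5 * x + rng.normal(0, 0.05, x.size)
y[5] += 5.0                                    # one gross outlier

a_l1, b_l1 = l1_line_fit(x, y)                 # barely affected by the outlier
b_ls, a_ls = np.polyfit(x, y, 1)               # least squares gets dragged off
```

The least-squares line is pulled noticeably toward the outlier, while the reweighted (L1) line essentially ignores it, which is exactly the robustness property described above.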
 
