# Statistical weighting of data to improve fitting

## Homework Statement

I am trying to perform a weighted fit of a data set ##(x,y)## shown below. The only information I have are the two vectors ##x## and ##y## and the uncertainty present in the ##y## values (##=0.001##).

## The Attempt at a Solution

Statistical weighting of data is usually taken to be ##1/\sigma^2##, where ##\sigma^2## is the variance in the (normally distributed) error in the value of ##y## at the data point ##(x_i, y_i)##. In this case, what equation do I need to use to calculate the ##\sigma## to get the required weight for the corresponding ##y## value?

Any explanation would be greatly appreciated.

**BvU** (Homework Helper, 2019 Award):
No need to use different weights if all points have the same error.
But is this really what you are asking?
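BvU's point can be checked numerically: if every point carries the same uncertainty, the common weight ##1/\sigma^2## factors out of the normal equations and the weighted fit coincides with the ordinary unweighted fit. A quick sketch (the data here are invented for illustration, not taken from the thread):

```python
import numpy as np

# Synthetic straight-line data with a uniform 0.001 uncertainty in y
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 20)
y = 2.0 * x + 1.0 + rng.normal(0.0, 0.001, size=x.size)

sigma = 0.001                        # same uncertainty for every point
w = np.full(x.size, 1.0 / sigma**2)  # weights 1/sigma^2, all equal

# Weighted normal equations for y = a + b*x
A = np.vstack([np.ones_like(x), x]).T
coef_w = np.linalg.solve(A.T @ (w[:, None] * A), A.T @ (w * y))

# Ordinary (unweighted) least-squares fit
coef_u, *_ = np.linalg.lstsq(A, y, rcond=None)

# Equal weights cancel out: both fits give the same (a, b)
print(coef_w, coef_u)
```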

**BvU** (Homework Helper, 2019 Award):
However, I find it hard to believe the error in the ##y## measurement is 0.001 -- neither 0.001 m (too coarse) nor 0.001 mm (from the deviations)

Check the errors in slope and intercept -- is the intercept significantly different from zero?

> No need to use different weights if all points have the same error.
> But is this really what you are asking?

Yes, I am required to use different weights, so that the data conform better to the model. We give less weight to the less precise measurements (e.g. outliers) and vice versa. I am wondering how you would calculate the required ##\sigma## and thus the weight?

They say the precision was 0.001 cm.

**BvU** (Homework Helper, 2019 Award):
> Yes, I am required to use different weights, so the data would conform better to the model.

Well, the weights are ##1/\sigma^2##, so be my guest.

> We give less weight to the less precise measurements (e.g. outliers) and vice versa.

You can't do that without cheating. Unless you have an external argument to reject a particular measurement (doors banging, earthquake, whatever), you have no moral right to degrade one measurement with respect to another.

> I am wondering how you would calculate the required ##\sigma## and thus the weight?

##1/\sigma^2##

> They say the precision was 0.001 cm.

'They say' means what? Is it 0.00001000 m exactly, or 0.00001 ##\pm## 0.000005 m?

Oh, and don't mix meters in your formula, cm for the error, and mm along the axis, with a completely confusing ##\times 10^{-6}## in the blue above.

From the complete error treatment you get an internal error, which you can compare with the external error (##\sigma##) via a chi-squared analysis.
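The internal/external comparison BvU mentions can be sketched as follows: fit a line, express the residuals in units of the assumed ##\sigma##, and form the reduced chi-squared. A value near 1 means the assumed error is realistic; the residual scatter itself gives the external error estimate. (Illustrative data, my own choice of seed and parameters, not from the thread.)

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0.0, 10.0, 30)
sigma = 0.001                                    # assumed (internal) error in y
y = 0.5 * x + 0.2 + rng.normal(0.0, sigma, x.size)

# Straight-line fit y = a + b*x
A = np.vstack([np.ones_like(x), x]).T
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
residuals = y - A @ coef

# chi^2 = squared residuals in units of the assumed error;
# reduced chi^2 divides by the degrees of freedom (N points - 2 parameters)
chi2 = np.sum((residuals / sigma) ** 2)
chi2_red = chi2 / (x.size - 2)

# External (a posteriori) error estimate from the residual scatter
sigma_ext = np.sqrt(np.sum(residuals**2) / (x.size - 2))

print(chi2_red)           # near 1 when the assumed sigma is realistic
print(sigma_ext / sigma)  # ratio of external to internal error
```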

**BvU** (Homework Helper, 2019 Award):
There are formulas for errors in linear least squares (we had a very thorough thread here)
Found this in older threads. From there:
Kirchner (Berkeley) gives a derivation and the expressions here
Reading Kirchner thoroughly is a good investment.
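For reference, the standard expressions for the errors in slope and intercept of an unweighted straight-line fit (the kind of formulas derived in Kirchner's notes) can be sketched like this; the data below are invented purely to exercise the formulas:

```python
import numpy as np

rng = np.random.default_rng(2)
x = np.linspace(0.0, 5.0, 25)
y = 3.0 * x - 1.0 + rng.normal(0.0, 0.05, x.size)

n = x.size
xbar, ybar = x.mean(), y.mean()
Sxx = np.sum((x - xbar) ** 2)

b = np.sum((x - xbar) * (y - ybar)) / Sxx  # slope
a = ybar - b * xbar                        # intercept

# Residual standard deviation with N - 2 degrees of freedom
s = np.sqrt(np.sum((y - (a + b * x)) ** 2) / (n - 2))

sigma_b = s / np.sqrt(Sxx)                      # error in the slope
sigma_a = s * np.sqrt(1.0 / n + xbar**2 / Sxx)  # error in the intercept

print(b, sigma_b, a, sigma_a)
```

Comparing the intercept ##a## with its error ##\sigma_a## is exactly the "is the intercept significantly different from zero?" check suggested earlier in the thread.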

**Ray Vickson** (Homework Helper, Dearly Missed):

> ## Homework Statement
>
> I am trying to perform a weighted fit of a data set ##(x,y)## shown below. The only information I have are the two vectors ##x## and ##y## and the uncertainty present in the ##y## values (##=0.001##).
>
> ## The Attempt at a Solution
>
> Statistical weighting of data is usually taken to be ##1/\sigma^2##, where ##\sigma^2## is the variance in the (normally distributed) error in the value of ##y## at the data point ##(x_i, y_i)##. In this case, what equation do I need to use to calculate the ##\sigma## to get the required weight for the corresponding ##y## value?
>
> Any explanation would be greatly appreciated.
If the underlying uncertainty model has equal (theoretical) variances, it would be wrong to use different weights at the different points. If you want to try to "improve" the fit, you need to be able to identify outliers (data points that, somehow, do not belong), and either omit them altogether or (as you say) give them different weights. More on this below.

Sometimes the least-squares fit can be overly sensitive to outliers, so an alternative method that is resistant to outliers and robust to departures from normality may be more useful. That is usually achieved by performing an L1-fit instead of a least-squares fit. That is, instead of minimizing the total squared error ##S_2 = \sum_i (a + b x_i - y_i)^2##, you minimize the total absolute error ##S_1 = \sum_i |a + b x_i - y_i|##. This problem is mathematically harder than the least-squares problem, but nowadays it is readily solved; it can be tackled either as (1) a linear programming problem, or (2) a convergent sequence of weighted least-squares problems (where the weights are adjusted between iterations until some convergence criterion is satisfied).

So one source of weights in a least-squares procedure is simply to mimic the least-absolute-deviations problem! It really does not have much to do with "statistics", but more to do with the structure of solution algorithms.

For more on L1 regression, see
https://en.wikipedia.org/wiki/Least_absolute_deviations
and for its relation to weighted least-squares, see
https://en.wikipedia.org/wiki/Iteratively_reweighted_least_squares
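The iteratively reweighted least-squares route Ray describes can be sketched in a few lines. The specific update rule ##w_i = 1/|r_i|## (with a small epsilon guard against division by zero) and the fixed iteration count are my own illustrative choices; the outlier-contaminated data are invented for the demonstration:

```python
import numpy as np

def l1_line_fit(x, y, iters=50, eps=1e-8):
    """Fit y = a + b*x minimising sum |residual| via IRLS."""
    A = np.vstack([np.ones_like(x), x]).T
    w = np.ones_like(y)  # start from an ordinary least-squares fit
    coef = np.zeros(2)
    for _ in range(iters):
        # Weighted least-squares solve with the current weights
        coef = np.linalg.solve(A.T @ (w[:, None] * A), A.T @ (w * y))
        r = np.abs(y - A @ coef)
        w = 1.0 / np.maximum(r, eps)  # reweight: w_i = 1/|r_i|
    return coef

# Data with one gross outlier: the L1 fit largely ignores it,
# whereas an ordinary least-squares fit would be pulled toward it
rng = np.random.default_rng(3)
x = np.linspace(0.0, 10.0, 20)
y = 1.5 * x + 0.3 + rng.normal(0.0, 0.01, x.size)
y[5] += 5.0  # inject a single bad measurement

a, b = l1_line_fit(x, y)
print(a, b)  # close to the true (0.3, 1.5) despite the outlier
```

Note how the weights here are driven by the residuals themselves rather than by any measurement uncertainty, which is exactly Ray's point that this use of weighting is algorithmic rather than statistical.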
