Statistical weighting of data to improve fitting

roam · May 29, 2017

Homework Statement

I am trying to perform a weighted fit of a data set ##(x,y)## shown below. The only information I have is the two vectors ##x## and ##y## and the uncertainty in the ##y## values (##=0.001##).

Homework Equations

The Attempt at a Solution

Statistical weighting of data is usually taken to be ##1/\sigma^2##, where ##\sigma^2## is the variance in the (normally distributed) error in the value of ##y## at the data point ##(x_i, y_i)##. In this case, what equation do I need to use to calculate the ##\sigma## to get the required weight for the corresponding ##y## value?

Any explanation would be greatly appreciated.

BvU · May 29, 2017

No need to use different weights if all points have the same error.
But is this really what you are asking?
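To illustrate the point about equal errors, here is a minimal sketch, assuming a straight-line model ##y = a + bx## and made-up data (the values from the original attachment are not reproduced here). The function name `weighted_line_fit` is hypothetical. When every point has the same ##\sigma##, the common factor ##1/\sigma^2## cancels out of the solution, so the weighted and unweighted fits coincide.

```python
import numpy as np

def weighted_line_fit(x, y, sigma):
    """Fit y = a + b*x by minimizing sum(((y - a - b*x) / sigma)**2).

    sigma may be a scalar (same error everywhere) or one value per point.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Statistical weights w_i = 1 / sigma_i**2
    w = 1.0 / np.broadcast_to(np.asarray(sigma, dtype=float), x.shape) ** 2
    S, Sx, Sy = w.sum(), (w * x).sum(), (w * y).sum()
    Sxx, Sxy = (w * x * x).sum(), (w * x * y).sum()
    delta = S * Sxx - Sx ** 2
    a = (Sxx * Sy - Sx * Sxy) / delta   # intercept
    b = (S * Sxy - Sx * Sy) / delta     # slope
    return a, b

# Made-up illustrative data, NOT the data from the thread.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([0.1, 1.9, 4.2, 5.8, 8.1])

a_eq, b_eq = weighted_line_fit(x, y, 0.001)  # same sigma at every point
a_un, b_un = weighted_line_fit(x, y, 1.0)    # effectively unweighted
print(a_eq, b_eq)
print(a_un, b_un)
```

The two printed pairs agree to rounding error. Weights only change the answer when ##\sigma## differs from point to point.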

BvU · May 29, 2017

However, I find it hard to believe the error in the ##y## measurement is 0.001 -- neither 0.001 m (too coarse) nor 0.001 mm (too fine, judging from the deviations) seems plausible.

Check the errors in slope and intercept -- is the intercept significantly different from zero?
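One way to run that check, sketched under assumptions: a straight-line model, equal errors at all points, and the scatter estimated from the residuals rather than taken from the quoted 0.001. The data are made up and the function name `line_fit_with_errors` is hypothetical.

```python
import numpy as np

def line_fit_with_errors(x, y):
    """Unweighted fit of y = a + b*x with standard errors on a and b.

    The common variance is estimated from the residuals with n - 2
    degrees of freedom (two fitted parameters).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    A = np.column_stack([np.ones(n), x])           # design matrix [1, x]
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    s2 = resid @ resid / (n - 2)                   # estimated variance of y
    cov = s2 * np.linalg.inv(A.T @ A)              # parameter covariance
    a, b = coef
    sigma_a, sigma_b = np.sqrt(np.diag(cov))
    return a, b, sigma_a, sigma_b

# Made-up illustrative data, NOT the data from the thread.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([0.1, 1.9, 4.2, 5.8, 8.1])

a, b, sigma_a, sigma_b = line_fit_with_errors(x, y)
print(f"intercept = {a:.3f} +/- {sigma_a:.3f}")
print(f"slope     = {b:.3f} +/- {sigma_b:.3f}")
print(f"intercept / error = {a / sigma_a:.2f}")
```

A ratio of intercept to its error well below about 2 suggests the intercept is not significantly different from zero, which is the case for this made-up data set.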

roam · May 29, 2017

BvU said:

No need to use different weights if all points have the same error.
But is this really what you are asking ?

Yes, I am required to use different weights, so the data would conform better to the model. We give less weight to the less precise measurements (e.g. outliers) and vice versa. I am wondering how you would calculate the required ##\sigma## and thus the weight?

They say the precision was 0.001 cm.

BvU · May 29, 2017

roam said:

Yes, I am required to use different weights, so the data would conform better to the model.

Well the weights are ##1\over \sigma^2##, so be my guest.

We give less weight to the less precise measurements (e.g. outliers) and vice versa.

You can't do that without cheating. Unless you have an external argument to reject a particular measurement (doors banging, earthquake, whatever) you have no moral right to degrade one measurement wrt another.

I am wondering how you would calculate the required ##\sigma## and thus the weight?

##1\over \sigma^2##

They say the precision was 0.001 cm.

'They say' means what? Is it 0.00001000 m, or 0.00001 ##\pm## 0.000005 m?

Oh, and don't mix units: you have meters in your formula, cm for the error, mm along the axis, and a completely confusing x 10^-6 in the blue above.

From the complete error treatment you get an internal error that you can compare with the external error (##\sigma##). This comparison is a chi-squared analysis.
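A rough sketch of such a comparison, assuming a straight-line model and made-up data: the reduced chi-squared is the sum of squared residuals in units of the quoted ##\sigma##, divided by the number of degrees of freedom. A value near 1 means the scatter about the line is consistent with the quoted ##\sigma##. A value far above 1 means the quoted ##\sigma## is too small or the model is wrong. The function name `reduced_chi_squared` is hypothetical.

```python
import numpy as np

def reduced_chi_squared(x, y, sigma):
    """Reduced chi-squared of a straight-line fit y = a + b*x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    sigma = np.broadcast_to(np.asarray(sigma, dtype=float), x.shape)
    # np.polyfit expects weights of 1/sigma (not 1/sigma**2) for Gaussian errors.
    b, a = np.polyfit(x, y, 1, w=1.0 / sigma)
    resid = y - (a + b * x)
    dof = x.size - 2                     # two fitted parameters
    return np.sum((resid / sigma) ** 2) / dof

# Made-up illustrative data, NOT the data from the thread.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([0.1, 1.9, 4.2, 5.8, 8.1])

chi2_quoted = reduced_chi_squared(x, y, 0.001)  # using a quoted sigma of 0.001
print(chi2_quoted)                              # far above 1 for this data
```

For this made-up data a quoted ##\sigma## of 0.001 gives a reduced chi-squared in the tens of thousands, which is the kind of mismatch that should make you question the quoted uncertainty.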

BvU · May 29, 2017

BvU said:

There are formulas for errors in linear least squares (we had a very thorough thread here)

Found this in older threads. From there:

BvU said:

Kirchner (Berkeley) gives a derivation and the expressions here

Reading Kirchner thoroughly is a good investment.

Ray Vickson · May 29, 2017

roam said:

Homework Statement

I am trying to perform a weighted fit of a data set ##(x,y)## shown below. The only information I have is the two vectors ##x## and ##y## and the uncertainty in the ##y## values (##=0.001##).

View attachment 204459

Homework Equations

The Attempt at a Solution

Statistical weighting of data is usually taken to be ##1/\sigma^2##, where ##\sigma^2## is the variance in the (normally distributed) error in the value of ##y## at the data point ##(x_i, y_i)##. In this case, what equation do I need to use to calculate the ##\sigma## to get the required weight for the corresponding ##y## value?

Any explanation would be greatly appreciated.

If the underlying uncertainty model has equal (theoretical) variances, it would be wrong to use different weights at the different points. If you want to try to "improve" the fit, you need to be able to identify outliers (data points that, somehow, do not belong), and either omit them altogether or (as you say) give them different weights. More on this below.

Sometimes the least-squares fit can be overly sensitive to outliers, so an alternative method that is resistant to outliers and robust to departures from normality may be more useful. That is usually achieved by performing an L1-fit instead of a least-squares fit. That is, instead of minimizing the total squared error ##S_2 = \sum_i (a + b x_i - y_i)^2## you can minimize the total absolute error ##S_1 = \sum_i |a + b x_i - y_i|##. This problem is mathematically harder than the least-squares problem, but nowadays it is readily solved; it can be tackled either as (1) a linear programming problem; or (2) a convergent sequence of weighted least-squares problems (where the weights are adjusted between iterations until some convergence criterion is satisfied).

So, we have that one source of weights in the least-squares procedure is to mimic the least-absolute problem! It really does not have much to do with "statistics", but more to do with the structure of solution algorithms.
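Option (2) can be sketched as follows. This is a basic illustration of iteratively reweighted least squares for the L1 problem, not a production implementation. The function name `l1_line_fit_irls`, the floor `eps` on small residuals, and the stopping tolerance are assumptions made for this example, and the data are made up. At each step the weights are ##w_i = 1/|r_i|##, so that ##\sum_i w_i r_i^2## mimics ##\sum_i |r_i|##.

```python
import numpy as np

def l1_line_fit_irls(x, y, n_iter=100, eps=1e-8, tol=1e-10):
    """Approximate the L1 fit of y = a + b*x by reweighted least squares."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    A = np.column_stack([np.ones_like(x), x])
    coef = np.linalg.lstsq(A, y, rcond=None)[0]      # start from ordinary LS
    for _ in range(n_iter):
        resid = y - A @ coef
        w = 1.0 / np.maximum(np.abs(resid), eps)     # floor avoids division by zero
        Aw = A * w[:, None]
        new_coef = np.linalg.solve(A.T @ Aw, A.T @ (w * y))  # weighted normal equations
        converged = np.max(np.abs(new_coef - coef)) < tol
        coef = new_coef
        if converged:
            break
    return coef  # array [a, b]

# Made-up data: points exactly on y = 1 + 2x, plus one gross outlier.
x = np.arange(7, dtype=float)
y = 1.0 + 2.0 * x
y[3] += 10.0

b_ls, a_ls = np.polyfit(x, y, 1)     # ordinary least squares
a_l1, b_l1 = l1_line_fit_irls(x, y)  # L1 fit via IRLS
print("LS:", a_ls, b_ls)
print("L1:", a_l1, b_l1)
```

In this example the least-squares intercept is pulled upward by the outlier, while the L1 fit stays very close to the line through the other six points. Note that these weights come from the algorithm, not from any measurement uncertainty.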

For more on L1 regression, see, e.g., https://en.wikipedia.org/wiki/Least_absolute_deviations
and for its relation to weighted least-squares, see
https://en.wikipedia.org/wiki/Iteratively_reweighted_least_squares
