Statistical weighting of data to improve fitting

  • #1
roam

Homework Statement


I am trying to perform a weighted fit of the data set ##(x,y)## shown below. The only information I have is the two vectors ##x## and ##y## and the uncertainty in the ##y## values (##0.001##).

[Attachment: plt.png]

Homework Equations



The Attempt at a Solution


Statistical weighting of data is usually taken to be ##1/\sigma^2##, where ##\sigma^2## is the variance in the (normally distributed) error in the value of ##y## at the data point ##(x_i, y_i)##. In this case, what equation do I need to use to calculate the ##\sigma## to get the required weight for the corresponding ##y## value?

Any explanation would be greatly appreciated.
 
  • #2
No need to use different weights if all points have the same error.
But is this really what you are asking ?
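
To illustrate the point about equal errors, here is a minimal sketch of a weighted straight-line fit with weights ##1/\sigma^2##. The function name `weighted_line_fit` and the data are hypothetical (they are not the values from this thread); the fit formulas are the standard closed-form ones for ##y = a + bx##.

```python
def weighted_line_fit(x, y, sigma):
    """Fit y = a + b*x by minimizing sum(((y_i - a - b*x_i) / sigma_i)**2)."""
    w = [1.0 / s**2 for s in sigma]  # statistical weights 1/sigma^2
    S = sum(w)
    Sx = sum(wi * xi for wi, xi in zip(w, x))
    Sy = sum(wi * yi for wi, yi in zip(w, y))
    Sxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    Sxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    delta = S * Sxx - Sx**2
    a = (Sxx * Sy - Sx * Sxy) / delta  # intercept
    b = (S * Sxy - Sx * Sy) / delta    # slope
    return a, b

# Hypothetical data, for illustration only
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [0.1, 1.9, 4.2, 5.8, 8.1]

# Same sigma on every point: the common factor 1/sigma^2 cancels
a1, b1 = weighted_line_fit(x, y, [0.001] * len(x))
a2, b2 = weighted_line_fit(x, y, [1.0] * len(x))
print(a1, b1)
print(a2, b2)
```

Both calls should give the same intercept and slope up to rounding, because a common ##\sigma## multiplies every sum by the same factor and drops out of ##a## and ##b##.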
 
  • #3
However, I find it hard to believe the error in the ##y## measurement is 0.001 -- neither 0.001 m (too coarse) nor 0.001 mm (from the deviations)

Check the errors in slope and intercept -- is the intercept significantly different from zero ?
 
  • #4
BvU said:
No need to use different weights if all points have the same error.
But is this really what you are asking ?

Yes, I am required to use different weights, so the data would conform better to the model. We give less weight to the less precise measurements (e.g. outliers) and vice versa. I am wondering how you would calculate the required ##\sigma## and thus the weight? :confused:

They say the precision was 0.001 cm.
 
  • #5
roam said:
Yes, I am required to use different weights, so the data would conform better to the model.

Well, the weights are ##1/\sigma^2##, so be my guest.

roam said:
We give less weight to the less precise measurements (e.g. outliers) and vice versa.

You can't do that without cheating. Unless you have an external argument to reject a particular measurement (doors banging, earthquake, whatever) you have no moral right to degrade one measurement wrt another.

roam said:
I am wondering how you would calculate the required ##\sigma## and thus the weight? :confused:

##1/\sigma^2##

roam said:
They say the precision was 0.001 cm.

'They say' means what? Is it 0.00001000 m, or 0.00001 ##\pm## 0.000005 m?

Oh, and don't mix units: you have meters in your formula, cm for the error, mm along the axis, and a completely confusing ##\times 10^{-6}## in the blue above.

From the complete error treatment you get an internal error that you can compare with the external error (##\sigma##). This is a chi-squared analysis.
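
As a rough sketch of such a comparison (one reading of the suggestion above, not necessarily the exact procedure intended), the scatter of the residuals about the fitted line can be set against the quoted ##\sigma## through the reduced chi-squared. The function names and the data are hypothetical; only ##\sigma = 0.001## is taken from the thread.

```python
import math

def line_fit(x, y):
    """Unweighted least-squares fit of y = a + b*x."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxx = sum(xi * xi for xi in x)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    delta = n * sxx - sx**2
    return (sxx * sy - sx * sxy) / delta, (n * sxy - sx * sy) / delta

def scatter_and_chi2(x, y, sigma):
    """Return (residual scatter, reduced chi-squared) for a common sigma."""
    a, b = line_fit(x, y)
    residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    dof = len(x) - 2                       # two fitted parameters
    ss = sum(r * r for r in residuals)
    s_internal = math.sqrt(ss / dof)       # error estimated from the scatter
    chi2_reduced = ss / (sigma**2 * dof)   # about 1 if sigma matches the scatter
    return s_internal, chi2_reduced

# Hypothetical data, for illustration only
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [0.1, 1.9, 4.2, 5.8, 8.1]
s_internal, chi2_reduced = scatter_and_chi2(x, y, sigma=0.001)
print(s_internal, chi2_reduced)
```

A reduced chi-squared near 1 means the quoted ##\sigma## is consistent with the observed scatter. A value far above 1, as with these made-up numbers, suggests that the quoted ##\sigma## is too small or that the straight-line model is inadequate.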
 
  • #6
BvU said:
There are formulas for errors in linear least squares (we had a very thorough thread here)
Found this in older threads. From there:
BvU said:
Kirchner (Berkeley) gives a derivation and the expressions here
Reading Kirchner thoroughly is a good investment.
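
For orientation, these are the standard textbook expressions for a weighted straight-line fit ##y = a + bx## with independent errors ##\sigma_i## in ##y## only. The notation (##S##, ##S_x##, ##\Delta##, ...) is chosen here for illustration and is not necessarily the one Kirchner uses.

```latex
w_i = \frac{1}{\sigma_i^2}, \qquad
S = \sum_i w_i, \quad
S_x = \sum_i w_i x_i, \quad
S_y = \sum_i w_i y_i, \quad
S_{xx} = \sum_i w_i x_i^2, \quad
S_{xy} = \sum_i w_i x_i y_i

\Delta = S\,S_{xx} - S_x^2

a = \frac{S_{xx} S_y - S_x S_{xy}}{\Delta}, \qquad
b = \frac{S\,S_{xy} - S_x S_y}{\Delta}

\sigma_a^2 = \frac{S_{xx}}{\Delta}, \qquad
\sigma_b^2 = \frac{S}{\Delta}
```

If all ##\sigma_i## are equal, the common factor ##1/\sigma^2## cancels in ##a## and ##b##, so the weights only affect the quoted errors ##\sigma_a## and ##\sigma_b##.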
 
  • #7
If the underlying uncertainty model has equal (theoretical) variances, it would be wrong to use different weights at the different points. If you want to try to "improve" the fit, you need to be able to identify outliers (data points that, somehow, do not belong), and either omit them altogether or (as you say) give them different weights. More on this below.

Sometimes the least-squares fit can be overly sensitive to outliers, so an alternative method that is resistant to outliers and robust to departures from normality may be more useful. One common way to achieve this is to perform an L1 fit instead of a least-squares fit. That is, instead of minimizing the total squared error ##S_2 = \sum_i (a + b x_i - y_i)^2## you minimize the total absolute error ##S_1 = \sum_i |a + b x_i - y_i|##. This problem is mathematically harder than the least-squares problem, but nowadays it is pretty readily solved; it can be tackled either as (1) a linear programming problem, or (2) a convergent sequence of weighted least-squares problems (where the weights are adjusted between iterations until some convergence criterion is satisfied).

So, we have that one source of weights in the least-squares procedure is to mimic the least-absolute problem! It really does not have much to do with "statistics", but more to do with the structure of solution algorithms.

For more on L1 regression, see, e.g., https://en.wikipedia.org/wiki/Least_absolute_deviations
and for its relation to weighted least-squares, see
https://en.wikipedia.org/wiki/Iteratively_reweighted_least_squares
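
Option (2) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a production routine: the function names, the fixed iteration count `n_iter`, the floor `eps` on the residuals and the data are all hypothetical choices. Each pass solves a weighted least-squares problem with weights ##w_i = 1/|r_i|##, where ##r_i## is the residual from the previous pass.

```python
def wls_line(x, y, w):
    """Weighted least-squares fit of y = a + b*x with weights w."""
    S = sum(w)
    Sx = sum(wi * xi for wi, xi in zip(w, x))
    Sy = sum(wi * yi for wi, yi in zip(w, y))
    Sxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    Sxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    delta = S * Sxx - Sx**2
    return (Sxx * Sy - Sx * Sxy) / delta, (S * Sxy - Sx * Sy) / delta

def l1_line_irls(x, y, n_iter=100, eps=1e-8):
    """Approximate the L1 fit by iteratively reweighted least squares."""
    w = [1.0] * len(x)  # first pass is an ordinary least-squares fit
    for _ in range(n_iter):
        a, b = wls_line(x, y, w)
        # w_i = 1/|r_i| turns w_i * r_i**2 into |r_i|; eps avoids division by zero
        w = [1.0 / max(abs(a + b * xi - yi), eps) for xi, yi in zip(x, y)]
    return a, b

# Hypothetical data: five points on y = 1 + 2x plus one gross outlier
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.0, 3.0, 5.0, 7.0, 9.0, 30.0]

a_ls, b_ls = wls_line(x, y, [1.0] * len(x))  # ordinary least squares
a_l1, b_l1 = l1_line_irls(x, y)              # L1 fit via reweighting
print(a_ls, b_ls)
print(a_l1, b_l1)
```

With this made-up data the least-squares line is pulled strongly toward the outlier, while the reweighted fit should settle close to the line through the five collinear points. Note that these weights come out of the algorithm, not from any knowledge of the measurement errors.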
 

What is statistical weighting of data?

Statistical weighting of data is a method used in data analysis to assign different levels of importance, or weight, to different data points. This allows for a more accurate representation of the data and can improve the fitting of models to the data.

Why is statistical weighting of data important?

Statistical weighting of data is important because it can help to reduce the impact of outliers or errors in the data, resulting in a more accurate representation of the underlying relationship between variables. It can also improve the accuracy of statistical models and predictions based on the data.

How is statistical weighting of data determined?

The determination of statistical weighting of data depends on the specific data and the analysis being conducted. In some cases, the weights may be based on the reliability or precision of the data, while in others, they may be based on the significance or importance of certain data points.

What are the different types of statistical weighting methods?

There are several different types of statistical weighting methods, including equal weighting, inverse-variance weighting, and proportional weighting. Equal weighting assigns the same weight to each data point, while inverse-variance weighting assigns higher weights to data points with smaller variances (##w_i = 1/\sigma_i^2##). Proportional weighting assigns weights based on the proportion of the data each point represents.
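
As a small illustration of inverse-variance weighting, the sketch below combines measurements of the same quantity into a weighted mean. The function name and the numbers are hypothetical, and independent errors are assumed.

```python
import math

def inverse_variance_mean(values, sigmas):
    """Weighted mean with weights 1/sigma^2, and its standard error."""
    w = [1.0 / s**2 for s in sigmas]
    mean = sum(wi * vi for wi, vi in zip(w, values)) / sum(w)
    error = 1.0 / math.sqrt(sum(w))  # valid for independent errors
    return mean, error

# Hypothetical measurements: the second is twice as uncertain as the first
mean, error = inverse_variance_mean([10.0, 12.0], [1.0, 2.0])
print(mean, error)
```

The more precise measurement dominates the result, and the combined error is smaller than either individual error.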

Can statistical weighting of data be used for any type of data?

Statistical weighting of data can be used for various types of data, including numerical data, categorical data, and time series data. However, the specific weighting method used may vary depending on the type of data and the analysis being conducted.
