Chi-squared fit with errors on both x and y

Discussion Overview

The discussion revolves around fitting a straight line to data points that have errors in both x and y coordinates. Participants explore methods for incorporating these errors into a fitting procedure, particularly focusing on the statistical validity of different approaches and the implications of mixed units in the calculations.

Discussion Character

  • Technical explanation
  • Debate/contested
  • Mathematical reasoning

Main Points Raised

  • One participant suggests modifying the chi-squared minimization formula to account for errors in both x and y by replacing ##\sigma_y^2## with ##\sigma_x^2+\sigma_y^2##.
  • Another participant counters that this approach is overly simplistic and references errors-in-variables models and regression dilution as important considerations.
  • Orthogonal distance regression is mentioned as a relevant method for handling errors in both variables.
  • Concerns are raised about the uniqueness and unbiased nature of solutions when using perpendicular distances for fitting.
  • One participant highlights the potential issue of inconsistent dimensions when minimizing a function that combines errors in x and y, using temperature and time as an example.
  • Standardization of variables is proposed as a potential solution to avoid issues with mixed units in the minimization process.
  • A later reply seeks clarification on the concept of standardization and its implications for unit consistency in the context of the discussion.

Areas of Agreement / Disagreement

Participants express differing views on the appropriate methods for fitting a line with errors in both coordinates, indicating that multiple competing approaches exist. The discussion remains unresolved regarding the best way to handle these errors statistically.

Contextual Notes

Participants note the importance of clearly defining the problem to obtain a statistically valid answer. There are unresolved concerns about the implications of mixed units in the minimization process and the need for a well-defined question to achieve a meaningful solution.

Malamala
Hello, I have some data points which have errors on both the x and y coordinates. I want to fit a straight line to them, but I am not sure how to take the error on x into account. Normally, when I have just the error on y, I minimize $$\sum\frac{(y_{pred}(x)-y_{measured}(x))^2}{\sigma_y^2}$$
Can I just replace ##\sigma_y^2## with ##\sigma_x^2+\sigma_y^2##? The errors on x and y are not correlated. Thank you!
 
It is also called orthogonal distance regression.
 
Dale said:
It is also called orthogonal distance regression.

Yes. You start with the obvious thing, a line y = mx + b, and you try to do a least-squares fit using the perpendicular distances between the points and the candidate line instead of the y-distances. The problem is that this doesn't always get you a unique unbiased solution.

That's why you need to specify what you are looking for very carefully.
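
A minimal sketch of the perpendicular-distance fit described above, assuming equal weights for every point (it ignores the per-point ##\sigma_x## and ##\sigma_y## entirely). The function name `orthogonal_fit` and the sample readings are hypothetical and not from the thread. It uses the closed-form slope of the unweighted orthogonal (total least squares) line, then refits the same data with x expressed in different units, as one concrete way the answer depends on how the question is posed.

```python
import math

def orthogonal_fit(x, y):
    """Fit y = m*x + b by minimizing the summed squared perpendicular distances."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    if s_xy == 0.0:
        # No covariance: the best line is horizontal, vertical, or not unique
        raise ValueError("s_xy == 0: the closed-form slope below does not apply")
    # Closed-form slope of the unweighted orthogonal (total least squares) line
    m = (s_yy - s_xx + math.hypot(s_yy - s_xx, 2.0 * s_xy)) / (2.0 * s_xy)
    b = y_bar - m * x_bar
    return m, b

# Hypothetical readings: x is time in seconds, y is temperature
t_sec = [0.0, 1.0, 2.0, 3.0]
temp = [0.0, 1.0, 1.0, 2.0]
m_sec, b_sec = orthogonal_fit(t_sec, temp)

# Same measurements with time expressed in minutes
t_min = [t / 60.0 for t in t_sec]
m_min, b_min = orthogonal_fit(t_min, temp)

# If the fit did not care about units, these two numbers would agree
print(m_sec * 60.0, m_min)
```

For an ordinary y-only least-squares fit the two printed slopes would be identical; for the perpendicular-distance fit they generally are not, because "perpendicular" depends on the relative scale of the two axes.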
 
Even though this appears to be a drive-by posting, I'll make one more comment.

If you minimize a function of ##\Delta y## only, it's clear what you are doing. If you minimize something like ##\Delta x^2 + \Delta y^2## it's not even guaranteed that you have a number with consistent dimensions: suppose y is temperature and x is time. What units would ##\Delta x^2 + \Delta y^2## even be in?

To get a well-defined answer, one needs to pose a much, much better defined question. And even then it may not exist.
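
To make the dimensional point concrete, here is a short worked sketch. It is a standard result rather than something stated by the posters, and the symbols ##x_i^*## (the unknown true abscissa of point ##i##), ##m## and ##b## are introduced here for illustration. With x in seconds and y in kelvin, ##\Delta x^2 + \Delta y^2## would add s² to K². Dividing each deviation by its own uncertainty first gives a dimensionless sum:

$$\chi^2 = \sum_i \left[\frac{(x_i - x_i^*)^2}{\sigma_{x,i}^2} + \frac{(y_i - m x_i^* - b)^2}{\sigma_{y,i}^2}\right]$$

Minimizing over each ##x_i^*##, assuming uncorrelated errors, leaves

$$\chi^2(m,b) = \sum_i \frac{(y_i - m x_i - b)^2}{\sigma_{y,i}^2 + m^2\sigma_{x,i}^2}$$

Since ##m## carries units of y/x, the term ##m\,\sigma_x## has the same units as ##\sigma_y## and the ratio is dimensionless. The plain sum ##\sigma_x^2+\sigma_y^2## only has consistent units when x and y are measured in the same units.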
 
Vanadium 50 said:
Even though this appears to be a drive-by posting, I'll make one more comment.

If you minimize a function of ##\Delta y## only, it's clear what you are doing. If you minimize something like ##\Delta x^2 + \Delta y^2## it's not even guaranteed that you have a number with consistent dimensions: suppose y is temperature and x is time. What units would ##\Delta x^2 + \Delta y^2## even be in?

To get a well-defined answer, one needs to pose a much, much better defined question. And even then it may not exist.
Maybe if you standardize your variables you can avoid the issue with units? I understand that is one of the reasons for standardization.
 
WWGD said:
Maybe if you standardize your variables you can avoid the issue with units? I understand that is one of the reasons for standardization.
What do you mean by this?
 
Malamala said:
What do you mean by this?
I was replying to @Vanadium 50 regarding his statement on mixed units in the expression ##\sqrt{\delta x^2 + \delta y^2}##. If you standardize your expression (assuming normality of the data, or otherwise), the resulting variable is unitless from algebra alone (you're dividing two expressions with the same units), so you avoid at least this issue of mixed units. Seems like something @Stephen Tashi may know about.
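
A small sketch of what standardization does to units, assuming the usual z-score definition (subtract the sample mean, divide by the sample standard deviation). The helper `standardize` and the time readings are hypothetical. The same readings expressed in seconds and in minutes should produce the same standardized values.

```python
import statistics

def standardize(values):
    """Return z-scores: (value - mean) / sample standard deviation."""
    mu = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mu) / sd for v in values]

# Hypothetical time readings in seconds, then the same readings in minutes
t_seconds = [30.0, 60.0, 120.0, 240.0]
t_minutes = [t / 60.0 for t in t_seconds]

z_seconds = standardize(t_seconds)
z_minutes = standardize(t_minutes)

# The unit conversion factor cancels in the ratio, so both lists should match
print(z_seconds)
print(z_minutes)
```

This removes the units, but it scales by the spread of the data rather than by the measurement uncertainties, so whether it is the right scaling still depends on how the fitting question is posed.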
 
