Linear Regression with Measurement Errors

This thread discusses finding a best-fit line for data with measurement errors in both variables. It covers the limitations of standard regression methods when the independent variable has significant errors, and mentions Deming regression and Principal Component Analysis as alternative methods. It also highlights computing the correlation coefficient and considering equal variances for the X and Y variables, and recommends the FITEXY routine in Numerical Recipes as a resource for this type of problem.
  • #1
Arishy Han
Hello,

I have a set of data in two columns, and each datum has its own measurement error, as illustrated below:
x            | y
-------------|-------------
x1 +/- xe1   | y1 +/- ye1
.            | .
.            | .
---------------------------
I want to find the best-fitting line for these data.
Where can I find sources or exact formulae for calculating the slope and the intercept,
as well as the variances of the slope and the intercept?

Thank you so much,
Han
 
  • #2
Every statistics textbook should have the formulas, and every statistics tool implements them. For a simple least-squares fit you can also follow the links in the Wikipedia article.
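
For the case where only the y uncertainties matter, the textbook weighted least-squares formulas can be sketched as follows. This is a minimal sketch, assuming independent Gaussian errors in y and negligible errors in x. The function name `weighted_line_fit` and its arguments are made up for illustration and do not come from any particular library.

```python
import numpy as np

def weighted_line_fit(x, y, sigma_y):
    """Fit y = intercept + slope * x with weights 1/sigma_y^2 (errors in y only).

    Hypothetical helper for illustration. Returns the slope, the intercept,
    and the variances of both.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # One weight per point; a scalar sigma_y is broadcast to all points.
    w = np.ones_like(x) / np.asarray(sigma_y, dtype=float) ** 2

    # Weighted sums that appear in the normal equations.
    S = w.sum()
    Sx = (w * x).sum()
    Sy = (w * y).sum()
    Sxx = (w * x * x).sum()
    Sxy = (w * x * y).sum()
    det = S * Sxx - Sx ** 2

    slope = (S * Sxy - Sx * Sy) / det
    intercept = (Sxx * Sy - Sx * Sxy) / det
    # Variances follow from propagating the y errors through the formulas.
    var_slope = S / det
    var_intercept = Sxx / det
    return slope, intercept, var_slope, var_intercept

slope, intercept, var_slope, var_intercept = weighted_line_fit(
    [0, 1, 2, 3], [1, 3, 5, 7], [1, 1, 1, 1]
)
print(slope, intercept, var_slope, var_intercept)
```

These formulas ignore the x uncertainties entirely, so they only answer the original question when xe is negligible compared with ye divided by the slope.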
 
  • #3
mfb said:
Every statistics textbook should have the formulas, and every statistics tool implements them. For a simple least-squares fit you can also follow the links in the Wikipedia article.
Thank you !
 
  • #4
  • #5
I don't know what you call "standard". As long as the uncertainties can be treated as uncorrelated Gaussians, this looks pretty regular to me.
 
  • #6
mfb said:
I don't know what you call "standard". As long as the uncertainties can be treated as uncorrelated Gaussians, this looks pretty regular to me.
The sum-of-squares function being minimized is different when the X measurements have errors. The error in the X measurement should not be treated as an error of the Y variable. The errors should not simply be added, since there is a significant difference between 2 independent random error variables each being 1 σ from 0 and 1 variable being 2 σ from 0.
From the second figure at the Wikipedia link above: "The bivariate (Deming regression) case of total least squares. The red lines show the error in both x and y. This is different from the traditional least squares method which measures error parallel to the y axis. The case shown, with deviations measured perpendicularly, arises when x and y have equal variances."

Deming regression reminds me of Principal Component Analysis. I can't immediately see the difference.
Apparently Deming regression can be done in the language R after installing the mcr package. See https://www.r-bloggers.com/deming-and-passing-bablok-regression-in-r/. I have no experience with it.

PS. It may be advisable to rescale the X and/or Y variables so that their sample variances are equal. If these algorithms minimize a total sum of squared errors, it may be necessary for the variances of both variables to be equal. However, the algorithms might already take care of this.
 
  • #7
Don't forget to compute the squared correlation coefficient (r^2) when you are through. The linear regression formula gives you an answer every time, even if it is clearly wrong. I have seen researchers in the softer disciplines using linear regression on every set of data they have in the hope of finding some significance and getting excited over an r^2 of 0.01(!).
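
A small sketch of that check, assuming Python with NumPy (the helper name `fit_with_r2` is made up for illustration): the fit always returns a slope and an intercept, and r^2 has to be computed separately to see whether the line means anything.

```python
import numpy as np

def fit_with_r2(x, y):
    """Ordinary least-squares line plus r^2 (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # polyfit returns coefficients from highest degree down: slope, intercept.
    slope, intercept = np.polyfit(x, y, 1)
    # Pearson correlation coefficient between x and y.
    r = np.corrcoef(x, y)[0, 1]
    return slope, intercept, r ** 2

# These points have no linear trend at all, yet a "best" line still comes back.
print(fit_with_r2([1, 2, 3, 4], [1, -1, -1, 1]))
```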

 
  • #8
Guess the correlation

With given uncertainties, chi^2/ndf shows how reasonable the linear fit is.
FactChecker said:
The sum-of-squares function being minimized is different when the X measurements have errors. The error in the X measurement should not be treated as an error of the Y variable. The errors should not simply be added, since there is a significant difference between 2 independent random error variables each being 1 σ from 0 and 1 variable being 2 σ from 0.
No one suggested that. You have to find the point on the line that fits best given the two independent uncertainties (in x and y). That is a single analytic expression.
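
One way to write that expression out, assuming independent Gaussian errors and a line y = a + b x. The symbols a, b, \xi_i, \sigma_{x,i} and \sigma_{y,i} are notation introduced here, with \sigma_{x,i} and \sigma_{y,i} standing for the xe and ye of post #1, and \xi_i for the unknown true x position of point i on the line:

```latex
\begin{aligned}
\chi_i^2(\xi_i) &= \frac{(x_i - \xi_i)^2}{\sigma_{x,i}^2}
                 + \frac{(y_i - a - b\,\xi_i)^2}{\sigma_{y,i}^2}, \\
\min_{\xi_i} \chi_i^2 &= \frac{(y_i - a - b\,x_i)^2}{\sigma_{y,i}^2 + b^2\,\sigma_{x,i}^2}, \\
\chi^2(a, b) &= \sum_{i=1}^{N} \frac{(y_i - a - b\,x_i)^2}{\sigma_{y,i}^2 + b^2\,\sigma_{x,i}^2}.
\end{aligned}
```

The second line follows from setting the derivative with respect to \xi_i to zero. Minimizing \chi^2(a, b) gives the line, and the minimum divided by N - 2 is the chi^2/ndf that indicates how reasonable the linear fit is.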
 
  • #9
mfb said:
Guess the correlation

With given uncertainties, chi^2/ndf shows how reasonable the linear fit is. No one suggested that. You have to find the point on the line that fits best given the two independent uncertainties (in x and y). That is a single analytic expression.
The standard linear regression algorithms assume that the random error is added to the Y variable only. Treating the X and Y errors separately gives a different regression line, which is the correct one in this case.
 
  • #10

1. What is linear regression with measurement errors?

Linear regression with measurement errors is a statistical method used to analyze the relationship between two variables when there is uncertainty or error in the measurement of one or both of the variables. It takes into account the measurement errors in order to accurately estimate the relationship between the variables.

2. How does linear regression with measurement errors differ from regular linear regression?

In regular linear regression, it is assumed that the independent variable is measured without any error. However, in linear regression with measurement errors, the independent variable is also subject to error. This method takes into account the uncertainty in both variables, resulting in more accurate estimates of the relationship between them.
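
The practical effect can be seen in a small simulation. The sketch below assumes Python with NumPy; all numbers (true slope 2, x noise with standard deviation 2, and so on) are arbitrary illustrative choices. It fits an ordinary least-squares line to data whose x values are noisy and compares the slope with the attenuated value that theory predicts, true_slope * var(x_true) / (var(x_true) + sigma_x^2).

```python
import numpy as np

def ols_slope_with_noisy_x(n=20000, true_slope=2.0, sigma_x=2.0,
                           sigma_y=1.0, seed=0):
    """Simulate noisy x and y, then fit an ordinary least-squares line.

    Hypothetical helper for illustration. Returns the fitted OLS slope and
    the attenuated slope expected from the noise in x.
    """
    rng = np.random.default_rng(seed)
    x_true = rng.uniform(0.0, 10.0, n)                 # error-free x values
    x_obs = x_true + rng.normal(0.0, sigma_x, n)       # measured x
    y_obs = 1.0 + true_slope * x_true + rng.normal(0.0, sigma_y, n)

    slope_ols = np.polyfit(x_obs, y_obs, 1)[0]
    var_x = np.var(x_true)
    slope_expected = true_slope * var_x / (var_x + sigma_x ** 2)
    return slope_ols, slope_expected

slope_ols, slope_expected = ols_slope_with_noisy_x()
print(slope_ols, slope_expected)
```

The fitted slope comes out well below the true value of 2, which is the bias that errors-in-variables methods are meant to remove.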

3. What are some common sources of measurement errors in linear regression?

Measurement errors in linear regression can come from a variety of sources, including human error, equipment malfunction, or natural variability in the data. In some cases, the error may be random and in others it may be systematic. It is important to identify and account for these errors in order to obtain accurate results.

4. How is linear regression with measurement errors useful in scientific research?

Linear regression with measurement errors is useful in scientific research because it allows for more accurate estimation of the true relationship between variables. This can help researchers to better understand and predict the behavior of complex systems, and to make more informed decisions based on their data.

5. What are some limitations of linear regression with measurement errors?

One limitation of linear regression with measurement errors is that it needs information about the measurement errors themselves, such as the individual uncertainties or the ratio of the X and Y error variances, and its estimates can be unreliable for small samples. It is also not appropriate for non-linear relationships between variables. It is important to carefully consider the assumptions and potential limitations of this method when using it in research.
