Propagating Measurement Uncertainty into a Linear Regression Model

Discussion Overview

The discussion revolves around the integration of measurement uncertainty into the standard error of a linear regression model. Participants explore how to account for errors in both x and y measurements when determining the slope of the regression line, particularly in the context of plotting concentration against a derived variable for flux calculation.

Discussion Character

  • Technical explanation
  • Mathematical reasoning
  • Debate/contested

Main Points Raised

  • One participant seeks to combine the standard error of the best fit line with the individual measurement errors, specifically questioning how to propagate a 9% error from y measurements into the regression error.
  • Another participant suggests that the regression error already accounts for the errors in y values, implying that additional propagation may not be necessary.
  • A participant clarifies their interest in integrating the individual error margins of each data point into the overall error of the linear regression.
  • One proposed method involves generating random numbers for each y value to simulate the uncertainty, suggesting a Monte-Carlo approach.
  • Another participant expresses a desire for an analytical method rather than a simulation approach, indicating a preference for simplicity in the linear regression model.
  • A more technical contribution discusses the use of a variance matrix for different error terms in the context of ordinary least squares (OLS) regression, providing a mathematical framework for estimating regression coefficients with known variances.

Areas of Agreement / Disagreement

Participants exhibit disagreement regarding the necessity and method of incorporating measurement uncertainty into the regression analysis. Some believe the existing regression error suffices, while others seek additional methods to account for individual measurement errors.

Contextual Notes

There are unresolved assumptions regarding the distribution of measurement errors and the implications of using different methods (analytical vs. simulation) for error propagation in linear regression.

lschong:
I am trying to figure out how to combine uncertainty (in x and y) into the standard error of the best fit line from the linear regression for that dataset.

I am plotting del t/height (x) versus units of concentration (y) to get a value for the flux (which is the slope).

I understand how to get the standard error of the best fit line, but that only gives the error in y in relation to the best fit line. Is there a good way to combine that error with the error from the individual measurements?

For example:
delt/h (x)    Conc. (y)
 0.00          563.84
 2.39          568.77
 3.53          566.64
11.03          572.59

The error in each y measurement is 9%

When I do the linear regression, I get a slope of 0.71 and a standard error of 0.21.

Is there a (relatively) simple way to propagate the 9% error into the regression error?
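For reference, the numbers quoted above can be reproduced with a short script. This is a minimal sketch using only the four data points from the post; note that the standard error it computes reflects only the scatter of the points about the fitted line, not the 9% measurement error.

```python
# Ordinary least squares on the four data points from the post, to
# reproduce the quoted slope (~0.71) and its standard error (~0.21-0.22).
# Pure-Python sketch; no external libraries needed.
import math

x = [0.00, 2.39, 3.53, 11.03]          # del t / height
y = [563.84, 568.77, 566.64, 572.59]   # concentration (9% measurement error each)

n = len(x)
xbar = sum(x) / n
ybar = sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))

slope = sxy / sxx
intercept = ybar - slope * xbar

# Residual standard error, with n - 2 degrees of freedom
sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
s = math.sqrt(sse / (n - 2))
se_slope = s / math.sqrt(sxx)

print(f"slope = {slope:.2f}, SE(slope) = {se_slope:.2f}")
```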
 
Putting aside the errors in the x values, the regression error already includes the errors in y.
 
Are you referring to the standard error of the regression line? I know that the standard error reflects the vertical scatter of the points about the line, but what I want is to also fold in the measurement uncertainty of each individual data point.

So, my first point is y = 531 +/- 51, the second point is y = 540 +/- 46, and so on. How do I integrate the +/- values for each data point into the error for the linear regression?

Thanks.
 
The computationally easy way is to generate random numbers for each y. For y = 531 +/- 51, you could generate (say) 10 uniform random numbers with mean = 531 and range = +/- 51, all matched to the same x value.
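One way to implement this suggestion is a Monte Carlo sketch: perturb each y within its quoted uncertainty, refit, and look at the spread of the resulting slopes. This assumes the four data points from the original post and uniform errors of +/- 9%; the trial count and seed are arbitrary choices for illustration.

```python
# Monte Carlo propagation of per-point y uncertainties into the slope.
# Each trial draws every y uniformly within +/- its quoted error,
# refits by ordinary least squares, and records the slope.
import math
import random

x = [0.00, 2.39, 3.53, 11.03]
y = [563.84, 568.77, 566.64, 572.59]
yerr = [0.09 * yi for yi in y]          # 9% measurement error per point

def ols_slope(xs, ys):
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((xi - xbar) ** 2 for xi in xs)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(xs, ys))
    return sxy / sxx

random.seed(0)                           # reproducible illustration
slopes = []
for _ in range(5000):
    y_sim = [random.uniform(yi - e, yi + e) for yi, e in zip(y, yerr)]
    slopes.append(ols_slope(x, y_sim))

mean_slope = sum(slopes) / len(slopes)
sd_slope = math.sqrt(sum((b - mean_slope) ** 2 for b in slopes)
                     / (len(slopes) - 1))
print(f"slope = {mean_slope:.2f} +/- {sd_slope:.2f}")
```

With 9% errors on y (roughly +/- 51 in these units), the Monte Carlo spread of the slope comes out far larger than the 0.21 regression error, which is exactly the distinction being discussed.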
 
Hi,
I would like to do the same thing as lschong. Is there an analytical way, rather than the Monte Carlo simulation suggested above? I know the simulation will surely work, but I need a simpler approach, since the model is just a linear regression.

Sincerely yours,
 
Suppose you have T observations and K variables, and suppose you also know the distribution of each y[t]; for example, y[t] ~ N(m[t], s[t]), t = 1 to T. If s[t] is the same for all t, you have the standard OLS model. If s[t] differs across t, then each error term u[t] is distributed N(0, s[t]). Since you know s[t] for all t, you can define the matrix [itex]\bold\Phi_{T\times T} = diag(s[t]^2)[/itex] as the variance matrix of the errors. Then

[tex]\hat{\beta}=\left(X'\bold\Phi^{-1}X\right)^{-1}X'\bold\Phi^{-1}y[/tex]

is the best linear unbiased estimator of the regression coefficient vector; this is the generalized (weighted) least squares estimator.
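For a single regressor with a diagonal [itex]\bold\Phi[/itex], the estimator above reduces to weighted least squares with weights 1/s[t]^2, and the standard error of the slope comes from the corresponding entry of (X'Φ⁻¹X)⁻¹. A minimal sketch, assuming the data and the 9% errors from the original post:

```python
# Weighted (generalized) least squares with known per-point variances.
# For diagonal Phi = diag(s[t]^2), beta_hat = (X' Phi^-1 X)^-1 X' Phi^-1 y
# reduces to weighted-sum formulas with weights w[t] = 1 / s[t]^2.
import math

x = [0.00, 2.39, 3.53, 11.03]
y = [563.84, 568.77, 566.64, 572.59]
s = [0.09 * yi for yi in y]             # known standard deviation of each y

w = [1.0 / si ** 2 for si in s]
W = sum(w)
xbar = sum(wi * xi for wi, xi in zip(w, x)) / W
ybar = sum(wi * yi for wi, yi in zip(w, y)) / W

sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))

slope = sxy / sxx
intercept = ybar - slope * xbar
se_slope = math.sqrt(1.0 / sxx)         # from (X' Phi^-1 X)^-1

print(f"slope = {slope:.2f} +/- {se_slope:.2f}")
```

Because the 9% errors are nearly equal across the four points, the weighted slope is essentially the OLS slope (~0.71), but its standard error now reflects the known measurement uncertainty (roughly +/- 6 here) rather than the scatter about the line.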
 
