# Propagating Measurement Uncertainty into a Linear Regression Model

1. Jan 17, 2010

### lschong

I am trying to figure out how to combine uncertainty (in x and y) into the standard error of the best fit line from the linear regression for that dataset.

I am plotting concentration (y) versus del t/height (x) to get a value for the flux (which is the slope).

I understand how to get the standard error of the best fit line, but that only gives the error in y in relation to the best fit line. Is there a good way to combine that error with the error from the individual measurements?

For example:
| delt/h (x) | Conc. (y) |
|---|---|
| 0.00 | 563.84 |
| 2.39 | 568.77 |
| 3.53 | 566.64 |
| 11.03 | 572.59 |

The error in each y measurement is 9%.

When I do the linear regression, I get a slope of 0.71 and an error of 0.21.
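For reference, here is a minimal sketch of the ordinary least-squares calculation behind those two numbers, using the four data points listed above. The function name `ols_slope_with_se` is only illustrative, and the "error" is assumed to be the usual standard error of the slope (residual variance with n - 2 degrees of freedom divided by the sum of squared x deviations).

```python
import numpy as np

# Data from the post: x = delt/h, y = concentration.
x = np.array([0.00, 2.39, 3.53, 11.03])
y = np.array([563.84, 568.77, 566.64, 572.59])

def ols_slope_with_se(x, y):
    """Unweighted straight-line fit; returns slope, intercept, slope standard error."""
    n = x.size
    dx = x - x.mean()
    sxx = np.sum(dx**2)                          # sum of squared x deviations
    slope = np.sum(dx * (y - y.mean())) / sxx
    intercept = y.mean() - slope * x.mean()
    residuals = y - (intercept + slope * x)
    resid_var = np.sum(residuals**2) / (n - 2)   # scatter of the points about the line
    slope_se = np.sqrt(resid_var / sxx)
    return slope, intercept, slope_se

slope, intercept, slope_se = ols_slope_with_se(x, y)
print("slope:", slope, "standard error:", slope_se)
```

The result should come out close to the 0.71 and 0.21 quoted above. Note that this standard error is built only from the scatter of the points about the line; the 9% measurement uncertainty never enters the calculation.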

Is there a (relatively) simple way to propagate the 9% error into the regression error?

2. Jan 19, 2010

### EnumaElish

Putting aside the errors in the x values, the regression error already includes the errors in y.

3. Jan 19, 2010

### lschong

Are you referring to the standard error of the regression line? I know that the standard error includes all the vertical error from each point to the line, but what I want to do is also take into account the measurement uncertainty in each data point's y value.

So, my first point y = 531 +/- 51 and the second point y = 540+/- 46 and so on. How do I integrate the +/- values for each data point into the error for the linear regression?

Thanks.

4. Jan 20, 2010

### EnumaElish

The computationally easy way is to generate random numbers for each y. For y = 531 +/- 51, you could generate (say) 10 uniform random numbers with mean = 531 and range = +/- 51, all matched to the same x value.
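A minimal sketch of this idea follows, under some assumptions that are not in the thread: it uses the four data points from the first post, treats the 9% as a +/- half-range for a uniform draw, and refits the line once per simulated data set instead of pooling all the replicates into one regression. The name `monte_carlo_slopes` and the number of draws are illustrative choices.

```python
import numpy as np

# Data from the first post: x = delt/h, y = concentration.
x = np.array([0.00, 2.39, 3.53, 11.03])
y = np.array([563.84, 568.77, 566.64, 572.59])
half_range = 0.09 * y  # assumption: the 9% is a +/- half-range on each y

def monte_carlo_slopes(x, y, half_range, n_draws=5000, seed=0):
    """Refit the line on many randomly perturbed copies of y; return the slopes."""
    rng = np.random.default_rng(seed)
    # Each row is one simulated data set, drawn uniformly within +/- half_range.
    y_sim = rng.uniform(y - half_range, y + half_range, size=(n_draws, y.size))
    # np.polyfit fits each column of a 2-D y separately, hence the transpose.
    slopes, intercepts = np.polyfit(x, y_sim.T, 1)
    return slopes

slopes = monte_carlo_slopes(x, y, half_range)
print("mean slope:", slopes.mean())
print("spread of slope (std):", slopes.std(ddof=1))
```

The standard deviation of the simulated slopes is an estimate of how much the measurement uncertainty alone moves the slope. Refitting per draw is used here because pooling the replicates into a single regression would increase the number of points and so change the standard error the regression itself reports.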

5. Jun 10, 2010

### rojana

Hi,
I would like to do the same thing as lschong. Is there an analytical way, rather than the Monte Carlo simulation suggested above? I know that simulation will work, but I need a simpler approach, since the model is just a linear regression.

Sincerely yours,

6. Jun 11, 2010

### EnumaElish

Suppose you have T observations and K variables. Suppose you also know the distribution of each y[t]; for example, y[t] ~ N(m[t], s[t]^2), t = 1 to T, where s[t] is the standard deviation. If s[t] is constant for all t, then you have the standard OLS model. If s[t] is different for each t, then each error term u[t] is distributed N(0, s[t]^2). Since you know s[t] for all t, you can define the matrix $\boldsymbol{\Phi}_{T\times T} = \operatorname{diag}(s[t]^2)$ as the variance matrix (of the errors). Then

$$\hat{\beta}=\left(X'\boldsymbol{\Phi}^{-1}X\right)^{-1}X'\boldsymbol{\Phi}^{-1}y$$

is the best linear unbiased estimator of the regression coefficient vector.
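The estimator above can be sketched in a few lines of NumPy. This is an illustration, not a definitive implementation: it uses the data from the first post and assumes the 9% figure is one standard deviation of each y. It also uses a standard result that is not stated in the thread, namely that when the variance matrix is known, the covariance of the estimator is the inverse of X' Phi^{-1} X, so the square roots of its diagonal are the standard errors. The name `weighted_least_squares` is illustrative.

```python
import numpy as np

# Data from the first post: x = delt/h, y = concentration.
x = np.array([0.00, 2.39, 3.53, 11.03])
y = np.array([563.84, 568.77, 566.64, 572.59])
s = 0.09 * y  # assumption: the 9% is one standard deviation of each y[t]

def weighted_least_squares(x, y, s):
    """Fit y = b0 + b1*x with known per-point standard deviations s."""
    X = np.column_stack([np.ones_like(x), x])  # design matrix: intercept, slope
    w = 1.0 / s**2                             # diagonal of Phi^{-1}
    XtWX = X.T @ (w[:, None] * X)              # X' Phi^{-1} X
    XtWy = X.T @ (w * y)                       # X' Phi^{-1} y
    beta = np.linalg.solve(XtWX, XtWy)         # the estimator from the formula above
    cov = np.linalg.inv(XtWX)                  # covariance of beta when Phi is known
    return beta, cov

beta, cov = weighted_least_squares(x, y, s)
intercept, slope = beta
slope_se = np.sqrt(cov[1, 1])
print("slope:", slope, "standard error from measurement uncertainty:", slope_se)
```

Because the weights here are nearly equal, the slope stays close to the unweighted value. The standard error, however, now reflects the stated measurement uncertainty rather than the scatter of the points about the line.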