How to prove zero correlation between residuals and predictors?

  • Thread starter: NotEuler
  • Tags: Correlation, Zero
Summary
In a linear least squares multiple regression model, the residuals are proven to be uncorrelated with the predictors. This is established by minimizing the sum of squared errors, leading to conditions where the covariance between the residuals and each predictor equals zero. The proof involves calculating partial derivatives of the error function, which results in the mean of the residuals being zero and the covariance of residuals with predictors also being zero. This conclusion holds for any number of independent variables, indicating that residuals remain uncorrelated with both predictors and predicted values. Thus, the relationship between residuals and predictors in regression models is consistently characterized by zero correlation.
NotEuler
Hi,
I'm trying to figure out something I'm pretty sure is true, but don't know how to prove it. I couldn't find the answer with a google search, but hopefully someone here knows the answer!

So I have a linear least squares multiple regression model:
Y=a+bX1+cX2+e

where a is the intercept, X1 and X2 predictor/independent variables, and e denotes the residuals.
The model (i.e. the values of a, b and c) is fitted so that Ʃe^2 is minimized.

How do I prove that cov(e,X1) = cov(e,X2) = 0?

Thanks!
NotEuler
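For concreteness, here is a minimal numerical sketch of the claim being asked about, assuming NumPy's least-squares routine as the fitting tool; the data-generating line below is made up purely for illustration and the true relationship is deliberately not linear:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Arbitrary predictors; the true relationship deliberately isn't linear.
X1 = rng.normal(size=n)
X2 = rng.uniform(-1.0, 1.0, size=n)
Y = np.sin(X1) + X2**2 + rng.normal(scale=0.1, size=n)

# Least-squares fit of Y = a + b*X1 + c*X2 + e.
A = np.column_stack([np.ones(n), X1, X2])
coef, *_ = np.linalg.lstsq(A, Y, rcond=None)
e = Y - A @ coef  # residuals

# Sample covariances between residuals and each predictor.
print(np.cov(e, X1)[0, 1])  # ~0 up to floating-point error
print(np.cov(e, X2)[0, 1])  # ~0 up to floating-point error
```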
 
Maybe I should clarify my question...

1) Assume I have a dataset of dependent variables Yi, and independent variables X1i and X2i.

2) I fit a linear regression model to that dataset: Y=a+bX1+cX2+e.

3) The model is fitted, i.e. the parameters a, b and c are determined, so that the sum of squared errors Ʃei^2 = Ʃ(Yi - a - bX1i - cX2i)^2 is minimized.

4) I then calculate the covariance between the residuals e from that same fitted model and either set of independent variables (the X1 values or the X2 values) from the original dataset.

5) I think both cov(e,X1) and cov(e,X2) will always equal zero, regardless of what the original dataset was, and regardless of whether the true relationships are linear or something else.
I also think this should hold for any number of independent variables.

6) I think that to prove this, I need to write the covariance as cov(e,X1) = cov(Y-a-bX1-cX2, X1) = cov(Y,X1) - cov(a,X1) - cov(bX1,X1) - cov(cX2,X1) (a worked version of this expansion is sketched after this post).
And then somehow use the consequences of step 3 to show that if the square of errors is minimized, then this covariance is always zero.

Does this make any sense? I'm no expert on regressions or covariances, so this might be hard to follow. It's also possible I'm wrong, and cov(e,X1) is not always zero.
Either way, any hints on how to proceed would be much appreciated!

Cheers,
NotEuler
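A worked version of the expansion in step 6, using only bilinearity of covariance and the fact that a is a constant (so cov(a, X1) = 0):

$$
\operatorname{cov}(e, X_1)
= \operatorname{cov}(Y - a - bX_1 - cX_2,\; X_1)
= \operatorname{cov}(Y, X_1) - \underbrace{\operatorname{cov}(a, X_1)}_{=0} - b\,\operatorname{var}(X_1) - c\,\operatorname{cov}(X_2, X_1).
$$

So showing cov(e, X1) = 0 amounts to showing that the fitted b and c satisfy cov(Y, X1) = b·var(X1) + c·cov(X2, X1), which is where the minimization condition of step 3 comes in, as the next reply points out.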
 
NotEuler said:
3) The model is fitted, i.e. the parameters a, b and c are determined, so that the sum of squared errors Ʃei^2 = Ʃ(Yi - a - bX1i - cX2i)^2 is minimized.

And then somehow use the consequences of step 3 to show that if the square of errors is minimized, then this covariance is always zero.

The partial derivatives of the function in step 3 with respect to a, b, c would be zero at an extremum. Perhaps that helps.
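Spelled out for the two-predictor model from the opening post: writing s(a,b,c) = Ʃ(Yi - a - bX1i - cX2i)^2, setting the partial derivatives to zero gives

$$
\frac{\partial s}{\partial a} = -2\sum_i (Y_i - a - bX_{1i} - cX_{2i}) = 0,\qquad
\frac{\partial s}{\partial b} = -2\sum_i X_{1i}(Y_i - a - bX_{1i} - cX_{2i}) = 0,\qquad
\frac{\partial s}{\partial c} = -2\sum_i X_{2i}(Y_i - a - bX_{1i} - cX_{2i}) = 0,
$$

i.e. Ʃei = 0, Ʃ X1i ei = 0 and Ʃ X2i ei = 0.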
 
Yes, that helps a lot! Here's a sketch of the proof; happy to hear if you see any mistakes. I've changed the notation slightly to show that it applies to a regression model with any number of predictors. I will denote means with ~ (i.e. the mean of the ei is E(ei) = e~).

1) Assume I have a dataset of dependent variables Yi, and independent variables X1i, X2i, X3i,... Xki.

2) I fit a linear regression model to that dataset: Y=a + bX1 + Z + e, where Z is a linear combination of all the independent variables from X2 onwards: Z=cX2+dX3+...
Z therefore does not contain the parameters a or b.

3) The model is fitted, i.e. the parameters a, b, c, d, ... are determined, so that the sum of squared errors s(a,b,c,d,...) = Ʃei^2 = Ʃ(Yi - a - bX1i - Zi)^2 is minimized.

4) To do this, I calculate the partial derivatives of s with respect to a, b, c, d, ... and set them equal to 0.
I find that
∂s/∂a = -2 Ʃ(Yi - a - bX1i - Zi) = 0. Therefore Ʃ(Yi - a - bX1i - Zi) = Ʃei = 0, and e~ = E[e] = 0.
∂s/∂b = -2 Ʃ X1i (Yi - a - bX1i - Zi) = 0. Therefore Ʃ X1i (Yi - a - bX1i - Zi) = Ʃ X1i ei = 0.

5) Ʃ (ei - e~)(X1i - X1~) = Ʃ (eiX1i - eiX1~ - e~X1i + e~X1~)
= ƩeiX1i - X1~Ʃei - e~ƩX1i + n·e~X1~ = 0 - X1~·0 - 0 + 0 = 0,
using ƩeiX1i = 0 and Ʃei = 0 from step 4, and e~ = 0.

Since the sample covariance is just this sum divided by n (or n-1), Cov(e,X1) = 0, which is what I wanted to prove.

Now I could replace X1 with any of the other predictors that are combined in Z and repeat the above analysis. Because the setup is symmetric in the predictor variables, I would then find that cov(e,Xk) = 0 for any k.

Therefore the residuals are always uncorrelated with the predictors in a least squares linear regression model.
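The same argument can be restated compactly in matrix form (just a repackaging of steps 4 and 5): collect a column of ones and the predictor columns into a design matrix X, so the model is Y = Xβ + e with β = (a, b, c, d, ...)ᵀ. Minimizing ‖Y − Xβ‖² gives

$$
\nabla_{\beta}\,\lVert Y - X\beta \rVert^{2} = -2\,X^{\top}(Y - X\beta) = 0
\quad\Longrightarrow\quad X^{\top} e = 0.
$$

The first row of Xᵀe = 0 is Ʃei = 0 (so e~ = 0), and row k+1 is Ʃ Xki ei = 0, which is exactly what step 5 needs, so cov(e, Xk) = 0 for every predictor.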
 
Now that I think about it, this result immediately implies that the residuals are also uncorrelated with the values predicted by the model (i.e. not the original data Yi, but the fitted values y = a + bX1 + Z).

This is because (following the notation above) cov(e, y) = cov(e, a + bX1 + Z) = cov(e, a) + b·cov(e, X1) + cov(e, Z) = 0 + 0 + 0 = 0.
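A quick numerical check of both claims at once (same caveats as the earlier sketch: NumPy-based, with made-up data and an arbitrary non-linear true relationship):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500

# Three predictors; the true relationship is deliberately non-linear.
X = rng.normal(size=(n, 3))
Y = np.exp(X[:, 0]) + X[:, 1] * X[:, 2] + rng.normal(size=n)

# Least-squares fit with an intercept column.
A = np.column_stack([np.ones(n), X])
coef, *_ = np.linalg.lstsq(A, Y, rcond=None)
y_hat = A @ coef   # fitted (predicted) values
e = Y - y_hat      # residuals

print(np.cov(e, y_hat)[0, 1])                         # ~0: residuals vs fitted values
print([np.cov(e, X[:, k])[0, 1] for k in range(3)])   # ~0 for every predictor
```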
 