How to prove zero correlation between residuals and predictors?

NotEuler · Dec 2, 2013

Hi,
I'm trying to figure out something I'm pretty sure is true, but don't know how to prove it. I couldn't find the answer with a google search, but hopefully someone here knows the answer!

So I have a linear least squares multiple regression model:
Y=a+bX1+cX2+e

where a is the intercept, X1 and X2 predictor/independent variables, and e denotes the residuals.
The model (i.e. the values of a, b and c) is fitted so that Ʃe^2 is minimized.

How do I prove that cov(e,X1)=cov(e,X2)=0?

Thanks!
NotEuler

NotEuler · Dec 2, 2013

Maybe I should clarify my question...

1) Assume I have a dataset of dependent variables Yi, and independent variables X1i and X2i.

2) I fit a linear regression model to that dataset: Y=a+bX1+cX2+e.

3) The model is fitted, i.e. the parameters a, b and c are determined, so that the sum of square of the errors Ʃei^2 = Ʃ(Yi-a-bX1i-cX2i)^2 is minimized.

4) I then calculate the covariance of the e:s from that same fitted model, and either set of independent variables (X1:s or X2:s) from the original dataset.

5) I think both cov(e,X1) and cov(e,X2) will always equal zero, regardless of what the original dataset was, and regardless of whether the real dependences are linear or something else.
I also think this should hold for any number of independent variables.

6) I think that to prove this, I need to write the covariance as cov(e,X1) = cov(Y-a-bX1-cX2, X1) = cov(Y,X1)-cov(a,X1)-cov(bX1,X1)-cov(cX2,X1).
And then somehow use the consequences of step 3 to show that if the square of errors is minimized, then this covariance is always zero.

Does this make any sense? I'm no expert on regressions or covariances, so this might be hard to follow. It's also possible I'm wrong, and cov(e,X1) is not always zero.
Either way, any hints on how to proceed would be much appreciated!

Cheers,
NotEuler

Stephen Tashi · Dec 3, 2013

NotEuler said:

3) The model is fitted, i.e. the parameters a, b and c are determined, so that the sum of square of the errors Ʃei^2 = Ʃ(Yi-a-bX1i-cX2i)^2 is minimized.

And then somehow use the consequences of step 3 to show that if the square of errors is minimized, then this covariance is always zero.

The partial derivatives of the function in step 3 with respect to a, b and c would all be zero at an extremum. Perhaps that helps.
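Written out for the two-predictor model in the question, those conditions would look like the sketch below. The symbol S for the sum of squared errors is an assumption introduced here for illustration; it does not appear in the posts above.

```latex
S(a,b,c) = \sum_i \left(Y_i - a - bX_{1i} - cX_{2i}\right)^2

\begin{aligned}
\frac{\partial S}{\partial a} &= -2\sum_i \left(Y_i - a - bX_{1i} - cX_{2i}\right) = 0 \\
\frac{\partial S}{\partial b} &= -2\sum_i X_{1i}\left(Y_i - a - bX_{1i} - cX_{2i}\right) = 0 \\
\frac{\partial S}{\partial c} &= -2\sum_i X_{2i}\left(Y_i - a - bX_{1i} - cX_{2i}\right) = 0
\end{aligned}
```

Each bracketed term is the residual e_i of the fitted model, so the three conditions say Ʃei = 0, ƩX1i ei = 0 and ƩX2i ei = 0.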

NotEuler · Dec 3, 2013

Yes, that helps a lot! Here's a sketch of the proof, happy to hear if you see any mistakes. I've changed the notation slightly to show that it applies to a regression model with any number of predictors. I will denote means with ~ (i.e. the mean of the ei:s = E(ei) = e~).

1) Assume I have a dataset of dependent variables Yi, and independent variables X1i, X2i, X3i,... Xki.

2) I fit a linear regression model to that dataset: Y=a + bX1 + Z + e, where Z is a linear combination of all the independent variables from X2 onwards: Z=cX2+dX3+...
Z therefore does not depend on the parameters a and b, so it can be treated as a constant when differentiating with respect to a or b.

3) The model is fitted, i.e. the parameters a, b, c, d... are determined, so that the sum of squared errors s(a,b,c,d...) = Ʃei^2 = Ʃ(Yi-a-bX1i-Zi)^2 is minimized.

4) To do this, I calculate the partial derivatives of s for a,b,c,d... and set them to equal 0.
I find that
∂s/∂a = -2 Ʃ(Yi-a-bX1i-Zi). Therefore Ʃ(Yi-a-bX1i-Zi) = Ʃei = 0, and E[e]=e~= 0
∂s/∂b = -2 Ʃ X1i (Yi-a-bX1i-Zi). Therefore Ʃ X1i (Yi-a-bX1i-Zi) = Ʃ X1i ei= 0

5) Ʃ (ei-e~)(X1i-X1~) = Ʃ (eiX1i - eiX1~ - e~X1i + e~X1~)
= ƩeiX1i - X1~Ʃei - e~ƩX1i + Ʃe~X1~ = 0 - X1~·0 - 0 + 0 = 0
(the first term is zero by the ∂s/∂b condition, the second because Ʃei = 0, and the last two because e~ = 0)

Therefore Cov(e,X1) = 0, which is what I wanted to prove.

Now I could replace X1 with any of the other X:s that are all combined in Z, and repeat the above analysis. Because the regression function is symmetric for all the predictor variables, I would then find that cov(e,Xk)=0 for any k.

Therefore the residuals are always uncorrelated with the predictors in a least squares linear regression model, as long as the model includes the intercept a (that is what gives Ʃei = 0 in step 4).
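A quick numerical sanity check of the result, as a minimal sketch in Python with NumPy. The data, the seed, the variable names and the helper `sample_cov` are all made up for illustration. The dependence of y on the predictors is deliberately nonlinear, to illustrate that the result does not rely on the true relationship being linear.

```python
import numpy as np

rng = np.random.default_rng(0)   # arbitrary seed, only for reproducibility
n = 200

# Three made-up predictors and a deliberately nonlinear response.
X = rng.normal(size=(n, 3))
y = np.exp(X[:, 0]) + X[:, 1] ** 2 - np.sin(X[:, 2]) + rng.normal(size=n)

# Design matrix with a column of ones for the intercept a.
A = np.column_stack([np.ones(n), X])

# Least squares fit: minimizes the sum of squared residuals.
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
e = y - A @ coef                 # residuals of the fitted model

def sample_cov(u, v):
    """Sample covariance, written out as in step 5 (divided by n)."""
    return np.mean((u - u.mean()) * (v - v.mean()))

covs = [sample_cov(e, X[:, k]) for k in range(X.shape[1])]
print("mean of residuals:", e.mean())
print("cov(e, Xk) for each k:", covs)
```

Both the mean of the residuals and every covariance should come out as zero up to floating-point rounding.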

NotEuler · Dec 5, 2013

Now that I think about it, this result immediately implies that the residuals are also uncorrelated with the values predicted by the model (i.e. not the original dataset Yi, but the predicted y=a+bX1+Z).

This is because (following the notation above) cov(e,y)=cov(e, a+bX1+Z)=cov(e,a)+cov(e,bX1)+cov(e,Z)=0+0+0.
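The same kind of check can be sketched for the predicted values. Again this is only an illustration with made-up data and names. It also shows that the residuals are generally not uncorrelated with the original Yi: since Y = y + e, cov(e,Y) = cov(e,y) + cov(e,e) = var(e).

```python
import numpy as np

rng = np.random.default_rng(1)   # arbitrary seed
n = 100

# Made-up data; the true dependence is again nonlinear on purpose.
X1 = rng.uniform(-2, 2, n)
X2 = rng.uniform(-2, 2, n)
Y = X1 ** 3 + np.abs(X2) + rng.normal(size=n)

# Fit Y = a + b*X1 + c*X2 + e by least squares.
A = np.column_stack([np.ones(n), X1, X2])
a, b, c = np.linalg.lstsq(A, Y, rcond=None)[0]

y_hat = a + b * X1 + c * X2      # values predicted by the model
e = Y - y_hat                    # residuals

# np.cov returns the 2x2 covariance matrix; [0, 1] is the cross term.
cov_e_yhat = np.cov(e, y_hat)[0, 1]
cov_e_Y = np.cov(e, Y)[0, 1]

print("cov(e, predicted y):", cov_e_yhat)
print("cov(e, original Y): ", cov_e_Y)
print("var(e):             ", np.var(e, ddof=1))
```

The first covariance should be zero up to rounding, while the second should match the variance of the residuals.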
