# On Cochran Theorem (and Orthogonal Projections)

**R-english – Freakonometrics**, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Cochran Theorem – from The distribution of quadratic forms in a normal system, with applications to the analysis of covariance published in 1934 – is probably the most import one in a regression course. It is an application of a nice result on quadratic forms of Gaussian vectors. More precisely, we can prove that if \(\boldsymbol{Y}\sim\mathcal{N}(\boldsymbol{0},\mathbb{I}_d)\) is a random vector with \(d\) \(\mathcal{N}(0,1)\) variable then (i) if \(A\) is a (squared) idempotent matrix \(\boldsymbol{Y}^\top A\boldsymbol{Y}\sim\chi^2_r\) where \(r\) is the rank of matrix \(A\), and (ii) conversely, if \(\boldsymbol{Y}^\top A\boldsymbol{Y}\sim\chi^2_r\) then \(A\) is an idempotent matrix of rank \(r\). And just in case, \(A\) is an idempotent matrix means that \(A^2=A\), and a lot of results can be derived (for instance on the eigenvalues). The prof of that result (at least the (i) part) is nice: we diagonlize matrix \(A\), so that \(A=P\Delta P^\top\), with \(P\) orthonormal. Since \(A\) is an idempotent matrix observe that\(A^2=P\Delta P^\top=P\Delta P^\top=P\Delta^2 P^\top\)where \(\Delta\) is some diagonal matrix such that \(\Delta^2=\Delta\), so terms on the diagonal of \(\Delta\) are either \(0\) or \(1\)‘s. And because the rank of \(A\) (and \(\Delta\)) is \(r\) then there should be \(r\) \(1\)‘s and \(d-r\) \(1\)‘s. Now write\(\boldsymbol{Y}^\top A\boldsymbol{Y}=\boldsymbol{Y}^\top P\Delta P^\top\boldsymbol{Y}=\boldsymbol{Z}^\top \Delta\boldsymbol{Z}\)where \(\boldsymbol{Z}=P^\top\boldsymbol{Y}\) that satisfies\(\boldsymbol{Z}\sim\mathcal{N}(\boldsymbol{0},PP^\top)\) i.e. \(\boldsymbol{Z}\sim\mathcal{N}(\boldsymbol{0},\mathbb{I}_d)\). Thus \(\boldsymbol{Z}^\top \Delta\boldsymbol{Z}=\sum_{i:\Delta_{i,i}-1}Z_i^2\sim\chi^2_r\)Nice, isn’t it. And there is more (that will be strongly connected actually to Cochran theorem). Let \(A=A_1+\dots+A_k\), then the two following statements are equivalent (i) \(A\) is idempotent and \(\text{rank}(A)=\text{rank}(A_1)+\dots+\text{rank}(A_k)\) (ii) \(A_i\)‘s are idempotents, \(A_iA_j=0\) for all \(i\neq j\).

Now, let us talk about projections. Let \(\boldsymbol{y}\) be a vector in \(\mathbb{R}^n\). Its projection on the space \(\mathcal V(\boldsymbol{v}_1,\dots,\boldsymbol{v}_p)\) (generated by those \(p\) vectors) is the vector \(\hat{\boldsymbol{y}}=\boldsymbol{V} \hat{\boldsymbol{a}}\) that minimizes \(\|\boldsymbol{y} -\boldsymbol{V} \boldsymbol{a}\|\) (in \(\boldsymbol{a}\)). The solution is\(\hat{\boldsymbol{a}}=( \boldsymbol{V}^\top \boldsymbol{V})^{-1} \boldsymbol{V}^\top \boldsymbol{y} \text{ and } \hat{\boldsymbol{y}} = \boldsymbol{V} \hat{\boldsymbol{a}}\)

Matrix \( P=\boldsymbol{V} ( \boldsymbol{V}^\top \boldsymbol{V})^{-1} \boldsymbol{V}^\top\) is the orthogonal projection on \(\{\boldsymbol{v}_1,\dots,\boldsymbol{v}_p\}\) and \(\hat{\boldsymbol{y}} = P\boldsymbol{y}\).

Now we can recall Cochran theorem. Let \(\boldsymbol{Y}\sim\mathcal{N}(\boldsymbol{\mu},\sigma^2\mathbb{I}_d)\) for some \(\sigma>0\) and \(\boldsymbol{\mu}\). Consider sub-vector orthogonal spaces \(F_1,\dots,F_m\), with dimension \(d_i\). Let \(P_{F_i}\) be the orthogonal projection matrix on \(F_i\), then (i) vectors \(P_{F_1}\boldsymbol{X},\dots,P_{F_m}\boldsymbol{X}\) are independent, with respective distribution \(\mathcal{N}(P_{F_i}\boldsymbol{\mu},\sigma^2\mathbb{I}_{d_i})\) and (ii) random variables \(\|P_{F_i}(\boldsymbol{X}-\boldsymbol{\mu})\|^2/\sigma^2\) are independent and \(\chi^2_{d_i}\) distributed.

We can try to visualize those results. For instance, the orthogonal projection of a random vector has a Gaussian distribution. Consider a two-dimensional Gaussian vector

```
library(mnormt)
r = .7
s1 = 1
s2 = 1
Sig = matrix(c(s1^2,r*s1*s2,r*s1*s2,s2^2),2,2)
Sig
Y = rmnorm(n = 1000,mean=c(0,0),varcov = Sig)
plot(Y,cex=.6)
vu = seq(-4,4,length=101)
vz = outer(vu,vu,function (x,y) dmnorm(cbind(x,y),
mean=c(0,0), varcov = Sig))
contour(vu,vu,vz,add=TRUE,col='blue')
abline(a=0,b=2,col="red")
```

Consider now the projection of points \(\boldsymbol{y}=(y_1,y_2)\) on the straight linear with directional vector \(\overrightarrow{\boldsymbol{u}}\) with slope \(a\) (say \(a=2\)). To get the projected point \(\boldsymbol{x}=(x_1,x_2)\) recall that \(x_2=ay_1\) and \(\overrightarrow{\boldsymbol{x},\boldsymbol{y}}\perp\overrightarrow{\boldsymbol{u}}\). Hence, the following code will give us the orthogonal projections

```
p = function(a){
x0=(Y[,1]+a*Y[,2])/(1+a^2)
y0=a*x0
cbind(x0,y0)
}
```

with

```
P = p(2)
for(i in 1:20) segments(Y[i,1],Y[i,2],P[i,1],P[i,2],lwd=4,col="red")
points(P[,1],P[,2],col="red",cex=.7)
```

Now, if we look at the distribution of points on that line, we get… a Gaussian distribution, as expected,

```
z = sqrt(P[,1]^2+P[,2]^2)*c(-1,+1)[(P[,1]>0)*1+1]
vu = seq(-6,6,length=601)
vv = dnorm(vu,mean(z),sd(z))
hist(z,probability = TRUE,breaks = seq(-4,4,by=.25))
lines(vu,vv,col="red")
```

Or course, we can use the matrix representation to get the projection on \(\overrightarrow{\boldsymbol{u}}\), or a normalized version of that vector actually

```
a=2
U = c(1,a)/sqrt(a^2+1)
U
[1] 0.4472136 0.8944272
matP = U %*% solve(t(U) %*% U) %*% t(U)
matP %*% Y[1,]
[,1]
[1,] -0.1120555
[2,] -0.2241110
P[1,]
x0 y0
-0.1120555 -0.2241110
```

(which is consistent with our manual computation). Now, in Cochran theorem, we start with independent random variables,

`Y = rmnorm(n = 1000,mean=c(0,0),varcov = diag(c(1,1)))`

Then we consider the projection on \(\overrightarrow{\boldsymbol{u}}\) and \(\overrightarrow{\boldsymbol{v}}=\overrightarrow{\boldsymbol{u}}^\perp\)

```
U = c(1,a)/sqrt(a^2+1)
matP1 = U %*% solve(t(U) %*% U) %*% t(U)
P1 = Y %*% matP1
z1 = sqrt(P1[,1]^2+P1[,2]^2)*c(-1,+1)[(P1[,1]>0)*1+1]
V = c(a,-1)/sqrt(a^2+1)
matP2 = V %*% solve(t(V) %*% V) %*% t(V)
P2 = Y %*% matP2
z2 = sqrt(P2[,1]^2+P2[,2]^2)*c(-1,+1)[(P2[,1]>0)*1+1]
```

We can plot those two projections

`plot(z1,z2)`

and observe that the two are indeed, independent Gaussian variables. And (of course) there squared norms are \(\chi^2_{1}\) distributed.

**leave a comment**for the author, please follow the link and comment on their blog:

**R-english – Freakonometrics**.

R-bloggers.com offers

**daily e-mail updates**about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.