Undergrad Proof request for best linear predictor

The discussion centers on proving Theorem 5.2, which states that the best linear predictor of Y based on X can be expressed in terms of means, variances, and covariance. Key definitions include the regression function E(Y|X=x) and the concept of linear predictors, which are functions of X. The expected quadratic prediction error is introduced as a measure to compare the effectiveness of different predictors. Theorem 5.1 establishes that the regression function is the optimal predictor when the second moment of Y is finite. The thread emphasizes the need for clarity in applying these definitions and theorems to prove the main theorem.
psie
TL;DR
In An Intermediate Course in Probability by Gut, there's a theorem stated without proof concerning best linear predictors. I was wondering if anyone knows how to prove it, or knows of other sources where it is proved.
Maybe this is a simple exercise, but I don't see how to prove the below theorem with the tools I've been given in the section (if it is possible at all).

Theorem 5.2. Suppose that ##EX^2<\infty## and ##EY^2<\infty##. Set \begin{align*}\mu_x&=EX, \\ \mu_y&=EY, \\ \sigma_x^2&=\operatorname{Var}X,\\ \sigma_y^2&=\operatorname{Var}Y, \\ \sigma_{xy}&=\operatorname{Cov}(X,Y), \\ \rho&=\sigma_{xy}/(\sigma_x\sigma_y).\end{align*} The best linear predictor of ##Y## based on ##X## is $$L(X)=\alpha+\beta X,$$where ##\alpha=\mu_y-\frac{\sigma_{xy}}{\sigma_x^2}\mu_x=\mu_y-\rho\frac{\sigma_y}{\sigma_x}\mu_x## and ##\beta=\frac{\sigma_{xy}}{\sigma_x^2}=\rho\frac{\sigma_y}{\sigma_x}##.
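As a quick numerical sanity check of the statement (a sketch, not a proof), one can plug sample moments into the formulas for ##\alpha## and ##\beta## and confirm that nudging either coefficient never lowers the empirical quadratic error. The simulated model and the helper names `best_linear_coefficients` and `quadratic_error` are assumptions made for illustration; they do not come from the book.

```python
import random


def mean(values):
    return sum(values) / len(values)


def best_linear_coefficients(xs, ys):
    """Return (alpha, beta) from Theorem 5.2 with sample moments plugged in."""
    mu_x, mu_y = mean(xs), mean(ys)
    sigma_xy = mean([(x - mu_x) * (y - mu_y) for x, y in zip(xs, ys)])
    sigma_x2 = mean([(x - mu_x) ** 2 for x in xs])
    beta = sigma_xy / sigma_x2
    alpha = mu_y - beta * mu_x
    return alpha, beta


def quadratic_error(xs, ys, a, b):
    """Empirical version of E(Y - (a + bX))^2."""
    return mean([(y - (a + b * x)) ** 2 for x, y in zip(xs, ys)])


random.seed(0)
# Hypothetical model (an assumption): Y depends on X nonlinearly, plus noise.
xs = [random.gauss(1.0, 2.0) for _ in range(5000)]
ys = [3.0 - 0.5 * x + 0.25 * x * x + random.gauss(0.0, 1.0) for x in xs]

alpha, beta = best_linear_coefficients(xs, ys)
base = quadratic_error(xs, ys, alpha, beta)

# Errors on a small grid of perturbed coefficients around (alpha, beta).
perturbed = [
    quadratic_error(xs, ys, alpha + da, beta + db)
    for da in (-0.1, 0.0, 0.1)
    for db in (-0.1, 0.0, 0.1)
]
print(alpha, beta, base, min(perturbed))
```

The check only illustrates the claim on one simulated data set; it says nothing about why the formulas hold in general.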

That's the theorem that I'm looking to prove. Now I'll just state some definitions and a theorem that were given in the section prior to the above theorem. As done in the book, we confine ourselves to conditioning on a random variable, although the definitions and theorems extend to conditioning on a random vector.

Definition 5.1. The function ##h(x)=E(Y\mid X=x)## is called the regression function ##Y## on ##X##.

Definition 5.2. A predictor (for ##Y##) based on ##X## is a function, ##d(X)##. The predictor is called linear if ##d## is linear.

Definition 5.3. The expected quadratic prediction error is $$E(Y-d(X))^2.$$ Moreover, if ##d_1## and ##d_2## are predictors, we say that ##d_1## is better than ##d_2## if ##E(Y-d_1(X))^2\leq E(Y-d_2(X))^2##.
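To make the comparison in Definition 5.3 concrete, here is a minimal Monte Carlo sketch that estimates ##E(Y-d(X))^2## for two predictors. The model (##X## uniform on ##(0,1)##, ##Y=X^2+## noise), the predictors `d1` and `d2`, and the helper `expected_quadratic_error` are all hypothetical choices for illustration. For this model, `d1` is the regression function and `d2` is what the formulas of Theorem 5.2 give (##\beta=1##, ##\alpha=-1/6##).

```python
import random


def expected_quadratic_error(d, pairs):
    """Monte Carlo estimate of E(Y - d(X))^2 from simulated (x, y) pairs."""
    return sum((y - d(x)) ** 2 for x, y in pairs) / len(pairs)


random.seed(1)
# Hypothetical model (an assumption): X ~ U(0, 1), Y = X^2 + N(0, 0.1^2) noise.
pairs = []
for _ in range(20000):
    x = random.random()
    pairs.append((x, x * x + random.gauss(0.0, 0.1)))


def d1(x):
    # E(Y | X = x) for this model
    return x * x


def d2(x):
    # linear predictor with beta = 1, alpha = -1/6
    return x - 1.0 / 6.0


err1 = expected_quadratic_error(d1, pairs)
err2 = expected_quadratic_error(d2, pairs)

# In the sense of Definition 5.3, d1 is better than d2 when err1 <= err2.
d1_is_better = err1 <= err2
print(err1, err2, d1_is_better)
```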

Theorem 5.1. Suppose that ##EY^2<\infty##. Then ##h(X)=E(Y\mid X)## (i.e. the regression function ##Y## on ##X##) is the best predictor of ##Y## based on ##X##.
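
One possible route to Theorem 5.2, sketched here under the extra assumption ##\sigma_x^2>0## and not taken from the book, is to minimize the expected quadratic prediction error directly over linear predictors ##a+bX##. Writing ##Y-a-bX=(Y-\mu_y)-b(X-\mu_x)+(\mu_y-a-b\mu_x)## and expanding the square (the cross terms with the constant vanish because the first two terms have mean zero) gives
$$\begin{align*}E(Y-a-bX)^2&=\sigma_y^2-2b\sigma_{xy}+b^2\sigma_x^2+(\mu_y-a-b\mu_x)^2\\&=\sigma_x^2\left(b-\frac{\sigma_{xy}}{\sigma_x^2}\right)^2+\sigma_y^2-\frac{\sigma_{xy}^2}{\sigma_x^2}+(\mu_y-a-b\mu_x)^2.\end{align*}$$
Both squared terms are nonnegative and vanish exactly when ##b=\sigma_{xy}/\sigma_x^2## and ##a=\mu_y-b\mu_x##, which matches ##\beta## and ##\alpha## in the theorem. The remaining term is the minimal error, ##\sigma_y^2-\sigma_{xy}^2/\sigma_x^2=\sigma_y^2(1-\rho^2)##. Note that this argument uses only the finiteness of the second moments and does not need Theorem 5.1.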
 