Proof request for best linear predictor

Tags: Probability theory
Summary
The discussion centers on proving Theorem 5.2, which states that the best linear predictor of Y based on X can be expressed in terms of means, variances, and covariance. Key definitions include the regression function E(Y|X=x) and the concept of linear predictors, which are functions of X. The expected quadratic prediction error is introduced as a measure to compare the effectiveness of different predictors. Theorem 5.1 establishes that the regression function is the optimal predictor when the second moment of Y is finite. The thread emphasizes the need for clarity in applying these definitions and theorems to prove the main theorem.
psie
TL;DR
In An Intermediate Course in Probability by Gut, there's a theorem stated without proof concerning best linear predictors. I was wondering if anyone knows how to prove it, or knows of other sources where it is proved.
Maybe this is a simple exercise, but I don't see how to prove the theorem below with the tools given in the section (if it is possible at all).

Theorem 5.2. Suppose that ##EX^2<\infty## and ##EY^2<\infty##. Set \begin{align*}\mu_x&=EX, \\ \mu_y&=EY, \\ \sigma_x^2&=\operatorname{Var}X,\\ \sigma_y^2&=\operatorname{Var}Y, \\ \sigma_{xy}&=\operatorname{Cov}(X,Y), \\ \rho&=\sigma_{xy}/(\sigma_x\sigma_y).\end{align*} The best linear predictor of ##Y## based on ##X## is $$L(X)=\alpha+\beta X,$$where ##\alpha=\mu_y-\frac{\sigma_{xy}}{\sigma_x^2}\mu_x=\mu_y-\rho\frac{\sigma_y}{\sigma_x}\mu_x## and ##\beta=\frac{\sigma_{xy}}{\sigma_x^2}=\rho\frac{\sigma_y}{\sigma_x}##.
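As a quick numerical sanity check of these formulas (a minimal sketch, assuming NumPy; the simulated model ##Y = 2 + 3X + \varepsilon## below is made up for illustration), the moment-based ##\alpha## and ##\beta## should agree with an ordinary least-squares fit:

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Simulate dependent data: Y = 2 + 3X + noise, so the true line is known.
x = rng.normal(loc=1.0, scale=2.0, size=100_000)
y = 2.0 + 3.0 * x + rng.normal(scale=0.5, size=x.size)

# Theorem 5.2: beta = Cov(X, Y) / Var(X), alpha = mu_y - beta * mu_x.
beta = np.cov(x, y, ddof=0)[0, 1] / x.var()
alpha = y.mean() - beta * x.mean()

# Compare with an ordinary least-squares fit of y on x.
b_ls, a_ls = np.polyfit(x, y, deg=1)

print(f"Theorem 5.2:   alpha={alpha:.4f}, beta={beta:.4f}")
print(f"Least squares: alpha={a_ls:.4f}, beta={b_ls:.4f}")
```

With this model, both pairs come out near ##\alpha=2##, ##\beta=3##.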

Theorem 5.2 above is the one I'm looking to prove. Below I state the definitions and the theorem given in the section leading up to it. As in the book, we confine ourselves to conditioning on a random variable, although the definitions and theorems extend to conditioning on a random vector.

Definition 5.1. The function ##h(x)=E(Y\mid X=x)## is called the regression function of ##Y## on ##X##.

Definition 5.2. A predictor (for ##Y##) based on ##X## is a function ##d(X)## of ##X##. The predictor is called linear if ##d## is linear.

Definition 5.3. The expected quadratic prediction error is $$E(Y-d(X))^2.$$ Moreover, if ##d_1## and ##d_2## are predictors, we say that ##d_1## is better than ##d_2## if ##E(Y-d_1(X))^2\leq E(Y-d_2(X))^2##.
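To make Definition 5.3 concrete with a simple worked case: among constant predictors ##d(X)=c##, the error decomposes as $$E(Y-c)^2=E\big((Y-\mu_y)+(\mu_y-c)\big)^2=\sigma_y^2+(\mu_y-c)^2,$$ since the cross term vanishes; it is minimized at ##c=\mu_y##, with minimum ##\sigma_y^2##. Theorem 5.2 improves on this by letting the predictor depend on ##X##.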

Theorem 5.1. Suppose that ##EY^2<\infty##. Then ##h(X)=E(Y\mid X)## (i.e., the regression function of ##Y## on ##X##) is the best predictor of ##Y## based on ##X##.
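For reference, here is a sketch of one standard route to Theorem 5.2 (an outline under the stated moment assumptions, not necessarily Gut's intended argument): bypass Theorem 5.1 and minimize the expected quadratic prediction error of Definition 5.3 directly over all linear predictors. Set $$Q(a,b)=E(Y-a-bX)^2,$$ which is finite and quadratic in ##(a,b)## since ##EX^2<\infty## and ##EY^2<\infty##. Setting the partial derivatives to zero gives \begin{align*}\frac{\partial Q}{\partial a}&=-2E(Y-a-bX)=0 &&\implies a=\mu_y-b\mu_x,\\ \frac{\partial Q}{\partial b}&=-2E\big(X(Y-a-bX)\big)=0 &&\implies E(XY)-a\mu_x-bEX^2=0.\end{align*} Substituting ##a=\mu_y-b\mu_x## into the second equation yields $$\sigma_{xy}-b\sigma_x^2=0\implies b=\frac{\sigma_{xy}}{\sigma_x^2},$$ so ##a=\alpha## and ##b=\beta## as in Theorem 5.2, and convexity of ##Q## guarantees this critical point is the global minimum. Equivalently, ##L(X)## is the orthogonal projection of ##Y## onto the subspace of ##L^2## spanned by ##1## and ##X##.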
 