How Does the Least Squares Estimator Minimize Error in Linear Regression?

squenshl · Jun 9, 2016

Homework Statement

Suppose that ##Y \sim N_n\left(X\beta,\sigma^2I\right)##, where the density function of ##Y## is
$$\frac{1}{\left(2\pi\sigma^2\right)^{\frac{n}{2}}}e^{-\frac{1}{2\sigma^2}(Y-X\beta)^T(Y-X\beta)},$$
and ##X## is an ##n\times p## matrix of rank ##p##.
Let ##\hat{\beta}## be the least squares estimator of ##\beta##.

Show that ##(Y-X\beta)^T(Y-X\beta) = \left(Y-X\hat{\beta}\right)^T(Y-X\hat{\beta})+\left(\hat{\beta}-\beta\right)^TX^TX\left(\hat{\beta}-\beta\right)## and therefore that ##\hat{\beta}## is the least squares estimate.
Hint: ##Y-X\beta = Y-X\hat{\beta}+X\hat{\beta}-X\beta##.
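
Before attempting the proof, the identity can be sanity-checked numerically. The sketch below is not part of the original problem: it assumes NumPy, uses made-up random data, and the names `Q`, `rhs`, and `trial_betas` are hypothetical helpers chosen for illustration. It computes ##\hat{\beta}## with `numpy.linalg.lstsq` and compares both sides of the identity for a few arbitrary values of ##\beta##.

```python
import numpy as np

# Made-up data for illustration only (not from the problem statement).
rng = np.random.default_rng(0)
n, p = 50, 3
X = rng.normal(size=(n, p))  # random Gaussian matrix: rank p with probability 1
beta_true = np.array([1.0, -2.0, 0.5])
Y = X @ beta_true + rng.normal(scale=0.3, size=n)

# Least squares estimate of beta.
beta_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)


def Q(b):
    """Left-hand side: (Y - X b)^T (Y - X b)."""
    r = Y - X @ b
    return r @ r


def rhs(b):
    """Right-hand side: Q(beta_hat) + (beta_hat - b)^T X^T X (beta_hat - b)."""
    d = beta_hat - b
    return Q(beta_hat) + d @ (X.T @ X) @ d


# Compare both sides at several arbitrary values of beta.
trial_betas = rng.normal(size=(5, p))
max_gap = max(abs(Q(b) - rhs(b)) for b in trial_betas)

# Cross term from expanding the hint, evaluated at the same values of beta.
cross_terms = [2 * (X @ (beta_hat - b)) @ (Y - X @ beta_hat) for b in trial_betas]
max_cross = max(abs(c) for c in cross_terms)

print("largest |LHS - RHS|:", max_gap)
print("largest |cross term|:", max_cross)
```

If both printed numbers are at rounding-error level, that is consistent with the identity holding for every ##\beta##, not only for special values. A numerical check like this is only a plausibility test and does not replace the algebraic proof.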

Homework Equations

The Attempt at a Solution

I have no idea where to start. Do I substitute the hint into ##(Y-X\beta)^T(Y-X\beta)## and expand out the brackets?

Please help!

andrewkirk · Jun 9, 2016

There seems to be something odd about how this problem is stated. It asks the student to assume that ##\hat\beta## is the least squares estimator of ##\beta##, and then to use that to prove that it is the least squares estimate. Are they trying to draw a distinction between estimator and estimate? If not, the problem is trivial. However, if we want to get very precise about terminology, I would have thought that an estimator is a function whereas the estimate is the result of the function. Is there some particular meaning of 'estimator' and 'estimate' that they are using in your course?

As to how to proceed to prove their formula: yes, substitution along the lines you mention sounds like a good way to start. You can rewrite the RHS of the hint as ##(Y-X\hat\beta)+X(\hat\beta-\beta)##. Expanding then gives a right-hand side equal to what they show above, plus
$$2(X(\hat\beta-\beta))^T(Y-X\hat\beta)$$
So this needs to be shown to be zero. However, it seems to me that this should be impossible, since it is a function of the unknown parameter vector ##\beta##, which can be changed without changing any of the other elements in the formula (##X, Y, \hat\beta##).

Are you sure there wasn't an expectation operator around that equation they want you to prove, or some other constraining condition?

squenshl · Jun 9, 2016

Nope that's the question asked.

Ray Vickson · Jun 10, 2016

Let ##Q(\beta) = (Y - X \beta)^T (Y - X \beta)##. If you write ##\beta = b + e##, you can expand ##Q(b+e)## as a quadratic in ##e##. It will have zero-order terms (not containing ##e##), first-order terms (linear in ##e##) and second-order terms (of the form ##e^T M e## for some matrix ##M##). However, if you choose ##b## correctly, the first-order terms in ##e## will vanish, leaving you with only zero-order and second-order terms in ##e##. That will happen when ##b = \hat{\beta}##. You will obtain the expression you are being asked to prove, where ##e = \beta - \hat{\beta}##.
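
The expansion described here can be sketched as follows. One assumption goes beyond what the thread states: that ##\hat{\beta}## satisfies the normal equations ##X^TX\hat{\beta} = X^TY##, which is the standard characterisation of the least squares estimator when ##X## has rank ##p##. With ##\beta = \hat{\beta} + e##, so that ##Y - X\beta = (Y - X\hat{\beta}) - Xe##,
$$Q(\hat{\beta}+e) = (Y-X\hat{\beta})^T(Y-X\hat{\beta}) - 2e^TX^T(Y-X\hat{\beta}) + e^TX^TXe.$$
Under the normal equations,
$$X^T(Y-X\hat{\beta}) = X^TY - X^TX\hat{\beta} = 0,$$
so the first-order term vanishes for every ##e##, that is, for every ##\beta##, even though it appears to depend on ##\beta##. Substituting ##e = \beta - \hat{\beta}## and noting that the quadratic form is unchanged when ##e## is replaced by ##-e## gives
$$(Y-X\beta)^T(Y-X\beta) = (Y-X\hat{\beta})^T(Y-X\hat{\beta}) + (\hat{\beta}-\beta)^TX^TX(\hat{\beta}-\beta).$$
The last term equals ##\|X(\hat{\beta}-\beta)\|^2 \ge 0##, and because ##X## has rank ##p## it is zero only when ##\beta = \hat{\beta}##. Hence ##Q(\beta) \ge Q(\hat{\beta})## with equality only at ##\beta = \hat{\beta}##, which is the minimising property the problem asks for.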
