OLS regression - using an assumption as the proof?

musicgold
Hi,

My question is about a common procedure used to find minimum and maximum values of a function. In many problems we find the first derivative of a function and then equate it to zero. I understand the use of this method when one is trying to find the minimum or maximum value of the function.

However, I get confused when I see people using that ‘equating to 0’ assumption as a proof for something else.

To better explain my question, I have attached a file here. The file has equations used in deriving the coefficients of a least-square regression line.

The OLS method starts by partially differentiating equation 3.1.2, then equating the derivatives to 0 and solving them to get the coefficients. I follow it up to this point.

However, in the last section, to prove that the sum of the residuals is 0, the author uses terms from partial differentiation as the proof.

I don’t understand how an assumption can be used as the proof for something.

Thanks,

MG.
 

I looked at your attachment. I do not see any "assumption used as a proof". What "assumption" are you talking about?
 
HallsofIvy,

Thanks.

Equation 1, in the middle section of the attachment, is a partial derivative of Eqn 3.1.2 (in the top section), with respect to β1. Then Eqn 1, along with Eqn 2, is equated to zero to get the values of β1 and β2 (estimates).

Isn’t equating Eqns 1 and 2 to zero an assumption used only to get the values of β1 and β2?
And if it is an assumption, why is it being used to prove that the sum of the residuals (the u terms) is 0, as in the last section of the attachment?

MG.
 
Is there some other page where the author(s) show that "the mean value of the residuals is zero", as they state on the bottom portion of the page you attached? If not, the writing is poor.

However, nothing is really circular: you have

S(\hat{\beta}_1, \hat{\beta}_2) = \sum_{i=1}^n \left(Y_i - \hat{\beta}_1 - \hat{\beta}_2 X_i\right)^2

and you want to find the \hat{\beta}_1, \hat{\beta}_2 pair that minimizes it. Since it is a very nice function (polynomial in two variables), the usual calculus-based methods can be used to find them. The first steps are to find the two partial derivatives, set them to zero, and solve - exactly what is discussed. Setting the partial derivatives to zero gives
\begin{align*}
\frac{\partial S}{\partial \hat{\beta}_1} & = -2 \sum_{i=1}^n \left(Y_i - \hat{\beta}_1 - \hat{\beta}_2 X_i \right) = 0 \\
\frac{\partial S}{\partial \hat{\beta}_2} & = -2 \sum_{i=1}^n \left(Y_i - \hat{\beta}_1 - \hat{\beta}_2 X_i\right) X_i = 0
\end{align*}

The first equation of my final pair shows that the sum (and so the mean) of the residuals is zero, and the final equation corresponds to 3A.1 equation 2.
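
If a numerical check helps, here is a minimal sketch (assuming NumPy is available; the data values and variable names are purely illustrative, not from the attachment) that solves the two normal equations above and confirms that the resulting residuals sum to zero:

```python
import numpy as np

# Illustrative data, assumed for the example
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1])
n = len(x)

# The two normal equations (the partial derivatives set to zero):
#   n*b1       + (sum x)*b2   = sum y
#   (sum x)*b1 + (sum x^2)*b2 = sum x*y
A = np.array([[n,       x.sum()],
              [x.sum(), (x**2).sum()]])
rhs = np.array([y.sum(), (x * y).sum()])
b1_hat, b2_hat = np.linalg.solve(A, rhs)

residuals = y - (b1_hat + b2_hat * x)
print(residuals.sum())  # ~0, up to floating-point rounding
```

The sum comes out as (numerically) zero precisely because b1_hat and b2_hat were chosen to satisfy the first normal equation; it is a consequence of how the estimates are defined, not an extra assumption about the data.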
 
statdad,

Thanks.

statdad said:
Is there some other page where the author(s) show that "the mean value of the residuals is zero",

I think that is what the author is trying to prove here (see the last line of my attachment).

Now back to my question.


Let me first explain what I understand here.

By equating those partial derivative terms to zero, we are looking at the points where the surface or function reaches a maximum or minimum with respect to β1 and β2, and I understand that. Basically, we are using two known points on the function to find the values of two unknowns.

What I don’t understand is how we can use the same method (of setting a derivative to zero) to prove that the mean value of the residuals is zero.

I guess I am not able to interpret this geometrically. What are we trying to say here: that at the point where the partial derivative w.r.t. β1 is zero, the expected value of the residual term is also zero, and that therefore it is zero at all other points on the surface? (Not sure if I am making sense here.)

Thanks,

MG.
 
First, note that I had two typos in my earlier post (I have fixed them). I neglected to write X_i in the two partial derivatives.

Now, the way this estimation is usually approached (that is, the way I learned it and the way I often teach it) is to say: OK, we have data, and we want to estimate the slope and intercept with least squares. Since every linear equation can be written in the form

Y = a + bx

let's try to find the values of a, b that will minimize this expression:

S(a,b) = \sum_{i=1}^n \left(Y_i - (a + bX_i)\right)^2

- this is simply the sum of the squared vertical distances between the points and the line. We can find the values that minimize this with simple calculus.

\begin{align*}
\frac{\partial S}{\partial a} & = -2 \sum_{i=1}^n \left(Y_i - (a + bX_i) \right) = 0 \\
\frac{\partial S}{\partial b} & = -2 \sum_{i=1}^n \left(Y_i - (a + bX_i) \right) X_i = 0
\end{align*}

The solutions to these equations are found by simple algebra - these solutions are the \hat{\beta}_1 and \hat{\beta}_2 values. With these values, the first partial derivative above shows that

\left. \frac{\partial S}{\partial a}\right|_{a=\hat{\beta}_1,\, b = \hat{\beta}_2} = -2 \sum_{i=1}^n \left(Y_i - \hat{\beta}_1 - \hat{\beta}_2 X_i \right) = 0

This is the point where we see that the sum of the least squares residuals equals zero. Since the sum of the residuals is zero, the mean (as in arithmetic mean) of the sample residuals is zero.
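
For completeness, the "simple algebra" referred to above (standard results, not taken from the attached page) gives the familiar closed-form estimates

\hat{\beta}_2 = \frac{\sum_{i=1}^n (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^n (X_i - \bar{X})^2}, \qquad \hat{\beta}_1 = \bar{Y} - \hat{\beta}_2 \bar{X}

Substituting \hat{\beta}_1 = \bar{Y} - \hat{\beta}_2 \bar{X} back into the sum makes the cancellation explicit:

\sum_{i=1}^n \left(Y_i - \hat{\beta}_1 - \hat{\beta}_2 X_i\right) = n\bar{Y} - n\hat{\beta}_1 - \hat{\beta}_2\, n\bar{X} = n\left(\bar{Y} - \hat{\beta}_1 - \hat{\beta}_2 \bar{X}\right) = 0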

This is similar to a property of the one-sample arithmetic mean: \bar{x} has the property that

\sum_{i=1}^n (x_i - \bar{x})^2

is minimized, and as a consequence it is easy to show that

\sum_{i=1}^n (x_i - \bar{x}) = 0
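
The same one-line argument used for the regression coefficients applies here (a standard calculation, added for clarity): treat the sum of squares as a function of a candidate centre c,

f(c) = \sum_{i=1}^n (x_i - c)^2, \qquad f'(c) = -2 \sum_{i=1}^n (x_i - c) = 0 \;\Rightarrow\; c = \bar{x}

so the minimizing value c = \bar{x} satisfies \sum_{i=1}^n (x_i - \bar{x}) = 0 by construction, which is exactly the sense in which the least squares residuals sum to zero.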

Did I get closer to answering your question this time?
 
musicgold said:
What I don’t understand is how we can use the same method (of setting a derivative to zero) to prove that the mean value of the residuals is zero

It is because the beta-hat coefficients are chosen such that the first-order derivatives equal zero.

In other words: the last equation is zero precisely because the beta-hat coefficients have been chosen (i.e. solved) to satisfy equations (1) and (2), as statdad has shown.

[BTW, is this a D.E. question? I would have put it under calculus.]
 
statdad and EnumaElish,

Thanks a lot. It is clear to me now.

MG.
 