# OLS regression - using an assumption as the proof?

1. Jun 19, 2009

### musicgold

Hi,

My question is about a common procedure used to find minimum and maximum values of a function. In many problems we find the first derivative of a function and then equate it to zero. I understand the use of this method when one is trying to find the minimum or maximum value of the function.

However, I get confused when I see people using that ‘equating to 0’ step as a proof of something else.

To better explain my question, I have attached a file here. The file has equations used in deriving the coefficients of a least-square regression line.

The OLS method starts with the partial differentiation of equation 3.1.2, then equates the derivatives to 0 and solves them to get the coefficients. I get it up to this point.

However, in the last section, to prove that the sum of the residuals is 0, the author uses terms from partial differentiation as the proof.

I don’t understand how an assumption can be used as the proof for something.

Thanks,

MG.

#### Attached Files:

• regression eqn.doc (47 KB)
2. Jun 20, 2009

### HallsofIvy

Staff Emeritus
I looked at your attachment. I do not see any "assumption used as a proof". What "assumption" are you talking about?

3. Jun 22, 2009

### musicgold

HallsofIvy,

Thanks.

Equation 1, in the middle section of the attachment, is a partial derivative of Eqn 3.1.2 (in the top section), with respect to β1. Then Eqn 1, along with Eqn 2, is equated to zero to get the values of β1 and β2 (estimates).

Isn’t equating Eqns 1 and 2 to zero an assumption used only to get the values of β1 and β2?
And if that is an assumption, why is it being used to prove u = 0, as in the last section of the attachment?

MG.

4. Jun 23, 2009

### statdat

Is there some other page where the author(s) show that "the mean value of the residuals is zero", as they state on the bottom portion of the page you attached? If not, the writing is poor.

However, nothing here is really circular: you have

$$S(\hat{\beta}_1, \hat{\beta}_2) = \sum_{i=1}^n \left(Y_i - \hat{\beta}_1 - \hat{\beta}_2 X_i\right)^2$$

and you want to find the $$\hat{\beta}_1, \hat{\beta}_2$$ pair that minimizes it. Since it is a very nice function (polynomial in two variables), the usual calculus-based methods can be used to find them. The first steps are to find the two partial derivatives, set them to zero, and solve - exactly what is discussed. Setting the partial derivatives to zero gives
\begin{align*} \frac{\partial S}{\partial \hat{\beta}_1} & = -2 \sum_{i=1}^n \left(Y_i - \hat{\beta}_1 - \hat{\beta}_2 X_i \right) = 0 \\ \frac{\partial S}{\partial \hat{\beta}_2} & = -2 \sum_{i=1}^n \left(Y_i - \hat{\beta}_1 - \hat{\beta}_2 X_i\right) X_i = 0 \end{align*}

The first equation of my final pair shows that the sum (and so the mean) of the residuals is zero, and the final equation corresponds to 3A.1 equation 2.
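The first-order conditions above can be checked numerically. Here is a minimal Python sketch with invented data (the values are illustrative, not from the attachment): it solves the two normal equations for the intercept $$\hat{\beta}_1$$ and slope $$\hat{\beta}_2$$ by hand, then confirms that the residuals sum to (numerically) zero.

```python
# Hypothetical data, chosen only for illustration.
n = 5
X = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [2.1, 3.9, 6.2, 8.1, 9.8]

sx = sum(X)
sy = sum(Y)
sxx = sum(x * x for x in X)
sxy = sum(x * y for x, y in zip(X, Y))

# The two normal equations (the partial derivatives set to zero):
#   n*b1  + sx*b2  = sy
#   sx*b1 + sxx*b2 = sxy
b2 = (n * sxy - sx * sy) / (n * sxx - sx * sx)  # slope
b1 = (sy - b2 * sx) / n                         # intercept

residuals = [y - (b1 + b2 * x) for x, y in zip(X, Y)]
print(b1, b2, sum(residuals))  # residual sum is zero up to floating point
```

The point is that the residual sum being zero is not an extra assumption: it is the first normal equation itself, evaluated at the solution.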

Last edited: Jun 23, 2009
5. Jun 23, 2009

### musicgold

Thanks.

I think that is what the author is trying to prove here (see the last line of my attachment).

Now back to my question.

Let me first explain what I understand.

By equating those partial derivative terms to zero, we are looking at the points where the surface or function reaches a maximum or minimum with respect to β1 and β2, and I understand that. Basically, we are using two known points on the function to find the values of two unknowns.

What I don’t understand is how we can use the same method (of equating a differential equation to zero) to prove that the mean value of residuals is zero.

I guess I am not able to interpret this geometrically. What are we trying to say here: at the point where the partial derivative w.r.t. β1 is zero, the expected value of the residual term is also zero, and therefore it is zero at all other points on the surface? (Not sure if I am making any sense here.)

Thanks,

MG.

6. Jun 23, 2009

### statdat

First, note that I had two typos in my earlier post (I have fixed them). I neglected to write $$X_i$$ in the two partial derivatives.

Now, the way this estimation is usually approached (that is, the way I learned it and the way I often teach it) is to say: OK, we have data, and we want to estimate the slope and intercept with least squares. Since every linear equation can be written in the form

$$Y = a + bX$$

let's try to find the values of $$a, b$$ that will minimize this expression:

$$S(a,b) = \sum_{i=1}^n \left(Y_i - (a + bX_i)\right)^2$$

- this is simply the sum of the squared vertical distances between the points and the line. We can find the values that minimize this with simple calculus.

\begin{align*} \frac{\partial S}{\partial a} & = -2 \sum_{i=1}^n \left(Y_i - (a + bX_i) \right) = 0\\ \frac{\partial S}{\partial b} & = -2 \sum_{i=1}^n \left(Y_i - (a + bX_i) \right)X_i = 0 \end{align*}

The solutions to these equations are found by simple algebra - these solutions are the $$\hat{\beta}_1$$ and $$\hat{\beta}_2$$ values. With these values, the first partial derivative above shows that

$$\left. \frac{\partial S}{\partial a}\right|_{a=\hat{\beta}_1, b = \hat{\beta}_2} = \sum_{i=1}^n \left(Y_i - \hat{\beta}_1 - \hat{\beta}_2 X_i \right) = 0$$

This is the point where we see that the sum of the least squares residuals equals zero. Since the sum of the residuals is zero, the mean (as in arithmetic mean) of the sample residuals is zero.
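Both first-order conditions can be seen at once with NumPy's least-squares line fit. The data below are invented for illustration; `np.polyfit` with degree 1 returns the slope and intercept, which play the role of $$\hat{\beta}_2$$ and $$\hat{\beta}_1$$ here.

```python
import numpy as np

# Hypothetical data, for illustration only.
X = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
Y = np.array([1.2, 2.9, 5.1, 6.8, 9.1])

# Degree-1 polyfit returns coefficients highest power first: [slope, intercept].
b, a = np.polyfit(X, Y, 1)
resid = Y - (a + b * X)

# Both first-order conditions hold at the least-squares coefficients:
print(abs(resid.sum()) < 1e-8)        # dS/da = 0  ->  plain residual sum is 0
print(abs((resid * X).sum()) < 1e-8)  # dS/db = 0  ->  X-weighted sum is 0
```

So the residuals summing to zero is a property of the fitted coefficients, guaranteed by the very equations that define them.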

This parallels a familiar property of the one-sample arithmetic mean: $$\bar x$$ is the value for which

$$\sum_{i=1}^n (x_i - \bar x)^2$$

is minimized, and as a consequence it is easy to show that

$$\sum_{i=1}^n (x_i - \bar x) = 0$$
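The same two facts can be verified in a few lines of Python (the numbers are arbitrary): deviations from the sample mean sum to zero, and the mean beats nearby candidate values at minimizing the sum of squared deviations.

```python
# Arbitrary sample, for illustration.
x = [3.0, 7.0, 8.0, 12.0]
xbar = sum(x) / len(x)  # 7.5

# Deviations from the mean sum to zero.
dev_sum = sum(xi - xbar for xi in x)
print(abs(dev_sum) < 1e-12)  # True

# The mean minimizes the sum of squared deviations
# (checked here only against two nearby values).
def sq(c):
    return sum((xi - c) ** 2 for xi in x)

print(sq(xbar) <= min(sq(xbar - 0.1), sq(xbar + 0.1)))  # True
```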

7. Jun 24, 2009

### EnumaElish

It is because the beta-hat coefficients are chosen such that the first-order derivatives equal zero.

In other words: the last equation is zero precisely because the beta-hat coefficients have been chosen (i.e. solved) to satisfy equations (1) and (2), as statdat has shown.

[BTW, is this a D.E. question? I would have put it under calculus.]

Last edited: Jun 24, 2009