# Coefficient of Determination in case of repeat points, in linear regression

1. Feb 4, 2010

### maverick280857

Hello,

In simple linear regression (or even in multiple linear regression) how does one prove that the coefficient of determination, given by

$$R^2 = \frac{SS_{Reg}}{SS_{Total}} = 1-\frac{SS_{Res}}{SS_{Total}}= 1-\frac{\sum_{i=1}^{n}(y_i-\hat{y}_i)^2}{\sum_{i=1}^{n}(y_i-\overline{y})^2}$$

is strictly less than 1 if there are repeat points? That is, if the response takes two or more different values $y_i$ at the same value of the regressor $x_i$?
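To make the question concrete, here is a minimal Python sketch that computes $R^2$ from the definition above for a small data set with one repeat point. It is not from the thread. The data values and the helper names `fit_line`, `r_squared` and `pure_error_ss` are hypothetical. The sketch uses `fractions.Fraction` so the arithmetic is exact, and it assumes integer data with at least two distinct $x$ values. The "pure error" sum is the spread of the responses around their own mean at each repeated $x$. It is a lower bound on $SS_{Res}$ for any fit, which caps $R^2$ below 1.

```python
from collections import defaultdict
from fractions import Fraction


def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x with exact rational arithmetic (integer data assumed)."""
    n = len(xs)
    x_bar = Fraction(sum(xs), n)
    y_bar = Fraction(sum(ys), n)
    s_xx = sum((x - x_bar) ** 2 for x in xs)  # zero (ZeroDivisionError below) if all x are equal
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = s_xy / s_xx
    a = y_bar - b * x_bar
    return a, b


def r_squared(xs, ys):
    """R^2 = 1 - SS_Res / SS_Total for the least-squares line."""
    a, b = fit_line(xs, ys)
    y_bar = Fraction(sum(ys), len(ys))
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


def pure_error_ss(xs, ys):
    """Sum of squared deviations of each y from the mean of the y's sharing its x."""
    groups = defaultdict(list)
    for x, y in zip(xs, ys):
        groups[x].append(y)
    total = Fraction(0)
    for g in groups.values():
        m = Fraction(sum(g), len(g))
        total += sum((y - m) ** 2 for y in g)
    return total


xs = [1, 1, 2, 3]  # x = 1 is a repeat point
ys = [1, 3, 4, 8]  # with two different responses there
r2 = r_squared(xs, ys)
y_bar = Fraction(sum(ys), len(ys))
bound = 1 - pure_error_ss(xs, ys) / sum((y - y_bar) ** 2 for y in ys)
print(r2, bound)  # 128/143 12/13
```

Here $R^2 = 128/143 \approx 0.895$ sits below the pure-error ceiling $12/13$, and both are below 1 because the two responses at $x = 1$ differ.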

2. Feb 4, 2010

### EnumaElish

Wouldn't a general proof be sufficient?

3. Feb 4, 2010

### maverick280857

Well, it is easy to see that $R^2 \leq 1$. For the repeat-point case, I want to show that $R^2 < 1$.

4. Feb 5, 2010

### EnumaElish

Ah; thanks for pointing that out.

I have the outline of a heuristic proof. For $R^2 = 1$, the regression line has to pass through every data point. Also, as a general matter, a least-squares line $y = a + bx$ fitted with an intercept passes through the point of sample averages $(\overline{X}, \overline{Y})$; that is, $\overline{Y} = a + b\overline{X}$, so the fitted value at $x_1$ is $\hat{Y}(x_1) = a + bx_1 = \overline{Y} - b(\overline{X} - x_1)$. Suppose your data are $\{(x_1, y_1), (x_1, y_2), (x_2, y_3)\}$ with $y_1 \neq y_2$, and your slope coefficient is finite, $-\infty < b < +\infty$.
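The two facts used in this argument can be checked numerically. Below is a minimal Python sketch with hypothetical values $x_1 = 1$, $x_2 = 2$, $y_1 = 1$, $y_2 = 3$, $y_3 = 5$, using exact `Fraction` arithmetic. It fits the least-squares line and confirms that the line passes through $(\overline{X}, \overline{Y})$. It also confirms that the fitted value at $x_1$ equals $\overline{Y} - b(\overline{X} - x_1)$. For these numbers, that single fitted value matches neither $y_1$ nor $y_2$.

```python
from fractions import Fraction

# Hypothetical numbers for the pattern {(x1, y1), (x1, y2), (x2, y3)} with y1 != y2.
x1, x2 = 1, 2
y1, y2, y3 = 1, 3, 5
xs = [x1, x1, x2]
ys = [y1, y2, y3]

# Ordinary least-squares slope and intercept.
n = len(xs)
x_bar = Fraction(sum(xs), n)
y_bar = Fraction(sum(ys), n)
b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum((x - x_bar) ** 2 for x in xs)
a = y_bar - b * x_bar


def line(x):
    """Fitted value a + b*x."""
    return a + b * x


print(line(x_bar) == y_bar)                # True: the line passes through the sample means
print(line(x1) == y_bar - b * (x_bar - x1))  # True: the identity used for the fitted value at x1
print(line(x1))                            # 2: one fitted value at x1
print(line(x1) in (y1, y2))                # False: the line misses both (x1, y1) and (x1, y2)
```

Whatever the data, the line takes exactly one value at $x_1$. So when $y_1 \neq y_2$, it can match at most one of them.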

If $b(\overline{X} - x_1) = \overline{Y} - y_1$, then $\hat{Y}(x_1) = y_1$, and the regression line does not pass through $(x_1, y_2)$.

If $b(\overline{X} - x_1) = \overline{Y} - y_2$, then $\hat{Y}(x_1) = y_2$, and the regression line does not pass through $(x_1, y_1)$.

If $b(\overline{X} - x_1)$ equals neither $\overline{Y} - y_1$ nor $\overline{Y} - y_2$, then the regression line passes through neither $(x_1, y_1)$ nor $(x_1, y_2)$.
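The same case analysis extends to any number of points. One standard argument, the "pure error" decomposition, is not stated in the thread and is given here only as a sketch. Group the observations by distinct regressor value. Write $y_{jk}$ for the $k$-th of the $n_j$ responses at the $j$-th distinct value and $\overline{y}_j$ for their mean. Write $\hat{y}_j$ for the single fitted value there. Because the fit assigns one value to each regressor value, every group satisfies

$$\sum_{k=1}^{n_j}(y_{jk}-\hat{y}_j)^2 = \sum_{k=1}^{n_j}(y_{jk}-\overline{y}_j)^2 + n_j(\overline{y}_j-\hat{y}_j)^2 \ge \sum_{k=1}^{n_j}(y_{jk}-\overline{y}_j)^2.$$

Summing over groups gives $SS_{Res} \ge SS_{PE}$, where $SS_{PE} = \sum_j \sum_k (y_{jk}-\overline{y}_j)^2$. If some group contains two different responses, then $SS_{PE} > 0$. Applying the same decomposition with $\hat{y}_j$ replaced by $\overline{y}$ shows that $SS_{Total} \ge SS_{PE} > 0$. Therefore

$$R^2 = 1-\frac{SS_{Res}}{SS_{Total}} \le 1-\frac{SS_{PE}}{SS_{Total}} < 1.$$

The argument relies only on the fitted value depending on the regressors alone, so it covers multiple linear regression as well, with repeat points meaning repeated regressor vectors.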

Last edited: Feb 5, 2010