Coefficient of Determination in case of repeat points, in linear regression

AI Thread Summary
In simple and multiple linear regression, the coefficient of determination is defined as R² = SS_Reg / SS_Total = 1 - SS_Res / SS_Total. R² is always at most 1, and it is strictly less than 1 when there are repeat points, that is, two or more distinct response values observed at the same regressor value. For R² to equal 1 the fitted line must pass through every data point, which is impossible when different responses share one regressor value, because the line assigns a single fitted value to each x. The least-squares line also passes through the point of sample averages, and a case analysis shows that it cannot fit all of the repeated points at once. Thus the presence of repeat points with differing responses keeps R² strictly below 1.
maverick280857
Hello,

In simple linear regression (or even in multiple linear regression), how does one prove that the coefficient of determination, given by

$$R^2 = \frac{SS_{Reg}}{SS_{Total}} = 1-\frac{SS_{Res}}{SS_{Total}}= 1-\frac{\sum_{i=1}^{n}(y_i-\hat{y}_i)^2}{\sum_{i=1}^{n}(y_i-\overline{y})^2}$$

is strictly less than 1 if there are repeat points? That is, if there are multiple, distinct values of the response ##y_i## at one value of the regressor ##x_i##?

Thanks in advance.
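To make the two expressions for R² concrete, here is a minimal pure-Python sketch for the simple-regression case. The function name `ols_r_squared` is hypothetical; it assumes an ordinary least-squares fit with an intercept, a non-constant regressor, and SS_Total > 0 (the y values are not all equal).

```python
def ols_r_squared(x, y):
    """Hypothetical helper: fit y = a + b*x by ordinary least squares.

    Returns (a, b, r2_reg, r2_res) where
      r2_reg = SS_Reg / SS_Total
      r2_res = 1 - SS_Res / SS_Total
    Assumes x is not constant and y is not constant (SS_Total > 0).
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Closed-form least-squares slope and intercept.
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = y_bar - b * x_bar
    y_hat = [a + b * xi for xi in x]
    # The three sums of squares from the definition of R^2.
    ss_total = sum((yi - y_bar) ** 2 for yi in y)
    ss_reg = sum((yh - y_bar) ** 2 for yh in y_hat)
    ss_res = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))
    return a, b, ss_reg / ss_total, 1.0 - ss_res / ss_total


a, b, r2_reg, r2_res = ols_r_squared([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])
print(a, b, r2_reg, r2_res)
```

For a least-squares fit with an intercept, SS_Total = SS_Reg + SS_Res, so the two returned values agree; for the data above both come out as 0.64 up to floating-point rounding.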
 
Wouldn't a general proof be sufficient?
 
Well, it is easy to see that ##R^2 \leq 1##. For the repeat-point case, I want to show that ##R^2 < 1##.
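Spelling out the easy direction, and what strictness would require (assuming the ##y_i## are not all equal, so that ##SS_{Total} > 0##):

$$SS_{Res} = \sum_{i=1}^{n}(y_i-\hat{y}_i)^2 \geq 0 \quad\Longrightarrow\quad R^2 = 1-\frac{SS_{Res}}{SS_{Total}} \leq 1,$$

with equality exactly when ##SS_{Res} = 0##, i.e. when ##\hat{y}_i = y_i## for every ##i##. So proving ##R^2 < 1## amounts to showing that at least one residual is nonzero.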
 
Ah, thanks for pointing that out.

I have the outline of a heuristic proof. For R^2 = 1 the regression line has to pass through all data points. Also, as a general matter, the least-squares regression line y = a + b x has to go through the point of sample averages of (X, Y); that is, mean(Y) = a + b mean(X). Write Y(x) = a + b x for the fitted value at x. Suppose your data are {(x1, y1), (x1, y2), (x2, y3)} with y1 not equal to y2, and your slope coefficient b is finite (-∞ < b < +∞).
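A quick numerical check of this setup, as a sketch: the specific numbers (x1 = 0 repeated with y1 = 0 and y2 = 2, and x2 = 2 with y3 = 3) and the helper name `fit_line` are hypothetical choices for illustration, not part of the original post.

```python
# Hypothetical data matching the pattern {(x1, y1), (x1, y2), (x2, y3)}:
# x1 = 0 appears twice with different responses, x2 = 2 once.
xs = [0.0, 0.0, 2.0]
ys = [0.0, 2.0, 3.0]


def fit_line(x, y):
    """Hypothetical helper: least-squares fit of y = a + b*x, returns (a, b)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx
    return y_bar - b * x_bar, b


a, b = fit_line(xs, ys)
x_bar, y_bar = sum(xs) / len(xs), sum(ys) / len(ys)

# The fitted line passes through (mean(X), mean(Y)).
through_means = abs((a + b * x_bar) - y_bar) < 1e-12

# Both observations at x1 share this single fitted value Y(x1).
y_hat_x1 = a + b * xs[0]

ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(xs, ys))
ss_total = sum((yi - y_bar) ** 2 for yi in ys)
r2 = 1.0 - ss_res / ss_total
print(a, b, y_hat_x1, r2)
```

With these numbers the fit is y = 1 + x, the common fitted value at x1 = 0 is 1 (halfway between y1 = 0 and y2 = 2), so neither repeat point is hit and R² = 4/7 ≈ 0.571.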

If b(mean(X) - x1) equals mean(Y) - y1, then Y(x1) = y1, and the regression line does not pass through (x1, y2).

If b(mean(X) - x1) equals mean(Y) - y2, then Y(x1) = y2, and the regression line does not pass through (x1, y1).

If b(mean(X) - x1) equals neither mean(Y) - y1 nor mean(Y) - y2, then the regression line passes through neither (x1, y1) nor (x1, y2).

In every case at least one residual is nonzero, so SS_Res > 0 and R^2 < 1.
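The three cases can be folded into a single bound. This is a sketch that assumes only that the two observations at ##x_1## receive the same fitted value ##c = a + b x_1##, which holds for any fitted line (and, in multiple regression, for any fit whose fitted values depend only on the regressor vector). Since the other terms of ##SS_{Res}## are nonnegative,

$$SS_{Res} \geq (y_1-c)^2+(y_2-c)^2 = \frac{(y_1-y_2)^2}{2} + 2\left(c-\frac{y_1+y_2}{2}\right)^2 \geq \frac{(y_1-y_2)^2}{2} > 0,$$

and because ##y_1 \neq y_2## also guarantees ##SS_{Total} > 0##,

$$R^2 = 1-\frac{SS_{Res}}{SS_{Total}} \leq 1-\frac{(y_1-y_2)^2}{2\,SS_{Total}} < 1.$$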
 