Coefficient of Determination in case of repeat points, in linear regression

maverick280857
Hello,

In simple linear regression (or even in multiple linear regression) how does one prove that the coefficient of determination, given by

R^2 = \frac{SS_{Reg}}{SS_{Total}} = 1-\frac{SS_{Res}}{SS_{Total}}= 1-\frac{\sum_{i=1}^{n}(y_i-\hat{y}_i)^2}{\sum_{i=1}^{n}(y_i-\overline{y})^2}

is strictly less than 1 if there are repeat points, that is, if there are multiple distinct values of the response y_i at the same value of the regressor x_i?

Thanks in advance.
 
Wouldn't a general proof be sufficient?
 
Well, it is easy to see that R^2 \leq 1. For the repeat-point case, I want to show that R^2 < 1.
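
To spell out the easy part (a short sketch, assuming SS_{Total} > 0, i.e. the y_i are not all equal):

```latex
SS_{Res} = \sum_{i=1}^{n}(y_i-\hat{y}_i)^2 \geq 0
\quad\Longrightarrow\quad
R^2 = 1-\frac{SS_{Res}}{SS_{Total}} \leq 1,
\qquad
R^2 = 1 \iff y_i = \hat{y}_i \ \text{for all } i .
```

So the strict inequality amounts to showing that at least one residual is nonzero when there are repeat points.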
 
Ah; thanks for pointing that out.

I have the outline of a heuristic proof. For R^2 = 1, the regression line has to pass through every data point. Also, as a general matter, the least-squares line y = a + b x has to go through the sample means of (X, Y); that is, mean(Y) = a + b mean(X), so the fitted value at x1 is Y(x1) = mean(Y) - b(mean(X) - x1). Suppose your data are {(x1, y1), (x1, y2), (x2, y3)}, where y1 is not equal to y2 and x2 is not equal to x1, so that the slope coefficient b is finite.

If b(mean(X) - x1) equals mean(Y) - y1, then the fitted value Y(x1) = a + b x1 equals y1, and the regression line does not go through (x1, y2).

If b(mean(X) - x1) equals mean(Y) - y2, then Y(x1) = y2, and the regression line does not go through (x1, y1).

If b(mean(X) - x1) equals neither mean(Y) - y1 nor mean(Y) - y2, then the regression line goes through neither (x1, y1) nor (x1, y2).

In every case at least one residual is nonzero, so SS_Res > 0 and therefore R^2 < 1.
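
The three-point example can also be checked numerically. The following is a minimal sketch, not part of the argument itself: the data values and the helper name `r_squared` are made up for illustration, and the fit uses NumPy's `polyfit`.

```python
import numpy as np

def r_squared(x, y):
    """Fit y = a + b*x by ordinary least squares; return (a, b, ss_res, R^2)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    b, a = np.polyfit(x, y, 1)            # degree 1: slope first, then intercept
    y_hat = a + b * x                     # fitted values
    ss_res = np.sum((y - y_hat) ** 2)     # residual sum of squares
    ss_total = np.sum((y - y.mean()) ** 2)
    return a, b, ss_res, 1.0 - ss_res / ss_total

# Hypothetical data: two different responses at x1 = 1, one response at x2 = 2
x = [1.0, 1.0, 2.0]
y = [1.0, 3.0, 4.0]

a, b, ss_res, r2 = r_squared(x, y)
print("a =", a, "b =", b, "SS_Res =", ss_res, "R^2 =", r2)

# The two repeat points share one fitted value, so their residuals differ by
# y1 - y2 and cannot both vanish: SS_Res >= (y1 - y2)^2 / 2 > 0.
lower_bound = (y[0] - y[1]) ** 2 / 2.0
print("SS_Res >= (y1 - y2)^2 / 2 :", ss_res >= lower_bound - 1e-12)
```

For these numbers the fitted line is y = 2x, the residuals are -1, 1, 0, and R^2 = 4/7, which is strictly below 1. The bound in the last comment holds for any line, because the two residuals at x1 differ by exactly y1 - y2.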
 