EngWiPy

While I was reading about linear regression, I stumbled on the concept of the R-squared statistic, which measures the goodness of fit of the line to the data points. It is defined as:

[tex]R^2 = 1 - \frac{\sum_i (y_i - f(x_i))^2}{\sum_i (y_i - E[y])^2}[/tex]

where f(x_i) is the fitted/predicted response value for x_i, y_i is the actual observed response value, and E[y] is the expected value (mean) of the observed responses {y_i}.
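To make the definition concrete, here is a minimal sketch in plain Python that evaluates the formula directly. The function name `r_squared` and the toy data are my own assumptions for illustration, and the sample mean stands in for E[y].

```python
def r_squared(y, y_pred):
    """Evaluate R^2 = 1 - SS_res / SS_tot from the definition above."""
    # Sample mean of the observed responses (assumed stand-in for E[y]).
    y_mean = sum(y) / len(y)
    # Numerator: squared differences between observations and predictions.
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, y_pred))
    # Denominator: squared differences between observations and their mean.
    ss_tot = sum((yi - y_mean) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


# Toy data, purely illustrative.
y = [1.0, 2.0, 3.0, 4.0]

# Predictions that match the observations exactly.
perfect = r_squared(y, y)

# Predictions that are just the mean of y for every point.
baseline = r_squared(y, [2.5] * len(y))

print(perfect, baseline)
```

In this toy case, predictions equal to the observations make the numerator zero, while predicting the mean of y for every point makes the numerator equal to the denominator.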

It is said that this statistic falls between 0 and 1. I can understand why R-squared could be 1 (it means that y_i = f(x_i) for every i, i.e., the line fits the data points exactly), but how could R-squared be 0? This implies, I think, that the largest variation to consider is the variation of y around its mean, and thus the numerator cannot exceed the denominator. Is this true?

Thanks