Regression SS in multiple linear regression

  • #1
kingwinner
In MULTIPLE linear regression, is it still true that the regression sum of squares is equal to [tex]\sum (\hat Y_i - \bar Y)^2[/tex]?

My textbook defines the regression SS in the chapters on simple linear regression as [tex]\sum (\hat Y_i - \bar Y)^2[/tex], and then in the chapters on multiple linear regression the regression SS is defined in MATRIX form, and it does not say anywhere whether it is still equal to [tex]\sum (\hat Y_i - \bar Y)^2[/tex] or not, so I am confused...

If it is still equal to [tex]\sum (\hat Y_i - \bar Y)^2[/tex] in MULTIPLE linear regression (such a simple formula), what is the whole point of expressing the regression SS in terms of matrices in multiple linear regression? I don't see the point when the formula [tex]\sum (\hat Y_i - \bar Y)^2[/tex] is already so simple. There is no need to develop additional headaches...

Thanks for explaining!
 
  • #2
I think you have notation (and/or terms) confused. In simple linear regression

[tex]
\begin{align*}
SSTO & = \sum(Y_i - \bar Y)^2 \\
SSE & = \sum (Y_i - \hat Y_i)^2 \\
SSR & = SSTO - SSE = \sum (\hat Y_i - \bar Y)^2
\end{align*}
[/tex]

In multiple linear regression, with matrix notation,

[tex]
\begin{align*}
SSTO & = \mathbf{Y}'\mathbf{Y} - n \bar{Y}^2 \quad(=\sum (Y_i - \bar Y)^2)\\
SSE & = \hat{e}' \hat{e} = \mathbf{Y}' \mathbf{Y} - \hat{\boldsymbol{\beta}}' \mathbf{X}' \mathbf{Y} \quad (=\sum (Y_i - \hat Y_i)^2) \\
SSR & = SSTO - SSE = \hat{\boldsymbol{\beta}}' \mathbf{X}' \mathbf{Y} - n \bar{Y}^2 \quad (=\sum (\hat Y_i - \bar Y)^2)
\end{align*}
[/tex]
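As a quick numerical sanity check, here is a minimal sketch using numpy and made-up data (the variable names are illustrative, not from any textbook):

[code]
import numpy as np

# Illustrative data: n = 20 observations, an intercept column plus 2 predictors.
rng = np.random.default_rng(0)
n = 20
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
Y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)  # solve the normal equations
Y_hat = X @ beta_hat
Y_bar = Y.mean()

# Matrix forms, as above
SSTO = Y @ Y - n * Y_bar**2
SSE = Y @ Y - beta_hat @ X.T @ Y
SSR = beta_hat @ X.T @ Y - n * Y_bar**2

# They agree with the summation forms
assert np.isclose(SSTO, np.sum((Y - Y_bar) ** 2))
assert np.isclose(SSE, np.sum((Y - Y_hat) ** 2))
assert np.isclose(SSR, np.sum((Y_hat - Y_bar) ** 2))
[/code]

The last assertion is exactly the OP's question: with an intercept in the model, SSR is still [tex]\sum (\hat Y_i - \bar Y)^2[/tex].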

The matrix approach isn't here simply to cause confusion: in multiple linear regression the "nice" approach of drawing pictures to represent things breaks down. However, a little linear algebra can be used to describe exactly why the residuals sum to zero, why the different quantities have different degrees of freedom, and to provide convenient ways to generate tests. (There are many theorems that describe the probability distribution of quadratic forms of multivariate normal vectors; using matrices in multiple regression allows these theorems to be used to develop hypothesis tests.)

On a more basic level: imagine trying to derive the normal equations (to estimate the regression coefficients) by algebra rather than via the matrix approach. It isn't fun.
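In matrix form, by contrast, the normal equations collapse to a single line:

[tex]
\mathbf{X}'\mathbf{X}\,\hat{\boldsymbol{\beta}} = \mathbf{X}'\mathbf{Y} \quad\Longrightarrow\quad \hat{\boldsymbol{\beta}} = \left(\mathbf{X}'\mathbf{X}\right)^{-1}\mathbf{X}'\mathbf{Y}
[/tex]

(assuming [tex]\mathbf{X}'\mathbf{X}[/tex] is invertible, i.e. the columns of [tex]\mathbf{X}[/tex] are linearly independent).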

As one more example:

The fitted values in multiple regression can be written as

[tex]
\hat Y = X \left(X'X\right)^{-1} X' Y \equiv P_V Y
[/tex]

where [tex] P_V = X \left(X'X\right)^{-1} X' [/tex] is a projection matrix onto the space spanned by the columns of [tex] X [/tex].
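Both standard properties of an orthogonal projection matrix (symmetry and idempotency) are easy to check numerically; a sketch reusing X, Y, and Y_hat from the snippet above:

[code]
P_V = X @ np.linalg.inv(X.T @ X) @ X.T  # projection onto the column space of X
assert np.allclose(P_V, P_V.T)          # symmetric: P_V' = P_V
assert np.allclose(P_V @ P_V, P_V)      # idempotent: projecting twice = projecting once
assert np.allclose(P_V @ Y, Y_hat)      # P_V Y reproduces the fitted values
[/code]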

The residuals are

[tex]
\hat e = Y - \hat Y = \left(I - X \left(X'X\right)^{-1} X'\right) Y \equiv P_{\hat V} Y
[/tex]

where [tex] P_{\hat V} = I - X \left(X'X\right)^{-1} X' [/tex] is the projection onto the space orthogonal to the column space of [tex] X [/tex].

Now, using the symmetry ([tex]P_{\hat V}' = P_{\hat V}[/tex]) and idempotency ([tex]P_{\hat V}^2 = P_{\hat V}[/tex]) of projection matrices,

[tex]
\hat e' \hat Y = Y' P_{\hat V}' \left(I - P_{\hat V}\right) Y = Y' \left(P_{\hat V} - P_{\hat V}^2\right) Y = Y' \left(P_{\hat V} - P_{\hat V}\right) Y = 0
[/tex]

or, in short,

[tex]
\sum \hat{e}_i \hat{y}_i = 0
[/tex]

just as in simple linear regression.
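The same orthogonality can be confirmed numerically with the sketch from above:

[code]
e_hat = Y - Y_hat
assert np.isclose(e_hat @ Y_hat, 0.0)  # residuals orthogonal to fitted values
assert np.isclose(e_hat.sum(), 0.0)    # residuals sum to zero (intercept column present)
[/code]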
 
  • #3
Do you know how to prove that

[tex]
SSE = S_{yy} - S_{xy}^2 / S_{xx} = S_{yy} - \hat{\beta}_1^2 S_{xx}
[/tex]
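
A standard derivation, for reference: in simple linear regression [tex]\hat Y_i = \bar Y + \hat\beta_1 (X_i - \bar X)[/tex] with [tex]\hat\beta_1 = S_{xy}/S_{xx}[/tex], so

[tex]
\begin{align*}
SSE &= \sum (Y_i - \hat Y_i)^2 = \sum \left[(Y_i - \bar Y) - \hat\beta_1 (X_i - \bar X)\right]^2 \\
&= S_{yy} - 2\hat\beta_1 S_{xy} + \hat\beta_1^2 S_{xx} = S_{yy} - \hat\beta_1^2 S_{xx} = S_{yy} - S_{xy}^2 / S_{xx}
\end{align*}
[/tex]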
 

1. What is Regression SS in multiple linear regression?

Regression SS (sum of squares) in multiple linear regression is a statistical measure that indicates the amount of variation in the dependent variable that is explained by the independent variables in the model. It is a measure of the overall predictive power of the regression model.

2. How is Regression SS calculated in multiple linear regression?

Regression SS is the difference between the total sum of squares (the sum of squared differences between the actual values of the dependent variable and their mean) and the error sum of squares (the sum of squared differences between the actual values and the predicted values from the regression model). See the formula below.
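In symbols, matching the notation in post #2:

[tex]
SSR = SSTO - SSE = \sum (\hat Y_i - \bar Y)^2
[/tex]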

3. What does a high Regression SS indicate in multiple linear regression?

A high Regression SS indicates that the independent variables in the regression model are able to explain a significant amount of the variation in the dependent variable. This suggests that the model has good predictive power and is a good fit for the data.

4. Can Regression SS be negative in multiple linear regression?

No, Regression SS cannot be negative in multiple linear regression. It is a sum of squares, [tex]\sum (\hat Y_i - \bar Y)^2[/tex], so it is always nonnegative. Equivalently, for a model with an intercept, SSR = SSTO − SSE ≥ 0: a least-squares fit can never do worse than simply using the mean of the dependent variable to make predictions.

5. How is Regression SS used to evaluate the performance of a multiple linear regression model?

Regression SS is used in conjunction with other statistics, such as the coefficient of determination (R-squared) and the F-statistic, to evaluate the performance of a multiple linear regression model. A higher R-squared and a significant F-statistic indicate a strong model with good predictive power, while a low R-squared and a non-significant F-statistic suggest that the model may not be a good fit for the data.
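For reference, with [tex]p[/tex] predictors plus an intercept fit to [tex]n[/tex] observations, these standard summaries are

[tex]
R^2 = \frac{SSR}{SSTO}, \qquad F = \frac{SSR/p}{SSE/(n-p-1)}
[/tex]

where the F-statistic tests whether all [tex]p[/tex] slope coefficients are simultaneously zero.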
