Multiple linear regression: partial F-test

In summary, when testing the significance of adding two additional predictor variables to a multiple linear regression model with three existing independent variables, the partial F-test for the coefficients of the added variables is equivalent to testing whether the increase in R^2 for the larger model is statistically significant. Because R^2 always increases as more predictors are added, the question is whether this increase is due to chance or whether at least one of the new coefficients is non-zero. The test statistic is F = [(R^2_full - R^2_reduced) / (5 - 3)] / [(1 - R^2_full) / (n - 5 - 1)], where R^2_full is the R^2 for the model with all five predictors and R^2_reduced is the R^2 for the model with the original three predictors.
  • #1
kingwinner
"Suppose that in a MULTIPLE linear regression analysis, it is of interest to compare a model with 3 independent variables to a model with the same response varaible and these same 3 independent variables plus 2 additional independent variables.
As more predictors are added to the model, the coefficient of multiple determination (R^2) will increase, so the model with 5 predicator variables will have a higher R^2.
The partial F-test for the coefficients of the 2 additional predictor variables (H_o: β_4=β_5=0) is equivalent to testing that the increase in R^2 is statistically signifcant."


I don't understand the bolded sentence. Why are they equivalent?

Thanks for explaining!
 
  • #2
Mathematically [tex] R^2 [/tex] will increase whether or not the new variables contribute to the model. Because of this, the question in practice is whether the larger [tex] R^2 [/tex] is due simply to the math (this corresponds to [tex] H_0 \colon \beta_4 = \beta_5 = 0 [/tex]) or whether the increase occurs because at least one of the two new coefficients is non-zero (this is the alternative hypothesis). If [tex] H_0 [/tex] is rejected, we know at least one coefficient is non-zero, and we also know that the increase in [tex] R^2 [/tex] is due to something other than mere chance.
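If it helps to see this numerically, here is a minimal sketch in plain NumPy (the simulated data and variable names are my own illustration, not part of the problem) showing that [tex] R^2 [/tex] cannot decrease when predictors are added, even when the extra predictors are pure noise:

[code]
import numpy as np

rng = np.random.default_rng(0)
n = 100

# Simulated data: y depends only on x1, x2, x3; x4 and x5 are pure noise.
X = rng.normal(size=(n, 5))
y = 1.0 + 2.0 * X[:, 0] - 1.5 * X[:, 1] + 0.5 * X[:, 2] + rng.normal(size=n)

def r_squared(y, X_sub):
    """Fit OLS by least squares and return R^2 = 1 - SSE/SSTO."""
    design = np.column_stack([np.ones(len(y)), X_sub])  # add intercept
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    sse = resid @ resid
    ssto = np.sum((y - y.mean()) ** 2)
    return 1.0 - sse / ssto

print(r_squared(y, X[:, :3]))  # reduced model: x1, x2, x3
print(r_squared(y, X[:, :5]))  # full model: x1, ..., x5 (x4, x5 are noise)
# The second value is always >= the first, even if only slightly.
[/code]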

Does this help, or were you looking for a mathematical explanation?
 
  • #3
Do you have a mathematical explanation for that?

The statement claims that the test of H_0: β_4 = β_5 = 0 is equivalent to testing that the increase in R^2 is statistically significant. What would be the equivalent null and alternative hypotheses in terms of R^2?

Thanks!
 
  • #4
Suppose you have a total of five predictor variables (since you reference [tex] \beta_4, \beta_5 [/tex]).

We want to test

[tex]
\begin{align*}
H_0 \colon & \beta_4 = \beta_5 = 0 \\
H_a \colon & \text{At least one of } \beta_4, \beta_5 \ne 0
\end{align*}
[/tex]

The test begins with the fitting of a full and a reduced model:

[tex]
Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4 + \beta_5 x_5 \tag{Full}
[/tex]

[tex]
Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 \tag{Reduced}
[/tex]

Denote the sum of squares for error in the full model by [tex] SSE(F) = SSE(x_1, x_2, x_3, x_4, x_5) [/tex], and the sum of squares for error in the reduced model by [tex] SSE(R) = SSE(x_1, x_2, x_3) [/tex].

Since we use more variables in the full model than in the reduced model, we will see [tex] SSE(F) < SSE(R) [/tex]. The test statistic for the above hypotheses is

[tex]
F = \frac{\bigl(SSE(R) - SSE(F)\bigr) / \bigl[(n-4) - (n-6)\bigr]}{SSE(F) / (n-6)}
[/tex]

In the old days (to be read as "when statdad was in school") the numerator of this statistic was written as

[tex]
SSE(R) - SSE(F) = SSE(X_1, X_2, X_3) - SSE(X_1, X_2, X_3, X_4, X_5) = SSR(X_4, X_5 \mid X_1, X_2, X_3)
[/tex]

Think of the last notation ("sum of squares Reduction") as denoting the reduction in error variation from adding [tex] x_4, x_5 [/tex] to a model that already contains the other three variables. The test is done by comparing F to the appropriate tables.
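If you want to see the computation end to end, here is a minimal sketch in plain NumPy/SciPy (the simulated data, variable names, and the use of scipy's F distribution for the p-value are my own illustration, not from the original post):

[code]
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 100
X = rng.normal(size=(n, 5))
y = 1.0 + 2.0 * X[:, 0] - 1.5 * X[:, 1] + 0.5 * X[:, 2] + rng.normal(size=n)

def sse(y, X_sub):
    """Error sum of squares for an OLS fit with an intercept."""
    design = np.column_stack([np.ones(len(y)), X_sub])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    return resid @ resid

sse_reduced = sse(y, X[:, :3])  # SSE(R) = SSE(x1, x2, x3), df = n - 4
sse_full = sse(y, X[:, :5])     # SSE(F) = SSE(x1, ..., x5), df = n - 6

# Partial F statistic for H0: beta_4 = beta_5 = 0
F = ((sse_reduced - sse_full) / ((n - 4) - (n - 6))) / (sse_full / (n - 6))
p_value = stats.f.sf(F, dfn=2, dfd=n - 6)  # compare against F(2, n - 6)
print(F, p_value)
[/code]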

How is this related to [tex] R^2 [/tex]? Not directly; it is related to something called the coefficient of partial determination. The first bit of notation is this:

[tex]
r^2_{Y45.123}
[/tex]

In the subscript, the symbols to the left of the "." are the dependent variable and the "number labels" of the variables being added to the model, while the numbers to the right of the "." are the "number labels" of the variables already in the model. The coefficient of partial determination is calculated as

[tex]
r^2_{Y45.123} = \frac{SSR(X_4, X_5 \mid X_1, X_2, X_3)}{SSE(X_1, X_2, X_3)}
[/tex]

Technically, this measures the percentage reduction in error sum of squares that results when we move from the model with 3 variables to the model with all 5 variables.
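Continuing the numeric sketch from the F-test above (this reuses sse_reduced and sse_full from that block, so it is not standalone), the coefficient of partial determination is simply

[code]
# r^2_{Y45.123}: proportionate reduction in SSE when x4, x5 join x1, x2, x3
r2_partial = (sse_reduced - sse_full) / sse_reduced
print(r2_partial)
[/code]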

When the F-test referred to above is significant ([tex] H_0 [/tex] is rejected), this coefficient of partial determination indicates a significant change in [tex] R^2 [/tex].

Hope this helped.
 
  • #5
Thanks!

R^2 = regression SS/total SS

F = [(R^2_full - R^2_reduced) / (5 - 3)] / [(1 - R^2_full) / (n - 5 - 1)],
where R^2_full is the R^2 with 5 independent variables and R^2_reduced is the R^2 with 3 independent variables.

Based on this form of the F statistic, can we say that the partial F-test for the coefficients of the 2 additional predictor variables (H_0: β_4 = β_5 = 0) is equivalent to testing that the increase in R^2 is statistically significant?
 
  • #6
Yes - good job.
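Spelling out the algebra (this step is my addition, but it only uses the definitions already in this thread): both models have the same response, so they share the same total sum of squares SSTO, and for each model SSE = (1 - R^2) SSTO. Substituting into the SSE form of the statistic,

[tex]
F = \frac{\bigl(SSE(R) - SSE(F)\bigr)/2}{SSE(F)/(n-6)}
= \frac{\bigl[(1 - R^2_{red}) - (1 - R^2_{full})\bigr]\,SSTO / 2}{(1 - R^2_{full})\,SSTO / (n-6)}
= \frac{(R^2_{full} - R^2_{red})/(5-3)}{(1 - R^2_{full})/(n-5-1)}
[/tex]

so rejecting [tex] H_0 \colon \beta_4 = \beta_5 = 0 [/tex] with the partial F-test is exactly the same as declaring the increase in [tex] R^2 [/tex] statistically significant.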
 

What is multiple linear regression?

Multiple linear regression is a statistical method used to model the relationship between multiple independent variables and a single dependent variable. It is commonly used in data analysis and prediction to understand how changes in the independent variables affect the dependent variable.

What is a partial F-test in multiple linear regression?

A partial F-test is a statistical test used to determine the significance of adding or removing a variable from a multiple linear regression model. It compares the fit of a model with the variable in question to the fit of a model without the variable, to see if the variable adds any significant explanatory power to the model.

How is a partial F-test conducted in multiple linear regression?

To conduct a partial F-test, the full model (including the variables in question) is fitted and its sum of squared errors (SSE) is calculated. The model is then refitted without those variables, and the new SSE is calculated. The partial F statistic is the drop in SSE, divided by the number of variables being tested, divided again by the mean squared error of the full model; it is then compared to a critical F-value determined by the significance level and the degrees of freedom.
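As an illustration of the procedure, here is a minimal sketch using statsmodels (the simulated data and variable names are assumptions for the example; compare_f_test carries out the partial F-test described above):

[code]
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 100
X = rng.normal(size=(n, 5))          # five candidate predictors
y = 1.0 + 2.0 * X[:, 0] - 1.5 * X[:, 1] + 0.5 * X[:, 2] + rng.normal(size=n)

full = sm.OLS(y, sm.add_constant(X)).fit()            # all five predictors
reduced = sm.OLS(y, sm.add_constant(X[:, :3])).fit()  # first three only

# Partial F-test of H0: the coefficients of the two extra predictors are zero
f_stat, p_value, df_diff = full.compare_f_test(reduced)
print(f_stat, p_value, df_diff)
[/code]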

What is the purpose of a partial F-test in multiple linear regression?

The purpose of a partial F-test is to determine whether or not a variable should be included in a multiple linear regression model. If the partial F-test shows that the variable adds significant explanatory power to the model, it should be included in the final model. If not, it can be removed to simplify the model without a meaningful loss of explanatory power.

What are some limitations of the partial F-test in multiple linear regression?

One limitation of the partial F-test is that it assumes the residuals of the model are normally distributed. If the residuals are not normally distributed, the results of the test may be inaccurate. Additionally, the test only evaluates the terms actually included in the model, so interactions between variables are not accounted for unless interaction terms are added explicitly.
