Testing regression model with F-test

In summary, the F-test is used to check whether the regression model is significant: the explained sum of squares (ESS) has 1 degree of freedom, the residual sum of squares (RSS) has 77 - 2 = 75 degrees of freedom, and the F statistic is the ratio of the corresponding mean squares.
  • #1
Phoeniyx
Hey guys. I have some trouble understanding how the F-test is used for testing the viability of a regression model. Before I delve into the background/question, just wanted to post a link that discusses the topic briefly:
http://www.stat.yale.edu/Courses/1997-98/101/anovareg.htm

So, coming back to the question, let's say we have 77 data points [itex](x_{i}, y_{i})[/itex] and we try to fit them with a regression model of the form:
[itex]\hat{y}_{i} = A + Bx_{i}[/itex]

In the Yale example, A and B are calculated based on the 77 data points.

To check if the model is significant, we calculate the "explained sum of squares" (ESS), which is the sum of squared differences between the model estimates and the mean: [itex]ESS = \Sigma{(\hat{y}_{i} - \bar{y})^{2}}[/itex]

Then we calculate the "residual sum of squares" (RSS), which is the sum of squared differences between the actual data points and the model estimates: [itex]RSS = \Sigma{({y}_{i} - \hat{y}_{i})^{2}}[/itex]

The degrees of freedom for RSS is the number of data points minus the number of model parameters estimated from the data: 77 - 2 = 75. Perfectly fine with this.

BUT, apparently, the degrees of freedom for ESS is "1"... I do not get this. More on why I am confused later in the questions section.

The F statistic is [itex]F = \frac{ESS / DF_{ESS}}{RSS / DF_{RSS}}[/itex], where DF = degrees of freedom.

In the Yale link above, this calculates to [itex]8654.7/84.6 = 102.35[/itex]
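To make the bookkeeping concrete, here is a minimal numerical sketch (the data below are made up, not from the Yale page) that fits the line by least squares and computes ESS, RSS, and the F statistic exactly as defined above:

[code=python]
import numpy as np
from scipy import stats

# Made-up example data: 77 points with a genuine linear trend plus noise
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 77)
y = 2.0 + 0.5 * x + rng.normal(scale=1.0, size=x.size)

n = len(y)
# Least-squares estimates of B (slope) and A (intercept)
B = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
A = y.mean() - B * x.mean()
y_hat = A + B * x

ESS = np.sum((y_hat - y.mean()) ** 2)   # explained sum of squares, 1 df
RSS = np.sum((y - y_hat) ** 2)          # residual sum of squares, n - 2 = 75 df

F = (ESS / 1) / (RSS / (n - 2))
p_value = stats.f.sf(F, 1, n - 2)       # upper-tail probability under F(1, n-2)
print(F, p_value)
[/code]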

So, I have two questions:
1) Since the degrees of freedom of ESS is always "1", if I had 85 data points (instead of 77), the F-test value would be even larger, because ESS is not averaged; it is simply the sum of squares between the model values and the mean. For example, the 8654.7 above could become 9500, while the 84.6 (RSS/DF) could be a bit higher or lower but would likely stay around the same size (a larger sum of squares divided by roughly 85). Wouldn't this imply that the significance is a function of the number of data points tested on the model (as opposed to real-world observed data points)? Related to this question: must the number of (x, y) real-world observed points used for the RSS calculation (e.g. 77) be the same as the number of points used for the ESS calculation? For example, can ESS be based on 95 model trials while RSS is based on only the 77 real-world values?

2) I still don't understand (conceptually) why the F-test works to check whether the model parameters A and B are zero or not. Conceptually, how does this make sense?

Thanks very much everyone.
 
  • #2
Using the formulas for A and B, ESS reduces to a quantity that depends only on the estimate of B, i.e. it captures the variation of a single degree of freedom.
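To spell out the algebra behind that remark (a standard derivation, not spelled out in the thread): since the least-squares intercept is [itex]A = \bar{y} - B\bar{x}[/itex], the fitted values satisfy [itex]\hat{y}_{i} - \bar{y} = B(x_{i} - \bar{x})[/itex], so

[tex]ESS = \Sigma{(\hat{y}_{i} - \bar{y})^{2}} = B^{2}\,\Sigma{(x_{i} - \bar{x})^{2}}.[/tex]

ESS is therefore determined by the single estimated quantity B, which is why it has only 1 degree of freedom. Under the null hypothesis B = 0 (with normal errors), [itex]ESS/\sigma^{2}[/itex] is chi-squared with 1 degree of freedom, [itex]RSS/\sigma^{2}[/itex] is an independent chi-squared with n - 2 = 75 degrees of freedom, and the ratio of their mean squares follows an F(1, 75) distribution. That is also the conceptual answer to question 2: if B were really zero, observing an ESS that is large relative to RSS would be unlikely, so a large F value is evidence that B is nonzero.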
 
  • #3
Hi DrDu. I am sorry, but I am not understanding your response. Could you please elaborate a bit further? Thank you.
 

1. What is the purpose of using an F-test to test a regression model?

An F-test is used to determine whether there is a significant relationship between the independent variables and the dependent variable in a regression model. It helps to assess whether the model explains the data better than the mean alone, i.e. whether the independent variables jointly make a significant contribution to predicting the dependent variable.

2. How is an F-test calculated?

The F statistic is calculated as the ratio of the mean square for the regression (the explained sum of squares divided by its degrees of freedom) to the mean square of the residuals (the residual sum of squares divided by its degrees of freedom). This ratio is then compared to an F-distribution whose numerator degrees of freedom equal the number of independent variables in the model and whose denominator degrees of freedom equal the number of observations minus the number of estimated parameters.

3. What do the results of an F-test indicate?

If the calculated F-statistic is greater than the critical value from the F-distribution (at the chosen significance level), we reject the hypothesis that all the slope coefficients are zero: the independent variables jointly make a significant contribution to predicting the dependent variable. If the calculated F-statistic is less than the critical value, the data do not provide evidence that the model explains more variation than simply using the mean of the dependent variable.
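As an illustration of that comparison (a minimal sketch using SciPy, with the F value and degrees of freedom taken from the Yale example quoted in the thread):

[code=python]
from scipy import stats

F = 102.35               # F statistic from the Yale example
df_num, df_den = 1, 75   # ESS and RSS degrees of freedom

critical = stats.f.ppf(0.95, df_num, df_den)  # 5% critical value, about 3.97
p_value = stats.f.sf(F, df_num, df_den)       # upper-tail p-value

# F is far above the critical value (p-value far below 0.05),
# so the slope is judged significantly different from zero.
print(critical, p_value)
[/code]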

4. Can an F-test be used to compare two regression models?

Yes, an F-test can be used to compare two nested regression models (where one model contains a subset of the other's predictors). This is known as the extra sum of squares (or partial) F-test. It compares the reduction in the residual sum of squares achieved by the larger model and determines if this reduction is statistically significant.
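A minimal sketch of that comparison, assuming ordinary least squares fits via NumPy on hypothetical data (the variables and numbers below are made up for illustration):

[code=python]
import numpy as np
from scipy import stats

# Hypothetical data: one useful predictor x1 and one extra candidate x2
rng = np.random.default_rng(1)
n = 77
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.0 + 0.8 * x1 + rng.normal(size=n)

def rss(X, y):
    """Residual sum of squares from an OLS fit of y on the columns of X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return np.sum((y - X @ beta) ** 2)

ones = np.ones(n)
rss_reduced = rss(np.column_stack([ones, x1]), y)    # smaller model
rss_full = rss(np.column_stack([ones, x1, x2]), y)   # larger model

q = 1             # number of extra parameters in the larger model
df_full = n - 3   # n minus the number of parameters in the larger model
F = ((rss_reduced - rss_full) / q) / (rss_full / df_full)
p_value = stats.f.sf(F, q, df_full)
print(F, p_value)
[/code]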

5. Are there any limitations to using an F-test to test a regression model?

Yes, there are some limitations to using an F-test. It assumes that the errors in the model are normally distributed and that the variance of the errors is constant. It is also sensitive to outliers and can give misleading results if these assumptions are not met. Additionally, the overall F-test only assesses the joint significance of the predictors; it says nothing about how much of the variation the model actually explains (e.g. R²), and individual coefficients are assessed with separate t-tests.
