F-test explained in layman's terms

  • Thread starter: jprockbelly
  • Tags: Terms
AI Thread Summary
The F-test evaluates whether at least one of the independent variables in a multiple linear regression model significantly contributes to predicting the dependent variable. It tests the null hypothesis that all coefficients are zero against the alternative that at least one is non-zero. If the null hypothesis is true, the best estimate for the mean value of the dependent variable is simply the sample mean. The discussion also explains how the F-statistic relates to the t-statistic, demonstrating that F equals the square of the t-statistic in cases with a single independent variable. Overall, the F-test helps determine if a model with variable means is more appropriate than one with a constant mean.
jprockbelly
Hello, I have created a multiple linear regression fit (using least squares) for a project. The regression has two independent variables, rainfall and time, and fits these to groundwater level. The regression was calculated automatically in Excel. I have been asked to report on the P-value and F-statistic for this regression (both generated automatically by the program). I have read some good explanations of the P-value, but cannot find any simple explanation of the F-test or F-statistic.

Can anyone provide, or recommend a simple explanation?
 
The F value is the value of the F statistic on a joint test of all indep. variable coefficients ("the betas") simultaneously being different from zero. With a single indep. var., the F test reduces to testing the indep. var. coefficient (the beta) being different from zero. This is the same hypothesis tested by the t Stat for the beta.

Try the following: drop one of your indep. variables; then verify that F = (t Stat)^2 and t Stat = SQRT(F).
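If you want to see that identity numerically without touching your Excel sheet, here is a minimal sketch in Python using plain least-squares formulas on synthetic data (the data and seed are arbitrary stand-ins; any single-predictor sample works):

```python
import numpy as np

# Simple linear regression (one predictor) by least squares, then
# check that the overall F-statistic equals the slope's t-stat squared.
rng = np.random.default_rng(0)
n = 30
x = rng.uniform(0, 10, n)
y = 2.0 + 0.5 * x + rng.normal(0, 1, n)   # synthetic example data

xbar, ybar = x.mean(), y.mean()
Sxx = ((x - xbar) ** 2).sum()
Sxy = ((x - xbar) * (y - ybar)).sum()
b1 = Sxy / Sxx                 # slope estimate
b0 = ybar - b1 * xbar          # intercept estimate

resid = y - (b0 + b1 * x)
SSE = (resid ** 2).sum()               # error sum of squares
SST = ((y - ybar) ** 2).sum()          # total sum of squares
SSR = SST - SSE                        # regression sum of squares

MSE = SSE / (n - 2)
t_stat = b1 / np.sqrt(MSE / Sxx)       # t statistic for the slope
F_stat = (SSR / 1) / MSE               # F statistic with (1, n-2) df

print(F_stat, t_stat ** 2)             # identical up to rounding
```

The equality F = t² is exact algebra in the one-predictor case, so the two printed numbers agree to machine precision.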
 
Remember that in regression you're investigating whether the independent variables provide any useful information for you to use in the prediction of the mean value of Y (this is a simplified comment, but we're talking about linear regression so it works).

The basic hypotheses for the F-test are
\begin{align*}
H_0 \colon & \beta_1 = \beta_2 = 0 \\
H_a \colon & \text{At least one coefficient is not zero}
\end{align*}

If the null hypothesis is true you're left with the result that the best way to estimate the mean value of Y is with the ordinary sample mean. If the alternative hypothesis is true then you can say your data indicates the mean value of Y is not constant but varies in a way consistent with your model.

In short: the F-test provides a way to distinguish which of two models (constant mean vs variable mean) best describes the variable Y.
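To make the two-model comparison concrete for a case like rainfall + time, here is a hedged sketch with synthetic stand-in data (the variable names and coefficients are invented for illustration; `scipy.stats.f.sf` supplies the p-value):

```python
import numpy as np
from scipy import stats

# Overall F-test for a two-predictor regression: compare the
# constant-mean model against the full model via sums of squares.
rng = np.random.default_rng(1)
n, k = 40, 2
rain = rng.uniform(0, 100, n)             # hypothetical rainfall
time = np.arange(n, dtype=float)          # hypothetical time index
level = 10 + 0.03 * rain - 0.05 * time + rng.normal(0, 1, n)

X = np.column_stack([np.ones(n), rain, time])   # design matrix
beta, *_ = np.linalg.lstsq(X, level, rcond=None)

resid = level - X @ beta
SSE = (resid ** 2).sum()                    # full model
SST = ((level - level.mean()) ** 2).sum()   # constant-mean model

F = ((SST - SSE) / k) / (SSE / (n - k - 1))
p_value = stats.f.sf(F, k, n - k - 1)       # upper-tail probability
print(F, p_value)
```

A large F (and small p-value) says the drop in error from using the predictors is too big to chalk up to chance, i.e. the variable-mean model wins.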
 
Thanks
 
I used to teach statistics to chemists (wannabe physicists that couldn't handle the math <G>) and found that a particular thought experiment that explored HOW the F-distribution might be generated was useful. It goes like this:

Suppose you have a very large container of ball bearings. You extract 3 bearings ("randomly") and measure their average weight. Now you extract 5 bearings and measure the average weight of those five. You calculate the ratio of the two averages. You continue this process of calculating 5 & 3 average ball-bearing weight ratios and build a histogram. After you do this an "infinite" number of times you have a facsimile of the F(5,3) distribution (sort of).

Now the key idea - I pull three ball bearings from my pocket. I ask you "what is the probability that those three ball bearings came from the big container?" The way you answer my question is to grab 5 bearings (at "random") from the big container and average their weight. That average weight is compared to the average weight of the three bearings I pulled from my pocket. The position of this "experimental" ratio is located on the histogram you so laboriously constructed. Since the histogram is a picture of a probability distribution, you can determine the probability that the three ball bearings were pulled from the big container. In other words, what are the chances that the big container could produce a 5-to-3 ratio like the one you measured using the three from my pocket?

Substitute residuals for ball bearings. The "big container" contains "random" error. Why square the residuals before consulting an F-distribution? Because squaring gets rid of negative values, which could otherwise cancel toward zero upon averaging.

Why not use 4th powers of residuals instead of squares? You could but you would have to construct the distribution - the distribution of squared values is already made for you.
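For what it's worth, the textbook construction of the F-distribution uses averages of *squared* deviations (sample variances) rather than raw averages, which lines up with the squared-residual step above. A minimal Monte Carlo sketch of that construction, assuming standard normal "ball bearings" (the sample sizes 6 and 4 give the 5 and 3 degrees of freedom):

```python
import numpy as np
from scipy import stats

# Monte Carlo sketch of F(5, 3) as a ratio of two independent sample
# variances from the same normal population -- the squared-residual
# version of the ball-bearing thought experiment.
rng = np.random.default_rng(2)
trials = 100_000

a = rng.normal(0, 1, (trials, 6))   # 6 draws -> sample variance, 5 df
b = rng.normal(0, 1, (trials, 4))   # 4 draws -> sample variance, 3 df
ratios = a.var(axis=1, ddof=1) / b.var(axis=1, ddof=1)

cutoff = stats.f.ppf(0.95, 5, 3)    # exact 95th percentile of F(5, 3)
frac = np.mean(ratios > cutoff)     # simulated upper-tail area
print(frac)                         # close to 0.05
```

The histogram of `ratios` is exactly the "laboriously constructed" picture from the thought experiment, and the tail area beyond any cutoff is the probability you read off it.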

I hope that helps a little. It is not rigorous but I do not think you are looking for rigor.
 
WACG, thanks for that. It actually helps a lot.
 
