Is this a good forecasting model?

  1. Dec 30, 2013 #1

    Please see the attached Excel file.

    I have a sample of 70 data pairs. The correlation between X and Y is -0.68. The OLS regression coefficient is statistically significant, as shown in the file. However, with an R^2 of 0.40, I am not sure whether my model is good enough to forecast Y.
    Can you please take a look?

  3. Dec 30, 2013 #2

  4. Dec 30, 2013 #3

    As shown in the Excel file, the regression analysis gets a very low p-value for the coefficient, so I know they are related (or not independent).

    Separately, I also calculated the t-statistic of the correlation coefficient, which was 6.7, i.e. there is a very low chance that the sample correlation value occurred randomly. So I am quite confident that there is a statistically significant relationship.

    What is making me nervous is the relatively low value of R^2 of the regression line. I am not sure how confident I should be about the predictions by this model.
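
    For reference, the t-statistic of a sample correlation r from n pairs is t = r*sqrt(n-2)/sqrt(1-r^2), and in a one-predictor OLS fit R^2 equals r^2. Below is a minimal Python sketch that plugs in the figures quoted in this thread; the function name is only illustrative and nothing here reads the attached Excel file.

```python
import math

def correlation_t_stat(r, n):
    """t-statistic for H0: rho = 0, given sample correlation r from n pairs."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)

n = 70                          # sample size quoted in post #1
r_quoted = -0.68                # correlation quoted in post #1
r_from_r2 = -math.sqrt(0.40)    # correlation implied by R^2 = 0.40

print("R^2 implied by r = -0.68:", round(r_quoted ** 2, 2))
print("t for r = -0.68:", round(correlation_t_stat(r_quoted, n), 2))
print("t for r = -sqrt(0.40):", round(correlation_t_stat(r_from_r2, n), 2))
```

    With these inputs, R^2 = 0.40 corresponds to |r| of about 0.63 and |t| of about 6.7, whereas r = -0.68 would imply R^2 of about 0.46, so one of the quoted figures may be rounded or mistyped. Either way, a large |t| only says the slope is unlikely to be zero; it does not say how tight the predictions are.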
  5. Dec 30, 2013 #4

    As you can see in the graph, knowing the x-value won't give a reliable prediction for y. It is better than not knowing the x-value (that's what the non-zero correlation tells you), but the spread of the data points is quite large.
  6. Dec 30, 2013 #5
    If you want to determine the predictive value of your model, set aside a portion of your data to use for validation (or collect new measurements and use them for validation). I agree with mfb that the model probably won't have a great deal of predictive value.

    You should be aware that the p- and t-values don't really allow you to say any of those things. Null hypothesis testing (especially the p-value) is very commonly misinterpreted; the Wikipedia article contains a list of common misconceptions that you may want to read.
  7. Dec 30, 2013 #6
    What do you mean by this?
  8. Dec 30, 2013 #7

    See the highlighted areas in the attached image - very similar x-values (within each of them), but a large variation in y. Your prediction can be something like "it is probably within that y-range", but not better than that.

    Attached Files: reg.png (5.7 KB)
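
    One way to put a number on "it is probably within that y-range" is a prediction interval around the fitted line. The sketch below is only an illustration: the data are synthetic stand-ins (not the values from the Excel file), the helper names are made up, and it uses a normal quantile in place of the Student t quantile, which is a reasonable approximation for around 70 points.

```python
import math
import random
from statistics import NormalDist, mean

def fit_ols(xs, ys):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

def prediction_interval(xs, ys, x_new, level=0.95):
    """Approximate prediction interval for a single new y observed at x_new."""
    n = len(xs)
    a, b = fit_ols(xs, ys)
    x_bar = mean(xs)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(sse / (n - 2))  # residual standard error
    # Standard error for one new observation (not for the mean response).
    se_pred = s * math.sqrt(1 + 1 / n + (x_new - x_bar) ** 2 / sxx)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # normal approximation
    y_hat = a + b * x_new
    return y_hat - z * se_pred, y_hat, y_hat + z * se_pred

# Synthetic stand-in data: a negative trend with a lot of scatter.
rng = random.Random(0)
xs = [rng.uniform(0, 10) for _ in range(70)]
ys = [20 - 1.0 * x + rng.gauss(0, 3.5) for x in xs]

low, y_hat, high = prediction_interval(xs, ys, x_new=5.0)
print(f"predicted y = {y_hat:.1f}, 95% interval roughly [{low:.1f}, {high:.1f}]")
```

    If the interval is nearly as wide as the whole range of y in the sample, the model adds little to a forecast even though the slope is clearly non-zero.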
  9. Dec 30, 2013 #8
    Got it. Thanks.
  10. Jan 2, 2014 #9
    So should I go back, take only 60 of the 74 points and run the regression analysis again and see how the new model predicts the Y values for the remaining 14 X-values?

    If yes, how should I go about selecting the 60 points? Randomly?
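
    A random split is the usual choice for the hold-out check suggested in post #5. The following is a rough sketch under stated assumptions: the data are synthetic placeholders for the 74 points, `holdout_rmse` is a hypothetical helper, and the comparison against "predict the training mean of y" is just one simple baseline.

```python
import math
import random
from statistics import mean

def fit_ols(xs, ys):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

def holdout_rmse(xs, ys, n_train, seed=0):
    """Fit on a random subset of n_train points and score on the rest.

    Returns (rmse_of_regression, rmse_of_mean_baseline) on the held-out points.
    """
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)  # random selection, reproducible via seed
    train, test = idx[:n_train], idx[n_train:]
    a, b = fit_ols([xs[i] for i in train], [ys[i] for i in train])
    y_train_mean = mean(ys[i] for i in train)
    model_sq = [(ys[i] - (a + b * xs[i])) ** 2 for i in test]
    base_sq = [(ys[i] - y_train_mean) ** 2 for i in test]
    return math.sqrt(mean(model_sq)), math.sqrt(mean(base_sq))

# Synthetic placeholder data standing in for the 74 (x, y) pairs.
rng = random.Random(1)
xs = [rng.uniform(0, 10) for _ in range(74)]
ys = [20 - 1.0 * x + rng.gauss(0, 3.5) for x in xs]

model_rmse, baseline_rmse = holdout_rmse(xs, ys, n_train=60)
print(f"hold-out RMSE, regression: {model_rmse:.2f}")
print(f"hold-out RMSE, mean only:  {baseline_rmse:.2f}")
```

    With only 14 held-out points a single split is noisy, so repeating this with several seeds (or using k-fold cross-validation) gives a steadier picture of how much the regression improves on the baseline.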