# Is this a good forecasting model?

1. Dec 30, 2013

### musicgold

Hi,

Please see the attached Excel file.

I have a sample of 70 data pairs. The correlation between X and Y is -0.68. The OLS regression coefficient is statistically significant, as shown in the file. However, with an R^2 of 0.40, I am not sure whether my model is good enough to forecast Y.
Can you please take a look?

Thanks.

#### Attached Files:

• ###### Regression example.xlsx
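A side note on how the two numbers in this post are connected: in a simple (one-predictor) OLS regression, R^2 is exactly the square of the sample correlation r. The sketch below uses only Python's standard library and small made-up data, not the attached spreadsheet; the function names are hypothetical. It computes the fitted line, R^2 as 1 - SS_res/SS_tot, and r, so the identity can be checked on any data set.

```python
from statistics import fmean


def ols_fit(xs, ys):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    x_bar, y_bar = fmean(xs), fmean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def r_squared(xs, ys):
    """Fraction of the variance of y explained by the fitted line."""
    a, b = ols_fit(xs, ys)
    y_bar = fmean(ys)
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


def pearson_r(xs, ys):
    """Sample (Pearson) correlation coefficient of xs and ys."""
    x_bar, y_bar = fmean(xs), fmean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


# Illustrative data only: r_squared(xs, ys) matches pearson_r(xs, ys) ** 2.
xs = [1, 2, 3, 4, 5]
ys = [2, 1, 4, 3, 5]
print(ols_fit(xs, ys), r_squared(xs, ys), pearson_r(xs, ys) ** 2)
```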
2. Dec 30, 2013

### Office_Shredder

Staff Emeritus
3. Dec 30, 2013

### musicgold

Thanks.

As shown in the Excel file, the regression analysis gives a very low p-value for the coefficient, so I know X and Y are related (i.e. not independent).

Separately, I also calculated the t-statistic of the correlation coefficient, which was 6.7, i.e. there is a very low chance that the sample correlation value occurred randomly. So I am quite confident that there is a statistically significant relationship.
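For reference, the usual t-statistic for testing "population correlation = 0" from a sample correlation r of n pairs is t = r * sqrt(n - 2) / sqrt(1 - r^2), compared against a t distribution with n - 2 degrees of freedom (assuming roughly normal data). In simple regression this is algebraically the same statistic as the t-value of the OLS slope, so it confirms the regression output rather than adding independent evidence. A minimal sketch; the function name is hypothetical:

```python
import math


def correlation_t_stat(r, n):
    """t-statistic for H0: rho = 0, given a sample correlation r from n pairs.

    Under H0 (and approximately normal data) it follows Student's t
    with n - 2 degrees of freedom.
    """
    if n < 3:
        raise ValueError("need at least 3 pairs")
    if not -1 < r < 1:
        raise ValueError("r must lie strictly between -1 and 1")
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


# Plug in your own r and n; a negative r gives a negative t.
print(correlation_t_stat(0.6, 11))
```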

What is making me nervous is the relatively low R^2 of the regression line. I am not sure how much confidence I should place in this model's predictions.

4. Dec 30, 2013

### Staff: Mentor

As you can see in the graph, knowing the x-value won't give a reliable prediction for y. It is better than not knowing the x-value (that's what the non-zero correlation tells you), but the spread of the data points is quite large.
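One way to put a number on "better than not knowing the x-value": in simple regression the residual sum of squares equals (1 - R^2) times the total sum of squares, so the typical scatter around the line is roughly sqrt(1 - R^2) times the standard deviation of Y (ignoring small degrees-of-freedom corrections). With R^2 = 0.40 that factor is about 0.77, i.e. knowing X shrinks the typical error by only about 23%. The sketch below (standard library; function name hypothetical) compares the two spreads directly:

```python
import math
from statistics import fmean, stdev


def spread_with_and_without_x(xs, ys):
    """Return (sd of y ignoring x, residual standard error of the OLS line)."""
    n = len(xs)
    x_bar, y_bar = fmean(xs), fmean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    intercept = y_bar - slope * x_bar
    ss_res = sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))
    # n - 2 degrees of freedom: intercept and slope were both estimated
    return stdev(ys), math.sqrt(ss_res / (n - 2))


sd_y, s_res = spread_with_and_without_x([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])
print(sd_y, s_res, s_res / sd_y)
```

If the second number is not much smaller than the first, the line does little to narrow down an individual Y value.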

5. Dec 30, 2013

### Number Nine

If you want to determine the predictive value of your model, set aside a portion of your data to use for validation (or collect new measurements and use them for validation). I agree with mfb that the model probably won't have a great deal of predictive value.
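A minimal sketch of the hold-out idea, assuming simple OLS and Python's standard library (the names `holdout_rmse`, `n_test` and `seed` are hypothetical): shuffle the indices, fit on the training part, and measure the root-mean-square error on the points the fit never saw. Comparing that RMSE with the standard deviation of Y shows how much the model actually helps out of sample.

```python
import math
import random
from statistics import fmean


def ols_fit(xs, ys):
    """Return (intercept, slope) of the least-squares line."""
    x_bar, y_bar = fmean(xs), fmean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    return y_bar - slope * x_bar, slope


def holdout_rmse(xs, ys, n_test, seed=0):
    """Fit on a random training subset; return RMSE on the n_test held-out points."""
    rng = random.Random(seed)  # fixed seed makes the split reproducible
    idx = list(range(len(xs)))
    rng.shuffle(idx)
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    a, b = ols_fit([xs[i] for i in train_idx], [ys[i] for i in train_idx])
    sq_errors = [(ys[i] - (a + b * xs[i])) ** 2 for i in test_idx]
    return math.sqrt(fmean(sq_errors))
```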

You should be aware that the p- and t-values don't really allow you to say any of those things. Null hypothesis testing (especially the p-value) is very commonly misinterpreted; the Wikipedia article contains a list of common misconceptions that you may want to read.
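To make the definition concrete: a p-value is the probability, assuming no association at all, of seeing a correlation at least as extreme as the observed one. It is not the probability that the result "occurred randomly", and it says nothing about how accurate predictions will be. A permutation test shows this directly, as a swapped-in illustration rather than the test used in the spreadsheet: shuffling Y destroys any X-Y link, so the shuffled correlations show what |r| looks like under the null. Sketch with hypothetical names, standard library only:

```python
import random
from statistics import fmean


def pearson_r(xs, ys):
    """Sample (Pearson) correlation coefficient."""
    x_bar, y_bar = fmean(xs), fmean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def permutation_p_value(xs, ys, n_perm=10_000, seed=0):
    """Two-sided permutation p-value for the correlation of xs and ys."""
    rng = random.Random(seed)
    observed = abs(pearson_r(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any pairing between x and y
        if abs(pearson_r(xs, shuffled)) >= observed:
            hits += 1
    # +1 counts the observed arrangement itself, so p is never exactly 0
    return (hits + 1) / (n_perm + 1)
```

A tiny p-value here only says the correlation is unlikely to be exactly zero; a weak but real relationship with large scatter can still produce one.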

6. Dec 30, 2013

### musicgold

What do you mean by this?

7. Dec 30, 2013

### Staff: Mentor

See the highlighted areas in the attached image: within each area the x-values are very similar, but y varies widely. Your prediction can be something like "it is probably within that y-range", but not better than that.
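The formal version of "it is probably within that y-range" is a prediction interval for a new observation at x0: y_hat(x0) +/- t* * s * sqrt(1 + 1/n + (x0 - x_bar)^2 / Sxx), where s is the residual standard error and t* the 97.5% quantile of Student's t with n - 2 degrees of freedom. The sketch below is standard-library Python with a hypothetical function name; since the standard library has no t quantile, `t_crit` is passed in, and its default of 2.0 is only a rough value for samples of about 60-70 points.

```python
import math
from statistics import fmean


def prediction_interval(xs, ys, x0, t_crit=2.0):
    """Approximate 95% prediction interval for a single new y observed at x0."""
    n = len(xs)
    x_bar, y_bar = fmean(xs), fmean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    intercept = y_bar - slope * x_bar
    ss_res = sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(ss_res / (n - 2))  # residual standard error
    y_hat = intercept + slope * x0
    # The "1 +" term is the scatter of individual points around the line;
    # it does not shrink with more data, so the interval stays wide when R^2 is low.
    half_width = t_crit * s * math.sqrt(1 + 1 / n + (x0 - x_bar) ** 2 / sxx)
    return y_hat - half_width, y_hat + half_width
```

The interval is narrowest at the mean of X and widens toward the edges of the data.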

#### Attached Files:

• ###### reg.png
8. Dec 30, 2013

### musicgold

Got it. Thanks.

9. Jan 2, 2014

### musicgold

So should I go back, take only 60 of the 74 points, run the regression analysis again, and see how well the new model predicts the Y values for the remaining 14 X values?

If so, how should I select the 60 points? Randomly?
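Random selection is the usual choice. With only about 70 points, though, a single 60/14 split gives a fairly noisy error estimate that depends on which 14 points happen to be held out. A common alternative (a different technique from the single split discussed above) is k-fold cross-validation: shuffle once, cut the data into k folds, and predict each fold from a line fitted to the others, so every point is used for testing exactly once. A sketch under those assumptions, standard library only, with hypothetical names:

```python
import math
import random
from statistics import fmean


def ols_fit(xs, ys):
    """Return (intercept, slope) of the least-squares line."""
    x_bar, y_bar = fmean(xs), fmean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    return y_bar - slope * x_bar, slope


def kfold_rmse(xs, ys, k=5, seed=0):
    """k-fold cross-validated RMSE of a simple OLS line."""
    rng = random.Random(seed)
    idx = list(range(len(xs)))
    rng.shuffle(idx)  # random assignment of points to folds
    folds = [idx[i::k] for i in range(k)]  # k roughly equal folds
    sq_errors = []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        a, b = ols_fit([xs[i] for i in train], [ys[i] for i in train])
        sq_errors.extend((ys[i] - (a + b * xs[i])) ** 2 for i in fold)
    return math.sqrt(fmean(sq_errors))
```

Repeating this with a few different seeds shows how stable the estimate is.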

Thanks.