Is this a good forecasting model?

musicgold · Dec 30, 2013

Hi,

Please see the attached Excel file.

I have a sample of 70 data pairs. The correlation between X and Y is -0.68. The OLS regression coefficient is statistically significant, as shown in the file. However, with an R^2 of 0.40, I am not sure whether my model is good enough to forecast Y.
Can you please take a look?

Thanks.
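For readers without the attachment, here is a minimal Python sketch of the quantities in the question (correlation, OLS slope, R^2). It uses made-up data because the Excel file isn't available, and the helper name `ols_fit` is purely illustrative. It also shows a useful identity: with a single predictor, R^2 is simply the square of the correlation r.

```python
import math
import random

def ols_fit(xs, ys):
    """Fit y = a + b*x by ordinary least squares; return (a, b, r, r_squared)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx                      # OLS slope
    a = y_bar - b * x_bar              # OLS intercept
    r = sxy / math.sqrt(sxx * syy)     # Pearson correlation
    # R^2 = 1 - SSE/SST; for a single predictor this equals r**2
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    r_squared = 1 - sse / syy
    return a, b, r, r_squared

# Hypothetical data: 70 pairs with a negative linear trend plus noise
# (a stand-in for the thread's Excel file, which is not available).
rng = random.Random(0)
xs = [rng.uniform(0, 10) for _ in range(70)]
ys = [5 - 0.8 * x + rng.gauss(0, 3) for x in xs]

a, b, r, r2 = ols_fit(xs, ys)
print(f"slope={b:.3f}, r={r:.3f}, R^2={r2:.3f}, r^2={r * r:.3f}")
```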

Office_Shredder · Dec 30, 2013

The phrase "good enough to forecast Y" is a bit loaded. How well do you want to forecast Y? If you simply want to know whether this is sufficient to imply that X and Y are not uncorrelated then you should do a t-test:

http://en.wikipedia.org/wiki/Student's_t-test#Slope_of_a_regression_line
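The test in that link divides the fitted slope by its standard error and compares the result with a t distribution on n - 2 degrees of freedom. A minimal stdlib-only sketch, on hypothetical data with an illustrative helper name; it only compares against the approximate 5% critical value rather than computing an exact p-value (a statistics library would do that).

```python
import math
import random

def slope_t_statistic(xs, ys):
    """Return (slope, standard error, t) for the OLS slope of y on x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_bar - b * x_bar
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(sse / (n - 2))     # residual standard error, n - 2 degrees of freedom
    se_b = s / math.sqrt(sxx)        # standard error of the slope
    return b, se_b, b / se_b         # t tests H0: slope = 0

# Hypothetical data (not the thread's): 70 noisy points with a negative trend.
rng = random.Random(1)
xs = [rng.uniform(0, 10) for _ in range(70)]
ys = [5 - 0.8 * x + rng.gauss(0, 3) for x in xs]

b, se_b, t = slope_t_statistic(xs, ys)
# With n - 2 = 68 degrees of freedom, |t| > about 2.0 rejects H0
# at the two-sided 5% level.
print(f"b={b:.3f}, SE={se_b:.3f}, t={t:.2f}, reject H0: {abs(t) > 2.0}")
```

For a single predictor this t is algebraically the same as the t-statistic of the correlation coefficient, r * sqrt(n - 2) / sqrt(1 - r^2), so testing the slope and testing the correlation are the same test.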

musicgold · Dec 30, 2013

Thanks.

Office_Shredder said:

If you simply want to know whether this is sufficient to imply that X and Y are not uncorrelated then you should do a t-test:

As shown in the Excel file, the regression analysis gets a very low p-value for the coefficient, so I know they are related (or not independent).

Separately, I also calculated the t-statistic of the correlation coefficient, which was 6.7, i.e. there is a very low chance that the sample correlation value occurred randomly. So I am quite confident that there is a statistically significant relationship.

What is making me nervous is the relatively low value of R^2 of the regression line. I am not sure how confident I should be about the predictions by this model.
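The t-statistic for a sample correlation is t = r * sqrt(n - 2) / sqrt(1 - r^2), and for a one-predictor OLS fit R^2 = r^2. A quick check of the figures quoted in this thread, assuming they all come from the same 70 pairs: a |t| of about 6.7 fits R^2 = 0.40 (r of about -0.63), whereas r = -0.68 would give R^2 of about 0.46 and |t| of about 7.6. Either way, |t| is far above 2, so the significance conclusion does not change. The helper name below is illustrative.

```python
import math

def corr_t_statistic(r, n):
    """t statistic for H0: rho = 0, given sample correlation r and sample size n."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

n = 70
# Using the correlation quoted in the thread:
t_from_r = corr_t_statistic(-0.68, n)
# Using the R^2 quoted in the thread (r = -sqrt(R^2), since the slope is negative):
t_from_r2 = corr_t_statistic(-math.sqrt(0.40), n)

print(f"r = -0.68  -> R^2 = {0.68 ** 2:.3f}, t = {t_from_r:.2f}")
print(f"R^2 = 0.40 -> r = {-math.sqrt(0.40):.3f}, t = {t_from_r2:.2f}")
```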

mfb · Dec 30, 2013

musicgold said:

What is making me nervous is the relatively low value of R^2 of the regression line. I am not sure how confident I should be about the predictions by this model.

As you can see in the graph, knowing the x-value won't give a reliable prediction for y. It is better than not knowing the x-value (that's what the non-zero correlation tells you), but the spread of the datapoints is quite large.
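One way to quantify "how confident should I be about a prediction" is a prediction interval for a single new observation. Below is a sketch of the standard simple-linear-regression formula, assuming roughly normal errors with constant spread. The data are made up, `prediction_interval` is a hypothetical helper, and the critical value is hard-coded for n = 70.

```python
import math
import random

T_CRIT = 1.995  # approx. two-sided 95% t critical value for 68 degrees of freedom (n = 70 only)

def prediction_interval(xs, ys, x0, t_crit=T_CRIT):
    """Approximate 95% prediction interval (low, fitted, high) for a new y at x = x0."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_bar - b * x_bar
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(sse / (n - 2))  # residual standard error
    # A new observation carries the residual scatter (the "1") plus the
    # uncertainty in the fitted line itself (the other two terms).
    half_width = t_crit * s * math.sqrt(1 + 1 / n + (x0 - x_bar) ** 2 / sxx)
    y_hat = a + b * x0
    return y_hat - half_width, y_hat, y_hat + half_width

# Hypothetical data, not the thread's.
rng = random.Random(3)
xs = [rng.uniform(0, 10) for _ in range(70)]
ys = [5 - 0.8 * x + rng.gauss(0, 3) for x in xs]
lo, y_hat, hi = prediction_interval(xs, ys, x0=5.0)
print(f"predicted y at x=5: {y_hat:.2f}, 95% prediction interval: [{lo:.2f}, {hi:.2f}]")
```

With a modest R^2, the interval stays wide almost everywhere, because the dominant term is the residual scatter rather than the uncertainty in the line.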

Number Nine · Dec 30, 2013

If you want to determine the predictive value of your model, set aside a portion of your data to use for validation (or collect new measurements and use them for validation). I agree with mfb that the model probably won't have a great deal of predictive value.

musicgold said:

the regression analysis gets a very low p-value for the coefficient, so I know they are related (or not independent).

Separately, I also calculated the t-statistic of the correlation coefficient, which was 6.7, i.e. there is a very low chance that the sample correlation value occurred randomly.

You should be aware that the p- and t-values don't really allow you to say any of those things. Null hypothesis testing (especially the p-value) is very commonly misinterpreted; the Wikipedia article contains a list of common misconceptions that you may want to read.
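To make that caution concrete: a p-value is computed assuming the null hypothesis (no correlation) is true. It is the probability of a |t| at least as large as the observed one under that assumption, not the probability that the correlation "occurred randomly" or that the null is true. The stdlib-only simulation below (made-up data, illustrative helper `corr_t`) draws X and Y independently many times. About 5% of samples of 70 pairs still cross the |t| > 2 threshold, which is the false-positive rate a 5% test is built to have.

```python
import math
import random

def corr_t(xs, ys):
    """t statistic for H0: no correlation, computed from the sample correlation r."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    r = sxy / math.sqrt(sxx * syy)
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

rng = random.Random(42)
n, trials = 70, 2000
t_crit = 1.995  # approx. two-sided 5% critical value for n - 2 = 68 degrees of freedom

false_alarms = 0
for _ in range(trials):
    xs = [rng.gauss(0, 1) for _ in range(n)]
    ys = [rng.gauss(0, 1) for _ in range(n)]  # generated independently of xs, so H0 is true
    if abs(corr_t(xs, ys)) > t_crit:
        false_alarms += 1

rate = false_alarms / trials
print(f"share of null samples declared 'significant': {rate:.3f}")  # should be near 0.05
```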

musicgold · Dec 30, 2013

mfb said:

the spread of the datapoints is quite large.

What do you mean by this?

mfb · Dec 30, 2013

musicgold said:

What do you mean by this?

See the highlighted areas in the attached image: within each of them the x-values are very similar, but y varies a lot. Your prediction can be something like "it is probably within that y-range", but not better than that.

[Attached image: scatter plot of the data with the highlighted areas]
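The same point can be put in numbers. For a least-squares line, the residual sum of squares is SSE = (1 - R^2) SST, so the residual standard error s (the typical scatter of y around the line) relates to the sample standard deviation s_Y of Y as follows. Plugging in R^2 = 0.40 and n = 70 from this thread:

```latex
s = \sqrt{\frac{\mathrm{SSE}}{n-2}}
  = s_Y \sqrt{(1 - R^2)\,\frac{n-1}{n-2}}
  \approx s_Y \sqrt{0.60 \cdot \frac{69}{68}}
  \approx 0.78\, s_Y
```

In other words, knowing x narrows the typical spread of y by only about a fifth compared with ignoring x altogether.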

musicgold · Dec 30, 2013

Got it. Thanks.

musicgold · Jan 2, 2014

Number Nine said:

If you want to determine the predictive value of your model, set aside a portion of your data to use for validation (or collect new measurements and use them for validation).

So should I go back, take only 60 of the 74 points, run the regression analysis again, and see how well the new model predicts the Y values for the remaining 14 X-values?

If so, how should I select the 60 points? Randomly?

Thanks.
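A minimal sketch of the split described above, on made-up data (74 points, 60 for fitting, 14 held out). `fit_line` and `holdout_rmse` are hypothetical helper names. The points are chosen by shuffling indices at random, and the model's error on the held-out points is compared with a baseline that ignores X and always predicts the training mean of Y: the regression only adds predictive value if it beats that baseline.

```python
import math
import random

def fit_line(xs, ys):
    """OLS fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    return y_bar - b * x_bar, b

def rmse(errors):
    """Root-mean-square of a list of prediction errors."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))

def holdout_rmse(xs, ys, n_train, seed=0):
    """Fit on a random subset of n_train points; return (model RMSE, baseline RMSE) on the rest."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)  # random split: no hand-picking of test points
    train, test = idx[:n_train], idx[n_train:]
    a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
    y_bar_train = sum(ys[i] for i in train) / n_train  # baseline: ignore x entirely
    model_err = rmse([ys[i] - (a + b * xs[i]) for i in test])
    baseline_err = rmse([ys[i] - y_bar_train for i in test])
    return model_err, baseline_err

# Hypothetical data standing in for the 74 points in the question.
rng = random.Random(2)
xs = [rng.uniform(0, 10) for _ in range(74)]
ys = [5 - 0.8 * x + rng.gauss(0, 3) for x in xs]

model_err, baseline_err = holdout_rmse(xs, ys, n_train=60)
print(f"held-out RMSE: regression {model_err:.2f}, mean-only baseline {baseline_err:.2f}")
```

With only 14 held-out points, the error estimate depends heavily on which points land in the test set; repeating the split with several seeds, or using k-fold cross-validation, gives a more stable picture.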

