How to do a correct prediction

  • Thread starter FallenApple
  • Tags
    Prediction
In summary, the conversation discussed how to build a model for predicting new data points. The original poster proposed starting from a model containing every variable and reducing it with stepwise AIC. A respondent recommended evaluating prediction error directly with k-fold cross-validation and choosing the model that predicts best, and suggested elastic-net regression as a way to select variables automatically and limit overfitting.
  • #1
FallenApple
So say I want to predict the next point in the data (or a point outside it, since it's prediction).

So first, I would include all the variables in the dataset in the initial model. Then I would use stepwise AIC to iteratively reduce the model down to the final model with minimum AIC. (I can do this because I do not care about causal explanation, just about prediction.)
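A minimal sketch of backward stepwise selection by AIC, assuming an ordinary least-squares model with Gaussian errors, for which the AIC equals n*log(RSS/n) + 2k up to an additive constant. The function names and the simulated data are illustrative assumptions, not anything from the thread; in R the built-in step() function does this kind of search.

```python
import numpy as np

def gaussian_aic(A, y):
    """AIC (up to an additive constant) of an OLS fit on design matrix A."""
    n = len(y)
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    rss = float(np.sum((y - A @ beta) ** 2))
    k = A.shape[1] + 1  # regression coefficients plus the error variance
    return n * np.log(rss / n) + 2 * k

def backward_stepwise_aic(X, y, names):
    """Start from all columns of X and drop one at a time while AIC improves."""
    intercept = np.ones((len(y), 1))

    def design(cols):
        return np.hstack([intercept, X[:, cols]])

    keep = list(range(X.shape[1]))
    best = gaussian_aic(design(keep), y)
    while keep:
        # AIC of every model obtained by removing a single remaining term
        trials = [(gaussian_aic(design([c for c in keep if c != d]), y), d)
                  for d in keep]
        aic, drop = min(trials)
        if aic >= best:  # no single removal lowers the AIC, so stop
            break
        best = aic
        keep.remove(drop)
    return [names[c] for c in keep], best

# Simulated example: only x1 and x2 actually drive y.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(scale=0.5, size=200)

selected, aic = backward_stepwise_aic(X, y, ["x1", "x2", "x3", "x4"])
print(selected, round(aic, 2))
```

The search is greedy, so it is not guaranteed to find the subset with the globally smallest AIC.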

Now, should the large model consist of just the main effects (sums of the variables), or should it also include all possible interactions?

Finally, what do I do with the final model? Do I just interpret the estimates and confidence intervals as usual?
 
  • #2
If you want to do prediction, why use AIC? Why not evaluate prediction error directly? Cross-validation is what you want. Just compare all the models you're interested in and choose the one with the best predictive performance. The AIC is actually asymptotically equivalent to leave-one-out cross-validation, but LOO-CV has weird properties, so go with something like k-fold cross-validation.
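As a rough illustration of comparing candidate models by k-fold cross-validation, here is a minimal sketch using ordinary least squares on simulated data. The helper name kfold_mse, the three candidate models, and the data are all assumptions made for the example.

```python
import numpy as np

def kfold_mse(X, y, k=5, seed=0):
    """Average held-out mean squared error of an OLS fit over k folds."""
    idx = np.random.default_rng(seed).permutation(len(y))
    errors = []
    for test in np.array_split(idx, k):
        train = np.setdiff1d(idx, test)
        # Fit on the training folds only (intercept plus the given columns)
        A_train = np.column_stack([np.ones(len(train)), X[train]])
        beta, *_ = np.linalg.lstsq(A_train, y[train], rcond=None)
        # Score on the fold that was held out
        A_test = np.column_stack([np.ones(len(test)), X[test]])
        errors.append(np.mean((y[test] - A_test @ beta) ** 2))
    return float(np.mean(errors))

# Simulated data in which only the first variable matters.
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 3))
y = 1.0 + 2.0 * X[:, 0] + rng.normal(scale=1.0, size=300)

candidates = {"x1 only": [0], "x2 only": [1], "x1 + x2 + x3": [0, 1, 2]}
scores = {name: kfold_mse(X[:, cols], y) for name, cols in candidates.items()}
best = min(scores, key=scores.get)
print(scores, best)
```

Every candidate is scored on the same folds (same seed), so the comparison is not blurred by different random splits.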
 
  • #3
Number Nine said:
If you want to do prediction, why use AIC? Why not evaluate prediction error directly? Cross-validation is what you want. Just compare all the models you're interested in and choose the one with the best predictive performance. The AIC is actually asymptotically equivalent to leave-one-out cross-validation, but LOO-CV has weird properties, so go with something like k-fold cross-validation.

Got it.

But what if I don't have any models in mind?

Then should I just include every possible combination? For example, say I want to predict y and my data set has x1, x2, x3. Then the complete model with everything in it is y = x1 + x2 + x3 + x1*x2 + x1*x3 + x2*x3 + x1*x2*x3. Then whatever valid algorithm I use should be able to spit out the subset that best predicts y.
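The term list for such a complete model can be generated rather than typed out. A small sketch, assuming each interaction column is simply the product of the variables involved; the function names are made up for the example.

```python
from itertools import combinations
from math import prod

def full_model_terms(names):
    """All main effects and interactions of every order, as tuples of names."""
    return [combo
            for order in range(1, len(names) + 1)
            for combo in combinations(names, order)]

def design_row(values, terms):
    """One row of the design matrix: the product of the variables in each term."""
    return [prod(values[v] for v in term) for term in terms]

terms = full_model_terms(["x1", "x2", "x3"])
print(["*".join(t) for t in terms])
# ['x1', 'x2', 'x3', 'x1*x2', 'x1*x3', 'x2*x3', 'x1*x2*x3']

row = design_row({"x1": 2.0, "x2": 3.0, "x3": 5.0}, terms)
print(row)  # [2.0, 3.0, 5.0, 6.0, 10.0, 15.0, 30.0]
```

With p variables this produces 2^p - 1 terms, so the full model grows quickly as variables are added.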

What if I have some models in mind from knowledge of the science? Then would that influence my choice for the prediction?
 
  • #4
This is actually a bit of a tricky question. The simplest approach would be to just compare all possible models and select the best one. With 7 candidate terms, that is 2^7 = 128 models in your case (counting the intercept-only model), which might take a while if you have to code them manually.
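The 128 models do not have to be written by hand. The following is a sketch of an exhaustive search scored by 5-fold cross-validation, assuming ordinary least squares and simulated data in which y depends on x1 and on the x2*x3 interaction; the helper cv_mse and the data-generating model are assumptions for illustration only.

```python
from itertools import combinations
import numpy as np

def cv_mse(A, y, k=5, seed=0):
    """k-fold cross-validated MSE of an OLS fit on design matrix A."""
    idx = np.random.default_rng(seed).permutation(len(y))
    errs = []
    for test in np.array_split(idx, k):
        train = np.setdiff1d(idx, test)
        beta, *_ = np.linalg.lstsq(A[train], y[train], rcond=None)
        errs.append(np.mean((y[test] - A[test] @ beta) ** 2))
    return float(np.mean(errs))

rng = np.random.default_rng(2)
n = 400
x = {name: rng.normal(size=n) for name in ("x1", "x2", "x3")}
y = 1.0 + 2.0 * x["x1"] + 1.5 * x["x2"] * x["x3"] + rng.normal(size=n)

# The 7 candidate terms: 3 main effects, 3 two-way and 1 three-way interaction.
terms = [c for order in (1, 2, 3) for c in combinations(x, order)]
columns = {t: np.prod([x[v] for v in t], axis=0) for t in terms}

# Score every subset of the 7 terms; each model also gets an intercept.
results = []
for size in range(len(terms) + 1):
    for subset in combinations(terms, size):
        A = np.column_stack([np.ones(n)] + [columns[t] for t in subset])
        results.append((cv_mse(A, y), subset))

print(len(results))  # 128
best_mse, best_terms = min(results, key=lambda item: item[0])
print(best_terms, round(best_mse, 3))
```

The number of models doubles with each added term, so this brute-force approach is only practical for a small number of candidate terms.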

If you want to try something a little more sophisticated, my general approach to these kinds of problems is to fit the full model (with all variables and all interactions) and put a penalty on it, which helps to prevent overfitting and automatically selects which variables should be included. For example, elastic-net regression is a modified version of ordinary least-squares regression that shrinks the coefficients towards zero (which tends to help avoid overfitting and increase predictive power) and can set some coefficients to exactly zero, in some sense "automatically" removing variables that do not contribute to the model. If you're comfortable with R, there are several packages that will do this, including glmnet:

https://web.stanford.edu/~hastie/glmnet/glmnet_alpha.html

Handily, the package can also do cross-validation automatically. The above link includes a tutorial.
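To show what the penalty does, here is a minimal coordinate-descent sketch of elastic-net regression in Python. It is not glmnet's implementation, only an illustration that minimizes the same kind of objective, (1/(2n))*RSS + lam*(alpha*L1 + (1 - alpha)/2*L2), on standardized predictors. The function names, the fixed lam, and the simulated data are assumptions; in practice lam would be chosen by cross-validation, which is what cv.glmnet does in R.

```python
import numpy as np

def soft_threshold(z, t):
    """Shrink z towards zero by t, returning exactly 0 when |z| <= t."""
    return np.sign(z) * max(abs(z) - t, 0.0)

def elastic_net(X, y, lam, alpha=0.5, n_iter=200):
    """Coordinate descent for the elastic-net objective on standardized X.

    Returns coefficients on the standardized scale; the intercept is
    handled by centering y.
    """
    n, p = X.shape
    Z = (X - X.mean(axis=0)) / X.std(axis=0)  # each column: mean 0, variance 1
    resid = y - y.mean()
    b = np.zeros(p)
    for _ in range(n_iter):
        for j in range(p):
            resid += Z[:, j] * b[j]          # take term j out of the current fit
            rho = Z[:, j] @ resid / n        # its covariance with the partial residual
            b[j] = soft_threshold(rho, lam * alpha) / (1.0 + lam * (1.0 - alpha))
            resid -= Z[:, j] * b[j]          # put the updated term back
    return b

# Simulated data: only the first two of six predictors matter.
rng = np.random.default_rng(3)
X = rng.normal(size=(500, 6))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(size=500)

b = elastic_net(X, y, lam=1.0, alpha=0.5)
print(np.round(b, 3))
```

The L1 part of the penalty is what produces exact zeros, while the L2 part adds further shrinkage; setting alpha=1 gives the lasso and alpha=0 gives ridge regression.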
 

1. How do I choose the right prediction model?

Choosing the right prediction model depends on various factors such as the type of data, the complexity of the problem, and the desired level of accuracy. Some commonly used models include linear regression, decision trees, and neural networks. It is important to understand the strengths and limitations of each model before making a decision.

2. How much data do I need for accurate predictions?

The amount of data needed for accurate predictions varies with the complexity of the problem, the number of predictors, the noise level, and the chosen model. Generally, a larger dataset leads to more reliable predictions. A rough rule of thumb is at least 100 data points for simple problems and thousands for more complex ones, but this is a guideline rather than a guarantee.

3. How do I evaluate the performance of my prediction model?

Evaluating the performance of a prediction model involves comparing the predicted values with the actual values, ideally on data that was not used to fit the model (for example, via cross-validation). Common metrics include mean squared error, mean absolute error, and R-squared. Choose the metric that matches the problem: mean squared error penalizes large errors more heavily than mean absolute error does.
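For concreteness, the three metrics can be computed as follows. This is a small self-contained sketch with made-up numbers; the function names are chosen for the example.

```python
def mean_squared_error(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def mean_absolute_error(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def r_squared(y_true, y_pred):
    """1 - SS_res / SS_tot, where SS_tot is measured around the mean of y_true."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.0, 5.0, 8.0, 9.0]

print(mean_squared_error(y_true, y_pred))   # 0.5
print(mean_absolute_error(y_true, y_pred))  # 0.5
print(r_squared(y_true, y_pred))            # 0.9
```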

4. How can I improve the accuracy of my predictions?

There are several ways to improve the accuracy of predictions, such as using more data, choosing a more advanced model, fine-tuning the model parameters, and feature selection. It is also important to regularly monitor and re-evaluate the model's performance to make necessary adjustments.

5. Can prediction models make accurate forecasts for the future?

Prediction models use historical data to make predictions, so they cannot guarantee accuracy for future events. However, a model that is regularly updated with new data and monitored for performance can make fairly accurate forecasts. Unexpected factors or changes in the data can still reduce the accuracy of the predictions.
