I am looking at a solution. The problem is to predict the number of medical doctors in a county. The data set has a few variables, such as the observed number of crimes in the county, poverty level, etc. Since the goal is purely prediction, I assume a data-driven method is appropriate. As it turned out, a data-driven method was used, but in combination with logical reasoning about the problem (i.e., what variable might logically interact with another, etc.).

The starting model is a full linear regression with all the variables included but no interaction terms. Then stepwise AIC, Mallows' Cp, and best subset selection were applied, giving three models, one from each of the three algorithms. The AIC model was picked because it was the most parsimonious, and because it retained all of the indicators for the regions of the US (i.e., east, south, etc.). I suppose the reason for this is that a boxplot of doctors by region suggested that the number of doctors might differ by region.

After the additive model was chosen, an ANOVA F-test was used to compare it against models updated with interactions. So if the model from AIC was y ~ a + b + c, the comparisons were anova(y ~ a + b + c, y ~ a + b + c + b:c) and anova(y ~ a + b + c, y ~ a + b + c + a:c), where those interactions were suspected mostly from logical reasoning (e.g., income of a county might have a differential effect on the number of doctors across different levels of crime).

So in summary:

1) Selection algorithms were applied to an additive model with the variables in the data set.

2) Interaction terms were added onto the model from step 1), and an F-test was used to compare model 2 with model 1, to see whether an included interaction can better predict the number of physicians in a county.

Now my questions are:

1) Why didn't they just start with the full model including the interaction terms and then apply the algorithms? That way there would be no need to follow up with a model comparison via the F-test.

2) And why AIC and the other methods? Why not just use one of them?
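For concreteness, here is what the nested-model (extra-sum-of-squares) F-test in step 2) computes. This is not the author's code; it is a minimal from-scratch sketch on toy data, where `a` and `b` are hypothetical stand-ins for predictors like income and crime, and the "full" model adds the a:b interaction to the additive model:

```python
import random

def fit_rss(X, y):
    """OLS via the normal equations (Gaussian elimination); returns the residual sum of squares."""
    n, p = len(X), len(X[0])
    xtx = [[sum(X[i][r] * X[i][c] for i in range(n)) for c in range(p)] for r in range(p)]
    xty = [sum(X[i][r] * y[i] for i in range(n)) for r in range(p)]
    A = [row[:] + [b] for row, b in zip(xtx, xty)]  # augmented system [X'X | X'y]
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(A[r][col]))  # partial pivoting
        A[col], A[piv] = A[piv], A[col]
        for r in range(col + 1, p):
            f = A[r][col] / A[col][col]
            for c in range(col, p + 1):
                A[r][c] -= f * A[col][c]
    beta = [0.0] * p
    for r in range(p - 1, -1, -1):  # back substitution
        beta[r] = (A[r][p] - sum(A[r][c] * beta[c] for c in range(r + 1, p))) / A[r][r]
    return sum((y[i] - sum(X[i][c] * beta[c] for c in range(p))) ** 2 for i in range(n))

def partial_f(X_reduced, X_full, y):
    """F statistic for comparing nested linear models: does the extra term reduce RSS enough?"""
    n = len(y)
    rss_r, rss_f = fit_rss(X_reduced, y), fit_rss(X_full, y)
    df_num = len(X_full[0]) - len(X_reduced[0])   # number of added parameters
    df_den = n - len(X_full[0])                   # residual df of the full model
    return ((rss_r - rss_f) / df_num) / (rss_f / df_den)

# Toy data with a genuine a:b interaction baked in.
random.seed(0)
a = [random.random() for _ in range(50)]
b = [random.random() for _ in range(50)]
y = [2 + 3 * a[i] + 1.5 * b[i] + 4 * a[i] * b[i] + random.gauss(0, 0.1) for i in range(50)]

X_red = [[1.0, a[i], b[i]] for i in range(50)]                 # additive: y ~ a + b
X_ful = [[1.0, a[i], b[i], a[i] * b[i]] for i in range(50)]    # with interaction: y ~ a + b + a:b
F = partial_f(X_red, X_ful, y)
print(F)  # a large F (relative to the F(1, 46) reference) favors keeping the interaction
```

In R this is exactly `anova(fit_additive, fit_interaction)`; the sketch just makes explicit that the test compares the drop in RSS against the full model's residual variance.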