Discussion Overview
The discussion concerns model selection in regression analysis: whether a model should be chosen from the data (for example by stepwise procedures) or specified in advance, and whether Bayesian methods offer a better alternative. Participants explore the consequences of different selection strategies, the risk of overfitting, and the role of confounders in regression analysis.
Discussion Character
- Debate/contested
- Technical explanation
- Mathematical reasoning
Main Points Raised
- Some participants suggest that selecting a model based on the data may inflate the Type I error rate because of overfitting, and advocate specifying the model before analyzing the data.
- Others argue for the importance of examining scatterplot matrices to identify confounders that should be adjusted for in regression models.
- Stepwise multiple regression is mentioned as a way to avoid overfitting, and its forward, backward, and bidirectional ("both") variants are discussed.
- A question is raised about whether a regression model set up around the scientific question can be combined with stepwise methods for identifying confounders.
- Some participants express skepticism about the effectiveness of stepwise regression and suggest that it may not adequately protect against overfitting, recommending cross-validation and regularization instead.
- Bayesian methods are proposed as an alternative, with the Bayes factor mentioned as a tool for comparing models that, in the proponents' view, avoids the multiple-comparisons problem.
- Concerns are raised about whether stepwise procedures can distinguish confounders from variables on the causal pathway (mediators), with specific examples given to illustrate the problem.
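The Type I error concern in the first point can be checked with a small simulation. The sketch below, which uses only the Python standard library, generates an outcome and several candidate predictors that are all pure noise, keeps whichever predictor has the smallest p-value, and tests it at α = 0.05 as if it had been chosen in advance. The p-value comes from Fisher's z-transformation of the correlation, a normal approximation chosen here for simplicity; the sample size, number of simulations, and seed are arbitrary assumptions.

```python
import math
import random

def corr(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def p_value(r, n):
    """Two-sided p-value for H0: rho = 0 via Fisher's z (normal approximation)."""
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2))

def false_positive_rate(n_candidates, n_obs=50, n_sims=1000, alpha=0.05, seed=1):
    """Share of simulated datasets in which the 'best' of n_candidates noise
    predictors is declared significant. Every null hypothesis is true here."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        y = [rng.gauss(0, 1) for _ in range(n_obs)]
        best_p = 1.0
        for _ in range(n_candidates):
            x = [rng.gauss(0, 1) for _ in range(n_obs)]
            best_p = min(best_p, p_value(corr(x, y), n_obs))
        hits += best_p < alpha
    return hits / n_sims
```

With one pre-specified predictor, `false_positive_rate(1)` should come out near 0.05. With 20 candidates the rate should be close to 1 − 0.95^20 ≈ 0.64, because picking the best of 20 tests silently performs 20 comparisons.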
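One possible reading of the question about combining a question-driven model with stepwise selection is to force the exposure of interest into the model and let a stepwise procedure choose only among the candidate adjustment variables. The sketch below does forward selection by AIC using NumPy; the function names, the AIC criterion, and the simulated data are illustrative assumptions, not a method proposed in the thread. Note that it selects on predictive fit alone, so it cannot tell a confounder from a mediator.

```python
import numpy as np

def aic_ols(X, y):
    """AIC of a Gaussian linear model fitted by OLS, up to an additive constant."""
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ beta) ** 2))
    return n * np.log(rss / n) + 2 * p

def forward_stepwise(y, forced, candidates):
    """Forward selection by AIC (hypothetical helper).

    forced:     dict name -> column that always stays in the model (e.g. the exposure).
    candidates: dict name -> column of potential adjustment variables.
    Returns the names of the columns in the final model, in order of entry.
    """
    cols = {"intercept": np.ones(len(y)), **forced}
    remaining = dict(candidates)
    current = aic_ols(np.column_stack(list(cols.values())), y)
    while remaining:
        # AIC of the current model plus each remaining candidate
        scores = {name: aic_ols(np.column_stack([*cols.values(), col]), y)
                  for name, col in remaining.items()}
        best = min(scores, key=scores.get)
        if scores[best] >= current:  # no candidate improves AIC: stop
            break
        cols[best] = remaining.pop(best)
        current = scores[best]
    return list(cols)

# Hypothetical data: c confounds x and y; z1..z3 are pure noise.
rng = np.random.default_rng(0)
n = 500
c = rng.normal(size=n)
x = c + rng.normal(size=n)
y = 1.0 * x + 2.0 * c + rng.normal(size=n)
noise = {f"z{i}": rng.normal(size=n) for i in (1, 2, 3)}
selected = forward_stepwise(y, forced={"x": x}, candidates={"c": c, **noise})
```

Here the confounder `c` enters first. The noise variables may or may not enter, since AIC does not control the Type I error rate, which is one reason participants question stepwise procedures.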
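The alternative recommended by the skeptics of stepwise regression, cross-validation combined with regularization, can be sketched with ridge regression: keep all candidate predictors, shrink their coefficients by a penalty λ, and choose λ by k-fold cross-validated prediction error. The NumPy code below is a minimal illustration under assumed settings (5 folds, a hand-picked grid of λ values, simulated data); ridge is only one of several regularization methods, and lasso is another common choice.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Ridge coefficients for standardised X and centred y (intercept handled separately)."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

def cv_error(X, y, lam, k=5, seed=0):
    """Mean squared prediction error of ridge(lam) under k-fold cross-validation."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    errors = []
    for test in folds:
        train = np.setdiff1d(np.arange(len(y)), test)
        # Standardise with training-fold statistics only, so the test fold stays unseen.
        mu, sd = X[train].mean(axis=0), X[train].std(axis=0)
        y_mean = y[train].mean()
        beta = ridge_fit((X[train] - mu) / sd, y[train] - y_mean, lam)
        pred = ((X[test] - mu) / sd) @ beta + y_mean
        errors.append(np.mean((y[test] - pred) ** 2))
    return float(np.mean(errors))

# Hypothetical data: 30 candidate predictors, only the first 3 matter.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 30))
y = X[:, :3] @ np.array([1.0, -0.5, 0.5]) + rng.normal(size=100)
lams = [0.01, 0.1, 1.0, 10.0, 100.0, 1e4]
cv = {lam: cv_error(X, y, lam) for lam in lams}
best_lam = min(cv, key=cv.get)
```

Using the same fold split (same `seed`) for every λ keeps the comparison between penalties fair. The selected model is judged on held-out prediction error rather than on in-sample fit.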
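For the Bayes factor point, a quick way to compare two nested linear models without specifying priors explicitly is the BIC approximation BF₁₀ ≈ exp((BIC₀ − BIC₁) / 2), where values above 1 favour model 1. The sketch below assumes Gaussian errors and OLS fits; the approximation implicitly corresponds to a particular default (unit-information) prior, so a fully specified Bayesian analysis can give different numbers. Function names and data are hypothetical.

```python
import numpy as np

def bic_ols(X, y):
    """BIC of a Gaussian linear model fitted by OLS, up to an additive constant."""
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ beta) ** 2))
    return n * np.log(rss / n) + p * np.log(n)

def approx_bayes_factor(X0, X1, y):
    """BIC approximation to BF_10, the evidence for model 1 over model 0."""
    return float(np.exp((bic_ols(X0, y) - bic_ols(X1, y)) / 2))

# Hypothetical data: y depends on x but not on z.
rng = np.random.default_rng(2)
n = 200
x = rng.normal(size=n)
z = rng.normal(size=n)
y = 0.5 * x + rng.normal(size=n)
ones = np.ones(n)
bf_x = approx_bayes_factor(np.column_stack([ones]), np.column_stack([ones, x]), y)
bf_z = approx_bayes_factor(np.column_stack([ones, x]), np.column_stack([ones, x, z]), y)
```

`bf_x` should be very large, strongly favouring inclusion of `x`, while `bf_z` will typically fall below 1, i.e. the data favour leaving the irrelevant `z` out.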
Areas of Agreement / Disagreement
Participants do not reach a consensus on the best approach to model selection, with multiple competing views on the effectiveness of data-driven models, stepwise regression, and Bayesian methods. The discussion remains unresolved regarding the optimal strategy for addressing confounders in regression analysis.
Contextual Notes
Limitations include the potential for overfitting in stepwise regression, the difficulty of distinguishing correlation from causation, and the unresolved question of how different methods handle confounders.
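The difficulty of separating correlation from causation when choosing adjustment variables can be illustrated with two simulated scenarios. In both, a third variable is strongly associated with the exposure and the outcome, so a procedure that selects on fit alone would keep it; yet adjusting for it is correct in one case and wrong in the other. The coefficients and sample size below are arbitrary assumptions chosen to make the effects easy to read.

```python
import numpy as np

def ols_coef(y, *cols):
    """OLS coefficients [intercept, b1, b2, ...] for the given predictor columns."""
    X = np.column_stack([np.ones(len(y)), *cols])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

rng = np.random.default_rng(42)
n = 100_000

# Scenario A: C causes both X and Y (a confounder). True effect of X on Y is 1.0.
c = rng.normal(size=n)
x = c + rng.normal(size=n)
y = 1.0 * x + 2.0 * c + rng.normal(size=n)
crude_a = ols_coef(y, x)[1]         # about 2.0: biased by the path through C
adjusted_a = ols_coef(y, x, c)[1]   # about 1.0: adjusting for C is correct

# Scenario B: X -> M -> Y (M is a mediator). Total effect of X on Y is 1.0 * 2.0 = 2.0.
x2 = rng.normal(size=n)
m = 1.0 * x2 + rng.normal(size=n)
y2 = 2.0 * m + rng.normal(size=n)
crude_b = ols_coef(y2, x2)[1]        # about 2.0: the total effect
adjusted_b = ols_coef(y2, x2, m)[1]  # about 0.0: adjusting for M removes the effect
```

The data alone look similar in both scenarios; only the assumed causal structure says whether the third variable should be adjusted for.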