Regression strategy for hypothesis testing vs prediction

SUMMARY

The discussion centers on how regression strategy should differ between hypothesis testing and prediction, particularly within frequentist statistics. Seth highlights the risks of automated techniques like stepwise regression, which can distort significance tests. He also notes that variable elimination can help prediction by reducing overfitting. He suggests basing hypothesis tests on a single pre-specified full model with a correction for multiple testing. He acknowledges that deciding which control variables to include is itself a bias-variance trade-off, and he asks whether splitting the data is the remedy for data dredging. The conversation concludes with a call for clearer resources on model building strategies tailored to different objectives.
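A small simulation shows why data-driven selection distorts significance tests. This is a minimal sketch and a deliberate simplification of stepwise regression, assuming a single screening step. The response and ten candidate predictors are independent standard normal noise. Each predictor's association with the response is tested with a correlation test, which plays the role of the slope t-test in a one-predictor regression. The p-value comes from Fisher's z-transform, used here as a large-sample approximation. The function names and settings (`n=50`, `k=10`, `n_sims=2000`) are illustrative choices, not anything from the original discussion.

```python
import math
import random
from statistics import NormalDist


def pearson_r(x, y):
    """Sample Pearson correlation between two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def corr_p_value(r, n):
    """Two-sided p-value for H0: rho = 0, via Fisher's z (large-sample approximation)."""
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2 * (1 - NormalDist().cdf(abs(z)))


def false_positive_rates(n=50, k=10, n_sims=2000, alpha=0.05, seed=1):
    """Type I error of a pre-specified test vs. 'pick the best predictor, then test it'."""
    rng = random.Random(seed)
    prespecified_hits = 0
    selected_hits = 0
    for _ in range(n_sims):
        # The null is true for every predictor: y and all xs are independent noise
        y = [rng.gauss(0, 1) for _ in range(n)]
        xs = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(k)]
        pvals = [corr_p_value(pearson_r(x, y), n) for x in xs]
        # Honest test: predictor 0 was chosen before looking at the data
        prespecified_hits += pvals[0] < alpha
        # Data-driven test: scan all k predictors and report the smallest p-value
        selected_hits += min(pvals) < alpha
    return prespecified_hits / n_sims, selected_hits / n_sims


prespecified_rate, selected_rate = false_positive_rates()
```

The pre-specified rate should land near the nominal 0.05. The "scan, then test the winner" rate should be much larger. With ten nearly independent candidates, it should come out at roughly 1 - 0.95^10, or about 0.40.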

PREREQUISITES
  • Understanding of multiple linear regression techniques
  • Familiarity with frequentist statistics and p-values
  • Knowledge of bias-variance trade-off in statistical modeling
  • Experience with variable selection methods, including stepwise regression
NEXT STEPS
  • Research best practices for hypothesis testing in multiple linear regression
  • Explore the implications of variable elimination on model accuracy
  • Learn about data splitting techniques for model validation
  • Investigate resources on model building strategies for different statistical objectives
USEFUL FOR

Statisticians, data analysts, and researchers involved in hypothesis testing and predictive modeling who seek to enhance their understanding of regression strategies and avoid common pitfalls in model building.
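The data-splitting remedy raised in the thread can be sketched the same way, under the same hypothetical pure-noise setup. The exploration half is dredged freely; here that means keeping whichever predictor correlates most strongly with the response. The single chosen hypothesis is then tested once on the untouched confirmation half. The 50/50 split, the Fisher z approximation, and the helper names (`take`, `split_then_confirm`) are illustrative assumptions.

```python
import math
import random
from statistics import NormalDist


def pearson_r(x, y):
    """Sample Pearson correlation between two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def corr_p_value(r, n):
    """Two-sided p-value for H0: rho = 0, via Fisher's z (large-sample approximation)."""
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2 * (1 - NormalDist().cdf(abs(z)))


def take(values, rows):
    """Subset a list by a list of row indices."""
    return [values[i] for i in rows]


def split_then_confirm(n=100, k=10, n_sims=1000, alpha=0.05, seed=2):
    """Rejection rate of a held-out confirmatory test after dredging the other half."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        # Pure-noise data again: no predictor is truly related to y
        y = [rng.gauss(0, 1) for _ in range(n)]
        xs = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(k)]
        rows = list(range(n))
        rng.shuffle(rows)
        explore, confirm = rows[: n // 2], rows[n // 2 :]
        y_exp, y_conf = take(y, explore), take(y, confirm)
        # Exploration half: look as much as you like; here, keep the strongest correlate
        best = max(range(k), key=lambda j: abs(pearson_r(take(xs[j], explore), y_exp)))
        # Confirmation half: one pre-committed test on data the search never touched
        r = pearson_r(take(xs[best], confirm), y_conf)
        hits += corr_p_value(r, len(confirm)) < alpha
    return hits / n_sims


holdout_rate = split_then_confirm()
```

The confirmation half played no part in choosing the predictor, so `holdout_rate` should sit near the nominal alpha. The price is power: only half of the data informs the final test.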

wvguy8258
Hi,

I'm interested in this question primarily as it relates to what is usually called classical or frequentist statistics (p-values, etc.). I'm fully aware of how stepwise and other automated techniques can negatively impact analyses and render inference based on parameter estimates suspect. I also realize that variable elimination may be desirable if one wants to make predictions outside of the calibration data, as extraneous variables can lead to overfitting.

I'm primarily interested in hypothesis testing using multiple linear regression. If stepwise and other automated procedures distort the interpretation of significance tests, won't the same thing happen if the analyst removes variables or builds the regression somewhat interactively? Examples include adding an important variable based on theory and then looking at partial residual plots to choose the next likely variable. Another is deciding, after inspecting initial results, that the dependent variable needs a log transform. If so, it seems best to calibrate the regression model using all the variables you wish to test hypotheses on. You would add a few others whose effects you feel it may be necessary to control for, and then base inference on this initial full model with a correction for multiple testing. This is a bit risky because you are implicitly playing with the bias-variance trade-off for the parameter estimates. (Do I add this variable to control for an effect, and thereby inflate variance?)

It just seems that, to be really pure about it, you aren't supposed to look at the data before conducting a hypothesis test. Once you cross that boundary and start trying variable transformations, removing and adding terms, it seems you are moving closer and closer to data dredging. What is the remedy for this? Should you split the data, do whatever you like to one part to tease apart relationships, and then apply one final model, hands off, to the withheld part?
I've been quite frustrated for lack of finding a few succinct articles on model building strategies under different modeling objectives.

Thanks,
Seth
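Seth describes a full-model route. One common choice of multiple-testing correction for it is the Holm-Bonferroni step-down procedure. It controls the family-wise error rate and is never less powerful than plain Bonferroni. The sketch below assumes you already have the coefficient p-values from a single, pre-specified full model. The four p-values in the example are made-up placeholders, not results, and `holm_adjust` is a hypothetical helper name.

```python
def holm_adjust(pvals):
    """Holm-Bonferroni adjusted p-values, returned in the original order."""
    m = len(pvals)
    # Indices of the p-values from smallest to largest
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # The rank-th smallest p-value (rank starting at 0) is multiplied by (m - rank)
        candidate = min(1.0, (m - rank) * pvals[i])
        # Step-down: an adjusted p-value can never be smaller than the one before it
        running_max = max(running_max, candidate)
        adjusted[i] = running_max
    return adjusted


# Hypothetical p-values for four pre-specified coefficients in the full model
example = [0.01, 0.04, 0.03, 0.20]
adjusted = holm_adjust(example)
reject = [p < 0.05 for p in adjusted]
```

For these placeholder inputs, the adjusted values are approximately [0.04, 0.09, 0.09, 0.20]. Only the first coefficient stays significant at 0.05, even though three of the raw p-values were below 0.05.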
 
wvguy8258 said:
... I've been quite frustrated for lack of finding a few succinct articles on model building strategies under different modeling objectives.

I understand your frustration - authors who can summarize the key points in a few readable lines are few and far between. Let me know how the search goes.
 
