Regression strategy for hypothesis testing vs prediction

In summary, the thread raises concerns about how automated and interactive model-building techniques affect hypothesis testing with multiple linear regression. The poster suggests fitting a full model containing every variable under test plus a few control variables, then correcting for multiple testing. They acknowledge the risk of data dredging and suggest splitting the data as a potential remedy, and express frustration at the lack of concise articles on model-building strategies for different modeling objectives.
  • #1
wvguy8258
Hi,

I'm interested in this question primarily as it relates to what is usually called classical or frequentist statistics (p-values, etc.). I'm fully aware of how stepwise and other automated techniques can negatively impact analyses and render inference based upon parameter estimates suspect. I also realize that variable elimination may be desirable if one wants to make predictions outside of the calibration data, as extraneous variables can lead to overfitting.

I'm primarily interested in hypothesis testing using multiple linear regression. If stepwise and other automated procedures distort the interpretation of significance tests, then will this not also happen if the analyst removes variables or builds the regression somewhat interactively (i.e., adding an important variable based upon theory, then looking at partial residual plots to determine the next likely variable to include, or even deciding that a log transform is needed on the dependent variable after inspecting initial results, etc.)?

If this is the case, then it seems best to calibrate the regression model using all variables you wish to test hypotheses on, plus a few others whose effects you feel it may be necessary to control for, and then base inference on this initial full model with a correction for multiple testing. This is a bit risky because you are playing internally with the bias-variance trade-off for the parameter estimates (do I add this variable to control for an effect and thereby inflate variance?).

It just seems that, to be really pure about it, you aren't supposed to look at the data before conducting a hypothesis test. Once you break this boundary and start trying variable transformations and removing or adding things, it seems you are moving closer and closer to data dredging. What is the remedy for this? Splitting the data, doing what you will to one part to tease apart relationships, and then applying one final model to the withheld data with hands off?
I've been quite frustrated for lack of finding a few succinct articles on model building strategies under different modeling objectives.

Thanks,
Seth
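
The two remedies raised in the post above (a multiplicity correction on the full model, and a hold-out split) can be sketched roughly as follows. This is only an illustration under stated assumptions: the function names `split_indices` and `holm_adjust` are made up for this sketch, the p-values are hypothetical, and Holm's step-down method is just one of several corrections; the post does not name a specific one.

```python
import random


def split_indices(n, explore_frac=0.5, seed=0):
    """Randomly partition row indices 0..n-1 into an exploration set
    (free to dredge) and a confirmation set (touched once, hands off)."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    cut = int(n * explore_frac)
    return sorted(idx[:cut]), sorted(idx[cut:])


def holm_adjust(pvalues):
    """Holm step-down adjusted p-values, returned in the original order."""
    m = len(pvalues)
    # Visit the p-values from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # The k-th smallest p-value (k = rank + 1) is multiplied by m - k + 1.
        candidate = min(1.0, (m - rank) * pvalues[i])
        # Enforce monotonicity: adjusted values never decrease along the order.
        running_max = max(running_max, candidate)
        adjusted[i] = running_max
    return adjusted


# Hypothetical p-values for three coefficients of a pre-specified full model.
raw = [0.010, 0.040, 0.030]
print([round(p, 3) for p in holm_adjust(raw)])  # [0.03, 0.06, 0.06]

# Hypothetical 100-row data set: explore on one half, confirm on the other.
explore, confirm = split_indices(100, explore_frac=0.5, seed=42)
print(len(explore), len(confirm))  # 50 50
```

The confirmation-set p-values keep their usual interpretation only if the final model (variables, transformations) is fixed before any of the confirmation rows are examined.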
 
  • #2
wvguy8258 said:
... I've been quite frustrated for lack of finding a few succinct articles on model building strategies under different modeling objectives.

I understand your frustration - authors who can summarize the key points in a few readable lines are few and far between. Let me know how the search goes.
 

1. What is the difference between regression strategy for hypothesis testing and prediction?

Regression for hypothesis testing aims to determine whether a dependent variable is related to one or more independent variables: a model is specified to test a stated hypothesis, and inference rests on the coefficient estimates and their significance tests. Regression for prediction, on the other hand, aims to forecast new or future values of the dependent variable from the independent variables. It is judged by predictive accuracy rather than by tests of specific coefficients.

2. How do you choose between using regression for hypothesis testing or prediction?

The choice between using regression for hypothesis testing or prediction depends on the research question and the purpose of the study. If the main goal is to test a specific hypothesis, then regression strategy for hypothesis testing would be more appropriate. If the main goal is to make predictions, then regression strategy for prediction would be more suitable.

3. What are some common regression models used for hypothesis testing and prediction?

Some common regression models used for hypothesis testing include linear regression, logistic regression, and multiple regression. For prediction, common regression models include linear regression, time series regression, and polynomial regression.

4. What are the main assumptions of regression for hypothesis testing and prediction?

The main assumptions for regression in hypothesis testing are that the relationship between the variables is linear, the errors are independent and normally distributed, and the variance of the errors is constant (homoscedasticity). For prediction, the main assumptions are that the relationship between the variables is stable over time and that the data used to fit the model are representative of future data.
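
For reference, the hypothesis-testing assumptions listed above are usually written compactly as below, together with the independence of errors that the standard t-test also relies on. The notation (n observations, p predictors, coefficient β_j) is introduced here for illustration and does not appear in the thread.

```latex
y_i = \beta_0 + \sum_{j=1}^{p} \beta_j x_{ij} + \varepsilon_i,
\qquad \varepsilon_i \overset{\text{iid}}{\sim} \mathcal{N}(0, \sigma^2),
\qquad i = 1, \dots, n

H_0 : \beta_j = 0, \qquad
t_j = \frac{\hat{\beta}_j}{\operatorname{SE}(\hat{\beta}_j)} \sim t_{n-p-1}
\quad \text{under } H_0
```

The reference distribution for t_j holds when the model is specified in advance, which is why selecting variables or transformations after looking at the data can invalidate the reported p-values.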

5. How do you evaluate the performance of a regression model for hypothesis testing or prediction?

The performance of a regression model for hypothesis testing can be evaluated by looking at the significance of the relationship between the variables (p-values and confidence intervals for the coefficients), the strength of the relationship, and how well the model fits the data. For prediction, performance can be evaluated by comparing predicted values to actual values, ideally on data not used to fit the model, using an error metric such as mean squared error or a goodness-of-fit measure such as R-squared.
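
As a rough illustration of the prediction metrics mentioned above, the sketch below computes mean squared error and R-squared from actual and predicted values. The helper names and the four data points are hypothetical, chosen only to keep the arithmetic easy to follow.

```python
def mean_squared_error(actual, predicted):
    """Average squared difference between actual and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def r_squared(actual, predicted):
    """1 - SS_res / SS_tot, with SS_tot taken around the mean of `actual`."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Hypothetical held-out observations and the model's predictions for them.
actual = [3.0, 5.0, 7.0, 9.0]
predicted = [2.5, 5.5, 7.0, 9.0]

print(mean_squared_error(actual, predicted))       # 0.125
print(round(r_squared(actual, predicted), 3))      # 0.975
```

When computed on held-out data, R-squared can be negative, which indicates predictions worse than simply using the mean of the observed values.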
