(regression) why would you exclude an explanatory variable

mrcleanhands · Sep 10, 2010

If someone is interested in modelling household data to find out whether there is discrimination in the workplace, why would they ever leave out variables that are relevant to explaining the dependent variable but not so relevant to the investigation?

e.g. let's say they survey age, race and family honor rank (out of 100), and the Y variable is employability (also out of 100). This is a pretty bad example, but it's just to help illustrate my question.

Why would you exclude "education" from this regression or never bother to collect it?

Although it's probably not relevant to what we are trying to discover, won't you possibly get biased estimators (if education is correlated with "honor rank")?

The only other explanation I could think of is multicollinearity, but then honor rank would have to be highly correlated with education, and we don't know that.

So isn't it best to just include the "education" variable as a sort of insurance to make sure our estimators turn out right? We can handle multicollinearity once we test the regression.
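The bias worry can be checked with a small simulation. What follows is only a sketch under made-up assumptions: the variable names mirror the example in this post, the coefficients (1.0 for honor rank, 2.0 for education, and a 0.8 slope linking education to honor rank) are invented for illustration, and `ols` is a hypothetical helper built on NumPy's least-squares solver.

```python
import numpy as np

def ols(X, y):
    """Least-squares coefficients for y ~ X (X must already contain an intercept column)."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

rng = np.random.default_rng(0)
n = 5000

# Hypothetical data-generating process; every number here is an assumption.
honor = rng.normal(size=n)
education = 0.8 * honor + rng.normal(size=n)  # education is correlated with honor rank
employability = 1.0 * honor + 2.0 * education + rng.normal(size=n)

ones = np.ones(n)
full = ols(np.column_stack([ones, honor, education]), employability)
short = ols(np.column_stack([ones, honor]), employability)

print("honor coefficient, education included:", full[1])
print("honor coefficient, education omitted: ", short[1])
```

With education included, the honor coefficient should land near the assumed 1.0. With education omitted, it should land near 1.0 + 2.0 * 0.8 = 2.6, because honor rank absorbs the part of the education effect that moves with it. If the 0.8 is changed to 0, the two estimates should roughly agree, which is the case where leaving education out does not bias the honor coefficient.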

statdad · Sep 11, 2010

Several possibilities:

* correlated predictors can play havoc with the estimates: coefficients can have the wrong sign (i.e., previous work indicates that a variable should contribute with a positive coefficient, but your model has it with a negative coefficient)
* correlated predictors can distort the standard errors of the estimates
* even if the predictors are not correlated, we look for models that do a good job as efficiently as possible. A crude but widely used way to assess the "worth" of a regression model is to look at its R^2 value: it is a mathematical fact that R^2 never decreases (and in practice almost always increases) when a new predictor is added to a least-squares fit, regardless of whether that predictor is appropriate. We have to decide whether the increase in R^2 is worth the added complexity of the model: if it is, keep the predictor; if not, drop it.
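Two of the points above (standard errors distorted by correlated predictors, and R^2 not dropping when a predictor is added) can be seen in a short simulation. This is a rough sketch, not a recipe: the data are simulated, `fit` is a hypothetical helper, and `noise` and `x1_twin` are invented predictors standing in for "an irrelevant variable" and "a variable highly correlated with one already in the model".

```python
import numpy as np

def fit(X, y):
    """OLS via least squares; returns coefficients, their standard errors, and R^2."""
    n, k = X.shape
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    sigma2 = resid @ resid / (n - k)  # unbiased estimate of the error variance
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(X.T @ X)))
    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return coef, se, r2

rng = np.random.default_rng(1)
n = 200
ones = np.ones(n)
x1 = rng.normal(size=n)
y = 1.0 + 2.0 * x1 + rng.normal(size=n)  # assumed true model: y depends on x1 only

noise = rng.normal(size=n)               # predictor unrelated to y
x1_twin = x1 + 0.1 * rng.normal(size=n)  # predictor that is nearly a copy of x1

_, se_base, r2_base = fit(np.column_stack([ones, x1]), y)
_, _, r2_noise = fit(np.column_stack([ones, x1, noise]), y)
_, se_twin, _ = fit(np.column_stack([ones, x1, x1_twin]), y)

print("R^2 without / with the irrelevant predictor:", r2_base, r2_noise)
print("std. error of the x1 coefficient alone / with its near-copy:", se_base[1], se_twin[1])
```

The second R^2 should be at least as large as the first even though `noise` has nothing to do with `y`, and the standard error of the `x1` coefficient should grow several-fold once `x1_twin` is in the model. The exact numbers depend on the seed and on the assumed 0.1 noise scale.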


