(regression) why would you exclude an explanatory variable

In summary, when trying to determine whether there is discrimination in the workplace, it is important to include all relevant explanatory variables in the regression. Omitting a variable such as education that is correlated with an included predictor can bias the coefficient estimates and distort their standard errors. Even if a variable does not seem directly related to the question, including it acts as a form of insurance for accurate estimates. At the same time, adding more predictors always increases the R^2 value, so we must decide whether the added complexity is worth it for the model.
  • #1
mrcleanhands
If someone is modelling data on households to find out whether there is discrimination in the workplace, why would they ever leave out variables which are relevant to explaining the dependent variable but not so relevant to the investigation?

e.g. let's say they survey age, race and family honor rank (out of 100), and the Y variable is employability (also out of 100). This is a pretty bad example, but it's just to help illustrate my question.


Why would you exclude "education" from this regression or never bother to collect it?

Although it's probably not relevant to what we are trying to discover, won't you possibly get:
biased estimators (if education is correlated with "honor rank")?

The only other thing I could think of is multicollinearity as an explanation, but then honor rank would have to be highly correlated with education, and we don't know that.

So isn't it best to just include the "education" variable as a sort of insurance to make sure our estimators turn out right? And we can handle multicollinearity once we test the regression.
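The omitted-variable worry in the post can be checked with a quick simulation. This is a sketch with made-up coefficients and a made-up correlation between education and honor rank (none of these numbers come from the thread): when education is dropped, the honor-rank coefficient absorbs part of education's effect.

```python
# Sketch of omitted-variable bias; all coefficients are invented for illustration.
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

education = rng.normal(0, 1, n)
# honor rank is (by assumption) correlated with education
honor = 0.8 * education + rng.normal(0, 1, n)
# true model: employability depends on BOTH predictors
employability = 2.0 * honor + 3.0 * education + rng.normal(0, 1, n)

def ols(X, y):
    """Least-squares fit with an intercept; returns the coefficient vector."""
    X = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

full = ols(np.column_stack([honor, education]), employability)
short = ols(honor.reshape(-1, 1), employability)

print(full[1])   # close to the true value 2.0
print(short[1])  # biased upward, because omitted education loads onto honor
```

The short regression's honor coefficient converges to 2.0 + 3.0 * cov(honor, education)/var(honor), not to 2.0, which is exactly the "biased estimator" the post is worried about.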
 
  • #2
Several possibilities:

* correlated predictors can play havoc with the estimates: coefficients can come out with the wrong sign (i.e., previous work indicates that a variable should contribute with a positive coefficient, but your model gives it a negative one)
* correlated predictors can distort the standard errors of the estimates
* even if the predictors are not correlated, we look for models that do a good job as efficiently as possible. A crude but widely used way to assess the "worth" of a regression model is its R^2 value: it is a mathematical fact that R^2 will increase (or at least never decrease) any time a new predictor is introduced, regardless of whether that predictor is appropriate. We have to decide whether an increased R^2 is worth the added complexity of the model; if it is, keep the predictor, and if not, don't.
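The R^2 point in the last bullet is easy to demonstrate numerically. A small sketch (the data-generating process is invented): add a predictor of pure noise and R^2 still does not decrease.

```python
# R^2 never decreases when a predictor is added to an OLS model with intercept.
import numpy as np

rng = np.random.default_rng(1)
n = 200
x = rng.normal(0, 1, n)
y = 1.5 * x + rng.normal(0, 1, n)
junk = rng.normal(0, 1, n)  # pure noise, unrelated to y

def r_squared(X, y):
    X = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1 - resid @ resid / ((y - y.mean()) @ (y - y.mean()))

r2_one = r_squared(x.reshape(-1, 1), y)
r2_two = r_squared(np.column_stack([x, junk]), y)
print(r2_two >= r2_one)  # True: adding even a junk predictor cannot lower R^2
```

This is why raw R^2 alone can't justify adding a variable; adjusted R^2 or information criteria penalize the extra complexity.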
 

1. Why would you exclude an explanatory variable in regression?

One possible reason for excluding an explanatory variable in regression is that it is highly correlated with another explanatory variable already included in the model. This leads to multicollinearity, which can cause issues with the interpretation and accuracy of the regression coefficients.
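The standard-error inflation caused by multicollinearity can be shown directly. A sketch with simulated data (the correlation level 0.995 is an assumption for illustration): two highly correlated predictors give a much larger standard error on each coefficient than two uncorrelated ones.

```python
# Multicollinearity inflates coefficient standard errors.
import numpy as np

rng = np.random.default_rng(2)
n = 500

def coef_se(x1, x2, y):
    """Standard error of the x1 coefficient in y ~ 1 + x1 + x2."""
    X = np.column_stack([np.ones(n), x1, x2])
    beta = np.linalg.solve(X.T @ X, X.T @ y)
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - 3)          # residual variance estimate
    cov = sigma2 * np.linalg.inv(X.T @ X)     # OLS covariance matrix
    return np.sqrt(np.diag(cov))[1]

# uncorrelated predictors
a = rng.normal(0, 1, n)
b = rng.normal(0, 1, n)
y1 = a + b + rng.normal(0, 1, n)

# highly correlated predictors (corr ~ 0.995 by construction)
c = rng.normal(0, 1, n)
d = c + rng.normal(0, 0.1, n)
y2 = c + d + rng.normal(0, 1, n)

print(coef_se(a, b, y1))  # small SE
print(coef_se(c, d, y2))  # much larger SE at the same sample size
```

The inflation factor is roughly 1/(1 - r^2) in the predictor correlation r, which is the variance inflation factor (VIF) often used to diagnose multicollinearity.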

2. Can excluding an explanatory variable improve the model's predictive power?

In some cases, yes. If the excluded variable does not have a significant impact on the dependent variable, removing it from the model can reduce noise and improve the model's predictive power. However, this should only be done if there is evidence that the variable is not important.

3. What are the potential consequences of excluding an explanatory variable?

Excluding an important explanatory variable can lead to biased and misleading results. It can also bias the coefficients of the remaining variables in the model, making their significance levels unreliable. Additionally, excluding variables without proper justification can weaken the overall credibility of the model.

4. How do you decide which explanatory variables to include or exclude in a regression model?

The decision to include or exclude explanatory variables should be based on a combination of statistical analysis and subject matter expertise. Factors such as the strength of the relationship with the dependent variable, multicollinearity, and theoretical relevance should all be considered. It is important to carefully evaluate each variable and make a well-informed decision.

5. Can you add an excluded explanatory variable back into the model later on?

Yes, it is possible to add an excluded explanatory variable back into the model at a later stage. However, this should be done with caution and only if there is a valid reason for doing so. Adding variables without proper justification can lead to issues with overfitting and undermine the validity of the model's results.
