- #1
mrcleanhands
If someone is interested in modelling household data to find out whether there is discrimination in the workplace, why would they ever leave out variables that are relevant to explaining the dependent variable but not so relevant to the investigation?
E.g. let's say they survey age, race and family honor rank (out of 100), and the Y variable is employability (also out of 100). This is a pretty bad example, but it's just to help illustrate my question.
Why would you exclude "education" from this regression or never bother to collect it?
Although it's probably not relevant to what we are trying to discover, won't you possibly get:
biased estimators (if education is correlated with "honor rank").
The only other explanation I could think of is multicollinearity, but then honor rank would have to be highly correlated with education, and we don't know that.
So isn't it best to just include the "education" variable as a sort of insurance to make sure our estimators turn out right? We can handle multicollinearity once we test the regression.
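To make the worry about biased estimators concrete, here is a minimal simulation sketch of what I mean. Everything in it is an assumption made up for illustration, not survey data: the variables are standardized rather than scored out of 100, honor rank is generated to be correlated with education (slope 0.6), and the "true" effects on employability are set to 1.0 for honor rank and 2.0 for education. The helper names `fit_ols` and `honor_coefficients` are hypothetical.

```python
import numpy as np


def fit_ols(X, y):
    """Ordinary least squares with an intercept; returns [intercept, slopes...]."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


def honor_coefficients(n=20000, seed=0):
    """Estimate the honor-rank coefficient with and without education.

    All numbers below are illustrative assumptions, not real data.
    """
    rng = np.random.default_rng(seed)
    education = rng.normal(size=n)
    # Assumption: honor rank is correlated with education.
    honor = 0.6 * education + rng.normal(size=n)
    # Assumption: true effects are 1.0 (honor) and 2.0 (education).
    employability = 1.0 * honor + 2.0 * education + rng.normal(size=n)

    # Regression that omits education.
    short = fit_ols(honor, employability)
    # Regression that includes education.
    full = fit_ols(np.column_stack([honor, education]), employability)
    return short[1], full[1]


short_coef, full_coef = honor_coefficients()
print(f"honor coefficient, education omitted:  {short_coef:.3f}")
print(f"honor coefficient, education included: {full_coef:.3f}")
```

Under these assumptions the regression that omits education should overstate the honor-rank coefficient, because the omitted-variable term is 2.0 * cov(honor, education) / var(honor) = 2.0 * 0.6 / 1.36, roughly 0.88, while the regression that includes education should land close to the assumed 1.0. If the 0.6 is set to zero, the two estimates should agree, which is the "if education is correlated with honor rank" condition above.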