Hi,
I'm running a few generalized linear models. One of the predictors of interest is a categorical variable with 4 levels. I have this coded as 3 dummy variables, with the fourth level serving as the baseline that is absorbed into the intercept (to avoid perfect multicollinearity, of course). I have not read a good treatment of the following: should you consider dropping an individual dummy variable from the model, or only drop them as a whole group (meaning all in or all out)?

The categorical variable here is land use/cover, and the classes are forest, agriculture, grass, and wetland. Forest is the category not represented by a dummy variable. If agriculture and grass are statistically significant but wetland is not, then it seems the effect of removing the wetland dummy is to make forest/wetland a single, combined baseline category. This has some intuitive appeal, because the nonsignificant result indicates the possibility of no difference between forest and wetland as predictors. So, in a sense, you are allowing the model results to inform how to modify the categorical variable from which the dummy variables are produced; in this case, aggregating forest and wetland would be indicated.

Am I missing something important here? Any literature recommendations on this?

Thanks, Seth
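To make the setup concrete, here is a minimal sketch in plain Python of the coding described above. The six observations and the helper names (`dummy_code`, `drop_column`) are made up for illustration and are not from my actual data or any library. It only checks the bookkeeping claim that dropping the wetland dummy gives the same design matrix as recoding wetland into the forest baseline; it says nothing about whether doing so is statistically sound.

```python
LEVELS = ["forest", "agriculture", "grass", "wetland"]
BASELINE = "forest"  # the level with no dummy of its own


def dummy_code(values, levels, baseline):
    """Treatment (dummy) coding: an intercept plus one 0/1 column per non-baseline level."""
    names = ["intercept"] + [lv for lv in levels if lv != baseline]
    rows = [[1] + [int(v == lv) for lv in names[1:]] for v in values]
    return names, rows


def drop_column(names, rows, column):
    """Remove one named column from the design matrix."""
    idx = names.index(column)
    return names[:idx] + names[idx + 1:], [r[:idx] + r[idx + 1:] for r in rows]


# Hypothetical observations, for illustration only.
land_use = ["forest", "wetland", "grass", "agriculture", "wetland", "forest"]

# Full coding: intercept, agriculture, grass, wetland.
names_full, X_full = dummy_code(land_use, LEVELS, BASELINE)

# Option A: drop only the wetland dummy from the full design.
names_dropped, X_dropped = drop_column(names_full, X_full, "wetland")

# Option B: merge wetland into forest first, then code the 3-level variable.
merged = ["forest" if v == "wetland" else v for v in land_use]
names_merged, X_merged = dummy_code(merged, ["forest", "agriculture", "grass"], BASELINE)

# Both routes yield the same columns and the same rows.
same_design = names_dropped == names_merged and X_dropped == X_merged
print(same_design)
```

Because the two design matrices are identical, any GLM fit to them gives the same estimates, which is why dropping a single dummy amounts to redefining the baseline as forest/wetland.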