Expected coefficient change from simple to multiple linear regression

SUMMARY

The discussion centers on the impact of multicollinearity on coefficient values in linear regression models. Participants confirm that coefficients for predictors in a multiple linear regression model, represented as $$Y= \beta_3 X_3 + \beta_2 X_2 + \beta_1 X_1 + \beta_0$$, can change in both magnitude and sign compared to simple linear regression models. It is established that multicollinearity, which occurs when predictor variables are correlated, can lead to unexpected changes in coefficients. Additionally, even in the absence of multicollinearity, random variations in data can also result in sign changes of coefficients.

PREREQUISITES
  • Understanding of linear regression models, including simple and multiple linear regression.
  • Familiarity with the concept of multicollinearity and its implications in regression analysis.
  • Knowledge of coefficient interpretation in statistical modeling.
  • Basic proficiency in statistical software for regression analysis (e.g., R, Python with statsmodels).
NEXT STEPS
  • Research the effects of multicollinearity on regression coefficients, e.g. using the vif() function from R's car package.
  • Learn about techniques to detect multicollinearity, such as Variance Inflation Factor (VIF) and correlation matrices.
  • Explore methods to mitigate multicollinearity, including ridge regression and principal component analysis (PCA).
  • Study the implications of coefficient sign changes in the context of model interpretation and decision-making.
USEFUL FOR

Data analysts, statisticians, and researchers involved in regression modeling who need to understand the nuances of coefficient behavior in the presence of multicollinearity.

fog37
TL;DR
Understand the expected coefficient change (magnitude and sign) from simple to multiple linear regression.
Hello forum,

I have created some linear regression models based on a simple dataset with 4 variables (columns). The first two models each involve a single predictor: $$Y=\beta_1 X_1+\beta_0$$ and $$Y=\beta_2 X_2+ \beta_0$$
The third model is a multiple linear regression model involving all 3 predictors: $$Y= \beta_3 X_3 + \beta_2 X_2 + \beta_1 X_1 + \beta_0$$
I believe that the coefficients ##\beta_1## and ##\beta_2## for the predictors ##X_1## and ##X_2## change in magnitude when the two predictors are included together in the multivariate model (model 3), correct? What about the sign of those coefficients? Should the sign stay the same, or can it possibly change?

I would think that the sign should remain the same, indicating that the variable ##Y## and ##X_1## (or ##X_2##) vary in the same direction in both the simple and the multiple linear regression models...

Now, if multicollinearity is present, the coefficient for each predictor would certainly change in magnitude, and possibly in sign, from its value in the simple linear regression model, and not in an easily interpretable way, because of the inter-variable correlation...
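For concreteness, here is a minimal sketch in Python with statsmodels (synthetic data with made-up coefficients, not my actual dataset) where I fit the simple model and the full model and compare the estimated coefficient on ##X_1##:

```python
# Minimal sketch on synthetic data (hypothetical coefficients, for illustration only):
# compare the estimate of beta_1 from the simple regression Y ~ X1
# with its estimate from the multiple regression Y ~ X1 + X2 + X3.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 200
X1 = rng.normal(size=n)
X2 = 0.8 * X1 + rng.normal(scale=0.6, size=n)   # X2 constructed to be correlated with X1
X3 = rng.normal(size=n)
Y = 1.0 * X1 - 2.0 * X2 + 0.5 * X3 + rng.normal(size=n)

# Simple model: Y regressed on X1 alone
simple = sm.OLS(Y, sm.add_constant(X1)).fit()

# Multiple model: Y regressed on X1, X2, X3 together
design = sm.add_constant(np.column_stack([X1, X2, X3]))
multiple = sm.OLS(Y, design).fit()

print("beta_1 (simple model):  ", simple.params[1])
print("beta_1 (multiple model):", multiple.params[1])
```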

Thanks
 
I agree with you but with a couple of caveats:
  1. In real-world models, multicollinearity (correlation between the explanatory variables ##X_1,X_2,X_3##) is usually present, which undermines the expectation stated in your second-to-last paragraph.
  2. Even without genuine multicollinearity, random idiosyncratic variation in the sample can create the appearance of multicollinearity, in which case we can still get sign changes of coefficients. This will not usually happen, but it will sometimes. The larger the data set, the less often it will happen.
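A rough simulation of caveat 2 in Python (all numbers made up for illustration): here ##X_1## and ##X_2## are independent in the population, so there is no genuine multicollinearity, yet sampling noise occasionally flips the estimated sign of ##\beta_1## between the simple and the multiple model, and less often as the sample size grows.

```python
# Sketch: how often does the sign of beta_1 differ between the simple and the
# multiple regression when X1 and X2 are independent in the population?
# (Hypothetical coefficients, chosen only to illustrate the point.)
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)

def sign_flip_rate(n, n_trials=500):
    """Fraction of simulated samples in which the sign of beta_1 differs."""
    flips = 0
    for _ in range(n_trials):
        X1 = rng.normal(size=n)
        X2 = rng.normal(size=n)          # independent of X1 by construction
        Y = 0.2 * X1 + 1.0 * X2 + rng.normal(size=n)
        b_simple = sm.OLS(Y, sm.add_constant(X1)).fit().params[1]
        b_multi = sm.OLS(Y, sm.add_constant(np.column_stack([X1, X2]))).fit().params[1]
        flips += int(np.sign(b_simple) != np.sign(b_multi))
    return flips / n_trials

for n in (20, 50, 200):
    print(n, sign_flip_rate(n))    # the flip rate shrinks as n grows
```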
 
andrewkirk said:
I agree with you but with a couple of caveats:
  1. In real-world models, multicollinearity (correlation between the explanatory variables ##X_1,X_2,X_3##) is usually present, which undermines the expectation stated in your second-to-last paragraph.
  2. Even without genuine multicollinearity, random idiosyncratic variation in the sample can create the appearance of multicollinearity, in which case we can still get sign changes of coefficients. This will not usually happen, but it will sometimes. The larger the data set, the less often it will happen.
Thanks for the quick and interesting reply. I am indeed surprised to learn that, even without any multicollinearity, a change in coefficient sign is possible when the same variable of interest appears in both a simple and a multiple regression model...

Regarding multicollinearity, my understanding is that it affects the coefficients' values in strange ways. I recently learned that, in the case of a model with a linear term ##X## and a quadratic term ##X^2##, like $$Y=\beta_0+\beta_1 X+\beta_2 X^2$$ multicollinearity would not necessarily be a problem, even though ##X## and ##X^2## are dependent (just not linearly dependent). Isn't the fact that one variable changing causes a change in the other variable the very definition of multicollinearity?
 
fog37 said:
Isn't the fact that one variable changing causes a change in the other variable the very definition of multicollinearity?
No, they just have to be correlated. Causation is not part of the definition (e.g. see here). A common situation is where the correlation arises from each of the explanatory variables being driven ("caused") by another variable that may not be part of the set of explanatory variables. E.g. in a regression that had population crime levels and sickness levels as explanatory variables, we would likely find that those two are correlated because both are driven by a third variable, average population wealth, which may not be among the explanatory variables.
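A quick sketch of that common-driver situation (hypothetical numbers): crime and sickness are each driven by wealth and have no direct link to each other, yet they come out correlated in the data.

```python
# Two explanatory variables correlated only because both are driven by a
# third variable (average wealth) that is not itself in the regression.
# All coefficients are made up for illustration.
import numpy as np

rng = np.random.default_rng(2)
n = 1000
wealth = rng.normal(size=n)                            # the common driver
crime = -0.7 * wealth + rng.normal(scale=0.7, size=n)
sickness = -0.6 * wealth + rng.normal(scale=0.8, size=n)

# Neither variable causes the other, yet their correlation is clearly non-zero.
print(np.corrcoef(crime, sickness)[0, 1])
```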
 
andrewkirk said:
No, they just have to be correlated. Causation is not part of the definition (e.g. see here). A common situation is where the correlation arises from each of the explanatory variables being driven ("caused") by another variable that may not be part of the set of explanatory variables. E.g. in a regression that had population crime levels and sickness levels as explanatory variables, we would likely find that those two are correlated because both are driven by a third variable, average population wealth, which may not be among the explanatory variables.
Sure, sorry, I used "cause" inadvertently. But just the fact that ##X## and ##X^2## are deterministically dependent would make me think that structural collinearity would emerge between them...
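As a quick numerical check (made-up data, assuming ##X## is strictly positive), the structural correlation between ##X## and ##X^2## is indeed very strong in that case, although it is the linear correlation that matters, not the deterministic dependence:

```python
# How correlated are X and X^2 as regressors? (Made-up data, X strictly positive.)
import numpy as np

rng = np.random.default_rng(3)
X = rng.uniform(1.0, 10.0, size=500)     # strictly positive X

print(np.corrcoef(X, X**2)[0, 1])        # close to 1: strong structural collinearity
Xc = X - X.mean()
print(np.corrcoef(Xc, Xc**2)[0, 1])      # near 0: centering removes most of it
```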
 
