Collinearity between predictors: what happens under the hood

In summary: when predictors are uncorrelated, each regression coefficient gives the change in ##Y## per unit change in that predictor with the others held fixed. When predictors are collinear they move together in the data, so that interpretation breaks down, the coefficients become hard to interpret (and, in the extreme, non-unique), and stepwise fitting or diagnostics are needed to make sense of the model.
  • #1
fog37
TL;DR Summary
Understanding the idea of "keeping the other predictors fixed" to interpret partial coefficients
Hello,

In the presence of NO multicollinearity, with a linear regression model like ##Y=3 X_1+2 X_2##, the predictors ##X_1, X_2## are not pairwise correlated.
  • When ##X_1## changes by 1 unit, the dependent variable ##Y## changes by ##3## units, i.e. ##\Delta Y = 3##, while the other predictors are kept fixed/constant, i.e. they are not simultaneously changing with ##X_1## and contributing to that ##\Delta Y = 3##. By analogy, it is as if the predictors were "decoupled" gears.
  • However, when multicollinearity is present (##X_1## and ##X_2## are correlated), it is no longer true that the observed change in ##Y## for a 1-unit change in ##X_1## is due to that change in ##X_1## alone with the other variables held fixed/constant. The observed change reflects the explicit change in ##X_1## but also the implicit change in ##X_2## (also by one unit?) that comes with it: changing ##X_1## automatically changes ##X_2##, which is not kept constant while ##X_1## changes.
I think my understanding is correct but I don't fully understand how all this happens mechanically within the data. Does the idea of "while keeping the other variables fixed" really mean that the calculation of the coefficients ##\beta## involves the pairwise correlation ##r_{12}##, compromising the "purity" of the coefficients? I just don't see how, operationally, changing ##X_1## by one unit (e.g. increasing ##X_1## from 0 to 1) automatically, under the hood, activates a change of ##X_2## in the equation which silently contributes part of ##\Delta Y##.

It is like ##\Delta Y## = (##\Delta Y## due to ##X_1##) + (##\Delta Y## due to ##X_2##)

Thank you for any clarification.
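A minimal simulation sketch of what happens under the hood (the numbers below, including the 0.5 coupling between the predictors, are arbitrary assumptions): the multiple-regression fit still recovers 3 and 2 even with correlated predictors, because it is built from the part of each predictor not explained by the other, while a regression of ##Y## on ##X_1## alone absorbs the induced change in ##X_2## into its slope.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Correlated predictors: X2 moves together with X1 (coupling of 0.5, chosen arbitrarily).
x1 = rng.normal(size=n)
x2 = 0.5 * x1 + rng.normal(scale=0.5, size=n)
y = 3 * x1 + 2 * x2 + rng.normal(scale=0.1, size=n)

# Multiple regression: both predictors in the design matrix.
X = np.column_stack([np.ones(n), x1, x2])
beta_joint, *_ = np.linalg.lstsq(X, y, rcond=None)

# Simple regression of Y on X1 alone: X2's induced change is folded into the slope.
X1_only = np.column_stack([np.ones(n), x1])
beta_marginal, *_ = np.linalg.lstsq(X1_only, y, rcond=None)

print(beta_joint[1:])    # ~ [3.0, 2.0]: the "other predictor held fixed" coefficients
print(beta_marginal[1])  # ~ 3 + 2*0.5 = 4.0: a unit change in X1 drags X2 along
```

Here the marginal slope comes out near ##3 + 2(0.5) = 4##: in the data, a unit change in ##X_1## typically comes with a 0.5 change in ##X_2##, and that hidden contribution is exactly what the "other predictors held fixed" coefficient excludes.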
 
  • #2
In the case of correlated independent variables, ##X_1## and ##X_2##, the coefficients of the linear regression are not necessarily unique. As an extreme example, consider the case where ##X_1= X_2##. The use of a second variable is completely redundant and linear regressions with both variables are possible with a whole set of coefficient combinations.

A step-by-step process alleviates the problem and gives statistical meaning to the coefficients. Suppose ##X_1 = X_2## and the linear regression model ##Y = a_1 X_1 + \epsilon## gives the minimal sum of squared errors. Then there will be no correlation between the sample values ##x_{2,i}## and the sample errors ##\epsilon_i##, because the term ##a_1 X_1## has already captured all of the correlation that could be obtained by adding ##X_2## to the linear model.

In a less extreme example, where ##X_1## and ##X_2## are correlated but not equal, there might be some residual error from the ##Y = a_1 X_1 + \epsilon## model that can be reduced by adding an ##X_2## term to the regression. If the reduction is statistically significant, ##X_2## can be added. The term ##a_2 X_2## can then be thought of as accounting for/explaining/predicting the residual errors left over by the ##Y = a_1 X_1 + \epsilon## model.

This process is automated in stepwise linear regression algorithms. The results should be examined for validity and not just applied blindly. The bidirectional elimination algorithm is the most sophisticated. Suppose that variable ##X_1## gives the best single-variable model, but ##X_2## and ##X_3## are added in later steps because their reduction of the residual errors was statistically significant. It can happen that the model with only ##X_2## and ##X_3## explains so much of the ##Y## values that the ##X_1## term is no longer statistically significant. The bidirectional elimination algorithm would go back and remove ##X_1## from the final regression result.
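A rough numerical sketch of the residual-based step described above, using simulated data (the coefficients and noise levels are arbitrary assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500

x1 = rng.normal(size=n)
x2 = 0.7 * x1 + rng.normal(scale=0.7, size=n)   # correlated with x1, but not equal
y = 3 * x1 + 2 * x2 + rng.normal(size=n)

# Step 1: fit Y = a0 + a1*X1 and keep the residuals.
X1d = np.column_stack([np.ones(n), x1])
a, *_ = np.linalg.lstsq(X1d, y, rcond=None)
resid = y - X1d @ a

# Step 2: does X2 explain any of the leftover residuals?
X2d = np.column_stack([np.ones(n), x2])
b, *_ = np.linalg.lstsq(X2d, resid, rcond=None)

sse_before = np.sum(resid**2)
sse_after = np.sum((resid - X2d @ b)**2)
print(a[1])                   # absorbs part of X2's effect (comes out above 3 here)
print(b[1])                   # slope of the residuals on X2
print(sse_before, sse_after)  # reduction in sum of squared errors from adding X2
```

(Strictly, to reproduce the coefficient ##X_2## would receive in the joint fit, ##X_2## itself must also be residualized against ##X_1## first; the version above only illustrates the "explain what is left over" idea.)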
 
  • #3
Measuring collinearity is essentially asking what the beta is when you use ##X_1## to predict ##X_2## and getting a non-zero answer. In your example, suppose ##X_2=0.5X_1+\text{independent noise}##. Then how would you expect ##Y## to change if ##X_1## changes by 1 unit?

This is basically the same thing as the chain rule with multiple inputs, if that's something you're familiar with.
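For concreteness, using the ##Y = 3X_1 + 2X_2## model from the original post: if ##X_2 = 0.5X_1 + \text{noise}##, a unit increase in ##X_1## typically comes with a 0.5 increase in ##X_2##, so the expected change is ##\Delta Y = 3(1) + 2(0.5) = 4## rather than 3, which is just the multivariable chain rule ##\frac{dY}{dX_1} = \frac{\partial Y}{\partial X_1} + \frac{\partial Y}{\partial X_2}\frac{dX_2}{dX_1}##.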
 
  • #4
Office_Shredder said:
Measuring collinearity is essentially asking what the beta is when you use ##X_1## to predict ##X_2## and getting a non-zero answer. In your example, suppose ##X_2=0.5X_1+\text{independent noise}##. Then how would you expect ##Y## to change if ##X_1## changes by 1 unit?

This is basically the same thing as the chain rule with multiple inputs, if that's something you're familiar with.
The chain rule example clears things up nicely. For example, in the case of perfect collinearity, if ##X_2=-2X_1##

$$Y=3X_1 + 2X_2=3X_1-4 X_1=-X_1$$

and ##\frac{\Delta Y}{\Delta X_1} = -1## instead of ##\frac{\Delta Y}{\Delta X_1} = 3##.
 
  • #5
Yep. There is one very important difference with the chain rule. When taking derivatives, things invert in the natural way: if ##\partial X_2 /\partial X_1=3##, then ##\partial X_1 /\partial X_2=1/3##. Betas don't work that way. If ##X_2=0.5 X_1+\text{noise}##, then the only thing you can say is ##X_1=\beta X_2+\text{noise}## where ##|\beta|\leq 2## (equality only if they are perfectly correlated).
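A quick numerical check of that asymmetry (a sketch with simulated data; the 0.5 coupling and the noise levels are arbitrary): with ##X_2 = 0.5X_1 + \text{noise}##, the slope of ##X_1## regressed on ##X_2## equals ##0.5\,\mathrm{Var}(X_1)/\mathrm{Var}(X_2)##, which is 2 only when the noise vanishes and shrinks as the noise grows.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000

for noise_sd in (0.0, 0.2, 1.0):
    x1 = rng.normal(size=n)
    x2 = 0.5 * x1 + rng.normal(scale=noise_sd, size=n)
    # Slope of X1 regressed on X2 (with intercept): cov(x1, x2) / var(x2)
    slope = np.cov(x1, x2)[0, 1] / np.var(x2, ddof=1)
    print(noise_sd, slope)   # 2.0 only when noise_sd == 0; smaller otherwise
```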
 
  • #6
Don't you consider interaction effects in your model, as in ##\beta_1X_1 + \beta_2 X_2+ \beta_3 X_1*X_2##?

Which you would ultimately test for by testing the hypothesis ##\beta_3=0##?
 
  • #7
WWGD said:
Don't you consider interaction effects in your model, as in ##\beta_1X_1 + \beta_2 X_2+ \beta_3 X_1*X_2##?

Which you would ultimately test for by testing the hypothesis ##\beta_3=0##?
You certainly can, but that can be done with the variable ##X_3 = X_1*X_2## in the usual way.
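A sketch of "the usual way" with simulated data (the model and coefficients are arbitrary assumptions): the interaction simply becomes another column in the design matrix, and its coefficient can be tested against zero like any other.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 1_000

x1 = rng.normal(size=n)
x2 = 0.5 * x1 + rng.normal(scale=0.5, size=n)
y = 3 * x1 + 2 * x2 + rng.normal(size=n)   # no interaction in the true model

x3 = x1 * x2                               # interaction treated as a third variable
X = np.column_stack([np.ones(n), x1, x2, x3])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta)   # beta[3] should be close to 0 here; a t test on it checks beta_3 = 0
```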
 
  • #8
Just a related question about fitting a multiple linear regression model to our multivariate data: what steps can we take to figure out whether multiple linear regression is an adequate model for our data at all?
For simple linear regression, we can easily inspect the scatterplot between ##Y## and the single predictor ##X## to see if the cloud of data follows a linear trend... But in the case of multiple regressors ##X_1, X_2, X_3, X_4##, would we first plot individual scatterplots between ##Y## and ##X_1##, ##Y## and ##X_2##, ##Y## and ##X_3##, ##Y## and ##X_4##?
And if the scatterplots all show a linear trend, do we then try to fit the data with a multiple linear regression equation (i.e. a hyperplane)? What if the data look linear in some scatterplots and not in others?

Thank you!
 
  • #9
The best thing is if you have knowledge of the subject matter and are comfortable with the form of your model. The regression algorithm in any good statistical package will indicate the statistical significance of each term in the model. You should not include terms in the model that do not both make sense in the subject matter and pass the test of statistical significance.
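For reference, a sketch of what the significance output of a statistical package boils down to under standard OLS assumptions (simulated data; the model is an arbitrary assumption): estimates ##\hat\beta = (X^TX)^{-1}X^Ty##, standard errors from ##\hat\sigma^2(X^TX)^{-1}##, and t statistics ##\hat\beta_j/\mathrm{se}(\hat\beta_j)##.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 200

x1 = rng.normal(size=n)
x2 = 0.5 * x1 + rng.normal(scale=0.5, size=n)
y = 3 * x1 + 2 * x2 + rng.normal(size=n)

X = np.column_stack([np.ones(n), x1, x2])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

resid = y - X @ beta
dof = n - X.shape[1]
sigma2 = resid @ resid / dof                  # residual variance estimate
cov_beta = sigma2 * np.linalg.inv(X.T @ X)    # covariance matrix of the estimates
se = np.sqrt(np.diag(cov_beta))
t_stats = beta / se                           # compare to a t distribution with `dof` df

for name, b, s, t in zip(["const", "x1", "x2"], beta, se, t_stats):
    print(f"{name:>5s}  beta={b:6.3f}  se={s:6.3f}  t={t:6.2f}")
```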
 
  • #10
When you evaluate a regression model, you should keep one thing in mind. Suppose that two independent variables, ##X_1## and ##X_2##, both have positive correlations with ##Y##. It can easily happen that the best linear regression model ##Y = a_1 X_1 +a_2 X_2 +\epsilon## has ##a_1## a little high, and that this is corrected by a negative ##a_2##. That may be correct even though the sign of ##a_2## appears wrong. A close examination of the regression process will allow you to determine what happened.
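A concrete construction of that situation (a sketch with simulated data; the coefficients are arbitrary): both predictors are positively correlated with ##Y##, yet the correct joint coefficient on ##X_2## is negative.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 5_000

x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.5, size=n)          # strongly correlated with x1
y = 2 * x1 - x2 + rng.normal(scale=0.1, size=n)  # true coefficients: +2 and -1

print(np.corrcoef(x1, y)[0, 1])   # positive
print(np.corrcoef(x2, y)[0, 1])   # also positive, despite the -1 coefficient

X = np.column_stack([np.ones(n), x1, x2])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta[1:])                   # ~ [2, -1]: the "wrong-looking" sign is correct
```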
 
  • #11
You can also use the distribution of the coefficients to test the null hypothesis ##H_0: \beta_i=0## against ##H_A: \beta_i \neq 0##, and check the adjusted ##R^2## to see whether it increases or decreases as you add variables. There are also stepwise regression methods such as forward selection and backward elimination.
https://en.wikipedia.org/wiki/Stepwise_regression
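A small sketch of the adjusted ##R^2## comparison (simulated data; the formula used is ##\bar R^2 = 1 - (1-R^2)\frac{n-1}{n-k}## with ##k## fitted parameters including the intercept): adding a pure-noise predictor can only raise the plain ##R^2##, while the adjusted version penalizes the extra parameter.

```python
import numpy as np

def fit_r2(X, y):
    """Return (R^2, adjusted R^2) for an OLS fit; X already contains the intercept column."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    ss_res = resid @ resid
    ss_tot = np.sum((y - y.mean())**2)
    n, k = X.shape
    r2 = 1 - ss_res / ss_tot
    adj = 1 - (1 - r2) * (n - 1) / (n - k)
    return r2, adj

rng = np.random.default_rng(6)
n = 100
x1 = rng.normal(size=n)
y = 3 * x1 + rng.normal(size=n)
junk = rng.normal(size=n)          # unrelated to y

X_small = np.column_stack([np.ones(n), x1])
X_big = np.column_stack([np.ones(n), x1, junk])
print(fit_r2(X_small, y))
print(fit_r2(X_big, y))   # R^2 never decreases when a column is added; adjusted R^2 penalizes it
```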
 
  • #12
FactChecker said:
In the case of correlated independent variables, ##X_1## and ##X_2##, the coefficients of the linear regression are not necessarily unique. As an extreme example, consider the case where ##X_1= X_2##. The use of a second variable is completely redundant and linear regressions with both variables are possible with a whole set of coefficient combinations.

A step-by-step process alleviates the problem and gives statistical meaning to the coefficients. Suppose ##X_1 = X_2## and the linear regression model ##Y = a_1 X_1 + \epsilon## gives the minimal sum of squared errors. Then there will be no correlation between the sample values ##x_{2,i}## and the sample errors ##\epsilon_i##, because the term ##a_1 X_1## has already captured all of the correlation that could be obtained by adding ##X_2## to the linear model.

In a less extreme example, where ##X_1## and ##X_2## are correlated but not equal, there might be some residual error from the ##Y = a_1 X_1 + \epsilon## model that can be reduced by adding an ##X_2## term to the regression. If the reduction is statistically significant, ##X_2## can be added. The term ##a_2 X_2## can then be thought of as accounting for/explaining/predicting the residual errors left over by the ##Y = a_1 X_1 + \epsilon## model.

This process is automated in stepwise linear regression algorithms. The results should be examined for validity and not just applied blindly. The bidirectional elimination algorithm is the most sophisticated. Suppose that variable ##X_1## gives the best single-variable model, but ##X_2## and ##X_3## are added in later steps because their reduction of the residual errors was statistically significant. It can happen that the model with only ##X_2## and ##X_3## explains so much of the ##Y## values that the ##X_1## term is no longer statistically significant. The bidirectional elimination algorithm would go back and remove ##X_1## from the final regression result.
It's important to note that stepwise regression methods are, in general, not good choices, and even though I teach courses that discuss them, I strongly urge students not to use them in practice. A few reasons:
1. The ##R^2## values for models that come from them tend to be higher than they should be
2. The F statistics often reported don't really have F distributions
3. The standard errors of the parameter estimates are too small, so the confidence intervals around the parameter estimates are not accurate
4. Because of the multiple tests in the process, the p-values are often too low and are difficult to correct
5. The slope estimates are biased (this is probably not the strongest argument against them, since the notion of a slope estimate being unbiased simply means it is unbiased for the model you specify, and you have no idea whether that is the correct model)
6. These methods make the problems caused by collinearity in the predictors worse
For a good discussion of these issues see Frank Harrell's Regression Modeling Strategies (2001).
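A small simulation of the selection effect behind several of these points (a sketch; the sample size, number of candidate predictors, and cutoff are arbitrary choices): ##Y## is pure noise, yet choosing the best of 20 candidate predictors and then reading off its naive t test rejects ##\beta = 0## far more often than the nominal 5%, because the test ignores the selection step.

```python
import numpy as np

rng = np.random.default_rng(7)
n, n_candidates, n_sims = 100, 20, 2_000
rejections = 0

for _ in range(n_sims):
    X = rng.normal(size=(n, n_candidates))
    y = rng.normal(size=n)                 # y is unrelated to every candidate

    # "Forward selection", first step only: keep the best-correlated predictor.
    best = np.argmax(np.abs(np.corrcoef(X, y, rowvar=False)[-1, :-1]))
    x = X[:, best]

    # Naive t test for the slope in y = a + b*x, ignoring that x was selected.
    Xd = np.column_stack([np.ones(n), x])
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    resid = y - Xd @ beta
    sigma2 = resid @ resid / (n - 2)
    se_b = np.sqrt(sigma2 * np.linalg.inv(Xd.T @ Xd)[1, 1])
    if abs(beta[1] / se_b) > 1.98:         # ~5% two-sided cutoff for t with 98 df
        rejections += 1

print(rejections / n_sims)   # far above the nominal 0.05
```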
 
  • #13
statdad said:
It's important to note that stepwise regression methods are, in general, not good choices, and even though I teach courses that discuss them, I strongly urge students not to use them in practice. A few reasons:
1. The ##R^2## values for models that come from them tend to be higher than they should be
2. The F statistics often reported don't really have F distributions
3. The standard errors of the parameter estimates are too small, so the confidence intervals around the parameter estimates are not accurate
4. Because of the multiple tests in the process, the p-values are often too low and are difficult to correct
5. The slope estimates are biased (this is probably not the strongest argument against them, since the notion of a slope estimate being unbiased simply means it is unbiased for the model you specify, and you have no idea whether that is the correct model)
6. These methods make the problems caused by collinearity in the predictors worse
For a good discussion of these issues see Frank Harrell's Regression Modeling Strategies (2001).
IMO, if the assumptions are met, the mathematics is correct and well established.
 
  • #14
FactChecker said:
IMO, if the assumptions are met, the mathematics is correct and well established.
I'm not sure what you mean here: the points I made (again, look at Harrell for deeper discussion) are also mathematical points: they apply even if the assumptions are met.
Stepwise methods, by their nature, negate the benefits of the usual assumptions about LS regression.
 

FAQ: Collinearity between predictors
1. What is collinearity between predictors?

Collinearity between predictors refers to the situation where two or more predictor variables in a statistical model are highly correlated with each other. This can cause issues in the model, such as inflated standard errors and difficulty in interpreting the effects of individual predictors.

2. How does collinearity affect the results of a statistical model?

Collinearity can lead to unstable and unreliable estimates of the coefficients in a model. This means that the effects of the individual predictors may be difficult to interpret and can lead to incorrect conclusions about the relationship between the predictors and the outcome variable.

3. How is collinearity detected?

Collinearity can be detected by examining the correlation matrix of the predictor variables. A high correlation coefficient (as a rule of thumb, above about 0.7) between two or more variables indicates collinearity. Alternatively, diagnostics such as the variance inflation factor (VIF) can be used to quantify how much each coefficient's variance is inflated by collinearity.
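A sketch of the VIF computation on simulated data (the 5–10 warning threshold in the comment is only a common convention): ##\mathrm{VIF}_j = 1/(1 - R_j^2)##, where ##R_j^2## is the ##R^2## from regressing predictor ##j## on the remaining predictors.

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of X (predictors only, no intercept)."""
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, X[:, j], rcond=None)
        resid = X[:, j] - others @ beta
        r2 = 1 - resid @ resid / np.sum((X[:, j] - X[:, j].mean())**2)
        out[j] = 1.0 / (1.0 - r2)
    return out

rng = np.random.default_rng(8)
n = 500
x1 = rng.normal(size=n)
x2 = 0.9 * x1 + rng.normal(scale=0.3, size=n)   # highly collinear with x1
x3 = rng.normal(size=n)                         # independent
print(vif(np.column_stack([x1, x2, x3])))       # large VIFs (>5-10) for x1 and x2, ~1 for x3
```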

4. Can collinearity be corrected?

Collinearity can be addressed by removing one or more of the highly correlated predictors from the model. This can be done either by manually selecting the most important predictors or by using techniques such as principal component analysis (PCA) to combine the correlated predictors into a smaller number of components.
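A sketch of the PCA route (simulated data; the coefficients and the choice of keeping a single component are assumptions): replace the correlated columns with their leading principal component and regress on that instead. Whether the resulting component is interpretable for the problem at hand is a separate question.

```python
import numpy as np

rng = np.random.default_rng(9)
n = 500
x1 = rng.normal(size=n)
x2 = 0.9 * x1 + rng.normal(scale=0.3, size=n)   # highly correlated with x1
y = 3 * x1 + 2 * x2 + rng.normal(size=n)

# Principal components of the (centered) predictors via the SVD.
X = np.column_stack([x1, x2])
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
pc1 = Xc @ Vt[0]                                # scores on the first component

# Regress Y on the single combined predictor instead of the two collinear ones.
D = np.column_stack([np.ones(n), pc1])
beta, *_ = np.linalg.lstsq(D, y, rcond=None)
print(s**2 / np.sum(s**2))   # share of predictor variance captured by each component
print(beta)
```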

5. How can I prevent collinearity in my statistical model?

To prevent collinearity, it is important to select the predictors for your model carefully. This means avoiding highly correlated variables and choosing predictors that are relevant and meaningful for the outcome variable. A larger sample size does not remove collinearity between the predictors, but it can reduce the variance of the coefficient estimates and so lessen its practical impact.
