Bivariate correlation does not always catch multicollinearity

  • Context: Undergrad
  • Thread starter: fog37

Discussion Overview

The discussion revolves around the concept of multicollinearity in the context of multiple predictors in statistical modeling. Participants explore how multicollinearity can exist even when pairwise correlations between predictors are low, questioning the implications for regression analysis and the interpretation of correlation coefficients.

Discussion Character

  • Debate/contested
  • Technical explanation
  • Conceptual clarification

Main Points Raised

  • Some participants suggest that multicollinearity can occur despite low pairwise correlations among predictors, using examples of three predictors where collective correlation may be high.
  • One participant questions how predictors can be collectively correlated without pairwise correlation, proposing the use of Venn diagrams to visualize the relationships.
  • Another participant discusses the implications of having independent, identically distributed random variables and how this affects the correlation of a dependent variable with individual predictors.
  • There is a suggestion that the variance inflation factor (VIF) may help identify correlation combinations that are not apparent through pairwise correlation analysis.
  • Some participants express confusion about how collective correlation can exceed pairwise correlations, particularly in visual representations.
  • One participant acknowledges the complexity of the topic and suggests that the more predictors included, the weaker the correlation between the dependent variable and individual predictors may become.

Areas of Agreement / Disagreement

Participants do not reach a consensus on the nature of multicollinearity and its relationship with pairwise correlations. Multiple competing views and interpretations remain present throughout the discussion.

Contextual Notes

Participants express uncertainty regarding the definitions and implications of multicollinearity, particularly in relation to the conditions under which it can be identified. There are unresolved questions about the visual representation of these relationships and the mathematical underpinnings of the claims made.

fog37
TL;DR
Bivariate correlation does not always catch multicollinearity
Hello,

While studying multicollinearity, I learned that when there are more than two predictors, for example three predictors ##X_1, X_2, X_3##, it is possible for all the pairwise correlations to be low in value and yet for multicollinearity to still be an issue. Would that mean that the "triple" correlation, i.e. the average of the products ##X_1 X_2 X_3##, has a high value (higher than 0.7)? Is that correct?

Would you have a simple example of how three variables can be collectively correlated even if their pairwise correlations are low?

Thank you!
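A quick simulation can make the question concrete (a sketch using numpy; the choice of ten independent predictors plus an eleventh that is nearly their sum, and the 0.01 noise level, are illustrative assumptions, not from the thread). Every pairwise correlation comes out modest, yet the predictors are almost exactly linearly dependent:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# ten independent standard-normal predictors...
X = rng.standard_normal((n, 10))
# ...plus an eleventh that is almost exactly their sum
x11 = X.sum(axis=1) + 0.01 * rng.standard_normal(n)
D = np.column_stack([X, x11])

R = np.corrcoef(D, rowvar=False)
mask = ~np.eye(11, dtype=bool)
print(np.abs(R[mask]).max())        # largest pairwise correlation: modest, ~0.32
print(np.linalg.eigvalsh(R).min())  # near-zero eigenvalue: near-exact linear dependence
```

The largest pairwise correlation is about ##1/\sqrt{10} \approx 0.32##, yet the correlation matrix has an eigenvalue essentially at zero, signalling severe multicollinearity that no pairwise scan would flag.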
 
In a visual sense, using Venn diagrams, how can the predictors be collectively correlated if they are not pairwise correlated at all? The figures below show moderate multicollinearity and strong multicollinearity. I don't see how the ##X## circles can fail to overlap and still cause multicollinearity...

[Figure: Venn diagrams illustrating moderate and strong multicollinearity]
 
It may depend on how low you demand the individual pairwise correlations to be. Suppose that ##X_1## and ##X_2## are independent, identically distributed random variables and that ##Y = X_1+X_2##. Then I think it is clear that the correlation of ##Y## with any one ##X_i## may be smaller than the threshold even though ##Y## is a deterministic function of ##X_1, X_2##.
In fact, it gets easier when ##Y## is a function of more independent ##X_i## variables. Any one ##X_i## might have a low correlation with ##Y## but the combination of all the ##X_i##s might completely determine ##Y##. Suppose ##Y = X_1+X_2+...+X_{100}##, where the ##X_i##s are pairwise independent.
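To quantify this: if ##Y = X_1 + X_2 + ... + X_n## with the ##X_i## independent and identically distributed with variance ##\sigma^2##, then ##\operatorname{Cov}(Y, X_i) = \sigma^2## and ##\operatorname{Var}(Y) = n\sigma^2##, so

$$\operatorname{corr}(Y, X_i) = \frac{\sigma^2}{\sigma \cdot \sqrt{n}\,\sigma} = \frac{1}{\sqrt{n}}.$$

For ##n = 100##, each individual correlation is only ##0.1##, even though the ##X_i## together determine ##Y## exactly.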
 
FactChecker said:
It may depend on how low you demand the individual pairwise correlations to be. Suppose that ##X_1## and ##X_2## are independent, identically distributed random variables and that ##Y = X_1+X_2##. Then I think it is clear that the correlation of ##Y## with any one ##X_i## may be smaller than the threshold even though ##Y## is a deterministic function of ##X_1, X_2##.
In fact, it gets easier when ##Y## is a function of more independent ##X_i## variables. Any one ##X_i## might have a low correlation with ##Y## but the combination of all the ##X_i##s might completely determine ##Y##. Suppose ##Y = X_1+X_2+...+X_{100}##, where the ##X_i## are pairwise independent.
Multicollinearity is when the predictors are correlated in such a way that the estimated coefficient for a predictor, which should indicate the change in ##Y## per unit change in that ##X##, is not what it really is: because ##X_1## and ##X_2## are correlated, when ##X_1## changes by one unit we cannot hold ##X_2## fixed, and it changes too...

Let's say ##Y=b_1 X_1 + b_2 X_2 + b_3 X_3##, and the predictors are pairwise nearly uncorrelated, with low correlation coefficients ##r_{12} = r_{13} = r_{23} \approx 0.2##. That is not automatic proof of the absence of multicollinearity...

Could it be that ##r_{123} \approx 0.8##? But how can they collectively be more correlated than pairwise? I am struggling to see that, especially visually using the Venn diagram, where each smaller circle represents the variance of an ##X## and the larger circle the variance of ##Y##...
 
Oh, maybe I get it now... It could be that ##Y=\beta_1 X_1 +\beta_2 X_2 + \beta_3 X_3## and the three regressors are pairwise uncorrelated with each other, BUT the correlation between ##X_1## and, for example, the sum ##X_2+X_3## is nonzero and high in value. The same goes for the correlation between ##X_2## and ##X_1+X_3##, etc.

I think that is what the variance inflation factor (VIF) does: instead of focusing on the pairwise correlations, it checks these correlation combinations, which cannot be visualized with Venn diagrams of the individual predictors and the response variable...
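Here is a sketch of the VIF computation with numpy (the data-generating setup, ten independent predictors plus one that is nearly their sum, is an illustrative assumption). The ##j##-th diagonal entry of the inverse of the predictors' correlation matrix equals ##\mathrm{VIF}_j = 1/(1 - R_j^2)##, where ##R_j^2## comes from regressing predictor ##j## on all the others:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000

# ten independent predictors plus one that is nearly their sum
X = rng.standard_normal((n, 10))
xsum = X.sum(axis=1) + 0.1 * rng.standard_normal(n)
D = np.column_stack([X, xsum])

# VIF_j = 1/(1 - R_j^2) is the j-th diagonal entry of the
# inverse of the predictors' correlation matrix
R = np.corrcoef(D, rowvar=False)
vif = np.diag(np.linalg.inv(R))
print(vif.min(), vif.max())  # every VIF is large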
 
fog37 said:
Oh, maybe I get it now...It could be that ##Y=\beta_1 X_1 +\beta_2 X_2 + \beta_3 X_3## and the three regressors are pairwise uncorrelated to each other BUT the correlations between ##X_1## and, for example, the variable given by the sum ##X_2+X_3## to be nonzero and high in value.
Not if the ##X_i##s are independent. In that case ##X_1## would be uncorrelated with ##X_2+X_3##.

I probably should leave this for others since I am not an expert. But if ##Y = X_1+X_2##, where the ##X##s are independent, then ##X_1## and ##X_2## each estimate ##Y## to varying extents. ##X_1## and ##X_2## are independent; ##Y## is somewhat correlated with each individual ##X_i##, but completely determined by the pair. The more ##X_i##s there are in the sum, the weaker the correlation between ##Y## and each individual ##X_i##.
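This can be checked numerically (a sketch with numpy; the sample size is an arbitrary choice): with ##Y = X_1 + ... + X_{100}## and independent predictors, each individual correlation sits near ##1/\sqrt{100} = 0.1##, and ##X_1## is uncorrelated with ##X_2 + X_3##:

```python
import numpy as np

rng = np.random.default_rng(2)
m = 50_000
k = 100

X = rng.standard_normal((m, k))
Y = X.sum(axis=1)

# correlation of Y with each individual predictor: about 1/sqrt(k) = 0.1
c = np.array([np.corrcoef(Y, X[:, i])[0, 1] for i in range(k)])
print(c.mean())

# with independent predictors, X_1 is also uncorrelated with X_2 + X_3
print(np.corrcoef(X[:, 0], X[:, 1] + X[:, 2])[0, 1])
```

So ##Y## is a deterministic function of the predictors even though no single correlation, pairwise or with any small sum, is large.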
 
