Regression analysis - case of multicollinearity

SUMMARY

This discussion addresses remedial procedures for multicollinearity in linear regression, specifically when the Variance Inflation Factor (VIF) is greater than or equal to 10. Dropping the problematic independent variable is a common solution. Centering the predictor variables (xi = Xi - Xbar) is also mentioned, although it mainly simplifies calculations rather than alleviating collinearity. A large VIF shows that collinearity is present. It does not reveal whether the collinearity comes from a single dependency (one variable depending on others) or from several groups of closely related variables. The cutoff of 10 for VIF is arbitrary, and there is no definitive threshold for what counts as a large value.

PREREQUISITES
  • Understanding of linear regression analysis
  • Familiarity with Variance Inflation Factor (VIF)
  • Knowledge of correlation matrices
  • Basic statistical concepts such as multiple correlation coefficients
NEXT STEPS
  • Research methods for addressing multicollinearity in regression analysis
  • Learn about alternative diagnostics for collinearity issues
  • Explore the implications of centering and scaling predictor variables
  • Investigate the use of regularization techniques like Ridge regression
USEFUL FOR

Statisticians, data analysts, and students in introductory statistics courses who are working with linear regression and seeking to understand and address multicollinearity issues.

flying_young
What are some elementary remedial procedures for multicollinearity (VIF >= 10) in linear regression? We were told to simply drop that particular independent variable, but someone else suggested centering the predictor variables (i.e., xi = Xi - Xbar). Can somebody explain why centering may also be appropriate in this case?

Thank you very much in advance!
 
Centering won't alleviate the collinearity itself: shifting each column by its mean leaves the correlations among the predictors unchanged, and so it leaves the VIFs unchanged too. It does make some calculations simpler to express. For example, suppose the data matrix has been centered and, as is often done, scaled so that each column has unit length. Then

[tex] R = X'X, \quad R^{-1} = \left(X'X\right)^{-1}[/tex]

are the correlation matrix and its inverse.
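A minimal NumPy sketch of that identity follows. The simulated data, the seed, and the helper name `unit_length_scale` are hypothetical and only illustrate the idea. The sketch centers each column and scales it to unit length. It then checks that X'X matches the ordinary correlation matrix and that centering alone leaves that matrix unchanged. It also reads the VIFs off the diagonal of the inverse, which is a standard result.

```python
import numpy as np

def unit_length_scale(X):
    """Center each column (x_i = X_i - Xbar) and scale it to unit length."""
    Xc = X - X.mean(axis=0)
    return Xc / np.linalg.norm(Xc, axis=0)

# Simulated data (hypothetical): x2 is nearly a copy of x1, x3 is unrelated
rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + 0.1 * rng.normal(size=n)
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])

Z = unit_length_scale(X)
R = Z.T @ Z                 # X'X for the centered, unit-length data
R_inv = np.linalg.inv(R)

corr_raw = np.corrcoef(X, rowvar=False)
corr_centered = np.corrcoef(X - X.mean(axis=0), rowvar=False)

print(np.allclose(R, corr_raw))              # True: X'X is the correlation matrix
print(np.allclose(corr_raw, corr_centered))  # True: centering changes nothing here
print(np.round(np.diag(R_inv), 1))           # diagonal of R^{-1} gives the VIFs
```

The first two columns get large VIFs and the third stays near 1. That pattern is the same whether or not the data were centered first.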

One problem with the variance inflation factor comes from its calculation:

[tex] VIF_i = \frac 1 {1 - R^2_i}[/tex]

where [tex]R^2_i[/tex] is the coefficient of multiple determination (the squared multiple correlation coefficient) of [tex]X_i[/tex] regressed on the other predictors. A large VIF indicates an [tex]R^2_i[/tex] near one, so there is collinearity somewhere. It does not say whether you have a single case of collinearity (one variable depending on others) or several sets of variables that exhibit close relationships. In short, you know you have a problem, but you don't know what type of problem you have.
It's also worth mentioning that the cutoff of 10 you mention for VIF is arbitrary: there is no easily determined cutoff for what constitutes a large value.
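The formula can be checked directly by running the auxiliary regressions. Here is a minimal sketch, assuming NumPy. The `vif` helper and both simulated datasets are hypothetical. In the first dataset a single linear relationship ties all four columns together. The second dataset has two separate collinear pairs. Every VIF comes out large in both cases, so the VIFs flag a problem without revealing its structure.

```python
import numpy as np

def vif(X):
    """VIF_i = 1 / (1 - R^2_i), where R^2_i comes from regressing
    column i on all other columns plus an intercept."""
    n, p = X.shape
    out = np.empty(p)
    for i in range(p):
        y = X[:, i]
        A = np.column_stack([np.ones(n), np.delete(X, i, axis=1)])
        beta, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ beta
        r2 = 1 - resid @ resid / np.sum((y - y.mean()) ** 2)
        out[i] = 1 / (1 - r2)
    return out

# Simulated data (hypothetical)
rng = np.random.default_rng(1)
n = 200
a, b, c, d = rng.normal(size=(4, n))
eps = 0.05 * rng.normal(size=n)

# Case 1: one relationship involving all four columns
X_single = np.column_stack([a, b, c, a + b + c + eps])
# Case 2: two unrelated near-duplicate pairs
X_pairs = np.column_stack([a, a + eps, c, c + 0.05 * d])

print(np.round(vif(X_single), 1))
print(np.round(vif(X_pairs), 1))
```

Both printouts show four large VIFs even though the two datasets have different underlying structures. The diagonal of the inverse correlation matrix gives the same numbers as these auxiliary regressions.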

That said, if you are looking for a simple attack (I'm assuming this is an introductory-level course, or possibly a non-statistics course using multiple regression as an aside?), you can try removing the predictor that corresponds to the high VIF and re-running the analysis. Other diagnostics allow a more detailed investigation of the problem, but that doesn't seem to be what you're after.
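One way to sketch that simple attack in NumPy is shown below. The helper names `vif` and `drop_high_vif` and the simulated data are hypothetical. The routine drops the predictor with the largest VIF, recomputes the VIFs, and repeats until every remaining VIF is below the threshold. The default threshold of 10 only mirrors the cutoff in the question and is arbitrary.

```python
import numpy as np

def vif(X):
    # Diagonal of the inverse correlation matrix; equals 1 / (1 - R^2_i)
    return np.diag(np.linalg.inv(np.corrcoef(X, rowvar=False)))

def drop_high_vif(X, names, threshold=10.0):
    """Drop the highest-VIF predictor, recompute, and repeat until every
    remaining VIF is below `threshold` (10 is a convention, not a rule)."""
    keep = list(range(X.shape[1]))
    while len(keep) > 1:
        v = vif(X[:, keep])
        worst = int(np.argmax(v))
        if v[worst] < threshold:
            break
        print(f"dropping {names[keep[worst]]} (VIF = {v[worst]:.1f})")
        del keep[worst]
    return [names[k] for k in keep]

# Simulated data (hypothetical): x2 nearly duplicates x1
rng = np.random.default_rng(2)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + 0.1 * rng.normal(size=n)
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])

kept = drop_high_vif(X, ["x1", "x2", "x3"])
print(kept)
```

After pruning, refit the regression on the kept predictors. When two predictors nearly duplicate each other, small sampling differences decide which one gets dropped. Subject-matter knowledge should therefore guide the final choice.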

Good luck - hope something here helped.
 
