
Regression analysis - case of multicollinearity

  1. Dec 27, 2008 #1
    What are some elementary remedies for multicollinearity (VIF >= 10) in linear regression? We were told to simply drop the offending independent variable, but someone else suggested we could center the predictor variables (i.e., xi = Xi - Xbar). Can somebody explain why centering may also be appropriate in this case?

    Thank you very much in advance!
  2. Jan 1, 2009 #2
    Homework Helper

    Centering won't do much to alleviate collinearity, but it does make some calculations simpler to represent. For example, if the data matrix has been centered (and, as is often done, scaled so that each column has unit length), then

    [tex] R = X' X, \quad R^{-1} = \left(X' X\right)^{-1} [/tex]

    are the correlation matrix of the predictors and its inverse.
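
    As a rough illustration of the centering-and-scaling step, here is a minimal NumPy sketch. The data are made up, and the helper name center_and_scale is my own, not something from a library. It builds R = X'X from the centered, unit-length columns so it can be compared with NumPy's own correlation matrix.

```python
import numpy as np

def center_and_scale(X):
    """Center each column and scale it to unit Euclidean length."""
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)
    return centered / np.linalg.norm(centered, axis=0)

# Made-up data: column 2 is nearly a copy of column 0.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
X[:, 2] = X[:, 0] + 0.1 * rng.normal(size=50)

Z = center_and_scale(X)
R = Z.T @ Z                 # correlation matrix of the predictors
R_inv = np.linalg.inv(R)    # its inverse

print(np.round(R, 3))
```

    Note that shifting every column by a constant leaves R unchanged, which is one way to see why centering alone does not remove a linear dependence among the predictors.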

    One problem with the variance inflation factor comes from its calculation:

    [tex] VIF_i = \frac{1}{1 - R^2_i} [/tex]

    where [tex] R^2_i [/tex] is the coefficient of multiple determination obtained when [tex] X_i [/tex] is regressed on the other predictors. A large VIF indicates an [tex] R^2_i [/tex] near one, so there is collinearity somewhere. It does not say whether you have a single case of collinearity (one variable depending on others) or several variables that exhibit close relationships. In short, you know you have a problem, but you don't know what type of problem you have.
    It's also worth mentioning that the cutoff of 10 you mention for the VIF is arbitrary: there is no easily determined cutoff for what constitutes a large value.
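
    To make the calculation concrete, here is a small NumPy sketch under assumed, simulated data. The function names are mine. It computes each VIF from its definition by regressing one predictor on the others, then checks it against the diagonal of the inverse correlation matrix, which gives the same numbers.

```python
import numpy as np

def vif_by_regression(X):
    """VIF_i = 1 / (1 - R^2_i), with R^2_i from regressing column i on the rest."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    vifs = np.empty(p)
    for i in range(p):
        y = X[:, i]
        others = np.delete(X, i, axis=1)
        A = np.column_stack([np.ones(n), others])   # include an intercept
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
        vifs[i] = 1.0 / (1.0 - r2)
    return vifs

def vif_from_correlation(X):
    """Same quantity, read off the diagonal of the inverse correlation matrix."""
    R = np.corrcoef(np.asarray(X, dtype=float), rowvar=False)
    return np.diag(np.linalg.inv(R))

# Simulated predictors: x3 is almost x1 + x2, x4 is unrelated to the rest.
rng = np.random.default_rng(1)
x1, x2, x4 = rng.normal(size=(3, 200))
x3 = x1 + x2 + 0.1 * rng.normal(size=200)
X = np.column_stack([x1, x2, x3, x4])

vifs = vif_by_regression(X)
print(np.round(vifs, 1))
```

    In this setup x1, x2 and x3 all come out with large VIFs even though there is only one near-dependence among them, so the numbers alone don't tell you which variable is the culprit.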

    That said, if you are looking for a simple approach (I'm assuming this is an introductory level course, or possibly a non-statistics course using multiple regression as an aside?) you can try removing the predictor that corresponds to the high VIF and re-running the analysis. There are other diagnostics available that allow a more detailed investigation of the problem, but that doesn't seem to be what you're after.
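
    A minimal sketch of that drop-and-re-run step, again with simulated data and hypothetical helper names (vif, drop_highest_vif). The threshold of 10 is just the cutoff from the question, not a recommendation. VIFs are taken from the diagonal of the inverse correlation matrix, which matches the 1 / (1 - R^2_i) definition.

```python
import numpy as np

def vif(X):
    """VIFs as the diagonal of the inverse correlation matrix of the predictors."""
    R = np.corrcoef(np.asarray(X, dtype=float), rowvar=False)
    return np.diag(np.linalg.inv(R))

def drop_highest_vif(X, names, threshold=10.0):
    """Drop the single predictor with the largest VIF if it reaches the threshold."""
    values = vif(X)
    worst = int(np.argmax(values))
    if values[worst] < threshold:
        return X, list(names)           # nothing to drop
    keep = [j for j in range(X.shape[1]) if j != worst]
    return X[:, keep], [names[j] for j in keep]

# Simulated predictors: x3 is almost x1 + x2, x4 is unrelated to the rest.
rng = np.random.default_rng(2)
x1, x2, x4 = rng.normal(size=(3, 200))
x3 = x1 + x2 + 0.1 * rng.normal(size=200)
X = np.column_stack([x1, x2, x3, x4])
names = ["x1", "x2", "x3", "x4"]

X_reduced, reduced_names = drop_highest_vif(X, names)
print(reduced_names, np.round(vif(X_reduced), 2))
```

    After the drop you would refit the regression on the remaining columns and look at the VIFs again.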

    Good luck - hope something here helped.