
Heteroskedasticity and its implications

  1. Jan 30, 2014 #1

I have a data set on which I have run a regression. The data look heteroskedastic to me, but I would like to make sure. I have heard that White's test is a good test for heteroskedasticity, but I have never performed one before, so I would be interested in any guidance on doing so. Also, if my data do turn out to be heteroskedastic, what does that mean for my regression?

  3. Feb 13, 2014 #2
If Var(e_i) = σ² for all i, i.e. the variance of the error terms is constant, you are in a case of homoscedasticity. If the error terms do not have constant variance, they are said to be heteroscedastic. You can often detect heteroscedasticity with a simple visual inspection by plotting the residuals against the fitted values:

In a large sample (n > 30), homoscedastic residuals will lie in a band of roughly even width.

In a smaller sample, residuals will tend to be somewhat larger near the mean of the distribution than in the tails.

Therefore, if the residuals are clearly of roughly the same size for all values of X, it is generally safe to assume that heteroscedasticity is not severe enough to be problematic. But obviously this depends on the level of precision you need and the context of your regression.
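To illustrate this visual check numerically, here is a minimal sketch in Python using only numpy. The simulated data-generating process is hypothetical, chosen so that the error spread grows with X; instead of plotting, it compares the residual spread in the lower and upper halves of the fitted values, which captures the same idea.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x = rng.uniform(0, 10, n)

# Hypothetical heteroskedastic data: the error standard deviation grows with x
y = 2.0 + 0.5 * x + rng.normal(0, 0.2 + 0.3 * x, n)

# Ordinary least squares fit
X = np.column_stack([np.ones(n), x])
beta = np.linalg.lstsq(X, y, rcond=None)[0]
fitted = X @ beta
resid = y - fitted

# Compare residual spread in the lower and upper halves of the fitted values
order = np.argsort(fitted)
lo = resid[order[: n // 2]]
hi = resid[order[n // 2 :]]
print(f"sd(residuals), low fitted half:  {lo.std():.3f}")
print(f"sd(residuals), high fitted half: {hi.std():.3f}")
```

A markedly larger spread in one half, as here, is exactly the widening pattern you would see in the residuals-versus-fitted plot.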

Finally, if the plot of residuals shows an uneven pattern, with the width of the band considerably larger for some values of X than for others, a more precise test for heteroscedasticity should be conducted, for example White's test, which tests the null hypothesis σ_i² = σ² for all i.

This is easily done in R. See reference: http://www.inside-r.org/packages/cran/bstats/docs/white.test
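If you prefer not to rely on a package, the test is also straightforward to code by hand. A minimal sketch of White's test in Python with numpy (the simulated data are hypothetical; with a single regressor the auxiliary regression uses the regressor and its square, since there are no cross-products):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
x = rng.uniform(0, 10, n)
y = 2.0 + 0.5 * x + rng.normal(0, 0.2 + 0.3 * x, n)  # hypothetical data

# Step 1: fit the main regression and keep the squared residuals
X = np.column_stack([np.ones(n), x])
beta = np.linalg.lstsq(X, y, rcond=None)[0]
e2 = (y - X @ beta) ** 2

# Step 2: auxiliary regression of e^2 on the regressor and its square
# (plus cross-products, if there were several regressors)
Z = np.column_stack([np.ones(n), x, x ** 2])
g = np.linalg.lstsq(Z, e2, rcond=None)[0]
e2_hat = Z @ g
r2 = 1.0 - np.sum((e2 - e2_hat) ** 2) / np.sum((e2 - e2.mean()) ** 2)

# Step 3: LM = n * R^2 is asymptotically chi-square, with degrees of
# freedom equal to the number of auxiliary regressors (here 2)
lm = n * r2
print(f"White LM statistic: {lm:.2f}")
print("reject homoscedasticity at 5%:", lm > 5.991)  # chi2(2) critical value
```

A large LM statistic rejects the null of constant variance; here it is compared to the 5% chi-square critical value for 2 degrees of freedom (5.991).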
  4. Feb 13, 2014 #3
This really depends on what you hope to accomplish with your model. A modest amount of heteroskedasticity will tend not to have a major effect on the coefficient estimates themselves (OLS coefficients remain unbiased), so if you're only trying to get a sense of the relationship between your variables, it may not be a big issue. Where you're going to run into problems is with the standard errors of your estimates, and other related statistics (test statistics, variability explained, etc.). If it looks severe (or if you have an outlier problem), then you might try some kind of robust regression.
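One common remedy for the standard-error problem is to keep the OLS coefficients but replace the classical standard errors with heteroskedasticity-robust (White/HC0 "sandwich") ones. A minimal numpy sketch, again with hypothetical simulated data:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200
x = rng.uniform(0, 10, n)
y = 2.0 + 0.5 * x + rng.normal(0, 0.2 + 0.3 * x, n)  # hypothetical data

X = np.column_stack([np.ones(n), x])
beta = np.linalg.lstsq(X, y, rcond=None)[0]
e = y - X @ beta
k = X.shape[1]

# Classical OLS standard errors: assume Var(e_i) = s^2 for all i
XtX_inv = np.linalg.inv(X.T @ X)
s2 = e @ e / (n - k)
se_classical = np.sqrt(np.diag(s2 * XtX_inv))

# White (HC0) robust standard errors: sandwich estimator
# (X'X)^-1 X' diag(e_i^2) X (X'X)^-1
meat = X.T @ (X * e[:, None] ** 2)
cov_robust = XtX_inv @ meat @ XtX_inv
se_robust = np.sqrt(np.diag(cov_robust))

print("classical SEs:", np.round(se_classical, 4))
print("robust SEs:   ", np.round(se_robust, 4))
```

The coefficients are unchanged; only the standard errors (and hence t-statistics and confidence intervals) differ, which is usually what you want when the coefficient estimates themselves are fine.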