Linear regression and high correlation problems

  1. Jun 13, 2010 #1
    Hi guys,

    I have data on 20 people's height, weight, calorie intake and skinfold thickness. I have carried out a regression of calorie intake on height, on weight, and on height and weight together. I have done the same thing for skinfold thickness. I then used R to produce the summary of results. Each model also has an intercept, i.e. y = beta1 + beta2 X.

    Using the t values, I have found that for calories both height and weight are significantly different from zero in the individual models. But in the model where height and weight are both included, both become non-significant.

    For skinfold thickness a similar thing happens, but in reverse. This time height and weight are not significantly different from zero individually, but in the model including both they both become significant.

    I have found the correlation between weight and height to be -0.88, which is high. Can anyone help me explain what causes the changes in significance?

    thanks in advance
     
  3. Jun 13, 2010 #2

    EnumaElish

    Science Advisor
    Homework Helper

  4. Jun 13, 2010 #3
    OK, I have read that. So am I right to say that in case 1 the high correlation causes the standard errors to increase, which lowers the t values, which in turn leads to weight and height not being significant in the joint model? But I am not sure why the opposite happens in the second case.
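
    To put a rough number on the standard-error inflation in case 1: with only two predictors, the variance inflation factor depends only on their correlation r, as VIF = 1 / (1 - r^2). The sketch below plugs in the r = -0.88 reported in the first post. The function name is purely illustrative, and the calculation assumes an ordinary least squares fit with an intercept.

```python
import math


def variance_inflation_factor(r: float) -> float:
    """VIF for either predictor in a two-predictor regression with correlation r."""
    if abs(r) >= 1.0:
        raise ValueError("perfectly collinear predictors: VIF is undefined")
    return 1.0 / (1.0 - r * r)


r_height_weight = -0.88  # correlation between height and weight from the first post
vif = variance_inflation_factor(r_height_weight)
se_multiplier = math.sqrt(vif)  # factor by which each slope's standard error grows

print(f"VIF = {vif:.2f}, standard errors inflated by about {se_multiplier:.2f}x")
```

    So each slope's standard error is roughly twice what it would be if height and weight were uncorrelated, holding the residual variance and the spread of each predictor fixed.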

    Also is it possible in case 1 to transform the model to reduce the correlation between height and weight?

    thanks in advance
     
  5. Jun 13, 2010 #4

    EnumaElish

    Science Advisor
    Homework Helper

    In a finite sample "anything's possible" due to outliers and other idiosyncrasies. With that caveat, one guess is that the true model is SF = b0 + b1 H + b2 W + u and the regression does a good job of identifying both factors despite their high negative correlation. In contrast, each of the partial models SF = a0 + a1 H + u and SF = c0 + c1 W + u underestimates the slope coefficient due to omitted variable bias.
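
    The omitted variable bias in this guess can be written out explicitly. This is only a sketch: it assumes the true model above holds and that u is uncorrelated with H and W. The assumption that b1 and b2 share the same sign is added here for illustration and is not stated in the thread.

```latex
\begin{align*}
SF &= b_0 + b_1 H + b_2 W + u, \\
\operatorname{plim}\, \hat{a}_1 &= b_1 + b_2 \,\frac{\operatorname{Cov}(H, W)}{\operatorname{Var}(H)}, \\
\operatorname{plim}\, \hat{c}_1 &= b_2 + b_1 \,\frac{\operatorname{Cov}(H, W)}{\operatorname{Var}(W)}.
\end{align*}
```

    Since Cov(H, W) < 0 here, the second term has the opposite sign to the first whenever b1 and b2 share a sign, so each partial slope is pulled toward (or even past) zero.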
     
    Last edited: Jun 13, 2010
  6. Jun 13, 2010 #5
    Ah OK. Also, is there a way of reducing the correlation in case 1 by transforming the models?
     
  7. Jun 13, 2010 #6

    EnumaElish

    Science Advisor
    Homework Helper

    I would start with re-defining calories as calorie/inch or calorie/pound, and regress calorie/inch on W (alternatively H and W together), and regress calorie/pound on H (alternatively H and W together).
     
  8. Jun 13, 2010 #7
    Erm, the data I have is a calorie intake index. I am not quite sure how to convert this to calorie/inch, for example.
     
  9. Jun 13, 2010 #8

    EnumaElish

    Science Advisor
    Homework Helper

    Imagine the index is "normalized calories per person per unit of time." When divided by weight, for example, you will have "normalized calories per pound per unit of time."
     
  10. Jun 14, 2010 #9
    I have just tried that, and it doesn't change the correlation between height and weight. Am I doing something wrong?
     
  11. Jun 14, 2010 #10

    statdad

    Homework Helper

    Do you need all of the predictors? High correlation among them indicates that they "have the same information". I am especially curious since, with a sample size of 20, using a multiple regression with 4 predictors is a bit odd.
     
  12. Jun 14, 2010 #11
    I know what you mean, but it's a study we have to do, and we have been told to carry out a regression of calorie intake on weight, on height, and on weight and height together, and then comment on what goes wrong and how to solve it.
     
  13. Jun 14, 2010 #12
    i.e. how to deal with the fact that neither is significant in the regression on height and weight, due to the correlation between the two.
     
  14. Jun 14, 2010 #13

    statdad

    Homework Helper

    By "solve" do you mean "explain, and take remedy" or "perform some work that will allow both to be used in the regression"?
     
  15. Jun 14, 2010 #14
    Sorry, it says to explain why the contradiction occurs and why, if the results are interpreted correctly, the right conclusion can be drawn with no contradictions.
     
  16. Jun 17, 2010 #15
    Can anyone help me with how the results, if interpreted correctly, can still lead to the right conclusion?
     
  17. Jun 17, 2010 #16

    EnumaElish

    Science Advisor
    Homework Helper

    What conclusions do you draw from the results? Can you post them?
     
  18. Jun 17, 2010 #17
    Are the conclusions not just that calorie intake can be modelled using height as a variable, or using weight, but not using both together?
     
  19. Jun 17, 2010 #18

    EnumaElish

    Science Advisor
    Homework Helper

    A question is, are you losing any information when you have both H and W as regressors?
     
  20. Jun 17, 2010 #19
    No, are you not gaining information?
     
  21. Jun 17, 2010 #20

    EnumaElish

    Science Advisor
    Homework Helper

    "Bingo!" You are gaining some additional info when you include both H and W, even though they are highly correlated and neither has an individually significant t-statistic.

    Which regression statistic tells you about the joint significance of all of the slope coefficients simultaneously?
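
    One statistic that fits this description is the overall F statistic, which R prints at the bottom of the `summary()` output for a fitted `lm` model. Below is a minimal sketch of how it relates to R-squared, assuming n = 20 observations and k = 2 slopes as in this thread. The R-squared value of 0.5 is made up purely for illustration and is not taken from the poster's data; the function name is likewise hypothetical.

```python
def overall_f_statistic(r_squared: float, n: int, k: int) -> float:
    """F statistic for H0: all k slope coefficients are zero (model with an intercept)."""
    df_model = k
    df_resid = n - k - 1
    if df_resid <= 0:
        raise ValueError("need more observations than estimated coefficients")
    if not 0.0 <= r_squared < 1.0:
        raise ValueError("r_squared must lie in [0, 1)")
    return (r_squared / df_model) / ((1.0 - r_squared) / df_resid)


n_people = 20          # sample size from the first post
n_slopes = 2           # height and weight
hypothetical_r2 = 0.5  # illustrative value only, not from the thread's data

f_stat = overall_f_statistic(hypothetical_r2, n_people, n_slopes)
print(f"F({n_slopes}, {n_people - n_slopes - 1}) = {f_stat:.2f}")
```

    The result is compared against an F distribution with 2 and 17 degrees of freedom, whose 5% critical value is about 3.59. A large F alongside small individual t values is the usual signature of collinear predictors that are jointly informative.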
     