I Residuals are not 'random' -- How do I resolve?

  1. Mar 8, 2016 #1
    I'm dabbling with regression (in Excel), but I'm stuck because my residual plot is not random. I have 2 variables: age and gender (0 or 1). I ran the regression in Excel and plotted the residuals, and they show a pattern instead of random scatter. In general, how do I solve this issue?

    If it matters, my raw data looks like a curve when plotted. It curves up and then slowly curves back down (imagine a sine function that goes from 0 to pi).
     
  3. Mar 8, 2016 #2
    Generally, this means that your model is not appropriate (in this case, that a simple linear model does not adequately explain the data). It's hard to say more without seeing the data/model.
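
A minimal sketch of what "not appropriate" looks like in the residuals, assuming synthetic, noise-free data shaped like the curve described in the first post (sin(x) on [0, pi]). The helper `fit_line` and the data are illustrative assumptions, not the original poster's data. A straight line fitted to such data leaves residuals that are negative at both ends and positive in the middle, so they change sign only a couple of times, whereas roughly random residuals would change sign far more often.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Synthetic data (an assumption): 21 evenly spaced points of sin(x) on [0, pi].
xs = [i * math.pi / 20 for i in range(21)]
ys = [math.sin(x) for x in xs]

intercept, slope = fit_line(xs, ys)
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]

# Count how often consecutive residuals switch sign.
# A long run of same-signed residuals is the visual "pattern" in a residual plot.
sign_changes = sum(
    1 for r0, r1 in zip(residuals, residuals[1:]) if (r0 < 0) != (r1 < 0)
)

print("slope:", slope)
print("sign changes in residuals:", sign_changes)
```

With real, noisy data the count will not be this clean, but a long block of positive residuals flanked by negative ones points to the same problem.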
     
  4. Mar 9, 2016 #3
    Thanks for the feedback. When you say that a simple linear model cannot explain the data, do you mean that it could be nonlinear? Is it possible to determine that by visual inspection of the scatter plot? From what I can see, it does curve up and then down like a sine function, so maybe it is nonlinear? If it is nonlinear, how do I go about regressing it? If that's not the issue, how do I go about determining the best model (linear or otherwise)?
     
  5. Mar 9, 2016 #4
    Linear regression is used if there is some reason to believe the data is linear. This seems not to be the case here.

    There is no procedure to input data and get a model in return. You have to have some reason to think that the data matches your model. Statistics is meant to tell you whether your guess is reasonable or not, not to guess for you.
     
  6. Mar 9, 2016 #5

    micromass

    Staff Emeritus
    Science Advisor
    Education Advisor
    2016 Award

    Yes, it seems to be nonlinear. Linear regression could still apply, but you should add in higher order terms. Try to make a model with squares or cubes instead of just a linear parameter. That might do the trick.
    It's hard to give more advice without some specific pictures and details about the model you're trying to fit.
     
  7. Mar 9, 2016 #6

    WWGD

    Science Advisor
    Gold Member

    What confidence interval do you get for the slope, what value for r^2?
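
As a rough sketch of how both quantities can be computed by hand for a one-predictor fit, assuming made-up data close to y = 1 + 2x (the numbers below are hypothetical, not from this thread). The interval uses the usual slope standard error and the two-sided 95% Student t critical value for 8 degrees of freedom, which is hard-coded here because the sample has 10 points.

```python
import math

# Hypothetical data, roughly y = 1 + 2x with small deviations.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
ys = [1.1, 2.9, 5.2, 7.1, 8.8, 11.2, 12.9, 15.1, 17.0, 18.9]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
sxx = sum((x - mean_x) ** 2 for x in xs)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Residual and total sums of squares give r^2.
sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
sst = sum((y - mean_y) ** 2 for y in ys)
r_squared = 1.0 - sse / sst

# Standard error of the slope, using n - 2 residual degrees of freedom.
slope_se = math.sqrt(sse / (n - 2) / sxx)

# Two-sided 95% t critical value for 8 degrees of freedom (valid only for n = 10).
T_CRIT_95_DF8 = 2.306
ci_low = slope - T_CRIT_95_DF8 * slope_se
ci_high = slope + T_CRIT_95_DF8 * slope_se

print("slope:", slope, "95% CI:", (ci_low, ci_high))
print("r^2:", r_squared)
```

A high r^2 alone does not show that the model is adequate; it is worth reading alongside the residual plot.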
     
  8. Mar 12, 2016 #7

    FactChecker

    Science Advisor
    Gold Member

    Good point. Modeling it as ##aX^2 + bX + c## would allow linear regression to determine the parabola coefficients a, b, c that best fit the data. And it sounds like that may be what is needed. This is still called linear regression because the coefficients a, b, c are used in a linear way. The ##X^2## term does not prevent applying linear regression. Of course, there will be a very strong correlation between the ##X## and ##X^2## data entries. Step-wise linear regression should be used to take the correlations into account when it determines the final model. Excel might not have a good step-wise regression. In that case you might want to look into a statistical package like R.
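
To illustrate why a parabola is still a linear regression problem, here is a minimal sketch that treats 1, X, and X^2 as three ordinary predictor columns and solves the normal equations. The helpers `solve` and `least_squares` and the exact synthetic data are assumptions made for illustration; a statistics package would normally do this step (and use a more numerically robust method).

```python
def solve(a, b):
    """Solve the square system a * x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def least_squares(rows, ys):
    """Ordinary least squares via the normal equations (X^T X) beta = X^T y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    return solve(xtx, xty)


# Synthetic data (an assumption): exactly y = -0.5 x^2 + 3 x + 2.
xs = [float(i) for i in range(11)]
ys = [-0.5 * x * x + 3.0 * x + 2.0 for x in xs]

# Each row holds the predictors [1, X, X^2]; the model is linear in c, b, a.
design = [[1.0, x, x * x] for x in xs]
c, b, a = least_squares(design, ys)

print("a, b, c =", a, b, c)
```

In Excel the same idea amounts to adding a column of squared values and including it as a second predictor.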
     
  9. Mar 12, 2016 #8

    micromass

    Staff Emeritus
    Science Advisor
    Education Advisor
    2016 Award

    A standard trick to reduce this problem is centering the variables. So instead of using ##Y = a + bX + cX^2## as a model, you should use ##Y = a + b(X - \overline{X}) + c(X - \overline{X})^2##. It describes the same family of curves of course (only the coefficients are reparametrized), but the correlation between the linear and quadratic terms is much weaker this way.
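
A small sketch of the effect, assuming hypothetical ages 20 through 60 (one of each, so they are symmetric about their mean). The helper `pearson` is written here for illustration. For such symmetric values centering removes the correlation between the linear and squared terms entirely; for skewed data it typically reduces the correlation without eliminating it.

```python
import math


def pearson(us, vs):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(us)
    mean_u = sum(us) / n
    mean_v = sum(vs) / n
    cov = sum((u - mean_u) * (v - mean_v) for u, v in zip(us, vs))
    var_u = sum((u - mean_u) ** 2 for u in us)
    var_v = sum((v - mean_v) ** 2 for v in vs)
    return cov / math.sqrt(var_u * var_v)


# Hypothetical predictor values (an assumption): ages 20, 21, ..., 60.
ages = [float(a) for a in range(20, 61)]
mean_age = sum(ages) / len(ages)

# Correlation between X and X^2 before centering.
raw_corr = pearson(ages, [a * a for a in ages])

# Correlation between (X - mean) and (X - mean)^2 after centering.
centered = [a - mean_age for a in ages]
centered_corr = pearson(centered, [d * d for d in centered])

print("raw correlation:", raw_corr)
print("centered correlation:", centered_corr)
```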
     
  10. Mar 12, 2016 #9

    mfb

    2016 Award

    Staff: Mentor

    Wait... how do you do all that with a binary variable (gender)?
     
  11. Mar 12, 2016 #10

    WWGD

    Science Advisor
    Gold Member

    I think OP is using standard linear regression with numerical variables for dependent and independent variables.
     
  12. Mar 12, 2016 #11

    micromass

    Staff Emeritus
    Science Advisor
    Education Advisor
    2016 Award

    It depends highly on the specifics. I find this thread a bit annoying because we're basically shooting in the dark: the OP hasn't given us any plots, numbers, or anything. It's hard to give any meaningful advice then.

    In any case, with a categorical variable, I would analyse the two genders separately first. Then you can bring them in a full model containing perhaps an interaction term or multiple ones.
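
A minimal sketch of that two-step approach, assuming an unnamed response Y that is exactly linear in age with a different line for each gender (all numbers are made up, and `solve` and `least_squares` are illustrative helpers). It fits each gender separately, then fits one model with a gender indicator and an age-by-gender interaction, and shows that the two descriptions agree.

```python
def solve(a, b):
    """Solve the square system a * x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def least_squares(rows, ys):
    """Ordinary least squares via the normal equations (X^T X) beta = X^T y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical data: gender 0 follows Y = 1 + 2*age, gender 1 follows Y = 4 + 3*age.
ages = [float(a) for a in range(20, 61, 5)]
data = [(age, 0.0, 1.0 + 2.0 * age) for age in ages]
data += [(age, 1.0, 4.0 + 3.0 * age) for age in ages]

# Step 1: analyse the two genders separately with Y = intercept + slope * age.
fits = {}
for g in (0.0, 1.0):
    group = [(age, y) for age, gender, y in data if gender == g]
    fits[g] = least_squares([[1.0, age] for age, _ in group], [y for _, y in group])

# Step 2: one full model, Y = b0 + b1*age + b2*gender + b3*age*gender.
rows = [[1.0, age, gender, age * gender] for age, gender, _ in data]
b0, b1, b2, b3 = least_squares(rows, [y for _, _, y in data])

print("separate fits:", fits)
print("full model:", b0, b1, b2, b3)
```

In the full model, b2 is the difference in intercepts and b3 the difference in slopes between the two groups, so b0 + b2 and b1 + b3 reproduce the separate fit for gender 1.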
     
  13. Mar 12, 2016 #12

    mfb

    2016 Award

    Staff: Mentor

    I just see age and gender mentioned. And gender is a binary variable, which makes regression a bit... simple?
     
  14. Mar 12, 2016 #13

    micromass

    Staff Emeritus
    Science Advisor
    Education Advisor
    2016 Award

    Well, I took it as an unnamed variable ##Y## with predictors gender and age.
     
  15. Mar 12, 2016 #14

    WWGD

    Science Advisor
    Gold Member

    My bad, you're right.
     
  16. Mar 12, 2016 #15

    WWGD

    Science Advisor
    Gold Member

    Can this be anything other than a logistic regression, given that one of the variables is Boolean/binary? What are the options for the dependent variable?
     
  17. Mar 12, 2016 #16

    micromass

    Staff Emeritus
    Science Advisor
    Education Advisor
    2016 Award

    Come on, a regression ##\text{gender} \sim \text{age}## makes no sense at all. Who in their right mind would try to predict gender based on the age? There has to be a dependent variable that the OP is not telling us.
     
  18. Mar 12, 2016 #17

    WWGD

    Science Advisor
    Gold Member

    These two may be independent variables used to logistically regress some third variable. There is no specification in the OP as to whether either of these is the dependent variable or not.
     
  19. Mar 12, 2016 #18

    mfb

    2016 Award

    Staff: Mentor

    I suggest waiting for @semidevil to explain in more detail what was done and what went wrong.
     