
When y is negative in linear regression?

  1. Jan 28, 2014 #1
    I am using linear regression to predict 'y' based on 8 variables.
    In my example, most of the betas I get are negative, so the predicted value of y comes out negative.
    In my data, y is a time in seconds, so I think it shouldn't be negative.

    I did my example in Python, and I want to know whether y can really be negative even though it is a time in seconds, or whether my code is incorrect.

    Is it possible for y to be negative?
     
  3. Jan 28, 2014 #2

    SteamKing

    Staff Emeritus
    Science Advisor
    Homework Helper

    It's not clear what you mean by a linear regression with 8 variables. Does this mean you are using 8 data points?
     
  4. Jan 29, 2014 #3
    I mean that I use 8 independent variables to get y.

    y = Beta1*x1 + Beta2*x2 + Beta3*x3 + Beta4*x4 + Beta5*x5 + Beta6*x6 + Beta7*x7 + Beta8*x8

    And when I calculate the Betas to get a predicted y, \hat{y}, some of them are negative, making \hat{y} negative.
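
    A minimal sketch of how this can happen, assuming NumPy and a made-up two-variable version of the model above (the data and the helper name `fit_ols` are illustrative, not taken from the original code). Every observed y is positive, yet a negative beta produces a negative prediction for a new point:

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares betas for y ~ X @ beta (no intercept, as in the model above)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

# Made-up training data: two predictors, all observed times positive.
X_train = np.array([[2.0, 1.0],
                    [3.0, 2.0],
                    [4.0, 3.0],
                    [5.0, 6.0]])
y_train = np.array([7.0, 9.0, 11.0, 7.0])  # equals 5*x1 - 3*x2 exactly

beta = fit_ols(X_train, y_train)           # close to [5, -3]

# A new point with small x1 and large x2 lands on the negative side of the fit.
x_new = np.array([1.0, 4.0])
y_hat_new = float(x_new @ beta)            # 5*1 - 3*4 = -7
print(beta, y_hat_new)
```

    Nothing in ordinary least squares constrains \hat{y} to be positive, so a negative prediction does not by itself mean the code is wrong.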
     
    Last edited: Jan 29, 2014
  5. Jan 29, 2014 #4

    Office_Shredder

    Staff Emeritus
    Science Advisor
    Gold Member

    Have you checked the statistical significance of those betas? If some of them are just noise then you would expect to get nonsense results.

    Even then, statistical modeling with a linear fit is never going to be perfect. It is entirely possible that when x1 is larger the time y is shorter, which produces a negative Beta1. At that point you might question whether a linear model is a good one to use for the x1 variable.
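
    One way to check significance is to compare each beta with its standard error. The sketch below assumes NumPy, simulated data, and a hypothetical helper `ols_t_stats`; it computes the usual t-statistics from the textbook OLS formulas. As a rough rule of thumb, a beta with |t| well below about 2 is hard to distinguish from noise.

```python
import numpy as np

def ols_t_stats(X, y):
    """Return (beta, std_err, t) for y ~ b0 + X @ b, with an intercept column added."""
    n = X.shape[0]
    A = np.column_stack([np.ones(n), X])      # design matrix with intercept
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    dof = n - A.shape[1]                      # residual degrees of freedom
    sigma2 = resid @ resid / dof              # unbiased noise-variance estimate
    cov = sigma2 * np.linalg.inv(A.T @ A)     # covariance of the estimated betas
    std_err = np.sqrt(np.diag(cov))
    return beta, std_err, beta / std_err

# Simulated data (an assumption for illustration): only the first
# predictor actually influences y; the second is pure noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = 3.0 + 2.0 * X[:, 0] + rng.normal(size=200)

beta, std_err, t = ols_t_stats(X, y)
for name, b, s, tv in zip(["const", "x1", "x2"], beta, std_err, t):
    print(f"{name}: beta={b:.3f}  se={s:.3f}  t={tv:.2f}")
```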
     
  6. Jan 29, 2014 #5

    FactChecker

    Science Advisor
    Gold Member

    A lot of statistics packages have stepwise regression algorithms. They start with a constant and the most significant independent variable (say Xm): Y = Beta0 + Betam * Xm. Then, one by one, add in the next most significant term, then the next, etc., till there are no statistically significant terms to add. That will allow you to include only those terms that are statistically significant.
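
    A rough sketch of the forward-selection idea described above, assuming NumPy and simulated data. The function names and the entry threshold `t_enter=2.0` (roughly a 5% significance level) are assumptions for illustration; real statistics packages use their own entry and removal criteria.

```python
import numpy as np

def t_stats(A, y):
    """t-statistics of the OLS coefficients for design matrix A."""
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    sigma2 = resid @ resid / (A.shape[0] - A.shape[1])
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(A.T @ A)))
    return beta / se

def forward_stepwise(X, y, t_enter=2.0):
    """Greedily add the column with the largest |t| until none exceeds t_enter."""
    n, p = X.shape
    selected = []
    while len(selected) < p:
        best_j, best_t = None, t_enter
        for j in range(p):
            if j in selected:
                continue
            # Constant term + already selected columns + candidate column j.
            A = np.column_stack([np.ones(n), X[:, selected + [j]]])
            t_j = abs(t_stats(A, y)[-1])      # candidate is the last column
            if t_j > best_t:
                best_j, best_t = j, t_j
        if best_j is None:                    # nothing significant left to add
            break
        selected.append(best_j)
    return selected

# Simulated data: y depends only on columns 0 and 3 out of 5.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 5))
y = 1.0 + 3.0 * X[:, 0] - 2.0 * X[:, 3] + 0.5 * rng.normal(size=100)

selected = forward_stepwise(X, y)
print("selected columns, in order of entry:", selected)
```

    This version only adds terms. Many stepwise implementations also re-test and drop terms that lose significance after later additions.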

    If you know that Y can never be negative, you might want to try a model that will never go negative, like Y = exp( Beta0 + Beta1 * X1). For that, do a stepwise linear regression using the natural log of the Y data. That will give an expression ln(Y) = Beta0 + Beta1 * X1. Many statistics packages have these types of regressions as options.
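
    A minimal sketch of the log-transform approach, assuming NumPy and made-up single-predictor data with strictly positive Y (the log is undefined otherwise). The fit is an ordinary linear regression on ln(Y), and the back-transformed prediction exp(Beta0 + Beta1 * X1) can never be negative:

```python
import numpy as np

# Made-up positive data that follows Y = 2 * exp(-0.5 * X1) exactly.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2.0 * np.exp(-0.5 * x)

# Ordinary least squares on ln(Y) = Beta0 + Beta1 * X1.
A = np.column_stack([np.ones_like(x), x])
beta, *_ = np.linalg.lstsq(A, np.log(y), rcond=None)
b0, b1 = beta

def predict(x_new):
    """Back-transform to the original scale; exp() keeps the result positive."""
    return np.exp(b0 + b1 * np.asarray(x_new, dtype=float))

y_hat = predict([5.0, 10.0, 50.0])   # far outside the training range
print(b0, b1, y_hat)
```

    Note that this minimizes error on the log scale, so it weights relative rather than absolute errors in Y.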
     