
Testing for Linear Relation: r^2 vs H_0: slope = 0

  1. Feb 7, 2016 #1

    WWGD

    Science Advisor
    Gold Member

    Hi All,
    I am trying to understand better the tests used to determine the existence of a linear relation between
    two variables X, Y. AFAIK, one way of testing the strength of any linear relationship is by computing
    ##r^2##, where ##r## is the correlation coefficient; this measures the extent to which X determines Y, i.e., the fraction of the variance in Y accounted for by the linear relationship with X.

    But then there is a second test, and I am confused as to how it relates to the one above. In this other test, we do a hypothesis test for the slope of the regression line ##y = mx + b##, with ##H_0: m = 0##, ##H_A: m \neq 0##. Are both these tests necessary, or is one used to corroborate the other? Are there situations where one test is preferable to the other?
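
    For concreteness, here is a minimal sketch of the two quantities I mean (Python with scipy; the data values are made up):

    [code]
    # Made-up data; both quantities come out of the same least squares fit.
    import numpy as np
    from scipy import stats

    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
    y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 11.9])

    # linregress returns the slope m, the intercept b, the correlation r,
    # and the two-tailed p-value for H_0: m = 0, all from one call.
    fit = stats.linregress(x, y)

    print("r^2 =", fit.rvalue**2)                 # strength of the linear relation
    print("p-value for H_0: m = 0:", fit.pvalue)  # significance of the slope
    [/code]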
    Thanks.
     
  3. Feb 7, 2016 #2
    r^2 is monotonic with the p-value if the number of degrees of freedom is held constant.

    If r^2 is held constant, the p-value gets closer and closer to zero (more significant) as the number of degrees of freedom increases.

    In addition to the r^2 and p values, I like to consider the uncertainty in the slope.
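
    To see the first point concretely, here is a sketch (Python; it uses the standard t statistic for testing a correlation, ##t = r\sqrt{df}/\sqrt{1-r^2}##, with the degrees of freedom held fixed):

    [code]
    # With the degrees of freedom held constant, the p-value is a
    # monotonically decreasing function of r^2.
    from scipy import stats

    n = 10                                  # fixed sample size, so df = n - 2 = 8
    df = n - 2
    for r in (0.3, 0.5, 0.7, 0.9):
        t = r * df**0.5 / (1 - r**2)**0.5   # t statistic for testing r
        p = 2 * stats.t.sf(t, df)           # two-tailed p-value
        print(f"r^2 = {r**2:.2f}  ->  p = {p:.4f}")
    [/code]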
     
  4. Feb 7, 2016 #3

    WWGD

    Science Advisor
    Gold Member

    Sorry for my ignorance here, but r^2 is constant with respect to what? Do you mean after adjusting for the number of variables? And I guess the p-value is the one used in the test of ##H_0##?
     
  5. Feb 7, 2016 #4
    Note the difference between the number of variables and the number of degrees of freedom. For a linear least squares fit, the number of degrees of freedom is the number of data points minus 2. Suppose you have a number of least squares fits that all return r^2 = 0.81 (i.e., r = -0.9 or 0.9).

    The p-value computed by most stats packages is related to the hypothesis that the slope is different from zero (two-tailed) or specifically greater than (or less than) zero (one-tailed). In the case of 3 data points, the one-tailed p-value is 0.144 and the two-tailed p-value is 0.287. Neither is statistically significant at the p < 0.05 level. Increase to 4 data points, and the one-tailed p-value is 0.05 (at the edge of significance), while the two-tailed p-value is 0.10 (not significant). At 5 data points, the one-tailed p-value is 0.0187 (significant) and the two-tailed p-value is 0.037 (also significant). Increase to 10 data points and an r of 0.9 is significant (both one- and two-tailed) at p < 0.001. See: http://vassarstats.net/tabs_r.html
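
    If you'd rather not use the lookup table, these numbers can be checked directly (a sketch in Python with scipy, again using ##t = r\sqrt{df}/\sqrt{1-r^2}##):

    [code]
    # Reproduce the p-values above: r = 0.9 held fixed, n varied.
    from scipy import stats

    r = 0.9
    for n in (3, 4, 5, 10):
        df = n - 2
        t = r * df**0.5 / (1 - r**2)**0.5   # t statistic for testing r
        p1 = stats.t.sf(t, df)              # one-tailed p-value
        print(f"n = {n:2d}: one-tailed p = {p1:.4f}, two-tailed p = {2*p1:.4f}")
    [/code]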

    A given r^2 value is more believable with more points.

    Some fitting packages (including gnuplot and SciDavis, which I use) will also report the uncertainty in the slope, m. From this, one can compute a z score under the null hypothesis that the true slope is zero. A slope which is two uncertainties away from zero has only about a 2.3% (one-tailed) probability of being attributable to random chance.
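
    For example (a sketch in Python; scipy's linregress reports the slope's standard error as stderr, analogous to the uncertainty gnuplot and SciDavis report):

    [code]
    # z score for the slope, assuming (as an approximation) that the slope
    # estimate is normally distributed with the reported uncertainty.
    import numpy as np
    from scipy import stats

    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
    y = np.array([1.8, 4.1, 5.9, 8.2, 9.8, 12.1])

    fit = stats.linregress(x, y)   # fit.stderr is the slope's standard error
    z = fit.slope / fit.stderr     # how many uncertainties from zero
    p = stats.norm.sf(z)           # one-tailed; about 0.023 when z = 2
    print(f"m = {fit.slope:.3f} +/- {fit.stderr:.3f}, z = {z:.1f}, p = {p:.4g}")
    [/code]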

    But you should keep in mind that these tests really only suggest the significance of a correlation; they do not tell you with any confidence whether the relationship is linear, quadratic, exponential, or something else. That is a much more challenging question to answer definitively, especially if different models give comparable r^2 values.
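
    To illustrate (a sketch in Python with made-up data: the true model is exponential, yet both a linear and a quadratic fit score highly over a narrow range of x):

    [code]
    # Truly exponential data fit over a narrow range: linear and quadratic
    # fits both give high r^2, so r^2 alone cannot pick the functional form.
    import numpy as np

    x = np.linspace(1.0, 2.0, 20)
    y = np.exp(x)                          # the "true" model is exponential

    for deg, label in ((1, "linear"), (2, "quadratic")):
        coeffs = np.polyfit(x, y, deg)     # least squares polynomial fit
        resid = y - np.polyval(coeffs, x)
        r2 = 1 - resid.var() / y.var()     # coefficient of determination
        print(f"{label:9s}: r^2 = {r2:.5f}")
    [/code]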
     