
Testing for Linear Relation: r^2 vs H_0: slope = 0

  1. Feb 7, 2016 #1



    Hi All,
    I am trying to understand better the tests used to determine the existence of a linear relation between
    two variables X,Y. AFAIK, one way of testing the strength of any linear relationship is by computing
    ##r^2##, where ##r## is the correlation coefficient; this measures the extent to which X determines Y, i.e., the extent to which the value of X contributes to the value of Y.

    But then there is a second test, and I am confused as to how it relates to the one above. In this other test, we do a hypothesis test for the slope of the regression line ## y=mx+b ##, with ## H_0: m=0, H_A: m \neq 0 ##. Are both of these tests necessary, or is one used to corroborate the other? Are there situations where one test is preferable to the other?
  3. Feb 7, 2016 #2
    r^2 is monotonic with the p value if the number of degrees of freedom is held constant: a larger r^2 always gives a smaller p value.

    If r^2 is held constant, the p value gets closer and closer to zero (more significant) as the number of degrees of freedom is increased.

    In addition to the r^2 and p values, I like to consider the uncertainty in the slope.
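
    Both statements can be read off one identity. For a straight-line least squares fit to ##n## data points, the t statistic for the slope depends only on ##r## and ##n##. The symbols ##\hat{m}## (fitted slope) and ##\mathrm{SE}(\hat{m})## (its standard error) are notation introduced here for illustration:

    $$ t = \frac{\hat{m}}{\mathrm{SE}(\hat{m})} = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}, \qquad \text{with } n-2 \text{ degrees of freedom.} $$

    At fixed ##n##, ##|t|## grows with ##r^2##, so the p value falls. At fixed ##r^2##, ##|t|## grows like ##\sqrt{n-2}##, so the p value again falls. For a fit with a single predictor, the test of ##H_0: m=0## and the significance test on ##r## are therefore the same test.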
  4. Feb 7, 2016 #3



    Sorry for my ignorance here, but r^2 is constant with respect to what? Do you mean after adjusting for the number of variables? And I guess the p-value is the one used in testing ##H_0##?
  5. Feb 7, 2016 #4
    Note the difference between the number of variables and the number of degrees of freedom. The number of degrees of freedom is the number of data points minus 2 for a linear least squares fit. Suppose you have a number of least squares fits, each to a different number of data points, that all return r^2 = 0.81 (r = -0.9 or 0.9).

    The p-value computed by most stats packages is related to the hypothesis that the slope is different from zero (two-tailed) or specifically greater than (or less than) zero (one-tailed). In the case of 3 data points, the one-tailed p-value is 0.144, and the two-tailed p-value is 0.287. Neither is statistically significant at the p < 0.05 level. But increase to 4 data points, and the one-tailed p-value is 0.05 (at the edge of significance), and the two-tailed p-value is 0.10 (not significant). At 5 data points, the one-tailed p-value is 0.0187 (significant) and the two-tailed p-value is 0.037 (also significant). Increase to 10 data points and an r of 0.9 is significant (both one- and two-tailed) at p < 0.001. See: http://vassarstats.net/tabs_r.html
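
    The numbers above can be checked with a few lines of code. The following is a minimal sketch using only the Python standard library; it converts r to a t statistic with n - 2 degrees of freedom and integrates the Student t density numerically. The function names `t_pdf` and `one_tailed_p` are made up for this illustration, and a stats package would normally do this for you.

```python
import math

def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)

def one_tailed_p(r, n, steps=20000):
    """One-tailed p-value for a correlation r from n points (H0: slope = 0).

    Assumes |r| < 1 and n >= 3; steps must be even for Simpson's rule.
    """
    df = n - 2
    t = abs(r) * math.sqrt(df) / math.sqrt(1 - r * r)
    # Simpson's rule for the area under the t density between 0 and t.
    h = t / steps
    area = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area *= h / 3
    # The density is symmetric, so the upper tail is 0.5 minus that area.
    return 0.5 - area

for n in (3, 4, 5, 10):
    p1 = one_tailed_p(0.9, n)
    print(f"n={n:2d}  one-tailed p={p1:.4f}  two-tailed p={2 * p1:.4f}")
```

    With r = 0.9 this gives one-tailed values of roughly 0.144, 0.050 and 0.019 for 3, 4 and 5 points, matching the figures quoted above.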

    A given r^2 value is more believable with more points.

    Some fitting packages (including gnuplot and SciDavis, which I use) will also report the uncertainty in the slope, m. From this, one can compute a z score under the assumption that the true slope is zero. If the true slope were zero, a fitted slope at least two uncertainties above zero would occur by random chance only about 2.3% of the time (one-tailed; about 4.6% two-tailed).
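
    As a rough sketch of that calculation, the code below fits a line, computes the standard error of the slope, and forms the z score. The data are invented for illustration and the helper names `fit_line` and `normal_upper_tail` are hypothetical. Note that treating the ratio as a z score is a normal approximation; with few data points, the same ratio is better compared against a t distribution with n - 2 degrees of freedom.

```python
import math

def fit_line(xs, ys):
    """Least squares fit y = m*x + b; returns (m, b, standard error of m)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    m = sxy / sxx
    b = y_bar - m * x_bar
    # Residual sum of squares, divided by the n - 2 degrees of freedom.
    rss = sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))
    se_m = math.sqrt(rss / (n - 2) / sxx)
    return m, b, se_m

def normal_upper_tail(z):
    """P(Z > z) for a standard normal variable."""
    return 0.5 * math.erfc(z / math.sqrt(2))

# Made-up data, for illustration only.
xs = [1, 2, 3, 4, 5, 6]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]

m, b, se_m = fit_line(xs, ys)
z = m / se_m  # how many uncertainties the slope is away from zero
print(f"slope = {m:.3f} +/- {se_m:.3f}, z = {z:.1f}")
print(f"one-tailed tail probability at z = 2: {normal_upper_tail(2.0):.4f}")
```

    The last line prints the 2.3% figure mentioned above; doubling it gives the two-tailed value.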

    But you should keep in mind that these tests really only suggest the significance of a correlation; they do not tell you with any confidence whether the relationship is linear, quadratic, exponential, or something else. That's a much more challenging question to answer definitively, especially if different models give comparable r^2 values.
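
    A small illustration of this caveat, using invented data that are exactly quadratic (y = x^2 for x = 1..10): a straight-line fit still returns a high r^2, and only the pattern in the residuals hints that the model is wrong. The helper name `linear_fit_stats` is hypothetical.

```python
def linear_fit_stats(xs, ys):
    """Return (r_squared, residuals) for a least squares straight-line fit."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    m = sxy / sxx
    b = y_bar - m * x_bar
    residuals = [y - (m * x + b) for x, y in zip(xs, ys)]
    return sxy * sxy / (sxx * syy), residuals

# Purely quadratic data with no noise at all.
xs = list(range(1, 11))
ys = [x * x for x in xs]

r2, residuals = linear_fit_stats(xs, ys)
print(f"r^2 for a straight line through y = x^2: {r2:.3f}")
print("residuals:", [round(e, 1) for e in residuals])
```

    Here r^2 comes out near 0.95 even though the relationship is not linear. The residuals are positive at both ends and negative in the middle, which is the kind of systematic pattern a significance test on the slope does not detect.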