# Testing for Linear Relation: r^2 vs H_0: slope = 0

1. Feb 7, 2016

### WWGD

Hi All,
I am trying to understand better the tests used to determine the existence of a linear relation between
two variables X, Y. AFAIK, one way of testing the strength of any linear relationship is by computing
$r^2$, where $r$ is the correlation coefficient; this measures the extent to which X determines Y, i.e., the extent to which the value of X contributes to the value of Y.

But then there is a second test, and I am confused as to how it relates to the one above. In this other test, we do a hypothesis test for the slope of the regression line $y=mx+b$, with $H_0: m=0, H_A: m \neq 0$. Are both these tests necessary, or is one used to corroborate the other? Are there situations where one test is preferable to the other?
Thanks.

2. Feb 7, 2016

### Dr. Courtney

r^2 is monotonic with the p value if the number of degrees of freedom is constant.

If r^2 is constant, the p value gets closer and closer to zero (more significant) as the number of degrees of freedom is increased.

In addition to the r^2 and p values, I like to consider the uncertainty in the slope.

3. Feb 7, 2016

### WWGD

Sorry for my ignorance here, but r^2 is constant with respect to what? Do you mean after adjusting for the number of variables? And I guess the p-value is the one used in $H_0$?

4. Feb 7, 2016

### Dr. Courtney

Note the difference between the number of variables and the number of degrees of freedom. For a linear least squares fit, the number of degrees of freedom is the number of data points minus 2. Suppose you have a number of least squares fits that all return r^2 = 0.81 (i.e., r = 0.9 or -0.9).

The p-value computed by most stats packages is related to the hypothesis that the slope is different from zero (two-tailed) or specifically greater than (or less than) zero (one-tailed). In the case of 3 data points, the one-tailed p-value is 0.144, and the two-tailed p-value is 0.287; neither is statistically significant at the p < 0.05 level. Increase to 4 data points, and the one-tailed p-value is 0.05 (at the edge of significance), while the two-tailed p-value is 0.10 (not significant). At 5 data points, the one-tailed p-value is 0.0187 (significant) and the two-tailed p-value is 0.037 (also significant). Increase to 10 data points and an r of 0.9 is significant (both one- and two-tailed) at p < 0.001. See: http://vassarstats.net/tabs_r.html

A given r^2 value is more believable with more points.
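The p-values quoted above can be reproduced directly from r and n: compute the t statistic $t = r\sqrt{(n-2)/(1-r^2)}$ and take its tail probability with n - 2 degrees of freedom. Here is a stdlib-only Python sketch (the numerical integration of the t density is my own illustration, not what any particular stats package does internally):

```python
import math

def t_sf(t, df, steps=10000):
    """One-tailed P(T > t) for Student's t with df degrees of freedom.
    Uses the substitution x = sqrt(df) * tan(theta), which turns the
    infinite tail integral into a finite one, then Simpson's rule."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    a, b = math.atan(t / math.sqrt(df)), math.pi / 2
    f = lambda th: c * math.sqrt(df) * math.cos(th) ** (df - 1)
    h = (b - a) / steps
    s = f(a) + f(b)
    for i in range(1, steps):
        s += f(a + i * h) * (4 if i % 2 else 2)
    return s * h / 3

def p_from_r(r, n):
    """p-values for H0: slope = 0, given correlation r and n data points."""
    df = n - 2
    t = abs(r) * math.sqrt(df / (1 - r * r))
    one_tailed = t_sf(t, df)
    return one_tailed, 2 * one_tailed

for n in (3, 4, 5, 10):
    p1, p2 = p_from_r(0.9, n)
    # n=3 -> one-tailed ~0.144, n=4 -> ~0.050, n=5 -> ~0.0187
    print(f"n={n:2d}: one-tailed p={p1:.4f}, two-tailed p={p2:.4f}")
```

Same r, same r^2, yet the p-value shrinks as the degrees of freedom grow, which is the monotonic relationship described above.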

Some fitting packages (including gnuplot and SciDavis, which I use) will also report the uncertainty in the slope, m. From this, one can compute a z score assuming the mean slope should have been zero. A slope which is two uncertainties away from zero has only about a 2.3% probability of being attributable to random chance.
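For illustration, here is a minimal stdlib-Python sketch of that slope-uncertainty check. The data set is invented and `slope_with_uncertainty` is a hypothetical helper, not output from gnuplot or SciDavis; it uses the standard least-squares formula for the standard error of the slope:

```python
import math
from statistics import NormalDist

def slope_with_uncertainty(xs, ys):
    """Least-squares slope m and its standard error for y = m*x + b."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    m = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sxx
    b = ybar - m * xbar
    # residual variance, with n - 2 degrees of freedom for a linear fit
    s2 = sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys)) / (n - 2)
    return m, math.sqrt(s2 / sxx)

xs = [1, 2, 3, 4, 5, 6]            # made-up data for the demo
ys = [1.1, 1.9, 3.2, 3.8, 5.1, 5.9]
m, se = slope_with_uncertainty(xs, ys)
z = m / se  # z score under H0: true slope = 0
print(f"m = {m:.3f} +/- {se:.3f}, z = {z:.1f}")
# a slope two uncertainties from zero, normal approximation:
print(f"P(z > 2) = {1 - NormalDist().cdf(2):.4f}")  # ~0.0228, the ~2.3% figure
```

Strictly, with few points the z score should be compared against a t distribution with n - 2 degrees of freedom rather than the normal, but the 2.3% figure is the large-sample limit.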

But you should keep in mind that these tests really only suggest the significance of a correlation; they do not really tell you with any confidence whether the relationship is linear, quadratic, exponential, or something else. That's a much more challenging question to answer definitively, especially if different models give comparable r^2 values.
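To see that last point concretely, here is a short stdlib-Python sketch with invented, purely quadratic data: the linear correlation comes out highly "significant" even though a straight line is the wrong model.

```python
# Made-up example: exact y = x^2 data, tested for *linear* correlation.
xs = list(range(1, 11))
ys = [x * x for x in xs]

n = len(xs)
xbar, ybar = sum(xs) / n, sum(ys) / n
sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
sxx = sum((x - xbar) ** 2 for x in xs)
syy = sum((y - ybar) ** 2 for y in ys)
r2 = sxy * sxy / (sxx * syy)
print(f"r^2 = {r2:.3f}")  # ~0.95: a strong linear r^2 from non-linear data
```

A large r^2 (or a small p-value) rules out "no relationship," but it cannot by itself distinguish a line from a curve; for that you need residual plots or an explicit comparison of candidate models.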