Testing for Linear Relation: r^2 vs H_0: slope = 0

WWGD
Hi All,
I am trying to understand better the tests used to determine the existence of a linear relation between
two variables X,Y. AFAIK, one way of testing the strength of any linear relationship is by computing
##r^2##, where ##r## is the correlation coefficient; this measures the extent to which X determines Y, i.e., the fraction of the variation in Y that is accounted for by its linear relationship with X.

But then there is a second test, and I am confused as to how it relates to the one above. In this other test, we do a hypothesis test for the slope of the regression line ## y=mx+b ##, with ## H_0: m=0, H_A: m \neq 0 ##. Are both these tests necessary, or is one used to corroborate the other? Are there situations where one test is preferable to the other?
Thanks.
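For a concrete picture of how the two quantities relate, here is a minimal sketch (assuming Python with NumPy and SciPy, and entirely made-up data): a single call to scipy.stats.linregress returns both ##r## and the two-tailed p-value for ##H_0: m = 0## from the same least-squares fit.

```python
import numpy as np
from scipy import stats

# Made-up data (illustrative only): y is roughly linear in x plus noise
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 20)
y = 2.0 * x + 1.0 + rng.normal(scale=3.0, size=x.size)

fit = stats.linregress(x, y)

print(f"slope m       = {fit.slope:.3f}")
print(f"r^2           = {fit.rvalue**2:.3f}")   # strength of the linear association
print(f"p-value (m=0) = {fit.pvalue:.4g}")      # two-tailed test of H_0: m = 0
print(f"slope std err = {fit.stderr:.3f}")      # uncertainty in the fitted slope
```

Both numbers come out of the same fit: ##r^2## describes how much of the variation in Y the line accounts for, while the p-value asks whether a slope of that size could plausibly arise by chance given the scatter and the sample size.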
 
Dr. Courtney
r^2 is monotonic with the p-value if the number of degrees of freedom is constant.

If r^2 is constant, the p-value gets closer and closer to zero (more significant) as the number of degrees of freedom is increased.

In addition to the r^2 and p values, I like to consider the uncertainty in the slope.
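As a rough numeric illustration of the first point, holding the degrees of freedom fixed and varying r^2 (a sketch in Python with SciPy, using the standard ##t = r\sqrt{df/(1-r^2)}## statistic for a correlation; the df value is arbitrary):

```python
from scipy import stats

df = 8  # e.g. 10 data points in a simple linear fit (df = n - 2)
for r2 in (0.2, 0.4, 0.6, 0.8):
    t = (r2 * df / (1 - r2)) ** 0.5    # t = r * sqrt(df / (1 - r^2))
    p = 2 * stats.t.sf(t, df)          # two-tailed p-value for H_0: slope = 0
    print(f"r^2 = {r2:.1f}  ->  p = {p:.4f}")
```

With df held fixed, the printed p-value falls monotonically as r^2 rises.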
 
Dr. Courtney said:
r^2 is monotonic with the p-value if the number of degrees of freedom is constant.

If r^2 is constant, the p-value gets closer and closer to zero (more significant) as the number of degrees of freedom is increased.

In addition to the r^2 and p values, I like to consider the uncertainty in the slope.
Sorry for my ignorance here, but if r^2 is constant, constant with respect to what? Do you mean after adjusting for the number of variables? And I guess the p-value is the one from the test of ##H_0##?
 
WWGD said:
Sorry for my ignorance here, but if r^2 is constant, constant with respect to what? Do you mean after adjusting for the number of variables? And I guess the p-value is the one from the test of ##H_0##?

Note the difference between the number of variables and the number of degrees of freedom. The number of degrees of freedom is the number of data points minus 2 for a linear least-squares fit. Suppose you have a number of least-squares fits that all return r^2 = 0.81 (i.e., r = 0.9 or -0.9).

The p-value computed by most stats packages is related to the hypothesis that the slope is different from zero (two-tailed) or specifically greater than (or less than) zero (one-tailed). In the case of 3 data points, the one-tailed p-value is 0.144 and the two-tailed p-value is 0.287; neither is statistically significant at the p < 0.05 level. Increase to 4 data points, and the one-tailed p-value is 0.05 (at the edge of significance), while the two-tailed p-value is 0.10 (not significant). At 5 data points, the one-tailed p-value is 0.0187 (significant) and the two-tailed p-value is 0.037 (also significant). Increase to 10 data points and an r of 0.9 is significant (both one- and two-tailed) at p < 0.001. See: http://vassarstats.net/tabs_r.html

A given r^2 value is more believable with more points.
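Those figures can be reproduced from the standard t-test for a correlation; here is a sketch in Python with SciPy (no new data, just the same r = 0.9 at the sample sizes quoted above):

```python
from scipy import stats

r = 0.9  # so r^2 = 0.81
for n in (3, 4, 5, 10):
    df = n - 2
    t = r * (df / (1 - r**2)) ** 0.5   # t statistic for the correlation
    p_one = stats.t.sf(t, df)          # one-tailed p-value
    p_two = 2 * p_one                  # two-tailed p-value
    print(f"n = {n:2d}: one-tailed p = {p_one:.4f}, two-tailed p = {p_two:.4f}")
```

For r = 0.9 this prints roughly 0.144/0.287 at n = 3, 0.05/0.10 at n = 4, 0.019/0.037 at n = 5, and well under 0.001 at n = 10.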

Some fitting packages (including gnuplot and SciDavis, which I use) will also report the uncertainty in the slope, m. From this, one can compute a z-score under the null hypothesis that the true slope is zero. A slope that is two uncertainties away from zero has only about a 2.3% (one-tailed) probability of being attributable to random chance.
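Here is a sketch of that z-score calculation (Python with NumPy and SciPy assumed; the data are made up purely so there is a fitted slope and uncertainty to work with):

```python
import numpy as np
from scipy import stats

# Made-up data just to have a slope and an uncertainty to work with
rng = np.random.default_rng(1)
x = np.linspace(0, 10, 15)
y = 0.8 * x + rng.normal(scale=2.0, size=x.size)

fit = stats.linregress(x, y)
z = fit.slope / fit.stderr          # how many uncertainties the slope sits from zero
p_chance = stats.norm.sf(abs(z))    # one-tailed normal tail area (~2.3% at z = 2)
print(f"slope = {fit.slope:.3f} +/- {fit.stderr:.3f}, z = {z:.2f}, tail probability = {p_chance:.3g}")
```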

But you should keep in mind that these tests really only suggest the significance of a correlation; they do not tell you with any confidence whether the relationship is linear, quadratic, exponential, or something else. That's a much more challenging question to answer definitively, especially if different models give comparable r^2 values.
 
Namaste & G'day Postulate: A strongly-knit team wins on average over a less knit one Fundamentals: - Two teams face off with 4 players each - A polo team consists of players that each have assigned to them a measure of their ability (called a "Handicap" - 10 is highest, -2 lowest) I attempted to measure close-knitness of a team in terms of standard deviation (SD) of handicaps of the players. Failure: It turns out that, more often than, a team with a higher SD wins. In my language, that...
Hi all, I've been a roulette player for more than 10 years (although I took time off here and there) and it's only now that I'm trying to understand the physics of the game. Basically my strategy in roulette is to divide the wheel roughly into two halves (let's call them A and B). My theory is that in roulette there will invariably be variance. In other words, if A comes up 5 times in a row, B will be due to come up soon. However I have been proven wrong many times, and I have seen some...
Back
Top