Testing for Linear Relation: r^2 vs H_0: slope = 0

In summary, r^2 measures the strength of a linear relationship between two variables, while the p-value tests the hypothesis that the slope differs from zero (two-tailed) or is specifically greater than (or less than) zero (one-tailed). The significance of a given r^2 depends on the number of data points: the same r^2 becomes more significant as more points are included. Additionally, the slope uncertainty reported by fitting packages can be turned into a z-score against the null hypothesis of zero slope, and a fit is more believable the more uncertainties the slope lies from zero.
  • #1
WWGD
Hi All,
I am trying to understand better the tests used to determine the existence of a linear relation between
two variables X,Y. AFAIK, one way of testing the strength of any linear relationship is by computing
##r^2##, where ##r## is the correlation coefficient; this measures the extent to which X determines Y, i.e., the extent to which the value of X contributes to the value of Y.

But then there is a second test, and I am confused as to how it relates to the one above. In this other test, we do a hypothesis test for the slope of the regression line ## y=mx+b ##, with ## H_0: m=0, H_A: m \neq 0 ##. Are both these tests necessary, or is one used to corroborate the other? Are there situations where one test is preferable to the other?
Thanks.
 
  • #2
Dr. Courtney
r^2 is monotonic with the p-value if the number of degrees of freedom is held constant.

If r^2 is held constant, the p-value gets closer and closer to zero (more significant) as the number of degrees of freedom is increased.

In addition to the r^2 and p values, I like to consider the uncertainty in the slope.
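
As a rough illustration of all three quantities from a single fit, here is a minimal sketch (the data are made up, and SciPy is assumed; this is not from any particular package's documentation):

```python
# Minimal sketch: one least-squares fit yields r^2, the p-value for
# H0: slope = 0, and the standard error of the slope all at once.
from scipy import stats

x = [1.0, 2.0, 3.0, 4.0, 5.0]      # made-up data for illustration
y = [1.1, 2.3, 2.8, 4.2, 4.9]

fit = stats.linregress(x, y)
print("r^2             :", fit.rvalue ** 2)       # strength of linear relation
print("p-value (2-tail):", fit.pvalue)            # test of H0: slope = 0
print("slope +/- stderr:", fit.slope, fit.stderr) # uncertainty in the slope
```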
 
  • #3
Dr. Courtney said:
r^2 is monotonic with the p-value if the number of degrees of freedom is held constant.

If r^2 is held constant, the p-value gets closer and closer to zero (more significant) as the number of degrees of freedom is increased.

In addition to the r^2 and p values, I like to consider the uncertainty in the slope.
Sorry for my ignorance here, but r^2 is constant with respect to what? Do you mean after adjusting for the number of variables? And I guess the p-value is the one used for ##H_0##?
 
  • #4
WWGD said:
Sorry for my ignorance here, but r^2 is constant with respect to what? Do you mean after adjusting for the number of variables? And I guess the p-value is the one used for ##H_0##?

Note the difference between the number of variables and the number of degrees of freedom. The number of degrees of freedom is the number of data points minus 2 for a linear least squares fit. Suppose you have a number of least squares fits that all return r^2 = 0.81 (that is, r = 0.9 or -0.9).

The p-value computed by most stats packages is related to the hypothesis that the slope is different from zero (two-tailed) or specifically greater than (or less than) zero (one-tailed). In the case of 3 data points, the one-tailed p-value is 0.144 and the two-tailed p-value is 0.287; neither is statistically significant at the p < 0.05 level. Increase to 4 data points, and the one-tailed p-value is 0.05 (at the edge of significance) while the two-tailed p-value is 0.10 (not significant). At 5 data points, the one-tailed p-value is 0.0187 (significant) and the two-tailed p-value is 0.037 (also significant). Increase to 10 data points and an r of 0.9 is significant (both one- and two-tailed) at p < 0.001. See: http://vassarstats.net/tabs_r.html
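
For concreteness, here is a sketch of the standard r-to-p conversion that tables like the one linked above are based on: convert r to a t statistic with n - 2 degrees of freedom, then take the tail area. SciPy is assumed, and the function name is mine.

```python
# Sketch of the usual r -> p conversion: t = r * sqrt(df / (1 - r^2))
# with df = n - 2, then the t-distribution tail gives the p-value.
from scipy import stats

def p_values(r, n):
    """One- and two-tailed p-values for H0: slope = 0, given r and n points."""
    df = n - 2
    t = abs(r) * (df / (1 - r**2)) ** 0.5
    one_tailed = stats.t.sf(t, df)      # upper tail of the t distribution
    return one_tailed, 2 * one_tailed

# Fixed r = 0.9: the same r becomes more significant as n grows.
for n in (3, 4, 5, 10):
    p1, p2 = p_values(0.9, n)
    print(f"n={n}: one-tailed p={p1:.4f}, two-tailed p={p2:.4f}")
```

This reproduces the values quoted above (e.g., p = 0.144 and 0.287 at n = 3).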

A given r^2 value is more believable with more points.

Some fitting packages (including gnuplot and SciDavis, which I use) will also report the uncertainty in the slope, m. From this, one can compute a z score assuming the mean slope should have been zero. A slope which is two uncertainties away from zero has only about a 2.3% probability of being attributable to random chance.
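
A sketch of that z-score check, using the standard error such packages report (data are illustrative; SciPy assumed):

```python
# Sketch: z-score for the fitted slope against H0: slope = 0.
from scipy import stats

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]   # illustrative data
y = [0.8, 2.1, 3.2, 3.9, 5.2, 5.8]

fit = stats.linregress(x, y)
z = fit.slope / fit.stderr            # uncertainties away from zero
p_chance = stats.norm.sf(z)           # one-tailed normal tail; z = 2 -> ~0.023
print(f"slope = {fit.slope:.3f} +/- {fit.stderr:.3f}, z = {z:.2f}, "
      f"P(chance) ~ {p_chance:.3f}")
```

(Strictly, for small samples a t distribution with n - 2 degrees of freedom is more appropriate than the normal used here; the 2.3% figure is the normal approximation.)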

But you should keep in mind that these tests only suggest the significance of a correlation; they do not tell you with any confidence whether the relationship is linear, quadratic, exponential, or something else. That is a much more challenging question to answer definitively, especially if different models give comparable r^2 values.
 

Related to Testing for Linear Relation: r^2 vs H_0: slope = 0

What is the purpose of testing for linear relation?

The purpose of testing for linear relation is to determine if there is a significant relationship between two variables. This can help us understand the strength and direction of the relationship, and make predictions about future values.

What does r^2 measure in testing for linear relation?

r^2, also known as the coefficient of determination, measures the proportion of variation in the dependent variable that is explained by the independent variable in a linear regression model. It ranges from 0 to 1, where 0 indicates no relationship and 1 indicates a perfect relationship.
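
As a sketch, r^2 can be computed directly from that definition, as one minus the ratio of residual variation to total variation (made-up data; NumPy and SciPy assumed):

```python
# Sketch: r^2 from its definition, 1 - SS_res / SS_tot.
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # made-up data
y = np.array([1.2, 1.9, 3.2, 3.8, 5.1])

fit = stats.linregress(x, y)
y_hat = fit.intercept + fit.slope * x
ss_res = np.sum((y - y_hat) ** 2)     # variation the fit leaves unexplained
ss_tot = np.sum((y - y.mean()) ** 2)  # total variation in y
print(1 - ss_res / ss_tot)            # equals fit.rvalue ** 2 for a linear fit
```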

What is the null hypothesis in testing for linear relation?

The null hypothesis in testing for linear relation is that there is no linear relationship between the two variables. In other words, the slope of the regression line is equal to 0.

How is the p-value used in testing for linear relation?

The p-value is used to determine the significance of the relationship between the two variables. If the p-value is less than the chosen significance level (typically 0.05), we reject the null hypothesis and conclude that there is a significant linear relationship between the variables.

What are the assumptions of testing for linear relation?

The assumptions of testing for linear relation include linearity, normality, independence, and homoscedasticity. Linearity assumes that the relationship between the two variables can be described by a straight line. Normality assumes that the residuals (differences between actual and predicted values) are normally distributed. Independence assumes that there is no relationship between the residuals. Homoscedasticity assumes that the variation of the residuals is the same for all levels of the independent variable.
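
As a sketch, two of these assumptions can be checked quickly from the residuals of a fit: a Shapiro-Wilk test for normality, and an eyeball comparison of residual spread across x for homoscedasticity (data are illustrative; SciPy and NumPy assumed):

```python
# Sketch: quick residual diagnostics for a linear fit.
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])  # illustrative
y = np.array([1.1, 2.2, 2.7, 4.3, 4.8, 6.2, 6.9, 8.1])

fit = stats.linregress(x, y)
resid = y - (fit.intercept + fit.slope * x)

stat, p = stats.shapiro(resid)        # H0: residuals are normally distributed
print(f"Shapiro-Wilk p = {p:.3f} (small p suggests non-normal residuals)")
for xi, ri in zip(x, resid):          # spread should look roughly constant
    print(f"x = {xi:.1f}  residual = {ri:+.3f}")
```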
