Does a correlation coefficient represent probability?


Discussion Overview

The discussion centers on the interpretation of correlation coefficients and r-squared values (in simple linear regression, r-squared is the square of the correlation coefficient), and on whether they can be considered probabilities. Participants explore the definitions and implications of r-squared in relation to chance and correlation, touching on statistical concepts and potential pitfalls in interpretation.

Discussion Character

  • Debate/contested
  • Technical explanation
  • Mathematical reasoning

Main Points Raised

  • Some participants argue that r-squared values are not probabilities, emphasizing that they represent the fraction of the variance in one variable that is explained by another.
  • Others suggest that while r-squared is not directly a probability, it is a statistic with its own distribution and associated probabilities, so a large value from enough data would be unlikely to arise by chance if the variables were unrelated.
  • Concerns are raised about the potential for overfitting, where a high r-squared value may not indicate a true relationship due to the inclusion of too many variables.
  • One participant (the original poster) argues that the role of chance is better captured by the p-value, which is calculated separately from r-squared (for example, by a t test).
  • There is a caution against interpreting r-squared values without considering the context and the data set used, as it can lead to misleading conclusions.

Areas of Agreement / Disagreement

Participants do not reach a consensus on whether r-squared values can be interpreted as probabilities. Multiple competing views remain regarding the nature of r-squared and its implications in statistical analysis.

Contextual Notes

Limitations include the dependence on the definitions of correlation and probability, as well as the potential for misunderstanding the implications of r-squared values in the context of statistical modeling.

Hallucinogen
Sorry if this is trivial - I'm arguing with someone on Facebook who is claiming that r-squared values, or correlation coefficients, represent "chance" or "probability". I've never heard of this.
I just need a simple yes or no answer along with a short explanation why it is or isn't probability.
He says:
"It is the probability that there is a connection between two different things.

A Correlation Probability of 1 means that these two things ALWAYS occur together, even if there is absolutely no causal links between them.

A Correlation Probability of ½ (or .5, or 50%) means that the two things are connected 50% of the time. Or, roughly what we would see in "Chance" for most things, depending upon what two things we are looking at."

I say:
"I think you have something wrong here. I don't know what a correlation probability is and I'm struggling to find it online. The ".55" value and other correlations in the literature are correlation coefficients; they are r-squared values calculated using a least-squares regression analysis. R-squared estimates the fraction of the variance in IQ scores that is explained by % genetic relation - it has nothing to do with chance.

Chance of randomness is given by the p value, calculated by a t test on the same data. The chance of randomness depends on the data sets you're attempting to correlate and the variances of the data points. I still don't actually understand where you've gotten 50% from."

He says:
"An R-Squared IS a PROBABILITY (Describing the Chance/Probability that two variables are correlated)."

Is r-squared the same thing as probability? I've not learned it that way, I've learned it *only* as how much variance in one thing explains variance in another.
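A minimal Python sketch of the distinction drawn in the post, using made-up numbers (not the IQ data under discussion): ##r## and ##r^2## describe how tightly the points follow a line, while a separate p-value addresses chance. The post mentions a t test; this sketch swaps in a permutation test so that it needs only the standard library. The helper names `pearson_r` and `permutation_p_value` are hypothetical, not a library API.

```python
import random
import statistics


def pearson_r(xs, ys):
    """Pearson correlation coefficient r of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def permutation_p_value(xs, ys, n_perm=10_000, seed=0):
    """Two-sided permutation p-value for the null hypothesis of no association.

    Shuffling ys breaks any real pairing with xs, so the shuffled |r| values
    show how large |r| gets by chance alone for these particular numbers.
    """
    rng = random.Random(seed)
    observed = abs(pearson_r(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson_r(xs, shuffled)) >= observed:
            hits += 1
    # Count the observed pairing itself so the estimate is never exactly 0.
    return (hits + 1) / (n_perm + 1)


# Made-up illustrative data (not from the thread).
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ys = [2.1, 2.9, 3.2, 4.8, 5.1, 5.9, 7.2, 7.8, 9.1, 9.8]

r = pearson_r(xs, ys)
print(f"r = {r:.3f}")          # strength and direction of the linear association
print(f"r^2 = {r * r:.3f}")    # fraction of the variance in ys explained by a line in xs
# Estimated share of random shufflings with |r| at least as large as observed.
print(f"p = {permutation_p_value(xs, ys):.4f}")
```

With the same scatter pattern but more points, ##r^2## would stay roughly the same while the p-value would shrink, which is one way to see that the two numbers answer different questions.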
 
An ##R^2## is not a probability.
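For reference, the standard definition for a least-squares fit with an intercept, where ##\hat{y}_i## are the fitted values and ##\bar{y}## is the mean of the observed ##y_i##:

$$R^2 = 1 - \frac{SS_{\text{res}}}{SS_{\text{tot}}} = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}$$

With a single predictor ##x##, this equals the square of the Pearson correlation coefficient:

$$R^2 = r^2, \qquad r = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_i (x_i - \bar{x})^2 \, \sum_i (y_i - \bar{y})^2}}$$

Both are ratios of sums of squares computed from the data. Neither is defined as the probability of any event, even though ##R^2## lies between 0 and 1.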
 
##R^2## is a measurement of what fraction of the variation of one variable might be explained by the other variable. Although it is not directly a probability, it is a statistic that has a distribution and associated probabilities. A large ##R^2## from enough data implies that the apparent association between the variables would have taken a lot of luck if they are not, in fact, related.
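To illustrate that ##R^2## is a statistic with its own sampling distribution, here is a hedged simulation sketch: it draws ##x## and ##y## independently (so there is no real relation), computes ##R^2## many times, and estimates how often it reaches an arbitrary threshold of 0.3 for different sample sizes. The normally distributed data, the threshold, the trial count, and the helper names are all assumptions chosen for illustration. Under these assumptions the average ##R^2## with no relation should be close to ##1/(n-1)##, so a large value becomes much harder to get by luck as ##n## grows.

```python
import random
import statistics


def r_squared(xs, ys):
    """Squared Pearson correlation, i.e. R^2 of a simple least-squares line."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy * sxy / (sxx * syy)


def null_r_squared(n, trials=2000, seed=1):
    """R^2 values from `trials` samples of n independent (unrelated) x, y pairs."""
    rng = random.Random(seed)
    values = []
    for _ in range(trials):
        xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
        ys = [rng.gauss(0.0, 1.0) for _ in range(n)]
        values.append(r_squared(xs, ys))
    return values


def share_at_least(values, threshold):
    """Fraction of simulated R^2 values at or above the threshold."""
    return sum(v >= threshold for v in values) / len(values)


for n in (5, 20, 100):
    sims = null_r_squared(n)
    print(
        f"n={n:3d}  mean R^2 with no relation={statistics.fmean(sims):.3f}  "
        f"(1/(n-1)={1 / (n - 1):.3f})  share with R^2>=0.3: {share_at_least(sims, 0.3):.3f}"
    )
```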
 
FactChecker said:
A large ##R^2## from enough data implies that the apparent association between the variables would have taken a lot of luck if they are not, in fact, related.

I would be very very careful before taking this interpretation. It is always possible to inflate your ##R^2## by taking enough variables. So your large ##R^2## might be due to overfitting, and not due to an actual relation.
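A minimal sketch of the overfitting concern, assuming pure-noise data: the outcome and every predictor are independent random numbers, yet the least-squares ##R^2## can only rise as predictors are added, and with ##n - 1## predictors plus an intercept it reaches 1 (an exact fit). The small Gaussian-elimination solver and `ols_r_squared` are hypothetical helpers written for this example; a real analysis would use a numerical library.

```python
import random


def solve(a, b):
    """Solve the linear system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back-substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols_r_squared(columns, y):
    """R^2 of a least-squares fit of y on the predictor columns plus an intercept."""
    n = len(y)
    design = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(design[0])
    # Normal equations: (X^T X) beta = X^T y.
    xtx = [[sum(design[i][j] * design[i][k] for i in range(n)) for k in range(p)] for j in range(p)]
    xty = [sum(design[i][j] * y[i] for i in range(n)) for j in range(p)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, row)) for row in design]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


rng = random.Random(42)
n = 20
y = [rng.gauss(0.0, 1.0) for _ in range(n)]  # outcome: pure noise
noise = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(n - 1)]  # unrelated predictors

for k in (0, 1, 5, 10, 15, 19):
    print(f"{k:2d} noise predictors -> R^2 = {ols_r_squared(noise[:k], y):.3f}")
```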
 
micromass said:
I would be very very careful before taking this interpretation. It is always possible to inflate your ##R^2## by taking enough variables. So your large ##R^2## might be due to overfitting, and not due to an actual relation.
I agree about the danger, especially if there are a lot of variables and not a lot of data. With all applied math, you have to be careful. But it is the only interpretation to take and is fundamental for a lot of statistics. Algorithms will often help to select a small subset of variables that gives a statistically significant ##R^2##.
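One standard way to attach a probability to ##R^2## while accounting for the number of predictors ##k## and observations ##n## is the overall F test for a least-squares fit with an intercept (the thread does not say which test or selection algorithm is meant):

$$F = \frac{R^2 / k}{(1 - R^2)/(n - k - 1)}$$

Under the null hypothesis that none of the predictors is related to the outcome (with the usual normal-errors assumptions), ##F## follows an ##F_{k,\,n-k-1}## distribution, and the p-value comes from that distribution. The adjusted ##R^2## applies a similar penalty for extra predictors:

$$\bar{R}^2 = 1 - (1 - R^2)\,\frac{n - 1}{n - k - 1}$$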
 
