# B Does a correlation coefficient represent probability?

1. Dec 27, 2016

### Hallucinogen

Sorry if this is trivial - I'm arguing with someone on Facebook who is claiming that r-squared values, or correlation coefficients, represent "chance" or "probability". I've never heard of this.
I just need a simple yes or no answer along with a short explanation why it is or isn't probability.
He says:
"It is the probability that there is a connection between two different things.

A Correlation Probability of 1 means that these two things ALWAYS occur together, even if there is absolutely no causal links between them.

A Correlation Probability of ½ (or .5, or 50%) means that the two things are connected 50% of the time. Or, roughly what we would see in "Chance" for most things, depending upon what two things we are looking at."

I say:
"I think you have something wrong here. I don't know what a correlation probability is and I'm struggling to find it online. The ".55" value and other correlations in the literature are correlation coefficients, they are r-squared values calculated using a least squares regression analysis. R-squared estimates the fraction of the variance in IQ scores that is explained by % genetic relation - it has nothing to do with chance.

Chance of randomness is given by the p value, calculated by a t test on the same data. The chance of randomness depends on the data sets you're attempting to correlate and the variances of the data points. I still don't actually understand where you've gotten 50% from."

He says:
"An R-Squared IS a PROBABILITY (Describing the Chance/Probability that two variables are correlated)."

Is r-squared the same thing as probability? I've not learned it that way. I've learned it *only* as how much of the variance in one thing is explained by variance in another.

2. Dec 27, 2016

### micromass

Staff Emeritus
An $R^2$ is not a probability.

3. Dec 27, 2016

### FactChecker

$R^2$ measures what fraction of the variation in one variable might be explained by the other variable. Although it is not itself a probability, it is a statistic that has a sampling distribution and associated probabilities. A large $R^2$ from enough data implies that the apparent association between the variables would have taken a lot of luck to arise if they were not, in fact, related.
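To make the distinction concrete, here is a minimal sketch, assuming NumPy is available. The helper names `r_squared` and `permutation_p_value` are made up for illustration, and the permutation test is only one of several ways to attach a probability to an observed $R^2$ (a t test would be the more usual choice). The data are simulated so that `y` depends on `x` by construction, yet the $R^2$ comes out far below 1, while the p-value (the quantity that actually is a probability) is a separate and much smaller number.

```python
import numpy as np

def r_squared(x, y):
    """Squared Pearson correlation, which equals R^2 for a simple linear fit."""
    r = np.corrcoef(x, y)[0, 1]
    return r ** 2

def permutation_p_value(x, y, n_perm=2000, seed=0):
    """Estimate how often shuffled (unrelated) data give an R^2 at least
    as large as the observed one. This is the probability-like quantity."""
    rng = np.random.default_rng(seed)
    observed = r_squared(x, y)
    hits = 0
    for _ in range(n_perm):
        if r_squared(x, rng.permutation(y)) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)

# Simulated data (an assumption for illustration): y really does depend on x,
# but most of its variance comes from independent noise.
rng = np.random.default_rng(42)
n = 200
x = rng.normal(size=n)
y = 0.5 * x + rng.normal(size=n)

r2 = r_squared(x, y)
p = permutation_p_value(x, y)
print(f"R^2 = {r2:.3f}  (fraction of variance explained)")
print(f"p   = {p:.4f}  (chance of an R^2 this large if x and y were unrelated)")
```

Here the two variables are connected with certainty, so reading $R^2$ as "the probability that there is a connection" would give the wrong answer.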

4. Dec 27, 2016

### micromass

Staff Emeritus
I would be very careful before adopting this interpretation. It is always possible to inflate your $R^2$ by adding enough variables, so a large $R^2$ might be due to overfitting and not to an actual relation.
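A quick sketch of the inflation effect, assuming NumPy and ordinary least squares with an intercept. The function name `ols_r_squared` and the simulated data are purely illustrative. Both the response and every predictor are independent random noise, so there is no real relation to find, yet $R^2$ climbs as predictors are added and reaches essentially 1 once there are as many fitted coefficients as data points.

```python
import numpy as np

def ols_r_squared(X, y):
    """R^2 of an ordinary least squares fit of y on the columns of X plus an intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    ss_res = np.sum(resid ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

rng = np.random.default_rng(0)
n = 30
y = rng.normal(size=n)               # response: pure noise
noise = rng.normal(size=(n, n - 1))  # predictors: pure noise, unrelated to y

# Use the first k noise columns as predictors, for increasing k.
r2_by_k = {k: ols_r_squared(noise[:, :k], y) for k in (1, 5, 15, 29)}
for k, r2 in r2_by_k.items():
    print(f"{k:2d} noise predictors -> R^2 = {r2:.3f}")
```

Because each larger model contains the smaller ones, $R^2$ can never decrease as columns are added, which is why a large value alone says little without accounting for the number of variables.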

5. Dec 27, 2016

### FactChecker

I agree about the danger, especially if there are a lot of variables and not a lot of data. With all applied math, you have to be careful. But it is the only interpretation to take and is fundamental for a lot of statistics. Algorithms will often help to select a small subset of variables that gives a statistically significant $R^2$.

Last edited: Dec 27, 2016