# Does a correlation coefficient represent probability?

• Hallucinogen
In summary, R-squared values and correlation coefficients are not probabilities. R-squared measures the fraction of the variation in one variable that can be explained by another variable (in simple linear regression it is the square of the correlation coefficient r), while a probability measures how likely an event is to occur. Although R-squared, as a statistic, has a sampling distribution from which probabilities such as p values can be computed, it is not itself a probability, and a high value can be inflated by factors such as overfitting. R-squared values should be interpreted with caution and with their limitations in mind.
Hallucinogen
Sorry if this is trivial - I'm arguing with someone on Facebook who is claiming that r-squared values, or correlation coefficients, represent "chance" or "probability". I've never heard of this.
I just need a simple yes or no answer along with a short explanation why it is or isn't probability.
He says:
"It is the probability that there is a connection between two different things.

A Correlation Probability of 1 means that these two things ALWAYS occur together, even if there is absolutely no causal links between them.

A Correlation Probability of ½ (or .5, or 50%) means that the two things are connected 50% of the time. Or, roughly what we would see in "Chance" for most things, depending upon what two things we are looking at."

I say:
"I think you have something wrong here. I don't know what a correlation probability is and I'm struggling to find it online. The ".55" value and other correlations in the literature are correlation coefficients, they are r-squared values calculated using a least squares regression analysis. R-squared estimates the fraction of the variance in IQ scores that is explained by % genetic relation - it has nothing to do with chance.

Chance of randomness is given by the p value, calculated by a t test on the same data. The chance of randomness depends on the data sets you're attempting to correlate and the variances of the data points. I still don't actually understand where you've gotten 50% from."
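For reference, the t test usually used for this (assuming the null hypothesis of zero correlation and roughly bivariate-normal data) turns the sample correlation r from n pairs into a t statistic with n-2 degrees of freedom. The p value is then read from that t distribution, so the probability comes from the test, not from r itself:

```latex
t = \frac{r\sqrt{n-2}}{\sqrt{1-r^{2}}}, \qquad
p = \Pr\left(\lvert T_{n-2} \rvert \ge \lvert t \rvert\right)
```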

He says:
"An R-Squared IS a PROBABILITY (Describing the Chance/Probability that two variables are correlated)."

Is r-squared the same thing as probability? I've not learned it that way, I've learned it *only* as how much variance in one thing explains variance in another.

An ##R^2## is not a probability.

##R^2## is a measurement of what fraction of the variation of one variable might be explained by the other variable. Although it is not directly a probability, it is a statistic that has a distribution and associated probabilities. A large ##R^2## from enough data implies that the apparent association between the variables would have taken a lot of luck if they are not, in fact, related.
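The point that ##R^2## has a distribution and associated probabilities can be made concrete. The sketch below uses a permutation test (a swapped-in technique, not the t test mentioned earlier in the thread): shuffle y many times to destroy any real association, and count how often the shuffled ##R^2## is at least as large as the observed one. The function names and the data are hypothetical.

```python
import random


def r_squared(xs, ys):
    """Squared Pearson correlation, i.e. R^2 of a simple linear fit."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy * sxy / (sxx * syy)


def permutation_p_value(xs, ys, n_perm=10_000, seed=0):
    """Estimate P(R^2 >= observed) when ys are shuffled, i.e. when any
    association between xs and ys has been destroyed."""
    rng = random.Random(seed)
    observed = r_squared(xs, ys)
    shuffled = list(ys)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if r_squared(xs, shuffled) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above 0.
    return (hits + 1) / (n_perm + 1)


# Hypothetical data: y roughly follows 2*x with a little noise.
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1, 18.0, 20.2]

r2 = r_squared(xs, ys)
p = permutation_p_value(xs, ys)
print(f"R^2 = {r2:.3f}")  # a fraction of variance explained
print(f"p   = {p:.4f}")   # a probability, computed separately
```

Here ##R^2## is close to 1 and the estimated p is tiny, but they answer different questions: ##R^2## says how much of the variation the line accounts for, p says how surprising that would be if the variables were unrelated.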

FactChecker said:
A large ##R^2## from enough data implies that the apparent association between the variables would have taken a lot of luck if they are not, in fact, related.

I would be very very careful before taking this interpretation. It is always possible to inflate your ##R^2## by taking enough variables. So your large ##R^2## might be due to overfitting, and not due to an actual relation.
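The overfitting warning above can be checked directly. This pure-Python sketch (helper names and the random data are hypothetical) fits ordinary least squares to a response that is pure noise, adding one pure-noise predictor at a time. For nested least-squares models ##R^2## cannot decrease when a column is added, and once there are as many coefficients as data points the fit is exact, so ##R^2## reaches 1 even though nothing is related.

```python
import random


def solve(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def ols_r_squared(columns, y):
    """R^2 of an OLS fit of y on an intercept plus the given columns (via normal equations)."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    xtx = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in range(k)] for a in range(k)]
    xty = [sum(X[i][a] * y[i] for i in range(n)) for a in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(X[i][a] * beta[a] for a in range(k)) for i in range(n)]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


rng = random.Random(42)
n = 8
y = [rng.gauss(0, 1) for _ in range(n)]  # response: pure noise
noise_cols = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(n - 1)]

# R^2 with 1, 2, ..., n-1 unrelated predictors (plus an intercept).
r2_path = [ols_r_squared(noise_cols[:k], y) for k in range(1, n)]
for k, r2 in enumerate(r2_path, start=1):
    print(f"{k} noise predictors: R^2 = {r2:.3f}")
```

The last line of output is an ##R^2## of essentially 1 from variables that are independent by construction, which is exactly why a large ##R^2## needs "enough data" relative to the number of variables.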

FactChecker
micromass said:
I would be very very careful before taking this interpretation. It is always possible to inflate your ##R^2## by taking enough variables. So your large ##R^2## might be due to overfitting, and not due to an actual relation.
I agree about the danger, especially if there are a lot of variables and not a lot of data. With all applied math, you have to be careful. But it is the only interpretation to take and is fundamental to a lot of statistics. Algorithms will often help to select a small subset of variables that gives a statistically significant ##R^2##.


## 1. What is a correlation coefficient?

A correlation coefficient (most commonly Pearson's r) is a statistical measure of the strength and direction of the linear relationship between two variables. It takes values between -1 and 1: a value of 0 indicates no linear relationship, a positive value indicates that the variables tend to increase together, and a negative value indicates that one tends to decrease as the other increases.

## 2. How is a correlation coefficient calculated?

The Pearson correlation coefficient is calculated by dividing the covariance of the two variables by the product of their standard deviations. The result always lies between -1 and 1, and a larger absolute value indicates a stronger linear relationship between the variables.
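The definition translates directly into code. This is a minimal plain-Python sketch; the function name `pearson_r` and the sample data are illustrative. The last example shows that r can be exactly 0 even when y is completely determined by x, because r only measures linear association.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation: covariance divided by the product of standard deviations."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    # The 1/n (or 1/(n-1)) factors cancel between numerator and denominator,
    # so plain sums of products and squares are enough.
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    # sqrt(sxx * syy) is the product of the two (unnormalized) standard deviations.
    return cov / math.sqrt(sxx * syy)


print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))          # 1.0: perfect positive line
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))          # -1.0: perfect negative line
print(pearson_r([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4]))  # 0.0: y = x**2, yet r is 0
```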

## 3. Does a correlation coefficient represent causation?

No, a correlation coefficient does not represent causation. It only measures the strength and direction of the relationship between two variables. It is possible for two variables to have a strong correlation, but not have a causal relationship.

## 4. Can a correlation coefficient be used to predict future outcomes?

A correlation coefficient on its own does not make predictions: it summarizes how strongly two variables moved together in the observed data. A regression model built on that relationship can produce predictions, but they are only reliable if the relationship continues to hold for new data, and other factors and variables may change the relationship or drive future outcomes.
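As a sketch of how a correlation feeds into prediction, assuming a simple least-squares line (the function `fit_line` and the numbers are hypothetical): the slope of the regression of y on x equals r times the ratio of the standard deviations of y and x, so the fitted line is only as trustworthy as the relationship it was estimated from.

```python
import math


def fit_line(xs, ys):
    """Least-squares line y = a + b*x, with the slope written as r * (s_y / s_x)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    r = sxy / math.sqrt(sxx * syy)
    slope = r * math.sqrt(syy / sxx)  # algebraically equal to sxy / sxx
    intercept = my - slope * mx
    return intercept, slope, r


xs = [1, 2, 3, 4, 5]
ys = [1.2, 1.9, 3.2, 3.8, 5.1]
a, b, r = fit_line(xs, ys)
# Predicting at x = 10 extrapolates well beyond the observed range (1..5):
# the arithmetic works, but nothing in r guarantees the pattern continues there.
print(f"r = {r:.3f}, prediction at x=10: {a + b * 10:.2f}")
```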

## 5. Does a correlation coefficient represent probability?

No, a correlation coefficient does not represent probability. It measures the strength and direction of the relationship between two variables, not the likelihood of a certain outcome occurring. A probability lies between 0 and 1, while a correlation coefficient lies between -1 and 1. Its square, r-squared, does lie between 0 and 1, but it is the proportion of the variance in one variable explained by the other, not a probability.
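Because r-squared also falls between 0 and 1, it is easy to mistake for a probability. This short sketch (hypothetical data, plain Python) computes it two independent ways for a simple linear fit: as the square of r, and as 1 minus the residual sum of squares over the total sum of squares. Both give the same number, and that number is a share of variance, not the chance of anything.

```python
import math


def r_and_r_squared(xs, ys):
    """Return Pearson r and the R^2 of the least-squares line, computed independently."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    r = sxy / math.sqrt(sxx * syy)

    # Fit the line, then measure how much of the spread in y it leaves unexplained.
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    r_squared = 1.0 - ss_res / syy  # share of the variance in y explained by the line
    return r, r_squared


r, r2 = r_and_r_squared([1, 2, 3, 4, 5, 6], [3, 1, 4, 1, 5, 9])
print(f"r = {r:.3f}, r^2 = {r**2:.3f}, R^2 from residuals = {r2:.3f}")
```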
