Where did the correlation formula come from?

CuriousBanker · Mar 7, 2014

So I am still studying for CFA level one, and it has been years since I took a statistics course.

Anyway, the formula is Correlation=Covariance(x,y)/(standard deviation of x times the standard deviation of y)

It is easy enough to calculate...but where did the formula come from? As in, how was it derived? Also, how do we know it always stays between -1 and +1?

I am just curious, because although variance, standard deviation and covariance are all intuitive to me, and although the concept of correlation is very easy to grasp, for some reason the formula is not making intuitive sense to me.

Thanks in advance.

micromass · Mar 7, 2014

If you're ok with covariance, then correlation is not so difficult. It's basically just a normalization of the covariance formula. That is, it's just the covariance, but we made sure that it always stays between -1 and 1.

A correlation of 0 indicates no linear relationship (so independent variables have zero correlation, although the converse is false).
A correlation of 1 indicates a linear relationship between x and y with positive slope. So you know (almost surely) that if x is large then y will be large too.
A correlation of -1 indicates a linear relationship between x and y with negative slope. So you know (almost surely) that if x is large, then y will be small.

The correlation then indicates how close you are to these three situations. So if a correlation is .95, then you're very close to a linear relationship. So if x is large, then you're pretty sure that y is large too.

As for why the correlation is between 1 and -1, I think you should just look up the proof in a probability/statistics book. In more mathy texts, they will just say it follows from the Cauchy-Bunyakovsky-Schwarz inequality though. In fact, if you ever studied dot products in geometry, then you have without a doubt seen the formula

[tex]x\cdot y = \|x\|\|y\|\cos\theta[/tex]

Rearranging gives you

[tex]\cos\theta =\frac{x\cdot y}{\|x\|\|y\|}[/tex]

Now, if you interpret the covariance as a dot product and the standard deviations as the norms, then you can interpret the correlation as the cosine of an angle. Please do ignore if this last part makes no sense to you at all. It's not so important, but I think it's a neat interpretation.
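To make the cosine interpretation concrete, here is a minimal sketch in plain Python. The function names and the sample data are made up for illustration. It treats the mean-centered data as two vectors, computes the cosine of the angle between them, and compares that with covariance divided by the product of the standard deviations.

```python
import math

def correlation_as_cosine(xs, ys):
    """Cosine of the angle between the mean-centered data vectors."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dx = [x - mean_x for x in xs]          # centered x vector
    dy = [y - mean_y for y in ys]          # centered y vector
    dot = sum(a * b for a, b in zip(dx, dy))
    norm_x = math.sqrt(sum(a * a for a in dx))
    norm_y = math.sqrt(sum(b * b for b in dy))
    return dot / (norm_x * norm_y)

def correlation_from_formula(xs, ys):
    """Sample covariance divided by the product of sample standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    return cov / (sd_x * sd_y)

# Hypothetical sample data, chosen only for illustration.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

r_cosine = correlation_as_cosine(xs, ys)
r_formula = correlation_from_formula(xs, ys)
print(r_cosine, r_formula)
```

The two numbers agree because the 1/(n-1) factors in the covariance and in the standard deviations cancel, leaving exactly the dot product over the product of the norms. A cosine can never leave the range from -1 to 1, which is the geometric reason the correlation cannot either.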

CuriousBanker · Mar 7, 2014

Micro, thanks for always helping.

I already understood what it all meant...and why close to 1 means strongly linear, etc.

The rest, I don't understand at all. I'm thinking maybe I should just memorize the formula for now, and save the deep understanding for when I take my probability/stats classes in a couple of years. What do you think?

Also, besides the proof that it always stays in between -1 and +1, why are we dividing by the product of the standard deviations? Or is that something else I should just memorize now and understand later?

That's the one thing I hate about these kinds of licenses/exams...to me, this stuff is meaningless without the proofs and explanations...but whatever, if it's what firms want to see, I'll just do it.

micromass · Mar 7, 2014

CuriousBanker said:

Micro, thanks for always helping.

I already understood what it all meant...and why close to 1 means strongly linear, etc.

The rest, I don't understand at all. I'm thinking maybe I should just memorize the formula for now, and save the deep understanding for when I take my probability/stats classes in a couple of years. What do you think?

To be honest, I don't really think there is any deep understanding involved here. The only stuff that might be a bit deep is the connection with inner-product space in linear algebra, and that's not even such an important thing. So yes, just memorize the formula, but I think you already understand it fine. I can't blame you for thinking there is some deep stuff going on here that you don't understand at the moment, and often this will be the case in mathematics, but not here.

Also, besides the proof that it always stays in between -1 and +1, why are we dividing by the product of the standard deviations? Or is that something else I should just memorize now and understand later?

We are dividing precisely so that it stays between -1 and +1. Another reason for me to divide by them is the analogy with the dot product. I don't think there are any other reasons.
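For anyone who wants to see why dividing by the standard deviations forces the result into the range from -1 to +1, here is a short sketch. It assumes both standard deviations are finite and nonzero, and the symbols [tex]\mu[/tex], [tex]\sigma[/tex] and [tex]\rho[/tex] are notation introduced here rather than taken from the posts above. First standardize each variable, so that each has variance 1 and their covariance is exactly the correlation:

[tex]\tilde{x} = \frac{x-\mu_x}{\sigma_x},\quad \tilde{y} = \frac{y-\mu_y}{\sigma_y},\quad \operatorname{Var}(\tilde{x}) = \operatorname{Var}(\tilde{y}) = 1,\quad \operatorname{Cov}(\tilde{x},\tilde{y}) = \frac{\operatorname{Cov}(x,y)}{\sigma_x\sigma_y} = \rho[/tex]

Then use the fact that a variance can never be negative:

[tex]0 \le \operatorname{Var}(\tilde{x} \pm \tilde{y}) = \operatorname{Var}(\tilde{x}) + \operatorname{Var}(\tilde{y}) \pm 2\operatorname{Cov}(\tilde{x},\tilde{y}) = 2 \pm 2\rho[/tex]

Taking the plus sign gives [tex]\rho \ge -1[/tex] and taking the minus sign gives [tex]\rho \le 1[/tex]. This is essentially the Cauchy-Bunyakovsky-Schwarz argument written out for this one case.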

Stephen Tashi · Mar 8, 2014

When a line is fit to data by linear regression using "least squares" (as opposed to "total least squares"), the method assumes there can be error in the Y measurements but no error in X. For this reason the regression line of Y as a function of X is usually not the same line you would get from a regression of X as a function of Y (even if you rotate the graph; i.e., the predicted value of Y given an X by one method need not agree with the predicted value of Y using the other). You could view the correlation coefficient as an attempt to say something about the linear relation of X and Y without committing to which has errors in measurement.
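The asymmetry between the two regressions can be checked numerically. The sketch below uses plain Python, with made-up data and hypothetical helper names. It fits the least-squares slope of Y on X and of X on Y, and shows that the two lines differ while the product of the two slopes equals the squared correlation, a standard identity that treats X and Y symmetrically.

```python
import math

def ls_slope(xs, ys):
    """Least-squares slope when ys is regressed on xs (error assumed in ys only)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

def correlation(xs, ys):
    """Covariance over the product of standard deviations (the 1/(n-1) factors cancel)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical noisy data, chosen only for illustration.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 1.0, 4.0, 3.0, 6.0]

slope_y_on_x = ls_slope(xs, ys)   # dy/dx from regressing Y on X
slope_x_on_y = ls_slope(ys, xs)   # dx/dy from regressing X on Y
r = correlation(xs, ys)

# If both regressions gave the same line, slope_y_on_x would equal 1 / slope_x_on_y.
print(slope_y_on_x, 1.0 / slope_x_on_y)
# The product of the two slopes is r squared, whichever variable is called "X".
print(slope_y_on_x * slope_x_on_y, r ** 2)
```

The two lines coincide only when the correlation is exactly +1 or -1, that is, when the points already lie on a single straight line.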

FactChecker · Mar 13, 2014

Some basic cases should make you comfortable with the equation. Look at the equation in these examples of completely correlated variables: cor(x,x)=1, cor(x,-x)=-1. With A positive, cor(x,A*x)=1, cor(x,-A*x)=-1, and cor(x, A*x+B)=1. At the other extreme of completely independent variables, x and y, cor(x,y)=0.
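These basic cases are easy to verify numerically. The sketch below is plain Python; the data and the values of A and B are arbitrary choices for illustration. It also includes one case in which y is completely determined by x and yet the correlation is 0, which shows that zero correlation does not imply independence.

```python
import math

def cor(xs, ys):
    """Covariance divided by the product of standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Arbitrary illustrative values; A must be positive.
x = [-2.0, -1.0, 0.0, 1.0, 2.0]
A, B = 3.0, 7.0

cases = {
    "cor(x, x)": cor(x, x),
    "cor(x, -x)": cor(x, [-v for v in x]),
    "cor(x, A*x)": cor(x, [A * v for v in x]),
    "cor(x, -A*x)": cor(x, [-A * v for v in x]),
    "cor(x, A*x+B)": cor(x, [A * v + B for v in x]),
    # y depends entirely on x, but not linearly, and x is symmetric about 0.
    "cor(x, x^2)": cor(x, [v * v for v in x]),
}

for name, value in cases.items():
    print(name, value)
```

Scaling by a positive A or shifting by B changes the covariance and the standard deviations by the same factor, so the ratio is unchanged. Flipping the sign flips only the covariance.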
