A Joint Used to Show Lack of Correlation?

  • Thread starter WWGD

WWGD

Hi All,
I think I have some idea of how to interpret covariance and correlation. But some doubts remain:
1) What joint distribution do we assume? An example of uncorrelated variables is that of points on a circle, i.e., the variables ##X## and ##Y=\sqrt{1-X^2}## are uncorrelated -- they have ##Cov(X,Y)=0##.

##Cov(X,Y) = E(XY) - \mu_X \mu_Y##. Now, each of these terms assumes a distribution: marginals for ##X## and ##Y##, and a joint for ##(X,Y)##. But I have never seen any mention of either after searching.

2) Is there a way of "going backwards" and deciding which joints/marginals would create uncorrelated variables, i.e., can we find all ##f_{XY}(x,y)## so that:

## \sum_i x_i y_i \, f_{XY}(x_i,y_i) - \sum_i x_i f_X(x_i) \, \sum_j y_j f_Y(y_j) = 0 ##

or, in the continuous case:

## \iint xy \, f_{XY}(x,y) \, dx \, dy - \int x f_X(x) \, dx \int y f_Y(y) \, dy = 0 ## ?






3) In what sense is correlation a measure of linear dependence? I don't see where/how this follows from the formulas.



Thanks.
 

BvU

An example of uncorrelated variables is that of points on a circle, i.e., the variables ##X## and ##Y=\sqrt{1-X^2}## are uncorrelated
Hogwash. 'uncorrelated' means you know nothing about y when given x

after searching
Where? Behind the refrigerator?
 

WWGD

Hogwash. 'uncorrelated' means you know nothing about y when given x
I thought that was independence. AFAIK uncorrelated, implying ##Cov(X,Y)=0##, means there is no "clear pattern of change of Y when X changes". EDIT: At any rate, ##Cov(X,Y)= E(XY)-\mu_X \mu_Y##. In order to determine when this equals 0 we must know the joint for ##(X,Y)##. From the joint we can find the marginals ##f_X, f_Y##.

Besides, I have seen enough claims that, e.g., points on a parabola, i.e., the pair ##(X,X^2)##, are uncorrelated, though clearly we know _everything_ about ##Y=X^2## when we know ##X##.
 

BvU

Seems to me we are speaking different languages here. To me there is a very clear pattern of change of ##y=x^2## when ##x## changes.
 

WWGD

Seems to me we are speaking different languages here. To me there is a very clear pattern of change of ##y=x^2## when ##x## changes.
I think it has to do with the fact that ##(X-\mu_X)(Y- \mu_Y)## is alternately positive and negative, i.e., neither overwhelmingly positive nor negative. So there is no clear sense in which ##Y## increases with ##X## or decreases with ##X##, but I am still trying to get a better feel for the term. EDIT: In a more formal sense (which I don't fully get yet), ##Cov(X,Y)## is a quadratic form and we can show ##Cov(X,Y)= -Cov(X,Y)##, forcing it to equal 0.

EDIT2: Here is a screenshot of a covariance calculator for the pair ##(X, Y=X^2)## where the correlation is 0. Data points X = (1, 2, 3, 4, 5, -1, -2, -3, -4, -5) and Y = (1, 4, 9, 16, 25, 1, 4, 9, 16, 25).

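For reference, here is a minimal Python check of those data points (my own sketch, not from the calculator; it uses the population convention of dividing by N, though the covariance comes out 0 either way):

```python
# Data from the covariance calculator example above: y = x^2 on a symmetric set of x values.
xs = [1, 2, 3, 4, 5, -1, -2, -3, -4, -5]
ys = [x**2 for x in xs]              # (1, 4, 9, 16, 25, 1, 4, 9, 16, 25)

n = len(xs)
mean_x = sum(xs) / n                 # 0.0
mean_y = sum(ys) / n                 # 11.0
cov_xy = sum(x * y for x, y in zip(xs, ys)) / n - mean_x * mean_y

print(mean_x, mean_y, cov_xy)        # 0.0 11.0 0.0
```

The covariance vanishes because the x values are symmetric about 0, so every product ##x \cdot x^2 = x^3## is cancelled by its mirror image.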
 

Stephen Tashi

AFAIK uncorrelated, implying ##Cov(X,Y)=0##, means there is no "clear pattern of change of Y when X changes".
That is not a good intuition. Suppose (X,Y) jointly take on the following values, each with probability 1/5: (-2,-2), (-1,-1), (0,0), (1,1), (2, -3).

##\mu_X = 0 ##
##\mu_Y = -1##
## E(X,Y) = 0##
##COV(X,Y) = 0##

In this case, knowing ##X## completely determines ##Y##.
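A quick numerical check of this example (my own sketch, not part of the original post), which also confirms that the variables are not independent even though the covariance is zero:

```python
from collections import Counter

# Joint pmf: each pair occurs with probability 1/5.
pairs = [(-2, -2), (-1, -1), (0, 0), (1, 1), (2, -3)]
p = 1 / len(pairs)

mu_x = sum(x * p for x, _ in pairs)       # 0.0
mu_y = sum(y * p for _, y in pairs)       # -1.0
e_xy = sum(x * y * p for x, y in pairs)   # 0.0
cov = e_xy - mu_x * mu_y                  # 0.0

# Marginals, to test independence.
px, py = Counter(), Counter()
for x, y in pairs:
    px[x] += p
    py[y] += p

print(cov)                 # 0.0
print(p, px[-2] * py[-2])  # P(X=-2, Y=-2) = 0.2, but P(X=-2)P(Y=-2) = 0.04 -> not independent
```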

In the above example, ##X## and ##Y## are uncorrelated, but they are certainly not independent: zero correlation does not rule out a deterministic relationship (a "clear pattern") relating two random variables. Independence is the stronger condition; this is one example of two random variables that are uncorrelated but not independent.

A better intuition about ##COV(X,Y)## is that it has to do with approximating the relationship between ##X## and ##Y## by a linear equation.

3) In what sense is correlation a measure of linear dependence? I don't see where/how this follows from the formulas.
Look at the formulas for doing linear regression. They indicate that correlation has something to do with least squares linear approximation.
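For reference, a sketch of the standard identities behind this. For the least squares line ##\hat y = \hat a x + \hat b##,

## \hat a = \frac{Cov(X,Y)}{Var(X)} = \rho_{XY} \, \frac{\sigma_Y}{\sigma_X}, \qquad \rho_{XY} = \frac{Cov(X,Y)}{\sigma_X \sigma_Y}. ##

So ##\rho_{XY}=0## exactly when the best linear predictor of ##Y## from ##X## has slope zero, and ##\rho_{XY}^2## is the fraction of ##Var(Y)## explained by that linear predictor.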

2) Is there a way of "going backwards" and deciding which joints/marginals would create uncorrelated variables,
Going backwards from what starting point? It's an interesting line of thought, but the task "List all bivariate distributions whose variables are uncorrelated" is too general. More restrictive questions would be better - questions about particular families of distributions, or questions about how to take two bivariate distributions and use them to form a third bivariate distribution whose variables are uncorrelated.
 

WWGD

That is not a good intuition. Suppose (X,Y) jointly take on the following values, each with probability 1/5: (-2,-2), (-1,-1), (0,0), (1,1), (2, -3).

##\mu_X = 0 ##
##\mu_Y = -1##
## E(X,Y) = 0##
##COV(X,Y) = 0##

In this case, knowing ##X## completely determines ##Y##.

In the above example, ##X## and ##Y## are uncorrelated, but they are certainly not independent: zero correlation does not rule out a deterministic relationship (a "clear pattern") relating two random variables. Independence is the stronger condition; this is one example of two random variables that are uncorrelated but not independent.

A better intuition about ##COV(X,Y)## is that it has to do with approximating the relationship between ##X## and ##Y## by a linear equation.


Look at the formulas for doing linear regression. They indicate that correlation has something to do with least squares linear approximation.


Going backwards from what starting point? It's an interesting line of thought, but the task "List all bivariate distributions whose variables are uncorrelated" is too general. More restrictive questions would be better - questions about particular families of distributions, or questions about how to take two bivariate distributions and use them to form a third bivariate distribution whose variables are uncorrelated.
Are you assuming E(X,Y)=E(XY)?
Thanks. Yes, I didn't make myself very clear. I meant that the expression ##(X-\mu_X)(Y- \mu_Y)## is neither "overwhelmingly" positive nor negative, so we can neither say with accuracy that ##Y## increases with ##X## nor that ##Y## decreases with ##X##. In this sense they do not have a clear pattern of changing together -- co-varying. Still trying to pin down the concept more clearly.
 

WWGD

Still, my initial question is: What joint are we assuming for a pair (X,Y) when we say they are uncorrelated? It seems strange when I read these statements without seeing a mention of a joint.
 

Stephen Tashi

Are you assuming E(X,Y)=E(XY)?
Yes. It's a typo. It should be ##E(XY)##.

I meant that the expression ##(X-\mu_X)(Y- \mu_Y)## is neither "overwhelmingly" positive nor negative, so we can neither say with accuracy that ##Y## increases with ##X## nor that ##Y## decreases with ##X##. In this sense they do not have a clear pattern of changing together -- co-varying. Still trying to pin down the concept more clearly.
For the purpose of understanding the mathematics, it's best not to make this kind of qualitative interpretation of covariance. In the example of the previous post, a qualitative evaluation might say that ##Y## does tend to increase as ##X## increases. For the purpose of understanding a typical presentation where someone is presenting statistics, that kind of qualitative interpretation is often OK.
 

WWGD

Yes. It's a typo. It should be ##E(XY)##.



For the purpose of understanding the mathematics, it's best not to make this kind of qualitative interpretation of covariance. In the example of the previous post, a qualitative evaluation might say that ##Y## does tend to increase as ##X## increases. For the purpose of understanding a typical presentation where someone is presenting statistics, that kind of qualitative interpretation is often OK.
Thank you Stephen. But the regression aspect makes it more confusing to me. We have two major cases: (X,Y) both random variables, and (X,Y): X is a mathematical variable and Y is random. I guess we speak about correlation only in the first case?
 

Stephen Tashi

Still, my initial question is: What joint are we assuming for a pair (X,Y) when we say they are uncorrelated? It seems strange when I read these statements without seeing a mention of a joint.
The fact that two random variables are uncorrelated does not imply that they have a particular joint distribution. The technique of linear least squares regression is applied to data, not to distributions, but imagine you have lots of data, so that the plot of your data resembles the joint probability density of two random variables. If the random variables are uncorrelated and you did a least squares linear regression between them, the slope of the regression line would be zero. There are many different "clouds" of data points that can produce a zero-slope regression line. For example, the shape of the data cloud does not have to be symmetrical about a horizontal line. Many points above a horizontal line could be "canceled out" by a few points far below it. Correlation (in the mathematical sense of "correlation coefficient") is a quantitative concept.
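A small simulation sketch of this point (the sample size and the particular cloud are my own choices, and numpy is assumed): with ##X## uniform on ##[-1,1]## and ##Y = X^2##, the cloud is not at all symmetric about a horizontal line, and ##Y## is a deterministic function of ##X##, yet the fitted slope and the sample correlation are essentially zero.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=100_000)
y = x**2                          # deterministic, clearly not independent of x

slope, intercept = np.polyfit(x, y, deg=1)
print(slope)                      # ~0 (the population value is exactly 0)
print(np.corrcoef(x, y)[0, 1])    # ~0 as well
```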
 

Stephen Tashi

We have two major cases: (X,Y) both random variables, and (X,Y): X is a mathematical variable and Y is random. I guess we speak about correlation only in the first case?
That's a good point. Yes, we should only speak of mathematical correlation in the case where ##X## and ##Y## are both random variables. However, the properties of random variables are estimated from data, so people say things like "The standard deviation was 25.3" when they mean that 25.3 is a number computed from some data that is used to estimate the standard deviation of a probability distribution. Likewise, we can talk about ##COV(X,Y)## as being a property of a joint probability distribution, or we can say things like ##COV(X,Y) = 3.20## when we are talking about estimators computed from data.

In the scenario for linear least squares regression ##y = ax + b##, both ##x## and ##y## are mathematical variables. The data used has the form ##(x_i,y_i)## where ##y_i## is assumed to be a realization of a random variable ##Y## that has the form ##Y = ax_i + b + E## where ##E## is a random variable.

To relate linear least squares regression to a bivariate distribution, you have to imagine taking samples ##(x_i,y_i)## from that distribution and doing a regression on that data. So you wouldn't generate data by picking values of ##x_i## in some systematic manner such as taking an equal number of measurements of ##y## when ##x = 1,2,3,...##.
 

StoneTemplePython

3) In what sense is correlation a measure of linear dependence? I don't see where/how this follows from the formulas.
1st: zero-mean random variables form a vector space. 2nd: changing the mean (by addition of a constant) doesn't change the computed covariance. So assume WLOG that you are dealing with zero-mean random variables.

Now, supposing your random variables have finite variance, apply Cauchy-Schwarz to ##E\big[XY\big]##, or look at the 2x2 covariance matrix for ##(X,Y)##. This is a Gram matrix...
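Filling in that sketch with the standard facts (stated here without full proof): for zero-mean ##X, Y## with finite variance, ##\langle X,Y\rangle := E[XY]## is an inner product on the space of such variables (identifying variables that agree almost surely), and Cauchy-Schwarz gives

## |E[XY]| \le \sqrt{E[X^2]\,E[Y^2]}, \quad \text{i.e.} \quad |Cov(X,Y)| \le \sigma_X \sigma_Y, \quad \text{so} \quad -1 \le \rho_{XY} \le 1, ##

with ##|\rho_{XY}| = 1## exactly when ##Y - \mu_Y = c(X - \mu_X)## almost surely for some constant ##c##. That is the precise sense in which ##|\rho|## measures linear dependence.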
 

WWGD

1st: zero-mean random variables form a vector space. 2nd: changing the mean (by addition of a constant) doesn't change the computed covariance. So assume WLOG that you are dealing with zero-mean random variables.

Now, supposing your random variables have finite variance, apply Cauchy-Schwarz to ##E\big[XY\big]##, or look at the 2x2 covariance matrix for ##(X,Y)##. This is a Gram matrix...
I know there was an approach using quadratic forms, possibly similar to this. So you mean we can obtain the result without knowing the actual joint? So I guess E(XY) is an inner-product? Ah, yes, I am remembering the probability subsection of the Cauchy-Schwarz section in Wiki.
 

StoneTemplePython

So I guess E(XY) is an inner-product? Ah, yes, I am remembering the probability subsection of the Cauchy-Schwarz section in Wiki.
run with this for a bit...

So you mean we can obtain the result without knowing the actual joint?
This seems like a vague question. One way or another, to directly compute ##E\big[XY\big]## you need a joint distribution.

But depending on what you want out of this, linear algebra is still something to consider -- you could have two independent random variables (consider them a random vector ##\mathbf x##, zero mean for convenience, i.e. ##E\big[\mathbf x\big] = \mathbf 0##).

The covariance matrix is then diagonal, ##E\big[\mathbf{xx}^T \big] = \Lambda##. But you could multiply by an orthogonal matrix ##\mathbf U## to get the random vector ##\big(\mathbf {Ux}\big)## with covariance matrix

##E\big[\mathbf U\mathbf{xx}^T \mathbf U^T \big] = \mathbf U E\big[\mathbf{xx}^T\big] \mathbf U^T = \mathbf U\Lambda\mathbf U^T = \Sigma##

which in general is not diagonal, and hence the random vector ##\big(\mathbf {Ux}\big)## has correlated components, though you never had to get into the weeds of the distributions.

Going through these manipulations is most productive and sharpest with the very important special case of a multivariate Gaussian ##\mathbf x##, where zero covariance is actually the same thing as independence.
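A minimal numerical sketch of this (the particular variances and rotation angle are my own choices, and numpy is assumed):

```python
import numpy as np

# Two independent, zero-mean components with variances 1 and 4.
Lam = np.diag([1.0, 4.0])

# An orthogonal matrix: rotation by 30 degrees.
theta = np.pi / 6
U = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

# Covariance of the rotated vector Ux is U * Lambda * U^T.
Sigma = U @ Lam @ U.T
print(Sigma)
# [[ 1.75       -1.29903811]
#  [-1.29903811  3.25      ]]   -> nonzero off-diagonal: the components of Ux are correlated
```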
 

WWGD

run with this for a bit...


This seems like a vague question. One way or another, to directly compute ##E\big[XY\big]## you need a joint distribution.
I read a comment to the effect that one can show that ##E[XY] := \langle X,Y \rangle##, as an inner product or quadratic form, equals its own negative. I am trying to see why/how. But this is an area where I am rusty, so sorry if I am being dense with this.
 

StoneTemplePython

I read a comment to the effect that one can show that ##E[XY] := \langle X,Y \rangle##, as an inner product or quadratic form, equals its own negative.
I don't know what this means. Since inner products are (bi)linear you should immediately question comments like this. From what I can tell you're saying

##E\big[XY\big] = E\big[-XY\big] = -E\big[XY\big]##
where the second equality follows by linearity of expectation. But this implies ##E\big[XY\big]=0##, which of course isn't true in general.

Your statement also seems to contradict the fact that every n x n real symmetric positive (semi)definite matrix is a covariance matrix (for a multivariate Gaussian), and every covariance matrix (where 2nd moments exist) is an n x n real symmetric positive (semi)definite matrix.
 

FactChecker

Still, my initial question is: What joint are we assuming for a pair (X,Y) when we say they are uncorrelated? It seems strange when I read these statements without seeing a mention of a joint.
This is a property that a joint distribution may or may not have. There is no need to specify a particular joint distribution. It is like saying that f(x) = f(-x) defines the property of an even function without specifying any particular function.
 

WWGD

This is a property that a joint distribution may or may not have. There is no need to specify a particular joint distribution. It is like saying that f(x) = f(-x) defines the property of an even function without specifying any particular function.
I am not sure I get your point. Do you mean that being uncorrelated depends on the joint? Yes, of course. But when I see the claim that two variables are uncorrelated, I wonder which joint is being assumed.
 

FactChecker

There is no need to specify any specific distribution in the definition of "uncorrelated". Of course, when one talks about any particular pair of random variables, X and Y, there is a joint distribution for those variables. That will be the one that applies when one talks about the correlation between X and Y.
 

WWGD

There is no need to specify any specific distribution in the definition of "uncorrelated". Of course, when one talks about any particular pair of random variables, X and Y, there is a joint distribution for those variables. That will be the one that applies when one talks about the correlation between X and Y.
Yes, I understand that, but I am trying to test that, e.g., the pair ##(X,X^2)## is uncorrelated. How would I go about it? Same for points on a circle (say the unit circle): ##(X, \sqrt{1-X^2})##. How would I show it then?
 

FactChecker

Yes, I understand that, but I am trying to test that, e.g. the pair (X,X^2) is uncorrelated. How would I go about it?
Your question is not well defined (unless I have missed something). It is up to you to specify what distributions you are working with. If X is uniformly distributed on the interval [2,3], then X and X^2 are correlated. If it is uniformly distributed on the interval [-1,1], then X and X^2 are uncorrelated.

A simpler example of the two cases is:
X = 2 or 3 with equal probability 1/2 (correlated)
X = -1 or 1 with equal probability 1/2 (uncorrelated)
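A quick numerical check of the two-point cases (a sketch of my own, computing ##Cov(X, X^2)## directly from the definition):

```python
def cov_x_x2(values, probs):
    """Cov(X, X^2) = E(X^3) - E(X) E(X^2) for a discrete X."""
    e_x  = sum(v * p for v, p in zip(values, probs))
    e_x2 = sum(v**2 * p for v, p in zip(values, probs))
    e_x3 = sum(v**3 * p for v, p in zip(values, probs))
    return e_x3 - e_x * e_x2

print(cov_x_x2([2, 3],  [0.5, 0.5]))   # 1.25 -> correlated
print(cov_x_x2([-1, 1], [0.5, 0.5]))   # 0.0  -> uncorrelated
```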
 

WWGD

Science Advisor
Gold Member
4,203
1,758
Your question is not well defined (unless I have missed something). It is up to you to specify what distributions you are working with. If X is uniformly distributed on the interval [2,3], then X and X^2 are correlated. If it is uniformly distributed on the interval [-1,1], then X and X^2 are uncorrelated.

A simpler example of the two cases is:
X = 2 or 3 with equal probability 1/2 (correlated)
X = -1 or 1 with equal probability 1/2 (uncorrelated)
Yes, I understand; this is almost tautological. No pair (X,Y) is "intrinsically" correlated or uncorrelated. But I am _given_ that they are uncorrelated, without any mention of a joint. This means a joint is used _implicitly_, and I am trying to make this assumption _explicit_.
I think we are not understanding each other. I am _given/told_ that the two are uncorrelated. This is stated as a fact, without mentioning any underlying joint. So a joint is assumed; I want to make that assumption explicit.
 

Stephen Tashi

Yes, I understand that, but I am trying to test that, e.g. the pair (X,X^2) is uncorrelated. How would I go about it?
In such a case, I see why defining a joint distribution presents a technical problem. The commonly encountered bivariate density is a function ##j(x,y)## that integrates to 1 over some area (finite or infinite) in 2D space. To define a joint density for a set of points of the form ##(x,x^2)## brings up the problem of defining a function ##j(x,y)## that integrates to 1 over a curve (or line segment) in 2D space. Ordinary 2D Riemann integration gives an answer of zero when we integrate over such a set.

I think we can appeal to a more advanced form of integration and solve that technical problem, but we can also sidestep the question of a joint density. To compute the expected value of a function ##g(X)## of a random variable we only need the density ##f(x)## for ##X##: ##E(g(X)) = \int g(x) f(x) dx##. The question of whether ##X## is correlated with ##X^2## only requires computing ##E(X)##, ##E(X^2)## and ##E((X)(X^2)) = E(X^3)##. Those expectations are functions of ##X##, so they can be computed using only the 1D density function for ##X##.
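For instance, a worked sketch of the two uniform cases mentioned earlier, using only the 1D density of ##X## and ##Cov(X,X^2) = E(X^3) - E(X)E(X^2)##:

## X \sim U[-1,1]: \quad E(X) = 0, \; E(X^3) = 0 \;\Rightarrow\; Cov(X,X^2) = 0, ##

## X \sim U[2,3]: \quad E(X) = \tfrac{5}{2}, \; E(X^2) = \tfrac{19}{3}, \; E(X^3) = \tfrac{65}{4} \;\Rightarrow\; Cov(X,X^2) = \tfrac{65}{4} - \tfrac{5}{2}\cdot\tfrac{19}{3} = \tfrac{5}{12}, ##

which matches the earlier observation that the pair is uncorrelated on the symmetric interval but correlated on ##[2,3]##.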

It would be an interesting exercise in abstract mathematics to say the correct words for defining a joint density for ##(X,X^2)## in 2D and to use that definition to show that computation using the joint density is equivalent to taking the 1D view of things. However, I don't know if that interests you - or whether I could do it.
 

WWGD

Essentially, I am trying to solve:

##E[XY]-\mu_X\mu_Y=0##, i.e., ##\iint xy \, f_{XY}(x,y) \, dx \, dy - \int x f_X(x) \, dx \int y f_Y(y) \, dy = 0##, for ##f_{XY}##.
 
