Joint Used to Show Lack of Correlation?

In summary, the conversation discusses the concepts of covariance and correlation and their relationship to joint and marginal distributions. The idea of uncorrelated variables is also explored, with the understanding that uncorrelated does not necessarily mean independent. The conversation also delves into the intuition behind correlation as a measure of linear dependence and the possibility of determining which joints and marginals would create uncorrelated variables.
  • #1
WWGD
Hi All,
I think I have some idea of how to interpret covariance and correlation. But some doubts remain:
1) What joint distribution do we assume? An example of uncorrelated variables is that of points on a circle, i.e., the variables ##X## and ##Y=\sqrt{1-X^2}## are uncorrelated -- they have ##Cov(X,Y)=0##.

##Cov(X,Y) = E(XY) - \mu_X \mu_Y##. Now, each of these terms assumes a distribution: the marginals for ##X## and ##Y##, and a joint for ##(X,Y)##. But I have never seen any mention of either after searching.

2) Is there a way of "going backwards" and deciding which joints/marginals would create uncorrelated variables, i.e., can we find all ##f_{XY}(x,y)## so that:

##\sum_i \sum_j x_i y_j\, f_{XY}(x_i,y_j) - \Big(\sum_i x_i f_X(x_i)\Big)\Big(\sum_j y_j f_Y(y_j)\Big) = 0##

or, in the continuous case:

##\int xy\, f_{XY}(x,y)\,dx\,dy - \int x f_X(x)\,dx \int y f_Y(y)\,dy = 0##? (A numerical sketch of this condition is included at the end of this post.)
3) In what sense is correlation a measure of linear dependence? I don't see where/how this follows from the formulas.
Thanks.
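To make question 2 concrete, here is a minimal numerical sketch of the continuous condition above. The choice of marginal is an added assumption for illustration (it is not part of the original claim about points on a circle): take ##X## uniform on ##[-1,1]## and ##Y=\sqrt{1-X^2}##.

```python
import numpy as np

# Assumption for illustration only: X ~ Uniform[-1, 1], Y = sqrt(1 - X^2).
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=1_000_000)
y = np.sqrt(1.0 - x**2)

# Cov(X, Y) = E[XY] - E[X] E[Y], estimated by Monte Carlo.
cov_xy = np.mean(x * y) - np.mean(x) * np.mean(y)
print(cov_xy)  # close to 0: the integrand x*sqrt(1 - x^2) is odd on [-1, 1]
```

With a different marginal (say ##X## uniform on ##[0,1]##) the same pair would not be uncorrelated, which is exactly why the question about the assumed joint matters.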
 
Last edited:
  • #2
WWGD said:
An example of uncorrelated variables is that of points on a circle, i.e., the variables ##X## and ##\sqrt{ 1- x^2}## are uncorrelated
Hogwash. 'uncorrelated' means you know nothing about y when given x

WWGD said:
after searching
Where? Behind the refrigerator?
 
  • #3
BvU said:
Hogwash. 'uncorrelated' means you know nothing about y when given x
I thought that was independence. AFAIK uncorrelated, implying ##Cov(X,Y)=0##, means there is no "clear pattern of change of Y when X changes". EDIT: At any rate, ##Cov(X,Y) = E(XY) - \mu_X \mu_Y##. In order to determine when this equals 0 we must know the joint for ##(X,Y)##. From the joint we can find the marginals ##f_X, f_Y##.

Besides, I have seen enough claims that, e.g., points on a parabola, i.e., the pair ##(X,X^2)##, are uncorrelated, though clearly we know _everything_ about ##Y=X^2## when we know ##X##.
 
Last edited:
  • #4
Seems to me we are speaking different languages here. To me there is a very clear pattern of change of ##y=x^2## when ##x## changes.
 
  • #5
BvU said:
Seems to me we are speaking different languages here. To me there is a very clear pattern of change of ##y=x^2## when ##x## changes.
I think it has to do with the fact that ##(X-\mu_X)(Y-\mu_Y)## is alternatingly positive and negative, i.e., neither overwhelmingly positive nor negative. So there is no clear sense that ##Y## either increases with ##X## or decreases with ##X##, but I am still trying to get a better feel for the term. EDIT: In a more formal sense (which I don't fully get yet), ##Cov(X,Y)## is a quadratic form and we can show ##Cov(X,Y) = -Cov(X,Y)##, forcing it to equal 0.

EDIT2: Here is a screenshot of a covariance calculator for the pair ##(X, Y=X^2)## where the correlation is 0. Data points X = (1,2,3,4,5,-1,-2,-3,-4,-5) and Y = (1,4,9,16,25,1,4,9,16,25).

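For anyone who wants to reproduce the calculator result without the screenshot, here is a minimal sketch using the same ten data points, each weighted ##1/n##:

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5, -1, -2, -3, -4, -5], dtype=float)
y = x**2  # 1, 4, 9, 16, 25, 1, 4, 9, 16, 25

# Population covariance: E[XY] - E[X]E[Y] with each data point weighted 1/n.
cov_xy = np.mean(x * y) - np.mean(x) * np.mean(y)
print(cov_xy)  # 0.0 -- the x-values are symmetric about 0, so E[X] = E[X^3] = 0
```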
 
Last edited:
  • #6
WWGD said:
AFAIK uncorrelated , implying Cov(X,Y)=0 means there is no "clear pattern of change of Y when X changes" .

That is not a good intuition. Suppose (X,Y) jointly take on the following values, each with probability 1/5: (-2,-2), (-1,-1), (0,0), (1,1), (2, -3).

##\mu_X = 0 ##
##\mu_Y = -1##
## E(X,Y) = 0##
##COV(X,Y) = 0##

In this case, knowing ##X## completely determines ##Y##.

So in the above example, ##X## and ##Y## are uncorrelated even though ##Y## is a deterministic function of ##X##: zero correlation does not rule out a deterministic relationship (a "clear pattern") relating two random variables.

(This is also an example of two random variables that are uncorrelated but not independent.)

A better intuition about COV(X,Y) is that it has to do with approximating the relationship between ##X## and ##Y## by a linear equation.

WWGD said:
3) In what sense is correlation a measure of linear dependence? I don't see where/how this follows from the formulas.
Look at the formulas for doing linear regression. They indicate that correlation has something to do with least squares linear approximation.

WWGD said:
2) Is there a way of "going backwards" and deciding which joints/marginals would create uncorrelated variables,
Going backwards from what starting point? It's an interesting line of thought, but the task "List all bivariate distributions whose variables are uncorrelated" is too general. More restrictive questions would be better - questions about particular families of distributions, or questions about how to take two bivariate distributions and use them to form a third bivariate distribution whose variables are uncorrelated.
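A quick numerical check of the example above, together with the regression point (a sketch: the five pairs are treated as data with equal weight ##1/5##, and the line is fitted with ordinary least squares):

```python
import numpy as np

# The joint distribution from the example: each pair occurs with probability 1/5.
x = np.array([-2, -1, 0, 1, 2], dtype=float)
y = np.array([-2, -1, 0, 1, -3], dtype=float)

mu_x, mu_y = x.mean(), y.mean()          # 0.0 and -1.0
cov_xy = np.mean(x * y) - mu_x * mu_y    # E[XY] - mu_X * mu_Y = 0.0

# For ordinary least squares, slope = Cov(x, y) / Var(x), so it is 0 as well.
slope, intercept = np.polyfit(x, y, 1)
print(cov_xy, slope)                     # 0.0 and (numerically) 0.0
```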
 
  • Like
Likes BvU and WWGD
  • #7
Stephen Tashi said:
That is not a good intuition. Suppose (X,Y) jointly take on the following values, each with probability 1/5: (-2,-2), (-1,-1), (0,0), (1,1), (2, -3).

##\mu_X = 0 ##
##\mu_Y = -1##
## E(X,Y) = 0##
##COV(X,Y) = 0##

In this case, knowing ##X## completely determines ##Y##.

So in the above example, ##X## and ##Y## are uncorrelated even though ##Y## is a deterministic function of ##X##: zero correlation does not rule out a deterministic relationship (a "clear pattern") relating two random variables.

(This is also an example of two random variables that are uncorrelated but not independent.)

A better intuition about COV(X,Y) is that it has to do with approximating the relationship between ##X## and ##Y## by a linear equation. Look at the formulas for doing linear regression. They indicate that correlation has something to do with least squares linear approximation. Going backwards from what starting point? It's an interesting line of thought, but the task "List all bivariate distributions whose variables are uncorrelated" is too general. More restrictive questions would be better - questions about particular families of distributions, or questions about how to take two bivariate distributions and use them to form a third bivariate distribution whose variables are uncorrelated.
Are you assuming E(X,Y)=E(XY)?
Thanks. Yes, I didn't make myself very clear. I meant that the expression ##(X-\mu_X)(Y-\mu_Y)## is neither "overwhelmingly" positive nor negative, so we can neither say with accuracy that ##Y## increases with ##X## nor that ##Y## decreases with ##X##. In this sense they do not have a clear pattern of changing together -- co-varying. Still trying to pin down the concept more clearly.
 
  • #8
Still, my initial question is: What joint are we assuming for a pair (X,Y) when we say they are uncorrelated? It seems strange when I read these statements without seeing a mention of a joint.
 
  • #9
WWGD said:
Are you assuming E(X,Y)=E(XY)?
Yes. It's a typo. It should be ##E(XY)##.

WWGD said:
I meant that the expression ##(X-\mu_X)(Y-\mu_Y)## is neither "overwhelmingly" positive nor negative, so we can neither say with accuracy that ##Y## increases with ##X## nor that ##Y## decreases with ##X##. In this sense they do not have a clear pattern of changing together -- co-varying. Still trying to pin down the concept more clearly.

For the purpose of understanding the mathematics, it's best not to make this kind of qualitative interpretation of covariance. In the example of the previous post, a qualitative evaluation might say that ##Y## does tend to increase as ##X## increases. For the purpose of understanding a typical presentation where someone is presenting statistics, that kind of qualitative interpretation is often OK.
 
  • #10
Stephen Tashi said:
Yes. It's a typo. It should be ##E(XY)##.
For the purpose of understanding the mathematics, it's best not to make this kind of qualitative interpretation of covariance. In the example of the previous post, a qualitative evaluation might say that ##Y## does tend to increase as ##X## increases. For the purpose of understanding a typical presentation where someone is presenting statistics, that kind of qualitative interpretation is often OK.
Thank you Stephen. But the regression aspect makes it more confusing to me. We have two major cases: (X,Y) both random variables, and (X,Y): X is a mathematical variable and Y is random. I guess we speak about correlation only in the first case?
 
  • #11
WWGD said:
Still, my initial question is: What joint are we assuming for a pair (X,Y) when we say they are uncorrelated? It seems strange when I read these statements without seeing a mention of a joint.

The fact that two random variables are uncorrelated does not imply that they have a particular joint distribution. The technique of linear least squares regression is applied to data, not to distributions, but imagine you have lots of data, so that the plot of your data resembles the joint probability density of two random variables. If the random variables are uncorrelated and you did a least squares linear regression between them, the slope of the regression line would be zero. There are many different "clouds" of data points that can produce a regression line of zero slope. For example, the shape of the data cloud does not have to be symmetrical about a horizontal line. Many points above a horizontal line could be "canceled out" by a few points far below it. Correlation (in the mathematical sense of "correlation coefficient") is a quantitative concept.
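To illustrate the "canceled out" remark, here is a small sketch with made-up data: four points sit above the line ##y=0## and one point sits far below it, yet the covariance and the fitted slope are both zero.

```python
import numpy as np

# Hypothetical data: most points above y = 0, one point far below it.
x = np.array([-2, -1, 0, 1, 2], dtype=float)
y = np.array([1, 1, -4, 1, 1], dtype=float)

cov_xy = np.mean(x * y) - x.mean() * y.mean()   # 0.0
slope, intercept = np.polyfit(x, y, 1)          # slope is (numerically) 0.0
print(cov_xy, slope)
```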
 
  • Like
Likes WWGD
  • #12
WWGD said:
We have two major cases: (X,Y) both random variables, and (X,Y): X is a mathematical variable and Y is random. I guess we speak about correlation only in the first case?

That's a good point. Yes, we should only speak of mathematical correlation in the case where ##X## and ##Y## are both random variables. However, the properties of random variables are estimated from data, so people say things like "The standard deviation was 25.3" when they mean that 25.3 is a number computed from some data that is used to estimate the standard deviation of a probability distribution. Likewise, we can talk about ##COV(X,Y)## as being a property of a joint probability distribution, or we can say things like ##COV(X,Y) = 3.20## when we are talking about estimators computed from data.

In the scenario for linear least squares regression ##y = ax + b##, both ##x## and ##y## are mathematical variables. The data used has the form ##(x_i,y_i)## where ##y_i## is assumed to be a realization of a random variable ##Y## that has the form ##Y = ax_i + b + E##, where ##E## is a random variable.

To relate linear least squares regression to a bivariate distribution, you have to imagine taking samples ##(x_i,y_i)## from that distribution and doing a regression on that data. So you wouldn't generate data by picking values of ##x_i## in some systematic manner such as taking an equal number of measurements of ##y## when ##x = 1,2,3,...##.
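A minimal simulation of that scenario (the parameter values, the noise level, and the sampling scheme are illustrative assumptions, not anything from the thread): the least squares slope is the sample version of ##Cov(x,y)/Var(x)##, which is one way to see the link between covariance and linear fitting.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative model: Y = a*x + b + E with a = 2, b = 1 and E ~ N(0, 0.5^2).
a_true, b_true = 2.0, 1.0
x = rng.uniform(-1.0, 1.0, size=10_000)   # x sampled, not laid out on a fixed grid
y = a_true * x + b_true + rng.normal(0.0, 0.5, size=x.size)

# Least squares slope = sample Cov(x, y) / sample Var(x).
slope = np.cov(x, y)[0, 1] / np.var(x, ddof=1)
print(slope)  # close to 2.0
```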
 
Last edited:
  • #13
WWGD said:
3) In what sense is correlation a measure of linear dependence? I don't see where/how this follows from the formulas.
1st: zero mean random variables form a vector space. 2nd: changing the mean (by addition of a constant) doesn't change the computed covariance. So assume WLOG that you are dealing with zero mean random variables.

Now, supposing your random variables have finite variance, apply Cauchy-Schwarz to ##E\big[XY\big]##, or look at the 2x2 covariance matrix for ##(X,Y)##. This is a Gram matrix...
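Spelling out that step (a sketch, assuming zero-mean ##X## and ##Y## with finite, nonzero variances): Cauchy-Schwarz gives ##\big(E[XY]\big)^2 \le E[X^2]\,E[Y^2]##, i.e. ##Cov(X,Y)^2 \le Var(X)\,Var(Y)##, so the correlation ##\rho = \frac{Cov(X,Y)}{\sigma_X \sigma_Y}## always satisfies ##-1 \le \rho \le 1##, and ##|\rho| = 1## holds exactly when ##Y = cX## almost surely for some constant ##c## (the equality case of Cauchy-Schwarz). For variables that are not zero-mean, apply the same argument to ##X-\mu_X## and ##Y-\mu_Y##; the extreme case becomes ##Y = cX + d##. That is the precise sense in which correlation measures linear dependence.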
 
  • #14
StoneTemplePython said:
1st: zero mean random variables form a vector space. 2nd: changing the mean (by addition of a constant) doesn't change the computed covariance. So assume WLOG that you are dealing with zero mean random variables.

Now, supposing your random variables have finite variance, apply Cauchy-Schwarz to ##E\big[XY\big]##, or look at the 2x2 covariance matrix for ##(X,Y)##. This is a Gram matrix...
I know there was an approach using quadratic forms, possibly similar to this. So you mean we can obtain the result without knowing the actual joint? So I guess ##E(XY)## is an inner product? Ah, yes, I am remembering the probability subsection of the Cauchy-Schwarz article on Wikipedia.
 
  • #15
WWGD said:
So I guess ##E(XY)## is an inner product? Ah, yes, I am remembering the probability subsection of the Cauchy-Schwarz article on Wikipedia.
run with this for a bit...

WWGD said:
So you mean we can obtain the result without knowing the actual joint?
This seems like a vague question. One way or another, to directly compute ##E\big[XY\big]## you need a joint distribution.

But depending on what you want out of this, linear algebra is still something to consider -- you could have 2 independent random variables (consider them as a random vector ##\mathbf x##, zero mean for convenience, i.e. ##E\big[\mathbf x\big] = \mathbf 0##).

The covariance matrix then is diagonal, ##E\big[\mathbf{xx}^T \big] = \Lambda##. But you could multiply by an orthogonal matrix ##\mathbf U## to get random vector ##\big(\mathbf {Ux}\big)## with covariance matrix

##E\big[\mathbf U\mathbf{xx}^T \mathbf U^T \big] = \mathbf U E\big[\mathbf{xx}^T\big] \mathbf U^T = \mathbf U\Lambda\mathbf U^T = \Sigma##
which in general is not diagonal, and hence the random vector ##\big(\mathbf {Ux}\big)## has correlated components, though you never had to get into the weeds of the distributions.

Going through these manipulations is most productive and sharpest with the very important special case of a multivariate Gaussian ##\mathbf x## where zero covariance is actually the same thing as independence.
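A small numerical sketch of this construction (the variances and the rotation angle are arbitrary choices): start with independent components, apply an orthogonal matrix, and the sample covariance matrix picks up off-diagonal terms.

```python
import numpy as np

rng = np.random.default_rng(2)

# Independent zero-mean components with different variances: diagonal covariance.
x = rng.standard_normal((2, 100_000))
x[1] *= 2.0                                # variance of the second component becomes 4

# An arbitrary rotation (orthogonal matrix U).
theta = np.pi / 6
U = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

print(np.cov(x))      # approximately diag(1, 4): uncorrelated components
print(np.cov(U @ x))  # approximately U diag(1, 4) U^T: nonzero off-diagonal entries
```

(The two variances are made different on purpose: if ##\Lambda## were a multiple of the identity, ##\mathbf U \Lambda \mathbf U^T## would stay diagonal and the rotation would not introduce any correlation.)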
 
  • Like
Likes WWGD
  • #16
StoneTemplePython said:
run with this for a bit... This seems like a vague question. One way or another, to directly compute ##E\big[XY\big]## you need a joint distribution.
I read a comment to the effect that one can show that ##E[XY] := \langle X,Y\rangle##, as an inner product or quadratic form, equals its own negative. I am trying to see why/how. But this is an area where I am rusty, so sorry if I am being dense about this.
 
Last edited:
  • #17
WWGD said:
I read a comment to the effect that one can show that ##E[XY] := \langle X,Y\rangle##, as an inner product or quadratic form, equals its own negative.
I don't know what this means. Since inner products are (bi)linear you should immediately question comments like this. From what I can tell you're saying

##E\big[XY\big] = E\big[-XY\big] = -E\big[XY\big]##
where the RHS follows by linearity of expectations. But this implies ##E\big[XY\big]=0## which of course isn't true in general.

Your statement also seems to contradict the fact that every n x n, real symmetric positive (semi)definite matrix is a covariance matrix (for a multivariate Gaussian), and every covariance matrix (where 2nd moments exist) is an n x n, real symmetric positive (semi)definite matrix.
 
  • #18
WWGD said:
Still, my initial question is: What joint are we assuming for a pair (X,Y) when we say they are uncorrelated? It seems strange when I read these statements without seeing a mention of a joint.
This is a property that a joint distribution may or may not have. There is no need to specify a particular joint distribution. It is like saying that f(x) = f(-x) defines the property of an even function without specifying any particular function.
 
  • #19
FactChecker said:
This is a property that a joint distribution may or may not have. There is no need to specify a particular joint distribution. It is like saying that f(x) = f(-x) defines the property of an even function without specifying any particular function.
I am not sure I get your point. Do you mean that being uncorrelated depends on the joint? Yes, of course. But when I see the claim that two variables are uncorrelated, I wonder which choice of joint is assumed.
 
  • #20
There is no need to specify any specific distribution in the definition of "uncorrelated". Of course, when one talks about any particular pair of random variables, X and Y, there is a joint distribution for those variables. That will be the one that applies when one talks about the correlation between X and Y.
 
  • #21
FactChecker said:
There is no need to specify any specific distribution in the definition of "uncorrelated". Of course, when one talks about any particular pair of random variables, X and Y, there is a joint distribution for those variables. That will be the one that applies when one talks about the correlation between X and Y.
Yes, I understand that, but I am trying to test that, e.g., the pair ##(X,X^2)## is uncorrelated. How would I go about it? Same for points on a circle (say the unit circle): ##(X, \sqrt{1-X^2})##. How would I show it then?
 
Last edited:
  • #22
WWGD said:
Yes, I understand that, but I am trying to test that, e.g. the pair (X,X^2) is uncorrelated. How would I go about it?
Your question is not well defined (unless I have missed something). It is up to you to specify what distributions you are working with. If X is uniformly distributed on the interval [2,3], then X and X^2 are correlated. If it is uniformly distributed on the interval [-1,1], then X and X^2 are uncorrelated.

A simpler example of the two cases is:
X = 2 or 3 with equal probability 1/2 (correlated)
X = -1 or 1 with equal probability 1/2 (uncorrelated)
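Working the simpler discrete example through ##Cov(X, X^2) = E(X^3) - E(X)E(X^2)## (a sketch; the helper function below is just for illustration):

```python
import numpy as np

def cov_x_x2(values, probs):
    """Cov(X, X^2) = E[X^3] - E[X] * E[X^2] for a discrete random variable X."""
    v, p = np.asarray(values, dtype=float), np.asarray(probs, dtype=float)
    ex, ex2, ex3 = np.sum(v * p), np.sum(v**2 * p), np.sum(v**3 * p)
    return ex3 - ex * ex2

print(cov_x_x2([2, 3], [0.5, 0.5]))    # 1.25 -> correlated
print(cov_x_x2([-1, 1], [0.5, 0.5]))   # 0.0  -> uncorrelated
```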
 
  • Like
Likes StoneTemplePython and Stephen Tashi
  • #23
FactChecker said:
Your question is not well defined (unless I have missed something). It is up to you to specify what distributions you are working with. If X is uniformly distributed on the interval [2,3], then X and X^2 are correlated. If it is uniformly distributed on the interval [-1,1], then X and X^2 are uncorrelated.

A simpler example of the two cases is:
X = 2 or 3 with equal probability 1/2 (correlated)
X = -1 or 1 with equal probability 1/2 (uncorrelated)
Yes, I understand; this is almost tautological. No two pairs (X,Y) are "intrinsically" correlated or uncorrelated. But I am _given/told_ that they are uncorrelated, as a fact, without any mention of an underlying joint. This means a joint is being used _implicitly_, and I am trying to make that assumption _explicit_.
 
  • #24
WWGD said:
Yes, I understand that, but I am trying to test that, e.g. the pair (X,X^2) is uncorrelated. How would I go about it?

In such a case, I see why defining a joint distribution presents a technical problem. The commonly encountered bivariate density is a function ##j(x,y)## that integrates to 1 over some area (finite or infinite) in 2D space. To define a joint density for a set of points of the form ##(x,x^2)## brings up the problem of defining a function ##j(x,y)## that integrates to 1 over a curve in 2D space. Ordinary 2D Riemann integration gives an answer of zero when we do 2D integration over a curve in 2D.

I think we can appeal to a more advanced form of integration and solve that technical problem, but we can also sidestep the question of a joint density. To compute the expected value of a function ##g(X)## of a random variable we only need the density ##f(x)## for ##X##: ##E(g(X)) = \int g(x) f(x)\, dx##. The question of whether ##X## is correlated with ##X^2## only requires computing ##E(X)##, ##E(X^2)## and ##E((X)(X^2)) = E(X^3)##. Those are expectations of functions of ##X##, so they can be computed using only the 1D density function for ##X##.

It would be an interesting exercise in abstract mathematics to say the correct words for defining a joint density for ##(X,X^2)## in 2D and to use that definition to show that computation using the joint density is equivalent to taking the 1D view of things. However, I don't know if that interests you - or whether I could do it.
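As a quick worked instance of the 1D view (taking, for illustration, ##X## uniform on ##[-1,1]##, so ##f(x) = \tfrac{1}{2}## there): ##E(X) = \int_{-1}^{1} \tfrac{x}{2}\,dx = 0##, ##E(X^2) = \int_{-1}^{1} \tfrac{x^2}{2}\,dx = \tfrac{1}{3}##, and ##E(X^3) = \int_{-1}^{1} \tfrac{x^3}{2}\,dx = 0##, so ##Cov(X, X^2) = E(X^3) - E(X)\,E(X^2) = 0## -- and we never had to write down a 2D joint density.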
 
  • Like
Likes WWGD
  • #25
Essentially, I am trying to solve:

##E[XY]-\mu_X\mu_Y=0##, i.e., ##\int xy\, f_{XY}(x,y)\,dx\,dy - \int x f_X(x)\,dx \int y f_Y(y)\,dy = 0##, for ##f_{XY}##.
 
Last edited:
  • #26
Stephen Tashi said:
I think we can appeal to a more advanced form of integration and solve that technical problem, but we can also sidestep the question of a joint density. To compute the expected value of a function ##g(X)## of a random variable we only need the density ##f(x)## for ##X##: ##E(g(X)) = \int g(x) f(x)\, dx##. The question of whether ##X## is correlated with ##X^2## only requires computing ##E(X)##, ##E(X^2)## and ##E((X)(X^2)) = E(X^3)##. Those are expectations of functions of ##X##, so they can be computed using only the 1D density function for ##X##.

It would be an interesting exercise in abstract mathematics to say the correct words for defining a joint density for ##(X,X^2)## in 2D and to use that definition to show that computation using the joint density is equivalent to taking the 1D view of things. However, I don't know if that interests you - or whether I could do it.

This is popularly called the Law of the Unconscious Statistician. It is implied by the Law of Total Expectation.
Or in more abstract form, it is implied by the fact that ##E\Big[g(X)\big \vert X\Big] = g(X)##
 
  • #27
So there must be something in the context of the problem that either explicitly or implicitly defines the density function of ##X## (the joint density function of ##(X,X^2)## can be derived from the density of ##X##) .
 
  • #28
FactChecker said:
So there must be something in the context of the problem that either explicitly or implicitly defines the density function of ##X## (the joint density function of ##(X,X^2)## can be derived from the density of ##X##) .
And it only took 27 posts to just get to the right formulation of the question. Serenity now!
 
  • Like
Likes FactChecker
  • #29
I just thought of another seemingly implicit assumption about distributions. When the mean is described as the arithmetic average of numbers, i.e., ##(x_1+x_2+...+x_n)/n##, this assumes a uniform distribution. I don't remember this assumption being stated explicitly.
 
  • #30
WWGD said:
I just thought of another seemingly implicit assumption about distributions. When the mean is described as the arithmetic average of numbers, i.e., ##(x_1+x_2+...+x_n)/n##, this assumes a uniform distribution. I don't remember this assumption being stated explicitly.

This is completely inaccurate. It's a definition and assumes no such thing. You are also likely mixing up statistics and probability theory.

Among other things, the SLLN tells us that for iid random variables with a finite mean,
##\frac{1}{n}\big(X_1 + X_2 + ... + X_n\big) \to \mu## with probability one.

Suppose those ##X_i##'s are iid standard normal random variables; then for any natural number ##n##,
##\frac{1}{n}\big(X_1 + X_2 + ... + X_n\big)## is a Gaussian distributed random variable, not a uniformly distributed random variable. You should be able to figure this out yourself by looking at the MGF or CF for uniform random variables and, say, for any other convolution of iid random variables.

- - - -
edit:
if you know what you're doing you can even apply the SLLN (or WLLN) to non-iid random variables with different means (supposing you meet a sufficient condition like Kolmogorov's criterion), which makes the idea that
##\frac{1}{n}\big(X_1 + X_2 + ... + X_n\big)##
is somehow uniformly distributed even more bizarre.
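A quick simulation of the point (the choice of standard normal ##X_i## is just for illustration): the sample mean concentrates around ##\mu = 0## as ##n## grows, and its spread shrinks like ##1/\sqrt{n}## -- it is itself Gaussian here, certainly not uniform.

```python
import numpy as np

rng = np.random.default_rng(3)

for n in (10, 100, 1_000):
    # 5,000 independent realizations of the sample mean of n iid N(0, 1) variables.
    means = rng.standard_normal((5_000, n)).mean(axis=1)
    print(n, means.mean(), means.std())  # mean near 0, std near 1/sqrt(n)
```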
 
  • #31
StoneTemplePython said:
This is completely inaccurate. It's a definition and assumes no such thing. You are also likely mixing up statistics and probability theory.

Among other things, the SLLN tells us that for iid random variables with a finite mean,
##\frac{1}{n}\big(X_1 + X_2 + ... + X_n\big) \to \mu## with probability one.

Suppose those ##X_i##'s are iid standard normal random variables; then for any natural number ##n##,
##\frac{1}{n}\big(X_1 + X_2 + ... + X_n\big)## is a Gaussian distributed random variable, not a uniformly distributed random variable. You should be able to figure this out yourself by looking at the MGF or CF for uniform random variables and, say, for any other convolution of iid random variables.

- - - -
edit:
if you know what you're doing you can even apply the SLLN (or WLLN) to non-iid random variables with different means (supposing you meet a sufficient condition like Kolmogorov's criterion), which makes the idea that
##\frac{1}{n}\big(X_1 + X_2 + ... + X_n\big)##
is somehow uniformly distributed even more bizarre.
I am not saying that the expression is uniformly distributed. What I mean is that, strictly speaking, the expected value or mean is defined (discrete case) as ##\sum_i x_i f(x_i)##, where ##f(x)## is the associated density. But if we define the mean / expected value as ##(x_1+...+x_n)/n##, this means we are assuming ##f(x_i)=1/n## for all ##x_i##, or at least it ends up coming down to the same thing as ##x_1 \cdot 1/n + ... + x_n \cdot 1/n##.
 
  • #32
WWGD said:
I am not saying that the expression is uniformly distributed. What I mean is that, strictly speaking, the expected value or mean is defined (discrete case) as ##\sum_i x_i f(x_i)##, where ##f(x)## is the associated density. But if we define the mean / expected value as ##(x_1+...+x_n)/n##, this means we are assuming ##f(x_i)=1/n## for all ##x_i##, or at least it ends up coming down to the same thing as ##x_1 \cdot 1/n + ... + x_n \cdot 1/n##.
I understand the analogy you're trying to make -- I'm tempted to sign off on "at least it ends up coming down to the same thing..." though I think it creates problems and isn't very helpful.

At this stage I'd suggest not having an interpretation -- just understanding the definition and the inequalities that are deployed. It will also make it easier to understand the CLT -- otherwise what is that -- an implicit 'uniform distribution between the ##x_i##'s, except they have 'extra' mass rescaled by the square root of n'? That doesn't make any sense to me.

In both cases (really WLLN and CLT), whether you divide by ##n## or ##\sqrt{n}##, it really has to do with carefully managing how variance grows / contracts/ stabilizes as you add random variables. That's really the point.

- - - -
note: You're using the wrong terminology. A discrete random variable doesn't have a probability density -- absolutely continuous ones do. Too much of this thread reads like "Casual" posts in Math Section -- something you've complained about before.
 
  • #33
Anything put in terms of the random variables, like ##(X_1+X_2+...+X_n)/n##, is a random variable with a probability distribution, not a fixed number. Anything put in terms of the sample results, like ##(x_1+x_2+...+x_n)/n##, is a single number, which estimates the mean but is not exact.
 
  • #34
FactChecker said:
Anything put in terms of the random variables, like ##(X_1+X_2+...+X_n)/n## is a random variable with a probability distribution, not a fixed number. Anything put in terms of the sample results, like ##(x_1+x_2+...+x_n)/n## is a single number, which estimates the mean but is not exact.
Well, I was referring to the ##x_i## as the population itself, so this _is_ the mean as I know it. I would agree if the ##x_i## were sample data.
 
  • #35
WWGD said:
Well, I was referring to the ##x_i## as the population itself, so this _is_ the mean as I know it. I would agree if the ##x_i## was sample data.

If the ##x_1,x_2,...,x_n## are the possible values of the population, the mean of the population is not defined to be ##\frac{ \sum_{i=1}^n x_i}{n}##; it is ##\sum_{i=1}^n x_i\, p_i##, where ##p_i## is the probability (the population fraction) of the value ##x_i##.
 
