# Var(X) = Cov(X,X) ??

by Rasalhague
Tags: covx, varx
 P: 1,400 $$Var(X)=\sum_{i=1}^N P(X_i)(X_i-EX)^2.$$ $$Cov(X,Y) = \sum_{i=1}^N\sum_{j=1}^M P(X_i,Y_j)(X_i - EX)(Y_j - EY).$$

If, for instance, $P(X_i) = 1/N$ and $X = Y = (1,2,3)$, then $$Var(X) = \frac{1}{3} ((1-2)^2 + (2-2)^2 + (3-2)^2) = \frac{2}{3},$$ but $$Cov(X,X) = \sum_{i=1}^3 \sum_{j=1}^3 \frac{1}{9} (X_i - EX)(X_j - EX)$$ $$=\frac{1}{9}((1-2)^2+(3-2)^2+2(1-2)(3-2)) = \frac{1}{9}(2 - 2) = 0??$$

There are 9 values of (X,Y), and I've taken each to occur with equal probability. I've omitted the terms that contain (2-2) from the summation. Apparently I've misunderstood something about the definition of covariance, but what?
P: 15,868
 Quote by Rasalhague $$Var(X)=\sum_{i=1}^N P(X_i)(X_i-EX)^2.$$ $$Cov(X,Y) = \sum_{i=1}^N\sum_{j=1}^M P(X_i,Y_j)(X_i - EX)(Y_j - EY).$$
This formula is wrong.

Here is how you calculate it. By definition, the covariance is

$$Cov(X,X)=E[ (X-EX)(X-EX) ]$$

So define the random variable $Z=(X-EX)(X-EX)=(X-2)^2$. The covariance is $EZ$. Now, if X takes on the values 1, 2 and 3, then Z takes on the values 0 and 1. Furthermore, $P\{Z=0\}=P\{X=2\}=1/3$ and $P\{Z=1\}=P(\{X=1\}\cup \{X=3\}) = 2/3$.

Thus

$$Cov(X,X)=EZ = \sum_{k=0}^1 k P\{Z=k\} = 2/3$$
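
As a cross-check, here is a minimal Python sketch of the same calculation using exact fractions. It also evaluates the double sum from the quoted formula twice: once with the joint pmf of the pair (X, X), which puts all its mass on the diagonal $i = j$, and once with the product of the marginals, which is the joint pmf only when the two variables are independent. The names `joint_same`, `joint_indep` and `double_sum` are made up for this illustration.

```python
from fractions import Fraction

values = [1, 2, 3]
p = {x: Fraction(1, 3) for x in values}   # marginal pmf of X (uniform)

mean = sum(p[x] * x for x in values)      # E[X]

# Cov(X, X) = E[(X - EX)^2]: a single sum over the outcomes of X
cov_xx = sum(p[x] * (x - mean) ** 2 for x in values)

def double_sum(joint):
    """Double sum of joint(x, y) * (x - EX) * (y - EX) over all pairs."""
    return sum(joint(x, y) * (x - mean) * (y - mean)
               for x in values for y in values)

def joint_same(x, y):
    # Joint pmf of (X, X): the two coordinates are always equal
    return p[x] if x == y else Fraction(0)

def joint_indep(x, y):
    # Product of marginals: a joint pmf only for independent variables
    return p[x] * p[y]

cov_joint = double_sum(joint_same)
cov_indep = double_sum(joint_indep)
print(cov_xx, cov_joint, cov_indep)  # 2/3 2/3 0
```

Under these assumptions the diagonal joint pmf reproduces 2/3, while the 1/9 weights reproduce the 0 from the original post, which suggests the discrepancy comes from the choice of $P(X_i, X_j)$ rather than from the double sum itself.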
 P: 4,568 Hey Rasalhague. I don't know what you did, but I'll use the expanded form of the covariance in your definition. Cov(X,X) = E[(X - E[X])(X - E[X])] = E[X^2] - E[X]^2. You are not applying the expectation operator correctly: you need to apply the definition of expectation to the whole expression, i.e. (X - E[X])(X - E[X]), and this means taking into account the shifts by the mean. If you expand the covariance you get Cov(X,Y) = E[XY] - E[X]E[Y] by some simple algebra, which leaves us with Cov(X,X) = E[X^2] - E[X]^2, the same as the variance. What you are calculating is neither the variance nor the covariance, but something I can't identify.
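
For the example in this thread, with X uniform on 1, 2, 3, the expanded form can be checked by hand. This is only a worked instance of the identity above, not a new formula:

$$E[X^2] = \frac{1}{3}(1^2 + 2^2 + 3^2) = \frac{14}{3}, \qquad E[X]^2 = 2^2 = 4,$$

$$Cov(X,X) = E[X^2] - E[X]^2 = \frac{14}{3} - 4 = \frac{2}{3} = Var(X).$$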
P: 1,400

The formula I quoted is given as the definition of covariance for discrete variables in Simon & Blume (1994): Mathematics for Economists, end of section A5.4, and in Robert J. Serfling's online intro 'Covariance and Correlation', formula (1), which he identifies with E[(X-EX)(Y-EY)P(X,Y)] in the formula that follows it. Serfling also states that P(X,Y) means

$$P_X(X)\cdot P_Y(Y)$$

which in my example makes P(X,X) = (1/3)(1/3) = 1/9. Perhaps you could explain how you would calculate an example where $X \neq Y$, e.g. X = (1, 2, 3) and Y = (1, 4, 9).
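
A minimal Python sketch of that example follows. The thread does not say how X and Y are jointly distributed, so two assumptions are tried: a paired case where Y = X^2, so only (1,1), (2,4), (3,9) occur, each with probability 1/3, and an independent case where all 9 pairs have probability 1/9. The helper `covariance` and the dictionary layout are hypothetical, chosen only for this illustration.

```python
from fractions import Fraction

def covariance(joint):
    """Cov(X, Y) from a joint pmf given as {(x, y): probability}."""
    ex = sum(p * x for (x, y), p in joint.items())   # E[X]
    ey = sum(p * y for (x, y), p in joint.items())   # E[Y]
    return sum(p * (x - ex) * (y - ey) for (x, y), p in joint.items())

third = Fraction(1, 3)

# Assumption 1: Y = X^2, so only the matched pairs have positive probability
paired = {(1, 1): third, (2, 4): third, (3, 9): third}

# Assumption 2: X and Y independent, so the joint pmf is the product of marginals
independent = {(x, y): third * third for x in (1, 2, 3) for y in (1, 4, 9)}

cov_paired = covariance(paired)
cov_independent = covariance(independent)
print(cov_paired, cov_independent)  # 8/3 0
```

The paired case agrees with E[XY] - E[X]E[Y] = 12 - 2(14/3) = 8/3, and the independent case gives 0, so the result depends entirely on which joint pmf is plugged into the double sum.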
 P: 1,400 I'm not sure how to reconcile Serfling's formula (1) with the way Wolfram Mathworld writes it out explicitly for the case where N = M: http://mathworld.wolfram.com/Covariance.html Are there two somewhat different concepts each called covariance, each corresponding to its own way of defining the mean of the product of two random variables?
 P: 1,400 Ah, reading further on that Mathworld article, it seems one definition concerns real-valued random variables from a finite sample space, another definition concerns tuples of such random variables. But still, there appear to be a variety of concepts here to which the name covariance is attached, with disagreement over certain points, and Mathworld doesn't give an explicit version of the more general definition.