# Covariance of random sum

1. Dec 30, 2007

### jimmy1

Suppose $$X_1,\ldots,X_n$$ are independent and identically distributed random variables.
Now suppose I pick $$m_1$$ random variables from the set $$X_1,\ldots,X_n$$ and define $$Y_1$$ as the sum of these $$m_1$$ variables, where $$m_1$$ is also a random variable.
Then suppose I do this again: I pick $$m_2$$ random variables from the set $$X_1,\ldots,X_n$$ and define $$Y_2$$ as the sum of these $$m_2$$ variables, where $$m_2$$ is again a random variable.
I also know the expected number of random variables from the set $$X_1,\ldots,X_n$$ that are contained in both sums $$Y_1$$ and $$Y_2$$. Call this number $$a$$.

So I basically have two random sums, $$Y_1$$ and $$Y_2$$, and I want to find the covariance between them, $$Cov(Y_1, Y_2)$$. I came up with the following solution, but it doesn't seem to work, so any pointers on what's wrong or how to go about it would be great.

So I simply used the formula for the covariance of sums, i.e., for sequences of random variables $$A_1,\ldots,A_m$$ and $$B_1,\ldots,B_n$$, we have $$Cov(\sum_{i=1}^{m}A_i,\sum_{j=1}^{n}B_j) = \sum_{i=1}^{m}\sum_{j=1}^{n}Cov(A_i,B_j)$$.
Applying this formula to $$Cov(Y_1, Y_2)$$: because $$X_1,\ldots,X_n$$ are independent, most of the terms in the double sum are zero. A term is non-zero only when $$X_i$$ and $$X_j$$ are the same variable ($$X_i \equiv X_j$$), in which case $$Cov(X_i,X_j)$$ is just $$Var(X_1)$$.
Hence $$Cov(Y_1, Y_2)$$ would be just $$a*Var(X_1)$$???

There is something wrong in the logic above, as the formula $$a*Var(X_1)$$ doesn't seem to work, but I can't figure out where I am going wrong. Any help??
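One way to see where the formula can break is to condition on which variables were picked. This is a sketch under assumptions not stated in the post: each sum uses distinct indices, the index sets $$S_1, S_2$$ (with $$|S_1| = m_1$$, $$|S_2| = m_2$$) are chosen independently of the values $$X_1,\ldots,X_n$$, and $$\mu = E[X_1]$$, $$\sigma^2 = Var(X_1)$$. By the law of total covariance:

$$
\begin{aligned}
Cov(Y_1,Y_2) &= E\left[Cov(Y_1,Y_2 \mid S_1,S_2)\right] + Cov\left(E[Y_1 \mid S_1,S_2],\, E[Y_2 \mid S_1,S_2]\right) \\
&= E\left[\sigma^2\, |S_1 \cap S_2|\right] + Cov(\mu m_1,\, \mu m_2) \\
&= a\,\sigma^2 + \mu^2\, Cov(m_1, m_2).
\end{aligned}
$$

The first term is the $$a*Var(X_1)$$ from the covariance-of-sums argument applied to each fixed pair of index sets. The second term vanishes when $$m_1$$ and $$m_2$$ are fixed (or merely uncorrelated) or when $$\mu = 0$$, which is consistent with $$a*Var(X_1)$$ working in those cases.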

2. Dec 30, 2007

### EnumaElish

If all the X's are mutually independent then doesn't that allow you to make a statement about Cov(X1+X2, X3+X4+X5), for example?

3. Dec 31, 2007

### jimmy1

All the X's are mutually independent, so if I apply the formula $$Cov(\sum_{i=1}^{m}A_i,\sum_{j=1}^{n}B_j) = \sum_{i=1}^{m}\sum_{j=1}^{n}Cov(A_i,B_j)$$ to your example Cov(X1+X2, X3+X4+X5), the answer is 0.

But in my situation there is a certain amount of overlap. For example, suppose I have the set $$X_1,\ldots,X_{20}$$ with $$m_1=5$$ and $$m_2=7$$; then I might have a situation where $$Y_1=X_1+X_3+X_6+X_8+X_9$$ and $$Y_2=X_1+X_2+X_3+X_6+X_{10}+X_{12}+X_{15}$$.

So in this case, if I apply the covariance-of-sums formula above, $$Cov(Y_1,Y_2)$$ will not be 0, as there will be 3 non-zero terms in the sum (i.e., $$Cov(X_1,X_1)$$, $$Cov(X_3,X_3)$$ and $$Cov(X_6,X_6)$$).
So, as all the X's are identically distributed, we get $$Cov(Y_1,Y_2) = a*Var(X) = 3*Var(X)$$.
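A quick numerical check of this fixed-sets example, assuming (hypothetically, not from the thread) that each $$X_i$$ is normal with mean 2 and standard deviation 1.5, so $$Var(X) = 2.25$$. It evaluates the double sum from the covariance-of-sums formula and compares it with a Monte Carlo estimate; `bilinear_cov` and `simulate` are made-up helper names for illustration.

```python
import random

def sample_cov(xs, ys):
    """Unbiased sample covariance of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)

# Index sets from the example (1-based, as in the post).
S1 = {1, 3, 6, 8, 9}
S2 = {1, 2, 3, 6, 10, 12, 15}
N_VARS = 20

# Hypothetical distribution (not from the thread): X_i ~ Normal(2, 1.5), so Var(X) = 2.25.
MU, SIGMA = 2.0, 1.5

def bilinear_cov(s1, s2, var):
    # Independence: Cov(X_i, X_j) = var if i == j, else 0, so the
    # double sum counts the shared indices and multiplies by var.
    return sum(var if i == j else 0.0 for i in s1 for j in s2)

def simulate(trials, rng):
    # Monte Carlo estimate of Cov(Y1, Y2) with the index sets held fixed.
    y1s, y2s = [], []
    for _ in range(trials):
        x = {i: rng.gauss(MU, SIGMA) for i in range(1, N_VARS + 1)}
        y1s.append(sum(x[i] for i in S1))
        y2s.append(sum(x[j] for j in S2))
    return sample_cov(y1s, y2s)

print(len(S1 & S2))                      # → 3
print(bilinear_cov(S1, S2, SIGMA ** 2))  # → 6.75
est = simulate(50_000, random.Random(0))
print(est)                               # random, but should land near 6.75
```

With fixed index sets, the simulated covariance should agree with $$3*Var(X)$$ up to Monte Carlo noise.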

Now this formula, $$a*Var(X)$$, works when $$m_1$$ and $$m_2$$ are not random variables, but when they are random variables, it no longer works. In that case I know the expected values of $$m_1$$ and $$m_2$$, and I also know the expected number of overlapping elements; call this $$a$$.

So from this information, does anyone know how to get an expression for $$Cov(Y_1,Y_2)$$ when $$m_1$$ and $$m_2$$ are random variables?
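A Monte Carlo sketch of the random-size case, under hypothetical assumptions that are not in the thread: $$X_i \sim N(2, 1.5^2)$$, $$m_1 = K$$ and $$m_2 = K + 2$$ with $$K$$ uniform on $$\{3,\ldots,8\}$$ (so the sizes are correlated), and each index set drawn uniformly without replacement from 20 variables, independently of the X values. It prints the sample covariance of $$Y_1, Y_2$$ next to $$a*Var(X)$$ and next to $$a*Var(X) + E[X]^2\,Cov(m_1, m_2)$$, which is what conditioning on the chosen index sets gives under these assumptions; $$a$$ is estimated from the simulated overlaps.

```python
import random

def sample_cov(xs, ys):
    """Unbiased sample covariance of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)

N_VARS = 20
MU, SIGMA = 2.0, 1.5  # hypothetical: X_i ~ Normal(2, 1.5), so Var(X) = 2.25

def draw_sizes(rng):
    # Hypothetical model for (m1, m2): both depend on a shared K, so they are correlated.
    k = rng.randint(3, 8)
    return k, k + 2

def simulate(trials, rng):
    y1s, y2s, m1s, m2s, overlaps = [], [], [], [], []
    for _ in range(trials):
        m1, m2 = draw_sizes(rng)
        # Which variables enter each sum is chosen independently of the X values.
        s1 = rng.sample(range(N_VARS), m1)
        s2 = rng.sample(range(N_VARS), m2)
        x = [rng.gauss(MU, SIGMA) for _ in range(N_VARS)]
        y1s.append(sum(x[i] for i in s1))
        y2s.append(sum(x[j] for j in s2))
        m1s.append(m1)
        m2s.append(m2)
        overlaps.append(len(set(s1) & set(s2)))
    a = sum(overlaps) / trials  # estimate of the expected overlap a
    return {
        "cov_y": sample_cov(y1s, y2s),
        "a_var": a * SIGMA ** 2,
        "a_var_plus_mean_term": a * SIGMA ** 2 + MU ** 2 * sample_cov(m1s, m2s),
    }

result = simulate(50_000, random.Random(0))
for name, value in result.items():
    print(f"{name}: {value:.2f}")
```

Under these settings the theoretical values are roughly 16.6 for the covariance and roughly 5.0 for $$a*Var(X)$$, so the first and third printed numbers should be close while the second falls well short. Setting `MU = 0.0`, or making `draw_sizes` return independent sizes, should make all three agree.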

Last edited: Dec 31, 2007
4. Jan 1, 2008

### EnumaElish

To simplify, suppose you have X1, X2.

Then m = 1 or 2, and n = 1 or 2.

If m = 1 then Y1 is X1 or X2. If m = 2 then Y1 is X1+X2.

Similarly if n = 1 then Y2 is X1 or X2. If n = 2 then Y2 is X1+X2.

If you can make a table of these possible outcomes and assign a probability to each, you can calculate a probability-weighted average of the covariance formulas for each possible case.
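A sketch of this table for the two-variable case, with exact fractions. The probabilities are hypothetical, chosen only for illustration: m = n, equal to 1 or 2 with probability 1/2 each, and when a sum has one term, X1 or X2 is picked with probability 1/2, independently for Y1 and Y2; the moments E[X] = 1 and Var(X) = 1 are also assumed. Each row's conditional covariance is Var(X) times the overlap, and each row's conditional mean is E[X] times the number of terms. By the law of total covariance, the overall covariance is the probability-weighted average of the per-row covariances plus the covariance of the per-row means, so the code reports both pieces.

```python
from fractions import Fraction
from itertools import product

# Hypothetical moments of each X_i (not from the thread): E[X] = 1, Var(X) = 1.
MU = Fraction(1)
VAR = Fraction(1)

def build_table():
    """Return (S1, S2, probability) rows for the hypothetical two-variable setup."""
    half = Fraction(1, 2)
    rows = []
    # m = n = 1 (prob 1/2): each sum picks X1 or X2 with prob 1/2, independently.
    for s1, s2 in product([{1}, {2}], repeat=2):
        rows.append((s1, s2, half * half * half))
    # m = n = 2 (prob 1/2): both sums are X1 + X2.
    rows.append(({1, 2}, {1, 2}, half))
    return rows

def covariance_from_table(rows):
    # Per row: Cov(Y1, Y2 | row) = VAR * |S1 & S2| and E[Yk | row] = MU * |Sk|.
    avg_cond_cov = sum(p * VAR * len(s1 & s2) for s1, s2, p in rows)
    e1 = sum(p * MU * len(s1) for s1, _, p in rows)
    e2 = sum(p * MU * len(s2) for _, s2, p in rows)
    cov_of_means = sum(
        p * (MU * len(s1) - e1) * (MU * len(s2) - e2) for s1, s2, p in rows
    )
    return avg_cond_cov, cov_of_means, avg_cond_cov + cov_of_means

print(covariance_from_table(build_table()))
# → (Fraction(5, 4), Fraction(1, 4), Fraction(3, 2))
```

Here the first piece equals the expected overlap (5/4) times Var(X), and the second piece, E[X]^2 times Var(m) = 1/4, comes from the case means moving together with m and n.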