
Covariance of random sum

  1. Dec 30, 2007 #1
    Suppose [tex]X_1,....,X_n[/tex] are independent and identically distributed random variables.
    Now suppose I picked [tex]m_1[/tex] random variables from the set [tex]X_1,....,X_n[/tex] and defined [tex]Y_1[/tex] as the sum of the [tex]m_1[/tex] variables, where [tex]m_1[/tex] is also a random variable.
    Now suppose I did this again and I picked [tex]m_2[/tex] random variables from the set [tex]X_1,....,X_n[/tex] and defined [tex]Y_2[/tex] as the sum of the [tex]m_2[/tex] variables, where [tex]m_2[/tex] again is a random variable.
    I also know the expected number of random variables from the set [tex]X_1,....,X_n[/tex], that are contained in both sums [tex]Y_1[/tex] and [tex]Y_2[/tex]. Call this number [tex]a[/tex].

    So I basically have two random sums, [tex]Y_1[/tex] and [tex]Y_2[/tex], and I want to find the covariance between them, [tex]Cov(Y_1, Y_2)[/tex]. I came up with the following solution but it doesn't seem to work, so any pointers on what's wrong or how to go about doing it would be great.

    So I just simply used the definition of covariance of sums, ie. For sequences of random variables [tex]A_1,....,A_m[/tex] and [tex]B_1,....,B_n[/tex], we have [tex]Cov(\sum_{i=1}^{m}A_i,\sum_{j=1}^{n}B_j) = \sum_{i=1}^{m}\sum_{j=1}^{n}Cov(A_i,B_j)[/tex].
    So applying the above formula to my situation of [tex]Cov(Y_1, Y_2)[/tex], I have that because [tex]X_1,....,X_n[/tex] are independent, most of the terms in the double sum in the above formula will be zero, and will only be non-zero if [tex]X_i \equiv X_j[/tex], in which case [tex]Cov(X_i,X_j)[/tex] will be just [tex]Var(X_1)[/tex].
    Hence [tex]Cov(Y_1, Y_2)[/tex] will be just [tex]a*Var(X_1)[/tex]?

    There is something wrong in the logic above, as the formula [tex]a*Var(X_1)[/tex] doesn't seem to work, but I can't figure out where I am going wrong. Any help??
  3. Dec 30, 2007 #2



    If all the X's are mutually independent then doesn't that allow you to make a statement about Cov(X1+X2, X3+X4+X5), for example?
  4. Dec 31, 2007 #3
    All the X's are mutually independent, so if I apply the definition [tex]Cov(\sum_{i=1}^{m}A_i,\sum_{j=1}^{n}B_j) = \sum_{i=1}^{m}\sum_{j=1}^{n}Cov(A_i,B_j)[/tex] to your example Cov(X1+X2, X3+X4+X5), then the answer is 0.

    But in my situation there is a certain amount of overlap. For example, suppose I have the set [tex]X_1,\ldots,X_{20}[/tex], and [tex]m_1=5[/tex] and [tex]m_2=7[/tex], then I might have a situation where [tex]Y_1=X_1+X_3+X_6+X_8+X_9[/tex] and [tex]Y_2=X_1+X_2+X_3+X_6+X_{10}+X_{12}+X_{15}[/tex].

    So in this case, if I apply the covariance-of-sums formula above, [tex]Cov(Y_1,Y_2)[/tex] will not be 0, as there will be 3 non-zero terms in the sum (i.e. [tex]Cov(X_1,X_1), Cov(X_3,X_3), Cov(X_6,X_6)[/tex]).
    So, as all X's are identically distributed, we get [tex]Cov(Y_1,Y_2)[/tex] = [tex]a*Var(X) = 3*Var(X)[/tex].
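
    The fixed-set case can be checked with a short sketch that evaluates the double sum term by term. The function name, the use of plain index lists, and the value Var(X) = 1 are illustrative assumptions, not something given in the thread.

```python
def fixed_set_covariance(indices_1, indices_2, var_x):
    """Cov(Y1, Y2) for two fixed index sets over i.i.d. X's.

    Evaluates sum_i sum_j Cov(X_i, X_j) directly: a term equals Var(X)
    when both sums use the same variable and 0 otherwise (independence).
    """
    total = 0.0
    for i in indices_1:
        for j in indices_2:
            total += var_x if i == j else 0.0
    return total


# The example from the post: m_1 = 5, m_2 = 7, overlap {X_1, X_3, X_6}.
y1_indices = [1, 3, 6, 8, 9]
y2_indices = [1, 2, 3, 6, 10, 12, 15]
overlap_size = len(set(y1_indices) & set(y2_indices))
cov = fixed_set_covariance(y1_indices, y2_indices, var_x=1.0)
```

    With these index sets only the three shared variables contribute, so the double sum reduces to the overlap count times Var(X).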

    Now this formula, [tex]a*Var(X)[/tex], works when [tex]m_1[/tex] and [tex]m_2[/tex] are not random variables, but when they are random variables it doesn't work anymore. When they are random variables, I know what the expected values of [tex]m_1[/tex] and [tex]m_2[/tex] are going to be, and also know what the expected number of overlapping elements will be, call this [tex]a[/tex].

    So from this information, anyone know how to get the expression for [tex]Cov(Y_1,Y_2)[/tex] when [tex]m_1[/tex] and [tex]m_2[/tex] are random variables??
    Last edited: Dec 31, 2007
  5. Jan 1, 2008 #4



    To simplify, suppose you have X1, X2.

    Then m = 1 or 2, and n = 1 or 2 (writing m and n for the two sum sizes, i.e. your [tex]m_1[/tex] and [tex]m_2[/tex]).

    If m = 1 then Y1 is X1 or X2. If m = 2 then Y1 is X1+X2.

    Similarly if n = 1 then Y2 is X1 or X2. If n = 2 then Y2 is X1+X2.

    If you can make a table of these possible outcomes and assign a probability to each, you can calculate a probability-weighted average of the covariance formulas for each possible case.
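
    A sketch of that table for the two-variable case, enumerated exactly with fractions. Assumptions not stated in the thread: the two sizes follow a joint distribution `size_pmf` picked here purely for illustration, every subset of a given size is equally likely, the two selections are independent given the sizes, and the selection never depends on the values of the X's. Conditioning on the chosen sets (the law of total covariance) then gives [tex]Cov(Y_1,Y_2) = E[\text{overlap}]\,Var(X) + Cov(m_1,m_2)\,E[X]^2[/tex], which reduces to [tex]a*Var(X)[/tex] only when [tex]E[X]=0[/tex] or the sizes are uncorrelated. The function name and all numbers are hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def random_sum_covariance(n, size_pmf, mu, var):
    """Enumerate every (size pair, subset pair) outcome and return
    (E[overlap], Cov(m1, m2), Cov(Y1, Y2) computed directly,
     Cov(Y1, Y2) from the total-covariance formula).

    size_pmf maps (m1, m2) -> probability; mu, var are E[X] and Var(X).
    """
    e_overlap = e_m1 = e_m2 = e_m1m2 = e_y1y2 = Fraction(0)
    for (m1, m2), p_sizes in size_pmf.items():
        sets1 = [frozenset(c) for c in combinations(range(n), m1)]
        sets2 = [frozenset(c) for c in combinations(range(n), m2)]
        # Assumed: subsets uniform and chosen independently given the sizes.
        p_pair = p_sizes / (len(sets1) * len(sets2))
        for s1 in sets1:
            for s2 in sets2:
                k = len(s1 & s2)  # number of shared variables
                e_overlap += p_pair * k
                # E[Y1*Y2 | sets]: k shared pairs give E[X^2] = var + mu^2,
                # the remaining m1*m2 - k pairs give E[X]^2 = mu^2.
                e_y1y2 += p_pair * (k * (var + mu**2) + (m1 * m2 - k) * mu**2)
        e_m1 += p_sizes * m1
        e_m2 += p_sizes * m2
        e_m1m2 += p_sizes * m1 * m2
    cov_sizes = e_m1m2 - e_m1 * e_m2
    direct = e_y1y2 - (mu * e_m1) * (mu * e_m2)
    formula = e_overlap * var + cov_sizes * mu**2
    return e_overlap, cov_sizes, direct, formula


half, quarter = Fraction(1, 2), Fraction(1, 4)

# Illustrative case 1: both sums always have the same (random) size.
correlated = {(1, 1): half, (2, 2): half}
a_corr, cs_corr, direct_corr, formula_corr = random_sum_covariance(
    2, correlated, mu=2, var=1)

# Illustrative case 2: sizes independent and uniform on {1, 2}.
independent = {(i, j): quarter for i in (1, 2) for j in (1, 2)}
a_ind, cs_ind, direct_ind, formula_ind = random_sum_covariance(
    2, independent, mu=2, var=1)
```

    In the correlated case the expected overlap is 5/4 but the covariance is 9/4, so [tex]a*Var(X)[/tex] undercounts by [tex]Cov(m_1,m_2)\,E[X]^2 = 1[/tex]. In the independent case the sizes are uncorrelated and both computations agree with [tex]a*Var(X) = 9/8[/tex].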