
Sum of squares of correlated Bernoulli rvs

  1. Jan 25, 2012 #1
    Let X, Y be two Bernoulli random variables (respective success probabilities p_X, p_Y) with Pearson correlation coefficient c.

    (To be precise, these random variables correspond to the truth tables of Boolean functions f_X and f_Y. That is, for a randomly chosen n-bit input i, X = 0 iff f_X(i) = 0.)

    Let us assume that there exist two counters C_X, C_Y, each of which we initialise with the value 0.

    There follows a sequence of m experiments. In each experiment, we generate a random n-bit input i, and calculate f_X(i) and f_Y(i).

    If f_X(i) = 0, we add 1 to C_X. If f_X(i) = 1, we subtract 1 from C_X (which can therefore take negative values). In other words, we add (-1)^(f_X(i)) to C_X.
    Similarly, we add (-1)^(f_Y(i)) to C_Y.
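The counter procedure above can be sketched in a few lines of Python. This is only an illustration: `run_counters`, `f_x` and `f_y` are hypothetical names, and the two example Boolean functions are placeholders rather than the functions from my actual problem.

```python
import random

def run_counters(f_x, f_y, n_bits, m, seed=0):
    """Run m experiments and return the final counter values (C_X, C_Y)."""
    rng = random.Random(seed)
    c_x = c_y = 0
    for _ in range(m):
        i = rng.getrandbits(n_bits)        # uniformly random n-bit input
        c_x += 1 if f_x(i) == 0 else -1    # i.e. add (-1)^f_X(i)
        c_y += 1 if f_y(i) == 0 else -1    # i.e. add (-1)^f_Y(i)
    return c_x, c_y

# Placeholder Boolean functions (assumptions, for illustration only).
def f_x(i):
    return i & 1                  # lowest input bit

def f_y(i):
    return (i & 1) & ((i >> 1) & 1)   # AND of the two lowest input bits

if __name__ == "__main__":
    print(run_counters(f_x, f_y, n_bits=8, m=1000))
```

Each counter always has the same parity as m, since it equals m minus twice the number of experiments in which the function evaluated to 1.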

    The distribution of each of C_X and C_Y is essentially Binomial: we would have to subtract m from each and divide by -2 to obtain the actual Binomially distributed variables, i.e. (m - C_X)/2 and (m - C_Y)/2. This suggests that, if m is sufficiently large, C_X and C_Y are approximately Normally distributed. If c = 0, it would follow that the sum of their squares (once each counter has been divided by its standard deviation) has a noncentral chi-squared distribution.
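For reference, the moments behind the Normal approximation can be written out explicitly. This is a sketch under two assumptions that are not stated above: that p_X means P(X = 1), and that the m inputs are drawn independently. B_X is introduced here as a name for the number of experiments in which f_X(i) = 1.

```latex
\begin{align*}
B_X &\sim \mathrm{Binomial}(m, p_X), \qquad C_X = m - 2B_X, \\
\mathbb{E}[C_X] &= m(1 - 2p_X), \\
\operatorname{Var}(C_X) &= 4m\,p_X(1 - p_X), \\
\operatorname{Cov}(C_X, C_Y) &= 4m\,c\,\sqrt{p_X(1 - p_X)\,p_Y(1 - p_Y)}.
\end{align*}
```

So C_X and C_Y have the same correlation coefficient c as X and Y, and in general they have nonzero means and unequal variances.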

    My first question is: what is the distribution of (C_{X}^2 + C_{Y}^2) if c is not zero, i.e. if X and Y are not independent? And is there any way in which it is related to the noncentral chi-squared distribution?

    (If [C_{X}^2 + (some function of c and C_{Y}^2)] is noncentral chi-square distributed, this would be especially helpful)
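To make the first question concrete, here is a minimal Monte Carlo sketch that draws correlated Bernoulli pairs directly, rather than going through Boolean functions, and collects samples of C_X^2 + C_Y^2. The names `joint_pmf` and `sample_sum_of_squares` are hypothetical, and p_X is again assumed to mean P(X = 1). It only produces an empirical distribution; it does not answer the question of the closed form.

```python
import math
import random

def joint_pmf(p_x, p_y, c):
    """Joint pmf {(x, y): prob} of two Bernoulli variables with
    marginals p_x, p_y (probability of a 1) and Pearson correlation c."""
    cov = c * math.sqrt(p_x * (1 - p_x) * p_y * (1 - p_y))
    p11 = p_x * p_y + cov
    pmf = {
        (1, 1): p11,
        (1, 0): p_x - p11,
        (0, 1): p_y - p11,
        (0, 0): 1 - p_x - p_y + p11,
    }
    if min(pmf.values()) < -1e-12:
        raise ValueError("this (p_x, p_y, c) combination is not achievable")
    return pmf

def sample_sum_of_squares(p_x, p_y, c, m, trials, seed=0):
    """Return `trials` independent samples of C_X^2 + C_Y^2."""
    rng = random.Random(seed)
    pmf = joint_pmf(p_x, p_y, c)
    outcomes = list(pmf)
    weights = [max(0.0, pmf[o]) for o in outcomes]  # clamp rounding noise
    samples = []
    for _ in range(trials):
        c_x = c_y = 0
        for x, y in rng.choices(outcomes, weights=weights, k=m):
            c_x += 1 - 2 * x   # +1 if x == 0, -1 if x == 1
            c_y += 1 - 2 * y
        samples.append(c_x ** 2 + c_y ** 2)
    return samples

if __name__ == "__main__":
    s = sample_sum_of_squares(0.5, 0.5, -0.6, m=200, trials=2000)
    print(sum(s) / len(s))
```

Not every combination of marginals and correlation is achievable for Bernoulli variables, which is why `joint_pmf` rejects inputs that would give a negative cell probability. A histogram of the returned samples for several values of c would show how the shape changes with the correlation.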

    Now, let us complicate the issue by introducing a third Bernoulli variable, Z. Let c_xy be the correlation coefficient of X and Y, c_xz be that of X and Z, etc. Let f_Z, C_Z be defined in terms of Z in the same way as before.

    (I've got an actual example in which c_xy = -0.6, c_xz = 0.467, c_yz = -0.6.)

    My second question is: what is the probability distribution of (C_{X}^{2} + C_{Y}^{2} + C_{Z}^{2})?

    As you may have guessed, my final question is: when we generalise this to the case of k different Bernoulli r.v.s (X_1, ..., X_k), what is the probability distribution of (C_{X_{1}}^2 + ... + C_{X_{k}}^2)?

    Many thanks,

    James McLaughlin.