# Sum of squares of correlated Bernoulli rvs

Jan 25, 2012

### jamie_m

Let X, Y be two Bernoulli random variables (respective success probabilities p_X, p_Y) with Pearson correlation coefficient c.

(To be precise, these random variables correspond to the truth tables of Boolean functions f_X and f_Y. That is, for a randomly chosen n-bit input i, X = 0 iff f_X(i) = 0.)

Let us assume that there exist two counters C_X, C_Y, each of which we initialise with the value 0.

There follows a sequence of m experiments. In each experiment, we generate a random n-bit input i, and calculate f_X(i) and f_Y(i).

If f_X(i) = 0, we add 1 to C_X. If f_X(i) = 1, we subtract 1 from C_X (which can therefore take negative values). In other words, we add (-1)^(f_X(i)) to C_X.
Similarly, we add (-1)^(f_Y(i)) to C_Y.
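To make the setup concrete, here is a minimal simulation sketch of the two counters. It is an illustration under assumptions, not the actual Boolean functions: instead of drawing an n-bit input and evaluating f_X and f_Y, it samples the pair (X, Y) directly from a joint distribution with the given p_X, p_Y and c. The names `joint_pmf` and `run_counters` are hypothetical helpers introduced only for this example.

```python
import math
import random


def joint_pmf(p_x, p_y, c):
    """Joint pmf of Bernoulli X, Y with success probabilities p_x, p_y
    and Pearson correlation c, as a dict keyed by (x, y)."""
    # Cov(X, Y) = c * sd(X) * sd(Y), and P(X=1, Y=1) = p_x * p_y + Cov(X, Y).
    cov = c * math.sqrt(p_x * (1 - p_x) * p_y * (1 - p_y))
    p11 = p_x * p_y + cov
    pmf = {
        (1, 1): p11,
        (1, 0): p_x - p11,
        (0, 1): p_y - p11,
        (0, 0): 1 - p_x - p_y + p11,
    }
    # Not every (p_x, p_y, c) triple is achievable by a pair of Bernoullis.
    if min(pmf.values()) < -1e-12:
        raise ValueError("no joint distribution exists for these parameters")
    return pmf


def run_counters(p_x, p_y, c, m, rng):
    """Run m experiments and return the final counters (C_X, C_Y)."""
    pmf = joint_pmf(p_x, p_y, c)
    outcomes = list(pmf)
    weights = [max(w, 0.0) for w in pmf.values()]  # clip tiny rounding noise
    c_x = c_y = 0
    for x, y in rng.choices(outcomes, weights=weights, k=m):
        c_x += 1 - 2 * x  # (-1)^x: +1 if x == 0, -1 if x == 1
        c_y += 1 - 2 * y
    return c_x, c_y


if __name__ == "__main__":
    rng = random.Random(2012)
    c_x, c_y = run_counters(p_x=0.5, p_y=0.4, c=0.3, m=10_000, rng=rng)
    print(c_x, c_y, c_x**2 + c_y**2)
```

Repeating `run_counters` many times and collecting C_X^2 + C_Y^2 gives an empirical distribution against which any candidate closed form can be checked.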

Each of C_X and C_Y is an affine transform of a Binomial variable: (C_X - m)/(-2) counts the experiments in which f_X(i) = 1, so it is Binomial(m, p_X), and likewise for C_Y. This suggests that if m is sufficiently large, C_X and C_Y are approximately Normally distributed. If c = 0, it would follow that the sum of their squares (after scaling each counter to unit variance) has a noncentral chi-squared distribution.
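For reference, the first two moments of the counters follow directly from the update rule. This is a sketch that assumes the m experiments are independent; X_t and Y_t are notation introduced here for the values of X and Y in experiment t.

```latex
\begin{align*}
C_X &= \sum_{t=1}^{m} (1 - 2X_t), \qquad C_Y = \sum_{t=1}^{m} (1 - 2Y_t), \\
\mathrm{E}[C_X] &= m\,(1 - 2p_X), \\
\mathrm{Var}(C_X) &= 4m\,p_X(1 - p_X), \\
\mathrm{Cov}(C_X, C_Y) &= 4m\,\mathrm{Cov}(X, Y) = 4m\,c\,\sqrt{p_X(1 - p_X)\,p_Y(1 - p_Y)}.
\end{align*}
```

Since Var(C_X) and Var(C_Y) generally differ from 1 and from each other, it is the standardised squares C_X^2 / Var(C_X) and C_Y^2 / Var(C_Y) that are approximately noncentral chi-squared with one degree of freedom each.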

My first question is: what is the distribution of (C_{X}^2 + C_{Y}^2) if c is not zero, i.e. if X and Y are not independent? And is it related in any way to the noncentral chi-squared distribution?

(If [C_{X}^2 + (some function of c and C_{Y}^2)] is noncentral chi-square distributed, this would be especially helpful)
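One standard fact that seems relevant here, written out as a sketch: if the Normal approximation is taken to hold jointly, so that (C_X, C_Y) is approximately bivariate Normal with mean vector \mu and an invertible covariance matrix \Sigma (which needs |c| < 1 and p_X, p_Y strictly between 0 and 1), then diagonalising \Sigma splits the sum of squares into independent pieces. The symbols \mu, \Sigma, Q, \Lambda, q_j, \lambda_j and W_j are notation introduced for this illustration.

```latex
\begin{align*}
C &= (C_X, C_Y)^{\mathsf T} \approx \mathcal{N}(\mu, \Sigma),
  \qquad \Sigma = Q \Lambda Q^{\mathsf T},
  \quad Q = (q_1, q_2),\ \Lambda = \mathrm{diag}(\lambda_1, \lambda_2), \\
W_j &= \lambda_j^{-1/2}\, q_j^{\mathsf T} C
  \sim \mathcal{N}\!\left(\lambda_j^{-1/2}\, q_j^{\mathsf T} \mu,\ 1\right),
  \qquad W_1, W_2 \text{ independent}, \\
C_X^2 + C_Y^2 &= C^{\mathsf T} C = \lambda_1 W_1^2 + \lambda_2 W_2^2, \\
C^{\mathsf T} \Sigma^{-1} C &= W_1^2 + W_2^2
  \sim \chi'^2_2\!\left(\mu^{\mathsf T} \Sigma^{-1} \mu\right).
\end{align*}
```

Under these assumptions the plain sum of squares is a weighted sum of independent noncentral chi-squared variables (sometimes called a generalised chi-squared distribution), while the quadratic form C^T \Sigma^{-1} C is noncentral chi-squared with 2 degrees of freedom. The same algebra goes through with k counters and a k x k covariance matrix.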

Now, let us complicate the issue by introducing a third Bernoulli variable, Z. Let c_xy be the correlation coefficient of X and Y, c_xz be that of X and Z, etc. Let f_Z, C_Z be defined in terms of Z in the same way as before.

(I've got an actual example in which c_xy = -0.6, c_xz = 0.467, c_yz = -0.6.)

My second question is: what is the probability distribution of (C_{X}^{2} + C_{Y}^{2} + C_{Z}^{2})?

As you may have guessed, my final question is: when we generalise this to the case of k different Bernoulli r.v.s (X_1, ..., X_k), what is the probability distribution of (C_{X_{1}}^2 + ... + C_{X_{k}}^2)?

Many thanks,

James McLaughlin.