Regarding indicators in Statistics.

1. Jun 9, 2013

peripatein

Hi,
1. The problem statement, all variables and given/known data
n distinct balls are distributed independently among m boxes, each with unlimited capacity. I am asked to find the expectation and variance of the number of empty boxes.

2. Relevant equations

3. The attempt at a solution
The probability of the i-th box being empty at the end is $(1-1/m)^n$. Ergo, $E[X_i] = P(X_i) = (1-1/m)^n$. Hence, $E[X] = m(1-1/m)^n$.
As for the variance, I used $\mathrm{Var}(X_i) = E[X_i](1-E[X_i]) = (1-1/m)^n\left(1-(1-1/m)^n\right)$. Therefore, $\mathrm{Var}(X) = m\,\mathrm{Var}(X_i)$.
Is that correct?
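The expectation part can be sanity-checked by brute force for small n and m. This is a minimal sketch, assuming every ball independently lands in each box with probability 1/m, so all m^n placements are equally likely; the helper names are hypothetical. It averages the number of empty boxes over every placement and compares that with $m(1-1/m)^n$, using exact fractions to avoid rounding.

```python
from fractions import Fraction
from itertools import product


def empty_boxes(assignment, m):
    """Count boxes 0..m-1 that received no ball under one placement."""
    return m - len(set(assignment))


def mean_by_enumeration(n, m):
    """Exact E[X]: average over all m**n equally likely placements of n balls."""
    total = sum(empty_boxes(a, m) for a in product(range(m), repeat=n))
    return Fraction(total, m ** n)


def mean_formula(n, m):
    """E[X] = m * (1 - 1/m)**n, by linearity over the indicators X_i."""
    return m * (1 - Fraction(1, m)) ** n


# Hypothetical small cases; enumeration cost grows like m**n.
for n, m in [(1, 2), (3, 3), (4, 5)]:
    assert mean_by_enumeration(n, m) == mean_formula(n, m)
print(mean_formula(3, 3))  # 8/9
```

Linearity of expectation does not need independence of the X_i, which is why this part agrees exactly.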

2. Jun 9, 2013

peripatein

I am wondering why no one has yet replied. Is my formulation inappropriate/incomprehensible?

3. Jun 9, 2013

Office_Shredder

Staff Emeritus
If the $X_i$ were independent, then your variance would be correct. They aren't, though, so when you expand $E(X^2)$ you will have terms like $E(X_iX_j)$ that you have to deal with (luckily these are not too hard to calculate).
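Carrying out that expansion, as a sketch assuming $X_i$ is the indicator that box i is empty, $X = X_1 + \dots + X_m$, and each ball lands in each box with probability 1/m: since $X_i^2 = X_i$, and $X_iX_j = 1$ exactly when every ball avoids both boxes i and j, one gets

```latex
E[X_i] = \left(1 - \tfrac{1}{m}\right)^n, \qquad
E[X_i X_j] = P(X_i = 1,\, X_j = 1) = \left(1 - \tfrac{2}{m}\right)^n \quad (i \neq j)

E[X^2] = \sum_{i} E[X_i^2] + \sum_{i \neq j} E[X_i X_j]
       = m\left(1 - \tfrac{1}{m}\right)^n + m(m-1)\left(1 - \tfrac{2}{m}\right)^n

\operatorname{Var}(X) = E[X^2] - (E[X])^2
  = m\left(1 - \tfrac{1}{m}\right)^n + m(m-1)\left(1 - \tfrac{2}{m}\right)^n - m^2\left(1 - \tfrac{1}{m}\right)^{2n}
```

The ordered-pair sum $\sum_{i \neq j}$ has $m(m-1)$ terms, which is the same as $2\sum_{i<j}$.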

4. Jun 9, 2013

peripatein

So should it be:
$m(1-1/m)^n\left(1-(1-1/m)^n\right) + 2m(1-2/m)^n$?
(I used the expression for the covariance.)

5. Jun 9, 2013

Ray Vickson

Not incomprehensible, just sloppy and incomplete. What is the meaning of $X_i$? If you mean that $X_i = 1$ if box i is empty and $X_i = 0$ if box i is not empty, then you should say so. Also, the notation $P(X_i)$ is meaningless; if you mean $P(X_i = 1)$ you should write that.

6. Jun 9, 2013

peripatein

I am sorry, I was using my mobile to post that one. I did mean everything you thought I might have meant.
I'd very much appreciate it if you could comment on my attempt at solution now.
Actually, both this one and the Statistics problem I posted earlier.

7. Jun 10, 2013

peripatein

I'd truly appreciate some feedback on my recent attempt at solution, namely:
The probability of the i-th box being empty at the end is $(1-1/m)^n$. Ergo, $E[X_i] = P(X_i) = (1-1/m)^n$. Hence, $E[X] = m(1-1/m)^n$.
As for the variance, I used the following:
$\mathrm{Var}(X) = m\,\mathrm{Var}(X_i) + 2\sum_{i<j}\mathrm{Cov}(X_i,X_j) = m(1-1/m)^n\left(1-(1-1/m)^n\right) + \binom{m}{2}(1-2/m)^n$
I am not sure this is correct but would certainly appreciate any comments.
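One way to test a candidate variance formula is to compare it with exact enumeration over all m^n placements for small n and m. The sketch below assumes each ball lands in each box with probability 1/m, and its helper names are hypothetical. It uses $\mathrm{Var}(X) = m\,\mathrm{Var}(X_i) + m(m-1)\,\mathrm{Cov}(X_i,X_j)$ with $\mathrm{Cov}(X_i,X_j) = E[X_iX_j] - E[X_i]E[X_j] = (1-2/m)^n - (1-1/m)^{2n}$, and also evaluates the expression from the post above for comparison.

```python
from fractions import Fraction
from itertools import product


def empty_boxes(assignment, m):
    """Count boxes 0..m-1 that received no ball under one placement."""
    return m - len(set(assignment))


def variance_by_enumeration(n, m):
    """Exact Var(X) over all m**n equally likely placements."""
    counts = [empty_boxes(a, m) for a in product(range(m), repeat=n)]
    total = m ** n
    mean = Fraction(sum(counts), total)
    second_moment = Fraction(sum(c * c for c in counts), total)
    return second_moment - mean ** 2


def variance_formula(n, m):
    """Var(X) = m*Var(X_i) + m(m-1)*Cov(X_i, X_j) for i != j."""
    p1 = (1 - Fraction(1, m)) ** n  # P(X_i = 1)
    p2 = (1 - Fraction(2, m)) ** n  # P(X_i = 1 and X_j = 1)
    return m * p1 * (1 - p1) + m * (m - 1) * (p2 - p1 ** 2)


def posted_expression(n, m):
    """The expression from the post above, for comparison."""
    p1 = (1 - Fraction(1, m)) ** n
    p2 = (1 - Fraction(2, m)) ** n
    return m * p1 * (1 - p1) + Fraction(m * (m - 1), 2) * p2


for n, m in [(1, 2), (3, 3), (4, 5)]:
    assert variance_by_enumeration(n, m) == variance_formula(n, m)
print(variance_formula(1, 2), posted_expression(1, 2))  # 0 1/2
```

With one ball and two boxes, exactly one box is always empty, so $\mathrm{Var}(X) = 0$. The enumeration and the covariance-based formula give 0, while the posted expression gives 1/2, which suggests the covariance term should be $m(m-1)\left[(1-2/m)^n - (1-1/m)^{2n}\right]$ rather than $\binom{m}{2}(1-2/m)^n$.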