# Regarding indicators in Statistics.

1. Jun 9, 2013

### peripatein

Hi,
1. The problem statement, all variables and given/known data
$n$ distinct balls are distributed independently among $m$ boxes, each with unlimited capacity. I am asked to find the expectation and variance of the number of empty boxes.

2. Relevant equations

3. The attempt at a solution
The probability that the i-th box is empty at the end is $(1-1/m)^n$. Ergo, $E[X_i] = P(X_i) = (1-1/m)^n$. Hence, $E[X] = m(1-1/m)^n$.
As for the variance, I used $\mathrm{Var}(X_i) = E[X_i]\left(1-E[X_i]\right) = (1-1/m)^n\left(1-(1-1/m)^n\right)$. Therefore, $\mathrm{Var}(X) = m\,\mathrm{Var}(X_i)$.
Is that correct?
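One way to test the attempt above on a small case is to enumerate every placement exactly. The sketch below assumes each ball goes to each box with probability $1/m$, independently of the other balls (the assumption behind $(1-1/m)^n$); the function name `empty_box_moments` is illustrative, not from the thread. For $m = 3$, $n = 4$ it confirms $E[X] = m(1-1/m)^n$, but the enumerated variance differs from $m\,\mathrm{Var}(X_i)$.

```python
from fractions import Fraction
from itertools import product

def empty_box_moments(m, n):
    """Exact E[X] and Var(X) for X = number of empty boxes, found by
    enumerating all m**n equally likely placements of n labelled balls."""
    total = 0
    total_sq = 0
    count = 0
    for placement in product(range(m), repeat=n):
        empty = m - len(set(placement))  # boxes that received no ball
        total += empty
        total_sq += empty * empty
        count += 1
    mean = Fraction(total, count)
    var = Fraction(total_sq, count) - mean * mean
    return mean, var

m, n = 3, 4
mean, var = empty_box_moments(m, n)
p = Fraction(m - 1, m) ** n        # P(X_i = 1) = (1 - 1/m)^n
print(mean == m * p)               # → True
print(var, m * p * (1 - p))        # → 230/729 1040/2187
```

Exact `Fraction` arithmetic avoids sampling noise; the cost is $m^n$ iterations, so this only suits small cases.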

2. Jun 9, 2013

### peripatein

I am wondering why no one has yet replied. Is my formulation inappropriate/incomprehensible?

3. Jun 9, 2013

### Office_Shredder

Staff Emeritus
If the $X_i$ were independent, then your variance would be correct. They aren't, though, so you will have terms like $E(X_iX_j)$ that you will have to deal with (luckily it's not too hard to calculate these) when you expand $E(X^2)$.
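A sketch of the expansion this hint refers to, assuming each ball lands in any given box with probability $1/m$ independently of the other balls, and writing $X_i = 1$ if box $i$ ends up empty and $X_i = 0$ otherwise. For $i \neq j$, both boxes are empty exactly when every ball avoids both of them, and $X_i^2 = X_i$ because $X_i$ is an indicator:

$$E[X_iX_j] = P(X_i = 1,\ X_j = 1) = \left(1-\frac{2}{m}\right)^n, \qquad i \neq j$$

$$E[X^2] = \sum_{i} E[X_i^2] + \sum_{i \neq j} E[X_iX_j] = m\left(1-\frac{1}{m}\right)^n + m(m-1)\left(1-\frac{2}{m}\right)^n$$

Subtracting $(E[X])^2 = m^2\left(1-\frac{1}{m}\right)^{2n}$ then gives $\mathrm{Var}(X)$.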

4. Jun 9, 2013

### peripatein

So should it be:
$m(1-1/m)^n\left(1-(1-1/m)^n\right) + 2m(1-2/m)^n$?
(I used the expression for the covariance.)
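To check a candidate covariance term, the exact covariance of two of the indicators can be computed for a small case by enumeration. This is a sketch assuming uniform, independent placement of each ball and $m \ge 2$; `pair_covariance` is a hypothetical helper name. It agrees with $(1-2/m)^n - (1-1/m)^{2n}$, which is negative, so the covariance is not $(1-2/m)^n$ on its own.

```python
from fractions import Fraction
from itertools import product

def pair_covariance(m, n):
    """Exact Cov(X_1, X_2) for the indicators 'box 0 is empty' and
    'box 1 is empty', by enumerating all m**n equally likely placements."""
    count = e1 = e2 = e12 = 0
    for placement in product(range(m), repeat=n):
        x1 = 0 not in placement   # True if box 0 received no ball
        x2 = 1 not in placement   # True if box 1 received no ball
        e1 += x1
        e2 += x2
        e12 += x1 and x2          # both boxes empty
        count += 1
    return Fraction(e12, count) - Fraction(e1, count) * Fraction(e2, count)

m, n = 3, 4
closed_form = Fraction(m - 2, m) ** n - Fraction(m - 1, m) ** (2 * n)
print(pair_covariance(m, n) == closed_form)  # → True
```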

5. Jun 9, 2013

### Ray Vickson

Not incomprehensible, just sloppy and incomplete. What is the meaning of $X_i$? If you mean that $X_i = 1$ if box i is empty and $X_i = 0$ if box i is not empty, then you should say so. Also, the notation $P(X_i)$ is meaningless; if you mean $P(X_i = 1)$ you should write that.

6. Jun 9, 2013

### peripatein

I am sorry, I was using my mobile to post that one. I did mean everything you thought I might have meant.
I'd very much appreciate it if you could comment on my attempt at a solution now, both on this one and on the Statistics problem I posted earlier.

7. Jun 10, 2013

### peripatein

I'd truly appreciate some feedback on my recent attempt at a solution, namely:
The probability that the i-th box is empty at the end is $(1-1/m)^n$. Ergo, $E[X_i] = P(X_i) = (1-1/m)^n$. Hence, $E[X] = m(1-1/m)^n$.
As for the variance, I used the following:
$\mathrm{Var}(X) = m\,\mathrm{Var}(X_i) + 2\sum_{i<j}\mathrm{Cov}(X_i,X_j) = m(1-1/m)^n\left(1-(1-1/m)^n\right) + \binom{m}{2}(1-2/m)^n$
I am not sure this is correct but would certainly appreciate any comments.
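Under the indicator setup above, with each ball placed uniformly and independently, the sum $2\sum_{i<j}$ contains $2\binom{m}{2} = m(m-1)$ covariance terms, each equal to $(1-2/m)^n - (1-1/m)^{2n}$. The sketch below, with illustrative function names, evaluates the resulting closed form $m p_1(1-p_1) + m(m-1)(p_2 - p_1^2)$, where $p_1 = (1-1/m)^n$ and $p_2 = (1-2/m)^n$, next to the expression proposed in this post. For $m = 3$, $n = 4$ the closed form gives $230/729$, which matches a brute-force enumeration of all $3^4 = 81$ placements.

```python
from fractions import Fraction
from math import comb

def variance_closed_form(m, n):
    """Var(X) = m*p1*(1 - p1) + m*(m - 1)*(p2 - p1**2), where
    p1 = (1 - 1/m)**n = P(a given box is empty) and
    p2 = (1 - 2/m)**n = P(two given boxes are both empty)."""
    p1 = Fraction(m - 1, m) ** n
    p2 = Fraction(m - 2, m) ** n
    return m * p1 * (1 - p1) + m * (m - 1) * (p2 - p1 ** 2)

def variance_as_proposed(m, n):
    """The expression written in the post above, transcribed literally."""
    p1 = Fraction(m - 1, m) ** n
    p2 = Fraction(m - 2, m) ** n
    return m * p1 * (1 - p1) + comb(m, 2) * p2

print(variance_closed_form(3, 4))   # → 230/729
print(variance_as_proposed(3, 4))   # → 1121/2187
```

A quick sanity check: with $m = 2$, $n = 1$ exactly one box is always empty, so the variance must be $0$, and `variance_closed_form(2, 1)` returns `0`.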
