# Mean, variance and correlation - Multinomial distribution

1. Jan 12, 2016

### AwesomeTrains

Hello everyone, I'm stuck on an elementary stochastics problem. I have to calculate the means, variances and covariance of two random variables.
1. The problem statement, all variables and given/known data
Let r,g,b∈ℕ. r red, g green and b black balls are placed in an urn.
The balls are then drawn one at a time with replacement, until a black ball is picked for the first time. (1)
X counts the number of red balls and Y the number of the green ones, until a black one is picked.
Find EX, EY, Var(X), Var(Y) and ρ(X,Y)=cov(X,Y)/(σ_X σ_Y).

2. Relevant equations
Expectation value: $E(X)=\sum_{i∈I}x_iP(X=x_i)$
Multinomial distribution for 3 different balls: $P(X=x, Y=y, Z=z)=\frac{n!}{x!y!z!}p_1^xp_2^yp_3^z$, with $n=x+y+z$

3. The attempt at a solution
The probabilities for drawing a red ball, $p_1=\frac{r}{r+g+b}$, green $p_2=\frac{g}{r+g+b}$ and black $p_3=\frac{b}{r+g+b}$
I thought X and Y were binomially distributed and would therefore have the means $EX=np_1$ and $EY=np_2$, but then the condition (1) isn't used.
I then thought about calculating $E(X)=\sum_{i∈I}x_iP(X=x, Y=y, Z=1)$ with $P(X=x, Y=y, Z=1)=\frac{(x+y+1)!}{x!y!1!}p_1^xp_2^yp_3$, where I let Z denote the number of black balls drawn.
But I got stuck again, not knowing what variable to sum for, and I think it's wrong to write $E(X)$ because the sum is not only for the random variable X.

Any tips are much appreciated
Alex

2. Jan 12, 2016

### Ray Vickson

Since you are drawing with replacement, the number of draws $Z$ before the first black has a (modified) geometric distribution with success probability $p_b = b/(r+g+b)$; that is, $P(Z = k) = (1-p_b)^k p_b, \; k = 0,1,2, \ldots \, .$ Here $p_r, p_g, p_b$ are what you called $p_1, p_2, p_3$. Given $Z = k$, $k \in \{0,1,2, \ldots \}$, can you see why $X$ has the binomial distribution $X \sim \text{Bin}(k,p_r/(p_r + p_g))$? That is, $X$ is binomial and $Y = k-X$. Using this, you can easily get $E(X|Z=k)$, $E(Y|Z=k)$, $\text{Var}(X|Z=k)$ and $\text{Var}(Y|Z=k)$. Then, by "unconditioning" you can get $EX, EY, \text{Var}(X), \text{Var}(Y)$. I'll let you worry about how to get $\text{Cov}(X,Y)$ by a conditioning-unconditioning argument.

BTW: it is not wrong to write $E(X)$ or $\text{Var}(X)$; these are just the mean and variance of the marginal distribution of $X$.
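
A numerical check can be handy while working through the conditioning argument. The following is only a minimal Monte Carlo sketch of the urn experiment as stated in the problem; the function names `simulate_once` and `estimate`, the trial count and the seed are illustrative choices, not anything from the thread. It returns empirical estimates that any closed-form answer can be compared against.

```python
import random
from statistics import fmean


def simulate_once(r, g, b, rng):
    """Draw with replacement until the first black ball; return (reds, greens)."""
    total = r + g + b
    x = y = 0
    while True:
        u = rng.randrange(total)  # index of a uniformly chosen ball
        if u < r:                 # indices 0..r-1 are red
            x += 1
        elif u < r + g:           # indices r..r+g-1 are green
            y += 1
        else:                     # the remaining indices are black: stop
            return x, y


def estimate(r, g, b, trials=200_000, seed=0):
    """Empirical means, variances and correlation of (X, Y)."""
    rng = random.Random(seed)     # seeded for reproducibility (assumption)
    samples = [simulate_once(r, g, b, rng) for _ in range(trials)]
    xs = [s[0] for s in samples]
    ys = [s[1] for s in samples]
    mx, my = fmean(xs), fmean(ys)
    vx = fmean([(x - mx) ** 2 for x in xs])
    vy = fmean([(y - my) ** 2 for y in ys])
    cov = fmean([(x - mx) * (y - my) for x, y in samples])
    rho = cov / (vx * vy) ** 0.5
    return {"EX": mx, "EY": my, "VarX": vx, "VarY": vy, "rho": rho}


if __name__ == "__main__":
    print(estimate(3, 2, 5))
```

The estimates carry sampling error of order $1/\sqrt{\text{trials}}$, so they can only confirm a derived formula approximately.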

3. Jan 12, 2016

### AwesomeTrains

Thanks for the fast reply and for the help :)
I would never have thought of that. Thought I had chosen the right distribution.

Is it because there's either success (drawing the black ball) or failure (drawing a red one)?

How do you get this?

Then $E(X|Z=k)=\sum_x x\frac{P(Z=k, X=x)}{P(Z=k)}$?
What is meant by $P(Z=k, X=x)$ here?

4. Jan 13, 2016

### haruspex

No. It's because, having fixed the number of draws before a black is drawn at k (note, Ray calls this Z, which is not how you used Z in the OP), we know that only red and green balls are drawn in the first k draws. So that's just a binary result for each of k trials.
It's the joint probability, P[Z=k&X=x].

5. Jan 14, 2016

### AwesomeTrains

Hello
I've gotten this so far: $E(X|Z=k)=\sum_{x=0}^\infty xP(X=x|Z=k)=\sum_{x=0}^\infty x\frac{P(X=x,Z=k)}{P(Z=k)}=P(Z=k)^{-1}\sum_{x=0}^\infty xP(X=x,Z=k)= P(Z=k)^{-1}\sum_{x=0}^\infty x \text{Bin}(k,p_r/(p_r + p_g) ) \,$ I'm pretty sure the last equality sign is wrong though.
I'm not sure how to handle the joint probability. Those events aren't independent are they?

Btw, I've also gotten this tip from our tutor:
$EX=\sum_{i=1}^\infty E(X|Z=i)P(Z=i)$
Can I use that in the solution you've outlined for me?

6. Jan 14, 2016

### Ray Vickson

The event $\{ X=x, Z=k\}$ occurs when there are $x$ 'reds' and $k-x$ 'greens', followed by a 'black', and that has probability
$$P(X=x,Z=k) = C(k,x)\, p_r^x \, p_g^{k-x} \, p_b$$
Thus, $P(Z=k) = \sum_{x=0}^k P(X=x, Z=k) = (p_r + p_g)^k \, p_b$, by the binomial expansion of $(p_r + p_g)^k$. Therefore, we have
$$P(X = x | Z = k) = \frac{P(X=x,Z=k)}{P(Z=k)} = C(k,x)\, \left( \frac{p_r}{p_r+p_g}\right)^x \, \left( \frac{p_g}{p_r+p_g}\right)^{k-x}.$$
That means that $[X|Z=k] \sim \text{Bin}\,(k, p_r/(p_r+p_g))$, as claimed before. Because $[X|Z=k]$ is binomial, its mean is known right away as $E(X|Z=k) = kp_r/(p_g+p_r)$. Therefore, $EX = \sum_{k=0}^{\infty} p_b (p_r+p_g)^k k p_r/(p_r+p_g)$. That last summation is do-able.
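
One way to carry out the summation for $EX$, sketched here with the shorthand $q = p_r + p_g$ (a symbol introduced only for this illustration, so that $1 - q = p_b$), is to use the standard series $\sum_{k=0}^{\infty} k q^k = q/(1-q)^2$ for $|q| < 1$:

$$EX = \frac{p_b\, p_r}{q} \sum_{k=0}^{\infty} k\, q^k = \frac{p_b\, p_r}{q} \cdot \frac{q}{(1-q)^2} = \frac{p_r}{p_b} = \frac{r}{b}.$$

By the symmetry between red and green, the same steps would give $EY = p_g/p_b = g/b$.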

If I were trying to get $\text{Var}(X)$ I would use the fact that $\text{Var}(X) = E(X^2) - (E X)^2$, then get $E(X^2)$ in a similar manner as $EX$: $E(X^2|Z=k)$ is the mean of a squared binomial random variable, and that is related to its variance and mean as $E(X^2|Z=k) = \text{Var}(X|Z=k) + [E(X|Z=k)]^2$.
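
The unconditioning step for the first two moments can also be checked numerically. The sketch below sums $P(Z=k)\,E(X|Z=k)$ and $P(Z=k)\,E(X^2|Z=k)$ over a truncated range of $k$, using the binomial mean and variance of $[X|Z=k]$. The function name `moments_by_conditioning`, the variables `q` and `theta`, and the cutoff `k_max` are assumptions made for this illustration; the truncation is only adequate when $p_r + p_g$ is not too close to 1.

```python
def moments_by_conditioning(r, g, b, k_max=2000):
    """Approximate EX and Var(X) by summing over Z = k for k = 0..k_max."""
    n = r + g + b
    p_r, p_g, p_b = r / n, g / n, b / n
    q = p_r + p_g          # probability that a single draw is not black
    theta = p_r / q        # P(red | draw is not black)
    ex = ex2 = 0.0
    for k in range(k_max + 1):
        pz = p_b * q ** k                  # P(Z = k)
        mean_k = k * theta                 # E(X | Z = k), binomial mean
        var_k = k * theta * (1 - theta)    # Var(X | Z = k), binomial variance
        ex += pz * mean_k
        ex2 += pz * (var_k + mean_k ** 2)  # E(X^2 | Z = k) = Var + mean^2
    return ex, ex2 - ex ** 2               # (EX, Var(X))


if __name__ == "__main__":
    print(moments_by_conditioning(3, 2, 5))
```

Swapping the roles of `r` and `g` gives the corresponding values for $Y$.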

Note added in edit: Alternatively, you can try to compute the marginal distribution of $X$ directly as
$$P(X=x) = \sum_{k=x}^{\infty} P(Z=k) P(X=x|Z=k)$$

It also turns out that you can almost immediately write down the final expression for $P(X = x)$ without any complicated evaluations, provided that you reason very carefully (and quite subtly) about the nature of the event $\{ X = x \}$.
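
As a sketch of where this leads (one possible route, not necessarily the argument intended above), the marginal sum can be evaluated with the negative binomial series $\sum_{j=0}^{\infty} C(x+j,x)\, t^j = (1-t)^{-(x+1)}$ for $|t|<1$, substituting $j = k - x$ and $t = p_g$:

$$P(X=x) = \sum_{k=x}^{\infty} p_b\, C(k,x)\, p_r^x\, p_g^{k-x} = \frac{p_b\, p_r^x}{(1-p_g)^{x+1}} = \frac{p_b}{p_r+p_b} \left( \frac{p_r}{p_r+p_b} \right)^x, \quad x = 0,1,2,\ldots$$

This is again a (modified) geometric distribution, which is what one would expect if the green draws are simply ignored: among the draws that are red or black, $X$ counts the reds before the first black.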

Last edited: Jan 14, 2016