# Mean, variance and correlation - Multinomial distribution

Hello everyone, I'm stuck on an elementary stochastics problem. I have to calculate the means, variances and covariance of two random variables.

## Homework Statement

Let r,g,b∈ℕ. r red, g green and b black balls are placed in an urn.
The balls are then drawn one at a time with replacement, until a black ball is picked for the first time. (1)
X counts the number of red balls drawn, and Y the number of green ones, before the first black ball is picked.
Find EX, EY, Var(X), Var(Y) and ρ(X,Y)=cov(X,Y)/(σ_X σ_Y).

## Homework Equations

Expectation value: $E(X)=\sum_{i∈I}x_iP(X=x_i)$
Multinomial distribution for 3 different balls: $P(X=x, Y=y, Z=z)=\frac{n!}{x!y!z!}p_1^xp_2^yp_3^z$, with $n=x+y+z$

## The Attempt at a Solution

The probabilities of drawing a red, green and black ball are $p_1=\frac{r}{r+g+b}$, $p_2=\frac{g}{r+g+b}$ and $p_3=\frac{b}{r+g+b}$ respectively.
I thought X and Y were binomially distributed and would therefore have the means $EX=np_1$ and $EY=np_2$, but then condition (1) isn't used.
I then thought about calculating $E(X)=\sum_{i∈I}x_iP(X=x, Y=y, Z=1)$ with $P(X=x, Y=y, Z=1)=\frac{(x+y+1)!}{x!y!1!}p_1^xp_2^yp_3$, where I let Z denote the number of black balls drawn.
But I got stuck again, not knowing which variable to sum over, and I think it's wrong to write $E(X)$ because the sum is not only over the random variable X.

Any tips are much appreciated
Alex

## Answers and Replies

Ray Vickson

Since you are drawing with replacement, the number of draws ##Z## before the first black has a (modified) geometric distribution with success probability ##p_b = b/(r+g+b)##; that is, ##P(Z = k) = (1-p_b)^k p_b, \; k = 0,1,2, \ldots##. Write ##p_r = r/(r+g+b)## and ##p_g = g/(r+g+b)## for the probabilities of red and green (your ##p_1## and ##p_2##). Given ##Z = k , k \in \{0,1,2, \ldots \}##, can you see why the pair ##(X,Y)## given ##Z =k## has the binomial distribution ##X \sim \text{Bin}(k,p_r/(p_r + p_g))##? That is, ##X## is binomial and ##Y = k-X##. Using this, you can easily get ##E(X|Z=k)##, ##E(Y|Z=k)##, ##\text{Var}(X|Z=k)## and ##\text{Var}(Y|Z=k)##. Then, by "unconditioning" you can get ##EX, EY, \text{Var}(X), \text{Var}(Y)##. I'll let you worry about how to get ##\text{Cov}(X,Y)## by a conditioning-unconditioning argument.

BTW: it is not wrong to write ##E(X)## or ##\text{Var}(X)##; these are just the mean and variance of the marginal distribution of ##X##.
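
As a sanity check on the setup described above, here is a minimal Monte Carlo sketch of the urn experiment. The function names (`simulate_urn`, `sample_stats`), the trial count and the seed are illustrative choices, not part of the thread; the code only estimates the requested quantities, so it can be compared against whatever closed forms the conditioning argument produces.

```python
import random


def simulate_urn(r, g, b, trials=100_000, seed=0):
    """Draw with replacement until the first black ball.

    Returns two lists holding X (reds seen) and Y (greens seen) per trial.
    """
    rng = random.Random(seed)  # fixed seed so runs are reproducible
    total = r + g + b
    xs, ys = [], []
    for _ in range(trials):
        x = y = 0
        while True:
            u = rng.randrange(total)  # pick one of the r+g+b balls uniformly
            if u < r:
                x += 1                # red ball
            elif u < r + g:
                y += 1                # green ball
            else:
                break                 # black ball: stop drawing
        xs.append(x)
        ys.append(y)
    return xs, ys


def sample_stats(xs, ys):
    """Sample means, variances, covariance and correlation of the two lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs) / n
    var_y = sum((y - mean_y) ** 2 for y in ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    rho = cov / (var_x * var_y) ** 0.5
    return {"mean_x": mean_x, "mean_y": mean_y,
            "var_x": var_x, "var_y": var_y, "cov": cov, "rho": rho}


if __name__ == "__main__":
    xs, ys = simulate_urn(r=2, g=3, b=5)
    for name, value in sample_stats(xs, ys).items():
        print(f"{name}: {value:.4f}")
```

The estimates are noisy, so they are only useful for catching a formula that is clearly off, not for confirming one to many digits.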

Thanks for the fast reply and for the help :)
I would never have thought of that. Thought I had chosen the right distribution.

> can you see why the pair ##(X,Y)## given ##Z=k## has the binomial distribution ##X \sim \text{Bin}(k,p_r/(p_r + p_g))##?

Is it because there's either success (drawing the black ball) or failure (drawing a red one)?

> That is, ##X## is binomial and ##Y = k-X##

How do you get this?

Then $E(X|Z=k)=\sum_x x\frac{P(Z=k, X=x)}{P(Z=k)}$?
What is meant by $P(Z=k, X=x)$ here?

haruspex
> Is it because there's either success (drawing the black ball) or failure (drawing a red one)?

No. It's because, having fixed the number of draws before a black is drawn at k (note that Ray calls this Z, which is not how you used Z in the OP), we know that only red and green balls are drawn in the first k draws. So that's just a binary result for each of the k trials.

> What is meant by P(Z=k,X=x) here?

It's the joint probability, ##P(Z=k \text{ and } X=x)##.

Hello
I've gotten this so far:
$$E(X|Z=k)=\sum_{x=0}^\infty xP(X=x|Z=k)=\sum_{x=0}^\infty x\frac{P(X=x,Z=k)}{P(Z=k)}=P(Z=k)^{-1}\sum_{x=0}^\infty xP(X=x,Z=k)= P(Z=k)^{-1}\sum_{x=0}^\infty x \, \text{Bin}(k,p_r/(p_r + p_g))$$
I'm pretty sure the last equality sign is wrong though.
I'm not sure how to handle the joint probability. Those events aren't independent, are they?

Btw, I've also gotten this tip from our tutor:
$EX=\sum_{i=1}^\infty E(X|Z=i)P(Z=i)$
Can I use that in the solution you've outlined for me?

Ray Vickson

The event ##\{ X=x, Z=k\}## occurs when there are ##x## 'reds' and ##k-x## 'greens', followed by a 'black', and that has probability
$$P(X=x,Z=k) = C(k,x)\, p_r^x \, p_g^{k-x} \, p_b$$
Thus, ##P(Z=k) = \sum_{x=0}^k P(X=x, Z=k) = (p_r + p_g)^k \, p_b##, by the binomial expansion of ##(p_r + p_g)^k##. Therefore, we have
$$P(X = x | Z = k) = \frac{P(X=x,Z=k)}{P(Z=k)} = C(k,x)\, \left( \frac{p_r}{p_r+p_g}\right)^x \, \left( \frac{p_g}{p_r+p_g}\right)^{k-x}.$$
That means that ##[X|Z=k] \sim \text{Bin}\,(k, p_r/(p_r+p_g))##, as claimed before. Because ##[X|Z=k]## is binomial, its mean is known right away as ##E(X|Z=k) = kp_r/(p_r+p_g)##. Therefore, ##EX = \sum_{k=0}^{\infty} p_b (p_r+p_g)^k k p_r/(p_r+p_g)##. That last summation is doable.
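
For reference, here is one way the last summation can be carried out. The shorthand ##q = p_r + p_g = 1 - p_b## is introduced here only for brevity and does not appear elsewhere in the thread; the step relies on the standard series ##\sum_{k=0}^{\infty} k q^k = q/(1-q)^2## for ##|q| < 1##.

$$\begin{aligned}
EX &= \sum_{k=0}^{\infty} p_b \, q^k \, \frac{k \, p_r}{q}
    = \frac{p_b \, p_r}{q} \sum_{k=0}^{\infty} k \, q^k \\
   &= \frac{p_b \, p_r}{q} \cdot \frac{q}{(1-q)^2}
    = \frac{p_b \, p_r}{p_b^2}
    = \frac{p_r}{p_b}
    = \frac{r}{b}.
\end{aligned}$$

The same steps with ##p_g## in place of ##p_r## give ##EY = p_g/p_b = g/b##.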

If I were trying to get ##\text{Var}(X)## I would use the fact that ##\text{Var}(X) = E(X^2) - (E X)^2##, then get ##E(X^2)## in a similar manner as ##EX##: ##E(X^2|Z=k)## is the mean of a squared binomial random variable, and that is related to its variance and mean as ##E(X^2|Z=k) = \text{Var}(X|Z=k) + [E(X|Z=k)]^2##.
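
The route just described for ##\text{Var}(X)## can be checked numerically by truncating the sum over ##k##. This is only a sketch: the function name `moments_by_conditioning` and the cutoff `k_max` are assumptions made for illustration, and the truncation is accurate only when ##(p_r+p_g)^{k_{max}}## is negligible.

```python
def moments_by_conditioning(r, g, b, k_max=1000):
    """Approximate E(X) and Var(X) by summing over Z = k up to k_max.

    Uses P(Z=k) = p_b * (p_r + p_g)**k and the binomial conditional
    moments of X given Z = k. Assumes r, g, b are positive integers.
    """
    total = r + g + b
    p_r, p_g, p_b = r / total, g / total, b / total
    q = p_r + p_g            # probability that a draw is not black
    pi = p_r / q             # probability of red, given the draw is not black

    ex = 0.0                 # running sum for E(X)
    ex2 = 0.0                # running sum for E(X^2)
    for k in range(k_max + 1):
        p_z = p_b * q ** k                   # P(Z = k)
        cond_mean = k * pi                   # E(X | Z = k)
        cond_var = k * pi * (1 - pi)         # Var(X | Z = k)
        ex += p_z * cond_mean
        ex2 += p_z * (cond_var + cond_mean ** 2)  # E(X^2 | Z = k)

    return ex, ex2 - ex ** 2                 # E(X), Var(X) = E(X^2) - (EX)^2


if __name__ == "__main__":
    mean_x, var_x = moments_by_conditioning(r=2, g=3, b=5)
    print(mean_x, var_x)
```

Swapping the roles of `r` and `g` in the call gives the corresponding values for ##Y##.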

Note added in edit: Alternatively, you can try to compute the marginal distribution of ##X## directly as
$$P(X=x) = \sum_{k=x}^{\infty} P(Z=k) P(X=x|Z=k)$$

It also turns out that you can almost immediately write down the final expression for ##P(X = x)## without any complicated evaluations, provided that you reason very carefully (and quite subtly) about the nature of the event ##\{ X = x \}##.
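
To experiment with the marginal distribution, the sum over ##k## can be evaluated numerically and compared with a candidate closed form. In the sketch below, `marginal_x` implements the truncated sum; `geometric_candidate` is this sketch's own guess, not something stated in the thread: it assumes that green draws can be ignored entirely, so that ##X## counts reds before the first black in a red/black-only sequence. Both function names and the cutoff `k_max` are hypothetical.

```python
from math import comb


def marginal_x(x, r, g, b, k_max=400):
    """P(X = x) as the truncated sum over k of P(Z = k) * P(X = x | Z = k).

    Each term equals C(k, x) * p_r**x * p_g**(k - x) * p_b.
    """
    total = r + g + b
    p_r, p_g, p_b = r / total, g / total, b / total
    return sum(
        comb(k, x) * p_r ** x * p_g ** (k - x) * p_b
        for k in range(x, k_max + 1)
    )


def geometric_candidate(x, r, b):
    """Candidate closed form (an assumption of this sketch).

    Ignoring green draws, each relevant draw is black with probability
    b / (r + b), so X would be geometric on {0, 1, 2, ...}.
    """
    s = b / (r + b)
    return (1 - s) ** x * s


if __name__ == "__main__":
    for x in range(5):
        print(x, marginal_x(x, 2, 3, 5), geometric_candidate(x, 2, 5))
```

If the two columns agree for several choices of r, g and b, that is good evidence for the candidate, though not a proof.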
