# Probability question involving picking balls from a bag

1. Nov 30, 2014

### Ryuzaki

I’m working on a chemistry problem that essentially reduces to a related probability problem. However, my knowledge of probability is very limited, and I'd be grateful if someone could help me out with it. The problem is as follows:

Suppose I have a bag containing $70$ red balls and $30$ blue balls. For the purpose of illustration, let’s call them $R$s (red balls) and $B$s (blue balls). Now, I am going to pick one ball at a time from this bag, without replacement. I define a run to be a sequence of consecutive $R$s (or alternatively, $B$s) picked, along with the first $B$ (or $R$) that ends it. And I define the red (or blue) run length to be the number of consecutive $R$s (or $B$s) I pick in a run before I encounter a $B$ (or $R$), or before the balls run out.

As examples, $RRRRRRB$ is a run (for simplicity, let me denote it by $R_6$ in shorthand) with red run length $6$, $RB$ is a run (denoted by $R_1$) with red run length $1$, $BBBR$ is a run (denoted by $B_3$) with blue run length $3$.

In each simulation, I keep doing runs until all the $100$ balls are picked out (since the balls are picked without replacement, the number of runs and the red/blue run lengths are both finite).

Let’s look at a typical simulation of ball-picking: $R_{50}R_{10}B_{28}R_9$. In this simulation, there are $4$ runs. The first run consists of $50$ consecutive red balls, until a blue ball is encountered. The second run consists of $10$ consecutive red balls, until a blue ball is encountered. The third run consists of $28$ consecutive blue balls, until a red ball is encountered. And the last run consists of $9$ consecutive red balls, at which point the simulation ends as there are no more balls to be picked.
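To make the run definition concrete, here is a minimal Python sketch that splits a sequence of picks into runs. The function name `split_into_runs` and the encoding of picks as a string of `R` and `B` characters are illustrative assumptions, not part of the original problem statement.

```python
def split_into_runs(seq):
    """Split a sequence of 'R'/'B' picks into (colour, run_length) pairs.

    A run is a maximal streak of one colour starting at the current
    position, plus the single ball of the other colour that ends it.
    That terminating ball is consumed by the run and does not start
    the next one.
    """
    runs = []
    i = 0
    n = len(seq)
    while i < n:
        colour = seq[i]
        j = i
        # Advance j over the streak of identical balls.
        while j < n and seq[j] == colour:
            j += 1
        runs.append((colour, j - i))
        # Skip the terminating ball (if any) before starting the next run.
        i = j + 1
    return runs


# The example simulation R_50 R_10 B_28 R_9, written out ball by ball.
example = "R" * 50 + "B" + "R" * 10 + "B" + "B" * 28 + "R" + "R" * 9
print(split_into_runs(example))
# → [('R', 50), ('R', 10), ('B', 28), ('R', 9)]
```

The example string has $100$ characters: the three terminating balls plus the $97$ balls counted in the run lengths.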

It is easy to see that the minimum possible number of runs is $2$ (attained by $R_{70}$ followed by $B_{29}$, or $B_{30}$ followed by $R_{69}$) and the maximum possible number of runs is $31$ (attained by $R_1$ $30$ times followed by $R_{40}$, or $B_1$ $30$ times followed by $R_{40}$, since each of the $30$ $B_1$ runs also uses up one red ball).

Also, the maximum possible value of red run length is $70$ and that of blue run length is $30$.

Now, I’m interested in knowing the probability distribution of the red and blue run lengths. For this, I believe that I must first find the expected value of the number of runs in a simulation. But I’m not sure how to proceed from here. So to sum up, the following are my questions:

1. How do I find the expected value of the number of runs in a simulation?

2. For that expected value, how do I calculate the probability distribution of red and blue run lengths?

2. Nov 30, 2014

### Staff: Mentor

It might be possible to find formulas, but that problem looks messy if you want exact answers.

You can get a reasonable approximation for small run lengths (= the most frequent case) if you assume each ball has a 0.7 probability to be red and a 0.3 probability to be blue, even though the true proportions change as balls are removed. You'll get a geometric distribution (the discrete analogue of an exponential distribution) for run lengths.
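Spelled out, this approximation would look roughly as follows. The symbols $p$, $L_R$ and $L_B$ are notation introduced here for illustration: $p = 0.7$ is the assumed fixed probability of drawing red, and $L_R$, $L_B$ are the lengths of a red and a blue run. Treating the draws as independent is an idealisation, not the exact without-replacement model, and it ignores the cut-off when the bag runs empty.

$$
P(L_R = k) \approx p^{k-1}(1-p), \qquad P(L_B = k) \approx (1-p)^{k-1}\,p, \qquad k = 1, 2, 3, \dots
$$

Under the same assumption the mean run lengths come out as

$$
E[L_R] \approx \frac{1}{1-p} = \frac{1}{0.3} \approx 3.33, \qquad E[L_B] \approx \frac{1}{p} = \frac{1}{0.7} \approx 1.43 .
$$

These are geometric distributions, the discrete counterpart of the exponential one, so the probabilities fall off by a constant factor ($0.7$ for red, $0.3$ for blue) with each extra ball.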

Alternatively, simulate it. Especially for the expected number of runs, this is probably the easiest way.
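A minimal Monte Carlo sketch of such a simulation in Python might look like this. The function names, the default of 2,000 trials, and the fixed seed are illustrative choices, and the run-splitting helper simply encodes the definition of a run given in the question.

```python
import random
from collections import Counter


def split_into_runs(seq):
    """Return (colour, run_length) pairs; the ball ending a run is consumed."""
    runs = []
    i = 0
    n = len(seq)
    while i < n:
        colour = seq[i]
        j = i
        while j < n and seq[j] == colour:
            j += 1
        runs.append((colour, j - i))
        i = j + 1  # skip the terminating ball of the other colour
    return runs


def simulate(n_red=70, n_blue=30, n_trials=2000, seed=0):
    """Estimate the mean number of runs and tally run lengths per colour."""
    rng = random.Random(seed)
    balls = ["R"] * n_red + ["B"] * n_blue
    total_runs = 0
    lengths = {"R": Counter(), "B": Counter()}
    for _ in range(n_trials):
        rng.shuffle(balls)  # a uniformly random order = drawing without replacement
        runs = split_into_runs(balls)
        total_runs += len(runs)
        for colour, length in runs:
            lengths[colour][length] += 1
    return total_runs / n_trials, lengths


mean_runs, lengths = simulate()
print(f"estimated mean number of runs: {mean_runs:.2f}")
for colour in ("R", "B"):
    total = sum(lengths[colour].values())
    for k in sorted(lengths[colour])[:5]:
        print(f"P({colour} run length = {k}) is about {lengths[colour][k] / total:.3f}")
```

The empirical frequencies for short run lengths can then be compared against the fixed-probability approximation, and increasing `n_trials` reduces the statistical noise in the estimates.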