The Investor said:
If you toss a fair coin 20 times, what is the chance that you get a run of at least 5 heads at any time?
Easy to solve by simulation, and I know the answer already, but I'm trying to create a spreadsheet where I can enter the number of tosses and the length of the run required and get the probability.
Can someone tell me how to do this in excel?
http://www.win.tue.nl/~iadan/blockq/rows.pdf
I know you are specifically asking how to do that recurrence method in Excel, and I don't know that one. But there is a way to get an exact, explicit solution for any number of tosses n and any run length k by counting. In fact, you and your friend figured out the main trick, which is considering sequences of the form THHHHH. I learned this from Tiny Tim, who I think is an admin here.
A run of at least 5 heads can happen in two ways. Either the series of 20 tosses starts out HHHHH, or at some point in the series the sequence THHHHH comes up. These are not mutually exclusive, so the probability you are looking for is
P(series starts out HHHHH) + P(sequence THHHHH comes up) - P(start out HHHHH
and THHHHH comes up later)
The first probability is (1/2)
5, of course. To find the second probability, the probability that THHHHH comes up at least once in 20 tosses, you again use the Principle of Inclusion/Exclusion (PIE) to avoid over counting.
The sequence THHHHH could start on the first toss, the second toss, the third toss, etc. Symbolize these by
THHHHHXXXXXXXXXXXXXX
XTHHHHHXXXXXXXXXXXXX
XXTHHHHHXXXXXXXXXXXX
.
.
.
(there are 14 + 1 = 15 of these)
where X could be either H or T. Each of these sequences has probability (1/2)
6. As you have already pointed out, this overcounts such sequences as
XTHHHHHXXXXTHHHHHXXX
So they must be subtracted out. There are 20 -(2*6) + 2 = 10 places for either an X or a THHHHH to fit in, and you must choose 2. Thus there are 10 C 2 such sequences, and each sequence has probability (1/2)
12
But now we have undercounted those sequences that have 3 occurrences of THHHHH, such as
THHHHHXXTHHHHHTHHHHH
We must add these back in. There are 20-(3*6) + 3 = 5 places for either an X or a THHHHH to fit in, and you must choose 3. So the probability of all these sequences is (5 C 3)*(1/2)
18. Therefore the probability for an occurrence of THHHHH in 20 tosses is
P(sequence THHHHH comes up) = 15*(1/2)
6 - (10 C 2)*(1/2)
12 + (5 C 3)*(1/2)
18
The probability that you start out with HHHHH and also have a THHHHH come up later is calculated in a similar way.