Dismiss Notice
Join Physics Forums Today!
The friendliest, high quality science and math community on the planet! Everyone who loves science is here!

Probability Problem - Coin Tossing

  1. May 24, 2010 #1
    If you toss a fair coin 20 times, what is the chance that you get a run of at least 5 heads at any time?

    Easy to solve by simulation, and I know the answer already, but I'm trying to create a spreadsheet where I can enter the number of tosses and the length of the run required and get the probability.

    Can someone tell me how to do this in excel?

    http://www.win.tue.nl/~iadan/blockq/rows.pdf
     
  2. jcsd
  3. May 24, 2010 #2

    EnumaElish

    User Avatar
    Science Advisor
    Homework Helper

  4. May 26, 2010 #3
    Thanks EnumaElish,

    Can you explain how this continues to get x?

    where x = 1 + q.p^r + (r+1)(q.p^r)^2 + ....
     
  5. May 26, 2010 #4
    Ok, here's a solution I used after getting some help from a friend. This is a very simple method to get the correct answer exactly, rather than an approximation.

    I've asked a few people about this and many take (1-0.5^5)^16 as a starting point, as you can get at least 5 heads in a row on tosses 1-5, 2-6,...16-20. But that seems to lead to a dead end as it doesn't simplify anything. A sequence of HTHHH on tosses 1-5
    already rules out HHHHH on tosses 2-6 and so on.

    Start with :
    0.03125 Chance of HHHHH on toss 1-5
    Then:
    0.015625 Chance of THHHHH on toss 1-6
    0.015625 Chance of THHHHH on toss 2-7
    They are mutually exclusive and all possible ways of achieving at least 5 consecutive heads are covered. We treat THHHHHH on toss 1-7 as identical to THHHHH on toss 1-6, as we are interested in at least 5 consecutive heads.
    0.015625 Chance of THHHHH on toss 3-8
    0.015625 Chance of THHHHH on toss 4-9
    0.015625 Chance of THHHHH on toss 5-10
    Now comes the tricky bit. We can't simply continue like this until toss 16-20 because sequences get counted multiple times (for instance HHHHHTHHHHH would be counted twice). So you need to multiply The chance of the next group of 6 being THHHHH by the chance that there hasn't previously been a group of THHHHH. At this stage that chance would be 1 - ( the chance of THHHHH on tosses 1-6 + the chance of THHHHH on tosses 1-7). and continue.


    In excel this can be done in a few seconds:

    A1 = 0.5^5
    A2 through A6 = 0.5^6
    B7 = 1-sum($A$1:A1)
    A7 = 0.5^6*B7
    Copy the range A7:B7 to A16:B16
    The sum of A1:A16 gives the answer. It also gives all the answers for at least 5 heads from 5-19 tosses.

    The output is

    0.03125
    0.015625
    0.015625
    0.015625
    0.015625
    0.015625
    0.015136719 0.96875
    0.014892578 0.953125
    0.014648438 0.9375
    0.014404297 0.921875
    0.014160156 0.90625
    0.013916016 0.890625
    0.013679504 0.875488281
    0.013446808 0.860595703
    0.013217926 0.845947266
    0.012992859 0.831542969
    Answer (rounded off to 7 digits): 0.2498703


    This can also be adapted to solve 6 consecutive heads in 100 tosses or any other number.

    I'd still appreciate some help in getting this solution in excel though: http://www.win.tue.nl/~iadan/blockq/rows.pdf
    As I'd like to understand how it works!
     
  6. May 26, 2010 #5
    "1 - ( the chance of THHHHH on tosses 1-6 + the chance of THHHHH on tosses 1-7)"

    Should be
    1 - ( the chance of HHHHH on tosses 1-5)

    The probability is the same.
     
  7. May 28, 2010 #6
    I know you are specifically asking how to do that recurrence method in Excel, and I don't know that one. But there is a way to get an exact, explicit solution for any number of tosses n and any run length k by counting. In fact, you and your friend figured out the main trick, which is considering sequences of the form THHHHH. I learned this from Tiny Tim, who I think is an admin here.

    A run of at least 5 heads can happen in two ways. Either the series of 20 tosses starts out HHHHH, or at some point in the series the sequence THHHHH comes up. These are not mutually exclusive, so the probability you are looking for is

    P(series starts out HHHHH) + P(sequence THHHHH comes up) - P(start out HHHHH and THHHHH comes up later)


    The first probability is (1/2)5, of course. To find the second probability, the probability that THHHHH comes up at least once in 20 tosses, you again use the Principle of Inclusion/Exclusion (PIE) to avoid over counting.

    The sequence THHHHH could start on the first toss, the second toss, the third toss, etc. Symbolize these by

    THHHHHXXXXXXXXXXXXXX
    XTHHHHHXXXXXXXXXXXXX
    XXTHHHHHXXXXXXXXXXXX
    .
    .
    .
    (there are 14 + 1 = 15 of these)

    where X could be either H or T. Each of these sequences has probability (1/2)6. As you have already pointed out, this overcounts such sequences as

    XTHHHHHXXXXTHHHHHXXX

    So they must be subtracted out. There are 20 -(2*6) + 2 = 10 places for either an X or a THHHHH to fit in, and you must choose 2. Thus there are 10 C 2 such sequences, and each sequence has probability (1/2)12

    But now we have undercounted those sequences that have 3 occurrences of THHHHH, such as

    THHHHHXXTHHHHHTHHHHH

    We must add these back in. There are 20-(3*6) + 3 = 5 places for either an X or a THHHHH to fit in, and you must choose 3. So the probability of all these sequences is (5 C 3)*(1/2)18. Therefore the probability for an occurrence of THHHHH in 20 tosses is

    P(sequence THHHHH comes up) = 15*(1/2)6 - (10 C 2)*(1/2)12 + (5 C 3)*(1/2)18


    The probability that you start out with HHHHH and also have a THHHHH come up later is calculated in a similar way.
     
  8. May 29, 2010 #7
    [tex]P(H_5|N_{20}) = \binom{20}{5} \left (\frac{1}{2} \right )^5 \left (\frac{1}{2} \right ) ^{15} [/tex]

    (should be to the power of 15... not sure why PF latex is broken)

    Where [tex]\binom{20}{5}[/tex] is the binomial coefficient found by [tex]\binom{n}{k} = \frac{n!}{k! (n - k)!}[/tex]

    Now,

    [tex]P(H_x;x \geq 5|H_{20}) = 1 - P(H_x; x \leq 4|H_{20})[/tex]

    or

    [tex]= 1 - \sum_j^4 P(H_j|H_{20})[/tex]

    Where (from earlier),

    [tex]P(H_k|N_{n}) = \binom{n}{k} p^k (1-p) ^{n-k}[/tex]

    So it comes out to something like:

    20 flips, at least 5 heads.
    Combinations for 0 heads = 20! / [0! * (20 - 0)!] = 1
    Combinations for 1 heads = 20! / [1! * 19!] = 20
    Combinations for 2 heads = 20! / [2! * 18!] = 180
    C for 3 heads = 20! / [3! * 17!] = 1140
    C for 4 heads = 20! / [4! * 16!] = 4845

    Probability of 0 heads in 20 flips = 1 combination * (1/2)^20
    Probability of 1 head in 20 flips = 20 * (1/2)^19 * (1/2) = 20 * (1/2)^20
    2 heads = 180 * (1/2)^20
    ...

    at least 5 heads = 1 - 1*(1/2)^20 - 20*(1/2)^20 - 180*(1/2)^20 - 1140*(1/2)^20 - 4845*(1/2)^20
    = 1 - (1/2^20) * [1 + 20 + 180 + 1140 + 4845]

    There you go.
     
    Last edited: May 29, 2010
  9. May 29, 2010 #8
    Thanks techmologist, that works well. Might be a bit time consuming if you are working with large numbers though (if I understand you correctly).

    luma, thanks for your contribution. The math is fine, but it answers a different question. Here I am looking specifically at a the probability of a run of at least 5 heads occurring rather than at least 5 heads occurring in the sequence in total.

    Someone else suggested another way to solve the problems which seems to be the simplest of all.

    For N tosses of a coin, the number of outcomes that don't include H consecutive heads can be calculated by addition as follows. You start at N=0, and work you way up to the amount of tosses you are interested in:

    For 0≤N<H it's simply 2^N
    For N≥H it's the sum of the H previous values. (Hope that makes sense, as I'm not sure this is the proper way of putting it).

    So for N=20 and H=5, you would work out

    N H
    0 1 =(2^N)
    1 2 =(2^N)
    2 4 =(2^N)
    3 8 =(2^N)
    4 16= (2^N)
    5 31 =(1+2+4+8+16)
    6 61 =(2+4+8+16+31) etc.
    7 120
    8 236
    9 464
    10 912
    11 1793
    12 3525
    13 6930
    14 13624
    15 26784
    16 52656
    17 103519
    18 203513
    19 400096
    20 786568

    Now you have the number of outcomes that don't include at least 5 consecutive heads after 20 tosses, as well as those after 19, 18 etc.
    The total possible combinations = 2^20, so the probability for at least 5 consecutive heads =1-786568/2^20

    for N=100 and H=6

    N H
    0 1
    1 2
    2 4
    3 8
    4 16
    5 32
    6 63
    7 125
    8 248
    9 492
    10 976
    11 1936
    12 3840
    13 7617
    14 15109
    15 29970
    16 59448
    etc.
     
  10. May 30, 2010 #9
    Correct me if im wrong here but if you have a 0.03125% chance of getting 5 heads in a row and you want to find out how likely you are to get that in 20 coin tosses then wouldn't you be able to just take the chance of getting 5 heads in a row and times it by the amount of trials you get to do that? Like in 20 coin toss you would have a chance from the start of 1-15 of the coin tosses so 15 chances overall? In that case I get like a .46 chance of it happening from .03125% x 15.
     
  11. May 30, 2010 #10
    (It's a 0.03125 = 3.125% chance for 5 heads in a row for 5 tosses)

    That doesn't work as it counts sequences that are impossible. For instance, if tosses 1-5 HTHHH, the chance of HHHHH on tosses 2-6 is 0%, not 3.125%. It would also count HHHHHH[rest of tosses] as two separate occurrences, because the sequence of at least 5 heads occurs on 1-5, and 2-6.



    Try to figure out what the probability is for at least 5 heads in a row for 50 tosses. .3125x46 is not possible as it is greater than one.

    One small thing, there are 16 rather 15 chances. 1-5,...,16-20
     
  12. May 30, 2010 #11
    Yeah, sometimes an exact closed form solution is less useful than just solving a problem iteratively or recursively on a computer. And with this problem being as hard as it is, any solution is going to be messy. But I think the general solution using PIE is manageable. You could get Matlab or Mathematica to evaluate it quickly.

    Probability of a run of at least H heads in N tosses =
    [tex]\frac{1}{2^H}+\sum_{j=1}^{\lfloor \frac{N}{H+1} \rfloor}(-1)^{j+1}\binom{N-j(H+1)+j}{j}\frac{1}{2^{j(H+1)}} - \frac{1}{2^H}\sum_{j=1}^{\lfloor \frac{N-H}{H+1} \rfloor} (-1)^{j+1}\binom{N-H-j(H+1)+j}{j}\frac{1}{2^{j(H+1)}}[/tex]

    I get that.

    I don't understand that. Can you explain why that works? If it does, that is definitely the easiest way to do it.

    EDIT: latex formula should be right now. Thanks to D H for corrections.
     
    Last edited: May 31, 2010
  13. May 30, 2010 #12

    D H

    User Avatar
    Staff Emeritus
    Science Advisor

    The Investor's answer in post #4 is simple, correct, and easily generalizable to determine the probability of obtaining a run of some length N in more in a set of M (M≥N) tosses.

    techmologist, your counting in post #6 is incorrect. A run starting on the first throw is conceptually different from a run starting on any other roll. You need to carry this distinction throughout in your accounting. The count approach quickly gets intractable. With 23 rolls you have to worry about overcounting four runs of five, for example.
     
  14. May 30, 2010 #13
    Sorry for my muddled up symbols. I'm using H as the run length of heads we're interested in, and in the table H stands for the number of variations without that run occurring, so it's all a bit confusing. I guess you do see that it works, but want to know why? I'm not sure, because someone else gave me this solution, and I asked this question too, but haven't received a response yet!

    I did come up with another solution independently while trying to work it out:

    This involves solving a series of simple problems that give the answer for the complex one.
    Once again solve it in one step increments. So if you are interested in the probability of at least 5 consecutive heads in 20 tosses, first solve the number of occurrences without a run of 5 consecutive heads in 0 tosses, then 1 toss, 2 tosses, 3 tosses, .... 20 tosses.


    When N<H, number of outcomes without a run of 5 heads= N(x) = 2^x
    When N=H, number of outcomes without a run of 5 heads = N(x) = 2^x-1
    When N>H, number of outcomes without a run of 5 heads = N(x) = N(x-1)*2-N(x-1-H)

    So for N=20 and H=5, first find the number of outcomes without H consecutive heads for N=0 and H=5, then N=1 and H=5,... N=20 and H=5

    N(0)=2^0=1
    N(1)=2^1=2
    [...]
    N(5)=2^5-1=31
    N(6)= N(5)*2-N(5-H)
    N(7)= N(6)*2-N(6-H)
    N(8)=N(7)*2- N(7-H)
    [...]
    N(20)=N(19)*2-N(19-H)

    And for N=100 and H=6, first find the number of outcomes without H consecutive heads for N=0 and H=6, then N=1 and H=6,... N=100 and H=6

    N(0)=2^0=1
    N(1)=2^1=2
    [...]
    N(5)=2^5=32
    N(6)= 2^6-1=63
    N(7)= N(6)*2-N(6-H)
    N(8)=N(7)*2- N(7-H)
    [...]
    N(20)=N(19)*2-N(19-H)

    I have no idea if I have conveyed this correctly, so feel free to correct my notation or ask for clarification. I have checked it, and the answers generated using this algorithm are correct, but the way I have described it probably isn't!
     
  15. May 30, 2010 #14
    I don't really see what the hthhh roll has to do with it because if your going to account for one random roll effecting the rest then imo you just cherry picked that number and the roll and should have to factor for all possible rolls and in that case your left with the same chance.

    When I read the Op question it seemed he was just asking what are the chances of getting 5 heads in a row in 20 coin tosses. I personally think that the answer for that type of question should be very simple because the question itself sounds simple.
     
  16. May 30, 2010 #15
    Also I wanted to point out that you should probably test your numbers irl before you use them. I was just checking with a coin toss generator however the one I was using only allowed for 16 coins tossed at a time at max. Anyhow it seems like with just 16 coins tossed the amount of times I got 5 heads in a row was about 33% if I did it for like 10 or so trys. But that site was a little wierd as it tryed to factor for different coins and stuff id rather have used a just random generated coin toss but was stressed for time.
     
  17. May 30, 2010 #16
    That's what I thought to, I was initially surprised that answering such a simple question was not that easy. To understand why your method doesn't work, you can take a much simpler problem:

    If you toss a fair coin 6 times, what is the chance that you get a run of at least 5 heads at any time?


    You'll see that reasoning along the lines of two chances to get 5 consecutive heads (tosses 1-5 and tossed 2-6), does not lead to the correct answer.

    We can simply count them. HHHHHH, THHHHH, HHHHHT, there are 3 sequences out of 64 possibilities, so P(5 consecutive heads in 6 tosses)=3/64
     
  18. May 30, 2010 #17
    huh? the hhhhhh is included with the thhhhh and hhhhht imo... At least with the way the problem is worded.
     
  19. May 30, 2010 #18
    That's quite a lot more than I would expect.
    P(5 consecutive heads in 16 tosses) = 0.196533203
     
  20. May 30, 2010 #19
    If you toss a fair coin 3 times, what is the chance that you get a run of at least 2 heads at any time?

    There are 8 equiprobable outcomes:
    HHH
    HHT
    HTH
    THH
    HTT
    THT
    TTH
    TTT

    so the chance = 3/8
    Do you see that HHHHHH, THHHHH and HHHHHT are distinct?
     
  21. May 30, 2010 #20
    Nope take it back I still think what I said was right. There is a 1/32 chance of getting 5 coins in a row as heads. There is 15 chances where you could get 5 coins in a row as heads.

    Your saying it will happen about 24% of the time im saying it will happen about 46% of the time. I totally think 25% is too low of a chance %.

    Also if you count the HHH as two you end up with 4/8 and that's exactly what my math will give. I can understand why you would not want to count the HHH as if it was 2 but I think you should.
     
    Last edited: May 30, 2010
  22. May 30, 2010 #21
    Just run a simulation like this:

    Sub monte()
    Randomize Time
    heads5 = 0
    trials = 1000000
    For this_trial = 1 To trials
    heads = 0
    For this_toss = 1 To 20
    If Rnd() > 0.5 Then
    heads = heads + 1
    Else
    heads = 0
    End If
    If heads = 5 Then
    heads5 = heads5 + 1
    Exit For
    End If
    Next
    Next
    Debug.Print heads5 / trials
    End Sub

    The outcome should be very close to 0.25
     
  23. May 30, 2010 #22

    D H

    User Avatar
    Staff Emeritus
    Science Advisor

    That obviously isn't right. It isn't even correct for the trivial case N=H. It appears you are trying to account for double and triple counting events here. What about counting events four times, as might happen when trying to flip 5 heads in a row with 23 or more tosses?

    It is easy to make mistakes when using combinatorics. That's one reason I like The Investor's algorithm in post #4. It avoids most of the combinatorics nastiness, it is simple and extensible, and above all it is correct. (A simple, extensible algorithm that yields incorrect answers is of course worthless.)

    Let me take a hack at that.

    First off, the reason for all this protection against double counting is that we are introducing an artificiality to make the problem a bit more tractable. Suppose you convince someone to give you 4:1 odds that you can't flip five heads in a row in twenty flips. You should take your money and run as soon as you do flip five heads in a row. No telling, after all, what the guy is going to do when he discovers the odds should have been 3:1 (3.002076:1). You would not flip the coin twenty times. The over-counting of events occurs because of artificial insistence that the coin be flipped twenty times. The reason for doing so is that it is far easier to account for events that don't really count than it is to deal with a space with a variable number of events.

    There is an easy way to address this artificial multiplicity. The game is won if a subsequence of H or more consecutive heads occurs in a sequence of N coin flips. The Investor's algorithm defines a winning sequence in terms of the toss number that starts the sequence of H heads. Breaking this down, there are three cases to consider:
    1. The winning sequence starts on flip #1. This happens when
      • Each of the H consecutive flips starting with flip #1 is heads.
    2. The winning sequence starts on flip #n, where n is between 2 and H+1 inclusive. This happens when
      • Each of the H consecutive flips starting with flip #n is heads and
      • Flip #n-1 was tails.
    3. The winning sequence starts on flip #m, where m is between H+2 and N-H+1 inclusive. This happens when
      • Each of the H consecutive flips starting with flip #m is heads and
      • Flip #m-1 was tails and
      • A win did not occur on flips #1 to #m-(H+1).

    This covers the space of winning events, and it does so in a way that makes the winning events mutually exclusive. This means the total probability is the sum of the probabilities for each event. Each event comprises several independent subevents, so these event probabilities are the product of these subevent probabilities.

    In short, the answer to the question "why does that work?" is that the algorithm nicely takes advantage of the rules for mutually exclusive events and independent events.
     
  24. May 30, 2010 #23

    D H

    User Avatar
    Staff Emeritus
    Science Advisor

    Just because a question is simple does not mean that the answer is. Some simple questions have very hard answers. A couple:
    • Why do protons have the mass they do?
    • How to stop the Deepwater Horizon oil spill?
     
  25. May 31, 2010 #24
    This example kinda shows hes not grasping what im saying...

    "If you toss a fair coin 3 times, what is the chance that you get a run of at least 2 heads at any time?

    There are 8 equiprobable outcomes:
    HHH
    HHT
    HTH
    THH
    HTT
    THT
    TTH
    TTT

    so the chance = 3/8"

    This shows the amount of times hh+ will show up if you roll every possible outcome once it however does not show the chance of getting hh+ unless you factor for the hhh as being two times it happened then it does.
     
  26. May 31, 2010 #25

    D H

    User Avatar
    Staff Emeritus
    Science Advisor

    No, it shows that you don't understand what we are saying.

    You cannot compute probabilities the way you are doing. Think of it this way: What is the probability of tossing 2 heads in a row if you toss a fair coin 7 times? Multiplication would lead you to think the probability is 6*1/4=1.5. Any time that a calculation yields a probability greater than one means the calculation is wrong. (BTW, note that the multiplicand is six, not five. You can win by tossing consecutive heads on toss numbers 1&2, 2&3, 3&4, 4&5, 5&6, or 6&7.)

    You can add probabilities only if the events are mutually exclusive. This is not the case here. For example, obtaining heads on the first two and last two tosses of a set of seven tosses are not mutually exclusive events.
     
Share this great discussion with others via Reddit, Google+, Twitter, or Facebook