Trying to determine the number of coin tosses for HHTH

member 428835 · Apr 28, 2022

Hi PF!

This is related to a forum I just posted here: how many coin tosses are expected to get the sequence HHTH. I believe the answer is 30, explained in the OP. However, I thought I'd try and program it in python. Below is my code:

Python:

import numpy as np
import statistics
# EXPECTED NUMBER OF TOSSES FOR COIN WITH T WEIGHTED p FOR SEQUENCE HHTH
class coin:
    def approx(self, n, p):
        lst = []
        for _ in range(n):
            toss = 0
            count = 0
            while count < 4:
                # THE TOSS
                x = np.random.rand()
                toss += 1
                # LET x < p INDICATE TAILS
                # LET x > p INDICATE HEADS
                ## AFTER HHT, WE SEEK H
                if count == 3:
                    if x > p:       # IF HEADS, APPEND lst WITH NUM OF TOSSES
                        lst.append(toss)
                        count = 4   # MAKE COUNT 4 TO BREAK WHILE LOOP
                    else:
                        count = 0   # SEQUENCE IS HHTT WHICH IS T, SO COUNT RESTARTS
                ## AFTER HH, WE SEEK T
                if count == 2:
                    if x > p:
                        count = 2   # IF HEADS, WE HAVE HH
                    else:
                        count = 3   # IF TAILS, WE HAVE HHT
                # FIRST TWO HH
                if count < 2:
                    if x < p: # IF TAILS, RESTART COUNT
                        count = 0
                    else:   # IF HEADS, WE HAVE H OR HH
                        count += 1
        return statistics.mean(lst)
         
if __name__ == "__main__":
    n = 10000
    p = 0.5
    print(coin().approx(n, p))

The results are giving me 18 coin tosses on average. Is my programming wrong or is my calculation in the other post wrong (or maybe both are)?

FactChecker · Apr 28, 2022

I used a completely independent program to do a few simulations , each of 10 million strings, and got an average of 18 (+-0.005), counting the characters in the termination string, "HHTH".

I used this Perl program:

[CODE lang="perl" title="Perl simulation of 10 million strings to get average length to HHTH"]$numTrials=10000000;
@chars = ("H", "T");
foreach $i (1..$numTrials){
$string='';
$count=0;
while( $string !~ /HHTH/){
$ran = int(rand(2));
$char = $chars[$ran];
$string = "$string$char";
$count++;
}
# print "count=$count string=$string\n";
$totCount += $count;
}
$avgCount = $totCount/$numTrials;
print "avgCount=$avgCount\n";[/CODE]

PS. In my opinion, it is not unusual for simulations to get answers much more easily and reliably than by using analysis. So I would go back and check your analysis.

member 428835 · Apr 28, 2022

FactChecker said:

I used a completely independent program to do a few simulations , each of 10 million strings, and got an average of 18 (+-0.005), counting the characters in the termination string, "HHTH".

I used this Perl program:

[CODE lang="perl" title="Perl simulation of 10 million strings to get average length to HHTH"]$numTrials=10000000;
@chars = ("H", "T");
foreach $i (1..$numTrials){
$string='';
$count=0;
while( $string !~ /HHTH/){
$ran = int(rand(2));
$char = $chars[$ran];
$string = "$string$char";
$count++;
}
# print "count=$count string=$string\n";
$totCount += $count;
}
$avgCount = $totCount/$numTrials;
print "avgCount=$avgCount\n";[/CODE]

PS. In my opinion, it is not unusual for simulations to get answers much more easily and reliably than by using analysis. So I would go back and check your analysis.

Nice way to code this! Thanks for the reinforcement!

.Scott · Apr 28, 2022

Here's a non-Monte-Carlo method. I keep track of the likelihood of each state after every coin toss.
I iterate for 1000 tosses:

C++:

   {
      int     i, toss;
      double  HHTH, expected, states[8], newstates[8], cum;
      typedef enum {
         TTT=0, TTH, THT, THH, HTT, HTH, HHT, HHH
      } Toss_t;

      //
      expected = 0.0;
      printf(
         "  # of   percent    Cummulative\n"
         " tosses occurrence Expected value\n"
      );
      //
      // After 3 tosses, we will be in one of 8 states:
      for(i=0;i<8;i++) states[i] = 0.125;
      //
      // From toss number 4 on, the states will change as follows:
      for(toss=4;toss<=1000;toss++) {
         //
         //  If we were HHT before, there is a 50% chance we will
         // finish on this toss.
         HHTH = 0.5 * states[HHT];
         //
         // TTT and TTH will each be half of TTT and HTT:
         newstates[TTT] = (states[TTT]+states[HTT])/2;
         newstates[TTH] = (states[TTT]+states[HTT])/2;
         //
         // THT and THH will each be half of TTH and HTH:
         newstates[THT] = (states[TTH]+states[HTH])/2;
         newstates[THH] = (states[TTH]+states[HTH])/2;
         //
         // HTT will be half of THT and HHT,
         // But HTH will be half of THT:
         newstates[HTT] = (states[THT]+states[HHT])/2;
         newstates[HTH] = (states[THT])/2;
         //
         // HHT and HHH will each be half of THH and HHH:
         newstates[HHT] = (states[THH]+states[HHH])/2;
         newstates[HHH] = (states[THH]+states[HHH])/2;
         //
         for(cum=0.0, i=0;i<8;i++) {
            cum += states[i] = newstates[i];
         }
         expected += toss * HHTH;
         //
         // Report the first 10 tosses and then every 20th.
         if( (toss<=10) || ((toss%20)==0) ) {
            printf(
               " %4d  %9.5f%% %12.8f\n",
               toss, 100*HHTH, expected + ((toss+1)*cum)
            );
         }
      }
   }

And here are the results:

Code:

  # of   percent    Cummulative
 tosses occurrence Expected value
    4    6.25000%   4.93750000
    5    6.25000%   5.81250000
    6    6.25000%   6.62500000
    7    5.46875%   7.38281250
    8    5.07812%   8.08984375
    9    4.68750%   8.75000000
   10    4.39453%   9.36621094
   20    2.20737%  13.66702461
   40    0.55596%  16.90866547
   60    0.14003%  17.72512859
   80    0.03527%  17.93076890
  100    0.00888%  17.98256295
  120    0.00224%  17.99560818
  140    0.00056%  17.99889384
  160    0.00014%  17.99972140
  180    0.00004%  17.99992983
  200    0.00001%  17.99998233
  220    0.00000%  17.99999555
  240    0.00000%  17.99999888
  260    0.00000%  17.99999972
  280    0.00000%  17.99999993
  300    0.00000%  17.99999998
  320    0.00000%  18.00000000
  340    0.00000%  18.00000000
  360    0.00000%  18.00000000
  380    0.00000%  18.00000000
  400    0.00000%  18.00000000
  420    0.00000%  18.00000000
  440    0.00000%  18.00000000
  460    0.00000%  18.00000000
  480    0.00000%  18.00000000
  500    0.00000%  18.00000000
  520    0.00000%  18.00000000
  540    0.00000%  18.00000000
  560    0.00000%  18.00000000
  580    0.00000%  18.00000000
  600    0.00000%  18.00000000
  620    0.00000%  18.00000000
  640    0.00000%  18.00000000
  660    0.00000%  18.00000000
  680    0.00000%  18.00000000
  700    0.00000%  18.00000000
  720    0.00000%  18.00000000
  740    0.00000%  18.00000000
  760    0.00000%  18.00000000
  780    0.00000%  18.00000000
  800    0.00000%  18.00000000
  820    0.00000%  18.00000000
  840    0.00000%  18.00000000
  860    0.00000%  18.00000000
  880    0.00000%  18.00000000
  900    0.00000%  18.00000000
  920    0.00000%  18.00000000
  940    0.00000%  18.00000000
  960    0.00000%  18.00000000
  980    0.00000%  18.00000000
 1000    0.00000%  18.00000000

PeroK · Apr 29, 2022

Here's my plodding Python code:

Python:

#
# Calculate the expected number of coin tosses to get the sequence HHTH:
# State 0 is the starting point; state 1 is H; state 2 is HH; state 3 is HHT
#
import random
#
# Choose the number of games in the simulation
games = 1_000
#
# Counter to keep the total number of tosses across all games
tot_count = 0
#
for k in range(0, games):
    state = 0
    toss_count = 0
    game_over = False
#
    while not game_over:
        toss_count += 1
        toss = random.randint(0, 1)
#
# 0 is tails, 1 is heads
#
        if state == 0:
            if toss == 0:
                state = 0
                continue
            else:
                state = 1
                continue
#
        if state == 1:
            if toss == 0:
                state = 0
                continue
            else:
                state = 2
                continue
#
        if state == 2:
            if toss == 0:
                state = 3
                continue
            else:
                state = 2
                continue
#
        if state == 3:
            if toss == 0:
                state = 0
                continue
            else:
                game_over = True
#
    tot_count += toss_count
#    print(toss_count)
#
# Output the average number of tosses per game
print(tot_count/games)

.Scott · Apr 29, 2022

The "other thread" is interested in a variation of this program that handles unfair coin tosses.
For a probability of head being 0.1 (instead of 0.5), the expected number of tosses climbs above 1121.

C++:

   {
      int     i, toss;
      double  p, HHTH, expected, states[8], newstates[8], cum;

      //
      //  Change the constants below for different heads/tails
      // probabilities.
      //
      // For a fair coin:
      //  const double pH = 0.5;
      //  #define TOSS_LIMIT  (500)
      //  #define REPORT_INTERVAL (20)
      //
      // For a very biased coin:
      const double pH = 0.1;
      #define TOSS_LIMIT   (32000)
      #define REPORT_INTERVAL  (1000)

      const double pT = (1.0-pH);

      typedef enum {
         TTT=0, TTH, THT, THH, HTT, HTH, HHT, HHH
      } Toss_t;

      //
      expected = 0.0;
      //
      // After 3 tosses, we will be in one of 8 states:
      for(i=0;i<8;i++) {
         p = (i&1) ? pH : pT;
         p *= (i&2) ? pH : pT;
         states[i] = p * ((i&4) ? pH : pT);
      }
      //
      // From toss number 4 on, the states will change as follows:
      printf(
         "     # of   percent       Cumulative\n"
         "    tosses occurrence   Expected value\n"
      );
      for(toss=4;toss<=TOSS_LIMIT;toss++) {
         //
         //  If we were HHT before, there is a 50% chance we will
         // finish on this toss.
         HHTH = pH * states[HHT];
         //
         // TTT and TTH will each be based on TTT and HTT:
         newstates[TTT] = pT * (states[TTT]+states[HTT]);
         newstates[TTH] = pH * (states[TTT]+states[HTT]);
         //
         // THT and THH will each be based on TTH and HTH:
         newstates[THT] = pT * (states[TTH]+states[HTH]);
         newstates[THH] = pH * (states[TTH]+states[HTH]);
         //
         // HTT will be based on THT and HHT,
         // But HHT will be contribute to HTH:
         newstates[HTT] = pT * (states[THT]+states[HHT]);
         newstates[HTH] = pH * (states[THT]);
         //
         // HHT and HHH will each be based on THH and HHH:
         newstates[HHT] = pT * (states[THH]+states[HHH]);
         newstates[HHH] = pH * (states[THH]+states[HHH]);
         //
         for(cum=0.0, i=0;i<8;i++) {
            cum += states[i] = newstates[i];
         }
         expected += toss * HHTH;
         //
         // Report the first 10 tosses and then every 20th.
         if( (toss<=10) || ((toss%REPORT_INTERVAL)==0) ) {
            printf(
               " %7d  %9.5f%% %15.8f\n",
               toss, 100*HHTH, expected + ((toss+1)*cum)
            );
         }
      }
   }

Code:

     # of   percent       Cumulative
    tosses occurrence   Expected value
       4    0.09000%      4.99910000
       5    0.09000%      5.99730000
       6    0.09000%      6.99460000
       7    0.08919%      7.99100810
       8    0.08911%      8.98652511
       9    0.08903%      9.98115184
      10    0.08895%     10.97488903
    1000    0.03668%    663.30899519
    2000    0.01499%    934.00262938
    3000    0.00613%   1044.63793041
    4000    0.00250%   1089.85572754
    5000    0.00102%   1108.33671108
    6000    0.00042%   1115.89008108
    7000    0.00017%   1118.97722189
    8000    0.00007%   1120.23896841
    9000    0.00003%   1120.75465733
   10000    0.00001%   1120.96542475
   11000    0.00000%   1121.05156758
   12000    0.00000%   1121.08677505
   13000    0.00000%   1121.10116471
   14000    0.00000%   1121.10704591
   15000    0.00000%   1121.10944962
   16000    0.00000%   1121.11043204
   17000    0.00000%   1121.11083357
   18000    0.00000%   1121.11099768
   19000    0.00000%   1121.11106475
   20000    0.00000%   1121.11109216
   21000    0.00000%   1121.11110337
   22000    0.00000%   1121.11110795
   23000    0.00000%   1121.11110982
   24000    0.00000%   1121.11111058
   25000    0.00000%   1121.11111090
   26000    0.00000%   1121.11111102
   27000    0.00000%   1121.11111108
   28000    0.00000%   1121.11111110
   29000    0.00000%   1121.11111111
   30000    0.00000%   1121.11111111
   31000    0.00000%   1121.11111111
   32000    0.00000%   1121.11111111

Algr · May 1, 2022

You might make sure that you are checking each toss as being the potential start of the string. It is easy to accidentally look at groups of four instead, which would ignore strings that occur outside such a group, such as THHT, HTTH, ect...

FactChecker · May 1, 2022

Algr said:

You might make sure that you are checking each toss as being the potential start of the string. It is easy to accidentally look at groups of four instead, which would ignore strings that occur outside such a group, such as THHT, HTTH, ect...

We all seem to be getting the same answer and I know that at least one (post #2) does not have that problem.

.Scott · May 2, 2022

Algr said:

You might make sure that you are checking each toss as being the potential start of the string. It is easy to accidentally look at groups of four instead, which would ignore strings that occur outside such a group, such as THHT, HTTH, ect...

In the code I provided (post 4 and 6 above), the check is done on the last four tosses on every coin toss after the first three. ... so it's handled.

Algr · May 2, 2022

Since the various code analysis were generating lower (and consistent) values, I was suggesting that the original summation (that thought the answer was 30) was the one with that error.

Jarvis323 · May 6, 2022

The other thread was locked, but I calculated it as a Markov process like @FactChecker suggested based on this link.

https://mpaldridge.github.io/math2750/S08-hitting-times.html

Here is the Markov process I came up with.

The hitting time from state ##e## to state ##1101## is

\begin{split}
\mu_{e,1101} &= 1 + 0.5 \mu_{e,1101} + 0.5 \mu_{1,1101} \\
&= 1 + 0.5 \mu_{e,1101} + 0.5 ( 1 + 0.5 \mu_{e,1101} + 0.5 \mu_{11,1101} ) \\
\end{split}

Need to calculate ##\mu_{11,1101}## in terms of ##\mu_{e,1101}## to continue.

\begin{split}
\mu_{11,1101} &=1 + 0.5 \mu_{11,1101} + 0.5 \mu_{110,1101} \\
&= 1 + 0.5 \mu_{11,1101} + 0.5 ( 1 + 0.5 \mu_{e,1101} ) \\ &= 3 + 0.5 \mu_{e,1101} \\
\end{split}

Now substitute.

\begin{split}
\mu_{e,1101} &= 1 + 0.5 \mu_{e,1101} + 0.5 ( 1 + 0.5 \mu_{e,1101} + 0.5 ( 3 + 0.5 \mu_{e,1101} ) ) \\
&= 2.25 + 0.875 \mu_{e,1101} \\
&= 18 \\
\end{split}

Trying to determine the number of coin tosses for HHTH

Discussion Overview

Discussion Character

Main Points Raised

Areas of Agreement / Disagreement

Contextual Notes

Similar threads

Use of AI (ML/DL) in Science

Other than just FizzBuzz to test programmer candidates

Sweetspot of data compression

How to show RS(U+TRS)* is equivalent to (R+SUT)SU?

HTML/CSS Problems with DNS records

Insights Revisiting the Velocity-Time Function

Insights Remote Operated Gate Control System

Insights AI Enriched Problem Solving

Insights Thinking Outside The Box Versus Knowing What’s In The Box

Insights Why Entangled Photon-Polarization Qubits Violate Bell’s Inequality

Insights Quantum Entanglement is a Kinematic Fact, not a Dynamical Effect