Calculating Probability of Difference of Heads and Tails After n Flips

  • Context: Graduate
  • Thread starter: techmologist
SUMMARY

The discussion centers on calculating the probability that the absolute difference between heads and tails in a series of 10 flips of a fair coin reaches 3 or more. The participants explore various mathematical approaches, including probability matrices and random walk theory, to derive a general formula. A key finding is that the probability of achieving a difference of 3 or more is approximately 68.36%, supported by both analytical methods and Monte Carlo simulations. The conversation also touches on related problems involving biased coins and maximum deviations.

PREREQUISITES
  • Understanding of binomial distribution and its applications
  • Familiarity with random walk theory and Gaussian distribution
  • Basic knowledge of probability matrices
  • Experience with Monte Carlo simulation techniques
NEXT STEPS
  • Learn about the application of binomial distribution in random walks
  • Study the derivation of probabilities using probability matrices
  • Explore Monte Carlo simulation methods for probability estimation
  • Investigate the implications of biased coins on probability outcomes
USEFUL FOR

Mathematicians, statisticians, and anyone interested in probability theory, particularly those exploring coin flip problems and random processes.

techmologist
What is the probability that, at some point during a series of 10 flips, the absolute difference between the number of heads and tails is 3 or more? Assume the coin is fair.

Is it possible to find a general formula for this problem, one that lets you simply plug in the number of flips n, the difference d (3 in the above problem), and the probability p that a single flip results in heads (assumed 1/2 above), and out pops the required probability?

edit:

This is for enjoyment only. I can think of at least one way (using probability matrices) to get the answer for any particular set of values. I'm just interested to see what other ways people can think of. Thank you.
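For anyone who wants to check answers numerically, here is a minimal Monte Carlo sketch (the function names are my own, not from the thread) that estimates the probability for given n, d, and p:

```python
import random

def hits_difference(n, d, p):
    """Simulate n flips; True if |heads - tails| ever reaches d."""
    diff = 0
    for _ in range(n):
        diff += 1 if random.random() < p else -1  # +1 for heads, -1 for tails
        if abs(diff) >= d:
            return True
    return False

def estimate(n, d, p, trials=100_000):
    """Fraction of simulated n-flip sequences that reach difference d."""
    return sum(hits_difference(n, d, p) for _ in range(trials)) / trials

random.seed(0)
print(estimate(10, 3, 0.5))  # hovers near 700/1024 = 0.6836 for this case
```

For n=10, d=3, p=1/2 this lands around 0.68, matching the exact value 700/1024 worked out later in the thread.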
 
Well, the first thing that comes to mind - this is me thinking as a physicist - is that the difference between the number of heads and the number of tails for a fair coin is practically a textbook example of a random walk. It's a well known fact that the accumulated values of a random walker form a Gaussian distribution with a standard deviation equal to the square root of the number of steps. But that's inexact; it would only give you the limit as the number of flips went to infinity.

I'm sure there's a way to use the binomial distribution to describe the short-term behavior of random walkers, but I couldn't tell you what it is off the top of my head. I don't often have opportunity to do that kind of math these days.
 
diazona said:
Well, the first thing that comes to mind - this is me thinking as a physicist - is that the difference between the number of heads and the number of tails for a fair coin is practically a textbook example of a random walk. It's a well known fact that the accumulated values of a random walker form a Gaussian distribution with a standard deviation equal to the square root of the number of steps. But that's inexact; it would only give you the limit as the number of flips went to infinity.

I'm sure there's a way to use the binomial distribution to describe the short-term behavior of random walkers, but I couldn't tell you what it is off the top of my head. I don't often have opportunity to do that kind of math these days.

Right. n=10 might be a little small for a Gaussian approximation, especially if p is not 1/2, but the result that the standard deviation is proportional to the square root of n still holds. This problem is more about the maximum deviation during the course of 10 flips. There may be a way of relating the expected maximum deviation to the standard deviation.

I thought of another way of doing it that works for the particular example above, with n=10, d=3, and p=1/2. Since it is a fair coin, you can just count the number of flip sequences that deviate 3 or more at some point, and divide by 2^10.

Call a sequence "successful" if it results in a deviation of 3 or more at some point. Let N(i,j) be the number of successful sequences of j flips in which the starting difference between heads and tails is i. Then

N(i,j+1) = N(i+1,j) + N(i-1,j)

We are looking for N(0,10). Note that N(3,j) = N(-3,j) = 2^j since the sequence starts in a successful state, so any j flips after that still leaves the sequence successful.

N(0,10) = N(1,9) + N(-1,9) = N(2,8) + 2*N(0,8) + N(-2,8)
... = N(3,7) + 3*N(1,7) + 3*N(-1,7) + N(-3,7) = 2^8 + 3*( N(1,7) + N(-1,7) )

Continuing in this way, we finally get

N(0,10) = 2^8 + 3*2^6 + 9*2^4 + 27*2^2 + 81*(N(1,1) + N(-1,1))

But N(1,1) = N(-1,1) = 0, since there's no way to reach 3 or -3 from either of 1 or -1 in only one flip. So

N(0,10) = 256 + 192 + 144 + 108 = 700

The probability is then P = 700/1024. This way would be a lot of work for n > 50 or so. And it doesn't work for a biased coin.
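The recursion for N(i,j) translates directly into code, which takes the tedium out of larger n. A short sketch (the naming is mine) that reproduces the count, writing the post's recurrence N(i,j+1) = N(i+1,j) + N(i-1,j) equivalently as N(i,j) = N(i+1,j-1) + N(i-1,j-1):

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def N(i, j, d=3):
    """Number of 'successful' j-flip sequences starting at difference i,
    i.e. sequences during which |difference| reaches d at some point."""
    if abs(i) >= d:
        return 2 ** j      # already successful; all 2^j continuations count
    if j == 0:
        return 0           # out of flips without ever reaching the difference
    # first flip is either heads (difference i+1) or tails (difference i-1)
    return N(i + 1, j - 1, d) + N(i - 1, j - 1, d)

print(N(0, 10))            # 700
print(N(0, 10) / 2 ** 10)  # 700/1024 = 0.68359375
```

The memoization means this stays fast even for n in the hundreds, though it still only applies to a fair coin.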
 
awesome problem...not quite sure how to solve it yet
 
frustr8photon said:
awesome problem...not quite sure how to solve it yet

It's the most challenging coin flip problem I've tried so far. There's a related problem that seems like it would be harder, but is actually simpler (unless I'm missing something easy on the problem in the OP):

Players A and B bet one unit on each flip of a coin, where A wins with heads and B wins with tails. Player A starts with n units; B starts with m units. They play until one of them runs out of money. The coin is biased: p = Pr(Heads) > 1/2. What is the probability that player B wins? That is, what is the probability that B is ahead n units before he gets behind m units?

Pr(B wins) = [(q/p)^n - (q/p)^(m+n)] / [1 - (q/p)^(m+n)] where q = 1-p < 1/2

I'm looking for a similar kind of formula for the problem in my first post...one where you can plug and chug :)
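The gambler's-ruin formula above is easy to sanity-check against a direct simulation of the game. A sketch (the function names and the test values n = m = 5, p = 0.6 are mine):

```python
import random

def pr_b_wins(n, m, p):
    """The formula above: Pr(B gets ahead n units before falling behind m)."""
    q = 1 - p
    r = q / p
    return (r ** n - r ** (m + n)) / (1 - r ** (m + n))

def simulate(n, m, p, trials=100_000):
    """Play the game out directly; B starts with m units, A with n."""
    wins = 0
    for _ in range(trials):
        b = m                                       # B's current bankroll
        while 0 < b < m + n:
            b += -1 if random.random() < p else 1   # heads: A wins the unit
        wins += (b == m + n)                        # B took all of A's money
    return wins / trials

random.seed(1)
print(pr_b_wins(5, 5, 0.6))   # about 0.1164
print(simulate(5, 5, 0.6))    # close to the same value
```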
 
The problem posed in the original post is not all that hard. In many probability problems it is easier to look at the opposite problem. This is the case here. The opposite problem is

What is the probability that, at every point during a series of 10 flips, the absolute difference between the number of heads and tails is 2 or less?

The answer to the original problem is 1 minus this probability. So how to compute the answer to this alternate question?

Obviously the probability is one up to the third flip. On the third flip, the number of heads must be one or two to satisfy the condition. On the fourth flip, the number of heads must be one, two, or three. Five flips is similar to the three-flip problem, six to the four-flip problem, etc. Multiplying all of these together yields the answer to the alternate problem. After a bit of work, this is

p_{\text{alt}} = \prod_{m=2}^5 \frac{(2m-1)(3m+1)(2m)!}{(m+1)(m!)^2\,2^{4m+2}} \approx 0.000287429

The answer to the original problem is thus 1-0.000287429=0.999712571. In other words, it is close to a certainty (99.97%) that at some point in a series of ten flips the absolute difference between the number of heads and tails will be three or more.
 
D H said:
p_{\text{alt}} = \prod_{m=2}^5 \frac{(2m-1)(3m+1)(2m)!}{(m+1)(m!)^2\,2^{4m+2}} \approx 0.000287429

The answer to the original problem is thus 1-0.000287429=0.999712571. In other words, it is close to a certainty (99.97%) that at some point in a series of ten flips the absolute difference between the number of heads and tails will be three or more.

I flipped a coin 10 times and got
THHTTTHTHT
in which the discrepancy is 2 or less at all times. So I reject your conclusion with alpha = 0.0003. :biggrin:

But seriously, your conclusion doesn't seem right, so I suspect you assumed independence between non-independent events or something similar. Experimentally I found a probability near 68%. This is consistent with techmologist's result of 700/1024.
 
Obviously I did something wrong. Working on it ...
 
Got it. There is an easy way to look at this problem.

Redefine the problem as flipping a coin some consecutive number of times, stopping when either the absolute value of the difference between heads and tails is three or more or when some pre-defined number of coin flips is reached. The key is that the experiment will only stop on odd-numbered flips or on the final flip. The experiment cannot stop on an even-numbered flip prior to the final flip because it would have already stopped on the preceding odd-numbered flip.

After the first flip, the difference between the number of heads and tails will be ±1. There is a 1/4 chance that the experiment will stop on the third flip. If it does not stop on the third flip, the difference will once again be ±1. Given that the difference on the third flip is ±1, there is a 1/4 chance that the experiment will stop on the fifth flip. And so on. The probability of making it to the tenth flip is (3/4)^4 = 0.31640625 (exactly). That tenth flip is a free ride. The probability of *not* having the absolute difference between heads and tails reach 3 or more anywhere in a sequence of ten flips is 31.64%, so the probability of having a difference of 3 or more is 68.36%.

This result agrees with Monte Carlo tests.

Addendum
Suppose the question was with respect to 60 coin flips rather than 10. I ran a 60-coin-toss experiment 10 million times in a Monte Carlo simulation. Of those 10 million cases, 2356 made it to the 60th flip. This experimental probability of 0.0002356 agrees rather nicely with the computed probability of (3/4)^29 = 0.000238109.
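This stopping argument is also easy to confirm exhaustively for n = 10, since there are only 2^10 sequences to check. A brute-force sketch (my own code, not D H's):

```python
from itertools import product

def stays_within(seq, d=3):
    """True if |heads - tails| stays strictly below d for the whole sequence."""
    diff = 0
    for heads in seq:
        diff += 1 if heads else -1
        if abs(diff) >= d:
            return False
    return True

# enumerate all 2^10 equally likely sequences of 10 fair flips
survivors = sum(stays_within(seq) for seq in product((0, 1), repeat=10))
print(survivors, survivors / 2 ** 10)   # 324, 0.31640625 -- exactly (3/4)^4
print(1 - (3 / 4) ** 4)                 # 0.68359375, the 68.36% quoted above
```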
 
  • #10
D H,

Thank you. I really like that way of looking at the problem. That is really clever, and it makes perfect sense. I'm trying to think of how to generalize it to larger differences, but it seems to get pretty tricky even with d=4.
 
  • #11
techmologist said:
D H,

Thank you. I really like that way of looking at the problem. That is really clever, and it makes perfect sense. I'm trying to think of how to generalize it to larger differences, but it seems to get pretty tricky even with d=4.

To generalize D H's (very nice) argument to greater discrepancies, you'll need to consider more than one state. The argument won't be so nice as the d = 2 case.
 
  • #12
CRGreathouse said:
To generalize D H's (very nice) argument to greater discrepancies, you'll need to consider more than one state. The argument won't be so nice as the d = 2 case.


Yes, I tried it for the case d=4 (which, by D H's method, requires tracking discrepancies up to 3). It seems to require as much work as multiplying probability matrices by hand. But the case that D H worked out is very easy to calculate. I will have to remember that in case I get a chance to try this bet on someone at the pool hall. :) You don't usually get an opportunity to do a 20 minute pen and paper calculation in that situation.
 
  • #13
Yes, the d=4 case is a bit worse, but still manageable. Imagine a Pascal triangle, but with the sides cut off at a heads-tails difference of ±4. Now imagine striking the odd-numbered rows; in this case the odd-numbered coin tosses are free rides. You'll end up with something like this (the ±4 columns are displayed but do not contribute to the triangle sums):

\begin{array}{rr||r|rrr|r}
n & n/2 & -4 & -2 & 0 & 2 & 4 \\
\hline
0 & 0 & & & 1 & & \\
2 & 1 & & 1 & 2 & 1 & \\
4 & 2 & 1 & 4 & 6 & 4 & 1 \\
6 & 3 & 4 & 14 & 20 & 14 & 4 \\
8 & 4 & 14 & 48 & 68 & 48 & 14 \\
10 & 5 & 48 & 164 & 232 & 164 & 48 \\
12 & 6 & 164 & 560 & 792 & 560 & 164
\end{array}

Sequences and such are much nicer when the index numbers increase by 1 rather than 2. I'll denote n/2 (the second column) as m and index things by m from now on. Note that the truncated triangle is symmetric. All that matters are the heads-tails differences of 0, 2, and 4. Denote these columns as a_m, b_m, and c_m.

The probability that a sequence ends on the 2m-th toss is 2c_m/t_m, where t_m is the total of the entries in a given row, including the c values that don't contribute to the sums on the next row: t_m = 2c_m + 2b_m + a_m.

Note that the c_m values are equal to the b values on the previous row, so there is no need to account for the c values in the state model. A bit trickier to see, the b_m values aren't needed either: for all but the top row, b_m = a_m - a_{m-1}. The only reason this doesn't work for the top row is that there is no a_{-1} value; a value of a_{-1} = 1 would work quite nicely.

So all that is needed is column a, the central column of the truncated Pascal triangle. Playing around with the math just a bit,

a_m = 4a_{m-1} - 2a_{m-2}

p_{\text{end},m} = \frac{1}{2}\,\frac{a_{m-1} - a_{m-2}}{a_m - a_{m-1}}

where p_{end,m} is the conditional probability that the sequence ends on toss 2m with a heads-tails difference of ±4, given that the heads-tails difference on all the previous tosses was 3 or less (absolute). Multiplying (1 - p_{end,1})(1 - p_{end,2})···(1 - p_{end,m}) yields the probability that the heads-tails difference remains within ±3 for all 2m tosses in the sequence.

This time I tested before I opened my big mouth (cf. post #6). Monte Carlo testing agrees with these computed results.

A couple of final notes:
1. The conditional probability quickly converges on a steady-state value of \frac{1}{2(2+\sqrt{2})} = \frac{2-\sqrt{2}}{4}
2. That a sequence looks a bit Fibonacci-ish; there should be a way to express a_m directly.
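The recurrence and the running product are easy to mechanize. Here is a sketch (my own code, not from the thread); seeding a_{-1} = 1 makes p_{end,1} come out to 0 automatically, which is right because the walk cannot reach ±4 before the fourth toss:

```python
def survival_probabilities(m_max):
    """P(|heads - tails| stays within +/-3 through 2m tosses), m = 1..m_max,
    via a_m = 4*a_{m-1} - 2*a_{m-2} and the running product of (1 - p_end,m)."""
    a = [1, 1]                  # a_{-1} = 1, a_0 = 1
    prob, probs = 1.0, []
    for m in range(1, m_max + 1):
        a.append(4 * a[-1] - 2 * a[-2])
        # conditional probability of stopping on toss 2m
        p_end = 0.5 * (a[-2] - a[-3]) / (a[-1] - a[-2])
        prob *= 1 - p_end
        probs.append(prob)
    return probs

print(survival_probabilities(5))   # last entry ~ 0.546875 = 560/1024 (10 tosses)
```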
 
  • #14
D H,

Wow, that is a very slick method for systematically keeping track of all the ways a series of flips can go! I had to think about it a while, but I get it now. What I really like is that your way, using a recurrence relation, works for any number of flips n. To get the probability that the discrepancy remains < 4 for 2m tosses, you can divide 2b_m + a_m by 2^{2m}, the total number of sequences of 2m tosses.

\mbox{Pr(discrepancy remains < 4 through 2m flips)} = \frac{2b_m + a_m}{2^{2m}} = \frac{a_{m+1} - a_m}{2^{2m}}

I solved the recurrence a_m - 4a_{m-1} + 2a_{m-2} = 0 by assuming a solution of the form r^m. That gives a quadratic with solutions

r_{\pm} = 2 \pm \sqrt{2}

The general solution is a linear combination of these:

a_m = A r_+^m + B r_-^m

Choosing A = B = 1/2 satisfies the initial conditions a_0 = 1, a_1 = 2.

a_m = \frac{1}{2}((2+\sqrt{2})^m + (2-\sqrt{2})^m)

Plug this into the above formula for the probability to get

\mbox{Pr(discrepancy remains < 4 through 2m flips)} = \frac{(1+\sqrt{2})(2+\sqrt{2})^m+(1-\sqrt{2})(2-\sqrt{2})^m}{2^{2m+1}}

So far I've only checked this for m = 5 (10 tosses), and got the same result that I did doing it another way:

Probability that the discrepancy remains < 4 for 10 tosses = 560/1024

Your method is also easily generalized to higher discrepancies, but then you have to solve higher order recurrences. So for d=5, we need to solve a cubic equation, and for d=6 a quartic equation. I don't remember how to do either of those, but I could look it up :). That's as far as it goes for a general solution. But then, a general solution has little practical use anyway. It is just a curiosity for me.
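The closed form can be checked against the recurrence and against the 10-toss value. A quick sketch (my own code) doing both:

```python
from math import sqrt

def a_closed(m):
    """Closed form a_m = ((2 + sqrt 2)^m + (2 - sqrt 2)^m) / 2."""
    return ((2 + sqrt(2)) ** m + (2 - sqrt(2)) ** m) / 2

def pr_within(m):
    """Pr(discrepancy stays < 4 through 2m flips) = (a_{m+1} - a_m) / 2^{2m}."""
    return (a_closed(m + 1) - a_closed(m)) / 2 ** (2 * m)

# cross-check the closed form against the recurrence a_m = 4*a_{m-1} - 2*a_{m-2}
a, prev = 2, 1                          # a_1 = 2, a_0 = 1
for m in range(2, 12):
    a, prev = 4 * a - 2 * prev, a
    assert abs(a_closed(m) - a) < 1e-4  # floating point, so allow slack

print(pr_within(5))   # ~ 0.546875, i.e. 560/1024 for 10 tosses
```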
 
