Calculating Probability of Difference of Heads and Tails After n Flips

  • Context: Graduate
  • Thread starter: techmologist
SUMMARY

The discussion centers on calculating the probability that the absolute difference between heads and tails in a series of 10 flips of a fair coin reaches 3 or more. The participants explore various mathematical approaches, including probability matrices and random walk theory, to derive a general formula. A key finding is that the probability of achieving a difference of 3 or more is approximately 68.36%, supported by both analytical methods and Monte Carlo simulations. The conversation also touches on related problems involving biased coins and maximum deviations.

PREREQUISITES
  • Understanding of binomial distribution and its applications
  • Familiarity with random walk theory and Gaussian distribution
  • Basic knowledge of probability matrices
  • Experience with Monte Carlo simulation techniques
NEXT STEPS
  • Learn about the application of binomial distribution in random walks
  • Study the derivation of probabilities using probability matrices
  • Explore Monte Carlo simulation methods for probability estimation
  • Investigate the implications of biased coins on probability outcomes
USEFUL FOR

Mathematicians, statisticians, and anyone interested in probability theory, particularly those exploring coin flip problems and random processes.

techmologist
What is the probability that, at some point during a series of 10 flips, the absolute difference between the number of heads and tails is 3 or more? Assume the coin is fair.

Is it possible to find a general formula for this problem, one that lets you simply plug in the number of flips n, the difference d (3 in the above problem), and the probability p that a single flip results in heads (assumed 1/2 above), and out pops the required probability?

edit:

This is for enjoyment only. I can think of at least one way (using probability matrices) to get the answer for any particular set of values. I'm just interested to see what other ways people can think of. Thank you.
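For anyone who wants to check answers numerically, here is a minimal Monte Carlo sketch (the function names are my own, not from the thread) that estimates the probability for given n, d, and p:

```python
import random

def hits_difference(n, d, p):
    """Simulate n flips; True if |heads - tails| ever reaches d."""
    diff = 0
    for _ in range(n):
        diff += 1 if random.random() < p else -1  # +1 for heads, -1 for tails
        if abs(diff) >= d:
            return True
    return False

def estimate(n, d, p, trials=100_000):
    """Fraction of simulated n-flip sequences that reach difference d."""
    return sum(hits_difference(n, d, p) for _ in range(trials)) / trials

random.seed(0)
print(estimate(10, 3, 0.5))  # hovers near 700/1024 = 0.6836 for this case
```

For n=10, d=3, p=1/2 this lands around 0.68, matching the exact value 700/1024 worked out later in the thread.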
 
Well, the first thing that comes to mind - this is me thinking as a physicist - is that the difference between the number of heads and the number of tails for a fair coin is practically a textbook example of a random walk. It's a well known fact that the accumulated values of a random walker form a Gaussian distribution with a standard deviation equal to the square root of the number of steps. But that's inexact; it would only give you the limit as the number of flips went to infinity.

I'm sure there's a way to use the binomial distribution to describe the short-term behavior of random walkers, but I couldn't tell you what it is off the top of my head. I don't often have opportunity to do that kind of math these days.
 
diazona said:
Well, the first thing that comes to mind - this is me thinking as a physicist - is that the difference between the number of heads and the number of tails for a fair coin is practically a textbook example of a random walk. It's a well known fact that the accumulated values of a random walker form a Gaussian distribution with a standard deviation equal to the square root of the number of steps. But that's inexact; it would only give you the limit as the number of flips went to infinity.

I'm sure there's a way to use the binomial distribution to describe the short-term behavior of random walkers, but I couldn't tell you what it is off the top of my head. I don't often have opportunity to do that kind of math these days.

Right. n=10 might be a little small for a Gaussian approximation, especially if p is not 1/2, but the result that the standard deviation is proportional to the square root of n still holds. This problem is more about the maximum deviation during the course of 10 flips. There may be a way of relating the expected maximum deviation to the standard deviation.

I thought of another way of doing it that works for the particular example above, with n=10, d=3, and p=1/2. Since it is a fair coin, you can just count the number of flip sequences that deviate 3 or more at some point, and divide by 2^10.

Call a sequence "successful" if it results in a deviation of 3 or more at some point. Let N(i,j) be the number of successful sequences of j flips in which the starting difference between heads and tails is i. Then

N(i,j+1) = N(i+1,j) + N(i-1,j)

We are looking for N(0,10). Note that N(3,j) = N(-3,j) = 2^j since the sequence starts in a successful state, so any j flips after that still leaves the sequence successful.

N(0,10) = N(1,9) + N(-1,9) = N(2,8) + 2*N(0,8) + N(-2,8)
... = N(3,7) + 3*N(1,7) + 3*N(-1,7) + N(-3,7) = 2^8 + 3*( N(1,7) + N(-1,7) )

Continuing in this way, we finally get

N(0,10) = 2^8 + 3*2^6 + 9*2^4 + 27*2^2 + 81*(N(1,1) + N(-1,1))

But N(1,1) = N(-1,1) = 0, since there's no way to reach 3 or -3 from either of 1 or -1 in only one flip. So

N(0,10) = 256 + 192 + 144 + 108 = 700

The probability is then P = 700/1024. This way would be a lot of work for n > 50 or so. And it doesn't work for a biased coin.
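The recursion for N(i,j) translates directly into code, which takes the tedium out of larger n. A short sketch (the naming is mine) that reproduces the count, writing the post's recurrence N(i,j+1) = N(i+1,j) + N(i-1,j) equivalently as N(i,j) = N(i+1,j-1) + N(i-1,j-1):

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def N(i, j, d=3):
    """Number of 'successful' j-flip sequences starting at difference i,
    i.e. sequences during which |difference| reaches d at some point."""
    if abs(i) >= d:
        return 2 ** j      # already successful; all 2^j continuations count
    if j == 0:
        return 0           # out of flips without ever reaching the difference
    # first flip is either heads (difference i+1) or tails (difference i-1)
    return N(i + 1, j - 1, d) + N(i - 1, j - 1, d)

print(N(0, 10))            # 700
print(N(0, 10) / 2 ** 10)  # 700/1024 = 0.68359375
```

The memoization means this stays fast even for n in the hundreds, though it still only applies to a fair coin.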
 
awesome problem...not quite sure how to solve it yet
 
frustr8photon said:
awesome problem...not quite sure how to solve it yet

It's the most challenging coin flip problem I've tried so far. There's a related problem that seems like it would be harder, but is actually simpler (unless I'm missing something easy on the problem in the OP):

Players A and B bet one unit on each flip of a coin, where A wins with heads and B wins with tails. Player A starts with n units; B starts with m units. They play until one of them runs out of money. The coin is biased: p = Pr(Heads) > 1/2. What is the probability that player B wins? That is, what is the probability that B is ahead n units before he gets behind m units?

Pr(B wins) = [(q/p)^n - (q/p)^(m+n)] / [1 - (q/p)^(m+n)] where q = 1-p < 1/2

I'm looking for a similar kind of formula for the problem in my first post...one where you can plug and chug :)
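The gambler's-ruin formula above is easy to sanity-check against a direct simulation of the game. A sketch (the function names and the test values n = m = 5, p = 0.6 are mine):

```python
import random

def pr_b_wins(n, m, p):
    """The formula above: Pr(B gets ahead n units before falling behind m)."""
    q = 1 - p
    r = q / p
    return (r ** n - r ** (m + n)) / (1 - r ** (m + n))

def simulate(n, m, p, trials=100_000):
    """Play the game out directly; B starts with m units, A with n."""
    wins = 0
    for _ in range(trials):
        b = m                                       # B's current bankroll
        while 0 < b < m + n:
            b += -1 if random.random() < p else 1   # heads: A wins the unit
        wins += (b == m + n)                        # B took all of A's money
    return wins / trials

random.seed(1)
print(pr_b_wins(5, 5, 0.6))   # about 0.1164
print(simulate(5, 5, 0.6))    # close to the same value
```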
 
The problem posed in the original post is not all that hard. In many probability problems it is easier to look at the opposite problem. This is the case here. The opposite problem is

What is the probability that, at every point during a series of 10 flips, the absolute difference between the number of heads and tails is 2 or less?

The answer to the original problem is 1 minus this probability. So how to compute the answer to this alternate question?

Obviously the probability is one up to the third flip. On the third flip, the number of heads must be one or two to satisfy the condition. On the fourth flip, the number of heads must be one, two, or three. Five flips is similar to the three-flip problem, six to the four-flip problem, etc. Multiplying all of these together yields the answer to the alternate problem. After a bit of work, this is

p_{\text{alt}} = \prod_{m=2}^5 \frac{(2m-1)(3m+1)(2m)!}{(m+1)(m!)^2\,2^{4m+2}} \approx 0.000287429

The answer to the original problem is thus 1-0.000287429=0.999712571. In other words, it is close to a certainty (99.97%) that at some point in a series of ten flips the absolute difference between the number of heads and tails will be three or more.
 
D H said:
p_{\text{alt}} = \prod_{m=2}^5 \frac{(2m-1)(3m+1)(2m)!}{(m+1)(m!)^2\,2^{4m+2}} \approx 0.000287429

The answer to the original problem is thus 1-0.000287429=0.999712571. In other words, it is close to a certainty (99.97%) that at some point in a series of ten flips the absolute difference between the number of heads and tails will be three or more.

I flipped a coin 10 times and got
THHTTTHTHT
in which the discrepancy is 2 or less at all times. So I reject your conclusion with alpha = 0.0003. :biggrin:

But seriously, your conclusion doesn't seem right, so I suspect you assumed independence between non-independent events or something similar. Experimentally I found a probability near 68%. This is consistent with techmologist's result of 700/1024.
 
Obviously I did something wrong. Working on it ...
 
Got it. There is an easy way to look at this problem.

Redefine the problem as flipping a coin some consecutive number of times, stopping when either the absolute value of the difference between heads and tails is three or more or when some pre-defined number of coin flips is reached. The key is that the experiment will only stop on odd-numbered flips or on the final flip. The experiment cannot stop on an even-numbered flip prior to the final flip because it would have already stopped on the preceding odd-numbered flip.

After the first flip, the difference between the number of heads and tails will be ±1. There is a 1/4 chance that the experiment will stop on the third flip. If it does not stop on the third flip, the difference will once again be ±1. Given that the difference on the third flip is ±1, there is a 1/4 chance that the experiment will stop on the fifth flip. And so on. The probability of making it to the tenth flip is (3/4)^4 = 0.31640625 (exactly). That tenth flip is a free ride. The probability of *not* having the absolute difference between heads and tails reach 3 or more anywhere in a sequence of ten flips is 31.64%, so the probability of having a difference of 3 or more is 68.36%.

This result agrees with Monte Carlo tests.

Addendum
Suppose the question was with respect to 60 coin flips rather than 10. I ran a 60-coin-toss experiment 10 million times in a Monte Carlo simulation. Of those 10 million cases, 2356 made it to the 60th flip. This experimental probability of 0.0002356 agrees rather nicely with the computed probability of (3/4)^29 = 0.000238109.
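This stopping argument is also easy to confirm exhaustively for n = 10, since there are only 2^10 sequences to check. A brute-force sketch (my own code, not D H's):

```python
from itertools import product

def stays_within(seq, d=3):
    """True if |heads - tails| stays strictly below d for the whole sequence."""
    diff = 0
    for heads in seq:
        diff += 1 if heads else -1
        if abs(diff) >= d:
            return False
    return True

# enumerate all 2^10 equally likely sequences of 10 fair flips
survivors = sum(stays_within(seq) for seq in product((0, 1), repeat=10))
print(survivors, survivors / 2 ** 10)   # 324, 0.31640625 -- exactly (3/4)^4
print(1 - (3 / 4) ** 4)                 # 0.68359375, the 68.36% quoted above
```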
 
  • #10
D H,

Thank you. I really like that way of looking at the problem. That is really clever, and it makes perfect sense. I'm trying to think of how to generalize it to larger differences, but it seems to get pretty tricky even with d=4.
 
  • #11
techmologist said:
D H,

Thank you. I really like that way of looking at the problem. That is really clever, and it makes perfect sense. I'm trying to think of how to generalize it to larger differences, but it seems to get pretty tricky even with d=4.

To generalize D H's (very nice) argument to greater discrepancies, you'll need to consider more than one state. The argument won't be so nice as the d = 2 case.
 
  • #12
CRGreathouse said:
To generalize D H's (very nice) argument to greater discrepancies, you'll need to consider more than one state. The argument won't be so nice as the d = 2 case.


Yes, I tried it for the case d=4 (which, by D H's method, requires tracking discrepancies up to 3). It seems to require as much work as multiplying probability matrices by hand. But the case that D H worked out is very easy to calculate. I will have to remember that in case I get a chance to try this bet on someone at the pool hall. :) You don't usually get an opportunity to do a 20 minute pen and paper calculation in that situation.
 
  • #13
Yes, the d=4 case is a bit worse, but still manageable. Imagine a Pascal triangle, but with the sides cut off at a heads-tails difference of ±4. Now imagine striking the odd-numbered rows; in this case the odd-numbered coin tosses are free rides. You'll end up with something like this (the ±4 columns are displayed but do not contribute to the triangle sums):

\begin{array}{rr||r|rrr|r}
n & n/2 & -4 & -2 & 0 & 2 & 4 \\
\hline
0 & 0 & & & 1 & & \\
2 & 1 & & 1 & 2 & 1 & \\
4 & 2 & 1 & 4 & 6 & 4 & 1 \\
6 & 3 & 4 & 14 & 20 & 14 & 4 \\
8 & 4 & 14 & 48 & 68 & 48 & 14 \\
10 & 5 & 48 & 164 & 232 & 164 & 48 \\
12 & 6 & 164 & 560 & 792 & 560 & 164
\end{array}

Sequences and such are much nicer when the index numbers increase by 1 rather than 2. I'll denote n/2 (the second column) as m and index things by m from now on. Note that the truncated triangle is symmetric. All that matters are the heads-tails differences of 0, 2, and 4. Denote these columns as a_m, b_m, and c_m.

The probability that a sequence ends on the 2m-th toss is 2c_m/t_m, where t_m is the total of the entries in a given row, including the c values that don't contribute to the sums on the next row: t_m = 2c_m + 2b_m + a_m.

Note that the c_m values are equal to the b values on the previous row, so there is no need to account for the c values in the state model. A bit trickier to see, the b_m values aren't needed either: for all but the top row, b_m = a_m - a_{m-1}. The only reason this doesn't work for the top row is that there is no a_{-1} value; a value of a_{-1} = 1 would work quite nicely.

So all that is needed is column a, the central column of the truncated Pascal triangle. Playing around with the math just a bit,

a_m = 4a_{m-1} - 2a_{m-2}

p_{\text{end},m} = \frac{1}{2}\,\frac{a_{m-1} - a_{m-2}}{a_m - a_{m-1}}

where p_{end,m} is the conditional probability that the sequence ends on toss 2m with a heads-tails difference of ±4, given that the heads-tails difference on all the previous tosses was 3 or less (absolute). Multiplying (1 - p_{end,1})(1 - p_{end,2})···(1 - p_{end,m}) yields the probability that the heads-tails difference remains within ±3 for all 2m tosses in the sequence.

This time I tested before I opened my big mouth (cf. post #6). Monte Carlo testing agrees with these computed results.

A couple of final notes:
1. The conditional probability quickly converges on a steady-state value of \frac{1}{2(2+\sqrt{2})} = \frac{2-\sqrt{2}}{4}
2. That a sequence looks a bit Fibonacci-ish; there should be a way to express a_m directly.
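The recurrence and the running product are easy to mechanize. Here is a sketch (my own code, not from the thread); seeding a_{-1} = 1 makes p_{end,1} come out to 0 automatically, which is right because the walk cannot reach ±4 before the fourth toss:

```python
def survival_probabilities(m_max):
    """P(|heads - tails| stays within +/-3 through 2m tosses), m = 1..m_max,
    via a_m = 4*a_{m-1} - 2*a_{m-2} and the running product of (1 - p_end,m)."""
    a = [1, 1]                  # a_{-1} = 1, a_0 = 1
    prob, probs = 1.0, []
    for m in range(1, m_max + 1):
        a.append(4 * a[-1] - 2 * a[-2])
        # conditional probability of stopping on toss 2m
        p_end = 0.5 * (a[-2] - a[-3]) / (a[-1] - a[-2])
        prob *= 1 - p_end
        probs.append(prob)
    return probs

print(survival_probabilities(5))   # last entry ~ 0.546875 = 560/1024 (10 tosses)
```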
 
  • #14
D H,

Wow, that is a very slick method for systematically keeping track of all the ways a series of flips can go! I had to think about it a while, but I get it now. What I really like is that your way, using a recurrence relation, works for any number of flips n. To get the probability that the discrepancy remains < 4 for 2m tosses, you can divide 2b_m + a_m by 2^{2m}, the total number of sequences of 2m tosses.

\mbox{Pr(discrepancy remains < 4 through 2m flips)} = \frac{2b_m + a_m}{2^{2m}} = \frac{a_{m+1} - a_m}{2^{2m}}

I solved the recurrence a_m - 4a_{m-1} + 2a_{m-2} = 0 by assuming a solution of the form r^m. That gives a quadratic with solutions

r_{\pm} = 2 \pm \sqrt{2}

The general solution is a linear combination of these:

a_m = A r_+^m + B r_-^m

Choosing A = B = 1/2 satisfies the initial conditions a_0 = 1, a_1 = 2.

a_m = \frac{1}{2}((2+\sqrt{2})^m + (2-\sqrt{2})^m)

Plug this into the above formula for the probability to get

\mbox{Pr(discrepancy remains < 4 through 2m flips)} = \frac{(1+\sqrt{2})(2+\sqrt{2})^m+(1-\sqrt{2})(2-\sqrt{2})^m}{2^{2m+1}}

So far I've only checked this for m = 5 (10 tosses), and got the same result that I did doing it another way:

Probability that the discrepancy remains < 4 for 10 tosses = 560/1024

Your method is also easily generalized to higher discrepancies, but then you have to solve higher order recurrences. So for d=5, we need to solve a cubic equation, and for d=6 a quartic equation. I don't remember how to do either of those, but I could look it up :). That's as far as it goes for a general solution. But then, a general solution has little practical use anyway. It is just a curiosity for me.
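The closed form can be checked against the recurrence and against the 10-toss value. A quick sketch (my own code) doing both:

```python
from math import sqrt

def a_closed(m):
    """Closed form a_m = ((2 + sqrt 2)^m + (2 - sqrt 2)^m) / 2."""
    return ((2 + sqrt(2)) ** m + (2 - sqrt(2)) ** m) / 2

def pr_within(m):
    """Pr(discrepancy stays < 4 through 2m flips) = (a_{m+1} - a_m) / 2^{2m}."""
    return (a_closed(m + 1) - a_closed(m)) / 2 ** (2 * m)

# cross-check the closed form against the recurrence a_m = 4*a_{m-1} - 2*a_{m-2}
a, prev = 2, 1                          # a_1 = 2, a_0 = 1
for m in range(2, 12):
    a, prev = 4 * a - 2 * prev, a
    assert abs(a_closed(m) - a) < 1e-4  # floating point, so allow slack

print(pr_within(5))   # ~ 0.546875, i.e. 560/1024 for 10 tosses
```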
 
