A couple of things.
1.) I don't think this is a B-level thread. A lot of things in probability are easy to state but not easy to solve, and this is one of them. Problems involving runs of length ##k## are not treated in any of the first-course undergrad probability books that I'm aware of.
2.) Rather than looking at the probability of having at least one 'success' after ##n## tosses, let's look at the complement -- i.e. the probability of never having seen a run of ##k## tails after ##n## tosses.
Let me suggest partitioning the events as
##H##
##TH##
##TTH##
##\vdots##
##\underbrace{T...T}_{\text{k-1 times}}H##
where there are ##k## of these in total. Call the probabilities of these events ##p_0, p_1, \dots, p_{k-1}## respectively; for a fair coin, ##p_j = P(\underbrace{T\cdots T}_{j \text{ times}}H) = \big(\tfrac{1}{2}\big)^{j+1}##.
Assume ##n\geq k##, and then try working with the matrix below.
##\mathbf A^r \mathbf u = \begin{bmatrix}
p_0 & p_1 & p_2 & \dots &p_{k-1}\\
1 & 0 & 0 & \dots & 0\\
0& 1& 0 & \dots & 0\\
\vdots & \vdots & \ddots & \vdots & \vdots\\
0 & 0 & 0 & 1 & 0
\end{bmatrix}^r \begin{bmatrix}
u_{k-1}\\
u_{k-2}\\
\vdots\\
u_{1}\\
u_{0}
\end{bmatrix}##
This chain gives you the probability of not having seen a run after a desired number of tosses. E.g. with ##\mathbf e_1## as the first standard basis vector, ##\mathbf e_1^T \mathbf A^r \mathbf u## with ##r = 1## gives the probability of no run after ##k## tosses, ##r=2## gives the probability of no run after ##k+1## tosses, and so on.
By inspection we see that
##
\begin{bmatrix}
u_{k-1}\\
u_{k-2}\\
\vdots\\
u_{1}\\
u_{0}
\end{bmatrix} =
\begin{bmatrix}
1\\
1\\
\vdots\\
1\\
1
\end{bmatrix}##
i.e. ##\mathbf u = \mathbf 1##
In words: the above can be interpreted as "last step analysis", and the way to read it is that after 0, 1, 2, ..., and even ##k-1## tosses, we haven't seen a run yet with probability 1. These are the initial conditions of the recurrence implied by how the problem is set up.
Then if we look at the total probability of no run after, say, ##n## tosses, it is because we must have seen an event from the above collection: we must not yet have had a run of ##k## tails at time ##n-j##, followed by an event of length ##j## from the above -- and we sum over ##j=1, 2,\dots,k##. If you take a careful look at the lack of 'overlaps' in the events, you'll see they are mutually exclusive.
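The same recurrence can be run directly as scalars, without the matrix. Here is a hypothetical helper (assuming a fair coin, so the block ##\underbrace{T\cdots T}_{j}H## has probability ##2^{-(j+1)}##):

```python
def prob_no_run_rec(k, n):
    # u[m] = P(no run of k tails in the first m tosses); u_0 = ... = u_{k-1} = 1
    u = [1.0] * k
    for m in range(k, n + 1):
        # condition on the first (mutually exclusive) block T^j H, j = 0..k-1,
        # each with probability 2^-(j+1), leaving m-1-j tosses to stay run-free
        u.append(sum(0.5 ** (j + 1) * u[m - 1 - j] for j in range(k)))
    return u[n]

print(prob_no_run_rec(2, 3))  # 0.625 = 5/8
```

For ##k=3## this reproduces the tribonacci-style counts, e.g. ##u_6 = 44/64##.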
A renewal theory take:
Technically you don't need renewal theory to interpret the above argument, though it helps to clarify that we're looking at a defective renewal process: the first row of ##\mathbf A## gives the (defective) probability distribution of the time until the next renewal, where a renewal is defined as the event that an ##H## occurs and a run of ##k## ##T##'s has not yet occurred. Note that this viewpoint offers two different perspectives. ##\mathbf e_1^T \big(\mathbf A^r \mathbf 1\big)## is the basic recurrence argument from above, where we are in effect using ##\mathbf A## as an expectations operator. On the other hand, if we use associativity and look at ##\big(\mathbf e_1^T \mathbf A^r\big) \mathbf 1##, we can interpret this as a queueing model: we start at the 'top' of the queue, and after ##r## trials we use the ones vector to sum up everything that still exists in the queue (but since the process is defective, a decent chunk will have 'fallen off the face of the earth' -- and that missing mass is precisely the probability of at least one run of ##k## tails).
- - - - -
The above matrix represents a residual life chain from renewal theory. It is also similar to a companion matrix, and it should look somewhat Fibonacci-matrix-ish.
A nice way to have some fun and reach the conclusion the OP was asking about is to consider that
##p_0 + p_1 + p_2 + ... +p_{k-1}\lt 1##
(how do we know this? Easy way: if this were an infinite series, it would be the geometric distribution and would sum to one),
but we may (perhaps numerically) solve for some ##\theta\gt 1## such that
##\theta p_0 + \theta^2 p_1 + \theta^3 p_2 + ... + \theta^k p_{k-1}= 1##
and make use of exponential tilting. See e.g. the last two pages of the notes here, which show (with a typo or two) how to apply exponential tilting to a Fibonacci matrix:
http://galton.uchicago.edu/~lalley/Courses/Summer/Renewal2.pdf
- - - -
conclusion:
In the case of ##k=2## you'll see that ##\theta = -1 + \sqrt{5}##.
This should look familiar in terms of Fibonacci -- i.e. ##\frac{2}{\theta} = \frac{1+\sqrt 5}{2}## is the dominant root of the Fibonacci recurrence.
And in general, because you have a fair coin (homogeneous probability of heads and tails) and are talking about a pattern that is a 'pure' run of ##k## tails (homogeneous pattern), you'll see that
##\theta p_0 + \theta^2 p_1 + \theta^3 p_2 + ... + \theta^k p_{k-1}= 1##
is equivalent to
##\big(\frac{\theta}{2}\big) + \big(\frac{\theta}{2}\big)^2 + \big(\frac{\theta}{2}\big)^3 + ... + \big(\frac{\theta}{2}\big)^k= 1##
so if you solve for ##\frac{\theta}{2}##, then yes, its reciprocal ##\frac{2}{\theta}## is the dominant root of the k-step Fibonacci recurrence.
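If you want to see this numerically, here is a small bisection sketch for the tilting equation (the name `tilt_root` and the bracketing interval ##[1, 2]## are my own choices; the left-hand side is increasing in ##\theta##, negative at 1 because the distribution is defective, and positive at 2):

```python
import math

def tilt_root(k, tol=1e-13):
    """Solve sum_{j=0}^{k-1} theta^(j+1) p_j = 1 with p_j = 2^-(j+1),
    i.e. sum_{i=1}^{k} (theta/2)^i = 1, by bisection on [1, 2]."""
    f = lambda t: sum((t / 2.0) ** i for i in range(1, k + 1)) - 1.0
    lo, hi = 1.0, 2.0   # f(1) = -2^-k < 0 and f(2) = k - 1 > 0 for k >= 2
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if f(mid) < 0 else (lo, mid)
    return 0.5 * (lo + hi)

theta = tilt_root(2)
print(theta)      # ~1.2360679...  = -1 + sqrt(5)
print(2 / theta)  # ~1.6180339...  = golden ratio
```

For ##k=3##, ##2/\theta## comes out to the tribonacci constant, the dominant root of ##y^3 = y^2 + y + 1##.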
The advantage of the above setup is that it rapidly generalizes to arbitrary patterns and coins that are not necessarily fair (or even to objects that aren't coins -- say, patterns in dice rolls or random letters, etc.).
It is possible that there's a more direct combinatorial argument that works for this special case of a fair coin and a run of ##k## tails, though I don't know it, and I have a strong preference for arguments that are probabilistic in nature over combinatorial manipulations.
- - - -
A more direct close: you can, with some care, also skip the 'proper' exponential tilting by setting ##\theta := 2##, in which case the first row of ##\mathbf A_{\text{tilted}}## becomes all ones and you recover the k-step Fibonacci matrix exactly. Maybe this is what the OP wants here, though you lose some nice probabilistic linkages and exponential tilting becomes just an algebraic hack. But yes, this would seem to tie directly into
Chris Miller said:
The probability P of k consecutive tails occurring in n coin tosses is 1 - (1 / F) where F is element n+2 in the k-step Fibonacci series divided by 2^n.
Though I think there is a typo and it should read ##1 - F##.
Qualitatively ##\frac{1}{F}## can't be right, since probabilities are bounded between 0 and 1, while ##\big(\frac{\text{Fibonacci of } n}{2^n}\big)^{-1} =\frac{2^n}{\text{Fibonacci of } n}## grows much larger than ##1## -- e.g. consider the roots of the characteristic polynomial for a 2-step Fibonacci: they both have magnitude less than 2, so the Fibonacci numbers grow strictly slower than ##2^n##.
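A quick numerical check of the corrected claim for ##k=2## (helper names `fib` and `p_run_brute` are mine; this just compares ##1 - F_{n+2}/2^n## against brute-force enumeration):

```python
from itertools import product

def fib(n):
    # standard Fibonacci with fib(1) = fib(2) = 1
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

def p_run_brute(n, k=2):
    # brute-force P(at least one run of k tails in n fair tosses)
    hits = sum('T' * k in ''.join(s) for s in product('HT', repeat=n))
    return hits / 2 ** n

for n in range(2, 11):
    assert abs(p_run_brute(n) - (1 - fib(n + 2) / 2 ** n)) < 1e-12
print("1 - Fib(n+2)/2^n matches brute force for n = 2..10")
```

E.g. at ##n=3##: ##1 - F_5/2^3 = 1 - 5/8 = 3/8##, and indeed exactly three of the eight sequences (TTT, TTH, HTT) contain ##TT##.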