The Drunkard's Walk: A Discussion of the Text

  • #1
benorin
This thread is for us to discuss the text The Drunkard's Walk: How Randomness Rules Our Lives by Leonard Mlodinow. By that I mean: for me to ask questions of you, those of you who will suffer me, my experts in probability, the PhysicsForums readers, about things I'm interested in from the text. I asked the forum moderators whether I could post short exact quotes from the text without violating copyright law, and they gave me their blessing. If anybody has interest, here's a link to the book. Don't be shy... post whatever you want us to analyze in here and we'll do our best.

https://www.amazon.com/dp/0307275175/?tag=pfamazon01-20
<< Moderator's Note -- link updated >>

I hope I'm not the only one directing conversations in this thread by the end of it; 'twas a good book and I think it ought to be discussed, and I feel that discussion should require pencil and paper. I have marked this thread with the Basic (High School) prefix because the book was written for such an audience, but I've no way of knowing whether my questions will venture off into Intermediate at some point. As for me, since you might need to know this to answer my questions in a way I can understand: I was a math major, and I studied mostly analysis, though I did take the standard undergrad probability and statistics course.

There was a particular passage that I had in mind when I decided to create this thread: pg. 174, "[...]The point was rather starkly illustrated by mathematician George Spencer-Brown, who wrote that in a random series of 10^1,000,007 zeroes and ones, you should expect at least 10 nonoverlapping subsequences of 1 million consecutive zeroes." And the citation is George Spencer-Brown, Probability and Scientific Inference (London: Longmans, Green, 1957), pp. 55-56. Actually, 10 is a gross underestimate.

I am aware that the surrounding text to the above quote makes a very different point not related to my line of questioning so I have omitted the point mentioned in the quote since it doesn't serve my purpose.

My questions concerning this passage are: (1) how does one compute such a thing (in brief)? (2) For random sequences of events that are very much shorter, say fewer than 100 events, can we use a similar calculation to grasp the likely streaks of consecutive events when the underlying probability of each event is known? (3) Specifically, suppose we had three possible events A, B, and C, with P(A)=0.4, P(B)=0.4, and P(C)=0.2; in a sequence of 30 events, what should we expect for streaks of A's, B's, or C's? This relates to a game I play in which you buy chests containing one of three colored coins, and you need to create sets of the colored coins to make an item you can use in the game. It has often been a player's experience, while needing only one purple coin, to roll 7 orange coins in a row. I wish to quantify this experience and verify that the good ol' RNG actually does approximate randomness rather than stacking the deck against us.

Well, that's all for today, hope you aren't disappointed that I'm not the one providing the thoughtful analysis this time, just questions for now.

Thanks for your time Probability Forum Reader,
-Ben Orin
 
  • #3
How do you want folks to answer? It seems you want us to read and discuss the book. I think our attention spans may be too small for that.

In any event, the problem of ##10^{10^6+7}## looks interesting.
 
  • #4
benorin said:
There was a particular passage that I had in mind when I decided to create this thread: pg. 174, "[...]The point was rather starkly illustrated by mathematician George Spencer-Brown, who wrote that in a random series of 10^1,000,007 zeroes and ones, you should expect at least 10 nonoverlapping subsequences of 1 million consecutive zeroes." And the citation is George Spencer-Brown, Probability and Scientific Inference (London: Longmans, Green, 1957), pp. 55-56. Actually, 10 is a gross underestimate.

I am aware that the surrounding text to the above quote makes a very different point not related to my line of questioning so I have omitted the point mentioned in the quote since it doesn't serve my purpose.

My questions concerning this passage are (1) how does one compute such a thing (in brief)?

I actually find the phrase "expect at least 10..." linguistically ambiguous.

To the extent he means to lower bound the expected number of these non-overlapping subsequences, then yes, this is a gross underestimate. (Note: I have a copy of this book, but it's in storage -- long story.)

To get a quick and easy lower bound on the expected number of these subsequences, bunch the trials into groups of ##1,000,000## -- this converts the problem into ##10^{1,000,001}## independent Bernoulli trials, one per group. The probability of success per trial is ##2^{-1,000,000}##. Now apply linearity of expectation and find that the expected number of successes in this 'bunched' setup is

##2^{-1,000,000}\big(10^{1,000,001}\big) = 10 \cdot \big(\frac{10}{2}\big)^{1,000,000} = 10 \cdot \big(5\big)^{1,000,000} \gt 10##
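To see just how gross an underestimate, here's a quick log-space evaluation of that bound (a sketch I've added; the digit counts are my own arithmetic, not from the text):

Python:
import math

m = 1_000_000             # run length of zeros we want
log10_groups = 1_000_001  # 10**1_000_007 tosses / 10**6 per group = 10**1_000_001 groups
log10_p = -m * math.log10(2)   # log10 of the per-group success probability 2**-m

print(log10_groups + log10_p)  # ~698971: the bound is about 10**698971 successes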
 
  • #5
Here's the way that I would compute a rough lower bound to the number [itex]K[/itex] of sequences of [itex]m[/itex] consecutive 0s in a random string of 0s and 1s of length [itex]N[/itex].

Break the [itex]N[/itex] bits into regions of length [itex]m[/itex]. For each region, there is a probability of [itex]2^{-m}[/itex] that it will be all 0s. So the expected number of regions that are all zeros is the number of regions times the probability per region. There are [itex]N/m[/itex] regions, so we estimate [itex]K[/itex] as follows:

[itex]K = \frac{N}{m} 2^{-m}[/itex]

This is a big underestimate because, even though a region may not consist of [itex]m[/itex] consecutive 0s, it might end with [itex]n[/itex] consecutive 0s, which could combine with [itex]m-n[/itex] consecutive 0s at the start of the next region to produce an extra run of [itex]m[/itex] consecutive 0s. But if you're just interested in a lower bound on [itex]K[/itex]...

Rearranging, we can ask how big [itex]N[/itex] must be to have at least [itex]K[/itex] regions of at least [itex]m[/itex] consecutive 0s.

[itex]N = m K 2^m[/itex]

In the particular problem of interest, [itex]m = 10^6[/itex], [itex]K = 10[/itex], and [itex]2^m \approx 10^{0.30103 \cdot m} = 10^{301030}[/itex]

So my lower-bound estimate is that [itex]N = 10^6 \cdot 10 \cdot 10^{301030} = 10^{301037}[/itex]

It seems to me that the estimate [itex]N = 10^{1000007}[/itex] assumes that the chance of getting 1 million 0s is [itex]10^{-1000000}[/itex] instead of [itex]2^{-1000000}[/itex].
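For small numbers, this region-count estimate is easy to check by direct simulation; here's a minimal sketch (my own, with made-up small parameters). Counting runs at arbitrary offsets would only find more, which is the sense in which the formula is a lower bound.

Python:
import numpy as np

rng = np.random.default_rng(0)
m, N, trials = 4, 40, 100_000

hits = 0
for _ in range(trials):
    bits = rng.integers(0, 2, size=N)
    regions = bits.reshape(N // m, m)              # the disjoint length-m regions
    hits += int((regions.sum(axis=1) == 0).sum())  # regions that are all zeros

print(hits / trials)        # simulated expectation, ~0.625
print((N / m) * 2.0**-m)    # the formula: 10 * 2**-4 = 0.625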
 
  • #6
By the way, there are some very nice asymptotic results / estimates that may be used for this problem. Specifically, using some mixture of renewal theory and Markov chains, we can use the time-average renewal rate:

##\lim_{t \to \infty}\frac{E[N(t)]}{t} = \frac{1}{\bar{X}}##

where we have ##t## (positive integer time) iterations, ##E[N(t)]## is the expected count of "successful" non-overlapping subsequences, and ##\bar{X}## is the average inter-renewal interval.

With respect to this problem, there are ##m## states in the Markov chain, and we can solve for

##\frac{E[N(t)]}{t} \approx \frac{1}{\bar{X}} = \frac{1}{(2^m - 2)}##
for large ##t##

which gives us

##E[N(t)] \approx \frac{t}{2^m - 2}##
for large t
where ##m = 1,000,000## and ##t = 10^{1,000,007}##
- - - -
As a quick check with respect to the quality of the estimate, we may note:
there is exponential growth in ##\bar{X}##

##\log_2 \big(2^{1,000,000} - 2 \big) \approx 1,000,000##

vs our value of ##t## iterations

##\log_2 \big(10^{1,000,007}\big) = 1,000,007 \big(\log_2 (10)\big) \approx 1,000,000 (3.3)##.

Hence in logspace the number of iterations (t) is more than 3x the expected renewal time --- i.e. more than the expected renewal time, cubed, after exiting logspace --- which gives us a quick estimate that ##t## is quite large relative to the expected renewal time. (There may be Markov chain literature on estimating the second-largest eigenvalue of this chain to get a more precise estimate of the convergence rate, though I'm not aware of any specifics here.) A lot of these renewal theory estimates exist because they are good approximations, so long as your number of iterations is reasonably large.
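Evaluating the renewal estimate in log space is a one-liner; here's a sketch (my arithmetic, not from the post). Note it lands within a factor of about ##10^6## of the bunched lower bound from post #4, which is reassuring.

Python:
import math

m, log10_t = 1_000_000, 1_000_007
log10_Xbar = m * math.log10(2)   # log10(2**m - 2) ~ m*log10(2) for large m
print(log10_t - log10_Xbar)      # log10 E[N(t)] ~ 698977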
 
  • #7
Note on the rate of convergence to steady state:
- - - -
Note: this entire discussion can be safely skipped. Because we have such a large ##m## of a million, and the number of iterations is so much larger than the number of states, and as a raw number the number of iterations is just massive, the only way we could be worried about convergence problems is if the largest-magnitude subdominant eigenvalue has magnitude almost arbitrarily close to 1. In logspace (base 2 or e -- doesn't really matter here), that magnitude would need to be in the neighborhood of ##-\frac{1}{t}##. For a degree-##m## polynomial with ##m \lt \lt t##, whose coefficients are all integers (after doubling the matrix), it would be quite remarkable (and perhaps impossible) to find eigenvalues with magnitudes like ##e^{\frac{-1}{t}}##. There is a ton of special structure in the problem, though, and working through explicit smaller examples may be helpful to some, so I went ahead and wrote up the below.

- - - -
There is some extremely nice structure in here --- it allows one to get the Markov chain eigenvalue estimates in closed form, which is quite rare. In general, knowing the magnitude of the subdominant eigenvalue is quite nice because it allows you to estimate the (exponential) rate of convergence to steady state for a finite-state (connected and aperiodic) Markov chain.

Since the process in the problem looks like a generalization of a geometric series associated with coin tossing, the eigenvalues maybe aren't that surprising.

There are a few different ways to model this. Here's one way, with the matrix representing the Markov chain, shown in the case of ##m = 7##.

##\mathbf A = \left[\begin{matrix}0.5 & 0.5 & 0.5 & 0.5 & 0.5 & 0.5 & 0.5\\0.5 & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 & 0.5\\0.0 & 0.5 & 0.0 & 0.0 & 0.0 & 0.0 & 0.0\\0.0 & 0.0 & 0.5 & 0.0 & 0.0 & 0.0 & 0.0\\0.0 & 0.0 & 0.0 & 0.5 & 0.0 & 0.0 & 0.0\\0.0 & 0.0 & 0.0 & 0.0 & 0.5 & 0.0 & 0.0\\0.0 & 0.0 & 0.0 & 0.0 & 0.0 & 0.5 & 0.0\end{matrix}\right]##

where at time zero we start in position ##\mathbf e_1## (i.e. the top-left corner), but we are interested in setting up a counting process that counts the number of times we hit the far-right column (i.e. state ##m##, aka ##\mathbf e_m##). Technically time zero begins a delayed renewal, because proper renewals begin and end at ##\mathbf e_m##, though this is a minor point.

An extremely close graph is given by

## \mathbf B = \left[\begin{matrix}0.5 & 0.5 & 0.5 & 0.5 & 0.5 & 0.5 & 1\\0.5 & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 & 0.0\\0.0 & 0.5 & 0.0 & 0.0 & 0.0 & 0.0 & 0.0\\0.0 & 0.0 & 0.5 & 0.0 & 0.0 & 0.0 & 0.0\\0.0 & 0.0 & 0.0 & 0.5 & 0.0 & 0.0 & 0.0\\0.0 & 0.0 & 0.0 & 0.0 & 0.5 & 0.0 & 0.0\\0.0 & 0.0 & 0.0 & 0.0 & 0.0 & 0.5 & 0.0\end{matrix}\right]##

This graph is identical to the other one, except it adds 1 more iteration on average per renewal period, because it makes state ##m## always go to state ##1##, instead of going to state ##1## half the time and state ##2## half the time. (I.e. we 'artificially' enter state 1 with probability ##p##; basic geometric series math tells us that on average it takes ##\frac{1}{p}## iterations to leave state 1, hence the expected extra cost is ##p\big(\frac{1}{p}\big) = 1##.)

The main idea is that the difference between ##\bar{X} = 2^m## vs ##\bar{X} = 2^m - 1## vs ##\bar{X} = 2^m - 2## is negligible for large ##m##. What this means is that if we can get comfortable with the spectrum of ##\mathbf B## and how fast it converges, and in turn the implications for renewal theory estimates of its expected value, then we also become comfortable with ##\mathbf A##. Note the prior post states ##\bar{X} = 2^m - 2## (i.e. the expected return time from state m to state m), which is the renewal interval for state m in ##\mathbf A##; for convenience we'll take the renewal interval for state m in ##\mathbf B## to be ##\bar{X} = 2^m - 1##, which is one more than ##2^m - 2##, though, again, technically the renewal interval for m in ##\mathbf B## is at most one more than in ##\mathbf A##.

This second graph is still a Markov chain and hence has an eigenvalue equal to one. For convenience we can double it (which doubles the eigenvalues), which gives us

##\mathbf C = 2\mathbf B = \left[\begin{matrix}1 & 1 & 1 & 1 & 1 & 1 & 2\\1 & 0 & 0 & 0 & 0 & 0 & 0\\0 & 1 & 0 & 0 & 0 & 0 & 0\\0 & 0 & 1 & 0 & 0 & 0 & 0\\0 & 0 & 0 & 1 & 0 & 0 & 0\\0 & 0 & 0 & 0 & 1 & 0 & 0\\0 & 0 & 0 & 0 & 0 & 1 & 0\end{matrix}\right]##

then effect a similarity transform with the reflection matrix ##\mathbf J## (which is a permutation matrix and involutory) where

##\mathbf J = \left[\begin{matrix}0 & 0 & 0 & 0 & 0 & 0 & 1\\0 & 0 & 0 & 0 & 0 & 1 & 0\\0 & 0 & 0 & 0 & 1 & 0 & 0\\0 & 0 & 0 & 1 & 0 & 0 & 0\\0 & 0 & 1 & 0 & 0 & 0 & 0\\0 & 1 & 0 & 0 & 0 & 0 & 0\\1 & 0 & 0 & 0 & 0 & 0 & 0\end{matrix}\right]##

we get

## \mathbf J \mathbf C \mathbf J^{-1} = \mathbf J \mathbf C \mathbf J =\left[\begin{matrix}0 & 1 & 0 & 0 & 0 & 0 & 0\\0 & 0 & 1 & 0 & 0 & 0 & 0\\0 & 0 & 0 & 1 & 0 & 0 & 0\\0 & 0 & 0 & 0 & 1 & 0 & 0\\0 & 0 & 0 & 0 & 0 & 1 & 0\\0 & 0 & 0 & 0 & 0 & 0 & 1\\2 & 1 & 1 & 1 & 1 & 1 & 1\end{matrix}\right]##

which is the (transpose of the) companion matrix. This means we can simply read off the characteristic polynomial from the bottom row of said matrix. From here, factor out ##(x-2)##: 1 is an eigenvalue of ##\mathbf B##, so after doubling everything, 2 is an eigenvalue of ##\mathbf C##, and we want to factor it out of the polynomial to get a degree ##m-1## polynomial. After factoring this out, we get

##p(x) = x^{m-1} + x^{m-2} + x^{m-3}+ ... + x + 1##
All coefficients are positive, so no positive number can be a root. Here we can sum the geometric series to get
##p(x) = \frac{1 - x^m}{1-x}##

which tells us that the eigenvalues of ##\mathbf C## other than 2 are exactly the ##m##th roots of unity, except of course for the value of 1. That is, ##\mathbf C## has eigenvalues ##\{2, \omega^{1}, \omega^{2}, \omega^{3}, ..., \omega^{m-2}, \omega^{m-1}\}##, where ##\omega := e^{2\pi i/m}##.

From here we divide everything by 2 to get that close approximation to our original problem, and find that ##\mathbf B## has eigenvalues given by

##\{1,\frac{ \omega^{1}}{2}, \frac{ \omega^{2}}{2}, \frac{ \omega^{3}}{2}, ..., \frac{ \omega^{m-2}}{2}, \frac{ \omega^{m-1}}{2}\}##

Thus this Markov chain has subdominant eigenvalue(s) of magnitude ##\frac{1}{2}##. This tells us that if we write ##\mathbf B## in terms of spectral projectors,

##\mathbf B^k = E_1 + \sum_{r=2}^m \lambda_r^k E_r \approx E_1##

for even moderately large ##k##.

Note: in this setup, ##E_1 = \mathbf v \mathbf 1^T## and ##\mathbf 1^T \mathbf v = 1##, where ##\mathbf v## is the dominant right eigenvector and ##\mathbf 1## is the dominant left eigenvector -- i.e. each column of ##\mathbf B## sums to one. (This is par for the linear algebra literature, though it's common in the probability literature to use the transpose of this -- i.e. have each row of ##\mathbf B## sum to one.)

That is, the graph is well approximated by its steady state even for "pretty small" ##k##. (Because our interest is in reaching that final state, and the paths increment by at most one state at a time, then I'd suggest the approximation is reasonable for all ##k \gt m##.)

The literal (expected) counting process we're interested in is

##\sum_{k=1}^t \mathbf e_m^T \mathbf B^k \mathbf e_1 \approx \sum_{k=1}^t \mathbf e_m^T E_1 \mathbf e_1 = \sum_{k=1}^t \mathbf e_m^T \mathbf v\big(\mathbf 1^T \mathbf e_1\big) = \sum_{k=1}^t \mathbf e_m^T \mathbf v\big(\mathbf 1^T \mathbf e_m\big) = \sum_{k=1}^t \mathbf e_m^T E_1 \mathbf e_m##

where we recall that ##\mathbf 1^T \mathbf e_1 = 1## and ##\mathbf 1^T \mathbf e_m = 1##

From the prior post, there's the renewal theory view, which can be interpreted as

## \frac{E[N(t)]}{t} \approx \frac{1}{\bar{X}} = \frac{1}{(2^m - 2)} \approx \frac{1}{(2^m - 1)}=\frac{1}{t} \sum_{k=1}^t \mathbf e_m^T E_1 \mathbf e_m = \mathbf e_m^T E_1 \mathbf e_m ##

Which, I hope, is a nice clear link between the finite state markov chains approach and renewal theory.

I'd suggest the above estimate can be decomposed into two portions

##\frac{m}{t} \approx 0\%## that is "bad" and ##\frac{t-m}{t} \approx 100\%## that is "good" because our ##t## is so much larger than ##m##.
- - - -
We may wish to be conservative and choose the first, say, ##3m## iterations to be 'bad', but this gives the same end result:

##\frac{3m}{t} \approx 0\%## that is "bad" and ##\frac{t-3m}{t} \approx 100\%## that is "good" because our ##t## is so much larger than ##m##.
- - - -

Put differently, the asymptotic approach from renewal theory is an extremely good estimate of the expected value for the problem.
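As a quick numerical sanity check of the claimed ##\bar{X} = 2^m - 2## (a sketch I've added, using the column-stochastic convention above), one can build ##\mathbf A## for a small ##m##, extract the stationary distribution, and compare the expected return time to state ##m##:

Python:
import numpy as np

def chain_A(m, p=0.5):
    # the column-stochastic chain from above (each column sums to 1)
    A = np.zeros((m, m))
    for k in range(m - 1):
        A[k + 1, k] = p       # advance one state on a success
        A[0, k] = 1 - p       # fall back to state 1 on a failure
    A[0, -1], A[1, -1] = 1 - p, p   # leaving state m, as in the matrix above
    return A

m = 7
w, V = np.linalg.eig(chain_A(m))
pi = np.real(V[:, np.argmax(np.real(w))])
pi /= pi.sum()                 # stationary distribution
print(1.0 / pi[-1], 2**m - 2)  # expected return time to state m: both print 126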

(Note: the spectrum of ##\mathbf A## is actually ##1##, together with ##\frac{1}{2}## times the ##(m-1)##th roots of unity other than 1, again, plus an eigenvalue of zero. This seemed a lot less pleasant to show. Note: the same argument / process can be used for any ##0\lt p \lt 1## to find the spectrum of ##\mathbf B##, which in general is
##\{1, p\omega^{1}, p\omega^{2}, p\omega^{3}, ..., p\omega^{m-2}, p\omega^{m-1}\}##

though it is a little bit messier to show this.)

- - - -
here's some code for anyone interested

Python:
import numpy as np
# python 3.x

m = 7
p = 0.5

A = np.zeros((m,m))
B = np.zeros((m,m))

A[0,-1]= 1-p
A[1,-1]= p

B[0,-1]= 1
##########
A[0,0] = 1-p
A[1,0] = p
B[0,0] = 1-p
B[1,0] = p

for k in range(0, m-1):
    A[k+1,k] = p
    A[0,k] = 1-p
 
    B[k+1,k] = p
    B[0,k] = 1-p

print(A,"\n")
print(B)
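And a quick numerical check of the spectrum claim (a sketch I've added; it rebuilds ##\mathbf B## so it runs standalone):

Python:
import numpy as np

m, p = 7, 0.5
B = np.zeros((m, m))
B[0, -1] = 1
for k in range(m - 1):
    B[k + 1, k] = p
    B[0, k] = 1 - p

eigs = np.linalg.eigvals(B)
print(np.sort(np.abs(eigs))[::-1])
# expect one eigenvalue of magnitude 1 and m-1 eigenvalues of magnitude p = 0.5,
# i.e. the m-th roots of unity other than 1, scaled by p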
 
  • #8
My interest lies in Mlodinow's treatment of what is popularly known as the Two Child Problem (or TCP). If I recall correctly, Mlodinow actually phrased the problem in a couple of ways. Here's the one that I recall, hopefully correctly and with the same meaning but in my own words:
You recall that a distant relative has two children, and that one of them is a girl. But you don't recall whether both are girls. What is the probability that both are girls?
Some will call this the Two Child Paradox, referring to the fact that conflicting answers are claimed, but I don't like to. One answer is wrong, so it doesn't make a paradox.

In the problem's predecessor, referred to as Bertrand's Box Paradox, the "paradox" that Joseph Bertrand referred to was not the problem, but how to show which of the conflicting answers must be incorrect. It was a cautionary tale about how NOT to use information in a probability problem. Unfortunately, that was exactly how Mlodinow used information in his TCP.

What if, in Mlodinow's version, I tell you that I recall one gender in a two-child family, but I don't tell you what that gender is. What are the chances the family has two children of the same gender? You can solve this as a conditional probability problem, using the Law of Total Probability. You define all the possibilities for what I could know, determine the conditional probability of matching genders in each of those cases, multiply each conditional probability by the probability that the respective case is true, and add them up. The two cases are {boy, girl}, the conditional probability in each case is the answer to Mlodinow's version of the question, and it turns out not to matter what the probabilities of each case are, since they must add up to 1. The answer is the same as the answer to Mlodinow's question.

If Mlodinow's answer is correct, the fact that I told you "I know one gender" changed the probability of matching genders from 1/2 to 1/3. Since I gave you no useful information, such a change would be a paradox, and that disproves Mlodinow's answer. In fact, this paradox proves that any answer other than 1/2 must be incorrect. But it doesn't show why 1/2 is correct.

But that's easy: if the family has two girls (or two boys), you could only remember a girl (or a boy) in Mlodinow's question. But if the family has one of each, you could have remembered that there was a boy even when there was a girl. In fact, without information saying otherwise, you must assume there was a 50% chance of remembering either. Mlodinow counted all of the cases where the family included a girl, but probability says he should only count half of the ones where it included a boy as well. Doing so makes the answer 1/2. The only way Mlodinow's solution is correct is if there was no possibility that you could remember a boy. That is, if you had asked the relative "Is either of your children a girl?" and they answered "yes" without further explanation.
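Here's a minimal simulation sketch (my own, not from the book) contrasting the two readings: a gender recalled at random from the two children, versus a "yes" answer to the explicit question "Is either of your children a girl?":

Python:
import random

random.seed(1)
trials = 1_000_000
recall_girl = recall_girl_both = 0   # "I recall a girl" (random recall)
has_girl = has_girl_both = 0         # "is either a girl?" answered yes

for _ in range(trials):
    kids = [random.choice("BG"), random.choice("BG")]
    two_girls = kids.count("G") == 2
    if random.choice(kids) == "G":   # a remembered child, chosen at random
        recall_girl += 1
        recall_girl_both += two_girls
    if "G" in kids:                  # the explicit-question filter
        has_girl += 1
        has_girl_both += two_girls

print(recall_girl_both / recall_girl)  # ~1/2
print(has_girl_both / has_girl)        # ~1/3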

Mlodinow also compared his question to recalling that the older child was a girl, in which case his answer was 1/2. What he failed to recognize was that what makes this question different is that it distinguishes the two children, not that one was born first. Any method that distinguishes them - the older, the one whose name is alphabetically first, the one who sits to Mother's right at the dinner table - accomplishes the same thing.

+++++

But Mlodinow didn't stop there.
Now suppose that you recall that one of them is a girl with a very unusual name, like "Florida." Does this change the probability that both are girls?
Mlodinow said it did - to approximately 1/2. In fact, in internet discussions of the book, he said it was (2-f)/(4-f), where f is the proportion of girls named Florida, and the fact that f is small allowed him to dismiss the fact that he included families with two girls named Florida in his calculation.
  1. Mlodinow's solution is correct only if you had asked the relative "Is either of your children a girl named Florida?" and they answered "yes" without further explanation.
  2. If you don't know that this question was asked, a variation of Bertrand's Box Paradox disproves Mlodinow's result. The calculation may not be simple, but Mlodinow's logic says that recalling any name should increase the probability from 1/3 to something between 1/3 and 1/2. The unusualness of the name "Florida" just let him approximate it as 1/2. But if this logic were correct, saying "I recall that one had a girl's name" would paradoxically increase the probability from 1/3 while keeping it below 1/2. This increase, and the necessary change, goes away if you recognize that the original answer must be exactly 1/2.
  3. If you can't have two girls named Florida in a family, then recalling a girl with that name distinguishes her from her sibling, with the exact same effect as saying she was the older child.
  4. But not only does Mlodinow allow there to be two girls named Florida in the same family, he allows any name to be duplicated. It turns out that, if you disallow all duplication but ask "Is either of your children a girl named Florida?", then the probability of two girls is actually greater than 1/2. The set of possible names is smaller for a second daughter, and common names are removed more often than uncommon ones. So the second daughter is more likely to be named "Florida" than the first, increasing the chances of two girls when you ask about her.
 
  • #9
The original question regarding something from the book had a very large state space and an absurdly large number of iterations involved, which basically forced an asymptotic approach to estimate the expected number of non-overlapping sequences of length ##m## over ##t## iterations.

- - - - -
The follow-up question is a lot easier, and I belatedly realized the thread is tagged Basic (high school).

benorin said:
(2) For random sequences of events that are very much shorter, say fewer than 100 events, can we use a similar calculation to grasp the likely streaks of consecutive events when the underlying probability of each event is known? (3) Specifically, suppose we had three possible events A, B, and C, with P(A)=0.4, P(B)=0.4, and P(C)=0.2; in a sequence of 30 events, what should we expect for streaks of A's, B's, or C's?

All you need to calculate the expected number of streaks of length ##m## over 30 iterations is a tiny bit of coding and a tiny bit of matrix algebra. This calculation is straightforward and exact (up to floating-point precision).

Here is the matrix algebra approach, in the case of runs of 7 consecutive C's.
Python:
import numpy as np
import numba
# python 3.x

m = 7
p = 0.2

# p is the probability of a forward transition (setting aside the final state)
# and 1-p is probability of 'backward' transition

A = np.zeros((m,m))
# transition matrix A.  Note that Python indexes from zero.
A[0,-1]= 1-p
A[1,-1]= p

##########
A[0,0] = 1-p
A[1,0] = p

for k in range(0, m-1):
    A[k+1,k] = p
    A[0,k] = 1-p
print(A,"\n")
### the transition matrix has been generated.  Now the calculation begins
number_of_iterations = 30

score = 0
# this is the expected score / average count of successes
x = np.zeros(m)
x[0]=1
for _ in range(number_of_iterations):
    b = A @ x
    x = b
    score += x[-1]
    # this is the last state, i.e. state m, except Python indexes at zero so it would be referred to as state m-1
print("expected number of non overlapping sequences of length ", m)
print(score)

In math-ese, the for-loop calculation that generates the 'score' is
##r:= \text{number_of_iterations}##
## E[N(t)] = \text{expected score} = \sum_{k=1}^{r} \mathbf e_m^T \mathbf A^k \mathbf e_1##

and here is a comparable Monte Carlo simulation approach

Python:
@numba.jit(nopython =True)
def run_sim(m, n_trials, t_iterations_per_trial, p =0.5):
    counter = 0
    for _ in range(n_trials):
        cur_state = 0
        for k in range(t_iterations_per_trial):
            rand_num = np.random.random()
            if rand_num >= p:
                # go back to state zero
                cur_state = 0
            else:
                if cur_state == m-1:
                    cur_state = 1
                else:
                    cur_state += 1
                    if cur_state == m -1:
                        counter += 1
                    # increment each time we enter that terminal state
    return counter / (n_trials)

sim_result = run_sim(m, t_iterations_per_trial=number_of_iterations, n_trials=5_000_000, p=p)
print(sim_result)
 
  • #10
JeffJo said:
My interest lies in Mlodinow's treatment of what is popularly known as the Two Child Problem (or TCP). If I recall correctly, Mlodinow actually phrased the problem in a couple of ways. Here's the one that I recall, hopefully correctly and with the same meaning but in my own words:
"You recall that a distant relative has two children, and that one of them is a girl. But you don't recall whether both are girls. What is the probability that both are girls?"

The question seems ambiguous to me.
If the reason for my recall is that the father told me that both weren't boys, then under the usual assumption that a boy and a girl are equally likely, the answer is 1/3.
If the reason for my recall is that he said "my youngest is a girl," then the answer is 1/2.
JeffJo said:
In fact, this paradox proves that any answer other than 1/2 must be incorrect. But it doesn't show why 1/2 is correct.
What else is there other than there is no answer?
 
  • #11
Zafa Pi said:
The question seems ambiguous to me.
And so it is. As is "Did the coin I just flipped land on Heads or Tails?" The point of probability is to represent all of the ambiguous possibilities mathematically, with probabilities that add up to 1. So if you ask a question about probability, it must cover the ambiguous possibilities.

With my version of the question, "You recall that ... one of them is a girl. But you don't recall whether both are girls":
  • There is a 1/4 probability that the family has two girls. If you recall only one gender, it must be "girl."
  • There is a 1/4 probability that the family has two boys. If you recall only one gender, it must be "boy."
  • There is a 1/2 probability that the family has one girl and one boy. If you recall only one gender, assume the probability that it is "girl" is ##Q##, where ##0 \le Q \le 1##.
[Yours isn't any different.]

When we model this ambiguity, there is a ##(1/4+Q/2)## probability that you will recall "one of them is a girl," and a ##(1/4+(1-Q)/2)## probability that you will recall "one of them is a boy." My point is that these two probabilities can't be different unless you state a specific bias, so you must assume ##Q=1/2##.

And when you assume Q=1/2, you can "recall" that a family has a boy in cases where it also has a girl.
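A quick exact check of that calculation with rational arithmetic (a sketch I've added; the conditional probability follows from Bayes' rule on the three family types):

Python:
from fractions import Fraction

def p_two_girls_given_recall_girl(Q):
    # families: GG w.p. 1/4 (always recall a girl), BB w.p. 1/4 (never),
    # mixed w.p. 1/2 (recall a girl w.p. Q)
    p_recall_girl = Fraction(1, 4) + Fraction(1, 2) * Q
    return Fraction(1, 4) / p_recall_girl

print(p_two_girls_given_recall_girl(Fraction(1, 2)))  # 1/2, unbiased recall
print(p_two_girls_given_recall_girl(Fraction(1)))     # 1/3, the "is either a girl?" filter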

Zafa Pi said:
If the reason for my recall is that the father told me that both weren't boys, then under the usual assumption that a boy and a girl are equally likely, the answer is 1/3.
No, it isn't.

Your reasoning applies only if there is one girl and one boy. In that case, the father could have told you "both aren't boys" or "both aren't girls." You are assuming only one of those statements was possible. Why?
 
  • #12
StoneTemplePython, I greatly appreciate all the hard work you put into your posts in this thread, thank you! I used to have that kind of zeal in my old Calc-and-beyond homework help posts. But to the point: your posts are way, way over my head. I need something simple, based on no more than basic probability, calculus, or at the very most measure theory, so that I can follow you. Thanks much.
 
  • #13
benorin said:
StoneTemplePython, I greatly appreciate all the hard work you put into your posts in this thread, thank you! I used to have that kind of zeal in my old Calc-and-beyond homework help posts. But to the point: your posts are way, way over my head. I need something simple, based on no more than basic probability, calculus, or at the very most measure theory, so that I can follow you. Thanks much.

I think post #9, which refers to your question (and not the example in the book), should be within reach. For problems like this you can sometimes come up with a very clever Poisson approximation, but I'm not seeing it here. In general you can tackle them with some mixture of (a) countable/finite Markov chains, (b) renewal theory, (c) large deviation theory, and/or (d) martingales. Things get more and more abstract as you move away from (a). Outside of these tools/techniques, I'm afraid what you're asking can't be answered.

- - - -
note:

1.) The lower bound mentioned in post #4 should also be in reach -- it really is just basic probability.

2.) There is a very nice comparable problem, btw, that you'd learn to solve in an intro probability course, where you are asked to estimate the expected number of these desired subsequences but you allow the subsequences to overlap -- it's a simple and deep result based on a very clever application of linearity of expectation. Up to a very close approximation, there's a simple formula relating that problem to your original one (asymptotically at least), but I'm finding a heuristic explanation for why it holds to be elusive -- it becomes obvious if you look at the underlying Markov chain graphs, which are identical except for two conveniently located edges, but when I look for a simple, high-level, basic-probability reason why the formula holds, I come up short.

- - - -

Overall, this is a very nice little problem.

- - - -
subsequent note:

our expected count, ##E[N(t)]##, often written ##m(t)##, can be very nicely refined for the original massive example... cleverly using Wald's Equality and taking advantage of monotonicity in the underlying chain, we can actually put up error bars and say

##\frac{t}{\bar{X}} - 1 \leq m(t) \leq \frac{t}{\bar{X}}##

which in the case of the original massive problem, actually gives an absurdly tight estimate on the expected number of successes.

I suppose it's beyond the scope of the original question's intent, but it's a very nice result that's considerably stronger than merely being asymptotically correct.
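Here's a sketch (my own) checking those error bars against the exact matrix-power calculation from post #9, for small numbers, with ##\bar{X} = 2^m - 2## taken from post #6:

Python:
import numpy as np

m, p, t = 7, 0.5, 2000

A = np.zeros((m, m))
for k in range(m - 1):
    A[k + 1, k] = p
    A[0, k] = 1 - p
A[0, -1], A[1, -1] = 1 - p, p

x = np.zeros(m)
x[0] = 1.0
expected = 0.0
for _ in range(t):
    x = A @ x
    expected += x[-1]          # exact E[N(t)] by iterating the chain

Xbar = 2**m - 2                # expected renewal interval, from post #6
print(t / Xbar - 1, expected, t / Xbar)   # the bounds should bracket E[N(t)]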
 
  • #14
This next quote from the text may verge on too long but it illuminates what I want to talk about so well...

Suppose I tell you that I have made up a rule for the construction of a sequence of three numbers and that the sequence 2, 4, 6 satisfies my rule. Can you guess the rule? A single set of three numbers is not a lot to go on, so let's pretend that if you present me with other sequences of three numbers, I will tell you whether or not they satisfy my rule. Please take a moment to think up some three-number sequences to test -- the advantage of reading a book over interacting in person is that in the book the author can display infinite patience.
Now that you have pondered your strategy, I can say that if you are like most people, the sequences you present will look something like 2, 6, 8 or 8, 10, 12 or 20, 24, 30. Yes, those sequences obey my rule. So what's the rule? Most people, after presenting a handful of such test cases, will grow confident and conclude that the rule is that the sequence must consist of increasing even numbers. But actually my rule was simply that the series must consist of increasing numbers. The sequence 1, 2, 3, for example, would have fit; there was no need for the numbers to be even. Would the sequences you thought of have revealed this?
When we are in the grasp of an illusion--or, for that matter, whenever we have a new idea--instead of searching for ways to prove our ideas wrong, we usually attempt to prove them correct. Psychologists call this the confirmation bias, and it presents a major impediment to our ability to break free from the misinterpretation of randomness. In the example above, most people immediately recognize that the sequence consists of increasing even numbers. Then, seeking to confirm their guess, they try out many more sequences of that type. But very few find the answer the fast way--through the attempt to falsify their idea by testing a sequence that includes an odd number. ...
To make matters worse, not only do we preferentially seek evidence to confirm our preconceived notions, but we also interpret ambiguous evidence in favor of our ideas. ...
The confirmation bias has many unfortunate consequences in the real world. When a teacher initially believes that one student is smarter than another, he selectively focuses on evidence that tends to confirm the hypothesis.(48) When an employer interviews a prospective candidate, the employer typically forms a quick first impression and spends the rest of the interview seeking information that supports it.(49) ... And when people interpret the behavior of someone who is a member of a minority, they interpret it in the context of preconceived stereotypes.(51)

This is a really big clue, I think, to what I'd like to discuss: just how do we come to believe things when the distribution is unknown? An example from the Xbox One game Destiny 2: during an event, wearing a certain kind of armor while killing enemies is supposed to produce three kinds of orbs (void, solar, and arc). According to a web site, killing an enemy with the corresponding elemental weapon type while wearing that armor produces that kind of orb (for example, killing an enemy with a void weapon produces a void orb). I quickly disproved this idea by several means: killing an enemy with a void weapon whilst wearing the armor did not always produce a void orb; at times it produced no orbs at all; and sometimes it produced solar or arc orbs instead. So the website was wrong, I concluded. Another thing was that my dad and I produced predominantly void orbs while we were both playing the Voidwalker sub-class, and this led to the hypothesis that kills with sub-class elemental powers produce the corresponding orb types. I think this was just a "misinterpretation of randomness," like the sequence of even numbers in the quote. Now I rest on the hypothesis that the orb types generated are purely random, perhaps according to some fixed proportion. How can I test this? How might I better form hypotheses to test, in light of the quote? And what are the likely pitfalls encountered by people trying to discover the distribution on their own? Can we generalize these things?
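For the "purely random, according to some proportion" hypothesis, a standard tool is a chi-squared goodness-of-fit test: tally orb drops, pick the proportions you want to test, and compare. A minimal sketch, with made-up counts (the real work is collecting honest tallies and fixing the hypothesized proportions before looking at the data):

Python:
import numpy as np
from scipy.stats import chisquare

observed = np.array([88, 74, 38])         # hypothetical tallies: void, solar, arc
hypothesized = np.array([0.4, 0.4, 0.2])  # proportions you want to test

result = chisquare(observed, f_exp=hypothesized * observed.sum())
print(result.statistic, result.pvalue)
# a small p-value (say < 0.05) is evidence against the hypothesized proportions;
# a large one means the tallies are consistent with them (not proof they're true)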
 
  • #15
The thread lives! This must be quite a video game.

benorin said:
How can I test this? How might I better form hypotheses to test, in light of the quote? And what are the likely pitfalls encountered by people trying to discover the distribution on their own? Can we generalize these things?

It depends on what you're looking for here, but your recent questions are, basically, machine learning. And there's a ton of literature on sufficient conditions for learning (VC dimension is a big one) and on various techniques for harnessing generalization -- a big idea is to 'train' in-sample (perhaps with a validation set) and then set aside a bunch of data you use only once, for an "out of sample" test. I may be able to send some high-level links, but if you thought what I said before was too technical... you won't like this.
 
  • #16
The book touches on determinism: whether, if all initial conditions of the universe were known, one could predict the outcome of every event in spacetime (I'm probably paraphrasing that wrong). So try not to shrug at the raising of an old, tired debate that goes way back in philosophy; I really wonder at this with childlike amazement still, and wish to have a brief discussion of it here. Please brush off a happy argument that convinces you personally one way or the other, and let's please leave God out of it.

I'm uncertain personally; the oceans of probabilistic evidence for randomness are weighty: gambling works in favor of the house like it's supposed to, given the randomness assumption (but I do not wish to discuss the finer points of whether dice rolls are really a uniform distribution due to aerodynamics or what have you), and science seems to work okay when we base our theories on statistics and/or probability, as with Brownian motion and quantum mechanics. But then I read things about the cosmic microwave background radiation (CMB) like "So when we map the CMB, we are looking back in time to 380,000 years after the Big Bang, just after the universe was opaque to radiation"1

and I wonder if it is all deterministic after all; that's quite some cause-and-effect chain of events, going back almost forever. It makes me wonder if probabilistic theories are just an illusion, a purely mathematical construct that, yes, fits the data adequately enough for us to believe it represents the underlying truth of reality. But maybe it's just like that one guy's predictive model for the stock market mentioned in the book, based on sports statistics: it happened to give the correct answer most of the time, but no one would believe there's any kind of real cause-and-effect relationship betwixt the stock market and the outcome of sports games; it's just a nice correlation.

I really could go on, but I think I'll let you take the floor now.

1 source: Cosmic Microwave Background: Remnant of the Big Bang
 
  • #17
benorin said:
It makes me wonder if probabilistic theories are just an illusion, a purely mathematical construct
It is a mathematical construct that allows us to organize certain kinds of uncertainty into a rational whole. See Chapters A3 and A4 in my theoretical physics FAQ.
 
  • #18
1. Let's suppose you are a hyper-dimensional being able to see time as a spatial dimension. You could look one way and see the past, and look the other way and see the future. So the past is always happening and the future is always happening [Brian Greene, Neil deGrasse Tyson]. This implies a deterministic universe. You could theoretically "peek" into the future by taking advantage of the relativity of simultaneity.

2. Given the information from number 1, if we apply the MWI of quantum mechanics, the future direction of the timeline may not be so deterministic. If it is not being observed by a conscious observer, all of our future selves could be in superposition until our conscious "now" selves observe them. This idea really depends on how quantum mechanics applies across the entire time axis, and on whether observations can be made at any point in time.
 

