The probability challenge asks us to flip a set of pennies, each until its first head, the winner being the penny flipped the most times. The argument presented shows that for $n \geq 4$ pennies the probability of a unique winner is approximately between 0.71 and 0.72, which supports the claim that this probability is at least $\frac{2}{3}$. The analysis draws on order statistics and Markov chains, showing that the probabilities converge as the number of pennies increases. A simplified approach uses a guess of $\frac{2}{3}$ for $m \geq 4$ to streamline the proof. Ultimately, the discussion emphasizes the robustness of the probability estimate across several mathematical techniques.
#1
lfdahl
Start with some pennies. Flip each penny until a head comes up on that penny.
The winner(s) are the penny(s) which were flipped the most times. Prove that
the probability there is only one winner is at least $\frac{2}{3}$.
I took "some pennies" to mean $n$ pennies where $n$ is any arbitrary natural number you like $\geq 2$ .
We probably could solve this directly with order statistics but it seemed like a lot of calculations with not that much insight.
My approach below is conceptually simple: for $n\geq 4$, all the probabilities of having a unique maximal penny (i.e. a non-bulk arrival in the final epoch of the merged Bernoulli process) land right around 0.71 to 0.72. Nice random variables like sums of Bernoullis concentrate about their mean, so we can ignore the probabilities associated with the tails for large $n$. This then implies that all the probabilities have to be $\gt 0.7$ for large $n$. That really is the whole argument. Unfortunately, fleshing this out ended up being a bit long.
Anticipating the Chernoff terms derived below, this implies
$a_r = \prod_{k = 0}^r \big( 1 - 2 e^4 \cdot e^{-(k+21)/2} \big) \geq 1 - \sum_{k=0}^\infty 2 e^4 \cdot e^{-(k+21)/2} = 1- \frac{2}{e^{6.5}-e^{6}} \approx 0.992358 $
by the union bound. Each factor lies in $(0,1)$, so the sequence is monotone decreasing, $1 \gt a_0 \gt a_1 \gt a_2 \gt ...$,
and bounded below by 0.992, so the limit exists and is $\in (0.992,1)$.
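As a quick numeric sanity check of this bound (a minimal sketch of mine; the truncation at 200 factors is arbitrary, since later factors are negligibly far from 1):
Code:
import math

# partial products a_r = prod_{k=0}^{r} (1 - 2 e^4 e^{-(k+21)/2})
bound = 1 - 2 / (math.exp(6.5) - math.exp(6))      # ~ 0.992358
a = 1.0
for k in range(200):
    a *= 1 - 2 * math.exp(4) * math.exp(-(k + 21) / 2)
print(a, bound, a >= bound)    # the partial products stay above the bound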
- - - -
For example, suppose we have $n = 20$ pennies. This is equivalent to a merger of 20 different Bernoulli processes, each with probability of success $\frac{1}{2}$, where a given process "drops out" upon its first success/arrival. (This is exactly analogous to order statistics of exponential r.v.'s and Poisson merging and splitting, except Poisson processes are cleaner, with zero probability of bulk arrivals, which is the point of this problem I suppose.)
If there are $s$ successes in the first round, then we have $n-s$ pennies in the second round, equivalently a merger of $n-s$ different Bernoulli processes. By memorylessness this holds for any round: e.g. if there are 20 pennies at the beginning of a round and 4 successes, then 16 pennies are still being tossed next round. The number of successes in a round that begins with $m$ pennies is binomially distributed with parameters $m$ and $\frac{1}{2}$.
This is enough to form an absorbing-state Markov chain, where 0 and 1 are absorbing states. Literally interpreted, the problem asks for the probability of entering state 0 through state 1; but anything in state 1 enters state 0 with probability 1, so we can instead make state 1 absorbing to capture the total probability of paths that enter state 1 before finishing. (Equivalently, items absorbed in state 0 have bulk arrivals in their final success period, i.e. multiple pennies share the maximal number of tosses among the $n$ pennies.)
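As a sanity check on this setup, here is a small Monte Carlo simulation of the game itself (a sketch of mine, not from the original post; the seed and trial count are arbitrary):
Code:
import numpy as np

rng = np.random.default_rng(0)
n, trials = 20, 200_000
unique = 0
for _ in range(trials):
    alive = n                              # pennies that have not yet seen heads
    while alive > 1:
        heads = rng.binomial(alive, 0.5)   # successes this round
        if heads == alive:                 # bulk arrival in the final epoch: a tie
            alive = 0
        else:
            alive -= heads                 # survivors keep flipping
    unique += (alive == 1)                 # exactly one penny flipped the most
print(unique / trials)                     # ~ 0.72 for n = 20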
The Markov chain has a row-stochastic transition matrix of the block form
$P = \begin{bmatrix} \mathbf I_2 & \mathbf 0 \\ \mathbf R & \mathbf T \end{bmatrix}$
where $\mathbf T$ represents the transient states (which implies a spectral radius strictly less than one), and coincidentally is lower triangular; $\mathbf R$ holds the one-step probabilities from transient states into the absorbing states, with its column into state 1 denoted $\mathbf r_1$ below
for our example, the problem would have states $\{0,1,2,...,20\}$ -- so it would be a 21 x 21 matrix and $\mathbf T$ would be 19 x 19
using standard absorbing-state Markov chain techniques (justified on grounds of 'first step analysis') we have
$\big(\mathbf I - \mathbf T\big) \mathbf v = \mathbf r_1$
so
$\mathbf v =\big(\mathbf I - \mathbf T\big)^{-1}\mathbf r_1$. If we consider
$\mathbf x =\begin{bmatrix}
0\\
1 \\
\mathbf v\\
\end{bmatrix}$
then $\mathbf x$ is a right eigenvector, i.e. $P\mathbf x = \mathbf x$
which can be justified on linear algebra grounds, or more easily on martingale grounds. Here is what the transition matrix looks like when $m = 6$:
$P =\left[\begin{matrix}1 & 0 & 0 & 0 & 0 & 0 & 0\\0 & 1 & 0 & 0 & 0 & 0 & 0\\0.25 & 0.5 & 0.25 & 0 & 0 & 0 & 0\\0.125 & 0.375 & 0.375 & 0.125 & 0 & 0 & 0\\0.0625 & 0.25 & 0.375 & 0.25 & 0.0625 & 0 & 0\\0.03125 & 0.15625 & 0.3125 & 0.3125 & 0.15625 & 0.03125 & 0\\0.015625 & 0.09375 & 0.234375 & 0.3125 & 0.234375 & 0.09375 & 0.015625\end{matrix}\right]
$
(I'm more interested in $m=20$, but it's too big and messy to show.)
Code:
#python 3.x
import numpy as np
from scipy.special import comb

def binom_formula(p, n, k):
    return comb(n, k) * p**k * (1 - p)**(n - k)

m = 20
p = 1/2
q = 1 - p
A = np.zeros((m+1, m+1))
A[0, 0] = 1                      # absorbing state 0
A[1, 1] = 1                      # absorbing state 1
A[2, 2] = q**2
A[2, 1] = comb(2, 1)*q*p
A[2, 0] = comb(2, 0)*q**0*p**2
for i in range(3, m+1):          # note: range(3, m) would miss the last row
    for j in range(0, i+1):
        A[i, j] = binom_formula(p, i, j)
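To go from this matrix to the absorption probabilities, here is a minimal continuation (my sketch; the names `T`, `r1`, `v`, `x` follow the notation above):
Code:
# continuation: solve (I - T) v = r_1 from the first-step analysis
T = A[2:, 2:]                  # transient block, 19 x 19, lower triangular
r1 = A[2:, 1]                  # one-step probabilities into absorbing state 1
v = np.linalg.solve(np.eye(m - 1) - T, r1)
x = np.concatenate(([0.0, 1.0], v))
print(np.allclose(A @ x, x))   # True: x is a fixed point, A x = x
print(x[2], x[3], x[-1])       # 2/3, 5/7 = 0.714285..., and 0.721299...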
Noticing that the bound we want to prove is sharp for the $m=2$ case (the value there is exactly $\frac{2}{3}$), but for $m\geq 3$ all values are at least 0.71,
so for $m=21$ we'd like to find the unique value that we could append to the vector $\mathbf x^{(20)}$ to create $\mathbf x^{(21)}$ such that it is still a fixed point of the appropriate transition matrix (and this must be our probability of ultimately being absorbed into state 1)
$L_{21}$
$ = \lim_{r \to \infty} p_{21,1}^{(r)} $
$= p_{21,0}\cdot 0 + p_{21,1}\cdot 1 + p_{21,2}\cdot \frac{2}{3} +\Big( p_{21,3}\cdot 0.714285714285714 + ... + 0.721299387756606 \cdot p_{21,20} + L_{21}\cdot p_{21,21}\Big) $
$\geq \Big( p_{21,3}\cdot 0.714285714285714 + ... + 0.721299387756606 \cdot p_{21,20} + L_{21}\cdot p_{21,21}\Big) $
$= \sum_{j=3}^{21} p_{21,j}\cdot x_j^{(21)} $
$\geq \sum_{j=3}^{18} p_{21,j}\cdot x_j^{(21)} $ (removing the "right tail" technically isn't needed but the reasoning is subtle so I went with a cruder, easier approach)
$= \sum_{j=3}^{18} p_{21,j}\cdot x_j^{(20)} $
$\geq \sum_{j=3}^{18} p_{21,j}\cdot 0.71 $
$=0.71 \sum_{j=3}^{18} p_{21,j} $
$\geq 0.71 \cdot a_0$
where the final inequality follows from (the complement of) the two-sided Chernoff bound.
The one-sided Chernoff bound is
$P\Big(S_n \geq \frac{n}{2} + c\Big) \leq e^{-\frac{2c^2}{n}} $
applied here with $\frac{n}{2} + c = n-2$, i.e. $c = \frac{n}{2} - 2$;
double it for the two-sided bound (mutually exclusive events' probabilities add, and so do upper bounds on those probabilities). Finally, inducting or iterating from here gives the general bound, e.g. for $m=22$ we'd have
$L_{22} \geq 0.71 \cdot a_1\geq 0.71 \cdot 0.992$
and for arbitrary $m\geq 21$ we have
$L_{m} \geq 0.71 \cdot a_{m-21} \geq 0.71 \cdot 0.992 \gt \frac{2}{3}$
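For completeness, here is the step connecting the doubled Chernoff bound to the factors in the $a_r$ product defined at the top (the post leaves it implicit): with $c = \frac{n}{2} - 2$,
$2e^{-\frac{2c^2}{n}} = 2e^{-\frac{2(n/2 - 2)^2}{n}} = 2e^{-\frac{n}{2} + 4 - \frac{8}{n}} \leq 2e^4 \cdot e^{-n/2},$
and substituting $n = k + 21$ (the number of pennies at stage $k$ of the induction) recovers the terms $2e^4 \cdot e^{-(k+21)/2}$.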
#3
lfdahl
Hi, steep, thank you indeed for your in-depth analytic approach to the problem.
Unfortunately, I am not capable of evaluating your answer, which by far exceeds the level intended for the problem (please cf. the suggested solution below).
But I really appreciate your participation and effort! Thank you so much!
Maybe someone else reading these lines will help evaluate your thorough answer. In fact, this could be a new challenge in the forum!
Hope to see you in new challenges to come!
Suggested solution:
Let's say we start with two pennies ($n=2$).
Define the events:
$A$: There's only one winner ("survivor").
$B_0$: There are no "survivors" (both pennies show heads).
Yeah, I did kind of throw the kitchen sink at this one, and there were probably a few typos in there as I kept changing things (e.g. the sequence $a_r$ was originally mislabeled $a_k$).
It occurs to me that my approach to the problem could/should be massively simplified using a 'guess' of $\frac{2}{3}$ for all values of $m\geq 4$ and then basically applying the method of relaxations, which is a common technique in Dirichlet problems... Basically, I then show that the guess implies an increasing sequence, and it's done. So
$\mathbf x^{(20)}_{\text{'guess' 2 }} = P\mathbf x^{(20)}_{\text{'guess' 1 }}$
but this implies the non-decreasing sequence worked out below, where the inequalities are understood component-wise.
This should make for a much nicer solution than what I originally posted. I am short on time right now but hope to fill in the details in a few days. Conceptually it is fairly similar to the solution you posted, I think, which essentially homogenizes the estimates at $\frac{2}{3}$ instead of the $0.71$ I originally used.
The essence of the fair game / martingale argument is that in finite dimensions, a fair game always stays fair. (For infinite dimensions, there are delicate convergence issues to address.) So if we get a reward of 1 for visiting absorbing state 1 and a reward of 0 for visiting absorbing state 0, but with probability 1 we always visit one of those states, then we can find our probability of visiting state 1 vs state 0 by finding a well-chosen $\mathbf f$ such that $P\mathbf f = \mathbf f$, where the zeroth component of $\mathbf f$ is zero and the first component of $\mathbf f$ is one, because
$\lim_{r \to \infty} P^r\mathbf f = P\mathbf f = \mathbf f$
You can easily check that $P-I$ has 2 linearly independent vectors in its nullspace, and that $P\mathbf 1 = \mathbf 1$, so we can interpret $\mathbf f$ as the 'other' right eigenvector with eigenvalue 1. (If you prefer, you can choose the other non-zero vector in the nullspace of $P-I$ to be orthogonal to $\mathbf 1$ and write $\mathbf f$ as a linear combination of these.)
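A quick numerical check of these two claims (a sketch of mine; it rebuilds the transition matrix from its binomial rows rather than retyping the $m=6$ matrix above):
Code:
import numpy as np
from scipy.special import comb

m = 6
P = np.zeros((m + 1, m + 1))
P[0, 0] = P[1, 1] = 1.0                  # absorbing states 0 and 1
for i in range(2, m + 1):                # transient states: binomial(i, 1/2) rows
    for j in range(0, i + 1):
        P[i, j] = comb(i, j) * 0.5**i
print(np.allclose(P @ np.ones(m + 1), np.ones(m + 1)))   # True: P 1 = 1
eigvals = np.linalg.eigvals(P)
print(int(np.sum(np.isclose(eigvals, 1.0))))   # 2: eigenvalue 1 appears twice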
This approach shows up in treatments aimed at undergrads, though for whatever reason I don't think many people know it.
= = = = =
The question becomes: how do we find this $\mathbf f$ besides a long, messy Gaussian elimination?
The point is that for $m \geq 2$ (i.e. the transient states) each row is a binomial probability distribution, and with probability of heads $= \frac{1}{2}$ we can compare the probability of getting $m-1$ heads (which lands in state 1, with value 1) against $m$ heads (state 0, with value 0) and see
$1 \cdot m 2^{-m} + 0 \cdot 2^{-m} = m 2^{-m} \geq \frac{2}{3}\big(m 2^{-m} + 2^{-m} \big) = \frac{2}{3} \cdot (m+1) 2^{-m}$
or
$m \geq \frac{2}{3} \cdot (m+1) $
or
$\frac{3}{2} m \geq m+1 $ or
$\frac{1}{2} m \geq 1 $ , i.e. $m\geq 2$
$\mathbf x^{(n)}_{\text{'guess' 1 }}= \left[\begin{matrix}0\\1\\ \frac{2}{3}\\ \frac{2}{3} \\ \vdots \\ \frac{2}{3}\\ \frac{2}{3}\end{matrix}\right]$
so looking at row $i\geq 2$ of the above,
we see that the $i$th row is given by (where $B\big(i,j,\frac{1}{2}\big)$ is the binomial probability of $j$ successes in $i$ trials with success probability $\frac{1}{2}$)
$ \mathbf e_i^T P\mathbf x^{(n)}_{\text{'guess' 1 }} = B\big(i,i-1,\tfrac{1}{2}\big)\cdot 1 + B\big(i,i,\tfrac{1}{2}\big)\cdot 0 + \frac{2}{3}\sum_{j=0}^{i-2} B\big(i,j,\tfrac{1}{2}\big) \geq \frac{2}{3}$
by the inequality shown above,
and of course for rows zero and one we have
$ \mathbf e_0^T P\mathbf x^{(n)}_{\text{'guess' 1 }} = 0$
$ \mathbf e_1^T P\mathbf x^{(n)}_{\text{'guess' 1 }} = 1$
putting this together:
$ P\mathbf x^{(n)}_{\text{'guess' 1 }} = \mathbf x^{(n)}_{\text{'guess' 1 }} + \mathbf y^{(1)} \geq \mathbf x^{(n)}_{\text{'guess' 1 }}$ where all inequalities are understood component-wise, so
$\mathbf y^{(1)} \geq 0$
but applying $P$ again gives
$ P^2\mathbf x^{(n)}_{\text{'guess' 1 }} = \mathbf x^{(n)}_{\text{'guess' 1 }} + \mathbf y^{(1)} + \mathbf y^{(2)} \geq P \mathbf x^{(n)}_{\text{'guess' 1 }}$
where $\mathbf y^{(2)} := P\mathbf y^{(1)} \geq 0$, since $P$ and $\mathbf y^{(1)}$ are each comprised solely of real non-negative components.
Iterating gives a non-decreasing sequence that is bounded above (because these are probabilities...)
$ \mathbf x^{(n)}_{\text{'guess' 1 }}\leq P\mathbf x^{(n)}_{\text{'guess' 1 }} \leq P^2\mathbf x^{(n)}_{\text{'guess' 1 }}\leq ... \leq P^r\mathbf x^{(n)}_{\text{'guess' 1 }} \leq \begin{bmatrix}
0 \\
1 \\
\mathbf 1\\
\end{bmatrix}$
(the highest power shown is $P^r$, though it's kind of hard to see here)
applying monotone convergence component-wise tells us that a limit $z_i$ exists for each component, and in fact
$\mathbf x^{(n)}_{\text{'guess' 1 }}\leq P\mathbf x^{(n)}_{\text{'guess' 1 }} \leq P^2\mathbf x^{(n)}_{\text{'guess' 1 }}\leq \lim_{r \to \infty} P^r\mathbf x^{(n)}_{\text{'guess' 1 }} = \mathbf z = \lim_{r \to \infty} P\cdot P^r\mathbf x^{(n)}_{\text{'guess' 1 }} = P\cdot \mathbf z$ so $\mathbf z =P\cdot \mathbf z$
and it has a zero in the zeroth component and a 1 in the first component, which tells us it must be $\mathbf f$,
hence $\mathbf x^{(n)}_{\text{'guess' 1 }}\leq \mathbf z = \mathbf f$
which completes the problem
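A numeric illustration of this relaxation argument (my sketch, not part of the proof; $m$ and the iteration count are arbitrary choices):
Code:
import numpy as np
from scipy.special import comb

m = 20
P = np.zeros((m + 1, m + 1))
P[0, 0] = P[1, 1] = 1.0                  # absorbing states
for i in range(2, m + 1):                # transient states: binomial(i, 1/2) rows
    for j in range(0, i + 1):
        P[i, j] = comb(i, j) * 0.5**i

x = np.full(m + 1, 2.0 / 3.0)            # the guess vector: 2/3 everywhere ...
x[0], x[1] = 0.0, 1.0                    # ... except the absorbing states
for _ in range(100):
    x_next = P @ x
    assert np.all(x_next >= x - 1e-12)   # non-decreasing, component-wise
    x = x_next
print(x[2], x[3])                        # converges to 2/3 and 5/7; all >= 2/3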
After looking at the other provided solution, I think the reason the simpler inductive approach works, without e.g. needing to introduce Markov chains, is that $\mathbf T$ is triangular. In any case they are similar in spirit. My writeup is still long, mostly because I tried to include background material. If people know the techniques involved, then the entire argument condenses to:
this implies a non-decreasing sequence when we apply $P$ many times to our guess vector $\mathbf x^{(n)}_{\text{'guess' 1 }} $, and this essentially gives the result, with no hard estimates involved.