# The dollar bill problem

1. Apr 26, 2014

### lavoisier

Hi everyone,
I have a question concerning probability.
In short, a guy on TV was betting with people in a bar that they wouldn't be able to guess 3 digits right out of the 8 digits of the serial number of a randomly chosen one dollar bill, and indeed, he won several times. He later explained that 'it isn't so easy to guess 3 digits out of an 8-digit number'.

I was curious to know how difficult it actually was (quantitatively), so I worked on the problem for a while and I found a formula that seemed OK. I used basic set and probability theory.
However, I wasn't very rigorous in deriving it (I'm not a mathematician): when I seemed to spot a pattern (binomial coefficients with alternating signs) I just 'assumed' this could be extended to a more general case. But I'm not sure I was correct in doing so.

This is what I got.
Suppose you pick a random number from the set of all numbers of p digits, each digit taking n possible values (so in the example above, p = 8 and n = 10).
A person (who doesn't know what the picked number is) writes down m distinct digits (obviously, 1≤mp, and each digit can only take one of the n defined values).

According to my calculations, the probability that the m digits are all contained (at least once) in the randomly picked number is:

${{\sum_{i=0}^{m}{\left(-1\right)^{i}\,{m\choose i}\,\left(n-i% \right)^{p}}}\over{n^{p}}}$

I tested the formula in the simplified case of p=3, n=10, by actually counting how many numbers contained a given set of m=1, 2 or 3 digits, and it seemed to work there (the probabilities being 27.1%, 5.4% and 0.6%, respectively).
For the original problem (n=10, p=8, m=3), I found a probability of about 15%.

My questions are:
- is the formula correct?
- could it be expressed in a better/more concise form?

Thank you!
L

PS: someone in another forum suggested this problem is already addressed by the binomial distribution, but the numerical results I got from that didn't match the ones above, even for the case of p=3, where I am sure they are correct.

2. Apr 26, 2014

### Staff: Mentor

3. Apr 27, 2014

### lavoisier

Thank you for your reply jedishrfu.
However, it doesn't seem to be about the problem I was describing.
In my problem it's only one serial number at a time that is considered, and the digits on which one bets are all distinct.

4. Apr 28, 2014

### eigenperson

Your formula is correct. The easiest way to check it is with the principle of inclusion-exclusion. Let $E_i$ be the event that the serial number of the bill does not contain the digit $i$. Then we are looking for $1 - P\left(\bigcup_{i \in S}E_i\right)$, and this is equal to
$$1 - \sum_{T \subseteq S}(-1)^{\left|T\right|}P\left(\bigcap_{i \in T}E_i\right);$$using the symmetry and calculating the probabilities, this is
$$1 - \sum_{i=1}^{\left|S\right|}(-1)^i\binom{\left|S\right|}{i} \left(n-i \over n\right)^p$$ and this is equal to your formula.

I don't immediately see a way to make it more concise.

5. Apr 29, 2014

### tycoon515

Interesting problem. For starters I wrote a program to verify the numbers you mentioned. The computer constructs a random $p$-digit string to represent a serial number, then picks $m$ random digit characters. It then checks whether all of the $m$ digits picked occur at least once in the string. I ran this through a loop $100$ $000$ $000$ times, counting the number of times all $m$ were in the string, and then returning the count. This way the significant digits of the count would be the significant digits of the proportion. The sample proportions I observed for $n=10$, $p=3$ were consistently within $.0001$ of $.271$, $.054$, and $.006$ for $m=1$, $2$, and $3$, respectively. When I upped the string length to $8$, the computer took longer to get through $100$ $000$ $000$ plays, but when it finally came back with an answer, it was in the ballpark of $15$ $427$ $000$, which corresponds closely with the figure you mentioned.

I then realized I could get the computer to count all of the $8$-digit strings that have three distinct given digits, so I wrote another program that runs through a loop for every integer from $0$ to $n$$p$ $-$ $1$. Every time through the loop, it constructs an equivalent $p$-digit string to represent the integer of the current pass by prepending $0$s to the integer until it is $p$ digits in length. It then checks the resulting string for three distinct digit characters, the same three every time through the loop, and if on a given pass all three are present, it increments the count. After $n$$p$ string representations corresponding to integers $0$ to $n$$p$ $-$ $1$ have been checked, it returns the count. When I ran this program with $n=10$, $p=8$, and $m=3$, it returned $15$ $426$ $684$, which, expressed as a proportion out of the $100$ $000$ $000$ total possible $8$-digit strings, is $.15426684$. Not only is this the exact proportion you seek, but whereas the computer required nearly a full minute to get through $100$ $000$ $000$ plays using my empirical method, I only waited five seconds for the computer to get back this time.

While these algorithms are a quick and nifty way to get the correct answer, I think this problem deserves better from me, so I'm going to get right down to the statistics of it. First we must acknowledge that in searching a serial number, every digit place you check either contains a digit of interest whose presence among the string is uncertain (success), or it does not (failure); furthermore, there is only one way that a given digit of interest can occur at a given digit place in the string. We will know the string contains our three distinct digits if and when we have three successes; we need not look further. The first success can occur in one of three ways at a given digit place since it can contain any of the three distinct digits of interest. Given this success, we are now only interested in the two digits remaining, so our definition of success changes to include only those two. The second success can occur in one of two ways at any subsequent digit place. Given the second success, we redefine success yet again to include only the last digit of interest, so the third success can only occur in one way at any subsequent digit place. If we have three successes before we get to the end of the string, then any permutations of the remaining digits to check count toward the number of strings that contain our three.
Let $Y$ denote a success, $N$ a failure, and $E$ any outcome following the third success. We're interested in all string permutations that have three successes (i.e. contain all three digits of interest). Following is a table of all possible string combinations where we have three successes. Traversing a row to the right represents placing the first success further into the string, and as a consequence, there are fewer combinations whose first successes start at that digit place, symbolized by successively shorter columns.

$YYYEEEEE$           $NYYYEEEE$         $NNYYYEEE$          $NNNYYYEE$          $NNNNYYYE$          $NNNNNYYY$

$YYNYEEEE$          $NYYNYEEE$        $NNYYNYEE$         $NNNYYNYE$          $NNNNYYNY$       

$YYNNYEEE$         $NYYNNYEE$       $NNYYNNYE$         $NNNYYNNY$          $NNNNYNYY$       

$YYNNNYEE$        $NYYNNNYE$       $NNYYNNNY$        $NNNYNYYE$      

$YYNNNNYE$       $NYYNNNNY$      $NNYNYYEE$          $NNNYNYNY$      

$YYNNNNNY$       $NYNYYEEE$       $NNYNYNYE$         $NNNYNNYY$      

$YNYYEEEE$         $NYNYNYEE$       $NNYNYNNY$        

$YNYNYEEE$         $NYNYNNYE$      $NNYNNYYE$          

$YNYNNYEE$         $NYNYNNNY$     $NNYNNYNY$          

$YNYNNNYE$        $NYNNYYEE$       $NNYNNNYY$          

$YNYNNNNY$       $NYNNYNYE$      

$YNNYYEEE$         $NYNNYNNY$      

$YNNYNYEE$         $NYNNNYYE$      

$YNNYNNYE$         $NYNNNYNY$      

$YNNYNNNY$        $NYNNNNYY$      

$YNNNYYEE$       

$YNNNYNYE$       

$YNNNYNNY$       

$YNNNNYYE$       

$YNNNNYNY$       

$YNNNNNYY$       

Now I will rewrite every combination as a chain product of the number of ways each of its successive outcomes could happen: the first success ($3$), the second success ($2$), the third success ($1$), any failures before the first success ($7$), any failures between the first and second successes ($8$), any failures between the second and third successes ($9$), and any outcomes after the third success ($10$). In this way I account for every permutation of interest.

Permutations whose first successes occur at the first digit place:

$3\times2\times1\times10\times10\times10\times10\times10=600$ $000$

$3\times2\times9\times1\times10\times10\times10\times10=540$ $000$

$3\times2\times9\times9\times1\times10\times10\times10=486$ $000$

$3\times2\times9\times9\times9\times1\times10\times10=437$ $400$

$3\times2\times9\times9\times9\times9\times1\times10=393$ $660$

$3\times2\times9\times9\times9\times9\times9\times1=354$ $294$

$3\times8\times2\times1\times10\times10\times10\times10=480$ $000$

$3\times8\times2\times9\times1\times10\times10\times10=432$ $000$

$3\times8\times2\times9\times9\times1\times10\times10=388$ $800$

$3\times8\times2\times9\times9\times9\times1\times10=349$ $920$

$3\times8\times2\times9\times9\times9\times9\times1=314$ $928$

$3\times8\times8\times2\times1\times10\times10\times10=384$ $000$

$3\times8\times8\times2\times9\times1\times10\times10=345$ $600$

$3\times8\times8\times2\times9\times9\times1\times10=311$ $040$

$3\times8\times8\times2\times9\times9\times9\times1=279$ $936$

$3\times8\times8\times8\times2\times1\times10\times10=307$ $200$

$3\times8\times8\times8\times2\times9\times1\times10=276$ $480$

$3\times8\times8\times8\times2\times9\times9\times1=248$ $832$

$3\times8\times8\times8\times8\times2\times1\times10=245$ $760$

$3\times8\times8\times8\times8\times2\times9\times1=221$ $184$

$3\times8\times8\times8\times8\times8\times2\times1=196$ $608$

Sum: $7$ $593$ $642$

Permutations whose first successes occur at the second digit place:

$7\times3\times2\times1\times10\times10\times10\times10 = 420$ $000$

$7\times3\times2\times9\times1\times10\times10\times10 = 378$ $000$

$7\times3\times2\times9\times9\times1\times10\times10 = 340$ $200$

$7\times3\times2\times9\times9\times9\times1\times10 = 306$ $180$

$7\times3\times2\times9\times9\times9\times9\times1 = 275$ $562$

$7\times3\times8\times2\times1\times10\times10\times10 = 336$ $000$

$7\times3\times8\times2\times9\times1\times10\times10 = 302$ $400$

$7\times3\times8\times2\times9\times9\times1\times10 = 272$ $160$

$7\times3\times8\times2\times9\times9\times9\times1 = 244$ $944$

$7\times3\times8\times8\times2\times1\times10\times10 = 268$ $800$

$7\times3\times8\times8\times2\times9\times1\times10 = 241$ $920$

$7\times3\times8\times8\times2\times9\times9\times1 = 217$ $728$

$7\times3\times8\times8\times8\times2\times1\times10 = 215$ $040$

$7\times3\times8\times8\times8\times2\times9\times1 = 193$ $536$

$7\times3\times8\times8\times8\times8\times2\times1 = 172$ $032$

Sum: $4$ $184$ $502$

Permutations whose first successes occur at the third digit place:

$7\times7\times3\times2\times1\times10\times10\times10 = 294$ $000$

$7\times7\times3\times2\times9\times1\times10\times10 = 264$ $600$

$7\times7\times3\times2\times9\times9\times1\times10 = 238$ $140$

$7\times7\times3\times2\times9\times9\times9\times1 = 214$ $326$

$7\times7\times3\times8\times2\times1\times10\times10 = 235$ $200$

$7\times7\times3\times8\times2\times9\times1\times10 = 211$ $680$

$7\times7\times3\times8\times2\times9\times9\times1 = 190$ $512$

$7\times7\times3\times8\times8\times2\times1\times10 = 188$ $160$

$7\times7\times3\times8\times8\times2\times9\times1 = 169$ $344$

$7\times7\times3\times8\times8\times8\times2\times1 = 150$ $528$

Sum: $2$ $156$ $490$

Permutations whose first successes occur at the fourth digit place:

$7\times7\times7\times3\times2\times1\times10\times10 = 205$ $800$

$7\times7\times7\times3\times2\times9\times1\times10 = 185$ $220$

$7\times7\times7\times3\times2\times9\times9\times1 = 166$ $698$

$7\times7\times7\times3\times8\times2\times1\times10 = 164$ $640$

$7\times7\times7\times3\times8\times2\times9\times1 = 148$ $176$

$7\times7\times7\times3\times8\times8\times2\times1 = 131$ $712$

Sum: $1$ $002$ $246$

Permutations whose first successes occur at the fifth digit place:

$7\times7\times7\times7\times3\times2\times1\times10 = 144$ $060$

$7\times7\times7\times7\times3\times2\times9\times1 = 129$ $654$

$7\times7\times7\times7\times3\times8\times2\times1 = 115$ $248$

Sum: $388$ $962$

Permutations whose first successes occur at the sixth digit place:

$7\times7\times7\times7\times7\times3\times2\times1 = 100$ $842$

Sum: $100$ $842$

By taking the sum of these six sums of sets of permutations, we have determined that there are $15$ $426$ $684$ $8$-digit strings that contain each of any three distinct digits. Therefore we can find the probability that all three digits you guess are in the serial number on a dollar bill in the following way:

$P(win)=\frac{15426684}{100000000}=0.15426684$

Applying this whole process in general involves setting up $p-m+1$ sets of combinations, where the sets explore combinations whose first successes occur anywhere from the first digit place to the $(p-m+1)$th digit place. The way you consider all the combinations in a given set seems to involve moving the last success from right after the preceding success along to the end, then starting the process over again for every position closer to the end the preceding success is moved, and then starting that over again for every position closer to the end the success that precedes that one is moved, and so on, such that the number of combinations follows a binomial pattern. To get permutations from combinations, you substitute the number of ways each outcome can happen. At the beginning, there will always be $m$ ways for a success to occur at any spot and $n-m$ ways for a failure to occur. Every time you mark down a success, the number of ways for a success to occur at any subsequent spot decrements by one and the number of ways for a failure to occur increments by one. Marking failures does not change the number of ways either can occur. Lastly, once $m$ successes have been marked down, any subsequent spots can take on any of the $n$ possible values. Then it's just a matter of summing the sets of permutations of interest and dividing by the total possible number of permutations. I'd like to think that your formula is mathematical shorthand for carrying out this long-winded process, but if there's one thing I can guarantee, its that your formula is completely backed by the results I've obtained from each of the methods I've used to tackle this problem.

6. Apr 29, 2014

### lavoisier

Thank you both very much eigenperson and tycoon515! I couldn't have hoped for more professional and thorough explanations.
My approach was much less sophisticated than yours, but it's nice to know I got it right nevertheless.
Interesting though to see how even someone with limited mathematical knowledge like myself can sometimes produce correct results when there's money involved...

I think it's now safe to assume that the binomial distribution can't be used in this case.
From tycoon515's answer I seem to gather that it's rather a sum of several hypergeometric-looking terms that eventually (in some way) produces my formula, and the formula itself also still contains a term decreasing by one unit at a time raised to a constant power, (n-i)^p, rather than a constant ratio raised to a changing power.

I don't know, perhaps there is a way to express my sum more concisely, or as a combination of known hypergeometric-based functions, but it may be 'shorthand' enough as it is.

The bottom line, if I understand correctly, is that with a 15% probability of losing, the guy had reasonably high odds of making money by playing this game long enough. I'll be careful if anyone approaches me with this kind of gambles...

Share this great discussion with others via Reddit, Google+, Twitter, or Facebook