# Homework Help: Probability and Mathematica

Tags:
1. Oct 1, 2016

### BennyT

1. The problem statement, all variables and given/known data
The problem I'm having involves looking for a specific formula for the probability of matching a deck of playing cards via the result of a shuffle. I want to know how many times I must shuffle for the probability of the shuffle matching one of the past resulting decks to be at least 50%.

2. Relevant equations
At first, I considered the binomial equation, but because each shuffle introduces a new deck to compare subsequent shuffles to I found myself at a loss. So the equation I have examined is
P=1-(1-p)^n

3. The attempt at a solution
I thought about the probabilities step-wise looking for a pattern. For example, say I do the same with a 32 sided dice. At first, I choose any side to be the one I am looking for a match. So in one roll, the probability of rolling a match is 1/32. But in two rolls, it would be (1/32)+(31/32)(2/32), because it assumes 31 out of 32 times I need to reroll, but now I have two sides I will consider a success. For 3 rolls, it would be (1/32)+(31/32)(2/32)+(31/32)(31/32)(3/32)? Correct? I just don't know whether my take on probability is correct and would love any comment.

2. Oct 1, 2016

### haruspex

You are on the right track, but there is a mistake in the last term. What is the probability that none of the first three matched?
But this sequential approach is tough. It is easier if you ask instead, given n trials, in how many ways can they all produce different results.

3. Oct 1, 2016

### andrewkirk

The problem statement is ambiguous.

What I think you meant to ask is

'How many shuffles must there be for there to be at least a 50% probability that at least two of the card orderings are identical?'

This is the same as the birthday problem, but replacing 365 by 52. That link should tell you how to solve it. Post back here if you have trouble. A little care is needed, to notice that the number of different card orders is one more than the number of shuffles.

The way you have written the problem is different though. You refer to the probability of 'the shuffle' matching a previous shuffle. That sounds like a reference to the latest shuffle. That is potentially a more complex question and needs more information in order to be well-posed. Depending on exactly what you mean it may involve stopping times and/or expected values. Just to give an idea of why it becomes more complex, note that the probability of the latest shuffle matching an earlier one depends not only on the number of shuffles so far, but also on how many of them were different. If we've done 100 shuffles and every time, by an amazing fluke, they came out the same, the probability of the next shuffle matching a previous order is 1/52!. On the other hand, if every one of the 101 orders is different, the probability is 101/52!.

Let me know if you didn't mean the birthday problem, and we can explore what it is that you are after.

4. Oct 1, 2016

### BennyT

I'm definitely going to give this a read. Thank you for your help.
To clarify, I'm looking to find the number of shuffles (with the assumption no successful deck match has been found prior) necessary so that the probability of subsequent shuffles matching any one of the unique combinations found in prior reorders of the deck is 50%. I'm sorry my question is so ambiguous, but I thought of it initally as having a deck order, then comparing a single shuffle to the order and including all the deck orders in a group to be compared to all subsequent shuffles but then summing the probability of each occurring event.

I used the dice example because I don't want to explicitly figure out the problem I am working on as it is entended to be homework, but I'm a bit more concerned about the way the odds add up.

For example, the probability of matching after one shuffle is 1/52!. But then for two shuffles, it means we have two orders to compare it to, which I thought would mean the probability of finding a match in two shuffles would be (1/52!)+(52!-1)/(52!)(2/52!) because it factors in the 52!-1 results one doesn't find a match and the new probability of 2/52! . I don't know if this thinking is correct, but then next term or shuffle probability is what has me most confused.

Would it be (1/52!)+(52!-1)/(52!)(2/52!)+(((52!-1)/(52!))^2)(3/52!) or (1/52!)+(52!-1)/(52!)(2/52!)+((52!-1)/(52!))((52!-2)/(52!))(3/52!) for three shuffles?

I'm sorry for responding before I read about the birthday paradox, but I'm more concerned with rather my understanding of sequential probability is correct.

All help is sincerely appreciated, and I will be posting back soon after I have read over than advice given before.

5. Oct 1, 2016

### BennyT

Do you mean that it should be (1/32)+(31/32)(2/32)+(30/32)(31/32)(3/32) instead? I'm not quite sure I'm thinking properly about my probability.

6. Oct 1, 2016

### BennyT

Also, the overall goal is to find an expression which I can give to Mathematica to determine the number of shuffles I need to meet this condition. Honestly, I don't know if I'll actually be able to considering the size of these numbers and Mathematica's number limit. But I'm also considering using approximation methods such as Stirling's Approximation once I understand the probability as a function of shuffles.

Once again, I cannot thank you all enough.

7. Oct 1, 2016

### BennyT

After looking into the birthday paradox, it appears to be the same problem but instead of 365 options, I have 52! options. So, I am a bit skeptical as to whether, approximations or not, I will be able to produce a number answer.

8. Oct 1, 2016

### Ray Vickson

You have the birthday problem, but with $N = 52!$ birthdays instead of 365. Assuming perfect shuffles, the probability that the second shuffles does not match the first is $p_2 = (N-1)/N = 1 -1/N$. The probability that the third shuffles does not match either of the first two is $p_3 = (N-1)(N-2)/N^2 = (1- 1/N)(1-2/N)$, etc. So, you want the smallest integer $k > 1$ giving
$$\left( 1 - \frac{1}{N} \right) \left( 1 - \frac{2}{N} \right) \cdots \left( 1 - \frac{k-1}{N} \right) \leq 1/2.$$

9. Oct 1, 2016

### andrewkirk

'subsequent shuffles' or 'the next shuffle'? It doesn't sound like the birthday problem, but perhaps a (IMHO) more interesting one.

Perhaps what you are searching for is as follows.

Consider an infinite sequence $(M_j)_{j\in\mathbb N}$of random variables in $S_{52}$ (which is the names for the set of all 52! permutations of 52 numbers), and let $A_n$ denote the sequence $\langle M_1,M_2,...,M_n\rangle$, ie the first $n$ orderings of the deck.

(1) Find the smallest $n$ such that $E[Prob(M_{n+1}\in A_n)]\geq 0.5$

In words, find the smallest $n$ such that the expected probability of the $n+1$th ordering matching one of the $n$ earlier ones is at least 50%.

To contrast, the birthday problem, using 52! to replace 365, is

(2) Find the smallest $n$ such that $Prob(\exists j,k\leq n:\ M_j=M_k)\geq 0.5$

In words, find the smallest $n$ such that there is at least a 50% probability that, amongst the first $n$ orderings, there exist at least two (numbered $j$ and $k$) that are identical.

Let us know if you want to do (1). The birthday one (2) has already been covered.

EDIT: Ha! I just realised that $n$ will be enormous for problem (1). I suspect it will be more than 52!/2.

Last edited: Oct 1, 2016
10. Oct 2, 2016

### BennyT

I was expecting an answer around order 10^34, so that sounds spot on. Thank you for clarifying my issues with probability. That's actually what I wanted to check more than anything as I'd like to try the problem, the actual solving of it, on my own.
May I ask how you ended up finding that in Maple? I'd like to see if I can find a similar notation to make a generic case in Mathematica for any N.

Thank you once again.
Also, the reason I was expecting that order of magnitude was when I first attempted a solution, I found a probability of about 0.50000001 was given around N=9.89873*10^33. I did this just by changing N until it got to about 0.5 then tweaked it. I probably sound like an idiot right now though so, eh.

11. Oct 2, 2016

### BennyT

Oh, by subsequent shuffles I think I meant the next shuffle after N shuffles will have a probability of 0.5, and then shuffles further on would be greater than or equal to 0.5. So I guess subsequent was just referring to the shuffle at which the probability after N shuffles is at least 0.5, or as you put it the next shuffle. Sorry, I'm really bad with language and math. I sincerely appreciate the time you have put into helping me with this question.

Looking at your cases, I believe I am looking to do case 2, but then again I don't quite understand case 1. If you have time to do so, would you mind giving me a more tangible example of what number 1 would be? It doesn't need to be with playing cards.

Thank you so very much.

12. Oct 2, 2016

### BennyT

EDIT: I also would like to take the time to figure out the actual answer to this question on my own, but what I'm really looking for guidance for is looking at the probability as a function of N so that I may be able to translate its expression into a way Mathematica or another computational tool could help me find a number value like the user using Maple had found. Thank you once again.
Also, by N I mean it to be the variable that denotes the number of shuffles, I just recognized this inconsistency.

Last edited: Oct 2, 2016
13. Oct 2, 2016

### haruspex

You should not need to use Mathematica, etc.
If you use the approach I suggested in post #2, you should get a simple equation involving factorials. With Stirling's formula, and a suitable approximation for num trials << N, leads to trials = constant times √N. I'll leave you to find the constant.

14. Oct 2, 2016

### BennyT

Would that constant by any chance be (-2ln(1/2))^(1/2)?

15. Oct 2, 2016

### haruspex

I got (-ln(1/2))^(1/2), i.e. √ ln(2)

16. Oct 2, 2016

### BennyT

I found that p(no match)=e^(-(N^2/(2n)) where n is my number of combinations. Then 1-p(match)=p(no match) and taking the log I get ln(1-.5)=-(N^2/(2n) so N=(-2ln(1/2)*N)^(1/2).

17. Oct 2, 2016

### Ray Vickson

That is essentially the "birthday problem" solution, but as Andrew Kirk pointed out in #9, that may not be the solution to your problem. In the birthday problem the complement of the event {no match in n shuffles} is not the same as the event {n th shuffle is the first match}.

Formulas for the correct problem are not difficult to obtain. Let events be Dn ={first n shuffles are all different} and Mn= {shuffle n matches one of the other (n-1) shuffles}. The probability of a first match at shuffle n is $P_n = P(D_{n-1}\: \& \:M_n)$, and you can get this from a simple conditional-probability argument.

However, If you really want to find an $n$ giving $P_n \geq 1/2$ you are out of luck: you are being asked an ill-posed, impossible question. In fact, the largest single value of $P_n$ for any $n =1, 2, \ldots, 52!$ is about $P_{\max} \doteq 1/\sqrt{e \, 52!} \doteq 0.675 \times 10^{-34}$.
The probability mass-function $P_n, n = 1,2, \ldots 52!$ consists of a huge number of tiny numbers that do, however, sum to 1. The birthday problem solution essentially gives you the cumulative distribution of the $\{P_n\}$, not the individual $P_n$ values.

Last edited: Oct 2, 2016
18. Oct 2, 2016

### BennyT

I'm sorry, but I don't quite understand why the question I asked isn't just the birthday paradox with 52! as the number of days. I may have said something earlier to convince you otherwise.

So what exactly is the difference between the birthday problem and the other more complicated one? I am fairly new to understanding probability.

19. Oct 2, 2016

### Ray Vickson

There are $N = 52!$ possible shuffles, so the number of the first matching shuffle can be any integer from 2 to 52! (not the 1 to 52! that I said above). So, if $P_n$ is the probability that the first match occurs at shuffle $n$ we have a probability distribution $\{ P_n \}$ on the sample space $n = 2, 3, \ldots, 52!$. If $D_k$ is the event that the first $k$ shuffles contain no match, then $1-P(D_k)$ is the probability of at least one match among shuffles $2, \ldots, k$ and so is the probability that the first match occurred on or before shuffle $k$. In other words, it is $P_2 + P_3 + \cdots + P_k$. That is the cumulative distribution of $\{ P_n \}$ evaluated at $n=k$, and is what you get from the "birthday" problem analysis.

We can set
$$P_k = P(D_{k-1}) P(\text{match at} \; k|D_{k-1}) = (k/N) e^{-k^2/2N}.$$
This is an approximation, but will be good to more than 20-30 digits when $N = 52!$ and $k << N$.

If we set $k = r \sqrt{N}$ we get
$$P_k = \frac{r}{\sqrt{N}} e^{-r^2/2}$$.
Since $\max_{r > 0} r \exp(-r^2/2) = e^{-1/2} = 1 \sqrt{e}$, we have $P_k \leq 1/\sqrt{N e}$ for all $k$.

20. Oct 2, 2016

### BennyT

Maybe I'm not thinking about this problem correctly, but for the question I'm asking, I don't believe I need a match to be found. I'm just looking for how many shuffles it takes for the probability of shuffles after this number of shuffles matching prior deck arrangements to be 0.5.

21. Oct 2, 2016

### andrewkirk

See last part of post 3. That probability depends on how many different orderings occurred in the first n shuffles. So you need to take an expected value to make this question meaningful.

22. Oct 2, 2016

### Ray Vickson

I think you do need a match to be found at shuffle n, but in a different way from the two scenarios already given. If I understand what you are saying now, you have an event Ek = {shuffle k matches at least one of shuffles 1,2,...,k-1}, but with any number of matches allowed in the first (k-1) shuffles. This is different from the cases already given, which were {one or more matches occur within shuffles 1,2,...,k} (birthday version) or {first match occurs at shuffle k} (version I mentioned in #19). In post #1 you said "I want to know how many times I must shuffle for the probability of the shuffle matching one of the past resulting decks to be at least 50%". To me, that says that you want the smallest $k$ giving $P(E_k) \geq 0.5$; it says nothing about "future" shuffles---only the current one and the past ones.

Now in #20 above you seem to have changed the problem to something like "find a $k$ so that $P(E_k \cup E_{k+1} \cup \cdots \cup E_N)$ is close to 0.5". Again, though, you leave it a bit undefined: are the "prior arrangements" at shuffle k and beyond only those that occured during shuffles 1,...,k, or is it OK for shuffle (k+n) to match one of shuffles 1,2,...,k+n-1, whether they came before or after shuffle k?

Last edited: Oct 2, 2016
23. Oct 2, 2016

### haruspex

In case it is still not clear, here are several different things you might mean:
Earlier you wrote
That is, you have executed r shuffles, you have observed that no two are the same, and you want the probability that the next shuffle will yield a match. Clearly that is r/52! So to get an evens chance you need r=52!/2.

Dropping the assumption of no two the same so far:
You have executed r shuffles (some of which might match) and you want the next shuffle to have an evens chance of a match.
In this case, it should be clear that the number of shuffles needed is greater than 52!/2. I get r=52! ln(2).

If you intend the Birthday problem, the correct description would be "how many shuffles do I need to do to have an evens chance that some two of them yield the same result". Note there's nothing here about the next shuffle. This gives something of the order of √(52!), as discussed.

Last edited: Oct 2, 2016
24. Oct 3, 2016

### Ray Vickson

I think I finally get it: there is a 50% probability that shuffle (k+1) matches one of shuffles 1,2,...,k if those k shuffles contain at least M = N/2 distinct permutations, where N = 52!. So, one way to think about the problem is to ask how many shuffles are needed until M different ones have been encountered.

That is reminiscent of the "Coupon Collector's Problem", but the difference now is that having to collect all N = 52! coupons, we can stop when we have collected M = N/2 of them. Of course, we cannot say exactly how many shuffles will be needed, but we can speak of the probability distribution of that number, or the mean number, etc.

The moment-generating function $Q_M(z)$ of the number of shuffles $S_M$ needed to "collect" M different permutations is
$$Q_M(z) = z^M \prod_{n=1}^M \frac{N-k}{N-kz}$$.
The mean number of shuffles needed is
$$ES_M = N \sum_{n=1}^M \frac{1}{n}$$.
The variance of the number of needed shuffles is
$$\text{Var}(S_M) = N \sum_{n=1}^M \frac{n}{(N-n)^2}$$.

It is almost hopelessely difficult to obtain accurate, usable expressions for the actual probability distribution of $S_M$, and for such large M and N as we have in this problem, getting precise values for the mean and variance is difficult, too. However, "good" approximations are obtainable. Use
$$f(1) + f(2) + \cdots + f(M) \approx \frac{1}{2} [f(1) + f(M)] + \int_1^M f(x) \, dx,$$
which obtainable by approximating the integral by the trapezoidal-rule sum. Applying this to $f(x) = N/x$ gives
$$\sum_{n=1}^M N/n \doteq N/2 + N/(2M) +N \ln(M) = 1 + N/2 + N \ln(N/2) \doteq .1259620083 \times 10^{71}$$.
Applying the approximation to the case $f(x) = N x/(N-x)^2$ gives
$$\text{Var}{S_M} \doteq \frac{(1-\ln 2) (N^2-N)-1}{N-1} \doteq .2475018846 \times 10^{68}$$
The standard deviation of $S_M$ is $\sigma_M = \sqrt{ \text{Var}(S_M)} \doteq .4974956126 \times 10^{34}$.

So, to summarize: the number of shuffles needed is random, but with mean about $0.126 \times 10^{68}$ and standard deviation about $0.497 \times 10^{34}$.

Last edited: Oct 3, 2016