Coupon Collection Problem Variation

Master1022 · Sep 28, 2021

Hi,

I was attempting the following problem but didn't quite understand the steps in the solution. The problem reminded me of the coupon problem, but the probabilities of the 'coupons' are different.

Question: Call a “consecutive difference” the absolute value of the difference between two consecutive rolls of a d6. For example, the sequence of rolls 143511 has the corresponding sequence of consecutive differences 31240. What is the expected number of rolls until all 6 consecutive differences have appeared?

Attempt:
The first thing I did was draw out a table of the differences between two d6 rolls and found the probability distribution of the difference ##D##: ##P(D = 0) = \frac{6}{36}##, ##P(D = 1) = \frac{10}{36}##, ##P(D = 2) = \frac{8}{36} ##, ## P(D = 3) = \frac{6}{36} ##, ## P(D = 4) = \frac{4}{36} ##, and ## P(D = 5) = \frac{2}{36} ##.

The first difference will be shown after 1 roll. After that, I thought I would do the sum of ## \frac{1}{p} ##, but the solution says:

''' The most likely first consecutive dif- ference to get is the one with probability 10/36, then the expected number of rolls until getting a new consecutive difference would be ## \frac{36}{(36 − 10)} = \frac{36}{26} ##. We can proceed in this way, getting an estimate of ##1 + \frac{36}{36} + \frac{36}{26} + \frac{36}{18} + \frac{36}{12} + \frac{36}{6} + \frac{36}{2} \approx 32.38 ## '''

However, I don't understand the intuition behind terms such as ## \frac{36}{36 - 10} ##. Would anyone be able to help out and explain these concepts?

Thanks in advance

Office_Shredder · Sep 29, 2021

After you hit the 10/36 case, there are only 26 rolls you can get which give you a new case you haven't seen. So you need to be do 36/26 such rolls. The most likely outcome there is that you get the one with 8/36 chance. Then 10+8=18 out of the 36 cases you wrote down have been seen, so there are only 36-18=18 cases left that give you a new consecutive difference. So you need to roll 36/18 times on average to see a new case.

I think a nice follow up question to this solution is: is it overestimating or underestimating the true correct answer?

Master1022 · Sep 29, 2021

Office_Shredder said:

After you hit the 10/36 case, there are only 26 rolls you can get which give you a new case you haven't seen. So you need to be do 36/26 such rolls. The most likely outcome there is that you get the one with 8/36 chance. Then 10+8=18 out of the 36 cases you wrote down have been seen, so there are only 36-18=18 cases left that give you a new consecutive difference. So you need to roll 36/18 times on average to see a new case.

I think a nice follow up question to this solution is: is it overestimating or underestimating the true correct answer?

Thanks @Office_Shredder ! This does make sense now.

I think it is overestimating the true answer. I feel like the above method is considering getting a new difference one at a time (almost assuming some kind of 'independence')

haruspex · Oct 2, 2021

A crude lower bound can be had by taking the most improbable (i.e. largest) difference and seeing how long on average before that difference arises. For an N-sided die, about ##N^2/2##.
Better, we can note that the probability of difference i is
##i=0: p_i=\frac 1N##
##i>0: p_i=\frac{2N-2i}{N^2}##
Treating consecutive differences as independent, the probability that difference i occurs at least once in r trials is ##1-(1-p_i)^r##.
So for all to have occurred, the probability is ##\Pi_i(1-(1-p_i)^r)##.
For this to occur first at the rth roll: ##\Pi_i(1-(1-p_i)^r)-\Pi_i(1-(1-p_i)^{r-1})##.
Average number of rolls = ##\Sigma_{r=1}r[\Pi_i(1-(1-p_i)^r)-\Pi_i(1-(1-p_i)^{r-1})]##
##=\Sigma_{r=0} (1-\Pi_i(1-(1-p_i)^r))##.
This gives about 22.47 for N=6, 5.73 for N=3.

Treating consecutive differences as independent is likely to give a low answer. In practice, small differences are more likely to be followed by more small differences than by large differences.

The exact solution for N=3 is 7. I got this by defining 16 states, denoted as ##\alpha n##, where ##\alpha## is A if the last roll was a 1 or 3, B if the last roll was a 2, and ##n## is a number from 0 to 7 indicating which differences have been encountered: 0 initially, then the sum of 1 if a 0 has been seen, 2 if a 1 has been seen, 4 if a 2 has been seen. We stop when the state number reaches 7.

So the sequence of throws 13112 would go through the states A0 A4 A4 A5 B7.
It is not hard to figure out how many rolls more would be expected on average in each state, working back from the final states.
E.g. from A6 the next roll takes us:
1-> A7
2-> B6
3-> A6
whereas from B6:
1-> A6
2-> B7
3-> A6
whence we compute the average further throws as 3 for both A6 and B6.
Note that states B4 and B5 cannot be reached.

For N=6, I simulated the exact behaviour in a spreadsheet using random numbers. It seems to average about 27.

pbuk · Oct 2, 2021

haruspex said:

For N=6, I simulated the exact behaviour in a spreadsheet using random numbers. It seems to average about 27.

Using this CodePen the expected value appears to converge to ~~exactly 25. Interesting.~~ approximately 25.85.

pbuk · Oct 2, 2021

haruspex said:

The exact solution for N=3 is 7.

I haven't checked your workings but the Monte Carlo solution above appears to converge to ~7.81.

haruspex · Oct 2, 2021

pbuk said:

I haven't checked your workings but the Monte Carlo solution above appears to converge to ~7.81.

You're right, I found an error.
These are my equations:
A7=B7=0 (we're done, no more throws needed)
B6=1+(A6+B7+A6)/3=1+2A6/3
(i.e., last throw was a 2 (so 'B'), we throw a 1, 2 or 3 next;
throwing a 1 means diff of 1, which ORs 2 into the state number and the new 'last throw' is 'A'; etc.
Note that we don't need to distinguish between last throw being a 1 or 3 because of the symmetry.)
A6=1+(A7+B6+A6)
Solving: A6=B6=3.
A5=1+(A5+B7+A5)/3= 1+2A5/3
Hence A5=3
A3=1+(A3+B3+A7)/3=1+(A3+B3)/3
Hence 2A3=3+B3
B3=1+(A3+B3+A3)/3
Hence B3=6, A3=9/2
A1=1+(A1+B3+A5)/3=1+(A1+6+3)/3=A1/3+4
Hence A1=6
B1=1+(A3+B1+A3)/3=1+(B1+9)/3
Hence B1=6
A2=1+(A3+B2+A6)/3=1+(9/2+B2+3)/3=B2/3+7/2
B2=1+(A2+B3+A2)/3=3+2A2/3
Hence A2=(3+2A2/3)/3+7/2
7A2/9=9/2
A2=81/14
B2=48/7
A4=1+(A5+B6+A4)/3=3+A4/3
Hence A4=9/2
A0=1+(A1+B2+A4)/3=1+(6+48/7+9/2)/3=1+2+16/7+3/2=95/14
B0=1+(A2+B1+A2)/3=1+(81/7+6)/3=48/7
Expected number of throws=1+(A0+B0+A0)/3=164/21=7.8095...

Applying the same approach to N=6 gets hard because at the point where about half of the differences have arisen we get 20 different numeric elements to the state (3 of the six possible differences) and 3 alphabetic elements, leading to a system of nearly 60 simultaneous equations to solve. (A few combinations are impossible.)
That could be automated, though.

haruspex · Oct 3, 2021

A next step would be to find an upper bound.
If the last throw was an extreme value, 1 or N, then all differences are possible and equally likely for the next throw. If there are r differences yet to be achieved, the probability of hitting one is r/N.
If not at an extreme, it will be on average N/2 throws to get back to being at an extreme.
If we ignore the possibility of scoring a new difference from other than an extreme position we have an average of ##(N/r)(N/2)=N^2/(2r)## throws to achieve a new difference and return to an extreme.
Summing, that gives an upper bound of ##\frac 12N^2\ln(N)##.

Master1022 · Oct 8, 2021

Many thanks for your responses - these are very insightful!

Coupon Collection Problem Variation

What is the Coupon Collection Problem Variation?

What is the significance of the Coupon Collection Problem Variation?

What are some strategies for solving the Coupon Collection Problem Variation?

What are some limitations of the Coupon Collection Problem Variation?

What are some real-world examples of the Coupon Collection Problem Variation?

Similar threads

Hot Threads

Recent Insights