Coupon Collection Problem Variation

  • Thread starter Thread starter Master1022
  • Start date Start date
  • Tags Tags
    Variation
AI Thread Summary
The discussion centers on a variation of the coupon collection problem involving the expected number of rolls of a d6 until all six consecutive differences appear. Participants analyze the probability distribution of differences and how to calculate expected rolls using concepts like conditional probabilities and independence. The initial approach estimates the expected rolls by summing fractions based on the probabilities of obtaining new differences after each roll. Some contributors suggest that treating the differences as independent may lead to an underestimation of the expected rolls, while others provide simulations and calculations that converge to specific values for different die sizes. The conversation concludes with insights into the complexity of deriving exact solutions and potential upper bounds for the expected number of rolls.
Master1022
Messages
590
Reaction score
116
Homework Statement
Call a “consecutive difference” the absolute value of the difference between two consecutive rolls of a d6. For example, the sequence of rolls 143511 has the corresponding sequence of consecutive differences 31240. What is the expected number of rolls until all 6 consecutive differences have appeared?
Relevant Equations
Probability
Hi,

I was attempting the following problem but didn't quite understand the steps in the solution. The problem reminded me of the coupon problem, but the probabilities of the 'coupons' are different.

Question: Call a “consecutive difference” the absolute value of the difference between two consecutive rolls of a d6. For example, the sequence of rolls 143511 has the corresponding sequence of consecutive differences 31240. What is the expected number of rolls until all 6 consecutive differences have appeared?

Attempt:
The first thing I did was draw out a table of the differences between two d6 rolls and found the probability distribution of the difference ##D##: ##P(D = 0) = \frac{6}{36}##, ##P(D = 1) = \frac{10}{36}##, ##P(D = 2) = \frac{8}{36} ##, ## P(D = 3) = \frac{6}{36} ##, ## P(D = 4) = \frac{4}{36} ##, and ## P(D = 5) = \frac{2}{36} ##.

The first difference will be shown after 1 roll. After that, I thought I would do the sum of ## \frac{1}{p} ##, but the solution says:

''' The most likely first consecutive dif- ference to get is the one with probability 10/36, then the expected number of rolls until getting a new consecutive difference would be ## \frac{36}{(36 − 10)} = \frac{36}{26} ##. We can proceed in this way, getting an estimate of ##1 + \frac{36}{36} + \frac{36}{26} + \frac{36}{18} + \frac{36}{12} + \frac{36}{6} + \frac{36}{2} \approx 32.38 ## '''

However, I don't understand the intuition behind terms such as ## \frac{36}{36 - 10} ##. Would anyone be able to help out and explain these concepts?

Thanks in advance
 
Physics news on Phys.org
After you hit the 10/36 case, there are only 26 rolls you can get which give you a new case you haven't seen. So you need to be do 36/26 such rolls. The most likely outcome there is that you get the one with 8/36 chance. Then 10+8=18 out of the 36 cases you wrote down have been seen, so there are only 36-18=18 cases left that give you a new consecutive difference. So you need to roll 36/18 times on average to see a new case.

I think a nice follow up question to this solution is: is it overestimating or underestimating the true correct answer?
 
  • Like
Likes Master1022
Office_Shredder said:
After you hit the 10/36 case, there are only 26 rolls you can get which give you a new case you haven't seen. So you need to be do 36/26 such rolls. The most likely outcome there is that you get the one with 8/36 chance. Then 10+8=18 out of the 36 cases you wrote down have been seen, so there are only 36-18=18 cases left that give you a new consecutive difference. So you need to roll 36/18 times on average to see a new case.

I think a nice follow up question to this solution is: is it overestimating or underestimating the true correct answer?
Thanks @Office_Shredder ! This does make sense now.

I think it is overestimating the true answer. I feel like the above method is considering getting a new difference one at a time (almost assuming some kind of 'independence')
 
A crude lower bound can be had by taking the most improbable (i.e. largest) difference and seeing how long on average before that difference arises. For an N-sided die, about ##N^2/2##.
Better, we can note that the probability of difference i is
##i=0: p_i=\frac 1N##
##i>0: p_i=\frac{2N-2i}{N^2}##
Treating consecutive differences as independent, the probability that difference i occurs at least once in r trials is ##1-(1-p_i)^r##.
So for all to have occurred, the probability is ##\Pi_i(1-(1-p_i)^r)##.
For this to occur first at the rth roll: ##\Pi_i(1-(1-p_i)^r)-\Pi_i(1-(1-p_i)^{r-1})##.
Average number of rolls = ##\Sigma_{r=1}r[\Pi_i(1-(1-p_i)^r)-\Pi_i(1-(1-p_i)^{r-1})]##
##=\Sigma_{r=0} (1-\Pi_i(1-(1-p_i)^r))##.
This gives about 22.47 for N=6, 5.73 for N=3.

Treating consecutive differences as independent is likely to give a low answer. In practice, small differences are more likely to be followed by more small differences than by large differences.

The exact solution for N=3 is 7. I got this by defining 16 states, denoted as ##\alpha n##, where ##\alpha## is A if the last roll was a 1 or 3, B if the last roll was a 2, and ##n## is a number from 0 to 7 indicating which differences have been encountered: 0 initially, then the sum of 1 if a 0 has been seen, 2 if a 1 has been seen, 4 if a 2 has been seen. We stop when the state number reaches 7.

So the sequence of throws 13112 would go through the states A0 A4 A4 A5 B7.
It is not hard to figure out how many rolls more would be expected on average in each state, working back from the final states.
E.g. from A6 the next roll takes us:
1-> A7
2-> B6
3-> A6
whereas from B6:
1-> A6
2-> B7
3-> A6
whence we compute the average further throws as 3 for both A6 and B6.
Note that states B4 and B5 cannot be reached.

For N=6, I simulated the exact behaviour in a spreadsheet using random numbers. It seems to average about 27.
 
haruspex said:
For N=6, I simulated the exact behaviour in a spreadsheet using random numbers. It seems to average about 27.
Using this CodePen the expected value appears to converge to exactly 25. Interesting. approximately 25.85.

 
Last edited:
  • Informative
Likes Master1022
haruspex said:
The exact solution for N=3 is 7.
I haven't checked your workings but the Monte Carlo solution above appears to converge to ~7.81.
 
pbuk said:
I haven't checked your workings but the Monte Carlo solution above appears to converge to ~7.81.
You're right, I found an error.
These are my equations:
A7=B7=0 (we're done, no more throws needed)
B6=1+(A6+B7+A6)/3=1+2A6/3
(i.e., last throw was a 2 (so 'B'), we throw a 1, 2 or 3 next;
throwing a 1 means diff of 1, which ORs 2 into the state number and the new 'last throw' is 'A'; etc.
Note that we don't need to distinguish between last throw being a 1 or 3 because of the symmetry.)
A6=1+(A7+B6+A6)
Solving: A6=B6=3.
A5=1+(A5+B7+A5)/3= 1+2A5/3
Hence A5=3
A3=1+(A3+B3+A7)/3=1+(A3+B3)/3
Hence 2A3=3+B3
B3=1+(A3+B3+A3)/3
Hence B3=6, A3=9/2
A1=1+(A1+B3+A5)/3=1+(A1+6+3)/3=A1/3+4
Hence A1=6
B1=1+(A3+B1+A3)/3=1+(B1+9)/3
Hence B1=6
A2=1+(A3+B2+A6)/3=1+(9/2+B2+3)/3=B2/3+7/2
B2=1+(A2+B3+A2)/3=3+2A2/3
Hence A2=(3+2A2/3)/3+7/2
7A2/9=9/2
A2=81/14
B2=48/7
A4=1+(A5+B6+A4)/3=3+A4/3
Hence A4=9/2
A0=1+(A1+B2+A4)/3=1+(6+48/7+9/2)/3=1+2+16/7+3/2=95/14
B0=1+(A2+B1+A2)/3=1+(81/7+6)/3=48/7
Expected number of throws=1+(A0+B0+A0)/3=164/21=7.8095...

Applying the same approach to N=6 gets hard because at the point where about half of the differences have arisen we get 20 different numeric elements to the state (3 of the six possible differences) and 3 alphabetic elements, leading to a system of nearly 60 simultaneous equations to solve. (A few combinations are impossible.)
That could be automated, though.
 
  • Like
Likes pbuk
A next step would be to find an upper bound.
If the last throw was an extreme value, 1 or N, then all differences are possible and equally likely for the next throw. If there are r differences yet to be achieved, the probability of hitting one is r/N.
If not at an extreme, it will be on average N/2 throws to get back to being at an extreme.
If we ignore the possibility of scoring a new difference from other than an extreme position we have an average of ##(N/r)(N/2)=N^2/(2r)## throws to achieve a new difference and return to an extreme.
Summing, that gives an upper bound of ##\frac 12N^2\ln(N)##.
 
  • Informative
Likes Master1022
Many thanks for your responses - these are very insightful!
 
Back
Top