# Number of Die Throws until you get a repeated number

Master1022
Homework Statement:
If we have a 6-sided die (it is a fair die), how many throws would we expect to make before we see a repeated number?
Relevant Equations:
Probability
Hi,

I found a similar problem online, but there wasn't any solution. The question actually had a larger number of sides (e.g. 50 or 100, etc.), but I thought the underlying solution method should be the same for any arbitrary ##n##-sided die.

Question:
If we have a 6-sided die (it is a fair die), how many throws would we expect to make before we see a repeated number?

Attempt:
I have been thinking about this problem for a while, and am very stuck on how to proceed. I did write a Python script to simulate this game for different ##n##, and the graph aligned with intuition (larger ##n## required more throws until repetition). I remember reading that this problem was similar to the birthday paradox conceptually, however I haven't been able to use that to solve the problem.

If I follow that birthday paradox line of reasoning, I suppose I could do:
$$P(\text{repetition on 5th throw}) = 1 \times \frac{5}{6} \times \frac{4}{6} \times \frac{3}{6} \times \frac{4}{6} = \frac{(6)! \cdot 4}{(6-4)! \cdot 6^5}$$

Thus this could be generalized to the probability of a repetition on the ##k##-th throw:
$$P(\text{repetition on k-th throw for n sided die}) = \frac{(n)! \times (k-1)}{(n-(k-1))! \cdot n^k}$$

Is the above correct? I am not sure I am fully convinced by the abiThis question was an interview question, so how could one find the expected number without a calculator - are there tricks or approximations?

Any help would be greatly appreciated.

Homework Helper
Gold Member
2021 Award
You can't go more than ##7## throws without repetition. Using your formula for ##k =2 \dots 7## should be easy?

WWGD, FactChecker and Master1022
Gold Member
The chances are 0 in 6 = 0.0 ... that you hit on the first roll.
The chances are 1 in 6 = 0.1666 ... that you hit on the second roll.
The chances are 2 in 6 = 0.3333 ... that you hit on the third roll.
The chances are 3 in 6 = 0.5 that you hit on the fourth roll.
The chances are 4 in 6 = 0.6666 ... that you hit on the fifth roll.
The chances are 5 in 6 = 0.8333 ... that you hit on the sixth roll.
The chances are 6 in 6 =1.0 that you hit on the seventh roll.

I hope you find this helpful.

I corrected this during the time @PeroK posted his comment.

Homework Helper
Gold Member
2021 Award
The chances are 1 in 6 = 0.1666 ... that you hit on the first roll.
The chances are 2 in 6 = 0.3333 ... that you hit on the second roll.
The chances are 3 in 6 = 0.5 that you hit on the third roll.
The chances are 4 in 6 = 0.6666 ... that you hit on the fourth roll.
The chances are 5 in 6 = 0.8333 ... that you hit on the fifth roll.
The chances are 6 in 6 =1.0 that you hit on the sixth roll.

I hope you find this helpful.
Or, wrong!

sysprog
As @PeroK pointed out, you can't go more than 7 throws without repetition ##-##
You can't go more than ##7## throws without repetition. Using your formula for ##k =2 \dots 7## should be easy?
More precisely, 7 throws is enough to guarantee repetition ##-## this is an example of the pigeonhole principle.

Delta2
Master1022
Thanks @PeroK, @Buzz Bloom , and @sysprog!

Sure, for this 6-sided die, going through different values of ##k## isn't too bad. However, I just used 6 to simplify the problem (was originally 100-sided die), in which case I think it becomes more difficult. Was my underlying method/reasoning correct?

Homework Helper
Gold Member
2021 Award
Thanks @PeroK, @Buzz Bloom , and @sysprog!

Sure, for this 6-sided die, going through different values of ##k## isn't too bad. However, I just used 6 to simplify the problem (was originally 100-sided die), in which case I think it becomes more difficult. Was my underlying method/reasoning correct?

What do you mean by repetition? Is that two in a row the same?

sysprog
sysprog
Thanks @PeroK, @Buzz Bloom , and @sysprog!

Sure, for this 6-sided die, going through different values of ##k## isn't too bad. However, I just used 6 to simplify the problem (was originally 100-sided die), in which case I think it becomes more difficult. Was my underlying method/reasoning correct?

By the pigeonhole principle, if you throw a 100-sided die 100 times and get a different number on each throw, there is no number for throw 101 that is not a repeated number. You of course could get a repeat on throw 2; however, you have only a 1% chance of doing that. Throw 1 sets the first number that is a candidate for repetition. If throw 2 is a different number, then you have 2 candidate numbers for throw 3, meaning you have a 2% chance on throw 3 ##\dots##

Last edited:
Gold Member
One thing to realize is that the original problem with "100-sided die" might be expecting an entirely different approach to the solution. A large number of sides makes me suspect that there is a continuous approximation that they expect you to apply. I'm sorry that I am too rusty to recognize what the approximation might be (Poisson? It doesn't seem right.). There might be something appropriate in the section where the problem appears.

Last edited:
WWGD, sysprog and Master1022
Master1022
What do you mean by repetition? Is that two in a row the same?
Sorry. Just mean't the same number appears again, but doesn't have to be consecutive. E.g. 1, 4, 5, 1 would count as 1 has been repeated now

sysprog
Gold Member
Sometimes an interviewer will ask a question that cannot be answered within the available time or resources: they are interested in how you approach the problem rather than what the solution is.

So a good answer might go "Hmmm, that looks tricky. I can see straight away that the answer must be less than 101 [interviewer notes "can narrow down a problem by setting bounds"], but it looks like a lot less [Interviewer notes "demonstrates ability to estimate"]. It reminds me of the birthday paradox [interviewer notes "has experience of common problems in probability"]". At this point the interviewer interrupts: "yes that's right: do you know anywhere this might be useful in your job at CodesRUs?".

Candidate: "Well the same problem is encountered when we create a hash key for indexing a table - the longer the key the less frequently we get hash collisions [Interviewer notes: "follow-up interview with database team"]. And again in cryptography - I think it's even called a 'birthday attack'; this is where an attacker exploits the repetition associated with the length of the key [Interviewer notes: "follow-up interview with security team"]. And of course the birthday paradox always goes down well at parties [Interviewer notes: "not suitable for customer facing role" ]."

Last edited:
Master1022, jim mcnamara, DaveC426913 and 3 others
Homework Helper
Gold Member
2021 Award
Or, it must be about ##\frac n 2 + 1## but it's a tricky calculation to get an exact answer!

sysprog
sysprog
Or, it must be about ##\frac n 2 + 1## but it's a tricky calculation to get an exact answer!
I think that's partly because there are multiple ways to do it ##-## here's one such that for ##n=100## I'd need a calculator to get an exact answer: to get a repetition in ##k## rolls of an ##n##-sided die, ##\sum_{k=2}^{n+1}\frac{k!}{n^{k-1}}\binom{n-1}{k-2}##. (ref)

Last edited:
sysprog
One thing to realize is that the original problem with "100-sided die" might be expecting an entirely different approach to the solution. A large number of sides makes me suspect that there is a continuous approximation that they expect you to apply. I'm sorry that I am too rusty to recognize what the approximation might be (Poisson? It doesn't seem right.). There might be something appropriate in the section where the problem appears.
An asymptotic approximation would be: ##\sum_{k=2}^{n+1}\frac{k!}{n^{k-1}}\binom{n-1}{k-2}\sim\sqrt{\frac{n\pi}2}+\frac23##
(with an error of approximately ##\frac1{10\sqrt{n}}##). (ref)

Master1022, FactChecker and PeroK
Gold Member
An asymptotic approximation would be: ##\sum_{k=2}^{n+1}\frac{k!}{n^{k-1}}\binom{n-1}{k-2}\sim\sqrt{\frac{n\pi}2}+\frac23##
(with an error of approximately ##\frac1{10\sqrt{n}}##). (ref)
Nice! I guess I was thinking of an asymptotic approximation to a known common probability distribution, like Poisson. But the question was asked in an interview, so it may be just to see how the person responds and analysis the problem, as @pbuk pointed out. There might not be any reasonably simple solution.

Master1022 and sysprog
sysprog
Nice! I guess I was thinking of an asymptotic approximation to a known common probability distribution, like Poisson. But the question was asked in an interview, so it may be just to see how the person responds and analysis the problem, as @pbuk pointed out. There might not be any reasonably simple solution.
Handling the (similar) 'birthday problem' as Poisson processes is discussed here.

Also, the expected number of rolls until the first ##k##-repetition with an ##n##-sided die is given by the integral: ##\mathbb{E}(T) = \int_0^\infty \left(\sum_{j=0}^{k-1} {1\over j!}\left({x\over n}\right)^j\right)^n \,e^{-x}\,dx##. (ref)

I would use my TI-83 calculator for this if I wanted a reasonably quick answer.

uart, FactChecker and PeroK
Homework Helper
Gold Member
2021 Award
An asymptotic approximation would be: ##\sum_{k=2}^{n+1}\frac{k!}{n^{k-1}}\binom{n-1}{k-2}\sim\sqrt{\frac{n\pi}2}+\frac23##
(with an error of approximately ##\frac1{10\sqrt{n}}##). (ref)
For ##n =6## I got ##3.847##, and your formula gives ##3.737##. It appears to converge quickly.

FactChecker and sysprog
Homework Helper
Gold Member
2021 Award
Also, quite amusing is the case of a coin, I.e. ##n =2##. The formula gives ##2.44 ##, which is pretty close to the precise ##2.5##.

sysprog
sysprog
Doing the recursion manually for ##n=6## yields ##{1223\over324}\approx 3.77##. (ref)

Gold Member
Sometimes an interviewer will ask a question that cannot be answered within the available time or resources: they are interested in how you approach the problem rather than what the solution is.

So a good answer might go "Hmmm, that looks tricky. I can see straight away that the answer must be less than 101 [interviewer notes "can narrow down a problem by setting bounds"], but it looks like a lot less [Interviewer notes "demonstrates ability to estimate"]. It reminds me of the birthday paradox [interviewer notes "has experience of common problems in probability"]". At this point the interviewer interrupts: "yes that's right: do you know anywhere this might be useful in your job at CodesRUs?".

Candidate: "Well the same problem is encountered when we create a hash key for indexing a table - the longer the key the less frequently we get hash collisions [Interviewer notes: "follow-up interview with database team"]. And again in cryptography - I think it's even called a 'birthday attack'; this is where an attacker exploits the repetition associated with the length of the key [Interviewer notes: "follow-up interview with security team"]. And of course the birthday paradox always goes down well at parties [Interviewer notes: "not suitable for customer facing role" ]."
A technical point : 101 throws gives you a guarantee, but I believe the question was about the expected number of throws. Would n/2 or ( least integer greater than n/2 ) for an n-sided fair die be a good rule of thumb?

sysprog
Homework Helper
Gold Member
2021 Award
A technical point : 101 throws gives you a guarantee, but I believe the question was about the expected number of throws. Would n/2 or ( least integer greater than n/2 ) for an n-sided fair die be a good rule of thumb?
A formula that provides a good approximation was provided here:

An asymptotic approximation would be: ##\sum_{k=2}^{n+1}\frac{k!}{n^{k-1}}\binom{n-1}{k-2}\sim\sqrt{\frac{n\pi}2}+\frac23##
(with an error of approximately ##\frac1{10\sqrt{n}}##). (ref)

Master1022, WWGD and sysprog
Gold Member
A technical point : 101 throws gives you a guarantee, but I believe the question was about the expected number of throws. Would n/2 or ( least integer greater than n/2 ) for an n-sided fair die be a good rule of thumb?
That seems like a good guess, but it is not close to the approximation, ##\sqrt{n\pi/2}+2/3##, that @sysprog gave. That gives about 13 when n=100.
Compare this problem to the well-known problem of the odds of two students in a class of 30 having the same birthday. The answer to that is that it is very likely. But half of 365 would make it very unlikely.

Master1022, sysprog and WWGD
Gold Member
That seems like a good guess, but it is not close to the approximation, ##\sqrt{n\pi/2}+2/3##, that @sysprog gave. That gives about 13 when n=100.
Compare this problem to the well-known problem of the odds of two students in a class of 30 having the same birthday. The answer to that is that it is very likely. But half of 365 would make it very unlikely.
Thanks. So much for my intuition ;).

Gold Member
Thanks. So much for my intuition ;).
Mine too. When I read your post I thought you had to be right. But now I guess not.

WWGD
Homework Helper
Gold Member
2021 Award
Mine too. When I read your post I thought you had to be right. But now I guess not.
I beat you both to it:

Or, it must be about ##\frac n 2 + 1## but it's a tricky calculation to get an exact answer!

FactChecker
Gold Member
I think that there must be multiple interpretations of what the question means. As I understand the question, a trial consists of throwing the die and counting the number of throws which takes place such that the last throw is a number that has appeared previously. The question is: what is the average number of throws per trial? For a six sided die, the number of throws for one trial can be 1, 2, 3, 4, 5, 6, or 7.

The probability of success for roll n is as follows

n=1 p=0
n=2 p=(1/6)=0.166666666666667
n=3 p = (5/6)x(2/6)=0.277777777777778
n=4 p=(5/6)x(4/6)x(3/6)=0.277777777777778
n=5 p=(5/6)x(4/6)x(3/6)x(4/6)=0.185185185185185
n=6 p=(5/6)x(4/6)x(3/6)x(2/6)x(5/6)=0.0771604938271605
n=7 p=(5/6)x(4/6)x(3/6)x(2/6)x(1/6)x1=0.0154320987654321

The average number of rolls over a large number of trials is 3.77469135802469.

The above was calculated on a spreadsheet. I have started to work on a spreadsheet to calculate the case for a die with 100 sides, but it is likely to take me a few days.

OOPS - Bad calculatation for n=7. I will fix it soon.
FIXED IT!

Last edited:
PeroK and sysprog
sysprog
The average number of rolls over a large number of trials is 3.77469135802469.
That's remarkably close to the manually calculable expected value.
sysprog said:
Doing the recursion manually for ##n=6## yields ##1223 \over {324}#### \approx 3.77##. (ref)

That's remarkably close to the manually calculable expected value.
I'm pretty sure that *is* the manually calculated expected value.

sysprog
sysprog
I'm pretty sure that *is* the manually calculated expected value.
I agree ##-## 1223/324 is the manually calculable expected value, and its decimal expansion for the number of digits reported by @Buzz Bloom is exactly the same as on a calculator ##-## the reason for my characterizing it as 'remarkably close' is that it was reported as an average over a large number of trials; not as simply the result of doing a decimal division to expand the expected value fraction . . .

Last edited:
Master1022
Sometimes an interviewer will ask a question that cannot be answered within the available time or resources: they are interested in how you approach the problem rather than what the solution is.

So a good answer might go "Hmmm, that looks tricky. I can see straight away that the answer must be less than 101 [interviewer notes "can narrow down a problem by setting bounds"], but it looks like a lot less [Interviewer notes "demonstrates ability to estimate"]. It reminds me of the birthday paradox [interviewer notes "has experience of common problems in probability"]". At this point the interviewer interrupts: "yes that's right: do you know anywhere this might be useful in your job at CodesRUs?".

Candidate: "Well the same problem is encountered when we create a hash key for indexing a table - the longer the key the less frequently we get hash collisions [Interviewer notes: "follow-up interview with database team"]. And again in cryptography - I think it's even called a 'birthday attack'; this is where an attacker exploits the repetition associated with the length of the key [Interviewer notes: "follow-up interview with security team"]. And of course the birthday paradox always goes down well at parties [Interviewer notes: "not suitable for customer facing role" ]."
Sorry for late reply - finally have some time to get back to this problem. Sure, this seems reasonable. Also, thanks for the insight into the use cases; didn't know about those before and will definitely read up on them!

sysprog
Master1022
An asymptotic approximation would be: ##\sum_{k=2}^{n+1}\frac{k!}{n^{k-1}}\binom{n-1}{k-2}\sim\sqrt{\frac{n\pi}2}+\frac23##
(with an error of approximately ##\frac1{10\sqrt{n}}##). (ref)
Thanks @sysprog ! I actually made a Python simulation of this game for an arbitrary ##n##-sided dice (unfortunately was made on a machine which I cannot access now). Nonetheless, your approximation gives a very similar result to some of the values I was getting from repeated experimentation (e.g. I also got a similar value for n = 100 to the one quoted by @FactChecker )

EDIT: what type of things should I read to learn about how one would derive that approximation? (or approximations to binomials in general)? I have never done any full probability & statistics courses so my knowledge only extends to reading random lecture notes on the internet.

Gold Member
Thanks @sysprog ! I actually made a Python simulation of this game for an arbitrary ##n##-sided dice (unfortunately was made on a machine which I cannot access now). Nonetheless, your approximation gives a very similar result to some of the values I was getting from repeated experimentation (e.g. I also got a similar value for n = 100 to the one quoted by @FactChecker )

EDIT: what type of things should I read to learn about how one would derive that approximation? (or approximations to binomials in general)? I have never done any full probability & statistics courses so my knowledge only extends to reading random lecture notes on the internet.
Try the original link and let us know if you have a question:

https://math.stackexchange.com/ques...ded-die-until-repeat-is-found/1908500#1908500

Here, basically, you have 6* options for the first throw, 5 for the second, etc. These can be rearranged in Cn,k (" n Choose k ) ways. All of it in n^k throws of the dice.

* Or, n, in the general case.

sysprog and Master1022
Master1022
Try the original link and let us know if you have a question:

https://math.stackexchange.com/ques...ded-die-until-repeat-is-found/1908500#1908500

Here, basically, you have 6* options for the first throw, 5 for the second, etc. These can be rearranged in Cn,k (" n Choose k ) ways. All of it in n^k throws of the dice.

* Or, n, in the general case.
Okay thanks will do. I will read the answer again, but I thought they just skipped from the summation expression to the form with ## \sqrt{\frac{n \pi}{2}} + ... ##

sysprog
Gold Member
Okay thanks will do. I will read the answer again, but I thought they just skipped from the summation expression to the form with ## \sqrt{\frac{n \pi}{2}} + ... ##
Yes, this is a limit form as the number of sides grows to ##\infty ##. I think it uses Stirling's approximation.

Master1022 and sysprog
sysprog
Thanks @sysprog ! I actually made a Python simulation of this game for an arbitrary ##n##-sided dice (unfortunately was made on a machine which I cannot access now). Nonetheless, your approximation gives a very similar result to some of the values I was getting from repeated experimentation (e.g. I also got a similar value for n = 100 to the one quoted by @FactChecker )

EDIT: what type of things should I read to learn about how one would derive that approximation? (or approximations to binomials in general)? I have never done any full probability & statistics courses so my knowledge only extends to reading random lecture notes on the internet.
Links to free textbooks on combinatorics:
https://www.whitman.edu/mathematics/cgt_online/cgt.pdf
https://users.math.msu.edu/users/bsagan/Books/Aoc/final.pdf
http://www-math.mit.edu/~rstan/ec/ec1.pdf
https://people.math.gatech.edu/~trotter/book.pdf
http://www.math.utk.edu/~wagner/papers/comb.pdf

There are many more if you do a search on: combinatorics textbook pdf
I included only a few that are from .edu sources; I can't be sure of the copyright authorizations on the others ##-## at least not sure enough to post them on PF.

In general, combinatorics may be thought of as a subdiscipline of discrete mathematics, although in asymptotic approximations and other aspects it abuts on or overlaps some continuous mathematics.

Last edited:
Master1022, PeroK and WWGD