# Number of Die Throws until you get a repeated number

Homework Statement:
If we have a 6-sided die (it is a fair die), how many throws would we expect to make before we see a repeated number?
Relevant Equations:
Probability
Hi,

I found a similar problem online, but there wasn't any solution. The question actually had a larger number of sides (e.g. 50 or 100, etc.), but I thought the underlying solution method should be the same for any arbitrary ##n##-sided die.

Question:
If we have a 6-sided die (it is a fair die), how many throws would we expect to make before we see a repeated number?

Attempt:
I have been thinking about this problem for a while, and am very stuck on how to proceed. I did write a Python script to simulate this game for different ##n##, and the graph aligned with intuition (larger ##n## required more throws until repetition). I remember reading that this problem was similar to the birthday paradox conceptually, however I haven't been able to use that to solve the problem.

If I follow that birthday paradox line of reasoning, I suppose I could do:
$$P(\text{repetition on 5th throw}) = 1 \times \frac{5}{6} \times \frac{4}{6} \times \frac{3}{6} \times \frac{4}{6} = \frac{(6)! \cdot 4}{(6-4)! \cdot 6^5}$$

Thus this could be generalized to the probability of a repetition on the ##k##-th throw:
$$P(\text{repetition on k-th throw for n sided die}) = \frac{(n)! \times (k-1)}{(n-(k-1))! \cdot n^k}$$

Is the above correct? I am not sure I am fully convinced by the abiThis question was an interview question, so how could one find the expected number without a calculator - are there tricks or approximations?

Any help would be greatly appreciated.

PeroK
Homework Helper
Gold Member
2020 Award
You can't go more than ##7## throws without repetition. Using your formula for ##k =2 \dots 7## should be easy?

WWGD, FactChecker and Master1022
Buzz Bloom
Gold Member
The chances are 0 in 6 = 0.0 ... that you hit on the first roll.
The chances are 1 in 6 = 0.1666 ... that you hit on the second roll.
The chances are 2 in 6 = 0.3333 ... that you hit on the third roll.
The chances are 3 in 6 = 0.5 that you hit on the fourth roll.
The chances are 4 in 6 = 0.6666 ... that you hit on the fifth roll.
The chances are 5 in 6 = 0.8333 ... that you hit on the sixth roll.
The chances are 6 in 6 =1.0 that you hit on the seventh roll.

I hope you find this helpful.

I corrected this during the time @PeroK posted his comment.

PeroK
Homework Helper
Gold Member
2020 Award
The chances are 1 in 6 = 0.1666 ... that you hit on the first roll.
The chances are 2 in 6 = 0.3333 ... that you hit on the second roll.
The chances are 3 in 6 = 0.5 that you hit on the third roll.
The chances are 4 in 6 = 0.6666 ... that you hit on the fourth roll.
The chances are 5 in 6 = 0.8333 ... that you hit on the fifth roll.
The chances are 6 in 6 =1.0 that you hit on the sixth roll.

I hope you find this helpful.
Or, wrong!

As @PeroK pointed out, you can't go more than 7 throws without repetition ##-##
You can't go more than ##7## throws without repetition. Using your formula for ##k =2 \dots 7## should be easy?
More precisely, 7 throws is enough to guarantee repetition ##-## this is an example of the pigeonhole principle.

Delta2
Thanks @PeroK, @Buzz Bloom , and @sysprog!

Sure, for this 6-sided die, going through different values of ##k## isn't too bad. However, I just used 6 to simplify the problem (was originally 100-sided die), in which case I think it becomes more difficult. Was my underlying method/reasoning correct?

PeroK
Homework Helper
Gold Member
2020 Award
Thanks @PeroK, @Buzz Bloom , and @sysprog!

Sure, for this 6-sided die, going through different values of ##k## isn't too bad. However, I just used 6 to simplify the problem (was originally 100-sided die), in which case I think it becomes more difficult. Was my underlying method/reasoning correct?

What do you mean by repetition? Is that two in a row the same?

sysprog
Thanks @PeroK, @Buzz Bloom , and @sysprog!

Sure, for this 6-sided die, going through different values of ##k## isn't too bad. However, I just used 6 to simplify the problem (was originally 100-sided die), in which case I think it becomes more difficult. Was my underlying method/reasoning correct?

By the pigeonhole principle, if you throw a 100-sided die 100 times and get a different number on each throw, there is no number for throw 101 that is not a repeated number. You of course could get a repeat on throw 2; however, you have only a 1% chance of doing that. Throw 1 sets the first number that is a candidate for repetition. If throw 2 is a different number, then you have 2 candidate numbers for throw 3, meaning you have a 2% chance on throw 3 ##\dots##

Last edited:
FactChecker
Gold Member
One thing to realize is that the original problem with "100-sided die" might be expecting an entirely different approach to the solution. A large number of sides makes me suspect that there is a continuous approximation that they expect you to apply. I'm sorry that I am too rusty to recognize what the approximation might be (Poisson? It doesn't seem right.). There might be something appropriate in the section where the problem appears.

Last edited:
WWGD, sysprog and Master1022
What do you mean by repetition? Is that two in a row the same?
Sorry. Just mean't the same number appears again, but doesn't have to be consecutive. E.g. 1, 4, 5, 1 would count as 1 has been repeated now

sysprog
pbuk
Gold Member
Sometimes an interviewer will ask a question that cannot be answered within the available time or resources: they are interested in how you approach the problem rather than what the solution is.

So a good answer might go "Hmmm, that looks tricky. I can see straight away that the answer must be less than 101 [interviewer notes "can narrow down a problem by setting bounds"], but it looks like a lot less [Interviewer notes "demonstrates ability to estimate"]. It reminds me of the birthday paradox [interviewer notes "has experience of common problems in probability"]". At this point the interviewer interrupts: "yes that's right: do you know anywhere this might be useful in your job at CodesRUs?".

Candidate: "Well the same problem is encountered when we create a hash key for indexing a table - the longer the key the less frequently we get hash collisions [Interviewer notes: "follow-up interview with database team"]. And again in cryptography - I think it's even called a 'birthday attack'; this is where an attacker exploits the repetition associated with the length of the key [Interviewer notes: "follow-up interview with security team"]. And of course the birthday paradox always goes down well at parties [Interviewer notes: "not suitable for customer facing role" ]."

Last edited:
Master1022, jim mcnamara, DaveC426913 and 3 others
PeroK
Homework Helper
Gold Member
2020 Award
Or, it must be about ##\frac n 2 + 1## but it's a tricky calculation to get an exact answer!

sysprog
Or, it must be about ##\frac n 2 + 1## but it's a tricky calculation to get an exact answer!
I think that's partly because there are multiple ways to do it ##-## here's one such that for ##n=100## I'd need a calculator to get an exact answer: to get a repetition in ##k## rolls of an ##n##-sided die, ##\sum_{k=2}^{n+1}\frac{k!}{n^{k-1}}\binom{n-1}{k-2}##. (ref)

Last edited:
One thing to realize is that the original problem with "100-sided die" might be expecting an entirely different approach to the solution. A large number of sides makes me suspect that there is a continuous approximation that they expect you to apply. I'm sorry that I am too rusty to recognize what the approximation might be (Poisson? It doesn't seem right.). There might be something appropriate in the section where the problem appears.
An asymptotic approximation would be: ##\sum_{k=2}^{n+1}\frac{k!}{n^{k-1}}\binom{n-1}{k-2}\sim\sqrt{\frac{n\pi}2}+\frac23##
(with an error of approximately ##\frac1{10\sqrt{n}}##). (ref)

Master1022, FactChecker and PeroK
FactChecker
Gold Member
An asymptotic approximation would be: ##\sum_{k=2}^{n+1}\frac{k!}{n^{k-1}}\binom{n-1}{k-2}\sim\sqrt{\frac{n\pi}2}+\frac23##
(with an error of approximately ##\frac1{10\sqrt{n}}##). (ref)
Nice! I guess I was thinking of an asymptotic approximation to a known common probability distribution, like Poisson. But the question was asked in an interview, so it may be just to see how the person responds and analysis the problem, as @pbuk pointed out. There might not be any reasonably simple solution.

Master1022 and sysprog
Nice! I guess I was thinking of an asymptotic approximation to a known common probability distribution, like Poisson. But the question was asked in an interview, so it may be just to see how the person responds and analysis the problem, as @pbuk pointed out. There might not be any reasonably simple solution.
Handling the (similar) 'birthday problem' as Poisson processes is discussed here.

Also, the expected number of rolls until the first ##k##-repetition with an ##n##-sided die is given by the integral: ##\mathbb{E}(T) = \int_0^\infty \left(\sum_{j=0}^{k-1} {1\over j!}\left({x\over n}\right)^j\right)^n \,e^{-x}\,dx##. (ref)

I would use my TI-83 calculator for this if I wanted a reasonably quick answer.

uart, FactChecker and PeroK
PeroK
Homework Helper
Gold Member
2020 Award
An asymptotic approximation would be: ##\sum_{k=2}^{n+1}\frac{k!}{n^{k-1}}\binom{n-1}{k-2}\sim\sqrt{\frac{n\pi}2}+\frac23##
(with an error of approximately ##\frac1{10\sqrt{n}}##). (ref)
For ##n =6## I got ##3.847##, and your formula gives ##3.737##. It appears to converge quickly.

FactChecker and sysprog
PeroK
Homework Helper
Gold Member
2020 Award
Also, quite amusing is the case of a coin, I.e. ##n =2##. The formula gives ##2.44 ##, which is pretty close to the precise ##2.5##.

sysprog
Doing the recursion manually for ##n=6## yields ##{1223\over324}\approx 3.77##. (ref)

WWGD
Gold Member
Sometimes an interviewer will ask a question that cannot be answered within the available time or resources: they are interested in how you approach the problem rather than what the solution is.

So a good answer might go "Hmmm, that looks tricky. I can see straight away that the answer must be less than 101 [interviewer notes "can narrow down a problem by setting bounds"], but it looks like a lot less [Interviewer notes "demonstrates ability to estimate"]. It reminds me of the birthday paradox [interviewer notes "has experience of common problems in probability"]". At this point the interviewer interrupts: "yes that's right: do you know anywhere this might be useful in your job at CodesRUs?".

Candidate: "Well the same problem is encountered when we create a hash key for indexing a table - the longer the key the less frequently we get hash collisions [Interviewer notes: "follow-up interview with database team"]. And again in cryptography - I think it's even called a 'birthday attack'; this is where an attacker exploits the repetition associated with the length of the key [Interviewer notes: "follow-up interview with security team"]. And of course the birthday paradox always goes down well at parties [Interviewer notes: "not suitable for customer facing role" ]."
A technical point : 101 throws gives you a guarantee, but I believe the question was about the expected number of throws. Would n/2 or ( least integer greater than n/2 ) for an n-sided fair die be a good rule of thumb?

sysprog
PeroK
Homework Helper
Gold Member
2020 Award
A technical point : 101 throws gives you a guarantee, but I believe the question was about the expected number of throws. Would n/2 or ( least integer greater than n/2 ) for an n-sided fair die be a good rule of thumb?
A formula that provides a good approximation was provided here:

An asymptotic approximation would be: ##\sum_{k=2}^{n+1}\frac{k!}{n^{k-1}}\binom{n-1}{k-2}\sim\sqrt{\frac{n\pi}2}+\frac23##
(with an error of approximately ##\frac1{10\sqrt{n}}##). (ref)

Master1022, WWGD and sysprog
FactChecker
Gold Member
A technical point : 101 throws gives you a guarantee, but I believe the question was about the expected number of throws. Would n/2 or ( least integer greater than n/2 ) for an n-sided fair die be a good rule of thumb?
That seems like a good guess, but it is not close to the approximation, ##\sqrt{n\pi/2}+2/3##, that @sysprog gave. That gives about 13 when n=100.
Compare this problem to the well-known problem of the odds of two students in a class of 30 having the same birthday. The answer to that is that it is very likely. But half of 365 would make it very unlikely.

Master1022, sysprog and WWGD
WWGD
Gold Member
That seems like a good guess, but it is not close to the approximation, ##\sqrt{n\pi/2}+2/3##, that @sysprog gave. That gives about 13 when n=100.
Compare this problem to the well-known problem of the odds of two students in a class of 30 having the same birthday. The answer to that is that it is very likely. But half of 365 would make it very unlikely.
Thanks. So much for my intuition ;).

FactChecker
Gold Member
Thanks. So much for my intuition ;).
Mine too. When I read your post I thought you had to be right. But now I guess not.

WWGD
PeroK