How many candies should you buy to get all the toys?

  • Context: Graduate 
  • Thread starter Thread starter jostpuur
  • Start date Start date
  • Tags Tags
    Numbers Random
Click For Summary

Discussion Overview

The discussion revolves around a mathematical problem related to probability and combinatorics, specifically focusing on how many random draws are needed to ensure that all items from a finite set are collected at least once. This is framed within the context of a scenario involving candies and toys, leading to the exploration of the coupon collector's problem.

Discussion Character

  • Exploratory
  • Mathematical reasoning
  • Debate/contested

Main Points Raised

  • Some participants discuss the probability of drawing all numbers from a set using a random number generator, noting that M random draws from a set of N numbers can be analyzed mathematically.
  • One participant suggests that the distribution of toys affects the number of candies needed to collect all toys, indicating that non-uniform distributions would require more purchases.
  • Another participant models the number of draws needed to collect all toys, arriving at an average of 29.5 draws based on simulations.
  • There is a discussion about the coupon collector's problem, with participants noting that it can be approached through combinatorial methods involving "balls and boxes."
  • Some participants provide different methods to calculate the expected number of draws, including using harmonic series approximations.
  • There is a debate about the validity of fractional draws in the context of random number generation, with differing opinions on the significance of the precision of the average number of draws.
  • One participant emphasizes the importance of understanding the degree of error in calculations, preferring analytical methods over purely computational ones.

Areas of Agreement / Disagreement

Participants express a range of views on the methods of calculating the expected number of draws, with some favoring simulation results and others preferring analytical approaches. There is no consensus on the best method or the significance of the differences in results.

Contextual Notes

Participants mention various mathematical models and approximations, but there are unresolved details regarding the assumptions behind the distributions and the implications of using different calculation methods.

jostpuur
Messages
2,112
Reaction score
19
Suppose we have a random number generator which gives a random numbers from a set {1,2,3,...,N}, and any single number of these comes out with probability 1/N.

Then fix some number M > N, and take M random numbers out from the generator. What is the probability, that these M numbers include each one of the numbers 1,2,3,...,N at least once?

---

I know one person who buys some kind of candy where a small toy always comes with the candy. There is some number of different kind of toys out there, and this person became interested to know how many candy he/she should buy before getting them all. I found myself unable to answer it.
 
Physics news on Phys.org
jostpuur said:
I know one person who buys some kind of candy where a small toy always comes with the candy. There is some number of different kind of toys out there, and this person became interested to know how many candy he/she should buy before getting them all. I found myself unable to answer it.

You would need to know the distribution of the toys. Are there more toy cars than toy boats? Obviously, if it's not uniform, you would have to buy a lot more to get one of each.
 
I told the background about toys and candy for sake of motivation, but in the end I'm interested in the mathematical problem which I outlined.

If a simple mathematical model for a real world problem is too difficult, then it doesn't make sense to seek for more complicated models either. It's better to solve models one step at a time.
 
Well, I can model how many numbers I would have to draw from a set of 10 to get at least 1 of each (a mean of 29.5).
 
Mensanator said:
Well, I can model how many numbers I would have to draw from a set of 10 to get at least 1 of each (a mean of 29.5).

How did you get this number 29.5?
 
Your first question is easy to turn into balls and boxes:

We have M balls and N boxes, with M > N. We want to know
  • How many ways are there to put the balls into the boxes
  • How many ways are there to put the balls into the boxes such that each box has at least one ball

I claim the first one is easy. I think the second one can be transformed into an easy problem, but I'm having a mental block on the details.




Your second question is a famous problem -- the coupon collector's problem.
 
Last edited:
I see. So the collector's problem can be solved without solving the problem I described, although my problem would give one way of approaching the collector's problem.
 
jostpuur said:
How did you get this number 29.5?

By randomly selecting numbers in the range (0-9) and storing them in a dictionary. When the length of the dictionary reaches 10, there must be a least 1 of every value. At that point, the sum of the dictionary is the number of draws. On some runs, it might take only 17 draws to get all 10, other runs might take 45. I do 1000 such runs and take the average: 29.5.
 
Hurkyl said:
Your first question is easy to turn into balls and boxes:

We have M balls and N boxes, with M > N. We want to know
  • How many ways are there to put the balls into the boxes
  • How many ways are there to put the balls into the boxes such that each box has at least one ball

I claim the first one is easy. I think the second one can be transformed into an easy problem, but I'm having a mental block on the details.




Your second question is a famous problem -- the coupon collector's problem.

The first one is simply M choose N.

The second one is simply (M-1) choose (N-1).

(This turns up in the Collatz Conjecture.)
 
  • #10
Mensanator said:
By randomly selecting numbers in the range (0-9) and storing them in a dictionary. When the length of the dictionary reaches 10, there must be a least 1 of every value. At that point, the sum of the dictionary is the number of draws. On some runs, it might take only 17 draws to get all 10, other runs might take 45. I do 1000 such runs and take the average: 29.5.

After reading Hurkyl's link to the Wikipedia page, I would prefer a following calculation

[tex] 10\cdot\Big(1 + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{10}\Big) \approx 29.29[/tex]

:wink:
 
  • #11
jostpuur said:
After reading Hurkyl's link to the Wikipedia page, I would prefer a following calculation

[tex] 10\cdot\Big(1 + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{10}\Big) \approx 29.29[/tex]

:wink:

You DO realize you can't get a fraction of a draw from a random number generator?
 
  • #12
Mensanator said:
You DO realize you can't get a fraction of a draw from a random number generator?

Why would that preclude a fractional answer?
 
  • #13
CRGreathouse said:
Why would that preclude a fractional answer?

I didn't say it would. But IF I don't have the formula at hand AND Wikipedia is not available, THEN I prefer my answer as there's nothing stopping me from obtaining it and it's good enough. The difference between 29.29 and 29.5 is just splitting hairs.
 
  • #14
I would not see it as exactly splitting hairs.

Euler has a formula for the harmonic series http://en.wikipedia.org/wiki/Harmonic_number:

[tex]In(n) +\gamma +1/2n -1/(12n^2)+ 1/(12n^4)- +-[/tex]

Letting gamma be .5772 and dropping terms past the square factor gives about 29.29 after multiplication by 10.

It is rather useful to get a handle on the degree of error. I prefer such a calculation to a computer devised answer that does not tell us about the error involved.
 
Last edited by a moderator:
  • #15
robert Ihnot said:
I would not see it as exactly splitting hairs.

Euler has a formula for the harmonic series http://en.wikipedia.org/wiki/Harmonic_number:

[tex]In(n) +\gamma +1/2n -1/(12n^2)+ 1/(12n^4)- +-[/tex]

Letting gamma be .5772 and dropping terms past the square factor gives about 29.29 after multiplication by 10.

It is rather useful to get a handle on the degree of error. I prefer such a calculation to a computer devised answer that does not tell us about the error involved.

Like I said, that's fine IF you know Euler's formula. A computer devised answer is preferable to no answer.
 
Last edited by a moderator:
  • #16
My gosh! I must add that my TI-86 can sum up the 50 case IN FOUR SECONDS! giving 224.96.

As far as this 10 case, why plain arithmetic will do here: 1+.5+.3333+.2500 + .2000+.1667+.1429+.1250 +.1111+.1000 = 2.9290.

2.929x10 = 29.290. This can be summed in about 30 seconds. If very careful probably around 45 seconds.
 

Similar threads

  • · Replies 11 ·
Replies
11
Views
4K
  • · Replies 29 ·
Replies
29
Views
6K
  • · Replies 3 ·
Replies
3
Views
5K
Replies
4
Views
2K
  • · Replies 6 ·
Replies
6
Views
2K
  • · Replies 45 ·
2
Replies
45
Views
17K
  • · Replies 3 ·
Replies
3
Views
4K
  • · Replies 2 ·
Replies
2
Views
3K
  • · Replies 7 ·
Replies
7
Views
2K
  • · Replies 28 ·
Replies
28
Views
3K