# What is the probability exactly 3 out of N people share the same birthday?

1. Jan 14, 2013

### Ryker

1. The problem statement, all variables and given/known data
What is the probability that in a class of N people you will be able to find a trio that shares the same birthday? You only need to find one such trio, and the rest can have whatever birthday they want, as long as it's not the same as that of the chosen trio. So it could be that we have a trio and everyone else has the same birthday or that we have multiple trios. The only thing that matter is that there is one such trio.

3. The attempt at a solution
${365\choose 1}{N\choose 3}\frac{1}{365^{3}}\frac{364^{N-3}}{365^{N-3}}$

I don't see where I could've gone wrong, but Mathworld says that the probability of at least 3 people sharing a birthday tops 50% at 88. But with my formula I get a lower number than that despite the fact that what I'm asking is a subset of that. So my probability for the same number N should be lower, thereby requiring a bigger class for the probability to exceed 50%.

What am I doing wrong?

2. Jan 14, 2013

### jbunniii

Why can't it be the same? If 4 people share the same birthday, then you certainly have (several different) trios that share that birthday.

3. Jan 14, 2013

### haruspex

I eventually figured out you meant a lower number than 88 for the 50% threshold.
Your mistake is that if there is more than one trio you will count each of them. You need the probability that there's at least one trio.

4. Jan 14, 2013

### Ryker

Well it can't be the same, because that's how the question was posed to me, I didn't make my impose my own arbitrary rules on it So like I explain below, we can't have 4 people sharing the same birthday, unless there's already exactly three other ones already sharing a different one.
Yeah, thinking about it, that's what I thought, as well. But how would one go about correcting that?
Well, OK, but that's not what the question asks. We want a trio, and while we allow for quartets and quintets and whatnot if there is a trio in the class, without the former having the latter doesn't satisfy the conditions.

Thanks, guys, and please you and anyone else comment further if you can help me!

5. Jan 15, 2013

### haruspex

Yes it is - at least one exact trio and maybe quartets etc. too. The OP says so here: "So it could be that we have a trio and ... or that we have multiple trios."
You can use the principle of inclusion and exclusion to eliminate double-counting of trios, but the formula might get messy.

6. Jan 15, 2013

### Ryker

Oh, sorry, I thought you meant the standard birthday problem statement of "at least three people having the same birthday", which is different from my particular case. So yeah, it's the probability of at least one trio, but not of at least three people having the same birthday. As for the principle of inclusion and exclusion, I'm not familiar with that and don't know how to even start on it. Any further comments on this perhaps?

7. Jan 15, 2013

### haruspex

8. Jan 15, 2013

### haruspex

To use inclusion-exclusion, you need to get an expression for the probability that a given set of r dates each occurs as a trio (regardless of what other dates do). If that is Pr, then inclusion-exclusion quickly gives you a series sum involving these to get the answer you're looking for. I believe I have found an expression for Pr, but I doubt the series can be summed analytically.
For 6 individuals and only 3 possible dates I get 155/243.

9. Jan 15, 2013

### Ray Vickson

Your quantity above is merely the first term in a (finite) series; when the other terms are included, the number needed to have a probability of ≥ 50% rises to 89. In fact, for n = 88 the probability of one or more trio is 0.48935 while for n = 89 it is 0.500044. Here is how I get these.

For i = 1,2,...,365, let D_i be the event that exactly 3 people have birthday i. Then, the inclusion-exclusion principle implies that the probability P≥1 = P{at least one event Di occurs} is
$$P_{\geq 1} = S_1 - S_2 + S_3 - S_4 + \cdots .$$
Here
$$S_1 = \sum_{i=1}^{365} P(D_i) = 356 P(D_1) \\ S_2 = \sum_{1 \leq i < j \leq 365} P(D_i D_j) = {365 \choose 2} P(D_1 D_2)\\ S_3 = \sum_{1 \leq i < j < k \leq 365} P(D_i D_j D_k) = {365 \choose 3} P(D_1 D_2 D_3)\\ \vdots$$
where we assume each of the 365 days is equally likely and where we use the notation $AB$ instead of $A \cap B$, $A B C$ instead of $A \cap B \cap C$, etc. You computed $S_1$ in your post #1.

The probability P(D_1) is given by the binomial distribution with parameters N, p = 1/365:
$$P(D_1) = {N \choose 3} p^3 (1-p)^{N-3}.$$
The probability P(D_1 D_2) is given by the trinomial distribution with parameters $N, p_1 = p_2 = p = 1/365, p_3 = 1 - 2p$:
$$P(D_1 D_2) = {N \choose 3, 3, N-6} p^6 (1-2p)^{N-6} = \frac{N!}{3!^2 \, (N-6)!} (1/365)^6 (363/365)^{N-6},$$.
Similarly,
$$P(D_1 D_2 \ldots D_k) = {N \choose 3,3,3, \ldots, 3, N - 3k} p^{3k} (1 - kp)^{N - 3k},$$
where the multinomial coefficient is
$${N \choose j_1, j_2, \ldots, j_r, N-j_1 - j_2 - \cdots - j_r} = \frac{N!}{j_1! j_2! \cdots j_r! (N - j_1 - j_2 - \cdots j_r)!}$$

Your computation was just S_1, but the other terms are important as well. While in principle the expression for P≥1 should include up to $\min(365, \lfloor N/3 \rfloor)$ terms, in practice it suffices to take about 8 terms because the successive Sk are decreasing and the series is alternating, so the error by stopping at Sj is less than Sj+1.

Last edited: Jan 15, 2013
10. Jan 18, 2013

### Ryker

Thanks a lot for that, Ray, and sorry for replying so late. Your post was really informative, so I appreciate the work you put into it. As for the solutions itself, we haven't dealt with multinomial coefficients and don't think ever will, so while I figured the general formula you listed would have to be applied (the alternating sum), I was stuck when trying to obtain the actual probabilities for multiple trios.

The weird thing is I actually now think the professor thinks the solution I posted in my original post is actually the correct one...

11. Jan 19, 2013

### Ray Vickson

You don't really need the multinomial, but using it makes things a bit easier (but only if you know the multinomial!). In fact, $P(D_1 D_2) = P(D_1) P(D_2|D_1)$. So, if $B(n,p) = C(n,3) p^3 (1-p)^{n-3}$ denotes the binomial probability of 3 successes in n trials with success probability p per trial we have
$$P(D_1) = B(N,1/365) \text{ and } P(D_2|D_1) = B(N-3,1/364),$$
because given that exactly 3 people were born on January 1, we now have (N-3) people left for the other 364 days of the year. Thus,
$$P(D_1 D_2 ) = B(N,1/365) B(N-3,1/364).$$
Similarly,
$$P(D_1 D_2 \cdots D_k) = B(N,1/365) B(N-3,1/364) B(N-6,1/363) \cdots B(N - 3(k-1),1/(365-k+1)).$$
If you expand these out in detail you will just get the multinomial formula, but of course you do not need to expand them out: you can just use them as-is. You can even do the calculations this way, if you have a "binomial calculator".

Now, as to your final sentence: the problem as stated needs to use inclusion-exclusion; for large N the numerical differences between just the first term and the complete series are important. I really do hope your instructor does not think the correct answer to the problem --- as stated --- is just your S_1, because that is patently not the case.

Please do not take my word for it; the material is found in all probability books, and the formulas I used are explained and proved in such classics as Feller, "Introduction to Probability Theory and Its Applications, Vol I"", Wiley 1968 and many, many others (although Feller's book is my very favourite probability book). If the problem really meant something different from what was written then, of course, all bets are off.

Last edited: Jan 19, 2013
12. Feb 5, 2013

### Ryker

Sorry for taking so long with the reply, Ray, but I wanted to wait until we get our homeworks back. Well, as it turns out, my instructor did and still does think that the answer I gave in the original post is the correct one. Luckily, after you helped me, I looked at what we've covered thus far, and anticipated that would be the case. So I actually just went with the "solution" from my original post and got full marks :uhh: Unfortunately, however, as the term progresses we're seeing that this mistake isn't the only one the instructor has made, and while I can laugh about the homework problem at hand (especially in light of knowing the actual answer now, thanks to you), I think this might lead to problems down the line...