Dismiss Notice
Join Physics Forums Today!
The friendliest, high quality science and math community on the planet! Everyone who loves science is here!

Generalized Birthday Problem

  1. Aug 5, 2013 #1
    There is a lot of information on the web about how to calculate the probability that, in an arbitrarily-sized group, 2 people will share a birthday. However, I am trying to determine the probability that a larger number of people are born on a specific day (e.g., a group of people have a birthday today, as opposed to "sharing" a birthday any time during the year). I can't find anything on this point.

    I am trying to address this problem in the context of the following example: Assuming a group of 800 people, what are the odds that 10 people in that group have a birthday today?

    The probability that any particular one of those 800 people has a birthday today is 1/365. The probability that any particular one does not have a birthday today is 364/365. We want exactly ten people to have a birthday today, and the probability of any 10 having a birthday today and any 790 not having a birthday today is ((1/365)^10)*((364/365)^790). However, this needs to be multiplied by 800!/10!790!, because that is the number of combinations meeting the specified criteria. Does this reasoning make sense?

    Many thanks.
     
  2. jcsd
  3. Aug 5, 2013 #2

    mathman

    User Avatar
    Science Advisor
    Gold Member

    Yes - it is a direct application of binomial distribution.
     
  4. Aug 5, 2013 #3
    Great. Thank you. There are two additional cases I would like to explore: first, if the number of people having a birthday today (let than number = n) is defined as n >=10; second, if the condition for success is not merely that n people's birthday is today, but, rather, is that n people have the same birthday at any time during the year (ignoring leap years).

    In order to solve for the first case, would it be necessary to express the probability for each n? [i.e., ((1/365)^10)*((364/365)^790)*(800!/(10!790!)) + ... + ((1/365)^800)*((364/365)^0)*(800!/800!)]

    In order to solve for the second case, it seems like the probability increases by a factor of 365. This is because the probability of any one person having a birthday on any particular day of the year is (365/365)^1. For n=10, this needs to be multiplied by the probability of 9 people having the same birthday and 790 people not having such birthday. This is no different from the expression I initially used, so the entire expression for this second case is: ((365/365)^1)*((1/365)^9)*((364/365)^790)*(800!/10!790!). I can't find anything wrong with this, but I have a gut feeling that it doesn't work.
     
  5. Aug 6, 2013 #4

    mathman

    User Avatar
    Science Advisor
    Gold Member

    You have to be a little careful for the second case. You may have more than one occurrence of 10 people having birthdays on the same day.
     
  6. Aug 14, 2013 #5
    Excellent point. However, accounting for all of the cases in which groups of n=10 people share the same birthday produces a lengthy expression. First, you must account for the case I described above. Then you need to account for a case in which there are two groups, with each group composed of 10 people sharing the same birthday. So on and so forth for all possible groups of ten people. As there are 800 people, the most extreme case is one in which all 800 people are distributed amongst 80 groups of 10 people who share the same birthday. Each individual case (e.g., 2 groups of 10 sharing a birthday) is conjunctive. But the total probability of any one of these cases occurring is disjunctive. So it is necessary to determine the individual probability of each such case and then sum such individual probabilities. An expression for this sum would take the form (A) + (B) + ... + (Z), where (A) = ((365/365)^1)*((1/365)^9)*((364/365)^790)*(800!/10!790!); (B) = ((365/365)^1)*((1/365)^9)*((364/365)^1)*((1/365)^9)*((363/365)^780)*(800!/10!10!780!); and (Z) = ((365/365)^1)*((1/365)^9)*((364/365)^1)*((1/365)^9)*...*((286/365)^1)*((1/365)^9)*(800!/10!10!...10!514!).

    This expression is quite unwieldy (and possibly also incorrect-I'm not entirely confident in it). And even though it could probably be stated more elegantly using sigma notation, the actual calculations required are substantial. Additionally, if one wanted to further generalize the equation by permitting n>=10 (rather than n=10) in the case for which the above expression applies, an even more complicated expression would be required.

    Is there a "trick" for dealing with this type of counting problem, or is it really necessary to approach it through a lengthy expression as I have.

    Many thanks.
     
  7. Aug 14, 2013 #6

    mathman

    User Avatar
    Science Advisor
    Gold Member

    I haven't tried it, but I don't think there is any trick.
     
  8. Aug 14, 2013 #7
    Do you see the potential for the application of the poisson distribution? I have no particular way to do so, but it was suggested to me as a potential simplifying solution.
     
  9. Aug 15, 2013 #8

    mathman

    User Avatar
    Science Advisor
    Gold Member

    Poisson distribution is a good approximation to binomial when the mean is much smaller than the total number. I am not sure how it would apply here.
     
  10. Aug 24, 2013 #9

    haruspex

    User Avatar
    Science Advisor
    Homework Helper
    Gold Member
    2016 Award

    The Poisson distribution applies when you aggregate trials, each of which has a very small chance of success, into a batch large enough that there is a reasonable chance of some successes. In the present context, if you take a batch of N people, each having prob p of a birthday today, the prob that k do is therefore approx (Np)k e-Np/k!.
    An alternative is to observe that the binomial distribution has mean Np and variance Np(1-p) and approximate it with a normal distribution with the same parameters.

    But you really want the prob that some r have the same birthday. So imagine tossing N balls into 365 buckets. The count in the ith bucket is some random variable Xi. If the cdf of Xi is F(x), and we pretend they're independent, the prob that no bucket has r or more is Fr(x). I think that taking them as independent will tend to overestimate the prob of high counts.
     
Know someone interested in this topic? Share this thread via Reddit, Google+, Twitter, or Facebook




Similar Discussions: Generalized Birthday Problem
  1. Birthday Problem (Replies: 2)

Loading...