Dismiss Notice
Join Physics Forums Today!
The friendliest, high quality science and math community on the planet! Everyone who loves science is here!

Help! I need the expected value for a tricky situation!

  1. Aug 26, 2011 #1
    I need your help. I’m really terrible at statistics, probability, permutations, and combinations. I’ve looked around a lot for the answer to this, but I haven’t seen anything just like it. I need the answer to this for some research that I’m doing.

    Here’s the situation:
    Imagine 60 bins or locations. Each bin is assigned one randomly-selected number, out of 256 possible numbers. It is possible that two or more bins can receive the same number.

    My question is this: out of those 60 bins, on average, how many bins contain different numbers? Or, to put it another way, on average, how many different numbers are contained in the 60 bins?

    Clearly, it’s possible that on some runs of this experiment, the 60 bins could all contain different numbers, so the count would be exactly 60. It’s also possible, although extremely unlikely, that all 60 bins could contain the same number, so the count would be 1.

    From doing computer simulations of this situation, I get that the average count is between 53 and 54, which sounds reasonable. I just don’t know how to calculate that theoretical average count.

    I also actually created simpler versions of this situation, with fewer bins and fewer numbers to assign to the bins. I then counted how many different numbers there were, and calculated the average. Here’s what I got:
    For 3 bins and 4 numbers, the average count is 2.3125.
    For 3 bins and 5 numbers, the average count is 2.44.
    For 3 bins and 6 numbers, the average count is 2.527778.
    For 3 bins and 7 numbers, the average count is 2.591837.
    For 3 bins and 8 numbers, the average count is 2.640625.

    So, a general formula for this situation should give these results.

    I will be very grateful if you can tell me what the answer is for 60 bins and 256 numbers, and even more grateful if you can show me the formula for calculating this.

    Thank you in advance!
  2. jcsd
  3. Aug 26, 2011 #2
    Last edited: Aug 26, 2011
  4. Aug 26, 2011 #3

    It sounds like Stirling numbers of the second kind are definitely related to what I need.

    (By the way, I love this line from the Wikipedia article: "Stirling numbers of the second kind is one of two types of Stirling numbers, the other type being called Stirling numbers of the first kind." :) )

    As I understand the description, the Stirling number tells me the total number of ways that 256 numbers can go into the 60 bins.

    What I need now is a way to count how many of those "ways" use 60 different numbers, how many use 59, then 58, and so forth.

  5. Aug 26, 2011 #4
    S(n,k) tells you the number of ways to partition a set of n objects into k non-empty (nondistinct) subsets. Multiply by k! to partition them into k non-empty distinct subsets (I forgot that second step - edited my earlier post).

    For example: how many ways are there to have 3 distinct numbers among 5 bins? Well let's ask a slightly easier question, how many ways are there to have the specific numbers 10, 11, and 12 among 5 bins (and no other numbers)? This is the number of ways you can distribute the 5 bins among the 3 distinct sets labeled 10, 11, and 12, such that each set gets at least 1 bin. Putting a bin in one of those sets means you assign the corresponding number to that bin. So it's 3! S(5,3) = 150

    But in that case we used the specific 3 numbers 10, 11, and 12, but they could really be any 3 numbers from the 60. So multiply by C(60,3) to find the number of ways to get exactly 3 numbers among 5 bins: 3! S(5,3) C(60,3) = 34220.
  6. Aug 26, 2011 #5
    Let [tex]X_i = 1[/tex] if number i is drawn at least once, = 0 otherwise, for i = 1, 2, 3, ..., 256.

    Then [tex]P(X_i = 1) = 1 - (1 - 1/256)^{60}[/tex]
    [tex]E(X_i) = 1 - (1 - 1/256)^{60}[/tex]

    So the expected number of distinct numbers drawn is
    [tex]E(\sum_{i=1}^{256} X_i) = \sum_{i=1}^{256} E(X_i) = 256 \; [1 - (1 - 1/256)^{60}][/tex]

    Here we have made use of the theorem that E(X+Y) = E(X) + E(Y). It's important to realize that this theorem holds even if X and Y are not independent, which is good for us in this case because the X_i's are not independent.
  7. Aug 26, 2011 #6
    That also works... you only need the Stirling numbers if you want an explicit distribution.
  8. Aug 27, 2011 #7
    awkward, that is it!

    That gives a value of 53.5802554, which is right in the range that my simulation suggests!

    Thank you so much for the formula and the explanation!
Share this great discussion with others via Reddit, Google+, Twitter, or Facebook