Dismiss Notice
Join Physics Forums Today!
The friendliest, high quality science and math community on the planet! Everyone who loves science is here!

Question of finding the exepcted value of repeated numbers

  1. Jul 10, 2011 #1


    User Avatar

    Hi there,
    I am thinking a question of finding the expectation of duplicate numbers (how many time any pair could be made in N rows each contains one number). I start from 1000 numbers each randomly picked ranged 1 to 100. I sort the randomized 1000 numbers, count the total number of "1" called s1, count the total number of "2" called s2 and up to "100" (s100), obviously , for s1 "1", the way to combine any a pair of "1" is [tex]C_{s1}^2 = \displaystyle\frac{s1(s1-1)}{2}[/tex]. Similarly, the way for have a pair for all S2 "2" is [tex]C_{s2}^2 = \displaystyle\frac{s2(s2-1)}{2}[/tex] ....

    so the total duplicates is just [tex]\sum_i \frac{S_i(S_i-1)}{2}[/tex]

    However, we assume each number appears at equal probability such that in ideal case in the 1000 numbers, there will be 10 x "1", 10 x "2", 10 x "3" ... 10 x "100". Or we can say for any number 1 2 3 ... 100, the duplicate number will be 10 and which can be used to make

    [tex]S = 10 \times (10-1)/2 [/tex], so total expectation should be [tex]TOTAL EV = 100\times S = \frac{100 \times 10 \times (10-1)}{2} = 4500[/tex]

    But someone told me the expectation should be 4995. And I setup a simulation to simulate the above random process for 10000000 and calculate the average. I find that the average expectation number is close to 5000 (about 4997.8). So my logic in calculating the expectation is wrong. Would anyone please show me the right way to calculate the expectation for the question? Thanks.
  2. jcsd
  3. Jul 10, 2011 #2
    Ok so you've got random variables X1,...,XN uniformly distributed on {1,...,100} and if M is the number of matching pairs you want to find E[M]. Set Iij=1 if Xi=Xj and 0 otherwise, so M=sum_{1<=i<j<=N}Iij and thus E[M]=sum(E[Iij])=(NC2).E[Iij]. NC2=N(N-1)/2 and E[Iij]=1/100 so E[M]=4995. The other method fails because the distribution of the number of occurrences of each number is not independent.
Share this great discussion with others via Reddit, Google+, Twitter, or Facebook