- #1

KFC

- 488

- 4

I am thinking a question of finding the expectation of duplicate numbers (how many time any pair could be made in N rows each contains one number). I start from 1000 numbers each randomly picked ranged 1 to 100. I sort the randomized 1000 numbers, count the total number of "1" called s1, count the total number of "2" called s2 and up to "100" (s100), obviously , for s1 "1", the way to combine any a pair of "1" is [tex]C_{s1}^2 = \displaystyle\frac{s1(s1-1)}{2}[/tex]. Similarly, the way for have a pair for all S2 "2" is [tex]C_{s2}^2 = \displaystyle\frac{s2(s2-1)}{2}[/tex] ...

so the total duplicates is just [tex]\sum_i \frac{S_i(S_i-1)}{2}[/tex]

However, we assume each number appears at equal probability such that in ideal case in the 1000 numbers, there will be 10 x "1", 10 x "2", 10 x "3" ... 10 x "100". Or we can say for any number 1 2 3 ... 100, the duplicate number will be 10 and which can be used to make

[tex]S = 10 \times (10-1)/2 [/tex], so total expectation should be [tex]TOTAL EV = 100\times S = \frac{100 \times 10 \times (10-1)}{2} = 4500[/tex]

But someone told me the expectation should be 4995. And I setup a simulation to simulate the above random process for 10000000 and calculate the average. I find that the average expectation number is close to 5000 (about 4997.8). So my logic in calculating the expectation is wrong. Would anyone please show me the right way to calculate the expectation for the question? Thanks.