Question of finding the exepcted value of repeated numbers

  • Context: Graduate 
  • Thread starter Thread starter KFC
  • Start date Start date
  • Tags Tags
    Numbers Value
Click For Summary
SUMMARY

This discussion centers on calculating the expected value of duplicate numbers in a set of 1000 randomly selected integers ranging from 1 to 100. The initial calculation suggested an expected value of 4500 based on the assumption of equal distribution, but simulations indicated an average closer to 4997.8. The correct approach involves using random variables and the concept of indicator variables to derive the expected number of matching pairs, leading to the conclusion that E[M] equals 4995. This highlights the importance of understanding the dependencies in the distribution of occurrences.

PREREQUISITES
  • Understanding of combinatorial mathematics, specifically combinations (C_{n}^{k})
  • Familiarity with probability theory and expected value calculations
  • Knowledge of random variables and their distributions
  • Experience with statistical simulations and averaging techniques
NEXT STEPS
  • Study the concept of indicator random variables in probability theory
  • Learn about combinatorial probability and its applications in statistics
  • Explore advanced statistical simulation techniques using Python or R
  • Investigate the implications of dependent versus independent distributions in probability
USEFUL FOR

Statisticians, data scientists, mathematicians, and anyone interested in probability theory and statistical simulations.

KFC
Messages
477
Reaction score
4
Hi there,
I am thinking a question of finding the expectation of duplicate numbers (how many time any pair could be made in N rows each contains one number). I start from 1000 numbers each randomly picked ranged 1 to 100. I sort the randomized 1000 numbers, count the total number of "1" called s1, count the total number of "2" called s2 and up to "100" (s100), obviously , for s1 "1", the way to combine any a pair of "1" is [tex]C_{s1}^2 = \displaystyle\frac{s1(s1-1)}{2}[/tex]. Similarly, the way for have a pair for all S2 "2" is [tex]C_{s2}^2 = \displaystyle\frac{s2(s2-1)}{2}[/tex] ...

so the total duplicates is just [tex]\sum_i \frac{S_i(S_i-1)}{2}[/tex]

However, we assume each number appears at equal probability such that in ideal case in the 1000 numbers, there will be 10 x "1", 10 x "2", 10 x "3" ... 10 x "100". Or we can say for any number 1 2 3 ... 100, the duplicate number will be 10 and which can be used to make

[tex]S = 10 \times (10-1)/2[/tex], so total expectation should be [tex]TOTAL EV = 100\times S = \frac{100 \times 10 \times (10-1)}{2} = 4500[/tex]

But someone told me the expectation should be 4995. And I setup a simulation to simulate the above random process for 10000000 and calculate the average. I find that the average expectation number is close to 5000 (about 4997.8). So my logic in calculating the expectation is wrong. Would anyone please show me the right way to calculate the expectation for the question? Thanks.
 
Physics news on Phys.org
Ok so you've got random variables X1,...,XN uniformly distributed on {1,...,100} and if M is the number of matching pairs you want to find E[M]. Set Iij=1 if Xi=Xj and 0 otherwise, so M=sum_{1<=i<j<=N}Iij and thus E[M]=sum(E[Iij])=(NC2).E[Iij]. NC2=N(N-1)/2 and E[Iij]=1/100 so E[M]=4995. The other method fails because the distribution of the number of occurrences of each number is not independent.
 

Similar threads

  • · Replies 3 ·
Replies
3
Views
879
  • · Replies 6 ·
Replies
6
Views
2K
  • · Replies 18 ·
Replies
18
Views
4K
  • · Replies 7 ·
Replies
7
Views
3K
  • · Replies 5 ·
Replies
5
Views
2K
  • · Replies 1 ·
Replies
1
Views
1K
  • · Replies 2 ·
Replies
2
Views
2K
  • · Replies 1 ·
Replies
1
Views
3K
  • · Replies 8 ·
Replies
8
Views
3K
  • · Replies 1 ·
Replies
1
Views
2K