MHB Calculating the probability that a random subset of a population contains duplicates

  • Thread starter: mads1
  • Tags: population, random
AI Thread Summary
The discussion centers on calculating the expected number of duplicates in a sample drawn from a larger population, specifically in the context of biological data. The user seeks to understand how the population size and sample size affect the likelihood of duplicates. A suggestion is made to explore the hypergeometric distribution for this calculation. Clarification is requested regarding the definition of "duplicate," particularly whether it refers to sampling with or without replacement. Understanding these parameters is crucial for accurately determining the expected rate of duplicates in the samples.
mads1
Hi,

Apologies, this is a basic question, but I have to start somewhere! (-:

The problem is stated succinctly in the title but, in greater detail: I'm working with some biological data from which samples have been taken. The sampling should have been random, and the samples include duplicates. What I need to know is how to calculate the expected number of duplicates in a sample of a given size drawn from a population of a given size.

For example, if I have a population of size p = 3 million and take s = 3 million samples, I would expect more duplicates among the samples than if I took s = 300 thousand samples.
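A worked version of this example, under two assumptions the thread does not settle: a "duplicate" is a draw that repeats an individual already in the sample, and each of the s draws picks one of the p individuals uniformly at random with replacement. A given individual is missed by all s draws with probability (1 - 1/p)^s, so:

```latex
\begin{aligned}
\Pr(\text{a given individual is never drawn}) &= \left(1 - \tfrac{1}{p}\right)^{s} \approx e^{-s/p},\\
\mathbb{E}[\text{distinct individuals}] &= p\left[1 - \left(1 - \tfrac{1}{p}\right)^{s}\right],\\
\mathbb{E}[\text{duplicate draws}] &= s - p\left[1 - \left(1 - \tfrac{1}{p}\right)^{s}\right].
\end{aligned}
```

With p = 3 million and s = 3 million (s/p = 1), the expected number of duplicate draws is about p e^{-1}, roughly 1.10 million, or about 36.8% of the sample. With s = 300 thousand (s/p = 0.1), it is about 300,000 - 3,000,000 (1 - e^{-0.1}), roughly 14,500, or about 4.8% of the sample. Under these assumptions, the larger sample has proportionally far more duplicates, as the post expects.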

But how do I calculate the expected rate for various values of p and s?
I have access to R & should be able to find my way to any libraries that might help with this. Thanks!

m
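A minimal Python sketch of the expected count for arbitrary p and s. The thread mentions R; the same formulas carry over directly. It assumes uniform sampling with replacement and treats a "duplicate" as a draw that repeats an already-sampled individual. It also reports the expected number of coinciding pairs of draws, C(s, 2)/p, as an alternative way to count duplicates. All function names are hypothetical.

```python
import math


def expected_distinct(p: int, s: int) -> float:
    """Expected number of distinct individuals among s draws made uniformly
    at random, with replacement, from a population of p individuals."""
    if p == 1:
        # Only one individual exists: it is seen once s >= 1.
        return 1.0 if s > 0 else 0.0
    # P(a given individual is never drawn) = (1 - 1/p)**s.
    # log1p/expm1 keep 1 - (1 - 1/p)**s accurate even for very large p.
    return -p * math.expm1(s * math.log1p(-1.0 / p))


def expected_duplicates(p: int, s: int) -> float:
    """Expected number of draws that repeat an individual already drawn
    (assumption: this is what "duplicate" means)."""
    return s - expected_distinct(p, s)


def expected_colliding_pairs(p: int, s: int) -> float:
    """Expected number of pairs of draws that hit the same individual:
    each of the C(s, 2) pairs coincides with probability 1/p."""
    return math.comb(s, 2) / p


if __name__ == "__main__":
    p = 3_000_000
    for s in (3_000_000, 300_000):
        dup = expected_duplicates(p, s)
        print(f"p={p:,} s={s:,}: expected duplicate draws ~ {dup:,.0f} ({dup / s:.1%})")
```

For p = 3 million, this should report roughly 1.10 million expected duplicate draws (36.8%) for s = 3 million, and roughly 14,500 (4.8%) for s = 300,000. `math.log1p` and `math.expm1` are used instead of computing (1 - 1/p)**s directly so that the result stays accurate if p is much larger than in this example.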
 
If I understand the problem correctly, then I think you should take a look at the hypergeometric distribution (use your preferred search engine).
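Here is one way the hypergeometric suggestion could apply. This is an assumption, since the thread does not pin down the setting. Suppose two separate samples of sizes n1 and n2 are each drawn without replacement from the same p individuals. Then the number K of individuals that appear in both samples is hypergeometric: P(K = k) = C(n1, k) C(p - n1, n2 - k) / C(p, n2), with mean n1·n2/p. Below is a minimal Python sketch with hypothetical function names. In R, the built-in `dhyper(k, n1, p - n1, n2)` should give the same probabilities.

```python
import math


def log_comb(n: int, k: int) -> float:
    """log of the binomial coefficient C(n, k), via lgamma to avoid huge integers."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def overlap_pmf(k: int, p: int, n1: int, n2: int) -> float:
    """Hypergeometric P(K = k), where K is the number of individuals that appear
    in both of two samples of sizes n1 and n2, each drawn without replacement
    from the same population of p individuals."""
    if k < max(0, n1 + n2 - p) or k > min(n1, n2):
        return 0.0  # k outside the support of the distribution
    return math.exp(log_comb(n1, k) + log_comb(p - n1, n2 - k) - log_comb(p, n2))


def overlap_mean(p: int, n1: int, n2: int) -> float:
    """Mean of the hypergeometric overlap: n1 * n2 / p."""
    return n1 * n2 / p


if __name__ == "__main__":
    p, n1, n2 = 3_000_000, 300_000, 300_000
    print(f"expected overlap ~ {overlap_mean(p, n1, n2):,.0f}")
    print(f"P(K = 30,000) ~ {overlap_pmf(30_000, p, n1, n2):.2e}")
```

With p = 3 million and two samples of 300,000 each, the expected overlap is 30,000 individuals. Working in log space with `math.lgamma` avoids forming binomial coefficients with hundreds of thousands of digits.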
 
Hi Mads,

What do you mean by a "duplicate"? Do you mean it's like you caught a fish, threw it back into the lake, and then caught the same fish again? Or is it like catching another fish of the same species? And to pursue the fishing analogy further, do you return the fish to the lake ("sampling with replacement"), or do you keep it ("sampling without replacement")?
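This small Python simulation shows why the with/without replacement question matters for the "same fish" reading of "duplicate". It uses hypothetical names and a toy population so that it runs quickly. Throwing each fish back (`random.choices`) produces repeat catches. Keeping each fish (`random.sample`) can never catch the same individual twice, so in that case duplicates would have to mean something else, such as the same species. The species reading is not modelled here.

```python
import random


def count_repeat_draws(draws) -> int:
    """Number of draws that repeat an individual caught earlier in the same sample."""
    return len(draws) - len(set(draws))


def mean_repeats(p: int, s: int, with_replacement: bool,
                 trials: int = 2000, seed: int = 0) -> float:
    """Average number of repeat draws over many simulated samples of size s
    from a population of p individuals labelled 0..p-1."""
    rng = random.Random(seed)
    population = range(p)
    total = 0
    for _ in range(trials):
        if with_replacement:
            # Fish are thrown back, so the same fish can be caught again.
            draws = rng.choices(population, k=s)
        else:
            # Fish are kept, so each fish is caught at most once (requires s <= p).
            draws = rng.sample(population, s)
        total += count_repeat_draws(draws)
    return total / trials


if __name__ == "__main__":
    p, s = 100, 50
    formula = s - p * (1 - (1 - 1 / p) ** s)
    print("with replacement:   ", mean_repeats(p, s, with_replacement=True))
    print("formula:            ", round(formula, 3))
    print("without replacement:", mean_repeats(p, s, with_replacement=False))
```

With p = 100 and s = 50, the with-replacement average should sit close to the formula value 50 - 100(1 - 0.99^50), about 10.5. The without-replacement count is always 0.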
 