How Many DNA Molecules to Sample for Sufficient Unique Sequences?

1eray
I have 2x10^12 unique sequences of DNA, and I have an average of 47 copies of each sequence (so 94x10^12 DNA molecules total).
How many molecules do I need to choose at random to be "confident" (defined as you please) that I have at least 10^10 unique sequences? 10^11? 10^12?
I would really like to know how to do this calculation.
Any help would be very appreciated.
Thanks,
Ed
 
If the number of copies of each sequence is not exactly 47, the answer could vary wildly (consider the extreme case with 1 copy of every sequence but one, and copies of that single sequence making up the rest).

A first step would be to show that the number required is somewhere between the number required when there is exactly 1 copy of each sequence (trivial, since every molecule drawn is then a new sequence) and the number required when there are infinitely many copies of each (which is a version of the coupon collector's problem).
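To get a feel for those two limits, here is a rough sketch comparing the expected number of distinct sequences in a sample of n molecules under three models: 1 copy of each, exactly c copies of each (drawn without replacement), and infinitely many copies of each (every draw uniform over sequences). The function names are made up for illustration, and the "exactly c copies" model is an assumption, since the original post only gives an average of 47.

```python
import math

def expected_distinct_infinite(n_types, n_draws):
    """E[distinct sequences] when each draw is uniform over n_types
    (the infinitely-many-copies limit)."""
    # P(a given sequence is seen) = 1 - (1 - 1/N)^n, computed stably.
    p_seen = -math.expm1(n_draws * math.log1p(-1.0 / n_types))
    return n_types * p_seen

def expected_distinct_finite(n_types, copies, n_draws):
    """E[distinct sequences] when drawing n_draws molecules without
    replacement from a pool with exactly `copies` of each sequence
    (an assumed, idealised pool)."""
    total = n_types * copies
    if n_draws > total:
        raise ValueError("cannot draw more molecules than the pool holds")
    # P(a given sequence is missed) = C(total - copies, n) / C(total, n)
    #                               = prod_{j < copies} (1 - n / (total - j))
    log_p_missed = 0.0
    for j in range(copies):
        if n_draws >= total - j:
            return float(n_types)  # too few molecules left to miss any sequence
        log_p_missed += math.log1p(-n_draws / (total - j))
    return n_types * -math.expm1(log_p_missed)

if __name__ == "__main__":
    N, c = 2 * 10**12, 47
    for n in (10**10, 10**11, 10**12):
        print(n,
              expected_distinct_finite(N, 1, n),   # 1 copy of each
              expected_distinct_finite(N, c, n),   # exactly 47 copies of each
              expected_distinct_infinite(N, n))    # infinitely many copies
```

With 1 copy of each, the expected count is just n. The other two columns show how much is lost to repeats as the sample grows.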

To solve the infinite-copy problem you'd use the same techniques as for the coupon collector's problem (CCP), but truncate the sums appropriately. In effect you're modelling how the number of distinct sequences found increases (randomly) as you add one more molecule to the sample. Then, for example, apply Chebyshev's inequality to the 1st and 2nd moments of that distribution as a function of the sample size, which would give you a very conservative estimate of the minimum number required.
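A minimal sketch of that Chebyshev approach for the infinite-copy limit follows. Two things here are assumptions and not from the post above: "confident" is taken to mean P(fewer than the target number of distinct sequences) <= delta, and instead of the exact second moment the code uses the upper bound Var[D] <= N p (1 - p), where p = 1 - (1 - 1/N)^n. That bound holds because the "sequence i was seen" indicators are negatively correlated, and it avoids subtracting huge nearly equal numbers in floating point. The function names are hypothetical.

```python
import math

def seen_probability(n_types, n_draws):
    """P(a given sequence appears at least once in n_draws uniform draws)."""
    return -math.expm1(n_draws * math.log1p(-1.0 / n_types))

def chebyshev_failure_bound(n_types, n_draws, target):
    """Chebyshev upper bound on P(distinct sequences < target).
    Returns 1.0 (no information) while the mean does not exceed target."""
    p = seen_probability(n_types, n_draws)
    mean = n_types * p
    if mean <= target:
        return 1.0
    var_bound = n_types * p * (1.0 - p)  # upper bound on the true variance
    return min(1.0, var_bound / (mean - target) ** 2)

def chebyshev_sample_size(n_types, target, delta):
    """Smallest n whose Chebyshev failure bound is <= delta."""
    if not 0 < target < n_types:
        raise ValueError("target must be between 1 and n_types - 1")
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must be strictly between 0 and 1")
    lo, hi = target, 2 * target  # n = target always fails: mean <= n
    while chebyshev_failure_bound(n_types, hi, target) > delta:
        lo, hi = hi, 2 * hi
    while hi - lo > 1:  # bisection; the bound is non-increasing in n
        mid = (lo + hi) // 2
        if chebyshev_failure_bound(n_types, mid, target) <= delta:
            hi = mid
        else:
            lo = mid
    return hi

if __name__ == "__main__":
    N = 2 * 10**12
    for target in (10**10, 10**11, 10**12):
        print(target, chebyshev_sample_size(N, target, delta=0.01))
```

Because this is the infinite-copy limit, the sample sizes it prints are on the safe side for a pool with a finite, equal number of copies of each sequence. They say nothing about a badly skewed pool.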
 