How Many DNA Molecules to Sample for Sufficient Unique Sequences?

1eray · Aug 29, 2011

I have 2x10^12 unique sequences of DNA, and I have an average of 47 copies of each sequence (so 94x10^12 DNA molecules total).
How many molecules do I need to choose at random to be "confident" (defined as you please) that I have at least 10^10 unique molecules? 10^11? 10^12?
I would really like to know how to do this calculation.
Any help would be very appreciated.
Thanks,
Ed

bpet · Aug 31, 2011

If the number of copies of each is not exactly 47 then the answer could vary wildly (consider the case with 1 copy of all but one, and copies of a single one making up the rest).

First step would be to show that the number required is somewhere between the number required when there is 1 copy of all (trivial) and the number required when there are infinitely many copies (which is a version of the coupon collector's problem).

To solve the latter problem you'd use the same techniques as for the CCP but truncate the sums appropriately. In effect you're modelling how the number of distinct copies found increases (randomly) as you add one more to the sample. Then, for example, apply Chebyshev's inequality to the 1st and 2nd moments of the distribution as a function of the sample size which would give you a very conservative estimate of the minimum number required.

How Many DNA Molecules to Sample for Sufficient Unique Sequences?

Similar threads

Graduate Hypothesis testing: Defining H0, HA hypotheses so that ( H_A)_A' makes sense

Undergrad My basic understanding of set theory

Undergrad How do E[X] and E[|X|] relate?

Graduate Expected numbers of cards of a last color remaining

Undergrad The problem of points

Insights Remote Operated Gate Control System

Insights AI Enriched Problem Solving

Insights Thinking Outside The Box Versus Knowing What’s In The Box

Insights Why Entangled Photon-Polarization Qubits Violate Bell’s Inequality

Insights Quantum Entanglement is a Kinematic Fact, not a Dynamical Effect

Insights What Exactly is Dirac’s Delta Function? - Insight