How Many DNA Molecules to Sample for Sufficient Unique Sequences?

In summary, to be "confident" that you have at least 10^10 unique sequences, you need to draw at least 10^10 molecules at random, since each molecule can contribute at most one new sequence. How many more are needed depends on how the copies are distributed among the sequences, and could vary greatly. The problem can be approached with techniques similar to those for the coupon collector's problem, adjusted for this scenario. Applying Chebyshev's inequality to the first and second moments of the number of distinct sequences found would give a conservative estimate of the minimum number required.
  • #1
1eray
I have 2x10^12 unique sequences of DNA, and I have an average of 47 copies of each sequence (so 94x10^12 DNA molecules total).
How many molecules do I need to choose at random to be "confident" (defined as you please) that I have at least 10^10 unique sequences? What about 10^11, or 10^12?
I would really like to know how to do this calculation.
Any help would be very appreciated.
Thanks,
Ed
 
  • #2
If the number of copies of each sequence is not exactly 47, then the answer could vary wildly (consider the case with one copy of every sequence but one, with copies of that single sequence making up the rest).
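To make that sensitivity concrete, here is a rough Python sketch comparing the expected number of distinct sequences in a sample of n molecules for two pools with the same totals: one with exactly 47 copies of every sequence, and the extreme pool described above. The function names are made up for illustration, and only expected values are computed, not confidence levels.

```python
import math

N_SEQ = 2 * 10**12        # unique sequences (from the original question)
COPIES = 47               # average copies per sequence
TOTAL = N_SEQ * COPIES    # molecules in the pool

def expected_distinct_equal(n, n_seq=N_SEQ, copies=COPIES):
    """Expected distinct sequences when every sequence has exactly `copies`
    copies and n molecules are drawn without replacement."""
    total = n_seq * copies
    # P(a given sequence is missed) = C(total - copies, n) / C(total, n)
    #                               = prod_{j < copies} (1 - n / (total - j))
    log_miss = sum(math.log1p(-n / (total - j)) for j in range(copies))
    return -n_seq * math.expm1(log_miss)

def expected_distinct_skewed(n, n_seq=N_SEQ, total=TOTAL):
    """Expected distinct sequences for the extreme pool: one copy of every
    sequence except one, which supplies all the remaining molecules."""
    singles = n_seq - 1
    # Each single-copy sequence is in the sample with probability n / total.
    # The abundant sequence adds at most 1, so this slightly over-estimates.
    return singles * n / total + 1.0

n = 10**10
print(expected_distinct_equal(n), expected_distinct_skewed(n))
```

With n = 10^10, the equal-copies pool gives an expected count just under 10^10, while the skewed pool gives only about n/47, roughly 2 x 10^8, so the required sample size differs by well over an order of magnitude between the two.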

The first step would be to show that the number required lies somewhere between the number required when there is one copy of each sequence (trivial) and the number required when there are infinitely many copies of each (which is a version of the coupon collector's problem).
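As a rough illustration of that bracketing, the sketch below computes both ends for the targets in the question. It assumes all sequences are equally abundant, and for the infinite-copies end it only finds the sample size at which the expected number of distinct sequences reaches the target, which is not yet a confidence statement. The function name is hypothetical.

```python
import math

def draws_for_mean_with_replacement(k, n_seq):
    """Number of draws n at which the expected count of distinct sequences
    reaches k, when every draw is uniform over n_seq sequences
    (the infinitely-many-copies model)."""
    if not 0 < k < n_seq:
        raise ValueError("need 0 < k < n_seq")
    # E[D] = n_seq * (1 - (1 - 1/n_seq)**n); solve E[D] = k for n.
    return math.log1p(-k / n_seq) / math.log1p(-1 / n_seq)

N_SEQ = 2 * 10**12  # unique sequences, from the original question
for k in (10**10, 10**11, 10**12):
    lower = k  # one copy of each sequence: every molecule drawn is distinct
    upper = draws_for_mean_with_replacement(k, N_SEQ)
    print(f"target {k:.0e}: between {lower:.3e} and about {upper:.3e} draws")
```

For the largest target, 10^12 (half of all sequences), the infinite-copies figure is 2 x 10^12 x ln 2, or about 1.39 x 10^12 draws. For the smaller targets the two ends are much closer together.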

To solve the latter problem you'd use the same techniques as for the coupon collector's problem (CCP), but truncate the sums appropriately. In effect you're modelling how the number of distinct sequences found increases (randomly) as you add one more molecule to the sample. Then, for example, apply Chebyshev's inequality to the first and second moments of that distribution as a function of the sample size, which would give you a very conservative estimate of the minimum number required.
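A minimal sketch of the Chebyshev step, under stated assumptions: sequences are equally abundant, draws are made with replacement (the infinite-copies model), and the exact second moment is replaced by the deliberately loose bound Var[D] <= E[D], where D is the number of distinct sequences seen. That bound holds because the "sequence i not yet seen" indicators are negatively correlated. The function names and the choice delta = 0.01 are illustrative, not something taken from the thread.

```python
import math

def mean_distinct(n, n_seq):
    """E[D]: expected distinct sequences after n uniform draws with replacement."""
    return -n_seq * math.expm1(n * math.log1p(-1 / n_seq))

def chebyshev_min_draws(k, n_seq, delta):
    """Smallest n for which Chebyshev's inequality guarantees
    P(D < k) <= delta, using the crude bound Var[D] <= E[D]."""
    if not 0 < k < n_seq:
        raise ValueError("need 0 < k < n_seq")
    if n_seq >= delta * (n_seq - k) ** 2:
        raise ValueError("bound is too loose to reach this target")

    def ok(n):
        m = mean_distinct(n, n_seq)
        # P(D < k) <= P(|D - m| >= m - k) <= Var[D] / (m - k)^2 <= m / (m - k)^2
        return m > k and m <= delta * (m - k) ** 2

    lo, hi = k, 2 * k          # ok(k) is always False because D <= n
    while not ok(hi):          # grow the bracket until the bound is met
        lo, hi = hi, 2 * hi
    while hi - lo > 1:         # bisect: ok(lo) is False, ok(hi) is True
        mid = (lo + hi) // 2
        if ok(mid):
            hi = mid
        else:
            lo = mid
    return hi

print(chebyshev_min_draws(10**10, 2 * 10**12, 0.01))
```

Under these assumptions the Chebyshev margin on top of the mean-matching sample size is on the order of sqrt(k / delta) extra distinct sequences, which is small relative to k for targets this large. The result applies to the infinite-copies model, which the post above treats as the upper end of the range for the real pool.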
 

FAQ: How Many DNA Molecules to Sample for Sufficient Unique Sequences?

1. What is probability in the context of DNA?

Probability in the context of DNA refers to the likelihood of a certain genetic event occurring in a population. This can include the likelihood of inheriting a certain gene or the likelihood of a certain mutation occurring.

2. How is probability used in genetic research?

Probability is used in genetic research to make predictions about the likelihood of certain genetic outcomes. This can help researchers understand patterns of inheritance and identify potential risk factors for genetic diseases.

3. Can probability be used to predict specific traits or characteristics?

No, probability alone cannot be used to predict specific traits or characteristics. It can only provide probabilities for certain outcomes based on known genetic information. Other factors, such as environmental influences, also play a role in determining traits.

4. What is random DNA?

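"Random DNA" is not a standard term. The description usually given for it corresponds to noncoding DNA, the portion of an organism's genome that does not code for protein. This has often been referred to as "junk" DNA, although parts of it are known to have regulatory roles, and it is believed to have accumulated through random mutations over time.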

5. How does random DNA affect genetic variability?

Random DNA can affect genetic variability by introducing new mutations into the genome, which can then be passed down to offspring. This can lead to genetic diversity within a population and potentially result in new traits or adaptations.
