How many copies of a shredded text are needed to get 99.9% reconstruction of a mutated text?

  • Context: Graduate
  • Thread starter: Simfish
  • Tags: Text
SUMMARY

The discussion centers on a question from NASA astrobiologist Chris McKay, who uses shredded text as an analogy for reconstructing ancient genomes from fragments. The question asks how many copies of a shredded text are needed to achieve 99.9% reconstruction. The text is characterized by its length (N) and the number of distinct units in its alphabet (m), with Hamlet given as an example, and it is shredded into pieces of average length X. A reply argues that the problem is not well defined without a rule for recognizing which fragments belong next to each other, because the same overlap can be consistent with more than one join.

PREREQUISITES
  • Understanding of text fragmentation and reconstruction principles
  • Familiarity with statistical probability and its applications
  • Knowledge of genetic sequencing and genome reconstruction techniques
  • Basic concepts of information theory related to data recovery
NEXT STEPS
  • Research mathematical models for text reconstruction and fragment assembly
  • Explore statistical methods for estimating reconstruction probabilities
  • Investigate genetic sequencing techniques used in ancient DNA recovery
  • Learn about information theory and its relevance to data integrity and recovery
USEFUL FOR

This discussion is beneficial for researchers in astrobiology, geneticists focused on ancient genomes, mathematicians interested in probability theory, and data scientists working on text analysis and reconstruction algorithms.

Simfish
This is a question posed by NASA astrobiologist Chris McKay, who wants to put life back together from its pieces. He really wants an answer, since it could help us reconstruct ancient genomes that are billions of years old but have been mutated over the years by internal radioactivity.

- A text of length N composed of an alphabet of m distinct units (e.g., Hamlet has 29551 words and 60 distinct units: 52 letters, the space, and 7 punctuation marks)

- Shred the text into pieces of average length X

- How many copies C of the shredded text are needed to give 99.9% reconstruction?
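One way to make the question concrete is to borrow the overlap idea from shotgun sequencing, under assumptions that are not in the original post: each copy is cut independently at every junction between adjacent characters with probability 1/X (so pieces have mean length X), repeats and mutations are ignored, and a junction counts as resolved only if some fragment spans it with at least k characters on each side. "99.9% reconstruction" is read here as 99.9% of junctions resolved, which is also an assumption. A single copy then bridges a given junction with probability (1 - 1/X)^(2k-1), so the expected unresolved fraction is u = (1 - (1 - 1/X)^(2k-1))^C, and u ≤ 0.001 requires C ≥ ln(0.001) / ln(1 - (1 - 1/X)^(2k-1)). The sketch below evaluates that estimate and cross-checks it with a small simulation; the function names and the overlap threshold k are hypothetical.

```python
import random


def unresolved_fraction(X, C, k):
    """Expected fraction of junctions that no fragment spans with at least k
    characters on each side (hypothetical independent-cut model)."""
    if X <= 1 or k < 1 or C < 0:
        raise ValueError("need X > 1, k >= 1, C >= 0")
    p = 1.0 / X  # chance that one copy is cut at a given junction
    # One copy bridges a junction if it has no cut in the 2k - 1 junctions
    # that separate the k characters on each side of it.
    bridged_by_one_copy = (1.0 - p) ** (2 * k - 1)
    # Copies are cut independently, so all C copies must fail to bridge it.
    return (1.0 - bridged_by_one_copy) ** C


def copies_needed(X, k, target=0.999):
    """Smallest C whose expected resolved fraction reaches `target`."""
    if not 0.0 < target < 1.0:
        raise ValueError("target must lie strictly between 0 and 1")
    C = 1
    while 1.0 - unresolved_fraction(X, C, k) < target:
        C += 1
    return C


def simulate_unresolved_fraction(N, X, C, k, seed=0):
    """Monte Carlo check: shred C copies of an N-character text and return the
    fraction of interior junctions that no fragment bridges with k chars per side."""
    rng = random.Random(seed)
    p = 1.0 / X
    n_junctions = N - 1  # junction j lies between characters j and j + 1
    # Only junctions with k characters available on both sides can be bridged.
    interior = range(k - 1, n_junctions - k + 1)
    if len(interior) == 0:
        raise ValueError("text too short for this k")
    resolved = [False] * n_junctions
    for _ in range(C):
        # Cut this copy independently at each junction with probability p,
        # which gives geometric piece lengths with mean X.
        prefix = [0]  # prefix[i] = number of cuts among junctions 0..i-1
        for _ in range(n_junctions):
            prefix.append(prefix[-1] + (rng.random() < p))
        for j in interior:
            # No cut in junctions j-k+1 .. j+k-1 means one fragment holds
            # characters j-k+1 .. j+k, i.e. k characters on each side of j.
            if prefix[j + k] == prefix[j - k + 1]:
                resolved[j] = True
    unresolved = sum(1 for j in interior if not resolved[j])
    return unresolved / len(interior)
```

For example, `copies_needed(50, 10)` returns 7 under this model. With k = 1 the estimate reduces to u = (1/X)^C, the chance that every copy happens to be cut at the same junction. Because the model ignores repeated text and mutations, it is likely optimistic for a real text or genome.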
 
I don't think this is a well-defined mathematical problem unless you have information about how we can recognize that two fragments of text go next to each other. For example, does "invited for a" go with "a little drink" or "a necktie party"?
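The ambiguity in this reply can be sketched in code, assuming (for illustration only) that the sole evidence for adjacency is an exact match between the end of one fragment and the start of another. The helpers `overlaps` and `successors` are hypothetical names, not part of any assembly library.

```python
def overlaps(left, right, min_overlap=1):
    """Overlap lengths L >= min_overlap where the last L characters of
    `left` equal the first L characters of `right`."""
    if min_overlap < 1:
        raise ValueError("min_overlap must be at least 1")
    max_len = min(len(left), len(right))
    return [L for L in range(min_overlap, max_len + 1) if left[-L:] == right[:L]]


def successors(fragments, min_overlap=1):
    """Map each fragment to (other fragment, longest overlap) pairs for every
    other fragment that could be placed directly after it."""
    result = {}
    for i, left in enumerate(fragments):
        candidates = []
        for j, right in enumerate(fragments):
            if i == j:
                continue  # a fragment cannot follow itself
            lengths = overlaps(left, right, min_overlap)
            if lengths:
                candidates.append((right, max(lengths)))
        result[left] = candidates
    return result


fragments = ["invited for a", "a little drink", "a necktie party"]
for left, candidates in successors(fragments).items():
    print(repr(left), "->", candidates)
```

This prints two candidates for 'invited for a', each sharing only the single character "a", and none for the other two fragments, so nothing in the fragments themselves decides between the joins. Requiring a longer minimum overlap removes this particular tie only by refusing both joins; in the usual shotgun-sequencing picture, repeated stretches longer than the fragments stay ambiguous however many copies are shredded.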