Researchers double the size of the DNA alphabet

Ygggdrasil · Feb 21, 2019

In a paper published this week in the journal Science, researchers report that they have devised an eight-letter alphabet for DNA and RNA:

The work builds off of previous work, which had expanded the genetic alphabet to six letters. The researchers call their eight-lettered nucleic acids "hachimoji," Japanese for eight letters. The four new letters seem to function just as well as the original four DNA letters, and the researchers were even able to enzymatically copy the eight-lettered DNA molecules into eight-lettered strands of RNA.

Hoshika et al. Hachimoji DNA and RNA: A genetic system with eight building blocks. Science 363: 884 (2019)
http://science.sciencemag.org/content/363/6429/884

Abstract:

We report DNA- and RNA-like systems built from eight nucleotide “letters” (hence the name “hachimoji”) that form four orthogonal pairs. These synthetic systems meet the structural requirements needed to support Darwinian evolution, including a polyelectrolyte backbone, predictable thermodynamic stability, and stereoregular building blocks that fit a Schrödinger aperiodic crystal. Measured thermodynamic parameters predict the stability of hachimoji duplexes, allowing hachimoji DNA to increase the information density of natural terran DNA. Three crystal structures show that the synthetic building blocks do not perturb the aperiodic crystal seen in the DNA double helix. Hachimoji DNA was then transcribed to give hachimoji RNA in the form of a functioning fluorescent hachimoji aptamer. These results expand the scope of molecular structures that might support life, including life throughout the cosmos.

Popular press summary: https://www.nytimes.com/2019/02/21/science/dna-hachimoji-genetic-alphabet.html

Buzz Bloom · Feb 21, 2019

Unfortunately I do not currently have access to read the full article, although I may be able to get the Science issue from my local library in a week or so. The abstract leaves me with a few questions which I hope PF participants will answer for me regarding what might be in the full article.

1. I am guessing that the new RNA nucleotides do not participate in triplets that code for amino acids since this was not mentioned in the abstract. Is this correct?
2. Some normal RNA sequences are part of ribosomes. I am guessing that the new nucleotides do not participate in this. Is this correct?
3. The following idea was mentioned in the NYTimes article.

Hachimoji DNA could have many applications, including a far more durable way to store digital data that could last for centuries. “This could be huge that way,” said Dr. Nicholas V. Hud, a biochemist at Georgia Institute of Technology who was not involved in research.

I am guessing that the Science article did not mention this idea since it seems to not be related to the purpose of the research: demonstrating alternatives to normal DNA (which we have on Earth) which suggests the possibility that life elsewhere might have much different DNA, perhaps with more elaborate chemistry coding. Is this correct that the idea is not in the Science article?

Ygggdrasil · Feb 21, 2019

Buzz Bloom said:

1. I am guessing that the new RNA nucleotides do not participate in triplets that code for amino acids since this was not mentioned in the abstract. Is this correct?

That is correct. The researchers have not yet engineered any translation machinery to convert the new RNA nucleotides into proteins.

2. Some normal RNA sequences are part of ribosomes. I am guessing that the new nucleotides do not participate in this. Is this correct?

The researchers designed a functional aptamer sequence, an RNA that folds into a structure that can bind to a fluorescent dye. The aptamer sequence was based on a previously designed aptamer that used only natural nucleotides, and by strategically placing the new nucleotides, they were able to engineer an aptamer that performed the same function but contained the unnatural nucleotides. It would likely be possible to engineer existing ribozymes in a similar fashion to contain the unnatural nucleotides.

3. The following idea was mentioned in the NYTimes article.

Hachimoji DNA could have many applications, including a far more durable way to store digital data that could last for centuries. “This could be huge that way,” said Dr. Nicholas V. Hud, a biochemist at Georgia Institute of Technology who was not involved in research.
I am guessing that the Science article did not mention this idea since it seems to not be related to the purpose of the research: demonstrating alternatives to normal DNA (which we have on Earth) which suggests the possibility that life elsewhere might have much different DNA, perhaps with more elaborate chemistry coding. Is this correct that the idea is not in the Science article?

The authors envision a wide range of potential applications for their hachimoji nucleic acids. In the final paragraph of the Science paper, the authors write: "This synthetic biology makes available a mutable genetic system built from eight different building blocks. With increased information density over standard DNA and predictable duplex stability across (evidently) all 8ⁿ sequences of length n, hachimoji DNA has potential applications in bar-coding and combinatorial tagging, retrievable information storage, and self-assembling nanostructures. The structural differences among three different hachimoji duplexes are not larger than the differences between various standard DNA duplexes, making this system potentially able to support molecular evolution. Further, the ability to have structural regularity independent of sequence shows the importance of interbase hydrogen bonding in such mutable informational systems. Thus, in addition to its technical applications, this work expands the scope of the structures that we might encounter as we search for life in the cosmos."
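To put a number on the "increased information density" in that passage, here is a rough sketch. It assumes each position is chosen independently and uniformly from the alphabet, which is an idealization and not something the paper states. The function names are made up for illustration.

```python
import math


def bits_per_position(alphabet_size: int) -> float:
    """Maximum information per position, assuming all letters are equally likely."""
    return math.log2(alphabet_size)


def sequence_count(alphabet_size: int, length: int) -> int:
    """Number of distinct sequences of the given length."""
    return alphabet_size ** length


for name, size in (("standard DNA", 4), ("hachimoji DNA", 8)):
    print(f"{name}: {bits_per_position(size)} bits per position, "
          f"{sequence_count(size, 10)} sequences of length 10")
```

Under that assumption, an eight-letter alphabet carries 3 bits per position versus 2 for the standard four letters, so the same number of positions holds 1.5 times as much information.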

Dr. Courtney · Feb 22, 2019

Fascinating. Thanks for posting. PF definitely makes me aware of lots of stuff I miss in my other sources.

BillTre · Feb 23, 2019

Interestingly, I was re-reading parts of an older book (The Major Transitions In Evolution, Maynard Smith and Szathmáry, 1995; end of chapter 5) when I ran across a section on why the genetic alphabet is the size it is (4 nucleotides, 2 base pairs).
They made some interesting arguments relevant to the issue of DNA alphabet size.
They concluded 4 bases is likely an optimal size due to a balance of the pluses and minuses.

First, they are considering the situation when life was first forming, prior to the ribosomal based system of sequence driven protein production.
This would be an RNA world (presumably with some kind of biochemical system providing a metabolic drive for reactions).
A modern view might include the following in such a scenario:

RNA would require a way to produce the RNA precursors (the metabolism of that) and then assemble the RNA polymer.
Primitively, energy production (which would be required in some form to drive cell metabolism) is tied to cell membranes, a situation conserved between bacteria and archaea (the earliest known split in living lineages). This would also produce a contained space that could be better controlled.
Membranes would require lipid production. Presumably this would be a primitive condition evolutionarily, but biochemical membrane components are not conserved between bacteria and archaea.
It would not surprise me if there were abiotically, non-translationally generated, short peptide chains that would also be floating around in this mix (there is presumably some source of production of new organic chemicals, perhaps not well directed at first). Some would randomly have the appropriate sequence to bind certain parts of RNA structures in potentially "helpful" ways (stabilizing conformations?).

Anyway, at some primitive stage of abiotic chemical evolution, the number and type of precursors and the number and type of nucleotides were set at 4 (to make two different base pairs (A-T and C-G)).

They consider a potentially positive aspect of more letters: newly evolving ribozymes would have more potential reaction sites that might turn out to have some useful function that could be metabolically beneficial. The greater the size of the alphabet, the more different possible reaction sites a sequence of a given size could encode.
With m different nucleotides, there would be m⁴ possible different sequences that are 4 nucleotides long (the size of an active (catalytic) site, they say).
For m=4 (a four-nucleotide alphabet) there are 256 possible active sites made of distinct nucleotide sequences.
For 6 nucleotides it would be 1296 possible sites.
For 8 nucleotides it would be 4096.
Too small of a variety of possible reaction sites might be too limiting to control a functional metabolism. Perhaps one kind of base pair is therefore bad (in an adaptive kind of view).
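The counts above are just m⁴ evaluated for each alphabet size. A minimal sketch to check them, taking the 4-nucleotide active-site length from the book's argument as given (the function and variable names here are my own, purely illustrative):

```python
def count_sequences(alphabet_size: int, length: int) -> int:
    """Number of distinct sequences of `length` over `alphabet_size` letters."""
    return alphabet_size ** length


SITE_LENGTH = 4  # active-site length assumed in the argument above

# Alphabet sizes discussed in the thread: natural (4), six-letter, hachimoji (8)
site_counts = {m: count_sequences(m, SITE_LENGTH) for m in (4, 6, 8)}

for m, count in site_counts.items():
    print(f"{m} nucleotides: {count} possible sites of length {SITE_LENGTH}")
```

Doubling the alphabet from 4 to 8 letters multiplies the number of possible 4-nucleotide sites by 2⁴ = 16.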

On the other hand (negative aspects of a larger alphabet), they expect that the bases of a larger alphabet would be less chemically distinct in the patterns of H-bonds they could make, which would result in a lower fidelity of replication.
They argue that this would limit the size of successful (in the face of poor fidelity) genomes to 1,000 to 10,000 base-pairs.
An example of possible reduced selectivity (based on possible H-bonds that might form) can be seen in the diagram @Ygggdrasil posted in the OP: the dP might bind to either dT or dC with two hydrogen bonds.
Note that this is an argument based on conditions thought to exist at the beginning of life. Nowadays, there are many (more recently evolved) molecular systems that look for and correct errors in DNA replication, so this is probably a less important concern in current biology.

The ribosomally based translation system of making proteins based on an RNA sequence is more recently evolved.
Assuming 20 different amino acids and reaction sites 4 residues (amino acids) long, a 4 monomer (amino acid) length of peptide could have 160,000 different potential sequences that might be able to function as reaction sites.

The fact that there are occasionally post-translational modifications of amino acids in proteins is taken to mean the library of amino acids could be usefully increased.
In theory, this could be done a bit without increasing the library of nucleotides if some of the degeneracy of the genetic code were reduced to make full use of the 64 possible different triplets. However, this would require recoding transcriptional genes (DNA or RNA) throughout the genome. Wherever a codon meant for encoding one of the "new" amino acids was used in the original sequence for a different (old style) amino acid, it would have to be replaced (in the genome) or a new style amino acid would be inserted into that site in the protein, perhaps with deleterious results.
6 bases would give 216 possible triplets and 8 bases would give 512 possible triplets.
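The triplet counts follow from n³ for an alphabet of n bases (four bases give 4³ = 64). A quick sketch to check the arithmetic, assuming codons stay three bases long, which nothing in the hachimoji work establishes since no translation system exists for it yet (names are illustrative only):

```python
CODON_LENGTH = 3  # assumes triplet codons are kept with a larger alphabet


def codon_count(n_bases: int) -> int:
    """Number of distinct codons available with n_bases different nucleotides."""
    return n_bases ** CODON_LENGTH


codon_counts = {n: codon_count(n) for n in (4, 6, 8)}

for n, count in codon_counts.items():
    print(f"{n} bases: {count} possible triplets")
```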
Any expansion of the number of amino acids encoded in the translation process in a free living organism would require several new features:

Metabolic production of the new nucleotides (perhaps only a final enzyme in its production, perhaps other enzymes to produce any novel precursors it might require).
New tRNA genes to take the new amino acids to the ribosome for protein assembly.
New enzymes to link specific amino acids to the proper tRNAs for later use in the ribosome.

This is a lot to get done at once by natural evolution (where things have to be adaptive at each step in the sequence of change), but could be possible to assemble synthetically.
Because it would be such a big lift to add new nucleotide letters to the alphabet, the number (in natural populations) has become a frozen character and is very unlikely to change in a natural population. Thus it has been stable for billions of generations, with even larger numbers of competing organisms in each generation.

These are the ways that a molecularly encoded sequence can produce different nano-machines (or structures) with controlled chemical micro-environments (through the production of molecules with different structures and reaction sites) which can be selected for doing something adaptive.

Ygggdrasil · Feb 23, 2019

The recent Science paper is actually not the first attempt to create an eight-lettered DNA alphabet. I just recently learned that in 2005, the laboratory of Eric Kool published their version of eight-lettered DNA:

Gao, Liu and Kool. Assembly of the Complete Eight‐Base Artificial Genetic Helix, xDNA, and Its Interaction with the Natural Genetic System. Angew Chem Int Ed Engl. 44: 3118 (2005) https://onlinelibrary.wiley.com/doi/full/10.1002/anie.200500069

Of course, as one can see from the picture, the bases are bigger than natural bases, which results in DNA that is much wider than its natural counterpart. The hachimoji DNA designed by Benner et al. has the advantage of retaining a geometry similar to that of natural DNA, enabling them to make only slight modifications to natural enzymes in order to get them to process the hachimoji DNA as well as enabling them to easily adapt natural RNA aptamers into hachimoji RNA aptamers.
