The Central Dogma is why proteins cannot be first
Adeimantus said:
I recently borrowed the book Information Theory, Evolution, and the Origin of Life by Hubert Yockey. In it he points to the fact that there can be no code from the amino acid alphabet (about 20 symbols) to the RNA/DNA alphabet (64 3-letter codons, a second extension of the 4-symbol alphabet {A,C,G,T}) and says this proves, beyond any doubt, that proteins were not the first step in the origin of life. No information can be transferred from proteins to DNA. However, he also says in the book that the genetic code probably evolved from an earlier code with 2-letter codons. This first extension of {A,C,G,T} has 16 symbols, which is less than 20. So doesn't that leave open the possibility of a code from the amino acid alphabet to the pre-DNA nucleic acid alphabet?
Thank you for reading my book, Information Theory, Evolution and the Origin of Life (Cambridge University Press, 2005). However, no, it doesn’t.
It was the late Thomas Jukes who put forward the speculation that the triplet genetic code, which has three-letter codons, may have evolved from a doublet code of two-letter codons. I cited his work.
For information to go back and forth between two alphabets, the code to translate between them must have a one-to-one mapping between the symbols of each alphabet. That is the only way to know for sure which symbol in alphabet A specifies what symbol in alphabet B.
For example, let a pair of dice represent the DNA/RNA alphabet as a doublet code, and let the number 7, from an alphabet of 1 through 12, represent an amino acid. To specify the number 7, our metaphorical amino acid, the two dice can come up as 1,6; 6,1; 2,5; 5,2; 3,4; or 4,3. That means any one of six letters from the dice (DNA/RNA) can specify one symbol from the alphabet representing amino acids. This is called a several-to-one mapping.
If all you know is the result of “7,” you can’t possibly know which of the six possible combinations from the doublet dice alphabet specified, or is mapped to, the “7.” That is why information cannot flow from the smaller alphabet to the larger alphabet; that is the central dogma. However, if you know any symbol in the larger alphabet and you know the code (how the symbols of the larger alphabet are mapped to the symbols of the smaller one), you always know what symbol it specifies in the smaller alphabet. It doesn’t matter that more than one symbol in the larger alphabet may specify a particular symbol in the smaller alphabet. But just knowing the code doesn’t let you go with precision from the smaller alphabet to the larger one, because you can never know for sure which symbol in the larger alphabet specified the symbol in the smaller one.
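The dice example can be checked with a short Python sketch. It is only an illustration of the several-to-one mapping described above, not a model of the real genetic code; the names `forward`, `decode`, and `preimages` are made up for this example.

```python
from collections import defaultdict
from itertools import product

# The "code": every ordered roll of two dice (a doublet) maps to its sum.
# This stands in for the larger alphabet mapping onto the smaller one.
forward = {(a, b): a + b for a, b in product(range(1, 7), repeat=2)}


def decode(doublet):
    """Larger alphabet -> smaller alphabet: always exactly one answer."""
    return forward[doublet]


# Going the other way, collect every doublet that maps to each sum.
preimages = defaultdict(list)
for doublet, total in forward.items():
    preimages[total].append(doublet)

print(decode((1, 6)))                # 7
print(sorted(preimages[7]))          # [(1, 6), (2, 5), (3, 4), (4, 3), (5, 2), (6, 1)]
print(len(forward), len(preimages))  # 36 11
```

Knowing the doublet always gives one sum, but knowing only the sum 7 leaves six candidate doublets, so the mapping cannot be run backward with certainty. Note that two dice can only produce the sums 2 through 12, so 36 doublets land on 11 symbols in this toy code.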
This mathematical property of codes alone explains why no “proteins first” scenario for the origin of life is valid.
Our problem is that we know that life exists, so it must have had an origin. However, the origin of life must be taken as an axiom of biology. There is no way to create an algorithm to show how life originated, because of the central dogma: information can flow back and forth only between alphabets with a one-to-one mapping, and otherwise flows only from the larger alphabet to the smaller one.
It is a pity that biologists and molecular biologists are uncomfortable with things that are unknowable, like the origin of life. Mathematicians and physicists, however, are able to cope with things that are unknowable. I hope that molecular biologists, who now must master information theory and coding theory to master their discipline, will come to accept that there are things that are unknowable. This will clear the decks of incorrect speculations and save young scientists from throwing away their careers pursuing dead ends. That can only result in the good of science.
The ancient Greek mathematicians knew they were unable to solve three problems: (1) the trisection of the angle; (2) construction of a square of exactly the same area as a given circle; and (3) construction of a cube exactly twice the volume of a given cube. We now know that these problems are unknowable because no procedure, or algorithm, exists to solve them with compass and straightedge. (For #2, it is because pi is transcendental.)
The question of the origin of life is a traditional proxy battleground for persons who want to force others either to believe in God or to deny the existence of God. In this feud, I have had the role of Mercutio, “a pox on both your houses.” Religionists must stick to questions of faith, while scientists must stick to, as I quote Socrates in my book, what can be “counted and measured.”
Speaking of Socrates, Adeimantus, how is your brother, Plato?