How Can the Complex Evolution of Proteins Be Explained by Mathematical Theories?

KenJackson · Oct 1, 2017

Our bodies are made up of tens of thousands of proteins, each of which is a long precise sequence of amino acid molecules ("residues"). Evolution requires them to be "formed by numerous, successive, slight modifications."

Does this mean the code for one of the twenty residues was added every few generations to the DNA coding for the protein until it was complete? But how would each generation know if the selected addition was correct, since partial proteins don't work whether correct or not? Also, natural selection removes stuff that doesn't work.

It seems like the rules conflict. What am I missing?

jim mcnamara · Oct 1, 2017

You are missing some basic concepts. Maybe I can help:

1. Most mutations (changes) are deleterious. As in usually "fatal" to the organism with the change.
2. Many proteins with a common parentage have been part of living things for billions of years.
3. Some very few mutations are actually a win for the organism. So what we see today is who won, not who lost the Natural Selection sweepstakes.
This is called survivorship bias - and it is why we do not always know what protein came before what we see now.
https://en.wikipedia.org/wiki/Survivorship_bias

4. Most important is the concept of emergence and self-organization:
Take a set of a few dirt simple rules. Run them against basic compounds known to exist elsewhere in the Solar System and eventually you get complex life -- proteins from scratch if you like.

Requires liquid water.

IMO, the best way to see this is a simple computer "game" by John Conway called the "Game of Life".
It is a great way to understand the emergence concept as it applies to abiogenesis. Look at the graphics first.

https://en.wikipedia.org/wiki/Conway's_Game_of_Life

If you actually read the link and fiddle with the graphics it will become obvious pretty quickly. You can also get internet versions of the game you can play yourself if you want.
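
For readers who would rather run the rules than read about them, here is a minimal sketch of the Game of Life in Python. The names `step` and `run` and the set-of-coordinates representation are choices made for this illustration, not anything taken from the linked article. It only shows how a moving pattern (a "glider") emerges from Conway's two local rules; it is not a model of chemistry or of abiogenesis.

```python
from collections import Counter


def step(live):
    """Advance a set of live (row, col) cells by one generation."""
    # Count, for every cell on the unbounded grid, how many live neighbours it has.
    counts = Counter(
        (r + dr, c + dc)
        for r, c in live
        for dr in (-1, 0, 1)
        for dc in (-1, 0, 1)
        if (dr, dc) != (0, 0)
    )
    # Conway's rules: birth on exactly 3 neighbours, survival on 2 or 3.
    return {cell for cell, n in counts.items()
            if n == 3 or (n == 2 and cell in live)}


def run(live, generations):
    """Apply step() the given number of times and return the final live set."""
    for _ in range(generations):
        live = step(live)
    return live


# The five-cell "glider" pattern.
glider = {(0, 1), (1, 2), (2, 0), (2, 1), (2, 2)}

if __name__ == "__main__":
    after = run(glider, 4)
    shifted = {(r + 1, c + 1) for r, c in glider}
    print("glider moved one cell diagonally:", after == shifted)
```

Nothing in `step` mentions movement, yet after four generations the glider reappears one cell further along the diagonal. That is the sense of "emergence" meant here: behaviour that is not written into any individual rule.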

KenJackson · Oct 1, 2017

jim mcnamara said:

You are missing some basic concepts. Maybe I can help:

Your first three points seem to only address changes to fully-formed complete proteins. That's not what I'm asking about.

jim mcnamara said:

4. Most important is the concept of emergence and self-organization:
Take a set of a few dirt simple rules. Run them against basic compounds known to exist elsewhere in the Solar System and eventually you get complex life -- proteins from scratch if you like.
...
If you actually read the link and fiddle with the graphics it will become obvious pretty quickly. ...

This sounds a little like "poof!" How? What mechanism? How did partial proteins maintain existence before they were complete?

And John Conway's Game of Life seems to be a mathematical thing. Even looking at the graphics, I don't see the relevance. If we look at just the math, it's horrible. Consider a relatively small 100-residue protein. There are about 20^100/2 or 6E129 possible random combinations. The universe isn't remotely close to that many Planck times old. Clearly proteins didn't come about by any brute force mathematical thing, which the game seems to employ.
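
The arithmetic in the paragraph above can be checked with a short Python sketch. The age of the universe (about 13.8 billion years) and the Planck time (about 5.39e-44 s) are approximate values assumed here; they are not given in the thread.

```python
import math

RESIDUES = 20    # standard amino acids
LENGTH = 100     # residues in the example protein

sequences = RESIDUES ** LENGTH   # exact integer value of 20^100
half = sequences // 2            # the "/2" used in the post

# Assumed constants (not from the thread), both approximate.
AGE_UNIVERSE_S = 13.8e9 * 365.25 * 24 * 3600   # seconds
PLANCK_TIME_S = 5.39e-44                       # seconds

planck_ticks = AGE_UNIVERSE_S / PLANCK_TIME_S

print(f"20^100 / 2             ~ {half:.2e}")
print(f"Planck times elapsed   ~ {planck_ticks:.2e}")
print(f"orders of magnitude apart: {math.log10(half / planck_ticks):.0f}")
```

Under these assumptions the 6E129 figure checks out, and it exceeds the number of elapsed Planck times by dozens of orders of magnitude. The calculation only covers an exhaustive, uniformly random search over full-length sequences.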

It's tempting to say most random arrangements were rejected by only accepting the right answer at each addition. But as far as I know, a longer correct-but-incomplete protein doesn't work any better than a shorter correct-but-incomplete protein. So how were incorrect additions rejected on the development path?

BillTre · Oct 1, 2017

@jim mcnamara's first three points are quite relevant to the fine tuning of an existing protein sequence.

There are additional evolutionary factors that improve the odds of evolving a protein.

More rapid increases in variation (these can short cut the addition of amino acids and selection of the sequence for particular functions and structures):
1) Most of what are current proteins are not just created from scratch. They are descendants of other proteins, often more than one.
Gene duplications and whole genome duplications are not uncommon in evolution and are considered an important source of working material for further evolution. Duplicating whole genomes results in additional copies of every protein-coding gene in an organism's genome. This frees up half of the protein-coding genes from being constrained to fulfill a specified function and allows them more freedom to change existing sequences or acquire new ones. Thus these sequences can change without killing the organism that is carrying them. They can either degrade into pseudogenes or change into new functional genes.

2) Many genes are the product of combining parts of other genes together. There are functional components of proteins (such as an enzymatic site, a membrane-spanning site, a protein tag for export from the cell, or some binding site) which could be copied from one place and put into another, which may or may not be part of a gene. This can result in a protein which has a new combination of functions, such as an enzyme normally in the cytoplasm getting put into a vesicle instead. Sometimes this can generate proteins with new functions.

3) Viruses can grab and transfer a part of a protein from one place (or species) and put it into another protein encoding gene.

Large numbers improving the odds of an unlikely occurrence (throwing more dice/more throws of the dice):
4) The numbers of single cell organisms (in which most protein structures were first evolved) can be immensely huge. I have grown 10⁸ mammalian cells in less than a liter of tissue culture media (under optimal conditions). There are approximately 1,335,000,000 (1.3 x 10⁹) cubic km of ocean water, which is 1.3 x 10²¹ liters. Bacterial cells are smaller but can be even more dense in optimal conditions. The natural world would not usually provide optimal conditions, but there is still a good possibility for really large numbers.

5) Bacterial generations can be as short as 20 minutes which can (optimally) yield >26,000 generations per year.

Not all amino acids in a protein are equally important (not every position in a protein is limited to only one amino acid):
6) Some can be switched out for biochemically similar amino acids with little obvious change in function. Some parts of a protein are kind of like a scaffold, holding the more critical parts in their places.
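
The figures in points 4 and 5 can be checked with a quick calculation. This is a sketch only: the variable names are made up for the example, and the last line applies the tissue-culture density to the whole ocean purely as an illustration of scale, which the post itself notes natural conditions would not usually support.

```python
import math

# Point 4: ocean volume converted from cubic kilometres to litres.
LITERS_PER_KM3 = 1e12            # 1 km^3 = 10^9 m^3 = 10^12 L
ocean_km3 = 1.335e9              # figure quoted in the post
ocean_liters = ocean_km3 * LITERS_PER_KM3

# Point 5: generations per year at one division every 20 minutes.
generation_minutes = 20
generations_per_year = 365 * 24 * 60 // generation_minutes

# Hypothetical illustration only: the culture density from the post
# (about 10^8 cells per litre) applied to the entire ocean.
cells_per_liter = 1e8
hypothetical_cells = ocean_liters * cells_per_liter

print(f"ocean volume         ~ {ocean_liters:.2e} L")
print(f"generations per year = {generations_per_year}")
print(f"hypothetical cells   ~ {hypothetical_cells:.1e}")
```

The unit conversion reproduces the 1.3 x 10²¹ liters in the post, and 20-minute generations give 26,280 per year, consistent with ">26,000".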

KenJackson · Oct 1, 2017

BillTre said:

1) Most of what are current proteins are not just created from scratch. ...
2) Many genes are the product of combining parts of other genes together. ...
3) Viruses can grab and transfer a part of a protein from one place (or species) and put it into another protein encoding gene.

In each case, it looks like we're starting with many many complete proteins before a new one is whipped up. That's interesting, but it's still not what I'm asking about.

BillTre said:

4) The numbers of single cell organisms (in which most protein structures were first evolved) ...
5) Bacterial generations can be as short as 20 minutes which can (optimally) yield >26,000 generations per year.

Also interesting, but we're starting with complete organisms, presumably with fully functional proteins, in these cases.

Let's look at the "in which most protein structures were first evolved" case. How did it first evolve? That's the case I'm asking about. Not a modification after it exists.

BillTre said:

6) Some can be switched out for biochemically similar amino acids with little obvious change in function. ...

I thought I understood that each protein has to have a fairly precise sequence of amino acids and that most changes are detrimental. I thought I saw that there were studies somewhere that showed this. About what percent of changes can be tolerated? If the percentage is large, then I understand less than I thought I did about proteins. If it's small, then the math is still astronomically bad.

Ygggdrasil · Oct 1, 2017

The exact origin of proteins is not fully understood; however, there are some hypotheses that researchers are testing. These hypotheses begin with the recognition that life likely began with an "RNA world," where RNA acted both as genetic material and as a catalyst for accelerating key biosynthetic reactions. RNA catalysts (ribozymes), however, are not very efficient. Part of the problem is that RNA is highly negatively charged, so folding RNAs into compact structures for catalysis fights against the electrostatic repulsion of RNA-RNA interactions.

Thus, some researchers hypothesize that cationic peptides first evolved to bind to these ribozymes to help their folding and function (e.g. see this recent paper from Jack Szostak's lab). Obviously, in this primarily structural role, not a great deal of sequence specificity is required for their function. However, over time, these peptides could become more elaborate and evolve to more specifically bind the RNAs they chaperone. A good example of this is the ribosome, where RNA still performs the catalysis, but the RNA binds >50 different ribosomal proteins that aid in RNA folding, ribosome assembly, and mediating interaction of the ribosome with other cellular components. Eventually, proteins could replace the RNA completely, leading to protein-only enzymes. There are some extant enzymes which support this hypothesis. For example, most RNaseP enzymes are protein-RNA complexes that use RNA for catalysis. However, researchers have discovered some variants of the RNaseP enzyme that do not contain RNA and rely solely on protein for catalysis.

KenJackson said:

I thought I understood that each protein has to have a fairly precise sequence of amino acids and that most changes are detrimental. I thought I saw that there were studies somewhere that showed this. About what percent of changes can be tolerated? If the percentage is large, then I understand less than I thought I did about proteins. If it's small, then the math is still astronomically bad.

Most proteins can tolerate some amount of mutation, as has been shown in studies where researchers systematically mutate each amino acid in a protein. For example:

Approximately 30% of TEM1 mutations cause a partial reduction in fitness, primarily owing to mildly destabilizing mutations that reduce the levels of soluble, folded protein. The remaining fraction (~62%) has no immediate measurable effects on fitness (Box 1). Although this detailed distribution is available for only one protein, experiments with other proteins show similar trends; approximately 40% of mutations reduce or completely abolish the activity of the mutated protein [8]. However, the available data are largely limited to one class of proteins — single-domain, soluble enzymes. Other classes of proteins, such as membrane proteins, remain unexplored and their distributions may differ substantially.

http://www.nature.com/nrg/journal/v11/n8/full/nrg2808.html

jim mcnamara · Oct 1, 2017

@KenJackson - abiogenesis is what you just asked about. @Ygggdrasil gave you the textbook answer on the start; my response gave you the proposed "method": abiogenesis from emergent properties based on elementary laws of chemistry and physics, with emergence exemplified by the Game of Life. You seem to be a computer programmer/admin, so I thought it was a good choice.

I fail to see what else we can do. Please help us get to what you don't see. Thanks.

sas3 · Oct 2, 2017

Going back a step, there was a good discussion about a paper on the emergence of native peptide sequences in prebiotic replication networks.
Here is the link to the discussion

SciencewithDrJ · Oct 24, 2017

Ygggdrasil said:

The exact origin of proteins is not fully understood; however, there are some hypotheses that researchers are testing. These hypotheses begin with the recognition that life likely began with an "RNA world," where RNA acted both as genetic material and as a catalyst for accelerating key biosynthetic reactions. RNA catalysts (ribozymes), however, are not very efficient. Part of the problem is that RNA is highly negatively charged, so folding RNAs into compact structures for catalysis fights against the electrostatic repulsion of RNA-RNA interactions.

Thus, some researchers hypothesize that cationic peptides first evolved to bind to these ribozymes to help their folding and function (e.g. see this recent paper from Jack Szostak's lab). Obviously, in this primarily structural role, not a great deal of sequence specificity is required for their function. However, over time, these peptides could become more elaborate and evolve to more specifically bind the RNAs they chaperone. A good example of this is the ribosome, where RNA still performs the catalysis, but the RNA binds >50 different ribosomal proteins that aid in RNA folding, ribosome assembly, and mediating interaction of the ribosome with other cellular components. Eventually, proteins could replace the RNA completely, leading to protein-only enzymes. There are some extant enzymes which support this hypothesis. For example, most RNaseP enzymes are protein-RNA complexes that use RNA for catalysis. However, researchers have discovered some variants of the RNaseP enzyme that do not contain RNA and rely solely on protein for catalysis.
Most proteins can tolerate some amount of mutation, as has been shown in studies where researchers systematically mutate each amino acid in a protein. For example:

http://www.nature.com/nrg/journal/v11/n8/full/nrg2808.html

Excellent response. I logged in to respond with the same explanation, but I see that you have eloquently described what I wanted to contribute, so I will not repeat it and will simply praise this superb response.

Auto-Didact · Nov 3, 2017

Here is one biophysical theory based on quantum information ideas. There are also many other mathematical theories, mostly involving complex networks of molecules interacting with the environment and models of self-organizing nonlinear systems.

In any case, it always bothers me when people, especially physicists, insist and actually seem to believe that a product of complex evolutionary network dynamics can be directly reduced to an exercise in statistical mechanics. Biology is precisely the domain where both the closed-system assumption and the thermal-equilibrium assumption are vigorously violated, since all living systems are open systems and are only ever in thermal equilibrium in death.

A comprehensive theory of non-equilibrium statistical mechanics is, however, currently still only a pipe dream. This is perhaps in part due to not enough mathematicians and theoreticians taking to heart the limitations of classical statistical mechanics and the need for a more comprehensive theory. Instead of tackling this outstanding problem, we have theoreticians spending their time worrying about naturalness, multiverses and gravitinos.
