Current research on protein folding

  1. Nov 13, 2014 #1


    Gold Member

    As a non-biologist, I'm curious: What are some things that current research on protein folding is focusing on? What are some current challenges that researchers are facing? I understand that the protein folding problem is a very important problem in the field of biophysics. Any experts on here?
  3. Nov 13, 2014 #2


    Science Advisor

    Protein folding, of course, is a fairly big area of science, so it's hard to write a comprehensive answer to your question. In lieu of that, I'll point you to a review article published in Science that has a fairly good overview of the field and the current questions interesting those studying protein folding. At the end of the review, the authors list a few of the most prominent unsolved questions in the field:

    Dill KA and MacCallum JL. 2012. The Protein-Folding Problem, 50 Years On. Science. 338: 1042. doi:10.1126/science.1219021
  4. Nov 14, 2014 #3
    Though I'm a biologist, I have never worked in protein folding. Having said that, I'd like to add something to what Ygggdrasil has said, just to make it a bit more understandable why those are all very important questions from the biological point of view. I believe that the holy grail of the protein folding field would be to predict protein structure from the protein sequence alone. The protein sequence is more or less easily predictable from the DNA, and it would be amazing if you could predict how a protein would fold from its sequence alone. This would allow understanding much more about each organism's proteome, what each protein could be doing, and how various processes would behave under various conditions.

    For now, I believe that most of what you can do from a known protein sequence alone is to search for similar protein sequences whose structures are known (be it whole proteins, or motifs) and then derive the fold of your new protein from those. Of course, this is not very accurate because different proteins fold in different ways (hence the importance of finding out more about protein-folding energy landscapes), and these landscapes are changed by protein interactions during and after protein assembly. Chaperones, for instance, are a class of proteins that help in folding or maintaining the correct protein structure, but there is no systematic way of finding what they do.
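
    The "find a similar sequence with a known structure" step described above can be sketched roughly as follows. This is only a minimal illustration, assuming the sequences have already been aligned to equal length (gaps written as "-"); the function names and the template dictionary are hypothetical and do not come from any particular tool.

```python
def percent_identity(a, b):
    """Percent of gap-free aligned positions where both residues are identical."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to equal length")
    # Only count columns where neither sequence has a gap.
    aligned = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not aligned:
        return 0.0
    matches = sum(1 for x, y in aligned if x == y)
    return 100.0 * matches / len(aligned)


def best_template(query, templates):
    """Name of the known-structure sequence most similar to the query."""
    return max(templates, key=lambda name: percent_identity(query, templates[name]))


# Hypothetical, made-up sequences purely for illustration.
templates = {"templateA": "ACDEFGHIK", "templateB": "ACDQFGHLK"}
print(best_template("ACDEFGHLK", templates))
```

    A high identity only suggests that the template's fold is a reasonable starting guess; as noted above, it does not guarantee the new protein folds the same way.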
  5. Nov 14, 2014 #4


    Gold Member

    Thanks very much for the responses!
    What makes them so sure that the structure of protein will be completely determined by its amino acid sequence?
  6. Nov 14, 2014 #5
    The problem many modelers have is that they are overly reductionist. Many scientists completely forget that proteins aren't just proteins; they're often GLYCOproteins. What is the significance of the glyco portion of a protein? Well, for starters, glycosylation of proteins is fundamentally important to protein folding:


    We now know that probably >90% of all proteins are glycosylated (which includes intracellular proteins). The number of possible glycan structures that can be added to proteins is extraordinarily massive; it is orders of magnitude larger than even the entire proteome. Studying glycosylation is beastly--there simply is no code that allows you to predict what type of glycosylation will get added to a protein--so it will be extremely difficult to predict protein folding from sequence alone if you cannot predict the glycosylation patterns that will profoundly affect that folding. The structure of glycans can be extremely complex, and in solution they are moving as well, just as proteins can adopt multiple conformations. Additionally, no one can predict where all types of glycosylation will occur on proteins because no known motif exists for certain types of glycosylation. In order to be able to 100% predict and model how a protein will fold, I have no doubt that 1.) you'd need to be able to predict which glycans get added to proteins (impossible so far), 2.) you'd need to include in your model how glycans can move in solution (extremely complex), and 3.) you'd need to be able to predict at exactly which sites on a protein glycans will be added (impossible).
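
    To make the "no known motif for certain types" point concrete: N-linked glycosylation is the one case with a well-known consensus sequon, Asn-X-Ser/Thr with X being any residue except proline. A minimal sketch of scanning for it is below; the function name and the example sequence are made up for illustration. A hit is only a candidate site (many sequons are never glycosylated, and the scan says nothing about which glycan would be attached), and there is no comparably simple pattern for O-linked glycans or O-GlcNAc.

```python
import re

# Lookahead so that overlapping sequons (e.g. in "NNSS") are all reported.
SEQUON = re.compile(r"(?=N[^P][ST])")


def find_sequons(seq):
    """1-based positions of Asn residues that start an N-X-S/T (X != Pro) sequon."""
    return [m.start() + 1 for m in SEQUON.finditer(seq.upper())]


# Hypothetical sequence, not a real protein.
print(find_sequons("MNGTANPSKNNSS"))
```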

    And just to highlight how important glycosylation is for protein folding, recent publications have shown that glycosylation metabolism which is used to build glycans is inherently linked to the ER stress response:


    All of this leads to different glycoforms of the same protein that can have different functions:


    There's a massive world of biology besides DNA and proteins that often gets overlooked, and that's what happens in the post-translationalome. There are over 400 different types of PTMs, which lead to the hundreds of millions or billions of distinct molecular species needed for life. You cannot understand how proteins work without knowing how post-translational modifications, some sets of which are even much more complex than the entire proteome itself, affect protein folding/signaling/trafficking.
  7. Nov 15, 2014 #6
    Indeed, this is very important! And if you look at some of the research on protein structure, protein crystallography is often carried out while the protein is interacting with a specific component because that's very important for determining the protein folding.

    @ZetaOfThree - there are some features that you may derive from the protein sequence itself. The amino acid side chains provide many clues to the types of inter- and intramolecular interactions that may occur in those proteins. For instance, cysteine residues can form disulfide bonds with other cysteine residues. This post-translational modification (cysteine oxidation) may help stabilize the protein structure, or may simply involve that protein in some signaling process while *maybe* not having any role in that protein's structure. Nevertheless, similar motifs tend to behave similarly, or at least exhibit similar characteristics, which is why it is so appealing to find a way of predicting protein structure from sequence.

    ...of course, this is still a very complex task to achieve as you can see from the responses of @gravenewworld and @Ygggdrasil. :) And simply predicting whether a residue will be on the surface of the protein or not is already a bit uncertain.

    (I'd like to give you some more information regarding your question but, not having worked in the field, I'm afraid I can't tell you much more...)
  8. Nov 15, 2014 #7


    Science Advisor

    In a classic experiment, for which Christian Anfinsen won the Nobel Prize, Anfinsen showed that he could unfold, then refold a small enzyme in a test tube. The fact that he could refold the purified enzyme in the absence of any other cellular factors demonstrated that the amino acid sequence contains all the information required to fold the protein. This experiment has been repeated with a number of other proteins by a number of other groups and has led to the recognition that the native structure of a protein is (with only a few exceptions) the lowest energy configuration of a particular amino acid sequence. Thus, the structure prediction problem is a purely thermodynamic problem: if you can find the lowest energy configuration of a protein, you've found its native fold. Indeed, this principle has allowed researchers to simulate the folding of small proteins on a supercomputer, and it also forms the basis for our ability to computationally design new protein structures.
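
    As a toy illustration of "find the lowest energy configuration and you've found the native fold", here is a minimal sketch using the HP lattice model, a standard simplification in which each residue is either hydrophobic (H) or polar (P), the chain lives on a 2D square grid, and every non-bonded H-H contact lowers the energy by 1. This is an assumption-laden cartoon, not how real structure prediction or the supercomputer simulations work, and the brute-force search is only feasible for very short chains.

```python
from itertools import product

MOVES = {"U": (0, 1), "D": (0, -1), "L": (-1, 0), "R": (1, 0)}


def fold_coords(moves):
    """Lattice coordinates for a move sequence, or None if the chain overlaps itself."""
    pos = [(0, 0)]
    for m in moves:
        dx, dy = MOVES[m]
        nxt = (pos[-1][0] + dx, pos[-1][1] + dy)
        if nxt in pos:
            return None
        pos.append(nxt)
    return pos


def energy(seq, coords):
    """-1 for each pair of H residues adjacent on the lattice but not along the chain."""
    e = 0
    for i in range(len(seq)):
        for j in range(i + 2, len(seq)):
            if seq[i] == "H" and seq[j] == "H":
                dist = abs(coords[i][0] - coords[j][0]) + abs(coords[i][1] - coords[j][1])
                if dist == 1:
                    e -= 1
    return e


def lowest_energy_fold(seq):
    """Exhaustively search all self-avoiding conformations; return (energy, moves)."""
    best_e, best_moves = None, None
    for moves in product("UDLR", repeat=len(seq) - 1):
        coords = fold_coords(moves)
        if coords is None:
            continue
        e = energy(seq, coords)
        if best_e is None or e < best_e:
            best_e, best_moves = e, "".join(moves)
    return best_e, best_moves


print(lowest_energy_fold("HPPH"))
```

    The number of conformations grows exponentially with chain length, which is one way to see why the search problem is hard even when the thermodynamic principle itself is simple.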

    Although the native structure of most proteins is their minimum energy state, many proteins cannot find that minimum state alone. As others have noted, many proteins require chaperone proteins to aid in their folding. For example, membrane proteins require help from chaperone proteins in order to be inserted into the membrane prior to folding. Furthermore, as @gravenewworld noted, post-translational modification can also affect the folding of proteins. Moreover, although the minimum energy state of an isolated protein is its native fold, aggregation of multiple proteins together often creates an even lower energy state, forming the basis for amyloid formation in various neurodegenerative disorders like Alzheimer's and Parkinson's.

    Really? I'd be interested in seeing the citation for this figure.
  9. Nov 15, 2014 #8
    Ouch, you asked for experts. Well, I am not one, but I can tell you from my layman interest in astrobiology that understanding some of protein folding is important there too. How did emergent life handle that aspect of proteins?

    If we look at the ribosome and its function, protein folding is often automatic (even if there are derived exceptions). In fact, this goes so far that ribosomes can attach to membranes to inject membrane proteins for self-folding.

    And we see from tentative phylogenetics of the ribosome that it started out as a generic dimer cofactor producer, proceeded to make unordered protein nests for catalytic metal atoms, and only later evolved a code and hence started to rely on self-folding, as expected. ["Evolution of the ribosome at atomic resolution", Petrov et al]

    Therefore it would be no surprise if protein folding has been under selection for the entire lifetime of the genetic code. Though I don't know how it has been received, there is work that seems to confirm that:

    "A Rice team led by biophysicists Peter Wolynes and José Onuchic used computer models to show that the energy landscapes that describe how nature selects viable protein sequences over evolutionary timescales employ essentially the same forces as those that allow proteins to fold in less than a second. For proteins, energy landscapes serve as maps that show the number of possible forms they may take as they fold."

    "The research reported today in the Proceedings of the National Academy of Sciences shows that when both of the Rice team's theoretical approaches—one evolutionary, the other physics-based—are applied to specific proteins, they lead to the same conclusions for what the researchers call the selection temperature that measures how much the energy landscape of proteins has guided evolution. In every case, the selection temperature is lower than the temperature at which proteins actually fold; this shows the importance of the landscape's shape for evolution.

    The low selection temperature indicates that as functional proteins evolve, they are constrained to have "funnel-shaped" energy landscapes, the scientists wrote."

    "The key is the selection temperature, which Onuchic explained is an abstract metric drawn from a protein's actual folding (high) and glass transition (low) temperatures. "When proteins fold, they are searching a physical space, but when proteins evolve they move through a sequence space, where the search consists of changing the sequence of amino acids," he said.

    "If the selection temperature is too high in the sequence space, the search will give every possible sequence. But most of those wouldn't fold right. The low selection temperature tells us how important folding has been for evolution."

    "If the selection temperature and the folding temperature were the same, it would tell us that proteins merely have to be thermodynamically stable," Wolynes said. "But when the selection temperature is lower than the folding temperature, the landscape actually has to be funneled."

    "If proteins evolved to search for funnel-like sequences, the signature of this evolution will be seen projected on the sequences that we observe," Onuchic said. The close match between the sequence data and energetic structure analyses clearly show such a signature, he said, "and the importance of that is enormous.""

    [ http://phys.org/news/2014-08-eons-seconds-proteins-exploit.html ; my bold]

    I know this contradicts some of what has been recounted here, but if evolution and folding can be tied together it looks promising to me. It certainly fits the ribosome history.
  10. Nov 15, 2014 #9
    I'm curious. "Reductionist" is a philosophic concept. How do you envision it as a feature and as a problematic feature of modeling?

    It seems to me (though I may be mistaken) that you imply it means making approximations at the beginning of modeling (e.g. leaving out glycosylation and other post-translational modifications). But KISS is a feature, not a bug, of the modeling approach.

    [It may, or may not, be a coincidence that I previously posted a paper claiming that the approximation will suffice in most cases. :p

    And the provided references seem to agree: they see glycosylation as affecting not the folding itself but the stability and function of the folded protein. Unless I am mistaken after taking only a cursory glance.]
    Last edited: Nov 15, 2014
  11. Nov 15, 2014 #10
    This is the best one I could find:

    Among all the PTMs, glycosylation, which generally involves the covalent attachment of glycans to Ser/Thr/Asn residues, is predicted to occur in 80-90% of all extracellular and nucleocytoplasmic proteins and thus it is probably the most abundant and structurally diverse 1,2


    Dr. Hart (author) is one of the leading experts in this field and claimed 80-90% in that paper, but that paper was probably written in 2013. I've since been to numerous talks given by Dr. Hart, and each and every year as mass spec techniques improve, his estimate keeps going up and up, and it is quite likely that >90% of all proteins are in some way shape or form glycosylated. Just 5-8 years ago, the estimate for the number of proteins that are glycosylated was at 50%, but in that short time span, the number now believed to be glycosylated has almost doubled.

    True, this classic experiment works, but there is one extremely important caveat--it was done on an intracellular enzyme. Intracellular proteins are almost never modified by N- and O-linked glycan structures, which are the most complex types of glycosylation structures and get added to proteins bound for secretion or for embedding in the cellular membrane. Virtually all cellular membrane proteins are glycosylated, and those glycan structures will profoundly influence their folding, trafficking, cell surface half-life and physiological function. Glycosylation is a huge reason why cell membrane proteins can be diabolically difficult to clone--you simply do not get the right type of glycosylation structures on membrane proteins in a yeast-based cloning system, and those proteins will often fail to fold correctly. A therapeutic like EPO that is made in plants only worked once scientists engineered entire glycosylation pathways into the plant genome to make properly folded and glycosylated EPO. So, even though protein folding from sequence alone may work for an intracellular enzyme, cell surface proteins and secreted proteins are a whole different beast--and many extremely important therapies must utilize or work through cell membrane proteins (50% of all current drugs target GPCRs).

    However, there is still one other catch. The ultimate goal of protein folding is probably, I'm guessing, the ability to predict function from form, and this is where things are not straightforward at all. Yes, an intracellular enzyme may fold correctly based on its amino acid sequence, and it is not N- or O-linked glycosylated; however, there is one other class of glycosylation that is extremely special--the addition of a single sugar, N-acetylglucosamine (O-GlcNAc), to Ser/Thr sites, which can act in a yin-yang relationship with phosphorylation (and O-GlcNAc plus N- and O-linked glycans are the reason people now say >90% of proteins are modified by sugars in one way or another). Phosphorylation, as many scientists know, acts as an on/off switch for proteins, so if you correctly predicted the folding of an intracellular protein you could probably still guess its function if all you thought about was phosphorylation, since phosphorylation only serves as an on/off mechanism. However, the addition of O-GlcNAc is much different. Not only can it serve as an on/off regulatory mechanism just like phosphorylation, O-GlcNAc can endow the same protein with completely different capabilities. For examples of what O-GlcNAc can do see:

    (phos. and O-GlcNAc sites do not always overlap, which indicates that O-GlcNAc alone can endow proteins with different functions--and even the NIH believes in this idea--the lab that received the largest amount of NIH money out of all of Harvard is dedicated to studying O-GlcNAc).

    Thus, even if you have a correct prediction of protein folding for an *intracellular* protein, you probably still will not be able to decipher protein function from form alone, because things like O-GlcNAc modification of proteins can radically alter protein activity--and these physiological changes due to PTMs, or even where and when a PTM like O-GlcNAc will occur on your protein, cannot be predicted based on protein sequence.
  12. Nov 15, 2014 #11

    Which *types* of proteins get modeled? Intracellular proteins, which contain no N- or O- linked glycan structures may be easier to model in contrast to cell membrane/secreted proteins (see above). See how glycans affect folding of EPO


    (just to highlight that when it comes to modeling, you must be very careful...what works for an intracellular enzyme will probably not work for a cell membrane protein because something like glycans for cell membrane proteins are much more complex compared to the sugars on intracellular proteins).

    And even if you do correctly predict protein folding of an intracellular enzyme, you are still not guaranteed at all that you can correctly predict protein function from form, due to the 400 other PTMs that modify function in a completely unpredictable way that responds to physiological cues. This is why I say 'reductionist'.
  13. Nov 15, 2014 #12


    Homework Helper
    Gold Member

    I am way out of date on this, but about ten years ago there was a regular and keen competition in which guys would announce they had solved a 3D structure of some protein but withheld it, and challenged the competitors to predict it. Everyone used their computer algorithms of various sorts, except there was one guy who didn't use computers at all but physicochemical reasoning and intuition. He was doing about as well as the most successful.
  14. Nov 16, 2014 #13


    Science Advisor

    Thanks for the reference and explanation, I didn't know GlcNAcylation was so common for nuclear and cytoplasmic proteins. Maybe I should check to see if the enzyme I study gets glycosylated.

    Actually, the Anfinsen experiment was done on RNase A, a secreted protein.

    According to your reference, "Biophysical and functional characterization of the nonglycosylated Escherichia coli-expressed EPO have already demonstrated that it is well folded, has the same in vitro biological activity as the Chinese hamster ovary cell-derived glycosylated EPO (WT-EPO), but is thermodynamically less stable and tends to precipitate after short-term storage at 20 °C," suggesting that EPO can fold perfectly well without glycosylation. Glycosylation likely modifies the properties of the protein (the kinetics of folding, the stability, and its interactions with other proteins), but is probably not essential for defining the fold of the protein. For example (and you can correct me if I am wrong, because you're the expert on glycobiology), many of the glycosylation reactions associated with the secretory pathway act as sensors (not effectors) of protein folding. That is, the glycosylation enzymes check to see whether a protein has correctly folded before glycosylation can occur, ensuring that only correctly folded enzymes get trafficked out of the ER.

    Indeed, because protein folding is thought to occur co-translationally in most cases, most nuclear and cytoplasmic proteins will define their structure largely in the absence of PTMs. PTMs will likely effect conformational changes, but the overall fold of the protein can likely be predicted without accounting for PTMs.

    However, your point that knowledge of PTMs is essential for predicting function is completely on point. Furthermore, as you note, the folding of membrane proteins is much more complicated and perhaps a failure to account for PTMs is one reason why most efforts to predict membrane protein structure are mostly unsuccessful (though the relative lack of membrane protein structures to train force fields on is likely a bigger factor).

    Yes, you are referring to the Critical Assessment of protein Structure Prediction (CASP). The Science review I posted earlier discusses the results from the past CASP competitions and charts in Figure 4 the progress the field has made since the first CASP competition in 1994.
  15. Nov 18, 2014 #14
    You should PM me your protein of interest. I could check to see if it is modified by O-GlcNAc, and if it is unknown whether it is or not, I can give you some ideas for probing that. The probability that your protein (intracellular, I assume) is modified by O-GlcNAc is very high, and publications showing that a new protein is O-GlcNAcylated tend to be very high impact. PM me if you are interested in references for protocols on how to probe for O-GlcNAc.

    True. For some reason I always mistakenly think that Anfinsen did his work on DNA polymerase and can never remember it is RNase A.

    In my opinion, it is still not straightforward. At elevated temperatures and at protein concentrations reaching 300 mg/ml, folding becomes unfavorable in live cells. Anfinsen was able to get his protein to fold, but surprisingly it was still less efficient than what happens in a cell. Cells must be using some alternative mechanisms to assist them with protein folding in an unfavorable environment.

    Right. Glycosylation can affect the kinetics of folding for EPO (which was my intent in linking to that article). Maybe I should have made it clearer that I meant to show that, for a protein like EPO, glycans can drastically alter the kinetics of folding. While EPO may have enough information to fold correctly on its own, the kinetics would be too slow without glycosylation and could possibly lead to a buildup of unfolded proteins in the ER, causing cell death, so I'd still argue that even from a kinetic standpoint, glycosylation is fundamentally important for the way proteins fold in vivo (what's the point of modeling a process that is going to happen too slowly and doesn't happen in vivo?).


    And just to quote from that article so that everyone doesn't have to read it all about how glycans are inherently linked to protein folding :

    On top of providing the complement to which lectin chaperones bind to regulate protein folding, glycans, as the article implies, directly alter the way proteins fold (beyond kinetics, e.g. by masking hydrophobic stretches). Another example is the use of tunicamycin, which blocks the addition of N-linked glycans, to affect folding of hemagglutinin--the glycoprotein that influenza uses to infect cells:

    http://www.ncbi.nlm.nih.gov/pubmed/2738090 (when glycosylation is blocked, all of the hemagglutinin precursor becomes unfolded).

    Agreed that predictions for protein folding might work for intracellular proteins (and for secreted proteins such as RNase A that are non-glycosylated, though these are rare); however, 1/3 of all proteins (according to the 1st article) must go through the secretory pathway. Many, many, many therapeutic targets (such as the entire class of GPCRs) and almost all proteins pathogens use to get inside the cell are on the cellular surface, and almost all are undoubtedly modified by sugars and glycans, which, during the formation of glycoproteins, are going to affect their folding in many ways (not just kinetics). Thus, what works for predicting how an intracellular protein is going to fold will probably not work for all proteins.
  16. Dec 1, 2014 #15


    Gold Member

    The results in that paper seem quite impressive. But what is the future of such simulations? Could they be used to map trajectories that are common to all proteins in phase space (the energy landscape)?
  17. Dec 3, 2014 #16


    Science Advisor

    Yes, the main advance in that Science paper was that they were able to observe the likely folding trajectories of the proteins and find that most of the proteins fold along a single, dominant route whereas before some had hypothesized that proteins could travel on a number of different folding trajectories to reach the same native state. Of course, the study looked only at very small, single-domain proteins that have been experimentally shown to fold rapidly without stable intermediates, so more research is certainly required to determine whether this principle holds for other types of proteins.

    An area where these simulations will be extremely useful (which has so far been lacking in this discussion) is the area of protein dynamics. The prevalence of crystal structures gives us the false notion that proteins are static entities with a single native structure. In reality, the native state of a protein is more accurately described as an ensemble of interconverting states at the bottom of a broad well in the protein's energy landscape. The breathing motions and other conformational changes that occur are often important for protein function. Indeed, the rate-limiting step of many enzymes is not performing the chemistry, but rather a conformational change in the enzyme that occurs during the catalytic cycle. The fact that most attempts at computational enzyme design focus only on designing static structures could explain why these attempts have not yet achieved the same catalytic efficiencies as natural enzymes. Protein dynamics are also important for understanding protein-ligand interactions, which has important implications for drug design. Indeed, scientists often want a drug that targets a single member of a large family of proteins that all have similar active sites (e.g. protein kinases). Research has demonstrated that in the case of the kinase inhibitor Gleevec (one of the most successful anticancer drugs), its ability to distinguish between related kinases and inhibit only the Abl kinase comes from conformational changes (http://www.nature.com/nsmb/journal/v21/n10/full/nsmb.2891.html) that occur after drug binding. MD simulations could aid drug design by helping to understand how ligand binding alters the energy landscape of proteins and thus their dynamics.
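
    The "ensemble of interconverting states" picture can be made a bit more concrete with Boltzmann weights: at equilibrium, a state with free energy E_i is populated in proportion to exp(-E_i / kT). The sketch below is only a minimal illustration; the function name and the example energies are made up, and a real protein has vastly more states than four.

```python
import math

K_B = 0.0019872041  # gas constant in kcal/(mol*K), i.e. Boltzmann constant per mole


def boltzmann_populations(energies_kcal, temperature_k=298.15):
    """Equilibrium fractional populations of states with the given free energies."""
    beta = 1.0 / (K_B * temperature_k)
    e_min = min(energies_kcal)  # shift by the minimum for numerical stability
    weights = [math.exp(-beta * (e - e_min)) for e in energies_kcal]
    z = sum(weights)  # partition function (relative to the lowest state)
    return [w / z for w in weights]


# Hypothetical free energies (kcal/mol) of four conformational states.
for e, p in zip([0.0, 0.5, 1.0, 5.0], boltzmann_populations([0.0, 0.5, 1.0, 5.0])):
    print(f"{e:4.1f} kcal/mol -> {p:.3f}")
```

    States within roughly kT (about 0.6 kcal/mol at room temperature) of the minimum all carry substantial population, which is why a single crystal structure can be a misleading summary of the native state.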

    Here's a recent review from the DE Shaw group that wrote that Science paper further discussing the past, present, and future of atomic-level simulation of biomolecules.
    Last edited by a moderator: May 7, 2017
  18. Dec 10, 2014 #17
    Thanks for the response! Sorry I haven't had time to get around to this earlier.

    But I'm sorry, the fact that different models have problems to differing degrees doesn't seem to me to point to an inherent problem with the scientific method of studying simplifications, parts of problems, et cetera. It has worked well so far.