Can computational models accurately predict cellular phenotypes from genotype?

Ygggdrasil · Jul 21, 2012

Researchers at Stanford University and the J. Craig Venter Institute (the group who made the first "synthetic" bacterium) have produced a computational model that tracks the activities of every single gene in the bacterium Mycoplasma genitalium during its cell cycle (the researchers chose to model this bacterium because it has the smallest number of genes of any known organism). Here's the abstract from their paper, published in the journal Cell:

Understanding how complex phenotypes arise from individual molecules and their interactions is a primary challenge in biology that computational approaches are poised to tackle. We report a whole-cell computational model of the life cycle of the human pathogen Mycoplasma genitalium that includes all of its molecular components and their interactions. An integrative approach to modeling that combines diverse mathematics enabled the simultaneous inclusion of fundamentally different cellular processes and experimental measurements. Our whole-cell model accounts for all annotated gene functions and was validated against a broad range of data. The model provides insights into many previously unobserved cellular behaviors, including in vivo rates of protein-DNA association and an inverse relationship between the durations of DNA replication initiation and replication. In addition, experimental analysis directed by model predictions identified previously undetected kinetic parameters and biological functions. We conclude that comprehensive whole-cell models can be used to facilitate biological discovery.

Karr et al. 2012. A Whole-Cell Computational Model Predicts Phenotype from Genotype. Cell, 150: 389. http://dx.doi.org/10.1016/j.cell.2012.05.044
NY Times summary: https://www.nytimes.com/2012/07/21/...entire-organism-is-simulated-by-software.html

The model, of course, is far from perfect and even the authors acknowledge that the model represents more of a "first draft" than a complete model. Indeed, M. genitalium is not particularly easy to work with experimentally, so much of the data used to model the genes in M. genitalium came from studies of similar genes in other bacteria. Nevertheless, the paper represents an interesting proof of principle that could lead to some very interesting work in the future.

Q_Goest · Jan 25, 2014

Hi Ygggdrasil
Sorry for digging up an old thread but this research became a topic for Scientific American in this month’s (Jan ’14) issue. It was the first I’d heard of it. Really interesting article and it goes into a whole lot more detail about the software these guys came up with. Very interesting stuff. If you haven’t seen anything about this recently, there seems to be a lot out there now.

The Scientific American article is here: SciAm Simulating a Living Cell
http://www.scientificamerican.com/article/scientists-successfully-model-a-living-cell-with-software/

Unfortunately, that web page requires $$ so here’s another site I thought was interesting on ZMESci:
http://www.zmescience.com/medicine/genetic/computer-model-simulation-bacteria-31243/

But the papers they published are online here:
The Dawn of Virtual Cell Biology
http://www.sciencedirect.com/science/article/pii/S0092867412008197

A Whole-Cell computational model predicts phenotype from genotype
http://www.sciencedirect.com/science/article/pii/S0092867412008197

There’s also a very interesting discussion posted on YouTube here:

<Does that YouTube link work? Search on "dawn of virtual cell biology" if it doesn't. I watched it from the start and thought it was an excellent discussion on this software model.>

A couple of things that I’ve noticed. The NY Times article you posted states:

But Dr. Covert said: “Where I think our work is different is that we explicitly include all of the genes and every known gene function. There’s no one else out there who has been able to include more than a handful of functions or more than, say, one-third of the genes.”

On the YouTube presentation at 1:03:30, a woman asks a question that addresses this. The man’s response was that the number of genes whose functions they have included is 75%. The response also goes into some other interesting aspects of how accurate he feels this model is. That is what really struck me about the model they are using. In ZMESci, they talk about this being a “bio-CAD” model. I think they meant something analogous to FEA or CFD, but that aside, I think there’s another great question around why certain assumptions were made for this model, and how those assumptions differ from FEA or CFD or any other computational model I know of. There are two parts to that question.

Why did they only include a single volume of space? In engineering we might call it a control volume, especially a “differential control volume”.
Why did they use 1 second as the time step?

They address both of these questions to some degree in the YouTube video. I want to unpack the first issue just a bit. In engineering, we use FEA and CFD to model structures and fluids respectively, not “CAD”. I want to start there. Those types of programs, and every other computational modeling program I know of, including those that model such things as the informational qualities of the brain, are essentially FEA type programs. They all break up some chunk of our universe into discrete volumes of space and then they take the differential equations that describe the interactions and linearize them. That holds true for models of neurons such as “Neuron” and “Genesis”. The Navier-Stokes equations are generally used for CFD analysis, for example, just as the Hodgkin-Huxley models are used for neuron models.

What struck me was that this model doesn’t have any spatial dimensions whatsoever, as near as I can see. There is something mentioned about there being 6 segments or something like that in the YouTube video, but I wasn’t sure what that really meant. There is some mention in the video, however, of the spatial dimensions. I’m going from memory, so pinch of salt here: somewhere early in the discussion they talked about the cell being small enough that it didn’t matter. The cell is one micron across, so that’s pretty small. I thought of it this way: for movement across half the cell diameter in one second, we get about .000 02 in/sec = .0012 in/min = .071 in/hr = .000 001 mph. If you like metric units, 1.8 mm/hr. I would think that would be slow enough for Brownian motion to move every molecule around in a haphazard way, but I also think that would be a bad assumption. Still, I think you need to make some kind of assumption like that to conclude that spatial dimensions do not matter.
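The unit conversion can be checked with a few lines of Python. This is only a sketch of the arithmetic, assuming the 1 micron cell diameter quoted above, so half a diameter per second is 0.5 µm/s; the function name `speed_table` is made up for illustration.

```python
# Convert "half a cell diameter per second" into other speed units.
# Assumption: the cell is 1 micron across, so the distance is 0.5 micron.
MICRON_IN_M = 1e-6
INCH_IN_M = 0.0254
MILE_IN_M = 1609.344


def speed_table(distance_um, seconds=1.0):
    """Return the same speed expressed in several unit systems."""
    m_per_s = distance_um * MICRON_IN_M / seconds
    return {
        "in_per_s": m_per_s / INCH_IN_M,
        "in_per_min": m_per_s / INCH_IN_M * 60,
        "in_per_hr": m_per_s / INCH_IN_M * 3600,
        "mph": m_per_s * 3600 / MILE_IN_M,
        "mm_per_hr": m_per_s * 3600 * 1000,
    }


table = speed_table(0.5)
for unit, value in table.items():
    print(f"{unit:>10}: {value:.3g}")
```

Whatever units are used, the point is the same: the required net speed is tiny compared with everyday intuition about motion.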

Any thoughts on the first issue?

On the second issue, how long the time step should be, I would think that would be a function of the first question to some degree. The question was, however, addressed by the woman in the YouTube video, and basically she said the time step for this model is 1 second because it was assumed that molecules could “find each other” within that period of time. They claim not to have “played around” with this time step too much. The time step problem was one that they chose to ignore. Discussion of this time step occurs during the Q&A period starting at 0:49:00.

Any thoughts at all on the second issue?

Thanks for any insights you may have. I realize these probably aren’t easy questions to answer even if you’re on the team, working on the project.

Pythagorean · Jan 26, 2014

There was some research showing that the cytosol of bacteria is "glass-like", meaning that, as a function of metabolic activity, the cytosol can become more solid-like or more fluid-like. This might have interesting implications for the simplifications you can make in modelling them, since it lends some kind of spatial stability to local regions of the bacteria.

https://www.physicsforums.com/showthread.php?t=730082

Ygggdrasil · Jan 26, 2014

Regarding the lack of spatial organization in the model, there are reasons to believe their claim that you can treat the bacterium as a single volume, and reasons to think that their model would benefit from adding spatial parameters. First, according to the BioNumbers website, the average diffusion coefficient of a protein in cytoplasm is 5-15 µm²/s, so traversing the 2-4 µm length of a bacterium takes on average ~ tens of milliseconds. Of course, the Christine Jacobs-Wagner paper that Pythagorean cited (along with other recent research) shows that diffusion and transport in bacteria are not so simple, so these back-of-the-envelope calculations may not always be applicable.
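For anyone who wants to redo the back-of-the-envelope estimate, here is a minimal sketch. It assumes the standard mean-squared-displacement relation for free diffusion, ⟨x²⟩ = 2·d·D·t in d dimensions, which is my choice of formula and not necessarily the one used for the figure above; the function name is hypothetical. Depending on which end of the quoted ranges you pick, the result runs from tens to a few hundred milliseconds, in every case below the model's 1 s time step.

```python
# Rough time for a protein to diffuse across a cell, assuming free
# diffusion with <x^2> = 2 * dims * D * t (an assumption; real cytoplasm
# is more complicated, as the glass-like cytosol work suggests).
def diffusion_time_s(length_um, d_um2_per_s, dims=3):
    """Time in seconds for the RMS displacement to reach length_um."""
    return length_um ** 2 / (2 * dims * d_um2_per_s)


# Ranges quoted in the post: L = 2-4 um, D = 5-15 um^2/s.
estimates = {
    (length, d): diffusion_time_s(length, d)
    for length in (2.0, 4.0)
    for d in (5.0, 15.0)
}

for (length, d), t in sorted(estimates.items()):
    print(f"L = {length} um, D = {d} um^2/s -> {t * 1000:.0f} ms")
```

Because the time scales with the square of the distance, a cell ten times longer would need roughly a hundred times longer to mix by diffusion alone.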

A bigger problem is the fundamental assumption that spatial organization does not matter. While as late as ~10 years ago, most scientists would have probably said that bacteria are essentially just a bag of enzymes with little spatial organization, recent studies of bacteria have demonstrated that, like that of eukaryotes, the cellular machinery of bacteria is also highly organized (for example, see this recent review article from Lucy Shapiro at Stanford). However, we still know very little about the spatial organization of bacteria and how it affects their physiology. Furthermore, there are many conflicting results in the field, and there still remains little consensus about the right ways to think about spatial organization in bacteria. Thus, it's difficult to know how to correctly model anything about it (part of the reason, I suspect, Covert and colleagues omitted any spatial parameters in their model).

Part of the reason why we lack information about the spatial organization of bacteria is their small size. Bacteria are ~1-2 microns in size whereas conventional light microscopy (which has been such an important tool in studying the organization of eukaryotic cells) has a resolution of only ~0.2 µm at best (and for technical reasons it is difficult to gather useful information from higher resolution imaging techniques such as electron microscopy). Recent advances in superresolution microscopy, light microscopy that breaks the diffraction barrier and allows imaging at the nanometer scale, have finally enabled scientists to observe subcellular structures in bacteria (here's a good example of a study that applies superresolution imaging to discover new levels of spatial organization in bacteria).

On the question of timescale, from what I remember of the paper (it's been a while since I read it in detail), they're focused mainly on modeling metabolic processes in the cell. Because metabolic processes are chemical reactions, and these reactions generally proceed on the second timescale, a one second step size seems reasonable (combined with the estimate above that diffusion across the cell would occur in tens of milliseconds).

Q_Goest · Jan 26, 2014

Thanks both of you for the comments. Just one other thing then. I guess much larger cells, especially something like a neuron, might require many discrete volumes to properly model. I could envision something like an FEA type program being developed for analyzing cells though it sounds like we're a long way off because of the huge amount of information we don't have. Not only would the cell need to be modeled, but adjoining cells would also need to be modeled in some way, either using boundary conditions or by modeling those cells as well. Perhaps a large model similar to the Blue Brain project might be required. Do you think it's reasonable to expect this kind of computational cell modeling to expand like that? Is there any reason larger cells might NOT be modeled using a large number of separate volumes?

Pythagorean · Jan 27, 2014

From a modeling perspective, there's always a problem of generality vs. specificity. The more general models are usually simpler but make more simplifying assumptions that are "more true" or more often "true in the limit", so they fit some kind of Platonic ideal of the system.

In reality, we know the systems exist as a distribution, so there is really no ideal system (Eve Marder's work). So we like to include some specificity, but if you go too far in that direction then you're taking particular experimental factors too seriously. Experiments represent a limited set of systems in a limited number of states (whatever it takes to get experimental data). This is analogous to overfitting in parameter optimization (whereas going too general is more analogous to underfitting).

So there's a fine balance between the two that you have to find in any model, and this is especially true of spatially extended models. Most of modelling (for me, anyway) is determining where you lie on the spectrum: what data you should take seriously, and what data is less "representative". That takes a lot of data from a lot of different cells.

It also takes thinking critically about the context of the data and how it compares to something in nature. For example, the Eve Marder paper means that if you just take averages of every parameter value from the experiment, your model could easily fail. You have to maintain the pair relationship of parameters for each cell (or spatiotemporal compartment) and experimentalists don't always do that: they tend to save the averages (I don't know the status of this with bacteria).
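A toy illustration of why averaging can fail, in Python. This is not Marder's model: the rule that a "cell" works only when the product of two conductances stays near a target is an invented stand-in for a compensating pair of parameters, and all names and numbers here are hypothetical.

```python
# Hypothetical toy: a "cell" behaves physiologically only when two
# parameters compensate each other, modelled here as g_a * g_b ~ target.
def is_physiological(g_a, g_b, target=1.0, tol=0.1):
    return abs(g_a * g_b - target) <= tol


# Made-up measurements from five cells, kept as (g_a, g_b) pairs.
cells = [(0.25, 4.0), (0.5, 2.0), (1.0, 1.0), (2.0, 0.5), (4.0, 0.25)]

# Averaging each parameter separately discards the pairing.
mean_a = sum(a for a, _ in cells) / len(cells)
mean_b = sum(b for _, b in cells) / len(cells)

all_individuals_ok = all(is_physiological(a, b) for a, b in cells)
average_ok = is_physiological(mean_a, mean_b)

print("every measured cell works:", all_individuals_ok)
print(f"averaged cell ({mean_a}, {mean_b}) works:", average_ok)
```

Every individual pair satisfies the rule, yet the cell built from the two averages does not, which is why keeping the per-cell pairing matters.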

There is an alternative idealization to breaking a cell into several spatial components, and using boundary conditions: if you take the volume of each compartment to go to zero in the limit, you make a continuously differentiable space, so your model becomes a partial differential equation (much like the wave equation has a spatial extension without using compartments). We usually can't do that because we don't know what the equation would look like. Because of the inherent complexity, we often have to take the computational/numerical approach instead.
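To make the compartment idea concrete, here is a minimal sketch in Python for the one case where we do know the continuous equation: plain diffusion along a 1D cell with closed ends. The cell length, diffusion coefficient and function names are assumptions chosen for illustration, and the explicit Euler scheme is just the simplest option, not anyone's published method.

```python
# 1D compartment model of diffusion: n equal compartments exchange
# material with their neighbours at rate D / dx^2 (closed ends).
# As n grows, this approaches the diffusion PDE dc/dt = D * d2c/dx2.
def simulate_compartments(n, length_um=2.0, d_um2_s=10.0, t_end=0.5):
    dx = length_um / n
    rate = d_um2_s / dx ** 2        # neighbour exchange rate, 1/s
    dt = 0.2 / rate                 # explicit Euler needs rate * dt <= 0.5
    steps = int(round(t_end / dt))
    conc = [0.0] * n
    conc[0] = float(n)              # everything starts at one end; mean = 1
    for _ in range(steps):
        new = conc[:]
        for i in range(n - 1):
            flux = rate * dt * (conc[i] - conc[i + 1])
            new[i] -= flux
            new[i + 1] += flux
        conc = new
    return conc


def left_half_fraction(conc):
    """Fraction of all material sitting in the left half of the cell."""
    return sum(conc[: len(conc) // 2]) / sum(conc)


# Coarse and fine compartment models at an early time (20 ms).
coarse = left_half_fraction(simulate_compartments(4, t_end=0.02))
fine = left_half_fraction(simulate_compartments(40, t_end=0.02))
print(f"left-half fraction: 4 compartments {coarse:.3f}, 40 compartments {fine:.3f}")

# After 0.5 s the profile is essentially flat.
late = simulate_compartments(20, t_end=0.5)
print(f"max deviation from uniform after 0.5 s: {max(abs(c - 1) for c in late):.2e}")
```

Comparing the coarse and fine runs is one way to judge whether a given compartment size is "good enough" in the sense described above; for a real cell the hard part is that the exchange rules between compartments are not known this cleanly.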

Of course, the hope is that, for appropriate volume size and boundary conditions, your model reproduces the behavior of the continuous limit to good enough approximation to make predictions, or confirm beliefs about underlying mechanisms.

So from my perspective, the summarized answer to your question "Is there any reason larger cells might NOT be modeled using a large number of separate volumes?" is complexity. There are probably people trying to model it that way right now, but it will take a while for them to work it out. It also depends on question demand... does anybody care about the kinds of questions that a spatially extended model could answer or are the current pool of questions sufficiently approached with the spherical cow bacteria model?

Q_Goest · Jan 28, 2014

Thanks P. I think what you’re saying is that, given the variations in cell properties and experimental data such as described by Marder, making a model of a cell or group of cells isn’t necessarily going to provide realistic results for typical cells. Garbage in/garbage out? I read through Marder’s paper and a couple of things stuck out for me. First, Figure 2a through 2f and the description of those figures helped quite a bit. The “mean” and the “best” values for the parameters may or may not even be realistic I suppose.

In engineering, we generally look at basic principles to describe ‘parameters’ if you will. Parameters such as fluid properties, material stresses and other things are well understood and not variable in the sense those parameters are variable in biology. I guess my naïve logic made the assumption that these parameters such as in the cell model described in the OP, and the interactions between those parameters were essentially biological, chemical reactions that could be described using basic principles. So reaction rates and such things would be described using those basic principles, but it sounds like that’s not a good assumption. Is that a fair assessment? It sounds like a pretty severe condemnation of neuron modeling software.

Pythagorean · Jan 28, 2014

The "best" values are meant to be the most realistic, in the sense that they reproduce the physiology. The point is just that they aren't always the average, not that you can't model them. Sometimes the average leads to no physiological behavior at all. The blue dots in those Marder figures are where you get the physiological result you're looking for; sometimes, the average doesn't coincide with them.

You could add variability to the model and use something like mass-action kinetics to change parameters (that is one of the aspects of my current research) but that's a very complex process and it requires lots of data that experimentalists aren't always interested in collecting. So not a lot of us are doing it.

Q_Goest · Jan 29, 2014

Marder’s paper states:

If we were magically able to look into each of those nervous systems and measure the numbers and properties of the synapses, ion channels, receptors and enzymes in all of the individuals across the population, we would find real biological variation in most, if not all, of these parameters.

I don’t know what you might call those parameters, but I see them as being mostly structures in the cell, with the exception of enzymes. I don’t know how those parameters arise in the cell but I would think it’s primarily a function of DNA and secondarily of random chemistry in the cell as well as interactions at the cell boundary with the local environment.

Similarly, I would think that we could break up cells such as bacteria or neurons into structures and biochemistry. Structures I’m assuming will vary exceedingly slowly compared to the rate of chemical reaction, but the program mentioned in the OP does talk about the division of a cell and how that is modeled. Markus Covert talks about untangling the web of interlinked chemical reactions that make living cells tick and modeling those reactions in the program. So I see there being structure and this tangle of chemical reactions. The chemical reactions are dictated by concentrations, reaction rates as well as the basic structure of the cell. But again, I have little background, so I’ll ask, is Marder’s concern about this lack of knowledge of parameters in the cell something that could be addressed by creating realistic models of cellular structures and then of biochemical reactions? Would cellular models get around the parameter issues Marder is concerned about by limiting the model to specific cells and their specific biochemistry? If we model an actual cell and take into account its specific structure and biochemistry, doesn't Marder's concern disappear? I understand that might be problematic because of the size issues that Ygggdrasil points out in post #4.

As a side note, I see Marder’s paper also focuses on “unsuspected compensatory mechanisms that contribute to neuron and network function”. So while there seem to be compensatory mechanisms that result in two different neurons functioning in a similar way, I think that point is less important if we model actual cells and had some method to show that the model of a given cell mimicked the actual cell.

Side note number 2. (Sorry for the longwinded response from someone who isn’t a biologist!) But there was a paper by Destexhe who seems to have tried to model a specific neuron or neurons, similar to what I’m suggesting above. He seems to have tested actual neurons both in vivo and in vitro and showed how his models matched experiment, but I’m not sure if I may be misinterpreting something. Any thoughts on that paper?
