Stem cell differentiation as a many-body problem
See allHide authors and affiliations
Contributed by Peter G. Wolynes, May 9, 2014 (sent for review March 25, 2014)

Significance
A molecular understanding of stem cell differentiation requires the study of gene regulatory network dynamics that includes the statistics of synthesizing transcription factors, their degradation, and their binding to the DNA. Brute force simulation for complex large realistic networks can be computationally challenging. Here we develop a sound approximation method built upon theories originating from quantum statistical mechanics to study the network for embryonic stem cell development. Mechanistic insight is provided on the role of the master regulator Nanog in safeguarding stem cell pluripotency. We also demonstrate the significant influence of DNA-binding kinetics, an aspect that often has been overlooked, on the transition rate between gene network steady states.
Abstract
Stem cell differentiation has been viewed as coming from transitions between attractors on an epigenetic landscape that governs the dynamics of a regulatory network involving many genes. Rigorous definition of such a landscape is made possible by the realization that gene regulation is stochastic, owing to the small copy number of the transcription factors that regulate gene expression and because of the single-molecule nature of the gene itself. We develop an approximation that allows the quantitative construction of the epigenetic landscape for large realistic model networks. Applying this approach to the network for embryonic stem cell development explains many experimental observations, including the heterogeneous distribution of the transcription factor Nanog and its role in safeguarding the stem cell pluripotency, which can be understood by finding stable steady-state attractors and the most probable transition paths between those attractors. We also demonstrate that the switching rate between attractors can be significantly influenced by the gene expression noise arising from the fluctuations of DNA occupancy when binding to a specific DNA site is slow.
Understanding the underlying mechanisms of the differentiation of stem cells into many distinct cell types has long been a goal of developmental biology (1) and regenerative medicine (2). The epigenetic landscape, originally proposed by Waddington, has proved to be a useful metaphor for visualizing cellular dynamics (3). However, is it more than a metaphor? In this view, cell phenotypes are identified as attractors with well-defined patterns of robust gene expression, and differentiation occurs through transitions from the stem cell attractor to other attractors for the differentiated cells. It has become clear that for simple models, taking into account stochastic effects allows a well-defined landscape to be constructed (4⇓–6). This generalized potential landscape governs much of the gene regulatory network dynamics, such that corrections to such a landscape picture can also be defined and formalized (7, 8). It remains a challenge to construct such landscapes for realistically large network models.
Describing the stochastic dynamics of gene networks must include the statistics of synthesizing transcription factors, their degradation, and their binding to genes on the DNA. These all play crucial roles in shaping the epigenetic landscape (9). Due to the complexity of the assembly of the machinery involved for protein synthesis, it has often been assumed that protein translation is much slower than the diffusion-limited DNA-binding process. Under this view, the ensemble of DNA occupancies, i.e., the set of transcription factors bound at a gene’s regulatory elements, can be assumed to have achieved a quasi-equilibrium so that the expression dynamics of this gene can then be described as a birth–death process governed by an averaged protein production rate that depends nonlinearly on transcription factor concentrations (10). Although perhaps valid for some networks in prokaryotic cells, this assumption on timescales is probably not completely adequate for eukaryotic systems because the processes of chromatin decondensation and unwrapping of DNA from nucleosomes, both of which are required for proteins to bind, take time. The high-level architecture of chromatin can severely limit the access of transcription factors to the DNA, slowing down DNA-binding kinetics (11, 12) and resulting in gene expression noise deviating from Poisson statistics (13). Even when the transcription factors function as pioneers that can directly bind with the chromatin sites occupied by the nucleosome, slow DNA binding (14, 15) is still a good approximation to describe the effect of the progressive change of the chromatin structure and histone modification induced by the pioneer factors on gene regulation (16). As a result, DNA binding must be treated on equal footing together with protein synthesis and degradation to fully understand eukaryotic gene regulation (14⇓⇓⇓–18).
By increasing the dimensionality of the problem, investigating the effects arising from slow DNA-binding kinetics on gene network dynamics becomes computationally challenging. Although being in some aspects conceptually transparent, where one can directly simulate the chemical reactions involved in a gene network using a Monte Carlo algorithm (19), such a procedure quickly becomes inefficient as the system complexity increases and is even impractical for studying rare, transient switching events between steady states that occur on long timescales. In recognizing the close analogy between gene networks and magnetic systems, Sasai and Wolynes suggested that analytical approaches originating from quantum statistical mechanics could be used to study the epigenetic landscape of networks and discussed the steady states of a very stylized model network with many attractors (4). Although their approach was efficient in identifying steady-state solutions, it remained to show how this approach can characterize transitions between different attractors. In the adiabatic limit where DNA binding is fast, analytical approaches to the transition process based on large deviation theory have proved successful in studying noise-induced transitions (7, 20⇓–22). Here we show how to find equivalent approaches for large networks where DNA binding must be treated explicitly.
In this paper, we generalize a kinetic model originally proposed by Sasai and Wolynes (4) to explicitly model DNA binding along with protein synthesis and degradation in large gene networks with multifactorial and complex switches. An approximation method is further developed that allows the construction of the network’s epigenetic landscape via the determination of not only steady-state solutions, but also most probable transition paths and stochastic switching rates between steady states. When applied to a model of the core transcriptional regulatory network of embryonic stem cells, these approximations turn out to explain the observed fluctuations in the expression of the master pluripotent regulator Nanog and the important role it has in safeguarding stem cell pluripotency against differentiation. We also demonstrate the striking effect that DNA-binding kinetics have on the switching rate between steady states, making clear the importance of explicitly modeling fluctuations of DNA occupancy in studying the fate of gene networks.
Theoretical Approaches for Gene Networks
Viewed as a molecular process, gene regulation is complicated, with many actors as well as attractors, and spans a large spatial and temporal domain. Several approximations are needed to make tractable mathematical models. We use an admittedly simplified model but one that contains many of the crucial features of the real embryonic stem cell network.
Constructing Schematic Kinetic Models for Gene Networks.
Following Sasai and Wolynes (4), we model gene regulation at a level that includes explicit binding and unbinding of transcription factors to the DNA, protein translation, and protein degradation. Both the mRNA intermediary and the serial nature of macromolecular synthesis are neglected in the present model but are likely important. Fig. 1A illustrates how we can diagram a mutual repression motif between genes Cdx2 and Oct4. The gene Cdx2 can be transcribed to proteins at a rate g, which will change depending on whether the transcription factors produced by the Oct4 gene are bound at the Cdx2 gene’s regulatory element. Here, we assume Oct4 proteins will interact with the Cdx2 regulatory element in the form of a homodimer and that they will bind with the regulatory element at a rate h and unbind with a rate f. Similarly, the Cdx2 proteins can also interact with the Oct4 gene’s regulatory element and regulate the production rate of Oct4 proteins. Proteins in our model are also degraded with a rate k. We do not explicitly model nucleosome dynamics and covalent chromatin modifications such as DNA methylation and histone acetylation and methylation and assume these affect gene expression only via regulating the kinetics of transcription factor binding to the DNA. Accounting for such epigenetic modifications may be necessary to fully appreciate the complexity of eukaryotic gene regulation, as emphasized by Sasai et al. (16). Simplified as arrows, this mutual repression motif can be represented as the diagram shown in Fig. 1A, Inset. Including interactions with other genes, a relatively realistic network for the embryonic stem cell can be diagrammed as in Fig. 1B. We emphasize that every link in Fig. 1B includes the series of reactions illustrated in Fig. 1A.
A schematic gene regulatory network for the embryonic stem cell. (A) Schematics for the detailed chemical reactions included in modeling the mutual repression between genes Oct4 and Cdx2. Inset simplifies the reactions into a diagram that is useful for illustrating the topology of complex networks. (B) Topology of the stem cell gene network. Each node represents either a transcription factor (single word) or a protein complex (two words) involved in early stem cell differentiation. Arrows that end at transcription factors represent transcriptional regulations with the mechanism shown in A. The green color is used for activation and the red color indicates repression. Arrows targeting protein complexes emerge from the monomers involved in forming the complex. Gene markers found in cells being in pluripotent states are colored in red and orange, and those found for differentiated state phenotypes are colored in blue.
Obviously many quantitative molecular details of even this simplified scheme are still unknown. In building a kinetic model for the gene network, further approximations are thus necessary and worthy of clarification. The resulting model still can thus be deemed only “stylized.” First, we assume all of the transcription factors, except the Oct4–Sox2 protein complex, will interact with the DNA sequence in the form of homodimers. Although many transcription factors are indeed found to function as dimers (23), it is unlikely that this is true for all of the proteins included in our current network. Second, many genes in the network are regulated by a multiplicity of transcription factors. Again, for simplicity and generality, we assume all of the different transcription factors occupy unique, nonoverlapping regulatory elements. Finally, because many of the kinetic parameters are not available at this point, all of the genes in the network are assumed to function with the same set of rates. We see then this is something like an “Ising model” for gene networks. Despite these simplifications, the significance of having such a specific yet stylized model cannot be overemphasized. As shown below, many concrete conclusions that appear insensitive to these simplifications can be drawn due to the robustness of the underlying gene network.
Master Equation for Network Dynamics.
As shown in ref. 4, to fully specify a gene’s state, knowledge of both the occupancy of the DNA site and the concentration of the proteins is required. For a gene that is regulated by only one transcription factor, the joint probability of the DNA occupancy and the number of proteins n can be conveniently written as a two-component column vector
Including the chemical reactions illustrated in Fig. 1A, the self-consistent field regulatory dynamics for a given gene m can be described via the following master equation:
Steady States and Most Probable Transition Paths.
We follow the approach of Doi (25) and Peliti (26) to rewrite the master equation in an operator form. Defining creation and annihilation operators such that
From Eq. 2, the transition probability
The steady states of the master equation can be found as attractors of the following deterministic dynamics that would extremize the action:
The most probable transition paths between steady states are determined from the Hamiltonian equations (28)
To highlight the effects of DNA-binding kinetics, i.e., the rate of fluctuations of DNA occupancy, on the epigenetic landscape, it is often convenient to compare results obtained in the adiabatic limit, in which DNA binding is considered to be fast, with the full results obtained directly from Eqs. 4 and 6. In the adiabatic limit, the state probabilities
Cell Types as Steady States of Gene Networks
Realistic models of the core transcriptional regulatory network for the embryonic stem cell have already been constructed by others (16, 32, 33). We illustrate the network used in this study in Fig. 1B. This core model consists of eight unique transcription factors and their genes interconnected via transcriptional activation (green) and repression (red) inputs. The triad of master pluripotency regulators, Oct4, Sox2, and Nanog, is highlighted in red. These regulators maintain stem cell pluripotency by activating a set of genes crucial for self-renewal and pluripotency (orange) while simultaneously suppressing differentiation genes (blue). Using the kinetic parameters given in SI Text, we determine the steady-state solutions of such a network and compare them with the gene expression patterns of known cell phenotypes.
Fig. 2A presents the steady-state gene expression profiles calculated in the adiabatic limit and labels these profiles using abbreviations of corresponding cell types, with SC standing for stem cell, PE for primitive endoderm, TE for trophectoderm, and DC for differentiated cell type. For the two steady states SC1 and SC2 found in the bottom of Fig. 2A, proteins for the pluripotency gene markers Oct4, Sox2, and Klf4 are expressed in high amounts, thus defining the corresponding phenotypes as stem cells. Compared with the steady-state SC2, SC1 has a much higher level of Nanog gene expression. The prediction of two distinct levels of Nanog gene expression in stem cell steady states agrees with the experimentally observed heterogeneous distribution of Nanog proteins from a population of stem cells (34⇓–36). In the remaining three steady states, the pluripotency gene markers are turned off, allowing the expression of differentiation genes such as Gata6 and Cdx2. These steady states thus have expression profiles characteristic of differentiated cells. In particular, consistent with primitive endoderm cells, the Gata6 gene is turned on in the steady-state PE. And consistent with trophectoderm cells, the Cdx2 protein is highly expressed in the steady-state TE. Finally, the network also supports the steady-state DC, in which both Gata6 and Cdx2 proteins are produced in large amounts. A similar expression pattern to this predicted steady state has been observed in Oct4 knockdown embryonic stem cells (37). Stability of the steady-state DC might be compromised in the presence of cell–cell communication, which is known to lead to mutual repression between Gata6 and Cdx2 genes (16). We note gene expression profiles consistent with the five predicted steady states in Fig. 2A have been observed in previous studies from the analysis of embryonic stem cell networks (15, 16, 32).
Steady-state solutions for the embryonic stem cell network. (A) Gene expression levels in the steady-state solutions identified for the network when its dynamics are modeled in the adiabatic limit. (B) Dependence of the number of steady-state solutions,
The observed five steady-state gene expression profiles are consistent with the network topology shown in Fig. 1B. The stem cell network contains three self-activating genes Nanog, Cdx2, and Gata6. The self-activation of Nanog arises indirectly via the intermediate gene Pbx1. Because it is known that bistability can arise in certain regimes for self-activating motifs (38), a total of three of these motifs will in principle give rise to
The steady-state solutions shown in Fig. 2A are obtained from calculations performed in the adiabatic limit, and the DNA occupancy states are assumed to achieve a conditional equilibrium. This assumption is, however, not always valid for eukaryotic cells because of slow chromatin dynamics (12, 14⇓–16). To investigate the effect of DNA-binding kinetics on the epigenetic landscape, we determined the number of steady-state solutions for the stem cell network over a wide range of DNA-unbinding rates. The results are presented in Fig. 2B.
Fig. 2B demonstrates that from the weakly adiabatic regime
Most Probable Transition Pathways
The excellent agreement between the steady-state gene expression levels shown in Fig. 2A and the gene expression levels of cell phenotypes known in the laboratory strongly suggests that cell types indeed do correspond to attractors on an epigenetic landscape. We next set out to test whether one can understand aspects of stem cell differentiation by studying the stochastic transition between attractors. In particular, we focus on studying the differentiation of stem cells into primitive endoderm cells and the role of the Nanog gene in regulating this differentiation process. We first determine the most probable stochastic transition path from the stem cell steady-state SC1 to the primitive endoderm steady-state PE in the adiabatic limit; then we discuss the effect of DNA-binding kinetics on the mechanism of this transition in the next section. Details regarding these calculations are provided in the SI Text and Tables S1–S3.
Fig. 3A presents the changing gene expression patterns along the most probable transition pathway from the steady-state SC1 to the steady-state PE. Starting from SC1, the pathway first transits to the steady-state SC2. During the transition from SC1 to SC2, the Nanog gene switches from an on state to an off state, accompanied by a coherent down-regulation in the expression of the Pbx1 gene. Little change is observed for the expression of other genes. As the pathway exits from the steady-state SC2, the expression level of the differentiation gene Gata6 starts to increase along with its antagonizing gene Nanog being turned off. Accumulation of Gata6 proteins leads to the inhibition of pluripotency genes such as Oct4 and Sox2, while in the meantime the decreased expression of pluripotency genes also diminishes their repressive effect on the Gata6 gene. This positive reinforcement leads to the final stable expression of the Gata6 gene in the steady-state PE.
The switching mechanism between steady states inferred from the most probable transition pathway. (A) Number of protein molecules along the most probable transition path from the stem cell steady-state SC1 to the primitive endoderm steady-state PE. The x axis indicates the progression along the transition and the y axis refers to different transcription factors. (B) The most probable differentiation pathway as in A plotted through the number of Nanog and Gata6 protein molecules. The steady-state solutions are shown as red circles. (C) Contributions to the action from individual genes along the most probable differentiation pathway from the steady-state SC1 to SC2 (Upper) and from the steady-state SC2 to PE (Lower).
The changing gene expression patterns shown in Fig. 3A suggest that the transition from the steady-state SC1 to PE occurs in two steps mediated by the intermediate-state SC2. This two-step transition mechanism becomes more apparent when the differentiation pathway is plotted using the expression levels of the Nanog and Gata6 genes, as shown in Fig. 3B. The three red circles correspond to gene expression levels in each of the three steady states, respectively. The transition pathway clearly transits through the steady-state SC2 before committing to the steady-state PE.
To further quantify the role of individual genes along the transition pathway, we decompose the transition action defined in Eq. 7 and use
The stochastic transition path shown in Fig. 3 provides a mechanistic understanding of the important role of the Nanog gene in safeguarding stem cell pluripotency. The calculations suggest that for a stem cell to differentiate, the network must transit through the intermediate steady-state SC2 by turning off the Nanog gene. Because the Nanog off state acts as a bridge that connects the stem cell attractor SC1 to the differentiated cell attractor PE, it is reasonable to expect that stem cells constructed with low expression of Nanog will be more likely to differentiate compared with those constructed having high expression of Nanog, as has been observed in various experiments (34, 35). A similar analysis on the transition from the steady-state SC1 to TE, as shown in Fig. S3, leads to consistent conclusions. We note transition through an intermediate state with low Nanog expression along the differentiation pathway has also been observed in previous studies (8, 14⇓–16, 39).
Effect of DNA-Binding Kinetics on Stochastic Switching
To quantify the effect of DNA-binding kinetics on the stochastic transition mechanism and transition rate, we calculate the most probable transition path between the steady-state SC1 and PE for various DNA-unbinding rates.
Fig. 4A compares the average expression of the two genes Nanog and Gata6 along the most probable transition pathways calculated at several different DNA-unbinding rates. The blue line is the path calculated in the adiabatic limit and is the same as the transition path shown in Fig. 3B. The yellow and red lines are obtained at the DNA-unbinding rates
The effect of DNA-binding kinetics on the switching pathway and switching rate between steady states. (A) Comparison of the most probable differentiation pathways from the steady-state SC1 to PE calculated at several different DNA-unbinding rates. The blue line is the same as the adiabatic result also shown in Fig. 3B. (B) Dependence of the transition action on the DNA-unbinding rate f. The black solid line is a numerical fit to the calculated actions (red circles), using the expression
On the other hand, Fig. 4B further compares the magnitudes of the transition actions along the most probable pathways calculated at different DNA-unbinding rates. The dashed line indicates the action value in the adiabatic limit. The black solid line is a numerical fit to the transition actions (red circles), using the expression
The results shown in Fig. 4 A and B can be understood from the “churning mechanism” proposed in ref. 38. In the weakly to strongly adiabatic regime, the DNA occupancy responds quickly to the change of the proteomic atmosphere and reaches a local steady state before the protein number changes by a large amount. The mean expression of a given gene is mostly determined using an effective synthesis rate averaged over the ensemble of DNA occupancy states and is relatively insensitive to the DNA-binding kinetics as shown in Fig. 4A. Unlike the mean, the fluctuations of gene expression, however, strongly depend on the DNA-binding kinetics. This fluctuation in gene expression can be quantified via a local diffusion rate in protein number D. In the adiabatic limit, the local diffusion
Discussion
The strong dependence of transition rates between steady states on the DNA-binding kinetics suggests a potential mechanism for the heterogeneous distribution of gene expression levels in embryonic cells (40), adult stem cells (41) and cancer cells (42). Many kinetic studies suggest that the stem cell gene network may function in a weakly adiabatic regime (16, 43), which according to Fig. S4 will allow fast switching between different steady-state gene expression profiles. Therefore, the experimentally measured gene expression levels from a population of cells may in fact represent, instead of gene expression from a single attractor, an ensemble mixture of different steady-state gene expressions, which naturally leads to a heterogeneous distribution of expression levels. In that respect, it is particularly encouraging to note that stem cells can be classified into distinct clusters, in different ones of which the gene expression profiles bear close resemblance to those found for restricted lineages (41, 44). The fast switching between attractors in the weakly adiabatic regime may be beneficial to allow developmental plasticity of stem cells. However, this tendency of switching can also be detrimental and can compromise the phenotypic robustness of differentiated cells. We therefore expect that cell fate may be fixed by a transition of gene network dynamics from the weakly to the strongly adiabatic regime during cell differentiation to suppress the switching probability, possibly via covalent modifications of the chromatin such as methylation and acetylation.
Acknowledgments
We thank Dr. Davit Potoyan, Dr. Masaki Sasai, Dr. Aleksandra Walczak, and Dr. Jin Wang for critical reading of the manuscript. B.Z. acknowledges help from Dr. Matthias Heymann on the implementation of gMAM for finding the most probable path. We acknowledge financial support by the D. R. Bullard-Welch Chair at Rice University and by the Center for Theoretical Biological Physics sponsored by the National Science Foundation (Grant PHY-0822283).
Footnotes
- ↵1To whom correspondence should be addressed. E-mail: pwolynes{at}rice.edu.
Author contributions: B.Z. and P.G.W. designed research, performed research, analyzed data, and wrote the paper.
The authors declare no conflict of interest.
This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1408561111/-/DCSupplemental.
Freely available online through the PNAS open access option.
References
- ↵
- Davidson E
- ↵
- ↵
- Waddington CH
- ↵
- Sasai M,
- Wolynes PG
- ↵
- Wang J,
- Xu L,
- Wang E
- ↵
- ↵
- ↵
- Wang J,
- Zhang K,
- Xu L,
- Wang E
- ↵
- ↵
- ↵
- ↵
- ↵
- ↵
- ↵
- Li C,
- Wang J
- ↵
- ↵
- ↵
- Potoyan DA,
- Wolynes PG
- ↵
- ↵
- ↵
- ↵
- Wang P,
- et al.
- ↵
- Chapman K,
- Higgins S
- ↵
- ↵
- ↵
- ↵
- Zhang K,
- Sasai M,
- Wang J
- ↵
- ↵
- ↵
- ↵
- ↵
- ↵
- ↵
- ↵
- ↵
- ↵
- ↵
- Walczak AM,
- Onuchic JN,
- Wolynes PG
- ↵
- ↵
- ↵
- ↵
- ↵
- ↵
Citation Manager Formats
Article Classifications
- Biological Sciences
- Biophysics and Computational Biology
- Physical Sciences
- Chemistry