Green function of correlated genes in a minimal mechanical model of protein evolution
- (a) Center for Soft and Living Matter, Institute for Basic Science, Ulsan 44919, Korea;
- (b) Département de Physique Théorique and Section de Mathématiques, Université de Genève, CH-1211 Geneva 4, Switzerland;
- (c) Center for Studies in Physics and Biology, The Rockefeller University, New York, NY 10021;
- (d) Department of Physics, Ulsan National Institute of Science and Technology, Ulsan 44919, Korea
Contributed by Albert Libchaber, November 20, 2017 (sent for review September 25, 2017; reviewed by Mukund Thattai and Massimo Vergassola)

Significance
Many protein functions involve large-scale motion of their amino acids, while alignment of their sequences shows long-range correlations. This has motivated the search for physical links between genetic and phenotypic collective behaviors. The major challenge is the complex nature of proteins: nonrandom heteropolymers made of 20 species of amino acids that fold into a strongly coupled network. In light of this complexity, simplified models are useful. Our model describes the protein in terms of the Green function, which directly links the gene to force propagation and collective dynamics in the protein. This allows for derivation of basic determinants of evolution, such as fitness landscape and epistasis, which are often hard to calculate.
Abstract
The function of proteins arises from cooperative interactions and rearrangements of their amino acids, which exhibit large-scale dynamical modes. Long-range correlations have also been revealed in protein sequences, and this has motivated the search for physical links between the observed genetic and dynamic cooperativity. We outline here a simplified theory of protein, which relates sequence correlations to physical interactions and to the emergence of mechanical function. Our protein is modeled as a strongly coupled amino acid network with interactions and motions that are captured by the mechanical propagator, the Green function. The propagator describes how the gene determines the connectivity of the amino acids and thereby, the transmission of forces. Mutations introduce localized perturbations to the propagator that scatter the force field. The emergence of function is manifested by a topological transition when a band of such perturbations divides the protein into subdomains. We find that epistasis—the interaction among mutations in the gene—is related to the nonlinearity of the Green function, which can be interpreted as a sum over multiple scattering paths. We apply this mechanical framework to simulations of protein evolution and observe long-range epistasis, which facilitates collective functional modes.
A common physical basis for the diverse biological functions of proteins is the emergence of collective patterns of forces and coordinated displacements of their amino acids (1–13). In particular, the mechanisms of allostery (14–18) and induced fit (19) often involve global conformational changes by hinge-like rotations, twists, or shear-like sliding of protein subdomains (20–22). An approach to examine the link between function and motion is to model proteins as elastic networks (23–26). Decomposing the dynamics of the network into normal modes revealed that low-frequency “soft” modes capture functionally relevant large-scale motion (27–30), especially in allosteric proteins (31–33). Recent works associate these soft modes with the emergence of weakly connected regions in the protein (Fig. 1 A and B), variously called “cracks,” “shear bands,” or “channels” (21, 22, 34–36), that enable viscoelastic motion (37, 38). Such patterns of “floppy” modes (39–42) emerge in models of allosteric proteins (36, 43–45) and networks (46–48).
Fig. 1. Protein as an evolving machine and propagation of mechanical forces. (A) Formation of a softer shear band (red) separating the protein into two rigid subdomains (light blue). When a ligand binds, the biochemical function involves a low-energy hinge-like or shear motion (arrows). (B) Shear band and large-scale motion in a real protein: the arrows show the displacement of all amino acids in human glucokinase when it binds glucose (Protein Data Bank ID codes 1v4s and 1v4t). The coloring shows a high-shear region (red) separating two low-shear domains that move as rigid bodies (shear calculated as in refs. 21 and 36). (C) The mechanical model. The protein is made of two species of amino acids, polar (P; red) and hydrophobic (H; blue), with a sequence that is encoded in a gene. Each amino acid forms weak or strong bonds with its 12 near neighbors (Right) according to the interaction rule in the table (Left). (D) The protein is made of
Like their dynamic phenotypes, proteins’ genotypes are remarkably collective. When aligned, sequences of protein families show long-range correlations among the amino acids (49–61). The correlations indicate epistasis, the interaction among mutations that takes place among residues linked by physical forces or common function. By inducing nonlinear effects, epistasis shapes the protein’s fitness landscape (62–68). Provided with sufficiently large data, analysis of sequence variation can predict the 3D structure of proteins (50–52), allosteric pathways (53–55), epistatic interactions (56, 57), and coevolving subsets of amino acids (58–60, 69).
Still, the mapping between sequence correlation and collective dynamics, and in particular the underlying epistasis, is not fully understood. Experiments and simulations provide valuable information on protein dynamics, and extensive sequencing accumulates the databases required for reliable analysis; however, there remain inherent challenges: the complexity of the physical interactions and the sparsity of the data. The genotype-to-phenotype map of proteins connects spaces of huge dimension, which are hard to sample, even by high-throughput experiments or natural evolution (70–72). A complementary approach is the application of simplified coarse-grained models, such as lattice proteins (73–75) or elastic networks (24), which allow one to extensively survey the map and examine basic questions of protein evolution. Such models have recently been used to study allostery in proteins (35, 36, 43–45) and in networks (46–48). Our aim here is different: to construct a simplified model of how the collective dynamics of functional proteins directs their evolution and in particular, to give a mechanical interpretation of epistasis.
This paper introduces a coarse-grained theory that treats protein as an evolving amino acid network with topology that is encoded in the gene. Mutations that substitute one amino acid with another tweak the interactions, allowing the network to evolve toward a specific mechanical function: in response to a localized force, the protein will undergo a large-scale conformational change (Fig. 1 C and D). We show that the application of a Green function (76, 77) is a natural way to understand the protein’s collective dynamics. The Green function measures how the protein responds to a localized impulse via propagation of forces and motion. The propagation of mechanical response across the protein defines its fitness and directs the evolutionary search. Thus, the Green function explicitly defines the map: gene → amino acid network → protein dynamics → function. We use this map to examine the effects of mutations and epistasis. A mutation perturbs the Green function and scatters the propagation of force through the protein (Fig. 2). We quantify epistasis in terms of “multiple scattering” pathways. These indirect physical interactions appear as long-range correlations in the coevolving genes.
Fig. 2. Force propagation, mutations, and epistasis. (A) The Green function G measures the propagation of the mechanical signal, depicted as a “diffraction wave,” across the protein (blue) from the force source f (pinch) to the response site v. (B) A mutation
Using a Metropolis-type evolution algorithm, solutions are quickly found, typically after
Model: Protein as an Evolving Machine
The Amino Acid Network and Its Green Function.
We use a coarse-grained description in terms of an elastic network (23–27, 39) with connectivity and interactions that are encoded in a gene (Fig. 1 C and D). Similar vector elasticity models were considered in refs. 35 and 36 (app. B3 therein). The protein is a chain of
We consider a constant fold, and therefore, any particular codon
Evolution searches for a protein that will respond by a prescribed large-scale motion to a given localized force f (“pinch”). In induced fit, for example, specific binding of a substrate should induce global deformation of an enzyme. The response u is determined by the Green function G (76):
$$u = G f. \qquad [1]$$
When the protein is moved as a rigid body, the lengths of the bonds do not change, and the elastic energy cost vanishes. A 2D protein has
The fitness function rewards strong mechanical response to a localized probe (pinch in Fig. 1D): a force dipole at two neighboring amino acids
Evolution Searches in the Mechanical Fitness Landscape.
Our simulations search for a prescribed response v induced by a force f applied at a specific site on the left side (pinch). The prescribed dipolar response may occur at any of the sites on the right side. This gives rise to a wider shear band that allows the protein to perform general mechanical tasks (unlike specific allostery tasks of communicating between specified sites on L and R). We define the fitness as the maximum of F [2] over all potential locations of the channel’s output (typically 8–10 sites) (Materials and Methods). The protein is evolved via a point mutation process where, at each step, we flip a randomly selected codon between zero and one. This corresponds to exchanging H and P at a random position in the protein, thereby changing the bond pattern and the elastic response by softening or stiffening the amino acid network.
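The point-mutation search described above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the acceptance rule (keep a flip only if the fitness does not decrease), the encoding of P as 1 and H as 0, and `toy_fitness` are all assumptions made for the example, and the real fitness would come from the mechanical response of the network.

```python
import random


def evolve(gene, fitness, steps, rng):
    """Metropolis-type search over binary genes (assumed zero-temperature rule).

    gene    : list of 0/1 codons (assumed encoding: 1 = P, 0 = H)
    fitness : callable mapping a gene to a float (hypothetical stand-in)
    """
    gene = list(gene)
    current = fitness(gene)
    history = [current]
    for _ in range(steps):
        i = rng.randrange(len(gene))   # pick a random codon
        gene[i] ^= 1                   # point mutation: flip 0 <-> 1
        trial = fitness(gene)
        if trial >= current:           # assumed rule: keep neutral or beneficial flips
            current = trial
        else:
            gene[i] ^= 1               # revert a deleterious flip
        history.append(current)
    return gene, history


def toy_fitness(gene):
    # Hypothetical fitness: reward P codons in the middle third of the gene
    # and penalize them elsewhere (a crude stand-in for a central soft band).
    n = len(gene)
    lo, hi = n // 3, 2 * n // 3
    return sum(gene[lo:hi]) - sum(gene[:lo]) - sum(gene[hi:])


rng = random.Random(0)
start = [rng.randint(0, 1) for _ in range(30)]
final, history = evolve(start, toy_fitness, 500, rng)
```

With this acceptance rule the recorded fitness never decreases along the trajectory; a finite-temperature variant would accept some deleterious flips with a small probability.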
Evolution starts from a random protein configuration encoded in a random gene. Typically, we take a small fraction of amino acids of type P (about
The typical evolution trajectory lasts about
Fig. 3. The mechanical Green function and the emergence of protein function. (A) Progression of the fitness F during the evolution run shown in Fig. 1D (black) together with the fitness trajectory averaged over
Results
Mechanical Function Emerges at a Topological Transition.
The hallmark of evolution reaching a solution gene
As the shear band is taking shape, the correlation among codons builds up. To see this, we align genes from the
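As an illustration of how codon correlations can be read off an alignment, the sketch below computes a two-codon covariance over a set of aligned binary genes. It assumes the standard covariance form Q_ij = <c_i c_j> - <c_i><c_j>; the function name and the tiny alignment are hypothetical.

```python
def codon_correlation(genes):
    """Two-codon covariance Q[i][j] = <c_i c_j> - <c_i><c_j> over aligned genes."""
    m = len(genes)
    n = len(genes[0])
    mean = [sum(g[i] for g in genes) / m for i in range(n)]
    q = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            pair = sum(g[i] * g[j] for g in genes) / m
            q[i][j] = pair - mean[i] * mean[j]
    return q


# Hypothetical alignment of four 3-codon genes:
# codons 0 and 1 always agree, codon 2 never varies.
alignment = [
    [1, 1, 0],
    [0, 0, 0],
    [1, 1, 0],
    [0, 0, 0],
]
Q = codon_correlation(alignment)
```

Here codons 0 and 1 covary (Q[0][1] = 0.25), while the conserved codon 2 has zero covariance with everything, which is why fully conserved positions carry no correlation signal.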
Point Mutations Are Localized Mechanical Perturbations.
A mutation may vary the strength of no more than
Epistasis Links Protein Mechanics to Genetic Correlations.
Our model provides a calculable definition of epistasis, the nonlinearity of the fitness effect of interacting mutations (Fig. 2C). We take a functional protein obtained from the evolution algorithm and mutate an amino acid at a site i. This mutation induces a change in the Green function
Fig. 4. Mechanical epistasis. The epistasis [7], averaged over
Definition [7] is a direct link between epistasis and protein mechanics: the nonlinearity (“curvature”) of the Green function measures the deviation of the mechanical response from additivity of the combined effect of isolated mutations at i and j,
In the gene, epistatic interactions are manifested in codon correlations (56, 57) shown in Fig. 4D, which depicts two-codon correlations
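The notion of epistasis as a deviation from additivity can be made concrete with a short sketch. It assumes the pairwise form e_ij = dF_ij - dF_i - dF_j, where dF denotes the fitness change caused by single or double mutations; the two fitness functions are hypothetical toys, not the mechanical fitness of the model.

```python
def flip(gene, *sites):
    """Return a copy of the gene with the given codons flipped (0 <-> 1)."""
    mutated = list(gene)
    for s in sites:
        mutated[s] ^= 1
    return mutated


def epistasis(fitness, gene, i, j):
    """e_ij = dF_ij - dF_i - dF_j: deviation of a double mutation from additivity."""
    f0 = fitness(gene)
    d_i = fitness(flip(gene, i)) - f0
    d_j = fitness(flip(gene, j)) - f0
    d_ij = fitness(flip(gene, i, j)) - f0
    return d_ij - d_i - d_j


def additive(gene):
    # Hypothetical fitness in which each site contributes independently.
    return sum(w * c for w, c in zip((1.0, 2.0, 3.0), gene))


def coupled(gene):
    # Hypothetical fitness in which sites 0 and 2 interact.
    return additive(gene) + 5.0 * gene[0] * gene[2]


gene = [0, 0, 0]
e_add = epistasis(additive, gene, 0, 2)   # no interaction between sites
e_cpl = epistasis(coupled, gene, 0, 2)    # interaction term shows up as epistasis
```

For the additive toy the epistasis vanishes, and for the coupled toy it equals the strength of the interaction term, which is the sense in which epistasis measures the curvature of the fitness function.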
Epistasis as a Sum over Scattering Paths.
One can classify epistasis according to the interaction range. Neighboring amino acids exhibit contact epistasis (49–51), because two adjacent perturbations,
Geometry of Fitness Landscape and Gene-to-Function Map.
With our mechanical evolution algorithm, we can swiftly explore the fitness landscape to examine its geometry. The genotype space is a 200D hypercube with vertices that are all possible genes c. The phenotypes reside in a 400D space of all possible mechanical responses u. The Green function provides the genotype-to-phenotype map [1]. A functional protein is encoded by a gene
The singular value decomposition (SVD) of the
Fig. 5. From gene to mechanical function: spectra and dimensions. (A) The first four SVD eigenvectors (in the text) of the gene
We examine the correspondence among three sets of eigenvectors:
In the phenotype space, we represent the displacement field u in the SVD basis,
Discussion
Theories of protein need to combine the many-body physics of the amino acid matter with the evolution of genetic information, which together, give rise to protein function. We introduced a simplified theory of protein, where the mapping between genotype and phenotype takes an explicit form in terms of mechanical propagators (Green functions), which can be efficiently calculated. As a functional phenotype, we take cooperative motion and force transmission through the protein [2]. This allows us to map genetic mutations to mechanical perturbations, which scatter the force field and deflect its propagation [3 and 6] (Fig. 2). The evolutionary process amounts to solving the inverse scattering problem: given prescribed functional modes, one looks for network configurations that yield this low end of the dynamical spectrum. Epistasis, the interaction among loci in the gene, corresponds to a sum over all multiple scattering trajectories or equivalently, the nonlinearity of the Green function [7 and 8]. We find that long-range epistasis signals the emergence of a collective functional mode in the protein. The results of this theory (in particular, the expressions for epistasis) follow from the basic geometry of the amino acid network and the localized mutations and are, therefore, applicable to general tasks and fitness functions with multiple inputs and responses.
Materials and Methods
The Mechanical Model of Protein.
We model the protein as an elastic network made of harmonic springs (23, 24, 39, 90). The connectivity of the network is described by a hexagonal lattice with vertices that are amino acids and edges that correspond to bonds. There are
We embed the graph in Euclidean space
We define the “embedded” gradient operator D (of size
The Inverse Problem: Green Function and Its Spectrum.
The Green function G is defined by the inverse relation to Hooke’s law,
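To illustrate the inverse relation between force and displacement, the sketch below builds the Hooke matrix H of a small spring chain, inverts it, and computes the response u = G f. It deliberately simplifies the setting: the chain is one-dimensional and pinned to walls at both ends, so H has no rigid-body zero modes and an ordinary inverse can stand in for the generalized inverse needed for a free network. The spring constants and helper names are hypothetical.

```python
def stiffness_matrix(k):
    """Hooke matrix H of a 1D chain: wall -k[0]- node0 -k[1]- ... -k[n]- wall."""
    n = len(k) - 1
    h = [[0.0] * n for _ in range(n)]
    for i in range(n):
        h[i][i] = k[i] + k[i + 1]
        if i + 1 < n:
            h[i][i + 1] = h[i + 1][i] = -k[i + 1]
    return h


def invert(a):
    """Gauss-Jordan inverse with partial pivoting (adequate for small matrices)."""
    n = len(a)
    m = [row[:] + [float(i == j) for j in range(n)] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        piv = m[c][c]
        m[c] = [x / piv for x in m[c]]
        for r in range(n):
            if r != c:
                fac = m[r][c]
                m[r] = [x - fac * y for x, y in zip(m[r], m[c])]
    return [row[n:] for row in m]


def matvec(a, x):
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


k = [1.0, 1.0, 0.1, 1.0, 1.0]   # one weak bond in the middle (hypothetical values)
H = stiffness_matrix(k)
G = invert(H)                   # Green function of the pinned chain
f = [1.0, 0.0, 0.0, 0.0]        # unit force on node 0
u = matvec(G, f)                # response u = G f
```

Multiplying H by u recovers f, which is Hooke's law read in the forward direction.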
Pinching the Network.
A pinch is given as a localized force applied at the boundary of the “protein.” We usually apply the force on a pair of neighboring boundary vertices,
Evolution tunes the spring network to exhibit a low-energy mode, in which the protein is divided into two subdomains moving like rigid bodies. This large-scale mode can be detected by examining the relative motion of two neighboring vertices, p and q, at another location at the boundary (usually at the opposite side). Such a desired response at the other side of the protein is specified by a response vector v, and the only nonzero entries correspond to the directions of the response at p and q. Again, we usually consider a “dipole” response
Evolution and Mutation.
The quality of the response (i.e., the biological fitness) is specified by how well the response follows the prescribed one v. In the context of our model, we chose the (scalar) observable F as
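A minimal sketch of such a scalar observable follows, assuming that the fitness is the projection of the response onto the prescribed one, F = v^T G f, maximized over a set of candidate output pairs. The 4x4 matrix `G` is a hypothetical Green function of a scalar toy network; in the model each vertex carries a 2D displacement and the dipoles have a direction, which this example ignores.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def matvec(g, f):
    return [dot(row, f) for row in g]


def dipole(n, p, q):
    """Vector with +1 at p and -1 at q: opposing forces on, or relative motion of, a pair."""
    v = [0.0] * n
    v[p], v[q] = 1.0, -1.0
    return v


def fitness(g, f, outputs):
    """Assumed form F = v^T G f, maximized over candidate output pairs (p, q)."""
    u = matvec(g, f)
    return max(dot(dipole(len(u), p, q), u) for p, q in outputs)


# Hypothetical symmetric Green function of a 4-vertex scalar toy network.
G = [
    [2.0, 1.0, 0.5, 0.2],
    [1.0, 2.0, 0.2, 0.5],
    [0.5, 0.2, 2.0, 1.0],
    [0.2, 0.5, 1.0, 2.0],
]
f = dipole(4, 0, 1)                           # pinch on the pair (0, 1)
F = fitness(G, f, outputs=[(2, 3), (3, 2)])   # best response among candidate outputs
```

Taking the maximum over several output pairs mirrors the idea that the prescribed response may occur at any of a set of sites rather than at one fixed location.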
Evolving the Green Function Using the Dyson and Woodbury Formulas.
The Dyson formula follows from the identity
The two expressions for the mutation impact
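The benefit of a Woodbury-type update is that a localized mutation does not require inverting the whole network again. The sketch below shows the rank-1 special case (the Sherman-Morrison formula) for a single bond whose stiffness changes by dk, so that H becomes H + dk b b^T and G becomes G - dk (G b)(G b)^T / (1 + dk b^T G b). As a simplification, the chain is one-dimensional and pinned to walls so that ordinary inverses exist; a mutation in the model can change several bonds at once, which would need the higher-rank form. All names and values are hypothetical.

```python
def stiffness_matrix(k):
    """Hooke matrix H of a 1D chain: wall -k[0]- node0 -k[1]- ... -k[n]- wall."""
    n = len(k) - 1
    h = [[0.0] * n for _ in range(n)]
    for i in range(n):
        h[i][i] = k[i] + k[i + 1]
        if i + 1 < n:
            h[i][i + 1] = h[i + 1][i] = -k[i + 1]
    return h


def invert(a):
    """Gauss-Jordan inverse with partial pivoting (adequate for small matrices)."""
    n = len(a)
    m = [row[:] + [float(i == j) for j in range(n)] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        piv = m[c][c]
        m[c] = [x / piv for x in m[c]]
        for r in range(n):
            if r != c:
                fac = m[r][c]
                m[r] = [x - fac * y for x, y in zip(m[r], m[c])]
    return [row[n:] for row in m]


def mutate_bond(g, b, dk):
    """Rank-1 update of a symmetric G for the perturbation H -> H + dk * b b^T."""
    gb = [sum(gij * bj for gij, bj in zip(row, b)) for row in g]   # G b
    denom = 1.0 + dk * sum(x * y for x, y in zip(b, gb))          # 1 + dk b^T G b
    n = len(g)
    return [[g[i][j] - dk * gb[i] * gb[j] / denom for j in range(n)]
            for i in range(n)]


G = invert(stiffness_matrix([1.0, 1.0, 1.0, 1.0, 1.0]))
b = [0.0, 1.0, -1.0, 0.0]            # bond between nodes 1 and 2
G_fast = mutate_bond(G, b, -0.9)     # soften that bond from 1.0 to 0.1
G_direct = invert(stiffness_matrix([1.0, 1.0, 0.1, 1.0, 1.0]))
max_diff = max(abs(G_fast[i][j] - G_direct[i][j])
               for i in range(4) for j in range(4))
```

The updated and the directly inverted Green functions agree to rounding error, while the update itself costs only a matrix-vector product and an outer product.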
Pathologies and Broken Networks.
A network broken into disjoint components exhibits floppy modes owing to the low energies of the relative motion of the components with respect to each other. The evolutionary search might end up in such nonfunctional unintended modes. The common pathologies that we observed are (i) isolated nodes at the boundary that become weakly connected via
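One simple way to flag such broken networks is to count connected components of the bond graph. The sketch below is an assumption about how this check could be done, not the authors' procedure: it treats only the bonds passed in (for instance, the strong ones) as connections and uses a breadth-first search.

```python
from collections import deque


def components(n, bonds):
    """Connected components of a graph with n vertices and a list of (a, b) bonds."""
    adj = [[] for _ in range(n)]
    for a, b in bonds:
        adj[a].append(b)
        adj[b].append(a)
    seen = [False] * n
    comps = []
    for start in range(n):
        if seen[start]:
            continue
        seen[start] = True
        queue = deque([start])
        comp = []
        while queue:
            x = queue.popleft()
            comp.append(x)
            for y in adj[x]:
                if not seen[y]:
                    seen[y] = True
                    queue.append(y)
        comps.append(sorted(comp))
    return comps


# Hypothetical 5-vertex network in which vertices 3 and 4 have detached from the rest.
parts = components(5, [(0, 1), (1, 2), (3, 4)])
is_broken = len(parts) > 1
```

A candidate solution with more than one component could then be rejected or inspected separately before its soft mode is counted as functional.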
Dimension and SVD.
To examine the geometry of the fitness landscape and the genotype-to-phenotype map, we looked at the correlation among numerous solutions, typically
The Protein Backbone.
A question may arise as to what extent the protein’s backbone might affect the results described so far. Proteins are polypeptides, linear heteropolymers of amino acids, linked by covalent peptide bonds, which form the protein backbone. The peptide bonds are much stronger than the noncovalent interactions among the amino acids and do not change when the protein mutates. We, therefore, augmented our model with a “backbone”: a linear path of conserved strong bonds that passes once through all amino acids. We focused on two extreme cases: a serpentine backbone either parallel to the shear band or perpendicular to it (Fig. 6).
Fig. 6. The effect of the backbone on evolution of mechanical function. The backbone induces long-range mechanical correlations, which influence protein evolution. We examine two configurations: parallel (A and B) and perpendicular (C and D) to the channel. (A and B) Parallel. (A) The backbone directs the formation of a narrow channel along the fold (compared with Fig. 5A). (B) The first four SVD eigenvectors of the gene
The presence of the backbone does not interfere with the emergence of a low-energy mode of the protein with a flow pattern (i.e., displacement field) that is similar to the backboneless case with two eddies moving in a hinge-like fashion. In the parallel configuration, the backbone constrains the channel formation to progress along the fold (Fig. 6A). The resulting channel is narrower than in the model without backbone (Figs. 1D and 5). In the perpendicular configuration, the evolutionary progression of the channel is much less oriented (Fig. 6C). While the flow patterns are similar, closer inspection shows noticeable differences, as can be seen in the flow eigenvectors
As for the correspondence between gene eigenvectors
Acknowledgments
We thank Jacques Rougemont for calculations of shear in glucokinase (Fig. 1B) and for helpful discussions. We thank Stanislas Leibler, Michael R. Mitchell, Elisha Moses, Giovanni Zocchi, and Olivier Rivoire for helpful discussions and encouragement. We also thank Alex Petroff, Steve Granick, Le Yan, and Matthieu Wyart for valuable comments on the manuscript. J.-P.E. is supported by European Research Council Advanced Grant Bridges, and T.T. is supported by Institute for Basic Science Grant IBS-R020 and the Simons Center for Systems Biology of the Institute for Advanced Study, Princeton.
Footnotes
- ¹To whom correspondence may be addressed. Email: libchbr@rockefeller.edu or tsvitlusty@gmail.com.
Author contributions: S.D., J.-P.E., A.L., and T.T. designed and performed research and wrote the paper.
Reviewers: M.T., Tata Institute of Fundamental Research, National Center for Biological Sciences; and M.V., University of California, San Diego.
The authors declare no conflict of interest.
- Copyright © 2018 the Author(s). Published by PNAS.
This open access article is distributed under Creative Commons Attribution-NonCommercial-NoDerivatives License 4.0 (CC BY-NC-ND).
References
- ↵
- ↵
- ↵
- ↵
- Boehr DD,
- McElheny D,
- Dyson HJ,
- Wright PE
- ↵
- ↵
- ↵
- ↵
- ↵
- ↵
- ↵
- ↵
- ↵
- ↵
- ↵
- ↵
- ↵
- ↵
- ↵
- Koshland D
- ↵
- Gerstein M,
- Lesk AM,
- Chothia C
- ↵
- Mitchell MR,
- Tlusty T,
- Leibler S
- ↵
- Mitchell MR,
- Leibler S
- ↵
- ↵
- ↵
- Bahar I
- ↵
- ↵
- ↵
- ↵
- ↵
- ↵
- ↵
- Zheng WJ,
- Brooks BR,
- Thirumalai D
- ↵
- Arora K,
- Brooks CL
- ↵
- Miyashita O,
- Onuchic JN,
- Wolynes PG
- ↵
- ↵
- Tlusty T,
- Libchaber A,
- Eckmann JP
- ↵
- Qu H,
- Zocchi G
- ↵
- ↵
- ↵
- Alexander S,
- Orbach R
- ↵
- ↵
- ↵
- Hemery M,
- Rivoire O
- ↵
- Flechsig H
- ↵
- Thirumalai D,
- Hyeon C
- ↵
- Rocks JW, et al.
- ↵
- Yan L,
- Ravasio R,
- Brito C,
- Wyart M
- ↵
- Yan L,
- Ravasio R,
- Brito C,
- Wyart M
- ↵
- ↵
- ↵
- ↵
- ↵
- Lockless SW,
- Ranganathan R
- ↵
- ↵
- ↵
- ↵
- Poelwijk FJ,
- Socolich M,
- Ranganathan R
- ↵
- ↵
- ↵
- ↵
- ↵
- ↵
- ↵
- ↵
- Ortlund EA,
- Bridgham JT,
- Redinbo MR,
- Thornton JW
- ↵
- ↵
- ↵
- ↵
- ↵
- ↵
- ↵
- ↵
- Dill KA
- ↵
- ↵
- ↵
- Green G
- ↵
- Abrikosov A,
- Gorkov L,
- Dzyaloshinski I
- ↵
- ↵
- Ben-Israel A,
- Greville TN
- ↵
- Tewary VK
- ↵
- ↵
- ↵
- Desai MM,
- Weissman D,
- Feldman MW
- ↵
- Savir Y,
- Noor E,
- Milo R,
- Tlusty T
- ↵
- ↵
- Kaneko K,
- Furusawa C,
- Yomo T
- ↵
- ↵
- Furusawa C,
- Kaneko K
- ↵
- ↵
- Born M,
- Huang K
- ↵
- Woodbury MA
- ↵
Article Classifications
- Physical Sciences
- Biophysics and Computational Biology
- Biological Sciences
- Biophysics and Computational Biology