Emergence of symmetry in homooligomeric biological assemblies
 *Department of Biochemistry, University of Washington, Seattle, WA 98195;
 ^{‡}Los Alamos National Laboratory, Los Alamos, NM 87545;
 ^{§}Institute for Nuclear Theory, Box 351550, Seattle, WA 98195-1550;
 ^{¶}Fred Hutchinson Cancer Research Center, Seattle, WA 98109; and
 ^{‖}Howard Hughes Medical Institute, Seattle, WA 98195

Contributed by David Baker, August 7, 2008 (received for review May 19, 2008)

^{†}I.A. and C.E.M.S. contributed equally to this work.
Abstract
Naturally occurring homooligomeric protein complexes exhibit striking internal symmetry. The evolutionary origins of this symmetry have been the subject of considerable speculation; proposals for the advantages associated with symmetry include greater folding efficiency, reduced aggregation, amenability to allosteric regulation, and greater adaptability. An alternative possibility stems from the idea that to contribute to fitness, and hence be subject to evolutionary optimization, a complex must be significantly populated, which implies that the interaction energy between monomers in the ancestors of modern-day complexes must have been sufficient to at least partially overcome the entropic cost of association. Here, we investigate the effects of this bias toward very-low-energy complexes on the distribution of symmetry in primordial homooligomers modeled as randomly interacting pairs of monomers. We demonstrate quantitatively that a bias toward very-low-energy complexes can result in the emergence of symmetry from random ensembles in which the overall frequency of symmetric complexes is vanishingly small. This result is corroborated by using explicit protein–protein docking calculations to generate ensembles of randomly docked complexes: the fraction of these that are symmetric increases from 0.02% in the overall population to >50% in very-low-energy subpopulations.
The origin of the striking symmetry of homooligomeric protein assemblies has been the subject of much speculation. The potential benefits of symmetric assemblies to an organism include increased coding economy (1), greater stability, reduced aggregation, robustness to errors in synthesis (2, 3), and amenability to allosteric regulation (3). It has also been proposed that structural symmetry arises because it makes proteins more designable and foldable, or is a result of domain swapping (4–7). Monod and colleagues argued that symmetrical arrangements enhance the sensitivity of selection because “effects of single mutations occurring in a symmetrical oligomer … should be greatly amplified as compared to the effects of similar mutations in a monomer or nonsymmetrical oligomer,” since each mutation at the interface effectively counts twice (3). In analogy to atomic clusters it has been suggested that symmetric oligomeric arrangements may, in general, be low in energy (8). Shakhnovich and coworkers argued that homodimers are favored over heterodimers because the mean energies of homodimeric interfaces are likely to be lower than those of heterodimeric interfaces (9, 10). Many of these arguments are discussed in greater detail in an excellent review by Goodsell and Olsen (2).
As with many evolutionary questions, it is difficult to assess which of the factors described above contributed to the symmetry of modern-day assemblies and to what extent. However, with the advent of molecular modeling, it is now possible to assess the sufficiency of quantitative hypotheses in this area. In this article we develop a quantitative model for the emergence of symmetry in biological assemblies that is based on explicit docking calculations on realistic protein models.
The key premise underlying our approach is that selection is only likely to operate on primordial complexes with sufficient initial interaction energy to at least partially overcome the entropic costs of association of the monomers; evolution can only optimize a complex that is populated sufficiently to confer a benefit to the organism. For example, contrast the effects of a mutation that stabilizes an interface by 1 kcal/mol on the population of binding modes with interaction energies of 0 and −8 kcal/mol, respectively. Assuming that the free-energy cost arising from the entropy loss in association is 10 kcal/mol, the mutation increases the population of the first binding mode from 5 × 10^{−8} to 3 × 10^{−7}, and the second, from 0.035 to 0.18. Because the population of the first binding mode is vanishingly small even after mutation, it is unlikely to contribute to fitness, and hence, the second, much lower energy binding mode is a more likely starting point for evolutionary optimization by mutation and selection.
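The populations quoted in this example can be checked with a few lines of arithmetic. The sketch below assumes that "population" means the Boltzmann weight exp(−ΔG/RT) of the bound state relative to free monomers, and that RT ≈ 0.593 kcal/mol (about 298 K); neither assumption is stated in the text, and the function names are illustrative only.

```python
import math

RT = 0.593            # kcal/mol near 298 K (assumed temperature)
ENTROPIC_COST = 10.0  # kcal/mol, the value used in the example above


def boltzmann_weight(interaction_energy, stabilization=0.0):
    """Weight exp(-dG/RT) of a binding mode relative to free monomers.

    dG is the entropic cost plus the interaction energy, lowered by any
    stabilizing mutation (all in kcal/mol).
    """
    delta_g = ENTROPIC_COST + interaction_energy - stabilization
    return math.exp(-delta_g / RT)


def bound_fraction(interaction_energy, stabilization=0.0):
    """Two-state bound fraction w / (1 + w); close to w itself when w << 1."""
    w = boltzmann_weight(interaction_energy, stabilization)
    return w / (1.0 + w)


for energy in (0.0, -8.0):
    before = boltzmann_weight(energy)
    after = boltzmann_weight(energy, stabilization=1.0)
    print(f"E = {energy:+.0f} kcal/mol: weight {before:.2g} -> {after:.2g}")
```

Under these assumptions the weights come out close to the figures quoted above; small differences depend on the value taken for RT. The `bound_fraction` helper shows the normalized two-state alternative, which matters only for the lower-energy binding mode.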
We investigate the extent to which such a bias toward very-low-energy primordial complexes could have contributed to the emergence of the highly symmetric homooligomeric complexes ubiquitous in nature today. We show that there are two competing factors that determine the fraction of complexes with a given degree of symmetry in a subpopulation: for complexes of increasing symmetry, the number of possible configurations decreases while the variance in their interaction energy grows. We show by using both simple analytic models and explicit protein–protein docking simulations that the combination of these two counteracting factors leads to a predominance of symmetric structures in very-low-energy binding modes. In contrast with most previous suggestions, in our model, symmetrical complexes are not favored because they provide specific functional advantages, but because the initial pool of protein complexes available for competitive selection is heavily biased toward symmetrical binding modes.
Results and Discussion
To determine whether a bias toward low energy alone can give rise to highly symmetric complexes, we first quantify the extent of symmetry in randomly docked complexes, and show that a majority of naturally occurring complexes are more symmetric than 99.98% of random complexes. Next, we show that the variance in the interaction energy distribution for randomly docked complexes increases with increasing symmetry. We then combine these results and find that increasingly stringent bias toward low-energy structures yields increasingly symmetric populations. At each step, the results from simple model calculations are corroborated by using explicit protein–protein docking simulations; in particular, we find that the fraction of symmetric complexes generated in protein docking simulations (11) increases from 0.02% in the overall population to over 50% in very-low-energy subpopulations.
Extent of Symmetry in Randomly Generated Complexes.
We begin by comparing the symmetry of naturally occurring homodimeric proteins to that of randomly generated homodimeric complexes. Quantifying the notion of "nearly symmetric" requires a metric for deviations from perfect symmetry in homodimer complexes. Let A and B be any two atoms in one protein monomer, and A′ and B′ be the corresponding atoms in the other monomer. If the dimer is perfectly symmetric, then dist(A, B′) = dist(A′, B).
We define the symmetry measure S_{dev} to be the average of |dist(A, B′) − dist(A′, B)| over all pairs of α-carbon atoms. (The magnitude is needed because the signed difference changes sign when A and B are exchanged, so its average over all pairs vanishes identically.) This atomic-distance-based measure is useful, among other reasons, because the system energy is largely a function of interatomic distances. Perfect symmetry corresponds to S_{dev} = 0.
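The symmetry measure can be sketched in a few lines. The version below assumes that the average is taken over the magnitude of the distance difference and that the two monomers are given as equal-length lists of α-carbon coordinates in corresponding order; the function name and the toy coordinates are illustrative, not taken from the original work.

```python
import math


def s_dev(monomer_1, monomer_2):
    """Mean |dist(A, B') - dist(A', B)| over all pairs of distinct atoms.

    monomer_1[i] and monomer_2[i] are the (x, y, z) coordinates of the same
    atom in the two copies of the protein.
    """
    n = len(monomer_1)
    total = 0.0
    count = 0
    for i in range(n):
        for j in range(i + 1, n):
            d_ab_prime = math.dist(monomer_1[i], monomer_2[j])  # dist(A, B')
            d_a_prime_b = math.dist(monomer_2[i], monomer_1[j])  # dist(A', B)
            total += abs(d_ab_prime - d_a_prime_b)
            count += 1
    return total / count


# Toy monomer: four "atoms" with arbitrary coordinates (in Å).
monomer = [(1.0, 0.0, 0.0), (2.0, 1.0, 0.0), (3.0, 1.0, 2.0), (1.5, 2.0, 1.0)]

# Twofold (C2) partner: rotate by 180 degrees about the z axis.
c2_partner = [(-x, -y, z) for x, y, z in monomer]

# Asymmetric partner: a pure translation of the same monomer.
shifted_partner = [(x + 5.0, y, z) for x, y, z in monomer]

print(s_dev(monomer, c2_partner))       # perfectly symmetric dimer
print(s_dev(monomer, shifted_partner))  # asymmetric dimer
```

The twofold-related pair gives zero (up to rounding), whereas the translated copy gives a positive value, which is the behavior the measure is meant to capture.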
Estimates of the S_{dev} probability distribution, P(S_{dev}), for random homodimers were obtained by randomly orienting pairs of monomeric protein structures (Table 1) followed by a translation to bring them into atomic contact with no energy minimization. The S_{dev} probability distributions for different proteins (Fig. 1A, yellow, green, and brown lines) are very similar, with an approximately linear increase in probability density with increasing S_{dev} starting from zero at perfect symmetry. To investigate the generality of this result, we determine P(S_{dev}) for a simple model system composed of a pair of identical spheres of radius r_{g} with atoms uniformly distributed on their surfaces. For finite distance between the two spheres, this calculation (supporting information (SI) Appendix) involves an integral that must be evaluated numerically. In the limit of infinite distance between the spheres, the integral can be evaluated analytically, and the result is simple: As shown in Fig. 1A, the interacting sphere model (blue lines) qualitatively recapitulates the results for the random protein complexes (yellow, green, and brown lines). The infinite-distance solution (dashed blue line) is a reasonable approximation to the exact solution even at small intermonomer distances (solid blue line), in particular, when the deviation, S_{dev}, from perfect symmetry is small.
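The sampling procedure for the sphere model can be illustrated with a small Monte Carlo sketch. Everything concrete here is an assumption made for illustration: the number of atoms (30), the sphere radius (10 Å), the center-to-center separation (21 Å), the number of samples, and the use of the mean absolute distance difference as S_{dev}. It is not the calculation from the SI Appendix.

```python
import math
import random


def random_rotation(rng):
    """Uniform random rotation matrix built from a normalized 4D Gaussian quaternion."""
    q = [rng.gauss(0.0, 1.0) for _ in range(4)]
    norm = math.sqrt(sum(c * c for c in q))
    w, x, y, z = (c / norm for c in q)
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - z * w), 2 * (x * z + y * w)],
        [2 * (x * y + z * w), 1 - 2 * (x * x + z * z), 2 * (y * z - x * w)],
        [2 * (x * z - y * w), 2 * (y * z + x * w), 1 - 2 * (x * x + y * y)],
    ]


def rotate(rot, point):
    """Apply a 3x3 rotation matrix to a point."""
    return tuple(sum(rot[i][k] * point[k] for k in range(3)) for i in range(3))


def sphere_monomer(n_atoms, radius, rng):
    """Atoms placed uniformly at random on the surface of a sphere."""
    atoms = []
    for _ in range(n_atoms):
        v = [rng.gauss(0.0, 1.0) for _ in range(3)]
        norm = math.sqrt(sum(c * c for c in v))
        atoms.append(tuple(radius * c / norm for c in v))
    return atoms


def s_dev(first, second):
    """Mean |dist(A, B') - dist(A', B)| over all pairs of distinct atoms."""
    n = len(first)
    diffs = [
        abs(math.dist(first[i], second[j]) - math.dist(second[i], first[j]))
        for i in range(n) for j in range(i + 1, n)
    ]
    return sum(diffs) / len(diffs)


def random_dimer_s_dev(monomer, separation, rng):
    """Randomly orient two copies of the monomer, offset the second along x."""
    rot_1, rot_2 = random_rotation(rng), random_rotation(rng)
    first = [rotate(rot_1, p) for p in monomer]
    second = [rotate(rot_2, p) for p in monomer]
    second = [(x + separation, y, z) for x, y, z in second]
    return s_dev(first, second)


rng = random.Random(1)
monomer = sphere_monomer(30, 10.0, rng)
values = [random_dimer_s_dev(monomer, 21.0, rng) for _ in range(500)]
near_symmetric = sum(v < 0.2 for v in values) / len(values)
print(f"mean S_dev = {sum(values) / len(values):.2f} Å, "
      f"fraction below 0.2 Å = {near_symmetric:.4f}")
```

A histogram of `values` gives a rough picture of P(S_{dev}) for this toy system; with only a few hundred samples, nearly symmetric arrangements are expected to be rare or absent.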
The empirical S_{dev} distribution for a set of 796 native protein homodimers contrasts strikingly with that of randomly generated dimeric complexes as illustrated in Fig. 1A (red line). The majority of native structures are more symmetric than 99.98% of randomly docked complexes; the probability of observing a random complex with S_{dev} < 0.2 Å, as is typical for native homodimers, is ≈1/5,000.
The contrast between the high symmetry of native complexes and low symmetry of random homodimers frames the central objective of this article: to establish a quantitative explanation for the high symmetry of oligomeric assemblies observed in nature.
Interaction Energy Distribution for Randomly Generated Complexes.
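Consider a set of randomly generated homodimeric interfaces, each of which contains N independent interactions across the interface. Supposing each of these interactions is drawn independently from a distribution with standard deviation ε, the distribution for the total interaction energy has a standard deviation σ = ε√N. In a perfectly symmetric interface, by contrast, every interaction occurs twice, so the total energy is a sum of only N/2 independent terms, each counted twice; its standard deviation is 2ε√(N/2) = √2 σ, i.e., the variance is twice that of the asymmetric case.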
We tested the prediction of a broader energy distribution for symmetrical protein arrangements by using docking simulations. As shown in Fig. S1, the distribution of interaction energies in randomly generated C_{2} symmetric complexes is indeed broader than that of the randomly generated asymmetric complexes described above; the variance is ≈1.8 times greater in the symmetric case, reasonably close to the theoretical estimate of 2. The distributions are Gaussian, as might be anticipated from the Central Limit Theorem applied to a sum of random interactions (see SI Appendix for discussion).
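The theoretical factor of 2 in the variance can be reproduced with a toy simulation. The sketch below assumes that each contact energy is an independent standard normal variable and that, in a symmetric interface, every contact appears twice (once as A–B′ and once as A′–B); the contact count and sample size are arbitrary illustrative choices.

```python
import random
import statistics


def interface_energy(n_contacts, symmetric, rng):
    """Total energy of a toy interface with n_contacts pairwise interactions."""
    if symmetric:
        # Only n_contacts / 2 draws are independent; each is counted twice.
        return 2.0 * sum(rng.gauss(0.0, 1.0) for _ in range(n_contacts // 2))
    return sum(rng.gauss(0.0, 1.0) for _ in range(n_contacts))


rng = random.Random(0)
n_contacts = 40    # assumed number of contacts (even)
n_samples = 20000  # number of random interfaces per ensemble

asymmetric = [interface_energy(n_contacts, False, rng) for _ in range(n_samples)]
symmetric = [interface_energy(n_contacts, True, rng) for _ in range(n_samples)]

ratio = statistics.pvariance(symmetric) / statistics.pvariance(asymmetric)
print(f"variance ratio (symmetric / asymmetric) = {ratio:.2f}")
```

The ratio fluctuates around 2 from run to run; the somewhat smaller value of ≈1.8 reported for docked protein complexes reflects a realistic energy function rather than this idealized model.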
The analysis is readily extended to complexes that deviate from perfect symmetry. The interaction energy is the sum of pairs of (A, B′) and (A′, B) interactions; in the S_{dev} = 0 Å limit these are perfectly correlated and for large values of S_{dev} they are completely uncorrelated. In the intervening partially symmetric regime, the interaction energies are partially correlated. The difference in the AB′ distance and the A′B distance is on average S_{dev}, and hence the correlation ρ(S_{dev}) between the AB′ and A′B interaction energies decays as S_{dev} approaches the characteristic length of the interaction potential. Assuming for concreteness a Gaussian decay in the interaction energy correlation over length L (≈1 Å), we obtain (see SI Appendix and Fig. S2) The Gaussian energy distribution for complexes with symmetry S_{dev} is then
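The interpolation between the symmetric and asymmetric limits can be sketched as follows. The functional forms are assumptions for illustration, not the expressions from the SI Appendix: the correlation is taken as ρ(S_{dev}) = exp(−(S_{dev}/L)²), and the total energy is treated as the sum of two equal-variance halves with correlation ρ, which gives a variance of σ²(1 + ρ), where σ is the spread for fully asymmetric complexes.

```python
import math


def correlation(s_dev, length=1.0):
    """Assumed Gaussian decay of the AB' / A'B energy correlation (length in Å)."""
    return math.exp(-((s_dev / length) ** 2))


def energy_sigma(s_dev, sigma_asym=1.0, length=1.0):
    """Spread of the interaction energy for complexes with a given S_dev.

    Var(X + Y) = Var(X) + Var(Y) + 2 * rho * sd(X) * sd(Y), with
    Var(X) = Var(Y) = sigma_asym**2 / 2, simplifies to sigma_asym**2 * (1 + rho).
    """
    return sigma_asym * math.sqrt(1.0 + correlation(s_dev, length))


for s in (0.0, 0.5, 1.0, 2.0, 5.0):
    print(f"S_dev = {s:.1f} Å: sigma = {energy_sigma(s):.3f}")
```

With these choices the spread equals √2 σ at perfect symmetry and relaxes to σ once S_{dev} exceeds a few multiples of L, matching the two limits described in the text.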
Emergence of Symmetry.
Combining Eqs. 1 and 3, we find that the frequency of complexes with interaction energy E and symmetry S_{dev} is the product P(S_{dev}, E) = P(S_{dev}) P(E | S_{dev}). Evolutionary selection can operate only on complexes sufficiently populated to contribute to function; thus, we are interested in the distribution of S_{dev} for complexes with binding energy strong enough to at least partially overcome the 10–15 kcal/mol entropic cost of association (12). We obtain the S_{dev} distribution as a function of the binding energy threshold E′ required for a contribution to fitness by numerically integrating P(S_{dev}, E) from −∞ to E′. As shown in Fig. 1B, with increasingly stringent thresholds (i.e., more negative E′), the probability distribution shifts from predominantly asymmetric (i.e., high-S_{dev}) complexes in the random population (blue) to increasingly symmetric (black); when the energy threshold drops below −10σ (red), the distribution is similar to that of the highly symmetric natural homodimers (Fig. 1A, red). In the intermediate case (black), the distribution is bimodal with dominantly asymmetric and dominantly symmetric subpopulations; we quantify the abruptness of this transition between the red and blue modes further below.
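The thresholded integration described above can be sketched numerically. All functional forms are assumptions for illustration: a prior P(S_{dev}) that rises linearly up to an assumed maximum of 14 Å, a zero-mean Gaussian energy distribution whose variance is σ²(1 + exp(−(S_{dev}/L)²)) with L = 1 Å, and thresholds expressed in units of the asymmetric spread σ. The names are hypothetical and the numbers are not those behind Fig. 1.

```python
import math

S_MAX = 14.0      # Å; assumed upper end of the S_dev range
L_CORR = 1.0      # Å; correlation length quoted in the text
SIGMA_ASYM = 1.0  # energy unit: spread for fully asymmetric complexes


def prior(s_dev):
    """Assumed linear (unnormalized) density of S_dev in the random ensemble."""
    return s_dev


def sigma(s_dev):
    """Assumed energy spread; the variance doubles as S_dev approaches zero."""
    rho = math.exp(-((s_dev / L_CORR) ** 2))
    return SIGMA_ASYM * math.sqrt(1.0 + rho)


def prob_below(threshold, s_dev):
    """P(E <= threshold | S_dev) for a zero-mean Gaussian energy distribution."""
    return 0.5 * math.erfc(-threshold / (sigma(s_dev) * math.sqrt(2.0)))


def integrate(threshold, upper, n=4000):
    """Midpoint-rule integral of prior(S) * P(E <= threshold | S) over [0, upper]."""
    h = upper / n
    return h * sum(
        prior((i + 0.5) * h) * prob_below(threshold, (i + 0.5) * h)
        for i in range(n)
    )


def symmetric_fraction(threshold, cutoff=0.2):
    """Fraction of complexes below the energy threshold that have S_dev < cutoff."""
    return integrate(threshold, cutoff) / integrate(threshold, S_MAX)


for t in (0.0, -5.0, -10.0):
    print(f"E' = {t:+.0f} sigma: fraction with S_dev < 0.2 Å = "
          f"{symmetric_fraction(t):.3g}")
```

In this toy version the nearly symmetric fraction starts at roughly 1/5,000 when half the ensemble is retained and grows by orders of magnitude as the threshold becomes more negative, which is the qualitative trend described for Fig. 1B.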
We investigated whether this striking emergence of symmetry is manifested in explicit protein docking calculations. The randomly docked complexes described above were refined by using the Rosetta Monte Carlo protein docking protocol (11) to produce well-packed protein-like interfaces. For each of the five protein test cases, this energy-based refinement alone considerably increased the extent of symmetry in the population (Table 1; see the SI Appendix and Fig. S3 for discussion). We then selected the subset of refined structures with energies below different cutoff thresholds and determined the S_{dev} distribution in each low-energy subpopulation. As shown in Fig. 1C for a typical protein, this energy-based enrichment (black) shifts the original random distribution (blue) dramatically toward low values of S_{dev}, and closely parallels the analytic model results (Fig. 1B).
The increase in the fraction of nearly symmetric structures (S_{dev} < 0.2 Å) with more stringent energy-based selection threshold is illustrated in Fig. 1D. The simple model predicts a relatively abrupt phase-transition behavior as the energy cutoff for selection drops below −5σ (Fig. 1D, red). This increase in symmetry after energy-based selection is recapitulated in the populations of refined docked protein complexes: increasingly stringent bias toward low-energy structures produces populations with increasing fractions of nearly symmetric (S_{dev} < 0.2 Å) structures (Fig. 1D, brown line, and Table 1). Examples of very-low-energy docked complexes with high symmetry are shown on the left side of Fig. 2.
The preceding analysis suggests that the primordial complexes with populations sufficient to contribute to fitness and hence be subject to natural selection were very likely to be symmetrical. In our model, these states will form the starting point for subsequent evolutionary optimization by mutation and selection. As discussed earlier, Monod et al. (3) suggested that symmetrical interfaces have a further advantage at this evolutionary optimization stage because the effect of a mutation is effectively amplified by symmetry. Indeed, the argument presented in the paragraphs preceding Eq. 2 carries over immediately to the mutation case: the standard deviation of the interaction energy associated with single mutations at interfaces is
The analysis in this article has focused primarily on oligomeric forms in which there is a single type of interaction patch between monomers. Routes to more complex oligomers have been discussed by Monod et al. (3), Hansen (13), Teichmann et al. (14, 15), and Bennett et al. (4, 5). Teichmann and coworkers have shown that higher-order oligomers are often built up of smaller symmetric dimers and trimers that associate by using additional interaction patches (15). Our findings of increased likelihood of symmetry among lowest-energy complexes are relevant to both the early stage and late stage in acquisition of oligomer structure in their model; for example, in a tetramer, the initial formation of dimers from monomers, and subsequently the formation of the tetramer from interacting dimers.
As noted in the introduction, Shakhnovich and coworkers have studied a question related to the problem investigated here: the energetic advantage of homodimeric versus heterodimeric complexes (9, 10). In the two-dimensional interacting disk model used in those studies, homodimeric complexes are necessarily symmetric, so the problem of the origin of symmetry of homodimeric complexes did not arise. For this model the authors found a twofold greater variance in the homodimer energy distribution than in the heterodimer energy distribution, which parallels our finding for symmetric versus asymmetric homodimeric complexes in this article. Although provocative, in the absence of a model for the decrease in the number of states with increasing symmetry, the earlier work does not indicate whether selection for low energy alone is sufficient to result in the emergence of symmetric structures. (See SI Appendix for further discussion.)
The importance of estimating the prior probability of symmetric complexes is illustrated by an apparent inconsistency. Homodimers (and dimers of dimers) are prevalent in nature (15), yet the variance in energy increases J-fold in cyclic oligomers with J subunits, as noted in the above analysis of the interaction energy distribution for random complexes, which should favor the higher-order symmetric structures in the low-energy subpopulation. As emphasized earlier, the fraction of symmetric structures in the low-energy population depends not only on the energy variance, but also on the prior probability of forming a symmetric complex. The probability of randomly forming a symmetric complex is much smaller for higher-order cyclic oligomers^{††}, and the balance between the two factors favors symmetric dimers over higher-order symmetric oligomers.
The key result of this article is that symmetric assemblies can arise solely from a bias toward complexes with sufficiently favorable interaction energy to be significantly populated despite the entropic cost of association. Particularly notable is the almost complete shift toward the S_{dev} distribution of native complexes (Fig. 1A, red) produced by selecting the lowest-energy complexes (Fig. 1C, black) from large populations of randomly docked complexes (Fig. 1C, blue) generated in Rosetta all-atom docking simulations. This enrichment occurs because the increase in the variance of the interaction energy with increasing symmetry more than compensates for the decrease in the density of states with increasing symmetry. As shown in the SI Appendix, if the variance in the interaction energy becomes large, as perhaps is the case for partially folded monomers that can reconfigure to optimize interactions with other monomers, symmetry can arise at equilibrium in the absence of selection, which may have relevance to amyloid formation. Similar considerations may also have relevance to the nucleation of symmetry from intrinsically nonsymmetric monomers in nonbiological assemblies such as crystals and clusters. Finally, although we have shown that the pool of randomly arising candidates for eventual competitive selection is a priori extremely biased toward symmetry, it must be emphasized that, as in any argument about history, it is nearly impossible to evaluate the relative contributions of different potential causes; along with the initial bias described here, the factors noted by Monod et al. (3, 16) and others summarized in the introduction likely contributed to the high symmetry of modern homooligomers.
Methods
The protein-modeling software Rosetta was used to perform the protein–protein docking and protein design calculations (11). Rosetta uses real-space Monte Carlo minimization to find the lowest-energy conformation of binding partners. The first step in the docking protocol consists of a low-resolution search where only the backbone atoms and a single “centroid” atom representing the side chains are modeled. Random starting points for docking are generated by randomizing the rotational degrees of freedom of the binding partners, and then translating them into contact. Fig. 1A and column 4 of Table 1 were generated at this stage. In the subsequent high-resolution docking stage, a detailed physically realistic all-atom energy function (17) is optimized by using a Monte Carlo minimization protocol. Each step in this protocol consists of (i) a small random perturbation to the rigid-body degrees of freedom, (ii) discrete optimization of the side-chain rotamer conformations, and (iii) quasi-Newton optimization of the side-chain and rigid-body degrees of freedom (11).
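The structure of a Monte Carlo minimization loop can be illustrated on a toy problem. The sketch below is not Rosetta and does not use its energy function or API: it assumes a hypothetical rugged two-dimensional energy surface, replaces the quasi-Newton step with plain gradient descent on finite-difference gradients, and omits the rotamer optimization of step (ii) entirely. Step size, temperature, and iteration counts are arbitrary.

```python
import math
import random


def toy_energy(x, y):
    """Rugged 2D landscape standing in for a docking energy function."""
    return 0.1 * (x * x + y * y) + math.sin(3.0 * x) * math.cos(3.0 * y)


def local_minimize(x, y, steps=50, rate=0.02, h=1e-5):
    """Gradient descent with finite-difference gradients.

    A simple stand-in for the quasi-Newton optimization of step (iii).
    """
    for _ in range(steps):
        grad_x = (toy_energy(x + h, y) - toy_energy(x - h, y)) / (2.0 * h)
        grad_y = (toy_energy(x, y + h) - toy_energy(x, y - h)) / (2.0 * h)
        x -= rate * grad_x
        y -= rate * grad_y
    return x, y


def monte_carlo_minimization(n_steps=300, kt=0.3, step=1.0, seed=0):
    """Perturb, minimize locally, then accept or reject by the Metropolis rule."""
    rng = random.Random(seed)
    x, y = local_minimize(rng.uniform(-5.0, 5.0), rng.uniform(-5.0, 5.0))
    energy = toy_energy(x, y)
    best = (energy, x, y)
    for _ in range(n_steps):
        # (i) random perturbation of the current coordinates
        trial_x = x + rng.gauss(0.0, step)
        trial_y = y + rng.gauss(0.0, step)
        # (iii) local minimization of the perturbed state
        trial_x, trial_y = local_minimize(trial_x, trial_y)
        trial_energy = toy_energy(trial_x, trial_y)
        # Metropolis criterion applied to the minimized energies
        delta = trial_energy - energy
        if delta <= 0.0 or rng.random() < math.exp(-delta / kt):
            x, y, energy = trial_x, trial_y, trial_energy
            if energy < best[0]:
                best = (energy, x, y)
    return best


best_energy, best_x, best_y = monte_carlo_minimization()
print(f"lowest energy found: {best_energy:.3f} at ({best_x:.2f}, {best_y:.2f})")
```

Applying the acceptance test to minimized energies lets the search hop between local minima instead of climbing over barriers in small steps, which is the main reason for combining the two techniques.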
The SI Appendix contains a more detailed description of the methodology, notably a more detailed description of the docking and design calculations, derivations of the analytical results, detailed descriptions of the numerical simulations used to generate Fig. 1, and simulations investigating the effect of protein shape on the distribution of S_{dev} (Fig. S5).
Acknowledgments
We thank Ora Schueler-Furman, John Moult, Rhiju Das, and David Eisenberg for stimulating conversations about the origin of symmetry. This work was supported by a Knut and Alice Wallenberg Foundation postdoctoral fellowship (to I.A.), by Defense Threat Reduction Agency Contract MIPR7KO8970172 (to C.E.M.S.), in part by Spanish MEC grant SAB20060089 and project FPA200605423 (to D.B.K.), by the regional Comunidad de Madrid HEPHACOS project (to D.B.K.), by Department of Energy Grant DE-FG02-00ER41132 (to D.B.K.), and by the National Institutes of Health and Howard Hughes Medical Institute.
Footnotes
 **To whom correspondence should be addressed. Email: dabaker{at}u.washington.edu

Author contributions: I.A., C.E.M.S., D.B.K., P.B., and D.B. designed research; I.A., C.E.M.S., D.B.K., and D.B. performed research; I.A., C.E.M.S., D.B.K., and D.B. analyzed data; and I.A., C.E.M.S., and D.B. wrote the paper.

The authors declare no conflict of interest.

This article contains supporting information online at www.pnas.org/cgi/content/full/0807576105/DCSupplemental.

†† The number of degrees of freedom required to specify an oligomer structure is 6 [(number of subunits) − 1] for asymmetric complexes and 4 for symmetric complexes, so the fraction of randomly formed complexes that are symmetric is exponentially decreasing with exponent [6 (number of subunits) − 10]. Dimers of dimers have 6 degrees of freedom versus the 4 degrees of freedom for cyclic tetramers, which may also contribute to their greater prevalence. The loss of entropy on binding also favors dimers over higher-order oligomers because the entropic cost of forming a higher-order oligomer increases linearly with the number of subunits.
 © 2008 by The National Academy of Sciences of the USA
References
 4. Bennett MJ, Choe S, Eisenberg D
 6. Levy Y, Cho SS, Shen T, Onuchic JN, Wolynes PG
 8. Wolynes PG
 16. Monod J; Engstrom A, Strandberg B