Excluded volume, local structural cooperativity, and the polymer physics of protein folding rates
See allHide authors and affiliations

Edited by Peter G. Wolynes, University of California at San Diego, La Jolla, CA, and approved April 14, 2007 (received for review October 20, 2006)
Abstract
A coarsegrained variational model is used to investigate the polymer dynamics of barrier crossing for a diverse set of twostate folding proteins. The model gives reliable folding rate predictions provided excluded volume terms that induce minor structural cooperativity are included in the interaction potential. In general, the cooperative folding routes have sharper interfaces between folded and unfolded regions of the folding nucleus and higher free energy barriers. The calculated free energy barriers are strongly correlated with native topology as characterized by contact order. Increasing the rigidity of the folding nucleus changes the local structure of the transition state ensemble nonuniformly across the set of proteins studied. Nevertheless, the calculated prefactors k _{0} are found to be relatively uniform across the protein set, with variation in 1/k _{0} less than a factor of 5. This direct calculation justifies the common assumption that the prefactor is roughly the same for all small twostate folding proteins. Using the barrier heights obtained from the model and the bestfit monomer relaxation time 30 ns, we find that 1/k _{0} ∼ 1–5 μs (with average 1/k _{0} ∼ 4 μs). This model can be extended to study subtle aspects of folding such as the variation of the folding rate with stability or solvent viscosity and the onset of downhill folding.
Folding in small proteins is often well characterized as a cooperative transition between two well defined structural populations: an unstructured globule ensemble and a structured folded ensemble. The transition rate between free energy minima is controlled by the dynamics of passing through an unstable transition region determined by saddlepoints in the free energy surface. Accordingly, the rate is expected to follow Arrhenius form where β = 1/k _{B} T is the inverse temperature and ΔF ^{†} is the free energy difference between the unfolded and transitionstate ensembles. The exponential factor in Eq. 1 reflects the equilibrium population of the transitionstate ensemble relative to unfolded ensemble and the prefactor, k _{0}, is the time scale associated with the dynamics of crossing the free energy barrier. Successful identification of specific residues structured in the transitionstate ensemble by several different theoretical models (1–8) and numerous simulation studies (see, e.g., refs. 9–11 and references therein) has established that the topology of the native structure determines the folding mechanism of these proteins. In addition, twostate folding rates are well correlated with very simple measures of the native state topology such as contact order (12–15). While additive potentials often produce reasonable structural characterization of the transition state ensemble, the range of simulated folding rates and their relationship with contact order does not agree with experiment (10). In this paper we present direct folding rate calculations that capture the trends noted earlier in lattice models (16, 17) and very recently in continuum models (18, 19), leading several groups to speculate that the behavior of folding rates indicates enhanced structural cooperativity.
The term “structural cooperativity” usually refers to a mechanism by which the presence of a structured region makes additional order more favorable. For example, cooperativity is greater when a contact between two residues is more stabilized after one of the partners is already ordered. The additional stability is most naturally introduced through local attractive multibody interactions associated with coarsegrained potentials (18, 20, 21). Nonadditive potentials can also be neutral with respect to disordered and ordered residues and still increase cooperativity. For example, destabilizing partially ordered residues near structured resides generically sharpens the interface between folded and unfolded regions by increasing the surface energy of the folding nucleus (2, 22). Even purely repulsive interactions can enhance cooperativity. A good example of this is the “induced rigidity” (enhanced helical order) due to liquid crystal ordering in dense polymer solutions (23, 24) and globular helical proteins (25).
In the present analytic model, cooperativity is introduced through repulsive excludedvolume interactions between residues in proximity to native contact pairs. This potential is effectively “neutral” because it primarily destabilizes partially ordered residues at the interface of the folding nucleus. The cooperative term of the potential is pairwise additive in the space of all contacts, but it corresponds to an effective multibody potential when projected onto the set of native contacts. The particular form of cooperativity was developed so that the calculated barrier heights remain robust with respect to variations of excluded volume strength in the original variational model (3, 7).
Experimental evidence supporting a specific decomposition of the folding rate into the dynamic and thermodynamic factors in Eq. 1 is necessarily indirect (26). While structural predictions from models with a strong native state bias (Gō models) are robust, the value of the barrier height (and corresponding absolute time scale 1/k _{0}) is more sensitive to details of the model (27, 28). The predicted prefactor is commonly assumed to be roughly uniform for different proteins with a magnitude of O(0.1–1 μs^{−1}) (26, 29–31), although prefactors as large as O(100 μs^{−1}) have been proposed recently (27, 32). While the precise value of the prefactor is a subdominant determinant of the absolute rate, accurate estimates give an important reference time scale essential, for example, to identify the fastest measured rates as downhill (or barrierless) folding (26, 33, 34). Calculations for 28 twostate proteins presented in this paper predict that the prefactor is relatively uniform on the order O(1 μs^{−1}), largely independent of differences in the absolute folding rates or the native state topology. Furthermore, predicted folding rates agree with experimental trends provided interaction terms favoring modest structural cooperativity are included in the model. In particular, the relationship between barrier heights and contact order is found to be a consequence of relatively rigid folding nuclei.
Model: Excluded Volume and Cooperativity
The variational model developed by Portman, Takada, and Wolynes (7, 8) has proved reliable in predicting the structure of the transitionstate ensemble of individual proteins at the residue level of resolution (3, 35, 36). In this model, the free energy of partially ordered ensembles of polymer configurations is developed through a reference Gaussian chain inhomogeneously constrained to the native positions by N harmonic variational constraints {C}. A summary of the variational model is given in the supporting information (SI) Text . Here, we focus on how enhanced cooperativity can be realized by the addition of repulsive interactions between nonnative contacts.
We divide the energy into two contributions where the subscript indicates an average over the reference Hamiltonian. The first term represents attractive interactions between monomers that are neighbors in the native structure (i.e., the Gōmodel assumption). The second term, which is new to the model, represents excludedvolume interactions between nonnative contact pairs. Before explaining the consequences of these repulsive interactions, we first motivate the need for this contribution by considering the native contact potential in the original model.
The form of the interactions between native contacts is the sum of three Gaussians for convenience: u ^{nat}(r _{ij} ) = ε _{ij} ∑_{a=l,i,s} γ _{a} ^{e−αarij2} , where ε _{ij} is the strength of the interaction (37). Repulsive intermediaterange and attractive longrange Gaussians sum to give a potential well with minimum u ^{nat}(r _{min}) = −1 at the distance r _{min} = 6 Å. The shortrange Gaussian represents excluded volume for native contact pairs; we choose γ _{l} for each contact to give the same strength at zero distance, U(0). The finite strength of the repulsion at r = 0 is an artifact of the potential (and the finite native monomer density at short range), so there is some ambiguity in determining the appropriate value for U(0). This is troubling because it was found that the calculated barrier height is sensitive to the value of U(0), even though the structure of the transition state ensemble is relatively robust for most proteins.
The sensitivity of ΔF
^{†}/k
_{B}
T
_{f} on the excluded volume strength U(0) indicates that the cooperativity in the model is relatively low. This can be understood by considering the shortdistance pair density of a partially ordered chain, n
_{ij}
(r) = <δ(r − r
_{ij}
)〉_{0}. Integrating over angles gives the radial pair density
where the correlations G_{ij}
= 〈δr
_{i}
· δr
_{j}
〉_{0}/a
^{2} and δG_{ij}
= G_{ii}
+ G_{jj}
− 2G_{ij}
is the magnitude of the fluctuations about the relative mean separation s
_{ij} = ∑ (G_{ik}
− G_{jk}
)C_{k}
r
_{k}
^{N}; and a = 3.8 Å is the distance between adjacent monomers. The weight at short distances (r < r
_{0}, where r
_{0} is excluded volume interaction length scale) is small when the pair is sufficiently delocalized
When viewed as an effective potential involving only native contact pairs, the repulsion between nonnative contacts effectively induces local multibodied interactions. Because of chain connectivity, structured regions that are sufficiently nonlocal in sequence have greater cooperativity (see Fig. 1). Accordingly, we define a reduced set of nonnative contacts: for every native pair (i, j) with i − j ≥ 12, we include pairs within a window [i ± 4, j ± 4] and eliminate duplicates or native contact pairs from the sum. With this convention, the barrier height varies less than about 1–2 k _{B} T _{f}, over a wide range of U(0) (5 ≤ U(0) ≤ 60). In the following we take U(0) = 50.
The parameters of the model are the same as given in ref. 7 except (i) the magnitude U(0) is fixed for each contact; (ii) inclusion of the cooperativity term; (iii) the radius of gyration of the globule is set by the chain length according to the scaling law given in ref. 38. We note that with these parameters, the folding route for λ repressor studied in ref. 7 is structurally similar to the cooperative folding route, although the barrier is ∼2 times larger.
Results
Folding Rates and Prefactors.
We calculated the prefactors and folding routes of 28 twostate folding proteins. The corresponding folding rates at the transition midpoint, k_{f} = k _{0} e ^{−ΔF†/kBTf}, are shown in Fig. 2. Calculated rates in absolute units depend on the time scale set by the monomer relaxation rate σ = 3D _{0}/a^{2}, which we take as a fitting parameter. As shown in Fig. 2, the predicted and measured rates are well correlated (r = 0.8) with agreement within an order of magnitude for 80% of the proteins.
The bestfit monomer relaxation time 1/σ = 30 ns is on the order of the time scale of unfolding a helical segment (39). With this microscopic time scale, the longest relaxation time of a chain of 100 monomers is approximately τ_{R} ∼O(10 μs), which compares well (40) with the time scale for the fastest collapse kinetics measured in proteins and polypeptides (41–43). On the other hand, 1/σ is an order of magnitude slower than estimates obtained from an effective diffusion coefficient inferred from loop closure experiments of small peptides (∼1 ns) and two orders of magnitude slower than estimates from bare diffusion coefficients of the monomer (∼100 ps) (44). The source of small effective diffusion coefficients associated with simple Gaussian models is not fully understood (45–47). Nevertheless, results from recent experiments on small peptides under different solvent conditions indicate that intrachain interactions (which can be interpreted kinetically as a kind of internal friction) induce local activation barriers that renormalize the effective monomer diffusion coefficient (48, 49). Although controversial, internal friction may explain why the speed limit of protein folding is fixed at ∼O(0.1–1.0 μs) (50).
Even though the prefactor of each protein is calculated individually, its value in absolute units ultimately depends on the calculated barrier heights through the microscopic time scale σ. Relative prefactors, on the other hand, are independent of this fitting parameter. As shown in Fig. 3, the distribution of prefactors is relatively uniform, varying within a factor of 5 for most proteins. Using the fitted value for σ, τ_{0} = 1/k _{0} varies mainly between 1 and 5 μs, with an average τ̄_{0} = 4 μs. Given this narrow distribution, it is not surprising that a uniform prefactor of k̄ _{0} = (4 μs)^{−1} gives essentially the same correlation to the measured and predicted rates (data not shown). Thus, direct calculation of the barriercrossing dynamics gives solid evidence supporting the common assumption that the folding rate prefactor is largely independent of topology. Recent work by Henry and Eaton (28) also suggests the prefactor is relatively uniform across twostate folding proteins based on analysis of folding rates from a different set of analytic models. The value for the average prefactor k̄ _{0} ∼ 10^{5} s^{−1} agrees within an order of magnitude with estimates based on semiempirical and theoretical models (5, 51, 52) as well as analysis of thermodynamic data from differential scanning calorimetry (53). This value also is consistent with the fastest measured rates, ∼1 μs, if the time scale for downhill folding is approximated by the Arrhenius rate with a vanishing barrier (26, 34).
A closer look at the two proteins (1lmb and 1pks) with exceptionally small calculated prefactors reveals that in each case the unstable mode becomes degenerate at a stability near the transition midpoint. The structure of the transition ensemble changes sharply, although continuously, as the temperature crosses the degenerate point. In particular, the curvature of the unstable mode (and consequently the calculated prefactor) sharply vanishes in a cusp catastrophe (54). Away from these isolated temperatures, the prefactors return to the range exhibited by the majority of the proteins studied. Several of the proteins studied have similar rapid changes of the transition state as a function of temperature, occurring at temperature sufficiently far away from the midpoint so that the prefactor is relatively unaffected near T _{f}. In this highdimensional model, catastrophes can be generally expected when local minima and saddlepoints merge at isolated values of the control parameters (e.g., temperature). The shape of the calculated prefactor versus temperature is thus determined by these degenerate points. For example, even for a route with a single transition state, the metastable unfolded or folded minimum disappears in a fold catastrophe at the limit of stability (spinodal) for both low and high temperatures (55, 56). If there are no other catastrophes, the calculated prefactor obtains a maximum at an intermediate temperature and vanishes at the spinodals. Near the maximum the prefactor varies much more slowly with temperature than near the spinodal. This generic shape of the prefactor is interesting because it can account qualitatively for nonlinear dependence of the rate with stability (chevron turnovers) and may indicate kinetic signatures anticipating the onset of downhill folding.
Nevertheless, interpreting these results requires some care. The harmonic expansion of the free energy is not expected to accurately reflect the global curvature of the free energy over ∼F ^{†} ± k _{B} T when the local curvature is very small. For these cases, it is likely that the formalism should be modified away from strictly local curvatures to get accurate estimates of the prefactor. Even if the renormalized prefactor is found to be relatively constant, the rapid change of the order parameter at the transition state that accompanies a catastrophe may itself account for chevron rollover, similar to the transition state switching mechanism suggested by Oliveberg and colleagues (57). The subtle variation of the prefactor and free energy barrier height with stability is an important issue that has yet to be thoroughly explored.
Free Energy Profiles and Folding Routes.
In the context of identifying kinetic trends for all twostate proteins, a complete theory of the folding mechanism must reliably predict structural properties of the transition state ensemble in addition to absolute folding rates. The formation of local order along the folding route can be characterized by the degree of localization about the native positions We refer to ρ _{i} as the native density. Comparing the folding profiles and structural localization of the residues shown in Fig. 4 illustrates the cooperative nature of the folding routes induced by the repulsive nonnative interactions. While the coarsegrained structures of the transition state ensembles are similar for this protein, the residues order more gradually in the noncooperative routes. Still, even for the cooperative route, the interface has a finite width as the structural ensembles retain some partial ordering of the residues. The sharper interface of the cooperative route is also accompanied by a significantly larger barrier.
The effect of cooperativity on the structure of the transition state ensemble is complicated to describe in general. Cooperativity narrows the interface by destabilizing partially ordered residues in favor of either more ordered or more disordered. Whether a particular interfacial residue is excluded or incorporated into the folding nucleus is a subtle question, determined by the delicate balance between changes in entropy and energy due to localization. One measure to characterize changes in local structural order is the crosscorrelation where ρ̂_{coop} and ρ̂_{noncoop} denote unit vectors with elements ρ _{i} [{C}] for transition state ensembles with and without cooperativity, respectively. Fig. 5 shows the value Ω for each protein as well as a typical example of the overlap of native densities evaluated at Q*. For 80% of the proteins studied, the overlap between the transition state ensemble structures is >60%. Nevertheless, the variation of Ω indicates this form of cooperativity does not effect every protein uniformly.
Changes of the transition state ensemble can be also be characterized by the variation of the global order parameter ΔQ*. As shown in Fig. 5, the majority of the proteins studied have ΔQ* ≤ 0.1. In terms of global order, the αhelical proteins are not very sensitive to cooperativity, although the local structure of the transition state ensemble can change significantly. For β and α/β proteins, some systematic errors in the calculated barrier height can be associated with relatively large changes in the global order. For proteins with ΔQ* > 0.1 (1urn, 1c8c, 1psf, 1csp), the model overestimates the barrier heights, whereas for proteins with ΔQ* < 0.1 (1pgb, 1a0n, 1coa, 1shg) the model underestimates the barrier height (see Fig. 2). This trend may be particular to the form of cooperativity used in this model.
Direct comparison between theoretical and measured φ values shows that cooperative routes generally have significantly higher correlation with experiment. Following Garbuzynskiy et al. (27), we make a distinction between contact maps obtained from native structures determined by xray crystallography and those from the first model of an NMR structure or minimized averaged NMR structure. Overall, the theory predicts φ values for studied xray structures reasonably well. Still, there are exceptions. Of the 11 xray structures (see legend of Fig. 2), two proteins(1shg and 1ten) have large negative correlations. The average correlation coefficient for 9 remaining proteins increases from 0.33 (for noncooperative routes) to 0.6 (cooperative routes). Predictions of φ values for NMRdetermined structures is significantly worse, with the average 0.1 for both noncooperative routes and cooperative routes. In Fig. 5, we give three examples each for xray structures and NMR structures.
Folding Barriers and Absolute Contact Order.
Because the prefactors are relatively uniform, the wide variation of relative folding rates is determined by differences in free energy barriers. To investigate the relationship between barrier heights and native topology, we consider the correlation between the free energy barriers and the absolute contact order (13). Fig. 6 shows that the calculated barrier height is highly correlated (r = 0.9) with absolute contact order when the cooperativity term is included in the model. The barrier heights calculated without cooperativity do not show significant correlation with absolute contact order. This observation indicates that the relationship between native topology and the folding rate is sensitive to the rigidity of the folding nucleus. This may in fact be a robust result, largely independent of the details of reasonable potentials that increase local cooperativity between native contacts (18).
Assuming the prefactor is roughly uniform, the range of measured rates for this protein set corresponds to a range of free energy barrier heights of ∼14k _{B} T _{f}, in agreement with the calculated barriers. In contrast, the range of barriers for the noncooperative routes spans only ∼5k _{B} T _{f}. Interestingly, this is the same range determined through coarsegrained Gōmodel simulations (10, 58, 59). Furthermore, the low correlation between contact order and barrier heights of noncooperative routes is also reminiscent of results from Gōmodel simulations (10). Together, these results suggest that the cooperativity of typical Gōmodel simulations based on twobody pair potential is too low (16–19).
Conclusion
The repulsive potential between residues in proximity to native contacts is a convenient way to alleviate sensitivity on the excluded volume strength in the original model. This approach was successful because the potential enhances cooperativity of the model. Our point of view is that the nature of the interface of the folding nucleus is key in determining the behavior of folding rates and mechanisms, regardless of the specific form of cooperative interactions or the microscopic origins. If the qualitative results from this study can be extended beyond this variation model, it is likely to be limited to models that enhance cooperativity locally. Because these results are robust with respect to the excluded volume strength U(0), the model lacks flexibility to explore a wide range of surface tensions. It will be interesting to see whether these conclusions hold when the interfacial surface tension is controlled directly through, for example, the formalism of density functional theory of firstorder nucleation.
Acknowledgments
This work was supported in part by grant awarded by the Ohio Board of Regents Research Challenge program.
Footnotes
 ^{‡}To whom correspondence should be addressed. Email: jportman{at}kent.edu

Author contributions: J.J.P. designed research; X.Q. and J.J.P. performed research; X.Q. and J.J.P. analyzed data; and X.Q. and J.J.P. wrote the paper.

The authors declare no conflict of interest.

This article is a PNAS Direct Submission.

This article contains supporting information online at www.pnas.org/cgi/content/full/0609321104/DC1.

Freely available online through the PNAS open access option.
 © 2007 by The National Academy of Sciences of the USA
References

↵
 Shoemaker BA ,
 Wang J ,
 Wolynes PG
 ↵
 ↵

↵
 Alm E ,
 Baker D

↵
 Muñoz V ,
 Eaton WA

↵
 Galzitskaya OV ,
 Finkelstein AV
 ↵
 ↵
 ↵
 ↵
 ↵
 ↵
 ↵
 ↵
 ↵
 ↵
 ↵

↵
 Ejtehadi MR ,
 Avall SP ,
 Plotkin SS
 ↵
 ↵
 ↵

↵
 Wolynes PG
 ↵
 ↵
 ↵
 ↵
 ↵
 ↵
 ↵
 ↵
 ↵
 ↵
 ↵
 ↵
 ↵
 ↵
 ↵

↵
 Jha AK ,
 Colubri A ,
 Freed KF ,
 Sosnick TR
 ↵

↵
 de Gennes PG
 ↵

↵
 Qui L ,
 Zachariah C ,
 Hagen SJ
 ↵

↵
 Lapidus LJ ,
 Eaton WA ,
 Hofrichter J
 ↵
 ↵
 ↵
 ↵

↵
 Moglich A ,
 Joder K ,
 Kiefhaber T
 ↵
 ↵

↵
 Kouza M ,
 Li MS ,
 O'Brien EP, Jr ,
 Hu CK ,
 Thirumalai D
 ↵

↵
 Gilmore R

↵
 Wales DJ
 ↵
 ↵
 ↵
 ↵
 ↵
 ↵
 ↵