Speeding molecular recognition by using the folding funnel: The fly-casting mechanism

Benjamin A. Shoemaker, John J. Portman, and Peter G. Wolynes*

Departments of Chemistry and Physics, University of Illinois, 600 South Mathews Avenue, Urbana, IL 61801

Contributed by Peter G. Wolynes

Abstract

Protein folding and binding are kindred processes. Many proteins in the cell are unfolded, so folding and function are coupled. This paper investigates how binding kinetics is influenced by the folding of a protein. We find that a relatively unstructured protein molecule can have a greater capture radius for a specific binding site than the folded state with its restricted conformational freedom. In this scenario of binding, the unfolded state binds weakly at a relatively large distance followed by folding as the protein approaches the binding site: the “fly-casting mechanism.” We illustrate this scenario with the hypothetical kinetics of binding a single repressor molecule to a DNA site and find that the binding rate can be significantly enhanced over the rate of binding of a fully folded protein.

We often glibly state that a protein must be folded to function. The reasoning underlying this statement is that organizing complex networks of chemical reactions in the cell requires these reactions to be highly specific. This specificity is achieved ultimately by having a high degree of geometrical precision in molecular binding—the famous “lock and key.” Geometric precision accompanies the increased rigidity of a biomolecule once it has folded; thus, apparently, folding is required for specific function. It does come as a surprise, then, that many proteins in the cell appear to be unfolded most of the time (see ref. 1 and refs. therein). Several ideas about potential biological advantages of being unfolded have been proposed. For example, the rapid turnover of unfolded proteins because of proteolytic degradation may be required for cell-cycle regulation (2). Thermodynamic arguments have also been made suggesting that coupling folding and binding may allow greater equilibrium distinctions for binding to different sites (3). In this note, we investigate whether too much rigidity may conflict with the need for biomolecules to move during their function and is therefore a kinetic disadvantage. There is much evidence for residual flexibility in the folded state, which must be thought of as having an ensemble of conformations (4). Still, the range of motions allowed in the folded state, as measured by Debye–Waller factors (5), is more restricted than those allowed in an unfolded molecule, thus slowing the exploration of configuration space. Here, we illustrate, by means of a specific example of operator binding to DNA, how the speed of molecular recognition can be enhanced by having folding (necessary for the required specificity) occur during binding rather than before.

Folding and binding are kindred phenomena. The similarity of binding and folding is clear at the thermodynamic level, where both processes involve accurately locating molecular fragments with respect to each other, reducing the configurational entropy, and simultaneously lowering the free energy by the exclusion of solvent and formation of hydrogen bonds and salt bridges (6, 7). At the structural level, the similarity between packing patterns between protein subunits and in protein interiors has been investigated (8, 9). The question of induced fit in binding (10) is clearly parallel to questions of specificity in the folding problem (11–14). Our goal is to unite folding and binding in a dynamic perspective. The dynamics of searching for specific conformations in folding has been treated by exploiting statistical mechanical theories of energy landscapes (15–18). These theories picture folding as diffusion through an ensemble of partially folded states characterized by reaction coordinates or order parameters. For this reduced description, the energy landscape approach accounts for the trends in average energy as the protein becomes more ordered and utilizes the statistics of the variations in energy, the “ruggedness” of the landscape that allows the possibility of kinetic traps, to understand the speed of the search. Rapid search through the landscape aimed towards a specific native structure requires the landscape to have an overall gradient that is large compared to the local ruggedness. Globally, the energy landscape resembles a funnel (19–22). The effect of the ruggedness on a funneled landscape is to provide a slowing of the diffusion through the configuration space. The rate of finding the folded state depends on the configurational diffusion rate as well as the free energy barriers arising from the tradeoff between energy and entropy captured by the funnel description.
The free energy of each individual configuration of the chain falls as the ground state is approached, thus providing a driving force in that direction. Countering this trend, there are many more configurations, which are disorganized, at the top of a funnel landscape. The way in which energy and entropy compensate each other determines how big the thermodynamic barrier to folding is. The thermodynamic barriers for real proteins seem to be well described by perfectly funnel-like landscapes (18), so-called Gō models (23). These models account for the energetics of forming a native-like contact, but nonnative interactions are dismissed as thermodynamically insignificant. Such models finesse the issue of specificity by taking it as a phenomenological fact. Dynamics on perfectly funnel-like landscapes can be addressed by using several different mathematical approaches, including direct molecular dynamics or Monte Carlo simulation (24–28). An alternative approach develops a free energy functional that describes energy–entropy compensation as different parts of the protein order. Such free energy functional methods make explicit the interplay of energetics and chain topology. Unlike detailed atomistic simulations, these methods use as input chain statistics directly accessible to experimental determination such as chain stiffness, etc. The free energy functional is described through local order parameters such as the fluctuation of each residue about the native structure (29) or the probability of forming specific contacts (30–32). Such free energy functionals can often be even further simplified by assuming perfect ordering of contiguous segments. These simplified theories account well for trends in folding kinetics (33–35).
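The energy–entropy compensation described above can be illustrated with a toy one-dimensional funnel. The sketch below is not the free energy functional used in this paper: it assumes a hypothetical model (units with k_B = 1) in which the energy falls linearly with an order parameter Q while the configurational entropy is lost mostly at small Q, and all parameter values are invented for illustration. It locates the resulting thermodynamic barrier at the temperature where the Q = 0 and Q = 1 ends are equally stable.

```python
def funnel_free_energy(q, temperature, n_res=50, eps=1.0, s0=1.0):
    """Toy F(Q) = E(Q) - T*S(Q); every parameter here is a hypothetical choice."""
    energy = -eps * n_res * q                  # energy falls as native order grows
    entropy = s0 * n_res * (1.0 - q) ** 2      # entropy is lost fastest at small Q
    return energy - temperature * entropy


def barrier(temperature, n_grid=1000, **params):
    """Return (Q at the top of the barrier, barrier height measured from Q = 0)."""
    qs = [i / n_grid for i in range(n_grid + 1)]
    fs = [funnel_free_energy(q, temperature, **params) for q in qs]
    top = max(range(len(fs)), key=fs.__getitem__)
    return qs[top], fs[top] - fs[0]


# In this toy model F(0) = F(1) when T = eps / s0, which plays the role of T_f.
t_f = 1.0
q_top, height = barrier(t_f)
print(f"barrier at Q = {q_top:.2f}, height = {height:.2f} (units of eps)")
```

Because entropy is given up before most of the energy is gained, the free energy is concave in Q and a barrier appears between the disordered and ordered ends, which is the qualitative point made in the text.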

Free Energy Profiles for Folding and Binding

To describe the combined processes of folding and binding, we first introduce a free energy functional that depends on the fraction of specific contacts made in the protein itself, {q ij p} (30–32). Extending this contact free energy functional to describe binding requires additional order parameters: the fractional occupancy of contacts between the protein and its target-binding site (surface contacts), {q ij s}, and geometrical parameters describing the approach of the protein molecule as a whole towards its target, e.g., a vector displacement of the center of mass of the protein, R cm, and a set of Euler angles, Ω, specifying the molecular orientation with respect to target fixed axes.

Although the explicit form of the free energy is outlined in the Appendix, it is useful here to comment briefly on the newer physical considerations incorporated in this functional. We consider the free energy functional F[{q ij p}, {q ij s}; R cm, Ω] = F p[{q ij p}] + F b[{q ij p}, {q ij s}; R cm, Ω]. The first term, representing the free energy functional for protein contacts, is taken from previous publications (30–32). The second term accounts for the free energy of binding and couples local structure formation and the binding interactions through the entropy cost of forming surface contacts at a fixed distance and orientation from the binding site. The entropy of localizing surface residues near the binding site depends on the accessible conformations available to those residues. We describe the ensemble of accessible conformations of a partially ordered protein by the distribution of monomer positions about the native structure, {ρ i(x i, r i)}. These densities depend on R cm and Ω through the fiducial native positions {r i} (see Fig. 1). The monomer density depends implicitly on the structural characterization of the ensemble. Unstructured regions of the protein will have larger fluctuations and hence a lower entropy cost of binding compared to more localized residues at the same separation distance from the binding site.

Figure 1

The position of a protein surface residue as the protein is displaced from its operator-binding site as used in the free energy functional model. The native crystal structure defines a constant fiducial position of a surface residue (ri) from the protein center of mass (R cm). The fluctuations of residue i are centered about the mean position r i = 〈x i〉 (relative to the binding location, r i B). If the protein is translated only without rotation (Ω = 0), then r i B equals ri so that the probability density of residue i can be described by P(x i − r i).


Analogous to refs. 30–32, the contact probabilities are determined self-consistently by minimizing the free energy with respect to {q ij p} and {q ij s}, now at a fixed location from the binding site. Other global structural reaction coordinates such as the total fraction of internal protein contacts, Q p, and the fraction of contacts with the target, Q s, are constrained through Lagrange multipliers. If the internal chain dynamics is fast, this procedure leads to free energy profiles as a function of the approach geometry.

Reduced free energy profiles parameterized by R cm and Ω are directly observable through single-molecule experiments by using atomic force microscope techniques (36–38). Even without detailed calculations, we can easily understand how such free energy profiles differ in the limiting cases in which the molecule is completely folded or completely unstructured. If the protein were completely folded at all times (Q p = 1), then binding could occur only when the protein approaches the binding site within a distance of the order of the folded state rms displacement (rmsd), typically on the order of 1 Å, and with an appropriate orientation so that binding contacts can be made across the interface. This gives rise to a very short-range anisotropic attraction because of the small rmsd in a completely folded protein. Effectively there is a large entropic barrier to binding. Conversely, when a protein is completely unfolded, binding can begin as soon as the protein approaches the binding site within a radius of gyration of the unfolded protein (a fraction of the end-to-end distance of a random coil of N bonds of length a, R g ∼ √N a). When unfolded, the attraction has a much longer range but will be quite weak because the binding contacts will not be strongly ordered until they cooperate. By coupling folding with binding, a weak attraction at large distances grows on approach, allowing further folding of the molecule to take place cooperatively, becoming a short-range specific attraction. These considerations suggest that the binding speed is enhanced relative to that of the fully folded protein.
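The length scales in this argument can be checked with the ideal (random coil) chain relations. The sketch below assumes a bond length of 3.8 Å, the typical distance between consecutive Cα atoms; that value is not stated in the text and is used here only because it reproduces the 17 Å end-to-end distance quoted for 20 bond segments in the Fig. 2 caption. The radius of gyration uses the long-chain ideal result R g = a√(N/6).

```python
import math


def end_to_end_rms(n_bonds, bond_length):
    """Root-mean-square end-to-end distance of an ideal chain, a*sqrt(N)."""
    return bond_length * math.sqrt(n_bonds)


def radius_of_gyration(n_bonds, bond_length):
    """Ideal-chain radius of gyration in the long-chain limit, a*sqrt(N/6)."""
    return bond_length * math.sqrt(n_bonds / 6.0)


A_CA = 3.8        # Angstrom; assumed C-alpha virtual bond length (not from the paper)
N_SEGMENTS = 20   # number of bond segments quoted in the Fig. 2 caption

print(f"end-to-end distance: {end_to_end_rms(N_SEGMENTS, A_CA):.1f} A")
print(f"radius of gyration:  {radius_of_gyration(N_SEGMENTS, A_CA):.1f} A")
```

Either measure is several times larger than the roughly 1 Å fluctuation of a folded residue, which is the disparity in range that the limiting-case argument relies on.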

Assuming a concentration of one molecule per Å³, the binding flux for species α can be approximated by the simple formula (Eq. 2), where D is the diffusion constant, F α(R) is the free energy of species α at the location indicated by R, and R 0 is the contact distance. The binding rate measured in the laboratory comes from each species, so k tot = j u M u + j f M f, where M α = exp[−βF α(∞)]/(exp[−βF u(∞)] + exp[−βF f(∞)]) represents the fraction of each species. This bimolecular rate is in units of inverse molar per second. As a point of reference, we also compute rates for binding of the completely folded (Q = 1) species as k Q=1 = j Q=1 M Q=1, where the subscript denoting the protein order parameter Q p has been suppressed for clarity.
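The displayed expression for the flux is not reproduced in this text, so the sketch below rests on an assumption: it uses the standard steady-state Debye–Smoluchowski form for isotropic three-dimensional diffusion in a radial potential, j = 4πD / ∫ exp[β(F(R) − F(∞))] R⁻² dR, integrated from R 0 to infinity. This is consistent with the quantities named above (D, F α(R), R 0) but may differ in detail from the authors' equation. The square-well profiles, their depths and ranges, and the arbitrary units are hypothetical; only R 0 = 3 Å is taken from the Fig. 5 caption.

```python
import math


def binding_flux(free_energy, r0, diffusion, beta=1.0, f_inf=0.0, n=20000):
    """Assumed Debye-Smoluchowski flux: 4*pi*D / int_{r0}^inf e^{beta(F-F_inf)} / R^2 dR.

    The substitution u = 1/R maps the integral onto the finite interval (0, 1/r0),
    which is then evaluated with the midpoint rule.
    """
    du = (1.0 / r0) / n
    total = 0.0
    for i in range(n):
        u = (i + 0.5) * du
        total += math.exp(beta * (free_energy(1.0 / u) - f_inf)) * du
    return 4.0 * math.pi * diffusion / total


def populations(f_unfolded_inf, f_folded_inf, beta=1.0):
    """Boltzmann fractions M_u and M_f of the two species far from the site."""
    w_u = math.exp(-beta * f_unfolded_inf)
    w_f = math.exp(-beta * f_folded_inf)
    return w_u / (w_u + w_f), w_f / (w_u + w_f)


def square_well(depth, edge):
    """Hypothetical attraction of the given depth (in k_B*T) for R < edge."""
    def profile(r):
        return -depth if r < edge else 0.0
    return profile


R0 = 3.0   # Angstrom, contact distance quoted in the Fig. 5 caption
D = 1.0    # diffusion constant in arbitrary units

folded = square_well(depth=2.0, edge=R0 + 1.0)      # strong, short range (made up)
unfolded = square_well(depth=0.5, edge=R0 + 15.0)   # weak, long range (made up)

j_f = binding_flux(folded, R0, D)
j_u = binding_flux(unfolded, R0, D)
m_u, m_f = populations(0.0, 0.0)                    # equal stability far away
k_tot = j_u * m_u + j_f * m_f
print(f"j_f = {j_f:.2f}, j_u = {j_u:.2f}, k_tot = {k_tot:.2f} (arbitrary units)")
```

With no potential at all the function reduces to the Smoluchowski result 4πD R 0, which is a convenient sanity check before inserting a computed free energy profile in place of the square wells.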

Eq. 2 assumes three-dimensional isotropic diffusion and averages completely over orientations. Accounting for anisotropy is straightforward and would lead to a larger relative speedup. Here we calculate the free energy only as a function of distance from the binding site with the orientation fixed as in the folded complex. Somewhere along the approach to the binding site, the folding commences because the free energy curves cross. Precisely where this occurs depends on whether folding can be nucleated from the binding site outwards. This “nonadiabatic” effect, caused by diffusion through the protein conformation space, requires us to examine a two-dimensional free energy plot F(Q p, R).

An Illustrative Example of the Fly-Casting Mechanism

Speed will be important for binding when concentrations are low, as in gene regulation, where only a few copies of a gene and its regulators are sometimes present. Unfolded gene regulators are often found (39–41). The protein synthesis parts of the genetic switch mechanism may well be slower than this search, in which event binding speed per se may not have a significant evolutionary advantage, whereas kinetic discrimination might (41). Despite large amounts of beautiful work on these kinetics over the years, there remain open questions of mechanism (42–46). Our study should therefore be considered as merely illustrative of the scenario. For concreteness, we consider a caricature of binding of arc repressor. When bound, arc repressor is dimeric. Early evidence suggested that preformed and folded dimers bind as a bimolecular event (47). Our model calculation used to illustrate the mechanism considers a different scenario where one monomer of the pair is already bound, and the final unfolded monomer approaches. The reality may well be somewhere in between, requiring the consideration of a true termolecular process in which the two arc repressor molecules move up and down the DNA chain, binding only when they both encounter the target DNA sequence. Other systems that might better conform to the present scenario may include bZIP proteins (41) and other proteins involved in protein–protein signaling (48). Through a nonspecific largely electrostatic attraction, the repressors are localized in the vicinity of the nucleic acid chain (44). Again, for purposes of illustration, we neglect this aspect, assuming for simplicity that one repressor has already found its site, has folded, and is waiting for the next unfolded repressor to arrive. The folding–binding coupling should have an even more dramatic effect on a true (simultaneous) termolecular process.

The free energy curves as a function of separation distance are plotted in Fig. 2. The two curves that correspond to the folded and unfolded minima remain distinct throughout much of the binding. Also shown on the same plot are the profiles for the limiting cases of a completely folded state that has no disorder (Q p = 1) and a completely unfolded state that has no order (Q p = 0). These limiting cases behave as expected: for the Q p = 1 species, the free energy switches rapidly from no interaction with the binding site to the full stabilization of binding within a short distance of approach to the well; in contrast, for the Q p = 0 species, a weak interaction is formed further away but remains weak at closest approach. We see that if the protein enters in its unfolded but partially ordered state, a long-range attraction is formed that is quite weak but that this surface crosses to the folded state at some point. If folding dynamics is sufficiently fast, then the system switches from the unfolded curve to the lower free energy folded curve, giving, ultimately, the strong binding of the folded state to the DNA molecule.

Figure 2

Free energy of binding for specific ensembles of arc repressor. The F(ΔR) curves are shown for the folded (red) and unfolded (green) minima as well as the fully ordered [Q = 1 (black)] and disordered [Q = 0 (blue)] states at T f. ΔR = R − R 0 is the separation distance relative to that of the bound complex (R 0). The effective capture radius is expanded by 8 Å for the unfolded state over the folded (which is 16 Å) and by 14 Å over that for the completely folded Q = 1 state. The orange curve is the free energy of the steepest descent path on the F(ΔR, Q p) surface shown in Fig. 3. Note the broken scales of R used to delineate the folding events, which occur in a narrow range of approach distance. The radius of the square well potential is b = R 0 + 6.5 Å. The Debye–Waller factor for the folded residues is Δf = 1 Å, and for the unfolded chain Δu = 17 Å, which is the end-to-end distance of a random coil with 20 bond segments (the number of residues in the binding site).


The two-dimensional plots of F(R, Q p) shown in Fig. 3 give a more complete picture. The transition from the unfolded to folded states does not occur precisely at the equilibrium thermodynamic point where the one-dimensional free energy profiles cross, but instead would most probably follow one of the steepest descent trajectories. Considering the variation of target contact probability along the steepest descent trajectory, shown in red in Fig. 3 and in orange in Fig. 2, we see that some weak binding contacts are made at a large distance in the unfolded state because of the large fluctuations of the unfolded molecule. These contacts, however, ease the formation of structure within the molecule, allowing the free energy to lower as the target is approached. One can picturesquely imagine the process as one of a randomly gesticulating unfolded molecule casting out pieces of polymer chain, waiting for these to bind to the target.
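The idea of following a steepest descent trajectory on a two-dimensional surface can be sketched numerically. The surface below is a hypothetical stand-in, not the functional computed in this paper: a symmetric double well in the order parameter q is combined with an attraction whose strength grows and whose range shrinks as q increases. The parameter values, the rescaling of distance by 10 Å so that both coordinates are dimensionless, and the plain Euclidean metric used to define "steepest" are all assumptions made for illustration.

```python
import math


def free_energy(dr, q, h=2.0, eps_u=1.0, eps_f=10.0, lam_u=8.0, lam_f=1.0):
    """Hypothetical surface F(dr, q) in units of k_B*T_f; dr is in Angstrom."""
    fold = 16.0 * h * q ** 2 * (1.0 - q) ** 2      # double well, minima at q = 0, 1
    eps = eps_u + (eps_f - eps_u) * q              # attraction strengthens with order
    lam = lam_u + (lam_f - lam_u) * q              # attraction range shrinks with order
    return fold - eps * math.exp(-dr / lam)


def gradient(dr, q, d=1e-6):
    """Central-difference estimate of (dF/d(dr), dF/dq)."""
    g_dr = (free_energy(dr + d, q) - free_energy(dr - d, q)) / (2.0 * d)
    g_q = (free_energy(dr, q + d) - free_energy(dr, q - d)) / (2.0 * d)
    return g_dr, g_q


def steepest_descent(dr, q, step=0.01, scale=10.0, max_steps=100000):
    """Fixed-step descent in the coordinates (dr/scale, q), clamped to dr >= 0, 0 <= q <= 1."""
    path = [(dr, q)]
    for _ in range(max_steps):
        g_dr, g_q = gradient(dr, q)
        new_dr = max(0.0, dr - step * scale ** 2 * g_dr)
        new_q = min(1.0, max(0.0, q - step * g_q))
        if (new_dr, new_q) == (dr, q):   # stuck against both bounds: fully bound and folded
            break
        dr, q = new_dr, new_q
        path.append((dr, q))
    return path


path = steepest_descent(dr=20.0, q=0.02)       # start far away and nearly disordered
dr_at_folding = next(dr for dr, q in path if q > 0.5)
print(f"steps: {len(path)}, folding sets in near dr = {dr_at_folding:.2f} A")
print(f"final state: dr = {path[-1][0]:.2f} A, q = {path[-1][1]:.2f}")
```

In this toy surface the trajectory stays in the disordered basin while it drifts inward on the weak long-range attraction, and ordering takes over only within a narrow range of approach distance, qualitatively like the behavior described for the unfolded trajectory.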

Figure 3

Contour plots of the two-dimensional free energy surface (in units of k B T f) as a function of approach distance and contact ordering fraction Q p. ΔR = R − R 0 is the separation distance relative to that of the bound complex (R 0). The steepest descent paths are shown for the unfolded (red) and folded (green) states. At T f, far from the binding site, the folded and unfolded structures are equally favorable. At ΔR = 20 Å, the unfolded state already feels the interaction, falling to a lower contour, while the folded state remains unaffected by the binding interaction. Q p for the unfolded ensemble hardly increases until ΔR = 8 Å, whereupon the free energy falls dramatically, while Q p increases by about 0.2. Closer in, the unfolded trajectory completes folding rapidly with binding. Parameters and broken scale in R are the same as in Fig. 2.


Once the target has been engaged, the folding free energy comes into play. The whole molecule folds and reels in the target, as in fly fishing. The target has a smaller diffusion constant than the protein; so, in fact, the protein gets reeled into its DNA sequence target rather than the other way around. We call the present sequence of events the “fly-casting mechanism.” The main components of this mechanism are illustrated in Fig. 4. How effective the fly-casting mechanism is in speeding up rates depends on the speed of folding itself and on the concentrations of target protein molecule pairs, which determine the overall folding vs. binding rates. Our simplifications probably lead to an underestimate of the effect.

Figure 4

A cartoon of how fly casting increases folding speed. At an approach distance R cm, the partially folded ensemble is already able to form a few initial contacts to the binding site, while the folded structure remains out of range because of the smaller fluctuations in the folded state. Although these initial contacts are weak, they allow the protein to “reel” itself into the operator, completing folding and binding simultaneously. The increased capture radius allows the unfolded protein to find its specific binding site faster.


The free energy calculated as a function of separation distance allows us to compute the binding rate using Eq. 2. In Fig. 5, the ratio of rates k tot/k Q=1, calculated from the free energy along the steepest descent paths shown in Fig. 3, is plotted for a range of unfolded concentrations, M u. This ratio indicates the effective speedup of the fly-casting mechanism over a range of temperatures and binding stabilities. When allowed the flexibility to latch onto the binding surface at a greater distance, the partially folded protein binds at up to 1.6 times the rate that would result if only the completely folded (Q p = 1) protein could bind.

Figure 5

The fly-casting speedup ratio, k tot/k Q=1, is plotted vs. the unfolded protein concentration assuming a binding stability corresponding to experiment. For K bind = K exp, fly casting speeds up binding to 1.6 times the rate of the fully formed (Q p = 1) protein through a range of reasonable protein concentrations. Parameters for the free energy functional are as in Fig. 2, and the contact radius is taken to be R 0 = 3 Å.


The fly-casting mechanism leads to characteristic and potentially testable dependences of measured rates on stability and temperature. The Arrhenius plot for k tot shown in Fig. 6 indicates that binding is an activated process (if only mildly) at the experimental binding stability, K eq b = 5.2 × 10⁻¹⁴ M (47). This arises because partial unfolding caused by temperature increase helps the mechanism come into play. This behavior is found to persist throughout a wide range of values of the equilibrium constant. For comparison, the binding rate for the completely folded (Q p = 1) state is also plotted in Fig. 6. For this reference situation, binding is virtually unactivated. The temperature dependence given by the model assumes the binding energies themselves are temperature independent. Of course, this assumption is not valid because of the entropic contributions to the hydrophobic interactions. Thus, the laboratory temperature dependence may well be nonmonotonic, as in the phenomenon of cold denaturation. A real test of the mechanism thus needs careful measurements of the equilibrium interactions as a function of temperature.
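Whether a measured or computed rate is "activated" can be quantified by the slope of an Arrhenius plot. The sketch below is a generic least-squares fit of ln k against 1/T, not a procedure taken from this paper; the function name, the choice of kcal/mol units, and the synthetic rates used to exercise it are assumptions for illustration only.

```python
import math

K_B = 0.0019872  # Boltzmann constant in kcal/(mol K)


def apparent_activation_energy(temperatures, rates, k_b=K_B):
    """Least-squares slope of ln(k) vs 1/T, returned as E_a = -k_B * slope."""
    xs = [1.0 / t for t in temperatures]
    ys = [math.log(k) for k in rates]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return -k_b * s_xy / s_xx


# Synthetic example: rates generated with a known 3 kcal/mol activation energy.
temps = [280.0, 290.0, 300.0, 310.0, 320.0]
rates = [1.0e9 * math.exp(-3.0 / (K_B * t)) for t in temps]
print(f"recovered E_a = {apparent_activation_energy(temps, rates):.2f} kcal/mol")
```

Applied to the two curves of an Arrhenius plot such as Fig. 6, a clearly positive value would correspond to the mildly activated total binding rate and a value near zero to the virtually unactivated fully folded reference.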

Figure 6

Arrhenius plots for the total binding flux k tot (solid) and completely folded binding flux (dashed). Note that the energetic interaction terms are held constant here. Parameters are the same as in Fig. 5.


In the process of binding, as in folding a single monomer, explicitly cooperative (i.e., many-body) forces caused by hydrophobic residue burial and side-chain ordering play a role. The free energy functional model implicitly takes these into account for binding in a number of ways through coarse-graining contacts. We have also considered the effects of introducing an explicit local cooperativity between surface contacts, F b coop = −∑ij q ij s q ij s. This cooperativity diminishes the role of fly casting in both the folded and unfolded minima. Significantly increased local cooperativity effectively turns off the ability of the folded state to fly cast, while still allowing the unfolded state to maintain its expanded capture radius (although with a smaller free energetic difference to the Q p = 1 state).

Because folding and binding are coupled, one consequence of the fly-casting mechanism is that even residues away from the binding site in the folded complex can influence the reaction. Indeed, preliminary calculations for this model suggest that changing the stability of contacts outside the binding site gives binding rates that vary with stability. By using an approach similar to the φ-value analysis developed to probe the transition state ensemble in protein folding (49), it should be possible to test this prediction by measuring the binding rates of site-specific mutants that alter the relative stability of the folded and unfolded states but do not change the binding contacts.

Conclusion

The fly-casting mechanism has some parallels to well-known dimensionality reduction mechanisms that speed the search for specific targets. The possibility of such schemes was pointed out by Adam and Delbrück for sensory receptor binding and by Eigen and Richter for DNA recognition (50, 51), who noticed that for DNA recognition to be fast, it was important to have a weak longer-range nonspecific binding to the DNA before a target sequence was encountered. Here, by exploiting the available folding free energy, an increased capture radius arises for the final recognition event, even though the configuration space of the proteins is high dimensional. The high-dimensional search is facile because the configurational diffusion is guided by the folding funnel. Our calculation, based on a purely funnel-like surface, has not discussed the role of nonspecific binding, but fly casting should also help avoid metastable nonspecific bound complexes arising from the ruggedness of the DNA-protein interaction landscape.

Acknowledgments

We thank José Onuchic and Don Crothers for critically reading the manuscript. This work was supported by the National Institutes of Health (Grant 1R01 GM44557) and the Petroleum Research Fund administered by the American Chemical Society (Grant PRF 30353-AC6).

Footnotes

  • * To whom reprint requests should be addressed. E-mail: wolynes@uiuc.edu.

  • Article published online before print: Proc. Natl. Acad. Sci. USA, 10.1073/pnas.160259697.

  • Article and publication date are at www.pnas.org/cgi/doi/10.1073/pnas.160259697

References