New Research In
Physical Sciences
Social Sciences
Featured Portals
Articles by Topic
Biological Sciences
Featured Portals
Articles by Topic
 Agricultural Sciences
 Anthropology
 Applied Biological Sciences
 Biochemistry
 Biophysics and Computational Biology
 Cell Biology
 Developmental Biology
 Ecology
 Environmental Sciences
 Evolution
 Genetics
 Immunology and Inflammation
 Medical Sciences
 Microbiology
 Neuroscience
 Pharmacology
 Physiology
 Plant Biology
 Population Biology
 Psychological and Cognitive Sciences
 Sustainability Science
 Systems Biology
Brownian dynamics simulations of protein folding: Access to milliseconds time scale and beyond

Communicated by Edwin N. Lightfoot, Jr., University of Wisconsin–Madison, Madison, WI (received for review July 21, 1997)
Abstract
Protein folding occurs on a time scale ranging from milliseconds to minutes for a majority of proteins. Computer simulation of protein folding, from a random configuration to the native structure, is nontrivial owing to the large disparity between the simulation and folding time scales. As an effort to overcome this limitation, simple models with idealized protein subdomains, e.g., the diffusion–collision model of Karplus and Weaver, have gained some popularity. We present here new results for the folding of a fourhelix bundle within the framework of the diffusion–collision model. Even with such simplifying assumptions, a direct application of standard Brownian dynamics methods would consume 10,000 processoryears on current supercomputers. We circumvent this difficulty by invoking a special Brownian dynamics simulation. The method features the calculation of the mean passage time of an event from the flux overpopulation method and the sampling of events that lead to productive collisions even if their probability is extremely small (because of large freeenergy barriers that separate them from the higher probability events). Using these developments, we demonstrate that a coarsegrained model of the fourhelix bundle can be simulated in several days on current supercomputers. Furthermore, such simulations yield folding times that are in the range of time scales observed in experiments.
Molecular and Brownian dynamics simulations (1, 2), as well as lattice Monte Carlo simulations (3, 4), have been used to investigate proteinfolding pathways with some success. However, the time scales accessible by the dynamics simulation techniques are in the microseconds range or less and thus fall short of the experimentally observed proteinfolding time scales, which, for most proteins, lie somewhere between a few milliseconds to minutes. The limit on accessible time scales in these simulations originates from the small time steps required for the accurate integrations of the relevant equations of motion; a simulation with too long a characteristic time scale simply takes too many time steps and consumes too much computer resources. Developed specifically to handle simulations of largetimescale processes, the WeightedEnsemble Brownian (WEB) dynamics algorithm of Huber and Kim (5), a biased Brownian dynamics scheme, can overcome the time scale restriction in the traditional simulation techniques through judicious splitting and combining of Brownian particles. In this paper, we show that the WEB dynamics idea can be generalized to the simulation of a simple diffusion–collision fourhelixbundle proteinfolding model, allowing us to investigate its folding kinetics in a reasonable time frame.
We develop our fourhelixbundle proteinfolding model from the diffusion–collision approach of Karplus and Weaver (6–8). The diffusion–collision proteinfolding mechanism postulates the earlystage formation of fluctuating quasiparticles, called microdomains, which may be incipient secondary structures (αhelices and βsheets) or hydrophobic clusters. These microdomains move diffusely, and their coalescence leads to the formation of folded proteins. Consequently, the diffusion–collision model reduces the complexity of the folding process from a consideration of the individual amino acids to that of the properties of few microdomains and their interactions. Experimental evidence supporting this model comes from folding experiments where elements of secondary structure can be detected early in the folding process (9–11), as well as other experiments that show the tendency of various polypeptide fragments to form stable secondary structures in solution (8, 12, 13). The diffusion–collision model has been used to estimate foldingrate constants in proteins where diffusion–collision of microdomains is the ratelimiting step (14–18).
The fourhelix bundle is a well defined motif in several proteins (19, 20); de novo designs of fourhelixbundle proteins have also been reported (21). Theoretical studies of folding kinetics of a fourhelix bundle protein have been carried out by Yapa and Weaver (17, 18) in the context of the diffusion–collision model and the chemical kinetics approximation. In this approximation, the analytical rate constants for pairwise coalescence of spherical microdomains or microdomain complexes can be used to construct a system of firstorder differential equations that can be solved analytically to determine the folding time of the protein and the relative population of various folding intermediates (7). To obtain analytical rate constants for microdomain coalescence, one must make several simplifying assumptions. Using the WEB dynamics algorithm allows us to retain the diffusion–collision framework while allowing more details of microdomain geometry and interactions in our model and bypassing the chemical kinetics approximation all together. Although some qualitative agreement with experimental results can be obtained from our model, we emphasize that our major objective is to demonstrate the applicability of the WEB dynamics method to the proteinfolding problem. More definitive simulations of proteinfolding kinetics may come in the future, perhaps by reinvesting the savings from our algorithmic gains on more accurate force calculations.
The remaining portions of this paper are organized as follows. The next section describes the folding model in the context of the diffusion–collision framework and the adaptation of the WEB dynamics algorithm to the fourhelix model. The subsequent section on results and discussion explores the relationship between the model parameters (e.g., helix length, strength of the electrostatic charges) and the folding time. We conclude with some thoughts on future directions for “WEBbased” proteinfolding simulations.
MATERIALS AND METHODS
Diffusion–Collision FourHelixBundle Protein Model.
Four rigid nonoverlapping spherocylinders connected to one another by frictionless strings were used to model four helical microdomains, which must diffuse, collide, and aggregate to form the folded fourhelix bundle (see Fig. 1). The connecting frictionless strings serve only as constraints that preserve the topological contiguity of the protein, and they exert no additional force on the microdomains. In addition, two coarsegrained helix–helix interaction schemes are employed and compared in this work to gauge the effects of simulation details on our results. The interaction schemes are designed to mimic the helixorienting effects of helix dipoles (22). The first interaction scheme puts pairs of opposite charges of ±0.5e at the cap centers of the four helical microdomains (Fig. 2a). The other scheme employs point dipoles located at the centers of microdomains (Fig. 2b). The magnitudes of these dipoles are set to 0.3e per helix length (measured, in Å, from one cap center to another); the reason for selecting this value will be discussed in detail in the following section. Both helix–helix interaction schemes are expected to help steer the microdomains into the desired up–down–up–down helix arrangement in the folded form.
In defining the folding criteria of our protein model, it is noticed that in the folded configuration (Fig. 1a), the cap centers of the spherocylindrical microdomains line up to form two quadrilaterals (denoted by white sticks in Fig. 1b), one at each end of the helix bundle. When the protein unfolds (Fig. 1c), some of the edges of the quadrilaterals defined by cap centers elongate (Fig. 1d). A protein satisfies our folding criteria when the longest edge of the aforementioned quadrilaterals is shorter than 10 Å (1.25 times the helix diameter). Because the microdomains are nonoverlapping, this folding criterion is found to be very stringent.
Because rigid microdomains are used, the configuration of a protein in our simulation is defined by the Cartesian coordinates of helix centers, x_{i}, and quaternion parameters, q_{i}, representing their orientations (23–25). Starting from randomly generated configurations, the configurations of the four helices are simulated by using the conventional equations of motion: 1 2 3 where F_{i} and T_{i} denote force and torque on helix i, D_{i}^{t} and D_{i}^{r} denote the translational and rotational diffusion matrices of helix i, dW denotes the Gaussian noise with variance of dt (26, 27), and the superscript BF denotes the evaluation in the bodyfixed frame of reference. Hydrodynamics interaction (28) is neglected in all of our simulations. Because it has been shown that the hydrodynamics interaction plays only a secondary role to electrostatic steering in the enzyme–substrate association process (29), it is expected that the hydrodynamics interaction also plays a minor role in the folding process as well. In this work, the translational and rotational diffusion constants of each isolated helix are calculated by using the completed doublelayer boundary element integral method (30, 31) and are plotted in Fig. 3. The equations of motion above are solved by using the algorithm of Ermak and McCammon (32) with the quaternion parameters being renormalized at every time step.
At this point, we have not included the effects order–disorder transition of the microdomains (8) and shortrange stabilizing interactions such as van der Waals and hydrophobic interactions (33, 34). Because of the very simplistic nature of our protein model, the principal emphasis of our investigation lies in the applicability and efficiency of the simulation algorithm, rather than the simulation results obtained thereof. In light of previous experimental results (8–10, 35), the use of preformed rigid secondary structure elements as microdomains in our simulation, as well as our rudimentary helix–helix interaction schemes, evidently is unrealistic. Nevertheless, irrespective of these simplifications, it suffices that this model, by the virtue of its characteristic folding time scale being comparable to that of the proteinfolding process, should be an effective test for the WEB dynamics simulation algorithm.
WEB Dynamics Simulation.
The underlying theoretical principle for the present Brownian dynamics simulation is identical to that described in Huber and Kim (5). For the most general formulation of the Brownian dynamics simulation, we consider diffusion in an Ndimensional configuration space, with an arbitrary potential and (possibly varying) diffusion matrix. Each configuration of the microdomain assembly would be described by a set of coordinates in this configuration space and thus be represented by a “particle” at that point in space. The folding time or the mean passage time of this particle in the configuration space is the average amount of time it takes a particle to traverse the configuration space to the region that corresponds to folded proteins.
Intuitively, one can obtain the mean passage time in question from computer simulations by either simulating many Brownian dynamic trajectories and averaging the results, or using the flux overpopulation method (36). The latter refers to a type of simulation where multiple Brownian dynamic trajectories are simulated at once, with the trajectories being immediately reinitialized once their destination is reached. The two methods have been shown to be mathematically equivalent, and the mean passage time for the second method equals the ratio of the number of the trajectories in the simulation and the number of the trajectories reaching their destination per unit time at the steady state (37). The main difficulty in determining the mean passage time using either simulation method is that most interesting systems require particles to cross one or more substantial freeenergy barriers before reaching their destination. As a result, the particles spend most of their time wandering within local freeenergy wells and only rarely surmount the barriers. To obtain good estimates of the mean passage times of such systems via computer simulation would have consumed vast amounts of computational resources. However, it was pointed out that the flux overpopulation method can be modified to obtain meaningful results in a more efficient manner; this is the essence of the weightedensemble exposition in Huber and Kim (5).
There are two ways to imagine a large collection of particles in a configuration space. The first way is to view them as actual copies of the physical system in the real world. The second viewpoint is to regard the collection of particles as an estimate of the probability distribution of states taken by one system. In other words, each particle represents a packet of probability. If all particles are considered to carry the same amount of probability, then these two viewpoints are equivalent. However, if each particle is allowed to carry a different amount of probability, then adopting the second viewpoint introduces additional flexibility. For example, the probability distribution at the top of a substantial freeenergy barrier would be poorly sampled by an ensemble of particles with equal weights, because very few, if any, particles would be near the barrier top. On the other hand, if the particles are endowed with variable statistical weight, then many particles with small statistical weights can be present at the barrier top allowing the corresponding region in the configuration space to be adequately sampled. The weighted ensemble scheme allows particles to be split and combined with one another so that some particles can reach the barrier top without breaking any rules of probability theory.
The implementation of the weightedensemble method first requires selection of a measure that represents the “progress” of each configuration in the simulation toward its destination. This measure, calculated from the current configuration, will be used in deciding whether WEB particles are split, combined, or left alone at each time step. This progress measure can be identified with the reaction coordinate when the simulation concerns the association of an enzyme and a substrate (5). For the proteinfolding problem, the progress indicator can be identified with the “order parameter” in the spirit of the energy landscape treatment of proteinfolding kinetics (38, 39). In our simulations, we define the progress measure, d, as the maximum length of the edges of the two quadrilaterals formed by helix ends. It should be noted that the same measure is used to determine the folding criteria, and when the progress measure falls below 10 Å, the folding criterion is automatically satisfied. Because the progress measure is used simply as a guideline for the splitting and combining of particles, the folding pathway needs not be known in advance, and any reasonable progress measure, such as the free energy of each protein configuration, could be substituted without any loss of accuracy.
In WEB simulations, we first subdivide configuration space into regions, or bins, in intervals along the dcoordinate. Although the regions (bins) that correspond to small d are not easily accessible in standard Brownian dynamics simulations because of the freeenergy barriers to folding, the WEB algorithm is designed to sample configurations in these regions of configuration space effectively by splitting a single particle in these regions into many particles with smaller weights. Moreover, after splitting, there is a good chance that at least one of the newly split particles will progress further to bins with even smaller d parameters. In addition, to prevent the number of particles in the WEB simulations to grow exponentially, we also need to combine Brownian particles in regions in configuration space that already contain enough Brownian particles for effective sampling. Hence, in our implementation of WEB dynamics, we split and combine particles to have the same number of particles in each bin. At every iteration, the particles are stepped forward one time step according to the standard Brownian dynamics algorithm. During the time step, particles might move from one bin to another. For bins with too many particles, we pick the two lightest particles and combine them using the combine algorithm as in Huber and Kim (5). The weight of the conglomerate particle is the sum of the weights of the constituents. The position of the conglomerate is the position of one of the original particles; the probability of a position being chosen is proportional to the weight of the original particle at that position. The combine algorithm is repeated until the number of particles in the bin has been reduced to the target quota. For bins with too few particles, we pick the heaviest particle and split it into two particles of equal weight. Both children are inserted into the position in configuration space that was occupied by the parent. The splitting algorithm is repeated until the number of particles in the bin has been increased to the target quota.§ After the particle readjustments, the next time step is taken and the process is repeated. The net result is that intervals with smaller values of d (closer to the folded state) receive particles, albeit of minute weight, and a greatly reduced computational effort is expended in “uninteresting” regions of configuration space. Lastly, as in the flux overpopulation method, particles that satisfy the folding criteria are immediately reinitialized in a random configuration, and because the total weight of WEB particles in our simulation is normalized to one, the folding time of our folding model in WEB simulations equals the inverse of the particle flux, which is the total weight of particles satisfying the folding criteria per unit of time.
Simulation Details.
To demonstrate the effectiveness of the WEB dynamics algorithm, we chose to study the effects of microdomain geometry on the folding time because these effects cannot be thoroughly investigated with analytical methods such as the chemical kinetics approximation. The dimensions of the helix microdomains used in this study are as follows. The radius of each microdomain is 4 Å, and the length of the connecting string, measured between cap centers, is 12 Å. To study the effects of microdomain geometry, the length of the major axis is varied between 24, 32, and 40 Å. The simulations are carried out at 298 K, where the viscosity of solution (water) is 0.89 centipoise. In addition, because the folding process takes place in aqueous solution, the dielectric constant is set to that of water (ɛ = 78). The random initial configuration is generated by first laying down the cap center that corresponds to the N terminus of the first helix, and a random unit vector is generated for the direction of the first helix. Then the cap center that corresponds to N terminus of the second helix is generated such that the connecting string between the two helices is less than 12 Å, and a unit vector for the direction of the second helix is generated such that none of the helices generated so far overlap. The process is repeated for the third and fourth helices. Time steps of 0.1 ps are used in the integration of Eqs. 1 and 2. The WEB dynamics algorithm is applied using 1,200 bins, 5 particles per bin, and at least 1 × 10^{7} Brownian steps. Because the error distribution of particle flux is slightly skewed, the Bootstrap Monte Carlo method (5, 40) is used to calculate confidence interval of the data, assuming independent block size of 50,000 steps. The reported “standard error” is taken to be the halfwidth of the 68.3% confidence interval.¶ The calculation stops when the aforementioned halfwidth is within 10% of the average value.
RESULTS AND DISCUSSION
Given the crudeness of the model, it is quite encouraging to see that the predicted folding times from the simulations, as compiled in Table 1, fall in the range observed for typical proteins. However, the 24 Å bundle foldingtime scale is approximately two orders of magnitude slower than the experimentally observed folding time scale of the similarly sized fourhelixbundle bovine acylCoAbinding protein (ACBP) (41). The discrepancy might arise from the crudeness of our model or from the rather arbitrary definition of folding criteria employed here.
It is also interesting to note the effect of model parameters such as the helix length and the dipole strength. Shorter helices have greater mobility (larger diffusion constants). Also, in the present model helix length is coupled to the force model. For example, the dipole model has a helix–helix interaction energy that varies as the square of the length, whereas the dependence of interaction energy on helix length is somewhat weaker in the model with charges at the helix ends. Consequently, the folding time is more sensitive to helix length in the dipole model. In fact, with the helix dipole per length of 0.5e, the model yields the incorrect result that the helix bundle folding time is proportional to the helix length (data not shown). It is only when the dipole strength per length is reduced to 0.3e that reasonable results are obtained.
In general, our computational experiments with two different models for the distribution of charges over the helix microdomains show that the pattern of the charge distribution has a major impact on the folding time. The key point here is not the veracity of a particular model, but that such computational experiments could be performed at all. Most of the simulations were conducted on just four processors of a Silicon Graphics PowerChallenge array at the National Center for Supercomputing Applications. Each entry in Table 1 consumed about 3 weeks of CPU time. Although our CPU consumption seems large, it is estimated that bruteforce Brownian dynamics using a comparable number of trajectories would have required about 10,000 processoryears. Furthermore, if the error bars in Table 1 are relaxed to 50%, i.e., orderofmagnitude estimates of the folding time, each WEBbased folding simulation can be completed in a matter of 2 or 3 days.
In future work, an expanded simulation capability permitted by the WEB simulation method could be reinvested in more detailed models. Although a simple microdomainlevel model and the simulation results given above are sufficient to illustrate the underlying concepts and algorithmic details, a more realistic protein model must be employed in the simulations to answer relevant kinetics questions, such as folding cooperativity and folding intermediates. In this regard, a more detailed residuelevel model of small proteins (18, 42–44) seems to be a good candidate for future simulation studies. Based on our computer usage for this work and the currently available CPU cycles at supercomputer centers, it is believed that the WEB simulations using residuelevel models of small proteins, 50–80 residues in length, already can be carried out at the present time.
Acknowledgments
This work was supported in part by grants from IBM, the Biological Sciences and Technology Program of the Office of Naval Research, and Divisions of Advanced Scientific Computing and Biological Infrastructure of the National Science Foundation. We thank the National Center for Supercomputer Applications and the Maui HPLC Computing Center for computational resources. We also thank Dr. Iasonas Mustakis for his help with calculations of diffusion coefficients.
Footnotes

↵‡ To whom reprint requests should be addressed at the present address: Scientific Information Resources, Parke–Davis Pharmaceutical Research, 2800 Plymouth Road, Ann Arbor, MI 48105. email: kim01{at}aa.WL.com.

↵§ The implementation of the combine–split algorithm in the present work allocates the same quota of particles to each bin and in this respect differs from the original algorithm described in Huber and Kim (5). Having the same number of particles in each bin facilitates data layout on modern parallel supercomputers.

↵¶ For normally distributed data, the 68.3% confidence interval corresponds exactly to ±1 SD about the average value.
ABBREVIATION
 WEB,
 WeightedEnsemble Brownian
 Received July 21, 1997.
 Accepted January 23, 1998.
 Copyright © 1998, The National Academy of Sciences
References
 ↵
 ↵
 McCammon J
 ↵
 Kolinski A,
 Skolnick J
 ↵
 ↵
 ↵
 ↵
 ↵
 ↵
 ↵
 Pain R
 Roder H,
 Elöve G
 ↵
 Ballew R,
 Sabelko J,
 Gruebele M
 ↵
 ↵
 ↵

 Bashford D,
 Cohen F,
 Karplus M,
 Weaver D

 Gierasch L,
 King J
 Bashford D,
 Karplus M,
 Weaver D
 ↵
 ↵
 ↵
 ↵
 ↵
 Hecht M,
 Richardson J,
 Richardson D,
 Ogden R
 ↵
 ↵
 Evans D
 ↵
 Allen M,
 Tildesley D
 ↵
 Gardiner C
 ↵
 Öttinger H
 ↵
 Brune D,
 Kim S
 ↵
 ↵
 Brune D,
 Kim S
 ↵
 Kim S,
 Karrila S
 ↵
 ↵
 Creighton T
 ↵
 ↵
 ↵
 Farkas L
 ↵
 ↵
 Bryngelson J,
 Wolynes P
 ↵
 ↵
 Press W,
 Teukolsky S,
 Vettering W,
 Flannery B
 ↵
 ↵
 Rey A,
 Skolnick J
 ↵
 States D,
 Agarwal P,
 Gassterland T,
 Hunter L,
 Smith R
 Subramaniam S,
 Tcheng D,
 Fenton J
Citation Manager Formats
More Articles of This Classification
Biological Sciences
Related Content
 No related articles found.