New Research In
Physical Sciences
Social Sciences
Featured Portals
Articles by Topic
Biological Sciences
Featured Portals
Articles by Topic
 Agricultural Sciences
 Anthropology
 Applied Biological Sciences
 Biochemistry
 Biophysics and Computational Biology
 Cell Biology
 Developmental Biology
 Ecology
 Environmental Sciences
 Evolution
 Genetics
 Immunology and Inflammation
 Medical Sciences
 Microbiology
 Neuroscience
 Pharmacology
 Physiology
 Plant Biology
 Population Biology
 Psychological and Cognitive Sciences
 Sustainability Science
 Systems Biology
Reaction coordinates and rates from transition paths

Edited by Bruce J. Berne, Columbia University, New York, NY, and approved February 28, 2005 (received for review October 31, 2004)
Abstract
The molecular mechanism of a reaction in solution is reflected in its transitionstate ensemble and transition paths. We use a Bayesian formula relating the equilibrium and transitionpath ensembles to identify transition states, rank reaction coordinates, and estimate rate coefficients. We also introduce a variational procedure to optimize reaction coordinates. The theory is illustrated with applications to protein folding and the dipole reorientation of an ordered water chain inside a carbon nanotube. To describe the folding of a simple model of a threehelix bundle protein, we variationally optimize the weights of a projection onto the matrix of native and nonnative amino acid contacts. The resulting onedimensional reaction coordinate captures the folding transition state, with formation and packing of helix 2 and 3 constituting the bottleneck for folding.
Identifying the molecular mechanism of a reaction in solution, such as protein folding or enzymecatalyzed chemistry, poses serious challenges because of the large number of coupled degrees of freedom (1–5). The identification and characterization of populated intermediate states along reaction paths is only a first step. Understanding the mechanism at a molecular level requires in addition the characterization of the transitions between those populated intermediates. The goal then is (i) to identify what is common to the transitions in a rare molecular reaction (or significant subsets thereof), and (ii) to find coordinates that not only measure the progress of the reaction but also are useful to characterize the reaction dynamics. The former leads to the concept of a transition state, the latter to that of a reaction coordinate.
Transition states can be thought of as configurations “intermediate” between reactants and products. In one widely used definition (6–11), the ensemble of transition states comprises those configurations that have an equal probability of reaching reactant and product regions. The chance of proceeding to reactants or products first can be quantified by the splitting (or commitment) probability introduced by Onsager for ionpair recombination (12). The splitting probability is defined as the fraction of trajectories reaching the reactant region first when initiated from a given configuration with random Maxwell–Boltzmann velocities, and possibly averaged over noise for stochastic dynamics. We note that one of the difficulties arising from the above definition of transition states is that splitting probabilities cannot be measured experimentally, not even in a singlemolecule measurement with atomic resolution. The reason is that multiple initializations with precise atomic positions, including those of solvent molecules, are required. Here, we will show how this difficulty can be circumvented by calculating average splitting probabilities (13) from transitionpath and equilibrium ensembles that can also be measured experimentally.
From a good reaction coordinate, one may expect a dynamically meaningful measure of the progress of a reaction. Formally, the projection operator formalism (1, 2) allows us to obtain the dynamics along a chosen coordinate, but for poor choices the projected dynamics will be highly nonMarkovian with longtime memory effects. In contrast, for a well chosen coordinate, the dynamics will be essentially Markovian after a brief initial period accounting for “molecular collisions” (14). Qualitatively, this Markovian character implies that if all one knows is the value of the reaction coordinate for a configuration intermediate between reactants and products, one can predict the likely fate of a trajectory initiated from that configuration. Now the “likely fate” is the splitting probability! Thus, a good reaction coordinate should parameterize the splitting probability, such that the splitting probability of a configuration is a function of the corresponding reaction coordinate alone (15, 16). Berezhkovskii and Szabo (17) indeed found the optimal onedimensional reaction coordinate to be normal to the surface of equal splitting probabilities [i.e., the separatrix (6, 18, 19)].
How does one identify transition states and good reaction coordinates for a rare molecular reaction in condensed phase? Answering this question requires access to an ensemble of reactive trajectories that can be obtained most efficiently from transitionpath sampling (8–11, 13, 20, 21) or, if the transitions are sufficiently frequent, from long equilibrium trajectories. We note that partial information about reactive trajectories can also be obtained from singlemolecule measurements. As illustrated in Fig. 1, a poor coordinate can often be distinguished from a good one almost immediately. Projected onto the poor coordinate, it may be possible to assign the state (i.e., reactant or product) in the context of the timeseries history. However, equilibrium excursions from either state overlap in the projection. So if all one knows is the value of the reaction coordinate (say, r = r ^{‡}) without the preceding history, one cannot assign the state with confidence. In contrast, the good coordinate separates the states, and equilibrium excursions from either state do not overlap. Configurations with a reactioncoordinate value r = r ^{‡} between reactant and product regions occur essentially only during transition paths, such that r ≈ r ^{‡} should capture the configurations of the transition state. One of the objectives of this paper is to quantify such qualitative observations.
In the following, we will first introduce a probabilistic relation between the equilibrium and transitionpath ensembles. This Bayesian expression allows us to define and identify a transitionstate ensemble. Moreover, it immediately leads to a quantitative measure for the quality of a reaction coordinate that will allow us to search systematically for optimal reaction coordinates. The Bayesian relation between the equilibrium and transitionpath ensembles also leads to an expression for the rate coefficient and to a simple transitionpath sampling algorithm. To illustrate the variational method of identifying reaction coordinates, we will study the folding of a simple protein model. Transitionpath sampling and rate calculations will be illustrated for the slow dipolar reorientation of a hydrogenbonded water chain inside a carbon nanotube.
Theory
Bayesian Relation Between Equilibrium and TransitionPath Ensembles. In the following, we consider a molecular system with deterministic Newtonian or stochastic Langevin dynamics in phase or configuration space. Following ref. 13, we define transition paths as those trajectory segments that exit from the reactant region A and reach the product region B without crossing back into A, and vice versa. Because of the requirement of no recrossing into A and B, the definitions of reactant and product regions can be rather stringent and include only configurations in the densely populated regions. (Incidentally, dividing up long equilibrium trajectories into segments separated by transition paths that connect the “bottoms” of the free energy wells also provides a convenient way of assigning intermediate configurations to reactant and product states.)
We can now construct probability distributions in phase space p _{eq}(x) and p(xTP) for the equilibrium ensemble and the transition paths, respectively. x is a point in the full phase space in which the dynamics is Markovian. These probability distributions are related to each other through a Bayesian expression for conditional probabilities, where we have introduced two probabilities (with values between 0 and 1): p(TPx) is the probability for being on a transition path (TP), given that the system is in x; and p(TP) is the fraction of time spent in transition paths, averaged over long equilibrium trajectories. Transition states can now be identified as those points with the highest probability p(TPx) that trajectories passing through them are reactive (13), i.e., form transition paths between reactants and products.
Relation to Splitting Probabilities. As shown in ref. 13, the conditional probability p(TPx) of being in a transition path is directly related to the splitting probabilities φ _{A} (x) of reaching the reactant region A first and φ _{B} (x) of reaching the product region B first on trajectories initiated from x: p(TPx) = φ _{A} ( x ) φ _{B} (x) + φ _{A} (x)φ _{B} ( x ), where x = (–p, q) is a point in phase space with the same position q as x = (p, q), but reversed momenta p. In particular, for diffusive dynamics (i.e., Langevin dynamics in the overdamped limit), we have p(TPq) = 2φ _{A} (q)φ _{B} (q) with φ _{B} (q) = 1 – φ _{A} (q). p(TPq) reaches its maximum of ½ exactly on the stochastic separatrix (6, 18, 19) where φ _{A} (q) = φ _{B} (q) = 1/2. For diffusive dynamics, the definition of transition states as points with the highest probability p(TPq) that trajectories passing through them are transition paths is thus equivalent to that using a “separatrix” or commitment probabilities (6–11).
Test for Reaction Coordinates. The Bayesian relation, Eq. 1, can be generalized for projected dynamics. For a reaction coordinate r = r(x), we have (13) where p(TPr) is the conditional average of p(TPx) with an equilibrium weight: with δ(r) Dirac's delta function.
For a good reaction coordinate r = r(x), p(TPr) should have a single sharp and high peak, collapsing the transition states with a high value of p(TPx) into a single value of r. With Eq. 2, different reaction coordinates r(x) and r′(x) can be compared quantitatively even without knowing the normalizing factor p(TP) in Eq. 2, because p(TP) is identical for all projections. Furthermore, the same criterion can be used in a variational search for optimal reaction coordinates, as shown below. Fig. 1 C and F shows normalized p(TPr) for projections onto a poor and good coordinate: in the latter case, the maximum of p(TPr) is considerably higher and approaches the diffusive limit of 0.5. In practice, the equilibrium probabilities p _{eq}(r) can be obtained from umbrella sampling (in any suitable coordinate) and the transition path probabilities p(rTP) from transitionpath sampling (8–11, 13, 20, 21) or, if feasible, both can be obtained from long equilibrium simulations with multiple transitions.
Estimating Reaction Rates. p(TP) is the fraction of time spent in transition paths, averaged over long equilibrium trajectories. Dividing by the average duration of a transition path, 〈t _{TP}〉, one obtains the number of crossings between reactant and product regions per unit time (13). This relation can be used to estimate rate coefficients for the twostate model , where c_{A} and c_{B} are the equilibrium mole fractions of reactants and products, respectively.
TransitionPath Sampling by Shooting. Transitionpath sampling (8–11, 20, 21) provides a powerful method to create and examine reactive trajectories alone. To avoid storing of intermediate paths, one can perform transitionpath sampling by shooting from a single arbitrary dividing surface (13). The computational efficiency is determined by p(TPr) and will therefore be lower for a poor reaction coordinate. Initial configurations q _{1}, q _{2},..., q _{n} with reaction coordinates r ^{‡} ≈ r _{1} ≈ r _{2} ≈... ≈ r_{n} , can be obtained, e.g., by saving structures along an umbrella sampling run biased toward r ^{‡}. From each of those structures, one (or several) trajectory pairs would be initiated with Maxwell–Boltzmann momenta p _{i} and –p _{i} . If a trajectory pair starting in q _{i} ends in opposite regions A and B, the combined trajectory [with the momenta of one segment reversed (11)] forms a transition path and enters the transitionpath ensemble with a relative weight of (13) where the sum is over the points of intersection of the trajectory with the dividing surface r_{i} = r(q _{i} ) from which the trajectory was started, and v_{k} = dr/dt at time t = t_{k} is the velocity normal to the dividing surface at the kth intersection. In combination with Eq. 4, we obtain an estimate of the reaction rate coefficients (13), with θ_{TP} = 1 if the trajectory pair forms a transition path, and 0 otherwise. The average 〈...〉 in Eq. 6 is over combined forward and backward paths initiated from an equilibrium ensemble of phase points (p _{i} , q _{i} ).
Results
Folding of a Small “TwoState” Protein. Protein folding is an intrinsically highdimensional problem, yet many proteins are experimentally found to populate essentially only two states at equilibrium (22). Transitions between those states (folded and unfolded) are highly cooperative (23). It should thus be possible to find lowdimensional coordinates that accurately describe the process. In the following, we will identify and characterize such coordinates for a simple protein model.
As an example, we have used a small (47 residue) threehelix bundle protein that folds fast in experiments (24). A Gōlike model of this protein was built from the experimental structure (25) by using a standard procedure (26). The principal feature of this model is that favorable interactions occur only between residues in contact in the native state. Simulations were run by using the charmm code (27).
The protein model exhibits essentially twostate transitions, as monitored by a standard Gōmodel reaction coordinate, the fraction of native contacts Q (28): Langevin dynamics trajectories at the folding temperature (T _{f}) hop frequently between an “unfolded” state having Q ≈ 0.4 and a folded state with Q ≈ 0.9 (the folding and unfolding times τ_{f} and τ_{u} are ≈4 × 10^{6} time steps, respectively, at T _{f}). Therefore, in this case all analysis can be performed on long equilibrium simulations, without the need for transitionpath sampling.
Although Q turns out to be quite a good reaction coordinate (29), having a maximal p(TPQ) of 0.40, we can attempt to construct an equally good (or better) contactbased coordinate with limited a priori information (as would be the case for simulations with a more “realistic” transferable potential). Because folding trajectories are highly heterogeneous in Cartesian space, we have chosen to follow the simulations in contact space. Each configuration is described by a “contact matrix” q. A reaction coordinate r _{w} can be defined by projecting the contact matrix onto an arbitrary weight matrix w, . The goal then is to find a set of 1,081 = 47 × 46/2 weights w_{ij} (with w_{ij} = w_{ji} and w_{ii} = 0) that correspond to a good reaction coordinate. In the contact matrix, q_{ij} is 1 when the distance between i and j is <12.0 Å and 0 otherwise. This definition does not discriminate between native and nonnative contacts, and thus differs somewhat from that used for the fraction of native contacts, Q, where the cutoff distance for residue pairs is proportional to their distance in the native structure (26). Therefore, Q cannot strictly be recovered with the basis set used here.
We started our search from the least biased initial guess of equal weights w _{uni}. The projection of a trajectory segment at T _{f} onto w _{uni} is shown in Fig. 1 A . Because there are generally more contacts formed in the folded state than the unfolded, especially with a Gōlike potential, the corresponding reaction coordinate r _{uni} turns out to be a reasonably good order parameter (i.e., it separates folded and unfolded states). However, it is relatively poor at identifying reactive states, having a maximal value of p(TPr _{uni}) of only 0.17 (Fig.1C ; transition paths spanned Q = 0.4–0.9 without recrossings). In addition, there is a significant transitionpath probability p(TPr _{uni}) for some large values of r _{uni}, which is caused by contacts formed only on a few transition paths that are not representative of most transitions.
To improve on this initial guess, we use p(TPr) as a target function for a variational optimization procedure. Specifically, we optimize the maximum of a Gaussian fit to p(TPr), to ensure that all reactive configurations are condensed into a single peak in p(TPr). This procedure also suppresses subsidiary peaks from atypical contacts corresponding to configurations q where p(TPq) is large, but both p(qTP) and p _{eq}(q) are vanishingly small. We use a Monte Carlo optimization procedure in which we modify only relative weights by randomly changing two elements, w_{ij} and w_{kl} , in such a way as to preserve the magnitude of w. Monte Carlo moves include swapping, sign reversal, and reassigning fractions of the total weight. Reprojections on a trial coordinate can then be evaluated efficiently, because only the changes arising from the two altered elements need to be calculated. Starting from the initial uniform matrix of weights w _{uni}, we applied this procedure recursively, accepting only moves that increased the maximum of p(TPr), to generate an optimal matrix w _{opt}. As shown in Fig. 1F , w _{opt} gives a sharply peaked distribution of p(TPr _{opt}) with a maximum of 0.39.
To test whether individual configurations at the maximum of p(TPr _{opt}) were indeed part of the transitionstate ensemble, we calculated the distribution of the folding probability φ_{F} [or p _{fold} (7)]. Starting from 240 configurations close to the maximum of p(TPr _{opt}), we initiated 100 trajectories each with random Maxwell–Boltzmann velocities at T _{f}. The fraction of runs that fold first provides an estimate of the φ_{F} value of the starting configuration. The distribution of φ_{F} for the 240 configurations is sharply peaked close to 0.5 (Fig. 1F Inset); further, the mean value of 2φ_{F}(1 – φ_{F}) is 0.38, within the statistical error of the maximum in the distribution of p(TPr _{opt}), as expected for diffusive dynamics. In contrast, the distribution of φ_{F} for structures having maximal values of p(TPr _{uni}) is peaked at 0 and 1. Unlike the optimized coordinate r _{opt}, the uniform projection r _{uni} captures few transitionstate configurations (Fig. 1C Inset), as is expected from the substantially smaller maximum of p(TPr _{uni}). Even though r _{opt} is a good reaction coordinate, it is not unique. We find that nearoptimal weight matrices generated by Metropolis Monte Carlo search in weight space (at nonzero effective temperature) are degenerate, reflecting strong correlations among contacts (i.e., presence of one contact implying likely presence of another). Indeed, reaction coordinates of similar quality could be found even if only 100 of 1,081 contacts were included in the optimization (where the search randomly varied both the set of 100 contacts and their weights; Fig. 5, which is published as supporting information on the PNAS web site).
The optimized reaction coordinate r _{opt} can be used to gain insight into the folding mechanism. By selecting snapshots from the trajectories corresponding to certain r _{opt} projections, we can identify critical events on the folding paths. Fig. 2 shows 10 structures each for three values of r _{opt} that are not significantly populated at equilibrium. On the unfolded side of the peak in p(TPr _{opt}) (r _{opt} = 6.5; Fig. 2 A ), a largely unstructured ensemble is obtained, although there is some local helical structure. The remarkable feature of the transition state (r _{opt} = 6.9; Fig. 2B ) is that the structure of helices 2 and 3 is almost completely nativelike, whereas helix 1 is partly formed, but not docked against helices 2 and 3. On the folded side of the barrier (r _{opt} = 7.4; Fig. 2C ), all three helices are nativelike, with slight disorder in helix 1. On the basis of this analysis, we identify the formation and packing of helices 2 and 3 as the bottleneck of folding.
Analysis of the contact maps for the folded and unfolded protein, and for the transition state identified above (Fig. 2 D–F ) provides further information on the mechanism. The only significant tertiary contacts present in the denatured state are those between helices 1 and 2, which are present ≈30% of the time. In the transition state, the additional contacts present are principally between helices 2 and 3 (80% formed), as suggested by Fig. 2B , and to a lesser extent between helices 1 and 3. The relative importance of H2–H3 and H1–H3 interhelical contacts is also reflected in the nativelike appearance of the average weight matrix obtained by Metropolis Monte Carlo sampling, and of restricted weight matrices with only 100 contacts (Fig. 6, which is published as supporting information on the PNAS web site). Of the helix–helix interaction energies in the native state, those between helices 2 and 3 (H2–H3) are strongest (–7.6 kcal·mol^{–1}), with weaker H1–H2 and H1–H3 interactions (–5.6 and –3.0 kcal·mol^{–1}, respectively). In summary, the strongest tertiary interaction (H2–H3) is absent from the unfolded state (which may not at first be expected for a Gōlike model), and its formation is the “decisive” step during folding.
Collective Dipole Flip of Ordered Water Chain in Nanotube. In the previous example, we extracted transition paths from long equilibrium simulations. Next, we will perform transitionpath sampling to collect otherwise rare reactive trajectories.
Molecular dynamics (MD) simulations of a short carbon nanotube segment dissolved in water showed that water filled the tube, forming hydrogenbonded wires with collective dipole orientations either up or down along the tube axis (30, 31). The characteristic time for reorientations of the dipole chain was estimated to be in the range of 2–3 ns, three orders of magnitude slower than that of an individual water molecule in the bulk fluid. Dipole reorientation is an essential step in the Grotthuss mechanism (32) of proton transfer along onedimensionally ordered water chains (33, 34). In a molecular model of the biological proton pump cytochrome c oxidase, waterchain reorientations induced by the changes in the electric fields in the protein active site provide the coupling of redox chemistry to vectorial proton translocation across the membrane (35).
In the following, we explore the dipole flip of ordered water chains, first to illustrate the transitionpath sampling algorithm based on Eq. 5 and to test the accuracy of the rate estimates, Eqs. 4 and 6, and then to gain insight into the mechanism by which such reorientations occur. As reaction coordinate, we chose the total dipole moment M_{z} of water molecules inside the pore projected onto its axis. A continuous and differentiable 3D sigmoidaltype weight was used in summing the contributions of water dipole moments to M_{z} for the tumbling nanotube. To obtain an accurate equilibrium distribution of M_{z} , we performed 1ns umbrella sampling runs with a harmonic bias for 12 overlapping windows. The simulation setup was as in ref. 30, with amber code and parameters for the carbons (36), the threesite water model TIP3P (37), a temperature of 300 K, and a pressure of 1 bar (1 bar = 100 kPa). The simulation system contained one (6,6)type nanotube of ≈13.4Å length and ≈8.1Å diameter together with ≈1,000 water molecules in a cubic box under periodic boundary conditions. Transitionpath sampling was performed by running trajectories at constant energy and volume “forward” and “backward” in time (i.e., with initial momenta ±p) from starting configurations in the region near the barrier, M_{z} = 0. The transitionpath simulations were terminated when M_{z}  exceeded 9 debye (1 debye + 3.3356 × 10^{–30} m·C; TIP3P dipole moment is 2.35 debye). The resulting 1,174 transition paths with a combined length of ≈15 ns were weighted according to Eq. 5. Initial Maxwell–Boltzmann velocities for the rigid water molecules were created by using a rigidbody representation. With the leapfrogtype integrator using velocities at half steps, care was taken that the forward and backward paths were continuous and reversible at the starting configuration.
Fig. 3 summarizes the results for the thermodynamics and kinetics of the waterdipole flip. We find that Eqs. 4 and 6 give accurate rate coefficients when compared to long (66ns) equilibrium simulations. In comparison with the slow dipolar reorientation with a rate coefficient of ≈1/(2 ns), transition paths are fast, with an average duration of only ≈2.0 ps. This time for flipping a chain of five or six water molecules is comparable to the characteristic time of ≈2 ps for reorientation of a single TIP3P water molecule in the bulk fluid. The distribution of transitionpath durations t _{TP} is well approximated by a gamma distribution of mean 2 ps and standard deviation 1.4 ps (not shown). On average, the transition paths cross the M_{z} = 0 dividing surface seven times, with a roughly exponential distribution of the (odd) number of crossings, suggesting “diffusive” dynamics on the broad barrier top (Fig. 3D ). Accordingly, the relative weights of the paths created by straightforward shooting, Eq. 5, are distributed broadly, and only about 15% of the paths contribute to the top 95% of the relative weights.
The molecular configurations along transition paths differ significantly from those seen in equilibrium trajectories. In particular, we find that the dipole moment distribution p(M_{z} TP) in the transitionpath ensemble peaks near M_{z} = 0. At equilibrium, M_{z} = 0 is at the center of an ≈7–8 k _{B} T high freeenergy barrier and thus rarely visited. The terracelike steps in p(M_{z} TP) indicate that during transition paths, dipole flips of individual water molecules have discrete character. Indeed, if we analyze individual transition paths, we find a steplike reorientation. As shown in Fig. 4, rather than collectively reorienting the dipoles of the water molecules, a hydrogenbonding defect moves through the tube without significant translational motion of the water molecules, consistent with the static equilibrium analysis of a water chain in vacuum by Pomès and Roux (33). In the nanotube system, a D defect [formed by a water molecule accepting two hydrogen bonds and donating none (34)] is energetically preferred over an L defect (formed by a water molecule donating two hydrogen bonds and accepting none), as is expected from an earlier analysis of the preferred waterdipole orientation at the openings of the tube (38). Correspondingly, defects enter almost exclusively from the end at which pore water donates hydrogen bonds to the surrounding solvent.
The transitionpath probability p(TPM_{z} ) reaches ≈0.5 near M_{z} = 0, the maximum expected for diffusive dynamics (Fig. 3), suggesting that the dipole moment M_{z} is a good reaction coordinate. To test whether M_{z} = 0 captures the transition state we have calculated the splitting probabilities for 45 equilibrium configurations with dipole moments near M_{z} = 0 by using 50 trajectories each with Maxwell–Boltzmann initial velocities. We find that the distribution of the resulting splitting probabilities is indeed narrowly peaked at 0.5, with a standard deviation of about 0.125, close to that expected from random binomial variations in 50 trials.
To explore whether M_{z} alone also captures the dynamics of the collective dipole flip, we have constructed a onedimensional Langevin model for motion along M_{z} . From the equipartition theorems for the kinetic and potential energies, and the time integral of the M_{z} correlation function of a Langevin oscillator, we obtain an effective mass of 2.46 × 10^{–6} ps^{2}/Å^{3} and a friction coefficient of 216 ps^{–1} by using the simulation data for the harmonically biased simulation near the barrier top, M_{z} = 0. Langevin simulations on the full free energy surface (Fig. 3A ) give a rate coefficient for dipole reorientation of ≈1/(2.1 ns), a transitionpath duration of ≈1.6 ps, an average number of barrier crossings of 6.7 per transition path, and p(TPM_{z} = 0) = 0.50, all in excellent agreement with the actual MD data [≈1/(2 ns), 2.0 ps, 7, and 0.5, respectively]. That M_{z} alone is a good reaction coordinate may appear somewhat surprising, considering that the dipole moment of the water chain is strongly coupled to the water solvent surrounding the tube through electrostatic interactions. Indeed, the rate of dipole reorientation for water wires across equivalent (6,6) nanotubes in 2D membranes is much lower because of the high cost of moving an effectively charged hydrogen bonding defect (34) through a lowdielectric environment (single transition during ≈20ns equilibrium MD; unpublished results). The solventdependent rate without explicit solvent component in the reaction coordinate suggests fast solvent relaxation outside the tube. Consequently, solvent effects are well described by a potential of mean force and an effective friction for the M_{z} dynamics. Finally, we note that if the polarity of the pore is lowered (30), we observe an entirely different dipoleflip mechanism in which the tube first empties, and then water reenters the tube with the opposite dipole orientation.
Conclusions
We have shown that a Bayesian relation between the equilibrium ensemble and the transitionpath ensemble can be used to locate transition states, optimize reaction coordinates, and estimate reaction rate coefficients. For a simple model of a threehelix bundle protein, we constructed a reaction coordinate by variationally optimizing the weights of a projection onto the amino acid contact matrix. Starting from a poor initial coordinate with uniform weights, we obtained a projection that accurately located the transitionstate ensemble. By comparing protein configurations along the coordinate, we could identify the bottleneck of folding as the formation and packing of two helices, with the third helix relatively disordered at the transition state. We have also described a simple transitionpath sampling algorithm and tested it for the dipole reorientation of an ordered water chain inside a carbon nanotube. The transition paths were used to estimate the rate of dipolar reorientations, giving a result in agreement with long equilibrium MD simulations.
In the proteinfolding example considered here, a good basis set for the variational optimization of reaction coordinates could be guessed relatively easily. In general, that may not be the case. To expand the basis set, linear combinations of projections onto principal component axes may be useful for peptides and proteins (39, 40); nonlinear functions can be optimized in the same way if required. In addition, inclusion of solvent coordinates (9, 15, 41) may be necessary to construct a search space that covers degrees of freedom essential for the reaction kinetics and mechanism. Applications of our approach to more complex systems are expected to aid in the development of novel and unanticipated reaction coordinates.
Acknowledgments
We thank A. Szabo, W. A. Eaton, and A. Berezhkovskii for many helpful discussions.
Footnotes

↵ * To whom correspondence should be addressed. Email: hummer{at}helix.nih.gov.

Author contributions: R.B.B. and G.H. designed research; R.B.B. and G.H. performed research; R.B.B. and G.H. analyzed data; and R.B.B. and G.H. wrote the paper.

This paper was submitted directly (Track II) to the PNAS office.

Abbreviation: MD, molecular dynamics.
References

↵
Zwanzig, R. (2001) Nonequilibrium Statistical Mechanics (Oxford Univ. Press, New York).

↵
Berne, B. J. & Pecora, R. (1976) Dynamic Light Scattering (Wiley, New York).
 ↵
 ↵

↵
Klosek, M. M., Matkowsky, B. J. & Schuss, Z. (1991) Ber. Bunsen Ges. Phys. Chem. 95 , 331–337.
 ↵
 ↵

↵
Bolhuis, P. G., Dellago, C. & Chandler, D. (2000) Proc. Natl. Acad. Sci. USA 97 , 5877–5882. pmid:10801977
 ↵
 ↵
 ↵
 ↵

↵
Ma, A. & Dinner, A. R. (February 11, 2005) J. Phys. Chem. B, http://dx.doi.org/10.1021/jp045546c.

↵
Rhee, Y. M. & Pande, V. S. (January 27, 2005) J. Phys. Chem. B, http://dx.doi.org/10.1021/jp045544s.
 ↵

↵
Ryter, D. (1987) Physica A 142 , 103–121.
 ↵
 ↵
 ↵
 ↵
 ↵
 ↵
 ↵
 ↵

↵
Brooks, B. R., Bruccoleri, R. E., Olafson, B. D., States, D. J., Swaminathan, S. & Karplus, M. (1983) J. Comput. Chem. 4 , 187–217.

↵
Nymeyer, H., García, A. E. & Onuchic, J. N. (1998) Proc. Natl. Acad. Sci. USA 95 , 5921–5928. pmid:9600893
 ↵
 ↵
 ↵

↵
de Grotthuss, C. J. T. (1806) Ann. Chim. 58 , 54–74.
 ↵
 ↵
 ↵
 ↵
 ↵
 ↵
 ↵
 ↵
 ↵
 ↵