# Broad distributions of transition-path times are fingerprints of multidimensionality of the underlying free energy landscapes

^{a}Department of Chemistry, University of Texas at Austin, Austin, TX 78712;^{b}Mathematical and Statistical Computing Laboratory, Office of Intramural Research, Center for Information Technology, National Institutes of Health, Bethesda, MD 20892;^{c}Oden Institute for Computational Engineering and Sciences, University of Texas at Austin, Austin, TX 78712

See allHide authors and affiliations

Edited by Gerhard Hummer, Max Planck Institute for Biophysics (Max-Planck-Gesellschaft), Frankfurt am Main, Germany, and accepted by Editorial Board Member Peter J. Rossky September 28, 2020 (received for review April 28, 2020)

## Significance

The function of biomolecules is controlled by the energy landscapes traversed when they fold, bind to other molecules, generate mechanical motion, or catalyze chemical reactions. Information about such landscapes is encoded in single-molecule observations of transition paths, i.e., brief events where molecules cross activation barriers, but teasing out dynamical details from inherently low-dimensional experimental signals is a challenge. Here, we ask a question of principle: How can one decide whether an observed biomolecular transition can be represented as motion along a one-dimensional reaction coordinate or multidimensionality is essential? We establish a rigorous constraint on the width of the distribution of transition path times, violation of which necessitates use of multidimensional models for the interpretation of experimental data.

## Abstract

Recent single-molecule experiments have observed transition paths, i.e., brief events where molecules (particularly biomolecules) are caught in the act of surmounting activation barriers. Such measurements offer unprecedented mechanistic insights into the dynamics of biomolecular folding and binding, molecular machines, and biological membrane channels. A key challenge to these studies is to infer the complex details of the multidimensional energy landscape traversed by the transition paths from inherently low-dimensional experimental signals. A common minimalist model attempting to do so is that of one-dimensional diffusion along a reaction coordinate, yet its validity has been called into question. Here, we show that the distribution of the transition path time, which is a common experimental observable, can be used to differentiate between the dynamics described by models of one-dimensional diffusion from the dynamics in which multidimensionality is essential. Specifically, we prove that the coefficient of variation obtained from this distribution cannot possibly exceed 1 for any one-dimensional diffusive model, no matter how rugged its underlying free energy landscape is: In other words, this distribution cannot be broader than the single-exponential one. Thus, a coefficient of variation exceeding 1 is a fingerprint of multidimensional dynamics. Analysis of transition paths in atomistic simulations of proteins shows that this coefficient often exceeds 1, signifying essential multidimensionality of those systems.

In recent years, single-molecule studies of the folding (1, 2) and binding (3, 4) dynamics of biomolecules have brought unprecedented mechanistic insights into those phenomena. A key advance that has made this possible was achieving a high temporal resolution allowing observation of fast events where the molecule is caught en route between its stable molecular conformations (e.g., the folded and unfolded states) as it crosses, and possibly recrosses, the free energy barrier that separates them. More precisely, one can observe a “transition path,” a piece of a molecular trajectory

The distribution of the transition path time is sensitive to the details of the barrier crossing dynamics. The most common model used to rationalize the dynamics of biomolecular folding and binding is that of diffusive motion over a one-dimensional (1D) free energy landscape *x* being a suitable “reaction coordinate” (7⇓–9) (Fig. 1). When applied to experimental observations, the reaction coordinate *x* is related to the experimental observable, such as the mechanical extension of the molecule in force spectroscopy measurements or the distance between fluorescent probes in fluorescence experiments. Several recent studies have suggested, however, that the simple, 1D picture of diffusive barrier crossing along experimental reaction coordinates is inadequate as a description of transition path time distributions. Specifically, the distribution of the transition path time,

Even more dramatic discrepancy between the predictions of the 1D diffusion model and the observed transition path times is encountered in atomistic simulations of protein folding (13, 14) and other conformational changes in proteins (Fig. 3). In particular, the observed distributions are much broader than those predicted by the 1D diffusion model: The latter, when the diffusivity is adjusted to give the correct mean transition path time, significantly underestimates the contribution of both short and long transition paths (13). Several explanations to this discrepancy have been proposed, including roughness of the free energy landscape (15), entropic barriers (16, 17), heterogeneity of the pathways (14, 18), and memory effects (19⇓–21), but so far the experimental or simulation data have not allowed one to differentiate among such different possibilities. A further complication is that the distribution of the transition path time for the case of diffusive dynamics depends on the shape of the potential of mean force (Fig. 1). While this potential is accessible experimentally in single-molecule force spectroscopy studies (22⇓–24), it is generally unknown in fluorescence-based measurements of transition path times. This necessitates the use of an additional adjustable parameter, the barrier height

Here, we show that a simple measure of the distribution width, known as the coefficient of variation (denoted *C*), also known as the relative SD, can be used to differentiate between data interpretations based on 1D diffusive dynamics and on multidimensional dynamics (*Methods*). Specifically, the 1D interpretation is unequivocally ruled out when this coefficient is greater than 1. We use this result to show that, according to this measure, the distributions of transition path times observed in atomistic simulations of a number of proteins are inconsistent with the model of 1D diffusive dynamics along the reaction coordinate and indicate essential multidimensionality of their dynamics.

## Results

### Coefficient of Variation.

The coefficient of variation is a common statistical measure of a distribution used in a variety of fields, from engineering to economics. For a distribution of the transition-path time,

The coefficient of variation is large for broad distributions and small for narrow ones (approaching zero for sharply peaked distributions). To understand the range of values of this coefficient in the case of 1D diffusion, consider the simple example of diffusion across a symmetric parabolic potential,*k* correspond to a barrier and positive ones to a potential well. The height of the barrier measured relative to the boundaries *C*), and, as the curvature *k* becomes positive and its absolute value increases, the coefficient *C* approaches a value of 1. This limit is easy to understand: When a particle traverses a deep potential well, it becomes trapped in this well for a long time. The transition path time in this limit becomes the same as the time it takes to escape the well via activated barrier crossing, which is known to have an exponential distribution, **1** immediately gives *C* = 1. For the case of *k* = 0, the distribution of the transition path time and its first two moments can be obtained analytically (25), resulting in

### Rigorous Bound on the Coefficient of Variation for 1D Diffusion.

The central finding of this work is that *Methods*). In other words, the distribution of transition path times for the diffusive dynamics in any 1D potential can never be “broader” (as quantified by Eq. **1**) than the single-exponential distribution. Thus, an observed value of *C* > 1 would immediately necessitate that the 1D Markov diffusion model should be abandoned.

This result may appear surprising. Consider, for instance, the dynamics on a rugged (but 1D) free energy landscape, as illustrated in Fig. 4*A*, with multiple potential wells that can trap the particle. It is known that a broad distribution of trap depths leads to nonergodic behavior and anomalous-diffusion–type phenomena (26⇓⇓–29). For example, if the average lifetime τ inside each trapped state obeys the Arrhenius law, *E* > 0 is the trap depth, then an exponential distribution of the trap depths, *C* > 1. It is, however, not true.

### Coefficient of Variation for Discrete Markov State Networks.

Our finding can be restated within the framework of discrete models. For example, the scenario depicted in Fig. 4*A* can be mapped onto a random walk, wherein each potential well corresponds to a discrete state with a randomly drawn average lifetime. Surprisingly, then, for any linear chain of discrete states within the transition region, the coefficient of variation will always remain below 1 (that is, the distribution will always be narrower than the single-exponential one) regardless of how the average lifetimes of the discrete states are distributed.

Values of *C* greater than 1 are, however, possible for networks with nonlinear topologies. This observation is particularly important because existence of trapped, long-lived states is often viewed as the origin of anomalous-diffusion–type dynamics of reaction coordinates in proteins (30⇓⇓–33); yet our result shows that such trapping alone cannot account for long-tailed distributions of transition path times observed in Fig. 3.

The role of dimensionality (or network topology in the discrete case) can be illustrated by two two-site models of biological membrane channels shown in Fig. 4 *B* and *C*. In the first, sites 1 and 2 represent the left and right sides of a channel (34, 35). A transition path, say, from left to right starts when a translocating particle enters the channel from the left reservoir and occupies site 1. It ends when the particle leaves the channel into the bulk on the right-hand side escaping the channel from site 2. The transition path may, of course, involve multiple transitions between the sites as long as the particle does not leave the channel. The distribution of the transition path time in this case can be found analytically:*B*. Accordingly, using Eq. **1**, one finds

The second example, Fig. 4*C*, is the simplest model of a discrete network with nonlinear topology. In this case, the transition region also consists of two sites, 1 and 2, but the site 2 is an off-pathway intermediate, and thus each transition path both begins when the particle arrives in site 1 and ends when it leaves site 1 to the bulk. A similar calculation shows that, in this case,*C* approaches unity in the limit *C* → 1.

### Application to HIV-1 Integrase.

We now apply these considerations to the end-to-end dynamics in the intrinsically disordered N-terminal domain of HIV-1 integrase (37), a system that attracted recent experimental attention (38). In this case, the “reaction” in question is the closure of a loop where the end-to-end distance (which is viewed as the reaction coordinate *x*) drops below a specified value *SI Appendix* for further details. The distribution of the transition path time obtained from molecular dynamics simulations of this protein (37) is shown in Fig. 3. Similarly to the case of the folding dynamics of another protein (13, 14), the distribution of transition path times for this protein exhibits a long tail (Fig. 3). Consistent with this, we find a coefficient of variation to be equal to

To gain further insight into the physical origin of this broad distribution, we have mapped the atomistically resolved dynamics of HIV-1 integrase onto that of a Markov-state network (39, 40), whose connectivity is illustrated in Fig. 5*A*. The resulting Markov-state model (MSM) provides a good description of the distribution of the transition path time *B*), which agrees with that from the original atomistic model except at short times (because, by construction, a MSM cannot reproduce the dynamics at timescales where the Markov assumption is invalid). See *SI Appendix* for other tests performed to validate the MSM. Importantly, the MSM also predicts a heavy-tailed distribution with *C* = 1.6.

To find out whether the nontrivial network topology alone could explain the observed distribution, we have smoothed the MSM’s free energy landscape by setting the transition rates between every pair of connected nodes to be a constant value (and by adjusting this value to give the correct mean transition path time). This modified MSM still predicts a heavy-tailed distribution of transition path times (Fig. 5*B*) (*C* = 2.0). Since the modified network, by construction, lacks states that act as long-lived kinetic traps, we conclude that nontrivial connectivity of the network rather than its energetic ruggedness is the primary physical origin of the observed broad distribution of the transition-path times. Indeed, as stated above (and proven below), a linear network of states cannot predict a distribution with *C* > 1 regardless of the transition rates between its nodes and, correspondingly, regardless of the free energies of the nodes.

### The Role of Experimental Artifacts and Multidimensional Diffusion Model.

Single-molecule experiments do not usually probe the molecular degrees of freedom directly. For example, in single-molecule force spectroscopy experiments, the intrinsic molecular extension *x* is inferred from the observable experimental degrees of freedom such as the position *y* of the force probe (23, 41⇓⇓–44). As a result, even the minimalist model describing such experiments may require dimensionality higher than 1. Does such instrument-induced multidimensionality lead to a broad distribution of the transition path time with *C* > 1?

Although the coupling of the molecular coordinates to those of the experimental setup has been reported to broaden the distributions of the transition path time as compared to the 1D case, both experimental observations and simulations often report still narrower-than-single-exponential distributions (10, 19, 44). Here, we have studied this case in more detail by considering the standard model where the coupling between the two degrees of freedom is described as a harmonic spring with a free energy of the form *k* represents the stiffness of the polymeric handle connecting the molecule and the force probe (*SI Appendix*). Although we do not have a rigorous proof of the inequality *C* < 1 regardless of the model parameters in this case, extensive search in the parameter space yielded no regimes with *C* exceeding 1. An implication of this observation is that, if a value *C* > 1 is observed experimentally, it is more likely to result from inherent multidimensionality of the molecular dynamics rather than from experimental artifacts.

What properties of multidimensional dynamics and/or of the underlying multidimensional energy landscape are then responsible for values *C* that exceed 1? Some insight into this question can be achieved by considering a potential energy landscape that has two separate saddles corresponding to two different pathways (reaction channels) leading from the reactant to the product (Fig. 6). Such landscapes are common even in reactions involving small molecules, stilbene isomerization being a classic example (45), and they are ubiquitous in conformational transitions that take place in biomolecules (46). We emphasize that, although in general multiple pathways may lead to nonexponential kinetics, existence of parallel reaction pathways is not inconsistent with the exponential kinetics of the overall reaction as long as all activation barriers are high enough (6). Let now *f* and 1 − *f* be the corresponding fractions of transitions occurring via each channel. Then the observed distribution of the transition path time is given by the following:*i* = 1, 2. Introducing coefficients of variation **9** that *C* > 1 if*f*, **10** is satisfied if the ratio **8** can be broad, with *C* > 1, given sufficient disparity between the average timescales **10** becomes

The above simple example suggests that heterogeneity of the ensemble of transition paths may account for broad distributions with *C* > 1. Existence of multiple activation barriers is, however, not a necessary condition. Consider, for example, the scenario where the system has, in addition to the reaction coordinate *x*, a degree of freedom *y* that is slow in comparison with the typical transition path time. Then *y* remains nearly constant for the duration of each transition path, and the distribution of their times can be written as follows:*y* and *y* for the ensemble of transition paths. As Eq. **11** is a generalization of Eq. **8** to the continuous case, we expect that a value *C* > 1 can also be achieved in this case. Indeed, in *SI Appendix*, we present a numerical study of a 2D toy model with a single transition-state saddle, in which a soft *y*-mode results in transition paths wandering far away from the saddle (*SI Appendix*, Fig. S6) and leads to broadly distributed transition path times.

## Discussion

Until recently, rate coefficients, which are related to mean first passage times, have been the focus of kinetic studies of biological molecules. Distributions of first passage times, as opposed to their mean values, are however becoming increasingly accessible to single-molecule measurements. Such distributions are beginning to provide detailed information about energy landscapes traversed by biomolecules en route between stable molecular species. For example, analysis of the short-time behavior of distributions of first passage times enables one to detect the number of intermediate states encountered in a molecular transition (47).

Transition path times are conditional first passage times that are particularly interesting, as they probe the dynamics during the fleeting yet rate-defining events where the activation barrier of a chemical reaction is being surmounted or a biological membrane channel is being traversed. Here, we showed that the width of the distribution of the transition path time contains important information about the dimensionality of the underlying free energy landscape. Specifically, we have rigorously proven that no 1D diffusive model, regardless of the ruggedness of the underlying free energy landscape or position dependence of the diffusivity, can possibly predict a distribution whose coefficient of variation *C* exceeds 1. A coefficient of variation exceeding 1 is thus a fingerprint of multidimensional dynamics.

A value of *C* that is below 1, however, does not necessarily mean that the system is 1D, nor does it imply that a 1D description is a good approximation. Indeed, a common reason requiring one to invoke multidimensionality to describe single-molecule force spectroscopy experiments is the coupling of the molecular coordinate *x* to the observable experimental degrees of freedom such as the position *y* of the force probe (23, 41⇓⇓–44).This coupling leads to memory effects in the dynamics of both *x* and *y* even if the intrinsic dynamics of these degrees of freedom is Markovian (19, 48), and thus a 1D diffusion model may no longer be an adequate description of the observed experimental signal even when the intrinsic dynamics along *x* is diffusive. However, we find here that elastic coupling of the molecule to the experimental observables is not likely to lead to a broad distribution of the transition path time (with *C* > 1). Consequently, coefficients of variation greater than 1, if observed experimentally, are unlikely to arise from experimental artifacts; rather, they indicate “inherent multidimensionality” of the underlying free energy landscape.

Our analysis of toy 2D diffusion models and of the MSM of a protein suggests that broader-than-single-exponential distributions of the transition path time arise as a result of multiple parallel reaction pathways. Consistent with this proposal, recent experimental (49, 50) and computational (18) observations have already indicated heterogeneity of the ensembles of the biomolecular transition paths, but such heterogeneity is a property that is not easy to quantify. In particular, Hoffer et al. (50) invoke energy landscapes with multiple saddles or with broad saddles (as in *SI Appendix*, Fig. S6) as a possible explanation of the high variability of transition path shapes observed experimentally for some DNA hairpins. The above considerations suggest that the coefficient of variation of the transition-path-time distribution may serve as a useful proxy for the “diversity” of pathways.

Other physical models may also be used to describe dynamics that yields values of *C* higher than 1, particularly the fractional Brownian motion model, which was shown to accurately predict the broad distribution of the transition-path time in the villin headpiece HP35 (13). Nondiffusive dynamics and memory along the reaction coordinate could also be captured using the generalized Langevin equation (GLE) (15, 20, 21). Although analytical approximations for *C* greater than 1. Note that a GLE with exponential memory for the reaction coordinate *x* is equivalent mathematically to the dynamics arising from the above 2D model, where the reaction coordinate *x* is coupled to a mode *y* via a harmonic spring term (19, 51); thus, we know that *C <* 1 in this case, but whether the same conclusion holds for more complicated memory kernels is an open question.

From an experimental perspective, use of coefficients of variation offers a significant advantage over analyzing the transition-path-time distributions via fitting them to some analytical function. First, such fitting usually assumes a specific shape of the potential of mean force (e.g., parabolic) and often makes additional approximations such as the barrier height being much greater than the thermal energy. Second, fitting is especially unreliable at long and short times where the statistics are poor. Calculation of the coefficient of variation, in contrast, only requires the first and the second moments of the distribution and should be much more robust with respect to experimental constraints than the distribution itself.

Finally, we note that calculating how the value of the coefficient of variation depends on the transition region boundaries offers a method to probe the dynamics of the system locally. For example, one could choose the boundaries of the transition region to be *SI Appendix*, Figs. S10 and S11). Although in this case we find

To gain additional insight into these observations, particularly into the nonmonotonic Δ dependence of the coefficient of variation, we performed the same analysis for the toy 2D model with a broad saddle (*SI Appendix*, Figs. S7 and S8). Again, we find that the value of *C* first increases as the width Δ decreases, but then it decreases again approaching the 1D result in the limit *C* on the boundaries of the transition region is determined by the topography of the underlying free energy landscape in relationship to the observed reaction coordinate or, equivalently, to the transition region boundaries (Fig. 6). One hopes that such properties can be deduced from the observed functions

## Methods

### Outline of the Proof That Transition Path Distributions Are Narrow (*C* < 1) for 1D Diffusive Dynamics in an Arbitrary Potential.

We start with the known expression (52⇓–54) for the mean transition path time for transitions between boundaries

Here, *D* is the diffusivity, which, for simplicity, is assumed to be position independent here. It is, however, straightforward to generalize this derivation to allow any position-dependent diffusivity function *x*, and

is the splitting probability (55) defined as the probability that the system starting at a point *x*, *SI Appendix*) that the second moment of the distribution of the transition path time is given by a similar relationship:

Here, *x* such that *SI Appendix*). Thus, from Eqs. **12** and **14**, we have the following:

As a consequence, we have the following:

which completes the proof.

## Data Availability.

Codes and trajectory data have been deposited in the GitHub repository (https://github.com/rsatijaUT/Broad-TPT) (58).

## Acknowledgments

A.M.B. was supported by the Intramural Research Program of the NIH, Center for Information Technology. R.S. and D.E.M. were supported by the Robert A. Welch Foundation (Grant F-1514) and the NSF (Grants CHE 1566001 and CHE 1955552).

## Footnotes

- ↵
^{1}To whom correspondence may be addressed. Email: makarov{at}cm.utexas.edu.

Author contributions: R.S., A.M.B., and D.E.M. designed research; R.S., A.M.B., and D.E.M. performed research; R.S. analyzed data; and R.S., A.M.B., and D.E.M. wrote the paper.

The authors declare no competing interest.

This article is a PNAS Direct Submission. G.H. is a guest editor invited by the Editorial Board.

This article contains supporting information online at https://www.pnas.org/lookup/suppl/doi:10.1073/pnas.2008307117/-/DCSupplemental.

Published under the PNAS license.

## References

- 1.↵
- 2.↵
- 3.↵
- 4.↵
- 5.↵
- 6.↵
- D. E. Makarov

- 7.↵
- 8.↵
- 9.↵
- 10.↵
- K. Neupane et al

- 11.↵
- 12.↵
- 13.↵
- 14.↵
- T. Mori,
- S. Saito

- 15.↵
- 16.↵
- 17.↵
- 18.↵
- W. M. Jacobs,
- E. I. Shakhnovich

- 19.↵
- E. Medina,
- R. Satija,
- D. E. Makarov

- 20.↵
- R. Satija,
- D. E. Makarov

- 21.↵
- E. Carlon,
- H. Orland,
- T. Sakaue,
- C. Vanderzande

- 22.↵
- R. Covino,
- M. T. Woodside,
- G. Hummer,
- A. Szabo,
- P. Cossio

- 23.↵
- G. Hummer,
- A. Szabo

- 24.↵
- 25.↵
- A. M. Berezhkovskii,
- L. Dagdug,
- S. M. Bezrukov

- 26.↵
- J. Klafter,
- I. M. Sokolov

- 27.↵
- 28.↵
- R. Metzler,
- J. Klafter

- 29.↵
- 30.↵
- 31.↵
- S. V. Krivov

- 32.↵
- 33.↵
- A. Das,
- D. E. Makarov

- 34.↵
- 35.↵
- A. M. Berezhkovskii,
- S. M. Bezrukov

- 36.↵
- 37.↵
- S. Piana,
- A. G. Donchev,
- P. Robustelli,
- D. E. Shaw

- 38.↵
- F. Zosel,
- D. Haenni,
- A. Soranno,
- D. Nettels,
- B. Schuler

- 39.↵
- 40.↵
- 41.↵
- 42.↵
- K. Neupane,
- M. T. Woodside

- 43.↵
- P. Cossio,
- G. Hummer,
- A. Szabo

- 44.↵
- P. Cossio,
- G. Hummer,
- A. Szabo

- 45.↵
- N. Agmon,
- R. Kosloff

- 46.↵
- D. J. Wales

- 47.↵
- A. L. Thorneywork et al

- 48.↵
- A. G. T. Pyo,
- M. T. Woodside

- 49.↵
- J. Y. Kim,
- H. S. Chung

- 50.↵
- N. Q. Hoffer,
- K. Neupane,
- A. G. T. Pyo,
- M. T. Woodside

- 51.↵
- R. Elber,
- D. E. Makarov,
- H. Orland

- 52.↵
- 53.↵
- 54.↵
- 55.↵
- C. W. Gardiner

- 56.↵
- S. Piana,
- K. Lindorff-Larsen,
- D. E. Shaw

- 57.↵
- R. Satija,
- A. Das,
- S. Mühle,
- J. Enderlein,
- D. E. Makarov

- 58.↵
- R. Satija,
- A. M. Berezhkovskii,
- D. E. Makarov

## Citation Manager Formats

## Article Classifications

- Physical Sciences
- Biophysics and Computational Biology

- Biological Sciences
- Biophysics and Computational Biology