## New Research In

### Physical Sciences

### Social Sciences

#### Featured Portals

#### Articles by Topic

### Biological Sciences

#### Featured Portals

#### Articles by Topic

- Agricultural Sciences
- Anthropology
- Applied Biological Sciences
- Biochemistry
- Biophysics and Computational Biology
- Cell Biology
- Developmental Biology
- Ecology
- Environmental Sciences
- Evolution
- Genetics
- Immunology and Inflammation
- Medical Sciences
- Microbiology
- Neuroscience
- Pharmacology
- Physiology
- Plant Biology
- Population Biology
- Psychological and Cognitive Sciences
- Sustainability Science
- Systems Biology

# Data-driven parameterization of the generalized Langevin equation

Edited by Alexandre J. Chorin, University of California, Berkeley, CA, and approved October 27, 2016 (received for review June 15, 2016)

## Significance

The generalized Langevin equation (GLE) provides a precise description of coarse-grained variable dynamics in reduced dimension models. However, computation of the memory kernel poses a major challenge to the practical use of the GLE. This paper presents a data-driven approach to compute the memory kernel, relying on a hierarchy of parameterized rational approximations in terms of the Laplace transform, which can be expanded to arbitrarily high order as necessary. This parameterization makes it convenient to represent the GLE via an extended stochastic model where the memory term is eliminated by properly introducing auxiliary variables. The present method is well-suited for constructing reduced models for nonequilibrium properties of complex systems such as biomolecules, chemical reaction networks, and climate simulations.

## Abstract

We present a data-driven approach to determine the memory kernel and random noise in generalized Langevin equations. To facilitate practical implementations, we parameterize the kernel function in the Laplace domain by a rational function, with coefficients directly linked to the equilibrium statistics of the coarse-grain variables. We show that such an approximation can be constructed to arbitrarily high order and the resulting generalized Langevin dynamics can be embedded in an extended stochastic model without explicit memory. We demonstrate how to introduce the stochastic noise so that the second fluctuation-dissipation theorem is exactly satisfied. Results from several numerical tests are presented to demonstrate the effectiveness of the proposed method.

- generalized Langevin dynamics
- data-driven parameterization
- coarse-grained molecular models
- reaction rate
- model reduction

Generalized Langevin equations (GLEs) have recently reemerged in the area of molecular modeling as a promising description of reduced-dimension coarse-grained variables. In principle, GLEs can be derived using the Mori–Zwanzig projection formalism (1, 2). Examples of such derivations can be found for a variety of applications (3⇓⇓⇓⇓⇓⇓–10), for example, climate modeling (11, 12). The GLE approach eliminates a large number of irrelevant degrees of freedom, reducing system dimensions to make direct computation feasible. Because this elimination often projects out high-frequency modes, the GLE can also extend the time scale of simulations. The GLE does this by describing the dynamics for explicit quantities of interest and implicitly describing the remaining degrees of freedom through a memory term and a random noise term. The random noise term is often strongly correlated in time.

However, practical implementations of GLEs require specification of the memory function, which can be difficult to obtain, even when the full dynamics of the system is known. For example, the memory functions obtained in past studies (8, 9, 13, 14) have involved functions of high-dimensional matrices. Darve et al. (14) proposed a more efficient algorithm to compute the memory kernel by solving an equation for the orthogonal dynamics derived from the Mori–Zwanzig formalism. However, the orthogonal dynamics equation can be expensive to solve when the original system is large. Furthermore, even when the memory kernel function is available, direct evaluation of the memory term can be costly because it requires the history of the coarse-grained (CG) variables at every time step and the associated numerical integration. Sampling of the random noise is also a challenging component of GLEs: To generate the correct equilibrium statistics for the CG model, the random noise has to obey the second fluctuation-dissipation theorem (FDT) (15). The theory of stationary processes (16) states that the random process is uniquely determined by the correlation function, which is proportional to the memory kernel; however, sampling the random noise is nontrivial in practice. Methods based on matrix factorization are computationally challenging because they require decomposition of a correlation matrix with dimension proportional to the total simulation period. Alternatively, more efficient methods based on fast Fourier transforms may create artificial periodicity (17).

In addition to the direct derivation of memory kernels (8, 9, 13, 14), there have been numerous attempts to compute the memory kernel from full molecular dynamics (MD) simulations (18⇓–20), especially for systems with zero net mean force. Such analyses lead to integral equations of the first kind, which are numerically unstable without additional regularization. Another approach for estimating the kernel uses Kalman filtering and assumes functions of exponential form (21, 22) such that the GLE can be embedded in a Markovian dynamics framework. In recent work, Chorin and Lu (23) considered a time-discrete representation, representing the memory effects using the NARMAX (nonlinear autoregression moving average with exogenous input) method. Voth and coworkers (24) proposed an alternative approach to recover the CG dynamics by introducing fictitious particles that interact with the variables, effectively introducing an approximation of the kernel function.

In this work, we present a hierarchical approach to obtain GLE kernel functions from simulation data. Such a data-driven approach is particularly useful for complex models (e.g., biomolecules or climate) in which full dynamics models are typically unavailable or inaccessible with finite computing resources. The key idea is parameterization of the kernel function Laplace transform by a rational approximation. The parameters in our ansatz can be computed directly from statistical properties of the CG variables. Additionally, this ansatz makes it possible to eliminate memory from the GLE by introducing auxiliary variables. In particular, we will show how to introduce inexpensive white-noise terms into the extended dynamics to approximate the random noise while satisfying the second FDT is exactly. As a result, no memory term needs to be evaluated, and no colored noise needs to be sampled. The first two approximations in the rational approximation hierarchy correspond to the Markovian approximation (6, 25, 26) and approximation of noise by a Ornstein–Uhlenbeck process, which is the ansatz used in some previous work. As we will show, these two approximations are often insufficient to predict dynamics properties; however, our hierarchy can be used to construct arbitrarily high-order models to characterize long-time behaviors and complex transition dynamics. The technique of using auxiliary variables has been demonstrated by others (27⇓–29) to treat the memory term in the GLE. However, the current work makes a direct connection between the parameters in the extended system and the statistics of the CG variables. As a result, our method does not require the knowledge of the full model: Only the time series of the CG variables are needed.

## Data-Driven Parameterization of GLE

The GLE can be expressed in the following form:

### Calculation of the Memory Kernel.

We assume that we have a time series dataset of **1** by

Here we have defined the correlation matrices,

Given the correlation functions, Eq. **3** can be regarded as an integral equation from which the memory function can be computed. However, this is an integral equation of the first kind, and it is not well-posed, leading to unreliable solutions. Instead of a determining the kernel function directly in the time-domain, we can instead parameterize its Laplace transform. We define the Laplace transform as

Similarly, we denote the Laplace transforms of **3**, we arrive at

We use the values of

It is clear that

For short or intermediate time scales, we use the point **6**, one can find the limiting values of the kernel and its derivatives as **5**, we find that

For example, we have

In the derivations above, we have incorporated the values at

### Rational Approximations.

Given limiting values available extracted from the data, we seek a rational function approximation for

The coefficients

The zeroth-order approximation treats **7**. In fact,

For the first-order approximation (

By matching Eqs. **7** and **10**, we find that

In this case, the memory function in the time domain is given by

Depending on the eigenvalues of

### Extended Dynamics.

A computational difficulty in GLE simulation is that the integral has to be evaluated at every step. However, this difficulty can be removed by introducing auxiliary equations based on the rational approximation of the memory function Eq. **11**. More specifically, we can define **12** implies that the approximate GLE can be written as

We can show that this new memoryless dynamics corresponds to an approximation of the GLE Eq. **1**. The memory function is approximated by the rational function in the frequency space, which is precisely Eq. **13**. More importantly, with this proper choice of the initial condition for **2** exactly with an invariant distribution (see *Supporting Information* for details) given by

The procedure above can be extended to arbitrarily high order, and the extended system can be written as follows:

The matrices **11**. Similar to Eq. **14**, we can also show that, by choosing the white noise **2** exactly. Eq. **16** has an invariant distribution (see *Supporting Information* for details) given by

## Numerical Results

### A One-Dimensional Chain Model.

We demonstrate our method through coarse-graining the dynamics of a tagged particle within a one-dimensional harmonic chain. In particular, we consider the first particle (on the free end) as the target particle and treat the remaining particles as the heat bath. As shown in refs. 39 and 40, the dynamics of the target particle can be modeled by a GLE with kernel given by *Supporting Information* for details). Fig. 1 shows the numerical results for

### A Tagged Particle in Solvent.

We also studied a tagged particle immersed in a fluid system governed by pairwise-conservative forces similar to those in dissipative particle dynamics (DPD) simulations (41, 42), that is, defined by*Supporting Information*).

Following the Mori–Zwanzig method, the dynamics of the target particle can be modeled by a GLE with zero mean force. We considered two cases: (I) **11**. As shown in Fig. 2, the exact kernel function

Unlike case (I), **1** with kernel

In case (I), Eq. **1** can reproduce the velocity correlation at short timescales (e.g.,

The different performance for cases (I) and (II) can be understood by examining the time scale separation of **20** yields fairly good agreement for case (I). In contrast, there is no apparent time scale separation between

### Non-Markovian Effects on Transition Dynamics.

To further validate our method, we simulated the transition rate of a tagged particle in a double-well potential using both by GLE and full MD. The double-well potential had the form**19** with *Supporting Information*). The zero-order approximation (corresponding to Langevin dynamics with a constant friction coefficient) yielded a

## Conclusion

We have presented a data-driven approach to obtain the memory kernel for a GLE through rational-function approximation in the Laplace transform domain. The data-driven nature of this method arises through connection of the rational function coefficients with equilibrium statistics of the CG variables—statistics that can be calculated through simulation time series data. The zero- and first-order approximations recover the Langevin and Ornstein–Uhlenbeck stochastic processes, respectively. Higher-order approximations have also been systematically derived for systems with significant memory effects. Unlike the time-domain kernel function representation, numerical simulations of the GLE using the rational approximation can be conveniently implemented by introducing auxiliary variables that follow linear stochastic dynamics with no memory, eliminating the need for expensive calculations of history-dependent memory terms. We have also shown that the second FDT Eq. **2** can be satisfied automatically using our approach. This method has been tested with simple systems but is applicable to much more complicated biological and material systems with pronounced memory effects [e.g., subdiffusion in single-molecule measurements (43, 44) or transition dynamics of chemical and biological reaction systems (45, 46)]. Many of these systems will be high-dimensional, even after coarse graining. Although our GLE-based method has no dimensionality restrictions, it can still suffer from the curse of dimensionality. In particular, high-dimensional systems will be computationally expensive and require time series for all CG variables. The rational function approximations (Eqs. **14** and **16**) of the GLE enable us to analyze the transition dynamics via the extended dynamics in an augmented phase space

## Acknowledgments

We thank George Karniadakis, Eric Darve, William Noid, Panos Stinis, Gregory Schenter, Lei Wu, Dave Sept, J. Andrew McCammon, and Zhen Li for informative discussions and advice and the two anonymous reviewers for very helpful suggestions. This work was supported by the US Department of Energy, Office of Science, Office of Advanced Scientific Computing Research as part of the Collaboratory on Mathematics for Mesoscopic Modeling of Materials (CM4).

## Footnotes

- ↵
^{1}To whom correspondence should be addressed. Email: huan.lei{at}pnnl.gov.

Author contributions: H.L., N.A.B., and X.L. designed research, performed research, analyzed data, and wrote the paper.

The authors declare no conflict of interest.

This article is a PNAS Direct Submission.

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1609587113/-/DCSupplemental.

## References

- ↵.
- Mori H

- ↵
- ↵
- ↵.
- Chorin AJ,
- Stinis P

- ↵
- ↵
- ↵
- ↵.
- Li X

- ↵.
- Ma L,
- Li X,
- Liu C

- ↵
- ↵.
- Wouters J,
- Lucarini V

- ↵.
- Majda AJ,
- Harlim J

- ↵.
- Chen M,
- Li X,
- Liu C

- ↵.
- Darve E,
- Solomon J,
- Kia A

- ↵
- ↵.
- Doob JL

- ↵.
- Li Z,
- Bian X,
- Li X,
- Karniadakis GE

- ↵.
- Oliva B,
- Daura X,
- Querol E,
- Avilés FX,
- Tapia O

- ↵
- ↵
- ↵.
- Harlim J,
- Li X

- ↵.
- Fricks J,
- Yao L,
- Elston TC,
- Forest MG

- ↵.
- Chorin AJ,
- Lu F

- ↵.
- Davtyan A,
- Dama JF,
- Voth GA,
- Andersen HC

- ↵
- ↵
- ↵
- ↵
- ↵.
- Baczewski AD,
- Bond SD

- ↵.
- Zwanzig R

- ↵
- ↵
- ↵
- ↵
- ↵
- ↵
- ↵.
- Laio A,
- Parrinello M

- ↵.
- Einstein A

- ↵.
- Adelman SA,
- Doll JD

- ↵.
- Li X,
- E W

- ↵
- ↵
- ↵
- ↵
- ↵.
- Schenter GK,
- McRae RP,
- Garrett BC

- ↵
- ↵.
- Zhu GQ,
- Tan JQ

## Citation Manager Formats

## Sign up for Article Alerts

## Jump to section

## You May Also be Interested in

### More Articles of This Classification

### Physical Sciences

### Related Content

- No related articles found.