Ab initio generalized Langevin equation

Significance

Recent developments of neural network-based atomistic models enable large-scale molecular dynamics based on quantum mechanical theory. However, there is still a gap between modeling microscale molecular dynamics and modeling mesoscale coarse-grained (CG) dynamics with the same level of accuracy. The ab initio generalized Langevin equation (AIGLE) model is proposed to bridge this gap by accurately learning an effective equation of motion for multidimensional CG variables from molecular dynamics data. The capability of AIGLE is demonstrated by simulating domain wall dynamics and extensive dipolar motion in the ferroelectric crystal lead titanate. This work is a step toward bottom-up modeling of mesoscale transformations in materials.


I. INTRODUCTION
Developing accurate and reliable meso-scale physical models is a long-standing problem [1][2][3]. In this context, the Mori-Zwanzig formalism [4] stands out as a general methodology for constructing effective coarse-grained (CG) models for any set of collective variables (CVs) defined in terms of microscopic degrees of freedom, such as the atomic coordinates. The idea is to project the dynamics of the microscopic variables on the space of the CVs. Finding an approximate surrogate model for the formal projective dynamics requires knowledge of the free energy as a function of the CVs, and brings in two important new effects: memory, because CV dynamics is generally non-Markovian, and noise, associated with the initial condition for the variables eliminated in the projection process. These effects are difficult to model. As a consequence, one often resorts to simpler approximations for the effective dynamics, such as the Markovian Langevin equation (LE).
Combined with Landau free energy models [5][6][7], the LE has been a popular tool for describing meso-scale dynamical processes. Well-known examples include the Landau-Lifshitz equation for the evolution of the magnetization in materials [8], the Allen-Cahn and the Cahn-Hilliard equations for the dynamics of phase transitions and separations [9,10], and, more generally, the phase field models [11] and the phase-field-crystal models [12,13] for a variety of problems in materials science. Landau-based LEs provide invaluable physical insight but may lack the flexibility required to quantitatively model the CG dynamics of real systems. A main issue is the insufficient separation of time scales between the CVs and the noise. In realistic systems, noise may originate from vibrational modes that are not significantly faster than the CVs. In this scenario, the non-Markovian generalized Langevin equation (GLE) is a much better approximation. It can be rigorously derived within the Mori-Zwanzig formalism for Hamiltonians that depend quadratically on the microscopic degrees of freedom [4]. In the presence of anharmonicity, the GLE is not exact but can be a flexible enough tool for connecting micro- and meso-scale dynamics, similar in spirit to the way in which semilocal density functional theory (DFT) bridges electronic quantum mechanics and atomistic models [14]. So far, efforts to develop quantitatively accurate GLE models have been limited by difficulties in the parameterization of the memory and noise terms [15][16][17]. In the context of bottom-up multi-scale modeling, these difficulties lie in the lack of microscopic data, on the one hand, and of robust algorithms to parameterize the GLE, on the other.
In recent years, machine learning has emerged as a powerful tool in the study of static and dynamic statistical properties of molecular systems [18][19][20][21][22][23][24], enabling ab initio simulations of unprecedented scale [25,26]. Today, massive amounts of data can be generated by all-atom molecular dynamics (MD) trajectories with ab initio accuracy. As we will demonstrate below, machine learning can also address the second difficulty mentioned above.
In this paper, we introduce a machine learning-based method for constructing accurate coarse-grained GLE models from fine-grained/microscopic Hamiltonians. We illustrate the approach with atomistic models derived from DFT, but the methodology can also be applied to microscopic models derived phenomenologically. In our scheme, memory is of finite length and translationally invariant in time, and the noise satisfies the constraint imposed by the second fluctuation-dissipation theorem (2FDT) [27], which connects the memory kernel to the autocorrelation function (ACF) of the noise. The 2FDT is essential to describe the dynamics of near-equilibrium physical systems. We call the schemes constructed in this way ab initio generalized Langevin equation (AIGLE) models, because they are trained on data generated with an ab initio microscopic model. The LE model derived from AIGLE by taking the Markovian limit in the memory kernel and the noise will be called the ab initio Langevin equation (AILE) model.
Previous works have studied data-driven parameterizations of the GLE [28][29][30][31][32][33][34][35][36][37][38][39][40][41][42][43], the LE [44][45][46], the far-from-equilibrium GLE [47,48], and generic stochastic processes [49,50]. For GLEs restricted by the 2FDT, the noise generator is usually constructed from a pre-determined memory kernel or from the ACF of the noise. For instance, in Refs. [29,30], the noise generator is a Yule-Walker linear autoregressive (AR) model fitted to the ACF of the noise. The resulting model does not guarantee the stationarity of the noise. That can be imposed by adjusting the roots of the characteristic equation, but this may lead to uncontrolled errors. Another approach, reported in Refs. [31,32], assumes that the noise generator is a Fourier series with random coefficients sampled from a distribution defined by the memory kernel. In practice, the Fourier series is of finite length, and the generated noise and its ACF become periodic. Recently, Ref. [41] proposed to convert a GLE into Markovian equations of motion for fictitious degrees of freedom, by using a finite-order Padé approximant for the memory kernel. The dynamics of the fictitious degrees of freedom is constructed according to the memory kernel while retaining the 2FDT constraint on the noise. In general, approaches that use an average property like the ACF to fix the noise are "mean-field" approximations, aiming at consistency with data on the 2FDT rather than on presumably less relevant features like higher-order correlations or kurtosis. Concerns have been raised that in some of these approaches, the statistical error of the correlation functions may be amplified in an uncontrolled way [43].
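To make the "mean-field" AR route concrete, the Yule-Walker construction used in Refs. [29,30] can be sketched as follows. This is a minimal illustration, not code from those works: the function names, the example ACF, and the AR(2) order are our own choices.

```python
import numpy as np

def yule_walker_ar(acf, order):
    """Solve the Yule-Walker equations for the AR coefficients phi and the
    innovation variance, given the target noise ACF acf[0..order]."""
    # Toeplitz system: R phi = r, with R_ij = acf[|i - j|]
    R = np.array([[acf[abs(i - j)] for j in range(order)] for i in range(order)])
    r = np.array([acf[k] for k in range(1, order + 1)])
    phi = np.linalg.solve(R, r)
    sigma2 = acf[0] - phi @ r        # innovation (white-noise) variance
    return phi, sigma2

def generate_ar_noise(phi, sigma2, n_steps, rng):
    """Generate a noise trajectory from the fitted AR model."""
    m = len(phi)
    R = np.zeros(n_steps + m)        # zero burn-in history
    for n in range(m, n_steps + m):
        R[n] = phi @ R[n - m:n][::-1] + np.sqrt(sigma2) * rng.normal()
    return R[m:]

# example: AR(2) noise fitted to a short oscillatory ACF
rng = np.random.default_rng(0)
acf = np.array([1.0, 0.6, 0.2, -0.05])
phi, sigma2 = yule_walker_ar(acf, order=2)
noise = generate_ar_noise(phi, sigma2, 50_000, rng)
```

As the text notes, nothing in this construction enforces stationarity: it holds here only because the fitted characteristic roots happen to lie inside the unit circle.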
Alternatively, one can go beyond "mean-field" by adopting a regression approach. For example, Ref. [49] introduced a non-linear autoregressive model for generic stochastic processes not constrained by the 2FDT. Refs. [50,51] used recurrent neural networks for learning generic dynamical systems, as the non-linear nature of general regression tasks may require sophisticated deep neural network models. However, in specialized but important cases such as near-equilibrium systems, knowledge of the free energy surface (FES) and the 2FDT facilitate the task, making it possible to reproduce the time series with relatively shallow and simple neural network regressors. Then, accuracy, stationarity, and efficiency can be achieved simultaneously. In AIGLE, we strive to optimize these three qualities, while "mean-field" approaches essentially compromise accuracy. We constrain the memory kernel with the 2FDT and model the noise with a neural network-based generalized autoregressive (GAR) scheme that can deal with insufficient time-scale separation and anharmonic coupling of the modes. These complications are common in real materials but are often overlooked in toy models. To have an efficient noise generator suited for long-time simulation, we keep the neural network as simple as a compact feed-forward neural network. While most data-driven GLEs assume prior knowledge of the FES, in AIGLE not only the noise but also the FES and the couplings to the driving fields can be parameterized. Moreover, while most existing literature uses a one-dimensional GLE, we introduce a multi-dimensional version of AIGLE, based on a local kernel approximation and a consistent GAR model, which can reproduce not only the one-body but also the local two-body correlations. The adopted approximation balances efficiency and accuracy, making it possible to deal with infinite-dimensional, homogeneous CVs. To our knowledge, multi-dimensional GLEs have so far been used only to study the few-body dynamics of low-dimensional CVs [52][53][54].
In this paper, we expose the details of AIGLE and demonstrate its effectiveness in an ab initio multi-scale study of ferroelectric lead titanate (PbTiO_3). The scheme is not limited to ferroelectric problems, and its mathematical structure can be used in reduced models of general crystalline materials. In the present application the order parameters, i.e., the CVs, depend on the local electric dipole moments associated with the crystalline lattice. These local moments act like lattice spins in ferromagnets and, like the latter, can be coarse-grained to scalar or vector fields in the continuum limit. Unlike lattice spins of fixed magnitude, the local dipoles fluctuate in both direction and magnitude. The sum of the local dipoles defines the polarization of the system, which is a typical example of a global order parameter in Landau's theory of symmetry breaking. Damped vibrational modes associated with polar phonons are embedded in the dynamics of the local dipoles, inducing oscillating correlations among the dipoles, a behavior that differs significantly from the diffusive dynamics of Brownian particles, whose velocity ACF decays exponentially with time. Hence, the difficulties encountered in CG dipole dynamics are similar to those encountered in realistic multiscale models of materials and macromolecules. Specifically, we consider two examples of mesoscale dynamics in crystalline materials: the field-driven dynamics of a planar ferroelectric domain wall treated as a virtual particle in a non-Markovian bath, and the dynamics of extensive local order parameters with translationally invariant interactions. In both cases, AIGLE is trained with atomic trajectories generated at room temperature with the Deep Potential (DP) scheme [21], a deep learning approach that closely reproduces the quantum mechanical potential energy surface at the DFT level of theory. The microscopic lattice dipoles, rigorously defined in the theory of the electric polarization within DFT [55,56], are represented by an equivariant generalization of the DP model [57]. In the first example, we study the glassy dynamics of a planar 180° domain wall induced by a weak electric field E = E ẑ parallel to the polarization of one of the domains. We find that the domain wall shifts by a succession of rare events. At low fields, the domain velocity v_D predicted by AIGLE gradually deviates from its AILE counterpart and from the phenomenological scaling law of Merz [58], according to which v_D ∝ e^(−E_a/E), with constant E_a. Merz's law can be derived from the theory of elastic interface motion [59,60] that describes the dynamics with an overdamped Langevin equation. Our results suggest that inertia and memory effects captured by AIGLE play a role in the glassy motion of elastic interfaces under weak applied fields. In the second example, we consider the dynamics of a CG lattice of dipoles in the bulk of a compressively strained PbTiO_3 crystal. AIGLE reproduces well the self and close-neighbor correlations of the dipoles, and captures approximately the ACF of the time derivative of the polarization, whose Fourier transform yields the far-infrared optical spectrum. AILE fails in this task, but still models correctly the relaxation pattern of a domain structure when this is driven by surface tension and memory is not important. A CG lattice dynamics of extensive CVs, like that provided here by AIGLE or AILE, would be useful, in general, in studies of the dynamics of extended crystal defects and of the epitaxial growth of materials.
The paper is organized as follows. In Sec. II we introduce the AIGLE formalism. In Sec. III, we use AIGLE for ab initio multi-scale modeling of PbTiO_3. Specifically, we report in Sec. III A a model for the field-driven motion of a planar domain wall in epitaxial PbTiO_3. In Sec. III B, we report an extensive model of CG lattice dynamics. Details of one-dimensional AIGLE are in the Materials and Methods section. Details of multi-dimensional AIGLE are in the Supporting Information (SI), which also includes the microscopic models for PbTiO_3 and other technical details.

II. THE AIGLE MODEL
The starting point is a microscopic model of molecular dynamics. Collective variables (CVs), obtained by coarse-graining the microscopic degrees of freedom when constructing the FES, form a column vector x. The aim is to eliminate the remaining degrees of freedom and obtain an accurate dynamic model for the CVs, using the GLE ansatz:

M a(t) = 𝓕(x(t)) − ∫_0^t K(t − s) v(s) ds + R(t). (1)

Here, M is the effective mass matrix, G(x) is the FES, the vector F comprises the external driving forces, K is the memory kernel matrix, and the vector R represents the noise. We define v = dx/dt, a = d²x/dt², 𝓕 = −∇G(x) + F, and use the superscript T for the transpose of a vector or a matrix. We shall use the brackets ⟨···⟩ to indicate an average over the equilibrium ensemble at t = 0. We require ⟨R(t)⟩ = 0, and the orthogonality condition ⟨R(t)v^T(0)⟩ = 0, from which the 2FDT can be derived [27]. The 2FDT prescribes that, at equilibrium, ⟨R(t)R^T(t′)⟩ = k_B T K(t − t′) for t ≥ t′, relating the memory kernel to the ACF of the noise. In addition, although the noise need not be strictly stationary, ⟨R(t_0 + t)R^T(t_0)⟩ should be independent of t_0 for sufficiently large t_0.
In AIGLE, Eq. (1) is learned from the trajectories of x. The scheme can use, but does not require, a predetermined FES [61][62][63][64][65][66][67][68][69], as all the terms in Eq. (1) can be learned from adequate trajectory data. The resulting GLE satisfies numerically the 2FDT for the equilibrium ensemble, and the model can be extended to near-equilibrium dynamics. Extensions to general nonlinear dynamics beyond the 2FDT would be possible, but this paper is limited to near-equilibrium situations. We show that the scheme can be constructed from large-scale MD simulations of realistic materials models. First, we formulate AIGLE for a one-dimensional CV and then we generalize it to infinite-dimensional lattice CVs.
To learn from time series data generated by MD, it is convenient to transform the integro-differential equation (1) into discrete form. We assume that the memory kernel is homogeneous, i.e., independent of position [70] and time origin [71], and use ∆t for the time step of the discretized GLE. Setting t = 0 at the arbitrary starting time, the current time is t = n∆t, and we use the notation f^(n) to indicate a time-dependent function f(n∆t). Then, the discretized form of Eq. (1) for a one-dimensional CV reads

m a^(n) = 𝓕^(n) − Σ_{s=0}^{n−1} K^(s+1/2) v^(n−s−1/2) ∆t + R^(n). (2)

Eq. (2) is propagated with the leapfrog algorithm:

v^(n+1/2) = v^(n−1/2) + a^(n) ∆t, x^(n+1) = x^(n) + v^(n+1/2) ∆t. (3)

This setup allows synchronization with MD data when ∆t equals an integer multiple of δt, the integration time step of MD. More accurate multi-step schemes for integrating stochastic dynamics exist [72,73], but here we adopt the simple leapfrog scheme because the autoregressive noise model of AIGLE benefits from a simple discretization scheme. Moreover, in consideration of the errors in the free energy calculations, the errors in autoregression, and the lack of a conservation law for stochastic dynamics, the numerical integration error is a minor issue as long as ∆t is sufficiently small relative to the shortest vibrational period of the CVs. The free energy G in Eq. (2) can be parameterized empirically, e.g., using a polynomial ansatz, or, more generally, it can be represented by a neural network ansatz when dealing with high-dimensional CVs. The noise term {R^(n)} in Eq. (2) is modeled by a generalized autoregressive (GAR) model,

R^(n) = Σ_{k=1}^{m_A} ϕ^(k) R^(n−k) + µ^(n) + σ^(n) w^(n), (4)

where {ϕ^(k)} are Yule-Walker linear autoregressive parameters [74], and µ^(n) and σ^(n) are non-linear functions that depend on the history of the noise. In our approach, µ^(n) and σ^(n) are the outputs of a deep neural network whose arguments are the m_A previous noise values. The term w^(n) represents the uncorrelated part of the noise on the scale of ∆t, and should be close to Gaussian white noise for the scheme to be successful. The GAR model becomes a standard AR(m_A) model [75] for constant µ^(n) and σ^(n). When the time dependence of µ^(n) and σ^(n) cannot be ignored, GAR outperforms AR in reducing w^(n) to an almost ideal white noise upon training with MD data, a crucial property for the time correlation functions of CG dynamics to agree with the data. Eqs. (2)-(4) constitute the AIGLE model. The corresponding AILE model is

m a^(n) = 𝓕^(n) − ϑ v^(n−1/2) + w^(n),

where the friction coefficient satisfies ϑ = Σ_{s=0}^{m_K−1} K^(s+1/2) ∆t, and the white noise w^(n) is fixed by the Markovian 2FDT. In AIGLE, the parameters defining G and F, the memory kernel, the Yule-Walker model, and the neural networks in the GAR model are learned from MD trajectories kept near thermal equilibrium by a stochastic environment that mimics a heat bath. This is a common situation in realistic finite-temperature systems. Memory in these open systems should extend over a finite time interval, specified by the integer m_K, i.e., K^(s+1/2) = 0 for s > m_K − 1. m_K and m_A are a priori parameters of the same order as the relaxation time of the ACF of the velocities of the CVs. At the beginning of the learning protocol, it is convenient to set m_A = m_K several times larger than the relaxation time. Upon fine tuning, the final values of m_A and m_K get close to the relaxation time of the velocity ACF, and we find empirically that a good choice corresponds to m_A < m_K. The mass m is fixed by the equipartition theorem.
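The indexing of the half-step velocities in Eqs. (2)-(3) can be fixed with a minimal numerical sketch. The kernel, force, and noise below are toy inputs of our own choosing (a single-bin kernel, a harmonic free energy, zero noise), and the mass is set to one.

```python
import numpy as np

def propagate_gle(x0, v0, force, kernel, noise, dt, mass=1.0):
    """Leapfrog propagation of the discretized 1D GLE, Eqs. (2)-(3):
    m a^(n) = F(x^(n)) - sum_s K^(s+1/2) v^(n-s-1/2) dt + R^(n),
    v^(n+1/2) = v^(n-1/2) + a^(n) dt,  x^(n+1) = x^(n) + v^(n+1/2) dt."""
    n_steps, m_K = len(noise), len(kernel)
    x = np.empty(n_steps + 1)
    v = np.zeros(n_steps + 1)               # v[n] stores v^(n-1/2)
    x[0], v[0] = x0, v0
    for n in range(n_steps):
        # memory friction: kernel dotted with the most recent half-step velocities
        hist = v[max(0, n + 1 - m_K):n + 1][::-1]   # v^(n-1/2), v^(n-3/2), ...
        friction = np.dot(kernel[:len(hist)], hist) * dt
        a = (force(x[n]) - friction + noise[n]) / mass
        v[n + 1] = v[n] + a * dt
        x[n + 1] = x[n] + v[n + 1] * dt
    return x, v

# toy run: harmonic well, single-bin kernel, no noise -> damped oscillation
dt = 0.01
kernel = np.array([50.0])       # K^(1/2); effective friction = 50 * dt = 0.5
noise = np.zeros(5000)
x, v = propagate_gle(1.0, 0.0, lambda q: -q, kernel, noise, dt)
```

With a longer kernel array and a trained noise generator in place of the zero array, the same loop propagates the full AIGLE model.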
The recommended learning procedure involves the three actions outlined below. More details are in the Methods section. In the first action, the models for G and the memory kernel in Eq. (2) are trained on equilibrated MD data. Static and conservative forces {F^(n)} are absorbed into G. Two steps are iterated to self-consistency. In the first step, G is optimized while keeping the memory kernel fixed. The loss function is the mean squared deviation from MD of the model prediction for the force on the CV without including noise effects. This procedure is equivalent to minimizing the noise. In the second step, the memory kernel is optimized, while keeping G fixed, by imposing orthogonality of velocity and noise in the least-squares sense. Self-consistency typically requires a few thousand iterations. At the completion of the first action, the noise {R^(n)} is defined by subtracting the gradient force, −∇G, and the memory-dependent friction force, predicted by the model, from the true force acting on the CV in the MD data. Then, we turn to the second action, in which the GAR model is optimized using a maximum likelihood loss function, in which the noise values {R^(n)} constitute the time-series data. This procedure ends when the residual noise is almost white noise and the GAR model is numerically stationary. At this point, the GLE is fully determined for equilibrium systems and can be used to model mesoscale dynamics under equilibrium conditions. However, when an external driving force F is present, extra training may be necessary. This is done in a third action, which is only executed if needed. In this procedure, the parameters that define G and F are refined with the loss function used in the first action, while keeping the memory kernel and the GAR model fixed. We tested the above procedure and the 1D AIGLE model on a toy system, the infinite harmonic chain, which can be solved analytically within the Mori-Zwanzig formalism. The results of this validation test, reported in the SI, show that AIGLE reproduces with high accuracy the MD data, the analytical Mori-Zwanzig solution, and the 2FDT.
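The two alternating steps of the first action can be illustrated on synthetic data. This is a schematic of ours, not the paper's implementation: the polynomial force ansatz, the exactly Markovian test trajectory, and the unit mass are all assumptions. Note that the normal equations of the second least-squares step set the lagged velocity-noise correlations to zero, which is the discrete orthogonality condition.

```python
import numpy as np

def fit_aigle_stage1(x, v, a, dt, m_K, poly_deg=3, n_iter=200):
    """Alternating least-squares sketch of the first learning action (1D, m = 1):
    step (i):  with the kernel fixed, fit polynomial coefficients of the
               force -dG/dx by minimizing the residual noise;
    step (ii): with G fixed, refit the memory kernel K; the least-squares
               normal equations impose <R(t) v(t - s dt)> = 0 for each lag s."""
    n = len(a)
    X = np.vander(x, poly_deg + 1)          # polynomial force design matrix
    V = np.zeros((n, m_K))                  # V[t, s] = v[t - s] (zero-padded)
    for s in range(m_K):
        V[s:, s] = v[:n - s]
    K = np.zeros(m_K)
    for _ in range(n_iter):
        c, *_ = np.linalg.lstsq(X, a + (V @ K) * dt, rcond=None)   # step (i)
        K, *_ = np.linalg.lstsq(-V * dt, a - X @ c, rcond=None)    # step (ii)
    return c, K

# synthetic check: a Markovian damped oscillator, a = -k x - g v, should be
# recovered as c ~ (-k, 0) and K[0] * dt ~ g
dt, k_spring, g = 0.01, 2.0, 0.3
xs, vs, accs = np.empty(4000), np.empty(4000), np.empty(4000)
xt, vt = 1.0, 0.0
for t in range(4000):
    at = -k_spring * xt - g * vt
    xs[t], vs[t], accs[t] = xt, vt, at
    vt += at * dt
    xt += vt * dt
c, K = fit_aigle_stage1(xs, vs, accs, dt, m_K=1, poly_deg=1)
```

Because the joint problem is quadratic in (c, K), the alternation converges to the global least-squares solution; in the full scheme, G is a neural network and the iteration runs for a few thousand steps.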
Finally, we generalize AIGLE to general lattice problems. First, we reinterpret Eq. (1) for on-lattice CVs. We let x_i = (x_i1, ..., x_id_s)^T represent a d_s-dimensional local order parameter associated to site i of a d_l-dimensional Bravais lattice with periodic boundary conditions. L is the number of sites in the simulation supercell. By concatenating {x_i} we define a d_s L-dimensional column vector x. We let A = {A_ij} be the L × L adjacency matrix of the lattice. We use i ∼ j to indicate a neighboring pair (A_ij = 1). For large L, it is not practical to model a dense d_s L × d_s L memory kernel matrix K(s) and a long-range correlated R(t) that preserve exactly the velocity correlation matrix or the 2FDT intrinsic to the data. The simplest approximation is to assume locality and set K_iα^jβ(s) equal to zero whenever the indices i and j do not correspond to the same site, i.e., when i ≠ j. Even with this drastic approximation, the one-body memory kernel will depend not only on the autocorrelation but also on the cross-correlations of the CVs on different sites. Limiting consideration to close-neighbor correlations, we adopt a variational principle for the optimal memory kernel. We define the orthogonality tensor Ω_iα^jβ(t) = ⟨R_iα(t)v_jβ(0)⟩, and define the corresponding orthogonality loss, a functional of K, by:

L[K](t_K) = Σ_{i=j or i∼j} Σ_{α,β} ∫_0^{t_K} [Ω_iα^jβ(t)]² dt. (5)

For a given cutoff of the memory time t_K = m_K ∆t, the optimal one-body memory kernel K* minimizes Eq. (5), i.e., K* = argmin_K L[K](t_K). We call K* a "local kernel approximation" of the exact many-body memory kernel. It enforces a weak form of the 2FDT. The optimality condition for the memory kernel in the case of 1D AIGLE can be regarded as a special case of Eq. (5). We show, in the SI, that the local kernel approximation can be viewed as a special case of a general variational principle for the orthogonality condition. In practice, we still adopt the discretized form of Eq. (1) with the leapfrog scheme. We use a multi-dimensional version of Eqs. (2)-(3), but keep the notation f^(n) for a time-dependent function f(n∆t). We require K^(s+1/2) = 0 for s ≥ m_K. Given the force field 𝓕, after discretizing Eq. (5), we obtain a least-squares solution for the optimal memory kernel K*^(s+1/2) for 0 ≤ s < m_K. The derivation, given in the SI, is lengthy but straightforward. We also generalize the 1D GAR model to the multi-dimensional case. Note that, although K* is one-body, the noise R is still spatially correlated as required by the 2FDT. Thus, the noise generator cannot be defined locally as is commonly done with molecular dynamics thermostats. To deal with this complication, we allow the noise (R^(n))_i at site i and time step n to depend not only on its own history but also on the history of a finite number of neighboring sites j defined in terms of the adjacency matrix A. The sites j could include nearest-neighbor sites (A_ij = 1), next-nearest-neighbor sites, and so on. In other words, the multi-dimensional GAR model is analogous to a graph neural network [76] on the graph of the CVs defined by A. The details of the multi-dimensional GAR model are given in the SI. The training of multi-dimensional AIGLE follows the same protocol as univariate AIGLE.
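The discretized orthogonality loss of Eq. (5) can be estimated directly from trajectories. The sketch below is for scalar site variables (d_s = 1); the averaging over time origins, the ring-lattice example, and all names are our own conventions.

```python
import numpy as np

def orthogonality_loss(R, v, adjacency, m_K):
    """Estimate L[K](t_K): sum over self (i = j) and neighbor (i ~ j) pairs
    of squared correlations <R_i(t0 + s dt) v_j(t0)>, summed over lags s < m_K.
    R, v: (n_steps, L) noise and velocity trajectories; adjacency: (L, L)."""
    n_steps, L = R.shape
    n0 = n_steps - m_K                       # number of time origins
    mask = adjacency + np.eye(L)             # keep i = j and i ~ j entries
    loss = 0.0
    for s in range(m_K):
        omega = R[s:s + n0].T @ v[:n0] / n0  # Omega_{ij} at lag s
        loss += np.sum((omega * mask) ** 2)
    return loss

# sanity check on a 4-site ring: noise uncorrelated with the velocities gives a
# small loss, noise proportional to the velocities gives a large one
rng = np.random.default_rng(0)
ring = np.array([[0, 1, 0, 1],
                 [1, 0, 1, 0],
                 [0, 1, 0, 1],
                 [1, 0, 1, 0]])
v = rng.normal(size=(2000, 4))
loss_white = orthogonality_loss(rng.normal(size=(2000, 4)), v, ring, m_K=3)
loss_corr = orthogonality_loss(v.copy(), v, ring, m_K=3)
```

Minimizing this quantity over the one-body kernel entries, lag by lag, is what the least-squares solution described above accomplishes in closed form.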

III. APPLICATIONS

A. Domain wall as a virtual particle
In this section, AIGLE is used to study a prototypical problem of ferroelectric domain switching: the field-driven motion of a twin 180° domain wall in epitaxially grown PbTiO_3. Schematic drawings of the domain wall and of the crystal structure are shown in Fig. 1. Electric dipole moments (local dipoles) p_j, represented by yellow arrows in Fig. 1, are associated with the Ti-centered elementary cells j [77]. The polarization is defined by P = Σ_j p_j / V, with V the volume of the sample. Polarization changes are experimentally observable. AIGLE is constructed from ab initio electronic structure data within DFT. Microscopic definitions of p_j and P are given in the SI.
In epitaxially strained tetragonal PbTiO_3, polarized along the [001] crystallographic direction at room temperature (T = 300 K), the ferroelectric domains have narrow 180° domain walls [77]. A CV that describes continuously the switching of a domain from +ẑ to −ẑ is α = Σ_i tanh(p_iz/p*) / (2A_wall), where the sum extends over the local dipoles. We choose a value of p* that is close to the bulk average of |p_iz|. In the simulations, we set the parameter A_wall equal to 400, the supercell area in the xz plane in units of elementary cells. With this definition, when the +ẑ domain grows by one layer of unit cells in the y-direction, the increment of α is approximately equal to 1.
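In code, the CV and its layer-counting property read as follows. The dipole magnitude and fluctuation amplitude are invented for illustration; with this normalization, a layer flipped from −ẑ to +ẑ at the bulk dipole value shifts α by 2 tanh(1)/2 ≈ 0.76, i.e., an increment of order unity per layer.

```python
import numpy as np

def domain_wall_cv(p_z, p_star, A_wall=400):
    """alpha = sum_i tanh(p_iz / p_star) / (2 * A_wall): measures, in units of
    layers, how far the +z domain extends relative to the -z domain."""
    return np.sum(np.tanh(p_z / p_star)) / (2.0 * A_wall)

# 20-layer slab with A_wall = 400 cells per layer; flipping one layer from
# -z to +z shifts alpha by about tanh(1) ~ 0.76
A_wall, p_star, rng = 400, 0.3, np.random.default_rng(1)

def slab(n_up, n_down):
    """Dipole z-components for n_up +z layers and n_down -z layers, with
    made-up thermal fluctuations around +-p_star."""
    up = p_star + 0.05 * rng.normal(size=n_up * A_wall)
    down = -p_star + 0.05 * rng.normal(size=n_down * A_wall)
    return np.concatenate([up, down])

alpha0 = domain_wall_cv(slab(10, 10), p_star)   # symmetric slab, alpha ~ 0
alpha1 = domain_wall_cv(slab(11, 9), p_star)    # one layer switched
```

The tanh saturation makes α insensitive to the magnitude fluctuations of the dipoles while remaining smooth, which is what allows it to serve as a continuous reaction coordinate for wall motion.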
We model the motion of the domain wall driven by an external electric field E = E ẑ. Experimentally, the domain wall velocity v_D obeys approximately a phenomenological law suggested by Merz [58], according to which v_D = v_0 e^(−E_a/E), with v_0 and E_a empirical parameters. For small E, v_D → 0, and the wall dynamics is glassy. This motion, called domain wall creep, is usually initiated by the nucleation and growth of flat nuclei at the separating interface [78]. On a coarser time scale, the moving interface can be viewed as a virtual particle that performs a succession of noise-activated, rare hopping events, rather than a steady continuous motion. This behavior cannot be deduced from phenomenological laws, like Merz's, and is usually ignored in continuum models, but can be probed, in principle, with microscopic simulations [79]. However, glassy dynamics can easily exceed all-atom simulation capabilities when the time scale is of the order of the microsecond or longer. To cope with the long-timescale bottleneck, one often turns to kinetic Monte Carlo [78], an approach that typically requires ad hoc iteration rules and assumes Markovianity. AIGLE can simulate non-Markovian dynamics with ab initio accuracy for time scales comparable to those reachable by kinetic Monte Carlo.
We generate training data for AIGLE with MD simulations of PbTiO_3. We adopt the Deep Potential (DP) model for the interatomic interactions, and an effective Born charge (BC) model for the local dipoles (see SI). The MD supercell is shown in Fig. 1, where the x and y dimensions are fixed to match the experimental lattice constant a = 3.91 Å, and the z dimension is barostatted at a constant pressure of P_z = 28 kbar, a value chosen to roughly match the experimental lattice constant c at 300 K and atmospheric pressure (see SI and Refs. [55,80] for more details). With the above setup, we run MD trajectories at T = 300 K, with temperature controlled by a stochastic thermostat, in the presence of homogeneous electric fields of varying magnitude E, with 0 ≤ E ≤ 3 mV/Å, along the ẑ direction. The microscopic data for the CV α are extracted from these trajectories. In these simulations, the atomistic degrees of freedom equilibrate quickly with the environment, and the loss of detailed balance is mostly associated with the CV describing domain motion.
The atomistic simulations suggest that the dynamics of α resembles that of a virtual particle subject to colored noise in a tilted periodic potential [81]. When E is small, the particle is trapped in a metastable equilibrium, and the velocity ACF of α, defined by A(τ) = ⟨α̇(τ)α̇(0)⟩ with α̇ = dα/dt, exhibits several characteristic oscillations (modes), i.e., a behavior dramatically different from the simple exponential decay characteristic of Brownian dynamics driven by white noise. These characteristic modes originate mainly from the optical phonons of PbTiO_3 and provide the thermal fluctuations that activate nucleation-driven creep events at small driving fields. Taking time-scale separation into account, it is convenient, when constructing a CG model, to filter out the frequencies much higher than that of the slowest mode of α (≈ 30 cm^−1). Then, a new CV, x, is constructed by acting on α with a truncated Gaussian filter in time, whose truncation parameter ς we set equal to 40. With this choice, the modes of α in the range [0, 100] cm^−1 are barely affected, while the modes with higher frequency are suppressed (see SI). The AIGLE integration time step ∆t = 10 fs is equal to five times the MD time step δt = 2 fs. This procedure is substantiated by the fact that the residual noise, after training the model, is indeed very close to white noise. The GLE equation of motion for x has the form of Eq. (2), with the FES parameterized by a periodic function U(x). Comparison of C_vv^MD(τ), the normalized autocorrelation function (NACF) of the CV velocity extracted from MD, with its AIGLE counterpart, C_vv^GLE(τ), provides a direct validation of AIGLE. The two NACFs, calculated at metastable equilibrium conditions for E = 2 mV/Å, are reported in Fig. 2(a). They agree well with each other but for minor discrepancies. The figure also indicates that the adopted cutoff m_K is large enough to satisfy the condition C_vv^MD(τ > m_K∆t) ≪ 1. A major cause of the small differences between MD and AIGLE is apparent in the inset of Fig. 2(a), which reports the real parts of the Fourier transforms of the velocity NACFs, ℜ[Ĉ_vv^MD(Ω)] and ℜ[Ĉ_vv^GLE(Ω)]. The slowest mode occurs at Ω ≈ 30 cm^−1 in both NACFs. In ℜ[Ĉ_vv^MD(Ω)] this mode exhibits a fast oscillatory line shape, indicating a relaxational origin. The same mode in ℜ[Ĉ_vv^GLE(Ω)] has a smooth Lorentzian line shape with a peak frequency that matches the harmonic frequency of the nearly quadratic free energy basin depicted in Fig. 2(b). This indicates that the relaxational fluctuation of the domain wall is turned into an effective harmonic oscillation in a potential well. Since the ansatz for U(x) assumes a smooth, rather than fractal, dependence on x, the slowest mode of ℜ[Ĉ_vv^GLE(Ω)] displays a clean harmonic peak, sharper than the relaxational peak of ℜ[Ĉ_vv^MD(Ω)]. This subtle difference is, in fact, a desired consequence of coarse-graining the FES. Two other modes (near 80 cm^−1) are displayed by ℜ[Ĉ_vv^GLE(Ω)] and by ℜ[Ĉ_vv^MD(Ω)] as well. At higher frequencies the spectrum of ℜ[Ĉ_vv^GLE(Ω)] is quite smooth and agrees well with the MD results.
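The validation quantities of Fig. 2(a), the velocity NACF and the real part of its Fourier transform, can be estimated with a few lines of numpy. The estimators below follow standard conventions of ours, and the single-mode cosine test signal is invented.

```python
import numpy as np

def velocity_nacf(v, max_lag):
    """Normalized velocity ACF, C_vv(tau) = <v(t+tau) v(t)> / <v(t)^2>."""
    v = v - v.mean()
    n = len(v)
    acf = np.array([np.dot(v[:n - k], v[k:]) / (n - k) for k in range(max_lag)])
    return acf / acf[0]

def nacf_spectrum(nacf, dt):
    """Frequencies and real part of the one-sided Fourier transform of the NACF."""
    return np.fft.rfftfreq(len(nacf), d=dt), np.fft.rfft(nacf).real

# a single oscillatory mode at frequency 5 (inverse time units) shows up as a
# spectral peak at 5 in the NACF spectrum
dt = 0.01
t = np.arange(4000) * dt
v_test = np.cos(2 * np.pi * 5.0 * t)
nacf = velocity_nacf(v_test, max_lag=400)
freqs, spec = nacf_spectrum(nacf, dt)
```

Applied to the MD and AIGLE velocity series, the same two functions produce the curves compared in Fig. 2(a) and its inset, up to the unit conversion to cm^−1.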
The optimized free energy profile as a function of x is shown in Fig. 2(b). Metastability disappears for E greater than E* ≈ 2.8 mV/Å, when the profile becomes monotonic. Hence, E* represents the threshold beyond which the near-equilibrium regime appropriate for AIGLE is no longer valid. Under near-equilibrium conditions, the lifetime of a metastable state should be much longer than the relaxation times of the atomic vibrations. To study this phenomenology, we run AIGLE for a dense grid of electric field values in the interval (1, 5) mV/Å. To visualize the variation of the domain velocity v_D, which spans several orders of magnitude, we display in Fig. 2(c) the natural logarithm of v_D, extracted from AIGLE and MD simulations, respectively, as a function of 1/E. MD data are only available for v_D ≳ 1 m/s due to the time limits of fully atomistic simulations. When AIGLE and MD data are both available, the two approaches agree well for E ≤ E*, i.e., under near-equilibrium conditions. For E > E*, the domain velocity of MD is significantly larger than its AIGLE counterpart. From a coarse-graining point of view, this occurs because the 2FDT, valid near equilibrium, has been imposed far from equilibrium. From a microscopic point of view, the electric dipoles, temporarily associated with the moving domain wall, are unable to dissipate energy before separating from the wall. Far from equilibrium, memory effects are different from those learned for E < E*. Thus, the present AIGLE model should only be used when E ≤ E*, i.e., within the creep regime of the Markovian theory of elastic interface dynamics [59,60]. When v_D ≳ 1 m/s, MD shows linear behavior of ln v_D with 1/E, in agreement with Merz's law: ln v_D = ln v_0 − E_a/E [58]. A best fit of the MD data to this law gives E_a = 28 mV/Å.
AIGLE gives essentially the same result, E_a = 27 mV/Å, for E ∈ [2 mV/Å, E*]. Thus, v_D at low electric fields can be estimated from Merz's law fitted to MD for v_D ≳ 1 m/s, as done, e.g., in Refs. [78,79]. However, when v_D ≪ 1 m/s, direct AIGLE simulations display a gradual deviation from Merz's law, as illustrated in Fig. 2(c). When 1/E > 0.6 Å/mV, AIGLE predicts a v_D higher than Merz's law by orders of magnitude. This behavior is similar to the stretched exponential inferred from the relation ln(v_D/v_0) ∝ (E/E_c)^−µ of the Markovian theory of elastic interfaces [59] when the dynamic exponent µ is less than 1. This theory assumes a Markovian overdamped regime. Yet, the deviation from Merz's law predicted by AIGLE at low fields is markedly more rapid than the stretched exponential of the Markovian theory. This suggests that memory and inertia play an increasingly important role in the regime of very rare domain motions.
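The Merz-law fit underlying Fig. 2(c) amounts to a linear regression of ln v_D on 1/E. A sketch with synthetic data of our own, chosen to echo the fitted E_a ≈ 28 mV/Å:

```python
import numpy as np

def fit_merz(E, v_D):
    """Fit Merz's law v_D = v0 * exp(-E_a / E) by linear regression of
    ln v_D on 1/E; returns (v0, E_a)."""
    slope, intercept = np.polyfit(1.0 / np.asarray(E), np.log(v_D), 1)
    return np.exp(intercept), -slope

# synthetic creep data obeying Merz's law exactly (v0 and E_a are made up)
E_vals = np.linspace(1.5, 3.0, 8)            # fields, mV/A
v_data = 1.0e3 * np.exp(-28.0 / E_vals)      # velocities, m/s
v0_fit, Ea_fit = fit_merz(E_vals, v_data)
```

A deviation from Merz's law of the kind discussed above would appear as curvature in the ln v_D versus 1/E data, i.e., a systematic residual of this linear fit at large 1/E.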
To gauge the implications of non-Markovianity, we approximate AIGLE with AILE. Fig. 2(d) shows that LE predicts for v_D a behavior consistent with Merz's law, which is not surprising because the derivation of Merz's law requires a Markovian approximation. The same figure shows that LE and AIGLE agree well with each other when 1/E is close to 1/E*, a situation in which the external driving force dominates over memory and noise. For larger 1/E, the LE-predicted behavior deviates from a pure exponential in a very minor way, underestimating v_D by orders of magnitude relative to AIGLE at the largest values of 1/E. Within Markovian dynamics the friction is always dissipative, hindering thermally activated motion irrespective of the time scale of the creep events. An even simpler dynamics is postulated in the Markovian theory of elastic interfaces [59, 60], which adopts an overdamped Langevin equation in which both memory and inertia effects are absent. By contrast, within AIGLE, memory results from a convolution of oscillating functions and can occasionally lead to a kinetic energy increase over a short time interval. In combination with inertia, this effect enhances the likelihood of barrier crossing. From the perspective of transition state theory, it can be understood as effectively enhancing the pre-exponential factor in the formula for the rate. Non-Markovian effects that facilitate barrier crossing have also been discussed in other contexts, such as, e.g., in the Grote-Hynes theory of chemical reaction rates [82].
Using the domain velocity v_D(E) calculated with AIGLE, we can estimate the hysteresis loop observed experimentally when the polarization is reversed by a driving field. We report in the SI a hysteresis loop calculation using a very simple model of ferroelectric switching that ignores point defects and the dependence on the curvature of the domain wall. The results are in semi-quantitative agreement with experiments.

B. Coarse-grained lattice dynamics

Here, we use multi-dimensional AIGLE to describe the dynamics of lattice CVs, which are either the local dipole moments {p_j} or a CG model of them. The underlying microscopic model is the all-atom DP model of Sec. III A. For each atomic configuration the local dipoles are provided by a neural network model (see SI).
We run NVT-MD to generate the training data. The lattice parameters are fixed to a = b = 3.93 Å and c ≈ 4.04 Å. For E = 0, we run equilibrium NVT-MD in an 8 nm × 8 nm × 5 nm supercell comprising a single ferroelectric domain. The system is illustrated in Fig. 3(b), where yellow arrows represent the local dipole moments. For E = 0.5 mV/Å and E = 1 mV/Å along +z, we run near-equilibrium NVT-MD in a 20 nm × 20 nm × 5 nm supercell that initially contains two opposite ferroelectric domains, i.e., a nearly cylindrical up(+z)-polarized domain with a radius of about 6 nm, embedded in an environment of opposite polarization, as illustrated in Fig. 3(a). To reduce the energy cost of the cylindrical interface, the up-polarized domain shrinks during the simulation, in spite of the applied field favoring up-polarization, with a longer relaxation time when E is larger. In the MD simulations, we calculate the trajectories of all the local dipoles {p_j(t)}. Taking the local dipoles in the tetragonal PbTiO_3 lattice as CVs (Fig. 3(c, middle)), the degrees of freedom are one-fifth of the atomic coordinates (Fig. 3(c, left)). Further coarse-graining is motivated by the following considerations. The time correlations of the Cartesian components of the local dipole velocities, i.e., ⟨ṗ_jα(t_0 + τ) ṗ_jβ(t_0)⟩ for α, β ∈ {x, y, z}, indicate that the correlations for α ≠ β are negligible compared to those for α = β. Thus, we can reduce the CVs to one-third of their number by retaining only the z-components, {p_jz(t)}, of the local dipoles, which are related to the spontaneous polarization. Nearest-neighbor dipoles are strongly correlated, because the oxygen atoms, whose displacements contribute the most to the polarization, are shared between adjacent cells. As a consequence, further coarse-graining is possible by blocking into a single dipole pairs of nearest-neighbor dipoles of the original simple tetragonal lattice S.
The blocking operation defines two interpenetrating body-centered tetragonal (BCT) lattices S_1 and S_2, obtained from S by bipartition. We assume that our choice of CG dipoles corresponds to S_1, as illustrated in the right panel of Fig. 3(c). If a = a x̂, b = b ŷ, c = c ẑ are the (conventional) unit cell vectors of S, the (conventional) unit cell vectors of S_1 are u = a + b, v = −a + b, and w = 2c. Let L and A be the size and the adjacency matrix, respectively, of S_1. Each site-i of S_1 has 12 neighboring sites (in the sense of graph adjacency on S_1), displaced by (±a ± b), (±a ± c), and (±b ± c), respectively. The corresponding CVs are denoted by p = (p_iz)_{i∈[1,L]}. By construction, the degrees of freedom in p are one-thirtieth of the atomic coordinates, but the polarization along ẑ is left unaffected. Then, we apply a truncated Gaussian filter in time to p(t) to remove high-frequency contributions; the resulting CVs are called {x^(n)}. Here δt = 2 fs, ∆t = 5δt, and ς = 40, as in Sec. III A.
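The time-filtering step can be sketched in a few lines; the exact window used here is specified in the SI, so the normalized Gaussian window truncated at 2ς below is an illustrative assumption (ς = 40 MD steps, subsampling stride ∆t/δt = 5):

```python
import numpy as np

def filter_and_subsample(p, sigma=40, stride=5):
    """Truncated Gaussian filter in time followed by subsampling.

    p: array (T, L), one row per MD step (spacing delta_t) for L CVs.
    sigma: Gaussian width in MD steps; stride: Delta_t / delta_t.
    The truncation at 2*sigma and the exact window shape are
    illustrative assumptions, not the paper's SI definition.
    """
    half = 2 * sigma
    t = np.arange(-half, half + 1)
    w = np.exp(-0.5 * (t / sigma) ** 2)
    w /= w.sum()                    # unit gain: constants pass unchanged
    smooth = np.apply_along_axis(
        lambda s: np.convolve(s, w, mode="same"), 0, p)
    return smooth[::stride]

x = filter_and_subsample(np.ones((500, 3)))   # trivial constant input
```

Away from the trajectory edges, a constant signal is preserved exactly because the window is normalized.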
The polarization of the system is modeled under the local kernel approximation. We use for multi-dimensional AIGLE the same notation adopted in Eq. (7) for one-dimensional AIGLE. In principle, the external field E^(n) can vary in space and time, but we consider here only fields that are time-independent and uniform in space. For the free energy G(x) we assume a simple polynomial form; the couplings neglected in this form have a negligible effect on the ferroelectric transition in PbTiO_3 (see, e.g., Ref. [80] and references therein).
Training proceeds through several steps. We first predetermine G with equilibrium MD data (E = 0) by force matching. Then, we calculate the memory kernel with the same data under the local kernel approximation and train the multi-dimensional GAR model using R^(n) as time series data. For the noise at site-j, the GAR model includes the noise history of site-j and of its neighbors on S_1 displaced by (±a ± c), (±b ± c), or (±2c). In the last step, we retrain G and p with non-equilibrium MD data (E > 0). The details are in the SI. The corresponding AILE model is defined by the Markovian approximation of AIGLE, as in the one-dimensional case.
The relaxation dynamics of the cylindrical domain in Fig. 3(a), under a weak applied field, is illustrated in Fig. 3(d) for E = 0.5 mV/Å and in Fig. 3(e) for E = 1 mV/Å. The noise in the AIGLE and AILE trajectories is at the origin of the observed fluctuations in the domain lifetime. Within the uncertainty of the noise, the AIGLE and AILE lifetimes coincide, suggesting that non-Markovian effects should be negligible. Indeed, domain shrinking is caused primarily by surface tension, which acts to reduce the area of the interface between domains, a systematic effect originating from the gradient of the free energy. The MD lifetime is deterministic and is extracted from a single trajectory. It agrees with AIGLE/AILE within the uncertainty of the noise for E = 1 mV/Å (Fig. 3(e)), but is approximately 10 ps shorter than AIGLE/AILE for E = 0.5 mV/Å (Fig. 3(d)). This discrepancy is likely due to the inaccuracy of the simple polynomial model adopted for the FES. Non-Markovian effects should be more pronounced for larger cylindrical domains, where the surface tension is smaller. Simulation of much larger domains would be feasible with AIGLE and AILE but not with all-atom MD, hampering direct comparison in these settings. The special case of planar domain wall dynamics under an applied field was considered in Sec. III A, where it was found that non-Markovian effects play a role for very weak fields.
Next, we consider a uniformly polarized bulk sample in the absence of an external field (E = 0). Static and dynamic properties of the dipoles are reported in Fig. 3(f)–(k). Memory and noise effects are more pronounced in the equilibrium dynamics of the bulk than in the relaxation dynamics of a cylindrical interface. Indeed, the MD ACF of an individual CG dipole x_i is reproduced accurately by AIGLE but not by AILE (Fig. 3(f)). At the same time, nearly identical results are obtained with AIGLE and AILE for static properties like the probability distribution P(|x_i|) of the local dipole, reported in Fig. 3(g), as expected from the fact that AIGLE and AILE yield the same equilibrium Boltzmann distribution. On the scale of Fig. 3(g), AIGLE and AILE are identical and only AIGLE is reported. The AIGLE distribution overlaps almost perfectly with the MD distribution, barring a minor overall shift, much smaller than the range of the average dipole magnitudes extracted from experiments.
The remaining panels in the figure confirm the importance of non-Markovian effects. Fig. 3(h) shows that the ACF of v_i(t), the time derivative of x_i(t), is reproduced accurately by AIGLE but not by AILE. Also, the cross-correlation function between the time derivatives of neighboring dipoles, shown in Fig. 3(i), is reproduced well, at least up to about 0.5 ps, by AIGLE but not by AILE. These results suggest that the adopted local kernel approximation, which uses an optimized one-body memory kernel and many-body-correlated noise, can capture the short-range correlations among the dipoles that should dominate the fluctuation and dissipation of observables like the spontaneous polarization P. Indeed, the ACF of Ṗ(t), the time derivative of P(t), displayed in Fig. 3(j), shows that AIGLE captures its dominant oscillatory frequency, while AILE misses it completely. However, at larger lag times τ in the interval [0.5, 1.5] ps, AIGLE fails to reproduce the weak out-of-phase oscillations observed in MD. This behavior may originate from anharmonic couplings between vibrational modes that are not captured in the CG model. Neglect of long-range correlations in the noise could be another source of error, as suggested by the observation that AIGLE would overestimate ⟨Ṗ²(t)⟩ by about 30% if the GAR model did not include the history dependence of neighbors separated by (±2c). Thus, including longer-range correlations may improve the accuracy of the model. This may be possible by adopting a more elaborate GAR model for the noise while retaining the simple local kernel approximation of AIGLE. It is also instructive to compute S(Ω), the spectrum of ⟨Ṗ(t + τ)Ṗ(t)⟩, which can be compared with experimental infrared spectroscopy. The spectra from MD, AIGLE, and AILE, given by S(Ω) = Re ∫_0^∞ dτ exp(−iΩτ) ⟨Ṗ(t + τ)Ṗ(t)⟩, are reported in Fig. 3(k), upon Gaussian broadening with a full width at half maximum of 12 cm⁻¹. As expected from the real-time data, the AILE peak in Fig. 3(k) is significantly weaker than the other two, while the AIGLE peak is stronger than that of MD, reflecting a sharper spectral feature. AIGLE reproduces well the peak frequency of MD, while AILE is redshifted by approximately 40 cm⁻¹. The spectral feature in Fig. 3(k) is associated with the zone-center 1A_1 transverse optical phonon, which is both infrared and Raman active. The corresponding feature from Raman scattering experiments lies at ω^EXP_1A1 = 148.5 cm⁻¹ [86], with a full width at half maximum (FWHM) of approximately 30 cm⁻¹, while the MD FWHM is 43 cm⁻¹ and that of AIGLE is 23 cm⁻¹. The red shift of the MD/AIGLE peak at 120 cm⁻¹, relative to experiment, is mainly due to the adopted DFT approximation.
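The spectral estimate described above amounts to a one-sided cosine transform of the sampled ACF followed by Gaussian broadening in frequency. A minimal numpy sketch; the function names and the damped-cosine test signal are illustrative, not the paper's actual data pipeline:

```python
import numpy as np

def spectrum(acf, dt, omegas, fwhm=0.0):
    """S(Omega) = Re int_0^inf dtau e^{-i Omega tau} C(tau), evaluated
    by a simple rectangle-rule sum on the sampled ACF, with optional
    Gaussian broadening of the given FWHM (same units as omegas)."""
    t = np.arange(len(acf)) * dt
    S = np.array([dt * np.sum(acf * np.cos(w * t)) for w in omegas])
    if fwhm > 0.0:
        dw = omegas[1] - omegas[0]
        s = fwhm / (2.0 * np.sqrt(2.0 * np.log(2.0)))   # FWHM -> sigma
        k = np.arange(-int(3 * s / dw) - 1, int(3 * s / dw) + 2) * dw
        g = np.exp(-0.5 * (k / s) ** 2)
        S = np.convolve(S, g / g.sum(), mode="same")
    return S

# Damped-cosine ACF: the spectrum peaks at the oscillation frequency.
tau = np.arange(4000) * 0.01
acf = np.cos(5.0 * tau) * np.exp(-tau / 5.0)
omegas = np.linspace(0.0, 10.0, 201)
S = spectrum(acf, 0.01, omegas, fwhm=0.5)
```

With a real ⟨Ṗ(t+τ)Ṗ(t)⟩ trace the same routine reproduces the broadened-spectrum construction used for Fig. 3(k).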
The above results show that AIGLE with the local kernel approximation can capture to a large extent the dynamic behavior of the CVs predicted by MD for bulk PbTiO_3. At the same time, AILE, while equivalent to AIGLE for static properties, cannot capture dynamical correlations when memory is important.
Finally, a comment on computational efficiency is in order. When modeling the dynamics of a 20 nm × 20 nm × 5 nm supercell on one NVIDIA A100 GPU, MD runs at 0.5 ns/day, AIGLE at 0.4 µs/day, and AILE at 1.7 µs/day. Thus, the speedup over MD is about three orders of magnitude for both AIGLE and AILE. Moreover, AIGLE and AILE use significantly less memory than MD, facilitating simulations of significantly larger supercells.

IV. DISCUSSION
We have introduced a practical scheme to construct coarse-grained GLE models from MD trajectories. Our approach does not rely on the formal projection of MD onto the space of the CVs. As a consequence, the GLE construct is not exact, but should rather be viewed as a physically motivated approximation. While the idea of parameterizing GLE models with data extracted from MD trajectories dates back at least 50 years [28], we exploit here modern techniques, such as machine learning and deep neural network representations, to generate extensive training data sets with MD and to construct the correlated noise model in the GLE. This enables us to construct AIGLE models, consistent with the microscopic dynamics, for one- and multi-dimensional CVs. Multi-dimensional AIGLE is not a trivial extension of its one-dimensional counterpart, and requires a local variational approximation for the memory kernel and a nearsightedness approximation for the correlated noise. The latter could be formulated only for systems in which local CVs reside on sites with a fixed topology described by an adjacency matrix or a graph, such as crystals and individual polymeric molecules. How to extend the approach to more general disordered systems remains an open issue. Here, we considered mesoscale processes in PbTiO_3, a ferroelectric crystal, to illustrate the scheme and test its validity.
When used to study one-dimensional interface dynamics, AIGLE can model rare events on glassy landscapes caused by nucleation and growth at the atomistic level, reproducing the interface evolution driven by a weak applied field at a much lower computational cost than MD. In contrast to MD, AIGLE can access very rare events, revealing that, in the "slow" creep regime, when the time scale of the events is much longer than that of the memory, the scaling law for the domain velocity may deviate significantly from that of the "fast" creep regime, due to non-Markovian effects.
When applied to the dynamics of extensive CVs, AIGLE can model the relaxation of an elastic interface of any shape, a special case of extended defects, while still keeping the bulk dynamics of the CVs consistent with MD. These features distinguish AIGLE from other multiscale models with more drastic levels of coarse-graining, such as, e.g., a Landau-Ginzburg field theory of the extensive CVs in the continuum limit. A field theory model cannot provide atomistic-level resolution of an interface, or correctly describe the vibrational spectrum of a global order parameter like the electric polarization at low but nonzero frequency. In ferroelectric materials, polarization dynamics at low frequency is typically dominated by an optical phonon mode that cannot be reduced to white noise and cannot be modeled by AILE. In this context, AIGLE captures many-body correlations between CV components that are topologically close when distance is measured on a graph. This feature is the key difference between a truly multidimensional GLE and a set of one-dimensional GLEs with independent frictions and noises.
Our study also provides examples where non-Markovian effects are irrelevant. In the CG lattice dynamics of PbTiO_3, AIGLE and AILE give similar results for the motion of an interface dominated by a systematic driving force like the surface tension. In that case, memory effects are negligible, but our study shows that they may become important for glassy dynamics. It would be interesting to investigate the effect of driving fields that vary in space and time. Terahertz control of materials is an area of growing importance due to novel experimental developments [87]. For controlling fields within the frequency range of atomic/molecular vibrations, non-Markovian memory and noise effects could be in resonance with the external controlling field, coupling the latter to collective behavior associated with domain motion and/or phase transitions.
Modeling CG lattice dynamics with AIGLE or AILE brings us to a scale where phenomena are typically treated with continuum models. These phenomena include general domain dynamics and phase separation in the condensed phase, which occur in ferromagnets [88], ferroelectrics [89], and alloys [9, 10]. Other phenomena, important in the fabrication and characterization of nanomaterials, include morphology evolution in epitaxial growth [90] and height fluctuations of two-dimensional membranes [91]. In these contexts, the application of phase field models is very popular, whereby a continuum approximation is imposed a priori and partial differential equations are constructed, guided by symmetry and physical intuition. This approach often captures the correct qualitative physics. However, when defects like impurities, grain boundaries, and domain walls are present, ad hoc continuum approximations fitted to a few experimental observations may be insufficient. When the role of defects is important, lattice models for the local dipole moments, local strains, and spins should be more reliable, as defect dynamics could be incorporated in lattice models by coupling homogeneous CVs on a lattice to a finite number of virtual particles representing mobile defects. Along this line, one may be able to model notoriously difficult processes, such as those leading to the fatigue of ferroelectric devices, in which the dynamics of point defects gradually impacts the dynamics of domain walls over large space and time scales [92].
All the applications discussed in the present work focused on the near-equilibrium regime, where the dynamics is constrained by the 2FDT. However, our methodology could be extended to far-from-equilibrium situations, where a governing principle like the 2FDT does not apply. A regression-based approach like AIGLE can be adapted to deal with these situations, whereas conventional approaches based on ACFs would lose the convenience of direct construction of the memory and noise terms. How to extend AIGLE to far-from-equilibrium phenomena is a direction that we intend to explore in future studies.

Separation of noise
The first step of learning relies on equilibrated MD trajectories κ = {x^(n) | n ∈ [0, N]} with ergodic fast degrees of freedom. v^(n+1/2) and a^(n) are computed from Eq. (3); v^(n) is further determined as the average of v^(n+1/2) and v^(n−1/2). We turn the ensemble-averaged orthogonality condition ⟨R^(n) v^(0)⟩ = 0 into a time-averaged one. To achieve that, we introduce the shifted GLE of Eq. (10), with an arbitrary starting point n_0 ≥ 0; here R̃^(n_0) is a shifted noise and n > n_0 is required. As demonstrated in Ref. [27], the shifted noise obeys the orthogonality condition in the time-averaged sense. For a given CV trajectory, the shifted noise R̃^(n_0)(n) is explicitly computed by inverting Eq. (10); the corresponding time-averaged statistics ζ_k are built with the normalization N_k = N − k. Note that ζ_0 only depends on the force field, while ζ_k for k > 0 also depends on the memory kernel. For numerical stability, it is not recommended to train F by imposing the orthogonality condition ζ_0 = 0 directly. We recommend, instead, training F by minimizing the noise within a maximum-likelihood perspective, and further decoupling the training of F and K for stability and efficiency. To achieve these goals, we first define the constrained optimization problem of Eq. (12). Here, π_κ denotes an ensemble of κ trajectories. The ensemble average E_{κ∼π_κ} is not necessary when κ is ergodic and sufficiently long, but in practice averaging over multiple finite-size trajectories is preferred. m_K is the finite memory cutoff of K, and F_θ and K_θ are the parameters of F and K, respectively. F can be any differentiable parameterized function, including neural networks. Eq. (12) should be transformed into an unconstrained problem in practical applications. Notice that the constraint E_{κ∼π_κ} ζ_k = 0 can be written equivalently as a linear relation, Eq. (13). Considering k ∈ [1, m_K], Eq. (13) can be written in matrix form as Y = CK, where Y and K are vectors of length m_K and C is an m_K × m_K lower triangular matrix; the left-hand side of Eq. (13) is the k-th entry of Y.
The j-th entry of K is the value of the memory kernel at lag j. Hence, the least-square solution of Eq. (13) can be written as K = Inv(C^T C) C^T Y, where Inv is the pseudo-inverse operator, computed from singular-value decomposition with a cutoff ratio to avoid numerical instability.
We are then able to approach the solution of Eq. (12) in practice by interleaving n_GD ≥ 1 unconstrained optimization steps toward the minimization objective with one iteration of the damped kernel update of Eq. (15). The parameter ϵ ∈ (0, 1) should be small enough for stability; in this work we use ϵ = 0.01. The second step in Eq. (15) forces K_{m_K} = 0 over the course of training.
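The two numerical ingredients above, the SVD pseudo-inverse solve of Y = CK and a damped iterative update with ϵ = 0.01, can be sketched as follows. The linear-mixing form of the update is an assumed stand-in for Eq. (15), and the matrix values are illustrative:

```python
import numpy as np

def pinv_svd(M, rcond=1e-8):
    """Pseudo-inverse via SVD, discarding singular values below
    rcond * s_max to avoid numerical instability."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    keep = s > rcond * s[0]
    return (Vt[keep].T / s[keep]) @ U[keep].T

def solve_kernel(C, Y):
    """Least-square solution K = Inv(C^T C) C^T Y of Y = C K."""
    return pinv_svd(C.T @ C) @ C.T @ Y

def damped_update(K, K_ls, eps=0.01):
    """One damped iteration toward the least-square kernel; the
    mixing form is an assumption, with eps = 0.01 as in the text."""
    return (1.0 - eps) * K + eps * K_ls

# Lower-triangular C, as in the text; the solve recovers K exactly.
C = np.tril(np.ones((5, 5))) + np.eye(5)
K_true = np.arange(1.0, 6.0)
K_fit = solve_kernel(C, C @ K_true)
```

A well-conditioned C makes the pseudo-inverse coincide with the exact inverse, so K_fit matches K_true; the SVD cutoff only matters when C^T C is nearly singular.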

Training of the GAR model
In the previous step, the noise R^(n) was separated from the CV trajectory. One can then establish a GAR model with R^(n) as data. The GAR parameters include the linear coefficients ϕ = (ϕ^(1), ..., ϕ^(k)) and the parameters {µ_θ, σ_θ} of the neural network. We define the maximum-likelihood loss function L_GAR of Eq. (17). It is not recommended to minimize L_GAR directly with respect to all the parameters without constraints: overfitting the data should be avoided for the long-term stationarity of the GAR model. This is crucially important for simulating AIGLE at or above the µs scale, much longer than the picosecond/nanosecond duration of the MD trajectories. So, we constrain the GAR model by imposing on ϕ the condition that they satisfy the Yule-Walker equation. For a given CV trajectory, let λ^(k) be the estimator of the noise ACF. The training then interleaves minimization of L_GAR (Eq. (17)) with one iteration of the Yule-Walker update of ϕ. Although in the formal presentation the training of GAR is done after the training of the first step, in practice one can train GAR on the fly to simplify the implementation.
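The Yule-Walker condition on ϕ amounts to solving a Toeplitz linear system built from the noise ACF estimator λ^(k). A numpy sketch, with an illustrative AR(1) demo series:

```python
import numpy as np

def yule_walker(R, order):
    """AR coefficients phi from the Yule-Walker equations, using the
    biased sample ACF lambda(k) of the noise series R."""
    R = np.asarray(R, dtype=float) - np.mean(R)
    N = len(R)
    lam = np.array([R[: N - k] @ R[k:] / N for k in range(order + 1)])
    T = np.array([[lam[abs(i - j)] for j in range(order)]
                  for i in range(order)])          # Toeplitz ACF matrix
    return np.linalg.solve(T, lam[1 : order + 1])

# AR(1) series with phi = 0.5: the estimator recovers phi.
rng = np.random.default_rng(0)
w = rng.standard_normal(200_000)
x = np.empty_like(w)
x[0] = w[0]
for n in range(1, len(w)):
    x[n] = 0.5 * x[n - 1] + w[n]
phi = yule_walker(x, order=1)
```

Pinning ϕ to this solution, rather than letting it float during maximum-likelihood training, is what keeps the autoregressive part stationary over long simulations.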

Incorporation of near-equilibrium data
In this step, we deal with additional datasets that violate detailed balance. We fix the memory kernel and the GAR model obtained at thermal equilibrium, assuming that they are approximately the same in near-equilibrium situations. The optimization task is then simply the minimization of the noise-based objective over the extended dataset. Here F_θ may include the parameters of the external driving forces.

Data and Code availability
The DP model and a minimal implementation of AIGLE are publicly available on GitHub [93].

I. VALIDATION WITH A SOLVABLE MODEL
We study a toy model with a known memory kernel [S1] in the MZ framework under thermal equilibrium conditions. The model consists of a one-dimensional harmonic chain of N + 1 identical particles of mass m. The chain has one free and one fixed end. Let {z_i}_{i∈[0,N]} and {p_i}_{i∈[0,N]} be the coordinates and momenta of the particles in the harmonic chain, with z_0 = 0 and p_0 = 0 at the fixed end. The free end is connected to a Langevin thermostat with damping time t_d = 10, representing a gas/solid interface. The CV x is chosen to be the displacement of the free end of the chain from its equilibrium position, i.e., x = x_N. We use an MD time step δt = 0.1 and obtain 200 trajectories of x, each lasting t_MD = 2000 after initial equilibration. The trajectories are further coarse-grained with a time step ∆t = 4δt by sampling the MD data every 4 MD time steps. The GLE ansatz for this system is one-dimensional. The neural network part of the corresponding GAR model is a feed-forward neural network with two hidden layers (size 10). The neural network only outputs µ^(n) in the GAR model; σ^(n) is parameterized directly by one scalar variable, since its history dependence is unnecessary for this case study. The finite memory cutoff is taken to be m_K = 50 for the memory kernel and m_A = 25 for the GAR model. The training is done with the Adam optimizer implemented in PyTorch [S2] with the default settings and a learning rate of 0.001. Convergence is reached within approximately 3000 iterations. For each iteration, 50 trajectories of the CV are randomly chosen for optimization. The parameters ϵ for the iterative updates of the memory kernel and the Yule-Walker coefficients are both 0.01. The interval n_GD is 10. The optimized ω_0 is 0.09. The optimized standard deviation of the white noise is σ^(n) = 0.721. The colored noise R and the white noise w are both distributed normally on the dataset with negligible
off-centering. Non-stationarity is not detected numerically for the GAR model. The outcome of the training is reported in Fig. S1. The GAR model successfully reduces w to almost white noise with a delta-like normalized autocorrelation function (NACF), as shown in Fig. S1(a); an obvious serial correlation would be present if we used an AR model instead. Fig. S1(b) shows that the optimized K(τ) obeys the 2FDT and agrees well with the theoretical result for an isolated infinite chain, K_MZ(τ) = −J_1(2τ)/τ, where J_1 is a Bessel function of the first kind. The discrepancy near τ = 0 is due to the inclusion of white noise from the Langevin thermostat. Fig. S1(c) compares the velocity autocorrelation from the MD dataset and the AIGLE simulation. They display minor disagreement for τ > 10. This is the error caused by substituting w^(n) with the ideal white-noise generator when propagating AIGLE. The same effect can be seen in Fig. S1(a) in the slight discrepancy between C^data_RR and C^sim_RR. We also compare a simulated AIGLE trajectory and an MD trajectory with the same initial conditions in Fig. S1(d). They display statistically equivalent behavior of stochastic harmonic oscillators.
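The theoretical kernel quoted above is easy to evaluate numerically. A minimal numpy sketch, computing J_1 from its power series (adequate for the moderate arguments relevant here) and handling the τ → 0 limit, where −J_1(2τ)/τ → −1:

```python
import numpy as np

def j1_series(x, terms=30):
    """Bessel function J1 by its power series, accurate for |x| <~ 10.
    term_m = (-1)^m (x/2)^{2m+1} / (m! (m+1)!), built by recurrence."""
    x = np.asarray(x, dtype=float)
    half = 0.5 * x
    term = half.copy()                  # m = 0 term: (x/2)
    s = np.zeros_like(half)
    for m in range(terms):
        s = s + term
        term = term * (-half * half) / ((m + 1) * (m + 2))
    return s

def K_MZ(tau):
    """Exact chain kernel K_MZ(tau) = -J1(2 tau)/tau; limit -1 at 0."""
    tau = np.atleast_1d(np.asarray(tau, dtype=float))
    safe = np.where(tau == 0.0, 1.0, tau)
    out = -j1_series(2.0 * safe) / safe
    out[tau == 0.0] = -1.0
    return out
```

Evaluating K_MZ on a τ grid gives the reference curve against which the trained kernel of Fig. S1(b) is compared; the kernel changes sign near the first zero of J_1.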

II. CONTINUOUS-TIME ON-LATTICE GLE
A. Variational approach for memory kernel

Following the notation used in the main text, the GLE for the on-lattice CVs is given by Eq. (S3). We recall the orthogonality condition ⟨R(t)v^T(0)⟩ = 0, and the 2FDT for the equilibrium ensemble. We start by formulating a variational principle for the most general memory kernel, i.e., a kernel K(s) = {K^jβ_iα(s)} represented by a dense tensor. Using Eq. (S3), the orthogonality tensor defined by Ω^jβ_iα(t) = ⟨R_iα(t) v_jβ(0)⟩ can be written in terms of the velocity correlations and K, where the Einstein convention for repeated indices is used, with Θ the Heaviside step function (Θ(x) = 1 for x > 0, Θ(x) = 0 otherwise). In general, the deterministic force F can be defined by force matching. Assuming that the exact F is known, the diagonal elements of the orthogonality tensor at t = 0 vanish, i.e., Ω^iα_iα(0) = Q^iα_iα(0) = 0. Given a canonical ensemble of trajectories and the corresponding exact F, the orthogonality condition implies that the exact memory kernel K minimizes the canonical orthogonality loss functional L[K]. By exploiting locality in space and finite memory time, we approximate L with L̃, given by Eq. (S7), where t_K is the cutoff time of the memory, d_c is the cutoff length of the correlation in space, and d_ij is the graph geodesic distance between sites i and j, given the adjacency matrix A of the lattice. We require K(s) = 0 for s > t_K.
A bulk crystalline system, under equilibrium conditions, has translational symmetry, and the complexity of evaluating L̃[K] is limited by d_c and t_K, regardless of the size of the lattice. The optimal K = K* for given d_c and t_K is then found by minimizing L̃.
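The graph geodesic distance d_ij that enters the spatial cutoff d_c can be computed by breadth-first search on the adjacency matrix A. A minimal sketch; the 4-site ring lattice is purely illustrative:

```python
import numpy as np
from collections import deque

def geodesic_distances(A, src):
    """Graph geodesic distance d_ij from site `src`, by breadth-first
    search on the lattice adjacency matrix A (unreachable sites: -1)."""
    A = np.asarray(A)
    d = np.full(len(A), -1)
    d[src] = 0
    q = deque([src])
    while q:
        i = q.popleft()
        for j in np.nonzero(A[i])[0]:
            if d[j] < 0:
                d[j] = d[i] + 1
                q.append(j)
    return d

# 4-site ring: sites 1 and 3 are adjacent to site 0, site 2 is at d = 2.
ring = np.array([[0, 1, 0, 1],
                 [1, 0, 1, 0],
                 [0, 1, 0, 1],
                 [1, 0, 1, 0]])
d = geodesic_distances(ring, 0)
```

Pairs with d_ij > d_c are simply dropped from the truncated loss, which is what keeps its cost independent of the lattice size.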

B. Local kernel approximation
The approximation introduced in the previous subsection is still excessively complicated for practical applications. Here, we make a further important simplification, valid when the velocity correlations of the CVs in the extensive coarse-grained lattice dynamics are negligible beyond nearest neighbors. Then 1 < d_c < 2, and the functional in Eq. (S7) simplifies accordingly. We also assume that the mass tensor M^jβ_iα = δ_ij δ_αβ M_α has been determined from the data by applying the equipartition theorem to the kinetic energy of the CVs. The local kernel approximation for K, adopted in the main text, assumes that only the on-site elements K^iβ_iα(s) can be nonzero. Intuitively, one expects that the main contribution to memory should be on-site. In practice, the local kernel approximation is justified when the corresponding coarse-grained dynamics reproduces with sufficient accuracy the dynamics inferred from the underlying microscopic Hamiltonian. We notice that the local kernel approximation would be compatible with a less restrictive choice of the spatial cutoff d_c in Eq. (S7); in this way, the memory kernel would still be on-site but would depend effectively on longer-range correlations between the CVs. Since, by translational symmetry, K^iβ_iα(s) does not depend on the site index i, we drop this index in the following, and use the notation K^β_α(s) for K, and Ω^0_α,β for the on-site orthogonality tensor. For the same reason, Ω^jβ_iα depends only on the relative displacement η_ij connecting sites i and j in the lattice. Let {η_ι | ι = 1, ..., d_nn} be the set of distinct lattice vectors connecting nearest-neighbor sites, and let η_0 be the null vector. Using the label i[ι] to indicate a site displaced from site-i by η_ι, we further simplify the notation for Ω to Ω^ι_α,β. Then, for arbitrary i and j, Eq. (S9) becomes Eq. (S10), where Ω^ι_α,β(t) is given by Eq. (S11). In Eq.
(S11), the correlation functions appearing on the right-hand side are directly calculated from the data. Writing this equation in matrix form (with respect to the indices α and β) leads to Eq. (S12); here all matrices are of size d_s × d_s. Taking the functional derivative of L̃ with respect to the kernel gives Eq. (S13). Putting Eq. (S13) in matrix form and inserting Eq. (S12) leads to the stationarity condition for K; the second equality there holds because, by definition, C^ι(x) = 0 for x < 0. Finally, K* is the least-square solution of δL̃ = 0. In the continuous-time formulation of the GLE it is not evident how one could find the solution K* via a standard least-square algorithm. This will become evident in the next section, where we use the discrete-time formulation of the GLE recommended in practical implementations.

III. DISCRETE-TIME ON-LATTICE AIGLE
A. Formulation

The discrete-time GLE is an approximation to the continuous-time GLE. Here we follow the same leap-frog strategy adopted for univariate AIGLE; the discrete-time on-lattice AIGLE takes the form of Eq. (S16), where ∆t is the time step of integration, n is an integer, and t = n∆t is the current time. We use f^(n) as an abbreviation for any time-dependent function f(n∆t). The integration scheme for the GLE is given by Eq. (S17). We call {n∆t | n ∈ Z} the integer grid and {(n − 1/2)∆t | n ∈ Z} the shifted grid. The trajectory data x^(n) are given on the integer grid. v^(n−1/2) and a^(n) are computed directly from Eq. (S17). v^(n) on the integer grid is subsequently interpolated as the average of v^(n+1/2) and v^(n−1/2). To avoid messy indexing, in the following we make all the indices labelling a local order-parameter component implicit.
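A one-dimensional (d_s = 1) sketch of the leap-frog propagation may help. The discrete memory sum below is an assumed stand-in for the exact scheme of Eq. (S17), with the deterministic force F, kernel K, and pre-generated colored noise supplied by the caller:

```python
import numpy as np

def gle_leapfrog(x0, v0, F, K, noise, dt, mass=1.0):
    """Leap-frog integration of a one-dimensional discrete-time GLE.

    x0: initial x^(0); v0: initial half-grid velocity v^(-1/2);
    F: deterministic force; K: memory kernel samples (length m_K);
    noise: pre-generated colored noise R^(n), one value per step.
    The memory term dt * sum_k K[k] v^(n-1/2-k) is a sketch, not the
    paper's exact Eq. (S17).
    """
    steps = len(noise)
    x = np.empty(steps + 1)
    vh = np.empty(steps + 1)            # vh[n] stores v^(n-1/2)
    x[0], vh[0] = x0, v0
    for n in range(steps):
        mem = dt * sum(K[k] * vh[n - k] for k in range(min(len(K), n + 1)))
        a = (F(x[n]) - mem + noise[n]) / mass
        vh[n + 1] = vh[n] + dt * a      # v^(n+1/2) on the shifted grid
        x[n + 1] = x[n] + dt * vh[n + 1]
    return x, vh

# Memoryless, noiseless harmonic force: a stable oscillation.
x, _ = gle_leapfrog(1.0, 0.0, lambda q: -q,
                    K=np.zeros(1), noise=np.zeros(1500), dt=0.01)
```

With zero kernel and zero noise the scheme reduces to plain leap-frog, so the oscillation amplitude stays bounded, which is a quick sanity check of the indexing.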

B. Memory kernel
The results of Sec. II can be translated straightforwardly into discrete-time notation. Thus, we go directly to the local kernel approximation and show how the optimal memory kernel can be obtained in practice.
Eq. (S10) becomes a discrete sum over time steps. The matrix Ω^ι_(n) = (Ω^ι_α,β,(n))_{1≤α,β≤d_s} is given by the discrete analogue of Eq. (S11), with n ≥ 1. Ω^ι_(0) = Q^ι_(0) is excluded from Eq. (S18) because it does not depend on the memory kernel. Similar to the derivation of Eq. (S15) from Eq. (S10), the optimality condition for the memory kernel is derived from Eq. (S18), giving Eq. (S20). To make implementations with standard linear-algebra routines evident, in Eq. (S20) we further define the d_s × d_s matrices U_l, V_l,k, and W_k. V_l,k and W_k are obtained directly from the data, while U_l is to be found with a tensor least-square routine. For the special case d_s = 1, U_l, V_l,k, and W_k are just scalars, and we define the vector U = (U_0, ..., U_{m_K−1}), the matrix V = (V_l,k)_{0≤l,k≤m_K−1}, and the vector W = (W_0, ..., W_{m_K−1}). Note that V is positive semidefinite. This is the simplest case of the local kernel approximation, for which U = V⁻¹W.

C. Noise generator
Under the local kernel approximation, the memory kernel is one-body, but the time evolution of R cannot be reduced to a one-body equation of motion. Then, the univariate GAR model should be generalized to the multi-dimensional case. Let R_i,(n) = (R_i1,(n), ..., R_id_s,(n)) be the d_s-dimensional noise acting on site-i at time step n. Let n_A be any positive integer, and let (ϕ_α,k)_{1≤α≤d_s, 1≤k≤n_A} be a d_s × n_A real matrix. Assuming translational symmetry, the multi-dimensional GAR model predicts R_i,(n) from the history of the noise through a linear autoregressive term, one-body neighbor terms, and white noise. Here, ϕ_α,k are the Yule-Walker linear autoregressive parameters [S3] for R_iα,(n) as a time series with respect to n. σ_iα = σ_α is a scalar parameter. In principle, σ_iα could be a function of the history of R_iα,(n), but in all the cases we studied we found that a scalar σ_iα was sufficient for accurate autoregression, which is the choice that we adopt in the following. w_iα,(n) is a standard Gaussian white noise, independently sampled for each i, α, and n. N(i) is the neighborhood of site-i. Typically, we let N(i) include site-i and all sites-j satisfying A_ij = 1. It is also feasible to include second-nearest neighbors beyond A_ij = 1 in N(i). µ^j_iα is a one-body function taking as input the history of R_jα,(n) as a time series with respect to n. In ab initio GLE, we use a collection of small feed-forward neural networks to represent (µ^j_iα)_{1≤α≤d_s}. The size of the collection can be reduced when system symmetry is taken into consideration: for a translationally invariant lattice, (µ^j_iα) is determined by the separation between site-i and site-j, so the complexity of the GAR model does not grow with the lattice size.
Training of the multi-dimensional GAR model is similar to that of the univariate GAR model. With a given K*, we extract the time series R_(n) from the data. Using {R_(n)} as data, we first determine the Yule-Walker solution for ϕ_{α,k} with a least-squares routine. Then, we train the parameters in (µ^j_{iα})_{1≤α≤d_s} and (σ_α)_{1≤α≤d_s} using the maximum-likelihood loss function. After training, one can run the GAR model independently and calculate the noise ACF Υ_(n); the subscript "GAR" indicates that the ensemble average is calculated over trajectories generated by the GAR simulation. The ACF of the noise can then be compared with ⟨v_(0) v^T_(0)⟩ K*^T_(n) to verify that the 2FDT is satisfied. For that purpose, the equal-time velocity correlation matrix G = ⟨v_(0) v^T_(0)⟩ is extracted from the data. A small deviation of Υ_(n) from G K*^T_(n) should be expected, reflecting a slight violation of the 2FDT derived from the data. This is an inevitable consequence of the approximation made for the memory kernel, i.e., the local kernel approximation. Thus, the equal-time velocity correlation matrix from the AIGLE simulation will deviate slightly from the corresponding matrix computed from the MD data. One may try to reduce this error by modifying the memory kernel to enforce the 2FDT directly. We call K̃* the modified memory kernel that minimizes the corresponding target function, where we follow the conventions of subsection III A, i.e., G^ι_{α,γ} = 1; K̃* can be found straightforwardly with a least-squares scheme. K̃* is different from K*, defined in subsection B. In practice, the difference between the two approximate kernels is mostly cosmetic: while K* is more faithful to the data in terms of the time-series regression, K̃* is more faithful to the data in terms of the equal-time velocity correlation matrix. The two approaches produce minor quantitative differences but are equivalent in terms of physical accuracy.
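The 2FDT check above amounts to comparing the empirical ACF of the generated noise with a target function. A toy sketch using an AR(1) surrogate whose stationary ACF is known in closed form (standing in for the kernel prediction G K_(n); the process parameters are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy surrogate for a GAR noise trajectory: an AR(1) process whose exact
# stationary ACF is sigma^2 / (1 - phi^2) * phi^n.
phi, sigma, n_steps = 0.8, 1.0, 200_000
R = np.zeros(n_steps)
for n in range(1, n_steps):
    R[n] = phi * R[n - 1] + sigma * rng.standard_normal()

def acf(x, lags):
    """Empirical autocorrelation function at the given integer lags."""
    x = x - x.mean()
    return np.array([np.mean(x[: len(x) - l] * x[l:]) for l in lags])

lags = np.arange(6)
upsilon = acf(R, lags)                        # empirical noise ACF
target = sigma**2 / (1 - phi**2) * phi**lags  # exact stationary ACF

# As in the text, a small deviation from the target is expected; here we
# only quantify the rough level of agreement.
max_rel_err = np.max(np.abs(upsilon - target) / target)
```

For the real model the target would be G K*^T_(n) estimated from the data, and the residual `max_rel_err` plays the role of the small 2FDT violation discussed above.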

A. DFT-based atomistic models
Our atomistic model for PbTiO3 is based on the methodology developed in Refs. [S4-S7]. The DP model representing the potential energy surface is trained on DFT data obtained with the SCAN meta-GGA functional [S8]. The dataset is collected through the active-learning procedure implemented in the DPGEN code [S9, S10]. The data are labeled by Quantum ESPRESSO [S11] with norm-conserving pseudopotentials [S12] including semi-core states. An early version of the current DP model was introduced in Ref. [S7]. It was trained on configurations without domain walls and used to study the ferroelectric phase transition in bulk PbTiO3 with atomistic simulations. That model predicted thermodynamic and ferroelectric properties of PbTiO3 in close agreement with experimental results and provided unique insight into the microscopic mechanism driving the phase transition. To train the current DP model we follow the same numerical protocol adopted in Ref. [S7], where the full technical details can be found. The final dataset for the current DP model contains configurations without domain walls as well as configurations with two twin 90° domain walls or two twin 180° domain walls. 1276 configurations with no domain walls were collected with the active-learning procedure of Ref. [S7] in the temperature interval [300 K, 1200 K] and the pressure interval [0, 10^5 Pa]. 494 configurations with domain walls were collected with the same active-learning procedure in the temperature interval [100 K, 600 K] and the pressure interval [0, 10^5 Pa]. The largest supercell with domain-wall configurations contains 54 elementary cells, corresponding to 270 atoms. We use a spatial cutoff radius of 8 Å for the DP model. This is larger than in typical DP models, but it is necessary to capture long-range interactions in the presence of two domain walls with periodic boundary conditions.
Fig. S2 plots the nearly Gaussian error distribution of the DP model on the training set. The root-mean-square error (RMSE) of the energy is 1 meV/atom for configurations without domain walls and 1.4 meV/atom for the remaining ones. The RMSE of the Cartesian components of the forces is around 0.1 eV/Å in all cases. The force error is slightly larger along the direction orthogonal to the domain walls, which is the x axis for our data with 180° domain walls. The DP model is also validated on a small independent testing set, showing accuracy similar to that on the training set. Next, we calculate the Berry-phase polarization change, relative to the centrosymmetric structure, for all configurations without domain walls. The calculation is done with the Wannier90 code [S13]. The total polarization is the sum of electronic (Berry-phase) and ionic contributions [S14]. The dipole D of the simulation cell (cell dipole) is obtained by multiplying the total polarization by the volume of the cell. The Born charge (BC) associated with atom i is the tensor Z_{i,αβ} = ∂D_α/∂x_{i,β}, defined as the change of D in direction α caused by the displacement of atom i away from its equilibrium position in direction β. We adopt a linear approximation for the dependence of D_α on the atomic coordinates, D_α = Σ_i Σ_{β∈{x,y,z}} Z_{i,αβ} x_{i,β}, which, in the manuscript, is referred to as the BC model. Due to space-group symmetry or negligible contributions, some degrees of freedom of the BC model can be eliminated. For PbTiO3, the BCs of the Pb and Ti atoms are conventionally approximated [S15] by diagonal matrices, Z_{Pb,αβ} = Z_Pb δ_αβ and Z_{Ti,αβ} = Z_Ti δ_αβ, respectively. The BC of the O atom is also given a diagonal approximation, with value Z_O1 along the Ti-O bond direction and Z_O2 along the two orthogonal directions. In the present study, the BC model is fitted to the cell-dipole data with a mean-squared-error loss, leading to Z_Pb = 3.7140e, Z_Ti = 5.4897e, Z_O1 = −3.3551e, and Z_O2 = −2.9234e. The
RMSE of the cell dipole predicted by the resulting BC model is 2 e Å for a 3 × 3 × 3 supercell. This corresponds to roughly a 2 µC/cm² RMSE in polarization, much smaller than the 72 µC/cm² total polarization of PbTiO3 extracted from our room-temperature simulations. The BC model constructed in this way is statistically more accurate at finite temperature than models using conventional Born effective charges calculated by perturbing the ground-state equilibrium structure [S16]. Using the BC model so defined, the local electric dipole p_j associated with each Ti-centered elementary cell j of PbTiO3 is given by a weighted sum of the positions of the atoms in cell j [S17, S18]. The local dipole defined via the BC model is an approximation of the local dipole introduced in Ref. [S7] via maximally localized Wannier functions. The latter includes the full (nonlinear) environmental dependence of the local dipoles, which is computationally far more expensive than the simple BC model; in the current study we therefore use the BC model to accelerate the simulation of domain-wall dynamics. We also remark that the linear approximation is sufficiently accurate for MD simulations at room temperature. In the temporally coarse-grained variable x, the fast vibrational dynamics is strongly suppressed: the mode near 280 cm−1 is not completely eliminated, but it spreads into a continuous spectrum between 100 cm−1 and 400 cm−1, whose magnitude is negligible compared with that of the slow modes between 0 cm−1 and 100 cm−1. The integration time step of AIGLE with respect to x is therefore mainly limited by the mode peaked near 90 cm−1.
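The linear BC model can be sketched directly from the fitted charges quoted above. The geometry and displacements below are toy inputs, not a real PbTiO3 configuration:

```python
import numpy as np

# Fitted effective Born charges (in units of e) from the text.
Z_PB, Z_TI, Z_O1, Z_O2 = 3.7140, 5.4897, -3.3551, -2.9234

def cell_dipole(displacements, species, ti_o_axes):
    """Linear BC model: D_alpha = sum_i Z_{i,alpha,beta} x_{i,beta}.

    displacements : (N, 3) displacements from the centrosymmetric positions (Å)
    species       : list of 'Pb' | 'Ti' | 'O'
    ti_o_axes     : Ti-O bond axis index (0, 1, or 2) per atom; only used for O
    """
    D = np.zeros(3)
    for x, s, ax in zip(displacements, species, ti_o_axes):
        if s == 'Pb':
            D += Z_PB * x
        elif s == 'Ti':
            D += Z_TI * x
        else:  # O: Z_O1 along the Ti-O bond, Z_O2 in the orthogonal plane
            Z = np.full(3, Z_O2)
            Z[ax] = Z_O1
            D += Z * x
    return D  # in units of e*Å

# Toy example: Ti displaced 0.1 Å along z; one O (bond axis z) displaced -0.05 Å.
disp = np.array([[0.0, 0.0, 0.10], [0.0, 0.0, -0.05]])
D = cell_dipole(disp, ['Ti', 'O'], [2, 2])  # D_z = 5.4897*0.1 + 3.3551*0.05
```

Because the model is linear, evaluating it during MD costs only one matrix-vector product per cell, which is what makes it so much cheaper than the Wannier-based local dipole.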
For a more direct view of the temporal coarse-graining, Fig. S3(b) shows α(t) and x(t) for the same segment of an MD trajectory (with no domain-motion event) simulated under zero external electric field. The fast, small-amplitude oscillation of α is clearly suppressed in x, while the slow, large-amplitude oscillation, roughly between 0.02 Å and −0.02 Å, is preserved.
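The effect can be illustrated with a toy signal. The moving-average window below is an illustrative choice; the actual filter defining x(t) from α(t) is not specified here:

```python
import numpy as np

# Toy stand-in for the microscopic CV alpha(t): a slow ~0.02 Å oscillation
# plus a fast, low-amplitude mode.
dt = 0.001                                    # ps
t = np.arange(0.0, 4.0, dt)
slow = 0.02 * np.sin(2 * np.pi * 0.5 * t)     # slow mode, should survive
fast = 0.005 * np.sin(2 * np.pi * 10.0 * t)   # fast mode, should be suppressed
alpha = slow + fast

# Hypothetical temporal coarse-graining: a moving average over 0.1 ps,
# i.e., exactly one period of the fast mode.
window = 100
kernel = np.ones(window) / window
x = np.convolve(alpha, kernel, mode="same")

# Residual fast content of x versus the fast content of alpha.
fast_in_alpha = np.std(fast)
fast_in_x = np.std(x - np.convolve(slow, kernel, mode="same"))
```

Averaging over a full period of the fast mode removes it almost entirely while leaving the slow oscillation intact, mirroring the behavior seen in Fig. S3(b).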

D. Training of AIGLE

In the first two training steps of the AIGLE model, the dataset consists of several 400 ps-long trajectories of x under E = 2 mV/Å. The underlying Hamiltonian dynamics is at metastable equilibrium: the twin domain walls are trapped by the periodic potential around a local minimum within the simulated time scale. Under such circumstances, the potential energy surface of the CV is not fully explored, but the fast atomistic degrees of freedom are sufficiently ergodic to determine K and the GAR model. The GAR model contains a feed-forward neural network with two hidden layers of size 10. σ_(n) is parameterized by a single scalar variable. We use the Adam optimizer with an initial learning rate of 0.01 and an exponential decay rate of 0.9 every 500 steps. We set n_GD = 10 and ϵ = 0.01. Convergence is reached within 5000 iterations, with the entire dataset used at every iteration. After training, the colored noise R and the white noise w are both normally distributed on the dataset, with negligible off-centering. No non-stationarity is detected numerically for the GAR model.
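The maximum-likelihood objective behind this training step is the Gaussian negative log-likelihood of the residuals. A minimal sketch (here µ and σ are fixed toy inputs rather than network outputs):

```python
import numpy as np

rng = np.random.default_rng(5)

# The noise R_n is modeled as mu_n + sigma * w_n with w_n standard normal,
# so the per-step negative log-likelihood, up to an additive constant, is
# (R - mu)^2 / (2 sigma^2) + log(sigma).
def gar_nll(R, mu, sigma):
    return np.mean((R - mu) ** 2 / (2.0 * sigma**2) + np.log(sigma))

# Toy residuals with unit variance and zero predicted mean.
R = rng.standard_normal(1000)
mu = np.zeros_like(R)

# The NLL is minimized near the true sigma of the residuals.
losses = {s: gar_nll(R, mu, s) for s in (0.5, 1.0, 2.0)}
```

In the actual model, `mu` would be the sum of the Yule-Walker and neighbor-network terms, and the Adam steps described above would move the network parameters and σ toward the minimum of this loss.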
In the out-of-equilibrium regime, we fix the memory kernel and the GAR model and retrain the force field F(x) = −∂_x U(x) + pE with MD trajectories (about 1.2 ns long in total) simulated independently under different fields E ∈ [2.0, 2.4] mV/Å. In this dataset, MD can be out of equilibrium, creeping down the sequence of metastable states. Although the entire landscape is explored, the creeping events are too sparse in the dataset to fit the barrier height U_b accurately. We therefore predetermine it using metadynamics, a method for computing free energy differences [S27], under E = 0. The resulting free energy barrier ΔE in energy units is converted to U_b = ΔE a²m, where a = 3.91 Å is the lattice constant. All the other parameters in F(x) are then trained directly on the MD dataset, using the Adam optimizer with the same settings. Convergence is reached within 5000 iterations, with the entire dataset used at each iteration.
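A tilted periodic force field of this kind can be sketched as below. The cosine form of U(x) and all parameter values are illustrative assumptions, not the parameterization (k, ω, x_0, p) used in the text:

```python
import numpy as np

# Illustrative tilted washboard model for the domain-wall CV x.
a = 3.91      # lattice constant (Å), the period of U(x)
U_b = 1.0     # barrier height (arbitrary units, assumed)
p = 0.5       # response coefficient (assumed)

def force(x, E):
    """F(x) = -dU/dx + p*E for the assumed U(x) = (U_b/2)*(1 - cos(2*pi*x/a))."""
    dU_dx = (U_b * np.pi / a) * np.sin(2.0 * np.pi * x / a)
    return -dU_dx + p * E

# At a potential minimum (x = 0) only the field drive remains.
f_min = force(0.0, E=2.0)          # = p * E
# On the rising side of the barrier the periodic force opposes the motion.
f_slope = force(a / 4.0, E=0.0)    # negative
```

For fields small enough that p E stays below the maximum restoring force, the walls remain trapped in a minimum, which is the metastable-equilibrium regime described above.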
The production AIGLE model is trained through the steps described above. Next, we examine its agreement with the statistics of the MD data. We first extract the reference noise from the MD data with our AIGLE model; we then use the GAR model to generate a simulated noise time series for comparison. In Fig. S4(a), we report the normalized autocorrelation function (NACF) of the reference noise extracted from the MD data at metastable equilibrium, defined as C^MD_RR(τ) = ⟨R(t0 + τ)R(t0)⟩/⟨R(t0)²⟩, and the NACF of the corresponding reference residual noise, C^MD_ww(τ) = ⟨w(t0 + τ)w(t0)⟩/⟨w(t0)²⟩, also extracted from the MD data. The cutoff m_A is evidently large enough, since C^MD_RR(τ > m_A Δt) ≪ 1. Moreover, C^MD_ww(τ) is delta-like, except for a very small residual correlation at τ > m_A Δt. Additionally, both R and w display zero-centered normal distributions on the dataset, indicating that the GAR model successfully reduces the correlated noise of the MD trajectories to Gaussian white noise. The NACF of the simulated noise generated by the GAR model, denoted C^GLE_RR(τ), is also plotted in Fig. S4(a). C^GLE_RR(τ) accurately captures the dominant oscillatory feature of C^MD_RR(τ), except for a small discrepancy at τ > m_A Δt caused by the residual correlation not eliminated by GAR. To display this effect in another way, we compare the Fourier transforms of C^MD_RR(τ) and C^GLE_RR(τ), denoted ĈMD_RR(Ω) and ĈGLE_RR(Ω), respectively. The corresponding real parts, ℜ(ĈMD_RR(Ω)) and ℜ(ĈGLE_RR(Ω)), shown in the inset of Fig. S4(a), display overdamped modes within [0, 400] cm−1. As expected, the noise captures mainly broad overdamped modes, due to the interactions of the CV with a bath of fast variables. The main difference between ℜ(ĈMD_RR(Ω)) and ℜ(ĈGLE_RR(Ω)) occurs at frequencies close to zero, associated with long-time correlations of the noise that cannot be captured accurately by a GAR model with finite memory. Given that long-time correlations appear to be weak and fluctuating in C^MD_RR(τ), the error made by neglecting them in the GAR model should have a negligible overall impact. Finally, in Fig. S4(a), the optimized memory kernel K is rescaled as K̄(τ) = K(τ)/K(Δt/2) to facilitate a direct comparison with C^MD_RR(τ). The excellent agreement observed in the figure indicates that the 2FDT is well satisfied by AIGLE at metastable equilibrium.
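The NACF and the real part of its Fourier transform can be computed as follows; the correlated series is a synthetic stand-in for the extracted noise R(t):

```python
import numpy as np

rng = np.random.default_rng(4)

# Toy noise trajectory standing in for R(t) extracted from MD.
n = 1 << 14
R = rng.standard_normal(n)
for i in range(1, n):        # impose a mild exponential correlation
    R[i] += 0.5 * R[i - 1]

def nacf(x, max_lag):
    """Normalized ACF: C(0) = 1 by construction."""
    x = x - x.mean()
    c0 = np.mean(x * x)
    return np.array([np.mean(x[: len(x) - l] * x[l:]) / c0
                     for l in range(max_lag)])

C = nacf(R, 256)
spectrum = np.fft.rfft(C)    # one-sided transform of the lag series
re_spec = spectrum.real      # the quantity compared in the inset of Fig. S4(a)
```

Comparing `re_spec` for the MD-extracted and GAR-generated noise is exactly the ℜ(Ĉ_RR(Ω)) comparison discussed above; the low-frequency bins are the ones most affected by neglected long-time correlations.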
The production AIGLE model can efficiently simulate the dynamics of the CV under a finite electric field E. Segments of CV trajectories simulated by AIGLE are plotted in Fig. S4(b), where one can see that the time scale of domain-displacement events increases from picoseconds to nanoseconds upon a modest decrease of E.

E. Ferroelectric switching
The switching process of a ferroelectric thin film typically involves three stages. The first is the nucleation of opposite domains at particular nucleation sites. The second is the forward growth of the nucleus across the film. The last is the sideways growth (widening) of the domain. For perovskite oxides, the first two stages are usually much faster than the last one [S29]. Hence, as a rough approximation, one can consider only the sideways growth of a cylindrical domain when computing the hysteresis loop of a PbTiO3 thin film. Moreover, when the radius of the switched area is much larger than a few nanometers, the curvature dependence of the domain-wall velocity should be negligible.
Here, we consider a thin PbTiO3 film of radius l ∈ [1, 1000] µm, initially polarized downward, as shown in Fig. S5(a). An upward square electric-field pulse of magnitude E and duration t_p is applied to the film, causing the growth of a central cylindrical domain with upward polarization, separated by an interface from the surrounding medium with downward polarization. Alternatively, one could consider the growing domain to be cuboidal instead of cylindrical; in our atomistic simulations, we find that a cuboidal domain with rounded corners is structurally more stable than its cylindrical counterpart while growing (though not while shrinking). In that case, approximating v_D by the value computed for an infinite planar domain wall is, if anything, even more appropriate.
Because l is large, the domain-wall velocity can be well approximated by the v_D computed for an infinitely large flat domain wall. After the application of the square pulse, the radius of the upward domain is therefore r↑ = min(v_D(E) t_p, l). The cross-sectional area is A↑ = πr↑² for the upward domain and A↓ = πl² − πr↑² for the downward domain. The lower branch of the hysteresis loop P(E) can then be approximated by the area-weighted average polarization of the two domains, where the remnant polarization P_0 = 72 µC/cm² and the susceptibility χ = 0.643 nC/mV are extracted from the MD simulations. The upper branch of the hysteresis loop is computed in a similar way. With this toy model of ferroelectric switching, the hysteresis loops for three typical film radii l are reported in Fig. S5(b), where we also display the hysteresis loops observed in two experiments [S22, S28]. The vertical span of the loops is determined by the remnant polarization P_0, which is 72 µC/cm² in our atomistic model and 94 µC/cm² and 96 µC/cm², respectively, in the two experiments. The loop width depends on the radius l of the film, on the velocity v_D(E) of the interface, and on the coercive field associated with the polarization switch. Quite remarkably, simulations without any empirical input closely reproduce the widths of the loops observed in the experiments. The main difference between theory and experiment is that the polarization switch is much sharper in the former than in the latter. Interestingly, had we used Merz's law instead of AIGLE to compute v_D(E), the theoretical loops would have been even sharper, albeit marginally so; the effect is small because, near the coercive field, the difference between AIGLE and Merz's law is minor. The discrepancy between theory and experiment in the rate at which the polarization switches with E suggests that effects beyond our simple model play a role. Realistic models should describe polarization pinning by point defects [S30], the
finite thickness of the experimental samples [S25, S31], edge effects, the curvature dependence of the domain wall velocity, the morphology of the samples where several polarization domains may be present and eventually coalesce, etc.
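The toy switching model above can be sketched as follows. Merz's law with the activation field E_a = 28 mV/Å (the MD fit quoted in the Fig. 2 caption) stands in for v_D(E); the speed prefactor, the film radius, and the area-weighted polarization formula (which neglects the linear susceptibility term for simplicity) are illustrative assumptions:

```python
import numpy as np

P0 = 72.0       # remnant polarization, µC/cm^2 (from the text)
Ea = 28.0       # Merz activation field, mV/Å (MD fit, Fig. 2)
v_inf = 1e9     # assumed wall-speed prefactor, µm/s
t_p = 1e-4      # pulse duration 100 µs
l = 100.0       # assumed film radius, µm

def lower_branch(E):
    """Average polarization after an upward pulse of magnitude E (mV/Å)."""
    if E <= 0.0:
        return -P0
    v_D = v_inf * np.exp(-Ea / E)              # Merz's law for the wall speed
    r_up = min(v_D * t_p, l)                   # switched radius, capped at l
    A_up = np.pi * r_up**2
    A_dn = np.pi * l**2 - A_up
    return P0 * (A_up - A_dn) / (np.pi * l**2)  # area-weighted average

P_weak = lower_branch(0.5)    # weak field: essentially unswitched
P_strong = lower_branch(10.0) # strong field: fully switched
```

The exponential sensitivity of v_D to E is what makes the computed loops so sharp: a small change in E moves the switched radius from negligible to the full film size.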

V. DETAILS OF MODELING COARSE-GRAINED LATTICE DYNAMICS
A. DFT-based atomistic models and MD simulations

For this part of the study, we use the same DP model as the one introduced in Sec. IV A. For computing the local dipole moments used as CVs, however, we use the Deep Dipole model instead of the effective Born charge model; the latter can be understood as a linear approximation to the former. The Deep Dipole model used here is exactly the one introduced and analyzed in detail in Ref. [S7], built from maximally localized Wannier functions.
We carry out two types of MD simulations. The first is an E = 0 NVT-MD simulation of an 8 nm × 8 nm × 5 nm (20 × 20 × 12 elementary cells) supercell within a single ferroelectric domain, from which we collect 30 ps-long trajectories of all local dipole moments as training data. The second type is an electric-field-driven NVT-MD simulation of a 20 nm × 20 nm × 5 nm (50 × 50 × 12 elementary cells) supercell initialized with two opposite ferroelectric domains. For E = 0.5 mV/Å and E = 1 mV/Å, we collect 20 ps-long trajectories of all local dipole moments as training data. In addition, we collect 200 ps-long trajectories of local dipole moments as validation data, used in Fig. 3(d,e) of the main text for a direct comparison with AIGLE.

B. Training of AIGLE
We first determine the mass M of the local dipole moment from the trajectory data {x_(n)} with the equipartition theorem. We train the parameters of the free energy G through a simple force-matching scheme, minimizing the mean-squared-error loss L_MSE = Σ_n ‖M a_(n) + ∇_x G(x_(n)) − pM E_(n)‖² over the trajectory data collected under zero external field (E = 0). Then, we calculate the memory kernel K* through Eq. (S20), followed by the calculation of the Yule-Walker coefficients in Eq. (S21). Note that here we do not use any iterative scheme, since the free energy is predetermined. Next, with the free energy model and the memory kernel, we extract the noise from the data through R_(n) = M a_(n) + ∇_x G(x_(n)) − pM E_(n) + Σ_l K*_(l+1/2) v_(n−l−1/2) Δt. We train the GAR model, with R_(n) as the time-series data and the Yule-Walker coefficients frozen, through the maximum-likelihood loss of Eq. (S24). After the training of the GAR model, we reinforce the 2FDT by modifying the memory kernel from K* to K̃*, as given by Eq. (S25). In the end, we retrain the parameters of the free energy and the response coefficient p by minimizing a loss weighted by Θ(|x_i,(n)| − x_0), where ΔF_(n) = M a_(n) + ∇_x G(x_(n)) − pM E_(n), over the trajectory data collected under E = 0, E = 0.5 mV/Å, and E = 1 mV/Å. Θ(|x_i,(n)| − x_0) is the Heaviside step function, which vanishes if the magnitude of the local dipole moment x_i,(n) is smaller than a threshold value x_0 = 2 e Å that distinguishes a local dipole at the interface of opposite domains from the local dipoles in the bulk. The goal is to improve the accuracy of the force field at the interface, since such configurations are not included in the first force-matching step. After all these steps, we obtain the AIGLE model yielding the results displayed in the main text.
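The force-matching step minimizes ‖M a_(n) + ∇_x G(x_(n)) − pM E_(n)‖² over the data. A toy sketch with a quadratic stand-in for G (the actual G is a lattice model with couplings b_ij), in which the fit reduces to linear least squares:

```python
import numpy as np

rng = np.random.default_rng(6)

# Synthetic data consistent with G(x) = 0.5 * k_true * x^2 and response p_true.
M, k_true, p_true = 2.0, 1.5, 0.8
n = 500
x = rng.standard_normal(n)
E = rng.standard_normal(n)
a = (-k_true * x + p_true * M * E) / M   # accelerations implied by the model

# Force matching: M*a = -k*x + p*M*E is linear in the unknowns (k, p),
# so minimizing the MSE loss is an ordinary least-squares problem.
A = np.column_stack([-x, M * E])
coef, *_ = np.linalg.lstsq(A, M * a, rcond=None)
k_fit, p_fit = coef
```

With the real lattice free energy the loss is no longer linear in all parameters, and the minimization is done with a gradient-based optimizer instead, but the residual being driven to zero is the same ΔF_(n).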

FIG. 1 .
FIG. 1. Upper panel: lateral view of the 180° domain wall in PbTiO3. The bonds between titanium atoms and the nearest oxygen atoms are shown to visualize the domain separation. The yellow arrows represent the local dipoles, which are weaker near the domain wall. Lower left panel: a 20 × 40 × 20 supercell of PbTiO3 with parallel twin domain walls on the xz plane. The x and y dimensions are fixed to match the experimental lattice constant (see text), while the z dimension fluctuates under constant pressure P_z = 28 kbar (see text) at temperature T = 300 K. Lower right panel: the elementary cell of PbTiO3.

FIG. 2 .
FIG. 2. (a) The velocity NACF C^MD_vv(τ) from MD (blue) and the velocity NACF C^GLE_vv(τ) from AIGLE (orange), for τ < 2 ps. For τ > 2 ps the correlations are confined to the interval [−0.1, 0.1] and decay rapidly to zero. The inset reports ℜ(ĈMD_vv(Ω)) (blue) and ℜ(ĈGLE_vv(Ω)) (orange) in arbitrary units. The first peak of ℜ(ĈGLE_vv(Ω)) is located near 30 cm−1. (b) Free energy profiles along the CV x in the presence of driving fields of various strengths. Two periods of the parameterized U(x) (see text) are shown; U(x) is represented by a dashed blue line. Metadynamics results are also reported, as blue crosses, for comparison. The inset shows a magnified plot of the free energy basin, including U(x) and the metadynamics data. (c) The natural logarithm of the domain-wall velocity v_D plotted against 1/E. The vertical dashed line indicates E = E*. Each GLE data point (orange dot) is the average of five AIGLE simulations lasting 0.5 µs each; the corresponding error bars are smaller than the size of the dot. Each MD data point (blue diamond) is the average of one hundred MD trajectories lasting 0.1 ns each; the error bars are smaller than the size of the diamond. The dashed blue line is the best fit of Merz's law to the MD data (E_a = 28 mV/Å). (d) A comparison of the AIGLE and LE predictions for ln v_D vs 1/E. Each LE data point (green triangle) is the average of five LE trajectories lasting 0.5 µs each; the error bars are smaller than the size of the triangle.
suggested by effective Hamiltonian models [55]. By symmetry, b_ij = b_3 if the sites i and j are separated by (±a ± b), and we set b_ij = b_4 otherwise. The b coefficients are assumed to be independent of E, as appropriate in the linear-response regime; hence, we limit the simulations to E ≤ 1 mV/Å. Our model for G is short-ranged but captures well the dipole-dipole interactions of the DP model within the cutoff radius of the latter. Long-range electrostatic interactions among the dipoles

FIG. 3 .
FIG. 3. (a) A (nonequilibrium) configuration of PbTiO3 showing a cylindrical up-polarized domain in an environment of opposite polarization. Spheres are Ti atoms; Pb and O atoms are not shown. The arrows depict the local dipole moments assigned to Ti-centered elementary cells. (b) A configuration within the up-polarized domain. (c) Coarse-graining procedure. (d,e) Relaxation dynamics of the local dipoles in an atomic layer perpendicular to the direction ẑ of spontaneous polarization under electric fields E = 0.5 mV/Å (d) and E = 1 mV/Å (e). The three horizontal sequences of panels in (d) and (e) depict the evolution of the yellow domain from MD, AIGLE, and AILE. The pixels correspond to the S sites. In the MD panels, they give the magnitude and sign of the microscopic dipoles {p_jz}, as per the scale on the right. In the AIGLE and AILE panels the pixels associated with the S1 sites give the CG dipoles {x_j}, while those associated with the S2 sites are the average of the CG dipoles at the neighboring S1 sites. The lifetimes of the cylindrical domain are shown on the two time axes. The solid vertical bars, at 40 ps for E = 0.5 mV/Å and at 104 ps for E = 1 mV/Å, are extracted from two MD simulations. The grey rectangles are extracted from nine independent AIGLE and AILE simulations for each value of the electric field. (f) Shifted autocorrelation function (ACF) of the local dipole x_i(t), in units of (e Å)², from MD, AIGLE, and AILE. (g) Equilibrium probability distribution of the dipole magnitude.

The Yule-Walker equation for a standard AR(m_A) model is Λ = Mϕ, the least-squares solution of which can be written as ϕ_YW = (M^T M)^{−1} M^T Λ. Using the Yule-Walker solution as a constraint, we optimize the GAR model by interleaving n_GD unconstrained optimization steps toward minimizing the loss over (µ_θ, σ_θ).

FIG. S1. (a) NACF of the noise. C^data_RR(τ) = ⟨R(t0 + τ)R(t0)⟩/⟨R(t0)²⟩ and C^sim_RR(τ) = ⟨R(t0 + τ)R(t0)⟩/⟨R(t0)²⟩ are averaged over the dataset and over the simulated trajectories, respectively. C^data_ww(τ) = ⟨w(t0 + τ)w(t0)⟩/⟨w(t0)²⟩ is the NACF of the residual noise from the dataset. (b) The optimized memory kernel K(τ) compared with the theoretical one and with the 2FDT prediction K_FDT(τ = nΔt) = −β⟨R_(n) R_(0)⟩. (c) The velocity autocorrelation function. (d) Comparison between an MD trajectory and a GLE trajectory with the same initial conditions.

FIG. S2 .
FIG. S2. Error distribution of the DP model on the training set. The blue color marks configurations without domain walls, and the orange color marks the rest. E_DFT and F^{x,y,z}_DFT are the energy and force labels of the data. E_0 is a constant. ΔE and ΔF_{x,y,z} are the differences between the model predictions and the data labels.

FIG. S4 .
FIG. S4. (a) The NACF of the noise and the rescaled memory kernel K(τ) for τ < 1 ps. For τ > 1 ps, the correlations exhibit a weakly oscillatory pattern similar to that shown here in the vicinity of τ = 1 ps. The inset shows ℜ(ĈMD_RR(Ω)) (blue) and ℜ(ĈGLE_RR(Ω)) (orange) in arbitrary units. ℜ(ĈMD_RR(Ω)) displays unphysical oscillations originating from statistical errors due to the limited MD data. The first peak in ℜ(ĈGLE_RR(Ω)) is located near 50 cm−1. (b) Time evolution of the CV x under different driving fields E. The waiting-time distribution for the domain motion has a long tail.

FIG. S5 .
FIG. S5. (a) Schematic representation of the sideways-growth stage of ferroelectric switching. In our model, the effects of the finite film thickness (h) are ignored. The radius of the thin film is l. The purple domain has upward polarization, parallel to the external field E; the blue domain has downward polarization. The growth speed v_D can be approximated by the speed of a flat domain wall under the same external field when the upward domain is sufficiently large. (b) The hysteresis loop P, scaled by 1/P_0, for t_p = 1/f = 100 µs and different l. P_0 is the remnant polarization at E = 0. The "EXP1" experimental data [S22] were obtained for the same t_p and unknown l. The "EXP2" data [S28] were obtained for l ≈ 144 µm and unknown t_p. The remnant polarization in the two experiments is 94 µC/cm² [S22] and 96 µC/cm² [S28], respectively. Our atomistic model gives P_0 = 72 µC/cm².
m [62] is a scalar mass, estimated with the equipartition theorem. U_b, the barrier height, is predetermined with metadynamics [62], since it is hard to fit accurately in the near-equilibrium regime without enhanced sampling. The external force in Eq. (2) is represented by F(x) = mpE. The values of the parameters k, ω, x_0, and p are fixed by training. The time cutoffs for the memory kernel K and for the GAR model for R are m_K Δt = 2 ps and m_A Δt = 0.4 ps, respectively. Assuming a linear-response regime, the model parameters are independent of E. The AIGLE model introduced here is trained on several MD trajectories with E ∈ [2.0, 2.4] mV/Å. Details of training and validation can be found in the SI. MD systems are at metastable equilibrium for E ≈ 2 mV/Å and near equilibrium for smaller E.
The grey dashed line indicates the peak frequency ω^EXP_1A1 of the Raman-spectroscopy feature associated with the zone-center 1A1 transverse optical phonon [86].
(g) Equilibrium probability distribution P(|x_i|) of the dipole magnitude from MD and AIGLE. The AILE result coincides with AIGLE and is not reported. The grey dashed line shows the range of the average dipole magnitude ⟨|x_i(t)|⟩ from different experiments [83-85]. (h) ACF of the time derivative v_i(t) of x_i(t), in units of (e Å/ps)². (i) Cross-correlation function of v_i(t) and v_j(t) on adjacent sites of the S1 lattice, in units of (e Å/ps)². (j) ACF of Ṗ(t), the time derivative of the polarization P(t), in units of (µC/cm²/ps)². (k) Gaussian-convoluted Fourier transform S(Ω) of the ACF of Ṗ(t) (see text for details).