# Nonparametric inference of interaction laws in systems of agents from trajectory data

^{a}Department of Mathematics, Johns Hopkins University, Baltimore, MD 21218;^{b}Department of Applied Mathematics and Statistics, Johns Hopkins University, Baltimore, MD 21218;^{c}Institute for Data Intensive Engineering and Science, Johns Hopkins University, Baltimore, MD 21218;^{d}Mathematical Institute for Data Science, Johns Hopkins University, Baltimore, MD 21218


Edited by Bin Yu, University of California, Berkeley, CA, and approved June 3, 2019 (received for review December 26, 2018)

## Significance

Particle and agent-based systems are ubiquitous in science. The complexity of emergent patterns and the high dimensionality of the state space of such systems are obstacles to the creation of data-driven methods for inferring the driving laws from observational data. We introduce a nonparametric estimator for learning interaction kernels from trajectory data that is scalable to large datasets, statistically optimal, avoids the curse of dimensionality, and is applicable to a wide variety of systems from Physics, Biology, Ecology, and the Social Sciences.

## Abstract

Inferring the laws of interaction in agent-based systems from observational data is a fundamental challenge in a wide variety of disciplines. We propose a nonparametric statistical learning approach for distance-based interactions, with no reference or assumption on their analytical form, given data consisting of sampled trajectories of interacting agents. We demonstrate the effectiveness of our estimators both by providing theoretical guarantees that avoid the curse of dimensionality and by testing them on a variety of prototypical systems used in various disciplines. These systems include homogeneous and heterogeneous agent systems, ranging from particle systems in fundamental physics to agent-based systems that model opinion dynamics under social influence, prey–predator dynamics, flocking and swarming, and phototaxis in cell dynamics.

## 1. Introduction

Systems of interacting agents arise in a wide variety of disciplines, including Physics, Biology, Ecology, Neurobiology, Social Sciences, and Economics (e.g., refs. 1⇓⇓–4 and references therein). Agents may represent particles, atoms, cells, animals, neurons, people, rational agents, opinions, etc. The understanding of agent interactions at the appropriate scale in these systems is as fundamental a problem as the understanding of interaction laws of particles in Physics.

How can laws of interaction between agents be discovered? In Physics, vast knowledge and intuition exist to formulate hypotheses about the form of interactions, inspiring careful experiments and accurate measurements, that together lead to the inference of interaction laws. This is a classical area of research, dating back to at least Gauss, Lagrange, and Laplace (5), that plays a fundamental role in many disciplines. In the context of interacting agents at the scale of complex organisms, fewer controlled experiments are possible, and there are few “canonical” choices for modeling the interactions. Different types and models of interactions have been proposed in different scientific fields and fit to experimental data, which in turn may suggest new modeling approaches, in a model–data validation loop. Often, the form of governing interaction laws is chosen a priori, within perhaps a small parametric family, and the aim is often to reproduce only qualitatively, and not quantitatively, some of the macroscopic features of the observed dynamics, such as the formation of certain patterns.

Our work fits at the boundary between statistical/machine learning and dynamical systems, where equations are estimated from observed trajectory data, and inference takes into account assumptions about the form of the equations governing the dynamics. Over the past decade, the rapidly increasing acquisition of data, due to decreasing costs of sensors and measurements, has made the learning of large and complex systems possible, and there has been an increasing interest in inference techniques that are model-agnostic and scalable to high-dimensional systems and large datasets.

We establish statistically sound, dynamically accurate, computationally efficient techniques* for inferring these interaction laws from trajectory data. We propose a nonparametric approach for learning interaction laws in particle and agent systems, based on observations of trajectories of the states (e.g., position, opinion, etc.) of the systems, on the assumption that the interaction kernel depends on pairwise distances only, unlike recent efforts that either require feature libraries or parametric forms for such interactions (6⇓⇓⇓–10), or aim at identifying only the type of interaction from a small set of possible types (11⇓–13). We consider a least-squares (LS) estimator, classical in the area of inverse problems (dating back to Legendre and Gauss), suitably regularized and tuned to the learning of the interaction kernel in agent-based systems.

The unknown is the interaction kernel, a function of pairwise distances between agents of the system. While the values of this function are not observed, in contrast to standard regression problems, we are able to show that our estimator converges at an optimal rate, as if we were in the 1D regression setting. In particular, the learning rate has no dependency on the dimension of the state space of the system, therefore avoiding any curse of dimensionality and making these estimators well suited for the modern high-dimensional data regime. The approach extends easily to a variety of complex systems; here, we consider first- and second-order models, with single and multiple types of agents, and with interactions with simple environments. We demonstrate with examples that the theoretical guarantees on the performance of the estimator make it suitable for testing hypotheses on underlying models of interactions, assisting an investigator in choosing among different possible (nonparametric) models.

Finally, our estimator is constructed with computationally efficient algorithms (with complexity analyzed in *SI Appendix*, section 2F) that may be implemented in a streaming fashion: It is, therefore, well suited for large datasets.

## 2. Learning Interaction Kernels

We start with a model that is used in a wide variety of interacting agent systems [e.g., physical particles or influence propagation in a population (14, 15)]: Consider N agents with states $x_i(t) \in \mathbb{R}^d$ evolving according to

$$\dot{x}_i(t) = \frac{1}{N}\sum_{i'=1}^{N} \phi\big(\|x_{i'}(t) - x_i(t)\|\big)\,\big(x_{i'}(t) - x_i(t)\big), \qquad i = 1, \dots, N, \tag{1}$$

where $\phi: \mathbb{R}_+ \to \mathbb{R}$ is the interaction kernel. Eq. **1** is the gradient flow for the potential energy $U(x_1, \dots, x_N) = \frac{1}{2N}\sum_{i, i'} \Phi(\|x_{i'} - x_i\|)$, where $\Phi$ is an antiderivative of $r\phi(r)$.
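As a concrete reference for the model above, the following minimal sketch (our own illustration, not the authors' code; the constant kernel and all parameter values are hypothetical choices) simulates Eq. **1** with a forward-Euler step:

```python
import numpy as np

def simulate_first_order(phi, X0, T, L):
    """Forward-Euler simulation of the first-order system in Eq. 1:
    dx_i/dt = (1/N) sum_{i'} phi(||x_i' - x_i||) (x_i' - x_i)."""
    N, d = X0.shape
    dt = T / L
    X = X0.copy()
    traj = [X0.copy()]
    for _ in range(L):
        diff = X[None, :, :] - X[:, None, :]      # diff[i, j] = x_j - x_i
        r = np.linalg.norm(diff, axis=2)          # pairwise distances
        W = phi(r)
        np.fill_diagonal(W, 0.0)                  # no self-interaction
        X = X + dt * (W[:, :, None] * diff).sum(axis=1) / N
        traj.append(X.copy())
    return np.array(traj)                         # shape (L + 1, N, d)

# constant attraction kernel phi = 1: every agent drifts toward the centroid
rng = np.random.default_rng(0)
X0 = rng.normal(size=(10, 2))
traj = simulate_first_order(lambda r: np.ones_like(r), X0, T=5.0, L=500)
```

Because the interactions are pairwise and antisymmetric in $(x_{i'} - x_i)$, the centroid of the agents is conserved along the evolution.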

Our goal is to infer, in a nonparametric fashion, the interaction kernel ϕ, by constructing an estimator $\hat\phi$ that minimizes the empirical error functional

$$\mathcal{E}_{L,M}(\hat\varphi) = \frac{1}{LM}\sum_{l=1}^{L}\sum_{m=1}^{M} \frac{1}{N}\sum_{i=1}^{N} \Big\| \dot{x}_i^{(m)}(t_l) - \frac{1}{N}\sum_{i'=1}^{N} \hat\varphi\big(\|x_{i'}^{(m)}(t_l) - x_i^{(m)}(t_l)\|\big)\,\big(x_{i'}^{(m)}(t_l) - x_i^{(m)}(t_l)\big) \Big\|^2 \tag{2}$$

over a suitable hypothesis space H of candidate kernels, yielding

$$\hat\phi = \operatorname*{argmin}_{\hat\varphi \in H} \mathcal{E}_{L,M}(\hat\varphi). \tag{3}$$

We proceed in a different direction, aiming for the flexibility of a nonparametric model while exploiting the structure of the system in Eq. **1**. The target function ϕ depends on just one variable (pairwise distance), but it is observed through a collection of nonindependent linear measurements (the left-hand side of Eq. **1**), at locations given by the pairwise distances $\|x_{i'}(t) - x_i(t)\|$. When the system evolves, these distances are distributed according to a measure ρ_T adapted to the dynamics; see Eq. **4** for a formal definition.
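To make the measurement structure concrete, here is a hedged sketch of the LS estimator (our own illustration, not the authors' implementation; the piecewise-constant hypothesis space and the bin edges `r_grid` are illustrative choices), assembled from the linear measurements above:

```python
import numpy as np

def estimate_kernel(trajs, dots, r_grid):
    """Sketch of the LS estimator: expand phi in a piecewise-constant basis
    on the bins defined by r_grid and accumulate the normal equations from
    the linear measurements on the left-hand side of Eq. 1.
    trajs, dots: arrays of shape (M, L, N, d) with positions and velocities."""
    M, L, N, d = trajs.shape
    n = len(r_grid) - 1
    A = np.zeros((n, n))
    b = np.zeros(n)
    for m in range(M):
        for l in range(L):
            X, V = trajs[m, l], dots[m, l]
            diff = X[None, :, :] - X[:, None, :]            # x_j - x_i
            r = np.linalg.norm(diff, axis=2)
            idx = np.clip(np.searchsorted(r_grid, r) - 1, 0, n - 1)
            G = np.zeros((N * d, n))                        # design matrix
            for k in range(n):
                mask = (idx == k).astype(float)
                np.fill_diagonal(mask, 0.0)                 # exclude i = i'
                G[:, k] = ((mask[:, :, None] * diff).sum(axis=1) / N).ravel()
            A += G.T @ G                                    # normal equations
            b += G.T @ V.ravel()
    coef, *_ = np.linalg.lstsq(A, b, rcond=None)            # min-norm LS
    return coef                                             # phi-hat per bin

# demo on synthetic snapshots of Eq. 1 with the constant kernel phi = 1
rng = np.random.default_rng(0)
M, N, d = 20, 8, 2
trajs = rng.normal(size=(M, 1, N, d))
diff = trajs[:, 0, None, :, :] - trajs[:, 0, :, None, :]    # (M, N, N, d)
dots = diff.sum(axis=2)[:, None] / N                        # exact velocities
coef = estimate_kernel(trajs, dots, np.linspace(0.0, 6.0, 13))
```

The normal equations are accumulated one snapshot at a time, which is what makes a streaming implementation possible; the number of bins plays the role of the hypothesis-space dimension discussed in Section 3.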

We measure the performance of the estimator $\hat\phi$ by a weighted $L^2(\rho_T)$ distance to ϕ, where ρ_T is a probability measure on pairwise distances adapted to the dynamics (Section 3A).

We are also interested in whether trajectories of the system driven by the estimated kernel $\hat\phi$ are accurate approximations of the trajectories of the original system, both on the training time interval and beyond it.

Finally, while the error of the estimated kernel is our primary object of analysis, we show that it also controls the error of the predicted trajectories (Section 3D).

### A. Different Sampling Regimes and Randomness

The total number of observations is (number of ICs M) × (number of temporal observations L in [0, T]) × (number of agents N).

#### Many short time trajectories.

T is small, L is small (e.g., L = 1), and the number of trajectories M is large.

#### Single large time trajectory.

T is large (even comparable to the relaxation time of the system, if applicable), L is large, and M is small (e.g., M = 1).

#### Intermediate time scale.

T, L and M are all not small, but none is very large, corresponding to multiple “medium”-length trajectories, with several different ICs.

Randomness is injected via the ICs, and in our main results in Section 3, the sample size will be M. If the system is ergodic, the regimes above are partially related to each other, at least when the ICs are sampled from the ergodic distribution of the system.

### B. Example: Interacting Particles with the Lennard–Jones Potential

We illustrate the learning procedure on a particle system evolving according to Eq. **1**, with the interaction kernel induced by a Lennard–Jones-type potential; *SI Appendix*, section 3B contains a detailed description of the experiments. Fig. 1 demonstrates that the estimators approximate the true kernel well in different sampling regimes and that the trajectories of the true system are well-approximated by those of the learned system, both in the “training” interval [0, T] and beyond.
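For reference, a Lennard–Jones-type potential induces an interaction kernel of the following form (a sketch; the relation φ(r) = Φ′(r)/r follows the gradient-flow structure of Eq. **1**, and the parameter values ε, σ are illustrative, not those of the experiments):

```python
import numpy as np

def lj_kernel(r, eps=10.0, sigma=1.0):
    """phi(r) = Phi'(r)/r for the Lennard-Jones potential
    Phi(r) = 4*eps*((sigma/r)**12 - (sigma/r)**6): strongly repulsive at
    short range, weakly attractive at long range (illustrative parameters)."""
    return 24.0 * eps * (sigma**6 / r**8 - 2.0 * sigma**12 / r**14)
```

In the convention of Eq. **1**, positive values of φ correspond to attraction; the kernel changes sign at the equilibrium distance $r = 2^{1/6}\sigma$ and is singular at the origin, which is one reason this system is a demanding test for the estimator.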

The rate of decay of the estimation error is close to the optimal rate in Thm. 3.3 (Fig. 2); this is a consequence of two factors, one of which is the use of an empirical approximation to the measure ρ_T (see *SI Appendix*, section 3B for a detailed discussion).

Fig. 3 shows the behavior of the error of the estimators as both L and M are increased. It indicates that a single long trajectory may not contain enough “information” to learn the kernel, at least for deterministic systems approaching a steady state. It also shows the behavior predicted by Thm. 3.3—namely, for each fixed L the error decreases as M increases.

## 3. Learning Theory

We introduce an error functional based on the structure of the dynamical system in Eq. **1**. The restriction of the minimization to a finite-dimensional hypothesis space H acts as a regularization.

### A. Probability Measures Adapted to the Dynamics

To measure the quality of the estimator of the interaction kernel ϕ, we introduce two probability measures on the space of pairwise distances: ρ_T and its empirical counterpart (*SI Appendix*, Lemma 1.1), measuring how much regions of that space are explored by the dynamics.

We report here on the analysis in the discrete-time observation case, most relevant in practice, with observations at L equispaced times in [0, T].

### B. Learnability: The Coercivity Condition

A fundamental question is the learnability of the kernel, i.e., the convergence of the estimator in Eq. **3** to the true kernel ϕ as the sample size increases (i.e., as M → ∞).

**Definition 3.1.** The dynamical system in Eq. **1**, with IC sampled from μ₀, satisfies the **coercivity condition** on a set H if there exists a constant c_H > 0 such that the expected value of the error functional at any $\hat\varphi \in H$ is bounded below by $c_H \|\hat\varphi(\cdot)\cdot\|^2_{L^2(\rho_T)}$ (*SI Appendix*, Thm. 1.2 and Prop. 1.3). Thm. 3.1 proves that the coercivity condition holds under suitable hypotheses, even independently of N; numerical tests suggest that it holds generically over larger classes of interaction kernels and distributions of ICs, for large L, and as long as the system is not started at, or near, an equilibrium (*SI Appendix*, Fig. S6). Finally, coercivity ensures that the LS problem defining the estimator is well-conditioned (see *SI Appendix*, Prop. 2.1 for details).

We prove that coercivity holds when the ICs are drawn from an exchangeable Gaussian distribution; numerical tests (*SI Appendix*, Fig. S6) suggest that the coercivity condition holds true for a larger class of interaction kernels, for various initial distributions including Gaussian and uniform distributions, and for large L, as long as the system is not at an equilibrium.

**Theorem 3.1.** *Suppose* *and assume that the distribution of* *is exchangeable Gaussian with* *for a constant* *. Then*, *the coercivity condition holds true with* *where* *is independent of* N, *is positive for any compact* *and is zero for*

In this setting, the analysis of the coercivity constant reduces to the positivity of an integral kernel:

**Lemma 3.2.** *Let* *be exchangeable Gaussian random vectors in* *with* *for a constant* *. Suppose* *. Then*, *there is a positive definite integral kernel* *such that for any* *where* *, since* *. Therefore*, *there exists* *depending only on* *such that for* *and* *if* H *is compact in*

We conclude that under the assumptions of Thm. 3.1, if H is compact, then the coercivity condition holds on H.

### C. Optimal Rates of Convergence

The classical bias–variance trade-off in statistical estimation guides the selection of a hypothesis space H, whose dimension will depend on M, the number of observed trajectories. On the one hand, H should be large, so that the bias (the distance between the true kernel ϕ and H) is small; on the other hand, H should be small, so that the variance of the estimator is small. In the extreme case of a fixed, compact H containing ϕ, the estimator is consistent (*SI Appendix*, Prop. 1.5). In fact, significantly better rates may be achieved for regular ϕ’s:

**Theorem 3.3.** *Assume that* ϕ *is bounded and compactly supported. Let* $\{H_n\}$ *be a sequence of subspaces of* $L^\infty$ *with* $\dim(H_n) \le c_0 n$ *and* $\inf_{\hat\varphi \in H_n} \|\hat\varphi - \varphi\|_\infty \le c_1 n^{-s}$ *for some constants* $c_0, c_1 > 0$. *Assume that the coercivity condition holds on* $\cup_n H_n$. *Such a sequence exists*, *for example*, *if* ϕ *is* s-*Hölder regular*, *and can be chosen so that each* $H_n$ *is compact in* $L^\infty$. *Choose* $n \asymp (M/\log M)^{1/(2s+1)}$. *Then*, *there exists a constant* C *such that*

$$\mathbb{E}\Big[\big\|\hat\phi(\cdot)\cdot - \phi(\cdot)\cdot\big\|_{L^2(\rho_T)}\Big] \le C\left(\frac{\log M}{M}\right)^{\frac{s}{2s+1}}.$$

This rate is *optimal*: It coincides with the minimax rate in the classical regression setting, where one can observe directly noisy values of an s-Hölder regression function at the sample points. We obtain this optimal rate even though we do not observe the values of ϕ itself; moreover, by Eq. **7**, the computational complexity of computing the estimator remains modest (*SI Appendix*, section 2F).

One shortcoming of our result is that the rate is not a function of the total number of observations, which is of order M·L·N, but only of the number M of *random* samples (the ICs). Numerical experiments (see Fig. 3 and similar experiments for the other systems, reported in *SI Appendix*) suggest that the estimator improves as L increases, at least to a point, limited by the “information” in a single trajectory. A comparison can be made with ref. 17, where observations of the mean-field dynamics (N → ∞) are considered instead.
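The interplay among M, the regularity s, and the hypothesis-space dimension in Thm. 3.3 can be made concrete with a small helper (a sketch with the theorem's unspecified constants set to 1):

```python
import numpy as np

def hypothesis_dimension(M, s):
    """Dimension n ~ (M / log M)^(1/(2s+1)) prescribed by Thm. 3.3
    (constant set to 1 for illustration)."""
    return max(1, round((M / np.log(M)) ** (1.0 / (2 * s + 1))))

def convergence_rate(M, s):
    """The minimax rate (log M / M)^(s/(2s+1)) appearing in Thm. 3.3."""
    return (np.log(M) / M) ** (s / (2 * s + 1))

# smoother kernels (larger s) yield faster rates for the same sample size
rates = [convergence_rate(10**4, s) for s in (0.5, 1.0, 2.0)]
```

The dimension grows sublinearly in M: more trajectories justify a richer hypothesis space, but only at the rate that keeps the variance term balanced against the bias.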

Our work here may be compared with the classical parameter estimation problem for ODE models (19⇓⇓–22), where one is interested in estimating a vector parameter θ in the ODE model $\dot{x}(t) = f(x(t), \theta)$.

### D. Trajectory-Based Performance Measures

It is important not only that $\hat\phi$ is close to ϕ, but also that the trajectories of the system driven by $\hat\phi$ are close to those of the original system.

**Proposition 3.4.** *Assume that* $\hat\phi$ *and* ϕ *are Lipschitz. Let* $\hat{X}(t)$ *and* X(t) *be the solutions of the systems with kernels* $\hat\phi$ *and* ϕ, *respectively*, *started from the same IC. Then*, *both for each trajectory and on average with respect to the distribution* μ₀ *of ICs*, *the distance between* $\hat{X}$ *and* X *on* [0, T] *is bounded by a constant* (*depending on* T *and the Lipschitz constants*) *times* $\|\hat\phi(\cdot)\cdot - \phi(\cdot)\cdot\|_{L^2(\rho_T)}$, *where the measure* ρ_T *is defined in Eq.* **4**.
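Prop. 3.4 can be probed numerically: simulate the same IC under ϕ and under a perturbed kernel and record the worst-case trajectory discrepancy (a sketch of our own; the exponential kernel and the perturbations are hypothetical choices, and forward Euler stands in for the integrator):

```python
import numpy as np

def trajectory_error(phi, phi_hat, X0, T, L):
    """sup-norm discrepancy over [0, T] between trajectories driven by the
    true kernel phi and by an estimator phi_hat, from the same IC."""
    def step(X, f, dt):
        diff = X[None, :, :] - X[:, None, :]
        r = np.linalg.norm(diff, axis=2)
        W = f(r)
        np.fill_diagonal(W, 0.0)
        return X + dt * (W[:, :, None] * diff).sum(axis=1) / X.shape[0]
    dt = T / L
    X, Y = X0.copy(), X0.copy()
    err = 0.0
    for _ in range(L):
        X, Y = step(X, phi, dt), step(Y, phi_hat, dt)
        err = max(err, np.linalg.norm(X - Y))
    return err

rng = np.random.default_rng(0)
X0 = rng.normal(size=(6, 2))
true_phi = lambda r: np.exp(-r)
err_small = trajectory_error(true_phi, lambda r: np.exp(-r) + 0.01, X0, 2.0, 200)
err_large = trajectory_error(true_phi, lambda r: np.exp(-r) + 0.1, X0, 2.0, 200)
```

Consistent with the proposition, the trajectory discrepancy shrinks with the size of the kernel perturbation.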

## 4. Extensions: Heterogeneous Agent Systems, First and Second Order

The method proposed extends naturally to a large variety of interacting agent systems arising in a multitude of applications (4), including systems with multiple types of agents, driven by second-order equations, and including interactions with an environment. For detailed discussions of related topics on self-organized dynamics, we refer the readers to refs. 3 and 32⇓⇓–35 and the recent surveys (36, 37).

### A. First-Order Heterogeneous Agents Systems

Let the agents be divided into K disjoint sets (types) $C_1, \dots, C_K$, with interactions that depend on the types of the interacting agents.

Let $\phi_{kk'}$ denote the interaction kernel governing the influence of agents in $C_{k'}$ on agents in $C_k$, as in Eq. **8**, and generalize the error functional in Eq. **2**, with a weighted norm, to define the estimators (Eq. **9**).

The generalization of the measure ρ_T in Eq. **5** (and similarly of its empirical counterpart) is straightforward.

While this case requires learning multiple interaction kernels, it turns out that the learning theory developed for the single-type agent systems can be generalized, and the estimator in Eq. **9** still achieves optimal rates of convergence, and a similar control on the error of predicted trajectories can be obtained.

### B. Second-Order Heterogeneous Agent Systems

Here, we focus on a broad family of second-order multitype agent systems (not included, even when rewritten as first-order systems, in the family discussed above). We consider systems with K types of agents:

Note that here each agent is influenced by a weighted sum of different influences over agents of different types, leading to a rich family of models (including but not limited to prey–predator, leader–follower, and car–pedestrian models). Using vector notation, the system may be written compactly, in analogy with Eq. **2**.

The algorithm to construct the estimator in Eq. **13** generalizes that for the first-order single-type agent systems and involves a LS problem with a structured matrix whose blocks correspond to the pairs of agent types.

We note that while of course the second-order system may be written as a first-order system in the variables $(x_i, \dot{x}_i)$, the resulting system does not belong to the family of first-order systems in Eq. **8**.

## 5. Examples

We consider the learning of interaction kernels and the prediction of trajectories for three canonical categories of examples of self-organized dynamics (see *SI Appendix*, section 3 for details).

### Opinion Dynamics

These are first-order ODE systems with a single type of agent, with bounded, discontinuous, compactly supported, and attraction-only interaction kernels. They model how the opinions of people influence each other and how consensus is formed based on different kinds of influence functions (refs. 14, 15, and 38 and references therein).

### Predator–Swarm System

We consider a first-order system with a single predator and a swarm of prey, with the interaction kernels (prey–prey, predator–prey, and prey–predator) similar to Lennard–Jones kernels (with appropriate signs to model attractions and repulsions). Different chasing patterns arise depending on the relative interaction strength of predator–prey vs. prey–predator interactions. We also consider a second-order predator–swarm system, with the collective interaction acting on accelerations, leading to even richer dynamics and chasing patterns (e.g., refs. 39⇓–41).

### Phototaxis

This is a second-order ODE system with a single type of agents interacting in an environment, modeling phototactic bacteria moving toward a far-away fixed light source. The response of the bacteria to the light source is represented by an auxiliary internal excitation variable.

In our experiments, we report the estimated kernels together with the measures ρ_T; see *SI Appendix*, section 3 for further details on the setup for the experiments and a comprehensive report of all of the results, as well as a detailed description of the final algorithm and its computational complexity in *SI Appendix*, section 2.

#### Model Selection and Transfer Learning.

We also consider the use of our method for model selection, where the theoretical guarantees on learning the interaction kernels and on predicting trajectories are used to decide between different models for the dynamics. We consider two examples of model selection, to test whether: (*i*) a second-order system is driven by energy-based or alignment-based interactions; or (*ii*) a heterogeneous agent system is driven by first- or second-order ODEs. For each of them, we construct two estimators assuming either case and then select models according to the performance of the estimators in predicting trajectories. See Table 1 and Fig. 7 for results and discussions and *SI Appendix*, section 3E for details.

As a simple example of transfer learning, we use the interaction kernel learned on a system with N agents to accurately predict trajectories of the same type of system but with more agents. In *SI Appendix*, section 3, we report the corresponding results for all of the systems considered (see, however, Fig. 1 for the Lennard–Jones system).

#### Noisy Observations.

Our estimators appear robust under observation noise, namely, if the observed positions and derivatives are corrupted by noise. Fig. 8 demonstrates the kernel estimation and trajectory prediction for the first-order predator–swarm system when only noisy observations are available. Similar results (reported in *SI Appendix*, section 3) are obtained in all of the other systems considered.

#### Choice of the Basis of the Hypothesis Space.

Our learning approach is robust to the choice of hypothesis space H, as long as the coercivity condition is satisfied by H (or the sequence of hypothesis spaces; *SI Appendix*, Prop. 2.1). To demonstrate this numerically, we compare the linear B-spline basis with a piecewise-polynomial basis; the results are reported in *SI Appendix*, Fig. S8.

## 6. Discussion and Conclusion

We proposed a nonparametric estimator for learning interaction kernels from observations of agent systems, implemented by computationally efficient algorithms. We applied the estimator to several classes of systems, including first- and second-order, with single- and multiple-type agents, and with simple environments. We have also considered observation data from different sampling regimes: many short-time trajectories, a single large-time trajectory, and intermediate time scales.

Our inference approach is nonparametric, does not rely on a dictionary of hypotheses (such as in refs. 6⇓–8), exploits the structure of dynamics, and enjoys optimal rates of convergence (which we proved here for first-order systems), independent of the dimension of the state space of the system. Having techniques with solid statistical guarantees is fundamental in establishing trust in data-driven models for these systems and in using them as an aid to the researcher in formulating and testing conjectures about models underlying observed systems. In this vein, we presented two examples of model selection, showing that our estimators can reliably identify the order of a system and identify whether a system is driven by energy- or alignment-type interactions.

We expect further generalizations to the case of stochastic dynamical systems and to the cases of more general interaction kernels that depend on more general types of interaction between agents, beyond pairwise, distance-based interactions. Other future directions include (but are not limited to) a better understanding of learnability, model selection based on the theory, learning from partial observations, and learning reduced models for large systems.

## Acknowledgments

We thank the reviewers for comments, which led to significant improvements to the paper; and Prof. Massimo Fornasier, Prof. Pierre-Emmanuel Jabin, Prof. Yannis Kevrekidis, Prof. Nathan Kutz, Prof. Yaozhong Hu, and Dr. Cheng Zhang for discussions. We thank Duke University and the Maryland Advanced Research Computing Center for access to computing equipment. This work was supported by National Science Foundation Grants DMS-1708602, ATD-1737984, IIS-1546392, DMS-1821211, and IIS-1837991 and Air Force Office of Scientific Research Grant AFOSR-FA9550-17-1-0280. S.T. is grateful for support from an American Mathematical Society–Simons Travel Grant.

## Footnotes

^{1}To whom correspondence may be addressed. Email: mauromaggionijhu{at}icloud.com.

Author contributions: F.L. and M.M. designed research; F.L., M.Z., S.T., and M.M. performed research; F.L., M.Z., S.T., and M.M. analyzed data; and F.L., M.Z., S.T., and M.M. wrote the paper.

The authors declare no conflict of interest.

This article is a PNAS Direct Submission.

Data deposition: The software package implementing the proposed algorithms can be found on https://github.com/MingZhongCodes/LearningDynamics.

*The software package implementing the proposed algorithms can be found on https://github.com/MingZhongCodes/LearningDynamics.

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1822012116/-/DCSupplemental.

Published under the PNAS license.

## References

- N. Bellomo, P. Degond, E. Tadmor
- J. A. Carrillo, Y. Choi, S. Perez
- T. Kolokolnikov, H. Sun, D. Uminsky, A. Bertozzi
- Y. Shoham, K. Leyton-Brown
- S. M. Stigler
- H. Schaeffer, R. Caflisch, C. D. Hauck, S. Osher
- S. Brunton, J. Proctor, J. Kutz
- G. Tran, R. Ward
- W. Bialek et al.
- Y. Li, J. Wu, R. Tedrake, J. B. Tenenbaum, A. Torralba
- M. Ballerini et al.
- R. Lukeman, Y. X. Li, L. Edelstein-Keshet
- Y. Katz, K. Tunstrom, C. Ioannou, C. Huepe, I. Couzin
- U. Krause
- L. Györfi, M. Kohler, A. Krzyzak, H. Walk
- M. Bongini, M. Fornasier, M. Hansen, M. Maggioni
- H. Schaeffer, G. Tran, R. Ward
- J. Cao, L. Wang, J. Xu
- J. O. Ramsay, G. Hooker, D. Campbell, J. Cao
- J. M. Varah
- J. O. Ramsay
- J. Timmer, H. Rust, W. Horbelt, H. Voss
- H. Miao, X. Xia, A. S. Perelson, H. Wu
- J. Ramsay, G. Hooker
- J. von Brecht, D. Uminsky
- R. Simione, D. Slepčev, I. Topaloglu
- F. Cucker, J. G. Dong
- F. Cucker, E. Mordecki
- J. Ke, J. W. Minett, C. P. Au, W. S. Y. Wang
- A. Muntean, F. Toschi
- J. A. Carrillo, Y. P. Choi, M. Hauray
- G. Naldi, L. Pareschi, G. Toscani, N. Bellomo
- J. A. Carrillo, M. Fornasier, G. Toscani, F. Vecil
- S. Motsch, E. Tadmor
- Y. Chen, T. Kolokolnikov
- S. Ha, D. Levy
- J. M. Skerker, H. C. Berg
- D. Bhaya, A. Takahashi, A. R. Grossman

