## New Research In

### Physical Sciences

### Social Sciences

#### Featured Portals

#### Articles by Topic

### Biological Sciences

#### Featured Portals

#### Articles by Topic

- Agricultural Sciences
- Anthropology
- Applied Biological Sciences
- Biochemistry
- Biophysics and Computational Biology
- Cell Biology
- Developmental Biology
- Ecology
- Environmental Sciences
- Evolution
- Genetics
- Immunology and Inflammation
- Medical Sciences
- Microbiology
- Neuroscience
- Pharmacology
- Physiology
- Plant Biology
- Population Biology
- Psychological and Cognitive Sciences
- Sustainability Science
- Systems Biology

# Automated, predictive, and interpretable inference of *Caenorhabditis elegans* escape dynamics

Edited by David A. Weitz, Harvard University, Cambridge, MA, and approved February 22, 2019 (received for review September 25, 2018)

## Significance

The cost of an empirical bit in biophysics has fallen dramatically, and high-precision data are now abundant. However, biological systems are notoriously complex, multiscale, and inhomogeneous, so that we often lack intuition for transforming such measurements into theoretical frameworks. Modern machine learning can be used as an aid. Here we apply our *Sir Isaac* platform for automatic inference of a model of the escape response behavior in a roundworm directly from time series data. The automatically constructed model is more accurate than that curated manually, is biophysically interpretable, and makes nontrivial predictions about the system.

## Abstract

The roundworm *Caenorhabditis elegans* exhibits robust escape behavior in response to rapidly rising temperature. The behavior lasts for a few seconds, shows history dependence, involves both sensory and motor systems, and is too complicated to model mechanistically using currently available knowledge. Instead we model the process phenomenologically, and we use the *Sir Isaac* dynamical inference platform to infer the model in a fully automated fashion directly from experimental data. The inferred model requires incorporation of an unobserved dynamical variable and is biologically interpretable. The model makes accurate predictions about the dynamics of the worm behavior, and it can be used to characterize the functional logic of the dynamical system underlying the escape response. This work illustrates the power of modern artificial intelligence to aid in discovery of accurate and interpretable models of complex natural systems.

The quantitative biology revolution of recent decades has resulted in an unprecedented ability to measure dynamics of complex biological systems in response to perturbations with the accuracy previously reserved for inanimate, physical systems. For example, the entire escape behavior of a roundworm *Caenorhabditis elegans* in response to a noxious temperature stimulus can be measured for many seconds in hundreds of worms (1, 2). At the same time, theoretical understanding of such living dynamical systems has lagged behind, largely because, in the absence of symmetries, averaging, and small parameters to guide our intuition, building mathematical models of such complex biological processes has remained a very delicate art. Recent years have shown the emergence of automated modeling approaches, which use modern machine-learning methods to automatically infer the dynamical laws underlying a studied experimental system and predict its future dynamics (3⇓⇓⇓⇓⇓⇓⇓⇓⇓⇓–14). However, arguably, these methods have not yet been applied to any real experimental data with dynamics of a priori unknown structure to produce interpretable dynamical representations of the system. Thus, their ability to build not just statistical but physical models of data (15) which are interpretable by humans, answer interesting scientific questions, and guide future discovery remains unclear.

Here we apply the *Sir Isaac* platform for automated inference of dynamical equations underlying time series data to infer a model of the *C. elegans* escape response, averaged over a population of worms. We show that *Sir Isaac* is able not only to fit the observed data, but also to make predictions about the worm dynamics that extend beyond the data used for training. The inferred optimal model has an easily interpretable form, with the identified interactions and the inferred latent dynamical variable connecting naturally to known mechanisms of *C. elegans* sensorimotor control. And by analyzing the dynamical structure of the model—number of dynamic variables, number of attractors (distinct behaviors), etc.—we can generalize these results across many biophysical systems.

## Results

### Automated Dynamical Inference.

*Sir Isaac* (7, 8) is one of the new generation of machine-learning algorithms able to infer a dynamical model of time series data, with the model expressed in terms of a system of differential equations. Compared with other approaches, *Sir Isaac* is able to infer dynamics (at least for synthetic test systems) that are (*i*) relatively low dimensional, (*ii*) have unobserved (hidden or latent) variables, (*iii*) have arbitrary nonlinearities, (*iv*) rely only on noisy measurements of the system’s state variables and not of the rate of change of these variables, and (*v*) are expressed in terms of an interpretable system of coupled differential equations. Briefly, the algorithm sets up a complete and nested hierarchy of nonlinear dynamical models. Nestedness means that each next model in the hierarchy is more complex (in the sense of having a larger explanatory power) (16⇓–18) than the previous one and includes it as a special case. Completeness means that any sufficiently general dynamics can be approximated arbitrarily well by some model within the hierarchy. That is, the only restriction on the dynamics is that they are continuous and do not have infinite rates of change. Two such hierarchies have been developed, one based on S systems (19) and the other on sigmoidal networks (20). Both progressively add hidden dynamical variables to the model and then couple them to the previously introduced variables using nonlinear interactions of specific forms. *Sir Isaac* then uses a semianalytical formulation of Bayesian model selection (8, 18, 21, 22) to choose the model in the hierarchy that best balances the quality of fit vs. overfitting and is, therefore, expected to produce the best generalization. The sigmoidal network hierarchy is especially well suited to modeling biological systems, where rates of change of variables usually saturate over some scale, and it is the sole focus of our study.

### Experimental Model System.

Nociception evokes a rapid escape behavior designed to protect the animal from potential harm (23, 24). *C. elegans*, a small nematode with a simple nervous system, is a classic model organism used in the study of nociception. A variety of studies have used *C. elegans* to elucidate genes and neurons mediating nociception to a variety of aversive stimuli including high osmolarity and mechanical, chemical, and thermal stimuli (25⇓⇓–28). However, a complete dynamic understanding of the escape response at the neuronal, as well as the molecular, level is not fully known. Recent studies have quantified the behavioral escape response of the worm when thermally stimulated with laser heating (1, 2), and these data are the focus of our study. The response is dynamic: When the stimulus is applied to the animal’s head, it quickly withdraws, briefly accelerating backward, and eventually returns to forward motion, usually in a different direction. Various features of this response change with the level of laser heating, such as the length of time moving in the reverse direction and the maximum speed attained.

### Fits and Predictions.

We use the worm center-of-mass speed, v, as the variable whose dynamics need to be explained in response to the laser heating pulse. We define *Materials and Methods*. Based on trajectories of 201 worms in response to laser currents ranging between 9.6 mA and 177.4 mA, we let *Sir Isaac* determine the most likely dynamical system explaining these data within the sigmoidal networks model class (8) (see *Materials and Methods* for a detailed description of the modeling and inference). The inferred model has a latent (unobserved) dynamical variable, hereafter referred to as *Sir Isaac* inferred model. The resulting model is*Materials and Methods* for inferred values of

The fits produced by this model are compared with data in Fig. 1*A*, showing an excellent agreement (see *Materials and Methods* for quantification of the quality of fits). Surprisingly, the quality of the fit for this automatically generated model, with very little human input, is better than that of a state-of-the-art manually constructed probabilistic, nondynamical model (2): Only about 10% of explainable variance in the data remains unexplained by the model for times between 100 ms and 2 s after the stimulus, compared with about 20% for the manual model (*Materials and Methods*).

However, the quality of the fit may not be surprising in itself since the *Sir Isaac* model hierarchy can fit any dynamics using sufficient data. A utility of a mathematical model is in its ability to make predictions about data that were not used in fitting. Thus, we use the inferred model to predict when the worm will return to forward motion. This usually happens at times well after *B* and *C* compares these predictions with experiments, showing very good predictions. Such ability to extrapolate beyond the training range is usually an indication that the model captures the underlying physics and is not purely statistical (15), giving us confidence in using the model for inferences about the worm.

### Model Analysis.

The algorithm has chosen to include a single latent dynamical variable, which is a linear leaky integrator of the experienced temperature. Having access to both the instantaneous stimulus and its integral over the immediate past allows the worm to estimate the rate of change in the stimulus. This agrees with the observation (1) that both the current temperature and the rate of its increase are noxious for the worm. From this, one could have guessed, perhaps, that at least one latent variable (temperature derivative or temperature at some previous time) is required to properly model the escape response. However, the fact that *Sir Isaac* inferred this from time series data alone and was able to model the data with exactly one hidden variable is surprising.

Fig. 2 shows the phase portraits of the inferred dynamical model, Eqs. **1** and **2**, as well as the dynamics of the speed and the heat stimulus h. Crucially, we see that there is only one fixed point in the phase space at any instant of time, and the position of this fixed point is affected by the current laser stimulus value. This suggests that, at least at the level of the population-averaged response, the behavior does not involve switching among alternative behaviors defined dynamically as multiple fixed points or limit cycles (e.g., forward and backward motion) with the switching probability influenced by the stimulus (29), but rather the stimulus controls the direction and the speed of the single dominant crawling state.

The network diagram of the model in Eqs. **1** and **2** is shown in Fig. 3, where we omit the linear degradation terms for v and

Another notable feature of the network diagram is that it is similar to other well-known sensory networks, namely chemotaxis (and the related thermotaxis) in *Escherichia coli* (34, 35) and chemotaxis in *Dictyostelium discoideum* (36, 37). In all these cases, the current value of the stimulus is sensed in parallel to the stimulus integrated over the recent time. They are later brought together in a negative feedback loop (*E. coli*) or an incoherent feedforward loop (*D. discoideum*), resulting in various adaptive behaviors. In contrast, the worm’s behavior is more complicated: The ambient temperature participates in a negative feedback loop on the speed through

## Discussion

In this article, we have used modern machine learning to learn the dynamics underlying the temperature escape behavior in *C. elegans*. The resulting automatically inferred model is more accurate than the model curated by hand. It uses the dynamics within what is normally considered as discrete behavioral states to make extremely precise verifiable (and verified) predictions about the behavior of the worm beyond the range of time used for training. The model is fully interpretable, with many of its features having direct biological, mechanistic equivalents. Where such biological equivalents are unknown, the model makes strong predictions of what they should be and suggests what future experiments need to search for.

One can question whether describing the *C. elegans* nociceptive behavior, which typically is viewed as stochastic (1, 2) and switching between discrete states, with the deterministic dynamics approach of *Sir Isaac* is appropriate. The quality of the fits and predictions is an indication that it is. This is likely because the dynamics of the mean behavior, which are modeled by *Sir Isaac*, are deterministic even for a stochastic system. Furthermore, as the stimulus intensity increases, the entire escape, and not just its mean, becomes more and more deterministic (1). In this regime, the discretized behavior states (forward, backward, pause, …) have their own pronounced internal dynamics with different time-dependent velocities. These dynamical changes are comparable to differences between the states. That is, the range of forward speeds is similar to the difference between the mean forward and the mean backward speeds. Unlike most probabilistic methods, *Sir Isaac* correctly models these dynamical changes within and across the states.

Crucially, the model discovers that, at least in the context of stimuli that are scalar multiples of the same single-pulse time course, the behavior of an average worm is not a simple one-to-one mapping of the input signal: The instantaneous stimulus and its temporally integrated history (one latent variable) are both important for driving the behavior. The behavior is driven by one fixed point in the velocity–memory phase space, and the worm changes its speed while chasing this fixed point, which in turn changes in response to the stimulus. This is in contrast to other possibilities, such as the worm being able to exhibit both the forward and the backward motion at any stimulus value, and the stimulus and its history merely affect the probability of engaging in either of these two behaviors. In this initial study, we have focused on explaining the first few seconds of the worm’s response to a single heat pulse. Future experimental analysis will reveal whether the same dynamical model also explains the response to more complex temporal stimulus profiles, which might include habituation and other longer-term phenomena.

We emphasize that our automatic phenomenological inference of dynamics of complex biological systems relies very little on help from constraints imposed by knowledge of the underlying biology. Indeed, the choice of the sigmoidal model hierarchy assumes only continuous dynamics without infinite rates of change. The emphasis on interpretable, physical models allows extrapolation well beyond the data used for training, which is difficult for purely statistical methods. The automation allows for a comprehensive search through the model space, so that the automatically inferred model is better than the human-assembled one, especially when, as even for the best-studied biological systems, we do not have the necessary set of measurements to model them from the ground up. Our work illustrates the power of a phenomenological modeling approach, which allows for top–down modeling, adding interpretable constraints to our understanding of the system.

## Materials and Methods

### Data Collection.

A detailed description of the experiments has been previously published (2). We raised wild-type, N2 *C. elegans* using standard methods, incubated at 20°C with food. Worms were washed to remove traces of food and placed on the surface of an agar plate for 30 min at 20°C to acclimatize. Worms were then transferred to an agar assay plate seeded with bacteria (food) and left to acclimatize for 30 more minutes. The worms were then stimulated with an infrared laser (2-mm beam diameter) focused (100-mm focal length) to a diffraction-limited spot (

### Input Data.

Data used for the inference are as described in ref. 2. Speed data for 201 worms were extracted from video frames at 60 Hz and smoothed using a Gaussian kernel of width 74.5 ms (*SI Appendix*). We use data between 0.5 s before and 2.25 s after the start of the laser stimulus. Aligning the data by laser start time, the stimulus happens at the same time in each trial. Naively used, this can produce models that simply encode a short delay followed by an escape, without requiring the stimulus. To ensure that instead the stimulus causes the response in the model, for each trial we add a random delay between 0 s and 1 s to the time data.

Additionally, we are not interested in capturing any dynamics in the prelaser free-crawling state. If we use only the small amount of prelaser data measured in this experiment, the inference procedure is free to include models with complicated transient behavior before the stimulus. For this reason, we include a copy of prelaser data at a fictitious “equilibration time” long before the stimulus time (10 s), artificially forcing the model to develop a prestimulus steady state of forward motion. Finally, we weight the prelaser data such that they appear with equal frequency as postlaser data in fitting, so that the inference algorithm is not biased toward capturing postlaser behavior more accurately than the prelaser one. Data used in the work are available from figshare (https://figshare.com/articles/Data_and_Code_Archive_for_Automated_predictive_and_interpretable_inference_of_C_elegans_escape_dynamics_/7806602).

### Estimating Explainable Variance.

The observed variance in worm speed can be partitioned into that caused by the input (changes in laser current) and that caused by other factors (individual variability, experimental noise, etc.). As in ref. 2, we focus on our model’s ability to capture the former, “explainable” variance. We treat the latter variance as “unexplainable” by our model, and it is this variance that we use to define uncertainties on data points for use in the inference procedure. We estimate these variances by splitting the data into trials with similar laser current

An analogous goodness-of-fit measure can be calculated for the predictions in Fig. 1 *B* and *C*, where we use bins corresponding to the displayed orange circles. For predicting the duration of backward motion as a function of laser current (Fig. 1*B*), *C*),

*Sir Isaac* Inference Algorithm.

We use the *Sir Isaac* dynamical inference algorithm (8) to find a set of ordinary differential equations (ODEs) that best describes the data without overfitting. Based on previous studies of simulated biological systems, we use the continuous-time sigmoidal network model class, which produces a set of J ODEs of the form**7**. The algorithm infers both the number of parameters (controlled by the total number of dynamical variables J) and the parameter values themselves (the timescales

The fitting procedure starts by using one data point (one random time) from each of a few trials and then gradually adds trials and eventually multiple data points per trial, refitting model parameters at each step (see Table 1 for algorithm parameters). The resulting model fits are scored based on their performance in predicting the entire time series (see Fig. 4 for the fit quality). When the performance and model complexity of the winning model saturate, we use the resulting model as our description of the system. In this way, parameters are fitted using only a small subset of the available data—we find that using

### Inferred models.

The inference procedure produces the following differential equations:

We note that some of the inferred parameters are close to zero, and so we check whether the model can be simplified by setting each of them to zero one at a time and in combinations and then measuring the approximate Bayesian model selection posterior-likelihood score (8) for the original *Sir Isaac* inferred model and for each of the reduced models. The original model has log-likelihood −197.2, and the best model, Eqs. **1** and **2**, has the highest Bayesian log-likelihood of −192.9. This becomes our chosen model, maximum-likelihood parameters for which can be found in Table 3.

### Bayesian Model of Uncertainty.

To quantify uncertainty in model structure and parameters, we take a Bayesian approach and sample from the posterior distribution over parameters. Assuming independent Gaussian fluctuations in the data used in fits, the posterior is simply **3**. We use a standard Metropolis Monte Carlo sampler, as implemented in SloppyCell (40), to take 100 samples (each separated by 1,000 Monte Carlo steps) used to quantify uncertainty in the fit (Fig 1*A*).

### Model of Sensory Input.

Each experimental trial begins with a forward-moving worm, which is stimulated with a laser pulse of duration **5** and **6**, giving

### Comparison to the Model of Ref. 2.

A quantitative model of *C. elegans* nociceptive response was constructed by hand in ref. 2. Contrasting with our automated inference, the model was not fitted by optimizing parameters, but rather includes the entire measured escape behavior as a templated, stereotyped response. It was based on careful study of the data, which revealed that the mean response of worms is nearly stereotypical, with the response amplitude depending nonlinearly on the stimulus intensity. Of the variability in the response that is explainable by the stimulus, the model of ref. 2 could not account for *Sir Isaac*, with very little human input, our selected model captures more of the data variance. It achieves this using a model of a very different form (a set of coupled ODEs), leaving only *SI Appendix*, Fig. S1).

## Acknowledgments

We are grateful to the NVIDIA Corporation for donated GPUs and to the Aspen Center for Physics, supported by NSF Grant PHY-1607611, for its “Physics of Behavior” programs. This work was partially supported by NIH Grants EB022872 and NS084844, James S. McDonnell Foundation Grant 220020321, a Natural Sciences and Engineering Research Council of Canada Discovery Grant, and NSF Grant IOS-1456912.

## Footnotes

- ↵
^{1}To whom correspondence should be addressed. Email: ilya.nemenman{at}emory.edu.

Author contributions: I.N. coordinated the project; B.C.D., W.S.R., and I.N. designed research; B.C.D. wrote the software; B.C.D., W.S.R., and I.N. performed research; W.S.R. collected experimental data; B.C.D., W.S.R., and I.N. analyzed data; and B.C.D., W.S.R., and I.N. wrote the paper.

The authors declare no conflict of interest.

This article is a PNAS Direct Submission.

Data deposition: Data related to this work have been deposited on figshare (https://figshare.com/articles/Data_and_Code_Archive_for_Automated_predictive_and_interpretable_inference_of_C_elegans_escape_dynamics_/7806602). The developed software is available on GitHub (https://github.com/EmoryUniversityTheoreticalBiophysics/SirIsaac).

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1816531116/-/DCSupplemental.

Published under the PNAS license.

## References

- ↵
- ↵
- Leung K,
- Mohammadi A,
- Ryu W,
- Nemenman I

*Caenorhabditis elegans*allows quantification of nociceptive stimuli levels. PLoS Comput Biol 12:e1005262. - ↵
- Schmidt M,
- Lipson H

- ↵
- ↵
- ↵
- Neuert G, et al.

- ↵
- ↵
- ↵
- Brunton S,
- Proctor J,
- Kutz J

- ↵
- Mangan N,
- Brunton S,
- Proctor J,
- Kutz J

- ↵
- Lu Z, et al.

- ↵
- Pathak J,
- Hunt B,
- Girvan M,
- Lu Z,
- Ott E

- ↵
- ↵
- Henry A,
- Hemery M,
- François P

- ↵
- Nelson P

- ↵
- Vapnik V

- ↵
- Rissanen J

- ↵
- ↵
- ↵
- Beer R,
- Daniels B

- ↵
- ↵
- ↵
- Eaton R

- ↵
- ↵
- Bargmann C,
- Thomas J,
- Horvitz H

*Caenorhabditis elegans*. Cold Spring Harbor Symp Quant Biol 55:529–538. - ↵
- Hilliard M, et al.

*C. elegans*ash neurons: Cellular response and adaptation to chemical repellents. EMBO J 24:63–72. - ↵
- Kaplan J,
- Horvitz H

*Caenorhabditis elegans*. Proc Natl Acad Sci USA 90:2227–2231. - ↵
- Wittenburg N,
- Baumeister R

*Caenorhabditis elegans*: An approach to the study of nociception. Proc Natl Acad Sci USA 96:10477–10482. - ↵
- ↵
- ↵
- ↵
- ↵
- ↵
- Sourjik V,
- Wingreen NS

- ↵
- ↵
- ↵
- ↵
- Mehta P, et al.

- ↵
- Myers C,
- Gutenkunst R,
- Sethna J

- ↵
- Mohammadi A

*Caenorhabditis elegans*: Investigation of neural substrates spatially mediating the noxious response, and the effects of pharmacological perturbations. PhD thesis (Univ of Toronto, Toronto).

*Caenorhabditis elegans*escape dynamics

## Citation Manager Formats

## Sign up for Article Alerts

## Article Classifications

- Physical Sciences
- Physics

- Biological Sciences
- Biophysics and Computational Biology