
# Simple framework for constructing functional spiking recurrent neural networks

Contributed by Terrence J. Sejnowski, September 5, 2019 (sent for review April 8, 2019; reviewed by Larry Abbott and David Sussillo)

## Significance

Recent advances in artificial intelligence and deep learning have significantly improved the capability of recurrently connected artificial neural networks. Although these networks can achieve high performance on various tasks, they often lack basic biological constraints, such as communication via spikes. However, recurrent microcircuitry in the brain can attain similar or better performance with discrete spikes in a much more efficient manner. Here, we introduce an extremely simple platform to construct spiking recurrent neural networks capable of performing numerous cognitive tasks commonly studied in neuroscience. Our method utilizes a close relationship between rate-based and spike-based networks that emerges under certain conditions. By characterizing these conditions, we provide another avenue that can be probed for constructing power-efficient spiking recurrent neural networks.

## Abstract

Cortical microcircuits exhibit complex recurrent architectures that possess dynamically rich properties. The neurons that make up these microcircuits communicate mainly via discrete spikes, and it is not clear how spikes give rise to dynamics that can be used to perform computationally challenging tasks. In contrast, continuous models of rate-coding neurons can be trained to perform complex tasks. Here, we present a simple framework to construct biologically realistic spiking recurrent neural networks (RNNs) capable of learning a wide range of tasks. Our framework involves training a continuous-variable rate RNN with important biophysical constraints and transferring the learned dynamics and constraints to a spiking RNN in a one-to-one manner. The proposed framework introduces only 1 additional parameter to establish the equivalence between rate and spiking RNN models. We also study other model parameters related to the rate and spiking networks to optimize the one-to-one mapping. By establishing a close relationship between rate and spiking models, we demonstrate that spiking RNNs could be constructed to achieve similar performance as their counterpart continuous rate networks.

Dense recurrent connections common in cortical circuits suggest their important role in computational processes (1–3). Network models based on recurrent neural networks (RNNs) of continuous-variable rate units have been extensively studied to characterize network dynamics underlying neural computations (4–9). Methods commonly used to train rate networks to perform cognitive tasks can be largely classified into 3 categories: recursive least squares (RLS)-based, gradient-based, and reward-based algorithms. The first-order reduced and controlled error (FORCE) algorithm, which utilizes RLS, has been widely used to train RNNs to produce complex output signals (5) and to reproduce experimental results (6, 10, 11). Gradient descent-based methods, including Hessian-free methods, have also been successfully applied to train rate networks in a supervised manner and to replicate the computational dynamics observed in networks from behaving animals (7, 12, 13). Unlike the previous 2 categories (i.e., RLS-based and gradient-based algorithms), reward-based learning methods are more biologically plausible and have been shown to be as effective in training rate RNNs as the supervised learning methods (14–17). Even though these models have been vital in uncovering previously unknown computational mechanisms, continuous rate networks do not incorporate basic biophysical constraints, such as the spiking nature of biological neurons.

Training spiking network models, in which units communicate with one another via discrete spikes, is more difficult than training continuous rate networks. The nondifferentiable nature of spike signals prevents the use of gradient descent-based methods to train spiking networks directly, although several differentiable models have been proposed (18, 19). Due to this challenge, FORCE-based learning algorithms have been most commonly used to train spiking recurrent networks. While recent advances have successfully modified and applied FORCE training to construct functional spiking RNNs (8, 20–23), FORCE training is computationally inefficient and unstable when connectivity constraints, including separate excitatory and inhibitory populations (Dale’s principle) and sparse connectivity patterns, are imposed (21).

Due to these limitations, computational capabilities of spiking networks that abide by biological constraints have been challenging to explore. For instance, it is not clear if spiking RNNs operating in a purely rate-coding regime can perform tasks as complex as the ones rate RNN models are trained to perform. If such spiking networks can be constructed, then it would be important to characterize how much spiking-related noise not present in rate networks affects the performance of the networks. Establishing the relationship between these 2 types of RNN models could also serve as a good starting point for designing power-efficient spiking networks that can incorporate both rate and temporal coding.

To address the above questions, we present a computational framework for directly mapping rate RNNs with basic biophysical constraints to leaky integrate-and-fire (LIF) spiking RNNs without significantly compromising task performance. Our method introduces only 1 additional parameter to place the spiking RNNs in the same dynamic regime as their counterpart rate RNNs and takes advantage of previously established methods to efficiently optimize network parameters while adhering to biophysical restrictions. These previously established methods include training a continuous-variable rate RNN using a gradient descent-based method (24–27) and a connectivity weight parametrization method to impose Dale’s principle (13). The gradient descent learning algorithm allowed us to easily optimize many parameters, including the connectivity weights of the network and the synaptic decay time constant for each unit. The weight parametrization method proposed by Song et al. (13) was utilized to enforce Dale’s principle and additional connectivity patterns without significantly affecting computational efficiency and network stability.

Combining these 2 existing methods with correct parameter values enabled us to directly map rate RNNs trained with backpropagation to LIF RNNs in a one-to-one manner. The parameters critical for the mapping to succeed included the network size, the nonlinear activation function used for training rate RNNs, and a constant factor for scaling down the connectivity weights of the trained rate RNNs. Here, we investigated these parameters along with other LIF parameters and identified the range of values required for the mapping to be effective. We demonstrate that, when these parameters are set to their optimal values, the LIF models constructed from our framework perform the same tasks that the rate models were trained to perform, and perform them equally well.

## Results

Here, we provide a brief overview of the 2 types of RNNs that we used throughout this study (more details are in *Materials and Methods*): continuous-variable firing rate RNNs and spiking RNNs. The continuous-variable rate network model consisted of N rate units whose firing rates were estimated via a nonlinear input–output transfer function (4, 5). The model was governed by a set of coupled differential equations in which the synaptic current variable of each unit decays with its own synaptic time constant and is driven by the weighted firing rates of the other units along with external inputs (the full set of equations is given in *Materials and Methods*).
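As a concrete illustration, rate dynamics of this form can be integrated with a first-order Euler scheme. The sketch below assumes sigmoid rates bounded in (0, 1) and per-unit decay constants between 20 and 50 ms, as in the networks trained here; the network size, time step, and weight scale are illustrative values, not the paper's exact settings:

```python
import numpy as np

def simulate_rate_rnn(W, tau, w_in, u, dt=5.0):
    """Euler-integrate tau_i dx_i/dt = -x_i + sum_j W_ij r_j + w_in_i u(t), r = sigmoid(x)."""
    N = W.shape[0]
    x = np.zeros(N)
    rates = np.zeros((len(u), N))
    for t in range(len(u)):
        r = 1.0 / (1.0 + np.exp(-x))                     # sigmoid transfer keeps rates in (0, 1)
        x = x + (dt / tau) * (-x + W @ r + w_in * u[t])  # first-order Euler step, per-unit tau
        rates[t] = r
    return rates

rng = np.random.default_rng(0)
N = 50
W = rng.normal(0.0, 1.5 / np.sqrt(N), (N, N))  # random recurrent weights (illustrative scale)
tau = rng.uniform(20.0, 50.0, N)               # heterogeneous synaptic decay constants (ms)
w_in = rng.normal(0.0, 1.0, N)                 # input weights
u = np.zeros(200); u[20:40] = 1.0              # brief input pulse, as in a Go trial
rates = simulate_rate_rnn(W, tau, w_in, u)
```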

The second RNN model that we considered was a network composed of N spiking units. Throughout this study, we focused on networks of LIF units whose membrane voltages integrate recurrent synaptic currents and undergo a spike-and-reset whenever the voltage crosses the action potential threshold (*Materials and Methods*). This discontinuous spike-generation mechanism has posed a major challenge for directly training spiking networks using gradient-based supervised learning. Even though the main results presented here are based on LIF networks, our method can be generalized to quadratic integrate-and-fire (QIF) networks with only a few minor changes to the model parameters (*SI Appendix*, Table S1).
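A minimal single-unit LIF simulation illustrates the spike-and-reset mechanism. The threshold (−40 mV), reset potential (−65 mV), and 2-ms absolute refractory period below follow the values listed in *Materials and Methods*; the membrane time constant, time step, and input drive are illustrative assumptions:

```python
import numpy as np

def simulate_lif(I, dt=0.05, tau_m=10.0, v_rest=-65.0, v_th=-40.0,
                 v_reset=-65.0, t_ref=2.0):
    """Euler-integrate one LIF unit driven by a current trace I (one value per time step)."""
    v = v_rest
    ref_left = 0.0
    spikes = []
    for step, i_t in enumerate(I):
        if ref_left > 0.0:               # absolute refractory period: clamp at reset potential
            ref_left -= dt
            v = v_reset
            continue
        v += (dt / tau_m) * (-(v - v_rest) + i_t)
        if v >= v_th:                    # threshold crossing: emit a spike, then reset
            spikes.append(step * dt)
            v = v_reset
            ref_left = t_ref
    return spikes

I = np.full(4000, 60.0)                  # constant suprathreshold drive for 200 ms (dt = 0.05 ms)
spike_times = simulate_lif(I)            # regular spiking at a rate set by drive and refractoriness
```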

Continuous rate network training was implemented using the open source software library TensorFlow in Python, while LIF/QIF network simulations along with the rest of the analyses were performed in MATLAB.

### Training Continuous Rate Networks.

Throughout this study, we used a gradient-descent supervised method, known as backpropagation through time (BPTT), to train rate RNNs to produce target signals associated with a specific task (13, 24). The method that we used is similar to the one used by previous studies (13, 25, 27) (more details are in *Materials and Methods*) with 1 major difference in synaptic decay time constants. Instead of assigning a single time constant to be shared by all of the units in a network, our method tunes a synaptic decay time constant for each unit using BPTT (*Materials and Methods*). Although tuning of synaptic time constants may not be biologically plausible, this feature was included to model the diverse intrinsic synaptic timescales observed in single cortical neurons (28–30).

We trained rate RNNs of various sizes on a simple task modeled after a Go-NoGo task to demonstrate our training method (Fig. 1). Each network was trained to produce a positive mean population activity approaching +1 after a brief input pulse (Fig. 1*A*). For a trial without an input pulse (i.e., NoGo trial), the networks were trained to maintain the output signal close to 0. The units in a rate RNN were sparsely connected via the recurrent connectivity weight matrix (*Materials and Methods*).

The network size (N) was varied from 10 to 400 (9 different sizes), and 100 networks with random initializations were trained for each size. For all of the networks, the minimum and maximum synaptic decay time constants were fixed to 20 and 50 ms, respectively. As expected, the smallest rate RNNs were the most difficult to train (Fig. 1*C*; *SI Appendix* has training termination criteria).

### One-to-One Mapping from Continuous Rate Networks to Spiking Networks.

We developed a simple procedure that directly maps dynamics of a trained continuous rate RNN to a spiking RNN in a one-to-one manner.

In our framework, the 3 sets of weight matrices optimized in a trained rate RNN (input, recurrent, and readout weights) are transferred to a spiking network with the same topology, with a single constant scaling factor (λ) applied to place the 2 networks in the same dynamic regime (*Materials and Methods* and Fig. 2*A*). The effects of the scaling factor are clear in an example LIF RNN model constructed from a rate model trained to perform the Go-NoGo task (Fig. 2*B*). With an appropriate value for λ, the LIF network performed the task with the same accuracy as the rate network, and the LIF units fired at rates similar to the “rates” of the continuous network units (*SI Appendix*, Fig. S1). In addition, the LIF network reproduced the population dynamics of the rate RNN model as shown by the time evolution of the top 3 principal components extracted by principal component analysis (*SI Appendix*, Fig. S2).

Using the procedure outlined above, we converted all of the rate RNNs trained in the previous section to spiking RNNs. Only the rate RNNs that successfully performed the task (i.e., training termination criteria met within the first 6,000 trials) were converted. Fig. 2*C* characterizes the proportion of the LIF networks that successfully performed the Go-NoGo task (*SI Appendix*) and the average task performance of the LIF models for each network size group. For each conversion, the scaling factor (λ) was determined via a grid search method (*Materials and Methods*). The LIF RNNs constructed from the smallest rate networks performed poorly, while the conversion was robust for the larger networks (Fig. 2*C*).

In order to investigate the effects of the synaptic decay time constants on the mapping robustness, we trained rate RNNs composed of 250 units while varying the maximum synaptic decay time constant (Fig. 2*D*). For the shortest maximum synaptic decay time constant considered (20 ms), the average task performance of the converted LIF models was the lowest (Fig. 2*E*). The LIF models for the rest of the maximum synaptic decay conditions were robust. Although this might indicate that tuning of the synaptic decay time constants is required for the mapping to succeed, we show that this is not the case in *Analysis of the Conversion Method*.

Our framework also allows seamless integration of additional functional connectivity constraints. For example, a common cortical microcircuitry motif where somatostatin-expressing interneurons inhibit both pyramidal and parvalbumin-positive neurons can be easily implemented in our framework (*Materials and Methods* and *SI Appendix*, Fig. S3). In addition, Dale’s principle is not required for our framework (*SI Appendix*, Fig. S4).

### LIF Networks for Context-Dependent Input Integration.

The Go-NoGo task considered in the previous section did not require complex cognitive computations. In this section, we consider a more complex task and probe whether spiking RNNs can be constructed from trained rate networks in a similar fashion. The task considered here is modeled after the context-dependent sensory integration task used by Mante et al. (7). Briefly, Mante et al. (7) trained rhesus monkeys to integrate inputs from one sensory modality (dominant color or dominant motion of randomly moving dots) while ignoring inputs from the other modality (7). A contextual cue was also given to instruct the monkeys which sensory modality they should attend to. The task required the monkeys to utilize flexible computations, as the same modality can be either relevant or irrelevant depending on the contextual cue. Previous works have successfully trained continuous rate RNNs to perform a simplified version of the task and replicated the neural dynamics present in the experimental data (7, 13, 15). Using our framework, we constructed a spiking RNN model that can perform the task and capture the dynamics observed in the experimental data.

For the task paradigm, we adopted a similar design as the one used by the previous modeling studies (7, 13, 15). A network of recurrently connected units received 2 streams of noisy input signals along with a constant-valued signal that encoded the contextual cue (*Materials and Methods* and Fig. 3*A*). To simulate a noisy sensory input signal, a random Gaussian time series signal with 0 mean and unit variance was first generated. Each input signal was then shifted by a positive or negative constant (“offset”) to encode evidence toward the (+) or (−) choice, respectively. Therefore, the offset value determined how much evidence for the specific choice was represented in the noisy input signal. The network was trained to produce an output signal approaching +1 (or −1) if the cued input signal had a positive (or negative) mean. For example, if the cued input signal was generated using a positive offset value, then the network should produce an output that approaches +1 regardless of the mean of the irrelevant input signal.
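The input generation described above can be sketched as follows. The unit-variance Gaussian noise, constant offsets, and contextual cue come from the task description; the trial length, function name, and the ±1 cue encoding are illustrative assumptions:

```python
import numpy as np

def make_trial(offset1, offset2, context, T=500, seed=None):
    """Two noisy input streams plus a contextual cue; returns inputs and the target choice."""
    rng = np.random.default_rng(seed)
    u1 = rng.normal(0.0, 1.0, T) + offset1             # modality 1: unit-variance noise + offset
    u2 = rng.normal(0.0, 1.0, T) + offset2             # modality 2
    cue = np.full(T, 1.0 if context == 1 else -1.0)    # constant signal encoding the context
    relevant = offset1 if context == 1 else offset2
    target = 1.0 if relevant > 0 else -1.0             # report the sign of the cued stream only
    return np.stack([u1, u2, cue], axis=1), target

# modality 1 has positive evidence, modality 2 negative; context 2 makes modality 2 relevant
inputs, target = make_trial(offset1=+0.5, offset2=-0.5, context=2, seed=1)
```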

Rate networks with different sizes were trained on the task (Fig. 3 *B* and *C*). The synaptic decay time constants were again limited to a range of 20 to 50 ms, and 100 rate RNNs with random initial conditions were trained for each network size. The smallest networks were again the most difficult to train (Fig. 3*B*).

Next, all of the rate networks successfully trained for the task were transformed into LIF models. Example output responses along with the distribution of the tuned synaptic decay constants from a converted LIF model are shown in Fig. 4 *A* and *B*. The task performance of the LIF model was 98% and comparable with that of the rate RNN used to construct the spiking model (Fig. 4*C*). In addition, the LIF network manifested population dynamics similar to the dynamics observed in the group of neurons recorded by Mante et al. (7) and the rate RNN models investigated in previous studies (7, 13, 15): individual LIF units displayed mixed representation of the 4 task variables (modality 1, modality 2, network choice, and context) (*SI Appendix*, Fig. S5*A*), and the network revealed the characteristic line attractor dynamics (*SI Appendix*, Fig. S5*B*).

Similar to the spiking networks constructed for the Go-NoGo task, the LIF RNNs performed the input integration task more accurately as the network size increased (Fig. 4*D*). Next, the network size was fixed, and the effect of varying the synaptic decay time constant range was characterized (Fig. 4*E*).

### Analysis of the Conversion Method.

The previous sections illustrated that our framework for converting rate RNNs to LIF RNNs is robust as long as the network size is not too small (Figs. 2*D* and 4*D*). In this section, we further investigate the relationship between rate and LIF RNN models and characterize other parameters crucial for the conversion to be effective.

#### Training synaptic decay time constants.

As shown in Fig. 5, training the synaptic decay constants for all of the rate units is not required for the conversion to work. Rate RNNs (100 models with different initial conditions) with the synaptic decay time constant fixed to 35 ms (the average of the 20- to 50-ms range used in the previous sections) for all units could be trained and converted to LIF networks that performed the task comparably well.

#### Other LIF parameters.

We also probed how LIF model parameters affected our framework. More specifically, we focused on the refractory period and synaptic filtering. The LIF models constructed in the previous sections used an absolute refractory period of 2 ms and a double-exponential synaptic filter (*Materials and Methods*). Rate models trained on the integration task were converted to LIF models with different refractory period values (Fig. 6*A*). When the refractory period was set to 0 ms, the LIF RNNs still performed the integration task with a moderately high average accuracy (92.8 ± 14.3%), but the best task performance was achieved when the refractory period was set to 2 ms (average performance, 97.0 ± 6.6%) (Fig. 6*A*, *Inset*).

We also investigated how different synaptic filters influenced the mapping process. We first fixed the refractory period to its optimal value (2 ms) and constructed 100 LIF networks with the double-exponential synaptic filter (*Materials and Methods* and Fig. 6*B*, light blue). Next, the synaptic filter was changed to a single-exponential filter; the LIF networks constructed with this filter performed the task comparably well (Fig. 6*B*).
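The 2 synaptic filters can be sketched as discrete-time traces driven by a spike train: the single-exponential trace jumps at each spike and decays, while the double-exponential trace has a finite rise time before decaying. The specific rise and decay constants below are illustrative assumptions, not the paper's exact values:

```python
import numpy as np

def single_exp_filter(spike_train, dt, tau_s=35.0):
    """Single-exponential synaptic trace: instantaneous jump at each spike, then decay."""
    s = 0.0
    out = np.zeros(len(spike_train))
    for t, spk in enumerate(spike_train):
        s += -dt * s / tau_s + spk / tau_s
        out[t] = s
    return out

def double_exp_filter(spike_train, dt, tau_r=2.0, tau_d=35.0):
    """Double-exponential trace: finite rise time tau_r followed by slower decay tau_d."""
    s, h = 0.0, 0.0
    out = np.zeros(len(spike_train))
    for t, spk in enumerate(spike_train):
        s += dt * (-s / tau_d + h)                 # slow decaying variable
        h += -dt * h / tau_r + spk / (tau_r * tau_d)  # fast rise variable driven by spikes
        out[t] = s
    return out

spikes = np.zeros(2000); spikes[100] = 1.0         # one spike at t = 5 ms (dt = 0.05 ms)
s1 = single_exp_filter(spikes, 0.05)               # peaks at the spike time
s2 = double_exp_filter(spikes, 0.05)               # peaks a few milliseconds later
```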

#### Initial connectivity weight scaling.

We considered the role of the connectivity weight initialization in our framework. In the previous sections, the connectivity weights were initialized as random Gaussian variables scaled by a gain factor (*Materials and Methods*). Previous studies have shown that rate networks operating in a high-gain regime exhibit rich dynamics favorable for training. To assess the effect of the initial gain, we trained and converted rate RNNs initialized with several different gain values (Fig. 6*C*) to perform the contextual integration task. The LIF models performed the task equally well across all of the gain terms considered (no statistical significance detected).
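A gain-scaled Gaussian initialization can be sketched as follows. The scaling convention g/√N (which makes the spectral radius of a dense matrix approximately g) and the specific gain value are assumptions for illustration:

```python
import numpy as np

def init_weights(N, g, p=1.0, seed=0):
    """Gaussian recurrent weights with SD g/sqrt(N), optionally sparsified with density p."""
    rng = np.random.default_rng(seed)
    W = rng.normal(0.0, g / np.sqrt(N), (N, N))
    mask = rng.random((N, N)) < p        # structural sparsity mask (p = 1.0 keeps all weights)
    return W * mask

# gain g = 1.5 places a dense random network in the high-gain (rich-dynamics) regime
W = init_weights(250, g=1.5)
```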

#### Transfer function.

One of the most important factors that determines whether rate RNNs can be mapped to LIF RNNs in a one-to-one manner is the nonlinear transfer function used in the rate models. We considered 3 nonnegative transfer functions commonly used in the machine learning field to train rate RNNs on the Go-NoGo task: sigmoid, rectified linear, and softplus functions (Fig. 7*A* and *SI Appendix*). For each transfer function, 100 rate models were trained and converted to LIF networks. Although rate RNNs could be trained with all 3 transfer functions (Fig. 7*B*), the average task performance and the number of successful LIF RNNs were highest for the rate models trained with the sigmoid transfer function (Fig. 7*C*). None of the rate models trained with the rectified linear transfer function could be successfully mapped to LIF models, while the spiking networks constructed from the rate models trained with the softplus function were not robust and produced incorrect responses (*SI Appendix*, Fig. S6).
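For reference, the 3 transfer functions can be written as short NumPy helpers. Note that only the sigmoid is bounded above, which offers one intuition for the result: its outputs map naturally onto normalized firing rates, whereas the rectified linear and softplus outputs are unbounded:

```python
import numpy as np

def sigmoid(x):
    """Bounded in (0, 1): interpretable directly as a normalized firing rate."""
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    """Rectified linear: nonnegative but unbounded above."""
    return np.maximum(0.0, x)

def softplus(x):
    """Smooth approximation of relu: also unbounded above."""
    return np.log1p(np.exp(x))
```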

## Discussion

In this study, we presented a simple framework that harnesses the dynamics of trained continuous rate network models to produce functional spiking RNN models. We identified a set of parameters required to directly transform trained rate RNNs to LIF models, thus establishing a one-to-one correspondence between these 2 model types. Despite the additional spiking-related parameters, surprisingly, only a single parameter (i.e., the scaling factor) was required for the LIF RNN models to closely mimic their counterpart rate models. Furthermore, the framework can flexibly impose functional connectivity constraints and heterogeneous synaptic time constants.

We investigated and characterized the effects of several model parameters on the stability of the transfer learning from rate models to spiking models. The parameters critical for the mapping to be robust included the network size, choice of activation function for training rate RNNs, and a constant factor to scale down the connectivity weights of the trained rate networks. Although the softplus and rectified linear activation functions are popular for training deep neural networks, we demonstrated that the rate networks trained with these functions do not translate robustly to LIF RNNs (Fig. 7). However, the rate models trained with the sigmoid function were transformed to LIF models with high fidelity.

Another important parameter was the constant scaling factor used to scale down the connectivity weights of the trained rate networks (Fig. 2*A*). Training the synaptic decay time constants, the choice of synaptic filter (between single- and double-exponential filters), and the connectivity weight initialization did not affect the mapping procedure (Figs. 5 and 6 *B* and *C*).

The type of approach used in this study (i.e., conversion of a rate network to a spiking network) has been previously used in neuromorphic engineering to construct power-efficient deep spiking networks (31–36). These studies mainly used feedforward multilayer networks or convolutional neural networks aimed at accurately classifying input signals or images, without placing too much emphasis on biophysical limitations. The overarching goal in these studies was to maximize task performance while minimizing power consumption and computational cost. However, the main aim of this study was to construct spiking recurrent network models that abide by important biological constraints in order to relate emerging mechanisms and dynamics to experimentally observed findings. To this end, we have carefully designed our continuous rate RNNs to include several biological features. These include 1) recurrent architectures, 2) sparse connectivity that respects Dale’s principle, and 3) heterogeneous synaptic decay time constants.

For constructing spiking RNNs, recent studies have proposed methods that built on the FORCE method to train spiking RNNs (8, 20–22). Conceptually, our work is most similar to the work by DePasquale et al. (21). The method developed by DePasquale et al. (21) also relies on mapping a trained continuous-variable rate RNN to a spiking RNN model. However, the rate RNN model used in their study was designed to provide dynamically rich auxiliary basis functions meant to be distributed to overlapping populations of spiking units. For this reason, the relationship between their rate and spiking models is rather complex, and it is not straightforward to impose functional connectivity constraints on their spiking RNN model. An additional procedure was introduced to implement Dale’s principle, but this led to more fragile spiking networks with considerably increased training time (21). The one-to-one mapping between rate and spiking networks used in our method solved these problems without sacrificing network stability or computational cost: the biophysical constraints that we wanted to incorporate into our spiking model were implemented in our rate network model first and then transferred to the spiking model.

While our framework incorporated the basic yet important biological constraints, there are several features that are also not biologically realistic in our models. The gradient-descent method used to tune the rate model parameters, including the connectivity weights and the synaptic decay time constants, in a supervised manner is not biologically plausible. Although tuning of the synaptic time constants is not realistic and has not been observed experimentally, previous studies have underscored the importance of the diversity of synaptic timescales both in silico and in vivo (8, 29, 30). In addition, other works have validated and uncovered neural mechanisms observed in experimental settings using RNN models trained with backpropagation (7, 13, 37), thus highlighting that a network model can be biologically plausible even if it was constructed using nonbiological means. Another limitation of our method is the lack of temporal coding in our LIF models. Since our framework involves rate RNNs that operate in a rate-coding scheme, the spiking RNNs that our framework produces also use rate coding by nature. Previous studies have shown that spike coding can improve spiking efficiency and enhance network stability (20, 38, 39), and recent studies emphasized the importance of precise spike coordination without modulations in firing rates (40, 41). Lastly, our framework does not model nonlinear dendritic processes, which have been shown to play a significant role in efficient input integration and flexible information processing (22, 42, 43). Incorporating nonlinear dendritic processes into our platform using the method proposed by Thalmeier et al. (22) will be an interesting next step to further investigate the role of dendritic computation in information processing.

In summary, we provide an easy-to-use platform that converts a continuous recurrent network model with basic biological constraints to a spiking model. The tight relationship between rate and LIF RNN models under certain parameter values suggests that spiking networks could be put together to perform complex tasks traditionally used to train and study continuous rate networks. Future work needs to focus on why and how such a tight relationship emerges. The framework along with the findings presented in this study lay the groundwork for discovering principles on how neural circuits solve computational problems with discrete spikes and for constructing more power-efficient spiking networks. Extending our platform to incorporate other commonly used neural network architectures could help design biologically plausible deep learning networks that operate at a fraction of the power consumption required for current deep neural networks.

## Materials and Methods

The implementation of our framework and the code to generate all of the figures in this work are available at https://github.com/rkim35/spikeRNN. The repository also contains implementations of other tasks, including autonomous oscillation and delayed match-to-sample tasks.

All of the trained models used in this study have been deposited into Open Science Framework (44).

### Continuous Rate Network Structure.

The continuous rate RNN model contains N units recurrently connected to one another. The dynamics of the model is governed by

$$\tau_i \frac{dx_i}{dt} = -x_i + \sum_{j=1}^{N} W_{ij}\, r_j(t) + I_{\text{ext},i}(t)$$

where $\tau_i$ is the synaptic decay time constant of unit i, $x_i$ is the synaptic current variable, $r_j = \sigma(x_j)$ is the firing rate estimate obtained by applying the transfer function $\sigma(\cdot)$, $W_{ij}$ is the recurrent connection weight from unit j to unit i, and $I_{\text{ext},i}$ is the external current input (*Training Details* discusses how these are initialized and optimized).

The connectivity weight matrix (W) was initialized as a sparse random matrix with weights drawn from a Gaussian distribution scaled by a gain factor (*Training Details*).

The external currents ($I_{\text{ext}}$) consisted of task-specific input signals (*SI Appendix*) along with a Gaussian white noise variable.

The output of the rate RNN at time t is computed as a linear readout of the population activity:

$$o(t) = \mathbf{w}_{\text{out}}\, \mathbf{r}(t)$$

where $\mathbf{w}_{\text{out}}$ contains the readout weights.
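A sketch of this linear readout with a toy 2-unit rate trace (the variable names are illustrative):

```python
import numpy as np

def readout(rates, w_out):
    """Linear readout of population activity: o(t) = sum_i w_out_i * r_i(t)."""
    return rates @ w_out

rates = np.array([[0.2, 0.8],   # rates of 2 units at time step 0
                  [0.5, 0.5]])  # and at time step 1
w_out = np.array([1.0, -1.0])   # readout weights
o = readout(rates, w_out)       # one output value per time step
```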

Eq. **5** is discretized using the first-order Euler approximation method:

$$x_i(t + \Delta t) = \left(1 - \frac{\Delta t}{\tau_i}\right) x_i(t) + \frac{\Delta t}{\tau_i}\left(\sum_{j=1}^{N} W_{ij}\, r_j(t) + I_{\text{ext},i}(t)\right)$$

### Spiking Network Structure.

For our spiking RNN model, we considered a network of LIF units governed by

$$\tau_m \frac{dv_i}{dt} = -(v_i - v_{\text{rest}}) + I_i(t)$$

where $\tau_m$ is the membrane time constant, $v_i$ is the membrane voltage of unit i, and $I_i$ is the total current input to the unit. When the membrane voltage reaches the action potential threshold, the unit emits a spike, and the voltage is reset and held at the reset potential for the duration of the absolute refractory period (*Training Details*). The spike train produced by unit i is represented as a sum of Dirac δ functions, and the recurrent input to each unit is obtained by filtering the presynaptic spike trains with a synaptic filter and weighting them by the recurrent connectivity matrix.

The external current input to the spiking network is identical to the one used for the rate model (*Continuous Rate Network Structure*). The only difference is the addition of a constant background current set near the action potential threshold (see below).

The output of our spiking model at time t is given by a linear readout of the synaptically filtered spike trains, analogous to the readout of the rate model.

Other LIF model parameters were set to the values used by Nicola and Clopath (23). These include the action potential threshold (−40 mV), the reset potential (−65 mV), the absolute refractory period (2 ms), and the constant bias current (−40 pA). The parameter values for the LIF and the QIF models are listed in *SI Appendix*, Table S1.

### Training Details.

In this study, we only considered supervised learning tasks. A task-specific target signal (z) is used along with the rate RNN output (o) to define the loss function as the error between the 2 signals summed over time (Eq. **8**).

In order to train the rate model to minimize the above loss function (Eq. **8**), we used the adaptive moment estimation (Adam) stochastic gradient descent algorithm. The learning rate was set to 0.01, and the TensorFlow default values were used for the first and second moment decay rates. The gradient descent method was used to optimize the following parameters in the rate model: the synaptic decay time constants, the input and recurrent connectivity weights, and the readout weights.

Here, we describe the method to train the synaptic decay time constants. A sigmoid function (Eq. **6**) is used to constrain the time constants to be nonnegative and bounded by the minimum and maximum values. The gradient of the loss function (Eq. **8**) is then backpropagated to update the time constants at each iteration.
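One way to realize such a bounded sigmoid parametrization, consistent with the 20- to 50-ms range used in *Results* (the exact mapping form below is an assumption sketched for illustration): an unconstrained trainable parameter is squashed into the allowed interval, so gradient updates can never push the time constant out of bounds.

```python
import numpy as np

def tau_from_param(c, tau_min=20.0, tau_max=50.0):
    """Map an unconstrained trainable parameter c_i to a time constant in [tau_min, tau_max]."""
    return tau_min + (tau_max - tau_min) / (1.0 + np.exp(-c))

c = np.array([-10.0, 0.0, 10.0])   # unconstrained parameters, one per unit
tau = tau_from_param(c)            # near 20 ms, exactly 35 ms, and near 50 ms
```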

The method proposed by Song et al. (13) was used to impose Dale’s principle and create separate excitatory and inhibitory populations. Briefly, the recurrent connectivity matrix (W) is parametrized as the product of a matrix with nonnegative entries and a fixed diagonal matrix whose entries are +1 for excitatory units and −1 for inhibitory units, so that each unit makes connections of only one sign.

To impose specific connectivity patterns, we apply a binary mask elementwise to the parametrized connectivity matrix (Eq. **9**). For example, this allows implementation of the somatostatin microcircuitry motif described in *Results* (*SI Appendix*, Fig. S3).
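A sketch of this sign-constrained parametrization with an optional structural mask, assuming an 80/20 excitatory/inhibitory split for illustration (the ratio and function names are not taken from the paper):

```python
import numpy as np

def dale_weights(W_raw, exc_frac=0.8, mask=None):
    """Rectify trainable weights and fix column signs by population identity (Dale's principle)."""
    N = W_raw.shape[0]
    n_exc = int(exc_frac * N)
    D = np.diag([1.0] * n_exc + [-1.0] * (N - n_exc))  # +1 excitatory, -1 inhibitory columns
    W = np.maximum(0.0, W_raw) @ D                     # nonnegative magnitudes, fixed signs
    if mask is not None:
        W = W * mask                                   # optional binary connectivity mask
    return W

rng = np.random.default_rng(0)
W = dale_weights(rng.normal(0.0, 0.1, (100, 100)))
```

Because the trainable matrix is rectified before the sign matrix is applied, gradient updates can change connection magnitudes but never flip a unit from excitatory to inhibitory.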

### Transfer Learning from a Rate Model to a Spiking Model.

In this section, we describe the method that we developed to perform transfer learning from a trained rate model to an LIF model. After the rate RNN model is trained using the gradient descent method, the rate model parameters are transferred to an LIF network in a one-to-one manner. First, the LIF network is initialized to have the same topology as the trained rate RNN. Second, the input weight matrix, the recurrent connectivity matrix, and the readout weights learned by the rate model are carried over to the corresponding weights of the LIF network.

If the recurrent connectivity weights from the trained rate model are transferred to a spiking network without any changes, the spiking model produces largely fluctuating signals (as illustrated in Fig. 2*B*), because the LIF firing rates are significantly larger than 1 (whereas the firing rates of the rate model are constrained to range between 0 and 1 by the sigmoid transfer function).

To place the spiking RNN in a similar dynamic regime as the rate network, we first assume a linear relationship, governed by a constant scaling factor (λ), between the rate model connectivity weights and the spike model weights.

Using the above assumption, the synaptic drive (d) that unit i in the LIF RNN receives can be expressed as the weighted sum of the synaptically filtered spike trains of its presynaptic units (Eq. **10**).

Similarly, unit i in the rate RNN model receives a synaptic drive at time t given by the weighted sum of the presynaptic firing rates (Eq. **11**).

If we set the above 2 synaptic drives (Eqs. **10** and **11**) equal to each other, we obtain an expression for the scaling factor (Eq. **12**). Applying Eq. **12** to all of the units in the network yields the scaling of the full connectivity matrix.

The readout weights from the rate model are transferred to the spiking model in a similar manner.

In order to find the optimal scaling factor, we developed a simple grid search algorithm. For a given range of candidate values for λ, an LIF network is constructed for each value, and the value that yields the best task performance is selected.
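The grid search reduces to evaluating candidate scaling factors and keeping the best. The sketch below substitutes a toy performance function peaking at a hypothetical λ = 40 in place of a real LIF network evaluation; the candidate range is likewise illustrative:

```python
import numpy as np

def grid_search_lambda(evaluate, lambdas):
    """Return the scaling factor with the highest task performance.

    `evaluate(lam)` should build a spiking network with scaling factor `lam`
    and return its task performance; here it is a user-supplied callable.
    """
    perf = [evaluate(lam) for lam in lambdas]
    best = int(np.argmax(perf))
    return lambdas[best], perf[best]

def toy_performance(lam):
    """Hypothetical stand-in: performance peaks at lam = 40 (not a real evaluation)."""
    return float(np.exp(-((lam - 40.0) / 10.0) ** 2))

best_lam, best_perf = grid_search_lambda(toy_performance, np.arange(20, 80, 5))
```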

## Acknowledgments

We thank Ben Huh, Gerald Pao, Jason Fleischer, Debha Amatya, Yusi Chen, and Ben Tsuda for helpful discussions and feedback on the manuscript. We also thank Jorge Aldana for assistance with computing resources. This work was funded by National Institute of Mental Health Grant F30MH115605-01A1 (to R.K.), the Harold R. Schwalenberg Medical Scholarship (R.K.), and the Burnand–Partridge Foundation Scholarship (R.K.). We acknowledge the support of the NVIDIA Corporation with the donation of the Quadro P6000 graphics processing unit used for this research. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

## Footnotes

^{1}To whom correspondence may be addressed. Email: rkim{at}salk.edu or terry{at}salk.edu.

Author contributions: R.K. and T.J.S. designed research; R.K., Y.L., and T.J.S. performed research; R.K., Y.L., and T.J.S. analyzed data; and R.K., Y.L., and T.J.S. wrote the paper.

Reviewers: L.A., Columbia University; and D.S., Google.

The authors declare no competing interest.

Data deposition: The data reported in this paper have been deposited in Open Science Framework, https://osf.io/jd4b6/.

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1905926116/-/DCSupplemental.

Published under the PNAS license.

## References

- C. M. Kim, C. C. Chow
- F. Mastrogiuseppe, S. Ostojic
- H. F. Song, G. R. Yang, X. J. Wang
- Z. Zhang, Z. Cheng, Z. Lin, C. Nie, T. Yang
- D. Huh, T. J. Sejnowski (in S. Bengio et al., Eds.)
- J. H. Lee, T. Delbruck, M. Pfeiffer
- B. DePasquale, M. M. Churchland, L. F. Abbott
- D. Thalmeier, M. Uhlmann, H. J. Kappen, R. M. Memmesheimer
- P. J. Werbos
- J. Martens, I. Sutskever (in L. Getoor, T. Scheffer, Eds.)
- R. Pascanu, T. Mikolov, Y. Bengio (in S. Dasgupta, D. McAllester, Eds.)
- Y. Bengio, N. Boulanger-Lewandowski, R. Pascanu (in R. Ward, L. Deng, Eds.)
- D. F. Wasmuht, E. Spaak, T. J. Buschman, E. K. Miller, M. G. Stokes
- S. E. Cavanagh, J. P. Towers, J. D. Wallis, L. T. Hunt, S. W. Kennerley
- Y. Cao, Y. Chen, D. Khosla
- P. U. Diehl et al. (in D.-S. Huang, Ed.)
- P. U. Diehl, G. Zarrella, A. Cassidy, B. U. Pedroni, E. Neftci (in S. Williams, Ed.)
- E. Hunsberger, C. Eliasmith
- B. Rueckauer, I. A. Lungu, Y. Hu, M. Pfeiffer
- A. Sengupta, Y. Ye, R. Wang, C. Liu, K. Roy
- W. Chaisangmongkon, S. K. Swaminathan, D. J. Freedman, X. J. Wang
- A. Alemi, C. K. Machens, S. Denéve, J. J. E. Slotine (in S. McIlraith, K. Weinberger, Eds.)
- B. B. Ujfalussy, J. K. Makara, T. Branco, M. Lengyel
- R. Kim, Y. Li, T. J. Sejnowski