# Cell-type–specific neuromodulation guides synaptic credit assignment in a spiking neural network

^{a}Department of Applied Mathematics, University of Washington, Seattle, WA 98195;^{b}Allen Institute for Brain Science, Seattle, WA 98109;^{c}Computational Neuroscience Center, University of Washington, Seattle, WA 98195;^{d}Department of Molecular and Cellular Physiology, Stanford University, Stanford, CA 94305


Edited by Terrence Sejnowski, Computational Neurobiology Laboratory, Salk Institute for Biological Studies, La Jolla, CA; received June 28, 2021; accepted October 28, 2021

## Significance

Synaptic connectivity provides the foundation for our present understanding of neuronal network function, but static connectivity cannot explain learning and memory. We propose a computational role for the diversity of cortical neuronal types and their associated cell-type–specific neuromodulators in improving the efficiency of synaptic weight adjustments for task learning in neuronal networks.

## Abstract

Brains learn tasks via experience-driven differential adjustment of their myriad individual synaptic connections, but the mechanisms that target appropriate adjustment to particular connections remain deeply enigmatic. While Hebbian synaptic plasticity, synaptic eligibility traces, and top-down feedback signals surely contribute to solving this synaptic credit-assignment problem, alone, they appear to be insufficient. Inspired by new genetic perspectives on neuronal signaling architectures, here, we present a normative theory for synaptic learning, where we predict that neurons communicate their contribution to the learning outcome to nearby neurons via cell-type–specific local neuromodulation. Computational tests suggest that neuron-type diversity and neuron-type–specific local neuromodulation may be critical pieces of the biological credit-assignment puzzle. They also suggest algorithms for improved artificial neural network learning efficiency.

Mathematical “gradient backpropagation” algorithms (1, 2) now solve the problem of credit assignment for artificial neural networks effectively enough to have ushered in an era of shockingly powerful artificial intelligence. Nevertheless, their exact implementation on advanced tasks can be extremely costly in terms of computation, storage, and circuit interconnects (3), driving a search for more efficient credit-assignment algorithms, such as approximate gradient methods (4–7), which, e.g., limit temporal contributions to learning (8) or exploit neuromorphic methods to improve energy efficiency (9, 10). Neuroscientists meanwhile recognize that exact gradient backpropagation demands precise, but nonlocal, communication that is implausible in the biological brain and instead propose approximate learning rules that sidestep these demands. These have shown impressive performance, largely in feedforward networks (11–20), with recent extensions to the more enigmatic case of recurrently connected networks (21, 22). This said, biological neural networks feature a spectacular array of dynamical and signaling mechanisms whose potential contributions to credit assignment have not yet been considered. Taken together, this creates a remarkable need and opportunity for bio-inspired network-learning algorithms to advance both neuroscience and computer science research. Here, we follow this path and present evidence for a previously unrecognized temporal credit-assignment mechanism inspired by recent advances in brain genetics.

Prior efforts to address the biology of synaptic credit assignment have focused on Hebbian spike-timing-dependent synaptic plasticity and “top-down” (TD) signaling by dopamine (13, 23–25), a monoamine neuromodulator released from axons that ramify extensively throughout the brain from small midbrain nuclei. All cellular actions of dopamine are exerted by activation of G protein-coupled receptors (GPCRs), which can strongly modulate the timing dependence of Hebbian synaptic plasticity (25–27). While such actions clearly contribute to synaptic credit assignment, and recent evidence suggests spatiotemporal sculpting of the dopaminergic signal (28, 29), biologically plausible models based on these principles significantly underperform gradient backpropagation algorithms, let alone the brain, and many gaps in our understanding remain (12, 13).

Transcriptomic studies have now revealed that genes encoding hundreds of other modulatory GPCRs, including those selective for serotonin, norepinephrine, acetylcholine, amino acids, and the many neuropeptides (NPs), are expressed throughout the brain. Downstream actions of these other GPCRs on nerve membranes and synapses are similar to those of dopamine receptors, suggesting that they, too, could participate in credit assignment. Single-cell RNA-sequencing studies now show also, however, that expression of this diverse array of GPCR genes is highly neuron-type–specific (30). Furthermore, virtually every neuron expresses one or more GPCR-targeting NP ligands, again, in highly neuron-type–specific patterns. These new single-cell transcriptomic data thus suggest the prospect of an interplay between synaptic and local peptidergic modulatory networks that could help to guide credit assignment.

The new genetic results have led us to formulate a theory of network learning that casts neuronal networks in terms of interacting synaptic and modulatory connections, with discrete neuron types as common nodes. To explore this normative theory, we have instantiated the simplified computational model schematized by Figs. 1 and 2. The model comprises both dopamine-like TD and NP-like local modulatory signaling, shown with a network of arrow-spray glyphs connecting populations of cells, in addition to synaptic transmission via discrete spikes, shown with a network of lines connecting individual cells (Fig. 1*A*). Our model has multiple neuron types distinguished by both their synaptic connectivity and differential expression of modulatory ligand–receptor pairs that regulate Hebbian synaptic plasticity (Fig. 1 *B* and *C*). Our proof-of-concept implementation shows significant improvements over previous literature. The key development is that neurons utilize modulatory networks to actively broadcast their own contribution to the network performance to nearby neurons via cell-type–specific local neuromodulation (Figs. 1*D* and 2*C*)—specifically, each cell broadcasts its own direct contribution to the overall task “error” signal. This is a major departure from more global roles for modulators previously proposed, such as carrying error or reward signals. From a neuroscience perspective, our study proposes a model of cortical learning shaped by the interplay of local modulatory signaling carrying credit-assignment information and synaptic transmission and potentially brings us closer to understanding biological intelligence. From a computer science perspective, our method offers a significantly smaller number of interconnects for on-chip neuro-inspired artificial intelligence.

## Results

### Overview of Multidigraph Learning in Recurrent Spiking Neural Networks.

Gradient descent on the task error (or negative reward) *E* can iteratively adjust synaptic weights to learn the task. However, computing the error gradient in a recurrent network requires unwrapping the dynamics over time, because weights influence future activity in synaptically far-away neurons. The Backpropagation Through Time (BPTT) and Real-Time Recurrent Learning (RTRL) algorithms calculate this error gradient by allowing cell-specific, nonlocal communication among synapses in adjusting their weights; for example, in Fig. 1*A*, the synapse between the uppermost cells labeled C and E would receive information about the many synaptic weights and cell activities downstream of that cell E. They also require either noncausal dependencies (BPTT) or infeasible memory scalability (RTRL) (detailed in *Methods*, Eqs. **2**–**17**; see Fig. 6). Faced with this, a key step in state-of-the-art rate-based (21) and spike-based (ref. 22; “e-prop”) biologically plausible learning algorithms is to drop the nonlocal terms, so that only the activities of the presynaptic and postsynaptic neurons are needed to update the weight of the synapse between them. As illustrated in Fig. 2*A*, the resulting three-factor local learning rules represent this presynaptic and postsynaptic information as a time-dependent eligibility trace (ET) and combine it with TD signals to update the weight (*Methods*; see Figs. 5*B* and 6 *D*, *ii*; see also ref. 7).
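As a concrete illustration, the three-factor structure just described can be sketched in a few lines of Python. All array names, sizes, and the simple exponential low-pass filter for the ET are our own illustrative assumptions, not the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
T, n = 100, 5                          # time steps, neurons (toy sizes)

# Illustrative stand-ins for network signals (not the paper's variables):
pre_spikes = rng.random((T, n)) < 0.1  # presynaptic spike trains
post_sens = rng.random((T, n))         # postsynaptic sensitivity per step
L_td = rng.standard_normal(T)          # top-down (TD) learning signal per step

decay = 0.9                            # assumed ET low-pass constant
et = np.zeros((n, n))                  # one eligibility trace per synapse
dw = np.zeros((n, n))                  # accumulated weight update

for t in range(T):
    # Factors 1 and 2: filtered coincidence of pre and post activity
    et = decay * et + np.outer(post_sens[t], pre_spikes[t])
    # Factor 3: the TD signal gates the eligibility trace into a weight change
    dw += L_td[t] * et
```

The key locality property is visible here: each synapse's update uses only its own pre/post trace plus a shared TD signal, with no access to other cells' weights.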

We propose a role for cell-type-based modulatory signals in recovering a key part of the error-gradient information that is lost by dropping nonlocal terms in such conventional three-factor rules. We describe this in terms of the update to the synapse from neuron *q* to neuron *p* with strength *w _{pq}*. To begin, we consider the contributions to the error gradient of those neurons that are one synapse away from the postsynaptic site (i.e., neurons *j* such that the synapse *w _{jp}* exists; *Methods*; see Fig. 5*C*). We find that this set of contributions can be realized by combining activity-dependent signals emitted by neurons *j* with the ET for the synapse *w _{pq}* (Eq. **19**). Intriguingly, this signal is precisely neuron *j*'s contribution to the task error, thereby taking into account the indirect contribution of the synapse *w _{pq}*, routed through cells *j*, for more accurate synaptic credit assignment.

This initial form, however, still requires the knowledge of cell-specific signals from cells *j* not participating in the synapse of interest. We further make the key observation that when just the contributions from cells up to two synapses away are considered, those terms only appear under a sum: The mechanism updating the synapse *w _{pq}* need not know the individual signals from cells *j*, as their sum suffices. This observation is critical in elucidating a role for diffusive neuromodulatory signaling in carrying this summed, indirect signal and thus serving as an additional factor in synaptic plasticity.

To fully remove cell-specific dependencies in the indirect signal, we further approximate the cell-specific weights *w _{jp}* that it contains with cell-type–specific terms *w _{αβ}*, where cell *j* belongs to type *α* and presynaptic cell *p* belongs to type *β*. We postulate that *w _{αβ}* represents the affinity of GPCRs expressed by cells of type *β* to peptides secreted by cells of type *α* (Fig. 1*C* and *Methods*, Eq. **20**; see Fig. 5*D*), and that this cell-type–specific variable is genetically determined. The local diffusion assumption (35) suggests a further idealization, where this type of signaling is registered only by local synaptic partners and therefore preserves the connectivity structure of *w _{jp}* (Eq. **23**). It is also worth noting that the rich set of ligand and receptor types with different downstream actions (30) can support that *w _{αβ}* is a signed term.

Bringing these together, we have the learning rule illustrated in Fig. 2*B*. The weight update is given as

$$\Delta w_{pq} = -\lambda \sum_{t} \Big[ L_{t,p} + \sum_{\alpha \in C} w_{\alpha\beta} \sum_{j \in \alpha} LM_{t,j} \Big]\, ET_{t,pq}, \tag{1}$$

where neuron *j* is of type *α* and neuron *p* is of type *β*, *C* denotes the set of neuronal cell types, *L _{t,p}* denotes the TD learning signal received by cell *p*, *w _{αβ}* denotes the effect of ligands secreted by class *α* neurons on class *β* receptors, *LM _{t,j}* denotes the local modulatory signal emitted by cell *j* (with the inner sum restricted to local synaptic partners of cell *p*), and *ET _{t,pq}* denotes the eligibility trace of the synapse from cell *q* to cell *p* (Fig. 2*C*; see also Eq. **24** for details). Specifically, the modulatory signal *LM _{t,j}* in Eq. **1** is precisely the contribution of cell *j* to the task error, as measured by the (partial) derivative of the error with respect to cell *j*'s membrane potential. Moreover, this framework proposes that cell-type–specific GPCR affinities allow these local signals to be informative without the need to know precise synaptic weights. The ability to assess the indirect impact of neurons on the overall loss via such communication is critical to accurate synaptic reweighting and improved performance over existing biologically plausible rules, as we demonstrate next (Fig. 3 and *SI Appendix*, Fig. S2).
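To make the structure of this update concrete, here is a single-time-step sketch in Python. All array names, sizes, the learning rate, and the random stand-ins for signals are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(2)
n, n_types = 6, 2
cell_type = np.array([0, 0, 0, 1, 1, 1])          # hypothetical type labels
adj = rng.random((n, n)) < 0.5                    # adj[j, p]: synapse p -> j
L = rng.standard_normal(n)                        # TD learning signal per cell
lm = rng.standard_normal(n)                       # local modulatory signals LM
et = rng.standard_normal((n, n))                  # eligibility traces ET[p, q]
w_type = rng.standard_normal((n_types, n_types))  # stand-in GPCR affinities

lr = 1e-2
dw = np.zeros((n, n))
for p in range(n):
    # Modulatory input to cell p: signals from its downstream synaptic
    # partners j, weighted by the type-pair affinity w_type[type(j), type(p)]
    js = np.where(adj[:, p])[0]
    mod = sum(w_type[cell_type[j], cell_type[p]] * lm[j] for j in js)
    # One time step of an Eq. 1-style update for all synapses onto cell p
    dw[p, :] = -lr * (L[p] + mod) * et[p, :]
```

Note that cell *p* never reads the individual weights *w _{jp}*: it only needs the summed, type-weighted modulatory input, which is the quantity a diffuse local neuromodulator could plausibly carry.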

In summary, we have proposed a rule for updating a synapse *w _{pq}*, which we refer to as the multidigraph learning rule, or MDGL, where the Hebbian ET is compounded not only with TD learning signals—as in modern biologically plausible learning rules (31, 32)—but also with cell-type–specific, diffuse modulatory signals.

### Simulation Framework for Testing Multidigraph Learning in Recurrent Spiking Neural Networks.

To test the MDGL formulation, we study its performance in recurrent spiking neural networks (RSNNs) learning well-known tasks involving temporal processing: pattern generation, delayed match to sample, and evidence accumulation. We use two main cell classes, inhibitory (I) and excitatory (E) cells, and obey experimentally observed constraints (e.g., refractoriness, synaptic delay, and connection sparsity). We further endow a fraction of the E cells with threshold adaptation (37). This mimics the hierarchical structure of cell types (38) through the simple example of two main cell types, one of which has two subtypes (E cells with and without threshold adaptation). The existence/lack of synaptic connections to output neurons further divides each population into two, thus bringing the cell type tally to six in our conceptual model (Fig. 1). Our implementation does not involve rapid and random formation of new synapses after each experience (39), further increasing its biological plausibility.

We compare the learning performance of MDGL (Fig. 2*B*) with the state-of-the-art biologically plausible learning rule [e-prop (22)] (Fig. 2*A* and *SI Appendix*, Fig. S1). As a three-factor rule, e-prop does not involve local cell-type–specific signaling and restricts the update to depend only on presynaptic and postsynaptic activity, as well as a TD instructive signal. To provide a lower bound on task error, we also compare performance with BPTT (see Fig. 5*A*), which uses exact error gradients to update weights. These learning rules are further illustrated in *Methods*; see Fig. 5 *A*, *B*, and *D*.

### Multidigraph Learning Guides Temporal Credit Assignment in Benchmark Tasks.

We first study pattern generation with RSNNs, where the aim is to produce a one-dimensional target output, generated from the sum of five sinusoids, given a fixed Poisson input realization (40). We change the target output and the Poisson input along with the initial weights for different training runs (*SI Appendix*, Fig. S3*A*) and illustrate the learning curve in Fig. 3*A* and *SI Appendix*, Fig. S4*A* across five such runs. We observe that MDGL performs significantly better than e-prop.
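A target of this general form can be sketched as below; the frequency and phase distributions are our own illustrative choices rather than those of ref. 40:

```python
import numpy as np

rng = np.random.default_rng(5)
T = 1000
t = np.arange(T) / T                     # one simulated trial, normalized time

# Sum-of-five-sinusoids target; frequency and phase draws are illustrative
freqs = rng.integers(1, 6, size=5)       # cycles per trial
phases = rng.uniform(0.0, 2.0 * np.pi, size=5)
target = sum(np.sin(2.0 * np.pi * f * t + p) for f, p in zip(freqs, phases))
```

Redrawing `freqs` and `phases` (along with the Poisson input and initial weights) gives a fresh target for each training run, as described above.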

Next, to study how RSNNs can learn to process discrete cues that impact delayed rewards, we consider a delayed match to sample task (41). Here, two cue alternatives are encoded by the presence/absence of input spikes. The RSNN is trained to remember the first cue and learn to compare it with the second cue delivered at a later time (*SI Appendix*, Fig. S3*B*). Fig. 3*B* and *SI Appendix*, Figs. S4*B* and S5 display the learning curve for novel inputs. We observe that the same general conclusions as for the pattern-generation task hold; introducing cell-type–specific neuromodulation significantly improves learning outcomes.

Finally, we study an evidence-accumulation task (29), which involves integration of several cues to produce the desired output at a later time: A simulated agent moves along a path while encountering a series of sensory cues presented either on the right or left side of a track (Fig. 3*C* and *SI Appendix*, Figs. S3*C* and S4*C*). When it reaches a T-junction, it decides if more cues were received on the left or right. We test our learning rule to see if the addition of diffuse modulatory signals can indeed bring the learning curve closer to BPTT, without relying on rapid and random rewiring (39). Fig. 3*C* demonstrates that the performance trends of the previous two experiments continue to hold. *SI Appendix*, Fig. S6 illustrates that without threshold adaptation and recurrent connectivity, the network cannot significantly decrease loss and thus is unable to learn this task. In line with these experiments, gradients approximated by MDGL are more similar to the exact gradients (*SI Appendix*, Fig. S2), shedding light on its superior performance. We also observe that MDGL’s performance depends only weakly on the hypothesized link (Eq. **23**) between abstract cell-type-based connectivities and modulatory receptor affinities (*SI Appendix*, Fig. S7), enabling flexible implementations in vivo and in silico.

We conducted further studies to better quantify how a model's network architecture impacts the performance of MDGL relative to other learning rules. First, as depicted in Figs. 1 and 2, recall that only output-projecting neurons receive TD signals, which is a consequence of gradient-based learning (22), and only these neurons secrete local neuromodulators (i.e., nonzero LM in Eq. **1**). Consistent with this, we found that MDGL is most advantageous relative to e-prop in cases where relatively small fractions of recurrently connected neurons are output projecting, as we may expect in many biological networks (*SI Appendix*, Fig. S8). In these “sparse-output” cases, while many neurons do not receive learning signals in the e-prop formulation, MDGL still allows these neurons to receive such signals via local modulation. The result is a more accurate approximation of gradients and more efficient learning. Next, *SI Appendix*, Fig. S9 demonstrates the effectiveness of using the cell-type–specific, rather than more precise cell-specific, weights for the modulatory signals within the MDGL framework. We find that the cell-type-based approximation does degrade performance, but that this effect is relatively minor.

Finally, we note that while our proof-of-concept implementation assumes symmetric feedback weights for the output connections (i.e., the same output projection weight is used during the computation of TD feedback signal), the random feedback-alignment approach (14) or approximating the feedback weights using the same cell-type-based calculations in Eq. **23** both offer improved biological plausibility for this single-layer feedforward problem.

### Spatiotemporal Extent of Multidigraph Learning.

Owing both to extracellular ligand-diffusion biophysics and the complex metabolic nature of GPCR-based signal transduction, neuromodulation time scales are generally much longer than those of conventional synaptic transmission (42). Since our model does not explicitly limit the communication bandwidths of either channel, comparing the frequency content offers an important check for biological plausibility. It also provides a test of our approximation that the summation over cells in Eq. **19** acts as a smoothing operation. Fig. 4 *A*–*C* and *SI Appendix*, Fig. S10 demonstrate that the modulatory input indeed has significantly lower frequency content than the synaptic input for all three tasks.
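This kind of frequency-content comparison can be illustrated with synthetic signals; the sampling step, smoothing window, and cutoff below are assumptions for the sketch, not values from our simulations:

```python
import numpy as np

rng = np.random.default_rng(3)
T, dt = 1024, 1e-3                      # samples and step size (assumed)

# Synthetic stand-ins: a fast "synaptic" input (white noise) and a slow
# "modulatory" input made by smoothing the same noise over 50 samples
fast = rng.standard_normal(T)
slow = np.convolve(fast, np.ones(50) / 50, mode="same")

def high_freq_fraction(x, cutoff_hz=50.0):
    """Fraction of total spectral power above cutoff_hz."""
    power = np.abs(np.fft.rfft(x - x.mean())) ** 2
    freqs = np.fft.rfftfreq(len(x), d=dt)
    return power[freqs > cutoff_hz].sum() / power.sum()
```

Applying `high_freq_fraction` to the modulatory and synaptic inputs of a trained network is one way to verify that the summation over cells indeed acts as a smoothing operation.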

The distance ranges of diffusive modulatory signals remain uncertain (43). For most of the simulations described here (those according to Eq. **23**), modulatory signaling was limited to synaptically coupled pairs (a token of anatomic proximity), representing an idealization of the short-range signaling case. We also examined the opposite extreme case, where modulatory signals extend to all cells (independent of any anatomic proximity), referring to this nonlocal form of MDGL as NL-MDGL. This would make *w _{αβ}* a worse approximation to *w _{jp}*, presumably degrading the quality of the gradient estimate. Indeed, removing the locality of modulatory signals degrades performance, while remaining superior to performance in the absence of modulatory signaling (Fig. 4 *D*–*F*), suggesting that the biophysics of diffusive modulatory signaling may condition the efficiency of synaptic learning.

## Discussion

Here, we have presented a multidigraph theory and instantiated simple models based on this theory that explicitly represent diverse neuron types classified by their synaptic and neuromodulatory connections. Simulations based on these simple models show that diverse signaling modes can facilitate credit assignment and enhance learning. A wealth of new genetic data provide strong support for the biological plausibility of this array of signaling modes and furthermore argue strongly that most or all modern eumetazoans (all multicellular animals except sponges) comprise numbers of cell types and modulatory signals far in excess of those represented in our simulations (43). We believe therefore that conceiving of neuronal networks as multidigraphs, involving multiple modulatory and synaptic signals, integrated by discrete cell-type nodes, may offer fruitful paths toward the understanding of synaptic credit assignment in biological neuronal networks. This multidigraph theory may also lead to more computationally efficient local learning rules for neural-network-based artificial intelligence.

In addition to established elements of Hebbian plasticity, ETs, and reward feedback signals, our normative theory posits important roles for neuronal cell-type diversity and local neuromodulatory communication in enabling efficient synaptic credit assignment. In particular, our findings predict that neurons secrete information about TD feedback signals they receive to nearby neurons in an activity-dependent and cell-type–specific manner using local modulation. As a consequence, levels of local modulatory signals may reflect the learning process. Indeed, our computational experiments imply that the level of modulatory input decreases over training and sharply rises in response to changes in task condition (*SI Appendix*, Fig. S11). It is also interesting to note that phylogenomic studies now suggest that peptidergic neuromodulation may evolutionarily predate dopamine signaling (43) and thus may have actually provided the foundation upon which dopaminergic TD signaling evolved.

The nature of “intermediate” cells (38), whose phenotypes appear to be a mixture of “pure” cell types, is a key problem in cell-types research. Our findings may explain the existence of such phenotypes from a connectivity perspective: While average connectivities between types remain relatively constant during training, connectivities of individual cells can deviate significantly from those averages (*SI Appendix*, Fig. S12). We hypothesize a link between abstract cell-type-based connectivities and modulatory receptor affinities, where the average synaptic connection weights between types are taken to be the cell-type–specific modulatory receptor affinities (Eq. **23**). How tightly the individual synaptic weights and cell-type–specific receptor affinities should be coupled may be explored in future work. *SI Appendix*, Fig. S2 indicates that even with imprecise GPCR affinities, MDGL can still improve gradient approximation, while *SI Appendix*, Fig. S7 suggests that the effect of imprecise GPCR affinities on the performance is task-dependent.

Learning rules often explicitly minimize a loss function, and the error gradient, if available, tells exactly how much each network parameter should be adjusted to reduce this loss function. Rules that follow this gradient exactly, RTRL and BPTT, are well established, but are not biologically plausible and have unmanageably vast memory-storage demands. However, a growing body of studies has demonstrated that learning rules that only partially follow the gradient, while alleviating some of these problems of the exact rules, can still lead to desirable outcomes (44, 45). An example is the seminal concept of feedback alignment (14), which rivals backpropagation on a variety of tasks, even using random feedback weights for credit assignment. In addition, approximations to RTRL have been proposed (4–9, 21) for efficient online learning in recurrent neural networks. Our learning rule has a memory cost that scales as *O*(*N*^{2}), where *N* is the number of neurons, which is less expensive than SnAp-2, whose storage cost scales as *O*(*N*^{3}).

Examination of the learning capability of MDGL under a broader range of tasks and conditions represents a valuable future avenue. For instance, our preliminary simulations on modulating the delay period in the match-to-sample task (*SI Appendix*, Fig. S13) suggest that such studies can help reveal the reasons underpinning observed animal behavior (47), as well as limitations of MDGL. In addition, brain cells are extremely diverse (38, 48), with a matching diversity in the expression of peptidergic genes (30). Further studies can also investigate the interplay of task complexity and cell diversity. A starting point for that could be further dividing inhibitory cells into subtypes with and without threshold adaptation.

Our work suggests that multiple cell-type–specific, diffuse, and relatively slow modulatory signals should be considered as possible bases for credit-assignment computations. Though inspiration for the present work came primarily from new transcriptomic data on local NP signaling in neocortex (30, 42), it is quite possible that other cell-type–specific neuromodulators could likewise contribute to credit assignment. Many of these alternative agents act, as do NPs, via GPCRs (e.g., the monoamines, amino acids, acetylcholine, and endocannabinoids), but our multidigraph template might even apply to other neuronally secreted neuromodulators, such as the neurotrophins and cytokines, that act via different classes of receptors (49, 50). While experimental tests of such hypotheses have not seemed feasible up until now, emerging methods for genetically addressed measurement of various neuromodulatory signals in specific cell types (30, 51) are now bringing the necessary critical tests within reach (e.g., ref. 52).

## Methods

### Visual Summary of Learning Rules.

An overview of our network model and the mathematical basis of the learning rules used in this work is given in the beginning of *Results*. Here, we first present an additional, more detailed visual illustration of these learning rules in Fig. 5, beginning with the exact gradient update (Fig. 5*A*), as for BPTT, and leading from its dramatic truncation in the e-prop rule (Fig. 5*B*) to MDGL (Fig. 5 *C*–*E*), which partially recovers gradient information lost in this truncation.

#### Spiking neuron model.

We consider a discrete-time implementation of RSNNs. The model, as shown in Fig. 6*A*, denotes the observable states, i.e., spikes, as *z _{t}* at time *t*, and the corresponding hidden states as *s _{t}*. For leaky integrate-and-fire (LIF) cells, the state *s _{t}* corresponds to the membrane potential, and the dynamics of those states are governed by

$$v_{t+1,j} = \alpha v_{t,j} + \sum_{l} w_{jl} z_{t,l} + \sum_{m} w^{IN}_{jm} x_{t+1,m} - z_{t,j} v_{th}, \qquad z_{t,j} = H(v_{t,j} - v_{th}), \tag{2}$$

where *v _{t,j}* denotes the membrane potential of neuron *j* at time *t*, *α* = exp(−*dt*/*τ _{m}*) for discrete time step *dt* and membrane time constant *τ _{m}*, *w _{lj}* denotes the weight of the synaptic connection from neuron *j* to neuron *l*, *w ^{IN} _{jm}* denotes the input weight between input channel *m* and neuron *j*, *x _{t}* denotes the external input spike at time *t*, *v _{th}* denotes the spike threshold, and *H* denotes the Heaviside step function.

Following ref. 22, which implemented adaptive threshold LIF (ALIF) units (37) and observed that this neuron model improves the computing capabilities of RSNNs relative to networks with LIF neurons only, we also include ALIF cells in our model. In addition to the membrane potential, ALIF cells have a second hidden variable, *b _{t}*, governing the adaptive threshold. The spiking dynamics of both LIF and ALIF cells can be characterized by the following set of equations:

$$v_{t+1,j} = \alpha v_{t,j} + \sum_{l} w_{jl} z_{t,l} + \sum_{m} w^{IN}_{jm} x_{t+1,m} - z_{t,j} v_{th}, \tag{3}$$

$$z_{t,j} = H(v_{t,j} - A_{t,j}), \qquad A_{t,j} = v_{th} + \beta b_{t,j}, \tag{4}$$

$$b_{t+1,j} = \rho b_{t,j} + z_{t,j}, \tag{5}$$

where the voltage dynamics in Eq. **3** is the same as in Eq. **2**. A spike is generated when the voltage exceeds the adaptive threshold *A _{t,j}*; *β* controls how much adaptation affects the threshold, and the adaptation state *b _{t,j}* decays with factor *ρ*, given by *ρ* = exp(−*dt*/*τ _{b}*) for time step *dt* and adaptation time constant *τ _{b}*, which is typically chosen on the behavioral-task time scale. For regular LIF neurons without an adaptive threshold, one can simply set *β* = 0.
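A minimal simulation of these ALIF dynamics for a single neuron driven by a constant input is sketched below; all parameter values are illustrative, not those used in our experiments:

```python
import numpy as np

# Discrete-time ALIF neuron (Eqs. 3-5): leaky voltage, spike on crossing an
# adaptive threshold, threshold increment after each spike.
dt, tau_m, tau_b = 1.0, 20.0, 200.0
alpha, rho = np.exp(-dt / tau_m), np.exp(-dt / tau_b)
v_th, beta = 1.0, 0.5

T = 200
drive = 0.12 * np.ones(T)          # constant external input current (assumed)
v = b = 0.0
spikes = []
for t in range(T):
    A = v_th + beta * b            # adaptive threshold (Eq. 4)
    z = 1.0 if v > A else 0.0      # spike generation (Eq. 4)
    v = alpha * v + drive[t] - z * v_th   # leaky voltage with reset (Eq. 3)
    b = rho * b + z                # threshold adaptation (Eq. 5)
    spikes.append(z)
```

Under constant drive, the rising threshold after each spike progressively lengthens interspike intervals, the spike-frequency adaptation that motivates including ALIF cells.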

#### Network output and loss function.

Dynamics of leaky, graded readout neurons is implemented as

$$y_{t,k} = \kappa y_{t-1,k} + \sum_{j} w^{OUT}_{kj} z_{t,j} + b^{OUT}_{k},$$

where *w ^{OUT} _{kj}* is the output weight from neuron *j* to output neuron *k*, *b ^{OUT} _{k}* is the bias of the *k*-th output neuron, and the leak *κ* = exp(−*dt*/*τ _{OUT}*) for readout time constant *τ _{OUT}*.

We quantify how well the network output matches the desired target using error function *E*. For regression tasks, *E* is the squared error between the output and its target, summed over time and output neurons; the cross-entropy loss used for classification tasks is given in *SI Appendix*, Note 3.

While the tasks involving time-delayed rewards studied in this manuscript can be labeled as regression and classification tasks due to the nature of the objective function, we note that the theoretical development is general and applies to all loss functions whose partial derivative with respect to spiking activity can be expressed in terms of an output error fed back through the output weights.

As an important example, our derivation is immediately applicable to the actor–critic reinforcement learning framework (22). For the regression task, this fed-back error is simply the deviation of the output from its target, $y_{t,k} - y^{*}_{t,k}$.

One can see that when the leak *κ* is not zero, the error derivative will depend on future errors, which seemingly poses an obstacle to online learning. We provide the online implementation for this readout convention in *SI Appendix*, Note 1. In addition to the accuracy optimization described above, we added a firing-rate regularization term that penalizes deviations of the average firing rate of each neuron *j* from a target rate (*SI Appendix*, Note 3).
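The readout dynamics and regression loss above can be sketched as follows; the sizes, weights, leak constant, and toy target are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(4)
T, n, k = 50, 10, 2
z = (rng.random((T, n)) < 0.2).astype(float)    # recurrent spikes z[t, j]
w_out = 0.1 * rng.standard_normal((k, n))       # output weights w_out[k, j]
b_out = np.zeros(k)
kappa = np.exp(-1.0 / 10.0)                     # readout leak (tau_out = 10 dt)

y = np.zeros((T, k))
for t in range(T):
    prev = y[t - 1] if t > 0 else np.zeros(k)
    y[t] = kappa * prev + z[t] @ w_out.T + b_out   # leaky readout dynamics

y_star = np.sin(np.linspace(0, np.pi, T))[:, None] * np.ones(k)  # toy target
E = 0.5 * np.sum((y - y_star) ** 2)             # squared-error loss
```

Because each `y[t]` carries a leaky copy of past outputs, the derivative of `E` with respect to `y[t]` at one step propagates backward through `kappa`, which is exactly the future-error dependence noted above.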

#### Notation for derivatives.

There are two types of computational dependencies in RSNNs: direct and indirect dependencies. For example, variable *w _{pq}* can impact state *s _{t,p}* directly via Eq. **2**, as well as indirectly via its influence through other cells in the network. We distinguish direct dependencies vs. all dependencies (including indirect ones) using partial derivatives (∂) vs. total derivatives (*d*).

#### Gradient descent learning in RSNNs.

We study iterative adjustment of all synaptic weights (input weights *w ^{IN}*, recurrent weights *w*, and output weights *w ^{OUT}*) using gradient descent on loss *E*:

$$\Delta w = -\lambda \frac{dE}{dw},$$

where *λ* denotes the learning rate, and the gradient of the error with respect to the synaptic weights must be calculated. This error gradient can be calculated with the classical machine-learning algorithms BPTT and RTRL by unwrapping the RSNN dynamics over time (Fig. 6*B*). While these two algorithms yield equivalent results, their bookkeeping for the chain rule differs. Gradient calculations in BPTT depend on future activity, which poses an obstacle for online learning and biological plausibility. Our learning-rule derivation follows the RTRL factorization because it is causal. Therefore, we focus our analysis on RTRL and factor the error gradient across time and space as

$$\frac{dE}{dw_{pq}} = \sum_{t} \frac{dE}{dz_{t}} \cdot \frac{dz_{t}}{dw_{pq}}, \tag{13}$$

$$\frac{dz_{t}}{dw_{pq}} = \frac{\partial z_{t}}{\partial s_{t}} \cdot \frac{ds_{t}}{dw_{pq}}. \tag{14}$$

The first factor, *dE*/*dz _{t}*, in Eq. **13** is related to the TD learning signal received by each cell. *SI Appendix*, Notes 1 and 2 show that the leak term of the output neurons makes these two quantities differ and derive an online implementation that accounts for this difference.

We now discuss the second factor in Eq. **13**, i.e., *dz _{t}*/*dw _{pq}*, which is factorized in Eq. **14**. The first factor of that product, ∂*z _{t}*/∂*s _{t}*, involves the derivative of the Heaviside step function *H* in Eq. **3**, which is not defined at zero and is zero everywhere else. We overcome this issue by approximating this derivative with a piece-wise linear function of the distance between the voltage and the threshold (10, 22, 46, 54). Here, the pseudoderivative for neuron *j* is

$$h_{t,j} = \frac{\gamma}{v_{th}} \max\!\left(0,\; 1 - \left|\frac{v_{t,j} - A_{t,j}}{v_{th}}\right|\right).$$

The dampening factor *γ* (typically set to 0.3) dampens the increase of backpropagated errors in order to improve the stability of training very deep (unrolled) RSNNs (22). Throughout this study, neuronal firing displays refractoriness, during which the pseudoderivative is set to zero for neuron *j* (*SI Appendix*, Note 3).
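A sketch of this pseudoderivative, assuming the piece-wise linear form given above with an illustrative threshold of 1:

```python
import numpy as np

def pseudo_derivative(v, A, v_th=1.0, gamma=0.3):
    """Piece-wise linear surrogate for the Heaviside derivative: maximal when
    the voltage v sits at the (adaptive) threshold A, decaying linearly to
    zero once |v - A| exceeds v_th."""
    return (gamma / v_th) * np.maximum(0.0, 1.0 - np.abs((v - A) / v_th))

v = np.array([-0.5, 0.5, 1.0, 1.5, 3.0])
h = pseudo_derivative(v, A=1.0)
# Peak sensitivity at v == A; zero once the voltage is far from threshold.
```

Replacing the undefined Heaviside derivative with this bounded surrogate is what makes the gradient factors in Eq. **14** computable at all.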

Key problems that RTRL poses to biological plausibility and computational cost reside in the factor *ds _{t}*/*dw _{pq}* (Eqs. **13** and **14**). This factor describes the total dependence of the state of each neuron *j* on weight *w _{pq}*. In other words, it accounts for both the spatial and temporal dependencies in RSNNs: State dependencies across time *t*, as explained above, result from unwrapping the temporal dependencies illustrated in Fig. 6*B*; state dependencies across space, however, are due to the indirect dependencies of all *z _{t}* on *w _{pq}* through the recurrent connectivity (Fig. 6*C*). These recurrent dependencies are all accounted for in the recursive update of the factor *ds _{t}*/*dw _{pq}*.

Thus, the factor *ds _{t}*/*dw _{pq}* is neither spatially nor temporally local (Fig. 6 *D*, *i*) and requires maintaining a separate influence trace for every combination of neuron *j* and synapse *w _{pq}*.
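The storage asymmetry just described can be made concrete with a back-of-envelope count of stored scalars for a fully recurrent network; constants and hidden-state dimensions are ignored in this sketch:

```python
# Back-of-envelope memory comparison for a fully recurrent network of N
# neurons: RTRL tracks the influence of every synapse on every neuron's
# state, while a per-synapse eligibility trace stores one scalar per synapse.
N = 1000
rtrl_entries = N * N * N   # ds_t/dw_pq: one entry per (neuron j, synapse pq)
eprop_entries = N * N      # one eligibility trace per synapse
ratio = rtrl_entries // eprop_entries
```

The factor-of-*N* gap is why exact RTRL is considered infeasible at scale, and why the truncations discussed next keep only per-synapse (or near-neighbor) quantities.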

To address this, Murray (21) and Bellec et al. (22) (e-prop) dropped the nonlocal terms so that the updates to weight *w _{pq}* would only depend on presynaptic and postsynaptic activity (Figs. 5*B* and 6 *D*, *ii*) and applied this truncation to train rate-based and spiking neural networks, respectively. While both works succeed in improving over previous biologically plausible learning rules, a significant performance gap with respect to the full BPTT/RTRL algorithms remains.

#### Derivation of multidigraph learning in RSNNs.

We continue from the previous section in giving a detailed derivation of our learning rule. To reveal a potential role for cell-type-based modulatory signals in synaptic plasticity, as well as to improve upon the aforementioned biologically plausible gradient-descent approximations, we begin by partially restoring nonlocal dependencies between cells, namely, those within one connection step. This is the "truncated" RTRL framework (Figs. 5*D* and 6*D*, *iii*), and the memory trace term

Thus, when *j* = *p*, our truncation implements the eligibility trace of e-prop, and Eq. **18** adds the case when cell *j* is a direct postsynaptic partner of cell *p*. Eq. **18** resembles the *n*-step RTRL approximation recently proposed in ref. 8, known as SnAp-*n*, which stores entries for each unit *j* such that parameter *w _{pq}* influences the activity of unit *j* within *n* time steps. The computations of SnAp-*n* converge to those of RTRL as *n* increases, resonating with our improved performance when more terms of the exact gradient are included. Our truncation in Eq. **18** is similar to SnAp-*n* with *n* = 2, with two differences: 1) We apply it to spiking neural networks, and 2) we drop the previous time step's Jacobian term, so that Eq. **18** requires the maintenance of only a rank-two ("2-d") tensor specific to synapse {*p*, *q*}.

By substituting Eq. **18** into Eqs. **13** and **14**, we approximate the overall gradient as in Eq. **19**, where the modulatory term (Eq. **24**) denotes the activity-dependent modulatory signal emitted by neuron *j* at time *t*, and the ET (Eq. **25**) is the eligibility trace maintained by postsynaptic cell *p* to keep a memory of the preceding activity of presynaptic cell *q* and postsynaptic cell *p*. In Eq. **19**, the first term recovers the e-prop truncation; beyond it, our truncation requires maintaining only a {*p*, *q*}-dependent double tensor.

Importantly, we observe that, for the update to synapse *w _{pq}* in Eq. **19**, the terms that depend on cells *j* only appear under a sum. Therefore, the mechanism updating the synapse does not need access to the individual signals from each cell *j*. Rather, only their sum suffices.

While it is tempting to consider the first factors in Eq. **19** as neuron-specific modulatory signals from each cell *j*, the involvement of the synapse from neuron *p* to neuron *j* via *w _{jp}* and a lack of known mechanisms for calculating this neuron-specific composite signal suggest that this is unlikely to be a biological solution. Instead, inspired by the cell-type–specific (rather than neuron-specific) affinities for peptidergic neuromodulation (30, 42), we propose to approximate the signaling gain *w _{jp}* in Eq. **19** by the average value of the weights across its presynaptic and postsynaptic cell types. More specifically, when postsynaptic cell *j* belongs to type *α* and presynaptic cell *p* belongs to type *β*, we approximate the neuron-specific weight *w _{jp}* with the cell-type–specific gain *w _{αβ}*, which represents the affinity of the GPCRs expressed by cells of type *β* to the modulators secreted by cells of type *α* (*Cell-type–specific receptor affinities*).
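The type averaging just described can be sketched as follows. This is a minimal illustration under our own naming and shape conventions, not the paper's released implementation; it also shows the global-diffusion variant (averaging over all cell pairs, including absent connections) examined later in the text.

```python
import numpy as np

def type_averaged_gain(W, types, n_types, local=True):
    """Approximate neuron-specific gains w_jp by cell-type-specific
    gains w_ab: the mean weight from presynaptic type b to
    postsynaptic type a (rows of W index postsynaptic cells j,
    columns index presynaptic cells p). With local=True, the mean
    runs only over existing (nonzero) connections; with local=False,
    all cell pairs are averaged, mimicking global diffusion of the
    modulatory signal."""
    W_ab = np.zeros((n_types, n_types))
    for a in range(n_types):        # postsynaptic type alpha
        for b in range(n_types):    # presynaptic type beta
            block = W[np.ix_(types == a, types == b)]
            mask = (block != 0) if local else np.ones_like(block, bool)
            W_ab[a, b] = block[mask].mean() if mask.any() else 0.0
    return W_ab

W = np.array([[0., 2., 0.],
              [2., 0., 4.],
              [1., 1., 0.]])        # toy connectivity
types = np.array([0, 0, 1])        # cell-type label per neuron
gains = type_averaged_gain(W, types, n_types=2)
```

The resulting *n_types* × *n_types* table is all a synapse needs, instead of the full neuron-by-neuron weight matrix.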

Thus, the gradient estimate at time *t* due to our learning rule involves compounding the ET with both TD and local modulatory signals, thereby recovering the general form introduced in Eq. **1**. Here, cell *p* is of type *β*, *C* denotes the set of neuronal cell types, the modulatory term sums over connections from *p* to *j*, and the expression replaces the neuron-specific gains of Eq. **19** with cell-type–specific weight averages.

In summary, cell *p* receives local modulatory input (Eq. **20**) in addition to synaptic input (Eq. **2**):

It may be instructive to note the dichotomy in the functions of these two different inputs: The cell uses its synaptic input to update its membrane potential and its modulatory input to update its synaptic weights.

Hence, our update rule suggests an additive term to compute the plasticity update at synapse *w _{pq}*: the product of the modulatory signals emitted at time *t* by the postsynaptic partners *j*, the affinity of receptors of cell type *β* to ligands of type *α*, *w _{αβ}*, and the ET at the synapse.
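Putting the pieces together, the additive update can be sketched schematically. All array shapes, names, and sign/learning-rate conventions below are our assumptions for illustration; the actual implementation accompanies the paper's code release.

```python
import numpy as np

def mdgl_update(eligibility, top_down, M, W_ab, types, post_mask, lr=1e-3):
    """Schematic MDGL weight update for all synapses w_pq.
    eligibility: (n, n) eligibility traces e_pq (post p, pre q)
    top_down:    (n,) top-down learning signal per postsynaptic cell
    M:           (n,) modulatory signal emitted by each cell j
    W_ab:        (n_types, n_types) type-specific receptor affinities
    types:       (n,) cell-type label of each cell
    post_mask:   (n, n) boolean; post_mask[j, p] = True if j is a
                 postsynaptic partner of p (local diffusion)
    The update compounds the eligibility trace with the top-down
    signal plus a type-gained sum of local modulatory signals."""
    gain = W_ab[np.ix_(types, types)]      # gain[j, p] = w_{type(j), type(p)}
    modulatory = (post_mask * gain * M[:, None]).sum(axis=0)  # per cell p
    learning_signal = top_down + modulatory                   # (n,)
    return -lr * learning_signal[:, None] * eligibility
```

Note that the per-cell sum over partners *j* is all the synapse sees, consistent with the observation above that only the summed modulatory signal is needed.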

#### Cell-type–specific receptor affinities.

We explain the cell-type–specific signaling implementation, notably, how the type-specific receptor affinity *w _{αβ}* is defined. As introduced in our learning rule derivation, *w _{αβ}* is an approximation of the gain *w _{jp}* (Eqs. **19** and **20**), and we proposed to define *w _{αβ}* as the weight average across its presynaptic and postsynaptic cell types, taken over existing connections from *p* to *j*. This choice is motivated by the local diffusion assumption discussed in ref. 35, in which this type of signaling is registered only by local synaptic partners and therefore preserves the connectivity structure of *w _{jp}*. One obvious variant of this receptor-affinity definition is one with a different spatial extent, for which we examine the opposite extreme in Fig. 4*D*–*F*, where modulatory signals diffuse to all cells in the network. More specifically, the signaling gain *w _{αβ}* is replaced by the average over all cell pairs, including those with *w _{jp}* = 0, so that modulatory signals diffuse to all cells in the network with the same strength.

For a proof of concept, we implemented MDGL with modulatory types mapped to the two main cell classes; i.e., the cell-type–specific signaling gain was reduced to these two types (*SI Appendix*, Fig. S9). On the other hand, increasing the number of modulatory types involved in the cell-type discretization could be key to realizing the potential of MDGL in more complicated tasks, suggesting an explanation for the observed diversity of cell types in the brain.

We implemented receptor affinities as weight averages across types, but how tightly those modulatory gains and synaptic weights are coupled is a subject for future investigation. *SI Appendix*, Fig. S7 explores the sensitivity of the learning performance to imprecise receptor affinities.

#### Activity-dependent modulatory emission implementation.

As introduced in Eq. **20**, the activity-dependent modulatory signal emitted by neuron *j* at time *t*, an important component of MDGL, is defined as

As defined, this signal involves the pseudoderivative of neuron *j*'s membrane potential explained above. While Eq. **20** suggests an "online" implementation with the update at time *t*, the learning-signal factor can depend on future errors; *SI Appendix*, Note 1 derives an online update for the more general case, which has the same form as Eq. **20**.

#### ET implementation.

As introduced in Eq. **20**, the ET, another important component of MDGL, is defined in Eq. **25**; Eq. **26** follows directly from Eq. **18**. The ET depends only on the activity of presynaptic cell *q* and postsynaptic cell *p*. A discussion on interpreting ETs as derivatives can be found in ref. 22. Here, we briefly explain its implementation by expanding the factors in Eqs. **25** and **26** for both LIF and ALIF cells.

For LIF cells, there is no adaptive threshold, so the hidden state consists only of the membrane potential. Thus, the factors in Eq. **15** follow directly from the membrane dynamics of Eq. **2**.
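For intuition, a minimal sketch of such an eligibility trace for LIF cells follows, assuming a simple leaky filter with decay *α* and glossing over the exact spike-timing convention of the equations above; names and shapes are ours.

```python
import numpy as np

def lif_eligibility(pre_spikes, pseudo_derivs, alpha=0.9):
    """Sketch of LIF eligibility traces: the eligibility vector is a
    low-pass filter of presynaptic spikes (membrane leak alpha),
    and the trace is that filter times the postsynaptic
    pseudoderivative.
    pre_spikes:    (T, n) binary presynaptic spike trains z_q^t
    pseudo_derivs: (T, n) postsynaptic pseudoderivatives h_p^t
    Returns (T, n, n) traces e_pq^t (post p, pre q)."""
    T, n = pre_spikes.shape
    eps = np.zeros(n)                  # filtered presynaptic activity
    traces = np.zeros((T, n, n))
    for t in range(T):
        eps = alpha * eps + pre_spikes[t]               # eligibility vector
        traces[t] = pseudo_derivs[t][:, None] * eps[None, :]
    return traces
```

Each synapse thus needs only a locally computable scalar memory of recent presynaptic activity, gated by the postsynaptic cell's proximity to threshold.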

For ALIF cells, there are two hidden variables (the membrane potential and the adaptive threshold), so the eligibility vector is now a two-dimensional vector; from Eq. **3**, one can obtain the corresponding factors

Thus, the ET for ALIF cells can be assembled from these factors.

## Data Availability

Code for data generation and analysis is available in GitHub at https://github.com/Helena-Yuhan-Liu/MDGL-main.

## Acknowledgments

We thank Forrest Collman, Guillaume Lajoie, James Murray, Scott Owen, and Bosiljka Tasic for helpful feedback on the manuscript. We also thank the Allen Institute founder, Paul G. Allen, for his vision, encouragement, and support. Y.H.L. is supported by the Natural Sciences and Engineering Research Council Postgraduate Scholarships–Doctoral program. This work was facilitated through the use of the University of Washington Hyak supercomputer system.

## Footnotes

- ^{1}To whom correspondence may be addressed. Email: hyliu24@uw.edu or uygars@alleninstitute.org.

- Accepted October 28, 2021.
Author contributions: U.S. conceived initial theory; Y.H.L., S.S., S.M., E.S.-B., and U.S. designed research; Y.H.L., S.S., S.M., E.S.-B., and U.S. performed research; Y.H.L. analyzed data; and Y.H.L., S.S., E.S.-B., and U.S. wrote the paper.

The authors declare no competing interest.

This article is a PNAS Direct Submission.

This article contains supporting information online at https://www.pnas.org/lookup/suppl/doi:10.1073/pnas.2111821118/-/DCSupplemental.

- Copyright © 2021 the Author(s). Published by PNAS.

This open access article is distributed under Creative Commons Attribution-NonCommercial-NoDerivatives License 4.0 (CC BY-NC-ND).

## References

- T. J. Sejnowski
- R. J. Williams, D. Zipser
- O. Marschall, K. Cho, C. Savin
- A. Mujika, F. Meier, A. Steger
- C. Tallec, Y. Ollivier
- C. Roth, I. Kanitscheider, I. Fiete
- J. Menick et al.
- F. Zenke, E. O. Neftci
- J. E. Rubin, C. Vich, M. Clapp, K. Noneman, T. Verstynen
- A. Payeur, J. Guerguiev, F. Zenke, B. A. Richards, R. Naud
- I. Pozzi, S. Bohté, P. Roelfsema
- J. Sacramento, R. P. Costa, Y. Bengio, W. Senn
- A. Laborieux et al.
- Y. Amit
- B. Millidge, A. Tschantz, A. K. Seth, C. L. Buckley
- G. Bellec et al.
- P. Dayan, L. F. Abbott
- A. A. Hamid, M. J. Frank, C. I. Moore
- J. C. Magee, C. Grienberger
- W. Gerstner, M. Lehmann, V. Liakoni, D. Corneil, J. Brea
- S. Yagishita et al.
- A. Suvrathan
- G. Bellec, D. Kappel, W. Maass, R. Legenstein
- T. Meyer, X. L. Qi, T. R. Stanford, C. Constantinidis
- S. J. Smith, M. Hawrylycz, J. Rossier, U. Sümbül
- D. Linsley, A. K. Ashok, L. N. Govindarajan, R. Liu, T. Serre
- D. Huh, T. J. Sejnowski
- R. Elliott, R. J. Dolan
- N. W. Gouwens et al.
- S. Melzer et al.
- S. K. Esser et al.

## Article Classifications

- Biological Sciences
- Neuroscience