## New Research In

### Physical Sciences

### Social Sciences

#### Featured Portals

#### Articles by Topic

### Biological Sciences

#### Featured Portals

#### Articles by Topic

- Agricultural Sciences
- Anthropology
- Applied Biological Sciences
- Biochemistry
- Biophysics and Computational Biology
- Cell Biology
- Developmental Biology
- Ecology
- Environmental Sciences
- Evolution
- Genetics
- Immunology and Inflammation
- Medical Sciences
- Microbiology
- Neuroscience
- Pharmacology
- Physiology
- Plant Biology
- Population Biology
- Psychological and Cognitive Sciences
- Sustainability Science
- Systems Biology

# Network cloning using DNA barcodes

Edited by Terrence J. Sejnowski, Salk Institute for Biological Studies, La Jolla, CA, and approved March 25, 2019 (received for review April 11, 2017)

## Significance

The connections between neurons determine the computations performed by a neural network. Connections can be considered a “summary” of the statistical structure of the experience—data—on which the network was trained. Here, we propose a method for how neuronal network connectivity can be copied or “cloned” from one network to another. Our method relies on the use of DNA barcodes—short DNA sequences that allow tagging individual neurons with unique labels. In our study, we prove theorems that show that such a transfer of network connectivity is theoretically possible.

## Abstract

The connections between neurons determine the computations performed by both artificial and biological neural networks. Recently, we have proposed SYNSeq, a method for converting the connectivity of a biological network into a form that can exploit the tremendous efficiencies of high-throughput DNA sequencing. In SYNSeq, each neuron is tagged with a random sequence of DNA—a “barcode”—and synapses are represented as barcode pairs. SYNSeq addresses the analysis problem, reducing a network into a suspension of barcode pairs. Here, we formulate a complementary synthesis problem: How can the suspension of barcode pairs be used to “clone” or copy the network back into an uninitialized tabula rasa network? Although this synthesis problem might be expected to be computationally intractable, we find that, surprisingly, this problem can be solved efficiently, using only neuron-local information. We present the “one-barcode–one-cell” (OBOC) algorithm, which forces all barcodes of a given sequence to coalesce into the same neuron, and show that it converges in a number of steps that is a power law of the network size. Rapid and reliable network cloning with single-synapse precision is thus theoretically possible.

The connections between neurons determine the computations performed by a neural network. In both biological and artificial neural networks, connections are established and tuned by experience and learning. Connections can thus be considered a “summary” of the statistical structure of the experience—data—on which the network was trained. This summary may be considerably more compact and efficient than the original data. For example, deep neural networks for object recognition contain tens of millions of connections derived from training sets consisting of hundreds of billions pixels, which results in more than 1,000-fold compression (1, 2). It would therefore be more efficient to copy these connections onto a new network than to retrain a new network from scratch.

Most current implementations of artificial neural networks exploit digital computers and graphics processing units (2). On these architectures, connections are stored explicitly and are therefore straightforward to extract and copy into a new network. In biological networks, by contrast, there is no central repository for connections, so reading out the connections of a network and copying them into a new network represents a difficult challenge. During neural development, for example, a genomic DNA sequence representing prior evolutionary experience is converted into the brain’s connectivity. Similar challenges may arise in future artificial or hybrid biological/artificial architectures.

We have recently proposed SYNSeq, an approach for determining neuronal connectivity (3, 4). The key idea is to convert the connections into a form that can be read out using high-throughput DNA sequencing, thereby benefitting from the advances in sequencing technology. Sequencing is now extremely fast and inexpensive—it is routine to decode billions of DNA fragments per day, and sequencing cost has dropped at a rate faster than Moore’s law. To convert neuronal connectivity into a sequencing problem, we induce individual neurons to express unique random nucleotide identifiers called “barcodes.” Pairs of presynaptic and postsynaptic barcodes represent individual synaptic connections. These barcode pairs can then be used to represent the connectivity of a network (Fig. 1).

Here, we formulate a different problem: Given an ensemble of connections represented by barcode pairs, can we copy them into a new network? In other words, can the original network be cloned? We explore a computational model that simulates the behavior of barcodes introduced into a tabula rasa network with unstructured connectivity and test its ability to recreate target connectivity in such networks. We require the underlying mechanisms to be purely local, that is, that the algorithm uses only information available to a given neuron and its synapses. Below, we present an algorithm that allows robust copying of connectivity based only on local interactions.

In our approach, connectivity is specified by unique molecular labels (DNA barcodes) with single-synapse precision. It is commonly assumed that implementing connectivity via individual synaptic tags is not feasible due to the absence of guidance mechanism that would direct the cells to form the right synapses (5). One might expect that establishing desired connectivity using individual synaptic labels would require a number of steps that is exponential in network size. The inadequacy of unique molecular tags in instructing connectivity had motivated Roger Sperry (6) to introduce the idea of molecular gradients. Here, we propose a form of molecular dynamics and find, surprisingly, that it yields convergence to the target connectivity in a number of steps that is polynomial in network size, even though the connectivity is specified by unique molecular labels for each synapse. This finding implies that copying connectivity with single-neuron precision using our strategy is theoretically possible.

## Results

Our algorithm attempts to recreate the target connectivity between neurons (Fig. 1*A*). The connectivity can be represented as a connection matrix *B*). We assume that every network node (neuron) is identified by a unique barcode, that is, by a sequence of nucleotides long enough to label uniquely every neuron in the network (Fig. 1*A*). Network connectivity is thus encoded by barcode pairs, where each barcode pair consists of a presynaptic barcode, a postsynaptic barcode, and a spacer between them indicating the connection’s direction (Fig. 1*C*). This network description is similar to the netlist representation (7). As seen in Fig. 1*C*, every barcode, representing an individual neuron, can be encountered multiple times, equal to the number of synapses made by this neuron. We therefore define a barcode type as the set of barcodes with the same sequence that represents the same neuron. Each barcode pair, on the other hand, is present in only one copy, because it represents an individual synaptic connection. The total number of barcode pairs is equal to the number of nonzero entries in the connection matrix, or to the total number of connections in the network. To simplify notation, we will represent each barcode by a single letter of the alphabet rather than as a string of nucleotides (Fig. 1*D*).

The barcode pairs are introduced into synapses of a tabula rasa network that is, initially, fully connected (Fig. 2*B*). Since connectivity in our model is directional, we assume that, between every two cells, synapses are formed initially in both directions. The full connectivity assumption is made here to simplify the description of network dynamics, and is reminiscent of the overproduction of synaptic connections that occurs during development (8, 9) and the full potential cortical connectivity (10). The number of neurons in the tabula rasa network is assumed to be equal to the number of nodes in the desired network, that is, equal to the number of barcodes.

The barcodes are initially introduced into synapses of tabula rasa network randomly. One possible solution for recreating the target network is to mark each neuron of the tabula rasa network with individual barcode tags and to distribute the barcode pairs into synapses according to these tags. The unoccupied synapses of the tabula rasa network would subsequently be eliminated. Implementing this mechanism practically would require a global supervising mechanism that keeps track of unique label assignments and appropriate barcode placements. Such mechanism is therefore not biologically plausible. Instead, here we formulate a fully local procedure for recreating the target network in the tabula rasa network in which processes in each cell rely only on the information available to each cell. Thus, the target network emerges as a result of self-organization of barcodes in the tabula rasa. The barcodes are rearranged in the network via three types of local moves. First, each barcode can be reinserted in the synapse between the same pair of cells in different orientation (“flips”; Fig. 2*C*). Second, the barcodes can jump from one synapse to another synapse of the same cell (“jumps”; Fig. 2*D*). Finally, two barcodes located in the same neuron can trade places (“swaps”; Fig. 2*E*). To practically implement these three moves, we select two synapses of the same neuron at random, ensure that at least one of them contains a barcode pair, and swap the pairs, even if source and destination are the same or one of them is empty. In implementing these moves, we keep track of the direction of barcode pairs and synapses, that is, barcode pairs are introduced into synapses of the correct orientation. We ensure that the described moves are local in that the barcode pairs are only relocated between synapses of the same neuron.

Using this set of moves, we rearrange barcode pairs in the network attempting to implement the “one-barcode–one-cell” (OBOC) solution. In the OBOC solution, all barcodes in the synapses of the same cell, facing this cell, are the same (Fig. 2*G*). Thus, in Fig. 2*G*, all barcodes in the rightmost cell are described by letter Y (V, X, Y, Z is a short-hand notation for much longer nucleotide sequences). Similarly, all barcodes in the leftmost cell are labeled by letter Z. We reasoned that if the logic of the interaction of cells and barcodes favors OBOC solution, cells will discover their identity as encoded by barcodes. Because every cell in the tabula rasa network has a potential to become any cell as defined by the barcodes, a specific arrangement of barcode pairs respecting OBOC rule is associated with a symmetry breaking, whereby the network selects one possible assignment of barcodes into cells out of

To practically implement OBOC solution, we defined a cost function, H, that is minimized by the barcode dynamics. The cost function depends on the synapse–barcode connection index (SBCI), *B*:

Importantly, the cost function [**1**] has a property of locality, that is, the contribution for each neuron depends on the variables available to this neuron **1**] does not require a global supervisor, which would render the mechanism biologically implausible.

The approach based on minimizing a cost function is one of the ways to quantitatively describe biological processes and has been used successfully to describe establishing connectivity, especially when competition or interdependence between cells is important (16, 17). To minimize the cost function, we use the Metropolis Monte Carlo (MMC) procedure that has been shown to closely approximate the dynamics of synaptic connections during neural development (8, 16⇓–18). Our MMC procedure relied on three types of barcode moves as described above. After the cost function is minimized, at the end of the MMC procedure, we remove synapses that carry no barcodes. Within our model, we can prove the following theorems with regard to reproducing the target connectivity. The detailed proofs are provided in *Appendix*.

### Theorem 1.

*Let* *be the target connectivity defined by the barcode pairs*. *Let* *be a cell connectivity corresponding to an OBOC solution for the same set of the barcode pairs arising after barcode-free synapses are eliminated*. *Then*, *a one-to-one mapping* *exists between the set of barcodes and the neurons*, *which makes*

*Theorem 1* shows that reaching OBOC state is equivalent to cloning the target connectivity. Although this statement is quite obvious, we prove it in *Appendix* for completeness. In *Appendix*, we show that connectivity can be cloned up to a permutation with

### Theorem 2.

*For* *and* *in a non-OBOC state*, *there is always a barcode jump decreasing the cost function*.

*Theorem 2* shows that the cost function [**1**] does not have any non-OBOC minima, meaning that the barcode dynamics would not lead to a metastable, yet wrong, connectivity. Therefore, we prove the following corollary.

### Corollary.

*For* *and* *all of the minima of the cost function* [**1**] *correspond to OBOC solutions*.

This corollary combined with the theorems above shows that minimization of the cost function [**1**] will lead to an OBOC solution, thus cloning the barcode defined connectivity in the network. To estimate the number of steps until convergence to the target connectivity, we proved the following theorem.

### Theorem 3.

*For* *and* *convergence to a minimum of the cost function* [**1**] *takes a number of steps limited from above by a number polynomial in the number of network nodes* (*cells*). *Theorem 3* shows that the convergence of the network connectivity to the target is not exponential.

To measure the number of steps needed to clone network connections and to obtain a stronger estimate for the speed of convergence to the target connectivity under various circumstances, we performed several computer simulations (Fig. 3). To generate the examples of target connectivities, we used random networks of various topologies, connectivity density f, and size N. To quantify the speed of convergence, for each MMC simulation, we computed the number of attempts, **2**] holds for both random Erdős–Rényi (19) networks (Fig. 4*A*) and scale-free (Barabási–Albert) (20) networks (Fig. 4*B*), suggesting that the dependence of the performance on the network topology is negligible compared with the dependence on the connectivity density and network size.

So far, we have assumed that target networks initially have an all-to-all connectivity. Barcode-free synapses are eliminated at the end of cost-function minimization, after the barcode pairs have found an OBOC solution. This full initial connectivity assumption is clearly a simplification intended to mimic the overproduction of synapses during neural development (8, 9). In reality, the formation of connectivity is accomplished via the process of trial and error during which synapses are both created and eliminated (8, 9). To test our conclusions in the model in which synapses can be formed and pruned while the barcode pairs are relocated in the network, we performed simulations with the same cost function in the conditions when synaptic connectivity is both sparse and dynamic. Initially, tabula rasa network was sparse, with the sparseness parameter exceeding the sparseness of the barcode matrix. The number of excess synaptic connections was given by the formula *Methods* for more detail). Thus, for **2**] holds in the case of dynamic synapses (Fig. 4*C*), meaning that the network should not be necessarily fully connected to obtain an accurate copy of the connectivity in a polynomial number of steps.

## Discussion

Here, we have addressed the question whether connectivity can be copied from one neural network to another, using only a local rule. It should be noted that the connectivity in the original network can be obtained using any paradigm that results in connection matrix with the single-synapse precision, such as using volume electron microscopy methods (21, 22) or SYNSeq approach (3, 4). Original connectivity can also result from an application of a learning algorithm in an artificial neural network (1, 2). Independently on their origin, the connections can be represented by an ensemble of DNA barcode pairs (3, 4). We analyzed the dynamics of barcode pairs introduced into a clean-slate tabula rasa network. The particular form of dynamics that we considered is described by OBOC, which favors positioning of a single type of barcodes in a single neuron. We showed that OBOC dynamics leads to fast and reliable recreation of desired connectivity in the new network. The formation of new connectivity is achieved in a number of steps given by a power law of the network size [**2**]. We have proved a convergence theorem (*Theorem 2*) showing that movements of barcodes toward OBOC solution are not obstructed by local minima. Thus, we have demonstrated that copying connectivity from one neural network to another using DNA barcodes is theoretically possible.

The number of steps to convergence, defined by Eq. **2**, may seem impractical, as the number of steps grows rapidly with the network size. Using only local information in cost function [**1**], however, allows moving the barcode pairs in parallel, thus reducing the number of the steps to the convergence. Since the number of barcode pairs is **3**] suggests that the time to copy connectivity does grow with the network size; however, the growth is described by a power law with the relatively small exponent of 1.5. Thus, cloning a network with 10 times more neurons is expected to take about 30 times more time.

In our study, we derived several results on theoretical plausibility of copying the structure of biological neuronal networks. We were motivated by the conventional assumption that neuronal network connectivity carries an imprint of long-term memory and, as such, is an essential substrate of network function (23). Copying (cloning) connectivity might facilitate the transfer of these imprints from one biological network to another. To model network formation, we used a formalism based on the cost function, which is found useful in explaining formation of networks during neural development (8, 16⇓–18). Since our focus was on theoretical plausibility, we did not explore the biological mechanism that could implement the cost function used in this study. Such biological mechanism has to be fairly intricate, as, to form OBOC solutions, barcodes have to agglomerate into the same cell based on their identity defined by their sequence. Precedents to OBOC solution can be found in systems implementing gene expression, for example. Thus, in the olfactory system, each olfactory sensory neuron chooses to express a single olfactory gene out of thousands of possibilities (24). It seems, therefore, that implementations of OBOC solution might involve a system based on gene expression. Further studies are needed to explore these potential mechanisms both experimentally and theoretically.

Our approach can be extended in several directions. For example, in our model, connections were binary and each binary synapse was encoded by a single pair of barcodes. Multiple levels of synaptic strengths can be introduced into our approach by using more than one barcode pair per synaptic connection. A connection encoded by several barcode pairs might be stronger due to the formation of multiple disconnected synaptic active zones between a pair of cells or an increased synaptic area within a single active zone. Although our model cannot distinguish between these two mechanisms, both of them lead to an overall increase in the synaptic conductance between two cells, and, consequently, their effects are similar. In the model explored here, the total number of neurons matched the number of barcode types. This assumption was made to simplify our analysis. It is almost certain to fail in realistic cases. Finally, it is tempting to expand our approach by including multiple sets of barcodes that implement a hierarchy of connection rules. One set of barcodes pairs might encode connectivity between brain regions, while different barcode sets may enforce the rules on mesoscopic and microscopic scales. Including multiple levels of connectivity rules could accelerate wiring large-scale networks. These extensions of our approach could be further investigated.

## Methods

We generated random directed target networks of various topologies, sizes N, and connectivity densities f, as described below. We used 10 different network sizes from **1**]. At the end of simulation, we removed synapses carrying no barcode pairs. In the group of simulations with dynamic synapses, synapses were created and eliminated at the same time as the barcodes were moved between them. In this case, tabula rasa network was random and sparse, with the sparseness parameter

For both the cases of fixed and dynamic synapses, barcodes were relocated between synapses via jumps, swaps and flips as described, according to MMC statistical rules. The probabilities of attempting these three operations were

We used **1**], we used MMC procedure (8, 16⇓–18). The value of temperature was chosen to be sufficiently low *Theorem 2*, connectivity copying can be accomplished via a greedy algorithm. We used MMC procedure because it is quantitatively close to the processes guiding the formation of synaptic connectivity (8, 16⇓–18). To compare connectivity on each step to the target connectivity, we used a greedy procedure that finds dominant barcodes for each cell. To quantify the speed of the convergence, for each MMC simulation, we computed the number of attempts to move the barcodes before a perfect OBOC solution was achieved (Fig. 4). We used linear regression in log–log space to approximate the number of steps to convergence.

## Appendix

### Theorem 1.

*Let* *be the target connectivity defined by the barcode pairs*. *Let* *be a cell connectivity corresponding to an OBOC solution for the same set of the barcode pairs arising after barcode-free synapses are eliminated*. *Then*, *a one-to-one mapping* *exists between the set of barcodes and the neurons*, *which makes*

##### Proof:

Although this theorem is somewhat trivial, we prove it here for completeness. Because, in our approach, the number of barcodes is equal to the number of cells, in OBOC solution, every cell has a unique barcode. One can thus use barcodes to identify cells. We then define a permutation matrix

### Theorem 2.

*For* *and* *in a non-OBOC state*, *there is always a barcode jump decreasing the cost function*.

##### Proof:

According to Eq. **1** of the main text, if barcode of the type β is relocated from the cell number m to the cell number n, the change in the cost function

##### Case 1 (each barcode is contained in one cell only):

Assume that the barcodes are in a non-OBOC state. The simplest case is when each barcode type is located in a single neuron. In this case, several barcode types can share the same neuron, leading to the non-OBOC state. Since the number of barcodes is equal to the number of cells, this implies that the network contains a cell with no barcodes. We denote a cell hosting multiple barcode types by index m and an empty one by index n. The barcode type with the minimal abundance in m is called β. Because

##### Case 2 (at least one barcode is shared between two cells):

In this case, we can pick a pair of cells m and n, both hosting at least one copy of the barcode of the same type, henceforth referred to as β. Using Eq. **4**, we can compute the sum of changes in the cost function for opposite movements of the barcode, that is, from m to n and from n to m:

### Corollary.

*For* *and* *all of the minima of the cost function* [**1**] *correspond to OBOC solutions*.

##### Proof:

Assume we are in a cost-function minimum, which is non-OBOC. According to *Theorem 2*, there is a barcode pair movement, decreasing the cost function. Therefore, it is not a minimum.

### Theorem 3.

*For* *and* *convergence to a minimum of the cost function* [**1**] *takes a number of steps limited from above by a number polynomial in the number of network nodes (cells)*.

##### Proof:

The cost-function spectrum is discrete and limited. For example, if ε is a natural number, then the cost function is an integer number. The lower boundary of the cost-function spectrum corresponds to an OBOC solution (*Corollary*), and equals to *Theorem 2*). Thus, the overall number of the steps to convergence does not exceed

## Acknowledgments

This work was supported by the NIH (Grant R01DA036913), the Swartz Foundation, and by the Aspen Center for Physics (Grant NSF PHY-1066293).

## Footnotes

- ↵
^{1}To whom correspondence should be addressed. Email: akula{at}cshl.edu.

Author contributions: S.A.S., B.B., A.M.Z., and A.A.K. designed research; S.A.S., B.B., A.M.Z., and A.A.K. performed research; and A.M.Z. and A.A.K. wrote the paper.

The authors declare no conflict of interest.

This article is a PNAS Direct Submission.

- Copyright © 2019 the Author(s). Published by PNAS.

This open access article is distributed under Creative Commons Attribution License 4.0 (CC BY).

## References

- ↵
- Krizhevsky A,
- Sutskever I,
- Hinton GE

- ↵
- ↵
- Kebschull JM, et al.

- ↵
- ↵
- Tessier-Lavigne M,
- Goodman CS

- ↵
- Sperry RW

- ↵
- Tygar JD,
- Ellickson R

- ↵
- ↵
- ↵
- ↵
- ↵
- Vinje WE,
- Gallant JL

- ↵
- Wegner F

- ↵
- ↵
- ↵
- ↵
- Triplett JW, et al.

- ↵
- ↵
- Erdos P,
- Rényi A

- ↵
- Barabási A-L,
- Albert R

- ↵
- ↵
- ↵
- ↵

## Citation Manager Formats

## Article Classifications

- Biological Sciences
- Neuroscience

## Sign up for Article Alerts

## Jump to section

## You May Also be Interested in

_{2.5}in 2011, with a societal cost of $886 billion, highlighting the importance of modeling emissions at fine spatial scales to prioritize emissions mitigation efforts.

### More Articles of This Classification

### Biological Sciences

### Neuroscience

### Related Content

- No related articles found.

### Cited by...

- No citing articles found.