## New Research In

### Physical Sciences

### Social Sciences

#### Featured Portals

#### Articles by Topic

### Biological Sciences

#### Featured Portals

#### Articles by Topic

- Agricultural Sciences
- Anthropology
- Applied Biological Sciences
- Biochemistry
- Biophysics and Computational Biology
- Cell Biology
- Developmental Biology
- Ecology
- Environmental Sciences
- Evolution
- Genetics
- Immunology and Inflammation
- Medical Sciences
- Microbiology
- Neuroscience
- Pharmacology
- Physiology
- Plant Biology
- Population Biology
- Psychological and Cognitive Sciences
- Sustainability Science
- Systems Biology

# Ant-inspired density estimation via random walks

Contributed by Nancy A. Lynch, August 14, 2017 (sent for review April 18, 2017; reviewed by Yehuda Afek, Ziv Bar-Joseph, and Amos Korman)

## Significance

Highly complex distributed algorithms are ubiquitous in nature: from the behavior of social insect colonies and bird flocks, to cellular differentiation in embryonic development, to neural information processing. In our research, we study biological computation theoretically, combining a scientific perspective, which seeks to better understand the systems being studied, with an engineering perspective, which takes inspiration from these systems to improve algorithm design. In this work, we focus on the problem of population density estimation in ant colonies, demonstrating that extremely simple algorithms, similar to those used by ants, solve the problem with strong theoretical guarantees and have a number of interesting computational applications.

## Abstract

Many ant species use distributed population density estimation in applications ranging from quorum sensing, to task allocation, to appraisal of enemy colony strength. It has been shown that ants estimate local population density by tracking encounter rates: The higher the density, the more often the ants bump into each other. We study distributed density estimation from a theoretical perspective. We prove that a group of anonymous agents randomly walking on a grid are able to estimate their density within a small multiplicative error in few steps by measuring their rates of encounter with other agents. Despite dependencies inherent in the fact that nearby agents may collide repeatedly (and, worse, cannot recognize when this happens), our bound nearly matches what would be required to estimate density by independently sampling grid locations. From a biological perspective, our work helps shed light on how ants and other social insects can obtain relatively accurate density estimates via encounter rates. From a technical perspective, our analysis provides tools for understanding complex dependencies in the collision probabilities of multiple random walks. We bound the strength of these dependencies using local mixing properties of the underlying graph. Our results extend beyond the grid to more general graphs, and we discuss applications to size estimation for social networks, density estimation for robot swarms, and random walk-based sampling for sensor networks.

- population density estimation
- random walk sampling
- network exploration
- ant colony algorithms
- biological distributed algorithms

The ability to sense local population density is an important tool used by many ant species. When a colony of *Temnothorax* ants must relocate to a new nest, scouts search for potential nest sites, assess their quality, and recruit other scouts to high-quality locations. A high enough density of scouts at a potential new nest (a quorum threshold) triggers those ants to decide on the site and transport the rest of the colony there (2). When neighboring colonies of *Azteca* ants compete for territory, a high relative density of a colony’s ants in a contested area will cause those ants to attack enemies in the area, while a low relative density will cause the colony to retreat (3). Varying densities of harvester ants successfully performing certain tasks such as foraging or brood care can trigger other ants to switch tasks, maintaining proper worker allocation in the colony (4, 5).

It has been shown that ants estimate density in a distributed manner, by measuring encounter rates (2, 6). As ants randomly walk around an area, if they bump into a larger number of other ants, this indicates a higher population density. By tracking encounters with specific types of ants, for example, successful foragers or enemies, ants can estimate more specific densities. This strategy allows each ant to obtain an accurate density estimate and requires very little communication: Ants must simply detect when they collide and do not need to perform any higher-level data aggregation.

## Density Estimation on a Grid

We study distributed density estimation from a theoretical perspective. We model a colony of ants as a set of anonymous agents randomly placed on a 2D grid. Computation proceeds in rounds, with each agent stepping in a random direction in each round. A collision occurs when two agents reach the same position in the same round, and encounter rate is measured as the number of collisions an agent is involved in during a sequence of rounds divided by the number of rounds. Aside from collision detection, the agents have no other means of communication.

The intuition that encounter rate tracks density is clear. It is easy to show that, for a set of randomly walking agents, the expected encounter rate measured by each agent is exactly the density *Lemma 2*). However, it is unclear if encounter rate actually gives a good density estimate, that is, if the estimate is close to its expectation with high probability.

Consider agents positioned not on the grid but on a complete graph. In each round, each agent steps to a uniformly random position, and, in expectation, the number of other agents it collides with in this step is

On the grid graph, the picture is significantly more complex. If two agents are initially located near each other, they are more likely to collide via random walking. After a first collision, due to their proximity, they are likely to collide repeatedly in future rounds. Since the agents are anonymous, they cannot recognize repeat collisions, and, even if they could, it is unclear that it would help. On average, compared with the complete graph, agents collide with fewer individuals and collide multiple times with those individuals that they do encounter, making encounter rates a less reliable estimate of population density.

Mathematically speaking, on a graph with a fast mixing time (7), like the complete graph, each agent’s location is only weakly correlated with its previous locations. This ensures that collisions are also weakly correlated between rounds, and encounter rate serves as a very accurate estimate of density. The grid graph, on the other hand, is slow mixing: Agent positions and hence collisions are highly correlated between rounds, lowering the accuracy of encounter-rate-based estimation.

## Results

Surprisingly, despite the high correlation between collisions, we show that encounter rate-based density estimation on the grid is nearly as accurate as on the complete graph. After just *Theorem 1*). This matches performance on the complete graph up to a

Technically, to bound accuracy on the grid, we obtain moment bounds on the number of times that two randomly walking agents collide over a set of rounds (*Lemma 5*). These bounds also apply to the number of equalizations (returns to origin) of a single walk. While expected random walk hitting times, return times, and collision rates are well studied for many graphs, including grid graphs (7⇓–9), higher moment bounds and high probability results are much less common.

Our moment bounds show that, while the grid graph is slow mixing, it has strong local mixing. That is, random walks tend to spread quickly over a local area and not repeatedly cover the same nodes, making random walk-based density estimation accurate. Significant work has focused on showing that random walk sampling is nearly as good as independent sampling for fast-mixing expander graphs (10, 11). We extend this type of analysis to slowly mixing graphs, showing that strong local mixing is sufficient in many applications.

The key to the local mixing property of the grid is an upper bound on the probability that two random walks starting from the same position recollide (or that a single random walk equalizes) after a certain number of steps (*Lemma 3*). We show that recollision probability bounds imply collision moment bounds on general graphs, and apply this technique to extend our results to

## Theoretical Model for Density Estimation

We consider a set of agents populating a 2D torus with

Initially, each agent is placed independently at a uniform random node in the torus. Computation proceeds in discrete, synchronous rounds. Each agent updates its position with a step chosen uniformly at random from

Aside from the ability to move in each round, agents can sense the number of agents other than themselves at their position at the end of each round, formally through the function

## The Density Estimation Problem

Let

### Local vs. Global Density.

The problem described above requires estimating the global population density. We assume that agents are initially distributed uniformly at random on the torus, which is critical for fast global density estimation: When agents are uniformly distributed, the local density in a small radius around their starting position reflects the global density with good probability. Of course, in nature, ants are not typically uniformly distributed in the nest or surrounding areas. Additionally, they are often interested in estimating local population densities, e.g., in a new nest site when house-hunting (2) or around a nest entrance when estimating the number of successful foragers for task allocation (4).

We view our work as a first step toward a theoretical understanding of density estimation, and we focus on the global density for simplicity. Removing our assumption of uniformly distributed agents and understanding local density estimation are important directions for future work.

## Random Walk-Based Density Estimation on the 2D Torus

As discussed, the challenge in analyzing random walk-based density estimation on the torus arises from correlations between collisions of nearby agents. If we do not restrict agents to random walking, and instead allow each agent to take an arbitrary step in each round, they can avoid collision correlations by splitting into “stationary” and “mobile” groups and counting collisions only between members of different groups. This allows them to essentially simulate independent sampling of grid locations to estimate density. This method is simple to analyze (*SI Appendix*, section S1), but it is not “natural” in a biological sense or useful for the applications we present. Further, independent sampling is unnecessary! *Algorithm 1* describes a simple random walk-based approach that gives a nearly matching bound.

Our main theoretical result follows; its proof appears at the end of this section, after a number of preliminary lemmas. Throughout our analysis, we take the viewpoint of a single agent executing *Algorithm 1*.

### Theorem 1 (Random Walk Sampling Accuracy Bound).

*After running for* *rounds, assuming* *, an agent executing* Algorithm 1 *returns* *such that, for any* *, with a probability of* *,* *for* *In other words, for any* *if* *,* *is a* *multiplicative estimate of* *with a probability of*

*Theorem 1* focuses on the density estimate of a single agent executing *Algorithm 1*. However, we note that, if we set

### Correctness of Encounter Rate in Expectation.

The first step in proving *Theorem 1* is to show that the encounter rate

### Lemma 2 (Unbiased Estimator). E d ∼ = d .

#### Proof.

We can decompose the collision bound *Algorithm 1* as the sum of collisions with different agents over different rounds. Specifically, give the

Since each agent is initially at a uniform random location and, after any number of steps, is still at a uniform random location, for all

We note that the torus is bipartite, and hence two agents initially located an odd number of steps away from each other will never meet via random walking. However, this fact does not change the expectation of

With *Lemma 2* in place, it remains to show that the encounter rate is close to its expectation with high probability and so provides a good estimate of density. To do this, we must bound the strength of correlations between collisions of nearby agents in successive rounds, which can decrease the accuracy of the encounter rate-based estimate.

### A Recollision Probability Bound.

The key to bounding collision correlations is bounding the probability of a recollision between two agents in round

Let

Each *Theorem 1*.

### Lemma 3 (Recollision Probability Bound).

*Consider two agents* *and* *randomly walking on a 2D torus of dimensions* *. If* *and* *collide in round* *, for any* *, the probability that* *and* *collide again in round* *is*

#### Proof sketch.

The full proof of *Lemma 3* is given in *SI Appendix*, section S2. We sketch the main ideas here and illustrate them in Fig. 2.

We first show that the probability that

The recollision probability is the probability that

One idea might be to bound this “equalization probability” using the global mixing time of the torus (7). After

Thus, we must take a different approach. We first assume, for simplicity, that the walk is on an infinite grid, and so there is no possibility of returning to its origin by “wrapping around” the torus. We later show that this only affects the equalization probability by an

Considering a walk on the infinite grid, we condition on the walk taking roughly

It is well known that an

Since it may be of independent interest, in *Corollary 15* in *SI Appendix*, section S3, we restate the result of *Lemma 3* explicitly in terms of a bound on the probability that a single random walk returns to its origin (equalizes) after

### Collision Moment Bound.

With *Lemma 3* in hand, we can prove our collision moment bound, which we use to show that the number of collisions an agent sees concentrates strongly around its expectation. We first show that any agent is likely to collide with many other agents during the execution of *Algorithm 1*, rather than repeatedly colliding with just a few other agents. That is, the probability that an agent collides at least once with any given other agent is not too low.

### Lemma 4 (First Collision Probability).

*Assuming* *, for all* *,*

#### Proof sketch.

By *Lemma 3* and the assumption that *Lemma 2*, an agent expects to be involved in *SI Appendix*, section S2.

*Lemma 4* used that, by *Lemma 3*, an agent expects to collide

### Lemma 5 (Collision Moment Bound).

*For* *, let* *and assume* *. There is some fixed constant* *such that*, *for any integer* *,**Lemma 5* gives a bound on the variance of

#### Proof sketch.

Very roughly, we separately consider the simple case when *Lemma 4*. In the latter case, we split *Lemma 3* to bound this probability as

Obtaining the theorem requires combining this bound with Eq. **1** and applying a number of careful rearrangements. However, the bound on *SI Appendix*, section S2.

As with *Lemma 3*, the techniques used in *Lemma 5* can be applied to bounding the moments of the number of equalizations of a single random walk. See *Corollaries 16* and *17* in *SI Appendix*, section S3.

### Correctness of Encounter Rate with High Probability.

Armed with *Lemma 5*, we can finally show that *Algorithm 1*). We first restate *Lemma 5* using a standard “Bernstein condition” on the sum

### Corollary 6 (Bernstein Condition).

*Assuming* *for all* *and some* *and*

#### Proof.

By *Lemma 5*, there exists some constant

We use the following concentration bound for random variables satisfying such a Bernstein condition.

### Lemma 7.

*Suppose that* *satisfies* *for all* *. Then, for any* *,*

We conclude this section by proving our main theorem on the accuracy of random walk-based density estimation.

### Proof of Theorem 1.

In *Algorithm 1*, *Lemma 2*. By *Corollary 6* and *Lemma 7*,

## Extensions to Other Topologies

We now discuss extensions of our results to a broader set of graph topologies, demonstrating the generality of our local mixing analysis. We illustrate divergence between local and global mixing properties, which can have significant effects on random walk-based algorithms. Full proofs for all results in this section are deferred to *SI Appendix*, section S4.

### From Recollision Bounds to Accurate Density Estimation.

Our proofs for the 2D torus are largely independent of graph structure, using just a recollision probability bound (*Lemma 3*) and the regularity (uniform node degrees) of the grid, so agents remain uniformly distributed on the nodes in each round (see, for example, *Lemma 2*). Hence, extending our results to other regular graphs primarily involves obtaining recollision probability bounds for these graphs.

We consider agents on a graph with *Algorithm 1*, stepping to a random neighbor in each round. Again, we focus on the multiagent case, but similar bounds (resembling *Corollaries 16* and *17* in *SI Appendix*, section S3) hold for a single random walk. We start with a lemma which gives density estimation accuracy in terms of recollision probability. This is a direct generalization of our grid analysis.

### Lemma 8 (Recollision Probability to Density Estimation Accuracy).

Consider a regular graph with *Algorithm 1* returns

Note that, in the special case of the 2D torus, by *Lemma 3*, we can set *Theorem 1*.

### Density Estimation on *k*-Dimensional Tori.

We first consider

### Lemma 9 (Recollision Probability Bound: Ring).

*If two randomly walking agents* *and* *are located on a* *D torus (a ring) with* *nodes, and collide in round* *, for any* *, the probability that* *and* *collide again in round* *for* *is*

#### Proof sketch.

This bound can be shown similarly to *Lemma 3* (and, in fact, its proof is fully contained in the proof of *Lemma 3*.) A

For *Lemma 8*, on a ring, random walk-based density estimation gives

We now cover

### Lemma 10 (Recollision Probability Bound: High-Dimensional Torus).

*If two randomly walking agents* *and* *are located on a* *-dimensional torus with* *nodes, and collide in round* *, for any constant* *,* *, the probability that* *and* *collide in round* *is*

#### Proof sketch.

The proof is similar to that of *Lemma 3*. To collide in round *Lemma 3*. The result follows by multiplying these

To convert the above bound to a density estimation accuracy, we can use a slightly modified version of *Lemma 8*, which applies to the case when our collision probability is

### Density Estimation on Regular Expanders.

When a graph does mix well globally, it mixes well locally. An obvious example is the complete graph, on which random walk-based and independent sampling-based density estimations are equivalent. We extend this intuition to any regular expander. An expander is a graph whose random walk matrix has its second eigenvalue bounded away from

### Lemma 11 (Recollision Probability Bound: Regular Expander).

*Let* *be a* *-regular expander with* *nodes and adjacency matrix* *. Let* *be its random walk matrix, with eigenvalues* *. Let* *. If two randomly walking agents* *and* *collide in round* *, for any* *, the probability that they collide again in round* *is*, *at most*,

#### Proof sketch.

The bound follows from noting that the stable distribution on a regular expander is uniform, and the location distribution of any agent after

Again, we bound density estimation accuracy via a modification of *Lemma 8*, which applies when we have collision probability

### Density Estimation *k*-Dimensional Hypercubes.

Finally, we give bounds for a *Lemma 11* with

### Lemma 12 (Recollision Probability Bound: *k*-Dimensional Hypercube).

*If two randomly walking agents* *and* *are located on a* *-dimensional hypercube with* *vertices and collide in round* *, for any* *, the probability that* *and* *collide in round* *is*

Converting to a density estimation bound, we have

## Applications

We conclude by discussing algorithmic applications of our ant-inspired density estimation algorithm (*Algorithm 1*), variations on this algorithm, and the analysis techniques we have developed.

### Social Network Size Estimation.

Random walk-based density estimation is closely related to work on estimating the size of social networks and other massive graphs using random walks (12, 17⇓–19). In these applications, one does not have access to the full graph (so cannot exactly count the nodes) but can simulate random walks by following links between nodes (20, 21). One approach is to run a single random walk and count repeat node visits (17, 18). Alternatively, ref. 12 proposes running multiple random walks and counting their collisions, which gives an estimate of the walk’s density. Since the number of walks is known, this yields an estimate for network size.

This approach can be significantly more efficient, since the dominant cost is typically in link queries to the network. With multiple, shorter random walks, this cost can be trivially distributed to multiple servers simulating walks independently. Visit information can then be aggregated, and the collision count can be computed in a centralized manner.

#### Random walk-based algorithm for network size estimation.

Consider an undirected, connected, nonbipartite graph

To compute the network size

A number of challenges are introduced by this application. While we can simulate many random walks on

Further, in general,

Our results depend on a natural generalization of recollision probability. For any *Lemma 8*.

For simplicity, we initially ignore burn-in and assume that our walks start distributed exactly by the stable distribution of *SI Appendix*, section S5 for a rigorous analysis of burn-in and average degree estimation.

Note that there are many ways to implement the *Algorithm 2*. One possibility is to simulate the random walks in parallel, recording their paths, and then to perform centralized postprocessing to count collisions. As queries to the network are considered to dominate time cost, this collision counting step is relatively inexpensive.

We prove the following theorem in *SI Appendix*, section S5.

### Theorem 13.

*If* Algorithm 2 *is run using* *random walks for* *steps, as long as* *, then*, *with probability at least* *, it returns*

#### Proof sketch.

The proof is similar to that of *Theorem 1*. It is not hard to see that, due to our reweighting of each collision by *Lemma 5*. However, we can give a variance bound on

### Overall runtime and comparison to previous work.

Let *Algorithm 2* (see *SI Appendix*, section S5 for details). To obtain a *Theorem 13* and our analysis of average degree estimation in *SI Appendix*, section S5, we have

Ref. 12 uses a different approach, halting random walks and counting collisions immediately after burn-in. For reasonable node degrees, they require **2** gives an important tradeoff: By increasing

As an illustrative example, consider a *Algorithm 2* (see *SI Appendix*, section S5 for details) is *Lemma 10*,

We leave open comparing our bounds with those of ref. 12 on more natural classes of graphs. It would be interesting to determine typical values of

### Distributed Density Estimation by Robot Swarms.

*Algorithm 1* can be directly applied as a simple and robust density estimation algorithm for robot swarms moving on a 2D plane modeled as a grid. Additionally, the algorithm can be used to estimate the frequency of certain properties within the swarm. Let

Assuming that agents with property *Theorem 1*, after running for

In an ant colony, properties may include whether an ant has recently completed a successful foraging trip (4), or if an ant is a nestmate or enemy (3). In a robotics setting, properties may include whether a robot is part of a certain task group, whether it has completed a certain task, or whether it has detected a certain event or environmental property.

### Random Walk-Based Sensor Network Sampling.

Finally, we believe our moment bounds for a single random walk (*Corollaries 16* and *17* in *SI Appendix*, section S3) can be applied to random walk-based distributed algorithms for sensor network sampling. We leave obtaining rigorous bounds in this domain to future work.

Random walk-based sensor network sampling (13, 14) is a technique in which a query message (a “token”) is initially sent by a base station to some sensor. The token is relayed randomly between sensors, which are connected via a grid network, and its value is updated appropriately at each step to give an answer to the query. This scheme is robust and efficient; it easily adapts to node failures and does not require setting up or storing spanning tree communication structures.

Random walk-based sampling could be used, for example, to estimate the percentage of sensors that have recorded a specific condition, or the average value of some measurement at each sensor. However, as in density estimation, unless an effort is made to record which sensors have been previously visited, additional error is added due to repeat visits. Recording previous visits introduces computational burden: Either the token message size must increase or nodes themselves must remember which tokens they have seen. We are hopeful that our moment bounds can be used to show that this is unnecessary: Due to strong local mixing, the number of repeat sensor visits will be low, and the performance reduction limited.

We remark that estimating the percentage of sensors in a network or the density of robots in a swarm with a property that is uniformly distributed is a special case of a more general data aggregation problem: Each agent or sensor holds a value

## Discussion and Future Work

We have presented a theoretical analysis of random walk-based density estimation by agents moving synchronously on a 2D torus graph. We have also presented applications of our techniques to density estimation on other simple graph topologies and to the problems of social network size estimation and density estimation on robot swarms.

Aside from using our bounds to study sensor network sampling and giving improved theoretical and empirical understanding of our social network size estimation algorithm, our work leaves open a number of questions related to modeling random walk-based density estimation in ant colonies.

We feel that our simple computational model well reflects the behavior of ants estimating density via collision rates while moving around a 2D surface. However, extending our results to more realistic models, e.g., with continuous movement along a surface which is either bounded or extends out indefinitely, is an interesting future direction.

As discussed, understanding how close actual ant movements are to random walks, and how nonrandom behavior influences density estimation via collision detection, is also important. In conjunction with this issue, removing our uniform density assumption and understanding how ants may estimate local population densities which may vary throughout the nest or surrounding area is an important direction.

Finally, we note that the accuracy bound of *Theorem 1* depends on the density

## Acknowledgments

We thank Amartya Shankha Biswas, Christopher Musco, and Mira Radeva for useful discussions. Research was supported by National Science Foundation (NSF) Grants BIO-1455983 and CCF-1461559, NSF Center for Science of Information Grant CCF-0939370, and Air Force Office of Scientific Research Grant FA9550-13-1-0042. C.M. is partially supported by an NSF graduate student fellowship.

## Footnotes

↵

^{1}C.M., H.S., and N.L. contributed equally to this work.- ↵
^{2}To whom correspondence should be addressed. Email: lynch{at}csail.mit.edu.

This contribution is part of the special series of Inaugural Articles by members of the National Academy of Sciences elected in 2016.

An extended abstract of this work has been previously published (1).

Author contributions: C.M., H.-H.S., and N.A.L. designed research; C.M., H.-H.S., and N.A.L. performed research; C.M. and H.-H.S. contributed new reagents/analytic tools; and C.M., H.-H.S., and N.A.L. wrote the paper.

Reviewers: Y.A., Tel-Aviv University; Z.B.-J., Carnegie Mellon University; and A.K., French National Center for Scientific Research.

The authors declare no conflict of interest.

See QnAs on page 10512.

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1706439114/-/DCSupplemental.

## References

- ↵.
- Musco C,
- Su H-H,
- Lynch NA

- ↵
- ↵
- ↵.
- Gordon DM

- ↵
- ↵
- ↵.
- Lovász L

- ↵.
- Elsässer R,
- Sauerwald T

- ↵.
- Kanade V,
- Mallmann-Trenn F,
- Sauerwald T

- ↵.
- Gillman D

- ↵.
- Chung KM,
- Lam H,
- Liu Z,
- Mitzenmacher M

- ↵.
- Katzir L,
- Liberty E,
- Somekh O,
- Cosma IA

- ↵.
- Avin C,
- Brito C

- ↵.
- Lima L,
- Barros J

- ↵
- ↵.
- Bassalygo L,
- Pinsker M

- ↵.
- Kurant M,
- Butts CT,
- Markopoulou A

- ↵.
- Lu J,
- Li D

- ↵.
- Lu J,
- Wang H

- ↵.
- Mislove A,
- Marcon M,
- Gummadi KP,
- Druschel P,
- Bhattacharjee B

- ↵.
- Gjoka M,
- Kurant M,
- Butts CT,
- Markopoulou A