# Direct reciprocity in structured populations

Edited by Simon A. Levin, Princeton University, Princeton, NJ, and approved May 3, 2012 (received for review April 20, 2012)

## Abstract

Reciprocity and repeated games have been at the center of attention when studying the evolution of human cooperation. Direct reciprocity is considered to be a powerful mechanism for the evolution of cooperation, and it is generally assumed that it can lead to high levels of cooperation. Here we explore an open-ended, infinite strategy space, where every strategy that can be encoded by a finite state automaton is a possible mutant. Surprisingly, we find that direct reciprocity alone does not lead to high levels of cooperation. Instead we observe perpetual oscillations between cooperation and defection, with defection being substantially more frequent than cooperation. The reason for this is that “indirect invasions” remove equilibrium strategies: every strategy has neutral mutants, which in turn can be invaded by other strategies. However, reciprocity is not the only way to promote cooperation. Another mechanism for the evolution of cooperation, which has received as much attention, is assortment because of population structure. Here we develop a theory that allows us to study the synergistic interaction between direct reciprocity and assortment. This framework is particularly well suited for understanding human interactions, which are typically repeated and occur in relatively fluid but not unstructured populations. We show that if repeated games are combined with only a small amount of assortment, then natural selection favors the behavior typically observed among humans: high levels of cooperation implemented using conditional strategies.

The problem of cooperation in its simplest and most challenging form is captured by the Prisoner’s Dilemma. Two people can choose between cooperation and defection. If both cooperate, they get more than if both defect, but if one defects and the other cooperates, the defector gets the highest payoff and the cooperator gets the lowest. In the one-shot Prisoner’s Dilemma, it is in each person’s interest to defect, even though both would be better off had they cooperated. This game illustrates the tension between private and common interest.

However, people often cooperate in social dilemmas. Explaining this apparent paradox has been a major focus of research across fields for decades. Two important explanations for the evolution of cooperation that have emerged are reciprocity (1–19) and population structure (20–32). If individuals find themselves in a repeated Prisoner’s Dilemma, rather than a one-shot version, then there are Nash equilibria where both players cooperate under the threat of retaliation in future rounds (1–19). The existence of such equilibria is a cornerstone result in economics (1–3), and the evolution of cooperation in repeated games is of shared interest for biology (4–10), economics (11–14), psychology (15), and sociology (16), with applications that range from antitrust laws (17) to sticklebacks (18), although it has been argued that firm empirical support in nonhuman animal societies is rare (19).

Population structure is equally important. If individuals are more likely to interact with others playing the same strategy, then cooperation can evolve even in one-shot Prisoner’s Dilemmas, because then cooperators not only give, but also receive more cooperation than defectors (20–32). There are a host of different population structures and update rules that can cause the necessary assortment (28, 31). Whether thought of in terms of kin selection (20, 25, 26), group selection (24, 27, 32), both (29), or neither (30, 31), population structure can allow for the evolution of cooperative behavior that would not evolve in a well-mixed population. Assortment can be, but does not have to be, genetic; it can also be cultural, as for example in coevolutionary models based on cultural group selection (32).

In this article, we consider the interaction of these two mechanisms: direct reciprocity and population structure. We begin by re-examining the ability of direct reciprocity to promote cooperation in unstructured populations. Previous studies tend to consider strategies that only condition on the previous period (8–10) or use static equilibrium concepts that focus on infinitely many repetitions (11, 12). Although useful for analytical tractability, both of these approaches could potentially bias the results. Thus, we explore evolutionary dynamics that allow for an open-ended, infinite strategy space, and look at games where subsequent repetitions occur with a fixed probability δ.

To do so, we perform computer simulations where strategies are implemented using finite state automata (see Fig. 1 for examples), and complement these simulations with analytical results, which are completely general and apply to all possible deterministic strategies. Our simulations contain a mutation procedure that guarantees that every finite state automaton can be reached from every other finite state automaton through a sequence of mutations. Thus, every strategy that can be encoded by a finite state automaton is a possible mutant. The mutants that emerge at a given time depend on the current state of the population: close-by mutants, requiring only one or two mutations, are more likely to arise than far away mutants, requiring many mutations.
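To make the automaton representation concrete, the following is a minimal sketch of how deterministic strategies could be encoded and matched in a game with continuation probability δ. It is not the simulation code behind the reported results: the `Automaton` class, the `play` function, and the encoding of states are illustrative assumptions, and the mutation procedure is omitted.

```python
import random
from dataclasses import dataclass


@dataclass
class Automaton:
    """Deterministic finite state automaton for a repeated game (hypothetical encoding).

    actions[s]     -- action played in state s ("C" or "D")
    transitions[s] -- maps the opponent's last action to the next state
    State 0 is the initial state.
    """
    actions: tuple
    transitions: tuple

    def next_state(self, state, opponent_action):
        return self.transitions[state][opponent_action]


# Four textbook strategies written as automata.
ALLC = Automaton(("C",), ({"C": 0, "D": 0},))
ALLD = Automaton(("D",), ({"C": 0, "D": 0},))
TFT = Automaton(("C", "D"), ({"C": 0, "D": 1}, {"C": 0, "D": 1}))
GRIM = Automaton(("C", "D"), ({"C": 0, "D": 1}, {"C": 1, "D": 1}))


def play(a, b, delta, rng):
    """Play one repeated game between automata a and b.

    After every round another round follows with probability delta
    (delta must be below 1). Returns the list of (action_a, action_b) pairs.
    """
    state_a = state_b = 0
    history = []
    while True:
        act_a, act_b = a.actions[state_a], b.actions[state_b]
        history.append((act_a, act_b))
        state_a = a.next_state(state_a, act_b)
        state_b = b.next_state(state_b, act_a)
        if rng.random() >= delta:  # the game ends with probability 1 - delta
            return history


if __name__ == "__main__":
    print(play(TFT, ALLD, 0.9, random.Random(1)))
```

Under this encoding the expected number of rounds is 1/(1 – δ), so δ = 0 recovers the one-shot game.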

Our computer program can, in principle, explore the whole space of deterministic strategies encoded by finite state automata. Fig. 1 shows that the population regularly transitions in and out of cooperation and gives a sample of the equilibrium strategies, with different degrees of cooperation, that surface temporarily. The variety of equilibria shows that evolution does explore a host of different possibilities for equilibrium behavior, and that it is as creative in constructing equilibria as it is in undermining them.

Based on previous analyses of repeated games, one might expect evolution to lead to high levels of cooperation for relatively small b/c ratios in our simulations, provided the continuation probability δ is reasonably large. However, this is not what we find. To understand why, we have to consider indirect invasions (33).

In a well-mixed population, the strategy Tit-for-Tat (TFT) can easily resist a direct invasion of ALLD (always choosing to defect, regardless) because ALLD performs badly in a population of TFT players, provided that the continuation probability δ is large enough. The strategy ALLC (unconditional cooperation), however, can serve as a springboard for ALLD and thereby disrupt cooperation. ALLC is a neutral mutant of TFT: when meeting themselves and each other, both strategies always cooperate, and hence they earn identical payoffs. Thus, ALLC can become more abundant in a population of TFT players through neutral drift. When this process occurs, ALLD can then invade by exploiting ALLC. Unconditional cooperation is therefore cooperation’s worst enemy (10).
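The condition on δ and the neutrality of ALLC can be made explicit with a short calculation. The sketch below assumes a donation-game stage game, in which cooperating costs the actor *c* and gives the partner a benefit *b* > *c* > 0, and it measures payoffs as expected totals over a game in which round *t* is reached with probability δ^{t}. Both choices are assumptions made for illustration.

```latex
% Expected total payoffs (rows: focal strategy, against the named opponent)
\pi(\mathrm{TFT},\mathrm{TFT}) = \pi(\mathrm{ALLC},\mathrm{TFT}) = \pi(\mathrm{TFT},\mathrm{ALLC})
  = \sum_{t=0}^{\infty} \delta^{t}(b-c) = \frac{b-c}{1-\delta},
\qquad
\pi(\mathrm{ALLD},\mathrm{TFT}) = b .

% TFT resists direct invasion by ALLD when
\frac{b-c}{1-\delta} > b \iff \delta > \frac{c}{b} .

% ALLD always outperforms ALLC against an ALLC opponent:
\pi(\mathrm{ALLD},\mathrm{ALLC}) = \frac{b}{1-\delta} > \frac{b-c}{1-\delta} = \pi(\mathrm{ALLC},\mathrm{ALLC}) .
```

The first line shows why ALLC is a neutral mutant of TFT, the second gives the threshold on δ, and the third shows why a population that has drifted toward ALLC is open to ALLD.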

Indirect invasions do not only destroy cooperation; they can also establish cooperation (for examples see the *SI Appendix*). If the population size is not too small, those indirect invasions in and out of cooperation come to dominate the dynamics in the population (34).

We can show that no strategy is ever robust against indirect invasions; there are always indirect paths out of equilibrium (34). Our simulations also suggest that in a well-mixed population, paths out of cooperation are more likely than paths into cooperation. Even for relatively high continuation probabilities, evolution consistently leads only to moderate levels of cooperation when averaged over time (unless the benefit-to-cost ratio of cooperation is very high) (see also *SI Appendix*, Fig. S5).

How, then, can we achieve high levels of cooperation? We find that adding a small amount of population structure increases the average level of cooperation substantially if games are repeated. The interaction between repetition and population structure is of primary importance, especially for humans, who tend to play games with many repetitions and who live in fluid, but not totally unstructured populations (35–40).

To explore the effect of introducing population structure, we no longer have individuals meet entirely at random. Instead, a player with strategy *S* is matched with an opponent that also uses strategy *S* with probability α + (1 – α)*x*_{S}, where *x*_{S} is the frequency of strategy *S* in the population, and α is a parameter that can vary continuously between 0 and 1. With this matching process, the assortment parameter α is the probability for a rare mutant to meet a player that has the same strategy (21, 29, 41, 42).

If δ *=* 0, we are back in the one-shot version of the game. If α *=* 0, we study evolution in the unstructured (well-mixed) population. Thus, the settings where only one of the two mechanisms is present are included as special cases in our framework.
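One way to realize the matching probability α + (1 – α)*x*_{S} is to force a same-strategy match with probability α and otherwise draw the opponent uniformly from the population. The sketch below assumes this two-step procedure; the function names and the representation of the population as a list of strategy labels are hypothetical.

```python
import random


def same_strategy_probability(alpha, x_s):
    """Probability that a player with strategy S meets another S player."""
    return alpha + (1.0 - alpha) * x_s


def draw_opponent(focal, population, alpha, rng):
    """Draw the opponent's strategy for a focal player (illustrative procedure).

    With probability alpha the opponent is assorted and shares the focal strategy;
    otherwise the opponent is drawn uniformly at random from the population.
    """
    if rng.random() < alpha:
        return focal
    return rng.choice(population)


if __name__ == "__main__":
    rng = random.Random(0)
    population = ["S"] * 25 + ["T"] * 75  # x_S = 0.25
    draws = [draw_opponent("S", population, 0.2, rng) for _ in range(20000)]
    print(draws.count("S") / len(draws), same_strategy_probability(0.2, 0.25))
```

For a rare mutant, *x*_{S} is close to 0 and the probability reduces to α, which matches the interpretation of α given above.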

Fig. 2 shows how assortment and repetition together affect the average level of cooperation in our simulations. If there is no repetition, δ *=* 0, we find a sharp threshold for the evolution of cooperation. If there is some repetition, δ *>* 0, we find a more gradual rise in the average level of cooperation as assortment α increases. In the lower right region of Fig. 2, where repetition is high and assortment is low (but nonzero), we find behavior similar to what is observed in humans: the average level of cooperation is high, and the strategies are based on conditional cooperation (14, 43–46). In contrast, when assortment is high and repetition is low we observe the evolution of ALLC, which is rare among humans.
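The sharp threshold in the one-shot case can be illustrated with a short invasion calculation. The sketch assumes a donation-game stage game (cooperation costs the actor *c* and gives the partner *b* > *c* > 0) and uses the matching rule above, under which a rare mutant meets its own type with probability α and a common resident meets its own type with probability close to 1.

```latex
% Rare ALLC mutant in an ALLD population (delta = 0):
\pi_{\mathrm{ALLC}} = \alpha (b - c) + (1-\alpha)(-c) = \alpha b - c,
\qquad
\pi_{\mathrm{ALLD}} \approx 0
\;\Longrightarrow\; \text{ALLC invades iff } \alpha > \frac{c}{b}.

% Rare ALLD mutant in an ALLC population (delta = 0):
\pi_{\mathrm{ALLD}} = \alpha \cdot 0 + (1-\alpha)\, b,
\qquad
\pi_{\mathrm{ALLC}} \approx b - c
\;\Longrightarrow\; \text{ALLD invades iff } \alpha < \frac{c}{b}.
```

Under these assumptions both invasion conditions switch at α = *c*/*b*, so without repetition there is no intermediate range of assortment in which both strategies resist each other.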

To gain a deeper understanding of these simulation results, we now turn to analytical calculations. For the stage game of the repeated game we consider the following payoff matrix.
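A parametrization consistent with the benefit-to-cost ratio *b*/*c* used throughout is the donation game, in which cooperating means paying a cost *c* for the other player to receive a benefit *b*, with *b* > *c* > 0. The matrix below is written under that assumption; entries are the payoffs to the row player.

```latex
\begin{array}{c|cc}
    & C     & D  \\ \hline
  C & b - c & -c \\
  D & b     & 0
\end{array}
```

With these entries the Prisoner's Dilemma ordering holds: unilateral defection pays most (*b*), followed by mutual cooperation (*b* – *c*), mutual defection (0), and unilateral cooperation (–*c*).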

Our analytical results are derived without restricting the strategy space and by considering both direct and indirect invasions (see the *SI Appendix* for calculation details). The analytical results are therefore even more general than the simulation results. The simulations allow for all strategies that can be represented by finite automata (a large, countably infinite set of strategies where the mutation procedure specifies which mutations are more or less likely to occur). The analytical treatment, on the other hand, considers and allows for all possible strategies. Thus, the analytical results are completely general and independent of any assumptions about specific mutation procedures.

We find that the parameter space, which is given by the unit square spanned by α and δ, can be divided into five main regions (Fig. 2*B*). We refer to the lower left corner, containing δ = 0, α = 0, as region 1, and number the remaining regions 2–5, proceeding counterclockwise around the pivot (δ, α) = (0, *c/b*). To discuss these regions we introduce the cooperativity of a strategy, defined as the average frequency of cooperation a strategy achieves when playing against another individual using the same strategy. Cooperativity is a number between 0 and 1: fully defecting strategies, such as ALLD, have cooperativity 0, and fully cooperative strategies, such as ALLC or TFT (without noise), have cooperativity 1. In Fig. 3, we show representative simulation outcomes for parameter values in the center of each region.
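Cooperativity can be computed directly for a deterministic automaton, because two copies of the same automaton always occupy the same state. The sketch below assumes that "average frequency" means the expected number of cooperative rounds divided by the expected number of rounds, which weights round *t* by (1 – δ)δ^{t}; the automaton encoding (a tuple of actions and a tuple of transition maps, with state 0 as the start) is likewise an illustrative assumption.

```python
def cooperativity(actions, transitions, delta, tol=1e-12):
    """Discount-weighted share of cooperative rounds in self-play.

    actions[s]     -- action ("C" or "D") played in state s
    transitions[s] -- dict mapping the opponent's action to the next state
    delta          -- continuation probability, 0 <= delta < 1
    The infinite sum is truncated once the round weight drops below tol.
    """
    state = 0
    weight = 1.0 - delta  # weight of round t is (1 - delta) * delta**t
    total = 0.0
    while weight > tol:
        action = actions[state]
        if action == "C":
            total += weight
        # In self-play the opponent plays the same action as the focal player.
        state = transitions[state][action]
        weight *= delta
    return total


if __name__ == "__main__":
    alld = (("D",), ({"C": 0, "D": 0},))
    tft = (("C", "D"), ({"C": 0, "D": 1}, {"C": 0, "D": 1}))
    print(cooperativity(*alld, delta=0.9))  # fully defecting
    print(cooperativity(*tft, delta=0.9))   # fully cooperative, up to truncation error
```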

In regions 1 and 2, where both α and δ are less than *c*/*b*, all equilibrium strategies have cooperativity 0. In region 1, ALLD can directly invade every nonequilibrium strategy. In region 2 ALLD no longer directly invades all nonequilibrium strategies, but for every strategy with cooperativity larger than 0, there is at least one strategy that can directly invade. The fact that these direct invaders exist, however, does not prevent the population from spending some time in cooperative states. In our simulations, direct invasions out of cooperation are relatively rare, possibly because direct invaders are hard to find by mutation. Instead, cooperative states are more often left by indirect invasions.

In region 3 there exist equilibrium strategies for levels of cooperativity ranging from 0 to 1. In the simulations, we observe the population going from equilibrium to equilibrium via indirect invasions. The population spends time in states ranging from fully cooperative to completely uncooperative. As α and δ increase, indirect invasions that increase cooperation become more likely and indirect invasions that decrease cooperation become less likely; therefore the average level of cooperativity increases.

In region 4 there still exist equilibrium strategies for different levels of cooperativity, but fully defecting strategies are no longer equilibria. All equilibria are at least somewhat cooperative. Indirect invasions by fully defecting strategies are possible, and they do occur, but they result in relatively short-lived excursions into fully defecting disequilibrium states, which can be directly invaded by strategies with positive cooperativity. As a result, cooperativity is high across much of region 4. In particular, reciprocal cooperative strategies, which condition their cooperation on past play, are common, for example TFT and Grim.

Finally, in region 5 all equilibrium strategies have maximum cooperativity, and ALLC can directly invade every strategy that is not fully cooperative. It is disadvantageous to defect regardless of the other’s behavior. Therefore, not only is cooperativity high, but specifically unconditional cooperation (ALLC) is the most common cooperative strategy by a wide margin.

The five regions are separated by four curves, which are calculated in the *SI Appendix*. One remarkable finding is the following. If assortment is sufficiently high, α > *c*/*b*, then introducing repetition (choosing δ > 0) can be bad for cooperation. The intuition behind this finding is that reciprocity not only protects cooperative strategies from direct invasions by defecting ones, but also shields somewhat cooperative strategies from direct invasion by more cooperative strategies. (Details are in the *SI Appendix*, along with an explanation of how repetition can facilitate indirect invasions into ALLC for large enough δ. See also ref. 47 and references therein for games other than the Prisoner’s Dilemma in which repetition can be bad, even without population structure.) If we move horizontally through the parameter space, starting in region 5 and increasing the continuation probability, average cooperativity in the simulations therefore first decreases (Fig. 2). Later, cooperativity goes back up again. The reason for this result is that equilibrium strategies can start with “handshakes,” which require one or more mutual defections before they begin to cooperate with themselves (see strategies B, E, F, I, J, K, and O in Fig. 1). The loss of cooperativity for any given handshake decreases if the expected length of the repeated game increases, because the handshake then becomes a relatively small fraction of total play. That effect is only partly offset by the fact that an increase in continuation probability also allows for equilibria with longer handshakes.
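The handshake effect can be quantified under one simple assumption: that cooperativity weights round *t* by (1 – δ)δ^{t}, which is the expected number of cooperative rounds divided by the expected number of rounds. For a strategy whose self-play consists of *k* mutual defections followed by cooperation in every later round, this gives:

```latex
\text{cooperativity}(k,\delta)
  = (1-\delta)\sum_{t=k}^{\infty}\delta^{t}
  = \delta^{k},
\qquad
\text{loss}(k,\delta) = 1 - \delta^{k}.
```

For a fixed handshake length *k* the loss 1 – δ^{k} shrinks as δ grows; for example, a one-round handshake costs 0.5 at δ = 0.5 but only 0.1 at δ = 0.9. Longer handshakes (larger *k*) work in the opposite direction, which is the partial offset described above.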

In our simulations, individuals do not make mistakes; they both perceive the other’s action and execute their own strategy with perfect accuracy. Mistakes, however, are very relevant for repeated games (11, 44, 45, 48–50). Therefore, we also ran simulations with errors for a selection of parameter combinations to check the robustness of our results. In those runs, every time a player chooses an action, C or D, there is some chance that the opposite move occurs. Fig. 4 compares error-free simulations with those that have a 1% and 5% error rate. We find that our conclusions are robust with respect to errors: it is the interaction between repetition and structure that yields high cooperation. Another classic extension is to include complexity costs (12, 48). Here, too, one can reasonably expect the simulation results to be very similar as long as complexity costs are sufficiently small.
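The error mechanism described here, in which the opposite move occurs with some probability each time an action is chosen, could be sketched as follows. The function name `with_error` and its signature are hypothetical, and the sketch models execution errors only; it is not the code used for Fig. 4.

```python
import random


def with_error(intended, error_rate, rng):
    """Return the action actually played: the intended action ("C" or "D"),
    replaced by the opposite action with probability error_rate."""
    if rng.random() < error_rate:
        return "D" if intended == "C" else "C"
    return intended


if __name__ == "__main__":
    rng = random.Random(42)
    played = [with_error("C", 0.05, rng) for _ in range(10000)]
    print(played.count("D") / len(played))  # empirical error frequency
```

In a simulation loop, each player's intended action would pass through such a function before payoffs are assigned and before the opponent's automaton updates its state.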

In summary, we have shown that repetition alone is not enough to support high levels of cooperation, but that repetition together with a small amount of population structure can lead to the evolution of cooperation. In particular, in the parameter region where repetition is common and assortment is small but nonzero, we find a high prevalence of conditionally cooperative strategies. These findings are noteworthy because human interactions are typically repeated and occur in the context of population structure. Moreover, experimental studies show that humans are highly cooperative in repeated games and use conditional strategies (14, 43–46). Thus, our results seem to paint an accurate picture of cooperation among humans. Summarizing, one can say that one possible recipe for human cooperation may have been “a strong dose of repetition and a pinch of population structure.”

## Acknowledgments

We thank Tore Ellingsen, Drew Fudenberg, Corina Tarnita, and Jörgen Weibull for comments. This study was supported by the Netherlands Science Foundation, the National Institutes of Health, and the Research Priority Area Behavioral Economics at the University of Amsterdam; D.G.R. is supported by a grant from the John Templeton Foundation.

## Footnotes

^{1}M.v.V. and J.G. contributed equally to this work.

^{2}To whom correspondence should be addressed. E-mail: C.M.vanVeelen{at}uva.nl.

Author contributions: M.v.V. and J.G. designed research; M.v.V. and J.G. performed research; M.v.V., J.G., D.G.R., and M.A.N. analyzed data; and M.v.V., J.G., D.G.R., and M.A.N. wrote the paper.

The authors declare no conflict of interest.

This article is a PNAS Direct Submission.

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1206694109/-/DCSupplemental.

## References

- ↵
- ↵
- ↵
- ↵
- Axelrod R,
- Hamilton WD

- ↵
- ↵
- ↵
- ↵
- ↵
- ↵
- Imhof LA,
- Fudenberg D,
- Nowak MA

- ↵
- Fudenberg D,
- Maskin E

- ↵
- ↵
- ↵
- ↵
- Liberman V,
- Samuels SM,
- Ross L

- ↵
- Bendor J,
- Swistak P

- ↵
- Abreu D,
- Pearce D,
- Stacchetti E

- ↵
- ↵
- ↵
- ↵
- Eshel I,
- Cavalli-Sforza LL

- ↵
- ↵
- ↵
- ↵
- ↵
- Rousset F

- ↵
- Traulsen A,
- Nowak MA

- ↵
- Fletcher JA,
- Doebeli M

- ↵
- ↵
- ↵
- Nowak MA,
- Tarnita CE,
- Antal T

- ↵
- Richerson P,
- Boyd R

- ↵
- ↵
- van Veelen M,
- García J

- ↵
- ↵
- ↵
- ↵
- ↵
- Tarnita CE,
- Wage N,
- Nowak MA

- ↵
- Rand DG,
- Arbesman S,
- Christakis NA

- ↵
- Grafen A

- ↵
- ↵
- Wedekind C,
- Milinski M

- ↵
- ↵
- ↵
- ↵
- Dasgupta P

- ↵
- ↵
