# Memory-*n* strategies of direct reciprocity

^{a}Institute of Science and Technology Austria, 3400 Klosterneuburg, Austria;^{b}Institute of Cognitive Sciences and Technologies, National Research Council of Italy, 00185 Rome, Italy;^{c}Program for Evolutionary Dynamics, Harvard University, Cambridge, MA 02138;^{d}Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA 02138;^{e}Department of Mathematics, Harvard University, Cambridge, MA 02138


Edited by Kenneth W. Wachter, University of California, Berkeley, CA, and approved March 21, 2017 (received for review December 23, 2016)

## Significance

Direct reciprocity is one of the fundamental mechanisms for cooperation. It is based on the idea that individuals are more likely to cooperate if they can expect their beneficiaries to remember and to return their cooperative acts in the future. Previous computational models, however, often had to restrict the number of past rounds subjects can memorize. Herein, we suggest an alternative approach. We propose general properties that robust cooperative strategies ought to have. Then we characterize all memory-*n* strategies that meet these properties, and we show that such strategies naturally emerge across different evolutionary scenarios. Our results are applicable to general social dilemmas of arbitrary size. For some dilemmas, longer memory is all it takes for cooperation to evolve.

## Abstract

Humans routinely use conditionally cooperative strategies when interacting in repeated social dilemmas. They are more likely to cooperate if others cooperated before, and are ready to retaliate if others defected. To capture the emergence of reciprocity, most previous models consider subjects who can only choose from a restricted set of representative strategies, or who react to the outcome of the very last round only. As players memorize more rounds, the dimension of the strategy space increases exponentially. This increasing computational complexity renders simulations for individuals with higher cognitive abilities infeasible, especially if multiplayer interactions are taken into account. Here, we take an axiomatic approach instead. We propose several properties that a robust cooperative strategy for a repeated multiplayer dilemma should have. These properties naturally lead to a unique class of cooperative strategies, which contains the classical Win–Stay Lose–Shift rule as a special case. A comprehensive numerical analysis for the prisoner’s dilemma and for the public goods game suggests that strategies of this class readily evolve across various memory-*n* spaces. Our results reveal that successful strategies depend not only on how cooperative others were in the past but also on the respective context of cooperation.

In repeated social dilemmas, humans often show conditionally cooperative behaviors (1⇓–3). When there is a temptation to defect at the expense of other group members, subjects consider whether they or others defected before, and react accordingly. However, modeling conditional cooperation is not straightforward, as it is difficult to capture how humans actually make their decisions. Economic models often consider rational subjects who remember all past interactions, and who follow a predefined equilibrium plan (4). Evolutionary models, on the other hand, often take the opposite approach. With a few notable exceptions (5⇓⇓–8), evolutionary models focus on naive subjects who can only choose from a restricted set of strategies (9⇓⇓⇓–13), or who do not remember anything beyond the outcome of the very last round (14⇓⇓⇓⇓⇓⇓–21).

Both approaches represent idealizations, which serve the purpose of making the models computationally tractable. Already for the simplest example, the repeated prisoner’s dilemma, calculations are greatly simplified if one assumes that the players’ strategies depend on the last round only. These so-called memory-1 strategies represent a four-dimensional space, which can be explored systematically (e.g., refs. 16 and 22). Previous studies identified a number of successful memory-1 strategies, including Tit-for-Tat (*TFT*) and Win–Stay Lose–Shift (*WSLS*).
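The four-dimensional representation of memory-1 strategies can be made concrete in a few lines of code. This is our own illustrative sketch (the dictionary keys and function name are ours): each strategy is a vector of cooperation probabilities, indexed by the outcome of the previous round.

```python
# A memory-1 strategy is a 4-vector of cooperation probabilities,
# indexed by (own move, coplayer's move) in the last round.
# The encodings below are standard; the labels are ours.
STRATEGIES = {
    "ALLC": (1.0, 1.0, 1.0, 1.0),
    "ALLD": (0.0, 0.0, 0.0, 0.0),
    "TFT":  (1.0, 0.0, 1.0, 0.0),  # cooperate iff the coplayer cooperated
    "WSLS": (1.0, 0.0, 0.0, 1.0),  # cooperate iff both chose the same action
}

def coop_prob(strategy, my_last, their_last):
    """Cooperation probability after the last round; moves are 'C' or 'D'."""
    index = {"CC": 0, "CD": 1, "DC": 2, "DD": 3}[my_last + their_last]
    return STRATEGIES[strategy][index]
```

For instance, `coop_prob("WSLS", "D", "D")` returns 1.0: after mutual defection, a Win–Stay Lose–Shift player switches back to cooperation.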

To identify successful memory-*n* strategies, we instead take an axiomatic approach. We propose that a robust cooperative strategy ought to be (*i*) mutually cooperative, (*ii*) able to correct errors, and (*iii*) sufficiently retaliating against defectors. We show that all strategies that satisfy these requirements belong to a unique class of all-or-none strategies, which includes the well-known *WSLS* rule as a special case.

## Results

### Repeated Dilemmas with Memory-*n* Strategies.

We consider a repeated game among *N* players who, in each round, can either cooperate or defect. Payoffs satisfy the usual conditions for a social dilemma: (*i*) players prefer their coplayers to cooperate, (*ii*) a defector’s payoff always exceeds the payoff of a cooperator, and (*iii*) for the whole group, mutual cooperation is beneficial. In *SI Appendix*, we show that our results are robust with respect to other payoff specifications (*SI Appendix*, Fig. S7). Further examples of social dilemmas include the snowdrift game (32), the stag hunt game (33), and the public goods game (13, 34).

For a given group member, the payoff in a given round depends on the player’s own action and on how many of the coplayers cooperate.

In repeated games, a player’s decision to cooperate in one round may depend on the entire history of the game so far. Herein, we assume that players base their decision on the previous *n* rounds only; the resulting strategies are called memory-*n* strategies (for formal definitions, see *SI Appendix*).

In the limit of rare errors, a player with some given strategy may never experience certain histories. Histories that still occur with positive probability as the error rate goes to zero are called consistent (see *SI Appendix*). We refer to the set of all consistent histories as the generic histories of the strategy.

### Desirable Properties of Memory-*n* Strategies.

Within the set of memory-1 strategies for the prisoner’s dilemma, evolutionary processes often lead to a particular cooperative strategy, *WSLS* (15, 16). A *WSLS* player cooperates if, in the previous round, either both players cooperated or both players defected. Motivated by the properties that make *WSLS* successful, we propose three requirements that a robust cooperative memory-*n* strategy ought to satisfy (see Fig. 1*A* for a visual description, and see *SI Appendix* for formal definitions). We will start with the property of mutual cooperativeness.

#### (MC_{k}).

A strategy is mutually cooperative if there are histories for which the strategy prescribes to cooperate, and if it continues to cooperate after rounds with mutual cooperation (provided the last *k* rounds consisted of mutual cooperation).

The property (MC_{k}) follows from the general scope of our paper: We are looking precisely for strategies that, in principle, allow for fully cooperative interactions. We do not require such players always to cooperate after mutual cooperation, but they need to do so for all generic histories that are reached with positive probability as the error rate goes to zero. In addition to *WSLS*, many well-known strategies satisfy (MC_{k}), including *TFT*.

#### (EC_{k}).

A strategy is error-correcting if, when all group members apply it, the group reestablishes full cooperation within at most *k* rounds after any single erroneous move.

The property (EC_{k}) is especially relevant when considering simulations for stochastic strategies. In such simulations, mutations rarely introduce strategies that perfectly satisfy (MC_{k}). Instead, players may only apply approximations to mutually cooperative strategies. If a cooperative strategy is to be successful, it thus needs to find effective ways to cope with this noise.

When all players use the same (EC_{k}) strategy, it seems beneficial to be as forgiving as possible. The unconditional strategy *ALLC*, for example, trivially satisfies (EC_{0}), but is easily exploited by defectors. Thus we ask for the following property.

#### (RE_{k}).

A strategy is sufficiently retaliating if players who apply it defect for at least *k* rounds after a coplayer’s unilateral defection.

Condition (RE_{k}) is important for two reasons. First, by construction, such strategies show some robustness against exploitation by defectors. Second, and maybe less obviously, such strategies are also more resistant against invasion by more forgiving strategies (which, in turn, would be susceptible to defectors). For example, in the prisoner’s dilemma, a strategy satisfying (RE_{1}) cannot be invaded by a variant that merely satisfies (RE_{0}): If such a forgiving mutant is to remain stable against defectors, it eventually has to satisfy (RE_{1}) as well: The player needs to become less forgiving.

We note that, for the prisoner’s dilemma, *WSLS* satisfies (MC_{1}), (EC_{1}), and (RE_{1}). Conversely, it is essentially the only pure memory-1 strategy that does so (Fig. 1*B*). In *SI Appendix*, we show that similar strategies exist for any memory length *n* and for any group size: the all-or-none strategies.

A player with this strategy only cooperates if all players used exactly the same actions in the past, that is, if, in each of the last *n* rounds, either all group members cooperated or all of them defected. Three remarks are in order. (*i*) For the prisoner’s dilemma with memory 1, the all-or-none rule coincides with *WSLS*. (*ii*) Strategies resembling all-or-none behavior have been observed in earlier simulation studies. (*iii*) Pinheiro et al. (34) were first to use the name “all-or-none strategy,” to describe the memory-1 member of this class in the repeated public goods game.
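The all-or-none decision rule can be sketched directly from this description. This is our own minimal illustration (the function name and the convention of cooperating in the opening rounds, before *n* rounds have been played, are our assumptions):

```python
def aon_cooperates(history, n):
    """All-or-none rule with memory n: cooperate iff each of the last n
    rounds was unanimous (all group members cooperated, or all defected).

    history: list of rounds; each round is a tuple of 'C'/'D' actions,
    one entry per group member. With fewer than n recorded rounds we
    cooperate by convention (our assumption for the opening rounds).
    """
    if len(history) < n:
        return True
    for round_actions in history[-n:]:
        if len(set(round_actions)) != 1:  # mixed round: some C, some D
            return False
    return True
```

With `n = 1` and two players, the rule reproduces *WSLS*: it cooperates after `('C', 'C')` and `('D', 'D')`, but not after mixed rounds.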

We prove, in *SI Appendix*, that any strategy satisfying the above properties needs to be equivalent to the corresponding all-or-none strategy on all generic histories.

When condition **2** holds, no mutant strategy can gain a higher payoff, even if mutants were allowed to use stochastic strategies or if they had access to higher memory. For the prisoner’s dilemma, this condition simplifies to the requirement that the benefit-to-cost ratio exceed (*n*+1)/*n*.
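A back-of-the-envelope check of this threshold, under the donation-game parametrization with benefit *b* and cost *c* (our own accounting sketch, not the paper’s formal proof): a one-shot defection against all-or-none residents saves the deviator the cost *c* in the deviation round, but triggers *n* punishment rounds of mutual defection (payoff 0) in place of *n* rounds of mutual cooperation (payoff *b* − *c* each).

```python
def deviation_is_profitable(b, c, n):
    """One-shot deviation against all-or-none residents (donation game).

    Gain in the deviation round: +c (the cost of cooperation is saved).
    Loss: n punishment rounds yield 0 instead of b - c each.
    """
    gain = c
    loss = n * (b - c)
    return gain > loss
```

Setting `gain > loss` gives c > n(b − c), so deviations are deterred exactly when b/c > (n+1)/n: memory 1 requires b/c > 2 (the familiar *WSLS* condition), while memory 2 only requires b/c > 3/2.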

Although subgame perfection guarantees that no single mutant can have a higher payoff, it does not imply that no mutant can do equally well. We show, in *SI Appendix*, that all-or-none strategies can be invaded through neutral drift; however, in *SI Appendix*, we also provide simulation results suggesting that such neutral invasions do not undermine the evolution of cooperation (*SI Appendix*, Figs. S2 and S5).

Finally, it is also worth noting that cooperative strategies do not necessarily need to have the form of all-or-none strategies once the above properties are relaxed. In particular, the retaliation property (RE_{k}) is rather restrictive. It does not require only that unilateral defection is punished; it requires players to retaliate for at least *k* consecutive rounds.

### Stability of Pure Memory-2 Strategies with Errors.

As an application of our previous results, let us consider memory-2 strategies for the repeated prisoner’s dilemma. Depending on the player’s own two moves in the previous two rounds, and depending on the two moves of the coplayer, there are 2^4 = 16 possible histories a player needs to react to, and hence 2^16 = 65,536 pure memory-2 strategies.

We performed an exhaustive numerical analysis to identify all strict Nash equilibria if players are restricted to pure memory-2 strategies. To this end, we considered a small error rate and checked, for each pure strategy, whether any mutant can match the resident’s payoff (see *SI Appendix* for a detailed description of the method). This analysis shows that, for typical benefit-to-cost ratios (with *b*/*c* between 3/2 and 2), the strict Nash equilibria that sustain full cooperation all have the all-or-none character described above.
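How a pure memory-2 strategy behaves against itself under rare errors can be estimated with a small Monte Carlo sketch. This is our own illustration, not the paper’s exhaustive analytic method; the strategy interface (a function of the four remembered moves, 1 = cooperate) is our convention:

```python
import random

def aon2(own_prev, own_now, opp_prev, opp_now):
    # Memory-2 all-or-none rule: cooperate (1) iff both remembered
    # rounds were unanimous.
    return 1 if (own_prev == opp_prev and own_now == opp_now) else 0

def self_play_coop_rate(strategy, eps, rounds=100000, seed=1):
    """Monte Carlo cooperation rate when two players use the same pure
    memory-2 strategy and independently err with probability eps."""
    rng = random.Random(seed)
    # State: both players' actions in the last two rounds, initialized
    # with two rounds of mutual cooperation (1 = C, 0 = D).
    a1_prev, a2_prev, a1_now, a2_now = 1, 1, 1, 1
    coops = 0
    for _ in range(rounds):
        i1 = strategy(a1_prev, a1_now, a2_prev, a2_now)
        i2 = strategy(a2_prev, a2_now, a1_prev, a1_now)
        b1 = 1 - i1 if rng.random() < eps else i1  # implementation error
        b2 = 1 - i2 if rng.random() < eps else i2
        coops += b1 + b2
        a1_prev, a2_prev, a1_now, a2_now = a1_now, a2_now, b1, b2
    return coops / (2 * rounds)
```

For a small error rate, two all-or-none players spend only the short punishment episodes after each error in mutual defection, so their cooperation rate stays close to 1, whereas an unconditional defector pair cooperates only by mistake.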

This numerical approach is not restricted to cooperative strategies; we also used this method to record all other stable memory-2 strategies for the repeated prisoner’s dilemma. We find that, besides the class of cooperative strategies, one can distinguish three additional classes of stable behaviors. First, there are Nash equilibria that lead to mutual defection, containing 15 elements, including the unconditional defection strategy *ALLD* (*SI Appendix*, Table S2). Second, we also identified a class of stable self-alternating strategies, containing eight strategies in total (*SI Appendix*, Table S3). When applied by both players, these self-alternating strategies lead to a deterministic switch between rounds of mutual cooperation and rounds of mutual defection. Finally, the last class of Nash equilibria consists of strategies that have two absorbing states, for example, mutual cooperation and mutual defection (*SI Appendix*, Table S4). When two such players interact, they defect for a large number of rounds; however, after a specific sequence of erroneous moves, players begin to cooperate until cooperation again breaks down due to errors. Thus, the set of memory-2 strategies allows for multiple equilibria that differ in their prospects for cooperation. Which of these equilibria is most relevant may thus depend on how likely they are to emerge in natural evolutionary processes.

### Evolutionary Dynamics Among Memory-*n* Players.

Based on the previous equilibrium analysis, we may predict the following: (*i*) For intermediate benefit-to-cost ratios, memory-2 players should achieve more cooperation than memory-1 players. (*ii*) If cooperation evolves, it is due to strategies with an all-or-none character (i.e., strategies that are particularly likely to cooperate if players chose the same actions during the previous rounds). In the following, we test these predictions by simulating a simple imitation process based on the dynamics of Imhof and Nowak (17) for stochastic memory-1 and memory-2 strategies (the setup of these simulations is outlined in *Materials and Methods*).

Our simulation results support both predictions. When players are allowed to use memory-2 strategies, the evolving cooperation rates sharply increase once the benefit-to-cost ratio exceeds the critical threshold 3/2 (Fig. 2*A*); because the evolving cooperation rate is a continuous function of the model parameters, this transition is steep rather than a discontinuous jump. The evolving strategies confirm the second prediction (Fig. 2 *B* and *C*). When both players cooperated in all remembered rounds, players are almost certain to cooperate in the next round. Moreover, players are most likely to cooperate if the players’ actions in the last rounds coincided, consistent with all-or-none behavior.

Among memory-2 players, the evolving strategies are not mere copies of *WSLS*. In Fig. 2*C*, this would require that the first four bars are close to 1, because a memory-2 implementation of *WSLS* cooperates whenever the very last round was mutually cooperative, irrespective of the round before.

Fig. 2 *B* and *C* also suggests that the evolving memory-1 strategies yield somewhat better approximations to *WSLS* than the evolving memory-2 strategies yield to the corresponding all-or-none strategy, in line with the considerably larger strategy space that memory-2 players need to explore.

To further corroborate our theoretical predictions, we show, in *SI Appendix*, that all-or-none strategies are also predominant when we simulate the dynamics among memory-1 strategies for the public goods game (see *SI Appendix*, Fig. S4, using a strictly larger strategy set than in refs. 13 and 34). Similarly, we show that behaviors reminiscent of all-or-none strategies evolve in the prisoner’s dilemma when players only remember how often (but not when) players cooperated during the previous rounds (*SI Appendix*, Fig. S3). Among these simplified memory-*n* players, however, cooperation evolves less readily (*SI Appendix*, Fig. S3*A*). These results suggest memory is not only important to assess how cooperative other group members have been but is also an important mechanism to reach coordination among like-minded players. Such coordination attempts are most successful if players remember both the degree of cooperation and its timing.

## Discussion

Previous research often used tournaments and evolutionary contests to distill properties that successful reciprocal strategies ought to have. Herein, we take the converse approach. We formulate three simple principles that we can expect well-performing strategies to obey. Based on these principles, we derived a successful class of cooperative strategies for general multiplayer dilemmas. Each of our principles seems to be psychologically intuitive. Our first principle of mutual cooperativeness is motivated by the observation that most subjects are most likely to cooperate in fully cooperative groups (3, 43). The principle of retaliation is based on findings that people are willing to fight back when being exploited, sometimes even if this comes at a cost to themselves (44, 45). Our last principle acknowledges that mutual cooperation can only be sustained in noisy environments if we are able to forgive others for their occasional failures to cooperate (46). The importance of these principles was noted before (9, 22, 23, 29, 47). However, the application of these principles has usually been limited to specific two-player games, assuming that players are subject to rather severe constraints on their cognitive capabilities. Here we show that the above principles can be used to construct strategies that can be applied in any multiplayer dilemma, and where subjects may remember an arbitrary number of past events.

Similar to the well-known *WSLS* rule in the prisoner’s dilemma, all-or-none strategies are easy to describe: Players cooperate only if, in each of the previous *n* rounds, all group members chose the same action.

Higher memory is not necessary if a strategy only needs to resist invasion by defectors. As an example, let us consider the generous ZD strategies (24⇓–26, 28, 42) that always reciprocate cooperation, but forgive the coplayer’s defection in the prisoner’s dilemma with some fixed probability. For suitable forgiveness probabilities, such memory-1 strategies already resist invasion by unconditional defectors.

Overall, our study suggests that memory serves reciprocal players in two ways: It allows them to assess how cooperative their coplayers have been, and it allows them to coordinate their own behavior with that of like-minded players. For some dilemmas, longer memory is all it takes for cooperation to evolve.

## Materials and Methods

In the following paragraphs, we describe the setup of our evolutionary process. Our evolutionary simulations are based on a simple pairwise imitation process, following the dynamics described in ref. 17. We consider a population of fixed size in the limit of rare mutations: A single mutant strategy either reaches fixation or goes extinct before the next mutant arises.
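In the rare-mutation regime, such a process is characterized by the fixation probability of a single mutant in a resident population. A minimal sketch, assuming a Fermi-type pairwise comparison rule with selection strength beta (our parametrization is illustrative; the paper’s exact update rule follows ref. 17):

```python
from math import exp

def fixation_probability(payoffs, N, beta):
    """Fixation probability of one mutant under pairwise imitation with
    a Fermi comparison rule (small-mutation dynamics).

    payoffs = (a, b, c, d): a = mutant vs. mutant, b = mutant vs. resident,
    c = resident vs. mutant, d = resident vs. resident (repeated-game payoffs).
    """
    a, b, c, d = payoffs
    total = 1.0
    prod = 1.0
    for k in range(1, N):
        # Average payoffs with k mutants, excluding self-interaction:
        pi_m = (a * (k - 1) + b * (N - k)) / (N - 1)
        pi_r = (c * k + d * (N - k - 1)) / (N - 1)
        # Ratio of backward to forward transition rates under the Fermi rule:
        prod *= exp(-beta * (pi_m - pi_r))
        total += prod
    return 1.0 / total
```

As sanity checks, a neutral mutant fixes with probability 1/N, and a payoff-dominant mutant fixes with a probability above 1/N.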

The parameter β measures the strength of selection. In *SI Appendix*, however, we present further simulations suggesting that our qualitative results are independent of the assumption of rare mutations, and of the considered selection strength (*SI Appendix*, Fig. S6).

## Acknowledgments

This work was supported by the European Research Council Start Grant 279307: Graph Games (to K.C.), Austrian Science Fund (FWF) Grant P23499-N23 (to K.C.), FWF Nationale Forschungsnetzwerke Grant S11407-N23 Rigorous Systems Engineering/Systematic Methods in Systems Engineering (to K.C.), Office of Naval Research Grant N00014-16-1-2914 (to M.A.N.), and the John Templeton Foundation (M.A.N.). The Program for Evolutionary Dynamics is supported, in part, by a gift from B. Wu and Eric Larson. C.H. acknowledges generous support from the ISTFELLOW program, and L.A.M.-V. gratefully acknowledges support from the European Research Consortium for Informatics and Mathematics Alain Bensoussan Fellowship Program.

## Footnotes

^{1}C.H. and L.A.M.-V. contributed equally to this work.

^{2}To whom correspondence should be addressed. Email: christian.hilbe@ist.ac.at.

Author contributions: C.H., L.A.M.-V., K.C., and M.A.N. designed research, performed research, analyzed data, and wrote the paper.

The authors declare no conflict of interest.

This article is a PNAS Direct Submission.

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1621239114/-/DCSupplemental.

## References

- Rand DG, Nowak MA
- Fudenberg D, Tirole J
- Arthur WB, Durlauf SN, Lane DA
- Lindgren K
- Hauert C, Schuster HG
- van Veelen M, García J, Rand DG, Nowak MA
- Axelrod R, Hamilton WD
- Kurokawa S, Ihara Y
- Szolnoki A, Perc M, Szabó G
- Imhof LA, Nowak MA
- Martinez-Vaquero LA, Cuesta JA, Sanchez A
- Hilbe C, Nowak MA, Sigmund K
- Dong Y, Li C, Tao Y, Zhang B
- Baek SK, Jeong HC, Hilbe C, Nowak MA
- Sigmund K
- Press WH, Dyson FD
- Stewart AJ, Plotkin JB
- Akin E
- Stewart AJ, Plotkin JB
- Hilbe C, Wu B, Traulsen A, Nowak MA
- Pacheco JM, Santos FC, Souza MO, Skyrms B
- Garcia J, van Veelen M
- Fudenberg D, Maskin E
- Stewart AJ, Plotkin JB
- Traulsen A, Semmann D, Sommerfeld RD, Krambeck HJ, Milinski M
- Fischer I, et al.
- Milinski M, Wedekind C
- Traulsen A, Nowak MA, Pacheco JM