# Evolution of extortion in Iterated Prisoner’s Dilemma games

Edited by Kenneth Wachter, University of California, Berkeley, CA, and approved March 15, 2013 (received for review August 29, 2012)

## Abstract

Iterated games are a fundamental component of economic and evolutionary game theory. They describe situations where two players interact repeatedly and have the ability to use conditional strategies that depend on the outcome of previous interactions, thus allowing for reciprocation. Recently, a new class of strategies has been proposed, so-called “zero-determinant” strategies. These strategies enforce a fixed linear relationship between one’s own payoff and that of the other player. A subset of those strategies allows “extortioners” to ensure that any increase in one player’s own payoff exceeds that of the other player by a fixed percentage. Here, we analyze the evolutionary performance of this new class of strategies. We show that in reasonably large populations, they can act as catalysts for the evolution of cooperation, similar to tit-for-tat, but that they are not the stable outcome of natural selection. In very small populations, however, extortioners hold their ground. Extortion strategies do particularly well in coevolutionary arms races between two distinct populations. Significantly, they benefit the population that evolves at the slower rate, an example of the so-called “Red King” effect. This may affect the evolution of interactions between host species and their endosymbionts.

The Iterated Prisoner’s Dilemma (IPD) has a long history as a model for the cultural and biological evolution of cooperation (1–9). A new class of so-called “zero-determinant” (ZD) strategies has recently attracted considerable attention (10–12). Such strategies allow players to enforce a linear relation unilaterally between one player’s own payoff and the coplayer’s payoff. A subset consists of the so-called “equalizer” strategies, which assign to the coplayer’s score a predetermined value, independent of the coplayer’s strategy (13). Another subset consists of the extortion strategies, which guarantee that one player’s own surplus exceeds the coplayer’s surplus by a fixed percentage. Press and Dyson (10) have explored the power of ZD strategies to manipulate any “evolutionary” opponent (i.e., any coplayer able to learn and to adapt).

In Stewart and Plotkin’s (11) commentary to the article by Press and Dyson (10), they ask: “What does the existence of ZD strategies mean for evolutionary game theory: Can such strategies naturally arise by mutation, invade, and remain dominant in evolving populations?” In evolutionary game theory, it is the population that adapts: More and more players switch to the more successful strategies. From the outset, it may seem that the opportunities for extortion strategies are limited. If a strategy is successful, it will spread, and therefore be more likely to be matched against its like, but any two extortioners hold each other down to surplus zero. In a homogeneous population of extortioners, it is thus better to deviate by cooperating. Extortion is therefore evolutionarily unstable (12). However, we shall see that if the two players engaged in an IPD game belong to distinct populations, the evolutionary prospects of extortion improve significantly.

In the following, we investigate the impact of ZD strategies on evolutionary game theory. We show that in large, well-mixed populations, extortion strategies can play an important role, but only as catalyzers for cooperation and not as a long-term outcome. However, if the IPD game is played between members of two separate populations evolving on different time scales, extortion strategies can get the upper hand in whichever population evolves more slowly and enable it to enslave the other population, an interesting example of the so-called “Red King” effect (14).

The Prisoner’s Dilemma (PD) game is a game between two players *I* and *II* having two strategies each, which we denote by *C* (“to cooperate”) and *D* (“to defect”). It is assumed that the payoff for two cooperating players, *R*, is larger than the payoff for two defecting players, *P*. If one player cooperates and the other defects, the defector’s payoff *T* is larger than *R* and the cooperator’s payoff *S* is smaller than *P*. Thus, the game is defined by $T > R > P > S$. An important special case is the so-called “donation game,” where each player can “cooperate” (play *C*) by providing a benefit *b* to the other player at his or her cost *c*, with $0 < c < b$. Then, $T = b$, $R = b - c$, $P = 0$, and $S = -c$.

In the IPD game, the two players are required to play an infinite number of rounds, and their respective payoffs are given by the limit in the mean of the payoffs per round. An important class of strategies consists of so-called “memory-one” strategies. They are given by the conditional probabilities $p_R$, $p_S$, $p_T$, and $p_P$ to play *C* after experiencing outcome *R*, *S*, *T*, or *P*, respectively, in the previous round. [In addition, such a strategy has to specify the move in the first round, but this has only a transient effect and plays no role in the long run (15)]. An important class of memory-one strategies consists of reactive strategies, which only depend on the coplayer’s move in the previous round (not one’s own move). Then, $p_R = p_T$ and $p_S = p_P$, such that a reactive strategy corresponds to a point in the unit square (16).
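The long-run payoffs of two memory-one strategies can be approximated numerically, because the outcomes of successive rounds form a Markov chain on the four states *R*, *S*, *T*, *P*. The following is a minimal sketch under stated assumptions: the function names, the illustrative values $b = 3$ and $c = 1$, the uniform starting distribution, and the use of a "lazy" power iteration are choices made here for illustration, not taken from the article. When the chain has several recurrent classes (for example, *TFT* against *TFT* without errors), the result depends on the starting distribution.

```python
def transition_matrix(p, q):
    """Transition matrix over the outcomes (R, S, T, P) as seen by player I.

    p and q are the memory-one strategies (p_R, p_S, p_T, p_P) of players
    I and II, each written from that player's own perspective.
    """
    # Player I's outcome S is player II's outcome T, and vice versa.
    q_seen_by_I = (q[0], q[2], q[1], q[3])
    return [
        [pi * qi, pi * (1 - qi), (1 - pi) * qi, (1 - pi) * (1 - qi)]
        for pi, qi in zip(p, q_seen_by_I)
    ]


def stationary_distribution(matrix, iterations=20000):
    """Approximate a stationary distribution by iterating a lazy chain
    (stay put with probability 1/2), which sidesteps periodicity."""
    v = [0.25] * 4  # assumed uniform start
    for _ in range(iterations):
        moved = [sum(v[i] * matrix[i][j] for i in range(4)) for j in range(4)]
        v = [0.5 * a + 0.5 * m for a, m in zip(v, moved)]
    return v


def long_run_payoffs(p, q, b=3.0, c=1.0):
    """Approximate long-run payoffs (P_I, P_II) in the donation game."""
    v = stationary_distribution(transition_matrix(p, q))
    payoff_I = (b - c, -c, b, 0.0)    # R, S, T, P for player I
    payoff_II = (b - c, b, -c, 0.0)   # the same outcomes for player II
    return (sum(x * u for x, u in zip(v, payoff_I)),
            sum(x * u for x, u in zip(v, payoff_II)))


ALL_C = (1, 1, 1, 1)
ALL_D = (0, 0, 0, 0)
print(long_run_payoffs(ALL_C, ALL_D))
```

For strategies with all probabilities strictly between 0 and 1, the chain has a unique stationary distribution and the first-round move plays no role, in line with the remark above.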

We will first define and characterize ZD strategies, equalizers, and extortioners. We then investigate, in the context of evolutionary game theory, the contest between extortioners and four of the most important memory-one strategies. We will show that extortion cannot be an outcome of evolution but can catalyze the emergence of cooperation. The same result will then be obtained if we consider all memory-one strategies. Hence, extortion strategies can only get a foothold if the population is very small. If the IPD game is played between members of two distinct populations, ZD strategies can emerge in the population that evolves more slowly. In particular, extortion strategies can allow host species to enslave their endosymbionts.

## Methods and Results

### Definitions.

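Press and Dyson (10) define the class of ZD strategies as those memory-one strategies $(p_R, p_S, p_T, p_P)$ satisfying, for some real values $\alpha, \beta, \gamma$, the equations

$$
\begin{aligned}
p_R - 1 &= \alpha R + \beta R + \gamma,\\
p_S - 1 &= \alpha S + \beta T + \gamma,\\
p_T &= \alpha T + \beta S + \gamma,\\
p_P &= \alpha P + \beta P + \gamma.
\end{aligned}
\tag{1}
$$

We note that $1 - p_R$ and $1 - p_S$ are the probabilities to switch from *C* to *D*, whereas $p_T$ and $p_P$ are the probabilities to switch from *D* to *C*. Press and Dyson (10) showed that if player *I* uses such a ZD strategy, then the long-run payoffs $P_I$ and $P_{II}$ of the two players satisfy

$$
\alpha P_I + \beta P_{II} + \gamma = 0, \tag{2}
$$

no matter which strategy player *II* is using. Equalizer strategies are those ZD strategies for which $\alpha = 0 \neq \beta$; then

$$
P_{II} = -\frac{\gamma}{\beta}.
$$

Thus, player *I* can assign to the coplayer any payoff between *P* and *R*. (Indeed, because the values $p_R$ and $p_P$ have to be between 0 and 1, it follows that $\beta < 0$ and $P \le -\gamma/\beta \le R$.) The so-called “*χ*-extortion” strategies are those ZD strategies for which $\gamma = -(\alpha + \beta)P$, with $\chi := -\beta/\alpha > 1$. Then,

$$
P_I - P = \chi\,(P_{II} - P).
$$

In this case, player *I* can guarantee that his or her own “surplus” (over the maximin value *P*) is the *χ*-fold of the coplayer’s surplus. Fig. 1 shows examples of these different ZD strategies.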
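The extortion relation can be checked numerically. Substituting $\gamma = -(\alpha + \beta)P$ and $\beta = -\chi\alpha$ into the defining equations, and writing $\phi$ for $\alpha > 0$, gives explicit cooperation probabilities. The sketch below builds such a strategy and evaluates both players' surpluses against arbitrary coplayers. The parameter name `phi`, the function names, and the donation-game values $b = 3$, $c = 1$ are illustrative assumptions, and the stationary distribution is only approximated by iteration.

```python
def extortion_strategy(chi, phi, R, S, T, P):
    """Memory-one strategy (p_R, p_S, p_T, p_P) obtained from the ZD
    equations with gamma = -(alpha + beta) P, beta = -chi * alpha, alpha = phi."""
    p = (1 - phi * (chi - 1) * (R - P),
         1 - phi * ((P - S) + chi * (T - P)),
         phi * ((T - P) + chi * (P - S)),
         0.0)
    if not all(0.0 <= x <= 1.0 for x in p):
        raise ValueError("phi is too large: probabilities leave [0, 1]")
    return p


def long_run_payoffs(p, q, R, S, T, P, iterations=20000):
    """Approximate long-run payoffs (P_I, P_II) of memory-one strategies p, q."""
    q_seen_by_I = (q[0], q[2], q[1], q[3])  # I's outcome S is II's outcome T
    m = [[a * d, a * (1 - d), (1 - a) * d, (1 - a) * (1 - d)]
         for a, d in zip(p, q_seen_by_I)]
    v = [0.25] * 4
    for _ in range(iterations):  # lazy power iteration
        moved = [sum(v[i] * m[i][j] for i in range(4)) for j in range(4)]
        v = [0.5 * x + 0.5 * y for x, y in zip(v, moved)]
    pay_I = sum(x * u for x, u in zip(v, (R, S, T, P)))
    pay_II = sum(x * u for x, u in zip(v, (R, T, S, P)))
    return pay_I, pay_II


# Donation game with illustrative values b = 3, c = 1.
R, S, T, P = 2.0, -1.0, 3.0, 0.0
chi = 2.0
extortioner = extortion_strategy(chi, 0.1, R, S, T, P)

for coplayer in [(1, 1, 1, 1), (0.9, 0.5, 0.2, 0.1), (0.3, 0.8, 0.6, 0.7)]:
    pay_I, pay_II = long_run_payoffs(extortioner, coplayer, R, S, T, P)
    # The two printed surpluses should agree up to numerical error.
    print(coplayer, pay_I - P, chi * (pay_II - P))
```

Only the coplayer's choice of strategy moves the pair of payoffs along the line; the slope $\chi$ is fixed by the extortioner alone.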

Press and Dyson (10) speak of ZD strategies because they use for their proof of Eq. **2** an ingenious method based on determinants. In *Appendix A*, we present a more elementary proof, following the method of Boerlijst et al. (13). Within the 4D unit cube of all memory-one strategies $(p_R, p_S, p_T, p_P)$, the ZD strategies form a 3D subset that contains the 2D subsets of equalizers and of extortioners (*Appendix B*). In Fig. 2, we sketch these sets for the reactive strategies.

### Extortion Within One Population.

To investigate the role of extortion in the context of evolutionary games, we concentrate on the donation game (in *SI Text*, we provide further results for the general IPD, which show that the main conclusions are independent of special characteristics of the donation game). We first consider how a *χ*-extortion strategy fares against some of the most important memory-one strategies, namely, tit for tat [*TFT* = (1,0,1,0)], always defect [*All D* = (0,0,0,0)], always cooperate [*All C* = (1,1,1,1)] and the win-stay-lose-shift strategy *WSLS*, which is encoded by (1,0,0,1), and hence cooperates if and only if the coplayer’s move in the previous round was the same as one’s own move (7). We note that *TFT* is a ZD strategy and can be viewed as a limiting case of an extortion strategy, with $\chi = 1$. For the donation game, the payoff for a player using strategy *i* against a player using strategy *j* is given by the $(i,j)$th element of the following matrix:

Let us start with the pairwise comparisons. The extortioner strategy is neutral with respect to *All D*. It is weakly dominated by *TFT*, in the sense that a *TFT* player does not fare better than an extortioner against extortioners but that interactions with other *TFT* players give an advantage to *TFT*. *All C* players can invade extortioners, and vice versa: These two strategies can stably coexist in proportions $c(\chi - 1) : (b + c)$ (extortioners to cooperators). Finally, *WSLS* dominates extortioners (in the sense that *WSLS* provides a better response than extortion against itself and against extortioners). We note that the mixed equilibrium of extortioners and unconditional cooperators can be invaded by each of the other three strategies. The same holds for the mixed equilibria of extortioners and unconditional defectors if the frequency of extortioners is sufficiently high. In particular, *TFT* can always invade such a mixed equilibrium but can, in turn, be invaded by *WSLS* or *All C*. No Nash equilibrium involves extortioners. If $b < 2c$, there are two Nash equilibria: a mixture of *TFT*, *All C*, and *All D*, and a mixture of *TFT*, *WSLS*, and *All D*. If $b > 2c$, there exist four Nash equilibria. In particular, *WSLS* is then a strict Nash equilibrium.
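The coexistence of extortioners and unconditional cooperators can be made explicit with a short calculation. It is a sketch that assumes an infinitely large, well-mixed population; the symbols $x$ (the extortioner's long-run cooperation frequency), $K$, and $y$ (the fraction of extortioners) are introduced here only for this derivation. Against *All C*, the coplayer always cooperates, so an extortioner who cooperates with frequency $x$ earns $b - cx$ and leaves $bx - c$ to the cooperator. With $P = 0$, the extortion relation fixes $x$:

$$
b - cx = \chi\,(bx - c)
\;\Longrightarrow\;
x = \frac{b + \chi c}{\chi b + c},
\qquad
bx - c = \frac{b^2 - c^2}{\chi b + c} =: K .
$$

Hence the extortioner earns $\chi K$ and the cooperator earns $K$. Two extortioners hold each other to payoff 0, and two cooperators earn $b - c$ each. Equating the average payoffs in a population with a fraction $y$ of extortioners gives

$$
(1 - y)\,\chi K = yK + (1 - y)(b - c)
\;\Longrightarrow\;
\frac{y}{1 - y} = \chi - \frac{b - c}{K} = \chi - \frac{\chi b + c}{b + c} = \frac{c\,(\chi - 1)}{b + c}.
$$

The equilibrium is stable: rare extortioners earn $\chi K > b - c$ (because $\chi > 1$), whereas rare cooperators earn $K > 0$.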

The replicator dynamics (17) displays for the payoff matrix continuous families of fixed points and periodic orbits, and hence is far from being structurally stable: Small changes in the dynamics can lead to vastly different outcomes. The same applies to most other deterministic game dynamics (18). It seems more reliable to consider a stochastic process that describes a finite, well-mixed population consisting of *M* players and evolving via copying of successful strategies and exploration (i.e., by a selection-mutation process) (19–21). Selection is viewed here as an imitation process; in each time step, two randomly chosen players *A* and *B* compare their average payoffs $P_A$ and $P_B$, and *A* switches to *B*’s strategy with a probability given by $(1 + \exp[-s(P_B - P_A)])^{-1}$, where $s \ge 0$ corresponds to “selection strength.” (As shown in *SI Text*, the details of the imitation process matter little.) Additionally, mutations occur with a small probability (corresponding to the adoption of another strategy, with each alternative being equally likely). Any such stochastic process yields a steady-state distribution of strategies.
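One update step of such a selection-mutation process can be sketched as follows. This is an illustration under stated assumptions rather than the simulation code behind the figures: the imitation probability is taken to be the logistic (Fermi) function of the payoff difference, self-interactions are excluded when averaging payoffs, and the function names, the two-strategy payoff matrix (*All D* and *WSLS* in a donation game with $b = 3$, $c = 1$), and the parameter values are chosen here for illustration.

```python
import math
import random


def imitation_probability(payoff_a, payoff_b, s):
    """Probability that A adopts B's strategy (logistic in the payoff gap)."""
    return 1.0 / (1.0 + math.exp(-s * (payoff_b - payoff_a)))


def average_payoffs(counts, matrix):
    """Average payoff of each strategy in a well-mixed population of
    M = sum(counts) players, excluding self-interactions. Entries for
    strategies that are currently absent are not meaningful and unused."""
    total = sum(counts)
    n = len(counts)
    return [
        sum(matrix[i][j] * (counts[j] - (1 if i == j else 0)) for j in range(n))
        / (total - 1)
        for i in range(n)
    ]


def update(counts, matrix, s, mutation_rate, rng):
    """One time step: the focal player A either mutates or compares with B."""
    n = len(counts)
    a = rng.choices(range(n), weights=counts)[0]
    if rng.random() < mutation_rate:
        # Mutation: adopt another strategy, each alternative equally likely.
        new = rng.choice([k for k in range(n) if k != a])
    else:
        others = list(counts)
        others[a] -= 1  # B is drawn from the remaining M - 1 players
        b = rng.choices(range(n), weights=others)[0]
        pay = average_payoffs(counts, matrix)
        switch = rng.random() < imitation_probability(pay[a], pay[b], s)
        new = b if switch else a
    counts[a] -= 1
    counts[new] += 1


# Illustrative payoffs for (All D, WSLS) with b = 3, c = 1:
# WSLS alternates between C and D against All D.
b_val, c_val = 3.0, 1.0
payoff_matrix = [[0.0, b_val / 2], [-c_val / 2, b_val - c_val]]
population = [20, 0]  # start with All D only
rng = random.Random(0)
for _ in range(10000):
    update(population, payoff_matrix, s=1.0, mutation_rate=0.01, rng=rng)
print(population)
```

Recording `population` over many steps yields an empirical estimate of the steady-state distribution of strategies.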

We find that although extortioners are never the most abundant strategy, they can play the role of a catalyzer. Indeed, if only *All D* and *WSLS* are available, a population may be trapped in a noncooperative state for a considerable time, leading to a mutation-selection equilibrium that clearly favors defectors (Fig. 3*A*). In such a case, extortioners (Fig. 3*B*) and *TFT* (Fig. 3*C*) offer an escape: These strategies can subvert an *All D* population through neutral drift and selection, respectively. Once defectors are rare, *WSLS* outperforms *TFT*, and it also prevails against extortioners if the population is sufficiently large (in a direct competition, *WSLS* always gets a higher payoff than if ). Thus, in large populations, extortioners and *TFT* players tip the mutation-selection balance toward *WSLS*, and therefore increase the level of cooperation. Further expansion of the strategy space through adding *All C* has only a small effect on the steady state (Fig. 3 *D* and *E*), slightly favoring extortioners.

What happens when players are not restricted to the five specific strategies considered so far but can choose among all possible memory-one strategies? We study this by using the stochastic evolutionary dynamics of Imhof and Nowak (22), assuming that mutants can pick up any memory-one strategy, with a uniform probability distribution on the 4D unit cube. We further assume that the mutant reaches fixation, or is eliminated, before the next mutation occurs. Overall, this stochastic process leads to a sequence of monomorphic populations. The evolutionary importance of a given strategy can then be assessed by computing how often the state of the population is in its neighborhood. For a subset *A* of the set of memory-one strategies, we denote the *δ*-neighborhood of *A* (with respect to Euclidean distance) by $A_\delta$, and let $\mu(A_\delta)$ denote the fraction of time that the evolving population visits $A_\delta$. We say that $A_\delta$ is favored by selection if the evolutionary process visits $A_\delta$ more often than expected under neutral evolution [i.e., if $\mu(A_\delta)$ is larger than the volume of the intersection of $A_\delta$ with the unit cube of all memory-one strategies]. We apply this concept to the sets of ZD strategies, equalizers, and extortioners.
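When each mutant fixes or disappears before the next one arises, the dynamics can be driven by fixation probabilities alone. The sketch below uses the standard closed-form expression for a pairwise comparison process with a logistic imitation rule; whether this matches every detail of the simulations reported here is an assumption, and the function name and example payoffs are illustrative. Note that an extortioner invading *All D* is an exactly neutral mutant (all four pairwise payoffs are 0), so its fixation probability is $1/M$ for any selection strength.

```python
import math


def fixation_probability(a_mm, a_mr, a_rm, a_rr, M, s):
    """Fixation probability of a single mutant in a resident population of
    size M under pairwise comparison with logistic imitation.

    a_mm: mutant vs mutant payoff, a_mr: mutant vs resident,
    a_rm: resident vs mutant,      a_rr: resident vs resident.
    """
    total = 1.0
    log_product = 0.0
    for j in range(1, M):  # j = current number of mutants
        pay_mut = (a_mm * (j - 1) + a_mr * (M - j)) / (M - 1)
        pay_res = (a_rm * j + a_rr * (M - j - 1)) / (M - 1)
        # Ratio of the probabilities to lose versus gain one mutant.
        log_product += -s * (pay_mut - pay_res)
        total += math.exp(min(log_product, 700.0))  # guard against overflow
    return 1.0 / total


M = 100
# Extortioner in an All D population: every payoff is 0, hence neutral.
print(fixation_probability(0.0, 0.0, 0.0, 0.0, M, s=1.0))
# A mutant that earns more than the resident in every state.
print(fixation_probability(2.0, 2.0, 1.0, 1.0, M, s=1.0))
```

Drawing a random mutant, accepting it with this probability, and repeating produces the sequence of monomorphic populations described above.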

Extensive simulations indicate that neither extortioners, equalizers, nor ZD strategies in general are favored by selection if the population is reasonably large (Fig. 4*A*). By contrast, very small population sizes promote the selection of these behaviors. For extortioners, this result is intuitive: In small populations, the fact that self-interactions are excluded yields greater weight to interactions with players using the rival strategy rather than interactions with players using one’s own strategy (19); this effect may even result in the evolution of spite (24, 25). We address this point in more detail in *SI Text* (*section 2*). Essentially, both extortioners and equalizers suffer from not achieving maximal payoff against themselves, which causes their inherent instability, as also stressed by Adami and Hintze (12). The same holds for most ZD strategies. By contrast, *WSLS* players do well against their like, and therefore prevail in the evolutionary dynamics for long periods if the population size is large, at least when $b > 2c$ or, for more general PD games, when $2R > T + P$ (15) (Fig. 4*B*). As a (possibly surprising) consequence, larger populations also yield higher average payoffs (Fig. 4*C*). In *SI Text*, we show that these qualitative results are robust with respect to changes in parameter values, such as benefits and costs or the strength of selection. Hence, extortion is disfavored by evolution as soon as the population size exceeds a critical level.

### Extortion Between Two Populations.

Let us now consider two species (e.g., hosts and their symbionts) or two classes of a single species (e.g., old and young, buyers and sellers, rulers and subjects) engaged in an IPD game, which, of course, is now unlikely to be symmetrical. In such situations, extortioners may evolve even in large populations. Indeed, extortioners provide incentives to cooperate: As shown by Press and Dyson (10), *All C* is always a best response to an extortion strategy. In a single population of homogeneous players, this is not turned to advantage, because the extortioners’ success leads to more interactions with their own kind. If extortioners evolve in one of two separate populations, they will not have to interact with coplayers of their own kind. Nevertheless, their success may be short-lived because they will be tempted to adopt the even more profitable *All D* strategy as a reaction to the *All C* coplayers whom they have produced, which, in turn, leads to the disappearance of the *All C* players.

Extortioners can only achieve a lasting (rather than short-lived) success if the rate of adaptation for the host population is much slower than that for the symbionts. To elucidate this point, we extend our previous analysis by revisiting a coevolutionary model of Damore and Gore (26). These authors consider host–symbiont interactions where each host interacts with its own subpopulation of endosymbionts. Let us assume that these interactions are given by an IPD game. Members of both species reproduce with a probability proportional to their fitness (which is an increasing function of their payoffs) by replacing a randomly chosen organism of their species. However, the two populations of hosts and symbionts may evolve on different time scales, as measured by their relative evolutionary rate (RER). For an RER of 1, hosts and symbionts evolve at a similar pace in the evolutionary arms race and no population is able to extort the other (Fig. 5*A*). This changes drastically as soon as we increase the RER, by allowing symbionts to adapt more quickly. Fast adaptation results in a short-term increase of the symbionts’ payoffs, because they can quickly adjust to their respective host. In the long term, however, this induces hosts to adopt extortion strategies (Fig. 5*B*), thereby forcing their symbionts to cooperate. Thus, it pays off in the long run for the host to be slow to evolve; for the parameters in Fig. 5*B*, the resulting equilibrium allocates them, on average, a surplus more than 10-fold larger than the surplus achieved by the symbionts.

## Discussion

Our main results show that within one population, extortion strategies can act as catalyzers for cooperation but prevail only if the population size is very small, and that in interactions between two populations, extortion can emerge if the rates of evolution differ. This holds not only for the donation game (and therefore whenever $R + P = T + S$) but in considerably more general contexts. In the last part of *SI Text*, we emphasize this robustness. We could also assume that the players alternate their moves in the donation game (27, 28) or that the underlying PD game is asymmetrical (the definitions have to be modified in a straightforward way). As noted by Press and Dyson (10), some results hold also for non-PD games; this deserves further investigation.

In orthodox game theory, strategy *A* dominates strategy *B* if *A* yields at least the payoff of *B* no matter what the coplayer does. When Press and Dyson (10) argue that extortioners dominate their coplayers, they mean that no matter what the coplayer does, the extortioner gets more. This is not quite the same, and we display in *SI Text* (*section 2*) an example that highlights the difference. Adami and Hintze (12) stress a similar point in their title: “Winning isn’t everything.” Moreover, when Press and Dyson (10) speak of evolutionary players, they refer to players who adapt their strategy in the course of an IPD game, whereas in evolutionary game theory, it is the population that evolves. Thus, Press and Dyson (10) analyzed ZD strategies in the context of classical game theory, with two players locked in contest: Extortion strategies play an important role in this context, as do the more orthodox trigger strategies (3, 6). In the context of evolutionary game theory, whole populations are engaged in the game. For a very small population size, extortion strategies still offer good prospects. This is not surprising, because the limiting case, a population size $M = 2$, reduces to the scenario analyzed by Press and Dyson (10). In larger populations (with our parameter values for ), the outcome is different. However, evolutionary game theory can reflect features of classical game theory if the two interacting players belong to two separate evolving populations.

Extortion strategies are only a small subset of ZD strategies. We have seen that within large populations, the class of ZD strategies is not favored by selection, in the sense that its neighborhood is not visited disproportionately often. This does not preclude, of course, that certain elements of the class are favored by selection. Thus, generous *TFT* does well, as do other less known strategies. In particular, Stewart and Plotkin (11) highlighted a class of strategies defined, instead of Eq. **3**, by $P_I - R = \chi\,(P_{II} - R)$ (with $\chi > 1$). A player using this strategy does not claim a larger portion of the surplus but a larger share of the loss (relative to the outcome *R* of full cooperation). Remarkably, these “compliant” strategies do as well as *WSLS*. They are the only ZD strategies that are best replies against themselves.

In the study by Adami and Hintze (12), the evolutionary stability of several ZD strategies was tested by replicator dynamics and agent-based simulations, which independently confirm the result that these strategies do not prevail in large populations. They used a population size of *M* = 1,024, and payoff values of , , , and (i.e., a PD game that cannot be reduced to a donation game). Adami and Hintze (12) also discuss the evolutionary success of “tag-based” strategies, which use extortion only against those opponents who do not share their tag. These strategies are not memory-one strategies because they depend not only on the previous move; rather, they use memory-one strategies in specific contexts, which depend on the tag. Such a tag is an additional trait that has to evolve and risks being faked.

In interactions between different populations, a cheater-proof tag is provided for free and extortion may accordingly evolve. In endosymbiotic relationships, as we have seen, the species that evolves at the slower rate gains a disproportionate share of the benefit, an instance of the Red King effect (14, 29, 30). This requires two conditions to be met: Individuals need to come from different populations, and these populations have to evolve on different time scales. If these conditions are fulfilled, extortioner hosts can manipulate their symbionts’ evolutionary landscape in such a way that the hosts’ and the symbionts’ payoffs are perfectly correlated. This ensures that only those symbiont mutants that are beneficial for the host can succeed. In this sense, such hosts apply an evolutionary kind of mechanism design; they create an environment that makes the symbionts’ cooperation profitable for the symbionts but even more profitable for themselves.

## Appendix A: Proof of Eq. 2

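Let us denote by $P_I(n)$ and $P_{II}(n)$ the players’ payoffs in round *n*; by $s_i(n)$ the probability that *I* experiences outcome $i \in \{R, S, T, P\}$ in that round; and by $q_i(n)$ the conditional probability, given outcome *i*, that *II* plays *C* in round $n + 1$. By conditioning on round *n*, we see that $s_R(n+1)$ is given by

$$
s_R(n+1) = \sum_{i \in \{R,S,T,P\}} s_i(n)\, p_i\, q_i(n),
$$

and $s_S(n+1)$ is given by

$$
s_S(n+1) = \sum_{i \in \{R,S,T,P\}} s_i(n)\, p_i\, [1 - q_i(n)].
$$

Hence, the probability that *I* plays *C* in round $n + 1$ [i.e., $s_R(n+1) + s_S(n+1)$] is given by $\sum_i s_i(n)\, p_i$. Thus, $w(n) := s_R(n+1) + s_S(n+1) - [s_R(n) + s_S(n)]$ is given by

$$
w(n) = (p_R - 1)\, s_R(n) + (p_S - 1)\, s_S(n) + p_T\, s_T(n) + p_P\, s_P(n),
$$

which, by Eq. **1**, is just $\alpha P_I(n) + \beta P_{II}(n) + \gamma$, because $P_I(n) = R\,s_R(n) + S\,s_S(n) + T\,s_T(n) + P\,s_P(n)$, $P_{II}(n) = R\,s_R(n) + T\,s_S(n) + S\,s_T(n) + P\,s_P(n)$, and the $s_i(n)$ sum to 1. Summing $w(n)$ over $n = 0, \ldots, N-1$ and dividing by *N*, we obtain

$$
\frac{s_R(N) + s_S(N) - s_R(0) - s_S(0)}{N} = \alpha\, \frac{1}{N}\sum_{n=0}^{N-1} P_I(n) + \beta\, \frac{1}{N}\sum_{n=0}^{N-1} P_{II}(n) + \gamma .
$$

The left-hand side converges to 0 as $N \to \infty$; hence, Eq. **2** holds, independent of the strategy of player *II*. The same proof works for any game (even if it is asymmetrical; one just has to replace the payoff vectors $(R, S, T, P)$ and $(R, T, S, P)$ with the corresponding payoff vectors). In many cases, however, there will be no solutions to Eq. **1** that are feasible (i.e., probabilities between 0 and 1).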

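## Appendix B: Sets of ZD Strategies, Equalizers, and Extortioners

Elementary algebra shows that within the 4D unit cube of all memory-one strategies $(p_R, p_S, p_T, p_P)$, the ZD strategies are characterized by

$$
(S + T - 2P)(1 - p_R) - (R - P)(1 - p_S - p_T) - (2R - S - T)\, p_P = 0
$$

(a 3D subset of the cube). Equalizers are characterized, in addition, by

$$
(T - S)(1 - p_R + p_P) = (R - P)(1 - p_S + p_T)
$$

(they form a 2D set), and *χ*-extortion strategies are also characterized by $p_P = 0$ and

$$
(\chi + 1)(T - S)(1 - p_R) = (\chi - 1)(R - P)(1 + p_T - p_S)
$$

(for each *χ*, a 1D set). In the special case of the donation game, these equations reduce to

$$
\begin{aligned}
p_R + p_P &= p_S + p_T,\\
(b + c)(1 - p_R + p_P) &= (b - c)(1 - p_S + p_T),\\
(\chi + 1)(b + c)(1 - p_R) &= (\chi - 1)(b - c)(1 + p_T - p_S),
\end{aligned}
$$

respectively. The set of equalizers is spanned by $(1, 1, 0, 0)$, $(c/b, 0, c/b, 0)$, $(2c/(b+c), 0, 1, (b-c)/(b+c))$, and $(1, 1 - c/b, 1, 1 - c/b)$, and the set of extortion strategies is spanned by $(1, 1, 0, 0)$, $(1, 0, 1, 0)$, and $(c/b, 0, c/b, 0)$. All reactive strategies are ZD strategies, the reactive equalizers are those satisfying $p_R - p_S = c/b$, and the reactive *χ*-extortioners are those with $p_S = 0$ and $p_R = (b + \chi c)/(\chi b + c)$ (Fig. 2).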

## Acknowledgments

We thank M. Abou Chakra, A. Traulsen, J.A. Damore and R. Trivers for useful discussions. K.S. acknowledges support from the Foundational Questions in Evolutionary Biology Fund (Grant RFP-12-21).

## Footnotes

¹To whom correspondence should be addressed. E-mail: karl.sigmund@univie.ac.at.

Author contributions: C.H., M.A.N., and K.S. designed research, performed research, analyzed data, and wrote the paper.

The authors declare no conflict of interest.

This article is a PNAS Direct Submission.

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1214834110/-/DCSupplemental.

## References

- (1) Rapoport A, Chammah A
- (3) Aumann R
- (4) Axelrod R
- (6) Fudenberg D, Maskin E
- (8) Kendall G, Yao X, Chong SY
- (9) Trivers R
- (10) Press WH, Dyson FJ
- (11) Stewart AJ, Plotkin JB
- (12) Adami C, Hintze A
- (14) Bergstrom CT, Lachmann M
- (15) Sigmund K
- (18) Hofbauer J
- (20) Nowak MA
- (22) Imhof LA, Nowak MA
- (28) Frean MR
- (29) Doebeli M, Knowlton N
