# Autocratic strategies for iterated games with arbitrary action spaces


Edited by Christian Hilbe, Harvard University, Cambridge, MA, and accepted by the Editorial Board February 8, 2016 (received for review October 11, 2015)

## Significance

Games with two actions (“cooperate” and “defect”) have been extremely useful in modeling biological interactions. Despite their utility, however, there are many naturally occurring encounters that these games cannot fully capture. For example, the effort expended in animal grooming or the rate of siderophore production by microbes can take on a continuous range of values. Such interactions are better modeled by games with more general (continuous) action spaces. In this setting, we prove the existence of autocratic strategies that unilaterally enforce linear relationships on the payoffs for repeated games. In particular, we show that a player can often enforce such a relationship by playing only two actions throughout the repeated game.

## Abstract

The recent discovery of zero-determinant strategies for the iterated prisoner’s dilemma sparked a surge of interest in the surprising fact that a player can exert unilateral control over iterated interactions. These remarkable strategies, however, are known to exist only in games in which players choose between two alternative actions such as “cooperate” and “defect.” Here we introduce a broader class of autocratic strategies by extending zero-determinant strategies to iterated games with more general action spaces. We use the continuous donation game as an example, which represents an instance of the prisoner’s dilemma that intuitively extends to a continuous range of cooperation levels. Surprisingly, despite the fact that the opponent has infinitely many donation levels from which to choose, a player can devise an autocratic strategy to enforce a linear relationship between his or her payoff and that of the opponent even when restricting his or her actions to merely two discrete levels of cooperation. In particular, a player can use such a strategy to extort an unfair share of the payoffs from the opponent. Therefore, although the action space of the continuous donation game dwarfs that of the classic prisoner’s dilemma, players can still devise relatively simple autocratic and, in particular, extortionate strategies.

Game theory provides a powerful framework to study interactions between individuals (“players”). Among the most interesting types of interactions are social dilemmas, which result from conflicts of interest between individuals and groups (1, 2). Perhaps the most well-studied model of a social dilemma is the prisoner’s dilemma (3). A two-player game with actions *C* (“cooperate”) and *D* (“defect”) and payoff matrix

$$
\begin{array}{c|cc}
 & C & D \\ \hline
C & R & S \\
D & T & P
\end{array}
\tag{1}
$$

with $T > R > P > S$ is a prisoner’s dilemma. Two players can achieve more from mutual cooperation (*R*) than they can from mutual defection (*P*), resulting in a conflict of interest between the individual and the pair, which characterizes social dilemmas. Thus, in a one-shot game (i.e., a single encounter), two opponents have an incentive to defect against one another, but the outcome of mutual defection (the unique Nash equilibrium) is suboptimal for both players.
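The dilemma can be checked mechanically. The sketch below uses the common illustrative values $T = 5$, $R = 3$, $P = 1$, $S = 0$ — an assumption of ours; any values with $T > R > P > S$ behave the same way:

```python
# One-shot prisoner's dilemma with illustrative payoffs (T > R > P > S).
# These particular numbers are an assumption for illustration only.
T, R, P, S = 5, 3, 1, 0

# Payoffs to the row player: payoff[(my_action, opponent_action)]
payoff = {
    ("C", "C"): R, ("C", "D"): S,
    ("D", "C"): T, ("D", "D"): P,
}

# D strictly dominates C: defecting earns more against either opponent action.
assert payoff[("D", "C")] > payoff[("C", "C")]  # T > R
assert payoff[("D", "D")] > payoff[("C", "D")]  # P > S

# Yet mutual defection (the unique Nash equilibrium) is worse for both
# players than mutual cooperation -- the social dilemma.
assert payoff[("C", "C")] > payoff[("D", "D")]  # R > P
```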

One proposed mechanism for the emergence of cooperation in games such as the prisoner’s dilemma is direct reciprocity (5, 6), which entails repeated encounters between players and allows for reciprocation of cooperative behaviors. In an iterated game, a player might forgo the temptation to defect in the present due to the threat of future retaliation—“the shadow of the future”—or the possibility of future rewards for cooperating (4, 7), phenomena for which there is both theoretical and empirical support (8, 9). One example of a strategy for the iterated game is to copy the action of the opponent in the previous round (“tit for tat”) (4). Alternatively, a player might choose to retain his or her action from the previous round if and only if the most recent payoff was *R* or *T* (“win-stay, lose-shift”) (10). These examples are among the simplest and most successful strategies for the iterated prisoner’s dilemma (11).

In a landmark paper, Press and Dyson (12) deduced the existence of zero-determinant strategies, which allow a single player to exert much more control over this game than previously thought possible. Since their introduction, these strategies have been extended to cover multiplayer social dilemmas (13, 14) and temporally discounted games (15). Moreover, zero-determinant strategies have been studied in the context of evolutionary game theory (16–20), adaptive dynamics (21), and human behavioral experiments (22). In each of these studies, the game is assumed to have only two actions: cooperate and defect. In fact, the qualifier “zero-determinant” actually reflects this assumption because these strategies force a matrix determinant to vanish for action spaces with only two options. We show here that this assumption is unnecessary.

More specifically, suppose that players *X* and *Y* interact repeatedly with no limit on the number of interactions. For games with two actions, *C* and *D*, a memory-one strategy for player *X* is a vector, $\mathbf{p} = (p_{CC}, p_{CD}, p_{DC}, p_{DD})$, where $p_{xy}$ is the probability that *X* cooperates following an outcome in which *X* plays *x* and *Y* plays *y*. Let $\mathbf{S}^X = (R, S, T, P)$ and $\mathbf{S}^Y = (R, T, S, P)$ denote the payoff vectors of *X* and *Y*, respectively, and let α, β, and γ be fixed constants. Press and Dyson (12) show that if there is a constant, ϕ, for which

$$
\mathbf{p} - (1, 1, 0, 0) = \phi \left( \alpha\, \mathbf{S}^X + \beta\, \mathbf{S}^Y + \gamma\, \mathbf{1} \right),
\tag{2}
$$

then *X* can unilaterally enforce the linear relationship $\alpha\, \pi_X + \beta\, \pi_Y + \gamma = 0$ on the expected payoffs, $\pi_X$ and $\pi_Y$. A strategy satisfying Eq. **2** is known as a zero-determinant strategy. For example, if *P* is the payoff for mutual defection in the prisoner’s dilemma, then *X* can extort player *Y* by enforcing $\pi_X - P = \chi \left( \pi_Y - P \right)$ for some $\chi \geqslant 1$.
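As a sanity check on Eq. **2**, the sketch below builds an extortionate strategy for donation-game payoffs ($b = 3$, $c = 1$, so $(R, S, T, P) = (2, -1, 3, 0)$) with $\chi = 2$ and $\phi = 0.1$ — all illustrative values of our choosing — pits it against an arbitrary memory-one opponent, and verifies the enforced relation $\pi_X = \chi\, \pi_Y$ from the stationary distribution of the resulting Markov chain:

```python
import numpy as np

# Verify Eq. 2 numerically for an extortionate zero-determinant strategy.
# Donation-game payoffs (b = 3, c = 1): states are ordered CC, CD, DC, DD.
SX = np.array([2.0, -1.0, 3.0, 0.0])   # X's payoffs (R, S, T, P)
SY = np.array([2.0, 3.0, -1.0, 0.0])   # Y's payoffs (R, T, S, P)
chi, phi = 2.0, 0.1                    # illustrative extortion parameters

# Memory-one strategy for X from Eq. 2 with alpha = 1, beta = -chi, gamma = 0:
# p - (1, 1, 0, 0) = phi * (SX - chi * SY).
p = np.array([1.0, 1.0, 0.0, 0.0]) + phi * (SX - chi * SY)
assert np.all((0 <= p) & (p <= 1))     # a valid strategy

q = np.array([0.7, 0.2, 0.6, 0.4])     # arbitrary memory-one opponent

# Transition matrix over states (x, y); Y indexes states from her own view.
M = np.zeros((4, 4))
for s in range(4):
    px = p[s]
    qy = q[[0, 2, 1, 3][s]]            # swap roles: (x, y) -> (y, x)
    M[s] = [px * qy, px * (1 - qy), (1 - px) * qy, (1 - px) * (1 - qy)]

# Stationary distribution v with v M = v and sum(v) = 1.
A = np.vstack([M.T - np.eye(4), np.ones(4)])
v, *_ = np.linalg.lstsq(A, np.array([0, 0, 0, 0, 1.0]), rcond=None)

pi_X, pi_Y = v @ SX, v @ SY
assert abs(pi_X - chi * pi_Y) < 1e-8   # X extorts: pi_X = 2 * pi_Y
```

Changing `q` to any other memory-one strategy with probabilities strictly between 0 and 1 leaves the final assertion intact, which is exactly the unilateral enforcement that Eq. **2** promises.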

The most common extensions of finite action sets are continuous action spaces. An element of such a space, for instance an interval $[0, K]$, can represent a quantitative level of cooperation, such as the amount a player invests in a public good (24); the volume of blood one vampire bat donates to another (25); the amount of resources used by microbes to produce siderophores (26); or the effort expended in intraspecies grooming (27, 28). It is important to note that games with continuous action spaces can yield qualitatively different results than their discrete counterparts. For example, the strategy “raise the stakes” initially offers a small investment in prisoner’s dilemma interactions and subsequently raises the contribution in discrete increments if the opponent matches or betters the investment (29). However, in a continuous action space, raise the stakes evolves into defection due to the fact that another strategy can be arbitrarily close—in terms of the initial investment and subsequent increases in contribution—yet exhibit qualitatively different behavior (30). In particular, raise the stakes succeeds in a discrete action space but fails in a continuous one.

Akin (31) calls the vector on the left-hand side of Eq. **2**, $\tilde{\mathbf{p}} = \mathbf{p} - (1, 1, 0, 0)$, a Press–Dyson vector. For continuous action spaces, the payoff vectors, $\mathbf{S}^X$ and $\mathbf{S}^Y$, are replaced by payoff functions, $u_X(x, y)$ and $u_Y(x, y)$, giving the payoff to player *i* when *X* plays *x* and *Y* plays *y*. The analog of the linear combination on the right-hand side of Eq. **2** is then the function $\alpha\, u_X + \beta\, u_Y + \gamma$. Remarkably, player *X* can enforce a linear relationship on expected payoffs by choosing a memory-one strategy that plays just two actions, despite the fact that the opponent may have an infinite number of actions from which to choose. We give examples of such autocratic strategies in the continuous donation game, which represents an instance of the prisoner’s dilemma but with an extended, continuous action space.

## Autocratic Strategies

Consider a two-player iterated game with action spaces, $S_X$ and $S_Y$, for players *X* and *Y*, respectively. Players *X* and *Y* interact repeatedly (infinitely many times), deriving a payoff at each round; the payoff at time *t* is discounted by a factor $\lambda^t$ for some $0 \leqslant \lambda < 1$ (32). Alternatively, one may interpret this game as having a finite number of rounds, where in any given round λ denotes the probability of another round (6), which results in an expected game length of $1/(1 - \lambda)$ rounds.

Suppose that, for each $t \geqslant 0$, $u_X^{(t)}$ and $u_Y^{(t)}$ denote the payoffs to players *X* and *Y* at time *t*. Then, irrespective of the interpretation of λ, the average payoff to player *X* is

$$
\pi_X = (1 - \lambda) \sum_{t=0}^{\infty} \lambda^t\, u_X^{(t)},
\tag{3}
$$

and the average payoff to player *Y*, $\pi_Y$, is obtained by replacing $u_X^{(t)}$ with $u_Y^{(t)}$ in Eq. **3**. If the strategies of *X* and *Y* are stochastic, then the payoffs are random variables with expectations, $\pi_X$ and $\pi_Y$ (*Supporting Information*). Of particular interest are memory-one strategies, which are probabilistic strategies that depend on only the most recent outcome of the game. If σ is a memory-one strategy for player *X*, then we denote by $\sigma[x, y]$ the probability measure over $S_X$ from which *X* draws her next action after *X* plays *x* and *Y* plays *y* (Fig. 1).
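A minimal sketch of Eq. **3** and of the expected game length, using an arbitrary payoff stream of our own choosing:

```python
# Discounted average payoff pi = (1 - lam) * sum_t lam^t * u_t  (Eq. 3).
lam = 0.9                               # illustrative discount factor

def discounted_average(payoffs, lam):
    """(1 - lam)-normalized discounted sum of a finite payoff stream."""
    return (1 - lam) * sum(lam**t * u for t, u in enumerate(payoffs))

# A constant stream u_t = 2 has discounted average exactly 2 in the
# infinite-game limit; truncating at T rounds leaves an error of 2 * lam^T.
T = 200
pi = discounted_average([2.0] * T, lam)
assert abs(pi - 2.0) < 2 * lam**T + 1e-12

# Interpreting lam as a continuation probability, the expected number of
# rounds is 1 / (1 - lam) = 10 here.
assert abs(1 / (1 - lam) - 10) < 1e-12
```

The $(1 - \lambda)$ normalization keeps the discounted average on the same scale as the one-shot payoffs, which is why a constant stream recovers its own value.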

The proofs of the existence of zero-determinant strategies (both in games with and without discounting) rely heavily on the fact that the action space is finite (12, 15, 31). In particular, it remains unclear whether zero-determinant strategies are consequences of the finiteness of the action space or instances of a more general concept. Here, we introduce autocratic strategies as an extension of zero-determinant strategies to discounted games with arbitrary action spaces. The traditional, undiscounted case is recovered in the limit $\lambda \rightarrow 1$.

### Theorem. (Autocratic Strategies).

*Suppose that* $\sigma$ *is a memory-one strategy for player* X *and let* $\sigma^0$ *be player* X*’s initial action. If, for some bounded function,* ψ, *the equation*

$$
\psi(x) - \lambda \int_{S_X} \psi(s)\, d\sigma[x, y](s) - (1 - \lambda) \int_{S_X} \psi(s)\, d\sigma^0(s) = \alpha\, u_X(x, y) + \beta\, u_Y(x, y) + \gamma
\tag{4}
$$

*holds for each* $x \in S_X$ *and* $y \in S_Y$, *then* $\sigma^0$ *and* $\sigma$ *together enforce the linear payoff relationship*

$$
\alpha\, \pi_X + \beta\, \pi_Y + \gamma = 0
\tag{5}
$$

*for any strategy of player* Y*. In other words, the pair* $(\sigma^0, \sigma)$ *is an autocratic strategy for player* X*.*

Note that the initial action, $\sigma^0$, in Eq. **4** becomes irrelevant without discounting ($\lambda \rightarrow 1$). The function ψ plays a role analogous to that of the constant ϕ in Eq. **2**, which is chosen so that the entries of $\mathbf{p}$ are probabilities. We call the function on the left-hand side of Eq. **4** a Press–Dyson function, which extends the Press–Dyson vector of Eq. **2** to arbitrary action spaces (*Supporting Information*). In contrast to action spaces with two options (cooperate and defect, for instance), autocratic strategies are defined only implicitly via Eq. **4** for general action spaces (and actually already for games with just three actions).

For each *x* and *y*, the integral in Eq. **4** averages ψ over player *X*’s next action. In general, one cannot solve Eq. **4** explicitly for the strategy σ; rather, Eq. **4** implicitly characterizes the strategies that enforce Eq. **5**.

Interestingly, under mild conditions, autocratic strategies exist that are concentrated on just two actions (*Corollary 2* in *Supporting Information*). Player *X* can enforce Eq. **5** by playing either of two fixed actions in every round (Fig. 1*A*). Thus, a strategy of this form uses the (memory-one) history of previous play only to adjust the relative weights placed on these two actions so as to satisfy Eq. **4** (see *Remark 3* in *Supporting Information*). In a two-action game, every memory-one strategy is concentrated on two points, which explains why games like the classic prisoner’s dilemma fail to capture the implicit nature of autocratic strategies.

## Continuous Donation Game

In the classic donation game, cooperators pay a cost, *c*, to provide a benefit, *b*, to the opponent (33). Defectors make no donations and pay no costs. The payoff matrix for this game is given by Eq. **1** with $R = b - c$, $S = -c$, $T = b$, and $P = 0$, where $b > c > 0$. For player *X*, consider the memory-one strategy

$$
\mathbf{p} = \Big( 1 - \phi (\chi - 1)(b - c),\;\; 1 - \phi (c + \chi b),\;\; \phi (b + \chi c),\;\; 0 \Big)
\tag{6}
$$

with $\chi \geqslant 1$ and ϕ > 0 sufficiently small that each entry is a probability. Eq. **6** defines an extortionate strategy, which unilaterally enforces $\pi_X = \chi\, \pi_Y$.

Instead of discrete “levels” of cooperation, the continuous donation game admits a range of cooperation levels, $s \in [0, K]$. The cost and benefit of cooperating at level *s*, denoted by $c(s)$ and $b(s)$, are increasing functions of *s* and, in analogy to the discrete case, satisfy $b(s) > c(s) > 0$ for $s > 0$, with $b(0) = c(0) = 0$. The payoff matrix of Eq. **1** is replaced by payoff functions, with the payoffs to players *X* and *Y* for playing *x* against *y* being

$$
u_X(x, y) = b(y) - c(x), \qquad u_Y(x, y) = b(x) - c(y).
$$

### Two-Point Autocratic Strategies.

For the continuous donation game, we show, using the *Theorem*, that player *X* can unilaterally enforce a linear payoff relationship even when *X* plays only the two extreme actions, 0 and *K*. When *X* plays only 0 and *K*, a memory-one strategy for player *X* is defined by a reaction function, $p(x, y)$, giving the probability that *X* plays *K* (i.e., fully cooperates) following an outcome in which *X* plays *x* and *Y* plays *y*; with probability $1 - p(x, y)$, *X* plays 0 (i.e., defects). Player *X*’s initial action is determined by the probability, $p^0$, with which *X* plays *K* in the first round.

Consider the function ψ with $\psi(0) = 0$ and $\psi(K) = -\big(c(K) + \chi\, b(K)\big)$, and suppose that player *X*’s initial action is to play *K* with probability $p^0$. Provided λ is sufficiently large (see Eq. **S60** in *Supporting Information*), the reaction function

$$
p(y) = \frac{b(y) + \chi\, c(y) + (\chi - 1)\,\ell - (1 - \lambda)\, p^0 \big( c(K) + \chi\, b(K) \big)}{\lambda \big( c(K) + \chi\, b(K) \big)}
\tag{8}
$$

takes values in $[0, 1]$ and, by the *Theorem*, enforces $\pi_X - \ell = \chi\, (\pi_Y - \ell)$ for a baseline payoff $\ell$. Eq. **8** represents a reactive strategy (39) because *X* conditions her play on only the previous move of the opponent (Fig. S1*B*).

For $\chi > 1$, with the payoff for mutual defection as the baseline, Eq. **8** defines an extortionate strategy, guaranteeing player *X* a fixed share of the payoffs over the payoff for mutual defection. If $\chi = 1$, player *X* ensures the opponent has a payoff equal to her own (14). On the other hand, if the baseline is instead the payoff for mutual full cooperation, $b(K) - c(K)$, Eq. **8** defines a generous (or “compliant”) strategy (16, 23). By playing a generous strategy, player *X* ensures that her payoff is at most that of her opponent’s. For each of these types of strategies, the probability that *X* reacts to *y* by cooperating increases as a function of *y*. In particular, *X* is most likely to cooperate after *Y* fully cooperates ($y = K$) and least likely to cooperate after *Y* defects ($y = 0$).
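To illustrate, the sketch below implements a two-point reactive strategy of this kind for linear benefit and cost functions ($b(s) = 2s$, $c(s) = s$, $K = 1$, $\chi = 2$, $\lambda = 0.9$ — all illustrative assumptions; the reaction function is our own derivation from the condition of Eq. **4**, not necessarily the paper’s exact formula). Expected payoffs against an arbitrary opponent sequence are computed exactly, and $\pi_X = \chi\, \pi_Y$ holds in every case:

```python
# Sketch: a two-point extortionate strategy for the continuous donation game,
# assuming linear benefit/cost b(s) = 2s, c(s) = s on [0, K] with K = 1.
# X plays only 0 or K; after Y plays y, X fully cooperates with probability
# p(y) = (b(y) + chi * c(y)) / (lam * (c(K) + chi * b(K))), and defects first.
K, chi, lam = 1.0, 2.0, 0.9
b = lambda s: 2 * s
c = lambda s: s
D = c(K) + chi * b(K)                  # normalizing constant

def p(y):
    return (b(y) + chi * c(y)) / (lam * D)

assert 0 <= p(0) and p(K) <= 1         # valid for lam = 0.9, chi = 2

# Exact expected discounted payoffs against an arbitrary opponent sequence
# (the opponent defects after the listed rounds, so the tail contributes 0).
def payoffs(ys):
    ys = list(ys) + [0.0]              # one more round; then both defect forever
    pi_X = pi_Y = 0.0
    pK = 0.0                           # P(X plays K); X defects initially
    for t, y in enumerate(ys):
        pi_X += (1 - lam) * lam**t * (b(y) - pK * c(K))
        pi_Y += (1 - lam) * lam**t * (pK * b(K) - c(y))
        pK = p(y)
    return pi_X, pi_Y

# The linear relation pi_X = chi * pi_Y holds for any opponent play.
for ys in ([1.0], [0.3, 0.8, 0.5], [0.9, 0.0, 1.0, 0.25]):
    pi_X, pi_Y = payoffs(ys)
    assert abs(pi_X - chi * pi_Y) < 1e-12
```

Note that the reaction function must be known for every $y \in [0, K]$, even though *X* herself only ever plays 0 or *K*.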

Similarly, an appropriate choice of ψ allows player *X* to set the expected payoff of her opponent to any value between the payoffs for mutual defection, 0, and mutual full cooperation, $b(K) - c(K)$ (see Eq. **S63** in *Supporting Information*). A strategy that allows a player to single-handedly set the score of the opponent is termed an equalizer strategy (16, 40). However, no autocratic strategy allows player *X* to set her own score via Eq. **4** (*Supporting Information*). These results are consistent with the observations of Press and Dyson (12) that, in the classic prisoner’s dilemma without discounting, player *X* cannot set her own score but can set player *Y*’s score to anything between the payoffs for mutual defection and mutual cooperation.
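The same stationary-distribution machinery illustrates an equalizer strategy in the classic donation game. The sketch below (with illustrative values $b = 3$, $c = 1$, pinned payoff $g = 1$, $\phi = 0.2$, all our assumptions) fixes the opponent’s long-run payoff at *g*, no matter how the opponent plays:

```python
import numpy as np

# Sketch of an equalizer strategy in the classic donation game (b = 3, c = 1,
# so states CC, CD, DC, DD give Y the payoffs below): X pins Y's payoff at
# g = 1, regardless of Y's strategy.  g and phi are illustrative choices.
SY = np.array([2.0, 3.0, -1.0, 0.0])   # Y's payoffs in states CC, CD, DC, DD
g, phi = 1.0, 0.2

# Eq. 2 with alpha = 0, beta = -1, gamma = g enforces pi_Y = g:
p = np.array([1.0, 1.0, 0.0, 0.0]) + phi * (g * np.ones(4) - SY)
assert np.all((0 <= p) & (p <= 1))     # p = (0.8, 0.6, 0.4, 0.2)

def stationary_payoff_Y(q):
    """Y's long-run payoff when X plays p and Y plays memory-one q."""
    M = np.zeros((4, 4))
    for s in range(4):
        px, qy = p[s], q[[0, 2, 1, 3][s]]   # Y indexes states as (y, x)
        M[s] = [px * qy, px * (1 - qy), (1 - px) * qy, (1 - px) * (1 - qy)]
    A = np.vstack([M.T - np.eye(4), np.ones(4)])
    v, *_ = np.linalg.lstsq(A, np.array([0, 0, 0, 0, 1.0]), rcond=None)
    return v @ SY

# Two very different opponents end up with the same payoff g = 1.
for q in (np.array([0.9, 0.9, 0.1, 0.1]), np.array([0.25, 0.5, 0.5, 0.75])):
    assert abs(stationary_payoff_Y(q) - g) < 1e-8
```

Here $g = 1$ lies between the payoffs for mutual defection (0) and mutual cooperation ($b - c = 2$), consistent with the feasible range stated above.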

### Deterministic Autocratic Strategies.

One feature of two-point autocratic strategies is that they allow a player to exert control over the payoffs of a repeated game while ignoring most of the action space. One drawback is that they restrict the region of feasible game payoffs (Fig. 2). This shortcoming of two-point strategies leads to a new class of strategies called deterministic strategies, which are perhaps the simplest alternatives to two-point strategies.

A deterministic strategy requires a player to respond to a history of previous play by playing an action with certainty rather than probabilistically. For example, a memory-one deterministic strategy for player *X* is defined by (*i*) an initial action, $x^0$, and (*ii*) a reaction function, $r(x, y)$, giving the action that *X* plays following an outcome in which *X* plays *x* and *Y* plays *y*. One well-known example of a deterministic strategy is tit for tat in the classic prisoner’s dilemma, which is defined by $x^0 = C$ and $r(x, y) = y$.

For general memory-one deterministic strategies, the condition for the existence of autocratic strategies, Eq. **4**, becomes

$$
\psi(x) - \lambda\, \psi\big(r(x, y)\big) - (1 - \lambda)\, \psi(x^0) = \alpha\, u_X(x, y) + \beta\, u_Y(x, y) + \gamma.
\tag{11}
$$

If Eq. **11** holds for each $x \in S_X$ and $y \in S_Y$, then the deterministic strategy defined by $x^0$ and *r* enforces $\alpha\, \pi_X + \beta\, \pi_Y + \gamma = 0$ (see Eq. **S65** in *Supporting Information*; cf. Eq. **7**).
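Tit for tat itself gives a concrete instance of Eq. **11** in the classic prisoner’s dilemma: taking $r(x, y) = y$, $\alpha = 1$, $\beta = -1$, $\gamma = 0$, and $\psi(C) - \psi(D) = S - T$ satisfies the undiscounted condition, so tit for tat enforces $\pi_X = \pi_Y$. The sketch below (with illustrative payoffs of our choosing) checks the condition and the resulting telescoping payoff bound:

```python
# Tit for tat as a deterministic autocratic strategy in the classic
# prisoner's dilemma (illustrative payoffs T, R, P, S = 5, 3, 1, 0).
# Without discounting, Eq. 11 reads psi(x) - psi(r(x, y)) = u_X - u_Y
# with r(x, y) = y; psi(C) - psi(D) = S - T satisfies it.
import random

T, R, P, S = 5, 3, 1, 0
u = {("C", "C"): (R, R), ("C", "D"): (S, T),
     ("D", "C"): (T, S), ("D", "D"): (P, P)}
psi = {"C": S - T, "D": 0}

# Check Eq. 11 (in the limit lam -> 1) at every outcome (x, y):
for (x, y), (uX, uY) in u.items():
    assert psi[x] - psi[y] == uX - uY

# Simulation: against any opponent, cumulative payoffs differ by at most
# T - S, because the per-round difference psi(x_t) - psi(y_t) telescopes
# along x_{t+1} = y_t.
random.seed(1)
rounds, x, totX, totY = 10_000, "C", 0, 0
for _ in range(rounds):
    y = random.choice("CD")            # an arbitrary (random) opponent
    uX, uY = u[(x, y)]
    totX, totY = totX + uX, totY + uY
    x = y                              # tit for tat: copy Y's last move
assert abs(totX - totY) <= T - S
```

The bound is deterministic: the cumulative difference collapses to $\psi(x_0) - \psi(y_{\text{last}})$, so time-averaged payoffs equalize at rate $1/\text{rounds}$.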

Similarly, to enforce an equalizer relationship—fixing the payoff of the opponent—a deterministic strategy must satisfy the analogous condition (see Eq. **S67** in *Supporting Information*).

Examples of deterministic extortionate, generous, and equalizer strategies are given in Fig. 3. It is evident that deterministic strategies increase the feasible region in which linear payoff relationships can be enforced compared with their two-point counterparts (cf. Fig. 2).

## Discussion

In games with two actions, zero-determinant strategies are typically defined via a technical condition such as Eq. **2** (12, 15). This definition makes generalizations to games with larger action spaces difficult because Eq. **2** makes sense only for two-action games. Therefore, we introduce the more general term “autocratic strategy” for any strategy that unilaterally enforces a linear relationship on expected payoffs. Of course, this linear relationship is precisely what makes strategies satisfying Eq. **2** interesting.

The *Theorem* provides a condition for the existence of autocratic strategies for games with general action spaces. We illustrate this phenomenon with a continuous-action-space extension of the classic donation game, which represents an instance of the prisoner’s dilemma. The existing literature on zero-determinant strategies for the classic prisoner’s dilemma provides no way of treating this continuous extension of the donation game. However, the *Theorem* makes no assumptions on the action space of the game and thus applies to the continuous donation game as well as its classic counterpart.

Surprisingly, in many cases a player can enforce a linear relationship on expected payoffs by playing only two actions, despite the fact that the opponent may have infinitely many actions available to use (*Corollary 2* in *Supporting Information*). We demonstrate that the conditions guaranteeing the existence of extortionate, generous, fair, and equalizer strategies in the continuous donation game are in fact similar to those of the two-action case. However, despite the simplicity of these two-point strategies, a player needs to know how to respond to every possible move of the opponent; knowledge of how to respond to just defection ($y = 0$) and full cooperation ($y = K$) does not suffice.

Another important difference is that, whereas in the classic prisoner’s dilemma mutual generosity represents a symmetric Nash equilibrium (15), this need not be the case in the continuous donation game. Instead, mutual generosity results in a payoff of $b(K) - c(K)$, where *K* is the maximal investment, but intermediate levels of cooperation may yield a higher payoff if $b(s) - c(s)$ attains its maximum at some level *m* instead of *K* (Figs. 2 and 3). However, no player can enforce a generous relationship with baseline payoff $b(m) - c(m)$ (*Supporting Information*). Thus, the performance of a generous strategy as a response to itself depends critically on whether the game has two actions or a continuous range of actions.

Extortionate strategies for the iterated prisoner’s dilemma are not evolutionarily stable (17). Because mutual generosity in the continuous donation game need not be a Nash equilibrium, it follows that generous strategies also need not be evolutionarily stable. Moreover, against human opponents, extortioners are punished by a refusal to fully cooperate, and generous players provide their opponents with an incentive to cooperate and fare better in experiments (22). Such behavior supports what one would expect from a player using a fair autocratic strategy enforcing $\pi_X = \pi_Y$.

In games with two discrete actions, our definition of a Press–Dyson function specializes to a multiple of the Press–Dyson vector, $C\, \tilde{\mathbf{p}}$, with *C* a constant (see Eq. **2** and *Supporting Information* for details). The Press–Dyson vector is recovered by normalizing the Press–Dyson function and thus eliminating the constant, *C*. However, in games with more than two actions, this function involves additional degrees of freedom that cannot be normalized away (see *Supporting Information* for an example with three actions). Therefore, based on the *Theorem*, in two-action games it is perhaps more appropriate to define a Press–Dyson vector to be any vector of the form $C\, \tilde{\mathbf{p}}$. In any case, the *Theorem* covers all of the known results on the existence of zero-determinant strategies for repeated games.

More importantly, however, the analysis of iterated games with only two actions completely misses the fact that autocratic strategies are most naturally presented implicitly via Eq. **4**. Even in the case of two actions, infinitely many autocratic strategies may exist (12), but their simplistic nature admits explicit solutions. Our extension shows that, in general, (*i*) autocratic strategies need not be unique and (*ii*) one cannot explicitly list all autocratic strategies that produce a fixed Press–Dyson function. Thus, for arbitrary action spaces (but already for games with three actions), autocratic strategies are most naturally defined implicitly via Press–Dyson functions.

## Iterated Games with Two Players and Measurable Action Spaces

By “action space,” we mean a measurable space, *S*, equipped with a σ-algebra of measurable subsets of *S*. Informally, *S* is the space of actions, decisions, investments, or options available to a player at each round of the iterated interaction and could be a finite set, a continuous interval, or something more complicated. Because the players need not have the same action space, we denote by $S_X$ the action space of player *X* and by $S_Y$ that of player *Y*. In what follows, all functions are assumed to be measurable and bounded.

In each encounter (i.e., “one-shot game”), the players receive payoffs based on payoff functions, $u_X, u_Y : S_X \times S_Y \rightarrow \mathbb{R}$, for players *X* and *Y*, respectively. An iterated game between players *X* and *Y* consists of a sequence of these one-shot interactions. If, at time *t*, player *X* uses $x_t$ and player *Y* uses $y_t$, then the discounted payoff to player *X* for a sequence of play is $(1 - \lambda) \sum_{t \geqslant 0} \lambda^t\, u_X(x_t, y_t)$; the payoff to *Y* is obtained by replacing $u_X$ with $u_Y$ in Eq. **S2**. Thus, the discounted payoffs are bounded whenever the one-shot payoffs are.

Here, we consider stochastic strategies that condition on the history of play: both players observe the sequence of play up to the current period and use it to devise an action for the present encounter. To formally define such strategies, we first recall the notion of “history” in a repeated game: a history at time *T* is a sequence of action pairs, $\big((x_t, y_t)\big)_{t=0}^{T-1}$, recording the play of both players in the rounds preceding *T* (32). In other words, a history at time *T* is an element of $(S_X \times S_Y)^T$.

A pure strategy for player *X* in the repeated game is a map from histories to actions: at each time, player *X* could look at the history of past play and use this information to choose an action from $S_X$. More generally, a behavioral strategy for player *X* is a map from histories to probability measures on $S_X$, specifying a (possibly random) action at each time *t*. If this measure depends on only the outcome of the round immediately preceding *t*, then we say that the strategy is a memory-one strategy.

Suppose that $\sigma_X$ and $\sigma_Y$ are behavioral strategies for players *X* and *Y*, respectively. Together with the history of play, these strategies determine, round by round, a product measure over the players’ joint actions, and hence a distribution over histories of any finite length. The sequences of play of players *X* and *Y* are thus well-defined random processes. Before stating our first result, we first formally define expected payoffs for the finite game.

### Definition 1 (Objective Function for a Finite Game).

If $\sigma_X$ and $\sigma_Y$ are strategies for players *X* and *Y*, respectively, and if $\mu_T$ denotes the induced distribution over histories of length *T* (Eq. **S7**), then the objective function (or expected payoff) for player *X* in the *T*-round game is the expectation, with respect to $\mu_T$, of the discounted payoff accumulated over those *T* rounds.

### Lemma 1.

For fixed strategies of the two players, the objective functions for the finite games converge as the number of rounds, *T*, tends to infinity. From *Lemma 1*, we see that the objective function for the infinite game is well defined as this limit:

### Definition 2 (Objective Function for an Infinite Game).

If $\sigma_X$ and $\sigma_Y$ are strategies for players *X* and *Y*, respectively, then the objective function (or expected payoff) for player *X* in the infinite game is the limit of the finite-game objective functions of *Definition 1* as the number of rounds tends to infinity.

### Remark 1.

Classically, the objective function of a repeated game with infinitely many rounds is defined using a distribution, μ, over infinite histories, which is generated by the players’ strategies for the repeated game (32). That is, the expected payoff to player *X* is defined by integrating the discounted payoff against μ (Eq. **S14**). In using Eq. **S13** as an objective function for player *X*, we do not need to worry about what μ is (or if it even exists for a general action space) because Eq. **S14**, whenever it is defined, must coincide with Eq. **S13**. To see why, suppose that there is a distribution, μ, on infinite histories that is consistent with the finite-round distributions (Eqs. **S15** and **S9**). Then, by *Lemma 1*, the objective function for player *X* defined by μ agrees with Eq. **S13**. This argument requires neither *S* being finite nor the measures in question having any special structure; by *Lemma 1*, we do not need to worry about the existence of such a distribution.

To prove *Lemma 1*, we first need a simple technical result:

### Lemma 2.

Suppose that *f* is a bounded, measurable function. Then *f* is a uniform limit of simple functions, and integration against any probability measure commutes with this limit.

#### Proof.

Because *f* is bounded, there exists a sequence of simple functions converging uniformly to *f*. The claim then follows by exchanging the limit with the integral, which is justified by uniform convergence.

#### Proof of Lemma 1.

By *Lemma 2* and the definitions of the objective functions (*Definitions 1* and *2*), the expected payoffs for the finite games converge to the expected payoff for the infinite game, which proves the lemma.

The objective function of Eq. **S13**, which is obtained using *Lemma 1*, eliminates the need to deal with histories when proving our main results for iterated games. With the background on expected payoffs now established, we turn our attention to the proofs of the results claimed in the main text:

## Detailed Proofs of the Main Results

Before proving our main results, we state a technical lemma that generalizes Lemma 3.1 of Akin (31)—which Hilbe et al. (14) refer to as Akin’s Lemma—and Lemma 1 of Hilbe et al. (15). This lemma relates the memory-one strategy of player *X* to the discounted distribution of *X*’s play, regardless of the strategy of the opponent.

### Lemma 3.

For any memory-one strategy, σ, with initial action $\sigma^0$ for player *X*, and any bounded function, ψ,

$$
\sum_{t=0}^{\infty} \lambda^t\, \mathbb{E}\left[ \psi(x_t) - \lambda \int_{S_X} \psi(s)\, d\sigma[x_t, y_t](s) - (1 - \lambda) \int_{S_X} \psi(s)\, d\sigma^0(s) \right] = 0,
$$

regardless of the strategy of player *X*’s opponent.

#### Proof.

By the definition of a memory-one strategy, the conditional distribution of $x_{t+1}$ given the outcome at time *t* is $\sigma[x_t, y_t]$, so $\mathbb{E}[\psi(x_{t+1})] = \mathbb{E}\big[\int_{S_X} \psi(s)\, d\sigma[x_t, y_t](s)\big]$. Summing over *t* with weights $\lambda^t$, the resulting series telescopes, which gives the result.

With *Lemma 3* in hand, we can now prove the *Theorem*, which states that player *X* can enforce the relation $\alpha\, \pi_X + \beta\, \pi_Y + \gamma = 0$. We first establish a slightly more general *Proposition* and then deduce the *Theorem* by setting the relevant quantities appropriately.

### Proposition.

If ψ is a bounded function for which Eq. **4** holds for each pair of actions, then $\alpha\, \pi_X + \beta\, \pi_Y + \gamma = 0$ for any strategy of the opponent of player *X*.

#### Proof.

Because ψ is bounded, there exists a sequence of simple functions converging uniformly to ψ on *S*. For each function in this sequence, applying *Lemma 3* and passing to the limit, we obtain the desired identity.

Although the *Proposition* applies to discounted games with $\lambda < 1$, the undiscounted case is recovered from Eq. **S26** by taking the limit $\lambda \rightarrow 1$:

### Corollary 1.

If the hypotheses of the *Proposition* hold in the limit $\lambda \rightarrow 1$, then the corresponding payoff relationship is enforced in the undiscounted game as well, for any strategy of the opponent of player *X*.

### Theorem (Autocratic Strategies in Arbitrary Action Spaces).

Suppose that σ is a memory-one strategy for player *X* and let $\sigma^0$ be player *X*’s initial action. If, for some bounded function, ψ, the equation

$$
\psi(x) - \lambda \int_{S_X} \psi(s)\, d\sigma[x, y](s) - (1 - \lambda) \int_{S_X} \psi(s)\, d\sigma^0(s) = \alpha\, u_X(x, y) + \beta\, u_Y(x, y) + \gamma
$$

holds for each $x \in S_X$ and $y \in S_Y$, then $\sigma^0$ and σ together enforce $\alpha\, \pi_X + \beta\, \pi_Y + \gamma = 0$ for any strategy of player *Y*. In other words, the pair $(\sigma^0, \sigma)$ is an autocratic strategy for player *X*.

### Two-Point Autocratic Strategies.

#### Corollary 2.

Let $s_1, s_2 \in S_X$ and suppose that player *X* initially uses $s_1$ with some probability and $s_2$ otherwise, and that, following each outcome, *X* again plays a mixture of $s_1$ and $s_2$ only; that is, each measure $\sigma[x, y]$ is concentrated on $\{s_1, s_2\}$ (point masses on *S* centered at these two actions). Provided a solvability condition on λ holds (Eq. **S32**), the mixing weights can be chosen so that, regardless of *Y*’s strategy, this choice of strategy enforces the linear relationship of Eq. **5**.

##### Proof.

By the *Theorem*, we need only show that there exists a bounded function, ψ, satisfying Eq. **4**. Because we restrict *X*’s actions to two points, we may assume that ψ is determined by its values at those two points. Solving the resulting pair of linear conditions, which is possible by Eq. **S32**, the proof is complete.

#### Remark 2.

In the undiscounted case ($\lambda \rightarrow 1$), if Eq. **S32** is satisfied for some discount factor, then it is satisfied for every larger one; that is, if Eq. **S32** holds for some λ, then Eq. **S38** also holds for every larger λ. Conversely, if Eq. **S38** does not hold for a particular choice of actions, then neither does Eq. **S32**. Thus, Eq. **S38**, which is easy to check, offers a straightforward way to show that two actions cannot form a two-point autocratic strategy for a particular game.

#### Remark 3.

For a given function, ψ, it follows from *Corollary 2* that, for a strategy of this form, we must have Eq. **S41**, which is typically not possible for strategies concentrated on more than just two actions.

## Examples

Here we present some simple examples of the *Theorem* and its implications. In the following section, we demonstrate how the *Theorem* reduces to the main result of Press and Dyson (12) when the action space has only two options. Moreover, we use an action space consisting of three choices to illustrate the implicit nature of autocratic strategies defined via Press–Dyson functions for more than two actions. In Continuous Donation Game, we show that there is no way for a player to unilaterally set her own score using the *Theorem*. In particular, despite the implicit nature of autocratic strategies, one can use the *Theorem* to deduce the nonexistence of certain classes of strategies.

### Games with Finitely Many Actions and No Discounting.

Suppose that the action space is finite and that there is no discounting. Then the Press–Dyson function reduces to a vector, and the condition of Eq. **4** recovers the zero-determinant condition (Eq. **2** in the main text).

On the other hand, if the game has three or more actions, then the Press–Dyson function is no longer determined up to a single multiplicative constant, and the corresponding autocratic strategies can be specified only implicitly.

### Continuous Donation Game.

Here we establish some further results claimed in the main text for extortionate, generous, and equalizer strategies for the continuous donation game.

#### Relationship to the classic donation game.

For a two-point strategy, player *X*’s action space may be restricted to $\{0, K\}$, and the function ψ appearing in the *Theorem* is defined by two numbers, $\psi(0)$ and $\psi(K)$. It follows from *Corollary 2* that ψ can be chosen to allow player *X* to enforce the linear relationship $\alpha\, \pi_X + \beta\, \pi_Y + \gamma = 0$. Eq. **S46** simply states formally that player *X* fully cooperates with probability $p(x, y)$ following an outcome in which *X* plays *x* and *Y* plays *y*.

In the absence of discounting, i.e., in the limit $\lambda \rightarrow 1$, Eq. **S48** recovers the discrete-action-space case, Eq. **6**. In particular, the autocratic memory-one strategy in Eq. **S46** is a direct generalization of zero-determinant strategies to the continuous donation game. However, this strategy contains much more information than the corresponding strategy for the classic donation game because it encodes *X*’s play in response to *Y*’s action for every $y \in [0, K]$. Remarkably, even though *Y* has an infinite number of actions to choose from, player *X* can still ensure that Eq. **S47** holds by playing only two actions.

#### Extortionate and generous strategies.

In the main text, we saw that player *X* can unilaterally enforce extortionate and generous payoff relationships using two-point strategies derived from the *Theorem*. Indeed, the required function, ψ, may be read off from the two-point construction, and the resulting reaction function is Eq. **8**.

#### Opponent-equalizing strategies.

Although a player cannot set her own score in the continuous donation game, she can set the score of her opponent. We saw in the main text that *X* can set *Y*’s score to anything between 0 and $b(K) - c(K)$. These are, in fact, the only payoffs for *Y* that *X* can unilaterally set via the *Theorem*. Indeed, if γ satisfies $0 \leqslant \gamma \leqslant b(K) - c(K)$, then an appropriate bounded function, ψ, exists; outside of this range, it does not.

#### Self-equalizing strategies.

We saw in the main text that extortionate strategies exist in the continuous donation game, as demonstrated by the two-point strategy defined by Eq. **8**. However, it certainly need not be the case that every linear relationship can be enforced: in particular, there is no bounded function, ψ, allowing player *X* to unilaterally enforce the equation $\pi_X = \gamma$, i.e., to set her own score.

#### Initial actions.

Here we state the conditions on the initial action that allow player *X* to enforce autocratic strategies in the continuous donation game.

If player *X* initially plays 0 (i.e., defects), then, provided Eq. **7** in the main text holds, the reaction function of Eq. **8** allows *X* to enforce an extortionate payoff relationship.

If *X* wishes to enforce a generous relationship, then she should instead initially cooperate fully. *X* can also enforce a fair relationship ($\pi_X = \pi_Y$) provided Eq. **7** holds and *X*’s initial action is chosen appropriately. Similarly, for *X* to set the score of her opponent, *X*’s initial action must be compatible with the function, ψ, used in the *Theorem*.

## Acknowledgments

The authors are grateful to Christian Hilbe for reading an earlier draft of this manuscript and for suggesting that we look into deterministic strategies. The authors acknowledge financial support from the Natural Sciences and Engineering Research Council of Canada Grant RGPIN-2015-05795.

## Footnotes

^{1}To whom correspondence should be addressed. Email: mcavoy{at}math.ubc.ca.

Author contributions: A.M. and C. Hauert designed research; A.M. performed research; A.M. contributed new reagents/analytic tools; A.M. and C. Hauert analyzed data; and A.M. and C. Hauert wrote the paper.

The authors declare no conflict of interest.

This article is a PNAS Direct Submission. C. Hilbe is a guest editor invited by the Editorial Board.

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1520163113/-/DCSupplemental.

## References

- 3. Axelrod R, Hamilton WD (1981) The evolution of cooperation. *Science* 211(4489):1390–1396.
- 4. Axelrod R (1984) *The Evolution of Cooperation* (Basic Books, New York).
- 6. Nowak MA
- 7. Delton AW, Krasnow MM, Cosmides L, Tooby J
- 8. Heide JB, Miner AS
- 11. Nowak MA
- 12. Press WH, Dyson FJ (2012) Iterated Prisoner’s Dilemma contains strategies that dominate any evolutionary opponent. *Proc Natl Acad Sci USA* 109(26):10409–10413.
- 13. Pan L, Hao D, Rong Z, Zhou T
- 14. Hilbe C, Wu B, Traulsen A, Nowak MA
- 16. Hilbe C, Nowak MA, Sigmund K
- 18. Stewart AJ, Plotkin JB
- 23. Stewart AJ, Plotkin JB
- 26. West SA, Buckling A
- 32. Fudenberg D, Tirole J (1991) *Game Theory* (MIT Press, Cambridge, MA).
- 33. Sigmund K
- 34. Killingback T, Doebeli M, Knowlton N
- 38. Doebeli M, Hauert C, Killingback T
- 41. Korevaar J
