New Research In
Physical Sciences
Social Sciences
Featured Portals
Articles by Topic
Biological Sciences
Featured Portals
Articles by Topic
 Agricultural Sciences
 Anthropology
 Applied Biological Sciences
 Biochemistry
 Biophysics and Computational Biology
 Cell Biology
 Developmental Biology
 Ecology
 Environmental Sciences
 Evolution
 Genetics
 Immunology and Inflammation
 Medical Sciences
 Microbiology
 Neuroscience
 Pharmacology
 Physiology
 Plant Biology
 Population Biology
 Psychological and Cognitive Sciences
 Sustainability Science
 Systems Biology
Evolutionary cycles of cooperation and defection

Edited by Robert M. May, University of Oxford, Oxford, United Kingdom (received for review March 29, 2005)
Abstract
The main obstacle for the evolution of cooperation is that natural selection favors defection in most settings. In the repeated prisoner's dilemma, two individuals interact several times, and, in each round, they have a choice between cooperation and defection. We analyze the evolutionary dynamics of three simple strategies for the repeated prisoner's dilemma: always defect (ALLD), always cooperate (ALLC), and titfortat (TFT). We study mutation–selection dynamics in finite populations. Despite ALLD being the only strict Nash equilibrium, we observe evolutionary oscillations among all three strategies. The population cycles from ALLD to TFT to ALLC and back to ALLD. Most surprisingly, the time average of these oscillations can be entirely concentrated on TFT. In contrast to the classical expectation, which is informed by deterministic evolutionary game theory of infinitely large populations, stochastic evolution of finite populations need not choose the strict Nash equilibrium and can therefore favor cooperation over defection.
In the prisoner's dilemma, two players have the choice to cooperate or to defect. Both obtain payoff R for mutual cooperation, but a lower payoff P for mutual defection. If one individual defects, while the other cooperates, then the defector receives the highest payoff T whereas the cooperator receives the lowest payoff S. We have T > R > P > S. Defection dominates cooperation: in any mixed population, defectors have a higher fitness than cooperators.
As is standard in repeated games, new strategies become possible when the game is repeated, and these strategies can lead to a wider range of equilibrium outcomes (1–8). In particular, in the infinitely repeated prisoner's dilemma, cooperation becomes an equilibrium outcome, but defection remains an equilibrium as well (9, 10). To select between these equilibria, the authors of refs. 11–13 looked at the replicator dynamic on a continuum population, and the authors of refs. 14 and 15 applied variants of evolutionary stability to repeated games with complexity costs. These solution concepts do not have explicit dynamics and are based on models with a continuum population.
Our goal is to study explicit evolutionary dynamics in a large but finite population. To explicitly model evolutionary dynamics, the space of possible strategies must be restricted. In this article, we explore the evolutionary dynamics of three strategies: always defect (ALLD), always cooperate (ALLC), and titfortat (TFT). TFT cooperates in the first move and then does whatever the opponent did in the previous move. Ever since Axelrod's celebrated computer tournaments (16), TFT is a world champion in the repeated prisoner's dilemma, although it has some weaknesses and has at times been defeated by other strategies (11, 12, 17). For our purpose here, these weaknesses are not important. We conjecture that similar results hold for other reciprocal strategies, such as generoustitfortat (11) or win–stay, lose–shift (12, 18), which is also known as perfecttitfortat (15).
We consider a finitely repeated game with an average number of rounds, m. TFT is a conditional strategy, whereas the other two strategies are unconditional. Therefore, it is natural to include a complexity cost for TFT (14): the payoff for TFT is reduced by a small constant c. The payoff matrix is given by The pairwise comparison of the three strategies leads to the following conclusions. (i) ALLC is dominated by ALLD, which means it is best to play ALLD against both ALLC and ALLD. (ii) TFT is dominated by ALLC. These two strategies cooperate in every single round, but the complexity cost of TFT implies that ALLC has a higher payoff. (iii) If the average number of rounds exceeds a minimum value, m > [(T  P + c)/(R  P)], then TFT and ALLD are bistable. This result means that, choosing between ALLD and TFT, each strategy is a best response to itself.
Let us now consider traditional evolutionary game dynamics of all three strategies as given by the replicator equation (19–21). This approach describes deterministic selection in infinitely large populations. The frequency of a strategy increases at a rate given by the difference between its fitness and the average fitness of the population. The fitness of a strategy is the expected payoff from the game assuming many random encounters with other individuals. In this framework, any mixed population of ALLC, TFT, and ALLD will converge to a pure ALLD population. The state where everybody plays ALLD is the only stable equilibrium.
This outcome does not surprise us. From the payoff matrix (1), we see that ALLD is the only evolutionarily stable strategy (ESS) and the only strict Nash equilibrium (22–24). If everybody uses ALLD, then every other strategy has a lower fitness. Therefore, no mutant strategy can invade an ALLD population. In contrast, neither TFT nor ALLC nor any mixed population has this property.
We can extend the replicator equation and consider selection and mutation in a deterministic framework. In the resulting “replicator–mutator equation” (see Appendix), deterministic evolutionary dynamics remain essentially the same for very small mutation rates: all trajectories starting in the interior converge to a population that consists of almost only ALLD players. We call this equilibrium “almost ALLD.” For small or zero mutation rates, there is also an unstable mixed equilibrium containing all three strategies. When there are no mutations, the proportion of ALLD players in this equilibrium is c/[(m  1)(P  S)] and thus can be made arbitrarily small by increasing the number of rounds or by reducing the complexity cost. If the mutation rate exceeds a critical value, a stable limit cycle forms around this mixed equilibrium, so that there are two basins of attraction. Certain initial conditions converge to almost ALLD, whereas others converge to the limit cycle. For even larger mutation rates, almost ALLD loses stability, and the limit cycle becomes a global attractor. There is another critical mutation rate, where the limit cycle disappears and all trajectories converge to a stable mixed equilibrium containing all three strategies. Fig. 1 illustrates these deterministic dynamics.
Let us now move from deterministic evolution of infinite populations to stochastic evolution of finite populations (25–31). We study a frequencydependent Moran process (30–32) with mutation. In each time period, an individual is chosen for reproduction with a probability proportional to its fitness. The offspring replaces a randomly chosen individual. The total population size is constant. With a small mutation probability, the offspring does not use the same strategy as the parent, but one of the two other strategies. A precise description of the stochastic process is in the Appendix. Reproduction can be interpreted genetically or culturally. In the latter case, successful strategies spread by learning (or imitation) from one individual to another.
The evolutionary dynamics of this stochastic process differ from the deterministic approach. In the limiting case of very small mutation rates, the stochastic process does not converge to ALLD, but instead there are endless oscillations from ALLC to ALLD to TFT and back to ALLC. For a long time, the population is almost homogeneous for one strategy, but then a mutant is produced that generates a lineage that takes over the population. The transition rate from one homogeneous population to the next is given by the product of the population size, N, times the mutation rate u, times the fixation probability of the mutant in the resident population. The oscillations tend to revolve in one direction, because the transitions from ALLC to ALLD, from ALLD to TFT, and from TFT to ALLC are much more likely than the corresponding reverse transitions.
Surprisingly, the time average of these oscillations can be entirely dominated by TFT. This means that, for most of the time, the population is in a state that consists of only TFT players. This observation is of interest because, in the limit of very small mutation rates, an infinite population chooses defection, but a finite population (of the right size) chooses reciprocity. This is a remarkable result, given that the payoff matrix (1) clearly indicates that ALLD is the only strict Nash solution and the only evolutionarily stable strategy. We observe that neither concept implies evolutionary success in the stochastic setting of finite populations.
The Appendix contains a theorem that states that the stochastic process has a time average that is arbitrarily close to TFT. More precisely, for a suitable range of population sizes, the population consists most of the time of only TFT players, provided that the average number of rounds m of the repeated prisoner's dilemma is large enough. Fig. 2 shows the (stationary) distribution of the stochastic process for numerical simulations.
The transition rate from an ALLD population to a TFT population is Nuρ, where ρ is the fixation probability of a single TFT player in an ALLD population. The transition rate from a TFT to an ALLC population is of order u, because ALLC and TFT are nearly neutral given a small complexity cost of TFT. The transition rate from an ALLC to an ALLD population is of order Nu, because ALLD has a strong selective advantage. Therefore, the most rapid transition is from ALLC to ALLD. If Nρ > 1, then the transition from ALLD to TFT is faster than the transition from TFT to ALLC. In this case, the population consists most often of TFT players.
A natural extension of our work would be to accommodate the possibility that players make mistakes, so that there is a small probability that a realized action is different from the intended action (15). This extension seems to be particularly relevant when the number of rounds is large. In two recent articles (33, 34), Brandt and Sigmund studied the effects of errors in a deterministic model for the evolution of an infinite population of ALLC, ALLD, and TFT players.
Evolutionary game theory of the last two decades has largely focused on deterministic descriptions of infinitely large populations. Most of our intuitions about evolutionary dynamics come from this important tradition (20, 21). In this article, we have shown that the intrinsic stochasticity of finite populations can lead to surprising outcomes. Instead of convergence to the only strict Nash solution (ALLD), we observe oscillations from ALLD to TFT to ALLC and back to ALLD, with a time average that is concentrated on TFT. Stochastic evolution in finite populations leads to a natural selection of reciprocity.
Acknowledgments
We thank the referees for their detailed suggestions, which helped to improve the presentation of our results. This material is based upon work supported by National Science Foundation Grant 0426199 and by the John Templeton Foundation. The Program for Evolutionary Dynamics at Harvard University is sponsored by Jeffrey Epstein.
Footnotes

↵ † To whom correspondence should be addressed. Email: limhof{at}unibonn.de.

Author contributions: L.A.I., D.F., and M.A.N. performed research, contributed new reagents/analytic tools, and wrote the paper.

This paper was submitted directly (Track II) to the PNAS office.

Abbreviations: ALLD, always defect; ALLC, always cooperate; TFT, titfortat.
 Copyright © 2005, The National Academy of Sciences
References

↵
Luce, D. & Raiffa, H. (1957) Games and Decisions (Wiley, New York).

↵
Flood, M. M. (1958) Manage. Sci. 5 , 526.

Rubinstein, A. (1979) J. Econ. Theory 21 , 19.

Aumann, R. J. (1981) in Essays in Game Theory and Mathematical Economics in Honor of Oscar Morgenstern, ed. Aumann, R. J., pp. 1142.

Lindgren, K. (1992) in Theory and Control of Dynamical Systems, eds. Andersson, A. E., Andersson, S. I. & Ottoson, U. (World Scientific, Singapore), pp. 95107.

↵
Roth, A. E. (1993) J. Hist. Econ. Thought 15 , 184209.
 ↵

↵
Axelrod, R. & Hamilton, W. D. (1981) Science 211 , 13901396. pmid:7466396
 ↵
 ↵
 ↵
 ↵

↵
Fudenberg, D. & Maskin, E. (1990) Am. Econ. Rev. 80 , 274279.

↵
Axelrod, R. (1984) The Evolution of Cooperation (Basic Books, New York).
 ↵
 ↵
 ↵

↵
Hofbauer, J. & Sigmund, K. (1998) Evolutionary Games and Population Dynamics (Cambridge Univ. Press, Cambridge, U.K.).
 ↵
 ↵

Nash, J. F. (1950) Proc. Natl. Acad. Sci. USA 36 , 4849.

↵
Maynard Smith, J. (1982) Evolution and the Theory of Games (Cambridge Univ. Press, Cambridge, U.K.).
 ↵

Ficici, S. & Pollack, J. (2000) in Proceedings of the 2000 Genetic and Evolutionary Computation Conference, ed. Darrell Whitley, L. (Morgan–Kaufmann, San Francisco), pp. 927934.
 ↵
 ↵

↵
Moran, P. A. P. (1962) The Statistical Processes of Evolutionary Theory (Clarendon Press, Oxford).

↵
Brandt, H. & Sigmund, K. (2005) Proc. Natl. Acad. Sci. USA 102 , 26662670. pmid:15695589

↵
Brandt, H. & Sigmund, K. (2005) J. Theor. Biol., in press.

↵
Fudenberg, D., Imhof, L. A. & Nowak, M. A. (2005) Time Average of Evolutionary Oscillations, preprint.