Human strategy updating in evolutionary games

Edited by Simon A. Levin, Princeton University, Princeton, NJ, and approved December 22, 2009 (received for review October 29, 2009)
Abstract
Evolutionary game dynamics describe not only frequency-dependent genetic evolution, but also cultural evolution in humans. In this context, successful strategies spread by imitation. It has been shown that the details of strategy update rules can have a crucial impact on evolutionary dynamics in theoretical models and, for example, can significantly alter the level of cooperation in social dilemmas. What kind of strategy update rules can describe imitation dynamics in humans? Here, we present a way to measure such strategy update rules in a behavioral experiment. We use a setting in which individuals are virtually arranged on a spatial lattice. This produces a large number of different strategic situations from which we can assess strategy updating. Most importantly, spontaneous strategy changes corresponding to mutations or exploration behavior are more frequent than assumed in many models. Our experimental approach of measuring the properties of the update mechanisms assumed in theoretical models will be useful for refining mathematical models of cultural evolution.
Classical game theory assumes that agents make rational decisions, taking into account that they are interdependent with other agents that are also fully rational (1). Because this assumption has proved to be problematic even in humans, evolutionary game theory has been developed to describe the dynamics of genetic or cultural evolution when fitness is not fixed but depends on interactions with others. Applications of this framework range from the dynamics of microbes (2–4) to animal behavior (5, 6) and human behavior (7–9). Many aspects of evolutionary dynamics hinge on the microscopic rules describing how successful strategies spread. In particular, in structured populations, these rules can crucially alter the evolutionary outcome and, for example, determine whether cooperation evolves or not (10–12). Thus, it is of great importance to infer how strategies are actually adopted. To this end, we have developed a behavioral experiment that mimics typical properties of theoretical models but replaces the computer agents with real human players. Each player interacts only with his or her immediate neighbors. To evaluate his or her performance, each player can compare his or her payoff with the payoffs of the neighbors and use this as a basis to adopt new strategies. However, there are some subtle differences between mathematical models and human behavior: Humans may use mixed strategies (i.e., randomize between their options) or even change their strategies over time, whereas most theoretical models consider the simplest case in which a player's strategy is equated with his or her action. Thus, any change in behavior is equated to a change in strategy. If we aim to apply this simple framework of one-shot games as a first approximation to describe human behavior, we have to infer the details of strategy adoption (e.g., the rate of spontaneous strategy changes). We use a spatial game in which human players interact with their immediate neighbors only.
This leads to a large number of different strategic situations that allow us to infer under which circumstances a neighboring strategy is adopted.
A large portion of the literature on evolutionary games focuses on the Prisoner's Dilemma. This is a paradigm for studying the evolution of costly cooperation among selfish individuals because it highlights the potential differences between individual interests and the social optimum (13–17). In the Prisoner's Dilemma, two players have to decide simultaneously whether to cooperate with each other or not. If both players cooperate, they obtain a reward, R. If one defects and the other cooperates, the defector gets a temptation to defect, T, and the cooperator obtains a sucker's payoff, S. If both defect, they get a punishment, P. This can be summarized by the payoff matrix (entries are the row player's payoffs):

        C   D
  C   ( R   S )
  D   ( T   P )
The Prisoner's Dilemma is characterized by the payoff ranking T > R > P > S (and, in addition, 2R > T + S for repeated games). In this case, rational individuals choose defection: They are greedy and try to exploit cooperators (T > R), but they also fear that others will try to exploit them (P > S). However, because mutual cooperation yields a higher payoff than mutual defection (R > P), players face a dilemma: Individual reasoning leads to defection, but mutual cooperation implies a higher payoff. Similarly, in an evolutionary setting, the higher payoff of defectors implies more reproductive success; thus, cooperation should not evolve. However, cooperation can evolve, for example, by kin selection, spatial structure, or when interactions are repeated (18). There is a large body of literature on behavioral experiments based on the repeated Prisoner's Dilemma (e.g., 19, 20). It is clear that humans behave in a more sophisticated way than simple computer programs (19), but it has also been shown that working memory constraints limit human behavior in repeated games (21). Nonetheless, with a few exceptions (e.g., 22), theorists have focused on simple forms of strategy choice, for example, to disentangle the effects of population structure and game characteristics. In particular, the spatial version of the Prisoner's Dilemma has been analyzed in great detail by theorists (23–27). Initially, research focused on simple lattices that approximate interactions in spatially homogeneous systems. More recently, many studies have addressed complex social networks instead (28, 29). Typically, players are arranged on a social network and interact pairwise only with their immediate neighbors, choosing either cooperation or defection for all interactions. In each round, the payoff of every player is accumulated in pairwise encounters with all neighbors.
Individuals with high payoffs are either imitated more often than others (in social models) or produce more offspring (in genetic models). The dynamics in spatially structured populations depend crucially on the details of the microscopic rules by which the players update their strategies. Our goal is to shed some light on these microscopic rules that describe how players change their strategies.
Such a behavioral experiment with humans can only be done in comparatively small systems because of some restrictions in experimental games that are absent in mathematical models. For example, participants have to be paid in real money, and their anonymity must be guaranteed so that the results are not blurred by potential reputation effects. Throughout this study, we focus on R = €0.30, S = €0.00, T = €0.40, and P = €0.10. This leads to the 2 × 2 payoff matrix (entries are the row player's payoffs):

        C        D
  C   ( €0.30    €0.00 )
  D   ( €0.40    €0.10 )
Players were virtually arranged on a spatial 4 × 4 lattice with periodic boundary conditions, which corresponds to the surface of a torus. The participants had four fixed neighbors throughout the entire game. Thus, the possible cooperator payoffs accumulated over the four interactions are €0.00, €0.30, €0.60, €0.90, and €1.20. A defector has the possible payoff values €0.40, €0.70, €1.00, €1.30, and €1.60.
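As an illustration of how these values arise, the accumulated payoffs can be enumerated in a few lines of Python (our own sketch; the function name is ours, not from the study):

```python
# Accumulated payoffs of one player against four neighbors in the spatial
# Prisoner's Dilemma, using the experimental values (in euros).
R, S, T, P = 0.30, 0.00, 0.40, 0.10

def accumulated_payoff(strategy, cooperating_neighbors):
    """Payoff of one player against four neighbors, c of whom cooperate."""
    c = cooperating_neighbors
    d = 4 - c
    if strategy == "C":
        return round(c * R + d * S, 2)
    return round(c * T + d * P, 2)

print([accumulated_payoff("C", c) for c in range(5)])  # [0.0, 0.3, 0.6, 0.9, 1.2]
print([accumulated_payoff("D", c) for c in range(5)])  # [0.4, 0.7, 1.0, 1.3, 1.6]
```

Note that for any number of cooperating neighbors, the defector payoff exceeds the cooperator payoff by €0.10 per neighbor plus the temptation margin, which is the local incentive to defect.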
Many theoretical studies are based on synchronous updating, which means that all players make strategy revisions at the same time. This can easily be mimicked in behavioral experiments. However, the way in which strategies are changed is more difficult to address. A typical assumption is that each player chooses the strategy that obtained the highest payoff in the neighborhood, either his or her previous strategy or a different one. In our experiment, players have many different possibilities for strategy updating. It is clear that human players sometimes do not follow this "imitate-the-best" rule but choose their strategies in a different fashion. Nonetheless, these imitation dynamics can serve as a first approximation for strategy updating.
More recent studies have stressed that strategy adoption is stochastic, which can be modeled by introducing an intensity of selection (30, 31). One possibility is the following imitation process with errors. Each player compares his or her payoff with that of the best-performing neighbor who has played a different strategy and calculates the payoff difference, Δπ. With probability (1 + exp[−βΔπ])^{−1}, he or she adopts the neighbor's strategy (32–34). Here, β measures the intensity of selection (i.e., how important the payoffs are for strategy revisions). In our case with two strategies only, this is equivalent to the multinomial logit model (35, 36). Our goal is to understand which strategy adoption rules can describe human behavior in this game.
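This imitation process with errors is easy to state in code. The sketch below is our own illustration of the Fermi rule described above (function names are ours):

```python
import math
import random

def fermi_probability(payoff_self, payoff_neighbor, beta):
    """Probability of adopting the neighbor's strategy:
    p = 1 / (1 + exp(-beta * (pi_neighbor - pi_self)))."""
    return 1.0 / (1.0 + math.exp(-beta * (payoff_neighbor - payoff_self)))

def imitate(own, neighbor, payoff_self, payoff_neighbor, beta, rng=random):
    """Stochastic imitation with errors: switch with Fermi probability."""
    if rng.random() < fermi_probability(payoff_self, payoff_neighbor, beta):
        return neighbor
    return own

# Equal payoffs give a coin flip; beta -> infinity recovers deterministic
# imitate-the-best; beta = 0 gives payoff-independent random switching.
print(fermi_probability(1.0, 1.0, beta=1.2))  # 0.5
```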
Results
Let us first address whether imitation dynamics can describe human strategy updating. In total, we have 5,760 individual decisions to keep a strategy or to switch it. As a first model, we assume that all individuals use the imitate-the-best rule (i.e., they always imitate the best-performing strategy in the neighborhood, including their own). It has been shown that this cannot fully describe human behavior (37). Fig. 1 reveals that in our experiment, 62% of the individuals initially follow the imitate-the-best rule. However, the remaining 38% of the strategy changes cannot be explained by pure imitation. This fraction tends to decrease over the course of the experiment. Fitting an exponential function to the data from Fig. 1 reveals that the fraction of strategy choices not explained by imitation decreases by roughly 4% per round. This reflects the fact that strategy choice changes over time in our behavioral experiment and that a stationary state is not reached.
In theoretical models of the spatial Prisoner's Dilemma, one is typically interested in the average level of cooperation of the system. The idea is that in a spatial setting, clusters of cooperators can form, leading to a significant degree of cooperation (11, 12, 23, 24). To explore how the level of cooperation is affected by spatial structure, we have also conducted a control experiment in which the spatial structure was broken up by reassigning each player's neighbors in every round. Because individuals always interact with the same coplayers in the spatial treatment and can form stable clusters of cooperators, one would expect a higher level of cooperation in the fixed-neighbors treatment than in the random-neighbors treatment. As described in previous human behavioral experiments (38) (and not necessarily in line with the expectations of theoreticians), the average level of cooperation at the start of the experiment is comparatively high and very similar in the treatment with fixed neighbors (70.0%, averaged over 15 repeats) and the treatment with random neighbors (70.6%, averaged over 10 repeats). Most interestingly, we do not find a significant difference in the level of cooperation between the two treatments during the course of the game (Fig. 2). Only in round 4 is there a significant difference between the levels of cooperation, and it disappears after Bonferroni correction for multiple comparisons. Stable clusters of cooperators are not found in our behavioral experiments. The high probability of spontaneous strategy changes decreases the influence of spatial structure.
It turns out that the dynamics can be explained based on the way in which our subjects revise their strategies. The general dynamics of the system can be captured by a simple random strategy choice approach (39). We assume that a player can do two things when revising his or her strategy: (i) with probability ν, he or she chooses a random strategy, and (ii) with probability 1 − ν, he or she imitates the best-performing neighbor. In our behavioral experiment, we find that ν decays exponentially with round t of the game as ν = ν_0 Γ^{t−1}. Such an exponential decay of exploration rates has been reported before (40). Our experiment yields ν_0 = 0.380 and Γ = 0.962 for the best fit. To test our assumption, we simulated the temporal dynamics of 15 runs under imitation dynamics with four neighbors, fitting the strategy choice parameters to the experiment. To be consistent with random strategy choice, we assume that only a fraction 1 − 2ν is correct imitation: A fraction ν is random strategy choice leading to the "correct" strategy that is consistent with imitation, and a fraction ν is strategy changes not expected from imitation. Fig. 2 reveals that this approach can capture the average cooperation level in the behavioral experiment. Comparing 15 simulations with 15 experimental treatments reveals no significant difference between the simulations and the experiments after Bonferroni correction, which takes into account multiple comparisons. We can summarize this approach by the following equation governing the probability p that player A adopts the strategy of B:

p = ν(t) + [1 − 2ν(t)] Θ(π_B − π_A),

where B is the best-performing neighbor of A; t is the round of the game; ν(t) = ν_0 Γ^{t−1}; π_A and π_B are the payoffs of A and B, respectively; and Θ(α) is the Heaviside function [Θ(α) = 0 for α ≤ 0 and Θ(α) = 1 for α > 0]. In our experiment, we find ν_0 = 0.380 ± 0.013 and Γ = 0.962 ± 0.003 (see Fig. 1).
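A minimal simulation of this update rule, with the fitted values ν_0 = 0.380 and Γ = 0.962, might look as follows (our own illustrative sketch, not the authors' code):

```python
import random

# Experimental Prisoner's Dilemma payoffs (in euros) and fitted exploration
# parameters taken from the text; the code structure is our own sketch.
R, S, T, P = 0.30, 0.00, 0.40, 0.10
NU0, GAMMA = 0.380, 0.962

def torus_neighbors(i, L=4):
    """Von Neumann neighbors of cell i on an L x L lattice with periodic boundaries."""
    r, c = divmod(i, L)
    return [((r - 1) % L) * L + c, ((r + 1) % L) * L + c,
            r * L + (c - 1) % L, r * L + (c + 1) % L]

def payoff_of(i, strategies):
    """Accumulated payoff of player i against his or her four neighbors."""
    total = 0.0
    for j in torus_neighbors(i):
        if strategies[i] == "C":
            total += R if strategies[j] == "C" else S
        else:
            total += T if strategies[j] == "C" else P
    return total

def update(strategies, payoffs, t, rng=random):
    """Synchronous update: with probability nu(t) pick a random strategy,
    otherwise copy the best-performing player in the neighborhood (self included)."""
    nu = NU0 * GAMMA ** (t - 1)
    new = []
    for i in range(len(strategies)):
        if rng.random() < nu:
            new.append(rng.choice("CD"))  # spontaneous strategy exploration
        else:
            group = [i] + torus_neighbors(i)
            best = max(group, key=lambda j: payoffs[j])
            new.append(strategies[best])
    return new

# One simulated run of 25 rounds, starting from roughly 70% cooperation:
strategies = ["C"] * 11 + ["D"] * 5
random.shuffle(strategies)
for t in range(1, 26):
    payoffs = [payoff_of(i, strategies) for i in range(16)]
    strategies = update(strategies, payoffs, t)
```

Averaging many such runs would reproduce the comparison with the experimental cooperation level shown in Fig. 2.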
Next, let us abstract from the fact that strategy adoption changes over time and analyze in more detail the way in which individuals imitate their coplayers. First, we analyze all situations in which players do the same as their four neighbors. How likely are they to switch strategies? It turns out that cooperators switch to defection in such a homogeneous environment with a probability of μ_C = 0.28 ± 0.07 (averaged over 45 such situations). Defectors switch to cooperation with a probability of μ_D = 0.25 ± 0.01 (averaged over 1,400 such situations). These probabilities correspond to spontaneous mutations or strategy exploration by the players. Analyzing imitation is less straightforward, because it is impossible to say whether people changed to a different strategy by imitating a particular neighbor, several neighbors at the same time, at random, or based on some more sophisticated reasoning. For example, human players who find themselves in a neighborhood of cooperators may be tempted to defect, anticipating winning the highest possible payoff before another neighbor defects. They may also expect others to take advantage of a cooperative neighborhood sooner or later. However, we can at least quantify the average behavior. We take into account all decisions in which a focal cooperator had at least one defecting neighbor (1,524 decisions) or in which a focal defector had at least one cooperating neighbor (2,791 decisions). Again, some of these strategy changes will correspond to random strategy exploration, but we can assume that this occurs with a probability that is independent of the payoff difference.
Depending on the payoff difference to the neighbor who performs best using a different strategy than the focal player, what is the probability that the focal player switches to that other strategy? Fig. 3 shows that the probability increases with the success of the neighbor, as expected. A cooperator is typically confronted with a defector performing better, whereas a defector can typically only choose to imitate a cooperator performing worse. Moreover, defectors are more resilient to change than cooperators. To model strategy changes, we assume that the probability of switching strategy is given by (1 + exp[−βΔπ])^{−1}. Note that for β → ∞, we recover the unconditional imitation from above. Fitting this function to the data shown in Fig. 3 leads to β = 1.20 ± 0.25. The error corresponds to the SD of a binomial distribution, √(p(1 − p)/n), where n is the number of samples. If we want to take into account the difference in strategy adoption between cooperating players and defecting players, we can also fit two different functions to the data (Fig. 3). If we instead use the average payoff difference to players using a different strategy, we obtain β = 1.15 ± 0.23. Also in this case, defecting players seem to be more resilient to change.
Fig. 3 also shows how the probability of cooperating depends on the number of cooperating neighbors. This does not take any payoffs into account and addresses whether players imitate the common rather than the more successful. It turns out that the probability of cooperating is below 50% even when all neighbors are cooperating. Thus, in our experiment, players not only imitate the most common strategy but decide for cooperation or defection in more complex ways.
The intensity of selection measured in our experiments reveals that humans do not simply adopt any strategy that performs better than their own, as assumed by imitation dynamics. However, β is also so high that analytical results obtained under weak selection may not always apply. Again, we can summarize our approach by means of a simple equation. If we neglect the temporal dependence but take the differences between cooperators and defectors into account, we find the following probability that a player using strategy X (X = C or D) switches to the other strategy:

p_X = μ_X + (1 − 2μ_X)(1 + exp[−β_X(Δπ − α_X)])^{−1},

where Δπ is the payoff difference to the best-performing neighbor using the other strategy, μ_X is the rate of spontaneous strategy changes, β_X is the intensity of selection, and α_X shifts the payoff difference at which switching becomes more likely than not.
Our analysis leads to μ _{C} = 0.28 ± 0.07, β _{C} = 0.67 ± 0.28, and α _{C} = −0.11 ± 0.23 for cooperating players and μ _{D} = 0.25 ± 0.01, β _{D} = 0.99 ± 0.23, and α _{D} = 0.79 ± 0.14 for defecting players.
Discussion
As expected, players imitate others with a probability that increases with the payoff difference. In evolutionary game dynamics, this corresponds to selection. However, players sometimes switch spontaneously to a random strategy, which corresponds to a mutation. Our approach reveals that the probability of such random changes is much higher than typically assumed in theoretical models.
Theoreticians are often interested in the dynamics of very large populations rather than in finite-size effects. However, considering large populations is infeasible in behavioral experiments, where many repeats are required. Moreover, our predecessors lived in small social groups, and our behavior may have adapted to that situation. Despite the complexity of our modern society, human interactions typically occur within small social groups even today. Most importantly, the way in which players choose strategies based on local information does not seem to be fundamentally different in larger systems (41). Decision making in humans is certainly a complicated process that goes far beyond the simple models typically considered. However, we argue that important aspects of human behavior are not captured by the different mechanisms of imitation alone. Modeling these processes with random strategy choice can lead to very different dynamics in theoretical models and captures the general trend of the dynamics in our system (Fig. 2).
In our experiment, we have analyzed the simplest system in which humans play a spatial game. Many challenges lie ahead: Theoretical models describe interactions not only on regular lattices but on heterogeneous networks (42), dynamic networks (43), or set structured populations (44). It would be fruitful to initiate a discussion in the scientific community on how such more complex models can be approached by behavioral experiments.
Methods
From 2003 to 2004, volunteer human subjects for the experiment were recruited from first-semester biology courses at the Universities of Kiel, Cologne, and Bonn. A total of 400 students participated in the experiment. The students were divided into 25 groups of 16 players each.
In the spatial treatment (15 groups), the 16 subjects were virtually arranged on a spatial grid with periodic boundary conditions. This torus-shaped geometry ensures that there are no edges in the system. Each subject had four fixed direct neighbors throughout the experiment (von Neumann neighborhood). To ensure the players' anonymity, each player was identified by a letter ranging from a to p (e.g., a has the following neighbors: b, d, e, and m). The subjects interacted exclusively with these four neighbors and received no further information about the remaining 11 subjects. In the nonspatial control treatment (10 groups), the 16 subjects were assigned a different random position on the lattice in each round, such that the probability of another interaction with a particular coplayer is 4/15. Otherwise, the control experiment was conducted in exactly the same way as the spatial treatment. The students were fully aware of whether they were in a fixed or randomized neighborhood.
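The von Neumann neighborhood on the 4 × 4 torus and the letter labels can be checked with a short script (our own illustration):

```python
import string

# Players a..p arranged row-major on a 4 x 4 torus; verify that player 'a'
# has the neighbors b, d, e, and m stated in the text.
L = 4
labels = string.ascii_lowercase[:L * L]

def neighbors(label):
    """Sorted letter labels of the four von Neumann neighbors on the torus."""
    i = labels.index(label)
    r, c = divmod(i, L)
    idx = [r * L + (c + 1) % L,      # right
           r * L + (c - 1) % L,      # left
           ((r + 1) % L) * L + c,    # below
           ((r - 1) % L) * L + c]    # above
    return sorted(labels[j] for j in idx)

print(neighbors("a"))  # ['b', 'd', 'e', 'm']
```

Because the lattice wraps around, every player has exactly four distinct neighbors, so no player sits at an edge.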
The subjects started both treatments without money in their accounts. Each group played a total of 25 Prisoner's Dilemma rounds, allowing them to earn, on average, between €10.00 (for full defection) and €30.00 (for full cooperation). A single player, however, may theoretically also obtain nothing (if the player always cooperates but his or her four partners always defect) or up to €40.00 (if the player always defects but the partners always cooperate).
Each subject had a decision box on his or her private table that was equipped with silent “YES,” “NO,” and “OK” buttons. During a short oral introduction, the subjects received information about the use of their decision box and how their anonymity would be ensured throughout and after the experiment. At the beginning of the experiment, written instructions explaining the game (see SI Text ) were projected on a screen visible to all players. Each subject had to confirm via the OK button that he or she had finished reading and had understood each of the displayed instruction pages.
In both treatments, each subject had to make a single decision in each round either to cooperate or defect in the Prisoner’s Dilemma played with all four neighbors simultaneously. This setting corresponds to synchronous strategy adjustment. After every round, the subjects could observe the results of the round on their personal display, which could display a maximum of 32 characters. Decisions were displayed in the following form:
The display was explained in detail with three examples, and subjects had no problems understanding it. Here, s, t, u, v, and w are the codes for the different players. Each player is shown his or her own strategy [cooperation (Y) or defection (N)] and payoff as well as the chosen strategies of his or her direct neighbors and their respective payoffs, which resulted from their interactions with their four neighbors (e.g., own payoff s: 6 = €0.60; payoff of player t: 12 = €1.20). The computer calculated the individual's payoff from all four encounters and transferred the accumulated payoff to the player's account after each round. At the end of the experiment, the players received the money in their respective accounts in cash without losing their anonymity (for details, see ref. 45).
Throughout the experiment, complete anonymity of the subjects was ensured by the following measures. Subjects were seated between partitions, such that no visual contact between them was possible. All boxes were connected to a computer to record each individual decision. The subjects were informed that they were not allowed to talk to or contact each other during the experiment. Each player could be identified only by his or her pseudonym (a–p), both by other players and by the experimenters. Pseudonyms could not be connected with the students' real identities.
Acknowledgments
We are grateful to S. Bonhoeffer for helping us to choose appropriate parameters for the experiment. We thank T.M.C. Bakker, H. Arndt, and H. Brendelberger for support and the 400 students for their participation, as well as D. Helbing and A. Sanchez for stimulating discussions. A.T. is supported by the Emmy Noether Program of the Deutsche Forschungsgemeinschaft.
Footnotes
 ^{1}To whom correspondence should be addressed. Email: traulsen{at}evolbio.mpg.de.

Author contributions: D.S. and M.M. designed research; D.S., H.J.K., and M.M. performed research; A.T. analyzed data; and A.T., D.S., R.D.S., and M.M. wrote the paper.

The authors declare no conflict of interest.

This article is a PNAS Direct Submission.

This article contains supporting information online at www.pnas.org/cgi/content/full/0912515107/DCSupplemental.
References
von Neumann J, Morgenstern O
Bendor J, Swistak P
Milinski M, Semmann D, Krambeck HJ, Marotzke J
Rapoport A, Chammah AM
Axelrod R
Macy MW, Flache A
Nowak MA
Kagel JH, Roth AE, Roth AE
Camerer C
Milinski M, Wedekind C
Hauert C
Skyrms B
Helbing D, Yu W
Abramson G, Kuperman M
Santos FC, Pacheco JM, Lenaerts T
Szabó G, Tőke C
Manski CF, McFadden D, McFadden D
Sandholm WH
Traulsen A, Hauert C, De Silva H, Nowak MA, Sigmund K
Schreckenberg M, Selten R, Helbing D
Grujic J, Fosco C, Araujo L, Cuesta J, Sanchez A
Tarnita CE, Antal T, Ohtsuki H, Nowak MA