Reward and punishment

Edited by Kenneth W. Wachter, University of California, Berkeley, CA, and approved May 25, 2001 (received for review March 30, 2001)
Abstract
Minigames capturing the essence of Public Goods experiments show that even in the absence of rationality assumptions, both punishment and reward will fail to bring about prosocial behavior. This result holds in particular for the well-known Ultimatum Game, which emerges as a special case. But reputation can induce fairness and cooperation in populations adapting through learning or imitation. Indeed, the inclusion of reputation effects in the corresponding dynamical models leads to the evolution of economically productive behavior, with agents contributing to the public good and either punishing those who do not or rewarding those who do. Reward and punishment correspond to two types of bifurcation with intriguing complementarity. The analysis suggests that reputation is essential for fostering social behavior among selfish agents, and that it is considerably more effective with punishment than with reward.
Experimental economics relies increasingly on simple games to exhibit behavior that is blatantly at odds with the assumption that players are uniquely attempting to maximize their own utility (1–4). We briefly describe two particularly well-known games that highlight the prevalence of fairness and solidarity, without delving into experimental details and variations.
In the Ultimatum Game, the experimenter offers a certain sum of money to two players, provided they can split it among themselves according to specific rules. One randomly chosen player (the proposer) is asked to propose how to divide the money. The co-player (the responder) can either accept this proposal, in which case the money is accordingly divided, or else reject the offer, in which case both players get nothing. The game is not repeated. Because a "rational" responder ought to accept any offer, as long as it is positive, a selfish proposer who thinks that the responder is rational, in this sense, should offer the minimal positive sum. As has been well documented in many experiments, this is not how humans behave, in general. Many proposers offer close to one-half of the sum, and responders who are offered less than one-third often reject the offer (5, 6).
In the Public Goods Game, the experimenter asks each of N players to invest some amount of money into a common pool. This money is then multiplied by a factor r (with 1 < r < N) and divided equally among the N players, irrespective of their contribution. The selfish strategy is obviously to invest nothing, because only a fraction r/N < 1 of each contribution returns to the donor. Nevertheless, a sizeable proportion of players invest a substantial amount. This economically productive tendency is further enhanced if the players, after the game, are allowed to impose fines on their co-players. These fines must be paid to the experimenter, not to the punisher. In fact, imposing a fine costs a certain fee to the punisher (which also goes to the experimenter). Punishing is therefore an unselfish activity. Nevertheless, even in the absence of future interactions, many players are ready to punish free riders, and this behavior has the obvious effect of increasing the contributions to the common pool (refs. 3, 4, 7–9; for the role of punishment in animal societies, see ref. 10).
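To make the free-rider logic concrete, here is a minimal sketch (our own illustration, not part of the experiments described above; the function name and numbers are ours) of the payoff computation:

```python
# Illustrative sketch: payoffs in a one-shot Public Goods Game with N
# players and multiplication factor r (1 < r < N). Names/values are ours.
def public_goods_payoffs(contributions, r):
    """Each player receives an equal share of the multiplied pool,
    minus whatever they contributed themselves."""
    pool = sum(contributions) * r
    share = pool / len(contributions)
    return [share - c for c in contributions]

# Four players, r = 3: each invested unit returns only r/N = 0.75 to the
# donor, so the free rider (contribution 0) always ends up ahead.
payoffs = public_goods_payoffs([10, 10, 10, 0], r=3)  # [12.5, 12.5, 12.5, 22.5]
```

Whatever the others do, withholding one's contribution raises one's own payoff, which is why defection dominates.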
Simple as they are, both games have a large number of possible strategies. For the Ultimatum Game, these consist in the amount offered (when proposer) or the aspiration level (when responder); any amount below the aspiration level is rejected. For the Public Goods Game with Punishment, the strategies are defined by the size of the contribution and the fines meted out to the co-players. To achieve a better theoretical understanding, it is useful to reduce these simple games even further and to consider minigames with binary options only. In doing this, we are following a distinguished line of predecessors (5, 11, 12). We shall then use the results from Gaunersdorfer et al. (13) (see also refs. 14, 15) to analyze these games by studying the corresponding replicator dynamics. It turns out that the Ultimatum minigame is just a special case of the Public Goods with Punishment minigame. Evolutionary game theory—like the classical theory—predicts the selfish "rational" outcome. But if an arbitrarily small reputation effect is included in the analysis, a bifurcation of the dynamics allows for an outcome that is more "social" and closer to what is actually observed in experiments.
We analyze similarly a minigame describing the Public Goods Game with Rewards (in which case the recipient of a gift has the option of returning part of it to the donor). Again, evolutionary game theory and classical theory predict the selfish outcome: no gifts and no rewards. This time, the corresponding reputation effect introduces another type of bifurcation. The outcome is more complex and less stable than in the punishment case.
It is tempting to suggest that this finding reflects why, in experiments, results obtained by including rewards are considerably less pronounced than those describing punishment (Ernst Fehr, personal communication). We concentrate in this note on the mathematical aspects of the minigames, but we argue in the discussion that reduction to a minigame is also interesting for experimenters, because the options are more clear-cut.
Public Goods with Punishment
For the minigame reflecting the Public Goods Game, we shall assume that there are only two players, and that both can send a gift g to their co-player at a cost −c to themselves, with 0 < c < g. The players have to decide simultaneously whether to send the gift to their co-player. They are effectively engaged in a Prisoner's Dilemma. We continue to call it a Public Goods Game, although the reduction to two players may affect an essential aspect of the game.
After this interaction, they are offered the opportunity to punish their co-player by imposing a fine. The fine amounts to a loss −β to the punished player, but it entails a cost −γ to the punisher. Defecting and refusing to punish is obviously the dominant solution.
If we assume that players can impose their fine conditionally, fining only those who have failed to help them, the long-term outcome will still be the same as before: no prosocial behavior emerges. Indeed, let us label with e_{1} those players who cooperate by sending a gift to their co-player and with e_{2} those who do not, i.e., who defect; similarly, let f_{1} denote those who punish defectors and f_{2} those who do not. The payoff matrix is given by

             f_{1}          f_{2}
e_{1}    (−c, g)        (−c, g)
e_{2}    (−β, −γ)       (0, 0)          [1]

Here, the first number in each entry is the payoff for the corresponding row player and the second number, for the column player.
For the minigame corresponding to the Ultimatum Game, we normalize the sum to be divided as 1 and assume that proposers have to decide between two offers only, high and low. Thus proposers have to choose between option e_{1} (high offer h) and e_{2} (low offer l) with 0 < l < h < 1. Responders are of two types, namely f_{1} (accept high offers only) and f_{2} (accept every offer). In this case, the payoff matrix is

             f_{1}           f_{2}
e_{1}    (1 − h, h)     (1 − h, h)
e_{2}    (0, 0)         (1 − l, l)      [2]
A Minicourse on Minigames
More generally, let us assume that players are in two roles, I and II, such that players in role I interact only with players in role II and vice versa. Let there be two possible options, e_{1} and e_{2}, in role I, and f_{1} and f_{2} in role II, and let the payoff matrix be

             f_{1}        f_{2}
e_{1}    (A, a)       (B, b)
e_{2}    (C, c)       (D, d)            [3]

If players find themselves in both roles, their strategies are G_{1} = e_{1}f_{1}, G_{2} = e_{2}f_{1}, G_{3} = e_{2}f_{2} and G_{4} = e_{1}f_{2}. Therefore we obtain a symmetric game, and the payoff for a player using G_{i} against a player using G_{j} is given by the (i, j) entry of the matrix

    [ A + a   A + c   B + c   B + a ]
M = [ C + a   C + c   D + c   D + a ]   [4]
    [ C + b   C + d   D + d   D + b ]
    [ A + b   A + d   B + d   B + b ]

For instance, a G_{1} player meeting a G_{3} opponent plays e_{1} against the opponent's f_{2} and obtains B and plays f_{1} against the opponent's e_{2}, which yields c. In the Public Goods with Punishment minigame, the two roles are that of potential donor and potential punisher, and both players play both roles. In the Ultimatum Game, a player plays only one role and the co-player the other, but because they find themselves with equal probability in one or the other role, we only have to multiply the previous matrix by the factor 1/2 to get the expected payoff values. We shall omit this factor in the following.
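The assembly of the symmetric matrix (Eq. 4) from the two-role bimatrix (Eq. 3) can be made explicit in a few lines. The sketch below is our own illustration; the helper name and the numeric values are ours (the punishment minigame with c = 1, g = 3, β = 4, γ = 1).

```python
# Build the symmetrized 4x4 matrix of Eq. 4 from the role bimatrix of
# Eq. 3. E[i][j] is the role-I payoff of e_{i+1} against f_{j+1};
# F[i][j] is the role-II payoff of f_{i+1} against e_{j+1}.
def symmetrize(E, F):
    # G1 = e1 f1, G2 = e2 f1, G3 = e2 f2, G4 = e1 f2
    combo = [(0, 0), (1, 0), (1, 1), (0, 1)]  # (e-index, f-index) of G_i
    return [[E[ei][combo[j][1]] + F[fi][combo[j][0]] for j in range(4)]
            for (ei, fi) in combo]

# Punishment minigame with illustrative values c=1, g=3, beta=4, gamma=1:
E = [[-1, -1], [-4, 0]]   # role I: send the gift (-c) or defect (fined: -beta)
F = [[3, -1], [3, 0]]     # role II: receive g; punishing a defector costs gamma
M = symmetrize(E, F)
# The identity m_1j + m_3j = m_2j + m_4j holds in every column j.
```

For example, M[0][2] (a G_{1} player meeting G_{3}) is E[0][1] + F[0][1] = B + c, exactly as in the worked example above.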
We turn now to the standard version of evolutionary game theory, where we consider a large population of players who are randomly matched to play the game. We denote by x_{i}(t) the frequency of strategy G_{i} at time t and assume that these frequencies change according to the success of the strategies. Thus the state x = (x_{1}, x_{2}, x_{3}, x_{4}) (with x_{i} ≥ 0 and ∑ x_{i} = 1) evolves in the unit simplex S_{4}. The average payoff for strategy G_{i} is (Mx)_{i}. We shall assume a particularly simple learning mechanism and postulate that the rate at which a G_{i} player switches to strategy G_{j} is proportional to the payoff difference (Mx)_{j} − (Mx)_{i} (and is 0 if the difference is negative). We then obtain the replicator equation (14, 16, 17),

ẋ_{i} = x_{i}((Mx)_{i} − M̄)            [5]

for i = 1, 2, 3, 4, where M̄ = ∑ x_{j}(Mx)_{j} is the average payoff in the population. It is well known that the dynamics does not change if one modifies the payoff matrix M by replacing m_{ij} by m_{ij} − m_{1j}. Thus, we can use, instead of Eq. 4, the matrix

    [ 0       0       0       0     ]
M = [ R       R       S       S     ]   [6]
    [ R + r   R + s   S + s   S + r ]
    [ r       s       s       r     ]

where R = C − A, r = b − a, S = D − B and s = d − c. Alternatively, we could have normalized the payoff matrix (Eq. 3) to

             f_{1}        f_{2}
e_{1}    (0, 0)       (0, r)
e_{2}    (R, 0)       (S, s)            [7]

The matrix M has the property that m_{1j} + m_{3j} = m_{2j} + m_{4j} for each j, so that (Mx)_{1} + (Mx)_{3} = (Mx)_{2} + (Mx)_{4} for all x. From this equality it follows that (x_{1}x_{3})/(x_{2}x_{4}) is an invariant of motion for the replicator dynamics: the value of this ratio remains unchanged along every orbit. Hence the interior of the state simplex S_{4} is foliated by the invariant surfaces W_{K} = {x ∈ S_{4}: x_{1}x_{3} = Kx_{2}x_{4}}, with 0 < K < ∞. Each such saddle-like surface is spanned by the frame G_{1} − G_{2} − G_{3} − G_{4} − G_{1} consisting of four edges of S_{4}. The orientation of the flow on these edges can easily be obtained from the previous matrix. For instance, if R = 0, then the edge G_{1}G_{2} consists of fixed points.
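The invariant of motion can be checked numerically. The following sketch (our own, with arbitrary illustrative values for R, S, r, s) integrates the replicator equation (Eq. 5) for the normalized matrix of Eq. 6 by small Euler steps and confirms that x_{1}x_{3}/(x_{2}x_{4}) stays essentially constant along the orbit.

```python
# Euler integration of the replicator equation (Eq. 5) for the normalized
# matrix of Eq. 6; the parameter values R, S, r, s are illustrative only.
def replicator_step(x, M, dt):
    p = [sum(M[i][j] * x[j] for j in range(4)) for i in range(4)]
    avg = sum(x[i] * p[i] for i in range(4))
    return [x[i] + dt * x[i] * (p[i] - avg) for i in range(4)]

R, S, r, s = -1.0, 1.0, -0.2, 0.5
M = [[0, 0, 0, 0],
     [R, R, S, S],
     [R + r, R + s, S + s, S + r],
     [r, s, s, r]]

x = [0.1, 0.2, 0.3, 0.4]
K0 = x[0] * x[2] / (x[1] * x[3])
for _ in range(2000):
    x = replicator_step(x, M, dt=0.001)
K1 = x[0] * x[2] / (x[1] * x[3])
# K1 stays close to K0 (up to discretization error):
# the orbit remains on its surface W_K.
```

The small residual difference between K0 and K1 is the Euler discretization error; for the exact flow the ratio is constant.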
If R > 0, the flow points from G_{1} towards G_{2} (G_{2} dominates G_{1} in the absence of the other strategies), and conversely from G_{2} to G_{1} if R < 0. Similarly, the orientation of the edge G_{2}G_{3} is given by the sign of s, that of G_{3}G_{4} by the sign of S, and that of G_{4}G_{1} by the sign of r.
Generically, the parameters R, S, r, and s are nonzero. Therefore we have 16 orientations of G_{1}G_{2}G_{3}G_{4}, which, by symmetry, can be reduced to 4. In Gaunersdorfer et al. (13), all possible dynamics for the generic case have been classified.
Public Goods with Punishment and Ultimatum Minigames
If we apply this to the Public Goods with Punishment minigame, we find R = c − β, S = c, r = 0, and s = γ. For the Ultimatum minigame, we get R = −(1 − h), S = h − l, r = 0, and s = l.
In fact, the Ultimatum minigame is a Public Goods minigame, with l = γ, β = 1 − l, and g = c = h − l. Intuitively, this reduction simply means that in the Ultimatum minigame, the gift consists in making the high instead of the low offer. The benefit to the recipient (i.e., the responder), h − l, is equal to the cost to the donor (i.e., the proposer). The punishment consists in refusing the offer. This costs the responder the amount l (which had been offered to him) and punishes the proposer by the amount 1 − l, which can be large if the offer has been dismal.
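This identification is easy to verify: under the stated substitution, both minigames produce the same quantities R, S, r, s and hence the same dynamics. The sketch below is our own check, with arbitrary values of h and l.

```python
# Check that the Ultimatum minigame is a Public Goods with Punishment
# minigame under the mapping l = gamma, beta = 1 - l, g = c = h - l.
def punishment_params(c, beta, gamma):
    return (c - beta, c, 0.0, gamma)      # (R, S, r, s) for punishment

def ultimatum_params(h, l):
    return (-(1 - h), h - l, 0.0, l)      # (R, S, r, s) for Ultimatum

h, l = 0.5, 0.1
pg = punishment_params(c=h - l, beta=1 - l, gamma=l)
um = ultimatum_params(h, l)
# Identical up to rounding: the two minigames share one dynamics.
```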
We can therefore concentrate on the Public Goods minigame. Note that it is nongeneric (r is zero), because the punishment option is excluded after a cooperative move (and in the Ultimatum minigame, no responder rejects the high offer).
In the interior of S_{4} (more precisely, whenever x_{2} > 0 or x_{3} > 0) we have (Mx)_{4} > (Mx)_{1}, and hence x_{4}/x_{1} is increasing. Similarly, x_{3}/x_{2} is increasing. Therefore, there is no fixed point in the interior of S_{4}. Thus the fixed points in W_{K} are the corners G_{i} and the points on the edge G_{1}G_{4}. To check which of these are Nash equilibria, it is enough to check whether they are saturated. We note that a fixed point z is said to be saturated if (Mz)_{i} ≤ M̄ for all i with z_{i} = 0. G_{3} is saturated; G_{2} is not. A point x on the edge G_{1}G_{4} is saturated whenever (Mx)_{3} ≤ [x_{1}(Mx)_{1} + (1 − x_{1})(Mx)_{4}], i.e., whenever x_{1} ≥ c/β (using (Mx)_{4} = (Mx)_{1}). The condition (Mx)_{2} ≤ M̄ reduces to the same inequality. Thus if c > β, G_{3} is the only Nash equilibrium. This case is of little interest.
From now on, we restrict our attention to the case c < β: the fine costs more than the cooperative act. We note that this inequality is always satisfied for the Ultimatum minigame and for public transportation. We denote by Q the point (c/β, 0, 0, (β − c)/β) and see that the closed segment QG_{1} consists of Nash equilibria.
In this case, R < 0, and the orientation of the edges of W_{K} is given by Fig. 1. On the edge G_{2}G_{4}, there exists another fixed point F = (0, c/(β + γ), 0, (β + γ − c)/(β + γ)). It is attracting on the edge and in the face G_{2}G_{4}G_{1} but repelling on the face G_{2}G_{4}G_{3}. Finally, there is also a fixed point on the edge G_{1}G_{3}, namely the point P = ((c + γ)/(β + γ), 0, (β − c)/(β + γ), 0). It is attracting in the face spanned by that edge and G_{2} but repelling in the face spanned by that edge and G_{4}. In the absence of other strategies, the strategies G_{1} and G_{3} are bistable. The strategy G_{1} is risk dominant (i.e., it has the larger basin of attraction) if 2c < β − γ. We note that in the special case of the Ultimatum minigame, this reduces to the condition h < 1/2.
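These fixed-point claims can be verified numerically. The following sketch is our own, with arbitrary values c = 1, β = 4, γ = 1 (satisfying c < β); it uses the normalized matrix of Eq. 6 and checks that at F the two strategies on the edge G_{2}G_{4} earn equal payoffs, and likewise for G_{1} and G_{3} at P.

```python
# Check (our own sketch) that F and P are fixed points of the punishment
# minigame: on each edge, the two strategies present earn equal payoff.
# The values c = 1, beta = 4, gamma = 1 are illustrative (c < beta).
c, beta, gamma = 1.0, 4.0, 1.0
R, S, r, s = c - beta, c, 0.0, gamma      # minigame parameters
M = [[0, 0, 0, 0],                        # normalized matrix (Eq. 6)
     [R, R, S, S],
     [R + r, R + s, S + s, S + r],
     [r, s, s, r]]

def payoffs(x):
    return [sum(M[i][j] * x[j] for j in range(4)) for i in range(4)]

F = [0.0, c / (beta + gamma), 0.0, (beta + gamma - c) / (beta + gamma)]
P = [(c + gamma) / (beta + gamma), 0.0, (beta - c) / (beta + gamma), 0.0]
pF, pP = payoffs(F), payoffs(P)
# G2 and G4 earn the same at F; G1 and G3 earn the same at P.
```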
Apart from G_{3} and the segment QG_{1}, there are no other Nash equilibria. Depending on the initial condition, orbits in the interior of S_{4} converge either to G_{3} or to a Nash equilibrium on QG_{1}. Selective forces do not act on the edge G_{1}G_{4}, because it consists of fixed points only. But the state x fluctuates along the edge by neutral drift (reflecting random shocks of the system). Random shocks will also introduce occasionally a minority of a missing strategy. If this happens while x is in QG_{1}, selection will send the state back to the edge, but a bit closer to Q (because x_{4}/x_{1} increases). Once the state has reached the segment QG_{4} and a minority of G_{3} is introduced by chance, this minority will be favored by selection and eventually become fixed in the population. Thus in spite of the segment of Nash equilibria, the asocial state G_{3} will get established in the long run. This result plays the central role in Nowak et al. (18).
Bifurcation Through Reputation
In the Ultimatum Game and the Public Goods Game, experiments are usually performed under conditions of anonymity. The players do not know each other and are not supposed to interact again. But let us now introduce a small probability that players know the reputation of their co-player and, in particular, whether the co-player has failed to punish a defector on some previous occasion. This reputation creates a temptation to defect.
Let us assume that with a probability μ, cooperators (e_{1} players) defect against nonpunishers (f_{2} players), i.e., μ is the probability that (i) the f_{2} type becomes known, and (ii) the e_{1} type decides to defect. Let us similarly assume that with a small probability ν, defectors (e_{2} players) cooperate against punishers (f_{1} players), i.e., ν is the probability that (i) the f_{1} type becomes known, and (ii) the e_{2} type decides to cooperate. The payoff matrix for this "Public Goods with Second Thoughts" minigame becomes

             f_{1}                                  f_{2}
e_{1}    (−c, g)                               (−(1 − μ)c, (1 − μ)g)
e_{2}    (−νc − (1 − ν)β, νg − (1 − ν)γ)       (0, 0)          [8]

We obtain R = (1 − ν)(c − β) < 0, S = c(1 − μ) > 0, s = γ − ν(g + γ), which is positive for small ν, and r = −gμ < 0. Thus the edge G_{1}G_{4} no longer consists of fixed points but of an orbit converging to G_{1}. This is a generic situation, and we can use the results from Gaunersdorfer et al. (13).
The fixed points in the interior of S_{4} must satisfy (Mx)_{1} = (Mx)_{2} = (Mx)_{3} = (Mx)_{4} (and, of course, x_{1} + x_{2} + x_{3} + x_{4} = 1). There exists now a line L of fixed points in the interior of S_{4}, satisfying (Mx)_{1} = (Mx)_{2}, which reduces to

x_{1} + x_{2} = S/(S − R),   x_{3} + x_{4} = R/(R − S),          [9]

and also satisfying (Mx)_{1} = (Mx)_{4}, which reduces to

x_{1} + x_{4} = s/(s − r),   x_{2} + x_{3} = r/(r − s).          [10]

This yields solutions in the simplex S_{4} if, and only if, RS < 0 and rs < 0. Both conditions are satisfied for the new minigame. It is easily verified that the line of fixed points L is given by l_{i} = m_{i} + p for i = 1, 3, and l_{i} = m_{i} − p for i = 2, 4, with p as parameter and

m = (Ss, −Sr, Rr, −Rs)/((S − R)(s − r))          [11]

(see Fig. 2). Setting ν = 0 for simplicity, this yields in our case

m = (cγ(1 − μ), cgμ(1 − μ), (β − c)gμ, (β − c)γ)/((β − cμ)(γ + gμ))          [12]

and reduces for the Ultimatum minigame to

m = ((h − l)l(1 − μ), (h − l)²μ(1 − μ), (1 − h)(h − l)μ, (1 − h)l)/k          [13]

with k = (1 − l − μ(h − l))(l + μ(h − l)). This line passes through the quadrangle G_{1}G_{2}G_{3}G_{4} and hence intersects every W_{K} in exactly one point (it intersects W_{1} in m). Because Rr > 0, this point is a saddle point for the replicator dynamics in the corresponding W_{K} (see Fig. 3). On each surface, and therefore in the whole interior of S_{4}, the dynamics is bistable, with attractors G_{1} and G_{3}. Depending on the initial condition, every orbit, with the exception of a set of measure zero, converges to one of these two attractors (see Fig. 3).
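The bistability can be reproduced numerically. The sketch below is our own illustration: it assembles the symmetric matrix from the bimatrix above with ν = 0 and illustrative values c = 1, g = 3, β = 4, γ = 1, μ = 0.2 (so that c < β and 2c < β − γ), then integrates the replicator dynamics from two starting points, one weighted toward G_{1} and one toward G_{3}.

```python
# Public Goods with Second Thoughts (nu = 0): depending on the initial
# condition, the population settles at G1 (cooperate & punish) or at G3
# (defect & do not punish). All parameter values are illustrative.
c, g, beta, gamma, mu = 1.0, 3.0, 4.0, 1.0, 0.2

# Role bimatrix [8]: E for the donor role, F for the punisher role.
E = [[-c, -(1 - mu) * c], [-beta, 0.0]]
F = [[g, -gamma], [(1 - mu) * g, 0.0]]
combo = [(0, 0), (1, 0), (1, 1), (0, 1)]  # G1=e1f1, G2=e2f1, G3=e2f2, G4=e1f2
M = [[E[ei][combo[j][1]] + F[fi][combo[j][0]] for j in range(4)]
     for (ei, fi) in combo]

def evolve(x, steps=40000, dt=0.005):
    # Plain Euler steps of the replicator equation (Eq. 5).
    for _ in range(steps):
        p = [sum(M[i][j] * x[j] for j in range(4)) for i in range(4)]
        avg = sum(x[i] * p[i] for i in range(4))
        x = [x[i] + dt * x[i] * (p[i] - avg) for i in range(4)]
    return x

near_G1 = evolve([0.7, 0.1, 0.1, 0.1])   # ends near the social regime G1
near_G3 = evolve([0.1, 0.1, 0.7, 0.1])   # ends near the asocial regime G3
```

The two chosen starting points lie, for these parameter values, on opposite sides of the saddle on their surface W_{K}, so the orbits separate toward the two attractors.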
For μ → 0, the point m, and consequently all interior fixed points (which are all Nash equilibria), converge to the point Q. At μ = 0, we observe a highly degenerate bifurcation. The (very short) segment of fixed points is suddenly replaced by a transversal line of fixed points, namely the edge G_{1}G_{4}, of which one segment, namely QG_{1}, consists of Nash equilibria.
Thus, introducing an arbitrarily small perturbation μ (which is proportional to the probability of having information about the other player's punishing behavior) changes the long-term state of the population. Instead of converging in the long run to the asocial regime G_{3} (defect, do not punish), the dynamics has now two attractors, namely G_{3} and the social regime G_{1} (cooperate, punish defectors). For small μ and ν, this new attractor is even risk-dominant (in the sense that it has the larger basin of attraction on the edge G_{1}G_{3}) provided 2c < β − γ, which for the Ultimatum case reduces to h < 1/2. One can argue that, in this case, random shocks (or diffusion) will favor the social regime.
If μ = 1, i.e., if there is full knowledge about the type of the co-player, we obtain S = 0. This case yields in some way the mirror image of the case μ = 0. G_{3}G_{4} is now the fixed point edge; the points on Q̂G_{3} are Nash (with Q̂ = (0, 0, g/(g + γ), γ/(g + γ)) if we assume additionally that ν = 0), and fluctuations send the state ultimately to the unique other Nash equilibrium, namely G_{1}, the social regime.
Reward and Reputation
Let us now consider another minigame, a variant of Public Goods with Second Thoughts, where reward replaces punishment. More precisely, two players are simultaneously asked whether they want to send a gift to the co-player (as before, the benefit to the recipient is g, and the cost to the donor −c). Subsequently, recipients have the possibility to return a part of their gift to the donor. We assume that this costs them −γ and yields β to the co-player (if γ = β, this is simply a payback). We assume 0 < c < β and 0 < γ < g. We label the players who reward their donor with f_{1} and those who don't with f_{2}. We shall assume that with a small likelihood μ, cooperators defect if they know that the other player is not going to reward them, i.e., μ is the probability that (i) the f_{2} type becomes known, and (ii) the e_{1} type decides to defect. Similarly, we denote by ν the small likelihood that defectors cooperate if they know that they will be rewarded. (ν is the probability that (i) the f_{1} type becomes known and (ii) the e_{2} type reacts accordingly.) We obtain the payoff matrix

             f_{1}                         f_{2}
e_{1}    (β − c, g − γ)               (−(1 − μ)c, (1 − μ)g)
e_{2}    (ν(β − c), ν(g − γ))         (0, 0)          [14]

Now R = (c − β)(1 − ν) < 0, S = c(1 − μ) > 0, r = γ − gμ, which is positive if μ is small, and s = (γ − g)ν, which is negative.
If ν = 0 (no clue that the coplayer rewards), then G_{2}G_{3} consists of fixed points. As before, we see that the saturated fixed points (i.e., the Nash equilibria) on this edge form the segment QG_{3} (with Q = (0, c/β, (β − c)/β, 0) if μ is also 0). But now, the flow along the edges leads from G_{2} to G_{1}, from there to G_{4}, and from there to G_{3}. All orbits in the interior have their α limit on G_{2}Q and their ω limit on QG_{3}. If a small random shock sends a state from the segment G_{2}Q towards the interior, the replicator dynamics first amplifies the frequencies of the new strategies but then eliminates them again, leading to a state on QG_{3}. If a small random shock sends a state from the segment QG_{3} towards the interior, the replicator dynamics sends it directly back to a state that is closer to G_{3}. Eventually, with a sufficient number of random shocks, almost all orbits end up close to G_{3}, the asocial state (see Fig. 4).
For ν > 0, the flow on the edge G_{2}G_{3} leads towards G_{3}, so that the frame spanning the saddle-type surfaces W_{K} is cyclically oriented (see Fig. 5). As before, there exists now a line L of fixed points in the interior of S_{4}. The surface W_{1} consists of periodic orbits. If Δ := (β − γ)(1 − ν) + (g − c)(μ − ν) is negative, all nonequilibrium orbits on W_{K} with 0 < K < 1 spiral away from this line of fixed points. On W_{K}, they spiral towards the heteroclinic cycle G_{1}G_{2}G_{3}G_{4}. All nonequilibrium orbits in W_{K} with K > 1 spiral away from that heteroclinic cycle and towards the line of fixed points. If Δ is positive, the converse holds. If Δ = 0 (for instance, if β = γ and μ = ν), then all orbits off the edges and the line L of fixed points are periodic. For ν → 0, we obtain again a highly degenerate bifurcation replacing a one-dimensional continuum of fixed points (which shrinks towards Q as ν decreases) by another, namely the edge G_{2}G_{3}.
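The sign pattern of R, S, r, s and of Δ defined above can be tabulated in a short sketch (our own; the parameter values are arbitrary but respect 0 < c < β and 0 < γ < g):

```python
# Sign pattern (our own sketch) of the reward minigame parameters.
def reward_params(c, g, beta, gamma, mu, nu):
    R = (c - beta) * (1 - nu)
    S = c * (1 - mu)
    r = gamma - g * mu          # positive for small mu
    s = (gamma - g) * nu        # negative
    delta = (beta - gamma) * (1 - nu) + (g - c) * (mu - nu)
    return R, S, r, s, delta

R, S, r, s, delta = reward_params(c=1, g=3, beta=2, gamma=1, mu=0.1, nu=0.05)
# R < 0, S > 0, r > 0, s < 0: the frame G1-G2-G3-G4 is cyclically
# oriented, and the sign of delta decides the direction of the spiraling.
```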
We stress the highly unpredictable dynamics if ν > 0 and Δ ≠ 0. For one-half of the initial conditions, the replicator dynamics sends the state towards the line L of fixed points. But there, random fluctuations will eventually lead to the other half of the simplex, where the replicator dynamics leads towards the heteroclinic cycle G_{1}G_{2}G_{3}G_{4}. The population seems glued for a long time to one strategy, then suddenly switches to the next, remains there for a still longer time, etc. However, an arbitrarily small random shock will send the state back into the half-simplex where the state converges again to the line L of fixed points, etc. Not even the time average of the frequencies of strategies converges. One can say only that the most probable state of the population is either monomorphic (i.e., close to one corner of S_{4}) or else close to the attracting part of the line of fixed points (all four types present, the proportion of cooperators larger among rewarders than among nonrewarders).
In this paper, we have concentrated on the replicator dynamics. There exist other plausible game dynamics, for instance, the best reply dynamics (see, e.g., ref. 14), where it is assumed that occasionally players switch to whatever is, among all pure available strategies, the best response in the current state of the population. Berger (19) has shown that almost all orbits converge in this case to m. We note that if the values of μ and ν are small, the frequency x_{1} + x_{4} of gift-givers is small.
Discussion
In a minigame, players are in two roles with two options in each role. Such games lead to interesting dynamics on the simplex S_{4}. The edges of this simplex span a family of saddle-like surfaces that foliate S_{4}. The orientation on the edges is given by the payoff values, i.e., by the signs of R, S, r, and s. Generically, these numbers are all nonzero. But in many games (especially among those given in extensive form), there exists one option where the payoff is unaffected by the type of the other player. In the Public Goods with Punishment, this is the gift-giving option: the co-player will never punish. In the Public Goods with Reward, it is the option to withhold the gift: the co-player will never reward. In each case, one edge of S_{4} consists of fixed points, one segment of it (from a point Q up to a corner G_{i} of the edge) being made up of Nash equilibria. A small perturbation leading from a point x on QG_{i} into the interior of the simplex (i.e., introducing missing strategies) is offset by the dynamics, i.e., the new strategies are eliminated again and the state returns to QG_{i}. But in one case, the state is closer to Q than before; in the other case, it is further away. The corresponding bifurcation replaces the fixed points on that edge by a continuum of fixed points, which, in one case, are saddle points (on the invariant surface W_{K}) and in the other case have complex eigenvalues. There are two rather distinct types of long-term behavior—in one case, bistability, and in the other case, a highly complex and unpredictable oscillatory behavior.
It is obviously easy to set up experiments where the reputation of the coplayer is manipulated. In particular, our model seems to predict that in the punishment treatment, what is essential for the bifurcation is a nonzero likelihood (corresponding to the parameter μ) that the cooperator believes that she is faced with a nonpunisher. What is essential for the bifurcation to happen in the rewards treatment, in contrast, is that there is a nonzero likelihood (corresponding to the parameter ν) that the defector believes that he is faced with a rewarder.
The possibly irritating message is that for promoting cooperative behavior, punishing works much better than rewarding. In both cases, however, reputation is essential.
Acknowledgments
K.S. acknowledges support of the Wissenschaftskolleg (Vienna) Grant WK W008 Differential Equation Models in Science and Engineering; Ch.H. acknowledges support of the Swiss National Science Foundation, and M.A.N. acknowledges support from the Packard Foundation, the Leon Levy and Shelby White Initiatives Fund, the Florence Gould Foundation, the Ambrose Monell Foundation, the Alfred P. Sloan Foundation, and the National Science Foundation.
Footnotes

§ To whom reprint requests should be addressed. Email: nowak@ias.edu.

This paper was submitted directly (Track II) to the PNAS office.
 Copyright © 2001, The National Academy of Sciences
References

Wedekind, C. & Milinski, M.
Kagel, J. H. & Roth, A. E.
Henrich, J., Boyd, R., Bowles, S., Camerer, C., Fehr, E., Gintis, H. & McElreath, R.
Gintis, H.
Gale, J., Binmore, K. & Samuelson, L.
Huck, S. & Oechssler, J.
Gaunersdorfer, A., Hofbauer, J. & Sigmund, K.
Hofbauer, J. & Sigmund, K.
Cressman, R., Gaunersdorfer, A. & Wen, J. F.
Weibull, J. W.
Nowak, M. A., Page, K. M. & Sigmund, K.
Berger, U.