On the impossibility of predicting the behavior of rational agents

Edited by Richard D. McKelvey, California Institute of Technology, Pasadena, CA, and approved August 1, 2001 (received for review November 8, 2000)
Abstract
A foundational assumption in economics is that people are rational: they choose optimal plans of action given their predictions about future states of the world. In games of strategy this means that each player's strategy should be optimal given his or her prediction of the opponents' strategies. We demonstrate that there is an inherent tension between rationality and prediction when players are uncertain about their opponents' payoff functions. Specifically, there are games in which it is impossible for perfectly rational players to learn to predict the future behavior of their opponents (even approximately) no matter what learning rule they use. The reason is that in trying to predict the next-period behavior of an opponent, a rational player must take an action this period that the opponent can observe. This observation may cause the opponent to alter his next-period behavior, thus invalidating the first player's prediction. The resulting feedback loop has the property that, a positive fraction of the time, the predicted probability of some action next period differs substantially from the actual probability with which the action is going to occur. We conclude that there are strategic situations in which it is impossible in principle for perfectly rational agents to learn to predict the future behavior of other perfectly rational agents based solely on their observed actions.
Rationality vs. Predictability
Economists often assume that people are rational: they maximize their expected payoffs given their beliefs about future states of the world. This hypothesis plays a crucial role in game theory, where each player is assumed to choose an optimal strategy given his belief about the strategies of his opponents. In this setting, a belief amounts to a forecast or prediction of the opponents' future behavior, that is, of the probability with which the opponents will take various actions. The prediction is good if the forecasted probabilities are close to the actual probabilities. Together prediction and rationality justify the central solution concept of the theory. Namely, if each player correctly predicts the opponents' strategies and if each chooses an optimal strategy given his prediction, then the strategies form a Nash equilibrium of the repeated game. But under what circumstances will rational players actually learn to predict the behavior of others starting from out-of-equilibrium conditions?
In this article we show that there are very simple games of incomplete information such that players almost never learn to predict their opponents' behavior even approximately, and they almost never come close to playing a Nash equilibrium. This impossibility result and its proof build on the existing literature on learning in repeated games (1–8); for other critiques of Bayesian learning in economic environments see refs. 9–11 and 19. The present contribution demonstrates the incompatibility between rationality and prediction without placing any restrictions on the players' prior beliefs, their learning rules, or the degree to which they are forward-looking.
An Example.
We begin by illustrating the problem in a concrete case. Consider two individuals, A and B, who are playing the game of matching pennies. Simultaneously each turns a penny face up or face down. If the pennies match (both are heads or both are tails), then B buys a prize for A; if they do not match, A buys a prize for B. Assume first that the prize is one dollar and that the utility of both players is linear in money. Then the game has a unique Nash equilibrium in which each player randomizes by choosing heads (H) and tails (T) with equal probability. If both adopt this strategy, then each is optimizing given the strategy of the other. Moreover, although neither can predict the realized action of the opponent in any given period, each can predict his strategy, namely, the probabilities with which the actions will be taken. In this case no tension exists between rationality and prediction because the game has a unique equilibrium, and the players know what it is.
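The best-response reasoning behind this equilibrium can be checked with a few lines of code. This is an illustrative sketch of my own (the payoff encoding and function names are not from the paper): it verifies that against B's 50-50 mixture, every strategy of A earns the same expected payoff, so no deviation is profitable.

```python
# Illustrative check (not from the paper): the 50-50 mixture is a Nash
# equilibrium of standard matching pennies. The matrix gives the row
# player A's payoffs; B's are the negatives, since the game is zero-sum.

A_PAYOFF = [[1, -1],   # rows: A plays H or T; columns: B plays H or T
            [-1, 1]]

def expected_payoff(p_a, p_b, payoff):
    """Expected payoff to the row player when A plays H with probability
    p_a and B plays H with probability p_b."""
    probs_a = [p_a, 1 - p_a]
    probs_b = [p_b, 1 - p_b]
    return sum(probs_a[i] * probs_b[j] * payoff[i][j]
               for i in range(2) for j in range(2))

# Against B's 50-50 mixture, every strategy of A earns exactly 0, so A
# cannot gain by deviating; by symmetry the same holds for B.
payoffs = [expected_payoff(p, 0.5, A_PAYOFF) for p in (0.0, 0.5, 1.0)]
print(payoffs)  # -> [0.0, 0.0, 0.0]
```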
Now change the situation by assuming that if both players choose H, then B buys an ice cream cone for A, whereas if both choose T then B buys A a milk shake. Similarly, if A chooses H and B chooses T then A buys B a coke, whereas if the opposite occurs then A buys B a bag of chips. Assume that the game is played once each day, the players' tastes do not change from one day to the next, and they have a fixed positive utility for each of the prizes and also for money. Unlike the previous situation, this is a game of incomplete information in which neither player knows the other's payoffs.
For expositional simplicity assume first that the players are myopic, that is, they do not worry about the effect of their actions on the future course of the game. Imagine that the following sequence of actions has occurred over the first ten periods. The immediate problem for each player is to predict the intention of the opponent in period 11 and to choose an optimal response. The opponent's intention might be to play H for sure, T for sure, or to randomize with probability p for H and 1 − p for T. If the opponent's intention is to randomize, then obviously one cannot predict his realized action, but it does not seem too much to ask that one predict the approximate probability with which he intends to play each action. We claim, however, that this is essentially impossible.
To see why, let's put ourselves in A's shoes. The behavior of B suggests an alternating pattern, perhaps leading us to predict that B will play T next period. Because we are rational, we will (given our prediction) play T for sure next period. But if B is a good predictor, then she must be able to predict that with high probability we are in fact going to play T next period. This prediction by B leads her to play H next period, thus falsifying our original prediction that she is about to play T.
The point is that if either side makes a prediction that leads them to play H or T for sure, the other side must predict that they are going to do so with high probability, which means that they too will choose H or T for sure. But there is no pair of predictions such that both are approximately correct and the optimal responses are H or T for sure. It follows that, for both players to be good predictors of the opponent's next-period behavior, at least one of them must be intending to play a mixed strategy next period, and the other must predict this.
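The feedback loop just described can be simulated. In the sketch below (the frequency-based forecasting rule, the seed history, and all names are my assumptions, not the paper's model), each myopic player forecasts the opponent from the empirical frequency of past play and best-responds deterministically; the time-averaged squared gap between forecast and realized intention stays bounded away from zero.

```python
# A minimal sketch (my illustration) of the prediction feedback loop in
# matching pennies with myopic best responders. Actions: 1 = H, 0 = T.

def best_response_A(pred_b_H):
    # A wants to match: play H if B is predicted to play H with prob > 1/2.
    return 1 if pred_b_H > 0.5 else 0

def best_response_B(pred_a_H):
    # B wants to mismatch: play T if A is predicted to play H.
    return 0 if pred_a_H > 0.5 else 1

history_a, history_b = [1], [0]  # hypothetical seed history
errors = 0.0
T = 1000
for t in range(T):
    pred_a = sum(history_a) / len(history_a)  # B's forecast of P(A plays H)
    pred_b = sum(history_b) / len(history_b)  # A's forecast of P(B plays H)
    a = best_response_A(pred_b)
    b = best_response_B(pred_a)
    # Each forecast is a probability, but the realized intention is a pure
    # action (0 or 1): accumulate the squared forecast error.
    errors += (pred_b - b) ** 2 + (pred_a - a) ** 2
    history_a.append(a)
    history_b.append(b)

print(errors / T)  # stays well above zero: forecasts never become good
```

The forecasts drift toward 1/2 while the best responses remain pure, so the per-period squared error hovers near 1/4 per player rather than vanishing.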
Suppose, for example, that player B intends to play a mixed strategy in period 11. Because B is rational, she only plays a mixed strategy if she is exactly indifferent between playing H and T given her predictions about A. (If there is a slight difference in payoff between the two actions, strict rationality requires that the one with higher payoff be chosen exclusively.) Now B's predictions about A's behavior in the 11th period are based on the observed history of play in the first 10 periods. Let's say that the particular history given above leads B to predict that A will play H with a probability of 0.127. Because B intends to play mixed, it must be the case that B's expected utility from playing H or T is identical given B's utility function u_{B} for the various outcomes. In other words, it must be that 0.127u_{B}(H, H) + 0.873u_{B}(T, H) = 0.127u_{B}(H, T) + 0.873u_{B}(T, T), where the first entry of each pair is A's action and the second is B's. But there is no reason to think that B's utilities actually do satisfy this equation exactly. More precisely, let us suppose that B's utility for each outcome could be any real number within a certain interval, and that B's actual utility (B's type) is the result of a random draw from among these possible values. (The draw occurs once and for all before the game begins.) Following Jordan (2), we claim that the probability is zero that the above equation will be satisfied. The reason is that there is only a finite number of distinct predictions that B could make at this point in time, because B's prediction can only be based on A's (and B's) previous observed behavior together with B's initial beliefs. Because this argument holds for every period, the probability is zero that B will ever be indifferent. From this and the preceding argument it follows that in any given period, one or both players must be making a bad prediction. Moreover, they cannot be playing a Nash equilibrium in any given period (or even close to a Nash equilibrium), because this would require them to play mixed strategies, which means that both must be indifferent.
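The measure-zero character of exact indifference can be illustrated numerically. In this sketch (the uniform utility distribution, the sample size, and the variable names are mine), B's expected utilities from H and T, evaluated at the predicted probability 0.127, never coincide exactly across a large number of random utility draws.

```python
# Numerical illustration (mine, not from the paper): with utilities drawn
# from a continuous distribution, the exact indifference condition
#   p*u(H,H) + (1-p)*u(T,H) == p*u(H,T) + (1-p)*u(T,T)
# (first argument = A's action, second = B's) holds with probability zero.

import random

random.seed(0)
p = 0.127  # B's hypothetical prediction that A plays H
indifferent = 0
for _ in range(100_000):
    u_HH, u_TH, u_HT, u_TT = (random.uniform(0, 1) for _ in range(4))
    eu_H = p * u_HH + (1 - p) * u_TH  # B's expected utility from playing H
    eu_T = p * u_HT + (1 - p) * u_TT  # B's expected utility from playing T
    if eu_H == eu_T:
        indifferent += 1
print(indifferent)  # -> 0: exact indifference is never observed
```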
Jordan (2) was the first to employ this kind of argument to show that myopic players effectively cannot learn mixed equilibria no matter what their beliefs are. Moreover, as we have just seen, the same argument shows that at least one of them cannot learn to predict the behavior of the other. The limitation of Jordan's result is that it assumes players are completely myopic. Forwardlooking behavior allows for a richer repertoire of learning strategies and more time to detect complex patterns in the behavior of one's opponent. Nevertheless, the incompatibility between rationality and prediction continues to hold even in this case, as we shall show below.
A second closely related body of work is provided by Nachbar (6–8). He was the first to argue that there is a fundamental tension between prediction and rationality in the context of Bayesian learning even when players are forward-looking. Nachbar's critique was prompted by an earlier paper by Kalai and Lehrer (4), which laid out conditions under which Bayesian rational players would in fact be able to learn to predict the behavior of their opponents. Suppose that each player begins the game with a prior belief over the possible repeated game strategies that his opponents might use. Kalai and Lehrer show that, if these prior beliefs contain a “grain of truth,” that is, they put positive probability (however small) on the actual repeated game strategies of the opponents, then players learn to predict with probability one.
As Nachbar points out, however, the grain-of-truth condition may be very difficult to satisfy in practice. To illustrate, consider the preceding example and suppose that the players are perfectly myopic. Then the unique equilibrium of the repeated game is for A to play H with some fixed probability p* each period and for B to play H with some fixed probability q* each period. These values are not known to the players because p* depends on B's payoffs, whereas q* depends on A's payoffs. Can they be learned through Bayesian updating of a diffuse prior? Suppose that each player begins with a belief that the other is playing an i.i.d. strategy with an unknown parameter (the probability of playing H), where the beliefs have full support on the interval [0, 1]. In any given period, the players almost surely will have updated beliefs that lead them to play H or T with a probability of 1 in that period, because the expected payoffs from H and T are not exactly equal. However, their updated beliefs lead them to predict that their opponent is almost surely going to play a mixed strategy next period. Thus their predictions almost certainly are not close to their actual strategies. Furthermore, as the game proceeds, rationality causes them to play H for sure in some periods and T for sure in others. Hence their actual strategies are not i.i.d. and not in the support of their beliefs. More generally, Nachbar (6–8) argues that in games such as this it is difficult to identify any plausible family of beliefs such that the players' best response strategies are in the support of their beliefs [another paper in the same general spirit is provided by R. I. Miller and C. W. Sanchirico (12)].
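A Beta prior over the opponent's i.i.d. probability of H is a concrete instance of the diffuse belief described above; the helper name and the ten-period observation sequence below are hypothetical. The sketch shows why the posterior almost never makes the two actions exactly indifferent, so the rational reply is pure.

```python
# Sketch (my construction) of the Bayesian learner in Nachbar's critique:
# a Beta(a, b) prior over the opponent's i.i.d. probability of playing H,
# updated by counting. The posterior mean is almost never exactly 1/2, so
# the best response is a pure action, not a mixture.

def posterior_mean(heads, tails, a=1.0, b=1.0):
    """Mean of the Beta(a + heads, b + tails) posterior after observing
    `heads` plays of H and `tails` plays of T."""
    return (a + heads) / (a + heads + b + tails)

# A matches, so A plays H iff the predicted P(B plays H) exceeds 1/2.
obs_b = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]  # hypothetical first ten plays by B
p_hat = posterior_mean(sum(obs_b), len(obs_b) - sum(obs_b))
action_A = 'H' if p_hat > 0.5 else 'T'
print(p_hat, action_A)  # posterior mean 7/12 != 1/2, so A plays H for sure
```

Because A's realized strategy is pure in every period, it is not i.i.d. and hence lies outside the support of B's belief, exactly as the text argues.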
In this paper we are agnostic about whether or not the players are Bayesian and what the structure of their priors might be. Instead we show that no matter how players use the information revealed by repeated play, they will fail to learn to predict the opponents' behavior in some kinds of games.
Before turning to a precise statement of our result, we should point out that it is prediction by the players that is problematic; to an observer the average behavior of the players may exhibit empirical regularities. For example, it could be that the cumulative frequency distribution of play approaches a Nash equilibrium of the game. In fact this will be the case for fictitious play, in which each player uses the empirical distribution of the opponent's play up through a given period to predict his next-period behavior and then chooses a best response given that prediction. In games such as matching pennies, this simple learning rule induces long-run average behavior that converges to the mixed Nash equilibrium of the game (13, 14). There are other models in which a player's average behavior mimics Nash equilibrium from the observer's standpoint (15, 16); in fact Nash himself proposed such an interpretation (17). But this does not imply that individual players ever play Nash equilibrium strategies or that they learn to predict.
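Fictitious play in matching pennies is easy to simulate. In this sketch (the tie-breaking rule, initial fictitious counts, and function name are my choices), each period's action is pure, yet the long-run empirical frequency of H approaches the equilibrium value 1/2, illustrating the observer's-eye regularity described above.

```python
# Sketch (assumptions mine): fictitious play in matching pennies. Each
# player best-responds to the empirical frequency of the opponent's past
# actions. Play is pure every period, but frequencies approach 1/2.

def run_fictitious_play(T):
    count_a_H = 1  # fictitious initial count to break the first-period tie
    count_b_H = 0
    a_H_total = 0
    for t in range(1, T + 1):
        # A matches: plays H iff B's empirical frequency of H exceeds 1/2.
        a = 1 if count_b_H / t > 0.5 else 0
        # B mismatches: plays T iff A's empirical frequency of H exceeds 1/2.
        b = 0 if count_a_H / t > 0.5 else 1
        count_a_H += a
        count_b_H += b
        a_H_total += a
    return a_H_total / T  # empirical frequency of A playing H

freq = run_fictitious_play(100_000)
print(freq)  # close to 0.5, the mixed-equilibrium probability
```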
The Learning Model.
We now describe our impossibility result in detail. Consider an n-person game G with finite action space X = Π X_{i} and utility functions u_{i}: X → R. We shall assume that the payoffs take the form u_{i}(x) = u^{0}_{i}(x) + ω_{i}(x), where the u^{0}_{i}(x) are the payoffs in a benchmark game G^{0}, and the ω_{i}(x) are i.i.d. random variables drawn from a continuous density ν(ω), the support of which is the interval I_{λ} = [−λ/2, λ/2]. The parameter λ > 0 is the range of uncertainty in the payoffs. We shall assume that the distribution of payoffs is common knowledge, but the realized payoff function u_{i} is known only to player i. Errors are drawn once only before play begins, and the resulting one-shot game (called the ν-perturbation of G^{0}) is played infinitely often.
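A ν-perturbation can be sketched directly, taking ν to be the uniform density on [−λ/2, λ/2] for concreteness (the function and variable names are mine):

```python
# Sketch of the payoff perturbation described above: each payoff entry
# receives an i.i.d. uniform error on [-lam/2, lam/2], drawn once and for
# all before play begins.

import random

def perturb_game(base_payoffs, lam, rng):
    """Return a nu-perturbation of a payoff matrix: add an i.i.d.
    uniform error on [-lam/2, lam/2] to every entry."""
    return [[u + rng.uniform(-lam / 2, lam / 2) for u in row]
            for row in base_payoffs]

rng = random.Random(0)
base = [[1, -1], [-1, 1]]  # matching pennies, row player's payoffs
perturbed = perturb_game(base, lam=0.1, rng=rng)
# Every entry stays within lam/2 = 0.05 of its benchmark value.
ok = all(abs(p - u) <= 0.05
         for prow, urow in zip(perturbed, base)
         for p, u in zip(prow, urow))
print(ok)  # -> True
```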
Each player takes an action once in each time period t = 1, 2, 3, … The outcome in period t is an n-tuple of actions x^{t} ∈ X, where x^{t}_{i} is the action taken by i in period t. A state of the process at time t is a history of play up to t, that is, a sequence of outcomes h^{t} = (x^{1}, x^{2}, … , x^{t}). Let h^{0} represent the null history, H^{t} the set of all length-t histories, and H = ∪_{t} H^{t} the set of all finite histories, i.e., the set of all states. A realization of the process will be denoted by h, and the set of realizations (i.e., the set of infinite histories) will be denoted by H^{∞}. Histories are observed publicly, that is, there is perfect monitoring.
The discounted payoff to player i from a realization h = (x^{1}, x^{2}, … , x^{t}, … ) is U_{i}(h) = (1 − δ_{i})∑^{∞}_{t=1} δ^{t−1}_{i}u_{i}(x^{t}), where δ_{i} is i's discount factor, 0 ≤ δ_{i} < 1 (if δ_{i} = 0, U_{i}(h) = u_{i}(x^{1})). Let Δ_{i} denote the set of probability distributions over X_{i}. Let Δ = Π_{j}Δ_{j} denote the product set of mixtures, and let Δ_{−i} = Π_{j≠i}Δ_{j} be the product set of mixtures by i's opponents. A behavioral strategy for player i specifies a conditional probability distribution over i's actions in each period, conditional on the state at the end of the previous period. Thus we can represent i's strategy by a function q^{t}_{i} = g_{i}(h^{t−1}) ∈ Δ_{i}, where q^{t}_{i}(x_{i}) is the probability that i plays x_{i} in period t given that h^{t−1} is the state in period t − 1. This is of course a function of i's realized utility function u_{i}, but we shall not write this dependence explicitly.
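The discounted-payoff formula, with its (1 − δ) normalization, can be checked on a truncated horizon (the helper name is mine; the geometric tail beyond the truncation is negligible):

```python
# Sketch of the normalized discounted payoff U_i = (1 - delta) * sum over t
# of delta**(t-1) * u_i(x^t), truncated to a finite horizon.

def discounted_payoff(stage_payoffs, delta):
    """Normalized discounted sum of a finite stream of stage payoffs.
    With the (1 - delta) factor, a constant stream of c is worth c."""
    return (1 - delta) * sum(delta ** t * u
                             for t, u in enumerate(stage_payoffs))

# A constant stream of 1's is worth (almost exactly) 1 for any discount
# factor in [0, 1); with delta = 0 only the first-period payoff counts.
v = discounted_payoff([1.0] * 10_000, 0.9)
print(round(v, 6))                          # -> 1.0
print(discounted_payoff([5.0, 7.0], 0.0))   # -> 5.0
```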
A prior belief of player i is a probability distribution over all possible combinations of the opponent's strategies. We can decompose any such belief into onestepahead forecasts of the opponent's behavior conditional on each possible state. Thus, if h^{t−1} is the state at time t − 1, i's forecast about the behavior of her opponents in period t can be represented by a probability distribution p = f_{i}(h^{t−1}) ∈ Δ_{−i}, where p(x_{−i}) is the probability that i assigns to the others playing the combination x_{−i} in period t. The function f_{i}: H → Δ_{−i} will be called i's forecasting function. Given any vector of forecasting functions f = (f_{1}, f_{2}, … f_{n}), one for each player, there exists a set of prior beliefs such that the f_{i} describe the onestepahead forecasts of players with these beliefs [see Kalai and Lehrer (4)].
Consider the situation just after the players have been informed privately of their realized payoff functions u_{i}. Because of the independence of the draws among players, no one knows anything he did not already know about the others' payoffs, and this fact is common knowledge. This has an implication for the forecasting functions. Namely, at the beginning of each period t, i knows that j's information consists solely of the publicly observed history h^{t−1} and j's own payoff function u_{j}. Player j's behavior cannot be conditioned on information that j does not have (namely u_{−j}), and player i's forecast of j's behavior cannot be conditioned on information that i does not have (namely, u_{−i}). Thus i's forecast [f_{i}(h^{t−1})]_{j} about j's behavior in each period t does not depend on the realization of the values u_{k} for every k, including k = i, j. It follows that the functions f_{i} do not depend on the realized payoff functions u_{i}(⋅), although they may depend on ν. Another way of saying this is that the beliefs must be consistent with the players' a priori knowledge of the information structure.
Following Jordan (2), we shall say that a learning process is a pair (f, g) = (f_{1}, … , f_{n}, g_{1}, … , g_{n}), where f_{i}: H → Δ_{−i} and g_{i}: H → Δ_{i} for each player i. Given a realization of the process h, we shall denote player i's forecast in period t by p^{t}_{i}(h) = f_{i}(h^{t−1}), and i's behavioral strategy in period t by q^{t}_{i}(h) = g_{i}(h^{t−1}).
The pair (f_{i}, g_{i}) induces a probability measure on the set of all realizations H^{∞}. Similarly, for every state h^{t−1}, f_{i} and g_{i} induce a conditional probability distribution on all continuations of h^{t−1}. Denote this conditional distribution by μ_{i}(f_{i}, g_{i} | h^{t−1}). We say that individual i is rational if, for every h^{t−1}, i's conditional strategy g_{i}(⋅ | h^{t−1}) optimizes i's expected utility from time t on, given i's conditional forecast f_{i}(⋅ | h^{t−1}). (This is also known as sequential rationality.) Specifically, for every alternative choice of strategy g′_{i}(⋅ | h^{t−1}), ∫ U_{i} dμ_{i}(f_{i}, g_{i} | h^{t−1}) ≥ ∫ U_{i} dμ_{i}(f_{i}, g′_{i} | h^{t−1}).
Prediction.
Intuitively, player i learns to predict the behavior of his opponent(s) if i's forecast of their next-period behavior comes closer and closer to their actual next-period strategies. This idea may be formalized as follows. Consider a learning process (f, g), and let μ(g) denote the probability measure induced on H^{∞} by the strategies g = (g_{1}, g_{2}, … , g_{n}). We say that player i learns to predict if the mean square error of i's next-period predictions goes to zero over almost all histories of play. In other words, for μ(g)-almost all realizations h, lim_{t→∞} (1/t)∑^{t}_{s=1} ‖p^{s}_{i}(h) − q^{s}_{−i}(h)‖^{2} = 0, [1] where q^{s}_{−i}(h) = Π_{j≠i} q^{s}_{j}(h) is the opponents' actual strategy combination in period s. Similarly, we shall say that player i never learns to predict if the subset of histories for which Eq. 1 holds has μ-measure zero. Note that this condition permits players to make bad forecasts from time to time, provided they do not occur too often.
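The prediction criterion can be written as a small function. In this sketch (the data and names are hypothetical; forecasts and strategies are each summarized by the probability of H in a two-action game), a perfect predictor drives the criterion to zero, while forecasting a mixture against pure play leaves it near 1/4.

```python
# Sketch of the Eq. 1 criterion: the time-averaged squared gap between
# one-step forecasts p^s and the opponent's actual next-period mixed
# strategies q^s, here represented by probabilities of playing H.

def mean_squared_forecast_error(forecasts, strategies):
    """Average of (p^s - q^s)**2 over the observed periods."""
    t = len(forecasts)
    return sum((p - q) ** 2 for p, q in zip(forecasts, strategies)) / t

# A perfect predictor drives the criterion to zero...
print(mean_squared_forecast_error([0.5, 0.5, 0.5], [0.5, 0.5, 0.5]))  # -> 0.0
# ...whereas forecasting a mixture against pure play leaves it at 1/4.
print(mean_squared_forecast_error([0.5, 0.5, 0.5], [1.0, 0.0, 1.0]))  # -> 0.25
```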
An Impossibility Theorem.
We now demonstrate a class of repeated games such that, with probability one, some player never learns to predict his opponent's behavior, and this holds for all prior beliefs. Because our result holds for all beliefs, it must hold for beliefs that are in some sense best possible. A reasonable candidate for “best possible beliefs” is rational expectations beliefs. These have the property that, at every point in time, each player's prediction of his opponent's future behavior is conditioned correctly on the posterior distribution of payoff types revealed by play so far. Jordan (1, 3) shows that these posterior distributions converge to the set of Nash equilibria of the game (see also ref. 18). However, this does not imply that the posteriors lead to predictions that are close to being correct for a given opponent. Our result shows, in fact, that these rational expectations predictions are not close to being correct for almost all opponents.
This still leaves open the possibility that for some combinations of beliefs the players' strategies converge to Nash equilibrium even though their predictions do not. In a repeated game convergence to equilibrium can be given a variety of interpretations; we shall show that the process fails to converge to equilibrium in almost any reasonable sense. Let Q^{N} be the set of all one-period strategy tuples q ∈ Δ such that q occurs in some time period in some Nash equilibrium of the repeated game. For every q ∈ Δ let d(q, Q^{N}) be the minimum Euclidean distance between q and the compact set Q^{N}. Given a learning process (f, g) and a specific history h, if the behavioral strategies come close to Nash equilibrium on h then at a minimum we would expect the following condition to hold: lim_{t→∞} (1/t)∑^{t}_{s=1} d(q^{s}(h), Q^{N}) = 0. [2] This implies that, for every ɛ > 0, play is within ɛ of some Nash equilibrium at each point in time except possibly for a sparse set of times. We shall show that the process fails to come close to Nash in the sense that Eq. 2 fails to hold for almost all histories h.
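The time-averaged distance to the Nash set can likewise be sketched for standard matching pennies, whose unique one-period equilibrium mixture makes Q^N a single point (the example data and names are mine):

```python
# Sketch of the Eq. 2 criterion for standard matching pennies: Q^N is the
# single point where both players put probability 1/2 on H, so each
# period's strategy pair is summarized by (P(A plays H), P(B plays H)).

import math

def avg_distance_to_nash(play, q_nash=(0.5, 0.5)):
    """Time average of the Euclidean distance d(q^s, Q^N)."""
    dists = [math.dist(q, q_nash) for q in play]
    return sum(dists) / len(dists)

# Deterministic alternation stays a fixed distance sqrt(1/2) from Q^N,
# so the time average does not go to zero.
pure_cycle = [(1, 0), (0, 1), (1, 0), (0, 1)]
print(round(avg_distance_to_nash(pure_cycle), 6))  # -> 0.707107
```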
Theorem.
Let ν be a continuous density on [−λ/2, λ/2], and let G be a ν-perturbation of a finite, zero-sum, two-person game G^{0}, all of whose Nash equilibria have full support. Assume that the players are perfectly rational, have arbitrary discount factors less than unity, and that each updates his predictions of the opponent's future behavior by a learning rule that is based solely on observable actions. If λ is sufficiently small, then for ν-almost all payoff realizations, the probability is 1 that someone never learns to predict and that play fails to come close to Nash.
We remark that the set of games for which this impossibility result holds is actually much larger than the one stated in the theorem. Consider, for example, any two-person game G on a finite strategy space containing subsets Y_{1}, Y_{2} with |Y_{i}| ≥ 2 such that all Nash equilibria have full support on Y_{1} × Y_{2}, and every action not in Y_{i} is dominated strictly by some action in Y_{i}. Then the theorem holds for perturbed versions of this game. Next let us extend G to an n-person game G* by adjoining n − 2 players as follows: each new player has a strictly dominant action, and G is the two-person subgame of G* that results when they play these actions. It follows that for any finite action space X = ΠX_{i}, there exists an n-person game G* on X such that when the payoffs of G* are perturbed by small i.i.d. random errors, good prediction fails to occur with probability one.
Now consider any n-person game G on the finite strategy space X = ΠX_{i}. Suppose that we perturb the payoffs of G by i.i.d. random errors drawn from a normal distribution or in fact any distribution with a continuous density, the support of which is the whole real line. With positive probability the payoffs of the realized game will be close to the game G* constructed above. Thus as a corollary we obtain the following.
Corollary.
Let G be any finite n-person game, the payoffs of which are perturbed once by i.i.d. normally distributed random errors. Assume that the players are perfectly rational, have arbitrary discount factors less than unity, and that each updates his predictions of the opponents' future behavior by a learning rule that is based solely on observable actions. For almost all payoff realizations, there is a positive probability that someone never learns to predict and that play fails to come close to Nash.
Proof of the Theorem.
Because the proof is somewhat involved, we shall explain first why the argument given in the introduction for myopic players does not extend easily to the general case. One difficulty is that patient players might interact through conditional strategies that involve no randomization, and these might be predictable at least some of the time. Eliminating this case requires a delicate probabilistic argument. The second difficulty is that even when players randomize and are therefore indifferent among alternative strategies, this does not imply that the stage-game payoffs are solutions of a linear equation. Rather, they are the roots of a nonlinear function, and we must show that the roots of this function constitute a set of measure zero.
To increase the transparency of the proof, we shall give it for the game of matching pennies. It generalizes readily to any finite zero-sum, two-person game, the stage-game Nash equilibria of which are all strictly interior in the space of mixed strategies. Fix a continuous density ν, the support of which is [−λ/2, λ/2]. To be concrete, we may think of ν as the uniform distribution. The perturbed game has the payoff matrix

(1 + ω_{11}, −1 + ω′_{11})   (−1 + ω_{12}, 1 + ω′_{12})
(−1 + ω_{21}, 1 + ω′_{21})   (1 + ω_{22}, −1 + ω′_{22})

where rows correspond to player 1's actions (H, then T), columns to player 2's actions, and the ω_{ij}, ω′_{ij} are i.i.d. random variables distributed according to ν.
Fix two rational players, 1 and 2, with discount factors 0 ≤ δ_{1} ≤ δ_{2} < 1. Let their beliefs be f_{1}, f_{2}, and let their strategies be g_{1}(⋅ | A), g_{2}(⋅ | B), where A and B are the realized values of the players' payoff matrices. The functions f_{1}, f_{2}, g_{1}, g_{2} will be fixed throughout the proof. All probability statements will be conditional on them without writing this dependence explicitly. Let H(A, B) be the set of all histories h such that good prediction (Eq. 1) holds when the realized payoffs are (A, B). Let P be the set of pairs (A, B) such that good prediction holds with positive probability, that is, μ[H(A, B)] > 0. First we shall show that ν(P) = 0, that is, there are almost no payoff realizations (A, B) such that both players learn to predict with positive probability. In the second part of the proof we shall show that for almost all (A, B) the process fails to come close to Nash.
Lemma 1.
For every positive integer m, every 0 < ɛ′ ≤ ɛ < 1, and every (A, B) ∈ P, there exists a time T, possibly depending on m, ɛ, ɛ′, A, B, such that with μprobability at least 1 − ɛ′, each player forecasts the other's nextperiod strategy within ɛ in each of the periods T + 1, … , T + m.
Proof.
Let (A, B) ∈ P and suppose there were no such time T. Then for every time T the μ-probability would be greater than ɛ′ > 0 that at least one player misforecasts the opponent's behavior by more than ɛ in one or more of the periods T + 1, … , T + m. This would imply that Eq. 1 is violated for almost all histories, that is, μ[H(A, B)] = 0, which contradicts our assumption that (A, B) ∈ P.
Lemma 2.
For each (A, B) ∈ P there exists a time T and a history h^{T}, possibly depending on A, B, such that conditional on h^{T} each player's expected future payoffs, discounted to T + 1, are bounded above by cλ for some positive number c that depends only on the discount rates.
Proof.
Given a small λ > 0, choose m ≥ 1 such that δ^{m}_{2} ≤ λ and 0 < ɛ′ ≤ ɛ ≤ λ/(m4^{m}e^{m}). As guaranteed by Lemma 1, let h^{T} be a history such that the μ-probability is at least 1 − ɛ′ that each player forecasts the other's next-period strategy within ɛ in each of the periods T + 1, … , T + m. Let α^{*}_{T+1} and β^{*}_{T+1} be the payoffs that players 1 and 2 expect to get from period T + 1 on, discounted to period T + 1. We shall exhibit a positive constant c, depending only on the discount factors, such that α^{*}_{T+1}, β^{*}_{T+1} ≤ cλ. Note first that each player has the option of playing 50–50 in each period from T + 1 on, which has an expected discounted payoff of at least −λ/2. Because each player's strategy is optimal, it follows that α^{*}_{T+1}, β^{*}_{T+1} ≥ −λ/2.
For each j, 1 ≤ j ≤ m, let α_{T+j} be player 1's expected undiscounted payoff in period T + j as forecast by player 1 at the end of period T. Define β_{T+j} similarly for player 2. Let H_{j,h^{T}} be the set of all continuations of h^{T} to time T + j. Let φ_{1}(h^{T+j}) denote player 1's probability assessment of h^{T+j} ∈ H_{j,h^{T}}, and similarly define φ_{2}(h^{T+j}) for player 2. The true probability is μ_{0}(h^{T+j}), where μ_{0} is μ conditional on h^{T}. The set of continuations on which someone makes a bad forecast has μ_{0}-probability at most ɛ′. On the remaining good continuations, each player errs by at most ɛ in forecasting his opponent's stage-game behavior in each of j stages. Hence for every good continuation h^{T+j}, |φ_{i}(h^{T+j}) − μ_{0}(h^{T+j})| ≤ (1 + ɛ)^{j} − 1 ≤ (jɛ)e^{jɛ}.
Each player's forecasted payoff in period T + j cannot differ from the actual payoff in period T + j by more than 2 + λ no matter how bad the forecast is. There are 4^{j} continuations to period T + j, including good and bad. Over all of the good ones, player 1's forecasted expected payoff differs from his actual expected payoff by at most 4^{j}(jɛ)e^{jɛ}(2 + λ). Over all of the bad ones the two differ by at most ɛ′(2 + λ). Thus the difference between 1's forecasted expected payoff, α_{T+j}, and his actual expected payoff, ᾱ_{T+j}, is at most (ɛ′ + 4^{j}(jɛ)e^{jɛ})(2 + λ). By assumption, ɛ′ ≤ ɛ ≤ λ/(m4^{m}e^{m}) and j ≤ m, so ɛ′ + 4^{j}(jɛ)e^{jɛ} ≤ ɛ + 4^{m}(mɛ)e^{mɛ} ≤ 2λ. Thus |α_{T+j} − ᾱ_{T+j}| ≤ 2λ(2 + λ) ≤ 6λ for small λ. Similarly |β_{T+j} − β̄_{T+j}| ≤ 6λ. Because the stage game is zero-sum up to the perturbations, the actual payoffs satisfy ᾱ_{T+j} + β̄_{T+j} ≤ λ, from which we conclude that α_{T+j} + β_{T+j} ≤ 13λ for 1 ≤ j ≤ m.
For each j, 1 ≤ j ≤ m, let α^{*}_{T+j} = (1 − δ_{1})(α_{T+j} + δ_{1}α_{T+j+1} + δ^{2}_{1}α_{T+j+2} + … ) be player 1's expected payoff from period T + j on, discounted to period T + j, as forecast at the end of period T. Similarly define β^{*}_{T+j} = (1 − δ_{2})(β_{T+j} + δ_{2}β_{T+j+1} + δ^{2}_{2}β_{T+j+2} + … ). We claim that α^{*}_{T+j}, β^{*}_{T+j} ≥ −λ/2 for every j. If not, some player could switch his strategy to a 50–50 random mixture from period T + j on, thus increasing his expected payoff from that time on, which would contradict sequential rationality.
Beyond period T + m, the forecasts may no longer be good within ɛ. However, neither player expects to get more than 1 + λ/2 in any period, thus the expected payoffs beyond period T + m, discounted to period T + 1, contribute at most δ^{m}_{2}(1 + λ/2) for either player. By choice of m, δ^{m}_{2} ≤ λ, thus this contribution is at most 2λ when λ is small. Putting this fact together with α_{T+j} + β_{T+j} ≤ 13λ, it follows that β^{*}_{T+1} + (1 − δ_{2})∑^{m}_{j=1} δ^{j−1}_{2}α_{T+j} ≤ 15λ. [3] The term ∑^{m}_{j=1} δ^{j−1}_{2}α_{T+j} is similar in form to α^{*}_{T+1} except that the wrong discount factor is being used, and the sum is truncated. Nevertheless, we claim that if α^{*}_{T+1} is small, then so is the term in question. To see this, consider the identity α^{*}_{T+j} = δ_{1}α^{*}_{T+j+1} + (1 − δ_{1})α_{T+j}, which holds for all j. From this we obtain (1 − δ_{1})∑^{m}_{j=1} δ^{j−1}_{2}α_{T+j} = ∑^{m}_{j=1} δ^{j−1}_{2}(α^{*}_{T+j} − δ_{1}α^{*}_{T+j+1}), and after rearranging terms, ∑^{m}_{j=1} δ^{j−1}_{2}α_{T+j} = [α^{*}_{T+1} + (δ_{2} − δ_{1})∑^{m}_{j=2} δ^{j−2}_{2}α^{*}_{T+j} − δ_{1}δ^{m−1}_{2}α^{*}_{T+m+1}]/(1 − δ_{1}). [4] All of the terms α^{*}_{T+2}, … , α^{*}_{T+m} are at least −λ/2, the term α^{*}_{T+m+1} is at most 1 + λ/2, and δ_{1}δ^{m−1}_{2} ≤ δ^{m}_{2} ≤ λ. Thus, the right-hand side of Eq. 4 is bounded below by α^{*}_{T+1}/(1 − δ_{1}) − c′λ, where c′ > 0 depends only on the discount factors. The left-hand side of Eq. 4 is the summation on the right-hand side of Eq. 3 (up to the factor 1 − δ_{2}). Substituting this expression into Eq. 3 we see that β^{*}_{T+1} + [(1 − δ_{2})/(1 − δ_{1})]α^{*}_{T+1} ≤ 15λ + (1 − δ_{2})c′λ. Because α^{*}_{T+1}, β^{*}_{T+1} ≥ −λ/2, we conclude that both α^{*}_{T+1} and β^{*}_{T+1} are bounded above by cλ for some c that depends only on the discount factors δ_{1} and δ_{2}. This concludes the proof of Lemma 2.
Lemma 3.
For every positive integer m and all sufficiently small λ > 0, if (A, B) ∈ P, then there exists a history h^{T} such that, conditional on h^{T} at time T, both players randomize in each of the periods T + 1, … , T + m.
Proof.
As in the proof of Lemma 2, choose m ≥ 1 such that δ^{m} ≤ λ, and let 0 < ɛ ≤ λ/(m4^{m}e^{m}). Let ɛ′ = ɛ^{4m}. Now apply Lemma 1 with 2m instead of m: there is a time T such that, with probability at least 1 − ɛ′, the next-period forecasts are within ɛ of being correct for the periods T + 1, … , T + 2m.
For each h^{T+j}, 0 ≤ j ≤ 2m − 1, say that h^{T+j} is good if both players' next-period forecasts are within ɛ of being correct; otherwise h^{T+j} is bad. Say that h^{T+j} is γ-good if it is good and, conditional on h^{T+j} occurring in period T + j, the probability is at most γ that someone makes a bad next-period forecast in any continuation of h^{T+j} through period T + 2m − 1.
By choice of T there is at least one state, h^{T}, that has positive probability under the strategies and is ɛ′-good. Lemma 2 implies that the expected discounted payoffs from T + 1 on are bounded above by cλ. We claim this implies that both players randomize in period T + 1, and in fact each of them chooses each action with probability at least ɛ. Suppose, to the contrary, that some player (say player 1) chooses action 1 with probability less than ɛ. Because h^{T} is good, player 2 forecasts that player 1 will play action 2 with probability at least 1 − 2ɛ. But then player 2 could obtain a higher expected payoff by mismatching (playing action 1) in period T + 1 and randomizing fifty-fifty in every period thereafter. (The expected payoff from this strategy is at least (1 − δ_{2})(1 − 4ɛ) − λ/2, which is greater than cλ for all sufficiently small λ and ɛ ≤ λ.) This contradiction shows that player 1 chooses each action in period T + 1 with probability at least ɛ, and the same holds for player 2.
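The parenthetical payoff comparison is simple arithmetic; the sketch below checks it for a few values (the discount factor δ_{2} and the constant c are illustrative assumptions — the proof only needs some c depending on the discount factors):

```python
# Illustrative arithmetic for the deviation in the proof of Lemma 3: if the
# opponent's action is forecast with probability >= 1 - 2*eps, mismatching
# once and then randomizing 50-50 yields at least (1 - d2)*(1 - 4*eps) - lam/2,
# which exceeds c*lam once lam is small (with eps <= lam).
d2 = 0.9          # illustrative discount factor delta_2
c = 5.0           # illustrative stand-in for the constant c of Lemma 2
for lam in [0.01, 0.001, 0.0001]:
    eps = lam     # eps <= lam, as in the proof
    deviation = (1 - d2) * (1 - 4 * eps) - lam / 2
    assert deviation > c * lam
print("deviation bound verified")
```

The point of the check: the deviation payoff stays near the constant (1 − δ_{2}), while the equilibrium bound cλ vanishes with λ, so the inequality holds for all sufficiently small λ.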
It follows that each of the four possible continuations of h^{T} to period T + 1 has probability at least ɛ^{2}. Because ɛ^{2} > ɛ′ and h^{T} is ɛ′-good, none of these four continuations can be bad, and in fact each of them must be at least (ɛ′/ɛ^{2})-good. Now apply Lemma 2 again (redefining ɛ′ to be ɛ^{4m}/ɛ^{2}) and conclude that, for every continuation of h^{T} to some h^{T+2}, the conditional expected payoffs from period T + 2 forward are bounded above by cλ. As before, we conclude that both players randomize in period T + 2, each putting probability at least ɛ on each action. Continuing in this manner, we deduce that both players randomize in every continuation of h^{T} through period T + m. This concludes the proof of Lemma 3.
The gist of the proof thus far is that, if the payoff realizations (A, B) lead to good predictions with μ-positive probability, then for every sufficiently large positive integer m, there exists a state h^{T} that induces randomization by both players in each of the next m periods. We now show that this implies that the payoffs are zeroes of a function whose set of zeroes has ν-measure zero. This will show that good prediction occurs with ν-measure zero.
Let h^{T} be any state, and let m be a positive integer. Suppose that player 1 plays action 1 in each of the periods T + 1 to T + m, after which he plays an optimal strategy given his beliefs. We can write his expected utility, discounted to time T + 1, as a function of his payoff matrix A as follows:

U_{1}(A) = (1 − δ_{1}^{m})[θ_{1}a_{11} + (1 − θ_{1})a_{12}] + R_{1}(A).

Here θ_{1} comes from player 2's randomization between actions 1 and 2, and the remainder term R_{1}(A) is convex and bounded. In fact, R_{1}(A) ≤ δ_{1}^{m}(|a_{11}| + |a_{12}| + |a_{21}| + |a_{22}|). Similarly define U_{2}(A) to be player 1's expected utility from playing action 2 for m periods and an optimal strategy thereafter. This can be written analogously to U_{1}(A), with a remainder function R_{2}(A) that satisfies the same bound as R_{1}(A). All of these functions depend, of course, on h^{T}.
It will be convenient to consider a one-dimensional subspace of the payoff matrices A. Namely, for every four real numbers w, x, y, and z, let ψ_{x, y, z}(w) be the 2 × 2 matrix with entries a_{11} = w + x, a_{12} = w − x, a_{21} = y, and a_{22} = z. Given x, y, and z, define the following function: F_{x, y, z}(w) = U_{1}(ψ_{x, y, z}(w)) − U_{2}(ψ_{x, y, z}(w)). Abbreviating ψ_{x, y, z}(w) by ψ(w), we can write this in the form

F_{x, y, z}(w) = (1 − δ_{1}^{m})w + K_{x, y, z} + R_{1}(ψ(w)) − R_{2}(ψ(w)),

where K_{x, y, z} is a linear function of x, y, z and does not depend on w. The functions R_{i}(ψ(w)) are convex and bounded by the same bound as before. By choosing m to be sufficiently large, we can ensure that F_{x, y, z}(w) is strictly monotone increasing in w. It follows that for any triple x, y, z, there is at most one value of w such that F_{x, y, z}(w) = 0. Because ν has a continuous density, the conditional distribution of w given x, y, z is atomless, so P[{w: F_{x, y, z}(w) = 0} | x, y, z] = 0. By the smoothing theorem (i.e., the law of iterated expectations), it follows that P[{(w, x, y, z) ∈ R^{4}: F_{x, y, z}(w) = 0}] = 0.
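The monotonicity argument can be illustrated numerically. In the sketch below, all numerical values, the constant K, and the remainder function are illustrative stand-ins (assuming a leading term in w with slope 1 − δ_{1}^{m} and a remainder whose slope in w is at most 2δ_{1}^{m}): once m is large the function is strictly increasing, so bisection locates its single zero.

```python
# Illustrative sketch: F(w) = (1 - d1**m)*w + K + r(w), with the remainder's
# slope in w bounded by 2*d1**m, is strictly increasing once
# (1 - d1**m) > 2*d1**m, and hence has at most one zero.
import math

d1, m = 0.9, 50                           # illustrative discount factor, horizon
K = -0.3                                  # illustrative stand-in for K_{x,y,z}
slope_r = 2 * d1 ** m                     # assumed bound on the remainder's slope

def r(w):
    return slope_r * math.sin(w)          # any remainder obeying that slope bound

def F(w):
    return (1 - d1 ** m) * w + K + r(w)

# strictly increasing on a grid, since (1 - d1**m) > slope_r for this m
ws = [i / 100 for i in range(-500, 501)]
assert all(F(a) < F(b) for a, b in zip(ws, ws[1:]))

# bisection finds the single zero of F
lo, hi = -5.0, 5.0
for _ in range(80):
    mid = (lo + hi) / 2
    lo, hi = (mid, hi) if F(mid) < 0 else (lo, mid)
assert abs(F((lo + hi) / 2)) < 1e-6
print("unique zero located")
```

A strictly increasing function crosses zero at most once, which is exactly the property that makes the indifference set {w: F(w) = 0} a single point and hence of measure zero under an atomless distribution.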
To state this in terms of the matrix A, let G(A) = F_{x, y, z}((a_{11} + a_{12})/2), where x = (a_{11} − a_{12})/2, y = a_{21}, and z = a_{22}. The preceding implies that P[{A: G(A) = 0}] = 0. Recalling that F (and thus G) are conditional on a particular history h^{T}, we can write this as P[{A: G(A) = 0} | h^{T}] = 0. Hence Σ_{h^{T}} P[{A: G(A) = 0} | h^{T}] P(h^{T}) = 0. In other words, player 1 is indifferent between actions 1 and 2 only on a set of payoff matrices A having ν-measure zero.
Suppose now that (A, B) is a pair for which good prediction holds. Let h^{T} be a history as guaranteed by Lemma 3, where m is sufficiently large that F is strictly monotone increasing in w. By Lemma 3, player 1 randomizes in each of the periods T + 1, … , T + m. Hence he is indifferent between playing action 1 and action 2 in each of these periods, an event that has ν-measure zero. We conclude that there are ν-almost no payoff realizations (A, B) such that both players learn to predict with positive probability. This establishes the first claim of the theorem. We also note for future reference that we have actually established the following fact.
Lemma 4.
If m is large enough and λ is small enough, then the ν-probability is zero that there exists a state h^{T} such that, conditional on h^{T} at time T, both players randomize in each of the periods T + 1, … , T + m.
It remains to be shown that, for ν-almost all (A, B), play fails to come close to the set of Nash equilibria, in the sense that condition 2 fails to hold for almost all histories h. The first step is to show that, provided λ is sufficiently small, every Nash equilibrium of the repeated game is sufficiently mixed in every time period.
Lemma 5.
There exist ɛ > 0 and λ′ > 0 such that whenever 0 < λ ≤ λ′, every Nash equilibrium of the repeated game puts probability at least 2ɛ on each action in every time period.
The proof is similar to that of Lemma 2; in outline it runs as follows. In equilibrium, each player's expected discounted payoff must be at least −λ/2, because at least this much is guaranteed by randomizing fifty-fifty in every period. Because the actual payoffs in each period sum to λ or less, each player's expected discounted payoff can be bounded from above by kλ, where k is a positive constant. If some player were to play an action with probability less than 2ɛ in some period t, the opponent could take a pure action with expected payoff at least 1 − 4ɛ − λ/2 in period t and get at least −λ/2 in every period thereafter. When ɛ and λ are sufficiently small, the expected discounted payoff from such a deviation exceeds kλ, a contradiction.
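The deviation arithmetic in this outline can be checked directly (the discount factor and the constant k below are illustrative assumptions, as is the simple one-period-plus-guarantee decomposition of the deviation payoff):

```python
# Illustrative arithmetic for Lemma 5's deviation: a pure best reply worth at
# least 1 - 4*eps - lam/2 in period t, followed by the -lam/2 guarantee
# forever after, discounts to a value exceeding k*lam for small eps and lam.
delta = 0.95      # illustrative discount factor
k = 3.0           # illustrative stand-in for the constant k
for lam in [0.001, 0.0001]:
    for eps in [lam, lam / 10]:
        dev = (1 - delta) * (1 - 4 * eps - lam / 2) + delta * (-lam / 2)
        assert dev > k * lam
print("Lemma 5 deviation bound verified")
```

As in Lemma 3, the deviation payoff is bounded away from zero while the equilibrium bound kλ shrinks with λ, which is what forces the contradiction.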
Fix λ ∈ (0, λ′]. For each pair of payoff matrices (A, B), let N(A, B) be the set of all histories h such that condition 2 holds, i.e., such that play comes close to Nash in a weak sense. We are going to show that there are μ-almost no such histories for ν-almost all (A, B). This is a consequence of the following.
Lemma 6.
Let (A, B) be a pair of payoff matrices such that 2 holds with μ-positive probability. Then for every positive integer m and every sufficiently small λ, there exists a state h^{T} such that, conditional on h^{T}, each player randomizes in each of the periods T + 1, … , T + m.
By choosing m large enough, it follows from Lemma 4 that there are ν-almost no payoff realizations (A, B) with this property. In other words, for ν-almost all payoff realizations, play fails to come close to Nash. Thus, once we establish Lemma 6, we will have completed the proof of the theorem.
Proof of Lemma 6.
Fix a pair (A, B) such that condition 2 holds with μ-positive probability. Choose ɛ and λ such that every element of Q^{N} puts probability at least 2ɛ on each action in each time period, as guaranteed by Lemma 5. Let m be a positive integer, and let ɛ′ = ɛ^{2m}. Let q^{t+1}(h^{t}) denote the strategies in period t + 1 given the history h^{t} through period t. There exists a time T such that, with μ-probability at least 1 − ɛ′, d(q^{t+1}(h^{t}), Q^{N}) ≤ ɛ for every t in the interval T ≤ t ≤ T + m − 1. (If this were not so, condition 2 would hold with μ-probability zero, contrary to our assumption.)
Say that a history h^{t+1} is good if d(q^{t+1}(h^{t}), Q^{N}) ≤ ɛ. It is very good if it is good and all of its successors for the next m periods are good. If a history is good, then each action is played in the next period with probability at least ɛ. Hence every continuation of a good history occurs with probability at least ɛ^{2}. If no history at time T is very good, then the μ-probability of a bad history occurring in the interval T, T + 1, … , T + m − 1 is at least ɛ^{2m−2} > ɛ′, contrary to our assumption. Hence there exists h^{T} such that d(q^{t+1}(h^{t}), Q^{N}) ≤ ɛ for every continuation of h^{T} in the interval T ≤ t ≤ T + m − 1, and hence both players randomize for m periods in succession. By Lemma 4 this happens with ν-probability zero. This concludes the proof of Lemma 6, and thereby the proof of the theorem.
Acknowledgments
We thank the editor, referees, and members of the Santa Fe Institute and the Brookings-Johns Hopkins Center on Social and Economic Dynamics for constructive comments on an earlier draft. This research was supported in part by National Science Foundation Grant SBR 9601743.
Footnotes

§ To whom reprint requests should be addressed. E-mail: pyoung@jhu.edu.

This paper was submitted directly (Track II) to the PNAS office.
Abbreviations: H, heads; T, tails; i.i.d., independent and identically distributed.
 Received November 8, 2000.
 Copyright © 2001, The National Academy of Sciences