Manipulability of comparative tests
- aDepartment of Economics, Northwestern University, 2001 Sheridan Road, Evanston, IL 60208;
- bDepartment of Economics, University of Pennsylvania, 3718 Locust Walk, Philadelphia, PA 19104; and
- cDepartment of Managerial Economics and Decision Sciences, Kellogg School of Management, Northwestern University, 2001 Sheridan Road, Evanston, IL 60208
Edited by Avinash K. Dixit, Princeton University, Princeton, NJ, and approved January 30, 2009 (received for review December 17, 2008)
Abstract
Multiple self-proclaimed experts claim that they know the probabilities of future events. A tester does not know the odds of future events and she also does not know whether, among the multiple experts, there are some who do know the relevant probabilities. So the tester requires each expert to announce, before any data are observed, the probabilities of all future events. A test either rejects or does not reject each expert based on the observed data and the profile of the probabilities announced by the experts. We assume that the test controls for the type I error of rejecting the true probabilities. However, consider the case in which all experts are uninformed (i.e., they do not know anything about true probabilities). We show that they can still independently produce false forecasts that are all likely to pass the test, no matter how the data evolve in the future. Hence, the data may not suffice to effectively discredit uninformed, but strategic, experts.
Forecasting plays a vital role in human activity. Consumers, managers, and politicians make their decisions in part based on their anticipation of future events. In economics, it is often assumed that agents decide as if the relevant probabilities are known. In actuality, the probabilities of key variables such as inflation rates, stock indexes, or election outcomes are difficult to determine. The complexities in properly anticipating future events may encourage decision makers to seek experts' advice. The main difficulty is, however, that professional forecasts may not be reliable. If an expert is informed (i.e., he knows the relevant odds), then he can reveal the relevant probabilities to decision makers. However, if an expert is uninformed (i.e., he knows nothing about the relevant odds), then he may mislead the decision makers. A fundamental question is therefore how to determine whether experts are informed.
One way to evaluate experts' predictive abilities is to compare the forecasts of their competing theories with the observed data. Some authors (see, for example, refs. 1 and 2) compare the performance of weather forecasts coming from different sources. Examples like this abound. The idea that data can be used to compare theories is commonplace in scientific and management practice. This motivates a search for tests comparing the forecasts of self-proclaimed experts (and the observed data) to determine whether there are any informed experts among them. One concern, which can be traced back at least to ref. 3, is that if experts are tested, they may misrepresent what they know to sustain a false reputation of knowledge of the relevant odds.
Consider the following setup. In each period, nature selects an outcome, which can be either 1 or 0, with a probability that may change over time. We refer to the function mapping past histories of data to nature's probability of 1 next period as the true theory. An arbitrary function mapping past data to a probability of 1 is just called a theory. A tester does not know the true theory, but two experts claim that they do. The tester also does not know whether any of the experts is informed about the relevant probabilities and so she tests them. A test takes, as input, the two experts' theories and data and returns, as output, a verdict for each theory that can be either “reject” or “not reject.”
A test defines the following contract. An expert who accepts the contract must deliver a theory before any data are observed. He obtains a positive payoff (e.g., a compensation for his service) at period zero; but if his theory is rejected, then he incurs a negative payoff (e.g., a loss in terms of his professional reputation). An expert who rejects the contract receives zero payoff.
Suppose that a test controls for the type I error of rejecting the true theory, and a contract is based on such a test. Then, an informed expert who announces the true theory is unlikely to incur the negative payoff of his theory being rejected. So informed experts accept the contract. Therefore, if an expert refuses the contract, he then reveals to the tester that he is uninformed. Our main result shows, however, that all uninformed experts who know nothing about the relevant odds will also accept the contract and thereby sustain a false reputation of knowledge.† At the heart of our argument is the demonstration that if the test is likely to pass the true theory, then uninformed experts can strategically produce potentially false theories that are also likely to pass the test, no matter how the data evolve in the future. This result shows a difficulty in determining whether, among several experts, there are some experts who are informed about the relevant probabilities. The true theory cannot be inferred from data, and the data may not suffice to effectively discredit potentially uninformed, but strategic, experts.
Our model can be extended in different ways. For example, in addition to reputational concerns, the experts may also be ideologues who advocate particular theories; they therefore receive additional payoffs if certain theories are announced. Our substantive point still holds: if informed experts must accept a contract, then uninformed experts also accept this contract.
This article is organized as follows: In sections 2 and 3 we present our concepts. Section 4 follows with the results. An example in section 5 shows the difficulties that uninformed experts must overcome to pass some tests. Section 6 concludes. The proofs are in supporting information (SI) Appendix.
1. Literature Review
A number of papers show impossibility results on the testing of a single, potentially strategic expert (see refs. 4–9, 12, 18, 19). The assumption of a single expert leaves open the possibility that the result is an artifact of a single theory being tested. In scientific and management practice, data are often used to compare competing theories. Testing multiple experts is seemingly quite different from testing a single expert. Because different uninformed experts who act independently may produce different theories, the test may determine, depending on the data, that one theory outperformed the other, leading to the rejection of at least one theory.‡ Uninformed experts must overcome this hurdle if they are all to sustain a false reputation of knowledge of the relevant odds. Earlier results suggest a fundamental difference between testing a single expert and testing multiple experts. In ref. 11, the authors provide a test for multiple experts that may reject uninformed experts. However, this test also rejects some theories out-of-hand. Thus, the set of theories rejected without being tested may contain the true theory. So the test in ref. 11 may not control for the type I error of rejecting the true theory. Our result demonstrates that the manipulability of tests depends crucially on whether some theories are rejected without being tested, and less crucially on comparing competing theories.
2. Tests
In each period t = 1,2,…, an outcome ωt, which can be either 0 or 1, is observed. (It is simple to extend the results to finitely many possible outcomes in each period.) A t-history ht = (ω1,…,ωt−1) ∈ {0,1}^{t−1} comprises the outcomes that can be observed at the start of period t (i.e., before the outcome of period t is observed). These are the outcomes from period 1 to period t − 1. Conditional on any t-history ht ∈ {0,1}^{t−1}, a theory f claims that outcome 1 will be observed in period t with probability f(ht).
To simplify the language, we identify a theory with its predictions. That is, we define a theory as an arbitrary function that takes as input any finite history and returns as output a probability of outcome 1 next period.
Formally, a theory is a function
$$f : H_\infty \to [0,1],$$
where H∞ = ∪_{t=1}^{∞} {0,1}^{t−1} is the set of all finite histories. (We assume that {0,1}^0 = {h0}, where h0 denotes the empty (or null) history.)
There are two experts (males) in this model. (It is simple to extend the results to any finite number of experts.) Before any outcome is observed (i.e., at period 0), each expert announces a theory. A tester (female) tests the theories. At any finite history, the tester must either reject or not reject each theory. A comparative test C is a function
$$C : F \times F \to \mathcal{H} \times \mathcal{H},$$
where F is the set of all theories and ℋ is the set of all subsets of H∞.
A comparative test takes as input a pair of theories (f1,f2) and returns as output a pair of sets C1(f1,f2) ⊆ H∞ and C2(f1,f2) ⊆ H∞, which consist of finite histories. The set Ci(f1,f2), where i = 1 or i = 2, is the rejection set of theory fi. The histories from the set Ci(f1,f2) are interpreted as inconsistent with the theory fi. So the tester rejects expert i's theory fi at period t if she observes data (i.e., a t-history ht) that belong to the rejection set Ci(f1,f2). In the opposite case, ht ∉ Ci(f1,f2) and theory fi is not rejected at period t, although the theory fi may (or may not) be rejected next period.§
A comparative test rejects (or does not reject) a given theory depending on the observed data and the other theory (i.e., the rejection set for expert i's theory depends on the entire profile of the experts' theories and not just on expert i's theory). For example, theory 1 may be rejected if, given the observed data, theory 1 was outperformed (in a sense the test specifies) by theory 2. Hereafter, to simplify the language, we refer to comparative tests simply as tests. In addition, if a theory is not rejected by the test at history ht ∈ H∞, we say that the theory passed the test at ht.
At period zero, the tester selects her test C. The two experts learn about (and so they know) the test selected by the tester; each expert then announces his theory at period zero (i.e., before any data are observed). A test defines the following contract: An expert who accepts the contract must deliver a theory at period 0. He receives a payment (i.e., a positive utility u > 0) at period zero, but if his theory is rejected in the future, then he incurs a loss (i.e., a disutility d > 0).¶ An expert who refuses the contract obtains zero payoff.
Outcomes are generated by some true theory. The true theory maps each finite history to nature's probability of 1 next period. Each expert can be either an informed expert who knows and reports the true theory to the tester, or an uninformed expert who knows nothing about the true theory. So, the theories produced by uninformed experts are potentially false (i.e., they need not coincide with the true theory). The tester does not know the true theory. The tester also does not know whether any of the experts are informed. In addition, the tester does not have a prior over the space of theories. So, both the tester and the uninformed experts face uncertainty about the data. (Economists often speak of uncertainty when the odds are unknown, and refer to risk when the odds are known). The tester hopes to learn the odds of future histories from informed experts. That is, she hopes to transform her uncertainty into common risk.
It is helpful to compare the present setting with the conventional Neyman–Pearson approach. Under the Neyman–Pearson approach, we consider two disjoint sets of probability distributions, and test to determine whether one of them contains the true probability distribution. Here, we consider two theories and test to determine whether either of the two (possibly none or both) coincides with the true theory.
3. Properties of Tests
Any theory f uniquely defines a probability of any set A ⊆ H∞ of finite histories. Indeed, the probability of outcome ωt contingent on history ht = (ω1,…,ωt−1), denoted by Pr(ωt | ht), is equal to f(ht) if ωt = 1, and to 1 − f(ht) if ωt = 0. The probability of a finite history hm = (ω1,…,ωm−1) is equal to the product
$$\prod_{t=1}^{m-1} \Pr(\omega_t \mid h_t)$$
of conditional probabilities Pr(ωt | ht); and the probability of any set A ⊆ H∞, denoted by Pf(A), is equal to the sum of the probabilities of the single finite histories that belong to A.∥
If f is the true theory, then for any set A ⊆ H∞, the true probability of set A is equal to Pf(A).
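The product formula can be checked numerically. The following is a minimal sketch, assuming a theory is represented as a Python function from a tuple of past outcomes to the probability of outcome 1; the names `history_probability` and `sticky` are illustrative and do not come from the article.

```python
from itertools import product
from typing import Callable, Sequence, Tuple

History = Tuple[int, ...]            # outcomes observed so far, each 0 or 1
Theory = Callable[[History], float]  # history -> probability that the next outcome is 1


def history_probability(f: Theory, outcomes: Sequence[int]) -> float:
    """Probability that theory f assigns to observing `outcomes` in this order."""
    prob = 1.0
    for t, omega in enumerate(outcomes):
        p_one = f(tuple(outcomes[:t]))  # forecast made after the first t outcomes
        prob *= p_one if omega == 1 else 1.0 - p_one
    return prob


def sticky(history: History) -> float:
    """Hypothetical theory (an assumption): the last outcome tends to repeat."""
    if not history:
        return 0.5
    return 0.8 if history[-1] == 1 else 0.3


# Probability of observing 1, 1, 0: the product 0.5 * 0.8 * (1 - 0.8).
p = history_probability(sticky, (1, 1, 0))

# Sanity check: the probabilities of all histories of a fixed length sum to one.
total = sum(history_probability(sticky, h) for h in product((0, 1), repeat=3))
```

The sanity check at the end only covers histories of one fixed length; it says nothing about sets that mix histories of different lengths.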
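Definition 1: Fix ɛ ∈ [0,1]. A test C passes the true theory with probability 1−ɛ if, for any pair of theories (f1,f2) ∈ F × F,
$$P_{f_i}\bigl(C_i(f_1,f_2)\bigr) \le \varepsilon \quad \text{for } i = 1 \text{ and } i = 2. \tag{3.1}$$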
Suppose that expert 1 is informed, and announces the true theory. So, f1 is the true theory. Then, no matter which theory expert 2 announces, Eq. 3.1 ensures that the true theory f1 is likely to pass the test. The odds that f1 will pass the test are computed by Pf1 (i.e., by nature's true theory). Analogously, if expert 2 is informed and announces the true theory, then his theory is likely to pass the test, no matter which theory expert 1 announces.
Now, consider the contract defined by a test that is likely to pass the true theory. If expert i (where i = 1 or i = 2) is informed, accepts the contract, and announces the true theory, then he obtains the expected payoff u − dPfi(Ci(f1,f2)). Hence, if Eq. 3.1 holds, and ɛ is small enough, then the expected payoff is strictly positive. Informed experts will then accept the contract. So, consider a contract defined by a test that is likely to pass the true theory. The tester knows that informed experts accept this contract. Thus, an expert who refuses the contract reveals to the tester that he is uninformed.
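How small ɛ must be can be made explicit. The short derivation below uses only the bound of Eq. 3.1 and the payoff expression above; the numerical values in the closing remark are an illustration, not taken from the article.

```latex
u - d\,P_{f_i}\bigl(C_i(f_1,f_2)\bigr) \;\ge\; u - d\,\varepsilon ,
\qquad\text{and}\qquad
u - d\,\varepsilon > 0 \iff \varepsilon < \frac{u}{d}.
```

So ɛ < u/d is sufficient for an informed expert's expected payoff to be strictly positive. For instance, with u = 1 and d = 20, any ɛ below 0.05 suffices.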
Suppose that none of the experts is informed. The question is whether they will reject the contract and reveal themselves to be uninformed, or instead will accept the contract and confound the tester. By definition, uninformed experts do not know the probabilities of future outcomes. They must decide whether to accept the contract without knowing the exact odds of rejection. Consider a test such that, for any given theory, there are data that reject it (i.e., it is feasible to reject any given theory). If an expert announces his theory deterministically, then for some data, his payoff will be u − d. As long as the penalty for rejection exceeds the reward for announcing a theory, i.e., as long as d > u, the payoff of uninformed experts may be positive or negative. For uninformed experts, the probability of a negative payoff is unknown; it can be anything from 0 to 1. In addition, if u is small and d is large, then uninformed experts receive either a large punishment or a small reward with completely unknown odds. Hence, if the uninformed experts are sufficiently averse to uncertainty (i.e., if they are sufficiently averse to smaller payoffs with unknown odds), then uninformed experts are better off rejecting the contract, rather than accepting it and announcing any theory deterministically. It therefore seems that the tester can avoid receiving the theories of uninformed experts, at least if these experts are very uncertainty-averse. However, the uninformed experts still have one remaining recourse. They can randomize (but only once, before any data are observed), and select their theory according to this randomization.
Each expert may randomize when selecting his theory at period 0. Let a random generator of theories ζ be a probability distribution over the set F of all theories. The set of all random generators of theories will be denoted by ΔF. The possibility of selecting theories at random may at first seem redundant, since one might think that a mixture of theories is, for present purposes, again a theory. However, we will show that randomization radically changes the prospects of the uninformed experts. The possibility of selecting theories at random is important for the main result of this article.
The experts must randomize independently of one another. If expert 1 randomizes by using random generator of theories ζ1, and expert 2 randomizes by using random generator of theories ζ2, then the joint probability distribution over pairs of selected theories (f1,f2) ∈ F × F is the product measure ζ1 ×ζ2 of measures ζ1 and ζ2. Independence rules out producing theories contingent on signals, observable to both experts; it also rules out collusion (e.g., a situation in which the experts always announce identical theories).
The independence assumption is not meant to be realistic. Rather, it is an extreme case in which our result holds; it therefore still holds under milder and more realistic conditions where the experts can produce theories contingent on public signals. The inability to access a correlating device may pose the following difficulty to uninformed experts: If they must randomize independently, then they may produce different theories. The test may determine, depending on the data, that one theory outperformed another, leading to the rejection of at least one theory. This is a hurdle the uninformed experts must surpass if they must all sustain a false reputation of knowledge by passing the test simultaneously.
For i = 1 and i = 2, and a history ht ∈ H∞, let the revelation set
$$R_i(h_t) = \{(f_1,f_2) \in F \times F : h_t \in C_i(f_1,f_2)\}$$
denote the set of pairs of theories such that, if announced, the theory of expert i will be rejected at history ht.
Given the experts' randomization devices ζ1 and ζ2 and a finite history ht, expert i selects a theory that will be rejected on ht with probability (ζ1 ×ζ2)(Ri(ht)). That is, (ζ1 ×ζ2)(Ri(ht)) is the probability of the revelation set Ri(ht). This is the probability of rejection at history ht computed by the odds given by the experts' randomization devices. A test can be ignorantly passed if both experts can produce theories at period zero (perhaps at random, but independently of each other) that are both unlikely to be rejected, no matter which data are eventually observed (i.e., the probability of any revelation set Ri(ht) is small).**
Definition 2: Fix ɛ ∈ [0,1]. A test C can be ignorantly passed with probability 1−ɛ if there exists a pair of independent random generators of theories (ζ1,ζ2) such that, for i = 1 and i = 2, and all histories ht ∈ H∞,
$$(\zeta_1 \times \zeta_2)\bigl(R_i(h_t)\bigr) \le \varepsilon. \tag{3.2}$$
If a test can be ignorantly passed, then both experts can randomly select theories, independently of one another, such that the theory selected by each expert will be rejected only with small probability (no higher than ɛ), no matter how the data unfold in the future. (After the pair of theories is drawn, some data may reject them. However, given any data, the randomization devices are unlikely to draw theories that will be rejected on these data.)
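For random generators with finite support, the probability of a revelation set is a finite sum over pairs of theories. The sketch below is illustrative only: theories are reduced to a single period-1 forecast, and the toy rule `toy_rejects` (reject the expert whose forecast gave the observed outcome strictly lower probability) is an assumption made for the example, not a test from the article.

```python
from typing import Callable, Dict, Tuple

History = Tuple[int, ...]
Generator = Dict[float, float]  # forecast for period 1 -> probability of drawing it


def toy_rejects(i: int, f1: float, f2: float, h: History) -> bool:
    """Hypothetical comparative rule, defined only on histories with one outcome."""
    own, other = (f1, f2) if i == 1 else (f2, f1)
    omega = h[0]
    own_lik = own if omega == 1 else 1.0 - own
    other_lik = other if omega == 1 else 1.0 - other
    return own_lik < other_lik


def rejection_probability(
    i: int,
    zeta1: Generator,
    zeta2: Generator,
    rejects: Callable[[int, float, float, History], bool],
    h: History,
) -> float:
    """(zeta1 x zeta2)-probability of drawing a pair that rejects expert i at h."""
    total = 0.0
    for f1, p1 in zeta1.items():
        for f2, p2 in zeta2.items():
            if rejects(i, f1, f2, h):
                total += p1 * p2  # independence: the joint weight is the product
    return total


zeta = {0.9: 0.5, 0.1: 0.5}  # each expert mixes evenly over two forecasts
worst_case = max(
    rejection_probability(1, zeta, zeta, toy_rejects, h) for h in [(0,), (1,)]
)
```

In this toy setting each expert is rejected with probability 0.25 at either history, whereas an expert who announces 0.9 for sure against the same opponent is rejected with probability 0.5 when the outcome is 0.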
Consider a contract based on a test. Assume that both experts are uninformed and produce theories with random devices ζ1 and ζ2. Then, the expected utility of both experts, i.e., for i = 1 and i = 2, computed at history ht, is u − d(ζ1 ×ζ2)(Ri(ht)). As long as Eq. 3.2 holds with ɛ sufficiently small, the expected payoff of both experts (with a contract) is strictly positive, for any realization of the data. So, both experts accept the contract. Thus, if a contract is based on a test that can be ignorantly passed, then both uninformed experts accept the contract, produce potentially false theories with devices designed to pass the test, and do not reveal that they are uninformed. This follows because if a test can be ignorantly passed, then the odds of rejection can be bounded by strategic randomization.
We conclude this section with an additional condition on tests, which is of a more technical nature. Two theories f and f̃ are equivalent until period m if f(ht) = f̃(ht) for all t-histories ht with t < m. So, two theories are equivalent until period m if they make the same predictions up to period m. (The predictions up to period m must be identical, contingent on all past histories, not only on observed histories.)
Definition 3: A test C is future-independent if, for any two pairs of theories (f1,f2) and (f̃1,f̃2) such that f1 and f̃1 are equivalent until period m, and f2 and f̃2 are also equivalent until period m,
$$h_t \in C_i(f_1,f_2) \iff h_t \in C_i(\tilde f_1,\tilde f_2)$$
for i = 1 and i = 2, and for all t-histories ht with t < m.
A test is future-independent if the possibility that any expert's theory is rejected at period m depends only on the data observed up to period m and the predictions made by the theories of both experts up to period m.††
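Equivalence until period m can be checked by brute force for small m. The sketch below assumes theories are Python functions of a tuple of past outcomes and follows the convention stated above that a t-history has t − 1 outcomes and that only t < m matters; the function and theory names are illustrative.

```python
from itertools import product
from typing import Callable, Tuple

History = Tuple[int, ...]
Theory = Callable[[History], float]


def equivalent_until(f: Theory, g: Theory, m: int, tol: float = 1e-12) -> bool:
    """True if f and g agree on every t-history with t < m."""
    # A t-history has t - 1 outcomes, so t = 1, ..., m - 1 means lengths 0, ..., m - 2.
    for length in range(m - 1):
        for h in product((0, 1), repeat=length):
            if abs(f(h) - g(h)) > tol:
                return False
    return True


def coin(history: History) -> float:
    """Hypothetical theory: always forecasts 0.5."""
    return 0.5


def late_switch(history: History) -> float:
    """Hypothetical theory: agrees with `coin` until three outcomes are observed."""
    return 0.5 if len(history) < 3 else 0.9


agree_until_4 = equivalent_until(coin, late_switch, 4)  # compares lengths 0, 1, 2
agree_until_5 = equivalent_until(coin, late_switch, 5)  # also compares length 3
```

The enumeration visits every history, observed or not, which mirrors the requirement that predictions coincide contingent on all past histories. Its cost grows exponentially in m, so it is useful only as an illustration.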
4. Main Result
Proposition 1. Fix any ɛ ∈ [0,1] and δ ∈ (0,1−ɛ]. Suppose that a test C is future-independent, and passes the true theory with probability 1−ɛ. Then test C can be ignorantly passed with probability 1 − ɛ − δ.
Proposition 1 shows that both experts can select theories at random (randomizing independently of one another) in such a way that both are likely to pass the test. This holds even though neither of the two experts has any idea of how the data will unfold in the future. Even in the worst possible realization of the data, the theories selected by the experts are unlikely to be rejected, provided that the true theory itself is unlikely to be rejected, and the test is future-independent.
4.1. Informal Description of the Proof of Proposition 1.
Suppose, first, that one of the experts, say, expert 1, selects his theory at random, using a random generator of theories ζ1. Now consider the following zero-sum game between nature and expert 2: nature's pure strategy is an infinite sequence of outcomes. Expert 2's pure strategy is a theory. Expert 2's payoff is 1 if his theory is never rejected and 0 otherwise; nature's objective is to minimize this payoff. Both nature and the expert are allowed to randomize. A mixed strategy of nature is a probability measure over the space of infinite histories of outcomes. So, a mixed strategy of nature is associated with a theory.
If the test passes the true theory with probability 1−ɛ, then the expected payoff of expert 2 is at least 1−ɛ (assuming he announces the true theory). In game-theoretic language this means that for every mixed strategy of nature, there is a pure strategy for expert 2 (to announce the true theory) that gives him an expected payoff of 1−ɛ, or higher. So, if the conditions of Fan's celebrated minmax theorem (see ref. 15) were satisfied, there would exist a (mixed) strategy ζ2 for expert 2 that ensures him an expected payoff arbitrarily close to 1−ɛ, no matter what strategy nature chooses (and, in particular, no matter which data nature chooses).
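The role of mixed strategies in this argument can be seen in a deliberately tiny zero-sum game. The sketch below is a toy under stated assumptions: the pass/fail table is hypothetical, the game has one period and two pure theories, and a grid search stands in for the minmax theorem. It is not the construction used in the proof.

```python
# Hypothetical pass/fail table: passes[theory][outcome] is 1 if the theory
# survives when nature produces that outcome, and 0 otherwise.
passes = {"A": {0: 0, 1: 1}, "B": {0: 1, 1: 0}}
grid = [n / 100 for n in range(101)]


def guaranteed(p_a: float) -> float:
    """Worst-case pass probability when the expert plays A with probability p_a.

    The payoff is linear in nature's mixing, so checking nature's two pure
    strategies is enough.
    """
    return min(p_a * passes["A"][w] + (1 - p_a) * passes["B"][w] for w in (0, 1))


def best_reply(q_one: float) -> float:
    """Best pure-strategy payoff when the expert knows nature plays 1 w.p. q_one."""
    return max(
        (1 - q_one) * passes[f][0] + q_one * passes[f][1] for f in ("A", "B")
    )


pure_guarantee = max(guaranteed(1.0), guaranteed(0.0))  # no randomization
best_mix = max(grid, key=guaranteed)                     # best mixing weight on A
maxmin = guaranteed(best_mix)                            # what mixing guarantees
minmax = min(best_reply(q) for q in grid)                # informed best replies
```

In this toy game, what an expert who knows nature's strategy can secure (`minmax`) coincides with what a randomizing expert can secure without that knowledge (`maxmin`), while no pure strategy guarantees anything.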
The assumptions of Fan's minmax theorem require the expert's strategy space to be compact and nature's payoff function to be lower semicontinuous with respect to the expert's strategy. These two conditions are not simultaneously satisfied for all tests. Some additional properties of the test must be assumed, and this is the reason for assuming future independence. We restrict the set of expert 2's pure strategies to theories that make a forecast, in each period t, from a finite set of predictions Rt ⊂ [0,1]. This new pure strategy space is compact if endowed with the product of discrete topologies. As a result, the new mixed strategy space of expert 2 is compact if endowed with the weak-* topology. Moreover, nature's payoff function is lower semicontinuous with respect to expert 2's strategy.‡‡
The informal argument above delivers the following result: for every strategy ζ1 of expert 1, there exists a strategy ζ2 of expert 2 that ensures him an expected payoff arbitrarily close to 1−ɛ, no matter what strategy nature chooses. Similarly, for every strategy ζ2 of expert 2, there exists a strategy ζ1 of expert 1 that ensures him an expected payoff arbitrarily close to 1−ɛ, no matter what strategy nature chooses. Applying Glicksberg–Kakutani's fixed-point theorem (see ref. 16), one can show the existence of a pair of independent random generators of theories (ζ1,ζ2) that ensure each expert an expected payoff arbitrarily close to 1−ɛ, no matter what strategy nature chooses. This implies that for any history ht, the (ζ1 × ζ2)-probabilities of the revelation sets R1(ht) and R2(ht) are not much higher than ɛ.
Moreover, the proof of Proposition 1 establishes an even stronger result. The assumption that the test passes the true theory with probability 1−ɛ can be replaced with the following weaker assumption: if any expert i knows the true theory and correctly anticipates the mixed strategy of the other expert, then expert i himself has a mixed strategy ζi that ensures that he will pass the test with probability close to 1−ɛ. More precisely, the weaker assumption says that for any i ∈ {1,2} and ζj ∈ ΔF, where j ≠ i, there exists ζi ∈ ΔF such that
$$E_{\zeta_1 \times \zeta_2}[X_i] \le \varepsilon,$$
where Xi : F × F → [0,1] is the random variable that maps any pair of theories (f1,f2) into the probability Pfi(Ci(f1,f2)) that theory fi is rejected, and Eζ1×ζ2 denotes the expected-value operator associated with ζ1 × ζ2.
4.2. Extensions of the Main Result.
So far, we have assumed that the experts are only concerned about their reputation. That is, the experts want to be perceived as knowing the relevant odds. In this section, we extend the basic results to accommodate additional motivations the experts may have. For example, in addition to their reputation concerns, the experts may be ideologues who want to advocate particular theories. To capture this idea, we define direct payoffs for the experts on the announced theories.
Let U1 : F × F → R and U2 : F × F → R be two continuous utility functions. (We equip F, which is the Cartesian product of countably many copies of [0,1], with the product topology. See footnote **). So, Ui(f1,f2), for i = 1 and i = 2, is interpreted as the direct utility that expert i obtains if theories (f1,f2) are announced.
If expert i is informed, accepts the contract based on a test, and announces the true theory, his expected payoff is
$$U_i(f_1,f_2) + u - d\,P_{f_i}\bigl(C_i(f_1,f_2)\bigr). \tag{4.1}$$
So, as in section 3, informed experts accept the contract if Eq. 4.1 is strictly positive for any pair of theories (f1,f2).
Suppose now that both experts are uninformed. They must decide whether to accept the contract without knowing the exact odds of rejection. However, as long as they can ensure that their payoffs (with a contract) are strictly positive, for any possible realizations of the data, they will certainly accept the contract. So, uninformed experts accept the contract if there exists a pair of independent random generators of theories (ζ1,ζ2) ∈ Δ(F) × Δ(F) such that
$$E_{\zeta_1 \times \zeta_2}[U_i] + u - d\,(\zeta_1 \times \zeta_2)\bigl(R_i(h_t)\bigr) \tag{4.2}$$
is strictly positive for any history ht.
Claim 1: If informed experts accept the contract, then uninformed experts also accept the contract.
The proof of Claim 1 is the same as the proof of Proposition 1. As we note in section 4.1, Proposition 1 follows from Fan's minmax theorem and Glicksberg–Kakutani's fixed-point theorem. The continuity and the linearity of uninformed experts' payoff functions are the critical conditions that allow the use of these two results. These conditions hold whether the uninformed experts' payoffs are given by u − d(ζ1 × ζ2)(Ri(ht)) (i.e., the experts are solely motivated by reputation concerns), or whether the uninformed experts' payoffs are given by Eq. 4.2 (i.e., the experts are motivated by both reputation concerns and ideology).
In general, a contract may specify a (discounted or undiscounted) flow of payoffs. (The assumption that the experts do not discount the future is not necessary in this article. The undiscounted case seems, however, more interesting because it imposes no exogenous constraints on the use of the data.) At each period, the payoff of each expert may depend on the history of outcomes observed so far and the theories the two experts announced at period zero. General contracts may be able to accommodate a wide variety of motivations the experts may have. For the same reason mentioned above, Claim 1 still holds: as long as the perfectly informed experts accept a contract, the completely uninformed experts also accept it.
The substantive point of Claim 1 is as follows: The tester does not know the type of experts (informed or uninformed), but each expert knows his type. The tester cannot infer the experts' type (informed or uninformed) from their choice to accept or reject the contract. And the data do not alleviate the adverse selection problem of the tester either because the test on which the contract is based is unlikely to reject the theories of the uninformed experts.
Claim 1 holds under the assumption that the tester knows the experts' payoffs. So, the tester is unsure about whether any of the experts have relevant knowledge about the probabilities of future events, but she is not unsure of the experts' motivations. In a more realistic model, the tester may be unsure about the experts' knowledge and also about the experts' ideological biases. Our results do not say anything about whether data may help the tester to screen between biased or unbiased experts of unknown ideologies. This is an interesting problem that is outside the scope of this article. However, it seems (to us) rather implausible that a tester would be able to determine whether some experts are informed when she does not know the experts' motivations, given that she is unable to do so when she does know their motivations.
5. Example
We now provide an example that shows that simultaneous manipulation of tests (although possible) may not be an easy task. The example will also be helpful in explaining the relation between this article and the existing literature on testing strategic experts. To define our test, we need some auxiliary concepts.
Pick a positive number k. We say that theory fi k-outperforms theory fj at history ht if
$$P_{f_i}(h_t) > k\,P_{f_j}(h_t),$$
i.e., history ht is k times more likely according to theory fi than according to theory fj.
Pick a number η ∈ (0,1] and a natural number r. Given a history ht, let hs, where s ≤ t, be the s-history whose outcomes coincide with the first s − 1 outcomes of ht. A theory fi is (η,r)-similar to a theory fj at history ht, where r ≤ t, if
$$|f_i(h_s) - f_j(h_s)| \le \eta \tag{5.1}$$
for all s = 1,…,t except at most r of them. That is, there exist at most r periods s such that Eq. 5.1 does not hold.
In other words, the predictions of theories fi and fj along history ht differ by at most η in all but r periods. Informally, if η is small, and r is much smaller than t, then the predictions of (η,r)-similar theories along t-histories are, most of the time, close to one another.
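Both auxiliary notions are straightforward to compute along a given history. The sketch below assumes theories are Python functions of a tuple of past outcomes. Treating the likelihood comparison as strict and the forecast gap as "at most η" are assumptions made for the example, as are the function names and the two constant theories.

```python
from typing import Callable, Sequence, Tuple

History = Tuple[int, ...]
Theory = Callable[[History], float]


def history_probability(f: Theory, outcomes: Sequence[int]) -> float:
    """Probability that theory f assigns to observing `outcomes` in this order."""
    prob = 1.0
    for t, omega in enumerate(outcomes):
        p_one = f(tuple(outcomes[:t]))
        prob *= p_one if omega == 1 else 1.0 - p_one
    return prob


def k_outperforms(f_i: Theory, f_j: Theory, outcomes: Sequence[int], k: float) -> bool:
    """f_i makes the history more than k times as likely as f_j does (assumed strict)."""
    return history_probability(f_i, outcomes) > k * history_probability(f_j, outcomes)


def is_similar(
    f_i: Theory, f_j: Theory, outcomes: Sequence[int], eta: float, r: int
) -> bool:
    """Forecasts differ by more than eta in at most r of the periods s = 1, ..., t."""
    t = len(outcomes) + 1  # a t-history carries t - 1 outcomes
    violations = sum(
        1
        for s in range(1, t + 1)
        if abs(f_i(tuple(outcomes[: s - 1])) - f_j(tuple(outcomes[: s - 1]))) > eta
    )
    return violations <= r


def seventy(history: History) -> float:
    """Hypothetical theory: always forecasts 0.7."""
    return 0.7


def fifty(history: History) -> float:
    """Hypothetical theory: always forecasts 0.5."""
    return 0.5


data = (1, 1, 0, 1)
beats_by_1_5 = k_outperforms(seventy, fifty, data, 1.5)  # 0.1029 against 1.5 * 0.0625
beats_by_2 = k_outperforms(seventy, fifty, data, 2.0)    # 0.1029 against 2 * 0.0625
close = is_similar(seventy, fifty, data, eta=0.25, r=2)
far = is_similar(seventy, fifty, data, eta=0.1, r=2)
```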
Given any theory f and γ ∈ (0,0.5], let f̄ be an alternative theory defined by
$$\bar f(h_t) = \begin{cases} f(h_t) + \gamma & \text{if } f(h_t) \le 0.5,\\ f(h_t) - \gamma & \text{if } f(h_t) > 0.5. \end{cases}$$
So, the forecasts of theories f and f̄ differ by γ. When f forecasts 1 with probability no greater than 0.5, the forecast of f̄ (for 1) adds γ to the forecast of f. When f forecasts 1 with probability greater than 0.5, the forecast of f̄ subtracts γ from the forecast of f. Theory f̄ can be interpreted as an alternative theory constructed by the tester.
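The construction of the alternative theory translates directly into a higher-order function. This is a minimal sketch, assuming theories are Python functions of a tuple of past outcomes; the name `alternative` and the example theories are illustrative.

```python
from typing import Callable, Tuple

History = Tuple[int, ...]
Theory = Callable[[History], float]


def alternative(f: Theory, gamma: float) -> Theory:
    """Return the theory that shifts each forecast of f by gamma toward the middle.

    Forecasts of at most 0.5 are raised by gamma; forecasts above 0.5 are
    lowered by gamma. With 0 < gamma <= 0.5 the result stays inside [0, 1].
    """
    if not 0.0 < gamma <= 0.5:
        raise ValueError("gamma must lie in (0, 0.5]")

    def shifted(history: History) -> float:
        p = f(history)
        return p + gamma if p <= 0.5 else p - gamma

    return shifted


def confident(history: History) -> float:
    """Hypothetical theory: always forecasts 0.9."""
    return 0.9


def coin(history: History) -> float:
    """Hypothetical theory: always forecasts 0.5."""
    return 0.5


low = alternative(confident, 0.2)(())  # 0.9 is above 0.5, so gamma is subtracted
high = alternative(coin, 0.2)(())      # 0.5 is not above 0.5, so gamma is added
```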
Fix numbers k > 1, η ∈ (0,1], γ ∈ (0,0.5], and positive natural numbers r and m. We define test Ĉ as follows: The rejection set Ĉi(f1,f2) consists only of m-histories.§§ Theory fi of expert i is rejected at history hm if:
1. The theory fi does not k-outperform the alternative theory f̄i constructed from fi at hm;
or if
2. The theory fj of expert j ≠ i is not (η,r)-similar to theory fi at history hm, and theory fi does not 1-outperform theory fj at hm.
Informally, test Ĉ requires a theory to k-outperform the alternative theory constructed by the tester. It also requires the theory to 1-outperform the other expert's theory, in the case in which the experts' two theories are not similar. (We refer the reader to ref. 17 for a literature on predictions that is indirectly related to the ideas in this article.)
Condition 1 (by itself) defines a likelihood test, which has been studied in ref. 12. Condition 2 adapts to our setting the idea behind a test studied in ref. 10. The authors of ref. 5 show that if the tester knows that one expert has announced the true theory (but she does not know which expert), and the other expert has announced a theory that is not (η,r) -similar to the true theory, then, with sufficiently large datasets, the tester is eventually able to detect (with high probability) which expert has announced the true theory. This can be achieved by selecting the expert whose theory 1 -outperforms the theory of the other expert. In contrast, the tester, in our setting, does not know whether any expert is informed. So the tester cannot rule out the possibility that both experts are uninformed.
By proposition 2 in ref. 12 and proposition 1 in ref. 10, it follows that for any k, γ, and ɛ > 0, there are natural numbers m and r such that the test passes the true theory with probability 1−ɛ. It is also easy to see that the test is future-independent. It now follows from Proposition 1 that, for these m and r, the test can be ignorantly passed with probability 1−ɛ. We now argue that it is not an easy task for both experts to ignorantly pass this test.
Fix an expert. No matter which theory f he announces, it will not k-outperform the alternative theory on all datasets. Indeed, for some m histories, the alternative theory 1-outperforms theory f.¶¶ So, if this expert announces any theory deterministically, then he will be rejected (by condition 1) at some histories. As a result, both experts have to randomize with odds carefully designed to avoid rejection by condition 1.
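The existence of such unfavorable histories can be illustrated with a small sketch. It is only an illustration under stated assumptions: a theory is represented by a function from a finite history to the forecast probability of outcome 1, the likelihood of a history is taken to be the product of the one-step forecast probabilities, and the names `adversarial_history` and `example_theory` are hypothetical. In each period the sketch picks the outcome that the γ-shifted alternative considers more likely than f does.

```python
from typing import Callable, List, Sequence, Tuple

Theory = Callable[[Sequence[int]], float]  # history -> forecast probability of outcome 1


def adversarial_history(f: Theory, gamma: float, m: int) -> Tuple[List[int], float, float]:
    """Build an m-period history on which the alternative theory is more likely than f.

    Returns the history together with the likelihoods assigned to it by f
    and by the gamma-shifted alternative theory.
    """
    history: List[int] = []
    likelihood_f = 1.0
    likelihood_alt = 1.0
    for _ in range(m):
        p = f(history)
        if p <= 0.5:
            # The alternative forecasts p + gamma for outcome 1, so choose 1.
            outcome, prob_f, prob_alt = 1, p, p + gamma
        else:
            # The alternative forecasts p - gamma for outcome 1,
            # hence 1 - p + gamma for outcome 0, so choose 0.
            outcome, prob_f, prob_alt = 0, 1.0 - p, 1.0 - p + gamma
        likelihood_f *= prob_f
        likelihood_alt *= prob_alt
        history.append(outcome)
    return history, likelihood_f, likelihood_alt


# Hypothetical example theory whose forecast depends on the observed history.
example_theory: Theory = lambda h: 0.4 if sum(h) % 2 == 0 else 0.7
path, lik_f, lik_alt = adversarial_history(example_theory, gamma=0.1, m=5)
```

On the constructed history the alternative theory gains a margin of γ in every period, so its likelihood strictly exceeds that of f. Since k > 1, a deterministically announced f cannot meet the requirement of condition 1 at this history if outperformance is read as a comparison of these likelihoods.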
In addition, on any dataset, at least one expert fails the test (by condition 2) if their theories are not similar. Thus, if the test is to be ignorantly passed, then the experts' theories have to be similar with high probability. However, if m is sufficiently large compared with r, one might expect it to be difficult for both experts to announce similar theories, since they have to randomize independently of each other. There therefore seems to be a potential conflict between conditions 1 and 2. Condition 1 requires theories to be selected at random with specific odds, and condition 2 requires the selected theories to be similar to each other. Nevertheless, by Proposition 1, both experts can avoid rejection by both conditions by randomizing independently. It may be worth noting that Proposition 1 only shows the existence of randomization devices designed to pass the test. How to construct such devices is beyond the scope of this article.
6. Conclusion
There is an incompatibility between the following two basic properties of future-independent tests: (i) The test is unlikely to reject the true theory. (ii) The test cannot be ignorantly passed. Despite earlier research suggesting a difference between testing a single theory and testing competing theories, this incompatibility result holds for simultaneous testing of multiple experts. Moreover, this result can be extended to accommodate several motivations that experts may have in practice.
Acknowledgments
We thank the editor and referee for helpful comments. This work was supported by the National Science Foundation.
Footnotes
- 1 To whom correspondence should be addressed. E-mail: wo{at}northwestern.edu
- Author contributions: W.O. and A.S. designed research, performed research, analyzed data, and wrote the paper.
- The authors declare no conflict of interest.
- This article is a PNAS Direct Submission.
- This article contains supporting information online at www.pnas.org/cgi/content/full/0812602106/DCSupplemental.
- † As we argue in section 4.2, the substantive point of our result holds for general contracts where the experts' motivations may go beyond avoiding rejection.
- ‡ In ref. 10, the authors show that, if it is known that some of the experts are informed, then (under some conditions) the data may be able to identify which experts are informed.
- § Rejection sets are assumed to be such that if a theory is rejected at a history ht = (ω1,…,ωt−1), then it is also rejected at all extensions of history ht, i.e., at all histories hm = (ω′1,…,ω′t−1,ω′t,…,ω′m−1) such that m > t and (ω1,…,ωt−1) = (ω′1,…,ω′t−1).
- ¶ For expositional simplicity, we assume that the experts do not discount the future. That is, the disutility d does not depend on the period in which their theories are rejected. However, all our arguments remain valid in the discounted case.
- ∥ Any theory f uniquely defines a probability distribution on the set {0,1}∞ of infinite histories. So, one can interpret a theory as a probability distribution on {0,1}∞, parametrized by the conditional probabilities Pr(ωt | ht).
- ** This definition requires the set F × F to be equipped with a σ algebra and the provision that for i = 1 and i = 2, and any ht ∈ H∞, the revelation set Ri(ht) is measurable with respect to that σ algebra. The set of all theories F is the set of functions from a countable set to [0,1]. If we equip [0,1] with the σ algebra of Borel sets, then F inherits the product Borel structure. Similarly, F × F inherits the product Borel structure of two copies of F. All tests C in this article are assumed to be such that the revelation sets Ri(ht) are measurable with respect to this σ algebra.
- †† In several cases, the tester can use only future-independent tests. This is true, for example, in the case in which the experts claim that they will know the probability that 1 occurs at period t + 1 no earlier than at period t. We refer the reader to ref. 12 for a detailed discussion of future independence. The assumption of future independence can be dispensed with if (unlike the model in this article) the tester has bounded datasets or the experts discount future payoffs. In general, future independence can be relaxed, but not completely dispensed with. We refer the reader to refs. 13 and 14 for future-dependent tests that are likely to pass the true theory and cannot be ignorantly passed.
- ‡‡ If the set of expert 2's pure strategies is restricted, then we may no longer have the property that for every mixed strategy of nature, there is a strategy for expert 2 that gives him a payoff of 1−ɛ, or higher. However, an additional step in our proof shows that this property is preserved for properly chosen sets of predictions Rt ⊂ [0,1].
- §§ This statement must be modified, of course, to the effect that, as we assumed in section 2 (see footnote §), every rejection set automatically includes histories ht with t > m, whose first m outcomes coincide with the outcomes of a history that belongs to the rejection set of expert i at (f1, f2). The fact that the rejection sets consist of m histories (and their extensions) ensures that the test satisfies the required measurability provision (see footnote **).
- ¶¶ To see an example of such a history, take in each period s = 1,…,m an outcome that is more likely according to the alternative theory f̄i of fi than according to fi.
- Freely available online through the PNAS open access option.
- © 2009 by The National Academy of Sciences of the USA










