# Bayesian posteriors for arbitrarily rare events


Contributed by Drew Fudenberg, March 27, 2017 (sent for review November 14, 2016; reviewed by Keisuke Hirano, Demian Pouzo, and Bruno Strulovici)

## Significance

Many decision problems in contexts ranging from drug safety tests to game-theoretic learning models require Bayesian comparisons between the likelihoods of two events. When both events are arbitrarily rare, a large data set is needed to reach the correct decision with high probability. The best result in previous work requires the data size to grow so quickly with rarity that the expectation of the number of observations of the rare event explodes. We show for a large class of priors that it is enough that this expectation exceeds a prior-dependent constant. However, without some restrictions on the prior the result fails, and our condition on the data size is the weakest possible.

## Abstract

We study how much data a Bayesian observer needs to correctly infer the relative likelihoods of two events when both events are arbitrarily rare. Each period, either a blue die or a red die is tossed. The two dice land on side

Suppose a physician is deciding between a routine surgery and a newly approved drug for her patient. Either treatment can, in rare cases, lead to a life-threatening complication. She adopts a Bayesian approach to estimate the respective probability of complication, as is common among practitioners in medicine when dealing with rare events; see, for example, refs. 1 and 2 on the “zero-numerator problem.” She reads the medical literature to learn about

Phrased more generally, we study how much data are required for the Bayesian posterior means on two probabilities to respect an inequality between them in the data-generating process, where these true probabilities may be arbitrarily small. Each period, one of two dice, blue or red, is chosen to be tossed. The choices can be deterministic or random, but have to be independent of past outcomes. The blue and red dice land on side

Suppose that in every period, the blue die is chosen with the same probability and that outcome

The best related result known so far is a consequence of the uniform consistency result of Diaconis and Freedman in ref. 4. Their result leads to the desired conclusion only under the stronger condition that the sample size is so large that the expected number of times the blue die lands on side

Our improvement of the sample size condition is made possible by a pair of inequalities that relate the Bayes estimates to observed frequencies. Like the bounds of ref. 4, the inequalities apply to all sample sequences without exceptional null sets, and they do not involve the true parameter values. Our result is related to a recent result of ref. 5, which shows that, under some conditions, the posterior distribution converges faster when the true parameter is on the boundary. Our result is also related to ref. 6, which considers a half space not containing the maximum-likelihood estimate of the true parameter and studies how quickly the posterior probability assigned to the half space converges to zero.
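To show concretely what a bound tying a Bayes estimate to an observed frequency looks like, the sketch below uses the simplest case, a single Dirichlet prior. This is an assumption chosen for transparency, and the bounds are elementary facts about that prior, not the paper's inequalities. If a side has Dirichlet weight a and the weights sum to A, its posterior mean after k hits in n tosses is (k + a)/(n + A). For every k, that value lies within max(a, A − a)/(n + A) of the frequency k/n, and for k ≥ 1 its ratio to k/n lies between n/(n + A) and 1 + a/k. Like the inequalities described above, these bounds hold for every sample and never mention the true probability. The function names are hypothetical.

```python
from fractions import Fraction


def dirichlet_posterior_mean(k, n, a, total):
    """Posterior mean of one side's probability under a Dirichlet prior.

    k: number of times the side was observed in n tosses.
    a: Dirichlet weight on that side; total: sum of all Dirichlet weights (>= a).
    Uses exact rational arithmetic so the bounds can be checked exactly.
    """
    return (Fraction(k) + a) / (n + total)


def check_frequency_bounds(n, a, total):
    """Check the elementary Dirichlet bounds for every count k in 0..n."""
    a, total = Fraction(a), Fraction(total)
    slack = max(a, total - a) / (n + total)
    for k in range(n + 1):
        post = dirichlet_posterior_mean(k, n, a, total)
        freq = Fraction(k, n)
        # Absolute bound: |post - freq| <= max(a, A - a) / (n + A)
        if abs(post - freq) > slack:
            return False
        if k >= 1:
            # Relative bound: n / (n + A) <= post / freq <= 1 + a / k
            ratio = post / freq
            if not (Fraction(n) / (n + total) <= ratio <= 1 + a / k):
                return False
    return True


print(dirichlet_posterior_mean(2, 1000, 1, 6))   # 3/1006
print(check_frequency_bounds(1000, Fraction(1, 2), 3))
```

The relative bound is the useful one for rare events. Once the count k is large compared with the prior weight a, the posterior mean is within a small relative error of the frequency, however small k/n itself is.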

## Bayes Estimates for Multinomial Probabilities

We first consider the simpler problem of estimating for a single

Motivated by applications where some of the

For a wide class of priors, we show in *Theorem 1* that there is a constant **1**] holds whenever

*Condition*𝓟 *:*

We say that a density *Condition* *Condition* *Condition*

For example, if *Condition* *Condition* *Condition*

### Theorem 1.

*Suppose* *satisfies Condition* *. Then for every* *there exists* *so that**if*

The proofs of the results in this section are given in *SI Appendix*.

The proof of *Theorem 1* uses bounds on the posterior means given in *Proposition 1* below. These bounds imply that there is an

Inequality **2** shows a higher accuracy of the Bayes estimator **2**] is less than *Remark 2* we discuss the properties of the prior that have an impact on the

*Condition* *Theorem 1* fails to hold for a prior density that converges to

#### Example 1:

Let

The idea behind this example is that the prior assigns very little mass near the boundary point where
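As a hypothetical illustration of this idea, the sketch below compares the posterior mean under a flat prior with the posterior mean under a prior density proportional to exp(−1/p). The second density vanishes faster than any power of p at the boundary p = 0. It is our own choice of prior, not necessarily the one used in Example 1. As a further simplifying assumption, the die is treated as having two outcomes (the rare side versus all others). The code integrates numerically over t = log p. With k = 20 hits in n = 10^6 tosses, the flat-prior estimate is close to the frequency. The exponential-tail prior pushes the estimate far above it, because it puts almost no mass near the small values of p that the data favor.

```python
import math


def posterior_mean(k, n, log_prior, grid=20000, p_min=1e-9):
    """Posterior mean of p after k rare-side outcomes in n tosses.

    Likelihood p**k * (1 - p)**(n - k) (two-outcome simplification).
    Midpoint rule over t = log p on [log p_min, 0); log_prior is the log of an
    unnormalized prior density in p.
    """
    t_lo = math.log(p_min)
    h = -t_lo / grid
    ts = [t_lo + (i + 0.5) * h for i in range(grid)]
    # log(likelihood * prior density * dp/dt), where dp/dt = p = exp(t)
    logs = [k * t + (n - k) * math.log1p(-math.exp(t)) + log_prior(math.exp(t)) + t
            for t in ts]
    m = max(logs)                      # subtract the max to avoid overflow
    weights = [math.exp(v - m) for v in logs]
    num = sum(w * math.exp(t) for w, t in zip(weights, ts))
    return num / sum(weights)


def flat(p):
    """Uniform prior density on (0, 1), in logs."""
    return 0.0


def exp_tail(p):
    """Log of a density proportional to exp(-1/p), which vanishes very fast at 0."""
    return -1.0 / p


k, n = 20, 10**6
for name, prior in [("flat", flat), ("exp(-1/p)", exp_tail)]:
    # Ratio of the posterior mean to the observed frequency k/n
    print(name, posterior_mean(k, n, prior) / (k / n))
```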

The next example shows that the sample size condition of *Theorem 1*, **2**] can be proved cannot be enlarged to a set of the form

#### Example 2:

Suppose *Condition*

The following proposition gives fairly sharp bounds on the posterior means under the assumption that the prior density satisfies *Condition* *Theorems 1* and *2*.

### Proposition 1.

*Suppose* *satisfies Condition* *. Then for every* *there exists a constant* *such that**for* *and all* *with*

#### Remark 1:

If **3**] hold with

The proofs of our main results, *Theorems 1* and *2*, apply to all priors whose densities satisfy inequalities **3** or **4**. In particular, the conclusions of these theorems and of their corollaries hold if the prior distribution is a mixture of Dirichlet distributions and the support of the mixing distribution is bounded.
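A finite mixture of Dirichlet distributions trivially has a mixing distribution with bounded support, so it falls under the remark above. The sketch below computes its posterior mean. Each component's weight is updated by that component's marginal likelihood B(α + counts)/B(α), where B is the multivariate Beta function. The posterior mean is then the weighted average of the component posterior means (counts + α)/(n + Σα). The component parameters in the usage line are hypothetical.

```python
import math


def log_beta(v):
    """Log of the multivariate Beta function B(v) = prod Gamma(v_i) / Gamma(sum v)."""
    return sum(math.lgamma(x) for x in v) - math.lgamma(sum(v))


def mixture_posterior_mean(counts, components):
    """Posterior mean of the side probabilities under a finite Dirichlet mixture.

    counts: observed count of each side.
    components: list of (weight, alpha) pairs, where weight is the prior mixing
    weight and alpha is that component's Dirichlet parameter vector.
    """
    n = sum(counts)
    log_post_w = []
    for w, alpha in components:
        # Marginal likelihood of the observed sequence is B(alpha + counts) / B(alpha);
        # the multinomial coefficient is common to all components and cancels.
        post_alpha = [a + c for a, c in zip(alpha, counts)]
        log_post_w.append(math.log(w) + log_beta(post_alpha) - log_beta(alpha))
    m = max(log_post_w)
    post_w = [math.exp(x - m) for x in log_post_w]
    z = sum(post_w)
    mean = [0.0] * len(counts)
    for pw, (_, alpha) in zip(post_w, components):
        total = sum(alpha) + n
        for i, (a, c) in enumerate(zip(alpha, counts)):
            mean[i] += (pw / z) * (a + c) / total
    return mean


print(mixture_posterior_mean([2, 498, 500], [(0.5, [1, 1, 1]), (0.5, [0.5, 5, 5])]))
```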

#### Remark 2:

*Condition* *Proposition 1* relies on the fact that **3**] can be taken to be *i*) if *ii*) if *iii*) if *Theorem 1* depends on the prior through the constant *Proposition 1* and the properties of

In particular, *Condition* *Example 1*, where no finite *Theorem 1*. If *Theorem 1* is

#### Remark 3:

Using results on the degree of approximation by Bernstein polynomials, one may compute explicit values for the constants *Proposition 1* and *Theorem 1*. Details are given in *SI Appendix*, *Remarks* *3′* and *3″*.
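For readers unfamiliar with Bernstein polynomials, the sketch below evaluates one and checks a classical identity. It is not the paper's computation of its constants. The degree-n Bernstein polynomial of f is the expectation of f(K/n) with K ~ Binomial(n, x), the same binomial structure as the counts in this section. For f(x) = x², the identity B_n(f)(x) = x² + x(1 − x)/n shows the order-1/n error typical for smooth functions.

```python
import math


def bernstein(f, n, x):
    """Degree-n Bernstein polynomial of f on [0, 1], evaluated at x.

    B_n(f)(x) = sum_k f(k/n) * C(n, k) * x**k * (1 - x)**(n - k),
    i.e. the expectation of f(K/n) for K ~ Binomial(n, x).
    """
    return sum(f(k / n) * math.comb(n, k) * x**k * (1 - x) ** (n - k)
               for k in range(n + 1))


# For f(x) = x**2 the error is exactly x(1 - x)/n, the variance of K/n.
n, x = 10, 0.3
print(bernstein(lambda u: u * u, n, x) - x * x)   # close to 0.3 * 0.7 / 10 = 0.021
```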

#### Remark 4:

Suppose *SI Appendix*, *Proposition 2* shows that whenever the original density *Condition* *Condition* *Condition* *Condition*

## Comparison of Two Multinomial Distributions

Here we consider two dice, blue and red, each with

Let

We study the following problem. Fix a side

Clearly, as *Condition*

### Theorem 2.

*Suppose that* *and* *satisfy Condition* *. Let* *and* *. Then there exists* *so that for every deterministic sequence of choices of the dice to be tossed*,*for all* *with* *and all* *with* *and*

We prove *Theorem 2* in the next section.
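The Monte Carlo sketch below illustrates the kind of statement Theorem 2 makes, under several assumptions of our own. Both dice are tossed n times in a fixed (deterministic) design. The blue die's rare-side probability is c times the red die's, with c = 2 as an illustrative value. Each rare-side probability has a Beta(a, b) prior, which is the marginal of a Dirichlet prior. The code estimates how often the posterior mean for blue exceeds that for red. It holds the expected red count n·q fixed while n grows by orders of magnitude, so the probabilities shrink toward zero. To keep this cheap, the binomial sampler jumps between successes using geometric gaps, which costs about n·p steps per draw. All names are hypothetical.

```python
import math
import random


def binomial_rare(n, p, rng):
    """Draw Binomial(n, p) by jumping from one success to the next."""
    if p <= 0.0:
        return 0
    if p >= 1.0:
        return n
    log_q = math.log1p(-p)
    count, pos = 0, 0
    while True:
        # Trials up to and including the next success: Geometric(p) on {1, 2, ...}
        pos += int(math.log(1.0 - rng.random()) / log_q) + 1
        if pos > n:
            return count
        count += 1


def prob_correct_ranking(expected_red, c, n, trials=2000,
                         prior_blue=(1.0, 1.0), prior_red=(1.0, 1.0), seed=0):
    """Monte Carlo estimate of P(posterior mean for blue > posterior mean for red).

    Red's rare-side probability is q = expected_red / n; blue's is p = c * q.
    With a Beta(a, b) prior, the posterior mean is (k + a) / (n + a + b).
    """
    a_b, b_b = prior_blue
    a_r, b_r = prior_red
    q = expected_red / n
    p = c * q
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        k_blue = binomial_rare(n, p, rng)
        k_red = binomial_rare(n, q, rng)
        if (k_blue + a_b) / (n + a_b + b_b) > (k_red + a_r) / (n + a_r + b_r):
            hits += 1
    return hits / trials


# Keep n * q = 10 fixed while the probabilities shrink by orders of magnitude.
for n in (10**4, 10**6, 10**8):
    print(n, prob_correct_ranking(expected_red=10, c=2, n=n))
```

The three estimates should be roughly equal. The counts are approximately Poisson with means 20 and 10 whatever n is, which reflects the point that the expected number of rare-side observations, not n alone, drives the accuracy.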

Note that the only constraints on the sample size here are that the product of

In the language of hypothesis testing, *Theorem 2* says that under the stated condition on the prior, the test that rejects the null hypothesis

We now turn to the case where the dice are randomly chosen. The probability of choosing the blue die need not be constant over time but must not depend on the unknown parameter

### Corollary 1.

*Suppose that* *and* *satisfy Condition* *. Let* *and* *. Suppose that in every period*, *the die to be tossed is chosen at random*, *independent of the past*, *and that**Then there exists* *so that**for all* *with* *and all* *with*

The proof of *Corollary 1* is given at the end of the next section.
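When the die is chosen at random, the number N_B of blue tosses in n periods is itself random. As a simplifying assumption, suppose the blue die is chosen independently with the same probability λ every period. Then N_B is Binomial(n, λ), and Chebyshev's inequality gives P(|N_B − λn| ≥ λn/2) ≤ 4(1 − λ)/(λn). So with high probability about λn blue tosses occur once n is large. The sketch below compares that bound with a simulated frequency. The values of λ and n and the half-way threshold are illustrative, and this is not the paper's proof.

```python
import random


def chebyshev_bound(n, lam):
    """Chebyshev bound on P(|N_B - lam*n| >= lam*n/2) for N_B ~ Binomial(n, lam).

    Var(N_B) = n*lam*(1 - lam); dividing by (lam*n/2)**2 gives 4*(1 - lam)/(lam*n).
    """
    return 4.0 * (1.0 - lam) / (lam * n)


def simulated_deviation_freq(n, lam, trials=2000, seed=0):
    """Fraction of simulated runs in which N_B strays at least lam*n/2 from lam*n."""
    rng = random.Random(seed)
    bad = 0
    for _ in range(trials):
        # Each period, choose the blue die with probability lam, independently of the past.
        n_blue = sum(1 for _ in range(n) if rng.random() < lam)
        if abs(n_blue - lam * n) >= lam * n / 2:
            bad += 1
    return bad / trials


n, lam = 200, 0.3
print(chebyshev_bound(n, lam), simulated_deviation_freq(n, lam))
```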

In the decision problem described at the beginning, *Theorem 2* and *Corollary 1* ensure that whenever surgery is the safer option, the probability that the physician actually chooses surgery is at least

In the rest of this section we assume that in every period the blue die is chosen at random with the same probability **6** is met.

The following example shows that the conditions on the prior densities cannot be omitted from *Corollary 1*.

#### Example 3:

Suppose *Condition*

The next example shows that the sample size condition of *Corollary 1*, **7**] holds uniformly for all

#### Example 4:

Suppose that *Condition*

*Examples 3* and *4* are proved in *SI Appendix*.

Suppose that after data

### Corollary 2.

*Suppose that* *and* *satisfy Condition* *. Then there exists* *such that whenever* *and* *there is probability at least* *that the posterior odds ratio of blue relative to red exceeds* *when the* *th die lands on side*

*Corollary 2* is used by Fudenberg and He in ref. 3, who provide a learning-based foundation for equilibrium refinements in signaling games. They consider a sequence of learning environments, each containing populations of blue senders, red senders, and receivers. Senders are randomly matched with receivers each period and communicate using one of *Condition* *Corollary 2* all but *Condition* *Corollary 2* shows that the Dirichlet restriction can be substantially relaxed.
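One natural reading of the posterior odds in Corollary 2, offered only as an illustration, is the observer's odds that a new toss came from the blue die rather than the red one, given that it landed on a particular side. If the priors on the two dice are independent, the predictive probability that a die shows that side is the posterior mean of its probability. The odds therefore equal the prior odds times the ratio of the two posterior means. The sketch assumes Dirichlet priors (weight a on the side, total weight A), which is exactly the restriction Corollary 2 relaxes. The function names and data are hypothetical.

```python
from fractions import Fraction


def dirichlet_posterior_mean(k, n, a, total):
    """Dirichlet posterior mean of a side seen k times in n tosses (weight a, total weight A)."""
    return (Fraction(k) + a) / (n + total)


def posterior_odds_blue(prior_blue, blue, red):
    """Odds that a new toss showing the side came from the blue die rather than the red one.

    prior_blue: prior probability that the new toss uses the blue die.
    blue, red: (k, n, a, total) tuples with each die's data and Dirichlet prior.
    Assumes independent priors on the two dice.
    """
    prior_odds = Fraction(prior_blue) / (1 - Fraction(prior_blue))
    return prior_odds * dirichlet_posterior_mean(*blue) / dirichlet_posterior_mean(*red)


# Blue showed the side 6 times in 1000 tosses; red showed it 2 times in 1000 tosses.
print(posterior_odds_blue(Fraction(1, 2), (6, 1000, 1, 6), (2, 1000, 1, 6)))   # 7/3
```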

## Proofs of *Theorem 2* and *Corollary 1*

We begin with two auxiliary results needed in the proof of *Theorem 2*. *Lemma 1* is a large deviation estimate that gives a bound on the probability that the frequency of side *Lemma 2* implies that, with probability close to *Lemmas 1* and *2* are in *SI Appendix*.

### Lemma 1.

*Let* *be a binomial random variable with parameters* *and* *and let* *be a binomial random variable with parameters* *and* *. Let* *and* *. Suppose* *and* *are independent*, *and* *. Then*

### Lemma 2.

*Let* *and* *. Then there exists* *so that if* *is a binomial random variable with parameters* *and* *and* *then*

#### Proof of Theorem 2:

Let *Proposition 1*, there exists *Lemma 1* satisfies

We now show that for all **10**] that**8**, twice the second, and finally the fourth inequality in [**10**] we get**11**].

Let **10**] **11**] yields that**9**,*Lemma 1* and the definition of *Lemma 2*, there exists

#### Remark 5:

If *Theorem 1* to give an alternative proof of *Theorem 2* for the case *Proof of Theorem 2* does not use *Theorem 1*.

### Proof of Corollary 1:

By Chebyshev’s inequality, **6**, there exists *Theorem 2*, there exists **7**] because

## Acknowledgments

We thank three referees for many useful suggestions. We thank Gary Chamberlain, Martin Cripps, Ignacio Esponda, and Muhamet Yildiz for helpful conversations. This research is supported by National Science Foundation Grant SES 1558205.

## Footnotes

¹To whom correspondence should be addressed. Email: drew.fudenberg{at}gmail.com.

Author contributions: D.F., K.H., and L.A.I. designed research, performed research, and wrote the paper.

Reviewers: K.H., Pennsylvania State University; D.P., University of California, Berkeley; and B.S., Northwestern University.

The authors declare no conflict of interest.

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1618780114/-/DCSupplemental.

## References

1. US Food and Drug Administration. *Guidance for the Use of Bayesian Statistics in Medical Device Clinical Trials* (FDA, Rockville, MD), Tech Rep 2006D–0191.
2. Thompson LA.
3. Fudenberg D, He K.
4. Diaconis P, Freedman D.
5. Bochkina NA, Green PJ.
6. Dudley R, Haughton D.


## Article Classifications

- Social Sciences
- Economic Sciences

- Physical Sciences
- Statistics