Computing the distribution of the maximum in balls-and-boxes problems with application to clusters of disease cases

Warren J. Ewens* and Herbert S. Wilf

*Department of Biology, University of Pennsylvania, Philadelphia, PA 19104-6018; and Department of Mathematics, University of Pennsylvania, Philadelphia, PA 19104-6395

Edited by Richard V. Kadison, University of Pennsylvania, Philadelphia, PA, and approved May 22, 2007 (received for review May 18, 2007)

Abstract

We present a rapid method for the exact calculation of the cumulative distribution function of the maximum of multinomially distributed random variables. The method runs in time O(mn), where m is the desired maximum and n is the number of variables. We apply the method to the analysis of two situations in which an apparent clustering of cases of a disease in some locality has raised epidemiological concerns, and these concerns have been discussed in the recent literature. We conclude that one of these clusters, the leukemia cluster in Niles, IL, in 1956–1960, may be explained on purely random grounds, whereas the other, a leukemia cluster in Fallon, NV, in 1999–2001, may not.
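To make the quantity in question concrete, the following is a minimal sketch of one straightforward way to compute it exactly. It is not the O(mn) method of the paper, which is not reproduced in this abstract. The sketch assumes that all boxes are equally likely (a symmetric multinomial) and uses the standard identity that, for N balls and n boxes, P(max ≤ m) equals N!/n^N times the coefficient of x^N in (1 + x + x^2/2! + ... + x^m/m!)^n. The function name `max_cdf` and its parameter names are hypothetical, chosen for illustration only, and the repeated polynomial multiplication used here costs roughly n·N·m rational operations.

```python
from fractions import Fraction
from math import factorial


def max_cdf(num_balls, num_boxes, m):
    """Exact P(largest box count <= m) when num_balls are thrown
    independently and uniformly into num_boxes boxes.

    Assumption: equal box probabilities. Brute-force convolution,
    not the O(mn) recurrence described in the abstract.
    """
    if m < 0:
        return Fraction(0)
    # Coefficients 1/j! of the truncated exponential series, j = 0..min(m, N).
    top = min(m, num_balls)
    trunc = [Fraction(1, factorial(j)) for j in range(top + 1)]
    # Start from the constant polynomial 1, kept up to degree N.
    poly = [Fraction(1)] + [Fraction(0)] * num_balls
    # Multiply by the truncated series once per box.
    for _ in range(num_boxes):
        new = [Fraction(0)] * (num_balls + 1)
        for i, a in enumerate(poly):
            if a == 0:
                continue
            for j, b in enumerate(trunc):
                if i + j > num_balls:
                    break
                new[i + j] += a * b
        poly = new
    # Scale the coefficient of x^N by N! / n^N to obtain a probability.
    return poly[num_balls] * factorial(num_balls) / Fraction(num_boxes) ** num_balls


if __name__ == "__main__":
    # Two balls, two boxes: no box holds more than one ball with probability 1/2.
    print(max_cdf(2, 2, 1))
```

Exact rational arithmetic is used so that small tail probabilities, such as 1 - P(max ≤ m) for an unusually large cluster, are not lost to floating-point rounding; for large N a faster recurrence of the kind the abstract describes would be preferable.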

Footnotes

  • To whom correspondence may be addressed. E-mail: wewens{at}sas.upenn.edu or wilf{at}math.upenn.edu
  • Author contributions: W.J.E. and H.S.W. designed research, performed research, analyzed data, and wrote the paper.

  • The authors declare no conflict of interest.

  • This article is a PNAS Direct Submission.
