Equitability, mutual information, and the maximal information coefficient
Edited* by David L. Donoho, Stanford University, Stanford, CA, and approved January 21, 2014 (received for review May 24, 2013)

Significance
Attention has recently focused on a basic yet unresolved problem in statistics: How can one quantify the strength of a statistical association between two variables without bias for relationships of a specific form? Here we propose a way of mathematically formalizing this “equitability” criterion, using core concepts from information theory. This criterion is naturally satisfied by a fundamental information-theoretic measure of dependence called “mutual information.” By contrast, a recently introduced dependence measure called the “maximal information coefficient” is seen to violate equitability. We conclude that estimating mutual information provides a natural and practical method for equitably quantifying associations in large datasets.
Abstract
How should one quantify the strength of association between two random variables without bias for relationships of a specific form? Despite its conceptual simplicity, this notion of statistical “equitability” has yet to receive a definitive mathematical formalization. Here we argue that equitability is properly formalized by a self-consistency condition closely related to the Data Processing Inequality. Mutual information, a fundamental quantity in information theory, is shown to satisfy this equitability criterion. These findings are at odds with the recent work of Reshef et al. [Reshef DN, et al. (2011) Science 334(6062):1518–1524], which proposed an alternative definition of equitability and introduced a new statistic, the “maximal information coefficient” (MIC), said to satisfy equitability in contradistinction to mutual information. These conclusions, however, were supported only with limited simulation evidence, not with mathematical arguments. Upon revisiting these claims, we prove that the mathematical definition of equitability proposed by Reshef et al. cannot be satisfied by any (nontrivial) dependence measure. We also identify artifacts in the reported simulation evidence. When these artifacts are removed, estimates of mutual information are found to be more equitable than estimates of MIC. Mutual information is also observed to have consistently higher statistical power than MIC. We conclude that estimating mutual information provides a natural (and often practical) way to equitably quantify statistical associations in large datasets.
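To make the self-consistency condition concrete: a dependence measure D is self-consistent if D[f(X);Y] = D[X;Y] whenever f is a deterministic function and X ↔ f(X) ↔ Y forms a Markov chain. The sketch below (a minimal illustration assuming only NumPy; the plug-in histogram estimator on rank-transformed data is our own simplification, not necessarily the estimator used in the paper) shows this invariance for mutual information when Y depends on X only through a monotonic f.

```python
import numpy as np

def mi_estimate(x, y, bins=20):
    """Plug-in mutual information estimate (in bits) on rank-transformed data.

    Rank transformation makes the estimate exactly invariant under monotonic
    reparameterizations of either variable. This simple estimator is for
    illustration only; it is biased for small samples.
    """
    rx = np.argsort(np.argsort(x))  # ranks of x
    ry = np.argsort(np.argsort(y))  # ranks of y
    pxy, _, _ = np.histogram2d(rx, ry, bins=bins)
    pxy /= pxy.sum()                      # joint distribution p(x, y)
    px = pxy.sum(axis=1, keepdims=True)   # marginal p(x)
    py = pxy.sum(axis=0, keepdims=True)   # marginal p(y)
    nz = pxy > 0
    return float((pxy[nz] * np.log2(pxy[nz] / (px @ py)[nz])).sum())

rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, 10_000)
y = x**3 + rng.normal(0.0, 0.1, 10_000)  # Y depends on X only through f(X) = X^3

# Self-consistency check: since f is deterministic and X <-> f(X) <-> Y
# is a Markov chain, an equitable measure should give D[f(X);Y] = D[X;Y].
print("I(X;Y)    ~", round(mi_estimate(x, y), 3))
print("I(f(X);Y) ~", round(mi_estimate(x**3, y), 3))
```

For this monotonic f the two estimates agree exactly, because x and f(x) have identical ranks. For non-invertible f the identity still holds for the true mutual information (by applying the Data Processing Inequality in both directions), although finite-sample estimates will then agree only approximately.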
Footnotes
1To whom correspondence should be addressed. E-mail: jkinney@cshl.edu.
Author contributions: J.B.K. and G.S.A. designed research, performed research, and wrote the paper.
The authors declare no conflict of interest.
*This Direct Submission article had a prearranged editor.
Data deposition: All analysis code reported in this paper has been deposited in the SourceForge database at https://sourceforge.net/projects/equitability/.
This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1309933111/-/DCSupplemental.
Freely available online through the PNAS open access option.
Article Classifications
- Physical Sciences
- Statistics
This article has Letters. Please see:
- Relationship between Research Article and Letter, April 29, 2014
- Relationship between Research Article and Letter, August 19, 2014
See related content:
- Reply to Murrell et al., April 29, 2014
- Falsifiability or bust, August 19, 2014