Norris and Kahn. 10.1073/pnas.0510115103.
Table 3. Multiple-hypothesis-testing outcomes
| | Called negative | Called positive | Total |
| H0 (unchanging) | U | V | m0 |
| Ha (changing) | G | S | ma |
| Total | W | R | m |
Supporting Text
FNR Procedure Motivation. Adopting established nomenclature (1), one can define the numeric outcomes from multiple-hypothesis testing (Table 3). The hypotheses (i.e., genes) can be sorted by the magnitude of their test statistics into descending order of significance: T1>T2..>Tm. The false negative rate (FNR) is simply
FNR = E(G/ma) = 1 – E(S)/ma.
Because ma is invariant for a given analysis, it was extracted from the expectation operator. We can rewrite the expected value of S for any collection of genes 1…k as
E(Sk) = Σ E(si), i = 1…k,
where si is the probability that gene i is truly changing and declared positive, notated Pr(Ha & call pos). By using conditional probability,
E(si) = Pr(call pos | Ha) × Pr(Ha).
Recall that Pr(call pos | Ha) is statistical power, notated here as pow, which can be calculated through resampling. In single-hypothesis testing, Pr(Ha) is taken as the statistical confidence, 1 – p value. This approach fails in multiple-hypothesis testing, because an unadjusted p value does not account for the number of hypotheses tested. Below, we detail the derivation of an estimator, termed c, for Pr(Ha) in the multiple-testing context:
ci = 1 – (m/m0) × (i × FDRi – (i – 1) × FDRi–1).
Thus, the expected value of S for the collection of genes 1…k is
E(Sk) = Σ (powi × ci), i = 1…k.
The FNR at gene k is then simply FNRk = 1 – E(Sk)/ma. However, because, necessarily, FNR0 = 1 and FNRm = 0, and because ma cannot be truly known, it is prudent to drop ma from the equation and, instead, substitute the terminal value of S:
FNRk = 1 – E(Sk)/E(Sm).
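The normalization by the terminal value of S can be illustrated with a short sketch. It assumes that per-gene power and confidence values are already available, in descending order of significance; the function name fnr_curve and the example values are hypothetical and are not taken from the original analysis.

```python
from itertools import accumulate


def fnr_curve(power, conf):
    """Return [FNR_1, ..., FNR_m] for genes sorted by decreasing |t|.

    power[i] and conf[i] are the per-gene pow and c values (assumed given).
    """
    if len(power) != len(conf) or not power:
        raise ValueError("power and conf must be non-empty and of equal length")
    # Running sums S_k = sum_{i<=k} pow_i * c_i
    s = list(accumulate(p * c for p, c in zip(power, conf)))
    s_m = s[-1]  # terminal value S_m, used in place of the unknown ma
    if s_m <= 0:
        raise ValueError("terminal value S_m must be positive")
    return [1.0 - s_k / s_m for s_k in s]


# Hypothetical values for four genes
print(fnr_curve([0.9, 0.8, 0.4, 0.1], [0.99, 0.95, 0.50, 0.10]))
```

By construction the last entry is 0, matching the requirement FNRm = 0.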
FNR Resampling Algorithm. Hypotheses (i.e., genes) are sorted by the magnitude of their test statistics Ti = |ti| into descending order of significance: T1 > T2 > … > Tm.
1. The null distribution is sampled by B independent permutations of the data labels, producing null statistics T0i,b for each gene i.
2. The alternate distribution is sampled by B independent bootstraps of the data, preserving category identity, producing alternate statistics Tai,b for each gene i.
3. The FDR is calculated as outlined in Box 5 of Ge et al. (1).
4. For each gene i, calculate ci = 1 – i × FDRi + (i – 1) × FDRi–1, which is the estimator c with the factor m/m0 set to 1.
5. For each gene i, sort the null statistics into descending order, T0i,1 > T0i,2 > … > T0i,B. Find the null statistic at the significance level α as T0i,(α × B).
6. For each gene i, determine its power as the fraction of alternate-distribution test statistics exceeding T0i,(α × B). Formally, powi = #{b: Tai,b > T0i,(α × B)}/B.
7. Calculate Sk = Σ (powi × ci), i = 1…k, across the entire sorted gene list.
8. Finally, determine the FNR at each gene k as FNRk = 1 – Sk/Sm.
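Steps 1, 2, 5, and 6 can be sketched for a single gene as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the Welch t statistic, the two-group layout, the function names abs_t and gene_power, and the fixed random seed are all choices made here for the example. The FDR calculation of step 3 is not reproduced.

```python
import math
import random
import statistics


def abs_t(x, y):
    """|Welch t| between two samples (the choice of statistic is an assumption)."""
    mx, my = statistics.mean(x), statistics.mean(y)
    se = math.sqrt(statistics.variance(x) / len(x) + statistics.variance(y) / len(y))
    if se == 0.0:
        # Degenerate resample with no spread in either group
        return math.inf if mx != my else 0.0
    return abs(mx - my) / se


def gene_power(x, y, alpha=0.05, B=1000, seed=0):
    """Estimate the power of one gene by resampling (steps 1, 2, 5, and 6)."""
    rng = random.Random(seed)
    pooled = list(x) + list(y)
    n_x = len(x)

    # Step 1: null statistics from B permutations of the group labels
    null = []
    for _ in range(B):
        perm = pooled[:]
        rng.shuffle(perm)
        null.append(abs_t(perm[:n_x], perm[n_x:]))

    # Step 2: alternate statistics from B bootstraps within each group
    alt = []
    for _ in range(B):
        bx = rng.choices(x, k=len(x))
        by = rng.choices(y, k=len(y))
        alt.append(abs_t(bx, by))

    # Step 5: the (alpha x B)-th largest null statistic is the threshold
    null.sort(reverse=True)
    threshold = null[max(round(alpha * B), 1) - 1]

    # Step 6: power is the fraction of alternate statistics above the threshold
    return sum(t > threshold for t in alt) / B


# Hypothetical expression values for one gene in two conditions
treated = [10.1, 9.8, 10.3, 10.0, 9.9, 10.2]
control = [0.1, -0.2, 0.3, 0.0, -0.1, 0.2]
print(gene_power(treated, control, alpha=0.05, B=200, seed=1))
```

Repeating gene_power for every gene yields the powi values that enter step 7, alongside the ci values from step 4.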
Derivation of c. For any single gene, the statistical confidence corrected for multiple-hypothesis testing can be derived from the FDR, as outlined here. We assume that the FDR has been calculated for a gene list that has been sorted from strongest to weakest test-statistic magnitude, i.e., T1 > T2 > … > Tm. The FDR is thus defined as
FDRi = E(Vi/Ri).
At position i, Ri = i, that is, i genes are being declared positives. Thus,
FDRi = E(Vi)/i.
We can write the expected value of Vi as the sum of the relevant v, the genewise probabilities for V, notated as follows:
E(Vi) = Σ vk, k = 1…i.
For each gene k, vk can be calculated as the joint probability that the gene is declared significant and that it is truly negative, which can be notated as Pr(declared significant & truly negative). By using conditional probability, this can be rearranged as
vk = Pr(declared significant | truly negative) × Pr(truly negative).
The probability for any gene being declared significant, given that it is truly negative, namely, Pr(declared significant | truly negative), is the significance level, αk. Because this term is derived in the context of the FDR, it is a significance level corrected for multiple-hypothesis testing, so we will notate it αk,FDR. We will take the unconditional probability of being truly negative simply as m0/m. Thus,
vk = αk,FDR × (m0/m).
Returning now to the definition of the FDR, we find that
FDRi = (1/i) × Σ αk,FDR × (m0/m), k = 1…i.
Rearranging and splitting off the ith term then yields the following:
i × FDRi = [Σ αk,FDR × (m0/m), k = 1…i – 1] + αi,FDR × (m0/m),
and
i × FDRi = (i – 1) × FDRi–1 + αi,FDR × (m0/m),
and finally
αi,FDR = (m/m0) × (i × FDRi – (i – 1) × FDRi–1).
We are interested in the statistical confidence for each single gene, corrected for multiple-hypothesis testing. This can then be easily derived, knowing that statistical confidence and significance sum to one. Thus,
ci = 1 – (m/m0) × (i × FDRi – (i – 1) × FDRi–1).
Because the estimated FDR is imperfect, ci, when estimated from the FDR, can have considerable point-to-point variability. Thus, it is not suitable, in the present form, for asking questions about individual genes. But when used for collections of genes, the amassed behavior of c is sufficiently accurate for use in aggregate calculations such as the FNR algorithm presented above.
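As an illustration of the expression for c, the sketch below evaluates it for a sorted list of FDR estimates. The function name confidence_from_fdr, the parameter m0_over_m (an assumed estimate of m0/m, defaulting to 1), and the example FDR values are hypothetical. For i = 1 the term (i – 1) × FDRi–1 vanishes, so no value of FDR0 is needed.

```python
def confidence_from_fdr(fdr, m0_over_m=1.0):
    """Return [c_1, ..., c_m] from FDR estimates sorted by decreasing |t|.

    c_i = 1 - (m/m0) * (i * FDR_i - (i - 1) * FDR_{i-1})
    m0_over_m is an assumed estimate of the fraction of true nulls.
    """
    if m0_over_m <= 0:
        raise ValueError("m0_over_m must be positive")
    conf = []
    prev = 0.0  # (i - 1) * FDR_{i-1}; zero when i = 1
    for i, f in enumerate(fdr, start=1):
        cur = i * f
        alpha_i = (cur - prev) / m0_over_m  # per-gene corrected significance
        conf.append(1.0 - alpha_i)
        prev = cur
    return conf


# Hypothetical FDR estimates for the four most significant genes
print(confidence_from_fdr([0.0, 0.0, 0.25, 0.25]))
```

In this small example the third and fourth genes receive different c values even though their FDR estimates are equal, because c depends on the change in i × FDRi between adjacent positions.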
1. Ge, Y. C., Dudoit, S. & Speed, T. P. (2003) Test 12, 1–77.