Thousands of samples are needed to generate a robust gene list for predicting outcome in cancer

Table 1.

The number of samples n needed to get an overlap fc = 1 − ε, with confidence 1 − δ, for two PGLs of size αNg

fc = 1 − ε   δ     n (ref. 10)   n (ref. 11)
0.02         0.5   87            104
0.05         0.5   170           218
0.10         0.5   290           383
0.20         0.5   553           743
0.50         0.5   2,300         3,142
0.02         0.1   178           195
0.05         0.1   270           319
0.10         0.1   412           507
0.20         0.1   736           930
0.50         0.1   3,026         3,883
We use α = 0.0046 (corresponding to a PGL of 70 genes) and α = 0.0068 (corresponding to a PGL of 76 genes) for refs. 10 and 11, respectively. For δ = 0.5, fc = f*n, and hence n represents the number of samples needed for an average overlap of 1 − ε. The effective number of genes used here (after preprocessing) was Ng = 15,125 for ref. 10 and Ng = 11,130 for ref. 11.
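
As a quick consistency check on these parameters, the short Python sketch below recomputes α as the PGL size divided by Ng, which is what the caption's "PGLs of size αNg" implies. The function name `pgl_fraction` is hypothetical and introduced only for illustration; the inputs are the gene counts quoted above.

```python
def pgl_fraction(pgl_size: int, n_genes: int) -> float:
    """Fraction alpha of genes in the list, assuming PGL size = alpha * Ng."""
    return pgl_size / n_genes


# Values quoted in the table footnote.
alpha_ref10 = pgl_fraction(70, 15125)  # ref. 10: 70-gene PGL, Ng = 15,125
alpha_ref11 = pgl_fraction(76, 11130)  # ref. 11: 76-gene PGL, Ng = 11,130

print(f"{alpha_ref10:.4f}")  # 0.0046
print(f"{alpha_ref11:.4f}")  # 0.0068
```

Both values agree, to four decimal places, with the α values stated for refs. 10 and 11.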