Middendorf et al. 10.1073/pnas.0409515102.
Fig. 6. Percolative events. As p* is lowered, many small components join together to form a single large connected network. Major percolation events occur only for values above p* = 0.65. At p* = 0.65, there is one giant component (1,433 nodes) and 703 other small components, each with at most 15 nodes.
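The percolation behavior described in Fig. 6 can be explored with a small sketch that thresholds edges by confidence and lists the resulting component sizes. This is an illustration only: the function name `component_sizes`, the toy edge list, and the choice to keep edges with confidence greater than or equal to p* are assumptions, not details taken from the paper.

```python
from collections import Counter

def component_sizes(edges, threshold):
    """Return connected-component sizes (largest first), using only
    edges whose confidence is >= threshold (assumed convention).
    Nodes with no retained edge are not counted."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v, conf in edges:
        if conf >= threshold:
            ru, rv = find(u), find(v)
            if ru != rv:
                parent[ru] = rv  # merge the two components

    sizes = Counter(find(x) for x in list(parent))
    return sorted(sizes.values(), reverse=True)

# Hypothetical toy data: (protein_a, protein_b, confidence)
toy_edges = [
    ("a", "b", 0.9), ("b", "c", 0.7), ("c", "d", 0.6),
    ("e", "f", 0.8), ("f", "g", 0.55), ("d", "e", 0.5),
]

for p_star in (0.8, 0.65, 0.5):
    print(p_star, component_sizes(toy_edges, p_star))
```

On the toy data, lowering the threshold merges the small components into a single one, which is the qualitative effect the caption describes.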
Fig. 7. Part of an alternating decision tree (ADT) for one of the folds learned on training data with p* = 0.65 and all 8-step subgraphs. The subgraph labels are given by Fig. 8.
Fig. 8. Subgraphs associated with the subgraph labels in Fig. 7.
Fig. 9. Rank scores for 8-step subgraphs and p* = 0.65. Rank scores for all 148 8-step subgraphs and for every mechanism, based on training data for a confidence threshold p* = 0.65. The subgraphs are sorted by similarity in rank scores. A rank score of 50% indicates that the median of the distribution associated with a given subgraph and network mechanism is equal to Drosophila’s subgraph count. The upper histogram shows the raw subgraph counts for the Drosophila network at p* = 0.65. The corresponding subgraphs for the labels S1-S148 are shown in Fig. 10.
Fig. 10. Subgraphs associated with Fig. 9. All 148 8-step subgraphs are shown.
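The rank score used in these captions can be read as a percentile: where the Drosophila subgraph count falls within the distribution of counts generated by a given mechanism. The sketch below is one plausible way to compute such a score, assuming ties are split evenly so that an observation equal to the median scores 50%. The function name `rank_score` and the tie-handling rule are assumptions, not the authors' stated definition.

```python
def rank_score(model_counts, observed_count):
    """Percentile position (0-100) of observed_count within model_counts.

    Ties count as half (assumed convention), so an observed count equal
    to the median of the model distribution scores 50.
    """
    below = sum(c < observed_count for c in model_counts)
    equal = sum(c == observed_count for c in model_counts)
    return 100.0 * (below + 0.5 * equal) / len(model_counts)

# Hypothetical counts of one subgraph in five model-generated networks
print(rank_score([1, 2, 3, 4, 5], 3))
```

Scores near 0% or 100% would then indicate that the mechanism systematically over- or under-produces that subgraph relative to the Drosophila network.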
Fig. 11. Rank scores for 8-step subgraphs and p* = 0.5. Rank scores for all 148 8-step subgraphs and for every mechanism, based on training data for a confidence threshold p* = 0.5. The subgraphs are sorted by similarity in rank scores. A rank score of 50% indicates that the median of the distribution associated with a given subgraph and network mechanism is equal to Drosophila’s subgraph count. The upper histogram shows the raw subgraph counts for the Drosophila network at p* = 0.5. The corresponding subgraphs for the labels S1-S148 are shown in Fig. 12.
Fig. 12. Subgraphs associated with Fig. 11. All 148 8-step subgraphs are shown.
Fig. 13. Rank scores for 7-edge-subgraphs and p* = 0.65. Rank scores for all 130 7-edge subgraphs and for every mechanism, based on training data for a confidence threshold p* = 0.65. The subgraphs are sorted by similarity in rank scores. A rank score of 50% indicates that the median of the distribution associated with a given subgraph and network mechanism is equal to Drosophila’s subgraph count. The upper histogram shows the raw subgraph counts for the Drosophila network at p* = 0.65. The corresponding subgraphs for the labels S1-S130 are shown in Fig. 14.
Fig. 14. Subgraphs associated with Fig. 13. All 130 7-edge subgraphs are shown.
Fig. 15. Drosophila with artificial noise: Color-coded raw subgraph counts for Drosophila’s network with artificially introduced noise. The y axis shows the fraction of edges that have been replaced randomly. The subgraph counts are averaged over 200 independent realizations of the randomization procedure. The subgraph labels correspond to Fig. 8.
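The noise experiment in Fig. 15 can be sketched as follows: replace a given fraction of edges with randomly placed ones, recount a subgraph statistic, and average over many realizations. Several choices here are assumptions made for illustration: replacement edges are drawn between uniformly random node pairs, and a simple triangle count stands in for the paper's subgraph counts. All function names and the toy graph are hypothetical.

```python
import random

def rewire(edges, nodes, fraction, rng):
    """Replace `fraction` of the edges with edges between random node pairs.
    Edges are undirected tuples (u, v) with u < v; `nodes` is a list."""
    edges = set(edges)
    n_replace = round(fraction * len(edges))
    for e in rng.sample(sorted(edges), n_replace):
        edges.remove(e)
    while n_replace > 0:
        u, v = rng.sample(nodes, 2)
        e = (min(u, v), max(u, v))
        if e not in edges:  # skip pairs that are already connected
            edges.add(e)
            n_replace -= 1
    return edges

def count_triangles(edges):
    """Stand-in subgraph statistic: number of triangles."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    # each triangle is found once per edge, hence the division by 3
    return sum(len(adj[u] & adj[v]) for u, v in edges) // 3

def mean_count_under_noise(edges, nodes, fraction, realizations=200, seed=0):
    """Average the statistic over independent randomizations."""
    rng = random.Random(seed)
    total = sum(count_triangles(rewire(edges, nodes, fraction, rng))
                for _ in range(realizations))
    return total / realizations

# Hypothetical toy graph: one triangle plus a short chain
toy_nodes = list(range(6))
toy_edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (4, 5)]
for f in (0.0, 0.5, 1.0):
    print(f, mean_count_under_noise(toy_edges, toy_nodes, f))
```

The number of edges is kept fixed by construction, so changes in the averaged count reflect the rewiring alone.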
Table 3. Comparing the sizes of the giant components
| p* | DMC | DMR | RDG | AGV | LPA | SMW | RDS | Drosophila |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0.65 | 85 ± 65 | 1643 ± 138 | 1657 ± 121 | 308 ± 248 | 531 ± 233 | 64 ± 25 | 888 ± 336 | 1433 |
| 0.5 | 353 ± 205 | 3393 ± 184 | 3267 ± 113 | 4187 ± 904 | 4409 ± 366 | 3315 ± 1624 | 3679 ± 178 | 3039 |
Shown are the average giant component sizes for every mechanism and both confidence thresholds. Note that for this single feature, DMC does not reproduce a value close to Drosophila’s.
Table 4. Prediction accuracy for tested networks using fivefold cross-validation
| Truth \ Prediction | DMC | DMR | RDG | AGV | LPA | SMW | RDS |
| --- | --- | --- | --- | --- | --- | --- | --- |
| DMC | 99.0% | 0.0% | 0.0% | 0.0% | 0.0% | 1.0% | 0.0% |
| DMR | 0.1% | 96.1% | 3.7% | 0.0% | 0.0% | 0.0% | 0.1% |
| RDG | 0.0% | 2.4% | 97.6% | 0.0% | 0.0% | 0.0% | 0.0% |
| AGV | 0.0% | 0.0% | 0.0% | 81.4% | 10.6% | 6.9% | 1.1% |
| LPA | 0.0% | 0.0% | 0.0% | 7.8% | 92.2% | 0.0% | 0.0% |
| SMW | 0.3% | 0.0% | 0.0% | 2.5% | 0.0% | 96.9% | 0.3% |
| RDS | 0.0% | 0.0% | 0.0% | 0.8% | 0.0% | 0.4% | 98.8% |
The (i, j) entry is the probability of predicting class j given that the true class is i. The training data are based on the Drosophila protein network with p* = 0.65. The input features for the classifier are 8-step subgraphs.
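The (i, j) entries described above correspond to a row-normalized confusion matrix: counts of (true class, predicted class) pairs divided by the number of test networks in each true class. A minimal sketch follows, with the function name `confusion_rates` and the toy labels being hypothetical.

```python
from collections import Counter

def confusion_rates(true_labels, predicted_labels, classes):
    """Return rates[i][j] = fraction of class-i examples predicted as j.
    Rows for classes with no examples are filled with 0.0."""
    counts = Counter(zip(true_labels, predicted_labels))
    rates = {}
    for i in classes:
        row_total = sum(counts[(i, j)] for j in classes)
        rates[i] = {
            j: (counts[(i, j)] / row_total if row_total else 0.0)
            for j in classes
        }
    return rates

# Hypothetical held-out predictions pooled over cross-validation folds
truth = ["DMC", "DMC", "DMR", "DMR", "DMR", "DMR"]
pred = ["DMC", "DMC", "DMR", "DMR", "DMR", "RDG"]
rates = confusion_rates(truth, pred, ["DMC", "DMR", "RDG"])
print(rates["DMR"])
```

Each row of the result sums to 1 whenever the true class has at least one example, matching the row-wise percentages in the table.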
Table 5. Prediction accuracy for tested networks using fivefold cross-validation
| Truth \ Prediction | DMR | DMC | AGV | LPA | SMW | RDS | RDG |
| --- | --- | --- | --- | --- | --- | --- | --- |
| DMR | 99.3% | 0.0% | 0.0% | 0.0% | 0.0% | 0.1% | 0.6% |
| DMC | 0.0% | 99.7% | 0.0% | 0.0% | 0.3% | 0.0% | 0.0% |
| AGV | 0.0% | 0.1% | 84.7% | 13.5% | 1.2% | 0.5% | 0.0% |
| LPA | 0.0% | 0.0% | 10.3% | 89.6% | 0.0% | 0.0% | 0.1% |
| SMW | 0.0% | 0.0% | 0.6% | 0.0% | 99.0% | 0.4% | 0.0% |
| RDS | 0.0% | 0.0% | 0.2% | 0.0% | 0.8% | 99.0% | 0.0% |
| RDG | 0.9% | 0.0% | 0.0% | 0.1% | 0.0% | 0.0% | 99.0% |
The (i, j) entry is the probability of predicting class j given that the true class is i. The training data are based on the Drosophila protein network with p* = 0.5. The input features for the classifier are 8-step subgraphs.
Table 6. Prediction accuracy for tested networks using fivefold cross-validation
| Truth \ Prediction | DMC | DMR | RDG | AGV | LPA | SMW | RDS |
| --- | --- | --- | --- | --- | --- | --- | --- |
| DMC | 99.3% | 0.0% | 0.0% | 0.0% | 0.0% | 0.7% | 0.0% |
| DMR | 0.1% | 97.0% | 2.9% | 0.0% | 0.0% | 0.0% | 0.0% |
| RDG | 0.0% | 2.7% | 97.0% | 0.2% | 0.1% | 0.0% | 0.0% |
| AGV | 0.0% | 0.0% | 0.0% | 82.7% | 10.6% | 6.1% | 0.6% |
| LPA | 0.0% | 0.0% | 0.0% | 9.1% | 90.9% | 0.0% | 0.0% |
| SMW | 0.3% | 0.0% | 0.0% | 2.8% | 0.0% | 96.7% | 0.2% |
| RDS | 0.0% | 0.0% | 0.0% | 0.5% | 0.0% | 0.4% | 99.1% |
The (i, j) entry is the probability of predicting class j given that the true class is i. The training data are based on the Drosophila protein network with p* = 0.65. The input features for the classifier are subgraphs with up to seven edges.
Table 7. Performance: Training- and test-losses for different confidence thresholds p* and different cut-offs in subgraph size
| | Eight-step subgraphs, p* = 0.65 | Eight-step subgraphs, p* = 0.5 | Subgraphs with up to seven edges, p* = 0.65 |
| --- | --- | --- | --- |
| Training-loss | 1.3 ± 0.2% | 0.0 ± 0.0% | 1.4 ± 0.2% |
| Test-loss | 5.4 ± 0.3% | 4.2 ± 0.4% | 5.3 ± 0.4% |
Table 9. Prediction scores for the Drosophila protein network for different confidence thresholds p* and different cut-offs in subgraph size
| Rank | Class (eight-step subgraphs, p* = 0.65) | Score | Class (subgraphs with up to seven edges, p* = 0.65) | Score | Class (eight-step subgraphs, p* = 0.5) | Score |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | DMC | 8.2 ± 1.0 | DMC | 8.6 ± 1.1 | DMC | 0.8 ± 2.9 |
| 2 | DMR | −6.8 ± 0.9 | DMR | −6.1 ± 1.7 | DMR | −2.1 ± 2.0 |
| 3 | RDG | −9.5 ± 2.3 | RDG | −9.3 ± 1.6 | AGV | −3.1 ± 2.2 |
| 4 | AGV | −10.6 ± 4.2 | AGV | −11.5 ± 4.1 | LPA | −10.1 ± 3.1 |
| 5 | LPA | −16.5 ± 3.4 | LPA | −14.3 ± 3.2 | SMW | −20.6 ± 1.9 |
| 6 | SMW | −18.9 ± 0.7 | SMW | −18.3 ± 1.9 | RDS | −22.3 ± 1.7 |
| 7 | RDS | −19.1 ± 2.3 | RDS | −19.9 ± 1.5 | RDG | −22.5 ± 4.7 |
Drosophila is consistently classified as a DMC network, with an especially strong prediction for a confidence threshold of p* = 0.65 and independently of the cut-off in subgraph size.
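A ranking of this kind can be produced by averaging each class's prediction score over several trained classifiers and sorting by the mean. The sketch below assumes that the ± values are standard deviations across cross-validation folds, which is not stated in this section. The function name `rank_classes` and the per-fold scores are hypothetical and are not the paper's data.

```python
from statistics import mean, stdev

def rank_classes(fold_scores):
    """Sort classes by mean prediction score, highest first.
    Returns a list of (class, (mean, sample standard deviation))."""
    summary = {c: (mean(s), stdev(s)) for c, s in fold_scores.items()}
    return sorted(summary.items(), key=lambda kv: kv[1][0], reverse=True)

# Hypothetical per-fold scores for three classes (not taken from Table 9)
toy_scores = {
    "DMR": [-6.0, -7.0, -8.0],
    "DMC": [8.0, 9.0, 7.0],
    "RDG": [-10.0, -9.0, -11.0],
}
ranking = rank_classes(toy_scores)
for rank, (cls, (m, s)) in enumerate(ranking, start=1):
    print(rank, cls, f"{m:.1f} ± {s:.1f}")
```

A positive mean score for exactly one class, as in the toy data, mirrors the pattern in the table where only the top-ranked class receives a positive score.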