Inferring network mechanisms: The Drosophila melanogaster protein interaction network

Middendorf et al. 10.1073/pnas.0409515102.

Supporting Information

Files in this Data Supplement:

Supporting Figure 6
Supporting Figure 7
Supporting Figure 8
Supporting Figure 9
Supporting Figure 10
Supporting Figure 11
Supporting Figure 12
Supporting Figure 13
Supporting Figure 14
Supporting Figure 15
Supporting Table 3
Supporting Table 4
Supporting Table 5
Supporting Table 6
Supporting Table 7
Supporting Table 8
Supporting Table 9
Supporting Text





Supporting Figure 6

Fig. 6. Percolative events. As p* is lowered, many small components join together to form a single large connected network. Major percolation events occur only for values above p* = 0.65. At p* = 0.65, there is one giant component (1,433 nodes) and 703 other small components, each with at most 15 nodes.
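The component structure described in Fig. 6 can be tracked with a standard union-find pass over the thresholded edge list. This is a minimal sketch, not the authors' code. It assumes that each interaction carries a confidence value, that an edge is kept when its confidence is at least p*, and that only proteins with at least one retained edge belong to the thresholded network. The function name `component_sizes` and the toy edge list are hypothetical.

```python
from collections import Counter


def component_sizes(weighted_edges, p_star):
    """Return connected-component sizes (largest first) of the network
    obtained by keeping edges with confidence >= p_star (assumed rule)."""
    parent = {}

    def find(x):
        # Path-halving find for the union-find structure.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u, v, confidence in weighted_edges:
        if confidence < p_star:
            continue  # edge falls below the confidence threshold
        for node in (u, v):
            parent.setdefault(node, node)
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv  # merge the two components

    sizes = Counter(find(node) for node in parent)
    return sorted(sizes.values(), reverse=True)


# Hypothetical toy network: (protein, protein, confidence)
edges = [("a", "b", 0.9), ("b", "c", 0.7), ("c", "d", 0.3), ("e", "f", 0.8)]
for threshold in (0.65, 0.2):
    sizes = component_sizes(edges, threshold)
    print(threshold, "giant:", sizes[0], "others:", sizes[1:])
```

Lowering the threshold from 0.65 to 0.2 admits the c-d edge, so d joins the largest component. Scanning p* downward in this way produces the kind of percolation curve summarized in Fig. 6.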





Supporting Figure 7

Fig. 7. Part of an alternating decision tree (ADT) for one of the folds, learned on training data with p* = 0.65 and all 8-step subgraphs. The subgraph labels are defined in Fig. 8.
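For readers unfamiliar with ADTs, the sketch below shows how a binary alternating decision tree is evaluated on one network's subgraph counts. The input descends through every splitter node under each prediction node it reaches, and the prediction values along all reached paths are summed into a real-valued score whose sign gives the class. The tree, thresholds and subgraph labels are hypothetical and are not read off Fig. 7. The sketch also does not show how the paper's multiclass ADTs combine such scores.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class PredictionNode:
    value: float                       # contribution added when this node is reached
    splitters: List["Splitter"] = field(default_factory=list)


@dataclass
class Splitter:
    feature: str                       # hypothetical subgraph label, e.g. "S7"
    threshold: float
    below: PredictionNode              # taken when count < threshold
    above: PredictionNode              # taken otherwise


def adt_score(node: PredictionNode, counts: Dict[str, float]) -> float:
    """Sum prediction values over every path the input reaches."""
    total = node.value
    for s in node.splitters:           # every splitter under a reached node is evaluated
        branch = s.below if counts.get(s.feature, 0) < s.threshold else s.above
        total += adt_score(branch, counts)
    return total


# Hypothetical tree: two splitters hang off the root, one has a nested splitter.
root = PredictionNode(0.5, [
    Splitter("S7", 100, PredictionNode(-1.2), PredictionNode(0.8)),
    Splitter("S12", 40, PredictionNode(0.3, [
        Splitter("S3", 5, PredictionNode(-0.4), PredictionNode(0.6)),
    ]), PredictionNode(-0.9)),
])
score = adt_score(root, {"S7": 150, "S12": 10, "S3": 2})
print(round(score, 2), "->", "positive class" if score > 0 else "negative class")
```

Unlike an ordinary decision tree, several splitters can fire for the same input. For this input the root contributes 0.5, S7 adds 0.8, and S12 and its nested S3 add 0.3 and -0.4, for a total of 1.2.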





Supporting Figure 8

Fig. 8. Subgraphs associated with the subgraph labels in Fig. 7.





Supporting Figure 9

Fig. 9. Rank scores for 8-step subgraphs and p* = 0.65. Rank scores for all 148 8-step subgraphs and for every mechanism, based on training data for a confidence threshold p* = 0.65. The subgraphs are sorted by similarity in rank scores. A rank score of 50% indicates that the median of the distribution associated with a given subgraph and network mechanism is equal to Drosophila’s subgraph count. The upper histogram shows the raw subgraph counts for the Drosophila network at p* = 0.65. The corresponding subgraphs for the labels S1-S148 are shown in Fig. 10.
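The caption states only that a rank score of 50% means the median of a mechanism's distribution equals Drosophila's count. One definition consistent with that statement is the percentage of simulated counts below the observed count, with ties counted as one half. The sketch below uses this mid-rank convention as an assumption. The paper's exact definition may differ in direction or tie handling. The function `rank_score` and the sample counts are hypothetical.

```python
from statistics import median


def rank_score(observed, simulated):
    """Percentage of simulated counts below the observed count,
    counting ties as one half (assumed mid-rank convention)."""
    below = sum(1 for s in simulated if s < observed)
    ties = sum(1 for s in simulated if s == observed)
    return 100.0 * (below + 0.5 * ties) / len(simulated)


# Hypothetical counts of one subgraph in networks grown by one mechanism.
simulated = [120, 135, 150, 160, 171]
print(rank_score(median(simulated), simulated))   # observed count equals the median
print(rank_score(200, simulated))                 # observed exceeds every simulated count
```

Scores near 0% or 100% flag subgraphs whose Drosophila count lies in the tail of a mechanism's distribution. Scores near 50% indicate that the mechanism reproduces the count typically.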





Supporting Figure 10

Fig. 10. Subgraphs associated with Fig. 9. All 148 8-step subgraphs are shown.





Supporting Figure 11

Fig. 11. Rank scores for 8-step subgraphs and p* = 0.5. Rank scores for all 148 8-step subgraphs and for every mechanism, based on training data for a confidence threshold p* = 0.5. The subgraphs are sorted by similarity in rank scores. A rank score of 50% indicates that the median of the distribution associated with a given subgraph and network mechanism is equal to Drosophila’s subgraph count. The upper histogram shows the raw subgraph counts for the Drosophila network at p* = 0.5. The corresponding subgraphs for the labels S1-S148 are shown in Fig. 12.





Supporting Figure 12

Fig. 12. Subgraphs associated with Fig. 11. All 148 8-step subgraphs are shown.





Supporting Figure 13

Fig. 13. Rank scores for 7-edge subgraphs and p* = 0.65. Rank scores for all 130 7-edge subgraphs and for every mechanism, based on training data for a confidence threshold p* = 0.65. The subgraphs are sorted by similarity in rank scores. A rank score of 50% indicates that the median of the distribution associated with a given subgraph and network mechanism is equal to Drosophila’s subgraph count. The upper histogram shows the raw subgraph counts for the Drosophila network at p* = 0.65. The corresponding subgraphs for the labels S1-S130 are shown in Fig. 14.





Supporting Figure 14

Fig. 14. Subgraphs associated with Fig. 13. All 130 7-edge subgraphs are shown.





Supporting Figure 15

Fig. 15. Drosophila with artificial noise: Color-coded raw subgraph counts for Drosophila’s network with artificially introduced noise. The y axis shows the fraction of edges that have been replaced randomly. The subgraph counts are averaged over 200 independent realizations of the randomization procedure. The subgraph labels correspond to Fig. 8.
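The caption does not spell out the randomization procedure, so the sketch below assumes one plausible scheme. A fraction of the edges is removed uniformly at random, and the same number of new edges is placed between protein pairs that were not connected in the original network, which keeps the edge count fixed. The function `replace_edges` and the toy path network are hypothetical. The averaging over realizations mirrors the 200 runs mentioned in the caption. Here, however, the averaged quantity is simply the number of surviving original edges, not a subgraph count.

```python
import random


def replace_edges(nodes, edges, fraction, rng):
    """Replace round(fraction * |E|) randomly chosen edges with the same number
    of new edges between node pairs that were not adjacent in the original
    network (assumed scheme; the edge count is preserved)."""
    original = sorted({(min(u, v), max(u, v)) for u, v in edges})
    k = round(fraction * len(original))
    removed = set(rng.sample(original, k))
    kept = [e for e in original if e not in removed]

    existing = set(original)
    node_list = sorted(nodes)
    added = set()
    # Rejection sampling; assumes the network is sparse enough that
    # enough non-adjacent pairs exist.
    while len(added) < k:
        u, v = rng.sample(node_list, 2)
        pair = (min(u, v), max(u, v))
        if pair not in existing:
            added.add(pair)
    return kept + sorted(added)


# Hypothetical path network 0-1-...-10 with half of its 10 edges replaced.
nodes = range(11)
edges = [(i, i + 1) for i in range(10)]
rng = random.Random(0)
realizations = [replace_edges(nodes, edges, 0.5, rng) for _ in range(200)]
mean_kept = sum(len(set(r) & set(edges)) for r in realizations) / len(realizations)
print(mean_kept)
```

Rejection sampling avoids enumerating all protein pairs, which matters for networks with thousands of nodes.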





Table 3. Comparing the sizes of the giant components

p*      DMC           DMR           RDG           AGV           LPA           SMW           RDS           Drosophila
0.65    85 ± 65       1643 ± 138    1657 ± 121    308 ± 248     531 ± 233     64 ± 25       888 ± 336     1433
0.5     353 ± 205     3393 ± 184    3267 ± 113    4187 ± 904    4409 ± 366    3315 ± 1624   3679 ± 178    3039

Shown are the average giant component sizes for every mechanism and both confidence thresholds. Note that for this single feature, DMC does not reproduce a value close to Drosophila’s.
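To make the comparison in Table 3 concrete, Drosophila's giant-component size can be expressed as a distance from each mechanism's mean, in units of the reported spread. The sketch treats the ± values as standard deviations, which is an assumption. The numbers are copied from the table. The names `TABLE3`, `DROSOPHILA` and `z_scores` are hypothetical.

```python
# Mean and spread of the giant component size per mechanism (Table 3).
TABLE3 = {
    0.65: {"DMC": (85, 65), "DMR": (1643, 138), "RDG": (1657, 121),
           "AGV": (308, 248), "LPA": (531, 233), "SMW": (64, 25), "RDS": (888, 336)},
    0.5: {"DMC": (353, 205), "DMR": (3393, 184), "RDG": (3267, 113),
          "AGV": (4187, 904), "LPA": (4409, 366), "SMW": (3315, 1624), "RDS": (3679, 178)},
}
DROSOPHILA = {0.65: 1433, 0.5: 3039}


def z_scores(p_star):
    """(observed - mean) / spread for every mechanism at one threshold."""
    observed = DROSOPHILA[p_star]
    return {m: (observed - mu) / sd for m, (mu, sd) in TABLE3[p_star].items()}


for p_star in (0.65, 0.5):
    z = z_scores(p_star)
    closest = min(z, key=lambda m: abs(z[m]))
    print(p_star, "DMC z =", round(z["DMC"], 1), "| closest:", closest, round(z[closest], 2))
```

With these numbers, Drosophila's value lies about 21 spreads above the DMC mean at p* = 0.65 and about 13 at p* = 0.5, in line with the remark above. The closest mechanism is DMR at p* = 0.65 and SMW at p* = 0.5, whose spread (±1624) is by far the largest in that row.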





Table 4. Prediction accuracy for tested networks using fivefold cross-validation

Truth \ Prediction  DMC     DMR     RDG     AGV     LPA     SMW     RDS
DMC                 99.0%   0.0%    0.0%    0.0%    0.0%    1.0%    0.0%
DMR                 0.1%    96.1%   3.7%    0.0%    0.0%    0.0%    0.1%
RDG                 0.0%    2.4%    97.6%   0.0%    0.0%    0.0%    0.0%
AGV                 0.0%    0.0%    0.0%    81.4%   10.6%   6.9%    1.1%
LPA                 0.0%    0.0%    0.0%    7.8%    92.2%   0.0%    0.0%
SMW                 0.3%    0.0%    0.0%    2.5%    0.0%    96.9%   0.3%
RDS                 0.0%    0.0%    0.0%    0.8%    0.0%    0.4%    98.8%

The (i, j) entry is the probability of predicting class j given that the true class is i. The training data are based on the Drosophila protein network with p* = 0.65. The input features for the classifier are 8-step subgraphs.
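A row-normalized confusion matrix like this one can be assembled from the held-out predictions pooled over the cross-validation folds. The sketch below shows that bookkeeping under the assumption that predictions and true labels are available per test network. The function name and the toy labels are hypothetical.

```python
from collections import Counter, defaultdict


def row_normalized_confusion(true_labels, predicted_labels, classes):
    """Entry [i][j] = percentage of networks of true class i predicted as class j."""
    counts = defaultdict(Counter)
    for truth, pred in zip(true_labels, predicted_labels):
        counts[truth][pred] += 1
    matrix = {}
    for i in classes:
        total = sum(counts[i].values())
        matrix[i] = {j: 100.0 * counts[i][j] / total if total else 0.0 for j in classes}
    return matrix


# Hypothetical held-out predictions pooled over the five folds.
truth = ["DMC"] * 4 + ["SMW"] * 4
pred = ["DMC", "DMC", "DMC", "SMW", "SMW", "SMW", "SMW", "SMW"]
m = row_normalized_confusion(truth, pred, ["DMC", "SMW"])
print(m["DMC"])   # {'DMC': 75.0, 'SMW': 25.0}
print(m["SMW"])   # {'DMC': 0.0, 'SMW': 100.0}
```

Because each row is normalized by the number of networks of that true class, every row sums to 100%, as the rows of the table above do.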





Table 5. Prediction accuracy for tested networks using fivefold cross-validation

Truth \ Prediction  DMR     DMC     AGV     LPA     SMW     RDS     RDG
DMR                 99.3%   0.0%    0.0%    0.0%    0.0%    0.1%    0.6%
DMC                 0.0%    99.7%   0.0%    0.0%    0.3%    0.0%    0.0%
AGV                 0.0%    0.1%    84.7%   13.5%   1.2%    0.5%    0.0%
LPA                 0.0%    0.0%    10.3%   89.6%   0.0%    0.0%    0.1%
SMW                 0.0%    0.0%    0.6%    0.0%    99.0%   0.4%    0.0%
RDS                 0.0%    0.0%    0.2%    0.0%    0.8%    99.0%   0.0%
RDG                 0.9%    0.0%    0.0%    0.1%    0.0%    0.0%    99.0%

The (i, j) entry is the probability of predicting class j given that the true class is i. The training data are based on the Drosophila protein network with p* = 0.5. The input features for the classifier are 8-step subgraphs.
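In fivefold cross-validation, each labelled network is held out for testing exactly once, while the classifier is trained on the other four folds. The sketch below produces such splits. It assumes a stratified assignment in which each class is shuffled and dealt round-robin so every fold gets a similar class mix. The paper may split differently, and the function name `stratified_folds` is hypothetical.

```python
import random


def stratified_folds(labels, k=5, seed=0):
    """Yield (train_indices, test_indices) for k folds; networks of each
    class are shuffled and dealt round-robin so every fold gets a similar
    class mix (assumed; the paper may split differently)."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    fold_of = {}
    for indices in by_class.values():
        rng.shuffle(indices)
        for position, idx in enumerate(indices):
            fold_of[idx] = position % k
    for fold in range(k):
        test = [i for i in range(len(labels)) if fold_of[i] == fold]
        train = [i for i in range(len(labels)) if fold_of[i] != fold]
        yield train, test


labels = ["DMR"] * 10 + ["RDG"] * 10    # hypothetical labelled networks
for train, test in stratified_folds(labels):
    print(len(train), len(test), sorted(labels[i] for i in test))
```

Pooling the test-fold predictions from all five splits gives one prediction per network, which is what a confusion matrix such as the one above summarizes.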





Table 6. Prediction accuracy for tested networks using fivefold cross-validation

Truth \ Prediction  DMC     DMR     RDG     AGV     LPA     SMW     RDS
DMC                 99.3%   0.0%    0.0%    0.0%    0.0%    0.7%    0.0%
DMR                 0.1%    97.0%   2.9%    0.0%    0.0%    0.0%    0.0%
RDG                 0.0%    2.7%    97.0%   0.2%    0.1%    0.0%    0.0%
AGV                 0.0%    0.0%    0.0%    82.7%   10.6%   6.1%    0.6%
LPA                 0.0%    0.0%    0.0%    9.1%    90.9%   0.0%    0.0%
SMW                 0.3%    0.0%    0.0%    2.8%    0.0%    96.7%   0.2%
RDS                 0.0%    0.0%    0.0%    0.5%    0.0%    0.4%    99.1%

The (i, j) entry is the probability of predicting class j given that the true class is i. The training data are based on the Drosophila protein network with p* = 0.65. The input features for the classifier are subgraphs with up to seven edges.





Table 7. Performance: training and test losses for different confidence thresholds p* and subgraph-size cut-offs

                 Eight-step subgraphs          Subgraphs with up to seven edges
                 p* = 0.65      p* = 0.5       p* = 0.65
Training loss    1.3 ± 0.2%     0.0 ± 0.0%     1.4 ± 0.2%
Test loss        5.4 ± 0.3%     4.2 ± 0.4%     5.3 ± 0.4%
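Assume each mechanism contributes equally many test networks. The test loss is then 100% minus the average diagonal entry of the corresponding confusion matrix. The check below applies this to the diagonals of Tables 4, 5 and 6. It recovers the test losses listed in Table 7 (5.4%, 4.2% and 5.3%), which suggests the tables are mutually consistent. The equal-class-size assumption is ours, not a statement from the paper.

```python
# Diagonals (correct-classification rates, %) of Tables 4, 5 and 6.
DIAGONALS = {
    "Table 4 (8-step, p* = 0.65)": [99.0, 96.1, 97.6, 81.4, 92.2, 96.9, 98.8],
    "Table 5 (8-step, p* = 0.5)": [99.3, 99.7, 84.7, 89.6, 99.0, 99.0, 99.0],
    "Table 6 (up to 7 edges, p* = 0.65)": [99.3, 97.0, 97.0, 82.7, 90.9, 96.7, 99.1],
}


def loss_from_diagonal(diagonal):
    """Overall error (%) assuming every class has the same number of test networks."""
    return 100.0 - sum(diagonal) / len(diagonal)


for name, diag in DIAGONALS.items():
    print(f"{name}: {loss_from_diagonal(diag):.1f}%")
```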

 

 

Table 9. Prediction scores for the Drosophila protein network for different confidence thresholds p* and different cut-offs in subgraph size

        Eight-step subgraphs                Subgraphs with up to seven edges    Eight-step subgraphs
        p* = 0.65                           p* = 0.65                           p* = 0.5
Rank    Class   Score                       Class   Score                       Class   Score
1       DMC     8.2 ± 1.0                   DMC     8.6 ± 1.1                   DMC     0.8 ± 2.9
2       DMR     −6.8 ± 0.9                  DMR     −6.1 ± 1.7                  DMR     −2.1 ± 2.0
3       RDG     −9.5 ± 2.3                  RDG     −9.3 ± 1.6                  AGV     −3.1 ± 2.2
4       AGV     −10.6 ± 4.2                 AGV     −11.5 ± 4.1                 LPA     −10.1 ± 3.1
5       LPA     −16.5 ± 3.4                 LPA     −14.3 ± 3.2                 SMW     −20.6 ± 1.9
6       SMW     −18.9 ± 0.7                 SMW     −18.3 ± 1.9                 RDS     −22.3 ± 1.7
7       RDS     −19.1 ± 2.3                 RDS     −19.9 ± 1.5                 RDG     −22.5 ± 4.7

Drosophila is consistently classified as a DMC network, regardless of the cut-off in subgraph size, and the prediction is especially strong at a confidence threshold of p* = 0.65.
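One simple way to quantify "especially strong" is the lead of the top-ranked class over the runner-up. This measure is an illustrative choice of ours, not one used in the paper. The sketch takes the mean scores from Table 9 and ignores the uncertainties. The names `SCORES` and `top_class_and_margin` are hypothetical.

```python
# Mean prediction scores from Table 9 (uncertainties omitted).
SCORES = {
    "8-step, p* = 0.65": {"DMC": 8.2, "DMR": -6.8, "RDG": -9.5, "AGV": -10.6,
                          "LPA": -16.5, "SMW": -18.9, "RDS": -19.1},
    "up to 7 edges, p* = 0.65": {"DMC": 8.6, "DMR": -6.1, "RDG": -9.3, "AGV": -11.5,
                                 "LPA": -14.3, "SMW": -18.3, "RDS": -19.9},
    "8-step, p* = 0.5": {"DMC": 0.8, "DMR": -2.1, "AGV": -3.1, "LPA": -10.1,
                         "SMW": -20.6, "RDS": -22.3, "RDG": -22.5},
}


def top_class_and_margin(scores):
    """Return the highest-scoring class and its lead over the runner-up."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    (best, best_score), (_, second_score) = ranked[0], ranked[1]
    return best, best_score - second_score


for setting, scores in SCORES.items():
    best, margin = top_class_and_margin(scores)
    print(f"{setting}: {best} leads by {margin:.1f}")
```

By this measure, DMC leads by about 15 at p* = 0.65 for both cut-offs. At p* = 0.5 it leads by only 2.9, which is no larger than the ±2.9 uncertainty on DMC's own score.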

 

PNAS, March 1, 2005, vol. 102, no. 9, pp. 3192-3197