Kim et al. 10.1073/pnas.0710183104.
Fig. 4. (A) The periphery of the human interactome is strongly enriched for genes under positive selection. Shown is the correlation between the likelihood of being positively selected (5) and the degree centrality. Dots are colored according to the same scheme as in Fig. 1. As expected for a highly significant Spearman rank correlation, almost all dots are near the x axis for high degree centralities, whereas high probabilities for positive selection are only observed at low degree centralities (Spearman r = -0.06, significant at P = 1.2E-06). (B) The periphery of the human interaction network is more variable on the protein sequence level. Shown is the ratio of nonsynonymous to synonymous SNPs vs. network centrality. A higher ratio (which corresponds to variability at the protein sequence level) tends to occur at the network periphery (Spearman r = -0.1, significant at P = 4.0E-04). (C Left) Degree centrality of genes with some likelihood of being under positive selection (log-likelihood ratio larger than 0) vs. all other genes. (C Right) Degree centrality of genes with a high ratio of nonsynonymous to synonymous SNPs vs. genes with a low ratio of nonsynonymous to synonymous SNPs. The significance level of the differences is given as the Wilcoxon rank sum P value between the bars.
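The Spearman rank correlation quoted in this legend can be illustrated with a minimal Python sketch. It is not the authors' analysis code: the function names `midranks` and `spearman` and the toy values are hypothetical, and ties are assumed to be handled by average ranks.

```python
from math import sqrt

def midranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the midranks."""
    rx, ry = midranks(x), midranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)

# Hypothetical toy data: degree of six proteins and a made-up selection score.
degree = [1, 1, 2, 3, 8, 20]
selection_lr = [2.5, 0.0, 1.2, 0.0, 0.1, 0.0]
rho = spearman(degree, selection_lr)  # negative: high scores sit at low degree
```

Because the coefficient depends only on ranks, any monotone rescaling of the centrality or of the selection score leaves it unchanged.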
Fig. 5. Gene expression and positive selection. Genes that are likely to be under positive selection (dN/dS > 1) have a lower average gene expression than other genes. (A) Average RMA expression value for proteins with a degree >5 and ≤5. (B) Average number of tissues in which the gene is highly expressed (higher than 80th percentile in RMA value) for proteins with a degree >5 and ≤5.
Fig. 6. Gene expression and network centrality. Proteins with a high degree have a lower average gene expression. (A) Average RMA expression value for proteins with a degree >5 and ≤5. (B) Average number of tissues in which the gene is highly expressed (higher than 80th percentile in RMA value) for proteins with a degree >5 and ≤5.
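The "number of tissues in which the gene is highly expressed" measure in panel B can be sketched as follows. This is an illustration only and rests on assumptions not stated in the legend: the 80th percentile is taken per tissue across genes, a nearest-rank percentile is used, and all names and values are hypothetical.

```python
from math import ceil

def percentile_nearest_rank(values, pct):
    """Nearest-rank percentile: smallest value with at least pct% of data <= it."""
    ordered = sorted(values)
    rank = max(1, ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]

def expression_breadth(expr, pct=80):
    """Per gene, count tissues where its value exceeds that tissue's percentile."""
    genes = list(expr)
    n_tissues = len(next(iter(expr.values())))
    cutoffs = [percentile_nearest_rank([expr[g][t] for g in genes], pct)
               for t in range(n_tissues)]
    return {g: sum(expr[g][t] > cutoffs[t] for t in range(n_tissues))
            for g in genes}

def _mean(xs):
    return sum(xs) / len(xs) if xs else float("nan")

def mean_breadth_by_degree(breadth, degree, threshold=5):
    """Average breadth for genes with degree > threshold and degree <= threshold."""
    high = [breadth[g] for g in breadth if degree[g] > threshold]
    low = [breadth[g] for g in breadth if degree[g] <= threshold]
    return _mean(high), _mean(low)

# Hypothetical toy data: RMA-like values for five genes in three tissues.
expr = {
    "g1": [9.0, 8.5, 9.2],
    "g2": [5.0, 9.5, 4.0],
    "g3": [4.0, 4.5, 5.0],
    "g4": [6.0, 6.5, 6.1],
    "g5": [3.0, 3.5, 3.2],
}
degree = {"g1": 2, "g2": 12, "g3": 7, "g4": 1, "g5": 30}
breadth = expression_breadth(expr)
high_mean, low_mean = mean_breadth_by_degree(breadth, degree)
```

The degree cutoff of 5 follows the legend; the percentile convention is a design choice that would need to match whatever the original analysis used.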
Fig. 7. (A) Correlation of the number of overlapping SDs of each gene with the degree centrality of the associated protein (Spearman r = -0.04, significant at P = 3.3E-03). (B) The periphery of the human interaction network is more variable on the level of genome rearrangements. Shown is the frequency of CNVs that intersect a given gene vs. the corresponding protein's network centrality (Spearman r = -0.03, significant at P = 0.002). (C Left) Degree centrality of genes that intersect with at least one SD vs. the centrality of all other genes. (C Right) Degree centrality of genes that intersect with at least one CNV vs. the centrality of all other genes. The significance level of the differences is given as the Wilcoxon rank sum P value between the bars.
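The Wilcoxon rank sum comparison used for the bar plots in panel C can be sketched in Python as below. This is a simplified illustration, not the authors' procedure: it assumes a two-sided normal approximation without tie or continuity correction, and the function name `rank_sum_test` and the degree values are hypothetical.

```python
from math import erfc, sqrt

def rank_sum_test(a, b):
    """Two-sided Wilcoxon rank sum (Mann-Whitney U) test, normal approximation.

    Assumption: no tie or continuity correction, so P values are only
    approximate for small or heavily tied samples.
    """
    n1, n2 = len(a), len(b)
    # U counts pairs in which the value from `a` exceeds the value from `b`;
    # tied pairs contribute one half.
    u = sum((x > y) + 0.5 * (x == y) for x in a for y in b)
    mean_u = n1 * n2 / 2
    sd_u = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    p = erfc(abs(z) / sqrt(2))  # two-sided tail area of the standard normal
    return u, z, p

# Hypothetical degrees of genes overlapping at least one SD vs. all other genes.
degree_with_sd = [1, 1, 2, 3, 2, 4]
degree_without_sd = [3, 5, 8, 2, 13, 21, 6]
u, z, p = rank_sum_test(degree_with_sd, degree_without_sd)
```

A negative `z` here means the first group tends to have lower degree, which is the direction the legend describes for genes intersecting SDs or CNVs.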
Table 6. Different interaction networks and their correlation with positive selection, SDs, SNPs, and CNVs
| Network, centrality measure | Positive selection pN/pS LR | pN/pS ratio | Number of overlapping SDs | Allele frequency of overlapping CNVs |
|---|---|---|---|---|
| HPRD degree | -0.067 | -0.049 | -0.039 | -0.034 |
| HPRD betweenness | -0.060 | -0.043 | -0.038 | -0.032 |
| CCSB HC degree | -0.117 | -0.026 | -0.045 | -0.047 |
| CCSB HC betweenness | -0.120 | -0.069 | -0.078 | -0.064 |
| All HTP degree | -0.028 | -0.030 | -0.024 | -0.021 |
| All HTP betweenness | -0.026 | -0.026 | -0.035 | -0.018 |
| BioGRID degree | -0.089 | -0.047 | -0.022 | -0.017 |
| BioGRID betweenness | -0.070 | -0.042 | -0.024 | -0.012 |
| Combined without ribosomal proteins degree | -0.087 | -0.044 | -0.043 | -0.041 |
| Combined without ribosomal proteins betweenness | -0.076 | -0.040 | -0.038 | -0.032 |
To counter the effect of potential data biases in current, incomplete interaction networks, we examined a number of interaction networks that draw on different sources. We show here that the correlations with the genetic features we examined remain consistent regardless of the network used: all coefficients are negative. We selected the following networks: the Human Protein Reference Database (HPRD) network (1); the Center for Cancer Systems Biology High Confidence (CCSB HC) network, in which interactions are validated by at least one corroborating feature (2); the combination of the two high-throughput (HTP) screen networks (2, 3); the current BioGRID network (4); and finally our original network (combining HPRD with the two HTP networks) after removing all ribosomal proteins. We show the Spearman rank correlation coefficients with the positive selection likelihood ratio (LR), the pN/pS ratio, the occurrence of SDs, and the frequency of CNVs.
1. Gandhi TK, Zhong J, Mathivanan S, Karthick L, Chandrika KN, Mohan SS, Sharma S, Pinkert S, Nagaraju S, Periaswamy B, et al. (2006) Nat Genet 38:285-293.
2. Rual JF, Venkatesan K, Hao T, Hirozane-Kishikawa T, Dricot A, Li N, Berriz GF, Gibbons FD, Dreze M, Ayivi-Guedehoussou N, et al. (2005) Nature 437:1173-1178.
3. Stelzl U, Worm U, Lalowski M, Haenig C, Brembeck FH, Goehler H, Stroedicke M, Zenkner M, Schoenherr A, Koeppen S, et al. (2005) Cell 122:957-968.
4. Stark C, Breitkreutz BJ, Reguly T, Boucher L, Breitkreutz A, Tyers M (2006) Nucleic Acids Res 34:D535-D539.
5. Nielsen R, Bustamante C, Clark AG, Glanowski S, Sackton TB, Hubisz MJ, Fledel-Alon A, Tanenbaum DM, Civello D, White TJ, et al. (2005) PLoS Biol 3:e170.
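The two centrality measures compared in Table 6, degree and betweenness, can be computed from an interaction list as in the following sketch. It is an illustration under stated assumptions, not the authors' pipeline: the network is treated as undirected and unweighted, betweenness is left unnormalized (which does not affect rank correlations), and the protein names, including the excluded "RPL_x", are hypothetical.

```python
from collections import deque

def build_graph(edges, exclude=()):
    """Undirected adjacency sets from an edge list, skipping excluded proteins."""
    skip = set(exclude)
    adj = {}
    for u, v in edges:
        if u in skip or v in skip or u == v:
            continue
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def degree(adj):
    """Number of interaction partners per protein."""
    return {node: len(nbrs) for node, nbrs in adj.items()}

def betweenness(adj):
    """Unnormalized shortest-path betweenness (Brandes' algorithm, unweighted)."""
    score = dict.fromkeys(adj, 0.0)
    for s in adj:
        stack, preds = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)   # number of shortest paths from s
        dist = dict.fromkeys(adj, -1)   # -1 marks unvisited nodes
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:                    # breadth-first search from s
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        while stack:                    # accumulate dependencies, farthest first
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each undirected pair is counted from both endpoints, so halve the totals.
    return {v: b / 2 for v, b in score.items()}

# Hypothetical toy network; "RPL_x" stands in for a ribosomal protein to remove.
edges = [("A", "B"), ("B", "C"), ("C", "D"), ("B", "RPL_x"), ("C", "RPL_x")]
full = build_graph(edges)
filtered = build_graph(edges, exclude=["RPL_x"])
deg = degree(filtered)
btw = betweenness(filtered)
```

Rebuilding the graph with an exclusion list, rather than deleting nodes afterward, keeps the centralities consistent with the filtered network, which mirrors the "combined without ribosomal proteins" rows.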
Table 7. Correlation of network centrality and positive selection within cellular components
| GO Slim cellular component | Extracellular region | Membrane | Cytoplasm | Nucleus | Chromosome | All |
|---|---|---|---|---|---|---|
| Spearman ρ | -0.04 | -0.10 | -0.13 | -0.08 | 0.10 | -0.06 |
| Spearman P | 0.78 | <<0.01 | <<0.01 | 0.01 | 0.38 | <<0.01 |
We asked whether this correspondence of cellular and network centrality would explain all of the observed signs of preferential adaptation at the network periphery. Although some GO categories preferentially occur at the network periphery, for a sizeable number of tested categories the significant correlations between positive selection and network centrality/betweenness remain even when only proteins within the category are analyzed (see Tables 3-5). This indicates that the tendency toward preferential adaptation at the network periphery is, to a large extent, independent of functional category or cellular localization. Notable examples are the functional categories "response to stimulus" (Spearman correlation r = -0.1 within the category, P = 0.01) and "macromolecular metabolism." Furthermore, proteins inserted into the membrane (i.e., members of the respective GO cellular component category) follow the same trend: membrane proteins are more likely to be positively selected if they lie at the periphery of the interaction network than if they are positioned more toward the network center (Spearman correlation r = -0.1 within the category, P = 0.009).
Shown are the average degree and betweenness of proteins that are annotated to the GO cellular component terms. Also shown is the Spearman correlation of the betweenness centrality with the likelihood ratio of positive selection when only genes from the particular GO term are considered. Peripheral cellular components also tend to lie on the network periphery. Within many cellular component (and biological process) terms the correlation remains significant, and in some it becomes even stronger. Notable are the strong and significant correlations within membrane proteins at the spatial cell periphery.