New Research In
Physical Sciences
Social Sciences
Featured Portals
Articles by Topic
Biological Sciences
Featured Portals
Articles by Topic
 Agricultural Sciences
 Anthropology
 Applied Biological Sciences
 Biochemistry
 Biophysics and Computational Biology
 Cell Biology
 Developmental Biology
 Ecology
 Environmental Sciences
 Evolution
 Genetics
 Immunology and Inflammation
 Medical Sciences
 Microbiology
 Neuroscience
 Pharmacology
 Physiology
 Plant Biology
 Population Biology
 Psychological and Cognitive Sciences
 Sustainability Science
 Systems Biology
Conservation and topology of protein interaction networks under duplicationdivergence evolution

Communicated by M. Gromov, Institut des Hautes Études Scientifiques, BuressurYvette, France, April 30, 2008 (received for review March 22, 2007)
Abstract
Genomic duplicationdivergence processes are the primary source of new protein functions and thereby contribute to the evolutionary expansion of functional molecular networks. Yet, it is still unclear to what extent such duplicationdivergence processes also restrict by construction the emerging properties of molecular networks, regardless of any specific cellular functions. We address this question, here, focusing on the evolution of protein–protein interaction (PPI) networks. We solve a general duplicationdivergence model, based on the statistically necessary deletions of protein–protein interactions arising from stochastic duplications at various genomic scales, from singlegene to wholegenome duplications. Major evolutionary scenarios are shown to depend on two global parameters only: (i) a protein conservation index (M), which controls the evolutionary history of PPI networks, and (ii) a distinct topology index (M′) controlling their resulting structure. We then demonstrate that conserved, nondense networks, which are of prime biological relevance, are also necessarily scalefree by construction, irrespective of any evolutionary variations or fluctuations of the model parameters. It is shown to result from a fundamental linkage between individual protein conservation and network topology under general duplicationdivergence evolution. By contrast, we find that conservation of network motifs with two or more proteins cannot be indefinitely preserved under general duplicationdivergence evolution (independently from any network rewiring dynamics), in broad agreement with empirical evidence between phylogenetically distant species. All in all, these evolutionary constraints, inherent to duplicationdivergence processes, appear to have largely controlled the overall topology and scaledependent conservation of PPI networks, regardless of any specific biological function.
The primary source of new protein functions is generally considered to originate from duplication of existing genes followed by functional divergence of their duplicate copies (1–3). In fact, duplicationdivergence events have occurred and continue to occur at a wide range of genomic scales, from many independent duplications of individual genes^{†} [10^{−3} fixed events per gene per million years (MY) (4)] to rare but evolutionary dramatic duplications of entire genomes [one fixed event per 100–200 MY (5)]. For instance, there have been between two and four consecutive wholegenome duplications in all major eukaryote kingdoms in the past 300–500 MY (5). This actually amounts to a moreorless similar contribution of new genes from wholegenome duplication as from individual gene duplications [i.e., one fixed event per 100–200 MY ≃ 10^{−3} fixed events per gene per MY, assuming a 10% fixation rate after a wholegenome duplication with ≈10,000 genes (5)].
This succession of wholegenome duplications, together with the accumulation of individual gene duplications, must have greatly contributed to shaping the global structure of large biological networks, such as protein–protein interaction (PPI) networks, that control cellular activities. In fact, concordant empirical evidence reveals the evolutionary persistence of duplicationderived protein–protein interactions. For instance, there are clear enrichments of recent protein duplicates around common protein partners compared with randomly picked pairs of proteins (5, 6), although the fraction of proteins identified as having undergone a (recent) duplication (<200 MY) remains typically small in absolute terms, for example, 10% (4). Similarly, protein residues implicated in protein–protein interaction are generally the most conserved at the surface of proteins (7), revealing their duplicationderived origin,^{‡} with typically little more than one conserved binding interface per proteinbinding domains.^{§}
Ispolatov et al. (10) proposed an interesting local duplicationdivergence model of PPI network evolution based on (i) the statistical deletion of individual, duplicationderived interactions and (ii) a timelinear increase in genome and PPI network sizes. Clearly, the deletion of redundant interactions arising from duplication is necessary to avoid the emergence of biologically irrelevant, densely connected PPI networks, lacking lowdegree connectivities. Yet, we expect that independent local duplications and, a forteriori, partial or wholegenome duplications all lead to exponential, not timelinear, evolutionary dynamics of PPI networks. In the long time limit, exponential dynamics should outweigh all timelinear processes that have been assumed in earlier PPI network evolution models (10–15). Models based on timelinear processes also assume that local evolutionary dynamics remain essentially frozen, as long as they are not directly affected by a local modification of the network. Yet, in reality, sequence mutations and environmental changes continue to affect the evolution of whole PPI networks, not just in the immediate surroundings of recently duplicated proteins.
In this article, we propose and asymptotically solve a general duplicationdivergence model based on prevailing exponential dynamics^{¶} of PPI network evolution under local, partial, or global genome duplications. The only interaction changes that are considered are deletions of duplicationderived interactions. In particular, the rewiring dynamics of PPI networks by de novo creation of proteinbinding interfaces (4) is neglected (10), as suggested by the empirical evidence mentioned earlier (see also Evolution of PPI network motifs). Indeed, our aim here is to establish a theoretical baseline from which other evolutionary processes beyond strict gene duplication and interaction loss events, such as shuffling of protein domains (5) or horizontal gene transfers, can then be considered.
A visual overview of the model is shown in Fig. 1 including its two main effective parameters, M and M′, that control, respectively, the evolutionary history or conservation (M) and resulting structure or topology (M′) of PPI networks under duplicationdivergence evolution. In this article, we demonstrate a fundamental relation between protein conservation (M) and network topology (M′), that is, M ≤ M′, that is strictly independent from any evolutionary variations or fluctuations of the model parameters. We then discuss simple consequences in terms of evolutionary linkage between individual protein conservation and PPI network topology. The approach is also extended to outline the evolutionary statistics of smallnetwork motifs including two or more proteins. In particular, we show that network motifs, unlike individual proteins, cannot be indefinitely conserved under general duplicationdivergence evolution, regardless of any networkrewiring dynamics. Throughout the article, theoretical assumptions and results are commented on with brief discussions highlighting their biological relevance.
Results
General DuplicationDivergence Model.
The general duplicationdivergence (GDD) model is designed to capture PPI network properties caused by evolutionary constraints, inherent to duplicationdivergence processes and independent of selective adaptation (3) or any specific biological function. Concretely, the GDD model analyzes the deletion statistics of protein–protein interactions that arise from stochastic duplications at various genomic scales, from singlegene to wholegenome duplications. This deletion statistics of duplicationderived interactions is indeed a necessary “background” dynamics of PPI network evolution to prevent the emergence of biologically irrelevant, densely connected PPI networks, lacking lowdegree connectivities.
In practice, a fraction q of extant genes is randomly duplicated at each time step of the GDD model. The divergence of both duplicated and nonduplicated genes then leads to the stochastic deletion or conservation of their related interactions, before another round of duplicationdivergence occurs (Fig. 1). In the following, we first solve the GDD model assuming that q is constant over evolutionary time scales. We then study more realistic scenarios combining, for instance, rare wholegenome duplications (q = 1) with more frequent local duplications of individual genes (q ≪ 1), and including also stochastic fluctuations in all microscopic parameters of the GDD model (see Fig. 1 and below). To analyze the deletion statistics of duplicationderived interactions, we assume that ancient and recently duplicated interactions are stochastically conserved with distinct probabilities γ_{ij}'s, depending only on the recently duplicated or nonduplicated state of each protein partners, as well as on the asymmetric divergence between “old” and “new” (or more “conserved” and more “divergent”) gene duplicates (5), see the Fig. 1 legend (“s” for “singular,” nonduplicated genes and “o”/“n” for old/new asymmetrically divergent duplicates). Here, we consider nonoriented PPI networks, that is, γ_{ij} = γ_{ji}, for i, j = s, o, n.
The first effective parameters derived from these microscopic evolutionary parameters are the average rates of connectivity change Γ_{i} (i.e., k → kΓ_{i}) for each type of node i = s, o, n, where Γ_{i} = (1 − q)γ_{is} + q(γ_{io} + γ_{in}) is independent from node connectivity k. In the following, we assume Γ_{o} ≥ Γ_{n} by definition of old and new duplicates caused by asymmetric divergence. Note that selfinteracting proteins, corresponding to selflink loops, are not taken into account, for simplicity, in the main text, because they can be shown to have little effect on the asymptotic evolutionary regimes of the connectivity distribution (see SI Appendix, Fig. S3 and SI Text, for details).
We study the GDD evolutionary dynamics of PPI networks in terms of ensemble averages 〈Q^{(n)}〉 defined as the mean value of a feature Q over all realizations of the evolutionary dynamics after n successive duplications. This does not imply, of course, that all network realizations “coexist,” but only that a random selection of them is reasonably well characterized by the theoretical ensemble average. Although it is generally not the case for exponentially growing systems, here, we can show that ensemble averages over all evolutionary dynamics indeed reflect the properties of typical network realizations for biologically relevant regimes (see Statistical Properties of GDD Models in SI Appendix).
In the following, we focus on the number of proteins (or “nodes”) N_{k} of connectivity k in PPI networks, while postponing the analysis of GDD models for simple network motifs to the end of the article and the SI Appendix. The total number of nodes in the network is noted N = Σ_{k≥0} N_{k} and the total number of interactions (or “links”) L = Σ_{k≥0} kN_{k}/2. The dynamics of the ensemble averages 〈N_{k}^{(n)}〉 after n duplications is analyzed by using a generating function, The evolutionary dynamics of F^{(n)}(x) corresponds to the following recurrence deduced from the microscopic definition of the GDD model (see SI Appendix), where we note for i = s, o, n, where δ_{ij} = 1 − γ_{ij} are deletion probabilities (i, j = s, o, n) and A′_{i}(1) = (1 − q)γ_{is} + q(γ_{io} + γ_{in}) = Γ_{i}, average rates of connectivity change for each type of nodes i = s, o, n (Fig. 1).
Network Expansion (Γ) and Protein Conservation (M).
The total number of nodes generated by the GDD model, F^{(n)} (1), growths exponentially with the number of partial duplications, F^{(n)} (1) = C·(1 + q)^{n}, where C is the initial number of nodes, as a constant fraction of nodes q is duplicated at each time step. Yet, some nodes become completely disconnected from the rest of the graph during divergence and rejoin the disconnected component of size F^{(n)} (0). From a biological point of view, these disconnected nodes represent genes that have presumably lost all biological functions and become pseudogenes before being simply eliminated from the genome. We neglect the possibility for nonfunctional genes to reconvert to functional genes again after suitable mutations, and remove them at each round of partial duplication^{‖} focusing solely on the connected part of the graph.
In particular, the link growth rate Γ = (1 − q)Γ_{s} + qΓ_{o} + qΓ_{n} obtained by taking the first derivative of Eq. 2 at x = 1, controls whether the connected part of the graph is exponentially growing (Γ > 1) or shrinking (Γ < 1).
Let us now introduce another rate of prime biological interest, M = (1 − q)Γ_{s} + qΓ_{o}. It is the average rate of connectivity increase (M >1) of decrease (M < 1) for the most conserved duplicate lineage, which corresponds to a stochastic alternance between singular (s) and most conserved (o) duplicate descents. In particular, we have by construction, M < Γ = M + qΓ_{n}, independently from any evolutionary parameters, q and γ_{ij} > 0. This implies three main evolutionary regimes from the perspective of network expansion (Γ) and protein conservation (M) (Fig. 2):

If M < Γ < 1. PPI networks are vanishing in this regime with seemingly little biological relevance.

If M < 1 < Γ. PPI networks are expanding, in this case, but their proteins are not conserved over long evolutionary time scales. This implies that the networks forget their evolutionary history exponentially fast, as most nodes eventually disappear and, with them, all traces of network evolution. These networks are not preserved over time, but instead are continuously renewed from duplication of the (few) most connected nodes (Fig. 2). Individual proteins of a given network realization are thus more similar to one another than to any protein of other network realizations, which can be seen, from a speciation perspective, as PPI networks of phylogenetically distant organisms. This is in sharp contrast to the widespread structural orthology observed across all extant life forms, even though functions of orthologs often differ (see Evolution of PPI Network Motifs).

If 1 < M < Γ. By contrast, PPI networks remember their past evolution from the very beginning, in this case, as proteins statistically keep on increasing their connectivity once they have emerged from a duplicationdivergence event. This implies that most proteins are conserved throughout the evolution process and preserve some interaction partners. This is indeed in broad agreement with empirical evidence, because traces of protein conservation are even observed within the core transcriptional and translational machineries across all three major living kingdoms (16).
Evolution of PPI Network Degree Distribution.
We now turn to the evolution of the degree distribution and other topological properties of PPI networks, which correspond to the technical core of the GDD model. To this end, we rescale the exponentially growing connected graph by introducing a normalized generating function for the average degree distribution, where 〈N^{(n)}〉 = Σ_{k≥1} 〈N_{k}^{(n)}〉, that is, after removing 〈N_{0}^{(n)}〉. F^{(n)}(x) can be reconstructed from the shifted degree distribution, p̃^{(n)}(x) = p^{(n)}(x) − 1, as which yields the following recurrence for p̃^{(n)}(x), where Δ^{(n)} is the ratio between two consecutive graph sizes in terms of connected nodes, that is, Δ^{(n)} = 〈N^{(n + 1)}〉/〈N^{(n)}〉, Although Δ^{(n)} is not known a priori and should, in general, be determined selfconsistently with p̃^{(n)}(x) itself, it is directly related to the evolution of the mean degree k̄^{(n)} = Σ_{k≥1} kp_{k}^{(n)} obtained by taking the first derivative of Eq. 6 at x = 1, Hence, although connected networks grow exponentially both in terms of number of links (link growth rate Γ) and number of connected nodes (node growth rate Δ^{(n)}), features normalized over these growing networks, such as node mean connectivity (Eq. 8) or distributions of node degree (or simple network motifs, see below), exhibit richer evolutionary dynamics in the asymptotic limit n → ∞, as we will now discuss.
Asymptotic Analysis of Node Degree Distribution (M′).
The node degree distribution can be shown (see SI Appendix) to converge toward a limit function p(x), with p̃(x) = p(x) − 1 solution of the functional Eq. 6. where Δ = lim_{n→∞}Δ^{(n)} with both Δ ≤ 1 + q, the maximum node growth rate, and Δ ≤ Γ, the link growth rate, because the number of connected nodes cannot increase faster than the number of links. Asymptotic regimes with Δ = Γ correspond to the same exponential growth of the network in terms of connected nodes and links, and will be referred to as linear regimes, hereafter, whereas Δ < Γ corresponds to nonlinear asymptotic regimes, which imply a diverging mean connectivity k̄^{(n)} → ∞ in the asymptotic limit n → ∞ (Eq. 8).
To determine Δ and p(x) selfconsistently, we first express successive derivatives of p(x) at x = 1 in terms of lower derivatives by using Eq. 9, where α_{k,l} are positive functions of the 1 + 6 parameters. Inspection of this expression readily defines two classes of asymptotic regimes, regular and singular regimes, depending on the value of a topology index M′ = max_{i}(Γ_{i}), for i = s, o, n. The detailed analysis relies on the “characteristic function” h(α) = (1 − q)Γ_{s}^{α} + qΓ_{o}^{α} +qΓ_{n}^{α}, as outlined below and in Fig. 3(see SI Appendix, Asymptotic Methods, for proof details).
Regular regimes, if M′ = max_{i}(Γ_{i}) < 1, for i = s, o, n. In this case, the only possible solution is Δ = h(1) (i.e., linear regime). Hence, since M′ < 1, h(1) > h(k), and successive derivatives ∂_{x}^{k}p(1) are thus finite and positive for all k ≥ 1. This corresponds to an exponential decrease of the node degree distribution for k ≫ 1, p_{k} ∝ e^{−μk} with a power law prefactor. The limit average connectivity (Eq. 8) is finite in this case, k̄ < ∞.
Singular regimes, if M′ = max_{i}(Γ_{i}) > 1, for i = s, o, n. In this case, Eq. 10 suggests that there exists an integer r ≥ 1 for which the rth derivative is negative, ∂_{x}^{r}p(1) < 0, which is impossible by definition. This simply means that neither this derivative nor any higher ones exist (for k ≥ r). We thus look for selfconsistent solutions of the “characteristic equation” h(α) = Δ (with r − 1 < α ≤ r) corresponding to a singularity of p(x) at x = 1 and a power law tail of p_{k}, for k ≫ 1 (17), where the singular term (1 − x)^{α} is replaced by (1 − x)^{r} ln(1 − x) for α = r exactly. Several asymptotic behaviors are predicted from the convex shape of h(α) (∂_{α}^{2}h ≥ 0), depending on the signs of its derivatives h′(0) and h′(1) (Fig. 3 Inset).

If h′(0) < 0 and h′(1) < 0. There exists an α^{★} > 1 so that h(α^{★}) = h(1) and the condition Δ ≤ h(1) implies that α^{★} ≥ α ≥ 1. The solution α = 1 requires h′(1) = 0 and should be rejected in this case. Hence, because k̄ < ∞ for α > 1, we must have Δ = h(1) (linear regime) and a scalefree limit degree distribution with a unique α = α^{★} > 1, p_{k} ∝ k^{−α★−1} for k ≫ 1.

If h′(0) < 0 and h′(1) = 0. α = 1, Δ = h(1), and p_{k} ∝ k^{−2} for k ≫ 1 (k̄^{(n)} → ∞ as n → ∞).

If h′(0) < 0 and h′(1) > 0. The general condition Δ ≤ min(h(0), h(1)) leads a priori to a whole range of possible α ∈]0, 1] corresponding to stationary scalefree degree distributions with diverging mean degrees k̄^{(n)} → ∞. Yet, numerical simulations suggest that there might still be a unique asymptotic node growth rate Δ regardless of initial conditions or evolutionary trajectories, although convergence is extremely slow (see SI Appendix, Numerical simulations).

If h′(0) ≥ 0 and h′(1) > 0. Δ = h(0) = 1 + q, implying that all duplicated nodes are selected in this case. No suitable α exists as the node degree distribution is exponentially shifted toward higher and higher connectivities. This is a dense, nonstationary regime with seemingly little relevance to biological networks.
Finally, note that the characteristic equation Δ = h(α) can be recovered directly from the average change of connectivity k → kΓ_{i} and the following continuous approximation (by using N^{(n)} = Σ_{k} N_{k}^{(n)} ≃ ∫_{u} N_{u}^{(n)} du and 〈N_{k}^{(n)}〉 ∝ k^{−α−1}),
Local (q ≪ 1) and Global (q = 1) Duplication Limits.
The asymptotic degree distribution of the GDD model can be conveniently mapped into the (Γ, M) plane for two limit regimes of prime biological relevance: (i) for local duplication events (q ≪ 1 and γ_{ss} = 1; Fig. 4A) and (ii) for wholegenome duplication events (q = 1; Fig. 4B). See SI Appendix for details.
The local duplicationdivergence limit leads to scalefree limit degree distributions for both conserved and nonconserved networks, with power law exponents 1 < α + 1 ≤ 3 if γ_{so} ≃ 1 (i.e., which ensures that most previous interactions are conserved in at least one copy after duplication). By contrast, the wholegenome duplicationdivergence limit leads to a wide range of asymptotic behaviors from nonconserved, exponential regimes to conserved, scalefree regimes with arbitrary power law exponents. Conserved, nondense networks require, however, an asymmetric divergence between old and new duplicates (γ_{oo} ≠ γ_{nn}) (5) and lead to scalefree limit degree distributions with the same range of exponents 1 < α + 1 ≤ 3 for maximum divergence asymmetry (γ_{oo} ≃ 1 and γ_{nn} ≃ 0).
Evolutionary Variations of Model Parameters.
The previous analysis with fixed parameters {q, γ_{ij}} can be readily extended to combine local and global PPI network duplications (Fig. 4 A and B) or even include any evolutionary variations and stochastic fluctuations of the GDD model parameters with arbitrary series {q^{(n)},γ_{ij}^{(n)}}_{R} (see SI Appendix). Protein conservation is then found to be controlled by the cumulated product of connectivity growth/decrease rates following the most conserved, old duplicate lineage, with conserved (resp. nonconserved) protein evolutionary regimes corresponding to M > 1 (resp. M < 1).
A similar geometric average also controls the nature of the asymptotic degree distribution as the network topology index now reads, with M′ < 1 corresponding to exponential networks and M′ > 1 to scalefree (or dense) networks with an effective node degree exponent α and effective node growth rate Δ that are selfconsistent solutions of the generalized characteristic equation, where h^{(n)}(α) = (1 − q^{(n)})Γ_{s}^{(n)α} + q^{(n)}Γ_{o}^{(n)α} + q^{(n)}Γ_{n}^{(n)α}, as before. This leads to exactly the same discussion for singular regimes as with constant q and Γ_{i} (Fig. 3) because of the convexity of the generalized function h(α) (∂_{α}^{2}h(α) ≥ 0; see SI Appendix for details and discussion on the R → ∞ limit).
In particular, because (1 − q^{(n)})Γ_{s}^{(n)} + q^{(n)}Γ_{o}^{(n)} ≤ max_{i}(Γ_{i}^{(n)}) for all q^{(n)} and Γ_{i}^{(n)} (i = s, o, n), we always have M ≤ M′. This relation implies a fundamental linkage between protein conservation and network topology under general duplicationdivergence evolution, regardless of all possible evolutionary variations of the model parameters, q^{(n)} and Γ_{i}^{(n)}. We expect, in particular, that all conserved networks are necessarily scalefree (or dense) (1 < M ≤ M′), whereas exponential networks can never be conserved (M ≤ M′ < 1), under general duplicationdivergence evolution.
Evolution of PPI Network Motifs.
The generating function approach, introduced for the onenode degree distribution p_{k}^{(n)} (Eqs. 1–6), can be generalized to analyze the evolutionary statistics of multinode correlation functions and related clustering coefficient, distribution of firstneighbor average connectivity g_{k} (18) (see Fig. 6) and smallnetwork motifs. Yet, although M′ also controls transitions between major evolutionary regimes for multinode correlation functions, their analysis remains technically involved (SI Appendix).
By contrast, the conservation property of network motifs under general duplicationdivergence evolution turns out to be remarkably simple, as outlined in Fig. 5. We derive conservation indices for specific network motifs by summing over all possible combinations of s nodes (with probability κ_{s} = 1 − q) or o nodes (with probability κ_{o} = q) and the corresponding γ_{ij} (i, j = s, o) (Fig. 5). Clearly, network motifs with a larger number of interactions, p ≥ 1, have lower conservation indices, M_{p} ≃ O(γ_{ij}^{p}) (Fig. 5). Moreover, because the probability to conserve a specific interaction γ_{ij} cannot be exactly 1, because of deleterious mutations (i.e., γ_{ij} < 1), motif conservation indices M_{p} must all be <1, regardless of any parameter variations, q^{(n)} and γ_{ij}^{(n)}.
Hence, network motifs cannot be indefinitely conserved under duplicationdivergence evolution, even though their individual proteins are typically conserved in the network (if M > 1) (Fig. 2). This implies that structural orthology between individual proteins from phylogenetically distant species cannot indefinitely coincide with functional orthology at the level of protein interactions and complexes, in broad agreement with empirical evidences (19). The resulting turnover toward more and more divergent interaction partners is a simple evolutionary consequence of the GDD model, regardless of any networkrewiring dynamics (that have been neglected here). In particular, even the most conserved orthologous proteins (s/o descents) must eventually perform different functions, but conserved ancestral functions are inevitably passed down to less conserved protein complexes (s/o/n descents) in phylogenetically distant species. This inherent evolutionary constraint of the GDD model sets conceptual and practical limitations on functional annotation transfer between genomes (19).
Comparison with Empirical PPI Network Data.
Although the GDD model can, in principle, accommodate 1q + 6γ_{ij} fitting parameters, and any evolutionary variations or fluctuations q^{(n)} and γ_{ij}^{(n)}, we have restricted all quantitative comparisons with the empirical data (20) to biologically relevant regimes, limited to one or two effective fitting parameters, only.
The results, corresponding to 10^{3} to 10^{4} protein nodes and low network growth rates in the conserved, scalefree regime (1< M ≤ M′), are compared with Yeast PPI network data (20) (Fig. 6), both for the connectivity distribution p_{k} and the firstneighbor average connectivity g_{k} (18) (see Fig. 6 legend and SI Appendix for model and fitting details).
Extending these numerical simulations to larger PPI network sizes (>10^{5} nodes) shows little changes in these distributions for small k ≤ 20 (SI Appendix, Figs. S5–S7). This suggests that the available empirical PPI network for yeast (20) has essentially reached its asymptotic degree distribution, for k ≤ 20. By contrast, global convergence of the GDD model is typically quite slow for large growth rate regimes with little biological relevance, as shown in SI Appendix, Numerical simulations, Figs. S5–S7.
Conclusions
Although conservation and network topology have a priori no reason to be related features of emerging PPI networks in the course of evolution, we demonstrate in this article that they are, in fact, linked properties under general duplicationdivergence evolution, because of a fundamental linkage between protein conservation and network topology indices, that is, M ≤ M′, regardless of any variations or fluctuations of the model parameters.
By contrast, we also showed that conservation of network motifs cannot be indefinitely preserved under general duplicationdivergence evolution (independently from any network rewiring dynamics). This underlies intrinsic limitations on functional annotation transfer across phylogenetically distant genomes (19).
All in all, these evolutionary constraints, inherent to duplicationdivergence processes and independent from selective adaptation (3), appear to have largely controlled the overall topology and scaledependent conservation of PPI networks, regardless of any specific biological function.
Acknowledgments
We thank U. Alon, R. Bruinsma, M. CosentinoLagomarsino, T. Fink, R. Monasson, and M. Vergassola for discussion. This work was supported by the Ministry of Higher Education and Research, Centre National de la Recherche Scientifique, and Institut Curie.
Footnotes
 *To whom correspondence should be addressed. Email: herve.isambert{at}curie.fr

Author contributions: H.I. designed research; K.E. and H.I. performed research; K.E. and H.I. analyzed data; and K.E. and H.I. wrote the paper.

The authors declare no conflict of interest.

↵† Duplicated protein domains or subdomains are also quite common even within ancestral proteins, such as the ubiquitous aquaporin membrane proteins in eubacteria, archaea, and eukaryotes, or the TATAbinding protein from archaea and eukaryotes.

↵‡ Except for a few interesting cases of proteinbinding mimicry, typically found in virus–host protein–protein interactions (8).

↵§ Except for domains that selfassemble into homooligomers, which must have at least two binding interfaces, see table 2 in ref. 9.

↵¶ Results from the timelinear duplicationdivergence model (10) are recovered as a special limit, see supporting information (SI) Appendix.

This article contains supporting information online at www.pnas.org/cgi/content/full/0804119105/DCSupplemental.

↵‖ Note, however, that pseudogenes may still have a critical role in evolution by providing functional domains that can be fused to adjacent genes. This supports a view of PPI network evolution in terms of protein domains instead of entire proteins (SI Appendix, Fig. S6B, and ref. 5). Yet, we showed in ref. 5 that extensive domain shuffling does not change the resulting network topology from duplicationdivergence models.
 © 2008 by The National Academy of Sciences of the USA
References
 ↵
 Ohno S
 ↵
 Li WH
 ↵
 Lynch M
 ↵
 ↵
 ↵
 ↵
 Jones S,
 Thornton JM
 ↵
 Henschel A,
 Kim WK,
 Schroeder M
 ↵
 ↵
 Ispolatov I,
 Krapivsky PL,
 Yuryev
 ↵
 ↵
 ↵
 Raval A
 ↵
 ↵
 ↵
 ↵
 Flajolet P,
 Sedgewick R
 ↵
 Maslov S,
 Sneppen K
 ↵
 Yu H,
 et al.
 ↵
 Alfarano C,
 et al.
Citation Manager Formats
Sign up for Article Alerts
Jump to section
You May Also be Interested in
More Articles of This Classification
Physical Sciences
Applied Mathematics
Biological Sciences
Related Content
 No related articles found.