The topological relationship between the large-scale attributes and local interaction patterns of complex networks
 ^{*}Department of Physics and Center for Complex Network Research, University of Notre Dame, Notre Dame, IN 46556; ^{†}Department of Pathology, Northwestern University, Chicago, IL 60611; and ^{‡}Département de Physique Théorique and ^{§}Section de Mathématiques, Université de Genève, CH-1211 Geneva, Switzerland

Edited by Harry L. Swinney, University of Texas, Austin, TX, and approved November 1, 2004 (received for review August 16, 2004)
Abstract
Recent evidence indicates that the abundance of recurring elementary interaction patterns in complex networks, often called subgraphs or motifs, carries significant information about their function and overall organization. Yet, the underlying reasons for the variable quantity of different subgraph types, their propensity to form clusters, and their relationship with the networks' global organization remain poorly understood. Here we show that a network's large-scale topological organization and its local subgraph structure mutually define and predict each other, as confirmed by direct measurements in five well-studied cellular networks. We also demonstrate the inherent existence of two distinct classes of subgraphs, and show that, in contrast to the low-density type II subgraphs, the highly abundant type I subgraphs cannot exist in isolation but must naturally aggregate into subgraph clusters. The identified topological framework may have important implications for our understanding of the origin and function of subgraphs in all complex networks.
A number of complex biological and nonbiological networks were recently found to contain network motifs, representing elementary interaction patterns between small groups of nodes (subgraphs) that occur substantially more often than would be expected in a random network of similar size and connectivity (1, 2). Theoretical and experimental evidence indicates that at least some of these recurring elementary interaction patterns carry significant information about the given network's function and overall organization (1–4). For example, transcriptional regulatory networks of cells (1, 2, 5, 6), neural networks of C. elegans (2), and some electronic circuits (2) are all information processing networks that contain a significant number of feedforward loop (FFL) motifs. However, in transcriptional regulatory networks these motifs do not exist in isolation but meld into motif clusters (7), while other networks are devoid of FFLs altogether (2).
In general, all subgraphs have two important properties: their topology and the directionality of their links. In cellular networks, these two properties can be clearly separated from each other. In protein–protein interaction (PPI) networks all links are by definition nondirectional. In contrast, in transcriptional regulatory networks information flow between a transcription factor and the operon (gene) regulated by it is almost always unidirectional (1, 2). Metabolic networks occupy an intermediate position between these two extremes, because most, but not all, metabolic reactions are reversible under various growth conditions. Despite the difference in the relative role of link directionality, the large-scale organization of the three different network types is quite similar, most being characterized by a scale-free connectivity distribution and hierarchical modularity (8–12). The only exception is the incoming degree distribution (i.e., the number of transcription factors regulating a target gene) of regulatory networks, which decays faster than a power law, because the number of transcription factors that can simultaneously bind to a target gene's promoter region appears to be limited by structural constraints (13).
A coherent understanding of a network's topological and functional organization requires the development of a single framework that can explain the appearance of subgraphs and motifs, the mechanisms responsible for their aggregation into larger superstructures, and their relationship with the universal large-scale features of complex networks. Here we present such a unifying framework by focusing on five well-characterized cellular networks of a prokaryotic and a eukaryotic model organism: the metabolic and transcriptional regulatory networks of Escherichia coli and Saccharomyces cerevisiae, and the PPI network of S. cerevisiae. We show that the subgraph density in these networks can be fully predicted based on knowledge of the two parameters characterizing their global scale-free and hierarchical topology. Furthermore, we demonstrate that a network's large-scale topological organization and its local subgraph structure mutually define and predict each other. We also show the inherent existence of two distinct classes of subgraphs, demonstrating that in contrast to the low-density type II subgraphs, the highly abundant type I subgraphs cannot exist in isolation but must naturally aggregate into subgraph clusters. These results imply a fundamental unity in the origin of subgraphs and subgraph clusters in all complex networks.
Materials and Methods
The transcriptional regulatory networks of E. coli and S. cerevisiae (1, 2) are available from www.weizmann.ac.il/mcb/UriAlon. We have studied their undirected representations, where transcription factors and genes are represented by nodes and each regulation-based interaction is replaced by an undirected link. The metabolic networks of E. coli and S. cerevisiae were obtained from the WIT/ERGO database (14) (http://igweb.integratedgenomics.com/IGwit). Metabolites are represented by nodes, and undirected links connect each substrate to each product of the same reaction. The PPI network of S. cerevisiae was obtained from DIP (15) (http://dip.doe-mbi.ucla.edu). Proteins are represented by nodes, and each pairwise protein interaction is represented by an undirected link.
Results
The Abundance of Subgraphs in Cellular Networks. Table 1 lists the density of several n-node subgraphs of the five studied intracellular molecular interaction networks: the metabolic and transcriptional regulatory networks of S. cerevisiae and E. coli and the PPI network of S. cerevisiae. Our study is limited to subgraphs with n nodes and m links that can be decomposed into a central node with n – 1 neighbors, the remaining m – n + 1 links connecting these neighbors to each other. The comparison shown in Table 1 demonstrates that the densities of specific subgraphs in the corresponding E. coli and S. cerevisiae networks are comparable, underscoring the absence of significant differences in the subgraph density between the two organisms. There are notable differences, however, among the different types of molecular interaction webs even within the same organism: The metabolic and PPI networks display a much higher subgraph density than transcriptional regulatory networks. The observed paucity of certain subgraph types and the abundance of others suggest two possible scenarios for their origin: Their number may be largely determined by local functional constraints, such as the desirable signal processing properties of feedforward motifs (16, 17), or, alternatively, may primarily reflect the network's topological organization.
To assess the observed paucity of certain subgraph types and the abundance of others, we start by focusing on the two key topological parameters of a network's large-scale structure: the degree exponent, γ (18), and the hierarchical exponent, α (19). The degree exponent (γ) characterizes the number of interactions in which a node is engaged, capturing the overall inhomogeneity in the connectivity of complex cellular networks: Whereas most molecules are engaged in only a few interactions, a few hubs are linked to a significantly higher number of other molecules (nodes). These wide degree variations are captured by the degree distribution, which for the studied cellular networks follows a power law, P(k) ∼ k^{–γ} (7, 13, 20–23). In contrast, the hierarchical exponent (α) characterizes the networks' innate modularity, indicating that many small, highly interconnected groups of nodes form larger but less cohesive topological modules (7, 19). This hierarchical modularity is captured by the scaling law (24, 38) C(k) ∼ C_{0}k^{–α}, where C(k) = 2T(k)/k(k – 1) is the clustering coefficient of a node with k links, denoting the probability that a node's neighbors are linked to each other (25), and T(k) is the number of direct links between the node's k neighbors. Empirical studies indicate that each cellular network is characterized by a unique pair of (γ, α) parameters, listed in Table 2, which were determined from the scaling of P(k) and C(k) functions describing the undirected version of these networks (7, 19).
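As an illustration of the definition C(k) = 2T(k)/k(k – 1), the following minimal sketch averages the clustering coefficient over all nodes of a given degree in an undirected edge list. The function name `clustering_by_degree` and the edge-list input format are assumptions made for this example; this is not the code used in the study.

```python
from collections import defaultdict
from itertools import combinations


def clustering_by_degree(edges):
    """Return {k: mean C(k)} for an undirected graph given as (u, v) pairs."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:                      # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)

    sums = defaultdict(float)
    counts = defaultdict(int)
    for node, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            continue                    # C(k) is undefined for k < 2
        # T = number of direct links among the k neighbors of this node
        t = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        sums[k] += 2.0 * t / (k * (k - 1))
        counts[k] += 1
    return {k: sums[k] / counts[k] for k in sums}


# A triangle (0, 1, 2) with one pendant node (3) attached to node 2.
example = [(0, 1), (1, 2), (0, 2), (2, 3)]
c_of_k = clustering_by_degree(example)
```

Fitting log C(k) against log k over the scaling region would then give an estimate of α. The fitting procedure itself is not specified in the text and is left out here.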
Type I and II Subgraphs. To examine the relationship between these two parameters and the observed subgraph density, we calculated analytically the number N_{nm} of subgraphs with n nodes and m interactions expected for a network of N nodes, in which the nodes, apart from fixed (γ,α) parameters, are randomly connected to each other. As each pair of neighbors of a node with degree k is connected with a probability C(k) ∼ k^{–α}, the average number of (n,m) subgraphs that pass by a node with degree k scales as N_{nm}(k) ∼ k^{n–1–(m–n+1)α}. Summing over the degree distribution, we obtain the number of (n,m) subgraphs, N_{nm} ∼ N Σ_{k} P(k)N_{nm}(k). The convergence of this sum predicts the existence of two subgraph classes. Type I subgraphs are those that satisfy (m – n + 1)α – (n – γ) < 0, their number being given by N^{I}_{nm} ∼ N k_{max}^{(n–γ)–(m–n+1)α}, where k_{max} denotes the degree of the most connected node in the network. Type II subgraphs are those that satisfy (m – n + 1)α – (n – γ) > 0, and their number is given by N^{II}_{nm} ∼ N. As even for finite networks k_{max} ≫ 1, the typical number of type I subgraphs is significantly larger than the number of type II subgraphs (N^{I}_{nm}/N^{II}_{nm} ≫ 1). Moreover, for infinite systems (N → ∞) the relative number of type II subgraphs is vanishingly small compared with type I subgraphs, as N^{II}_{nm}/N^{I}_{nm} → 0. Table 1 supports these predictions, indicating that the density of the subgraphs with a minimal number of connections (extreme type I) (4,3), (5,4), (6,5), (7,6) is in the range 10 to 10^{5}. In contrast, the density of the subgraphs with a maximal number of connections (extreme type II) (4,6), (5,10), (6,15), (7,21) is either zero or close to zero, and always negligible compared with their type I counterparts.
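The classification rule can be written as a short sketch that evaluates the sign of (m – n + 1)α – (n – γ) for a given subgraph. The function names are hypothetical. The example values α = 1 and γ = 2.1 are those quoted in the text for the E. coli transcriptional regulatory network.

```python
def subgraph_type(n, m, gamma, alpha):
    """Classify an (n, m) subgraph by the sign of (m - n + 1)*alpha - (n - gamma)."""
    s = (m - n + 1) * alpha - (n - gamma)
    if s < 0:
        return "I"
    if s > 0:
        return "II"
    return "boundary"


def kmax_exponent(n, m, gamma, alpha):
    """Exponent of k_max in the type I scaling: (n - gamma) - (m - n + 1)*alpha."""
    return (n - gamma) - (m - n + 1) * alpha


# Extreme cases from the text, evaluated for gamma = 2.1, alpha = 1.
sparse = [subgraph_type(n, m, 2.1, 1.0) for n, m in [(4, 3), (5, 4), (6, 5), (7, 6)]]
dense = [subgraph_type(n, m, 2.1, 1.0) for n, m in [(4, 6), (5, 10), (6, 15), (7, 21)]]
```

For these parameters the minimally connected subgraphs all fall on the type I side and the fully connected ones on the type II side, in line with the trend reported for Table 1.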
The main results of our analysis are summarized in the (n,m) phase diagrams of Fig. 1, in which each square corresponds to a different subgraph. The (m – n + 1)α – (n – γ) = 0 condition, predicted to separate the type I and II subgraphs, appears as stepped yellow phase boundaries in the phase diagrams. For example, for the E. coli transcriptional regulatory network with α = 1 and γ = 2.1 (Table 2) the phase boundary corresponds to a stepped line with approximate overall slope 1 + 1/α = 2.0 and intercept –1 – γ/α = –3.1 (Fig. 1a). The type II subgraphs are those above this boundary and should be either absent or present only in very low numbers in the transcriptional regulatory network. In contrast, the type I subgraphs below the boundary are predicted to be abundant.
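Solving (m – n + 1)α – (n – γ) = 0 for m gives m = (1 + 1/α)n – 1 – γ/α, which is the line quoted above. The sketch below evaluates the slope and intercept and, as an illustration of why the boundary appears stepped, the largest integer m that is still type I for a given n. The function names are assumptions made for this example.

```python
import math


def phase_boundary(gamma, alpha):
    """Slope and intercept of the line m = slope*n + intercept separating type I from II."""
    if alpha <= 0:
        raise ValueError("alpha must be positive; the slope diverges as alpha -> 0")
    return 1.0 + 1.0 / alpha, -1.0 - gamma / alpha


def max_type_one_links(n, gamma, alpha):
    """Largest integer m strictly below the boundary for an n-node subgraph.

    A real subgraph must also satisfy n - 1 <= m <= n*(n - 1)/2; that range
    check is left to the caller.
    """
    slope, intercept = phase_boundary(gamma, alpha)
    return math.ceil(slope * n + intercept) - 1


slope, intercept = phase_boundary(2.1, 1.0)   # E. coli transcriptional values
steps = [max_type_one_links(n, 2.1, 1.0) for n in (4, 5, 6)]
```

Because m takes only integer values, the boundary advances in unit steps as n grows, which is the stepped shape described for Fig. 1.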
To visually highlight the validity of these predictions, we color-coded Fig. 1 according to the normalized count of each subgraph in each cellular network. We find a good agreement between the analytical predictions and the measured subgraph count: The normalized count of the type I subgraphs below the phase boundary is in the 10^{–2} to 1 range, in contrast with the type II subgraphs above the predicted boundary, whose normalized count is either zero or in the 10^{–9} to 10^{–3} range. Comparing Fig. 1 a–e indicates that whereas the stepped phase boundaries for the different cellular networks differ because of the differences in the (γ,α) exponents (Table 2), the observed densities in the real networks follow relatively closely the predicted phase boundaries. Occasional local deviations from the predictions can be attributed to the error bars of the (γ,α) exponents (Table 2), which allow for some local uncertainties for the phase boundary. Fig. 1 a–e also indicates that, in agreement with the empirical findings (1–4), each cellular network is characterized by a distinct set of overrepresented type I subgraphs, raising the possibility of classifying networks based on their local structure (4). Yet, the phase diagrams demonstrate that knowledge of two global topological parameters automatically uncovers the local structure of cellular networks, suggesting that a subgraph- or motif-based classification could be equivalent to a classification based on the different (γ,α) exponents characterizing these networks.
Subgraphs and Motifs. The concept of motifs was recently introduced to denote those subgraphs whose number exceeds by a preset threshold their expected count in a randomized network (1–4). Our results indicate that overrepresented type I subgraphs are innate topological features of complex networks, and we do not need to invoke a comparison to a randomized graph or introduce a threshold parameter to identify them. Indeed, the signature of type I subgraphs is that their density increases with the number of nodes in the network (N^{I}_{nm}/N ∼ k_{max}^{(n–γ)–(m–n+1)α}, where k_{max} grows with N), compared with the type II subgraphs, whose density is independent of the network size (N^{II}_{nm}/N ∼ const). The existence of the type II subgraphs is intertwined with the network's global hierarchical topology: The decreasing C(k) reduces the likelihood that the neighbors of a highly connected node are linked to each other, therefore limiting the chance that these nodes participate in highly connected subgraphs. If C(k) were independent of k (i.e., α = 0), only type I subgraphs would exist, since in the α → 0 limit the 1 + 1/α slope of the yellow phase boundary diverges, eliminating all type II subgraphs. Because the absolute count of the subgraphs is the most fundamental quantity for evaluating a local interaction pattern's topological role in a network, we will continue focusing on the direct subgraph count, limiting the discussion of motifs and the role of the randomized reference frame to the supporting information, which is published on the PNAS web site. Note that the scaling of the subgraph density with the network size N was already predicted in ref. 26. Yet, the calculation did not take into account the scaling of the clustering coefficient; thus, the results are limited to the α = 0 limit of our predictions. Thanks to the C(k) scaling, however, for realistic γ values we predict a new phase, which contains the type II subgraphs.
Subgraphs Aggregate Around Hubs. The very large densities we observe for some type I subgraphs (Tables 1 and 2) require us to explain how to distribute as many as 10^{11} subgraphs in a network with only 10^{3} nodes. We address this question by calculating the number of distinct subgraphs in which a given node (gene, metabolite, or protein) participates. We first focus on the triangle subgraph (3,3), the elementary building block of many higher-order subgraphs. A node with k links participates on average in T(k) = C(k)k(k – 1)/2 triangles. For large k this scales as T(k) ∼ k^{2–α}. Therefore, the probability that exactly T triangles pass through a node is P(T) ∼ T^{–δ}, where δ = 1 + (γ – 1)/(2 – α), a power-law dependence that indicates that whereas the majority of nodes participate in at most one or two triangles, a few nodes take part in a very large number of triangle subgraphs. The monotonic nature of T(k) indicates that the triangles are not distributed uniformly within the network but tend to aggregate around the hubs. Because a node with k links can carry up to ≈k^{2} triangles, the aggregation around the high-k hubs, visible, e.g., in Fig. 2 a and b, allows the network with a modest number of nodes to absorb a very large number of subgraphs. These calculations can be extended to arbitrary (n,m) subgraphs, in each case predicting a power law for both T(k) and P(T), with exponents that depend on the (n,m) parameters (see supporting information). To test the validity of these analytical predictions, we determined numerically P(T) and T(k) for several subgraphs in each of the studied cellular networks. As shown in Fig. 2 c and d, the results support not only the predicted power-law nature of P(T) but also the numerically determined exponents δ, which are in good agreement with the analytically predicted values (Tables 1 and 2).
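The exponent follows from a change of variables: with T ∼ k^{2–α} and P(k) ∼ k^{–γ}, P(T) = P(k) dk/dT ∼ T^{–[1 + (γ – 1)/(2 – α)]}. The sketch below counts the triangles passing through each node of an undirected edge list and evaluates the predicted δ. The function names and the input format are assumptions made for illustration, not the authors' implementation.

```python
from collections import defaultdict
from itertools import combinations


def triangles_per_node(edges):
    """Return {node: number of triangles passing through it} for an undirected graph."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:                      # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)
    # Each linked pair of neighbors closes one triangle with the node itself.
    return {
        node: sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        for node, nbrs in adj.items()
    }


def triangle_delta(gamma, alpha):
    """Predicted exponent of P(T) ~ T^(-delta) for triangles."""
    if alpha >= 2:
        raise ValueError("T(k) ~ k^(2 - alpha) must increase with k, so alpha < 2")
    return 1.0 + (gamma - 1.0) / (2.0 - alpha)


# A triangle (0, 1, 2) with one pendant node (3) attached to node 2.
t_counts = triangles_per_node([(0, 1), (1, 2), (0, 2), (2, 3)])
delta = triangle_delta(2.1, 1.0)
```

A histogram of the values in `t_counts`, taken over a real network, is the empirical P(T) that the text compares with the predicted power law.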
The fact that the P(T) distribution of the individual subgraphs can be uniquely determined by the (γ,α) exponents has a quite unexpected consequence: It indicates that the relationship between the network's global architecture and its subgraph densities is reciprocal, so that the network's large-scale topology can be uncovered from the inspection of the local subgraph structure. Indeed, by measuring the P(T) distribution for any two subgraphs (e.g., those shown in Fig. 2), and using the derived relationship between δ, α, and γ, we can determine the α and γ exponents of the overall network. Because the scaling region of P(T) is more extended than that of P(k) or C(k), displaying, e.g., over five orders of magnitude of scaling in Fig. 2d, such subgraph-based determination of γ and α can be at times more precise than the direct fitting of P(k) and C(k). Taken together, these findings indicate the equivalence of the information obtained from measurements focusing on the local (subgraph-based) and global (scale-free and hierarchical) structure of complex networks: A proper characterization of the network's local topology allows us to determine its large-scale parameters, and, conversely, the direct measurement of the network's global statistical features allows us to predict its detailed subgraph structure.
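The inversion can be sketched under one explicit assumption. The text gives δ only for triangles; here the same change of variables is applied to the scaling N_{nm}(k) ∼ k^{n–1–(m–n+1)α} stated earlier, which yields δ_{nm} = 1 + (γ – 1)/[n – 1 – (m – n + 1)α]. This general form is an extrapolation made for this example, and the relation actually used by the authors is in the supporting information, which is not reproduced here. Given two measured exponents, the two equations are linear in α once γ – 1 is eliminated.

```python
def delta_nm(n, m, gamma, alpha):
    """Assumed P(T) exponent for an (n, m) subgraph (reduces to the triangle case)."""
    e = (n - 1) - (m - n + 1) * alpha   # exponent of k in N_nm(k)
    if e <= 0:
        raise ValueError("N_nm(k) must increase with k for this formula to apply")
    return 1.0 + (gamma - 1.0) / e


def exponents_from_deltas(sub1, delta1, sub2, delta2):
    """Recover (gamma, alpha) from the P(T) exponents of two (n, m) subgraphs."""
    (n1, m1), (n2, m2) = sub1, sub2
    a1, b1 = n1 - 1, m1 - n1 + 1
    a2, b2 = n2 - 1, m2 - n2 + 1
    d1, d2 = delta1 - 1.0, delta2 - 1.0
    # gamma - 1 = d1*(a1 - b1*alpha) = d2*(a2 - b2*alpha), solved for alpha
    denom = d1 * b1 - d2 * b2
    if abs(denom) < 1e-12:
        raise ValueError("these two subgraphs do not constrain alpha independently")
    alpha = (d1 * a1 - d2 * a2) / denom
    gamma = 1.0 + d1 * (a1 - b1 * alpha)
    return gamma, alpha


# Round trip with illustrative (not measured) exponents gamma = 2.5, alpha = 0.8.
d_tri = delta_nm(3, 3, 2.5, 0.8)
d_sq = delta_nm(4, 4, 2.5, 0.8)
gamma_est, alpha_est = exponents_from_deltas((3, 3), d_tri, (4, 4), d_sq)
```

Two tree-like subgraphs (m = n – 1) carry no information about α, which is why the function rejects pairs for which the denominator vanishes.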
Subgraph Percolation Leads to Subgraph Clusters. The analytical tools we have developed allow us to uncover how the various subgraphs relate to each other, an issue that is likely to have significant influence on, e.g., a particular subgraph's potential functional properties in biological systems. The topological relationship between various subgraphs is illustrated in Fig. 3, where we show all nodes participating in several six-node subgraphs (n = 6) for each of the three studied S. cerevisiae cellular networks. The figure indicates that the underrepresented type II subgraphs, shown on the right, are either absent or form small fragmented islands with only a few nodes. As we move toward the type I subgraphs shown on the left, we not only observe a rapid increase in the subgraph density, but also a spectacular aggregation process, forcing all of the high-density type I subgraphs into a single giant cluster consisting of thousands to millions of highly interconnected subgraphs.
Our analytical methods permit us to uncover the mechanisms of the observed subgraph aggregation, predicting the existence of a percolation condition given by the equation (m – n + 1)α – (n – 2) < 0, such that the subgraphs satisfying this condition should form a giant cluster. The subgraphs that do not satisfy this condition, however, are allowed to break into isolated islands and/or vanish in size. Direct quantitative evidence for the percolation-like transition is provided by the measurement of the relative size of the largest cluster (shown as squares in Fig. 3), indicating that as we move away from the abundant type I subgraphs, from left to right, the size of the largest cluster shrinks, falling particularly rapidly in the vicinity of the predicted percolation transition. The analytical prediction, shown as a continuous line, indicates a good agreement between the predicted and the measured cluster sizes for the two larger networks (metabolic and protein). Therefore, these findings indicate that if a node participates in two or more subgraphs, such participation is imposed on the node by the network's topological constraints deriving from the need to distribute a large number of triangles among a finite number of nodes with widely different connectivity.
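The percolation condition has the same form as the type I condition with γ replaced by 2. Whenever γ ≥ 2, the inequality n – γ ≤ n – 2 therefore implies that every type I subgraph also satisfies the percolation condition, which is one way to read the statement that type I subgraphs must aggregate. The sketch below checks both conditions. The function names are hypothetical, and the parameter values are those quoted earlier for the E. coli transcriptional regulatory network.

```python
def forms_giant_cluster(n, m, alpha):
    """Percolation condition from the text: (m - n + 1)*alpha - (n - 2) < 0."""
    return (m - n + 1) * alpha - (n - 2) < 0


def is_type_one(n, m, gamma, alpha):
    """Type I condition: (m - n + 1)*alpha - (n - gamma) < 0."""
    return (m - n + 1) * alpha - (n - gamma) < 0


gamma, alpha = 2.1, 1.0
# All central-node subgraphs with 3..7 nodes: n - 1 <= m <= n*(n - 1)/2.
candidates = [(n, m) for n in range(3, 8) for m in range(n - 1, n * (n - 1) // 2 + 1)]
type_one_not_percolating = [
    (n, m) for n, m in candidates
    if is_type_one(n, m, gamma, alpha) and not forms_giant_cluster(n, m, alpha)
]
```

For these parameters the list `type_one_not_percolating` is empty. The converse does not hold in general, since a subgraph may satisfy the percolation condition while lying on the type II side.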
Directed Subgraphs. Because transcriptional regulatory interactions and some metabolic reactions are directed, we need to extend our calculations to directed subgraphs as well. For this, we consider directed subgraphs made of n nodes and m directed links that can be decomposed into a central node and n – 1 in-neighbors (j is an in-neighbor of i if there is a directed link from j to i). Among the m directed links, n – 1 connect the central node to its n – 1 in-neighbors, while the remaining m – n + 1 directed links connect any two in-neighbors. Whenever there is a link between two in-neighbors they will form, together with the central node, a FFL (1, 2). Therefore, the problem of finding the number of (n, m) directed subgraphs is equivalent to the undirected case discussed above, after replacing the degree by the in-degree, defined as the number of in-neighbors, the degree distribution by the in-degree distribution P(k_{in}), and the clustering coefficient by the FFL clustering coefficient, C_{FFL}, defined as the number of FFLs passing by a node divided by the maximum number of FFLs that can pass by it. Assuming that P(k_{in}) ∼ k_{in}^{–γ_{in}} and C_{FFL}(k_{in}) ∼ k_{in}^{–α_{in}}, our calculations again predict the existence of the type I and II subgraphs for (m – n + 1)α_{in} – (n – γ_{in}) < 0 and (m – n + 1)α_{in} – (n – γ_{in}) > 0, respectively. These results indicate that the distinction between type I and II subgraphs obtained for undirected networks is present in directed networks as well. A complete study of all directed subgraphs is also possible, but because the discussion of all possible cases is not particularly instructive, it is deferred to future work.
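A minimal sketch of the FFL clustering coefficient follows. The text does not state the normalization explicitly, so this example assumes that the maximum number of FFLs through a node with in-degree k_{in} is k_{in}(k_{in} – 1), that is, one possible directed link for each ordered pair of in-neighbors. The function name and the (source, target) edge format are also assumptions.

```python
from collections import defaultdict


def ffl_clustering(edges):
    """Return {node: C_FFL} for a directed graph given as (source, target) pairs.

    Assumption: the maximum FFL count for in-degree k_in is k_in*(k_in - 1),
    one per ordered pair of in-neighbors.
    """
    preds = defaultdict(set)            # preds[i] = in-neighbors of i
    succ = defaultdict(set)             # succ[j] = out-neighbors of j
    for j, i in edges:
        if j != i:                      # ignore self-regulation
            preds[i].add(j)
            succ[j].add(i)

    result = {}
    for i, ins in preds.items():
        k_in = len(ins)
        if k_in < 2:
            continue                    # no pair of in-neighbors, C_FFL undefined
        # A directed link a -> b between two in-neighbors closes one FFL at i.
        ffls = sum(1 for a in ins for b in ins if a != b and b in succ[a])
        result[i] = ffls / (k_in * (k_in - 1))
    return result


# Single feedforward loop: X regulates Y and Z, and Y regulates Z.
c_ffl = ffl_clustering([("X", "Y"), ("X", "Z"), ("Y", "Z")])
```

Averaging these values over nodes of equal in-degree gives the empirical C_{FFL}(k_{in}) whose scaling defines α_{in}.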
Discussion
The demonstrated equivalence between the local and global topological organization not only illustrates the importance of taking into account the mathematical realities and constraints when interpreting biological data, but also has a number of important consequences for our understanding of cellular networks. First, it is tempting to conclude that because the large-scale exponents α and γ determine the subgraph density, the global organization has priority over the local one. Such a conclusion is too simplistic, and therefore incorrect. Indeed, a series of studies have indicated that the evolution of the large-scale structure of cellular networks is the consequence of two genome-level mechanisms: gene duplication and the divergence of duplicated molecular interactions due to subsequent mutations (27–32). The combination of these processes allows one to predict the α and γ exponents, in agreement with the experimental data (27–32). In contrast, the network's local wiring diagram may be shaped by selection toward subgraphs with desirable functional properties. Therefore, whereas the global structure reflects the sum of events contributing to the network's growth and buildup, it is often implied that the local properties reflect solely evolutionary selection toward desirable functional traits (1–4). Our results indicate, however, that a sharp distinction between the local and global structure is not justified: Determining the large-scale exponents (α and γ) is equivalent to specifying the number of subgraphs, and, conversely, providing the distribution of any two subgraphs uniquely identifies the system's large-scale organization and the scaling exponents. Thus, such local processes as gene duplication and subsequent interaction divergence (32) likely determine both the network's large-scale topology (α and γ) (27–32) and the statistical relevance and density of subgraphs.
This common origin of the local and global characteristics is the most likely biological reason for their mathematical equivalence, because neither the density and topology of subgraphs nor the large-scale properties can be dissociated from the evolution of the overall network. Selection for function is likely to play an important role in shaping the directionality and/or strength of the links [e.g., of the molecular interactions for information processing in transcriptional regulatory networks (1–3)]. As our study shows, the inevitable aggregation of type I subgraphs into clusters is equally important, because it implies that the potential functional properties of statistically abundant subgraphs need also to be evaluated beyond the level of a single subgraph, at the level of subgraph clusters.
It is important to note that the simplifications we made in the calculations leading to Figs. 1–3 can be relaxed (see supporting information). First, as we have shown above, type I and II subgraphs can be generalized to directed networks, representing a biologically more relevant approximation for the regulatory and metabolic networks. Second, although Fig. 1 is limited to the subset of n-node subgraphs that contain a central node, the results can be generalized to other elementary subgraphs as well, such as those containing cycles of four or more nodes. Subgraphs with a central node are, however, abundant in complex networks with a high clustering coefficient, as in the case of biological networks, and therefore deserve special attention. Finally, the incompleteness of the current maps of cellular networks suggests potentially higher triangle densities than are currently detectable. Yet, as long as the missing and false-positive interactions are distributed randomly throughout the network, they do not affect our findings. This conclusion is supported by the fact that our predictions work equally well for the nearly complete metabolic network and the incomplete transcriptional regulatory network (Figs. 1–3).
In conclusion, the demonstrated mathematical equivalence of a network's large-scale and local, subgraph-based structure underscores the need to understand the properties and evolution of cellular networks as fully integrated systems, where the achievable local changes are inherently intertwined with the network's global organization. Also, the interdependence between the local and global architecture is by no means limited to cellular networks but is expected to apply to all networked systems, from the World Wide Web to transportation and social networks (8–12, 33). Indeed, preliminary results indicate that the analysis described here can be successfully carried out for the Internet topology and other networks (12, 34, 35) and may have an impact on our understanding of cycles in complex networks as well (36, 37) (A.V., J. G. Oliveira, and A.L.B., unpublished work). Therefore, although there appears to be significant freedom in the evolution (and subsequent function) of various complex networks, the kind and abundance of local interaction patterns are uniquely characterized by their two global parameters, raising intriguing questions about the role of local, individual events in shaping a network's overall behavior.
Acknowledgments
We thank G. Balázsi for comments on the manuscript. Research at the University of Notre Dame and Northwestern University was supported by grants from the U.S. Department of Energy, the National Institutes of Health, and the National Science Foundation. Part of this work was supported by the Fonds National Suisse.
Footnotes

¶ To whom correspondence should be addressed. E-mail: alb@nd.edu.

This paper was submitted directly (Track II) to the PNAS office.

Abbreviations: FFL, feedforward loop; PPI, protein–protein interaction.
 Received August 16, 2004.
 Copyright © 2004, The National Academy of Sciences
References
Milo, R., Shen-Orr, S. S., Itzkovitz, S., Kashtan, N. & Alon, U. (2002) Science 298, 824–827. pmid:12399590
Milo, R., Itzkovitz, S., Kashtan, N., Levitt, R., Shen-Orr, S., Ayzenshtat, I., Sheffer, M. & Alon, U. (2004) Science 303, 1538–1542. pmid:15001784
Lee, T. I., Rinaldi, N. J., Robert, F., Odom, D. T., Bar-Joseph, Z., Gerber, G. K., Hannett, N. M., Harbison, C. T., Thompson, C. M., Simon, I., et al. (2002) Science 298, 799–804. pmid:12399584
Hinman, V. F., Nguyen, A. T., Cameron, R. A. & Davidson, E. H. (2003) Proc. Natl. Acad. Sci. USA 100, 13356–13361. pmid:14595011
Dorogovtsev, S. N. & Mendes, J. F. F. (2003) Evolution of Networks: From Biological Nets to the Internet and WWW (Oxford Univ. Press, Oxford).
Pastor-Satorras, R. & Vespignani, A. (2004) Evolution and Structure of the Internet: A Statistical Physics Approach (Cambridge Univ. Press, Cambridge, U.K.).
Overbeek, R., Larsen, N., Walunas, T., D'Souza, M., Pusch, G., Selkov, E., Jr., Liolios, K., Joukov, V., Kaznadzey, D., Anderson, I., et al. (2003) Nucleic Acids Res. 31, 164–171. pmid:12519973
Salwinski, L., Miller, C. S., Smith, A. J., Pettit, F. K., Bowie, J. U. & Eisenberg, D. (2004) Nucleic Acids Res. 32, D449–D451. pmid:14681454
Mangan, S. & Alon, U. (2003) Proc. Natl. Acad. Sci. USA 100, 11980–11985. pmid:14530388
Basu, S., Mehreja, R., Thiberge, S., Chen, M. T. & Weiss, R. (2004) Proc. Natl. Acad. Sci. USA 101, 6355–6360. pmid:15096621
Barabási, A.-L. & Albert, R. (1999) Science 286, 509–512. pmid:10521342
Ravasz, E., Somera, A. L., Mongru, D. A., Oltvai, Z. N. & Barabási, A.-L. (2002) Science 297, 1551–1555. pmid:12202830
Wagner, A. & Fell, D. A. (2001) Proc. R. Soc. London Ser. B 268, 1803–1810.
Wagner, A. (2001) Mol. Biol. Evol. 18, 1283–1292. pmid:11420367
Rzhetsky, A. & Gomez, S. M. (2001) Bioinformatics 17, 988–996. pmid:11673244
Bhan, A., Galas, D. J. & Dewey, T. G. (2002) Bioinformatics 18, 1486–1493. pmid:12424120
Cohen, R., Dolev, D., Havlin, S., Kalisky, T., Mokryn, O. & Shavitt, Y. (2004) arXiv:cond-mat/0305582.
Gleiss, P. M., Stadler, P. F., Wagner, A. & Fell, D. A. (2001) Adv. Complex Systems 1, 1–18.
Eckmann, J.-P. & Moses, E. (2002) Proc. Natl. Acad. Sci. USA 99, 5825–5829. pmid:11972019