The architecture of complex weighted networks
*Laboratoire de Physique Théorique (Unité Mixte de Recherche du Centre National de la Recherche Scientifique 8627), Bâtiment 210, Université de Paris-Sud, 91405 Orsay Cedex, France; †Commissariat à l'Energie Atomique-Département de Physique Théorique et Appliquée, 91191 Bruyères-Le-Chatel, France; and ‡Departament de Física i Enginyeria Nuclear, Universitat Politècnica de Catalunya, Campus Nord, Mòdul B4, 08034 Barcelona, Spain
Communicated by Giorgio Parisi, University of Rome, Rome, Italy, January 8, 2004 (received for review October 29, 2003)
Abstract
Networked structures arise in a wide array of different contexts such as technological and transportation infrastructures, social phenomena, and biological systems. These highly interconnected systems have recently been the focus of a great deal of attention that has uncovered and characterized their topological complexity. Along with a complex topological structure, real networks display a large heterogeneity in the capacity and intensity of the connections. These features, however, have mostly been neglected in past studies, where links are usually represented as binary states, i.e., either present or absent. Here, we study the scientific collaboration network and the worldwide air-transportation network, which are representative examples of social and large infrastructure systems, respectively. In both cases it is possible to assign to each edge of the graph a weight proportional to the intensity or capacity of the connections among the various elements of the network. We define appropriate metrics combining weighted and topological observables that enable us to characterize the complex statistical properties and heterogeneity of the actual strength of edges and vertices. This information allows us to investigate the correlations among weighted quantities and the underlying topological structure of the network. These results provide a better description of the hierarchies and organizational principles at the basis of the architecture of weighted networks.
A large number of natural and man-made systems are structured in the form of networks. Typical examples include large communication systems (the Internet, the telephone network, the World Wide Web), transportation infrastructures (railroad and airline routes), biological systems (gene and/or protein interaction networks), and a variety of social interaction structures (1–3). The macroscopic properties of these networks have been the subject of intense scientific activity that has highlighted the emergence of a number of significant topological features. Specifically, many of these networks show the small-world property (4): the average topological distance between nodes increases very slowly with the number of nodes (logarithmically or even slower), even though the network displays a large degree of local interconnectedness typical of more ordered lattices. Additionally, several of these networks are characterized by a statistical abundance of “hubs” with a very large number of connections k compared with the average degree ⟨k⟩. The empirical evidence collected from real data indicates that this distinctive feature finds its statistical characterization in the presence of scale-free degree distributions P(k), i.e., distributions with a power-law behavior P(k) ∼ k^{−γ} over a significant range of values of k (5). These topological features turn out to be extremely relevant because they strongly affect physical properties of such networks, such as their robustness or vulnerability (6–9).
While these findings alone might provide insight for threat analysis and policy decisions, networks are specified not only by their topology but also by the dynamics of information or traffic flow taking place on the structure. In particular, the heterogeneity in the intensity of connections may be very important in the understanding of social systems. Analogously, the amount of traffic characterizing the connections in communication systems and large transport infrastructures is fundamental for a full description of these networks.
Motivated by these observations, we undertake in this paper the statistical analysis of complex networks whose edges have been assigned a given weight (the flow or the intensity) and which can thus be described in terms of weighted graphs (10, 11). Working with two typical examples of this kind of network, we introduce metrics that combine in a natural way both the topology of the connections and the weights assigned to them. These quantities provide a general characterization of the heterogeneous statistical properties of weights and identify alternative definitions of centrality, local cohesiveness, and affinity. Appropriate measurements also make it possible to probe the correlations between the weights and the topological structure of the graph, unveiling the complex architecture shown by real weighted networks.
Weighted Networks Data
To proceed to the general analysis of complex weighted networks we consider two specific examples for which it is possible to have a full characterization of the links among the elements of the systems, the worldwide airport network (WAN) and the scientist collaboration network (SCN).
WAN. We analyze the International Air Transportation Association (www.iata.org) database, which contains the world list of airport pairs connected by direct flights and the number of available seats on each connection for the year 2002. The resulting air-transportation graph comprises N = 3,880 vertices denoting airports and E = 18,810 edges accounting for the presence of a direct flight connection. The average degree of the network is ⟨k⟩ = 2E/N ≅ 9.7, while the maximal degree is 318. The topology of the graph exhibits both small-world and scale-free properties, as already observed in analyses of different datasets (12, 13). In particular, the average shortest path length ⟨ℓ⟩, measured as the average number of edges separating any two nodes in the network, is very small compared with the network size N. The degree distribution takes the form P(k) = k^{−γ} f(k/k_x), where γ ≅ 2.0 and f(k/k_x) is an exponential cutoff function that finds its origin in physical constraints on the maximum number of connections that a single airport can handle (3, 13). The airport connection graph is therefore a clear example of a network with a heterogeneous degree distribution, showing scale-free properties over a wide range of degree values.
SCN. We consider the network of scientists who authored manuscripts submitted to the e-Print Archive for condensed matter physics (http://xxx.lanl.gov/archive/cond-mat) between 1995 and 1998. Scientists are identified with nodes, and an edge exists between two scientists if they have coauthored at least one paper. The resulting connected network has N = 12,722 nodes and a maximal degree of 97; the degree of a node is the number of collaborators of the corresponding scientist, so the average degree ⟨k⟩ is the average number of collaborators. The topological properties of this network and other similar networks of scientific collaborations have been studied in refs. 14–16.
The properties of a graph can be expressed by its adjacency matrix a_{ij}, whose elements take the value 1 if an edge connects vertex i to vertex j and 0 otherwise. The data contained in the datasets above allow us to go beyond this topological representation by defining a weighted graph (10), which assigns a weight or value to each connecting link. In the case of the WAN, the weight w_{ij} of an edge linking airports i and j is the number of available seats on flights between these two airports. Inspection of the weights shows that the average numbers of seats in the two directions are identical, w_{ij} = w_{ji}, for the overwhelming majority of edges. In the following we therefore work with the symmetric undirected graph and avoid the complications deriving from flow imbalances. We show an example of the resulting weighted graph in Fig. 1. Notably, this definition of weights is a straightforward and objective measure of the traffic flow on top of the network.
For the SCN we follow the definition of weight introduced in refs. 14 and 15: the intensity w_{ij} of the interaction between two collaborators i and j is defined as

$$ w_{ij} = \sum_{p} \frac{\delta_i^{p} \delta_j^{p}}{n_p - 1}, \tag{1} $$

where the index p runs over all papers, n_p is the number of authors of paper p, and δ_i^p is 1 if author i has contributed to paper p and 0 otherwise. While any definition of the intensity of a connection in social networks depends on which elements are chosen as relevant, this definition seems rather objective and representative of the scientific interaction: it is large for collaborators having many papers in common, but the contribution of any given paper to the weight decreases with its number of authors, as 1/(n_p − 1).
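A minimal Python sketch of Eq. 1 follows. It assumes that each paper is given simply as a list of author identifiers (a hypothetical input format chosen for illustration; the SCN data themselves are not reproduced here). Every pair of coauthors of a paper with n_p authors receives 1/(n_p − 1), and single-author papers contribute nothing.

```python
from collections import defaultdict
from itertools import combinations


def collaboration_weights(papers):
    """Eq. 1: w_ij = sum_p delta_i^p delta_j^p / (n_p - 1).

    `papers` is an iterable of author lists (hypothetical input format).
    Returns a dict mapping each unordered pair (i, j), stored with i < j,
    to its weight, so author labels must be mutually comparable.
    """
    w = defaultdict(float)
    for authors in papers:
        unique = sorted(set(authors))  # delta_i^p is either 0 or 1
        n_p = len(unique)
        if n_p < 2:
            continue  # single-author papers create no collaboration edge
        for i, j in combinations(unique, 2):
            w[(i, j)] += 1.0 / (n_p - 1)
    return dict(w)


papers = [["A", "B"], ["A", "B", "C"], ["C"]]
print(collaboration_weights(papers))
# {('A', 'B'): 1.5, ('A', 'C'): 0.5, ('B', 'C'): 0.5}
```

A and B share two papers, so their weight is 1 + 1/2; the three-author paper alone links C to each of the others with weight 1/2.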
Centrality and Weights
To take into account the information provided by the weighted graph, we identify the quantities that characterize its structure and organization at the statistical level. The statistical analysis of the weights w_{ij} between pairs of vertices reveals right-skewed distributions, already signaling a high level of heterogeneity in both the WAN and the SCN, as also reported in refs. 12, 14, and 15. It has been observed, however, that the individual edge weights do not provide a general picture of the network's complexity (11). A more significant measure of the network properties in terms of the actual weights is obtained by extending the definition of vertex degree k_i = Σ_j a_{ij} to the vertex strength s_i, defined as

$$ s_i = \sum_{j=1}^{N} a_{ij} w_{ij}. \tag{2} $$

This quantity measures the strength of a vertex in terms of the total weight of its connections. In the case of the WAN, the vertex strength simply accounts for the total traffic handled by each airport. For the SCN, on the other hand, the strength is a measure of scientific productivity, because it equals the total number of publications of a given scientist, excluding single-author publications: by Eq. 1, each paper with n_p ≥ 2 authors gives each of its authors n_p − 1 coauthors, each credited 1/(n_p − 1), and hence contributes exactly 1 to that author's strength. This quantity is a natural measure of the importance or centrality of a vertex i in the network.
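The strength of Eq. 2 can be accumulated edge by edge, as in the Python sketch below; the input format (a dictionary from vertex pairs to weights, each undirected edge listed once) is an assumption for illustration. The toy weights are those that Eq. 1 assigns to the papers [A, B], [A, B, C], and the single-author [C], so the resulting strengths equal each author's number of multi-author papers.

```python
from collections import defaultdict


def strengths(weights):
    """Eq. 2: s_i = sum_j a_ij w_ij for an undirected weighted graph.

    `weights` maps each edge, given once as a pair (i, j), to w_ij; the
    weight is credited to both endpoints because w_ij = w_ji.
    """
    s = defaultdict(float)
    for (i, j), w_ij in weights.items():
        s[i] += w_ij
        s[j] += w_ij
    return dict(s)


# Eq. 1 weights for the papers [A, B], [A, B, C] and the single-author [C]:
w = {("A", "B"): 1.5, ("A", "C"): 0.5, ("B", "C"): 0.5}
print(strengths(w))  # {'A': 2.0, 'B': 2.0, 'C': 1.0}
```

C's single-author paper leaves no trace in the graph, which is why its strength is 1 rather than 2.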
The identification of the most central nodes in the system is a major issue in network characterization (17). The most intuitive topological measure of centrality is the degree: more connected nodes are more central. However, more is not necessarily better. By considering solely the degree of a node, we overlook the fact that nodes with small degree may be crucial for connecting different regions of the network by acting as bridges. To quantitatively account for the role of such nodes, the betweenness centrality (14, 15, 17, 18) has been defined as the number of shortest paths between pairs of vertices that pass through a given vertex.¶ Central nodes are therefore part of more shortest paths within the network than peripheral nodes. Moreover, the betweenness centrality is often used in transport networks to estimate the traffic handled by the vertices, assuming that the number of shortest paths is a zeroth-order approximation to the frequency of use of a given node.∥ This definition of centrality relies only on topological elements. It is therefore natural to consider the strength s_i of the vertices as an alternative, and more appropriate, measure of the importance of a vertex in weighted networks. In the case of the WAN, for instance, this quantity gives the actual traffic going through vertex i, and it is natural to study how it compares and correlates with topological measures of centrality.
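The precise definition of b_i given in footnote ¶ can be computed without enumerating paths by Brandes' accumulation scheme, which is presumably the efficient algorithm of ref. 19 (that identification is our assumption). The Python sketch below treats the graph as unweighted, as in the topological definition, and sums over ordered pairs (h, j); whether pairs are counted as ordered or unordered is a convention, and halving the result counts each unordered pair once.

```python
from collections import deque


def betweenness(nbrs):
    """Shortest-path betweenness b_i via Brandes' accumulation (unweighted).

    `nbrs` maps each vertex to an iterable of its neighbours (undirected
    graph). The sum runs over ordered pairs (h, j); halve the values to
    count each unordered pair once.
    """
    b = {v: 0.0 for v in nbrs}
    for src in nbrs:
        # Breadth-first search from src, counting shortest paths in sigma.
        order = []
        preds = {v: [] for v in nbrs}
        sigma = {v: 0 for v in nbrs}
        dist = {v: -1 for v in nbrs}
        sigma[src], dist[src] = 1, 0
        queue = deque([src])
        while queue:
            v = queue.popleft()
            order.append(v)
            for u in nbrs[v]:
                if dist[u] < 0:
                    dist[u] = dist[v] + 1
                    queue.append(u)
                if dist[u] == dist[v] + 1:
                    sigma[u] += sigma[v]
                    preds[u].append(v)
        # Back-propagate pair dependencies in order of decreasing distance.
        delta = {v: 0.0 for v in nbrs}
        while order:
            u = order.pop()
            for v in preds[u]:
                delta[v] += sigma[v] / sigma[u] * (1.0 + delta[u])
            if u != src:
                b[u] += delta[u]
    return b


path = {0: [1], 1: [0, 2], 2: [1]}
print(betweenness(path))  # {0: 0.0, 1: 2.0, 2: 0.0}
```

On the path 0-1-2 only the middle vertex lies between the pair (0, 2), counted once in each direction; on a 4-cycle every vertex gets 1, because each opposite pair splits its two shortest paths evenly.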
The probability distribution P(s) that a vertex has strength s is heavy-tailed in both networks, and its functional behavior exhibits similarities with the degree distribution P(k) (see Fig. 2). A precise functional description of these heavy-tailed distributions may be very important in understanding the network evolution and is deferred to future analysis. This behavior is not unexpected: it is plausible that the strength s_i increases with the vertex degree k_i, so that the slowly decaying tail of P(s) stems directly from the very slow decay of the degree distribution. To shed more light on the relationship between strength and degree, we investigate the dependence of s_i on k_i. We find that the average strength s(k) of vertices with degree k increases with the degree as

$$ s(k) \sim k^{\beta}. \tag{3} $$

In the absence of correlations between the weights of edges and the degrees of vertices, the weights w_{ij} are on average independent of i and j, and we can therefore approximate w_{ij} ≅ ⟨w⟩ a_{ij}, where ⟨w⟩ is the average weight in the network. From Eq. 2 we then have s_i ≅ ⟨w⟩ k_i. That is, the strength of a vertex is simply proportional to its degree, yielding an exponent β = 1, and the two quantities therefore provide the same information on the system. In Fig. 3 we report the behavior obtained for both the real weighted networks and their randomized versions, generated by randomly redistributing the actual weights over the existing topology of the network. For the SCN, the curves are very similar and well fitted by the uncorrelated approximation s(k) = ⟨w⟩k. Interestingly, this is not the case for the WAN. Fig. 3B clearly shows a very different behavior for the real dataset and its randomized version. In particular, the power-law fit for the real data gives an “anomalous” exponent β_{WAN} = 1.5 ± 0.1. This value implies that the strength of vertices grows faster than their degree, i.e., the edges attached to highly connected vertices tend to carry larger weights than a random assignment of weights would give. This tendency denotes a strong correlation between weight and topology in the WAN: the larger an airport, the more traffic it can handle.
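The comparison with randomized networks can be sketched in Python as below. The shuffle keeps the topology and the multiset of weights and only reassigns weights to edges, as described in the text; the least-squares fit of log s(k) against log k is an assumed, deliberately crude estimator, since the fitting procedure is not specified here (real data would call for logarithmic binning and an average over many shuffles). With uniform toy weights s(k) = ⟨w⟩k holds exactly, so both fits return β = 1.

```python
import math
import random
from collections import defaultdict


def strength_by_degree(weights):
    """Return {k: s(k)}, the average strength of the vertices of degree k."""
    deg = defaultdict(int)
    s = defaultdict(float)
    for (i, j), w_ij in weights.items():
        for v in (i, j):
            deg[v] += 1
            s[v] += w_ij
    classes = defaultdict(list)
    for v, k in deg.items():
        classes[k].append(s[v])
    return {k: sum(vals) / len(vals) for k, vals in classes.items()}


def loglog_slope(curve):
    """Least-squares slope of log y against log x (needs two distinct x values)."""
    pts = [(math.log(x), math.log(y)) for x, y in curve.items() if x > 0 and y > 0]
    mx = sum(p for p, _ in pts) / len(pts)
    my = sum(q for _, q in pts) / len(pts)
    sxy = sum((p - mx) * (q - my) for p, q in pts)
    sxx = sum((p - mx) ** 2 for p, _ in pts)
    return sxy / sxx


def shuffle_weights(weights, seed=0):
    """Null model: redistribute the observed weights at random over the same edges."""
    rng = random.Random(seed)
    edges = list(weights)
    values = list(weights.values())
    rng.shuffle(values)
    return dict(zip(edges, values))


# Toy graph (a hub with four neighbours plus one extra edge), uniform weights.
toy = {(0, 1): 2.0, (0, 2): 2.0, (0, 3): 2.0, (0, 4): 2.0, (1, 2): 2.0}
beta = loglog_slope(strength_by_degree(toy))
beta_random = loglog_slope(strength_by_degree(shuffle_weights(toy)))
print(round(beta, 6), round(beta_random, 6))  # 1.0 1.0
```

A gap between the fitted β of the real and the shuffled weights, as in Fig. 3B for the WAN, is the signature of weight-topology correlations.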
The fingerprint of these correlations is also observed in the dependence of the weight w_{ij} on the degrees k_i and k_j of the endpoint nodes. As shown in Fig. 4, for the WAN the average weight as a function of the endpoint degrees is well approximated by a power-law dependence

$$ \langle w_{ij} \rangle \sim (k_i k_j)^{\theta}, \tag{4} $$

with an exponent θ = 0.5 ± 0.1. This exponent can be related to β by noticing that, from Eqs. 2 and 4, s_i = Σ_j a_{ij} w_{ij} ∼ k_i^{θ} Σ_j a_{ij} k_j^{θ}. If the topological correlations between the degrees of connected vertices can be neglected, the neighbor average k_i^{−1} Σ_j a_{ij} k_j^{θ} does not depend on k_i, so that s(k) ∼ k^{1+θ}, resulting in β = 1 + θ. This is indeed the case for the WAN, where this scaling relation is well satisfied by the independently measured values of the exponents (1 + 0.5 = 1.5). In the SCN, instead, ⟨w_{ij}⟩ is almost constant over more than two decades, confirming a general lack of correlations between the weights and the vertex degrees.
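The exponent θ of Eq. 4 can be estimated in the same spirit, as in the Python sketch below: group the edges by the product k_i k_j of their end-point degrees, average the weights in each group, and fit a log-log slope. The toy weights are constructed by hand to obey Eq. 4 exactly with θ = 0.5; they are illustrative, not WAN data. The tiny graph does have degree correlations, so only θ is checked here, not the relation β = 1 + θ.

```python
import math
from collections import defaultdict


def mean_weight_by_degree_product(weights):
    """Average w_ij over edges grouped by the product k_i * k_j of end-point degrees."""
    deg = defaultdict(int)
    for i, j in weights:
        deg[i] += 1
        deg[j] += 1
    groups = defaultdict(list)
    for (i, j), w_ij in weights.items():
        groups[deg[i] * deg[j]].append(w_ij)
    return {kk: sum(ws) / len(ws) for kk, ws in groups.items()}


def loglog_slope(curve):
    """Least-squares slope of log y against log x (needs two distinct x values)."""
    pts = [(math.log(x), math.log(y)) for x, y in curve.items() if x > 0 and y > 0]
    mx = sum(p for p, _ in pts) / len(pts)
    my = sum(q for _, q in pts) / len(pts)
    sxy = sum((p - mx) * (q - my) for p, q in pts)
    sxx = sum((p - mx) ** 2 for p, _ in pts)
    return sxy / sxx


# Toy graph with hand-made weights w_ij = (k_i k_j) ** 0.5, i.e. theta = 0.5.
edges = [(0, 1), (0, 2), (0, 3), (0, 4), (1, 2)]
k = {0: 4, 1: 2, 2: 2, 3: 1, 4: 1}  # degrees of the toy graph
toy = {(i, j): (k[i] * k[j]) ** 0.5 for i, j in edges}
theta = loglog_slope(mean_weight_by_degree_product(toy))
print(round(theta, 6))  # 0.5
```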
Analogously, a study of the average value s(b) of the strength for vertices with betweenness b shows that the functional behavior can be approximated by a scaling form s(b) ∼ b^{δ}, with δ_{SCN} ≅ 0.5 and δ_{WAN} ≅ 0.8 for the SCN and the WAN, respectively. As before, the comparison between the behavior of the real data and the randomized case shows more pronounced differences in the case of the WAN. In both networks, the strength grows with the betweenness faster than in the randomized case, especially in the WAN. This behavior is another clear signature of the correlations between weighted properties and the network topology.
Structural Organization of Weighted Networks
Along with the hierarchy of vertices imposed by the strength distribution (the larger the strength, the more central the vertex), complex networks show an architecture imposed by the structural and administrative organization of these systems. For instance, topical areas and national research structures give rise to well-defined groups or communities in the SCN. In the WAN, on the other hand, different hierarchies correspond to domestic or regional airport groups and intracontinental transport systems; political or economic factors can impose additional constraints on the network structure (13). To uncover these structures, some topological quantities are customarily studied. The clustering coefficient c_i measures local group cohesiveness and is defined for any vertex i as the fraction of pairs of neighbors of i that are themselves connected (4). The average clustering coefficient C = N^{−1} Σ_i c_i thus expresses the statistical level of cohesiveness, measuring the global density of interconnected vertex triplets in the network. Further information can be gathered by inspecting the average clustering coefficient C(k) restricted to classes of vertices with degree k. In real networks (20, 21), C(k) exhibits a highly nontrivial behavior with a power-law decay as a function of k, signaling a hierarchy in which low-degree vertices generally belong to well-interconnected communities (high clustering coefficient), while hubs connect many vertices that are not directly connected to one another (small clustering coefficient) (20, 21). Another quantity used to probe the architecture of networks is the average degree of nearest neighbors, k_{nn}(k), for vertices of degree k (22). This quantity is related to the correlations between the degrees of connected vertices (22, 23), because it can be expressed as k_{nn}(k) = Σ_{k′} k′ P(k′|k), where P(k′|k) is the conditional probability that a given vertex with degree k is connected to a vertex of degree k′.
In the absence of degree correlations, P(k′|k) does not depend on k, and neither does the average nearest-neighbor degree; i.e., k_{nn}(k) = constant (22). In the presence of correlations, the behavior of k_{nn}(k) identifies two general classes of networks. If k_{nn}(k) is an increasing function of k, vertices with high degree have a larger probability of being connected to high-degree vertices. This property is referred to in physics and the social sciences as assortative mixing (24). In contrast, a decreasing behavior of k_{nn}(k) defines disassortative mixing, in the sense that high-degree vertices have a majority of neighbors with low degree, whereas the opposite holds for low-degree vertices.
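The topological quantities above can be sketched in a few lines of Python, as below. The per-vertex values c_i and k_{nn,i} = k_i^{−1} Σ_j a_{ij} k_j are averaged over classes of equal degree to give C(k) and k_{nn}(k). Setting c_i = 0 for vertices with fewer than two neighbors is an assumed convention, not one stated in the text.

```python
from collections import defaultdict


def neighbours(edges):
    """Adjacency sets of an undirected graph given as a list of pairs."""
    nbrs = defaultdict(set)
    for i, j in edges:
        nbrs[i].add(j)
        nbrs[j].add(i)
    return dict(nbrs)


def clustering(nbrs, i):
    """c_i: fraction of pairs of neighbours of i that are linked to each other."""
    k = len(nbrs[i])
    if k < 2:
        return 0.0  # assumed convention for vertices with fewer than two neighbours
    nb = list(nbrs[i])
    linked = sum(1 for a in range(k) for b in range(a + 1, k) if nb[b] in nbrs[nb[a]])
    return linked / (k * (k - 1) / 2)


def knn(nbrs, i):
    """k_nn,i: average degree of the nearest neighbours of vertex i."""
    return sum(len(nbrs[j]) for j in nbrs[i]) / len(nbrs[i])


def by_degree_class(nbrs, quantity):
    """Average a per-vertex quantity over classes of equal degree, e.g. C(k) or k_nn(k)."""
    classes = defaultdict(list)
    for i in nbrs:
        classes[len(nbrs[i])].append(quantity(nbrs, i))
    return {k: sum(v) / len(v) for k, v in sorted(classes.items())}


# Toy graph: a triangle 0-1-2 with a pendant vertex 3 attached to 0.
g = neighbours([(0, 1), (0, 2), (1, 2), (0, 3)])
print(by_degree_class(g, clustering))  # {1: 0.0, 2: 1.0, 3: 0.3333333333333333}
print(by_degree_class(g, knn))  # {1: 3.0, 2: 2.5, 3: 1.6666666666666667}
```

Both C(k) and k_{nn}(k) decrease with k in this toy graph: the degree-3 vertex has a poorly clustered neighborhood and low-degree neighbors, a miniature disassortative example.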
These quantities provide clear signatures of a structural organization in which different degree classes show different properties in their local connectivity. However, they are defined solely on topological grounds, and the inclusion of weights and their correlations might considerably change our view of the hierarchical and structural organization of the network. This can be easily understood with the simple example of a network in which the weights of all edges forming triples of interconnected vertices are extremely small. Even for a large clustering coefficient, these triples clearly play a minor role in the network dynamics and organization, and the clustering properties are overestimated by a purely topological analysis. Similarly, high-degree vertices could be connected to a majority of low-degree vertices while concentrating the largest fraction of their strength on their few high-degree neighbors. In this case the topological information would point to disassortative properties, whereas the network could effectively be considered assortative, because the most relevant edges in terms of weight link high-degree vertices.
To resolve these incongruities, we introduce metrics that combine the topological information with the weight distribution of the network. First, we consider the weighted clustering coefficient, defined as (see Fig. 5)

$$ c_i^{w} = \frac{1}{s_i (k_i - 1)} \sum_{j,h} \frac{w_{ij} + w_{ih}}{2} \, a_{ij} a_{ih} a_{jh}. \tag{5} $$

This coefficient is a measure of local cohesiveness that takes into account the importance of the clustered structure on the basis of the amount of traffic or interaction intensity actually found on the local triplets. Indeed, c_i^w counts, for each triplet formed in the neighborhood of vertex i, the weights of the two edges of the triplet that are attached to vertex i. In this way we consider not just the number of closed triplets in the neighborhood of a vertex but also their total weight relative to the strength of the vertex. The normalization factor s_i(k_i − 1) accounts for the weight of each edge times the maximum possible number of triplets in which it may participate, and it ensures that 0 ≤ c_i^w ≤ 1. Consistently, the definition recovers the topological clustering coefficient when w_{ij} = constant. Next we define C^w and C^w(k) as the weighted clustering coefficient averaged over all vertices of the network and over all vertices with degree k, respectively. These quantities provide global information on the correlation between weights and topology, especially when compared with their topological analogs. In the case of a large randomized network (lack of correlations), it is easy to see that C^w = C and C^w(k) = C(k). In real weighted networks, however, we can face two opposite cases. If C^w > C, the interconnected triplets of the network are more likely to be formed by the edges with larger weights. On the other hand, C^w < C signals a network in which the topological clustering is generated by edges with low weight. In this case the clustering has a minor effect on the organization of the network, because the largest part of the interactions (traffic, frequency of the relations, etc.) occurs on edges not belonging to interconnected triplets. The same may happen for C^w(k), for which it is also possible to analyze the variations with respect to the degree class k.
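A minimal Python sketch of Eq. 5 follows, evaluated for vertex 0 of a triangle with a pendant vertex under three hypothetical weightings. Uniform weights reproduce the topological value c_0 = 1/3; concentrating the weight on the triangle or on the pendant edge pushes c_0^w above or below it, mirroring the two cases C^w > C and C^w < C. As for c_i, returning 0 for vertices with fewer than two neighbors is an assumed convention.

```python
from collections import defaultdict


def symmetric(weights):
    """Dict-of-dicts view w[i][j] = w_ij of an undirected weighted edge list."""
    w = defaultdict(dict)
    for (i, j), w_ij in weights.items():
        w[i][j] = w_ij
        w[j][i] = w_ij
    return dict(w)


def weighted_clustering(w, i):
    """c_i^w of Eq. 5; equals the topological c_i when all weights are equal."""
    nb = w[i]
    k = len(nb)
    if k < 2:
        return 0.0  # assumed convention, as for the topological c_i
    s_i = sum(nb.values())
    total = 0.0
    for j in nb:  # the sum in Eq. 5 runs over ordered pairs (j, h)
        for h in nb:
            if h != j and h in w[j]:  # a_ij a_ih a_jh = 1
                total += (nb[j] + nb[h]) / 2.0
    return total / (s_i * (k - 1))


def toy(w01, w02, w03, w12=1.0):
    """Triangle 0-1-2 with pendant vertex 3 on vertex 0 (hypothetical weights)."""
    return symmetric({(0, 1): w01, (0, 2): w02, (0, 3): w03, (1, 2): w12})


for label, g in [("uniform", toy(1, 1, 1)),
                 ("heavy triangle", toy(10, 10, 1)),
                 ("heavy pendant", toy(1, 1, 10))]:
    print(label, round(weighted_clustering(g, 0), 3))
# uniform 0.333
# heavy triangle 0.476
# heavy pendant 0.083
```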
Along with the weighted clustering coefficient, we introduce the weighted average nearest-neighbors degree, defined as (see Fig. 5)

$$ k_{nn,i}^{w} = \frac{1}{s_i} \sum_{j=1}^{N} a_{ij} w_{ij} k_j. \tag{6} $$

In this case, we perform a local weighted average of the nearest-neighbor degree according to the normalized weight of the connecting edges, w_{ij}/s_i. This definition implies that k_{nn,i}^w > k_{nn,i} if the edges with the larger weights point to the neighbors with larger degree, and k_{nn,i}^w < k_{nn,i} in the opposite case, where k_{nn,i} = k_i^{−1} Σ_j a_{ij} k_j is the topological average nearest-neighbor degree of vertex i. The quantity k_{nn,i}^w thus measures the effective affinity to connect with high- or low-degree neighbors according to the magnitude of the actual interactions. Likewise, the behavior of the function k_{nn}^w(k), obtained by averaging k_{nn,i}^w over the vertices of degree k, marks the weighted assortative or disassortative properties, taking into account the actual interactions among the system's elements.
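Eq. 6 can be compared with its topological counterpart on a toy graph, as in the Python sketch below: vertex 0 is linked to a hub (vertex 1, degree 3) and to a leaf (vertex 2). The weightings are hypothetical. Shifting weight toward the hub raises k_{nn,0}^w above the topological value k_{nn,0} = 2, and shifting it toward the leaf lowers it.

```python
from collections import defaultdict


def symmetric(weights):
    """Dict-of-dicts view w[i][j] = w_ij of an undirected weighted edge list."""
    w = defaultdict(dict)
    for (i, j), w_ij in weights.items():
        w[i][j] = w_ij
        w[j][i] = w_ij
    return dict(w)


def knn(w, i):
    """Topological average nearest-neighbour degree k_nn,i."""
    return sum(len(w[j]) for j in w[i]) / len(w[i])


def weighted_knn(w, i):
    """Eq. 6: neighbour degrees k_j averaged with the normalised weights w_ij / s_i."""
    s_i = sum(w[i].values())
    return sum(w_ij * len(w[j]) for j, w_ij in w[i].items()) / s_i


def toy(w_hub, w_leaf):
    """Vertex 0 linked to hub 1 (also linked to 3 and 4) and to leaf 2."""
    return symmetric({(0, 1): w_hub, (0, 2): w_leaf, (1, 3): 1.0, (1, 4): 1.0})


print(knn(toy(1, 1), 0))           # 2.0
print(weighted_knn(toy(3, 1), 0))  # 2.5, weight concentrated on the hub
print(weighted_knn(toy(1, 3), 0))  # 1.5, weight concentrated on the leaf
```

With equal weights the two quantities coincide, which is the weighted analog of the uncorrelated case.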
As a general test, we inspect the results obtained for both the SCN and the WAN by comparing the usual topological quantities with the weighted ones introduced here. The topological measurements show that the SCN has a continuously decaying spectrum C(k) (see Fig. 6A). This implies that hubs have a much less clustered neighborhood than low-degree vertices. This effect can be interpreted as evidence that authors with few collaborators usually work within a well-defined research group in which all of the scientists collaborate (high clustering). Authors with a large degree, however, collaborate with different groups and communities, which in their turn do not often collaborate with one another, thus producing a lower clustering coefficient. Furthermore, the SCN exhibits an assortative behavior (see Fig. 6B), in agreement with the general evidence that social networks usually have a strong assortative character (24). The analysis of weighted quantities confirms this topological picture and provides further information on the network architecture. The weighted clustering coefficient is very close to the topological one (C^w/C ≅ 1). This states in a quantitative way that group collaborations tend on average to be stable and determine the average intensity of the interactions in the network. In addition, the inspection of C^w(k) (see Fig. 6A) shows that, in general, for k ≥ 10 the weighted clustering coefficient is larger than the topological one. This difference implies that high-degree authors (i.e., those with many collaborators) tend to publish more papers with interconnected groups of coauthors. This finding suggests that influential scientists form stable research groups in which the largest part of their production is obtained. Finally, the assortative properties find a clear-cut confirmation in the weighted analysis, with k_{nn}^w(k) growing as a power of k.
A different picture is found in the WAN, where the weighted analysis provides a richer and somewhat different scenario (Fig. 7). This network also shows a decaying C(k), a consequence of the role of large airports that provide nonstop connections to very distant destinations on an international and intercontinental scale. These destinations are usually not interconnected with one another, giving rise to a low clustering coefficient for the hubs. We find, however, that C^w/C ≅ 1.1, indicating an accumulation of traffic on interconnected groups of vertices. The weighted clustering coefficient C^w(k) also behaves differently, in that its variation is much more limited over the whole spectrum of k. This observation implies that high-degree airports have a progressive tendency to form interconnected groups with high-traffic links, thus balancing the reduced topological clustering. Because high traffic is associated with hubs, we have a network in which high-degree nodes tend to form cliques with nodes of equal or higher degree, the so-called rich-club phenomenon (25). Interesting evidence also emerges from the comparison of k_{nn}(k) and k_{nn}^w(k). The topological k_{nn}(k) shows an assortative behavior only at small degrees. For k > 10, k_{nn}(k) approaches a constant value, revealing an uncorrelated structure in which vertices with very different degrees have very similar neighborhoods. The weighted k_{nn}^w(k), however, exhibits a pronounced assortative behavior over the whole k spectrum, providing a different picture in which high-degree airports have a larger affinity for other large airports, where the major part of the traffic is directed.
Conclusions
We have shown that a more complete view of complex networks is provided by the study of the interactions defining the links of these systems. The weights characterizing the various connections exhibit complex statistical features with highly varying distributions and power-law behavior. In particular, we have considered the specific examples of the SCN and the WAN, in which it is possible to appreciate the importance of the correlations between weights and topology in characterizing real network properties. Indeed, the analysis of weighted quantities and the study of the correlations between weights and topology provide a complementary perspective on the structural organization of the network that might go undetected by quantities based only on topological information. Our study thus offers a quantitative and general approach to understanding the complex architecture of real weighted networks.
Acknowledgments
We thank the International Air Transportation Association for making the airline commercial flight database available to us. We also thank M. E. J. Newman for giving us the possibility of using the SCN data (see www-personal.umich.edu/~mejn/collaboration). We are grateful to L. A. N. Amaral and R. Guimerà for many discussions and sharing of results during the various stages of this work, and to L. Brualla for help with Fig. 1. A.B., R.P.S., and A.V. are partially funded by the European Commission, Future and Emerging Technologies Open Project COSIN (Coevolution and Self-Organisation in Dynamical Networks) IST-2001-33555. R.P.S. acknowledges financial support from the Ministerio de Ciencia y Tecnología (Spain) and from the Departament d'Universitats, Recerca i Societat de la Informació, Generalitat de Catalunya (Spain).
Footnotes

§ To whom correspondence should be addressed. E-mail: alexv@th.u-psud.fr.

Abbreviations: WAN, worldwide airport network; SCN, scientist collaboration network.

¶ More precisely, if D_{hj} is the total number of shortest paths from h to j and D_{hj}(i) is the number of these shortest paths that pass through vertex i, the betweenness of vertex i is defined as b_i = Σ D_{hj}(i)/D_{hj}, where the sum runs over all pairs h, j with j ≠ h ≠ i. An efficient algorithm to compute the betweenness centrality is reported in ref. 19.

∥ For the airport network, the analysis of the betweenness centrality and its correlation with the degree has been discussed in ref. 13.
 Copyright © 2004, The National Academy of Sciences
References
Dorogovtsev, S. N. & Mendes, J. F. F. (2003) Evolution of Networks: From Biological Nets to the Internet and WWW (Oxford Univ. Press, Oxford).

Amaral, L. A. N., Scala, A., Barthélemy, M. & Stanley, H. E. (2000) Proc. Natl. Acad. Sci. USA 97, 11149–11152. pmid:11005838

Barabási, A.-L. & Albert, R. (1999) Science 286, 509–512. pmid:10521342

Clark, J. & Holton, D. A. (1998) A First Look at Graph Theory (World Scientific, Singapore).

Li, W. & Cai, X. (2003) e-Print Archive, http://xxx.lanl.gov/abs/cond-mat/0309236.

Guimerà, R., Mossa, S., Turtschi, A. & Amaral, L. A. N. (2003) e-Print Archive, http://xxx.lanl.gov/abs/cond-mat/0312535.

Maslov, S. & Sneppen, K. (2001) Science 296, 910–913.

Zhou, S. & Mondragon, R. J. (2003) e-Print Archive, http://xxx.lanl.gov/abs/cs.NI/0303028.