Maps of random walks on complex networks reveal community structure
-
Fig. 1.
Detecting communities by compressing the description of information flows on networks. (A) We want to describe the trajectory of a random walk on the network such that important structures have unique names. The orange line shows one sample trajectory. (B) A basic approach is to give a unique name to every node in the network. The Huffman code illustrated here is an efficient way to do so. The 314 bits shown under the network describe the sample trajectory in A, starting with 1111100 for the first node on the walk in the upper left corner, 1100 for the second node, etc., and ending with 00011 for the last node on the walk in the lower right corner. (C) A two-level description of the random walk, in which major clusters receive unique names, but the names of nodes within clusters are reused, yields on average a 32% shorter description for this network. The codes naming the modules and the codes used to indicate an exit from each module are shown to the left and the right of the arrows under the network, respectively. Using this code, we can describe the walk in A by the 243 bits shown under the network in C. The first three bits 111 indicate that the walk begins in the red module, the code 0000 specifies the first node on the walk, etc. (D) Reporting only the module names, and not the locations within the modules, provides an efficient coarse graining of the network.
-
Fig. 2.
Mapping flow highlights different aspects of structure than does optimizing modularity in directed and weighted networks. The coloring of nodes illustrates alternative partitions of two sample networks. (Left) Partitions show the modular structure as optimized by the map equation (minimum L). (Right) Partitions show the structure as optimized by modularity (maximum Q). In the network shown in A, the left-hand partition minimizes the map equation because the persistence times in the modules are long; with the weight of the bold links set to twice the weight of other links, a random walker without teleportation takes on average three steps in a module before exiting. The right-hand clustering gives a longer description length because a random walker takes on average only 12/5 steps in a module before exiting. The right-hand clustering maximizes the modularity because modularity counts weights of links, the in-degree, and the out-degree in the modules; the right-hand partitioning places the heavily weighted links inside of the modules. In B, for the same reason, the right-hand partition again maximizes modularity, but not so the map equation. Because every node is either a sink or a source in this network, the links do not induce any long-range flow, and the one-step walks are best described as in the left-hand partition, with all nodes in the same cluster.
-
Fig. 3.
A map of science based on citation patterns. We partitioned 6,128 journals connected by 6,434,916 citations into 88 modules and 3,024 directed and weighted links. For visual simplicity, we show only the links that the random surfer traverses >1/5,000th of her time, and we only show the modules that are visited via these links (see SI for the complete list). Because of the automatic ranking of nodes and links by the random surfer (22), we are assured of showing the most important links and nodes. For this particular level of detail, we capture 98% of the node weights and 94% of all flow.
-
Fig. 4.
A map of the social sciences. The journals listed in the 2004 social science edition of Journal Citation Reports (32) are a subset of those illustrated in Fig. 3, totaling 1,431 journals and 217,287 citations. When we map this subset on its own, we get a finer level of resolution. The 10 modules that correspond to the social sciences now are partitioned into 54 modules, but for simplicity we show only links that the random surfer visits at least 1/2,000th of her times together with the modules they connect (see SI for the complete list). For this particular level of detail, we capture 97% of the node weights and 90% of all flow.
Footnotes
- †To whom correspondence should be addressed. E-mail: rosvall{at}u.washington.edu
- © 2008 by The National Academy of Sciences of the USA







