# On the origins of hierarchy in complex networks

^{a}Institució Catalana per a la Recerca i Estudis Avancats–Complex Systems Laboratory, Universitat Pompeu Fabra, 08003 Barcelona, Spain;^{b}Institut de Biologia Evolutiva (Consejo Superior de Investigaciones Cientificas–Universitat Pompeu Fabra), 08003 Barcelona, Spain;^{c}Section for Science of Complex Systems, Medical University of Vienna, 1090 Vienna, Austria;^{d}Department of Psychological and Brain Sciences, Indiana University, Bloomington, IN 47405; and^{e}Santa Fe Institute, Santa Fe, NM 87501

See allHide authors and affiliations

Edited* by Ignacio Rodriguez-Iturbe, Princeton University, Princeton, NJ, and approved June 24, 2013 (received for review January 17, 2013)

## Abstract

Hierarchy seems to pervade complexity in both living and artificial systems. Despite its relevance, no general theory that captures all features of hierarchy and its origins has been proposed yet. Here we present a formal approach resulting from the convergence of theoretical morphology and network theory that allows constructing a 3D morphospace of hierarchies and hence comparing the hierarchical organization of ecological, cellular, technological, and social networks. Embedded within large voids in the morphospace of all possible hierarchies, four major groups are identified. Two of them match the expected from random networks with similar connectivity, thus suggesting that nonadaptive factors are at work. Ecological and gene networks define the other two, indicating that their topological order is the result of functional constraints. These results are consistent with an exploration of the morphospace, using *in silico* evolved networks.

Fifty years ago (1), Herbert Simon defined complex systems as nested hierarchical networks of components organized as interconnected modules. Hierarchy seems a pervasive feature of the organization of natural and artificial systems (2, 3). The examples span from social interactions (4, 5), urban growth (6, 7), and allometric scaling (8) to cell function (9⇓⇓⇓–13), development (14), ecosystem flows (15, 16), river networks (17), brain organization (18), and macroevolution (19, 20). It also seems to pervade a coherent form of organization that allows reducing the costs associated to reliable information transmission (21) and to support efficient genetic and metabolic control in cellular networks (22). However, hierarchy is a polysemous word, involving order, levels, inclusion, or control as possible descriptors (23), none of which captures either its complexity or the problem of its measure and origins. Although previous work using complex networks theory has quantitatively tackled the problem (4, 24⇓⇓⇓⇓⇓⇓⇓⇓⇓–34), some questions remain: Is hierarchy a widespread feature of complex systems organization? What types of hierarchies do exist? Are hierarchies the result of selection pressures or, conversely, do they arise as a by-product of structural constraints?

A well-established concept where such questions are addressed involves the use of a morphospace (35⇓⇓⇓⇓–40), namely a phenotype space where a small set of quantitative traits can be defined as the axes. Here we take a step in this direction by combining morphospace and network theories, taking the intuitive idea of hierarchy as the starting point: a pattern of relations where there is no ambiguity in who controls whom with a pyramidal structure in which the few control the many. Formally, the picture of hierarchy matches a tree of relations (41), ideally represented by a directed graph. As shown in Fig. 1 *A–D*, the elements of the system are represented by nodes connected by arrows establishing the map of relations of who affects whom. Accordingly, a measure of hierarchy should account for the deviations from this ideal tree picture. Such deviations occur because (*a*) several elements are on the top, (*b*) downstream elements interact horizontally, or (*c*) feedback loops are present. As shown in Fig. 1*A*, the tree-like picture matches the concept of genealogies, taxonomies, armies, and corporations. Conversely, drainage networks in river basins (17) would define a reverse antihierarchical situation, as depicted in Fig. 1*B*. In this structure, multiple elements on the top merge downstream like an inverted tree. In between them, we can place a more or less symmetric web (Fig. 1*C*), somewhat combining both tendencies. However, in general, neither biological nor technological webs match the feedforward pattern. In most real systems, signal integration requires gathering inputs from different sources, whereas robust processing and control require crosstalk and feedbacks, as represented by Fig. 1*D*, which are often organized in a modular fashion (42).

A unified picture of hierarchy should not only provide a formal definition but also help in understanding the forces that shape it. To naturally incorporate both components of Simon’s view, namely causal downstream relations plus modules, a network formalism is required. Here we provide the formalization and quantitative characterization of the morphospace of the possible hierarchies. Its characterization is performed by theoretical and numerical analysis of random null models. The study of a large number of real networks and its comparison with model systems provide some unexpected answers to the previous questions. The study is completed with an exploration of the morphospace accessibility, using an evolutionary-driven search algorithm. Technical aspects of this work following the main text’s organization are in *SI Appendix*.

## The Coordinates of Hierarchy

Because hierarchy is about relations, our approach formalizes the interaction between the system’s elements by means of a directed graph (43), where is a node and is an arrow going from to (*SI Appendix*). This graph is transformed into the key object for our methodology: the so-called node-weighted condensed graph, . As Fig. 1 *D* and *H* shows, is a feedforward structure where cyclic modules of [the so-called strongly connected components (SCCs) ] are represented by individual nodes (thereby obtaining the condensed graph) (43). Such SCC method detection has been shown to be a powerful approach for subsystem identification (13, 44, 45), unraveling the presence of nested organizations. In a given , every node has a weight that indicates the number of elements from it includes (details in *SI Appendix*). By using , we make explicit the inclusion of both causal links and irreducible modules.

Our space of hierarchies is a metric space Ω, defined from three coordinates: treeness (*T*), feedforwardness (*F*), and orderability (*O*), which properly quantify graph hierarchy.

### Treeness.

Treeness (*T*, being *T* in the range ) weights how pyramidal is the structure and how unambiguous is its chain of command. This measure covers the range from hierarchical (, Fig. 1*A*) to antihierarchical (, Fig. 1*B*) graphs, including those structures that do not exhibit any pyramidal behavior (, Fig. 1*C*). As illustrated in Fig. 1 *E–G*), these graphs are characterized by taking the structure as a road map where we compare the diversity of choices we can make going top–down, i.e., following the arrows of the structure, vs. the uncertainty generated when reverting the paths going bottom–up. Such diversity is properly quantified using forward and backward entropies, respectively. Entropies are computed over directed acyclic graphs (28, 46). Although *SI Appendix* presents a rigorous description of these concepts, they are briefly presented departing from the node-weighted condensed graph shown in Fig. 1*H*. Note that a directed acyclic graph is naturally a node-weighted graph and hence ensures . From the graph , we define two sets, namely *M* and *μ*: the first, composed of nodes with , i.e., the set of maximal nodes, and the second, the set of nodes with , referred to as the set of minimal nodes. Now, let be the set of all paths starting in some maximal node. Because is a graph without cycles, this set contains a finite number of elements . If , the information required to follow a given path starting from and ending at some node in *μ*, will be given by the following path entropy,where is the probability that the path is followed, starting from node . Averaging this entropy over all paths from *M* to *μ* gives the average minimum information required to follow a path starting from some node in *M*. This reads , which is the general expression of the forward entropy of . In a similar way but in the bottom–up direction, a backward entropy of can be obtained. represents the average amount of information required to reverse a path . A detailed derivation of both and is given in *SI Appendix*. Roughly speaking, a system such that is considered as hierarchical.

Our measure of treeness is now obtained from the normalized difference (28) of the two entropies:[consistently, we have if , which takes place when is a linear chain]. The final value is computed as follows: First, let be the set containing and all subgraphs of that can be obtained by means of the application of a leaf removal algorithm (either top–down or bottom–up; details in *SI Appendix*). Then, is obtained by averaging *f* along all of the members of :In addition, if has no links—which can happen, e.g., if is totally cyclic. has been shown to be a very good indicator of the deviations of a given acyclic graph from the ideal hierarchical, tree-like picture (28).

### Feedfordwardness.

In the , because the elements within a SCC cannot be intrinsically ordered, SCCs constitute nonorderable modules within the feedforward structure. As Fig. 1*H* shows, SCCs represent a violation of the downstream order. Here, the size of the SCCs and also their position in the feedforward structure are key elements for the quantification of the impact of cyclic modules in the feedforward structure, because the higher the SCC position, the larger the number of its downstream dependencies. According to this, we define feedforwardness (*F*, being ) as a measure that weights the impact of cyclic modules on the feedforward structure, where cyclic modules closer to the top of introduce a larger penalty on hierarchical order than those placed at the bottom. For every path starting from the top of we compute the fraction of the nodes that it contains against the actual nodes of it represents. Formally, if is the set of nodes participating in the path ,where, as illustrated in Fig. 1, is the weight of node in the node-weighted condensed graph. To obtain a statistical estimator of the impact of the location of cyclic modules within the causal flow described by the network, we first have to define the set , which is the set containing all possible paths starting from the set of maximal nodes, *M*, and ending in any other node of . Now, is, thus, simply the average of over all elements of ; i.e.,

### Orderability.

As hierarchy is grounded on the concept of order, we need a descriptor that accounts for how orderable is the graph under study. Ranging from a fully cyclic graph to a feedforward structure, orderability (*O*) lies in the range and it is defined as the fraction of the nodes of the graph that does not belong to any cycle. These nodes, therefore, make part of the fraction of the network that can be actually ordered. Such a fraction weights how ordered is the set of nodes within the graph (Fig. 1 *I–L* and *SI Appendix*). Formally, we define as

Having exposed the formal description of the different hierarchy indicators, i.e., *T*, *F*, and *O*, we proceed to use them as the axes of our morphospace Ω, where networks can be properly located and compared.

## The Definition of the Morphospace Ω

According to our formalism, the hierarchical features of any directed network are given by a point in a 3D morphospace Ω, being (Fig. 2*A* and *SI Appendix*, Figs. S13 and S14). The point represents the graph by three coordinates,Using the schematic representation of graphs outlined in Fig. 1, we define an intuitive icon associated to each kind of graph. As summarized in Fig. 2*A*, the perfect hierarchy is located at , whereas the completely nonhierarchical system—a totally cyclic network—is located at . As defined, orderability and feedforwardness provide complementary information that defines forbidden regions of the morphospace. Because is possible only when , feedforward networks belong to the line. Given , no other is allowed. Attending to *F* and *O*, we find an interesting region, defined by and . It is worth stressing that, although and converge in their upper bound in a single value that defines the region of feedforward networks, they differ when *O* goes to zero. This is because deals with the condensed graph while including condensed modules but is about the nodes outside cycles. This little difference permits a family of rare networks, highlighted in our diagram by means of green icons in Fig. 1*A* (*SI Appendix*, Fig. S13). They are typically formed by chains of small SCCs arranged in a feedforward structure. As is shown below, no such networks are found in nature or technology: They belong to the domain of the possible, but not the actual.

From these two axes accounting for the cyclic nature of networks, the coordinate *T* provides additional information about the organization of the resulting feedforward structure after network condensation. Attending to the concept of a pyramidal structure, the plane separating hierarchy from antihierarchy defines a family of symmetric structures, where the downstream path diversity is canceled by the uncertainty resulting when reversing these paths. Interestingly, when cycles are incorporated to this structure at , the resulting structure is a bow-tie organization (Fig. 3). In this particular case, a large SCC occupies a central position within a balanced feedforward structure of inputs and outputs (47). As shown in Fig. 2*A*, the larger the SCC is, the lower are the values of *F* and *O*.

## Null Models and Real Network Analysis

Because null models do not consider optimal designs or functional constraints, they provide good insights about the part of the morphospace where no selection pressure is at work. Fig. 2 *B* and *C* shows that both homogeneous and broad random networks occupy (basically) the same region in Ω. This co-occupation, within the bow-tie plane with (*SI Appendix*, Figs. S18–S24), shows that random graphs appear located right in the middle between hierarchical and antihierarchical structures, independently of the type of degree distribution considered. Interestingly, following the configuration model approximation (48⇓⇓–51), theoretical results confirm this zero-centered behavior of *T* and the sharp dependence of *O* by observed in numerical simulations (*SI Appendix*, Figs. S15–S17). Further research in the analytical behavior of these measures would consider the prediction of *F* behavior. The theoretical and numerical analyses indicate that hierarchical organization of null models accounts for both bow-tie structures and feedforward sparse webs affected by . The analysis of different models shows that differences in the degree distribution in the models produce little impact on *TFO* behavior.

The theoretical and numerical analyses indicate that the hierarchical organization obtained from the null models accounts for both bow-tie structures and feedforward sparse webs. Null model analysis shows that the shape of the underlying degree distribution has little impact on the *TFO* behavior.

What about real nets? Here we use real networks encompassing 13 classes of natural and artificial systems [see *SI Appendix*, Figs. S44 and S45 for numerical details of values]. As Fig. 2*D* shows, a few isolated systems reach the boundaries of Ω: a cell lineage located at and a small social network located at . However, most networks fall into four clusters.

First, a group consisting of metabolic, neural, linguistic, and some social networks is found at the lower part of the bow-tie domain, clearly embedded within the cloud of random graphs (Fig. 2*D*). Interestingly, randomized nets of this group show a similar behavior although with a more central position within the cloud of random nets (Fig. 2 and *SI Appendix*, Fig. S24). An interesting case is given by the presence of bow-tie patterns in metabolic networks (52). They display a large central cycle, much larger than that observed in their randomized counterparts (*SI Appendix*, Fig. S29). This likely reflects the advantage of reusing and recycling molecules. The second group placed at the plane shows a narrow band of feedforward nets, including electronic circuits with and software graphs slightly biased to negative values. Here too the dispersal seems consistent with what is expected from very diluted random graphs (Fig. 2*C*). This sparseness is a consequence of engineering practices focused on reducing the wiring costs while keeping the system connected (53).

The third group displays slightly positive values of and is composed of graphs with cycles of small size but with a predominant position at the feedforward structure giving rise to a very high with variable . These are gene regulatory networks plus a protein kinase network. The broad range of values is due to the variable size of modules located at the top of the structure. The special location in Ω, far from the random cloud, is caused by a small fraction of genes, the DNA-binding elements (transcription factors) located at the top of the network, which participate in cycles. Finally, the fourth group is defined by an isolated cluster of ecological flow graphs, located around . Their values indicate a certain degree of pyramidal structure and the low s are consistent with an important role played by loops. The special status of these networks (not shared by other webs) is consistent with the well-known picture of a trophic pyramid combined with the presence of recycling (16, 54).

The four clusters point to different scenarios pervading the origins of their hierarchical organization. One key observation is that most datasets are found within the envelope predicted by the ensembles of random graphs (Fig. 2). Because these null models do not consider optimal designs or functional traits, we conclude that, as was reported in the context of modularity (55, 56), hierarchical order may be a by-product of inevitable random fluctuations, which spontaneously generate graph correlations. In this context, bow-tie networks, which have been suggested to define a flexible, optimally designed, and robust type of system (22, 42), would be instead a by-product of the generation rules responsible for network growth.

## Morphospace Accessibility Under Evolutionary Search

The previous results raise the question of how the voids in the morphospace need to be interpreted. To decide whether they are simply forbidden or have not been reached by evolution, we used an evolutionary search algorithm (57), which, starting from a random configuration, tries to get the points of an evenly gridded partition of the morphospace (details in *SI Appendix*). The results provide a picture of how accessible are the different regions of Ω. Very briefly, the evolutionary algorithm starts from a given set of small random graphs belonging to the cloud of null models. Given a target point of Ω, these graphs are evolved by a random process of link addition and deletion with selection of networks minimizing the distance . The number of nodes for every graph, , remains constant in this process and graphs remain connected. Only the number of links and their distribution are affected by the evolutionary algorithm. In this way the algorithm explores the network space, approaching the desired point and sometimes reaching it. The high computational cost of this experiment makes it difficult to operate with larger networks. However, their small size provides also an advantage for the evolutionary search in the change of the network configuration because a few changes in the connections have in general a deep impact in their structure. In this way, the use of small network sizes contributes to an efficient exploration, providing a coarse-grained picture of network reachability. Further computational and theoretical efforts on the analysis of the network evolutionary behavior would contribute to a deeper understanding of the accessible regions of the *TFO* morphospace.

Fig. 4 reveals that the cloud of null models represented in Fig. 2*B* is easily accessible, as indicated by the dark blue color of the region. As expected, hierarchical and antihierarchical regions are quite symmetric. Deviations are due to the dispersion produced by the finite-size statistics, especially when the resulting become very little, by the imposition of a high number of cycles, as happens for . This is the reason why, at , reachability is rather heterogeneous. The solutions for this orderability are possible only by forcing the network population to exhibit a large fraction of the nodes within cycles. This constraint inevitably produces a with just a handful of nodes. Then, as happens for a broad number of topological measures, values of *T* and *F* are sensitive to small variations in network configurations. Such a trend is less dramatic when *O* increases (see and in Fig. 4) because the fraction of nodes belonging to cycles is small enough to produce a rich combination of configurations in the resulting node-weighted condensed graphs.

However, the most interesting results here concern the presence of inaccessible regions, shown here in red. Low levels of *O* seem to reduce the space of possible conformations. At the extremes of *T* and *F* are inaccessible under our *in silico* evolutionary experiments. Such a behavior is relaxed at Here, a large region of high reachability is observed for extreme values of *T* but not occupied by real networks. This may indicate that a part of Ω is accessible and yet not occupied, suggesting that the spontaneous correlations created by random fluctuations provide the source of order for free. This would give support to the role played by nonadaptive processes (58) in shaping hierarchies both in nature and in man-made systems.

Finally, an interesting trend is observed when *O* approaches its upper bound. The larger *O* is, the more reduced is the range of the possibilities of *F*. Close to , only graph configurations for exist, coinciding with feedforward networks encompassing electronic circuits and software networks.

## Concluding Remarks

Both biological evolution and cultural evolution operate under a number of deep constraints (59). Some of them result from the underlying rules of network growth and change, which strongly limit the repertoire of potential designs. An important question posed by evolutionary theory is the nature and relevance of such constraints in shaping the space of the possible. Our study provides a rationale for exploring the possible and the actual in complex networks under a static view dominated by causal relations among components and modules. In this context, the inclusion of functionality, dynamics, or weighted structures has not been taken into account and should be the object of further work. By defining a general space of hierarchical webs, we are able to detect the presence of a rather limited domain occupied by real and random null models. The large voids surrounding these clusters of webs (defining four major groups) are partially inaccessible and partially reachable, as shown by means of a directed evolution algorithm.

The majority of webs display a balance between integration of multiple signals and control over multiple targets under a bow-tie structural pattern. The computational nature of regulatory networks and the combination of layers and cycles common to energy flows in food webs separate them from this large cluster. The matching of random and real webs in the first two clusters suggests that their hierarchical features can be accounted for from the spontaneous correlations associated to random graphs of a given degree, indicating that the observed webs are simply the most probable ones. By connecting network theory with theoretical morphology a powerful picture of complexity emerges, which allows us to both characterize hierarchical order and provide an evolutionary framework to explain how hierarchy emerges in nature. The formalism presented in this work provides a suitable framework for the quantitative approximation for the study of hierarchical organizations, and links to nonequilibrium thermodynamics could be defined in the future, attending to the similarity of certain approaches (60, 61). Further effort in the inclusion of the strength of relations among elements from empiric data as weighted graphs would contribute to a more accurate view of the hierarchy of systems. Future work in the development of generative models for the study of the emergence of hierarchy will be of strong interest in the study of dynamics in the exploration of the limits of what is possible for natural, technological, and social organizations.

## Acknowledgments

We thank Complex Systems Laboratory members and Olaf Sporns for useful comments; Wormbase, Vladimir Batagelj, and Andrej Mrvar for the Pajek dataset; and Mark Newman for his network dataset. R.V.S. thanks D. Erwin, E. Smith, G. West, and M. Gell-Mann for useful discussions on hierarchy. The Botín Foundation, the James McDonnell Foundation, Spanish Government Grant E-28-2012-0504681 (to J.G.), the Austrian Fonds zur Förderung der Wissenschaftlichen Forschung project “Quantifying socio-economic multiplex networks in a massive multiplayer online game.” KPP23378FW (to B.C.-M.), and the Santa Fe Institute supported this work.

## Footnotes

- ↵
^{1}To whom correspondence may be addressed. E-mail: carlos.rodriguez{at}upf.edu or ricard.sole{at}upf.edu.

Author contributions: B.C.-M., J.G., R.V.S., and C.R.-C. designed research; B.C.-M., J.G., and C.R.-C. performed research; B.C.-M., J.G., and C.R.-C. contributed new reagents/analytic tools; B.C.-M., J.G., R.V.S., and C.R.-C. analyzed data; B.C.-M., R.V.S., and C.R.-C. wrote the paper; and J.G. performed programming and computational analysis.

The authors declare no conflict of interest.

↵*This Direct Submission article had a prearranged editor.

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1300832110/-/DCSupplemental.

## References

- ↵
- Simon HA

- ↵
- ↵
- Amaral MHR,
- Loch CH,
- Wilkinson D,
- Huberman BA

- ↵
- ↵
- ↵
- ↵
- Batty M,
- Longley P

- ↵
- West GB,
- Brown JH,
- Enquist BJ

- ↵
- ↵
- Yu H,
- Gerstein M

- ↵
- Cosentino Lagomarsino M,
- Jona P,
- Bassetti B,
- Isambert H

- ↵
- Bhardwaj N,
- Yan KK,
- Gerstein MB

- ↵
- ↵
- ↵
- ↵
- ↵
- Rodríguez-Iturbe I,
- Rinaldo A

- ↵
- ↵
- Eldredge N

- ↵
- McShea DW

- ↵
- ↵
- ↵
- Pumain D

- Lane D

- ↵
- Ravasz E,
- Somera AL,
- Mongru DA,
- Oltvai ZN,
- Barabási AL

- ↵
- ↵
- ↵
- ↵
- ↵
- ↵
- ↵
- ↵
- Nicolis JS

- ↵
- ↵
- ↵
- Niklas KJ

- ↵
- McGhee GR

- ↵
- Thomas RD,
- Shearman RM,
- Stewart GW

- ↵
- Shoval O,
- et al.

- ↵
- Schuetz R,
- Zamboni N,
- Zampieri M,
- Heinemann M,
- Sauer U

- ↵
- ↵
- Whyte LL,
- Wilson AG,
- Wilson DM

- ↵
- ↵
- Gross J,
- Yellen J

- ↵
- Bonchev D,
- Rouvray D

- ↵
- ↵
- ↵
- ↵
- Bollobás B

- ↵
- ↵
- ↵
- Newman MEJ

- ↵
- Ma HW,
- Zeng AP

- ↵
- ↵
- ↵
- ↵
- Solé RV,
- Valverde S

- ↵
- ↵
- ↵
- Solé RV,
- et al.

- ↵
- ↵
- ↵
- ↵
- White JG,
- Southgate E,
- Thomson JN,
- Brenner S

- ↵
- ↵
- Ma H,
- Zeng AP

- ↵
- ↵
- ↵
- Smoot ME,
- Ono K,
- Ruscheinski J,
- Wang PL,
- Ideker T

## Citation Manager Formats

## Article Classifications

- Physical Sciences
- Statistics

- Biological Sciences
- Evolution