Subnets of scale-free networks are not scale-free: Sampling properties of networks

  1. Michael P. H. Stumpf,,
  2. Carsten Wiuf§, and
  3. Robert M. May
  1. Centre for Bioinformatics, Imperial College London, Wolfson Building, London SW7 2AZ, United Kingdom; §Bioinformatics Research Center, University of Aarhus, 8000 Aarhus C, Denmark; and Department of Zoology, University of Oxford, South Parks Road, Oxford OX1 3PS, United Kingdom
  1. Contributed by Robert M. May, February 11, 2005

Abstract

Most studies of networks have only looked at small subsets of the true network. Here, we discuss the sampling properties of a network's degree distribution under the most parsimonious sampling scheme. Only if the degree distributions of the network and randomly sampled subnets belong to the same family of probability distributions is it possible to extrapolate from subnet data to properties of the global network. We show that this condition is indeed satisfied for some important classes of networks, notably classical random graphs and exponential random graphs. For scale-free degree distributions, however, this is not the case. Thus, inferences about the scale-free nature of a network may have to be treated with some caution. The work presented here has important implications for the analysis of molecular networks as well as for graph theory and the theory of networks in general.

Footnotes

  • To whom correspondence should be addressed. E-mail: m.stumpf{at}imperial.ac.uk.

  • Author contributions: M.P.H.S., C.W., and R.M.M. designed research, performed research, and wrote the paper.

  • Abbreviation: PGF, probability-generating function.

  • For the subnet, we have the PGF Formula Summing first over k (0 ≤ ki), and remembering P(0) = 0 (for scale-free networks), we get Formula Note that G*(1) = ΣP(i) = 1, as it should.

    The subsequent sample will, however, contain orphan nodes, given by Formula. If we redefine P*(0) ≡ 0 by discarding such orphans, we have the subnet defined by Eq. 5, where the renormalization constant C is required to compensate for the deletion of the orphan nodes: C[1 – P*(0)] = 1 or Formula

  • †† The negative binomial distribution has the PGF G(s) = [1 + (m/k)(1 – s)]k, where m represents the distribution`s mean value and k characterizes the distribution's “clumpiness” (the variance is given by σ2/m 2 = 1/m + 1/k). This widely studied distribution includes the Poisson distribution (the degree distribution of classical random graphs) as the special case k → ∞ and the exponential or geometric as k = 1. The subnet PGF is obtained, via Eq. 4, by substituting 1 – p(1 – s) for s in G(s), to get G*(s) = [1 + (mp/k)(1 – s)]k. Thus, the subnet has an identical PGF to the full distribution, excepting only that the mean is reduced to mp (the clumping parameter k is unaltered). The proof for the binomial distribution is even more trivial.

  • ‡‡ For Eq. 1 with γ = 2, the PGF of the subnet is Formula, with C given by Formula. Defining u = 1 – p + ps, whence dG*(s)/ds = pdG*/du, we have the degree distribution given by k!P*(k) = pk(dkG*(u)/duk)u = 1 – p. For small p « 1, consider first Formula. This exact result gives P*(1) = –[Cpln(p)]/(1 – p). Further differentiation gives exact, but increasingly complicated, expressions for P*(k > 1). Thus, P*(2) = [Cp/(1 – p)][1 + pln(p)/(1 – p)]. For larger k, P*(k > 2) = [Cp/k(k – 1)][1 + O(p)]. Finally, we can calculate Formula.

  • §§ By method analogous to those in the previous note, we can obtain, for small p, the analytic results: Formula Formula Formula Formula

« Previous | Next Article »Table of Contents
From the Cover