Understanding scaling through history-dependent processes with collapsing sample space

History-dependent processes are ubiquitous in natural and social systems. Many such stochastic processes, especially those associated with complex systems, become more constrained as they unfold, meaning that their sample space, or set of possible outcomes, shrinks as they age. We demonstrate that these sample-space-reducing (SSR) processes necessarily lead to Zipf's law in the rank distributions of their outcomes. We show that when noise is added to SSR processes, the corresponding rank distributions remain exact power laws, $p(x)\sim x^{-\lambda}$, where the exponent directly corresponds to the mixing ratio of the SSR process and noise. This allows us to give a precise meaning to the scaling exponent in terms of the degree to which a given process reduces its sample space as it unfolds. Noisy SSR processes further allow us to explain a wide range of scaling exponents in frequency distributions, ranging from $\alpha = 2$ to $\infty$. We discuss several applications showing how SSR processes can be used to understand Zipf's law in word frequencies, and how they are related to diffusion processes in directed networks and to ageing processes such as fragmentation. SSR processes provide a new alternative route to understanding the origin of scaling in complex systems without recourse to multiplicative, preferential, or self-organised critical processes.

A typical feature of ageing is that the number of possibilities in a system reduces as it ages. While a newborn can become a composer, politician, physicist, actor, or anything else, the chances for a 65-year-old physics professor to become a concert pianist are practically zero. The sample space, defined as the set of all possible outcomes of an ageing stochastic system (such as career paths), typically changes over time. Many history-dependent systems become more constrained in their dynamics as they unfold, i.e., their sample space becomes smaller over time. An example of a sample-space-reducing process is the formation of sentences. The first word in a sentence can be drawn from (the sample space of) all existing words; the use of subsequent words is constrained by the particular choice of the first word, and the second word can only be drawn from a smaller sample space. As the length of a sentence increases, the sample space of word use typically reduces.
Many history-dependent processes are characterised by power-law distribution functions in the frequency and rank distributions of their outcomes. The most famous example is the rank distribution of word frequencies in texts, which follows a power law with an exponent of approximately $-1$, Zipf's law [1]. Zipf's law has been found in countless natural and social phenomena, including gene expression patterns [2], human behavioural sequences [3], fluctuations in financial markets [4], scientific citations [5,6], distributions of city sizes [7] and firm sizes [8], and many more, see e.g. [9]. Over the past decades there has been a tremendous effort to understand the origin of power laws in distribution functions obtained from complex systems. Most of the explanations offered are based on multiplicative [10-12] or preferential [13-15] mechanisms, self-organised criticality [16], and a few other alternatives [17-21]. Here we offer an alternative route to scaling based on processes that reduce their sample space over time. History-dependent random processes have been studied generically [22,23], however without providing a general rationale for the emergence of scaling in complex systems.
FIG. 1: Imagine a set of $N = 20$ dice with different numbers of faces. We start by throwing the 20-faced dice (icosahedron). Suppose we get a face value of 13. We now have to take the 12-faced dice (dodecahedron), throw it, and get a face value of say 9, so that we must continue with the 8-faced dice. Say we throw a 7, forcing us to take the (ordinary) dice, with which we throw say a 5. With the 4-faced dice we get a 2, which forces us to take the 2-faced dice (coin). The process ends when we throw a 1 for the first time. The set of possible outcomes (sample space) reduces as the process unfolds. The sequence above was chosen to make use of the platonic dice for pictorial reasons only. The distribution of face values (rank ordered) gives Zipf's law.
FIG. 2 (right panel; vertical axis $P_N(i)$): The ball can only bounce downstairs; the left-right symmetry is broken. When we reach level 1 the process stops and is repeated. The sample space reduces from step to step in a nested way (nested random walk $\phi$). After many iterations the histogram of visits to each level follows Zipf's law, $p(x) \propto x^{-1}$. Symmetry breaking changes the uniform probability distribution to a power law. Mixing the processes $\phi$ and $\phi_R$ yields distributions $p(x) \propto x^{-\lambda}$, with $\lambda$ being the mixing ratio.
The essence of sample-space-reducing stochastic processes can be illustrated by thinking of a set of $N$ fair dice with different numbers of faces. The first dice has one face, the second has two faces (coin), the third one three, etc., up to dice number $N$, which has $N$ faces. Faces are numbered, so that dice number $k$ has face values $1, 2, \dots, k$. Take the dice with the largest number of faces ($N$) and throw it. The result is a number between 1 and $N$, say it is $K$. We now take dice number $K-1$ (with $K-1$ faces) and throw it, to get a number between 1 and $K-1$, say we throw $L$. We now take dice number $L-1$, throw it, etc. We repeat the process until we throw a 1 for the first time (at the latest with dice number 1), and the process stops.
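The dice process described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' simulation code: the function names `ssr_run` and `visit_frequencies`, the seed, and the number of runs are assumptions made here. It counts how often each face value appears per run, so that the result can be compared with $1/i$.

```python
import random
from collections import Counter

def ssr_run(n_faces, rng):
    """One run of the dice process: return the face values thrown, in order."""
    outcomes = []
    die = n_faces                    # start with the dice that has N faces
    while die >= 1:
        k = rng.randint(1, die)      # uniform face value in 1..die
        outcomes.append(k)
        die = k - 1                  # continue with dice k-1; 0 faces means stop
    return outcomes

def visit_frequencies(n_faces, n_runs, seed=0):
    """Fraction of runs in which each face value 1..n_faces was thrown."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_runs):
        counts.update(ssr_run(n_faces, rng))
    return {i: counts[i] / n_runs for i in range(1, n_faces + 1)}

if __name__ == "__main__":
    freq = visit_frequencies(20, 50_000)
    for i in (1, 2, 5, 10, 20):
        # empirical visit frequency next to the Zipf value 1/i
        print(i, round(freq[i], 3), round(1 / i, 3))
```

Because the face values within one run are strictly decreasing, each value is visited at most once per run, and every run ends with a 1.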
The process is depicted in Fig. 1. Given that the number of dice is $N$, what is the probability distribution $P_N(n)$ of the possible outcomes (face values $n = 1, 2, 3, \dots, N$) that were generated during the process? For later use, we call this process $\phi$, and introduce a notation for the sample space $\Omega_k$, which is the set of all possible outcomes of the process at the next timestep, given that the outcome of the previous timestep has been $k$. For example, in our case we have $\Omega_k = \{1, 2, \dots, k-1\}$. $\Omega_1$ is the empty set; thus, at 1 the process has no further options and stops.
That the rank distribution of this process is indeed exactly Zipf's law is shown with a simple proof by induction on $N$. Given the process $\phi$ and $N = 2$, then $P_2(2) = 1/2$, since the dice that starts the process has two faces $i = 1, 2$; with probability $1/2$ we throw $i = 2$. By construction, $P_2(1) = 1$, since with dice 2 we throw $i = 1$ with probability $1/2$, and with probability $1/2$ we get $i = 2$ and necessarily obtain $i = 1$ in the next time step. Let us now suppose that $P_{N'}(m) = 1/m$ has been shown up to level $N' = N - 1$. Now, if the process starts with dice $N$, the probability to hit $m$ directly is $1/N$. Likewise, one throws any other $n$, $N \geq n > m$, with probability $1/N$. If we get such an $n > m$, we continue with dice $n-1$ and will get $m$ with probability $P_{n-1}(m)$, which leads us to the recursive scheme for all $m < N$,
$$P_N(m) = \frac{1}{N}\left(1 + \sum_{m<n\leq N} P_{n-1}(m)\right).$$
Since $P_{n-1}(m) = 1/m$ for $m < n \leq N$ by assumption, the sum equals $(N-m)/m$, and simple algebra yields $P_N(m) = 1/m$. As pointed out above, we have $P_N(N) = 1/N$, which completes the proof that
$$P_N(m) = \frac{1}{m}. \qquad (1)$$
This shows that this simple prototype of a sample-space-reducing process exhibits an exact Zipf's law. An alternative picture that illustrates the path-dependence aspect of the same sample-space-reducing process is shown in Fig. 2.
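The induction argument can also be checked numerically by evaluating the recursion with exact rational arithmetic. The sketch below assumes the recursion reads $P_N(m) = \frac{1}{N}\bigl(1 + \sum_{m<n\leq N} P_{n-1}(m)\bigr)$ with $P_N(N) = 1/N$; the helper name `visit_probabilities` is hypothetical.

```python
from fractions import Fraction

def visit_probabilities(n_max):
    """P[N][m]: probability that a run started with the N-faced dice ever shows m."""
    P = {}
    for N in range(1, n_max + 1):
        P[N] = {}
        for m in range(1, N + 1):
            # Either m is thrown directly (probability 1/N), or some n > m is
            # thrown (probability 1/N each) and m is reached later from dice n-1.
            later = sum(P[n - 1][m] for n in range(m + 1, N + 1))
            P[N][m] = Fraction(1, N) * (1 + later)
    return P

if __name__ == "__main__":
    P = visit_probabilities(20)
    print(all(P[N][m] == Fraction(1, m)
              for N in P for m in P[N]))
```

Using `Fraction` avoids floating-point rounding, so the comparison with $1/m$ is an exact equality test for every $N$ up to `n_max`.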
In the left panel we show an iid stochastic process, where at each timestep a ball can jump from one of $N$ sites to any of the $N$ sites with equal probability. Since the process is independent, the conditional probability of jumping from site $i$ to site $j$ is $P(j|i) = 1/N$. There is no path dependence, and the sample space $\Omega_x$ is constant over time and over the potential outcomes of the dice, $\Omega_x = \{1, 2, \dots, N\}$ for all $x$. We refer to this process as the unconstrained random walk, and denote it by $\phi_R$. The distribution of visits to each site is $p(i) = 1/N$ for all sites $i$, see Fig. 2. To introduce path dependence, now imagine a ball that can bounce downstairs to lower levels randomly, but that can never climb to higher levels, Fig. 2 (right panel). If at time $t$ the ball is at level (site) $i$, at $t+1$ all lower levels $j \in \Omega_i$ can be reached with the same probability, $P(j|i) = 1/(i-1)$ for $j < i$. To bounce to higher levels is forbidden, $P(j|i) = 0$ for $j \geq i$. The process ends when the lowest stair level 1 is reached. In this process, the sample space obviously displays a nested structure, $\Omega_1 \subset \Omega_2 \subset \dots \subset \Omega_N$. Such processes are sample-space-reducing; we call them nested random walks. This type of nesting breaks the symmetry of the iid stochastic process. The distribution of the visits to each site (level) is $p_N(i) = 1/i$: since this process is equivalent to $\phi$, the same proof applies. The breaking of symmetry in random walks can naturally lead to power laws [24].
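The nested random walk can equivalently be written as a transition matrix with entries $P(j|i) = 1/(i-1)$ for $j < i$. The sketch below makes one assumption that the text only states in words: after reaching level 1 the process is restarted with a uniform draw over all $N$ levels. Under that assumption, power iteration gives a stationary visit distribution proportional to $1/i$. The names `nested_walk_matrix` and `stationary` are hypothetical.

```python
def nested_walk_matrix(n_sites):
    """Return T with T[i][j] = P(j | i) for sites 1..n_sites (index 0 is unused)."""
    T = [[0.0] * (n_sites + 1) for _ in range(n_sites + 1)]
    for i in range(2, n_sites + 1):
        for j in range(1, i):
            T[i][j] = 1.0 / (i - 1)      # only lower levels are reachable
    for j in range(1, n_sites + 1):
        T[1][j] = 1.0 / n_sites          # assumed restart: uniform over all levels
    return T

def stationary(T, n_iter=1000):
    """Approximate the stationary distribution by repeated application of T."""
    n = len(T) - 1
    p = [0.0] + [1.0 / n] * n            # start from the uniform distribution
    for _ in range(n_iter):
        p = [0.0] + [sum(p[i] * T[i][j] for i in range(1, n + 1))
                     for j in range(1, n + 1)]
    return p

if __name__ == "__main__":
    p = stationary(nested_walk_matrix(20))
    for i in (1, 2, 5, 10, 20):
        # ratio p(i)/p(1) next to the Zipf value 1/i
        print(i, p[i] / p[1], 1 / i)
```

Replacing every row of `T` by the uniform row $1/N$ would give the unconstrained walk $\phi_R$, whose stationary distribution is uniform; the only difference between the two cases is the strictly lower-triangular structure of the nested rows.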
It is conceivable that in many real systems the nestedness of sample-space-reducing processes is not realised perfectly. In the above example this would mean that from time to time upward moves are allowed, or that the nested process is perturbed with noise. We will now compute the distribution function for nested processes $\phi$ with a given noise level. In the language of the scenario depicted in Fig. 2, we look at a superposition of the nested random walk $\phi$ and the unconstrained random walk $\phi_R$. Using $\lambda$ to denote the mixing ratio, the nested process $\Phi$ with noise is
$$\Phi_\lambda = \lambda\,\phi + (1-\lambda)\,\phi_R, \qquad \lambda \in [0,1]. \qquad (2)$$
More concretely, if the ball is at site $i$, with probability $\lambda$ it jumps to any site $k \in \Omega_i$ (with uniform probability), and with probability $1-\lambda$ it jumps to any of the $N$ sites ($j \in \Omega_{N+1}$). In other words, each time before throwing the dice we decide with probability $\lambda$ that the sample space for the next throw is $\Omega_i$ (nested process $\phi$), or with probability $1-\lambda$ that it is $\Omega_{N+1}$ (iid noise $\phi_R$). We repeat this until the face value 1 is obtained for the first time, which stops the process. $\lambda = 0$ recovers the unconstrained random walk, $\lambda = 1$ gives the perfect nested walk.
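A minimal simulation of the noisy nested walk is sketched below. Two choices are assumptions of this sketch and not taken from the text: the walker restarts with a uniform draw over all $N$ sites after reaching site 1, and the exponent is estimated with an ordinary least-squares fit in log-log coordinates. The function names `simulate_noisy_ssr` and `loglog_slope` are hypothetical.

```python
import math
import random

def simulate_noisy_ssr(n_sites, lam, n_steps, seed=0):
    """Count visits to sites 1..n_sites for the mixed process with mixing ratio lam."""
    rng = random.Random(seed)
    counts = [0] * (n_sites + 1)              # index 0 is unused
    site = rng.randint(1, n_sites)            # first throw uses the full sample space
    for _ in range(n_steps):
        counts[site] += 1
        if site == 1:
            site = rng.randint(1, n_sites)    # assumed restart after the process stops
        elif rng.random() < lam:
            site = rng.randint(1, site - 1)   # nested move: only lower sites
        else:
            site = rng.randint(1, n_sites)    # noise: any of the N sites
    return counts

def loglog_slope(counts, lo, hi):
    """Least-squares slope of log(count) against log(site) for sites lo..hi."""
    pts = [(math.log(i), math.log(counts[i]))
           for i in range(lo, hi + 1) if counts[i] > 0]
    mx = sum(x for x, _ in pts) / len(pts)
    my = sum(y for _, y in pts) / len(pts)
    sxy = sum((x - mx) * (y - my) for x, y in pts)
    sxx = sum((x - mx) ** 2 for x, _ in pts)
    return sxy / sxx

if __name__ == "__main__":
    for lam in (1.0, 0.7, 0.5):
        counts = simulate_noisy_ssr(200, lam, 200_000)
        # the fitted slope can be compared with -lam
        print(lam, round(loglog_slope(counts, 5, 200), 2))
```

The fit range starts at site 5 rather than site 1 because the discrete process approaches a pure power law only for larger site indices; the cut-off is an illustrative choice.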
The probability that the noisy nested random walker reaches site $i$ from $j$ in the next step is
$$P(i|j) = \begin{cases} \dfrac{\lambda}{j-1} + \dfrac{1-\lambda}{N} & \text{for } i < j, \\[4pt] \dfrac{1-\lambda}{N} & \text{for } i \geq j. \end{cases} \qquad (3)$$
Let the probability to observe the walker at site $i$ be $p_\lambda(i)$. Obviously $p_\lambda(i) = \sum_{j=1}^{N} P(i|j)\, p_\lambda(j)$. Using Eq. (3) and the normalisation $\sum_{j} p_\lambda(j) = 1$, we get
$$p_\lambda(i) = \frac{1-\lambda}{N} + \sum_{j=i+1}^{N} \frac{\lambda}{j-1}\, p_\lambda(j). \qquad (4)$$
For simplicity we switch from discrete states $i$ and $j$ to continuous variables $x$ and $y$, respectively. This is justified for systems with many states. We obtain
$$p_\lambda(x) = \frac{1-\lambda}{N} + F(N) - F(x), \qquad (5)$$
with $F(x) = \int_{2}^{x+\zeta} \frac{\lambda}{y-1}\, p_\lambda(y)\, dy$. $\frac{1-\lambda}{N}$ and $F(N)$ are constants of the system. Taking the derivative of Eq. (5), we get $\frac{dp_\lambda(x)}{dx} = -\frac{\lambda}{x}\, p_\lambda(x)$, with the solution
$$p_\lambda(x) \propto x^{-\lambda}, \qquad (6)$$
which is again an exact power law with exponent $\lambda$. Note that $\lambda$ is the mixing parameter for the noise component. $\lambda = 1$ recovers Zipf's law, $p(x) \propto x^{-1}$; for $\lambda = 0$, one obtains the uniform distribution. We find perfect agreement between the result of Eq. (6) and numerical simulations, Fig. 3(a).
Sentence formation, and more generally discourse formation, is a process that fits exactly the properties of the sample-space-reducing type. As a first approach, in Fig. 3(b) we show the empirical distribution of word frequencies of Darwin's The Origin of Species, which has an exponent of $\gamma \approx 0.9$. Within our framework, this is recovered by a mixed process $\Phi_\lambda$ with a mixing parameter $\lambda = 0.9$, indicating that the nesting is not perfect.
Equation (6) is a statement about the rank distribution of the system. The result is easily transferred to probability distributions based on frequency, where the exponents are given by $\alpha = \frac{1+\lambda}{\lambda}$, see e.g. [9], and cover the interval $\lambda \in (0,1] \Rightarrow \alpha \in [2,\infty)$. This means that noisy sample-space-reducing processes $\Phi_\lambda$ are able to explain a remarkable range of exponents. Many of the observed power laws in nature display rank exponents around 1 (sometimes slightly below 1), and frequency distribution exponents between 2 and 3. In our framework this implies a mixing ratio of $\lambda > 0.5$.
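The final integration step and the exponent conversion can be written out explicitly. The block below is a sketch: it integrates the differential equation $dp_\lambda/dx = -(\lambda/x)\,p_\lambda$ by separation of variables, with an integration constant $C > 0$ introduced here for illustration, and then evaluates $\alpha = (1+\lambda)/\lambda$ at three sample values of $\lambda$.

```latex
\begin{align*}
\frac{dp_\lambda}{dx} = -\frac{\lambda}{x}\,p_\lambda
  \;&\Longrightarrow\; \frac{dp_\lambda}{p_\lambda} = -\lambda\,\frac{dx}{x}
  \;\Longrightarrow\; \ln p_\lambda(x) = -\lambda \ln x + \ln C
  \;\Longrightarrow\; p_\lambda(x) = C\,x^{-\lambda}, \\[6pt]
\alpha = \frac{1+\lambda}{\lambda}:\quad
  &\lambda = 1 \;\Rightarrow\; \alpha = 2, \qquad
  \lambda = 0.9 \;\Rightarrow\; \alpha \approx 2.11, \qquad
  \lambda = 0.5 \;\Rightarrow\; \alpha = 3 .
\end{align*}
```

The two end points show why frequency exponents between 2 and 3 correspond to mixing ratios between 1 and 0.5, since $\alpha$ decreases monotonically as $\lambda$ grows.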
The main result of Eq. (6) is remarkable in so far as it explains the emergence of scaling in an extremely simple way. Zipf's law emerges as a trivial consequence of breaking a directional symmetry in stochastic processes, or of the nestedness of sample space as the process unfolds. More general power-law exponents are obtained by the addition of iid random fluctuations to the process. The relation between the exponents and the noise level is strikingly simple. Sample-space-reducing processes provide a new alternative view on the emergence of scaling in natural, social, and man-made systems. It is a true alternative to multiplicative, preferential, self-organised criticality, or other mechanisms, e.g. proportional growth or communication constraints, that have so far been used for understanding the origin of power laws in various contexts. As an application, the emergence of scaling through sample-space-reducing processes can be used to understand Zipf's law in word frequencies. An empirical quantification of the degree of nestedness in sentence formation in a number of books allows one to understand the variations of the scaling exponents between the individual books [25]. Alternative growth models of scale-free networks that are not based on (non-local) preferential attachment are other obvious areas for applications. Finally, we note that sample-space-reducing processes and nesting are deeply connected to phase-space collapse in statistical physics [24,26-28], where the number of configurations does not grow exponentially with system size (as in ergodic systems), but grows sub-exponentially. Sub-exponential growth can be shown to hold for the 'phase space' of the sequences of the introduced sample-space-reducing processes.

FIG. 3: (a) Rank distributions of nested random walks with iid noise contributions from simulations of $\Phi_\lambda$, for $\lambda = 1$ (black), 0.7 (red), and 0.5 (blue); $N = 10{,}000$. The dependence of the exponent (slope) on the noise level $\lambda$ is shown in the inset. The exponent is identical to $\lambda$. (b) Empirical distribution of word frequencies in The Origin of Species (black), showing an exponent $\gamma \approx 0.9$. The corresponding distribution of the $\Phi_\lambda$ process with $\lambda = 0.9$ is shown (red), suggesting a slight deviation from perfect nesting.