Scaling laws of human interaction activity
Abstract
Even though people in our contemporary technological society depend on communication, our understanding of the underlying laws of human communicational behavior remains poor. Here we investigate the communication patterns in 2 social Internet communities in search of statistical laws in human interaction activity. This research reveals that human communication networks dynamically follow scaling laws that may also explain the observed trends in economic growth. Specifically, we identify a generalized version of Gibrat's law of social activity expressed as a scaling law between the fluctuations in the number of messages sent by members and their level of activity. Gibrat's law has been essential in understanding economic growth patterns, yet without an underlying general principle for its origin. We attribute this scaling law to long-term correlation patterns in human activity, which surprisingly span from days to the entire period of the available data of more than 1 year. Further, we provide a mathematical framework that relates the generalized version of Gibrat's law to the long-term correlated dynamics, which suggests that the same underlying mechanism could be the source of Gibrat's law in economics, ranging from large firms, research and development expenditures, gross domestic product of countries, to city population growth. These findings are also of importance for designing communication networks and for the understanding of the dynamics of social systems in which communication plays a role, such as economic markets and political systems.
The question of whether unforeseen outcomes of social activity follow emergent statistical laws has been an acknowledged problem in the social sciences since at least the last decade of the 19th century (1–4). Earlier discoveries include Pareto's law for income distributions (5), Zipf's law initially applied to word frequency in texts and later extended to firms, cities and others (6), and Gibrat's law of proportionate growth in economics (7–9).
Social networks are permanently evolving and Internet communities are growing more each day. Having access to the communication patterns of Internet users opens the possibility to unveil the origins of statistical laws that may lead us to the better understanding of human behavior as a whole. In this paper, we analyze the dynamics of sending messages in 2 Internet communities in search of statistical laws of human communication activity. The first online community (OC1) is mainly used by the group of men who have sex with men (MSM).* The data consists of over 80,000 members and more than 12.5 million messages sent over the course of 63 days. The target group of the second online community (OC2) is teenagers (10). The data covers 492 days of activity with more than 500,000 messages sent among almost 30,000 members. Both web sites are also used for social interaction in general. All data are completely anonymous, lack any message content, and consist only of the times at which the messages are sent and the identification numbers of the senders and receivers.
The act of writing and sending messages is an example of an intentional social action. In contrast to routinized behavior, the actants are aware of the purpose of their actions (2, 3). Nevertheless, the emergent properties of the collective behavior of the actants are unintended. In Fig. 1 A, we show a typical example of the activity of a member of OC1 depicting the times when the member sends messages. Figure 1 B provides the cumulative number of messages sent (green curve) compared with a random surrogate dataset (brown curve) obtained by shuffling the data, as discussed in Materials and Methods. As would be expected, there are large fluctuations in the members' activity when compared with a random signal (11–13, 15). The messages sent at random display small temporal fluctuations, whereas the OC1 member sends many more messages in the beginning and far fewer at the end of the period of data acquisition (as also seen in Fig. 1 C, displaying the number of messages sent per day). Although such extreme events or bursts have been documented for many systems, including email and letter post communication, instant messaging, web browsing and movie watching (11–15), their origin is still an open question.
Results
Growth in the Number of Messages
The cumulative number m_{j}(t) expresses how many messages have been sent by a certain member j up to a given time t [for better readability, we will not write the index j explicitly, m(t); see details on the notation in the SI Appendix, Sec. I]. The dynamics of m(t) between times t_{0} and t_{1} within the period of data acquisition T (t_{0} < t_{1} ≤ T) can be considered as a growth process, where each member exhibits a specific growth rate r_{j} (r for short notation): $$r\equiv \ln \frac{m_{1}}{m_{0}}$$ [1] where m_{0} ≡ m(t_{0}) and m_{1} ≡ m(t_{1}) are the number of messages sent until t_{0} and t_{1}, respectively, by each member. To characterize the dynamics of the activity, we consider 2 measures. (i) The conditional average growth rate, 〈r(m_{0})〉, quantifies the average growth of the number of messages sent by the members between t_{0} and t_{1} depending on the initial number of messages, m_{0}. In other words, we consider the average growth rate of only those members who have sent m_{0} messages until t_{0} (see Materials and Methods for more details). (ii) The conditional standard deviation of the growth rate for those members who have sent m_{0} messages until t_{0}, $$\sigma (m_{0})\equiv \sqrt{\langle {(r(m_{0})-\langle r(m_{0})\rangle )}^{2}\rangle }$$, expresses the statistical spread or fluctuation of growth among the members depending on m_{0}. Both quantities are relevant in the context of Gibrat's law in economics (7–9), which proposes a proportionate growth process, entailing the assumption that the average and the standard deviation of the growth rate of a given economic indicator are constant and independent of the specific indicator value. That is, both 〈r(m_{0})〉 and σ(m_{0}) are independent of m_{0} (9).
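The two measures can be illustrated with a minimal sketch. It assumes the message counts at t_{0} and t_{1} are available as plain dictionaries keyed by member; the function names and the grouping by exact m_{0} value are illustrative choices, not the procedure used for the figures.

```python
import math
from collections import defaultdict

def growth_rates(m0_by_member, m1_by_member):
    """Return (m0, r) pairs with r = ln(m1 / m0).

    Members with m0 == 0 are skipped because the ratio is undefined.
    """
    pairs = []
    for member, m0 in m0_by_member.items():
        m1 = m1_by_member.get(member, m0)
        if m0 > 0 and m1 >= m0:
            pairs.append((m0, math.log(m1 / m0)))
    return pairs

def conditional_stats(pairs):
    """Group growth rates by m0 and return {m0: (mean of r, std of r)}."""
    groups = defaultdict(list)
    for m0, r in pairs:
        groups[m0].append(r)
    stats = {}
    for m0, rates in groups.items():
        mean = sum(rates) / len(rates)
        variance = sum((r - mean) ** 2 for r in rates) / len(rates)
        stats[m0] = (mean, math.sqrt(variance))
    return stats

# Toy input: members "a" and "b" both start with 2 messages; "c" is skipped.
example = conditional_stats(
    growth_rates({"a": 2, "b": 2, "c": 0}, {"a": 4, "b": 8, "c": 5})
)
```

With real data, values of m_{0} would probably need to be pooled into (for example, logarithmic) bins so that each group contains enough members.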
In Fig. 2 A and B, we show the results of 〈r(m_{0})〉 and σ(m_{0}) versus m_{0} for both online communities. We find that the conditional average growth rate is fairly independent of m_{0}. On the other hand, the standard deviation decreases as a power law of the form: $$\sigma (m_{0})\sim m_{0}^{-\beta }$$ [2] We obtain by least-squares fitting the exponents β_{OC1} = 0.22 ± 0.01 for OC1 and β_{OC2} = 0.17 ± 0.03 for OC2 (the values deviate slightly for large m_{0} due to low statistics). Although the web sites are used by different member populations, the power law and the obtained exponents are quite similar. The exponents are also close to those reported for growth in economic systems such as firms and countries (0.15–0.18) (16), research and development expenditures at universities (0.25) (17), scientific output (0.28–0.4) (18), and city population growth (0.19–0.27) (19). The approximate agreement between the exponents obtained for very different systems (social or of human origin) can be considered as a generalization of Gibrat's law, suggesting that the mechanisms behind the growth properties in different systems may originate in the human activity represented by Eq. 2.
Figure 2 C and D depicts the results when we randomize the data of OC1 and OC2, respectively (see Materials and Methods for details of the randomization procedure), such that any temporal correlations are removed. The typical dynamics for such a surrogate dataset are shown in Fig. 1 B (the brown curve), which displays a clear random pattern of small fluctuations in comparison with the larger fluctuations of the original data (green curve). We find that the random signal displays a close to constant average growth rate 〈r(m_{0})〉 and that the fluctuations behave as in Eq. 2 but with an exponent β_{rnd} = 1/2 (Fig. 2 C and D). The origin of this value has a simple explanation: If an isolated individual randomly flips an ideal coin with no memory of the previous attempt, then the fluctuations from the expected value of the fraction of obtained heads decay as the square root of the number of throws, implying β_{rnd} = 1/2. In contrast to randomness, here we hypothesize that the origin of the generalized version of Gibrat's law with β < 1/2 in Eq. 2 is a nontrivial long-term correlation in communication activity. These correlations possibly arise from internal and external stimuli from other members transmitted through the highly connected network of individuals, an effect that is absent in the randomized data. The exponent value of β ≈ 0.2 for OC1 and OC2 implies that the fluctuations of very active members are smaller than the ones of less active members, but they are significantly larger compared with the random case (compare Fig. 2 A and B with Fig. 2 C and D).
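The coin-flip argument can be checked directly. The sketch below computes the exact standard deviation of the fraction of heads from the binomial distribution and fits the decay exponent on a log-log scale; the helper names and the chosen numbers of throws are illustrative assumptions.

```python
import math

def std_fraction_heads(n, p=0.5):
    """Exact standard deviation of the fraction of heads in n independent throws."""
    variance = sum(
        (k / n - p) ** 2 * math.comb(n, k) * p ** k * (1 - p) ** (n - k)
        for k in range(n + 1)
    )
    return math.sqrt(variance)

def loglog_slope(xs, ys):
    """Least-squares slope of log(y) against log(x)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mean_x = sum(lx) / len(lx)
    mean_y = sum(ly) / len(ly)
    numerator = sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, ly))
    denominator = sum((a - mean_x) ** 2 for a in lx)
    return numerator / denominator

throws = [4, 16, 64, 256]
# The fluctuations decay as n**(-beta_rnd); the fitted value is close to 1/2.
beta_rnd = -loglog_slope(throws, [std_fraction_heads(n) for n in throws])
```

Because the throws are independent, the variance of the fraction is p(1 − p)/n, which is where the exponent 1/2 comes from.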
Long-Term Correlations
The exceptional quality of the data (more than 10 million messages spanning several effective decades of magnitude in terms of both activity and time) allows us to test the above hypothesis by investigating the presence of temporal correlations in the individuals' activity. We aggregate the data to records of messages per day (an example is shown in Fig. 1 C) to avoid the daily cycle in the activity and analyze the number of messages sent by individuals per day, μ(t), where t denotes the day [$$m(t)\equiv \sum_{t'=1}^{t}\mu (t')$$; Fig. 1 D–F shows the color-coded daily activity of 3 members in OC1]. For every member, we obtain a record of a length of 63 days (OC1) or 492 days (OC2). We note that former studies reporting Eq. 2 (such as refs. 16–19) typically were not based on data with the temporal resolution we use here and therefore were not able to investigate its origin in terms of temporal correlations.
We quantify the temporal correlations in the members' activity by mapping the problem to a 1-dimensional random walk. The quantity $$Y(t)\equiv \sum_{t'=1}^{t}(\mu (t')-\langle \mu (t)\rangle )$$, where 〈μ(t)〉 is the average of the corresponding record μ(t), represents the position of the random walker that performs an up or down step given by μ(t′) − 〈μ(t)〉 at time step t′. The correlations after Δt steps are reflected in the behavior of the root-mean-square displacement $$F(\Delta t)\equiv \sqrt{\langle {[Y(t+\Delta t)-Y(t)]}^{2}\rangle }$$ (20), where 〈·〉 is the average over t and members. If the activity μ(t) is uncorrelated or short-term correlated, then one obtains F(Δt) ∼ (Δt)^{1/2}, Fick's law of diffusion, after some crossover time. In the case of long-term correlations, the result is a power-law increase $$F(\Delta t)\sim {(\Delta t)}^{H}$$ [3] where H > 1/2 is the fluctuation exponent [also known as Hurst exponent (20)]. In statistical physics, long-term correlation or persistence is also referred to as long-term “memory”. Because, in general, the records might be affected by trends, we use the standard detrended fluctuation analysis (DFA) (21) to calculate H (see SI Appendix, section III for a detailed description).
The results for OC1 are shown in Fig. 3 A and B, where we calculate Eq. 3 by separating the members in groups with different total number of messages sent by the members, M. We find that F(Δt) asymptotically follows a power law with H ≈ 1/2 for the less active members who sent fewer than 10 messages in the entire period (M < 10). The dynamics of the more active members display clear long-term correlations. We find that the fluctuation exponent increases to H ≈ 0.75 for members with M > 10^{3} (see Fig. 3 B). The smaller value of H for less active members could be due to the small amount of information that these members provide in the available time of data acquisition. When we shuffle the data to remove any temporal correlations, we obtain the random exponent H_{rnd} = 1/2 (as seen in Fig. 3 B), confirming that the correlations in the data are due to temporal structure.
The dynamics of the message activity in OC2 is similar to OC1 (see Fig. 3 C). On large time scales, we measure the fluctuation exponent increasing from H ≈ 1/2 to H ≈ 0.9, with increasing M (the exponents for very active members are based on poor statistics and therefore carry large error bars). Analogous to the results obtained for OC1, there are no correlations in the shuffled records (H_{rnd} = 1/2 in Fig. 3 D). The fact that H > 1/2 means that a sudden burst in activity of a member persists on time scales ranging from days to years. The distribution of activity is self-similar over time. Similar correlation results have been found in traded values of stocks and email data (22).
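As an illustration of the random-walk mapping, the following sketch computes the profile Y(t) and the root-mean-square displacement F(Δt) for a single daily record. It is the plain (non-detrended) fluctuation function averaged over t only, not the DFA procedure and not the average over members used for the figures; the function names are illustrative.

```python
import math

def profile(mu):
    """Cumulative sum of deviations from the record mean, with Y[0] = 0."""
    mean = sum(mu) / len(mu)
    y = [0.0]
    for value in mu:
        y.append(y[-1] + value - mean)
    return y

def fluctuation(mu, dt):
    """Root-mean-square displacement F(dt) of the profile, averaged over t."""
    y = profile(mu)
    squares = [(y[t + dt] - y[t]) ** 2 for t in range(len(y) - dt)]
    return math.sqrt(sum(squares) / len(squares))

# A strictly alternating record is anti-persistent: every displacement
# over 2 days is exactly zero.
record = [1, 0, 1, 0]
f1 = fluctuation(record, 1)
f2 = fluctuation(record, 2)
```

An estimate of H would then follow from the slope of log F(Δt) versus log Δt over a range of Δt, which requires records much longer than this toy example.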
Relation Between β and H
Next, we elaborate the mathematical framework that relates the growth process Eq. 2 to the long-term correlations, Eq. 3. To relate the exponent from Eq. 2, β, to the temporal correlation exponent γ, from Eq. 4, and therefore to H, one can first rewrite Eq. 1 as $$r=\ln \frac{m_{0}+\Delta m}{m_{0}}\approx \frac{\Delta m}{m_{0}}$$ with Δm ≡ m_{1} − m_{0}, where the approximation holds for Δm small compared with m_{0}. Next, the total increment of messages Δm is expressed in terms of smaller increments μ(t), such as messages per day: $$\Delta m=\sum_{t=t_{0}+1}^{t_{0}+\Delta t}\mu (t)$$ which is (assuming stationarity) statistically equivalent to $$\Delta m=\sum_{t=1}^{\Delta t}\mu (t)$$, and one can write $$r\approx \frac{1}{m_{0}}\sum_{t=1}^{\Delta t}\mu (t)$$ for the growth rate. The conditional average growth is then $$\langle r(m_{0})\rangle \approx \frac{1}{m_{0}}\left\langle \sum_{t=1}^{\Delta t}\mu (t)\right\rangle$$ Then the conditional standard deviation $$\sigma (m_{0})=\sqrt{\langle {[r(m_{0})-\langle r(m_{0})\rangle ]}^{2}\rangle }$$ can be written in terms of the autocorrelation function as follows: $$\sigma^{2}(m_{0})\approx \frac{\sigma_{\mu }^{2}}{m_{0}^{2}}\sum_{i=1}^{\Delta t}\sum_{j=1}^{\Delta t}C(|i-j|)$$ where $$C(\Delta t)=\frac{1}{\sigma_{\mu }^{2}}\langle [\mu (t)-\langle \mu (t)\rangle ][\mu (t+\Delta t)-\langle \mu (t)\rangle ]\rangle $$ is the autocorrelation function of μ(t) and σ_{μ} is the standard deviation of μ(t). The autocorrelation function C(Δt) measures the interdependencies between the values of the record μ(t). For uncorrelated values, C(Δt) is zero for Δt > 0 because, on average, positive and negative products of the record will cancel each other out.
In the case of short-term correlations, C(Δt) has a characteristic decay time, Δt_{×}. A prominent example is the exponential decay C(Δt) ∼ exp(−Δt/Δt_{×}). Long-term correlations are described by a slower decay, namely a power law $$C(\Delta t)\sim {(\Delta t)}^{-\gamma }$$ [4] with the correlation exponent 0 < γ < 1, which is related to the fluctuation exponent H from Eq. 3 by γ = 2 − 2H (20). We note that γ = 1 (or γ > 1) corresponds to an uncorrelated record with H = 1/2. A key property of long-term correlations is a pronounced mountain–valley structure in the records (20). Statistically, large values of μ(t) are likely to be followed by large values and small values by small values. Ideally, this feature holds on all time scales, which means a sequence in daily, weekly, or monthly resolution is correlated in the same way as the original sequence.
Assuming long-term correlations asymptotically decaying as in Eq. 4, we approximate the double sum with integrals and obtain $$\sigma (m_{0})\sim \frac{{(\Delta t)}^{1-\gamma /2}}{m_{0}}$$
In order to relate Δt and m_{0}, one can use Δt = x t_{0} [where x is an arbitrary (small) constant that simply states how large Δt is compared with t_{0}], and m_{0} ∼ t_{0}, which states that the number of messages is proportional to time assuming stationary activity. By using these 2 arguments we obtain $$\sigma (m_{0})\sim m_{0}^{-\gamma /2}$$ Comparing this equation with Eq. 2, we finally obtain β = γ/2, and, with γ = 2 − 2H, $$\beta =1-H$$ [5]
Eq. 5 is a scaling law formalizing the relation between growth and long-term correlations in the activity and is confirmed by our data. For OC1, we measured β_{OC1} ≈ 0.22 yielding H_{OC1} ≈ 0.78 from Eq. 5, which is in approximate agreement with the (maximum) exponent we obtained by direct measurements for OC1 (H = 0.75 ± 0.05 from Fig. 3 B). For OC2, we obtained β_{OC2} ≈ 0.17 and therefore H_{OC2} ≈ 0.83 through Eq. 5, which is not too far from the (maximum) exponent found by direct measurements for OC2 (H = 0.88 ± 0.03). According to Eq. 5, the original Gibrat's law (β_{G} = 0) corresponds to very strong long-term correlations with H_{G} = 1. This is the case when the activity on all time scales exhibits equally strong correlations. In contrast, β_{rnd} = 1/2 represents completely random activity (H_{rnd} = 1/2), as obtained for the randomized data in Fig. 3 B and D.
The mathematical framework relating long-term correlations quantified by H and the growth fluctuations quantified by β could be relevant to other complex systems. While the generalized version of Gibrat's law has been reported for economic indicators displaying β ≈ 0.2 (16–18), the origin of this scaling law is not clear and is still being investigated. Our results suggest that the value of β could be explained by the existence of long-term correlations in the activity of the corresponding system ranging from firms and markets to social and population dynamics. In turn, Eq. 5 establishes a missing link between studies of growth processes in economic or social systems (16–18) and studies of long-term correlations, such as in finance and the economy (23), Ethernet traffic (24), and human brain (25) or motor activity (26). Our results foreshadow the possibility that systems involving other types of human interaction, such as various Internet activities, communication via cell phones, trading activity, etc., may display similar growth and correlation properties as found here, offering the possibility of explaining their dynamics in terms of the long-term persistence of individuals' behavior.
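The chain from a power-law autocorrelation to β = γ/2 = 1 − H can also be checked numerically. The sketch below assumes an idealized autocorrelation with C(0) = 1 and C(s) = s^{−γ} for s ≥ 1, evaluates the double sum exactly, and reads off its scaling exponent from a doubling of Δt; all function names are illustrative. For large Δt the double sum is expected to grow as (Δt)^{2−γ}, so σ ∼ (Δt)^{1−γ/2}/m_{0}.

```python
import math

def double_sum(dt, gamma):
    """Sum of C(|i - j|) over i, j = 1..dt with C(0) = 1 and C(s) = s**(-gamma)."""
    # There are 2 * (dt - s) index pairs at distance s >= 1 and dt pairs at distance 0.
    off_diagonal = sum((dt - s) * s ** (-gamma) for s in range(1, dt))
    return dt + 2.0 * off_diagonal

def beta_from_gamma(gamma, dt=1000):
    """Estimate beta from sigma ~ sqrt(double sum) / m0, using m0 ~ dt."""
    ratio = double_sum(2 * dt, gamma) / double_sum(dt, gamma)
    slope = math.log(ratio) / math.log(2)  # local exponent, close to 2 - gamma
    return 1.0 - slope / 2.0

def hurst_from_beta(beta):
    """Eq. 5 rearranged: H = 1 - beta."""
    return 1.0 - beta

beta_estimate = beta_from_gamma(0.4)   # expected to lie close to gamma / 2 = 0.2
h_oc1 = hurst_from_beta(0.22)          # value quoted for OC1
h_oc2 = hurst_from_beta(0.17)          # value quoted for OC2
```

The small deviation of the estimate from γ/2 comes from finite-Δt corrections that grow only linearly in Δt and therefore vanish relative to the leading term.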
Growth of the Degree in the Underlying Social Network
Communication among the members of a community represents a type of social interaction that defines a network, in which a message is sent either on the basis of an existing relation between 2 members or to establish a new one. There is considerable interest in the origin of broad distributions of activity in social systems. Two paradigms have been invoked for various applications in social systems: the “rich-get-richer” idea used by Simon in 1955 (27) and the models based on optimization strategies as proposed by Mandelbrot (28). Regarding network models, the preferential attachment (PA) model has been introduced (29) to generate a type of stochastic scale-free network with a power-law degree distribution in the network topology. Considering the social network of members linked when they exchange at least 1 message (that has not been sent before), we examine the dynamics of the number of outgoing links of each member [the out-degree k(t)] in analogy to Eq. 2.
We start from the set of nodes consisting of all the members in the community, with no links between them, and chronologically add a directed link between 2 members when a message is sent. In analogy to the growth in the number of messages m(t) of each member, we study the growth of the members' out-degree k(t), i.e., the number of links to others. We define the growth rate of every member as $$r_{k}\equiv \ln \frac{k_{1}}{k_{0}}$$ [6] where k_{0} ≡ k(t_{0}) is the out-degree of a member at time t_{0} and k_{1} ≡ k(t_{1}) is the out-degree at time t_{1}. Again, there is a growth rate for each member j, but for better readability we skip the index. In Fig. 4, we study 〈r_{k}(k_{0})〉, the average growth rate conditional on the initial out-degree k_{0}, and σ_{k}(k_{0}), the standard deviation of the growth rate conditional on the initial out-degree k_{0}, for OC1 and OC2. We obtain almost constant average growth 〈r_{k}(k_{0})〉 as a function of k_{0}, as in the study of messages.
The conditional standard deviation of the network degree, σ_{k}(k_{0}), is shown in Fig. 4 for both social communities. We obtain a power-law relation analogous to Eq. 2: $$\sigma_{k}(k_{0})\sim k_{0}^{-\beta_{k}}$$ [7] with β exponents very similar to those found for the number of messages, namely β_{k,OC1} = 0.22 ± 0.02 for OC1 and β_{k,OC2} = 0.17 ± 0.08 for OC2. These values are consistent with those we obtained for the activity of sending messages.
Next, we consider the preferential attachment model, which has been introduced to generate scale-free networks (29) with power-law degree distribution P(k) of the type investigated in the present study. Essentially, it consists of successively adding nodes to the network by linking them to existing nodes that are chosen randomly with a probability proportional to their degree. We consider the undirected network and study the degree growth properties using Eqs. 6 and 7 and calculate the conditional average growth rate 〈r_{PA}(k_{0})〉 and the conditional standard deviation σ_{PA}(k_{0}). The times t_{0} and t_{1} are defined by the number of nodes attached to the network. Fig. 2 in the SI Appendix, section IV shows the results for an average degree 〈k〉 = 20, with 50,000 nodes at t_{0} and 100,000 nodes at t_{1}. We find a constant average growth rate that does not depend on the initial degree k_{0}. The conditional standard deviation is a function of k_{0} and exhibits a power-law decay characterized by Eq. 7 with β_{PA} = 1/2. The value β_{PA} = 1/2 in Eq. 5 corresponds to H = 1/2, indicating complete randomness. There is no memory in the system. Because each addition of a new node is completely independent of the preceding ones, there cannot be temporal correlations in the activity of adding links. Therefore, a purely preferential attachment type of growth is not sufficient to describe the social network dynamics found in the present study, and further temporal correlations have to be incorporated according to Eq. 3.
For the PA model, it has been shown that the degree of each node grows in time as $$k(t)\sim {\left(\frac{t}{t^{*}}\right)}^{b}$$, where t* is the time when the corresponding node was introduced to the system and b = 1/2 is the dynamics exponent in growing network models (30). Accordingly, the growth rate is given by $$r_{\text{PA}}=b\ln \frac{t_{1}}{t_{0}}$$, which is a constant independent of k_{0}, in accordance with our numerical findings. Furthermore, in the SI Appendix, section IV we analytically obtain the exponent β_{PA} = 1/2 and confirm the numerical results as well. Interestingly, an extension of the standard PA model has been proposed (31) that takes into account different fitnesses of the nodes for acquiring links, involving a distribution of b exponents and therefore a distribution of growth rates. This model opens the possibility to relate the distribution of fitness values to the fluctuations in the growth rates, a point that requires further investigation.
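The preferential attachment experiment can be sketched as follows. This is a minimal illustration, not the simulation behind the SI Appendix figure: the seed network (a complete graph on m + 1 nodes), the requirement of m distinct targets per new node, and the function name are assumptions made here. Each new node adds m links, so the average degree approaches 2m.

```python
import random

def degree_snapshots(n0, n1, m, seed=0):
    """Grow an undirected PA network and return the degrees of the first n0
    nodes at network size n0 (k0) and at network size n1 (k1)."""
    if not (m + 1 <= n0 < n1):
        raise ValueError("need m + 1 <= n0 < n1")
    rng = random.Random(seed)
    # Seed network: complete graph on m + 1 nodes, so every node has degree m.
    degree = [m] * (m + 1)
    # Each node appears in `stubs` once per link end, so a uniform draw from
    # this list selects a node with probability proportional to its degree.
    stubs = [node for node in range(m + 1) for _ in range(m)]
    k0 = None
    for new in range(m + 1, n1):
        if new == n0:
            k0 = list(degree)  # snapshot taken when the network has n0 nodes
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(stubs))
        degree.append(m)
        for target in targets:
            degree[target] += 1
            stubs.extend((target, new))
    return k0, degree[:n0]

k0, k1 = degree_snapshots(50, 200, 3, seed=1)
```

The growth rate of each node is then ln(k1/k0), and the conditional average and standard deviation follow by grouping nodes according to k0. The settings described in the text (〈k〉 = 20 with 50,000 and 100,000 nodes) would correspond to m = 10, n0 = 50000, and n1 = 100000 in this sketch.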
Discussion
From a statistical physics point of view, the finding of long-term correlations opens the question of the origin of such a persistence pattern in the communication. At this point, we speculate on 2 possible scenarios that require further studies. The question is whether the finding of an exponent H > 0.5 is due to a power-law (Levy-type) distribution (32, 33) in the time interval between 2 messages of the same person or just from pure correlations or long-term memory in the activity of people. In the first scenario, the intervals between the messages follow a power law (13, 34). Accordingly, the activity pattern comprises many short intervals and few long ones, implying persistent epochs of small and large activity. This fractal-like activity leads to long-term correlations with H > 1/2 (see the analogous problem of the origin of long-term correlations in DNA sequences as discussed in ref. 33). This scenario implies a direct link between the correlations and the distribution of interevent intervals, which can be obtained analytically. In the second scenario, the intervals between the messages do not follow a Levy-type distribution, but the values of the time intervals are not independent of each other, again representing long-term persistence. For example, the distribution of interevent times could be stretched exponential [see recent work on the study of extreme events of climatological records exhibiting long-term correlations (35)]. Thus, deciding between these 2 possible scenarios for the origin of correlations in activity requires an extended analysis of interevent intervals as well as correlations to determine whether the behavior is Levy-like or pure memory-like. A careful statistical analysis is needed.
To some extent, the human nature of persistent interactions enables the prediction of the actants' activity. Our finding implies that traditional mean-field approximations based on the assumption that the particular type of human activity under study can be treated as a large number of independent random events (Poisson statistics) may result in faulty predictions. In contrast, from the growth properties found here, one can estimate the probability for members of a certain activity level to send more than a given number of messages in the future. This result may help to improve the allocation of resources in communication-based systems ranging from economic markets to political systems. As a byproduct, our finding that the activity of sending messages exhibits long-term persistence suggests the existence of an underlying long-term correlated process. This process can be understood as an unknown individual state driven by various internal and external stimuli (36, 37) providing the probability to send messages. In addition, the memory in activity found here could be the origin of the long-term persistence found in other records representing a superposition of the individuals' behavior, such as the Ethernet traffic (24), highway traffic, stock markets, and so forth.
Materials and Methods
Calculations of 〈r(m_{0})〉, σ(m_{0}) and Optimal Times t_{0} and t_{1}.
The average growth rate, 〈r(m_{0})〉, and the standard deviation, $$\sigma (m_{0})=\sqrt{\langle r{(m_{0})}^{2}\rangle -{\langle r(m_{0})\rangle }^{2}}$$, are defined as follows. Calling P(r|m_{0}) the conditional probability density of finding a member with growth rate r(m_{0}) with the condition of initial number of messages m_{0}, we obtain $$\langle r(m_{0})\rangle =\int r\,P(r|m_{0})\,dr$$ and $$\langle r{(m_{0})}^{2}\rangle =\int r^{2}\,P(r|m_{0})\,dr$$
In order to calculate the growth rate in Eq. 1, one has to choose the times t_{0} and t_{1} in the period of data acquisition T. Naturally, it is best to use all data in order to have optimal statistics. Accordingly, t_{1} is chosen best at the end of the available data (t_{1} = T). We argue that if the choice of t_{0} is too small, then m(t_{0}) is zero for many members (those that send their messages later), which are then rejected in the calculation because of the division in Eq. 1. Conversely, if the choice of t_{0} is too large, then there is not enough time to observe the member's activity and r = 0 will occur frequently, indicating no change (members have sent their messages before). Thus, there must be an optimal time in between. In the SI Appendix, section II, Fig. 1, we plot, as a function of t_{0}, the number of members with at least 1 message at t_{0} [m_{0} > 0] who further exhibit at least some activity until t_{1} = T [m_{1} − m_{0} > 0]. For both online communities, we find an optimal t_{0} in the middle of the period of observation, t_{0} = T/2, a value that is used for the analysis in the main text.
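The selection of t_{0} can be sketched as a simple count over candidate times. The snippet assumes the data are available as a dictionary mapping each member to the list of times at which that member sent messages; this layout and the function names are illustrative assumptions.

```python
from bisect import bisect_right

def usable_members(send_times_by_member, t0, t1):
    """Count members with m(t0) > 0 and m(t1) - m(t0) > 0."""
    count = 0
    for times in send_times_by_member.values():
        ordered = sorted(times)
        m0 = bisect_right(ordered, t0)  # messages sent up to and including t0
        m1 = bisect_right(ordered, t1)  # messages sent up to and including t1
        if m0 > 0 and m1 > m0:
            count += 1
    return count

def best_t0(send_times_by_member, candidates, t1):
    """Return the candidate t0 that keeps the largest number of members."""
    return max(
        candidates,
        key=lambda t0: usable_members(send_times_by_member, t0, t1),
    )

# Toy data with observation period T = 10.
toy = {"a": [1, 9], "b": [2, 3], "c": [4, 8], "d": [7]}
chosen = best_t0(toy, candidates=[1, 5, 9], t1=10)
```

In this toy example an early t_{0} excludes members who start late, and a late t_{0} excludes members who have already stopped, so the intermediate candidate retains the most members.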
Shuffling of the Message Data.
The raw data comprises 1 entry for each message consisting of the time when the message is sent, the sender identifier, and the receiver identifier. For example: At t = 1 member a sends a message to member b, at t = 2 member a sends a message to member c, and so on.
The randomized surrogate dataset is created by randomly swapping the instants (time) at which the messages are sent between 2 events chosen at random. Thus, each message entry randomly obtains the time of another one, meaning that the total number of messages is preserved and the associations between them get shuffled. Temporal correlations are destroyed, but the set of instants at which the messages are sent remains unchanged. For instance, swapping events at t = 1 and t = 6 results in t = 1, c → d, and t = 6, a → b.
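A minimal sketch of such a surrogate construction is given below. It assumes each message is stored as a (time, sender, receiver) tuple and applies one uniform random permutation of the time stamps, which is used here as a stand-in for repeated random pairwise swaps; the function name and data layout are illustrative.

```python
import random

def shuffle_times(messages, seed=None):
    """Return a surrogate list in which the time stamps are randomly
    reassigned among the messages.

    The sender-receiver pairs, the total number of messages, and the set of
    time stamps are preserved; only their association is randomized.
    """
    rng = random.Random(seed)
    times = [time for time, _, _ in messages]
    rng.shuffle(times)
    return [
        (time, sender, receiver)
        for time, (_, sender, receiver) in zip(times, messages)
    ]

raw = [(1, "a", "b"), (2, "a", "c"), (6, "c", "d")]
surrogate = shuffle_times(raw, seed=42)
```

Because every member keeps the same number of sent messages, any difference between the original and surrogate growth statistics can be attributed to the temporal ordering alone.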
Acknowledgments
We thank C. Briscoe, L.K. Gallos, and H.D. Rozenfeld for discussions. This work was supported by National Science Foundation Grant SES-0624116. F.L. acknowledges financial support from The Swedish Bank Tercentenary Foundation.
Footnotes
 ^{1}To whom correspondence should be addressed. Email: hmakse{at}lev.ccny.cuny.edu

Edited by H. Eugene Stanley, Boston University, Boston, MA, and approved June 2, 2009

Author contributions: D.R., S.V.B., S.H., F.L., and H.A.M. designed research; D.R. performed research; D.R., S.V.B., S.H., F.L., and H.A.M. analyzed data; and D.R., S.V.B., S.H., F.L., and H.A.M. wrote the paper.

The authors declare no conflict of interest.

This article is a PNAS Direct Submission.

↵* The study of the deidentified MSM dating site network data was approved by the Regional Ethical Review Board in Stockholm, record 2005/5:3.

This article contains supporting information online at www.pnas.org/cgi/content/full/0902667106/DCSupplemental.
References
 2. Weber M
 3. Giddens A
 4. Durkheim É
 5. Pareto V
 6. Zipf G
 7. Gibrat R
 8. Sutton J
 12. Dewes C, Wichmann A, Feldman A
 19. Rozenfeld HD, et al.
 20. Feder J
 23. Mantegna RN, Stanley HE
 25. Linkenkaer-Hansen K, Nikouline VV, Palva JM, Ilmoniemi RJ
 26. Ivanov PC, Hu K, Hilton MF, Shea SA, Stanley HE
 28. Jackson W, Mandelbrot B
 29. Barabási AL, Albert R
 36. Hedström P