Nonlinear bias toward complex contagion in uncertain transmission settings
Edited by Alan Hastings, University of California, Davis, CA; received July 18, 2023; accepted November 24, 2023
Significance
Contagion dynamics are usually separated into two classes: simple contagion, used notably to describe the spread of infectious diseases, and complex contagion, mainly used to model the spread of certain social phenomena. A distinguishing feature of simple contagion is that the rate of infection of an individual is proportional to the number of exposures: being in contact with two infectious individuals doubles the rate. Complex contagions, on the other hand, allow nonlinear rates of adoption. In this paper, however, we show that imperfect knowledge of the transmission settings, such as how fast a disease spreads in different environments, blurs the line between the two by introducing a systematic nonlinear bias in the rate of adoption.
Abstract
Current epidemics in the biological and social domains are challenging the standard assumptions of mathematical contagion models. Chief among them are the complex patterns of transmission caused by heterogeneous group sizes and infection risk varying by orders of magnitude in different settings, like indoor versus outdoor gatherings in the COVID-19 pandemic or different moderation practices in social media communities. However, quantifying these heterogeneous levels of risk is difficult, and most models ignore them. Here, we include these features in an epidemic model on weighted hypergraphs to capture group-specific transmission rates. We study analytically the consequences of ignoring the heterogeneous transmissibility and find an induced superlinear infection rate during the emergence of a new outbreak, even though the underlying mechanism is a simple, linear contagion. The dynamics produced at the individual and group levels are therefore more similar to complex, nonlinear contagions, thus blurring the line between simple and complex contagions in realistic settings. We support this claim by introducing a Bayesian inference framework to quantify the nonlinearity of contagion processes. We show that simple contagions on real weighted hypergraphs are systematically biased toward the superlinear regime if the heterogeneity of the weights is ignored, greatly increasing the risk of erroneous classification as complex contagions. Our results provide an important cautionary tale for the challenging task of inferring transmission mechanisms from incidence data. Yet, they also pave the way for effective models that capture complex features of epidemics through nonlinear infection rates.
Models of epidemics on networks allow us to account for the complex contact structures found within human populations (1), albeit at the price of having to make significant simplifying assumptions. Most commonly, we assume that there is a linear relationship between exposure and the rate of infection and that the slope of this relationship is accurately captured by an average transmission rate (2). The ongoing COVID-19 pandemic challenges these assumptions: the risk of transmission has been shown to vary 20-fold or more between indoor and outdoor settings (3), and simple differences in indoor ventilation can also greatly affect it (4). Similarly, the environmental media will affect the different modes of transmission of influenza A viruses (5), leading to variable risks of infection. Therefore, there is no single “transmission rate” for diseases like influenza or COVID-19 since it varies across spaces and activities simply due to the physical settings of the interactions.
Heterogeneous transmission is not unique to respiratory diseases. The context of contacts always matters for biological pathogens, famously so for sexually transmitted infections (6, 7), and is perhaps even more relevant for the study of social contagions (8). For instance, individuals might behave or express themselves in different ways in different groups (9). One well-studied example is that of positive feedback between affective sharing and similarity attraction among group members (10), where individuals might share more with people they find similar to themselves. Indeed, recent models now attempt to include the impacts of such context-dependent behavior on epidemic dynamics (11).
One critical lesson from the study of complex systems is that a quantity varying across orders of magnitude is unlikely to be well captured by its mean, as many models assume. Despite evidence that context-specific transmission is a key feature of contagions of all sorts, there is currently no general approach to model these dynamics.
Modeling heterogeneous transmission rates in different settings is challenging because they induce important dynamical correlations between agents found in that setting (12). For instance, consider an office building with an inadequate ventilation system; not only is transmission increased around infectious individuals, but we are also more likely to find infectious individuals in this building given that people there work in a setting with bad ventilation. Models thus need to capture two important features: heterogeneity in transmission rates across settings and dynamical correlations between the epidemiological states of individuals found in these different settings.
Accounting for these heterogeneities and correlations in disease spread can be done in multiple ways. We can use stochastic simulations on networks to fully capture the structure of contacts and of the context-dependent transmission rates of infectious diseases. Here, we mainly turn toward recent advances in the modeling of higher-order networks (13) and use weighted hypergraphs in an approximate master equation framework (14) to capture these very same features. We show that these two approaches are equivalent, but the latter allows us to unravel the complex dynamics produced by the heterogeneous transmission.
More precisely, we show that heterogeneous transmission rates across settings can be captured using a superlinear infection rate at the level of groups. For instance, if there are many more infectious individuals in an office building than expected on average, one can infer that the local transmission rate is likely greater than the mean. This leads to a local transmission rate that varies with the number of infectious individuals, and its functional form can be rich and varied depending on the underlying heterogeneity. We demonstrate how to perform this mapping in the context of both archetypal Susceptible-Infectious-Susceptible (SIS) and Susceptible-Infectious-Recovered (SIR) dynamics.
As we derive these results in the next sections, it is important to keep in mind the potential impacts of this nonlinearity. Notably, using a Bayesian inference framework, we show that data produced by systems with heterogeneous rates lead to a systematic nonlinear bias in the inferred infection rate function if heterogeneity is not taken into account. This is usually interpreted as an indicator of complex contagions (15, 16) or, more recently in the network science community, an indicator of higher-order interactions (13). Therefore, without a careful treatment of heterogeneity in the transmission settings, distinguishing simple and complex contagions becomes impractical. This may come as a surprise since complex contagion with superlinear infection rates typically leads to dramatically different outcomes from linear ones (2, 17–19). However, the nonlinearity induced by heterogeneous rates is only an effective model and, as we demonstrate here, is only valid within a limited time window. In a nutshell, the fact that heterogeneous transmission rates map to a superlinear infection rate is a cautionary tale for mechanistic inference but also an opportunity for practitioners to improve models, forecasts, and interventions. We provide examples to highlight these different aspects in what follows.
Results
Contagion Models.
We consider infinite-size random higher-order networks: nodes belong to groups of size n, and each node has a membership m, corresponding to the number of groups in which it participates. The ensemble is characterized by a group size distribution $p_n$ and a membership distribution $g_m$. Nodes are assigned to groups uniformly at random, and there are no correlations between m and n. Also, each group of size n possesses some intrinsic transmission rate λ drawn from a conditional probability density function $\mathcal{P}(\lambda \mid n)$.
Formally, we are describing an ensemble of random weighted hypergraphs, where λ is the weight associated with a group [Fig. 1A]. We therefore refer to a specific group type by the pair $(n,\lambda)$, with joint probability density $p_n\,\mathcal{P}(\lambda \mid n)$.
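One way to sample such an ensemble is a bipartite configuration model: every node contributes m membership stubs, every group offers n slots, and stubs are matched to slots uniformly at random. The sketch below assumes fixed membership and size sequences (rather than drawing them from $g_m$ and $p_n$) and a hypothetical callback `sample_rate(n, rng)` standing in for $\mathcal{P}(\lambda \mid n)$; it illustrates the ensemble and is not the generator used for the figures.

```python
import numpy as np


def random_weighted_hypergraph(memberships, group_sizes, sample_rate, rng):
    """Bipartite configuration-model sketch of a random weighted hypergraph.

    memberships[v] : number of groups node v joins (its membership m).
    group_sizes[g] : size n of group g.
    sample_rate    : hypothetical callback (n, rng) -> lambda drawn from P(lambda | n).
    """
    memberships = np.asarray(memberships, dtype=int)
    group_sizes = np.asarray(group_sizes, dtype=int)
    if memberships.sum() != group_sizes.sum():
        raise ValueError("total memberships must equal total group slots")
    # One stub per (node, membership); shuffling matches stubs to slots uniformly.
    stubs = np.repeat(np.arange(memberships.size), memberships)
    rng.shuffle(stubs)
    # Cut the shuffled stubs into consecutive blocks of the requested group sizes.
    # Caveat: a node can occasionally land twice in the same group, which is
    # negligible for large sparse ensembles but not removed here.
    groups = np.split(stubs, np.cumsum(group_sizes)[:-1])
    # Each group gets its own transmission rate, i.e., its hyperedge weight.
    rates = np.array([sample_rate(n, rng) for n in group_sizes])
    return groups, rates


def weibull_rate(n, rng, lam0=1.0, sigma=1.0):
    """Hypothetical rate sampler: Weibull with scale lam0 and shape 1/sigma, same for all n."""
    return lam0 * rng.weibull(1.0 / sigma)
```

For example, `random_weighted_hypergraph(np.full(300, 2), np.full(200, 3), weibull_rate, np.random.default_rng(1))` places 300 nodes of membership 2 into 200 triangles with Weibull-distributed weights.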
Fig. 1.

On these higher-order networks, we consider simple contagion processes in which each node is either infectious, susceptible, or recovered. Below, we mainly focus on the Susceptible-Infectious-Susceptible model in which infectious individuals who recover immediately become susceptible again. Equivalent derivations for the Susceptible-Infectious-Removed model are provided in Materials and Methods (Section E).
In a group of type $(n,\lambda)$ with $i$ infectious nodes, each of the $n-i$ susceptible nodes becomes infectious at a linear rate $\lambda i$. Infectious nodes recover at a rate set to 1 without loss of generality. We denote by $s_m(t)$ the fraction of nodes that are of membership m and are susceptible at time t, and by $f_{n,i}(\lambda,t)$ the fraction of all groups that are of type $(n,\lambda)$ with i infectious nodes at time t. The evolution of these quantities is governed by the following system of approximate master equations (2)
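$$\frac{ds_m}{dt} = g_m - s_m - m\,r(t)\,s_m \tag{1a}$$
$$\frac{df_{n,i}(\lambda,t)}{dt} = (i+1)\,f_{n,i+1}(\lambda,t) - i\,f_{n,i}(\lambda,t) + (n-i+1)\bigl[\lambda(i-1)+\rho(t)\bigr]f_{n,i-1}(\lambda,t) - (n-i)\bigl[\lambda i+\rho(t)\bigr]f_{n,i}(\lambda,t) \tag{1b}$$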
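This system is technically infinite dimensional because of the continuous variable λ. However, if we assume a discretization such that $\lambda \in \{\lambda_1,\dots,\lambda_K\}$ with K finite, then the system contains on the order of $m_{\max} + K\,n_{\max}^{2}$ equations, where $m_{\max}$ and $n_{\max}$ are the maximal membership and maximal group size, respectively.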
In Eq. 1a, the evolution of each $s_m$ is treated in a heterogeneous mean-field fashion (1): the first two terms describe infectious nodes recovering at unit rate (they represent a fraction $g_m - s_m$ of all nodes), while the third term corresponds to new infections of susceptible nodes that are members of m groups, with $r$ the average rate of infection within each of these groups. In Eq. 1b, the evolution of each $f_{n,i}(\lambda,t)$ is described using a master equation characterizing the inflow and outflow of probability associated with all possible states, that is, all possible numbers i of infectious nodes, for a group of type $(n,\lambda)$. The first two terms describe infectious nodes recovering at unit rate, while the last two correspond to new infections. The infection rate due to infectious nodes within the group is treated exactly (i.e., the terms involving λ), while the contribution of all other groups to which a susceptible node belongs is approximated by the average infection rate $\rho$. We thus call this an approximate master equation system.
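The mean-field quantities $r(t)$ and $\rho(t)$ are calculated as
$$r(t) = \frac{\sum_n \int_0^\infty d\lambda \sum_i \lambda i\,(n-i)\,f_{n,i}(\lambda,t)}{\sum_n \int_0^\infty d\lambda \sum_i (n-i)\,f_{n,i}(\lambda,t)} \tag{2a}$$
$$\rho(t) = r(t)\,\frac{\sum_m m(m-1)\,s_m(t)}{\sum_m m\,s_m(t)} \tag{2b}$$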
Note that unless specified otherwise, sums over m (n) run over every value such that $g_m > 0$ ($p_n > 0$), and sums over i cover the range $0 \le i \le n$. The quantity $r$ corresponds to the average rate of infection for a susceptible node in a group, which is calculated by averaging over groups proportionally to their number of susceptible nodes, $(n-i)\,f_{n,i}(\lambda,t)$. We then estimate $\rho$ by multiplying $r$ with the expected number of other groups a susceptible node in a group belongs to. The membership distribution of a susceptible node in a group is proportional to $m\,s_m$ (because of the friendship paradox), and the number of other groups is $m-1$.
The global prevalence (average fraction of infectious nodes) is then measured as $I(t) = 1 - \sum_m s_m(t)$.
In Figs. 1 B and C, we show the accuracy of our framework compared to Monte Carlo simulations, for both the SIS and SIR models.
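To make Eqs. 1 and 2 concrete, the sketch below integrates them for a single group size n, a membership distribution given as an array `g` (with `g[m-1]` standing for $g_m$ and summing to 1), and λ discretized onto K values with probabilities `weights`, as in the discretization mentioned above. It uses a fixed-step fourth-order Runge–Kutta scheme so that only NumPy is needed; the seeding (each node independently infectious with probability `eps`), the step size, and the function names are assumptions for illustration, not the authors' implementation.

```python
import math
import numpy as np


def ame_rhs(y, g, n, lam):
    """Right-hand side of Eqs. 1a-1b for one group size n and K discretized rates lam."""
    M, K = g.size, lam.size
    s = y[:M]                                   # s_m for m = 1..M
    f = y[M:].reshape(K, n + 1)                 # f_{n,i}(lambda_k) for i = 0..n
    m = np.arange(1, M + 1)
    i = np.arange(n + 1)
    within = lam[:, None] * i[None, :]          # linear within-group rate lambda * i
    sus = (n - i)[None, :] * f                  # groups weighted by their susceptibles
    r = (within * sus).sum() / sus.sum()        # Eq. 2a
    rho = r * (m * (m - 1) * s).sum() / (m * s).sum()   # Eq. 2b
    ds = g - s - m * r * s                      # Eq. 1a
    up = (n - i) * (within + rho)               # total infection rate out of state i
    df = -(i + up) * f                          # outflow: recoveries and infections
    df[:, :-1] += i[1:] * f[:, 1:]              # inflow from i + 1 by recovery
    df[:, 1:] += up[:, :-1] * f[:, :-1]         # inflow from i - 1 by infection
    return np.concatenate([ds, df.ravel()])


def integrate_ame(g, n, lam, weights, eps=0.01, t_max=20.0, dt=0.01):
    """Fixed-step RK4 from a seed where each node is infectious with probability eps."""
    g, lam, weights = (np.asarray(a, dtype=float) for a in (g, lam, weights))
    seed = np.array([math.comb(n, k) * eps**k * (1 - eps) ** (n - k) for k in range(n + 1)])
    y = np.concatenate([(1 - eps) * g, np.outer(weights, seed).ravel()])
    steps = int(round(t_max / dt))
    t = dt * np.arange(steps + 1)
    prevalence = np.empty(steps + 1)
    prevalence[0] = 1.0 - y[: g.size].sum()     # I(t) = 1 - sum_m s_m
    for k in range(steps):
        k1 = ame_rhs(y, g, n, lam)
        k2 = ame_rhs(y + 0.5 * dt * k1, g, n, lam)
        k3 = ame_rhs(y + 0.5 * dt * k2, g, n, lam)
        k4 = ame_rhs(y + dt * k3, g, n, lam)
        y = y + dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        prevalence[k + 1] = 1.0 - y[: g.size].sum()
    return t, prevalence, y
```

For example, `integrate_ame([0, 0, 1], 3, [0.5, 3.0], [0.5, 0.5])` follows nodes of membership 3 in triangles whose rates are 0.5 or 3.0 with equal probability; calling it with the single rate `[1.75]` (the mean) gives the homogeneous-rate counterpart. Note that Eq. 1b conserves $\sum_i f_{n,i}(\lambda,t)$ for every λ, which is a convenient sanity check.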
Note that we model contagion on a quenched (static) hypergraph representing the backbone of social interactions, but the formalism allows more flexibility. We could choose other forms for $\rho$, for instance, to represent dynamically changing random interactions (e.g., random encounters at the grocery store), more in line with standard mass-action models. Therefore, $\rho$ can be seen as a general mean-field term that couples otherwise isolated group interactions. Let us emphasize that other forms of coupling would not change the main results in this paper, which mainly concern the local group dynamics.
A clear limitation of our theoretical framework, however, is the hypothesis of a randomized structure. As we will show, our results still hold quite well for general higher-order networks, but one can already envision generalizations of this work to other formalisms incorporating more structural features. Compartmental approaches taking into account degree-based correlations (20) and individual-based mean-field approaches (1, 21, 22) could potentially fill this gap; it is essential, however, that they not only describe the network accurately but also account for local dynamical correlations, as in ref. 11, since these correlations are crucial for what follows.
Characterization of the Effective Transmission Rate.
The system of ODEs given by Eq. 1 is highly resolved and of high dimension; let us call $\{f_{n,i}(\lambda,t)\}$ the complete partition. While data on group interactions (membership and group size) can be extracted, the strength of these interactions, which dictates the local transmission rate λ, is much harder to measure. This likely explains why typical models ignore such heterogeneity and use a single, homogeneous transmission rate. While this modeling assumption is standard practice, we show here that it systematically transforms the infection rate into a superlinear function.
To model heterogeneous group transmissibility with a homogeneous rate, we need to average over the transmission rate, without losing the correlation between the state of a group and the underlying local transmission rate. Namely, we focus on the following coarse-grained system
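$$\frac{ds_m}{dt} = g_m - s_m - m\,r(t)\,s_m \tag{3a}$$
$$\frac{df_{n,i}(t)}{dt} = (i+1)\,f_{n,i+1}(t) - i\,f_{n,i}(t) + (n-i+1)\bigl[\bar\lambda_{n,i-1}(t)\,(i-1)+\rho(t)\bigr]f_{n,i-1}(t) - (n-i)\bigl[\bar\lambda_{n,i}(t)\,i+\rho(t)\bigr]f_{n,i}(t) \tag{3b}$$
where $f_{n,i}(t) = \int_0^\infty f_{n,i}(\lambda,t)\,d\lambda$ is the coarse-grained partition, and where $\bar\lambda_{n,i}(t)$ is the effective transmission rate in a group of size n where i nodes are already infectious. Note that the definition of $\rho$ remains the same [Eq. 2b], but that we redefine $r(t)$ as
$$r(t) = \frac{\sum_{n,i} \bar\lambda_{n,i}(t)\,i\,(n-i)\,f_{n,i}(t)}{\sum_{n,i} (n-i)\,f_{n,i}(t)} \tag{4}$$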
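There is no approximation involved when passing from Eq. 1b to Eq. 3b: integrating Eq. 1b over λ yields Eq. 3b exactly. However, the complexity of the complete system is now hidden inside the effective transmission rate
$$\bar\lambda_{n,i}(t) = \frac{\int_0^\infty \lambda\,f_{n,i}(\lambda,t)\,d\lambda}{\int_0^\infty f_{n,i}(\lambda,t)\,d\lambda} \tag{5}$$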
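The exact description of the temporal evolution of the coarse-grained model [Eq. 3] requires the evaluation of the effective transmission rate in Eq. 5, which depends on the complete partition $\{f_{n,i}(\lambda,t)\}$. However, in the early stage of an epidemic, when only a vanishing fraction of individuals are infectious, the population is essentially healthy and the linearized dynamics are dominated by their leading mode: $f_{n,i}(\lambda,t) \simeq \epsilon\,e^{\Lambda t}\,v_{n,i}(\lambda)$ for $i \ge 1$, where $\epsilon \ll 1$ sets the initial amplitude, $v_{n,i}(\lambda)$ is the leading eigenvector of the Jacobian matrix evaluated at the disease-free state, and $\Lambda$ is its associated eigenvalue (Materials and Methods, Section D). An important thing to notice is that the temporal term $e^{\Lambda t}$ is decoupled from the term depending on λ, which is just $v_{n,i}(\lambda)$. Therefore, the effective transmission rate at the beginning of an outbreak simplifies to
$$\bar\lambda^{\mathrm{EV}}_{n,i} = \frac{\int_0^\infty \lambda\,v_{n,i}(\lambda)\,d\lambda}{\int_0^\infty v_{n,i}(\lambda)\,d\lambda} \tag{6}$$
which is time invariant. Unless $\mathcal{P}(\lambda \mid n)$ is sharply peaked around a value $\lambda_0$, the resulting infection rate $i\,\bar\lambda^{\mathrm{EV}}_{n,i}$ will be a nonlinear function of i, the number of infectious nodes in the group.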
It is worth underscoring that nonlinear rates or activation functions have been associated with complex contagions for some time (16, 23, 24). More recently, nonlinear infection mechanisms at the level of groups (19) were shown to be an equivalent formulation of simplicial and hypergraph contagion models (20, 22, 25–29), which have been actively studied in the past few years. In these processes, the contagion is transmitted through pairwise interactions and through higher-order interactions involving more than two nodes, which become active when all but one of their nodes are infectious. In the context of simplicial contagion, for instance, the infection rate within a group (associated with a simplex) becomes a combinatorial sum over all active transmission channels. However, this can be rewritten as a generic nonlinear function of i, the number of infectious nodes in the simplex (see Materials and Methods, Section A for an explicit mapping). In essence, the effective nonlinear infection rate we find by averaging over group transmission leads to a mechanism we would associate with generic complex contagion models, but also with this more recent perspective of higher-order contagion.
Fig. 2A illustrates the temporal evolution of the effective transmission rate for a network in which all groups have the same size n. We thus focus on the dependence on i, i.e., $\bar\lambda_i(t) \equiv \bar\lambda_{n,i}(t)$. We see that the eigenvector (EV) approximation of Eq. 6 accurately captures the effective transmission rate for a long time at the beginning of an epidemic. Notice that the effective transmission rate increases approximately linearly with i, which results in a superlinear (approximately quadratic) infection rate $i\,\bar\lambda_i$ at the level of groups.
Fig. 2.

The EV effective transmission rate combined with the coarse-grained dynamical system [Eq. 3b] captures the early phase of an outbreak, as seen in Fig. 2 B and C for the SIS model and the SIR model (Materials and Methods, Section E), as opposed to simply considering a mean effective rate $\langle\lambda\rangle_n$. However, when a sufficiently large portion of the population has been infected, the EV approximation breaks down (see later times in Fig. 2A). At that point, the coarse-grained approximate system predicts a superexponential growth in Fig. 2 B and C, typical of models with a superlinear infection rate (2). This is not a realistic feature of the underlying system, however. Note that we observe a similar behavior with other group structures and rate distributions (SI Appendix).
Let us try to better understand why we obtain this functional form of the effective transmission rate in Fig. 2, how it varies with the rate distribution, and why it breaks down once a sufficient number of nodes have been infected. The leading eigenvector $v_{n,i}(\lambda)$ does not possess an explicit analytical form in general, but near the critical point, it is proportional to the stationary distribution $f^*_{n,i}(\lambda)$ for $i \ge 1$ (Materials and Methods, Section D), which, on the other hand, possesses an explicit analytical form. Therefore, even though it may seem counterintuitive since we aim to describe the early phase, the stationary state ($t \to \infty$) of the SIS model provides helpful analytical insights.
Enforcing detailed balance, we find that the stationary state of the complete dynamical system [Eq. 1b] is
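$$f^*_{n,i}(\lambda) = \frac{p_n\,\mathcal{P}(\lambda \mid n)}{Z_n(\lambda)}\binom{n}{i}\prod_{j=0}^{i-1}\bigl(\lambda j + \rho^*\bigr) \tag{7}$$
with $Z_n(\lambda) = \sum_{i=0}^{n}\binom{n}{i}\prod_{j=0}^{i-1}(\lambda j + \rho^*)$ (Materials and Methods, Section B). The stationary effective transmission rate is then obtained by injecting Eq. 7 into
$$\bar\lambda^*_{n,i} = \frac{\int_0^\infty \lambda\,f^*_{n,i}(\lambda)\,d\lambda}{\int_0^\infty f^*_{n,i}(\lambda)\,d\lambda} \tag{8}$$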
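If the system is arbitrarily close to the critical point (akin to the “low temperature” limit in statistical physics), then $\rho^* \to 0$ and $Z_n(\lambda) \to 1$. In this case, we develop $f^*_{n,i}(\lambda) = \rho^*\,c_{n,i}(\lambda) + \mathcal{O}(\rho^{*2})$, where
$$c_{n,i}(\lambda) = p_n\,\mathcal{P}(\lambda \mid n)\binom{n}{i}\Gamma(i)\,\lambda^{i-1} \tag{9}$$
for all $i \ge 1$ and Γ is the gamma function. Therefore, for $\rho^* \to 0$, we obtain the following critical effective transmission rate
$$\bar\lambda^{c}_{n,i} = \frac{\int_0^\infty \lambda^{i}\,\mathcal{P}(\lambda \mid n)\,d\lambda}{\int_0^\infty \lambda^{i-1}\,\mathcal{P}(\lambda \mid n)\,d\lambda} = \frac{\langle\lambda^{i}\rangle_n}{\langle\lambda^{i-1}\rangle_n} \tag{10}$$
The critical effective transmission rate is a ratio of consecutive moments of $\mathcal{P}(\lambda \mid n)$, and therefore depends on i. In fact, unless $\mathcal{P}(\lambda \mid n) = \delta(\lambda - \lambda_0)$, it is an increasing function of i, meaning that the rate of infection $i\,\bar\lambda^{c}_{n,i}$ is superlinear.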
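The monotonicity claim follows from the Cauchy–Schwarz inequality. Writing $\langle\lambda^{j}\rangle_n = \int_0^\infty \lambda^{j}\,\mathcal{P}(\lambda \mid n)\,d\lambda$ and assuming λ > 0 with finite moments,

$$
\langle\lambda^{i}\rangle_n^{2} = \left(\int_0^\infty \lambda^{\frac{i-1}{2}}\,\lambda^{\frac{i+1}{2}}\,\mathcal{P}(\lambda \mid n)\,d\lambda\right)^{2} \le \langle\lambda^{i-1}\rangle_n\,\langle\lambda^{i+1}\rangle_n
\quad\Longrightarrow\quad
\frac{\langle\lambda^{i}\rangle_n}{\langle\lambda^{i-1}\rangle_n} \le \frac{\langle\lambda^{i+1}\rangle_n}{\langle\lambda^{i}\rangle_n}.
$$

Equality requires $\lambda^{(i+1)/2}$ to be proportional to $\lambda^{(i-1)/2}$ almost surely, that is, λ must be almost surely constant. Any spread in $\mathcal{P}(\lambda \mid n)$ therefore makes the ratio of consecutive moments strictly increasing in i, and the group-level infection rate (i times that ratio) superlinear.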
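The effective transmission rate captures the fact that if a group has a large number of infectious members, i, this is probably because the underlying transmission rate λ is large as well. Let us illustrate this conclusion with a simple example in which $\mathcal{P}(\lambda \mid n)$ is a bimodal distribution, $\mathcal{P}(\lambda \mid n) = p\,\delta(\lambda - \lambda_1) + (1-p)\,\delta(\lambda - \lambda_2)$, with $0 < \lambda_1 < \lambda_2$ and $0 < p < 1$. In this case,
$$\bar\lambda^{c}_{n,i} = \lambda_1 + \frac{\lambda_2 - \lambda_1}{1 + e^{-a(i - i_c)}} \tag{11}$$
where $a = \ln(\lambda_2/\lambda_1)$ and $i_c = 1 + \ln[p/(1-p)]/a$. If $i \ll i_c$, then $\bar\lambda^{c}_{n,i} \simeq \lambda_1$, whereas $\bar\lambda^{c}_{n,i} \simeq \lambda_2$ if $i \gg i_c$. The effective transmission rate is sigmoidal, with a soft threshold around $i_c$. If $i < i_c$, the local rate is probably $\lambda_1$, and if $i > i_c$, it is probably $\lambda_2$. In other words, our framework implicitly infers whether a group with i infectious members most likely possesses a local transmission rate $\lambda_1$ or $\lambda_2$; indeed, another way to interpret Eq. 5 is as the posterior mean of the transmission rate λ given the state of the group.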
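The sigmoidal form can be checked against the moment ratio of Eq. 10 directly. The sketch below assumes the two-point distribution $p\,\delta(\lambda-\lambda_1) + (1-p)\,\delta(\lambda-\lambda_2)$ written above; the parameter values used when trying it out are arbitrary.

```python
import math


def bimodal_critical_rate(i, lam1, lam2, p):
    """Eq. 10 for P = p*delta(lam1) + (1-p)*delta(lam2): ratio of consecutive moments."""
    num = p * lam1**i + (1 - p) * lam2**i
    den = p * lam1 ** (i - 1) + (1 - p) * lam2 ** (i - 1)
    return num / den


def bimodal_sigmoid(i, lam1, lam2, p):
    """The same quantity written as a logistic function of i with threshold i_c."""
    a = math.log(lam2 / lam1)
    i_c = 1 + math.log(p / (1 - p)) / a
    return lam1 + (lam2 - lam1) / (1 + math.exp(-a * (i - i_c)))
```

With $\lambda_1 = 0.1$, $\lambda_2 = 2$, and $p = 0.99$ (most groups have the low rate), the threshold sits near $i_c \approx 2.5$: a group with one infectious member is assigned roughly the population mean, while a group with ten is assigned essentially $\lambda_2$.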
Models of complex spreading often impose a similar threshold on the adoption rate (or probability), separating a low and a high regime of adoption (15, 23, 24). The rationale behind this threshold is that the benefits of adopting the social norm only become significant if a critical mass of individuals has already adopted it. This type of positive feedback mechanism is often called social reinforcement. Here, it emerges as an effective mechanism by averaging over the underlying heterogeneity.
Let us now consider more realistic rate distributions. Since the ratio of consecutive moments in Eq. 10 is mostly affected by the tail of $\mathcal{P}(\lambda \mid n)$, Fig. 3 illustrates three cases of effective rate derived from distributions with increasingly heavier tails: the Weibull, the lognormal, and the Fréchet distributions. Additionally, since the theory works at all sizes n, let us consider a group of moderate size n and focus on the variation of the stationary effective transmission rate as a function of i.
Fig. 3.

In Fig. 3A, we show that the Weibull distribution yields an effective transmission rate that is approximately a power law near the critical point (small $\rho^*$), i.e., $\bar\lambda_{n,i} \propto i^{\sigma}$. This observation explains the linear effective transmission rate in Fig. 2A, which corresponds to $\sigma = 1$. The resulting infection rate is also a power law, $i\,\bar\lambda_{n,i} \propto i^{1+\sigma}$. This type of model was initially studied at the population level (17) using the mass-action approximation and has been shown to represent the synergistic interaction of supercritical diseases (18). More general power-law activation functions have been used to model language dynamics (30) and can emerge from the combination of temporal heterogeneity and threshold dynamics (2), the cornerstone of most social contagion models.
In Fig. 3B, we see that the lognormal distribution produces an effective transmission rate that is approximately exponential, i.e., $\bar\lambda_{n,i} \propto e^{c\,i}$ for some $c > 0$. Although less common as far as we know, similar effective transmission rates can emerge from the synergistic interaction of otherwise subcritical diseases in a population (18).
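For the lognormal case, the critical rate of Eq. 10 has a closed form. Assuming the parametrization $\ln\lambda \sim \mathcal{N}(\mu,\sigma^2)$ (an assumption here; the text does not fix one), $\langle\lambda^{j}\rangle = e^{j\mu + j^2\sigma^2/2}$, so $\bar\lambda^{c}_{i} = e^{\mu + \sigma^2(i - 1/2)}$: exactly exponential in i, with rate $\sigma^2$, near the critical point. The sketch below checks this by direct quadrature in $x = \ln\lambda$; the grid extent and resolution are arbitrary choices.

```python
import numpy as np


def lognormal_critical_rates(i_max, mu=0.0, sigma=0.5, points=20001):
    """Eq. 10 for ln(lambda) ~ Normal(mu, sigma^2), by quadrature in x = ln(lambda)."""
    # The integrand of the j-th moment peaks at x = mu + j * sigma^2; cover it with margin.
    x = np.linspace(mu - 10 * sigma, mu + i_max * sigma**2 + 10 * sigma, points)
    dx = x[1] - x[0]
    density = np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
    moments = np.array([np.sum(np.exp(j * x) * density) * dx for j in range(i_max + 1)])
    return moments[1:] / moments[:-1]   # entry i - 1 holds the critical rate for i = 1..i_max
```

Successive log-ratios of the returned rates are constant and equal to $\sigma^2$, which is the exponential growth visible in Fig. 3B.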
In Fig. 3C, we consider the even more heterogeneous Fréchet distribution—which has a power-law tail—and we recover a sigmoidal effective transmission rate, akin to the bimodal case explained above. In this specific case, the distribution is so heterogeneous that our analytical approach effectively infers two parts: groups either belong to the bulk of the rate distribution or to the tail. The soft threshold separating the regimes is now directly related to the exponent of the cumulative distribution function (Materials and Methods, Section C).
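While Eqs. 6 and 10 characterize the effective transmission rate in the early stage of an epidemic, the EV approximation eventually breaks down, as seen in Fig. 2. To understand why, we look at the other limiting case, $\rho^* \to \infty$ (akin to the “high temperature” limit in statistical mechanics), which is equivalent to the scenario where almost everybody in the population is infectious, so that $\prod_{j=0}^{i-1}(\lambda j + \rho^*) \simeq (\rho^*)^{i}$ for all λ. Thus Eq. 8 becomes
$$\bar\lambda^*_{n,i} \simeq \int_0^\infty \lambda\,\mathcal{P}(\lambda \mid n)\,d\lambda = \langle\lambda\rangle_n \tag{12}$$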
In this limit, the number of infectious nodes i does not affect the effective transmission rate. In other words, dynamical correlations do not matter in this limit. Again, we can appeal to the “statistical inference” interpretation of our effective transmission rate: if the rate of infection by external groups ($\rho^*$) is very large, it is impossible to gain information about the local transmission rate from the current group state.
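The step to Eq. 12 can be made explicit with the stationary solution of Eq. 7, where $Z_n(\lambda)$ is its normalization. For $\rho^* \gg \lambda n^2$,

$$
\prod_{j=0}^{i-1}\bigl(\lambda j + \rho^*\bigr) = (\rho^*)^{i}\prod_{j=0}^{i-1}\Bigl(1 + \frac{\lambda j}{\rho^*}\Bigr) = (\rho^*)^{i}\Bigl[1 + \mathcal{O}\Bigl(\frac{\lambda n^{2}}{\rho^*}\Bigr)\Bigr],
\qquad
Z_n(\lambda) = (1+\rho^*)^{n}\Bigl[1 + \mathcal{O}\Bigl(\frac{\lambda n^{2}}{\rho^*}\Bigr)\Bigr],
$$

so that

$$
f^*_{n,i}(\lambda) \simeq p_n\,\mathcal{P}(\lambda \mid n)\binom{n}{i}\frac{(\rho^*)^{i}}{(1+\rho^*)^{n}}.
$$

The only remaining λ-dependence is $\mathcal{P}(\lambda \mid n)$ itself, so Eq. 8 returns its mean for every i: the group state carries no information about λ.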
As predicted, all cases explored in Fig. 3 have a rate independent of i in the limit of large $\rho^*$. However, it is worth mentioning that this limit is out of reach for most systems: Eq. 2b shows that $\rho$ is, in general, a finite quantity. This explains why, in Fig. 2A, the effective transmission rate is not independent of i for large t; it instead takes a complicated nonlinear form, in between the low- and high-temperature limits, better represented by intermediate values of $\rho^*$ in Fig. 3.
Pitfall for Mechanistic Inference and the Identification of Complex Contagions.
Our framework predicts a superlinear rate of infection in the early phase of an outbreak if we coarse-grain or average transmissions over groups, even though the true underlying contagion is linear. This systematic bias has important implications for parameter inference and the identification of complex contagion from time series (15, 31, 32). To complement our theoretical results, we introduce a Bayesian inference framework (Fig. 4) to quantify the nonlinearity of contagion processes.
Fig. 4.

Let us consider simulations of the SIS model on a network with heterogeneous group transmission as our evidence. We use the full sequence of states $\mathcal{D} = \{\mathbf{x}(t)\}_{0 \le t \le T}$, where $\mathbf{x}(t)$ is the vector of the states of all nodes at time t and T is the first time the prevalence reaches a prescribed value [Fig. 4A]. Ignoring the heterogeneous group transmission, we suppose a nonlinear infection rate of the form $\beta(i) = \beta\,i^{\nu}$. We infer the parameters $(\beta,\nu)$ using the posterior distribution
$$P(\beta,\nu \mid \mathcal{D}) = \frac{P(\mathcal{D} \mid \beta,\nu)\,P(\beta,\nu)}{P(\mathcal{D})} \tag{13}$$
Here, we use a flat prior distribution $P(\beta,\nu)$, and the likelihood $P(\mathcal{D} \mid \beta,\nu)$ is evaluated using Eq. 32 in Materials and Methods, Section F.
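Eq. 32 itself is not reproduced in this section. As an illustration only, the sketch below assumes a standard continuous-time Markov chain likelihood for a fully observed trajectory in which each susceptible node in a group with i infectious members is infected by that group at rate $\beta i^{\nu}$; recovery terms do not depend on $(\beta,\nu)$ and are dropped. The data layout (zero-padded per-event group counts and pre-aggregated susceptible exposure), the function names, and the grid are hypothetical.

```python
import numpy as np


def log_likelihood(beta, nu, event_counts, exposure_i, exposure_w):
    """Log-likelihood of the infection events, dropping terms independent of (beta, nu).

    event_counts: (n_events, max_groups) array; row e lists i_g, the number of infectious
        nodes in each group of the node infected at event e (zero-padded), just before it.
    exposure_i, exposure_w: for every (group, time interval) record, the number i of
        infectious members and the weight (n - i) * duration of susceptible exposure.
    Zero padding is harmless because 0 ** nu = 0 for nu > 0.
    """
    hazard = beta * np.sum(event_counts**nu, axis=1)           # rate felt by each infected node
    integrated = beta * np.sum(exposure_w * exposure_i**nu)     # hazard integrated over susceptibles
    return np.sum(np.log(hazard)) - integrated


def grid_posterior(betas, nus, event_counts, exposure_i, exposure_w):
    """Flat prior on a (beta, nu) grid, so the posterior is the normalized likelihood."""
    event_counts = np.asarray(event_counts, dtype=float)
    exposure_i = np.asarray(exposure_i, dtype=float)
    exposure_w = np.asarray(exposure_w, dtype=float)
    logp = np.array([[log_likelihood(b, v, event_counts, exposure_i, exposure_w)
                      for v in nus] for b in betas])
    logp -= logp.max()              # avoid overflow before exponentiating
    post = np.exp(logp)
    return post / post.sum()        # rows: beta values, columns: nu values
```

As a toy check, counts built deterministically from β = 0.5 and ν = 2 (100, 200, and 400 infections in groups with i = 1, 2, and 4, with susceptible exposures 200, 100, and 50) put the posterior maximum exactly at (0.5, 2), with essentially all posterior mass at ν > 1, which is the superlinear signature discussed in the text.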
We first validate the framework with synthetic networks and a Weibull distribution of group transmission with shape parameter σ. Fig. 4A illustrates the time evolution of the prevalence for each simulation. For the simulation corresponding to the red curve in Fig. 4A, we show the joint posterior distribution in Fig. 4B, which clearly suggests a superlinear rate of infection, $\nu > 1$. For each simulation, the marginal distribution of the exponent ν in Fig. 4C is consistent with our prediction for a Weibull distribution of group transmission rates, i.e., $\nu \simeq 1 + \sigma$.
Fig. 5 shows the results of the same experiment but on real hypergraphs (Materials and Methods, Section G). The results shown in Fig. 5A were obtained from simulations on a hypergraph constructed from coauthorship data, but with a synthetic Weibull group-transmission distribution with different values of the shape parameter σ. The relation $\nu \simeq 1 + \sigma$ no longer holds due to structural correlations neglected by our approach, which is where other formalisms (29) could improve on our result. Nevertheless, contagions with heterogeneous group transmission remain much more accurately described by a superlinear rate of infection, and increasing the heterogeneity (increasing σ) leads to a larger exponent ν, as predicted by our theoretical framework. In Fig. 5 B and C, we use weighted hypergraphs constructed from high-school contacts and email exchanges (33–36). The weights of the groups in both datasets are very heterogeneous, making them an ideal case study for our framework (SI Appendix). Again, for all simulations, we obtain a clear signal of superlinear contagion. In SI Appendix, we further validate that our results are robust to a change of functional form for the infection rate.
Fig. 5.

Altogether, Figs. 4 and 5 provide evidence of a dangerous pitfall for those trying to identify complex contagion from time series data. One could easily conclude erroneously that social reinforcement or other mechanisms are important factors influencing an observed contagion process, while, in fact, ignored heterogeneity in transmissibility could explain the apparent nonlinearity. Indeed, we find that real weighted hypergraphs robustly create simple contagion dynamics that look complex once aggregated over groups.
Discussion
We developed an approximate master equation framework to capture the dynamics of contagions whose transmission rates vary arbitrarily across groups or settings. In doing so, we showed that once collapsed onto an average rate of transmission, the dynamics of these contagions are mapped to superlinear rates of infection, incidentally blurring the line between simple and complex contagions in realistic settings.
Interestingly, several other mechanisms can produce particular cases of the superlinear infection rates shown here. Interacting contagions can produce nonlinear dynamics that resemble those produced by a simple contagion with a Weibull or a lognormal distribution of transmission rates [figure 1 of ref. 18]. Bursty interaction patterns between individuals and groups have also been shown to lead to a power-law rate of infection (2), akin to what we observe here with a Weibull distribution of transmission rates. Perhaps most importantly, complex contagion mechanisms taking the form of threshold dynamics are widely used to model social contagion (37); here, we show that such dynamics can be reproduced using a bimodal, or a very heterogeneous, distribution of transmission rates.
On the modeling side, the fact that multiple mechanisms can lead to a similar model is not problematic per se. In physics, this is usually celebrated, as it supports claims of universality for the resulting model. However, one distinguishing feature of the superlinear rate of infection induced by heterogeneous group transmission is that it is stable for long periods of time, as shown in Fig. 2, but eventually breaks down. This contrasts with other mechanisms that produce a truly time-invariant nonlinear infection rate. Therefore, nonlinear rates of infection are to be used with caution: one could calibrate a particular model early in an emerging outbreak, where it fits, only to obtain dramatically wrong predictions when extrapolating to later times, as seen in Fig. 2 B and C.
Yet most epidemics are not left unchecked; they remain close to their epidemic threshold, whether as they emerge or as we seek to eradicate them. In this regime, superlinear infection rates could be used to construct good effective models. They capture the complex and heterogeneous dynamics of transmission in ways that simple contagion models cannot. Our recommendation, however, would be to i) limit these approaches to short-horizon forecasts and ii) use a short calibration window to continuously update the nonlinear infection rate as more data become available, minimizing the bias coming from older data. This is a silver lining, as machine learning approaches, which by design create effective models of reality, are becoming an essential tool to provide epidemic forecasts (31, 38).
For mechanistic inference, our framework and the aforementioned studies (2, 18) highlight the inherent difficulties of this task, as one needs to control for all other potential causes, be it an unobserved interaction with other dynamical processes, temporal patterns in contact networks, or heterogeneity in the transmission rate across settings. Ignoring these factors can lead us to infer complex contagion mechanisms (16) or higher-order group interactions (13) that are not necessarily intrinsic properties of the process. They may simply reflect a shortcoming of our modeling approach, whose assumptions and dimensionality can influence the shape of the dynamics (39). This can be problematic since many past efforts have aimed to measure nonlinear effects as evidence of social reinforcement or peer pressure (15, 40).
Consequently, future work should investigate more carefully the feasibility of distinguishing simple and complex contagion in more realistic scenarios, with imperfect knowledge of transmission in different settings. Beyond binary classification, efforts have been made to quantify the nonlinearity of contagions from real-world experiments (41). Since heterogeneous group transmission leads to a systematic superlinear bias, we encourage researchers to take this effect into account if relevant to their situation. As we gather evidence about the explanatory power of complex contagions, we must be careful and consider the subtle but important role heterogeneity can play in shaping the rate of infection.
Materials and Methods
Explicit Mapping to Simplicial Contagion.
In the simplicial contagion model (26), a d-simplex in which all nodes but one are infectious infects the remaining node at rate $\beta_d$, and that node also receives contributions from all lower-dimensional simplices included in the simplex. In ref. 19, this was shown to be equivalent to a nonlinear infection rate at the level of groups. Indeed, interpreting a group of size $n = 3$ as a simplex of dimension $d = 2$, we would decompose the infection rate of its susceptible node as
$$\beta_{3}(i) = \beta_1\,i + \beta_2\,\delta_{i,2}, \qquad i \in \{0,1,2\} \tag{14}$$
A similar expression can be obtained for higher dimensional simplex, but with a more complicated combinatorial expansion.
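A sketch of that general expansion, assuming every face of the group is present (the simplicial-complex setting) and that a k-face whose k other nodes are all infectious contributes $\beta_k$: a susceptible node with i infectious partners belongs to $\binom{i}{k}$ such active faces, so its infection rate is $\sum_{k=1}^{i}\binom{i}{k}\beta_k$, which reduces to Eq. 14 for a triangle. Dividing by i gives the analogue of the effective transmission rate. Function names are illustrative.

```python
from math import comb


def simplicial_infection_rate(i, betas):
    """Rate at which the susceptible node of a full simplex gets infected.

    i     : number of infectious nodes among the other members of the simplex.
    betas : betas[k - 1] is beta_k, the rate carried by an active k-simplex.
    """
    if i > len(betas):
        raise ValueError("need one beta_k per face dimension up to i")
    # comb(i, k) faces contain the susceptible node plus k infectious partners.
    return sum(comb(i, k) * betas[k - 1] for k in range(1, i + 1))


def simplicial_effective_rate(i, betas):
    """Per-infectious-node rate, i.e., the effective transmission rate of the group."""
    return simplicial_infection_rate(i, betas) / i
```

For a triangle with $\beta_1 = 0.3$ and $\beta_2 = 1$, the effective rate rises from 0.3 at i = 1 to 0.8 at i = 2, the same kind of increasing effective rate produced by heterogeneous group transmission.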
Stationary State.
The complete system described in Eq. 1b eventually settles to a stationary state in the limit . The variables characterizing the stationary state are obtained by solving the following self-consistent expressions
[15a]
[15b]
Eq. 15b can be solved explicitly by noting that must satisfy the simpler detailed balance condition. Indeed, all states of a group can be placed on a line. At equilibrium, the probability flow from i to must equal the probability flow in the reverse direction (this can be proved by induction starting from either endpoint, or ). The detailed balance condition is
[16]
with solution
[17]
with .
For the coarse-grained system, we obtain a form very similar to Eq. 17,
[18]
where .
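Because the states of a group form a line, the detailed balance recursion can be evaluated directly. The sketch below takes generic rates up[i] (for i → i+1) and down[i] (for i+1 → i). The SIS-like rates in the usage lines (a per-pair rate `lam` inside the group, an external infection rate `rho`, and a recovery rate `mu`) are hypothetical placeholders, not the exact terms of Eqs. 16 to 18.

```python
import math


def birth_death_stationary(up, down):
    """Stationary distribution of a birth-death chain on states 0..n.

    up[i]   : rate of i -> i + 1, for i = 0..n-1
    down[i] : rate of i + 1 -> i, for i = 0..n-1
    Detailed balance along the line gives p[i + 1] * down[i] = p[i] * up[i].
    """
    if len(up) != len(down):
        raise ValueError("up and down must have the same length")
    weights = [1.0]  # unnormalized p[0]
    for a, b in zip(up, down):
        weights.append(weights[-1] * a / b)  # p[i + 1] / p[i] = up[i] / down[i]
    norm = math.fsum(weights)
    return [w / norm for w in weights]


# Hypothetical SIS-like rates for one group of size n: per-pair rate lam inside
# the group, external infection rate rho, recovery rate mu per infectious node.
n, lam, rho, mu = 4, 0.5, 0.1, 1.0
up = [(n - i) * (lam * i + rho) for i in range(n)]
down = [(i + 1) * mu for i in range(n)]
p = birth_death_stationary(up, down)
```

Since every state only exchanges probability with its two neighbors, the resulting `p` also satisfies global balance, which is a convenient sanity check.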
Stationary Effective Transmission Rate.
We exemplify three cases of increasingly heterogeneous transmission rate distributions: the Weibull, the lognormal, and the Fréchet distributions. While there are many other distributions we could investigate, the overall qualitative behavior of should be covered by one of these cases.
To simplify the notation, we use independent of n, which also implies . We use two positive real parameters , a scale parameter and a shape parameter respectively. Larger values of imply a larger variance for the distribution. Since is a scale parameter, we will always have a critical effective transmission rate of the form
in the limit with some function .
Weibull distribution.
Let us consider a Weibull distribution of the form
[19]
The tail of this distribution is driven by the exponential term, which decreases slower with λ for larger .
In the limit , we have
This is illustrated in Fig. 3A. For large i, this implies
Therefore, Weibull distributed rates lead to a power-law effective transmission rate . Note that for , the distribution is peaked, and we recover a constant rate.
It is worth mentioning that all distributions with an exponential tail produce similar power-law behavior. The exponential distribution is directly a subcase (), and it is easy to show that a gamma distribution would also produce an approximately power-law rate of infection.
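As a numerical companion to the Weibull case, the sketch below assumes (our reading of the critical limit, not a formula quoted from the text) that the effective per-pair rate in a group with i infectious nodes behaves like the moment ratio E[λ^i]/E[λ^(i-1)] of the transmission-rate distribution. It uses the standard parameterization with E[λ^m] = scale^m Γ(1 + m/shape), in which a larger `shape` gives a narrower distribution; this may be the inverse of the convention used above. With shape = 1 (exponential), the ratio equals scale·i exactly.

```python
import math


def weibull_rate_ratio(i, scale, shape):
    """E[lam**i] / E[lam**(i - 1)] for lam ~ Weibull(scale, shape).

    Uses E[lam**m] = scale**m * Gamma(1 + m / shape); computed in log space so
    large i does not overflow.
    """
    log_ratio = (math.log(scale)
                 + math.lgamma(1.0 + i / shape)
                 - math.lgamma(1.0 + (i - 1) / shape))
    return math.exp(log_ratio)


def local_exponent(f, i):
    """Slope of log f against log i, estimated between i and 2 i."""
    return math.log(f(2 * i) / f(i)) / math.log(2.0)
```

For example, `local_exponent(lambda i: weibull_rate_ratio(i, 1.0, 2.0), 1000)` is close to 1/2, consistent with power-law growth of exponent 1/shape, while a very large `shape` makes the ratio nearly constant in i.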
Lognormal distribution.
Let us now consider the lognormal distribution, whose tail decays more slowly than that of the Weibull distribution,
[20]
The tail of this distribution is again driven by the exponential term, but the exponential argument decreases with , so the tail decays faster than a power law but more slowly than the Weibull tail.
In the limit , we have
This is illustrated in Fig. 3B. Therefore, lognormal distributed rates lead to an effective transmission rate that increases exponentially, . Note that again for , we recover a peaked distribution and the effective transmission rate is a constant.
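Under the same moment-ratio assumption as in the Weibull case (our reading, not a formula quoted from the text), the lognormal result has a closed form. With the standard parameters `mu` and `sigma` of log λ, E[λ^m] = exp(m·mu + m²·sigma²/2), so the ratio is exp(mu + (2i - 1)·sigma²/2) and grows by a constant factor exp(sigma²) for each additional infectious node.

```python
import math


def lognormal_rate_ratio(i, mu, sigma):
    """E[lam**i] / E[lam**(i - 1)] for lam ~ LogNormal(mu, sigma).

    Uses E[lam**m] = exp(m * mu + m**2 * sigma**2 / 2); the result equals
    exp(mu + (2 * i - 1) * sigma**2 / 2).
    """
    log_num = i * mu + 0.5 * (i * sigma) ** 2
    log_den = (i - 1) * mu + 0.5 * ((i - 1) * sigma) ** 2
    return math.exp(log_num - log_den)
```

Setting `sigma = 0` recovers the peaked limit, where the ratio is the constant exp(mu).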
Fréchet distribution.
Let us now use a Fréchet distribution (also known as inverse Weibull),
[21]
The Fréchet distribution has a power-law tail, of the form . This means that the moment of order i, , is undefined if . We, therefore, restrict to have a well-defined average rate . Let us also introduce a cutoff value , where .
Using a change of variable , in the limit we have
where we recognize a ratio of incomplete gamma functions, whose behavior for depends on i and .
If , then the limit is well defined and corresponds to
If instead , we have
which diverges like . Finally, if , we have
which diverges like , and for , it is a constant independent of i. We have omitted the equality cases and , which only interpolate between the three cases above.
Putting all these cases together, for small but nonzero , the critical effective transmission rate is a sigmoid function of i with a jump around , as illustrated in Fig. 3C.
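The sigmoid can be checked numerically under the same moment-ratio assumption (our reading of the limit). For a Fréchet law with scale `scale` and shape `alpha`, truncated at λ ≤ `cutoff`, the substitution u = (λ/scale)^(-alpha) turns the truncated moment of order m into scale^m Γ(1 - m/alpha, u_c), with u_c = (cutoff/scale)^(-alpha). Since u_c > 0, this upper incomplete gamma function is finite for every m. The Python standard library has no incomplete gamma function for a nonpositive first argument, so the sketch integrates it with Simpson's rule in log space; this is a plain numerical stand-in, not necessarily how the figures were produced.

```python
import math


def upper_incomplete_gamma_log(a, x, t_max=math.log(60.0), n=20_000):
    """log Gamma(a, x), the log of the integral of u**(a - 1) * e**(-u) from x to infinity.

    Valid for x > 0 and any real a. Substitutes u = e**t and applies composite
    Simpson's rule on [log x, t_max]; the neglected tail beyond u = 60 is below
    e**-60 for a <= 1. The factor e**(a * log x) is pulled out to avoid overflow.
    """
    t0 = math.log(x)
    if t0 >= t_max or n % 2:
        raise ValueError("need x < e**t_max and an even number of intervals")
    h = (t_max - t0) / n

    def g(t):
        return math.exp(a * (t - t0) - math.exp(t))

    total = g(t0) + g(t_max)
    for k in range(1, n):
        total += (4.0 if k % 2 else 2.0) * g(t0 + k * h)
    return a * t0 + math.log(total * h / 3.0)


def frechet_rate_ratio(i, scale, alpha, cutoff):
    """E[lam**i] / E[lam**(i - 1)] for a Frechet(scale, alpha) law truncated at lam <= cutoff.

    The truncated moment of order m is scale**m * Gamma(1 - m / alpha, u_c) with
    u_c = (cutoff / scale)**(-alpha); the normalization cancels in the ratio.
    """
    u_c = (cutoff / scale) ** (-alpha)
    return scale * math.exp(upper_incomplete_gamma_log(1.0 - i / alpha, u_c)
                            - upper_incomplete_gamma_log(1.0 - (i - 1) / alpha, u_c))
```

With `alpha = 2.5` and `cutoff = 100 * scale`, the ratio stays of order `scale` for i below `alpha` and rises toward `cutoff` for larger i, which is the sigmoid shape described above.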
Effective Rate Based on the Leading Eigenvector.
To model the beginning of an epidemic accurately, a good approximation of the effective transmission rate is obtained by considering the leading eigenvector of the Jacobian matrix (the one associated with the eigenvalue of maximal real part) of the dynamical system near the critical point. Let us rewrite Eq. 1 as
where , and is formally an infinite dimensional vector where the elements are of the form . This also means the Jacobian matrix is infinite-dimensional.
We linearize the dynamical system near the state and . To simplify the notation, all quantities in this section are evaluated at the critical point. First,
[22]
Indeed, the only term in depending on is , but since at the critical point, the above expression holds. Therefore, we can ignore the part of the Jacobian since it does not influence the part of the Jacobian, which is the important one determining the effective transmission rate.
Second, from Eq. 1b, we can show that
[23]
Let us define as the part of the leading eigenvector of the Jacobian matrix. It must therefore respect the eigenvector relation
[24]
where is the associated eigenvalue. Using Eq. 23, we obtain the simplified expression
[25]
where
[26]
Note the similarity with Eq. 15b: At the critical point (), they exactly match, which means .
The simplest general way to solve Eq. 25 is a power method. Note, however, that might not be the eigenvalue with the largest magnitude. For instance, let us assume there exists an eigenvalue such that . Note also that we restrict ourselves to real eigenvalues and eigenvectors by choosing a real starting eigenvector at random. We then solve for the leading eigenvector with the following iteration procedure
[27]
where , is the Jacobian matrix restricted to the part, and is a tunable parameter. The matrix has the same eigenvectors as , but its eigenvalues are shifted and rescaled. Therefore, by choosing sufficiently small, we can ensure that the procedure converges to the leading eigenvector.
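A minimal sketch of a shifted power iteration, assuming the iteration takes the form v ← (I + εJ)v followed by normalization (our reading of the shift and rescaling described above, with `eps` playing the role of the tunable parameter). The toy 2×2 matrix stands in for the truncated Jacobian block: its eigenvalues are 1 and -10, so plain power iteration on J would lock onto -10, whereas I + εJ with ε = 0.05 has eigenvalues 1.05 and 0.5 and converges to the eigenvector whose eigenvalue has the largest real part.

```python
import math
import random


def mat_vec(J, v):
    """Dense matrix-vector product with plain lists."""
    return [sum(J[r][c] * v[c] for c in range(len(v))) for r in range(len(J))]


def leading_eigenpair(J, eps=0.05, max_iter=10_000, tol=1e-12, seed=0):
    """Eigenpair of J whose (assumed real) eigenvalue has the largest real part.

    Power iteration on M = I + eps * J: M shares J's eigenvectors and has
    eigenvalues 1 + eps * mu, so for small enough eps the dominant eigenvalue
    of M comes from the mu with the largest real part.
    """
    rng = random.Random(seed)
    n = len(J)
    v = [rng.uniform(-1.0, 1.0) for _ in range(n)]  # random real starting vector
    for _ in range(max_iter):
        Jv = mat_vec(J, v)
        w = [v[k] + eps * Jv[k] for k in range(n)]  # w = (I + eps * J) v
        norm = math.sqrt(sum(x * x for x in w))
        w = [x / norm for x in w]
        converged = max(abs(a - b) for a, b in zip(w, v)) < tol
        v = w
        if converged:
            break
    Jv = mat_vec(J, v)
    # Rayleigh quotient recovers the eigenvalue of J itself
    mu = sum(a * b for a, b in zip(Jv, v)) / sum(x * x for x in v)
    return mu, v


# Toy stand-in for the Jacobian block: eigenvalues 1 and -10.
J = [[1.0, 2.0], [0.0, -10.0]]
mu, v = leading_eigenpair(J)
```

Taking `eps` too large (for example `eps = 1.0`, where 1 + ε·(-10) = -9 dominates) sends the iteration back to the wrong eigenvector, which is why the parameter must be chosen small enough.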
SIR Model.
Let us now assume that infectious individuals who recover are removed from the pool of susceptible individuals (for instance, they could be immune to the disease), leading to a Susceptible-Infectious-Removed (SIR) model. To describe this model, we let n vary: it now denotes the number of infectious and susceptible nodes in a group, which we call the effective size of the group. Therefore, when an infectious node recovers, the effective size is reduced by one, . This requires little change to our approximate master equations for the complete system:
[28a]
[28b]
[28c]
Indeed, we simply remove the term in the second equation because there is no longer a positive input of susceptible individuals, and we change the first term in the third equation to account for the reduction of a group's effective size when infectious nodes recover. The fraction of nodes that are infectious and of membership m, , is no longer because of the nodes that are removed, so we include it in the system of differential equations. We can calculate the number of removed nodes as
[29]
Similarly, only the first term on the right-hand side changes for the coarse-grained system
[30]
Since there is no stationary state for the SIR model, we can only rely on the leading eigenvector of the Jacobian matrix to approximate the effective transmission rate . The eigenvectors for the complete SIR model respect a very similar self-consistent relationship, namely
[31]
with the only change being on the right-hand side.
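The only structural change from the SIS bookkeeping is where recovery sends a group's state. The sketch below lists the outgoing transitions of one group labeled by (effective size, number of infectious nodes), under hypothetical rates: a per-pair rate `lam` inside the group, an external infection rate `rho` standing in for the coupling to other groups, and a recovery rate `mu`. These are placeholders, not the exact terms of Eqs. 28a to 28c.

```python
def group_transitions(n_eff, i, lam, rho, mu, model="SIS"):
    """Outgoing transitions (new_state, rate) of one group in state (n_eff, i).

    Hypothetical rates: each susceptible member is infected at rate lam * i from
    inside the group plus rho from outside; each infectious member recovers at
    rate mu. Only the recovery target differs between SIS and SIR.
    """
    if model not in ("SIS", "SIR"):
        raise ValueError("model must be 'SIS' or 'SIR'")
    out = []
    infection_rate = (n_eff - i) * (lam * i + rho)
    if infection_rate > 0:
        out.append(((n_eff, i + 1), infection_rate))
    if i > 0:
        if model == "SIS":
            out.append(((n_eff, i - 1), i * mu))  # recovered node is susceptible again
        else:
            out.append(((n_eff - 1, i - 1), i * mu))  # removed node leaves the effective size
    return out
```

In the SIR case, the effective size can only decrease, so the group state drifts toward smaller n as the outbreak progresses.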
Likelihood for Statistical Inference.
The models we consider are continuous-time, time-homogeneous Markov processes. To infer the parameters of a nonlinear contagion model, we evaluate the likelihood
[32]
where M is the number of state transitions, are the times of these transitions, is the total rate of transition out of the state , and gives the probability that the next state after is . We compute by summing the rates of all possible recovery and infection events, namely
[33]
where is the number of infectious nodes, is the set of all groups, and () is the size (number of infectious nodes) of group g. If is a recovery event, then is simply ; if it is an infection event (node k got infected), then
[34]
where is the subset of groups to which node k belongs.
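To make the likelihood concrete, the sketch below accumulates it in log form for an observed event sequence. Each transition contributes log(rate of the observed event) - R(x)·Δt, the product of the exponential holding-time density R·e^(-R·Δt) and the jump probability rate/R. The nonlinear group rate β(i) = λ·i^ν (with ν > 1 indicating superlinear, complex-like contagion), its interpretation as the rate felt by one susceptible member of a group with i infectious members, and the event encoding are illustrative assumptions. A Bayesian analysis would combine this likelihood with priors on (λ, ν, μ).

```python
import math


def log_likelihood(groups, infected0, events, lam, nu, mu):
    """Log-likelihood of an observed SIS event sequence on a hypergraph.

    Assumed (hypothetical) model: a susceptible member of a group with i
    infectious members is infected through that group at rate lam * i**nu;
    each infectious node recovers at rate mu.

    groups    : list of sets of node ids
    infected0 : nodes infectious at time 0
    events    : list of (time, kind, node) with kind "inf" or "rec", times increasing
    """
    def beta(i):
        return lam * i ** nu if i > 0 else 0.0

    memberships = {}  # node -> indices of the groups it belongs to
    for g, members in enumerate(groups):
        for k in members:
            memberships.setdefault(k, []).append(g)

    infected = set(infected0)
    n_inf = [len(members & infected) for members in groups]
    t_prev, ll = 0.0, 0.0
    for t, kind, k in events:
        # total rate R(x) of leaving the current state: recoveries plus infections
        total = mu * len(infected) + sum(
            (len(members) - i) * beta(i) for members, i in zip(groups, n_inf))
        if kind == "rec":
            if k not in infected:
                raise ValueError(f"node {k} cannot recover: not infectious")
            rate, delta = mu, -1
        elif kind == "inf":
            if k in infected:
                raise ValueError(f"node {k} is already infectious")
            rate = sum(beta(n_inf[g]) for g in memberships.get(k, []))
            delta = 1
        else:
            raise ValueError(f"unknown event kind {kind!r}")
        if rate <= 0.0:
            return -math.inf  # observed event is impossible under these parameters
        # holding-time density R e^{-R dt} times jump probability rate / R
        ll += math.log(rate) - total * (t - t_prev)
        if delta > 0:
            infected.add(k)
        else:
            infected.discard(k)
        for g in memberships.get(k, []):
            n_inf[g] += delta
        t_prev = t
    return ll
```

Recomputing R(x) from scratch at every event keeps the sketch short; a practical implementation would update it incrementally.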
Real Weighted Hypergraphs.
In Fig. 5A, we use coauthorship data from DBLP (33). It consists of a list of publications (groups) and authors (nodes belonging to groups), which naturally takes the form of a hypergraph. The original dataset contains 1,831,127 nodes and 2,954,518 groups; to perform stochastic simulations, we used a subhypergraph obtained from a breadth-first search. We started from a random group and visited all groups at a maximum distance of 3. The resulting subhypergraph contains 116,700 nodes and 136,108 groups.
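The subsampling step can be sketched as a breadth-first search over groups. The sketch assumes that two groups are at distance 1 when they share at least one node; that notion of distance and the function name are our assumptions rather than details taken from the deposited code.

```python
from collections import deque


def bfs_subhypergraph(groups, start, max_dist=3):
    """Indices of groups within max_dist hops of groups[start], plus their nodes.

    Two groups are taken to be adjacent when they share at least one node.
    """
    node_to_groups = {}
    for g, members in enumerate(groups):
        for v in members:
            node_to_groups.setdefault(v, []).append(g)
    dist = {start: 0}
    queue = deque([start])
    while queue:
        g = queue.popleft()
        if dist[g] == max_dist:
            continue  # do not expand beyond the maximum distance
        for v in groups[g]:
            for h in node_to_groups[v]:
                if h not in dist:
                    dist[h] = dist[g] + 1
                    queue.append(h)
    kept = sorted(dist)
    nodes = set().union(*(groups[g] for g in kept))
    return kept, nodes
```

A random starting group can be drawn with `random.randrange(len(groups))`.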
In Fig. 5B, we use high-school contact patterns originating from the SocioPatterns research collaboration (34). We use the version available on XGI-DATA (42), processed by ref. 33. Wearable sensors detect pairwise interactions between people at a resolution of 20 s. Maximal cliques of interacting individuals are then promoted to higher-order group interactions. Because these are timestamped group interactions, one could construct a temporal hypergraph. Instead, we associate a weight with each unique group interaction, corresponding to the number of times it appears in the dataset. The result is a weighted hypergraph.
In Fig. 5C, we use email exchanges within a large European research institution (33, 35, 36). We use the version available on XGI-DATA (42). It consists of communication between institution members, where all individuals involved in an email are associated with a group interaction. Again, these are timestamped group interactions; as before, we associate a weight with each unique group interaction, corresponding to the number of times it appears in the dataset, resulting in a weighted hypergraph.
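The weighting step used for both timestamped datasets reduces to counting repeated group interactions. A minimal sketch, assuming the raw data are available as (timestamp, members) pairs; that input format is our assumption, not the XGI-DATA schema, and the sketch does not reproduce any filtering applied during the original processing (ref. 33). Member order is ignored, so the same group recorded in a different order contributes to the same weighted hyperedge.

```python
from collections import Counter


def weighted_hypergraph(interactions):
    """Collapse timestamped group interactions (time, members) into a weighted
    hypergraph: weight = number of times the same unordered group appears."""
    return Counter(frozenset(members) for _, members in interactions)
```

The resulting `Counter` maps each hyperedge (a `frozenset` of nodes) to its weight.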
See SI Appendix for more information on the properties of the hypergraphs.
Data, Materials, and Software Availability
Code and network data have been deposited in Zenodo (43). All other data are included in the manuscript and/or SI Appendix.
Acknowledgments
L.H.-D. acknowledges financial support from the NIH 1P20 GM125498-01 Centers of Biomedical Research Excellence Award. A.A. acknowledges financial support from the Sentinelle Nord initiative of the Canada First Research Excellence Fund and from the Natural Sciences and Engineering Research Council of Canada (project 2019-05183). G.S.-O. acknowledges financial support from the Fonds de recherche du Québec - Nature et technologies (project 313475) and support from the Cooperative Agreement no. NU38OT000297 from the Council of State and Territorial Epidemiologists. The findings and conclusions in this study are those of the authors and do not necessarily represent the official position of the funding agencies.
Author contributions
G.S.-O., L.H.-D., and A.A. designed research; G.S.-O. performed research; G.S.-O. contributed new reagents/analytic tools; G.S.-O., L.H.-D., and A.A. analyzed data; and G.S.-O., L.H.-D., and A.A. wrote the paper.
Competing interests
The authors declare no competing interest.
Supporting Information
Appendix 01 (PDF)
References
1
R. Pastor-Satorras, C. Castellano, P. Van Mieghem, A. Vespignani, Epidemic processes in complex networks. Rev. Mod. Phys. 87, 925–979 (2015).
2
G. St-Onge, H. Sun, A. Allard, L. Hébert-Dufresne, G. Bianconi, Universal nonlinear infection kernel from heterogeneous exposure on higher-order networks. Phys. Rev. Lett. 127, 158301 (2021).
3
J. G. Allen, A. M. Ibrahim, Indoor air changes and potential implications for SARS-CoV-2 transmission. JAMA 325, 2112–2113 (2021).
4
J. M. Robles-Romero, G. Conde-Guillén, J. C. Safont-Montes, F. M. García-Padilla, M. Romero-Martín, Behaviour of aerosols and their role in the transmission of SARS-CoV-2; a scoping review. Rev. Med. Virol. 32, e2297 (2022).
5
T. P. Weber, N. I. Stilianakis, Inactivation of influenza A viruses in the environment and modes of transmission: A critical review. J. Infect. 57, 361–373 (2008).
6
H. W. Hethcote, J. A. Yorke, Gonorrhea Transmission Dynamics and Control (Springer, Heidelberg, 1984).
7
S. T. Leu et al., Sex, synchrony, and skin contact: Integrating multiple behaviors to assess pathogen transmission risk. Behav. Ecol. 31, 651–660 (2020).
8
N. O. Hodas, K. Lerman, The simple rules of social contagion. Sci. Rep. 4, 4343 (2014).
9
A. Pentland, Honest Signals: How They Shape Our World (MIT Press, 2010).
10
F. Walter, H. Bruch, The positive group affect spiral: A dynamic model of the emergence of positive affective similarity in work groups. J. Organ. Behav. 29, 239–261 (2008).
11
G. Burgio, S. Gómez, A. Arenas, Spreading dynamics in networks under context-dependent behavior. Phys. Rev. E 107, 064304 (2023).
12
G. St-Onge, V. Thibeault, A. Allard, L. J. Dubé, L. Hébert-Dufresne, Master equation analysis of mesoscopic localization in contagion dynamics on higher-order networks. Phys. Rev. E 103, 032301 (2021).
13
F. Battiston et al., Networks beyond pairwise interactions: Structure and dynamics. Phys. Rep. 874, 1–92 (2020).
14
L. Hébert-Dufresne, P. A. Noël, V. Marceau, A. Allard, L. J. Dubé, Propagation dynamics on networks featuring complex topologies. Phys. Rev. E 82, 036115 (2010).
15
B. Mønsted, P. Sapieżyński, E. Ferrara, S. Lehmann, Evidence of complex contagion of information in social media: An experiment using Twitter bots. PLoS One 12, e0184148 (2017).
16
S. Lehmann, Y. Y. Ahn, Eds., Complex Spreading Phenomena in Social Systems, Computational Social Sciences (Springer, 2018).
17
Wm. Liu, H. W. Hethcote, S. A. Levin, Dynamical behavior of epidemiological models with nonlinear incidence rates. J. Math. Biol. 25, 359–380 (1987).
18
L. Hébert-Dufresne, S. V. Scarpino, J. G. Young, Macroscopic patterns of interacting contagions are indistinguishable from social reinforcement. Nat. Phys. 16, 426–431 (2020).
19
G. St-Onge et al., Influential groups for seeding and sustaining nonlinear contagion in heterogeneous hypergraphs. Commun. Phys. 5, 25 (2022).
20
N. W. Landry, J. G. Restrepo, The effect of heterogeneity on hypergraph contagion models. Chaos 30, 103117 (2020).
21
G. F. de Arruda, G. Petri, Y. Moreno, Social contagion models on hypergraphs. Phys. Rev. Res. 2, 023032 (2020).
22
J. T. Matamalas, S. Gómez, A. Arenas, Abrupt phase transition of epidemic spreading in simplicial complexes. Phys. Rev. Res. 2, 012049 (2020).
23
M. Granovetter, Threshold models of collective behavior. Am. J. Sociol. 83, 1420–1443 (1978).
24
D. Centola, M. Macy, Complex contagions and the weakness of long ties. Am. J. Sociol. 113, 702–734 (2007).
25
Á. Bodó, G. Y. Katona, P. L. Simon, SIS epidemic propagation on hypergraphs. Bull. Math. Biol. 78, 713–735 (2016).
26
I. Iacopini, G. Petri, A. Barrat, V. Latora, Simplicial models of social contagion. Nat. Commun. 10, 2485 (2019).
27
B. Jhun, M. Jo, B. Kahng, Simplicial SIS model in scale-free uniform hypergraph. J. Stat. Mech. 2019, 123207 (2019).
28
G. Ferraz de Arruda, G. Petri, Y. Moreno, Social contagion models on hypergraphs. Phys. Rev. Res. 2, 023032 (2020).
29
G. Burgio, A. Arenas, S. Gómez, J. T. Matamalas, Network clique cover approximation to analyze complex contagions through group interactions. Commun. Phys. 4, 1–10 (2021).
30
D. M. Abrams, S. H. Strogatz, Modelling the dynamics of language death. Nature 424, 900 (2003).
31
C. Murphy, E. Laurence, A. Allard, Deep learning of contagion dynamics on complex networks. Nat. Commun. 12, 4720 (2021).
32
G. Cencetti, D. A. Contreras, M. Mancastroppa, A. Barrat, Distinguishing simple and complex contagion processes on networks. Phys. Rev. Lett. 130, 247401 (2023).
33
A. R. Benson, R. Abebe, M. T. Schaub, A. Jadbabaie, J. Kleinberg, Simplicial closure and higher-order link prediction. Proc. Natl. Acad. Sci. U.S.A. 115, E11221–E11230 (2018).
34
R. Mastrandrea, J. Fournet, A. Barrat, Contact patterns in a high school: A comparison between data collected using wearable sensors, contact diaries and friendship surveys. PLoS One 10, 1–26 (2015).
35
J. Leskovec, J. Kleinberg, C. Faloutsos, Graph evolution: Densification and shrinking diameters. ACM Trans. Knowl. Discov. Data 1, 2-es (2007).
36
H. Yin, A. R. Benson, J. Leskovec, D. F. Gleich, “Local higher-order graph clustering” in Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2017 (Association for Computing Machinery, New York, NY, USA, 2017), pp. 555–564.
37
P. S. Dodds, D. J. Watts, Universal behavior in a generalized model of contagion. Phys. Rev. Lett. 92, 218701 (2004).
38
B. Klein et al., Forecasting hospital-level COVID-19 admissions using real-time mobility data. Commun. Med. 3, 25 (2023).
39
V. Thibeault, A. Allard, P. Desrosiers, The low-rank hypothesis of complex systems. arXiv [Preprint] (2022). http://arxiv.org/abs/2208.04848.
40
L. Weng, F. Menczer, Y. Y. Ahn, Virality prediction and community structure in social networks. Sci. Rep. 3, 2522 (2013).
41
J. Lee, D. Lazer, C. Riedl, Complex contagion in viral marketing: Causal evidence and embeddedness effects from a country-scale field experiment (Northeastern U. D’Amore-McKim School of Business Research Paper No. 409205, 2022).
42
N. W. Landry et al., XGI: A Python package for higher-order interaction networks. J. Open Source Softw. 8, 5162 (2023).
43
G. St-Onge, gstonge/heterogeneous-transmission. Zenodo. https://doi.org/10.5281/zenodo.7679204. Deposited 26 February 2023.
Copyright
Copyright © 2023 the Author(s). Published by PNAS. This article is distributed under Creative Commons Attribution-NonCommercial-NoDerivatives License 4.0 (CC BY-NC-ND).
Submission history
Received: July 18, 2023
Accepted: November 24, 2023
Published online: December 28, 2023
Published in issue: January 2, 2024
Notes
This article is a PNAS Direct Submission.
Cite this article
Nonlinear bias toward complex contagion in uncertain transmission settings, Proc. Natl. Acad. Sci. U.S.A. 121 (1) e2312202121 (2024). https://doi.org/10.1073/pnas.2312202121