The discrete-time Kermack–McKendrick model: A versatile and computationally attractive framework for modeling epidemics

Edited by Alan Hastings, University of California, Davis, CA, and approved June 25, 2021 (received for review April 2, 2021)
September 24, 2021
118 (39) e2106332118
Commentary
A discrete-time infectious disease model for global pandemics
Abdul-Aziz Yakubu

Significance

We demonstrate the strength of a general discrete-time modeling framework in which model specification is reduced to the bare essentials. The approach is due to Kermack and McKendrick, dating from 1927. Despite the many citations, the true contents of their work is essentially unknown. On top of that, their continuous-time model takes the mathematical form of a renewal equation, and only a few experts can handle such equations numerically. Here, we establish that the discrete-time version is equally general and flexible and, yet, is very simple to parametrize on the basis of data and to implement computationally. This makes the framework highly suitable for studying control scenarios for epidemic diseases such as COVID-19.

Abstract

The COVID-19 pandemic has led to numerous mathematical models for the spread of infection, the majority of which are large compartmental models that implicitly constrain the generation-time distribution. On the other hand, the continuous-time Kermack–McKendrick epidemic model of 1927 (KM27) allows an arbitrary generation-time distribution, but it suffers from the drawback that its numerical implementation is rather cumbersome. Here, we introduce a discrete-time version of KM27 that is as general and flexible, and yet is very easy to implement computationally. Thus, it promises to become a very powerful tool for exploring control scenarios for specific infectious diseases such as COVID-19. To demonstrate this potential, we investigate numerically how the incidence-peak size depends on model ingredients. We find that, with the same reproduction number and the same initial growth rate, compartmental models systematically predict lower peak sizes than models in which the latent and the infectious period have fixed duration.
The arrival and spread of COVID-19 has caused tremendous trouble. In the wake of that, an avalanche of mathematical models arose. The present paper, too, was triggered by COVID-19, albeit indirectly. Its primary aim is methodological: we want to establish the strength of a general discrete-time modeling framework in which model specification is reduced to the bare essentials. But let us first explore the motivation for discrete-time models.
The day–night cycle has a strong impact on the behavior of humans, animals, and plants. As a rule, the resulting time heterogeneity is ignored in epidemiological and ecological models. One simply pretends that the representation of time by a continuous quantity t, “flowing” at a constant rate, is suitable for bookkeeping of the time course of the relevant events.
Census data, on the other hand, are often collected at regular intervals, so on a discrete-time basis. Indeed, as evidenced by the COVID-19 pandemic, incidence is usually reported in the form of the number of new cases on a particular day or in a specified week.
So even when the processes that we want to capture take place in continuous time, one may want to consider discrete-time bookkeeping schemes, in order to relate directly to the data, as advocated in the pioneering paper ref. 1. Moreover, a significant bonus of discrete-time models is that numerical implementation is straightforward and that, accordingly, simulations are easy to perform. In sharp contrast, the numerical solution of continuous-time renewal equations, as discussed in ref. 2, presents a substantial challenge to the uninitiated.
Counter to the practical advantages runs a modeling difficulty: The formulation of discrete-time models is subtle and, hence, error-prone. In infinitesimal time intervals, the effects of different mechanisms are independent. Consequently, one can add terms that describe contributions to the rate of change of a quantity. When trying to capture (in one go, rather than by solving a differential equation) change in a finite time interval, we do have to think about the order of events and how one event may trigger or prevent another event. In the epidemic context, a key point is that, when a susceptible host is infected by an infectious host, it cannot any longer be infected by another infectious host.
The first aim of this short note is to formulate the discrete-time version of the general Kermack–McKendrick epidemic model from 1927 (KM27). As far as we know, this has not been done before. And, yet, the model is, most likely, an ideal tool for data-driven analysis of infectious disease outbreaks. Indeed, as we shall demonstrate, the model is general, flexible, and user-friendly, both when it comes to incorporating the effects of heterogeneity (e.g., asymptomatic infection) and of (time-dependent) public health interventions and when it comes to computing outbreak dynamics. Moreover, the model parameters can be related rather directly to surveillance data (without the need to make assumptions about the duration of, e.g., the infectious period; in some sense, the model is nonparametric).
A second (and admittedly somewhat pedantic) aim of this note is to point out explicitly a frequent mistake in the formulation of discrete-time epidemiological and ecological models. We hope that by clearly exposing the underlying fallacy, the mistake will have had its day.
To put some flesh on the bones, we show that the qualitative behavior of the discrete-time model is the spitting image of the qualitative behavior of the continuous-time KM27 model. In order not to ignore the popularity of compartmental variants (which is unwarranted, in our opinion, as there is neither evidence that the length of, for instance, the infectious period is exponentially/geometrically distributed nor that infectiousness is constant during this period), we put them in the spotlight. The key point of the paper is to highlight a striking and important difference with the continuous-time formulation: the discrete-time version is very easy to implement computationally! We demonstrate this in Section 6 and then exploit it in Section 7, where we illustrate the relevance (in particular, for public health policy) of the lesser-known members of the KM27 family, by investigating numerically how the peak of the incidence curve varies among members that are identical with respect to both the initial growth rate ρ and the basic reproduction number R0, but differ in assumptions about the duration of the exposed and the infectious period.
To avoid misunderstanding, we now clarify what is, and what is not, stochastic in the models formulated and analyzed below. All models are deterministic at the population level. [One can think of them as the large initial population-size limit of a stochastic model for finitely many individuals, in the sense used by Kurtz (3), originally for chemical reactions.]
When one assumes that the infectious period of all individuals, once infected, has exactly the same length, and their infectiousness during this period is one and the same constant, there is no randomness at the individual level either.
But most models incorporate heterogeneity/stochasticity at the individual level. For instance, in the familiar continuous-time SIR compartmental model, the length of the infectious period of a newly infected individual is exponentially distributed, say, with parameter γ. If during the infectious period, all infected individuals have the same constant infectiousness β, then the expected infectiousness A(τ) at time τ after becoming infected equals βeγτ. This reflects the fact that after time τ has elapsed, the probability to still be infectious equals eγτ. At the population level, we translate this into: A fraction eγτ of those infected at time t is still infectious at time t+τ. In the discrete-time setting, we should replace the exponential distribution by the geometric distribution. As a final elucidation, we mention that the parameter β and the function A(τ) implicitly incorporate information about the contact (between individuals) process that underlies transmission. A key feature is that contacts are assumed to be uniformly at random (so neither spatial nor age nor social structure are incorporated; but see [2.10] below for a generalization that does incorporate forms of heterogeneity affecting contact probabilities).

1. The Cumulative (over One Time Step) Force of Infection

First of all, we want to motivate and promote the equation
S(t+1)=eΛ^(t)S(t),
[1.1]
as a building block for discrete-time models of the spread of an infectious disease in a host population when
The disease generates permanent immunity;
The host population is demographically closed (meaning that demographic turnover happens at a much slower time scale than transmission of the disease and is therefore ignored).
As usual, S(t) denotes the size of the subpopulation of susceptibles at census time t. Underlying [1.1] is a choice of the unit of time: It equals the length of the interval between one census and the next. So the magnitude of Λ^(t) is proportional to this length, and when this magnitude figures in our discussion below, one may interpret the statements in terms of the length of the discretization step. We call Λ^(t) the cumulative force of infection over the time window (t,t+1] for reasons that we now explain. The continuous-time version of [1.1] reads
dSdt(t)=Λ(t)S(t),
[1.2]
where Λ(t) is the force of infection at time t, i.e., the probability per unit of time for a susceptible to become infected at time t. By integration, we deduce from [1.2] the relation
S(t+1)=ett+1Λ(τ)dτS(t).
[1.3]
The first factor at the right-hand side of [1.1] and [1.3] is, in both cases, the probability for a susceptible to escape from infection in the time window (t,t+1]. The integral
tt+1Λ(τ)dτ,
in [1.3] is replaced by Λ^(t) in [1.1], and this, we hope, clarifies why we call Λ^(t) the cumulative force of infection over (t,t+1].
We insist that one should adjust the multiplicative factor (as was indeed done in ref. 1) and not replace the differential Eq. 1.2 by the “additive” difference equation
S(t+1)S(t)=Λ^(t)S(t),
i.e., by
S(t+1)=(1Λ^(t))S(t).
[1.4]
Of course, [1.4] provides a good approximation of the “true” Eq. 1.1 for small values of Λ^(t). But [1.4] is not exact, and it may fail dramatically for not so small values of Λ^(t), in particular, since it may lead to negative values of S. The reason is that [1.4] does not take into account that a host can become infected only once. To stress this point, we now present a somewhat mechanistic derivation of the multiplicative factor in [1.1], showing that [1.1] does take this into account.
Assume that when a single infectious individual is present in a certain host population (during a time interval of, say, length one), every susceptible host becomes infected with probability p. Then, any susceptible escapes from becoming infected with the complementary probability 1p. Next, assume that there are I infectious individuals and that these make contacts with susceptibles independently of each other. Then, any susceptible escapes from becoming infected with probability
(1p)I=eIln(1p).
So, a susceptible is infected with probability 1(1p)I rather than with “probability” pI.*
From a numerical point of view, the exponential has the disadvantage of being expensive in terms of calculation costs. It may therefore be tempting to reduce the step size in order to work safely with the linear approximation. We actually wonder whether solving the relevant ordinary differential equations, with, for instance, a Runge Kutta solver is not a more attractive alternative, especially in terms of accuracy. Moreover, we maintain that choosing a time interval that matches the data points has definite advantages.

2. The General Discrete-Time Kermack–McKendrick Model

As already expressed in [1.2], the incidence at time t equals Λ(t)S(t) with Λ the force of infection. Common sense tells us that the current force of infection is generated by individuals who were themselves infected some time ago. Following earlier work by Ross and Hudson (refs. 5 and 6 and references therein), Kermack and McKendrick translate this observation into the constitutive equation
Λ(t)=0A(τ)Λ(tτ)S(tτ)dτ,
[2.1]
with A(τ) the expected contribution to the force of infection at time τ after infection. So, in this top-down approach, the infinite dimensional parameter A is introduced as a key model ingredient. For any specific disease, one may, in principle, use a within-host model of the struggle between pathogen and immune system to provide bottom-up a quantitative specification. Alternatively, one may use population-level data to infer certain characteristics of A (cf. ref. 7). Often, this is done after first restricting to a parameterized family of functions A (but see ref. 8 for an alternative methodology). Note that, with N denoting the population size, we have
R0=N0A(τ)dτ,
and that, in the initial phase of the epidemic, the distribution of the generation-interval (refs. 9 and 10 and the references therein) has as density the renormalized (to have integral 1) function A.
The incidence at time t is given by Λ(t)S(t). Under the “permanent immunity and no demographic turnover” assumption, the incidence equals S(t) (cf. [1.2]). Substituting this into [2.1], we deduce by integration (and upon changing the order of integration) the identity
tt+1Λ(σ)dσ=0A(τ)[S(tτ)S(t+1τ)]dτ.
The discrete-time counterpart reads
Λ^(t)=j=1Aj[S(tj)S(t+1j)],
[2.2]
where now Aj is the expected contribution to the cumulative force of infection over (t,t+1] of an individual, who itself became infected in the time window (tj,tj+1]j time steps earlier, and S(tj)S(t+1j) is the incidence in (tj,tj+1]. So, the key model ingredient is now the collection
{Aj}j=1,
of nonnegative numbers, which we assume to be such that
j=1Aj<.
(Incidentally, note that control measures or seasonality may cause the Ak to depend on calendar time. See the end of Section 5 for a somewhat concrete example. In Section 6, we shall briefly indicate how one can easily implement this generalization.)
Eq. 1.1, with Λ^(t) specified by [2.2], provides an updating scheme, but to get started, one needs to specify an “initial” condition in the form of the history of S up to a certain point in time. The interpretation requires that this prescribed history is a monotone nondecreasing (when looking back into time) sequence, bounded from above by the total host population size N.
As we show next, one can reformulate [1.1], [2.2] as the scalar higher-order recursion relation
s(t+1)=ek=1(1s(tk+1))Ãk,
[2.3]
where
s(t)S(t)N,
[2.4]
and
ÃkAkN.
[2.5]
Eq. 2.3 is the discrete-time analog of the nonlinear renewal equation
s(t)=e0(1s(tτ))NA(τ)dτ,
[2.6]
that follows by combining [1.2] with [2.2] and incidence equal to S (2, 4). Both [2.3] and [2.6] involve the additional assumption
S()=N,
[2.7]
expressing that in the infinite past, all host individuals were susceptible.
To derive [2.3], first note that iteration of [1.1] yields, if [2.7] holds, the identity
S(t+1)=ei=0Λ̂(ti)N.
[2.8]
From [2.2], we deduce
i=0Λ^(ti)=i=0k=1Ak[S(tik)S(tik+1)]=k=1Aki=0[S(tik)S(tik+1)]=k=1Ak[NS(tk+1)].
If we use this last identity in [2.8], divide both sides of [2.8] by N and adopt the notation [2.4] and [2.5], we obtain [2.3].
If one copies [2.3], with t+1 replaced by t, and combines the two formulas, one can derive the variant
s(t+1)=s(t)ek=1(s(tk)s(tk+1))Ãk.
[2.9]
This variant has the advantage that one can provide an initial condition, say, at time 0, by prescribing s(0) and the (nonnegative) incidences ,s(3)s(2),s(2)s(1),s(1)s(0). We refer to Section 6 for a more pragmatic formulation of the initial value problem.
We conclude that [2.3]/[2.9] is the mathematical form of the discrete-time KM27 model with, in principle, a countably infinite parameter {Ak}k=1, but, in practice, a finite, say, m, dimensional parameter with an infinite tail of zeros.
While the homogeneous version is simplest, various forms of heterogeneity are easily incorporated in the discrete-time bookkeeping scheme. We may generalize [2.2] to
si(t)=ej=1ncijk=1m(1sj(tk))NjAjk,
[2.10]
so as to capture the situation where individuals are distinguished according to type i, with i running from 1 to n. Additional model parameters are the subpopulation sizes Ni and the contact matrix C with entries cij specifying the contact intensities between the various types. Note that [2.10] allows the parameters Ak to be type-specific. When “type” influences contact, but is irrelevant for intrinsic infectiousness, this dependence should be dropped. We refer to ref. 4 (chapter 12) for a discussion of some of the subtle modeling intricacies associated with the specification of C, notably, its dependence on the n-vector of subpopulation sizes.
For the continuous-time setting, it is shown in ref. 2 how to incorporate demographic turnover by adding age structure (with no constraint at all on the survival probability as a function of age!). This still leads to a scalar renewal equation, albeit one involving a double integral. In SI Appendix, we present the discrete-time analog.

3. The Initial Phase and the Final Size

To capture the demographic stochasticity during the very early phase of the introduction of an infectious disease in a host population, we need branching processes (e.g., ref. 4). But once there is a large number of infected individuals, we can switch to a deterministic description. The large number may, of course, still constitute only a rather small fraction of a very large host population. In this situation, we may put
s(t)=1x(t),
[3.1]
into [2.3] and assume that x is so small that it makes sense to replace the exponential by the zeroth and first-order terms of its Taylor expansion. This yields the linearized equation
x(t+1)=k=1Ãkx(tk+1).
[3.2]
We define
R0=k=1Ãk=k=1AkN,
[3.3]
and interpret, based on the last identity, R0 as the expected number of secondary cases caused by a primary case in a totally susceptible host population.
In order to show that positive solutions of [3.2] grow when R0>1, but decline when R0<1, we make the Ansatz
x(t)=λt.
[3.4]
By substitution of [3.4] into [3.2], we find that x defined by [3.4] is indeed a solution if and only if λ is a real root of the discrete-time characteristic equation
1=k=1λkÃk,
[3.5]
known as the Euler–Lotka equation. The nonnegativity of Ãk, k=1,2,, guarantees that [3.5] has at most one real root ρ and that it does indeed have a real root when the right-hand side assumes a value bigger than one for some real λ, so, in particular, when R0>1 (when R0<1 and Ãk has power-like behavior for k, the value of the right-hand side may jump from a value less than one to infinity when λ is decreased; when Ãk=0 for large k, this cannot happen, and ρ exists). Readers who wonder (or even worry) about the potential importance of complex roots can consult ref. 4 (section 8.2 and the references given there) to be eased.
So, we see that a key point is that the right-hand side of [3.5] is a monotone decreasing function of real λ, and, as a consequence, we have
sign(ρ1)=sign(R01).
[3.6]
(Incidentally, note that ρ corresponds to er, with r the Malthusian parameter featuring in the continuous-time theory.) General linear theory (cf. ref. 11) guarantees that positive solutions of [3.2] grow geometrically with rate ρ for t when ρ>1 (and decline with rate ρ, when ρ exists and is less than one). General nonlinear theory (cf. ref. 12) guarantees that the steady-state solution s(t)1 of [2.3] is asymptotically stable for ρ<1 (hence, for R0<1), but unstable for ρ>1, i.e., for R0>1 {here, we refer to the Principle of Linearized Stability; for the version with only finitely many Ak different from zero (cf. Section 6), the standard (i.e., finite dimensional) Hartman–Grobman Theorem implies that the intersection of the unstable manifold and the positive cone is one-dimensional; this means that, modulo translation, there is exactly one positive solution of [2.3] that has limit 1 for t (see ref. 13 for the continuous-time version)}.
So, when R0>1, the introduction of the pathogen will, provided the pathogen does not go extinct by bad (or good, depending on the point of view) luck when still very rare, break through and cause s to decrease to below 1. The interpretation makes it obvious that s is a monotone decreasing function of time and that it has a limit for t. We denote this limit by s(). The equation
s()=eR0(1s()),
[3.7]
is obtained by passing to the limit in [2.3], while using the fact that the {Ãk} are summable. For R0>1, this equation has a unique solution in (0,1) (Fig. 1 and ref. 4, exercise 1.19).
Fig. 1.
Graph of the final size 1s(), i.e., the fraction of the population that gets infected in the course of the outbreak, as a function of the basic reproduction number R0.
A comparison of the results in refs. 14, 4 (chapter 1), and 2 with those above establishes that when we compare the continuous-time and discrete-time formulations,
There is only a formal difference in the expressions for R0;
If we put ρ=er, there is only a formal difference in the equations characterizing, respectively (resp.), ρ and r;
The equations specifying s() on the basis of R0 are identical (as already noted in ref. 1).
We conclude that at the level of theory, there is an exact parallel.

4. Compartmental Formulation for Some Very Special Cases

We shall use the standard notational convention (or should one say “ambiguity”?) that a compartment and its contents are denoted by the same symbol. We start with SIR and after that generalize to SEIR, hoping that these two examples elucidate the general pattern of how to construct discrete-time models in compartmental settings. See ref. 15 for a more general set-up.
Assume that, upon infection, an individual is transferred from the compartment S to the compartment I of infectious individuals. Assume that every following time step, this infected individual stays in I with probability 1α while being “removed” (i.e., losing infectiousness, either by way of the immune system conquering the pathogen or by death) with probability α. We put removed individuals in a compartment R and assume that immunity is permanent (and resurrection impossible). Finally, we assume that the cumulative force of infection equals βI, i.e., the per capita contribution to the force of infection equals β. Note that β is proportional to the length of the discretization interval, i.e., the time between census points, and that α=1eα̃ with α̃ proportional to this length.
These assumptions lead to the system of recurrence relations
S(t+1)=eβI(t)S(t),I(t+1)=(1eβI(t))S(t)+(1α)I(t),R(t+1)=αI(t)+R(t).
[4.1]
We show that the system [4.1] may be reduced to the scalar recurrence [2.3] by choosing the Ãk appropriately:
Ãk=β(1α)k1N.
[4.2]
The first step corresponds to the derivation of [2.8]: By iteration of the first equation of [4.1], we obtain
S(t+1)=eβj=0I(tj)N.
Rewriting the second equation of [4.1] as
I(t+1)=S(t)S(t+1)+(1α)I(t),
we obtain by summation the identity
j=0I(tj)=NS(t)+(1α)j=0I(tj1),
and by substitution of this identity repeatedly at the right-hand side,
j=0I(tj)=NS(t)+(1α)(NS(t1))+(1α)2N(S(t2))+.
Finally, substitution of this last identity in the formula for S(t+1) above yields [2.8].
Conversely, starting from [2.3] with Ãk given by [4.2], we easily recover [4.1] by defining
I(t)k=1[S(tk)S(tk+1)](1α)k1,
[4.3]
(note that the equation for R(t) is just an appendix; it has no impact on the dynamics of S(t) and I(t); it simply keeps track of individuals that are no longer infectious).
We emphasize that if one replaces eβI(t) by 1βI(t), the reduction to a higher-order scalar recursion relation fails (we invite readers to convince themselves of this fact)!
In order to capture a latent period, we next change the assumptions. Upon infection, an individual now enters the compartment E of exposed (i.e., infected, but not yet infectious) individuals. When the length of the latent period is geometrically distributed with parameter γ, we have to replace [4.1] by
S(t+1)=eβI(t)S(t),E(t+1)=(1eβI(t))S(t)+(1γ)E(t),I(t+1)=γE(t)+(1α)I(t),R(t+1)=αI(t)+R(t).
[4.4]
Do parameters Ãk exist such that [4.4] can be condensed to [2.3]? It is helpful to think in terms of a stochastic process in which an individual can be in the states S, E, I, and R. In fact, E and I suffice, since we start “looking” at the individual when it is infected and stop “looking” when it loses infectiousness. If we label E with index 1 and I with index 2, then the probability distribution of the state-at-infection is represented by the vector
10.
The state transitions are described by the matrix
M=1γ0γ1α,
and infectiousness by the vector
b=0β.
So, the expected infectiousness k units of time after becoming infected is given by
Ak=bMk110,
and hence by
Ak=b(Mk1)2,1=βl=1k1γ(1γ)l1(1α)k1l,
[4.5]
(with the convention that the sum equals zero when the upper index does not exceed or equal the lower index). The parameters Ãk are again defined by [2.5]. And when Ãk has the form defined by [2.5] and [4.5], then [4.4] follows from [2.3] if we define
E(t)=j=1[S(tj)S(tj+1)](1γ)j1,I(t)=j=1[S(tj)S(tj+1)]    l=1γ(1γ)l1(1α)j1l.
[4.6]
We trust that our presentation above, in terms of two vectors and one matrix, all having a well-defined interpretation, makes clear how one can, in general, relate compartmental epidemic models to a scalar higher-order recursion. See section 9.3 of ref. 16 for a detailed elaboration of the continuous-time case.
We conclude that there is a multitude of compartmental models that correspond to a special choice of the parameters Ak. To prove the results of Section 3 directly for a model with 27 compartments is very difficult, especially if one does not recognize the underlying structure (and 27 components make recognition difficult). More importantly, it is an unnecessary job: One only needs to observe that one deals with a special case of [2.3].

5. On the Choice of Parameters Ak

The ingredients {Ak} subsume mechanistic properties of the process of contact between hosts, as well as physiological/immunological properties of within-host dynamics. As a rule, information about such properties is scarce. One has to make educated guesses. See ref. 17 for a concrete example.
The SIR and SEIR formulations of the last section have the advantage of involving just a few parameters. But, in our opinion, they have the disadvantage of being wrongly educated guesses: They result from the tendency to do what others do, despite the fact that data, as a rule, do not support geometric distributions for the length of the latent and/or the infectious period.
A facilitating aspect is that Ak are averages (recall the remarks about p in Section 1, and see ref. 4, section 2.1 for a detailed exposition, including examples). If we “know” that at day six after infection, only 10% of the infected individuals are infectious, while at day seven this rises to 20%, we can use this information directly in our choice of Ak. If we know that at days six and seven, the degree of infectiousness differs among individuals, we can still use the guesstimated average.
A more theoretical example is the following. Assume that a fraction p of the infected individuals is asymptomatic. Assume that a symptomatic individual has at day j after infection a probability θj to be detected and put into quarantine. Assume that the intrinsic infectiousness and contact intensity of symptomatic and asymptomatic cases is identical and given by {Bk}. Then, we choose
Ak=p+(1p)j=1k(1θj)Bk.
[5.1]
Note that [5.1] is based on the debatable assumption that at the day of its detection, an individual does not contribute to the force of infection. This weakness is easily remedied, but at the cost of introducing yet another parameter.
The parameters θj can capture the effect of testing. During a serious outbreak, such as the COVID-19 pandemic, the testing policy and possibility depend on calendar day. This introduces time dependence in the parameters θj. Similarly, control measures that reduce contact opportunities affect the Bk in a time-dependent multiplicative manner. In the next section, we introduce a computational scheme in which such time dependence is easily incorporated.

6. Reformulation as a First-Order System

As shown in Section 3, the scalar higher-order recursion relation [2.3] is very convenient for theoretical purposes. But, for doing computations, a first-order system of equations is more convenient.
For feasibility, we want a finite dimensional system. To achieve this, we make the very reasonable assumption that the indices j for which Aj is strictly positive have a finite upper bound. In other words, we assume that an integer m exists such that Aj=0 for all jm+1. The relevant consequence is that the history of S, which matters for determining the future, has finite length.
Define
Xj(t)s(t+1j),j=1,,m.
[6.1]
Much of the dynamics of the vector X amounts to shifting:
Xj(t+1)=Xj1(t),j=2,,m.
[6.2]
Combination of [1.1], [2.4], [2.5], and [2.9] yields the rule for extension
X1(t+1)=X1(t)ej=1mÃj(Xj+1(t)Xj(t)).
[6.3]
In [6.3], it is harmless to allow Ãj to depend on time t!
Alternatively, we might start from [2.9] and choose as before
X1(t)=s(t),
[6.4]
but for j>1
Xj(t)=s(t+1j)s(t+2j),
[6.5]
which corresponds to the incidence in time window (t+1j,tj+2]. With this choice of Xj(t), the cumulative force of infection becomes
Λ^(t)=j=1mAjXj+1(t).
[6.6]
This leads to the update rules
X1(t+1)=X1(t)ek=1mÃkXk+1(t),
[6.7]
X2(t+1)=X1(t)X1(t+1),
[6.8]
Xj(t+1)=Xj1(t)forj>2.
[6.9]
In this formulation, too, we can allow Ãk to depend on time t.
This seems a good moment to point out that the use of labels like “exposed” or “infectious” is perfectly possible within the general framework. For any such label, say, L, specify, on the basis of the choice of the parameters Ak, as described in Section 5, the probability πj that an individual carries this label at time j after becoming infected. Then, the number of individuals carrying label L at time t is given by
NL(t)=j=1mπjXj+1(t).
[6.10]
So all one needs to do to plot the time course of NL is to add to [6.7][6.9] the Eq. 6.10 (with t replaced by t+1, for consistency).
Note that [4.3] and [4.6] are examples of [6.10]. In the very special situation considered in Section 4, the labels actually correspond to states at the individual level, and, as a consequence, one can express NL(t+1), for L=S, E, I, R in terms of these same quantities at time t, without reference to X(t). In general, this is impossible. (Incidentally, note that probabilists often speak about non-Markovian models when the labels refer to compartments and sojourn time distributions are not exponential, while calling the labels “states,” even though, strictly speaking, they do not qualify as such.)

7. About the Peak of the Incidence Curve

An epidemic curve has many features, such as
The initial growth rate ρ;
The height and timing of the peak;
The final size.
For the first and last of these, it is well understood how they relate to the parameters of simple models that ignore heterogeneity. For instance, the final size is completely determined by R0, while ρ is a solution of the Euler–Lotka equation (cf. [3.5]).
At the start of an outbreak, one may observe the initial growth rate and next use information about the generation interval to make inferences about R0 (refs. 9 and 10 and the references given there). Next, one may choose the model parameters such that ρ and R0 of the model correspond to the estimated values.
The ongoing outbreak of COVID-19 generates much interest in peaks, largely because of concern that hospitals may be overwhelmed with patients, leading to healthcare breakdown. As far as we know, there is no analytical method to determine the height and timing of the peak from model parameters (except, perhaps, in the oversimplified SIR system of differential equations). So, one has to rely on numerical calculations.
The key question addressed in this section is: How much is peak height influenced by model details? Here, we systematically compare the discrete-time SEIR model, described by [4.4] and corresponding to geometric distributions of the length of the latent and infectious period, to a model with deterministic, i.e., fixed, duration of these periods and constant infectiousness during the infectious period. Thus, both types of model have three parameters. By restricting to R0=2.5, we fixed the infectiousness parameter in terms of the other two. We calibrated the models by making sure that ρ and the mean length of the latent period are the same, thus creating a one-to-one relationship between the two parameters of one type of model and the two parameters of the other type of model.
As initial condition, we took a short history of decreasing fractions of susceptibles, reflecting an exponential increase in new cases at the rate ρ. We computed the peak value of the incidence for both types as a function of the two parameters and next their ratio. The results are depicted in Fig. 2.
Fig. 2.
Comparing incidences between two types of models for different combinations of the expected time individuals are exposed (TE) and the expected time individuals are infectious (TI). In the block model, the lengths of the latent and infectious periods are deterministic (fixed for all individuals); in the SEIR model, the lengths of these periods are stochastic (independently exponentially distributed with identical parameters, resp., 1/TE and 1/TI). See SI Appendix for details. (Top) Maximum incidence of the block model (with deterministic periods; Left), SEIR model (with stochastic periods; Center), and the relative ratio between the two (i.e., blockSEIRSEIR; Right), as a function of TE, the (actual, resp., expected) time individuals are exposed, and TI, the (actual, resp., expected) time individuals are infectious. Models were compared after ensuring that they have the same R0 and initial growth rate ρ. Note that the incidence of the deterministic model always reaches a higher peak within the ranges of TE and TI considered, by about 8 to 15%, than the corresponding SEIR model. (Middle and Bottom) Example simulations with the deterministic model (blue) and corresponding SEIR model (red) with the same R0 and ρ. Middle corresponds to the parameters at which the ratio of peak heights is minimal, (TE,TI)=(3,4); Bottom corresponds to when this ratio is maximal, (TE,TI)=(6,4). One can clearly see that the incidence grows initially at one and the same rate.
The main conclusions are:
Deterministic periods lead to higher peaks than geometrically distributed periods;
This is most prominent when the latent period is large and the infectious period is small;
The difference is, for reasonable parameter values, in the order of 10%.
[Note that, since compartmental models have fatter tails, they need, for given R0, to have an earlier peak of infectiousness in order to have the same ρ. This is clearly visible in Figs. 2 and 3. See section 4.1 of the recent book ref. 19 for a different, yet somewhat related, observation concerning the influence of (the shape of) a sojourn time distribution.]
Fig. 3.
Comparing a model, denoted by SC for SARS-CoV-2, with parameters corresponding to the Weibull distribution derived from SC generation-interval data in ref. 18 to the SEIR model with the same reproduction number R0 and the same initial growth rate ρ. (Top Left) Relative ratio between the maximal incidences (1, as in Fig. 2) as a function of TE, the expected length of the exposed period (note that the maximal incidence of the SC model is the same for all TE, as TE sets only the parameters of the SEIR model; SI Appendix). (Top Right) Expected contribution to force of infection for the SC model. (Middle Left) Incidence of the SC model (blue) and the SEIR model (red) with TE=1. At this TE, the relative difference in the peak incidences are maximal. (Middle Right) The expected contribution of the SEIR model for TE=1. (Bottom) Same, but now for TE=3, at which the difference in peak incidence is minimal. For TE>4.12, the SEIR model cannot be parameterized to have the same R0 and ρ as the SC model. For further details, see SI Appendix.
After the first conclusion emerged, we aspired to find a somewhat mechanistic explanation. This led to the following observation. Roughly speaking, an outbreak reaches its peak when S is reduced to the level corresponding to R0=1. How many more cases there will be after the peak depends largely on the number of individuals that are, or are on the way to becoming, infectious at the time the peak is reached. (In ref. 4, section 1.3.2, it is explained how the overshoot phenomenon corresponding to a large stock of recently infected individuals at the time of reaching the peak causes the final size, as a fraction of the population, to increase when R0 increases.) For compartmental models, there is a relatively fat tail in the distribution of the time until becoming “removed,” i.e., having no future infectiousness. So when comparing models with the same R0, and hence the same final size, we should expect that for compartmental models, the reservoir of latent and infectious individuals is smaller, at peak time, than for models in which expected future infectiousness reduces to zero after finite time. This is illustrated in Fig. 4, and as reservoir size correlates with peak size, we should expect lower peaks for compartmental models, exactly as found in our numerical results.
Fig. 4.
The fraction of the population that is either latently infected or infectious in the block model (blue) and the corresponding SEIR model (red). (Left) (TE,TI)=(3,4). (Right) (TE,TI)=(6,4). The corresponding incidences can be found in Fig. 2. Note that although the peak incidences are not so very different (Fig. 2), there is a large difference in the fraction of infected-and-not-yet-removed individuals, due to the comparatively fatter tail in the expected future contribution to the force of infection in the SEIR model.
Just to elucidate that the higher-peak-phenomenon matters in the COVID-19 context, we chose, on the one hand, the parameters Ak as integrals over 1-d time intervals of the Weibull generation-interval distribution, as derived from data in ref. 18, and, on the other hand, determined the one-parameter family of SEIR models that has both R0 and ρ equal to these quantities for the Weibull. The results of a comparison are presented in Fig. 3. The peak heights differ 5 to 10%.

8. Conclusions

The success of the SIR and SEIR variants dwarfs the attention for the general KM27 model, even though, in principle, the latter has much on offer for a would-be modeler. We surmise that the reason is that the general model is formulated as a renewal (or Volterra integral) equation and that for these unfamiliar equations, there are no user-friendly numerical tools available.
Here, we introduced a discrete-time version that has many advantages:
The generality and flexibility are retained;
Computing the epidemic time course is super easy;
The time step can be adjusted to the time interval between data points (e.g., 1 d or 1 wk);
The model parameters bear a direct relation to observational data and control measures;
Time-varying infectiousness due to interventions can be incorporated without difficulty.
In Section 7, we showed that precise assumptions about the latent and infectious period matter for predicting the peak of the incidence curve, a quantity of interest from a public health perspective. So, we claim, the generality matters for practical issues and is not just an academic fancy. Of course, in a practical context, all kinds of heterogeneity (e.g., reflecting age) matter as well. These have been neglected here, but by combining [2.10] with material in ref. 4 and other textbooks, progress can definitely be made.
Our hope is that our pragmatic reformulation leads to well-deserved and long-overdue (20) popularity of the true KM27 model.

Data Availability

All study data are included in the article and/or supporting information.

Acknowledgments

We thank Matthias Kreck for prompting one of us to formulate the discrete-time Kermack–McKendrick model. In turn, it was the COVID-19 pandemic that sparked the recent work of Matthias Kreck and Erhard Scholz. We thank two referees for critical remarks that led to several improvements.

Supporting Information

Appendix (PDF)

References

1
F. Brauer, Z. Feng, C. Castillo-Chavez, Discrete epidemic models. Math. Biosci. Eng. 7, 1–15 (2010).
2
D. Breda, O. Diekmann, W. F. de Graaf, A. Pugliese, R. Vermiglio, On the formulation of epidemic models (an appraisal of Kermack and McKendrick). J. Biol. Dyn. 6 (suppl. 2), 103–117 (2012).
3
T. G. Kurtz, Approximation of Population Processes (CBMS-NSF Regional Conference Series in Applied Mathematics, SIAM: Society for Industrial and Applied Mathematics, Philadelphia, PA, 1981), vol. 36.
4
O. Diekmann, H. Heesterbeek, T. Britton, Mathematical Tools for Understanding Infectious Disease Dynamics (Princeton University Press, Princeton, NJ, 2012).
5
J. A. Heesterbeek, A brief history of R0 and a recipe for its calculation. Acta Biotheor. 50, 189–204 (2002).
6
J. A. P. Heesterbeek, “The law of mass-action in epidemiology: A historical perspective” in Ecological Paradigms Lost: Routes of Theory Change, K. Cuddington, B. Beisner, Eds. (Academic Press, San Diego, CA, 2005), pp. 81–105.
7
F. P. Pijpers, A non-parametric method for determining epidemiological reproduction numbers. J. Math. Biol. 82, 37 (2021).
8
P. Groeneboom, G. Jongbloed, Nonparametric Estimation Under Shape Constraints: Estimators, Algorithms, and Asymptotics (Cambridge Series in Statistical and Probabilistic Mathematics, Cambridge University Press, New York, 2014), vol. 38.
9
S. W. Park, D. Champredon, J. S. Weitz, J. Dushoff, A practical generation-interval-based approach to inferring the strength of epidemics from their speed. Epidemics 27, 12–18 (2019).
10
S. W. Park et al., Forward-looking serial intervals correctly link epidemic growth to reproduction numbers. Proc. Natl. Acad. Sci. U.S.A. 118, e2011548118 (2021).
11
A. Bátkai, M. K. Fijavž, A. Rhandi, Positive Operator Semigroups (Studies in Advanced Mathematics, Birkhäuser Basel, Basel, Switzerland, 2017), vol. 257.
12
C. Robinson, Dynamical Systems: Stability, Symbolic Dynamics, and Chaos (CRC Press, Boca Raton, FL, 1995).
13
O. Diekmann, Limiting behaviour in an epidemic model. Nonlin. Anal. Theory and Appl. 1, 459–470 (1977).
14
W. O. Kermack, A. G. McKendrick, A contribution to the mathematical theory of epidemics. Proc. R. Soc. Lond. A Contain. Pap. Math. Phys. Character 115, 700–721 (1927).
15
N. Hernandez-Ceron, Z. Feng, C. Castillo-Chavez, Discrete epidemic models with arbitrary stage distributions and applications to disease control. Bull. Math. Biol. 75, 1716–1746 (2013).
16
O. Diekmann, M. Gyllenberg, J. Metz, Finite dimensional state representation of linear and nonlinear delay systems. J. Dyn. Differ. Equ. 30, 1439–1467 (2018).
17
M. Kreck, E. Scholz, Proposal of a recursive compartment model of epidemics and applications to the covid-19 pandemic. arXiv [Preprint] (2020). https://arxiv.org/abs/2009.00308 (Accessed 9 August 2021).
18
L. Ferretti et al., Quantifying SARS-CoV-2 transmission suggests epidemic control with digital contact tracing. Science 368, eabb6936 (2020).
19
J. Wu, X. Zhang, Transmission Dynamics of Tick-Borne Diseases with Co-Feeding, Developmental and Behavioural Diapause (Lecture Notes on Math Modeling in the Life Sciences, Springer, Cham, Switzerland, 2020).
20
O. Diekmann, The 1927 epidemic model of Kermack and McKendrick: A success story or a tragicomedy? Newslett. Japanese Soc. Mathemat. Biol. 92, 8–11 (2020).

Information & Authors

Information

Published in

The cover image for PNAS Vol.118; No.39
Proceedings of the National Academy of Sciences
Vol. 118 | No. 39
September 28, 2021
PubMed: 34561307

Classifications

Data Availability

All study data are included in the article and/or supporting information.

Submission history

Accepted: June 25, 2021
Published online: September 24, 2021
Published in issue: September 28, 2021

Keywords

  1. epidemic outbreak
  2. Kermack–McKendrick
  3. discrete-time model
  4. incidence peak
  5. basic reproduction number

Acknowledgments

We thank Matthias Kreck for prompting one of us to formulate the discrete-time Kermack–McKendrick model. In turn, it was the COVID-19 pandemic that sparked the recent work of Matthias Kreck and Erhard Scholz. We thank two referees for critical remarks that led to several improvements.

Notes

This article is a PNAS Direct Submission.
*Note that a lot of heterogeneity is subsumed: Individuals may have varying 1) degrees of infectiousness and 2) propensity to make contact; 3) their activities may be unevenly distributed over the time unit. As long as I is large and the properties of the two individuals involved in a transmission have independent influence, one can simply interpret p as an average; see ref. 4, chapter 2, and Section 5.
See online for related content such as Commentaries.

Authors

Affiliations

Mathematical Institute, Utrecht University, 3584 CD Utrecht, Netherlands;
School of Mathematics, University of Minnesota, Minneapolis, MN 55455;
Department of Mathematics, Vrije Universiteit Amsterdam, 1081 HV Amsterdam, Netherlands;
Mathematical Institute, Utrecht University, 3584 CD Utrecht, Netherlands;
Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, 3584 CG Utrecht, Netherlands

Notes

1
To whom correspondence may be addressed. Email: [email protected].
Author contributions: O.D. designed research; O.D., H.G.O., R.P., and M.C.J.B. performed research; O.D., H.G.O., R.P., and M.C.J.B. wrote the paper; and R.P. performed simulations and made figures.

Competing Interests

The authors declare no competing interest.

Metrics & Citations

Metrics

Note: The article usage is presented with a three- to four-day delay and will update daily once available. Due to ths delay, usage data will not appear immediately following publication. Citation information is sourced from Crossref Cited-by service.


Citation statements




Altmetrics

Citations

Export the article citation data by selecting a format from the list below and clicking Export.

Cited by

    Loading...

    View Options

    View options

    PDF format

    Download this article as a PDF file

    DOWNLOAD PDF

    Login options

    Check if you have access through your login credentials or your institution to get full access on this article.

    Personal login Institutional Login

    Recommend to a librarian

    Recommend PNAS to a Librarian

    Purchase options

    Purchase this article to access the full text.

    Single Article Purchase

    The discrete-time Kermack–McKendrick model: A versatile and computationally attractive framework for modeling epidemics
    Proceedings of the National Academy of Sciences
    • Vol. 118
    • No. 39

    Figures

    Tables

    Media

    Share

    Share

    Share article link

    Share on social media