Accepted Manuscript
Title: Two approaches to forecast Ebola synthetic epidemics
Author: David Champredon, Michael Li, Benjamin M. Bolker, Jonathan Dushoff
PII: S1755-4365(17)30023-3
DOI: http://dx.doi.org/doi:10.1016/j.epidem.2017.02.011
Reference: EPIDEM 250
To appear in:
Received date: 23-6-2016
Revised date: 21-12-2016
Accepted date: 17-2-2017
Please cite this article as: David Champredon, Michael Li, Benjamin M. Bolker, Jonathan Dushoff, Two approaches to forecast Ebola synthetic epidemics, (2017), http://dx.doi.org/10.1016/j.epidem.2017.02.011 This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
Two approaches to forecast Ebola synthetic epidemics

David Champredon∗,1,2, Michael Li1, Benjamin M Bolker1, Jonathan Dushoff∗,1

∗ Contributed equally to this work and corresponding authors.
1 McMaster University, Hamilton, Ontario, Canada.
2 York University, Toronto, Ontario, Canada.
Abstract

We use two modelling approaches to forecast synthetic Ebola epidemics in the context of the RAPIDD Ebola Forecasting Challenge. The first approach is a standard stochastic compartmental model that aims to forecast incidence, hospitalization and deaths among both the general population and health care workers. The second is a model based on the renewal equation with latent variables that forecasts incidence in the whole population only. We describe fitting and forecasting procedures for each model and discuss their advantages and drawbacks. We did not find that one model was consistently better in forecasting than the other.

1 Introduction
The scale of the 2014 West African Ebola epidemic took the world by surprise. Although several Ebola outbreaks had erupted in Africa over the previous 40 years, none had reached more than several hundred cases, far from the estimated 28,000 cases and 11,000 deaths observed during this last epidemic. As national and international health organizations rushed to control an unprecedented epidemic, the need to forecast case incidence mounted daily in order to support difficult decisions about limited resource allocation.

Mathematical and statistical tools developed over the last two decades have improved epidemic forecasting [1–3]. Yet it is still notoriously difficult to accurately predict an epidemic trajectory, because key features of the epidemic may be poorly known (e.g. pathogen natural history, routes of transmission, observation errors). Indeed, the 2014 West African Ebola epidemic was challenging to forecast because i) it was the first time an Ebola epidemic hit an African urban centre, ii) the natural history of the pathogen was poorly known and iii) the quality of the reported epidemiological data was poor.

Forecasting incidence with a mathematical model involves two steps. First, the model parameters are fitted to observed data; then incidence is simulated forward using the inferred values for these parameters, while propagating uncertainty. The difficulty of each step depends on the modelling framework chosen. Here, we explore two different modelling approaches in the context of the Ebola Challenge, a forecasting exercise on synthetic Ebola epidemics ((cite overarching paper describing the Ebola challenge)).
The first approach uses a classical compartmental model with population and transmission structure, fitted with approximate Bayesian computation [4]; the corresponding forecasting step uses a simple filtering technique. The second model uses the renewal equation with latent variables. The fit uses Markov chain Monte Carlo, which is less straightforward than the filtering methods of the compartmental model, but provides a more robust statistical framework for inference.

2 Methods

In this section, we describe the theoretical framework, the fitting and forecasting procedure for the two mathematical models used in the Challenge. The first model is a standard compartmental model; the second is based on the renewal equation.

2.1 Compartmental model (SHERIF)
Epidemiological structure

The compartments were chosen to reflect the natural history of Ebola virus disease and to use the data that was going to be made available by the challenge organizers. The data included contact-tracing data and incidence and death counts among the general population and health care workers (HCW). More generally, this model is a discrete-state, compartmental and stochastic model. Unlike canonical compartmental models, however, it keeps track of individuals in order to make use of contact-tracing data.
We subdivide our population into ten epidemiological compartments:
• $S$: susceptible individuals from the general population
• $S^w$: susceptible health care workers
• $E$: exposed (i.e., latent, infected but not yet infectious) individuals from the general population
• $E^w$: exposed health care workers
• $I$: infectious and not hospitalized individuals from the general population
• $I^w$: infectious and not hospitalized health care workers
• $H$: infectious and hospitalized individuals (both general population and health care workers)
• $F$: dead individuals whose funeral ceremonies are not yet completed
• $D$: buried dead individuals
• $R$: individuals who have fully recovered from infection and are no longer infectious nor susceptible
Figure 1. Diagram of the SHERIF compartmental model. [Diagram not reproduced. Legend: single compartment; “box-car” compartments (Erlang/Gamma distribution); “box-car” infectious compartments; transitions; force of infection; compartments for health care workers only.]
We refer to this model with the acronym SHERIF. To make time distributions (e.g., the distribution of the infectious period) more realistic, we sub-divided each of the transient epidemiological compartments into functionally identical sub-compartments, thus replacing exponentially distributed residence times with more realistic gamma-distributed residence times [5]; this is also called the “linear chain trick” [6]. We used $n_E$, $n_I$, $n_H$ and $n_F$ sub-compartments for $E$, $I$, $H$ and $F$, respectively. We assumed that disease progression is the same for HCW and others, so compartments $E^w$ and $I^w$ also had $n_E$ and $n_I$ sub-compartments, respectively.

Susceptible individuals from the general population can be infected by individuals in states $I$, $I^w$ and $F$. Only susceptible HCW can be infected by hospitalized patients (in state $H$). There are three possibilities after initial infection: i) recovery without hospitalization, ii) hospitalization and iii) death. In order to reflect interventions and/or behavioural change, all contact rates were made time-dependent (as piece-wise linear functions of time since the start of the epidemic). Figure 1 shows a diagram of the model.

The population is discrete; individuals progress through the epidemiological states at pre-specified rates. We used the standard Gillespie method with a tau-leap approximation with an interval of 0.25 day [7]. Simulation events are listed in Table 1, and model parameters are described in Table 2. Although the mathematical model specifies random mixing within the population, each individual in the population is uniquely identified in order to record information about generation intervals and reproductive number for the calibration stage. For example, if $m$ infections are meant to occur during a tau-leap time interval, the $m$ pairs of “infector”/“infectee” individuals are randomly chosen (from the infectious and susceptible compartments, respectively).

The initial conditions for all simulations were the same: five individuals from the general population were infectious (in the $I$ compartment) and all the rest of the population was susceptible (compartments $S$ and $S^w$). The effective population size was a fitted parameter (see next section), hence the number of individuals simulated varied across Monte Carlo iterations. The posterior distribution for all synthetic scenarios had a median at around 40,000 individuals.
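To illustrate the tau-leap scheme with “box-car” sub-compartments described above, here is a minimal Python sketch. It is not the authors' C++ implementation: it uses a much simpler S-E-I-R structure (no HCW, hospital or funeral compartments, no individual identities), and the function name `simulate_seir_boxcar` and all parameter values are hypothetical. Stage progression uses the rate form $\sigma E_k n_E$ listed in Table 1.

```python
import numpy as np

def simulate_seir_boxcar(beta, sigma, gamma, n_E, n_I, N, I0, t_max, tau=0.25, seed=0):
    """Tau-leap simulation of an SEIR model with Erlang ("box-car") stages.

    Illustrative only: a reduced model, not the full SHERIF structure.
    """
    rng = np.random.default_rng(seed)
    S = N - I0
    E = np.zeros(n_E, dtype=np.int64)   # latent sub-compartments E_1..E_nE
    I = np.zeros(n_I, dtype=np.int64)   # infectious sub-compartments I_1..I_nI
    I[0] = I0
    R = 0
    incidence = []
    for _ in range(int(round(t_max / tau))):
        # Number of events in one leap: Poisson(rate * tau), capped by the
        # size of the source compartment so that counts never go negative.
        new_inf = int(min(S, rng.poisson(beta * I.sum() * S / N * tau)))
        out_E = np.minimum(E, rng.poisson(sigma * n_E * E * tau))
        out_I = np.minimum(I, rng.poisson(gamma * n_I * I * tau))
        # Move individuals along the chain S -> E_1 -> ... -> I_1 -> ... -> R.
        S -= new_inf
        E -= out_E
        E[0] += new_inf
        E[1:] += out_E[:-1]
        I -= out_I
        I[0] += out_E[-1]
        I[1:] += out_I[:-1]
        R += int(out_I[-1])
        incidence.append(new_inf)
    return np.array(incidence), S, E, I, R

# Example run with hypothetical parameter values (rates per day).
inc, S, E, I, R = simulate_seir_boxcar(
    beta=0.4, sigma=0.1, gamma=0.15, n_E=3, n_I=2, N=5000, I0=5, t_max=100)
```

Because each stage is left at rate $n_E \sigma$ (or $n_I \gamma$), the total time spent in $E$ or $I$ is Erlang-distributed with means $1/\sigma$ and $1/\gamma$, which is the point of the linear chain trick.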
The model was implemented in C++ and interfaced with R [8] for data analysis and calibration.

Calibration

The choice of parameters to fit to data or to fix to a pre-specified value is based on the literature and on the modeler’s prior knowledge given the scenario and the timing of the forecast relative to the whole epidemic [9].
Two calibration methods were implemented. First, we followed the methodology described in [10], where a synthetic multivariate normal likelihood is calculated from summary statistics of selected outputs of the model. Priors on the parameter values are set, first using a relatively non-informative distribution, then narrowing the distributions based on the posteriors from the first phase. We used the R package synlik to implement this calibration method. The second method implemented was an approximate Bayesian computation (ABC), where priors were defined similarly to the synthetic likelihood method [11].

The summary statistics for both synthetic likelihood and ABC were based on incidence and death counts. More precisely, for both the general population and HCW, we used the sum of squared errors between the quadratic interpolations of the observed and modelled log-incidence. We also used the squared error between the observed and modelled cumulative death counts. The choice to use cumulative counts was driven by the small number of weekly deaths observed, which were challenging to fit on a time-interval basis, but more manageable on a cumulative one (the time-interval death counts often consisted of sparse “spikes”; hence the error between data and simulated deaths was easier to handle when considering the cumulative counts). The contact-tracing information, when available, was used to fit the distribution of the generation interval (via $n_E$ and $n_I$).

We used both calibration methods for the first two prediction points but ended up using only ABC because this implementation was more robust (synthetic likelihood would occasionally fail because of very small variances of the summary statistics; this was likely a solvable problem but we did not have the time to find and implement a solution). Table 2 summarizes all model parameters and their fitting method, if any.
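The ABC step described above can be sketched as a simple rejection scheme. The Python code below is an illustration under stated assumptions, not the calibration code used for the Challenge: `toy_simulator` is a hypothetical stand-in for the epidemic model (noisy exponential growth), the priors are uniform rather than the lognormal and beta priors of Table 2, the quadratic interpolation of log-incidence is omitted, and the way the two distances are rescaled and combined is our own choice.

```python
import numpy as np

def toy_simulator(growth_rate, death_prob, n_weeks, rng):
    """Hypothetical stand-in for the epidemic model: noisy exponential growth."""
    expected_cases = 5.0 * np.exp(growth_rate * np.arange(n_weeks))
    cases = rng.poisson(expected_cases)
    deaths = rng.binomial(cases, death_prob)
    return cases, deaths

def summary_distances(obs_cases, obs_deaths, sim_cases, sim_deaths):
    """Two distances: SSE on log-incidence and SSE on cumulative deaths."""
    d_inc = np.sum((np.log1p(obs_cases) - np.log1p(sim_cases)) ** 2)
    d_death = np.sum((np.cumsum(obs_deaths) - np.cumsum(sim_deaths)) ** 2)
    return d_inc, d_death

def abc_rejection(obs_cases, obs_deaths, n_draws=2000, keep_frac=0.05, seed=0):
    """Keep the prior draws whose simulations are closest to the data."""
    rng = np.random.default_rng(seed)
    growth = rng.uniform(0.0, 0.6, n_draws)    # assumed prior: weekly growth rate
    death_p = rng.uniform(0.0, 1.0, n_draws)   # assumed prior: death probability
    dist = np.empty((n_draws, 2))
    for j in range(n_draws):
        sim = toy_simulator(growth[j], death_p[j], len(obs_cases), rng)
        dist[j] = summary_distances(obs_cases, obs_deaths, *sim)
    # Put both distances on a comparable scale before adding them.
    scaled = dist / (np.median(dist, axis=0) + 1e-12)
    keep = np.argsort(scaled.sum(axis=1))[: int(keep_frac * n_draws)]
    return growth[keep], death_p[keep]

# Synthetic "observed" data generated with known values, then recovered by ABC.
data_rng = np.random.default_rng(1)
obs_cases, obs_deaths = toy_simulator(0.3, 0.5, 12, data_rng)
growth_post, death_post = abc_rejection(obs_cases, obs_deaths)
```

The retained draws approximate the posterior; narrowing the priors around them and repeating corresponds to the sequential fitting mentioned in the text.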
Briefly, we chose to estimate contact parameters between infectious and susceptible individuals ($\beta_{FS}$, $\beta_{IS}$), the mean duration of latency ($\sigma$), infectiousness ($\gamma$) and the effective population size. All other parameters were either fixed to values taken from the literature or set proportional to the estimated parameters. The contact-tracing data, when available, were used to fit the backward generation interval distribution [12]. The number of initial infectious individuals was arbitrarily set to five, and was therefore not fitted. The effective population size had a Beta distribution as prior, where the mean was arbitrarily set at 40,000 (which represents about 1% of the total population of Liberia, the country used for this challenge), and boundary values set at 0 and 100,000.

Forecasting
Once the model parameters were fitted to data, we simulated the epidemic forward to forecast incidence and deaths in both the general population and among HCW. We generated 1000 simulations with parameter values sampled from the posterior distributions.
For a given observation time, the total incidence is $T = O + U$, with $O$ the observed cases and $U$ the unobserved ones. In order to infer a confidence interval for $T$, we assumed that $U$ had a negative binomial distribution with U as its overdispersion parameter (the number of “failures” in the failure-counting definition of the distribution) and a constant reporting rate arbitrarily chosen to be 30% for its probability parameter. In other words, the reported data were modelled with an observation error following a negative binomial distribution. The associated 95% confidence interval was used to filter, from the 1000 simulations generated, the trajectories where the incidence value at the last observed time was contained in this confidence interval. The forecast statistics were then calculated based on these remaining simulations.
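The filtering step above can be sketched as follows. This is a minimal illustration, not the authors' code. We assume that the size parameter of the negative binomial is the observed count $O$ (so that $U$ counts the cases missed while $O$ cases were reported with probability 0.3), and we approximate the 95% interval by Monte Carlo quantiles. The function name `filter_trajectories` and the array layout (one simulated trajectory per row) are hypothetical.

```python
import numpy as np

def filter_trajectories(sims, last_idx, observed, report_prob=0.3,
                        level=0.95, n_draws=100_000, seed=0):
    """Keep simulated trajectories compatible with the last observation.

    sims: array of shape (n_simulations, n_times) of simulated incidence.
    observed: reported incidence O at the last observed time (last_idx).
    """
    rng = np.random.default_rng(seed)
    # Assumption: U ~ NegBin(size=O, prob=report_prob), i.e. the number of
    # unreported cases accumulated while O cases were reported.
    unobserved = rng.negative_binomial(observed, report_prob, size=n_draws)
    total = observed + unobserved                      # T = O + U
    lo, hi = np.quantile(total, [(1 - level) / 2, (1 + level) / 2])
    value = sims[:, last_idx]
    keep = (value >= lo) & (value <= hi)
    return sims[keep], (lo, hi)

# Usage with fake trajectories whose last value ranges over 0..999.
fake_sims = np.tile(np.arange(1000)[:, None], (1, 10)).astype(float)
kept, interval = filter_trajectories(fake_sims, last_idx=9, observed=30)
```

Forecast summaries (median, quantiles, peak timing) would then be computed from `kept` only, which is why a large number of initial simulations is needed.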
2.2 Generalized renewal equation
The generalized renewal equation (GRE) model is a Bayesian, latent-incidence model. The true incidence in each time period is treated as a latent (unobserved) variable. These variables are used to calculate the likelihood of the observed data and of the latent-incidence time series itself: each latent incidence is assumed to arise from a force of infection due to the infected individuals stemming from incidence in preceding time periods. The force of infection is reduced to account for endogenous behaviour change and for the estimated effects of interventions. The number of infections in a time period results from the force of infection being applied to an effective number of susceptibles, which is also a latent variable. Fig. 2 shows a diagram of the model.

We fit the GRE model using JAGS [13], a program for Bayesian inference that takes a model structure and explicit priors for all parameters and uses Markov chain Monte Carlo sampling to generate samples from the posterior distribution of the parameters.

2.2.1 The susceptible population
We assumed that a proportion $\rho$ of the population was a priori susceptible to reported infections. This proportion consists of two components: the effective proportion of the population which is susceptible, $\rho_E$, and the reporting fraction $\rho_r$ (thus $\rho = \rho_E \rho_r$). To aid convergence, we fit the relatively stable value of $\rho$ and parameterized $\rho_E$ as $\rho^{p_E}$ (thus $\rho_r = \rho^{1-p_E}$), with Beta-distributed priors on both $\rho$ and $p_E$.

Figure 2. Diagram of the GRE model
The starting number of susceptibles was therefore $S(0) = \rho_E N$, where $N$ was the approximate population of Liberia. At each subsequent time step, we decreased the effective number of susceptibles by the latent incidence $i$: $S(t) = S(t-1) - i(t)$.

2.2.2 Force of infection
Our base force of infection was calculated using the renewal equation:

$$F_0(t) = R_0 \sum_{\tau=1}^{L} g(\tau)\, i(t-\tau) \tag{1}$$
where $R_0$ is the basic reproductive number and $g$ is the intrinsic generation interval distribution. For the Challenge, we used a time step of one week, and 5 time lags (thus, a maximum generation lag of $L = 5$ weeks). Given more time, we would have preferred to use a time step of 0.5 weeks, aggregating model-predicted incidences to match the observed incidence data. We modeled the generation-interval distribution phenomenologically using the functional form of the negative binomial (but truncated at $L$), with a position parameter (expressed as a function of the maximum generation time) and a shape parameter (analogous to the negative-binomial shape parameter).

The realized force of infection was calculated as:

$$F(t) = F_0(t) \left(\frac{S(t)}{S(0)}\right)^{\alpha} e^{-A_t} \tag{2}$$

where the factor $(S(t)/S(0))^{\alpha}$ accounts for intrinsic behavioural responses to the epidemic, and the factor $e^{-A_t}$ accounts for transmission reduction due to interventions. We wrote $A_t = \sum_k \beta_k x_k$, where $x_k$ was our inferred intensity for each intervention, with Gamma-distributed priors for $\alpha$ and the $\beta_k$. Due to time constraints, interventions were modelled crudely during the contest. The intervention intensities $x_k$ were the expected or realized number of Ebola Treatment Unit beds for each time period, and binary assessments of the presence of burial teams and contact tracing, as covariates in each week.

2.2.3 Incidence and observation process
The main interface between the model and the data is the relationship between the (latent) estimated real incidence and the reported incidence. To aid model convergence, we put these two quantities on the same scale using the reporting ratio; thus we keep track of latent incidence using $\hat\iota(t) = \rho_r\, i(t)$, where $i$ is inferred true incidence in the population and $\hat\iota$ is the rescaled value that we use as a latent variable.

The rescaled latent incidence $\hat\iota$ is not restricted to integer values, which causes technical difficulties when we want to consider it as the outcome of a binomial process. Thus, we took a “hybrid” approach: we tracked latent incidence as a continuous variable, but treated observed incidence as a discrete number, with likelihood estimated based on a Beta-binomial distribution. We modeled latent incidence as follows:

$$\pi(t) \sim \mathrm{Gamma}\big(\hat S(t)\, F(t);\ k_i\big) \tag{3}$$
$$\hat\iota(t) = q(\pi(t)) \tag{4}$$
$$\hat S(t) = \hat S(t-1) - \hat\iota(t) \tag{5}$$
$$\mathrm{obs}(t) \sim \mathrm{NegBinom}\big(\hat\iota(t);\ k_r\big) \tag{6}$$
where $F(t)$ is the force of infection from (2). We model rescaled latent incidence $\hat\iota(t)$ in two steps for numerical stability: $\pi$ is the naive expectation of rescaled latent incidence, and $q$ uses a generalized exponential function with estimated shape parameter $\kappa$ to translate $\pi$ into a value between 0 and $\hat S(t) = \rho_r S(t)$, the effective susceptible population on the reporting scale (see above). Thus, the probability of infection is $P = q(P_0) = (1 + P_0/\kappa)^{-\kappa}$, where $P_0 = \pi/S$. The function $q$ approaches the exponential function when $\kappa$ is large. Process error is incorporated through the Gamma distribution in (3), and observation error is incorporated through the negative binomial distribution (6). Both of these use shape parameters fitted with broad prior distributions (Table 3).
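As a numerical illustration of the quantities defined in Sections 2.2.2 and 2.2.3, the Python sketch below evaluates the base force of infection (1), the realized force of infection (2) and the generalized exponential function. It is not the JAGS model: the function names, the generation-interval weights `g` and all numerical values are hypothetical, and the generalized exponential is written as $(1 + P_0/\kappa)^{-\kappa}$ on the assumption that this is the intended form (it is the one that tends to an exponential for large $\kappa$).

```python
import numpy as np

def base_foi(R0, g, incidence, t):
    """Equation (1): F0(t) = R0 * sum_{tau=1..L} g(tau) * i(t - tau)."""
    total = 0.0
    for tau in range(1, len(g) + 1):
        if t - tau >= 0:                  # no incidence before the series starts
            total += g[tau - 1] * incidence[t - tau]
    return R0 * total

def realized_foi(F0, S_t, S_0, alpha, beta, x_t):
    """Equation (2): F(t) = F0(t) * (S(t)/S(0))**alpha * exp(-A_t)."""
    A_t = float(np.dot(beta, x_t))        # A_t = sum_k beta_k * x_k
    return F0 * (S_t / S_0) ** alpha * np.exp(-A_t)

def generalized_exp(P0, kappa):
    """(1 + P0/kappa)**(-kappa); tends to exp(-P0) as kappa grows."""
    return (1.0 + P0 / kappa) ** (-kappa)

# Hypothetical weekly generation-interval weights for lags 1..5 (sum to 1).
g = [0.1, 0.4, 0.3, 0.15, 0.05]
incidence = [10, 20, 40]                  # latent incidence in weeks 0, 1, 2

F0 = base_foi(R0=2.0, g=g, incidence=incidence, t=3)
# Susceptibles halved, one intervention covariate switched off (x = 0).
F = realized_foi(F0, S_t=20_000, S_0=40_000, alpha=2.0, beta=[0.1], x_t=[0.0])
```

With these inputs the weighted sum in (1) is $0.1 \times 40 + 0.4 \times 20 + 0.3 \times 10 = 15$, so `F0` is 30, and the depletion factor $(1/2)^2$ reduces it to 7.5.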
2.3 Model selection

Both models were developed during the Challenge. SHERIF being the first ready to fit and forecast, our contributed forecasts were made with this model for the first two prediction time points. We did not have a formal method to select between models for later prediction time points. Model selection was done based on visually assessing the two models’ performance in previous rounds and on our opinions about the plausibility of their predictions.

3 Results

3.1 SHERIF
Because many parameters of the SHERIF model could not be directly inferred using the data provided by the Ebola Challenge, we decided to fix the values for those that could be reasonably supported by the literature (Table 2). We also set some parameters proportional to others in order to reduce the high dimensionality associated with fitting the SHERIF model. Furthermore, we had to give informative priors (obtained with sequential fits, as explained in the Methods section).

Figure 3 shows an example of the forecasting process. Note that our method requires a large number of simulations because the filtered ones represent only a small fraction (in our case, only about 10% of simulations were kept).

One difficulty we faced was when the incidence data showed “double bumps”, probably indicative of an effect of spatial structure where a sub-epidemic in one region has already reached its maximum incidence while a second sub-epidemic is still growing. Because we did not have the resources to develop a spatial version of SHERIF in time for this challenge, capturing a double bump was challenging. We artificially accounted for this suspected spatial structure by changing the time-varying transmission rates in order to mimic these observed incidence patterns.

The model calibration and forecast were insensitive to most of the HCW parameters, most likely because HCW cases and deaths represented a small proportion of the overall epidemic, so their contribution to the force of infection could not be clearly identified in this modelling framework.

The calibration process took about 2 hours on a small cluster of 12 CPUs at 2.1 GHz and the forecasting process (simulating forward based on fitted parameter values) took less than 10 minutes. Neither calibration nor forecasting demanded large amounts of memory.
3.2 GRE
An advantage of the GRE model is that, once the priors are set, it can fit observed data without intervention or supervision; we simply ran the MCMC algorithm generated by JAGS long enough to achieve convergence (as assessed by visual inspection of trace plots and standard diagnostics such as the Gelman-Rubin statistic [14]). A drawback of our particular implementation (due to time constraints) is that the GRE was fit only to incidence data, and only forecasted incidence. Deaths were reported for the Challenge as a fixed proportion of incidence (we used the same ratio as the one found in SHERIF). Examples of typical forecasting outputs are shown in Figure 4.

Figure 3. Example of incidence forecasting output from SHERIF. The left panel shows the 1000 simulations generated by sampling the fitted parameters from their posterior distribution (solid lines). The red open circles show the observed incidence. The green solid lines show the simulations kept for forecasting, as their value at the last observed time was within the 95% credible interval of the observation process (thick green segments). The right panel shows only the simulations kept for forecasting. The thick solid black line represents the median. The x-axis shows time both in days and weeks. The transparent blue triangles indicate peak incidence. The blue histogram illustrates the distribution of the peak incidence timing.

Figure 4. Example of incidence forecasting output from GRE. Each panel corresponds to one of the four epidemic scenarios. The open circles represent the data. Only data before the prediction time were available. Data until the next prediction point are displayed to compare with forecasts. The solid line is the median forecast, while the dashed and dotted lines are respectively the 50% and 90% credible intervals.

As with the SHERIF model, we did not have time to add simple spatial structure. We expect it would help fitting incidence curves, although it would probably increase forecast uncertainties.

The GRE model fit all of the scenarios reasonably well without any specific tuning or calibration. Memory demands were not large, and convergence was typically obtained for a single scenario with a 30-minute run on a laptop. For the contest, we ran each scenario for one to two hours; this improved convergence in some cases.
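For readers unfamiliar with the Gelman-Rubin diagnostic mentioned in this section, the sketch below computes the basic potential scale reduction factor for one parameter from several MCMC chains. It is a textbook version written in Python for illustration (the function name `gelman_rubin` and the simulated chains are hypothetical), not the diagnostic code used with JAGS during the Challenge.

```python
import numpy as np

def gelman_rubin(chains):
    """Potential scale reduction factor for an (m chains x n draws) array."""
    chains = np.asarray(chains, dtype=float)
    m, n = chains.shape
    chain_means = chains.mean(axis=1)
    B = n * chain_means.var(ddof=1)          # between-chain variance
    W = chains.var(axis=1, ddof=1).mean()    # mean within-chain variance
    var_plus = (n - 1) / n * W + B / n       # pooled estimate of posterior variance
    return float(np.sqrt(var_plus / W))

rng = np.random.default_rng(0)
mixed = rng.normal(size=(4, 2000))                         # chains agree
stuck = mixed + np.array([[0.0], [0.0], [5.0], [5.0]])     # two chains offset

r_mixed = gelman_rubin(mixed)   # close to 1 when chains sample the same distribution
r_stuck = gelman_rubin(stuck)   # well above 1 when chains disagree
```

Values close to 1 are consistent with convergence; values clearly above 1 indicate that the chains have not yet mixed and should be run longer.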
4 Discussion
For this Challenge, we used two models with distinct advantages and drawbacks.
The main advantage of the SHERIF compartmental model is its use of a structured population that can identify incidence and the impact of targeted interventions in specific communities (here, general population and HCW). But this comes at the price of having 24 parameters, for many of which no calibration data were available, forcing us to make arbitrary choices during the fitting process.
The GRE model provided a theoretical framework particularly well adapted for forecasting incidence. It has relatively few parameters and, once finished, was able to robustly make forecasts without supervision. However, we experienced implementation and numerical difficulties with the fitting process. In particular, we found it was necessary to implement a “hybrid approach” to aid MCMC convergence. Our efforts to use more powerful MCMC algorithms (e.g., Hamiltonian MCMC [15]) failed during the Challenge time frame, but we have pursued them further in other venues (Li et al., ms. in prep). The simplicity of the GRE framework means that we are optimistic about our ability to add further structure, such as explicit tracking of deaths and funerals and representing health-care workers and hospital settings in the model. However, some of these added complexities will no doubt add to the computational difficulties of fitting the model, which will require further innovations analogous to the hybrid model described above.

The approach taken with SHERIF (that is, a compartmental model with parameters estimated from reporting data) is fairly standard in the literature: the natural history of the pathogen is translated into differential equations involving relevant disease states, then a statistical inference method is chosen, usually one that can deal with a relatively high-dimensional parameter estimation problem (see for example [1, 3]). The renewal equation used in the GRE framework has historically been used for the estimation of the reproductive number of an epidemic [16] rather than forecasting because, in the simplest formulation of the renewal equation, the epidemic grows exponentially to infinity. In this regard, the GRE is relatively innovative. But in its current form, the GRE can only forecast incidence, which may be a limitation if more detailed forecasts (e.g. deaths, number of hospital beds needed) are required.

In the context of the Challenge, we found that both models suffered from the lack of spatial structure that could have provided a more principled framework to fit “double-bump” incidence patterns.

In closing, we want to caution about the differences between short- and long-term incidence forecasting. While a mathematical model can make detailed, and often reasonably correct, short-term epidemiological forecasts, it neglects much larger phenomena like knowledge transfer and behaviour changes, which can significantly affect longer-term forecasts.
References

[1] Phenyo E Lekone and Bärbel F Finkenstädt. Statistical Inference in a Stochastic Epidemic SEIR Model with Control Intervention: Ebola as a Case Study. Biometrics, 62(4):1170–1177, June 2006.
[2] N M Ferguson. The Foot-and-Mouth Epidemic in Great Britain: Pattern of Spread and Impact of Interventions. Science, 292(5519):1155–1160, April 2001.
[3] Jeffrey Shaman and Alicia Karspeck. Forecasting seasonal outbreaks of influenza. Proceedings of the National Academy of Sciences, 109(50):20425–20430, December 2012.
[4] T Toni, D Welch, N Strelkowa, A Ipsen, and M P H Stumpf. Approximate Bayesian computation scheme for parameter inference and model selection in dynamical systems. Journal of The Royal Society Interface, 6(31):187–202, February 2009.
[5] Helen J Wearing, Pejman Rohani, and Matt J Keeling. Appropriate models for the management of infectious diseases. PLoS Medicine, 2(7):e174, 2005.
[6] S. P. Blythe and R. M. Anderson. Distributed incubation and infectious periods in models of the transmission dynamics of the human immunodeficiency virus (HIV). Mathematical Medicine and Biology, 5(1):1–19, 1988.
[7] D T Gillespie. Approximate accelerated stochastic simulation of chemically reacting systems. The Journal of Chemical Physics, 2001.
[8] R Core Team. R: A Language and Environment for Statistical Computing.
[9] WHO Ebola Response Team. Ebola Virus Disease in West Africa - The First 9 Months of the Epidemic and Forward Projections. New England Journal of Medicine, page 140926130020005, September 2014.
[10] Simon N Wood. Statistical inference for noisy nonlinear ecological dynamic systems. Nature, 466(7310):1102–1104, August 2010.
[11] Mark A Beaumont. Approximate Bayesian Computation in Evolution and Ecology. Annual Review of Ecology, Evolution, and Systematics, 41(1):379–406, December 2010.
[12] David Champredon and Jonathan Dushoff. Intrinsic and realized generation intervals in infectious-disease transmission. Proceedings of the Royal Society B: Biological Sciences, 282(1821):20152026, December 2015.
[13] Martin Plummer. JAGS: A program for analysis of Bayesian graphical models using Gibbs sampling. In DSC 2003 Working Paper, pages 1–8, 2003.
[14] Andrew Gelman and Donald B Rubin. Inference from iterative simulation using multiple sequences. Statistical Science, 7(4):457–511, 1992.
[15] Radford M Neal. MCMC using Hamiltonian dynamics. In Steve Brooks, Andrew Gelman, Galin Jones, and Xiao-Li Meng, editors, Handbook of Markov Chain Monte Carlo, chapter 5. CRC Press, May 2011.
[16] WHO. Contact tracing during an outbreak of Ebola virus disease. Technical report, September 2014.
5 Appendix

Table 1. Epidemiological events in SHERIF.

Event | Demography | Rate
Infection GenPop | $S - 1$ ; $E_1 + 1$ | $(\beta_{IS} I + \beta_{FS} F + \beta_{I^w S} I^w)\, S/N$
Infection HCW | $S^w - 1$ ; $E^w_1 + 1$ | $(\beta_{IS^w} I + \beta_{FS^w} F + \beta_{I^w S^w} I^w + \beta_{HS^w} H)\, S^w/N$
Progression within $E$ | $E_k - 1$ ; $E_{k+1} + 1$ | $\sigma E_k n_E$
Progression within $E^w$ | $E^w_k - 1$ ; $E^w_{k+1} + 1$ | $\sigma E^w_k n_E$
Infection onset GenPop | $E_{n_E} - 1$ ; $I_1 + 1$ | $\sigma E_{n_E} n_E$
Infection onset HCW | $E^w_{n_E} - 1$ ; $I_1 + 1$ | $\sigma E^w_{n_E} n_E$
Progression within $I$ | $I_k - 1$ ; $I_{k+1} + 1$ | $\gamma I_k n_I$
Progression within $I^w$ | $I^w_k - 1$ ; $I^w_{k+1} + 1$ | $\gamma^w I^w_k n_I$
Hospitalization GenPop | $I_{n_I} - 1$ ; $H_1 + 1$ | $p_H \gamma_H I_{n_I} n_I$
Hospitalization HCW | $I^w_{n_I} - 1$ ; $H_1 + 1$ | $p_{Hw} \gamma^w_H I^w_{n_I} n_I$
Progression within $H$ | $H_k - 1$ ; $H_{k+1} + 1$ | $h H_k n_H$
Funeral non-hospitalized | $I_{n_I} - 1$ ; $F_1 + 1$ | $(1 - p_H)\, \delta \gamma_F I_{n_I} n_I$
Funeral non-hospitalized HCW | $I^w_{n_I} - 1$ ; $F_1 + 1$ | $(1 - p_{Hw})\, \delta \gamma_F I^w_{n_I} n_I$
Funeral hospitalized | $H_{n_H} - 1$ ; $F_1 + 1$ | $\delta_H h_F H_{n_H} n_H$
Progression within $F$ | $F_k - 1$ ; $F_{k+1} + 1$ | $f F_k n_F$
Recovery non-hosp. GenPop | $I_{n_I} - 1$ ; $R + 1$ | $(1 - \delta)(1 - p_H)\, \gamma_R I_{n_I} n_I$
Recovery non-hosp. HCW | $I^w_{n_I} - 1$ ; $R + 1$ | $(1 - \delta)(1 - p_{Hw})\, \gamma_R I^w_{n_I} n_I$
Recovery hospitalized | $H_{n_H} - 1$ ; $R + 1$ | $(1 - \delta_H)\, h_R H_{n_H} n_H$
Burial of dead | $F_{n_F} - 1$ ; $D + 1$ | $f F_{n_F} n_F$
Table 2. SHERIF model parameters definition and calibration method.

Parameter | Description | Calibration
$N$ | Effective population size | fitted, beta prior
$\beta_{IS}$ | Effective contact rate b/w $I$ and $S$ | fitted, lognormal prior
$\beta_{FS}$ | Effective contact rate b/w $F$ and $S$ | fitted, lognormal prior
$\beta_{I^w S}$ | Effective contact rate b/w $I^w$ and $S$ | fixed at $\beta_{IS}$
$\beta_{IS^w}$ | Effective contact rate b/w $I$ and $S^w$ | fixed at constant × $\beta_{IS}$
$\beta_{FS^w}$ | Effective contact rate b/w $F$ and $S^w$ | fixed at $\beta_{IF}$
$\beta_{I^w S^w}$ | Effective contact rate b/w $I^w$ and $S^w$ | fixed at $\beta_{IS}$
$\beta_{HS^w}$ | Effective contact rate b/w $H$ and $S^w$ | fitted
$n_E$ | Number of sub-compartments in latent stage $E$ | fitted if data available
$n_I$ | Number of sub-compartments in infectious stage $I$ or $I^w$ | fitted if data available
$n_H$ | Number of sub-compartments in hospitalization $H$ | fitted if data available
$n_F$ | Number of sub-compartments in funerals $F$ | fitted if data available
$1/\sigma$ | Mean duration of latency (infected but not infectious yet) | fitted, lognormal prior
$1/\gamma_H$ | Mean duration of being infectious and non-hospitalized, given that the individual will be subsequently hospitalized | 4 days
$1/\gamma^w_H$ | Mean duration of a HCW being infectious and non-hospitalized, given that the individual will be subsequently hospitalized | 4 days
$1/\gamma_F$ | Mean duration of being infectious and non-hospitalized, given that the individual will die (without hospitalization) | 8 days
$1/\gamma_R$ | Mean duration of being infectious and non-hospitalized, given that the individual will recover (without hospitalization) | 11.5 days
$\gamma$ | $\gamma = p_H \gamma_H + (1 - p_H)(\delta \gamma_F + (1 - \delta)\gamma_R)$ | n/a
$\gamma^w$ | $\gamma^w = p_{Hw} \gamma^w_H + (1 - p_{Hw})(\delta \gamma_F + (1 - \delta)\gamma_R)$ | n/a
$p_H$ | Probability an infectious individual from the general population is hospitalized | 0.4
$p_{Hw}$ | Probability an infectious HCW is hospitalized | 0.5
$\delta$ | Probability to die from infection when not hospitalized | 0.8
$\delta_H$ | Probability to die from infection when hospitalized | 0.8
$1/h_F$ | Mean duration of hospitalization, given that the individual will die from infection | 5.4 days
$1/h_R$ | Mean duration of hospitalization, given that the individual will recover from infection | 15 days
$h$ | $h = \delta_H h_F + (1 - \delta_H) h_R$ | n/a
$1/f$ | Mean duration of funerals | 2 days
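As a worked example of the composite rates defined in Table 2, the short Python sketch below plugs the fixed values from the table into the formulas for $\gamma$, $\gamma^w$ and $h$. The helper name `composite_rate` is ours, and the calculation only illustrates the arithmetic; it does not reproduce any fitted value from the Challenge.

```python
def composite_rate(p_hosp, gamma_H, delta, gamma_F, gamma_R):
    """gamma = p_H*gamma_H + (1 - p_H)*(delta*gamma_F + (1 - delta)*gamma_R)."""
    return p_hosp * gamma_H + (1 - p_hosp) * (delta * gamma_F + (1 - delta) * gamma_R)

# Fixed values from Table 2, converted from mean durations (days) to rates (per day).
gamma_H = 1 / 4.0      # infectious, later hospitalized
gamma_F = 1 / 8.0      # infectious, dies without hospitalization
gamma_R = 1 / 11.5     # infectious, recovers without hospitalization
delta, delta_H = 0.8, 0.8
h_F, h_R = 1 / 5.4, 1 / 15.0

gamma = composite_rate(0.4, gamma_H, delta, gamma_F, gamma_R)     # general population
gamma_w = composite_rate(0.5, gamma_H, delta, gamma_F, gamma_R)   # HCW (gamma_H^w = 1/4)
h = delta_H * h_F + (1 - delta_H) * h_R                           # hospital exit rate

print(round(gamma, 4), round(gamma_w, 4), round(h, 4))
print(round(1 / gamma, 2), round(1 / h, 2))   # implied mean durations in days
```

With these inputs, $\gamma$ is about 0.170 per day (a mean non-hospitalized infectious period of about 5.9 days) and $h$ is about 0.161 per day (a mean hospital stay of about 6.2 days).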
Table 3. GRE model parameters

Parameter | Description | Value
$N$ | Total population size | Fixed at 4e6
$L$ | Maximum length of the generation interval | Fixed at 5 weeks
$R_0$ | Basic reproductive number | gamma prior
$\rho$ | Effective reported susceptible proportion of the population | beta prior
$p_E$ | Parameter to apportion $\rho$ into $\rho_E$ and $\rho_r$ | beta prior
$g_p$ | Position parameter for generation interval | beta prior
$g_s$ | Shape parameter for generation interval | gamma prior
$\alpha$ | Behavioural response parameter | gamma prior
$\beta_k$ | Effectiveness of intervention $k$ | gamma prior
$k_i$ | Shape parameter for disease transmission process | gamma prior
$k_r$ | Shape parameter for disease reporting process | gamma prior