Fundamentals of Mathematical Models of Infectious Diseases and Their Application to Data Analyses

Fundamentals of Mathematical Models of Infectious Diseases and Their Application to Data Analyses

Chapter 1 Fundamentals of Mathematical Models of Infectious Diseases and Their Application to Data Analyses Masayuki Kakehashi*,1 and Shoko Kawano† *...

2MB Sizes 1 Downloads 13 Views

Chapter 1

Fundamentals of Mathematical Models of Infectious Diseases and Their Application to Data Analyses Masayuki Kakehashi*,1 and Shoko Kawano† *

Graduate School of Biomedical and Health Sciences, Hiroshima University, Hiroshima, Japan School of Nursing, Kurume University, Kurume, Japan 1 Corresponding author: e-mail: [email protected]

ABSTRACT The spread of infectious disease is an important and interesting task not only from the statistical viewpoint of data analysis but also from the practical viewpoint of prevention. This chapter presents statistical analyses based on dynamical models of infectious diseases in contrast to traditional approaches. In a typical traditional approach, data are phenomenologically analyzed applying a regression model consisting of merely observed variables, not deeply considering the mechanism lying behind. In contrast, in a dynamical model approach, we incorporate variables of the number of susceptible and infected individuals which are not available or at least very difficult to obtain. Dynamical model approach is a kind of state-space approaches which treats system variables in addition to observed variables. By setting such substantial variables of the system, we can easily consider preventive measure and calculate the effects of such measures. Dynamical system approach will provide new opportunities in the realm of data analyses together with the use of highly developed computers. Keywords: Infectious diseases, SIR model, SEIR model, Back calculation, State space, Influenza, School closure, Humidity

The prevention of infectious diseases has been one of the most important tasks in public health. There once was recognition that infectious diseases have been almost overcome. Such belief must have been much easily accepted by success in the eradication of small pox in 1977. World Health Organization (WHO) declared the epoch making victory against small pox in 1980. But,

Handbook of Statistics, Vol. 36. https://doi.org/10.1016/bs.host.2017.06.002 © 2017 Elsevier B.V. All rights reserved.

3

4

SECTION

I Introduction and Disease Modeling

soon after that, such an idea was overthrown by newly emerging infectious diseases like HIV/AIDS, SARS, etc. To solve the problems of infectious diseases successfully, it is necessary to establish global surveillance system and supporting system of executing preventive measures sharing the knowhow of infectious disease prevention. Among various preventive measures, it is very important to provide the prediction of infectious disease spread and the estimation of the effect of preventive measure based on scientific evidence. Analysis of infectious disease spread data plays a very important role to achieve these missions. This chapter aims to help to do so by providing the methods of infectious disease data analysis. Traditionally, statistical analyses have been based on phenomenological models. Most typical example is an application of linear regression, most commonly used tool in the analysis of numerical data. Regression models usually consist of merely observed variables, not deeply examining the mechanism lying behind. In statistical analyses based on dynamical models of infectious diseases presented in this chapter, we consider the mechanism of the spread and incorporate variables of the numbers of susceptible and infected individuals which are not simply observable variables. Setting such substantial variables of the system may well help us consider preventive measures and calculate the effect of such measures. In this chapter, we present basic ideas of dynamical system models of infectious diseases and their applications to actual infectious disease data. Data analyses are consisted of two parts. One is macroscopic level, treating all members of population as a whole or dividing total population into only a couple of subpopulations, so that the variables are the numbers of individuals in the total population or a couple of subgroups of the population. And the other is microscopic level, in which variables are the numbers of individuals in each class of each school. In microscopic level, models are stochastic, and the numbers of individuals are all integers. In Section 1, several important ideas of infectious disease models are introduced. In Section 2, dynamical models are applied to the analysis of observed data of total population. The analysis of subpopulations is also presented. In Section 3, we then apply the models to microscopic data. Next, in Section 4, we try to analyze the data from a spatial viewpoint. Finally, in Section 5, we summarize the benefits and further tasks of these approaches and possible development in future.

1 INTRODUCTION: FUNDAMENTALS OF INFECTIOUS DISEASE DYNAMICAL MODELS In this section, we briefly explain important concepts and basic models for the spread of infectious diseases. First, we present principles of change in numbers of biological population, referring to Malthus equation model with exponential population growth and logistic equation model where population

Mathematical Models of Infectious Diseases and Data Analyses Chapter

1

5

growth saturates at a certain level. We then provide fundamental models of infectious disease spread, SIR and SEIR models where changes in numbers of susceptible, exposed, infected, and removed individuals are described. This part also provides facts on important behavior of the models that are useful when we utilize these models in data analysis. We finally explain important concepts in infectious disease epidemiology including prevalence and incidence to avoid confusions that may happen without clear understanding.

1.1

Population Dynamics of Biological Populations

Before starting the explanation of infectious disease models, we begin with very fundamental models of population dynamics or change in the number of individuals belonging to a certain population. It has been stated that biological population tend to grow or increase in number exponentially, or in the manner of geometric series. If we denote the number of individuals of a population at time t by N(t), the simplest equation that describes the change of N(t) is: dN ¼ rN, (1) dt where r is a growth rate, i.e., the number of increase per capita per unit time. If birth rate is b and death rate (mortality rate) is d, then r can be expressed as r ¼ b  d. For the readers who are more interested in expressing the change of number by differential equations as in Eq. (1), some introductory textbooks on mathematical ecology or population biology will be helpful (Brauer and Castillo-Cha´vez, 2001; Murray, 2002; Thieme, 2003). The solution of the equation is given as follows: N ðtÞ ¼ N0 exp ðrtÞ provided that N(0) ¼ N0 (the initial condition). The equation is called Malthus equation after Malthus (1798) because the population grows exponentially as Malthus stated in his classical book on human population. In this model, per capita growth rate (dN/dt divided by N, i.e., 1/N dN/dt) is assumed to be constant. If the model is time discrete version, DN  N ðt + 1Þ  N ðtÞ ¼ rN ðtÞ and the solution is N(t) ¼ N0(1 + r)t1. The parameter r can be considered as 1 +r ¼ exp(r). If the net growth rate is negative, i.e., r ¼  j r j, then the population exhibits exponential decay following the equation: N ðtÞ ¼ N0 exp ðjr jtÞ: According to Malthus equation with r > 0, biological population increases without bound, which is unrealistic. In real world, there must be a limitation of growth due to the shortage of resources to be used for population growth. To incorporate such an effect in the model, logistic equation was proposed by Verhulst and Pearl (Thieme, 2003). The logistic equation is

6

SECTION

I Introduction and Disease Modeling

  dN N ¼ rN 1  : dt K Please note that the per capita growth rate is now linearly decreasing and becomes 0 at N ¼ K in contrast to Malthus equation where the growth rate was constant. The solution with initial value N(0) ¼ N0 is given as follows: N ðtÞ ¼

K  : K  1 exp ðrtÞ 1+ N0 

Per capita growth rate of the logistic equation is maximum when N ¼ K/2 and it is just half of r. In time discrete version,   N ðt Þ DN  N ðt + 1Þ  N ðtÞ ¼ rN ðtÞ 1  K or     r N ðt Þ N ðtÞ ¼ r0 N ðtÞ 1  0 , N ðt + 1Þ ¼ ð1 + rÞN ðtÞ 1  1+r K K 1+r K. r In these population dynamical models, every variable must remain nonnegative because they represent the number of individuals. Discrete version of logistic model may sometimes violate this requirement. To avoid such inconvenience, it is often used in the form: where r0 ¼ 1 + r and K 0 ¼

N ðt + 1Þ ¼ r0 N ðtÞ exp ðN ðtÞ=K 0 Þ ¼ N ðtÞ exp ðr 0 ð1  N ðtÞ=K 00 ÞÞ, where r0 ¼ exp(r0 ) and K00 ¼ r0 K0 . It is notable that, if N(t) is small, exp( N(t)/K0 ) ffi 1  N(t)/K0 . The final population size, or carrying capacity, is equal to K00 : N∞ ¼ lim N ðtÞ ¼ K 00 : t!∞

All these are fundamentals of mathematical ecology, so-called the field of population dynamics. These ideas can be considered as basic principles of temporal change in numbers in biological population.

1.2 Infectious Disease Spread Models, or Theoretical Epidemiology Models of infectious disease spread are the models that describe temporal change in number of individuals. In contrast to dealing with the number of total population only in typical population dynamical models, models of infectious disease spread deal with several variables representing the numbers

Mathematical Models of Infectious Diseases and Data Analyses Chapter

1

7

of individuals of several different attributes concerning with infection in the whole population. Representative attributes concerning with infection are usually susceptible (S), exposed (E), infected (I), and removed (R). In the model of infectious disease spread, we must recognize whether or not an individual is infectious. Individuals who are infectious, i.e., who are infected and able to transmit pathogen to another member of the population, are classified into the infected (I). Another important aspect is whether an individual can be infected. If an individual already has immunity to the considering infectious disease and never develop the disease, he or she must be distinguished from individuals who may possibly be infected. Individuals who have possibility to be infected are called susceptible (S). If individuals can no longer be infected, i.e., protected from infection by any means, they are classified into the removed or the recovered (with immunity) (R). If an individual is isolated to avoid further infection, he or she may belong to “R.” It is notable in most cases that an individual cannot be able to transmit infectious disease as soon as he or she gets infected, usually there is delay between being infected and being infectious. Individuals who are already infected or exposed to infection but have not yet become infectious are classified into the exposed (E). These four categories are fundamental in infectious disease spread model. In the view point of data collection, it is easy to report the number of individuals who newly exhibited symptoms of the disease. But such a case is difficult to classify into the exact class of the above four categories because showing symptoms is not equivalent to being infectious. This problem will be briefly discussed in the next subsection. There have been a lot of studies on infectious disease spread using mathematical models. The very basic form of such mathematical models is SEIR model originally proposed by Kermack and McKendrick (1927). Although their original version was age-specific model described by partial differential equations, we simply introduce here its simplified version disregarding age and described by ordinary differential equations. If the sensitivity to or transmissibility of the disease is different among individuals, it is appropriate to divide total population into subgroups. Age group is such a subgroup often adopted. The simplified version of the model is: dS ¼ l  bSI  mS, dt dE ¼ bSI  gE  mE, dt dI ¼ gE  fI  ðm + dÞI, dt and

dR ¼ fI  mR: dt

8

SECTION

I Introduction and Disease Modeling

In this model, l represents the number of births per unit time and m represents per capita mortality rate. Additional mortality due to the infectious disease is represented by d. The parameter b is a transmission rate from one infectious individual to one susceptible individual. The term bI representing the transmission rate of single susceptible is sometimes called the “force of infection.” The rest of parameters g and f are the transition rate from the exposed to the infectious (g) and that from the infectious to the removed (f ), respectively. For simplicity, in the application of short-term analysis, we can reasonably put m ¼ 0, l ¼ 0, and d ¼ 0. This implies no deaths and births take place during the observed period and no extra mortality rate due to the infection are experienced by the patients. In this case, the model becomes: dS ¼ bSI, dt dE ¼ bSI  gE, dt dI ¼ gE  fI, dt and

dR ¼ fI dt

with common initial value setting, S(0) ¼ N  1, I(0) ¼ 1, E(0) ¼ 0, and R(0) ¼ 0, where N is the total population size. In this specific case, please note that total population size is kept constant, i.e., S(t) + E(t) + I(t) + R(t)  N. It is frequently the case that the data are collected discrete time, e.g., daily, weekly, monthly, or annually. In such cases, discrete version models are more convenient to be used. Time discrete SEIR model is described as follows if we disregard mortality and birth: Sðt + 1Þ ¼ SðtÞ  bSðtÞI ðtÞ ¼ SðtÞð1  bI ðtÞÞ, Eðt + 1Þ ¼ EðtÞ + bSðtÞI ðtÞ  gEðtÞ ¼ bSðtÞI ðtÞ + ð1  gÞEðtÞ, I ðt + 1Þ ¼ I ðtÞ + gEðtÞ  fIðtÞ ¼ gEðtÞ + ð1  f ÞI ðtÞ, and Rðt + 1Þ ¼ RðtÞ + fI ðtÞ: SIR model is then: Sðt + 1Þ ¼ SðtÞ  bSðtÞI ðtÞ ¼ SðtÞð1  bI ðtÞÞ, I ðt + 1Þ ¼ I ðtÞ + bSðtÞI ðtÞ  fI ðtÞ ¼ bSðtÞI ðtÞ + ð1  f ÞI ðtÞ, and Rðt + 1Þ ¼ RðtÞ + fI ðtÞ:

Mathematical Models of Infectious Diseases and Data Analyses Chapter

1

9

In the simplest SIR model without birth and mortality, there are two parameters, b and f. The parameter b is the rate of transmission from one infectious individual per unit time. The other parameter f is the rate of recovery from the infectious disease, losing infectivity. The rate is assumed to be constant and the inverse of the value 1/f represents the mean duration of being infectious. Once the parameter values are fixed, and initial values of the variables are assumed, we can calculate the values of variables at the next time unit from the values of variables at present time. This is how simulation is carried out. By changing these parameters and carrying out simulation, we can see how the parameters affect the results of the simulation, or the way how the infectious disease spreads. We have illustrated how the results of simulation are changed when parameter values are set differently. First we see the effect of parameter b when the value of f is kept constant. Fig. 1 shows temporal change of S(upper left), I(upper right), and R(below). As is clearly shown, the peak of incidence (the graph at the upper right) gradually becomes higher and earlier as the value of b becomes larger. The final level of S(left) becomes smaller and that of R(below) becomes larger as the value of b becomes larger. We then see the effect of parameter f when the value of b is kept constant. As is clearly shown, the peak of incidence (the graph at the upper right) gradually becomes higher and earlier as the value of f becomes smaller. The final level of S(upper left) becomes smaller and that of R(below) becomes larger as the value of f becomes smaller (Fig. 2). We finally see the effect of parameters b and f when the ratio of b and f( ¼ b/f ) is kept constant. The result is a little different in this case. The peak of I (incidence) 1000

S 50,000

800

40,000 30,000

600

20,000

400

10,000

200

0

50

100

150

50,000

200

t (day)

0

50

100

150

200

t (day)

R

40,000 30,000 20,000 10,000

0

50

100

150

200

t (day)

FIG. 1 Simulation with different b values: b ¼ 7.0  106 , 7.2  106 , … , 9.0  106 ; f ¼ 0.33.

10

SECTION

I Introduction and Disease Modeling

I (incidence)

S

2500

50,000

2000

40,000 30,000

1500

20,000

1000

10,000

500

0

50

100

150

t (day)

200

0

50

100

150

200

t (day)

R 50,000 40,000 30,000 20,000 10,000

0

50

100

150

200

t (day)

FIG. 2 Simulation with different f values: f ¼ 0.08 , 0.13 , 0.18 , 0.23 , … , 0.58; b ¼ 8.0  106. S

I(incidence) 1000

50,000 40,000

800

30,000

600

20,000

400

10,000

200

0

50

100

150

200

t(day) 0

50

150

200

100

150

200

t(day)

R 50,000 40,000 30,000 20,000 10,000

0

50

100

t(day)

FIG. 3 Simulation with different beta values: (b, f ) ¼ (4.0  106, 0.165) , (4.8  106, 0.198) , (5.6  106, 0.231) , … , (12.0  106, 0.495) ; b/f  8.0  106/0.33 (the ratio of b and f is kept constant).

incidence (the graph in the upper right) gradually becomes higher and earlier as the value of b becomes larger or f becomes smaller. But the final level of S(upper left) and that of R(below) remain constant independent of the values of b and f. These levels seem dependent only on the ratio of b and f( ¼ b/f ) (Fig. 3).

Mathematical Models of Infectious Diseases and Data Analyses Chapter

1

11

In summary, larger b and smaller f contribute to earlier and larger peak of incidence but the final level of outbreak is determined by the ratio b/f. Large b means that one infectious individual can transmit pathogens to many susceptible individuals and small f implies that infectious individual remain infectious for a long time. Thus it is natural that large b and small f correspond to fast spread of reaching the final level quickly while it is slow if b is small and f is large.

1.3

Important Concepts in Infectious Disease Epidemiology

As we just noticed in the previous subsection, the definition of variables consisting of the SEIR or SIR model seems clear but it is rather complicated when they are applied to the collection of data. It may well happen that an individual is infectious without showing any symptoms. Thus counting S, E, I, and R individuals cannot be simple in the real situation. The framework we view infectious disease spread in population may be described as follows: Pathogens that cause infection are possibly transmitted by a specific type of contact from one infected individual to another individual who are not yet infected if we are considering directly transmitted diseases. But it is usually not clear which contact actually transmitted pathogens successfully. Successful transmission can presumably be determined only after one actually developed disease. It is also difficult to record all possible contacts of each individual that may cause transmission. Thus it is quite difficult to obtain the number of newly infected individuals counting as soon as they got infected. Recent molecular genetic techniques can improve the situation but it may deserve a lot of efforts. So, in most cases in data collection, obtained data are reported numbers of newly diagnosed cases. As the timing when patients lose their infectivity is also difficult to know, the number of individuals who are infectious at a certain time cannot be known precisely. Moreover, the delay of report (reporting delay) and the incompleteness of report (not all cases are properly reported, often called “under-reporting”) may make the situation more complicated. Thus someone may claim whether or not using S, E, I, and R variables is really useful. But when we are planning measures against infectious disease, it is quite useful to have the image in mind that how people have contact in a specific manner with each other to transmit disease. This kind of discrepancy of observable variables and postulated state variable is encountered in many areas of science and in many actual systems. The approach to such problems is the idea of state space in time series analysis. We can estimate the values of state variables from observed data by some method. Model involving entities essentially consisting the system is helpful and required in actual situations. We will see how it is actually work in the next section in the analyses of infectious disease spread data. It is worth noticing the importance of distinguishing incidence and prevalence. The concept of incidence is the rate of newly emerged cases per person

12

SECTION

I Introduction and Disease Modeling

per time. On the contrary, the concept of prevalence is the proportion of cases in the population at a specific time. The variable I in the model corresponds to prevalence because I represents number of individuals who are infectious at each time. But data usually correspond to incidence because data are based on the reported numbers of newly emerged cases during a certain period. The term “rate” is used when the value is calculated as per time. But it is conventionally called prevalence “rate” although it is a proportion in the exact sense. Thus it is sometimes called prevalence proportion. Another important concept is cumulative incidence. If the incidence rated is summed up to time t from the beginning of outbreak, the obtained value is called cumulative incidence. The number obtained by subtracting cumulative incidence from total population size can be regarded that it represents the number of susceptible individuals. But it must be noted that this value may contain infected individuals who are not yet reported. It is also very important to discriminate incubation period and latent period. The concept of incubation period is the time from the start of infection to the beginning of exhibiting symptoms. On the contrary, the concept of latent period is the time from the start of infection before becoming infectious. So, from the viewpoint of dynamical model, fact on latent period can be more important. There are several important points when we interpret outcome of the models of infectious diseases to the facts of infection in real world. The methods that can appropriately calculate the values in the theory of infectious diseases from the observed data were described in detail in Nishiura et al. (2009).

1.4 Important Concepts From Dynamical Models of Infectious Diseases Dynamical models have a typical set of analytic tools. Applying such tools, various important principles have been found out. We briefly present these important facts in this subsection. One of the important concepts in dynamical models is the idea of equilibrium (pl. equilibria). In dynamical models, the set of all possible values of the constituent variables, i.e., {(S, I, R)|S  0 , I  0 , R  0} for SIR model, for example, is called the state space or the phase space. Thus each element (S, I, R) is called a “point.” The point at which the system “stops,” i.e., (S, I, R) such that satisfies dS/dt ¼ dI/dt ¼ dR/dt ¼ 0 for SIR model, is an equilibrium. The dynamical system stays at an equilibrium point all through the time if it starts form the equilibrium point provided that it is exposed to no disturbance (i.e., forced small changes of values in the state variables). We can determine whether an equilibrium point is stable or unstable according to the response of the equilibrium to the disturbance. The equilibrium is defined as (asymptotically) stable if it returns to the original equilibrium state

Mathematical Models of Infectious Diseases and Data Analyses Chapter

1

13

even after the disturbance. Readers who found these analyses interesting are recommended to consult on the books given in the references (Hirsch et al., 2004; Murray, 2002). The state where the system stays for a long time must be an equilibrium point. The example of equilibrium in case of SIR model with no mortality is: Equilibrium 1: (S, I, R) ¼ (N, 0, 0), and Equilibrium 2: (S, I, R) ¼ (N  R*, 0, R*). The equilibrium 1 corresponds to the state with no disease or before an outbreak and the equilibrium 2 corresponds to the state after an outbreak when the spread is terminated. Another tool is linear approximation of the system at an equilibrium point. Dynamical systems can be written dx ¼ Fðx, uÞ, dt with the initial condition x(t) ¼ x0. If we denote the vector of state variables x, the involved parameter in F by u and the initial value by x0. The solution is expressed as x ¼ ft ðu, x0 Þ:

(2)

The solution is dependent on the initial values in addition to the parameter values. We may have d ðx  x∗Þ ¼ Fðx∗Þ + DFðx∗Þðx  x∗Þ dt as an linear approximation at x ¼ x∗ of the original system. DF(x∗) is a Jacobian matrix of F(x) evaluated at x ¼ x∗. Because the equilibrium point was the point such that F(x∗) ¼ 0, we have d ðx  x∗Þ ¼ DFðx∗Þðx  x∗Þ: dt Thus the deviation from the equilibrium, x  x∗, is approximately described by linear differential equations, in which behavior of the system is completely determined by the eigenvalues of the matrix DF(x∗). If the real part of all eigenvalues is negative, the equilibrium is stable. The linearly approximated SIR model with no mortality at the equilibrium 1, we have 0 1 0 1 0 bN 0 bI ∗ bS∗ 0 @ bI ∗ bS∗  f 0 A ¼ @ 0 bN  f 0 A 0 f 0 0 f 0 because the Jacobian matrix (left-hand side) is evaluated at (S∗, I∗, R∗) ¼ (N, 0, 0). The eigenvalues are bN  f, 0, and 0. Thus, if bN  f > 0, the equilibria with no disease (i.e., before an outbreak) is unstable and the infectious

14

SECTION

I Introduction and Disease Modeling

disease begins to prevail. This condition can be transformed into the following form: bN > 1: f The numerator of the left-hand side fraction, bN, is the number of secondary infection per unit time from one case when all members of population (¼ N) are susceptible. The denominator, or more precisely 1/f, represents the duration of infectious period. Thus product of these values can be regarded as the number of secondary cases from the first case (sometimes called index case) when all population are susceptible. This value is called the basic reproduction number, defined below: R0 ¼

bN : f

If the basic reproduction number is larger than unity, the infectious disease is considered to be able to spread. This equation provides another fact that f R0 > 1 , N > : b This implies that host density must exceed certain level for the infectious disease to be able to spread. This critical density is called threshold density. Another important principle is obtained if we consider the proportion of the coverage of vaccination. If the proportion p of total population has immunity against the infectious disease, then the following condition suggests that the infectious disease cannot spread: bð1  pÞN ¼ ð1  pÞR0 < 1: f This leads the idea of proportion of vaccination required to successful exclusion of the infectious disease: 1 p>1 : R0 Thus basic reproduction number can be used to calculate a rough estimation of required proportion of vaccination. It is not necessary to achieve 100% vaccination to realize exclusion of infectious disease. This concept is called “herd immunity.” When the whole population is subdivided into several subpopulations, the concept of basic reproduction number is extended to the idea of next-generation matrix. The detail will be explained in the next section. Dynamical models so far assumed that member of population have equal probability of contact to another member of the population, which is unrealistic in the actual infectious diseases. Recently, such heterogeneous relationship among members of the population is analyzed in detail using the idea of

Mathematical Models of Infectious Diseases and Data Analyses Chapter

1

15

network. Some researches pointed out the importance of individual who may have enormously large number of contacts than other average member of the population. Such individual is called “super spreader.” In the analyses of infectious diseases, two types of situations must be worth distinguishing. One is an outbreak type where an infectious disease suddenly invades into a population and, after a certain level of spread is attained, it finally extincts. The other is sustainable type where an infectious disease is sustained and continues to survive at almost stationary level. As to the latter case, there are many methods for analyses including autoregression, periodgram, spectral analysis, and wavelet analysis, which we do not deal with in this chapter. Readers who are interested in these orthodox methods of time series analysis are recommended to consult on Subba Rao et al. (2012). This chapter concentrates on the use of dynamical model to data analysis.

2 ANALYSES OF WHOLE POPULATION: MACROSCOPIC ANALYSES Dynamical models of infectious diseases seem much more complex than usually used statistical regression models to be used in the actual data analysis. In spite of this impression, we would like to show how it is useful and leads to more rich understanding. As explained in the previous section, we concentrate upon the analyses of outbreak type in this chapter. Basic pattern of infectious disease outbreak can be summarized into the size of outbreak and its velocity of spread. We will find these two important characteristics of infectious disease spread can be expressed in terms of parameters of the dynamical model.

2.1

Data Description

We first describe the data we used in this chapter. Following variables were included in the data: Reported numbers of students/pupils who were diagnosed as influenza and whether the class and/or school was closed or not. Data were collected in each class of each school every day from August 1, 2009 to March 31, 2010. The date was represented by variable t, with t ¼ 1 corresponding to August 1 and t ¼ 243 corresponding to March 31 of the next year. Total number of pupils or students was 51,872 in 134 schools including several kindergartens. The number reported as influenza cases are represented by I^ij ðtÞ where school i, class j, and day t. The variable C^ij ðtÞ represents school closure. This variable is ¼1 if the class is closed and ¼0 if not closed. Fig. 4 is spiky particularly by the reason the reported numbers were smaller on Saturdays and Sundays and larger on Mondays producing weekly spike in the graph. To extract trend out of the graph, moving average is often

16

SECTION

I Introduction and Disease Modeling

I(incidence) 1200 1000 800 600 400 200

0

50

100

150

200

t(day)

FIG. 4 Daily reported numbers of cases from August 1, 2009 to March 31, 2010.

I(incidence) 600 500 400 300 200 100

0

50

100

150

200

t(day)

FIG. 5 Graph of moving average 1.

calculated. Moving average is the calculation of average at each time point using several neighboring points. Fig. 5 is the result of application of moving average method to the original data. There are still spikes with reduced height. The sharp peak observes at t ¼ 126 is because there was national holiday on Monday and large number of reports were carried over to Tuesday. If we operate moving average again to the sequence of which moving average was already once calculated, then we have more smoothed curve. If we again calculate moving average to already averaged twice, we have much more smoothed curve (Fig. 6).

Mathematical Models of Infectious Diseases and Data Analyses Chapter I(incidence) 600

I(incidence) 600

500

500

400

400

300

300

200

200

100

1

17

100 0

50

100

150

t(day)

200

0

50

100

150

200

t(day)

FIG. 6 Graphs of moving average 2 (left) and 3 (right). I(incidence)

I(incidence) 20,000

20,000

15,000

15,000

10,000

10,000

5000

5000

0

50

100

150

t(day)

200

0

50

100

150

200

t(day)

FIG. 7 Cumulative incidence of original data (raw data, left) and that of moving averaged data (right) they seem quite alike.

The graph of cumulative incidence was also presented. The spikes could have been removed to show general tendency. Smoothing can be alternatively achieved in this way (Fig. 7). We first present the results of analyses on total population in a couple of following subsections. The variables to be analyzed are I^ðtÞ, the sum of all XX I^ij ðtÞ) at day t. I^ij ðtÞs (i.e., i

2.2

j

Simple Regression Analysis

The graph of I^ðtÞ was changing very rapidly. In contrast, if we calculate cumulative I^ðtÞ or removed R^ðtÞ: R^ðtÞ ¼

t X

I^ðτÞ,

τ¼1

we have rather smooth curve, showing less variation. We can easily apply nonlinear regression model using logistic curve RðtÞ ¼

K : 1 + C exp ðrtÞ

Please note that R(t) ffi K, almost constant, for large t and R(0) ¼ K/(1 + C). It can also be noted that R(t) ffi (K/C) exp(rt), i.e., the growth is nearly

18

SECTION

I Introduction and Disease Modeling

exponential with growth rate r in the very early stage of the spread. The graph of R^ðtÞ together with S^ðtÞ ¼ N  R^ðtÞ is illustrated in Fig. 8. In Fig. 8, incidence was plotted as 50 times larger while other two variables were drawn normally. The time course of R(t) exhibits a typical pattern of growth with saturation. After we applied nonlinear regression analysis adopting logistic curve, we obtained the following result: RðtÞ ¼

21136:5 : 1 + 1574:18exp ð0:0693439tÞ

In particular, the right-hand side graph (Fig. 9) shows almost perfect coincidence with Pearson correlation coefficient r ¼ 0.9998 (Fig. 10). The double humps of incidence peaked nearly at t ¼ 85 and t ¼ 115 exhibited in the actual data (left-hand side) have vanished in the graph drawn from logistic curve fitting data. But they seem very fit as shown in Fig. 11. The peak of the logistic curve fitting data was at t ¼ 106.2 (November 14). As we have mentioned in previous section, the data of infectious disease spread are usually incidence rates. In contrast, the variable of the number of S, I, R 50,000 40,000 30,000

S

20,000

R

I × 50

10,000

0

50

100

150

t(day)

200

FIG. 8 The spread of flu in schools from August 1, 2009 to March 31, 2010. R

R 20,000

20,000

15,000

15,000 Logistic curve Raw data

10,000

Logistic curve Moving averaged

10,000 5000

5000

50

100

150

200

t(day)

50

100

150

200

t(day)

FIG. 9 Variable R, raw (left), and moving averaged (right), with fitted logistic curves.

Mathematical Models of Infectious Diseases and Data Analyses Chapter

1

19

S, I×50, R S

50,000

I×50 40,000

R

30,000 20,000 10,000

50

100

150

200

t(day)

FIG. 10 Variables S and I (incidence) calculated from R obtained by logistic curve fitting.

I(incidence) 400

300 Logistic curve 200

Moving averaged

100

50

100

150

200

t(day)

FIG. 11 Variables I (incidence) obtained by logistic curve fitting and by moving average.

infected individuals represents prevalence. To distinguish these two different concepts clearly, we write Iinc for incidence rate and Iprev for prevalence rate. They can be connected by the relationship that 1 Iprev ¼ Iinc f where f represents recovery or removal rate implying that 1/f is a duration of infectious period. For simplicity we apply SIR model to the data rather than SEIR model. The applied model is: dS ¼ bSIprev , dt dIprev ¼ bSIprev  fIprev , dt

(3)

20

SECTION

I Introduction and Disease Modeling

and dR ¼ fIprev : dt We have RðtÞ ¼

ðt

Iinc ðτÞdτ and Iinc ðtÞ ¼ f Iprev ðtÞ:

0

As explained in the previous section, there is an essential value to describe infection, i.e., the basic reproduction number, which is usually defined as the number of secondary infection from the first index case in the population in which there is no infected and immune individuals, i.e., all of population is susceptible. It is denoted by R0. Do not confuse this notation with the variable R which represents the number of removed individuals. We had R0 ¼

bN : f

According to the equation we have dS b ¼ bSIprev ¼  SIinc dt f 1 dS d b ¼ log S ¼  Iinc S dt dt f       ðt b b R0 SðtÞ ¼ Sð0Þ exp  Iinc ðτÞdτ ¼ N exp  RðtÞ ¼ N exp  RðtÞ N f 0 f (4) As t ! 0, S(t) turns to be S∞ ¼ lim t!∞ SðtÞ and R(t) to be R∞ ¼ lim t!∞ RðtÞ. We have S∞ + R∞ ¼ N or

S ∞ R∞ + ¼ 1: N N

From the equation above,

  S∞ R∞ R∞ ¼1 ¼ exp R0 : N N N

(5)

Thus, if we have the proportion of uninfected individual at the final stage, R∞ P ¼ , P is a solution of the following equation. 1  P ¼ exp( R0P). On the N contrary, if we know the value of P, then the value of the basic reproduction number R0 is calculated as R0 ¼  [log(1  P)]/P. In our case, we have P ¼ 0.407 and thus R0 ¼ 1.28. Thus we have b/f ¼ R0/N ¼ 0.00002468. In this way, we have shown that dynamical model of infectious diseases is quite useful to understand the infectious disease spread in host population.

Mathematical Models of Infectious Diseases and Data Analyses Chapter

1

21

Ratio of bs 2.0

1.5

1.0

0.5

0

50

100

150

200

t(day)

FIG. 12 Ratio of observed b to estimated b.

To examine how good is the performance of the model, we can compare actual b calculated by the observed and estimated data and theoretically obtained value of b. According to the equations of the model, there are several ways of calculating the actual and temporarily changing values of b. The simplest and straight forward one is to calculate: b¼

1 1 dS b 1 1 dS or more preferably ¼  : SðtÞ Iprev ðtÞ dt f SðtÞ Iinc ðtÞ dt

In time discrete case, this can be rewritten as b Sðt + 1Þ  SðtÞ : ¼ f SðtÞ  Iinc ðtÞ

(6)

By dividing this value by estimated b/f value, we obtain Fig. 12. If the model completely holds all over the time, it must be equal to one all over the period. The dashed line is drawn as a reference (¼1) from September 1 to December 24, the second semester, between two long school holidays, summer and winter vacations. The estimated values exhibited considerable variations and tended to smaller than the actual value. This may reasonably suggest the effect of school closure analyzed in detail in the following subsection. If we once have the value of b/f, we can calculate the values of S(t) using Eq. (4). Then, if we use these S(t) values together with observed data I^ðtÞ, we can obtain the values of b and f as the regression coefficients of following regression analysis: regression model: Y ðtÞ ¼ b1 X1 ðtÞ + b2 X2 ðtÞ, ^ ^ + 1)  I(t), ^ X1(t) ¼ I(t)S(t) and X2 ðtÞ ¼ I^ðtÞ. where Y(t) ¼ I(t

22

SECTION

I Introduction and Disease Modeling

According to Eq. (3), b1 corresponds to b and b2 to f. Newly calculated values of b and f are not necessarily the same as previously calculated b and f as b/f from R0 using Eq. (5). New ones are considered to be more precise values. Thus, if we use newly obtained b and f values to calculate new S(t) by Eq. (4), this process can be iterated and expected to obtain b and f values converged to the most precise values.

2.3 The Effect of School Closure The analysis so far disregarded the effect of school closure. What kind of change must be made in the approach so far presented? We have to consider the effect of school closure that, under school closure, students are not exposed to infection because they do not go to school nor have contacts to cause infection with their classmates. Thus the equation of the dynamics must be rewritten as dS ¼ bSO Iprev , dt dIprev ¼ bSO Iprev  fIprev , dt and dR ¼ fIprev , dt where we wrote susceptible students who attended schools as SO (the susceptible of open classes and schools). In time discrete model, Sðt + 1Þ ¼ SðtÞ  bSO ðtÞI ðtÞ ¼ SO ðtÞð1  bI ðtÞÞ + SC ðtÞ, I ðt + 1Þ ¼ I ðtÞ + bSO ðtÞI ðtÞ  fI ðtÞ, and Rðt + 1Þ ¼ RðtÞ + fI ðtÞ: The variable SC represents the number of susceptible students of closed schools and we have (Fig. 13): SðtÞ  SO ðtÞ + SC ðtÞ: If we calculate the discrepancy as in Fig. 12, we have Fig. 14. In this time, we find a trend that b is gradually increasing, strongly suggesting that there must be some other influential factor. Most probable factor seems to be humidity as is often pointed out that transition rate increases as absolute humidity decreases (Shaman and Kohn, 2009). Actually, the absolute humidity of this area in the season was as exhibited in Fig. 15.

Mathematical Models of Infectious Diseases and Data Analyses Chapter

1

23

Proportion of school closure 1.00

0.98

0.96

0.94

0

50

100

150

200

t(day)

FIG. 13 The proportion of students attending at school. Maximum approximately 8% of students were under school closure at t ¼ 90 (the end of October and beginning of November). Ratio of bs 2.0

1.5

1.0

0.5

0

50

100

150

200

t(day)

FIG. 14 Actual b values were gradually increasing as time goes on from September to December, 2009.

New parameters obtained by regression considering the effect of school closure is as follows: b ¼ 0.00000766 and f ¼ 0.300. In this case, basic reproduction number (R0) is calculated as R0 ¼ 1.32. By using these parameter values, we can calculate the situation of spread if school closure was not executed at all. The result is shown in Fig. 16. We can observe that the peak of incidence became higher and earlier than it actually was and moreover the final proportion of students became larger from 41.2% to 45.2%.

24

SECTION

I Introduction and Disease Modeling

Humidity

20

15

10

5

50

100

150

t(day)

200

FIG. 15 Humidity (raw data and moving averaged data). I(incidence)

I(incidence)

500

No school closure School closure

400

20,000 15,000

300

10,000

200

No school closure School closure

5000

100 50

100

150

200

t(day)

50

100

150

200

t(day)

FIG. 16 Estimation of the situation of spread if no school closure was installed (black curve) in comparison with the curve of observed data (gray curve).

2.4 Incorporating Exposed Phase: SEIR Model It is more realistic to assume that there is a lag time before becoming infectious from the time just infected. The duration is called latent period as explained before. In this case, regression model to be used is obtained as follows. First we must notice that dI ¼ gE  fI: dt The two variables, E and I, are not observed variables and so this equation cannot be used directly as a regression model. But the variable I can be related to I^ (the reported number of cases, the observed variable) by the relationship ^ . Thus the equation is rewritten as dI=dt ^ ¼ fgE  f I. ^ But E that I^¼ fI or I ¼ I=f still remains unable to be used. Fortunately, it is notable that   1 d I^ ^ +fI : E¼ fg dt

Mathematical Models of Infectious Diseases and Data Analyses Chapter

1

25

^ Moreover it This means E can be calculated from observed data I. follows that   dE 1 d 2 I^ dI^ + f ¼ : dt fg dt2 dt Thus we have     dE 1 d2 I^ dI^ b ^ b ^ 1 dI^ ^ +f ¼ ¼ SI  gE ¼ SI  +fI : dt fg dt2 dt f f f dt Finally, we have 2^ d I^ bg ^ fg ^ 1 d2 I^ ^ b2 I^ b3 d I : ¼ SI  ¼ b S I I 1 dt f + g f +g f + g dt2 dt2

This equation is used as a regression model because all variables are available as data except for S. But we can use S^ (apparent S that may include I) instead of S at first. Once the regression coefficients are obtained, parameters f and g are calculated as the roots of the quadratic equation, b3x2  x + b2 ¼ 0. The last parameter b is calculated as b ¼ b1(f + g)/g. The result of regression analysis is given below: Solution 1: f ¼ 1.61, g ¼ 0.32, and b ¼ 0.0000386, Solution 2: f ¼ 0.32, g ¼ 1.61, and b ¼ 0.00000759. Please note that R0 takes the common value, R0 ¼ 1.24, in both case. By successive approximation, the accuracy can be improved to: f ¼ 1.58, g ¼ 0.314, and b ¼ 0.0000388, with R0 ¼ 1.28. In the simulation, we cannot take D t ¼ 1 when g or f is larger than 1. In such a case, we must take D t ¼ 1/2 or smaller. Simulation was carried out with initial values S(1) ¼ 51,868, E(1) ¼ 2, I(1) ¼ 2, and R(1) ¼ 0 (Fig. 17).

2.5

Distributions of Latent and Infectious Periods

So far we assumed that latent and infectious periods follow exponential distribution. The exponential distribution has remarkable characteristics that removal is independent of the history of individual. The probability of developing disease is the same independent of how long time before the infection took place. That is the reason we can write the coefficient of removal is constant independent of time. Another characteristics of the exponential distribution is that it is peaked at the beginning and gradually decrease as time goes on. This idea can naturally lead to the method of back calculation that calculates number of infected in the past from reported number of cases using distribution of incubation period inversely. In this subsection, we consider the distribution of latent period keeping the distribution of infectious period to be exponential distribution.

26

SECTION

I Introduction and Disease Modeling

I

S, I, R 50,000

400

40,000

300 S

30,000

Simulation

I × 50

200

100

10,000 0

Observed data (moving averaged)

R

20,000

50

100

150

t(day)

200

50

100

150

200

t(day)

R 20,000 15,000

Simulation Observed data (moving averaged)

10,000 5000

50

100

150

200

t(day)

FIG. 17 Application of SEIR model to the observed data.

Commonly used distribution is Weibull and gamma distributions. They include exponential distribution as a special case in the family. If we denote the distribution of latent period t by G(tj a) using parameter a, we have dS ¼ bSI, dt

ðt dE ¼ bSI  Gðt  τj aÞEðτÞdτ, dt 0 ðt dI ¼ Gðt  τj aÞEðτÞdτ  fI, dt 0

and dR ¼ fI: dt with common initial value setting, S(0) ¼ N  1, I(0) ¼ 1, E(τ)  0 for τ  0, and R(0) ¼ 0, where N is the total population size. In the case where the model is time discrete version, we have Sðt + 1Þ ¼ SðtÞ  bSðtÞI ðtÞ, E1 ðt + 1Þ ¼ ð1  W1 ÞbSðtÞI ðtÞ, Ei ðt + 1Þ ¼ ð1  Wi ÞEi1 ðtÞ, Eo1 ðt + 1Þ ¼ ð1  Wo1 ÞEo2 ðtÞ,

Mathematical Models of Infectious Diseases and Data Analyses Chapter

I ðt + 1Þ ¼ I ðtÞ  fIðtÞ +

o1 X

1

27

Wj + 1 Ej ðtÞ + W1 bSðtÞI ðtÞ,

j¼1

and

Rðt + 1Þ ¼ RðtÞ + fIðtÞ:

Please note that there are variables E1, E2, …, Eo1 if we set the maximum length of latent period is o. The variable Ei represents the number of individuals who spent i days from the day when infection took place. The variable ends at Eo1 because Eo is equal to I. The parameter wi is a proportion of exposed (infected) individual who develops disease i days after the day of P exposure (infection). We have wi  0 and o i¼1 wi ¼ 1. The parameter Wi is a conditional probability that infected individual who did not develop disease during i  1 days turns to develop on the ith day: wi : Wi ¼ o X wj j¼i

It is notable that W1 ¼ w1 and Wo ¼ 1. The last equation means the exposed always move to the infected at t ¼ o and we have Eo(t)  0. If the distribution is Weibull, then the probability density function is given as follows:   m  m  t m1 t : exp  g ðt Þ ¼ y y y The parameter m is called the shape parameter and the other parameter y is the scale parameter. The mean and variance values are given by following equations:     y 1 1 ¼ yG 1 + m¼ G m m m and

 2      2 ! y 2 1  G 2mG s ¼ m m m     2 ! 2 1 2  G 1+ , ¼y G 1+ m m 2

where G() is a Gamma function. Inversely, if mean m and variance s2 is known, shape and scale parameters of Weibull distribution are calculated as follows. The shape parameter m is a solution of the equation:       2 2 s2 1 G 1+ ¼ 1+ 2 : G 1+ m m m

28

SECTION

I Introduction and Disease Modeling

Once m is known, the scale parameter y is calculated by y ¼ m/G(1 + 1/m). When m < 1, the events occurs earlier stage, whereas the events takes place over time with increasing probability when m > 1. If m ¼ 1, this distribution turns to be an exponential distribution where probability is independent of time. Another distribution frequently used as one describes latent period is Gamma. The probability density function of Gamma distribution is given as follows:  t tk1 exp  : g~ðtÞ ¼ k y GðkÞy The parameter k is called the shape parameter and the other parameter y is the scale parameter. The mean of the distribution is ky and the variance is ky2. If k ¼ 1, this distribution again turns to be an exponential distribution. When k is a positive integer, it is known that the sum of k values each value following exponential distribution with scale parameter y follows Gamma distribution with shape parameter k. In contrast, the distribution we previously assumed, the exponential distribution, with mean 1/g is given as follows: g~ðtÞ ¼ g exp ðgtÞ. The variance of the distribution is given by 1/g2. The cumulative distribution function, or the probability that the time of occurrence is less than t is given by G(t) ¼ 1  exp( gt). The model can be expressed in another form as follows: Sðt + 1Þ ¼ SðtÞ  bSðtÞI ðtÞ, I ðt + 1Þ ¼ I ðtÞ  fI ðtÞ + w1 bSðtÞI ðtÞ + w2 bSðt  1ÞI ðt  1Þ + ⋯ +wo bSðt  o + 1ÞI ðt  o + 1Þ, and

Rðt + 1Þ ¼ RðtÞ + fI ðtÞ:

This is essentially an SEIR model although it involves no E variable. If the distribution is fixed time (i.e., Dirac delta), see Kawano and Kakehashi (2015). The distribution of latent period is more serious concern in the case where latent period is long. We show an example of analysis for disease with long latent period, i.e., bovine spongiform encephalopathy (BSE) or variant Creutzfeldt-Jakob disease (vCJD). If we consider the distribution of infectious period instead of latent period, SIR model can also be formed. This example is not the case of directly transmitted infectious disease, it can provide how the distribution of time interval is used in the analysis. Following graphs are the incidence of BSE (left) of cow and that of vCJD (right) of

Mathematical Models of Infectious Diseases and Data Analyses Chapter

Number of BSE cows

1

29

Number of vCJD cases

35,000 25

30,000 25,000

20

20,000

15

15,000

10

10,000

5

5000 0

1985

1990

1995

2000

2005

Year

0

1985

1990

1995

2000

2005

Year

FIG. 18 Incidence of BSE (left) and incidence of vCJD (right) in United Kingdom (1980–2007).

human in United Kingdom. The horizontal axis represents years from 1980 to 2007 (Fig. 18). To carry out the analysis following notations are used: N(t): the population size of United Kingdom at year t, B(t): the number of cows infected by BSE in United Kingdom at year t, C(t): the number of vCJD deaths in United Kingdom at year t, e(t): the effectiveness of preventive measure against BSE in United Kingdom at year t, e(t) ¼ 1 if the measure is perfect and e(t) ¼ 0 if the measure has no effect, f(t, a): the distribution of incubation period, the probability of developing vCJD after t years from infection where a is a parameter of the distribution. Please note that this is not an infectious disease directly transmitted from human case to human susceptible. Using these notations, the expected value of the incidence of vCJD at year t is given by below: lðtÞ ¼

t X

kBðiÞeðiÞN ðiÞfðt  i + 1, aÞ ¼ kLðt, aÞ,

i¼1

where k is a constant. The method of calculating the number of infection from the number of reported cases using the distribution of incubation period inversely is called back calculation frequently used in the analyses of HIV/ AIDS. If we assume Poisson distribution, then the log likelihood function LL is: " !# tˆ Y lðtÞCðtÞ expðlðtÞÞ : LL ¼ log CðtÞ! t¼1 Consequently, we have " !# tˆ Y ðkLðt, aÞÞCðtÞ LL ¼ log expðkLðt, aÞÞ CðtÞ! t¼1

30

SECTION

¼

I Introduction and Disease Modeling

tˆ X

CðtÞ log ðkLðt, aÞÞ 

tˆ X

t¼1

¼

tˆ X

log ðCðtÞ!Þ 

tˆ X

t¼1

CðtÞ½ log ðkÞ + log ðLðt, aÞÞ 

t¼1

kLðt, aÞ

t¼1 tˆ X

log ðCðtÞ!Þ 

t¼1

tˆ X

kLðt, aÞ:

t¼1

When the data C(t) are given, we can determine the values of k and a that maximize log likelihood LL. In particular, if we calculate partial derivative of LL with respect to k and set equals to 0, we obtain: tˆ X



CðtÞ

t¼1

tˆ X

:

Lðt, aÞ

t¼1

If we assume Weibull distribution for incubation period with parameter a ¼ (y, p), we have

0 −100 −200 −300 6

0.8 0.6 Sd/mean

8 10 Mean

0.4 12 14

0.2

Log–likelihood

Log–likelihood

fði, aÞ ¼ W ðij y, pÞ  W ði  1j y,pÞ,  up  (u  0) is a distribution function of Weiwhere W ðuj y,pÞ ¼ 1  exp  y bull distribution. Three-dimensional graphs of log likelihood LL for different values of mean and variance were exhibited for the case where measure was not effective at all (left, e ¼ 0) and the case where it was completely effective (right, e ¼ 1) (Fig. 19). In the case where the preventive measure was not effective at all, mean incubation time was 8.5 years with standard deviation 1.7 years (0.2 times mean). In contrast, in the case where it was perfect, mean incubation time was 10.0 years with standard deviation 3.0 years (0.3 times mean). Comparing these two cases with the observed data of vCJD cases shown in Fig. 20, the case of no effect of ban seemed to fit more closely.

0 −100

0.8

−200 −300 6

0.6 Sd/mean

8 10 Mean

0.4 12 14

0.2

FIG. 19 Three-dimensional graph of log likelihood (LL): no ban (e ¼ 0, left) and complete ban (e ¼ 1, right).

Mathematical Models of Infectious Diseases and Data Analyses Chapter

1

31

Number of vCJD cases Observed 25 20

No ban Ban

15 10 5 0

Year 1995

2000

2005

FIG. 20 Comparison of estimated number of vCJD cases.

2.6

Multiple Subgroups and Generation Matrix

It is often observed that whole population is consisted of several groups with different characteristics with respect to infection. For example, students may well be transmitted disease more likely from students of the same class or school than those of different classes or schools. In general, people are more likely to have contact with people with the same or similar age. Here, in this section, we present an example of infectious disease spread model of population consisting of three subgroups. The subgroup 1 is consisted of elementary schools and kindergartens. Total number of students (pupils) is 28,797 (27,403 of elementary school and 1394 of kindergartens). The subgroup 2 is consisted of junior high schools and the subgroup 3 is of high schools. Total number of students is 13,346 and 9729, respectively. There were 60 elementary schools, 28 junior high schools, 13 high schools, and 33 kindergartens. The graph of I^i ði ¼ 1, 2, and3Þ and R^i ði ¼ 1, 2, and3Þ is given in Fig. 21. The graph of I^i s was smoothed by operating moving average method as before (left). The graph of R^i s was based on the raw data (right). Each subgroup reached the saturation level but the level was different as 44.3% for subgroup 1, 41.2% for subgroup 2, and 30.7% for subgroup 3. The final proportion of infected students was smaller in subgroup 3 than in subgroups 1 and 2. The model for these three subpopulations can be written as follows: Si ðt + 1Þ ¼ Si ðtÞ  Si ðtÞðbi1 I1 ðtÞ + bi2 I2 ðtÞ + bi3 I3 ðtÞÞ ¼ Si ðtÞð1  ðbi1 I1 ðtÞ + bi2 I2 ðtÞ + bi3 I3 ðtÞÞÞ, Ii ðt + 1Þ ¼ Ii ðtÞ + Si ðtÞðbi1 I1 ðtÞ + bi2 I2 ðtÞ + bi3 I3 ðtÞÞ  fi Ii ðtÞ ¼ Si ðtÞðbi1 I1 ðtÞ + bi2 I2 ðtÞ + bi3 I3 ðtÞÞ + ð1  fi ÞIi ðtÞ

32

SECTION

I Introduction and Disease Modeling

I(incidence) R

400 12,000

300 1 2 3

200

1 2 3

10,000 8000 6000

100

4000 2000 t(day) 0

50

100

150

200

50

100

150

200

t(day)

^ of three subgroups: (1) eleFIG. 21 Observed data (incidence I^ and cumulative incidence R) mentary schools and kindergartens, (2) junior high schools, and (3) high schools (incidence data were moving averaged three times).

and Ri(t + 1) ¼ Ri(t) + fiIi(t), for i ¼ 1 , 2 , and 3. Here we assumed different f for each subgroup. In this case, there are three fs (f1, f2, f3) and nine bs (b11, b12, … , b33). The next-generation matrix is the form as follows: 0 1 b11 N1 b12 N2 b13 N3 0 1 B f1 f2 f3 C B C R11 R12 R13 B b21 N1 b22 N2 b23 N3 C C @ R21 R22 R23 A ¼ B B f1 f2 f3 C B C R31 R32 R33 @ b31 N1 b32 N2 b33 N3 A f1 f2 f3 The dynamics of 0 1 0 Sb I d @ 1 A @ 1 11 I2 ¼ S2 b21 dt I3 S3 b31

the infected is written as follows: 10 1 0 10 1 f1 0 0 S1 b12 S1 b13 I1 I1 S2 b22 S2 b23 A@ I2 A  @ 0 f2 0 A@ I2 A ¼ AI  BI: S3 b32 S3 b33 I3 I3 0 0 f3

Matrix A represents increase in number and the inverse of matrix B represents duration of infectious time. Then next-generation matrix is given as: 0 1 R11 R12 R13 R0 ¼ @ R21 R22 R23 A ¼ AB1 : R31 R32 R33 We estimate the parameters by regression models below: Ii ðt + 1  Ii ðtÞÞ ¼ bi1 Si ðtÞI1 ðtÞ + bi2 Si ðtÞI2 ðtÞ + bi3 Si ðtÞI3 ðtÞ  bi4 Ii ðtÞ: We find each parameter by least square method but we postulate constraint that any b is not negative (i.e., positive or 0). In addition, the transmissive effect to elementary school pupils from high school students was unreasonably high, we put obligatory b13 ¼ 0. The results are as follows:

Mathematical Models of Infectious Diseases and Data Analyses Chapter

A

B

S

33

1

S

14,000

30,000

12,000

25,000

10,000

20,000

8000 15,000

6000 Simulation

10,000

Simulation

4000

Observed 5000

Observed

2000 0

50

100

150

200

C

t(day)

0

50

100

150

200

t(day)

D S

S

10,000

50,000

8000

40,000

6000

30,000

4000

Simulation Observed

2000

0

50

100

150

200

20,000

Simulation Observed

10,000 t(day)

0

50

100

150

200

t(day)

FIG. 22 Results of the simulation of three subpopulations: Temporal change in susceptible students of (A) elementary schools and kindergartens, (B) junior high schools, (C) high schools, and (D) the sum of three subpopulations.

0 1 0 1 0:234 f1 @ f2 A ¼ @ 0:296 A 0:391 f3 and

0

1 0 1 b11 b12 b13 10:23 0:65 0∗ @ b21 b22 b23 A ¼ @ 0:00 27:52 1:04 A  106 b31 b32 b33 0:11 0:43 45:63

*: forced to be 0. The correlation coefficients between independent variables were so large (almost nearly unity) that we must care for multiple collinearity. The results of simulation using the above parameters were given in Fig. 22. Dealing with only regression may lead to mistake but if we use knowledge together with regression it may lead us to the hidden truth.

3 STOCHASTIC MODEL OF INFECTIOUS DISEASE SPREAD: MICROSCOPIC MODEL CONSIDERING EACH CLASS Deterministic dynamical models of infectious disease spread are dealing with average numbers. So the numbers are real numbers or decimal numbers in contrast to actual number of individuals which is an integer. When the whole population is large enough and there was no inconvenience to regard the

34

SECTION

I Introduction and Disease Modeling

population size as a real number, deterministic model is suitable to be used. But, more frequently, there is a situation we must consider more small subpopulations in which their populations sizes cannot be dealt as decimals reasonably. Classes under the spread of influenza in school population are the typical example of such subpopulations. The criteria for school closure are applied to classes because the rule is that a class is closed if more than 10% of students of the class are absent. If more than two classes in the same grade of the same school is closed, all the classes of the grade are closed and moreover if more than two grades are closed then whole school is closed. If the number of infected students were treated in average, all classes would be closed synchronously although some classes may be closed while other classes may not be due to stochastic variability in the real world. In this section, we present models of infectious diseases that treat the numbers of students as integers. They are stochastic models in which the number of individuals at the next time unit is given by probability distributions on integers. More specifically, we assume the number of students who are transmitted infectious disease in a certain class follows binomial distribution Bin(S, p) with S the number of students who are susceptible in the class and p the probability that the transmission of infectious disease takes place. The number of students who remain susceptible also follows binominal distribution with probability 1  p. The probability p is estimated by maximum likelihood method as a function of the numbers of infectious students around at the moment. The method of estimation we used in the previous section, the least square method, is equivalent if the distribution of error is normal with mean 0 (i.e., a normal distribution N(0, s2), where s2 is the variance). Taking advantage of infectious disease spread data collected in small-sized groups such as school classes, using individually based stochastic model is considered to be appropriate. In spite that collected data are usually the numbers of cases (patients) on each time interval, the data required to estimate parameters in the model are those of persons just infected on each interval. To reconstruct infection data from case report data taking the effect of latent period into account, we proposed “Monte Carlo back calculation (MCBC)” approach to analyze properly the effect of school closure.

3.1 Analyses for Counted Data As we have seen in the previous section, the number of individuals is often treated as a continuous number in dynamical models of population dynamics and theoretical epidemiology. But, to be more realistic, the number must be an integer. To carry out this kind of analysis, we must found stochastic models considering probability of changing one integer to another. In this section, we apply this idea to the school flu data. In this subsection, we represent variables with the following expressions: Sij(t): the number of susceptible students of class j of school i at time t,

Mathematical Models of Infectious Diseases and Data Analyses Chapter

1

35

Iij(t) and Rij(t): those of infectious and removed students. The observed data, the numbers of cases of class j of school i at time t is represented by I^ij ðtÞ. We postulate the association that I^ij ðtÞ ¼ fIij ðtÞ. We calculate: Xt R^ij ðtÞ ¼ I^ij ðτÞ and S^ij ðtÞ ¼ Nij  R^ij ðtÞ. τ¼1

The hat (^) variables are all derived from observed data. The variable S^ij ðtÞ is called “susceptible” but may include infected students not reported by time t. For simplicity, we use S^i ðtÞ, I^i ðtÞ, and R^i ðtÞ to represent the sum of all the classes of school i. Moreover, the sums of all schools are S^ðtÞ, I^ðtÞ, and R^ðtÞ, as are the same as in the previous section. Closure of class j of school i at time t is represented by C^ij ðtÞ. The variable ^ Cij ðtÞ is 1 if the class is closed and equals to 0 if it is open. We consider between September 1 (t ¼ 32) and December 24 (t ¼ 145), the second semester, to avoid the influence of long-term vacation (up to August 31, it was summer vacation and from December 25, winter vacation started).

3.2

Modeling the Reporting Delay

One difficult problem that arises when we treat reported case data as count data are that we must involve reporting delay in the model. If we carry out smoothing method like moving average, the data become decimal data and no longer remain integers. Thus it is obligatory to build a model incorporating reporting delay process. In the model of reporting delay, we introduce new variables in addition to Sij(t) , Eij(t) , Iij(t) , and Rij(t). They are denoted without “^” because they are state variables of the system. New variables are Aij(t) and Jij(t). The variable Aij(t) represents the number of new cases of which reporting is postponed in the class j of school i at time t. If time t is a weekday, Aij(t) ¼ 0. If time t is a holiday, i.e., Saturday, Sunday, or a national holiday, Aij(t) can be positive (Aij(t) > 0). If the next day t + 1 is a weekday, the number Aij(t) is included in the number of reported cases at time t + 1, I^ij ðt + 1Þ. On the contrary, if the next day t + 1 is a holiday, the number Aij(t) is added to the number of cases of which report is postponed, Jij(t + 1). The variable Jij(t) is the number of cases in the “waiting list” to be reported at time t. It will be added to I^ij , the number of reported cases, on the first day which is not a holiday after the day t. The flow of transition is schematically illustrated in Fig. 23. How many cases are postponed on holidays? In Fig. 24, bar chart illustrates the number of reported cases summed by the day of week, Monday, Tuesday, …, Sunday. Weeks containing national holidays are excluded from the calculation. We can clearly observe dips on Saturdays and Sundays and increase on Mondays. Almost 30% of cases seem to be postponed on holidays. Thus we assumed a constant probability of reporting delay on holidays. We assumed constant rate of postponing report as 67% based on the observed

36

SECTION

I Introduction and Disease Modeling

t t–1 (weekday) (weekday)

t t–4 t–2 t–3 t–1 (weekday) (holiday) (holiday) (holiday) (weekday) I

I

I 1–a

1

1–a

a

0 Î J

J

I

I 1–a

a

1

a

Î

Î

Î

Î

J

J

J

J

(= 0)

(= 0)

FIG. 23 Scheme for reporting delay.

Frequency (days)

4000

3000

2000

1000

0 Sun

Mon

Tue

Wed

Thu

Fri

Sat

FIG. 24 Sum of reported numbers by the day of the week.

data. We also assumed average duration of latent period was 3 days, according to the results of previous analysis. Average number of reported case in each day of the week was 1908. But the reported numbers of cases on Sundays and Saturdays were 481 and 869, respectively. Thus, (481 + 869)/(1908  2) ¼ 0.354, only 1/3 of cases were reported on Saturdays and on Sundays. In other words, approximately two out of three cases were postponed reporting on Saturdays and on Sundays.

3.3 Modeling the Transition of Infectious Diseases We assume state variables of the system, Sij(t) , Eij(t) , Iij(t) , and Rij(t) with respect to infection, whereas the data available to estimate these variables is only I^ij ðtÞ. Other data, C^ij ðtÞ, are concerning to school closure, not directly

Mathematical Models of Infectious Diseases and Data Analyses Chapter

1

37

related with the status of students from the viewpoint of infection. First of all, we build up stochastic model of state transition. We denote the probability of being transmitted disease in the class j of school i at time t as pij(t). The probability pij(t) is assumed to be the function of the numbers of infected students at time t. The infectivity of infected student can be different among different types of students, e.g., students belonging to the same class, not the same class but the same school, and a different school. The force of infection is thus assumed to be the linear combination of different categories of infected students. The transition rates, bs, are specified as follows: ba: the transition rate caused by infected students belonging to different school, bs: the transition rate caused by infected students belonging to different class but the same school, Dbs: additional rate from those who belongs to different school (¼ bs  ba), bc: the transition rate caused by infected students belonging to the same class, Dbc: additional rate from those who belongs to different class of the same school (¼ bc  bs). If we denote baseline transition rate by b0, the probability is given by the following equation:   lij ðtÞ ¼ b0 + ba I ðtÞ + ð1  hðtÞÞ 1  Cij ðtÞ Dbs Ii ðtÞ + Dbc Iij ðtÞ , where Cij(t) represents school closure (¼1 if closed, ¼0 if not closed) and h(t) represents holiday (¼1 if holiday, ¼0 if not holiday). We assumed no transmission of infectious disease takes place between students belonging to the same class or school on days when school or class closure is implemented and on holidays except for the level of influence between different schools. Probability must stay between 0 and 1. To avoid values out of range, we put,  pij ðtÞ ¼ 1  exp lij ðtÞ If we assume binomial distribution with parameters Sij(t) and pij(t), then we have mean number of newly infected individuals as Sij(t)pij(t), which is the same value in the deterministic model. The log likelihood LL is given as follows: LL ¼

X145 Xi max Xj max ðiÞ

t¼32

i¼1

j¼1

Sij ðt + 1ÞlðtÞ + Eij ðt + 1Þ log ð1  exp ðlðtÞÞÞ :

Please note that among Sij(t) students, Sij(t + 1) remain susceptible with probability 1  pij(t) ¼ exp( lij(t)) and become infected to turn to Eij(t + 1) with probability pij(t) ¼ 1  exp( lij(t)). The whole flow of transition is illustrated in Fig. 25. The double lined arrows represent the flows on weekdays and the broken lined arrows on holidays.

38

SECTION

I Introduction and Disease Modeling

Class j of school i Susceptible

Day: t exp (–lij (t))

Sij (t)

Day: t +1 Sij (t + 1)

1–exp (–lij (t)) Exposed (latent)

Eij (t)

1–u

Eij (t + 1)

u

Infected (infectious)

a

Case to be reported (new)

Iij (t + 1)

Iij (t) 1–a

Aij (t)

Aij (t + 1)

Case to be reported (cumulative)

Jij (t)

Jij (t + 1)

Case (reported)

Îji (t)

Îji (t + 1)

: Transition if t + 1 is a weekday

: Transition if t +1 is a holiday

FIG. 25 Schematic presentation of transition flows.

3.4 Reconstruction of the Values of State Variables of the System In the scheme we used in this section, we assumed that the distribution of latent period was exponential and that infected students were usually reported as cases and stopped the transmission of infectious disease. We can adopt other different type of assumption but the method of analysis will be applied independent of the detail of assumptions. The first task we have to accomplish is to reconstruct the number of infected students, Iij(t) and Jij(t), from the number of reported cases, I^ij ðtÞ. And the next is to estimate the number of exposed, Eij(t), from the infected, Iij(t). Usually I^ij ðtÞ is calculated form Iij(t) and Jij(t), and Iij(t) and Jij(t) are calculated from Eij(t), we have to calculate reversely, which is more difficult than usual calculation. We first solve the reporting delay problem. As in the scheme presented before, it is easy when the days t and t + 1 are both weekdays. It simply holds that Iij ðtÞ ¼ I^ij ðt + 1Þ and Jij(t) ¼ 0. It is a little complicated when holidays are included. We consider a series of holidays together with the weekdays of the last before and the first after the series of holidays (see the previous figure on the right-hand side). Supposed that holidays continued d days. In the figure d ¼ 3 case is exhibited. We assume that the days of t and t  d  1 are weekdays and the days of t  1, t  2, …, t  d are holidays. In this situation we have to calculate Iij(t  1), Iij(t  2), …, Iij(t  d  1), and Jij(t  1), Jij(t  2), …, Jij(t  d) from I^ij ðtÞ, I^ij ðt  1Þ, …, I^ij ðt  dÞ because Jij(t  d  1) ¼ 0. If we focus one element of I^ij ðtÞ, this must come from Iij(t  1) with probability 1 or from Iij(t  2), …, or Iij(t  d  1) with probability a. Thus all individuals

Mathematical Models of Infectious Diseases and Data Analyses Chapter

u (1 – u)2

u (1 – u)3

E

E

E 1–u

u

1–u

u

39

E

1–u

u

t–4

u (1 – u)

1

1–u

u

u

I

I

I

I

t–3

t–2

t–1

t

FIG. 26 Transition from the exposed to the infected.

belonging to I^ij ðtÞ are delivered to days t  1, t  2, …, t  d  1 with probability 1/(1 + da), a/(1 + da), …, a/(1 + da), respectively. The delivery follows multinomial distribution. If the realized values are Q1, Q2, …, Qd+1, then we set Iij(t  1)¼ Q1, Iij ðt  2Þ ¼ Q2 + I^ij ðt  1Þ, …, Iij ðt  d  1Þ ¼ Qd + 1 + I^ij ðt  d Þ, in addition to Jij(t  d) ¼ Qd+1, Jij(t  d + 1)¼ Qd + Jij(t  d), …, Jij(t  1) ¼ Q2 + Jij(t  2). As far as a large difference cannot be expected among the numbers of infected students on these days, this method is justifiable. We next consider the reconstruction of variable Eij(t) from Iij(t). We assumed constant rate of transition from the exposed to the infected, u (see Fig. 26). So, here again, one element of Iij(t) must have come from one of the days t  1, t  2, … with probability u, u(1  u), … Thus total number of Iij(t) is distributed to Eij(t  1), Eij(t  2), … with these probabilities. In practice, we can disrupt after certain maximal number where contribution to transition is negligible. As we can estimate Eij(t) and Iij(t) in this way and Rij(t) is naturally calculated as the sum of I^ij ðτÞ up to t, we can obtain Sij(t) by subtracting the sum of Eij(t), Iij(t), and Rij(t) from the total number of population, N, at each time t. Obtained values of the system variables, {{(Sij(t), Eij(t), Iij(t), Rij(t))j 1  t  tmax}} are defferent in each time of calculation, but there are many possibilities and we cannot determine only one set of values that actually was. In the situation where detailed information is lost and the estimation from partial information is inevitable, even the past must be treated as a probability distribution with uncertainty.

3.5

Analysis and Simulation, and the Validity of the Model

After a sample of the values of system variables are obtained by the method mentioned above, we can estimate parameters by maximum likelihood method.

40

SECTION

I Introduction and Disease Modeling

I(incidence) 500

Stochastic model (no reporting delay)

400

Moving averaged

300 200 100 0

40

60

80

100

120

140

t(day)

FIG. 27 Simulation of the microscopic model (Stochastic model).

We here showed the result of 100 examples although we must use much larger number of samples to obtain more precise estimation. We obtained D bc ¼ 0.00648, D bs ¼ 0.000364, ba ¼ 0.0000184, and b0 ¼ 0.000136. The analysis, transmission coefficients due to individual in the same class and due to those in the same school were separately and found the former was approximately 15 times larger than that of the latter. It was also found that the transmission coefficients due to individual in the same school were approximately 10 times larger than that in a different school. We also carried out simulation by obtained parameter values (Fig. 27). The simulation was carried out during the second semester (t ¼ 32 , … , 146) as is the same interval in which the maximum likelihood estimation was calculated. The number of reported cases was compared after moving average is calculated because the result of simulation was incidence without reporting delay. Two sequences were almost coincident except for the discrepancy seemed a little larger during the last 10 days of September and the first and the second 10 days of October (t ¼ 52 , … , 82). These differences can be attributable to the fact that the effect of school closure was not incorporated properly in the simulation. Since we can simulate class level spread of flu, school/class closure strategy can be installed in the simulation. We can now investigate the effect of preventive measures based on the situation of the class. We can also examine other hypotheses based on more realistic model. Although it takes more time to carry out simulation than deterministic simulation of total population because the total number of classes is 1655, rapidly developing technology of computers will make it no problem.

4 AN ANALYSIS OF SPATIAL DISTRIBUTION In this subsection, we analyze the effect of geographic location of schools using school as the unit of analysis. So we use Sm(t), Im(t), Rm(t), and I^m ðtÞ

Mathematical Models of Infectious Diseases and Data Analyses Chapter

1

41

5

0

–5

–10 –15

–10

–5

0

5

10

15

20

FIG. 28 Location of schools: each school is represented by a disk. The width of the disk is proportional to school size (the number of all students of the school).

as variables for school m. The distance between school m and school n is denoted by dmn. The basic hypothesis considered in this section is whether the infection in schools is more strongly influenced from schools nearby than from schools far away. In the analysis, the concept of transition kernel, which is a function of the distance between two schools, plays an important role. We begin with the explanation of the location of the schools.

4.1

Location of Schools

In Fig. 28, schools are represented by disk at the location of the map with the size of area proportional to the total number of students of the school. The effect of distance to transmission of infection has been investigated using the idea of kernel since Keeling et al. (2001) on the analyses of foot-andmouth disease in United Kingdom. The method also played an important role in evaluating preventive measures (Tildesley et al., 2006). We use here the idea of kernel to investigate the relationship between schools of different distance.

4.2

Estimating Transition Kernel

The transition kernel is a function of the distance between two schools, representing the reduction of transmission rate b due to the distance. We assume following type of function after Keeling et al. (2001): K ðd Þ ¼

1 ð1 + adÞb

,

where d is the distance between two schools and a and b are parameters. The force of infection at school m at time t is given as follows: X X 1 b0 K ðdmn ÞIn ðtÞ ¼ b0 In ðtÞ, Fm ðtÞ ¼ ð1 + admn Þb n n

42

SECTION

I Introduction and Disease Modeling

where dmn is the distance of two schools m and n. Since the value of the kernel is unity if the distance is 0, the transition rate b due to infected students from the same school is b0 because the “distance” between the same school is 0 (i.e., the value of the kernel is unity). Using this function, transmission dynamics using SIR model are described as follows: Sm ðt + 1Þ ¼ Sm ðtÞ  Sm ðtÞFm ðtÞ ¼ Sm ðtÞð1  Fm ðtÞÞ, Im ðt + 1Þ ¼ Im ðtÞ + Sm ðtÞFm ðtÞ  fIm ðtÞ ¼ Sm ðtÞFm ðtÞ + ð1  f ÞIm ðtÞ, and Rm ðt + 1Þ ¼ Rm ðtÞ + fIm ðtÞ, for all m (all schools). The equations are quite similar to the subgroup model except for we assumed the same f here. Parameter estimation is carried out in the same way as described in the previous section. The log likelihood function LL is: ! YY Sm ðt + 1Þ Im ðt + 1Þ ð1  pm ðtÞÞ pm ð t Þ LL ¼ log t

¼

XX t

m

ðSm ðt + 1Þlog ð1  pm ðtÞÞ + Im ðt + 1Þlog pm ðtÞÞ:

m

We assumed pm(t) ¼ Fm(t). To avoid negative values, we sometimes have to put, p0 m(t) ¼ 1  exp( pm(t)). The result of maximum likelihood estimation is as follows: b0 ¼ 0.0000320, a ¼ 0.0235, and b ¼ 2.08. The graph of the kernel is given in Fig. 29. We can see that the transition rate is reduced to approximately 64% if the school is 10 km away. 44.9% if 20 km away.

1.0

0.8 0.6 0.4 0.2

0

5

10 Distance (km)

15

20

FIG. 29 Obtained transition kernel: K(d) ¼ 1/(1 + ad)b with a ¼ 0.0235 and b ¼ 2.08.

Mathematical Models of Infectious Diseases and Data Analyses Chapter

4.3

1

43

Influence of the Network of Transmission

In the analysis explained in this section, we used direct straight line distance between schools. But, in general, the frequency of human contact that has large influence on the transmission of infection may depend more on the means of transportation. Even if two schools are close with each other in straight line distance, contacts between students of two schools will be less frequent if there is a barrier between the two schools, e.g., a river, etc. Thus means of transportation must be considered to measure the distance between two groups. Models of infectious diseases usually assume that the contact between individuals in the population is random for simplicity. But human contact is not random and small number of individuals may have very large number of contacts. The effect of such nonrandom contact, or the network of human relationship is treated in Pyne et al. (2015).

5

CONCLUSION

So far in this chapter, we presented an application of dynamical model to infectious disease spread data. In spite of the method usually used in traditional analyses, this approach with dynamical models provided a new insight for the analyses on the spread of infectious diseases. Using dynamical models in epidemiology has long been called theoretical epidemiology. Some may find it strange for “theory” to be emphasized very much because theory is important and indispensable in every field of science. Theoretical epidemiology must cover many fields of epidemiology other than infectious diseases. It may well happen that things too far from practice not directly used is called “theory.” Difficulty encountered when dynamical model is used in data analyses has been making this situation likely. The naming of theoretical epidemiology can partly be explained by this viewpoint. But now by the development of new methodology as we have presented, and in addition to availability of highly developed computers, this viewpoint may no longer hold. We can preferably call mathematical epidemiology rather than “theoretical epidemiology.” In our real world, the provision of predictive information is sought as soon as possible when an outbreak of infectious disease has taken place. Providing reliable prediction where available information is restricted requires accumulated knowledge on infectious disease outbreaks. In this viewpoint, the outbreak of Ebola virus disease occurred in West Africa in 2014–15 is full of lessons. Chowell et al. (2017) emphasized the importance of sharing data and information online. Insufficiency of data in the very early stage. They also noticed that the combination of various approaches could reduce the uncertainty that single model approach may have. At the same time, they pointed out that the possibility of misleading nature that applying exponential growth model easily to whole population. Using dynamical model as statistical model for data analyses is in principle very simple. Once parameter values and initial values are given, the

44

SECTION

I Introduction and Disease Modeling

solution of the dynamical system, or the predicted values are obtained. Then, we can calculate discrepancy between the predicted values and observed values if we set suitable measure of discrepancy or errors. We can find the obtained estimated parameter value as the parameter values that minimize the measure of the discrepancy. Even if this method needs a lot of amount of calculation, rapidly developing technology of computers sooner or later resolve such a problem. If it is true, what is the meaning of using dynamical model as a regression model? As residual analysis is always emphasized in common regression analysis, it is very important to examine the difference between observed data and predicted values using models or theories. To carry out this in the analysis using dynamical model as a regression model, it is inevitable to investigate the difference in the framework of dynamical system. Mathematical model is sometimes said most useful when it failed. Failure of the model implies that some important factor is not involved in the model or mistakenly connected to other factors. Thus failure is the chance to recognize such lack of knowledge and to improve the understanding of the phenomena. Some people often claim that mathematical model is too simple to be realistic and far beyond to believe the results. Such people cannot be benefitted by the role of mathematical model it can play in the understanding our world. We believe that models may usefully work, in a sense, similar to null hypothesis in statistical testing. Dynamical model approach is a kind of state-space approaches which treats system variables in addition to observed variables. By setting such substantial variables of the system, we can easily involve preventive measure and calculate the effects of such measures. But using dynamical models as statistical model in data analyses have many tasks to be resolved. Some equations theoretically valid may not useful in estimation of parameters from observed data. “Mont Carlo Back Calculation” approach was also a useful method to estimate parameters involved in stochastic infectious disease spread model. We hope readers will further develop these methods by analyzing important infectious disease data in real world. The spread of infectious diseases is influenced by a lot of factors even though it is driven by the contact of hosts. People my change their behavior that has large influence to transmission. As we stated in the first place, the analysis of the spread of infectious diseases can contribute population health and challenging task to tackle. It is much pleasure of ours that this chapter could contribute to such activities in our society.

REFERENCES Brauer, F., Castillo-Cha´vez, C., 2001. Mathematical Models in Population Biology and Epidemiology. Springer-Verlag, New York. Chowell, G., Viboud, C., Simonsen, L., Merler, S., Vespignani, A., 2017. Perspectives in model forecasts of the 2014-2015 Ebola epidemic in West Africa: lessons and the way forward. BMC Med. 15, 42.

Mathematical Models of Infectious Diseases and Data Analyses Chapter

1

45

Hirsch, M.W., Smale, S., Devaney, R.L., 2004. Differential Equations, Dynamical Systems, and an Introduction to Chaos. Elsevier Academic Press, New York. Kawano, S., Kakehashi, M., 2015. Substantial impact of school closure on the transmission dynamics during the pandemic flu H1N1-2009 in Oita, Japan. PLoS One 10, e0144839. Keeling, M.J., Woolhouse, M.E.J., Shaw, D.J., Matthews, L., Chase-Topping, M., Haydon, D.T., Cornell, S.J., Kappey, J., Wilesmith, J., Grenfell, B.T., 2001. Dynamics of the 2001 UK foot and mouth epidemics: stochastic dispersal in a heterogeneous landscape. Science 294, 813–817. Kermack, W.O., McKendrick, A.G., 1927. A contribution to the mathematical theory of epidemics. Proc. R. Soc. Lond. 115, 700–721. Malthus, T.R., 1798. An Essay on the Principle of Population. Pelican, Harmondsworth, Middlesex. republished by Penguin Books, New York, in 1970. Murray, J.D., 2002. Mathematical Biology: I. An Introduction, third ed. Springer, New York. Nishiura, H., Kakehashi, M., Inaba, H., 2009. Two critical issues in quantitative modeling of communicable diseases: inference of unobservables and dependent happening. In: Chowell, G., Hyman, J.M., Bettencourt, L.M.A., Castillo-Chavez, C. (Eds.), Mathematical and Statistical Estimation Approaches in Epidemiology. Springer, New York. Pyne, S., Vullikanti, A.K.S., Marathe, M.V., 2015. Big data applications in health sciences and epidemiology. In: Govindaraju, V., Raghavan, V.V., Rao, C.R. (Eds.), Big Data Analytics. Handbook of Statistics, vol. 33. Elsevier, Amsterdam. Shaman, J., Kohn, M., 2009. Absolute humidity modulates influenza survival, transmission, and seasonality. Proc. Natl. Acad. Sci. U.S.A. 106, 3243–3248. Subba Rao, T., Subba Rao, S., Rao, C.R. (Eds.), 2012. Time Series Analysis: Methods and Applications. Handbook of Statistics, vol. 30. Elsevier, Amsterdam. Thieme, H.R., 2003. Mathematics in Population Biology. Princeton University Press, Princeton. Tildesley, M.J., Savill, N.J., Shaw, D.J., Deardon, R., Brooks, S.P., Woolhouse, M.E.J., Grenfell, B.T., Keeling, M.J., 2006. Optimal reactive vaccination strategies for a foot-andmouth outbreak in the UK. Nature 440, 83–86.

FURTHER READING Dietz, K., 1993. The estimation of the basic reproduction number for infectious diseases. Stat. Methods Med. Res. 2, 23–41. Grenfell, B., Matthew, K., 2007. Dynamics of infectious disease. In: May, R.M., McLean, A.R. (Eds.), Theoretical Ecology Principles and Applications. Oxford University Press, Oxford.