Bézier curve parametrisation and echo state network methods for solving optimal control problems of SIR model

BioSystems xxx (xxxx) xxxx

Tibor Kmet a,⁎, Maria Kmetova b

a J. Selye University, Department of Mathematics and Informatics, Bratislavska Cesta 3322, 945 01 Komarno, Slovakia
b Constantine the Philosopher University, Faculty of Natural Sciences, Department of Mathematics, Nitra, Slovakia

ARTICLE INFO

ABSTRACT

Keywords: Bernstein–Bézier curve; Invasive weed optimisation; Particle swarm optimisation; Echo state networks; Adaptive critic design; SIR model; Optimal control problem; Numerical examples

In this work, we introduce an optimal control problem with two control variables for the SIR (susceptible-infected-recovered) epidemic model to minimise the numbers of infective and susceptible individuals. To solve the control problem, we use a direct method, Bernstein–Bézier parametrisation of the control variables with metaheuristic optimisation of the objective function, and indirect methods, adaptive critic design (ACD) with echo state networks (ESNs) and Pontryagin's maximum principle. We propose algorithms based on Bernstein–Bézier parametrisation and on ESNs to solve optimal control problems for systems governed by differential equations. We use ACD with ESNs to approximate the co-state equations. Our simulations have shown that the two proposed methods are able to solve optimal control problems.

1. Introduction

There are two general approaches to solving optimal control problems, often labelled indirect and direct methods. Indirect methods comprise numerical methods for obtaining an optimal control û(t) and optimal trajectory x̂(t) (Kirk, 1989; Pontryagin et al., 1983) using Pontryagin's maximum principle (PMP) or necessary conditions of optimality. We introduce here two indirect methods. The first indirect method is based on solving optimality systems that consist of two coupled differential equations, the state x(t) and co-state λ(t) differential equations, with opposite orientation in time, and a control equation (Borzi, 2003). This indirect method starts with an initial estimate of the co-state variable; then the state equation and the control equation are solved by a forward method in time. These state and control values are used to solve the co-state equations by a backward method in time. The second indirect method is based on the work of Werbos. In 1977, Werbos introduced an approach for approximate dynamic programming which later became known as adaptive critic design (ACD) (Werbos, 1992). The typical ACD consists of three modules: action, critic, and model (plant). The action and critic networks approximate the control u(t) and co-state λ(t) variables, respectively, forward in time. At any given time, the critic provides guidance on how to improve the control law, and the action is used to update the critic network. The algorithm that successively iterates between action and critic networks converges to the optimal solution over time. The plant dynamics are discrete



and deterministic, and they are modelled by a difference equation. The action and critic networks are chosen as neural networks, for example feed-forward or echo state networks (ESNs) (Kmet and Kmetova, 2016, 2017b). In the direct method, optimal control is seen as a standard optimisation problem: perform a search for the control function u(t) that optimises the objective functional 𝒥(u). However, as optimisation routines do not operate on infinite dimensional spaces, we use a straightforward discretisation of the continuous space (Ghosh et al., 2011). To approximate the optimal control u(t) we use a Bézier curve, which can parametrise a smooth function with a minimum number of parameters. Given a large enough number of properly selected control points, any smooth function can be approximated by a Bézier curve to arbitrary accuracy (Farin, 1993). Here, Bézier control parametrisation (BCP) (Ghosh et al., 2011) is used to determine the objective function of the optimal control problem (Kmet and Kmetova, 2017a). The purpose of this paper, which connects to and provides more information about our previous works (Kmet and Kmetova, 2017a,b), is to compare direct and indirect methods. We use an SIR model to compare the direct method, combining invasive weed optimisation (IWO) or particle swarm optimisation (PSO) with Bézier curve approximation, with two indirect methods, one of them based on Pontryagin's maximum principle and the second on necessary conditions for the optimal control problem (Kirk, 1989) with ACD and ESNs. To examine the effectiveness of these methods, we use the optimal control problem of vaccination in an epidemic model introduced by Kermack and Mckendrick (1927), which was modified by various

Corresponding author. E-mail addresses: [email protected] (T. Kmet), [email protected] (M. Kmetova). URLs: http://www.ujs.sk (T. Kmet), http://www.ukf.sk (M. Kmetova).

https://doi.org/10.1016/j.biosystems.2019.104029 Received 19 March 2018; Received in revised form 23 August 2019; Accepted 6 September 2019 0303-2647/ © 2019 Elsevier B.V. All rights reserved.



researchers (Jung et al., 2002; Kandhway and Kuri, 2014; Rodrigues et al., 2014; Zaman et al., 2008, 2009, 2017; Wang et al., 2016; Xu et al., 2017). The basic idea of this model is to use vaccination as a control variable to minimise the numbers of infected and susceptible individuals. In this paper, we use for numerical calculation a modified model with two control variables, presented in Zaman et al. (2008), to minimise the numbers of infected and susceptible individuals. The presented paper is organised as follows. In Section 2, we briefly describe necessary and sufficient conditions for the existence of optimal control for the general problem. Section 3 presents a mathematical model describing the population dynamics of infectious disease and the optimal control problem of the vaccine coverage threshold needed for disease control and eradication. We also give an analysis of the existence and stability of equilibria of this model. Section 4 describes the Bernstein–Bézier parametrisation. Section 5 elaborates the IWO and PSO algorithms. In Section 6, we describe the typical ESN architecture. Section 7 introduces the foundations of model-reference adaptive critic designs and presents the proposed adaptive critic algorithm. Section 8 includes numerical results of optimal control problems solved by the PMP, BCP-IWO, BCP-PSO and ACD-ESN methods. All numerical experiments given in this section were performed on a PC with an Intel i7 CPU and 8 GB main memory. The software used was Matlab 2016a. Finally, Section 9 concludes the paper.
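The first indirect method described above, the forward-backward sweep, can be illustrated on a scalar linear-quadratic toy problem rather than the SIR system itself. The problem, step size and relaxation factor below are illustrative choices (a Python sketch; the paper's own experiments used Matlab):

```python
import numpy as np

# Forward-backward sweep for: min J = ∫_0^1 (x^2 + u^2)/2 dt,  x' = -x + u, x(0) = 1.
# Hamiltonian H = (x^2 + u^2)/2 + λ(-x + u); co-state λ' = -∂H/∂x = -x + λ, λ(1) = 0;
# optimality ∂H/∂u = u + λ = 0, i.e. u = -λ.
N = 1000
t = np.linspace(0.0, 1.0, N + 1)
h = t[1] - t[0]
u = np.zeros(N + 1)

def forward(u):
    x = np.empty(N + 1); x[0] = 1.0
    for k in range(N):              # state solved forward in time (explicit Euler)
        x[k + 1] = x[k] + h * (-x[k] + u[k])
    return x

def backward(x):
    lam = np.empty(N + 1); lam[-1] = 0.0
    for k in range(N, 0, -1):       # co-state solved backward in time
        lam[k - 1] = lam[k] - h * (-x[k] + lam[k])
    return lam

for it in range(500):
    x = forward(u)
    lam = backward(x)
    u_new = 0.5 * u + 0.5 * (-lam)  # relaxed update of the control equation
    if np.max(np.abs(u_new - u)) < 1e-8:
        u = u_new
        break
    u = u_new

def cost(u):
    x = forward(u)
    return h * np.sum((x[:-1] ** 2 + u[:-1] ** 2) / 2)
```

On this toy problem the sweep converges in a few dozen iterations and the computed control satisfies u ≈ -λ, as required by the optimality condition.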

2. Optimal control problem

In this section, we briefly describe the necessary and sufficient conditions of optimality for the optimal control problem. We consider a nonlinear optimal control problem. Let x(t) ∈ R^m denote the state variable of a system and u(t) ∈ U ⊂ R^d the control variable on a given time interval [t0, tf]. We define an admissible control set as

𝒰 = {u(·) = (u1(·), …, ud(·)) : u(·) is measurable and u(t) ∈ U}.

In the following sections, we suppose that U = {u ∈ R^d : ui_min ≤ ui ≤ ui_max}. The optimal control problem is to minimise

𝒥(x, u) = g(x(tf)) + ∫_{t0}^{tf} f0(x(t), u(t)) dt (1)

subject to

ẋ(t) = f(x(t), u(t)), (2)
x(t0) = x0, ψ(x(tf)) = 0.

Functions g : R^m → R, f0 : R^{m+d} → R, f : R^{m+d} → R^m and ψ : R^m → R^r, 0 ≤ r ≤ m, are assumed to be sufficiently smooth (C¹) on appropriate open sets. We introduce an additional state variable

x0(t) = ∫_{t0}^{t} f0(x(s), u(s)) ds,

defined by

ẋ0(t) = f0(x(t), u(t)), x0(0) = 0.

2.1. Existence of an optimal control solution

Let 𝒟 = {(u(·), x(·, x0, u(·))) : u(·) ∈ 𝒰} denote the class of all pairs (u(·), x(·, x0, u(·))) such that x(·, x0, u(·)) is a solution of Eq. (2) on the interval [t0, tf] with control u(·) and initial condition x0. Also, we define the set of terminal condition points 𝒮 = {xf ∈ R^m : ψ(xf) = 0 and x(tf, x0, u) = xf for some (u(·), x(·, x0, u(·))) ∈ 𝒟}. Let us write a vector z̃ ∈ R^{m+1} as z̃ = (z, z_{m+1}), where z = (z1, …, zm) ∈ R^m. For each x ∈ R^m let F̃(x) = {z̃ : z = f(x, u), z_{m+1} ≥ f0(x, u), u ∈ U}. Let us denote μ = inf_{(x,u) ∈ 𝒟} 𝒥(x, u). A sufficient condition for the existence of an optimal control solution is given by the following theorem:

Theorem 1 (Fleming and Rishel, 1975). Suppose that

(a) 𝒟 is nonempty,
(b) U is closed,
(c) 𝒮 is compact and g is continuous on 𝒮,
(d) F̃(x) is convex for each x ∈ R^m,
(e) f0(x, u) ≥ h(u), where h is continuous and h(u)/∥u∥ → ∞ as ∥u∥ → ∞.

Then there exists an optimal pair (x̂, û) ∈ 𝒟 with 𝒥(x̂, û) = μ.

Corollary 1 (Fleming and Rishel, 1975). In Theorem 1 any of the assumptions (c), (d), (e) can be replaced by the corresponding (c'), (d'), (e'), where:

(c') ψ is continuous on a compact set 𝒮̃ such that there exists μ̃ > μ for which 𝒥(x, u) ≤ μ̃ implies x ∈ 𝒮̃,
(d') U is convex, f(x, u) = α(x) + β(x)u, and f0(x, ·) is convex in U,
(e') f0(x, u) ≥ c1∥u∥^β − c2, with c1 > 0, β > 1.

If U is compact, then assumption (e) holds vacuously (Fleming and Rishel, 1975).

The augmented Hamiltonian function for problem (1) is given as

H(x, λ, u) = Σ_{j=0}^{m} λj fj(x, u), (3)

where λ ∈ R^{m+1} is the adjoint variable. Let (x̂, û) be an optimal solution for Eq. (1); then the necessary condition (see Kirk, 1989; Pontryagin et al., 1983) implies that there exists a piecewise continuous and piecewise continuously differentiable adjoint function λ : [t0, tf] → R^{m+1} and a multiplier σ ∈ R^r satisfying

λ̇j(t) = −∂H/∂xj (x̂(t), λ(t), û(t)),
λj(tf) = ∂g/∂xj (x̂(tf)) + σ ∂ψ/∂xj (x̂(tf)), j = 0, …, m, (4)

0 = ∂H/∂u (x̂(t), λ(t), û(t)). (5)

For free terminal time tf, an additional condition needs to be satisfied:

H(tf) = Σ_{j=0}^{m} λj fj(x, u) |_{t=tf} = 0. (6)

For free terminal condition ψ(x) ≡ 0 and g(x) ≡ 0 we get the zero terminal condition λj(tf) = 0, j = 1, …, m, and we can set λ0 = −1.

3. SIR model

Mathematical models describing the population dynamics of infectious disease have played an important role in the better understanding of epidemiological patterns and disease control. A well-known model of infectious diseases is the classical SIR model (Brauer, 2001; Zaman et al., 2009). In this model, the whole population is divided into three compartments that describe separate groups of individuals: susceptible, who are able to contract the disease (denoted by x1(t)); infective, who are capable of transmitting the disease (denoted by x2(t)); and recovered, who are permanently immune (denoted by x3(t)). The variables xi(t) represent the number of individuals in each compartment at a particular time t, and the whole population size x4(t) is the sum of the above groups, i.e. x1(t) + x2(t) + x3(t) = x4(t). For the optimal control problem we consider the control variable u(t) = (u1(t), u2(t)) ∈ U relative to the state variables x1, …, x4, where U = {(u1, u2) : ui(t) is measurable, ui_min ≤ ui(t) ≤ ui_max} denotes the admissible control set. The effect of dispersion of the population has been taken into consideration and, in this situation, the governing equations for the population densities become a system of differential equations. The aim of the given optimal control problem is to minimise the


infective and susceptible individuals by using the possible minimal control variables u1(t) and u2(t). Susceptible individuals receive an optimal control vaccine u1(t) before the infection, and an optimal control treatment u2(t) should be provided to infected individuals. The time evolution of the population's compartments in the SIR model is described by four nonlinear differential equations:

ẋ1(t) = (b − μ r x4(t)/Kc) x4(t) − (c + (1 − μ) r x4(t)/Kc) x1(t) − β x1(t) x2(t)/x4(t) + ω u2(t) x2(t)/x4(t) − u1(t) x1(t),
ẋ2(t) = β x1(t) x2(t)/x4(t) − (c + (1 − μ) r x4(t)/Kc) x2(t) − α x2(t) − u2(t) x2(t)/x4(t),
ẋ3(t) = α x2(t) + (1 − ω) u2(t) x2(t)/x4(t) + u1(t) x1(t) − (c + (1 − μ) r x4(t)/Kc) x3(t),
ẋ4(t) = r x4(t) (1 − x4(t)/Kc), (7)

with initial conditions

xi(0) = x0i, i = 1, …, 4. (8)

Here, t denotes the time; b > 0, c > 0, α > 0 and β > 0 are the birth, death, recovery and contact rate, respectively. Further, r = b − c is the intrinsic growth rate, μ is the convex combination constant, Kc is the carrying capacity of the population, ω is the rate of moving treated infected individuals x2 to the susceptible class x1 rather than to the recovered class x3, and u = (u1, u2) is the vaccination and treatment coverage of susceptible and infected individuals, respectively. The whole population size x4(t) is given as

x4(t) = x40 Kc / (x40 + (Kc − x40) e^{−rt}),

i.e. all solutions x4(t) with x40 > 0 converge to the carrying capacity Kc.

Note 1. The non-negative orthant R⁴₊ is positively forward invariant for Eq. (7), and there exists a unique solution for all x0 ∈ R⁴₊ and t > 0, which is bounded.

Now we give an analysis of the existence and stability of equilibria of the SIR model (Zaman et al., 2008, 2017). System (7) for x4 = Kc > 0 has three equilibria:

X = (Kc, 0, 0) (trivial),
X̄ = (A1 Kc/(A1 + u1), 0, u1 A1 Kc/(A1 (A1 + u1))) (medium),
X̃ = (X̃1, X̃2, X̃3) (interior), (9)

with

X̃1 = (Kc/β)(u2/Kc + A1 + α),
X̃2 = (A1 Kc − (A1 + u1) X̃1) / ((1 − ω) u2/Kc + A1 + α),
X̃3 = Kc − X̃1 − X̃2,

where A1 = b − μr. Note that X exists only for u1 = 0, and X̄ → X for u1 → 0.

Note 2. The positive equilibrium X̃ exists for

β > (u2/Kc + A1 + α)(A1 + u1) / A1.

Let us calculate the Jacobian J(x) of the right-hand side of (7) at x4 = Kc; we get:

J(x) = [ −A1 − u1 − β x2/Kc    −β x1/Kc + ω u2/Kc           0   ]
       [ β x2/Kc                β x1/Kc − (u2/Kc + A1 + α)   0   ]
       [ u1                     α + (1 − ω) u2/Kc           −A1  ].

For the equilibrium x = X the eigenvalues of the matrix J(X) are λ1 = λ3 = −A1 and λ2 = β − (u2/Kc + A1 + α), i.e. the equilibrium X is locally asymptotically stable if

β < u2/Kc + A1 + α.

For the equilibrium x = X̄ the eigenvalues of the matrix J(X̄) are λ1 = −A1 − u1, λ2 = β A1/(A1 + u1) − (u2/Kc + A1 + α), λ3 = −A1, i.e. the equilibrium X̄ is locally asymptotically stable if

β < (u2/Kc + A1 + α)(A1 + u1) / A1

if u1 > 0. It follows that the equilibrium X̄ can be stable only if the interior equilibrium X̃ does not exist. The eigenvalues of the Jacobi matrix at X̃ are the solutions of the following characteristic equation:

(λ + A1) [ λ² + (β X̃2/Kc + A1 + u1) λ + (β X̃2/Kc)((1 − ω) u2/Kc + A1 + α) ] = 0.

By the Routh–Hurwitz stability criterion (Gantmacher, 1959) we get that the real parts of the eigenvalues are negative, i.e. the positive equilibrium X̃ is locally asymptotically stable.

3.1. Optimal control problem of vaccine coverage

We set an optimal control problem in the SIR model to control the spread of diseases. The main goal of this control problem is to investigate the optimal vaccine coverage threshold needed for disease control and eradication (Kar and Batabyal, 2011). From these facts, our optimal control problem is given by the following. Find a control u(t) to minimise the objective functional

𝒥(x, u) = ∫_{t0}^{tf} [ a1 x1(t) + a2 x2(t) + (ϵ1/2) u1(t)² + (ϵ2/2) u2(t)² ] dt (10)

subject to the state system (7), where a1, a2, ϵ1, ϵ2 are positive constants representing different weights in the objective functional 𝒥(x, u). The augmented Hamiltonian function for problem (10) with λ0 = −1 (Pontryagin et al., 1983) is given by

H(x, λ, u) = −a1 x1(t) − a2 x2(t) − (ϵ1/2) u1(t)² − (ϵ2/2) u2(t)² + Σ_{j=1}^{4} λj Fj(x, u)
           = −(ϵ2/2) u2(t)² + u2(t) (x2(t)/x4(t)) (ω λ1(t) − λ2(t) + (1 − ω) λ3(t)) − (ϵ1/2) u1(t)² + u1(t) x1(t) (λ3(t) − λ1(t)) + G(t), (11)

3

BioSystems xxx (xxxx) xxxx

T. Kmet and M. Kmetova

where λ ∈ R⁴ is the co-state variable. The vector function F = (F1, …, F4) is given by the right-hand side of Eq. (7), and G(t) consists of all terms of the Hamiltonian that are independent of the control variables u(t) = (u1(t), u2(t)).
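As a numerical illustration, the state system (7) can be integrated directly. The following Python sketch (the paper's experiments used Matlab) encodes the right-hand side of (7) as written above, with the parameter values reported in Section 8 and constant controls; with u1 = u2 = 0 the trajectory approaches the uncontrolled interior equilibrium and x4(t) approaches the carrying capacity Kc:

```python
import numpy as np

# Parameter values as reported in Section 8.
b, c, alpha, beta = 0.07, 0.0123, 0.0476, 0.21
mu, omega, Kc = 0.014, 0.35, 130.0
r = b - c

def rhs(x, u1, u2):
    # Right-hand side of system (7); u1 (vaccination) and u2 (treatment) held constant here.
    x1, x2, x3, x4 = x
    d = c + (1 - mu) * r * x4 / Kc                       # per-capita death rate
    dx1 = (b - mu * r * x4 / Kc) * x4 - d * x1 - beta * x1 * x2 / x4 \
          + omega * u2 * x2 / x4 - u1 * x1
    dx2 = beta * x1 * x2 / x4 - d * x2 - alpha * x2 - u2 * x2 / x4
    dx3 = alpha * x2 + (1 - omega) * u2 * x2 / x4 + u1 * x1 - d * x3
    dx4 = r * x4 * (1 - x4 / Kc)
    return np.array([dx1, dx2, dx3, dx4])

def simulate(x0, u1, u2, T=400.0, h=0.01):
    x = np.array(x0, dtype=float)
    for _ in range(int(T / h)):                          # explicit Euler time stepping
        x = x + h * rhs(x, u1, u2)
    return x

x = simulate([60, 50, 50, 160], u1=0.0, u2=0.0)
# The identity x1 + x2 + x3 = x4 is preserved and x4 converges to Kc.
```

Summing the first three equations reproduces the fourth, so the identity x1 + x2 + x3 = x4 is conserved exactly by the Euler scheme up to floating-point rounding; this serves as a consistency check on the reconstructed right-hand side.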

Proposition 1. There exists an optimal control variable û = (û1, û2) and a corresponding optimal state trajectory x̂ = (x̂1, …, x̂4) that minimise 𝒥(x, u).

Proof. Using Theorem 1, which gives sufficient conditions of existence, similarly as in Kandhway and Kuri (2014), Rodrigues et al. (2014), Wang et al. (2016), Xu et al. (2017) and Zaman et al. (2009), we can prove the existence of the optimal control variable û = (û1, û2). Optimal control problem (10) satisfies the following conditions:

(a) The terminal condition ψ(x) ≡ 0 is satisfied for all x(t, x0, u); therefore the set of admissible pairs (u(t), x(t, x0, u)) is nonempty.
(b) U is a compact set.
(c) Solutions x(t, x0, u) are non-negative and x1(t, x0, u) + x2(t, x0, u) + x3(t, x0, u) → Kc, i.e. the set 𝒮 is compact.
(d) F(x, u) = (F1(x, u), …, F4(x, u)) is linear in u and F0(x, u) is convex in u.
(e) U is a compact set; therefore condition (e) from Theorem 1 holds vacuously.

Let (x̂, û) be an optimal solution for (10). Then the necessary optimality condition for (10) implies (Kirk, 1989; Pontryagin et al., 1983) that there exists a piecewise continuous and piecewise continuously differentiable co-state function λ : [t0, tf] → R⁴ satisfying

λ̇(t) = −∂H/∂x (x̂(t), λ(t), û(t)), (12)
λ(tf) = 0, (13)
0 = ∂H/∂u (x̂(t), λ(t), û(t)). (14)

According to the optimality condition (14), we have

∂H/∂u1 (x̂, λ, û) = −ϵ1 û1(t) + x̂1(t)(λ3(t) − λ1(t)) = 0,
∂H/∂u2 (x̂, λ, û) = −ϵ2 û2(t) + (x̂2(t)/x̂4(t))(ω λ1(t) − λ2(t) + (1 − ω) λ3(t)) = 0.

Let us denote U1 = x̂1(t)(λ3(t) − λ1(t))/ϵ1 and U2 = (x̂2(t)/x̂4(t))(ω λ1(t) − λ2(t) + (1 − ω) λ3(t))/ϵ2. Now, using the property of the control space ui ∈ [ui_min, ui_max], we get

û1(t) = u1_min, if U1 ≤ u1_min,
        U1, if u1_min < U1 < u1_max, (15)
        u1_max, if U1 ≥ u1_max,

û2(t) = u2_min, if U2 ≤ u2_min,
        U2, if u2_min < U2 < u2_max,
        u2_max, if U2 ≥ u2_max,

or, compactly,

ûi(t) = max{min[ui_max, Ui], ui_min}, i = 1, 2. (16)

4. Bézier curve parametrisation

The first approach to Bézier curves was introduced by Paul de Casteljau in the early 1960s at the Citroën car company for modelling car shapes. The de Casteljau algorithm is based on repeated linear interpolation between pairs of control points (Farin, 1993). To create a Bézier curve of degree n via the de Casteljau algorithm, let n + 1 control points Ci ∈ R^d, i = 0, 1, …, n, and a parameter z ∈ [0, 1] be given. Set Ci^r(z) = (1 − z) Ci^{r−1}(z) + z C_{i+1}^{r−1}(z), r = 1, …, n, i = 0, …, n − r, and Ci^0(z) = Ci. Then C0^n(z) ∈ R^d is a point (with parameter value z) of the Bézier curve C^n, hence C^n(z) = C0^n(z) (Farin, 1993). The result of the algorithm is a Bézier curve of degree n expressed in terms of Bernstein polynomials of degree n:

C^n(z) = Σ_{i=0}^{n} Ci Bi^n(z),

where Bi^n(z) = (n choose i) z^i (1 − z)^{n−i} are Bernstein polynomials. One of the important properties of Bernstein polynomials is the recursion Bi^n(z) = (1 − z) Bi^{n−1}(z) + z B_{i−1}^{n−1}(z). The property that Bernstein polynomials of degree n form a partition of unity, Σ_{i=0}^{n} Bi^n(z) = 1, is necessary for using them as the coefficients of an affine combination of the control points of a Bézier curve. The following property is also important: Σ_{i=0}^{n} (i/n) Bi^n(z) = z. Bézier curves are invariant under affine parameter transformations. The curve can be defined over an arbitrary interval t0 ≤ t ≤ tf of the real line. After the introduction of local coordinates z = (t − t0)/(tf − t0), the algorithm proceeds as usual. The corresponding generalised de Casteljau algorithm has the form Ci^r(t) = ((tf − t)/(tf − t0)) Ci^{r−1}(t) + ((t − t0)/(tf − t0)) C_{i+1}^{r−1}(t). Thus the algebraic expression for the Bézier curve C^n over the interval [t0, tf] is

C^n(t) = Σ_{i=0}^{n} (n choose i) ((t − t0)/(tf − t0))^i ((tf − t)/(tf − t0))^{n−i} (ti, ci)^T, (17)

with control points Ci = (ti, ci)^T, i = 0, …, n. A fixed regular mesh is used on the t-axis to make the curve single-valued and to reduce the dimension of the optimisation vectors to n + 1.

In our model, similarly as in Ghosh et al. (2011) and Grothey and Yang (2012), we use a Bézier curve in (t, u) space to estimate the optimal control û(t). The t components of the control points are fixed as ti = t0 + hi, i = 0, …, n, where h = (tf − t0)/n. Substituting Eq. (17) into Eq. (10) with control variable u(t) = (u1(t), u2(t)), where

uj(t) = Σ_{i=0}^{n} cji (n choose i) ((t − t0)/(tf − t0))^i ((tf − t)/(tf − t0))^{n−i}, j = 1, 2,

we get the following objective function:

𝒥(c10, …, c1n, c20, …, c2n) = ∫_{t0}^{tf} [ a1 x1(t) + a2 x2(t) + (ϵ1/2) u1(t)² + (ϵ2/2) u2(t)² ] dt, (18)

where c10, …, c1n and c20, …, c2n are the control points of u1(t) and u2(t), respectively, with time control points t0, …, tn. To evaluate Eq. (18), we use a numerical approach, for example Simpson's rule, and metaheuristic methods (IWO, PSO) to find optimal values of c10, …, c1n and c20, …, c2n.

5. Optimisation algorithms

Usually, the parameters governing the optimisation problem are represented as a vector u = (u1, …, ud). The task of optimisation is to search for a parameter vector u* that minimises the objective function 𝒥(u). To obtain a minimum of the function 𝒥(u) we use bio-inspired optimisation algorithms. A detailed description of such algorithms can be found in Binitha and Sathya (2012). Note that the underlying problems may be NP-hard (Rabie et al., 2013). We choose two different metaheuristic methods inspired by nature: invasive weed optimisation (IWO) and particle swarm optimisation (PSO).
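Both metaheuristics search over the Bézier control points of Section 4. The two equivalent views of a Bézier curve used there, the Bernstein form of Eq. (17) and the de Casteljau recursion, can be sketched for scalar control points as follows (helper names are ours, a minimal illustration):

```python
from math import comb

def bezier_control(c, t, t0, tf):
    # Bernstein form: u(t) = sum_i c_i * C(n,i) * z^i * (1-z)^(n-i), z = (t-t0)/(tf-t0)
    n = len(c) - 1
    z = (t - t0) / (tf - t0)
    return sum(c[i] * comb(n, i) * z**i * (1 - z)**(n - i) for i in range(n + 1))

def de_casteljau(c, t, t0, tf):
    # Repeated linear interpolation between pairs of control points;
    # must agree with the Bernstein form at every parameter value.
    z = (t - t0) / (tf - t0)
    pts = list(map(float, c))
    for _ in range(1, len(c)):
        pts = [(1 - z) * pts[i] + z * pts[i + 1] for i in range(len(pts) - 1)]
    return pts[0]
```

The partition-of-unity property means that a curve whose control points are all equal to one is identically one, and the curve interpolates its end control points at t0 and tf.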


5.1. Invasive weed optimisation algorithm

The first method is the IWO algorithm, inspired by colonising weeds, which was first introduced by Mehrabian and Lucas (2006). Since its invention, IWO has been used in many applications (Pourjafari and Mojallali, 2012; Zhou et al., 2014, 2015), such as the design and optimisation of antenna arrays, solving optimal control problems (Ghosh et al., 2011), and training neural networks (Giri et al., 2010). In basic IWO, weeds represent approximations of the solution. A finite number of weeds is randomly generated over the search area; then every weed produces new weeds depending on its fitness. The new weeds are generated by normally distributed random numbers with mean equal to zero. The pseudo-code for the IWO algorithm is given in Algorithm 1 in detail.
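A simplified IWO sketch following this description: random initial colony, fitness-dependent seed counts, normally distributed dispersal with shrinking standard deviation, and competitive exclusion. The population sizes, seed bounds and the quadratic modulation exponent are illustrative choices, not the exact parameters of Algorithm 1:

```python
import numpy as np

rng = np.random.default_rng(0)

def iwo_minimise(f, dim, lo, hi, iters=200, pop0=10, pop_max=25,
                 smin=1, smax=5, sigma0=1.0, sigma_f=0.01):
    # initial colony scattered uniformly over the search area
    weeds = rng.uniform(lo, hi, size=(pop0, dim))
    for it in range(iters):
        fit = np.array([f(w) for w in weeds])
        order = np.argsort(fit)                     # best (lowest fitness) first
        weeds, fit = weeds[order], fit[order]
        # dispersal standard deviation decreases over the run
        sigma = sigma_f + (sigma0 - sigma_f) * ((iters - it) / iters) ** 2
        worst, best = fit[-1], fit[0]
        seeds = []
        for w, fw in zip(weeds, fit):
            # fitter weeds produce more seeds
            ratio = 1.0 if worst == best else (worst - fw) / (worst - best)
            n_seeds = int(smin + (smax - smin) * ratio)
            for _ in range(n_seeds):
                seeds.append(np.clip(w + rng.normal(0, sigma, dim), lo, hi))
        weeds = np.vstack([weeds, seeds])
        # competitive exclusion: keep only the fittest weeds
        fit_all = np.array([f(w) for w in weeds])
        weeds = weeds[np.argsort(fit_all)][:pop_max]
    return weeds[0]

best = iwo_minimise(lambda u: float(np.sum((u - 0.3) ** 2)), dim=4, lo=0.0, hi=2.0)
```

On a simple quadratic test function over the control box [0, 2]^4 the colony concentrates near the minimiser; in the BCP method the objective would instead be Eq. (18) evaluated over the Bézier control points.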

Algorithm 1. Pseudo-code for the IWO algorithm (Roshanaei et al., 2009)

5.2. Particle swarm optimisation

PSO was introduced by Kennedy and Eberhart (1995) as a stochastic population-based search method which simulates swarm behaviour. In PSO, each of M randomly initialised particles has three elements: position, velocity, and fitness. The position ui(t) of a particle represents a potential solution of the optimisation problem in generation t, the velocity vi(t) is the moving step length of position ui(t), and the fitness is the objective function value of position ui(t). When updating the velocity, each particle is attracted proportionally towards its personal historical best position bui(t) and towards the position of the global best particle gu(t) in the swarm (Beheshti and Shamsuddin, 2015; Cao et al., 2018). The velocity at time t + 1 and the new position are given by

vi(t + 1) = ω vi(t) + c1 r1 (bui(t) − ui(t)) + c2 r2 (gu(t) − ui(t)), (19)
ui(t + 1) = ui(t) + vi(t + 1), (20)

where ω is the inertia weight, r1 and r2 are random numbers, and c1 and c2 are the cognitive and social parameters, respectively. The pseudo-code for


the PSO algorithm is given in detail by Algorithm 2.
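A minimal PSO sketch implementing the updates (19) and (20); the swarm size, inertia weight and acceleration coefficients below are illustrative values, not the tuned settings of Algorithm 2:

```python
import numpy as np

rng = np.random.default_rng(1)

def pso_minimise(f, dim, lo, hi, M=30, iters=300, w=0.7, c1=1.5, c2=1.5):
    u = rng.uniform(lo, hi, size=(M, dim))          # positions
    v = np.zeros((M, dim))                          # velocities
    bu = u.copy()                                   # personal best positions
    bf = np.array([f(p) for p in u])                # personal best fitness values
    gu = bu[np.argmin(bf)].copy()                   # global best position
    for _ in range(iters):
        r1, r2 = rng.random((M, dim)), rng.random((M, dim))
        v = w * v + c1 * r1 * (bu - u) + c2 * r2 * (gu - u)   # Eq. (19)
        u = np.clip(u + v, lo, hi)                            # Eq. (20), clipped to U
        fit = np.array([f(p) for p in u])
        improved = fit < bf
        bu[improved], bf[improved] = u[improved], fit[improved]
        gu = bu[np.argmin(bf)].copy()
    return gu

best = pso_minimise(lambda u: float(np.sum((u - 0.3) ** 2)), dim=4, lo=0.0, hi=2.0)
```

Clipping the updated positions to [lo, hi] keeps every candidate inside the admissible control box, mirroring the bounds ui_min ≤ ui ≤ ui_max of the control set U.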

Algorithm 2. Pseudo-code for the PSO algorithm (Kennedy and Eberhart, 1995)

6. Echo state neural networks

In this section, we describe standard echo state networks, which are recurrent neural networks (RNNs). An RNN consists of a large, randomly connected neural network, the reservoir, which is driven by an input signal and projects to an output unit (Jaeger, 2001). The general layout of an ESN is shown in Fig. 1. It consists of K input nodes, N reservoir nodes, and L output nodes. The standard discrete-time ESN, which we denote by

y(t + 1) = F(y(t), y_out(t), v(t + 1)),

is defined as follows:

y(t + 1) = f(W y(t) + W^bf y_out(t) + W^in v(t)), (21)
y_out(t) = g(W^out [y(t); v(t)]),

where W ∈ R^{N×N} is the internal connection weight matrix of the reservoir, W^in ∈ R^{N×K} is the input matrix, W^bf ∈ R^{N×L} is the feedback matrix, W^out ∈ R^{L×(N+K)} is the output matrix, and y(t) ∈ R^{N×1}, v(t) ∈ R^{K×1}, y_out(t) ∈ R^{L×1} are the internal, input and output vectors at time t, respectively. The state activation function f = (f1, …, fN)^T is a sigmoid function (usually fi = tanh) applied component-wise with f(0) = 0, and the output activation function is g = (g1, …, gL)^T, where each gi is usually the identity or a sigmoid function. [;] denotes vector concatenation and y^T denotes the transpose of the vector y. For every sample, y(0) is initialised as 0. The dynamics of the reservoir internal vector y(t) are determined by the discrete dynamic system (21), which generates a time series depending on the input vector v(t) (Ongenae et al., 2013; Verstraeten et al., 2007). The vector y(t) is then read by the activation function g in order to make predictions using the constructed model. When training the model, only the readout function is modified; the complex dynamics modelling the behaviour of the reservoir are left unchanged. Echo state networks (ESNs) are said to have the echo state property when their states are uniquely determined by the input time series. If the largest absolute eigenvalue of the matrix W satisfies |λ|max < 1, the network state y(t) becomes asymptotically independent of the initial condition y(0) and depends only on the input history v(t), i.e. the ESN has the echo state property. ESNs (Jaeger, 2001, 2002; Jaeger and Haas, 2004) provide an architecture and a supervised learning principle. The idea is (i) to drive a random, large, fixed RNN with the input signal, thereby inducing a nonlinear response signal in each neuron within this "reservoir" network, and (ii) to combine a desired output signal by a trainable linear combination of all of these response signals. The internal weights of the underlying reservoir network are not changed by the learning; only the connections from the reservoir to the output units are learnt during the training. Supervised learning, or what statisticians know as nonparametric regression, is the problem of estimating a function Y(v) ∈ R^{L×1} given only a training set of input–output pairs {(v(t), Y(t))}_{t=1}^{p}, sampled, usually with noise, from the function. For a training set with p patterns, the optimal weight vector can be found by minimising the sum of squared errors

Err(W^out) = Σ_{t=1}^{p} Σ_{i=1}^{L} (W_i^out [y(t); v(t)] − Yi(t))²

and is given by


Fig. 1. The basic structure of an ESN. Solid arrows denote the fixed connections and dashed arrows denote the trainable connections.

W^out = (H^T H)^{−1} H^T Y or W^out = H^T (H H^T)^{−1} Y, (22)

where H is given as

H = ([y(1); v(1)], …, [y(p); v(p)]). This equation can be computed by using the Moore–Penrose generalised matrix inverse, or pseudo-inverse, of the matrix H (Penrose, 1955), which amounts to least squares regression.

7. Adaptive critic design based algorithm for solving optimal control problem

This section gives a description of model-reference adaptive critic designs, placing them within the framework of optimal control. In optimal control problems the objective is to find a control variable that minimises the objective function Eq. (10). Two well-known solution approaches are the calculus of variations, involving the Euler–Lagrange equations, and backward dynamic programming (Ferrari and Stengel, 2004). In 1977, Werbos introduced an approach for approximate dynamic programming, which later became known as adaptive critic design (ACD) (Werbos, 1992). A typical ACD consists of three modules: action, model (plant), and critic (Werbos, 1990, 1992), as shown in Fig. 2. For these modules, we need to determine how to learn and adapt the critic network, how to adapt the model network, and how to learn and adapt the action network. ACD processes, through the nonlinear function approximation capabilities of neural networks, overcome the computational complexity that plagued the dynamic programming formulation of optimal control problems (Balakrishnan et al., 2008), and the computational complexity of the ACD is O(n³) (Chen et al., 2019). The action network consists of a parametrised control law with state x as input and control u as output. The critic network approximates the value-related function, the co-state variables λ, depending on the state x. At any given time, the critic network provides guidance on how to improve the control law. In return, the action network can be used to update the critic network (Ferrari and Stengel, 2004). The algorithm that successively iterates between these two operations converges to the optimal solution over time.
The plant dynamics are discrete, time-invariant, and deterministic, and they can be modelled by a difference equation given by discretisation of (7). The action and critic networks are chosen as echo state neural networks.
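A minimal ESN sketch in Python (the paper's experiments used Matlab): a fixed random reservoir scaled to spectral radius below 1 for the echo state property, driven as in Eq. (21) without output feedback (W^bf = 0), with only the readout trained by least squares as in Eq. (22). The sizes and the one-step-ahead prediction task are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(2)
K, N, L, p = 1, 200, 1, 500            # inputs, reservoir size, outputs, samples

# random reservoir, rescaled so its spectral radius is 0.9 < 1 (echo state property)
W = rng.normal(0, 1, (N, N))
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))
W_in = rng.uniform(-0.5, 0.5, (N, K))

v = np.sin(0.1 * np.arange(p + 1)).reshape(-1, K)   # input series
Y = v[1:]                                           # teacher: one-step-ahead prediction

# drive the reservoir (Eq. (21) with W_bf = 0); collect [y(t); v(t)] rows of H
y = np.zeros(N)
H = np.empty((p, N + K))
for t in range(p):
    y = np.tanh(W @ y + W_in @ v[t])
    H[t] = np.concatenate([y, v[t]])

# train only the readout by least squares, cf. W_out = (H^T H)^{-1} H^T Y in Eq. (22)
# (here W_out is stored transposed, shape (N+K, L), so predictions are H @ W_out)
W_out, *_ = np.linalg.lstsq(H, Y, rcond=None)
err = np.sqrt(np.mean((H @ W_out - Y) ** 2))
```

The internal weights W and W_in stay fixed throughout; only W_out is fitted, which is what makes ESN training a linear regression problem rather than backpropagation through time.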

Fig. 2. The three modules of an adaptive critic design: xi, ui are the input signals to the action network; xi+1 is the output signal from the model network, which is the input signal to the critic network; and λi+1 is the output signal from the critic network.


To solve optimal control problem (10) indirectly, we use ACD and ESNs, where state x and co-state variables λ are solved forward in time. The adaptive critic neural network procedure of the optimal control problem is summarised in Algorithm 3. We modify the algorithm given in Kmet and Kmetova (2017b) for ordinary differential equations.

and Pontryagin's maximum principle with Matlab code from Bryson (1999) (Br) with 700 discretisation points. To minimise Eq. (18) we have used Bézier curve of degree n = 6 and Simpson's numerical quadrature. In the adaptive critic synthesis, the action and critic network were selected in such a way that they consist of 4 input neurons,

Algorithm 3. Algorithm to solve the optimal control problem

1000 neurons in reservoir and 2 and 4 neurons in output, respectively. Values of objective function (uˆ) and computation time for β = 0.21 and β = 4.21 are presented in Table 1. As it follows from these figures and Table 1, the numerical results of uˆ (t ) = (uˆ1 (t ), uˆ2 (t )) optimal control and optimal trajectory xˆ (t ) = (xˆ1 (t ), …, xˆ 4 (t )) for the four presented methods are comparable. The numerical simulations show that the number of susceptible and infected individuals decrease after the optimal control treatment and a small number of individuals are infected from population x4(t). Without this treatment, the number of susceptible and infected individuals is very high. The numerical calculations also show that depending on contact rate β we can get different optimal control uˆ and optimal trajectory xˆ .

8. Numerical simulation The solution of optimal control problem (10) using Bézier curve approximation and adaptive critic neural network are displayed in Figs. 3–6. We have plotted the optimal treatment strategy, susceptible and recovered individuals by considering values of parameters (Zaman et al., 2009) as b = 0.07, c = 0.0123, α = 0.0476, β = 0.21 (4.21), μ = 0.014, ω = 0.35, Kc = 130, a1 = 1, a2 = 1, ϵ1, ϵ2 = 1. We choose uimin = 0 and uimax = 2, i = 1, 2. In Figs. 3–6 we illustrate results of four computation algorithms. We have used Bézier curve parametrisation with metaheuristic methods (BCP-IWO) and (BCP-PSO), adaptive critic design with ESNs and discretisation of optimal control problem (10), 8
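The evaluation of Eq. (18) by Simpson's quadrature mentioned above can be sketched as follows, shown on test integrands rather than the SIR running cost; the 700-point mesh only loosely mirrors the Br discretisation:

```python
import numpy as np

def simpson(g, a, b, n=700):
    # composite Simpson rule on [a, b] with an even number n of subintervals;
    # weights are 1, 4, 2, 4, ..., 2, 4, 1
    t = np.linspace(a, b, n + 1)
    w = np.ones(n + 1)
    w[1:-1:2] = 4
    w[2:-1:2] = 2
    return (b - a) / (3 * n) * np.sum(w * g(t))

# In the BCP methods, g(t) would be the running cost
# a1*x1(t) + a2*x2(t) + (eps1/2)*u1(t)**2 + (eps2/2)*u2(t)**2
# evaluated along the simulated trajectory for the current Bézier control points.
value = simpson(np.sin, 0.0, np.pi)   # classical check: the integral equals 2
```

Simpson's rule is exact for cubic polynomials and has fourth-order accuracy for smooth integrands, which is ample for the smooth Bézier-parametrised running cost.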


Fig. 3. Optimal controls û1(t) and û2(t) obtained by ACD, Bézier curve simulation with IWO and PSO, and Br, with initial condition x(0) = (60, 50, 50, 160) and β = 0.21.

Fig. 4. The plot represents the population of susceptible and recovered individuals with initial condition x(0) = (60, 50, 50, 160) and β = 0.21 for ACD, Bézier curve simulation with IWO and PSO, and Br simulation.

Fig. 5. Optimal controls û1(t) and û2(t) obtained by ACD, Bézier curve simulation with IWO and PSO, and Br, with initial condition x(0) = (60, 50, 50, 160) and β = 4.21.

For β < (u2/Kc + A1 + μ)(A1 + u1)/A1, x̂ converges to the equilibrium X̄ with u1 = 1.45, u2 = 0, which is locally asymptotically stable (see Fig. 4), and the infected individuals vanish. When β > (u2/Kc + A1 + μ)(A1 + u1)/A1, there exists an interior equilibrium X̃ with u1 = 2, u2 = 0.77, and the optimal trajectory x̂ converges to its equilibrium (see Fig. 6). The critical value of β for the given set of parameters, based on Note 1, was chosen as β = (u2max/Kc + A1 + μ)(A1 + u1max)/A1 = 3.95. As follows from Figs. 3–6, our results are quite similar for the direct and indirect methods presented, but the computation time for the direct methods is very high (see Table 1).


Fig. 6. The plot represents the population of susceptible and recovered individuals with initial condition x(0) = (60, 50, 50, 160) and β = 4.21 for ACD, Bézier curve simulation with IWO and PSO, and Br simulation.

Table 1
Values of the objective function J(u) and computation time of the simulations for initial condition x0 = (60, 50, 50, 160) and β = 0.21, β = 4.21.

Algorithm   J(u), β = 0.21   J(u), β = 4.21   Time (s), β = 0.21   Time (s), β = 4.21
BCP-IWO     1269             3088             720                  850
BCP-PSO     1269             3084             380                  440
ACD         1277             3106             160                  96
Br          1268             3082             81                   160

8.1. Optimal control of equilibria

We have a special case when we choose equilibrium points as the initial condition. For the equilibrium X̄ we minimise the following function:

J(u) = (ϵ/2)(u1² + u2²) + a1 A1Kc/(A1 + u1).

It is evident that u2 = 0. For the equilibrium point X̃ we get

J(u) = (ϵ/2)(u1² + u2²) + a1 X̃1 + a2 Kc (u2 + A1 + μ − A1Kc/(A1 + u1)) / ((1 − ω)(u2/Kc + A1 + μ)).

The optimal control depending on the constant ϵ = ϵ1 = ϵ2 is shown in Fig. 7. For large ϵ and ui ∈ [uimin, uimax], the functions u1 and u2 are decreasing in ϵ.

8.2. Feedback control

Let {xs = (x1i, x2j, x3k, x4s): x1i = ih, x2j = jh, x3k = kh, x4s = x1i + x2j + x3k, 0 ≤ i, j, k ≤ N} be the set of grid points of the state variable x. We can modify Algorithm 3 to determine the optimal control law û(xs) for all points xs. Choosing a point xs and using Algorithm 3 at the fixed time t = 0, we get the corresponding control û(xs) and obtain the training set T = {(xs, ûs)}, s = 1, …, N³, illustrated in Fig. 8. We can use supervised learning and ESNs to determine the output connection weight matrix Wout from Eq. (22) and estimate the optimal control variable u(x) by Eq. (21) as a function of the state variable for all x ∈ R⁴₊.

Fig. 7. Optimal control û for equilibrium points X̄, X̃ depending on ϵ.
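The decrease of u1 with growing ϵ reported for Fig. 7 can be reproduced with a small sketch, assuming the X̄-objective J(u) = (ϵ/2)(u1² + u2²) + a1A1Kc/(A1 + u1) with u2 = 0. Here a1 = 1 and Kc = 130 are taken from the simulation parameters, while A1 = 0.05 is a hypothetical stand-in, since its value is fixed elsewhere in the paper.

```python
import numpy as np

# a1 and Kc from the simulations; A1 = 0.05 is a hypothetical stand-in
a1, Kc, A1 = 1.0, 130.0, 0.05

def J(u1, eps):
    """Objective at the equilibrium X-bar (u2 = 0 is evident):
    J(u) = eps/2 * u1^2 + a1 * A1*Kc / (A1 + u1)."""
    return 0.5 * eps * u1**2 + a1 * A1 * Kc / (A1 + u1)

# brute-force minimisation over the admissible interval [0, 2]
u_grid = np.linspace(0.0, 2.0, 2001)
u1_opt = [u_grid[np.argmin(J(u_grid, eps))] for eps in (0.1, 1.0, 10.0)]
# u1_opt is non-increasing in eps, matching the trend shown in Fig. 7
```

For small ϵ the control-effort penalty is negligible and the minimiser sits on the upper bound u1max = 2; as ϵ grows, the optimal u1 moves into the interior and decreases.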


Fig. 8. The plot represents the feedback control function û(x).

Fig. 9. The plot represents the optimal trajectory x̂ and optimal control û using the ESN with training set T and Bézier curve simulation with PSO, with initial condition x(0) = (60, 50, 50, 160) and β = 0.21.

The optimal control and the trajectory for this computation are shown in Fig. 9. We used 4096 pairs of input–output points (xs, ûs) to determine Wout during the simulation. The numerical calculation shows that the ESN approximation of û gives results similar to those of the direct and indirect methods presented.
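The feedback-law pipeline of Section 8.2 can be sketched end to end under stated assumptions: `solve_control` is a hypothetical clipped-linear stand-in for Algorithm 3 at t = 0, the reservoir update is a simplified version of the ESN state equation (21), and the readout Wout is obtained by a Penrose pseudoinverse in the spirit of Eq. (22).

```python
import itertools
import numpy as np

rng = np.random.default_rng(1)

def build_training_set(h, N, solve_control):
    """T = {(x_s, u_hat(x_s))} on the grid x1 = ih, x2 = jh, x3 = kh,
    x4 = x1 + x2 + x3, for 0 <= i, j, k <= N  ->  (N+1)^3 pairs."""
    X, U = [], []
    for i, j, k in itertools.product(range(N + 1), repeat=3):
        x = np.array([i * h, j * h, k * h, (i + j + k) * h])
        X.append(x)
        U.append(solve_control(x))        # stands in for Algorithm 3 at t = 0
    return np.array(X), np.array(U)

def reservoir_states(X, n_res=200, rho=0.9, steps=20):
    """Settled states of a random tanh reservoir driven by each grid point
    (a simplified stand-in for the ESN state equation)."""
    W_in = rng.uniform(-0.5, 0.5, (n_res, X.shape[1]))
    W = rng.uniform(-0.5, 0.5, (n_res, n_res))
    W *= rho / np.max(np.abs(np.linalg.eigvals(W)))   # echo state property
    S = np.zeros((len(X), n_res))
    for _ in range(steps):                            # iterate towards steady state
        S = np.tanh(S @ W.T + X @ W_in.T)
    return S

# hypothetical clipped-linear law standing in for Algorithm 3's u_hat(x_s)
dummy = lambda x: np.clip(np.array([0.01 * x[0], 0.005 * x[1]]), 0.0, 2.0)
X, U = build_training_set(h=10.0, N=15, solve_control=dummy)   # 16^3 = 4096 pairs
S = reservoir_states(X / X.max())                 # normalised inputs
W_out = np.linalg.pinv(S) @ U                     # pseudoinverse readout training
U_pred = S @ W_out
```

Only the linear readout is trained; the input and reservoir weights stay fixed, which is what makes the supervised-learning step a single least-squares solve.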

9. Conclusion

In this paper, we gave an analysis of the existence and stability of equilibria. Depending on the contact rate β, the SIR model has three equilibria: trivial, medium and interior. If a medium equilibrium exists, the trivial one loses its stability; similarly, the existence of the interior equilibrium implies instability of the medium equilibrium. We have also studied optimal control strategies to prevent the spread of infection. Using MATLAB, we presented a comparison of one direct and two indirect methods. The direct method was based on Bernstein–Bézier parametrisation of the control variable u(t), where the control points of the Bézier curve were found by the IWO and PSO algorithms. The indirect methods were based on Pontryagin's maximum principle and on the necessary conditions of the optimal control problem, where the co-state variables were approximated with adaptive critic design using echo state networks. We compared the effectiveness of these methods using the SIR model to minimise the number of susceptible and infected individuals in the population. Simulations of the optimal control problem with the four methods show that the values of the objective function for different initial conditions are similar, but the computational times differ: the computational time is longer for the direct method than for the indirect methods. It is interesting to note that, depending on the value of the contact rate β, the optimal trajectory is close to the equilibrium X̄ or X̃ at time tf. For small β, the number of infected individuals goes to zero and the optimal trajectory is close to X̄; if β is sufficiently large, there exists an interior equilibrium X̃ and the optimal trajectory is close to X̃. Finally, we determined the optimal control law u(x) as a function of the state variable x. From a given training set, we determined the connection weight matrix of the echo state network and estimated the optimal control variable u(x). The numerical calculations showed that the echo state network approximation of the control variable gives results similar to those of the direct and indirect methods.

Conflict of interest

None declared.


Acknowledgments

We are very grateful to the anonymous reviewers for their careful reading, constructive comments and helpful suggestions, which helped us improve the presentation of this work. This paper is part of the solution of the scientific projects PADE-2019 and ED-18-1-2018-0014.

References

Balakrishnan, S., Ding, J., Lewis, F., 2008. Issues on stability of ADP feedback controllers for dynamical systems. IEEE Trans. Syst. Man Cybern. Part B: Cybern. 38 (4), 913–916.
Beheshti, Z., Shamsuddin, S.M., 2015. Non-parametric particle swarm optimization for global optimization. Appl. Soft Comput. 28, 345–359.
Binitha, S., Sathya, S.S., 2012. A survey of bio-inspired optimization algorithms. Int. J. Soft Comput. Eng. 2, 137–151.
Borzi, A., 2003. Multigrid methods for parabolic distributed optimal control problems. J. Comput. Appl. Math. 157, 365–382.
Brauer, F.R., 2001. Mathematical Models in Population Biology and Epidemiology. Springer, New York.
Bryson, A.E.J., 1999. Dynamic Optimization. Addison-Wesley, Berkeley.
Cao, L., Xu, L., Goodman, E.D., 2018. A neighbor-based learning particle swarm optimizer with short-term and long-term memory for dynamic optimization problems. Inform. Sci. 453, 463–485.
Chen, X., Wang, W., Cao, W., Wu, M., 2019. Gaussian-kernel-based adaptive critic design using two-phase value iteration. Inform. Sci. 482, 139–155.
Farin, G., 1993. Curves and Surfaces for Computer Aided Geometric Design – A Practical Guide. Academic Press Professional, San Diego.
Ferrari, S., Stengel, R.F., 2004. Model-based adaptive critic designs. In: Si, J., Barto, A., Powell, W., Wunsch, D. (Eds.), Learning and Approximate Dynamic Programming: Scaling Up to the Real World. IEEE Press and John Wiley and Sons, pp. 468–477.
Fleming, W.H., Rishel, R.W., 1975. Deterministic and Stochastic Optimal Control. Springer-Verlag, New York, Heidelberg, Berlin.
Gantmacher, F.R., 1959. The Theory of Matrices, Vol. 2. Chelsea, New York.
Ghosh, A., Das, S., Chowdhury, A., Giri, R., 2011. An ecologically inspired direct search method for solving optimal control problems with Bézier parameterization. Eng. Appl. Artif. Intell. 24, 1195–1203.
Giri, R., Chowdhury, A., Ghosh, A., Das, S., Abraham, A., Snasel, V., 2010. A modified invasive weed optimization algorithm for training of feed-forward neural networks. In: IEEE International Conference on Systems, Man and Cybernetics. IEEE, pp. 3166–3173.
Grothey, A., Yang, X., 2012. Approximate dynamic programming with Bézier curves/surfaces for top-percentile traffic routing. Eur. J. Oper. Res. 218, 698–707.
Jaeger, H., 2001. The "Echo State" Approach to Analysing and Training Recurrent Neural Networks. Technical Report GMD 148. German National Research Institute for Computer Science, Bonn.
Jaeger, H., 2002. Short Term Memory in Echo State Networks. Technical Report GMD 152. German National Research Institute for Computer Science, Bonn.
Jaeger, H., Haas, H., 2004. Harnessing nonlinearity: predicting chaotic systems and saving energy in wireless communication. Science 304, 78–80.
Jung, E., Lenhart, S., Feng, Z., 2002. Optimal control of treatments in a two-strain tuberculosis model. Discr. Cont. Dyn. Syst. 2 (4), 473–482.
Kandhway, K., Kuri, J., 2014. How to run a campaign: optimal control of SIS and SIR information epidemics. Appl. Math. Comput. 231, 79–92.
Kar, T.K., Batabyal, A., 2011. Stability analysis and optimal control of an SIR epidemic model with vaccination. BioSystems 104, 127–135.
Kennedy, J., Eberhart, R.C., 1995. Particle swarm optimization. In: IEEE International Conference on Neural Networks. IEEE, pp. 1942–1948.
Kermack, W.O., McKendrick, A.G., 1927. A contribution to the mathematical theory of epidemics. Proc. R. Soc. Lond. Ser. A 115, 700–721.
Kirk, D.E., 1989. Optimal Control Theory: An Introduction. Dover Publications, New York.
Kmet, T., Kmetova, M., 2016. Neural networks simulation of distributed control problems with state and control constraints. In: Villa, A.E.P., Masulli, P., Pons Rivero, A.J. (Eds.), LNCS 9886, Artificial Neural Networks and Machine Learning – ICANN 2016. Springer, Heidelberg, pp. 468–477.
Kmet, T., Kmetova, M., 2017a. Bézier curve parametrization methods for solving optimal control problems of SIR model. In: Martín-Vide, C., Neruda, R., Vega-Rodríguez, M.A. (Eds.), LNCS 10687, Theory and Practice of Natural Computing – TPNC 2017. Springer, Heidelberg, pp. 100–110.
Kmet, T., Kmetova, M., 2017b. Echo state networks simulation of SIR distributed control. In: Rutkowski, L. (Ed.), LNAI 10245, Artificial Intelligence and Soft Computing – AISC 2017. Springer, Heidelberg, pp. 86–96.
Mehrabian, A., Lucas, C., 2006. A novel numerical optimization algorithm inspired from weed colonization. Ecol. Inform. 1 (4), 355–366.
Ongenae, F., Van Looy, S., Verstraeten, D., Verplancke, T., Decruyenaere, J., 2013. Time series classification for the prediction of dialysis in critically ill patients using echo state networks. Eng. Appl. Artif. Intell. 26, 984–996.
Penrose, R., 1955. A generalized inverse for matrices. Math. Proc. Cambr. Philos. Soc. 51, 406–413.
Pontryagin, L.S., Boltyanskii, V.G., Gamkrelidze, R.V., Mischenko, E.F., 1983. The Mathematical Theory of Optimal Processes. Nauka, Moscow (in Russian).
Pourjafari, E., Mojallali, H., 2012. Solving nonlinear equations systems with a new approach based on invasive weed optimization algorithm and clustering. Swarm Evol. Comput. 4, 33–43.
Rabie, H.M., El-Khodary, I.A., Tharwat, A.A., 2013. A particle swarm optimization algorithm for the continuous absolute p-center location problem with Euclidean distance. Int. J. Adv. Comput. Sci. Appl. 4, 101–106.
Rodrigues, H.S., Monteiro, M.T.T., Torres, D.F.M., 2014. Vaccination models and optimal control strategies to dengue. Math. Biosci. 247, 1–12.
Roshanaei, M., Lucas, C., Mehrabian, A., 2009. Adaptive beamforming using a novel numerical optimisation algorithm. IET Microwaves Antennas Propag. 3 (5), 765–773.
Verstraeten, D., Schrauwen, B., D'Haene, M., Stroobandt, D., 2007. An experimental unification of reservoir computing methods. Neural Netw. 20 (3), 414–423.
Wang, X., Zang, L., Lin, Y., Zhao, Z., Hu, X., 2016. Computational model and optimal control strategies for emotion contagion in the human population in emergencies. Knowl.-Based Syst. 109, 35–47.
Werbos, P.J., 1990. A menu of designs for reinforcement learning over time. In: Miller, W.T., Sutton, R.S., Werbos, P.J. (Eds.), Neural Networks for Control. The MIT Press, Cambridge, MA, pp. 493–525 (Chapter 3).
Werbos, P.J., 1992. Approximate dynamic programming for real-time control and neural modelling. In: White, D.A., Sofge, D.A. (Eds.), Handbook of Intelligent Control: Neural, Fuzzy, and Adaptive Approaches. Van Nostrand, New York, pp. 493–525.
Xu, D., Xu, X., Xie, Y., Yang, C., 2017. Optimal control of an SIVRS epidemic model with virus variation based on complex networks. Commun. Nonlinear Sci. Numer. Simul. 48, 200–210.
Zaman, G., Kang, Y.H., Jung, I.H., 2008. Stability analysis and optimal vaccination of an SIR epidemic model. BioSystems 93, 240–249.
Zaman, G., Kang, Y.H., Jung, I.H., 2009. Optimal treatment of an SIR epidemic model with time delay. BioSystems 98, 43–50.
Zaman, G., Kang, Y.H., Jung, I.H., 2017. Optimal strategy of vaccination and treatment in an SIR epidemic model. Math. Comput. Simul. 136, 63–77.
Zhou, Y., Chen, H., Zhou, G., 2014. Invasive weed optimization algorithm for optimization of no-idle flow shop scheduling problem. Neurocomputing 137, 285–292.
Zhou, Y., Luo, Q., He, A., Wu, J., 2015. A discrete invasive weed optimization algorithm for solving the traveling salesman problem. Neurocomputing 151, 1227–1236.
