
Operations Research Letters 24 (1999) 245–248

www.elsevier.com/locate/orms

Characterization of the optimal policy for the control of a simple immigration process through total catastrophes

E.G. Kyriakidis

Technological Educational Institute of Heraklion, Branch of Chania, Department of Electronics, 3 Romanou Str., Halepa, Chania, 731 33, Crete, Greece

Received 1 October 1996; received in revised form 1 January 1999

Abstract

This paper deals with the problem of controlling a simple immigration process, which represents a pest population, by the introduction of total catastrophes. It is assumed that the cost per unit of time caused by the pests is a bounded non-decreasing function of their population size. The average-cost optimal policy is found to be of control-limit type. © 1999 Elsevier Science B.V. All rights reserved.

Keywords: Markov decision process; Catastrophes; Pest control

1. The model

Consider a pest population whose growth is represented by a simple immigration process with immigration rate $\lambda > 0$. The damage caused by the pests is represented by a cost $c_i \ge 0$ $(i \ge 0)$ for each unit of time during which the population size is $i$. It is natural to assume that the sequence $\{c_i\}$ is non-decreasing and $c_0 = 0$. Moreover, it is assumed that the sequence $\{c_i\}$ is bounded above. Suppose that the pest population may be controlled by some action which introduces total catastrophes. When such a catastrophe occurs, the population size is instantaneously reduced to zero. It is assumed that the unit of time has been chosen in such a way that the rate at which the catastrophes are introduced is equal to one. Thus, when the controlling action is taken, the length of time until the occurrence of a catastrophe is exponentially distributed with unit mean. Let the cost of taking controlling action be $k$ per unit time, where $k$ is a positive constant. Our goal is to characterize the form of the policy which minimises the expected long-run average cost per unit time. The decision epochs are the epochs at which an immigration of a pest occurs. Let $f \equiv \{f_i\}$, $i = 0, 1, \ldots$ denote a stationary policy, under which the action $f_i$ is taken when the process is in state $i$. Assume that $f_i = 1$ when the controlling action, which introduces catastrophes, is being taken, and $f_i = 0$ when the controlling action is not being taken. If the stationary policy $f = \{f_i\}$, $i = 0, 1, \ldots$ is employed, our assumptions imply that we have a continuous-time Markov chain for the population growth of the pests, with state space $S \equiv \{0, 1, \ldots\}$ and the following transition rates:


Transition      Rate
$i \to i+1$     $\lambda$     $(i \ge 0)$
$i \to 0$       $f_i$         $(i \ge 1)$

It seems intuitively reasonable that the optimal policy belongs to the class of control-limit policies $P \equiv \{P_x : x = 1, 2, \ldots\}$, where $P_x$ is the stationary policy according to which the controlling action is taken if and only if the pest population size is greater than or equal to $x$. In Section 2 a necessary and sufficient condition is given that guarantees the optimality of the policy that never takes controlling action. If the condition fails, the optimality of a control-limit policy within the class of all policies is proved. The present model is related to two Markov decision processes (see [3,4]) in which $c_i = i$ and the pest population grows according to a simple immigration-birth process and a simple immigration-birth-death process, respectively. In the first case it was proved analytically that a control-limit policy is optimal, while in the second case the optimal policy was found to be of control-limit type by implementing an appropriate algorithm.

2. The form of the optimal policy

Following the usual line of proof (see e.g. [1,6]) we first consider the corresponding finite-horizon discounted cost problem. The relevant theory can be found in Chapter 7 of Ross's book [5] and in Chapter 5 of Heyman and Sobel's book [2]. Let $C_\alpha(i,a)$ be the one-step expected discounted cost if the discount rate is $\alpha > 0$ and the action $a \in \{0,1\}$ is taken when the state of the process is $i \ge 1$. By conditioning on the corresponding one-step transition times we obtain

$$C_\alpha(i,0) = \int_0^\infty \left( \int_0^t c_i e^{-\alpha s} \, ds \right) \lambda e^{-\lambda t} \, dt = \frac{c_i}{\alpha + \lambda},$$

$$C_\alpha(i,1) = \int_0^\infty \left( \int_0^t (c_i + k) e^{-\alpha s} \, ds \right) (\lambda + 1) e^{-(\lambda + 1)t} \, dt = \frac{c_i + k}{\alpha + \lambda + 1}.$$

Since $\{c_i\}$ is bounded above, it is deduced that for all $i \in \{1, 2, \ldots\}$ and all $a \in \{0,1\}$

$$C_\alpha(i,a) < \lambda^{-1}\Bigl(\sup_i c_i + k\Bigr) \equiv B. \qquad (1)$$

Let $p_{ij}(a)$ denote the probability that the next state of the process will be $j$, given that the present state is $i$ and the action $a$ is chosen. Let also $F_{ij}(t,a)$ denote the probability distribution of the time until the next decision epoch, given that the present state is $i$, the next state is $j$ and the action $a$ is chosen. The minimum $n$-step expected discounted cost $V_\alpha(i,n)$, where $i$ is the initial state, can be found for all $n = 1, 2, \ldots$, recursively, from the following equations (see e.g. [2], p. 202)

$$V_\alpha(i,n) = \min_{a \in \{0,1\}} \left\{ C_\alpha(i,a) + \sum_{j=0}^\infty p_{ij}(a) \int_0^\infty e^{-\alpha t} V_\alpha(j, n-1) \, dF_{ij}(t,a) \right\} \quad (i \ge 0)$$

with initial condition $V_\alpha(i,0) = 0$.

The above equations in the present problem take the following form:

$$V_\alpha(0,n) = \frac{\lambda}{\alpha + \lambda} V_\alpha(1, n-1),$$

$$V_\alpha(i,n) = \min\left\{ \frac{c_i + \lambda V_\alpha(i+1, n-1)}{\alpha + \lambda}, \; \frac{c_i + k + \lambda V_\alpha(i+1, n-1) + V_\alpha(0, n-1)}{\alpha + \lambda + 1} \right\} \quad (i \ge 1) \qquad (2)$$

with initial value $V_\alpha(i,0) = 0$ $(i \ge 0)$.
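As an illustration of how recursion (2) can be evaluated in practice, the following minimal Python sketch runs the finite-horizon discounted recursion. All parameters ($\lambda = 1.5$, $k = 5$, $\alpha = 0.1$, the cost sequence $c_i = \min(i, 10)$, the truncation level and the boundary treatment) are assumptions made for this example, not values from the paper.

```python
import numpy as np

# Illustrative, assumed parameters (not from the paper): immigration rate lam,
# control cost k, discount rate alpha, and a bounded non-decreasing cost
# sequence with c_0 = 0.
lam, k, alpha = 1.5, 5.0, 0.1
I_MAX = 200                                  # ad hoc truncation of the infinite state space
c = np.minimum(np.arange(I_MAX + 1), 10.0)   # c_i = min(i, 10)

V = np.zeros(I_MAX + 1)                      # V_alpha(i, 0) = 0 for all i
for n in range(500):                         # n-step recursion (2)
    V_new = np.empty_like(V)
    V_new[0] = lam / (alpha + lam) * V[1]
    for i in range(1, I_MAX + 1):
        V_up = V[min(i + 1, I_MAX)]          # boundary: V(I_MAX+1) ~ V(I_MAX)
        keep = (c[i] + lam * V_up) / (alpha + lam)                    # a = 0
        cull = (c[i] + k + lam * V_up + V[0]) / (alpha + lam + 1.0)   # a = 1
        V_new[i] = min(keep, cull)
    V = V_new

assert np.all(np.diff(V) >= -1e-9)           # sanity check: V_alpha(i, n) non-decreasing in i
print(np.round(V[:12], 3))
```

The final assertion is a numerical sanity check of the monotonicity established in Lemma 1 below.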

Lemma 1. $\{V_\alpha(i,n)\}$ $(i \ge 0)$ is non-decreasing in $i$, for all $n = 0, 1, \ldots$.


Proof. The result is true for $n = 0$, since $V_\alpha(i,0) = 0$ $(i \ge 0)$. Suppose that for a particular $n \ge 1$, $\{V_\alpha(i,n-1)\}$ $(i \ge 0)$ is non-decreasing in $i$. From Eq. (2) and the monotonicity of $\{c_i\}$ it follows that $\{V_\alpha(i,n)\}$ $(i \ge 1)$ is non-decreasing in $i$. Note also that

$$V_\alpha(0,n) = \frac{\lambda}{\alpha + \lambda} V_\alpha(1, n-1) < V_\alpha(1, n-1) \le V_\alpha(1, n),$$

where the last inequality follows from the fact that the minimum finite-horizon expected discounted cost is a non-decreasing function of the length of the horizon. Hence $\{V_\alpha(i,n)\}$ $(i \ge 0)$ is non-decreasing in $i$ and the result of the lemma has been proved by induction on $n$.

The limit of $V_\alpha(i,n)$ as $n \to \infty$ is equal to the minimum total expected discounted cost $V_\alpha(i)$, with initial state $i$ (see Theorem 7.3 on p. 158 in [5]). From Lemma 1 it is deduced that $\{V_\alpha(i)\}$ $(i \ge 0)$ is non-decreasing in $i$. Assume now that the initial state is $i \ge 1$. Let $X_r$, $r = 0, 1, \ldots$ be the state of the process at the $r$th decision epoch and $T_r$ be the time until the $r$th decision epoch ($T_0 = 0$). Let $\pi$ denote the non-stationary policy that takes controlling action until the first entry into state 0 and thereafter chooses the same actions as the discounted-cost optimal policy $\pi^*$, and $N$ denote the decision epoch at which the first entry into state 0 occurs. Let also $V_\alpha(i, \pi^*)$ and $V_\alpha(i, \pi)$ be the total expected discounted costs under the policies $\pi^*$ and $\pi$, respectively, given that the initial state is $i$. Then

$$V_\alpha(i) = V_\alpha(i, \pi^*) \le V_\alpha(i, \pi) = E_\pi\Biggl[\sum_{r=0}^{\infty} e^{-\alpha T_r} C_\alpha(X_r, a_r) \,\Big|\, X_0 = i\Biggr],$$

where $E_\pi$ represents the conditional expectation given that the policy $\pi$ is employed. Separating the above series into two parts we obtain

$$V_\alpha(i) \le E_\pi\Biggl[\sum_{r=0}^{N-1} e^{-\alpha T_r} C_\alpha(X_r, a_r) \,\Big|\, X_0 = i\Biggr] + E_\pi\Biggl[\sum_{r=N}^{\infty} e^{-\alpha T_r} C_\alpha(X_r, a_r) \,\Big|\, X_0 = i\Biggr]$$

$$= E_\pi\Biggl[\sum_{r=0}^{N-1} e^{-\alpha T_r} C_\alpha(X_r, a_r) \,\Big|\, X_0 = i\Biggr] + E_\pi\bigl[e^{-\alpha T_N}\bigr]\, E_\pi\Biggl[\sum_{r=N}^{\infty} e^{-\alpha (T_r - T_N)} C_\alpha(X_r, a_r) \,\Big|\, X_0 = i\Biggr]$$

(since the policy $\pi$ is independent of $T_N$ after entry into state 0)

$$\le B\, E_\pi\Biggl[\sum_{r=0}^{N-1} e^{-\alpha T_r}\Biggr] + E_\pi\bigl[e^{-\alpha T_N}\bigr] V_\alpha(0) \quad \text{(using Eq. (1) and the definitions of } \pi, N\text{)}$$

$$\le B\, E_\pi N + V_\alpha(0). \qquad (3)$$

According to Wald's equation (see e.g. [5], p. 38) $E_\pi N$ is equal to the expected time until a catastrophe occurs divided by the expected time of each transition. Hence

$$E_\pi N = \frac{1}{(\lambda + 1)^{-1}} = 1 + \lambda. \qquad (4)$$

Since $\{V_\alpha(i)\}$ $(i \ge 0)$ is non-decreasing, from relations (3) and (4) we have

$$|V_\alpha(i) - V_\alpha(0)| \le B(1 + \lambda) \quad (i \ge 0,\ \alpha > 0).$$

Therefore the condition of Theorem 7.7 in [5] is satisfied and it follows that there exists a bounded sequence $\{h_i\}$, $i = 0, 1, \ldots$ and a constant $g$ such that

$$h_i = \lim_{n \to \infty}\,[V_{\alpha_n}(i) - V_{\alpha_n}(0)] \quad (i \ge 0) \qquad (5)$$

for some sequence $\alpha_n \to 0$, and

$$g = \lambda(h_1 - h_0),$$

$$g = \min\{c_i + \lambda(h_{i+1} - h_i),\ c_i + k + \lambda h_{i+1} + h_0 - (\lambda + 1) h_i\} \quad (i \ge 1). \qquad (6)$$
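The average-cost optimality equations (6) can also be solved numerically. The sketch below is an illustration under the same assumed parameters as the earlier sketch, not the paper's procedure: it uniformizes the chain at total rate $\lambda + 1$ and applies relative value iteration on a truncated state space to approximate $g$ and the relative values $h_i$. The stopping rule $h_i \ge k + h_0$ used at the end is the control criterion derived in the sequel.

```python
import numpy as np

# Same illustrative, assumed parameters as in the earlier sketch.
lam, k = 1.5, 5.0
I_MAX = 200
c = np.minimum(np.arange(I_MAX + 1), 10.0)

Lam = lam + 1.0                          # uniformization rate (immigration + catastrophe)
h = np.zeros(I_MAX + 1)                  # relative values, normalized so that h_0 = 0
g = 0.0
for _ in range(5000):                    # relative value iteration
    h_up = np.append(h[1:], h[-1])       # h(i+1), with h(I_MAX+1) ~ h(I_MAX)
    Th = np.empty_like(h)
    Th[0] = (lam * h[1] + h[0]) / Lam    # state 0: no decision, zero cost rate
    keep = (c[1:] + lam * h_up[1:] + h[1:]) / Lam      # a = 0: self-loop under uniformization
    cull = (c[1:] + k + lam * h_up[1:] + h[0]) / Lam   # a = 1: jump to 0 at rate 1
    Th[1:] = np.minimum(keep, cull)
    g = Lam * Th[0]                      # average-cost estimate
    h = Th - Th[0]                       # renormalize so that h_0 = 0

# Control is optimal at i >= 1 iff h_i >= k + h_0 (see below).
above = np.nonzero(h[1:] >= k + h[0] - 1e-9)[0]
x_star = above[0] + 1 if above.size else None    # None: never controlling is optimal
print(f"g ~ {g:.3f}, control limit x* = {x_star}")
```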


Furthermore, there exists a stationary optimal average-cost policy that chooses at each state $i \ge 1$ the minimising action on the right-hand side in Eq. (6). Its average cost is equal to $g$ for every initial state. From Eq. (5) it follows that the sequence $\{h_i\}$ $(i \ge 0)$ is non-decreasing in $i$, since $\{V_\alpha(i)\}$ $(i \ge 0)$ is non-decreasing in $i$. Note that the average cost under the policy that never takes controlling action is equal to $g = \sup_i c_i$. Consequently, this policy is optimal if and only if

$$c_i + \lambda(h_{i+1} - h_i) \le c_i + k + \lambda h_{i+1} - (\lambda + 1) h_i + h_0 \quad (i \ge 1), \qquad (7)$$

where the sequence $\{h_i\}$ $(i \ge 0)$ satisfies the difference equations

$$\sup_i c_i = c_i + \lambda(h_{i+1} - h_i) \quad (i \ge 0). \qquad (8)$$

The inequalities (7) are equivalent to the following inequalities:

$$h_i \le k + h_0 \quad (i \ge 1). \qquad (9)$$

Iterating Eq. (8) we obtain

$$h_i = \lambda^{-1} \sum_{j=0}^{i-1} \Bigl(\sup_i c_i - c_j\Bigr) + h_0 \quad (i \ge 0). \qquad (10)$$

From the expression (10) we deduce that the inequalities (9) hold if and only if

$$\lambda^{-1} \sum_{j=0}^{\infty} \Bigl(\sup_i c_i - c_j\Bigr) \le k. \qquad (11)$$

Hence, the above relation is a necessary and sufficient condition that guarantees the optimality of the policy that never takes controlling action. Suppose that Eq. (11) does not hold. Then, there must exist some state at which the optimal policy takes controlling action. The optimal policy takes controlling action at $i \ge 1$ if and only if $h_i \ge k + h_0$. From the monotonicity of $\{h_i\}$ it follows that the control-limit policy $P_{x^*}$, where

$$x^* = \min\{\text{integer } i \ge 1 \text{ such that } h_i \ge k + h_0\},$$

is the overall average-cost optimal policy.
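As a small worked check of condition (11), consider again the illustrative, assumed cost sequence $c_j = \min(j, 10)$ with $\lambda = 1.5$ and $k = 5$ from the earlier sketches; the series then has only finitely many non-zero terms and can be evaluated exactly.

```python
# Worked check of condition (11) for the illustrative, assumed parameters
# c_j = min(j, 10), lam = 1.5, k = 5.0 used in the earlier sketches.
lam, k = 1.5, 5.0
sup_c = 10.0                                        # sup_i c_i
tail = sum(sup_c - min(j, 10) for j in range(10))   # non-zero terms only: j = 0,...,9 -> 55
print(tail / lam <= k)                              # False: 55/1.5 > 5, so (11) fails
```

Since (11) fails for these parameters, the policy that never takes controlling action is not optimal, and the optimal policy is the control-limit policy $P_{x^*}$, with $x^*$ read off the relative values $h_i$ computed numerically in the earlier sketch.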

References

[1] N. Douer, U. Yechiali, Optimal repair and replacement in Markovian systems, Comm. Statist. Stochastic Models 10 (1994) 253–270.
[2] D.P. Heyman, M.J. Sobel, Stochastic Models in Operations Research, vol. 2, McGraw-Hill, New York, 1984.
[3] E.G. Kyriakidis, A. Abakuks, Optimal pest control through catastrophes, J. Appl. Probab. 27 (1989) 873–879.
[4] E.G. Kyriakidis, Optimal control of a simple immigration-birth-death process through catastrophes, European J. Oper. Res. 81 (1995) 346–356.
[5] S.M. Ross, Applied Probability Models with Optimization Applications, Holden-Day, San Francisco, 1970.
[6] F.A. Van der Duyn Schouten, S.G. Vanneste, Maintenance optimization of a production system with buffer capacity, European J. Oper. Res. 82 (1995) 323–338.