Artificial Intelligence in Medicine 63 (2015) 33–40
Artificial Intelligence in Medicine journal homepage: www.elsevier.com/locate/aiim
Adaptive dynamic programming algorithms for sequential appointment scheduling with patient preferences
Jin Wang, Richard Y.K. Fung ∗
Department of Systems Engineering and Engineering Management, City University of Hong Kong, 83 Tat Chee Ave, Kowloon, Hong Kong
Article info
Article history: Received 19 September 2013; received in revised form 1 December 2014; accepted 4 December 2014
Keywords: Markov decision process; Adaptive dynamic programming; Outpatient department; Appointment scheduling; Patient preferences
Abstract
Objectives: A well-developed appointment system can help increase the utilization of medical facilities in an outpatient department. This paper outlines the development of an appointment system that can make an outpatient department work more efficiently and improve patient satisfaction level.
Methods: A Markov decision process model is proposed to schedule sequential appointments with the consideration of patient preferences in order to maximize the patient satisfaction level. Adaptive dynamic programming algorithms are developed to avoid the curse of dimensionality. These algorithms can dynamically capture patient preferences, update the value of being in a state, and thus improve the appointment decisions.
Results: Experiments were conducted to investigate the performance of the algorithms. The convergence behaviors under different settings, including the number of iterations needed for convergence and the accuracy of results, were examined. Bias-adjusted Kalman filter step-sizes were found to lead to the best convergence behavior, which stabilized within 5000 iterations. As for the effects of exploration and exploitation, the best convergence behavior was obtained when the probability of taking a myopically optimal action equaled 0.9. The performance of the value function approximation algorithm was greatly affected by the combination of basis functions. Under different combinations, errors varied from 2.7% to 8.3%. More preferences resulted in faster convergence, but required longer computation time.
Conclusions: System parameters are adaptively updated as bookings are confirmed. The proposed appointment scheduling system could certainly contribute to better patient satisfaction level during the booking periods.
© 2014 Elsevier B.V. All rights reserved.
1. Introduction

Healthcare systems are facing increasing pressure to satisfy diverse demands, especially in an aging society. According to statistics from the Ministry of Health of China, the number of visits to health institutions in 2012 approached 6.9 billion, representing a 9.7(2013) [1] showed that there were only 3.19 practicing physicians per 1000 persons in the urban areas. Therefore, demand for healthcare resources far outstrips supply. Owing to ineffective healthcare systems, the complaints received by institutions are escalating. Hence, it is essential that healthcare resources and medical quality be significantly improved as early as possible. In 2009, the Ministry of Health in mainland China started requiring hospitals to install appointment systems. Currently, because
∗ Corresponding author. Tel.: +852 34428413. E-mail address:
[email protected] (R.Y.K. Fung). http://dx.doi.org/10.1016/j.artmed.2014.12.002 0933-3657/© 2014 Elsevier B.V. All rights reserved.
of great demand, a patient wanting to see a specialist (a senior physician) has to make an appointment in advance. A good appointment system can increase utilization of facilities, cut waiting time for patients, and in turn improve patients’ satisfaction level. Besides, patients’ satisfaction is influenced not only by the perceived quality of medical services but also by their appointment booking experience [2]. However, there are many sources of variability in the patients’ arrival and service process, which makes the scheduling of outpatient appointments more complicated. Many papers focus on different aspects, such as no-shows [3,4], interruption [5], cancellation [6], traditional and open-access policies [7], and patient priority [8]. These papers contribute greatly to their respective areas. However, patient preference is still a relatively new attribute in the scheduling problem in the healthcare industry. Fortunately, the problems of preference and choice in marketing have been widely studied in the past, offering useful inspiration for healthcare. Van Ryzin [9] provides an exact and general analysis of the revenue management issues subject to customer preference. In the model, some available choices are provided to allow
J. Wang, R.Y.K. Fung / Artificial Intelligence in Medicine 63 (2015) 33–40
customers to select. With the assumption that purchase probability increases with the number of choices made available, the analysis arrives at an optimal policy involving an elegant form of an ordered family of “efficient” subsets. Zhang and Cooper [10] consider airline revenue management problems with customer choice within a group of flights serving a common route. Chen and Homem-de-Mello [11] discuss an airline revenue management problem with discrete customer choice behavior, in which preference orders are proposed to describe each customer’s choice list. If a preferred option is not available, the customer moves to the next choice on the list subject to certain probabilities, and a post-optimization heuristic is used to refine the allocation process. Shen and Su [12] review methods of modeling customer behavior in revenue management and auctions. These papers offer researchers a great deal of inspiration for studying the problem of patient preferences.

Patient preference is first modeled explicitly by Gupta and Wang [2]. To some extent, the approach constitutes significant progress in research on appointment scheduling. Patient choices in that paper include preferred physicians and time slots. Patients are divided into two categories: regular patients who call more than one day in advance, and same-day patients who arrive at the start of a workday. The patient choice of a particular workday is modeled as a Markov decision process (MDP). Patients are allowed to switch their choice if their preferred time slot or physician is not available. Both single-physician and multi-physician scenarios are simulated. Based on this work, Wang and Gupta [13] further develop an adaptive appointment system, which dynamically learns and updates patients’ preferences. The patients seeking to book a block have an acceptable set, from which the scheduler offers a block to them.
The work described in this paper differs from theirs in that appointments are scheduled in an outpatient department, which involves more than just resource allocation. Qu and Shi [14] model the performance of open access scheduling with patient choice. Vermeulen et al. [15] consider both patient preference and the urgency of the cases. However, most of the available literature considers appointment systems in the Western world, which are relatively well developed, and very little work has been done concerning situations relevant to mainland China. This paper mainly focuses on sequential appointment scheduling in outpatient departments (OPDs) in hospitals in China. The appointment systems in OPDs also have different characteristics because of differences in the approaches used in China and the West. Gupta and Denton [16] summarize three different healthcare scenarios: primary care clinics, specialty care clinics and hospitals. In the West, most people go to a neighborhood clinic for ordinary consultations. If necessary, they are referred to a hospital for follow-up actions. Neighborhood clinics greatly relieve the pressure on the hospitals. However, in China, most people prefer to go directly to a hospital because they trust hospitals more. As a result, hospitals become heavily crowded, especially the more “famous” ones. The OPD in China is similar to some extent to primary care clinics in the West. However, there are some distinct differences. For instance, in China, most OPD physicians also work in inpatient departments and the distribution can vary daily based on the planned schedule. Also, in Western primary care clinics, patients usually have a designated physician, the primary care provider, who takes care of the majority of the medical needs of a given individual. However, in China, people believe that it is better to go to a well-known hospital specializing in a given type of medical needs.
In most Western cases, the primary care clinic is close to the patient’s home whereas, in China, patients do not mind going to a hospital far away from home, if it is well-known. Therefore, along with the availability of the resources needed, patient preference in China can vary with the reputation of the hospital or, even, of a certain physician.
Taking the special healthcare environment into account, this paper mainly focuses on the appointment scheduling for specialists in OPD. The main contributions of this paper are as follows:
• A new revenue function is formulated to evaluate patient satisfaction level, disregarding the financial income.
• Based on the new evaluation method, an MDP model is developed to model the sequential appointment process.
• Adaptive dynamic programming algorithms are proposed to dynamically capture patient preferences, and thus update the value functions in a timely manner.

2. Model formulation

2.1. Booking process

The model tries to emulate the booking processes in an outpatient department of a typical hospital in China specializing in the treatment of cardiovascular diseases. The outpatient department is composed of seven sub-departments, each with several physicians. Patients can make their choices according to the availability of physicians and the schedule on the website. The booking process is illustrated in Fig. 1. During online booking exercises, patients indicate their preferences, including the preferred physician (physician preference) and the convenient time (time preference). It is common for a patient to have more than one choice. Denote $R$ as a preference set and $p_R$ as the probability that a patient with a preference set $R$ calls for an appointment in a call-in period. Let $\Omega$ be a set that includes all possible preference sets and $p_0$ the probability that there is no appointment request during the call-in period. $\sum_{R \subseteq \Omega} p_R$ is the probability that there is a call (no matter what preferences the caller holds) in a call-in period. We have $\sum_{R \subseteq \Omega} p_R + p_0 = 1$, indicating that the probability of more than one call in a call-in period is zero. Considering patient preferences and the current system state, the scheduler confirms an appointment decision, i.e., assigns or refuses the request.
Since no emergency patients are considered in the proposed model, a non-assignment policy is permitted provided that the penalty of a non-assigned patient is smaller than the expected revenue upon completion of the time slot in question. If the request is refused, a penalty $l_1$ is imposed on the hospital. Upon receiving an offer, the patient can choose to accept or decline it. Whenever a patient declines an offer, the hospital suffers a revenue penalty of $l_2$ because the appointment decision dissatisfies the patient (normally, $l_1 \ge l_2$). Finally, the booking process is terminated. If all slots in a shift have been fully scheduled, the booking system for the given shift is closed. Hospital statistics have indicated that, owing to the high demand for specialist service and the reputation of the hospital in question, the rate of a patient declining an appointment decision is low.

The booking horizon is divided into $T$ time intervals (call-in periods) [4]. Each interval is short enough that no more than one call arrives within it. For a given workday schedule, calls come within the booking horizon. Unlike the literature in which patients are classified according to their arrival probabilities [17], choice probabilities [2], or no-show rates [4], patients in our model are categorized according to their preferences.

2.2. The states and the action set

In the proposed appointment system, there are $M$ physicians whose working shifts in a day are divided into $N$ time slots. It is assumed that all the time slots are of equal length [17], all patients arrive punctually without no-shows, each time slot
Fig. 1. Booking process.
can accommodate only one patient, and the service process can be finished within the time slot. A typical state $S$ takes the form

$$S = \begin{pmatrix} s_{11} & \cdots & s_{1N} \\ \vdots & \ddots & \vdots \\ s_{M1} & \cdots & s_{MN} \end{pmatrix}$$

where $s_{ij} \in \{0, 1\}$ denotes the state of time slot $j$ of physician $i$: $s_{ij} = 1$ means that the time slot has already been booked; otherwise, $s_{ij} = 0$. Therefore, in principle, there are $2^{MN}$ states in each period. This number can be huge even though $M$ and $N$, individually, are not very large. This leads to the curse of dimensionality [18]. There are several approximate methods that can resolve this problem. An example is the simulation-based method [19]. These methods are used in Section 3 to illustrate how the problem of the curse of dimensionality can be resolved.

After receiving information concerning patient preferences, the scheduler makes an appointment decision $(i, j)$ based on the current state $S$. A decision/action $(i, j)$ means that the patient is assigned to the $j$th time slot of physician $i$. Obviously, $(i, j) \in A(S)$, where $A(S) = \{(i, j) \mid i = 1, \dots, M;\ j = 1, \dots, N;\ s_{ij} = 0\}$. The patient must be assigned to an available time slot. If an entry $(i, j)$ is assigned, a matrix, denoted here by $\Delta^{ij}$, is used to describe this decision. $\Delta^{ij}$ is an $M \times N$ matrix in which entry $(i, j)$ equals 1, indicating that a patient has been assigned to slot $j$ of physician $i$, while all other entries are zero.

2.3. Revenue function

The work reported in this paper seeks to maximize the total expected satisfaction level of all patients during a booking horizon. To achieve this objective, a revenue function is formulated to evaluate patient satisfaction levels. “Revenue” is taken as the contribution of a healthcare system to society. Note that this has nothing to do with financial income. This definition of revenue is an important feature that makes this work different from most works described in the literature. There is little doubt that the function depends on the degree to which patient preferences match appointment decisions. The preferences with regard to physicians and time slots are assumed to be independent of each other.

First, let us consider how physician preference affects revenue. Let $r$ be the average revenue from each accepted request when physician preference is satisfied, $r_1$ be the average revenue from a patient–physician mismatch resulting from the appointment, and $r_2$ be the average revenue derivable by assigning a patient with no physician preference. Patient–physician mismatches can lower the overall revenue, that is, $r_1 \le r$. Physicians have to spend more time reading the medical histories of unfamiliar patients [20]. On the other hand, patients forced to see an unexpected physician most likely have a lower satisfaction level. Let $r_m^i$ be the revenue derived from the fact that the patient is assigned to physician $i$, while physician $m$ was the preferred one. Then,

$$r_m^i = \begin{cases} r, & i = m \\ r_1, & i \neq m \\ r_2, & m = 0 \end{cases}$$
in which $m = 0$ means the patient has no physician preference. Estimating the revenue from a time slot assignment is a bit more complicated because the interval between the assigned and expected time slots influences revenue. If a patient preferring slot $n$ is assigned to slot $j$, the degree of mismatch can be evaluated as $(1/N)\,|n - j|$. Hence the degree of match is $1 - (1/N)\,|n - j|$. Let $d_n^j$ be the revenue derivable in case the patient is assigned to time slot $j$, whereas time slot $n$ is the preferred one. Then,

$$d_n^j = \begin{cases} 1, & n = 0 \\ 1 - \dfrac{1}{N}\,|n - j|, & n \neq 0 \end{cases}$$

where $n = 0$ means that the patient has no time preference. Obviously, $0 \le d_n^j \le 1$. In summary, by combining the above two branched functions, the total revenue function of an appointment decision can be expressed as

$$C_{mn}^{ij} = \frac{r_m^i}{r} + d_n^j \tag{1}$$

in which $C_{mn}^{ij}$ is the revenue derivable from the fact that the patient with a preference for physician-slot pair $(m, n)$ is actually assigned to physician-slot pair $(i, j)$. The first part of the function normalizes the revenue from physician preference. Clearly, $0 \le C_{mn}^{ij} \le 2$. It is now easy to arrive at the following lemma, which is utilized in the algorithm to be proposed in Section 3.

Lemma 1. The revenue decreases as the value of $|m - i|$ or $|n - j|$ increases.
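The action set $A(S)$ and the revenue function of Eq. (1) can be illustrated with a minimal Python sketch. The function names are hypothetical, and the default values $r = 1$, $r_1 = 1/2$, $r_2 = 1/3$ are borrowed from the experiment in Section 4.1; this is an illustration under those assumptions rather than the implementation used in the paper.

```python
def available_actions(S):
    """A(S): all 1-based (i, j) pairs whose slot is still free (s_ij == 0)."""
    return [(i + 1, j + 1)
            for i, row in enumerate(S)
            for j, s in enumerate(row) if s == 0]


def physician_revenue(i, m, r=1.0, r1=0.5, r2=1.0 / 3.0):
    """r_m^i: revenue when physician i is assigned and m was preferred (m == 0: no preference)."""
    if m == 0:
        return r2
    return r if i == m else r1


def slot_revenue(j, n, N):
    """d_n^j: degree of match between assigned slot j and preferred slot n (n == 0: no preference)."""
    if n == 0:
        return 1.0
    return 1.0 - abs(n - j) / N


def revenue(i, j, m, n, N, r=1.0, r1=0.5, r2=1.0 / 3.0):
    """C_mn^ij of Eq. (1): normalized physician revenue plus slot match."""
    return physician_revenue(i, m, r, r1, r2) / r + slot_revenue(j, n, N)


# Two physicians, two slots; slot 1 of physician 1 is already booked.
S = [[1, 0],
     [0, 0]]
print(available_actions(S))
print(revenue(i=1, j=2, m=1, n=2, N=4))   # both preferences satisfied
print(revenue(i=2, j=4, m=1, n=2, N=4))   # wrong physician, two slots away
```

A fully matched assignment reaches the upper bound of 2, and each kind of mismatch lowers one of the two terms, which is the behavior stated in Lemma 1.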
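2.4. Markov decision process model

The sequential appointment scheduling problem can be modeled by the following recursive equation.

$$V_t(S) = \sum_{R \subseteq \Omega} p_R \max\Big\{ \max_{(m,n) \in R,\ (i,j) \in A(S)} \Big[ p_R^{ij} \big( V_{t+1}(S + \Delta^{ij}) + C_{mn}^{ij} \big) + \big(1 - p_R^{ij}\big) \big( V_{t+1}(S) - l_2 \big) \Big],\ V_{t+1}(S) - l_1 \Big\} + p_0 V_{t+1}(S) \tag{2}$$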
where $t = 0, 1, 2, \dots, T$, and the value function $V_t(S)$ specifies the maximum revenue from the current stage, $t$, onwards, given the current state $S$. The first part of the equation captures the expected revenue assuming that a request has indeed arrived at stage $t$. The second part is the expected revenue from the current stage onwards if there is no request at stage $t$. The revenue for the first part comes from making the best decision in response to an appointment request. After the scheduler has made an offer, the patient can decide whether to accept the decision. Let $p_R^{ij}$ be the probability that the offer is accepted (it can be estimated from historical data). The appointment scheduling exercise is finalized at the last stage $T$, or when all pairs $(i, j)$ have become unavailable. A matrix $E$ (an $M \times N$ matrix of ones) is used to describe the latter scenario, so that

$$V_T(S) = V_t(E) = 0. \tag{3}$$
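For a very small instance, Eqs. (2) and (3) can be solved exactly by backward recursion, which is how a benchmark value can be obtained. The Python sketch below is only an illustration under stated assumptions: the problem size, the demand model `PREFS`, and the constant acceptance probability `P_ACCEPT` are hypothetical, while the penalties and physician revenues reuse the values of Section 4.1.

```python
from functools import lru_cache

M, N, T = 2, 2, 4                        # toy problem size (assumed)
L1, L2 = 0.1, 0.08                       # refusal / decline penalties (Section 4.1)
REV_MATCH, REV_MISMATCH, REV_NONE = 1.0, 0.5, 1.0 / 3.0   # r, r1, r2 (Section 4.1)
P_ACCEPT = 0.9                           # assumed constant acceptance probability p_R^{ij}
# Hypothetical demand model: preference set (tuple of (m, n) pairs) -> p_R.
PREFS = {((1, 1),): 0.3, ((2, 2),): 0.3, ((1, 2), (2, 1)): 0.2}
P0 = 1.0 - sum(PREFS.values())           # probability of no call in a period


def revenue(i, j, m, n):
    """C_mn^ij of Eq. (1)."""
    phys = REV_NONE if m == 0 else (REV_MATCH if i == m else REV_MISMATCH)
    slot = 1.0 if n == 0 else 1.0 - abs(n - j) / N
    return phys / REV_MATCH + slot


def free_slots(S):
    """A(S): 1-based (i, j) pairs whose slot is still unbooked."""
    return [(i + 1, j + 1) for i, row in enumerate(S)
            for j, s in enumerate(row) if s == 0]


def assign(S, i, j):
    """Return S + Delta^{ij}: the state after booking slot j of physician i."""
    return tuple(tuple(1 if (a, b) == (i - 1, j - 1) else s
                       for b, s in enumerate(row))
                 for a, row in enumerate(S))


@lru_cache(maxsize=None)
def V(t, S):
    """Value function of Eq. (2) with the boundary condition of Eq. (3)."""
    free = free_slots(S)
    if t >= T or not free:
        return 0.0
    stay = V(t + 1, S)                   # value if the state does not change
    total = P0 * stay                    # no call in this period
    for pref_set, p_r in PREFS.items():
        best_offer = max(
            P_ACCEPT * (V(t + 1, assign(S, i, j)) + revenue(i, j, m, n))
            + (1.0 - P_ACCEPT) * (stay - L2)
            for (m, n) in pref_set for (i, j) in free)
        total += p_r * max(best_offer, stay - L1)   # offer a slot or refuse
    return total


EMPTY = tuple(tuple(0 for _ in range(N)) for _ in range(M))
print(round(V(0, EMPTY), 4))             # expected revenue at the start of booking
```

Memoization keeps the recursion cheap here, but the number of states still grows as $2^{MN}$, which is why the approximate algorithms of Section 3 are needed for realistic sizes.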
3. Adaptive dynamic programming algorithms

Adaptive dynamic programming (ADP) algorithms have been widely used for learning patient preferences and improving appointment decisions. In this section, several ADP algorithms, including a simulation-based algorithm (SBA) and a value function approximation algorithm (VFA), are proposed.

Algorithm 1. Simulation-based algorithm
Step 0: Initialization
  Step 0a: Initialize $\bar{V}_t^0(S)$;
  Step 0b: Choose an initial state $S_0^1$;
  Step 0c: Set $k = 1$.
Step 1: Generate a sample path.
Step 2: For $t = 1, 2, \dots, T$,
  Step 2a: Let $\bar{V}_t^k(S) = \bar{V}_t^{k-1}(S)$;
  Step 2b: If $|A(S_t^k)| = 0$, go to Step 3;
  Step 2c: If there is no call, $v = \bar{V}_{t+1}^{k-1}(S_t^k)$;
  Step 2d: If there is a call,
    $$v = \max\Big\{ \max_{(m,n) \in R,\ (i,j) \in A(S_t^k)} \Big[ p_R^{ij} \big( \bar{V}_{t+1}^{k-1}(S_t^k + \Delta^{ij}) + C_{mn}^{ij} \big) + \big(1 - p_R^{ij}\big) \big( \bar{V}_{t+1}^{k-1}(S_t^k) - l_2 \big) \Big],\ \bar{V}_{t+1}^{k-1}(S_t^k) - l_1 \Big\}$$
    Let $a^k$ be the action that solves the maximization. Take $a^k$ with probability $p$. Take another action with probability $1 - p$ and get a new value of $v$.
  Step 2e: Update $\bar{V}_t^k(S)$,
    $$\bar{V}_t^k(S) = \begin{cases} (1 - \alpha_{k-1})\, \bar{V}_t^{k-1}(S_t^k) + \alpha_{k-1} v, & S = S_t^k \\ \bar{V}_t^{k-1}(S), & \text{otherwise} \end{cases}$$
  Step 2f: Compute $S_{t+1}^k = S_t^k + a^k$.
Step 3: Let $k = k + 1$. If $k < K$, go to Step 1.
Step 4: Return the value function $\big(\bar{V}_t^K\big)_{t=0}^{T}$.
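The steps of Algorithm 1 can be sketched in Python as follows. This is a hedged illustration, not the authors' MATLAB code: the toy problem size, the demand model `PREFS`, and the constant acceptance probability `P_ACCEPT` are hypothetical, stages are indexed from 0 so that the value of the empty state at the start of booking is estimated, exploration picks uniformly among all candidate actions, unseen states are initialized with the bound $2 \min\{|A(S)|, T - t\}$ stated as Lemma 2 in Section 3.1.1, and the step-size is the generalized harmonic rule $a/(a + k - 1)$.

```python
import random

M, N, T = 2, 2, 4                        # toy problem size (assumed)
L1, L2, P_ACCEPT = 0.1, 0.08, 0.9        # penalties (Section 4.1), assumed p_R^{ij}
# Hypothetical demand model: preference set (tuple of (m, n) pairs) -> p_R.
PREFS = {((1, 1),): 0.3, ((2, 2),): 0.3, ((1, 2), (2, 1)): 0.2}
EMPTY = tuple(tuple(0 for _ in range(N)) for _ in range(M))


def revenue(i, j, m, n, r=1.0, r1=0.5, r2=1.0 / 3.0):
    """C_mn^ij of Eq. (1)."""
    phys = r2 if m == 0 else (r if i == m else r1)
    slot = 1.0 if n == 0 else 1.0 - abs(n - j) / N
    return phys / r + slot


def free_slots(S):
    """A(S): 1-based (i, j) pairs whose slot is still unbooked."""
    return [(i + 1, j + 1) for i, row in enumerate(S)
            for j, s in enumerate(row) if s == 0]


def assign(S, i, j):
    """Return the state after booking slot j of physician i."""
    return tuple(tuple(1 if (a, b) == (i - 1, j - 1) else s
                       for b, s in enumerate(row))
                 for a, row in enumerate(S))


def value(V, t, S):
    """Current estimate; unseen states start at the optimistic upper bound."""
    if t >= T:
        return 0.0
    return V.get((t, S), 2.0 * min(len(free_slots(S)), T - t))


def sample_call(rng):
    """Draw a preference set R with probability p_R, or None for no call."""
    u, acc = rng.random(), 0.0
    for pref_set, p_r in PREFS.items():
        acc += p_r
        if u < acc:
            return pref_set
    return None


def run_sba(K=2000, p_exploit=0.9, a=50.0, seed=0):
    rng, V = random.Random(seed), {}
    for k in range(1, K + 1):
        alpha = a / (a + k - 1)                      # generalized harmonic step-size
        S = EMPTY
        for t in range(T):
            free = free_slots(S)
            if not free:                             # Step 2b
                break
            stay, pref_set, action = value(V, t + 1, S), sample_call(rng), None
            if pref_set is None:                     # Step 2c
                v = stay
            else:                                    # Step 2d
                offers = {None: stay - L1}           # None = refuse the request
                for (i, j) in free:
                    offers[(i, j)] = max(
                        P_ACCEPT * (value(V, t + 1, assign(S, i, j))
                                    + revenue(i, j, m, n))
                        + (1.0 - P_ACCEPT) * (stay - L2) for (m, n) in pref_set)
                action = max(offers, key=offers.get)
                if rng.random() >= p_exploit:        # explore with probability 1 - p
                    action = rng.choice(list(offers))
                v = offers[action]
            V[(t, S)] = (1.0 - alpha) * value(V, t, S) + alpha * v   # Step 2e
            if action is not None:                   # Step 2f, as in Algorithm 1
                S = assign(S, *action)
    return V


estimates = run_sba()
print(round(estimates[(0, EMPTY)], 3))   # estimate of the value of the empty state
```

Only the states actually visited along sample paths are stored in the dictionary, which is the point of the forward pass compared with full backward induction.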
3.1. Simulation-based algorithm

In the backward induction process applied in the proposed MDP model, all states at any stage must be visited, leading to a waste of computational space and time. In order to sidestep the disadvantages associated with solving the backward problem directly, Algorithm 1 implements a forward simulation algorithm. This algorithm is a modified version of the forward dynamic programming routine described in Chapter 4 of [18]. Several fundamental challenges impacting the convergence rate of the algorithm need to be noted.

3.1.1. Initialization of value functions

The first challenge is the initialization problem. In the first step, all the values $\bar{V}_t^0$ of being in a state need to be initialized. It is claimed by [18] that, for a deterministic problem, an optimistic estimate of a value can guarantee the best solution, i.e., a lower value for a problem of minimization and a larger value for one of maximization. For a stochastic problem, an optimistic estimate
leads to wider exploration, which results in a better solution. Since ours is a maximization problem, our goal is to find the upper bound.

Lemma 2. For $t = 1, 2, 3, \dots, T - 1$, $V_t(S) \le 2 \min\{ |A(S)|,\ T - t \}$.
Lemma 2 indicates two upper bounds of the value function. They come from the assumptions that each time slot admits only one patient and that there is no more than one call during any given call-in period. The upper bounds can provide some reference points during initialization.

3.1.2. Exploration vs. exploitation

Another challenge associated with the algorithm concerns the probability of taking a myopically optimal action, $p$, in Step 2d. The probability refers to the “exploration vs. exploitation” problem. Exploration gathers better information about the value of being in a state, regardless of whether the chosen decision is the optimal one given the current information. Exploitation proceeds from currently optimal decisions. A purely exploitative policy leads to a local optimum [18]. This paper simply employs some strategies involving deterministic probabilities. In Section 4, the effects of such probabilities are investigated.

3.1.3. Setting of step-size

Another significant challenge relates to how one should set the step-size, $\alpha_k$, in Step 2e. The step-size determines the relative importance of the averaged contributions and of the current observation. There are three different step-size rules: the constant rule, the deterministic rule, and the stochastic rule. In a constant rule, the value of $\alpha_k$ does not change over the iterations, e.g., $\alpha_k = 0.05$. In a deterministic rule, $\alpha_k$ does not change with the observations [21]. There are many kinds of deterministic rules [18], such as the generalized harmonic (GH) step-size rule, the Polynomial Learning Rate rule, and McClain’s Formula. With respect to GH step-sizes, a widely used rule, Eq. (4) is chosen to investigate the proposed model:

$$\alpha_{k-1} = \frac{a}{a + k - 1} \tag{4}$$

where $a$ is a parameter that should be chosen carefully to determine the value of GH step-sizes (see Section 4.1). In contrast to deterministic rules, stochastic rules adapt to the observations [21]. Our approach uses the bias-adjusted Kalman filter (BAKF) step-sizes proposed by George and Powell [22]:

$$\alpha_{k-1} = 1 - \frac{(\bar{\sigma}^2)^k}{\bar{\nu}^k} \tag{5}$$

where $(\bar{\sigma}^2)^k$ and $\bar{\nu}^k$ denote the estimated variance of the error and of the bias after iteration $k$ respectively. $\bar{\nu}^k$ can be estimated recursively by

$$\bar{\nu}^k = (1 - \eta_{k-1})\, \bar{\nu}^{k-1} + \eta_{k-1} (\varepsilon^k)^2 \tag{6}$$

where $\eta_k$ is computed by using $\eta_k = \frac{a}{a + k}$, and $\varepsilon^k$ is the temporary error $\bar{V}^{k-1} - \bar{V}^k$.
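The two tunable settings discussed in Sections 3.1.2 and 3.1.3 can be made concrete with a short Python sketch. The function names are hypothetical; the sketch covers only the generalized harmonic rule of Eq. (4), the smoothing update of Step 2e, and a fixed exploitation probability $p$, and it does not attempt the BAKF rule. Picking uniformly among the non-greedy actions is an assumption, since the text only says that another action is taken.

```python
import random


def gh_stepsize(k, a):
    """Generalized harmonic step-size alpha_{k-1} = a / (a + k - 1), Eq. (4); k starts at 1."""
    return a / (a + k - 1)


def smooth(old_estimate, observation, alpha):
    """Step 2e for the visited state: blend the old estimate with the new observation."""
    return (1.0 - alpha) * old_estimate + alpha * observation


def choose_action(values, p, rng):
    """Take the myopically best action with probability p, otherwise another one.

    values maps each candidate action to its estimated value.
    """
    best = max(values, key=values.get)
    others = [action for action in values if action != best]
    if others and rng.random() >= p:
        return rng.choice(others)          # exploration
    return best                            # exploitation


# With a = 1 the rule reduces to 1/k, so smoothing yields the running mean.
estimate = 0.0
for k, observation in enumerate([2.0, 4.0, 6.0], start=1):
    estimate = smooth(estimate, observation, gh_stepsize(k, a=1.0))
print(estimate)

rng = random.Random(0)
print(choose_action({"slot_a": 1.2, "slot_b": 1.7}, p=0.9, rng=rng))
```

A larger $a$ keeps the step-size close to 1 for longer, so early observations are discounted more slowly; this is the trade-off examined in Section 4.1.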
3.2. Value function approximation algorithm

Another method to reduce the computational space is to adopt a value function approximation algorithm, which uses certain parametric models to estimate value functions. The value function can be considered as a linear or non-linear combination of basis functions, e.g., a linear model takes the form [23]:

$$\bar{V}(S) = w_0 + \sum_{h=1}^{H} w_h \phi_h(S) \tag{7}$$

where $H$ denotes the number of basis functions. It is expensive to estimate the parameters, $w_h$, by using classical statistics. Let $W$ be a $(1 + H)$-vector, $W = (w_0, w_1, \dots, w_H)^T$. Powell [18] presents the following recursive formula (Eq. (8)) to update the vector $W$.

$$W^k = W^{k-1} - Q^k \phi^k \varepsilon^k \tag{8}$$

where $\phi$ is a $(1 + H)$-vector, $\phi = (1, \phi_1, \phi_2, \dots, \phi_H)^T$ (here $\phi_h(S)$ is simplified to $\phi_h$ for notational convenience); $Q$ is a $(1 + H)$ by $(1 + H)$ matrix; and $\varepsilon$ is a scalar. $Q$ and $\varepsilon$ are updated using the following equations.

$$Q^k = \frac{1}{\gamma^k} B^{k-1};$$
$$\varepsilon^k = \bar{V}_s(W^{k-1}) - v^k;$$
$$B^k = B^{k-1} - \frac{1}{\gamma^k} \big( B^{k-1} \phi^k (\phi^k)^T B^{k-1} \big);$$
$$\gamma^k = 1 + (\phi^k)^T B^{k-1} \phi^k.$$

In the above equations, $\gamma$ is a scalar and $B$ is a $(1 + H)$ by $(1 + H)$ matrix, which is initialized by using $B^0 = \delta E$ ($E$ is the identity matrix, and $\delta$ is a small constant, e.g., it equals 0.1 in our numerical study in Section 4.3). A notable feature of this algorithm concerns the design of basis functions, which can be used to capture all the features of the value function. Since the value of being in a state in the proposed model is closely related to the situation with regard to the available time slots, the following four sets of basis functions are proposed.

$$\phi_m^1 = \sum_{j=1}^{N} s_{mj}, \quad \text{for } m = 1, 2, \dots, M \tag{9}$$
$$\phi_n^2 = \sum_{i=1}^{M} s_{in}, \quad \text{for } n = 1, 2, \dots, N \tag{10}$$
$$\phi_m^3 = \sum_{i=1}^{m} \sum_{j=1}^{N} s_{ij}, \quad \text{for } m = 1, 2, \dots, M \tag{11}$$
$$\phi_n^4 = \sum_{j=1}^{n} \sum_{i=1}^{M} s_{ij}, \quad \text{for } n = 1, 2, \dots, N \tag{12}$$

Linear basis functions $\phi^1$ and $\phi^2$ represent the number of unavailable slots belonging to a physician and to a time period, respectively. $\phi^3$ and $\phi^4$ are non-linear basis functions that are also related to the number of available slots. While implementing the value function approximation algorithm, the equation in Step 2d in Algorithm 1 is replaced by Eq. (13).

$$v = \max\Big\{ \max_{(m,n) \in R,\ (i,j) \in A(S)} \Big[ p_R^{ij} \Big( w_0 + \sum_{h=1}^{H} w_h \phi_h(S) + C_{mn}^{ij} \Big) + \big(1 - p_R^{ij}\big) \big( \bar{V}_{t+1}^{k-1}(S) - l_2 \big) \Big],\ \bar{V}_{t+1}^{k-1}(S) - l_1 \Big\} \tag{13}$$

4. Numerical experiments

This section describes some experiments performed to evaluate the performance of the proposed algorithms. The computations were performed in MATLAB R2009b in the following computational environment: I5-2400 CPU 3.10 GHz, 4.00 GB RAM, Windows 7 Enterprise Edition.

4.1. Effects of step-sizes

Consider now the experiment conducted to examine a sub-department with two physicians, each of whom has four time slots. Let $l_1 = 0.1$, $l_2 = 0.08$, $T = 10$, $r = 1$, $r_1 = 1/2$, and $r_2 = 1/3$. It was assumed that there was no more than one preference for each patient. First, the simulation-based algorithm was implemented up to 10,000 iterations. Fig. 2a shows the step-size values under different rules during the iterations, where GH step-sizes are labeled in the form “GH – the value of parameter $a$”. The BAKF step-sizes varied randomly and tended to converge to zero because the estimates of the variance of the error and of the bias were likely to be equal. However, GH step-sizes converged rapidly to zero. The smaller the parameter $a$, the faster the convergence to zero.

The estimated value of $V_0(\mathbf{0})$ (the expected revenue at the beginning, where $\mathbf{0}$ is an $M \times N$ matrix of zeros) was used as an example to illustrate the convergence behaviors during the iterations, that is, the value of $V_0(\mathbf{0})$ was the objective value. Fig. 2b plots the convergence behaviors resulting from different step-sizes. The value of the benchmark was derived by solving Eq. (2) directly using backward iterations. Table 1 lists the computational times, the objective values, the differences between the means and the corresponding benchmarks, and the numbers of iterations leading to final convergence. The first column indicates the algorithms in the form “algorithm name (SBA)-step-size and parameter-exploitation probability-iteration times” or “algorithm name (VFA)-basis functions-iteration times”. As shown in the table, computational time was almost the same for the same number of iterations. BAKF step-sizes yielded the best convergence behavior, which stabilized after 5000 iterations, and the result was much better than any of those obtained with the constant or GH step-sizes.

As for GH step-sizes, larger values of parameter $a$ led to superior convergence behaviors, that is, they converged and stabilized quickly. However, the constant and GH step-sizes did not converge even when the number of iterations reached 10,000. Hence, even more iterations (100,000 cycles) were conducted. As shown in Table 1, the constant step-size method converged after 50,000 iterations, whereas GH still did not. The reason was that when the iteration number $k$ became large, the GH step-sizes, $\alpha_{k-1} = \frac{a}{a + k - 1}$, became very small and almost equal to zero. Table 1 indicates that the results following 10,000 and 100,000 iterations were quite close: 13.97 and 12.59 respectively. Note that the final 90,000 iterations converged very slowly. Although the constant step-size ($\alpha = 0.05$) method seemingly exhibited a higher convergence rate in comparison to GH, it was less stable.

4.2. Comparison between exploration and exploitation
In this subsection, different exploitation probabilities were tested to compare the effects of exploration and exploitation. In Fig. 3, a smaller exploitation probability gave a higher convergence rate, while a larger probability was associated with better stabilization. When the probability was 0.9, the result was closest to the benchmark. Table 1 shows the difference between the objective values and the benchmark. The computational time was unaffected by the exploitation probability. The columns labeled “Variance” and “Bias” in Table 1 list the variances of the error and the bias after 10,000 iterations, respectively. However, the question of how one can balance exploration and exploitation still does not have a clear answer. Powell [18] tried to provide some theories that might be put into practice.
Fig. 2. Effects of different step-sizes. (a) The value of step-size $\alpha$ under different settings. (b) Convergence behaviors under different step-sizes.
Fig. 3. Exploration vs. exploitation.
Fig. 4. Algorithm comparison.
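The basis functions of Eqs. (11) and (12) and the recursive update of Eq. (8), which underlie the VFA variants compared in Section 4.3, can be sketched in plain Python. The function names are hypothetical, the initial matrix is taken as 0.1 times the identity as mentioned in Section 3.2, and the training signal in the usage lines is an arbitrary illustrative number rather than data from the paper.

```python
def features(S):
    """phi = (1, phi^3_1..phi^3_M, phi^4_1..phi^4_N): cumulative booked-slot counts."""
    M, N = len(S), len(S[0])
    row_sums = [sum(row) for row in S]
    col_sums = [sum(S[i][j] for i in range(M)) for j in range(N)]
    phi3 = [sum(row_sums[:m]) for m in range(1, M + 1)]   # Eq. (11)
    phi4 = [sum(col_sums[:n]) for n in range(1, N + 1)]   # Eq. (12)
    return [1.0] + phi3 + phi4


def predict(W, phi):
    """Linear value estimate of Eq. (7): inner product of weights and features."""
    return sum(w * f for w, f in zip(W, phi))


def rls_update(W, B, phi, v):
    """One recursive least-squares step in the form of Eq. (8); returns the new (W, B)."""
    n = len(W)
    B_phi = [sum(B[r][c] * phi[c] for c in range(n)) for r in range(n)]
    gamma = 1.0 + sum(phi[r] * B_phi[r] for r in range(n))
    eps = predict(W, phi) - v                         # error of the current estimate
    W_new = [W[r] - B_phi[r] * eps / gamma for r in range(n)]
    # B stays symmetric, so B phi phi^T B equals the outer product of B_phi with itself.
    B_new = [[B[r][c] - B_phi[r] * B_phi[c] / gamma for c in range(n)]
             for r in range(n)]
    return W_new, B_new


S = [[1, 0],
     [0, 1]]
phi = features(S)
W = [0.0] * len(phi)
B = [[0.1 if r == c else 0.0 for c in range(len(phi))] for r in range(len(phi))]
for _ in range(500):
    W, B = rls_update(W, B, phi, v=3.0)               # arbitrary illustrative target
print(phi, round(predict(W, phi), 3))
```

Because the update never inverts a matrix, each iteration costs on the order of $(1 + H)^2$ operations, which is one reason the choice and number of basis functions affect both accuracy and running time.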
4.3. Algorithm comparisons

In this subsection, the SBA was compared with the VFA. In Fig. 4, the estimation of $V_0(\mathbf{0})$ was also used as an example to show the convergence behaviors of SBA and VFA with different combinations of basis functions. Generally, VFA exhibited a faster convergence rate and reached a value near the benchmark in about 2500 iterations. Additionally, the convergence behavior of VFA was more stable than that of SBA. Furthermore, VFA with different basis functions resulted in different convergence behaviors. Linear basis functions $\phi^1$ and $\phi^2$ did not appear to converge to a value within 10,000 iterations, and worse, the results achieved after 3000 iterations went below the benchmark and tended to go down further. As a result, basis functions $\phi^1$ and $\phi^2$ could not fully represent the features of states. By contrast, $\phi^3$ and $\phi^4$ led to good convergence behaviors; they stabilized and approached the benchmark within just 1500 iterations. Compared to other algorithms, this one was extremely effective and efficient. However, when all the proposed basis functions ($\phi^1 \phi^2 \phi^3 \phi^4$) were used, the result was unsatisfactory since it was not superior even to $\phi^3 \phi^4$. This might be due to the negative effects of $\phi^1$ and $\phi^2$. The question of how one should design the basis functions remains a problematic area in ADP.

Table 1. Comparisons of algorithms.

| Algorithm | Time (s) | Objective value | Difference (%) | Variance | Bias | Iterations for convergence |
|---|---|---|---|---|---|---|
| SBA-C-0.05-0.7-10000 | 549.38 | 15.36 | 27.8 | – | – | – |
| SBA-C-0.05-0.7-100000 | 5182.34 | 11.32 | −5.8 | – | – | 50,000 |
| SBA-GH-50-0.7-10000 | 540.17 | 16.44 | 36.8 | – | – | – |
| SBA-GH-100-0.7-10000 | 542.36 | 15.01 | 24.9 | – | – | – |
| SBA-GH-150-0.7-10000 | 536.48 | 13.97 | 16.2 | – | – | – |
| SBA-GH-150-0.7-100000 | 5135.55 | 12.59 | 4.7 | – | – | – |
| SBA-BAKF-150-0.7-10000 | 542.12 | 11.09 | −7.7 | 0.1793 | 0.0291 | 5000 |
| SBA-BAKF-150-0.8-10000 | 552.80 | 11.79 | −1.9 | 0.1584 | 0.0058 | 5000 |
| SBA-BAKF-150-0.9-10000 | 539.34 | 12.06 | 0.3 | 0.1635 | 0.0080 | 5000 |
| VFA-$\phi^1\phi^2$-10000 | 420.69 | 10.97 | 8.7 | 0.5721 | 0.0951 | – |
| VFA-$\phi^3\phi^4$-10000 | 499.70 | 12.34 | 2.7 | 0.8193 | 0.094 | 1500 |
| VFA-$\phi^{1-4}$-10000 | 563.59 | 12.64 | 5.2 | 0.6804 | 0.1396 | 3000 |

Fig. 5. Effects of the number of preferences. (a) Results and computational time of the number of preferences. (b) Convergence behavior with the different number of preferences.

As shown in Table 1, $\phi^1 \phi^2$ required less computational time
than the other two combinations. The variance of the error of VFA was larger than that of SBA.

4.4. Effects of the number of preferences

To improve patient satisfaction levels, a patient was permitted to provide several preferences, that is, $|R| > 1$. In this subsection, a smaller system, consisting of two physicians each with two time slots, was used to test the effects of the number of preferences. Set $T = 6$. Let the number of preferences vary from 0 to 9. Using the adjusted “SBA-BAKF-50-0.9-3000” algorithm, Fig. 5a showed that both the computational time and the expected revenue increased with the number of preferences. The reason was that schedulers had more choices when patients provided more preferences, which also led to more computational time. At the extreme, when the number of preferences reached the maximal value (9 in this example), i.e., patients could accept any physician and any time slot, the expected revenue rose to the maximum value (8 in this example). Fig. 5b showed the convergence behaviors with different numbers of preferences. A greater number of preferences resulted in faster convergence.

5. Discussion

This paper has focused on appointment scheduling with patient preferences. Hence, a critical step while implementing the proposed appointment systems concerns the collection of patient preference and choice information. In China, more and more patients make appointments before seeing a doctor. Some hospitals have their own appointment systems. For instance, the outpatient department of the cardiovascular hospital discussed in this paper allows patients to choose a particular physician. However, this appointment system is based simply on first come, first served. The result is that the needs of future patients are not considered. In addition, in this appointment system, patients cannot ask for a particular time slot. Hence, an appointment just ensures that they can receive medical service on that day.
As a result, patients have to arrive at the hospital early and wait a long time for medical service.

Our simulation-based algorithm iterates the value of being in a state by traversing a large number of sample paths. Several settings need serious consideration, e.g., initialization of the value function, exploration vs. exploitation, and step-size. These settings complicate the algorithm formulation because any change in a setting significantly affects the accuracy of the final results. In order to avoid this disadvantage of ADP, another group of approximate algorithms has been developed recently. This method transforms the MDP model into a linear programming (LP) model, and then solves the latter in view of its simplicity [24,25]. However, whether the LP model is solvable depends greatly on the scale of the model (the number of variables and constraints). A decomposition method has to be used to reduce the number of variables. Hence, the method is often called a decomposition-based algorithm. The advantage of the decomposition-based algorithm is that it sidesteps iterative learning [18]. However, the simulation-based algorithm is more appropriate for large-scale problems, since there is no need to handle a mathematical programming model.

6. Conclusion and future work

A well-developed appointment system can help increase the utilization of medical facilities in an outpatient department. This paper proposed adaptive algorithms for sequential appointment scheduling in an outpatient department. Taking patient preferences into account, a revenue evaluation model was presented. It ignored financial income and focused on the satisfaction level of patients. A patient’s preference encompassed a preferred physician and a convenient time slot. In this context, an MDP model was proposed for scheduling sequential appointments. Patients were scheduled to a slot with a view to maximizing the expected revenue of the whole department. To avoid the curse of dimensionality, several adaptive algorithms were developed, including a simulation-based algorithm and a value function approximation algorithm. These algorithms did not try to compute the value of being in a state directly, but approximated it during each iteration and eventually converged to it. In practice, each iteration represented the booking process for a workday.
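The iterative approximation described above can be illustrated with a minimal sketch. It is not the authors' implementation: the demand model (each arriving patient names `n_prefs` acceptable slots uniformly at random), the unit reward per booked patient, and all function and parameter names are assumptions made for illustration. A simple harmonic step-size (1/n) is swapped in for the BAKF step-size used in the paper, and the default `greedy_prob=0.9` mirrors the exploitation probability reported in the experiments.

```python
import random
from collections import defaultdict


def train_value_table(n_slots=4, horizon=6, n_prefs=2, iterations=3000,
                      greedy_prob=0.9, seed=0):
    """Toy simulation-based ADP: estimate V[(period, state)] from sampled days."""
    rng = random.Random(seed)
    value = defaultdict(float)   # estimated value of being in (period, state)
    visits = defaultdict(int)    # visit counts, used for the 1/n step-size

    for _ in range(iterations):              # one iteration = one simulated workday
        state = (1,) * n_slots               # one free unit of capacity per slot
        for t in range(horizon):
            # Hypothetical demand model: the patient accepts n_prefs random slots.
            prefs = rng.sample(range(n_slots), n_prefs)
            feasible = [s for s in prefs if state[s] > 0]

            def after(slot):
                # State reached if the patient is booked into `slot` (None = reject).
                if slot is None:
                    return state
                return state[:slot] + (0,) + state[slot + 1:]

            def score(slot):
                # Immediate reward (1 per booked patient) plus estimated future value.
                reward = 0.0 if slot is None else 1.0
                return reward + value[(t + 1, after(slot))]

            actions = feasible + [None]
            best = max(actions, key=score)

            # Smooth the sampled estimate into the table (harmonic step-size).
            key = (t, state)
            visits[key] += 1
            alpha = 1.0 / visits[key]
            value[key] = (1 - alpha) * value[key] + alpha * score(best)

            # Exploit with probability greedy_prob, otherwise explore at random.
            chosen = best if rng.random() < greedy_prob else rng.choice(actions)
            state = after(chosen)
    return value


if __name__ == "__main__":
    table = train_value_table()
    print("Estimated value of an empty schedule:", table[(0, (1, 1, 1, 1))])
```

In this sketch the table is indexed by period and remaining capacity, so it only scales to small systems such as the two-physician, two-slot example; larger systems are where a value function approximation over basis functions would replace the lookup table.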
When using these algorithms, two important issues arose, i.e., the number of iterations needed for convergence and whether the converged point was equal or close to the benchmark. For the simulation-based algorithm, three points should be carefully designed: initialization, exploration vs. exploitation, and step-size. Experiments showed the convergence behaviors of the algorithm with different settings. The BAKF step-size led to the best convergence behavior. As for the value function approximation algorithm, two critical points, i.e., how to design basis functions and how to iterate the parameters, should be taken into serious account. Four sets of basis functions were proposed to capture the features of the state. In the experiments, several combinations of basis functions were illustrated. It was apparent that the combination of the third and fourth basis
functions resulted in the best convergence behavior. Finally, the effects of the number of preferences were investigated. Schedulers should strike a balance between the accuracy of the result and the computational time required.

The proposed model only considers the scheduling problem for a particular workday. The appointments for different workdays are assumed to be independent. However, this is not always the case. Besides, maximizing the revenue of each workday separately does not guarantee the maximal revenue over a period of time. Therefore, further work should be done to enhance the proposed model. Another future research direction is to investigate how the exploration probability should be set and how basis functions should be designed in order to improve the ADP algorithm.

Acknowledgments

The research is partially supported by the Research Grants Council (RGC) of Hong Kong under Grant No. T32-102/14-N. The authors are grateful to the anonymous referees and the Editor-in-Chief for their constructive comments on earlier versions of the paper.

References

[1] Sheng L, et al. China statistical yearbook. Beijing: China Statistics Press; 2013.
[2] Gupta D, Wang L. Revenue management for a primary-care clinic in the presence of patient choice. Oper Res 2008;56(3):576–92.
[3] Muthuraman K, Lawley M. A stochastic overbooking model for outpatient clinical scheduling with no-shows. IIE Trans 2008;40(9):820–37.
[4] Lin J, Muthuraman K, Lawley M. Optimal and approximate algorithms for sequential clinical scheduling with no-shows. IIE Trans Healthcare Syst Eng 2011;1(1):20–36.
[5] Luo J, Kulkarni VG, Ziya S. Appointment scheduling under patient no-shows and service interruptions. Manuf Serv Oper Manage 2012;14(4):670–84.
[6] Huang Y, Zuniga P. Effective cancellation policy to reduce the negative impact of patient no-show. J Oper Res Soc 2014;65(5):605–15.
[7] Robinson L, Chen R. A comparison of traditional and open-access policies for appointment scheduling. Manuf Serv Oper Manage 2010;12(2):330–46.
[8] Min D, Yih Y. An elective surgery scheduling problem considering patient priority. Comp Oper Res 2010;37(6):1091–9.
[9] Van Ryzin G. Revenue management under a general discrete choice model of consumer behavior. Manage Sci 2004;50(1):15–33.
[10] Zhang D, Cooper WL. Revenue management for parallel flights with customer-choice behavior. Oper Res 2005;53(3):415–31.
[11] Chen L, Homem-de-Mello T. Mathematical programming models for revenue management under customer choice. Eur J Oper Res 2010;203(2):294–305.
[12] Shen Z-JM, Su X. Customer behavior modeling in revenue management and auctions: a review and new research opportunities. Prod Oper Manage 2007;16(6):713–28.
[13] Wang W-Y, Gupta D. Adaptive appointment systems with patient preferences. Manuf Serv Oper Manage 2011;13(3):373–89.
[14] Qu X, Shi J. Modeling the effect of patient choice on the performance of open access scheduling. Int J Prod Econ 2011;129(2):314–27.
[15] Vermeulen I, Bohte S, Bosman P, Elkhuizen S, Bakker P, La Poutré J. Optimization of online patient scheduling with urgencies and preferences. Artif Intell Med 2009:71–80.
[16] Gupta D, Denton B. Appointment scheduling in health care: challenges and opportunities. IIE Trans 2008;40(9):800–19.
[17] Patrick J, Puterman ML, Queyranne M. Dynamic multipriority patient scheduling for a diagnostic resource. Oper Res 2008;56(6):1507–25.
[18] Powell WB. Approximate dynamic programming: solving the curses of dimensionality. 2nd ed. Hoboken, NJ: John Wiley & Sons; 2011.
[19] Si J, Barto A, Powell W. Handbook of learning and approximate dynamic programming. Piscataway, NJ: IEEE Press; 2004.
[20] O’Hare C, Corlett J. The outcomes of open-access scheduling. Fam Pract Manage 2004;11(2):35–8.
[21] He M, Zhao L, Powell WB. Approximate dynamic programming algorithms for optimal dosage decisions in controlled ovarian hyperstimulation. Eur J Oper Res 2012;222(2):328–40.
[22] George A, Powell W. Adaptive stepsizes for recursive estimation with applications in approximate dynamic programming. Mach Learn 2006;65(1):167–98.
[23] Bertsekas D, Tsitsiklis J. Neuro-dynamic programming. NH: Athena Scientific; 1996.
[24] Kunnumkal S, Topaloglu H. A new dynamic programming decomposition method for the network revenue management problem with customer choice behavior. Prod Oper Manage 2010;19(5):575–90.
[25] Meissner J, Strauss A, Talluri K. An enhanced concave program relaxation for choice network revenue management. Prod Oper Manage 2013;22(1):71–87.