Heavy traffic analysis of state-dependent parallel queues with triggers and an application to web search systems

Heavy traffic analysis of state-dependent parallel queues with triggers and an application to web search systems

Performance Evaluation 67 (2010) 913–928 Contents lists available at ScienceDirect Performance Evaluation journal homepage: www.elsevier.com/locate/...

830KB Sizes 1 Downloads 22 Views

Performance Evaluation 67 (2010) 913–928

Contents lists available at ScienceDirect

Performance Evaluation journal homepage: www.elsevier.com/locate/peva

Heavy traffic analysis of state-dependent parallel queues with triggers and an application to web search systems Saul C. Leite, Marcelo D. Fragoso ∗ Department of Systems and Control, National Laboratory for Scientific Computing (LNCC), Av. Getulio Vargas 333, Petropolis, RJ, CEP:25651-075, Brazil

article

info

Article history: Received 9 January 2009 Received in revised form 1 March 2010 Accepted 4 March 2010 Available online 2 April 2010 Keywords: Queueing theory Heavy traffic approximation Stochastic models Parallel systems Web search systems

abstract In this paper, we consider the approximation of the number of customers of a Poissontype parallel system of two queues, operating under heavy traffic, by a reflected stochastic differential equation. The type of queueing system here has the so-called fork-join structure. In addition, the model allows for state-dependent (service and arrival) rates. A novelty in our study is that a signal named ‘‘trigger’’ is considered. These signals are used to move a customer from one queue to the other. This, in turn, can be used to reduce the imbalance (and as a consequence, the system response time) which appears in certain parallel systems. The results here are applied to a web search system, which is a relevant example of a parallel system which operates under heavy traffic and suffers from imbalance among servers. A stochastic optimal control problem is formulated in order to find the best routing policy of signals which reduces the imbalance. The optimal control was tested in a simulation, indicating significant reduction in system response time when compared to the uncontrolled system. © 2010 Elsevier B.V. All rights reserved.

1. Introduction Consider a queueing network composed of K stations in parallel. Each arriving job at this system is forked into K tasks, and each task is assigned to one of the K stations. At each station, the tasks join a queue and wait for processing. Jobs only leave the system when all of its K tasks complete service. This type of queueing system is called fork-join, and it arises naturally as a model of parallel processing systems (see [1] for a review). In several applications, the response time of parallel processing systems is of critical importance, and usually one desires to reduce it as much as possible. Since a job only leaves the system when each of its K tasks have finished, the response time will be greater when the system is imbalanced. Due to the stochastic nature of these systems, an imbalance can appear in heterogeneous networks (i.e., networks comprised of stations with different processing speeds) or even in homogeneous systems. In order to reduce the imbalance, it is natural to consider a strategy that distributes jobs optimally between the queues. There are several ways to accomplish this. The control can be performed at arrival time at an external node by assigning more than one task to a processor if another is overloaded. This strategy is usually called centralized control. Alternatively, the control can be performed by each processor individually, and this is called distributed control. In this case, the control can be receiver-initiated, where the underloaded station asks other processors for work, or sender-initiated, where an overloaded station moves work to other processors [2]. Distributed control seems, prima facie, to be the most appropriate control strategy for reducing the response time of forkjoin systems. This is due to how job priority can be implemented in each scheme. Since each station in a fork-join system receives jobs at the same instant, when the system is imbalanced, it means that there are jobs waiting for delayed tasks. To reduce the response time, an underloaded station should give priority to the tasks that arrived first. In a centralized control



Corresponding author. Tel.: +55 24 2233 6013. E-mail addresses: [email protected] (S.C. Leite), [email protected] (M.D. Fragoso).

0166-5316/$ – see front matter © 2010 Elsevier B.V. All rights reserved. doi:10.1016/j.peva.2010.03.002

914

S.C. Leite, M.D. Fragoso / Performance Evaluation 67 (2010) 913–928

a

b

c

d

Fig. 1. Diagram of two stations in parallel. (a) Queues before the arrival of the fourth job. (b) A system with no balancing scheme. (c) A system with a central control scheme. (d) A system with a distributed control scheme.

scheme, the underloaded stations receive tasks of the jobs that arrived last, as opposed to a distributed control scheme, where priority can be given to older tasks. In order to illustrate this statement, suppose that we have two stations (labeled A and B) in parallel, each receiving a fraction (or task) of an arriving job. Consider a scenario where three jobs arrive and station A is able to finish its tasks before station B finishes its first task (see Fig. 1(a) for reference). Following that, a fourth job arrives and we consider three different cases. In the first case, given by Fig. 1(b), we have a system with no balancing strategy. In the second case, Fig. 1(c), we have a system with centralized control, which assigns at arrival time both tasks of job 4 to the same station A. Finally, in Fig. 1(d), we have a system with receiver-initiated distributed control, where each task is assigned to its corresponding queue but the underloaded queue may move tasks from the other queue in order to keep the system balanced. Although both systems with control balance the queues, only the distributed control can give priority to tasks that arrived first. In queueing theory, this type of ‘‘communication’’ between stations was introduced by Gelenbe in [3]. These systems belong to a larger class of queueing systems with signals called G-networks (see [4] for a review on the subject). This type of signal, where a station moves a task from one queue to the other, is commonly referred to as a ‘‘trigger’’. Moving tasks between processing stations may be costly, either because of communication delay or due to delay in computation (i.e., each station may be specialized at their fraction of the job, but they can work on a neighboring station’s task at an additional cost in computation). Therefore, the question of identifying the ideal moment to move a customer is not trivial. The problem becomes more delicate when these systems are operating under their critical level of stability, which is characterized by job arrival rates being close to server processing rates. Under this condition, large numbers of tasks can fluctuate rapidly at any given station, adding more complexity to the problem. In this paper, we are interested in finding good policies to balance parallel processing systems operating under heavy traffic using the receiver-initiated load balancing scheme. The approach taken is to characterize these systems in their limit working conditions by a diffusion. Using this characterization, the problem of finding the optimal balancing policy can be framed into an optimal stochastic control problem. There is by now a wide range of applications of diffusion approximation to handle complex problems in physics, biology, economics, inter alia (see, e.g., [5–9]). In general, it incorporates into the model some key limiting condition. An interesting feature of this approximation is that it strips away unessential details and reveals key features determining performance, i.e., unessential details appears as noise, described by a Brownian motion. In addition, in this setting, we can make use of a huge amount of results from the theory of diffusion processes in the solution of many real problems. Due, in part, to the immense success of this powerful approach in dealing with problems which would be intractable if an exact model were used, its application has been broadened to many areas, including queueing theory. In the queueing scenario, diffusion approximation provides a simple description for complicated queueing processes and reveals the impact of variability upon queueing performance (see, e.g., [10,11] for comprehensive accounts on this subject). Diffusion approximations for queueing systems, which are obtained via a powerful method dubbed heavy traffic analysis, have their roots in the early 1960s with the seminal works of Kingman [12], Prohorov [13] and Borovkov [14,15]. The technique gets its name from the required assumption that the job arrival rates are close to the service rates, which is a common setting in computer systems. Using this approximation, one is able to describe the transient evolution of these systems by a reflected stochastic differential equation. There is no (convenient) closed-form expression to describe the dynamics of queueing systems, and for that reason these approximations are very useful. An alternative would be to describe the time evolution of queueing systems using a simulation. However, simulations become impractical (due to computational

S.C. Leite, M.D. Fragoso / Performance Evaluation 67 (2010) 913–928

915

costs) in some cases. In addition, heavy traffic analysis adds mathematical information regarding the limiting behavior of heavily loaded systems. In some situations, when controls are involved, it is desirable to have the control operating adequately under limiting working conditions. These approximations can be useful even if the actual queueing system does not experience heavy traffic (see [10,11]). In this paper, the heavy traffic approximation for a G-network with triggers composed of two stations, where the service rate, arrival rates and routing probabilities of signals are state-dependent, is derived. We adopt the approach used in [16], that is, the system is described in a similar fashion to what is done in [17], and some tools from [10] are used to achieve the limit equation. To the best of our knowledge, queueing systems with triggers have not yet been treated under heavy traffic analysis. There is an extensive list of articles in the literature dealing with fork-join queueing systems. Most of them focus on performance bounds (see, for example, [18–21]). Diffusion approximations for these systems have been presented in [22–24]. Also, a similar problem of assigning jobs to processors in parallel has been considered by several authors under the name of ‘‘resource pooling’’. Usually, in this approach, the workload of the system is considered as a ‘‘system invariant’’ and a pathwise asymptotically optimal policy is sought (see, for instance, [25–27,10,28]). A key difference in our approach is the introduction of triggers, which move customers between queues, as opposed to dealing with the problem of assigning processor time to different queues. Web search systems are good examples of parallel systems which operate under heavy traffic and suffer from an imbalance among servers, as pointed out by Badue et al. [29]. The heavy traffic model developed here is applied to this setting and a stochastic optimal control problem is formulated in order to find the best strategy to reduce the imbalance via signals. Although finding a solution for an optimal control problem is, in general, a rather difficult task, we take advantage here of the numerical method developed in [30], which can be used to calculate the optimal policy for our setting. To illustrate the results, a simulation was constructed using the data collected by Gonçalves et al. [31] from a real web search system. The results indicate a considerable reduction in the response time of the system using the optimal policy when compared to the uncontrolled system. An outline of the paper is as follows. In Section 2, the queueing model is discussed in detail and an useful decomposition of the process is given. This decomposition separates the terms which give rise to ‘‘noise’’ in the system and the terms which are ‘‘regular’’. Next, in Section 3, a scaled version of the queueing system is introduced. The scaled system is needed in order to achieve the heavy traffic limit. A more detailed motivation for the scaling is also discussed in this section. Section 4 is devoted to the derivation of the heavy traffic limit. In short, it is shown that, under certain conditions, the scaled number of tasks in station i, given by xni , converges in distribution to xi , which satisfies the equation xi (t ) = xi (0) +

t

Z

gi (x(s))ds + wi (t ) + zi (t ), 0

Rt

where 0 gi (x(s))ds is the drift term, which arises from the ‘‘regular’’ behavior of the system, wi is a Wiener process which models the ‘‘noise’’ in the system, and zi is usually called the ‘‘reflection process’’; it maintains the diffusion constrained to the desired state space. Section 5 discusses the application of the queueing model and its heavy traffic approximation in order to devise a policy for balancing tasks in a web search system. Also, this section discusses the simulation which was used to validate the numerical experiments. Section 6 gives the numerical experiments and results. Finally, some auxiliary results are given in the Appendix.

2. Queueing model Consider a parallel processing system where jobs come from the same source and are forked upon arrival (see Fig. 2 for reference). The server processes are supposed to be independent of each other, and independent of the arrival process. When a task is completed, the station from where it departed may send a signal (or a trigger) to another station. This signal is sent at the same instant of the task completion. Therefore, we use the interpretation that the completed task was ‘‘routed’’ as a signal, and we suppose that the decision of routing is state dependent. This signal has the effect of moving a task from the station which is receiving the signal to the station which has sent the signal. The model considered here is restricted to a network consisting of only two queues. The reason for this is to avoid a technical difficulty that appears in the proof of the characterization of the heavy traffic limit; a similar problem was encountered in [16]. This problem is discussed in more detail in Section 4, where the appropriate notation and terminology are introduced. Also, it is important to mention that finding numerical solutions to stochastic optimal control problems for systems containing several (i.e., hundreds, as is the case for modern parallel computer systems) state variables is nowadays impractical due to the methods currently available (see, for instance, [10,30]). Although we impose this restriction on the theoretical model, the balancing strategy devised for a system of two stations can be used in a system with several stations. The reason for this is that task movement is always performed between two queues. We discuss this in more detail in Section 5.1. Suppose that E (t ) denotes the number of job arrivals in the system by time t. Let Di (t ), for i ∈ {1, 2}, be the number of completed tasks at station i by time t. Also, define Ci (t ), for i ∈ {1, 2}, as the number of tasks that left station j (j 6= i) and

916

S.C. Leite, M.D. Fragoso / Performance Evaluation 67 (2010) 913–928

Fig. 2. Diagram of two parallel stations with forked arrival coming from the same source. The process E counts the number of arrivals in the system and Di the number of completed tasks at each station, i ∈ {1, 2}. Process C1 counts the number of customers leaving queue 2 and joining queue 1 due to the effect of a trigger.

joined station i by time t, due to the effect of a trigger. Hence, we can write the number of customers at station i by time t, denoted by Xi (t ), as Xi (t ) = Xi (0) + E (t ) − Di (t ) + Ci (t ) − Cj (t ),

(1)

for i, j ∈ {1, 2}, i 6= j, where Xi (0) is the initial condition of station i. Queue buffers are allowed to be finite and denoted by constants B˜ i ∈ N, for each station i. Infinite buffers can be considered by letting B˜ i = ∞. A job is only allowed to enter the system if both stations have space for its tasks. This way, the process X is constrained to the state space L , {ξ ∈ R2 : 0 ≤ ξi ≤ B˜ i , if B˜ i < ∞, and 0 ≤ ξi < ∞, if B˜ i = ∞, i = 1, 2} for every t ≥ 0. 2.1. Process definitions and stochastic primitives Let us introduce a common stochastic basis (Ω , F , P, Ft ), where all the driving processes, given below, are defined. Let Nid , for i ∈ {1, 2}, and N a be standard (rate 1) Poisson processes. Also, let Λa and Λdi be measurable functions from R2+ to R+ , where R+ , {ξ ∈ R|ξ ≥ 0}. Define the following state-dependent Poisson processes: t

Z

A(t ) , N a

 Λa (X (s))ds ,

0

Si (t ) ,

t

Z

Nid

Λdi



(X (s))ds ,

0

where X = (X1 , X2 )0 denotes the number of tasks in each station. Notice that the integrals given as arguments to N a and Nid work as random time changes. The functions Λa and Λdi can be thought of as state-dependent rates of the Poisson processes A and Si . Let the indicator function I{ξ ∈ B}, for B ⊂ R, be defined as I{ξ ∈ B} = 1 if ξ ∈ B and I{ξ ∈ B} = 0 otherwise. Then the process E (t ), which counts the number of arrivals in the system, and the process Di (t ), which counts the number of served tasks by time t, are defined by E (t ) ,

t

Z

I{X1 (s−) < B˜ 1 , X2 (s−) < B˜ 2 }dA(s), 0

Di (t ) ,

t

Z

I{Xi (s−) > 0}dSi (s). 0

The indicator functions inside the integrals are used to neglect arrivals when one of the buffers is full, and to neglect departures when the station is empty (see, for instance, [32] for more detail with respect to this notation). In order to define Ci , let Ici (t ) denote the indicator function of the event that a customer leaving station i at time t is ‘‘routed’’ to the other queue as a trigger. Hence, we define Ci (t ) ,

t

Z

Ici (s)I{Xj (s−) > 0, Xi (s−) < B˜ i }dDi (s). 0

S.C. Leite, M.D. Fragoso / Performance Evaluation 67 (2010) 913–928

917

The filtration Ft given as part of the stochastic basis in the beginning of this section is defined as the minimal σ -algebra that measures all driving processes defined above up to time t (Ft is assumed to satisfy the so-called usual assumptions: it is right continuous and contains all the P-null sets of F ). Also, let Ftr be defined as Ft except that it does not include the routing decision at the instant t. At this point, we will decompose the process X as a martingale and a predictable process. For that, the following assumptions need to be introduced: Assumption 1. For i ∈ {1, 2}, (a) let the random quantities Xi (0), N a , Nid be mutually independent; (b) let the functions Λa and Λdi be measurable and bounded;   (c) let there be a measurable function Qic : R2+ → [0, 1] such that E Ici (t ) Ftr = Qic (X (t −)). The boundedness of the functions Λa and Λdi guarantees that the jumping processes E, Di and Ci are non-explosive (see [32, Th. 8 pg. 27]). The function Qic (ξ ) can be interpreted as the probability of routing a task as a trigger, given that the current state of the system is ξ . The counting processes defined above have a martingale decomposition (see [17, pg. 625], and [32, Th. 8 pg. 27]); that is, there are Ft -martingales M a and Mid such that

ˆ i, and Mid = Di − D

M a = E − Eˆ

ˆ i are Ft -predictable processes (see [32, pg. 8]) called compensators, and where Eˆ and D Eˆ (t ) =

t

Z

I{Xi (s) < B˜ i , Xj (s) < B˜ j }Λa (X (s))ds, 0

ˆ i (t ) = D

t

Z

I{Xi (s) > 0}Λdi (X (s))ds. 0

t Define Mic (t ) , 0 Ici (s) − Qic (X (s−)) I{Xj (s−) > 0, Xi (s−) < B˜ i }dDi (s). Using the same argument as in [17, pg. 626], can be shown to be an Ft -martingale. Therefore, we have

R

Mic

Ci ( t ) =

Mic

(t ) +



t

Z

Qic (X (s−))I{Xj (s−) > 0, Xi (s−) < B˜ i }dDi (s) 0

Z t = Mic (t ) + Qic (X (s−))I{Xj (s−) > 0, Xi (s−) < B˜ i }dMid (s) 0 Z t c + Qi (X (s))I{Xi (s) > 0, Xj (s) > 0, Xi (s) < B˜ i }Λdi (X (s))ds. 0

Thus we arrive at the desired semi-martingale decomposition of X : Xi (t ) = Xi (0) + Bi (t ) + Mi (t ),

(2)

where Bi (t ) ,

t

Z

I{Xi (s) < B˜ i , Xj (s) < B˜ j }Λa (X (s)) − I{Xi (s) > 0}Λdi (X (s)) 0

+Qic (X (s))I{Xi (s) > 0, Xj (s) > 0, Xi (s) < B˜ i }Λdi (X (s)) − Qjc (X (s))I{Xj (s) > 0, Xi (s) > 0, Xj (s) < B˜ j }Λdj (X (s))ds, and Mi (t ) , M a (t ) − Mid (t ) + Mic (t ) +

t

Z

Qic (X (s−))I{Xj (s−) > 0, Xi (s−) < B˜ i }dMid (s) 0



Mjc

(t ) −

Z

t

Qjc (X (s−))I{Xi (s−) > 0, Xj (s−) < B˜ j }dMjd (s).

0

Notice that B is the sum of the predictable processes and M is the sum of the martingale terms. 2.1.1. Some remarks on the model Before proceeding to the derivation of the heavy traffic approximation, let us make an important consideration about the model defined in this section. Perhaps one thing that needs to be clear is the reason for restricting the model to Poissontype service processes. Consider the following line of events described by Fig. 3. At time tk , the kth task enters a station (say, station i), which was empty prior to the arrival of this task. The service time of the task is denoted by the random variable ∆k . At time τ , this task is moved to another queue due to a received signal. Following that, task k + 1 enters station i and begins service.

918

S.C. Leite, M.D. Fragoso / Performance Evaluation 67 (2010) 913–928

Fig. 3. A time line of events illustrating a discrepancy generated by removing a customer at service; see text for details.

The model would have to account for the discrepancy generated by the task k being moved during service. That is, we would need to compute tk + ∆k − tk+1 and use it appropriately in the model in order to fix this discrepancy. This problem is avoided when the service times are exponentially distributed by their memoryless property. In fact, using this property, it is easy to verify that P (∆k+1 > t ) = P ( tk + ∆k − tk+1 > t | ∆k > (tk+1 − tk )), and the residual time tk + ∆k − tk+1 can be considered as the service time for task k + 1. One may also argue that the above problem could be avoided if signals are only allowed to move customers from a station i at times t such that Xi (t ) > 1. However, with this approach, one would encounter a difficulty in arriving at a representation for the reflection process in the limit equation. This problem is discussed in full detail by Kushner in [10, pg. 277]. 3. Scaled queueing system Heavy traffic analysis of queueing systems relies on the fact that properly scaled sums of random variables centered around their means converge in distribution to Brownian motion, under broad conditions. The best-known version of this result is attributed to Donsker, and it is sometimes called Functional Central Limit Theorem. We shall use a more elaborated version in which a properly scaled martingale converges weakly to a Wiener process (see Theorem 6). In addition to introducing a scaling in space and time, one also needs to introduce the heavy traffic assumption. The way this is usually performed is by supposing that there is a sequence of queueing systems {X n } which approaches its limit utilization, that is, ρ n ↑√ 1 as n ↑ ∞, where ρ n is the traffic intensity parameter of the system. n n Let x (t ) , X (nt )/ n, for n ∈ N, where, for each n, X n is the process defined in the previous section, which satisfies Eq. (1). Any mathematical object defined with respect to X in the previous section is now indexed with the parameter √ c ,n d,n n (i.e., F n , Λa,n , Λi , Qi ). Also, let the buffer sizes satisfy B˜ ni = nBi , where Bi ∈ (0, ∞) ∪ {∞} is a given constant independent of n. We can rewrite Eq. (2) so that xni (t ) = xni (0) + bni (t ) + mni (t ),

(3)

√ where (0) = (0)/ n, bni is defined as Z t √ d,n n bi (t ) = n I{xni (s) < Bi , xnj (s) < Bj }λa,n (xn (s)) − I{xni (s) > 0}λi (xn (s)) xni

Xin

0

+ qic ,n (xn (s))I{xni (s) > 0, xnj (s) > 0, xni (s) < Bi }λdi ,n (xn (s)) − qjc ,n (xn (s))I{xnj (s) > 0, xni (s) > 0, xnj (s) < Bj }λdj ,n (xn (s))ds, (4) √ d,n √ c ,n √ d,n c ,n a,n a,n where a change of variable was used and we define λ (ξ ) , Λ ( nξ ), λi (ξ ) , Λi ( nξ ), qi (ξ ) , Qi ( nξ ), for ξ ∈ R2+ , and mni is defined as Z t c ,n c ,n d,n c ,n d,n qi (xn (s−))I{xnj (s−) > 0, xni (s−) < Bi }dmi (s) mni (t ) = ma,n (t ) − mi (t ) + mi (t ) − mj (t ) + 0 t

Z

d,n

c ,n

qj (xn (s−))I{xni (s−) > 0, xnj (s−) < Bj }dmj (s),

− 0



d,n

d,n



c ,n

(5) c ,n



where ma,n (t ) , M a,n (nt )/ n, mi (t ) , Mi (nt )/ n, and mi (t ) , Mi (nt )/ n. In the next section, we show that this scaled queueing system xn converges weakly to a reflected diffusion when n ↑ ∞, under certain conditions. 4. Heavy traffic approximation This section is devoted to the presentation of the main theorem of the paper. We show that the scaled number of customers in each station xni , defined in the previous section, converges weakly to a reflected stochastic differential equation, under certain conditions. As usual in the heavy traffic approach, conditions to ensure that the scaled system approaches heavy traffic as n increases are necessary. Notice that the following assumption introduces conditions on the service rates, arrival rate, and routing probabilities in such a way that, for large n, the rate of customers entering the system is close to the rate of customers leaving the system, which characterizes the heavy traffic regime. Assumption 2. Assume that (a) there exists constants r a , rid ∈ (0, ∞) and ric ∈ [0, 1), and bounded and continuous functions f a , fid and fic , i ∈ {1, 2}, such that, for any ξ ∈ R2+ , we have

S.C. Leite, M.D. Fragoso / Performance Evaluation 67 (2010) 913–928

919

√ √ λa,n (ξ ) = r a + f a (ξ )/ n + o(1/ n), √ √ λdi ,n (ξ ) = rid + fid (ξ )/ n + o(1/ n), √ √ c ,n qi (ξ ) = ric + fic (ξ )/ n + o(1/ n), √ √ √ where the term o(1/ n) denotes a function of ξ ∈ R2 such that no(1/ n) converges uniformly in ξ to zero; d c d d c a (b) r + ri ri = ri + rj rj , for i, j ∈ {1, 2}, where i 6= j. This condition is usually called the heavy traffic condition. The continuity of the functions f a , fid , and fic are used to characterize the weak-sense limit in the theorem below. This condition can be relaxed, and that is discussed at the end of this section. Although the state dependence introduced above by Assumption 2 is small for large n, it will have a significant effect on the limit. The functions f a , fid , and fic appear in the drift term of the limit equation. Let us define the state space G , {ξ ∈ R2 : 0 ≤ ξi ≤ Bi , if Bi < ∞, and 0 ≤ ξi < ∞, otherwise}, which is used in the statement of the theorem below. Theorem 3. Let xn (0) converge weakly to x(0). Under Assumptions 1 and 2, {xn } is tight and the weak-sense limit of any weakly convergent subsequence satisfies the equation below: x(t ) = x(0) +

Z

t

g (x(s))ds + w(t ) + z (t ),

(6)

0

where x(t ) ∈ G for all t ≥ 0, and gi (x) = f a (x) − fid (x)(1 − ric ) + rid fic (x) − rjd fjc (x) − rjc fjd (x),

wi (t ) = w a (t ) − wid (t )(1 − ric ) + wic (t ) − wjc (t ) − rjc wjd (t ), for i, j ∈ {1, 2}, i 6= j. Let Ft denote the minimal σ -algebra that measures {x(0), x(s), w(s), z (s); s ≤ t }. The processes w a , wid , and wic are independent Ft -measurable Wiener processes with variances r a , rid , and rid ric (1 − ric ), respectively. The process z (t ) , Rζ (t ) is the reflection process, where

 R,

1 1 − r2d /r a

1 − r1d /r a 1

−1 − r1c r1d /r a −1 + r1c r1d /r a

 −1 + r2c r2d /r a , −1 − r2c r2d /r a

ζ = (y1 , y2 , u1 , u2 )0 , and yi (resp., ui ) is non-decreasing, continuous and increases only at t such that xi (t ) = 0 (resp., xi (t ) = Bi ), also yi (0) = 0 and ui (0) = 0. Proof. no loss in generality, suppose that Bi < ∞, for i ∈ {1, 2}. The general case follows analogously by setting √ R t With n 0 I{xni (s) = Bi }ds = 0 when Bi = ∞. Let us begin with the decomposition given by Eq. (3) for the scaled process. Notice that we can rewrite bni given by Eq. (4) using Assumption 2(a) as bni (t ) =

t

Z

f a (xn (s)) − fid (xn (s)) + rid fic (xn (s)) + ric fid (xn (s)) − rjc fjd (xn (s)) − rjd fjc (xn (s))ds 0

+ (rid − ric rid + rjc rjd )yni (t ) + (−ric rid + rjc rjd )ynj (t ) + (ric rid − rjc rjd )˜yn (t ) − (r a + ric rid )uni (t ) − (r a − rjc rjd )unj (t ) + r a u˜ n (t ) + ric rid w ˜ in (t ) − rjc rjd w ˜ jn (t ) √ √ √ √ √ √ + O(yni (t )/ n) + O(ynj (t )/ n) + O(uni (t )/ n) + O(unj (t )/ n) + no(1/ n), (7) R R R √ √ √ t t t where yni (t ) , n I{xn (s) = 0}ds, uni (t ) , n 0 I{xni (s) = Bi }ds, y˜ n (t ) , n 0 I{xn1 (s) = 0, xn2 (s) = 0}ds, √ Rt n 0 i √ Rt n n n n n u˜ (t ) , n 0 I{x1 (s) = B1 , x2 (s) = B2 }ds, w ˜ i (t ) , n 0 I{xi (s) = Bi , xj (s) = 0}ds, for i ∈ {1, 2}, i 6= j, and the terms O(·) are such that O(ξ ) ≤ K ξ for a constant K ∈ (0, ∞). Notice that the processes yni , uni , y˜ n , u˜ n , and w ˜ in increase whenever xn hits one of the state space boundaries. When these

processes increase, the usual interpretation is that they ‘‘push’’ xn back to the interior of the state space in the direction of the so-called ‘‘reflection direction’’. Fig. 4 shows the directions in which xn is pushed when it hits the state space boundaries. These reflection directions are given by r1d (1 − r1c ) + r2c r2d −r2c r2d + r1c r1d

−r1c r1d + r2c r2d d2 = d1 = d r2 (1 − r2c ) + r1c r1d  a    −r − r1c r1d −r a + r2c r2d d3 = d = 4 −r a + r1c r1d −r a − r2c r2d  c d    r1 r1 − r2c r2d r1c r1d c1 = c = 2 r2c r2d − r1c r1d −r1c r1d  a  c d r −r 2 r 2 c3 = c4 = . a c d 

r



r2 r2





920

S.C. Leite, M.D. Fragoso / Performance Evaluation 67 (2010) 913–928

Fig. 4. The direction of reflection defined at each boundary and corner of the state space.

Eq. (7) can be rewritten in vector notation as b (t ) = n

Z t 0

g1 (xn (s)) ds + z n (t ) +  n (t ), g2 (xn (s))



where  n is an error accounting for the O(·) and o(·) terms, and gi is as defined in the theorem statement. Also, z n (t ) , d1 yn1 (t ) + d2 yn2 (t ) + d3 un1 (t ) + d4 un2 (t ) + c1 y˜ n (t ) + c2 w ˜ 1n (t ) + c4 w ˜ 2n (t ) + c3 u˜ n (t ) is the sum of the reflection terms. With some algebraic manipulation, and using the fact that rid and r a are positive and 0 ≤ ric < 1, one can verify that there are α p ∈ R2+ \ {0}, for p ∈ {1, 2, 3, 4}, such that

(d1 , d2 )α 1 (d2 , d3 )α 2 (d3 , d4 )α 3 (d4 , d1 )α 4

= d1 + c1 + d2 , = d2 + c2 + d3 , = d3 + c3 + d4 , = d4 + c4 + d1 ,

where (dk , dl ) denotes a matrix in R2×2 , for k, l ∈ {1, 2, 3, 4}. Hence, the processes yni and uni , for i ∈ {1, 2}, can be modified in such a way that they account for the directions appearing at the corner of the state space. That is, processes ζi , for i ∈ {1, 2, 3, 4}, can be defined in such a way that 4 X

di ζin (t ) = z n (t ).

i=1

At this point, Theorem 8 is going to be used with

ψ n (t ) ,

t

Z

gi (xn (s))ds + mni (t ) + in (t ). 0

It is straightforward to verify that

Z lim lim sup P sup sup δ→0 n t ≤T s≤δ

Rt 0

t +s



gi (xn (s))ds and in (t ) satisfy Eq. (A.1). Indeed, for v > 0 and T > 0,

 gi (x (u))du +  (t + s) −  (t ) ≥ v = lim lim sup P (K δ ≥ v) = 0, δ→0 n n

t

n i

n i

for some constant K > 0, by the boundedness of gi . Now we are going to show that {mni } is tight with continuous weak-sense limits, which guarantees that Eq. (A.1) is satisfied for mni . In order to show the desired property, Theorem 6 is used. For that, the associated Doob–Meyer processes d,n

need to be computed. By Theorem 5, one can verify that the Doob–Meyer processes associated with ma,n and mi by



ma,n (t ) =



t

Z

are given

I{xn1 (s) < B1 , xn2 (s) < B2 }λa,n (xn (s))ds,

0

D

d,n

mi

E

(t ) =

t

Z

d,n

I{xni (s) > 0}λi (xn (s))ds.

(8)

0

Using Theorem 2.4.3 of [10, pg. 63], one shows that



c ,n

mi



(t ) =

t

Z 0

(1 − qci ,n (xn (s)))qci ,n (xn (s))I{xni (s) > 0, xnj (s) > 0, xni (s) < Bi }λdi ,n (xn (s))ds.

(9)

S.C. Leite, M.D. Fragoso / Performance Evaluation 67 (2010) 913–928

α,n

Also, ma,n , mi



D

E

n = 0, mdi ,n , mcj ,n = 0, and mα, 1 , m2



α,n

921

= 0, for i, j ∈ {1, 2} and α ∈ {c , d}, by the independence assump-

tions (i.e., Assumptions 1(a) and (c)). Now, we can apply Theorem 6 with the fact that λa (·), λdi (·) and qci (·) are bounded, √ and the fact that the martingales have jumps sizes of the order of 1/ n, to show that they are tight and asymptotically continuous. The condition given by Assumption 7 is satisfied because of the restrictions on the constants r a , rid , and ric given by Assumption 2(a). Hence {z n , ζ n } is tight with continuous weak-sense limits, which implies that {xn } is tight. All that is left is to show is that any weakly convergent subsequence Rof {xn } satisfies Eq. R (6). By continuity of gi , we have that if xnl ⇒ x then gi (xnl (s))ds ⇒ gi (x(s))ds, where ‘‘⇒’’ denotes weak convergence. Also, by the tightness of {ζ n }, the term  n converges weakly to the zero process. The indicator functions in the Doob–Meyer processes, given by Eqs. (8) and (9), can be dropped with no effect on the limit, by the tightness of {ζ n }. Hence, Theorem 6 together with Assumption 2(a) gives the characterization of the martingale terms as Wiener processes. In addition, we have that t

Z

c ,n

d,n

d,n

qi (xn (s−))I{xnj (s−) > 0, xni (s−) < Bi }dmi (s) = ric mi (t ), 0

modulo an error that goes to zero as n ↑ ∞. Let R˜ , (d1 , d2 , d3 , d4 ) ∈ R2×4 . Using the identities required by the heavy traffic condition, one can rewrite R˜ in the more convenient form R˜ = Rr a , where R is the matrix given by the theorem statement.  The condition on the continuity of the functions f a , fic , and fid can be relaxed, and that is given by the theorem below. This extension is useful since most of the ‘‘routing’’ policies that will be considered here are in fact discontinuous, but satisfy the weaker conditions of Theorem 4. For more details on this subject see Theorem 8.4.1 of [10]. Theorem 4. Assume that xn (0) converges weakly to x(0), and RAssumptions 1 and 2, but replace the continuity of the functions f a R· · and fiα , α ∈ {c , d}, by measurability and suppose that φ(·) 7→ 0 f a (φ(s))ds and φ(·) 7→ 0 fiα (φ(s))ds are continuous functions on D(R; 0, ∞) w.p.1 with respect to the measure induced by any weak-sense limit x. Then, the assertions in Theorem 3 are still valid. Proof. The proof is carried out exactly as in Theorem 3, except for the characterization of the drift term. In order to accomplish that, one can apply the so-called continuous mapping theorem (e.g., Theorem 5.1 in [33], or Theorem 3.4.3 in [11]) to arrive at the desired result.  As a final note, observe that the restriction that the system is composed of two queues was used to eliminate the direction of reflection that appeared at the corner of the state space. An extension of the above result for a system containing more than two queues is possible whenever one is able to deal properly with the reflection directions appearing at the corners of the state space in such a way that Theorem 8 can still be applied. 5. Web search systems According to Barroso et al. [34], web search engines are typically implemented on parallel computing clusters in order to process all user queries it receives. The processing at a web search system is usually divided into two major parts. The first, as described by Badue et al. [29] and Barroso et al. [34], consists in retrieving the related documents from the databases, and the second consists in ranking these related documents. The former part is the most demanding part of the process. In order to quicken the response to the user, the database is split homogeneously among several parallel stations. Each station searches for the related information in a local fraction of the document collection. A response is only sent to the user when each station has finished its local search. As was observed in [29], even when these stations are homogeneous, there exists an imbalance among the servers, contributing to the delay of the response time. Consider the following strategy to reduce the imbalance: suppose that each station in the parallel cluster has more than one fraction of the document collection. For example, a station may have its own fraction and the fraction of a neighbor. Then if this station finishes its process earlier, and its neighbor is still processing an older task, it may grab a task (that is not being processed) from the neighboring station in order to quicken response time. In the case considered here, we suppose that every station has a copy of every fraction of the document collection. The queueing model and the heavy traffic limit derived in the previous sections are used to find a good policy for moving customers from one queue to the other. This policy is found through the formulation of a stochastic optimal control problem, which is defined below. The rest of the section is structured as follows. In Section 5.1, a simulation used to test and validate the balancing strategies is described. The simulation is based on the data collected by Gonçalves et al. [31] from a real web search system. Also, we explain how we implemented the strategy designed for a system of two queues in a system with several queues. Section 5.2 describes how one can use the heavy traffic limit derived in Section 4 to obtain an approximated dynamics for the system. In Section 5.3, we formulate the stochastic optimal control problem that will be used to find a good policy for moving tasks between queues. The numerical results are given in Section 6.

922

S.C. Leite, M.D. Fragoso / Performance Evaluation 67 (2010) 913–928

5.1. Computer simulation In order to test the balancing policies, an event-driven computer simulation of a web search system was constructed. Following the observation of Gonçalves et al. [31], we assume that the inter-arrival time of customers is exponentially distributed. Also,according to the model presented by Gonçalves et al. [31], we let the service time be separated into two cases: in the first case, the server has to access the disk, and in the second case, the server retrieves the needed information from the disk cache. In both cases, the service times are exponentially distributed. Hence, with probability 1 − h ∈ [0, 1] the service time will be exponentially distributed with mean md when the system needs to access the disk, and with probability h the service time will be exponentially distributed with mean mc when it finds the needed information in the disk cache. From the data collected in [31] for a system of eight homogeneous servers, we have h = 0.17, md = 38.12 ms and mc = 9.20 ms. Since the model is constructed for a system of two queues, the policy of moving customers between stations is implemented as follows. At the end of a task completion at any station, the largest queue in the system is found (that is, the station with the most number of pending tasks). Then, the policy is used to decide whether a task can be moved from the largest queue to the station that has just finished its task. If it is to be moved, then the task that would be processed next by the station with the large queue is moved. In order to make the simulation more realistic, a moved task and the task following it will always have service distribution of the first type (i.e., exponentially distributed with mean md ) since it will be unlikely that they find any cached information. The simulation computes the response time of the system under steady state. For that, averages begin to be computed after 104 job completions. The program terminates after 5 × 104 job completions. We also compute the 95% t confidence interval for the simulation (see [35, pg. 392]). 5.2. Application of the theoretical model In practice, we do not have a sequence of queueing systems indexed by a parameter n that approaches heavy traffic as n increases. Instead, we have one system operating near its maximum capacity, and we want to use the heavy traffic limit in order to approximate its dynamics. The result given in Section 4 is used in the following way: a large parameter √ n, say n = N ∈ N, is chosen and the number of customers in each queue, given by Xi (t ), can be approximated by Xi (Nt ) ∼ Nxi (t ), where x(t ) is given by Eq. (6). Since the service and arrival rate are not state dependent in this application, let them be given by the constants λ and µi , i = 1, 2, respectively. Also, given the √assumption that the system is under heavy√traffic, which is a common scenario in web search systems, the constants bi , N (µi − λ) are small relative to the size of N. Hence, using the notation in Section 4, we have that

λa,N (x) = λ

√ λid,N (x) = λ + bi / N ,

i = 1, 2.

In addition, our routing policy is given by c ,N

qi

√ (x) = fic (x)/ N ,



where fic : R2+ → [0, K ] will be determined later, and K < N is a positive constant. It is important to mention here that, in order to be in complete agreement with the theoretical model, K should be set to 1. However, we leave it free at this point and show, by numerical experimentation, that indeed the choice of K = 1 yields better controls; see Section 6 for more detail. Hence, for this case, the process x(t ) is given by dx(t ) =



    −b1 + λ f1c (x(t )) − f2c (x(t )) 2λ dt + c c λ −b2 + λ f2 (x(t )) − f1 (x(t ))

λ 2λ

1/2

dw(t ) +



1 0

0 1

−1 −1

 −1 dζ (t ), −1

(10)

where we denote by A1/2 the matrix such that A1/2 (A1/2 )0 = A, for any matrix A which accepts such representation. For the experiments performed here, we used N = 100. Also, the service rate was set to µi = 1/(h · mc + (1 − h) · md ), where h, mc , and md are given in the previous section. This is not ideal, since the service time described in Section 5.1 is, in fact, hyper-exponentially distributed. The plot of the exponential and hyper-exponential cumulative distribution functions using the parameters above is plotted in Fig. 5 for comparison. Note that the exponential distribution gives us a reasonably good estimate. 5.3. Optimal control problem The optimum control problem is defined via a cost function that is used to guide a search in the space of all admissible policies. This cost function must penalize imbalance among queues and also the control action itself, since moving customers between queues can be costly. The following discounted cost is used:

S.C. Leite, M.D. Fragoso / Performance Evaluation 67 (2010) 913–928

923

Fig. 5. Plot of the exponential and hyper-exponential cumulative distribution functions (CDFs).

W ( x, f ) = c

c Efx

"Z

#



e

−β t

(v|x1 (t ) − x2 (t )| +

X

ci fi (x(t )))dt , c

i=1,2

0

where v is a constant associated with the cost of having an imbalance in the system, c1 and c2 are constants associated with the cost of movingcustomers between queues, x(·) = (x1 (·), x2 (·))0 is the system dynamics equation given by Eq. (10), and x(0) = x is the initial condition. Hence, we seek the solution to the following optimization problem:

(

find f ∗ ∈ U that minimizes the costs V (x, f ∗ ) = inf W (x, f ),

(11)

f ∈U

where U is the set of admissible feedback controls of the type f = (f1 , f2 )0 and fi : R2+ → [0, K ], i = 1, 2. In order to compute the optimal policy numerically, we used the Markov chain approximation method √of Kushner and Dupuis [30] (using the implementation of Jarvis and Kushner [36]) with discretization parameter set to 1/ N, as suggested in [37]. We let c1 = c2 = c, where c is a positive constant that is chosen later. Also, we set v = 1, β = 0.001, and let the scaled size of the buffers be Bi = 5, for i ∈ {1, 2}. Finite buffers are needed for the Markov chain approximation method, but this does not influence the final result. 6. Numerical results We begin by validating the simulation described in Section 5.1. For that, the setup used in [31] is considered: a web search system composed of eight identical stations with mean service time described in Section 5.1. We plot the response time of the simulation for varying values of arrival rates. In addition, the upper and lower bounds used in [31] are plotted; they are given by U =

H8 · m 1−ρ

L=

m 1−ρ

,

respectively, where m = h · mc +(1 − h)md , ρ is the traffic intensity, and Hp is the pth harmonic number (1 + 1/2 +· · ·+ 1/p). The results are given by Fig. 6(a). The behavior of the simulation is very similar to the one found in the real web search system of [31]. At this point, the simulation is used to fine tune the parameters of the model. In order to find a good choice of the parameter K (defined in Section 5.2), we solve the optimization problem (11) for varying numbers of K with arrival rate set to 28 queries/second. Every optimal policy is given by a switching curve; that is, after a certain threshold, the control is applied at maximum intensity. These switching curves are roughly a straight line, except near the boundaries. For convenience, we approximate them by a straight line. The optimal policies were applied to the simulation and the mean response times were computed. The result is given in Table 1. Observe that the policies are interpreted as follows: with probability P, move a customer from queue 1 to queue 2 if aX1 − b > X2 , where a and b are constants and X1 (resp., X2 ) is the (unscaled) number of customers in the first queue (resp., second queue). With these results, we use K = 1 henceforth. In order to find a good choice of parameter c, which is the constant defined in Section 5.3 and is related to the cost of moving a customer from one queue to another, we follow the strategy in the previous paragraph. The result is summarized in Table 2. Notice that c = 15 gives the best response time. Thus it will be used from now on. Also, the optimum switching curves for varying c are plotted in Fig. 6(b). Now that c = 15 and K = 1 are chosen, we compare the system using the optimum control with the uncontrolled system. The result is given by Fig. 7(b). Notice that, as the arrival rates increase (i.e., as the system approaches heavy traffic),

924

S.C. Leite, M.D. Fragoso / Performance Evaluation 67 (2010) 913–928

a

b

Fig. 6. (a) Plot of the system response time calculated by the simulation with varying arrival rates (the 95% t confidence interval for the simulation is also plotted). (b) Optimal switching curves (i.e., f2c (·)) for varying values of c ∈ {1, 5, 10, 15, 20, 25} (shown for the scaled state space). Table 1 Table reporting the response time (of the simulation) for a system using the optimal control found for different values of parameter K . K

Mean response time

95% t confidence interval

Switching curve

1 2 3 4 5 6 7

0.629228 0.725531 1.044478 1.678786 0.871848 0.973629 1.035629

0.010970 0.019650 0.062501 0.157343 0.036587 0.049951 0.050448

X1 X1 X1 X1 X1 X1 X1

− 1.00 > X2 − 1.00 > X2 − 1.11 > X2 − 1.35 > X2 − 2.00 > X2 − 2.00 > X2 − 2.00 > X2

Probability P 0.1 0.2 0.3 0.4 0.5 0.6 0.7

Table 2 Table reporting the response time (of the simulation) for a system using the optimal control found for different values of parameter c. c

Mean response time

95% t confidence interval

Switching curve

1 5 10 15 20 25

0.629651 0.601555 0.599692 0.593476 0.600476 0.614856

0.012978 0.009725 0.011384 0.007581 0.008314 0.007030

X1 X1 X1 X1 X1 X1

− 1.00 > X2 − 2.00 > X2 − 4.00 > X2 − 5.00 > X2 − 6.00 > X2 − 8.00 > X2

Probability P 0.1 0.1 0.1 0.1 0.1 0.1

the system response time with control approaches that of the lower bound (given in Fig. 6(a)) as opposed to the system with no control that approaches the upper bound. Fig. 7(a) plots the regions where each control fic (·) is active. In order to have a comparison basis, let us define the SIB (switch if bigger) control as following: move a customer from queue 1 to queue 2 with probability one if X1 − 1.0 > X2 . This control reflects the general intuition that one should move a customer if it is in a larger queue (ignoring the customer at service). Fig. 8(a) shows the relative gain1 of the system using the optimum control with respect to the uncontrolled system and the system using SIB control. Notice that for lower arrival rates the system using the SIB control is actually better. However, as the system approaches heavy traffic, the optimal control yields lower response times when compared to the SIB control, having relative gain over 80% for arrival rate of 28 queries/second. Also notice that, for this arrival rate, the system using the SIB control has worse relative gain when compared to the system with no control. This means that, under heavy traffic, SIB actually increases the response time of the system. Next we plot the relative gain of the system with control with respect to the system with no control for varying numbers of servers and arrival rate set to 28 queries/second. The results are given in Fig. 8(b). Notice that there is a slight increase in gain as the number of servers increases, suggesting that this strategy is good for systems composed of more servers. For a second experiment, we add four new faster servers to the eight used until now. We suppose that the service rate continues to be the same for the old servers, and we let h = 0.17, md = 30.0 ms, and mc = 7.0 ms for the new servers. Therefore we want to find a good balancing strategy for this new heterogeneous system. When the decision of moving a customer is between two homogeneous servers, we continue to use the rule found previously. What we want to find now is a good strategy to move a customer when the decision is between two heterogeneous servers. The optimal policy found for arrival rate of 28 queries/second is the following: if queue 2 is faster than queue 1, then with probability 0.1 move a customer from queue 1 to queue 2 if 1.34 · X1 − 2.00 > X2 ; otherwise, if queue 2 is slower than

1 Here, we call relative gain of any quantity b with respect to a the quantity [(a − b)/a] · 100%.

S.C. Leite, M.D. Fragoso / Performance Evaluation 67 (2010) 913–928

a

925

b

Fig. 7. (a) Region of the (scaled) state space where each control acts. Control 1 and 2 are f1c (·) and f2c (·), respectively. The regions were roughly approximated by R1 = {x ∈ R2+ : x1 + 0.5 > x2 } and R2 = {x ∈ R2+ : x1 − 0.5 < x2 }. The blue line crossing the middle of the state space is the line x1 = x2 . (b) Plot of the system response time calculated by the simulation with varying arrival rates calculated for the system with control and with no control (the 95% t confidence interval for the simulation is also plotted). (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

a

b

Fig. 8. (a) Relative gain of the system with control with respect to the system with no control and the system using the SIB control. (b) Relative gain of the system with control with respect to the system with no control for varying number of servers and arrival rate set to 28 queries/second (the 95% t confidence interval for the simulation is also plotted).

a

b

Fig. 9. (a) Region of the (scaled) state space where each control acts. Controls 1 and 2 are f1c (·) and f2c (·), respectively. The regions were roughly approximated by R1 = {x ∈ R2+ : 1.34x1 + 0.2 > x2 } and R2 = {x ∈ R2+ : 1.34x1 − 1.6 < x2 }. The blue line crossing the middle of the state space is the line x1 = x2 . In this plot, queue 2 is faster than queue 1. (b) Plot of the system response time with varying arrival rates calculated for the system with control and with no control (the 95% t confidence interval for the simulation is also plotted). (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

queue 1, then with probability 0.1 move a customer from queue 1 to queue 2 if 0.71 · X1 − 10.9 > X2 . Fig. 9(a) plots the regions where each control fic (·) is active. Next, we compare the response time of the system with no control to the system with control;, the result is given in Fig. 9(b). Observe that, as the system approaches heavy traffic, the difference between the response times is very large. This is explained by the fact that the slower servers will be a bottleneck for the system with no control for arrival rates larger than 28 queries/second. This is not such a big problem for the system with control since the faster servers will help reduce congestion by moving clients from the slower queues.

926

S.C. Leite, M.D. Fragoso / Performance Evaluation 67 (2010) 913–928

a

b

Fig. 10. (a) Relative gain of the system with control with respect to the system with no control, the system using the SIB control, and the system using the control found for the homogeneous system. (b) Relative gain of the system with control with respect to the system with no control for varying number of servers and arrival rate set to 28 queries/second (the 95% t confidence interval for the simulation is also plotted).

Fig. 10(a) shows the relative gain of the system with control with respect to the system with no control. Also, we compare the control found for the heterogeneous model with the system using the SIB control and the system using the strategy found for the homogeneous system of the previous experiment. Again, under heavy traffic, the system using the SIB control has higher response times. Also observe that the system using the heterogeneous control has a slight gain over the system using the homogeneous control. Finally, we plot the relative gain in the response time for the system with control with respect to the system with no control for varying number of parallel servers. For simplicity, we let half of the servers be fast and half slow. The results are given in Fig. 10(b). Again, there is a slight increase in gain. This suggests that this strategy is suitable for systems composed of more servers. 7. Conclusion We have presented the heavy traffic limit of two state-dependent queues with triggers in parallel. This theoretical model was applied to a web search system and a stochastic optimal control problem was constructed in order to find a good strategy for reducing imbalance. The results indicate that the strategy presented here can be beneficial for reducing the response times of real web search systems. For future work, we will consider the possibility of server breakdowns. As pointed out in [34], web search systems are typically composed of multiple unreliable replicas. Also, it would be interesting to account for the possibility of multithreading at each processing node. Acknowledgements We would like to express our sincere gratitude to Dr. Artur Ziviani for the insightful discussions which directed us to the application of the model proposed here, and for allowing us to have access to his unpublished manuscript [31]. The research was partially supported by the Brazilian National Research Council-CNPq, under the Grants Nos. 140687/2005-0, 384163/2009-2, 301740/2007-0 and 470527/2007-2, and by FAPERJ, Grant No. E-26/100.579/2007 and E-26/170.008/2008.

Appendix. Auxiliary results This appendix contains some auxiliary results which are used in this paper and are here for ease of reference. Theorem 5. Let M be an Ft -martingale such that M (t ) =

Z

t

H (s) (dN (s) − λ(s)ds) ,

0

where N is a point process with Ft -stochastic intensity λ and H is an predictable process satisfying t

Z E

 |H (s)|2 λ(s)ds < ∞,

t ≥ 0.

0

Then the Doob–Meyer process associated with M is given by

hM i (t ) =

Z

t

H 2 (s)λ(s)ds,

t ≥ 0.

0

Proof. From Itô’s formula (given by Theorem 32 of [38, pg. 71]) for M with f (x) = x2 , we have that

S.C. Leite, M.D. Fragoso / Performance Evaluation 67 (2010) 913–928 t

Z

M 2 (t ) = M 2 (0) +

2M (s−)dM (s) +

X

927

(1M (s))2 ,

0
0

where 1M (s) = M (s) − M (s−). Notice that, if M jumps at the instant s, then (1M (s)) = H (s). Hence, we may write

X

(1M (s))2 =

∞ X

0
H 2 (sl )I{sl ≤ t } =

t

Z

H 2 (s)dN (s), 0

l =0

where sl denotes the instant of the l-th jump of M (or N). Therefore, we have that M 2 (t ) = M 2 (0) +

t

Z

[2M (s−) + H (s)] H (s) (dN (s) − λ(s)ds) +

= M (0) +

t

Z

H 2 (s)λ(s)ds 0

0 2

t

Z

[2M (s−) + H (s)] dM (s) +

t

Z

H 2 (s)λ(s)ds. 0

0

Rt

˜ (t ) , [2M (s−) + H (s)] dM (s) is an Ft -martingale. By the definition of the By Theorem T8 of [32, pg. 27], we have that M 0 Doob–Meyer process associated with M, we have that hM i (t ) =

t

Z

H 2 (s)λ(s)ds.  0

The theorem below is Theorem 2.8.2 of [10, pg. 80] adapted to the needs of this paper. In the statement of the theorem below, let Sq denote the space of Rq×q symmetric and non-negative definite matrices. Theorem 6. Let V n , H n , and M n be stochastic processes with values in D(Rd ; 0, ∞), D(Sq ; 0, ∞), and D(Rq ; 0, ∞), respectively. 2 Let Ftn be the minimum σ -algebra that measures {V n (s), H n (s), M n (s); s ≤ t }. Suppose that M n is an Ftn -martingale with

Rt

associated Doob–Meyer process given by hM n i (t ) = 0 H n (s)ds, where H n is bounded. In addition, suppose that the maximum discontinuities of M n are bounded by δn , which goes to zero as n ↑ ∞. Then {M n } is tight with continuous weak-sense limits. If in addition (V n , H n , M n ) converges weakly to (V , H , M ), where H is deterministic and Ft is the minimal σ -algebra that measures   R1 {V (s), H (s), M (s); s ≤ t }, then M is an Ft Wiener process with covariance matrix given by E M (1)M (1)0 = 0 H (s)ds.

The theorem below is an adapted version of Theorem 3.6.1 of [10, pg. 130]. In order to state it, the following notation needs to be introduced. Let K ≥ 0 and Bi ∈ (0, ∞) ∪ {∞}, i ∈ {1, . . . , K }, be given constants. Let S1 , {i = 1, . . . , K : Bi < ∞}. Define the state space G , {ξ ∈ RK : 0 ≤ ξi ≤ Bi , for i ∈ S1 , and 0 ≤ ξi < ∞, for i 6∈ S1 }. For i ∈ {1, . . . , K }, let ni denote the interior normal on the face of ∂ Gi , {ξ ∈ RK : ξi = 0}. For i ∈ {K + S1 } , {i + K : i ∈ S1 }, let ni denote the interior normal on the face of ∂ Gi , {ξ ∈ RK : ξi−K = Bi−K }. On each of those faces, there is defined a ‘‘reflection direction’’ di ∈ RK , which satisfies hdi , ni i > 0 for i ∈ S, where S , {1, . . . , K } ∪ {K + S1 }. Also, ni and di satisfy the assumption below. Assumption 7. For each x on the boundary of G, let N (x) , {ni : x ∈ ∂ Gi } and D(x) , {di : x ∈ ∂ Gi }; then there exists a vector ν ∈ Hull(N (x)) such that hγ , νi > 0 for all γ ∈ D(x), where Hull(N (x)) denotes the set generated by convex combinations of the elements of N (x). Let xn , ψ n , z n , and yni , for i ∈ S, be processes such that xn (t ) = xn (0) + ψ n (t ) + z n (t ), z n (t ) =

X

di yni (t ),

i∈S

where xn (t ) ∈ G for all t ≥ 0, and yni is a process satisfying yni (0) = 0, yni is non-decreasing, increases only at t such that xn (t ) ∈ ∂ Gi , and has continuous sample paths. In addition, the process ψ n satisfies, for each v > 0 and T > 0, the following condition:



lim lim sup P sup sup |ψ n (t + s) − ψ n (t )| ≥ v

δ→0

n



= 0.

(A.1)

t ≤T s≤δ

Theorem 8. Let G, {ni ; i ∈ S }, {di ; i ∈ S }, xn , ψ n , z n , and yn be defined as above. Suppose that {xn (0)} is tight; then {z n , yn } is tight and the limit process of any weakly convergent subsequence is continuous almost surely. References [1] O.J. Boxma, G. Koole, Z. Liu, Queueing-theoretic solution methods for models of parallel and distributed systems, in: O.J. Boxma, G.M. Koole (Eds.), Performance Evaluation of Parallel and Distributed Systems—Solution Methods, CWI Tract, Amsterdam, 1994, pp. 1–24. [2] D.L. Eager, E.D. Lazowska, J. Zahorjan, A comparison of receiver-initiated and sender-initiated adaptive load sharing, Performance Evaluation 6 (1) (1986) 53–68. 2 D(S ; 0, ∞), for a metric space S, is the space of all S-valued functions on [0, ∞) that are right-continuous and have left-hand limits; see, for example, [33] for reference.

928 [3] [4] [5] [6] [7] [8] [9] [10] [11] [12] [13] [14] [15] [16] [17] [18] [19] [20] [21] [22] [23] [24] [25] [26] [27] [28] [29] [30] [31] [32] [33] [34] [35] [36] [37] [38]

S.C. Leite, M.D. Fragoso / Performance Evaluation 67 (2010) 913–928 E. Gelenbe, G-networks with triggered customer movement, Journal of Applied Probability 30 (1993) 742–748. J.R Artalejo, G-networks: a versatile approach for work removal in queueing networks, European Journal of Operational Research 126 (2000) 233–249. S. Chandrasekhar, Stochastic problems in physics and astronomy, in: N. Wax (Ed.), Noise and Stochastic Processes, Dover, New York, 1994. W. Feller, Diffusion processes in genetics, in: Proc. Second Berkeley Symposium on Probability and Statistics, University of California Press, Berkeley, 1951, pp. 227–246. I Karatzas, S.E. Shreve, Methods of Mathematical Finance, Springer-Verlag, New York, 1998. G.P. Karev, F.S. Berezovskaya, E.V. Koonin, Modeling genome evolution with a diffusion approximation of a birth-and-death process, Bioinformatics 21 (Suppl. 3) (2005) iii12–iii19. T.G. Kurtz, Approximation of Population Processes, in: CBMS-NSF Regional Conference Series in Applied Mathematics, vol. 36, SIAM, Philadelphia, 1981. H.J. Kushner, Heavy Traffic Analysis of Controlled Queueing and Communication Networks, Springer-Verlag, New York, 2001. W. Whitt, Stochastic-Process Limits, Springer-Verlag, New York, 2002. J.F.C. Kingman, The single server queue in heavy traffic, Mathematical Proceedings of the Cambridge Philosophical Society 57 (1961) 902–904. Yu. Prohorov, Transient phenomena in process of mass service, Litovsk. Mat. Sb. 3 (1963) 199–205 (In Russian). A. Borovkov, Some limit theorems in the theory of mass service I, Theory of Probability and its Applications 9 (1964) 550–565. A. Borovkov, Some limit theorems in the theory of mass service II, Theory of Probability and its Applications 10 (1965) 375–400. S.C. Leite, M.D. Fragoso, Diffusion approximation of state dependent G-networks under heavy traffic, Journal of Applied Probability 45 (2008) 347–362. A. Mandelbaum, G. Pats, State-dependent stochastic network. part I: approximations and applications with continuous diffusion limits, The Annals of Applied Probability 8 (1998) 569–646. F. Baccelli, A.M. Makowski, A. Shwartz, The fork-join queue and related systems with synchronization constraints: Stochastic ordering and computable bounds, Advances in Applied Probability 21 (3) (1989) 629–660. J.R. Chen, A hybrid solution of fork/join synchronization in parallel queues, IEEE Transactions on Parallel and Distributed Systems 12 (8) (2001) 829–844. S.-S. Ko, R.F. Serfozo, Response times in m/m/s fork-join networks, Advances in Applied Probability 36 (2004) 854–871. R. Nelson, A.N. Tantawi, Approximate analysis of fork/join synchronization in parallel queues, IEEE Transactions on Computers 37 (6) (1988) 739–743. V. Nguyen, Processing networks with parallel and sequential tasks: heavy traffic analysis and Brownian limits, The Annals of Applied Probability 3 (1) (1993) 28–55. X. Tan, C. Knessl, A fork-join queueing model: Diffusion approximation, integral representation and asymptotics, Queueing Systems 22 (1996) 287–322. S. Varma, Heavy and light traffic approximations for queues with synchronization constraints, Ph.D. Thesis, Department of Electrical Engineering, University of Maryland, 1990. S.L. Bell, R.J. Williams, Dynamic scheduling of a system with two parallel servers in heavy traffic with resource pooling: asymptotic optimality of a threshold policy, The Annals of Applied Probability 11 (3) (2001) 608–649. J.M. Harrison, Heavy traffic analysis of a system with parallel servers: Asymptotic optimality of discrete-review policies, The Annals of Applied Probability 8 (3) (1998) 822–848. J.M. Harrison, M.J. López, Heavy traffic resource pooling in parallel-server systems, Queueing Systems: Theory and Applications 33 (4) (1999) 339–368. R.J. Williams, On dynamic scheduling of a parallel server system with complete resource pooling, in: Proceedings of the Fields Institute Workshop on Analysis and Simulation of Communication Networks, American Mathematical Society, Providence, RI, 1999, p. 48. C.S. Badue, R. Baeza-Yates, B. Ribeiro-Neto, A. Ziviani, N. Ziviani, Analyzing imbalance among homogeneous index servers in a web search system, Information Processing and Management 43 (2007) 592–608. H.J. Kushner, P.G. Dupuis, Numerical Methods for Stochastic Control Problems in Continuous Time, Springer–Verlag, New York, 1992. C.S. Badue Gonçalves, J. Almeida, V. Almeida, R. Baeza-Yates, B. Ribeiro-Neto, A. Ziviani, N. Ziviani, A capacity planning model for web search engines, Unpublished Manuscript, 2007. P. Brémaud, Point Processes and Queues, Martingale Dynamics, Springer-Verlag, New York, 1981. P. Billingsley, Convergence of Probability Measures, John Wiley & Sons, New York, 1968. L.A. Barroso, J. Dean, U. Holzle, Web search for a planet: the Google cluster architecture, IEEE Micro 23 (2) (2003) 22–28. D. Gross, C.M. Harris, Fundamentals of Queueing Theory, 3rd ed., John Wiley & Sons, New York, 1998. D. Jarvis, H.J. Kushner, Codes for optimal stochastic control: documentation and users guide. Technical report, Brown University, Lefschetz Center for Dynamical Systems Report 96-3, 1996. H.J. Kushner, J. Yang, D. Jarvis, Controlled and optimally controlled multiplexing systems: a numerical exploration, Queueing Systems 20 (1995) 255–291. P. Protter, Stochastic Integration and Differential Equations: A New Approach, Springer–Verlag, New York, 1995.

Saul C. Leite received his B.Sc. degree in Computer Science from the Oklahoma State University in 2002. From 2003 to 2005, he worked under graduate technical scholarships at the University of São Paulo and at the National Laboratory for Scientific Computing (LNCC), Brazil. In 2009, he received his Ph.D. degree in Computational Modeling from the LNCC, where he currently works under a grant from the National Counsel of Technological and Scientific Development (CNPq). His current interests include stochastic models, heavy traffic analysis of queueing systems, and stochastic control theory.

Marcelo D. Fragoso received his DIC and Ph.D. in Electrical Engineering from the Imperial College of Science and Technology, London, England, both in 1986. Since then he has been with the Department of Systems and Control of the National Laboratory for Scientific Computing, Rio de Janeiro, Brazil, where he is currently a full Professor. He is the co-author of the book Discrete Time Markov Jump Linear Systems (New York: Springer-Verlag, 2005). He served as a member of the Editorial Board of the journal Computational and Applied Mathematics, Birkhäser, from 1994 to 1998. His general research interest is in stochastic systems including fundamental theory, modeling, control and filtering. His current research interest is focused on Markov jump linear systems, heavy traffic analysis of queueing network and jump-type Fleming–Viot processes in dynamic population genetics.