Copyright © IFAC System Structure and Control, Prague, Czech Republic, 2001
Publications: www.elsevier.com/locate/ifac
SENSITIVITY ANALYSIS OF JOINT CHARACTERISTICS OF (MAX,+)-LINEAR QUEUEING NETWORKS

Bernd Heidergott

Eindhoven University of Technology, Department of Mathematics and Computer Science, Division of Mathematics, P.O. Box 513, NL-5600 MB Eindhoven
Abstract: We introduce the concept of 𝒟-differentiability of matrices over the (max,+) algebra. Specifically, we view the stochastic (max,+)-linear system x(k+1) = A(k) ⊗ x(k), for k ≥ 0, with x(0) = x₀, as a Markov chain whose transition dynamics are given through the matrices A(k). Elaborating on the product rule of 𝒟-differentiability for Markov kernels, we obtain results on differentiability of (max,+)-linear systems as well as unbiased gradient estimators. Moreover, we establish sufficient conditions for deducing the differentiability of a (max,+)-linear system from that of the firing time distributions of the corresponding stochastic event graph. The results hold uniformly on a predefined class of performance functions. We illustrate our approach with an analysis of joint characteristics of waiting times in a (max,+)-linear queueing network. Copyright © 2001 IFAC

Keywords: perturbation analysis, stochastic Petri nets, Markov models, estimation algorithms
1. INTRODUCTION
In this paper, the concept of 𝒟-differentiability of matrices over the (max,+) algebra is introduced, where 𝒟 is a predefined class of performance functions. Specifically, we interpret the (max,+)-linear system x(k+1) = A(k) ⊗ x(k), for k ≥ 0, with x(0) = x₀, as a Markov chain whose transition dynamics are given through the matrices A(k). Elaborating on the product rule of 𝒟-differentiability for Markov kernels, see (Heidergott, B. and Vazquez-Abad, F., 2000), we obtain results on differentiability of (max,+)-linear systems as well as unbiased gradient estimators. Moreover, we present a scheme that allows us to deduce the differentiability of a (max,+)-linear system from that of the firing times of the corresponding stochastic event graph. In other words, the complexity of the derivative is independent of the type of performance function. We think of 𝒟-differentiability as a promising concept for obtaining Taylor series expansions of joint characteristics of (max,+)-linear systems.

Baccelli and Schmidt showed in their pioneering paper (Baccelli, F. and Schmidt, V., 1996) that waiting times in (max,+)-linear queueing networks with a Poisson-λ arrival stream can be obtained via Taylor expansions w.r.t. λ (see also (Baccelli, F. et al., 1997)). For certain classes of open queueing networks this leads to a feasible way of calculating the waiting time distribution, see (Jean-Marie, A., 1997). In a recent paper, Ayhan and Baccelli developed joint characteristics, like covariances, of stationary waiting times in (max,+)-linear systems into a Taylor series, see (Ayhan, H. and Baccelli, F., 2000). Unfortunately, the elements of the Taylor series expansion turned out to be of such complexity that they provide no feasible way of calculating, say, the covariance of stationary waiting times.

The paper is organised as follows. Section 2 introduces the concept of measure-valued differentiation and states the product rule of measure-valued differentiation for Markov kernels. In Section 3, we introduce (max,+)-linear systems and their corresponding Markov chains. Sensitivity analysis of (max,+)-linear systems is addressed in Section 4, and Section 5 discusses the application of our approach to a queueing network example.
2. MEASURE VALUED DIFFERENTIATION

Let (S, 𝒮) be a Polish measurable space. Denote the set of finite (signed) measures on (S, 𝒮) by 𝕄, and the set of probability measures by 𝕄₁. To abbreviate the notation, we write for μ ∈ 𝕄

⟨g, μ⟩ = ∫ g(x) μ(dx),

whenever the right-hand side is defined. For Θ ⊂ ℝ a compact set, let (μ_θ : θ ∈ Θ) be a family of probability distributions on (S, 𝒮). Denote by L¹(μ_θ; Θ) ⊂ ℝ^S the set of mappings g : S → ℝ such that ⟨g, μ_θ⟩ is finite for all θ ∈ Θ.

Definition 1. A measure μ_θ(·) ∈ 𝕄₁ is called 𝒟-differentiable, for 𝒟 ⊂ L¹(μ_θ; Θ), if a finite signed measure μ_θ′(·) ∈ 𝕄 exists such that

(d/dθ) ⟨g, μ_θ⟩ = ⟨g, μ_θ′⟩   for all g ∈ 𝒟.

Example 2. Let S = [0,∞) and let

f_θ(x) := θ e^(−θx)

be the Lebesgue density of an exponential distribution with rate θ, denoted by μ_θ. Take Θ = [a,b], with 0 < a < b < ∞, and 𝒟 = C_p([0,∞)), where C_p([0,∞)) is the set of all mappings g from [0,∞) to ℝ such that |g(x)| ≤ c₁ + c₂|x|^p, for c₁, c₂ ≥ 0. Then μ_θ is C_p([0,∞))-differentiable for any p. To see this, note that

sup_{θ∈[a,b]} |(d/dθ) f_θ(x)| ≤ (1 + bx) e^(−ax),

which has a finite Lebesgue integral. Applying the dominated convergence theorem, we obtain for any g ∈ C_p([0,∞))

(d/dθ) ∫₀^∞ g(x) f_θ(x) dx = ∫₀^∞ g(x) (d/dθ) f_θ(x) dx.

Note that

max( (d/dθ) f_θ(x), 0 ) = 1_[0,1/θ)(x) (1 − θx) e^(−θx)

and

max( −(d/dθ) f_θ(x), 0 ) = 1_[1/θ,∞)(x) (θx − 1) e^(−θx).

Introducing densities

f_θ⁺(x) := (1/c_θ) 1_[0,1/θ)(x) (1 − θx) e^(−θx)   (1)

and

f_θ⁻(x) := (1/c_θ) 1_[1/θ,∞)(x) (θx − 1) e^(−θx),   (2)

with c_θ := 1/(θe), we obtain for any g ∈ C_p([0,∞))

∫_{ℝ₊} g(x) μ_θ′(dx) = (1/(θe)) ( ∫₀^{1/θ} g(x) f_θ⁺(x) dx − ∫_{1/θ}^∞ g(x) f_θ⁻(x) dx ),

and a C_p([0,∞))-derivative of μ_θ is given by (c_θ, f_θ⁺ dλ, f_θ⁻ dλ), where λ denotes the Lebesgue measure on ℝ.

In the above example, μ_θ as well as μ_θ⁺ and μ_θ⁻ have Lebesgue densities; that is, the measure as well as its 𝒟-derivative are dominated by the same measure. The following example demonstrates that this is not always the case.

Example 3. Let μ_θ be the uniform distribution on the interval [0, θ] for 0 < θ ≤ a, with a < ∞. Let C_b denote the set of continuous mappings from [0, a] to ℝ. For any g in C_b it holds that

(d/dθ) ∫ g(x) μ_θ(dx) = (d/dθ) ( (1/θ) ∫₀^θ g(x) dx )
= (1/θ) g(θ) − (1/θ²) ∫₀^θ g(x) dx
= (1/θ) ( ∫ g(x) δ_θ(dx) − ∫ g(x) μ_θ(dx) ),

where δ_x denotes the Dirac measure in x. Hence, (1/θ, δ_θ, μ_θ) is a C_b-derivative of μ_θ.
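Example 2 can be checked numerically. The following sketch (an added illustration; NumPy, the integration grid, and the test function g(x) = x² are our own choices, not part of the paper) approximates (d/dθ)⟨g, μ_θ⟩ by a central finite difference and compares it with c_θ(⟨g, μ_θ⁺⟩ − ⟨g, μ_θ⁻⟩):

```python
import numpy as np

# Numerical check of Example 2: for X ~ exp(theta) and g(x) = x**2,
# d/dtheta E[g(X)] should equal c_theta * (E[g(X+)] - E[g(X-)]),
# where X+ has density (1) and X- has density (2).
theta = 2.0
c_theta = 1.0 / (theta * np.e)

x = np.linspace(0.0, 40.0, 400_001)   # exp(-theta*x) is negligible beyond 40
dx = x[1] - x[0]
g = x**2                              # a member of C_p([0, inf)) with p = 2

def density(th):                      # exponential density with rate th
    return th * np.exp(-th * x)

# Positive and negative parts of (d/dtheta) f_theta, normalised as in (1), (2).
df = (1.0 - theta * x) * np.exp(-theta * x)
f_plus = np.where(x < 1.0 / theta, df, 0.0) / c_theta
f_minus = np.where(x >= 1.0 / theta, -df, 0.0) / c_theta

mvd = c_theta * (np.sum(g * f_plus) - np.sum(g * f_minus)) * dx

h = 1e-5                              # central finite difference in theta
fd = (np.sum(g * density(theta + h)) - np.sum(g * density(theta - h))) * dx / (2 * h)

print(mvd, fd, -4.0 / theta**3)       # E[X^2] = 2/theta^2, exact derivative -4/theta^3
```

Both approximations agree with the exact value −4/θ³ up to discretisation error, and f_θ⁺ and f_θ⁻ each integrate to one, confirming c_θ = 1/(θe).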
The mapping P : 𝒮 × S → [0,1] is a homogeneous Markov kernel (on (S, 𝒮)) if (a) P(·; s) ∈ 𝕄₁ for all s ∈ S, and (b) P(B; ·) is 𝒮-measurable for all B ∈ 𝒮. Denote the set of Markov kernels on (S, 𝒮) by 𝕂. The product of Markov kernels is again a Markov kernel. Specifically, let P, Q ∈ 𝕂; then the product of P and Q is defined as follows. For s ∈ S and B ∈ 𝒮, set

PQ(B; s) = (P ∘ Q)(B; s) = ∫_S P(B; z) Q(dz; s).

Moreover, write Pⁿ(·; s) for the measure obtained by the n-fold product of P in the above way. When an initial distribution μ ∈ 𝕄₁ is given, P defines a Markov process {X(n)} with state space (S, 𝒮). For example, one may think of starting X(n) in an initial state s₀ and taking μ = δ_{s₀}, the Dirac measure in {s₀}, as initial distribution. For what follows we assume that the initial distribution μ is fixed. We set ⟨g, P⟩ = ∫ g(x) P(dx; y), where it is understood that the left-hand side depends on the initial value y, and for μ ∈ 𝕄₁,

⟨g, Pμ⟩ = ∫∫ g(x) P(dx; y) μ(dy) ∈ ℝ.

For Θ ⊂ ℝ a compact set, let (P_θ : θ ∈ Θ) be a family of Markov kernels on (S, 𝒮) and assume that the initial distribution is fixed and independent of θ. Denote by L¹(P_θ; Θ) ⊂ ℝ^S the set of mappings g : S → ℝ such that ∫ g(u) P_θ(du; s) is finite for all θ ∈ Θ and s ∈ S. To simplify the notation, we will suppress the initial distribution when this causes no confusion.

We call P_θ 𝒟-differentiable if for any s ∈ S the measure P_θ(·; s) is 𝒟-differentiable. For details and examples of 𝒟-differentiable Markov kernels, we refer to (Heidergott, B. and Vazquez-Abad, F., 2000). Following (Heidergott, B. and Vazquez-Abad, F., 2000), if P_θ′(·; s) exists for s ∈ S, then there exist Markov kernels P_θ⁺(·; s), P_θ⁻(·; s) and a constant c_{P_θ}(s) such that for all g ∈ 𝒟

(d/dθ) ⟨g, P_θ(·; s)⟩ = c_{P_θ}(s) ( ⟨g, P_θ⁺(·; s)⟩ − ⟨g, P_θ⁻(·; s)⟩ ).   (3)

The kernel P_θ⁺ is called the (normalised) positive part of P_θ′, P_θ⁻ is called the (normalised) negative part of P_θ′, and c_{P_θ}(s) is called the normalising constant. These measures are not unique; an instance of them is always obtainable via the Hahn-Jordan decomposition of P_θ′.

Definition 4. The triple (c_{P_θ}(s), P_θ⁺(·; s), P_θ⁻(·; s)) in (3) is called a 𝒟-derivative of P_θ(·; s).

In (Heidergott, B. and Vazquez-Abad, F., 2000) the following result has been proved for homogeneous Markov chains.

Result (R). If P_θ is 𝒟-differentiable, such that for any g ∈ 𝒟

∫_S g(s) P_θ(ds; ·) ∈ 𝒟   and   sup_{θ∈Θ} | ∫_S g(s) P_θ′(ds; ·) | ∈ 𝒟,

then P_θⁿ is 𝒟-differentiable, and it holds for any g ∈ 𝒟 that

(d/dθ) ⟨g, P_θⁿ⟩ = Σ_{j=1}^{n} ⟨g, P_θ^{n−j} P_θ′ P_θ^{j−1}⟩
= Σ_{j=1}^{n} ∫_{Sⁿ} c_{P_θ}(s_{j−1}) g(s_n) ∏_{i=j+1}^{n} P_θ(ds_i; s_{i−1})
× ( P_θ⁺(ds_j; s_{j−1}) − P_θ⁻(ds_j; s_{j−1}) ) × ∏_{i=1}^{j−1} P_θ(ds_i; s_{i−1}),   (4)

where s₀ denotes the initial state of the chain. In words, the derivative of ⟨g, P_θⁿ⟩ is obtained as the rescaled difference of two variants of the nominal Markov chain. These variants are obtained by splitting the nominal chain at transition j, where for one variant the transition is performed according to P_θ⁺, the positive part of P_θ′, and for the other variant the transition is performed according to P_θ⁻, the negative part of P_θ′.
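Result (R) translates directly into simulation: run the nominal chain and, at transition j, branch once into a P_θ⁺ variant and once into a P_θ⁻ variant. The sketch below is an added illustration on a chain of our own choosing (the random walk x(k+1) = x(k) + X(k) with X(k) ~ exp(θ), whose 𝒟-derivative is the triple (1/(θe), f_θ⁺, f_θ⁻) of Example 2); here E[x(n)] = n/θ, so the exact derivative is −n/θ².

```python
import numpy as np

rng = np.random.default_rng(0)
theta, n, N = 2.0, 5, 200_000          # rate, horizon, replications (illustrative)
c = 1.0 / (theta * np.e)               # c_theta of Example 2

def sample_plus(size):
    """Rejection sampler for f_theta^+ of (1), proposal uniform on [0, 1/theta)."""
    out = np.empty(0)
    while out.size < size:
        x = rng.uniform(0.0, 1.0 / theta, size)
        u = rng.uniform(size=size)
        out = np.concatenate([out, x[u <= (1.0 - theta * x) * np.exp(-theta * x)]])
    return out[:size]

def sample_minus(size):
    """f_theta^- of (2) is the density of 1/theta plus a Gamma(2, theta) variable."""
    return 1.0 / theta + rng.gamma(2.0, 1.0 / theta, size)

X = rng.exponential(1.0 / theta, (N, n))     # nominal increments
x_n = X.sum(axis=1)                          # nominal x(n); take g = identity

est = 0.0
for j in range(n):                           # split at transition j, cf. (4)
    plus = x_n - X[:, j] + sample_plus(N)    # variant: j-th step via P_theta^+
    minus = x_n - X[:, j] + sample_minus(N)  # variant: j-th step via P_theta^-
    est += c * np.mean(plus - minus)

print(est, -n / theta**2)                    # estimate vs. exact derivative
```

Because only the j-th increment changes distribution, the two variants reuse the remaining nominal increments as common random numbers; this is a variance-reduction choice, not a requirement of Result (R).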
3. (MAX,+)-LINEAR SYSTEMS

The (max,+) algebra is usually introduced as follows. Let ε = −∞ and denote by ℝ_ε the set ℝ ∪ {ε}. For elements a, b ∈ ℝ_ε we define the operations ⊕ and ⊗ by a ⊕ b = max(a, b) and a ⊗ b = a + b, where we adopt the convention that for all a ∈ ℝ, max(a, −∞) = max(−∞, a) = a and a + (−∞) = −∞ + a = −∞. The set ℝ_ε together with the operations ⊕ and ⊗ is called the (max,+) algebra and is denoted by ℝ_max. In particular, ε is the neutral element for the operation ⊕ and absorbing for ⊗; that is, for all a ∈ ℝ_ε, a ⊗ ε = ε. The neutral element for ⊗ is e = 0. For matrices A ∈ ℝ_ε^{I×J} and B ∈ ℝ_ε^{J×K} the matrix product A ⊗ B is defined in the usual way as follows:

(A ⊗ B)_{ik} = ⊕_{j=1}^{J} A_{ij} ⊗ B_{jk}.   (5)
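In code, (5) is an ordinary matrix product with (+, ×) replaced by (max, +); a minimal sketch of our own (ε represented by IEEE −inf, which is automatically absorbing for ⊗):

```python
import numpy as np

EPS = -np.inf   # epsilon: neutral for (+) = max, absorbing for (x) = +

def maxplus_matmul(A, B):
    """(max,+) matrix product of (5): (A (x) B)_ik = max_j (A_ij + B_jk)."""
    I, J = A.shape
    K = B.shape[1]
    C = np.full((I, K), EPS)
    for i in range(I):
        for k in range(K):
            C[i, k] = np.max(A[i, :] + B[:, k])  # -inf + a = -inf, as required
    return C

# e = 0 is the neutral element for (x); the (max,+) identity matrix has
# e on the diagonal and eps elsewhere:
E = np.full((2, 2), EPS)
np.fill_diagonal(E, 0.0)
A = np.array([[1.0, EPS], [3.0, 0.5]])
print(maxplus_matmul(A, E))   # reproduces A
```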
Example 5. Consider an open tandem network consisting of J FCFS single servers. Let queue 0 represent an external arrival stream of customers. Each customer who arrives at the system has to pass through queues 1 to J, and then leaves the system. For simplicity, we assume that the system starts empty. We let x_j(k) denote the time of the kth service completion at station j and denote the kth service time at j by σ_j(k) (where σ₀(k) has to be interpreted as the kth interarrival time). In particular, we let x₀(k) denote the kth arrival epoch at the system. The time evolution of the system can then be described by a (J+1)-dimensional vector x(k) = (x₀(k), ..., x_J(k)) following the recursion

x(k+1) = A(k) ⊗ x(k),   k ≥ 0,

with x(0) = (e, ..., e), where the matrix A(k) is given by

(A(k))_{ij} = ⊗_{l=j}^{i} σ_l(k+1)   (6)

for J ≥ i ≥ j ≥ 0, and otherwise ε. Let W_j(k) be the time the kth customer arriving at the network spends in the system until leaving node j; then

W_j(k) = x_j(k) − x₀(k).

4. SENSITIVITY ANALYSIS

We introduce the mapping ||·||_⊕ as follows: for A ∈ ℝ_ε^{I×J},

||A||_⊕ = ⊕_{i=1}^{I} ⊕_{j=1}^{J} 1_{A_{ij}∈ℝ} A_{ij} = max{ 1_{A_{ij}∈ℝ} A_{ij} : 1 ≤ i ≤ I, 1 ≤ j ≤ J }.

Let A be a random element in ℝ_ε^{I×J} defined on some probability space (Ω, 𝒜, P). The random matrix A ∈ ℝ_ε^{I×J} is said to have fixed support if P(A_{ij} = ε) ∈ {0, 1} for 1 ≤ i ≤ I and 1 ≤ j ≤ J. Provided that A has fixed support, we call A integrable if E[||A||_⊕] < ∞; in other words, A is integrable if for all entries A_{ij} that are different from ε it holds that E[|A_{ij}|] < ∞. For p ∈ ℕ, we denote by C_p(ℝ_ε^J, ||·||_⊕) the set of all functions g : ℝ_ε^J → ℝ such that

|g(x)| ≤ c₁ + c₂ ||x||_⊕^k,   0 ≤ k ≤ p.

The space C_p(ℝ_ε^J, ||·||_⊕) allows us to describe many interesting performance characteristics, as the following example illustrates.

Example 6. Taking g(x₁, ..., x_J) = exp(−r x_j), with r ≥ 0, we obtain the Laplace transform of the jth component of x ∈ ℝ^J. Furthermore, for g(x₁, ..., x_J) = x_j^p, we obtain the higher-order moments of the jth component of x ∈ ℝ^J, for p ≥ 1; and setting g(x₁, ..., x_J) = x_i x_j, for specified i and j with i ≠ j, leads to evaluation of the covariance between the ith and jth component of x. Finally, taking g(x₁, ..., x_J) = exp(−r x_i) exp(−s x_j), for r, s ≥ 0 and specified i and j with i ≠ j, we obtain the joint Laplace transform of x_i and x_j.

Let A ∈ ℝ_ε^{I×J} be a measurable mapping of random variables X₁, ..., X_m, with X_i ∈ ℝ_ε for 1 ≤ i ≤ m; that is, assume that

A = A(X₁, ..., X_m).

We call X₁, ..., X_m the input of A. Consider the (max,+)-linear system x(k+1) = A(k) ⊗ x(k), for k ≥ 0 and x(0) = x₀, with {A(k)} an i.i.d. sequence. We denote the Markov kernel associated to {A(k)} by P_A; that is, for any measurable set B and any state x,

P_A(B|x) = P( x(k+1) ∈ B | x(k) = x ) = P( A(k) ⊗ x ∈ B ).

Moreover, if A(k) has input (X₁(k), ..., X_m(k)), with X_i(k) mutually independent and {X_i(k)} i.i.d. with distribution μ_i, for 1 ≤ i ≤ m, then

P_A(B|x) = ∫_{ℝ_ε^m} 1_{A(x₁,...,x_m)⊗x ∈ B} ∏_{i=1}^{m} μ_i(dx_i).

We define 𝒟-differentiability of A ∈ ℝ_ε^{I×J} through the 𝒟-differentiability of the associated Markov kernel P_A.

Definition 7. We call A ∈ ℝ_ε^{I×J} 𝒟-differentiable if the associated Markov kernel P_A is 𝒟-differentiable; that is, for any x ∈ ℝ_ε^J, the measure P_A(·|x) is 𝒟-differentiable. Moreover, we denote the 𝒟-derivative of A by (c_A, A⁺, A⁻), where P(A^± ⊗ x ∈ B) = P_A^±(B|x) for any measurable set B and any x ∈ ℝ_ε^J, and c_A = c_{P_A}.

Let x(k+1) = A(k) ⊗ x(k), k ≥ 0, with x(0) = x₀, be a (max,+)-linear system, with {A(k)} an i.i.d. sequence, such that A(1) ∈ ℝ_ε^{J×J} is integrable. Provided that A(1) is C_p(ℝ_ε^{J×J}, ||·||_⊕)-differentiable, Result (R) immediately yields the following gradient estimation algorithm for E[g(x(k+1)) | x(0) = x₀] for any g ∈ C_p(ℝ_ε^J, ||·||_⊕).

Algorithm (A). Construct variants x_j^±(k) of x(k) as follows. Initialise x_j^±(0) to x₀. For k ≠ j−1, set

x_j^±(k+1) = A(k) ⊗ x_j^±(k);

otherwise (i.e., for k = j−1), set

x_j^+(j) = A⁺(j−1) ⊗ x_j^+(j−1)   and   x_j^−(j) = A⁻(j−1) ⊗ x_j^−(j−1).

Then, by virtue of Result (R), for any g ∈ C_p(ℝ_ε^J, ||·||_⊕),

(d/dθ) E[g(x(k+1)) | x(0) = x₀] = Σ_{j=1}^{k+1} E[ c_A(x(j−1)) ( g(x_j^+(k+1)) − g(x_j^−(k+1)) ) ],

where we make use of the fact that x(j−1) = x_j^+(j−1) = x_j^−(j−1).

The C_p(ℝ_ε^{I×J}, ||·||_⊕)-differentiability of a matrix can be deduced from that of its entries, as the following theorem shows.

Theorem 8. Let A ∈ ℝ_ε^{I×J} have input X₁, ..., X_m, with X_i ∈ ℝ_ε for 1 ≤ i ≤ m, and let X₁ have a cumulative distribution function F, such that F has C_p(ℝ_ε, ||·||_⊕)-derivative (c_F, F⁺(·), F⁻(·)). If (a) X₁ is stochastically independent of (X₂, ..., X_m), (b) (X₂, ..., X_m) does not depend on θ, and (c) a c ∈ (0,∞) exists such that

||A(x₁, ..., x_m)||_⊕ ≤ c ||(x₁, ..., x_m)||_⊕,

then A has C_p(ℝ_ε^{I×J}, ||·||_⊕)-derivative (c_A, A⁺, A⁻), with c_F = c_A and

• A⁺ = A(X₁⁺, X₂, ..., X_m), where X₁⁺ is distributed according to F⁺(·);
• A⁻ = A(X₁⁻, X₂, ..., X_m), where X₁⁻ is distributed according to F⁻(·).

Proof: The mapping A maps (ℝ_ε)^m onto ℝ_ε^{I×J}. Write g_A(x₁, ..., x_m) for g(A(x₁, ..., x_m)). We obtain

|g_A(x₁, ..., x_m)| = |g(A(x₁, ..., x_m))| ≤ c₁ + c₂ ||A(x₁, ..., x_m)||_⊕^p ≤ c₁ + c₂ c^p ||(x₁, ..., x_m)||_⊕^p.   (7)

Hence, for any g ∈ C_p(ℝ_ε^{I×J}, ||·||_⊕) the mapping g_A is in C_p(ℝ_ε^m, ||·||_⊕). Under the assumptions of the theorem, we may apply Corollary 2 in (Heidergott, B. and Vazquez-Abad, F., 2000) to the independent product of the distributions of the input variables X_j. The fact that the distribution of (X₂, ..., X_m) is independent of θ completes the proof. □

In a queueing application, the entries of the matrix A(k) are typically sums of service times, and the condition in the above theorem is satisfied; see, for example, definition (6) in Example 5.

Theorem 9. Let the conditions of Theorem 8 be satisfied. If E[||X_i||_⊕^p] < ∞, 1 ≤ i ≤ m, then for any g ∈ C_p(ℝ_ε^J, ||·||_⊕) it holds that

∫ g(a) P_A(da|·) ∈ C_p(ℝ_ε^J, ||·||_⊕).

If, in addition,

sup_{θ∈Θ} E[||X₁^±||_⊕^p] < ∞   and   sup_{θ∈Θ} c_F < ∞,

then it holds for any g ∈ C_p(ℝ_ε^J, ||·||_⊕) that

sup_{θ∈Θ} | ∫ g(a) P_A′(da|·) | ∈ C_p(ℝ_ε^J, ||·||_⊕).

Proof: For the proof of the first part of the theorem, note that for any x ∈ ℝ_ε^J it holds that

| ∫ g(a) P_A(da|x) | ≤ E[ |g(A ⊗ x)| ] ≤ c₁ + c₂ E[ ||A ⊗ x||_⊕^p ] ≤ c₁ + c₂ E[ (||A||_⊕ + ||x||_⊕)^p ].

Since E[||X_i||_⊕^p] < ∞, it follows that E[||A||_⊕^m] < ∞ for 1 ≤ m ≤ p (for a proof use inequality (7)). Hence, for appropriately defined constants d₁, d₂, we obtain

| ∫ g(a) P_A(da|x) | ≤ d₁ + d₂ ||x||_⊕^p,

which concludes the proof of the first part of the theorem.

For the proof of the second part, note that for any y ∈ ℝ_ε^J it holds that

sup_{θ∈Θ} | ∫ g(a) P_A′(da|y) | ≤ sup_{θ∈Θ} E[ c_A ( |g(A⁺ ⊗ y)| + |g(A⁻ ⊗ y)| ) ]
= sup_{θ∈Θ} E[ c_A ( |g(A(X₁⁺, X₂, ..., X_m) ⊗ y)| + |g(A(X₁⁻, X₂, ..., X_m) ⊗ y)| ) ].

Using the fact that g ∈ C_p(ℝ_ε^J, ||·||_⊕), we elaborate on (7) to conclude the proof (note that Theorem 8 implies that c_A = c_F). □
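To make the construction concrete, here is a simulation sketch of Algorithm (A) for the tandem network of Example 5 with exponential interarrival times (rate θ) and exponential service times. It is an added illustration: the rates, horizon, and replication count are our own choices, and the variants reuse the nominal service times as common random numbers. It estimates (d/dθ) E[W_J(K)] with g(x) = x_J − x₀.

```python
import numpy as np

rng = np.random.default_rng(1)
theta = 2.0                      # arrival rate (parameter of interest)
mu = np.array([3.0, 4.0])        # service rates of stations 1..J (illustrative)
J, K, N = len(mu), 10, 5_000     # stations, horizon, replications
c_A = 1.0 / (theta * np.e)       # c_A = c_F, cf. Theorem 8 and Example 2

def tandem_step(x, sigma):
    """One step x(k+1) = A(k) (x) x(k) with A(k) as in (6)."""
    y = np.full(J + 1, -np.inf)
    for i in range(J + 1):
        for j in range(i + 1):
            y[i] = max(y[i], sigma[j:i + 1].sum() + x[j])
    return y

def sigma_plus():                # one draw from f_theta^+ of (1), by rejection
    while True:
        x = rng.uniform(0.0, 1.0 / theta)
        if rng.uniform() <= (1.0 - theta * x) * np.exp(-theta * x):
            return x

def sigma_minus():               # f_theta^- of (2): 1/theta + Gamma(2, theta)
    return 1.0 / theta + rng.gamma(2.0, 1.0 / theta)

est = 0.0
for _ in range(N):
    # nominal inputs: sigmas[k] = (sigma_0(k+1), sigma_1(k+1), ..., sigma_J(k+1))
    sigmas = np.column_stack([rng.exponential(1.0 / theta, K),
                              rng.exponential(1.0 / mu, (K, J))])
    states = [np.zeros(J + 1)]                  # x(0) = (e, ..., e)
    for k in range(K):
        states.append(tandem_step(states[k], sigmas[k]))
    for j in range(1, K + 1):                   # split at transition j
        for sign, draw in ((+1, sigma_plus), (-1, sigma_minus)):
            s = sigmas[j - 1].copy()
            s[0] = draw()                       # resample the interarrival time only
            x = tandem_step(states[j - 1], s)   # transition j via A+ or A-
            for k in range(j, K):               # then continue nominally
                x = tandem_step(x, sigmas[k])
            est += sign * c_A * (x[J] - x[0])   # g(x) = W_J(K)
est /= N
print(est)   # estimate of d/dtheta E[W_J(K)]
```

With g(x) = (x_j − x₀)(x_i − x₀), the same loop yields the estimator for dE[W_j(K) W_i(K)]/dθ discussed in the next section.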
5. A QUEUEING NETWORK EXAMPLE

Consider the queueing network given in Example 5 with

A(k) = A(σ₀(k), ..., σ_J(k)).

Assume that the interarrival time σ₀(k) is stochastically independent of the service times σ₁(k), ..., σ_J(k) and that the service times are independent of θ. Let σ₀(k) be exponentially distributed with mean 1/θ; that is, P(σ₀(k) ≤ x) = F(x, θ) = 1 − e^(−θx). In accordance with Example 2, F is C_p([0,∞))-differentiable for any p. Observe that

||A(σ₀(k), ..., σ_J(k))||_⊕ ≤ (J+1) ||(σ₀(k), ..., σ_J(k))||_⊕.

Hence, Theorem 8 applies and we obtain the following C_p(ℝ_ε^{J×J}, ||·||_⊕)-derivative of A(k):

A⁺(k) = A(σ₀⁺(k), σ₁(k), ..., σ_J(k)),
A⁻(k) = A(σ₀⁻(k), σ₁(k), ..., σ_J(k)),

and c_A = 1/(θe), where σ₀⁺(k) has density f_θ⁺ (defined in (1)) and σ₀⁻(k) has density f_θ⁻ (defined in (2)). For Θ = [a,b], Theorem 9 implies that we may apply Result (R) and, hence, that Algorithm (A) yields an unbiased estimator. The same holds true when we take F as the uniform distribution, cf. Example 3. Specifically, taking g(x₀, ..., x_J) = (x_j − x₀)(x_i − x₀) for j ≠ i and j, i > 0, Algorithm (A) yields an unbiased gradient estimator for dE[W_j(k) W_i(k)]/dθ.

6. SUMMARY

We introduced the concept of 𝒟-differentiability of matrices over the (max,+) algebra. Moreover, we established sufficient conditions under which 𝒟-differentiability of one of the firing time distributions of a (max,+)-linear system guarantees, for any g out of a predefined class of performance functions, unbiasedness of the gradient estimator for E[g(x(k)) | x(0) = x₀]. Specifically, if one of the firing time distributions is C_p(ℝ_ε, ||·||_⊕)-differentiable, then typically the transition matrices {A(k)} of the (max,+) model are weakly C_p(ℝ_ε^{J×J}, ||·||_⊕)-differentiable (Theorem 8). Provided that the C_p(ℝ_ε, ||·||_⊕)-derivative of the firing time depending on θ is bounded, the C_p(ℝ_ε^{J×J}, ||·||_⊕)-derivative of A(k) is bounded (Theorem 9). Then, Result (R) implies differentiability of E[g(x(k)) | x(0) = x₀] for any g ∈ C_p(ℝ_ε^J, ||·||_⊕), and Algorithm (A) yields an unbiased gradient estimator. Specifically, the complexity of the algorithm is independent of the function g. More precisely, the order p of the function space C_p(ℝ_ε^J, ||·||_⊕) with respect to which we carry out our analysis doesn't affect the algorithm.

The extension of this analysis to higher-order derivatives and Taylor series expansions is a topic of further research.
7. REFERENCES

Ayhan, H. and Baccelli, F. (2000). Expansions for joint Laplace transforms of stationary waiting times in (max,+)-linear systems with Poisson input. (submitted).

Baccelli, F. and Schmidt, V. (1996). Taylor series expansions for Poisson-driven (max,+)-linear systems. Ann. Appl. Prob. 6, 138-185.

Baccelli, F., Hasenfuß, S. and Schmidt, V. (1997). Transient and stationary waiting times in (max,+)-linear systems with Poisson input. QUESTA 26, 301-342.

Heidergott, B. and Vazquez-Abad, F. (2000). Measure valued differentiation for stochastic processes: the finite horizon case. EURANDOM report 2000-033. The report can be accessed via the web at http://www.eurandom.nl.

Jean-Marie, A. (1997). Waiting time distributions in Poisson-driven deterministic systems. Technical report no. 3083. INRIA Sophia Antipolis.