Volume 15, Number 4, INFORMATION PROCESSING LETTERS, 31 October 1982

A SOLUTION METHOD FOR THE NON-ADDITIVE RESOURCE ALLOCATION PROBLEM IN DISTRIBUTED SYSTEM DESIGN

S. CERI and G. PELAGATTI
Istituto di Elettrotecnica ed Elettronica, Politecnico di Milano, 20133 Milano, Italy

Received December 1980; revised version received 22 July 1982
Keywords: Distributed systems, resource allocation
1. Introduction
Distributed information systems are characterized by the sharing of data and processing resources among the nodes of a computer network. The applications activated at each node can be divided into local applications, which are processed entirely at that node, and distributed applications, which need some non-local resource. In the latter case, non-local processes are activated and some information is transmitted over the computer network. Clearly, determining the optimal allocation of resources for a given computer network and a given set of applications is crucial to the design of a successful distributed information system. Several works exist on this subject [1-3]; in all of them the relevant optimization parameter is the advantage of allocating a resource where it is used by an application (or, conversely, the cost of accessing a remote resource from an application). However, none of these works considers the advantage of allocating all the resources needed by an application on its execution node, thus obtaining a local application. The basic assumption of this paper is that the advantage of executing an application locally is not only the sum of the saved remote accesses, but also the increased simplicity in controlling the execution of the transaction (see, for instance, [4] on concurrency control in a distributed database system).

An optimization model which takes this into account is of course more complex than a model which does not, because the need to distinguish between local and distributed applications introduces a non-additivity in the cost function, as will be shown. The model assumes that the set of resources used by each application is known beforehand. Section 2 gives a formulation of the problem, and Section 3 presents a solution which exploits the analogy between this type of allocation problem and the determination of the maximum flow-minimum cut in a graph. Finally, Section 4 presents an application of the optimization model in which the resources are the files of a distributed database system.
0020-0190/82/0000-0000/$02.75 © 1982 North-Holland

2. Problem formulation

Consider one node $N$ of the computer network. Let $A = \{a_j\}$ be the set of the applications which are activated at this node, $B = \{r_i\}$ the set of the resources, and $R(a_j) \subseteq B$ the subset of resources which are required by application $a_j$. We can define the subset $R \subseteq B$ of those resources which are required by at least one application $a_j \in A$; therefore,

$$R = \bigcup_{a_j \in A} R(a_j).$$
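As a concrete illustration, an instance for one node can be represented by mapping each application $a_j$ to its set $R(a_j)$; the set $R$ is then the union of these sets and may be strictly smaller than $B$. The Python sketch below assumes this dictionary representation; the names `B`, `required` and `used_resources` and the toy data are hypothetical.

```python
# Hypothetical toy instance for one node N: each application a_j maps to R(a_j).
B = {"r1", "r2", "r3", "r4"}  # all resources of the system
required = {
    "a1": {"r1"},
    "a2": {"r1", "r2"},
    "a3": {"r2", "r3"},
}


def used_resources(required):
    """Return R, the union of the sets R(a_j) over all applications in A."""
    return set().union(*required.values())


R = used_resources(required)
assert R <= B            # R is a subset of B
print(sorted(R))         # ['r1', 'r2', 'r3']
print(sorted(B - R))     # ['r4'] is never used at this node
```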
Let $u(r_i)$ be the cost of allocating a resource $r_i \in R$ on node $N$, $v(r_i)$ the sum of the costs of accessing resource $r_i$ remotely by all the applications of $A$, and $w(a_j)$ the cost of performing application $a_j$ distributedly instead of locally. A feasible solution of the allocation problem for node $N$ is the allocation of a subset $\bar R \subseteq R$ of resources on node $N$. The cost of a solution $\bar R$ is the sum of three elements:

(i) $U(\bar R) = \sum_{r_i \in \bar R} u(r_i)$ is the cost of allocating the resources at node $N$;

(ii) $V(\bar R) = \sum_{r_i \in (R - \bar R)} v(r_i)$ is the cost of accessing remotely the resources which are needed by the applications of $A$ but are not allocated at $N$;

(iii) $W(\bar R) = \sum_{a_j \in A(\bar R)} w(a_j)$, where $A(\bar R) = \{a_j : R(a_j) \not\subseteq \bar R\}$, is the additional cost of performing distributedly instead of locally the applications which require some resource $r_i \notin \bar R$. Note that the definition of the set $A(\bar R)$ is such that the cost $w(a_j)$ is counted only if at least one resource needed by application $a_j$ has not been included in the solution $\bar R$.

The optimal solution $\bar R^{\circ}$ is then expressed by

$$\bar R^{\circ} = \arg\min_{\bar R \in \mathcal{R}} \Big[ \sum_{r_i \in \bar R} u(r_i) + \sum_{r_i \in (R - \bar R)} v(r_i) + \sum_{a_j \in A(\bar R)} w(a_j) \Big] \tag{2.1}$$

where $\mathcal{R}$ is the set of all possible solutions.

Definition (iii) introduces a non-additivity in the goal function which changes the nature of the optimization model and requires a different solution method with respect to traditional formulations of the allocation problem [1-3].

3. Solution method

The proposed solution method consists in showing that the allocation problem can be formulated as the classical problem of finding a maximum flow-minimum cut in a graph. Given an allocation problem AP, the minimum cut problem MC(AP) can be obtained in the following way. Let $A$ and $R$ be the sets of applications and of used resources of the AP problem as defined in the previous section; the graph of the derived MC(AP) problem is defined as $G = (\mathcal{N}, E)$, with

$$\mathcal{N} = A \cup R \cup \{s\} \cup \{d\}, \qquad E = E_1 \cup E_2 \cup E_3 \cup E_4,$$

where $s$ and $d$ are two additional nodes of MC(AP), representing the source node and the destination node of the graph and having no meaning in AP, and the directed edges $E$ are defined as follows:

$$E_1 = \{(s, r_i) : r_i \in R\}, \tag{3.1}$$

$$E_2 = \{(r_i, d) : r_i \in R\}, \tag{3.2}$$

$$E_3 = \{(a_j, d) : a_j \in A\}, \tag{3.3}$$

$$E_4 = \{(r_i, a_j) : r_i \in R(a_j),\ r_i \in R,\ a_j \in A\}. \tag{3.4}$$

The edges of $E_4$ represent the fact that a resource is used by an application. Consider the following capacity assignment to the edges of $G$:

$$C((s, r_i)) = u(r_i) \quad \forall (s, r_i) \in E_1, \tag{3.5}$$

$$C((r_i, d)) = v(r_i) \quad \forall (r_i, d) \in E_2, \tag{3.6}$$

$$C((a_j, d)) = w(a_j) \quad \forall (a_j, d) \in E_3, \tag{3.7}$$

$$C((r_i, a_j)) = \infty \quad \forall (r_i, a_j) \in E_4. \tag{3.8}$$
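The construction of $G$ is mechanical, and Theorem 3.1 below states that a minimum cut of $G$ yields the optimal allocation. The Python sketch below is one possible illustration, not the authors' implementation. It builds $G$ with capacities (3.5)-(3.8) as residual-capacity dictionaries and computes a maximum flow with breadth-first augmenting paths (the Edmonds-Karp rule for the labelling method). It then takes as allocated set the resources that can no longer be labelled from $s$, i.e. those on the destination side of the cut, and cross-checks the result against an exhaustive minimization of (2.1). The function names and toy costs are hypothetical, and the node names `'s'` and `'d'` are assumed not to be used for applications or resources.

```python
import math
from collections import deque
from itertools import combinations


def goal(required, u, v, w, chosen):
    """Cost (2.1) of allocating the resource set `chosen` on node N."""
    resources = set().union(*required.values())
    return (sum(u[r] for r in chosen)
            + sum(v[r] for r in resources - chosen)
            + sum(w[a] for a, need in required.items() if not need <= chosen))


def brute_force(required, u, v, w):
    """Minimise (2.1) over every subset of R (exponential; for checking only)."""
    resources = sorted(set().union(*required.values()))
    subsets = (set(c) for k in range(len(resources) + 1)
               for c in combinations(resources, k))
    best = min(subsets, key=lambda c: goal(required, u, v, w, c))
    return best, goal(required, u, v, w, best)


def min_cut_allocation(required, u, v, w):
    """Build G with capacities (3.5)-(3.8), run BFS augmenting-path max-flow,
    and return (resources on the d side of the minimum cut, cut capacity)."""
    cap = {"s": {}, "d": {}}  # residual capacities cap[x][y]

    def add_edge(x, y, c):
        cap.setdefault(x, {})[y] = c
        cap.setdefault(y, {}).setdefault(x, 0)  # reverse residual edge

    resources = set().union(*required.values())
    for r in resources:
        add_edge("s", r, u[r])        # E1, capacity (3.5)
        add_edge(r, "d", v[r])        # E2, capacity (3.6)
    for a, need in required.items():
        add_edge(a, "d", w[a])        # E3, capacity (3.7)
        for r in need:
            add_edge(r, a, math.inf)  # E4, capacity (3.8)

    flow = 0
    while True:
        parent = {"s": None}
        queue = deque(["s"])
        while queue and "d" not in parent:  # breadth-first labelling
            x = queue.popleft()
            for y, c in cap[x].items():
                if c > 0 and y not in parent:
                    parent[y] = x
                    queue.append(y)
        if "d" not in parent:
            break  # no augmenting path: the flow is maximum
        path, y = [], "d"
        while parent[y] is not None:
            path.append((parent[y], y))
            y = parent[y]
        delta = min(cap[x][y] for x, y in path)  # bottleneck capacity
        for x, y in path:
            cap[x][y] -= delta
            cap[y][x] += delta
        flow += delta

    # Labelled nodes form the source side; unlabelled resources are allocated.
    return {r for r in resources if r not in parent}, flow


required = {"a1": {"r1"}, "a2": {"r1", "r2"}, "a3": {"r2", "r3"}}
u = {"r1": 4, "r2": 3, "r3": 5}
v = {"r1": 2, "r2": 1, "r3": 1}
w = {"a1": 3, "a2": 4, "a3": 2}
chosen, capacity = min_cut_allocation(required, u, v, w)
print(sorted(chosen), capacity)  # ['r1', 'r2'] 10
best, cost = brute_force(required, u, v, w)
print(sorted(best), cost)        # ['r1', 'r2'] 10
```

On this toy instance, allocating $r_1$ and $r_2$ makes $a_1$ and $a_2$ local while $a_3$ remains distributed, and both methods give a cost of 10. The exhaustive search is exponential in $|R|$ and is included only as a check.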
A cut $T$ separating $s$ and $d$ is a partition of the nodes of the graph into two subsets $S_T$ and $D_T$ such that $s \in S_T$ and $d \in D_T$. To each cut $T$ a set of edges $E_T$ can be associated, with

$$E_T = \{(n_1, n_2) \in E : n_1 \in S_T,\ n_2 \in D_T\}.$$

The capacity of a cut $T$, denoted by $C(E_T)$, is the sum of the capacities of the edges of $E_T$. The set $R^*(T)$ of resources associated with a cut $T$ is defined in the following way:

$$r_i \in R^*(T) \iff r_i \in D_T. \tag{3.9}$$

An example of a graph associated with a problem MC(AP), of a cut $T$ and of the set $R^*(T)$ is shown in Fig. 1 (the nodes belonging to $S_T$ are shown in black). The above definitions show how to construct a directed graph from the allocation problem; the following theorem states that an optimal solution of the minimum capacity cut problem on the graph also determines an optimal solution of the allocation problem on node $N$ of the computer network.

Fig. 1. Representation of the transportation network associated with the problem MC(AP) and of a cut $T$ separating the source $s$ and the destination $d$ ($R = \{r_1, r_2, r_3\}$, $A = \{a_1, a_2, a_3, a_4\}$, $R(a_1) = \{r_1\}$, $R(a_2) = \{r_1, r_2\}$, $R(a_3) = \{r_1, r_3\}$, $R(a_4) = \{r_2, r_3\}$, $R^*(T) = \{r_1, \ldots\}$).

Theorem 3.1. If $T^{\circ}$ is the minimum capacity cut for the problem MC(AP), then $R^*(T^{\circ})$ is the optimal solution of the allocation problem AP.

Proof. The following two conditions (3.10a) and (3.10b) define the set $\mathcal{T}$ of candidate optimal cuts:

$$T \in \mathcal{T} \Rightarrow \nexists\, (r_i, a_j) : (r_i \in S_T) \wedge (a_j \in D_T) \wedge (r_i \in R(a_j)), \tag{3.10a}$$

$$T \in \mathcal{T} \Rightarrow \nexists\, a_j : (a_j \in S_T) \wedge (R(a_j) \subseteq D_T). \tag{3.10b}$$

Condition (3.10a) excludes from $\mathcal{T}$ those cuts in which an edge with unlimited capacity goes from $S_T$ to $D_T$. Condition (3.10b) excludes from $\mathcal{T}$ some cuts which are dominated by another cut in $\mathcal{T}$: for each cut $T'$ which satisfies condition (3.10a) but not condition (3.10b), a cut $T'' \in \mathcal{T}$ with a lower capacity can be found. In fact, let $A' \subseteq S_{T'}$ be the set of applications which do not satisfy condition (3.10b), and consider the cut $T'' \in \mathcal{T}$ with $S_{T''} = S_{T'} - A'$ and $D_{T''} = D_{T'} \cup A'$. Moving the applications of $A'$ to the destination side removes the edges $(a_j, d)$ from the cut and adds no new edge to it, since every edge entering an application $a_j \in A'$ comes from a resource of $R(a_j) \subseteq D_{T'}$; hence $T''$ dominates $T'$ because

$$C(E_{T'}) = C(E_{T''}) + \sum_{a_j \in A'} C((a_j, d)).$$

Of course, the optimal cut $T^{\circ}$ belongs to $\mathcal{T}$. The capacity of a cut $T \in \mathcal{T}$ can be decomposed into four terms:

$$C(E_T) = C(E_T \cap E_1) + C(E_T \cap E_2) + C(E_T \cap E_3) + C(E_T \cap E_4). \tag{3.11}$$

The four capacity terms are computed as follows:

$$C(E_T \cap E_1) = \sum_{r_i \in D_T} C((s, r_i)) = \sum_{r_i \in R^*(T)} u(r_i), \tag{3.12}$$

$$C(E_T \cap E_2) = \sum_{r_i \in S_T} C((r_i, d)) = \sum_{r_i \in R - R^*(T)} v(r_i), \tag{3.13}$$

$$C(E_T \cap E_3) = \sum_{a_j \in S_T} C((a_j, d)) = \sum_{a_j \in A(T)} w(a_j), \tag{3.14}$$

where $A(T) = \{a_j : R(a_j) \not\subseteq R^*(T)\}$, because condition (3.10b) states that $a_j \in S_T$ only if there is at least one resource $r_i \in R(a_j)$ such that $r_i \notin R^*(T)$, while condition (3.10a) guarantees the converse (an application using a resource of $S_T$ cannot lie in $D_T$);

$$C(E_T \cap E_4) = 0 \tag{3.15}$$

because of condition (3.10a). The minimum cut problem can be formulated as

$$T^{\circ} = \arg\min_{T \in \mathcal{T}} C(E_T);$$

substituting (3.11), (3.12), (3.13), (3.14) and (3.15), we have

$$T^{\circ} = \arg\min_{T \in \mathcal{T}} \Big[ \sum_{r_i \in R^*(T)} u(r_i) + \sum_{r_i \in R - R^*(T)} v(r_i) + \sum_{a_j \in A(T)} w(a_j) \Big]. \tag{3.16}$$
Comparing expressions (2.1) and (3.16), it is clear that the formulations of the allocation problem AP and of its associated minimum cut problem MC(AP) are equivalent, provided that the spaces $\mathcal{T}$ and $\mathcal{R}$ are the same. In fact, definition (3.9) associates with each cut $T \in \mathcal{T}$ a set $R^*(T) \in \mathcal{R}$ of resources to be allocated; conversely, with each allocation $\bar R \in \mathcal{R}$ exactly one cut $T \in \mathcal{T}$ is associated, with

$$r_i \in D_T \iff r_i \in \bar R,$$

while the partitioning of the $a_j$ is unique because of conditions (3.10) and can be derived through the following rule:

$$a_j \in S_T \iff (\exists\, r_i \in S_T : r_i \in R(a_j)). \qquad \square$$

The above theorem shows that the allocation problem is equivalent to a minimum cut problem, which is a typical problem of operations research; the minimum cut problem has been proved equivalent to the determination of the maximum flow between the source node and the destination node [5]. Many classical and efficient solution methods have been presented, such as the 'labelling method' [5] or the 'out-of-kilter method' [6]. Edmonds and Karp have shown [7] that the computational complexity of the labelling method, provided that a good choice is made of the node labelling sequence (breadth-first labelling, which yields shortest augmenting paths), is $O(m^2 n)$, where $m$ is the number of arcs and $n$ the number of nodes of the network.

4. An application: file allocation in distributed databases

An example of the application of the proposed optimization model is the solution of the file allocation problem in distributed database design. In this section it is shown how the coefficients of the optimization model can be derived from the typical parameters of the file allocation problem. It is assumed that transactions address retrieval requests to one copy of the files, possibly a local one, while updates are addressed to all the copies to preserve consistency [1]. The file allocation problem formulated in this section consists in determining a set of files (resources) of which it is convenient to locate a copy on a given node $N$. The parameters of the file allocation problem are shown in Table 1.

Table 1
Parameters for the file allocation problem

$r_i$: the $i$th file (resource)
$a_j$: the $j$th transaction (application)
$TL$: the set of local (on $N$) transaction indexes
$TR$: the set of remote transaction indexes
$F$: the set of file indexes
$S_i$: size of file $r_i$
$f_j$: frequency of activation of transaction $a_j$
$n^r_{ij}$: number of retrieval accesses of transaction $a_j$ to file $r_i$
$n^u_{ij}$: number of update accesses of transaction $a_j$ to file $r_i$
$C_s$: unit cost of storage
$C_{lr}$: cost of one local retrieval access
$C_{rr}$: cost of one remote retrieval access
$C_{lu}$: cost of one local update access
$C_{ru}$: cost of one remote update access
$d_j$: cost of performing transaction $a_j$ distributedly instead of locally

In order to apply formulation (2.1) of the optimization model, the coefficients can be computed as follows:

(i) $$u(r_i) = C_s S_i + C_{lu} \sum_{j \in TL} f_j n^u_{ij} + C_{ru} \sum_{j \in TR} f_j n^u_{ij}, \qquad r_i \in R,$$
takes care of the updating and storage costs due to the allocation of a copy of $r_i$ on node $N$;

(ii) $$v(r_i) = (C_{rr} - C_{lr}) \sum_{j \in TL} f_j n^r_{ij}$$
takes care of the costs of performing remote retrieval accesses to $r_i$ rather than local retrievals by the applications located on $N$;

(iii) $$w(a_j) = d_j f_j$$
takes care of the costs of performing transaction $a_j$ distributedly instead of locally $f_j$ times.

Finally, in order to apply formulation (2.1), consider that a solution $\bar R$ is any subset of the set of files $F$ and that the set $A(\bar R)$ is given by

$$A(\bar R) = \Big\{ a_j : j \in TL \ \wedge \sum_{i \in (F - \bar R)} (n^r_{ij} + n^u_{ij}) \neq 0 \Big\}.$$

Thus, all the coefficients required by formulation (2.1) have been expressed using the parameters of Table 1, and the solution method proposed in Section 3 can be applied.

References

[1] K.P. Eswaran, Placement of records in a file and file allocation in a computer network, Proc. Information Processing (North-Holland, Amsterdam, 1974).
[2] S. Mahmoud and J.S. Riordon, Optimal allocation of resources in distributed information networks, ACM Trans. Database Systems 1 (1) (1976).
[3] H.L. Morgan and J.D. Levin, Optimal program and data locations in computer networks, Comm. ACM 20 (5) (1977).
[4] P.A. Bernstein, D.W. Shipman and J.B. Rothnie, Concurrency control in a system for distributed databases (SDD-1), ACM Trans. Database Systems 5 (1) (1980).
[5] L.R. Ford and D.R. Fulkerson, Flows in Networks (Princeton University Press, 1962).
[6] D.R. Fulkerson, An out-of-kilter method for minimal cost flow problems, SIAM J. Appl. Math. 9 (1961).
[7] J. Edmonds and R.M. Karp, Theoretical improvements in algorithmic efficiency for network flow problems, J. ACM 19 (1972).
[8] M. Balinski, On a selection problem, Management Sci. 17 (3) (1970).
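To make the coefficient formulas (i)-(iii) and the set $A(\bar R)$ of Section 4 concrete, they can be transcribed into Python as follows. This is an illustrative reading of those formulas, not the authors' code. The dictionary layout, the key names `'s'`, `'lr'`, `'rr'`, `'lu'`, `'ru'` standing for the unit storage, retrieval and update costs of Table 1, the helper names, and the numeric values are all hypothetical.

```python
def coefficients(files, TL, TR, S, f, n_r, n_u, C, d):
    """Coefficients u, v, w of Section 4, transcribed from formulas (i)-(iii).

    n_r[j][i] and n_u[j][i] are the retrieval and update accesses of
    transaction j to file i (missing entries count as 0). Unit costs are
    passed in C under the hypothetical keys 's', 'lr', 'rr', 'lu', 'ru'.
    """
    def nr(j, i):
        return n_r.get(j, {}).get(i, 0)

    def nu(j, i):
        return n_u.get(j, {}).get(i, 0)

    # (i) storage plus update costs caused by keeping a copy of file i on N
    u = {i: C["s"] * S[i]
            + C["lu"] * sum(f[j] * nu(j, i) for j in TL)
            + C["ru"] * sum(f[j] * nu(j, i) for j in TR)
         for i in files}
    # (ii) extra cost of remote instead of local retrievals by local transactions
    v = {i: (C["rr"] - C["lr"]) * sum(f[j] * nr(j, i) for j in TL)
         for i in files}
    # (iii) cost of running each local transaction distributedly f_j times
    w = {j: d[j] * f[j] for j in TL}
    return u, v, w


def required_files(files, TL, n_r, n_u):
    """R(a_j) for each local transaction: the files it reads or updates."""
    return {j: {i for i in files
                if n_r.get(j, {}).get(i, 0) + n_u.get(j, {}).get(i, 0) > 0}
            for j in TL}


def nonlocal_transactions(chosen, files, TL, n_r, n_u):
    """A(R-bar): local transactions accessing some file outside `chosen`."""
    outside = set(files) - set(chosen)
    return {j for j in TL
            if sum(n_r.get(j, {}).get(i, 0) + n_u.get(j, {}).get(i, 0)
                   for i in outside) != 0}


# Hypothetical two-file, two-transaction example.
files, TL, TR = ["F1", "F2"], ["t1"], ["t2"]
S = {"F1": 10, "F2": 20}
f = {"t1": 5, "t2": 2}
n_r = {"t1": {"F1": 3}, "t2": {"F2": 1}}
n_u = {"t1": {"F2": 1}, "t2": {"F1": 2}}
C = {"s": 1, "lr": 1, "rr": 4, "lu": 1, "ru": 4}
d = {"t1": 6}

u, v, w = coefficients(files, TL, TR, S, f, n_r, n_u, C, d)
print(u, v, w)  # {'F1': 26, 'F2': 25} {'F1': 45, 'F2': 0} {'t1': 30}
```

Here $w$ is defined only for local transactions, since remote transactions never run at $N$. A transaction leaves $A(\bar R)$ exactly when every file it reads or updates belongs to $\bar R$.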