QoS-aware automatic syntactic service composition problem: Complexity and resolution



Accepted Manuscript

QoS-aware automatic syntactic service composition problem: Complexity and resolution

Virginie Gabrel, Maude Manouvrier, Kamil Moreau, Cécile Murat

PII: S0167-739X(17)30557-5
DOI: http://dx.doi.org/10.1016/j.future.2017.04.009
Reference: FUTURE 3411

To appear in: Future Generation Computer Systems

Received date: 11 July 2016
Revised date: 10 March 2017
Accepted date: 4 April 2017

Please cite this article as: V. Gabrel, M. Manouvrier, K. Moreau, C. Murat, QoS-aware automatic syntactic service composition problem: Complexity and resolution, Future Generation Computer Systems (2017), http://dx.doi.org/10.1016/j.future.2017.04.009

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

Highlights (for review)

Our contribution is twofold.
• We propose an original formulation of the QoS-aware automatic syntactic service composition problem as a scheduling problem with AND/OR constraints, for the execution time and throughput QoS criteria.
• We report experimental results on WSC-09 showing the efficiency of our exact algorithm based on this scheduling formulation.
• We analyse the complexity of optimizing other types of QoS criteria (e.g. cost and reliability).
• We exhibit polynomial cases.


Future Generation Computer Systems 00 (2017) 1–21

Special Issue on Service-Oriented Systems Engineering

QoS-aware Automatic Syntactic Service Composition problem: complexity and resolution

Virginie Gabrel, Maude Manouvrier, Kamil Moreau, Cécile Murat

Université Paris-Dauphine, PSL Research University, CNRS, UMR 7243, LAMSADE, 75016 Paris, France
E-mail: {gabrel,murat,manouvrier}@lamsade.dauphine.fr

Abstract

The automatic syntactic service composition problem consists in automatically selecting services from a registry by matching their input and output data. The composite service resulting from this selection produces a set of output data, needed by a user, from a set of input data, given by the user. When Quality-of-Service (QoS) values are attached to each service (for example execution time and cost), this selection problem becomes an optimization problem called the QoS-aware automatic syntactic service composition problem. Depending on the QoS criterion used, many models and algorithms have been proposed in the literature to solve it. The aim of this article is twofold. Firstly, we provide a unified understanding of the wide variety of existing approaches, and we analyse and compare the theoretical complexity induced by each QoS criterion. We state that an optimal solution for the execution time or throughput criteria can be determined in polynomial time, but that optimality is no longer guaranteed in polynomial time for criteria like cost or reliability. Indeed, we show that the composition problem becomes NP-hard when optimizing such criteria. Secondly, we propose a novel approach for solving the polynomial cases more efficiently. This approach is based on a scheduling formulation with AND/OR constraints, using a directed graph structure. On the Web Service Challenge-09 benchmark (considering the execution time and throughput QoS criteria), our exact algorithm outperforms the related work.

Keywords: Service composition, Web Service Challenge, AND/OR constraints, QoS optimization.

1. Introduction

The QoS-aware automatic syntactic service composition problem consists in automatically composing services by matching their parameters (input and output data) such that the resulting composite service can produce a set of output data from a set of input data while optimizing a Quality of Service (QoS) criterion. This composition problem is a very active area of research, as shown in recent surveys [1, 2], and is addressed by standard synthetic data benchmarks such as the Web Service Challenge (WSC). WSC, described in [3], proposes an incremental synthetic data benchmark. In 2005, it focused only on syntactic web service composition; it then integrated OWL for taxonomy in 2007 and structured data types in 2008 [4], and finally introduced QoS criteria (execution time and throughput) in 2009 [5]. These benchmarks are still used by recent approaches. For example, WSC-07 is used in [6], the authors of [7] evaluate their approach on WSC-08, the authors of [8] use WSC-08 and WSC-09, and the authors of [9] run experiments on WSC-09 and WSC-10.


Several polynomial-time algorithms have been proposed for solving the QoS-aware automatic syntactic service composition problem, e.g. in [9, 10, 11, 12, 13]. In this article, we show that the theoretical complexity of the problem depends on the QoS criterion used. We state that polynomial-time algorithms can only determine optimal solutions for the execution time or throughput QoS criteria (the ones proposed in WSC-09), but that optimality is no longer guaranteed for QoS criteria like cost or reliability. Indeed, the composition problem becomes NP-hard when optimizing such QoS criteria.

Our contribution is twofold. Firstly, we analyse the complexity of optimizing different types of QoS criteria and we explain why some problems are polynomial while others are NP-hard. Secondly, we propose an original formulation of the QoS-aware automatic syntactic service composition problem as a scheduling problem with AND/OR constraints, for the execution time and throughput QoS criteria. This formulation makes it possible to apply a very efficient exact algorithm for optimizing both criteria (as shown by our experimental results on WSC-09).

This article is organized as follows. Section 2 presents the syntactic service composition problem. Related work is reviewed in Section 3. Section 4 analyses the theoretical complexity of the service composition problem as a function of the QoS criterion. Considering the execution time and throughput criteria, a new approach is proposed in Section 5, in which the service composition problem is modelled and solved as a scheduling problem with AND/OR constraints. This new approach is particularly efficient, as shown by our numerous experimental results. Finally, Section 6 concludes.

2. QoS-aware automatic Syntactic Service Composition problem: definition

In the syntactic service composition problem, the set of available services is known: each service is described by the data needed for its execution and the data produced when it has been invoked. A user query is defined by a set of provided data and a set of data to be produced. The syntactic service composition problem is to determine whether the user query can be satisfied by selecting a subset of services. The feasibility constraints associated with the service selection are described in Section 2.1.

2.1. Problem description: the feasibility part

Like the syntactic description of functions in a programming language, any (SOAP, RESTful or Web API) service s can be syntactically described by the set of its input data (i.e. the data needed by the service to be invoked), in(s), and the set of its output data (i.e. the data produced by the service invocation), out(s). A service registry contains a set S of services and an induced set D of data.

The syntactic description of services is used in many approaches. For example, the authors of [14] recently proposed a multi-level index whose first levels are constructed according to the inputs and the outputs of services. In [15], service inputs and outputs are stored in a relational database. This description results from a semantic analysis that allows service parameters to be matched. Usually, the semantic information is described by an ontology. For example in [16], each parameter is an instance of an OWL [17] class in the DBpedia ontology (footnote 1; see [18] for a description of DBpedia). In this article, we suppose that the syntactic description of services is given. The service discovery and the semantic analysis needed to extract such a description are out of the scope of the paper.

Example 1. Let us consider an example with 9 services, numbered from 1 to 9, and 9 data, represented by capital letters in Table 1.

s        1    2      3      4    5      6      7    8      9
in(s)    J    K, P   L, M   M    M      P      J    N, P   L
out(s)   M    J, M   N      P    P, Q   R, Q   L    T      N, P

Table 1. An example of a service registry with S = {1, ..., 9} and D = {J, K, L, M, N, P, Q, R, T}

Footnote 1: http://dbpedia.org/ontology/

A service registry can easily be represented by a directed graph G = (X, U), called the ServiceData graph in the following:
• vertices represent services and data: X = D ∪ S,

• directed edges in U represent two kinds of relation: (1) an edge from i ∈ S to j ∈ D represents the fact that service i produces data j (j ∈ out(i)), (2) an edge from i ∈ D to j ∈ S represents the fact that service j needs data i to be executed (i ∈ in(j)).

Let us remark that G is a bipartite graph (there is no directed edge linking two vertices belonging to S or two vertices belonging to D).

Example 2. The ServiceData graph associated with the service registry given in Table 1 is represented in Fig. 1. Vertices representing data are drawn as circles, while vertices representing services are drawn as squares. For example, vertex P is a data produced by the services represented by vertices 4, 5 and 9, and is an input of services 2, 6 and 8. The service represented by vertex 3 needs data L and M to be executed and produces data N.

Figure 1. The ServiceData graph associated with the registry of Table 1
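As an illustration of this construction, the following minimal Python sketch builds the ServiceData graph of Table 1. The names REGISTRY and service_data_graph are hypothetical and chosen only for this example; the adjacency is stored as a plain set of arcs, which is one possible representation among others.

```python
# Registry of Table 1: service -> (in(s), out(s)).
REGISTRY = {
    1: ({"J"}, {"M"}),
    2: ({"K", "P"}, {"J", "M"}),
    3: ({"L", "M"}, {"N"}),
    4: ({"M"}, {"P"}),
    5: ({"M"}, {"P", "Q"}),
    6: ({"P"}, {"R", "Q"}),
    7: ({"J"}, {"L"}),
    8: ({"N", "P"}, {"T"}),
    9: ({"L"}, {"N", "P"}),
}


def service_data_graph(registry):
    """Return (S, D, U): services, data and arcs of the ServiceData graph."""
    services = set(registry)
    data = set()
    arcs = set()
    for s, (inputs, outputs) in registry.items():
        data |= inputs | outputs
        arcs |= {(d, s) for d in inputs}    # data -> service: s needs d
        arcs |= {(s, d) for d in outputs}   # service -> data: s produces d
    return services, data, arcs


S, D, U = service_data_graph(REGISTRY)
producers_of_P = {i for (i, j) in U if j == "P"}   # services producing P
consumers_of_P = {j for (i, j) in U if i == "P"}   # services needing P
```

Every arc has exactly one endpoint in S, which reflects the bipartite structure noted above.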

A user query, denoted (I, O), is defined by a set I ⊆ D of data provided by the user and a set O ⊆ D of data required by the user. A service can be invoked if and only if all its input data are available. A data is available for a service s either when it is provided by the user, or when it has been produced by another service invoked before s. We can now formally define a composite service as follows:

Definition 1. Given a service registry and a user query (I, O), a composite service C is a subset of services such that:
- $\forall s \in C, \forall d \in in(s),\ d \in I \cup \big(\bigcup_{r \in C \setminus \{s\}} out(r)\big)$, meaning that each service s ∈ C can be invoked since all its input data are available.
- $\forall d \in O,\ d \in \bigcup_{r \in C} out(r)$, meaning that all data required by the user are available.

In order to respond to a user query, the syntactic service composition problem consists in selecting a composite service that provides, from the user input set I, the set O of all outputs needed by the user. To formulate this problem on the ServiceData graph, the following transformations are necessary:
• add a fictitious vertex $d_0$ in D and I, representing an "empty" data; add an arc $(d_0, s) \in U$ ($d_0$ is included in in(s)) for each service s which does not need any input to be executed,
• add a fictitious vertex denoted $s_0$ in S, which represents the beginning of the process; for all i ∈ I, add an arc $(s_0, i) \in U$ (thus $out(s_0) = I$),


• add a fictitious vertex denoted End in S, which represents the end of the process; for all o ∈ O, add an arc (o, End) ∈ U (thus in(End) = O).

A feasible solution, denoted C, of the aforementioned composition problem can be expressed as a subgraph of the transformed ServiceData graph.

Property 1. Given a query (I, O), a subgraph $G^C = (X^C, U^C)$ of the ServiceData graph G corresponds to a composite service C if and only if:
(i) $X^C$ contains services $s_0$ and End, all services of C, and all the input data of these services: $X^C = \{s_0, End\} \cup C \cup \big(\bigcup_{s \in C \cup \{End\}} in(s)\big)$,
(ii) $\forall i \in D \cap X^C$, $U^C$ contains exactly one arc of the form (j, i), with $j \in C \cup \{s_0\}$,
(iii) $\forall i \in C \cup \{End\}$, $U^C$ contains all arcs of the form (j, i), with j ∈ D,
(iv) $G^C$ does not contain any directed cycle.

Proof 1. By construction, given (I, O), the subgraph $G^C$ represents a composite service C since:
- (i) implies that only the services of C, and the necessary data, are included in $X^C$,
- (ii) implies that each data i is produced by exactly one service,
- (iii) implies that each service i ∈ C can be invoked if and only if all its input data are available,
- (iii) also implies that the data in O are produced, since all input data of End are available,
- concerning (iv), let us suppose that $G^C$ contains a directed cycle of the form $\{i_1, j_1, \ldots, j_k, i_1\}$. This cycle describes the following execution scheme: service $j_1$ is invoked before $j_k$, and $j_k$ produces data $i_1$. This means that data $i_1$ is not available before the execution of $j_1$, which contradicts the fact that $j_1$ needs $i_1$ to be invoked.

Example 3. Given the graph of Fig. 1 and the query described by I = {J, K} and O = {Q, T}, we can propose the composite service {6, 7, 8, 9}. This solution is associated with the subgraph represented by bold arcs in Fig. 2.

Figure 2. A composite service and the associated subgraph
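The feasibility of the composite service of Example 3 can be checked with a short sketch. The function invocation_order below is a hypothetical helper, not an algorithm from the article: it repeatedly invokes any selected service whose inputs are available, and then checks, as in Definition 1, that every required data is produced by a selected service.

```python
# Registry of Table 1: service -> (in(s), out(s)).
REGISTRY = {
    1: ({"J"}, {"M"}),
    2: ({"K", "P"}, {"J", "M"}),
    3: ({"L", "M"}, {"N"}),
    4: ({"M"}, {"P"}),
    5: ({"M"}, {"P", "Q"}),
    6: ({"P"}, {"R", "Q"}),
    7: ({"J"}, {"L"}),
    8: ({"N", "P"}, {"T"}),
    9: ({"L"}, {"N", "P"}),
}


def invocation_order(registry, composite, provided, required):
    """Return a valid invocation order of `composite`, or None if infeasible."""
    available = set(provided)   # data provided by the user or already produced
    produced = set()            # data produced by services of the composite
    pending = set(composite)
    order = []
    progress = True
    while pending and progress:
        progress = False
        for s in sorted(pending):           # sorted() copies, so removal is safe
            inputs, outputs = registry[s]
            if inputs <= available:         # all inputs of s are available
                available |= outputs
                produced |= outputs
                pending.remove(s)
                order.append(s)
                progress = True
    if pending or not set(required) <= produced:
        return None
    return order


order = invocation_order(REGISTRY, {6, 7, 8, 9}, {"J", "K"}, {"Q", "T"})
```

With the query of Example 3, service 7 is invoked first (it only needs J), then 9, then 6 and 8.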

Let us remark that a composite service can be obtained in polynomial time using an adapted Depth First Search algorithm. Moreover, there may exist many different composite services satisfying a particular query. These composite services can be compared on the basis of the QoS criteria describing each of their services. Among QoS criteria, we can find the execution time (i.e. the time needed by a service to produce output data from the input data) and the throughput (i.e. the average number of successful executions). In Section 2.2, we present the way to evaluate a composite service according to the most used QoS criteria, and the induced optimization problems.


2.2. Problems description: the evaluation part

The QoS of a composite service C depends on the QoS of each service belonging to it. In the following, we present four well-known QoS criteria, which are representative of four different ways to aggregate the QoS values of individual services:

• Execution time Service Composition (ESC). Execution time represents the time needed to execute the service. Let us denote by e(s) ≥ 0 the execution time of service s, by $P^C$ the set of paths in $G^C$ from $s_0$ to End, and by $v_e(C)$ the value of the composite service C on the execution time criterion. We have:
$$v_e(C) = \max_{\mu \in P^C} \sum_{s \in \mu} e(s)$$
This criterion has to be minimized.

• Throughput Service Composition (TSC). Throughput represents the average number of successful executions of the service. Let us denote by t(s) ≥ 0 the throughput of service s and by $v_t(C)$ the value of the composite service C on the throughput criterion. We have:
$$v_t(C) = \min_{s \in C} t(s) = \min_{\mu \in P^C} \min_{s \in \mu} t(s)$$
This criterion has to be maximized.

• Cost Service Composition (CSC). Cost represents the fee the user has to pay to invoke the service. Let us denote by c(s) ≥ 0 the cost of service s and by $v_c(C)$ the value of the composite service C on the cost criterion. We have:
$$v_c(C) = \sum_{s \in C} c(s)$$
This criterion has to be minimized.

• Reliability Service Composition (RSC). Reliability represents the probability that the service executes successfully. Let us denote by r(s), with 0 ≤ r(s) ≤ 1, the reliability of service s and by $v_r(C)$ the value of the composite service C on the reliability criterion. We have:
$$v_r(C) = \prod_{s \in C} r(s)$$
This criterion has to be maximized.

The QoS values are modelled in the ServiceData graph as weights on vertices: a vertex representing a service of S has a weight equal to its QoS value, while vertices associated with data and fictitious vertices ($s_0$ and End) have a neutral value (0 for the execution time and cost values, 1 for the reliability value and +∞ for the throughput value).

In this article, we analyse the four optimization problems, each of them defining a specific version of the QoS-aware automatic Syntactic Service Composition problem, and our contributions are the following:
• We show that the minimal ESC problem is a well-studied scheduling problem with AND/OR precedence constraints.
• The minimal ESC and maximal TSC problems can be solved with a polynomial-time algorithm (already proposed in the scheduling context) directly applied on the associated ServiceData graph. This algorithm is more efficient on WSC-09 instances than already published algorithms [10, 11, 12].
• We analyse the particular difficulty of the NP-hard minimal CSC problem and we exhibit a polynomial case (associated with a particular class of ServiceData graph).
• We show that the maximal RSC problem is NP-hard and we also exhibit a polynomial case.

The next section presents the related approaches associated with the aforementioned problems.
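As a worked illustration of the four aggregation rules of Section 2.2, the sketch below evaluates the composite service {6, 7, 8, 9} of Example 3. All QoS values (E, T, C, R) are hypothetical and not taken from the article, and the helper execution_time assumes that each data has a single producer inside the composite service, as required by Property 1 (ii).

```python
import math

# Services of the composite service of Example 3: service -> (in(s), out(s)).
COMPOSITE = {
    6: ({"P"}, {"R", "Q"}),
    7: ({"J"}, {"L"}),
    8: ({"N", "P"}, {"T"}),
    9: ({"L"}, {"N", "P"}),
}
PROVIDED, REQUIRED = {"J", "K"}, {"Q", "T"}

# Hypothetical QoS values (illustration only).
E = {6: 4, 7: 2, 8: 1, 9: 3}              # execution times
T = {6: 50, 7: 80, 8: 40, 9: 60}          # throughputs
C = {6: 5, 7: 1, 8: 2, 9: 3}              # costs
R = {6: 0.9, 7: 0.95, 8: 0.99, 9: 0.9}    # reliabilities


def execution_time(composite, provided, required, e):
    """Length of the longest path from s0 to End, computed wave by wave."""
    ready = {d: 0 for d in provided}      # time at which each data is available
    pending = set(composite)
    while pending:
        runnable = [s for s in pending
                    if all(d in ready for d in composite[s][0])]
        if not runnable:
            raise ValueError("composite service is not feasible")
        for s in runnable:
            inputs, outputs = composite[s]
            finish = max((ready[d] for d in inputs), default=0) + e[s]
            for d in outputs:
                ready.setdefault(d, finish)   # single producer assumed
            pending.remove(s)
    return max(ready[d] for d in required)


v_e = execution_time(COMPOSITE, PROVIDED, REQUIRED, E)   # max over paths of a sum
v_t = min(T[s] for s in COMPOSITE)                       # min over services
v_c = sum(C[s] for s in COMPOSITE)                       # sum over services
v_r = math.prod(R[s] for s in COMPOSITE)                 # product over services
```

With these values, the critical path goes through services 7, 9 and 6, so the execution time is 2 + 3 + 4 = 9, whereas the cost adds up all four services.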


3. Related work

The QoS-aware automatic syntactic service composition problem has attracted a lot of attention from different fields in recent years. We classify these approaches into four groups: search approaches, dependency graph approaches, planning graph approaches and integer linear programming approaches.

3.1. Search approaches

In [10] and [19], the WSC-09 first run-up winners propose a polynomial-time algorithm to solve the minimal ESC and maximal TSC problems. Its worst-case complexity, given in [10], is $O(a|S|^2)$, with a the average number of input data per service. This algorithm proceeds in two steps: in the first one, a forward search eliminates the services that are useless for satisfying the user's query and, at the same time, computes the optimal QoS values of the useful services; in the second step, a backtrack search is executed to determine the optimal composite service.

In [12], the authors propose a more efficient polynomial-time algorithm for solving the minimal ESC problem: the first part is a search procedure determining, at each iteration, the minimal accumulated execution time of one service (this search procedure stops when all user outputs are reached), and the second part is a backtrack search that determines the optimal composite service. No theoretical complexity results are given. This algorithm is the closest one to ours. However, the authors state that the backtrack search can be rather long. From our point of view, this is due to the chosen data structures, and we propose a more efficient graph-based data structure.

3.2. Dependency graph approaches

In [11, 20], the authors represent the QoS-aware automatic syntactic service composition problem with another directed graph, called the dependency graph. In the dependency graph, denoted H = (S, A), vertices only represent services. There is an arc in A from i to j if the intersection between the outputs of i and the inputs of j is not empty (i.e. service j needs a subset of the outputs produced by i to be executed, in(j) ∩ out(i) ≠ ∅). The arc is tagged with the subset of data belonging to the intersection.

Example 4. The dependency graph corresponding to the service registry given in Table 1 is represented in Fig. 3. Vertex s is drawn as a square divided into three columns: in(s) is given in the first column, s in the middle one and out(s) in the last one. For example, service 9 produces data N and P and service 6 needs data P, so there is an arc from 9 to 6 with tag P.

Figure 3. The dependency graph associated with the registry of Table 1

Remark 1. To build the dependency graph, one needs to read the service registry and, for each service, to match its inputs with the outputs of all the other services, inducing an O(|S|) complexity for each service (cf. [21]). Consequently, the building step of the dependency graph is in $O(|S|^2)$.
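The pairwise matching described in Remark 1 can be sketched as follows. The function dependency_graph is a hypothetical illustration on the registry of Table 1, not the implementation of [11] or [21]; it compares every ordered pair of services, which makes the quadratic number of comparisons explicit. Skipping self-loops (i = j) is an assumption made here for simplicity.

```python
# Registry of Table 1: service -> (in(s), out(s)).
REGISTRY = {
    1: ({"J"}, {"M"}),
    2: ({"K", "P"}, {"J", "M"}),
    3: ({"L", "M"}, {"N"}),
    4: ({"M"}, {"P"}),
    5: ({"M"}, {"P", "Q"}),
    6: ({"P"}, {"R", "Q"}),
    7: ({"J"}, {"L"}),
    8: ({"N", "P"}, {"T"}),
    9: ({"L"}, {"N", "P"}),
}


def dependency_graph(registry):
    """Return {(i, j): tag} where tag = out(i) & in(j) is non-empty."""
    arcs = {}
    for i, (_, out_i) in registry.items():        # |S| candidate tails
        for j, (in_j, _) in registry.items():     # |S| candidate heads
            shared = out_i & in_j
            if i != j and shared:                 # assumption: no self-loops
                arcs[(i, j)] = shared
    return arcs


A = dependency_graph(REGISTRY)
```

For instance, the arc from 9 to 6 carries the tag {P}, as in Example 4, while service 6 has no outgoing arc because no service needs R or Q.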


In [11], authors propose a new approach: to apply a Dijkstra-like algorithm on the dependency graph to solve the minimal ESC and maximal TSC problems. In their experimental results, we remark that the computation time is mainly due to the building step of the dependency graph (for example, for instances with 10000 services, the building time is 6000ms while the resolution time is 60ms). Moreover, this article includes inexact results: authors claim that their algorithm can be also applied to determine the optimal composite service on product-type criterion (reliability). We show, in Section 4.1, that it is not true since RSC problem is NP-hard. In [13], authors mention the inexact result of [11] to establish that the optimal QoS can be calculated in polynomial time, which is right only for throughput and execution time criteria. The same misconception appears in [9].

3.3. Planning graph approaches

AI planning and graph-based approaches are also used for solving the service composition problem. In [22], the planning graph model is applied to solve the following feasibility problem: does there exist a feasible composite service satisfying the user's request, without considering any QoS criterion? Considering the QoS-aware composition problem, Chen and Yan [23] propose a three-step algorithm to determine the best composite service for several criteria such as execution time or cost. First, they represent the service composition problem by a labelled planning graph. Second, they transform the labelled planning graph into a layered weighted graph. Third, an optimal composite service is found by applying Dijkstra's shortest path algorithm. Steps 1 and 3 can be done by polynomial-time algorithms; the complexity of step 2 is not presented in [23]. Since the QoS-aware service composition problem with the cost criterion is NP-hard, step 2 is the critical non-polynomial part of the algorithm. Thus, this approach is not attractive for solving the service composition problem with a criterion like execution time, since a polynomial-time algorithm has already been proposed in [19]. The same criticism applies to the algorithmic study presented in [24].

3.4. Integer linear programming approaches

Integer Linear Programming (ILP) models have also been proposed for QoS-aware service composition in [25, 26, 27]. In [25, 26], a composite service is decomposed into stages: a stage contains one service or several services executed in parallel. The associated ILP represents the problem of selecting one or several services per stage. Thus, the number of variables and constraints can be huge, since they are proportional to the number of services and data times the number of stages. Moreover, the number of stages is not known; only upper bounds can be chosen (the worst one is to set the number of stages equal to the number of services). The size of the model does not allow large instances to be solved. In [27], the authors propose a new ILP model containing a lower number of variables and constraints (proportional to the number of services and data). This model is able to optimize a QoS criterion while satisfying transactional requirements. In a large majority of the experimental instances of WSC-09, the model of [27] finds a better solution more rapidly than the models proposed in [25, 26].

In conclusion, there exist many algorithmic propositions dedicated to particular versions of the QoS-aware syntactic composition problem. To the best of our knowledge, none of them highlights the diversity of the theoretical complexity of these different versions. The purpose of the following section is to propose a classification of these versions according to their complexity.

4. QoS-aware Syntactic Composition problem: complexity classification

A particular QoS criterion applied to the QoS-aware syntactic composition problem induces a specific version. The most studied versions are the minimal ESC, maximal TSC, minimal CSC and maximal RSC problems. On the one hand, polynomial-time algorithms have already been proposed in [9, 10, 11, 12, 13] for solving the minimal ESC and maximal TSC problems. On the other hand, the minimal CSC problem has been proved to be NP-hard in [20, 28]. Based on this result, we establish the NP-hardness of maximal RSC in Section 4.1. In order to explain the theoretical complexity difference between versions, we analyse polynomial cases in terms of optimal paths in the ServiceData graph, in Section 4.2. Beyond complexity classification, this analysis allows us to propose new polynomial cases for the minimal CSC and maximal RSC problems.


4.1. QoS-aware syntactic composition problem: NP-hard problems

In [20, 28], the authors show that the minimal CSC problem is NP-hard. In [20], the proof is based on a reduction from the set cover problem. Indeed, the problem remains NP-hard even in the simplest case where all data in O are produced by services that can be invoked from I. In this case, each service can cover a data subset O′ ⊆ O with a data subset I′ ⊆ I. When all service costs are equal to 1, a minimum cost composite service is equivalent to a minimum set cover of O. Based on the NP-hardness of the minimal CSC problem, we deduce that the maximal RSC problem is also NP-hard.

Theorem 1. The maximal RSC problem is NP-hard.

Proof 2. The proof is based on a reduction from the minimal CSC problem. Given an instance of the minimal CSC problem described by a graph G = (X, U) and a cost c(s) ≥ 0 associated with each service s, we build an instance of the maximal RSC problem described by the same graph G and a reliability $r(s) = e^{-c(s)}$ associated with each service s, which implies that each r(s) belongs to [0, 1]. An optimal solution of such an instance of the maximal RSC problem is equivalent to an optimal solution of the minimal CSC problem since, with $\mathcal{C}$ denoting the set of composite services:
$$\begin{aligned}
\max_{C \in \mathcal{C}} \prod_{s \in C} r(s) &\Leftrightarrow \max_{C \in \mathcal{C}} \ln\Big(\prod_{s \in C} r(s)\Big) \\
&\Leftrightarrow \max_{C \in \mathcal{C}} \sum_{s \in C} \ln(r(s)) \\
&\Leftrightarrow \max_{C \in \mathcal{C}} \sum_{s \in C} \ln(e^{-c(s)}) \\
&\Leftrightarrow \max_{C \in \mathcal{C}} \sum_{s \in C} (-c(s)) \\
&\Leftrightarrow \min_{C \in \mathcal{C}} \sum_{s \in C} c(s)
\end{aligned}$$

Therefore, both the maximal RSC and the minimal CSC problems are NP-hard.

In [11], the authors claim that their polynomial-time algorithm can be used for exactly solving the maximal RSC problem. Theorem 1 implies that this claim cannot hold unless P = NP. Here we give an example to illustrate that the algorithm provided in [11] does not give the optimal solution.

Example 5. In the dependency graph in Fig. 4, we consider the following reliability values:

Services      1     2     3
Reliability   0.8   0.8   0.7

The algorithm of [11] determines the composite service made of services 1 and 2 to obtain data M, N, P and Q, with a global reliability of 0.64, while the optimal solution is clearly service 3 alone, whose reliability of 0.7 is better.
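The arithmetic of Example 5, together with the transformation used in Proof 2, can be checked numerically. In the sketch below, the two candidate composite services are those named in Example 5; the cost c(s) = -ln r(s) is the inverse of the mapping r(s) = e^{-c(s)} of the proof, and the names RELIABILITY, CANDIDATES, reliability and cost are hypothetical.

```python
import math

RELIABILITY = {1: 0.8, 2: 0.8, 3: 0.7}               # values of Example 5
CANDIDATES = [frozenset({1, 2}), frozenset({3})]     # the two composite services


def reliability(composite):
    """Product-type criterion v_r(C)."""
    return math.prod(RELIABILITY[s] for s in composite)


def cost(composite):
    """Sum-type criterion v_c(C) with c(s) = -ln r(s), i.e. r(s) = exp(-c(s))."""
    return sum(-math.log(RELIABILITY[s]) for s in composite)


most_reliable = max(CANDIDATES, key=reliability)
cheapest = min(CANDIDATES, key=cost)
```

Both criteria select the same composite service, as expected from the chain of equivalences: maximizing the product of reliabilities amounts to minimizing the sum of the transformed costs.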

Figure 4. A counterexample for the reliability measure

4.2. QoS-aware syntactic composition problem: polynomial problems

While the minimal CSC and maximal RSC problems are NP-hard, the minimal ESC problem is polynomial. In the following, we propose a new interpretation of the minimal ESC problem as a scheduling problem. Based on this formulation, the polynomial complexity of the minimal ESC problem is obvious.

In a classical precedence-constrained scheduling problem, we have to determine the starting times of a set of jobs to be performed (with known execution times) while satisfying precedence constraints of the form "a job i can be performed if and only if a job j is finished". Thus, a job is ready to be executed when all its predecessor jobs are completed. The objective is to minimize the total execution time of all the jobs. A classical approach to solve such a problem is to compute the earliest starting time of each job. The earliest starting time is equal to the maximum of the ending times of all predecessor jobs. So, one has to compute a longest path, called the critical path, in an associated graph. This well-known problem can be solved in polynomial time with the critical-path algorithm (see e.g. [29]).

In a scheduling problem with AND/OR precedence constraints, two kinds of jobs are considered: jobs with AND precedence constraints are "classical" ones, which can be executed when all their predecessor jobs are finished, and jobs with OR precedence constraints can be performed when at least one of their predecessor jobs is finished. This scheduling problem can also be solved in polynomial time: several algorithms have been proposed in [30, 31, 32]. The only difference with the classical precedence-constrained scheduling problem concerns jobs with OR precedence constraints: for such a job, the earliest starting time is the minimum among the ending times of its predecessor jobs. With OR precedence constraints, a subset of jobs may not be performed, but the total execution time of the completed jobs is still defined by a longest path.

The minimal ESC problem is exactly a scheduling problem with AND/OR precedence constraints in a ServiceData graph:
• both data and services are jobs,
• services are AND-constrained jobs, since each service can be executed if and only if ALL its input data are available,
• data are OR-constrained jobs, since each data is available when AT LEAST one service produces it.

Since the scheduling problem with AND/OR precedence constraints is polynomial, the minimal ESC problem is also polynomial. Given a composite service C and its associated subgraph $G^C$, its execution time is given by the longest path from $s_0$ to End.
When we consider the throughput criterion, the value of a path $\mu$ in $G^C$ is $v_t(\mu) = \min_{s \in \mu} t(s)$ and the value of a composite service C is then $v_t(C) = \min_{\mu \in P^C} v_t(\mu)$. So, the maximal TSC problem is to find the composite service that maximizes its total throughput, which corresponds to a path of minimum value. Consequently, the minimal ESC problem and the maximal TSC problem are similar, and the maximal TSC problem is also polynomial.
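The AND/OR interpretation of the minimal ESC problem can be sketched as a label-setting computation on the registry of Table 1. This is a minimal illustration under stated assumptions, not the algorithm of [32] nor the one developed later in the article: the execution times in EXEC_TIME are hypothetical, and the function earliest_times is a name introduced only for this example. A data (OR job) becomes available at the smallest finishing time among its producers; a service (AND job) starts at the largest availability time among its inputs.

```python
import heapq

# Registry of Table 1: service -> (in(s), out(s)).
REGISTRY = {
    1: ({"J"}, {"M"}), 2: ({"K", "P"}, {"J", "M"}), 3: ({"L", "M"}, {"N"}),
    4: ({"M"}, {"P"}), 5: ({"M"}, {"P", "Q"}), 6: ({"P"}, {"R", "Q"}),
    7: ({"J"}, {"L"}), 8: ({"N", "P"}, {"T"}), 9: ({"L"}, {"N", "P"}),
}
# Hypothetical execution times e(s) >= 0 (illustration only).
EXEC_TIME = {1: 2, 2: 1, 3: 1, 4: 3, 5: 4, 6: 2, 7: 1, 8: 2, 9: 5}


def earliest_times(registry, exec_time, provided):
    """Earliest availability time of each reachable data (OR = min, AND = max)."""
    ready = {d: 0 for d in provided}              # definitive data labels
    missing = {s: len(ins - set(provided)) for s, (ins, _) in registry.items()}
    consumers = {}
    for s, (ins, _) in registry.items():
        for d in ins:
            consumers.setdefault(d, []).append(s)
    heap = [(exec_time[s], s) for s, m in missing.items() if m == 0]
    heapq.heapify(heap)                           # services that can start at 0
    finished = set()
    while heap:
        finish, s = heapq.heappop(heap)           # smallest finishing time first
        if s in finished:
            continue
        finished.add(s)
        for d in registry[s][1]:
            if d in ready:                        # OR job: first producer wins
                continue
            ready[d] = finish
            for c in consumers.get(d, []):
                missing[c] -= 1
                if missing[c] == 0:               # AND job: all inputs available
                    start = max(ready[x] for x in registry[c][0])
                    heapq.heappush(heap, (start + exec_time[c], c))
    return ready


ready = earliest_times(REGISTRY, EXEC_TIME, {"J", "K"})
best_execution_time = max(ready[d] for d in {"Q", "T"})   # End is an AND job
```

Because execution times are non-negative, services leave the heap in non-decreasing order of finishing time, so the first producer reaching a data indeed gives its minimum availability time.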

4.3. Comments on the disparity of the complexity results

For the ESC and TSC problems, the value of a composite service C is equal to the value of a particular optimal path in $G^C$: the maximal path value for the execution time criterion and the minimal path value for the throughput criterion. That is why these two criteria, which are minmax-type (or equivalently maxmin-type) criteria, can be similarly optimized by computing an optimal path with a Dijkstra-like algorithm. In contrast, the CSC (resp. RSC) problem presents a sum-type (resp. product-type) objective function that cannot be expressed in terms of path values in $G^C$, as illustrated in the following.

Considering a service s (resp. a data d), we denote by $C_s$ (resp. $C_d$) a composite service ending with s (resp. d) and by $v_c(C_s)$ (resp. $v_c(C_d)$) its associated value. For a minmax-type criterion, the value of $C_s$ is the value of the longest path ending in s, computed as follows: $v(C_s) = \max_{d \in in(s)} v(C_d)$. For a sum-type criterion, such as cost, the cost of a composite service $C_s$ is equal to $v_c(C_s) = \sum_{s_k \in C_s} c(s_k)$. The difficulty comes from the fact that $v_c(C_s)$ cannot be computed as a function of the paths ending in s: $v_c(C_s)$ is neither equal to $\sum_{d \in in(s)} v_c(C_d)$ nor to $\max_{d \in in(s)} v_c(C_d)$.

In the first typical situation, a service produces several data.

Figure 5. An example with a two-output data service: $s_i$ produces data $K$ and $L$, which are both inputs of $s_j$.

The composite service of Fig. 5 has a total cost $v_c(C_{s_j}) = c(s_i) + c(s_j)$. If we sum the cost values of the inputs of $s_j$ (or equivalently the costs of the paths ending in $s_j$), we obtain:

$$v_c(C_K) + v_c(C_L) = 2c(s_i)$$

When adding $c(s_j)$, we do not obtain $v_c(C_{s_j})$ since $c(s_i)$ is counted twice (forgetting that $K$ and $L$ are both outputs of the same service $s_i$).

In the second typical situation, a data is an input of several services.

Figure 6. An example with a same input data of two distinct services: $s_i$ produces data $P$, which is an input of both $s_j$ (producing $Q$) and $s_k$ (producing $R$); $Q$ and $R$ are the inputs of $s_l$.

The composite service of Fig. 6 has a total cost $v_c(C_{s_l}) = c(s_i) + c(s_j) + c(s_k) + c(s_l)$. On the one hand, $v_c(C_{s_l}) \neq \max(v_c(C_Q), v_c(C_R))$ since $v_c(C_Q) = c(s_i) + c(s_j)$ and $v_c(C_R) = c(s_i) + c(s_k)$. On the other hand, summing the cost values of the inputs of $s_l$, we obtain:

$$v_c(C_Q) + v_c(C_R) = 2c(s_i) + c(s_j) + c(s_k)$$

When adding $c(s_l)$, we do not obtain $v_c(C_{s_l})$ since the cost of producing $P$, namely $c(s_i)$, is counted twice.

In the general case, the cost of a composite service cannot be expressed in terms of path values. However, it is possible to exhibit a subclass of the minimal CSC (resp. maximal RSC) problem such that the cost (resp. reliability) of a composite service is exactly the sum (resp. the product) of the path values.
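The double counting described for Fig. 6 can be checked numerically. The sketch below is only an illustration: the cost values are hypothetical, and the graph encodes the structure described in the text (P produced by si and consumed by sj and sk, whose outputs Q and R feed sl).

```python
# Hypothetical costs for the four services of Fig. 6.
COST = {"si": 4, "sj": 1, "sk": 2, "sl": 3}
# Input data of each service; si only consumes query inputs, omitted here.
INPUTS = {"si": [], "sj": ["P"], "sk": ["P"], "sl": ["Q", "R"]}
# Service producing each data.
PRODUCER = {"P": "si", "Q": "sj", "R": "sk"}


def services_used(service):
    """Set of services needed to execute `service` (each one counted once)."""
    used = {service}
    for d in INPUTS[service]:
        used |= services_used(PRODUCER[d])
    return used


def true_cost(service):
    """Sum-type value: every invoked service is paid exactly once."""
    return sum(COST[s] for s in services_used(service))


def path_sum_cost(service):
    """Naive recursion summing the values of the inputs, as in a path view."""
    return COST[service] + sum(path_sum_cost(PRODUCER[d]) for d in INPUTS[service])


print(true_cost("sl"))      # 10
print(path_sum_cost("sl"))  # 14: c(si) is counted twice
```

The gap between the two values is exactly the cost of si, which is reached through both Q and R.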

If we make the two following assumptions, then the minimal CSC problem becomes polynomial:
- (a) each service of $S$ produces only one data,
- (b) each data is an input of only one service.

In this case, each feasible composite service is such that:

$$v_c(C_s) = \sum_{i \in in(s)} v_c(C_i) + c(s) \qquad (1)$$

since $X_{C_i} \cap X_{C_j} = \{s_0\}$, $\forall i, j \in in(s)$, $i \neq j$. Indeed, if $X_{C_i} \cap X_{C_j} \neq \{s_0\}$, then there exists a vertex $v \neq s_0$, $v \in C_s$, such that there exists a path from $v$ to $i$ and a path from $v$ to $j$. Then $i \in in(s)$ and $j \in in(s)$ imply that there exist two disjoint paths from $v$ to $s$, which contradicts assumption (a) or (b).

Corollary 1. If the ServiceData graph $G = (X, U)$ verifies assumptions (a) and (b), then the maximal RSC problem becomes polynomial.

5. Minimal ESC and maximal TSC service composition problems as a scheduling problem with AND/OR constraints

In the previous section, we showed that the minimal ESC problem is exactly a well-studied scheduling problem with AND/OR constraints. In [32], the authors propose a Dijkstra-like algorithm to solve the scheduling problem with AND/OR constraints. This algorithm is similar to the one proposed in [11], applied to a Dependency graph. As remarked in Section 3.2, computing a Dependency graph is time consuming. Therefore, we adapt the algorithm given in [32] so that it runs directly on a ServiceData graph (without computing a Dependency graph) and we show, in the experimental results, that this approach leads to much better computation times. Section 5.1 presents our algorithm for the minimal ESC problem and Section 5.2 presents our algorithm for the maximal TSC problem. Experimental results are given in Section 5.3.

5.1. Polynomial-time algorithm for minimal ESC problem

In Dijkstra's algorithm, a label $\lambda$ is associated with each vertex and represents its starting time. Label values are computed at each iteration of the algorithm. When the value of a label becomes equal to the earliest starting time, this label is considered as definitive; otherwise the label is considered as temporary. At the beginning of the algorithm, all labels are temporary and, at each iteration, one of them, having the smallest starting time value, becomes definitive.
An efficient implementation of Dijkstra's algorithm consists in using a data structure called a heap (or priority queue) [29], which makes it possible to find efficiently the minimum value among the temporary labels. A heap is a data structure storing and manipulating a collection of values, which can be implemented as an array. It has the following property: the value stored at the first index, called the root, is the minimal one.

In our algorithm, the starting time associated with a data corresponds to the time at which the data is produced. The starting time of a service corresponds to the time at which the service can be invoked, all its inputs being available. This starting time is computed by choosing the maximal starting time associated with its input data. At the beginning of the algorithm, all labels are initialized to an infinite value (meaning that data are not available and services cannot be invoked), except for the input data of the query, whose labels are equal to zero (because input data are available). At each iteration, our algorithm finds, using the heap, the smallest label, corresponding to the earliest starting time at which a data can be produced.

Our algorithm is presented in Algorithm 1. We denote by $nd(j)$ the number of non-available data expected by service $j$ to be executed (at the beginning, $nd(j) = |in(j)|$). Data and services are labelled with variables $\lambda$ representing their starting time. The labels are initialized (lines 2-8) as follows:
• $\lambda(s_0) = 0$ (line 4),
• $\lambda(i) = 0$ for all data $i$ in $I$ (lines 5-8),
• $\lambda(i) = +\infty$ for any other data and services (line 2).


For each data $i$, we record in $p(i)$ the service that produces $i$; $p(i)$ is initialized to $-1$ (line 2). At the beginning, all the labels are temporary. When the label of service $j$ becomes definitive, $\lambda(j)$ represents its earliest starting time. When the label of data $i$ becomes definitive, $\lambda(i)$ represents the earliest time from which the data $i$ is available. The set of vertices with definitive labels is denoted $L$ in Algorithm 1. At the beginning, it only contains $s_0$. The temporary finite labels of data are recorded in a heap $H$: the root of the heap contains the data with the minimal temporary label. In Algorithm 1, the classical heap functions are called:
• root($H$): removes and returns the root of heap $H$,
• insert($\lambda(i)$, $i$, $H$): inserts data $i$ with key $\lambda(i)$ into heap $H$,
• decrease($v$, $i$, $H$): updates heap $H$ after the label of data $i$ has decreased to $v$.

In each iteration of the algorithm, the three following steps are performed (lines 10-24):

Step 1. Data $i$ with the smallest temporary $\lambda(i)$ is chosen: its label becomes definitive (line 10).

Step 2. For each service $j$ with data $i$ as input, decrease $nd(j)$ by one unit. If the counter $nd(j)$ reaches 0, it means that the service $j$ can be executed at the starting time of its latest available input data: $\lambda(j) = \max_{i \in in(j)} \lambda(i)$. Thus, the label of service $j$ becomes definitive (line 14).

Step 3. For each service $j$ whose label became definitive at the previous step, consider each data $i$ belonging to $out(j)$ and refresh its label as follows (lines 15-22): $\lambda(i) = \min\{\lambda(i), \lambda(j) + e(j)\}$, and update $p(i)$.

When Algorithm 1 terminates, two cases have to be considered:
• The fictitious vertex End has a definitive label. It means that all its input data (corresponding to set $O$) are available and $\lambda(End)$ is equal to the minimum execution time.
• The fictitious vertex End does not have a definitive label and $\lambda(End)$ equals $+\infty$. It means that there does not exist any feasible composite service satisfying the query.

Algorithm 1 is exact: the proof is given in [32]. Concerning the complexity, we obtain:

$$O\left(|D| \log_2 |D| + \sum_{s \in S} |in(s)| + \sum_{s \in S} |out(s)| \log_2 |D|\right)$$

When Algorithm 1 terminates with a finite label for End, the composite service associated with the optimal value can be determined with the backtrack procedure given in Algorithm 2. At the end of this algorithm, the set $Sol$ contains all the services invoked in the optimal composite service. Using $p(i)$ ($\forall i \in D$), this backtrack procedure is very fast since its complexity is $O\left(\sum_{s \in Sol} |in(s)|\right)$.

5.2. Polynomial-time algorithm for maximal TSC problem

Algorithm 1 can be modified in order to optimize the throughput criterion. The entire algorithm is presented in the appendix and we present here the main modifications. Let us recall that the value of the optimal solution of the minimal ESC problem is $v_e^* = \min_{C} \max_{\mu \in P_C} \sum_{s \in \mu} e(s)$ and the value of the optimal solution of the maximal TSC problem is $v_t^* = \max_{C} \min_{\mu \in P_C} \min_{s \in \mu} t(s)$. To adapt Algorithm 1 to the maximal TSC problem, all operators have to be changed, which implies modifying the label initialization as follows:

Algorithm 1 Minimal ESC.
1: H ← ∅
2: λ(i) ← +∞ ∀i ∈ S ∪ D and p(i) ← −1 ∀i ∈ D
3: nd(j) ← |in(j)| ∀j ∈ S
4: λ(s0) ← 0 and L ← {s0}
5: for all i ∈ out(s0) do
6:     λ(i) ← 0 and p(i) ← s0
7:     insert(λ(i), i, H)
8: end for
9: while End ∉ L and H ≠ ∅ do
10:     i ← root(H) and L ← L ∪ {i}
11:     for all j ∈ S \ L such that i ∈ in(j) do
12:         nd(j) ← nd(j) − 1
13:         if nd(j) == 0 then
14:             L ← L ∪ {j} and λ(j) ← max_{k ∈ in(j)} λ(k)
15:             v ← λ(j) + e(j)
16:             for all k ∈ out(j) \ L do
17:                 if λ(k) == +∞ then
18:                     λ(k) ← v, p(k) ← j and insert(v, k, H)
19:                 else if λ(k) > v then
20:                     λ(k) ← v, p(k) ← j, decrease(v, k, H)
21:                 end if
22:             end for
23:         end if
24:     end for
25: end while
26: Return λ(End), p
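A minimal Python sketch of Algorithm 1 is given below for illustration; it is not the implementation used in the experiments. Two choices are assumptions of this sketch: the registry is a plain dictionary mapping each service name to a triple (input data set, output data set, execution time), and the decrease operation is replaced by the usual lazy-deletion idiom of the standard `heapq` module (a data is pushed again with its improved label and outdated entries are skipped when popped). Every service is assumed to have at least one input.

```python
import heapq
import math


def minimal_esc(services, query_inputs, query_outputs):
    """Return (minimal execution time, producer map p) for a query."""
    services = dict(services)
    # Fictitious End service: it consumes the expected outputs of the query.
    services["End"] = (set(query_outputs), set(), 0)
    consumers = {}                       # data -> services having it as input
    for name, (ins, _, _) in services.items():
        for d in ins:
            consumers.setdefault(d, []).append(name)
    nd = {name: len(ins) for name, (ins, _, _) in services.items()}
    lam, p, definitive, heap = {}, {}, set(), []   # missing label means +inf
    for d in query_inputs:               # outputs of the fictitious source s0
        lam[d], p[d] = 0, "s0"
        heapq.heappush(heap, (0, d))
    while heap and "End" not in definitive:
        _, i = heapq.heappop(heap)
        if i in definitive:              # outdated entry (lazy deletion)
            continue
        definitive.add(i)                # Step 1
        for j in consumers.get(i, []):
            nd[j] -= 1                   # Step 2
            if nd[j] == 0:
                ins, outs, e = services[j]
                lam[j] = max(lam[k] for k in ins)
                definitive.add(j)
                v = lam[j] + e           # Step 3
                for k in outs:
                    if k not in definitive and v < lam.get(k, math.inf):
                        lam[k], p[k] = v, j
                        heapq.heappush(heap, (v, k))
    return lam.get("End", math.inf), p


# Hypothetical registry: data d can be produced by s3 (after s1 and s2) or by s4.
REGISTRY = {
    "s1": ({"a"}, {"b"}, 3),
    "s2": ({"a"}, {"c"}, 5),
    "s3": ({"b", "c"}, {"d"}, 2),
    "s4": ({"a"}, {"d"}, 9),
}
best_time, p = minimal_esc(REGISTRY, {"a"}, {"d"})
print(best_time, p["d"])  # 7 s3
```

With lazy deletion the heap may temporarily hold several entries for the same data, so the heap size is bounded by the total number of outputs rather than by |D|; this keeps the code short at the price of a slightly weaker bound than the one stated for Algorithm 1.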

Algorithm 2 Optimal composite service for ESC problem
1: procedure Backtrack(s, Sol)
2:     if s ∉ Sol then
3:         Sol ← Sol ∪ {s}
4:         for all d ∈ in(s) do
5:             Backtrack(p(d), Sol)
6:         end for
7:     end if
8: end procedure
9: Sol ← ∅
10: Backtrack(End, Sol)
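The backtrack procedure can be sketched in Python as follows. This is an illustrative version only: the dictionaries `p` (producer of each data) and `inputs_of` (input data of each service, including the fictitious vertices s0 and End) are hypothetical names, and an explicit stack replaces the recursion of Algorithm 2 to avoid Python's recursion limit on large registries.

```python
def backtrack(p, inputs_of, target="End"):
    """Collect the services of the composite service ending at `target`.

    p: dict data -> service producing it (as returned by the labelling step)
    inputs_of: dict service -> iterable of its input data
    """
    sol, stack = set(), [target]
    while stack:
        s = stack.pop()
        if s not in sol:
            sol.add(s)
            # Follow, for each input, the service recorded as its producer.
            stack.extend(p[d] for d in inputs_of[s])
    return sol


# Hypothetical labelling result: d is produced by s3, which needs b and c.
P = {"a": "s0", "b": "s1", "c": "s2", "d": "s3"}
INPUTS_OF = {
    "s0": [], "s1": ["a"], "s2": ["a"],
    "s3": ["b", "c"], "s4": ["a"], "End": ["d"],
}
print(sorted(backtrack(P, INPUTS_OF)))  # ['End', 's0', 's1', 's2', 's3']
```

As in Algorithm 2, the returned set also contains the fictitious vertices s0 and End; service s4, which is not on the optimal solution, is never visited.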


• $\lambda(s_0) = +\infty$,
• $\lambda(i) = +\infty$ for all data $i$ in $I$,
• $\lambda(i) = 0$ for any other data and services.

Moreover, the current iteration has to be updated as follows:

Step 1. Data $i$ with the maximal temporary $\lambda(i)$ is chosen.

Step 2. Service $j$ can be executed with a throughput value equal to the minimal throughput of its input data: $\lambda(j) = \min_{i \in in(j)} \lambda(i)$.

Step 3. For each service $j$ whose label became definitive at the previous step, consider each data $i$ belonging to $out(j)$ and refresh its label as follows: $\lambda(i) = \max\{\lambda(i), \min\{\lambda(j), t(j)\}\}$.

The temporary finite labels of data are recorded in a heap whose root contains the maximal $\lambda$. We have to update the heap functions by replacing the decrease function by an increase one. When the service composition problem does not have any solution, Algorithm 3 cannot label End and terminates with $\lambda(End) = 0$.

Property 2. When Algorithm 3 terminates with $End \in L$, we have: for all $i \in S \cup D$, $\lambda(i) = v_t(\mu^*(i))$, with $\mu^*(i)$ the maximal throughput value path from $s_0$ to the vertex $i$.

Proof 3. The proof is by induction on the following property:

Rank $k$. We denote by $D_k$ (resp. $S_k$) the subset of data $i$ (resp. services $i$) belonging to $L$ at the $k$th iteration of the while loop (lines 9-25). For all $i \in D_k \cup S_k$, $\lambda(i)$ is equal to the optimal path value from $s_0$ to $i$ on the throughput criterion, denoted $v_t(\mu^*(i))$.

Rank 0. We have $S_0 = \{s_0\}$ and $D_0 = I$. At the initialization step of the algorithm:
- $\lambda(s_0) = +\infty \Rightarrow \lambda(s_0) = v_t(\mu^*(s_0))$
- $\forall i \in I$, $\lambda(i) = +\infty \Rightarrow \lambda(i) = v_t(\mu^*(i))$

So, the property is satisfied at rank 0 and we assume that it is verified at rank $k$. Thus we have to show that the property is satisfied at rank $k + 1$.

Rank $k + 1$. The algorithm chooses the data $d$ belonging to $D \setminus D_k$ such that $\lambda(d) = \max_{i \in D \setminus D_k} \lambda(i)$. Let us suppose that this choice is wrong: the optimal path is $\mu^*(d) = \{s_0, d_1, \ldots, s_q, d_{q+1}, \ldots, s_l, d_{l+1} = d\}$ and we have $v_t(\mu^*(d)) > \lambda(d)$. If we scan $\mu^*(d)$ from $s_0$ to $d$, we necessarily encounter a service, denoted $s_q$ with $1 \leq q \leq l$, belonging to $S_k$ while $d_{q+1} \notin D_k$. Since the property is verified at rank $k$ and $s_q \in S_k$, we have $\lambda(s_q) = v_t(\mu^*(s_q))$. By construction of the algorithm, we also have $\lambda(d_{q+1}) \geq \min\{\lambda(s_q), t(s_q)\}$, then:

$$\lambda(d_{q+1}) \geq \min\{v_t(\mu^*(s_q)), t(s_q)\} \qquad (2)$$

Moreover, since $\mu^*(s_q) \subset \mu^*(d)$, we have:

$$v_t(\mu^*(s_q)) \geq v_t(\mu^*(d)) \qquad (3)$$

Equations 2 and 3 imply that $\lambda(d_{q+1}) \geq \min\{v_t(\mu^*(d)), t(s_q)\}$. As $s_q \in \mu^*(d)$, we have $v_t(\mu^*(d)) \leq t(s_q)$. Therefore, $\min\{v_t(\mu^*(d)), t(s_q)\} = v_t(\mu^*(d))$, implying $\lambda(d_{q+1}) \geq v_t(\mu^*(d))$. The assumption $v_t(\mu^*(d)) > \lambda(d)$ then implies that $\lambda(d_{q+1}) > \lambda(d)$, which contradicts $\lambda(d) = \max_{i \in D \setminus D_k} \lambda(i)$. Thus, if the algorithm chooses the data $d$ with the maximal $\lambda(i)$, we have $\lambda(d) = v_t(\mu^*(d))$ and, at the end, $\lambda(End) = v_t(\mu^*(End))$.

In the next section, we present the computation times of Algorithms 1 and 3.


5.3. Experimental results

Our experiments were carried out on an HP computer with an Intel(R) Core(TM) i7-4610M 3 GHz processor and 16 GB of RAM, under Linux Ubuntu 16.04.1 LTS and Python 3.5. Because no benchmark based on real services exists for QoS-aware automatic syntactic service composition, we have applied our algorithm to two different synthetic benchmarks: the well-known and widely used benchmark of Web Service Challenge 2009² [5] and the benchmark WS-random-QoS³ (WSR-QoS) provided by [9]. WSC-09 provides 5 Test Sets (TS), containing around 500, 4000, 8000 and 15000 services with respectively more than 1500, 10000, 15000 and 25000 data (see Table 2). WS-random-QoS contains smaller Test Sets, thus we only focus on the biggest one, containing 9000 services with around 34000 data. In both benchmarks, services are described by their response time (in milliseconds) and throughput (in invocations/second) values. Each Test Set of both benchmarks corresponds to a service registry and is associated with one query.

In WSC-09, each Test Set is described by 4 main files: a WSDL standard file [33], describing the services of the registry by their inputs/outputs, which are hierarchically organized in terms of concepts using an ontology recorded in an OWL [17] standard file; a WSLA [34] file, storing the QoS (response time and throughput) criteria values of each service; and an XML file storing the inputs and outputs of the query associated with the service registry. A service is then represented in the ServiceData graph by a vertex $s$. Each input $i$ of a service $s$ is represented by a vertex $i$. Each output $o$ of a service $s$ is represented by several vertices, as many as the concept of $o$ has ancestors in the ontology.

In WSR-QoS, the Test Set is encoded as a GraphML⁴ file, where services, inputs and outputs are represented as vertices. Inputs and outputs are connected with edges to their respective services. Output/Input matches are already precomputed and represented with directed edges from outputs to inputs. Two special service vertices, called "Source" and "Sink", represent the query: the outputs of the "Source" service represent the inputs of the query, and the inputs of the "Sink" service represent the expected outputs of the query. Input and output data connected with an edge in the GraphML file are merged in our service registry.

For each Test Set of WSC-09 (numbered from TS01 to TS05) and for the biggest Test Set of WSR-QoS, we report in Table 2:
• in the first line, the number of vertices associated with services, $|S|$,
• in the second line, the number of vertices associated with data, $|D|$,
• in the third line, the number of inputs ($|I|$) and outputs ($|O|$) in the query,
• in the fourth line, the average number of inputs ($AVG_{in}$) and the average number of outputs ($AVG_{out}$) per service,
• in the fifth line, the average number of services having a specific data as input ($AVG_{consume}$) and the average number of services producing a specific data ($AVG_{produce}$),
• in the sixth line, the computation time (in ms) necessary for Algorithm 1 to compute the optimal solution on the execution time criterion,
• in the seventh line, the computation time (in ms) necessary for Algorithm 3 to compute the optimal solution on the throughput criterion.

Computation times reported in lines 6 and 7 correspond to average computation times computed from 10 executions of our algorithms for each Test Set.

Let us compare our results with the closest approaches already published. First, our results are better than those of [11], since we do not have to build the dependency graph. Our results are also better than those of [12] (and consequently than those of [10, 19]) because our algorithm is based on a directed graph structure. Moreover, the backtrack procedure is very efficient: its computation time is always less than 0.1 ms.
² Available at http://http://www.lamsade.dauphine.fr/∼manouvri/WSC2009 Testsets.zip (WSC-09).
³ Available at https://wiki.citius.usc.es/inv:downloadable results:ws-random-qos
⁴ See http://graphml.graphdrawing.org/ for more details.

As shown in Table 2, the computation times to find the optimal solution of the first four Test Sets of WSC-09 are almost equivalent for both minimal ESC and maximal TSC problems, while for the biggest Test Sets (TS05 and WSR-QoS), the computation time is bigger for solving the minimal ESC problem than for solving the maximal TSC problem. Therefore, our additional experiments only focus on the minimal ESC problem.

| | TS01 | TS02 | TS03 | TS04 | TS05 | WSR-QoS |
|---|---|---|---|---|---|---|
| $\lvert S \rvert$ | 572 | 4129 | 8138 | 8301 | 15211 | 9000 |
| $\lvert D \rvert$ | 1330 | 9599 | 16263 | 16364 | 28337 | 34934 |
| $\lvert I \rvert / \lvert O \rvert$ | 34/4 | 34/3 | 2/4 | 28/3 | 31/2 | 286/5 |
| $AVG_{in} / AVG_{out}$ | 16/16 | 21/21 | 23/23 | 23/23 | 25/25 | 3/6 |
| $AVG_{consume} / AVG_{produce}$ | 2/11 | 2/15 | 3/20 | 3/21 | 3/24 | 2/1 |
| Comp. time for min ESC (ms) | 2 | 7 | 8 | 18 | 21 | 25 |
| Comp. time for max TSC (ms) | 2 | 6 | 10 | 17 | 15 | 19 |

Table 2. Experimental results using the query provided in benchmarks

Each Test Set is associated with one query. In order to verify that the execution time does not depend on the query, we randomly generated, for TS05, 500 feasible queries using the following process. For each query, we randomly select a set $S_r$ of services, containing $r$ services among all the services of the registry, $r$ being randomly generated between 5 and 50. Then, using a depth-first search algorithm from these selected services, we identify the set $O_{S_r}$ of all the data that can be produced. Then, we randomly select a subset $O_{r'}$ of $O_{S_r}$ containing $r'$ data, $r'$ being randomly generated between 5 and 20. The input of the query corresponds to the union of all the inputs of the services of $S_r$ and the output of the query corresponds to $O_{r'}$. We have also generated 500 non-feasible queries. The only difference in the generation of such queries is the selection of the outputs: the outputs of the query are randomly selected among all the data of $D$.
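The query generation process can be sketched as follows. This is a hedged illustration, not the generator used for the experiments: the registry layout (service name mapped to a pair of input and output data sets) and the function name are hypothetical, and the depth-first search mentioned above is replaced here by a simple forward closure that fires a service only once all its inputs are available, which guarantees that every selected output is actually producible.

```python
import random


def generate_feasible_query(services, rng, r_range=(5, 50), out_range=(5, 20)):
    """Return (query inputs, query outputs) for a randomly drawn feasible query.

    services: dict name -> (set of input data, set of output data)
    rng: a random.Random instance, so that experiments can be reproduced
    """
    names = sorted(services)
    r = min(rng.randint(*r_range), len(names))
    selected = rng.sample(names, r)                       # the set S_r
    query_inputs = set().union(*(services[s][0] for s in selected))

    # Forward closure: data that can be produced from the query inputs.
    available, produced, fired = set(query_inputs), set(), set()
    progress = True
    while progress:
        progress = False
        for name, (ins, outs) in services.items():
            if name not in fired and ins <= available:
                fired.add(name)
                produced |= outs
                available |= outs
                progress = True

    k = min(rng.randint(*out_range), len(produced))       # r' expected outputs
    query_outputs = set(rng.sample(sorted(produced), k))
    return query_inputs, query_outputs


# Tiny hypothetical registry, with ranges shrunk accordingly.
REG = {
    "s1": ({"a"}, {"b"}),
    "s2": ({"b"}, {"c"}),
    "s3": ({"x", "c"}, {"d"}),
}
inputs, outputs = generate_feasible_query(
    REG, random.Random(0), r_range=(1, 1), out_range=(1, 2)
)
```

The closure loop rescans the whole registry at each pass, which is quadratic in the worst case; a counter-based propagation like the one of Algorithm 1 would be preferable on registries of the size of TS05.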

| | average time | min time | max time |
|---|---|---|---|
| Feasible queries | 13 | 6 | 32 |
| Non-feasible queries | 13 | 6 | 32 |

Table 3. Minimal ESC problem computation times (in ms) for the biggest Test Set of WSC-09

We report, in Table 3, the average, minimal and maximal computation times for solving the minimal ESC problem on these feasible and non-feasible queries. Our experimental results show that computation times are very satisfactory: they never exceed 32 ms, with an average value of 13 ms. Moreover, the computation times are equivalent on average for the 500 non-feasible queries, showing that computation times do not depend on the feasibility of the query. This is not surprising since our algorithm is a depth-first search type algorithm, which can conclude that a query is non-feasible without having to label all the vertices of the ServiceData graph. Our conclusions are the same for the WSR-QoS Test Set, as reported in Table 4. For this Test Set, the 500 queries are generated using the same process, except that $r$ is randomly generated between 10 and 300, since the graph and the query characteristics differ from those of WSC-09 (see the details of both benchmarks' characteristics in Table B.5 of the Appendix).

| | average time | min time | max time |
|---|---|---|---|
| Feasible queries | 14 | 5 | 34 |
| Non-feasible queries | 15 | 4 | 40 |

Table 4. Minimal ESC problem computation times (in ms) for the biggest Test Set of WSR-QoS

In order to better understand the algorithm behaviour, we analyse the computation times as a function of the number of inputs and outputs of feasible queries. Figure 7 presents the computation times (in ms) as a function of the number of inputs. In this figure, a linear correlation between the computation times and the number of inputs clearly appears. Figure 8 presents the computation times (in ms) as a function of the number of outputs in the feasible query, each point of the curve representing the average computation time of all queries having the same number of outputs. As shown in this figure, the computation time cannot be explained by the number of outputs in the query. In conclusion, our experiments show that the computation time does not depend on the number of outputs but depends on the number of inputs in the query.

Figure 7. Minimal ESC problem computation times (in ms) per the number of inputs in the query for the biggest Test Set of WSC-09

Figure 8. Minimal ESC problem computation times (in ms) per the number of outputs in the query for the biggest Test Set of WSC-09

6. Conclusion

The QoS-aware syntactic service composition problem has been the subject of numerous studies. Several QoS criteria are usually considered, inducing different versions of the problem. In this article, we propose a classification of these versions on the basis of their theoretical complexity: for cost and reliability criteria, the associated problems are NP-hard, while for execution time and throughput criteria, they are polynomial. We explain this complexity gap using an original formulation in terms of a scheduling problem with AND/OR constraints and, beyond that, we exhibit a polynomial subclass for cost and reliability problems. For solving the polynomial versions, we propose an efficient algorithm that is an extension of a Dijkstra-like scheduling algorithm. Our algorithm is based on a simple directed graph structure and does not need to construct any additional structure, like a dependency graph or a planning graph, as usually done in the related work. Thanks to this simple graph structure, our extensive computational experiments show that our approach is the most efficient one for exactly solving, in polynomial time, the service composition problem with a minmax-type criterion (like the execution time or throughput criterion).

Our computational experiments have been done using the well-known benchmark WSC-09. The subject of the previous WSC-08 was to determine the syntactic service composition with the minimal number of services, without taking into account any QoS criteria. Let us remark that minimizing the number of services is equivalent to minimizing the cost criterion when all service costs are equal to 1. As shown in Section 4.1, this simple case remains NP-hard. Our complexity classification highlights the NP-hardness of the problem considered in WSC-08, while the problems submitted in WSC-09 were polynomial.

A natural extension of the QoS-aware syntactic service composition problem is to combine several criteria: for example, minimizing both the total execution time and the number of services as in [13]. In this "multicriteria" context, several new versions of the problem can be defined: either minimizing a weighted sum of several criteria, or minimizing a single criterion while satisfying QoS constraints for the other criteria. However, as soon as the cost or reliability criterion

is considered (in the objective function or in the constraints), the induced optimization problem becomes NP-hard. Moreover, in a practical context, additional constraints (such as dependency relationships between services) must be taken into account and dedicated approaches must be defined: one has to propose approximate algorithms in order to obtain a solution in satisfactory computing time, but one also has to determine exact algorithms in order to evaluate the quality of the solutions given by approximate algorithms. This is left for future research.

Acknowledgement. We would like to thank the anonymous reviewers for their very pertinent comments and suggestions. Moreover, Pablo Rodriguez-Mier and his co-authors are gratefully acknowledged for their data sets and their help.

References

[1] C. Jatoth, G. Gangadharan, R. Buyya, Computational intelligence based QoS-aware web service composition: A systematic literature review, IEEE Trans. on Services Computing PP (99) (2015) 1–18.
[2] Q. Z. Sheng, X. Qiao, A. V. Vasilakos, C. Szabo, S. Bourne, X. Xu, Web services composition: A decades overview, Information Sciences 280 (2014) 218–238.
[3] T. Weise, M. Blake, S. Bleul, Semantic web service composition: The web service challenge perspective, in: A. Bouguettaya, Q. Z. Sheng, F. Daniel (Eds.), Web Services Foundations, Springer New York, 2014, pp. 161–187.
[4] A. Bansal, M. Blake, S. Kona, S. Bleul, T. Weise, M. Jaeger, WSC-08: Continuing the Web Services Challenge, in: IEEE CEC, 2008, pp. 351–354.
[5] S. Kona, A. Bansal, M. B. Blake, S. Bleul, T. Weise, WSC-2009: A Quality of Service-Oriented Web Services Challenge, in: IEEE CEC, 2009, pp. 487–490.
[6] S. Bansal, A. Bansal, G. Gupta, M. B. Blake, Generalized semantic web service composition, Service Oriented Computing and Applications 10 (2) (2016) 111–133.
[7] P. Rodriguez-Mier, C. Pedrinaci, M. Lama, M. Mucientes, An Integrated Semantic Web Service Discovery and Composition Framework, IEEE Trans. on Services Computing PP (99) (2015) 1–14.
[8] H. Ma, A. Wang, M. Zhang, A Hybrid Approach Using Genetic Programming and Greedy Search for QoS-Aware Web Service Composition, in: A. Hameurlain, J. Küng, R. Wagner, H. Decker, L. Lhotska, S. Link (Eds.), Trans. on Large-Scale Data- and Knowledge-Centered Sys. XVIII, Vol. 8980 of LNCS, Springer Berlin Heidelberg, 2015, pp. 180–205.
[9] P. Rodriguez-Mier, M. Mucientes, M. Lama, Hybrid Optimization Algorithm for Large-Scale QoS-Aware Service Composition, IEEE Trans. on Services Computing PP (99) (2015) 1–14.
[10] W. Jiang, C. Zhang, Z. Huang, M. Chen, S. Hu, QSynth: A Tool for QoS-Aware Automatic Service Composition, in: IEEE Int. Conf. on Web Services (ICWS), 2010, pp. 42–49.
[11] W. Jiang, T. Wu, S. Hu, Z. Liu, QoS-aware automatic service composition: A graph view, Journal of computer science and technology 26 (5) (2011) 837–853.
[12] S. Luo, B. Xu, Y. Yan, An accumulated-QoS-first search approach for semantic web service composition, in: IEEE Int. Conf. on Service-Oriented Computing and Applications (SOCA), 2010, pp. 1–4.
[13] P. Rodriguez-Mier, M. Mucientes, M. Lama, A Hybrid Local-Global Optimization Strategy for QoS-Aware Service Composition, in: IEEE Int. Conf. on Web Services (ICWS), 2015, pp. 735–738.
[14] Y. Wu, C.-G. Yan, Z. Ding, G.-P. Liu, P. Wang, C. Jiang, M. Zhou, A Multilevel Index Model to Expedite Web Service Discovery and Composition in Large-Scale Service Repositories, IEEE Trans. on Services Comp. 9 (3) (2016) 330–342.
[15] D. Lee, J. Kwon, S. Lee, S. Park, B. Hong, Scalable and efficient web services composition based on a relational database, Journal of systems and Software 84 (12) (2011) 2139–2155.
[16] Z. Zhang, S. Chen, Z. Feng, Semantic Annotation for Web Services Based on DBpedia, in: IEEE Int. Symp. on Service Oriented System Eng. (SOSE), 2013, pp. 280–285.
[17] S. Bechhofer, F. van Harmelen, J. Hendler, I. Horrocks, D. L. McGuinness, P. F. Patel-Schneider, L. A. Stein, OWL Web Ontology Language Reference, Tech. rep., W3C, http://www.w3.org/TR/2004/REC-owl-ref-20040210/ (February 2004).
[18] J. Lehmann, R. Isele, M. Jakob, A. Jentzsch, D. Kontokostas, P. N. Mendes, S. Hellmann, M. Morsey, P. van Kleef, S. Auer, et al., DBpedia-a large-scale, multilingual knowledge base extracted from wikipedia, Semantic Web Journal 5 (2014) 1–29.
[19] Z. Huang, W. Jiang, S. Hu, Z. Liu, Effective Pruning Algorithm for QoS-Aware Service Composition, in: IEEE Conference on Commerce and Enterprise Computing, 2009, pp. 185–187.
[20] S. C. Geyik, B. K. Szymanski, P. Zerfos, Robust dynamic service composition in sensor networks, IEEE Trans. on Services Computing 6 (4) (2013) 560–572.
[21] A. M. Omer, A framework for Automatic Web Service Composition based on service dependency analysis, Ph.D. thesis, TU Dresden (2011).
[22] X. Zheng, Y. Yan, An efficient syntactic web service composition algorithm based on the planning graph model, in: IEEE Int. Conf. on Web Services (ICWS), 2008, pp. 691–699.
[23] M. Chen, Y. Yan, QoS-aware Service Composition over Graphplan through Graph Reachability, in: IEEE Int. Conf. on Services Computing (SCC), 2014, pp. 544–551.
[24] S. Deng, B. Wu, J. Yin, Z. Wu, Efficient planning for top-K Web service composition, Knowledge and information systems 36 (3) (2013) 579–605.
[25] J. J.-W. Yoo, S. Kumara, D. Lee, S.-C. Oh, A web service composition framework using integer programming with non-functional objectives and constraints, algorithms 1 (2008) 7.
[26] F. Paganelli, T. Ambra, D. Parlanti, A QoS-aware service composition approach based on semantic annotations and integer programming, Int. Journal of Web Information Systems 8 (3) (2012) 296–321.
[27] V. Gabrel, M. Manouvrier, C. Murat, Optimal and Automatic Transactional Web Service Composition with Dependency Graph and 0-1 Linear Programming, in: Service-Oriented Computing, Springer, 2014, pp. 108–122.
[28] S.-C. Oh, D. Lee, S. R. Kumara, Effective web service composition in diverse and large-scale service networks, IEEE Trans. on Services Computing 1 (1) (2008) 15–32.
[29] R. K. Ahuja, T. L. Magnanti, J. B. Orlin, Network Flows: Theory, Algorithms, and Applications, Prentice-Hall, Inc., 1993.
[30] E. A. Dinic, The Fastest Algorithm for the Pert Problem with AND-and OR-Nodes (The New-Product-New-Technology Problem), in: Integer Programming and Combinatorial Optimization Conference, University of Waterloo Press, 1990, pp. 185–187.
[31] G. M. Adelson-Velsky, E. Levner, Project Scheduling in AND/OR Graphs: A Generalization of Dijkstra's Algorithm, Mathematics of Operations Research 27 (3) (2002) 504–517.
[32] R. H. Möhring, M. Skutella, F. Stork, Scheduling with AND/OR precedence constraints, SIAM Journal on Computing 33 (2) (2004) 393–415.
[33] D. Booth, C. K. Liu, Web services description language (wsdl) version 2.0 part 0: Primer, Tech. rep., W3C, http://www.w3.org/TR/2007/REC-wsdl20-primer-20070626 (2007).
[34] H. Ludwig, A. Keller, A. Dan, R. P. King, R. Franck, Web Service Level Agreement (WSLA) Language Specification, Tech. rep., W3C, http://www.research.ibm.com/wsla/WSLASpecV1-20030128.pdf (2003).


Appendix A. Algorithm for solving maximal TSC problem

The lines that differ from Algorithm 1 are marked with ⊲.

Algorithm 3 Maximal TSC.
1: H ← ∅
2: λ(i) ← 0 ∀i ∈ S ∪ D and p(i) ← −1 ∀i ∈ D    ⊲
3: nd(j) ← |in(j)| ∀j ∈ S
4: λ(s0) ← +∞ and L ← {s0}    ⊲
5: for all i ∈ out(s0) do
6:     λ(i) ← +∞ and p(i) ← s0    ⊲
7:     insert(λ(i), i, H)
8: end for
9: while End ∉ L and H ≠ ∅ do
10:     i ← root(H) and L ← L ∪ {i}
11:     for all j ∈ S \ L such that i ∈ in(j) do
12:         nd(j) ← nd(j) − 1
13:         if nd(j) == 0 then
14:             L ← L ∪ {j} and λ(j) ← min_{k ∈ in(j)} λ(k)    ⊲
15:             v ← min{λ(j), t(j)}    ⊲
16:             for all k ∈ out(j) \ L do
17:                 if λ(k) == 0 then    ⊲
18:                     λ(k) ← v, p(k) ← j and insert(v, k, H)
19:                 else if λ(k) < v then    ⊲
20:                     λ(k) ← v, p(k) ← j, increase(v, k, H)    ⊲
21:                 end if
22:             end for
23:         end if
24:     end for
25: end while
26: Return λ(End)
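For illustration, Algorithm 3 can be sketched in Python along the same lines as Algorithm 1. The assumptions are those of a sketch, not of the paper's implementation: the registry maps each service name to a triple (input data set, output data set, throughput), the max-heap is simulated by pushing negated labels into the min-heap of the standard `heapq` module, and the increase operation is replaced by lazy deletion. Every service is assumed to have at least one input.

```python
import heapq
import math


def maximal_tsc(services, query_inputs, query_outputs):
    """Return (maximal throughput, producer map p); 0 means no solution."""
    services = dict(services)
    # Fictitious End service with infinite throughput.
    services["End"] = (set(query_outputs), set(), math.inf)
    consumers = {}
    for name, (ins, _, _) in services.items():
        for d in ins:
            consumers.setdefault(d, []).append(name)
    nd = {name: len(ins) for name, (ins, _, _) in services.items()}
    lam, p, definitive, heap = {}, {}, set(), []    # missing label means 0
    for d in query_inputs:
        lam[d], p[d] = math.inf, "s0"
        heapq.heappush(heap, (-math.inf, d))        # negated key: max-heap
    while heap and "End" not in definitive:
        _, i = heapq.heappop(heap)
        if i in definitive:                         # outdated entry
            continue
        definitive.add(i)                           # Step 1: maximal label
        for j in consumers.get(i, []):
            nd[j] -= 1
            if nd[j] == 0:                          # Step 2
                ins, outs, t = services[j]
                lam[j] = min(lam[k] for k in ins)
                definitive.add(j)
                v = min(lam[j], t)                  # Step 3
                for k in outs:
                    if k not in definitive and v > lam.get(k, 0):
                        lam[k], p[k] = v, j
                        heapq.heappush(heap, (-v, k))
    return lam.get("End", 0), p


# Hypothetical registry: d is produced by s3 (bottleneck 4) or by s4 (throughput 3).
REGISTRY_T = {
    "s1": ({"a"}, {"b"}, 10),
    "s2": ({"a"}, {"c"}, 4),
    "s3": ({"b", "c"}, {"d"}, 8),
    "s4": ({"a"}, {"d"}, 3),
}
best_thr, p_t = maximal_tsc(REGISTRY_T, {"a"}, {"d"})
print(best_thr, p_t["d"])  # 4 s3
```

This sketch also returns the producer map, so that the backtrack procedure of Algorithm 2 can be reused to recover the composite service.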


Appendix B. Details of the biggest Test Sets of WSC-09 and WSR-QoS

| Percentage of... | WSC09-TS05 | WSR-QoS-TS05 |
|---|---|---|
| services having only one input data | 2.24 | 13.61 |
| services having between 2 and 5 input data | 48.35 | 72.73 |
| services having between 6 and 9 input data | 47.08 | 13.57 |
| services having more than 10 input data | 2.33 | 0.09 |
| services producing less than 10 output data | 0.89 | 95.19 |
| services producing between 10 and 30 output data | 17.47 | 2.54 |
| services producing between 30 and 50 output data | 47.40 | 0.67 |
| services producing more than 50 output data | 34.23 | 1.60 |
| data not produced by any service | 50.86 | 0.01 |
| data produced by less than 10 services | 24.94 | 97.01 |
| data produced by less than 50 services | 18.48 | 2.98 |
| data produced by more than 50 services | 5.72 | 0.00 |

Table B.5. Detailed information for the biggest Test Set of both benchmarks

Biographies

Virginie Gabrel received her PhD in computational science in 1994 from Université Paris-Dauphine; her research was carried out with and funded by the DGA (Direction Générale de l'Armement). In 2005, she obtained the HDR at Université Paris-Dauphine. From 1996 to 2002, she was an associate professor at Paris 13 University. Since 2002, she has been an associate professor at Université Paris-Dauphine. From 2012 to 2014, she was assistant director of the LAMSADE laboratory. She has published around 40 papers in journals and conferences. Her research activities concern modelling in Operations Research, the solution of very large integer programs (with decomposition techniques), industrial applications (satellite mission planning, network optimization, service composition…) and robustness in mathematical programming.
Maude Manouvrier is currently an Assistant Professor at the LAMSADE Lab. of Université Paris-Dauphine. She received a Ph.D. in Computer Science in 2000. From 1998 to 2003, she worked in an international CNRS-CONICIT cooperation with the Universidad Central de Venezuela (UCV), and from 2008 to 2012 in an international CNRS-FONACIT cooperation with the Universidad Simon Bolivar (USB, Caracas, Venezuela). She has served as a reviewer or program committee member for several international conferences and journals and was, in 2012 and 2013, Program Chair of the International Symposium on Advances in Transaction Processing (in conjunction with the International Conference on Mobile Web Information Systems). She has published 23 papers in international journals and conferences. Her research interests focus on Spatio-Temporal and Multimedia Databases, Access Methods and Indexing Structures, Content-Based Image Retrieval and Web Services Composition.
Kamil Moreau is a student at Université Paris-Dauphine, currently in the 3rd year of a bachelor's degree in Applied Mathematics and Computing Science and also in the 3rd year of the Pluridisciplinary Cycle of Further Studies (CyPES) at Paris Sciences et Lettres. He passed his baccalauréat with distinction in 2013. He has studied graphs since 2014 and worked on the service composition problem during a research internship supervised by Virginie Gabrel in 2015.
Cécile Murat received her Ph.D. in Computer Science in 1997 from Université Paris-Dauphine. Since 1998, she has been an Associate Professor in the LAMSADE Lab. In 1998, she received the Nathalie Demassieux Award in Sciences ("Prix de la Chancellerie des Universités de Paris") for her Ph.D. studies. In 2015, she obtained the HDR from Université Paris-Dauphine. Her research activities, in operations research, aim to solve decision problems. The main focus of her research is to create and develop combinatorial optimization tools that integrate concepts related to random uncertainty, among others, in order to solve difficult practical problems efficiently. She is the author of one book and 31 articles in international journals and conferences.
