Efficient and reliable service selection for heterogeneous distributed software systems




Future Generation Computer Systems



Contents lists available at ScienceDirect

Future Generation Computer Systems journal homepage: www.elsevier.com/locate/fgcs

Shangguang Wang (a), Lin Huang (a), Lei Sun (a), Ching-Hsien Hsu (b, c, ∗), Fangchun Yang (a)

(a) State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications, Beijing 100876, China
(b) Department of Computer Science and Information Engineering, Chung Hua University, Hsinchu, Taiwan
(c) Tianjin Key Laboratory of Intelligence Computing and Novel Software Technology, Tianjin University of Technology, 300191, Tianjin, China

Highlights
• Design a service composition middleware for heterogeneous distributed systems.
• An efficient and reliable service selection approach based on information and variance theory is proposed.
• Experiments with a real-world dataset show that the proposed technique is superior to other existing approaches.

Article info

Article history: Received 12 September 2015; received in revised form 16 November 2015; accepted 14 December 2015; available online xxxx

Keywords: Service selection; Service composition; QoS uncertainty; Entropy; Variance

Abstract

The service-oriented paradigm is emerging as a new approach to building heterogeneous distributed software systems composed of services accessed locally or remotely through middleware technology. How to select the optimal composite service from a set of functionally equivalent services with different quality of service (QoS) attributes has become an active research focus in the service community. However, existing middleware solutions and approaches are inefficient because they search the entire solution space. More importantly, they neglect the QoS uncertainty caused by dynamic network environments. In this paper, based on a service composition middleware framework, we propose an efficient and reliable service selection approach that selects the most reliable composite service by filtering out low-reliability services through the computation of QoS uncertainty. The approach first employs information theory and probability theory to discard services with high QoS uncertainty and downsize the solution space. A reliability fitness function is then designed to select the most reliable services for composite services. We experimented with real-world and synthetic datasets and compared our approach with other approaches. Our results show that our approach is not only fast but also finds more reliable composite services. © 2015 Elsevier B.V. All rights reserved.

∗ Corresponding author at: Department of Computer Science and Information Engineering, Chung Hua University, Hsinchu, Taiwan.
http://dx.doi.org/10.1016/j.future.2015.12.013
0167-739X/© 2015 Elsevier B.V. All rights reserved.

1. Introduction

Services are commonly regarded as black boxes with well-defined interfaces that can be aggregated recursively into new services by service composition technology [1]. An important aspect of service composition is finding and binding services in order to compose them into a composite application. Service composition has become the core technology of service-oriented architecture (SOA), which can meet the business requirements of heterogeneous distributed software systems. According to the SOA paradigm, composite applications are specified as abstract processes composed of a set of abstract services (called service classes). Then, at run time, a concrete service (called a service candidate) is selected and invoked for each service class. This ensures loose coupling and design flexibility for many business applications distributed within and across organizational boundaries [2]. It is well known that QoS (e.g., response time, reliability, and throughput) plays an important role in determining the performance of the services selected by service composition middleware [3]. Traditional service discovery and matching approaches (e.g., UDDI and Bluetooth) focus only on finding services with the required functionality. However, with the dramatic growth in the number of services, there are typically many functionally equivalent services, and the user does not know which one to select. To satisfy users' QoS requirements, concrete services have to be selected and instantiated for each abstract task of the business process. Given the growing number of service candidates in each service class, each offering different QoS values, selecting the optimal combination of services that fulfills the user's QoS constraints becomes a very complex and time-consuming task [4]. QoS-based service selection approaches have therefore been proposed, aiming to find the best combination of services that satisfies a set of end-to-end QoS constraints from the user's request. Notable approaches include Hybrid [5], GA [6], Replanning [7], CAR & AR [8], MIP [9], and Heuristic [10]. Although these approaches perform well in their respective contexts, they are ill suited to composite services because of large QoS fluctuations (e.g., the response time of a service changes over time) [8,11,12]. Because they do not consider QoS uncertainty, they cannot provide reliable services to users of composition systems in dynamic service environments. Service candidates participating in service selection are generally widely distributed in the network. They come from different organizations and systems and run on different platforms. Hence, even slight changes in location, network environment, request time, and other factors affect the reliability of these candidates [13]. A service with consistently good QoS is therefore typically more reliable than a service whose QoS varies widely, so consistency should be considered an important criterion for reliable service selection.
In addition, there is a further question we must face: will there really be large numbers of services with the same functional attributes but different QoS? Statistics published by the web service search engine Seekda! indicate that the number of web services has increased exponentially in recent years. Before cloud computing, many researchers questioned whether enough web services would exist to serve as service candidates for each service class, and some were pessimistic on this point. Now, however, the pay-per-use business model promoted by the cloud computing paradigm may enable service providers to offer massive numbers of services (e.g., infrastructure as a service, platform as a service, and software as a service) on public, private, or hybrid cloud platforms [14]. Hence, one vision of the future is an abundance of services. However, most existing approaches suffer from a heavy computational workload as the number of services increases, resulting in poor real-time performance. The main reason is that they focus on optimizing the selection algorithm itself to reduce the time cost of the selection process. They neglect a basic principle: reducing the search space of service candidates (called the solution space) matters more than optimizing the search or selection algorithm alone. Unlike most existing approaches, we propose an efficient and reliable approach that not only considers the QoS uncertainty of services but also pays close attention to downsizing the solution space of the service selection process. We quantify QoS uncertainty with information theory and probability theory and use it to filter low-reliability services: the higher the QoS uncertainty of a service, the lower its reliability, and a service whose uncertainty is unacceptable is filtered out of the candidate set. Why do we use information theory and probability theory in this paper?
Entropy measures the average uncertainty (the expected information content) of a random variable, so its value reflects the degree of disorder in a service's QoS well. Variance, in turn, measures the dispersion, and hence the stability, of a sample. Using these two measures to prune low-reliability services can help remedy the shortcomings of existing service selection approaches. Compared with previous QoS-based service selection approaches, our main contributions can be summarized as follows.




• Middleware Framework. Aimed at efficient and reliable service selection in heterogeneous distributed software systems, we present a service composition framework with three distinct components: the Discovery Engine, the Selection Engine, and the Composition Engine.
• High Reliability. We adopt entropy and variance to compute QoS uncertainty. Low-reliability services are then pruned, and high-reliability services are selected for composite services using our reliability fitness function.
• Low Computation Time. Because many low-reliability services are filtered out, the solution space of service selection shrinks sharply, which yields lower computation time in the service selection process than existing techniques.
• Extensive Experiments. We implemented our approach and experimented with 5825 real-world services and 10,000 synthetic services. Our results show that our approach outperforms the other approaches compared. We also report a parameter study of our approach.

The remainder of this paper is organized as follows. Section 2 introduces the background of service selection, including related definitions and related work. Section 3 introduces the proposed service composition framework. Section 4 describes our approach in detail, including computing QoS uncertainty, filtering uncertain services, and the service selection process. The evaluation in Section 5 demonstrates the benefits of our approach. Finally, Section 6 concludes the paper.

2. Background

2.1. Related concepts

In this section, we explain some concepts related to service selection and service composition. The purpose of a composite service is to achieve a particular function that satisfies the user's requirements and preferences. It is obtained by combining service candidates, one selected from each service class (a set of functionally equivalent service candidates). The following formalization illustrates the concept of a composite service.
Consider a composite service $S = \{s_1, s_2, \ldots, s_n\}$. Each $s_i \in S$, $s_i = \{s_{i1}, s_{i2}, \ldots, s_{il}\}$, is a service class containing $l$ ($l > 1$) functionally equivalent service candidates with different QoS values. QoS is a nonfunctional attribute of a web service that affects its performance. A service's QoS has many attributes, such as response time, reliability, throughput, delay, and availability. QoS attributes can generally be divided into two categories: positive and negative. For positive QoS attributes (e.g., reliability and availability), the larger the attribute value, the better the quality of the web service. Conversely, negative QoS attributes (e.g., response time and delay) should be kept as low as possible. In this paper, we consider both positive and negative QoS attributes. Each attribute value can be obtained through quantitative measurement. For example, if service $s_{ij}$ has $r$ attributes, its attribute vector can be expressed as $Q_{s_{ij}} = \{q_1(s_{ij}), q_2(s_{ij}), \ldots, q_r(s_{ij})\}$, where $q_k(s_{ij})$ ($1 \le k \le r$) is the value of the $k$th attribute of $s_{ij}$. Similarly, the composite service's attribute vector can be expressed as $Q_S = \{q_1(S), q_2(S), \ldots, q_r(S)\}$, where $q_k(S)$ is aggregated from the $k$th attribute values of all the selected service candidates. Table 1 lists the QoS aggregation functions of the sequential composition model. Other models (e.g., parallel, conditional, and loop) can be transformed into the sequential model using existing techniques [15].

Table 1
QoS aggregation functions.

QoS attribute | QoS aggregation function
Response time | $q(S) = \sum_{i=1}^{n} q(s_i)$
Throughput | $q(S) = \min_{i=1}^{n} q(s_i)$
Reputation | $q(S) = \frac{1}{n} \sum_{i=1}^{n} q(s_i)$
Reliability | $q(S) = \prod_{i=1}^{n} q(s_i)$
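As a minimal sketch of Table 1, the four sequential-composition aggregation functions can be written in Python as below. The attribute keys, the `aggregate` helper, and the sample QoS values are hypothetical and serve only for illustration; other composition patterns would first have to be transformed into the sequential model [15].

```python
import math
from statistics import mean

# Sequential-composition aggregation functions from Table 1.
# The attribute keys are hypothetical names chosen for this sketch.
AGGREGATORS = {
    "response_time": sum,        # q(S) = sum of q(s_i)
    "throughput": min,           # q(S) = min of q(s_i)
    "reputation": mean,          # q(S) = (1/n) * sum of q(s_i)
    "reliability": math.prod,    # q(S) = product of q(s_i)
}


def aggregate(selected):
    """Aggregate the QoS of a sequential composition.

    `selected` holds one dict per service class (the chosen candidate),
    mapping attribute name -> QoS value.
    """
    return {
        attr: func([service[attr] for service in selected])
        for attr, func in AGGREGATORS.items()
    }


# Hypothetical candidates chosen from three service classes.
selected = [
    {"response_time": 100, "throughput": 50, "reputation": 4, "reliability": 0.90},
    {"response_time": 200, "throughput": 30, "reputation": 5, "reliability": 0.95},
    {"response_time": 150, "throughput": 40, "reputation": 3, "reliability": 0.99},
]
qos = aggregate(selected)
print(qos)
```

Keeping the aggregation rules in one table-like mapping mirrors Table 1 directly, so supporting another attribute only requires adding one entry.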

Generally, in web service composition, each service candidate has multiple QoS attributes with different units and ranges, which makes them hard to compare directly during service selection. QoS utility functions are therefore used to compute an overall QoS value for each candidate service: a utility function maps the vector of QoS values $Q_{s_{ij}}$ to a single real value $U(s_{ij})$, and service selection then sorts and ranks the candidates by this value. In web service composition, users generally expect low values for negative QoS attributes and high values for positive QoS attributes, so negative attributes should be minimized and positive attributes maximized. The QoS utility function in this paper is similar to that in [5,10]. It scales all attribute values to the range [0, 1] so that multi-dimensional service attributes can be computed uniformly, and then weights each attribute according to the user's preferences. The QoS utility function is defined as follows.

Definition 1 (QoS Utility Function). Suppose there are $x$ negative QoS attributes (to be minimized) and $y$ positive QoS attributes (to be maximized). The QoS utility functions for a web service $s_{ij} \in s_i$ and for the composite service $S$ are defined as follows:

$$U(s_{ij}) = \sum_{\alpha=1}^{x} \frac{Q_{i,\alpha}^{\max} - q_\alpha(s_{ij})}{Q_{i,\alpha}^{\max} - Q_{i,\alpha}^{\min}} \cdot \omega_\alpha + \sum_{\beta=1}^{y} \frac{q_\beta(s_{ij}) - Q_{i,\beta}^{\min}}{Q_{i,\beta}^{\max} - Q_{i,\beta}^{\min}} \cdot \omega_\beta \tag{1}$$

$$U(S) = \sum_{\alpha=1}^{x} \frac{Q_{\alpha}^{\max} - q_\alpha(S)}{Q_{\alpha}^{\max} - Q_{\alpha}^{\min}} \cdot \omega_\alpha + \sum_{\beta=1}^{y} \frac{q_\beta(S) - Q_{\beta}^{\min}}{Q_{\beta}^{\max} - Q_{\beta}^{\min}} \cdot \omega_\beta \tag{2}$$

with

$$Q_{\alpha}^{\max} = \sum_{i=1}^{n} Q_{i,\alpha}^{\max} \;\Big(Q_{i,\alpha}^{\max} = \max_{\forall s_{ij} \in s_i} q_\alpha(s_{ij})\Big), \qquad Q_{\beta}^{\max} = \sum_{i=1}^{n} Q_{i,\beta}^{\max} \;\Big(Q_{i,\beta}^{\max} = \max_{\forall s_{ij} \in s_i} q_\beta(s_{ij})\Big) \tag{3}$$

$$Q_{\alpha}^{\min} = \sum_{i=1}^{n} Q_{i,\alpha}^{\min} \;\Big(Q_{i,\alpha}^{\min} = \min_{\forall s_{ij} \in s_i} q_\alpha(s_{ij})\Big), \qquad Q_{\beta}^{\min} = \sum_{i=1}^{n} Q_{i,\beta}^{\min} \;\Big(Q_{i,\beta}^{\min} = \min_{\forall s_{ij} \in s_i} q_\beta(s_{ij})\Big) \tag{4}$$

where $\omega_\alpha$ and $\omega_\beta$ are the weights representing the user's preferences for each QoS attribute, satisfying $\sum_{\alpha=1}^{x}\omega_\alpha + \sum_{\beta=1}^{y}\omega_\beta = 1$ ($0 < \omega_\alpha, \omega_\beta < 1$); $Q_{i,k}^{\max}$ is the maximum value of the $k$th attribute among all service candidates of service class $s_i$, and similarly $Q_{i,k}^{\min}$ is the minimum value in class $s_i$; $Q_k^{\max}$ is the sum of $Q_{i,k}^{\max}$ over all service classes $s_i$ in the composite service $S$, and similarly $Q_k^{\min}$ is the sum of $Q_{i,k}^{\min}$.

In Definition 1, for negative QoS attributes we compare the distance $Q_{i,k}^{\max} - q_k(s_{ij})$ between the maximum value in service class $s_i$ and the value of service candidate $s_{ij}$ with the distance $Q_{i,k}^{\max} - Q_{i,k}^{\min}$ between the maximum and minimum in $s_i$. For positive QoS attributes, conversely, we compare the distance $q_k(s_{ij}) - Q_{i,k}^{\min}$ between the value of $s_{ij}$ and the minimum value in $s_i$ with the same distance $Q_{i,k}^{\max} - Q_{i,k}^{\min}$. All QoS attributes are weighted by the user's preferences, so the QoS utility function is not biased toward any particular attribute but reflects the user's preferences. The weight $\omega_k$ is an important factor that represents the user's preference for the $k$th attribute of a composite service. It is usually assigned a value in [0, 1], and all weights satisfy $\sum_{k=1}^{r}\omega_k = 1$. The larger the weight, the more attention the user pays to that attribute, and the more important a position the attribute occupies in the service selection process. We should therefore set $\omega_k$ according to the user's preference for the $k$th attribute. When users do not know how to assign numerical weights, they can instead specify the weight of each attribute through linguistic terms, such as 'very unimportant,' 'unimportant,' 'medium,' 'important,' and 'very important'; the corresponding weight values are shown in Table 2 [16]. If the user does not specify the weight of each QoS attribute, we assign equal weights to all attributes.

Table 2
Weight values corresponding to linguistic terms.

Linguistic term | Weight value
Very unimportant | 0.125
Unimportant | 0.25
Medium | 0.5
Important | 0.75
Very important | 1

In web service selection, to obtain an optimal composite service, we add global QoS constraints to filter out web services that do not satisfy the user's QoS requirements. This helps shorten computation time and improves service selection efficiency. Different constraint sets yield different optimal composite services. Consider a vector of global QoS constraints $C = \{C_1, C_2, \ldots, C_m\}$ ($0 < m \le r$), where each constraint is an upper or lower bound on an aggregated QoS value. We use these $m$ constraints to select the optimal composite service. In this paper, we consider both negative and positive attributes in the service selection process for composite services.
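The following is a minimal Python sketch of the candidate-level utility in Eq. (1), under stated assumptions: the attribute names, the hypothetical `weights_from_terms` helper (which rescales the Table 2 linguistic weights so they sum to 1), and the choice to score an attribute as 1 when every candidate in the class has the same value (Eq. (1) divides by zero in that case) are illustrative assumptions, not details taken from the paper.

```python
# Table 2: linguistic terms -> weight values.
LINGUISTIC_WEIGHTS = {
    "very unimportant": 0.125,
    "unimportant": 0.25,
    "medium": 0.5,
    "important": 0.75,
    "very important": 1.0,
}


def weights_from_terms(terms):
    """Hypothetical helper: map linguistic terms to Table 2 weights, then
    rescale them so that all weights sum to 1, as Definition 1 requires."""
    raw = {attr: LINGUISTIC_WEIGHTS[term] for attr, term in terms.items()}
    total = sum(raw.values())
    return {attr: w / total for attr, w in raw.items()}


def candidate_utility(candidate, service_class, weights, negative):
    """Eq. (1): weighted, min-max scaled utility of one candidate in its class.

    `negative` is the set of negative attributes (lower is better);
    every other weighted attribute is treated as positive.
    """
    utility = 0.0
    for attr, weight in weights.items():
        values = [c[attr] for c in service_class]
        q_max, q_min = max(values), min(values)
        if q_max == q_min:
            score = 1.0  # ASSUMPTION: Eq. (1) is undefined here; treat as best
        elif attr in negative:
            score = (q_max - candidate[attr]) / (q_max - q_min)
        else:
            score = (candidate[attr] - q_min) / (q_max - q_min)
        utility += weight * score
    return utility


# Hypothetical service class with three candidates.
service_class = [
    {"name": "s11", "response_time": 100, "reliability": 0.90},
    {"name": "s12", "response_time": 200, "reliability": 0.99},
    {"name": "s13", "response_time": 150, "reliability": 0.95},
]
weights = weights_from_terms({"response_time": "important", "reliability": "medium"})
scores = {
    c["name"]: candidate_utility(c, service_class, weights, {"response_time"})
    for c in service_class
}
print(sorted(scores, key=scores.get, reverse=True))  # ['s11', 's13', 's12']
```

Here "important" and "medium" become weights 0.6 and 0.4 after rescaling, so the fastest candidate s11 ranks first even though it is the least reliable one in its class.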

2.2. Related work

There are numerous service selection approaches in the literature; here, we review only some notable work. For reliable service selection, Ran [17] proposed a web service discovery model that combines functional and nonfunctional requests for service discovery. The model opened up a broad research area: selecting web services based on QoS. Since then, many researchers have focused on QoS-based web service selection and composition [6–8,12,18,19]. For example, Hwang et al. [8] studied the dynamic web service selection problem in a failure-prone environment, which aims to determine a subset of web services to be invoked at run time so as to successfully orchestrate a composite web service. Zheng et al. [12] proposed a collaborative QoS prediction approach for web services that takes advantage of the past web service usage experiences of service users. Liu et al. [18] proposed an open, fair, and dynamic QoS computation model for web service selection, but the model considered only QoS attributes, without considering the user's requirements and preferences. For efficient service selection, integer programming [9–11,20] has often been used to select the optimal composite service. In [9], the authors proposed a service selection optimization approach with three main steps. First, loop peeling is adopted in the optimization. Second, if no feasible solution to the web service composition problem exists, QoS parameters are negotiated to determine new quality values for web service invocations. Finally, a new class of global constraints is introduced that allows the execution of stateful web service components. In [20], the authors presented a middleware platform for web service composition that maximizes utility functions over QoS attributes while satisfying the user's requirements, using integer programming to find the optimal utility values. Alrifai et al. [3] proposed an approach based on the notion of skylines to select services for composition effectively and efficiently by reducing the number of candidate services to be considered. Our previous work [11] employed a cloud model to prune redundant services and then used mixed integer programming to select the optimal services. Moreover, some researchers [21–23] considered the impact of QoS dependencies in the service selection process. Barakat et al. [21] presented a correlation-aware service selection approach that handles QoS dependencies among web services and improves composition quality. The approach first models the quality dependencies among web services and then uses correlation-aware search space reduction techniques to eliminate uninteresting compositions from the search space before selection. Additionally, Deng et al. [22] proposed an approach to the QoS-aware service composition problem in the presence of service-dependent QoS and user-provided topological and QoS constraints.
The approach handles service-dependent QoS effectively by integrating it directly into the composition process rather than treating it after composition, and it significantly improves performance in real-life scenarios with complex service and QoS dependencies. Deng et al. [23] proposed a service selection method that manages QoS correlations by accounting for all services that may be integrated into optimal composite services and pruning services that cannot be optimal candidates. Although the above approaches perform well in the service selection process, they do not consider QoS uncertainty, so they cannot guarantee the reliability of the solution. In contrast to our other previous work [24], this study designs a service composition middleware framework to support service selection in heterogeneous distributed software systems and then proposes an efficient and reliable service selection approach that computes QoS uncertainty. Moreover, this paper presents new and expanded comparison experiments to evaluate the proposed approach.

3. Middleware framework

Aimed at service selection in heterogeneous distributed software systems, a service composition framework is presented in Fig. 1. There are three distinct components in our framework: the Discovery Engine, the Selection Engine, and the Composition Engine (see Fig. 2).

Discovery Engine: Its main function is to obtain valid service lists as well as the user's QoS constraints and preferences. Service providers from heterogeneous distributed systems publish their services in a UDDI [25] (service registry), where users or service requesters can find them based on their functional and nonfunctional properties. Given an abstract composition request derived from the user's requirements, the Discovery Engine uses UDDI to locate available services from service providers for each task.
UDDI uses syntactic or semantic functional matching between the tasks and service descriptions to find a list of candidate services for each task. QoS constraints and the user's preferences can be obtained from user input [26].

Selection Engine: This is the core of the middleware framework. It computes selection results using the QoS aggregation functions. In this engine, a QoS uncertainty-computing model and an uncertain service-filtering model are used to prune unreliable services and to reduce the solution space, respectively. The service selection model adopts 0–1 integer programming to find the optimal services according to the user's preferences and QoS constraints.

Composition Engine: Once suitable services have been found, the Composition Engine binds the selected concrete services and invokes them one by one according to the selection results from the Selection Engine.

To aid understanding, Fig. 2 shows the seven-step process of service selection with the service composition middleware. Based on the proposed middleware, we propose an efficient and reliable service selection approach that computes QoS uncertainty.

Fig. 1. Framework for service composition middleware.

Fig. 2. Process of service selection.

Fig. 3. Reliability with the WSDream dataset.

4. Our approach

The proposed approach consists of three phases. The first phase is QoS uncertainty computing, in which we adopt information theory and probability theory to transform QoS values into two qualitative concepts. These concepts represent the stability of a service and are used to rank services. The second phase is uncertain service filtering, in which we prune uncertain service candidates according to the two qualitative concepts to reduce the solution space for service selection. The third phase is service selection, in which we design a reliability fitness function to find the most reliable composite service with low computation time.

4.1. Computing QoS uncertainty

We first normalize quantitative QoS values into the range [0, 1], which simplifies data processing and makes attribute values comparable. We then employ information theory and probability theory to compute QoS uncertainty by transforming quantitative QoS values into two qualitative QoS concepts. According to these two concepts, a service with consistently good QoS can be distinguished from other services. For illustration, we first outline the relevant concepts.

(1) Data normalization

Normalization limits values to a fixed range (e.g., [0, 1]), which is convenient for the QoS utility function. Data normalization scales the original QoS values proportionally. There are many normalization methods, such as linear conversion, logarithmic conversion, and cotangent conversion. In this paper, we adopt linear conversion to normalize QoS values:

$$y = \frac{x - \mathit{Minvalue}}{\mathit{Maxvalue} - \mathit{Minvalue}} \tag{5}$$

where $x$ and $y$ are the values before and after conversion, respectively, and $\mathit{Maxvalue}$ and $\mathit{Minvalue}$ are the maximum and minimum of the original data, respectively.

(2) Entropy

In information theory, entropy measures the expected information content of a random variable, i.e., the average uncertainty of an information source (IS). The entropy of an IS depends on its statistical properties (its probability distribution). In general, the greater the uncertainty of a variable, the larger its entropy and the greater the disorder of the corresponding IS. In this paper, we treat the historical real-world QoS values of a service as a discrete IS and use entropy to filter services, according to the following definition.

Definition 2 (Entropy). Let $X$ be a random variable of an IS, let $H(X)$ be the entropy of $X$, and let $\{x_1, x_2, \ldots, x_n\}$ be the range of $X$. Then $H(X)$ is given by

$$H(X) = -\sum_{i=1}^{n} p(x_i) \log_2 p(x_i) \tag{6}$$

where $p(x_i)$ is the probability of $x_i$, $p(x_i) \ge 0$, and $\sum_{i=1}^{n} p(x_i) = 1$. Note that $H(X) \ge 0$.

(3) Variance

In probability theory, variance measures the expected squared deviation of a random variable from its mathematical expectation. The larger the variance, the more dispersed the values of the random variable around the expectation, and the greater the disorder of the sample data. Because variance reflects the stability of the IS, we also adopt it to filter uncertain services in composite services. Variance is defined as follows.

Definition 3 (Variance). Let $X$ be a random variable of an IS, let $E(X)$ be the mathematical expectation of $X$, and let $D(X)$ be the variance of $X$. Then $E(X)$ and $D(X)$ are given by

$$E(X) = \sum_{i=1}^{n} x_i \, p(x_i) \tag{7}$$

$$D(X) = E(X^2) - \big(E(X)\big)^2 \tag{8}$$

where $p(x_i)$ ($p(x_i) \ge 0$) is the probability of $x_i$ and $\sum_{i=1}^{n} p(x_i) = 1$.

Since entropy and variance both reflect the stability of an IS, why do we adopt both rather than just one of them? Two ISs can have the same entropy or the same variance; in that situation, filtering uncertain services by only one criterion does not give the desired result. For example, consider two random variables $X_1 = \{0.01, 0.01, 0.06, 0.06, 0.09, 0.09\}$ and $X_2 = \{0.04, 0.04, 0.05, 0.05, 0.06, 0.06\}$, which represent the response times of six users accessing two different services (WS1 and WS2). Each variable takes three distinct values with probability 1/3, so by (6), $En(X_1) = En(X_2) = \log_2 3 \approx 1.585$, and entropy alone cannot tell us which service to prune. Their variances, however, are $Dx(X_1) \approx 1.09 \times 10^{-3}$ and $Dx(X_2) \approx 6.67 \times 10^{-5}$. Since $Dx(X_1) > Dx(X_2)$, WS2 is more stable and reliable than WS1, and WS1 should be pruned. This example shows the benefit of using both criteria. Entropy and variance have been applied successfully in many fields, such as financial markets and investment risk analysis, which provides a basis for our approach. By using entropy and variance to decide which uncertain services to abandon, we retain the least uncertain services.

4.2. Uncertain service filtering

Using the QoS uncertainty measures above, we filter uncertain services with entropy En and variance Dx. En filters services in a coarse-grained way: given $l$ functionally equivalent services, the $l_1$ ($l_1 < l$) services with the smallest En values are kept and the rest are discarded. We then filter these $l_1$ services by Dx, keeping the $l_2$ services with the smallest Dx values. Finally, we obtain $l_2$ low-uncertainty services. To illustrate the implications of this filtering, consider three services, WS1, WS2, and WS3, that offer similar hotel services.
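A minimal Python sketch of Eqs. (6) to (8) and the two-stage En/Dx filter described above follows. Each observation is weighted 1/n, so $p(x_i)$ is the empirical frequency. For the hotel example (response times from Table 3), we assume, as our own reconstruction rather than a step stated in the paper, that each response time is first mapped to the midpoint of its 5-ms interval ([11, 15], [16, 20], ...); under this assumption the sketch reproduces the values NS1 = {1.361, 106.25}, NS2 = {1, 6.25}, and NS3 = {1, 56.25} reported for this example. The helper names and the choice $l_1 = 2$, $l_2 = 1$ are hypothetical.

```python
import math
from collections import Counter


def entropy(samples):
    """Eq. (6): H(X) = -sum p(x_i) log2 p(x_i), p(x_i) = empirical frequency."""
    n = len(samples)
    return -sum((c / n) * math.log2(c / n) for c in Counter(samples).values())


def variance(samples):
    """Eqs. (7)-(8): D(X) = E(X^2) - (E(X))^2, each sample weighted 1/n."""
    n = len(samples)
    mean = sum(samples) / n
    return sum(x * x for x in samples) / n - mean * mean


def interval_midpoint(x, start=11, width=5):
    """ASSUMPTION (reconstructed, not from the paper): map an integer response
    time to the midpoint of its interval [11, 15], [16, 20], ..."""
    k = (x - start) // width
    return start + k * width + (width - 1) / 2


def two_stage_filter(measures, l1, l2):
    """Keep the l1 services with the smallest En, then the l2 of those with the
    smallest Dx. `measures` maps service name -> (En, Dx)."""
    coarse = sorted(measures, key=lambda s: measures[s][0])[:l1]
    return sorted(coarse, key=lambda s: measures[s][1])[:l2]


# Response times (ms) from Table 3.
table3 = {
    "WS1": [12, 31, 15, 32, 14, 32, 31, 36, 34, 13],
    "WS2": [23, 29, 24, 22, 28, 29, 25, 28, 23, 27],
    "WS3": [18, 16, 31, 32, 19, 33, 34, 17, 20, 33],
}
measures = {}
for name, times in table3.items():
    mids = [interval_midpoint(t) for t in times]
    measures[name] = (round(entropy(mids), 3), round(variance(mids), 2))
print(measures)  # {'WS1': (1.361, 106.25), 'WS2': (1.0, 6.25), 'WS3': (1.0, 56.25)}
print(two_stage_filter(measures, l1=2, l2=1))  # ['WS2']

# The X1/X2 example above: equal entropy, different variance.
x1 = [0.01, 0.01, 0.06, 0.06, 0.09, 0.09]
x2 = [0.04, 0.04, 0.05, 0.05, 0.06, 0.06]
print(math.isclose(entropy(x1), entropy(x2)), variance(x1) > variance(x2))  # True True
```

The coarse En pass removes WS1, whose transactions spread over three intervals; the Dx pass then separates WS2 from WS3, which tie on entropy.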
In the example, the performance of WS1, WS2, and WS3 is recorded in a series of transaction logs, which capture the actual QoS delivered by each provider in practice. Because these service providers operate in a dynamic environment, their performance is uncertain, and this is reflected in the fluctuation across transactions. For ease of illustration, we consider only 10 transactions for each of the three services, although the actual number of transactions would be much larger. Table 3 lists these transactions with a focus on response time: each value is the response time observed when a user invokes a service. The aggregated QoS values (S1, S2, and S3), obtained by averaging all transactions, are given in the last row of Table 3.

Table 3
A set of sample service transactions (response time in ms).

WS1 ID | WS1 response time | WS2 ID | WS2 response time | WS3 ID | WS3 response time
S10 | 12 | S20 | 23 | S30 | 18
S11 | 31 | S21 | 29 | S31 | 16
S12 | 15 | S22 | 24 | S32 | 31
S13 | 32 | S23 | 22 | S33 | 32
S14 | 14 | S24 | 28 | S34 | 19
S15 | 32 | S25 | 29 | S35 | 33
S16 | 31 | S26 | 25 | S36 | 34
S17 | 36 | S27 | 28 | S37 | 17
S18 | 34 | S28 | 23 | S38 | 20
S19 | 13 | S29 | 27 | S39 | 33
S1 | 24.9 | S2 | 25.8 | S3 | 25.3

From Table 3, the aggregated QoS value of WS1 is lower than those of WS2 and WS3, i.e., S1 < S2 and S1 < S3. A traditional service selection approach would therefore frequently select WS1 as a component of the composite service, because 24.9 < 25.8 and 24.9 < 25.3. However, a closer analysis of the individual transactions of the three services reveals three important facts that some existing service selection approaches may overlook:

that the QoS of service WS 2 is consistently good but services WS 1 and WS 3 have large QoS variance. Thus, service WS 2 should be selected as a service candidate rather than the other two services, which is different from some traditional approaches. Then, by setting different threshold parameters for En and Dx according to different service environments, services with large QoS variance can be distinguished from services with a consistently good QoS. The latter are then chosen as service candidates in preference to the former to achieve reliable service selection. In this way, our approach can filter uncertain services, thereby reducing the solution space for service selection and shortening computation time for service composition.

(1) In service WS 1 , five transactions lie in the interval [31, 35], four transactions lie in the interval [11, 15], and one transaction lies in the interval [36, 40]. In service WS 2 , five transactions lie in the interval [21, 25] and the other five transactions lie in the interval [26, 30]. Similarly, in service WS 3 , five transactions lie in the interval [15, 20] and the other five transactions lie in the interval [31, 35]. That means although the average response time of service WS 1 is slightly less than those of WS 2 and WS 3 , the response time of service WS 1 is larger than those of services WS 2 and WS 3 in most transactions. (2) By comparing service WS 2 with WS 3 , we can find the transactions that are evenly distributed in two intervals. However, service WS 3 has a larger span distribution than service WS 2 , which means that service WS 2 is more stable than service WS 3 . (3) The response time for services WS 1 and WS 3 is more volatile than that for service WS 2 , i.e., services WS 1 and WS 3 show a large variance for their QoS, but service WS 2 has consistently good QoS.

After QoS uncertainty computing and uncertain service filtering, service candidates with consistently good QoS can be discovered in each service class. Then a service selection solution has to be used to find the most reliable service of each class with global QoS constraints. In this paper, a 0–1 integer programming model is used to solve the optimization problem of service selection based on filtered services. Recently, integer programming has been used to solve the service composition problem in several studies [9–11, 20] and achieved good results. In this paper, we propose a reliability fitness function definition to reflect the reliability of service selection. The larger the reliability fitness value, the more reliable the solution of service selection.

Hence, according to the three facts above, if service WS 1 or WS 3 is selected as a service component, the actual execution result of service WS 1 or WS 3 may deviate from their average response time, which will result in poor composition service QoS or service selection failure. It is obvious that service WS 2 is more stable than the other two services. Therefore, how to compute service uncertainty to distinguish a service with a consistently good QoS from other services with large QoS variance is an important issue. In this study, we adopt uncertain service filtering for Table 3. Then the entropy and variance {En, Dx} of services WS 1 , WS 2 and WS 3 can be calculated, i.e., NS 1 = {1.361, 106.25}, NS 2 = {1, 6.25}, and NS 3 = {1, 56.25}. Since WS 1 ’s En is higher than for WS 2 and WS 3 (i.e., 1.361 > 1), the uncertainty level of service WS 1 is smaller than that for services WS 2 and WS 3 . In addition, WS 3 ’s Dx is higher than that of WS 2 (i.e., 56.25 > 6.25), so the uncertainty level of service WS 3 is smaller than that of service WS 2 . This means
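The {En, Dx} pairs quoted above for Table 3 can be reproduced with a short sketch. It assumes that transactions are grouped into 5 ms intervals [11, 15], [16, 20], ..., [36, 40] (inferred from the intervals named in facts (1)–(3), not from the paper's formal definitions), that En is the base-2 Shannon entropy of the fraction of transactions falling in each interval, and that Dx is the variance of the interval midpoints weighted by those fractions. Under these assumptions the sketch yields the values 1.361/106.25, 1/6.25, and 1/56.25 quoted for WS1, WS2, and WS3. The function name `entropy_and_variance` is hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

# Response times (ms) of the ten transactions per service, copied from Table 3.
TABLE3: Dict[str, List[int]] = {
    "WS1": [12, 31, 15, 32, 14, 32, 31, 36, 34, 13],
    "WS2": [23, 29, 24, 22, 28, 29, 25, 28, 23, 27],
    "WS3": [18, 16, 31, 32, 19, 33, 34, 17, 20, 33],
}


def entropy_and_variance(values: List[int], low: int = 11, width: int = 5) -> Tuple[float, float]:
    """Return (En, Dx) for integer-valued QoS samples.

    Assumption (hypothetical binning): samples fall into intervals
    [low, low+width-1], [low+width, low+2*width-1], ...; En is the base-2
    Shannon entropy of the interval frequencies and Dx is the
    frequency-weighted variance of the interval midpoints.
    """
    n = len(values)
    counts = Counter((v - low) // width for v in values)  # interval index per sample
    probs = {b: c / n for b, c in counts.items()}
    en = -sum(p * math.log2(p) for p in probs.values())
    # Midpoint of interval b, e.g. b = 0 -> [11, 15] -> 13.
    mids = {b: low + b * width + (width - 1) / 2 for b in probs}
    mean = sum(p * mids[b] for b, p in probs.items())
    dx = sum(p * (mids[b] - mean) ** 2 for b, p in probs.items())
    return en, dx


if __name__ == "__main__":
    for name, values in TABLE3.items():
        en, dx = entropy_and_variance(values)
        print(f"{name}: En = {en:.3f}, Dx = {dx:.2f}")
    # WS1: En = 1.361, Dx = 106.25
    # WS2: En = 1.000, Dx = 6.25
    # WS3: En = 1.000, Dx = 56.25
```

The two measures separate the two kinds of instability in the example: En is higher for WS1 because its transactions spread over three intervals, while WS2 and WS3 have the same En and are told apart only by Dx, since WS3's two intervals lie much farther apart.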

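4.3. Service selection

Definition 4 (Reliability Fitness Function). Suppose there are x negative QoS attributes and y positive QoS attributes. The reliability fitness function is defined as follows:

$$F(S) = \sum_{\alpha=1}^{x} \frac{Q_{\alpha}^{\max} - \sum_{i=1}^{n}\sum_{j=1}^{l} x_{ij}\, q_{\alpha}(s_{ij})}{\left(Q_{\alpha}^{\max} - Q_{\alpha}^{\min}\right)\sigma_{\alpha}}\,\omega_{\alpha} + \sum_{\beta=1}^{y} \frac{\sum_{i=1}^{n}\sum_{j=1}^{l} x_{ij}\, q_{\beta}(s_{ij}) - Q_{\beta}^{\min}}{\left(Q_{\beta}^{\max} - Q_{\beta}^{\min}\right)\sigma_{\beta}}\,\omega_{\beta} \qquad (9)$$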

where $x_{ij}$ is a binary decision variable indicating whether service candidate $s_{ij}$ is selected: $x_{ij} = 1$ if $s_{ij}$ is selected in the optimal composite service and $x_{ij} = 0$ otherwise; $\sigma_k$ (with $k = \alpha$, $1 \le \alpha \le x$, for negative attributes and $k = \beta$, $1 \le \beta \le y$, for positive attributes) is the standard deviation of the sample consisting of the kth attribute values of all service candidates from all service classes, and reflects the overall volatility of all service candidates with respect to that attribute; $q_k(s_{ij})$ represents the kth attribute value of service $s_{ij}$; and $Q_{\alpha}^{\max}$, $Q_{\alpha}^{\min}$, $Q_{\beta}^{\max}$, and $Q_{\beta}^{\min}$ can be calculated by formula (4). In the service selection phase, we obtain the most reliable composite service that satisfies all global QoS constraints. The 0–1 integer programming model can thus be formulated as follows:

$$\max\; F(S)$$

subject to

$$\sum_{i=1}^{n}\sum_{j=1}^{l} q_k(s_{ij})\, x_{ij} \le (\ge)\; C_k, \quad 1 \le k \le m \le r \qquad (10)$$

$$\sum_{j=1}^{l} x_{ij} = 1, \quad 1 \le i \le n, \quad x_{ij} \in \{0, 1\} \qquad (11)$$

where r is the number of QoS attributes; m is the number of QoS constraints; n is the number of service classes; l is the number of service candidates per class; and $C_k$ represents the kth global QoS constraint with respect to the kth QoS attribute. By solving (10) and (11), one reliable service is selected from each class, and the resulting list is returned to the service broker, which provides the composite service to users.
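A minimal sketch of the model (9)–(11) for very small instances follows. It enumerates every way of picking one candidate per class (which enforces (11)), discards selections that violate the global constraints (10), and keeps the selection with the largest F(S). Exhaustive enumeration is swapped in here for a 0–1 integer programming solver purely for illustration; its cost grows exponentially with n. The sketch also assumes that $Q_k^{\max}$ and $Q_k^{\min}$ (formula (4), not reproduced in this section) are the sums of the per-class maxima and minima, that $\sigma_k$ is the sample standard deviation of attribute k over all candidates, and, following (9) and (10) as written, that every attribute is aggregated by summation. All function names and the example numbers are hypothetical.

```python
import itertools
import statistics
from typing import List, Optional, Sequence, Tuple

QoS = Tuple[float, ...]  # one value per QoS attribute k


def aggregate(choice: Sequence[int], classes: List[List[QoS]], k: int) -> float:
    """sum_i sum_j x_ij * q_k(s_ij) for the selection `choice` (choice[i] = j)."""
    return sum(classes[i][j][k] for i, j in enumerate(choice))


def satisfies(value: float, op: str, bound: float) -> bool:
    """Check one global constraint of (10); op is '<=' or '>='."""
    return value <= bound if op == "<=" else value >= bound


def reliability_fitness(choice, classes, negative, positive, weights, qmax, qmin, sigma) -> float:
    """Formula (9) for one selection."""
    total = 0.0
    for k in list(negative) + list(positive):
        denom = (qmax[k] - qmin[k]) * sigma[k]
        if denom == 0:
            continue  # attribute cannot distinguish selections; skip it
        agg = aggregate(choice, classes, k)
        num = qmax[k] - agg if k in negative else agg - qmin[k]
        total += num / denom * weights[k]
    return total


def select(classes: List[List[QoS]], negative, positive, weights, constraints
           ) -> Tuple[Optional[Tuple[int, ...]], float]:
    """Exhaustive search for max F(S) subject to (10) and (11)."""
    n_attr = len(classes[0][0])
    # Assumed stand-in for formula (4): sums of per-class maxima / minima.
    qmax = [sum(max(c[k] for c in cls) for cls in classes) for k in range(n_attr)]
    qmin = [sum(min(c[k] for c in cls) for cls in classes) for k in range(n_attr)]
    # sigma_k: sample standard deviation of attribute k over all candidates.
    sigma = [statistics.stdev([c[k] for cls in classes for c in cls]) for k in range(n_attr)]

    best, best_f = None, float("-inf")
    # itertools.product picks exactly one candidate per class, enforcing (11).
    for choice in itertools.product(*(range(len(cls)) for cls in classes)):
        if not all(satisfies(aggregate(choice, classes, k), op, bound)
                   for k, op, bound in constraints):
            continue  # violates a global QoS constraint (10)
        f = reliability_fitness(choice, classes, negative, positive, weights, qmax, qmin, sigma)
        if f > best_f:
            best, best_f = choice, f
    return best, best_f


if __name__ == "__main__":
    # Hypothetical instance: 2 classes x 2 candidates; attribute 0 = response
    # time (negative, ms), attribute 1 = throughput (positive).
    classes = [[(20.0, 5.0), (30.0, 8.0)], [(10.0, 4.0), (15.0, 6.0)]]
    best, f = select(classes, negative=[0], positive=[1], weights=[0.5, 0.5],
                     constraints=[(0, "<=", 40.0)])
    print(best, round(f, 4))  # (1, 0) 0.1952
```

In this toy instance the selection with the lowest total response time, (0, 0), is feasible but not chosen: the weighted throughput gain of (1, 0) outweighs its response-time loss. Tightening the constraint to 30 ms leaves (0, 0) as the only feasible selection.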

Fig. 4. Reliability with the Random dataset.

4.4. Limitations of our proposed approach

• Our approach may fail when there are few candidate services; the more candidate services there are, the better our approach performs.
• If the QoS values of each service are stable, the advantage of our approach is not obvious.
• If a service is newly developed or has few recorded QoS values, our approach is not effective.

5. Experiments

We have implemented our approach and experimented with a real-world dataset and a synthetic dataset. We compare our approach with the global approach [9] and the skyline approach [3] in several groups of experiments, and we also study the parameters of our method. The experiments indicate that the solutions of our approach are more reliable than those of the other approaches and that its computation time is much shorter.

5.1. Experimental setup

We conduct experiments using two types of datasets. The first is a real-world web service QoS dataset named WSDream [27,28]. The WSDream dataset contains nearly 2 million real-world QoS records of web service invocations: values of two QoS attributes (response time and throughput) collected by 339 service users on 5825 web services. It also contains information about these 5825 web services and 339 users. To ensure that our results are not biased by the WSDream dataset, we also conducted experiments with a second, randomly generated dataset (the Random dataset) that contains 10,000 services with two QoS attributes. We performed several experiments on the QoS-based service composition problem. Each experiment consisted of a service composition request with n service classes, l service candidates per class, and m global QoS constraints. By varying these parameters, we collected the results of each experiment, where each unique combination of the three parameters represents one experiment. We first performed the experiments using (9) to find the optimal selection that satisfies all global QoS constraints while maximizing the reliability fitness value.

We recorded the computation time t1 and the reliability fitness value v1 obtained in each experiment. We then ran the same experiments for all approaches and recorded the computation time t2 and the reliability fitness value v2, and verified the accuracy of the experiments by comparing t1 with t2 and v1 with v2 for each experiment. In our experiments, the number of QoS attributes r was set to 2, the number of QoS constraints m was also set to 2, the number of service candidates per service class varied from 100 to 1000, and the weights of the two attributes were both 0.5. The number of service classes n was fixed at 5 for the WSDream dataset and at 10 for the Random dataset. The number of historical transactions was set to 250 for the WSDream dataset and to 500 for the Random dataset. In our approach, we used En to filter out half of the service candidates in the first phase; in the second phase, we used Dx to keep 2/5 of the candidates retained in the first phase. We thus obtained 1/5 of the service candidates, which are relatively stable and have low QoS uncertainty. All experiments were performed 20 times on the same computer with an Intel(R) Xeon(R) 2.6 GHz processor and 32.0 GB of RAM, running Windows Server 2008 R2 and MATLAB R2013a.

5.2. Reliability comparison

The reliability fitness function has a theoretical maximum $F_{\max}(S)$, which represents its value for the optimal service composition in an ideal situation. $F_{\max}(S)$ is not fixed; it varies with the weights ω and the standard deviations σ. In this paper, we define reliability as follows:

$$\text{Reliability} = \frac{F_{\text{fitness}}(S)}{F_{\max}(S)} \qquad (12)$$

where $F_{\text{fitness}}(S)$ represents the reliability fitness value obtained in each experiment, and $F_{\max}(S)$ is a constant value of 62.5 for the WSDream dataset and 75 for the Random dataset under our experimental settings; these values were determined by analyzing a large number of experimental results. As shown in Figs. 3 and 4, we compare our approach with the global and skyline approaches in terms of reliability. The reliability obtained by our approach is always larger than that of the global and skyline approaches as the number of service candidates increases. These results illustrate that our approach effectively accounts for the influence of QoS uncertainty on the quality of the composite service and greatly improves the reliability of service selection. By using En and Dx to monitor the historical QoS transactions of services, our approach effectively identifies services with large QoS variance and prunes them, which greatly improves the reliability of service composition.

5.3. Computation time comparisons

In this section, we compare our approach with the global and skyline approaches in terms of computation time. Figs. 5 and 6 show that the computation time grows with the number of service candidates and that, for any number of service candidates, the computation time of our approach is always the shortest among all approaches. These results illustrate that our approach significantly reduces the time cost of service composition because it reduces the size of the search space.

Fig. 5. Computation time with the WSDream dataset.
Fig. 6. Computation time with the Random dataset.

5.4. Parameter study

5.4.1. Entropy En and variance Dx

In this section, we study the two parameters En and Dx. We fixed one of the two parameters and measured the reliability and computation time while varying the other. Figs. 7 and 8 show the experimental results (reliability and computation time) with varying Dx. We fixed the number of service candidates at 500 and set En to 0.5, which means selecting half of the service candidates according to En; Dx varied from 0.05 to 0.5, where Dx denotes the proportion of service candidates that are finally selected. Figs. 9 and 10 show the experimental results for varying En. We again fixed the number of service candidates at 500 and set Dx to 0.2; En varied from 0.55 to 1, where En denotes the proportion of candidate services selected according to En. These experiments describe the behavior of our approach in more detail and give a deeper understanding of its benefits.

Fig. 7. Reliability with varying Dx.
Fig. 8. Computation time with varying Dx.

5.4.2. Weight ω

In this section, we examine the influence of the weight ω on the reliability of the composite service. The weight ω represents the user's preference regarding each QoS attribute, which makes it very important for service composition. Figs. 11–14 show the results for both datasets. Each value is the average of 10 results as the number of service candidates varies from 100 to 1000. As the weight of response time varies from 0 to 1, the weight of throughput varies correspondingly from 1 to 0. The results show that, no matter how ω is allocated, our approach outperforms the global approach: the average reliability fitness value obtained by our approach is larger, and its average computation time is smaller, than those of the global approach. All of the above results demonstrate that our approach greatly improves the reliability of service composition and shortens its computation time: it selects the optimal composite service while spending less time and obtaining greater reliability.
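The two-phase filter used in the experimental setup (En keeps half of the candidates, then Dx keeps 2/5 of those, i.e., 1/5 overall) and the En/Dx proportions varied in the parameter study can be sketched as follows. The sketch assumes that filtering by En means ranking candidates by ascending En and keeping the lowest fraction, and likewise for Dx; it also keeps at least one candidate per class. The function name, parameter names, and the dict-of-pairs input are hypothetical.

```python
from typing import Dict, List, Tuple


def filter_uncertain(candidates: Dict[str, Tuple[float, float]],
                     en_keep: float = 0.5,
                     final_keep: float = 0.2) -> List[str]:
    """Two-phase uncertainty filter over one service class.

    candidates maps a (hypothetical) service name to its (En, Dx) pair.
    Phase 1 keeps the en_keep fraction of candidates with the lowest En;
    phase 2 keeps, from those, the candidates with the lowest Dx until
    final_keep of the ORIGINAL candidate count remains.
    """
    n = len(candidates)
    n_en = max(1, int(n * en_keep))  # e.g. 500 * 0.5 -> 250
    n_final = max(1, min(n_en, int(n * final_keep)))  # e.g. 500 * 0.2 -> 100
    # Phase 1: lower entropy means QoS values concentrate in fewer intervals.
    phase1 = sorted(candidates, key=lambda s: candidates[s][0])[:n_en]
    # Phase 2: lower variance means a smaller spread of QoS values.
    return sorted(phase1, key=lambda s: candidates[s][1])[:n_final]


if __name__ == "__main__":
    # Hypothetical {En, Dx} pairs for ten candidates of one service class.
    cands = {
        "s0": (0.2, 50.0), "s1": (0.9, 1.0), "s2": (0.3, 5.0), "s3": (1.5, 2.0),
        "s4": (0.1, 30.0), "s5": (1.2, 3.0), "s6": (0.5, 4.0), "s7": (2.0, 0.5),
        "s8": (0.4, 10.0), "s9": (1.0, 6.0),
    }
    print(filter_uncertain(cands))  # ['s6', 's2']
```

Note that s7, which has the smallest Dx of all ten candidates, is dropped in the first phase because its En is the highest; the order of the two phases therefore matters.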


Fig. 9. Reliability with varying En.
Fig. 10. Computation time with varying En.
Fig. 11. Reliability with varying ω based on the WSDream dataset.
Fig. 12. Reliability with varying ω based on the Random dataset.
Fig. 13. Computation time with varying ω based on the WSDream dataset.

Fig. 14. Computation time with varying ω based on the Random dataset.

6. Conclusions

In this paper, based on a service composition framework, we presented an efficient and reliable service selection approach. Our approach uses the two concepts of entropy and variance to compute QoS uncertainty and then uses these criteria to filter out services with high uncertainty. Finally, we designed a reliability fitness function to select the most reliable composite services using 0–1 integer programming. We evaluated our approach using both real-world and randomly generated service datasets. The results show that our approach obtains more reliable solutions with lower computation time than other approaches, meaning that our approach can perform service selection on the basis of a user's requests more efficiently and effectively. In future work, we will strengthen our approach, continue to research more efficient service selection methods, and try to adopt QoS data from programmableWeb.com. We aim to further help users find the optimal composite service according to their QoS requirements and preferences.

Acknowledgments

This work was supported by the National Natural Science Foundation of China under grants no. 61202435 and 61272521 and the Natural Science Foundation of Beijing under grant no. 4132048.

References

[1] R. Mietzner, C. Fehling, D. Karastoyanova, F. Leymann, Combining horizontal and vertical composition of services, in: IEEE International Conference on Service-Oriented Computing and Applications, SOCA 2010, 2010, pp. 1–8.
[2] G. Canfora, M. Di Penta, R. Esposito, M.L. Villani, A framework for QoS-aware binding and re-binding of composite web services, J. Syst. Softw. 81 (2008) 1754–1769.
[3] M. Alrifai, D. Skoutas, T. Risse, Selecting skyline services for QoS-based web service composition, in: The 19th International Conference on the World Wide Web, WWW 2010, 2010, pp. 11–20.
[4] M. Alrifai, T. Risse, Combining global optimization with local selection for efficient QoS-aware service composition, in: The 18th International Conference on the World Wide Web, WWW 2009, 2009, pp. 881–890.
[5] G. Canfora, M. Di Penta, R. Esposito, M.L. Villani, An approach for QoS-aware service composition based on genetic algorithms, in: The 7th Annual Conference on Genetic and Evolutionary Computation, GECCO 2005, 2005, pp. 1069–1075.
[6] G. Canfora, M. Di Penta, R. Esposito, M.L. Villani, QoS-aware replanning of composite web services, in: IEEE International Conference on Web Services, ICWS 2005, 2005, pp. 121–129.
[7] S.-Y. Hwang, E.-P. Lim, C.-H. Lee, C.-H. Chen, Dynamic web service selection for reliable web service composition, IEEE Trans. Serv. Comput. 1 (2008) 104–116.
[8] D. Ardagna, B. Pernici, Adaptive service composition in flexible processes, IEEE Trans. Softw. Eng. 33 (2007) 369–384.
[9] T. Yu, Y. Zhang, K.-J. Lin, Efficient algorithms for web services selection with end-to-end QoS constraints, ACM Trans. Web 1 (2007) 1–26.
[10] W. Shangguang, Z. Zheng, S. Qibo, Z. Hua, Y. Fangchun, Cloud model for service selection, in: The 30th IEEE Conference on Computer Communications Workshops on Cloud Computing, INFOCOM WKSHPS, 2011, pp. 666–671.
[11] Z. Zibin, M. Hao, M.R. Lyu, I. King, Collaborative web service QoS prediction via neighborhood integrated matrix factorization, IEEE Trans. Serv. Comput. 6 (2013) 289–299.
[12] Y. Qi, A. Bouguettaya, Computing service skyline from uncertain QoWS, IEEE Trans. Serv. Comput. 3 (2010) 16–29.
[13] K.S. Candan, W.-S. Li, T. Phan, M. Zhou, Frontiers in information and software as services, in: The 25th IEEE International Conference on Data Engineering, ICDE 2009, 2009, pp. 1761–1768.
[14] J. Cardoso, A. Sheth, J. Miller, J. Arnold, K. Kochut, Quality of service for workflows and web service processes, Web Semant. 1 (2004) 281–308.
[15] S.S. Yau, Y. Yin, QoS-based service ranking and selection for service-based systems, in: IEEE International Conference on Services Computing, SCC 2011, 2011, pp. 56–63.
[16] S. Ran, A model for web services discovery with QoS, SIGecom Exchanges 4 (2003) 1–10.
[17] Y. Liu, A.H. Ngu, L.Z. Zeng, QoS computation and policing in dynamic web service selection, in: The 13th International Conference on the World Wide Web, WWW 2004, 2004, pp. 66–73.
[18] K. Guosheng, L. Jianxun, T. Mingdong, L. Xiaoqing, K.K. Fletcher, Web service selection for resolving conflicting service requests, in: IEEE International Conference on Web Services, ICWS 2011, 2011, pp. 387–394.
[19] L. Zeng, B. Benatallah, A.H.H. Ngu, M. Dumas, J. Kalagnanam, H. Chang, QoS-aware middleware for web services composition, IEEE Trans. Softw. Eng. 30 (2004) 311–327.
[20] W. Shangguang, Z. Zheng, S. Qibo, Z. Hua, Y. Fangchun, Cloud model for service selection, in: The 30th IEEE Conference on Computer Communications Workshops on Cloud Computing, INFOCOM WKSHPS, 2011, pp. 666–671.
[21] L. Barakat, S. Miles, M. Luck, Efficient correlation-aware service selection, in: IEEE 19th International Conference on Web Services, ICWS 2012, 2012, pp. 1–8.
[22] S. Deng, H. Wu, D. Hu, J.L. Zhao, Service selection for composition with QoS correlations, IEEE Trans. Serv. Comput. (2014) http://dx.doi.org/10.1109/TSC.2014.2361138.

[23] F. Yuzhang, N. Le Duy, R. Kanagasabai, Dynamic service composition with service-dependent QoS attributes, in: IEEE 20th International Conference on Web Services, ICWS 2013, 2013, pp. 10–17.
[24] L. Sun, S. Wang, J. Li, Q. Sun, F. Yang, QoS uncertainty filtering for fast and reliable web service selection, in: IEEE International Conference on Web Services, ICWS 2014, 2014, pp. 550–557.
[25] M.P. Papazoglou, P. Traverso, S. Dustdar, F. Leymann, Service-oriented computing: State of the art and research challenges, Computer 40 (2007) 38–45.
[26] F. Li, F. Yang, K. Shuang, S. Su, A policy-driven distributed framework for monitoring quality of web services, in: IEEE International Conference on Web Services, ICWS 2008, 2008, pp. 708–715.
[27] Z. Zheng, Y. Zhang, M.R. Lyu, Distributed QoS evaluation for real-world Web services, in: IEEE 8th International Conference on Web Services, ICWS 2010, 2010, pp. 83–90.
[28] Z. Yilei, Z. Zibin, M.R. Lyu, Exploring latent features for memory-based QoS prediction in cloud computing, in: The 30th IEEE Symposium on Reliable Distributed Systems, SRDS 2011, 2011, pp. 1–10.

Shangguang Wang is an associate professor at the State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications. He received his Ph.D. degree in computer science from Beijing University of Posts and Telecommunications, China, in 2011. His Ph.D. thesis was recognized as an outstanding doctoral dissertation by BUPT in 2012. His research interests include service computing, mobile services, and QoS management.

Lin Huang received the M.E. degree in computer science and technology from the Institute of Network Technology, Beijing University of Posts and Telecommunications, in 2012. She is currently a Ph.D. candidate at the State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications. Her research interests include reputation measurement and Web service selection.

Lei Sun received the B.Eng. degree from Qingdao University. He is currently a Master's student at the Beijing University of Posts and Telecommunications. His research interests include service computing and distributed computing.

Ching-Hsien Hsu is a professor in the Department of Computer Science and Information Engineering at Chung Hua University, Taiwan. His research includes high performance computing, cloud computing, parallel and distributed systems, and ubiquitous/pervasive computing and intelligence. He has served in various chair roles for more than 100 conferences and workshops and as a program committee member for more than 200 conferences and workshops. He is the editor-in-chief of an international journal on Grid and High Performance Computing and has served on the editorial boards of approximately 20 international journals.

Fangchun Yang received his Ph.D. degree in communication and electronic systems from Beijing University of Posts and Telecommunications in 1990. He is currently a professor at the Beijing University of Posts and Telecommunications, China. He has published 6 books and more than 80 papers. His current research interests include network intelligence, services computing, communications software, soft switching technology, and network security. He is a fellow of the IET.