Web-services classification using intelligent techniques


Expert Systems with Applications 37 (2010) 5484–5490


Web-services classification using intelligent techniques

Ramakanta Mohanty (a), V. Ravi (b,*), M.R. Patra (a)

(a) Computer Science Department, Berhampur University, Berhampur, Orissa 760 007, India
(b) Institute for Development and Research in Banking Technology, Castle Hills Road #1, Masab Tank, Hyderabad, Andhra Pradesh 500 057, India


Keywords: Web services; Quality of services (QoS); Back propagation neural network (BPNN); Probabilistic neural network (PNN); Group method of data handling (GMDH); Classification and regression trees (CART); TreeNet; Support vector machine (SVM); ID3 decision tree (J48)

Abstract

Web services, a novel paradigm in software technology, offer an innovative mechanism for rendering services over diversified environments. They promise to allow businesses to adapt rapidly to changes in the business environment and to the needs of different customers. However, the rapid introduction of new web services into a dynamic business environment can adversely affect service quality and user satisfaction. Consequently, assessing the quality of web services is of paramount importance when selecting a web service for an application. In this paper, we employ well-known classification models, viz. back propagation neural network (BPNN), probabilistic neural network (PNN), group method of data handling (GMDH), classification and regression trees (CART), TreeNet, support vector machine (SVM) and the ID3 decision tree (J48), to predict the quality of a web service from a set of quality attributes. The experiments are carried out on the QWS dataset, and 10-fold cross-validation is applied to test the efficacy of the models. J48 and TreeNet outperformed all other techniques, each yielding an average accuracy of 99.72%. We also performed feature selection and found that the web-services relevance function (WSRF) is the most significant attribute in determining the quality of a web service. Repeating feature selection without WSRF, we found that Reliability, Throughput, Successability, Documentation and Response Time are the most important attributes, in that order. Moreover, the sets of 'if-then' rules yielded by J48 and CART can be used as an expert system for web-services classification.

© 2010 Elsevier Ltd. All rights reserved.

1. Introduction

The growth of Internet technologies is revolutionizing the way organizations do business with their partners and customers. Companies are focusing on the web for more automation, more efficient business processes and global reach. To compete, companies must implement the right software, follow recent trends in technology, and find integrated e-business solutions that cope with business requirements that change rapidly over time. The latest development in using the web for conducting business resulted in a new paradigm called web services (Tsalgatidou & Pilioura, 2002). Web services are software components based on loosely coupled, distributed and independent services operating via the web infrastructure. They are platform and language independent, which makes them suitable for access from heterogeneous environments. With the rapid introduction of web-services technologies, researchers have focused mostly on the functional and interfacing aspects of web services. Web services communicate using open standards such as HTTP and XML-based protocols

* Corresponding author. Tel.: +91 40 2353 4981; fax: +91 40 2353 5157. E-mail addresses: [email protected] (R. Mohanty), [email protected] (V. Ravi), [email protected] (M.R. Patra).
doi:10.1016/j.eswa.2010.02.063

including SOAP, WSDL and UDDI (Kalepu, Krishnaswamy, & Loke, 2004). A WSDL document describes a service's location on the web and the functionality the service provides. Information about the web service is entered in a UDDI registry, which permits web service consumers to discover and locate the services they require. Using the information in the UDDI registry, a client developer follows the instructions in the WSDL to construct SOAP messages for exchanging data with the service over HTTP (Kokash, 2005). The main objective of this paper is to develop classification models based on intelligent techniques, namely BPNN, PNN, GMDH, TreeNet, CART, SVM and J48, to predict the quality of a web service from a number of QoS attributes. These models are built on past data comprising QoS attributes as explanatory variables and the quality of the web service as the dependent variable. Since each QoS attribute captures a different dimension of the quality of a web service, and since the attributes collectively influence that quality, we can safely assume that the QoS attributes are non-linearly related to it. Therefore, we attempt to approximate this non-linear relationship with the help of several intelligent techniques. The most significant application of the developed models is that we can confidently predict the quality of a new web service (one not in the training set) given its QoS attributes. In this context,


we observed that no prior work addresses these aspects. For illustration, consider a CRM application offered by two different web services. A user will choose the web service with the higher ranking as measured by the QoS attributes, which are essentially non-functional in nature. If one develops a classification model based on intelligent techniques to classify a given new web service, the user can use the resulting ranking to select a web service. In this paper, taking a cue from the second author's experience of applying intelligent techniques to various software engineering problems (Rajkiran & Ravi, 2008; Ravi, Chauhan, & Rajkiran, 2009; Vinaykumar, Ravi, Carr, & Rajkiran, 2008; Vinaykumar, Ravi, & Carr, 2009), we employ, for the first time, a host of machine learning methods to predict the quality of a web service based on a set of attributes. The rest of the paper is organized as follows: Section 2 describes, in detail, the quality-related issues in web services; Section 3 presents the methodology followed in this paper; Section 4 presents the results and discussion; and Section 5 concludes the paper.

2. Quality issues in web services

QoS plays an important role in determining the performance of web services. QoS has long been used in networking and multimedia applications, and recently there has been a trend towards adopting the concept for web services (Vinoski, 2003). The basic aim is to identify the QoS attributes (Bochmann, Kerherve, Lutffiyya, Salem, & Ye, 2001; Mani & Nagarajan, 2002; Ran, 2003; Zeng, Benatallah, Dumas, Kalagnanam, & Sheng, 2003) for improving the quality of web services through replication services (Bochmann et al., 2001), load distribution (Conti, Gregori, & Panzieri, 1999), and service redirection (Ardaiz, Freitag, & Navarro, 2001). To measure the QoS of a web service, attributes such as Response Time, Throughput, Availability, Reliability, and Cost are considered.
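As an illustration, QoS attributes like those just listed can be carried as a typed record and min-max normalised before classification (the paper normalises the data in Section 4). The attribute values below are made up for the sketch, not taken from the QWS dataset.

```python
# A record for a handful of QoS attributes and the min-max scaling
# applied before classification. Values are illustrative only.
from dataclasses import dataclass

@dataclass
class QoS:
    response_time: float  # ms
    throughput: float     # invocations/s
    availability: float   # %
    reliability: float    # %
    cost: float           # price per invocation (hypothetical unit)

def min_max(values):
    """Scale a list of raw attribute values into [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) if hi > lo else 0.0 for v in values]

services = (
    QoS(110.0, 7.1, 89.0, 73.0, 0.0),
    QoS(302.8, 1.3, 85.0, 67.0, 2.5),
    QoS(64.2, 12.0, 96.0, 83.0, 1.0),
)
response_times = [s.response_time for s in services]
print(min_max(response_times))  # fastest service maps to 0.0, slowest to 1.0
```

The same scaling would be applied column-wise to every numeric attribute before feeding the classifiers.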


2.1. QoS attributes

According to Kalepu et al. (2004), quality of service (QoS) is a combination of several qualities or properties of a service, such as: (i) Availability, (ii) Reliability, (iii) Price, (iv) Throughput, (v) Response Time, (vi) Latency, (vii) Performance, (viii) Security, (ix) Regulatory, (x) Accessibility, (xi) Robustness/Flexibility, (xii) Accuracy, (xiii) Servability, (xiv) Integrity and (xv) Reputation. QoS parameters determine the performance of web services and help identify which web services best meet a user's requirements. Users of web services are not human beings but programs that send requests for services to web service providers. QoS issues in web services therefore have to be evaluated both from the perspective of the providers of web services (such as an airline-booking web service) and from the perspective of the users of these services (in this case, the travel agent site) (Araban & Sterling, 2004). Other models related to quality issues in web services are available. The QoS model of Araban and Sterling (2004), shown in Table 1, classifies QoS attributes into internal attributes, which are independent of the service environment, and external attributes, which depend on it. The attributes of this model are very similar to the attributes of the QWS dataset used in this paper.

3. Methodology

The proposed approach proceeds in three major steps, viz. feature selection, classification and rule-base generation (from CART and J48). We employed the classifiers BPNN, PNN, GMDH, CART, J48, TreeNet and SVM for classifying web services using the quality of web service (QWS) dataset (Al-Masri & Mahmoud, 2008; http://www.uoguelph.ca/~qmahmoud/qws/index.html). Al-Masri and Mahmoud (2008) created the dataset by observing real web service implementations; it was collected using the Web Service Crawler Engine (WSCE). The majority of the web services were obtained from public sources on the web, including UDDI registries, search engines and service portals. The dataset consists of 364 web services, each with a set of 10 QWS attributes.

3.1. Description of QWS dataset

The QWS dataset consists of rows of web service implementations and their attributes, as presented in Table 2 (http://www.uoguelph.ca/~qmahmoud/qws/index.html). Attributes X1 to X10 are used as explanatory variables and attribute X11 as the target variable; attributes X12 and X13 are ignored as they do not contribute to the analysis. The web services in the QWS dataset are classified into four categories: (i) Platinum (high quality), (ii) Gold, (iii) Silver and (iv) Bronze (low quality). The classification is based on the overall quality rating provided by WSRF, which assigns each web service to a particular category. The functionality of the web services can also help to differentiate between various services (Al-Masri & Mahmoud, 2008).

3.2. Overview of intelligent techniques employed

3.2.1. PNN
The PNN (Specht, 1990) employs Bayesian decision-making theory based on an estimate of the probability density in the data space. The PNN works for problems with integer outputs and can be used for classification problems. There are two strategies to calibrate the smoothing factor of the PNN: a faster iterative approach that optimizes a single smoothing factor common to all input variables, and a slower GA-based approach that optimizes individual smoothing factors for each input variable. The first strategy is appropriate when all inputs are equally important; the second is chosen when inputs differ in their impact on class prediction. The PNN as implemented here has 10 neurons in the input layer, corresponding to the 10 input variables in the dataset. The pattern layer stores the entire set of training patterns, one per pattern neuron. The summation layer has two neurons, one catering to the numerator and the other to the denominator of the

Table 1. QoS model of web services (Araban & Sterling, 2004).

QoS factor  | Internal attributes (metrics)          | External attributes (metrics)
Reliability | Correctness (accuracy and precision)   | Availability and consistency
Performance | Efficiency (time and space complexity) | Load management (throughput, waiting and response time)
Integrity   | –                                      | Security
Usability   | Input and output attributes            | –


Table 2. QWS dataset attributes and their descriptions.

ID  | Attribute name         | Description                                                            | Units
X1  | Response Time          | Time taken to send a request and receive a response                    | ms
X2  | Availability           | Number of successful invocations/total invocations                     | %
X3  | Throughput             | Total number of invocations for a given period of time                 | Invokes/s
X4  | Successability         | Number of response messages/number of request messages                 | %
X5  | Reliability            | Ratio of the number of error messages to total messages                | %
X6  | Compliance             | The extent to which a WSDL document follows the WSDL specification     | %
X7  | Best Practices         | The extent to which a web service follows best practices               | %
X8  | Latency                | Time taken for the server to process a given request                   | ms
X9  | Documentation          | Measure of documentation (i.e. description tags) in WSDL               | %
X10 | WSRF                   | Web service relevance function: a rank for web service quality         | %
X11 | Service classification | Levels representing service offering qualities (1 through 4)           | Classifier
X12 | Service name           | Name of the web service                                                | None
X13 | WSDL address           | Location of the web service definition language (WSDL) file on the web | None
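As a sketch of the evaluation protocol: with attributes X1-X10 as inputs and X11 (service classification) as the target, a 10-fold cross-validation run could look like the following. The data here is synthetic (the real QWS dataset must be obtained separately), and scikit-learn's decision tree stands in for the commercial and Weka tools used in the paper.

```python
# 10-fold cross-validation over a QWS-shaped dataset: 364 services,
# 10 explanatory attributes, 4 quality classes. Synthetic data only.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.random((364, 10))                    # 10 QoS attributes scaled to [0, 1]
# Make the class depend on one column, mimicking WSRF's dominance:
y = np.digitize(X[:, 9], [0.25, 0.5, 0.75])  # 4 quality levels, 0..3

scores = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=10)
print(f"mean 10-fold accuracy: {scores.mean():.4f}")
```

Swapping `DecisionTreeClassifier` for any other scikit-learn classifier reproduces the per-model comparison reported in Section 4.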

non-parametric regression estimate. Finally, the output layer has four neurons indicating the class code of the pattern.

3.2.2. Back propagation neural networks (BPNN)
BPNNs are among the most common neural network structures; they are simple and effective and have found a home in a wide assortment of machine learning applications. A BPNN is a network of nodes arranged in three layers: input, hidden and output. The input and output layers buffer the model's input and output, respectively, while the hidden layer provides a means for input relations to be represented in the output. Before any data is passed to the network, the node weights are random, which makes the network much like a newborn's brain: developed but without knowledge. BPNNs are feed-forward neural networks trained with the standard back propagation algorithm. They are supervised networks, so they require a desired response to be trained; they learn how to transform input data into a desired response and are therefore widely used for pattern classification and prediction. A multi-layer perceptron is made up of several layers of neurons, each fully connected to the next. With one or two hidden layers, such networks can approximate virtually any input-output map and have been shown to yield accurate predictions in difficult problems (Rumelhart, Hinton, & Williams, 1986, chap. 8).

3.2.3. GMDH
The group method of data handling (GMDH), or polynomial net, is a powerful network architecture (Ivakhnenko, 1968). The GMDH network is not like regular feed-forward networks and was not originally represented as a network. It is implemented with polynomial terms in the links and a genetic component that decides how many layers are built. The result of training at the output layer can be represented as a polynomial function of all or some of the inputs. The main idea behind GMDH is to build a polynomial model whose predicted output is as close as possible to the actual output (http://www.inf.kiev.ua/GMDH-home).

3.2.4. Support vector machines (SVM)
The SVM is a powerful learning algorithm based on advances in statistical learning theory (Vapnik, 1998). SVMs are learning systems that use a hypothesis space of linear functions in a high-dimensional space, trained with a learning algorithm from optimization theory that implements a learning bias derived from statistical learning theory (Cristianini & Shawe-Taylor, 2000). SVMs have become one of the most popular tools for machine learning and data mining and can perform both classification and regression. An SVM uses a linear model to implement non-linear class

boundaries by mapping input vectors non-linearly into a high-dimensional feature space using kernels. The training examples closest to the maximum-margin hyperplane are called support vectors; all other training examples are irrelevant for defining the binary class boundaries. The support vectors are then used to construct an optimal linear separating hyperplane (in pattern recognition) or a linear regression function (in regression) in this feature space, and they are conventionally determined by solving a quadratic programming (QP) problem. SVMs have the following advantages: (i) they generalize well even when trained with a small number of examples, and (ii) they do not assume prior knowledge of the probability distribution of the underlying dataset. The SVM is simple enough to be analyzed mathematically; in fact, it may serve as a sound alternative combining the advantages of conventional statistical methods, which are more theory-driven and easy to analyze, and machine learning methods, which are more data-driven, distribution-free and robust. Recently, SVMs have been used in financial applications such as credit rating, time series prediction and insurance claim fraud detection (Vinaykumar et al., 2008).
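The support-vector idea above can be seen on a toy two-class problem. scikit-learn's `SVC` is a stand-in for the SVM tool used in the paper, and the data and kernel choice here are illustrative only.

```python
# A support vector classifier on six hand-placed points. After fitting,
# support_vectors_ holds the training points that define the boundary.
import numpy as np
from sklearn.svm import SVC

X = np.array([[0.1, 0.2], [0.2, 0.1], [0.15, 0.3],   # class 0
              [0.8, 0.9], [0.9, 0.8], [0.85, 0.7]])  # class 1
y = np.array([0, 0, 0, 1, 1, 1])

clf = SVC(kernel="rbf", C=10.0).fit(X, y)
print("support vectors per class:", clf.n_support_)
print("predictions:", clf.predict([[0.2, 0.2], [0.8, 0.8]]))
```

With well-separated classes like these, points far from the margin contribute nothing to the decision function, exactly as described in the text.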

3.2.5. J48 (Weka)
Decision trees represent a supervised approach to classification. A decision tree is a simple structure in which non-terminal nodes represent tests on one or more attributes and terminal nodes reflect decision outcomes. Quinlan (1987) popularized the decision tree approach with research spanning more than 15 years; the latest public-domain implementation of Quinlan's model is C4.5, and the Weka classifier available in KNIME (http://www.knime.org/download.html) has its own version of C4.5 known as J48. The general approach can be summarized as follows: (i) choose the attribute that best differentiates the output attribute values; (ii) create a separate tree branch for each value of the chosen attribute; (iii) divide the instances into subgroups that reflect the attribute values of the chosen node. For each subgroup, terminate the attribute selection process if (a) all members of the subgroup have the same value for the output attribute, in which case the branch on the current path is labeled with that value, or (b) the subgroup contains a single node or no further distinguishing attributes can be determined. For each subgroup created in (iii) that has not been labeled as terminal, repeat the above process. The algorithm is applied to the training data, and the resulting decision tree is tested on a test dataset if one is available; otherwise, J48 performs a cross-validation using the training data. The confusion matrix is a square matrix that compactly shows the classifications and misclassifications made by the model. The columns of the matrix correspond to the number of


instances classified as a particular value, and the rows correspond to the number of instances with that actual classification.

3.2.6. TreeNet
Friedman (1999) introduced TreeNet. It makes use of a concept of "ultra-slow learning", in which layers of information are gradually peeled off to reveal structure in the data. TreeNet models comprise hundreds of small trees, each of which contributes just a tiny adjustment to the overall model. TreeNet is insensitive to data errors and needs no time-consuming data preprocessing or imputation of missing values; it is resistant to overtraining and is faster than a neural net. The TreeNet implementation available at http://salford-systems.com/ is used in this paper.

3.2.7. CART
Decision trees form an integral part of machine learning, an important sub-discipline of artificial intelligence. Almost all decision tree algorithms are used for solving classification problems; algorithms like CART solve regression problems as well. Decision tree algorithms induce a binary tree on a given training dataset, resulting in a set of 'if-then' rules that can be used to solve the classification or regression problem. CART (http://www.salford-systems.com) is a robust, easy-to-use decision tree tool that automatically sifts large, complex databases, searching for and isolating significant patterns and relationships. CART uses recursive partitioning, a combination of exhaustive search and intensive testing techniques, to identify useful tree structures in the data. The discovered knowledge is then used to generate a decision tree, resulting in reliable, easy-to-grasp predictive models in the form of 'if-then' rules. CART is powerful because it can deal with incomplete data and with multiple types of features (floats, enumerated sets), both as input features and predicted features, and the trees it produces contain rules that are humanly readable. A decision tree contains a binary question (with a yes/no answer) about some feature at each node, and the leaves of the tree contain the best prediction based on the training data. Decision lists are a reduced form in which an answer to each question leads directly to a leaf node. A tree's leaf node may be a single member of some class, a probability density function (over some discrete class), a predicted mean value for a continuous feature, or a Gaussian (mean and standard deviation for a continuous value). The key elements of a CART analysis are a set of rules for: (i) splitting each node in a tree; (ii) deciding when a tree is complete; and (iii) assigning each terminal node to a class outcome (or predicted value, for regression).

4. Results and discussion

The data is first normalized, and 10-fold cross-validation is performed throughout the study. We first discuss the results of all the classifiers without feature selection, presented in Table 3; the parameters of all the classifiers are presented in Table 4. J48 yielded a spectacular average accuracy of 99.72%, as did TreeNet. CART produced an average accuracy of 98.61%, PNN 98.71% and GMDH 98.32%, while BPNN produced an average accuracy of 97.22%. SVM fared badly, producing an average accuracy of 63.61%. Further, we used BPNN, PNN, CART and TreeNet separately for feature selection. We observed that, for a given technique, different folds gave rise to different sets of attributes as the most important features. To arrive at a unified feature subset for a given technique, we devised a frequency-based approach wherein we selected the attributes according to the frequency of their occurrence across the different folds. The feature subsets so obtained are

presented in Table 5. It may be observed from Table 5 that these techniques selected different feature subsets as the most important. To resolve this conflict and arrive at a unified subset of important features, we employed the ensemble feature selection technique used earlier by Ravi, Shalom, and Manickavel (2004). In this method, we used the frequency-based approach again; in other words, we selected the attributes according to the frequency of their occurrence across the different techniques. Accordingly, WSRF, Reliability, Successability, Throughput and Best Practices were selected as the most important features by this hybrid method. The feature subset selection is depicted in Fig. 1. We then conducted further investigations on the dataset in order to produce a 'rule-based' expert system. This is where the rules extracted by CART and J48, presented in Tables 6 and 7 respectively, come in handy. A close look at the rules induced by both CART and J48 indicates that WSRF dominates the entire scene; the rules do not depend on any other attribute. Further, since the attributes Response Time, Availability, Compliance, Latency and Documentation were not selected by the ensemble feature subset selection method described above, they

Table 3. Average results over 10 folds without feature selection.

Classifier | Accuracy (%)
PNN        | 98.71
BPNN       | 97.22
GMDH       | 98.32
J48        | 99.72
TreeNet    | 99.72
CART       | 98.61
SVM        | 63.61

Table 4. Parameter settings for the different classifiers.

# | Classifier | Parameter values
1 | PNN        | HN: 328; SF: [0, 1]; SmF: 0.235
2 | BPNN       | HN: 10; SF: [0, 1]; M: 0.1; WF: 0.3
3 | GMDH       | SF: [0, 1]
4 | J48 (ID3)  | CF: 0.25; MnO: 2; S: 4
5 | TreeNet    | Default values
6 | CART       | Default values
7 | SVM        | CC: 10; Bias: 0.0; Power: 1.0; O penalty: 1.0

CLs: classifiers, HN: hidden neurons, SF: scale function, M: momentum, WF: weight factor, SmF: smoothing factor, CF: confidence factor, MnO: minimum number of objects, S: seed, CC: class column, O: overlapping, and P: polynomial.
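As a rough analogue of the BPNN row in Table 4, the same hyperparameters can be expressed with scikit-learn's `MLPClassifier`. The paper used different tools, and the learning rate, iteration budget and training data below are assumptions for the sketch, not values from the paper.

```python
# A BPNN-like multi-layer perceptron: 10 hidden neurons, SGD with
# momentum 0.1, inputs scaled to [0, 1]. Data is synthetic.
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X = rng.random((300, 10))                        # inputs already in [0, 1] (SF)
y = (X[:, 0] + X[:, 1] > 1.0).astype(int)        # simple separable target

bpnn = MLPClassifier(hidden_layer_sizes=(10,),   # HN: 10 hidden neurons
                     solver="sgd",
                     momentum=0.1,               # M: momentum, per Table 4
                     learning_rate_init=0.1,     # assumed, not from the paper
                     max_iter=2000,
                     random_state=0)
bpnn.fit(X, y)
print(f"training accuracy: {bpnn.score(X, y):.3f}")
```

The weight factor (WF) of Table 4 has no direct `MLPClassifier` counterpart, so it is omitted here.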

Table 5. Feature subsets selected by the different classifiers.

Serial no. | Algorithm | Selected features
1          | BPNN      | WSRF (X10), Reliability (X5), Best Practices (X7), Successability (X4), and Throughput (X3)
2          | PNN       | WSRF (X10), Throughput (X3), Best Practices (X7), Reliability (X5), and Successability (X4)
3          | TreeNet   | WSRF (X10), Successability (X4), Latency (X8), Reliability (X5), and Compliance (X6)
4          | CART      | WSRF (X10), Reliability (X5), Throughput (X3), Response Time (X1), and Availability (X2)

Features selected by the ensemble feature subset selection method: WSRF (X10), Reliability (X5), Throughput (X3), Successability (X4), and Best Practices (X7).
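The frequency-based ensemble selection can be sketched directly from Table 5: count how often each attribute appears across the per-technique subsets and keep the five most frequent.

```python
# Ensemble feature subset selection by vote counting over Table 5.
from collections import Counter

subsets = {
    "BPNN":    ["WSRF", "Reliability", "Best Practices", "Successability", "Throughput"],
    "PNN":     ["WSRF", "Throughput", "Best Practices", "Reliability", "Successability"],
    "TreeNet": ["WSRF", "Successability", "Latency", "Reliability", "Compliance"],
    "CART":    ["WSRF", "Reliability", "Throughput", "Response Time", "Availability"],
}

votes = Counter(f for features in subsets.values() for f in features)
ensemble = [f for f, _ in votes.most_common(5)]
print(ensemble)
# → ['WSRF', 'Reliability', 'Successability', 'Throughput', 'Best Practices']
```

This reproduces exactly the five attributes the paper reports as the unified subset.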


are removed from the original dataset, and the classifiers are invoked again on the reduced dataset with the most important attributes, viz. WSRF, Reliability, Successability, Throughput and Best Practices. The results obtained in this case are presented in Table 8. From Table 8, we observe that GMDH and J48 achieved 100% average accuracy, whereas BPNN, TreeNet and CART yielded 99.72% average accuracy. This result confirms the validity of our ensemble feature subset selection method; we infer that the left-out attributes have nothing to contribute to the average accuracies.

The rules show the dominance of WSRF. We also recall that WSRF is a function of Response Time (RT), Throughput (TP), Availability (AV), Accessibility (AC), Interoperability Analysis (IA) and Cost of service (C), according to Al-Masri and Mahmoud (2007). Because Response Time, Throughput and Availability are used alongside WSRF, this would trigger multicollinearity, which would eventually affect the predictions badly. Therefore, we removed the all-important WSRF attribute from the dataset and repeated the experiments with all techniques, using the first nine attributes in Table 2 and invoking all the classifiers again. The results of this experiment without WSRF are presented in Table 9. These results indicate that PNN, with 89.99% average accuracy, and GMDH, with 89.75%, outperformed all other classifiers. In the process, we infer that we lost about 10-11% average accuracy owing to the removal of WSRF. We then conducted feature selection again on the nine attributes (see Fig. 2, which presents the feature selection procedure with nine attributes and without WSRF); the feature subsets selected by the different methods are presented in Table 10. In this case also, we followed the same frequency-based approach for selecting feature subsets across the different folds for a given technique, together with the ensemble feature subset selection method mentioned above. Consequently, we selected Reliability, Throughput, Successability, Response Time and Documentation as the most important features. Taking these selected variables, we invoked the classifiers again; the average results of 10-fold cross-validation without WSRF are presented in Table 11. In

Fig. 1. Feature selection from different classifiers; experiments are carried out on the feature-selected variables.

Fig. 2. Feature selection from different classifiers; experiments are carried out on the feature-selected variables, without WSRF.

Table 6. Rules extracted by CART.

Rule # | Conditions                               | Class
1      | If WSRF ≤ 0.435714                       | Bronze
2      | If WSRF > 0.435714 and WSRF ≤ 0.578571   | Silver
3      | If WSRF > 0.578571 and WSRF ≤ 0.721429   | Gold
4      | If WSRF > 0.721429                       | Platinum

Table 7. Rules extracted by J48.

Rule # | Conditions                               | Class
1      | If WSRF ≤ 0.428571                       | Bronze
2      | If WSRF > 0.428571 and WSRF ≤ 0.571429   | Silver
3      | If WSRF > 0.571429 and WSRF ≤ 0.714286   | Gold
4      | If WSRF > 0.714286                       | Platinum

Table 8. Average results over 10 folds after feature selection, with WSRF.

Classifier | Accuracy (%)
PNN        | 97.22
BPNN       | 99.72
GMDH       | 100
J48        | 100
TreeNet    | 99.72
CART       | 99.72
SVM        | 63.33

Table 9. Accuracies of classifiers after removing WSRF.

Classifier | Accuracy (%)
PNN        | 89.99
BPNN       | 86.38
GMDH       | 89.75
J48        | 67.77
TreeNet    | 82.44
CART       | 78.61
SVM        | 60.55

Table 10. Feature subsets selected by the different classifiers after removing WSRF.

Serial no. | Algorithm | Selected features
1          | BPNN      | Reliability (X5), Successability (X4), Throughput (X3), Documentation (X9), and Response Time (X1)
2          | PNN       | Reliability (X5), Throughput (X3), Availability (X2), Documentation (X9), and Compliance (X6)
3          | CART      | Reliability (X5), Successability (X4), Throughput (X3), Availability (X2), and Response Time (X1)
4          | TreeNet   | Reliability (X5), Documentation (X9), Throughput (X3), Successability (X4), and Response Time (X1)

Features selected by the ensemble feature subset selection method: Reliability (X5), Throughput (X3), Successability (X4), Response Time (X1), and Documentation (X9).

Table 11. Accuracies after removing WSRF and performing feature selection.

Serial no. | Classifier | Accuracy (%)
1          | BPNN       | 86.11
2          | J48        | 67.77
3          | CART       | 73.61
4          | PNN        | 88.31
5          | GMDH       | 86.66
6          | TreeNet    | 79.44
7          | SVM        | 60.55
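The single-attribute rules of Tables 6 and 7 make the 'expert system' claim concrete: the CART cut-points translate into a few lines of code (thresholds taken from Table 6; WSRF is assumed normalised to [0, 1]).

```python
# The CART rule set of Table 6 as a four-way threshold classifier.
def classify_by_wsrf(wsrf: float) -> str:
    """Classify a web service from its normalised WSRF value."""
    if wsrf <= 0.435714:
        return "Bronze"
    if wsrf <= 0.578571:
        return "Silver"
    if wsrf <= 0.721429:
        return "Gold"
    return "Platinum"

print([classify_by_wsrf(v) for v in (0.3, 0.5, 0.65, 0.9)])
# → ['Bronze', 'Silver', 'Gold', 'Platinum']
```

That the whole model collapses to one attribute is precisely the multicollinearity concern that motivates rerunning the experiments without WSRF.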


R. Mohanty et al. / Expert Systems with Applications 37 (2010) 5484–5490 Table 12 The rules (CART) for the 10-fold cross-validation using reduced feature without WSRF. Rule no.

Rule antecedents

Classification

Rule no. | Condition | Class

1. If RT ≤ 0.004246 and TP ≤ 0.37585 and DOC ≤ 0.890625 and SUCS ≤ 0.211957 then Bronze
2. If RT ≤ 0.004246 and TP ≤ 0.37585 and SUCS > 0.211957 and SUCS ≤ 0.798913 and DOC ≤ 0.21875 and R ≤ 0.633987 then Bronze
3. If RT ≤ 0.004246 and TP ≤ 0.37585 and SUCS > 0.211957 and SUCS ≤ 0.798913 and DOC ≤ 0.21875 and R > 0.633987 then Silver
4. If TP ≤ 0.37585 and SUCS > 0.211957 and SUCS ≤ 0.798913 and DOC > 0.21875 and DOC ≤ 0.890625 and RT ≤ 0.0020175 then Gold
5. If TP ≤ 0.37585 and SUCS > 0.211957 and SUCS ≤ 0.798913 and RT > 0.0020175 and RT ≤ 0.004246 and DOC > 0.21875 and DOC ≤ 0.34375 then Gold
6. If TP ≤ 0.37585 and SUCS > 0.211957 and SUCS ≤ 0.798913 and RT > 0.0020175 and RT ≤ 0.004246 and DOC > 0.34375 and DOC ≤ 0.890625 then Silver
7. If RT ≤ 0.004246 and TP ≤ 0.37585 and SUCS ≤ 0.798913 and DOC > 0.890625 then Gold
8. If RT ≤ 0.004246 and TP ≤ 0.37585 and SUCS > 0.798913 and SUCS ≤ 0.88587 and R ≤ 0.750545 then Gold
9. If RT ≤ 0.004246 and TP ≤ 0.37585 and SUCS > 0.798913 and SUCS ≤ 0.88587 and R > 0.750545 then Platinum
10. If RT ≤ 0.004246 and TP ≤ 0.37585 and SUCS > 0.88587 then Gold
11. If DOC ≤ 0.223958 and TP > 0.37585 and TP ≤ 0.666667 and RT ≤ 0.0030905 then Gold
12. If DOC ≤ 0.223958 and TP > 0.37585 and TP ≤ 0.666667 and RT > 0.0030905 and RT ≤ 0.004246 then Silver
13. If RT ≤ 0.004246 and DOC ≤ 0.223958 and TP > 0.666667 then Platinum
14. If RT ≤ 0.004246 and TP > 0.37585 and DOC > 0.223958 then Platinum
15. If RT > 0.004246 and SUCS ≤ 0.608696 and R ≤ 0.648148 then Bronze
16. If RT > 0.004246 and R > 0.648148 and DOC ≤ 0.416667 and TP ≤ 0.406463 and SUCS ≤ 0.478261 then Bronze
17. If RT > 0.004246 and R > 0.648148 and DOC ≤ 0.416667 and TP ≤ 0.406463 and SUCS > 0.478261 and SUCS ≤ 0.608696 then Silver
18. If RT > 0.004246 and SUCS ≤ 0.608696 and R > 0.648148 and DOC ≤ 0.416667 and TP > 0.406463 then Silver
19. If SUCS ≤ 0.608696 and R > 0.648148 and DOC > 0.416667 and RT > 0.004246 and RT ≤ 0.0159775 then Gold
20. If SUCS ≤ 0.608696 and R > 0.648148 and DOC > 0.416667 and RT > 0.0159775 then Silver
21. If TP ≤ 0.522108 and DOC ≤ 0.421875 and R ≤ 0.594226 and RT > 0.004246 and RT ≤ 0.0112355 and SUCS > 0.608696 and SUCS ≤ 0.766304 then Bronze
22. If TP ≤ 0.522108 and DOC ≤ 0.421875 and R ≤ 0.594226 and RT > 0.004246 and RT ≤ 0.0112355 and SUCS > 0.766304 then Silver
23. If SUCS > 0.608696 and TP ≤ 0.522108 and DOC ≤ 0.421875 and R ≤ 0.594226 and RT > 0.0112355 then Bronze
24. If RT > 0.004246 and SUCS > 0.608696 and TP ≤ 0.522108 and DOC ≤ 0.421875 and R > 0.594226 then Silver
25. If RT > 0.004246 and SUCS > 0.608696 and TP ≤ 0.522108 and DOC > 0.421875 and R ≤ 0.526688 then Silver
26. If SUCS > 0.608696 and TP ≤ 0.522108 and DOC > 0.421875 and R > 0.526688 and RT > 0.004246 and RT ≤ 0.0084345 then Gold
27. If SUCS > 0.608696 and DOC > 0.421875 and RT > 0.0084345 and R > 0.526688 and R ≤ 0.651416 and TP ≤ 0.185374 then Silver
28. If SUCS > 0.608696 and DOC > 0.421875 and RT > 0.0084345 and R > 0.526688 and R ≤ 0.651416 and TP > 0.185374 and TP ≤ 0.522108 then Gold
29. If SUCS > 0.608696 and TP ≤ 0.522108 and DOC > 0.421875 and RT > 0.0084345 and R > 0.651416 then Gold
30. If RT > 0.004246 and SUCS > 0.608696 and TP > 0.522108 then Gold

Response Time – RT, Availability – AV, Throughput – TP, Successability – SUCS, Reliability – R, and Documentation – DOC.
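Because the CART rules are mutually exclusive threshold tests, they translate directly into an expert system. The following is a minimal sketch that encodes a handful of the rules from Table 12 (rules 1–3, 15 and 30) as a Python function; the function name and argument order are ours, the thresholds are copied from the table, and attribute values are assumed to be normalized as in the dataset.

```python
def classify_web_service(rt, tp, sucs, r, doc):
    """Classify a web service into a QWS quality class using a few of the
    CART rules from Table 12.

    Arguments: rt (Response Time), tp (Throughput), sucs (Successability),
    r (Reliability), doc (Documentation), all normalized attribute values.
    Only rules 1-3, 15 and 30 are encoded here as an illustration; a full
    expert system would cover all 30 rules."""
    if rt <= 0.004246 and tp <= 0.37585:
        if doc <= 0.890625 and sucs <= 0.211957:
            return "Bronze"                                    # rule 1
        if 0.211957 < sucs <= 0.798913 and doc <= 0.21875:
            return "Bronze" if r <= 0.633987 else "Silver"     # rules 2-3
    if rt > 0.004246:
        if sucs <= 0.608696 and r <= 0.648148:
            return "Bronze"                                    # rule 15
        if sucs > 0.608696 and tp > 0.522108:
            return "Gold"                                      # rule 30
    return None  # remaining rules omitted in this sketch
```

For example, a service with rt = 0.003, tp = 0.2, sucs = 0.1, r = 0.5 and doc = 0.5 falls under rule 1 and is classified as Bronze.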

this case also, PNN with 88.25% outperformed every other technique. Moreover, there is not much loss in accuracy even after feature selection. The average accuracies (in percentages) obtained by the seven techniques, viz., BPNN, J48, CART, PNN, GMDH, TreeNet and SVM, are 86.38, 67.77, 78.61, 89.99, 82.44, 89.75 and 60.55 without feature selection, and 86.11, 67.77, 73.61, 88.31, 86.66, 79.44 and 60.55 with feature selection, respectively (see Tables 9 and 11). The rules extracted by CART on the QWS dataset without WSRF and after feature selection are presented in Table 12. It shows that once WSRF is removed from the analysis, all variables play an equal role; the average number of rules extracted over all folds by CART is 30. Since J48 yielded 39 rules on average with an average accuracy of 67.77%, which is less than that of CART, we do not present the rules obtained by J48.
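The 10-fold cross-validation protocol behind the accuracies quoted above can be sketched in a few lines of pure Python. This is an illustrative stand-in, not the paper's actual pipeline: `k_fold_accuracy` and `majority_classifier` are hypothetical names, and any classifier (BPNN, CART, etc.) could be plugged in via `train_fn`.

```python
import random

def k_fold_accuracy(data, labels, train_fn, k=10, seed=42):
    """Average accuracy over k folds, mirroring the 10-fold
    cross-validation protocol used in the paper.  `train_fn` takes
    (train_x, train_y) and returns a predict(x) callable."""
    idx = list(range(len(data)))
    random.Random(seed).shuffle(idx)          # random partition of the data
    folds = [idx[i::k] for i in range(k)]     # k disjoint test folds
    accs = []
    for fold in folds:
        test = set(fold)
        train_x = [data[i] for i in idx if i not in test]
        train_y = [labels[i] for i in idx if i not in test]
        predict = train_fn(train_x, train_y)  # fit on the other k-1 folds
        correct = sum(predict(data[i]) == labels[i] for i in fold)
        accs.append(correct / len(fold))
    return sum(accs) / k

def majority_classifier(train_x, train_y):
    """Toy stand-in classifier: always predicts the majority class
    of the training fold."""
    major = max(set(train_y), key=train_y.count)
    return lambda x: major
```

With a toy dataset of 70 "Gold" and 30 "Bronze" examples, the majority classifier scores an average accuracy of 0.7 under this protocol, since every training fold keeps "Gold" as the majority class.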

5. Conclusions

With the proliferation of web services, quality of service (QoS) has become a key factor in differentiating web services and their providers. In selecting a web service, it is important to consider its non-functional properties so as to satisfy the constraints and requirements of users. In this paper, we present web-services quality prediction models that take non-functional properties into account. The experiments are carried out on the QWS dataset by employing different classifiers, viz., BPNN, PNN, GMDH, CART, TreeNet, SVM and J48, with 10-fold cross-validation applied throughout. The average accuracies of the different classifiers are found to be high, and some classifiers

yielded very high accuracies. We developed an ensemble feature subset selection method and applied it both with and without WSRF. Without WSRF, we found that Reliability, Throughput, Successability, Response Time and Documentation are the most important attributes. Finally, we conclude that the models developed in this study can be used to classify a new web service into one of the four predetermined classes.

Acknowledgement

We are grateful to Dr. E. Al-Masri and Dr. Q. H. Mahmoud for providing us the dataset related to web-services classification.

References

AL-Masri, E., & Mahmoud, Q. H. (2007). QoS-based discovery and ranking of web services. In IEEE 16th international conference on computer communications and networks (ICCCN) (pp. 529–534).
AL-Masri, E., & Mahmoud, Q. H. (2008). Investigating web services on the World Wide Web. In 17th international conference on World Wide Web (pp. 795–804). Beijing.
Araban, S., & Sterling, L. (2004). Measuring quality of service for contract aware web-services. In First Australian workshop on engineering service-oriented systems (AWESOS 2004). Melbourne, Australia.
Ardaiz, O., Freitag, F., & Navarro, L. (2001). Improving service time of web clients using server redirection. ACM SIGMETRICS Performance Evaluation Review, 29(2).
Bochmann, G. V., Kerherve, B., Lutffiyya, H., Salem, M. M., & Ye, H. (2001). Introducing QoS to electronic commerce applications (pp. 138–147). Berlin, Heidelberg: Springer-Verlag.
Conti, M., Gregori, E., & Panzieri, F. (1999). Load distribution among replicated web services: A QoS-based approach. In Second workshop on internet server performance. Atlanta (GA): ACM Press.
Cristianini, N., & Shawe-Taylor, J. (2000). An introduction to support vector machines. Cambridge University Press.


R. Mohanty et al. / Expert Systems with Applications 37 (2010) 5484–5490

Friedman, J. H. (1999). Stochastic gradient boosting. Computational Statistics and Data Analysis, 38(4), 367–378.
Ivakhnenko, A. G. (1968). The GMDH: A rival of stochastic approximation. Soviet Automatic Control, 3, 43.
Kalepu, S., Krishnaswamy, S., & Loke, S. W. (2004). Verity: A QoS metric for selecting web services and providers. In Proceedings of the fourth international conference on web information systems engineering workshops (WISEW'03) (pp. 131–139).
KNIME. Available from: http://www.knime.org/download.html.
Kokash, N. (2005). Web service discovery with implicit QoS filtering. In Proceedings of the IBM PhD student symposium, in conjunction with ICSOC'05 (pp. 61–66). Netherlands.
Mani, A., & Nagarajan, A. (2002). Understanding quality of service for web services. Available from: http://www-106.ibm.com/developerworks/library/ws-quality.html.
Quinlan, J. R. (1987). Decision trees as probabilistic classifiers. In Proceedings of the fourth international workshop on machine learning (pp. 31–37). Irvine, CA.
QWS Dataset. (xxxx). Available from: http://www.uoguelph.ca/~qmahmoud/qws/index.html.
Rajkiran, K., & Ravi, V. (2008). Software reliability prediction by soft computing techniques. Journal of Systems and Software, 81, 576–583.
Ran, S. (2003). A model for web services discovery with QoS. ACM SIGecom Exchanges, 4(1).
Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1986). Learning internal representations by error propagation. Cambridge, MA: MIT Press.
Vapnik, V. (1998). Statistical learning theory. Adaptive and learning systems. John Wiley and Sons.

Ravi, V., Shalom, S. A. A., & Manickavel, A. (2004). Sputter process variables prediction via data mining. In Proceedings of the first IEEE conference on cybernetics and intelligent systems (pp. 256–251). Singapore.
Ravi, V., Chauhan, N. J., & Rajkiran, N. (2009). Software reliability prediction using intelligent techniques: Application to operational risk prediction in firms. International Journal of Computational Intelligence and Applications, 8(2), 181–194.
Specht, D. F. (1990). Probabilistic neural networks. Neural Networks, 3, 116–118.
Support Vector Machine. Available from: http://www.svms.org/introduction.html.
TreeNet, CART – Salford Systems Inc. Available from: http://www.salford-systems.com.
Tsalgatidou, A., & Pilioura, T. (2002). An overview of standards and related technology in web services. Distributed and Parallel Databases, 12, 135–162.
Vinaykumar, K., Ravi, V., & Carr, M. (2009). Software cost estimation using soft computing approaches. In E. Soria, J. D. Martín, R. Magdalena, M. Martínez, & A. J. Serrano (Eds.), Handbook on machine learning applications and trends: Algorithms, methods and techniques (pp. 499–518). USA: IGI Global.
Vinaykumar, K., Ravi, V., Carr, M., & Rajkiran, N. (2008). Software development cost estimation using wavelet neural network. Journal of Systems and Software, 81(11), 1853–1867.
Vinoski, S. (2003). Service discovery 101. IEEE Internet Computing, 7(1), 69–71.
Zeng, L., Benatallah, B., Dumas, M., Kalagnanam, J., & Sheng, Q. Z. (2003). Quality driven web service composition. In Proceedings of the 12th international conference on World Wide Web (pp. 411–421). Budapest, Hungary: ACM Press.