Redundancy-aware SOAP messages compression and aggregation for enhanced performance


Journal of Network and Computer Applications 35 (2012) 365–381

Contents lists available at SciVerse ScienceDirect

Journal of Network and Computer Applications journal homepage: www.elsevier.com/locate/jnca

Dhiah Al-Shammary *, Ibrahim Khalil

School of Computer Science & IT, RMIT University, Melbourne, Australia

Article info

Article history: Received 22 February 2011; received in revised form 23 July 2011; accepted 16 August 2011; available online 25 August 2011

Keywords: Web services; SOAP; Compression; Aggregation

Abstract

Many organizations around the world have started to adopt Web services as well as server farms and clouds hosted by large enterprises and data centers for various applications. Web services offer several advantages over other communication technologies. However, they have high latency and often suffer from congestion and bottlenecks due to the massive load generated by Web service requests from large numbers of end users. SOAP (Simple Object Access Protocol) is the basic XML-based communication protocol of Web services. XML is a verbose encoding language in comparison with other technologies such as CORBA and RMI. In this paper, two new redundancy-aware SOAP Web message aggregation models, the Two-bit and One-bit XML status trees, are proposed to enable Web servers to aggregate SOAP responses and send them back as one compact aggregated message in order to reduce the required bandwidth and latency and to improve the overall performance of Web services. XML message compressibility, the Jaccard-based clustering technique, and the vector space model are three similarity measurements that are proposed to cluster SOAP messages into groups based on their degree of similarity. The clustering-based similarity measurements enable the aggregation techniques to potentially reduce the required network traffic by minimizing the overall size of the messages. The experiments show significant performance for both aggregation techniques, which achieve compression ratios as high as 25 for aggregated SOAP messages. © 2011 Elsevier Ltd. All rights reserved.

* Corresponding author. E-mail addresses: [email protected], [email protected] (D. Al-Shammary), [email protected] (I. Khalil).
1084-8045/$ - see front matter © 2011 Elsevier Ltd. All rights reserved. doi:10.1016/j.jnca.2011.08.004

1. Introduction

Web services are middleware that provide access to networked resources over the Internet with the support of network mechanisms and protocols such as HTTP and TCP (Rosu, 2007; Komathy et al., 2003; Madiraju et al., 2010). Generally, Web servers provide dynamically scalable services (responses) that are available on demand (requests) over the Internet (Christian Werner and Fischer, 2004; Diamadopoulou et al., 2008; Kuehnhausen and Frost, 2011; Subashini and Kavitha, 2011). SOAP (Simple Object Access Protocol) is the basic communication protocol of most Web services (Nakagawa et al., 2006; Hu et al., 2011). SOAP is based on XML (eXtensible Markup Language), which encodes the contents of Web messages sent and received over the Internet (Rosu, 2007). Recently, the adoption of Web services on server farms and clouds by network organizations has increased significantly, with the aim of providing the required services without investing heavily in computing infrastructure (Hartmut Liefke, 2000; AjayKumar et al., 2009; Bo et al., 2010). Understandably, this has contributed to the growth of Web services over the Internet.

SOAP has been developed to improve the interoperability of Web services (Khoi Anh Phan and Bertok, 2008; Christian Werner and Fischer, 2004; Chonka et al., 2011; Gu et al., 2005). However, Web services inherit the disadvantages of SOAP, as messages are bigger than the real payload of the requested services (Rosu, 2007; Pastore, 2008; Ruiz-Martinez et al., 2011), which can cause high network traffic. As a result, Web services often suffer from congestion and bottlenecks due to the high number of client Web requests and the large size of Web messages (Nakagawa et al., 2006). This can slow down the performance of Web applications considerably (Christian Werner and Fischer, 2004; Khoi Anh Phan and Bertok, 2008; Hsu et al., 2009; Hu and Cho, 2011).

Several compression techniques (Christian Werner and Fischer, 2004; Rosu, 2007) and textual aggregation models (Khoi Anh Phan and Bertok, 2008) have been developed to reduce the size of the messages. For example, XMill (Hartmut Liefke, 2000) distributes the XML tags into different containers and compresses them using semantic compressors. Differential encoding (Christian Werner and Fischer, 2004) reduces the computational overhead by computing the differences between the current message and the previous one and compressing only those differences. A similarity-based aggregation technique (Khoi Anh Phan and Bertok, 2008) aims to reduce network traffic by combining similar messages and delivering the compact message using a multicast protocol. Although these techniques are to some extent capable of enhancing the performance of Web services, they still suffer from a few technical drawbacks:

• Some are considered to be storage consuming, as they are either based on dictionary-type approaches or depend on log files for the sent/received messages.
• Although both compression and aggregation models have similar objectives and exploit similarity within the message itself (redundancy in compression) or with other messages (aggregation), they have failed to take advantage of each other to achieve higher performance.

An XML binary tree structure based aggregation model by Al-Shammary and Khalil (2010a) was developed with the aim of providing a high compression ratio for aggregated messages. The Two-bit and One-bit compression techniques (Al-Shammary and Khalil, 2010b) are general tree-structure-based models that can significantly compress individual XML messages. In this paper, new Two-bit and One-bit aggregation models are proposed that exploit the redundancies found in SOAP messages to reduce the aggregated message size. Further message size reduction is achieved using compression. The objective of the proposed models is to provide an efficient aggregation that can significantly reduce the size of the messages. The Two-bit and One-bit status XML tree aggregation techniques aim to enable Web servers to aggregate a group of messages that have a certain degree of similarity and send them as one compact message, minimizing the network traffic. Figure 1 shows how the compression and aggregation schemes help reduce the high network traffic created by Web requests/responses. The resultant aggregated messages of SOAP responses are extractable at the routers closest to the receivers (clients), so that only the required response is delivered to each client. An XML-aware compression technique is developed to exploit the redundancy of different SOAP messages, creating one compact message structure for the Web messages.

Fig. 1. Web services of Stock Quote application scenario. [Figure labels: stock quote clients, application servers, saved bandwidth channels, control node, database (storage), Internet.]

Three similarity measurements of SOAP messages are introduced in order to investigate the best similarity-based clustering model, that is, one that can group messages with a significant degree of similarity so that the aggregation techniques can achieve the potential message size reduction. The compressibility measurement, the Jaccard coefficient (Wang and Li, 2009), and the vector space technique (Liu et al., 2010) have been developed in order to cluster SOAP messages based on their similarity. The compressibility measurement investigates the size reduction that can be achieved with pairs of SOAP messages. The Jaccard coefficient and vector space techniques are proposed to group SOAP messages into larger clusters of a predefined size (not only pairs).

Evaluation of the proposed techniques shows promising results and indicates that aggregation techniques can achieve significantly higher compression ratios for similar SOAP messages than compressing them separately. The compression ratios that can be achieved by aggregating clustered messages have been investigated, and the size reduction of the aggregated SOAP message is potentially higher than that of the accumulated size of the separately compressed messages. Aggregation of SOAP messages is computed with clusters varying between two and ten messages per cluster. Vector space model clustering has been shown to be slightly better in supporting the proposed aggregation techniques to reduce the overall size of the aggregated SOAP messages. Furthermore, experiments show that vector space model clustering requires significantly less processing time than the Jaccard-based clustering technique. The proposed Two-bit and One-bit XML status tree aggregation techniques are compared with the Binary Tree based aggregation technique (Al-Shammary and Khalil, 2010a), and both models have shown potentially higher performance in terms of the resultant compression ratios and the processing time required to aggregate the clustered SOAP messages.

The rest of this paper is organized as follows. Section 2 discusses related work. Section 3 explains the compressibility measurements of SOAP messages. Next, Section 4 states the development of the Jaccard-based clustering technique. Then, Section 5 explains the vector space model and its clustering model for grouping SOAP messages based on their cosine similarity degrees. Section 6 shows the structure of the XML tree and Section 7 explains the assigning process of the XML tree. Section 8 describes the encoding and aggregation process of the XML trees. The evaluation of the proposed techniques is presented in Section 9. Finally, Section 10 concludes the paper.

2. Related work

In order to enhance the performance of SOAP Web services, compression of standalone Web messages (Hartmut Liefke, 2000; Christian Werner and Fischer, 2004) and textual aggregation models (Khoi Anh Phan and Bertok, 2008; Al-Shammary and Khalil, 2010a) have been proposed. In these works, compression exploits the self-similarity of SOAP messages in order to reduce the overall message size, while the textual aggregation techniques are mainly based on computing the similar content (tags and data leaves in the XML tree) as a way to aggregate similar messages and minimize the bandwidth required to send/receive Web messages.

Hartmut Liefke (2000) proposed XMill, an XML-aware compression technique based on isolating the data leaves from the other XML tags in the XML tree of the message. It distributes both of the separated parts (i.e. XML tags and data leaf content) into a number of containers, using the correlations among these items in order to remove redundancies and achieve high message size reduction. Finally, the data items of the containers are compressed separately using traditional semantic compressors.

Christian Werner and Fischer (2004) proposed differential encoding for Web requests and responses in order to minimize the compression overhead. The technique computes a skeleton of the previous message and uses it to find and encode the differences between the current message and the previously sent/received messages. The authors also evaluated the performance of some traditional compression techniques and found that gzip achieved higher message size reduction than bzip2, but XMill (ppm) outperformed both.
Khoi Anh Phan and Bertok (2008) proposed an XML-aware aggregation model for SOAP Web messages after computing their similarities using Jaccard and Levenshtein similarity measurements. The aggregated messages are delivered using a multicast protocol in order to avoid sending the SOAP responses separately, which helps minimize the network traffic. The generated compact message includes all the addresses of the clients as strings in the header part. The structure of the aggregated message consists of two parts: the common section, which contains the message structures and the common values of the messages, and the distinctive section, which contains the non-redundant values of these messages. Intermediary routers parse the message header and create groups of client addresses based on the next hop in order to forward only the required message along the next hop.

Al-Shammary and Khalil (2010a) have proposed new SOAP message aggregation concepts that are based on utilizing the compression strategy to exploit the redundancy of multiple SOAP messages as an alternative to the traditional similarity measurements. First, the technique computes the XML trees of the involved messages and then converts them into a binary tree structure using the "first child/first sibling" method. Every tag and data value in the resultant binary trees is then assigned a combination of Two-bit codes. Next, the XML binary trees are transformed into textual expressions based on the assigned bit codes. Finally, the authors proposed two encoding techniques, fixed-length and Huffman, to compress the textual expressions and aggregate them together, removing the overall redundancy of their textual items.

3. Similarity measurements and clustering of SOAP messages

Similarity-based clustering of SOAP messages represents a de facto operation for aggregation approaches: clustering messages with a high level of similarity strengthens the aggregation and results in high size reduction. In this paper, we first introduce compressibility for pairs of SOAP messages as a simple and effective similarity measurement tool that supports the proposed compression-based aggregation technique by computing the compressibility of messages. Jaccard coefficients are well known for computing the similarity of pairs of messages (Khoi Anh Phan and Bertok, 2008), and we next exploit this feature to build a new clustering algorithm with clusters of n messages (where n ≥ 2). Furthermore, another new clustering technique based on the vector space model is proposed as a fixed cluster size technique in order to exploit the highest similarity that can be achieved in a group of messages.

3.1. Compressibility measurements

The compressibility measurement is proposed in this paper as an alternative to the traditional similarities of SOAP messages that considers the redundancy within messages. In effect, the measurement identifies the compressible SOAP Web messages that have common redundancy and can be combined efficiently with the aim of achieving high size reduction. As the proposed aggregation technique is a redundancy-based model, the size of the Web messages is an effective criterion for predicting the potential reduction of the aggregated message size. Hence, the compressibility measurement considers the size of the Web messages as a basic parameter, in addition to the overlapping ratio of the XML tags between messages. Eq. (1) computes the overlapping ratio for a set of SOAP messages $(S_1, S_2, \ldots, S_N)$:

$$Ov(S_1, S_2, \ldots, S_N) = \frac{\sum_{i=1}^{N-1} \sum_{j=i+1}^{N} Sh(S_i, S_j)}{\sum_{i=1}^{N} Tot(S_i)} \tag{1}$$

where

• $Sh(S_i, S_j)$ is the number of common XML tags and data items in both messages $S_i$ and $S_j$.
• $Tot(S_i)$ is the total number of XML tags and data items in message $S_i$.

Eq. (2) computes the overlapping ratio $Ov(S_1, S_2)$ of two SOAP messages $S_1$ and $S_2$:

$$Ov(S_1, S_2) = \frac{Sh(S_1, S_2)}{Tot(S_1) + Tot(S_2)} \tag{2}$$

Eq. (3) computes the overall compressibility measurement $Cm(S_1, S_2)$ of messages $S_1$ and $S_2$:

$$Cm(S_1, S_2) = Ov(S_1, S_2) \times \mathrm{Log}(Tot(S_1, S_2)) \tag{3}$$

where

• $Ov(S_1, S_2)$ is the overlapping ratio between the two messages $S_1$ and $S_2$.
• $Tot(S_1, S_2)$ is the total number of XML items in both messages $S_1$ and $S_2$.

For the SOAP messages S1 and S2 given in Fig. 2, the number of shared common nodes $Sh(S_1, S_2)$ is 14 and the total number of nodes $Tot(S_1) + Tot(S_2)$ in both messages is 36. Therefore, the overlapping ratio can be computed as

$$Ov(S_1, S_2) = \frac{14}{36} \approx 0.389$$

Then, the overall compressibility of messages S1 and S2 can be computed (with a base-10 logarithm) as

$$Cm(S_1, S_2) \approx 0.389 \times \mathrm{Log}(36) \approx 0.389 \times 1.556 \approx 0.61$$
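As a minimal sketch of Eqs. (1)-(3), the Python below recomputes the worked example for the two stock-quote responses. It assumes, beyond what the text states, that each message is flattened into a list of its tags and data items and that Sh counts common items with multiplicity (a multiset intersection), which is the reading that reproduces the 14 shared nodes quoted above. The function names `shared`, `overlap`, `compressibility` and `stock_quote` are illustrative only.

```python
import math
from collections import Counter
from itertools import combinations


def shared(a, b):
    """Sh(Si, Sj): items common to both messages, counted with multiplicity (assumption)."""
    return sum((Counter(a) & Counter(b)).values())


def overlap(messages):
    """Eq. (1): sum of pairwise shared items over the total item count."""
    pairwise = sum(shared(a, b) for a, b in combinations(messages, 2))
    total = sum(len(m) for m in messages)
    return pairwise / total


def compressibility(s1, s2):
    """Eq. (3): overlap ratio scaled by the base-10 log of the combined size."""
    return overlap([s1, s2]) * math.log10(len(s1) + len(s2))


def stock_quote(companies, prices):
    """Flatten a two-company response into its 18 tags and data items (hypothetical helper)."""
    items = ["StockQuoteResponse", "ArrayOfStockQuote"]
    for company, price in zip(companies, prices):
        items += ["StockQuote", "Company", company, "QuoteInfo",
                  "Price", price, "LastUpdated", "06/05/2010"]
    return items


s1 = stock_quote(["IBM", "HP"], ["22.36", "21.54"])
s2 = stock_quote(["NAB", "ANZ"], ["26.47", "19.82"])
print(shared(s1, s2), len(s1) + len(s2))   # shared and total node counts
print(round(compressibility(s1, s2), 2))   # compressibility of the pair
```

For a pair of messages, `overlap` reduces to Eq. (2), so the same function covers both the pairwise and the N-message case.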

S1: A SOAP response to the getStockQuote(IBM, HP) request

<StockQuoteResponse>
  <ArrayOfStockQuote>
    <StockQuote>
      <Company>IBM</Company>
      <QuoteInfo>
        <Price>22.36</Price>
        <LastUpdated>06/05/2010</LastUpdated>
      </QuoteInfo>
    </StockQuote>
    <StockQuote>
      <Company>HP</Company>
      <QuoteInfo>
        <Price>21.54</Price>
        <LastUpdated>06/05/2010</LastUpdated>
      </QuoteInfo>
    </StockQuote>
  </ArrayOfStockQuote>
</StockQuoteResponse>

S2: A SOAP response to the getStockQuote(NAB, ANZ) request

<StockQuoteResponse>
  <ArrayOfStockQuote>
    <StockQuote>
      <Company>NAB</Company>
      <QuoteInfo>
        <Price>26.47</Price>
        <LastUpdated>06/05/2010</LastUpdated>
      </QuoteInfo>
    </StockQuote>
    <StockQuote>
      <Company>ANZ</Company>
      <QuoteInfo>
        <Price>19.82</Price>
        <LastUpdated>06/05/2010</LastUpdated>
      </QuoteInfo>
    </StockQuote>
  </ArrayOfStockQuote>
</StockQuoteResponse>

Fig. 2. SOAP message responses to the request getStockQuote(X, Y).

3.2. Jaccard messages grouping

The Jaccard similarity coefficient is a statistical factor that is commonly used for comparing the similarity and diversity of XML messages (Wang and Li, 2009). The Jaccard coefficient is defined as the size of the intersection of two XML messages divided by the size of the union of the same messages (Chung et al., 2010). Similar messages are determined by computing the similarities of both the non-leaf and the leaf nodes of all the XML trees. For non-leaf nodes, the Jaccard similarity is computed as the ratio of common nodes between two messages:

$$Jc_{temp}(S_1, S_2) = \frac{|N_{nd}(S_1) \cap N_{nd}(S_2)|}{|N_{nd}(S_1) \cup N_{nd}(S_2)|} \tag{4}$$

where

• $N_{nd}(XMLtree)$ is the set of distinctive non-leaf XML nodes; and
• $|X|$ is the cardinality of the set $X$.

The same Jaccard coefficient equation is modified to compute the similarity of the XML messages for leaf nodes only:

$$Jc_{leaf}(S_1, S_2) = \frac{|N_{ch}(S_1) \cap N_{ch}(S_2)|}{|N_{ch}(S_1) \cup N_{ch}(S_2)|} \tag{5}$$

where

• $N_{ch}(XMLtree)$ is the set of distinctive characters of the leaf XML nodes.

The similarity between two service messages $S_1$ and $S_2$ is computed using the following equation:

$$Sim_{Jc}(S_1, S_2) = Jc_{temp}(S_1, S_2) \times N_x + Jc_{leaf}(S_1, S_2) \times N_l \tag{6}$$

where

• $N_x$ is the total number of non-leaf XML nodes; and
• $N_l$ is the total number of leaf XML nodes.

In this paper, the Jaccard similarity measurement is proposed as a simple grouping technique in which XML messages are clustered into equally sized groups based on their Jaccard similarities. Algorithm 1 creates the XML message groups so that the proposed models can aggregate messages according to the resultant Jaccard clustered groups. In this algorithm, the centroids are selected based on the first available point (i.e. unclustered message). Firstly, every message is flagged with a boolean value (initially "true") that marks the point either as still available and waiting to be clustered or as already clustered into one of the generated groups based on its similarity to the nominated centroids. After selecting the first available centroid, the Jaccard similarity is computed with the remaining available points. Then, the clustered points are selected according to their high degree of similarity with the considered centroid.

3.3. Vector space messages grouping

The vector space model is one of the well-known techniques in information retrieval and textual document clustering (Liu et al., 2010). It is based on computing the cosine similarities of documents in order to investigate their degree of similarity (Chen and Song, 2009). The cosine similarity of the vector space model involves computing the item weights of the documents, which reflect their descriptiveness in a statistical way (Liu et al., 2010). In this paper, the vector space model is proposed as a similarity measurement for SOAP messages in order to cluster them into equally sized groups. Eq. (7) measures the similarity between two SOAP messages $S_1$ and $S_2$:

$$Sim_{VS}(S_1, S_2) = \frac{W_{S_1} \cdot W_{S_2}}{\|W_{S_1}\| \, \|W_{S_2}\|} \tag{7}$$

where

• $W_{S_1}$ and $W_{S_2}$ are vectors that include the weight of each XML document item for messages $S_1$ and $S_2$ respectively.
• $\|W_{S_1}\|$ and $\|W_{S_2}\|$ represent the resultant norm values of the weight vectors for messages $S_1$ and $S_2$ respectively.

Therefore, the cosine similarity equation can be described in detail as

$$Sim_{VS}(S_1, S_2) = \frac{\sum_{i=1}^{N} W_{S_1}(i) \times W_{S_2}(i)}{\sqrt{\sum_{i=1}^{N} W_{S_1}(i)^2} \times \sqrt{\sum_{i=1}^{N} W_{S_2}(i)^2}} \tag{8}$$

The vector space model is proposed in order to cluster SOAP messages according to their cosine similarities. Algorithm 2 generates the vector space based SOAP clusters. First, the algorithm determines the centers of the documents: it computes the summation of the frequencies of the weighted XML items for each vector, sorts the vectors in descending order, and then distributes them based on the required group size, using the values of the summations as the key of their distribution. The first XML document of each cluster is assigned as the center point. Then, the rest of the XML documents are grouped into clusters based on their degree of similarity with the document that is being compared. Again, the points (messages) are initially flagged as available using a boolean value (initially "true"), and then the centroids are excluded by flagging them "false". For every centroid, its cosine similarity with all other available points is computed so that the most similar points are clustered with the considered centroid.
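The similarity measures of Eqs. (4)-(7) and the centroid-based grouping described in Section 3.2 can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the authors' implementation: each message is a hypothetical dict with a `tags` list (non-leaf nodes) and a `leaves` list (data values); N_x and N_l are counted over both messages; the vector space weights are plain term frequencies; and each group holds `group_size` messages including its centroid. The grouping follows the "first available message is the centroid" rule of the Jaccard description, not the sorted-centre selection described for Algorithm 2.

```python
import math
from collections import Counter


def jaccard(a, b):
    """|A ∩ B| / |A ∪ B| for two collections treated as sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def sim_jc(m1, m2):
    """Eq. (6): Jaccard of non-leaf tags and of leaf characters, weighted by node counts."""
    jc_temp = jaccard(m1["tags"], m2["tags"])                        # Eq. (4)
    jc_leaf = jaccard("".join(m1["leaves"]), "".join(m2["leaves"]))  # Eq. (5), character sets
    n_x = len(m1["tags"]) + len(m2["tags"])      # assumption: counted over both messages
    n_l = len(m1["leaves"]) + len(m2["leaves"])  # assumption: counted over both messages
    return jc_temp * n_x + jc_leaf * n_l


def sim_vs(m1, m2):
    """Eq. (7): cosine similarity, with term frequencies as weights (assumption)."""
    w1 = Counter(m1["tags"] + m1["leaves"])
    w2 = Counter(m2["tags"] + m2["leaves"])
    dot = sum(w * w2[item] for item, w in w1.items())
    norm1 = math.sqrt(sum(w * w for w in w1.values()))
    norm2 = math.sqrt(sum(w * w for w in w2.values()))
    return dot / (norm1 * norm2) if norm1 and norm2 else 0.0


def cluster(messages, group_size, sim):
    """Greedy grouping: the first unclustered message is the centroid, joined by its nearest peers."""
    available = list(range(len(messages)))
    groups = []
    while available:
        centre = available.pop(0)
        ranked = sorted(available,
                        key=lambda i: sim(messages[centre], messages[i]),
                        reverse=True)
        members = ranked[:group_size - 1]
        for i in members:
            available.remove(i)
        groups.append([centre] + members)
    return groups
```

Calling `cluster(messages, 2, sim_jc)` returns lists of message indices, one list per group; passing `sim_vs` instead swaps in the cosine measure without changing the grouping loop.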

4. XML tree structure and compression

Generally, compression techniques have been used to enhance the performance of Web services by reducing the overall size of SOAP messages sent over the Internet, which minimizes the network traffic. These Web messages can be represented as a tree data structure, which is a motivating factor for this work when designing the proposed redundancy-based aggregation model. The XML tree structure reduces the total number of SOAP message tags by keeping one occurrence in the tree and removing all duplicate closing tags. In this model, Web servers are enabled to aggregate SOAP responses using compression techniques to reduce the aggregated message size efficiently.

A stock market scenario is used as an illustrative example that describes the components of the proposed redundancy-based aggregation model. Generally, stock quote Web systems involve a large number of transactions requesting a variety of information about companies and organizations, such as prices. The SOAP messages shown in Fig. 2 are responses to the operations getStockQuote(IBM, HP) and getStockQuote(NAB, ANZ) respectively. In this paper, the proposed aggregation model generates the compact aggregated message by first building the XML trees, then investigating their compressibility or their similarities based on Jaccard measurements or the vector space model (VSM), and finally encoding them using the XML status tree techniques, which make use of fixed-length or Huffman (variable-length) encoding.

An XML tree is an unmarked tree (Tr) that is a finite set of one or more nodes such that $T_{Root}$ is the general root of the XML tree. The XML tree has two kinds of nodes, simple and complex nodes. Simple nodes represent all leaf nodes in the XML tree together with their parent nodes; the total number of simple nodes is $N_s \geq 0$ $(s_1, s_2, \ldots, s_{N_s})$. The remaining nodes of the same tree are the complex nodes; the total number of complex nodes is $N_c \geq 1$ $(c_1, c_2, \ldots, c_{N_c})$:

$$Tr\{T_{Root}\} = T_{Root},\ c_1, c_2, \ldots, c_{N_c},\ s_1, s_2, \ldots, s_{N_s} \tag{9}$$

$$Tr\{T_{Root}\} = \begin{pmatrix} T_{Root} & \\ c_1 & P_{c_1} \\ c_2 & P_{c_2} \\ \vdots & \vdots \\ c_{N_c} & P_{c_{N_c}} \\ s_1 & P_{s_1} \\ s_2 & P_{s_2} \\ \vdots & \vdots \\ s_{N_s} & P_{s_{N_s}} \end{pmatrix}$$

where $P_{c_i}$ and $P_{s_i}$ denote the parents of complex and simple nodes respectively. Figure 6 shows the resultant generated matrix form for the two SOAP messages S1 and S2.

5. XML tree traversing and assignments

The minimized XML text expression must be generated (using the XML tree) in such a way that the XML tree can be rebuilt in order to regenerate the original SOAP message. In this model, depth-first and breadth-first traversals are proposed to generate the minimized XML text expression by assigning binary codes to all tags, which enables the XML tree to be rebuilt by recognizing the correct position of each tag.

5.1. Two-bit status tree

Depth-first traversal is used in this technique: it traverses all subtrees of every node before visiting the next sibling of that node. Three binary codes are suggested for assigning each node in the XML tree. All the non-leaf nodes are assigned the binary code "0". Moreover, all right-end leaves of every complex node $c_i$ are assigned the binary code "11". Finally, the remaining leaves are assigned the binary code "10". Figure 4 shows the assigned XML trees that result from the XML trees in Fig. 3, which were generated from the SOAP messages in Fig. 2. Eq. (10) represents the general formula of the assigned textual expression of SOAP messages:

$$T_{EXP} = \bigcup_{i=1}^{N_d} B_i \, Tag_i \tag{10}$$

where

• $B_i$ is the binary code value for the considered XML textual item.
• $Tag_i$ is the assigned XML textual item.
• $N_d$ is the total number of both complex and simple XML items $(N_c + N_s)$.

The resultant textual form expression of the Two-bit binary code assigning process for message S1 can be represented as follows: {0StockQuoteResponse 0ArrayOfStockQuote 0StockQuote 0Company 10IBM 0QuoteInfo 0Price 1022.36 0LastUpdated 1106/05/2010 0StockQuote 0Company 10HP 0QuoteInfo 0Price 1021.54 0LastUpdated 1106/05/2010}. Similarly, message S2 can be expressed as: {0StockQuoteResponse 0ArrayOfStockQuote 0StockQuote 0Company 10NAB 0QuoteInfo 0Price 1026.47 0LastUpdated 1106/05/2010 0StockQuote 0Company 10ANZ 0QuoteInfo 0Price 1019.82 0LastUpdated 1106/05/2010}.

5.2. One-bit status tree

In the One-bit status tree technique, breadth-first traversal is proposed, as it traverses all nodes of the XML tree level by level. The last traversed node of every level is assigned "1" and all of the remaining nodes are assigned "0". This strategy enables the decompression algorithm to recognize the end of every level in addition to the overall structure of the XML tree, as the nodes between the end node of the considered level and the recorded parent node are the children of that parent node at that level. Figure 5 shows the One-bit assigned XML trees that result from the XML trees in Fig. 3.

The resultant textual form expression of the One-bit binary code assigning process for message S1 can be represented as follows: {0StockQuoteResponse 1ArrayOfStockQuote 0StockQuote 1StockQuote 0Company 1QuoteInfo 0Company 1QuoteInfo 1IBM 0Price 1LastUpdated 1HP 0Price 1LastUpdated 122.36 106/05/2010 121.54 106/05/2010}. Furthermore, message S2 would be expressed as: {0StockQuoteResponse 1ArrayOfStockQuote 0StockQuote 1StockQuote 0Company 1QuoteInfo 0Company 1QuoteInfo 1NAB 0Price 1LastUpdated 1ANZ 0Price 1LastUpdated 126.47 106/05/2010 119.82 106/05/2010}.
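The two assigning processes can be illustrated with a short Python sketch that regenerates the S1 expressions quoted in Sections 5.1 and 5.2. The tree representation is an assumption made for the example: an element is a `(tag, children)` tuple and a data leaf is a plain string. The coding rules are inferred from the worked expressions rather than taken from a published implementation: in the Two-bit sketch a leaf receives "11" when both it and its parent tag are the last child of their respective parents, and "10" otherwise; in the One-bit sketch the root receives "0" and every other node receives "1" only if it is the last child of its parent.

```python
from collections import deque


def stock_quote(company, price):
    """Hypothetical helper building one StockQuote subtree."""
    return ("StockQuote", [("Company", [company]),
                           ("QuoteInfo", [("Price", [price]),
                                          ("LastUpdated", ["06/05/2010"])])])


def two_bit_expression(tree):
    """Depth-first traversal emitting 0 for tags and 10/11 for data leaves."""
    out = []

    def visit(node, node_is_last):
        tag, children = node
        out.append("0" + tag)
        for idx, child in enumerate(children):
            child_is_last = idx == len(children) - 1
            if isinstance(child, str):  # data leaf
                code = "11" if (child_is_last and node_is_last) else "10"
                out.append(code + child)
            else:
                visit(child, child_is_last)

    visit(tree, True)
    return " ".join(out)


def one_bit_expression(tree):
    """Breadth-first traversal emitting 1 for the last child of each parent, else 0."""
    out = []
    queue = deque([(tree, "0")])  # the root is coded 0 in the worked example
    while queue:
        node, code = queue.popleft()
        if isinstance(node, str):  # data leaf
            out.append(code + node)
            continue
        tag, children = node
        out.append(code + tag)
        for idx, child in enumerate(children):
            queue.append((child, "1" if idx == len(children) - 1 else "0"))
    return " ".join(out)


s1 = ("StockQuoteResponse",
      [("ArrayOfStockQuote", [stock_quote("IBM", "22.36"),
                              stock_quote("HP", "21.54")])])
print(two_bit_expression(s1))
print(one_bit_expression(s1))
```

Both functions return the items separated by single spaces, so the output can be split on whitespace to obtain the list of coded items used by the encoding step.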

Fig. 3. Generated XML messages trees of both S1 (a) and S2 (b) SOAP messages.

Fig. 4. Assigned XML messages trees of both S1 (a) and S2 (b) SOAP messages using Two-bit status technique.

Fig. 5. Assigned XML messages trees of both S1 (a) and S2 (b) SOAP messages using One-bit status technique.

6. Aggregation of SOAP expressions

Encoding of the XML textual expression is the final step of the proposed model: it generates the final compact version of the considered messages and is the core component of the aggregation model. Fixed- and variable-length encoding techniques are proposed to generate the aggregated compact message from the combined textual expressions. Both encodings are well-known lossless compression techniques that remove redundancy by assigning binary codes to the encoded items. The resultant encoded message structure has two parts: the lookup table, which holds the unique content of every item in the XML textual expression, and the binary codes of the encoded messages. SOAP messages are aggregated during the encoding process by generating one common lookup table for all the considered SOAP expressions.

| (a) S1 Index | (a) S1 Node Content | (a) S1 Parent Index | (b) S2 Index | (b) S2 Node Content | (b) S2 Parent Index |
|---|---|---|---|---|---|
| 0 | Stock Quote Response | 0 | 0 | Stock Quote Response | 0 |
| 1 | Array Of Stock Quote | 0 | 1 | Array Of Stock Quote | 0 |
| 2 | Stock Quote | 1 | 2 | Stock Quote | 1 |
| 3 | Stock Quote | 1 | 3 | Stock Quote | 1 |
| 4 | Company | 2 | 4 | Company | 2 |
| 5 | Quote Info | 2 | 5 | Quote Info | 2 |
| 6 | Company | 3 | 6 | Company | 3 |
| 7 | Quote Info | 3 | 7 | Quote Info | 3 |
| 8 | IBM | 4 | 8 | NAB | 4 |
| 9 | Price | 5 | 9 | Price | 5 |
| 10 | Last Updated | 5 | 10 | Last Updated | 5 |
| 11 | HP | 6 | 11 | ANZ | 6 |
| 12 | Price | 7 | 12 | Price | 7 |
| 13 | Last Updated | 7 | 13 | Last Updated | 7 |
| 14 | 22.36 | 9 | 14 | 26.47 | 9 |
| 15 | 06/05/2010 | 10 | 15 | 06/05/2010 | 10 |
| 16 | 21.54 | 12 | 16 | 19.82 | 12 |
| 17 | 06/05/2010 | 13 | 17 | 06/05/2010 | 13 |

Fig. 6. Generated matrix form of both S1 (a) and S2 (b) SOAP messages.

Algorithm 1. Jaccard clustering.

    // Notation description:
    // Sn holds the number of documents
    // Gst holds the resultant groups
    // Gn holds the number of groups
    Gn ← Sn / Gs                          // Number of groups
    For i = 1 To Sn                       // Initializing flags
        VFlag(i) ← True
    Next i
    Gst ← ""                              // Initializing the resultant groups
    For Gindex = 1 To Gn
        For i = 1 To Sn                   // Determine the next center document
            if VFlag(i) = True then
                C ← i
                Gst ← concatenate(Gst, C)
                exit loop
            end if
        Next i
        For i = C + 1 To Sn               // Compute the Jaccard similarities
            if VFlag(i) = True then
                Sim_Jc(i) = Jc_temp(S_C, S_i) × N_x + Jc_leaf(S_C, S_i) × N_l
            end if
        Next i
        For i = 1 To Gs                   // Find the closest documents
            Max_index ← 0                 // Initializing the closest document index
            For j = C + 1 To Sn
                if (Max_index = 0) then
                    Max_index ← j
                else
                    if (VFlag(j) = True) and (Sim_Jc(j) > Sim_Jc(Max_index)) then
                        Max_index ← j
                    end if
                end if
            Next j
            Gst ← concatenate(Gst, Max_index)
            VFlag(Max_index) = False
        Next i
        Gst ← concatenate(Gst, "&")       // End of group
    Next Gindex
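Algorithm 1 can be sketched in a few lines of Python. This is an illustrative approximation, not the authors' implementation: each document is assumed to be reduced to two sets (its template items and its leaf values); the functions `jaccard`, `similarity` and `jaccard_clustering` and the equal weights of 0.5 are hypothetical stand-ins for Jc_temp, Jc_leaf, N_x and N_l; and, unlike the listing, the sketch removes the centre from the pool and attaches `group_size - 1` neighbours so that every document lands in exactly one group.

```python
from typing import List, Set, Tuple

# Assumed representation of one SOAP document: (template items, leaf values).
Doc = Tuple[Set[str], Set[str]]


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard coefficient |a & b| / |a | b| (taken as 1.0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def similarity(c: Doc, d: Doc, w_template: float = 0.5, w_leaf: float = 0.5) -> float:
    """Weighted sum of template and leaf similarity; the weights are placeholders."""
    return w_template * jaccard(c[0], d[0]) + w_leaf * jaccard(c[1], d[1])


def jaccard_clustering(docs: List[Doc], group_size: int) -> List[List[int]]:
    """Greedy grouping: the first unassigned document becomes the centre and the
    (group_size - 1) most similar unassigned documents are attached to it."""
    if group_size < 1:
        raise ValueError("group_size must be at least 1")
    unassigned = list(range(len(docs)))
    groups: List[List[int]] = []
    while unassigned:
        centre = unassigned.pop(0)
        # Most similar first; sorted() is stable, so ties keep index order.
        ranked = sorted(unassigned, key=lambda i: -similarity(docs[centre], docs[i]))
        members = ranked[: group_size - 1]
        unassigned = [i for i in unassigned if i not in members]
        groups.append([centre] + members)
    return groups


docs = [
    ({"StockQuote", "Company"}, {"IBM"}),
    ({"Weather", "City"}, {"Melbourne"}),
    ({"StockQuote", "Company"}, {"IBM"}),
    ({"Weather", "City"}, {"Melbourne"}),
]
print(jaccard_clustering(docs, group_size=2))
```

With these four toy documents, the two stock-quote messages end up in one group and the two weather messages in the other.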

Fig. 7. Compressed (a) and aggregated (b) messages structures. [Diagram: (a) each SOAP message expression is compressed separately into its own lookup table and message code (LT1 with MC1, LT2 with MC2); (b) both expressions share one Common Lookup Table (CLT), followed by the aggregation-based message codes AMC1 and AMC2, forming a single aggregated message.]

Fig. 8. SOAP Web messages aggregation strategy and compact message structure. [Diagram panels: XML Message 1 Tree and XML Message 2 Tree; matrix forms of XML trees; Textual Expression 1 and Textual Expression 2; XML binary node codes; Message 1 Code and Message 2 Code inside the aggregated message structure.]

Figure 7 shows the structure of both the individually compressed and aggregated messages. Figure 8 shows the generation of the aggregated message structure from the textual expressions for two SOAP messages. The size of the aggregated message lookup table is smaller than the accumulated size of the lookup tables of the individually compressed messages. This is because common XML items exist in the different SOAP

messages, so identical items are more likely to occur in more than one message. This in turn results in a smaller common lookup table. Referring to Fig. 7, the common lookup table (CLT) is smaller than the accumulated size of lookup table 1 and lookup table 2 (LT1 + LT2). In addition, the binary encoded part of the aggregated message is smaller than the encoded parts of the individually compressed messages. This is because the size of the encoded part depends on the lengths of the binary codes mapped to the XML items in the lookup table, and these items are encoded with fewer bits in the common lookup table.

Table 1
Binary codes of XML nodes of both S1 and S2 SOAP messages.

| Node content | Node redundancy | Huffman code | Fix. length code |
|---|---|---|---|
| StockQuote | 4 | 100 | 0000 |
| Company | 4 | 1011 | 0001 |
| QuoteInfo | 4 | 1010 | 0010 |
| LastUpdated | 4 | 001 | 0011 |
| 06/05/2010 | 4 | 000 | 0100 |
| Price | 4 | 0011 | 0101 |
| StockQuoteResponse | 2 | 1101 | 0110 |
| ArrayOfStockQuote | 2 | 1100 | 0111 |
| IBM | 1 | 001001 | 1000 |
| HP | 1 | 001000 | 1001 |
| NAB | 1 | 001011 | 1010 |
| ANZ | 1 | 001010 | 1011 |
| 22.36 | 1 | 11101 | 1100 |
| 21.54 | 1 | 11100 | 1101 |
| 26.47 | 1 | 11111 | 1110 |
| 19.82 | 1 | 11110 | 1111 |

6.1. Huffman binary tree encoding

Huffman binary tree encoding assigns variable-length binary codes to the considered text items, where the length of each code depends on the relative frequency (redundancy) of the item. It is an iterative process that builds a binary tree as it assigns binary codes to every item in the XML textual expression. During the Huffman encoding process, we first sort the text items of the textual expressions in ascending order of their weights (the weight of an item is its redundancy). Second, we assign the first two items as

children of an internal node where the weight of that node will be the sum of the weights of both children. Then, we move the new internal node to its correct position in the list with the aim of

maintaining the ascending order of the whole list. This process is repeated until all text items are children under one internal node. Finally, every left edge in the binary tree is labelled "0" and every right edge is labelled "1". Table 1 shows the generated binary codes of the XML textual items used in the Huffman binary tree encoding process.

Table 2
Minimum, maximum, and average compression ratios of both aggregation and standalone compression techniques with fixed- and variable-length encodings (SA: standalone; AG: aggregation of pairs of messages; Fix.: fixed-length; Huff.: Huffman).

| Technique | Message size | SA Fix. Min. | SA Fix. Max. | SA Fix. Avg. | SA Huff. Min. | SA Huff. Max. | SA Huff. Avg. | AG Fix. Min. | AG Fix. Max. | AG Fix. Avg. | AG Huff. Min. | AG Huff. Max. | AG Huff. Avg. |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Binary Tree | Small | 1.64 | 2.79 | 2.14 | 1.51 | 2.62 | 2.00 | 2.67 | 4.52 | 3.63 | 2.53 | 4.50 | 3.55 |
| Binary Tree | Medium | 2.25 | 3.89 | 3.16 | 2.10 | 3.73 | 3.03 | 4.39 | 9.02 | 6.99 | 4.34 | 10.16 | 7.54 |
| Binary Tree | Large | 4.03 | 9.44 | 7.16 | 3.91 | 10.81 | 7.78 | 9.64 | 13.07 | 11.93 | 11.07 | 16.43 | 14.62 |
| Binary Tree | V.large | 9.29 | 12.16 | 11.08 | 10.62 | 14.83 | 13.2 | 12.62 | 14.1 | 13.64 | 15.70 | 18.45 | 17.53 |
| Two-bit Tree | Small | 1.66 | 2.81 | 2.16 | 1.68 | 2.90 | 2.22 | 2.70 | 4.60 | 3.69 | 2.56 | 4.59 | 3.60 |
| Two-bit Tree | Medium | 2.27 | 3.96 | 3.20 | 2.30 | 4.14 | 3.34 | 4.46 | 9.35 | 7.19 | 4.41 | 10.57 | 7.79 |
| Two-bit Tree | Large | 4.09 | 9.79 | 7.38 | 4.3 | 11.78 | 8.5 | 10.02 | 13.76 | 12.51 | 11.57 | 17.55 | 15.51 |
| Two-bit Tree | V.large | 9.64 | 12.74 | 11.57 | 11.57 | 16.1 | 14.33 | 13.26 | 14.90 | 14.39 | 16.71 | 19.85 | 18.8 |
| One-bit Tree | Small | 1.66 | 2.85 | 2.18 | 1.69 | 2.93 | 2.24 | 2.73 | 4.69 | 3.75 | 2.59 | 4.67 | 3.65 |
| One-bit Tree | Medium | 2.28 | 4.02 | 3.24 | 2.32 | 4.21 | 3.38 | 4.54 | 9.69 | 7.40 | 4.49 | 11.01 | 8.04 |
| One-bit Tree | Large | 4.16 | 10.17 | 7.60 | 4.37 | 12.33 | 8.80 | 10.42 | 14.53 | 13.15 | 12.12 | 18.81 | 16.51 |
| One-bit Tree | V.large | 10.01 | 13.38 | 12.12 | 12.11 | 17.16 | 15.17 | 13.97 | 15.80 | 15.23 | 17.85 | 21.49 | 20.26 |

6.2. Fixed-length encoding

Fixed-length encoding generates binary codes of the same length for all considered XML textual items. The length of the generated codewords is based on the total number of input XML textual items:

$$N_{Bits} = \mathrm{Round}\left(\log(k) + 0.5\right) \qquad (11)$$

where k is the total number of input textual items and the logarithm is taken to base 2, since the codes are binary. The constant 0.5 is added before rounding so that the resulting bit length is large enough to cover the required number of encoding bits. In practice, fixed-length encoding serves as a second technique for encoding the transformed XML binary tree (XML textual expressions).

7. Experiments and discussion

In the evaluation of the proposed aggregation models, we have considered a variety of SOAP message sizes, ranging from only 140 bytes to 53 kbytes, in order to show the efficiency of the models on small messages as well as large ones. Small messages are included because lossless encodings usually create lookup tables that are large in comparison to the encoded part of the input message, which in many cases can make the encoded message even larger than the uncompressed one. At the same time, this evaluation provides an accurate comparison of both fixed-length and Huffman encodings of the Binary Tree based techniques (Al-Shammary and Khalil, 2010a), in addition to the Two-bit and One-bit techniques, against other standard compression techniques (i.e. gzip, bzip2, XMILL, and XbMILL).

Algorithm 2. Vector space clustering.

    // Notation description:
    // Sn holds the number of documents
    // Gst holds the resultant groups
    // Gn holds the number of groups
    Gst ← ""                                      // Initializing the resultant groups
    Gn ← Sn / Gs                                  // Number of groups
    Cn ← Gn                                       // Number of centers
    For i = 1 To Sn                               // Sum weights of samples
        SumW(i) ← Σ_{j=1..N} W_Si(j)
    Next i
    TCnt(1..N) ← AscendingSort(SumW(1..N))        // Sorting weight vector
    K ← 1                                         // Initializing the centers index
    For i = 1 To Cn                               // Determine centers index
        Cnt(i) ← K
        K ← K + Gs
    Next i
    For i = 1 To Sn                               // Initializing flags
        VFlag(i) ← True
    Next i
    For i = 1 To Cn                               // Exclude centers
        VFlag(Cnt(i)) ← False
    Next i
    For i = 1 To Gn                               // Clustering documents
        C ← Cnt(i)                                // Get center index
        Gst ← concatenate(Gst, C)
        For j = 1 To Sn                           // Compute similarities
            if VFlag(j) = True then
                Dp ← Σ_{k=1..N} [W_SC(k) × W_Sj(k)]
                Np ← SumW(C) × SumW(j)
                Sim_VS(j) ← Dp / Np
            end if
        Next j
        TSim(1..N) ← DescendingSort(Sim_VS(1..N))  // Sorting similarities
        For j = 1 To Gs − 1                       // Include closest documents
            Gst ← concatenate(Gst, TSim(j))
        Next j
        Gst ← concatenate(Gst, "&")               // End of cluster
    Next i
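A comparable Python sketch of Algorithm 2 follows, again as an assumption-laden illustration rather than the authors' code. Each document is assumed to be a vector of non-negative term weights; centres are taken at every `group_size`-th position of the documents ordered by total weight, which is one reading of the centre-index loop; and members are removed from the pool once assigned, which the listing does not state. The names `vsm_similarity` and `vector_space_clustering` are hypothetical.

```python
from typing import List, Sequence


def vsm_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Dot product divided by the product of the two weight sums, as in the
    listing (this is not the cosine, which would divide by the vector norms)."""
    denom = sum(u) * sum(v)
    return sum(a * b for a, b in zip(u, v)) / denom if denom else 0.0


def vector_space_clustering(weights: List[Sequence[float]], group_size: int) -> List[List[int]]:
    """Pick centres from the weight-sorted order, then attach the closest documents."""
    if group_size < 1:
        raise ValueError("group_size must be at least 1")
    n = len(weights)
    # Documents ordered by total weight; every group_size-th one becomes a centre.
    order = sorted(range(n), key=lambda i: sum(weights[i]))
    centres = order[::group_size]
    available = set(range(n)) - set(centres)
    groups: List[List[int]] = []
    for c in centres:
        # Most similar first; the index breaks ties deterministically.
        ranked = sorted(available, key=lambda j: (-vsm_similarity(weights[c], weights[j]), j))
        members = ranked[: group_size - 1]
        available -= set(members)  # assumption: a document joins only one group
        groups.append([c] + members)
    return groups


weights = [[1, 0], [2, 0], [0, 3], [0, 4]]
print(vector_space_clustering(weights, group_size=2))
```

In this toy input the first two vectors use only the first term and the last two only the second, so each centre is joined by the document that shares its term.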

A testbed has been set up with 160 real SOAP Web messages that are distributed equally into four groups based on the message size: small (140–800 bytes), medium (800–3000 bytes), large (3000–20 000 bytes), and very large (20 000–55 000 bytes). Since single compression concepts are used as a basis for the proposed aggregation model, the compression scheme is first applied as a standalone technique and then compared against its aggregation model in order to show the ability of the compression

Fig. 9. Original and compressed size of small, medium, large, and very large sized SOAP messages using Two-bit status tree based fixed-length and Huffman encodings. (a) List 'A' of small size messages. (b) List 'A' of medium size messages. (c) List 'A' of large size messages. (d) List 'A' of very large size messages. (e) List 'B' of small size messages. (f) List 'B' of medium size messages. (g) List 'B' of large size messages. (h) List 'B' of very large size messages.

Fig. 10. Resultant aggregation compact message and accumulated size of compressed messages (i.e. pairs) using Two-bit status technique deploying fixed-length and Huffman encodings. (a) Small messages. (b) Medium messages. (c) Large messages. (d) Very large messages. [Plots: x-axis, message index; y-axis, compressed messages size (bytes); series: accumulated fixed-length compressed messages, accumulated Huffman compressed messages, fixed-length based aggregated messages, and Huffman based aggregated messages.]

schemes in achieving higher reduction during the aggregation process. All of the SOAP messages in the proposed testbed are first compressed using the Two-bit status tree compression technique. Then, the One-bit status tree aggregation technique is implemented on every pair of messages for all groups (small, medium, large, and very large). Figure 9 shows the ability of the Two-bit status tree standalone compression technique to compress SOAP messages using both fixed-length and Huffman encodings. Both encodings show promising results: fixed-length encoding achieved compression ratios of up to 2.81, 4, 9.8, and 12.74 for small, medium, large, and very large messages respectively. Huffman encoding showed similar results for small and medium messages, while it achieved significantly higher compression ratios on large and very large messages; its compression ratios reached 2.9, 4.14, 11.78, and 16.1 for small, medium, large, and very large messages respectively. From these results, we can see that fixed-length encoding performs better for small and medium sized messages. Furthermore, the aggregation versions of the proposed compression techniques are applied to the same set of messages by aggregating pairs of SOAP messages that belong to the same group. The results showed higher compression ratios of up to 2.93, 5.17, 11.69, and 13.68 using Two-bit status tree based fixed-length encoding for small, medium, large, and very large messages respectively. Aggregation with Two-bit status tree based Huffman encoding achieved even higher compression ratios for all message types except the small group (2.75, 5.22, 13.8, and 17.52 for small, medium, large, and very large messages respectively). The details of the results are shown in Table 2.

We show that high compression ratios have been achieved for large and very large documents, and that aggregation with both fixed-length and Huffman encodings using the proposed techniques has reduced the size of small and medium messages successfully. The results for small SOAP messages are interesting, as very few existing standard techniques are capable of reducing small messages. The results in Table 2 show that the One-bit status tree based aggregation technique has outperformed both the Two-bit status tree and Binary Tree based models. To evaluate the aggregation technique precisely, the resultant aggregated size for every pair of messages is compared to the accumulated compressed size of its messages using the Two-bit status tree standalone compression version (see Fig. 10). At the same time, the compressibility measurements showed that aggregated messages with higher compressibility can be reduced more (see Fig. 11), which justifies clustering SOAP messages for the purpose of aggregation. For the aggregation, we create groups of messages where each group may contain more than two messages for higher compression. Furthermore, four standard compression techniques, XMILL and XbMILL as XML-aware techniques and gzip and bzip2 as generic techniques, are compared with

Fig. 11. Compressibility measurements and compression ratios of aggregated SOAP Web message pairs. [Plot: x-axis, compressibility measurements; y-axis, compression ratio (Cr.); series: fixed-length aggregated messages and Huffman aggregated messages.]

Fig. 12. Average clustering time (milliseconds) of Jaccard and vector space model of small, medium, large, and V.large messages with 40 messages each. [Bar chart: clustering time (milliseconds, ×10^4) for Jaccard based clustering and vector space clustering; series: V.Large, Large, Medium, and Small messages.]

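Table 3
Resultant compressed size of different SOAP messages using XMILL, XbMILL, gzip, bzip2, Fixed and Variable length compressors in addition to the Fixed and Variable length aggregation techniques. All sizes are in bytes. A and B are the original sizes of the two messages of each pair; Acc. is the accumulated compressed size of the two messages compressed individually; Agg. is the aggregated message size; BT: Binary Tree; Fix.: fixed-length; Huff.: Huffman.

| A | B | Acc. Gzip | Acc. bzip2 | Acc. XMILL | Acc. XBMILL | Acc. BT Fix. | Acc. BT Huff. | Acc. Two-bit Fix. | Acc. Two-bit Huff. | Acc. One-bit Fix. | Acc. One-bit Huff. | Agg. BT Fix. | Agg. BT Huff. | Agg. Two-bit Fix. | Agg. Two-bit Huff. | Agg. One-bit Fix. | Agg. One-bit Huff. |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 636 | 463 | 517 | 563 | 542 | 650 | 466 | 499 | 462 | 495 | 458 | 491 | 400 | 420 | 395 | 416 | 391 | 412 |
| 358 | 140 | 304 | 316 | 320 | 417 | 238 | 257 | 236 | 254 | 233 | 252 | 194 | 209 | 192 | 206 | 190 | 204 |
| 265 | 602 | 433 | 470 | 458 | 556 | 406 | 435 | 403 | 432 | 400 | 429 | 371 | 393 | 367 | 389 | 364 | 386 |
| 1743 | 2340 | 1250 | 1272 | 1184 | 1362 | 1219 | 1270 | 1203 | 1255 | 1187 | 1239 | 952 | 941 | 936 | 925 | 921 | 910 |
| 2122 | 1542 | 1122 | 1132 | 1054 | 1233 | 1042 | 1086 | 1026 | 1072 | 1011 | 1057 | 836 | 827 | 821 | 812 | 807 | 798 |
| 1442 | 820 | 834 | 931 | 854 | 1024 | 825 | 860 | 817 | 851 | 809 | 844 | 586 | 599 | 578 | 591 | 570 | 583 |
| 8129 | 19697 | 4845 | 3584 | 3659 | 3666 | 3444 | 3115 | 3336 | 3007 | 3230 | 2899 | 2699 | 2305 | 2592 | 2198 | 2485 | 2090 |
| 11516 | 18285 | 5069 | 3744 | 3783 | 3758 | 3561 | 3210 | 3448 | 3097 | 3336 | 2984 | 2798 | 2376 | 2685 | 2263 | 2573 | 2150 |
| 16997 | 4510 | 3936 | 3052 | 3045 | 2951 | 2906 | 2694 | 2824 | 2611 | 2743 | 2530 | 2240 | 1978 | 2158 | 1896 | 2077 | 1815 |
| 47800 | 45085 | 9653 | 6063 | 9377 | 8621 | 7924 | 6586 | 7568 | 6230 | 7214 | 5875 | 7161 | 5665 | 6806 | 5309 | 6451 | 4954 |
| 29876 | 48257 | 7448 | 4972 | 5247 | 6574 | 6898 | 5873 | 6599 | 5575 | 6302 | 5277 | 6135 | 4934 | 5837 | 4636 | 5539 | 4338 |
| 52899 | 40961 | 8332 | 7559 | 9470 | 8603 | 7982 | 6538 | 7624 | 6180 | 7266 | 5821 | 7220 | 5794 | 6861 | 5436 | 6503 | 5077 |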
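The sizes in Table 3 translate directly into compression ratios. The short Python sketch below is an illustration only: it assumes the usual definition of compression ratio as original size divided by compressed size (the definition is not restated in this section) and uses the first message pair of Table 3 (636 and 463 bytes). The helper name `compression_ratio` is hypothetical.

```python
def compression_ratio(original_bytes: int, compressed_bytes: int) -> float:
    """Assumed definition: original size divided by compressed size."""
    return original_bytes / compressed_bytes


# First message pair of Table 3 (sizes in bytes).
original = 636 + 463
accumulated_gzip = 517      # both messages gzip-compressed separately
accumulated_one_bit = 458   # One-bit, fixed-length, compressed separately
aggregated_one_bit = 391    # One-bit, fixed-length, aggregated

for label, size in [
    ("gzip (accumulated)", accumulated_gzip),
    ("One-bit fixed-length (accumulated)", accumulated_one_bit),
    ("One-bit fixed-length (aggregated)", aggregated_one_bit),
]:
    print(f"{label}: {compression_ratio(original, size):.2f}")

# Fraction of bytes saved by aggregating instead of compressing separately.
saving = 1 - aggregated_one_bit / accumulated_one_bit
print(f"saving from aggregation: {saving:.1%}")
```

For this pair the ratios come to about 2.13, 2.40 and 2.81, and aggregation saves roughly 14.6% over compressing the two messages separately with the same technique.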


Fig. 13. One and Two-bit resultant aggregated compact size of small, medium, large, and very large SOAP messages using Jaccard and vector space model similarity grouping with five messages per group as a group size.

Fig. 14. Resultant average compression ratios of small, medium, large, and very large SOAP aggregated messages using Jaccard similarity grouping. (a) Small sized messages. (b) Medium sized messages. (c) Large sized messages (d) V.Large sized messages.


Fig. 15. Resultant average compression ratios of small, medium, large, and very large SOAP aggregated messages using vector space model similarity grouping. (a) Small sized messages. (b) Medium sized messages. (c) Large sized messages (d) V.Large sized messages.

Fig. 16. One-bit, Two-bit, and Binary Tree aggregation and deaggregation time of Jaccard and vector space model based clustered small, medium, large, and very large SOAP messages with five messages per group as a group size.


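Table 4
The aggregation time (milliseconds) of Binary Tree, Two-bit, and One-bit techniques for the computed SOAP message clusters using Jaccard clustering model.

| Message size | Number of messages per cluster | Binary Tree Fixed | Binary Tree Huffman | Two-bit Fixed | Two-bit Huffman | One-bit Fixed | One-bit Huffman |
|---|---|---|---|---|---|---|---|
| Small | 2 | 9.2 | 9.6 | 0.25 | 0.65 | 0.35 | 0.65 |
| Small | 3 | 10.43 | 10.93 | 0.36 | 0.86 | 0.36 | 0.86 |
| Small | 4 | 13.3 | 14.1 | 0.4 | 1.3 | 0.5 | 1.2 |
| Small | 5 | 16 | 17.25 | 0.63 | 1.63 | 0.75 | 1.63 |
| Small | 6 | 19.57 | 20.86 | 1 | 2.29 | 1.29 | 2.57 |
| Small | 7 | 22.67 | 24 | 1.33 | 2.83 | 1.5 | 3.17 |
| Small | 8 | 30.8 | 32.8 | 1.2 | 3.4 | 1.6 | 3.4 |
| Small | 9 | 34 | 36.4 | 2 | 3.8 | 1.8 | 3.6 |
| Small | 10 | 37.25 | 40 | 2.5 | 4.5 | 2.5 | 4.75 |
| Medium | 2 | 22.65 | 24.8 | 3.4 | 5.5 | 3.45 | 5.65 |
| Medium | 3 | 43.57 | 32.14 | 5.29 | 7.86 | 5.36 | 8 |
| Medium | 4 | 40.6 | 42.8 | 7.7 | 11.1 | 7.7 | 11.1 |
| Medium | 5 | 49.13 | 52.75 | 9.63 | 13.63 | 9.63 | 13.25 |
| Medium | 6 | 58.14 | 63 | 11.43 | 15.43 | 11.14 | 15 |
| Medium | 7 | 69.33 | 73.5 | 14.33 | 17.83 | 14 | 17.5 |
| Medium | 8 | 89.4 | 93.4 | 16.4 | 20.2 | 16.2 | 20.4 |
| Medium | 9 | 89.8 | 93.6 | 16.8 | 20.4 | 16 | 20.6 |
| Medium | 10 | 111.25 | 115.75 | 21 | 26.5 | 20.5 | 24.75 |
| Large | 2 | 1217.6 | 1223.5 | 122.2 | 126.55 | 122.45 | 126.4 |
| Large | 3 | 1745.29 | 1755.79 | 173.43 | 178.43 | 173.79 | 177.86 |
| Large | 4 | 2440.2 | 2440.4 | 248.7 | 253.4 | 247.7 | 252.3 |
| Large | 5 | 3017.63 | 3049.75 | 317.88 | 322.38 | 316.13 | 318.88 |
| Large | 6 | 3490.57 | 3485.14 | 366.14 | 373.14 | 364.29 | 370.29 |
| Large | 7 | 4037.17 | 4051.83 | 428 | 435.83 | 426.5 | 432.17 |
| Large | 8 | 4848.6 | 4849.8 | 507.8 | 513.4 | 508.8 | 513.8 |
| Large | 9 | 4851.2 | 4893.4 | 513.8 | 519.4 | 507 | 512.8 |
| Large | 10 | 6053.75 | 6065.75 | 627.5 | 643.5 | 633.5 | 644.75 |
| V.Large | 2 | 24832.7 | 24878.7 | 1254.85 | 1262.5 | 1256.75 | 1257 |
| V.Large | 3 | 35539.36 | 35602.5 | 1802.29 | 1806.36 | 1807.64 | 1807.93 |
| V.Large | 4 | 49741 | 49826.9 | 2545.7 | 2549 | 2545.8 | 2545.9 |
| V.Large | 5 | 62331.63 | 62449 | 3186.13 | 3190.5 | 3197.25 | 3196.25 |
| V.Large | 6 | 71238.71 | 71364 | 3659.57 | 3663.71 | 3674 | 3667 |
| V.Large | 7 | 83158 | 83290.33 | 4277.83 | 4288 | 4288.17 | 4283.5 |
| V.Large | 8 | 99798.4 | 99960.4 | 5133.8 | 5139.6 | 5248.8 | 5149.8 |
| V.Large | 9 | 99946 | 100151.2 | 5197.2 | 5175.4 | 5241.4 | 5186.2 |
| V.Large | 10 | 125038.8 | 125278.3 | 6472 | 6484 | 6528 | 6503.75 |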

Table 5
The aggregation time (milliseconds) of Binary Tree, Two-bit, and One-bit techniques for the computed SOAP message clusters using vector space clustering model.

| Message size | Number of messages per cluster | Binary Tree Fixed | Binary Tree Huffman | Two-bit Fixed | Two-bit Huffman | One-bit Fixed | One-bit Huffman |
|---|---|---|---|---|---|---|---|
| Small | 2 | 9.25 | 9.5 | 0.3 | 0.65 | 0.75 | 0.7 |
| Small | 3 | 9.71 | 10.5 | 0.29 | 0.79 | 0.5 | 1.07 |
| Small | 4 | 12.5 | 13.5 | 0.6 | 1.7 | 0.5 | 1.7 |
| Small | 5 | 17 | 18.25 | 0.75 | 2 | 0.88 | 1.88 |
| Small | 6 | 22.71 | 24.14 | 0.71 | 2.57 | 0.71 | 2.57 |
| Small | 7 | 22.5 | 24.33 | 1.33 | 3.17 | 1.33 | 3.17 |
| Small | 8 | 30.4 | 33 | 1.6 | 4.4 | 2 | 4 |
| Small | 9 | 33.6 | 36 | 2.2 | 3.8 | 1.8 | 4.2 |
| Small | 10 | 30.25 | 32.75 | 2.5 | 5.25 | 2.5 | 5 |
| Medium | 2 | 22.1 | 24.55 | 3.55 | 5.75 | 3.55 | 6 |
| Medium | 3 | 29.43 | 32.86 | 5.21 | 8.36 | 5.36 | 8.5 |
| Medium | 4 | 42.3 | 46.3 | 8 | 11.8 | 8 | 12.3 |
| Medium | 5 | 48.63 | 53 | 9.75 | 13.63 | 9.63 | 13.63 |
| Medium | 6 | 62.43 | 67.43 | 12.14 | 16.29 | 12.29 | 15.89 |
| Medium | 7 | 71 | 75.33 | 14.17 | 18.67 | 14.17 | 18 |
| Medium | 8 | 89.2 | 93.4 | 17 | 21.2 | 17.2 | 21.2 |
| Medium | 9 | 92.4 | 97.2 | 17.2 | 21.6 | 17.2 | 22.2 |
| Medium | 10 | 89.5 | 95 | 20.75 | 25.25 | 21 | 25.25 |
| Large | 2 | 1208.5 | 1219 | 121.7 | 126 | 122.7 | 126.6 |
| Large | 3 | 1643.36 | 1653.36 | 168.93 | 173.71 | 169.36 | 173.93 |
| Large | 4 | 2610.7 | 2631.9 | 268.4 | 272.1 | 268.6 | 272.5 |
| Large | 5 | 2936.63 | 2967.88 | 310.63 | 317.5 | 309.5 | 316.13 |
| Large | 6 | 3624.86 | 3644 | 378.57 | 384 | 377.43 | 383.43 |
| Large | 7 | 3841.33 | 3870.33 | 413.33 | 410.83 | 406.33 | 413.17 |
| Large | 8 | 4975 | 5030.4 | 533 | 528.8 | 528.6 | 527.4 |
| Large | 9 | 5028.2 | 5061 | 528.4 | 536.6 | 529.6 | 533 |
| Large | 10 | 5117.75 | 5152.25 | 560.25 | 573.75 | 566.25 | 575.5 |
| V.Large | 2 | 24833.45 | 24831.1 | 1260.4 | 1260.8 | 1261.3 | 1259.9 |
| V.Large | 3 | 36132.79 | 36154.79 | 1839.64 | 1847.21 | 1834.79 | 1845.93 |
| V.Large | 4 | 47977.8 | 48014.1 | 2477.2 | 2487.8 | 2481.8 | 2475 |
| V.Large | 5 | 65797 | 65929.88 | 3351.5 | 3360.63 | 3358.5 | 3353.38 |
| V.Large | 6 | 74040.14 | 74067.71 | 3791.86 | 3798.86 | 3794.86 | 3791 |
| V.Large | 7 | 84062.67 | 84113 | 4296.5 | 4306.5 | 4304.83 | 4307.67 |
| V.Large | 8 | 103559 | 103653.6 | 5322.2 | 5357.2 | 5325.8 | 5324.8 |
| V.Large | 9 | 99212.8 | 99157 | 5111.2 | 5126 | 5134.8 | 5182 |
| V.Large | 10 | 104450.5 | 104586.3 | 6589 | 6589.25 | 6598.25 | 6616.25 |

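Table 6
The deaggregation time (milliseconds) of Binary Tree, Two-bit, and One-bit techniques for the aggregated SOAP message based on both Jaccard and Vector Space resultant groups.

| Message size | Number of messages per cluster | Jaccard Binary Tree | Jaccard Two-bit | Jaccard One-bit | VSM Binary Tree | VSM Two-bit | VSM One-bit |
|---|---|---|---|---|---|---|---|
| Small | 2 | 0.05 | 0.05 | 0.05 | 0.05 | 0.05 | 0.05 |
| Small | 3 | 0.05 | 0.05 | 0.05 | 0.05 | 0.05 | 0.05 |
| Small | 4 | 0.2 | 0.1 | 0.05 | 0.1 | 0.05 | 0.1 |
| Small | 5 | 0.125 | 0.05 | 0.05 | 0.625 | 0.05 | 0.05 |
| Small | 6 | 0.857 | 0.143 | 0.05 | 1 | 0.286 | 0.286 |
| Small | 7 | 1.333 | 0.333 | 0.05 | 1.333 | 0.05 | 0.5 |
| Small | 8 | 2 | 0.6 | 0.4 | 2.8 | 1 | 0.8 |
| Small | 9 | 3.4 | 1.2 | 1.4 | 3.4 | 1.4 | 1.2 |
| Small | 10 | 6 | 1.75 | 1.75 | 5.5 | 2 | 1.75 |
| Medium | 2 | 0.05 | 0.05 | 0.05 | 0.05 | 0.05 | 0.05 |
| Medium | 3 | 0.05 | 0.05 | 0.05 | 0.143 | 0.05 | 0.05 |
| Medium | 4 | 0.2 | 0.05 | 0.05 | 0.6 | 0.05 | 0.05 |
| Medium | 5 | 1 | 0.05 | 0.125 | 1.125 | 0.125 | 0.125 |
| Medium | 6 | 1.714 | 0.143 | 0.429 | 1.857 | 0.429 | 0.143 |
| Medium | 7 | 2.333 | 0.833 | 0.833 | 2.667 | 0.833 | 0.833 |
| Medium | 8 | 4 | 1.2 | 1.8 | 4.2 | 1.4 | 1.6 |
| Medium | 9 | 4.6 | 1.8 | 2.2 | 5.2 | 2 | 2.2 |
| Medium | 10 | 8.5 | 3 | 3.5 | 8 | 3 | 3.25 |
| Large | 2 | 0.05 | 0.05 | 0.05 | 0.05 | 0.05 | 0.05 |
| Large | 3 | 0.571 | 0.05 | 0.05 | 0.214 | 0.071 | 0.071 |
| Large | 4 | 1.5 | 0.5 | 0.4 | 1.8 | 0.8 | 0.4 |
| Large | 5 | 2.625 | 1 | 1.375 | 2.875 | 1.375 | 1.5 |
| Large | 6 | 4 | 1.857 | 1.571 | 4.286 | 1.571 | 1.857 |
| Large | 7 | 5.833 | 2.5 | 2.667 | 6.333 | 2.833 | 2.333 |
| Large | 8 | 8.8 | 4 | 4 | 10 | 4.2 | 4.2 |
| Large | 9 | 10.6 | 4.4 | 4.8 | 10.6 | 4.8 | 5.4 |
| Large | 10 | 16 | 7.25 | 7.25 | 14 | 6.75 | 7 |
| V.Large | 2 | 0.1 | 0.05 | 0.05 | 0.15 | 0.05 | 0.05 |
| V.Large | 3 | 1.714 | 0.643 | 0.786 | 2 | 0.929 | 0.857 |
| V.Large | 4 | 4.3 | 1.8 | 1.9 | 4.5 | 2.2 | 2.2 |
| V.Large | 5 | 7.5 | 3.5 | 3.5 | 8.125 | 4 | 4.125 |
| V.Large | 6 | 10.426 | 5 | 5.143 | 11.143 | 5.429 | 5.571 |
| V.Large | 7 | 16.167 | 7.167 | 7.333 | 16.333 | 7.667 | 7.667 |
| V.Large | 8 | 22.4 | 10.4 | 11.4 | 24.2 | 12.2 | 12.8 |
| V.Large | 9 | 26.4 | 12 | 13 | 28 | 12.6 | 12.8 |
| V.Large | 10 | 39.5 | 18.5 | 18.75 | 36.25 | 19.75 | 19.75 |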

in this evaluation using the same strategy: their accumulated compressed size for every considered pair of messages is compared against the size of the corresponding aggregated message. The results shown in Table 3 confirm the high performance of the proposed aggregation models, which achieve the highest compression ratios in comparison with all the standalone compression techniques. Both Two-bit status tree based fixed-length and Huffman aggregation have shown promising results for small and medium message pairs. However, the Huffman based aggregation technique is the best performer for large and very large message pairs.

Both Jaccard and vector space model based clustering techniques are evaluated in terms of the compression ratios that aggregation achieves on their clusters and the processing time they require to cluster SOAP messages according to their degree of similarity. Figure 12 shows the average clustering time of both


Jaccard and vector space models for small, medium, large, and very large messages. The vector space model requires less processing time than the Jaccard based clustering technique for all message sizes. The resultant SOAP message clusters of the proposed clustering models are aggregated by the Two-bit and One-bit status tree techniques, and by the Binary Tree (BT) aggregation model for comparison. Figure 13 shows the aggregated message size reduction using the Two-bit and One-bit status tree techniques on Jaccard and vector space model based clusters with five messages per group. The aggregation models have shown significant message size reduction, and the One-bit status tree aggregation technique has achieved better size reduction than the Two-bit status tree technique. Figures 14 and 15 show the average compression ratios achieved by the Two-bit and One-bit status tree based aggregation models, in addition to the Binary Tree aggregation model, for messages clustered by the Jaccard and vector space models. SOAP messages are aggregated with nine different group sizes, from two messages per cluster up to 10 messages per cluster. The results show that a higher compression ratio can be achieved as the cluster size increases. The compression ratios of the aggregated groups based on vector space model clustering are slightly higher than those of the groups based on Jaccard clustering. The aggregation and de-aggregation times of the Two-bit and One-bit status tree techniques, in addition to the Binary Tree technique, are investigated for groups of five SOAP messages in each size category: small, medium, large, and very large (see Fig. 16). The proposed aggregation techniques have shown significantly better processing time than the Binary Tree technique. The same applies to the de-aggregation time, as both proposed techniques outperformed the Binary Tree technique by consuming less processing time for de-aggregating SOAP messages.
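Aggregation and de-aggregation can be illustrated with a minimal round trip. The sketch below is a simplification under stated assumptions: a textual expression is modelled as a plain list of items, `math.ceil(math.log2(k))` stands in for Eq. (11), the sample items are abbreviated from Fig. 6, and all function names are hypothetical. It covers only fixed-length encoding with a common lookup table, not the One-bit or Two-bit status trees.

```python
import math
from typing import List, Tuple


def build_lookup(expressions: List[List[str]]) -> List[str]:
    """Unique items of all expressions, in order of first appearance."""
    table: List[str] = []
    seen = set()
    for expr in expressions:
        for item in expr:
            if item not in seen:
                seen.add(item)
                table.append(item)
    return table


def aggregate(expressions: List[List[str]]) -> Tuple[List[str], int, List[str]]:
    """Return (common lookup table, bits per code, one bit string per message)."""
    table = build_lookup(expressions)
    # Assumed reading of Eq. (11): enough bits to index every table entry.
    n_bits = math.ceil(math.log2(max(len(table), 2)))
    index = {item: i for i, item in enumerate(table)}
    codes = ["".join(format(index[item], f"0{n_bits}b") for item in expr)
             for expr in expressions]
    return table, n_bits, codes


def deaggregate(table: List[str], n_bits: int, codes: List[str]) -> List[List[str]]:
    """Invert aggregate(): cut each bit string into n_bits chunks and look them up."""
    return [[table[int(bits[i:i + n_bits], 2)] for i in range(0, len(bits), n_bits)]
            for bits in codes]


# Abbreviated items from the two messages of Fig. 6.
s1 = ["StockQuoteResponse", "ArrayOfStockQuote", "StockQuote", "Company", "IBM", "Price", "22.36"]
s2 = ["StockQuoteResponse", "ArrayOfStockQuote", "StockQuote", "Company", "NAB", "Price", "26.47"]

table, n_bits, codes = aggregate([s1, s2])
separate_entries = len(build_lookup([s1])) + len(build_lookup([s2]))
print(len(table), separate_entries, n_bits)
assert deaggregate(table, n_bits, codes) == [s1, s2]
```

Here the common lookup table holds 9 entries where two separate tables would hold 14 between them, which is the effect the common lookup table is meant to achieve.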
In order to investigate the required processing time of the aggregation and de-aggregation techniques in more detail, both are evaluated using clusters of SOAP messages that range from 2 to 10 messages per cluster. Tables 4 and 5 show the processing time for aggregating small, medium, large, and very large messages using the Binary Tree, Two-bit, and One-bit aggregation models, based on the Jaccard and vector space clustering models respectively. Moreover, Table 6 shows the de-aggregation time for the aggregated clusters of SOAP messages that are grouped using both the Jaccard and vector space models. The results show that both the Two-bit and One-bit techniques require far less processing time than the Binary Tree technique for both aggregation and de-aggregation; for example, aggregating pairs of large messages under Jaccard clustering with fixed-length encoding takes about 122 ms with either technique against about 1218 ms with Binary Tree (Table 4). Furthermore, the aggregation and de-aggregation times of the Two-bit and One-bit techniques are very close to each other, and the processing time increases as the size of the SOAP messages (small, medium, large, and very large) increases.
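The Huffman columns of Tables 4 to 6 rest on the code construction of Section 6.1. As a rough illustration (not the authors' implementation), the following Python sketch builds Huffman codes for the node redundancies of Table 1 with a priority queue instead of a re-sorted list. `huffman_codes` is a hypothetical helper, and because ties between equal weights are broken arbitrarily the bit patterns need not match those printed in Table 1.

```python
import heapq
from itertools import count
from typing import Dict


def huffman_codes(weights: Dict[str, int]) -> Dict[str, str]:
    """Build a prefix code by repeatedly merging the two lightest nodes."""
    if not weights:
        return {}
    if len(weights) == 1:  # a single item still needs one bit
        return {item: "0" for item in weights}
    tie = count()  # tie-breaker so heap tuples never compare the dicts
    heap = [(w, next(tie), {item: ""}) for item, w in weights.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        w_left, _, left = heapq.heappop(heap)    # lightest node: left child, edge "0"
        w_right, _, right = heapq.heappop(heap)  # next lightest: right child, edge "1"
        merged = {item: "0" + code for item, code in left.items()}
        merged.update({item: "1" + code for item, code in right.items()})
        heapq.heappush(heap, (w_left + w_right, next(tie), merged))
    return heap[0][2]


# Node redundancies as listed in Table 1 (NAB spelled as in Fig. 6).
weights = {
    "StockQuote": 4, "Company": 4, "QuoteInfo": 4, "LastUpdated": 4,
    "06/05/2010": 4, "Price": 4, "StockQuoteResponse": 2, "ArrayOfStockQuote": 2,
    "IBM": 1, "HP": 1, "NAB": 1, "ANZ": 1,
    "22.36": 1, "21.54": 1, "26.47": 1, "19.82": 1,
}
codes = huffman_codes(weights)
huffman_bits = sum(weights[item] * len(codes[item]) for item in weights)
fixed_bits = sum(weights.values()) * 4  # 16 distinct items need 4-bit fixed-length codes
print(huffman_bits, fixed_bits)
```

Under these weights the Huffman codes need 136 bits for the 36 item occurrences, against 144 bits with 4-bit fixed-length codes, so the gain from variable-length codes is modest when the redundancies are this evenly spread.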

8. Conclusion and future work

XML-aware compression techniques can be developed into efficient SOAP aggregation models that exploit redundancies across several SOAP messages. In this paper, we have shown that redundancy-based aggregation techniques can outperform all the standalone compression techniques by achieving higher compression ratios for messages of all sizes: small, medium, large and very large. The performance of Web services can be improved by applying the redundancy-aware aggregation models, which enable Web servers to generate a compact message from which receivers can extract the original messages. The new compressibility measurement technique in our work shows that we can predict the performance of aggregation models when similar messages are grouped appropriately. The One-bit XML status tree aggregation technique outperformed all the standard compression techniques as well as the Two-bit XML status tree and Binary Tree based aggregation models. Both Jaccard and vector space based clustering models have shown significant performance, enabling aggregation models to achieve compression ratios of 20 or higher. However, vector space clustering outperformed Jaccard clustering both in supporting the aggregation models to reduce the size of SOAP messages and in the processing time required to cluster messages. For future work, standard clustering techniques such as K-Means that produce variable group sizes will be investigated as an alternative SOAP similarity measurement, in order to assess their capability to strengthen the aggregation techniques with respect to message size reduction.

References

AjayKumar S, Nachiappan C, Periyakaruppan K, Boominathan P. Enhancing portable environment using cloud and grid. In: International conference on signal processing systems; 2009. p. 728–32.

Al-Shammary D, Khalil I. Compression-based aggregation model for medical Web services. In: 2010 annual international conference of the IEEE engineering in medicine and biology society (EMBC), August 31, 2010–September 4, 2010a. p. 6174–7.

Al-Shammary D, Khalil I. A new XML-aware compression technique for improving performance of healthcare information systems over hospital networks. In: 2010 annual international conference of the IEEE engineering in medicine and biology society (EMBC), August 31, 2010–September 4, 2010b. p. 4440–3.

Bo C, Yang Z, Peng Z, Hua D, Xiaoxiao H, Zheng W, et al. Development of WebTelecom based hybrid services orchestration and execution middleware over convergence networks. Journal of Network and Computer Applications 2010;33(5):620–30 [Middleware Trends for Network Applications, ISSN 1084-8045, doi:10.1016/j.jnca.2010.03.025. http://www.sciencedirect.com/science/article/pii/S1084804510000603].

Chonka A, Xiang Y, Zhou W, Bonti A. Cloud security defence to protect cloud computing against HTTP-DoS and XML-DoS attacks. Journal of Network and Computer Applications 2011;34(4):1097–107 [Advanced Topics in Cloud Computing, ISSN 1084-8045, doi:10.1016/j.jnca.2010.06.004. http://www.sciencedirect.com/science/article/pii/S1084804510001025].

Chen M, Song Y. Summarization of text clustering based vector space model. In: IEEE 10th international conference on computer-aided industrial design & conceptual design, 2009. CAID & CD 2009, 26–29 November; 2009. p. 2362–5.

Christian Werner CB, Fischer S. Compressing SOAP messages by using differential encoding. In: Proceedings of the IEEE international conference on Web services, San Diego, CA, USA, 2004. p. 540–7.

Chung JY, Park B, Won YJ, Strassner J, Hong JW. An effective similarity metric for application traffic classification. In: IEEE Network Operations and Management Symposium (NOMS), 19–23 April; 2010. p. 144–8.

Diamadopoulou V, Makris C, Panagis Y, Sakkopoulos E. Techniques to support Web Service selection and consumption with QoS characteristics. Journal of Network and Computer Applications 2008;31(2):108–30 [Trends in peer to peer and service oriented computing, ISSN 1084-8045, doi:10.1016/j.jnca.2006.03.002. http://www.sciencedirect.com/science/article/pii/S1084804506000282].

Gu T, Pung HK, Yao JK. Towards a flexible service discovery. Journal of Network and Computer Applications 2005;28(3):233–48. ISSN 1084-8045, doi:10.1016/j.jnca.2004.06.001. http://www.sciencedirect.com/science/article/pii/S1084804504000463.

Hartmut Liefke DS. An extensible compressor for XML data. ACM SIGMOD Record 2000;29(1):57–62.

Hsu I-C, Chi L-P, Bor S-S. A platform for transcoding heterogeneous markup documents using ontology-based metadata. Journal of Network and Computer Applications 2009;32(3):616–29. ISSN 1084-8045, doi:10.1016/j.jnca.2008.07.006. http://www.sciencedirect.com/science/article/pii/S1084804508000817.

Hu C-L, Cho C-A. User-provided multimedia content distribution architecture in mobile and ubiquitous communication networks. Journal of Network and Computer Applications 2011;34(1):121–36. ISSN 1084-8045, doi:10.1016/j.jnca.2010.08.010. http://www.sciencedirect.com/science/article/pii/S108480451000158X.

Hu J, Khalil I, Han S, Mahmood A. Seamless integration of dependability and security concepts in SOA: a feedback control system based framework and taxonomy. Journal of Network and Computer Applications 2011;34(4):1150–9 [Advanced Topics in Cloud Computing, ISSN 1084-8045, doi:10.1016/j.jnca.2010.11.013. http://www.sciencedirect.com/science/article/pii/S1084804510002110].

Khoi Anh Phan ZT, Bertok P. Similarity-based SOAP multicast protocol to reduce bandwidth and latency in Web services. IEEE Transactions on Services Computing 2008;1(2):88–103.

Komathy K, Ramachandran V, Vivekanandan P. Security for XML messaging services—a component-based approach. Journal of Network and Computer Applications 2003;26(2):197–211. ISSN 1084-8045, doi:10.1016/S1084-8045(03)00003-1. http://www.sciencedirect.com/science/article/pii/S1084804503000031.


Kuehnhausen M, Frost VS. Application of the Java Message Service in mobile monitoring environments. Journal of Network and Computer Applications 2011;34(5):1707–16 [Dependable Multimedia Communications: Systems, Services, and Applications, ISSN 1084-8045, doi:10.1016/j.jnca.2011.06.003. http://www.sciencedirect.com/science/article/pii/S1084804511001159].

Liu H, Bao H, Wang J, Xu D. A novel vector space model for tree based concept similarity measurement. In: The 2nd IEEE international conference on information management and engineering (ICIME) 2010, 16–18 April; 2010. p. 144–8.

Madiraju P, Malladi S, Balasooriya J, Hariharan A, Prasad SK, Bourgeois A. A methodology for engineering collaborative and ad-hoc mobile applications using SyD middleware. Journal of Network and Computer Applications 2010;33(5):542–55 [Middleware Trends for Network Applications, ISSN 1084-8045, doi:10.1016/j.jnca.2010.03.007. http://www.sciencedirect.com/science/article/pii/S1084804510000421].

Nakagawa M, Nozaki K, Shimojo S. Web-based distributed simulation and data management services for medical applications. In: 19th IEEE international symposium on computer-based medical systems. CBMS 2006, 2006. p. 125–30.


Pastore S. The service discovery methods issue: a web services UDDI specification framework integrated in a grid environment. Journal of Network and Computer Applications 2008;31(2):93–107 [Trends in peer to peer and service oriented computing, ISSN 1084-8045, doi:10.1016/j.jnca.2006.05.001. http://www.sciencedirect.com/science/article/pii/S1084804506000270].

Rosu M-C. A-SOAP: Adaptive SOAP message processing and compression. In: Proceedings of the IEEE international conference on Web services, Salt Lake City, Utah, USA, 2007. p. 200–7.

Ruiz-Martinez A, Sanchez-Montesinos J, Sanchez-Martinez D. A mobile network operator-independent mobile signature service. Journal of Network and Computer Applications 2011;34(1):294–311. ISSN 1084-8045, doi:10.1016/j.jnca.2010.07.003. http://www.sciencedirect.com/science/article/pii/S1084804510001256.

Subashini S, Kavitha V. A survey on security issues in service delivery models of cloud computing. Journal of Network and Computer Applications 2011;34(1):1–11. ISSN 1084-8045, doi:10.1016/j.jnca.2010.07.006. http://www.sciencedirect.com/science/article/pii/S1084804510001281.

Wang Y, Li Y-h. Deep Web entity identification method based on improved Jaccard coefficients. In: International conference on research challenges in computer science, 2009. ICRCCS’09, 28–29 December; 2009. p. 112–5.