PIVOT: An adaptive information discovery framework for computational grids

Information Sciences 180 (2010) 4543–4556

Contents lists available at ScienceDirect

Information Sciences journal homepage: www.elsevier.com/locate/ins

Guiyi Wei (a), Yun Ling (a), Athanasios V. Vasilakos (b,*), Bin Xiao (c), Yao Zheng (d)

(a) College of Computer Science and Information Engineering, Zhejiang Gongshang University, Hangzhou, PR China
(b) Department of Computer and Telecommunications Engineering, University of Western Macedonia, Kozani, Greece
(c) Department of Computing, Hong Kong Polytechnic University, Hong Kong, PR China
(d) School of Aeronautics and Astronautics, Zhejiang University, Hangzhou, PR China

Article info

Article history: Received 22 November 2008 Received in revised form 21 July 2010 Accepted 26 July 2010

Keywords: Computational grid; Information discovery; Resource scheduling; Adaptiveness; Scalability; Distributed cooperation

Abstract

In a traditional computational grid environment, the owners of resources usually provide information about their resources extracted by pre-configured information services or web services. However, such information is not sufficient for the scheduler in high-performance distributed computing. To solve this problem, we propose a scalable grid information service framework, named PIVOT (adaPtive Information discoVery framewOrk for compuTational grid). By using deadline-constrained flooding collector dissemination and P2P-like information collection schemes, PIVOT provides an active mechanism to collect application-specific resource information. In particular, PIVOT provides a resource information service for application-specific schedulers. Best-effort performance on overhead traffic and communication latency during information discovery is guaranteed by two new distributed cooperative algorithms. Experimental results from simulations and a real computational grid platform demonstrate that PIVOT has a high level of adaptability for application-specific resource information discovery, and also improves the accuracy of resource allocation and the efficiency of executing parallel tasks compared with traditional information services.

© 2010 Elsevier Inc. All rights reserved.

1. Introduction

As the grid becomes a viable and cheap high-performance computing tool, more and more experimental platforms in scientific and engineering domains are migrated to computational grid environments. However, it is hard for a grid computing system to achieve the same performance as a supercomputer with the same computing capability. In practice, most across-organization grid systems are unable to provide good QoS (quality-of-service) for large-scale distributed computational tasks. There are two main reasons for this: resource allocation and task scheduling.

• Resource allocation: In grid systems, the across-organization resources are loosely coupled, and most of them are autonomously controlled by their owners. Moreover, the state of the resources, such as the task queue, free memory, and communication capacity, is highly dynamic. This conflict makes it challenging to allocate the across-organization resources to grid tasks that have diverse QoS requirements.

* Corresponding author. E-mail addresses: [email protected] (G. Wei), [email protected] (Y. Ling), [email protected] (A.V. Vasilakos), [email protected] (B. Xiao), [email protected] (Y. Zheng). 0020-0255/$ - see front matter © 2010 Elsevier Inc. All rights reserved. doi:10.1016/j.ins.2010.07.022


• Task scheduling: Because more resource information is required, scheduling multiple dependent subtasks of the same parallel grid task on loosely organized resources is more challenging than scheduling single or multiple independent tasks.

To date, many scheduling algorithms, general-purpose or specialized, have been proposed to address this problem, and most of them are based on the assumption that all needed resource information is given. However, this assumption does not hold in the across-organization grid systems described above. For a scheduling algorithm in a grid system, we should know what resource information it needs, how much information it needs, and how it obtains the information from the dynamic resource network. The answers to these questions are determined by the requirements of the task and the underlying scheduling algorithm. However, the requirements of tasks in a grid system are generally diverse. For example, computation-intensive applications, such as Monte-Carlo simulations, involve high-speed computing power; I/O-intensive applications, such as distributed data sharing systems, involve large-capacity storage; while communication-intensive applications, such as computational fluid dynamics simulations, involve large bandwidth and low latency. Thus, an effective resource monitoring framework and efficient resource information discovery approaches are very important for a grid system.

As we know, information services in most current grid computing middlewares, such as Globus [7], Legion [14], UNICORE [31], SGE [11], Condor [8], GLORIAD [12], PRAGMA [39], and CROWN [18], work in a passive mode and provide unified resource information to the schedulers or resource brokers. The owner autonomously determines what resource information is to be shared, and then periodically sends this information to the schedulers or brokers involved. For the most part, the information is static and pre-configured.
Hence, it is not sufficient for scheduling algorithms to obtain optimal results. In this paper, we propose an adaPtive Information discoVery framewOrk for compuTational grid (PIVOT) that can actively discover customized resource information from a dynamic resource network. As part of our project, PIVOT is implemented in our grid environment MASSIVE (Multidisciplinary ApplicationS-oriented SImulation and Visualization Environment) [36]. It utilizes the grid security infrastructure (GSI) to exercise secure control over autonomous resources. Our experiments demonstrate that PIVOT improves the scalability and adaptability of grid information services and the effectiveness of resource utilization compared with traditional information services. Our contributions can be summarized as follows:

1. PIVOT provides insight into the design of effective methods of monitoring grid resources and information discovery.
2. The across-organization resource network in PIVOT is treated as a dynamic random graph. Based on a spanning tree of the graph, customized information collectors are disseminated and the needed resource information is sent back.
3. Two distributed algorithms are designed to achieve best-effort performance on overhead traffic and communication latency in the information collection process.
4. PIVOT improves the effectiveness and efficiency of scheduling algorithms for parallel tasks in computational grids. This is, to the best of our knowledge, the first attempt in the literature to solve the problem of dynamic information discovery using the active mode in grid computing.
5. By applying the BIP (Binary Integer Programming) model to schedule communication-intensive parallel grid tasks, we develop a deadline- and budget-constrained cost-time optimization algorithm for scheduling parallel and dependent tasks.

2. Related work

Large-scale grid applications usually need to manage large amounts of related resources.
Information services of grid systems are used to address the challenging problems of announcing and discovering resources. Monitoring and Data Service (MDS version 2) was used in most grid systems, such as LCG (Large hadron collider Computing Grid) [2], OSG (Open Science Grid) [17], TeraGrid [21], and NorduGrid [20], although it has been deprecated by the Globus Alliance. Globus MDS4 [25] does not run information providers on demand, but only at fixed intervals. Furthermore, it needs a central index server to maintain customized indexes and periodic caching. One of its well-known deployments is TeraGrid. However, both MDS2 and MDS4 suffer from problems of scalability and stability in their information cache mechanisms. Similar to MDS, BDII (Berkeley Database Information Index) was introduced into the LCG project to replace MDS as the front end of the information service. BDII uses a registry of sites and caches information every two minutes. The main shortcoming of BDII is that the index size of a site's BDII grows too quickly as queries in other sites' BDIIs increase. To improve information retrieval performance, Sun et al. [34] proposed a Resource Category Tree (RCT), which organizes resources based on their primary characteristics. RCT adopts the structure of a distributed AVL tree, in which each node represents a specific range of primary attribute values. To address the problems in uncooperative distributed information retrieval environments, Paltoglou [27] proposed an integral-based source selection algorithm. The algorithm models each source as a plot; it then uses the relevance scores and the intra-collection positions of the sampled documents by referring to a centralized sample index. RGMA (Relational Grid Monitoring Architecture) [5] uses a relational database to store information from diverse resources. Users can retrieve information via SQL queries.
Similarly, NAREGI (National Research Grid Initiative) [16] uses an object manager to distribute information about computing elements. Information is aggregated into a relational database. It is implemented as a grid service by means of the Globus service OGSA-DAI (Open Grid Services Architecture - Data Access and Integration). However, when using a database, the connection time and overhead workload are excessive during the retrieval process. In addition, a central server (database server [5,16] or information index server [27]) is needed. The fixed aggregation point (central server) constrains the scalability of the information service. It is also difficult to handle large numbers of tasks simultaneously. To provide a distributed information service, Grimoires (Grid RegIstry with Metadata Oriented Interface: Robustness, Efficiency, Security) [13] and gLite [9] use the web service framework to implement information discovery and retrieval. Service (resource information) discovery issues are compounded by a variety of different types of service directories. Using the web service framework, Pastore [19,28] proposed UDDI (Universal Description, Discovery and Integration) solutions by means of a UDDI registry deployed in an existing Globus-based grid. Pastore took into consideration the benefit of using private and secure UDDI nodes in grid environments for indexing, searching, and retrieving services. However, the web service framework does not support dynamic and varying information demands. To improve the scalability of information services, peer-to-peer (P2P) technology has been introduced into grid resource information discovery [23,30,35]. Cai and Hwang [3] proposed a scalable grid monitoring architecture that builds distributed aggregation trees (DAT). By leveraging Chord topology and routing mechanisms [33], the DAT trees are implicitly constructed from native Chord routing paths without the need to maintain membership. Marzolla et al. [24] proposed two P2P systems: tree-shaped and two-level hierarchical networks. Both systems are based on routing indexes. Shen [32] proposed a P2P-based intelligent resource discovery (PIRD) mechanism that weaves all attributes into a set of indices by using locality-sensitive hashing, and then maps the indices to a structured P2P overlay. PIRD incorporates the Lempel–Ziv–Welch algorithm to compress attribute information for higher efficiency.
Some researchers have also evaluated the performance of information discovery mechanisms. Paunovski [29] evaluates the performance of a selective discovery mechanism in a distributed bio-inspired multi-agent community. In [29], the open distributed network is treated as a fully decentralized, biologically inspired environment. The combination and aggregation of information have also been studied in distributed environments to reduce the traffic of information discovery [10]. After analyzing the state-of-the-art studies in grid information services, we find that the methods proposed above are passive from the perspective of resource information demanders. Since various scheduling algorithms are adopted by applications in multidisciplinary domains, passive information service mechanisms cannot satisfy the diversity of resource demands. Passive information services do not allow grid users to actively acquire the resource information that their schedulers need. Therefore, users should deploy their customized programs to directly retrieve resource information with the approval of resource owners. In this paper, we propose a scalable grid information service framework (PIVOT) to support an active information discovery mechanism. PIVOT also provides customized grid resource information discovery to fulfill the diverse information requirements of applications and resource schedulers.

3. Overview of the MASSIVE project

The MASSIVE project enables grid-based computational fluid dynamics (CFD) and computational solid mechanics (CSM), and provides computational services by using the OGSA architecture. It provides both grid services and application-oriented

Fig. 1. The layered structure of MASSIVE.


services. The architecture of MASSIVE is constructed with multiple layers, as illustrated in Fig. 1. The bottom layer is the hardware, which includes various heterogeneous distributed resources. The second layer is the security layer, GSI. The third layer contains the basic grid services provided by the grid middleware. The fourth layer contains services that are visually re-encapsulated from the services of the third layer. PIVOT resides in the third layer in order to provide scalable information services. The fifth layer consists of composite services that integrate application-oriented libraries. The top layer contains three environments that correspond to the steps of multidisciplinary problem solving. MASSIVE supports application-oriented service composition. It provides (1) the functionality of remote execution and monitoring for grid users; (2) specialized services for CFD and CSM applications, such as collaborative visualization, exploration, and analysis of numerical results; (3) an interface to a meshing service; and (4) the capability to visualize the meshes that are created. To minimize the risks of the remote execution of simulations, MASSIVE makes it possible for the user to keep the codes and data on a local machine, and to transfer them to a remote machine only when the time is right for execution. After the execution, all of the codes and data are erased from the remote site [22].

4. PIVOT information discovery framework

4.1. Problem modeling

Most wide-area distributed systems contain a variety of networked devices, including computer-based ones (CPU, memory, disk), network components (hubs, routers, gateways), and specialized data sources (embedded devices, sensors, data-feeds). In a grid, all of the shared resources form an ad hoc overlay network for grid applications. Each shared resource provides grid services dynamically with guaranteed quality, although it can join and leave the network freely without violating the service policy. Since no resource is privileged over any other, no hierarchy or centralized point of failure exists in this resource network. To optimize resource allocation and task scheduling on this resource network, grid users or resource brokers always need to collect real-time information about the shared resources. In the context of a computational grid, the resource information usually needs to be customized by a demander or scheduler to fulfill application-specific requirements. For example, scheduling communication-bounded grid applications requires the collection of bandwidths and communication latencies among resource nodes. The process of information collection is different from the file block lookup in P2P networks and the data query in distributed storage/database systems. In general, the process of information collection consists of two steps: first, the customized information collector is disseminated (the collector is a program or query that can execute on multiple platforms and retrieves specialized resource information with a privacy guarantee mechanism); second, information is collected from the covered resource nodes.
The main challenges of designing a collection mechanism include (1) provisioning scalability and adaptability of resource information discovery on the resource network according to the users' requirements, and (2) supporting measurable efficiency and cost in terms of overhead traffic and query/response latency. To model the problem, we denote N as the total number of shared resources in the network, m as the number of resources actually needed by the demander/scheduler, and n as the number of resources covered by a collection, where N > n > m. In a dynamic network, N is unknown to all resource nodes, including the demander/scheduler. The demander/scheduler is one of the N resources. The parameters n and m are designated by the demander/scheduler, where m is not directly involved in the process of information discovery. In traditional passive mechanisms of collecting information, such as the GIIS (Grid Index Information Service) of MDS and the UDDI of web service technology, the overlay resource network is tree-structured. In this paper, PIVOT exploits a P2P-like collector dissemination and information collection scheme based on a random graph overlay (a decentralized, unstructured P2P network). Hence, it supports pure peer infrastructures. Since each resource node acts as a peer of a decentralized, unstructured P2P network, the intrinsic resilience and decentralization enhance the scalability of the PIVOT framework. The information collection algorithms for disseminating application-specific collectors can adaptively adjust the coverage of the collector by tuning the parameter hops-to-live (HTL) or the number of needed resources n. The HTL is used to limit the maximum number of hops, which is positively related to the latency of a collection. The performance of information collection remains relatively constant while shared resources join and leave the network dynamically.
The collector (or query) dissemination begins from a randomly selected start node. First, the collector (or query) is disseminated to the start node. Then, the covered nodes forward the collector to their neighboring nodes that have not been covered. This process continues until either n nodes are covered or the HTL limit is exceeded. Every deployed collector retrieves application-specific resource information and returns it to the sending node. The information collection process looks similar to the flooding query/response of a decentralized P2P network. The main difference between them is that the information collection process of PIVOT will always succeed in obtaining the resource information from the covered resource nodes; a P2P query, however, may fail to get a valuable response from a peer node that does not have the requested contents. Without loss of generality, we assume that covered resources do not leave the network during the procedures of information collection and task execution. This is guaranteed by the resource sharing policy. In addition, the coverage of P2P flooding is restrained only by the HTL, while information collection is limited by both the HTL and n. To avoid multiple collections with the same parameters from a demander covering relatively the same range of the resource network, a random-walk approach is designed to select the start point of a collection flooding. In fact, the demander can save and reuse the start points of those historical collections that it finds useful.
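The deadline-constrained flooding described above can be sketched as a breadth-first traversal that stops as soon as either n nodes are covered or the HTL is exhausted. This is a minimal centralized illustration under our own assumptions (adjacency-list graph, function name), not the distributed protocol itself:

```python
from collections import deque

def disseminate(graph, start, htl, n):
    """BFS-style collector flooding limited by both HTL (max hops) and n (max covered nodes).

    graph: dict mapping node -> list of neighbor nodes (the overlay network).
    Returns (covered, hops_used); raises if n nodes cannot be covered within HTL hops.
    """
    covered = {start}
    frontier = deque([start])
    hops = 0
    while frontier and len(covered) < n and hops < htl:
        next_frontier = deque()
        for node in frontier:
            for nb in graph[node]:
                if nb not in covered:
                    covered.add(nb)          # collector delivered to nb
                    next_frontier.append(nb)
                    if len(covered) == n:    # stop as soon as n nodes are covered
                        return covered, hops + 1
        frontier = next_frontier
        hops += 1
    if len(covered) < n:
        raise RuntimeError("collection failed: HTL exceeded before covering n nodes")
    return covered, hops
```

The failure branch corresponds to the conflict between H and n discussed in Section 4.2: a too-small HTL makes covering n nodes impossible.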

G. Wei et al. / Information Sciences 180 (2010) 4543–4556

4547

4.2. Overhead traffic and communication latency

To evaluate the main metrics (overhead traffic and latency), we assume that: (1) resource nodes only have knowledge of their immediate neighbors; (2) the coverage of n nodes is an undirected random graph G(V, E) with a maximum vertex degree d, where the order of this graph is |V| = n; and (3) the communication latency between a pair of neighboring nodes (one-hop latency) is constant, denoted as l. We let T(n) denote the overhead traffic of the collection. There are four main types of traffic in a collection: (1) the collector disseminates from the demander to n nodes, denoted as T_d(n); (2) the retrieved real-time resource information is returned to the demander, denoted as T_r(n); (3) each covered node broadcasts its covered state to its immediate neighbors, denoted as T_b(n); and (4) each resource node sends the number of its uncovered neighbors back to the start node s and receives the command from s deciding whether or not to continue flooding, denoted as T_f(n). Then, T(n) = T_d(n) + T_r(n) + T_b(n) + T_f(n). Let B_d denote the size of the collector and B_m denote the size of the messages in T_b(n) and T_f(n). According to the spanning tree of the n covered nodes, T_d(n) = (n − 1)B_d, T_b(n) ≈ (nd − n + 1)B_m, and

T_f(n) ≈ 2B_m · Σ_{v∈V′} height(v) = nB_m · Σ_{i=1}^{h−1} i/2^{i−1},

where V′ is the set of intermediate nodes of the spanning tree and height(v) is the height of node v. Since the information from all of the resources in a collection is isomorphic, the intermediate resource nodes can aggregate the information from their upstream nodes on the information return path to reduce the traffic T_r(n). In general, the resource information is of a relatively small size. We assume that the return traffic on all edges of the spanning tree derived from G(V, E) is of approximately equal size, or that the differences in size are negligible given the information aggregation at the intermediate nodes. We denote the information size B_r.
Then, T_r(n) ≈ |E|·B_r ≈ (n − 1)B_r. Therefore,

T(n) ≈ (n − 1)B_d + (n − 1)B_r + (nd − n + 1)B_m + nB_m · Σ_{i=1}^{h−1} i/2^{i−1}
     ≈ n(B_d + B_r) + (nd + 1 + n · Σ_{i=1}^{h−1} i/2^{i−1} − n)B_m − (B_d + B_r).

In addition, we let L(n) denote the latency of the collector's dissemination for the n-node discovery. Intuitively, L(n) is determined by the maximum eccentricity of the start node, i.e., the greatest distance in hops from an arbitrary covered node to the start node s. Let dist(v, s) represent the number of hops from node v to s. Then, L(n) = l · (2 · max_{u∈V} dist(u, s) − 1). The n nodes construct a d-ary spanning tree of G(V, E) with root node s, with 1 ≤ deg(u) ≤ d for all u ∈ V. From graph theory, the height of the spanning tree, max_{u∈V} dist(u, s), is lower-bounded by h_low = ⌈log_d n + log_d(d − 1) − 1⌉. According to the demander's latency deadline (HTL), denoted as H, the height of the spanning tree is upper-bounded by h_up = H. Thus, l · h_low ≤ L(n) ≤ l · h_up. In practice, the two constraint conditions may lead to the following conflicts: (1) the HTL is too small to guarantee that the collection can cover n nodes before the number of hops of the query exceeds the HTL, or (2) the HTL is too large, i.e., the maximum number of hops is still far from the HTL when the collection has covered n nodes. The collection is considered to have failed when the former case occurs. For the latter case, the flooding hops should be controlled accurately, and the number of hops h must satisfy Σ_{i=0}^{h−1} |V_i| < n ≤ Σ_{i=0}^{h} |V_i|, where V_i is the set of nodes at a distance of i hops from the start node s.

4.3. Information collection algorithms

In a collection, the start node and the other n − 1 nodes cooperate to achieve best-effort performance on overhead and latency. The start node (1) accepts the requirements (or original parameters) from the demander/scheduler, (2) disseminates and deploys the collector, and (3) collects and returns the aggregated resource information back to the demander/scheduler with the cooperation of the other resource nodes. The Distributed Cooperative Algorithm for collecting resource information at the Start node (DCA-S) is described in descriptive pseudocode in Table 1.
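The closed-form estimates above can be evaluated numerically. This sketch simply computes T(n) from its four components and the latency bounds from the spanning-tree height; all parameter values in the test are hypothetical:

```python
import math

def overhead_traffic(n, d, h, Bd, Br, Bm):
    """Approximate total collection traffic T(n) = Td + Tr + Tb + Tf (Section 4.2)."""
    Td = (n - 1) * Bd                                 # collector dissemination
    Tr = (n - 1) * Br                                 # aggregated information return
    Tb = (n * d - n + 1) * Bm                         # covered-state broadcasts
    Tf = n * Bm * sum(i / 2 ** (i - 1) for i in range(1, h))  # flooding-control messages
    return Td + Tr + Tb + Tf

def latency_bounds(n, d, H, l):
    """Lower/upper bounds on collection latency, l*h_low <= L(n) <= l*H."""
    h_low = math.ceil(math.log(n, d) + math.log(d - 1, d) - 1)
    return l * h_low, l * H
```

For instance, with n = 8, d = 4, and H = 2 (the values of the example in Section 5), the spanning-tree height is already at its lower bound of 2 hops.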
The normal node accepts the collector and parameters from its preceding node, and returns aggregated information to that node. The Distributed Cooperative Algorithm in the Normal nodes (DCA-N) is described in Table 2. Since the start node is randomly chosen by the resource demander/scheduler, each resource node in the grid should implement both algorithms.

5. An example

A simple computational grid, whose hypothetical topology is illustrated in Fig. 2, is used as an example. In Fig. 2, each vertex represents a resource, and each edge represents a bidirectional connection between two resources. Resources A, B, and C constitute organization 1, while resources C, D, and E belong to organization 2. A resource can participate in one or more virtual organizations or be independent (belonging to no organization). In this example, the resources F, G, and H are independent. The total number of shared resources is N = 9, including the demander, and the maximum vertex degree is d = 4. In the collection, HTL = 2 and n = 8. Fig. 3 illustrates the process of collector dissemination. All of the covered resource nodes construct an overlay network in a logical way that resembles a peer-to-peer information sharing system. Fig. 4 illustrates the covered nodes sending back their retrieved information along the inverse of the paths used for collector dissemination. Note that node F is not covered, since the collector dissemination constrains the number of hops.


Table 1
DCA-S algorithm.

Input:
  C: the information collector or query, disseminated to retrieve application-specific resource information;
  H: hops-to-live (HTL), the deadline in hops of a collection for delivering the collector, H ≥ 1;
  n: the number of resources that the demander/scheduler wants the collection to cover, n > 1.
Output:
  I: the final aggregated resource information of the n nodes, including start node s;
  T: total overhead traffic of the collection;
  L: total communication latency of the collection.

Dcas(C, H, n) {
  // Initialization
  N_0 = 1;                     // N_i denotes the needed number of covered nodes up to the ith hop
  floodingFlag = True;         // controls the collector dissemination of the child nodes
  T = 0; L = 0; I = ∅;
  B_d = sizeOf(C); B_m = sizeOf(controlMessage);
  Begin {
    L = L + 1;
    N_L = N_(L−1) + |getValidNeighbors(s)|;  // getValidNeighbors(s) returns the set of neighbors of s that have sufficient service time and are not yet covered
    if (N_L ≥ n) {             // accomplish the collection in one hop
      floodingFlag = False;
      Choose n − 1 nodes randomly from the neighbors;
      Disseminate C to the n − 1 nodes;
      Wait for the resource information from the n − 1 nodes;
      I = getInfo(s) ⊕ I¹_ns;  // getInfo(s) returns the resource information of node s; Iⁱ_ns is the information set of the ith-hop covered nodes; ⊕ denotes information aggregation (data fusion)
      T = T + (n − 1)·B_d;
      return I, T, L;
    } // end of if
    // Collector dissemination with node coverage constrained by H and n
    Disseminate C to the Lth-hop nodes;
    T = T + |getValidNeighbors(s)|·B_d;
    while (L ≤ H) and (N_L < n) {
      Wait for the Lth-hop nodes to send back the numbers of their immediate neighbors;
      N_(L+1) = N_L + Σ_{v∈L} |getValidNeighbors(v)|;  // duplicated neighbors count once in the sum
      if (N_(L+1) < n) floodingFlag = True; else floodingFlag = False;  // s allows Lth-hop nodes to send C to (L+1)th-hop nodes while floodingFlag = True
      Disseminate floodingFlag to the Lth-hop nodes;
      T = T + 2L·(Σ_{v∈L} |getValidNeighbors(v)|)·B_m;
      L = L + 1;
    } // end of while
    if (L > H) and (N_L < n) return −1;  // failed due to the conflict between H and n
    // Collect resource information
    Wait for the returning resource information and traffic, which are aggregated at the 1st-hop nodes;
    I = getInfo(s) ⊕ I¹_ns;    // after all of the information has arrived
    T = T + Σ_{j=1}^{k} B_r(j); // k = |getValidNeighbors(s)|; j indexes the jth neighbor of s, and B_r(j) is the aggregated traffic of the returning information at node j
    return I, T, L;
  } // end of Begin
} // end of Dcas
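A centralized simulation can illustrate the DCA-S control flow of Table 1: per-hop gating by the start node, traffic accounting, and the failure case where H and n conflict. The graph encoding, message sizes, and function name below are illustrative assumptions, not part of the paper:

```python
def dca_s(graph, s, C_size, Bm, H, n, info):
    """Centralized simulation of the DCA-S control flow: the start node gates
    hop-by-hop flooding and accounts for the overhead traffic T and hop count L.

    graph: node -> neighbor list; info: node -> size of its returned record.
    Returns (I, T, L) on success, or None when H and n conflict (collection fails)."""
    covered = {s}
    frontier = [s]
    T, L = 0, 0
    while L < H and len(covered) < n:
        L += 1
        new_nodes = []
        for v in frontier:
            for nb in graph[v]:
                if nb not in covered and len(covered) < n:
                    covered.add(nb)
                    new_nodes.append(nb)
        T += len(new_nodes) * C_size       # collector copies disseminated this hop
        T += 2 * L * len(new_nodes) * Bm   # neighbor counts up to s, floodingFlag back down
        frontier = new_nodes
        if not new_nodes:                  # no further expansion possible
            break
    if len(covered) < n:
        return None                        # failed: HTL exhausted before covering n nodes
    T += sum(info[v] for v in covered if v != s)   # aggregated return traffic (Tr)
    I = {v: info[v] for v in covered}
    return I, T, L
```

In the real protocol the counting and gating are done by message exchange among nodes; here they are collapsed into one loop purely to show the bookkeeping.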

The process of collecting information from resources is defined as formula (1):

Deployment := (Collector, Results, Rules, HTL, n)    (1)

• Deployment. Deployment denotes the procedure of collecting information from resources. It is named with a fixed format, <domain:sequenceNumber>. For example, the string ''cesc.zju.edu.cn:3201'' denotes an information collector deployment issued by the host whose network domain name is ''cesc.zju.edu.cn,'' with sequence number ''3201.''
• Collector. The parameter Collector is the name of the information collector. A grid information demander may have diverse information collectors for its various scheduling objectives. In general, we name all Collectors by their executable file names. A Collector consists of an executable program, runtime environments (such as OS type, OS version, and necessary libraries), and the program version, as denoted in formula (2).

Collector := (Executable, Env, Ver)    (2)
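As an illustration, the Deployment and Collector tuples of formulas (1) and (2) can be modeled as plain records; the field types, default values, and the name-parsing helpers below are our own assumptions:

```python
from dataclasses import dataclass, field

@dataclass
class Collector:
    """Formula (2): an executable information collector."""
    executable: str                 # executable file name, used as the Collector's name
    env: dict                       # runtime environment: OS type/version, libraries
    ver: str                        # program version

@dataclass
class Deployment:
    """Formula (1): one collection deployment, named <domain:sequenceNumber>."""
    name: str                       # e.g. "cesc.zju.edu.cn:3201"
    collector: Collector
    results: list = field(default_factory=list)   # raw data extracted by the collector
    rules: dict = field(default_factory=dict)     # filtering/aggregation rules
    htl: int = 2                                  # hops-to-live
    n: int = 8                                    # number of nodes to cover

    @property
    def domain(self):
        return self.name.rsplit(":", 1)[0]

    @property
    def sequence_number(self):
        return self.name.rsplit(":", 1)[1]
```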


Table 2
DCA-N algorithm.

Dcan(i) {                       // i denotes the id of the current node
  stateFlag = False;            // initially, no node is covered
  Wait for C;
  stateFlag = True;
  Broadcast stateFlag to the neighbor set getValidNeighbors(i);
  Send |getValidNeighbors(i)| back to s;
  Wait for floodingFlag;        // i.e., until floodingFlag is accepted from s
  I(i) = getInfo(i);            // I(i) represents the returning resource information aggregated at node i
  T_r(i) = sizeOf(I(i));        // T_r(i) represents the aggregated traffic for collecting I(i)
  if (floodingFlag = True) {
    Broadcast C to the neighbor set getValidNeighbors(i);
    Wait for resource information and traffic sizes from getValidNeighbors(i);
    for j = 1 to |getValidNeighbors(i)| {
      I(i) = I(i) ⊕ I(j);       // I(j) comes from node j
      T_r(i) = T_r(i) + T_r(j); // T_r(j) comes from node j
    } // end of for
  } // end of if
  Send I(i) and T_r(i) to i's preceding node;
} // end of Dcan
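The return phase of DCA-N, in which each node fuses its own information with its children's and sums the traffic counters, can be sketched as a recursive walk over the dissemination spanning tree. The dictionary-merge aggregation and the string-length size proxy are illustrative assumptions:

```python
def aggregate(tree, node, info):
    """Return (aggregated_info, traffic) for the subtree rooted at node.

    tree: dict mapping node -> list of child nodes in the dissemination spanning tree.
    info: dict mapping node -> that node's own resource-information record (a dict)."""
    merged = dict(info[node])            # I(i) = getInfo(i)
    traffic = len(str(info[node]))       # Tr(i) = sizeOf(I(i)), crude size proxy
    for child in tree.get(node, []):
        child_info, child_traffic = aggregate(tree, child, info)
        merged.update(child_info)        # I(i) = I(i) ⊕ I(j)
        traffic += child_traffic         # Tr(i) = Tr(i) + Tr(j)
    return merged, traffic
```

Because aggregation happens at every intermediate node, the start node receives a single fused record per first-hop neighbor rather than one message per covered node, which is what keeps T_r(n) near (n − 1)B_r.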

• Results. In formula (1), the parameter Results is the set of output raw data extracted by the Collectors. In this set, each unit of raw data includes five fields, as defined in formula (3):

RawData := (Type, Resources, Attribute, Value, Timestamp)    (3)

There are two types of data, single and dual. The single type indicates that the data is information about an independent resource. For example, the raw data ''Single, 10.21.202.2, FreeMemory, 54.6, 08202008142310'' represents the host with IP address ''10.21.202.2'' that has 54.6 megabytes of free RAM at ''14:23:10, Aug 20, 2008.'' The dual type indicates that the data describes interactional attributes between two or more resources, such as the communication bandwidth and latency between two hosts. For example, the raw data ''Dual, <10.21.202.2,10.21.3.7>, Bandwidth, 95.3, 08202008142310'' means that the communication bandwidth between hosts ''10.21.202.2'' and ''10.21.3.7'' is 95.3 Mbps at ''14:23:10, Aug 20, 2008.''
• Rules. The parameter Rules is a database of semantics, knowledge, or transformation rules. It is defined by the information demander or scheduler and is used to filter and aggregate resource information.
• HTL. The HTL is designated by the information demander and is used to limit the collector's dissemination distance.
• n. This is the number of resource nodes that the collection aims to cover.
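A small parser can make the single/dual record layout of formula (3) concrete. The comma-separated wire format and the ''<a,b>'' pair syntax are taken from the examples above; the timestamp layout MMDDYYYYHHMMSS is inferred from them, so treat both as assumptions:

```python
from datetime import datetime

def parse_raw_data(record):
    """Parse one raw-data unit of formula (3): (Type, Resources, Attribute, Value, Timestamp)."""
    type_, rest = [p.strip() for p in record.split(",", 1)]
    if type_ == "Dual":
        # dual records name a resource pair, e.g. <10.21.202.2,10.21.3.7>
        inner, rest = rest.split(">", 1)
        resources = tuple(p.strip() for p in inner.strip(" <").split(","))
        rest = rest.lstrip(", ")
    else:
        resources, rest = [p.strip() for p in rest.split(",", 1)]
    attribute, value, stamp = [p.strip() for p in rest.split(",")]
    return {
        "type": type_,
        "resources": resources,
        "attribute": attribute,
        "value": float(value),
        # assumed layout MMDDYYYYHHMMSS, e.g. 08202008142310 -> 2008-08-20 14:23:10
        "timestamp": datetime.strptime(stamp, "%m%d%Y%H%M%S"),
    }
```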

Fig. 2. A hypothetical topology of a computational grid.


G. Wei et al. / Information Sciences 180 (2010) 4543–4556

Fig. 3. The process of collector dissemination.

Fig. 4. The process of information collection.

6. Scheduling algorithm based on PIVOT

We design a cost–time optimization algorithm constrained by a deadline and a budget to measure the effectiveness and performance of PIVOT. An earlier version of the algorithm is described in [6]. The algorithm ultimately reduces to a linear programming problem, whose objective function is defined as:

\min Z = w_m \left( \sum_{i=1}^{N} \sum_{j=1}^{M} C_j \cdot T_{ij} \cdot R_{ij} \right) + w_t \cdot T    (4)


Here, T_{ij} represents the time spent on task X_i when X_i is assigned to resource S_j. We assume that T_{ij} is known for each pair of i and j. C_j is the cost of the computational resource S_j per unit of CPU time. w_m and w_t are the weights of cost and time, respectively. If task X_i is actually assigned to S_j, then R_{ij} = 1; otherwise R_{ij} = 0. T stands for the duration from the beginning of the first task to the end of the last task. The problem constraints are listed as Eqs. (5)–(11).

R_{ij} = 0 \text{ or } 1 \quad (i = 1, 2, \ldots, N; \; j = 1, 2, \ldots, M)    (5)

T \le T_0    (6)

\sum_{i=1}^{N} \sum_{j=1}^{M} C_j \cdot T_{ij} \cdot R_{ij} \le M_0    (7)

\sum_{j=1}^{M} R_{ij} = 1 \quad (i = 1, 2, \ldots, N)    (8)

\sum_{i=1}^{N} R_{ij} \le 1 \quad (j = 1, 2, \ldots, M)    (9)

T_k = \max_{ij}\{R_{ij} \cdot T_{ij} \cdot n_k\} + \max_{ij}\{R_{ij} \cdot t_{ij}^{k}\}, \quad k = 1, 2, \ldots, q    (10)

T = \sum_{k=1}^{q} T_k + \max_{ij}\left\{ R_{ij} \cdot T_{ij} \cdot \left( 1 - \sum_{k=1}^{q} n_k \right) \right\}    (11)

The meaning of R_{ij} in Eq. (5) is the same as in Eq. (4). Eqs. (6) and (7) express the deadline and budget constraints, respectively. Eq. (8) states that each single task must be assigned to one and only one computational resource. Eq. (9) states that each computational resource processes at most one of the sub-tasks; this is because communications cannot happen until every sub-task has finished by the same percentage. Eq. (10) gives the duration from the end of the (k−1)th communication to the end of the kth communication; q stands for the number of communications between sub-tasks, and n_k is the percentage of work completed between these two communications. For each k, t_{ij}^{k} stands for the time of the kth communication for task X_i assigned to machine S_j (the value of R_{ij} determines whether X_i is assigned to S_j). Eq. (10) thus means that once another n_k (percentage) of every task has been finished, the kth communication takes place. The first ``max'' is used because the communication cannot happen until all of the tasks are ready for it. The term \max_{ij}\{R_{ij} \cdot t_{ij}^{k}\} stands for the time spent on the kth communication; ``max'' is used here because the quality of communication varies across network paths, and the slowest path determines the communication time of the whole task. Eq. (11) gives the completion time of the N tasks; after the last communication, 1 − \sum_{k=1}^{q} n_k of each task remains. Except for the requirement that R_{ij} be binary, all of the constraints in our model are linear. Thus, the model is straightforward and reduces to a classical Binary Integer Programming problem, for which many methods are available. The precondition for the algorithm is that sufficient resource information is acquired, such as information about network communication, resource prices, and special application libraries.
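To make the model concrete, the following minimal sketch brute-forces the objective (4) under constraints (8) and (9) for a toy instance. It deliberately drops the communication terms of Eqs. (10) and (11), so T here is simply the longest task time, and all numbers are invented for illustration; a real solver would use Binary Integer Programming rather than enumeration.

```python
from itertools import permutations

def schedule(T, C, wm, wt):
    """T[i][j]: time of task i on resource j; C[j]: cost per unit of CPU time.

    Returns (best Z, assignment), where assignment[i] is the resource of task i.
    """
    n_tasks, n_res = len(T), len(C)
    best = (float("inf"), None)
    # A feasible assignment under (8) and (9) is an injective map task -> resource.
    for perm in permutations(range(n_res), n_tasks):
        cost = sum(C[j] * T[i][j] for i, j in enumerate(perm))      # budget term
        makespan = max(T[i][j] for i, j in enumerate(perm))          # simplified T
        Z = wm * cost + wt * makespan                                # objective (4)
        if Z < best[0]:
            best = (Z, perm)
    return best

T = [[4, 10], [6, 3]]     # two tasks, two resources (invented times)
C = [5, 1]                # resource 0 is fast but expensive
Z, assignment = schedule(T, C, wm=1.0, wt=1.0)
```

With these numbers, putting task 0 on the expensive fast machine and task 1 on the cheap one minimizes the weighted sum of cost and makespan.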
Among these, the dynamic attributes of a valid resource network (including topology, bandwidth, and latency) are the most important for high-performance computing. Consequently, PIVOT can be used to discover the necessary information that traditional grid information services do not provide.

7. Experiment and evaluation

First, we evaluate the performance of the PIVOT framework with two simulations on random graphs generated by the BRITE [1,26] and GT-ITM [15,38] topology generators, respectively. Second, using the scheduling algorithm described in Section 6, which is based on the information service of PIVOT, we carry out two parallel computing experiments to validate the effectiveness of the framework.

7.1. Performance evaluation

We construct two random graphs: (1) a B-graph generated by BRITE and (2) a G-graph generated by GT-ITM. Both graphs are undirected and connected. Each node represents a computational resource, and each edge represents a connection between two resources. The B-graph has 1000 nodes and 1997 edges, while the G-graph has 6660 nodes and 10468 edges. The average node degree is approximately four in the B-graph and three in the G-graph. We assume that (1) the communication latency on each edge is 1 time unit (several milliseconds or seconds); and (2) the size of the collector is Bd = 20 KB, the size of a control message is Bm = 0.25 KB, and the average size of the returned information per node is Br = 1 KB. The overhead traffic is calculated as the number of transmitted bytes per edge. For an arbitrary edge, if the collector transmission traffic is a, the control message traffic is b, and the return information traffic is c, then the total traffic on this edge is a + b + c. The total overhead traffic of a collection is the sum of the traffic over all edges.
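The per-edge traffic accounting above can be sketched as a BFS flood. This sketch ignores PIVOT's deadline-constrained pruning and simply charges every traversed edge the collector, control, and return sizes (Bd + Bm + Br), so it is an illustrative upper bound rather than the paper's algorithm; the ring topology is invented for the example.

```python
from collections import deque

BD, BM, BR = 20.0, 0.25, 1.0   # collector, control, return sizes in KB (Section 7.1)

def flood(graph, start):
    """Flood a collector by BFS; return (latency in hops, total traffic in KB)."""
    depth = {start: 0}
    edges_used = 0
    q = deque([start])
    while q:
        u = q.popleft()
        for v in graph[u]:
            if v not in depth:
                depth[v] = depth[u] + 1     # 1 time unit of latency per edge
                edges_used += 1             # this edge carries collector + msgs
                q.append(v)
    latency = max(depth.values())
    traffic = edges_used * (BD + BM + BR)   # a + b + c summed over used edges
    return latency, traffic

ring = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}   # 4-node ring
lat, kb = flood(ring, 0)
```

Only edges on the BFS tree are charged here; a real flood may also pay control traffic on redundant edges, which is one thing the DCA algorithms are designed to bound.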
In our simulations, PIVOT randomly chooses a node from the graph as the start node to begin the collector dissemination and information collection processes. In addition, because the B-graph and G-graph are random graphs, we run each simulation with the same parameters 50 times and report the average latency and traffic in order to obtain stable results. The results of the simulations on the B-graph and G-graph are shown in Tables 3 and 4, respectively. The simulation results show that the information discovery latency scales well with the number of discovered nodes. In Table 3, the B-graph has 1000 nodes with an average degree of four. When the number of discovered nodes grows from 50 to 950, the communication latency only increases from 2.6 to 5.15 time units, and the average overhead traffic per covered node only grows from 41.30 KB to 81.72 KB. This shows that PIVOT is highly scalable and efficient. In Table 4, the G-graph has 6660 nodes with an average degree of three. We run similar information discovery simulations with the coverage ranging from hundreds to thousands of nodes. The results show the same level of scalability and efficiency. Notably, the average overhead traffic per covered node (from 111 KB to 174 KB) remains small and acceptable. We find that the PIVOT framework adapts well to the scale of information discovery in a random connected resource network.

7.2. Application experiment case 1

As an experiment case, a structural analysis of a crank using CSM is selected as a parallel grid task. The task is scheduled by the algorithm described in Section 6 after resource information is collected using the PIVOT framework designed in Section 4.

Table 3
The simulation results on the B-graph (information of n nodes discovered and collected, with n varying from 50 to 950).

Number of nodes (n) | Latency (time unit) | Total overhead traffic (KB) | Average overhead traffic per node (KB)
 50 | 2.6  |  2064.88 | 41.30
100 | 3.0  |  4771.62 | 47.72
150 | 3.4  |  9035.28 | 60.24
200 | 3.65 | 12780.10 | 63.90
250 | 3.65 | 16175.95 | 64.70
300 | 3.8  | 20058.60 | 66.86
350 | 3.7  | 23019.10 | 65.77
400 | 3.9  | 28016.35 | 70.04
450 | 3.95 | 32597.48 | 72.44
500 | 4.15 | 36517.65 | 73.04
550 | 4.5  | 43073.93 | 78.32
600 | 4.5  | 46579.10 | 77.63
650 | 4.5  | 50876.78 | 78.27
700 | 4.45 | 52650.18 | 75.21
750 | 4.55 | 58572.70 | 78.10
800 | 4.85 | 65430.19 | 81.79
850 | 4.9  | 70869.50 | 83.38
900 | 4.75 | 70905.83 | 78.78
950 | 5.15 | 77630.13 | 81.72

Table 4
The simulation results on the G-graph (information of n nodes discovered and collected, with n varying from 250 to 4750).

Number of nodes (n) | Latency (time unit) | Total overhead traffic (KB) | Average overhead traffic per node (KB)
 250 |  6 |  27897 | 111
 500 |  6 |  60179 | 120
 750 |  6 |  90692 | 120
1000 |  7 | 137732 | 137
1250 |  7 | 171726 | 137
1500 |  7 | 205505 | 137
1750 |  8 | 262377 | 149
2000 |  8 | 287286 | 143
2250 |  8 | 343608 | 152
2500 |  8 | 380706 | 152
2750 |  8 | 408463 | 148
3000 |  9 | 486833 | 162
3250 |  9 | 536369 | 165
3500 |  9 | 574592 | 164
3750 |  9 | 628200 | 167
4000 |  9 | 673585 | 168
4250 |  9 | 720586 | 169
4500 |  9 | 771325 | 171
4750 | 10 | 830623 | 174


After geometrical modeling, the crank is decomposed into 16 domains using Metis [4] and delivered to different computational resources for parallel execution; the data is finally visualized. The simulation solves the mechanics problem, merges the results, and presents the model with its displacement contour. Fig. 5 shows the corresponding geometrical model after discretization. The model is decomposed into 142070 meshes (elements). The simulation is both a computation-bound and communication-bound task. The 16 parallel processes utilize three networked grid hosts and must communicate periodically with a large volume of data. In this case, information about the resource network is very important for the scheduling algorithm; traditional grid information services, however, cannot provide this type of information. We conduct this simulation on MASSIVE with PIVOT. An information collector integrating the Network Weather Service (NWS) [37] is developed to collect network information. Computational resources with the same topology as in Fig. 2 are used, as shown in Fig. 6.

Fig. 5. Geometrical model after discretization.

Fig. 6. Resource information discovered by PIVOT.

4554

G. Wei et al. / Information Sciences 180 (2010) 4543–4556

Fig. 7. The steady vortex streets downstream near the jet nozzle.

First, we schedule the computation task with the MDS information service provided by the Globus Toolkit. MDS can only discover independent host information, such as processor, memory, and disk data; it cannot discover information related to communication between hosts, such as inter-host bandwidth. The scheduling algorithm therefore runs and selects resources according to computing power alone. The resulting schedule dispatches the parallel processes to nodes B, D, and G, and the task completes in approximately 87 s. Second, PIVOT is used to discover real-time resource information, including the bandwidth between covered hosts. Based on this information, we carry out the same parallel task with the algorithm described in Section 6. The resulting schedule dispatches the parallel processes to nodes C, D, and E, and the parallel task completes in approximately 37 s. These results show that PIVOT is effective: it assists the scheduling algorithm and dramatically improves the efficiency of the parallel task.

7.3. Application experiment case 2

In the second case, we use a numerical Computational Fluid Dynamics (CFD) program to test PIVOT. This CFD program simulates the vortex streets downstream of the nozzle of a plane's jet. The whole computational domain is divided into two types of sections: the physical domain, and one PML buffer zone at each end of the physical domain (illustrated in Fig. 7). Four resources (processors) are to be used to compute four subtasks of equal workload. A necessary condition for a candidate computational resource is that it must be able to use a special numerical library file. In practice, several different types of machines are available for the task, including an SGI Onyx 3900 supercomputer with 64 processors (labeled S1–S64), four SGI Octane workstations (labeled W1–W4), and eight single-processor personal computers, four of which are better than the rest (labeled P1–P8). Partial information on these available resources is listed in Table 5. The resources are all interconnected by a 100 Mbps network. To estimate the time required for computing and communications, we run the program on a supercomputer and record the milestone times. Analyzing the recorded time list, we find that although the communications are frequent, the data transferred during them is very small in size; compared with the time spent computing and waiting, the total data transfer time is negligible and can therefore be ignored in the results. First, we run the scheduling algorithm based on the information discovered by MDS; second, we run the same scheduling algorithm based on the information discovered by PIVOT. As the results in Table 6 show, the scheduling scheme is invalid under the condition 0.0655 ≤ wm:wt < 0.3122 when MDS is used to discover resource information.
The main reason is that MDS can neither provide information about the necessary library nor adaptively support add-on information discovery programs. In contrast, using the resource information provided by PIVOT makes the scheduling more effective, and the execution of the tasks correspondingly more efficient.
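The effect of the missing library attribute can be illustrated with a toy selection routine that mirrors Table 5. The dictionary contents and function names here are hypothetical: the point is only that a scheduler blind to the "necessary library" flag may pick the unusable machine P4, while a library-aware scheduler filters it out.

```python
# A subset of Table 5: estimated process time (s) and library availability.
resources = {
    "P1": {"time": 5465, "has_library": True},
    "P2": {"time": 5465, "has_library": True},
    "P3": {"time": 5465, "has_library": True},
    "P4": {"time": 5465, "has_library": False},   # fast, but missing the library
    "P5": {"time": 6072, "has_library": True},
}

def pick(resources, n, see_library=True):
    """Pick the n fastest resources; filter on the library flag only if visible."""
    pool = [r for r, a in resources.items()
            if a["has_library"] or not see_library]
    return sorted(pool, key=lambda r: resources[r]["time"])[:n]

mds_like = pick(resources, 4, see_library=False)   # may select P4 -> invalid schedule
pivot_like = pick(resources, 4, see_library=True)  # P4 filtered out -> valid schedule
```

This reproduces the middle rows of Table 6: the MDS-based scheme selects P1–P4 and is invalid, while the PIVOT-based scheme selects P1, P2, P3, P5 and is valid.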

Table 5
Partial information on the available resources.

Label  | Machine type  | CPU speed | Price | Estimated process time (s) | Necessary library
S1–S64 | Supercomputer | 600 MHz   | 10    | 1984                       | Yes
W1–W4  | Workstations  | 300 MHz   | 4     | 9348                       | Yes
P1–P3  | Desktop PC    | 2.0 GHz   | 1.2   | 5465                       | Yes
P4     | Desktop PC    | 2.0 GHz   | 1.2   | 5465                       | No
P5–P8  | Desktop PC    | 1.8 GHz   | 1     | 6072                       | Yes


Table 6
Results of the scheduling schemes.

Information service | wm:wt                   | Z     | Selected resources        | Validity
MDS                 | wm:wt < 0.0655          | 2081  | Four processors in S1–S64 | Valid
MDS                 | 0.0655 ≤ wm:wt < 0.3122 | –     | P1, P2, P3, P4            | Invalid
MDS                 | wm:wt ≥ 0.3122          | 24270 | P5, P6, P7, P8            | Valid
PIVOT               | wm:wt < 0.0655          | 2081  | Four processors in S1–S64 | Valid
PIVOT               | 0.0655 ≤ wm:wt < 0.3122 | 11079 | P1, P2, P3, P5            | Valid
PIVOT               | wm:wt ≥ 0.3122          | 24270 | P5, P6, P7, P8            | Valid

8. Conclusions

The information service is an important and necessary utility of all grid systems. The effectiveness of a grid task scheduling algorithm depends mainly on the quantity and quality of the discovered resource information. Since application-specific resource requirements differ widely across application domains, effective task scheduling on grids must be supported by sufficient resource information. A scalable and adaptive information service is therefore important to a computational grid system. In this paper, we proposed PIVOT, an adaptive information discovery framework. As an active information discovery mechanism, PIVOT supports the deployment of application-specific information collectors and customized resource information retrieval. PIVOT exploits a P2P-like collector dissemination and information collection scheme based on a random-graph overlay. Decentralized and resilient resource organization enhances the scalability of the framework, and the performance of information collection is not affected as shared resources dynamically join and leave the network. Under the constraints of covered resources and communication latency, two distributed cooperative algorithms were designed to achieve best-effort performance on overhead traffic and communication latency. In addition, we developed a cost–time optimization algorithm constrained by a deadline and a budget for scheduling parallel and dependent tasks using PIVOT. The optimization algorithm applies a Binary Integer Programming model to schedule communication-intensive parallel grid tasks effectively. The theoretical analysis and simulations show that PIVOT provides a scalable information discovery mechanism for random resource-sharing networks, and that the overhead traffic and latency of information collection can be kept within a relatively determinate range with respect to the user's requirements.
Simulations and real-world experiments demonstrated that PIVOT is highly adaptable for application-specific resource information discovery. Thus, it improves the accuracy of resource allocation and the efficiency of executing parallel grid tasks.

Acknowledgments

This work was supported in part by the National Natural Science Foundation of China under Grants No. 60803161 and No. 60673179, and by the Natural Science Foundation of Zhejiang Province of China under Grant No. Z106727. The authors thank the Center for Engineering and Scientific Computation, Zhejiang University, for permitting us to use its computational resources, with which this research project was carried out.

References

[1] BRITE topology generator, available at .
[2] J.J. Bunn, H. Newman, S. McKee, D.G. Foster, R. Cavanaugh, R. Hughes-Jones, High speed data gathering, distribution and analysis for physics discoveries at the Large Hadron Collider, in: Proceedings of the 2006 ACM/IEEE Conference on Supercomputing, New York, NY, USA, November 11–17, 2006.
[3] M. Cai, K. Hwang, Distributed aggregation algorithms with load-balancing for scalable grid resource monitoring, in: Proceedings of the 21st IEEE International Parallel and Distributed Processing Symposium, IPDPS'07, IEEE Computer Society, Long Beach, California, USA, March 26–30, 2007, pp. 1–10.
[4] R.W. Clough, J. Penzien, Dynamics of Structures, McGraw-Hill, New York, 1993.
[5] A. Cooke et al., R-GMA: an information integration system for grid monitoring, Lecture Notes in Computer Science 2888 (2003) 462–481.
[6] H. Feng, G. Song, Y. Zheng, J. Xia, A deadline and budget constrained cost-time optimization algorithm for scheduling dependent tasks in grid computing, in: Proceedings of the Second International Workshop on Grid and Cooperative Computing, GCC'03, Springer-Verlag, Shanghai, China, December 7–10, 2003, pp. 113–120.
[7] I. Foster, C. Kesselman, Globus: a metacomputing infrastructure toolkit, International Journal of Supercomputer Applications and High Performance Computing 11 (1997) 115–128.
[8] J. Frey, T. Tannenbaum, M. Livny, I. Foster, S. Tuecke, Condor-G: a computation management agent for multi-institutional grids, Cluster Computing 5 (2002) 237–246.
[9] F. Gagliardi, B. Jones, F. Grey, M.E. Bégin, M. Heikkurinen, Building an infrastructure for scientific grid computing: status and goals of the EGEE project, Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences 363 (2005) 1729–1742.
[10] M. Ganzha, M. Paprzycki, J. Stadnik, Combining information from multiple search engines—preliminary comparison, Information Sciences 10 (2010) 1908–1923.
[11] W. Gentzsch, Sun Grid Engine: towards creating a compute power grid, in: Proceedings of the First IEEE/ACM International Symposium on Cluster Computing and the Grid, CCGRID'01, IEEE Computer Society, Brisbane, Qld., Australia, May 15–18, 2001, pp. 35–36.
[12] GLORIAD, available at .
[13] GRIMOIRES – UDDI compliant web service registry with metadata annotation extension, available at .
[14] A.S. Grimshaw, W.A. Wulf, The Legion vision of a worldwide virtual computer, Communications of the ACM 40 (1997) 39–45.
[15] GT-ITM topology generator, available at .
[16] M. Hatanaka, Y. Nakano, Y. Iguchi, T. Ohno, K. Saga, S. Akioka, S. Matsuoka, Design and implementation of the NAREGI super-scheduler based on the OGSA architecture, IPSJ SIG Notes 57 (2005) 33–38.
[17] Z. Hou, J. Tie, X. Zhou, I. Foster, M. Wilde, ADEM: automating deployment and management of application software on the Open Science Grid, in: Proceedings of the 10th IEEE/ACM International Conference on Grid Computing, Banff, AB, Canada, October 13–15, 2009, pp. 130–137.
[18] J. Huai, H. Sun, C. Hu, Y. Zhu, Y. Liu, J. Li, ROST: remote and hot service deployment with trustworthiness in CROWN Grid, Future Generation Computer Systems 23 (2007) 825–835.
[19] W. Jie, W. Cai, L. Wang, R. Procter, A secure information service for monitoring large scale grids, Parallel Computing 33 (2007) 572–591.
[20] A. Konstantinov, P. Eerola, B. Kónya, O. Smirnova, T. Ekelöf, M. Ellert, J.R. Hansen, J.L. Nielsen, A. Wäänänen, F. Ould-Saada, Data management services of NorduGrid, in: Proceedings of the 2004 Computing in High Energy and Nuclear Physics Conference, CHEP'04, Interlaken, Switzerland, September 27 – October 1, 2004, p. 765.
[21] L. Lee, N. John-Paul, B. Eric, TeraGrid's integrated information service, in: Proceedings of the Fifth Grid Computing Environments Workshop at Supercomputing 2009, GCE09, Portland, OR, USA, November 20, 2009.
[22] L. Maria, D.W. Walker, GECEM: grid-enabled computational electromagnetics, Future Generation Computer Systems 24 (2008) 66–72.
[23] S. Marti, V. Krishnan, Carmen: a dynamic service discovery architecture, Technical Report HPL-2002-257, HP Laboratories Palo Alto, 2002, available at .
[24] M. Marzolla, M. Mordacchini, S. Orlando, Peer-to-peer systems for discovering resources in a dynamic grid, Parallel Computing 33 (2007) 339–358.
[25] MDS4, Monitoring & Discovery System (MDS4), available at .
[26] A. Medina, A. Lakhina, I. Matta, J. Byers, BRITE: an approach to universal topology generation, in: Proceedings of the Ninth International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems, MASCOTS'01, IEEE Computer Society, Washington, DC, August 15–18, 2001, pp. 346–353.
[27] G. Paltoglou, M. Salampasis, M. Satratzemi, Collection-integral source selection for uncooperative distributed information retrieval environments, Information Sciences 14 (2010) 2763–2776.
[28] S. Pastore, The service discovery methods issue: a web services UDDI specification framework integrated in a grid environment, Journal of Network and Computer Applications 31 (2008) 93–107.
[29] O. Paunovski, G. Eleftherakis, K. Dimopoulos, T. Cowling, Evaluation of a selective distributed discovery strategy in a fully decentralized biologically inspired environment, Information Sciences 10 (2010) 1865–1875.
[30] D. Puppin, S. Moncelli, R. Baraglia, N. Tonellotto, F. Silvestri, A grid information service based on peer-to-peer, Lecture Notes in Computer Science 2790 (2005) 454–464.
[31] M. Romberg, The UNICORE grid infrastructure, Scientific Programming 10 (2002) 149–157.
[32] H. Shen, A P2P-based intelligent resource discovery mechanism in Internet-based distributed systems, Journal of Parallel and Distributed Computing 2 (2009) 197–209.
[33] I. Stoica, R. Morris, D. Liben-Nowell, D.R. Karger, M.F. Kaashoek, F. Dabek, H. Balakrishnan, Chord: a scalable peer-to-peer lookup protocol for Internet applications, IEEE/ACM Transactions on Networking 11 (2003) 17–32.
[34] H. Sun, J. Huai, Y. Liu, R. Buyya, RCT: a distributed tree for supporting efficient range and multi-attribute queries in grid computing, Future Generation Computer Systems 24 (2008) 631–643.
[35] D. Talia, P. Trunfio, Web services for peer-to-peer resource discovery on the grid, in: M. Agosti, H.J. Schek, C. Türker (Eds.), Proceedings of the DELOS Workshop on Digital Library Architectures 2004, Edizioni Libreria Progetto, Cagliari, Italy, 2004, pp. 73–84.
[36] G. Wei, G. Song, Y. Zheng, C. Luan, C. Zhu, W. Wang, MASSIVE: a multidisciplinary applications-oriented simulation and visualization environment, in: L. Zhang, M. Li, A.P. Sheth, K.G. Jeffery (Eds.), Proceedings of the 2004 IEEE International Conference on Services Computing, SCC'04, IEEE Computer Society Press, Shanghai, China, September 15–18, 2004, pp. 583–587.
[37] R. Wolski, N. Spring, J. Hayes, The network weather service: a distributed resource performance forecasting service for metacomputing, Future Generation Computing Systems 15 (1999) 757–768.
[38] E.W. Zegura, K.L. Calvert, M.J. Donahoo, A quantitative comparison of graph-based models for Internet topology, IEEE/ACM Transactions on Networking 6 (1997) 770–783.
[39] C. Zheng, The PRAGMA testbed: building a multi-application international grid, in: H. Jiang, J. Wu, D. Feng (Eds.), Proceedings of the Sixth IEEE International Symposium on Cluster Computing and the Grid, CCGrid'06, IEEE Computer Society, Singapore, May 16–19, 2006.