Research of dynamic scheduling method for the air-to-ground warfare simulation system based on grid


Simulation Modelling Practice and Theory 18 (2010) 1116–1129

Contents lists available at ScienceDirect

Simulation Modelling Practice and Theory journal homepage: www.elsevier.com/locate/simpat

Yanfang Fu a,*, Fengju Kang b, Jianghua Qi c, Shimei Duan d

a Computer Science and Engineering College, Xi’an Technological University, Xi’an, Shaanxi Province, China
b College of Marine, Northwestern Polytechnical University, Xi’an, Shaanxi Province, China
c Xi’an Technological University, Shaanxi Province, China
d China Flight Test Establishment, Shaanxi Province, China

Article info

Article history:
Received 28 January 2009
Received in revised form 21 December 2009
Accepted 2 January 2010
Available online 25 January 2010

Keywords: Dynamic scheduling; HLA; Load balance; Simulation grid; Resource management

Abstract

The military field has a strong demand for automatic management of running simulations and for the sharing of simulation resources. To address the problems of current HLA-based simulation systems, this article combines HLA with grid technology and presents a framework for a simulation grid. It focuses on how to schedule tasks in a simulation grid environment and explores the dynamic dispatch of parallel tasks at the federation entity level. An improved heuristic scheduling algorithm is designed. The algorithm adjusts its decisions dynamically using real-time information on the operating status of the system, so it can respond promptly to changes in the characteristics of the simulation system, restore balance as the load fluctuates, and improve system performance, fault tolerance and load balancing. Taking an air-to-ground warfare simulation system as an example, simulation results verify that the method is effective and that it can help to raise resource utilization and to construct large-scale military simulation applications. © 2010 Elsevier B.V. All rights reserved.

* Corresponding author. Tel.: +86 29 83208084.
1569-190X/$ - see front matter © 2010 Elsevier B.V. All rights reserved. doi:10.1016/j.simpat.2010.01.006

1. Introduction

HLA (High Level Architecture) is an open, object-oriented architecture for distributed simulation. Its best-known characteristic is that it separates three concerns: the detailed realization of the simulation function, simulation operation management, and communications. It offers relatively independent supporting services that hide their own implementation details [1]. This gives large-scale distributed simulation better compatibility and extensibility as the requirements of simulation applications develop. The RTI (Run-Time Infrastructure) ensures interconnection, data exchange and interoperability among all the simulation nodes. However, HLA-based simulation still has certain shortcomings, namely a lack of management capability, of flexibility, and of scheduling of simulation resources [2]. Each simulation task is bound to storage or computing resources before the simulation starts, which leads to the following problems [3]:

(1) In a simulation, the computing capability of the nodes is similar. If their loads vary widely, an overloaded node becomes the bottleneck of the entire simulation system and degrades overall system performance.


(2) While a simulation task is running, a resource fault, such as the sudden crash of a computer, causes that task to fail and may cause the entire set of simulation tasks to fail.

(3) During modeling, the federation and the federate members are created according to the physical or mathematical model of the system, which carries a serious hidden risk of unequal load distribution. HLA/RTI lacks a load-balancing mechanism, so the entire simulation system may slow down or even become paralyzed.

Since the birth of grid technology in the 1990s, its development has never stopped and its application area keeps widening. The simulation grid introduces grid technology to counter the limitations of HLA and greatly mitigates the shortcomings above [4]. A simulation grid system can schedule parallel tasks and resources on the basis of the mainstream simulation protocols, providing automatic resource discovery, resource selection, automatic activation of federate execution, dynamic scheduling of simulation entities and automatic collection of simulation results. This technology not only solves the problems of running traditional simulation systems on the grid, but also overcomes the shortcoming that HLA-based systems do not support dynamic resource management and task scheduling.

This paper studies the dynamic scheduling algorithm. It adapts a heuristic algorithm to the features of simulation grid tasks, improves the efficiency and fault tolerance of the simulation application, and makes the simulation process transparent [5]. The deployment and the application of a simulation system in the grid are the two main concerns of this paper. The structure of the simulation grid system is therefore designed first.

2. The structure design of simulation grid system

The simulation grid improves the efficiency of resource management and task scheduling at two levels: the simulation method level and the software design level. At the method level, the simulation grid system needs to improve the simulation development flow and to define how simulation tasks are submitted and distributed. It also needs an optimized algorithm to choose grid resources and to implement task scheduling [6]. At the software design level, it uses the protocol standards of network management services to realize the deployment and application of the simulation system in the grid.

A simulation grid system is composed of many grid nodes. The simulation tasks are assigned to the grid nodes once execution starts [7]. The grid nodes are independent of each other both as physical entities and in their logical relations; they run in parallel and interact through the communication mechanism or the network [8,9].

2.1. Simulation Grid System Framework (SGSF)

The SGSF design is shown in Fig. 1. When a user submits a simulation task, the user is required to supply information such as the simulation entities, the number of entities, the simulation initialization parameters and the time parameters. The task is then decomposed into a queue of sub-tasks, which is sorted according to the needs of different users (such as user grade and task urgency). The framework is designed to improve and extend the management capabilities of HLA, and grid computing is used to share resources so that users can create, manage, execute and operate simulation entities [10]. The entities can interact in real time and withdraw automatically from the simulation federation. The simulation system should be able to

Fig. 1. Simulation grid system functional framework.


Fig. 2. Simulation grid scheduling framework. [Diagram labels: GridNode 1, GridNode 2, ..., GridNode n-1, GridNode n; static scheduling system; monitoring system; dynamic scheduling system.]

monitor simulation resources in real time and to optimize them dynamically in order to deal with unexpected incidents. Because all nodes in the distributed system are autonomous and dynamic, the scheduling algorithm needs comprehensive status information. From this point of view, a simulation task should be scheduled according to the current system status [11]. If some nodes are overloaded or only lightly loaded, load balancing should be one of the main goals of the next scheduling round [12]. This self-regulating capability is clearly needed in a warfare simulation system.

SGSF is composed of five parts: the simulation client, the static scheduling system, the monitoring system, the dynamic scheduling system and the simulation task assessment service. The SGSF scheduling framework is shown in Fig. 2. The dynamic scheduling system, the monitoring system and the static scheduling system are the main parts of SGSF and are introduced in detail below.

2.1.1. Monitoring system

The monitoring system periodically monitors and measures performance information of all kinds, covering the dynamic grid computing resources, the simulation tasks and the network [13]. The performance information is sent promptly to the other systems (the static scheduling subsystem and the dynamic scheduling subsystem). When the grid nodes run this service, the state of the running resources can be traced, the network is monitored in a distributed manner, and real-time data can be provided to the dynamic scheduling system for assessment and forecasting. These monitoring data support scheduling decisions. The monitoring system also identifies which grid resources are available: the number of available resources and their IP addresses are reported to the scheduling subsystem so that it can distinguish available from unavailable resources. This prevents simulation tasks from being assigned to a simulation node that does not exist.

On the whole, the monitoring function contains three parts: real-time resource surveillance, task monitoring and retrieval of historical information. They adopt a unified directory service; that is, grid directory services are established for information management, registration and information gathering in the distributed environment. Because the monitored information is of several kinds, such as the state of resources, the network status and the RTI, several monitoring methods are used [14]. According to the kind of parameter, SGSF divides the monitoring process into three categories. The state of the system itself is obtained through a hardware approach, the network status is obtained through SNMP (Simple Network Management Protocol), and the state of the RTI is obtained through RTI functions.

The dynamic scheduling system needs monitoring information: during the simulation it assesses the real-time monitoring data to determine the load status of the system and to carry out dynamic scheduling. Static scheduling also needs monitoring information: according to the initial performance of the resources, it distributes the simulation tasks reasonably so that the overall performance of the simulation system is improved [15].

Task monitoring is divided into real-time and non-real-time monitoring; the latter obtains the necessary task information from the historical database. The monitoring process obtains monitoring data at a user-set frequency or on request. The newly obtained data replace the outdated entries, and they are also sent to the historical database for archiving. When a special event occurs while a simulation task is running, such as a user error or a setting error caused by a value beyond the load threshold, this information is passed to the dynamic scheduling system and handled promptly, and the users are alerted in a readable or visual manner.

2.1.2. Static scheduling system

Static scheduling, shown in Fig. 3, launches the simulation distribution function after users submit simulation tasks. Using the static scheduling strategy, the system looks for suitable simulation nodes to run the tasks, based on the information


Fig. 3. Static scheduling. [Diagram labels: static scheduling management; task assign, execute management, HLA task and RTI (each shown three times, for grid node 1, grid node 2 and grid node n); RTI.]

that the monitoring system provides. The simulation tasks are distributed to the selected nodes, and one node may run many tasks. The system needs two tables: the simulation host table and the federate member list table. Both tables are handed to the dynamic scheduling system after the simulation tasks are distributed. The simulation host table mainly records the relationship between physical resources and simulation tasks. The federate member list table mainly holds federate information, such as the federate members, entity names and the configuration of the running environment.

2.1.3. Dynamic scheduling system

Fig. 4 shows the structure of dynamic scheduling in the simulation grid. Simulation applications obtain the “discover” service in the course of the simulation. This service provides statistics on the currently available resources in the grid (such as IP addresses and the number of nodes). By monitoring the various resources, it establishes whether the operating environment and the resources meet the basic requirements of the task. Then, based on the resource description and the resource situation reported by the grid, the static scheduling algorithm distributes the sub-tasks to the chosen grid nodes and starts the federates on those nodes in accordance with the running conditions of the task [16]. The monitoring system monitors the running sub-tasks during the simulation; it obtains the CPU utilization rate, delay time, port traffic and so on, and can display them in real time. The dynamic scheduling system receives the monitoring results and then, following the dynamic algorithm, migrates simulation tasks to achieve load balancing. Finally, all the simulation results, simulation process data and performance information are transferred to the assessment and report server, which completes the data reports.

Fig. 4. Simulation grid resources’ dynamic scheduling framework. [Diagram labels: Federate 1, ..., Federate k-1, Federate k, ..., Federate n; Agent; GridNode1; TCP/IP; DSM; Monitoring; Static scheduling; ManagementNode.]


TCP is adopted in the dynamic scheduling system. The Dynamic Scheduling Manager (DSM) starts after the static scheduling system has started. First, it receives the federation entities and the grid resource allocation table output by the static scheduling system. Although static scheduling has already assigned the federation members reasonably to the grid nodes, the load changes while the simulation runs, so a scheduling algorithm is needed to manage system resources dynamically and balance the load. Second, when the DSM finds that the operating performance of some grid resources has dropped markedly, it uses the real-time performance information provided by the monitoring system, based on current network monitoring information, to select the members that need to migrate and their target nodes. The dynamic scheduling system is therefore closely tied to the monitoring system. The dynamic algorithm mainly resolves the scheduling question of which tasks to schedule to which grid nodes so that the simulation runs well on matching resources; the algorithm is introduced in Sections 4 and 5. Finally, the relocation order is sent to the agents of the source node and the target node. The resident agent executes the relocation order sent by the DSM. It starts and ends federation members, and it also saves and sends the federation migration information. This information includes:

(1) Federate member end. The process of the federation member is ended.
(2) Federate member start. The agent receives the information that the federation member is starting and receives the entity’s own dynamic simulation data, which ensures that the simulation logic is correct after the federate re-enters the federation execution.
(3) Store. The agent stores the dynamic data of the entities that need to end and saves the federation information of the member that needs to end (federation member name, federation time, federation execution, and so on).
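The three kinds of migration information listed above can be illustrated with a small in-memory sketch. It is not the authors' agent implementation: the class `Agent`, its methods and the record layout are hypothetical names chosen for illustration, and the network transfer between agents is replaced by a direct function call.

```python
from dataclasses import dataclass, field


@dataclass
class Agent:
    """Hypothetical resident agent on one grid node."""
    node: str
    running: dict = field(default_factory=dict)  # federate name -> dynamic entity data
    stored: dict = field(default_factory=dict)   # federate name -> saved migration record

    def end_member(self, federate, federation, fed_time):
        """'Store' followed by 'Federate member end' on the source node."""
        record = {
            "federate": federate,        # federation member name
            "federation": federation,    # federation execution name
            "time": fed_time,            # federation time at the moment of ending
            "entities": self.running.pop(federate),  # dynamic data of the entities
        }
        self.stored[federate] = record
        return record

    def start_member(self, record):
        """'Federate member start' on the target node: restore the entity data."""
        self.running[record["federate"]] = record["entities"]


def relocate(source, target, federate, federation, fed_time):
    """Carry out one relocation order (assumed to come from the DSM)."""
    record = source.end_member(federate, federation, fed_time)
    target.start_member(record)
    return record
```

In this sketch the saved record is what lets the restarted member continue with correct simulation logic; a real agent would also have to rejoin the federation execution through the RTI, which is omitted here.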

2.2. Communication protocol design

SGSF uses a dual-channel communication mechanism consisting of distributed simulation task communications and management communications. Distributed simulation communications mainly use the HLA standard to interconnect the various simulators. Management communications are used for the management of resources and simulation tasks.

2.2.1. Distributed simulation tasks communications

At present, the RTI mechanism is widely used in the distributed simulation field, for example in Millennium Challenge 2002 and Joint Semi-Automated Forces for USJFCOM [17]. RTI is used more and more extensively in military simulation because it provides a standard interface for the customer’s application and shields it from many distributed computing details, such as establishing network connections and sending data requests. RTI specifies a group of interface services that support the federates in accordance with the provisions of the Federation Object Model (FOM) [18]. During a federation execution, all exchange of FOM data among federates shall occur via the RTI, and federates shall interact with the RTI in accordance with the HLA interface specification (Fig. 5).

The HLA provides a standard specification for accessing RTI services to support interfaces between federates and the RTI (see IEEE P1516.1). Federates use these standard interfaces to interact with the RTI. The interface specification defines how simulations interact with the infrastructure. However, since the interface and the RTI will be used for a wide variety of applications requiring data exchange of diverse characteristics, the interface specification says nothing about the specific federate data to be exchanged over the interface. Data exchange requirements between federates shall be defined in the FOM. Because this communication mode is efficient, the simulation grid system retains the RTI communication mechanism for data distribution and federation management [19].

2.2.2. Management communications

When load problems arise in the system, simulation tasks need to be transferred. When the scheduling algorithm decides that a federate member should move to another grid node, the management order is passed to the management agents on the nodes concerned; when an agent receives the order, it starts or ends the federate member accordingly.

Fig. 5. RTI communication. [Diagram labels: Federate A, Federate B, Federate N; RtiExec; FedExec; LRC (three times); RTI.]


The message format for dynamic scheduling is as follows (Fig. 6):

• The Type field is the type of the relocation message.
• The IP addresses field determines which grid nodes the message is sent to.
• The MoveType field has two meanings: to end and to start.
• The FederationName field stores the name of the federation execution whose simulation tasks are assigned to the grid nodes.
• The Federate Num field gives the number of federate members.
• The Check field covers the entire message. The field is first set to 0, then the one's complement is computed over each 8-bit unit of the message, and the result is written into the Check field.
• The Data field has several components, each of which is a federate name. The number of components is given by the Federate Num field.
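The field list above can be made concrete with a short packing sketch. The field widths in Fig. 6 are not recoverable from the text, so every width below (one byte for Type, MoveType, Federate Num and Check, four bytes for a single IPv4 address, length-prefixed names) is an assumption, as are the function names and the `MOVE_END`/`MOVE_START` encodings. The check computation is one plausible reading of the description: sum all bytes with the Check field set to 0, then take the one's complement of the low 8 bits.

```python
import socket
import struct

MOVE_END, MOVE_START = 0, 1  # hypothetical encodings of the two MoveType meanings


def checksum(payload: bytes) -> int:
    """One's complement of the 8-bit sum of all bytes (assumed reading of 'Check')."""
    return (~sum(payload)) & 0xFF


def pack_order(msg_type, ip, move_type, federation, federates):
    """Build one relocation message; all field widths are illustrative assumptions."""
    fed = federation.encode("ascii")
    names = [name.encode("ascii") for name in federates]
    # Type, IP address, MoveType, then a length-prefixed FederationName
    head = struct.pack("!B4sBB", msg_type, socket.inet_aton(ip), move_type, len(fed)) + fed
    head += struct.pack("!B", len(names))  # Federate Num
    # Data: one length-prefixed federate name per component
    data = b"".join(struct.pack("!B", len(n)) + n for n in names)
    zeroed = head + b"\x00" + data  # Check field is 0 while the checksum is computed
    return head + bytes([checksum(zeroed)]) + data


def verify(message: bytes) -> bool:
    """With a correct Check byte, the 8-bit sum of the whole message is 0xFF."""
    return sum(message) & 0xFF == 0xFF
```

A receiving agent would call `verify` before acting on MoveType. A single corrupted byte changes the 8-bit sum and is rejected, although this kind of checksum cannot detect every multi-byte error.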

3. Interactive timing process of SGSF

In SGSF, the simulation management system (the static and dynamic scheduling systems), the simulation resources, the grid resource monitoring system and the simulation services are built on an environment that provides both the grid and the RTI. They depend closely on the resources, information, data and management services that the grid provides, and together they constitute a “grid resource pool”. A simulation task therefore passes through a series of simulation activities and the interactions among them, as shown in Fig. 7. As this interactive process shows, the simulation management system is the key component of SGSF: it automates the management of the simulation application and improves the efficiency of the simulation.

Fig. 6. Dynamic scheduling message format.

Fig. 7. SGSF interactive timing process.


4. Dynamic scheduling algorithm modeling

Dynamic scheduling is the most important function of the simulation grid system. The modeling method and the concrete algorithm are therefore introduced in detail in Sections 4 and 5. HLA/RTI only addresses interaction, interoperability and integration; it does not address resource management, scheduling and related issues, so it is combined with the grid. In this way, the computing power and resource management capabilities that the grid provides allow distributed simulation applications to be carried out more effectively [20]. The main idea of the dynamic scheduling algorithm is to reduce the task scheduling problem to a general optimization problem and then to solve it with mathematical programming techniques.

Definition 1. The federation has m members, $F = \{f_1, f_2, \ldots, f_m\}$, and the simulation grid system can make use of n processors, $P = \{P_1, P_2, \ldots, P_n\}$; in general, $m > n$. The algorithm needs the following matrices.

Definition 2. Running cost matrix $Q_{m \times n}$:

$$Q_{m \times n} = \{\, q_{i,j} \mid 1 \le i \le m,\ 1 \le j \le n \,\} \qquad (1)$$

where $q_{i,j}$ is the running cost of federation member $f_i$ on processor $P_j$, measured while the federation member runs. If $q_{i,j} = 0$, federation member $f_i$ cannot run on processor $P_j$. To describe the distribution of tasks to processors, the distribution matrix is defined as

$$x_{i,j} = \begin{cases} 1, & \text{if federation member } f_i \text{ is distributed to processor } P_j \\ 0, & \text{if federation member } f_i \text{ is not distributed to processor } P_j \end{cases} \qquad (2)$$

To describe communication costs, $C_{m \times m}$ is the traffic matrix:

$$C_{m \times m} = \{ c_{i,j} \}, \quad i, j = 1, 2, \ldots, m \qquad (3)$$

where $c_{i,j}$ expresses the data traffic between $f_i$ and $f_j$, measured by the number of object classes and interaction classes of $f_j$ that federation member $f_i$ subscribes to; if $c_{i,j} = 0$, there is no data exchange between $f_i$ and $f_j$. Federation members may be restricted in the processors they can be allocated to. For example, a visual federation member must be allocated to an image display workstation, and the entities of some federation members need MATLAB or a semi-physical simulation system [21]. The priority matrix $A_{m \times n}$ is defined as

$$A_{m \times n} = \{\, a_{i,j} \mid 1 \le i \le m,\ 1 \le j \le n \,\} \qquad (4)$$

where $a_{i,j} = 1$ if $f_i$ cannot be allocated to $P_j$, and $a_{i,j} = 0$ otherwise. The load on processor $P_k$ is denoted $U_k$, $k = 1, 2, \ldots, n$; it measures the execution load of the federation members on that processor.

In an HLA-based simulation system, dynamic scheduling should follow this principle: the dynamic scheduling of federation members is achieved by imposing certain restrictions. The simulation grid system mainly takes into account three constraints: processor load, storage capacity and the real-time simulation step size. The processor load restriction can be expressed as

$$\sum_{i} u_i\, x_{i,k} \le \mathrm{Threshold}_k, \quad k = 1, 2, \ldots, n \qquad (5)$$

where $u_i$ is the load value of federation member $f_i$ and $\mathrm{Threshold}_k$ is the load capacity of processor $P_k$ (the simulation grid system may contain heterogeneous resources whose capacities differ). Constraint (5) states that the actual load, which comprises all the tasks running on a processor, cannot exceed the load capacity of that processor. The storage capacity constraint can be expressed as

$$\sum_{i} s_i\, x_{i,k} \le R_k, \quad k = 1, 2, \ldots, n \qquad (6)$$

where $s_i$ is the storage capacity that federation member $f_i$ requires for the simulation data it subscribes to and for its computation, and $R_k$ is the storage capacity of processor $P_k$ (again, heterogeneous resources may have different storage capacities). Constraint (6) states that the storage actually required by all the tasks on a processor cannot exceed the storage capacity of the processor itself. The real-time simulation step-size limit is expressed as

$$\max_{i}\,(\mathrm{pace}_i\, x_{i,k}) \le T_k, \quad k = 1, 2, \ldots, n \qquad (7)$$

where $\mathrm{pace}_i$ is the processor time that the simulation entities of federation member $f_i$ occupy to compute their task in one step, and $T_k$ is the time limit for all the tasks on processor $P_k$, that is, the simulation step size. Constraint (7) states that the time needed to process the tasks on a processor cannot exceed the real-time limit, namely the simulation step size of the system. The objective function $T$ of the migration strategy can be defined as

$$T(x) = \sum_{k} \sum_{i} \{\, q_{i,k}\, x_{i,k} + w\, a_{i,k}\, c_{i,k}\, u_{i,k}\, x_{i,k} \,\} \qquad (8)$$
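As an illustration of how constraints (5) to (7) and objective (8) could be evaluated for one candidate distribution matrix, the following sketch uses plain Python lists. It is not the authors' implementation. The function names are hypothetical, and because the text does not define $u_{i,k}$ and indexes $c_{i,k}$ by processor, both are simply passed in as m-by-n tables (an assumption); Eq. (8) is evaluated literally as written.

```python
def feasible(x, u, s, pace, threshold, R, T):
    """Check constraints (5)-(7) for a 0/1 distribution matrix x with m rows and n columns."""
    m, n = len(x), len(x[0])
    for k in range(n):
        on_k = [i for i in range(m) if x[i][k] == 1]  # members placed on processor k
        if sum(u[i] for i in on_k) > threshold[k]:    # (5) processor load
            return False
        if sum(s[i] for i in on_k) > R[k]:            # (6) storage capacity
            return False
        if on_k and max(pace[i] for i in on_k) > T[k]:  # (7) simulation step size
            return False
    return True


def objective(x, q, a, c, u_ik, w):
    """Evaluate Eq. (8); c and u_ik are assumed to be m-by-n tables."""
    m, n = len(x), len(x[0])
    return sum(
        q[i][k] * x[i][k] + w * a[i][k] * c[i][k] * u_ik[i][k] * x[i][k]
        for k in range(n)
        for i in range(m)
    )
```

A heuristic search would generate candidate matrices, discard those for which `feasible` returns `False`, and keep the candidate with the smallest `objective` value.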

In Eq. (8), the first term, $q_{i,k} x_{i,k}$, is the processing cost of each task on the processor it is distributed to, and the second term is the combined load caused by migration between processors. The constant $w$ weights the communication and load costs and compensates for the difference in their units [22]. In this paper, a heuristic method is used for the dynamic scheduling algorithm: subject to restrictions (5), (6) and (7), objective (8) is minimized, so that the resources of the simulation grid system are scheduled dynamically and the dynamic migration is completed.

5. Scheduling algorithm research and implementation

Heuristic methods are the most popular approach to dynamic scheduling. A heuristic method obtains some information in advance (including task granularity and inter-task communication) and is a non-analytical, approximate method that gradually approaches the optimal solution [23]. Experiments show that in confrontation simulation systems, load imbalance and errors often arise from heavy communication with individual entities or from an irrational allocation of resources at the beginning of a simulation. The dynamic scheduling system works closely with the monitoring system. Based on the real-time load situation, the dynamic scheduling system decides, through the scheduling algorithm, when to schedule a simulation task, which task to choose and where to migrate it. The algorithm mainly resolves the scheduling question of which tasks to schedule to which grid nodes so that the simulation runs well on matching resources. The dynamic scheduling algorithm is as follows:

(1) Initialize the dynamic scheduling parameters. Select the load indicators and set the threshold value of each indicator.
(2) Import the result table of static scheduling. This table records the one-to-one relationships between grid nodes and federation members and serves as the basis for dynamic scheduling.
(3) Set up low-frequency polling. This order is sent to the monitoring system, which monitors the load of the grid nodes or federation members at the polling frequency.
(4) Establish the load regulation queue. From the information reported by the monitoring system, the load condition of the current grid nodes is assessed through the grey cluster approach. The head of the queue is the most lightly loaded processor; the tail of the queue is the most heavily loaded processor.
(5) Scan the queue for a heavily loaded node:

    for (j = 1; j < n; j++) {
        POSITION pos = list.GetHeadPosition();
        while (pos != NULL) {
            POSITION cur = pos;                  // GetNext advances pos, so keep the current position
            CString state = list.GetNext(pos);
            if (state == "General") {
                // set up high-frequency polling
            } else if (state == "heavy load") {
                // go to step (6)
                list.RemoveAt(cur);
            }
        }
    }

(6) Traverse $C_{m \times m}$ and find the federation member $f_i$ on the heavily loaded processor $P_i$ that communicates least with the other members.
(7) Choose processor $P_k$ through the prediction model as the processor that best meets the performance needs. If the three restrictions are still met after $f_i$ joins, $f_i$ is assigned to $P_k$, and at the same time the migration instructions are sent to the agents of $P_k$ and $P_i$.

To solve the dynamic scheduling problem in the simulation grid system, a federation partitioning strategy based on an interaction priority algorithm is proposed. The strategy improves the efficiency of simulation running over the traditional strategy. The algorithm partitions entities into groups according to their interaction frequency: entities with a high interaction frequency are aggregated into one group and mapped onto the same processor for simulation.

6. Migration based on federate member

In distributed simulation, dynamic scheduling is generally realized through migration, which can take place at two levels: the weapon entity level and the federate member level. At present, studies of migration in computer scheduling concentrate on the entity level, but this method does not suit the grid environment because entity-level migration requires the state of each entity to be preserved and restored. HLA does not support entity migration, so it can only be completed through tedious steps, and its cost is necessarily higher than that of an ordinary migration.
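Before the migration mechanics are developed further, steps (4) to (7) of the algorithm in Section 5 can be summarized in a compact sketch. Several simplifications are assumptions rather than the authors' method: fixed thresholds stand in for the grey cluster assessment, "lightest feasible node first" stands in for the prediction model, and only the load constraint (5) is checked. All names and data layouts are hypothetical.

```python
def classify(load, light=0.3, heavy=0.8):
    """Threshold stand-in for the grey cluster assessment (assumption)."""
    if load >= heavy:
        return "heavy load"
    if load <= light:
        return "light load"
    return "General"


def plan_migration(node_load, placement, traffic, member_load, threshold):
    """Return (member, source, target) for one migration, or None if none is needed."""
    # Step (4): load regulation queue, lightest node at the head
    queue = sorted(node_load, key=node_load.get)
    # Step (5): pick the most heavily loaded node classified as 'heavy load'
    heavy = [node for node in queue if classify(node_load[node]) == "heavy load"]
    if not heavy:
        return None
    source = heavy[-1]
    # Step (6): the member on the source node with the least total data traffic
    members = [f for f, node in placement.items() if node == source]
    if not members:
        return None

    def comm(f):
        return sum(v for (a, b), v in traffic.items() if f in (a, b))

    member = min(members, key=comm)
    # Step (7): first (lightest) node that still satisfies load constraint (5)
    for target in queue:
        if target != source and node_load[target] + member_load[member] <= threshold[target]:
            return member, source, target
    return None
```

Moving the member that communicates least keeps the extra cross-node traffic small, which is the intent of step (6).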


In brief, dynamic scheduling in large-scale distributed simulation needs a special design, because the simulation grid requires migration that HLA does not support. The methods of dynamic scheduling and migration are therefore proposed at the federate member level.

6.1. The method of distribution migration

In a network environment, many computers are candidate destinations for a migrating process. Choosing the right grid node as the destination is in effect a process of resource organization and scheduling. The space and time costs, the impact of the operating system and the difficulty of implementation should all be considered in specific applications. Using migration to adjust a running simulation system allows the system to adapt to changes in the status of the communication network, in node load and in the interaction among federates.

At present, studies of process migration for the simulation grid are rare. Ref. [24] studied resource management and load balancing in an environment based on HLA and grid simulation. In view of the available research and the characteristics of the simulation grid system (notably its real-time requirements), application-level process migration is simpler than system-level migration and does not increase the complexity of scheduling. The simulation grid system therefore uses application-level process migration.

6.2. Migration process information activities

Throughout the migration of a federate member, the source federate member and the target federate member each pass through a series of states, which together complete the life cycle of a migration. The source federate member passes through the states join, run, pause, migrate and termination. The target federate member passes through the states start, join federation, resume (restoring the current information of the federate member) and run. To express the states of the members and the conditions that trigger the transitions, Figs. 8 and 9 describe the state changes and events of a pair of migrating federate members.

6.2.1. The state and migration of the source federate member

Join: In accordance with the HLA rules and the simulation logic, the source federate member completes its initialization and joins the federation execution.

Run: The federate member runs and stays in the normal state in accordance with the HLA rules.

Pause: During the run, if a simulation node is overloaded, interaction delays are too long, or for other reasons, the DSM makes a migration decision and then issues a migration request. The migration request triggers the suspension of the federate. In this state, the federate member’s information can be collected, encoded and saved. Meanwhile, the source federate member waits for the information request of the target federate member.

Migrate: After the DSM makes the migration decision, it chooses a suitable target node and establishes communication between the agents of the source node and the target node. This triggers a communication activity in which the current state information of the federate is transmitted: the source federate member transfers its current information to the target node.

Termination: The source federate member saves its current state, encodes it and sends it to the target federate member. After the target federate member resumes successfully, the DSM sends the source federate member a “successful

Fig. 8. Migration process changes of the source federate member.

Y. Fu et al. / Simulation Modelling Practice and Theory 18 (2010) 1116–1129

1125

Fig. 9. Migration process changes of the target federate member.

migration” signal. That is, it accesses termination state. In addition, if the entire simulation cycle does not happen, the source federate member will run based on the normal simulation logic until end. The simulation can also trigger a complete withdrawal activity and lead the source federate member to termination state. 6.2.2. The state and migration of the target federate member Start: In accordance with the HLA rules and logic simulation, initialization is completed by the source federate member. Join federation: After starting the target federate member join to the federation executive. In this state, the target federate member orders all of HLA class. Resume: The target federate member will request the information of the federation execution after successful join in simulation. When receiving such information, the target federate member begins to restore the state of the source federate member. All of this, the target member completed information rehabilitation activities with the help of its agent. Run: The federate member triggers a ‘‘re-run” activities after successful resume the implementation information, after the status restoration the federate member re-starts simulation, so the federate member run with the normal simulation states. In addition, according to its own logic two federate members converse each state. In the entire simulation cycle, the relationship of source and target members is only relative. After migration the target federate member also will enter the life cycle of the source federate member. From the entire of life cycle, the federate identity is mutual transformation. 7. System realization 7.1. Simulation grid system for the air-to-ground warfare Based on the warfare simulation and performance analysis, simulation grid system for the air-to-ground warfare can be used as an example, the system has been designed into distributed architecture (Fig. 10), and simulate warfare environment including the red and blue sides. 
This system is based on the grid and follows the characteristics of distributed simulation. The system entities, including aircraft, radar, artillery and missile entities, are involved in grid scheduling and can all be members of the federation. The system also includes a three-dimensional visual federation member and a management federation member. The three-dimensional visual federation member must be allocated to a graphics workstation, which provides a realistic view of the simulation process, including the battlefield scene, combat-related operations and their results. The management federation member must also be fixed on another node, because the user or administrator needs to manage all grid nodes and monitor the real-time simulation load.

Fig. 10. Logical structure. (Diagram labels: scheduling management federation, monitoring/evaluation, HLA/RTI, three-dimensional visual federation, and, across the red and blue sides, the aircraft, missile, radar and artillery federations.)

After the user submits the simulation task, the simulation grid system assigns the task according to the execution overhead and the publish/subscribe relationships, so that federate members are assigned to grid resources. The simulation then runs. In Fig. 11, some nodes are shown as available and the others as unavailable. Fig. 12 shows the performance information in the monitoring system. The test environment is: Pentium 4 3.2 GHz, 1 GB memory; Quidway S1224; 3Com Gigabit Ethernet; Windows XP; pRTI2.0.

While the simulation runs, the system not only monitors the performance parameters of the equipment in real time but also chooses a device to relocate to when a device is overloaded. As shown in Fig. 13, the load of the device with IP address 192.168.0.9 suddenly increases beyond the threshold, which would affect the other federation members on this device, so the scheduler migrates the aircraft federation from 192.168.0.9 to 192.168.0.5. The experiments show that dynamic scheduling achieves migration of the federation member and that the test results are correct.

7.2. Two experiments

Two experiments were carried out to test the method described above. The grid system has 13 grid nodes, which can join and leave automatically. There are 36 entities and one management federate member; the 3D vision federate member is placed on a graphics workstation.

7.2.1. Migration test

To simulate network congestion, the network load of this resource was artificially increased during the period from 80 to 100 s, which causes the abrupt change shown in Fig. 14.
When the network load exceeds the threshold value (the threshold is 40%), relocation operations take place on the local resource.

7.2.2. Comparison test of random distribution, static scheduling and dynamic scheduling

The random method uses a "shotgun" approach. The federate members are first shuffled and then sorted and grouped at random by computer, so the grid position of each federate member is determined at random. The system has four types of model: aircraft, radar, missile and artillery. The aircraft model is the most complicated and therefore occupies the most CPU. The 36 federate members of the simulation are tested with the random method, static scheduling and dynamic scheduling. During testing, if an abnormality or crash occurs when the RTI sends data, the system is automatically suspended for 5 s and then resumes the simulation. The load test conditions are the same for each grid node.
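The threshold rule used in the migration test of Section 7.2.1, and the overload example of Section 7.1, can be sketched as follows. This is a minimal illustration under stated assumptions: the 40% threshold and the addresses 192.168.0.9 and 192.168.0.5 come from the text, whereas the function name `pick_migration`, the "least-loaded node" target rule, the load values and the third address are hypothetical and do not reproduce the paper's heuristic.

```python
LOAD_THRESHOLD = 0.40  # 40% threshold quoted for the migration test


def pick_migration(loads, threshold=LOAD_THRESHOLD):
    """Return (source, target) if some node exceeds the threshold, else None.

    loads maps a node address to its current load as a fraction in [0, 1].
    The source is the most heavily loaded node above the threshold; the
    target is the least loaded node at or below the threshold (an assumed
    selection rule, used here only for illustration).
    """
    overloaded = [node for node, load in loads.items() if load > threshold]
    if not overloaded:
        return None
    source = max(overloaded, key=lambda node: loads[node])
    candidates = [
        node for node, load in loads.items()
        if load <= threshold and node != source
    ]
    if not candidates:
        return None  # every node is overloaded; nowhere to migrate
    target = min(candidates, key=lambda node: loads[node])
    return source, target


# Hypothetical load snapshot; only the first two addresses appear in the text.
snapshot = {"192.168.0.9": 0.55, "192.168.0.5": 0.10, "192.168.0.7": 0.25}
print(pick_migration(snapshot))  # ('192.168.0.9', '192.168.0.5')
```

Returning `None` when no node is below the threshold keeps the sketch from moving a federate member onto a node that is itself overloaded.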

Fig. 11. The state of simulation grid resource in controller.


Fig. 12. Performance information.

Fig. 13. Dynamic migration with entities of the federation members.

The results are shown in Fig. 15. Regardless of the number of grid nodes, the random method does not use grid resources effectively: as the number of nodes increases, its running time does not change noticeably. Static scheduling is relatively simple and has a small operating cost, but in the simulation grid the real-time running status of the nodes varies, which causes load-balancing problems that a static schedule is difficult to design and implement for. Therefore, in the grid environment, dynamic scheduling shows more obvious advantages and improves load balancing.
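The gap between random placement and load-aware placement can be illustrated with a toy calculation. The sketch below is not the paper's scheduling algorithm and does not reproduce Fig. 15: it swaps in a simple greedy rule (largest member first, onto the currently least-loaded node) as a stand-in for load-aware placement. The per-type costs, the even split of the 36 members across the four model types and the node count of 9 are all hypothetical values chosen for the example.

```python
import heapq
import random

# Hypothetical relative CPU costs; the text only says aircraft is heaviest.
COSTS = {"aircraft": 4, "radar": 2, "missile": 2, "artillery": 1}


def make_members(per_type=9):
    """Build a list of costs: per_type members of each model (36 by default)."""
    return [cost for cost in COSTS.values() for _ in range(per_type)]


def random_makespan(costs, n_nodes, seed=0):
    """Heaviest node load when each member is placed on a random node."""
    rng = random.Random(seed)
    loads = [0] * n_nodes
    for cost in costs:
        loads[rng.randrange(n_nodes)] += cost
    return max(loads)


def greedy_makespan(costs, n_nodes):
    """Heaviest node load when members go, largest first, to the least-loaded node."""
    heap = [(0, node) for node in range(n_nodes)]
    heapq.heapify(heap)
    for cost in sorted(costs, reverse=True):
        load, node = heapq.heappop(heap)
        heapq.heappush(heap, (load + cost, node))
    return max(load for load, _ in heap)


members = make_members()
print(greedy_makespan(members, 9))  # 9, the ideal value 81 / 9
print(random_makespan(members, 9))  # never below 9; depends on the seed
```

With these assumed costs the total work is 81 units, so no placement on 9 nodes can have a heaviest node below 9. The greedy rule reaches that bound, while a random placement can only match or exceed it.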


Fig. 14. Grid node’s network load. (Axes: network load, 0% to 40%, versus time, 0 to 280 s.)

Fig. 15. Comparison test of random distribution, static scheduling and dynamic scheduling.

8. Conclusion

Resource allocation in a multi-machine environment is a well-known NP-hard problem. This study shows that the dynamic scheduling algorithm is an effective method, which works as an internal parallel distribution and is easy to combine with other approaches. Furthermore, it costs little and is easy to implement. Applied to the grid-based distributed interactive simulation system, it optimizes the scheduling of system resources and improves the efficiency of the system. It provides a useful exploration for the further development of more complex grid simulation applications.

Acknowledgment

The authors wish to thank the anonymous referees for their careful reading and constructive comments on the paper.

References

[1] I. Foster, C. Kesselman, J.M. Nick, S. Tuecke, Grid services for distributed systems integration, IEEE Computer 35 (6) (2002) 37–46.
[2] I. Foster, N. Karonis, A grid-enabled MPI: message passing in heterogeneous distributed computing systems, in: Supercomputing’98, vol. 14, 1998, pp. 121–125.
[3] L. Guy, P. Kunszt, E. Laure, H. Stockinger, K. Stockinger, Replica management in data grids, Global Grid Forum 5 (2002) 278–280.
[4] C. Wei, C. Yan, C.X. Meng, Distributed database system in the interpretation of the rules of distribution of a heuristic algorithm, Computer Journal 10 (19) (2004) 136–139.
[5] S. Wu, D. Sweeting, Heuristic algorithms for task assignment and scheduling in a processor network, Parallel Computing 20 (1994) 1–14.
[6] R. Buyya, M. Murshed, GridSim: a toolkit for the modeling and simulation of distributed resource management and scheduling for grid computing, Concurrency and Computation: Practice and Experience 14 (2002) 1175–1220.
[7] A.R. Tripathi, N.M. Karnik, Trends in multiprocessor and distributed operating systems designs, Journal of Supercomputing 9 (1995) 23–49.
[8] J. Watts, M. Rieffel, S. Taylor, A load balancing technique for multiphase computations, in: Proceedings of High Performance Computing ’97, June 1997, pp. 15–20.
[9] H. Casanova, Simgrid: a toolkit for the simulation of application scheduling, in: Proceedings of the First IEEE/ACM International Symposium on Cluster Computing and the Grid, IEEE Computer Society, Brisbane, Australia, May 2001, pp. 430–437.
[10] A. Legrand, L. Marchal, H. Casanova, Scheduling distributed applications: the SimGrid simulation framework, in: Proceedings of the Third IEEE/ACM International Symposium on Cluster Computing and the Grid, IEEE Computer Society, Tokyo, Japan, May 2003, pp. 138–145.
[11] H. Song, X. Liu, D. Jakobsen, R. Bhagwan, X. Zhang, K. Taura, A. Chien, The MicroGrid: a scientific tool for modeling computational grids, Scientific Programming 8 (3) (2000) 127–141.
[12] R. Buyya, M. Murshed, GridSim: a toolkit for the modeling and simulation of distributed resource management and scheduling for grid computing, Concurrency and Computation: Practice and Experience 14 (13) (2002) 1175–1220.
[13] R. Wolski, Experiences with predicting resource performance on-line in computational grid settings, ACM SIGMETRICS Performance Evaluation Review 30 (4) (2003) 41–49.
[14] P. Holub, M. Kuba, L. Matyska, M. Ruda, Grid infrastructure monitoring as reliable information service, in: The Second European Across Grids Conference, Nicosia, Cyprus, January 2004.
[15] B. Hamidzadeh, L.Y. Kit, D.J. Lilja, Dynamic task scheduling using online optimization, IEEE Transactions on Parallel and Distributed Systems 11 (2000) 11–13.
[16] M. Hamdi, C.K. Lee, Dynamic load-balancing of image processing application on clusters of workstations, Parallel Computing 22 (2002) 1477–1492.
[17] K.K. Jain, V. Rajaraman, Lower and upper bounds on time for multi-processor optimal schedules, IEEE Transactions on Parallel and Distributed Systems 5 (8) (1994) 879–886.
[18] M. Maheswaran, H.J. Siegel, A dynamic matching and scheduling algorithm for heterogeneous computing systems, in: Proceedings of the Seventh IEEE Heterogeneous Computing Workshop (HCW’98), IEEE Computer Society Press, January 1998, pp. 57–69.
[19] M. Willebeek-Lemair, A.P. Reeves, Strategies for dynamic load balancing on highly parallel computers, IEEE Transactions on Parallel and Distributed Systems 9 (4) (1993) 979–993.
[20] J. Cao, S.A. Jarvis, S. Saini, D.J. Kerbyson, G.R. Nudd, ARMS: an agent-based resource management system for grid computing, Scientific Programming 10 (2002) 135–148 (Special Issue on Grid Computing).
[21] J.N. Cao, A.T.S. Chars, Y.D. Sun, et al., A taxonomy of application scheduling tools for high performance cluster computing, Cluster Computing May (2004).
[22] T.D. Braun, H.J. Siegel, N. Beck, et al., A comparison study of static mapping heuristics for a class of meta-tasks on heterogeneous computing systems, in: Proceedings of the Eighth IEEE Heterogeneous Computing Workshop (HCW’99), IEEE Computer Society Press, July 1999, pp. 15–29.
[23] M.Y. Wu, W. Shu, H. Zhang, Segmented min–min: a static mapping algorithm for meta-tasks on heterogeneous computing systems, in: Proceedings of the 9th Heterogeneous Computing Workshop, IEEE Computer Society, January 2000, pp. 56–64.
[24] M. Maheswaran, S. Ali, H.J. Siegel, et al., Dynamic matching and scheduling of a class of independent tasks onto heterogeneous computing systems, in: Proceedings of the 8th IEEE Heterogeneous Computing Workshop (HCW’99), IEEE Computer Society, January 1999, pp. 30–44.