Computers and Electrical Engineering 29 (2003) 625–642 www.elsevier.com/locate/compeleceng
Simulation and control of distributed robot search teams
Robert L. Dollarhide a,1, Arvin Agah b,*
a Signal Exploitation and Geolocation Division, Southwest Research Institute, San Antonio, TX 78238, USA
b Department of Electrical Engineering and Computer Science, The University of Kansas, Lawrence, KS 66045, USA
Received in revised form 14 April 1999; accepted 9 September 1999
Abstract

This article describes the simulation of distributed autonomous robots for search and rescue operations. The simulation system is utilized to perform experiments with various control strategies and team organizations for the robot team, evaluating the comparative performance of the strategies and organizations. Once deployed in a multi-room environment (a floorplan), the objective of the robot team is to cover as many rooms as possible. The simulated robots are capable of navigating through the environment and can communicate using simple messages. The simulator maintains the world, provides each robot with sensory information, and carries out the actions of the robots. It keeps track of the rooms visited by robots and the elapsed time in order to evaluate the performance of the robot teams. The robot teams are composed of homogeneous robots, i.e., identical control strategies are used to generate the behavior of each robot in the team. The ability to deploy autonomous robots, as opposed to humans, in hazardous search and rescue missions could provide immeasurable benefits. © 2003 Elsevier Science Ltd. All rights reserved.

Keywords: Evolutionary robotics; Distributed robotics; Robot search teams; Multi-agent systems
1. Introduction

It was not long ago that the images and public perception of robots were limited to the extreme visions created by science fiction writers and the entertainment industry. Today, however, it is not uncommon to read an interesting article about recent advances in robotics, watch a robot search the surface of Mars [15] on the nightly news, or even encounter one in the workplace [2]. As robots make such inroads into our daily lives, it becomes increasingly apparent how
* Corresponding author. Tel.: +1-785-864-7752; fax: +1-785-864-0387. E-mail address: [email protected] (A. Agah).
1 Work performed while Robert L. Dollarhide was at the University of Kansas.
doi:10.1016/S0045-7906(01)00048-9
they can benefit society. Nowhere is this more evident than in situations where one or more robots could replace humans in a dangerous situation. One area of study that has recently piqued the interest of the robotics community is the use of robots in search and rescue operations. Search and rescue operations require a massive effort by rescuers in very dangerous environments. Collapsed and unstable buildings, leaking gas lines, and fire are only a few of the things that pose a threat to the lives of human rescue teams. The ability to deploy autonomous robots, as opposed to humans, into this type of environment to search for survivors provides immeasurable benefits. The potential for using robots in place of humans requires addressing questions such as: (1) What type of robot can move effectively in an unknown and dynamic environment? (2) How many robots should be used to efficiently cover the most area in a search operation? (3) How should the robots be controlled? These and related issues can be addressed through simulation experiments in which teams of deployed robots are implemented, tested, evaluated, and improved upon.

This paper presents the design and implementation of a computer simulation for area coverage by a team of robots in a search and rescue mission [9]. A graphical computer simulation program was developed as part of this project in C++ [8] using OpenGL [18]. The simulator is provided with a variety of parameters that specify the scope of the experiment, and it then displays the robots as they work their way through a specified floorplan. In each time step, a robot uses its sensors to detect objects, and the sensory data is then mapped to the rule set (situation-action pairs), resulting in the associated action. The performance of a robot team is rated by the percentage of total rooms (enclosed areas) entered within a pre-specified amount of time. Teams reaching complete coverage (100%) are also rated on how quickly total coverage was achieved. A snapshot of the simulated search robots, deployed in a multi-room floorplan, is shown in Fig. 1.
Fig. 1. The simulated robot search team deployed in a specific floorplan.
2. Robotics

Robotics is an extremely broad field encompassing a variety of applications and research interests. From the robots used on factory assembly lines [2] to those conducting Mars exploration for NASA [15], there are seemingly endless possibilities relating to the visions and uses of robotics. The definition of the word robot is itself dependent upon both who is defining it and its intended context. For our purposes, an intelligent robot is defined as: "A machine able to extract information from its environment and use knowledge about its world to move safely in a meaningful and purposive manner" [4].

Most researchers agree that the methods used for controlling autonomous robots can be divided into three general categories: deliberative, reactive, and hybrid systems. The deliberative approach (also referred to as planner-based) is a strategy where intelligent tasks are implemented by a reasoning process operating on an internal model of the world [12]. This approach dominated the artificial intelligence community for years, resulting in the development of a standard architecture by the US Government in the 1980s which reflected the deliberative model [4]. Rodney Brooks, Director of the Massachusetts Institute of Technology's Artificial Intelligence Laboratory, refers to deliberative architectures as a sense-model-plan-act (SMPA) framework [6].

In contrast, reactive systems do not maintain an internal model of the world and apply some form of simple mapping between stimuli and responses [13]. Ronald Arkin of the Georgia Institute of Technology gives the following precise definition of the approach: "Simply put, reactive control is a technique for tightly coupling perception and action, typically in the context of motor behaviors, to produce timely robotic response in dynamic and unstructured worlds" [4]. The robotics community began to take interest in reactive systems in the mid-1980s, as many of the shortcomings of deliberative control for mobile robots became apparent. Specifically, deliberative autonomous systems displayed a number of deficiencies such as brittleness, inflexibility, and slow response times when operating in complex and dynamic environments [12]. Speed of response was a key weakness for deliberative systems. Maja Mataric of the University of Southern California suggests that the primary division "between reactive and deliberative strategies can be drawn based on the type and amount of computation performed at run-time" [13].

In 1985, Brooks presented a new and fundamentally different architecture for controlling mobile robots using a purely reactive approach [7]. His architecture consisted of a number of asynchronous modules that were layered into a hierarchy for controlling a robot. Each control module was computationally simple and could run in parallel with other modules. To accomplish complex tasks, higher-level modules could subsume lower-level modules, temporarily suspending their operations. A large amount of research using the subsumption (reactive) architecture followed Brooks' initial study.

Hybrid architectures for controlling mobile robots have inevitably arisen, incorporating the benefits of both deliberative and reactive systems. Most hybrid architectures utilize a reactive approach to handle real-time issues concerning the robot's performance within its real-world environment, while a deliberative approach is used for higher-level planning and complex computation [6].
These approaches frequently divide the system into a low level (reactive) layer, a high level (deliberative) layer, and an interface layer between the two. Reactive architectures and behavior-based architectures are most often considered identical. However, extremes exist regarding how basic or how complex a system can become while still
being classified as reactive or behavior-based. Mataric contends that there is a fundamental difference between reactive and behavior-based systems [13]. She suggests that though behavior-based systems contain properties or even components of a purely reactive system, their computation need not be as limited. In this way, behavior-based systems can store various forms of state and can implement different representations. Furthermore, she suggests that behaviors are more time-extended than the reflexive actions of a reactive system.

As interest in reactive systems grew, researchers inevitably attempted to mimic biological systems using machinery and computational systems for the purpose of accomplishing a desired task. Arkin describes how neuroscience "provides a basis for understanding and modeling the underlying circuitry of biological behavior" [4]. He points out that a number of psychological schools of thought have inspired robotics researchers over the years. In particular, the study of behaviorism (which originated in the early 1910s) has secured a solid foundation within robotics. This method of study is based upon observation only, in which everything is considered in terms of stimulus and response [4]. In this context, numerous research efforts have examined biological behaviors in hopes of imitating complex animal behaviors in autonomous mobile robots. Arkin makes a correlation between animal and artificial intelligence by defining intelligence as "a system with the ability to improve its likelihood of survival within the real world and where appropriate to compete or cooperate successfully with other agents to do so" [4].

Recently, this study of biological behavior has extended into the world of multi-agent systems. Sociobiological behaviors have been studied and emulated using groups of mobile robots. Arvin Agah of the University of Kansas examined individual and collective robot learning using a Tropism System Cognitive Architecture based on the likes and dislikes of the robot agents [1]. Some of the work in this area has relied upon observations of ant and bee colonies and their ability to carry out global tasks using the limited local information of individual agents within the system. Within the last decade, researchers have begun to focus on robotic systems consisting of multiple robots, either homogeneous or heterogeneous, to accomplish one or more tasks. Advantages of distributed robotics include robustness, flexibility, the distributed nature of the system, and simpler individual robots. Examples of such work include [3,5,10,14,16,17].
3. The simulator

The simulator program developed for this study was created using the GNU C++ compiler version 2.7. The simulation consists of four elements: parameters, testing environments, robots, and robot control (rule sets), each of which is described in this section.

3.1. Simulator parameters

Before launching a run for one or more robots, the user is prompted for a number of parameters that define the scope of the experiment. The user is first asked whether to use the graphics mode. The simulator can be run either in a graphics mode, which displays the robots as they maneuver through an area, or in a 'no graphics' mode. Primarily, the no graphics mode is
used when utilizing methodologies for generating controllers, i.e., rule sets, since the amount of time it takes to complete a run is greatly reduced. Both modes are identical in every aspect except for the presence of the graphics display. The display was created using OpenGL version 1.2 [18]. An instance of the no graphics mode is when the simulator utilizes genetic algorithms to identify near-optimal rule sets for governing the actions of the robot teams. A genetic algorithm is a search algorithm based upon the mechanics of natural selection and natural genetics [11].

Next, the user must specify the number of robots to deploy and the desired sensing range for each robot in the team. The user can select from 1 to 20 robots. Additionally, the user can select one of three sensing ranges. Sensing range 1 uses a more immediate sensing area around the robot, while sensing ranges 2 and 3 enlarge the range to encompass an increasingly larger area. Robot team members are homogeneous and therefore all have the identical sensing range. Once the robot parameters are specified, the user selects an environment (a floorplan) through which the robots will maneuver. There are three environments to select from: (1) a home floorplan, (2) an office floorplan, and (3) a hotel or apartment complex floorplan. The floorplans consist of 6, 12, and 24 rooms, respectively (these are described in depth later in this section).

The simulator then asks the user to specify a time limit, in units of seconds, for the robots to cover the selected floorplan. One simulation time unit represents a tenth of a second. The robots will only have the specified amount of time to move through the area and will have their performance evaluated upon completion of the time period. Performance is measured by the percentage of rooms entered. For instance, a team that enters 55% of the rooms in an environment receives an initial performance score of 55. If the robot team enters 100% of the rooms, performance is also measured by how quickly the team accomplished total coverage, by adding points to the robot team's initial score. The less time it takes to achieve 100% coverage, the more points are added to the initial score.

Once these parameters have been specified, the program asks the user whether to use the genetic algorithm. By answering 'yes,' the user will ultimately generate a rule set through genetic evolution. A 'no' response indicates that the user will test a specific rule set that has already been developed or evolved. Whether using the simulator to test a developed rule set or to evolve a set of rules using the genetic algorithm, it is necessary to specify the number of rules that exist within a set. A rule set can contain from a minimum of 1 to a maximum of 50 rules. If testing a developed rule set, the number input by the user must match the number of rules in the actual set. When evolving rule sets with the genetic algorithm, the rules are initially randomly generated for the user. The rule set format is described later. If the user does not choose to use the genetic algorithm, the program launches the simulation as specified by the user parameters. If the user decides to use the genetic algorithm, the size of the population and the number of generations must also be specified. For a population of size n, the program will randomly generate n sets of rules.
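These parameters could be gathered into a single configuration record along the following lines. This is a sketch only; the struct and field names are our own illustrative assumptions, as the paper does not show the simulator's internal data structures.

```cpp
// Hypothetical grouping of the user-supplied parameters described above;
// the struct and field names are illustrative, not taken from the simulator.
struct SimulationConfig {
    bool useGraphics;     // graphics vs. 'no graphics' mode
    int  numRobots;       // 1 to 20 homogeneous robots
    int  sensingRange;    // 1, 2, or 3 (increasingly larger sensing area)
    int  floorplan;       // 1 = home (6 rooms), 2 = office (12), 3 = hotel (24)
    int  timeLimitSecs;   // coverage time limit; one time unit = 0.1 s
    bool useGeneticAlg;   // evolve rule sets vs. test an existing rule set
    int  rulesPerSet;     // 1 to 50 rules per rule set
    int  populationSize;  // GA only: n randomly generated rule sets
    int  numGenerations;  // GA only: number of generations to evolve
};
```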
Once the rule sets are created, the program initiates the simulation for the first member of the population using the first randomly generated rule set. When the user-defined time limit for the simulation is reached, the performance of the robot team is evaluated and a new simulation is launched for the next member of the population using its associated randomly generated rule set. This process repeats until all members of the population have completed simulations using their rule sets and have been evaluated. Once this first generation finishes, the genetic algorithm creates a new population for the second generation from the population members of the first, based on their fitness. This process repeats until all generations
have completed their simulations for all population members. The genetic algorithm approach creates a rule set file from the best performing population member of the final generation, which can be used for future testing.

3.2. Testing environments

There are three testing environments (floorplans) through which the robots maneuver: the home floorplan 1, the office floorplan 2, and the hotel or apartment floorplan 3, consisting of 6, 12, and 24 rooms, respectively. The floorplans are shown in Fig. 2.
Fig. 2. Floorplans: (a) home floorplan 1, (b) office floorplan 2, and (c) hotel/apartment complex floorplan 3.
A testing environment is defined within a space of 300 × 300 units. In graphics mode, this is a window containing 300 × 300 pixels. The viewing size can be adjusted. A white border defines the perimeter of each testing environment. The interior of the space consists of simple white walls over a black background (in a pattern similar to a maze). Yellow semi-circular trip wires are found at the openings to rooms and are used for evaluating team performance (not for controlling robot actions). An open area in the lower left corner of each floorplan is the location where robot teams start at the beginning of every simulation.

3.3. Robots

Robots are represented in the simulation by colored circles with a black line from the center to the outer edge indicating a robot's current forward direction. The robots can change between any one of three states (red, green, and blue) throughout the course of a simulation. The colors are intended to represent state communication, which can be detected by other robots. Each robot senses in eight directions (every 45° about the robot's center). These directions are identical to the standard directions of a compass (N, NW, W, SW, S, SE, E, NE), with North being the robot's current forward direction (Fig. 3).

3.4. Robot control

Robot control is implemented using rule sets, which are comprised of a predetermined number (n) of rules (specified by the user). Each rule is represented by a 141-bit string, making the rule set a bit string 141 × n in length. The string format (Fig. 4) defines two distance values and an entity value (a DDE) for each of the robot's eight sensors. In addition, a state value and an action value are defined for mapping a robot's sensed world to a specific action. Each of the value types is described below:
Fig. 3. Sensing directions of a robot.
[Fig. 4 depicts the 141-bit rule string: eight 17-bit DDE fields (bits 0–135), each consisting of two 7-bit distance values and a 3-bit entity value, followed by a 2-bit state value (bits 136–137) and a 3-bit action value (bits 138–140).]
Fig. 4. Rule format for robot control.
• The 16 distance values within the rule bit string (two for each sensor) represent range boundaries within which an entity specified by the entity value is detected. Each distance value is seven bits in length, providing a range from zero to 128 units. One distance unit is equivalent to one element of the 300 × 300 array used to represent the testing grounds. If the simulation were run in graphics mode (a 300 × 300 pixel window), robot sensors would be capable of sensing a distance of up to 128 pixels from the robot center.
• Each of the eight entity values within a rule string requires three bits and is associated with two distance values. Two distance values and one entity value define one of the eight DDEs contained within a rule (one DDE for each sensor). Entity values range from zero to four, indicating the following: the absence of any detected object, a red robot, a green robot, a blue robot, or a wall.
• The state value requires two bits of the rule bit string and indicates the current state of a robot. State values range from zero to two, representing a red, green, or blue state, respectively. Should a rule's state value match a robot's current state, it is more likely that the action associated with that rule will be selected. Mapping a robot's sensors to a particular rule is addressed in depth later in this section.
• The action value is three bits and represents the action that is to be carried out by the robot should the sensed world match the associated rule. Action values range from zero to five, allowing for the following six actions: forward, turn left, turn right, turn red, turn green, or turn blue, respectively.

Rule sets are read from a text file when running a simulation without the genetic algorithm. The format of the text file (Table 1) is identical to that of the file created following rule set evolution using the genetic algorithm.

Table 1
Sample rule in text format

Sensor                   Dist. Val. 1   Dist. Val. 2   Entity Val.
Forward                  0              0              0
Right Forward            10             40             4
Right                    0              0              0
Right Back               10             40             4
Back                     0              0              0
Left Back                0              0              0
Left                     0              0              0
Left Forward             0              0              0
State/Null/Action Val.   1              0              2
The format consists of nine rows and three columns. Each of the first eight rows gives the two distance values and the entity value for one of the robot sensors. The ninth row contains the state value, an irrelevant zero value (used to fill the second column of the ninth row), and the action value. Each row begins with an identifying tag character that is used when parsing the rule set text file. Table 1 illustrates a rule in text file format; the numeric entries constitute the actual rule. This rule (one of n rules in a rule set) can be interpreted as follows: when a green (state value of one) robot's Right Forward and Right Back sensors detect an entity value of 4 (a wall) at a range between 10 and 40 units and all other sensors detect nothing, the robot should turn right (action value of two). A user might create a rule such as the one in Table 1 based upon the following reasoning: if the robot detects a wall to the Front Right and Back Right, but not to the immediate Right, it has most likely encountered an opening and should turn right to investigate.
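A minimal sketch of how such a 141-bit rule might be represented and decoded is given below. Only the bit widths come from the text; the struct names, the helper function, and the choice to read fields most significant bit first are our assumptions.

```cpp
#include <bitset>
#include <cstddef>

// One DDE per sensor: two 7-bit distance values and a 3-bit entity value.
struct DDE  { int dist1; int dist2; int entity; };
struct Rule { DDE sensors[8]; int state; int action; };

// Extract `width` bits starting at `pos`; reading most significant bit
// first is an assumption, as the paper does not state the bit order.
static int field(const std::bitset<141>& bits, std::size_t pos,
                 std::size_t width) {
    int v = 0;
    for (std::size_t i = 0; i < width; ++i)
        v = (v << 1) | (bits[pos + i] ? 1 : 0);
    return v;
}

Rule decode(const std::bitset<141>& bits) {
    Rule r;
    std::size_t p = 0;
    for (int s = 0; s < 8; ++s) {                 // eight 17-bit DDE fields
        r.sensors[s].dist1  = field(bits, p, 7);  p += 7;
        r.sensors[s].dist2  = field(bits, p, 7);  p += 7;
        r.sensors[s].entity = field(bits, p, 3);  p += 3;
    }
    r.state  = field(bits, p, 2);  p += 2;        // bits 136-137
    r.action = field(bits, p, 3);                 // bits 138-140
    return r;
}
```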
4. Simulator implementation

4.1. The world

The entire simulation environment is maintained within a 300 × 300 array of integer values. Array elements can contain one of nine possible values, zero through eight, indicating that the position within the array is empty; contains a trip wire; is white (a wall); is red, green, or blue; or is red, green, or blue in a space over a trip wire, respectively. This can be viewed as a large 300 × 300 grid where each square (an element of the array) can contain a color (a value) used to 'draw' the robots' world. Table 2 lists the nine possible array values.

At the beginning of every simulation, the specified floorplan is 'drawn' into the 300 × 300 array. All walls and trip wires are first placed in the array. Next, the robots are positioned over the existing array. In most cases, robots are placed over array elements containing zero values (empty). However, if a trip wire first occupies the position, the robot value inserted into the array element is between six and eight, indicating that the space contained a trip wire (this assists in replacing the proper trip wire and empty values after a robot passes through a portion of the array).
Table 2
Element values in the 300 × 300 environment array

Value   Represents
0       Empty
1       Trip wire
2       Wall
3       Red (over empty)
4       Green (over empty)
5       Blue (over empty)
6       Red (over trip wire)
7       Green (over trip wire)
8       Blue (over trip wire)
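The placement rule for robots over trip wires could be sketched as follows. The values follow Table 2, while the enumerator and function names are hypothetical:

```cpp
// Cell values from Table 2; the enumerator and function names are ours.
enum Cell {
    EMPTY = 0, TRIP_WIRE = 1, WALL = 2,
    RED = 3, GREEN = 4, BLUE = 5,
    RED_ON_WIRE = 6, GREEN_ON_WIRE = 7, BLUE_ON_WIRE = 8
};

int world[300][300];   // the environment array, 'drawn' at simulation start

// Place a robot-colored cell, remembering any trip wire underneath so it
// can be restored after the robot moves on (cf. Fig. 5).
void placeRobotCell(int x, int y, Cell color /* RED, GREEN, or BLUE */) {
    world[y][x] = (world[y][x] == TRIP_WIRE) ? color + 3   // 3/4/5 -> 6/7/8
                                             : color;
}

// Restore the underlying value when the robot leaves the cell.
void clearRobotCell(int x, int y) {
    world[y][x] = (world[y][x] >= RED_ON_WIRE) ? TRIP_WIRE : EMPTY;
}
```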
Fig. 5. Assigning values to environment array elements.
Fig. 5 illustrates a graphical example of how this occurs using 10 × 10 arrays.

4.2. Collision detection and control

Collisions between two robots or between a robot and a wall are detected by checking the values of the environment array in 16 positions around the perimeter of a robot. With every unit of time that passes, the simulator checks an array element at the distance of the robot's radius + 1 for every 22.5° about the robot's center. Should the array element contain a wall or a robot value, a collision flag is set within the primary sensing function of the robot object in the simulation program. If a collision is detected within 90° of a robot's forward direction, a redirection value is established. Collisions detected to the left of the robot's forward direction result in a redirection value of intended_direction + ((angle_of_collision + 90) mod 360). Collisions detected to the right of the robot's forward direction result in a redirection value of intended_direction + ((angle_of_collision − 90) mod 360).

[Fig. 6 shows a collision detected −45° from the intended direction (i.e., at 315°), producing an angle of redirection of 0° + ((315° + 90°) mod 360°) = 45°.]
Fig. 6. Robot redirection.
Fig. 6 illustrates how redirection values are set. If a redirection value is set as a robot attempts to move straight ahead, the robot's forward direction value remains unchanged while its true direction is adjusted by (intended_direction + redirection) mod 360.
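A sketch of the perimeter check and the redirection formulas follows. The coordinate and angle conventions are assumptions, as is our reading that relative angles above 180° (e.g., 315° = −45°) count as "left of forward"; the paper gives only the formulas quoted above.

```cpp
#include <cmath>

extern int world[300][300];   // environment array (Section 4.1 sketch)

// Probe 16 points at radius + 1, every 22.5 degrees around the robot.
// Returns the collision angle relative to forward (degrees), or -1 if none.
int detectCollision(double cx, double cy, double radius, double forwardDeg) {
    const double DEG = 3.14159265358979 / 180.0;
    for (int k = 0; k < 16; ++k) {
        double rel = k * 22.5;
        double a = (forwardDeg + rel) * DEG;
        int x = (int)(cx + (radius + 1.0) * std::cos(a));
        int y = (int)(cy + (radius + 1.0) * std::sin(a));
        if (x < 0 || x >= 300 || y < 0 || y >= 300) continue;
        int v = world[y][x];
        if (v == 2 || (v >= 3 && v <= 8)) return (int)rel;  // wall or robot
    }
    return -1;
}

// The paper's redirection formulas; angles above 180 degrees are read
// here as collisions to the left of the forward direction.
int redirection(int intendedDir, int collisionAngle) {
    if (collisionAngle > 180)   // left side: +90
        return intendedDir + (collisionAngle + 90) % 360;
    else                        // right side: -90 (kept non-negative in C++)
        return intendedDir + ((collisionAngle - 90) % 360 + 360) % 360;
}
```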
4.3. Robot sensing

The method used for robot sensing is similar to that applied to collision detection. Sensing is accomplished by checking the values of the environment array in eight directions around the center of a robot. With every unit of time that passes, the simulator checks array elements within a specified range for every 45° about the robot's center, starting at the angle of the forward direction (Fig. 7). For instance, if the sensing range is 50 units, the simulator checks every array element between radius + 1 and 50 units along the lines at 0°, 45°, 90°, 135°, 180°, 225°, 270°, and 315° about the robot's center (if the forward direction happened to be 0°). In order to give the sensors a slightly wider spread, every fourth array element along the lines ±5° from the primary sensor direction is also checked. For example, when sensing down a line 90° from the robot's forward direction, every array element along that line is checked, as is every fourth element along the lines at 85° and 95° from the forward direction.

Sensing begins at the element radius + 1 from the center of a robot and continues down the sensing line until a robot or wall is detected or the end of the sensing range is reached. If an entity is detected, the distance to the entity and the associated entity value are placed into an array for the sensed data (Table 3). Once an entity is detected or the end of the sensing range is reached, the robot senses along the line of the next sensor. This continues until all sensing lines have been checked. Once complete, the sensed data array represents the robot's sensed world, which can then be mapped to a specific rule to determine the robot's action.
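The sensing sweep might be implemented along these lines; the names and coordinate conventions are assumptions, and the ±5° side rays are indicated only by a comment:

```cpp
#include <cmath>

extern int world[300][300];   // environment array (Section 4.1 sketch)

struct Sensed { int distance; int entity; };   // one row of Table 3

// Sweep the eight sensing lines, every 45 degrees from the forward
// direction, from radius + 1 out to the range limit.
void sense(double cx, double cy, double radius, double forwardDeg,
           int range, Sensed out[8]) {
    const double DEG = 3.14159265358979 / 180.0;
    for (int s = 0; s < 8; ++s) {
        double a = (forwardDeg + s * 45.0) * DEG;
        out[s].distance = 0;
        out[s].entity   = 0;                    // 0 = nothing detected
        for (int d = (int)radius + 1; d <= range; ++d) {
            int x = (int)(cx + d * std::cos(a));
            int y = (int)(cy + d * std::sin(a));
            if (x < 0 || x >= 300 || y < 0 || y >= 300) break;
            int v = world[y][x];
            if (v == 2 || (v >= 3 && v <= 8)) { // wall or robot cell
                out[s].distance = d;
                // Map array values to entity codes: wall -> 4,
                // red/green/blue (over empty or trip wire) -> 1/2/3.
                out[s].entity = (v == 2) ? 4 : (v - 3) % 3 + 1;
                break;
            }
            // (The simulator also checks every fourth element along rays
            //  at +/-5 degrees from this line to widen the sensor.)
        }
    }
}
```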
[Fig. 7 depicts a sensing line of environment array elements extending from radius + 1 to the range limit n along the intended direction.]
Fig. 7. Simulation of robot sensing.
Table 3
Sample sensed data array

Sensor position   Distance to entity   Entity type
Front             0                    0
Right Front       0                    0
Right             0                    0
Right Back        14                   2
Back              0                    0
Left Back         0                    0
Left              26                   4
Left Front        37                   4
4.4. Matching rules

A robot's action is selected after matching its sensed data array to a specific rule in its rule set. This mapping is analogous to playing the classic children's game, Battleship. The sensed data can be thought of as a player's guess as to where the opponent's ships lie. These guesses are then placed over the opponent's playing field (a rule) and hits are scored. This is done for every rule in the rule set. The rule that takes the most hits is determined to be the match. Rule scores are maintained in a separate scoring array. For each distance and entity entry in the sensed data array (one of each for all eight sensors), the matching function determines whether:

• The distance falls within the range specified by the distance values for the associated sensor in the rule.
• The entity value in the sensed data array matches the entity value in the rule.

If the above criteria are met, the rule score is increased by one; otherwise, the rule score decreases by one. The following example illustrates initial scoring:

• The Front sensor of the sensed data shows a distance of 20 and an entity value of four (a wall).
• The Front distance values for Rule X are 10 and 30, and the entity value is four.
• The rule score for Rule X is increased by one because 20 is between 10 and 30 and the entity values match.

After matching each of the sensed data to the rule data, the current state of the robot is compared to the state value specified in the rule. If the current state matches the rule's state value, the score is increased by five. More points are awarded for matching state values to promote a divergence of robot roles through genetic evolution. By weighting the scoring, there is a greater probability that a red robot maps to a rule with a red state value and that a green robot maps to a rule with a green state value.

Rule scoring is a linear task, starting with rule one and ending with rule n. After a rule is scored, its score is compared to the score of the current winner. If a rule's score is larger than the current winner's
it becomes the new winner. Once all of the rules have been scored, the action associated with the winning rule is selected.
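A sketch of the scoring and winner selection follows. The +1/−1 hits and the +5 state bonus come from the text; the function names are ours, the types repeat the earlier sketches, and the tie-break anticipates the bias mechanism of Section 4.5:

```cpp
// Types repeated from the earlier sketches.
struct DDE    { int dist1; int dist2; int entity; };
struct Rule   { DDE sensors[8]; int state; int action; };
struct Sensed { int distance; int entity; };

// Score one rule against the sensed data array ("Battleship" hits):
// +1 when both the distance range and the entity match, -1 otherwise.
int scoreRule(const Rule& r, const Sensed data[8], int robotState) {
    int score = 0;
    for (int s = 0; s < 8; ++s) {
        bool hit = data[s].distance >= r.sensors[s].dist1 &&
                   data[s].distance <= r.sensors[s].dist2 &&
                   data[s].entity == r.sensors[s].entity;
        score += hit ? 1 : -1;
    }
    if (r.state == robotState) score += 5;   // weighted to promote roles
    return score;
}

// Linear scan from rule 1 to rule n; strictly larger scores replace the
// winner, and equal scores fall to the action bias (Section 4.5).
int matchRules(const Rule rules[], int n, const Sensed data[8],
               int robotState, int biasAction /* last action taken */) {
    int best = scoreRule(rules[0], data, robotState);
    int winnerAction = rules[0].action;
    for (int i = 1; i < n; ++i) {
        int sc = scoreRule(rules[i], data, robotState);
        if (sc > best || (sc == best && rules[i].action == biasAction)) {
            best = sc;
            winnerAction = rules[i].action;
        }
    }
    return winnerAction;
}
```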
4.5. Bias and instincts

In case a rule receives the same score as the current winner, an action bias is used to determine the final action for the robot. If the rule and the current winner contain the same action value, there is no need to utilize the bias; however, should the two have different action values, the bias is used to determine the final action for selection. The bias is an integer variable that holds the value of the last action taken by the robot. In tie cases where a rule's action value differs from the current winner's, the rule's action value is compared to the bias. If the two values are equal, the rule becomes the new winner; otherwise, the current winner remains unchanged. This method of tie resolution favors the repetition of the most recent robot action. The reasoning for this is to impose a limited amount of continuity upon a robot's actions. For instance, if a robot were in the process of turning left towards an opening based upon the action of Rule A, but Rule B (with the same score as Rule A) conflicted with this action, the turn would continue, uninterrupted by Rule B, because of the robot bias.

Throughout the course of a simulation, robots may become disabled for a variety of reasons, such as:

• Switching back and forth between two actions indefinitely.
• Repeatedly attempting to move forward when obstructed.
• Indefinitely repeating the same action.

To counter problems like these, two instincts were given to the robots that can override the action value defined within a winning rule. The first instinct maintains a timer that keeps track of the amount of time that a robot's coordinate (position) within the environment array remains unchanged. If this time exceeds 10 s, the robot's forward direction is adjusted one degree clockwise before an action is selected. If the robot were attempting to repeatedly move forward while obstructed, it would begin to slowly turn right after 10 s (attempting to free itself from immobility). The second instinct maintains a timer that keeps track of the duration for which a robot repeats an action. If this time exceeds 15 s, the robot turns off its sensors for one second and relies solely upon collision control for guidance. This momentarily allows for a free roam in the forward direction (with possible redirection through the collision handling method previously described). This approach has proven useful when a robot becomes deadlocked by continuously changing to the same state color or by spinning. When the redirection method triggered by the first instinct is ineffective at countering immobility, the free roam activated by the second instinct allows the robot to relocate before sensing again (where new sensed data could map to a rule with a more effective action value). The value of 15 s was selected to allow for at least one full rotation with adequate redirection (approximately 110° from the robot's forward direction when it first started to repeat an action) before triggering the instinct.
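The two instinct timers could be maintained as follows. The thresholds come from the text (one time unit = 0.1 s, so 10 s = 100 units and 15 s = 150 units), while the plumbing, names, and the sign of the clockwise nudge are assumptions:

```cpp
// Sketch of the two instinct timers described above.
struct Instincts {
    int stuckUnits;    // time spent at an unchanged (x, y) coordinate
    int repeatUnits;   // time spent repeating the same action
};

// Called once per time unit after rule matching. Returns true if the robot
// should switch off its sensors for one second (10 units) and free-roam
// under collision control only.
bool applyInstincts(Instincts& in, bool positionChanged, bool sameAction,
                    double& forwardDirectionDeg) {
    in.stuckUnits  = positionChanged ? 0 : in.stuckUnits + 1;
    in.repeatUnits = sameAction ? in.repeatUnits + 1 : 0;

    if (in.stuckUnits > 100)          // first instinct: stuck > 10 s
        forwardDirectionDeg -= 1.0;   // nudge 1 degree clockwise (sign assumed)

    if (in.repeatUnits > 150) {       // second instinct: repeating > 15 s
        in.repeatUnits = 0;
        return true;
    }
    return false;
}
```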
4.6. Evaluation

The performance of a robot team is evaluated when the user-defined time limit is reached in a simulation. Performance is measured by the percentage of rooms entered in the environment (floorplan). Additionally, should the robots enter 100% of the rooms, the time it took to complete the total coverage is considered. These values are used to determine the performance of a robot search team and are also used as fitness levels for rule sets when using the genetic algorithm.

The simulator maintains a list of the rooms that each robot has entered. This data is only used for team evaluation, not for assisting the robot during a simulation. Once the time limit is reached, the room data from each robot is checked to determine the percentage of coverage for the run. If two or more robots enter the same room, the simulator displays a list of rooms where this occurred; however, duplicated effort does not impact the evaluation of a robot team. Robots identify rooms by continually checking the environment array element found at their own x and y coordinates to see if a trip wire value is present (described previously). For instance, if the center of a robot is positioned in the 35th element of the 20th row within the environment array, that element is checked for a trip wire value. If a trip wire value is detected, the simulator compares the robot's coordinates to a global list indicating the locations of trip wires within the testing environment and the room number associated with each trip wire. The simulator stores the room number corresponding to the detected trip wire.
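A sketch of the end-of-run evaluation is given below. The percentage score follows the text, but the weighting of the speed bonus for full coverage is an assumption, as the paper gives no formula for it:

```cpp
#include <set>

// Evaluate a team when the time limit is reached. roomsEntered is the
// union of room numbers recorded via trip wires for all team members.
double evaluateTeam(const std::set<int>& roomsEntered, int totalRooms,
                    int elapsedUnits, int timeLimitUnits) {
    double score = 100.0 * (double)roomsEntered.size() / totalRooms;
    if ((int)roomsEntered.size() == totalRooms) {
        // Full coverage: add points that grow as coverage is reached
        // sooner (the exact weighting is not specified in the paper).
        score += 10.0 * (timeLimitUnits - elapsedUnits) / timeLimitUnits;
    }
    return score;
}
```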
5. Simulation experiments

In a series of simulation experiments, genetic algorithms were used to evolve controllers (rule sets) for teams of search robots. A genetic algorithm consists of three basic steps: reproduction, crossover, and mutation [11]. Reproduction is the process in which the chromosomal strings that represent individuals are copied according to their fitness levels, to be used in the creation of a new population of strings. Population members with greater fitness values are more likely to be used for creating strings for the following generation. The crossover process consists of selecting two strings and exchanging string segments between the two. The final step of a genetic algorithm is mutation, the random alteration of the value of a string position. In the reported experiments, rule sets are converted into binary chromosomal strings, and the fitness of the robots is computed to favor the rule sets of robots achieving more coverage of the environment in less time.

Three important variables in the experiments were the number of robots in each team, the selected floorplan, and the number of rules in a rule set. The sets of charts in Figs. 8 and 9 display the evolved results for robot teams of sizes 4 and 14, respectively. The charts in each figure represent the results for the three different floorplans used for evolution of the robot teams. The evolution experiments conducted showed that the genetic algorithm clearly improved robot team performance over the course of 100 generations. The simulation is being used to investigate the effects of team size, environment, controller size, and other factors influencing the performance of robot search teams.
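The three steps could be sketched as follows for rule sets encoded as bit strings. Roulette-wheel reproduction, single-point crossover, and a small per-bit mutation rate are standard choices [11] rather than details taken from the paper:

```cpp
#include <cstdlib>
#include <vector>

typedef std::vector<bool> Chromosome;   // 141 * n bits per rule set

// Reproduction: copy individuals with probability proportional to fitness
// (roulette-wheel selection).
int selectParent(const std::vector<double>& fitness, double totalFitness) {
    double r = totalFitness * std::rand() / RAND_MAX;
    for (std::size_t i = 0; i < fitness.size(); ++i) {
        r -= fitness[i];
        if (r <= 0.0) return (int)i;
    }
    return (int)fitness.size() - 1;
}

// Crossover: exchange the string segments after a random cut point.
void crossover(Chromosome& a, Chromosome& b) {
    std::size_t cut = std::rand() % a.size();
    for (std::size_t i = cut; i < a.size(); ++i) {
        bool tmp = a[i]; a[i] = b[i]; b[i] = tmp;
    }
}

// Mutation: flip each bit with a small probability (rate is an assumption).
void mutate(Chromosome& c, double rate /* e.g., 0.001 */) {
    for (std::size_t i = 0; i < c.size(); ++i)
        if ((double)std::rand() / RAND_MAX < rate) c[i] = !c[i];
}
```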
[Fig. 8 plots average fitness versus generation (1–100) for rule sets of 10, 20, and 30 rules.]
Fig. 8. Team of 4 robots: (a) floorplan 1, (b) floorplan 2, and (c) floorplan 3.
[Fig. 9 plots average fitness versus generation (1–100) for rule sets of 10, 20, and 30 rules.]
Fig. 9. Team of 14 robots: (a) floorplan 1, (b) floorplan 2, and (c) floorplan 3.
6. Conclusion

The presented simulation program has been used to conduct over 300 simulation experiments. The simulator provides an efficient tool for testing different robot behaviors and numerous compositions of robot search teams. The initial results from the evolution experiments show that robot team performance improves as team size increases; however, this improvement begins to plateau once team sizes reach 12–14. With smaller team sizes, the limited time allotted for area coverage seems to have a significant impact on how many rooms can be entered by team members. Furthermore, as team size increases, team interaction promoting dispersion among team members results in larger fitness payoffs, the effects of which may level off with medium to large teams. Results from cross testing, in which rules evolved for a specific team size were run by teams of different sizes, show that in all cases robot teams using rules evolved for a different team size never outperform teams using rules evolved for their own size. This does not imply that rule sets evolved for teams of different sizes are completely ineffective. The impact of rule set size on robot team performance is currently inconclusive and requires further investigation.

The planned future work of this simulation project is to increase the number of environments in which robot teams can be evolved and tested. Future studies should include floorplans with rooms of different shapes and sizes. Furthermore, throughout the course of evolution, the robot teams and their rule sets could be migrated from one environment to another, or the environment could even be randomly selected for each new generation. This could result in teams with greater flexibility when encountering new environments. The eventual goal of this simulation system is to be utilized as a tool for the better design, development, and deployment of robot search teams to assist in search and rescue missions.
Acknowledgements

This work was sponsored in part by the National Science Foundation under grant EIA-9972843.
References

[1] Agah A, Bekey G. Phylogenetic and ontogenetic learning in a colony of interacting robots. Autonom Robots 1997:85–100.
[2] The American Society of Mechanical Engineers (ASME), Cobots Page. http://www.memagazine.org/contents/current/features/cobots/cobots.html, 2000.
[3] Arkin RC, Balch T. Cooperative multiagent robotic systems. Available from: ftp://ftp.cc.gatech.edu/pub/people/arkin/web-papers/coop.ps.Z, 2000.
[4] Arkin RC. Behavior-based robotics. Cambridge, MA: MIT Press; 1998.
[5] Asama H, Ozaki K, Ishida Y, Yokota K, Matsumoto A, Kaetsu H, Endo I. Collaborative team organization using communication in a decentralized robotic system. IROS 1994:816–23.
[6] Brooks RA. Intelligence without reason. MIT AI Lab Memo 1293, 1991.
[7] Brooks RA. A robust layered control system for a mobile robot. MIT AI Lab Memo 864, 1985.
[8] Deitel HM, Deitel PJ. C++ how to program. Upper Saddle River, NJ: Prentice Hall; 1998.
[9] Dollarhide RL. Evolving behavior-based brains for robot search teams. M.S. Thesis, Department of Electrical Engineering and Computer Science, The University of Kansas, 2000.
[10] Fontan MS, Mataric MJ. Territorial multi-robot task division. IEEE Trans Robotics Automat 1998;14(5).
[11] Goldberg DE. Genetic algorithms in search, optimization, and machine learning. Reading, MA: Addison-Wesley; 1989.
[12] Maes P. Situated agents can have goals. Robotics Autonom Syst 1990:49–70.
[13] Mataric MJ. Behavior-based control: examples from navigation, learning, and group behavior. In: Hexmoor, Horswill, Kortenkamp, editors. J Experiment Theoret Artificial Intellig, Special Issue on Software Architectures for Physical Agents 1997;9(2–3).
[14] McLurkin J. Using cooperative robots for explosive ordnance disposal. MIT Artificial Intelligence Laboratory. Available from: http://alpha-bits.ai.mit.edu/projects/eod-robots/eod-paper.pdf, 2000.
[15] National Aeronautics and Space Administration (NASA), Rover Sojourner Home Page. http://www.ksc.nasa.gov/mirrors/jpl/pathfinder/rover/sojourner.html.
[16] Parker LE. Cooperative robotics for multi-target observation. Intelligent Automation and Soft Computing, Robotics Research at Oak Ridge National Laboratory 1999;5(1):5–19.
[17] Wang J. On sign-board based inter-robot communication in distributed robotic systems. In: Proceedings of the 1994 IEEE International Conference on Robotics and Automation, San Diego, CA, 1994. p. 1044–50.
[18] Woo M, Neider J, Davis T, Schreiner D. OpenGL programming guide: the official guide to learning OpenGL, version 1.2. 3rd ed. Reading, MA: Addison-Wesley; 1999.