Robotics and Computer-Integrated Manufacturing 28 (2012) 530–538
Multirobot coordination in pick-and-place tasks on a moving conveyor
H. Işıl Bozma*, M.E. Kalalıoğlu
Intelligent Systems Laboratory, Electric Electronic Engineering Department, Bogaziçi University, Istanbul, Turkey
Article info
Article history: Received 16 November 2009; received in revised form 7 December 2011; accepted 20 December 2011; available online 16 February 2012
Keywords: Multirobot tasks and coordination; Noncooperative games; Distributed decision making; Flexible manufacturing
Abstract
This paper considers the problem of multirobot coordination in pick-and-place tasks on a conveyor band. The robot team is composed of identical robots with mutually exclusive, but neighboring workspaces. The products are fed in at one end of the band, move through each workspace sequentially until being picked up, and are collected at the other end if not picked up in the interim. Each robot has the same task, namely picking up and packaging as many products as possible. We propose an approach based on noncooperative game theory where each robot uses local observations of the conveyor band and its neighbors' actions in order to decide on its actions. The developed algorithm has been implemented and tested in a simulated manufacturing environment using Webots. Results obtained from the simulations are analyzed using a variety of statistical performance measures. © 2012 Elsevier Ltd. All rights reserved.
1. Introduction
The demand for multirobot systems has been increasing. Even if a single robot can achieve a given task, deploying a team of robots may be preferable, as it can considerably improve production throughput [8,2,31,15]. Interestingly, in manufacturing settings, most studies have been based on single-robot cells [4,22], and the use of multiple robots has been investigated relatively less [24,12]. In this paper, we investigate an operational problem that arose out of a real-time factory automation problem in which multiple robots work in parallel on a packaging assembly line [26]. Consider a moving conveyor band with products randomly fed at one end, while items remaining on the band are collected at the other end, as seen in Fig. 1. A set of robots is lined up along one side of the band, and each robot is assigned an exclusive workspace. The task of each robot is to pick up the products moving through its workspace as optimally as possible. However, this must be done such that the overall system addresses the following concerns effectively [24]: (i) real-time operation with simple programming requirements, (ii) working in all regions of its workspace while accommodating changes in the working environment, and (iii) flexible integration and control. Hence, the goal is to design a multirobot coordination strategy that can be employed under these restrictions. The traditional approach is to associate the system with a single global objective function that needs to be optimized. A substantial
* Corresponding author. Tel.: +90 212 359 6414; fax: +90 212 287 2465. E-mail address: [email protected] (H.I. Bozma).
0736-5845/$ - see front matter © 2012 Elsevier Ltd. All rights reserved. doi:10.1016/j.rcim.2011.12.001
amount of research has been directed toward the development of approaches and algorithms based on this perspective [1,11,10]. In utility-based approaches, a variety of measures of acquiescence and impatience are used in the (re)assignment of all the tasks simultaneously [27,33]. Alternatively, in multirobot task allocation approaches, the problem is viewed as an instance of optimal assignment of a set of tasks to a set of robots, based on maximizing the overall performance while taking individual performances into consideration [14,20]. Auction-based methods, in which candidate robots make "bids" according to their task-specific utilities, employ a variant of the well-known contract net protocol [13,34]. In [16], the coordination of multiple robots is supervised by a plan manager. In a class of problems known as distributed constraint optimization problems (DCOPs), each of the many autonomous agents controls a single variable, and together the agents have the joint goal of maximizing a global objective function [32]. DCOP algorithms fall mainly into two groups [9]. The first group, distributed but complete algorithms, maximizes a global objective function, but these algorithms are hard to implement in real time because of the computational complexity of the message-passing structure [21,28]. In many scenarios, such as our problem, forming a global consensus is problematic for several reasons. First, the real-time task requirement means that a robot cannot wait for a message-passing mechanism to run through the whole robot network [21]. Second, the system needs to be flexible so that robots may be added to or removed from the production environment without reprogramming the whole system, for example by electing a leader and forming a tree-like arrangement for the new configuration [28]. The second group of DCOP algorithms, namely local iterative algorithms, consists of message-passing algorithms and approximate best-response algorithms [9].
In these algorithms, each agent
Fig. 1. Multirobot pick-and-place on a conveyor with two robots.
reacts on the basis of local knowledge, so there is no need for tree-like communication [21,19]. However, since each agent's utility is defined as the sum of the terms of the global objective function in which it is involved [9], its decision making has to consider all the associated robots' payoffs as well. Furthermore, as the robots are changed, the individual utility functions have to be reconsidered and changed accordingly. Alternatively, in noncooperative game theory, in contrast to optimizing with respect to a single objective, a set of objective functions is optimized simultaneously. Here, each agent decides on its moves based only on individual considerations associated with its assigned task [3]. This is in general different from optimizing with respect to a single global objective function, even if the global function is constructed by summing all the individual objectives. It is known that Nash solutions in noncooperative game settings and Pareto-optimal solutions (namely the solution to a single objective function) are not necessarily the same when the individual objective functions depend on other robots' decisions as well [5]. Hence, optimizing with respect to a single objective function, even if done in a decentralized manner by decomposing the global objective function into multiple objectives as in DCOP, will in general yield different solutions from multiobjective optimization. In particular, in a noncooperative game-theoretic setting, each robot decides based only on its individual cost function, whereas in DCOP, each robot's objective would have to include all other robots' objective functions that depend on its individual actions [9]. From this perspective, a noncooperative game-theoretic approach has lower computational and communication requirements, as each robot considers only its individual cost. The robots continue to play rounds of games, similar to dynamic or iterative games [21,23,25,6,7].
The contribution of this paper is a novel approach to multirobot coordination in pick-and-place tasks based on noncooperative dynamical games. Each robot is associated with a cost function that simultaneously takes itself and its neighbors into consideration. The simplest strategy that each robot may employ is to ignore any coordination with its local neighbors and always choose the action that is best for its self-interest. In this case, this is to pick up the closest products in its workspace (hence spending less time and energy). However, each robot also considers its neighbors' actions and chooses an action based on the trade-off between doing this and picking up products that are least likely to be picked up by the neighbors (hence possibly spending more energy, but coordinating better), which naturally leads to a game-theoretic formulation. Hence, each robot strategically decides its own action based on the resulting trade-off within a noncooperative dynamical game. By noncooperativity, it is meant that the robots decide individually, as opposed to forming coalitions to decide collectively [17]. By dynamical, it is meant that each robot views the process as a repeated game with its neighboring robots. As the environment is time-varying with the
continuous flow of products on the conveyor band, this is a dynamic game in which all the robots participate repeatedly as long as the conveyor is moving or contains products. In this framework, each robot needs to communicate only with its neighboring robots, and asynchronous operations in and between robots occur only at the higher decision-making level. Consequently, each robot can be controlled in real time. As each robot operates in a manner that accommodates changes on the conveyor band as well as its neighbors' actions, robots end up picking up products from all regions of their workspaces. Finally, new robots can easily be added to or removed from the system without any reprogramming. The paper is organized as follows. The problem is formulated mathematically in Section 2. The noncooperative dynamic game formulation is presented in Section 3. In Section 4, the performance is investigated systematically through an extensive set of simulation statistics with different production parameters, including a comparative study with a simple, purely self-interested strategy. The paper concludes with a brief summary.
2. Problem formulation
Consider a set of robots $P = \{1, \ldots, N_R\}$, $N_R \in \mathbb{Z}^{+}$. Each robot $i \in P$ has a location defined by its center $b_i \in \mathbb{R}^3$. All the robots operate on a two-dimensional area defined on the conveyor band. This area is divided into $N_R$ nonoverlapping rectangular workspaces $W_i \subset \mathbb{R}^2$, $i \in P$, where $W_i \cap W_j = \emptyset$, $\forall i \neq j$. Each robot $i$ is assigned to one workspace $W_i$. Furthermore, each $W_i$ is divided into smaller work regions $W_i^m \in O_i$ where
\[ O_i = \Big\{ W_i^m \;\Big|\; \bigcup_m W_i^m = W_i,\; W_i^m \cap W_i^n = \emptyset \Big\} \tag{1} \]
Let $O = \bigcup_{i \in P} O_i$ be the set of all work regions. For example, for the case of $N_R = 4$ robots where, for all $i \in P$, $|O_i| = 4$, the work regions are as shown in Fig. 2. The neighbor set $B_i \subset P$ consists of the robots whose workspaces border that of robot $i$. Let $N_i$ denote the cardinality of $B_i$, namely $N_i = |B_i|$. For the scenario where the robots flank the conveyor band on one side as shown in Fig. 2,
\[ N_i = \begin{cases} 1 & \text{if } i = 1 \text{ or } i = N_R \\ 2 & \text{otherwise} \end{cases} \tag{2} \]
At any time, there are many products randomly located in each workspace. Let $n : O \times \mathbb{R}_{\ge 0} \to \mathbb{Z}^{+}$ be a map such that $n(W_i^m, t)$ denotes the number of products located within $W_i^m$ at time $t$. The location of a product at instant $t$ is denoted $p(t) \in \mathbb{R}^3$. Note that if a product $p(t) \in W_i$ is not picked up by robot $i$, it will enter the workspace of the next robot, namely $p(t') \in W_{i+1}$ where $t' > t$. In order to pick up a product, each robot $i$ first has to decide which region in $O_i$ to move to. Hence, the action set can be represented by the index set $A_i = \{ m \mid W_i^m \in O_i \}$. Let $w_i : A_i \to O_i$ map each action to the corresponding work region and $p_i : A_i \to \mathbb{R}^3$ denote the corresponding region center. Each robot is associated
Fig. 2. Workspaces for $N_R = 4$ robots.
with an action state $a_i : \mathbb{R}_{\ge 0} \to A_i$, where $a_i(t)$ denotes the action at time $t$.
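The geometric setup of this section can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the names `Rect`, `workspaces`, `work_regions`, `neighbors` and `count_products` are hypothetical, product positions are reduced to two coordinates on the band, workspaces are laid end to end along the travel direction, and each workspace is split into a 2 x 2 grid (the source only states that there are four work regions per workspace in the example).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle on the conveyor plane, in meters."""
    x0: float
    y0: float
    x1: float
    y1: float

    def contains(self, x, y):
        # Half-open intervals keep adjacent rectangles disjoint.
        return self.x0 <= x < self.x1 and self.y0 <= y < self.y1

    def center(self):
        return ((self.x0 + self.x1) / 2, (self.y0 + self.y1) / 2)


def workspaces(n_robots, length=1.0, width=1.0):
    """W_1..W_NR laid end to end along the band (list index 0 holds W_1)."""
    return [Rect(i * length, 0.0, (i + 1) * length, width) for i in range(n_robots)]


def work_regions(w):
    """Split a workspace into an assumed 2 x 2 grid of work regions W_i^m."""
    xm, ym = w.center()
    return [Rect(w.x0, w.y0, xm, ym), Rect(xm, w.y0, w.x1, ym),
            Rect(w.x0, ym, xm, w.y1), Rect(xm, ym, w.x1, w.y1)]


def neighbors(i, n_robots):
    """B_i for robots lined up on one side, with 1-based robot indices as in Eq. (2)."""
    return [q for q in (i - 1, i + 1) if 1 <= q <= n_robots]


def count_products(region, products):
    """n(W_i^m, t): number of products currently inside a work region."""
    return sum(region.contains(x, y) for x, y in products)


print([len(neighbors(i, 4)) for i in range(1, 5)])  # [1, 2, 2, 1]
```

The half-open rectangles are a design choice that makes the regions mutually exclusive, so a product lying on a shared boundary is counted exactly once.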
3. Noncooperative dynamic game
In a noncooperative dynamic game setting, the strategy employed by each robot $i \in P$ is to choose actions that are optimal for it. Optimality is based on minimizing a cost function $j_i$ that depends not only on the relation of the products to the robot itself, but also on their relation to its neighbors as well as on what the neighbors are doing. As all of these are time-varying, so is the cost function. First, let the vector of actions of the neighbors be denoted by $a_{-i} \in A_{-i}$ where
\[ A_{-i} = \begin{cases} A_2 & \text{if } i = 1 \\ A_{N_R - 1} & \text{if } i = N_R \\ A_{i-1} \times A_{i+1} & \text{otherwise} \end{cases} \tag{3} \]
In this framework, the cost function is defined as $j_i : A_i \times A_{-i} \times \mathbb{R}_{\ge 0} \to [0, \infty)$. All the robots move so as to minimize their individual costs, where the cost functions are constructed so as to incorporate the neighbors and their actions:
\[ a_i(t) \in \arg\min_{a_i} j_i(a_i; a_{-i}(t)) \tag{4} \]
The robot team can be viewed as a set of dynamic agents that constantly choose actions that minimize their individual costs. This means that at each decision instant, each robot tries to attain a Nash equilibrium, which is a tuple of actions in which no agent can improve its cost by unilaterally changing its decision. The dynamic decision-making process can be viewed as an iterated noncooperative game that is repeated each time a robot finishes a pick-and-place task and needs to determine where to pick up from next.
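The decision rule (4) and the Nash condition described above can be illustrated with a small sketch. It is not the authors' implementation: `best_response`, `is_nash` and `toy_cost` are hypothetical names, robots are indexed from 0, and `toy_cost` is a made-up stand-in that simply penalizes choosing the same region index as an adjacent robot.

```python
def best_response(i, joint, action_sets, cost):
    """Action of robot i that minimizes its own cost, neighbors held fixed (Eq. (4))."""
    return min(action_sets[i], key=lambda a: cost(i, a, joint))


def is_nash(joint, action_sets, cost):
    """True if no robot can lower its own cost by deviating unilaterally."""
    return all(
        cost(i, joint[i], joint) <= cost(i, a, joint)
        for i in range(len(joint))
        for a in action_sets[i]
    )


def toy_cost(i, a, joint):
    """Hypothetical cost: 1 for every adjacent robot using the same region index."""
    adjacent = [q for q in (i - 1, i + 1) if 0 <= q < len(joint)]
    return float(sum(joint[q] == a for q in adjacent))


action_sets = [(0, 1), (0, 1)]
print(is_nash((0, 1), action_sets, toy_cost))           # True
print(is_nash((0, 0), action_sets, toy_cost))           # False
print(best_response(0, (0, 0), action_sets, toy_cost))  # 1
```

Note that each robot only evaluates its own cost, which is the sense in which the game is noncooperative; the cost callable receives the joint action only so that it can read the neighbors' entries.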
3.1. Construction of cost function
The construction of the cost function embeds considerations that are used by human operators engaged in a similar task. There are three factors that affect the action decisions. The first criterion is that each robot should pick up products located in the closest workspace. Such information can be inferred visually via a machine vision system consisting of multiple cameras associated with the robots' workspaces [4]. This would mean faster pick-up time and higher individualistic throughput. In this sense, according to this criterion, the robot is acting purely in its own interests. The cost of moving to each $W_i^m$ is measured by a function $g_i : A_i \times \mathbb{R}_{\ge 0} \to \mathbb{R}_{\ge 0}$ as follows:
\[ g_i(a_i,t) = \begin{cases} C_{\max} & \text{if } n(w_i(a_i),t) = 0 \\ \dfrac{1}{n(w_i(a_i),t)} \displaystyle\sum_{p(t) \in w_i(a_i)} \| b_i - p(t) \|^2 & \text{otherwise} \end{cases} \tag{5} \]
where $C_{\max} \in \mathbb{R}_{>0}$ is a large number. If there are no products in a work region, the cost $C_{\max}$ is large. In the next two criteria, each robot takes its neighboring robots into consideration from a variety of different perspectives. In this paper, we consider two different measures. However, let it be noted that other formulations may also be possible. First, each robot should pick up products that are difficult for its neighbors to pick up. One way to attain this is to make a move such that the associated region contains products that are maximally distant from the neighbors. In this manner, the robot does not leave the pickup of such products to its neighbors. This is measured by a
function $b_i : A_i \times \mathbb{R}_{\ge 0} \to \mathbb{R}_{\ge 0}$ defined as:
\[ b_i(a_i,t) = \begin{cases} \dfrac{1}{n(w_i(a_i),t)} \displaystyle\sum_{q \in B_i} \sum_{p \in w_i(a_i)} \| b_q - p(t) \|^2 & \text{if } N_i > 0 \\ C_{\min} & \text{otherwise} \end{cases} \]
In this case, the action that maximizes $b_i$ needs to be selected. Here, $C_{\min} \in \mathbb{R}_{>0}$ is a small number. This information can be computed using a camera-based vision system. Secondly, assuming that it will be easier for each robot to continue picking up from the current region, as this does not require moving to another region, the products picked up by a robot should be maximally distant from the products being picked up by the neighbors, which means that the associated regions must be as distant as possible. This is measured by a function $z_i : A_i \times A_{-i} \times \mathbb{R}_{\ge 0} \to \mathbb{R}_{\ge 0}$ defined as
\[ z_i(a_i, a_{-i}, t) = \begin{cases} \displaystyle\sum_{q \in B_i} \| p_i(a_i) - p_q(a_q(t)) \|^2 & \text{if } N_i, N_q > 0 \\ C_{\min} & \text{otherwise} \end{cases} \]
Similarly, the action that maximizes $z_i$ is preferred. This criterion requires communication among the neighboring robots since they need to exchange action information. However, as this communication needs to be only local, the computational overhead involved is of a fixed amount and hence does not depend on the robot team size. The cost function $j_i$, $\forall i \in P$, is composed by taking the weighted ratio of the $g_i$ term, which needs to be minimized, to the remaining two terms $b_i$ and $z_i$, both of which need to be maximized, as
\[ j_i(a_i, a_{-i}, t) = \frac{ g_i(a_i,t)^{k_1} }{ \big( b_i(a_i,t) \, z_i(a_i, a_{-i}, t) \big)^{k_2} } \tag{6} \]
where $k_1, k_2 > 0$ determine the relative weights of the self-motivated and neighbor-motivated terms. The cost function incorporates a compromise between picking up close-by products and picking up products that are distant from the neighbors and their current actions. The parameter values determine the relative weighting of each term. Hence, with $k_2 = 0$, $j_i$ becomes totally self-interest oriented. Of course, as it depends on the distribution of products on the conveyor band, it will not be unimodal in general. However, from each robot's perspective, choosing any minimal valued action will suffice since this will ensure that the trade-off is as desired. Let it be noted that while $j_i$ as constructed certainly encodes some of the robot's considerations, alternate forms that encode additional constraints are also possible. For the sample 4-robot scenario shown in Fig. 3, the cost function values $j_i$ for each robot and for each of its actions are as shown in the table of Fig. 3. It is observed that each robot now chooses an action that also takes its neighbors' situation into consideration. Hence, three of the robots choose actions associated with products that are not necessarily the closest.
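To make the structure of the cost function concrete, the three terms and their combination in Eq. (6) can be sketched as follows. This is a hedged sketch, not the authors' code: the function names and argument layout are hypothetical, the numeric values of `C_MAX` and `C_MIN` are placeholders (the source only says "large" and "small"), points are plain coordinate tuples, and returning `C_MIN` from `b` for an empty region is an added assumption, since the source defines that case only through $N_i$. The defaults for `k1` and `k2` are the values quoted in the caption of Fig. 3.

```python
C_MAX = 1.0e3   # hypothetical "large" constant C_max
C_MIN = 1.0e-3  # hypothetical "small" constant C_min


def sq_dist(u, v):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def g(base_i, products):
    """Eq. (5): mean squared distance from robot i to the products in a region."""
    if not products:
        return C_MAX
    return sum(sq_dist(base_i, p) for p in products) / len(products)


def b(neighbor_bases, products):
    """Summed squared distance from the neighbors to the products, per product."""
    if not neighbor_bases or not products:
        # N_i = 0 is the source's case; the empty-region case is an assumption.
        return C_MIN
    total = sum(sq_dist(bq, p) for bq in neighbor_bases for p in products)
    return total / len(products)


def z(center_i, neighbor_centers):
    """Summed squared distance to the region centers chosen by the neighbors."""
    if not neighbor_centers:
        return C_MIN
    return sum(sq_dist(center_i, c) for c in neighbor_centers)


def j(base_i, center_i, products, neighbor_bases, neighbor_centers, k1=0.5, k2=2.0):
    """Eq. (6): g^k1 / (b * z)^k2; a smaller value marks a more attractive action."""
    return g(base_i, products) ** k1 / (
        b(neighbor_bases, products) * z(center_i, neighbor_centers)) ** k2


prods = [(1.0, 0.0), (0.0, 2.0)]
print(g((0.0, 0.0), prods))    # 2.5
print(b([(3.0, 0.0)], prods))  # 8.5
```

With `k2=0.0` the denominator becomes 1 and `j` reduces to `g ** k1`, which matches the remark that the cost then becomes purely self-interested.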
Fig. 3. Top: A sample 4-robot scenario under noncooperative game decision making, with optimal actions shown as darker shaded regions. Bottom: Cost function $j_i$ values for each of the four actions for a 4-robot scenario with $k_1 = 0.5$ and $k_2 = 2$.
R1: 0.2, 1.7, 4.4, 55.6
R2: 55.6, 0.3, 55.6, 0.2
R3: 55.6, 55.6, 0.01, 55.6
R4: 55.6, 2.6, 1.0, 55.6
Interestingly, even though the game played between the robots is noncooperative, as each robot is concerned with its own cost function only, the construction of the cost function generates an implicit coordination among the neighboring robots. In this setting, each robot chooses an action that is the best given the actions of its neighboring robots, in the sense that changing it unilaterally does not lead to any gain. Note that such an action corresponds to a Nash equilibrium. Furthermore, as each robot needs to be concerned with local information only, the required communication overhead is minimal. In summary, each robot makes a decision by observing its workspace and exchanging information with its neighbors. The action strategy is based on a compromise between what is purely the best for itself and what is best for the local robot group.
3.2. Game algorithm
While a robot is deciding which move to make next, other robots may be engaged in a particular pick-and-place task or may also be trying to decide on their next actions. Hence, each robot needs to use a decision strategy at each stage of the game in order to decide on the next move. For this, we use the concept of tentative actions $s_i : \mathbb{Z}^{+} \to A_i$. A tentative action is an action which is not yet finalized. Each robot that needs to determine a new move goes into a loop of exchanging actions with its neighbors and revising its own tentative action using its cost function $j_i$. A neighbor may either be executing a particular pick-and-place task or making a decision for its next move. In the first case, its current action is broadcast to its neighbors and remains unchanged throughout the iterative process. In the second case, it exchanges its current tentative action with its neighbors. The update is based on determining the best action given those of its neighbors as
\[ s_i(k+1) \in \arg\min_{s_i} j_i(s_i; s_{-i}(k)) \tag{7} \]
where $s_{-i}(k)$ is the vector of actions of its neighbors. Each robot initially starts with its action from the last round so that $s_i(0) = a_i(t)$. The next action $a_i(t + dt)$ is defined as the limit of the tentative actions as
\[ a_i(t + dt) = \lim_{k \to \infty} s_i(k) \tag{8} \]
where $dt$ is the time taken for the iterative decision-making process. In practice, the iterative process is repeated until $s_i(k)$ converges or the maximum allowable number of iterations $k_{\max}$ for making a decision is reached. This is repeated until the packaging process is terminated upon reaching $t_{stop}$. The resulting algorithm is as follows:
1. Access the current time $t$ and set $k = 0$.
2. Let $s(0) = a(t)$.
3. For each robot $i$ that needs to determine its next action, compute $s_i(k+1) \in \arg\min_{s_i} j_i(s_i; s_{-i}(k))$.
4. If $s_i(k+1) \neq s_i(k)$ and $k < k_{\max}$, then set $k = k+1$ and go to step 3.
5. Let $a_i(t + dt) = s_i(k+1)$.
6. Pick products from $w_i(a_i(t + dt))$.
7. If $t + dt < t_{stop}$, go to step 1.
8. Terminate.
Having described the iterated algorithm, we now study the convergence of such a game. We give a simple fact on the dynamics of the game.
Lemma 1. The iterated game converges to a Nash equilibrium or to a time-constrained decision.
Proof. If, in a certain time period, the robot has not made a decision, there must exist at least one neighbor that is changing its decision. As long as there is a neighbor that is changing its decision, the robot's decision may be changing. This procedure stops either when the neighbors' action decisions stabilize or when the maximum allowable time for making a decision is reached. In the first case, the decision is a Nash equilibrium since the robot cannot lower its cost any further. In the latter case, the decision is a time-constrained decision since it is the decision reached under the time constraint. □
4. Simulation results
A manufacturing environment consisting of a moving conveyor band and multiple robots has been designed and developed using the Webots simulation software [35]. As shown in Fig. 4, products are fed continually to the conveyor band while the robots, arranged linearly on one side, try to pick and place them using the proposed algorithm. Each production scenario is defined by the following set of parameters: the speed (m/s), width (m) and length (m) of the conveyor band, and the product feeding period $T_F$ (s). The number of products fed at each $T_F$ varies between 1 and $N_P$. The value of $N_P$ is determined by the width of the conveyor band, the size of the products and the minimal spacing that needs to exist among them so that they can be picked up. At each feeding period, a number of products varying from none to the maximum possible, depending on product size and conveyor band width, are placed randomly on one end of the conveyor band.
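Returning to the game algorithm of Section 3.2, the tentative-action loop (steps 2 to 5 and Eq. (7)) can be sketched as below. This is a simplified illustration under stated assumptions: `play_round` and `toy_cost` are hypothetical names, robots are indexed from 0, all deciding robots update simultaneously against the previous tentative actions, and the cost is the same made-up stand-in that penalizes matching an adjacent robot's region index.

```python
def play_round(start, deciding, action_sets, cost, k_max=20):
    """Iterate Eq. (7) until the tentative actions stop changing or k_max is hit.

    Returns (actions, converged). Robots not listed in `deciding` are busy and
    keep broadcasting their current action unchanged.
    """
    s = list(start)                       # step 2: s(0) = a(t)
    for _ in range(k_max):
        nxt = list(s)
        for i in deciding:                # step 3: best response to s(k)
            nxt[i] = min(action_sets[i], key=lambda a: cost(i, a, s))
        if nxt == s:                      # step 4: no robot changed its mind
            return s, True
        s = nxt
    return s, False                       # time-constrained decision


def toy_cost(i, a, joint):
    """Hypothetical cost: 1 for every adjacent robot using the same region index."""
    adjacent = [q for q in (i - 1, i + 1) if 0 <= q < len(joint)]
    return float(sum(joint[q] == a for q in adjacent))


action_sets = [(0, 1), (0, 1)]
print(play_round((0, 0), [0], action_sets, toy_cost))     # ([1, 0], True)
print(play_round((0, 0), [0, 1], action_sets, toy_cost))  # ([0, 0], False)
```

The second call shows why the iteration cap matters in this toy setting: two robots that update simultaneously from the same action keep swapping together and never settle, so the round ends with a time-constrained decision, the second outcome named in Lemma 1.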
All robots are of gantry type with 9 degrees of freedom and a movement capability that allows each to reach any point in its associated two-dimensional rectangular workspace on the conveyor band. All the workspaces are of identical size and each is separated into four subspace regions $W_i^m$, $m = 1, \ldots, 4$. The lateral movement of the robot in its workspace is enabled by the rotational movement of its four wheels. A prismatic joint realizes the forward-backward extension of the gripper. The gripper wrist movements are controlled via a revolute joint, while the vertical movement of the gripper jaw and the planar extensions of the jaw are enabled by the corresponding prismatic joints. The robot speeds are categorized into three levels: level 1 (slow), level 2 (medium) and level 3 (fast), with settings as shown in Table 1. The robot team is defined by the number of robots, their positions, operation speeds and workspaces. The control scheme of each robot is as shown in Fig. 5. Once a robot is ready to decide where to pick up from next, it considers all the products within its workspace as well as the actions of its immediate
Fig. 4. Multirobot pick-and-place on a conveyor. Left: two robots, center: four robots, and right: eight robots.
Table 1
Speed level settings.

Level  Lateral                    Gripper
       W_i, i = 1, ..., 4 (m/s)   Extension (m/s)  Vertical (m/s)  Wrist (rad/s)  Jaw 1 (m/s)  Jaw 2 (m/s)
1      0.16                       0.8              0.5             1              20           20
2      0.24                       1                1               1              20           20
3      0.4                        1.5              2.05            1              20           20
Fig. 5. Flowchart of each robot controller.
neighbors. Next, the action with the minimum cost is computed and announced to its neighbors. Each robot repeats this until its decision converges or the maximum time allowed for making a decision is reached. Then, it randomly picks up a product from the workspace region associated with its final decision.¹ The robot puts the product in its package and immediately starts the loop again. In our simulations, the operation speeds of all the robots are taken to be identical, although in real applications the robots can be heterogeneous. Furthermore, the length and width of each robot's workspace are both fixed at 1 m, while again in real applications different robots can have differently sized workspaces due to their inherent capabilities. All the products have the same size, 0.15 m × 0.15 m. The spacing among the products needs to be 10 cm and hence $N_P = 4$. Hence, at each product feeding period, the number of products fed varies between 1 and 4. Each product is randomly placed on the conveyor band with the constraint that the spacing between any pair of products exceeds 10 cm. Let it be noted that robot speed and feeding period manifest themselves as two independent parameters:
- Robot speed refers to the operational speed of the robots engaged in the task. Usually, increased robot speed means increased investment costs.
- Feeding period refers to the frequency of product feeding to the conveyor band and hence is associated with the production cycle time of the goods. A smaller feeding period means shorter production cycle times.
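The value $N_P = 4$ and the constrained random feeding can be reproduced with a short sketch. The formula and the placement scheme are assumptions made for illustration, not taken from the source: products are assumed to be fed side by side across the band width at the same longitudinal position, so that $n$ products fit when $n \cdot 0.15 + (n - 1) \cdot 0.10 \le 1$, and the leftover width is distributed randomly between them. The names `max_products_per_feed` and `feed_positions` are hypothetical.

```python
import math
import random


def max_products_per_feed(band_width, product_size, min_spacing):
    """Largest n with n*size + (n-1)*spacing <= width (products side by side)."""
    return math.floor((band_width + min_spacing) / (product_size + min_spacing))


def feed_positions(rng, band_width=1.0, product_size=0.15, min_spacing=0.10):
    """Lateral centers (m) of the products released in one feeding period."""
    n_max = max_products_per_feed(band_width, product_size, min_spacing)
    n = rng.randint(1, n_max)
    # Leftover width once n products and their mandatory gaps are laid out.
    slack = band_width - (n * product_size + (n - 1) * min_spacing)
    # Sorted random shifts keep every gap at least min_spacing wide.
    shifts = sorted(rng.uniform(0.0, slack) for _ in range(n))
    pitch = product_size + min_spacing
    return [product_size / 2 + u + k * pitch for k, u in enumerate(shifts)]


print(max_products_per_feed(1.0, 0.15, 0.10))  # 4
rng = random.Random(0)
print(len(feed_positions(rng)) <= 4)           # True
```

Under these assumptions, four products need 0.9 m of the 1 m band while five would need 1.15 m, which agrees with the stated $N_P = 4$. Constructing the positions from sorted shifts avoids rejection sampling, which would rarely succeed when four products leave only 0.1 m of slack.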
However, the effects of increasing the robot speed and of decreasing the feeding rate (that is, increasing the feeding period) may be similar. As expected, as faster robots are used, the pick-up rate increases, which means that a smaller percentage of the products is left unpicked. This performance may be similar to that of slower robots operating with slower feeding. Let us first consider a sample 4-robot scenario. The evolution of the instantaneous cost function values $j_i(a_i(t), a_{-i}, t)$ with respect to time for different feeding periods is presented in Fig. 6. As each robot chooses an action $a_i(t)$ that minimizes $j_i(a_i, a_{-i}, t)$ with respect to its possible actions $a_i \in A_i$, it is important to observe how these function values evolve for each of the robots. It is observed that the first and last robots attain higher cost values than the interior robots. In addition, the cost values of the robots are relatively higher at lower product feeding rates. As the product density on the conveyor increases with higher feeding rates, or as the robots become speedier, the robots tend to have bigger variations in their cost functions and have a tendency to pick up products from the workspace regions with high product density. Hence, lower feeding periods with speedier robots lead to cost functions similar to those of higher feeding periods with slower robots. We may reach the same conclusion by looking at the correlation between the cost function and the speed of the robots. It is observed that the cost function often increases with the speed of the robots. As the robots operate faster, the number of products on the band decreases, which can be viewed as equivalent to a low product feeding rate. The corresponding time evolution of the actions is shown in Fig. 7. In each panel, it is observed that every robot in the team tries to pick up products from workspace regions which are maximally distant from the workspace regions associated with the actions of its neighbor robots.
Moreover, each robot also attempts to collect products that are the closest to it and the furthest from its neighbors. Finally, it should be noted that the pick-up behavior of the robots is not influenced by the product feeding period or the speed of the robots.
1 In real applications, the robot could be picking up multiple products from this region.
Fig. 6. Time evolution of the cost functions $j_i(a_i(t), a_{-i}, t)$ during a sample game for different $T_F$ (panels: $T_F$ = 10 s, 7.5 s and 5 s; horizontal axis: time in seconds, 0 to 300; vertical axis: cost; one curve per robot, robots 1 to 4). Top: slow robots, center: mid-speed robots, and bottom: speedy robots.
4.1. Statistical analysis
The pick-and-place performance was studied statistically through an extensive set of simulations. The scenarios differ with respect to the following parameters:
- The number of robots: two, four or eight robots, all positioned linearly on one side of the conveyor belt.
- The robot speed, categorized into three levels: level 1 (slow), level 2 (medium) and level 3 (fast).
- The product feeding period $T_F$: three different periods were used for product feeding, $T_F \in \{5, 7.5, 10\}$ s.
Hence, altogether there are 27 different scenarios. For each scenario, 10 different simulations of 5 min duration were run. Simulation results are quantified using the average and standard deviation of the following performance measures:
- The number of products packaged by the robot team, which measures the actual amount of work completed.
- The percentage of fed products that are picked up, which measures the efficiency of the robot team.
- The minimum and maximum number of products packaged by a robot, which measure how idle or busy the robots can be.
Fig. 8 (top) shows the percentage of products fed to the conveyor band that are picked up as a function of the robot speed level and team size. The effectiveness of the team is seen to increase with both the speed of the robots and the team size. For fixed robot speeds, the effectiveness of the robot team decreases as the feeding rate increases (that is, as $T_F$ decreases), as seen in Fig. 8 (bottom). This may be attributed to the fact that an increasing feeding rate leads to an increased number of products which are not picked up by the robots. In summary, the following conclusions can be reached regarding the overall pick-and-place performance with multirobot coordination. From the perspective of individual robots, each robot tends to perform actions associated with the workspace regions that contain the maximum number of products. The effectiveness of the team increases with both the speed of the robots and the team size, while decreasing with the product feeding rate. Among the 27 scenarios, the robot team reaches
Fig. 7. Time evolution of the actions of the robots during a sample game for different $T_F$ (panels: $T_F$ = 10 s, 7.5 s and 5 s; horizontal axis: time in seconds, 0 to 300; vertical axis: action $a(t)$, 1 to 4; one curve per robot, robots 1 to 4). Top: slow robots, center: mid-speed robots, and bottom: speedy robots.
maximum effectiveness when a mid-sized fast robot team is operating on a conveyor band with a medium-level product feeding rate.
4.2. Comparative results: simple decision making
In this section, the approach is compared with a simple decision-making strategy in which robots choose their actions based on cost functions that are constructed purely from self-interest. Each robot acts without considering its neighbors, and hence there is no coordination among the robots. In this framework, each cost function $j_i : A_i \times \mathbb{R}^{\geq 0} \rightarrow [0,1]$, $\forall i \in P$, is simply defined based on the $g_i$ term that needs to be minimized:
$$j_i(a_i, t) = g_i(a_i, t) \tag{9}$$
Hence, the action that minimizes $g_i$ is preferred. This cost function depends only on the distribution of the products in the associated workspace. In general, actions associated with regions that are closer to the robot are preferred over the other actions. For a sample 4-robot scenario as shown in Fig. 9, the cost function values $j_i$ for each robot and each of its actions are given in the table of Fig. 9. It is observed that each robot chooses the action associated with the region containing the products closest to it, as expected. The statistical results confirm this and indicate that the number of products picked up under the simple strategy is about the same as under game-theoretic decision making, as presented in Table 2. Interestingly, the distributions of actions differ considerably, as presented in Table 3. Robots acting under the simple strategy engage mostly in actions that involve picking up from the closest two workspaces, while robots acting under the game-theoretic strategy tend to have a more distributed action strategy.
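As a rough illustration of the simple strategy compared above, the following Python sketch selects, for one robot, the action with the lowest cost $j_i = g_i$ and tallies how often each action is chosen, in the spirit of Table 3. The function names, the tie-breaking rule (lowest action index wins), and the cost and action values in the usage note are assumptions made for illustration; they are not taken from the paper.

```python
from collections import Counter


def simple_decision(costs):
    """Return the 1-based index of the action with the lowest cost.

    `costs` holds one value per action of a single robot, i.e. the values
    j_i(a_i, t) = g_i(a_i, t) at the current time. Ties are broken in favor
    of the lowest action index (an assumption of this sketch).
    """
    if not costs:
        raise ValueError("at least one action cost is required")
    best_index = min(range(len(costs)), key=lambda k: costs[k])
    return best_index + 1


def action_distribution(actions, num_actions=4):
    """Return the fraction of decisions spent on each action 1..num_actions."""
    if not actions:
        raise ValueError("at least one recorded action is required")
    counts = Counter(actions)
    return [counts.get(a, 0) / len(actions) for a in range(1, num_actions + 1)]
```

For example, with hypothetical costs `[0.32, 0.37, 0.26, 0.9]`, `simple_decision` returns 3, and a hypothetical action history `[3, 4, 4, 3, 4]` gives the distribution `[0.0, 0.0, 0.4, 0.6]`. Because each robot evaluates only its own costs, no information about the neighbors enters the decision, which is the point of contrast with the game-theoretic strategy.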
5. Conclusion
In this work, we present a novel approach to multirobot coordination in pick-and-place tasks on a conveyor band based on noncooperative games. The robot team is composed of identical robots with mutually exclusive, but neighboring workspaces. The products are fed at one end of the band and move through each workspace sequentially unless picked up. Items remaining on the band are collected at the other end. The task of each robot in the team is to pick and place as many of the products moving through
[Figure 8: six panels for NR = 2, NR = 4, and NR = 8. Top row: Percentage % versus speed level of the robots (1 to 3), with one curve per feeding period (5 sec., 7.5 sec., 10 sec.). Bottom row: Percentage % versus TF (sec.), with one curve per speed level (1, 2, 3).]
Fig. 8. Average and standard deviation of the percentage of total fed products that are picked up. Top: as a function of robot speed; bottom: as a function of product feeding period.
their workspace in parallel with their teammates. Each robot is associated with an individual cost function that embodies its observations of its own workspace and those of its neighbors, as well as the neighbors' actions. All the robots decide on their respective actions based only on their associated objective functions, which leads to a noncooperative game-theoretic setting. The mathematical model of this approach is formulated along with the associated algorithm, which is then applied in various scenarios comprising a conveyor band and a multirobot team. The advantages of this approach are real-time applicability with simple

Robot   $j_i$ values for each of the four actions
R1      0.32    0.37    0.26    15.8
R2      15.8    0.33    15.8    0.15
R3      15.8    0.35    0.3     15.8
R4      15.8    0.4     0.305   15.8
Fig. 9. Top: a sample 4-robot scenario under simple decision making with optimal actions shown as darker shaded regions. Bottom: corresponding $j_i$ values for each of the four actions with $k_1 = 0.5$.

Table 2
Comparative pickups per robot for four robots operating at medium speed level with TF = 5 s.
Pickups per robot   Simple decision-making   Game theoretic decision-making
Average             23                       23
Min.                17                       17
Max.                27                       29

Table 3
Comparative distribution of actions for four robots operating at medium speed level with TF = 5 s.
           Simple decision-making          Game theoretic decision-making
Actions    R1     R2     R3     R4         R1     R2     R3     R4
1          –      –      –      –          0.24   0.1    0.17   0.22
2          –      0.1    –      0.3        0.2    0.12   0.28   0.16
3          0.4    0.4    0.4    0.3        0.09   0.31   0.11   0.06
4          0.6    0.5    0.6    0.4        0.46   0.43   0.36   0.44
programming requirements and flexible integration while allowing all the robots to work in all parts of their workspaces. Extensive simulation results reveal that the resulting pick-and-place performance depends on the number and speed of the robots as well as the feeding period of the products. Nearly 100% pickup can be achieved using only local actions. Interestingly, increasing the number of robots does not necessarily lead to a comparable increase in production, as some robots may idle more. Finally, we compare the performance with a simple strategy of picking up the closest products. In this case, while the number of products picked up remains nearly the same, the actions of the robots under the game-theoretic strategy tend to be more distributed than under the simple strategy. For future work, the approach can be extended to scenarios in which the robots are arranged in more complex topologies, including overlapping workspaces.
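The experimental design behind these results can be sketched in a few lines of Python. The sketch assumes that the 27 scenarios form the full grid of the three team sizes, three speed levels and three feeding periods shown in Fig. 8, and it summarizes the runs of one scenario by their mean and standard deviation. The function names are hypothetical, and the use of the sample (rather than population) standard deviation is an assumption of this sketch.

```python
from itertools import product
from statistics import mean, stdev

# Parameter values as they appear in Fig. 8.
TEAM_SIZES = (2, 4, 8)               # NR, number of robots
SPEED_LEVELS = (1, 2, 3)             # robot speed level
FEEDING_PERIODS = (5.0, 7.5, 10.0)   # TF in seconds
RUNS_PER_SCENARIO = 10               # simulations per scenario, 5 min each


def scenarios():
    """Enumerate every (team size, speed level, feeding period) combination."""
    return list(product(TEAM_SIZES, SPEED_LEVELS, FEEDING_PERIODS))


def pickup_percentage(picked, fed):
    """Fraction of the fed products that the team picked up in one run."""
    if fed <= 0:
        raise ValueError("the number of fed products must be positive")
    return picked / fed


def summarize(run_percentages):
    """Mean and sample standard deviation over the runs of one scenario.

    At least two runs are needed for the sample standard deviation.
    """
    return mean(run_percentages), stdev(run_percentages)
```

Calling `len(scenarios())` gives 27, matching the number of scenarios reported; `summarize` would be applied to the `RUNS_PER_SCENARIO` pickup fractions recorded for each scenario.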
Acknowledgments
This work is supported by Bogazici University BAP 07A205, DPT 03K120250 and TUBITAK 107M240. We gratefully acknowledge the contributions of Hakan Karaoguz in the additional simulations
done for the revised paper and the Webots team for their support on various technical issues pertaining to the use of the simulation software. We also thank the anonymous reviewers for their constructive comments on the earlier version of the paper.
References
[1] Arai T, Pagello E, Parker LE. Guest editorial advances in multirobot systems. IEEE Transactions on Robotics and Automation 2002;18(5):655–61.
[2] Bonert M, Shu L, Benhabib B. Motion planning for multi-robot assembly systems. International Journal of Computer Integrated Manufacturing 2000;13(4):301–10.
[3] Başar T, Olsder GJ. Dynamic Noncooperative Game Theory. New York: Academic Press; 1982.
[4] Bozma HI, Yalcın H. Visual processing and classification of items on a moving conveyor: a selective perception approach. Robotics and Computer-Integrated Manufacturing 2002(2):125–33.
[5] Bozma HI, Duncan J. A game-theoretic approach to integration of modules. IEEE Transactions on Pattern Analysis and Machine Intelligence (SCI) 1994;16(11):1074–86.
[6] Bozma HI, Duncan JS. Noncooperative games for decentralized integration architectures in modular systems. In: Proceedings of the IEEE international workshop on intelligent robots and systems, Osaka, Japan; December 1991. p. 431–43.
[7] Bozma HI, Duncan JS. Sequential and parallel models in noncooperative games for decentralized decision making in vision systems. In: Proceedings of IEEE international conference on systems, man and cybernetics, Charlottesville, Virginia; 1991. p. 1259–64.
[8] Cao YU, Fukunaga A, Kahng A. Cooperative mobile robotics: antecedents and directions. Autonomous Robots 1997;4:2–13.
[9] Chapman A, Rogers A, Jennings NR, Leslie D. A unifying framework for iterative approximate best-response algorithm for distributed constraint optimization problems. The Knowledge Engineering Review 2011;26(4):411–44.
[10] Dudek G, Jenkin M, Milios E, Wilkes D. A taxonomy for multi-agent robotics. Autonomous Robots 1996;3(4):375–97.
[11] Farinelli A, Iocchi L, Nardi D. Multirobot systems: a classification focused on coordination. IEEE Transactions on Systems, Man and Cybernetics, Part B: Cybernetics 2004;34(5):2015–28.
[12] Galante G, Passannanti G. Minimizing the cycle time in serial manufacturing systems with multiple dual-gripper robots. International Journal of Production Research 2006;44(4):639–52.
[13] Gerkey BP, Mataric M. Sold!: auction methods for multirobot coordination. IEEE Transactions on Robotics and Automation 2002;18(5):758–68.
[14] Gerkey BP, Mataric M. Multirobot task allocation: analyzing the complexity and optimality of key architectures. In: Proceedings of the IEEE international conference on robotics and automation, Taipei, Taiwan; September 14–19, 2003. p. 3862–8.
[15] Jin-Kyu L, Tae-Eog L. Automata-based supervisory control logic design for a multi-robot assembly cell. International Journal of Computer Integrated Manufacturing 2002;15(4):319–34.
[16] Joyeux S, Alami R, Lacroix S, Philippsen R. A plan manager for multi-robot systems. International Journal of Robotics Research 2009;28(2):220–40.
[17] Karagöz CS, Bozma HI, Koditschek DE. Feedback-based event-driven parts moving. IEEE Transactions on Robotics 2004;20(6):1012–8.
[19] Kephart JO, Hogg T, Huberman BA. Dynamics of computational ecosystems. Physical Review A 1989;40:404–21.
[20] Lerman K, Jones C, Galstyan A, Mataric MJ. Analysis of dynamic task allocation in multi-robot systems. The International Journal of Robotics Research 2006;25:225–41.
[21] Maheswaran RT, Pearce JP, Tambe M. Distributed algorithms for DCOP: a graphical-game-based approach. In: Proceedings of the 17th international conference on parallel and distributed computing systems (PDCS), San Francisco, CA; September 15–17, 2004. p. 432–9.
[22] Mattone R, Divona M, Wolf A. Sorting of items on a moving conveyor belt. Part 2: performance evaluation and optimization of pick-and-place operations. Robotics and Computer-Integrated Manufacturing 2002;16(2–3):81–90.
[23] Mailath GJ, Samuelson L. Repeated games and reputations: long-run relationships. Oxford University Press; 2006.
[24] May FB, Kaye AR, Mahmoud SA. Control and communications for multiple, cooperating robots. Robotics and Computer-Integrated Manufacturing 1989;6(1):37–53.
[25] Mertens JF. Repeated games. In: Proceedings of the international congress of mathematicians, CA, USA; 1986. p. 1528–77.
[26] Nof SY, Hanna D. Operational characteristics of multi-robot systems with cooperation. International Journal of Production Research 1989;27(3):477–92.
[27] Parker LE. ALLIANCE: an architecture for fault-tolerant multirobot cooperation. IEEE Transactions on Robotics and Automation 1998;14(2):220–40.
[28] Petcu A, Faltings B. A scalable method for multiagent constraint optimization. In: Proceedings of IJCAI; 2005. p. 266–71.
[31] Tewolde GS, Wu C, Wang Y, Sheng W. Distributed multi-robot work load partition in manufacturing aut. In: 4th IEEE conference on automation science and engineering, USA; August 23–26, 2008. p. 504–9.
[32] Yokoo M, Durfee EH, Ishida T, Kuwabara K. The distributed constraint satisfaction problem: formalization and algorithms. IEEE Transactions on Knowledge and Data Engineering 1998;10(5):673–85.
[33] Werger BB, Mataric M. Broadcast of local eligibility for multi-target observation. In: Distributed autonomous robotic systems, vol. 4. Springer-Verlag; 2000. p. 347–56.
[34] Zlot R, Stentz A, Dias MB, Thayer S. Multirobot exploration controlled by a market economy. In: Proceedings of the IEEE international conference on robotics and automation (ICRA), Washington, DC; May 2002. p. 3016–23.
[35] Webots. http://www.cyberbotics.com. Commercial mobile robot simulation software.