Planning and execution through variable resolution planning

Moisés Martínez∗, Fernando Fernández, Daniel Borrajo

Computer Science Department, Universidad Carlos III de Madrid, Avenida de la Universidad, 30, Leganés 28911, Madrid, Spain
Highlights

• A novel technique for planning and execution in dynamic and stochastic environments.
• When planning, the technique removes information far into the future.
• Planning information is abstracted by selecting several predicates.
• Planning and execution performance are improved by computing plans very fast.
Article history: Received 12 May 2015; Received in revised form 9 December 2015; Accepted 22 April 2016.

Keywords: Task planning; Planning and execution; Abstract representation; Cognitive robotics.
Abstract

Generating sequences of actions – plans – for robots using Automated Planning in stochastic and dynamic environments has been shown to be a difficult task with high computational complexity. These plans are composed of actions whose execution might fail due to different reasons. In many cases, if the execution of an action fails, it prevents the execution of some (or all) of the remaining actions in the plan. Therefore, in most real-world scenarios, computing a complete and sound (valid) plan at each (re-)planning step is not worth the computational resources and time required to generate the plan. This is especially true given the high probability of plan execution failure. Besides, in many real-world environments, plans must be generated fast, both at the start of the execution and after every execution failure. In this paper, we present Variable Resolution Planning, which uses Automated Planning to quickly compute a reasonable (not necessarily sound) plan. Our approach computes an abstract representation – removing some information from the planning task – which is used once a search depth of k steps has been reached. Thus, our approach generates a plan where the first k actions are applicable if the domain is stationary and deterministic, while the rest of the plan might not necessarily be applicable. The advantages of this approach are that it: is faster than regular full-fledged planning (in both the probabilistic and the deterministic settings); does not spend much time on far-future actions that will probably never be executed, since in most cases replanning will be needed before reaching the end of the plan; and takes into account some information about the far future, as an improvement over pure reactive systems. We present experimental results on different robotics domains that simulate tasks in stochastic environments.
∗ Corresponding author.
E-mail addresses: [email protected] (M. Martínez), [email protected] (F. Fernández), [email protected] (D. Borrajo).

1. Introduction

Automated Planning (AP) is the branch of Artificial Intelligence that studies the generation of an ordered set of actions – a plan – that allows a system to transit from a given initial state to a state where a set of goals has been achieved. AP has been successfully used to solve real-world problems such as planning Mars exploration missions [1] or controlling underwater vehicles [2]. Despite these examples, the application of AP systems to stochastic and
dynamic environments still presents some challenges, mainly because these scenarios increase the complexity of the planning and execution process: (i) new information about the environment can be discovered during action execution, modifying the structure of the planning task; (ii) the execution of actions can fail, which in turn prevents the execution of the rest of the plan; (iii) the execution of the actions in the plan can generate states from which the rest of the plan cannot be successfully executed (dead-ends); and (iv) plans may need to be generated quickly to offer real-time interaction between the AP system and the environment. For these reasons, the process of generating a plan of actions can be prohibitively expensive for this kind of scenario. There are two main (extreme) approaches to solving problems in stochastic and dynamic scenarios: deliberative and reactive. At
one extreme, we find deliberative systems, which are based on interleaving AP and execution with full or partial observability. If we have information about the dynamics of the environment (failures in the actuators of a robot, the structure of the terrain, accuracy of sensors), we can define a domain model with probabilistic information with full observability (such as in PPDDL [3] or RDDL [4]). Then, one alternative consists of building conditional plans [5], where plans take into account all possible outcomes. Another approach consists of generating a set of policies by solving the problem as a Markov Decision Process (MDP) [6–8]. But, usually, the dynamics of the environment are not known or cannot be easily modeled. Then, in turn, we have two alternatives. First, we can learn the dynamics and then apply the previous approaches. However, the learning effort is huge except for small tasks [9]. The other solution, and the most used one, consists of using a deterministic domain model and replanning or repairing the plan when a failure in execution is detected (e.g. the robot is not in the expected place). When replanning [10], the planner generates an initial applicable plan and executes it, one action at a time. If an unexpected state is detected, the system generates a new plan from scratch. This process is repeated until the system reaches the problem goals. Therefore, at each planning (replanning) step, including the initial one, the system devotes a huge computational effort to computing a valid plan (an applicable plan that achieves the goals), when most of it will not be used. When repairing a running plan [11–13], the planner generates an initial applicable plan and executes it. If an unexpected state is detected, the system generates a new plan by reusing the previously generated plan and adding/removing some actions. In general, deliberative systems require a huge computational effort to generate a complete and sound plan and, depending on the dynamics of the environment, most probably the plan will not be fully executed. At the other extreme, there are several approaches that solve problems in stochastic and dynamic scenarios using reactive techniques. These systems are based on greedily selecting the next action to be applied according to some knowledge which has been programmed or learned previously. If the knowledge about the environment is only used to select the next action, we can consider it a pure reactive system without deliberation, where the system perceives and generates the next action in a continuous cycle. Systems based on the Subsumption architecture [14,15] are built using a set of control layers, where the different layers are interconnected with signals; during each execution step, one layer is chosen depending on the information perceived. Other reactive approaches are based on building reactive behavioral navigation controllers using neural networks [16,17] or fuzzy logic [18,19]. In general, reactive systems require much less computational effort but are ‘‘mostly’’ blind with respect to the future: they usually ignore the impact of the selected action on the next actions and states. Thus, they often get trapped in local minima or dead-ends. In this paper, we propose Variable Resolution Planning (vrp) for interleaving planning and execution in stochastic and dynamic environments. Our research has been inspired by the work of Zickler and Veloso [20], where a motion planning technique is used to generate a collision-free trajectory from an initial state to a goal state.
They consider the far future with a different level of detail, by selectively ignoring the physical interactions with dynamic objects. Similarly, vrp is based on two main concepts: (i) most planning effort is devoted to computing a valid plan head of length k; and (ii) the rest of the plan is only generated by checking for potential reachability under a relaxed action model. Actions are simplified by removing some domain details to decrease the computational effort while avoiding dead-ends. The main advantage of our approach is that it requires
much less search time than traditional planning approaches that compute a valid complete plan (improving over pure deliberative approaches), while retaining their capability of reasoning into the future (improving over pure reactive approaches). In addition, our technique can be easily parameterized by appropriately setting a value for k, so that its behavior gradually transits from a more deliberative approach (large values of k) to a more reactive approach (small values of k). At the extremes, if k = 1, vrp becomes an almost pure reactive system, while if k = ∞, vrp behaves as a standard deliberative planner.

This paper is organized as follows. Section 2 formally defines the representation of the planning task in classical planning. Section 3 presents an overview of vrp. Section 4 introduces the concept of predicate abstraction and how it can be deployed in AP. Section 5 describes the algorithms related to vrp. Section 6 describes the planning and execution architecture used to deploy vrp. Section 7 shows an experimental evaluation of vrp in five different domains. Section 8 presents work related to our approach. Finally, Section 9 concludes and introduces future work.

2. Planning formalization

There are different types of planning tasks defined in the literature. In this paper, we consider the sequential classical planning task, encoded in the propositional fragment of the Planning Domain Definition Language (PDDL) 2.2, which also includes advanced features such as numeric fluents, ADL conditions and effects, and derived predicates (axioms).

Definition 1 (Planning Task). A planning task can be defined as a tuple Π = (F, A, I, G), where:
• F is a finite set of grounded literals (also known as facts or atoms).
• A is a finite set of grounded actions derived from the action schemes of the domain.
• I ⊆ F is a finite set of facts that are true in the initial state.
• G ⊆ F is a finite set of goals.

Any state s is a subset of facts that are true at a given time step. Each action ai ∈ A can be defined as a tuple ai = (Pre, Add, Del), where Pre(ai) ⊆ F are the preconditions of the action, Add(ai) ⊆ F are its add effects, and Del(ai) ⊆ F are its delete effects. Eff(ai) = Add(ai) ∪ Del(ai) are the effects of the action. Actions can also have a cost, c(ai) (the default cost is one). An action a is applicable in a state si if Pre(a) ⊆ si. The result of applying an action a in state si is a new state si+1 = (si \ Del(a)) ∪ Add(a). A plan π for a planning task Π is an ordered set of actions (commonly, a sequence) π = (a1, . . . , an), ai ∈ A, that transforms the initial state I into a state sn where G ⊆ sn. This plan π can be executed if the preconditions of each action are satisfied in the state in which it is applied; i.e. for all ai ∈ π, Pre(ai) ⊆ si−1, where state si results from executing action ai in state si−1, and s0 is the initial state I. The cost of the solution is the sum of the action costs.

In PDDL [21], planning tasks are described in terms of objects of the world (robots, locations, rocks, etc.), predicates which describe static or dynamic features of these objects or relations among them (e.g. locations are connected by roads), actions that manipulate those relations (a robot can move from one location to another, a package can be grasped by a robot), an initial state that describes the initial situation before plan execution, and a goal definition which describes the objectives that must be reached by the solution plan. Commonly, this information is provided in two input files: a domain and a problem.
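To make these semantics concrete, the following minimal sketch implements action applicability, the state transition above and plan execution; the encoding of facts as strings and the function names are our illustration, not code from the paper.

from typing import FrozenSet, NamedTuple, Optional, Sequence

class Action(NamedTuple):
    name: str
    pre: FrozenSet[str]   # Pre(a)
    add: FrozenSet[str]   # Add(a)
    dele: FrozenSet[str]  # Del(a)

def applicable(a: Action, s: FrozenSet[str]) -> bool:
    # a is applicable in s iff Pre(a) ⊆ s.
    return a.pre <= s

def apply_action(a: Action, s: FrozenSet[str]) -> FrozenSet[str]:
    # s' = (s \ Del(a)) ∪ Add(a).
    return (s - a.dele) | a.add

def execute(plan: Sequence[Action], I: FrozenSet[str],
            G: FrozenSet[str]) -> Optional[FrozenSet[str]]:
    # Returns the final state if the plan is applicable and achieves G.
    s = I
    for a in plan:
        if not applicable(a, s):
            return None
        s = apply_action(a, s)
    return s if G <= s else None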
Fig. 1. vrp’s architecture.
The domain file contains a definition of a set of generalized actions A, a set of ungrounded predicates F and a set of types. The problem file defines a set of objects, an initial state (I), and a set of goals (G). Currently, state-of-the-art planners compile the input specification (ungrounded PDDL) into a propositional representation that uses grounded predicates (propositions) and instantiated actions. In the propositional representation, the world states are represented as sets of atomic propositions (including the goals), and actions are represented in terms of which propositions must be true for the action to be applicable (preconditions), and which propositions the action makes true (add effects) or false (delete effects).

3. Variable resolution planning architecture

To offer the reader an overview of vrp, Fig. 1 shows its architecture with its main phases and how these are connected. The vrp technique is composed of three different phases. In the first phase, called Knowledge Gathering, information about the planning task is extracted. There are different ways to extract this information, and some of them are described in Section 5. The information extracted in this phase is used by the Abstraction Generation phase to build a set of abstract actions. Finally, the Search Algorithm phase is executed to generate an abstract plan using two sets of actions: actions and abstract actions. Actions is the set of standard actions defined in the planning task. Abstract actions is the set of actions generated in the previous phase. The search algorithm uses the action set until it reaches the horizon k; from then on, it uses the abstract action set until it reaches the goals. Upon termination, the search algorithm either outputs an abstract plan that solves the task or reports that no solution exists. We formally define the structure of an abstract plan in the next section.
4. Definitions

Our approach is based on removing some far-future details of the planning task to speed up search on hard problems which are executed in a stochastic and/or dynamic environment. So, we first have to see how PDDL defines planning tasks. PDDL mainly uses a kind of first-order logic. The set of predicates allows us to represent actions, states and goals. For instance, in order to describe the current location of a robot, we define the predicate (at robot1 location1). Predicate at describes that robot1 is at location1. Predicates can be selectively removed to simplify the state, generating abstracted representations of the planning task. In first-order logic, these abstractions are built by removing some predicates from the PDDL representation of the planning task. In the corresponding propositional representation, this is equivalent to removing literals from F, I and G and from the preconditions and effects of the grounded actions in A.

To illustrate the concept of predicate abstraction used in our approach, a task of the Rovers domain is shown in Fig. 2.1 The environment is represented as a grid of 16 cells, called waypoints. Each one is denoted with a bi-dimensional coordinate (x, y), starting at the bottom-left cell of the grid. White cells represent waypoints in which the rover can stay (free), while black cells represent obstacles. Two types of Mars samples can be collected: rocks and soil. Rocks are denoted with a small black circle; soil is denoted with a small black square. Rovers are located at waypoints and can move between any two free waypoints which are adjacent. Besides, rovers can take samples of rocks and soil and analyze them. Finally, there is a lander base which is used by the rovers to send information about the analysis made over the samples. The typical goal of a Rovers task is to take some soil and rock samples, analyze them and send the results to the lander base.

The following function generates a mapping between the ungrounded (PDDL) definition of a predicate p and a set of grounded facts F.

Definition 2 (Grounding Mapping). Let φ be a set of facts, and p a predicate. The grounding mapping g(p, φ) is the set of grounded propositions (facts) of predicate p in φ.

For instance, suppose p is the ungrounded (lifted) predicate (at ?x ?y).2 Fig. 2 shows a planning task with one rover, r1, one
1 This domain is a simplified version of the one used to provide plans for the actual Mars rovers.
2 In PDDL, any symbol preceded by a question mark denotes a variable that can be grounded with particular objects. Also, in PDDL, variables in predicates are further annotated with the object types that can substitute the variable. In this case, we would have (at ?x - robot ?y - waypoint).
Fig. 2. A Rovers planning task. Take soil sample v1 from location (2,2) and rock sample s1 from location (0,1), using rover r1. Rovers may only move between white cells, while black cells are obstacles. Samples may be analyzed and the results must be communicated to the lander base l1, which is at location (3,3).
rock sample denoted by s1, one soil sample denoted by v1 and 16 waypoints (w00, . . . , w33). Then, the grounding mapping of predicate p in φ = F of the figure's planning task is composed of only the groundings of predicate at: g(p, F) = {(at r1 w00), (at r1 w01), . . . , (at r1 w32), (at r1 w33)}.

Since vrp performs an abstraction over some predicates, we first need to define a projection over predicates, which will be the basis for the abstraction.

Definition 3 (Projection of a Set of Facts Over a Predicate or Set of Predicates). Let φ be a set of facts and p a predicate. A projection of φ over p is defined as: Proj_p(φ) = {x | x ∈ φ, x ∈ g(p, φ)}. If L is a set of predicates, we also define the projection over L as: Proj_L(φ) = {x | x ∈ φ, ∃p ∈ L, x ∈ g(p, φ)}.

We can generalize this definition to any first-order logic formula by recursively applying the previous definition.

Definition 4 (Projection of a Formula Over a Set of Predicates). Let φ1 and φ2 be first-order logic formulas and let L be a set of predicates. The projection over L is recursively defined as:

• Proj_L(φ1 ∧ φ2) = Proj_L(φ1) ∧ Proj_L(φ2).
• Proj_L(φ1 ∨ φ2) = Proj_L(φ1) ∨ Proj_L(φ2).
• Proj_L(¬φ1) = ¬Proj_L(φ1).

A projection of a planning task Π over a set of predicates L removes, in all the components of Π, the information about the rest of the predicates.

Definition 5 (Projection of a Planning Task Over a Set of Predicates). Let Π = (F, A, I, G) be a planning task and let L be a set of predicates. A projection of the planning task Π over L is defined as: Proj_L(Π) = (Proj_L(F), Proj_L(A), Proj_L(I), Proj_L(G)), where Proj_L(A) = {a′ | a ∈ A, a = (Pre, Add, Del), a′ = (Proj_L(Pre), Proj_L(Add), Proj_L(Del))} and Proj_L(F), Proj_L(I), Proj_L(G), Proj_L(Pre), Proj_L(Add) and Proj_L(Del) are computed with Definitions 3 and 4.

In the case of vrp, instead of projecting over a set of predicates, we will generate abstractions by projecting over the complement of a set of predicates. That is, we will remove from the planning task everything related to that set of predicates.

Definition 6 (Abstraction of a Planning Task Over a Set of Predicates). Let Π = (F, A, I, G) be a planning task, P the original set of predicates of Π and L a subset of predicates, L ⊆ P. An abstraction of the planning task Π over L is defined as: Π^abs_L = Proj_{P\L}(Π). Equivalently, Π^abs_L = (F^abs_L, A^abs_L, I^abs_L, G^abs_L), where F^abs_L = Proj_{P\L}(F), A^abs_L = Proj_{P\L}(A), I^abs_L = Proj_{P\L}(I) and G^abs_L = Proj_{P\L}(G).
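The following sketch shows one way to implement Definitions 3, 5 and 6 on top of the Action type from the previous sketch; the assumption that a fact is a string whose first token is the predicate name is ours, not the paper's.

def predicate_of(fact: str) -> str:
    # "(at r1 w00)" -> "at"
    return fact.strip("()").split()[0]

def project_facts(facts, L):
    # Proj_L(φ): keep only the facts of predicates in L (Definition 3).
    return frozenset(f for f in facts if predicate_of(f) in L)

def project_action(a: Action, L) -> Action:
    # Project preconditions and effects (Definition 5).
    return Action(a.name, project_facts(a.pre, L),
                  project_facts(a.add, L), project_facts(a.dele, L))

def abstract_task(F, A, I, G, L):
    # Π^abs_L = Proj_{P\L}(Π): project over the complement of L (Definition 6).
    keep = {predicate_of(f) for f in F} - set(L)
    return (project_facts(F, keep),
            [project_action(a, keep) for a in A],
            project_facts(I, keep),
            project_facts(G, keep))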
Let us see an example of abstracting an action. Given the planning task Π shown in Fig. 2, let L = {(at ?x ?y)} and let a be the grounded action navigate(r1, w03, w13),3 where:

• Pre(a) = {(can_traverse r1 w03 w13), (visible w03 w13), (available r1), (at r1 w03)}.
• Add(a) = {(at r1 w13)}.
• Del(a) = {(at r1 w03)}.

The abstraction of action a over L is a^abs:

• Pre(a^abs) = {(can_traverse r1 w03 w13), (visible w03 w13), (available r1)}.
• Add(a^abs) = ∅.
• Del(a^abs) = ∅.

If we abstract over at (at is removed from Π), rovers are no longer required to be at any particular location to perform any task, and in particular to move between locations. Thus, abstraction modifies the structure of the planning task. We also have to define how the abstraction affects plans.

Definition 7 (Abstract Plan). Let Π = (F, A, I, G) be a planning task, let L be a set of predicates, let Π^abs_L = (F^abs_L, A^abs_L, I^abs_L, G^abs_L) be an abstract planning task and let k be a number, k > 0. An abstract plan π^abs_L is an action sequence π^abs_L = (a0, . . . , ak−1, ak, . . . , an−1). The abstract plan is divided into two sub-plans: the first one is a sound plan composed of the first k actions, ai ∈ A for all i = 0 . . . k − 1; the second one is composed of n − k abstract actions, ai ∈ A^abs_L for all i = k . . . n − 1.

Abstractions have been defined to decrease the complexity of problem solving. Actions before the horizon k are original actions, so the head of the plan is a valid plan. After the horizon, vrp only uses abstract actions, which allows for much faster problem solving. In many cases, these abstract actions will not be executable in the environment, because some required preconditions have been removed from the definition of the action.
3 Moving the rover from one waypoint to another.
Fig. 3. Plans generated for the Rovers task described previously. Left: plan generated by a standard planner. Right: abstract plan generated by removing predicate at with horizon k = 4.
On the other hand, these actions would only be executed in the far future (after k steps); assuming the world is stochastic, the plan will most likely fail before they are even attempted. So, from the point of view of the whole planning and execution process, it is not that important that they are not completely valid, because they will most probably not be used in the current run. When (if) the plan fails, a new plan will be computed with horizon k again, so the first abstract actions of the previous plan will be replaced by original valid actions. Fig. 3 compares the abstract plan generated from the example abstract planning task with k = 4 to the plan generated from the example planning task (Fig. 2). Since predicate at is removed from horizon k = 4 onwards, no more navigation actions appear in the abstract plan after that point. The abstract plan is easier to compute than the standard plan, but it contains abstract actions which cannot be executed in the environment.
Table 1
Predicate categorization for the Rovers domain.

Static:  at_lander, can_traverse, equipped_for_soil_analysis, equipped_for_rock_analysis, equipped_for_imaging, available, supports, visible, visible_from, store_of, calibration_target, on_board
Dynamic: at, empty, have_rock_analysis, have_soil_analysis, full, calibrated, have_image, channel_free, at_soil_sample, at_rock_sample, communicated_soil_data, communicated_rock_data, communicated_image_data
Goals:   communicated_soil_data, communicated_rock_data, communicated_image_data

5. The variable resolution planning algorithm

As we have discussed previously, predicates can be removed from the original planning task to decrease the search space and generate a new, smaller abstracted search state. But, first, we need to choose which predicates will be selected (discussed in Knowledge gathering) and when they will be removed during search (discussed in Search).

5.1. Knowledge gathering

The first phase of vrp consists of gathering knowledge from the planning task to select the predicates to abstract over. The relevance of each predicate depends on different factors: (i) the type of predicate, i.e. whether the predicate is added or deleted during search; (ii) whether the information that the predicate describes is relevant to achieve the goals; and (iii) whether the information in a predicate is relevant to select other predicates. During this phase, some predicates are selected according to some of these factors, and they form the predicates set of a planning task, ps(Π). We have defined four categories of predicates:

• Static: a predicate which does not appear in the effects of any action of the planning task.
• Dynamic: a predicate which is part of the effects of at least one action of the planning task.
• Goal: a predicate that represents some of the goals of the planning task.
• Function: a predicate that describes a property of the world that is represented using numeric values.

According to this categorization, it is easy to identify the relevance of the different types of predicates when selecting them for abstraction. Static predicates are invariant, so they do not change during the planning process. For this reason, eliminating them does not offer any real opportunity to decrease the complexity of the search task. Goal predicates cannot be removed, since the planning task would be transformed into an unsolvable one.
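A possible implementation of this categorization is sketched below; it reuses the Action type and predicate_of from the earlier sketches and omits function predicates, since numeric domains are not tackled in this paper.

def categorize_predicates(F, A, G):
    all_preds = {predicate_of(f) for f in F}
    # Dynamic: predicates appearing in the effects of at least one action.
    dynamic = {predicate_of(f) for a in A for f in (a.add | a.dele)}
    goal = {predicate_of(f) for f in G}
    static = all_preds - dynamic
    return static, dynamic, goal

def abstraction_candidates(F, A, G):
    # Best candidates: dynamic predicates that are not goal predicates.
    static, dynamic, goal = categorize_predicates(F, A, G)
    return dynamic - goal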
Our objective consists of relaxing some parts of the planning task while keeping the same task. Functions can be good candidates but, for practical reasons, in this paper we do not tackle domains with functions; the Fast Downward code [22], on which this research is based, cannot handle numeric preconditions in actions. Finally, dynamic predicates change during search. Clearly, these are the best option to simplify the planning task by abstracting them away without losing goal reachability. As an example, Table 1 shows the predicate categorization for the Rovers domain.

There can potentially be many different ways to choose those predicates. In previous papers [23,24], we explored one alternative that consisted of experimentation to compare the results obtained by choosing different predicates to abstract. In this paper, we describe two new alternatives: manual and landmark-based selection.

5.1.1. Manual selection

According to our previous analysis of abstracted planning tasks, a first approach to selecting the predicates to remove is to rely on human expert knowledge. In the Rovers domain, the best candidate from the standpoint of an expert is predicate at. This predicate describes the location of a robot which is continuously moving between locations that are connected by different paths. The robots need to move in order to complete other tasks. Since there are different paths to move from any location to any other location, removing predicate at is a good choice to decrease the complexity of the search process while not introducing dead-ends. It prevents the planner from continuously reasoning about the position of the rover and computing a new path to the next task for actions to be performed in the far future. The planner will deal with those computations when the next action fails, again ignoring movement actions in the far future. For each domain, we provide in the
experimental section a rationale for choosing some predicates over others. The advantage of this approach is that it works quite well if the domains at hand are well known. The disadvantage is that it might require some manual trial and error.

5.1.2. Landmark-based selection

Landmarks can be defined as logical formulas (possibly consisting of a single fact) that must be achieved at some point in every valid plan [25]. There are two kinds of landmarks, fact and action landmarks, which are formally defined next:
• f ∈ F is a fact landmark of the planning task Π , if for each execution of each valid plan π that solves Π , f is true at some point.
• a ∈ A is an action landmark of the planning task Π if, for each valid plan π that solves Π, a ∈ π.

Landmarks were first used to decompose a planning task into smaller tasks to simplify the search [26]. Later, landmarks have been used to define heuristic functions [27] by counting the number of landmarks that still need to be achieved to solve the task. Regarding the example shown in Fig. 2, the landmark set L is composed of landmarks related to the location of the rover, the state of the rover store and the analysis of rock and soil: L(Π) = {at(r1, w03), at(r1, w10), at(r1, w22), empty(rs1),
full(rs1), have_soil_analysis(r1, w22), have_rock_analysis(r1, w10), communicated_soil_data(w22), communicated_rock_data(w10)}.

There are several algorithms to compute landmarks [28,29]. Once the landmarks are computed, we select the subset of landmarks composed of dynamic predicates, removing goal predicates.

Definition 8 (Predicates Set Extracted From the Landmark Set). Let Π = (F, A, I, G) be a planning task, and L(Π) the set of landmarks of Π. The predicates set to be abstracted, extracted from the landmark set, is: ps(Π) = {p | l ∈ L(Π), l ∉ G, l ∈ g(p, F)}.
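Definition 8 can be implemented in a few lines. The sketch below assumes fact landmarks are given as ground facts (e.g. as returned by an off-the-shelf landmark generator) and reuses predicate_of from the earlier sketches.

def predicates_set(landmarks, G):
    # ps(Π): predicates of the fact landmarks that are not goals.
    return {predicate_of(l) for l in landmarks if l not in G}

# For the task of Fig. 2, with the landmark set L(Π) above and the
# communicated_* facts as goals, this would yield
# {'at', 'empty', 'full', 'have_rock_analysis', 'have_soil_analysis'}.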
According to the previous definition, the predicates set for the planning task depicted in Fig. 2 is ps(Π) = {at, empty, full, have_rock_analysis, have_soil_analysis}.

5.2. Abstractions generation

The next phase generates abstractions using the information of the predicates set computed in the previous phase. According to the definitions in Section 4, vrp builds the abstraction over the propositional representation [24]. But, currently, most planners use a multi-valued state variable representation, which is an extension of the SAS+ formalism [30]. The complete abstraction generation process is composed of two steps (encoding abstractions on SAS+ and generating abstract actions). Before describing these steps in detail, we first need to describe planning tasks in SAS+.

A planning task in SAS+ is defined as a 4-tuple Π = (V, s0, s∗, O). V is a set of state variables, where each variable v ∈ V has an associated extended domain D∗v = Dv ∪ {u}, composed of a set of values Dv and the undefined value u. The value of a variable v ∈ V in a given state s, also known as a fluent, is denoted s[v]. Partial states are states in which at least one fluent s[v] = u. s0 is the initial state, defined over V such that s0[v] ≠ u for all v ∈ V. s∗ is a partial state that defines the goals. O is a finite set of operators over V, where each operator o ∈ O is
a 3-tuple o = (pre(o), post(o), prev(o)). The preconditions pre(o) define which variable values must hold before the operator is applied; these variables will be modified by the application of the operator. The post-conditions post(o) describe the new values of the variables that are modified by the application of the operator. The prevail-conditions prev(o) define which variable values must hold before the operator is applied and are not modified by its application.

5.2.1. Encoding abstractions on SAS+

Planners that use SAS+ transform a planning task expressed in PDDL into SAS+ using multi-valued state variables. Each variable value (a fluent in SAS+) is equivalent to a grounded predicate. For the example depicted in Fig. 2, a SAS+ planner would generate a state variable r1 to define the position of the rover r1. It is composed of 16 fluents, one for each position in which the rover can be: r1 = {at-w00, . . . , at-w33} ∪ {u}. In this case, if in a state s, s[r1] = at-w00, it represents the same information as the grounded predicate at-r1-w00. However, in many domains, variable values can come from groundings of different predicates. For instance, if rovers could hold rocks and move rocks to other locations, the variable that represents the location of a rock rc1 could have as values all waypoints in the planning task (at-w00 to at-w33), as well as all robots that can hold it (holds-r1 to holds-rn in the case of a planning task with n robots). Thus, in order to apply the abstraction mechanism described before, we cannot just remove variables from the planning task, but rather the values of variables that correspond to predicates in the predicates set.

Therefore, the first stage of the abstraction generation process consists of marking fluents of SAS+ variables. A new property is defined for each fluent: property a(vi) indicates whether fluent vi must be removed when abstractions are turned on. The marking process consists of identifying which fluents must be removed. In order to determine whether a fluent is part of the predicates set, a function n(vi) has been defined, which extracts the name of the predicate that the fluent vi represents. The predicate name of each fluent is compared with each element of the predicates set; if it belongs to the predicates set, the fluent is marked as abstract. The result of this stage is a set, called ap, composed of the fluents marked as abstract.

5.2.2. Generating abstract actions

The second stage of the abstraction generation process consists of generating abstract operators using the SAS+ fluents marked as abstract. A new set of operators, the abstract operator space O^abs, is defined. It is composed of the new operators computed by removing the fluents marked as abstract from the standard operators. For each operator o ∈ O, a new operator o^abs is generated, where pre^abs(o) = pre(o) \ ap, prev^abs(o) = prev(o) \ ap and post^abs(o) = post(o) \ ap. If all fluents of both the pre and post sets are removed, the operator is not added to O^abs. Fig. 4 illustrates the result of generating abstract operators over SAS+ for the example described in Fig. 2. As we can see in the example, operator navigate-r1-w01-w02 does not generate any abstract operator when the abstraction is applied. However, operator communicate_rock_data-r1-general-w10-w20-w30 generates an abstract operator with one precondition less. According to this, |O^abs| ≤ |O|.
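The two stages can be sketched as follows. The representation of SAS+ operators as three sets of (variable, value) pairs and the helper name_of (the function n in the text) are our illustration, not Fast Downward's actual data structures.

from typing import Callable, FrozenSet, List, NamedTuple, Set, Tuple

Fluent = Tuple[str, str]  # (variable, value), e.g. ("r1", "at-w00")

class Operator(NamedTuple):
    name: str
    pre: FrozenSet[Fluent]    # pre(o)
    prev: FrozenSet[Fluent]   # prev(o)
    post: FrozenSet[Fluent]   # post(o)

def mark_abstract(fluents: Set[Fluent], ps: Set[str],
                  name_of: Callable[[Fluent], str]) -> Set[Fluent]:
    # Stage 1: ap is the set of fluents whose predicate is in the predicates set.
    return {v for v in fluents if name_of(v) in ps}

def abstract_operators(O: List[Operator], ap: Set[Fluent]) -> List[Operator]:
    # Stage 2: strip marked fluents; drop operators whose pre and post
    # sets both become empty.
    O_abs = []
    for o in O:
        o_abs = Operator(o.name, o.pre - ap, o.prev - ap, o.post - ap)
        if o_abs.pre or o_abs.post:
            O_abs.append(o_abs)
    return O_abs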
The result of this step is a SAS+ representation that includes the abstract operators generated using the set ap.

5.3. Search using predicate abstraction
Fig. 4. Multi-valued state variable representation of the Rovers planning task including abstract operators when predicate at is removed.
Finally, the third phase of vrp consists of generating an abstract plan using both the operator space O and the abstract operator space O^abs during search. Our approach is based on relaxing some far-future details, so we have to choose from what point on these details are relaxed. We reuse the concept of horizon, which has been employed with success before [31].

Definition 9 (Horizon). A valid horizon is any natural number k such that k > 0.

The horizon of Definition 9 determines which operator set is used to generate the successors of a node n during search. For each node, the depth of the node, d(n), is compared with k: if d(n) < k, the function getApplicableOps(n, k) that computes the successors of n uses the standard operator set O to choose the applicable operators; otherwise, it uses the abstract operator set O^abs:

getApplicableOps(n, k) = operators(O, state(n))      if d(n) < k
                         operators(O^abs, state(n))  if d(n) ≥ k.    (1)
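A minimal sketch of Eq. (1), assuming search nodes carry their depth and state; Operator and Fluent are taken from the previous sketch.

from typing import FrozenSet, List, NamedTuple

class Node(NamedTuple):
    state: FrozenSet[Fluent]
    depth: int                # d(n)

def get_applicable_ops(node: Node, k: int,
                       O: List[Operator], O_abs: List[Operator]) -> List[Operator]:
    # Below the horizon expand with O, from the horizon on with O^abs.
    ops = O if node.depth < k else O_abs
    # An operator is applicable if its pre- and prevail-conditions hold.
    return [o for o in ops if (o.pre | o.prev) <= node.state]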
During search, the effect of the horizon is combined with predicate abstractions, effectively pruning the search space. The combination of both decreases the number of operators available for each state si once the depth reaches k. Thus, solving time is expected to decrease, because the number of generated states will be much smaller. However, the quality of the generated abstract plan depends on the value of k. On the one hand, small values of k increase the speed of the planning process and decrease the number of search nodes; potentially, this diminishes the quality of the plan and increases the number of replanning steps when the technique is used in a planning and execution cycle (more abstract actions mean more execution failures). On the other hand, high values of k increase the quality of the plan, but also increase planning time. In the worst case, if the value of k is close to the length of the standard plan, the planning time using horizon and predicate abstraction will be longer than the standard planning time. In order to understand how the value of k influences the search process, we chose a representative set of values for the experiments reported next.
6. Planning and execution environment
This section presents the planning and execution environment used to deploy vrp. We have implemented a planning, execution and replanning loop based on a light version of the PELEA
architecture4 [32,33], which uses the simulator MDPSim [34] to emulate the execution of plans. MDPSim executes the actions according to a given probabilistic action model described in PPDDL, which is based on the original domain. Besides, for each domain, we have defined an error simulator which increases the variability of the execution dynamics.

6.1. PPDDL: probabilistic PDDL

The Probabilistic Planning Domain Definition Language (PPDDL) is the standard representation language for describing Automated Planning problems in stochastic domains. It was designed for the first International Planning Competition (IPC5) with a non-deterministic track [34]. This language is an extension of PDDL that supports actions with probabilistic effects. Fig. 5 shows a navigate action for the Rovers domain encoded in PPDDL. It represents the possible outcomes of an action with stochastic effects, where the rover moves to the desired waypoint (denoted by variable ?z) with a probability of 0.8. Since the probabilities do not sum to one, with the remaining probability (0.2) there are no changes in the state (the rover remains in the current waypoint, ?y).
6.2. Planning and execution architecture

The planning, execution and replanning loop is shown in Fig. 6.6 It allows us to evaluate vrp using a planner which implements the technique. This loop is executed using different modules of the PELEA architecture. These modules are independent and are connected using TCP connections. The planning and execution process starts with the Execution module. Given a planning task (consisting of a deterministic PDDL domain and problem) and a horizon k, the Execution module requests a plan π from the Monitoring module, which, in turn, requests a plan from the Decision support module. This module generates a plan using a vrp planning system which takes as input the deterministic PDDL domain and problem as well as k. Next, the Monitoring module iteratively sends every action ai in the plan to the Execution module. Finally, the Execution module sends each action to the real world to be executed and receives the new observed state. In our case, the real world has been realistically simulated by using two modules. The Error simulator transforms the action ai into another action aei in PPDDL that incorporates new stochastic effects into the action model. Next, the Error simulator sends the new action to the actual simulator, MDPSim in this case. MDPSim executes each action using a stochastic action model in PPDDL, manually generated from the original action model (PDDL domain). After every execution of an action, if the observed state is not valid according to the deterministic version of the domain (as in Definition 10), PELEA generates a new plan using the observed state as the new initial state. This process is repeated in a replanning, execution and monitoring loop until a state where the goals have been reached is sensed. Due to the characteristics of the simulator, we assume that the different agents (robots) can wait in a secure state during the computation of a new plan.

4 https://bitbucket.org/momartin/pelea stores the source code of the architecture, which has been developed in Java 1.7. There are detailed instructions in the repository on how to use it.
5 http://ipc.icaps-conference.org.
6 https://bitbucket.org/momartin/peleahorizon stores the source code of the different modules developed to deploy vrp in the PELEA architecture. There are detailed instructions in the repository on how to use it.

Definition 10 (Valid State). Let Π = (F, A, I, G) be a planning task, let a ∈ A be the next action to be executed in the current plan and let s ⊆ F be the current state of the execution. s is a valid state for the execution of action a if, for all pi ∈ Pre(a), pi ∈ s.

6.2.1. Monitoring

Monitoring is the main module of the PELEA architecture. It synchronizes the communication between the other modules (Execution and Decision support). The main loop of this module (Algorithm 1) consists of monitoring action execution, sending the next action to the Execution module (line 11), and asking the Decision support module for an initial plan (line 3). Besides, it is responsible for checking differences between the expected state and the observed state of the environment sent by the Execution module (line 5). If an observed state is not valid according to Definition 10, this module starts another planning episode to generate a new plan from the observed state (line 16).

Algorithm 1: Pseudo-code of the main loop of the Monitoring Module
    Data: Φ = (F, A, I, G), k
 1  begin
 2      Set state = I, plan = ∅;
 3      plan = send getPlan(state, A, k) to DecisionSupport;
 4      repeat
 5          if validState(state) then
 6              if goalReached(state, G) then
 7                  return Successful Execution;
 8              else
 9                  if plan ≠ ∅ then
10                      a = getNextAction(plan);
11                      send ExecuteAction(a) to Execution;
12                      state = send getState() to Execution;
13                  else
14                      return No Solution;
15          else
16              plan = send getPlan(state, A, k) to DecisionSupport;
17      until true;
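A minimal sketch of Definition 10 and of the checks performed in lines 5-6 of Algorithm 1; states, preconditions and goals are sets of ground facts, as in the sketches of Section 2.

def valid_state(state, a) -> bool:
    # s is a valid state for action a iff Pre(a) ⊆ s (Definition 10).
    return a.pre <= state

def goal_reached(state, G) -> bool:
    return G <= state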
6.2.2. Decision support

This module generates a plan of actions by running a planner. Additionally, when the Monitoring module informs it about a discrepancy between the observed state and the expected planning state, the Decision support module runs the planner again to generate a new plan from the current state. This module can be configured to use two different types of planners: one for planning from scratch and another one to perform replanning or plan repair.

6.2.3. Execution

This module executes each action in the environment, which is simulated using MDPSim. Besides, it is responsible for initiating the work of PELEA by sending a problem and domain definition to be solved to the Monitoring module.

6.2.4. MDPSim simulator

This simulator, which was used in the first IPC probabilistic track [34], is based on an MDP, and it was developed to simulate the execution of PDDL actions in stochastic environments. MDPSim executes each action according to a given probabilistic action model described in PPDDL and sends back the resulting state. The PPDDL model generates execution failures, such as actions that generate effects that were not present in the original deterministic PDDL model used by the planner.
Fig. 5. PPDDL definition for action navigate from the Rovers domain.
Fig. 6. Light PELEA architecture used to evaluate vrp.
6.2.5. Error simulator

MDPSim can simulate a probabilistic model to generate different outcomes for each action using PPDDL, but it is extremely difficult to generate outcomes that change some details of the environment. For instance, in the Rovers domain, in the real world a rover could move to a waypoint different from the expected one, depending on the ground properties (roughness, slipperiness) or the lack of precision of the actuators. This is difficult to encode using PPDDL. Therefore, we have developed an error simulator that generates additional execution failures which cannot be (easily) encoded into the PPDDL actions. This information is encoded as parameters of the action and used in its effects to generate different outcomes, which cannot be defined in the grammar accepted by the MDPSim parser. So, we have two levels of failure generation, to account for execution failures in a way that is as realistic as possible. Fig. 7 shows a navigate action for the Rovers domain encoded in PPDDL. It represents the possible outcomes of an action with stochastic effects, where the rover moves to the desired waypoint (denoted by variable ?z) with a 0.8 probability, and to two different accessible waypoints (denoted by variables ?we1 and ?we2) with a 0.05 probability each. Since the probabilities do not sum to one, with the remaining probability (0.1) there are no changes in the state (the rover remains in the current waypoint, ?y). In this case, the action navigate includes two new parameters which represent two waypoints. These parameters are incorporated into the action model by the Error simulator.
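The following sketch illustrates the combined outcome distribution of navigate under this two-level failure model. The probabilities (0.8, 0.05, 0.05, 0.1) are the ones quoted above; the sampling code itself is our illustration of what MDPSim plus the error simulator produce, not their actual implementation.

import random

def sample_navigate_outcome(current_wp: str, target_wp: str,
                            err_wp1: str, err_wp2: str) -> str:
    r = random.random()
    if r < 0.8:
        return target_wp   # intended effect: rover reaches ?z
    elif r < 0.85:
        return err_wp1     # drift to an adjacent waypoint (?we1)
    elif r < 0.9:
        return err_wp2     # drift to the other adjacent waypoint (?we2)
    else:
        return current_wp  # no change: rover stays at ?y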
7. Experimental results

This section presents the experimental results of using akfd for planning and execution. We compare its performance against its closest competitor models in terms of planning and execution systems based on PDDL and its variants:

• Classical planning, using planning and replanning when an execution failure is detected. We use LAMA11, an anytime planner developed within the Fast Downward framework [27]. Once LAMA11 has found a first solution, it continues to search for better solutions until it exhausts the search space or the available resources (memory and/or time). LAMA11 was the winner of the sequential satisficing track of IPC2011. LAMA11 assumes it will be given 30 minutes to generate a plan, which is unreasonable for most robotics tasks; therefore, in our case, we only use the first solution, and we call this configuration lamaf.
• Probabilistic planning, which uses a PPDDL model and thus has information on the probabilistic outcomes of actions. We have used mGPT [6], the winner of the last probabilistic track based on PPDDL.
• Reactive planning, where only the next best action is considered, without deliberation. Instead of implementing a pure reactive planner, we use akfd with k = 2, which can be considered a close (and somewhat stronger) approximation of one.
• Our approach, akfd, with different levels of reactiveness depending on the value of parameter k. If k is small, such as 1 or 2, its performance is very close to that of a reactive planner. When k is large, it gradually turns into classical planning, as in the first approach using lamaf.

We present the results in five different robotics domains (Rovers, Depots-robots, TidyBot, Port and Satellite). They are very representative of current applications of planning and robotics, such as Mars rovers (Rovers), robots operating in depots (Depots-robots7),
7 This domain is inspired by the Kiva robots [35] that Amazon uses in its warehouses. In this domain, there is a set of robots, a set of humans and a set of pods located in a warehouse. The warehouse is defined as a grid composed of cells. Robots can move among adjacent free cells and carry pods to humans. Humans are located at specific locations where they can use the products contained in the pods.
Fig. 7. PPDDL definition for action navigate from the Rovers domain. This action includes new information which is computed by the error simulator.
household robotics (TidyBot), robotic cranes (Port) [36] and space observation satellites (Satellite). For each domain, we have chosen five representative problems of different complexity (two easy problems, one of medium complexity and two hard problems). We have developed Abstract k Fast-Downward (akfd)8 in C++ over the Fast Downward framework [22]. The experiments were conducted on an Intel Xeon 2.93 GHz Quad Core processor (64 bits) running Linux. The maximum available memory for the planners was set to 8 GB, the maximum planning time for a problem was set to 1000 s, and the maximum execution time for a complete planning, execution and replanning loop for each planning task was set to 86 400 s (1 day). Each problem has been executed 15 times (we call each execution a run); a run ends when the goals are reached, the maximum planning time is reached, the maximum execution time is reached or a dead-end is detected. We have run akfd with different values of k = 2, 5, 10, 20, 30, 50, 100. The values 50 and 100 have only been used in the Rovers domain.

7.1. Rovers domain

In this domain, which we have described before, we have designed an error model based on four failures which have been encoded into PPDDL. (i) There is a general error which prevents the execution of any action: each action has a probability of 0.05 of not being executed properly. (ii) A calibration error happens when a rover tries to take a picture and the camera accidentally loses its calibration, so the rover needs to calibrate the camera again. This error has a probability of 0.1. (iii) A communication error occurs when a rover tries to send samples or images to the lander and the information sent by the rover is never received; the sample is lost and the rover must take the sample again. This error has a probability of 0.1. The domain definition has been modified to allow these kinds of failures: rock and soil samples do not disappear from a waypoint when a rover uses them. And (iv) a navigation error happens when a rover moves to a waypoint different from the expected one while navigating; in that case, the resulting waypoint must be connected with the origin or destination waypoint. This error has a probability of 0.15.

Table 2 shows the results for five different problems of the Rovers domain where an abstraction has been computed using predicate at. This predicate has been selected manually. The results report the average over 15 executions and the standard error.
8 https://bitbucket.org/momartin/akfd stores the source code of the planner akfd.
lamaf solves all problems, but uses much longer planning times than our approach. Besides, our approach decreases the planning and execution time by an order of magnitude on problems 37 and 38. Also, our planner decreases the first planning time by an order of magnitude on all problems of the Rovers domain. We can also see that there is no significant variation in the first planning time when the horizon is increased. If the horizon is smaller than 5, akfd cannot solve the problems; but when k ≥ 5, akfd solves all problems while decreasing the planning time. When k = 30, akfd obtains the best results, increasing the quality of the plan (number of actions in the plan) and decreasing the number of replanning steps. We can also see that an approach based on MDPs, mGPT, which receives as input more information than our approach (the probabilistic effects of actions), is not able to scale up and does not solve any problem. Finally, the results show that yahsp can generate plans of actions more quickly than akfd. However, the quality9 of the generated plans is much worse than that of akfd. This fact is extremely important in real-world scenarios, where in most cases we want to minimize either the execution cost or the execution time. In case we want to minimize the total planning and execution time, given that in most cases the actions' durations take more than 1 s, the combined planning and execution times of lamaf are longer than those of akfd. In order to show an accumulated total time for planning and execution, we need to estimate the actions' durations. If we assume an average of 60 s, the row labeled PE shows the sum of planning and execution time in the environment.

Table 3 shows the results for five different problems of the Rovers domain where the abstraction has been computed using the automatically generated landmarks. In this case, akfd obtains results similar to those reported in Table 2, but the abstraction has been generated automatically. If k ≥ 10, the planning time is in most configurations two to three times smaller than that of lamaf. In general, akfd decreases the mean planning time of the full execution loop as well as the first planning time. This is a key factor in domains with high stochasticity or in which reasoning time should be minimal.

According to the results reported in Tables 2 and 3, we can extract some important information about the horizon and the domain: (1) increasing the value of the horizon does not necessarily decrease the planning time or improve the quality of plans; the results improve until around k = 30 and then performance starts to worsen.
9 Quality is measured as the number of actions in this domain.
Table 2
Comparing akfd using different horizons to lamaf and mGPT over five problems from the Rovers domain, manually removing predicate at. F corresponds to the planning time of the first planning episode in seconds. T corresponds to the total execution time (including planning time) of all runs in seconds. R corresponds to the number of replanning episodes. A corresponds to the number of actions executed in the simulated environment. PE corresponds to the full planning and execution time and C corresponds to the number of problems solved. In bold, we highlight the best results per problem. Besides, for each problem we give three values which describe its complexity: (i) number of rovers; (ii) number of waypoints; and (iii) number of goals.
Planner | Problem | F(s) | T(s) | R | A | PE(s) | C
lamaf | Rovers 37 (14,80,40) | 412.1 ± 3.3 | 7125.5 ± 1145.5 | 75.6 ± 7.4 | 512.2 ± 28.9 | 37 857.5 ± 2879.5 | 15,0,0,0/15
lamaf | Rovers 38 (14,85,57) | 236.5 ± 1.78 | 3205.1 ± 236.7 | 59.2 ± 7.3 | 432.8 ± 42.3 | 29 173.1 ± 2774.7 | 15,0,0,0/15
lamaf | Rovers 39 (14,95,62) | 257.1 ± 1.3 | 3342.1 ± 845.4 | 68.2 ± 5.1 | 486.8 ± 21.2 | 32 550.1 ± 2117.4 | 15,0,0,0/15
lamaf | Rovers 40 (14,100,68) | 354.8 ± 2.2 | 4406.1 ± 872.1 | 70.1 ± 10.2 | 530.8 ± 23.3 | 36 254.1 ± 2270.1 | 15,0,0,0/15
lamaf | Rovers 80 (100,60,98) | – | – | – | – | – | 5,0,0,0/5
mGPT | Rovers 37 (14,80,40) | – | – | – | – | – | 0,0,15,0/15
mGPT | Rovers 38 (14,85,57) | – | – | – | – | – | 0,0,15,0/15
mGPT | Rovers 39 (14,95,62) | – | – | – | – | – | 0,0,15,0/15
mGPT | Rovers 40 (14,100,68) | – | – | – | – | – | 0,0,15,0/15
mGPT | Rovers 80 (100,60,98) | – | – | – | – | – | 0,0,5,0/5
akfd (k = 2) | Rovers 37 (14,80,40) | 14.7 ± 0.1 | 22 069.1 ± 9344.2 | 1717.4 ± 715.7 | 3082.6 ± 1293.1 | 207 025.1 ± 86 930.2 | 15,0,0,0/15
akfd (k = 2) | Rovers 38 (14,85,57) | 21.5 ± 0.1 | 13 503.3 ± 3226.8 | 695.40 ± 169.04 | 1203.60 ± 290.37 | 85 719.3 ± 20 649.1 | 15,0,0,0/15
akfd (k = 2) | Rovers 39 (14,95,62) | 15.3 ± 0.1 | 18 442.6 ± 8622.77 | 1423.3 ± 688.9 | 2499.5 ± 1233.3 | 68 412.6 ± 82 620.7 | 15,0,0,0/15
akfd (k = 2) | Rovers 40 (14,100,68) | 25.3 ± 0.1 | 15 922.1 ± 2466.2 | 704.4 ± 113.7 | 1297.6 ± 192.3 | 93 778.1 ± 14 004.2 | 15,0,0,0/15
akfd (k = 2) | Rovers 80 (100,60,98) | 86.5 ± 0.3 | 20 335.7 ± 3372.8 | 293.5 ± 44.5 | 555.5 ± 247.5 | 58 665.7 ± 17 564.9 | 5,0,0,0/5
akfd (k = 5) | Rovers 37 (14,80,40) | 13.8 ± 0.1 | 2311.6 ± 218.2 | 185.7 ± 16.5 | 705.1 ± 62.2 | 44 617.6 ± 3950.2 | 15,0,0,0/15
akfd (k = 5) | Rovers 38 (14,85,57) | 20.1 ± 0.1 | 2789.9 ± 94.5 | 152.7 ± 4.8 | 557.1 ± 25.6 | 36 215.9 ± 1630.5 | 15,0,0,0/15
akfd (k = 5) | Rovers 39 (14,95,62) | 14.5 ± 0.1 | 2083.1 ± 149.8 | 167.7 ± 13.5 | 633.7 ± 43.9 | 40 105.1 ± 2783.8 | 15,0,0,0/15
akfd (k = 5) | Rovers 40 (14,100,68) | 23.7 ± 0.3 | 3350.5 ± 194.8 | 157.3 ± 10.5 | 617.7 ± 61.5 | 40 412.5 ± 3884.8 | 15,0,0,0/15
akfd (k = 5) | Rovers 80 (100,60,98) | 86.4 ± 0.2 | 10 298.6 ± 47.7 | 151.2 ± 21.1 | 583.5 ± 74.5 | 45 308.6 ± 1317.7 | 5,0,0,0/5
akfd (k = 10) | Rovers 37 (14,80,40) | 13.8 ± 0.1 | 1271.3 ± 95.3 | 104.1 ± 8.8 | 592.3 ± 25.7 | 36 809.3 ± 1637.3 | 15,0,0,0/15
akfd (k = 10) | Rovers 38 (14,85,57) | 20.2 ± 0.1 | 1347.7 ± 144.9 | 73.1 ± 8.2 | 400.1 ± 25.7 | 25 353.7 ± 1686.9 | 15,0,0,0/15
akfd (k = 10) | Rovers 39 (14,95,62) | 14.6 ± 0.1 | 1380.1 ± 52.9 | 110.7 ± 5.8 | 593.7 ± 23.5 | 37 002.1 ± 1462.9 | 15,0,0,0/15
akfd (k = 10) | Rovers 40 (14,100,68) | 23.9 ± 0.1 | 1996.5 ± 69.3 | 95.1 ± 2.8 | 568.3 ± 18.9 | 36 094.5 ± 1203.3 | 15,0,0,0/15
akfd (k = 10) | Rovers 80 (100,60,98) | 86.3 ± 0.1 | 6447.9 ± 384.9 | 92.5 ± 5.5 | 517.7 ± 14.2 | 37 509.9 ± 1236.9 | 5,0,0,0/5
akfd (k = 20) | Rovers 37 (14,80,40) | 13.9 ± 0.1 | 885.4 ± 34.1 | 72.1 ± 2.5 | 528.7 ± 13.1 | 32 607.4 ± 820.1 | 15,0,0,0/15
akfd (k = 20) | Rovers 38 (14,85,57) | 20.4 ± 0.1 | 1135.9 ± 220.5 | 60.7 ± 11.9 | 447.7 ± 54.6 | 27 997.9 ± 3496.5 | 15,0,0,0/15
akfd (k = 20) | Rovers 39 (14,95,62) | 14.6 ± 0.1 | 1135.4 ± 120.7 | 91.3 ± 9.2 | 553.7 ± 15.9 | 34 357.4 ± 1074.7 | 15,0,0,0/15
akfd (k = 20) | Rovers 40 (14,100,68) | 23.9 ± 0.1 | 2020.4 ± 267.1 | 94.1 ± 12.6 | 571.3 ± 56.8 | 36 298.4 ± 3675.1 | 15,0,0,0/15
akfd (k = 20) | Rovers 80 (100,60,98) | 86.7 ± 0.1 | 4289.5 ± 207.5 | 59.5 ± 14.5 | 476.1 ± 71.4 | 32 855.5 ± 1491.5 | 5,0,0,0/5
akfd (k = 30) | Rovers 37 (14,80,40) | 14.1 ± 0.1 | 828.5 ± 61.9 | 67.1 ± 4.2 | 500.1 ± 5.7 | 30 834.5 ± 403.9 | 15,0,0,0/15
akfd (k = 30) | Rovers 38 (14,85,57) | 20.4 ± 0.2 | 1052.6 ± 319.1 | 56.7 ± 17.4 | 412.7 ± 48.8 | 25 814.6 ± 3247.1 | 15,0,0,0/15
akfd (k = 30) | Rovers 39 (14,95,62) | 14.8 ± 0.1 | 1115.1 ± 142.1 | 88.7 ± 12.5 | 538.1 ± 42.6 | 33 401.1 ± 2698.1 | 15,0,0,0/15
akfd (k = 30) | Rovers 40 (14,100,68) | 23.9 ± 0.1 | 1668.8 ± 198.4 | 78.1 ± 9.4 | 534.3 ± 45.9 | 33 726.8 ± 2952.4 | 15,0,0,0/15
akfd (k = 30) | Rovers 80 (100,60,98) | 88.4 ± 0.5 | 4116.6 ± 292.2 | 58.1 ± 4.2 | 465.5 ± 51.5 | 32 046.6 ± 1982.2 | 5,0,0,0/5
akfd (k = 50) | Rovers 37 (14,80,40) | 22.2 ± 2.1 | 1025.3 ± 158.3 | 78.4 ± 12.4 | 521.1 ± 43.5 | 32 291.3 ± 2768.3 | 15,0,0,0/15
akfd (k = 50) | Rovers 38 (14,85,57) | 30.7 ± 0.2 | 1377.7 ± 200.9 | 69.6 ± 10.4 | 558.1 ± 22.2 | 34 863.7 ± 1532.9 | 15,0,0,0/15
akfd (k = 50) | Rovers 39 (14,95,62) | 22.4 ± 0.2 | 1107.1 ± 149.1 | 84.1 ± 11.6 | 536.8 ± 63.6 | 33 315.1 ± 3965.1 | 15,0,0,0/15
akfd (k = 50) | Rovers 40 (14,100,68) | 35.2 ± 0.1 | 1727.8 ± 315.2 | 75.8 ± 14.3 | 520.8 ± 21.6 | 32 975.8 ± 1611.2 | 15,0,0,0/15
akfd (k = 50) | Rovers 80 (100,60,98) | 92.3 ± 0.2 | 4226.6 ± 493.4 | 57.5 ± 6.5 | 440.2 ± 41.1 | 30 638.6 ± 2959.4 | 5,0,0,0/5
akfd (k = 100) | Rovers 37 (14,80,40) | 22.3 ± 0.1 | 969.1 ± 98.8 | 73.6 ± 7.7 | 503.8 ± 26.68 | 31 197.1 ± 1699.6 | 15,0,0,0/15
akfd (k = 100) | Rovers 38 (14,85,57) | 31.4 ± 0.3 | 1395.9 ± 299.1 | 70.1 ± 15.3 | 468.4 ± 43.1 | 29 499.9 ± 2885.1 | 15,0,0,0/15
akfd (k = 100) | Rovers 39 (14,95,62) | 22.9 ± 0.2 | 1116.3 ± 115.5 | 84.2 ± 9.2 | 535.8 ± 20.6 | 33 264.3 ± 1351.5 | 15,0,0,0/15
akfd (k = 100) | Rovers 40 (14,100,68) | 36.1 ± 0.2 | 1561.6 ± 215.2 | 67.6 ± 8.8 | 506.6 ± 41.9 | 31 957.6 ± 2729.2 | 15,0,0,0/15
akfd (k = 100) | Rovers 80 (100,60,98) | 102.7 ± 0.4 | 3454.6 ± 37.4 | 46.5 ± 0.5 | 429.1 ± 53.2 | 31 244.6 ± 829.4 | 5,0,0,0/5
quality of the solution. Besides, there is no big difference between generating abstractions with the manually selected predicate and with those automatically generated using landmarks. Table 4 shows the predicate set generated for each problem of the Rovers domain using landmarks. The selection technique counts the number of times that a predicate is identified as a landmark; this count may be considered an estimator of the relevance of a predicate. According to this estimator, have_rock_analysis, have_image and have_soil_analysis are the most relevant predicates for this domain.
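To make the counting scheme concrete, the following minimal sketch (our illustration, not the authors' implementation; the fact encoding is an assumption) ranks predicate symbols by how often they occur among the grounded landmark facts of a task, reproducing the kind of counts reported in Table 4:

```python
from collections import Counter

def rank_predicates_by_landmark_count(landmark_facts):
    # Each grounded landmark fact is assumed to be a tuple whose first
    # element is the predicate symbol, e.g. ("have_image", "rover0", "obj3").
    return Counter(fact[0] for fact in landmark_facts).most_common()

# Toy check against the counts of Rovers problem 37 in Table 4:
facts = ([("have_image",)] * 362 + [("have_soil_analysis",)] * 329 +
         [("have_rock_analysis",)] * 256 + [("calibrated",)] * 18 +
         [("at",)] * 14 + [("empty",)] * 10)
for predicate, count in rank_predicates_by_landmark_count(facts):
    print(predicate, count)   # have_image 362, have_soil_analysis 329, ...
```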
Table 3
Comparing akfd using different horizons to lamaf and mGPT over five problems from the Rovers domain. The meaning of the columns is the same as in Table 2. Abstractions were computed using landmarks.

Planner / Problem          F(s)           T(s)                 R                 A                 PE(s)                 C
lamaf
  Rovers 37 (14,80,40)     412.1 ± 3.3    7125.5 ± 1145.5      75.6 ± 7.4        512.2 ± 28.9      37857.5 ± 2879.5      15,0,0,0/15
  Rovers 38 (14,85,57)     236.5 ± 1.78   3205.1 ± 236.7       59.2 ± 7.3        432.8 ± 42.3      29173.1 ± 2774.7      15,0,0,0/15
  Rovers 39 (14,95,62)     257.1 ± 1.3    3342.1 ± 845.4       68.2 ± 5.1        486.8 ± 21.2      32550.1 ± 2117.4      15,0,0,0/15
  Rovers 40 (14,100,68)    354.8 ± 2.2    4406.1 ± 872.1       70.1 ± 10.2       530.8 ± 23.3      36254.1 ± 2270.1      15,0,0,0/15
  Rovers 80 (100,60,98)    –              –                    –                 –                 –                     5,0,0,0/5
mGPT
  Rovers 37 (14,80,40)     –              –                    –                 –                 –                     0,15,0,0/15
  Rovers 38 (14,85,57)     –              –                    –                 –                 –                     0,15,0,0/15
  Rovers 39 (14,95,62)     –              –                    –                 –                 –                     0,15,0,0/15
  Rovers 40 (14,100,68)    –              –                    –                 –                 –                     0,15,0,0/15
  Rovers 80 (100,60,98)    –              –                    –                 –                 –                     0,5,0,0/5
akfd (k = 2)
  Rovers 37 (14,80,40)     20.2 ± 0.5     6814.2 ± 1634.9      572.2 ± 136.7     1432.1 ± 320.7    92740.2 ± 20876.9     15,0,0,0/15
  Rovers 38 (14,85,57)     28.9 ± 0.2     5024.6 ± 947.9       280.1 ± 53.1      699.6 ± 119.6     47000.6 ± 8123.9      15,0,0,0/15
  Rovers 39 (14,95,62)     21.3 ± 0.2     6926.3 ± 2002.5      575.1 ± 171.5     1406.1 ± 407.9    91292.3 ± 26476.5     15,0,0,0/15
  Rovers 40 (14,100,68)    32.9 ± 0.4     6420.8 ± 829.5       311.4 ± 40.6      802.5 ± 101.3     54570.8 ± 6907.5      15,0,0,0/15
  Rovers 80 (100,60,98)    262.6 ± 0.1    19956.2 ± 3580.8     315.5 ± 58.5      588.5 ± 109.5     55266.2 ± 10150.8     5,0,0,0/5
akfd (k = 5)
  Rovers 37 (14,80,40)     19.8 ± 0.1     2250.8 ± 157.8       189.1 ± 12.8      747.3 ± 37.2      47088.8 ± 2389.8      15,0,0,0/15
  Rovers 38 (14,85,57)     29.1 ± 0.3     2128.1 ± 229.1       117.1 ± 13.6      500.4 ± 34.7      32152.1 ± 2311.1      15,0,0,0/15
  Rovers 39 (14,95,62)     21.4 ± 0.3     2061.1 ± 373.7       169.4 ± 31.7      697.9 ± 119.1     43935.1 ± 7519.7      15,0,0,0/15
  Rovers 40 (14,100,68)    33.1 ± 0.5     3250.4 ± 371.2       156.4 ± 18.18     656.1 ± 68.2      42616.4 ± 4463.2      15,0,0,0/15
  Rovers 80 (100,60,98)    262.3 ± 0.7    8467.3 ± 239.9       131.50 ± 5.8      522.50 ± 28.5     39817.3 ± 1949.9      5,0,0,0/5
akfd (k = 10)
  Rovers 37 (14,80,40)     20.2 ± 0.3     1238.5 ± 108.8       103.1 ± 9.3       606.7 ± 43.8      37640.5 ± 2736.8      15,0,0,0/15
  Rovers 38 (14,85,57)     29.1 ± 0.2     1496.4 ± 144.4       82.4 ± 8.3        447.6 ± 29.6      28352.4 ± 1920.4      15,0,0,0/15
  Rovers 39 (14,95,62)     21.5 ± 0.2     1235.2 ± 91.9        100.2 ± 7.9       565.6 ± 33.4      35171.2 ± 2095.9      15,0,0,0/15
  Rovers 40 (14,100,68)    33.1 ± 0.5     2059.1 ± 319.4       98.2 ± 15.9       570.1 ± 73.3      36265.1 ± 4717.4      15,0,0,0/15
  Rovers 80 (100,60,98)    260.7 ± 2.9    6287.4 ± 234.6       97.1 ± 6.32       480.4 ± 18.32     35111.4 ± 1333.8      5,0,0,0/5
akfd (k = 20)
  Rovers 37 (14,80,40)     20.3 ± 0.6     1009.1 ± 108.9       83.1 ± 8.1        537.5 ± 25.6      33259.1 ± 1644.9      15,0,0,0/15
  Rovers 38 (14,85,57)     29.3 ± 0.4     1299.3 ± 143.2       70.8 ± 8.3        476.8 ± 29.2      29907.3 ± 1895.2      15,0,0,0/15
  Rovers 39 (14,95,62)     21.7 ± 0.1     1030.7 ± 90.4        82.8 ± 6.7        545.6 ± 28.4      33766.7 ± 1794.4      15,0,0,0/15
  Rovers 40 (14,100,68)    33.2 ± 0.4     1596.7 ± 172.5       75.6 ± 8.6        533.8 ± 41.8      33624.7 ± 2680.5      15,0,0,0/15
  Rovers 80 (100,60,98)    261.9 ± 2.6    4435.9 ± 234.7       67.9 ± 6.3        502.6 ± 12.4      34591.9 ± 978.7       5,0,0,0/5
akfd (k = 30)
  Rovers 37 (14,80,40)     20.2 ± 0.1     953.1 ± 124.9        78.3 ± 10.6       536.3 ± 39.4      33131.1 ± 2488.9      15,0,0,0/15
  Rovers 38 (14,85,57)     29.6 ± 0.4     1243.7 ± 151.9       67.2 ± 8.4        446.6 ± 33.6      27839.7 ± 2167.9      15,0,0,0/15
  Rovers 39 (14,95,62)     21.8 ± 0.3     1023.1 ± 126.8       82.1 ± 10.2       542.3 ± 24.6      33561.1 ± 1602.8      15,0,0,0/15
  Rovers 40 (14,100,68)    33.3 ± 0.2     1522.30 ± 130.9      72.2 ± 6.3        519.6 ± 25.1      32698.3 ± 1636.9      15,0,0,0/15
  Rovers 80 (100,60,98)    263.9 ± 0.9    4210.30 ± 614.9      62.2 ± 11.8       450.5 ± 19.5      31240.3 ± 1784.9      5,0,0,0/5
akfd (k = 50)
  Rovers 37 (14,80,40)     20.9 ± 0.5     998.1 ± 154.3        81.1 ± 13.4       559.6 ± 38.3      34574.1 ± 2452.3      15,0,0,0/15
  Rovers 38 (14,85,57)     29.5 ± 0.4     1242.6 ± 190.6       66.8 ± 10.7       460.8 ± 47.7      28890.6 ± 3052.6      15,0,0,0/15
  Rovers 39 (14,95,62)     21.9 ± 0.3     992.8 ± 166.7        78.1 ± 13.1       555.9 ± 47.6      34346.8 ± 3022.7      15,0,0,0/15
  Rovers 40 (14,100,68)    33.9 ± 0.7     1722.84 ± 287.97     81.1 ± 14.5       543.1 ± 27.7      34308.8 ± 1949.97     15,0,0,0/15
  Rovers 80 (100,60,98)    266.2 ± 1.9    3730.1 ± 208.5       54.5 ± 12.3       468.4 ± 22.1      31834.1 ± 1534.5      5,0,0,0/5
akfd (k = 100)
  Rovers 37 (14,80,40)     23.18 ± 0.1    1074.2 ± 178.5       83.4 ± 13.3       558.1 ± 30.1      34560.2 ± 1984.5      15,0,0,0/15
  Rovers 38 (14,85,57)     30.8 ± 0.1     1136.8 ± 170.3       60.2 ± 9.3        464.6 ± 22.2      29012.8 ± 1502.3      15,0,0,0/15
  Rovers 39 (14,95,62)     23.1 ± 0.1     1076.3 ± 154.9       82.6 ± 11.8       535.9 ± 35.5      33230.3 ± 2284.9      15,0,0,0/15
  Rovers 40 (14,100,68)    35.6 ± 0.3     1568.4 ± 216.5       72.1 ± 10.1       511.9 ± 24.3      32282.4 ± 1674.5      15,0,0,0/15
  Rovers 80 (100,60,98)    278.1 ± 1.3    5230.5 ± 340.8       72.9 ± 15.8       467.8 ± 39.8      33298.5 ± 2728.8      5,0,0,0/5
However, to generate the abstraction manually we chose a different predicate, at. Fig. 8(a) and (b) compare the average planning time over the whole cycle of planning and execution for problem 40. Fig. 8(a) shows all the iterations, while Fig. 8(b) zooms in on the first 60 iterations for clarity. These results show that lamaf needs fewer replanning steps to solve the problem, because the plans generated by lamaf contain no abstract actions and are fully informed.
Plans generated by the different configurations of akfd need more replanning steps, but their planning time is always very short. Interestingly, there is an iteration in the planning and execution cycle at which the planning time of lamaf matches that of akfd; past that iteration, lamaf takes less time to compute plans. These results suggest that akfd does not improve over standard planning from scratch from that point on. For Rovers problem 40, this point lies around 40 replanning steps.
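The crossover step mentioned above can be read off mechanically from the two per-iteration planning-time series. A minimal sketch, assuming both series are indexed by replanning step (the series themselves would be measured data of the kind plotted in Fig. 8):

```python
def crossover_step(akfd_times, lamaf_times):
    """Return the first replanning step at which lamaf's per-episode
    planning time is no longer larger than akfd's (about step 40 for
    Rovers problem 40 in Fig. 8); None if the curves never cross."""
    for step, (t_akfd, t_lamaf) in enumerate(zip(akfd_times, lamaf_times)):
        if t_lamaf <= t_akfd:
            return step
    return None
```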
Table 4
Number of occurrences of each predicate identified as a landmark for each problem of the Rovers domain.

Predicate             Rovers 37     Rovers 38     Rovers 39     Rovers 40      Rovers 80
                      (14,80,40)    (14,85,57)    (14,95,62)    (14,100,68)    (100,60,98)
have_rock_analysis    256           503           253           603            1973
at                    14            14            14            14             100
have_soil_analysis    329           463           221           489            1836
empty                 10            14            11            12             92
calibrated            18            15            16            15             70
have_image            362           276           322           242            1785
Fig. 8. (a) Full execution cycle: average planning time for the whole cycle of planning and execution for Rovers problem 40. (b) Partial execution cycle: average planning time for the first 60 iterations of planning and execution for Rovers problem 40. The x axis shows the planning step and the y axis shows the planning time at each step.
Fig. 9. Evolution of the full planning time depending on the value of k for problem 40. The x axis shows the value of k and the y axis shows the full planning time. Blue values show results using Manual selection and red values show results using Landmark selection. The value of 400 corresponds to lamaf, because the solution of problem 40 has fewer than 400 actions and no abstraction is applied to solve the problem. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

Fig. 10. Evolution of the full planning time depending on the value of k for problem 40 with different stochasticity levels. The x axis shows the value of k and the y axis shows the full planning time. The value of 400 corresponds to lamaf, because the solution plan of problem 40 has fewer than 400 actions and no abstraction is applied to solve the problem.
Besides, the computational cost of the planning task of akfd is constant, regardless of the value of the horizon, as can be seen in the results shown in Tables 2 and 3.
Fig. 9 shows the full planning time after execution for Rovers problem 40 using different values of k. We can observe that values of k between 10 and 50 produce the best results. Small values of k increase the number of replanning steps, because the generated plan contains more abstract actions. Meanwhile, high values of k diminish the number of replanning steps but increase the planning time of each one. Finally, Fig. 10 shows the full planning time after execution of Rovers problem 40 using different values of k and different stochasticity levels. We can observe that the behavior of vrp is similar across the different stochasticity levels of the environment. Given that a higher percentage of errors implies more replanning steps, the total time naturally increases with the stochasticity level.

7.2. Results in other domains

This section reports the experimental results obtained in five different domains. For each domain, we have developed an error simulator which prevents the execution of some actions and generates exogenous events, simulating a real-world environment; a minimal sketch of such a simulator is shown below.
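The simulator used in the experiments is not listed in the paper; the following sketch only captures the described behavior under assumed interfaces (the failure probability p_fail, the exogenous-event hook and action.apply are illustrative assumptions):

```python
import random

class ErrorSimulator:
    """Simulated executive that makes actions fail with probability p_fail
    and occasionally injects an exogenous event that perturbs the state."""

    def __init__(self, p_fail=0.2, p_event=0.05, seed=None):
        self.p_fail = p_fail      # stochasticity level of the environment
        self.p_event = p_event    # chance of an exogenous event per step
        self.rng = random.Random(seed)

    def execute(self, state, action):
        """Return (new_state, success). On failure the state is unchanged,
        which forces the planning-and-execution loop to replan."""
        if self.rng.random() < self.p_fail:
            return state, False
        new_state = action.apply(state)   # assumed deterministic effect model
        if self.rng.random() < self.p_event:
            new_state = self.random_event(new_state)
        return new_state, True

    def random_event(self, state):
        # Placeholder: a real simulator would, e.g., move an object or
        # drain a resource here to mimic an exogenous change.
        return state
```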
7.2.1. Metrics

We measure the performance of planners with two types of metrics: coverage and score. In this paper we are interested in metrics that take into account the quality of the solution plans found, the planning and execution time, and coverage.
• Coverage: number of problems solved from the benchmark. It is the official metric for the optimal track of the IPC since 2008.
• Score: planners get a score in the interval (0, 1] for each problem solved. This metric is used to score four different values: (1) First planning time; (2) planning and execution time; (3) number of replanning steps; and (4) number of executed actions. Let v∗ be the minimum value (seconds, steps or actions)
for a problem and vp the value obtained by planner p. Then, the score is computed as score(v∗, vp) = v∗/vp.
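As an illustration, the score computation for one problem and one metric can be sketched as follows (planner names and values are made up; planners that did not solve the problem are omitted and receive a score of 0):

```python
def ipc_scores(values):
    # values maps each planner that solved the problem to its measured
    # value v_p (seconds, steps or actions); v* is the minimum among them.
    best = min(values.values())
    return {planner: best / v for planner, v in values.items()}

# Hypothetical first-planning-time (F) values for one problem:
print(ipc_scores({"lamaf": 412.1, "akfd_k10": 13.8, "akfd_k50": 22.2}))
# -> akfd_k10 scores 1.0; lamaf scores 13.8/412.1; akfd_k50 scores 13.8/22.2
```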
7.2.2. Results

Table 5 shows a summary of the results for five different planning domains. We compare akfd using different horizons to lamaf. In general, our approach obtains much higher scores than those obtained by lamaf: akfd solves many more problems than lamaf, using less time, while keeping a similar quality of the solutions.

7.3. Summary of experimentation

We have presented experimental results comparing akfd with other execution models: classical planning using replanning, probabilistic planning, and reactive planning. In general, akfd provides a two- to three-fold reduction of the planning time with respect to the other approaches, while maintaining the quality of the solutions. The reduction in planning time is especially relevant in the first planning episode, which is crucial in many domains, so that robots can start executing actions as soon as possible. In addition, our approach is able to solve problems that cannot be solved by lamaf due to their complexity, so it scales much better than current state-of-the-art planning techniques when dealing with realistic planning–execution–replanning episodes. It can also be seen that mGPT could only solve very few problems, and that a reactive approach, such as akfd (2), does not perform as well as akfd with bigger values of k.

In order to assess the total planning and execution time, we present some results in Fig. 11. We have varied the average duration of actions to study the performance of lamaf and akfd. We have also used yahsp [37], the winner of the agile track of IPC'14. The agile track measures the time that it takes the planner to provide a solution, independently of its quality (execution time in this case). yahsp provides solutions faster than akfd, but its solutions include many more actions (in many cases around 25% more). Thus, if we measure the total planning and execution time, we see that it increases considerably with the actions' duration. Besides, yahsp was not able to scale up to harder problems, and in some domains, such as TidyBot (a PR2-based domain), it could not solve any problem.

Finally, in relation to the two presented methods to generate abstractions, results show that the landmark-based process can generate abstractions automatically and obtains results comparable to those obtained generating them manually. In some domains, like Port, the time needed to generate the set of landmarks is extremely high, depending on the complexity of the problem. In these cases, the advantages of akfd with landmark-based abstraction generation cannot be appreciated due to this excessive time.

Fig. 11. Evolution of the full planning and execution time depending on the time needed to execute an action for problem 37 of the Rovers domain. The x axis shows the action execution time and the y axis shows the full planning and execution time.

8. Related work

Abstractions and Automated Planning (AP) techniques have been combined in the literature in different ways. Our work focuses on applying abstractions over Automated Planning to decrease the computational overhead in stochastic or dynamic environments. There are three principal trends related to our work: (i) approaches based on generating abstractions in AP to speed up the planning process or to increase its capability to solve hard problems; (ii) approaches interleaving planning and execution to solve problems using heuristic search; and (iii) approaches based on relaxing information about the environment to speed up the search technique.
8.1. Approaches based on generating abstractions in AP

The first work that used abstractions [38] extended the work of Newell and Simon on GPS [39] to develop Abstrips, which combined abstractions with STRIPS [40]. This technique introduced the concept of abstraction spaces, which are generated by removing preconditions from the operators of the original planning task. The planning algorithm first generates a plan for the abstracted task, then refines it iteratively by inserting specific partial plans between abstract states, satisfying the operator preconditions that were removed at the previous abstraction level. Later, Alpine [41] improved on Abstrips by automatically generating abstraction hierarchies from the preconditions of the operators. In both approaches abstractions are used to generate an abstract plan, which is refined incrementally until the lowest level of criticality is expanded and the goals have been satisfied. Our approach explores a similar idea, using abstractions to simplify the planning task. However, vrp uses abstractions only to generate the last part of the plan, relaxing far-future details.

In recent years, abstractions have been used to implement heuristic functions. Hoffmann and Nebel [42] used abstractions (delete relaxation) to compute the heuristic value of nodes by building a relaxed plan to guide the search process. In these relaxations, the delete effects of actions are ignored. Another use of abstractions to build heuristics is pattern databases (PDBs), which have been shown to be very useful in several hard search problems [43] and in Automated Planning [44]. Merge&Shrink [45] exploited a new form of abstraction, in which Domain Transition Graphs (DTGs) are merged together and then shrunk by abstracting nodes that share the same relaxed distance to the goal. vrp does not use the generated abstractions for computing the heuristic value of nodes, but to generate semi-abstracted plans. Finally, the red–black heuristic [46] tries to improve the delete relaxation heuristic by selecting which variables are relaxed. This approach divides the set of state variables into two disjoint sets (red and black). During the planning process, red variables are treated as in the delete relaxation abstraction, while black variables are not relaxed. The abstraction can be tuned depending on which variables are marked as red or black. vrp exploits a similar idea: some information can be relaxed to speed up search. However, in our case, the abstraction is used to simplify the search process and not as a heuristic function.
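To make the contrast concrete, the following schematic sketch (our illustration with an assumed grounded-action encoding, not the authors' code) shows the kind of predicate relaxation vrp applies beyond the horizon: every literal of a selected predicate is dropped from preconditions and effects, yielding an abstract action:

```python
def abstract_action(action, abstracted_predicates):
    """Build an abstract version of a grounded action by removing every
    literal whose predicate symbol is in `abstracted_predicates`.

    An action is assumed to be a dict with 'pre', 'add' and 'del' sets of
    grounded literals, each literal a tuple starting with its predicate
    symbol; nodes beyond the horizon k would use these abstract actions."""
    def keep(literals):
        return {l for l in literals if l[0] not in abstracted_predicates}
    return {
        "name": action["name"] + "_abs",
        "pre": keep(action["pre"]),
        "add": keep(action["add"]),
        "del": keep(action["del"]),
    }

# Example: abstracting the predicate `at` from a Rovers-style navigate action
navigate = {
    "name": "navigate_r0_w1_w2",
    "pre": {("at", "r0", "w1"), ("visible", "w1", "w2")},
    "add": {("at", "r0", "w2")},
    "del": {("at", "r0", "w1")},
}
print(abstract_action(navigate, {"at"}))   # preconditions on `at` disappear
```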
Table 5
Results of planning and execution on five different planning domains. The first column (Planner) corresponds to the planners used to solve the benchmark and the second column (Met) to the different metrics measured; their meaning is the same as in Table 2. The remaining columns give the score obtained in each domain and the total score, where M corresponds to Manual selection and L to Landmark selection. Entries marked with – could not be recovered in this copy of the table.

Planner        Met    Rovers        Depots robots   TidyBot       Satellite     Port          Total
                      M      L      M      L        M      L      M      L      M      L      M       L
lamaf          F(s)   70.8   72.8   –      –        –      –      –      –      –      –      80.8    110.6
               T(s)   5      10.9   –      –        –      –      –      –      –      –      5.1     12.4
               R      18.3   27.6   –      –        –      –      –      –      –      –      18.7    31.8
               A      4.8    9.6    –      –        –      –      –      –      –      –      4.9     10.8
               C      75     75     –      –        –      –      –      –      –      –      85      120
akfd (k = 2)   F(s)   –      –      –      –        –      –      –      –      –      –      74.3    –
               T(s)   –      –      –      –        –      –      –      –      –      –      86.7    –
               R      –      –      –      –        –      –      –      –      –      –      157.1   –
               A      –      –      –      –        –      –      –      –      –      –      137.2   –
               C      –      –      –      –        –      –      –      –      –      –      195     –
akfd (k = 5)   F(s)   74.3   73.7   52.8   55.2     64.3   36.1   73.3   48.5   67.3   42.8   332.1   256.4
               T(s)   32     33.4   18     19.8     13.1   6.3    41.2   60.6   49.4   30.7   153.7   150.6
               R      50.4   50.5   20.3   25.5     20.3   9.4    67.6   58     68.2   40.9   226.8   184.3
               A      28.4   29     17.4   17.5     11     4.8    37.4   46.6   51.2   24.9   145.4   122.9
               C      75     75     55     70       65     43     75     75     70     43     340     306
akfd (k = 10)  F(s)   74.2   73.8   64.9   63.4     72.6   65.4   73.1   33.5   52.1   45.1   336.9   281.2
               T(s)   56.7   55.3   32.7   53.6     23.8   30.3   59.6   47.1   43.3   44.3   216.1   230.6
               R      64.7   61.4   34.4   48.3     32.6   36.2   66     45.1   54.7   48.4   252.5   239.4
               A      50.9   49.2   31.3   48.3     23.6   26.3   56.4   33.4   49.2   41.3   211.5   198.5
               C      75     75     71     75       75     73     75     60     59     53     355     336
akfd (k = 20)  F(s)   73.6   70.9   63.9   51.2     70.7   67     72.4   33     49.4   40.9   330.1   263
               T(s)   65     66.4   49     43.3     52.7   54.4   51.2   47.3   44.2   41.9   262.1   253.2
               R      63.9   67.8   48.3   44.4     54.5   61.8   71.5   44.3   50.6   44.7   288.8   263.1
               A      59     60.5   47.4   40.4     46.4   49.7   46.3   37.5   49.8   39.2   248.8   227.3
               C      75     75     68     60       75     75     75     60     52     45     345     315
akfd (k = 30)  F(s)   73.5   72.8   53.8   33       67.7   51.3   69.7   40.1   47.1   32.1   311.7   229.2
               T(s)   67.7   67.5   44.8   31       54.3   55.2   46.5   67.1   37.7   26.3   251     247.2
               R      67.4   67.7   47.2   35       57.2   53.4   71.8   60.2   48.5   38.3   292.2   254.6
               A      61.8   61.4   44.9   30.8     50.6   47.7   44.1   62.6   44.8   37.6   246.3   240.1
               C      75     75     59     45       75     75     75     75     50     40     334     310
8.2. Approaches interleaving planning and execution using heuristic search

Several techniques based on heuristic search have been developed to speed up searches by trading off search time and cost of the solution. Real-time heuristic search techniques interleave planning and execution to generate actions, using bounds (time, expanded nodes) to limit search. Learning real-time algorithms interleave planning and execution in an on-line decision-making setting, where the planning time for each action executed by the agent is bounded. Learning Real-Time A* (LRTA*) [47] is one of the first real-time heuristic search algorithms. This algorithm performs an iterative execution of the RTA* [47] search with an updating rule that enables the algorithm to learn from previous runs. The update rule changes the heuristic value of the current state only with respect to its immediate neighbors. Later, Shimbo and Ishida [48] introduced different techniques on top of LRTA* for bounding the amount of state-space exploration. LRTA*(k) [49] is an algorithm based on LRTA* with a different updating strategy: it updates the heuristic value of k states per iteration following a bounded propagation strategy. Incremental search interleaves planning and execution to solve dynamic shortest-path problems, where shortest paths have to be found repeatedly as the topology of a graph or its edge costs change during execution. These approaches take advantage of interleaving planning and execution as vrp does. However, these algorithms bound the number of nodes for which the heuristic value is computed, not the number of search steps as vrp does. Also, they have been mainly used for path planning tasks, while vrp can solve generic task planning. Dynamic A* (D*) [31] is one of the most popular algorithms for incremental search and has been used extensively in robotics. This algorithm generates optimal path-finding plans in real time by incrementally repairing paths to the robot's state as new information is discovered. The repairing process is executed whenever new information that changes the structure of the problem is found.
Lifelong Planning A* (LPA*) [50] is an incremental version of A* which repeatedly finds shortest paths in a graph while the edge costs of the graph change or vertices are added or deleted. Its first search is the same as that of A*, but subsequent searches are potentially faster, because it reuses the parts of the previous search tree that are identical to the new one. These algorithms interleave planning and execution, updating the solution according to the new information discovered during execution. vrp is based on a similar concept, decreasing the computational effort during search by applying abstractions, and it also replans after execution failures. Again, these algorithms have been mainly developed and used for solving path-finding problems, while vrp solves generic planning tasks.
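For reference, the core of LRTA* is compact enough to sketch; the successor and cost interfaces below are assumptions for illustration:

```python
def lrta_star_step(state, successors, cost, h):
    """One planning-and-execution step of LRTA* [47]: update the heuristic
    of the current state from its immediate neighbors, then move greedily.

    `successors(state)` yields neighbor states (at least one is assumed),
    `cost(s, s2)` is the edge cost, and `h` is a dict mapping states to
    heuristic values (assumed admissible initially). Returns the next state."""
    neighbors = list(successors(state))
    f = {s2: cost(state, s2) + h.get(s2, 0.0) for s2 in neighbors}
    best = min(neighbors, key=lambda s2: f[s2])
    # Learning rule: raise h(state) to the best one-step lookahead value,
    # so that repeated visits escape heuristic local minima.
    h[state] = max(h.get(state, 0.0), f[best])
    return best
```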
8.3. Approaches based on relaxing information about the environment

Variable Level-Of-Detail (VLOD) planning was introduced by Zickler and Veloso [20]: a motion planning technique that generates a collision-free trajectory from an initial state to a goal state in dynamic environments. This approach focuses search on obtaining accurate short-term motion plans, while considering the far future at a lower level of detail by selectively ignoring the physical interactions with dynamic bodies. The physics-based planning algorithm is wrapped in a continuous loop: after observing the initial state, the algorithm generates a motion plan; the robot then executes a pre-determined set of actions from the plan, re-observes the environment, updates the state and generates a new plan by replanning. The replanning interval is set by the user. This approach decreases the computational cost of the motion planning task because information about physical interactions with bodies is ignored. Our approach generalizes the concept of VLOD to AP, ignoring some information about the environment (predicates) during the search process. The removal of those predicates starts at a given horizon in the search; this horizon defines when information is considered "too far in the future to be accurately predicted".
9. Conclusions and future work

In this paper, we have presented Variable Resolution Planning (vrp), a novel technique that uses an abstraction mechanism to dynamically remove some predicates during the planning process. Our approach significantly cuts down the computational effort of the search process. The abstraction is only used in nodes of the search tree that are far away from the initial state of the search. The exact computation of a plan in those nodes is not crucial, given that the corresponding actions will most probably not be executable, since the execution system (robot) will encounter an execution failure earlier on. Abstractions are applied from a given planning horizon k onwards. There are several alternatives for computing the abstractions: in this paper, we have explored a manual selection of the predicates to be abstracted, as well as computing them automatically using landmarks.

One of the main advantages of vrp is that it allows gradual control of the trade-off between reasoning and execution. Depending on the value of k, the system can work as a reactive system, generating short plans of actions, or as a deliberative system, generating full sound plans.

There are several lines of further research in the context of this work to improve the results on domains with different features. First, abstractions are built using predicates chosen from the initial set of landmarks. When all the predicates in this set are selected, the action space is pruned greatly, diminishing the quality of the generated abstract plan after the horizon. However, if only one or two predicates are selected, vrp can obtain results with better quality, increasing the information used to build the abstract plan. Interestingly, paying attention to the number of occurrences of each predicate in the set of landmarks, we can see that some predicates are more relevant than others. We would like to examine whether using this value as an estimator of predicate relevance is useful when selecting the predicates for abstraction. Besides, computing the set of landmarks sometimes requires excessive computational resources. We are currently exploring other alternatives for generating the set of candidate predicates, such as selecting predicates added to the relaxed plan built by the heuristic hff [51]. We are also analyzing whether the abstractions can be changed during planning and execution to decrease the planning time, and also potentially increase the quality of the solutions.

On the other hand, while the selected predicates have some influence on the quality of the plan after the horizon, the value of k has an important influence on the quality of the plan before the horizon and even on the number of replanning steps. Small k values decrease the quality of the plan before the horizon, increasing the number of replanning steps. Meanwhile, large k values increase the quality of the plan before the horizon, but the planning time grows. The value of the horizon therefore directly influences planning time, which is extremely important in robotics environments where robots cannot spend much time on reasoning; in general, small values of k decrease planning time. If the value of k were changed dynamically during execution, vrp could decrease the number of replanning steps and increase the quality of the plan before the horizon. Some of the experiments shown in Fig. 9 hint that k could be tuned during planning and execution depending on the size of the plan or the number of goals that must be reached. Interestingly, we observe that different values of k are better suited to different groups of problems. This leads us to think that changing the value of the horizon dynamically during planning and execution could improve the performance of the process, increasing the coverage and decreasing the full planning time.

Acknowledgments
This research has been partially supported by the Spanish MICINN projects TIN2011-27652-C03-02, TIN2012-38079-C03-02 and Comunidad de Madrid—UC3M (CCG10-UC3M/TIC-5597). The main author is supported by a Ph.D. grant from Universidad Carlos III de Madrid. We offer our gratitude and special thanks to Francisco Javier García Polo for his generous and invaluable comments during the revision of this paper.

References

[1] M. Ai-Chang, J.L. Bresina, L. Charest, A. Chase, J.C. Hsu, A.K. Jónsson, B. Kanefsky, P.H. Morris, K. Rajan, J. Yglesias, B.G. Chafin, W.C. Dias, P.F. Maldague, MAPGEN: mixed-initiative planning and scheduling for the Mars exploration rover mission, IEEE Intell. Syst. 19 (1) (2004) 8–12.
[2] K. Rajan, C. McGann, F. Py, H. Thomas, Robust mission planning using deliberative autonomy for autonomous underwater vehicles, in: Proceedings of the Workshop on Robotics in Challenging and Hazardous Environments, ICRA, Rome, Italy, 2007.
[3] H.L.S. Younes, M.L. Littman, PPDDL1.0: An extension to PDDL for expressing planning domains with probabilistic effects, Technical Report CMU-CS-042004.
[4] S. Sanner, Relational dynamic influence diagram language (RDDL): Language description, 2013.
[5] M.A. Peot, D.E. Smith, Conditional nonlinear planning, in: Proceedings of the First International Conference on Artificial Intelligence Planning Systems, Morgan Kaufmann, College Park, Maryland, 1992, pp. 189–197.
[6] B. Bonet, H. Geffner, mGPT: A probabilistic planner based on heuristic search, J. Artificial Intelligence Res. 24 (2005) 933–944. http://dx.doi.org/10.1613/jair.1688.
[7] S.W. Yoon, A. Fern, R. Givan, S. Kambhampati, Probabilistic planning via determinization in hindsight, in: Proceedings of the Twenty-Third Conference on Artificial Intelligence, 2008, pp. 1010–1016.
[8] A. Kolobov, Mausam, D.S. Weld, Classical planning in MDP heuristics: with a little help from generalization, in: Proceedings of the Twentieth International Conference on Automated Planning and Scheduling, ICAPS 2010, Toronto, Ontario, Canada, May 12–16, 2010, pp. 97–104.
[9] L.S. Zettlemoyer, H. Pasula, L.P. Kaelbling, Learning planning rules in noisy stochastic worlds, in: Proceedings of the Twentieth National Conference on Artificial Intelligence, 2005, pp. 911–918.
[10] S.W. Yoon, A. Fern, R. Givan, FF-Replan: A baseline for probabilistic planning, in: Proceedings of the Seventeenth International Conference on Automated Planning and Scheduling, Providence, Rhode Island, USA, 2007, pp. 22–26.
[11] R.V.D. Krogt, M.D. Weerdt, Plan repair as an extension of planning, in: Proceedings of the Fifteenth International Conference on Automated Planning and Scheduling, ICAPS-05, Monterey, CA, USA, 2005, pp. 161–170.
[12] M. Fox, A. Gerevini, D. Long, I. Serina, Plan stability: Replanning versus plan repair, in: Proceedings of the Sixteenth International Conference on Automated Planning and Scheduling, Cumbria, UK, 2006, pp. 212–221.
[13] D. Borrajo, M. Veloso, Probabilistically reusing plans in deterministic planning, in: Proceedings of ICAPS'12 Workshop on Heuristics and Search for Domain-Independent Planning, AAAI Press, Atibaia, Brazil, 2012.
[14] R.A. Brooks, A robust layered control system for a mobile robot, IEEE J. Robot. Autom. 2 (10) (1986) 14–23. http://dx.doi.org/10.1109/JRA.1986.1087032.
[15] G. Butler, A. Gantchev, P. Grogono, Object-oriented design of the subsumption architecture, Softw. Pract. Exp. 31 (9) (2001) 911–923. http://dx.doi.org/10.1002/spe.396.
[16] E. Zalama, J. Gómez, M. Paul, P.J. Ramón, Adaptive behavior navigation of a mobile robot, IEEE Transactions on Systems, Man, and Cybernetics, Part A: Systems and Humans, Vol. 31, 2002, pp. 160–169.
[17] S.X. Yang, M. Meng, Real-time collision-free motion planning of a mobile robot using a neural dynamics-based approach, Int. J. Robot. Autom. 2 (3) (2003) 1541–1552.
[18] E. Aguirre, A. González, A fuzzy perceptual model for ultrasound sensors applied to intelligent navigation of mobile robots, Appl. Intell. 19 (3) (2003) 171–187.
[19] A. Zhu, S.X. Yang, A goal-oriented fuzzy reactive control for mobile robots with automatic rule optimization, in: International Conference on Intelligent Robots and Systems, 2010, pp. 3688–3693.
[20] S. Zickler, M. Veloso, Variable level-of-detail motion planning in environments with poorly predictable bodies, in: Proceedings of the Nineteenth European Conference on Artificial Intelligence, Lisbon, Portugal, 2010, pp. 189–194.
[21] A. Gerevini, D. Long, Plan constraints and preferences in PDDL3, in: Proceedings of ICAPS'06 Workshop on Soft Constraints and Preferences in Planning, The English Lake District, Cumbria, UK, 2006.
[22] M. Helmert, The Fast Downward planning system, J. Artificial Intelligence Res. 26 (2006) 191–246. http://dx.doi.org/10.1613/jair.1705.
[23] M. Martínez, F. Fernández, D. Borrajo, Variable resolution planning through predicate relaxation, in: Proceedings of ICAPS'12 Workshop on Planning and Plan Execution for Real-World Systems: Principles and Practices, PlanEx, Atibaia, São Paulo, Brazil, 2012, pp. 5–12.
[24] M. Martínez, F. Fernández, D. Borrajo, Selective abstraction in automated planning, in: Proceedings of the Second Annual Conference on Advances in Cognitive Systems, CogSys, Baltimore, USA, 2013, pp. 133–147.
[25] J. Porteous, L. Sebastia, J. Hoffmann, On the extraction, ordering and usage of landmarks in planning, in: Proceedings of the European Conference on Planning, Toledo, Spain, 2001, pp. 37–48.
[26] J. Hoffmann, J. Porteous, L. Sebastia, Ordered landmarks in planning, J. Artificial Intelligence Res. 22 (1) (2004) 215–278. http://dx.doi.org/10.1613/jair.1492.
[27] S. Richter, M. Westphal, The LAMA planner: Guiding cost-based anytime planning with landmarks, J. Artificial Intelligence Res. 39 (1) (2010) 127–177. http://dx.doi.org/10.1613/jair.2972.
[28] E. Keyder, S. Richter, M. Helmert, Sound and complete landmarks for and/or graphs, in: Proceedings of the Nineteenth European Conference on Artificial Intelligence, ECAI 2010, Amsterdam, The Netherlands, 2010, pp. 335–340.
[29] F. Pommerening, M. Helmert, Optimal planning for delete-free tasks with incremental LM-cut, in: Proceedings of the Twenty-Second International Conference on Automated Planning and Scheduling, ICAPS, Atibaia, São Paulo, Brazil, 2012.
[30] C. Bäckström, B. Nebel, Complexity results for SAS+ planning, Comput. Intelligence 11 (1993) 625–655.
[31] A. Stentz, The focussed D* algorithm for real-time replanning, in: Proceedings of the Fourteenth International Joint Conference on Artificial Intelligence, San Francisco, CA, USA, 1995, pp. 1652–1659.
[32] E. Quintero, V. Alcázar, D. Borrajo, J. Fernández-Olivares, F. Fernández, A.G. Olaya, C. Guzmán, E. Onaindia, D. Prior, Autonomous mobile robot control and learning with the PELEA architecture, in: Proceedings of the AAAI-11 Workshop on Automated Action Planning for Autonomous Mobile Robots, PAMR, San Francisco, CA, USA, 2011.
[33] L.J. Manso, L.V. Calderita, P. Bustos, J. García, M. Martínez, F.F. Rebollo, A.R. Garcés, A. Bandera, A general-purpose architecture to control mobile robots, in: Proceedings of the Fifteenth Workshop of Physical Agents, WAF 2014, León, Spain, 2014, pp. 105–116.
[34] H.L.S. Younes, M.L. Littman, D. Weissman, J. Asmuth, The first probabilistic track of the international planning competition, J. Artif. Intell. Res. 24 (2005) 851–887.
[35] P.R. Wurman, R. D'Andrea, M. Mountz, Coordinating hundreds of cooperative, autonomous vehicles in warehouses, in: Proceedings of the Nineteenth National Conference on Innovative Applications of Artificial Intelligence, Vol. 2, 2007.
[36] D. Borrajo, Plan sharing for multi-agent planning, in: Preprints of the ICAPS'13 DMAP Workshop on Distributed and Multi-Agent Planning, Rome, Italy, 2013, pp. 57–65.
[37] V. Vidal, A lookahead strategy for heuristic search planning, in: Proceedings of the Fourteenth International Conference on Automated Planning and Scheduling, ICAPS 2004, Whistler, British Columbia, Canada, 2004, pp. 150–160.
[38] E.D. Sacerdoti, Planning in a hierarchy of abstraction spaces, Artificial Intelligence 2 (5) (1972) 115–135.
[39] H. Simon, A.H. Newell, GPS: A case study in generality and problem solving, Artificial Intelligence 2 (5) (1969) 109–124.
[40] R.E. Fikes, N.J. Nilsson, STRIPS: A new approach to the application of theorem proving to problem solving, in: Proceedings of the Second International Joint Conference on Artificial Intelligence, San Francisco, CA, USA, 1971, pp. 608–620.
[41] C.A. Knoblock, J.D. Tenenberg, Q. Yang, Characterizing abstraction hierarchies for planning, in: Proceedings of the Ninth National Conference on Artificial Intelligence, Vol. 2, Anaheim, CA, USA, 1991, pp. 692–697.
[42] J. Hoffmann, The Metric-FF planning system: Translating "ignoring delete lists" to numeric state variables, J. Artificial Intelligence Res. 20 (2003) 291–341.
[43] J.C. Culberson, J. Schaeffer, Pattern databases, Comput. Intelligence 14 (3) (1998) 318–334.
[44] S. Edelkamp, Planning with pattern databases, in: Proceedings of the Sixth European Conference on Planning, 2001, pp. 13–24.
[45] P. Haslum, A. Botea, M. Helmert, B. Bonet, S. Koenig, Domain-independent construction of pattern database heuristics for cost-optimal planning, in: Proceedings of the Twenty-Second Conference on Artificial Intelligence, 2007, pp. 1007–1012.
[46] M. Katz, J. Hoffmann, C. Domshlak, Red–black relaxed plan heuristics, in: Proceedings of the Twenty-Seventh National Conference on Artificial Intelligence, Bellevue, Washington, USA, 2013.
[47] R.E. Korf, Real-time heuristic search, Artificial Intelligence 42 (2–3) (1990) 189–211. http://dx.doi.org/10.1016/0004-3702(90)90054-4.
[48] M. Shimbo, T. Ishida, Controlling the learning process of real-time heuristic search, Artificial Intelligence 146 (1) (2003) 1–41. http://dx.doi.org/10.1016/S0004-3702(03)00012-2.
[49] C. Hernández, P. Meseguer, LRTA*(k), in: Proceedings of the Nineteenth International Joint Conference on Artificial Intelligence, Edinburgh, Scotland, UK, 2005, pp. 1238–1243.
[50] S. Koenig, M. Likhachev, D. Furcy, Lifelong Planning A*, Artificial Intelligence 155 (1–2) (2004) 93–146. http://dx.doi.org/10.1016/j.artint.2003.12.001.
[51] J. Hoffmann, B. Nebel, The FF planning system: Fast plan generation through heuristic search, J. Artificial Intelligence Res. 14 (2001) 253–302. http://dx.doi.org/10.1613/jair.855.
Moisés Martínez obtained his Bachelor's degree in Computer Science and Engineering from Universidad Carlos III de Madrid in 2011 and his Master's degree in Computer Science and Technology, with a specialization in Artificial Intelligence, from the same university in 2010. He has held a research fellowship since 2011 in the Planning and Learning Group of Universidad Carlos III de Madrid, where he is currently pursuing his Ph.D. on Artificial Intelligence. He also teaches two courses in the Bachelor's degree in Computer Science and Engineering: Machine Learning and Artificial Intelligence in Video Games. His research fields are Robotics and Automated Planning, and he has published several papers on these topics in conference proceedings.
Fernando Fernández has been a Professor of Computer Science at Universidad Carlos III de Madrid (UC3M) since 2005. He received his Ph.D. degree in Computer Science from UC3M in 2003 and his B.Sc. in Computer Science, also from UC3M, in 1999. In 2001 he became an assistant and later associate professor at UC3M. In the fall of 2000, Fernando was a visiting student at the Center for Engineering Science Advanced Research at Oak Ridge National Laboratory (Tennessee). He was also a postdoctoral fellow at the Computer Science Department of Carnegie Mellon University from October 2004 until December 2005. He is the recipient of a pre-doctoral FPU fellowship award from the Spanish Ministry of Education (MEC), a Doctoral Prize from UC3M, and a MEC-Fulbright postdoctoral Fellowship. He has more than 30 journal and conference papers, mainly in the fields of machine learning and planning, and he has advised 3 Ph.D. theses.
Daniel Borrajo has been a Professor of Computer Science at Universidad Carlos III de Madrid since 1998. He received his Ph.D. in Computer Science in 1990 and his B.S. in Computer Science, both from Universidad Politécnica de Madrid. He has published over 150 journal and conference papers, mainly in the fields of problem solving methods (heuristic search, automated planning and game playing) and machine learning. He has been the PI of and/or participated in over 30 research projects and networks funded at all levels (regional, national and European). He has been Program Co-chair of the International Conference on Automated Planning and Scheduling (ICAPS'13), Conference Co-chair of the Symposium on Combinatorial Search (SoCS'12, SoCS'11) and of ICAPS'06, Chair of the Spanish conference on AI (CAEPIA'07), and a PC member of conferences such as IJCAI (senior PC at IJCAI'07, '11 and '13), AAAI, ICAPS, ICML and ECML. He has advised 14 Ph.D. theses, and is currently a member and treasurer of the ICAPS Council.