Robotics and Autonomous Systems 110 (2018) 1–11
Automatic graspability map generation based on shape-primitives for unknown and familiar objects

Danny Eizicovits, Sigal Berman*

Department of Industrial Engineering and Management, Ben-Gurion University of the Negev, Beer-Sheva, Israel

Highlights

• Generating graspability maps of grasp wrist-configurations is time-consuming.
• For familiar objects of different sizes, graspability maps can be scaled.
• For unknown objects, graspability maps can be combined from shape primitives.
• Using shape primitives, graspability maps can be generated during run-time.
• The reduction in graspability map quality when using shape primitives is very small.
Article history: Received 6 February 2018; Received in revised form 22 July 2018; Accepted 6 September 2018.

Keywords: Robotic manipulators; Grasping; Graspability map
Abstract

Determining goal configurations that lead to successful grasps is a critical, time-consuming stage in reach-to-grasp planning, especially in unstructured, cluttered environments. While traditional, analytic algorithms are computation intensive and susceptible to uncertainty, modern, data-driven algorithms do not offer success guarantees and require large datasets for learning models of reach-to-grasp motion. Graspability maps are data structures which store wrist configurations that lead to successful grasps of an object. They are suitable both for direct use in reach-to-grasp motion planning, and as grasp databases for gripper design analysis and for learning grasp models. The computation of graspability maps can be based on analytical models. This facilitates the integration of analytical grasp quality guarantees with data-driven grasp planning. Yet, current graspability map computation methods are prohibitively time-consuming for many application scenarios. In the current work, we suggest a method for adaptation of graspability maps of known objects (shape primitives) to familiar and to unknown objects. The method facilitates run-time generation of graspability maps and significantly enhances their usability. Adapted maps are generated by detecting shape primitives in the object to be grasped, scaling the a-priori generated maps to the required dimensions, and combining the scaled maps to form a compound graspability map. Simulation results confirm that map adaptation does not critically reduce quality while significantly reducing computation time. A case study evaluation with objects from a public point-cloud image database corroborates the method's ability to quickly and accurately generate high-quality graspability maps for familiar and unknown objects.
*Correspondence to: Department of Industrial Engineering and Management, Ben-Gurion University of the Negev, P.O.B 653, Beer-Sheva 84105, Israel. E-mail address: [email protected] (S. Berman). https://doi.org/10.1016/j.robot.2018.09.001

1. Introduction

Grasp planning involves finding a set of suitable finger contact points, and both arm and hand joint configurations, for a given task, environment, object, gripper, and robot. Due to the computational complexity, and as inspired by human motion research [1], hand and arm configurations are typically planned separately and the computation is conducted serially from the distal to the proximal sections. That is, the hand configuration is determined based on the
required contact points, then the wrist pose (position and orientation) is determined by calculating the forward kinematics of the hand configuration, and finally the arm configuration is derived from the wrist pose by solving the arm's inverse kinematics (IK). Grasp planning is a fundamental problem in robotics and a critical component of reach-to-grasp motion planning, and it has been extensively studied in the last few decades [2,3]. The research has led to numerous grasp planning algorithms, suggested for different prospective applications and offering different advantages [2–7]. The difficulty of grasp planning increases considerably when the objects to be grasped are not known in advance, when the data includes uncertainty, when the environment is dynamic, when task requirements are demanding, and when the
robotic arm and hand have multiple degrees of freedom. Therefore, with current technology, grasp planning, especially in unstructured, dynamic environments, can incur prohibitive computation costs. Devising methods for efficiently grasping objects in such cases is still an open research question [3,4]. Graspability maps are data structures used to store information regarding grasp quality and the wrist (or tool center) poses that lead to successful grasps. Graspability maps have been used for grasp planning and for gripper design evaluation [8–10]. They are also suitable for constructing and compactly storing the data required by grasp learning algorithms. Graspability maps can be derived from an analytical model or from demonstrations. Using an analytical model, the grasp-pose quality grade and success probability are computed based on task requirements, and on object and gripper geometry [2,3]. Both construction methods (model-based and demonstration-based) are time-consuming since either multiple demonstrations or multiple computations are required. Even when done very efficiently, computing a graspability map can take between 10 and 60 min, depending on object size and map resolution [9]. This computation time-scale prohibits generation of graspability maps during run-time, which is problematic for applications in which the robot may encounter unknown objects. It also makes the use of graspability maps cumbersome for applications that require multiple maps, e.g., when there are several objects and grippers. In the current work, we suggest a fast, automatic adaptation method for creating new, high-quality graspability maps based on existing maps. The considerable reduction in the time required for generating graspability maps using this method facilitates generation during run-time and efficient generation of multiple maps, considerably increasing graspability map usability.
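Conceptually, a graspability map is a collection of wrist poses annotated with quality grades. The following minimal Python sketch illustrates the idea; the class and field names are our own hypothetical choices (not the paper's implementation), and the 0.7 success threshold is the value used in the experiments reported later in the paper.

```python
from dataclasses import dataclass, field

@dataclass
class GraspEntry:
    position: tuple      # wrist position (x, y, z), relative to the object's center of mass
    orientation: tuple   # wrist orientation, e.g., Euler angles (rx, ry, rz)
    quality: float       # amalgamated grasp-quality grade in [0, 1]

@dataclass
class GraspabilityMap:
    """Stores wrist configurations that lead to feasible grasps of one object."""
    entries: list = field(default_factory=list)

    def add(self, position, orientation, quality):
        self.entries.append(GraspEntry(position, orientation, quality))

    def successful(self, threshold=0.7):
        """Grasps whose quality grade exceeds the success threshold."""
        return [e for e in self.entries if e.quality > threshold]

# Example: two candidate wrist poses, one above the success threshold
gmap = GraspabilityMap()
gmap.add((0.0, 0.05, 0.10), (0.0, 1.57, 0.0), quality=0.92)
gmap.add((0.08, 0.0, 0.02), (0.0, 0.0, 0.0), quality=0.41)
len(gmap.successful())  # 1 grasp passes the 0.7 threshold
```

Storing poses with their grades in this form supports both direct grasp selection and offline uses such as gripper design analysis.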
Grasp planning approaches can be classified based on the available knowledge regarding the object to be grasped: known, familiar, or unknown [3]. For known objects, the grasp planning algorithm has a prior model of the object to be grasped. In case the object is situated in a known environment, pre-computed grasp configurations can be directly used. In case the object is situated in an unknown or dynamic environment, pre-computed hand configurations (e.g., the Columbia grasp database [11]) can be used and only the approach path has to be planned during run-time [12]. For familiar objects, the planning algorithm has a prior model of an object with characteristics similar to the object to be grasped, e.g., shape, texture, and graspable zones. The model facilitates basing grasp planning, at least in part, on prior knowledge, reducing the reliance on sensory inputs, reducing planning complexity and time, and increasing system robustness. For unknown objects, the system does not have a suitable prior model. Therefore, grasping unknown objects depends highly on the features perceived by the sensory system, where in most cases the obtained data is sparse, incomplete, and noisy. The methods developed in the current paper are based on adaptation of graspability maps of known objects to maps for familiar and unknown objects. For familiar objects, the algorithm scales existing maps based on size. For unknown objects, the algorithm constructs a compound map based on the graspability maps of the shape primitives from which the object is composed. Due to the abundance of object shapes, especially in natural environments, perception uncertainty, and the complexity of performing a comprehensive shape analysis [13,14], many grasp synthesis algorithms approximate object shape [15]. Partial shape approximation can be conducted based on a set of shape primitives such as spheres, cylinders, boxes, etc. [16].
The use of shape primitives allows, in many cases, approximation of unknown objects by familiar objects, thus facilitating the generation of hypotheses regarding grasp-configuration success probability based on prior knowledge [17,18]. For example, based on studies of human perception, Aleotti and Caselli [19] suggest learning grasps from
human demonstration and storing them with their related object parts, i.e., their shape primitives. During run-time, object parts are identified and the demonstrations are used to compute a set of candidate grasps (e.g., using the OpenRave1 engine in a simulated environment) for each identified object part. Grasp planning algorithms can be roughly divided into analytical and data-driven (also referred to as empirical) approaches [4,5,20]. Analytical approaches [21–24] dominated the field of grasp planning until the beginning of the 2000s. In analytical grasp planning algorithms, grasp contact points and configurations are found based on the analysis of wrenches and forces calculated from gripper and object characteristics. The main advantage of the approach is that theoretical requirements regarding grasp quality can be computed and guaranteed. Typically, more than one grasp fulfilling the task requirements is found, and an optimal grasp can be chosen based on calculated quality measures [2,3]. Several characteristics of analytical algorithms make them less suitable for grasp planning during run-time. They require complete and exact knowledge regarding the object, they are susceptible to noise and uncertainty [25], and they typically incur a very high computational cost [26]. The computation burden increases when grasp planning is integrated with reach-to-grasp path planning. Since not all successful grasp poses may be reachable, the grasp planning algorithm is required to compute several grasp configurations. Such issues, which are related to object reachability, are especially problematic in cluttered environments. Data-driven grasp planning algorithms have become more prevalent recently with the advent of very fast learning and search algorithms.
Unlike analytical approaches, data-driven, empirical approaches [4–7,27–30] compute grasp contact points and configurations based on visible features, on models learned from known contact points, and on configurations predefined for various objects. However, grasp optimality, or even grasp success, is not guaranteed [31]. Moreover, a large database of pre-defined, successful grasp configurations for similar objects and grippers is required by most current learning algorithms for learning good grasping models. The graspability map adaptation methodology presented in the current work can be used for generating and storing the multiple successful grasp poses required by such learning algorithms. As graspability map generation can be based on analytical models, this facilitates the integration of analytical guarantees with data-driven grasp planning. Moreover, map adaptation is sufficiently fast to be conducted during run-time, and can be directly integrated with reach-to-grasp planning algorithms. In the following, we present a methodology for graspability map adaptation based on primitive shapes, test the methodology for multiple objects and object dimensions, present a case study based on several point-cloud representations of objects from a public database, and test a sample of the grasp poses recorded in the map in a physical environment.

2. Method

The proposed methodology has two stages (Fig. 1): a-priori graspability database generation, and run-time adaptation of maps in the database. In the a-priori map generation stage, a set of full-scale (FS), high-resolution graspability maps is created and stored in a graspability map database. During run-time, the graspability map database is used for generating adapted graspability maps based on the characteristics of the gripper and the object to be grasped.

1 http://openrave.org/.
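The run-time stage of the two-stage methodology can be sketched in Python as follows. This is an illustrative skeleton only: the hook functions (identify, split_cluster, scale_map, combine_maps) are hypothetical placeholders standing in for the identification, MVBB splitting, scaling, and combination steps described in the sections below, not the paper's implementation.

```python
def adapt_graspability_map(point_cloud, fs_map_db, identify,
                           split_cluster, scale_map, combine_maps,
                           max_iterations=100):
    """Run-time graspability-map adaptation (illustrative sketch).
    Identify shape primitives in the segmented point-cloud, retrieve their
    full-scale (FS) maps from the a-priori database, scale them to the
    observed dimensions, and combine them into a compound map."""
    clusters = [point_cloud]     # first iteration: the whole object is one cluster
    adapted_parts = []
    for _ in range(max_iterations):
        if not clusters:
            break                # all clusters identified
        cluster = clusters.pop()
        primitive, dimensions = identify(cluster)    # e.g., MLESAC-based identification
        if primitive is None:                        # unidentified: split further (MVBB)
            clusters.extend(split_cluster(cluster))
        else:
            adapted_parts.append(scale_map(fs_map_db[primitive], dimensions))
    else:
        raise RuntimeError("object clustering failed")  # division threshold reached
    return combine_maps(adapted_parts)   # re-checks stability and collisions
```

A usage example with stub hooks: a cloud whose single cluster is identified as a cylinder retrieves and scales the cylinder FS map, and the compound map is the list of scaled parts.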
Fig. 1. Flow chart of the methodology for generating graspability maps based on full-scale (FS) maps of shape primitives. The entry step is indicated by a gray ellipse (start). The final steps are indicated by a gray ellipse with a black border. Top: A-priori database generation of FS maps for selected shape primitives (detailed in Section 2.1). Bottom: Run-time map generation by map scaling and combination based on the FS map database and the shape primitives identified (detailed in Section 2.2).
2.1. A-priori database generation

An initial set of shape primitives and known objects is defined based on the objects the system is expected to encounter. Previous research on grasp planning using shape primitives [17,18] typically includes four primitive shapes: boxes, cylinders, cones, and spheres. An additional important shape primitive is the half-toroid. Half-toroids are frequently encountered in many objects (e.g., cups, cooking pots) and often characterize the shape of a part specifically planned for grasping, i.e., a handle. Therefore, a half-toroid was additionally included in the presented methodology. For each known object and shape primitive, a FS graspability map is generated using the surface-to-voxel map generation algorithm [9]. In the surface-to-voxel algorithm, the surface of the object is scanned for finger contact points suitable for grasping the object. The algorithm uses the 3D point-clouds of the object and gripper for assessing grasp quality from wrist poses calculated from the surface contact points using forward kinematic equations. Grasp quality is determined based on the amalgamation of various grasp quality measures. In the current work, we use a combination of two metrics: a metric for force closure (the force closure angle) and a metric for grasp stability (the stability distance) [9]. In addition, the algorithm verifies that there are no collisions between the gripper and the object. The quality grades of feasible grasps (grasps with no collisions) are stored in the graspability map together with their wrist configurations. Grasp success probability is determined by applying a threshold on grasp quality. The threshold is validated based on physical experiments.

2.2. Run-time map generation

During run-time, computation is based on the segmented 3D point-cloud of the object to be grasped.
The point-cloud can be obtained from a sensory apparatus (e.g., for direct grasp planning) or it can be synthetically created (e.g., for generating a much
larger graspability map database). For identifying the object or its shape primitives, the point-cloud is iteratively divided into clusters using the hierarchical Minimum Volume Bounding Box (MVBB) algorithm [32], where in the first iteration the whole object is allocated to a single cluster. The division into clusters is repeated until all clusters have been identified or the threshold of the cluster division process is reached. For each cluster, the algorithm checks whether its shape can be identified as one of the shape primitives in the database, using an algorithm based on the Maximum Likelihood Estimation Sample and Consensus (MLESAC) method [33]. If the identification fails, the cluster is returned to the hierarchical cluster division process. If identification succeeds, i.e., the cluster is identified as a known object with an existing graspability map, or as a familiar object for which there is a graspability map of an object of similar dimensions, the appropriate (exact or similar) graspability map is retrieved from the database. For familiar objects, the map is scaled based on object dimensions. When all clusters are identified, the maps retrieved and scaled from the database are combined to form a new graspability map suitable for the object to be grasped. In case clustering fails, i.e., the threshold of the cluster division process is reached but not all clusters have been identified, the algorithm exits, indicating that object clustering has failed. The user can opt to return to the a-priori database generation stage and add graspability maps for the object and/or for additional shape primitives. Alternatively, a partial graspability map can be generated based on the clusters that were identified. An MVBB can be efficiently computed for a 3D point-cloud [34]. Decomposing a point-cloud of an unknown object into shape primitives based on a set of MVBBs can give efficient indications for grasp planning [32].
However, determining the appropriate decomposition can be computation intensive since there are infinite decomposition possibilities. For efficiently determining a suitable set of bounding boxes, Huebner et al. [32] suggest the hierarchical
MVBB algorithm. The algorithm is based on a fit-and-split hierarchy, where a heuristic is used to estimate whether, and along which plane, to split the parent box in two at every iteration. For computing the heuristic, the dataset bounded within the parent box is projected onto the three orthogonal Cartesian planes. A 2D convex hull is computed for each plane and its perimeter is used to generate candidate splitting lines. 2D convex hulls are computed for each subset, and the best candidate splitting line (of all candidate lines, of all planes) is chosen according to the sum of the areas of the two convex hulls of the subsets relative to the parent convex hull. The dataset is split according to the plane that includes the chosen splitting line. Finally, an MVBB is computed for each of the two new data subsets. The division process termination condition is based on a threshold, θth, placed on the total volume reduction gained by splitting the dataset, θ* (Eq. (1)). When θ* is lower than the threshold, the splitting continues. Setting the threshold θth too high can cause the algorithm to produce a large number of small boxes, over-fitting the data. Setting θth too low can lead to the creation of a small number of large boxes with complex shapes.
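The split criterion can be sketched in a few lines of Python (illustrative only; the function names are ours). θ* compares the total bounding-box volume after a candidate split with the volume before it, and the box is split while θ* stays below the threshold θth:

```python
def split_gain(vol_child1, vol_child2, vol_parent, vol_rest):
    """Theta* of Eq. (1): total bounding-box volume after a candidate split,
    relative to the volume before it. vol_rest is V(A\\P), the volume of all
    boxes in the current hierarchy other than the parent box P."""
    return (vol_child1 + vol_child2 + vol_rest) / (vol_parent + vol_rest)

def should_split(vol_child1, vol_child2, vol_parent, vol_rest, theta_th):
    """Continue splitting while the relative volume after the split is below
    the termination threshold theta_th."""
    return split_gain(vol_child1, vol_child2, vol_parent, vol_rest) < theta_th

# A split that halves the bounding volume (theta* = 0.5) passes a 0.9 threshold
should_split(2.0, 3.0, 10.0, 0.0, theta_th=0.9)  # True
```

Because MVBBs fit the data tightly, a good split yields child boxes whose combined volume is much smaller than the parent's, driving θ* well below 1.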
θ* = (V(C1) + V(C2) + V(A\P)) / (V(P) + V(A\P))    (1)

Where V is a volume function, A is the complete set of boxes in the current hierarchy, P is the current box, and C1, C2 are the two child boxes produced by the current split.

2.3. Primitive shape identification

Depending on scene characteristics, e.g., surface orientations, reflectiveness, and lighting conditions, the point-cloud measurements provided by commonly used Red–Green–Blue–Depth (RGB-D) cameras are susceptible to large fluctuations in accuracy and precision [35]. The Random Sample Consensus (RANSAC) algorithm is a model-fitting algorithm suitable for such noisy datasets [36]. The algorithm aims at matching a model to the data while disregarding outliers. The basic model-fitting operation of the algorithm includes two stages. First, a small subset of the data is sampled and used for estimating the parameters of a hypothesized model. Then, the consensus set is determined as the number of elements in the dataset that are consistent with the model within a threshold error. When the consensus set is sufficiently large, the hypothesized model is accepted. This model-fitting operation is repeated several times and the best fit is selected. The probability of estimating a model that represents the data well increases with the number of repetitions. Yet, since it is a time-consuming process, determining the number of required repetitions is important for ensuring both high recognition rates and fast performance. The number is determined empirically based on task requirements. The MLESAC algorithm is a variant of the RANSAC algorithm which aims at reducing the sensitivity to the selection of the error threshold [33]. To achieve this, it evaluates the fit of the hypothesized model by estimating the log likelihood of the data.

Garcia [37] proposed an algorithm based on MLESAC for identifying a set of shape primitives from an object point-cloud for robotic grasping. The algorithm is devised for identifying the four classical shape primitives: sphere, cylinder, cone, and box. We adapt the algorithm by adding features related to cylindrical symmetry for additionally identifying the half-toroid and for improving the recognition rate. The shape of a cluster is identified using a rule-based classification algorithm based on four features extracted directly from the point-cloud. Two features, the fit rates to a spherical model and to a cylindrical model, are similar to the original features [37]. The model fit rate is defined as the ratio between the number of point-cloud samples classified as part of the model (Ci) and the number of samples in the point-cloud (Pi):

Fit rate = ΣCi / ΣPi    (2)

The two new features are based on evaluating the cylindrical symmetry along the object's main axis. The cylindrical symmetry is determined by calculating the standard deviation of the radii along the main axis. The overall cylindrical symmetry (along the main axis) is important for differentiating boxes and half-toroids from cylinders and spheres, and the half-cylindrical symmetry, i.e., the cylindrical symmetry of the lower and upper parts of the object, is important for differentiating between cones and cylinders.

2.4. Map scaling for familiar objects

Familiar objects are objects that are classified as one of the shape primitives in the database (e.g., cone, box, sphere, cylinder, or half-toroid), but for which the database does not include a size-appropriate FS graspability map. In the current implementation, based on gripper dimensions, the acceptable size difference between the object and an existing map was set to 1 cm along any axis. Objects with larger differences were determined as familiar.

The map scaling algorithm is based on reducing the search space for successful grasps from the entire object surface to examining only scaled poses of previously found successful grasps, i.e., the poses of successful grasps in the FS graspability map of the shape primitive. The reduction in the search space is based on the assumption that unsuccessful grasps will, in general, remain unsuccessful for similar objects and thus do not need to be checked again. This assumption may not always be valid, especially when the required scaling is large, so map scaling is limited by the required scaling range.

A pose scaling procedure was developed for each primitive shape in the database based on its geometry (radius, height, etc.). The pose scaling procedures are based on the assumption that the orientation of successful grasps is likely to remain the same with the change of object size (when object shape is maintained). For example, maps for cylinders are scaled based on their radius and height. The grasp quality of the scaled wrist poses is re-evaluated (using the same quality measures used for generating the original FS map). The grasps are also tested for collisions between the gripper and the object. If the grasp quality is above the threshold and no collisions are found, the grasp is retained and saved along with its new quality grade in the scaled graspability map.

Determining the scaling algorithm is not trivial as different scaling algorithms result in a different compromise between maintaining grasp quality and maintaining grasp feasibility (Fig. 2). We shall demonstrate this for the cylinder. When maintaining grasp quality is prioritized, it is important to maintain the relative position and distance of the object's center of gravity and the grasping point (GP), i.e., the center of the contact points between the gripper's fingers and the object, as it affects both force closure and grasp stability. Recall that the graspability map is defined with respect to the object's center of mass. Thus, the same coordinate frame can be used for the scaling. In such a case, for a cylinder with radius RO and height HO, scaling to new dimensions of radius RN and height HN is based on multiplication by two factors (change in radius and in height):

FHGP = HN / HO ;  FRGP = RN / RO    (3)

GPXN = FRGP · GPXO ,  GPYN = FRGP · GPYO ,  GPZN = FHGP · GPZO    (4)

Where GPXO, GPYO, GPZO are the GP coordinates in the FS map and GPXN, GPYN, GPZN are the GP coordinates in the scaled map. The wrist position, WN, can then be calculated based on GPN, the
position of the contact points with respect to the wrist, and the orientation of the grasp. We term this algorithm grasp Quality scaling. However, when the new object is larger than the original object, this scaling approach will rather quickly lead to a collision between the gripper and the object, due to the length of the gripper's fingers. In an alternative approach, when maintaining grasp feasibility is prioritized, it is important to maintain the distance between the wrist (gripper base) and the GP. Therefore, the wrist should be linearly translated based on the change in dimensions:

FHW = HN − HO ;  FRW = RN − RO    (5)

WXN = FRW + WXO ;  WYN = FRW + WYO ;  WZN = FHW + WZO    (6)

Where WXO, WYO, WZO are the wrist position coordinates in the FS map and WXN, WYN, WZN are the wrist position coordinates in the scaled map. We term this algorithm grasp Feasibility scaling. These two approaches can be integrated by utilizing the flexibility afforded by the length of the gripper's fingers to form the Combined scaling algorithm. In the Combined scaling algorithm, the position of GPO is scaled based on Eqs. (3) and (4). The position of WN is calculated based on GPN and, in case of collision, the position of WN is recalculated based on Eqs. (5) and (6).

Fig. 2. Grasp scaling algorithms. (A) Original grasp in the FS map, object height is 2h (general units). For an object with the same radius but with height 4h (height scaled by a factor of 2), the scaling of the original grasp is presented for (B) Quality scaling (C) Feasibility scaling.

Graspability map scaling dramatically reduces the search space, since instead of searching the entire object surface, only scaled poses of previously determined successful grasps are examined. The search space reduction comes at the cost of missing some successful grasp poses. We conducted an experiment to validate that map scaling indeed significantly reduces generation time without critically reducing graspability map quality with respect to the FS map. Map quality is defined based on the surface and volume distributions of successful grasps [9]. The grasp surface distribution indicates whether grasps at different regions about the object's surface are identified, as grasping different regions may be preferable for different tasks. The grasp volume distribution indicates whether grasps at different regions in the volume surrounding the object are identified, as grasping from different locations can facilitate reach-to-grasp planning in cluttered environments and is important for manipulability (the ability to perform the motion required by the task).

2.5. Map combination for unknown objects

During map combination, grasp quality for each grasp pose in the combined maps is re-assessed for collisions and stability. Force closure does not change when the primitive shapes are combined and can be tested for each shape separately. Stability has to be tested for the combined map since the location of the center of gravity of the object changes. Collisions have to be tested since the graspability map generation process of each shape primitive does not take into account collisions with the adjacent shapes. Thus, collisions between the gripper and adjacent shapes are checked and, in case a collision is found, the pose is discarded.

3. Experiment
A FS graspability map database was computed for five primitive shapes (cone, box, sphere, cylinder, and half-toroid). Three experiments were conducted using the FS graspability map database. One experiment was conducted for determining the number of model-fitting repetitions required for shape identification. Another experiment was conducted for ascertaining the hypotheses regarding map scaling for familiar objects, namely, that map scaling does not critically reduce map quality while significantly reducing computation time, and that the Combined scaling algorithm out-performs both the grasp Quality and the grasp Feasibility scaling algorithms. Finally, a case-study analysis was conducted using five objects from a public 3D point-cloud database [38], demonstrating the complete methodology. Grasps with a quality grade above 0.7 (for a quality grade in the range of 0 to 1) were accepted as grasps that lead to successful task completion, based on an experiment with physical objects similar to the modeled primitive shapes [9]. The algorithms were implemented using MATLAB (version R2014, MathWorks, USA) on an Intel i7 quad-core 2 GHz machine with 6 GB RAM.

3.1. FS graspability map database generation

A FS graspability map database was generated for use in all experiments. To generate the database, 3D point-clouds of a two-jaw gripper (HGPL-25-60-A, Festo, Germany) (Fig. 3(A)) and five shape primitives (box, cylinder, sphere, cone, and half-toroid) were synthetically generated (Fig. 3(B–F)). The resolution of the point-clouds was 0.5 cm3. FS graspability maps were generated for each shape. The map generation resolution used (surface step size 0.5 × 0.5 cm2, rotation step size 11° per axis) was previously shown to produce high-quality maps [9]. For reducing FS map computation time, the MATLAB Parallel Computing Toolbox was used with seven working cores (four physical and three virtual).

3.2. The number of repetitions required for shape identification

Point-clouds of seven shape variants were generated for each shape primitive. To determine the number of model-fitting repetitions required, each shape variant was identified with nine different repetition conditions (1, 5, 10, 20, 30, 40, 60, 100, and 400 repetitions) and each condition was repeated 100 times (overall, 7 × 9 × 100 = 4900 trials for each shape primitive). The performance measures evaluated were the average success rate and the average computation time.

3.3. Map scaling for familiar objects

The map scaling algorithms were systematically tested for the cylinder object. Point-clouds of ten size variants of a cylinder (radius 2 cm, height 20 cm) were synthetically generated, where five were larger and five smaller than the original cylinder shape (Table 1). Four maps were generated for each variant: a FS graspability map, and three adapted maps generated from the FS map of the shape primitive found in the graspability map database, using each of the three map scaling algorithms (grasp Quality, grasp Feasibility, and Combined scaling). Overall, 4 × 10 = 40 maps were generated for the cylindrical object variants.
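As an illustration of the scaling rules of Section 2.4, the following Python sketch implements grasp Quality scaling (Eqs. (3)-(4)) and grasp Feasibility scaling (Eqs. (5)-(6)) for a cylinder. It is a minimal sketch under the paper's stated assumptions (coordinates expressed relative to the object's center of mass); the function names are ours.

```python
def quality_scale_gp(gp, r_old, h_old, r_new, h_new):
    """Grasp Quality scaling (Eqs. (3)-(4)): scale the grasping point (GP)
    multiplicatively so its position relative to the object's center of
    mass is preserved."""
    f_r, f_h = r_new / r_old, h_new / h_old
    x, y, z = gp
    return (f_r * x, f_r * y, f_h * z)

def feasibility_scale_wrist(wrist, r_old, h_old, r_new, h_new):
    """Grasp Feasibility scaling (Eqs. (5)-(6)): translate the wrist
    linearly so the wrist-to-GP distance is preserved."""
    f_r, f_h = r_new - r_old, h_new - h_old
    x, y, z = wrist
    return (f_r + x, f_r + y, f_h + z)

# Cylinder scaled from radius 2, height 20 to radius 4, height 10
quality_scale_gp((2.0, 0.0, 5.0), 2, 20, 4, 10)        # -> (4.0, 0.0, 2.5)
feasibility_scale_wrist((6.0, 0.0, 5.0), 2, 20, 4, 10)  # -> (8.0, 2.0, -5.0)
```

The Combined algorithm described in Section 2.4 would first apply the multiplicative GP scaling and fall back to the linear wrist translation only when the resulting wrist pose collides with the object.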
Fig. 3. A synthetic point-cloud was created for the gripper and for each shape primitive. (A) The model of the gripper, based on the HGPL-25-60-A (Festo, Germany) two-jaw gripper, and the shape primitives used for creating the FS graspability maps: (B) Cylinder (C) Sphere (D) Box (E) Half-toroid (F) Cone.
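A synthetic surface point-cloud like those used for the FS maps can be generated, for example, as follows. This is an illustrative sketch, not the paper's generation code; the sampling grid approximates the stated 0.5 cm resolution for an upright cylinder.

```python
import numpy as np

def cylinder_point_cloud(radius_cm, height_cm, step_cm=0.5):
    """Sample the lateral surface of an upright cylinder on a regular grid
    (illustrative synthetic point-cloud generation)."""
    zs = np.arange(0.0, height_cm + 1e-9, step_cm)
    # number of angular samples so that arc spacing roughly matches step_cm
    n_angles = max(8, int(round(2 * np.pi * radius_cm / step_cm)))
    thetas = np.linspace(0.0, 2 * np.pi, n_angles, endpoint=False)
    z, t = np.meshgrid(zs, thetas)
    x = radius_cm * np.cos(t)
    y = radius_cm * np.sin(t)
    return np.column_stack([x.ravel(), y.ravel(), z.ravel()])

# Base cylinder of Section 3.3: radius 2 cm, height 20 cm
cloud = cylinder_point_cloud(2.0, 20.0)
```

Every sampled point lies on the cylinder's lateral surface, so the radial distance of each point from the z-axis equals the specified radius.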
The scaling algorithms were assessed based on the generation time reduction, the generation time of initial grasps, and the graspability map quality. Generation time reduction was assessed based on the time reduction ratio (TR), computed as the ratio between the time it took to generate a scaled map and the time it took to generate a FS map. Generation time of initial grasps was measured by the time to find the first high-quality grasp (grade > 0.95) and the time to find the first ten high-quality grasps. Graspability map quality was determined based on the quality ratio between the scaled and the FS maps. Map quality was determined based on the grasp rate about the object's surface, i.e., the grasp surface rate (SR), and the grasp rate within the object's bounding-box volume, i.e., the grasp volume rate (VR) [9]. The ratios of the SR and VR of the scaled map with respect to the FS map were defined as SRN (Eq. (7)) and VRN (Eq. (8)), respectively.
SR = \frac{\sum_{i=1}^{P_n} GP_i}{P_n}; \quad SR_N = \frac{SR}{SR_{FS}} \cdot 100    (7)
Where Pn is the normalized number of pixels, GP_i is 1 if a grasp with quality > 0 was found in pixel i and 0 otherwise, and SR_FS is the SR of the FS map.
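A minimal sketch of the SR and SR_N computations, assuming the map has been rasterized into an array holding the best grasp quality found per surface pixel (the helper names are ours, not the paper's; the VR computation of Eq. (8) is analogous over voxels):

```python
import numpy as np

def surface_rate(best_quality_per_pixel):
    """SR (Eq. (7)): fraction of surface pixels holding at least one
    grasp with quality > 0."""
    gp = np.asarray(best_quality_per_pixel) > 0  # GP_i indicator
    return gp.mean()

def normalized_surface_rate(sr_scaled, sr_fs):
    """SR_N: quality of the scaled map relative to the FS map, in percent."""
    return 100.0 * sr_scaled / sr_fs

# Hypothetical maps: grasps found in 8 of 10 pixels (scaled) vs 10 of 10 (FS)
sr = surface_rate([0.9, 0.8, 0, 0.5, 0.7, 0.6, 0, 0.3, 0.95, 0.4])
print(normalized_surface_rate(sr, 1.0))  # 80.0
```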
VR = \frac{\sum_{i=1}^{V_n} GV_i}{V_n}; \quad VR_N = \frac{VR}{VR_{FS}} \cdot 100    (8)
Where Vn is the normalized number of voxels, GV_i is 1 if a grasp with quality > 0 was found in voxel i and 0 otherwise, and VR_FS is the VR of the FS map.

Statistical analysis was conducted using R and the R Studio IDE (version 3.4.3). A mixed ANOVA was conducted with the scaling algorithm (grasp Quality, grasp Feasibility, Combined) as the within-subject factor, the direction of scaling (Smaller, Larger) as the between-subjects factor, and their interaction, followed by a Bonferroni-adjusted post-hoc analysis for the scaling algorithm.

3.4. Case study—objects from a RGB-D database

Five objects, the Rolodex jumbo pencil cup, Elmer's washable glue, Kong duck dog toy, Kong AirDog squeakair ball, and Dr.
Table 1 Cylinder size variants for which scaled maps were constructed from the cylinder shape primitive with radius 2 cm and height 20 cm. Five cylinders (shaded in gray) are larger than the base cylinder, and five (not shaded) are smaller.
Brown's bottle brush, from an online, public 3D RGB-D point-cloud database [38], were used to test the complete methodology. Their point-cloud resolution is 0.1 cm². None of the objects had a compatible graspability map in the database. Shape identification was conducted with 10 repetitions, and θth was set to 0.5 (coarse clustering). Fifteen high-quality grasps of Elmer's washable glue were randomly selected from the adapted graspability map (uniformly distributed about the object) and tested in a physical setup with a robotic manipulator to validate the computations. The washable glue container was placed on a table in front of the robot. The gripper was positioned with open fingers based on the defined grasp configuration. The fingers were closed, and the object was lifted above the table and then returned to the table, where the gripper was re-opened.

4. Results

4.1. The number of repetitions required for shape identification

Results of testing the appropriate number of repetitions required for shape identification are presented for all five shape primitives in Fig. 4. The half-toroid required the largest number of repetitions for stable identification. The figure shows that after only 10 repetitions the classification of all the objects started to converge, with a 90% success rate for all objects except the half-toroid, which had a lower success rate of 83%. Success rates above 95% were reached at about 100 repetitions. The average computation time for a single repetition was 0.21 s, with an average of 0.9 s for 10 repetitions and 8.48 s for 100 repetitions.
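The benefit of repeating the model fitting and voting on the outcome can be illustrated with a toy simulation. This is only an illustrative model with an assumed per-repetition accuracy and uniform errors, not the paper's MLESAC-based identifier:

```python
import random
from collections import Counter

def vote_success_rate(p_correct, n_reps, n_trials=2000, n_classes=5, seed=0):
    """Estimate the chance that repeated fitting followed by a plurality
    vote picks the correct shape primitive, assuming each repetition is
    independently correct with probability p_correct and otherwise votes
    for a wrong class uniformly at random (a toy model)."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(n_trials):
        votes = Counter()
        for _ in range(n_reps):
            if rng.random() < p_correct:
                votes["correct"] += 1
            else:
                votes[rng.randrange(n_classes - 1)] += 1
        if votes.most_common(1)[0][0] == "correct":
            wins += 1
    return wins / n_trials

# More repetitions stabilize identification even for modest per-fit accuracy
print(vote_success_rate(0.5, 1), vote_success_rate(0.5, 10), vote_success_rate(0.5, 100))
```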
Table 2
Average scaled map quality, time reduction, and initial grasp generation time for scaling a FS graspability map of a cylinder shape primitive (radius 2 cm and height 20 cm). Results for which the scaling algorithms significantly differ are in bold.

Scaling algorithm  Scaling direction  SRN [%]  VRN [%]  TR [%]    Time to first grasp [ms]  Time to 10th grasp [ms]
GQ                 Larger             71 (14)  81 (14)  97 (0.0)  133.8 (69.6)              1305.0 (839.8)
GQ                 Smaller            68 (11)  59 (11)  97 (3)    42.6 (44.3)               769.6 (1231.5)
GF                 Larger             79 (13)  84 (8)   96 (1)    69.4 (42.0)               506.6 (275.5)
GF                 Smaller            72 (6)   67 (13)  97 (2)    32.8 (20.7)               270.8 (209.9)
C                  Larger             91 (5)   94 (4)   96 (1)    71.0 (41.6)               531.0 (294.9)
C                  Smaller            91 (4)   82 (9)   97 (3)    33.2 (20.6)               280.6 (217.0)
Abbreviations: GQ—grasp Quality scaling, GF—grasp Feasibility scaling, C—Combined scaling, SRN—surface rate with respect to the FS map, VRN—volume rate with respect to the FS map, TR—time reduction.

Table 3
Adapted graspability map generation.

Object                        Time [s]  Number of grasps
Elmer's washable glue         208       31,040
Rolodex jumbo pencil cup      163       47,728
Kong duck dog toy             182       12,504
KONG AirDog squeakair ball    117       5,268
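As a check, the averages quoted in Section 4.2 can be recomputed from the Table 2 means (shown here for the Combined algorithm; values are the means reported in the table, standard deviations omitted):

```python
# Rows of Table 2: (algorithm, direction, SRN, VRN, TR) mean values
table2 = [
    ("GQ", "Larger", 71, 81, 97), ("GQ", "Smaller", 68, 59, 97),
    ("GF", "Larger", 79, 84, 96), ("GF", "Smaller", 72, 67, 97),
    ("C",  "Larger", 91, 94, 96), ("C",  "Smaller", 91, 82, 97),
]

def mean_metric(rows, algorithm, index):
    """Average a metric column over both scaling directions for one algorithm."""
    vals = [r[index] for r in rows if r[0] == algorithm]
    return sum(vals) / len(vals)

print(mean_metric(table2, "C", 2))  # average SRN for Combined scaling: 91.0
print(mean_metric(table2, "C", 3))  # average VRN for Combined scaling: 88.0
```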
Fig. 4. Shape identification success rate versus number of repetitions for each of the five shape primitives.

4.2. Map scaling for familiar objects

A FS graspability map and a scaled map computed using the Combined scaling algorithm for a cylinder (radius 2 cm, height 20 cm) are shown in Fig. 5. It can be seen that the high-quality grasps are retained in the scaled map. Some of the lower-quality grasps that could still have been successful are missed by the scaling algorithm.

Scaled map generation results for the cylinder are presented in Table 2. For both scaling directions (Smaller and Larger) the average SRN is 91%, the average VRN for Larger objects is 94%, and the average VRN for Smaller objects is 82%. The time reduction achieved by generating a scaled map rather than a FS map is very high, 96%–97%. Using the Combined scaling algorithm, the time to generate the first high-quality grasp is 18–218 ms, the time to generate the first ten high-quality grasps is 107–3232 ms, and generating the fully scaled map takes 18–263 s.

For SRN there is no effect of scaling direction, while the scaling algorithm is significant (F2,16 = 12.24, p < .001), and post-hoc tests show that the Combined scaling algorithm causes less quality reduction than both the grasp Quality and the grasp Feasibility algorithms (for grasp Quality p < .001, and for grasp Feasibility p < .01), which are similar to each other. For VRN, map quality reduction is higher for Smaller than for Larger cylinder sizes (F1,8 = 7.39, p < .05). Additionally, the scaling algorithm is significant (F2,16 = 20.82, p < .0001), and post-hoc tests show that the Combined scaling algorithm causes less map quality reduction than the grasp Quality algorithm (p < .05), but is similar to the grasp Feasibility scaling algorithm. There is no interaction between algorithm and scaling direction. The time reduction ratio, the time to find the first high-quality grasp, and the time to find the first ten high-quality grasps were similar for both scaling directions and for all three scaling algorithms.

4.3. Case study—objects from a RGB-D database
The Rolodex jumbo pencil cup was identified as a familiar cone. Elmer's washable glue was identified as an unknown object composed of two cylinders of different dimensions. The Kong duck dog toy was identified as a familiar cylinder. The Kong AirDog squeakair ball was identified as an unknown object composed of a sphere and a box. Clustering of Dr. Brown's bottle brush failed since the point-cloud was not continuous and there were large gaps between points (Fig. 6). The total map computation time, for all maps, was less than 4 min (Table 3). The total number of grasps was very high (∼5300–47,700, depending on the object). Graspability maps created for all four correctly identified objects are presented in Fig. 7.

All physically tested high-quality grasps of Elmer's washable glue led to successful completion of a pick-and-place task (Fig. 8). When the main axes of Elmer's washable glue were not aligned with the fingers of the gripper, the object was re-oriented by the gripper while the fingers were closed (Fig. 8(D)).

5. Discussion

The primitive shape identification algorithm successfully and efficiently identified the presented shapes. The cylindrical symmetry of an object along its main axis was an important feature that contributed to reducing the number of required repetitions for the cylinder, sphere, box, and cone. However, the cylindrical symmetry of the half-toroid is influenced by its dimensions, which hinders its recognition rate. An additional feature for recognition of the half-toroid may improve recognition performance.

For familiar objects, the very fast generation of the first grasp, and even of the first ten grasps, makes the graspability map adaptation algorithm suitable for determining goal grasp poses during run-time, in conjunction with reach-to-grasp path planning algorithms. When time is critical, these grasps can be used for planning even before the full map is computed.
The quality of the scaled maps remains high, especially for the Combined scaling algorithm, which incurs less reduction in quality without increasing computation time, for either the full map or the initial grasps. Two of the five Smaller variants tested have only a smaller height, and three have either only a smaller radius or both a smaller radius and a smaller height. The two variants for which only the height is smaller have a larger VRN (mean 87%) than the three other variants (mean 78%). The reduction in VRN for cylinders with the smaller radius is related to the high sensitivity of these objects
Fig. 5. Graspability maps for a cylinder with radius 2 cm, height 20 cm. Left: FS graspability map. Right: Adapted graspability map.
Fig. 6. Left: Dr. Brown’s bottle brush, and Right: its point-cloud.
Fig. 7. Top: Adapted graspability maps, and Bottom: corresponding objects. (A) Rolodex jumbo pencil cup (B) Elmer’s washable glue (C) Kong duck dog toy and (D) Kong AirDog squeakair ball.
to the force closure measure. A small change in the position of the wrist can lead to a large change in this measure due to the cylinder's curvature. This sensitivity is increased when the cylinder radius is decreased, as then the cylinder's curvature about its main
Fig. 8. Grasping Elmer's washable glue. Physical implementation (Left side of each dashed box) of the grasp pose from the graspability map (Right). (A–C) Grasps where the main axes of the object were aligned with the gripper's fingers ((A) the gripper facing the object from the side, grasping the wide dimension, (B) above, narrow, (C) side, narrow). The object was grasped by the robot exactly as planned. (D) A grasp where the main axes of the object were not aligned with the gripper's fingers (the gripper facing the object from the side, the object diagonal to the fingers). The object was grasped but re-oriented by the gripper during the grasp, such that it was finally grasped by its narrow dimension, similar to the grasp performed in (C).
axis is increased. For objects with zero curvature, such as a cube, this would not pose a problem.

While the shape identification algorithm, as an MLESAC-based algorithm, is robust to erroneous data samples, it is susceptible to missing data. For the RGB-D sensor point-clouds, objects were successfully identified except when a relatively large amount of data was missing. When the point-cloud was non-continuous and sparse (Dr. Brown's bottle brush), the identification failed. In such a case, additional data regarding the object is required, either a theoretical model or data from additional sensors, e.g., sonar sensors [39]. When the identification process succeeded, the shape primitive database was sufficiently rich, so there was no need for forming many clusters. Two objects were identified as familiar objects and two as constructed from two shape primitives. In the adapted graspability maps, high-quality grasps were found in many locations about the object's surface and volume for the Rolodex jumbo pencil cup, Elmer's washable glue, and Kong duck dog toy. For the Kong AirDog squeakair ball, the point-cloud of the box part is very thin; therefore, high-quality grasps were found only about its spherical region. As for shape identification, resolving problems caused by missing data requires theoretical models or additional sensing modules.

As indicated by the generated graspability map, the grasps tested with the physical system were all successful. The re-orientation of the glue by the gripper's fingers when its main axes were not aligned with the fingers occurred because, for map generation, grasp forces are computed assuming point contacts and rigid objects. In the physical implementation the contacts are in fact surface contacts, and the object has some compliance (the glue inside is in liquid form).

Objects can typically be grasped in several different ways.
Research on human grasping has shown that the chosen grasp depends on the object, the environment, the task, and the condition and capabilities of the subject performing the operation [40–42]. Object properties include visible properties, e.g., size, texture, and surface contour, and estimated properties based on a-priori knowledge, e.g., weight and fragility. Environment properties include social and physical features, along with the spatial positioning of the object. The amalgamation of the properties of an object and agent
within an environment determines the object's affordances [43]. Experimental findings indicate that a single object motor representation codes all the object affordances, and that it is involved in implementing reach-to-grasp motion [44]. In line with these insights, graspability maps offer an object-centered representation of knowledge regarding grasp configuration quality for the object, agent, and the task at hand.

Since generation of full-scale graspability maps is time-consuming, the developed method of map adaptation for familiar and unknown objects greatly enhances the usability of graspability maps. Facilitating adaptation to familiar objects of different sizes is important both for gripper design analysis and for the use of graspability maps for grasp planning in unstructured environments, e.g., for pepper picking, where sizes vary even within a cultivar [10]. Graspability map adaptation based on shape primitives for unknown objects facilitates at least partial utilization of the a-priori knowledge regarding grasps for each identified shape primitive. This capability is important for utilization of graspability maps for grasp planning in unstructured environments, in which the robot may encounter unknown objects, or in which complete object recognition may fail due to partial obstructions.

The graspability maps in the full-scale map database are computed a-priori for a set of primitive object shapes. This set is chosen based on the anticipated features of the objects for which grasp planning will be required during run-time. Unknown objects may require additional shape primitives for correct representation of grasping possibilities. In such a case, full-scale maps for additional shape primitives can be added to the database. The process of adding full-scale maps is time-consuming and cannot be done during time-critical operations.

In dense environments, object segmentation can be very challenging.
In such cases, grasp planning without the need for object, or even primitive shape, recognition can make the system very robust [15]. However, in such a case, grasp planning is not adapted to the object type and to the tasks related to it. Such adaptation is facilitated by the use of graspability maps. Therefore, when the task and object type are important for grasp pose selection, using graspability maps for grasp planning has an advantage over grasp planning without object or primitive recognition.

In the current work, the map adaptation algorithm was tested using a single two-fingered jaw gripper for both the FS map
database generation and for map adaptation. The Combined scaling algorithm takes gripper dimensions into account. Therefore, using this algorithm, it is also possible to use a gripper with different dimensions for map adaptation. Yet, in such a case, all maps (even maps of known objects) should be scaled based on the dimensions of the new gripper.

The grasp quality computation method employed in the current work is based on point-cloud object representations, i.e., on visual information. Therefore, grasp quality is assessed assuming the object's center of gravity is located at its geometric center. For dealing with objects for which the center of gravity is located elsewhere, additional information regarding the location of the center of mass is needed. Such information can be embedded in the map generation process and can be available for run-time map adaptation, yet determination of the center of gravity during run-time relies on prior knowledge and on object recognition.

6. Conclusions

The current work presents a novel approach for efficient construction of graspability maps for known, familiar, and unknown objects. We demonstrate that the method significantly reduces computation time with respect to full-scale map generation and that the adapted maps incur only a small reduction in map quality. The reduction in quality is affected by object curvature: there is a larger reduction in grasp volume rate for size variants with larger curvatures. The method can be used for computing grasps during run-time and for computing a set of graspability maps that can serve as input for further analysis, e.g., for gripper design analysis or for computing grasp generation models. Using graspability maps offers integration of prior knowledge and performance guarantees with fast grasp generation, facilitating grasp planning suitable for the affordance of the object at hand.
Acknowledgments

Research supported by the Ben-Gurion Paul Ivanier Center for Production Management, Israel, and by the Helmsley Charitable Trust, USA, through the Agricultural, Biological and Cognitive Robotics Center at Ben-Gurion University of the Negev, Israel.

References

[1] M. Jeannerod, Intersegmental coordination during reaching at natural visual objects, in: J. Long, A. Baddeley (Eds.), Attention and Performance IX, vol. 9, 1981, pp. 153–168.
[2] R. Suárez, M. Roa, J. Cornella, Grasp Quality Measures, Institut d'Organització i Control de Sistemes Industrials, 2006.
[3] M.A. Roa, S. Raúl, Grasp quality measures: review and performance, Auton. Robots 38 (2015) 65–88.
[4] J. Bohg, A. Morales, T. Asfour, D. Kragic, Data-driven grasp synthesis - a survey, IEEE Trans. Robot. 30 (2) (2014) 289–309.
[5] Y. Li, J.L. Fu, N. Pollard, Data-driven grasp synthesis using shape matching and task-based pruning, IEEE Trans. Vis. Comput. Graphics 13 (4) (2007) 732–747.
[6] N.S. Pollard, A. Wolf, Grasp synthesis from example: Tuning the example to a task or object, in: F. Barbagli, D. Prattichizzo, K. Salisbury (Eds.), Multi-point Interaction with Real and Virtual Objects, Springer, The Netherlands, 2005, pp. 77–90.
[7] Q. Lei, J. Meijer, A survey of unknown object grasping and our fast grasping algorithm-c shape grasping, in: International Conference on Control, Automation and Robotics, 2017.
[8] M.A. Roa, K. Hertkorn, F. Zacharias, C. Borst, G. Hirzinger, Graspability map: A tool for evaluating grasp capabilities, in: IEEE/RSJ International Conference on Intelligent Robots and Systems, 2011, pp. 1768–1774.
[9] D. Eizicovits, S. Berman, Efficient sensory-grounded grasp pose quality mapping for gripper design and online grasp planning, J. Robot. Auton. Syst. 62 (8) (2014) 1208–1219.
[10] D. Eizicovits, B.V. Tuijl, S. Berman, Y. Edan, Integration of perception capabilities in gripper design using graspability maps, Biosyst. Eng. 146 (2016) 98–113.
[11] C. Goldfeder, M. Ciocarlie, H. Dang, P.K. Allen, The Columbia grasp database, in: IEEE International Conference on Robotics and Automation, 2009, pp. 1710–1716.
[12] C. Goldfeder, P.K. Allen, Data-driven grasping, J. Auton. Robots 31 (2011) 1–20.
[13] S. Ullman, Three-dimensional object recognition based on the combination of views, Cognition 67 (1) (1998) 21–44.
[14] M.C. Delfour, Shape analysis via oriented distance functions, J. Funct. Anal. 123 (1994) 129–201.
[15] D. Fischinger, A. Weiss, M. Vincze, Learning grasps with topographic features, Int. J. Robot. Res. 34 (9) (2015) 1167–1194.
[16] C. Dune, E. Marchand, C. Collowet, C. Leroux, Active rough shape estimation of unknown objects, in: IEEE International Conference on Intelligent Robots and Systems, 2008, pp. 3622–3627.
[17] S. Jain, B. Argall, Grasp detection for assistive robotic manipulation, in: IEEE International Conference on Robotics and Automation, ICRA, Stockholm, Sweden, 2016.
[18] A.T. Miller, S. Knoop, H.I. Christensen, P.K. Allen, Automatic grasp planning using shape primitives, in: IEEE International Conference on Robotics and Automation, vol. 2, 2003, pp. 1824–1829.
[19] J. Aleotti, S. Caselli, Part-based robot grasp planning from human demonstration, in: IEEE International Conference on Robotics and Automation, 2011.
[20] R. Ala, D.H. Kim, S.Y. Shin, C.-H. Kim, S.-K. Park, A 3d-grasp synthesis algorithm to grasp unknown objects based on graspable boundary and convex segments, Inform. Sci. 295 (2015) 91–106.
[21] V.D. Nguyen, Constructing force-closure grasps, Int. J. Robot. Res. 7 (3) (1988) 3–16.
[22] C. Ferrari, J. Canny, Planning optimal grasps, in: IEEE International Conference on Robotics and Automation, vol. 3, 1992, pp. 2290–2295.
[23] K.B. Shimoga, Robot grasp synthesis algorithms: A survey, Int. J. Robot. Res. 15 (3) (1996) 230–266.
[24] A. Bicchi, V. Kumar, Robotic grasping and contact: A review, in: IEEE International Conference on Robotics and Automation, vol. 1, 2000, pp.
348–353.
[25] J. Bohg, D. Kragic, Learning grasping points with shape context, Robot. Auton. Syst. 58 (4) (2010) 362–377.
[26] M.R. Cutkosky, R.D. Howe, Human grasp choice and robotic grasp analysis, in: Dextrous Robot Hands, Springer, 1990, pp. 5–31.
[27] A. Sahbani, S. El-Khoury, P. Bidaud, An overview of 3-D object grasp synthesis algorithms, J. Robot. Auton. Syst. 60 (3) (2012) 326–336.
[28] H. Dang, P.K. Allen, Semantic grasping: Planning task-specific stable robotic grasps, Auton. Robots 37 (3) (2014) 301–316.
[29] Q. Lei, G. Chen, M. Wisse, Fast grasping of unknown objects using principal component analysis, AIP Adv. 7 (2017) 095126.
[30] D. Chen, V. Dietrich, G. von Wichert, Precision grasping based on probabilistic models of unknown objects, in: IEEE International Conference on Robotics and Automation, ICRA, Stockholm, Sweden, 2016.
[31] L. Wong, Learning to select robotic grasps using vision on the Stanford artificial intelligence robot, Stanford Undergraduate Res. J. 7 (2008) 59–64.
[32] K. Huebner, S. Ruthotto, D. Kragic, Minimum volume bounding box decomposition for shape approximation in robot grasping, in: IEEE International Conference on Robotics and Automation, 2008.
[33] P.H. Torr, A. Zisserman, MLESAC: A new robust estimator with application to estimating image geometry, Comput. Vis. Image Underst. 78 (1) (2000) 138–156.
[34] G. Barequet, S. Har-Peled, Efficiently approximating the minimum-volume bounding box of a point set in three dimensions, J. Algorithms 38 (2001) 91–109.
[35] S. May, D. Droeschel, D. Holz, S. Fuchs, E. Malis, A. Nüchter, J. Hertzberg, Three-dimensional mapping with time-of-flight cameras, J. Field Robot. 26 (11–12) (2009) 934–965.
[36] M.A. Fischler, R.C. Bolles, Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography, Commun. ACM 24 (6) (1981) 381–395.
[37] S.
Garcia, Fitting primitive shapes to point clouds for robotic grasping, School of Computer Science and Communication, Royal Institute of Technology, Stockholm, Sweden, 2009.
[38] C. Rennie, R. Shome, K.E. Bekris, A.F. De Souza, A dataset for improved RGBD-based object detection and pose estimation for warehouse pick-and-place, IEEE Robot. Autom. Lett. 1 (2) (2016) 1179–1185.
[39] Y. Yovel, M.O. Franz, P. Stilz, H.-U. Schnitzler, Plant classification from bat-like echolocation signals, PLoS Comput. Biol. 4 (3) (2008) e1000032.
[40] U. Castiello, The neuroscience of grasping, Nature Rev. 6 (2005) 726–736.
[41] J. Friedman, T. Flash, Task-dependent selection of grasp kinematics and stiffness in human object manipulation, Cortex 43 (2007) 444–460.
[42] D. Koester, T. Schack, J. Westerholz, Neurophysiology of grasping actions: Evidence from ERPs, Front. Psychol. 7 (2016) 1996.
[43] J.J. Gibson, The Ecological Approach to Visual Perception, Erlbaum, Hillsdale, NJ, 1986.
[44] M. Gentilucci, Object motor representation and language, Exp. Brain Res. 153 (2003) 260–265.
Danny Eizicovits received his B.Sc. in Electrical Engineering from Tel-Aviv University, Israel, and his M.Sc. in Industrial Engineering and Management and Ph.D. in Industrial Engineering from Ben-Gurion University of the Negev, Beer-Sheva, Israel. Since 2016, he serves as the CTO of Young Engineers LTD. His research interests include robotics and automation, system engineering, artificial intelligence, computer vision, and human motor control.
Sigal Berman is an associate professor in the Department of Industrial Engineering and Management at Ben-Gurion University of the Negev. She received a B.Sc. in Electrical and Computer Engineering from the Technion, and an M.Sc. in Electrical and Computer Engineering and a Ph.D. in Industrial Engineering, both from Ben-Gurion University of the Negev. Sigal leads the Telerobotics laboratory, where robotic motion control, learning, and interaction methodologies are developed and assessed. Her research interests include robotics, manipulation, and human motor control.