Caging a novel object using multi-task learning method


Neurocomputing 351 (2019) 146–155


Jianhua Su a,∗, Bin Chen b, Hong Qiao a, Zhi-yong Liu a

a State Key Laboratory of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Sciences, 100190 Beijing, PR China
b University of Chinese Academy of Sciences, 100190 Beijing, PR China

Article history: Received 23 July 2018; Revised 6 November 2018; Accepted 27 March 2019; Available online 19 April 2019. Communicated by Bo Shen.

Keywords: Multi-task learning; Grasping; Kernel regression

Abstract

Caging grasps provide a way to manipulate an object without full immobilization and enable dealing with pose uncertainties of the object. Most previous works have constructed caging sets using the geometric model of the object. This work presents a learning-based method for caging a novel object using only its image. A caging set is first defined using the constrained region, and a mapping from the image feature to the caging set is then constructed with a kernel regression function. To avoid collecting a large number of samples, a multi-task learning method is developed to build the regression function, in which several different caging tasks are trained with a joint model. In order to transfer the caging experience to a new caging task rapidly, shape similarity is used for caging knowledge transfer. Thus, given only the shape context of a novel object, the learner is able to accurately predict the caging set through zero-shot learning. The proposed method can be applied to the caging of a target object in a complex real-world environment, for which the user only needs the shape feature of the object, without the need for a geometric model. Several experiments prove the validity of our method.

✩ This work is supported in part by the NSFC-Zhejiang Joint Fund for the Integration of Industrialization and Informatization under Grant U1509212, by the Beijing Natural Science Foundation under Grant 4182068, by the NSFC under Grant 91848109, and by the Science and Technology on Space Intelligent Control Laboratory under Grant HTKJ2019KL502013.
∗ Corresponding author. E-mail addresses: [email protected] (J. Su), [email protected] (B. Chen), [email protected] (H. Qiao), [email protected] (Z.-y. Liu).

1. Introduction

In the field of robotic grasping, many studies have focused on the computation of grasps under specific conditions, such as "force closure" or "form closure," that prevent the motion of the grasped object. A survey of form-closure and force-closure grasps is given in [1]. Compared to previous works motivated by precision manipulation, which require precise pose estimation, caging enables dealing with the pose uncertainties of the object.

Kuperberg [2] initially presented a definition of the cage, regarding the caging set as the set of pin placements that prevents an object from moving arbitrarily far from a given position. Kriegman [3] defined the cage as the set of gripper configurations that might not immobilize the manipulated object but prevent it from escaping to infinity. Vahedi and van der Stappen [4] developed algorithms for computing all possible placements of two and three pins that cage a given closed polygon; when caging a polygon with three pins, they computed all placements of the third pin once the placements of the two base pins were given. Diankov et al. [5] regarded a cage as the condition where a robot gripper constrains the configuration space of an object to a finite volume, and planned the grasps of a multi-finger gripper by computing cages specific to the objects. Rodriguez et al. [6] illustrated that some cages are suited to grasping an object, that is, there exists a set of gripper configurations from which the object can never escape the reach of the pins; they also discussed the relationship between grasps and cages of a planar object with two or three pins. Wan et al. [7] developed a method to manipulate planar convex objects with three or four pins in a grasping-by-caging way. They also used eigen-shapes to fit a gripper into a series of pin formations and built a mapping between rasterized grids in 2D space for caging 2D shapes [8]. Recently, Wan further developed these theoretical studies and presented two applications, multi-robot cooperative transportation [30] and point-finger grasping-from-caging [31]. In order to approximate the maximal contact surface between the gripper and the object, Zarubin et al. [9] used geodesic balls on the surface of an object to generate caging grasps with a realistic gripper simulation. Jia et al. [10] extended the caging strategy to squeeze deformable ring-like 2D objects. Bunis et al. [32] discussed the caging of polygonal objects with three-finger hands that maintain an equilateral triangle formation during the grasping process.

In a real-world application, a robot usually does not have full knowledge of the model and pose of the object to be grasped, but only has incomplete


information from sensors such as CCD cameras and tactile sensors. This makes caging significantly more challenging [11]. However, most caging-grasp methods need the geometric model of the object to construct the caging set, which is not always available. To the best of our knowledge, only a few studies have discussed caging grasps of unknown objects. The main challenge in caging an unknown object is to find the caging set using only its images.

Contrary to the analytic approach, learning-based methods place more weight on feature extraction and object classification. The resulting data are then used to retrieve grasps from some knowledge base or sample set and rank them by comparison to existing grasp experience. Such methods have become popular in the last ten years. Saxena et al. [11] used a set of synthetic images from the training set to train the grasping configuration; in their experiments, a wide variety of objects, such as wine glasses, duct tapes, and jugs, none of which were in the training set, were grasped by a two-finger gripper. In a more recent work by Lenz et al. [12], a deep network was applied for detecting grasps in an RGB-D view of a scene containing objects. In the work of Bohg and Kragic [13], grasping points were learned from labeled synthetic images in which the descriptor of the object was based on the shape context. In other works, such as Boularias et al. [14], Balaguer and Carpin [15], and Song et al. [16], image training sets were again used to learn object-grasp relations. Hyttinen et al. [17] presented a method of box approximation for the purpose of robot grasping; the algorithm included a heuristic selection based on efficient geometric calculations, as well as a trained neural network to choose the final best hypothesis. El-Khoury and Sahbani [18] classified the graspable and non-graspable parts of an object using an artificial neural network (ANN), and a force-closure grasp was then synthesized on the graspable part. Goins et al. [19] applied a Gaussian process (GP) to determine the relationships among the grasp metrics and to provide the variance in the predicted value; thus, they could determine the predicted grasp success rate prior to execution.

It is generally believed that the larger the sample, the more accurate the results. However, accurately extracting the attributes and labeling the training samples is laborious and time-consuming, especially in robotic applications. To learn cages from small samples, this paper develops a new learning method to train the caging models. First, we define the caging configurations using the constraint region formed in the caging process. Then, we develop a multi-task kernel regression method to train different caging configurations with a joint model, so that caging grasps can be built from small samples. Moreover, we transfer the caging knowledge to a new cage using the similarity between the new object and the samples.

This paper is an extension of the authors' conference paper [29], which learns the caging model using multi-task regression. In this work, we transfer the caging experience to a new caging task using shape similarity. Thus, given only the shape context of a novel object, the learner is able to accurately predict the caging set through zero-shot learning.
The contributions of this work are summarized as follows:
(a) A learning-based method is presented for caging a novel object using only its image. Computing the caging formation is a difficult and computationally intensive job, and using learning to address this difficulty is an interesting and promising idea. To the best of our knowledge, this is the first paper that formulates the caging problem in the machine learning regime.
(b) To avoid collecting a large number of samples, a multi-task learning method is developed to build the regression function, so that caging grasps can be built from small samples.


The rest of the paper is arranged as follows. Section 2 introduces the caging region and presents the online multi-task learning method. Section 3 describes the computation of the caging region for a novel object. In Section 4, a grasping system is established, based on which several experiments are conducted to illustrate the proposed strategy.

2. Online multi-task learning for caging

In this section, we introduce several concepts related to the grasping configuration, cage, caging grasp, and constraint region.

A grasping configuration [1] refers to a placement of pins in contact with the object faces such that the contact points between the pins and the object satisfy some criterion such as form or force closure.

A cage [2] is a set of pin placements from which the object cannot escape as long as the pin placements are kept inside the set. We say that a gripper cages an object if no motion can carry the object out of the obstacle formed by the pins.

A caging grasp ensures that an object will not escape from the obstacle formed by the pins. The set of caging grasps is significantly larger than the set of form-closure grasps, and caging grasps are considerably less sensitive to pin misplacements. An uncaging grasp cannot guarantee that the object always remains enclosed by the gripper.

A constraint region [20] denotes a set of grasping configurations from which an object will be pushed to a stable state under the squeezing forces of the pins. Assume a non-linear system dX/dt = f(X, F, t), where X is the object state and F is the squeezing force. For all X ∈ Ω, if there is a state-independent input F and a function g(X) satisfying
• g(X) > g(X0) when X ≠ X0, and g(X) = g(X0) when X = X0, and
• dg(X)/dt < 0,
then the object state is stable in the region Ω, which is called the "constraint region", and the function g is referred to as a constraint function.

Though the definition of the constraint region looks quite similar to that of a Lyapunov function, there are some differences between them: (a) the constraint function aims to find an input F that makes the grasping system Ẋ = f(X, F, t) approach a stable state, whereas the Lyapunov function is used to judge the stability of the nonlinear system Ẋ = f(X, u(X, t)); and (b) the reduction of the constraint function g(X) can easily be achieved by the squeezing forces of the grasping system.

We usually detect the cage with the geometric model of the object, as discussed in our previous work [20]. However, it is difficult to construct the caging set for a novel object. A learning-based method, which retrieves the cage from some knowledge base or sample set, is suitable for caging new objects. In the following, we first describe how to detect the caging region with the geometric model, and then analyze how to learn the caging region without the model.

2.1. Detection of the cage region

We denote the coordinate frame attached to the gripper by xg yg og zg, in which og is at the center of the gripper's base plane, the zg-axis is perpendicular to the gripper's palm, and the xg-axis passes through the gripper's center og and Pin-1 (Fig. 1). It is assumed that the polyhedron initially lies on the support plane, denoted by the xoy-plane, where the x-axis is parallel to the xg-axis, the y-axis is parallel to the yg-axis, and og is the projection of o. The state of the grasped object in this coordinate frame is denoted by X = (x, y, θz), where θz is the rotation angle of the polyhedron around the zg-axis. We define a grasping function by

g(X) = dx + dy    (1)

where dx = 2·max(|c1|, |c3|) and dy = 2·max(|c2|, |c4|), ci (i = 1, 2, 3, 4) is the intersection point of the projection polygon and the coordinate axes, and |ci| is the distance between the point ci and the point og.

Fig. 1. (a) Four-pin gripper coordinate frame; (b) Projection of the object.

In the configuration space (x, y, θz) shown in Fig. 2a, a caging configuration should be found that relates to the grasping configuration illustrated in Fig. 2b. Note that the graph shown in Fig. 2b corresponds to the point marked "caging configuration" inside the form-closure grasping cage set in Fig. 2a. Algorithm 1 describes the detection of the boundary of the caging set [20]. The boundary of the caging set is represented by a closed curve, that is, a gripper configuration inside the boundary is a caging grasping configuration.

Algorithm 1 Detection of the boundary of the caging set.
Input: curved surface of g(X).
1: Detect a local minimum of g(X), denoted by g(X0). If g(X0) cannot be found, jump to Quit.
2: Initialize σ = g(X0) + Δσ.
3: Slice the curved surface of g(X) using the plane g(X) = σ, and obtain the curve c(X) = σ.
4: If c(X) is closed, update σ and go to Step 3.
5: Else go to Quit.
Output: c(X) is the boundary of the caging set U(X).
Quit: U(X) is not a caging set.

Next, we aim to detect the boundary with a learning-based method. As shown in Fig. 2, a cage region is established using Eq. (1) with the geometric information of the object; then a caging grasp, which lies inside the region enclosed by the boundary, can be obtained. However, in the case of an unknown object, it is difficult to derive the caging region with the analytic method. In this paper, we let the robot learn how to cage from experience gathered during caging executions. Moreover, to avoid collecting a large volume of data, the cage regions for various objects are learned jointly.
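To make Algorithm 1 concrete, the following minimal sketch assumes the constraint function g(X) has been sampled on a regular grid over a 2-D slice of the configuration space (for example (x, θz)); the grid, the level increment d_sigma, and the use of connected-component labeling to test closedness are our own illustrative choices, not part of the original implementation. The level curve c(X) = σ is treated as closed as long as the sub-level region around the minimum does not reach the border of the sampled domain.

import numpy as np
from scipy import ndimage

def caging_boundary_level(g, d_sigma=0.05, max_steps=200):
    """Sketch of Algorithm 1: raise the slicing level sigma from the local
    minimum g(X0) until the level curve c(X) = sigma stops being closed.

    g: 2-D array of the constraint function sampled on a configuration grid.
    Returns the largest sigma whose level curve is still closed (the boundary
    of the caging set U(X)), or None if no cage exists around the minimum.
    """
    i_min = np.unravel_index(np.argmin(g), g.shape)   # local minimum g(X0)
    sigma = g[i_min] + d_sigma
    last_closed = None
    for _ in range(max_steps):
        mask = g <= sigma                              # sub-level set {g <= sigma}
        labels, _ = ndimage.label(mask)                # connected components
        region = labels == labels[i_min]               # component containing X0
        open_curve = (region[0, :].any() or region[-1, :].any()
                      or region[:, 0].any() or region[:, -1].any())
        if open_curve:                                 # curve reaches the domain border
            return last_closed                         # last sigma with a closed c(X)
        last_closed = sigma
        sigma += d_sigma                               # update sigma and slice again (Steps 3-4)
    return last_closed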

2.2. Multi-task kernel regression (MTKR) for cages

We use the boundary of the constraint region to describe the cages, and define the mapping from the shape of the object to the cage region by the regression function

y_i = g(x_i) + ε_i,  i = 1, 2, . . . , N    (2)

where x_i denotes the feature of the caged object and y_i is the caging region, g is the regression function to be estimated, ε_i denotes independent and identically distributed zero-mean noise, and N is the total number of samples.

Kernel regression [21,22,28] provides a mechanism for computing point-wise estimates of the function with minimal assumptions about the global signal or noise models. Assuming that the second derivative of the regression function exists, the local polynomial expansion of g can be written as [27,28,32]

y(x_i) = β0 + β1 (x_i − x) + β2 (x_i − x)^T (x_i − x)    (3)

where x_i and x are m_x × 2 matrices and y is an m_y × 2 matrix. Eq. (3) suggests that estimating the parameter matrices (β0, β1, β2) creates the desired regression function from the data. Note that β0 is a matrix of size m_y × 2, β1 is a matrix of size m_y × m_x, and β2 is a matrix of size m_y × 2.

However, the regression problem often suffers from an insufficient training dataset. Multi-task learning [23,26,27] aims to learn multiple related but different tasks jointly. It is assumed that the parameters of Eq. (3) can be represented by a linear combination of shared latent model components from a knowledge repository. That is, each parameter matrix β, of size m_y × N·m′_x with m′_x = (2 + m_x + 2), can be represented as a linear combination of the columns of L according to the sparse coefficients s. Thus, we obtain β = sL, where L is the shared basis of the parameter matrices, of size m_y × m′_x, with each column representing a latent task and m_y the dimension of the parameters of the regression function. S is a matrix of size M·m_y × 2 containing the weights of the linear combination for each task; the predictor β^(k) for task k is given by s^(k)L, where s^(k) is the k'th sub-matrix of S.

It is assumed that there are M tasks and that {(x_i^(k), y_i^(k)) : i = 1, 2, . . . , N} is the training set for each task k = 1, 2, . . . , M. We optimize Eq. (3) to minimize the predictive loss over all cages sharing the structure by the following objective function:

min_{L,S}  Σ_{k=1}^{M} Σ_{i=1}^{N} (y_i^(k) − s^(k) L X_i^(k))^T (y_i^(k) − s^(k) L X_i^(k)) · K_H(x_i^(k) − x^(k)) + μ‖s‖_1 + λ‖L‖_1    (4)

with

K_H(x_i^(k) − x^(k)) = (1/H) exp( −(x_i^(k) − x^(k))^T (x_i^(k) − x^(k)) / (2Hσ^2) ).

Fig. 2. (a) Constraint region of an object, where the red line is the boundary of the constraint region; (b) Cage region related to a point inside the set. Note that the four red circles are the four pins.


where (x_i^(k), y_i^(k)) is the i-th labeled training sample for caging task k, ‖·‖_1 is the ℓ1 norm, and μ and λ are two regularization parameters. The shape of the kernel weights is determined by the predefined radial basis kernel K, which penalizes distance away from the local position, and the bandwidth H controls the strength of this penalty. The cost function in Eq. (4) is convex in L for fixed S and convex in S for fixed L. We adopt an alternating optimization strategy that converges to a local minimum. For a fixed L, the optimization decomposes into individual problems for the s^(k):

s^(k) = arg min_s  Σ_{i=1}^{N} (y_i^(k) − s^(k) L X_i^(k))^T (y_i^(k) − s^(k) L X_i^(k)) · K_H(x_i^(k) − x^(k)) + μ‖s‖_1    (5)

where

X_i^(k) = [ I_{2×2} ;  (x^(k) − x_i^(k)) ;  (x^(k) − x_i^(k))^T (x^(k) − x_i^(k)) ].

Next, the following matrices are defined:

X^(k) = [X_1^(k), X_2^(k), . . . , X_N^(k)],   Y^(k) = [y_1^(k), y_2^(k), . . . , y_N^(k)],
W^(k) = diag( K_H(x_1^(k) − x^(k)), . . . , K_H(x_N^(k) − x^(k)) ).

Equating the gradient of Eq. (5) to zero gives

s^(k) = (Y^(k) W^(k)T X^(k)T L^T + μ)(L X^(k) W^(k)T X^(k)T L^T)^{−1}    (6)

For a fixed S, the optimization problem reduces to the following:

min_L  Σ_{k=1}^{M} Σ_{i=1}^{N} (y_i^(k) − s^(k) L X_i^(k))^T (y_i^(k) − s^(k) L X_i^(k)) · K_H(x_i^(k) − x^(k)) + λ‖L‖_1    (7)

This problem is convex in L and has a closed-form solution for the squared loss in Eq. (7). Equating the gradient of Eq. (7) to zero gives

L = (S^T S)^{−1} (−S^T Y W X^T + λ)(X W X^T)^{−1}    (8)

where

X = [X^(1), X^(2), . . . , X^(M)],   S = [s^(1), s^(2), . . . , s^(M)]^T,
W = diag(W^(1), W^(2), . . . , W^(M)),   Y = diag(Y^(1), Y^(2), . . . , Y^(M)).

Borrowing the idea of the GO-MTL method [23], we estimate the basis L and the coefficient matrix S by the MTKR algorithm (Algorithm 2). Once the parameters β are obtained, the regression function (3) can be used to compute the boundary of the caging region from the shape of the caged object.

Algorithm 2 MTKR algorithm.
Input: M: number of tasks; k: number of latent tasks.
Output: task predictor matrices L, S, and β.
1: Learn β^(k) for each task using only its own data.
2: Let W be the matrix containing (β^(1), . . . , β^(M)) as columns.
3: Compute the top-k singular vectors by singular value decomposition, i.e., W = UΣV^T.
4: Initialize L to the first m_y rows of U.
5: for k = 1 to M do
       s^(k) = (Y^(k) W^(k)T X^(k)T L^T + μ)(L X^(k) W^(k)T X^(k)T L^T)^{−1}
   end for
6: Construct the matrices S = [s^(1), s^(2), . . . , s^(M)]^T, X = [X^(1), X^(2), . . . , X^(M)], W = diag(W^(1), W^(2), . . . , W^(M)), Y = diag(Y^(1), Y^(2), . . . , Y^(M)).
7: Fix S and solve L with Eq. (8): L = (S^T S)^{−1}(−S^T Y W X^T + λ)(X W X^T)^{−1}.
Return: L, S, and β = SL.
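The following sketch illustrates the overall structure of Algorithm 2 in Python rather than in the authors' MATLAB implementation. It keeps the per-task locally weighted fit, the SVD initialization of the shared basis, and the alternating updates of S and L, but replaces the ℓ1 penalties of Eq. (4) with plain ridge/least-squares steps and flattens each task's parameter matrix into a column vector, so the matrix layout differs slightly from the β = sL notation above; all function and variable names are illustrative assumptions.

import numpy as np

def weighted_ridge(X, Y, w, lam=1e-3):
    """Per-task locally weighted ridge fit: Y ~ B X with kernel weights w (length N)."""
    Xw = X * w                                        # scale each sample column by K_H
    return Y @ Xw.T @ np.linalg.inv(X @ Xw.T + lam * np.eye(X.shape[0]))

def mtkr_sketch(tasks, n_latent=2, n_iter=10, lam=1e-3):
    """Simplified sketch of Algorithm 2 (MTKR).

    `tasks` is a list of (X, Y, w): X is (d_x, N) regressors, Y is (d_y, N)
    targets, and w is the length-N vector of kernel weights K_H(x_i - x).
    """
    # Steps 1-2: learn beta^(k) per task and stack them as (flattened) columns.
    betas = [weighted_ridge(X, Y, w, lam) for X, Y, w in tasks]
    B = np.stack([b.ravel() for b in betas], axis=1)          # (d_y*d_x, M)
    # Steps 3-4: initialise the shared basis L from the top singular vectors.
    U, _, _ = np.linalg.svd(B, full_matrices=False)
    L = U[:, :n_latent]                                        # latent components
    # Steps 5-7: alternate between the task coefficients S and the basis L.
    for _ in range(n_iter):
        S = np.linalg.lstsq(L, B, rcond=None)[0]               # fix L, solve S
        L = np.linalg.lstsq(S.T, B.T, rcond=None)[0].T         # fix S, solve L
    return L, S, L @ S                                          # column k is the flattened beta^(k)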





3. Caging a new object using previous knowledge

Algorithm 2 requires significant training data to detect the caging region for each new cage before the learner can solve it. Knowledge transfer between the tasks can improve the performance of the learned models. We eliminate this restriction by incorporating caging descriptors into learning, enabling zero-shot transfer to new tasks. In grasping, it is generally believed that similar objects can be grasped in a similar way [24]. The caging experience can therefore be transferred to the new caging task using the direct similarity-based knowledge transfer method, as follows:

s^(new) = Σ_{i=1}^{n} D_s(B_new, B_i) s^(i) / Σ_{i=1}^{n} D_s(B_new, B_i)    (9)

where D_s(B_new, B_i) denotes the shape context distance between the new shape B_new and the shape B_i belonging to the training set, and n is the total number of samples.

We introduce vector sets to calculate the similarity of two shapes. Assuming that p_1 is the origin point, we connect p_1 and the other points to build vectors, obtaining the vector set P_vector = {p_12, p_13, . . . , p_1nsamp} for the shape P, where nsamp is the maximum number of points and p_nsamp is the last point of shape P, and Q_vector = {q_12, q_13, . . . , q_1nsamp} for the sample shape Q, where q_nsamp is its last point. We define the similarity of the vectors using the dot product, and calculate the similarity of the target object and the others using the following equation:

s_v = (1 / (nsamp − 1)) | Σ_{i=2}^{nsamp} (p_1i · q_1i) / (‖p_1i‖ · ‖q_1i‖) |    (10)

Using the Euclidean distance between the matching points and the cosine similarity between the matching vectors, we define the similarity function as

S_v = (1/2)(1 + s_v − d_p / d_thd) − n_left / (3 · nsamp)    (11)

where n_left is the number of wrongly matched points, d_thd is a preset threshold, and d_p = D_s(B_i, B_j) denotes the shape context distance between the object P and the object Q. More details of the measurement of the shape context distance can be found in Ref. [25]. When the similarities between the new object and the samples are obtained, we predict the coefficient vector s^(new) and the parameters β^(new) of the new caging function by

β^(new) = s^(new) L    (12)

Once the parameters β for caging the new object are obtained, the regression function (3) can be used to compute the boundary of the caging region from the shape of the caged object.

The total time for computing the caging region of a new object includes two parts: (a) the computation of the basis L and the matrix S, in which the runtime is dominated by the dimension of the sample data X and Y for each shape. We performed the simulation using MATLAB on a personal computer with a 3.6-GHz CPU and 4.0 GB RAM; the search space of the algorithm is determined by the discretization of the contour of the cage region, and in total it takes about 4.8 ms to compute L and S from the 16 sample data. (b) The measurement of the shape context distance between the new shape and the samples, in which the runtime is dominated by the number of sample points per shape; in our implementation on the same computer, the computation for a shape with 30 sample points takes roughly 0.98 s.
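As a worked illustration of Eqs. (9)-(12), the short sketch below computes the cosine term s_v of Eq. (10), the combined similarity S_v of Eq. (11), and a weighted combination of the stored coefficient vectors with the shared basis. The contour arrays, the choice of weights passed to the combination, and the matrix layout (which follows the earlier Python sketch rather than the β = sL notation) are illustrative assumptions, not the authors' implementation.

import numpy as np

def vector_similarity(P, Q):
    """Cosine term s_v of Eq. (10) between the vector sets built from the
    first contour point; P and Q are (nsamp, 2) arrays of shape points."""
    vp, vq = P[1:] - P[0], Q[1:] - Q[0]            # vectors p_1i and q_1i
    cos = np.sum(vp * vq, axis=1) / (np.linalg.norm(vp, axis=1)
                                     * np.linalg.norm(vq, axis=1))
    return abs(cos.sum()) / (len(P) - 1)

def shape_similarity(s_v, d_p, n_left, nsamp, d_thd):
    """Combined similarity S_v of Eq. (11): cosine term s_v, shape context
    distance d_p, preset threshold d_thd, and n_left wrongly matched points."""
    return 0.5 * (1.0 + s_v - d_p / d_thd) - n_left / (3.0 * nsamp)

def transfer_parameters(weights, S, L):
    """Weighted transfer of the stored coefficient vectors (Eq. (9)) followed
    by the new parameter prediction (Eq. (12)); S holds one coefficient
    vector per training task as columns and L is the shared basis."""
    w = np.asarray(weights, dtype=float)
    s_new = (S * w).sum(axis=1) / w.sum()          # weighted average of the s^(i)
    return L @ s_new                               # flattened beta for the new object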

4. Experiments and discussion

We performed extensive tests of the presented system using a Universal Robots arm equipped with a four-pin gripper. The system has no prior knowledge of the objects present in the scene, or of how to cage them. A robotic grasping prototype with a pneumatic four-pin gripper was developed for caging a novel object on a worktable. It consists of four main parts: (a) a Universal Robots UR3 collaborative arm; (b) a pneumatic four-pin gripper, SMC MHS4-63D, mounted on the robot to cage the target object; (c) a color CCD camera; and (d) a host computer, which establishes socket communication with the UR3 robot through the TCP/IP protocol, so that the controller can send motion commands to the UR3 and the robot moves accordingly.

The whole caging procedure includes an offline learning phase and an online caging phase. In the offline learning phase, a database is available, consisting of a set of objects labeled with their caging regions. The database entries are analyzed to extract relations between specific features and the caging region using Algorithm 2. The result is a learned model that, given some features, can predict a caging region. Then, the graph of an object can be extracted from the scene. With the graph, we can compute the caging set and the caging configuration.

Fig. 3. Objects used in the physical experiments: a connector, a polyhedron, a cup, and a piston.

Table 1. Training samples.
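As an illustration of part (d) of the setup above, the snippet below sends a URScript motion command to the UR3 controller over a TCP socket; the port, IP address, joint values, and motion parameters are placeholders for a typical configuration, not values taken from the authors' system.

import socket

# Hypothetical joint-space caging configuration (radians), e.g. from Fig. 6b.
CAGE_POSE = [0.0, -1.57, 1.57, -1.57, -1.57, 0.0]

def send_caging_pose(robot_ip, pose=CAGE_POSE, port=30002):
    """Send a URScript movej command to the UR3 controller over TCP/IP.

    Port 30002 is the controller's usual script interface; adjust the port,
    acceleration a, and velocity v to the local setup.
    """
    cmd = "movej([%s], a=1.2, v=0.25)\n" % ", ".join("%.4f" % q for q in pose)
    with socket.create_connection((robot_ip, port), timeout=2.0) as s:
        s.sendall(cmd.encode("ascii"))

# Example (hypothetical controller address):
# send_caging_pose("192.168.1.10")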

4.1. Training data

Training is performed using four known objects with different shapes and colors, as shown in Fig. 3. The geometric models of the four objects are known in advance. We can compute the 2D projections of the objects and capture the related images of the objects. For caging purposes, we collect images of the objects from different views, in which the 2D geometric models are used to construct the caging regions and the images serve as the reference to match a new target object. Therefore, a total of four example objects are acquired to solve the regression problem. Table 1 shows the 4 objects and the 16 images of the objects. The caging configuration is then generated from each projection of the objects using Eq. (1) and Algorithm 1. It is noted that the red line denotes the boundary of the cage region.



Table 2 Similarity between the mouse and the samples.

Fig. 4. (a) shows an image of the mouse; (b) shows the shape context of the mouse.

We perform two experiments to evaluate various aspects of our approach to learning-based robotic caging grasps. Based on the strategies introduced in the preceding sections, the steps of the image-based caging strategy for finding the initial caging configuration of two new objects are discussed below.

4.2. Experiment to cage a mouse

Suppose the robot is to cage a mouse on the table, shown in Fig. 4a. We first extract the shape context features from the image, as shown in Fig. 4b, and then evaluate the shape distance between the image and the 16 images given in Table 1. With the help of the shape distance, we estimate the parameters β^(new) of the caging function for the image. Note that the red curves in Table 1 indicate the boundaries of the caging regions. The shape context distance between the new shape P_new and the sample shapes Q_i (i = 1, . . . , 16) is first measured using Eq. (11); then, the parameters β^(new) of the new caging function are predicted using Eq. (12). The similarity between the new shape and the sample shapes is shown in Fig. 5, and the similarity between the mouse and the 16 sample shapes is shown in Table 2.

Fig. 5. Shape distance between the new target and four samples, where (a), (b), (c) and (d) are the objects 1,2,3,4 in Table 1.

Similarity    Polyhedron    Cup     Bolt    Piston
Image 1       0.57          0.57    0.64    0.75
Image 2       0.52          0.48    0.68    0.75
Image 3       0.61          0.46    0.32    0.74
Image 4       0.39          0.55    0.58    0.79

Similarity between the mug and the samples
Image 1       0.24          0.73    0.81    0.59
Image 2       0.04          0.81    0.79    0.58
Image 3       0.48          0.70    0.05    0.57
Image 4       0.43          0.75    0.61    0.45

The similarities between the four images of different objects are represented by numbers in the range (0, 1). For example, the similarity between image 1 of the polyhedron and the current image of the mouse is 0.57. Using Algorithm 2, we obtain the parameters of the regression function and then build the boundary of the caging set. A caging grasp can then be selected from the caging set to guide the gripper to pick the object up.

Fig. 6. (a) Learned boundary of the caging set; (b) and (c) Two caging grasps belong to the caging set.



Fig. 9. (a) Real mug to be caged; (b) Shape of the mug.

Fig. 7. Caging configuration related to point a1 in Fig. 6b, where (a) shows the gripper moving to the pre-computed caging configuration and (b) shows that the mouse has been firmly grasped.

Fig. 10. Shape distance between the new target and the four samples, where (a), (b), (c) and (d) are the objects 1,2,3,4 in Table 1.

Note that the contour in Fig. 6a is the boundary of the caging set, and a gripper configuration inside the set is a caging grasp. For example, points a1 (x = 0, θz = 0) and a2 (x = −0.4, θz = −0.2) relate to the gripper configurations shown in Fig. 6b and c, which are caging grasps. The actual cage-grasping experiment is demonstrated in Fig. 7a and b. Note that the initial caging configuration, that is, the initial position of the four pins, is obtained from Fig. 6b and c.

Fig. 8. Caging configuration related to point a2 in Fig. 6c, where (a) shows the gripper moving to the pre-computed caging configuration and (b) shows that the mouse has been firmly grasped.
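Since a caging grasp is simply any gripper configuration inside the learned boundary, a point-in-polygon test suffices to check a candidate such as a1 or a2 against the contour of Fig. 6a. The sketch below assumes the boundary has been discretized into an (n, 2) array of points in the (x, θz) slice; the use of matplotlib's Path is our own convenience choice.

import numpy as np
from matplotlib.path import Path

def is_caging_grasp(boundary, candidate):
    """Return True if the candidate configuration (x, theta_z) lies inside the
    closed boundary curve of the learned caging set (an (n, 2) point array)."""
    return Path(np.asarray(boundary), closed=True).contains_point(candidate)

# Example with the points of Fig. 6: a1 = (0.0, 0.0), a2 = (-0.4, -0.2)
# is_caging_grasp(learned_boundary, (0.0, 0.0))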

4.3. Experiment to cage a mug

In the second experiment, we try to cage a mug on the table, shown in Fig. 9a and b. We initially extract the shape context



Fig. 12. Caging configuration related to point a3 in Fig. 11b, where (a) shows the gripper moving to the pre-computed caging configuration and (b) shows that the mug has been firmly grasped.


Fig. 11. (a) Learned boundary of the caging set, (b) and (c) Two caging grasps belonging to the caging set.

features from the image and then evaluate the shape distance between the image and the training set, as shown in Fig. 10. The resulting similarities are given in Table 2. With the help of the shape distance, we estimate the parameters β^(new) of the caging function for the image. The grasping configurations pertaining to points a3 and a4 are shown in Fig. 11b and c. Once we obtain the cage, we can guide the gripper to conduct the caging grasp.

The shape context distance between the new shape P_new and the sample shapes Q_i (i = 1, . . . , 16) is first measured using Eq. (11); then, the parameters β^(new) of the new caging function are predicted using Eq. (12). The similarities between the mug and the 16 sample shapes are shown in Table 2. Once the caging set is established, as shown in Fig. 11a, a caging grasp is selected from the set, which then guides the gripper to pick the object up. Note that the blue contour in Fig. 11a denotes the boundary of the caging set, and a gripper configuration inside the set is a caging grasp. For example, points a3 (x = 0, θz = 0) and a4 (x = 10, θz = −1) relate to the gripper configurations shown in Fig. 11b and c respectively, which are caging grasps.

4.4. Remark

For a vision-guided caging grasp, we need to establish the mapping from the image features of the object to the caging configuration. However, the learning dataset, as shown in Table 1, does not provide a sufficient supply of training examples. The MTKR objective in Eq. (4) aims to improve the performance of multiple caging tasks by learning them jointly. Using the four caging tasks and the training data for each task, we can estimate the basis L and the coefficient matrix S. Furthermore, it is possible to construct the caging set related to different images of the same object; note that each image of the object is captured from the caging direction of the gripper. We establish the caging set from the image of the object using the regression function (3), where the parameter vector is β^(k) = s^(k)L. Moreover, when faced with a new object, we can establish the regression function and then determine the caging set directly from the image of the target object.

5. Conclusion and future work

This work aims to cage a novel object using a learning-based method. The mapping from the image feature to the caging configuration is first established using a kernel regression function. Then, the parameter vectors of the regression function are assumed to be represented by a linear combination of shared latent model components from a knowledge repository; that is, each regression parameter matrix β can be represented as a linear combination of the columns of L according to the sparse coefficients S. Therefore, the basis L and the coefficient matrix S are estimated by the proposed multi-task learning algorithm. Using the proposed method, several different caging tasks can be trained



Fig. 13. Caging configuration related to point a4 in Fig. 11c, where (a) shows the gripper moving to the pre-computed caging configuration and (b) shows that the mug has been firmly grasped.

Fig. 15. (a) Learned boundary of the caging set for the knife; (b) the caging configuration related to point a5 in (a); (c) the caging configuration related to point a6 in (a).

Table 3 Similarity between the knife and the samples.

Fig. 14. Shape distance between the knife and the four samples.

Similarity    Polyhedron    Cup       Bolt      Piston
Image 1       0.438         0.7033    0.6404    0.6613
Image 2       0.5443        0.6914    0.6184    0.5542
Image 3       0.5208        0.6799    0.5966    0.5009
Image 4       0.453         0.5618    0.667     0.5733

with the joint model, so that the regression problem has a sufficient training dataset. Besides the mouse and the mug used in the experiments, a knife, which is a long and thin object, was also used to test the proposed method. The knife can be approximated by a convex polyhedron, as can the four samples, and the similarity between the knife and the samples is computed as given in Table 3. Therefore, it is possible to learn the caging region using MTKR. The analysis of caging the knife is given as follows. Fig. 15 shows the caging configuration; for example, points a5 (x = 0.5, θz = 0.2) and a6 (x = 0, θz = 0) relate to the gripper


configurations shown in Fig. 15b and c respectively, which are caging grasps. It should be noted that MTKR cannot be used to compute the caging sets for a thin concave object such as a hook, because the caging condition used in this work is not suitable for concave objects. In our future work, we will investigate the condition for caging concave objects and then establish training samples that cover concave objects. Furthermore, in order to transfer the caging experience to a new caging task rapidly, shape similarity is employed for caging knowledge transfer. Thus, given only the shape context of a novel object, the learner is able to accurately predict the caging set through zero-shot learning.

References

[1] A. Bicchi, V. Kumar, Robotic grasping and contact: a review, in: Proceedings of the IEEE International Conference on Robotics and Automation, 2000, pp. 348–353.
[2] W. Kuperberg, Problems on polytopes and convex sets, in: Proceedings of the DIMACS Workshop on Polytopes, 1990, pp. 584–589.
[3] D.J. Kriegman, Let them fall where they may: capture regions of curved 3D objects, in: Proceedings of the IEEE International Conference on Robotics and Automation, 1994, pp. 595–601.
[4] M. Vahedi, A.F. van der Stappen, Caging polygons with two and three fingers, Int. J. Robot. Res. 27 (11) (2008) 1308–1324.
[5] R. Diankov, S.S. Srinivasa, D. Ferguson, J. Kuffner, Manipulation planning with caging grasps, in: Proceedings of the IEEE International Conference on Humanoid Robots, 2008, pp. 285–292.
[6] A. Rodriguez, M.T. Mason, S. Ferry, From caging to grasping, Int. J. Robot. Res. 31 (7) (2012) 886–900.
[7] W.W. Wan, R. Fukui, M. Shimosaka, T. Sato, Y. Kuniyoshi, A new 'grasping by caging' solution by using eigen-shapes and space mapping, in: Proceedings of the IEEE International Conference on Robotics and Automation, 2013, pp. 1566–1573.
[8] R. Fukui, K. Kadowaki, Y. Niwa, W. Wan, M. Shimosaka, T. Sato, Design of distributed end-effectors for caging-specialized manipulator, Exp. Robot. (2013) 15–26.
[9] D. Zarubin, F.T. Pokorny, M. Toussaint, D. Kragic, Caging complex objects with geodesic balls, in: Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, 2013, pp. 2999–3006.
[10] Y.B. Jia, F. Guo, H. Lin, Grasping deformable planar objects: squeeze, stick/slip analysis, and energy-based optimalities, Int. J. Robot. Res. 33 (6) (2014) 866–897.
[11] A. Saxena, J. Driemeyer, A.Y. Ng, Robotic grasping of novel objects using vision, Int. J. Robot. Res. 27 (2) (2008) 157–173.
[12] I. Lenz, H. Lee, A. Saxena, Deep learning for detecting robotic grasps, Int. J. Robot. Res. 34 (4) (2013) 705–724.
[13] J. Bohg, A. Morales, T. Asfour, D. Kragic, Data-driven grasp synthesis - a survey, IEEE Trans. Robot. 30 (2) (2014) 289–309.
[14] A. Boularias, O. Kroemer, J. Peters, Learning robot grasping from 3-D images with Markov random fields, in: Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, 2011, pp. 1548–1553.
[15] B.D. Balaguer, S. Carpin, A learning method to determine how to approach an unknown object to be grasped, Int. J. Humanoid Robot. 8 (3) (2011) 579–606.


[16] H.O. Song, M. Fritz, D. Goehring, T. Darrell, Learning to detect visual grasp affordance, IEEE Trans. Autom. Sci. Eng. 13 (2) (2016) 798–809.
[17] E. Hyttinen, D. Kragic, R. Detry, Learning the tactile signatures of prototypical object parts for robust part-based grasping of novel objects, in: Proceedings of the IEEE International Conference on Robotics and Automation, 2015, pp. 4927–4932.
[18] S. El-Khoury, A. Sahbani, A new strategy combining empirical and analytical approaches for grasping unknown 3D objects, Robot. Auton. Syst. 58 (5) (2010) 497–507.
[19] A.K. Goins, R. Carpenter, W.K. Wong, R. Balasubramanian, Evaluating the efficacy of grasp metrics for utilization in a Gaussian process-based grasp predictor, in: Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, 2014, pp. 3353–3360.
[20] J.H. Su, H. Qiao, C.K. Liu, Y.B. Song, A.L. Yang, Grasping objects: the relationship between the cage and form-closure grasp, IEEE Robot. Autom. Mag. 24 (3) (2017) 84–96.
[21] H. Takeda, S. Farsiu, P. Milanfar, Kernel regression for image processing and reconstruction, IEEE Trans. Image Process. 16 (2) (2007) 349–366.
[22] J. Fan, T. Gasser, I. Gijbels, M. Brockmann, J. Engel, Local polynomial regression: optimal kernels and asymptotic minimax efficiency, Ann. Inst. Stat. Math. 49 (1) (1997) 79–99.
[23] A. Kumar, H. Daumé III, Learning task grouping and overlap in multi-task learning, in: Proceedings of the International Conference on Machine Learning, 2012.
[24] J. Bohg, A. Morales, T. Asfour, D. Kragic, Data-driven grasp synthesis - a survey, IEEE Trans. Robot. 30 (2) (2014) 289–309.
[25] S.G. Salve, K.C. Jondhale, Shape matching and object recognition using shape contexts, in: Proceedings of the IEEE International Conference on Computer Science and Information Technology, vol. 9, 2010, pp. 471–474.
[26] D. Isele, M. Rostami, E. Eaton, Using task features for zero-shot knowledge transfer in lifelong learning, in: Proceedings of the International Joint Conference on Artificial Intelligence, 2016, pp. 1620–1626.
[27] A.A. Liu, Y.T. Su, W.Z. Nie, M. Kankanhalli, Hierarchical clustering multi-task learning for joint human action grouping and recognition, IEEE Trans. Pattern Anal. Mach. Intell. 39 (1) (2017) 102–114.
[28] H. Takeda, S. Farsiu, P. Milanfar, Robust kernel regression for restoration and reconstruction of images from sparse noisy data, in: Proceedings of the IEEE International Conference on Image Processing, 2007, pp. 1257–1260.
[29] J.H. Su, B. Chen, Computation of caging grasps of objects using multi-task learning method, in: Proceedings of the IEEE International Conference on Advanced Robotics and Mechatronics, 2018.
[30] W.W. Wan, B.X. Shi, Z.J. Wang, R. Fukui, Multi-robot object transport via robust caging, IEEE Trans. Syst. Man Cybern. (2017) 1–11.
[31] W.W. Wan, R. Fukui, Efficient planar caging test using space mapping, IEEE Trans. Autom. Sci. Eng. 15 (1) (2018) 278–289.
[32] H.A. Bunis, E.D. Rimon, T.F. Allen, J.W. Burdick, Equilateral three-finger caging of polygonal objects using contact space search, IEEE Trans. Autom. Sci. Eng. 15 (3) (2018) 919–931.

Jianhua Su received the B.Eng. degree in electronic and information engineering from Beijing Jiaotong University, Beijing, China, in 1999, the M.Eng. degree in electronic and information engineering from Beijing Jiaotong University, Beijing, in 2004, and the Ph.D. degree from the Institute of Automation, Chinese Academy of Sciences, Beijing, China, in 2009. He is currently an Assistant Professor with the Institute of Automation, Chinese Academy of Sciences. His background is in the fields of control theory, robotics, automation, and manufacturing. His current research interests include intelligent robot systems and train control systems.