Optik 125 (2014) 6106–6112
Extended gravitational pose estimation

Peng Chen*, Guang-Da Hu, Jiarui Cui

School of Automation and Electrical Engineering, University of Science and Technology Beijing, Beijing 100083, China
Article history: Received 1 November 2013; Accepted 3 June 2014

Keywords: Pose estimation; Correspondenceless; Gravitational field
Abstract

The model-to-image registration problem is the problem of determining the position and orientation (the pose) of a three-dimensional object with respect to a camera coordinate system. When no additional information is available to constrain the pose of the object or the correspondence of object features to image features, the problem is also known as the simultaneous pose and correspondence problem, or the correspondenceless pose estimation problem. In this paper, we present a new algorithm, called extended gravitational pose estimation (EGPE), for determining the pose and correspondence simultaneously. The algorithm is based on the gravitational pose estimation (GPE) algorithm. We first revise the original GPE so that it can deal with false image points. For problems with both occluded object points and false image points, we apply a single-link agglomerative clustering algorithm to pick out occluded object points once a local minimum has been reached, and then apply the revised GPE again on the clustering result to update the rotation and translation of the object model. EGPE has been verified on both synthetic and real images. Empirical results show that EGPE is faster, more stable and more reliable than most current algorithms, and can be used in real applications.
1. Introduction

The vision-based pose estimation problem is to determine the position and orientation of a camera relative to a target from a set of n feature points expressed in object coordinates and their 2D projections expressed in camera coordinates. It is one of the key problems in object recognition, visual navigation, robot localization, augmented reality and other areas. When the correspondence between the 3D object feature points and the 2D image feature points is given, the problem is known as the perspective-n-point (PnP) problem. Existing methods for solving it can be divided into three categories: non-iterative, iterative and globally optimal algorithms. Non-iterative algorithms apply linear methods to obtain algebraic solutions. Most research has focused on the P3P, P4P and P5P problems [1,2], since the PnP problem is actually a classical Direct Linear Transformation (DLT) problem and can be solved linearly when n > 5 [3]. The most popular algorithms for handling arbitrary values of n are proposed in [4,5]. In particular, the EPnP algorithm proposed in [6] is believed to be faster and more accurate than other non-iterative algorithms.
As for iterative algorithms, the classical approach is to formulate pose estimation as a nonlinear least-squares problem with the constraint that the rotation matrix be orthogonal. The problem can then be solved with nonlinear optimization algorithms, most typically the Levenberg–Marquardt or Gauss–Newton method [7–10]. A widely used iterative algorithm is the orthogonal iteration (OI) algorithm proposed by Lu et al. [11], which reformulates pose estimation with a new objective function that minimizes the object-space collinearity error. Compared with other iterative algorithms, the OI algorithm offers higher accuracy, speed and noise resistance. Globally optimal algorithms are more recently developed methods based on L∞-norm minimization. Since the object-space collinearity error can be expressed as a quasi-convex function, techniques from linear programming or second-order cone programming can be applied; typical algorithms are proposed in [12–14]. However, input data are often noisy in real applications: extracted features can be missing or false, and their positions may be inaccurate due to poor image quality, bad lighting conditions, partial occlusion and/or the limited precision of the acquisition and feature-extraction process. In those cases, the problem actually consists of two subproblems: pose estimation and correspondence determination. When there is no additional information available with which to constrain the pose of the object and to constrain the correspondence of object features to image features, the problem is known as the simultaneous pose and correspondence problem,
or the correspondenceless pose estimation problem. The problem is difficult because it requires the solution of two coupled problems, each easy to solve only if the other has been solved first. The classic approach to solving these coupled problems is the hypothesize-and-test approach, of which the best-known example is the RANSAC algorithm [15]. RANSAC can achieve a high probability of success, but at a very heavy computational cost. A more efficient algorithm of this type is Blind PnP, developed in [16] using a Gaussian mixture model; however, it needs some prior information on the object pose. A genetic-algorithm-based pose estimation algorithm with correspondence determination, called EvoPose, is proposed in [17]. Inspired by EvoPose, an algorithm based on differential evolution, called DePose, is proposed in [18]. The problem with both algorithms is that poor local minima may cause the search to converge to false solutions, especially when there are missing or false image points. By integrating an iterative pose estimation technique called POSIT [19] and an iterative correspondence assignment technique called SoftAssign [20] into a single iteration loop, David et al. proposed an algorithm called SoftPOSIT [21]. It is arguably one of the best algorithms for the simultaneous pose and correspondence problem; however, SoftPOSIT is not guaranteed to find a pose when the initial guess is poor. The gravitational pose estimation (GPE) algorithm, inspired by classical mechanics, is proposed in [22]. The algorithm creates a simulated gravitational field from the image and lets the object model move and rotate in that force field, starting from an initial pose. Unlike SoftPOSIT, GPE is robust, consistent and fast even when starting from a bad initial pose. Because SoftPOSIT can find the pose with great precision when it is able to converge, GPE and SoftPOSIT are integrated in [23] to improve the performance of correspondenceless pose estimation. However, the use of GPE is limited by the assumption that there are no false image feature points.

In this paper, we propose an algorithm called extended gravitational pose estimation (EGPE). The algorithm can solve the correspondenceless pose estimation problem even when there are occluded object points or false image points. We first revise the original GPE algorithm so that it can handle correspondenceless pose estimation with false image feature points. We then further improve the revised GPE algorithm to deal with the case in which there are both occluded object points and false image points. Experimental results show that our algorithm is faster, more stable and more reliable than most current correspondenceless pose estimation algorithms.

2. Problem formulation

Suppose the object model can be described by a set of feature points P_k (k = 1, ..., L). The coordinate of P_k in the object coordinate system OX_wY_wZ_w is P_k^w = [X_k^w, Y_k^w, Z_k^w]^T, while the coordinate of an image feature point in the image coordinate system O_Ixy is p_i = [x_i, y_i]^T (i = 1, ..., N). According to perspective projection, the relationship between P_k^w and p_i is
\tilde{p}_i = \begin{bmatrix} f & 0 & 0 \\ 0 & f & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} R & t \end{bmatrix} \tilde{P}_k^w,    (1)
where \tilde{p}_i and \tilde{P}_k^w are the homogeneous coordinates of p_i and P_k^w respectively, and f is the focal length. R = [R_1, R_2, R_3] is the rotation matrix: if u_i, u_j, u_k denote the three unit direction vectors of the axes X_w, Y_w, Z_w of the object coordinate system, then R_1, R_2, R_3 are the coordinates of u_i, u_j, u_k expressed in the camera coordinate system. t is the translation vector. The relationship between the object coordinate system, the camera coordinate system and the image coordinate system is shown in Fig. 1.
Fig. 1. Perspective projection model.
Ideally, P_k^w should align with the line of sight (LOS) O_c p_i; however, because of noise and inaccurate feature extraction, the position of P_k on the image plane is usually p_i^n = [x_i^n, y_i^n]^T, not p_i. Therefore, the squared distance between P_k^w and the LOS O_c p_i^n can be represented as [11]

d_{ik}^2 = \left\| (I - V_i)\,(R P_k^w + t) \right\|^2,    (2)

where P_k^c = R P_k^w + t is the coordinate of P_k in the camera coordinate system, and V_i is the LOS projection matrix

V_i = \frac{v_i v_i^T}{v_i^T v_i}, \qquad v_i = [x_i^n, y_i^n, f]^T.    (3)
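To make Eqs. (2) and (3) concrete, here is a minimal NumPy sketch. The function and variable names are our own choices for illustration, not the authors' code (their experiments were run in Matlab):

```python
import numpy as np

def los_projection_matrix(p_img, f):
    """LOS projection matrix V_i of Eq. (3) for an observed image
    point p_img = [x_i^n, y_i^n] and focal length f."""
    v = np.array([p_img[0], p_img[1], f], dtype=float)
    return np.outer(v, v) / (v @ v)

def squared_los_distance(P_w, R, t, V):
    """Squared point-to-LOS distance d_ik^2 of Eq. (2) for one
    object point P_w (3-vector) under the pose (R, t)."""
    P_c = R @ P_w + t            # camera-frame coordinates of P_k
    res = (np.eye(3) - V) @ P_c  # component perpendicular to the LOS
    return float(res @ res)
```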
According to [18], when all the object points and image points have their own corresponding points, the simultaneous pose and correspondence problem can be formulated as the minimization of the objective function

E = \frac{1}{L} \sum_{k=1}^{L} \min_i \left\| (I - V_i)\,(R P_k^w + t) \right\|^2.    (4)
To apply Eq. (4) to more complicated cases, such as when there are false image points or occluded object points, Eq. (4) is modified into the following form:

E = \frac{1}{L} \sum_{i=1}^{N} \sum_{k=1}^{L} m_{ik} \left\| (I - V_i)\,(R P_k^w + t) \right\|^2,    (5)
where m_{ik} is a weight equal to 0 or 1 for the corresponding squared distance d_{ik}^2. Given a set of object points P_k^w, k = 1, ..., L, and a set of image points p_i, i = 1, ..., N, the squared distances d_{ik}^2 can be obtained from Eq. (2), yielding a distance matrix D = [d_{ik}^2]_{N×L}. To find a zero-one assignment matrix M = [m_{ik}]_{N×L}, the following steps can be applied:

Step 1: Find an ordered pair (i, k) such that d_{ik}^2 is the minimum entry of matrix D.
Step 2: Set m_{ik} = 1, and assign the ith row and kth column of matrix D to an extremely large constant C.
Step 3: If all the entries of D equal C, return matrix M; otherwise go to Step 1.

Therefore, if all the entries of the kth column of M equal 0, the corresponding image feature of P_k is missing, so P_k is probably occluded. If all the entries of the ith row of M equal 0, image point p_i does not match any object feature, so p_i is likely a false image feature point.
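The greedy construction of M can be written compactly. The sketch below is our direct transcription of Steps 1–3, using np.inf in place of the large constant C:

```python
import numpy as np

def greedy_assignment(D):
    """Build the zero-one assignment matrix M of Section 2 from the
    N x L matrix of squared distances D, by repeatedly taking the
    smallest remaining entry and blanking out its row and column."""
    D = D.copy()
    N, L = D.shape
    M = np.zeros((N, L), dtype=int)
    for _ in range(min(N, L)):
        i, k = np.unravel_index(np.argmin(D), D.shape)
        M[i, k] = 1
        D[i, :] = np.inf  # stands in for the large constant C
        D[:, k] = np.inf
    return M
    # All-zero columns of M flag probably-occluded object points;
    # all-zero rows flag likely false image points.
```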
3. The algorithm

3.1. The revised gravitational pose estimation

To minimize Eq. (5), each d_{ik}^2 should be minimized, which means that the object point P_k^w should be as close as possible to the LOS O_c p_i^n. The basic idea of GPE is to imagine a gravitational field in which the LOSes attract the object points. The force should be proportional to the distance between object points and LOSes, so that the force decreases as d_{ik}^2 decreases and an object point does not move away after reaching the minimum of d_{ik}^2. Compared with the GPE algorithm proposed in [22], we have made some adaptations because of the introduction of the assignment matrix M. The detailed process of the revised GPE algorithm is as follows:

Step 1: Translate the object coordinate system to the object's center of mass (COM), P_c^w; the coordinate of P_k in the translated object coordinate system is \bar{P}_k^w, k = 1, ..., L.

Step 2: Given the initial guess of the rotation matrix R and translation vector t, compute the rigid transformation of \bar{P}_k^w as P_k^c = R \bar{P}_k^w + t, k = 1, ..., L, and calculate the distance matrix D and assignment matrix M.

Step 3: Calculate the value E_0 of Eq. (5); assign E_est = E_0, R_est = R_0, t_est = t_0; and set the values of the parameters δ, ε, η and τ (the local-minimum threshold, the termination threshold, the consecutive-iteration count and the maximum number of iterations, respectively).

Step 4: Calculate the force on each object point as

F_k = \sum_{i=1}^{N} m_{ik} (I - V_i) P_k^c.    (6)

Step 5: Assume every object feature point has a mass of 1, so that the total mass of the object is L. According to classical mechanics, the acceleration of the object's COM is

a_c = \frac{1}{L} \sum_{k=1}^{L} F_k.    (7)

Step 6: Calculate the torque T_c around the COM as in Eq. (8), where r_k is the vector giving the relative displacement of point k with respect to the COM:

T_c = \sum_{k=1}^{L} r_k \times F_k.    (8)

Step 7: Calculate the angular acceleration a_r as

a_r = (J_c)^{-1} T_c,    (9)

where J_c is the rotational inertia matrix of the object at the current pose. If r_k is written as r_k = [r_{xk}, r_{yk}, r_{zk}]^T, then J_c has the form

J_c = \begin{bmatrix} J_{xx} & -J_{xy} & -J_{xz} \\ -J_{yx} & J_{yy} & -J_{yz} \\ -J_{zx} & -J_{zy} & J_{zz} \end{bmatrix},    (10)

where

J_{xx} = \sum_{k=1}^{L} (r_{yk}^2 + r_{zk}^2), \quad J_{yy} = \sum_{k=1}^{L} (r_{xk}^2 + r_{zk}^2), \quad J_{zz} = \sum_{k=1}^{L} (r_{xk}^2 + r_{yk}^2),
J_{yz} = J_{zy} = \sum_{k=1}^{L} r_{yk} r_{zk}, \quad J_{xz} = J_{zx} = \sum_{k=1}^{L} r_{xk} r_{zk}, \quad J_{xy} = J_{yx} = \sum_{k=1}^{L} r_{xk} r_{yk}.    (11)

Step 8: Assign Δt = a_c and update the translation as t = t + Δt.

Step 9: To update R, rotate the columns R_1, R_2, R_3 of R around a_r by an angle of |a_r| radians, where |a_r| denotes the length of a_r.

Step 10: Calculate P_k^c = R \bar{P}_k^w + t, and update the distance matrix D, the assignment matrix M and the value E of Eq. (5).

Step 11: If E < E_est, assign E_est = E, R_est = R, t_est = t; otherwise go to Step 12.

Step 12: If E < ε, or the number of iterations has exceeded τ, terminate the algorithm and return E_est, R_est, and t_est = t_est − R P_c^w (compensating for the translation of the object frame to the COM in Step 1). In other cases, calculate ΔE = E − E_0. If ΔE > 0, assign R a random rotation matrix and go to Step 4; otherwise assign E_0 = E and go to Step 4. To avoid becoming trapped in a local minimum, if ΔE satisfies −δ < ΔE < 0 for η consecutive iterations, R is likewise assigned a random rotation matrix. Since a rotation matrix can be represented by three Euler angles ϑ, ψ, ϕ as [24]

R = \begin{bmatrix} c\psi c\varphi & c\psi s\varphi & -s\psi \\ c\varphi s\vartheta s\psi - c\vartheta s\varphi & c\vartheta c\varphi + s\vartheta s\psi s\varphi & c\psi s\vartheta \\ s\vartheta s\varphi + c\vartheta c\varphi s\psi & c\vartheta s\psi s\varphi - c\varphi s\vartheta & c\vartheta c\psi \end{bmatrix},    (12)

where cβ represents cos(β) and sβ represents sin(β), a random rotation matrix can be generated by assigning ϑ, ψ, ϕ three random values.

By introducing the assignment matrix, the revised GPE can handle the following three cases of correspondenceless pose estimation:

• The number of object points is equal to the number of image points, and there are no occluded object points or false image points.
• Some object points are occluded, but there are no false image points. In this case, the number of object points is larger than the number of image points.
• There are additional false image points, but no object point is occluded. In this case, the number of image points is larger than the number of object points.

It is worth noting that the GPE algorithm proposed in [22] can only handle the first two cases; the revised GPE algorithm is therefore an improvement over the original version.
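The force-and-torque update of Steps 4–9 can be summarized in a short NumPy sketch. This is our illustration rather than the authors' code: the force is written with a minus sign so that the residual of Eq. (6) attracts points toward their LOSes (the negative gradient of Eq. (5)), eta is an assumed damping factor (eta = 1 reproduces Steps 8 and 9 as stated), and the random restart composes elementary rotations, which is equivalent to drawing the Euler angles of Eq. (12) at random:

```python
import numpy as np

def inertia_matrix(r):
    """Rotational inertia J_c of Eqs. (10)-(11) for L unit point
    masses; r is an L x 3 array of displacements from the COM."""
    x, y, z = r[:, 0], r[:, 1], r[:, 2]
    Jxx, Jyy, Jzz = np.sum(y*y + z*z), np.sum(x*x + z*z), np.sum(x*x + y*y)
    Jxy, Jxz, Jyz = np.sum(x*y), np.sum(x*z), np.sum(y*z)
    return np.array([[ Jxx, -Jxy, -Jxz],
                     [-Jxy,  Jyy, -Jyz],
                     [-Jxz, -Jyz,  Jzz]])

def rotate_about(axis, angle):
    """Rodrigues' formula: rotation by `angle` radians about `axis`."""
    a = axis / np.linalg.norm(axis)
    K = np.array([[0, -a[2], a[1]], [a[2], 0, -a[0]], [-a[1], a[0], 0]])
    return np.eye(3) + np.sin(angle) * K + (1.0 - np.cos(angle)) * (K @ K)

def gpe_step(P_w, V, M, R, t, eta=1.0):
    """One revised-GPE update (Steps 4-9). P_w: L x 3 COM-centred
    object points; V: list of N LOS projection matrices (Eq. (3));
    M: N x L assignment matrix; eta: assumed step size."""
    L = P_w.shape[0]
    r = P_w @ R.T                      # displacements from COM, camera frame
    P_c = r + t                        # current camera-frame points
    F = np.zeros((L, 3))
    for k in range(L):                 # Eq. (6), negated so LOSes attract
        for i in np.flatnonzero(M[:, k]):
            F[k] -= (np.eye(3) - V[i]) @ P_c[k]
    a_c = F.sum(axis=0) / L                        # Eq. (7)
    T_c = np.cross(r, F).sum(axis=0)               # Eq. (8)
    a_r = np.linalg.solve(inertia_matrix(r), T_c)  # Eq. (9)
    t = t + eta * a_c                              # Step 8
    ang = np.linalg.norm(a_r)
    if ang > 1e-12:                                # Step 9
        R = rotate_about(a_r, eta * ang) @ R
    return R, t

def random_rotation(rng):
    """Random restart rotation via random Euler angles, cf. Eq. (12)."""
    a, b, c = rng.uniform(0.0, 2.0 * np.pi, 3)
    Rx = np.array([[1, 0, 0], [0, np.cos(a), -np.sin(a)], [0, np.sin(a), np.cos(a)]])
    Ry = np.array([[np.cos(b), 0, np.sin(b)], [0, 1, 0], [-np.sin(b), 0, np.cos(b)]])
    Rz = np.array([[np.cos(c), -np.sin(c), 0], [np.sin(c), np.cos(c), 0], [0, 0, 1]])
    return Rx @ Ry @ Rz
```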
3.2. The extended gravitational pose estimation algorithm

The main purpose of extending the revised GPE algorithm is to make it capable of handling the case in which there are both occluded object points and false image points. In this case, although the occluded object points cannot be seen on the image plane, they can still be matched to false image points. To handle this situation, we apply a two-step scheme once the revised GPE has become trapped in a local minimum. In the first step, a single-link agglomerative clustering algorithm [25] is employed to find possible occluded object points. In the second step, the revised GPE is applied again to pick out false image points and estimate the correct pose and correspondence.

The single-link agglomerative clustering algorithm is applied based on the following idea. Assume the current pose and correspondence are close to the true pose and correspondence; then the difference between the projection of P_k^w and its corresponding image feature p_i^n should be near the origin of the image coordinate system O_Ixy. Otherwise, especially when P_k^w is an occluded object point, the difference should be significantly far from the origin. We can therefore define a data vector

p_{ik} = p_i^n - p_k^r,    (13)

where p_k^r = [x_k^r, y_k^r]^T is the projection of P_k^w on the image plane, which can be obtained through Eq. (1). The single-link agglomerative clustering algorithm is run on the data set S = {p_{ik} : m_{ik} = 1 in the assignment matrix M}. The result is a hierarchy of clusterings [25], and the best clustering is selected by searching the hierarchy for the clusters with the largest lifetime, where the lifetime of a cluster is defined as the absolute difference between the dissimilarity level at which the cluster is formed and the dissimilarity level at which it is absorbed into a larger cluster [25].
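As an illustration of this step, the sketch below runs single-link clustering on the residual vectors of Eq. (13) with SciPy and keeps the largest cluster. Cutting the dendrogram at the widest gap between successive merge distances is our stand-in for the largest-lifetime rule of [25], and the function name is ours:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def keep_visible_points(p_diff):
    """Cluster the 2-D residual vectors p_ik of Eq. (13) by single-link
    agglomerative clustering and return a boolean mask marking the
    largest cluster C_L; points outside it are treated as occluded."""
    Z = linkage(p_diff, method='single')  # merge distances, nondecreasing
    gaps = np.diff(Z[:, 2])
    if len(gaps) == 0 or gaps.max() <= 0:
        return np.ones(len(p_diff), dtype=bool)   # a single cluster
    cut = Z[np.argmax(gaps), 2] + gaps.max() / 2  # threshold in widest gap
    labels = fcluster(Z, t=cut, criterion='distance')
    largest = np.bincount(labels).argmax()
    return labels == largest
```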
Denote the best clustering as C_best. Since the occluded object points make up only a small part of all the object points in most real applications, we can further divide C_best into two parts: the largest cluster in C_best, denoted C_L, and the other clusters in C_best. Based on the clustering result, if a data vector p_{ik} is in C_L, then object point P_k^w is matched to the image feature p_i^n; otherwise, P_k^w is identified as an occluded object point and is neglected in the next stage. Fig. 2 shows two clustering results for 10 randomly generated object points with 1 occluded object point and 1 false image point. In the first result, P_2^w and P_8^w are not clustered into C_L, so they are treated as occluded object points and neglected in the following stage. In the second result, P_1^w is picked out as an occluded object point and neglected in the following stage.

Fig. 2. Clustering results for 10 randomly generated object points. (a, b) Images of 10 randomly generated object points. (c, d) Dendrograms corresponding to the two cases.

When the occluded object points have been neglected, the revised GPE algorithm is applied again. The pose and correspondence can then be updated; meanwhile, false image features can be identified. To further improve the precision of the algorithm, the SoftPOSIT algorithm [21] is applied at the end on all the object points and image points, with the result of the second run of the revised GPE as its initial guess. The parameter β_0, which corresponds to the fuzziness of the correspondence matrix in SoftPOSIT, can be set to around 0.1 as in [23]. To sum up, the entire processing flow of the extended gravitational pose estimation algorithm is as follows.

Algorithm 1. EGPE pseudocode.
Step 1 Inputs: (a) List of L object points, P_k^w = [X_k, Y_k, Z_k]^T, 1 ≤ k ≤ L. (b) List of N image points, p_i^n = [x_i^n, y_i^n]^T, 1 ≤ i ≤ N.
Step 2 Initialize: (a) Random rotation matrix R and translation vector t. (b) Parameters δ_1, ε_1, η_1 and τ_1 for the first run of the revised GPE, and parameters δ_2, ε_2, η_2 and τ_2 for the second run. (c) Assign β_0 = 0.1 for SoftPOSIT.
Step 3 Run the revised GPE algorithm. If ΔE satisfies −δ_1 < ΔE < 0 for η_1 consecutive iterations, then: (a) Run the single-link agglomerative clustering algorithm. (b) Run the revised GPE again on the reduced object points and all the image points until E < ε_2 or the number of iterations has exceeded τ_2. (c) Run SoftPOSIT on all the object points and image points.
Step 4 If E < ε_1 or the number of iterations has exceeded τ_1, terminate the algorithm. Otherwise, assign rotation matrix R a random rotation matrix and go to Step 3.

Fig. 3. CPU time and successful rate. (a) CPU time w.r.t. #obj. pts. (b) Successful rate w.r.t. #occ. pts.

4. Experimental results
In this section, we compare EGPE with GPEsoftPOSIT [23], DePose [18] and EvoPose [17] on synthetic data, and then validate EGPE on real images.

4.1. Synthetic image experiments

A virtual perspective camera is used to generate sets of 3D-to-2D correspondences. The focal length is 35 mm, the physical size of a pixel is 12 × 12 μm², the image size is 1200 × 1600, and the principal point is at [600, 800]^T. Object points are located randomly in the box [−2, 2] × [−2, 2] × [4, 9] m³ in the camera coordinate system. Fifty different orientations are generated; for each orientation, the three Euler angles are randomly selected in [0°, 45°]. All of the algorithms are run on a PC with an Intel® i3 processor and 4 GB of memory, in the Matlab® 2012a environment.

We first compare EGPE with GPEsoftPOSIT, because the experimental results in [23] suggest that GPEsoftPOSIT outperforms both GPE and SoftPOSIT. 8, 10, 12 and 15 object points are used respectively, with the noise level fixed at 0.5 pixels. Simulations with 0, 1, 2 and 3 occluded object points were carried out for each configuration of object points. The initial Euler angles were randomly selected in the range [−90°, 90°], and the initial translation vector was randomly located in the box [−5, 5] × [−5, 5] × [−10, 10] m³ in the camera coordinate system. The criterion for identifying a local minimum, for both GPEsoftPOSIT and EGPE, was that the absolute change in the value of the objective function be less than 0.001 for 30 consecutive iterations. When the value of the objective function falls below 1 × 10⁻⁵, both algorithms terminate. The maximum number of iterations for GPEsoftPOSIT is 50,000; for EGPE, the maximum iterations are τ_1 = 3000 and τ_2 = 5000. Denoting the true rotation matrix by R_true and the true translation vector by t_true, the rotation and position errors can be evaluated as [14]
pos = \left\| R_{est}^T t_{est} - R_{true}^T t_{true} \right\|,    (14)

rot = 2 \cos^{-1}\left( 0.5 \sqrt{1 + E_{11} + E_{22} + E_{33}} \right), \qquad E = R_{est} R_{true}^T.    (15)
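For reference, Eqs. (14) and (15) translate directly into a few lines of NumPy. This helper (our naming) returns the rotation error in degrees, matching the units used in Table 1:

```python
import numpy as np

def pose_errors(R_est, t_est, R_true, t_true):
    """Position error of Eq. (14) and rotation error of Eq. (15)."""
    pos = np.linalg.norm(R_est.T @ t_est - R_true.T @ t_true)
    E = R_est @ R_true.T
    half = 0.5 * np.sqrt(max(0.0, 1.0 + np.trace(E)))  # 1 + E11 + E22 + E33
    rot = 2.0 * np.arccos(np.clip(half, 0.0, 1.0))     # clip guards round-off
    return np.degrees(rot), pos                        # degrees, metres
```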
The results of GPEsoftPOSIT and EGPE are presented in Table 1, where rot̄ is the average rotation error in degrees, pos̄ is the average position error in meters, and CPU time is in seconds. All results are obtained from 50 simulations for each setting of object points and occluded points. A result is considered successful only if the estimated correspondence coincides exactly with the predefined one, and only successful results are included in the statistics of rot̄, pos̄ and CPU time.

Table 1. Comparison of EGPE and GPEsoftPOSIT (rot̄ in degrees, pos̄ in meters, CPU in seconds).

                         GPEsoftPOSIT                        EGPE
#obj. pts  #occ. pts   %suc.  rot̄     pos̄     CPU       %suc.  rot̄     pos̄     CPU
8          0           98     0.1014  0.0248  49.23     98     0.0395  0.0048  3.51
8          1           98     0.0429  0.0058  55.20     98     0.0540  0.0072  3.99
8          2           82     0.0587  0.0086  51.57     98     0.0690  0.0078  6.26
8          3           68     0.9241  0.1512  53.21     88     0.0895  0.0149  31.14
10         0           98     0.0340  0.0039  60.20     100    0.0350  0.0045  4.79
10         1           98     0.0389  0.0054  66.99     100    0.0389  0.0047  5.78
10         2           82     0.0421  0.0051  61.50     98     0.0477  0.0058  7.15
10         3           76     0.0953  0.0168  64.47     94     0.0512  0.0064  16.49
12         0           98     0.0271  0.0034  83.18     100    0.0282  0.0035  6.34
12         1           90     0.0927  0.0210  81.98     98     0.0301  0.0041  9.31
12         2           92     0.1701  0.0241  85.53     98     0.0383  0.0048  12.09
12         3           84     0.0370  0.0046  83.13     96     0.0402  0.0055  21.34
15         0           100    0.0713  0.0174  124.07    100    0.0284  0.0033  6.91
15         1           100    0.0367  0.3104  121.89    98     0.0257  0.0032  14.94
15         2           94     0.0261  0.0031  124.20    96     0.0271  0.0036  35.49
15         3           84     0.0845  0.0266  121.83    96     0.0294  0.0035  27.29

According to the table, the successful rate of GPEsoftPOSIT drops dramatically when the number of occluded points is greater than 2, while EGPE maintains a more stable successful rate. Meanwhile, GPEsoftPOSIT needs much more CPU time
than EGPE to converge to a solution; Fig. 3 makes this observation more apparent. Fig. 4 shows the tendencies of the orientation and position errors, respectively. The expected tendencies are that the error should rise as the number of occluded points rises, and fall as the number of object points rises. In Fig. 4 the expected tendency is visible in the results of EGPE, but it is not obvious in the results of GPEsoftPOSIT. This suggests that EGPE is more stable than GPEsoftPOSIT and that its results are more reliable. Next, we compared EGPE with DePose and EvoPose, in order to verify the performance of EGPE when there are both occluded object points and false image points. The same simulation settings are applied, with 16 object points. Two experiments are performed under 11 different noise levels from 0 to 5 pixels: the first introduces one occluded object point and one false image point, while the second introduces two occluded object points and two false image points. The results of the two experiments are shown in Fig. 5. It is clear that EGPE maintains a higher and more stable successful rate than DePose and EvoPose.
Fig. 4. Rotation and position error. (a) rot̄ w.r.t. #occ. pts. (b) rot̄ w.r.t. #obj. pts. (c) pos̄ w.r.t. #occ. pts. (d) pos̄ w.r.t. #obj. pts.
Fig. 5. Comparison with DePose and EvoPose. (a) Results with 1 occluded object point and 1 false image point. (b) Results with 2 occluded object points and 2 false image points.
4.2. Real image experiments

In the above experiments, the number of object points is equal to or greater than the number of image points.
Fig. 6. Configurations for ObjectA and ObjectB. (a, b) Location of the object points and the virtual box of ObjectA. (c, d) Location of the object points and the virtual pyramid of ObjectB.
Fig. 7. Feature extraction result and estimation result of EGPE for ObjectA. (a–c) Feature extraction result. (d–f) Estimation result of EGPE.
Fig. 8. Feature extraction result and estimation result of EGPE for ObjectB. (a–c) Feature extraction result. (d–f) Estimation result of EGPE.
The following experiments evaluate the performance of EGPE when the number of image points is greater than the number of object points. A real camera with a focal length of 3.7 mm is used. The image size is 640 × 480, the physical size of each pixel is 7.12 × 7.12 μm², and the principal point is located at [325, 235]^T. Two objects, denoted ObjectA and ObjectB, are used in the experiments. For ObjectA, the object feature points in the object coordinate system are located at P_1^w(10, 146.3, 0), P_2^w(0, 136.3, 0), P_3^w(0, 0, 0), P_4^w(100, 0, 0), P_5^w(0, 0, 90), P_6^w(10, 0, 100), P_7^w(100, 0, 90), P_8^w(90, 0, 100). For ObjectB, the object points are located at P_1^w(0, 0, 0), P_2^w(0, 56, 0), P_3^w(0, 0, 110), P_4^w(0, 56, 110), P_5^w(110, 0, 110), P_6^w(110, 56, 110), P_7^w(110, 56, 0). In order to reflect the correctness of the pose and correspondence estimation, a virtual box is placed on ObjectA, whose vertices are located at A(0, 0, 0), B(0, 0, 50), C(100, 0, 50), D(100, 0, 0), E(0, 50, 0), F(0, 50, 50), G(100, 50, 50), H(100, 50, 0); a virtual pyramid is placed on ObjectB, whose vertices are located at A(0, 56, 0), B(0, 56, 110), C(110, 56, 110), D(110, 56, 0), E(54, 70, 54). Fig. 6 shows the locations of the object points and virtual objects for ObjectA and ObjectB respectively. All coordinates are in millimeters.

For ObjectA, the image feature points were extracted by hand, and three false image points were added, as shown in Fig. 7. Images from three different views are used to demonstrate the estimation results, also shown in Fig. 7. In all the images, the location of the virtual box coincides with ObjectA; the results are not affected by the false image points.
For ObjectB, the image feature points are extracted as the corners of the object contour. The centers of the two holes on ObjectB are also extracted, as false image points. Since the intensity of the projection of object point P_4^w is close to that of its vicinity, the corresponding image point cannot be extracted; P_4^w can therefore be treated as an occluded object point. Fig. 8 shows the feature extraction results and pose estimation results. The location of the virtual pyramid coincides with ObjectB: the estimation results of EGPE are not affected by the simultaneous presence of occluded object points and false image points.

5. Conclusion

In this paper, we proposed an algorithm called EGPE to handle the correspondenceless pose estimation problem when there are both occluded object points and false image points. The algorithm is composed of two main parts. The first part is the revised GPE algorithm; the second part is employed only when the first part has become trapped in a local minimum, and itself has two steps: first, occluded object points are picked out by single-link agglomerative clustering, and second, the rotation and translation of the object model are updated by running the revised GPE again on the clustering result. The SoftPOSIT algorithm is appended to the end of the second step to improve the estimation precision. Compared with GPEsoftPOSIT, DePose and EvoPose, the proposed algorithm is faster, more stable
and more reliable. The real-image experiments also show that EGPE can be applied efficiently to images acquired with a common digital camera. In the future, we plan to embed the proposed algorithm into a rendezvous and docking system to further test its performance when video sequence processing is involved.

References

[1] F.C. Wu, Z.Y. Hu, A study on the P5P problem, J. Softw. 12 (5) (2001) 768–775.
[2] Z. Zhang, C. Sun, P. Wang, Two-step pose estimation method based on five reference points, Chin. Opt. Lett. 10 (7) (2012) 071501-1–071501-5.
[3] F.C. Wu, Z.Y. Hu, A linear method for the PnP problem, J. Softw. 14 (3) (2003) 683–688.
[4] L. Quan, Z. Lan, Linear N-point camera pose determination, IEEE Trans. Pattern Anal. Mach. Intell. 21 (7) (1999) 774–780.
[5] A. Ansar, K. Daniilidis, Linear pose estimation from points or lines, IEEE Trans. Pattern Anal. Mach. Intell. 24 (5) (2003) 578–589.
[6] V. Lepetit, F. Moreno-Noguer, P. Fua, EPnP: an accurate O(n) solution to the PnP problem, Int. J. Comput. Vis. 81 (2009) 155–166.
[7] M.L. Liu, K.H. Wong, Pose estimation using four corresponding points, Pattern Recognit. Lett. 20 (1999) 69–74.
[8] R. Zhu, Y. Lin, L. Zhang, New algorithm of solving for ranges during final approach of spacecraft rendezvous, J. Beijing Univ. Aeronaut. Astronaut. 32 (7) (2006) 764–768.
[9] R. Zhu, Y. Lin, Relative attitude estimation and control schemes for the final approach phase of spacecraft rendezvous, J. Beijing Univ. Aeronaut. Astronaut. 33 (5) (2007) 544–548.
[10] D. Dai, X. Wang, C. Hu, Camera calibration and attitude measurement technology based on astronomical observation, Acta Opt. Sin. 32 (3) (2012) 0312005-1–0312005-5.
[11] C. Lu, G. Hager, E. Mjolsness, Fast and globally convergent pose estimation from video images, IEEE Trans. Pattern Anal. Mach. Intell. 22 (6) (2000) 610–622.
[12] F. Kahl, D. Henrion, Globally optimal estimates for geometric reconstruction problems, Int. J. Comput. Vis. 74 (2007) 3–15.
[13] G. Schweighofer, A. Pinz, Globally optimal O(n) solution to the PnP problem for general camera models, in: Proceedings of the 19th British Machine Vision Conference, 2008.
[14] H. Hmam, J. Kim, Optimal non-iterative pose estimation via convex relaxation, Image Vis. Comput. 28 (2010) 1515–1523.
[15] M.A. Fischler, R.C. Bolles, Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography, Commun. ACM 24 (6) (1981) 381–395.
[16] F. Moreno-Noguer, V. Lepetit, P. Fua, Pose priors for simultaneously solving alignment and correspondence, in: Proceedings of the 10th European Conference on Computer Vision, Part II, 2008, pp. 408–415.
[17] R. Claudio, M. Mohamed, J.C. Diaz, EvoPose: a model-based pose estimation algorithm with correspondence determination, in: Proceedings of the IEEE International Conference on Mechatronics and Automation, 2005, pp. 1551–1556.
[18] J. Xia, X. Xu, J. Xiong, Simultaneous pose and correspondence determination using differential evolution, in: Proceedings of the Eighth International Conference on Natural Computation, 2012, pp. 703–707.
[19] D. DeMenthon, L. Davis, Model-based object pose in 25 lines of code, Int. J. Comput. Vis. 15 (1995) 123–141.
[20] S. Gold, A. Rangarajan, C.-P. Lu, New algorithms for 2D and 3D point matching: pose estimation and correspondence, Pattern Recognit. 31 (8) (1998) 1019–1031.
[21] P. David, D. DeMenthon, R. Duraiswami, SoftPOSIT: simultaneous pose and correspondence determination, Int. J. Comput. Vis. 59 (3) (2004) 259–284.
[22] H.F. Ugurdag, S. Goren, F. Canbay, Correspondenceless pose estimation from a single 2D image using classical mechanics, in: Proceedings of the IEEE International Symposium on Computer and Information Sciences, Istanbul, 2008, pp. 1–6.
[23] H.F. Ugurdag, S. Goren, F. Canbay, Gravitational pose estimation, Comput. Electr. Eng. 36 (2010) 1165–1180.
[24] M.D. Shuster, A survey of attitude representations, J. Astronaut. Sci. 41 (4) (1993) 439–517.
[25] S. Theodoridis, K. Koutroumbas, Pattern Recognition, Academic Press, California, 2009, pp. 653–697.