Camera relative pose estimation for visual servoing using quaternions

Camera relative pose estimation for visual servoing using quaternions

Accepted Manuscript Camera relative pose estimation for visual servoing using quaternions Kaveh Fathian, Jingfu Jin, Sung-Gil Wee, Dong-Ha Lee, Yoon-G...

5MB Sizes 0 Downloads 36 Views

Accepted Manuscript Camera relative pose estimation for visual servoing using quaternions Kaveh Fathian, Jingfu Jin, Sung-Gil Wee, Dong-Ha Lee, Yoon-Gu Kim, Nicholas R. Gans PII: DOI: Reference:

S0921-8890(17)30740-6 https://doi.org/10.1016/j.robot.2018.05.014 ROBOT 3037

To appear in:

Robotics and Autonomous Systems

Received date : 16 October 2017 Revised date : 6 April 2018 Accepted date : 28 May 2018 Please cite this article as: K. Fathian, J. Jin, S.-G. Wee, D.-H. Lee, Y.-G. Kim, N.R. Gans, Camera relative pose estimation for visual servoing using quaternions, Robotics and Autonomous Systems (2018), https://doi.org/10.1016/j.robot.2018.05.014 This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

Camera Relative Pose Estimation for Visual Servoing using Quaternions Kaveh Fathiana , Jingfu Jina , Sung-Gil Weeb , Dong-Ha Leeb , Yoon-Gu Kimb,∗, Nicholas R. Gansa,∗ a Department

of Electrical Engineering, University of Texas at Dallas, Richardson, TX, USA Convergence Research Center, DGIST, Daegu, South Korea

b Wellness

Abstract We present a novel approach to estimate the rotation and translation between two camera views from a minimum of five matched points in the images. Our approach simultaneously recovers the 3D structure of the points up to a common scale factor, and is immune to a variety of problems that plague existing methods that are based on the Euclidean homography or Essential matrix. Methods based on homography only function when feature points are coplanar in 3D space. Methods based on the Essential matrix often loose accuracy as the translation between two camera views goes to zero or when points are coplanar. By recovering the rotation and translation independently using quaternions, our algorithm eschews the shortcomings of these methods. Moreover, we do not impose any constraints on the 3D configuration of the points (such as coplanar or non-coplanar constraints). Our method is particularly well-suited for Position-Based Visual Servoing (PBVS) applications. Investigations using both simulations and experiments validate the new method . Comparisons between the proposed algorithm and the existing algorithms establish that our algorithm is robust to noise. A Matlab implementation of our algorithm is available online and free. Keywords: Five point algorithm, camera pose estimation, visual servoing, vision based estimation, relinearization

Supplementary Material Video of the simulations and experiments is available at https://youtu.be/pxMl537EEdc. The simulation code and the Sphero formation control package can be download from https://goo.gl/QH5qhw. 1. Introduction Estimating the rotation and translation (i.e., pose) between two camera views using acquired images is called camera pose estimation. Pose estimation has a variety of applications in computer/machine vision, robotics, autonomous vehicles and navigation. For example, in vision-based control and visual odometry, position and orientation of a robot ∗ Corresponding

authors Email addresses: [email protected] (Kaveh Fathian), [email protected] (Jingfu Jin), [email protected] (Sung-Gil Wee), [email protected] (Dong-Ha Lee), [email protected] (Yoon-Gu Kim), [email protected] (Nicholas R. Gans)

Preprint submitted to Journal of Robotics and Autonomous Systems

May 30, 2018

can be estimated from camera images [1, 2]. In photogrammetry, a scaled 3D model of an object can be constructed from a sequence of images taken from the object [3]. Conventionally, the camera pose estimation problem is solved using a set of matched feature points in the images, which can be detected using various methods [4, 5, 6, 7]. It has been shown that to ensure a finite number of solutions for the pose estimates, a minimum of 5 feature points are required [8, 9]. Philip showed that when 6 or more general feature points are available, the pose estimation problem is linear and produces a unique solution for the Essential matrix [10]. However, by using 5 points, the pose estimation problem becomes nonlinear, and there is generally more than one solution candidate. A point/camera configuration is called critical for an algorithm if the problem formulation becomes degenerate in that configuration (e.g., coincident or collinear point configuration) [11]. As discussed in [12], in addition to the theoretical interest in solving the pose estimation problem with 5 points, estimating the pose by using a 5-point algorithm has several practical advantages over algorithms that require more points: • A 5-point algorithm suffers from fewer cases of critical configurations. For example, in contrast to algorithms that use more than 5 points, coplanarity of feature points does not cause degeneracy in the 5-point estimation [13, 14, 15, 10]. • A 5-point algorithm is more robust to mismatched feature points, and as a hypothesis generator for RANdom SAmple Consensus (RANSAC) algorithm [16], needs fewer samples than other algorithms. This is because 5 points have higher probability to generate a hypothesis that is not contaminated by outliers. • Due to exploiting all geometric constraints of the problem, for a given noise distribution in feature point coordinates, the expected error of the estimation by using a 5-point algorithm is lower than the algorithms that require more points. 1.1. Our Contributions Previously, we formulated the quaternion formulation of the pose problem in [17], which we solved using an iterative method. The drawback of this initial work was the reliance on an accurate initial estimate for the iterative solver. In [18], we presented the QuEst algorithm, which introduced a novel way to solve the system of polynomial equations that arises from the quaternion formulation. In this work, we explore the relinearization method to solve this system. Relinearization [19] is an established approach for solving overdetermined polynomial systems, and is used to compare QuEst’s performance against an accepted method. We compare the accuracy and execution time of both methods with the state-of-the-art 5-point algorithms. The averaged estimation accuracy of both algorithms are shown to outperform all compared algorithms, with the exception of the Stewenius’ algorithm, which has a similar performance in some of the comparisons. The comparison between the relinearization and QuEst shows that the relinearization method has better estimation accuracy for translation when the parallax is small. QuEst, on the other hand, shows better estimation accuracy in other comparisons. The execution time of the relinearization approach is large, while QuEst has a much shorter execution time. 2

We further explore the use of the quaternion formulation in vision-based control of robots. This includes six degree of freedom systems such as an eye-in-hand manipulator and multiple experiments with a mobile robot. Experiments with the mobile robot compare our method with the Essential matrix recovered by the Stewenius’ algorithm and the homography matrix. The algorithms presented in this work do not need an initial estimate of the pose, and all solutions are found without a priori knowledge of the scene through solving an eigenvalue problem. Additionally, • Given the relative translation t ∈ R3 and rotation R ∈ SO(3), the Essential matrix is defined as E := T× R, where T× ∈ R3×3 is the skew-symmetric matrix composed of the translation vector elements. When paral-

lax (i.e., ratio of the translation between the views to the distance of feature points from the camera) is small, translation t, and therefore E are close to a zero. In such a case, recovering the rotation and translation form

E becomes an ill-conditioned problem. Hence, estimation accuracy of methods that are based on the Essential matrix (especially ones that require more than 5 points) can suffer. Our algorithm estimates the rotation independently of the translation, and is not affected in this case. • In our approach, relative rotation and translation are recovered directly (i.e., there is no need for a secondary decomposition of the homography or Essential matrix into the rotation and translation components). Furthermore, translation and depths of the points are recovered simultaneously and share a common scale factor. • The norm of the translation vector recovered by our algorithm approaches zero as the parallax goes to zero1 . This is in contrast to the translation recovered from an Essential matrix, which is always unit norm. This property is desirable in visual servoing applications. • Our algorithm (and the Essential matrix) are not restricted to coplanar feature points as the homography method requires at least 4 points to be coplanar.

• The proposed algorithm can be used without modification when more than 5 feature points are available. This paper presents two secondary contributions: a novel extension to the relinearization technique [19] that enables one to solve a polynomial system that does not have a unique solution, and a novel approach to delineate between multiple mathematically feasible solutions to the pose estimation. The paper organization is as follows. We provide an overview of the relevant work on camera pose estimation in Section 2. In Section 3, we introduce the notations and assumptions that are used throughout the paper. In Section 4, we give a brief introduction to quaternions, and explain the relinearization method that will be used to recover the pose. The quaternion representation of the 5-point problem is formulated in Section 5, and two algorithms are presented in Section 7 to recover the relative rotation between two camera views. This is followed by Section 8, where a method to recover the relative translation and depths is presented. In Section 9, the performance of the proposed algorithms 1

Please note that like any pose estimation algorithm, the translation vector can only be recovered up to an unknown scale factor.

3

are compared with several state-of-the-art algorithms using Monte Carlo analysis. Lastly, in Section 10, the algorithm is tested in real-time Visual Servoing experiments, and results are further compared with other algorithms. 2. Previous Work Pose estimation using images can be traced back to at least 1913 when Kruppa [20] proved that two camera views of five 3D points could be used to estimate the translation and rotation separating two camera views. However, the lack of sufficient computational resources limited further development until the landmark introduction of the 8-point algorithm [21, 9]. The 8-point algorithm used the epipolar constraint, embodied in the Essential matrix, to solve a set of linear equations that are generated from a set of 8 or more feature points. Work by [8, 9, 22, 23, 24] showed that there are generally 10 Essential matrix solutions to the pose estimation problem using 5 points. Nister [25] proposed a practical algorithm that formulated the solutions of the 5-point problem as the roots of a tenth degree polynomial. Stewenius et al. [26] proposed an algorithm that remains robust even for vanishing translation, and is more robust to noise. Li and Hartley [12] produced a version of Nister’s algorithm that has a faster execution time. Based on their work, Kukelova et al. [27] reformulated the 5-point pose estimation as a polynomial eigenvalue problem and showed that the new formulation can be solved using efficient numerical algorithms. Work done by [28, 29, 30] used iterative techniques to further improve the estimation results in terms of the robustness to noise and algorithm speed. What these previous work have in common is the use of the Essential matrix in the problem formulation. Subsequently, the estimated Essential matrix is decomposed into a rotation matrix and a scaled translation vector, possibly followed by a nonlinear optimization step [31]. In the special case where feature points are coplanar, methods based on the homography matrix can be used to estimate the pose [22], [32]. However, for a general 3D point configuration that does not contain at least 4 coplanar points, homography does not return the correct solution. In recent years, new algorithms have been developed that estimate the rotation independently of the translation, i.e., without resorting to the Essential matrix (which entangles the rotation and translation). Kalantari et al. [33] used Gröbner bases [34] to solve a formulation of the 5-point problem with independent rotation and translation estimation. Lim et al. [35] used the antipodal points in the image to estimate the rotation and translation separately. Their method requires images that are acquired by a large field of view (FOV) camera. Bazin et al. [36] decoupled the rotation and translation estimation for a catadioptric system using lines instead of feature points. Kneip et al. [37] proposed a new formulation for independent rotation and translation estimation, where they used Gröbner bases to solve the problem more efficiently compared to [33]. Due to the sensitivity of their algorithm to small parallax, the case of zero translation needs to be explicitly detected. In comparison, we show in the following sections that our algorithm is more robust to noise and is not sensitive to zero or small parallax. Work done by [38, 39] used the first-order approximated rotation matrix to estimate the rotation when the pose difference between two views is infinitesimal. In comparison, our method works for an arbitrary pose difference 4

between the two views. Hartley and Kahl [40] used iterative methods for pose estimation. In their subsequent work, Kneip and Lynen [41] proposed an iterative solution which is more stable, but in practice requires using more than 5 points to avoid local minima. To avoid such drawbacks, we do not use iterative methods in this work. 3. Notations and Assumptions Throughout the paper, we assume that the camera is calibrated, i.e., the calibration matrix is known a priori. Note that this matrix can be easily found through the existing camera calibration routines [42]. We do not consider RANSAC in this work and assume that the points are correctly matched, i.e., there are no outliers. Scalars are represented by lower case, vectors and quaternions by lowercase and bold, and matrices by upper case and bold letters. Camera coordinate frames are shown with upper case letters. All vectors are column vectors. Transpose of vector v and matrix M are shown by vT and MT , respectively. The right subscript indicates the matched feature point’s index. The left superscript indicates the camera coordinate frame for which the coordinates are given in. If matrix M is square and full rank, its inverse is shown by M−1 . Otherwise, the Moore-Penrose pseudo inverse of M is shown by M† . The element in row m and column n of matrix M is shown by [M]m,n . Similarly, [v]n indicates the n’th element of vector v. The dot product of vectors v1 and v2 is shown by dot(v1 , v2 ), and the angle between them by ang(v1 , v2 ). The determinant of square matrix M is shown by det(M ). The signum function of a real number  n! x is denoted by sgn(x). Binomial coefficients are denoted by nk := k!(n−k)! . Degree 4 monomials in variables q1 , q2 , q3 , q4 are single term polynomials q1a q2b q3c q4d , such that a + b + c + d = 4, for a, b, c, d ∈ {0, 1, 2, 3, 4},

e.g., q14 , q13 q2 , q12 q2 q4 , etc. By norm of a vector, we imply the l2 -norm. The shorthand notation Nn represents the set Nn := {1, 2, . . . , n}. The set of positive real numbers is shown by R+ . 4. Problem Background The quaternion representation of rotation and its properties is briefly discussed as it pertains to our pose estimation method. The relinearization method, which can solve an overdetermined system of homogeneous polynomials, is introduced and explained with a simple example. Relinearization will be used later to find all the quaternion solutions of the relative pose problem. 4.1. Quaternions Quaternions can be used to represent a rotation. They are numerically stable, and compared to matrix representation of rotation, fewer operations are needed to perform a rotation operation. The quaternion of a rotation θ ∈ R along

5

Figure 1.

Rotation θ along the unit length axis of rotation a = [ax , ay , az ]T can be represented with the quaternion q = [cos( θ2 ), sin( θ2 ) aT ]T .

the unit-norm axis of rotation a := [ax , ay , az ]T ∈ R3 , as shown in Fig. 1, can be represented by a 4 element vector     cos(θ/2) q1         q2  ax sin(θ/2) ∈H    (1) q :=   =   q3  ay sin(θ/2)     az sin(θ/2) q4

where q1 ∈ R and qv := [q2 , q3 , q4 ]T ∈ R3 are respectively called the scalar part and the vector part of the quaternion. If q is a rotation quaternion vector, its l2 -norm is equal to one. That is q12 + q22 + q32 + q42 = 1.

(2)

By restricting the scalar part of rotation quaternions to nonnegative numbers, i.e., q1 ≥ 0, the group of quaternion rotations is isomorphic to SO(3). To avoid introducing unnecessary complication of the quaternion algebra, we only

mention what is essential to formulate the problem in this work: if q1 , q2 , q3 , q4 are elements of a rotation quaternion, the associated rotation matrix is given by  q 2 + q22 − q32 − q42  1  R =  2(q2 q3 + q1 q4 )  2(q2 q4 − q1 q3 )

2(q2 q3 − q1 q4 )

q12 − q22 + q32 − q42 2(q3 q4 + q1 q2 )

2(q2 q4 + q1 q3 )



  2(q3 q4 − q1 q2 )  .  q12 − q22 − q32 + q42

(3)

4.2. Relinearization The problem of solving a system of polynomial equations is known to be a NP-complete problem. There are several techniques that can be used to solve such systems. Gröbner bases techniques are proved to find the solutions of the polynomial system in finite time [43]. However they do not usually benefit from the number of equations exceeding the number of variables [44]. As an alternative to Gröbner bases, we use the relinearization method and extend it to solve systems of polynomials with more than one solution. Relinearization was first introduced in [19] to solve a system of quadratic polynomials, where the number of equations exceeds the number of variables. Further studies have verified the effectiveness of 6

relinearization approach in solving such polynomial systems [44]. Relinearization can be extended to solve systems of polynomials with higher degrees, for example quartic polynomials. Relinearization is based on the commutativity of multiplication of monomials, and it is easiest explained through an example. Example 1. Find all q1 , q2 ∈ R, q1 ≥ 0, such that 2q14

+5q13 q2

+6q12 q22

+10q1 q23

+4q24

=0

2q14

+

q13 q2

+11q1 q23

+6q24

=0

5q14

+8q13 q2

−2q12 q22

+3q1 q23

+2q24

= 1.

+ q12 q22

(4)

By defining x1 := q14 ,

x2 := q13 q2

x3 := q12 q22 ,

x4 := q1 q23 ,

x5 := q24

(5)

and x := [x1 , x2 , x3 , x4 , x5 ]T , (4) can be linearized and presented by (6)

Cx = c where

 2   C = 2  5

5

6

1

−2

8

1

   0 10 4       11 6 c = 0 .    1 3 2

(7)

The system of equations (6) is linear with 3 equations and 5 unknowns. Solutions of (6) are given by (8)

x = xp + s1 v1 + s2 v2

where xp ∈ R5 is a particular solution (e.g., xp = C† c), s1 , s2 ∈ R are arbitrary scalars, and v1 , v2 ∈ R5 are basis vectors for the null space of C. Choosing

and substituting (9) in (8) yields

h iT xp = −6, 2, 3, −10, 21 h iT v1 = 7, −4, 1, 2, −5 h iT v2 = 0, −8, 20, −42, 85

x1 = 7s1 − 6,

x2 = −4s1 − 8s2 + 2

x4 = 2s1 − 42s2 − 10,

(9)

x3 = s1 + 20s2 + 3,

x5 = −5s1 + 85s2 + 21.

(10)

From (5) and the commutativity of monomial multiplication, the following 6 algebraically independent constraints hold x1 x3 = x2 x2 ,

x1 x4 = x2 x3 ,

x2 x5 = x3 x4 , 7

x3 x5 = x4 x4 ,

x1 x5 = x2 x4 = x3 x3 .

(11)

Substituting (10) in (11) yields − 9s21

− 64s22

18s21

+160s22

18s21

+160s22

− 9s21

− 64s22

−27s21

−36s21

−336s22

−400s22

+ 76s1 s2

+ 31s1

+ 88s2

= − 22

−206s1 s2

− 72s1

+236s2

=

54

−298s1 s2

− 90s1

+328s2

=

72

+153s1 s2

+ 46s1

−165s2

= − 37

+443s1 s2

+133s1

−506s2

= −106

+555s1 s2

+171s1

−630s2

= −135.

(12)

By introducing y1 := s21 ,

y2 := s22 ,

y3 := s1 s2 ,

y4 := s1 ,

y5 := s2

(13)

and y := [y1 , y2 , y3 , y4 , y5 ]T , (12) can be linearized again and presented as (14)

Dy = d where 

−9

   18    18 D=   −9   −27  −36

−64

76

31

160

−206

−72

160

−298

−90

−64

153

46

−336

443

133

−400

555

171

−88





−22



       54  236          72  328 .  d=     −37  −165         −106 −506    −135 −630

(15)

If D has full column rank, then y = D† d, and a unique solution for (4) is found by substituting y in (13), and sequentially substituting the variables in (10) and (5). However, examining the eigenvalues of D reveals that D has two zero eigenvalues, and therefore D is not full rank. When D is not full rank, infinitely many y exist that satisfy (14). In such a case, the relinearization method is inconclusive. We now present an extension to the relinearization technique that allows us to find all solutions to a problem such as (4). Let h

¯ := D D

i −d

Equation (14) can be written as

  y ¯ :=   . y 1

¯ y ¯ = 0. D

(16)

(17)

¯ is spanned by 2 linearly independent bases The null space of D h w1 = 1, −3, 16, 1,

iT 16, −1 ,

8

h iT w2 = −1, 0, 0, −1, 0, 1 .

(18)

Let W := [w1 , w2 ] ∈ R6×2 . From (17), it follows that there exists r := [r1 , r2 ]T ∈ R2 such that     2 1 −1 s    1     2  −3 0   s2            16 s1 s2  0  r1  . =Wr= ¯ :=  y      1 −1 r2  s1           16  s2  0     −1 1 1

¯ in (19), we can define W0 , W1 , W2 ∈ R2×2 such that Let s := [s1 , s2 ]T . Using corresponding rows of y    1 −1 r1   s = W0 r :=  16 0 r2 

From (20)



(20)

  −1 r1  = W1 r :=    s1 s =  s1 s2 16 0 r2

(21)

     s1 s2 16 0 r1  = W2 r :=   . s2 s =  s22 −3 0 r2

(22)

r = W0−1 s.

(23)

s21



(19)

1

Substituting (23) in (21) and (22) results in  1 s1 s = W1 W0−1 s =  0

 0  s, 1

 0 s2 s = W2 W0−1 s =  0

1 −3 16



s

(24)

from which all feasible values of s := [s1 , s2 ]T are found by calculating the eigenvalues of W1 W0−1 and W2 W0−1 , which are respectively {1, 1} and {0,

−3 16 }.

Consequently     1 1 s ∈ {  ,  }. −3 0 16

By sequentially substituting (25) back in (8) and (5), all solutions of (4) are       1 q1 1 }.   ∈ {  ,  −2 −0.5 q2

(25)

(26)

¯ in (17) was 2. In general, Note that since the polynomial system in Example 1 had 2 solutions, the nullity of D

¯ the number of solutions of a quartic system of form (4) is equal to the nullity of D. 9









Figure 2.

Projection of a 3D point on two camera image frames F1 and F2 with rotation difference R and translation difference t.

5. Pose Estimation Problem in Quaternions Consider two images of a set of 5 feature points, acquired by a camera at two different views. Denote the local coordinate frame of the camera at the first and second view by F1 and F2 , respectively. Let R ∈ SO(3) and t =

[tx , ty , tz ]T ∈ R3 respectively be the rotation and translation of the camera coordinate frame F1 in the camera coordinate frame F2 , as shown in Fig. 2. Let q = [q1 , q2 , q3 , q4 ]T ∈ H be the quaternion of R. Accordingly, let

1 2

pi = [1 xi , 1 y i , 1 z i ]T ∈ R3 , i ∈ N5 , represent the coordinate of feature point i in coordinate frame F1 , and similarly pi = [2 xi , 2 y i , 2 z i ]T in coordinate frame F2 . Under the pinhole camera model, projections of point i on the first

and second image planes are 1

1

mi := [1 ui , 1 v i , 1]T =

1z

1 i

pi ∈ R3 ,

2

mi := [2 ui , 2 v i , 1]T =

1 2z

2 i

pi ∈ R3 .

(27)

From the rigid motion constraint, each pair of matched image feature points satisfies the equation 1

z i R 1 mi + t = 2 z i 2 mi .

(28)

Point coordinates 1 mi and 2 mi are known from the images, and the unknowns in (28) are 1 z i , 2 z i , R, and t, which need to be recovered. The following example demonstrates how (28) can be written for the matched feature points. Example 2. Consider the pictures shown in Fig. 3, where feature points are matched, and their coordinates on the image plane are determined. In practice, these coordinates are first obtained in pixels and are then mapped via the camera calibration matrix to Cartesian coordinates on the image plane. For the matched feature points with coordinates (−0.1, −1.5) in the left image and (0.2, −1.2) in the right image, from (28) we get 1

z1 R

h −0.1 i

+ t = 2z1

h −2.3 i

+ t = 2z2

−1.5 1

h

i

,

h −2.0 i

,

0.2 −1.2 1

(29)

where scalar u1 and v1 are the depths of the 3D point at each view. Similarly, for two other matched pairs, we can write 1

z2 R

1.6 1

10

1.8 1

(30)

0.1 1.5

2.3 1.6

Figure 3.

0.2 1.2

0.6 0.9

0.9 1.1

2.0 1.8

Pairs of matched feature points across two images shown side by side (image courtesy of MathWorks).

1

z3 R

h 0.6 i 0.9 1

+ t = 2z3

h 0.9 i 1.1 1

(31)

,

where subscripts are used to distinguish the depths of the points (scalars 1 z and 2 z). Notice that rotation matrix R and translation vector t are the same in all equations. By replacing R in (28) from (3), and by direct computation, one can show that (28) is equivalent to the quaternion formulation 1

¯ t = 0, z i 1 Mi q − 2 z i 2 Mi q + Q

(32)

where 

and

0

 1  ui 1 Mi :=  1  vi  1

1

ui

1

vi

0

1

−1

0

1

vi

−1 u i

1





  −1 v i   ∈ R4×4 ,  1 ui   0 

q2

   q1 ¯ Q :=   −q4  q3

 2  ui 2 Mi :=  2  vi  1

q3 q4 q1 −q2

q4



2

0

ui

2

0

−1

1

0

−2 v i

  −q3   ∈ R4×3 ,  q2   q1

2

vi

ui

1



  vi   ∈ R4×4 , 2  − ui   0 2

(33)

(34)

are defined using the elements of image coordinates 1 mi and 2 mi in (27), rotation quaternion q = [q1 , q2 , q3 , q4 ]T , and translation t = [tx , ty , tz ]T , for every i ∈ N5 .

We need to emphasize that in the relative pose estimation problem, translation and depths of the points can only

be recovered up to a scale factor. This can be seen from (28) or (32), where any constant multiplied into both sides can be absorbed into the unknown variables 1 z i , 2 z i , and t. 6. Camera Motion Estimation To recover the pose, we first need to eliminate the unknowns 1 z, 2 z and t and derive a system of equations in terms of the quaternion elements. From solving this system, all rotation solution candidates are found. Subsequently, the 11

translation and depths of the points are recovered. By taking (28) or (32) for two different pairs of feature points and subtracting the equations, t can be eliminated. Repeating this using a third feature point pair and one of the previous pairs results in two equations with t removed. Depths can be removed by rewriting the resulting two equations in the matrix-vector form and noting that the depths form a null vector for the matrix. The determinant of this matrix, which consists of only the point coordinates and rotation unknowns, must equal zero. This gives a polynomial equation of degree four in quaternion variables q1 , q2 , q3 , q4 , with coefficients that are in terms of the feature point coordinates. The following example explicates this procedure. Example 3. By subtracting (30) and (31) from (29), respectively, and bringing terms to the left hand side we get h −0.1 i h 0.2 i h −2.3 i h −2.0 i 1 (35) z 1 R −1.5 − 2 z 1 −1.2 − 1 z 2 R 1.6 + 2 z 2 1.8 = 0, 1

1

1

z1 R

h −0.1 i −1.5 1

− 2z1

h

1

0.2 −1.2 1

i

− 1z3 R

h 0.6 i 0.9 1

1

+ 2z3

h 0.9 i 1.1 1

= 0,

(36)

in which the unknown translation t has been eliminated, and R is expressed by the quaternion elements as given in (28). We can represent (35) and (36) in the matrix-vector form  z1       - 0.1   - 0.2   2.3   - 2.0   2 z1    0 0 R - 1.6 R - 1.5 1  1.8 1.2  z2  1 -1    1   -1  = 0,    - 0.1 - 0.2 - 0.6   0.9    2 z2  R - 1.5 0 0 R - 0.9 1.2 1.1     1 1 1 1 1 {z } | z3    M 1

2

which implies that matrix M ∈ R

6×6

(37)

z3

has a null vector. Therefore, its determinant must be zero. Calculating the

determinant of M with R given in the parametric form (3) gives − 0.1 q14 + 2.4 q13 q2 + 35.6 q12 q22 − 11.4 q1 q23 + 0.3 q24 + . . . − 19.8 q32 q42 − 168 q1 q43 + 9.4 q2 q43 − 17.8 q3 q43 + 0.3 q44 = 0.

(38)

Equation (38) consists of degree four monomials in q1 , q2 , q3 , q4 , with coefficients that depend on the feature point coordinates. Note that since M is a 6 × 6 matrix, one may expect its determinant to have degree six monomials, however, due to the special structure of M, it is always possible to factor out q12 + q22 + q32 + q42 from the determinant expression. Since q12 + q22 + q32 + q42 = 1, the degree four polynomial equation in (38) follows. From the example above, one observes that three matched feature points can generate a homogeneous degree 4 polynomial in quaternion elements. These polynomials can be expressed as 35 X

cm n q1an q2bn q3cn q4dn = 0,

(39)

n=1

an + bn + cn + dn = 4,

an , bn , cn , dn ∈ N4 ∪ {0},

m ∈ N10

in which the coefficients cm n are in terms of the feature point coordinates. The exponents an , bn , cn , dn generate all  combinations of the degree 4 monomials of the quaternion elements, which are in total 73 = 35. Note that the number 12

of monomials of degree m in n variables is



m+n−1 n−1

. Accordingly, the coefficient of each monomial is labeled by

the subscript n ∈ N35 . Every subset of 3 feature points {i, j, k} chosen among the set of 5 feature points result in  a constraint of the form (39). Thus, there are total 53 = 10 constraints distinguished by the subscript m ∈ N10 in (39). Using the unit norm constraint (2) for the rotation quaternion, an extra degree 4 polynomial constraint is given for m = 11 by 35 X

cm n q1an q2bn q3cn q4dn = q12 + q22 + q32 + q42

n=1

bringing the total number of constraints to 11.

2

=1

(40)

Using a computer algebra package, the closed-from expressions of cm n in (39) in terms of the feature point coordinates can be easily computed and stored. Therefore, these coefficients are directly evaluated from the feature point coordinates and one does not need to repeat the precedent derivation to calculate cm n . In the interest of space, we will not give the closed-from expressions here. These closed-form expressions are available online. The overall 11 constraints introduced in (39) and (40) can be considered as a system of quartic polynomials in 4 variables q1 , q2 , q3 , q4 . As shown in the following theorem, these polynomial equations are generically linearly independent. Theorem 1. Define the matrix of coefficients in (39) and (40) as [C]m,n := cm n , C ∈ R11×35 . Generically, C has

full row rank.

Proof. For C not to be full rank, CCT ∈ R11×11 should have zero determinant. The proof follows by computing the

determinant of CCT in the symbolic form using a computer algebra package. By direct computation, one can show that the determinant is a nontrivial function of the matched feature point coordinates 1 ui , 2 ui , 1 v i , 2 v i , i ∈ N5 . Since

a nontrivial determinant has zero measure in the space of matrix elements, C is generically full row rank. That is, random choice of 3D feature points gives a C matrix that is full-rank with probability one. 7. Recovering the Relative Rotation In what follows, we present two algorithms to recover the relative rotation from the polynomial system (39). The rotation solution candidates are used in the next section to recover the associated translation vector and depths. The focus of this section is the minimal case of five points. A discussion on extension of the algorithms to six or more points is provided in Appendix A. 7.1. The QuEst Algorithm  Consider 53 = 10 degree four equations that are generated from five feature points. Multiplying these equations by q1 , q2 , q3 , and q4 results in 40 degree five equations. The multiplication creates more equations, which allow us

to solve the polynomial system. This new system of equations can be stacked into the matrix-vector form A x = 0, 13

where matrix A ∈ R40×56 consists of the coefficients, and vector x ∈ R56 consists of all degree five monomials (the  number of degree five monomials in q1 , q2 , q3 , q4 is 83 = 56). To solve the new system of equations, we split x ∈ R56 into two vectors x1 ∈ R35 and x2 ∈ R21 , where x1 is

the vector of all monomials that contain a power of q1 (e.g., q15 , q14 q2 , q14 q3 , . . . , q1 q34 , q1 q44 ), and x2 consists of the

rest of the monomials (e.g., q25 , q24 q3 , q23 q32 , . . . , q3 q44 , q45 ). Let A = [A1 A2 ], where A1 ∈ R40×35 consists of the

columns of A that are associated with x1 , and A2 ∈ R40×21 is the columns that are associated with x2 . The system Ax = 0 can be equivalently written as

A1 x1 + A2 x2 = 0.

(41)

By multiplying the pseudo inverse of A2 in above we obtain x2 = −A†2 A1 x1 .

(42)

Vector x1 consists of monomials that have at least one power of q1 . Thus, by factoring out q1 and representing the remaining vector by v ∈ R35 , i.e., v =

1 q1 x1 ,

(42) can be written as ¯ v, x2 = q1 B

(43)

¯ := −A† A1 ∈ R21×35 . where B 2

Equation (43) allows us to construct a square matrix B ∈ R35×35 and build an eigenvalue problem of the form

λv = Bv. Indeed, let us choose λ =

q2 q1 ,

and consider the eigenvalue problem q2 v = q1 B v.

(44)

The entries of vector q2 v either belong to x2 or x1 . For entries that belong to x2 , the associated rows of B are chosen ¯ For entries that belong to x1 , rows of B are chosen as an appropriate natural basis from the corresponding rows of B. row vector, i.e., a vector with exactly one entry of 1 and zeros elsewhere. Additional details and examples of this eigenvalue problem formulation are available in our previous work [18]. Once the eigenvalue problem (44) is constructed, 35 solution candidates for v are derived by computing the eigenvectors of B. Quaternion elements q1 , q2 , q3 , q4 are then computed from the entries of v. Lastly, the recovered solutions are normalized to meet the unit norm constraint q12 + q22 + q32 + q42 = 1. Since the 5-point problem has up to 20 solutions (including solutions with negative depths), 15 of the recovered solutions do not satisfy the original degree four system and can be easily discarded by plugging the solutions in the original system. Note that solutions corresponding to points behind the camera can be discarded after the depths are recovered. This leaves only up to 10 possible solutions. 7.2. Relinearization Algorithm The relinearization method (cf. Section 4) can be used to solve the system of 11 degree four polynomial equations in 4 variables q1 , q2 , q3 , q4 given by (39) and (40). Relinearization is an established way to solve such problems, and 14

can be used to compare QuEst’s performance against an accepted method. Let C ∈ R11×35 defined by [C]m,n :=

cm n , be the coefficient matrix of the polynomial system (39) and (40). Let c := [0, 0, . . . , 0, 1]T ∈ R11 , and denote by



x1





q14



     3    q1 q2   x2  35    x :=   ..  :=  ..  ∈ R  .   .      q44 x35

(45)

the vector of unknowns. The linearized polynomial system can be expressed as Cx = c

(46)

where C is given by the feature point coordinates. Assuming that the rank of C is 11, the null space of C is spanned by 24 linearly independent bases vi ∈ R35 , i ∈ N24 , and all solutions of (46) are given by x = xp +

24 X

si vi

(47)

i=1

where si ∈ R are unknown coefficients, and xp ∈ R35 is a particular solution, which can be chosen as xp =

C† c. Similar to (11), by commutativity of multiplication of quartic quaternion monomials in (45), 465 algebraically

independent quadratic constraints hold for xk , k ∈ N35 . Substituting (47) in these constraints results in 465 equations  in 25 = 300 quadratic terms si sj and 24 scalar terms si , where i, j ∈ N24 . By defining the coefficient matrix 2

D ∈ R465×324 , the right hand side scalar vector d ∈ R324 , and the vector of unknowns y := [s21 , s1 s2 , . . . , s224 , s1 , . . . , s24 ]T ∈ R324

(48)

the relinearized system is given by D y = d.

(49)

Being that the 5 point pose estimation problem has more than one solution, D in (49) cannot have full row rank. Following the similar approach to Example 1 in Section 4, define h

¯ := D D

i −d ,

  y ¯ :=   . y 1

(50)

¯y ¯ = 0. Including the complex solutions, the 5 point relative pose problem generally has 20 quaternion From (49), D solutions (20 rotation solutions is the consequence of decomposing 10 Essential matrix solutions. For proof see [8]). ¯ is 20. Let wj ∈ R325 , j ∈ N20 be a set of bases for the null space of D ¯ and denote by Therefore, the nullity of D W := [w1 , w2 , . . . , w20 ] ∈ R325×20 the matrix of all bases. There exists r := [r1 , r2 , . . . , r20 ]T ∈ R20 such that ¯ = W r. y 15

(51)

¯ in (50), and the corresponding rows of W in Let s := [s1 , s2 , , . . . , s24 ]T . From the definitions of y in (48) and y (51), it follows that there exist matrices W0 , W1 , . . . , W24 ∈ R24×20 such that (52)

s = W0 r,

(53)

l ∈ N24 .

sl s = Wl r, From (52)

r = W0† s.

(54)

Substituting (54) in (53) sl s = Wl W0† s,

(55)

l ∈ N24

from which all solutions can be found by calculating the eigenvalues of Wl W0† ∈ R24×24 for l ∈ N24 . Note that since the rank of Wl ∈ R24×20 is 20, rank of Wl W0† cannot be larger than 20, and therefore at least 4 eigenvalues

are zeros. By discarding these zeros eigenvalues, 20 solution candidates for s remain. Furthermore, the solutions with complex eigenvalues can be discarded. Substituting the recovered real solution for s in (47), the solutions for x are found, from which by (45) all quaternion solutions are recovered. Note that half of the recovered solutions correspond to point configurations that are behind the camera (due to opposite chirality). Therefore, the number of solution candidates can be reduced from 20 to 10. This is discussed further in Appendix B, where the chirality condition is given in part (a) of Theorem 2. 8. Recovering the Relative Translation and Depths Once the set of all quaternion solution candidates is recovered, the corresponding translation and point depths can be found from (32), repeated here for convenience 1

¯ t = 0. z i 1 Mi q − 2 z i 2 Mi q + Q

(56)

For a given quaternion solution candidate q, the unknown terms in (56) are the translation vector t := [tx , ty , tz ]T and depth of the feature points 1 z i , 2 z i , for i ∈ N5 . Arrange the unknown and known terms respectively in vector b ∈ R13 defined by

b := [tx , ty , tz , 1 z 1 , 2 z 1 , . . . , 1 z 5 , 2 z 5 ]T

and matrix B ∈ R20×13 defined by



¯ 1 M1 q −2 M1 q Q 0 0 ··· 1 ¯ 0 0 M 2 q −2 M 2 q Q

B :=  . ..

¯ Q

0

.. .

0

..

0

0

From (57) and (58), the set of constraints (56) can be expressed as B b = 0. 16

.

···

0 0 1

(57)

0 0

.. . 2

M5 q − M5 q



 .

(58)

(59)

The unknown vector b in (59) can be found by computing the right singular vector corresponding to the zero singular value of B. Consequently, the translation vector and depth components recovered from (59) share a common scale factor. 9. Comparison with Existing Algorithms In this section, we use Monte Carlo simulations to compare the performance of the proposed algorithms with several state-of-the-art 5-point algorithms. The 5-point algorithms used in the comparisons are Nister’s [25], Stewenius’ [26], Li’s [12], and Kukelova’s [27] 5-point algorithms, which are based on the Essential matrix, and Kneip’s 5-point algorithm [37], which estimates rotation and translation independently. The homography method [45], which is a commonly used algorithm in pose estimation applications, is also compared for coplanar points. The OpenGV C++ library [46], which contains the implementations of Nister, Stewenius, and Kneip’s 5-point algorithms was used. The library is called within Matlab to generate the results. Matlab implementations of the homography and our algorithm is used in all comparisons. To avoid any implementation mistakes that could misrepresent a work, we have only used open-source implementations of the aforementioned algorithms provided by the authors themselves [47, 48, 49, 46], and we did not include the work by Kalantari et al. [33], for which we could not find an implementation. Each Monte Carlo simulation consists of 5 randomly generated 3D points with uniform distribution inside a rectangular parallelepiped in front of the camera at the initial pose. The camera is moved to the second pose by the translation t, and rotated by angle θ along the rotation axis a, all chosen randomly with uniform distribution inside a rectangular parallelepiped, the interval [0, 2π], and on the unit sphere, respectively. Coordinates of the feature points on the image plane are computed by projecting the 3D points using the pinhole camera model explained in (27). We assume that each image is 1024 × 768 pixels, and the calibration matrix used in the simulations is   1060 0 514     K =  0 1060 384 .   0 0 1

(60)

The coordinate of the feature points in pixels is calculated from multiplying K−1 by the image coordinates and rounding the results. 9.1. Noise Comparison In order to evaluate the resilience of the algorithms to noise, zero mean Gaussian noise with standard deviation ranging from 0 to 10 pixels is added to all pixel coordinates. The noise standard deviation is increased by 0.1 pixel increments, and for each noise increment 100 simulations are generated. Due to multiplicity of solutions in the 5point pose problem, the solution candidate that is closest to the ground truth is chosen as the estimated pose in each algorithm. The estimation error for rotation is defined by ρ(q, qest ) = arccos(|dot(q, qest )|) ∈ [0, 17

π ] 2

(61)

Mean rotation error

0.06

0.04

0.02

Mean translation error

0

Nister Li Kneip Kukel. Stew. QuEst Relin.

0.2

0.1

0 0

2

4

6

8

10

Noise standard deviation (pixels)

Figure 4.

Noise comparison with Gaussian image noise for general points.

where q is the quaternion corresponding to the ground truth, and qest is the quaternion estimated from the noisy images. Note that (61) defines a metric on the rotation quaternion space H [50]. Similarly, the estimation error for translation is defined by ρ(tn , tn est ) = arccos(|dot(tn , tnest )|) ∈ [0,

π ] 2

(62)

where tn and tnest are the actual and estimated translation vectors, respectively, normalized such that their norm is one. Note that since in the pose estimation problem the translation can only be recovered up to a scale factor, the magnitude of the translation vector cannot be used for the comparison. Figure 4 shows the mean of the rotation and translation estimation errors at each increment of standard deviation in the noise distribution for all algorithms. Our proposed methods, QuEst and relinearization, and the Stewenius’ algorithm show the best performance. Other algorithms show less resilience to noise. The pose recovered by Nister’s, Li’s, and Kukelova’s algorithms are nearly identical, and therefore their associated lines overlap. Kneip’s algorithm only returns the rotation, and translation error is not compared for this algorithm. To allow a better comparison when the noise standard deviation is zero, we have not added the pixelization noise to the feature points (i.e., projection of a point on the image plane is not rounded to a pixel). In order to compare the 5-point algorithms with the homography method, the previous Monte Carlo analysis is repeated for coplanar points. Accordingly, in all simulations the points are distributed uniformly on a bounded plane. The mean of the rotation and translation estimation errors at each standard deviation increment is shown in Fig. 5. As can be seen from the figure, homography shows the best noise resilience when the standard deviation is small 18

Mean rotation error

0.06

0.04

Nister Li Kneip Kukel. Stew. Homog. QuEst Relin

0.02

Mean translation error

0 0.4 0.3 0.2 0.1 0 0

2

4

6

8

10

Noise standard deviation (pixels)

Figure 5.

Noise comparison with Gaussian image noise for coplanar points.

(approximately less that 3 pixels). However, our algorithms show better performance as the noise increases. 9.2. Parallax Sensitivity Comparison It is important to analyze the accuracy/robustness of the algorithms when the translation between the camera frames is small or zero, as this is often the case when the camera has a high frame rate (e.g., 30 fps or more) and its motion is slow (e.g., a hand-held camera). We define the parallax as the ratio of the translation magnitude to the mean of point depths in both images. That is, 1 2N

P2

ktk PN

j=1

i=1

jz i

,

(63)

where t is the translation between the two camera frames, N is the number of points, and j zi is the depth of feature point i in the j’th camera frame. Note that as t approaches zero, the parallax goes to zero. The comparison results are shown in Fig. 6, where parallax values from 0 to 0.1 with 0.001 increments are considered. For each parallax value, 20 coplanar and non-coplanar sets of five points were used, and Gaussian noise with zero mean and 0.75 pixel standard deviation is added to the image coordinates to represent the pixelization noise. Note that without the added noise, all estimation results are accurate up to machine precision, and hence it would not be possible to distinguish the performance of the algorithms. The rotation and translation errors are calculated according to metrics defined in 61 and 62. When parallax is zero, the actual translation is zero, and therefore tn is undefined. In this case, it is meaningless to assess the direction of the recovered translation. We have therefore defined the error as zero when parallax is zero. Since when the parallax is zero the homography approach works regardless of the point configuration, it is included in the comparisons. 19

Nister Li Kneip Kukel. Stew. Homog. QuEst Relin

Mean Rotation Error

0.009

0.006

0.003

Mean Translation Error

0

0.25 0.2 0.15 0.1 0

0.02

0.04

0.06

0.08

0.1

Parallax

Figure 6.

Estimation accuracy versus parallax at image noise level of 0.75 pixels standard deviation.

As can be seen from the figure, the performance of our methods and Stewenius’s algorithm in terms of rotation error are closely similar, and notably better than the other approaches. The proposed relinearization method outperforms all algorithms in terms of the accuracy of the estimated translation direction. When parallax approaches zero, the error associated to the translation increases, since the accuracy of the recovered translation depends of the ratio of parallax to noise. Lastly, our algorithm and homography recover the translation up to an unknown but constant scale factor, therefore, the magnitude of the recovered translation goes to zero as the parallax goes to zero. The translation recovered from the Essential matrix has unit norm regardless of the parallax. 9.3. Pure Rotation and Translation Sensitivity Comparison To provide additional insight into the robustness of the algorithms, the cases of pure rotational and translational motion are considered. For the case of pure rotation, the comparison results are shown in Fig. 7, where Guassian noise with standard divination from 0 to 10 pixels with 0.1 increments are considered. For each standard deviation increment, 50 Monte Carlo experiments with coplanar and 50 with non-coplanar sets of five points were used. Since the relative translation is zero, the homography is also included in the comparisons. Our proposed methods and the Stewenius algorithm show superior performance compared to other algorithms. Similar analysis for the case of pure translational motion is performed, where the results are shown in Fig. 8. QuEst shows the best performance, follows by the Stewenius and relinearization methods.

20

Nister Li Kneip Kukel. Stew. Homog. QuEst Relin

Mean rotation error

0.02

0.015

0.01

0.005

0 0

2

4

6

8

10

Noise standard deviation (pixels)

Estimation accuracy of relative rotation at different noise levels when camera has only rotational motion.

Mean translation error

Figure 7.

0.2

Nister Li Kneip Kukel. Stew. QuEst Relin.

0.1

0 0

2

4

6

8

10

Noise standard deviation (pixels)

Figure 8.

Estimation accuracy of relative translation at different noise levels when camera has only translational motion.

21

Table 1.

Average execution time of 5-point algorithms for 1000 Monte Carlo simulations.

Algorithm

Average execution time (ms)

7-point (Relin) 6-point (Relin)

0.9355 11.2461

5-point (Relin) 5-point (QuEst)

41.5110 0.8857

5-point (Kneip) 5-point (Nister) 5-point (Stewenius) 5-point (Li) 5-point (Kukelova)

0.9582 0.2649 0.1099 0.0354 0.3832

9.4. Time Comparison Table 1 lists the average execution time of all 5-point algorithms for 1000 Monte Carlo simulations with pixelization noise. Included for completeness is the execution time of the relinearization algorithm when six or seven feature points are available (see Appendix A). By using six or more feature points, the size of matrix D in (49) reduces. Therefore, the executional time of the algorithm is improved. All algorithms are tested on the same platform, with Intel’s 4th Gen i-7 CPU. The OpenGV implementation of Stewenius’, Nister’s, and Kneip’s 5-point algorithms took on average 0.11, 0.26, and 0.96 milliseconds, respectively, to recover the pose difference from 2 images. The Matlab implementations of the QuEst and relinearization algorithms took on average 0.88 and 41.51 milliseconds to recover the pose, respectively. Our approach is generally slower than the others tested. We do note that for QuEst and 7-point relinearization methods, the running time is on the same oder of magnitude of other algorithms, and well above the frame rate of most cameras. A user can determine if the longer computation time is worth the improved robustness to noise and the benefits our algorithm provides for control and robotics tasks, as shown in the next section. 10. Application in Visual Servoing Position-Based Visual Servoing (PBVS) [51] techniques use image features to estimate the rotation and translation between the current robot pose and a desired pose. This estimate is then used to generate a velocity command for the robot to move toward the goal pose. Often the camera is mounted on the robot, in which case the pose error can be defined in the camera frame. The goal pose can be defined with respect to objects in the scene, or with respect to the location at which a goal image was captured. The pose of the camera in the reference coordinate frame can then be found using the relative pose estimation algorithms, such as the method proposed in this paper. In this section, we first present simulation results to show how the pose recovered from the proposed approach can be used to navigate a camera in the 3D space to a goal pose via PBVS. We then present several experiments using a 22

ground robot with both general and coplanar point configurations. Our experiments are devised to simultaneously test the performance of our algorithm and show its application in PBVS. To show that our proposed approach is sufficiently fast to be used for real-time robotic application, we use the relinearization method to recover the rotation, which has the highest execution time. The estimation results are compared with the Stewenius’ algorithm, which among the Essential matrix based algorithms has the best performance, and homography when points are coplanar. Existing PBVS methods may use homography, which returns a translation vector that goes to zero as the distance between the current and desired pose goes to zero. However, since points need to be coplanar, plane detection techniques should be used initially to restrict the camera’s action to parts of the scene that are coplanar. For general scenes, PBVS methods can use Essential matrix, which returns a unit norm translation vector. However, the unit norm translation does not accurately indicate how close the goal and current camera frames are together. This reduces the applicability of proportional error feedback. So other techniques should be used to stop the camera once it reaches the goal position. Our method is a suitable choice for PBVS applications, since it provides a proportional translation feedback and is not affected by point configuration. 10.1. PBVS Simulation Consider two camera views in Fig. 9(a), where F1 denotes a goal, and F2 denotes the current camera pose. The goal camera pose is chosen at the origin of the world coordinate frame with the optical axis along the z-axis of the world frame. Origin of F2 is chosen at [−1094, 46, 646]> in the world coordinate frame, and with orientation given by the quaternion q = [0.943, 0.110, 0.223, 0.221]> . Hence, the relative pose of the goal pose expressed in F2 is given by 2 q = [0.943, −0.110, −0.223, −0.221]> , 2 t = [1097, −640, −47]> . Five general points are randomly placed with an approximate average distance of 2000 mm along the z-axis of the world coordinate frame. Using the camera calibration matrix in (64) and the pinhole camera model, projection of points in the goal and current image frames are computed and are shown in Fig. 9(e). Assuming that the goal image taken at F1 is provided, the relative pose recovered by our proposed method from the images can be used to calculate a motion that brings the camera from its current position to the goal frame. To do so, the camera is translated to the next position along the direction of recovered translation. In addition, the recovered rotation is interpolated into 120 segments and the camera is rotated along this trajectory in the quaternion space. As mentioned in the previous section, the pose recovered from using only 5 feature points is generally not unique. To find the correct answer from among the recovered solution candidates, a third camera view can be used to triangulate the pose. This is done by noting that the scaled depths of the feature points given in the F1 frame, which is calculated through the procedure explained in Section 8, should be the same when the relative pose is recovered with respect to the goal pose. The scaled depths of feature pints in the goal pose is [0.178, 0.175, 0.180, 0.233, 0.233]> , and at each iteration the pose with the closet Euclidean distance to this vector is chosen as the correct answer. 
Figure 9(a)-(d) shows the location of the camera throughout the PBVS simulation, where the current camera pose is shown by the blue frame. The corresponding locations of the feature points in the current and goal images are shown in Fig. 9(e)-(h). As can be seen, the PBVS algorithm drives the camera to the goal pose, at which the current and goal images coincide. The relative rotation and translation between the camera and goal frames at each iteration are shown in Fig. 10. As can be seen from the figure, the relative rotation converges to the quaternion q = [1, 0, 0, 0]T, which shows that the orientations of the camera and goal frames are aligned, and the relative translation between the frames converges to zero, which shows convergence to the desired position.

[Figure 9. Goal pose F1 and the current camera pose F2. (a)-(d) Motion of the camera through the PBVS algorithm. (e)-(h) Corresponding images of the points and their trajectories; the feature points imaged at the current pose are marked by •.]

10.2. PBVS Experimental Results

The experimental setup corresponding to the non-coplanar feature point configuration is shown in Fig. 11. Five spheres are used as feature points, arranged such that no 4 spheres are coplanar. A 30 FPS, 640 × 480 pixel webcam with a 90° field of view is mounted horizontally on a Pioneer P3-DX differential-drive wheeled mobile robot. The camera is calibrated, and the calibration matrix is

    K = [ 509.5     0     329.9
            0     510.2   242.9
            0       0       1   ].    (64)
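For reference, feature coordinates enter the estimator after normalization by (64) under the pinhole model; a minimal sketch of this standard step follows (numpy; the helper name is ours, not part of the paper's released code).

```python
import numpy as np

# Calibration matrix from (64).
K = np.array([[509.5,   0.0, 329.9],
              [  0.0, 510.2, 242.9],
              [  0.0,   0.0,   1.0]])

def pixel_to_normalized(u, v, K):
    """Return the normalized image coordinates m = K^{-1} [u, v, 1]^T of a pixel,
    as used by the pinhole camera model."""
    return np.linalg.solve(K, np.array([u, v, 1.0]))

# The principal point maps to (approximately) the optical axis direction [0, 0, 1].
print(pixel_to_normalized(329.9, 242.9, K))
```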

Images acquired by the camera at the locations shown in Figs. 11(a)-(e) are shown in Figs. 11(f)-(j), respectively. The centers of the spheres in the images are used as the feature point coordinates. The goal pose is shown in Fig. 11(a), and the coordinate frame of the camera at this pose is used as the reference coordinate frame.

[Figure 10. Relative rotation and translation between the current and desired camera pose at each iteration.]

Note that the axes of the reference coordinate frame are defined as follows: the z-axis is the optical axis of the camera and points forward, the x-axis points to the right, and the y-axis points toward the ground. The robot is moved to a second (arbitrary) pose, as shown in Fig. 11(b), and PBVS is used to navigate the robot back to the initial pose. Using the images acquired at the initial and current poses in the proposed algorithm, the pose of the robot in the reference coordinate frame is estimated. Figures 11(c) and 11(d) show the pose of the robot during PBVS at two intermediate stages, and the final pose of the robot is shown in Fig. 11(e). Comparing the figures corresponding to the initial and final poses, the robot has moved to within a close vicinity of the initial pose.

In Appendix B, we discuss a novel approach to deal with the multiplicity of solutions in the 5-point pose estimation problem. Using this approach, many of the solution candidates that are not feasible (e.g., solutions that do not satisfy the chirality condition) are found and eliminated. Empirically, the remaining set of solution candidates contains 2 to 8 solutions. Since the robot rotates about the y-axis (in the reference coordinate frame), the solution candidate with rotation about the y-axis is chosen from the remaining set as the actual pose difference.

We would like to point out that the relative pose estimation approach presented in this work is based on the formulation of the problem in its full generality, and can be used, for example, for visual servoing of robotic arms, quadrotors, or other robots that are not confined to a plane. For specific applications, pose estimation algorithms such as [52, 53, 54] can be designed, where, by considering additional information such as IMU data, vehicle dynamics, and motion constraints, the number of unknown pose parameters is reduced and fewer feature points are needed to estimate a unique pose. The robot control algorithm is explained in Appendix C. Links to videos of the experiments and the Matlab implementation of the algorithm are available in the Supplementary Material section.
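To illustrate the selection step above, here is a minimal sketch (numpy; the helper names and the [w, x, y, z] quaternion convention are our assumptions) that picks, from the remaining candidates, the quaternion whose rotation axis is best aligned with the y-axis.

```python
import numpy as np

def rotation_axis(q):
    """Rotation axis of a unit quaternion q = [w, x, y, z]; falls back to the
    y-axis when q is (numerically) the identity rotation."""
    v = np.asarray(q[1:], dtype=float)
    n = np.linalg.norm(v)
    return v / n if n > 1e-9 else np.array([0.0, 1.0, 0.0])

def pick_y_axis_candidate(candidates):
    """Return the candidate whose rotation axis has the largest |cosine| with
    the y-axis of the reference frame."""
    y = np.array([0.0, 1.0, 0.0])
    return max(candidates, key=lambda q: abs(np.dot(rotation_axis(q), y)))

# Example with two hypothetical candidates; the second rotates about the y-axis.
cands = [np.array([0.95, 0.31, 0.0, 0.0]), np.array([0.98, 0.0, 0.17, 0.0])]
print(pick_y_axis_candidate(cands))
```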


[Figure 11. (a) Initial pose of the robot. (b) Second pose of the robot at t = 0 s, corresponding to key frame 1. (c) Pose of the robot at t = 13 s, corresponding to key frame 38. (d) Pose of the robot at t = 26 s, corresponding to key frame 75. (e) Pose of the robot at t = 39 s, corresponding to key frame 112. (f) Image taken by the camera mounted on the robot at the initial pose. (g) Image taken by the camera at t = 0 s. (h) Image taken at t = 13 s. (i) Image taken at t = 26 s. (j) Image taken at t = 39 s.]

10.3. Non-coplanar Point Configuration

In the experiment shown in Fig. 11, the robot was moved from the initial pose to the second pose by a translation of 80 cm to the back and 20 cm to the right, followed by a counterclockwise rotation of 20 degrees. Figure 12 shows the trajectory of the feature points in the image as the robot moves toward the initial pose using PBVS. As can be seen from the figure, the feature points converge to their initial positions with good accuracy.

The estimated relative rotation at each key frame (given in the reference coordinate frame) is shown in Fig. 13. The rotations recovered by our method and by the Essential matrix (using Stewenius' algorithm) are computed from the same set of images and are shown as quaternions for comparison. Figure 14 shows the estimated translation (given in the reference coordinate frame) at each key frame. Since the translation can only be recovered up to a scale factor, to allow a better comparison between the estimated translation vectors, the translations recovered in the first key frame are normalized to unit l2-norm, and all subsequently recovered translation vectors are scaled by the same scale factor for each algorithm (see the sketch below). Clearly, the translation recovered from the Essential matrix does not go to zero as the robot approaches its initial position, whereas the scaled translation recovered by our algorithm does go to zero as the robot approaches the initial pose.

To further verify convergence to the desired pose, 5 additional experiments were conducted, in which the robot was moved to a different second pose in each experiment. The estimated trajectory of the robot in the reference coordinate frame can be reconstructed (up to a scale factor) by plotting the x-z components of the recovered translations. Figure 15 shows the recovered robot trajectories for all experiments, where the trajectories are smoothed to remove noise. The arrows show the orientation of the robot recovered by the algorithm at several points on the trajectory. One can see from the figure that the robot has reached a close vicinity of the initial pose.
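The scale-consistency step referenced above can be made explicit; a minimal sketch (numpy; the helper is hypothetical):

```python
import numpy as np

def rescale_translations(translations):
    """Normalize the first key frame's translation to unit l2-norm and apply
    the same scale factor to all later key frames, as done for Fig. 14."""
    scale = 1.0 / np.linalg.norm(np.asarray(translations[0], dtype=float))
    return [scale * np.asarray(t, dtype=float) for t in translations]

# Example with a hypothetical sequence of recovered, proportionally scaled translations.
seq = [[0.8, 0.0, 0.2], [0.4, 0.0, 0.1], [0.04, 0.0, 0.01]]
print(rescale_translations(seq))
```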

[Figure 12. Trajectory of the feature points in the image plane as the robot moves toward the initial position for the general case where points are non-coplanar. The feature points imaged at the second pose are shown by ◦, and at the final pose by •.]

[Figure 13. Comparison of the recovered rotation for the general case where points are non-coplanar.]


[Figure 14. Comparison of the recovered translation for the general case where points are non-coplanar. Translation recovered from the Essential matrix has unit norm and does not go to zero as the robot approaches the initial pose. Translation is correctly recovered using our algorithm and goes to zero as the robot approaches the initial pose.]

10.4. Coplanar Point Configuration

Using a planar configuration of the feature points, similar experiments were conducted to better compare the homography and 5-point algorithms. We moved the robot from its initial pose by a translation of 80 cm to the back and 20 cm to the left, followed by a clockwise rotation of 20 degrees. Figure 16 shows the trajectory of the feature points in the image as the robot moves toward the initial pose. The feature points converge to their initial positions with good accuracy.

Figures 17 and 18 show the rotation and translation recovered by each algorithm, respectively. Since the feature points are coplanar, the homography method can recover the pose correctly. In contrast to our algorithm (and homography), the translation recovered from the Essential matrix does not go to zero as the robot approaches its initial pose. Five additional experiments, with the robot at a different second pose in each, were conducted to further verify convergence to the desired pose. The smoothed, estimated trajectory of the robot is shown in Fig. 19 for each experiment. As can be seen from the figure, the final pose of the robot is within a close vicinity of the initial pose.

11. Conclusion

We have shown that the 5-point relative pose problem can be effectively solved by using quaternions and without relying on the Essential matrix. Our approach avoids many of the problems that are associated with existing algorithms, such as degeneracies associated with zero relative translation between two camera views or coplanar feature points.

[Figure 15. Estimated trajectory of the robot for 6 different experiments, plotted based on the recovered (smoothed) translation. The arrows indicate the orientation of the robot at intermediate steps based on the recovered rotation. Note that the initial orientation of the robot is along the z-axis.]

[Figure 16. Estimated trajectory of the feature points in the image plane as the robot moves toward the initial position for the coplanar case. The feature points imaged at the second pose are shown by ◦, and at the final pose by •.]


[Figure 17. Comparison of the recovered rotation for the coplanar case. All algorithms have recovered the rotation correctly.]

[Figure 18. Comparison of the recovered translation for the coplanar case. Translation recovered from the Essential matrix has unit norm and does not go to zero as the robot approaches the initial pose. Translation recovered by the homography and our algorithm is properly scaled, i.e., it goes to zero as the robot approaches the initial pose.]


[Figure 19. Trajectory of the robot for 6 different experiments, plotted based on the recovered (smoothed) translation. The arrows indicate the orientation of the robot at intermediate steps based on the recovered rotation. The initial orientation of the robot is along the z-axis.]

Our method requires no a priori knowledge about the scene or the pose difference between the two camera views. In addition to recovering the rotation in quaternions, our approach simultaneously recovers the translation and the depths of the points up to a common scale factor.

We provided two different methods to recover the rotation from the quaternion formulation: QuEst and relinearization. We compared the accuracy and execution time of both methods with the state-of-the-art 5-point algorithms. Compared to QuEst, the relinearization method showed better translation estimation accuracy in the case of small parallax, but had a larger execution time. QuEst, on the other hand, showed better estimation accuracy in the other comparisons and had a much shorter execution time. On average, the estimation accuracy of both algorithms was shown to outperform all compared 5-point algorithms, with the exception of the Stewenius algorithm, which had similar performance in some of the comparisons. However, our proposed algorithms were shown to have better accuracy than the Stewenius algorithm when points are coplanar.

We showed that our approach can be used in practical robot control applications, such as Position-Based Visual Servoing (PBVS). We provided PBVS simulations for a camera with all six degrees of freedom of motion, and further provided PBVS experiments using a ground robot. Future work includes optimizing the algorithm to reduce computation load and time.

12. Acknowledgments

This work was supported by the DGIST R&D Program of the Ministry of Science, ICT and Technology of Korea (15-BD-01).


Appendix A. Extension to Six and More Feature Points

Although the algorithm presented in Section 7.2 was designed for five points, the same technique can be used when more feature points are available. For example, by using six feature points, the number of constraints in degree-4 monomials of the quaternion elements rises from 10 to (6 choose 3) = 20. In turn, C in (46) will be a 21 × 35 matrix, and D̄ in (49) will have the size 465 × 119. In such a case, the total number of solutions, which equals the nullity of D̄, can be shown to be 2. Similarly, when seven feature points are available, the matrices C and D̄ will have the sizes 36 × 35 and 465 × 10, respectively. Although in this case (and when more features are used) C has more rows than columns, C cannot be full rank due to the non-uniqueness of solutions in the pose estimation problem. The nullity of C with seven or more feature points is 2, and two solutions are recovered by the algorithm. Note that by using six or more feature points the sizes of C and D̄ are reduced, so one can expect the execution time to improve with more feature points. This is verified in the analysis in Section 9.4.

When six or more feature points are used, outlier rejection methods such as RANSAC can be employed to improve accuracy at the cost of increased computation time. The application of RANSAC is straightforward, with (32) or (46) used as the distance function; a generic sketch is given below. A link to the RANSAC implementation is provided in the Supplementary Material section.
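The sketch below shows the generic RANSAC loop around a minimal 5-point solver. It is an illustration only: `solver` and `residual` are placeholders for the pose solver and for the algebraic error of (32) or (46), and their signatures are our assumptions, not the released implementation.

```python
import numpy as np

def ransac_pose(points1, points2, solver, residual, n_iter=500, threshold=1e-3):
    """Generic RANSAC around a minimal 5-point solver.

    points1, points2: (N, 3) arrays of matched, normalized image points.
    solver(p1, p2): returns a list of pose hypotheses from 5 matches (placeholder).
    residual(pose, x1, x2): scalar error of one match under a hypothesis (placeholder).
    """
    n = len(points1)
    rng = np.random.default_rng(0)
    best_pose, best_inliers = None, 0
    for _ in range(n_iter):
        sample = rng.choice(n, size=5, replace=False)   # draw a minimal sample
        for pose in solver(points1[sample], points2[sample]):
            inliers = sum(residual(pose, x1, x2) < threshold
                          for x1, x2 in zip(points1, points2))
            if inliers > best_inliers:                  # keep the best hypothesis
                best_pose, best_inliers = pose, inliers
    return best_pose, best_inliers
```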

Appendix B. Elimination of Physically Infeasible Pose Solution Candidates

In what follows, we introduce a novel approach to discard infeasible pose solutions recovered in the 5-point pose estimation problem. Our method, which is based on the projection of the image points onto the unit sphere, is explained through the following example.

Consider the configuration of points and cameras shown in Fig. B.20. We choose the reference coordinate frame as the coordinate frame of the camera at the first view, i.e., F1. Assume the camera has been moved to a second pose by a translation t = [−800, 600, −500]T, followed by a 35° rotation about the rotation axis [0.66, 0.33, 0.66]T. Hence, the relative rotation between the first and second poses in quaternions is q = [0.95, 0.20, 0.10, 0.20]T. The coordinate frame F2 in Fig. B.20 corresponds to the second camera view. Images of the 5 feature points taken by the camera at each pose are shown in Fig. B.21. Using the algorithms explained in Section 7, (at most) 20 quaternion solution candidates for the orientation difference between the camera views are recovered. Many of these solutions correspond to configurations of cameras and points that are not physically feasible, and can therefore be eliminated. For example, consider the rotation solution candidates
q1 = [0.14, 0.65, 0.73, −0.17]T, q2 = [0.12, 0.15, −0.20, −0.96]T, q3 = [0.95, 0.20, 0.10, 0.20]T, (B.1)

and their associated translation vectors (given in F1, pointing from the origin of F2 to the origin of F1),

t1 = [0.72, 0.16, 0.45]T, t2 = [0.30, −0.25, −0.92]T, t3 = [0.19, −0.54, −0.96]T, (B.2)

recovered by the algorithm. To recover the pose difference uniquely, two of the solutions must be discarded.

[Figure B.20. The camera coordinate frame F1 indicates the initial pose of the camera, where the reference coordinate frame has been chosen as F1. The camera coordinate frame F2, located at [−800, 600, −500]T with orientation q = [0.95, 0.20, 0.10, 0.20]T in the reference coordinate frame, indicates the second pose of the camera.]

[Figure B.21. Overlaid images of the 5 feature points taken by the camera at the first and second poses shown in Fig. B.20. The image taken at the second pose (F2) is shown by •.]

Consider the projection of an image point mi onto the unit sphere, given by mi / ||mi||. Denote the origins of F1 and F2 by O1 and O2, respectively. The projections of the image points onto unit spheres centered at O1 and O2 are shown in Figs. B.22(a) and B.22(b), respectively. Consider the sphere in Fig. B.22(b), on which the image at F2 is projected. Rotating this sphere about O2 by each rotation candidate in (B.1) generates the configurations shown in Figs. B.22(c)-(e), where the recovered translation vectors (B.2) are also plotted. Points on the rotated spheres depict the spherical projection of the second image in a rotated coordinate frame, F2′, which is aligned with F1. Since there is only a translational difference between F2′ and F1, moving the rotated spheres along their corresponding translation vectors should bring the spherical projections of the points to the configuration shown in Fig. B.22(a). This observation is the key to eliminating solutions that are not feasible.

Consider the rotated sphere in Fig. B.22(c), corresponding to the solution candidate q1 with translation vector t1. One can see that the feature points on the sphere have negative z-coordinates (i.e., they are located on the southern hemisphere). This implies that F2′ (and therefore F2) is located behind the feature points. In most applications, feature points are located on objects that are not transparent, so such solution candidates can be removed.

Consider the rotated sphere corresponding to the solution candidate q2 and translation vector t2, shown in Fig. B.22(d). Figure B.22(f) shows the top view of this sphere (i.e., the projection of the sphere and translation vector onto the x-y plane), and the top view of the sphere in Fig. B.22(a) is shown in Fig. B.22(h). Define the primary plane as the plane perpendicular to the x-y plane that contains t2, and define the primary great arc as the intersection of the sphere with this plane. Consider the set of great arcs formed by planes that pass through the origin and intersect the primary plane along the x-y component of t2. We call these the x-y great arcs. Translating the sphere along the x-y component of t2 moves the points on the sphere along the x-y great arcs that pass through them. The top views of some x-y great arcs are shown in Fig. B.22(g). Now consider the great arcs formed by the intersection of the sphere with the planes that contain the z-axis. We call these the z great arcs. The z-component of the translation vector moves the points along the z great arcs that contain them; in the top view, the z-component of t2 moves the points toward the center of the circle along the radii passing through them. Since the trajectory of the points is confined to the aforementioned great arcs, by moving along t2, and regardless of the length of the translation, it is not possible to bring the points to their original configuration on the circle shown in Fig. B.22(h). Thus, q2 is not feasible and can be eliminated.

By similar reasoning, one can see from Fig. B.22(f) that the primary great arc divides the points into two sets: points {1, 2} on the upper half of the circle and points {3, 4, 5} on the lower half. By overlaying this primary great arc on the spherical projection of the first image, one can see from Fig. B.22(h) that the points are divided into the same sets, but on the opposite half circles.

[Figure B.22. (a) Image of the feature points in the first view, projected on the unit sphere centered at the origin of F1. (b) Image of the feature points in the second view, projected on the unit sphere centered at the origin of F2. (c) Translation vector t1, and image coordinates after rotation by the solution candidate q1 = [0.14, 0.65, 0.73, −0.17]T. (d) Translation vector t2, and image coordinates after rotation by the solution candidate q2 = [0.12, 0.15, −0.20, −0.96]T. (e) Translation vector t3, and image coordinates after rotation by the solution candidate q3 = [0.95, 0.20, 0.10, 0.20]T. (f) Top view of the rotated sphere corresponding to q2, and translation vector t2. (g) Top view of the x-y great arcs corresponding to t2; translating the sphere along the x-y component of t2 moves a point along the x-y great arc that passes through it. (h) Top view of the spherical projection in F1, and translation vector t2. (i) Top view of the rotated sphere corresponding to q3, and translation vector t3. (j) Top view of the spherical projection in F1, and translation vector t3.]

If the primary great arc divides the points into different sets on opposite hemispheres, no translation can bring the points to their desired positions, and therefore the candidate is not feasible.

Lastly, the top view of the sphere in Fig. B.22(e), corresponding to the solution candidate q3 and translation vector t3, is shown in Fig. B.22(i). Figure B.22(j) shows the top view of Fig. B.22(a) with t3 overlaid on the image. One can see that the translation vector meets all of the preceding criteria, and thus q3 is a valid solution candidate.

To summarize the above discussion as a theorem, let 1mi and 2mi be the coordinates of image point i in the camera coordinate frames F1 and F2, respectively. Let q be a quaternion solution candidate recovered by the algorithm, given in the coordinate frame F1. Let txy := [tx, ty]T be the x-y component of the recovered translation vector associated with q, given in F1 and pointing from O1 to O2. Define

1si := [1xi, 1yi, 1zi]T = 1mi / ||1mi||,  2si := [2xi, 2yi, 2zi]T = 2mi / ||2mi||  (B.3)

as the spherical projection coordinates of 1mi and 2mi onto the unit spheres centered at O1 and O2, respectively. Let

2s̃i := [2x̃i, 2ỹi, 2z̃i]T  (B.4)

represent the coordinates of 2si after rotation by q. Define 1vi, 2vi ∈ R2 respectively as the first two components of 1si and 2s̃i, i.e.,

1vi := [1xi, 1yi]T,  2vi := [2x̃i, 2ỹi]T.  (B.5)

Theorem 2. Given 2 images of 5 feature points, a quaternion solution candidate q and its associated translation vector txy are feasible if the following conditions are met for every feature point i:

(a) 2z̃i > 0;
(b) ang(txy, 2vi) ≤ ang(txy, 1vi);
(c) sgn(det([txy, 2vi])) = sgn(det([txy, 1vi])).

Proof. We only outline the proof. Condition (a) is clear from the preceding discussion. Condition (b) follows by observing that translating the sphere along txy moves point i in the direction opposite to txy on the x-y great arc that passes through it (cf. Fig. B.22(g)); therefore, the angle between 2vi and txy increases. Translating the sphere along the z-component of t moves the point along the z great arcs, so the angle between 2vi and txy remains constant. Consequently, translating the sphere along the x-y and z-components of t can only increase the value of ang(txy, 2vi). Therefore, the original configuration is achievable only if ang(txy, 2vi) ≤ ang(txy, 1vi). Condition (c) follows by observing that no point can cross the primary great arc as the sphere is translated. Thus, the primary great arc should divide the points into the same groups, on the same side of the sphere, which implies that the signs of the determinants in part (c) should be the same.

Note that, in general, more than one solution candidate may satisfy the conditions in Theorem 2. Empirically, by enforcing these conditions, many solution candidates are eliminated. A priori knowledge of the scene or camera views, if available, can be used to determine the pose uniquely from the remaining set of solution candidates.
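The conditions of Theorem 2 are straightforward to check numerically. Below is a minimal sketch (numpy; the function names, the [w, x, y, z] quaternion convention, and the assumed rotation direction of q are ours, not the released code).

```python
import numpy as np

def quat_rotate(q, v):
    """Rotate 3-vector v by the unit quaternion q = [w, x, y, z]."""
    w, x, y, z = q
    R = np.array([
        [1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y)],
        [2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x)],
        [2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y)]])
    return R @ v

def is_feasible(q, t, m1, m2):
    """Check conditions (a)-(c) of Theorem 2 for one pose candidate (q, t).

    m1, m2: lists of homogeneous image points (3-vectors) in F1 and F2;
    all points are assumed to be nondegenerate (nonzero norms).
    """
    txy = np.asarray(t[:2], dtype=float)
    ang = lambda a, b: np.arccos(np.clip(
        np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)), -1.0, 1.0))
    for p1, p2 in zip(m1, m2):
        s1 = p1 / np.linalg.norm(p1)                  # spherical projection in F1
        s2 = quat_rotate(q, p2 / np.linalg.norm(p2))  # rotated projection of F2
        if s2[2] <= 0:                                # (a): camera behind the points
            return False
        v1, v2 = s1[:2], s2[:2]
        if ang(txy, v2) > ang(txy, v1):               # (b): the angle can only grow
            return False
        d1 = np.linalg.det(np.column_stack([txy, v1]))
        d2 = np.linalg.det(np.column_stack([txy, v2]))
        if np.sign(d1) != np.sign(d2):                # (c): same side of the primary arc
            return False
    return True
```

Candidates failing any condition are discarded; the survivors form the remaining set mentioned above.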

Appendix C. Control Algorithm

Consider the rotation q ∈ H and scaled translation vector t = [tx, ty, tz]T ∈ R3 recovered by the algorithm, given in the robot's local coordinate frame. Let [θx, θy, θz]T be the (extrinsic) Euler-angle representation of the recovered quaternion about the x-y-z axes of the reference coordinate frame. Note that the robot/camera is confined to the z-x plane and can only rotate about the y-axis; thus, θx and θz are zero. Define the bearing angle φ ∈ [0, 2π) by

φ := atan2(tz, tx)  (C.1)

and the scaled distance r ∈ R by

r := √(tx² + tz²).  (C.2)

The robot's motion can be described via the simple kinematic model

ż = ν cos(θy),  ẋ = ν sin(θy),  θ̇y = ω  (C.3)

where [z, x]T and θy are respectively the robot's position and heading angle in the reference coordinate frame, and ν, ω ∈ R are control signals to be determined.

We seek to design a control law that brings the robot to the initial pose, which is given in the reference coordinate frame by [z, x]T = [0, 0]T and θy = 0. Note that in applications such as PBVS, the vector [z, x]T is unknown and cannot be used in the control. In our proposed algorithm, the recovered translation vector is properly scaled, i.e., there exists an unknown constant real number c > 0 such that [z, x]T = c [tx, tz]T for all camera views. Therefore, r represents the distance between the robot and the origin, multiplied by a constant (unknown) scale factor. A practical method to bring the robot to the initial pose is to move along the recovered translation vector until r is smaller than a desired threshold value rε ∈ R+. Once r < rε, the control can be switched to use the recovered rotation to align the heading angle of the robot such that θy = 0.

Based on the preceding discussion and our previous work [55], we propose a switching feedback control law. If r ≥ rε, the control law is given by

ν = k1 tanh(r) cos(φ)  (C.4)
ω = −k2 sin(φ)  (C.5)

where k1, k2 ∈ R+ are control gains. This control law drives the robot to the origin [z, x]T = [0, 0]T. If r < rε, the control law is

ν = 0  (C.6)
ω = −k2 sin(θy)  (C.7)

which drives θy to zero.
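For concreteness, a minimal sketch of this switching law (Python/numpy; the gain and threshold values are placeholders, not the values used in the experiments):

```python
import numpy as np

K1, K2 = 0.5, 1.0      # control gains k1, k2 (placeholder values)
R_EPS = 0.05           # switching threshold r_eps (placeholder value)

def control(t, theta_y):
    """Switching control law (C.4)-(C.7) for the unicycle model (C.3).

    t: scaled translation [tx, ty, tz] recovered by the estimator;
    theta_y: recovered heading angle; returns (nu, omega).
    """
    phi = np.arctan2(t[2], t[0])          # bearing angle, (C.1)
    r = np.hypot(t[0], t[2])              # scaled distance, (C.2)
    if r >= R_EPS:                        # drive toward the origin, (C.4)-(C.5)
        nu = K1 * np.tanh(r) * np.cos(phi)
        omega = -K2 * np.sin(phi)
    else:                                 # align the heading, (C.6)-(C.7)
        nu = 0.0
        omega = -K2 * np.sin(theta_y)
    return nu, omega
```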

References

[1] N. Gans, G. Hu, W. E. Dixon, Image based state estimation, in: Encyclopedia of Complexity and Systems Science, Springer, 2009, pp. 4751–4776.
[2] L. Kneip, P. Furgale, R. Siegwart, Using multi-camera systems in robotics: Efficient solutions to the NPnP problem, in: International Conference on Robotics and Automation, IEEE, 2013, pp. 3770–3776.
[3] O. Faugeras, L. Robert, S. Laveau, G. Csurka, C. Zeller, C. Gauclin, I. Zoghlami, 3D reconstruction of urban scenes from image sequences, Computer Vision and Image Understanding 69 (3) (1998) 292–309.
[4] B. D. Lucas, T. Kanade, An iterative image registration technique with an application to stereo vision, in: International Joint Conference on Artificial Intelligence, 1981, pp. 674–679.
[5] J. Shi, C. Tomasi, Good features to track, in: Computer Society Conference on Computer Vision and Pattern Recognition, IEEE, 1994, pp. 593–600.
[6] D. G. Lowe, Distinctive image features from scale-invariant keypoints, International Journal of Computer Vision 60 (2) (2004) 91–110.
[7] H. Bay, T. Tuytelaars, L. Van Gool, SURF: Speeded up robust features, in: European Conference on Computer Vision, Springer, 2006, pp. 404–417.
[8] O. D. Faugeras, S. Maybank, Motion from point matches: multiplicity of solutions, in: Workshop on Visual Motion, 1989, pp. 248–255.
[9] M. Demazure, Sur deux problemes de reconstruction, Tech. Rep. RR-0882, INRIA (Jul. 1988).
[10] J. Philip, A non-iterative algorithm for determining all essential matrices corresponding to five point pairs, The Photogrammetric Record 15 (88) (1996) 589–599.
[11] J. Philip, Critical point configurations of the 5-, 6-, 7-, and 8-point algorithms for relative orientation, Department of Mathematics, Royal Institute of Technology, Stockholm, Sweden, 1998.
[12] H. Li, R. Hartley, Five-point motion estimation made easy, in: IEEE International Conference on Pattern Recognition, Vol. 1, 2006, pp. 630–633.
[13] H. Longuet-Higgins, Multiple interpretations of a pair of images of a surface, in: Proceedings of the Royal Society of London A: Mathematical, Physical and Engineering Sciences, Vol. 418, The Royal Society, 1988, pp. 1–15.
[14] S. Negahdaripour, Multiple interpretations of the shape and motion of objects from two perspective images, IEEE Transactions on Pattern Analysis and Machine Intelligence 12 (11) (1990) 1025–1039.
[15] X. Hu, N. Ahuja, Sufficient conditions for double or unique solution of motion and structure, in: International Conference on Acoustics, Speech, and Signal Processing, IEEE, 1991, pp. 2445–2448.
[16] M. A. Fischler, R. C. Bolles, Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography, Communications of the ACM 24 (6) (1981) 381–395.
[17] K. Fathian, N. R. Gans, A new approach for solving the five-point relative pose problem for vision-based estimation and control, in: American Control Conference, IEEE, 2014, pp. 103–109.
[18] K. Fathian, J. P. R. Paredes, E. Doucette, J. W. Curtis, N. R. Gans, QuEst: A quaternion-based approach for camera motion estimation from minimal feature points, IEEE Robotics and Automation Letters.
[19] A. Kipnis, A. Shamir, Cryptanalysis of the HFE public key cryptosystem by relinearization, in: Advances in Cryptology, Springer, 1999, pp. 19–30.
[20] E. Kruppa, Zur Ermittlung eines Objektes aus zwei Perspektiven mit innerer Orientierung, Sitzungsberichte der Mathematisch-Naturwissenschaftlichen Kaiserlichen Akademie der Wissenschaften 122 (1913) 1939–1948.
[21] H. C. Longuet-Higgins, A computer algorithm for reconstructing a scene from two projections, Nature 293 (5828) (1981) 133–135.
[22] O. Faugeras, Three-Dimensional Computer Vision: A Geometric Viewpoint, MIT Press, 1993.
[23] S. Maybank, Theory of Reconstruction from Image Motion, Springer Berlin Heidelberg, 1993.
[24] D. Hook, P. McAree, Using Sturm sequences to bracket real roots of polynomial equations, Academic Press, 1990.
[25] D. Nister, An efficient solution to the five-point relative pose problem, in: IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Vol. 2, 2003, pp. 195–202.
[26] H. Stewénius, C. Engels, D. Nistér, Recent developments on direct relative orientation, ISPRS Journal of Photogrammetry and Remote Sensing 60 (4) (2006) 284–294.
[27] Z. Kukelova, M. Bujnak, T. Pajdla, Polynomial eigenvalue solutions to the 5-pt and 6-pt relative pose problems, in: BMVC, Vol. 2, 2008.
[28] D. Batra, B. Nabbe, M. Hebert, An alternative formulation for five point relative pose problem, in: Workshop on Motion and Video Computing, IEEE, 2007, pp. 21–21.
[29] M. Sarkis, K. Diepold, K. Huper, A fast and robust solution to the five-point relative pose problem using Gauss-Newton optimization on a manifold, in: International Conference on Acoustics, Speech and Signal Processing, Vol. 1, IEEE, 2007, pp. I–681.
[30] J. Hedborg, M. Felsberg, Fast iterative five point relative pose estimation, in: IEEE Workshop on Robot Vision, 2013, pp. 60–67.
[31] Y. Ma, S. Soatto, J. Kosecka, S. S. Sastry, An Invitation to 3-D Vision, Springer, 2003.
[32] E. Malis, F. Chaumette, 2 1/2 D visual servoing with respect to unknown objects through a new estimation scheme of camera displacement, International Journal of Computer Vision 37 (1) (2000) 79–97.
[33] M. Kalantari, F. Jung, J.-P. Guedon, N. Paparoditis, The five points pose problem: A new and accurate solution adapted to any geometric configuration, in: Advances in Image and Video Technology, Springer, 2009, pp. 215–226.
[34] B. Buchberger, Gröbner bases: An algorithmic method in polynomial ideal theory, in: Multidimensional Systems Theory – Progress, Directions and Open Problems in Multidimensional Systems, 1985, pp. 184–232.
[35] J. Lim, N. Barnes, H. Li, Estimating relative camera motion from the antipodal-epipolar constraint, IEEE Transactions on Pattern Analysis and Machine Intelligence 32 (10) (2010) 1907–1914.
[36] J. C. Bazin, C. Demonceaux, P. Vasseur, I. Kweon, Motion estimation by decoupling rotation and translation in catadioptric vision, Computer Vision and Image Understanding 114 (2) (2010) 254–273.
[37] L. Kneip, R. Siegwart, M. Pollefeys, Finding the exact rotation between two images independently of the translation, Springer, 2012.
[38] H. Stewenius, C. Engels, D. Nister, An efficient minimal solution for infinitesimal camera motion, in: IEEE Conference on Computer Vision and Pattern Recognition, 2007, pp. 1–8.
[39] J. Ventura, C. Arth, V. Lepetit, Approximated relative pose solvers for efficient camera motion estimation, in: Computer Vision – ECCV Workshops, Springer, 2014, pp. 180–193.
[40] R. I. Hartley, F. Kahl, Global optimization through rotation space search, International Journal of Computer Vision 82 (1) (2009) 64–79.
[41] L. Kneip, S. Lynen, Direct optimization of frame-to-frame rotation, in: International Conference on Computer Vision, IEEE, 2013, pp. 2352–2359.
[42] J.-Y. Bouguet, Camera calibration toolbox, https://www.vision.caltech.edu/bouguetj/calib_doc/ (2015).
[43] B. Buchberger, F. Winkler, Gröbner Bases and Applications, Vol. 251, Cambridge University Press, 1998.
[44] N. Courtois, A. Klimov, J. Patarin, A. Shamir, Efficient algorithms for solving overdefined systems of multivariate polynomial equations, in: Advances in Cryptology, Springer, 2000, pp. 392–407.
[45] R. I. Hartley, A. Zisserman, Multiple View Geometry in Computer Vision, 2nd Edition, Cambridge University Press, 2004.
[46] L. Kneip, P. Furgale, OpenGV: A unified and generalized approach to real-time calibrated geometric vision, in: International Conference on Robotics and Automation, IEEE, 2014, pp. 1–8.
[47] H. Stewenius, Eight-point, seven-point, and six-point solvers, http://www.vis.uky.edu/~stewe/FIVEPOINT (2006).
[48] H. Li, Five- and six-point essential matrix estimation, http://users.cecs.anu.edu.au/~hongdong/Publications.html (2012).
[49] Z. Kukelova, Minimal problems in computer vision, http://cmp.felk.cvut.cz/mini/ (2012).
[50] D. Q. Huynh, Metrics for 3D rotations: Comparison and analysis, Journal of Mathematical Imaging and Vision 35 (2) (2009) 155–164.
[51] F. Chaumette, S. Hutchinson, Visual servo control. I. Basic approaches, IEEE Robotics & Automation Magazine 13 (4) (2006) 82–90.
[52] X. Zhang, Y. Fang, X. Liu, Motion-estimation-based visual servoing of nonholonomic mobile robots, IEEE Transactions on Robotics 27 (6) (2011) 1167–1175.
[53] X. Zhang, Y. Fang, N. Sun, Visual servoing of mobile robots for posture stabilization: from theory to experiments, International Journal of Robust and Nonlinear Control 25 (1) (2015) 1–15.
[54] D. Scaramuzza, 1-point-RANSAC structure from motion for vehicle-mounted cameras by exploiting non-holonomic constraints, International Journal of Computer Vision 95 (1) (2011) 74–85.
[55] J. Jin, Y.-G. Kim, S.-G. Wee, N. Gans, Decentralized cooperative mean approach to collision avoidance for nonholonomic mobile robots, in: International Conference on Robotics and Automation, IEEE, 2015, pp. 35–41.

Biography

Kaveh Fathian received the B.S. degree in Electrical Engineering from the Imam Khomeini International University, Iran, in 2010, and the M.S. degree in Electrical Engineering from the University of Texas at Dallas, TX, USA, in 2013. He is currently a Ph.D. candidate in the Electrical Engineering program and an M.S. candidate in the Mathematics program at the University of Texas at Dallas. His research interests include control theory, networked dynamic systems, computer vision, and robotics.

Jingfu Jin received his M.S. degree in Electrical and Computer Engineering from Korea University in 2010 and his Ph.D. in Electrical Engineering from the University of Texas at Dallas in 2015. He is currently an automated driving software engineer at General Motors. His research interests include robotics and computer vision, with a focus on autonomous vehicles, multi-robot systems, vision-based estimation and control, and sensor fusion.

Sung-Gil Wee received the B.S. and M.S. degrees in Electrical Engineering from Yeungnam University in 2006 and 2008, respectively. During 2008-2011, he was a Researcher at the Gyeongbuk Research Institute of Vehicle Embedded Technology. He is now with the Daegu Gyeongbuk Institute of Science & Technology, Rep. of Korea.

Dong-Ha Lee received the M.S. and Ph.D. degrees in Electronic Engineering from Kyungpook National University in 2001 and 2005, respectively. Since 2005, he has been a Principal Researcher with the Daegu Gyeongbuk Institute of Science & Technology, Rep. of Korea. His current research interests include renewable energy, robotics, and wellness convergence technology.

Yoon-Gu Kim received the M.S. and Ph.D. degrees in Computer Engineering from Yeungnam University in 2005 and 2008, respectively. Since 2008, he has been a Senior Researcher with the Daegu Gyeongbuk Institute of Science & Technology, Rep. of Korea. His current research interests include mobile robotics, autonomous vehicles, multi-robot systems, and industrial robots.

Nicholas R. Gans received the M.S. in Electrical and Computer Engineering and his Ph.D. in Systems and Entrepreneurial Engineering from the University of Illinois at Urbana-Champaign in 2002 and 2005, respectively. He is currently a Clinical Associate Professor at The University of Texas at Dallas. His research interests include nonlinear and adaptive control and optimization, with a focus on vision-based control and estimation, robotics, and autonomous vehicles.

Highlights

Quaternion formulation for the relative camera pose estimation problem.
Estimating relative camera motion from captured images with high robustness to noise.
Finding all solutions of a multi-variable polynomial system.
Comparing state-of-the-art five-point camera pose estimation algorithms.
Vision-based robot navigation via position-based visual servoing.