Sample-based adaptive Kalman filtering for accurate camera pose tracking

Sample-based adaptive Kalman filtering for accurate camera pose tracking

Neurocomputing 333 (2019) 307–318 Contents lists available at ScienceDirect Neurocomputing journal homepage: www.elsevier.com/locate/neucom Sample-...

5MB Sizes 0 Downloads 75 Views

Neurocomputing 333 (2019) 307–318

Contents lists available at ScienceDirect

Neurocomputing journal homepage: www.elsevier.com/locate/neucom

Sample-based adaptive Kalman filtering for accurate camera pose tracking Akbar Assa a,∗, Farrokh Janabi-Sharifi b, Konstantinos N. Plataniotis a a b

Department of Electrical and Computer Engineering, University of Toronto, Toronto, ON,Canada Department of Mechanical and Industrial Engineering, Ryerson University, Toronto,ON, Canada

a r t i c l e

i n f o

Article history: Received 27 March 2018 Revised 7 September 2018 Accepted 30 November 2018 Available online 2 January 2019 Communicated by Bin Fan Keywords: Adaptive Kalman filtering Vision-based pose estimation Covariance sampling

a b s t r a c t Inferring the camera pose with high accuracy is deemed crucial in many applications such as robot manipulation and virtual reality. Traditionally, features extracted from camera images are processed by Kalman-based schemes due to their high efficacy and fast response. Yet, the performance of the filter deteriorates if the parameters of the filter are uncertain. Estimating the noise parameters through conventional adaptive schemes alleviates this problem, yet the estimation’s accuracy still suffers from rough parameter estimations. To improve the accuracy of pose estimations further, this work proposes a novel adaptive scheme which employs a multi-model approach to approximate the system posteriori via sampling the noise and initial state covariance priories and progressively adjusting the filter parameters using the drawn samples. The experimental results confirm the enhanced performance of the proposed adaptive method compared to previously applied adaptive and non-adaptive pose estimation schemes, at the expense of additional complexity. © 2018 Elsevier B.V. All rights reserved.

1. Introduction Obtaining the pose of an object via its image, also known as “extrinsic camera calibration”, is a historic problem in computer vision [1] and has been vastly considered in many applications including robotic object manipulation [2], visual odometry [3], visual SLAM (Simultaneous Localization And Mapping) [4], virtual reality [5], and visual servoing [6]. Most of these tasks require a high estimation accuracy which is constrained by the performance of the sensor (camera) and the exploited pose estimation algorithm. This paper addresses the need for an accurate pose estimation method by presenting a sample-based adaptive scheme. Numerous methods have been proposed to estimate the relative pose of an object of interest with respect to the camera. Perspective from n-Points (PnP) provides a linear solution for the problem using the relative geometry of points [7,8]. Closed-form linear solutions of the problem was provided by likes of [9–11]. In addition, optimization-based solutions such as bundle adjustment (which is popular in structure from motion applications [12]) were proposed employing Levenberg–Marquardt (LM) [13] and Virtual Visual Servoing (VVS) [14] algorithms. However, these methods are mostly



Corresponding author E-mail addresses: [email protected] (A. Assa), fsharifi@ryerson.ca (F. JanabiSharifi), [email protected] (K.N. Plataniotis). https://doi.org/10.1016/j.neucom.2018.11.083 0925-2312/© 2018 Elsevier B.V. All rights reserved.

sensitive to the impact of image noise induced by image formation and feature extraction. Visual odometry methods (e.g., [15]) exploit the temporal information of images to estimate the pose parameters. Nevertheless, these methods usually suffer from a drift of estimations caused by accumulation of sequential motion errors [16]. Bundle adjustment was exploited to mitigate this shortcoming, however the effect of image noise persists. More recently, a neural network-based learning scheme was proposed for pose estimation [17], yet this method requires a large data set to train the network. Besides, the accuracy of its estimations are relatively low as it roughly approximates the value of object pose from the training set. Bayesian filtering offers an interesting solution for the pose estimation problem by providing the posterior probability density of the object pose, considering both the image noise and the prediction of the pose based on its previous estimation. However, the exact solution of such approach is usually not tractable except in special cases (e.g., where the system is linear and system noise has a normal distribution). Kalman filter offers the optimal linear solution to the Bayesian filtering problem and has gained popularity in real-time pose tracking systems due to its high efficacy and low computational cost [18]. Since the image features are nonlinear functions of the object pose, extensions of the Kalman filter to nonlinear system, namely Extended Kalman Filter (EKF) [19,20], Unscented Kalman Filter (UKF) [21] and their iterative versions (e.g., Iterative EKF [22]) are exploited for pose estimation.

308

A. Assa, F. Janabi-Sharifi and K.N. Plataniotis / Neurocomputing 333 (2019) 307–318

When the measurement and process noise as well as the initial pose values are known, these methods provide near-optimal results. However, such information is usually not available a priori, which causes the accuracy of the estimations to diminish. Among these parameters, the measurement noise covariance plays the key role as it regulates the impact of new visual measurements on the update of the pose estimations [23]. Various adaptive schemes were proposed in previous works to address the filter parameter uncertainty. The purpose of an adaptive filter is to bring the accuracy of estimations as close as possible to that of the correctly tuned filter [24]. An earlier work [25] classified these methods in four groups of Bayesian [26], Maximum Likelihood (ML) [27], correlation-based [28], and covariance matching techniques [29]. The latter approach specially gained popularity in robotic applications [30–32] due to its simplicity and compatibility with nonlinear systems. A neural network-based adaptive method was proposed for tracking in [33] which could adjust a tuning parameter for the measurement noise. Yet, the method needed to be trained and cannot provide accurate results in more complex scenarios. An adaptive Kalman filter based on Expectation-Maximization (EM) was recently proposed for localization of an autonomous underwater vehicle [34], yet their adaptation method was iterative. In addition, Multiple Model Adaptive Estimation (MMAE) schemes (e.g., [35,36]) were considered for noise parameter estimation, however, the accuracy of these schemes is dependent on the closeness of the chosen models to the real model of the system. Exploiting the noise covariance distribution, Maximum A Posteriori (MAP) [37] and Variational Bayesian (VB) [38,39] adaptive approaches were also proposed. Nevertheless, the aforementioned methods considered only the mean of the noise distribution, which led to partial loss of accuracy. Most of the aforementioned adaptive schemes are designed for linear systems and their performance is affected by nonlinearities of the real systems. In that sense, highly efficient adaptive Kalman filtering in real applications is still an open research problem. This work proposes a multi-model adaptive approach for the pose tracking problem under unknown system noise and initial value statistics by sampling the covariance prior distributions of system process and measurement noise, as well as its initial value. Using each sample, a model of the system is developed and the posteriori of each model is determined through an EKF, while the likelihoods of the models are calculated exploiting the measurements at each time step. The estimations gained through these models are then averaged using their likelihood. Moreover,the information gained from these models is used to estimate the system states and noise statistics. Since the models are based on samples drawn from the updated noise statistics, they gradually get closer to the correct model, resulting in enhanced accuracy of estimations both for system state and noise parameters. It should be noted that an early version of this scheme was proposed in a recent work of ours [23] for adaptation of only the measurement noise covariance, assuming full knowledge of the process noise and initial state covariances. This work extends the proposed adaptation paradigm to other filter parameters, namely the process noise covariance and the covariance of the system initial state, in addition to measurement noise covariance and applies this methodology to the object pose tracking problem. The proposed method is different from a particle filter since the samples from the covariance priories are exploited in this method, while particle filters propagate samples taken from the state priori and posteriori. The proposed adaptive scheme is tested and compared with tuned, untuned and other adaptive Kalman filters as well as a visual odometry method through multiple experiments. The experimental results certify the performance of the proposed method to be close to that of the tuned filter and superior to that of other adaptive methods and visual odometry, at the price of extra computations.

The rest of the paper is organized as follows. In the following section, the pose estimation problem is described and the employed models and assumptions are discussed. The details of the adaptive nonlinear filtering proposed for pose estimation is presented in Section 3, where the algorithm is explained in steps. Experimental results of the system are presented in Section 4, where the superiority of the proposed method is certified. Finally, the paper is concluded in Section 5. 2. Problem Formulation The goal of this work is to continuously estimate the pose of the object with respect to camera, exploiting the images of a camera. The camera may be installed on the robot, with a direct vision of an object of interest or set aside with a direct vision of a moving object (e.g., installed on the robot). The relative pose of the object with respect to the camera may be represented minimally by a 6 × 1 vector (denoted by ρco) which is composed of the relative translation vector from object frame to camera frame



T

(denoted by tco = xco yco zoc ) and minimal representation of the rotation matrix between object and camera frames (denoted



by co) in terms of the Euler angles, ϕco = φoc

ρco



tcoT

T ϕcoT )

θoc

ψoc

T

(i.e.,

= [40]. At each time step, the image features are extracted and related to the 3D model of the object. Various image features may be used for that matter, however point features are most common due to their fast and robust extraction(e.g., see [41]. The location of an object point in the camera frame is calculated through the homogeneous transformation [40],

pcj = opoj + tco c

(1)



 c T

ycj z j and poj are vectors of the Cartesian powhere pcj = xcj sitions of the jth point of the object in the camera and object frames, respectively. The projection of this point on image plane is found assuming the pin-hole camera model,



uj

 f v j = c xcj zj

ycj



(2)

where f is the  camera focal length acquired through camera calibration and u j v j are the image plane coordinates obtained from the image features and camera calibration parameters. The vector of system states is defined as the aggregated vector of object relative pose and its time derivatives,

 ω = ρcoT

ρ˙ coT

T

(3)

Here ω ∈ R12 is the vector of system states. A constant acceleration model is considered for state transition [42],

ωk+1 = Fωk + qk

(4)

where F is the transition matrix of the system and qk is the zero-mean white Gaussian process noise with covariance of Qk , (qk ∼ N (0, Qk )). If the dynamics of the system is not known a priori, F may be selected to be a double integrator [42]. System’s initial pose is assumed to have a normal distribution, i.e., ω0 ∼ ˆ 0 , P0 ), the mean of which is obtained through a closed-form N (ω pose estimation algorithm (e.g., [11]). Then, the object pose at each ˆ k , Pk ), the mean time step follows a normal distribution, ωk ∼ N (ω and covariance of which is sought. Once an image is acquired, the relevant image features are extracted and then exploited to gain information about the system states. The image features are related to the relative pose of the object with respect to the camera as follows:

ξ k = h ( ω k ) + rk 

T

(5)

where ξ k = u1,k v1,k , . . . , um,k vm,k is the 2m × 1 vector of measurements, m is the number of feature points, and rk is

A. Assa, F. Janabi-Sharifi and K.N. Plataniotis / Neurocomputing 333 (2019) 307–318

the zero-mean white Gaussian measurement noise with covariance of Rk , (rk ∼ N (0, Rk )). It is assumed that ω0 , qk , and rk are independent random vectors. The function h(ωk ) in (5) is the nonlinear measurement mapping function, which is calculable using (1) and (2). The full expression of h(ωk ) and its derivation may be found in [43]. 3. Sample-based adaptive nonlinear Kalman filtering In order to estimate the pose of an object with respect to the camera, the posteriori of the pose parameters is sought. Assuming prior knowledge of system noise statistics, Kalman filter estimates the posteriori. However, when noise parameters are unknown, the posteriori is formulated by marginalizing the joint posteriori over all unknown parameters,

p(ωk |Yk ) =





Rk



Qk

P0

p(ωk , Rk , Qk , P0 |Yk )dP0 dQk dRk

(6)

p( P 0 ) =

309

NP 1  δ (P0 − Pi0 ) NP

(12)

i=1

where δ is the Dirac delta function, NQ and NP are the number of samples drawn from the process noise and initial state covariance distributions respectively, Qik ∼ IW (VQ,k , vQ ) and Pi0 ∼ IW (VP0 , vP ) are the ith random samples drawn from these distributions. Here VQ,k = (vQ − 13 )Qk−1 and VP0 = (vP − 13 )Pˆ0 . The number of samples should be selected large enough to provide a good approximation of these priories, yet not very large as it burdens fast estimation of the states. The posteriori of covariance matrices are calculated by applying the Bayesian rule,

p(Qk |Yk ) =cQ .p(ξ k |Qk , Yk−1 ) p(Qk |Yk−1 )

 p(P0 |Yk ) =

cP .p(ξ 1 |P0 ) p(P0 ) 1

(13)

k=1 k>1

(14)

where Yk = {ξ 1 , . . . , ξ k } is the set of observed measurements of all times. Since Rk , Qk , and P0 are independent random matrices, the joint distribution is calculated as follows:

where cQ and cP are normalizing factors. Employing (10)–(14) in the marginalization of the state priori yields,

p(ωk , Rk , Qk , P0 |Yk )

p(ωk |Yk−1 ) =

= p(ωk |Rk , Qk , P0 , Yk ) p(Rk |Yk ) p(Qk |Yk ) p(P0 |Yk )

(7)

The first term in the right hand side of (7) is the conditional posteriori, which is calculated through Bayesian filtering,

p(ωk |Rk , Qk , P0 , Yk ) =cω .p(ωk |Qk , P0 , Yk−1 ) p(ξ k |ωk , Rk , Yk−1 )

= (8)

where cω is a normalizing factor. Exploiting (7) and (8) in (6) yields a decoupled formulation,

p(ωk |Yk ) = cω .



Rk

p(ξ k |ωk , Rk , Yk−1 ) p(Rk |Yk )dRk



Qk







Qk

P0

p(ωk |Qk , P0 , Yk−1 ) p(Qk |Yk ) p(P0 |Yk )dP0 dQk

⎧ N Q NP j i T i ⎪ ⎪ j=1 i=1 wQ wiP N (Fωˆ k−1 ,FP0 F +Qk ) ⎪ ⎪ ⎪ NQ N P ⎪ ⎨ w j wi Q ⎪ ⎪ ⎪ j=1 ⎪ ⎪ ⎪ ⎩ N

Q

j=1 i=1

j=1

P0

(9)

Here, marginalizing over P0 and Qk results in the state prediction conditional on previous image measurements, while marginalizing over Rk leads to estimation update using the most recent measurements. The details of state prediction and update exploiting Kalman filtering is discussed in the sequel. It is worth noting that since a multi-model adaptive scheme is proposed, the adaptation occurs at all three stages of state prediction, state update, and noise parameter update, and not just the latter stage.

wiQ = p(ξ k |Qik , Yk−1 )



exp =

−1 2

ξ k − h(ωˆ k|k−1 )2

−1

(Ck FPk−1 FT CTk +Ck Qik CTk +Rk )

(2π ) |Ck FPk−1 F

T

m

and,



wiP = p(ξ 1 |Pi0 ) =

exp

−1 2

CTk

1

(10)

However, when the covariances of initial state and process noise are unknown, marginalization over these values are required. It is well known that the covariance of a normal random vector follows the Inverse Wishart (IW) distribution [44] and since the initial state and the process noise are normal random vectors, their covariance matrices follow the IW probability distribution (denoted by IW (V, v ), where V is the scaling matrix and v is its degree of freedom). The mean of the distribution (denoted by M) is calculated using these parameters (i.e., M = v−nVp −1 , where np is the size of the aforementioned normal random vector). The priories of these covariance matrices are approximated by finite number of samples, drawn from their IW distribution, NQ

1  δ (Qk − Qik ) NQ i=1

(11)

−1

(Ck FPi0 FT CTk +Ck Qk CTk +Rk )

( 2π ) | m

Ck FPi0 FT CTk

+

Ck Qk CTk

+ Rk |

1 2

(17)

ωˆ k|k−1

The state prediction in Kalman filter is performed by calculating the state priori mean and covariance through (4), given the initial state and process noise covariance matrices,

(16)

+ Ck Qik CTk + Rk | 2

ξ 1 − h(ωˆ 1|0 )2

  Here a2B = aT Ba, Ck = ∂ h∂(ωω ) 

3.1. State prediction

p(Qk |Yk−1 ) =

k>1

wQj

where,

×p(ωk |Qk , P0 , Yk−1 ) p(Qk |Yk ) p(P0 |Yk )dP0 dQk

p(ωk |Qk , P0 , Yk−1 ) = N (Fωk−1 , FPk−1 FT + Qk )

(15)

ˆ k−1 ,FPk−1 FT +Qik ) wQj N (Fω NQ

k=1

P

, and |A| dontes the determi-

nant of matrix A. The calculation details of Ck may be found in [43]. It was shown in (15) that the state priori is represented by a MoG. In order to reduce the computational cost of the estimations, the resultant MoG is approximated withthe closest normal distribution in a Mean Squared Error (MSE) sense, exploiting the method suggested in [45],

ˆ k|k−1 , Pk|k−1 ) p(ωk |Yk−1 ) ≈ N (ω

(18)

ˆ k|k−1 = Fω ˆ k−1 and, where ω

Pk|k−1 ≈

⎧ N Q NP j i T i ⎪ ⎪ j=1 i=1 wQ wiP (FP0 F +Qk ) ⎪ ⎪ ⎪ NQ N P ⎪ ⎨ wQj wiP

k=1

j=1 i=1

⎪ ⎪ ⎪ ⎪ ⎪ FP FT + ⎪ ⎩ k−1

NQ j=1

wQj Qik

NQ j=1

(19) k>1

wQj

In other words, the mean of the state priori is similar to that of the regular Kalman filter, while its covariance is found based on weighted samples of the covariance prior distributions.

310

A. Assa, F. Janabi-Sharifi and K.N. Plataniotis / Neurocomputing 333 (2019) 307–318

3.2. State Update

Therefore, the resultant MoG is reduced to a single Gaussian distribution in a minimum MSE sense,

Next, the image features are used to update the states by applying the Bayesian rule discussed in (8). For that matter, the conditional probability of the measurements is exploited,

ˆ k | , Pk ) p(ωk |Yk ) = N (ω

p(ξ k |ωk , Rk , Yk−1 ) = N (h(ωk ), Rk )

ωˆ k ≈

(20)

Since the measurement mapping function (i.e., h(ωk )) is nonlinear, a linearized extension of Kalman filter (e.g., EKF) is utilized to update the system states, assuming prior knowledge of Rk . When this information is not available, the conditional probability of measurements should be marginalized as it was shown in (9). Once again, the measurement noise covariance is treated as a random variable with prior IW distribution. The measurement noise covariance priori distribution is approximated with random samples drawn from this distribution,

p(Rk |Yk−1 ) =

NR 1  δ (Rk − Rik ) NR

(21)

Pk ≈

NR  i=1 NR 

(29)

ˆk w¯ iR ω i

(30)



ˆ k )(ω ˆ k )T ˆk −ω ˆk −ω w¯ iR Pik + (ω i

i

(31)

i=1

The pose estimation at each time step is considered to be the ˆ k ). The pose estimation mean of the posteriori distribution (i.e., ω and its error covariance are then used at next time step in estimation of new pose values. The MoG form of the posteriori derived in (25) is also used to adapt the noise parameters, which is discussed in the sequel. 3.3. Noise parameter update

i=1

where NR is the number of samples and Rik ∼ IW (VR,k , vR ) is the ith random sample from the IW priori with VR,k = (vR − 2m − 1 )Rk−1 . The posteriori of the measurement noise covariance is acquired by applying the Bayesian rule,

p(Rk |Yk ) = cR .p(ξ k |Rk , Yk−1 ) p(Rk |Yk−1 )

(22)

where, cR is a normalizing factor. Then, the conditional probability of the measurement is calculated by marginalization over Rk as follows:

p(ξ k |ωk , Yk−1 ) =



Rk NR

=

i=1

p(ξ k |ωk , Rk , Yk−1 ) p(Rk |Yk )dRk wiR N

(h ( ω )

i k , Rk

NR i=1

where,

exp wiR = p(ξ k |Rik , Yk−1 ) =

ξ k − h(ωˆ k|k−1 )

2

(

−1 Ck Pk|k−1 CTk +Rik

)

ˆk ≈ 

(2π )m |Ck Pk|k−1 CTk + Rik | 2

1

p(ωk |Yk ) = cω .p(ωk |Yk−1 ) p(ξ k |ωk , Yk−1 ) =

i=1

i=1

=

NR 

wiR

ˆ k , Pik ) w¯ iR N (ω i

(25)

i=1

where w¯ iR =

wiR NR i=1

ˆ k and Pik are is the ith normalized weight, and ω i

wiR

found through EKF update formulation,

Kik = Pk|k−1 CTk (Ck Pk|k−1 CTk + Rik )−1

(26)

ωˆ = ωˆ k|k−1 + Kik (ξ k − h(ωˆ k|k−1 ))

(27)

Pik = (I12 − Kik Ck )Pk|k−1 (I12 − Kik Ck )T + Kik Rik Kik T

(28)

i k

NR  i=1 NR 

ˆ k ))(ξ k − h(ω ˆ k ))T w¯ iR (ξ k − h(ω

(34)

ˆ k|k−1 )(ω ˆ k|k−1 )T ˆk −ω ˆk −ω w¯ iR (ω

(35)

i

i

i

i

i=1

The noise parameters are then estimated by exploiting (32)–(35) and averaging over previous samples as follows:

  ˆ k + Ck Pk CTk ˆ k = 1 ( k − 1 )R ˆ k−1 +  R k   ˆ k + Pk − FPk−1 FT ˆ k = 1 ( k − 1 )Q ˆ k−1 +  Q k

ˆ k|k−1 , Pk|k−1 )N (h(ωk ), Rik ) wiR N (ω NR

(33)

ˆk ≈ 

Using (18) and (23), the state posteriori is estimated as follows:

cω .

Qk = k + Pk − FPk−1 FT



(24)

NR

(32)

Unlike [46] that suggested to approximate k and  based on previous samples, this work proposes to estimate these matrices employing the Gaussian elements of the MoG form of the state posteriori calculated in (27),

wiR

−1 2

Rk = k + Ck Pk CTk

(23)

)



The measurement noise covariance is estimated through the approximation of the measurement noise, which is represented by ˆ k ), while the process noise cothe innovation term, rˆ k = ξ k − h(ω variance is approximated exploiting the estimate of the process noise, defined as the difference between the state prediction and ˆ k|k−1 . It was shown in [46] that the system ˆk −ω its update, qˆ k = ω noise covariance is related to the covariances of rˆ k (denoted by k ) and qˆ k (denoted by ),

The formulation of the state posteriori presented in (25) suggests the calculation of a MoG for state calculation. However, it is wellknown from (4) that the state posteriori has a normal distribution.

(36) (37)

The estimation of noise parameters are then used in the estimations of pose at next time step. The proposed adaptive pose estimation algorithm is summarized in Algorithm 1. 4. Experimental results The goal of this section is to test the proposed adaptive method under various conditions and discuss its performance as compared to other available pose estimation methods. In order to demonstrate the independency of the proposed method from the experimental dataset, various datasets are used in the experiments. In particular, data gathered from a maneuvering robot-mounted camera is exploited as well as the dataset used in [47]. The experimental setup used to evaluate the proposed method is composed of a robotic arm, two cameras and two objects of interest. This setup is depicted in Fig. 1. To be more precise, two Firefly cameras from FLIR Integrated Imaging Solutions (Vancouver, BC,

A. Assa, F. Janabi-Sharifi and K.N. Plataniotis / Neurocomputing 333 (2019) 307–318

311

Fig. 1. Experimental setup.

Fig. 2. Relative object pose trajectories using each dataset.

designed to model fast dynamics of the system by implementing sudden changes in the direction of the motion for each degree of freedom. Compared to dataset1, dataset 2 offers less variation in depth direction (zoc ), but more variation in the orientation parameters. The true pose of the object with respect to the corresponding camera is measured using an Optotrak Certus® optical tracker from NDI (Waterloo, ON, Canada) with an accuracy of 0.1 mm. For that purpose, optical markers of the tracker are installed both on the robot and the objects of interestat predetermined locations. The initial state of the system (ω0 ) is set by measuring the initial pose of the object employing the POSIT [11] algorithm. The transition matrix (F) is set to be a double integrator with time step 1/60. The measurement noise covariance (R) is measured statistically using samples taken from the static object over a period of 10 seconds, while the process noise covariance (Q) cannot be measured directly and is set by the maximum variation of pose parameters in the experiment. In order to ensure the effectiveness of the measurements, the initial state covariance (P0 ) is selected arbitrarily large. The number of covariance samples (NP , NQ , and NR ) are selected similar to [23], while the IW degrees of freedom (vQ , vR , and vP ) are selected slightly larger than their np to promote sampling diversity. The filter parameters are identified and set statistically independent of the experimental setup as follows:



Canada) are utilized for pose estimation, one is mounted on a 5DOF A255 robotic arm from CRS Robotics (Burlington, ON, Canada) and one is set off the robot. The cameras are calibrated and provide 640 × 480 pixel images from the objects of interest at the rate of 60 frames per second. The data observed of object 1 using camera 1 is recognized as dataset1 and the data collected by camera 2 from object 2 is called dataset2. The former dataset will be used in the first three experiments, while the latter is used in the fourth experiment to certify the independency of the proposed method of the dataset. Each object is composed of 4 circular coplanar circular features located at the vertices of a square with known side to facilitate the feature extraction performed by applying blob detectors. The markers are manually identified in the first image frame, and then tracked in consecutive frames exploiting a search window. The robot has a repeatability of ± 0.05 mm and is controlled in real-time using the QUARC® software from Quanser (Markham, ON, Canada), which works in MATLAB® environment provided by Mathworks (Natick, MA, USA). The robotic arm is maneuvered along the predefined trajectories resulting in object pose trajectories shown in Fig. 2, which are then tracked independently exploiting the point features (center of the blobs) extracted from the camera image at each time step. The robot trajectories are

F=

I6 06

0.016I6 I6

P0 = 100I12





Q=

0.05I6 06

06 06



R = 10−5 I8

vQ = vP = 20, vR = 15 NP = NQ = NR = 10 ω0 = 10−3 [14.1 23.6 208.9 −3.7 −16.1

32.6]T

The goal of the first experiment is to investigate the effect of the measurement noise covariance maladjustment on filter’s accuracy. For that matter, dataset1 is used and the measurement noise covariance is selected 10 0 0 times larger than its identified value to verify the effectiveness of the proposed method on such filter parameter selections. The evolution of the diagonal elements of the measurement noise covariance using the proposed adaptive method is depicted in Fig. 3 in logarithmic scale. The selected covariance samples continuously approach the true parameter, causing the estimated noise covariance to get closer to its true value gradually. In addition, the performance of the proposed Adaptive Nonlinear Kalman Filter (denoted by ANKF) is compared with that of the Tuned and Untuned Extended Kalman Filters (denoted by TEKF and UEKF respectively), the Covariance Matching adaptive extended Kalman Filter [30,31] (denoted by CMKF), the Variational

312

A. Assa, F. Janabi-Sharifi and K.N. Plataniotis / Neurocomputing 333 (2019) 307–318

Algorithm 1 Adaptive nonlinear Kalman filtering. ˆ 0, R ˆ 0 , Pˆ 0 , ω ˆ 0 , vP , vQ , vR , NP , NQ , NR Inputs: F, Q for k=1:T do State Prediction: 3: ωˆ k|k−1 = Fωˆ k−1 4: 1:

2:

5: 6: 7: 8: 9:

 

Ck = ∂ h∂(ωω )  ωˆ

k|k−1

ˆ k−1 ˆ Q,k = (vQ − 13 )Q V ˆ0 ˆ P = (vP − 13 )P V 0

for i=1:NQ do ˆ Q,k , vQ ) Qi ∼ IW (V k

wiQ = p(ξ k |Qik , Yk−1 ) if k = 1 then for j = 1 : NP do ˆ P , vP ) Pi0 ∼ IW (V 0

10: 11: 12: 13:

wiP = p(ξ 1 |Pi0 )

14:

NQ N P

Pk|k−1 =

15:

j wQ wiP (FPi0 FT +Qik ) j=1 i=1 NQ N P j wQ wiP j=1 i=1

end for else

16: 17:

NQ

T

Pk|k−1 = FPk−1 F +

18:

j=1 NQ

20: 21: 22: 23: 24:

j



Mean

ωˆ ik = ωˆ k|k−1 + Kik (yk − h(ωˆ k|k−1 )) Pik = (I − Kik Ck )Pk|k−1 (I − Kik Ck )T + Kik Ri KiT k wiR = p(ξ k |Rik , Yk−1 )

27: 28: 29:

end for

30:

w¯ iR

=

NR

wiR

State Update: NR ωˆ k = w¯ iR ωˆ ik i=1

NR



ˆ k )(ω ˆ k )T ˆk −ω ˆk −ω w¯ iR Pik + (ω i

i

Pk =

34:

Measurement and Process Noise Covariance Update: NR i i ˆk = ˆ k ))(ξ k − h(ω ˆ k ))T w¯ iR (ξ k − h(ω 

36:

ˆ =  k

37:

ˆk = R

38:

ˆk = Q

39:

end for

i=1

exco (mm)

eyco (mm)

ezoc (mm)

eφoc (deg)

eθoc (deg)

eψoc (deg)

TEKF UEKF VO VBKF CMKF ANKF-R ANKF TEKF UEKF VO VBKF CMKF ANKF-R ANKF TEKF UEKF VO VBKF CMKF ANKF-R ANKF

4.7 5.1 10.2 7.7 5.0 4.9 4.9 0.8 0.8 1.6 0.8 0.9 0.9 0.9 0.9 0.9 1.9 1.0 0.9 0.9 0.9

2.8 2.6 10.7 2.4 2.8 2.8 2.8 0.5 0.5 1.0 0.5 0.5 0.5 0.5 0.5 0.4 1.2 0.5 0.4 0.5 0.5

2.3 4.9 3.9 3.2 2.5 2.0 2.0 0.3 0.8 0.5 0.4 0.5 0.3 0.3 0.3 0.9 0.6 0.5 0.5 0.3 0.3

0.6 0.9 0.8 0.7 0.9 0.6 0.6 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.2 0.1 0.1

1.8 5.3 1.6 2.6 3.5 1.5 1.6 0.5 1.5 0.5 0.6 0.8 0.4 0.5 0.3 1.3 0.3 0.4 0.6 0.3 0.3

2.4 6.9 3.0 2.3 7.0 2.5 2.9 0.5 1.7 0.5 0.4 0.9 0.5 0.5 0.5 1.6 0.5 0.3 1.2 0.5 0.5

33:

35:

STD

wiR

i=1

32:

Max

Kik = Pk|k−1 CTk (Ck Pk|k−1 CTk + Ri )−1

26:

Method

j

wQ

end if end for Covariance Sampling: ˆ k−1 ˆ R,k = (vR − 2m − 1 )R V for i = 1 : NR do ˆ R,k , vR ) Ri ∼ W −1 (V

25:

31:

Table 1 Pose estimation error statistics of investigated methods in presence of measurement noise uncertainty.

wQ Qik

j=1

19:

Fig. 3. Evolution of the measurement noise covariance diagonal elements using the proposed adaptive method.

i=1 NR

i i ˆ k|k−1 )(ω ˆ k|k−1 )T ˆk −ω ˆk −ω w¯ iR (ω i=1   1 ˆ k + Ck Pk CT (k − 1 )Rˆ k−1 +  k k  1 ˆ + P − FP F T (k − 1 )Qˆ k−1 +  k k k−1 k

method which does not consider the system noise and therefore is needless of such tunings. While the proposed estimator minimizes the MSE, it is customary to use the first norm for pose error evaluation (e.g., see [30]), as it is more tangible in real applications. Therefore, the estimation error of each method (denoted



T

eyco ezoc eφoc eθoc eψoc ) is calculated as the by eρ = exco absolute value of the difference between their estimation and the true pose measured by the optical tracker,

ˆ o,k |, eρ ,k = |ρT,k − ρ c

ˆ o,k are the poses measured and estimated by where ρT, k and ρ the optical tracker and the algorithm of choice, respectively. The statistics of the estimation errors are briefed in Table 1 for each method, while the estimation error values are depicted in Fig. 4 for adaptive methods and in Fig. 5 for non-adaptive schemes as compared to ANKF. To better highlight the effectiveness of the proposed method, the estimation errors are demonstrated around sharp turns (happening at about second 5). As it was anticipated, ANKF and ANKF-R share the best performance, while CMKF perc

Bayesian adaptive extended Kalman Filter [38] (denoted by VBKF), a VVS-based Visual Odometry method [15] (denoted by VO), and an early version of ANKF, which only estimates the measurement noise covariance [23] (denoted by ANKF-R). The TEKF uses the identified noise covariance values, while other methods (including UEKF) start with the altered noise covariance matrix. It is worth mentioning that the VO method exploits a nonlinear optimization

(38)

A. Assa, F. Janabi-Sharifi and K.N. Plataniotis / Neurocomputing 333 (2019) 307–318

313

Fig. 6. Evolution of the process noise covariance diagonal elements using the proposed ANKF. Table 2 Pose estimation error statistics of investigated methods with uncertain process noise.

Fig. 4. Pose estimation error of ANKF, CMKF, VBKF, and ANKF-R in presence of measurement noise uncertainty.

Max

Mean

STD

Fig. 5. Pose estimation error of TEKF, UEKF, VO, and ANKF in presence of measurement noise uncertainty.

forms better in xco and yco direction and VBKF handles orientation parameters better. Both methods introduce larger depth errors (ezoc ) as compared to the proposed method. On the other hand, the translational errors of VO method are comparatively large and the UEKF has the worst response, especially in estimating the orientation parameters. Thanks to the proposed adaptive scheme, estimations of ANKF are closest to those of TEKF.

Method

exco (mm)

eyco (mm)

ezoc (mm)

eφoc (deg)

eθoc (deg)

eψoc (deg)

TEKF UEKF VO VBKF CMKF ANKF-R ANKF TEKF UEKF VO VBKF CMKF ANKF-R ANKF TEKF UEKF VO VBKF CMKF ANKF-R ANKF

4.9 5.1 10.2 7.6 4.9 4.9 4.9 0.8 0.8 1.6 1.0 0.9 0.9 0.9 0.9 0.9 1.9 1.3 0.9 0.9 0.9

2.8 2.6 10.7 2.9 2.8 2.8 2.8 0.5 0.5 1.0 0.6 0.5 0.5 0.5 0.5 0.4 1.2 0.6 0.4 0.5 0.4

2.3 4.9 3.9 3.6 2.8 2.0 2.5 0.3 0.8 0.5 0.5 0.4 0.3 0.3 0.3 0.9 0.6 0.7 0.4 0.3 0.4

0.6 0.9 0.8 1.0 1.0 0.6 0.6 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.2 0.2 0.1 0.1

1.8 5.3 1.6 3.2 2.0 1.5 2.3 0.5 1.5 0.5 0.6 0.6 0.4 0.6 0.3 1.3 0.3 0.5 0.4 0.3 0.4

2.4 6.9 3.0 2.2 6.0 2.6 2.3 0.5 1.7 0.5 0.4 0.7 0.5 0.6 0.5 1.6 0.5 0.3 0.9 0.5 0.5

The purpose of the second experiment is to evaluate the performance of the proposed adaptive filter in presence of process noise covariance uncertainty with respect to other previously mentioned methods. For that matter, exploiting dataset1 the initial process noise covariance is selected 10 0 0 times smaller than its set value. The effect of the proposed ANKF on diagonal elements of the process noise covariance are demonstrated in Fig. 6 in logarithmic scale. It it noticeable that the first six elements (shown in solid lines) approach the identified value of 0.05, while the rest (depicted in dashed lines) are becoming very small (almost zero), due to the sampling and update of the noise covariance prior distribution. Moreover, the proposed adaptive method estimations are compared with those of other aforementioned methods in tracking the pose of the object, similar to the previous experiment. The statistics of the absolute errors between the pose estimations and the true pose values along the entire robot trajectory are summarized in Table 2. These errors are depicted between second 4.5 and second 5.5 in Fig. 7 for adaptive methods and in Fig. 8 for non-adaptive schemes compared to the proposed ANKF. All adaptive methods are capable of handling this change, yet, ANKF and

314

A. Assa, F. Janabi-Sharifi and K.N. Plataniotis / Neurocomputing 333 (2019) 307–318

Fig. 9. Pose estimation error of ANKF, CMKF, VBKF, and ANKF-R with uncertain measurement and process noise. Fig. 7. Pose estimation error of ANKF, CMKF, VBKF, and ANKF-R in presence of process noise uncertainty.

Table 3 Pose estimation error statistics of investigated methods with uncertain process and measurement noise.

Max

Mean

STD

Fig. 8. Pose estimation error of TEKF, UEKF, ANKF, and VO in presence of process noise uncertainty.

ANKF-R seems to be providing slightly more accurate estimations as compared to VBKF and CMKF. It is intriguing to see that ANKFR can still perform well by adjusting the measurement noise to compensate for the loss of tuning in process noise covariance. This maladjustment has almost the same effect as previous experiment in UEKF, while ANKF still performs close to TEKF, and much better

Method

exco (mm)

eyco (mm)

ezoc (mm)

eφoc (deg)

eθoc (deg)

eψoc (deg)

TEKF UEKF VO VBKF CMKF ANKF-R ANKF TEKF UEKF VO VBKF CMKF ANKF-R ANKF TEKF UEKF VO VBKF CMKF ANKF-R ANKF

4.9 15.0 10.2 7.6 17.3 5.1 4.9 0.8 1.6 1.6 1.0 1.9 0.8 0.9 0.9 2.4 1.9 1.3 2.9 0.9 0.9

2.8 4.7 10.7 2.8 1.9 2.6 2.8 0.5 1.0 1.0 0.6 0.4 0.5 0.5 0.5 1.1 1.2 0.5 0.3 0.4 0.5

2.3 47.0 3.9 3.6 44.1 5.1 2.0 0.3 7.4 0.5 0.5 4.3 0.9 0.3 0.3 9.5 0.6 0.6 7.8 1.0 0.3

0.6 9.6 0.8 1.0 4.7 1.1 0.6 0.1 2.5 0.1 0.1 0.4 0.1 0.1 0.1 2.7 0.1 0.2 0.8 0.2 0.1

1.8 16.2 1.6 3.0 42.6 5.3 1.6 0.5 6.7 0.5 0.6 4.9 1.6 0.5 0.3 4.4 0.3 0.5 8.6 1.3 0.3

2.4 23.7 3.0 2.2 38.6 7.4 2.9 0.5 11.1 0.5 0.3 2.7 1.8 0.5 0.5 5.5 0.5 0.3 5.7 1.8 0.5

than VO. This experiment certifies that the proposed ANKF is also effective in case of process noise covariance maladjustment. In many of the real-life applications, no prior information about any of the noise covariance matrices is available. Using the dataset1, in the third experiment both the measurement and process noise covariance matrices are altered 10 0 0 times to investigate the effectiveness of the proposed adaptive technique as compared to the aforementioned methods. The results of this comparison for adaptive methods are depicted in Fig. 9, and for other methods in Fig. 10. The statistics of the error for each method is brought in Table 3. Once again, ANKF performance remains close to that of TEKF. The performance of ANKF-R and CMKF deteriorates as compared to previous cases, yet VBKF provides comparable, yet slightly less accurate, estimates to that of ANKF. The errors of UEKF

A. Assa, F. Janabi-Sharifi and K.N. Plataniotis / Neurocomputing 333 (2019) 307–318

315

Fig. 11. Pose estimation error of adaptive methods using dataset2 with uncertain system noise. Fig. 10. Pose estimation error of TEKF, UEKF, ANKF, and VO with uncertain measurement and process noise. Table 4 Pose estimation error statistics of all methods using dataset2 with uncertain process and measurement noise.

Max

Mean

STD

Method

exco (mm)

eyco (mm)

ezoc (mm)

eφoc (deg)

eθoc (deg)

eψoc (deg)

TEKF UEKF VO VBKF CMKF ANKF-R ANKF TEKF UEKF VO VBKF CMKF ANKF-R ANKF TEKF UEKF VO VBKF CMKF ANKF-R ANKF

3.3 19.5 6.3 2.9 8.5 7.6 3.7 0.4 2.8 1.2 0.6 1.0 1.1 0.4 0.4 3.5 1.3 0.5 1.5 1.5 0.4

1.1 8.1 4.8 1.5 4.4 2.7 1.0 0.2 1.1 0.7 0.3 0.4 0.5 0.2 0.2 1.1 0.8 0.3 0.6 0.5 0.2

1.5 39.7 3.0 1.8 15.9 15.1 2.5 0.3 6.9 0.5 0.4 2.1 2.0 0.4 0.3 7.9 0.6 0.4 3.5 3.0 0.4

0.6 12.00 0.9 0.8 7.6 5.7 0.5 0.1 3.9 0.2 0.2 1.0 1.2 0.1 0.1 3.2 0.1 0.1 1.5 1.3 0.1

1.6 12.3 1.5 1.8 5.9 9.7 1.5 0.3 3.8 0.3 0.3 0.8 1.8 0.3 0.3 2.9 0.3 0.3 1.1 2.2 0.3

1.0 15.3 1.2 1.7 17.5 16.0 1.5 0.3 5.5 0.3 0.3 1.9 3.3 0.3 0.2 3.9 0.2 0.2 3.0 3.6 0.2

are much larger as compared to other methods, which emphasizes the necessity of adaptation for such scenarios. In order to highlight the proposed adaptive method independence of the exploited dataset, previous experiment is conducted using a different dataset (dataset2), which is collected with a different camera and object. As it was shown in Fig. 2, the pose measured using this dataset follows similar pattern, though the range and direction of change for pose parameters of this dataset are different. Similar to the previous experiment, the results of the ANKF compared with the adaptive and non-adaptive methods are depicted in Figs. 11 and 12, respectively. The statistics of the error for each method is summarized in Table 4. It is apparent that ANKF

Fig. 12. Pose estimation error of TEKF, UEKF, ANKF, and VO, using dataset2 with uncertain system noise.

provides estimates close to TEKF and superior to all other methods. The performance of VBKF is still acceptable, while other adaptive methods are not performing as well, yet way better than UEKF. The estimations of VO are comparable to those of ANKF in orientation parameters and depth, but not the two others. The fifth experiment is conducted using the dataset from [47] (dataset3) to highlight the independency of the proposed

316

A. Assa, F. Janabi-Sharifi and K.N. Plataniotis / Neurocomputing 333 (2019) 307–318

Fig. 13. The soup box used in the fifth experiment with SIFT features depicted by yellow stars.

method from datasets and features extraction methods. In this experiment, the pose of a soup box is tracked through consecutive images in which the soup box is maneuvered over a room. The features from the box are extracted and matched using the popular SIFT algorithm [48]. The box and the extracted features are shown in Fig. 13. In order to estimate the pose of the box, the filter parameters are reset as follows:



I F= 6 06 P0 = 10I12

0.1I6 I6





0.01I6 Q= 06

06 06

Fig. 14. Pose estimation error of adaptive methods using dataset3 with uncertain system noise.



R = 10−8 I8

vQ = vP = 20, vR = 60 NP = NQ = NR = 10  ω0 = −0.02 0.025 0.32 0.05 −0.03

T

−0.3

The process and measurement noise covariances are altered 100 times to test the performance of the proposed adaptive method in case of system noise uncertainty, similar to experiments 3 and 4. the performance of the proposed method as compared to other adaptive method are depicted in Fig. 14, while its results are compared with non-adaptive methods in Fig. 15. Table 5 summarizes the error statistics of each exploited scheme. Once again, the accuracy level of the proposed ANKF is much higher than other adaptive methods and is almost comparable to the tuned method (i.e., TEKF). In the sixth experiment, the effect of the number of samples taken from covariance priories on the accuracy of estimations is investigated. For that matter, ANKF is employed with the number of samples (i.e. NP , NQ , and NR ) ranging from 2 to 20. Dataset1 is exploited for this experiment. The accuracy of the estimations is averaged over the whole trajectory and 100 Monte Carlo simulation runs. The results of this experiment are shown in Fig. 16. The increase in the number of samples does not affect the estimation accuracy in the x and y directions; however, all other degrees of freedom are benefited from the increase in the number of samples. It is also observed not much improvement is made when the number of samples is increase over 10 samples. Therefore, the number of samples is set to 10 in the experiments. In order to fairly compare the performance of the proposed method in comparison with other counterparts, the computation extensiveness of each algorithm is measured in the seventh experiment. For that purpose, each algorithm is implemented on a PC with 16 GB of RAM and a Core i7-6700 processor that operates at

Fig. 15. Pose estimation error of TEKF, UEKF, ANKF, and VO, using dataset3 with uncertain system noise.

3.4 GHz and the computation time for each pose estimation along the pose trajectory is estimated by averaging the computation time over multiple runs. All algorithms of concern are reproduced in a single-threaded fashion through a common basis and their major difference lies within their rule of adaptation (or estimation in case of VO). The average computation time per estimation is considered as a measure of computational complexity of each method and is reported for the aforementioned algorithms in Table 6. It is worth mentioning that the computation complexity of the UEFK and TEKF

A. Assa, F. Janabi-Sharifi and K.N. Plataniotis / Neurocomputing 333 (2019) 307–318 Table 5 Pose estimation error statistics of all methods using dataset3 with uncertain process and measurement noise.

Max

Mean

STD

Method

exco (mm)

eyco (mm)

ezoc (mm)

eφoc (deg)

eθoc (deg)

eψoc (deg)

TEKF UEKF VO VBKF CMKF ANKF-R ANKF TEKF UEKF VO VBKF CMKF ANKF-R ANKF TEKF UEKF VO VBKF CMKF ANKF-R ANKF

8.3 29.9 12.2 105.8 82.1 15.5 8.8 3.5 11.3 3.9 47.0 30.6 5.4 3.6 2.4 10.3 3.0 38.2 28.8 5.1 2.5

4.6 17.0 11.2 36.8 24.5 17.6 4.5 2.0 5.1 2.9 18.9 10.6 5.2 2.0 1.4 5.6 2.6 16.4 8.6 5.2 1.3

3.5 93.6 2.4 57.1 43.6 51.3 6.9 1.2 25.1 1.2 23.1 14.3 17.2 2.7 0.9 28.7 0.6 21.0 11.9 15.9 2.1

0.8 10.7 0.4 7.8 20.1 11.6 0.6 0.2 2.7 0.1 4.0 8.3 4.9 0.2 0.2 2.2 0.1 2.1 7.8 3.4 0.1

1.6 68.0 1.0 15.3 53.6 71.6 1.9 0.5 28.1 0.4 7.1 23.4 24.4 0.6 0.4 23.8 0.3 4.8 16.1 24.2 0.4

2.0 20.6 1.5 8.2 44.0 45.1 3.5 1.0 6.1 0.6 2.9 25.2 15.0 1.2 0.6 5.9 0.5 2.4 15.1 10.9 0.8

317

computation cost of ANKF may be reduced by selecting less number of samples at the cost of reduced estimation accuracy. 5. Conclusion The performance of the Kalman-based visual pose tracking methods deteriorates when the statistics of the system parameters are not available. Motivated by this fact, this work presented a novel adaptive scheme that approximates the object’s relative pose as well as the noise statistics by drawing samples from the noise prior distributions. These samples were used to form a multimodel adaptive estimator to gain accurate estimations of the pose and filter parameters simultaneously. The experimental results suggested improved estimation accuracy gained by applying the proposed method as compared to other adaptive and non-adaptive techniques. The parameters of the proposed method should be selected carefully to achieve desirable results. The number of samples should be selected large enough to correctly approximate the covariance priori, yet not too large as the computational complexity of the system increases with this number. In addition, the degrees of freedom for the covariance priori are suggested to be chosen small to allow the samples to be as diverse as possible. Acknowledgment This research was supported in part by the Natural Sciences and Engineering Research Council of Canada (NSERC) PDF-487746-2016 to Akbar Assa. References

Fig. 16. Pose estimation error of ANKF versus the number of samples drawn from the noise covariance priories.

Table 6 Computation time per estimation for TEKF/UEKF, VO, VBKF, CMKF, ANKF-R, and ANKF. Method

TEKF /UEKF (ms)

VO (ms)

VBKF (ms)

CMKF (ms)

ANKF-R (ms)

ANKF (ms)

Time

0.2327

0.4355

0.2709

0.8622

1.6902

2.2180

are the same as they are only different in parameter settings. The proposed adaptive scheme requires about 10 times more computations for each pose estimation, which is closely related to the number of samples (models) selected from each covariance prior (NR , NP , and NQ ). This complexity is reduced for ANKF-R as only the measurement noise covariance is sampled there. The computation cost of VBKF is relatively low, while VO requires more calculations due to its iterative nature. However, the extra complexity of ANKF may be compensated by parallel computing techniques, which preserve the time efficiency of the algorithm. Moreover, the

[1] R. Szeliski, Computer Vision: Algorithms and Applications, Springer Science & Business Media, 2010. [2] S.H. Kasaei, M. Oliveira, G.H. Lim, L.S. Lopes, A.M. Tomé, Towards lifelong assistive robotics: a tight coupling between object perception and manipulation, Neurocomputing 291 (2018) 151–166. [3] J. Fabian, G.M. Clayton, Error analysis for visual odometry on indoor, wheeled mobile robots with 3-d sensors, IEEE/ASME Trans. Mech. 19 (6) (2014) 1896–1906. [4] D. Zou, P. Tan, Coslam: Collaborative visual slam in dynamic environments, IEEE Trans. Pattern Anal. Mach. Intell. 35 (2) (2013) 354–366. [5] E. Marchand, H. Uchiyama, F. Spindler, Pose estimation for augmented reality: a hands-on survey, IEEE Trans. Vis. Comput. Graph. 22 (12) (2016) 2633–2651. [6] J. Thomas, G. Loianno, K. Daniilidis, V. Kumar, Visual servoing of quadrotors for perching by hanging from cylindrical objects, IEEE Robot. Autom. Lett. 1 (1) (2016) 57–64. [7] L. Quan, Z. Lan, Linear n-point camera pose determination, IEEE Trans. Pattern Anal. Mach. Intell. 21 (8) (1999) 774–780. [8] V. Lepetit, F. Moreno-Noguer, P. Fua, EPNP: an accurate o(n) solution to the PNP problem, Int. J. Comput. Vis. 81 (2) (2009) 155–166. [9] A. Ansar, K. Daniilidis, Linear pose estimation from points or lines, IEEE Trans. Pattern Anal. Mach. Intell. 25 (5) (2003) 578–589. [10] P.D. Fiore, Efficient linear solution of exterior orientation, IEEE Trans. Pattern Anal. Mach. Intell. 23 (2) (2001) 140–148. [11] D. DeMenthon, L.S. Davis, Exact and approximate solutions of the perspective-three-point problem, IEEE Trans. Pattern Anal. Mach. Intell. 14 (11) (1992) 1100–1105. [12] E. Mouragnon, M. Lhuillier, M. Dhome, F. Dekeyser, P. Sayd, Generic and real– time structure from motion using local bundle adjustment, Image Vis. Comput. 27 (8) (2009) 1178–1193. [13] D.G. Lowe, et al., Fitting parameterized three-dimensional models to images, IEEE Trans. Pattern Anal. Mach. Intell. 13 (5) (1991) 441–450. [14] A. Assa, F. Janabi-Sharifi, Virtual visual servoing for multicamera pose estimation, IEEE/ASME Trans. Mech. 20 (2) (2015) 789–798. [15] J. Li-Chee-Ming, C. Armenakis, Augmenting ViSP’s 3d model-based tracker with RGB-D slam for 3D pose estimation in indoor environments, Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci. 41 (2016) 925–932. [16] D. Scaramuzza, F. Fraundorfer, Visual odometry [tutorial], IEEE Robot. Autom. Mag. 18 (4) (2011) 80–92. [17] R. Kouskouridas, A. Gasteratos, C. Emmanouilidis, Efficient representation and feature extraction for neural network-based 3D object pose estimation, Neurocomputing 120 (2013) 90–100. [18] S. Chen, Kalman filter for robot vision: a survey, IEEE Trans. Ind. Electron. 59 (11) (2012) 4409–4420. [19] J. Wu, R. Rink, T.M. Caelli, V. Gourishankar, Recovery of the 3-D location and motion of a rigid object through camera image (an extended Kalman filter approach), Int. J. Comput. Vis. 2 (4) (1989) 373–394.

318

A. Assa, F. Janabi-Sharifi and K.N. Plataniotis / Neurocomputing 333 (2019) 307–318

[20] C. Ye, S. Hong, A. Tamjidi, 6-DOF pose estimation of a robotic navigation aid by tracking visual and geometric features, IEEE Trans. Autom. Sci. Eng. 12 (4) (2015) 1169–1180. [21] N. Enayati, E. De Momi, G. Ferrigno, A quaternion-based unscented Kalman filter for robust optical/inertial motion tracking in computer-assisted surgery, IEEE Trans. Instrum. Meas. 64 (8) (2015) 2291–2301. [22] Y. Lin, T. Chen, F. Xi, G. Fu, Relative pose estimation from points by Kalman filters, in: Proceedings of the IEEE International Conference on Robotics and Biomimetics (ROBIO), IEEE, 2015, pp. 1495–1500. [23] A. Assa, K.N. Plataniotis, Adaptive kalman filtering by covariance sampling, IEEE Signal Process. Lett. 24 (9) (2017) 1288–1292. [24] H. Wang, Z. Deng, B. Feng, H. Ma, Y. Xia, An adaptive Kalman filter estimating process noise covariance, Neurocomputing 223 (2017) 12–17. [25] R. Mehra, Approaches to adaptive filtering, IEEE Trans. Autom. Control 17 (5) (1972) 693–698. [26] R.H. Shumway, D.S. Stoffer, An approach to time series smoothing and forecasting using the EM algorithm, J. Time Series Anal. 3 (4) (1982) 253–264. [27] R. Kashyap, Maximum likelihood identification of stochastic linear systems, IEEE Trans. Autom. Control 15 (1) (1970) 25–34. [28] B.J. Odelson, M.R. Rajamani, J.B. Rawlings, A new autocovariance least-squares method for estimating noise covariances, Automatica 42 (2) (2006) 303–308. [29] K. Myers, B. Tapley, Adaptive sequential estimation with unknown noise statistics, IEEE Trans. Autom. Control 21 (4) (1976) 520–523. [30] V. Lippiello, B. Siciliano, L. Villani, Adaptive extended Kalman filtering for visual motion estimation of 3D objects, Control Eng. Pract. 15 (1) (2007) 123–134. [31] F. Jiancheng, Y. Sheng, Study on innovation adaptive EKF for in-flight alignment of airborne pos, IEEE Trans. Instrum. Meas. 60 (4) (2011) 1378–1388. [32] F. Aghili, A. Salerno, Driftless 3-D attitude determination and positioning of mobile robots by integration of IMU with two RTK GPSs, IEEE/ASME Trans. Mech. 18 (1) (2013) 21–31. [33] J.-N. Chi, C. Qian, P. Zhang, W. Xiao, L. Xie, A novel elm based adaptive Kalman filter tracking algorithm, Neurocomputing 128 (2014) 42–49. [34] Y. Huang, Y. Zhang, B. Xu, Z. Wu, J. Chambers, A new adaptive extended Kalman filter for cooperative localization, IEEE Trans. Aerosp. Electron. Syst. 54 (1) (2018) 353–368. [35] X.R. Li, Y. Bar-Shalom, A recursive multiple model approach to noise identification, IEEE Trans. Aerosp. Electron. Syst. 30 (3) (1994) 671–684. [36] R. Kottath, P. Narkhede, V. Kumar, V. Karar, S. Poddar, Multiple model adaptive complementary filter for attitude estimation, Aerosp. Sci. Technol. 69 (2017) 574–581. [37] L.L. Scharf, D.L. Alspach, Non-linear state estimation in observation noise of unknown covariance, Int. J. Control 27 (2) (1978) 293–304. [38] G. Agamennoni, J.I. Nieto, E.M. Nebot, Approximate inference in state-space models with heavy-tailed noise, IEEE Trans. Signal Process. 60 (10) (2012) 5024–5037. [39] T. Ardeshiri, E. Özkan, U. Orguner, F. Gustafsson, Approximate Bayesian smoothing with unknown process and measurement noise covariances, IEEE Signal Process. Lett. 22 (12) (2015) 2450–2454. [40] B. Siciliano, L. Sciavicco, L. Villani, G. Oriolo, Robotics: Modelling, Planning and Control, Springer Science & Business Media, 2010. [41] H. Bay, A. Ess, T. Tuytelaars, L. Van Gool, Speeded-up robust features (surf), Comput. Vis. Image Underst. 110 (3) (2008) 346–359. [42] Y. Bar-Shalom, X.R. Li, T. Kirubarajan, Estimation with Applications to Tracking and Navigation: Theory Algorithms and Software, John Wiley & Sons, 2004. [43] V. Lippiello, B. Siciliano, L. Villani, Eye-in-hand/eye-to-hand multi-camera visual servoing, in: Proceedings of the Forty-Forth IEEE Conference on Decision and Control, and European Control Conference., 2005, pp. 5354–5359. [44] S.W. Nydick, The Wishart and inverse Wishart distributions, Electron. J. Stat. 6 (2012) 1–19. [45] D. Alspach, H. Sorenson, Nonlinear Bayesian estimation using gaussian sum approximations, IEEE Trans. Autom. Control 17 (4) (1972) 439–448. [46] A. Mohamed, K. Schwarz, Adaptive Kalman filtering for INS/GPS, J. Geod. 73 (4) (1999) 193–203.

[47] K. Pauwels, L. Rubio, J. Diaz, E. Ros, Real-time model-based rigid object pose estimation and tracking combining dense and sparse visual cues, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2013, pp. 2347–2354. [48] D.G. Lowe, Object recognition from local scale-invariant features, in: Proceedings of the Seventh IEEE International Conference on Computer Vision, 2, 1999, pp. 1150–1157. Akbar Assa received his Ph.D. degree in Mechanical Engineering from Ryerson University, in 2015. He was a postdoctoral fellow at Ryerson University in 2015 and served as a professor at Sheridan College from 2015 to 2016. He joined the Multimedia lab in ECE department of University of Toronto in 2017 as a NSERC postdoctoral fellow. His research interests span from robotics, visual servoing, and control systems to image processing, optimal estimation, and sensor fusion.

Farrokh Janabi-Sharifi received the Ph.D. degree in electrical and computer engineering from the University of Waterloo, Waterloo, ON, Canada, in 1995. From 1995 to 1997, he was a NSERC Postdoctoral Fellow and an Instructor with the Center for Intelligent Machines and with the Department of Electrical Computer Engineering, McGill University, Montreal, QC, Canada. He is currently a Professor of mechanical industrial engineering and the Director of the Robotics, Mechatronics, and Automation Laboratory, Ryerson University, Toronto, ON. He has been a Visiting Professor at KAIST, Taejon, Korea; IRISA-INRIA, Rennes, France; and TUM, Munich, Germany. He has also been an Organizer and/or Coorganizer of several international conferences on optomechatronic systems control. He was also the General Chair of the 2010 International Symposium on Optomechatronic Technologies, Toronto. He currently serves as an Associate Editor of the International Journal of Optomechatronics, and an Editorial Member of the Journal of Robotics and the Open Cybernetics and Systematics Journal. His research interests include optomechatronic systems with a focus on image-guided control and planning of robots. Konstantinos N. (Kostas) Plataniotis is a Professor and the Bell Canada Chair in Multimedia with the Electrical and Computer Engineering Department, University of Toronto, Toronto, ON, Canada. He is the co-founder and inaugural Director- Research for the Identity, Privacy and Security Institute (IPSI), University of Toronto, and he served as the Director of the Knowledge Media Design Institute(KMDI),University of Toronto,from January 2010 to July 2012. His research interests are knowledge and digital media design, multimedia systems, biometrics, image and signal processing, communications systems, and pattern recognition. He authored WLAN Positioning Systems (2012) and Multi Linear Subspace Learning: Reduction of Multidimensional Data (2013). Dr. Plataniotis is a Registered Professional Engineer in the Province of Ontario. He is a Fellow of the Engineering Institute of Canada. He has served as the Editor-in-Chief of IEEE SIGNAL PROCESSING LETTERS and as Technical Co-Chair of the IEEE 2013 International Conference in Acoustics, Speech and Signal Processing. He was the IEEE Signal Processing Society Vice President for Membership (20142016). He is the General Co-Chair for the 2017 IEEE GlobalSIP Conference, the General Co-Chair for the 2018 International Conference on Image Processing (ICIP-18), and the General Co-Chair for the 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2021).