A learning approach to image-based visual servoing with a bagging method of velocity calculations


Information Sciences 481 (2019) 244–257


Haobin Shi a,c, Kao-Shing Hwang b,∗, Xuesi Li a, Jialin Chen a

a School of Computer Science, Northwestern Polytechnical University, Xi’an, Shaanxi Province 710072, China
b Department of Electrical Engineering, National Sun Yat-sen University, Kaohsiung 80424, Taiwan
c Key Laboratory of Big Data Storage and Management, Ministry of Industry and Information Technology, Northwestern Polytechnical University, Xi’an, Shaanxi Province 710072, China

∗ Corresponding author at: Department of Electrical Engineering, National Sun Yat-sen University, Kaohsiung 80424, Taiwan. E-mail address: [email protected] (K.-S. Hwang).

https://doi.org/10.1016/j.ins.2018.12.082

Article history: Received 2 May 2018; Revised 28 December 2018; Accepted 29 December 2018; Available online 31 December 2018

Keywords: Image-based visual servoing; Bootstrap aggregation; Reinforcement learning; Q-learning

Abstract

Visual servoing allows accurate control of positioning and motion relative to a stationary or moving target using vision, and it is the subject of many studies. Most servoing gains for image-based visual servoing methods are selected heuristically or empirically, so accuracy is affected. This study uses reinforcement learning to adaptively tune the proportional servoing gain over sequences of image-based visual servoing, instead of using a constant gain. The system is used to control hovering and tracking for a stationary or slowly moving target. An image Jacobian matrix with four dimensions is constructed because a quadrotor drone features under-actuated dynamics. The Moore-Penrose pseudo inverse method is usually used to calculate the inverse of the image Jacobian matrix, but this study uses a bagging approach to calculate the inverse kinematics. The desired velocity is obtained from time-varying image errors, which gives greater robustness. An adaptive method to calculate the servoing gain is proposed, whereby the selection of an appropriate servoing gain over time is regarded as a reinforcement learning problem. The proposed visual servoing control system is implemented and tested experimentally on a quad-rotor drone system. The experimental results demonstrate that the proposed method is more robust and converges faster than other methods. © 2018 Published by Elsevier Inc.

1. Introduction

Visual feedback can be used to control a system, and visual servoing has been used for many robotic applications for decades [10,15,16]. In the future, this technology will see further development in many different fields, such as motion-based detection [19] and environmental monitoring [20]. Visual servoing involves two main tasks. First, image features that can be reliably detected and tracked throughout the workspace are extracted. The strategy for extracting features depends on the control task and the control strategy, in that it affects the convergence properties and the behavior of the visual servoing task. Many explicit visual features are used for visual servoing, such as points [5,22], lines [18], shapes [16,27] and image moments [2,23]. Then a stable control strategy is established to drive the tracked features to the target features geometrically. The control strategies fall into two categories [11]: image-based visual servoing (IBVS) and position-based visual servoing (PBVS).





PBVS uses a calibrated camera to reconstruct the pose of a target object from image points [1,7,13,32]. Since the reconstruction relies significantly on the camera’s intrinsic properties, it is sensitive to physical errors in devices and to inaccurate calibration of the camera parameters [14]. Robotic motion occurs in Cartesian space, so PBVS cannot ensure that the tracked object remains within the captured images [32]. IBVS, in contrast, works in the image plane and ignores the Cartesian velocities of the controlled system [4]. A major advantage of IBVS over PBVS is that it converges to a desired configuration even if there are camera calibration errors, which only affect the rate of convergence [22,24]. Recently, neural network-based visual servoing methods have been proposed. In [30], an adaptive neural network module is established to estimate unknown robot dynamics online and to reduce the computational complexity. In [3], a deep neural network-based method is shown to be highly precise and robust and to allow real-time six degrees of freedom (DOF) positioning tasks using visual servoing.

Bootstrap aggregation (bagging) decreases the deviation in a prediction by generating additional data for training from the original data. Combinations with repetitions are used to produce multiple sets that are the same size as the original data [12]. If the size of the training set is increased, the deviation is decreased, so the prediction more accurately matches the expected outcome.

The main contributions of the paper are twofold. First, because the Moore-Penrose pseudo inverse is highly sensitive and image noise is a problem for many IBVS studies [4,5,25], the proposed bagging method takes a bootstrapping approach to selecting an appropriate number of feature points from a pool to form a square image Jacobian matrix. Second, whereas most IBVS studies set the servoing gain empirically, this study proposes an adaptive servoing gain method that uses reinforcement learning (RL), in the form of Q-learning, to adaptively tune the proportional servoing gain over sequences of image-based visual servoing. The proposed method comprises two processes: the visual servoing control process converts an image feature error into aerial velocity commands, and the servoing gain adjustment process learns the adaptive servoing gain from the aerial velocity. The under-actuated aerial velocity is then converted into motion commands for a quad-rotor drone.

This paper is organized as follows. In Section 2, related works are summarized. Section 3 presents the core of this work, which is a bagging image-based visual servoing (B-IBVS) process that derives the desired velocity using a set of reduced-dimensionality image Jacobian matrices. Section 4 presents the adaptive servoing gain method using RL for B-IBVS. Simulations and real experiments are presented in Section 5 to demonstrate the performance of the proposed method. The discussion and conclusions are given in the final section.

2. Related work

A classical IBVS (C-IBVS) system requires six velocities to be specified in a Cartesian coordinate system: three linear velocities and three angular velocities. The control law [5] is expressed as:

v = −λĴe    (1)

where v = [v_x; v_y; v_z; ω_x; ω_y; ω_z] denotes the linear and angular velocities of the camera’s motion along the three axes; these are also called the motion parameters. e represents the image feature error vector, e = f − f∗, where f is the current feature coordinate and f∗ is the desired coordinate; Ĵ denotes the pseudo inverse of the image Jacobian matrix J; and λ is a scaling factor for v, called the servoing gain, which usually has a constant value in most IBVS systems. Using (1), feature errors are mapped to velocity inputs.

IBVS methods that are used to control under-actuated unmanned aerial vehicles (UAV), especially quad-rotor drones, have been the subject of many studies [4,8,26,28,33]. These methods allow greater robustness and avoid the calibration problems that are associated with PBVS. In [23], the dynamic effects of an under-actuated quad-rotor drone are detailed and a compact Virtual Spring approach is proposed. In [13], it is assumed that the bias in the positions of the captured features can be ignored when the vehicle has small roll and pitch angles, because of the relationship between attitude and velocity.

Setting the servoing gain λ has been the subject of many studies [4,5,22,25]. In [4,5,25], this parameter is a positive value that is specified manually based on experimental experience. In [22], an adaptive gain matrix is adjusted using deviations relative to the trained image of the target object, so that additional information from the image processing software can be used to increase the performance of the visual servo controller. These studies show that an appropriate servoing gain is necessary for good control. However, these methods calculate the servoing gain heuristically or empirically, and there is no rationale for selecting an appropriate servoing gain. A proportional gain is used in Proportional-Integral-Derivative (PID) controllers to replace the function of the servoing gain λ, and this allows dynamic and stable visual servoing [17,31]. However, there is no generic approach that can be used for all or even similar tasks, because the parameters of PID controllers are not easy to calculate.

If a sufficient number of features can be extracted, an image Jacobian matrix J containing the features’ coordinates can be constructed to calculate the desired velocity of the controlled object using (1). However, online estimation of image Jacobian matrices is not straightforward and requires many image sequences if the estimates are to converge. Additional feature points are used to prevent a singularity. However, using additional feature points requires the use of a pseudo-inverse or the transpose of the interaction matrix, which increases the risk of becoming trapped in a local minimum. In order to decrease the effect of contaminated features caused by image noise, redundant features are fed into J to form a non-square matrix, instead of a square matrix that is constructed using two features, for which the inverse is much easier to calculate [4,22]. Methods that construct a square Jacobian matrix by limiting the number of features can produce local minima, because the small number of feature points can mean that there is insufficient information about the pose. Although the inverse of a non-square image Jacobian matrix can be calculated using a Moore-Penrose pseudo inverse, the computation is very complicated, especially when the matrix is rank deficient [5,25].
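As a point of reference for the bagging approach developed below, the following is a minimal Python sketch of the classical proportional IBVS law in (1) using the Moore-Penrose pseudo inverse. The function and variable names are illustrative assumptions, not the authors’ implementation.

```python
import numpy as np

def classical_ibvs_velocity(J, f, f_star, lam=0.5):
    """Classical IBVS law v = -lambda * pinv(J) @ e, as in Eq. (1).

    J      : (2M, 6) image Jacobian stacked from M feature points
    f      : (2M,)   current feature coordinates in pixels
    f_star : (2M,)   desired feature coordinates in pixels
    lam    : constant servoing gain
    """
    e = f - f_star                        # image feature error
    return -lam * np.linalg.pinv(J) @ e   # [vx, vy, vz, wx, wy, wz]
```

The sensitivity of this pseudo inverse to perturbed rows of J when the matrix is noisy or close to rank deficiency is exactly what motivates the bagging alternative developed in Section 3.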


3. Bagging image-based visual servoing

3.1. Reduced-dimensionality IBVS for quad-rotor drones

The proposed visual servoing method is implemented on a commercial quad-rotor drone, an ARDrone system, which has built-in attitude control. It has been shown that the small bias in the positions of the captured features can be ignored when the roll and pitch angles are small [13], so the small deviations in the predicted feature positions that are caused by roll and pitch are neglected. The motion parameters are therefore v = [v_x; v_y; v_z; ω_z]. A reduced image Jacobian matrix can then be constructed whose column dimension is reduced from six to four, which reduces the computational complexity compared with the traditional method. A reduced-dimensionality IBVS system for quad-rotor drones is constructed as follows. M features are extracted as {f_i | i = 1, 2, ..., M} and the feature error e for the IBVS in (1) is defined as

e = [f_1 − f_1^*, f_2 − f_2^*, ..., f_M − f_M^*]^T ∈ R^(2M×1)    (2)

where f_i = [x_i^px, y_i^px] and f_i^* = [x_i^*px, y_i^*px] denote the current coordinate and the desired coordinate of the ith feature in the digital image, in pixels [9]. It is assumed that F_i(X_i, Y_i, Z_i) is the point that corresponds to the ith feature in the camera coordinate system and that its coordinate in the image plane is (x_i, y_i); f is the camera’s focal length; the coordinate of the image plane central point in the camera coordinate system is (d_x, d_y, −f); and α_x and α_y are the scaling ratios from the digital image to the image plane. In terms of the imaging model, the spatial relationship between the digital image and the camera coordinate system is:



\[
\begin{bmatrix} x_i^{px} Z_i \\ y_i^{px} Z_i \end{bmatrix}
=
\begin{bmatrix} -f/\alpha_x & 0 & -d_x/\alpha_x \\ 0 & -f/\alpha_y & -d_y/\alpha_y \end{bmatrix}
\begin{bmatrix} X_i \\ Y_i \\ Z_i \end{bmatrix}
\tag{3}
\]

where {f/α_x, f/α_y, d_x/α_x, d_y/α_y} are the intrinsic parameters of the camera. According to the kinematics,

dX_i/dt = −v_x + ω_z Y_i,  dY_i/dt = −v_y − ω_z X_i,  dZ_i/dt = −v_z    (4)

Using (3) and (4),

[dx_i^px/dt; dy_i^px/dt] = L_i v    (5)

where v = [vx ; vy ; vz ; ωz ]; Li is the image Jacobian matrix:



\[
L_i =
\begin{bmatrix}
\dfrac{f}{\alpha_x Z_i} & 0 & \dfrac{x_i^{px} + d_x/\alpha_x}{Z_i} & \dfrac{\alpha_y}{\alpha_x}\left(y_i^{px} + \dfrac{d_y}{\alpha_y}\right) \\[8pt]
0 & \dfrac{f}{\alpha_y Z_i} & \dfrac{y_i^{px} + d_y/\alpha_y}{Z_i} & -\dfrac{\alpha_x}{\alpha_y}\left(x_i^{px} + \dfrac{d_x}{\alpha_x}\right)
\end{bmatrix}
\tag{6}
\]
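For concreteness, a minimal Python sketch of how the reduced 2×4 interaction matrix in (6) could be assembled for one feature and stacked into J; the function and variable names are illustrative assumptions, and the pixel coordinates and depths Z_i are assumed to be available.

```python
import numpy as np

def interaction_matrix(x_px, y_px, Z, f, ax, ay, dx, dy):
    """Reduced 2x4 image Jacobian L_i of Eq. (6) for v = [vx, vy, vz, wz]."""
    u = x_px + dx / ax                      # shifted pixel abscissa
    w = y_px + dy / ay                      # shifted pixel ordinate
    return np.array([
        [f / (ax * Z), 0.0,          u / Z,  (ay / ax) * w],
        [0.0,          f / (ay * Z), w / Z, -(ax / ay) * u],
    ])

def stacked_jacobian(features_px, depths, f, ax, ay, dx, dy):
    """Stack the M per-feature blocks into J in R^{2M x 4}, as used in Eq. (1)."""
    return np.vstack([interaction_matrix(x, y, Z, f, ax, ay, dx, dy)
                      for (x, y), Z in zip(features_px, depths)])
```

Note that only two features are needed to make the stacked matrix square (4×4), which is the form the bagging algorithm below inverts pairwise.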



For the IBVS system of a quadrotor drone, the image Jacobian matrix in (1) is J = [L_1; L_2; ...; L_M] ∈ R^(2M×4).

3.2. Bagging velocity calculation algorithm (BVCA)

Image noise that occurs during feature extraction can affect the accuracy of the predicted velocity that is calculated using the inverse of the image Jacobian matrix or the Moore-Penrose pseudo inverse. A bagging method decreases the deviation in a prediction by generating additional data for training from the original data [12]. To reduce the effect of image noise and to exploit a greater number of extracted features, a bagging method is iterated in every servoing cycle to derive the desired velocity more efficiently and accurately. During a servoing cycle, M coordinate sets {FS_i | FS_i = {x_i^j | j = 1, ..., N_i}, i = 1, ..., M} are defined to record the iterative coordinates of the M features in the digital image. In the initialization phase, N_i is 1 and the single recorded value x_i of FS_i = {x_i} is the coordinate of the ith feature f_i obtained by image recognition [9]. These M recognized coordinates may contain noise, so the bagging method is used. In one iteration of the BVCA, the selection probability sr_i of FS_i is calculated as:

sr_i = N_i / Σ_{i=1}^{M} N_i    (7)


Algorithm 1 BVCA.
1. Definition
2. f_i := recognized coordinate of the ith feature by the vision algorithm
3. get_position() := calculate the ith feature’s virtual coordinate from two features
4. get_average_coordinate() := calculate the average coordinate
5. Initialization
6. FS_i ← {f_i}, for 1 ≤ i ≤ M;
7. N_i ← 1, for 1 ≤ i ≤ M;
8. t ← 1;
9. Repeat t++
10.   sr_i ← N_i / Σ_{i=1}^{M} N_i, for 1 ≤ i ≤ M;
11.   K, U ← select randomly according to sr_i;
12.   x_K^k ← select randomly from FS_K;
13.   x_U^u ← select randomly from FS_U;
14.   i ← 1;  // update the other FS_i
15.   Repeat i++
16.     p_i ← get_position(x_K^k, x_U^u, i, K, U);  // new virtual sample
17.     x̄_i ← get_average_coordinate(FS_i);
18.     l_i ← Euclidean distance between p_i and x̄_i;
19.     if l_i > ε_l then  // filter p_i
20.       p_i is filtered;
21.     else  // add p_i
22.       p_i is added into FS_i;
23.       N_i ← N_i + 1;
24.     end if
25.   until i > M  // i = K and i = U are skipped
26.   Ω ← {(i, j) | 1 ≤ i < j ≤ M, det(J̄_ij) ≠ 0};
27.   D_ij ← min(N_i, N_j), for (i, j) ∈ Ω;
28.   v̄_ij ← J̄_ij^{-1} ē_ij, for (i, j) ∈ Ω;
29.   w_ij ← D_ij / Σ_{(i,j)∈Ω} D_ij, for (i, j) ∈ Ω;
30.   v̂ ← Σ_{(i,j)∈Ω} w_ij v̄_ij;
31.   if v̂ converges then  // break if the velocity converges
32.     break;
33.   end if
34. until t > T_max
35. v ← −λ v̂;
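The following is a compact Python sketch of the BVCA loop. It reuses the interaction_matrix() helper sketched in Section 3.1, and get_position() — whose exact form the paper does not spell out — is approximated here by translating the initial offsets of feature i relative to features K and U; the desired coordinates, depths and camera intrinsics are assumed to be given. It illustrates the bootstrap-and-filter idea rather than reproducing the authors’ code.

```python
import numpy as np

def bvca(f_init, f_desired, depths, intrinsics, lam=0.5, eps_l=5.0,
         t_max=200, tol=1e-3, rng=np.random.default_rng(0)):
    """Bagging velocity calculation (Algorithm 1), illustrative sketch.

    f_init    : (M, 2) recognized pixel coordinates of the M features
    f_desired : (M, 2) desired pixel coordinates
    depths    : (M,)   feature depths Z_i
    intrinsics: tuple (f, ax, ay, dx, dy) as in Eq. (3)
    """
    M = len(f_init)
    FS = [[np.asarray(p, float)] for p in f_init]       # coordinate sets FS_i
    v_hat_prev = None

    def get_position(xK, xU, i, K, U):
        # Hypothetical stand-in for get_position(): predict feature i from the
        # sampled coordinates of K and U via their initial relative offsets.
        return 0.5 * ((xK + (f_init[i] - f_init[K])) +
                      (xU + (f_init[i] - f_init[U])))

    for _ in range(t_max):
        N = np.array([len(s) for s in FS], float)
        K, U = rng.choice(M, size=2, replace=False, p=N / N.sum())   # Eq. (7)
        xK = FS[K][rng.integers(len(FS[K]))]
        xU = FS[U][rng.integers(len(FS[U]))]

        for i in range(M):                               # update the other FS_i
            if i in (K, U):
                continue
            p_i = get_position(xK, xU, i, K, U)          # new virtual sample
            if np.linalg.norm(p_i - np.mean(FS[i], axis=0)) <= eps_l:
                FS[i].append(p_i)                        # accepted, N_i += 1

        x_bar = np.array([np.mean(s, axis=0) for s in FS])
        v_bars, weights = [], []
        for i in range(M):
            for j in range(i + 1, M):                    # pairwise 4x4 Jacobians
                J_ij = np.vstack([
                    interaction_matrix(*x_bar[i], depths[i], *intrinsics),
                    interaction_matrix(*x_bar[j], depths[j], *intrinsics)])
                if abs(np.linalg.det(J_ij)) < 1e-9:      # combination filtered
                    continue
                e_ij = np.concatenate([x_bar[i] - f_desired[i],
                                       x_bar[j] - f_desired[j]])
                v_bars.append(np.linalg.solve(J_ij, e_ij))     # Eq. (9)
                weights.append(min(len(FS[i]), len(FS[j])))    # Eq. (8)

        w = np.array(weights, float)
        v_hat = (w / w.sum()) @ np.array(v_bars)               # Eqs. (10)-(11)
        if v_hat_prev is not None and np.linalg.norm(v_hat - v_hat_prev) < tol:
            break                                              # velocity converged
        v_hat_prev = v_hat

    return -lam * v_hat                                        # Eq. (12)
```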

Two distinct feature sets, FS_K and FS_U, are selected with the probabilities given by (7). Coordinates x_K^k and x_U^u are then selected randomly from these two sets to generate virtual samples for the other features using their relative geometric positions. Before filtering, the average coordinate of the current FS_i is obtained as x̄_i. The new virtual sample p_i, which is the newly calculated coordinate of the ith feature, is added into FS_i subject to filtering: the Euclidean distance l_i between p_i and x̄_i is calculated, and if l_i > ε_l, p_i is not added into FS_i; otherwise, p_i is added into FS_i and N_i = N_i + 1. In this way, the number of elements in the coordinate sets that correspond to correct features increases, while the sets that correspond to incorrect features do not grow. When the coordinate sets are updated, there are M average feature coordinates, which the bagging iteration recalculates as {x̄_i | i = 1, ..., M}. The corresponding image Jacobian matrix for each pair of average feature coordinates is {J̄_ij ∈ R^(4×4) | 1 ≤ i < j ≤ M}. If J̄_ij has no inverse matrix, the combination C_ij of the ith feature and the jth feature is filtered. The final selected combinations are {C_ij | (i, j) ∈ Ω}, where Ω = {(i, j) | 1 ≤ i < j ≤ M, det(J̄_ij) ≠ 0}. In the current BVCA iteration, the number of elements in the selected combinations is:

D_ij = min{N_i, N_j}, (i, j) ∈ Ω    (8)

The velocities relative to the combinations are calculated as:

v̄_ij = J̄_ij^{-1} ē_ij, (i, j) ∈ Ω    (9)

where ē_ij ∈ R^(4×1) and ē_ij = [x̄_i − x̄_i^*; x̄_j − x̄_j^*]. The weights of the calculated velocities are:

w_ij = D_ij / Σ_{(i,j)∈Ω} D_ij    (10)

The current average weighted velocity is:

v̂ = Σ_{(i,j)∈Ω} w_ij v̄_ij    (11)

These steps are repeated until the velocities converge or until the termination conditions are fulfilled. The final control velocity for the quadrotor drone is calculated as:

v = −λv̂    (12)


Algorithm 2 ASSDA.
1. Definition
2. s^k := kth dimension of the state space
3. s^k_min := lower limit of s^k
4. s^k_max := upper limit of s^k
5. s^k_j := current obtained state value in the kth dimension of the state space
6. s_index^k_c := current state index in the kth dimension of the state space
7. get_state_index() := calculate the state index
8. Initialization  // divide non-uniformly
9. k ← 1;
10. Repeat k++
11.   s^k_min ← initial lower limit of the kth dimension of the state space;
12.   s^k_max ← initial upper limit of the kth dimension of the state space;
13.   i ← 0;
14.   Repeat i++
15.     s^k+_i ← s^k_max × (1 − log_{u+1}(u − i + 1));
16.     s^k−_i ← s^k_min × (1 − log_{u+1}(u − i + 1));
17.   until i > u
18. until k > 4
19. Repeat  // system starts
20.   s^k_j ← obtained from the environment;
21.   if s^k_j ∉ [s^k_min, s^k_max] then  // extend domain
22.     extend [s^k_min, s^k_max] with step_max towards s^k_j;
23.   end if
24.   s_index^k_c ← get_state_index(s^k_j, [s^k_min, s^k_max]);  // get the state
25. until system is ended
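A minimal sketch of the non-uniform partitioning and automatic extension that Algorithm 2 and Eqs. (18)-(19) describe, for a single state dimension; the class name, the index convention and the assumption that s_min < 0 < s_max are illustrative.

```python
import numpy as np

class AdaptiveAxis:
    """One dimension of the adaptive state space (Algorithm 2, Eqs. (18)-(19))."""

    def __init__(self, s_min, s_max, u=5):
        assert s_min < 0.0 < s_max
        self.s_min, self.s_max, self.u = s_min, s_max, u
        self.step_max = s_max * np.log(2) / np.log(u + 1)    # step_max = s_max * log_{u+1}(2)
        i = np.arange(u + 1)
        factor = 1.0 - np.log(u - i + 1) / np.log(u + 1)     # 1 - log_{u+1}(u - i + 1)
        # Cell boundaries: exponentially finer near zero, coarser towards the limits.
        self.bounds = np.unique(np.concatenate([s_min * factor, s_max * factor]))

    def index(self, s):
        """Return the cell index of s, extending the domain uniformly if needed."""
        while s > self.s_max:                                # extend towards s_j
            self.s_max += self.step_max
            self.bounds = np.append(self.bounds, self.s_max)
        while s < self.s_min:
            self.s_min -= self.step_max
            self.bounds = np.insert(self.bounds, 0, self.s_min)
        return int(np.searchsorted(self.bounds, s))
```

Each of the four components of the state v̂ = (v_x, v_y, v_z, ω_z) would get one such axis, and the tuple of four indices would identify the discrete state.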

Fig. 1. An analysis of the motion of a quad-rotor drone.

The corresponding BVCA is presented in Algorithm 1, which initializes the feature sets in lines 5–8 and selects features randomly in lines 9–14. In lines 15–25, it filters each new sample by calculating the Euclidean distance. At the end of Algorithm 1, the weights are determined using the number of elements in the selected combinations and the average weighted velocity is calculated.

3.3. Dynamics and rotor actuation

The accelerations in the x- and y-directions, a_vx and a_vy, are calculated from the motion parameters as:

a_vx = dv_x/dt;  a_vy = dv_y/dt    (13)

Taking the x direction as an example, when the specified acceleration a_vx is determined, the desired motion model for a quadrotor drone in the x direction can be simplified as in Fig. 1(a). In Fig. 1(a), θ_y^* is the desired pitch angle, and:

tan θ_y^* = m a_vx / (mg)  ⇒  θ_y^* = arctan(a_vx / g)    (14)

The current real pitch angle θyc is obtained using the gyroscope and the accelerometer in the drone [6]. The angular velocity ωy w.r.t the y-axis is calculated as:

ω_y = (θ_y^* − θ_y^c) / t    (15)

where t is the specified movement time. In the same way, the angular velocity ωx w.r.t the x-axis is calculated as:

θ_x^* = arctan(a_vy / g);  ω_x = (θ_x^* − θ_x^c) / t    (16)

where θ_x^* is the desired roll angle and θ_x^c is the current real roll angle, which is obtained from the gyroscope.
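A small sketch of the attitude conversion in (13)-(16): the desired planar accelerations are turned into desired pitch/roll angles and then into angular rates, and the 4-D command is expanded to the 6-D velocity v = [v_x; v_y; v_z; ω_x; ω_y; ω_z]. The variable names and sign conventions are illustrative, and the drone’s built-in attitude loop is assumed to handle rotor mixing.

```python
import math

def expand_velocity(v4, a_vx, a_vy, pitch_cur, roll_cur, dt):
    """Convert the 4-D command [vx, vy, vz, wz] into a 6-D velocity.

    a_vx, a_vy          : desired accelerations in x and y (Eq. (13))
    pitch_cur, roll_cur : current pitch/roll from the IMU (rad)
    dt                  : specified movement time
    """
    vx, vy, vz, wz = v4
    pitch_des = math.atan2(a_vx, 9.81)     # Eq. (14): theta_y* = arctan(a_vx / g)
    roll_des = math.atan2(a_vy, 9.81)      # Eq. (16), mirrored for the x-axis
    wy = (pitch_des - pitch_cur) / dt      # Eq. (15)
    wx = (roll_des - roll_cur) / dt        # Eq. (16)
    return [vx, vy, vz, wx, wy, wz]
```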


Fig. 2. Diagram of adaptive state space extension.

The angular velocities ω_x and ω_y can then be calculated; the value of each is very small. The four-dimensional velocity is converted to a six-dimensional velocity, with v = [v_x; v_y; v_z; ω_x; ω_y; ω_z]. As shown in Fig. 1(b), using the aerodynamics of a quad-rotor drone, the final servo rotation speed, i.e. the servo parameter, can be calculated [11].

4. An adaptive servoing gain method using Q-learning

The servoing gain for the IBVS in (1), which is akin to a proportional gain in PID control, significantly affects the rate of convergence and the degree of accuracy. There are many heuristic policies for selecting the servoing gain [4,5,22,25]. This study uses a systematic method that learns the selection policy for the adaptive servoing gain for the proposed bagging IBVS using Q-Learning (BVS-Q) [29].

4.1. State space partition

The selection of the state and the partitioning of the state space are crucial for an RL model. In BVS-Q, if the feature errors are chosen as the state, a large number of features may result in the curse of dimensionality. For the IBVS, λ is related to the feature errors e and the spatial positions of the features in the camera coordinate system. Therefore, this study defines the state as the value of the velocity v̂, which is calculated by the BVCA in (11):

S = {s | s = v̂}    (17)

To partition the state space, traditional partitioning divides the state space uniformly over a fixed domain. However, in this case, the domain of the state space cannot be known in advance. In addition, the smaller the value of s, i.e. the closer it is to the zero point, the smaller the partitioning unit should be. Therefore, the adaptive state space features an exponential partitioning unit and automatic extension. When discretizing the continuous state space, the method is the same for each of the four dimensions of the state space. For instance, in the first dimension s^1, the initial upper limit s^1_max and the lower limit s^1_min can be obtained. If the interval [0, s^1_max] is divided into u segments, every segment being [s^1+_i, s^1+_{i+1}], i = 0, 1, ..., u − 1, then s^1+_i is calculated as:

s^1+_i = s^1_max × (1 − log_{u+1}(u − i + 1))    (18)

In the same way, the interval [s^1_min, 0] is divided into segments [s^1−_{i+1}, s^1−_i], i = 0, 1, ..., u − 1, and

s^1−_i = s^1_min × (1 − log_{u+1}(u − i + 1))    (19)

If a value s^1_j obtained in s^1 is not in the interval [s^1_min, s^1_max], a uniform extension is conducted outside the domain [s^1_min, s^1_max].

The largest division unit is step_max = s^1_max × log_{u+1} 2, in the direction of s^1_j, as shown in Fig. 2. The values of the original state space are also inherited; these are the non-blue blocks in Fig. 2(b). The values for the newly extended state space are set to an initial value; these are the blue blocks in Fig. 2(b). The corresponding adaptive state space discretizing algorithm (ASSDA) is shown as Algorithm 2. In Algorithm 2, lines 1 to 7 initialize the variables for the limits and the state. Lines 9–18 divide the space within the initial domain. If the domain is exceeded, an expanded space is determined, and the state representation is determined in lines 19–23.

4.2. Action set

Since the selectable range of λ is very large, adjusting it directly is very time consuming. Initially, a reasonable value λ∗ is set and then fine-tuned using RL. Different values of s, i.e. v̂, correspond to different values of λ. To fine-tune λ∗, the change rate is regarded as the action. The updating formula for λ(s) is:

λ_new(s) = λ_old(s)(1 + a)    (20)

where λ_new is the value of λ after adjustment, λ_old is the value of λ before adjustment, and the action a is taken from an arithmetic sequence with a median of zero whose entries are determined by the number of actions n and the change unit size q. For example, if the number of actions is n = 7 and the change unit is q = 0.05, the action set is {−0.15, −0.1, −0.05, 0.0, 0.05, 0.1, 0.15}.
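A tiny sketch of how the action set and the gain update in (20) might look; n, q and the example values are taken from the text, while the per-state table indexing is an assumption.

```python
import numpy as np

def action_set(n=7, q=0.05):
    """Arithmetic sequence of change rates centred on zero; n=7, q=0.05 gives
    {-0.15, -0.10, -0.05, 0.0, 0.05, 0.10, 0.15}."""
    return q * (np.arange(n) - (n - 1) / 2)

def update_gain(lam_table, state, action):
    """Eq. (20): lambda_new(s) = lambda_old(s) * (1 + a), stored per state."""
    lam_table[state] *= (1.0 + action)
    return lam_table[state]
```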


Fig. 3. The frame of the BVS-Q.

4.3. Reward function

The reward function is divided into three parts:
1) Reaching the desired position: if the distances from every feature to its desired position are all less than ε pixels, the IBVS control is considered to be completed and a very good feedback is given.
2) Features beyond the camera’s field of view: if some features are lost after feature selection, a very bad feedback is given.
3) General reward: all other situations are general reward situations. The shorter the feature error distance Σ_{i=1}^{M} |f_i − f_i^*|, the better the feedback.
The reward function for these situations is:



\[
r =
\begin{cases}
C, & \text{desired position reached} \\
-C, & \text{out of the camera field} \\
-C \sum_{i=1}^{M} |f_i - f_i^*| \big/ \left(M\sqrt{r_n^2 + c_n^2}\right), & \text{general reward}
\end{cases}
\tag{21}
\]

where C is a constant that is determined by the actual situation, and r_n and c_n represent the number of rows and columns of an image captured by the camera.
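A sketch of the three-part reward in (21); the constant C, the pixel threshold ε, the per-feature Euclidean error and the lost-feature flag are assumptions about how the conditions would be tested in practice.

```python
import numpy as np

def reward(f, f_star, rows, cols, C=10.0, eps=5.0, features_lost=False):
    """Three-part reward of Eq. (21); C, eps and the loss flag are assumptions.

    f, f_star : (M, 2) current and desired pixel coordinates
    rows, cols: image size r_n x c_n, used to normalise the general reward
    """
    if features_lost:
        return -C                                    # features out of the camera field
    dists = np.linalg.norm(f - f_star, axis=1)       # per-feature error |f_i - f_i*|
    if np.all(dists < eps):
        return C                                     # desired position reached
    return -C * dists.sum() / (len(f) * np.hypot(rows, cols))   # general reward
```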

4.4. Value iteration

The drone enacts a series of identical actions until the current state transits into the next state; this is a semi-Markov phenomenon [21]. A learning epoch is defined as the time required for RL to transform from one state to the next state. The value iteration method used in this study differs from the traditional Q-table updating strategy:

Q_{t+1}(s_t, a_t) = (1 − α)Q_t(s_t, a_t) + α(r + γ max_{a_{t+1}} Q_t(s_{t+1}, a_{t+1}))    (22)

where s_t is the state in the tth learning step, α is the learning rate, γ is the discount rate, r is the real-time reward, and s_{t+1} is the next state of s_t after a_t. It is assumed that in one epoch the current state s transits into s′ by repeating the same action for T learning cycles. During the epoch, the Q-table is not updated and the real-time rewards are {r_{t+i} | i = 0, 1, ..., T − 1}. The action value iteration strategy is defined as:

V_{t+T−1}(s, a) = r_{t+T−1} + γ max_{a′} Q_t(s′, a′)    (23)

V_{t+i}(s, a) = r_{t+i} + γ V_{t+i+1}(s, a),  1 ≤ i ≤ T − 2    (24)

Q_{t+T}(s, a) = (1 − α)Q_t(s, a) + α(r_t + Σ_{k=1}^{T−1} V_{t+k})/T    (25)

where Q_t(s, a) denotes the corresponding value in the Q-table for state s and action a, V_t(s, a) is the cumulative reward at the tth moment in the epoch, and r_t denotes the real-time reward.
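A sketch of the epoch-wise update in (22)-(25): the per-step rewards collected while the state does not change are rolled back from the transition and averaged into a single Q-table update. The bootstrap at the epoch’s end follows (23)-(25); the table layout and parameter defaults are illustrative.

```python
import numpy as np

def epoch_update(Q, s, a, rewards, s_next, alpha=0.8, gamma=0.85):
    """Semi-Markov value iteration of Eqs. (23)-(25).

    Q       : 2-D array of action values, indexed as Q[s, a]
    s, a    : state and action held for the whole epoch
    rewards : [r_t, r_{t+1}, ..., r_{t+T-1}] collected during the epoch
    s_next  : state reached at the end of the epoch
    """
    T = len(rewards)
    V = np.empty(T)                                         # V[i] ~ V_{t+i}(s, a)
    V[T - 1] = rewards[T - 1] + gamma * np.max(Q[s_next])   # Eq. (23)
    for i in range(T - 2, 0, -1):                           # Eq. (24), 1 <= i <= T-2
        V[i] = rewards[i] + gamma * V[i + 1]
    target = (rewards[0] + V[1:].sum()) / T                 # (r_t + sum V_{t+k}) / T
    Q[s, a] = (1 - alpha) * Q[s, a] + alpha * target        # Eq. (25)
    return Q[s, a]
```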

4.5. Framework for BVS-Q

Using this description, the holonomic architecture of the proposed BVS-Q is constructed. The method uses a bagging approach to reduce the effect of image noise and uses Q-learning to adaptively adjust the servoing gain. The framework of the BVS-Q is shown in Fig. 3. Feature information is obtained from the environment. Using the BVCA, the real-time predicted velocity v̂ is calculated iteratively. In the Q-learning model, v̂ is regarded as the state and the change rate of the servoing gain is taken as the action. After learning, the learnt servoing gain is used to calculate the four-dimensional aerial velocity using (1). Using the dynamics, the four-dimensional velocity is transformed into a six-dimensional velocity to control the quad-rotor drone, as shown in Section 3.3. Finally, these steps are repeated until the task is completed.


Table 1
Intrinsic parameters of a quad-rotor drone.

Identified parameter    Value
f/α_x                   701.65 pixels
f/α_y                   700.49 pixels
d_x/α_x                 38.74 pixels
d_y/α_y                 −0.49 pixels
d                       16.5 cm
I_x                     12.863 kg cm²
I_y                     49.287 kg cm²
I_z                     41.775 kg cm²

Fig. 4. Experimental system.

5. Simulations and experiments

In the experiments, the predicted velocities calculated using the bagging method and using the Moore-Penrose pseudo inverse are first compared, to demonstrate that the former approach approximates the desired velocity more efficiently than the latter. To demonstrate the noise resistance and efficiency of the proposed B-IBVS, it is compared with the C-IBVS [8] with Ĵ = (J^T J)^{-1} J^T, the average IBVS (A-IBVS) [5] with Ĵ = (1/2)((J + J^*)^T (J + J^*))^{-1} (J + J^*)^T, where J^*, a constant, is the value of J at the desired position e = e^* = 0, and visual PID control (V-PID) [17,31]. A simulation platform and an actual scene are used for many experiments to select a group of parameters (gains) that give a good effect. The results of the experiments with the adaptive servoing gain using RL are then compared with those for B-IBVS without RL to demonstrate the efficiency of RL. Finally, to further verify the practicality of the proposed method, the target is set to a moving object and the BVS-Q result is tested.

The image Jacobian matrix relates image errors to a robot’s velocity. The object is assumed to be stationary for IBVS. Therefore, the proportional control law with a constant proportional gain depicted in (1) is used to control the robot’s motion, because this method is simple and effective. However, even with an accurate Jacobian matrix for the time sequences, the global asymptotic stability of IBVS still cannot be ensured; there is only local asymptotic stability in the vicinity of the desired position. Using the adaptive proportional gain that is learned using RL, it is possible to track a slowly moving object.

The experiments are run on both a simulation platform and a real quad-rotor drone, whose intrinsic parameters are shown in Table 1 and in Fig. 4. Fig. 4(a) shows the actual static outdoor experimental scene. Fig. 4(b) shows the tracking scene, where the wheeled robot moves at a certain speed and the quad-rotor drone flies to track the targets on the wheeled robot. Fig. 4(c) shows the simulation platform with a quad-rotor drone, which is the same as the real quad-rotor drone.

5.1. Comparison of the predicted velocity

As shown in Fig. 4(a), one image of the real environment is captured by the bottom camera of the drone. The computer vision algorithm gives the positions of the 4 features as {(−36, 165), (−73, 205), (9, 202), (−91, 203)}. The manually calculated correct positions are {(−35, 164), (−74, 207), (7, 203), (−32, 246)}. Image noise means that the 4th feature’s position is incorrect. The desired positions of the four features are {(−10, 23), (−10, −22), (−55, 23), (−55, −22)}, and λ = 0.5. In Fig. 5, the x-axes show the iteration cycle, which is fixed, and the y-axes show the predicted velocities. For the positions that are corrupted by image noise, the predicted velocity using the Moore-Penrose pseudo inverse, i.e. v = −λ(J^T J)^{-1} J^T e, is (−0.1223 m/s, −0.1305 m/s, 0.8536 m/s, −0.2775 rad/s); these are the pink lines in Fig. 5. Using the manually calculated correct positions, the predicted velocity v = −λ(J^T J)^{-1} J^T e, i.e. the desired velocity, is (−0.1073 m/s, −0.2551 m/s, 1.1023 m/s, −0.2637 rad/s); these are shown as red lines in Fig. 5. The blue lines in Fig. 5 show the result of the bagging approach with ε_l = 5 pixels. The velocity converges to (−0.1085 m/s, −0.2461 m/s, 1.0915 m/s, −0.2635 rad/s).


Fig. 5. A comparison of the predicted velocity: (a) Comparison of vx , (b) Comparison of vy , (c) Comparison of vz , (d) Comparison of wz .

Fig. 6. A comparison of the simulation platforms: (a) comparison of the real-time position in the x-direction, (b) comparison of the real-time position in the y-direction, (c) comparison of the real-time position in the z-direction and (d) real-time variance.

Fig. 5 shows that the result of the bagging approach is closer to the desired velocity than that of the Moore-Penrose pseudo inverse. The bagging approach assigns larger weights to the correct feature positions and reduces the effect of image noise, whereas the Moore-Penrose pseudo inverse does not.

5.2. A comparison of different control methods

The comparisons between the proposed B-IBVS, the C-IBVS in [25], the A-IBVS in [5], and the V-PID in [17,31] use the simulation platform and the ARDrone system. The initial position is (0.5 m, 0.24 m, 1.0 m, −140°); the desired position is (0.0 m, 0.0 m, 1.8 m, 0°); t = 0.04 s; λ = 0.5. It is worth noting that the simulations involve no image noise, which is a factor in all field tests. Control tests were conducted to determine the efficiency and the ability to deal with noise. Each method was tested 50 times and the results are shown in Figs. 6–9.


Fig. 7. Feature trajectory for the simulation platforms: (a) trajectory for the B-IBVS, (b) trajectory for the A-IBVS, (c) trajectory for the C-IBVS and (d) trajectory for the V-PID.

Fig. 8. Comparison of actual scenarios: (a) comparison of the real-time position in the x-direction, (b) comparison of the real-time position in the ydirection, (c) comparison of the real-time position in the z-direction and (d) real-time variance.

In Fig. 6(a)–(c) and Fig. 8(a)–(c), the light-red, light-green, light-blue and light-pink lines are the real-time feedback positions, and the red, green, blue and pink lines are the average positions. In Figs. 6(d) and 8(d), the red, green, blue, and pink lines are the average values over the 50 experiments. The variance in one time slice is defined as the average squared distance from the mean over the 50 experiments:

var(t) = Σ_{k=1}^{50} [ (X_k(t) − X̄(t))² + (Y_k(t) − Ȳ(t))² + (Z_k(t) − Z̄(t))² ] / 50    (26)

where var(t) represents the variance in the tth time slice, X_k(t) represents the position in the x-direction of the kth experiment at the tth time slice, X̄(t) represents the average value in the x-direction at the tth time slice, and Y_k(t), Ȳ(t), Z_k(t) and Z̄(t) are defined similarly. A comparison of the variance is shown in Figs. 6(d) and 8(d). Figs. 7 and 9 show the average motion trajectories of the four features in the digital image.
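A small numpy illustration of (26), assuming the 50 trajectories are stored as arrays indexed by experiment and time slice.

```python
import numpy as np

def per_slice_variance(X, Y, Z):
    """Eq. (26): average squared distance to the mean trajectory per time slice.

    X, Y, Z : arrays of shape (50, T) -- position of experiment k at time slice t.
    Returns an array var(t) of length T.
    """
    dev = (X - X.mean(axis=0))**2 + (Y - Y.mean(axis=0))**2 + (Z - Z.mean(axis=0))**2
    return dev.mean(axis=0)
```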


Fig. 9. Feature trajectory for the actual scenarios: (a) trajectory for the B-IBVS, (b) trajectory for the A-IBVS, (c) trajectory for the C-IBVS and (d) trajectory for the V-PID.

Fig. 10. Learning results using the simulation platform.

Fig. 11. Learning results for an ARDrone system.

The average values in Fig. 6(a)–(c) show that the drone fluctuates easily and achieves convergence after about 220 control cycles using C-IBVS. Using A-IBVS, the motion trajectory of the drone is smooth and the quadrotor drone converges to zero after about 170 control cycles. Using the V-PID control, the drone converges to zero after about 350 control cycles. Using B-IBVS, the motion trajectory is smooth and the drone converges to zero after about 195 control cycles, which is similar to C-IBVS and slower than A-IBVS. The reason is that there is no noise in the simulation platform and a term J∗ is added to the A-IBVS, which has been proven to give a better control result than C-IBVS [5]. In Fig. 6(d), because there is no noise, the variances of the four methods are small, at less than 4 cm, so each method allows a stable simulation. The same results are seen in Fig. 7.


Fig. 12. A comparison of the control for adaptive servoing gain using RL and fixed servoing gain with no RL.

Fig. 13. Tracking trajectories for different speeds: (a) tracking trajectory at 0.2 m/s, (b) tracking trajectory at 0.55 m/s and (c) tracking trajectory at 0.6 m/s.

For the real environment with image noise, Fig. 8(a)–(c) show that C-IBVS converges after about 320 control cycles, A-IBVS converges after about 250 control cycles, V-PID converges after about 360 control cycles and B-IBVS converges after about 200 control cycles. Fig. 8(d) shows that B-IBVS has the least variance. When the drone arrives at the desired position, B-IBVS produces the least jitter, as shown in Fig. 9, which demonstrates that the bagging method reduces the effect of image noise more than the Moore-Penrose pseudo inverse. The simulation and the experiments in real space show that B-IBVS converges at a faster rate and deals with noise better than the compared methods.

5.3. Adaptive servoing gain using Q-learning

The proposed method for an adaptive servoing gain is initially implemented on the simulation platform, and the learnt policy is then transferred to the quad-rotor drone for on-line learning. The intrinsic parameters of the RL are shown in Table 2. As in Section 5.2, the initial position is (0.5 m, 0.24 m, 1.0 m, −140°), the desired position is (0.0 m, 0.0 m, 1.8 m, 0°), and t = 0.04 s. The rules for the Q-learning experiments are:
1) If the quadrotor drone does not arrive at the desired position after 400 control cycles, the episode is terminated and the next episode starts.
2) If some features are lost, the episode is terminated and the next episode starts.
3) When the quad-rotor drone is moving, if the distance between the current position and the desired position remains within ε pixels for a certain time, the episode is terminated and the next episode starts.
4) After every control cycle, the λ matrix is updated using (20).
5) When a new episode starts, the starting position and the desired position are changed randomly.

Table 2
Intrinsic parameters of RL.

Identified parameter    Value
ε                       5 pixels
u                       5
λ                       0.5
n                       7
q                       0.1
γ                       0.85
α                       0.8

The control cycle cost from the starting point to the end point during learning on the simulation platform is shown in Fig. 10. Fig. 10 shows that the final time cost for visual servoing in the simulation is less than 119 control cycles when RL is used, which is about half the time that is required initially. The trained servoing gains are then used in the real scene to continue the learning. Fig. 11 shows that for the real scene the adaptive servoing gain method that uses RL is also efficient and effective. Fig. 12 compares the final visual servoing results of the BVS-Q and B-IBVS without RL (B-IBVS W/O RL) with λ = 0.5 in the real scene. It is seen that the variance for BVS-Q is also small, so B-IBVS with RL gives a faster convergence rate and good stability. This demonstrates that the adaptive servoing gain method using RL is efficient and practical.

5.4. Tracking a moving target

To demonstrate the practicality of the proposed BVS-Q, experiments to track a moving target use the same initial parameters as those in Section 5.3. In the experiment, a specific image is placed on top of a wheeled robot and the drone follows this robot. The wheeled robot’s trajectory is a 250 mm ∗ 250 mm polygon. The initial speed is 0.2 m/s, and this was increased by 0.05 m/s over 20 consecutive experiments. The tracking trajectories for the robot moving at different speeds are shown in Fig. 13. The figure shows that as the speed of the wheeled robot increases, the real-time error between the tracking trajectory of the quad-rotor drone and the moving trajectory of the wheeled robot also increases. When the wheeled robot moves too fast, some features are lost in the image plane. The experiments show that when the wheeled robot moves at 0.55 m/s, the proposed BVS-Q tracks the target perfectly. At a speed of 0.6 m/s, the trajectory fluctuates significantly.

6. Conclusion

This study proposes an adaptive visual servoing control system for a quad-rotor drone. The BVS-Q controller tracks either a stationary or a slowly moving target and addresses the problems of generic IBVS methods for quad-rotor drones, such as complex dynamic models, a lack of robustness, under-actuation and the need for a constant servoing gain. The bagging-IBVS accommodates the under-actuated dynamics using an image Jacobian and motion parameters with reduced dimensionality, and it approximates the desired velocity by assigning greater weight to the correct features in order to reduce the effect of image noise. The reduced motion parameters are then extended into a full set of servoing controls using the specific dynamics of a quad-rotor drone. An adaptive servoing gain method that uses Q-learning is proposed to constantly adjust the servoing gain, which removes the need for manual regulation. The results of experiments that use simulation platforms and actual scenes verify that the proposed BVS-Q allows excellent hovering control and target tracking, better adaptability, greater robustness and faster convergence than the competing methods.

In the future, visual servoing will be used extensively for all aspects of robotic control, including manipulators, wheeled robots and underwater robots. Robots will replace humans for more difficult tasks, which means that drones will have to be safer, more stable and more flexible. For example, when the feature image for visual servoing is lost, the missing information can be completed using prior information about the observed object during the preprocessing of the visual servoing features.
In terms of the dynamic effects of the kinematic model, future studies might reduce the effect on the calculation of the image feature error by increasing the stability of the model.

Acknowledgments

This work was supported by the National Key Research and Development Program of China [No. 2017YFB1001900]; the Aeronautical Science Foundation of China [No. 2016ZC53022]; and the Seed Foundation of Innovation and Creation for Graduate Students in Northwestern Polytechnical University [No. ZZ2018026].

References

[1] A. Assa, F. Janabi-Sharifi, Virtual visual servoing for multicamera pose estimation, IEEE/ASME Trans. Mechatron. 20 (2) (2015) 789–798.
[2] M. Bakthavatchalam, F. Chaumette, E. Marchand, Photometric moments: new promising candidates for visual servoing, in: 2013 IEEE International Conference on Robotics and Automation (ICRA), 2013, pp. 5241–5246.


[3] Q. Bateux, E. Marchand, J. Leitner, F. Chaumette, P. Corke, Training deep neural networks for visual servoing, in: 2018 IEEE International Conference on Robotics and Automation (ICRA), 2018, pp. 1–8.
[4] O. Bourquardez, R. Mahony, N. Guenard, F. Chaumette, T. Hamel, L. Eck, Image-based visual servo control of the translation kinematics of a quadrotor aerial vehicle, IEEE Trans. Robot. 25 (3) (2009) 743–749.
[5] F. Chaumette, S. Hutchinson, Visual servo control. I. Basic approaches, IEEE Robot. Autom. Mag. 13 (4) (2006) 82–90.
[6] J. Favre, B.M. Jolles, O. Siegrist, K. Aminian, Quaternion-based fusion of gyroscopes and accelerometers to improve 3D angle measurement, Electron. Lett. 42 (11) (2006) 612–614.
[7] L.R.G. Carrillo, A. Dzul, R. Lozano, Hovering quad-rotor control: a comparison of nonlinear controllers using visual feedback, IEEE Trans. Aerosp. Electron. Syst. 48 (4) (2012) 3159–3170.
[8] L.R.G. Carrillo, G.R. Flores Colunga, G. Sanahuja, V. Kumar, Quad rotorcraft switching control: an application for the task of path following, IEEE Trans. Control Syst. Technol. 22 (4) (2014) 1255–1267.
[9] M.C. Chuang, J.N. Hwang, K. Williams, R. Towler, Tracking live fish from low-contrast and low-frame-rate stereo videos, IEEE Trans. Circt. Syst. Video Technol. 25 (1) (2015) 167–179.
[10] J. Gao, A.A. Proctor, Y. Shi, C. Bradley, Hierarchical model predictive image-based visual servoing of underwater vehicles with adaptive neural network dynamic control, IEEE Trans. Cybern. 46 (10) (2016) 2323–2334.
[11] N. Guenard, T. Hamel, R. Mahony, A practical visual servo control for an unmanned aerial vehicle, IEEE Trans. Robot. 24 (2) (2008) 331–340.
[12] M. Galar, A. Fernandez, E. Barrenechea, H. Bustince, F. Herrera, A review on ensembles for the class imbalance problem: bagging-, boosting-, and hybrid-based approaches, IEEE Trans. Syst. Man Cybern. Part C 42 (4) (2012) 463–484.
[13] J.E. Gomez-Balderas, G. Flores, L.R.G. Carrillo, R. Lozano, Tracking a ground moving target with a quadrotor using switching control, J. Intell. Robot. Syst. 70 (1) (2013) 65–78.
[14] F. Janabi-Sharifi, L. Deng, W.J. Wilson, Comparison of basic visual servoing methods, IEEE/ASME Trans. Mechatron. 16 (5) (2011) 967–983.
[15] P. Jiang, Y. Cheng, X. Wang, Z. Feng, Unfalsified visual servoing for simultaneous object recognition and pose tracking, IEEE Trans. Cybern. 46 (12) (2016) 3032–3064.
[16] J.H. Jean, F.L. Lian, Robust visual servo control of a mobile robot for object tracking using shape parameters, IEEE Trans. Control Syst. Technol. 20 (6) (2012) 1461–1472.
[17] Y. Kubota, Y. Iwatani, Dependable visual servo control of a small-scale helicopter with a wireless camera, in: 2011 15th International Conference on Advanced Robotics (ICAR), 2011, pp. 476–481.
[18] F. Le Bras, T. Hamel, R. Mahony, C. Barat, J. Thadasack, Approach maneuvers for autonomous landing using visual servo control, IEEE Trans. Aerosp. Electron. Syst. 50 (2) (2014) 1051–1065.
[19] Y. Liu, L. Nie, L. Han, L. Zhang, D.S. Rosenblum, Action2Activity: recognizing complex activities from sensor data, in: IJCAI, 2015, pp. 1617–1623.
[20] Y. Liu, Y. Zheng, Y. Liang, S. Liu, D.S. Rosenblum, Urban water quality prediction based on multi-task multi-view learning, in: IJCAI, 2016, pp. 2576–2582.
[21] N. Marchenko, C. Bettstetter, Cooperative ARQ with relay selection: an analytical framework using semi-Markov processes, IEEE Trans. Veh. Technol. 63 (1) (2014) 178–190.
[22] Y. Wang, G. Zhang, H. Lang, B. Zuo, C.W. De Silva, A modified image-based visual servo controller with hybrid camera configuration for robust robotic grasping, Robot. Auton. Syst. 62 (10) (2014) 1398–1407.
[23] R. Ozawa, F. Chaumette, Dynamic visual servoing with image moments for a quadrotor using a virtual spring approach, in: 2011 IEEE International Conference on Robotics and Automation (ICRA), 2011, pp. 5670–5676.
[24] P. Serra, R. Cunha, T. Hamel, C. Silvestre, F. Le Bras, Nonlinear image-based visual servo controller for the flare maneuver of fixed-wing aircraft using optical flow, IEEE Trans. Control Syst. Technol. 23 (2) (2015) 570–583.
[25] A. Santamaria-Navarro, J. Andrade-Cetto, Uncalibrated image-based visual servoing, in: 2013 IEEE International Conference on Robotics and Automation (ICRA), 2013, pp. 5247–5252.
[26] H. Shi, X. Li, K.S. Hwang, W. Pan, G. Xu, Decoupled visual servoing with fuzzy Q-learning, IEEE Trans. Ind. Inform. 14 (1) (2018) 241–252.
[27] C.Y. Tsai, C.C. Wong, C.J. Yu, C.C. Liu, T.Y. Liu, A hybrid switched reactive-based visual servo control of 5-DOF robot manipulators for pick-and-place tasks, IEEE Syst. J. 9 (1) (2015) 119–130.
[28] J. Thomas, G. Loianno, K. Sreenath, V. Kumar, Toward image based visual servoing for aerial grasping and perching, in: 2014 IEEE International Conference on Robotics and Automation (ICRA), 2014, pp. 2113–2118.
[29] H. Van Hasselt, A. Guez, D. Silver, Deep reinforcement learning with double Q-learning, in: 2016 Thirtieth AAAI Conference on Artificial Intelligence (AAAI), 2016, pp. 2094–2100.
[30] F. Wang, Z. Liu, C.L.P. Chen, Y. Zhang, Adaptive neural network-based visual servoing control for manipulator with unknown output nonlinearities, Inf. Sci. 451 (2018) 16–33.
[31] K. Watanabe, Y. Yoshihata, Y. Iwatani, K. Hashimoto, Image-based visual PID control of a micro helicopter using a stationary camera, Adv. Robot. 22 (2) (2008) 381–393.
[32] H. Xie, A.F. Lynch, Input saturated visual servoing for unmanned aerial vehicles, IEEE Trans. Mechatron. 22 (2) (2017) 952–960.
[33] D. Zheng, H. Wang, J. Wang, S. Chen, W. Chen, X. Liang, Image-based visual servoing of a quadrotor using virtual camera approach, IEEE Trans. Mechatron. 22 (2) (2017) 972–982.