An improved recurrent neural network for unmanned underwater vehicle online obstacle avoidance


Ocean Engineering 189 (2019) 106327

Contents lists available at ScienceDirect

Ocean Engineering journal homepage: www.elsevier.com/locate/oceaneng

Changjian Lin a, Hongjian Wang a,*, Jianya Yuan a, Dan Yu a, Chengfeng Li a,b

a College of Automation, Harbin Engineering University, 145 Nantong Street, Nangang District, Harbin, 150001, China
b College of Electrical Engineering, Suihua University, 18 Huanghe Street, Suihua, 152061, China

ARTICLE INFO

Keywords: Online obstacle avoidance planning; Recurrent neural network; Unmanned underwater vehicle; Convolution

ABSTRACT

This paper focuses on online obstacle avoidance planning for unmanned underwater vehicles. To improve the autonomous ability and intelligence of obstacle avoidance planning, a recurrent neural network with convolution is proposed. In the proposed method, convolution replaces full connection in the standard recurrent neural network, thus reducing the number of parameters and improving the feature extraction capability. Training and test datasets are generated for the deep learning process and, combined with multibeam forward-looking sonar, this learning system can automatically realize online obstacle avoidance for unmanned underwater vehicles. Experiments are designed to compare the performance of the proposed structure with that of a recurrent neural network, gated recurrent units, and conventional ant colony optimization. The results verify the effectiveness and feasibility of the proposed method, and show that the powerful learning and memory capabilities of the proposed structure can be used in the unmanned underwater vehicle autonomous learning environment. This study demonstrates that recurrent neural networks with convolution greatly enhance the ability of unmanned systems to sense and adapt to unknown environments.

* Corresponding author.
E-mail addresses: [email protected] (C. Lin), [email protected] (H. Wang), [email protected] (J. Yuan), [email protected] (D. Yu), 717620235@qq.com (C. Li).
https://doi.org/10.1016/j.oceaneng.2019.106327
Received 16 November 2018; Received in revised form 29 June 2019; Accepted 15 August 2019; Available online 5 September 2019
0029-8018/© 2019 Elsevier Ltd. All rights reserved.

Nomenclature

NOE                global coordinate system
x_b o y_b          local coordinate system
η = [x, y, ψ]^T    position vector and heading of unmanned underwater vehicle (UUV) in global coordinate system
V = [u, v, r]^T    velocity vector corresponding to the surge, sway, and yaw of UUV in local coordinate system
X                  force in the x direction (N)
Y                  force in the y direction (N)
N                  moment of force in the ψ direction (Nm)
R(ψ)               transformation matrix from global coordinate system to local coordinate system
τ̃                  actuator input
m                  mass of UUV (kg)
M                  system inertia matrix
C                  Coriolis-centripetal matrix
D                  hydrodynamic damping matrix
T                  propeller coefficients
l'                 distance between propeller and central axis of UUV (m)
n                  speed of propeller (rpm)
d_t^i              shortest distance between intersection of i-th ray with obstacles and current position of sonar at time step t (m)
Δd                 input distance vector
σ                  activation function
tanh               hyperbolic tangent function
ReLU               rectified linear unit
W, U, V            weight matrix
*                  convolution
x                  input vector
y                  desired output
ŷ                  actual output
a                  output vector of convolutional layer
o                  output vector
b, c               bias
h                  output vector of hidden layer
pool               pooling
δ                  error term
diag               diagonal matrix
rot                rotate
upsample           magnify the matrix and redistribute the elements
J                  cost function
ψ_t                heading of UUV at current time step
input              input data in dataset
label              label in dataset
Δψ                 yaw of UUV
φ                  angle of line from current UUV position to target
min                minimum of each column
max                maximum of each column

Subscripts
RB                 rigid body
A                  added mass
a                  left propeller
b                  right propeller
l                  layer ID
t                  time step
τ                  maximum time step
L                  maximum convolutional layer

1. Introduction

The versatility and flexibility of unmanned underwater vehicles (UUVs) have resulted in a wide range of applications in oceanography, marine engineering, geomorphology, and physiognomy. In most applications, the UUV navigates in unknown environments, in which obstacles must be avoided based on environmental information obtained from the UUV sensors. The purpose of UUV obstacle avoidance planning is to output optimal control instructions according to the real-time environmental information detected by sonar and the movement state of the UUV, to plan a safe path for the UUV that meets the optimization objective (e.g., shortest path, shortest time consumption, or minimum energy consumption), and to satisfy various constraints, such as avoiding obstacles, maximum speed, maximum acceleration, maximum energy consumption, and time consumption.

Traditional obstacle avoidance planning methods include the rapidly-exploring random tree (Hernandez et al., 2015), the vector polar histogram method (Wang et al., 2013), the potential field (Solari et al., 2017), BK-products (Bui and Kim, 2004), particle swarm optimization (Lin et al., 2017), ant colony optimization (ACO) (Wu et al., 2007; Zhang et al., 2015), quantum particle swarm optimization (Wang et al., 2016), quantum ant colony optimization (Lin et al., 2018), the differential evolution algorithm (Li et al., 2014), the fuzzy controller (Fang et al., 2015), and some hybrid methods (Braginsky and Guterman, 2016; Li et al., 2018; Xu et al., 2016; Zeng et al., 2014; Zhuang et al., 2016).

However, traditional UUV obstacle avoidance planning methods have the following limitations. (i) The detection sensor used in the underwater environment is usually forward-looking sonar (FLS). However, FLS is susceptible to environmental interference, high levels of noise, false alarms, and missed alarm signals. Based on such sensor data, traditional obstacle avoidance planning methods can only make low-confidence decisions, which may cause the UUV to adjust its motion frequently according to the online sensing information. (ii) Traditional obstacle avoidance algorithms are based on mathematical models. Accurate modeling to reflect small changes in the system is difficult, especially for uncertain systems. (iii) The complexity of the obstacle avoidance planning algorithm increases with the complexity of the environment. Traditional obstacle avoidance algorithms solve an optimization problem using a random search strategy over a large feasible solution space. The computational cost of the algorithm is reflected in the time and space complexity; however, more complicated algorithms have a greater probability of error and may not guarantee convergence to the approximate global optimal solution. Therefore, it is difficult to meet the timeliness requirements of online real-time planning and the requirements for optimization goals. (iv) Traditional obstacle avoidance planning methods are usually designed for certain optimization goals. When the environmental conditions or mission requirements change, it is necessary to redesign the corresponding optimization goal model. Therefore, traditional planning methods do not have sufficient portability, interoperability, or adaptability.

To improve the autonomy of the obstacle avoidance system, some researchers have introduced reinforcement learning and improved the avoidance performance. Ye et al. (2003) presented a neural fuzzy system with a mixed learning algorithm, in which a supervised learning method is used to determine the input and output membership functions simultaneously and a reinforcement learning algorithm is employed to fine-tune the output membership functions for obstacle avoidance. Huang et al. (2005) integrated Q-learning and a neural network for obstacle avoidance in an indoor mobile robot, whereas Jiao et al. (2006) implemented Q-learning in a neural network and applied it to local path planning for an autonomous underwater vehicle. Vien et al. (2007) used the ant-Q framework, which extends ACO with reinforcement learning, to solve the problem of obstacle avoidance path planning for a mobile robot, and Qiao et al. (2008) integrated reinforcement learning with a behavior-based architecture for obstacle avoidance in a dynamic environment. Kurozumi et al. (2010) presented a minimum vector field histogram method that modifies the user manipulation of an electric wheelchair to avoid obstacles. In this method, the modification rate is adjusted by reinforcement learning according to the environment and the user condition. Megherbi and Malayia (2012) combined a potential field with reinforcement learning to solve the problem of multi-agent-based autonomous path planning in a dynamic time-varying

unstructured environment. Li et al. (2013) proposed an obstacle avoidance algorithm for a robot based on a combination of the state–action–reward–state–action (SARSA(λ)) algorithm and supervised learning, and demonstrated that the hybrid learning algorithm could significantly reduce the learning time. For obstacle avoidance in unmanned surface vehicles, Zhang et al. (2014) proposed a SARSA reinforcement learning algorithm composed of local avoidance and adaptive learning modules. Duguleana and Mogan (2016) used Q-learning and a neural network planner to solve the problem of autonomous movement in four-wheeled robots in unknown environments. Cheng and Zhang (2017) proposed a deep reinforcement learning obstacle avoidance algorithm with the deep Q-network architecture for underactuated unmanned marine vessels under unknown environment dynamics. Fathinezhad et al. (2016) proposed a method of supervised fuzzy SARSA learning for robot navigation in various environments with obstacles. Dabooni and Wunsch (2016) presented a direct heuristic dynamic programming approach for online model learning in a Markov decision process; this shortens the computation time and improves stability when compared with other traditional reinforcement learning algorithms.

Despite the considerable advances made in the abovementioned research, it is difficult to design the reward functions and action strategy for reinforcement learning-based obstacle avoidance planning methods. Additionally, even the most reasonable rewards cannot avoid the problem of local minima. In particular, when UUVs navigate in narrow marine environments, it is difficult to formulate an optimal collision avoidance strategy based on reinforcement learning from limited local environmental information.

Recurrent neural networks (RNNs) were first proposed as a network model for time series (Elman, 1990; Werbos, 1988). RNNs remember previous information and apply it to the calculation of the current output.
The input to the hidden layer of an RNN includes the output of the input layer and the output from the previous hidden layer. Therefore, the internal state of this network can exhibit dynamic timing behavior. For the problem of UUV obstacle avoidance planning, UUV control commands must be generated according to temporal sensor information. For UUVs, environment information is detected by sonar, which means that the input to the obstacle avoidance planning system is a temporal sonar detection data sequence. However, traditional RNNs cannot extract environment features effectively, and contain many network parameters.

To overcome these problems, this paper proposes an RNN with convolution (CRNN) in which a convolution connection replaces the full connection between adjacent layers of the RNN. Based on CRNN, a system of UUV autonomous obstacle avoidance planning is constructed. Offline training and testing is adopted to modify the neural network parameters of the UUV autonomous obstacle avoidance learning system, so that self-learning is applied to the collision avoidance planning. Combining this learning system with FLS simulation data enables online autonomous obstacle avoidance planning in an unknown environment. The ability of CRNN-based obstacle avoidance planning systems is validated through simulations for various static and unknown obstacle environments using a nonlinear UUV model. The self-learning and memory capabilities of the CRNN mean that the adjustment range for UUV motion is small and consistent with the input time series, which is beneficial to UUV motion control. The proposed CRNN has good generalization capabilities, and fully trained CRNN obstacle avoidance planning methods can be applied to a wider range of environmental conditions and mission requirements. The convolutional connections allow the CRNN to solve complex obstacle avoidance planning problems with fewer parameters.

Fig. 1. Global and local coordinate systems.

Fig. 2. 2D simulation model of FLS.

2. Simulation model of UUV and FLS

2.1. System model of UUV

Obstacle avoidance planning on the vertical plane is usually achieved through depth adjustment. However, depth adjustment strategies often require large pitch adjustments, which affect the attitude control of the target vehicle. Considering the need to maintain a constant height when employing FLS and to reduce energy consumption, this study gives priority to obstacle avoidance regulation in the horizontal plane. Therefore, a horizontal three-degrees-of-freedom (3DOF) control model is defined for the UUV, which means the dynamics of the vessel associated with heave, roll, and pitch motions are ignored. The global and local coordinate systems are shown in Fig. 1. The 3DOF control model of the UUV is described as follows (Fossen, 2002):

$$\dot{\eta} = R(\psi)V \tag{1}$$

$$M\dot{V} + C(V)V + D(V)V = \tilde{\tau} \tag{2}$$

where $M = M_{RB} + M_A$ and $C(V) = C_{RB}(V) + C_A(V)$. Specifically,

$$R(\psi) = \begin{bmatrix} \cos\psi & -\sin\psi & 0 \\ \sin\psi & \cos\psi & 0 \\ 0 & 0 & 1 \end{bmatrix} \tag{3}$$

$$M_{RB} = \begin{bmatrix} m & 0 & 0 \\ 0 & m & m x_G \\ 0 & m x_G & I_z \end{bmatrix}, \quad M_A = \begin{bmatrix} -X_{\dot{u}} & 0 & 0 \\ 0 & -Y_{\dot{v}} & -Y_{\dot{r}} \\ 0 & -Y_{\dot{r}} & -N_{\dot{r}} \end{bmatrix} \tag{4}$$

$$C_{RB}(V) = \begin{bmatrix} 0 & 0 & -m(x_G r + v) \\ 0 & 0 & m u \\ m(x_G r + v) & -m u & 0 \end{bmatrix}, \quad C_A(V) = \begin{bmatrix} 0 & 0 & Y_{\dot{v}} v + Y_{\dot{r}} r \\ 0 & 0 & -X_{\dot{u}} u \\ -Y_{\dot{v}} v - Y_{\dot{r}} r & X_{\dot{u}} u & 0 \end{bmatrix} \tag{5}$$

$$D(V) = \begin{bmatrix} X_{|u|u}|u| + X_u & 0 & 0 \\ 0 & Y_{|v|v}|v| + Y_v & Y_{|v|r}|v| - Y_r \\ 0 & N_{|r|v}|r| - N_v & N_{|r|r}|r| + N_r \end{bmatrix} \tag{6}$$

Fig. 3. Network structure of CRNN (bias units are omitted to simplify the visualization).

Table 1
Training performance of all structures.

Structure  Number of parameters  Number of connections  Mean iteration time (s)  Best MSE
RNN1       9395                  9395                   6.17                     0.1966
RNN2       20679                 20679                  11.25                    0.1217
GRU1       9362                  9346                   6.36                     0.1629
GRU2       20546                 20546                  11.4                     0.1031
CRNN1      9282                  18000                  6.44                     0.1126
CRNN2      20534                 30180                  12.82                    0.0901

Fig. 4. MSE of all structures on test set.

Table 2
Comparison of CRNN with RNN, GRU, and ACO in planning success rate, computation time, and path cost.

Metric                     Coverage of obstacles  ACO     RNN1    GRU1    CRNN1   RNN2    GRU2     CRNN2
Planning success rate (%)  5.2%                   99      88      92      98      96      100      99
                           7.75%                  94      77      84      97      85      98       99
                           11.63%                 86      55      79      86      82      98       100
Path cost (m)              5.2%                   1274.8  1354.2  1229.2  1154.8  1205.4  1130.9   1119
                           7.75%                  1575.4  1613.1  1547.6  1342.3  1520.8  1425.6   1321.4
                           11.63%                 1744    1833.3  1711.3  1541.6  1660.7  1526.8   1467.3
Time-consumption (ms)      5.2%                   195.36  48.95   52.79   48.71   172.77  156.94   160.06
                           7.75%                  198.44  50.03   53.86   49.28   175.36  157.64   156.95
                           11.63%                 215.67  52.08   56.4    52.97   179.63  1666.13  165.53

The kinematic and dynamic equations of the UUV can then be written as:

$$\begin{cases} \dot{u} = (-p_{11} u + \tau_u)/m_{11} \\ \dot{v} = (A m_{33} - B m_{23})/(m_{22} m_{33} - m_{23}^2) \\ \dot{r} = (B m_{22} - A m_{23})/(m_{22} m_{33} - m_{23}^2) \\ \dot{x} = u\cos(\psi) - v\sin(\psi) \\ \dot{y} = u\sin(\psi) + v\cos(\psi) \\ \dot{\psi} = r \end{cases} \tag{7}$$

where

$$\begin{aligned} A &= -p_{22} v + (p_{23} - u c_{23}) r, & B &= (p_{32} - u c_{32}) v - p_{33} r + \tau_r, \\ p_{11} &= X_u + X_{|u|u}|u|, & m_{11} &= m - X_{\dot{u}}, \\ p_{22} &= Y_v + Y_{|v|v}|v|, & m_{22} &= m - Y_{\dot{v}}, \\ p_{23} &= Y_r, & m_{23} &= -Y_{\dot{r}}, \\ p_{32} &= N_v, & m_{33} &= I_z - N_{\dot{r}}, \\ p_{33} &= N_r + N_{|r|r}|r|, & c_{23} &= m - X_{\dot{u}}, \\ & & c_{32} &= X_{\dot{u}} - Y_{\dot{v}}. \end{aligned}$$

The motion of the UUV is controlled by two propellers distributed in the horizontal plane of the UUV. $\tilde{\tau}$ is modeled as:

$$\begin{bmatrix} \tilde{\tau}_u \\ \tilde{\tau}_r \end{bmatrix} = \begin{bmatrix} 1 & 1 \\ l' & -l' \end{bmatrix} \begin{bmatrix} T(n_a) \\ T(n_b) \end{bmatrix} \tag{8}$$
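The component form of Eqs. (7) and (8) can be integrated directly. The following is a minimal numerical sketch, not the authors' simulator: every numerical coefficient, the quadratic thrust law `thrust`, and the forward-Euler integrator are illustrative assumptions, and the minus signs follow one reading of Eqs. (7) and (8) in which damping opposes motion and a larger left-propeller thrust produces a positive yaw moment.

```python
import numpy as np

# Hypothetical coefficients for illustration only; none are taken from the paper.
P = dict(
    m11=200.0, m22=250.0, m23=5.0, m33=80.0,   # inertia terms (incl. added mass)
    c23=200.0, c32=-50.0,                      # Coriolis coefficients
    Xu=70.0, Xuu=100.0, Yv=100.0, Yvv=200.0,   # damping coefficients
    Yr=5.0, Nv=5.0, Nr=50.0, Nrr=100.0,
    l=0.3,                                     # propeller lever arm l' (m)
)

def thrust(n, k=1e-3):
    """Assumed propeller law T(n) = k * n * |n| (not specified in the text)."""
    return k * n * abs(n)

def actuator_input(n_a, n_b, p=P):
    """Eq. (8): surge force and yaw moment from left (a) and right (b) propellers."""
    t_a, t_b = thrust(n_a), thrust(n_b)
    return t_a + t_b, p["l"] * (t_a - t_b)

def derivatives(state, tau_u, tau_r, p=P):
    """Right-hand side of Eq. (7); state = [x, y, psi, u, v, r]."""
    x, y, psi, u, v, r = state
    p11 = p["Xu"] + p["Xuu"] * abs(u)
    p22 = p["Yv"] + p["Yvv"] * abs(v)
    p23, p32 = p["Yr"], p["Nv"]
    p33 = p["Nr"] + p["Nrr"] * abs(r)
    A = -p22 * v + (p23 - u * p["c23"]) * r
    B = (p32 - u * p["c32"]) * v - p33 * r + tau_r
    det = p["m22"] * p["m33"] - p["m23"] ** 2
    du = (-p11 * u + tau_u) / p["m11"]
    dv = (A * p["m33"] - B * p["m23"]) / det
    dr = (B * p["m22"] - A * p["m23"]) / det
    dx = u * np.cos(psi) - v * np.sin(psi)
    dy = u * np.sin(psi) + v * np.cos(psi)
    return np.array([dx, dy, r, du, dv, dr])

def step(state, n_a, n_b, dt=0.1, p=P):
    """One forward-Euler step driven by the two propeller speeds."""
    tau_u, tau_r = actuator_input(n_a, n_b, p)
    return state + dt * derivatives(state, tau_u, tau_r, p)

# Equal propeller speeds: the vehicle accelerates straight ahead.
straight = np.zeros(6)
for _ in range(50):
    straight = step(straight, 300.0, 300.0)

# Unequal propeller speeds: a yaw rate builds up after one step.
turning = step(np.zeros(6), 400.0, 200.0)
```

With equal propeller speeds the yaw moment in Eq. (8) vanishes, so the sketch keeps the heading fixed; a speed difference between the propellers is the only steering input, which is why the planner's yaw command maps onto the two propeller speeds.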

2.2. Simulation model of FLS

One of the most important sensors in a UUV is sonar, which can detect obstacles and return the distance from the vehicle to the obstacle. A two-dimensional FLS simulation model is established as shown in Fig. 2. In this model, the maximum scan radius of the sonar is 120 m and the open angle is 120°. The number of beams is set to 80, which means that the beam angle is 1.5°. In the two-dimensional sonar simulation model, the beams are represented by rays, and the arrows represent the current heading of the UUV. Every ray is assigned an index from 0 to 79, increasing in the clockwise direction. The detection information of every beam is stored in a vector $d_t = [d_t^0, d_t^1, \cdots, d_t^{79}]$. If $d_t^i > 120$, then we set $d_t^i = 120$. To decrease the redundancy of information and reduce the complexity of the computations, the input distance vector is defined as $\Delta d_t = [120 - d_t^0, 120 - d_t^1, \cdots, 120 - d_t^{79}]$. The detection frequency of the simulated sonar is set to 9 Hz.

2.3. Structure of the CRNN

The structure of the proposed CRNN is shown in Fig. 3. The main difference between the CRNN and a standard RNN is how the adjacent layers are connected. In a standard RNN, the nodes between any two adjacent layers are connected to each other. In the CRNN, however, only some of the nodes between two adjacent layers are connected, in the form of a convolution. The forward propagation of the CRNN is as follows:

Pad the edge of $x^t$ to give $a^{t,1}$. For $l = 2$ to $L - 1$,

$$a^{t,l} = \begin{cases} \mathrm{ReLU}(z^{t,l}) = \mathrm{ReLU}(W^l * a^{t,l-1} + b^l) & \text{if convolutional layer} \\ \mathrm{pool}(a^{t,l-1}) & \text{if pooling layer} \\ \tanh(z^{t,l}) = \tanh(W^l a^{t,l-1} + b^l) & \text{if fully connected layer} \end{cases} \tag{9}$$

$$h^t = \tanh(W^L a^{t,L-1} + b^L + U h^{t-1}) \tag{10}$$

$$o^t = V h^t + c \tag{11}$$

$$\hat{y}^t = \tanh(o^t) \tag{12}$$
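The forward propagation in Eqs. (9)-(12) can be illustrated with a small NumPy sketch. It is a sketch under stated assumptions rather than the network used in the paper: a single one-dimensional convolutional layer with kernel size `K = 3`, size-2 max pooling, a recurrent layer of `H = 24` units, and random untrained weights are all hypothetical choices made only to show the data flow and the shapes.

```python
import numpy as np

rng = np.random.default_rng(0)
N_IN, K, H, N_OUT, T = 81, 3, 24, 2, 9   # K and H are illustrative choices
N_PAD = N_IN + 2 * (K // 2)              # length of the padded input a^{t,1}
N_POOL = N_IN // 2                       # length after size-2 max pooling

# Randomly initialised (untrained) parameters, for shape checking only.
W2, b2 = rng.normal(0.0, 0.1, K), 0.0                      # convolution kernel, bias
WL, bL = rng.normal(0.0, 0.1, (H, N_POOL)), np.zeros(H)    # input-to-hidden weights
U = rng.normal(0.0, 0.1, (H, H))                           # hidden-to-hidden weights
V, c = rng.normal(0.0, 0.1, (N_OUT, H)), np.zeros(N_OUT)   # hidden-to-output weights

def forward(xs):
    """Run the sketch over a sequence xs of shape (T, N_IN); returns (T, N_OUT)."""
    h = np.zeros(H)
    ys = []
    for x in xs:
        a1 = np.pad(x, K // 2)                                        # pad the edge of x^t
        a2 = np.maximum(np.convolve(a1, W2, mode="valid") + b2, 0.0)  # Eq. (9), conv + ReLU
        a3 = a2[: 2 * N_POOL].reshape(N_POOL, 2).max(axis=1)          # Eq. (9), max pooling (assumed)
        h = np.tanh(WL @ a3 + bL + U @ h)                             # Eq. (10)
        o = V @ h + c                                                 # Eq. (11)
        ys.append(np.tanh(o))                                         # Eq. (12)
    return np.array(ys)

xs = rng.uniform(0.0, 1.0, (T, N_IN))    # stand-in for a normalized input sequence
ys = forward(xs)

# Parameter count of this convolutional layer versus a dense layer of the same shape.
conv_params = W2.size + 1
dense_params = N_PAD * N_IN + N_IN
```

In this toy configuration the convolutional layer has 4 parameters, whereas a fully connected layer mapping the same 83 padded inputs to 81 outputs would need 6804, which illustrates why replacing full connections with convolution reduces the parameter count.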



In this paper, the cost function is defined as $J = \sum_{t=1}^{\tau} J^t = \frac{1}{2}\sum_{t=1}^{\tau} (\hat{y}^t - y^t)^2$. The back-propagation in the CRNN proceeds as follows:

The gradients of $c$ and $V$ are:

$$\frac{\partial J}{\partial c} = \sum_{t=1}^{\tau} \frac{\partial J^t}{\partial c} = \sum_{t=1}^{\tau} \frac{\partial J^t}{\partial o^t} \cdot \frac{\partial o^t}{\partial c} = \sum_{t=1}^{\tau} (\hat{y}^t - y^t)\,\mathrm{diag}\left(1 - (\hat{y}^t)^2\right) \tag{13}$$

$$\frac{\partial J}{\partial V} = \sum_{t=1}^{\tau} \frac{\partial J^t}{\partial V} = \sum_{t=1}^{\tau} \frac{\partial J^t}{\partial o^t} \cdot \frac{\partial o^t}{\partial V} = \sum_{t=1}^{\tau} (\hat{y}^t - y^t)\,\mathrm{diag}\left(1 - (\hat{y}^t)^2\right)(h^t)^T \tag{14}$$

The error term for the $L$-th layer at time step $t$ is defined as:

$$\delta^{t,L} = \frac{\partial J}{\partial h^t} = \frac{\partial J}{\partial o^t} \cdot \frac{\partial o^t}{\partial h^t} + \frac{\partial J}{\partial h^{t+1}} \cdot \frac{\partial h^{t+1}}{\partial h^t} = V^T (\hat{y}^t - y^t)\,\mathrm{diag}\left(1 - (\hat{y}^t)^2\right) + U^T \delta^{t+1,L}\,\mathrm{diag}\left(1 - (h^{t+1})^2\right) \tag{15}$$

The gradients of $W^L$, $U$, and $b^L$ are:

$$\frac{\partial J}{\partial W^L} = \sum_{t=1}^{\tau} \frac{\partial J^t}{\partial W^L} = \sum_{t=1}^{\tau} \frac{\partial J^t}{\partial h^t} \cdot \frac{\partial h^t}{\partial W^L} = \sum_{t=1}^{\tau} \mathrm{diag}\left(1 - (h^t)^2\right) \delta^{t,L} (a^{t,L-1})^T \tag{16a}$$

$$\frac{\partial J}{\partial U} = \sum_{t=1}^{\tau} \frac{\partial J^t}{\partial U} = \sum_{t=1}^{\tau} \frac{\partial J^t}{\partial h^t} \cdot \frac{\partial h^t}{\partial U} = \sum_{t=1}^{\tau} \mathrm{diag}\left(1 - (h^t)^2\right) \delta^{t,L} (h^{t-1})^T \tag{17a}$$

$$\frac{\partial J}{\partial b^L} = \sum_{t=1}^{\tau} \frac{\partial J^t}{\partial b^L} = \sum_{t=1}^{\tau} \frac{\partial J^t}{\partial h^t} \cdot \frac{\partial h^t}{\partial b^L} = \sum_{t=1}^{\tau} \mathrm{diag}\left(1 - (h^t)^2\right) \delta^{t,L} \tag{18a}$$

For $l = L - 1$ to $2$,

$$\delta^{t,l} = \begin{cases} \delta^{t,l+1} * \mathrm{rot180}(W^{l+1}) \odot \sigma'(z^{t,l}) & \text{if convolutional layer} \\ \mathrm{upsample}(\delta^{t,l+1}) \odot \sigma'(z^{t,l}) & \text{if pooling layer} \\ (W^{l+1})^T \delta^{t,l+1} \odot \sigma'(z^{t,l}) & \text{if fully connected layer} \end{cases} \tag{19}$$

The gradients of $W^l$ and $b^l$ are:

$$\frac{\partial J}{\partial b^l} = \sum_{t=1}^{\tau} \frac{\partial J^t}{\partial b^l} = \sum_{t=1}^{\tau} \frac{\partial J^t}{\partial z^{t,l}} \cdot \frac{\partial z^{t,l}}{\partial b^l} = \sum_{t=1}^{\tau} \delta^{t,l} \tag{20}$$

3. Construction of UUV autonomous obstacle avoidance planning learning system

3.1. Dataset

At time step $t$, the input to the obstacle avoidance planning network, $\mathrm{input}_t$, is an 81-dimensional vector consisting of the 80-dimensional input distance vector $\Delta d_t$ obtained from the sonar and an input direction vector $\varphi_t$. The two-dimensional output vector from the obstacle avoidance planning network at time step $t$ consists of the yaw $\Delta\psi_t$ and the velocity $v_t$ of the UUV.

The dataset used to test the proposed method consists of 100772 training samples and 899 test samples. In the offline training process, the labels used for supervised learning are obtained by traditional obstacle avoidance planning methods, such as ant colony optimization, the genetic algorithm, the artificial potential field, and the fuzzy control method. All labels in the training set are obtained by intelligent optimization algorithms in simple environments, and there is no detection noise in the simulated sonar. In these environments, the obstacles are 30 randomly distributed rectangles, the average obstacle coverage rate is 4.62%, and the start and target positions are generated at random. The training dataset is composed of the best results of the above intelligent optimization algorithms. Each sample in the dataset includes an input vector and a label:

$$\mathrm{input}_t = [\Delta d_t, \varphi_t] \tag{16b}$$

$$\mathrm{label}_t = [\Delta\psi_t, v_t] \tag{17b}$$

Data normalization is a critical step in the use of neural networks. There are two general methods to normalize the data: Min-Max normalization and Z-score normalization. The input data used in this paper include many zero values, and Z-score normalization would complicate these data. Thus, Min-Max normalization is applied. For each column in $\mathrm{input}_t$, the data are normalized according to:

$$\mathrm{input}^* = \frac{\mathrm{input} - \min}{\max - \min} \tag{18b}$$

3.2. Training

To compare the obstacle avoidance planning performance of the CRNN against that of a regular RNN and gated recurrent units (GRUs), all three models were trained with the same architecture, i.e., input layer, hidden layer, middle layer, and output layer. The input layer consists of 81 neurons corresponding to the input vector. The hidden layer consists of one of three network structures. The middle layer contains 24 neurons, and the output layer has two neurons corresponding to the yaw and velocity of the UUV. Dropout is used to deal with the problem of overfitting, with the keep probability of neurons set to 0.6. The time step is set to 9, the batch size is set to 5000, and the maximum number of iterations is set to 10000. The mean squared error (MSE) loss function and the Adam optimizer are used to evaluate model performance and optimize the parameters, respectively, and the weights are updated using backpropagation through time with mini-batch gradient descent, with the objective of minimizing the loss function.

The results in Table 1 and Fig. 4 indicate that the training time of the networks increases as the number of parameters rises. When the number of parameters is similar, the number of connections in CRNN is much higher than in both RNN and GRU, and CRNN always obtains the best MSE. With fewer parameters, CRNN1 obtains a smaller MSE than RNN2, and is close to the performance of GRU2. As shown in Fig. 4, the convergence speed of the networks decreased as the number of parameters increased. In general, CRNN outperforms the RNN and GRU models in terms of both MSE and convergence.

Fig. 5. UUV paths planned by seven obstacle avoidance planning systems in test case 1.

Fig. 6. Yaw adjustment instructions and propeller speed control feedback of UUV for the different obstacle avoidance planners in test case 1: (a) yaw control adjustment angles output by the different obstacle avoidance planners; (b) left propeller speed control feedback from the different obstacle avoidance planners; (c) right propeller speed control feedback from the different obstacle avoidance planners.

Fig. 7. UUV paths planned by six obstacle avoidance planning systems in test case 2.

Fig. 8. Yaw adjustment instructions and propeller speed control feedback of UUV for the different obstacle avoidance planners in test case 2: (a) yaw control adjustment angles; (b) left propeller speed control feedback; (c) right propeller speed control feedback.

Fig. 9. UUV paths planned by GRU2, CRNN1/2, and ACO through the maze map.

Fig. 10. Yaw adjustment instructions and propeller speed control feedback of UUVs for the different obstacle avoidance planners in test case 3: (a) yaw control adjustment angles; (b) left propeller speed control feedback; (c) right propeller speed control feedback.

Fig. 11. UUV paths planned by all obstacle avoidance planning systems under 10% noise jamming.

4. Simulation experiments and results analysis

In this section, the performance of RNN obstacle avoidance planning networks is validated through a comparison with ACO on a series of experiments. As a statistical experiment, three sets of maps were generated based on the coverage of obstacles, with each set containing 100 random maps. Three test cases examine the generalization ability of the obstacle avoidance planning networks. To verify the performance of the proposed CRNN obstacle avoidance planner in the presence of noise, 10% and 20% noise are then added to the FLS signals to conduct noise jamming experiments. In all test cases, the initial velocity of the UUV is set to 8 knots. To facilitate control of the UUV and save energy, each obstacle avoidance planning algorithm mainly realizes obstacle avoidance planning in a static environment by outputting a stable speed and adjusting the yaw in real time. Therefore, each algorithm outputs a stable speed to guide the UUV to navigate at a uniform speed. Since each algorithm can output a stable speed, we pay more attention to experimental results that reflect the differences between algorithms, such as time consumption, path cost, and sensitivity to noise.
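Random obstacle maps with a prescribed coverage, as used for the training environments and for the statistical map sets, can be sketched as follows. The text does not specify the map generator, so the grid size of 1100 x 600 cells, the maximum rectangle side `max_side`, the stopping rule (add rectangles until the target coverage is reached), and the helper name `random_obstacle_map` are all hypothetical choices for illustration.

```python
import numpy as np

def random_obstacle_map(target_coverage, rng, height=1100, width=600, max_side=60):
    """Add random axis-aligned rectangles until the occupied fraction reaches the target."""
    grid = np.zeros((height, width), dtype=bool)
    while grid.mean() < target_coverage:
        h, w = rng.integers(1, max_side + 1, size=2)   # rectangle size in cells
        top = rng.integers(0, height - h + 1)          # keep the rectangle inside the map
        left = rng.integers(0, width - w + 1)
        grid[top:top + h, left:left + w] = True        # overlaps are allowed
    return grid

rng = np.random.default_rng(0)
# Three maps per coverage level here to keep the sketch fast;
# the statistical experiment in the text uses 100 maps per set.
map_sets = {
    cov: [random_obstacle_map(cov, rng) for _ in range(3)]
    for cov in (0.052, 0.0775, 0.1163)
}
achieved = {cov: [float(g.mean()) for g in maps] for cov, maps in map_sets.items()}
```

Because the last rectangle can overshoot the target, the achieved coverage lies between the target and the target plus the area fraction of one maximum-size rectangle; a generator that must hit the coverage exactly would need an additional trimming or rejection step.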

4.1. Statistical experiment design

Information about the map sets and the performance of the obstacle avoidance planners are shown in Table 2. This section compares the performance of ACO with six deep learning-based obstacle avoidance planners in terms of computation time, planning success rate, and path cost. Note that the coverage of obstacles in the training dataset is 4.62%, which is less than that of any map in the statistical experiment. In complex environments, all obstacle avoidance planners except RNN1 obtained high success rates. The experimental results show that the planners have sufficient generalization ability to solve obstacle avoidance problems in complex environments after being trained in simple environments. This strong generalization ability makes it easy to collect sufficient training data in practical applications.

As shown in Table 2, compared with the ACO obstacle avoidance system, the obstacle avoidance planners based on deep learning offer better performance in terms of both computation time and path cost. The mean computation time of the four types of networks is basically proportional to the number of parameters and the coverage of the obstacles. In any one network type, more parameters lead to better performance in terms of path cost and planning success rate, but a longer computation time. The GRU and CRNN models outperform RNN in terms of path cost and planning success rate. Compared with GRU2, CRNN1 achieves competitive performance in both path cost and planning success rate. It is worth noting that CRNN1 requires less than one-third the CPU time of GRU2.

4.2. Typical simulation cases without noise jamming

4.2.1. Test case 1

In test case 1, the start position is (524, 33) and the target position is (340, 1075). As shown in Fig. 5, all methods can plan safe and collision-free paths from the start position to the target position in this test case. The deep learning-based obstacle avoidance planners have learned to adjust the UUV's heading to navigate toward the target position quickly after avoiding obstacles. Compared with ACO, the obstacle avoidance planners based on deep learning produce shorter and smoother paths. The yaw control adjustments output by the different obstacle avoidance planners are shown in Fig. 6(a). The UUV motion controller's propeller speed control feedback, as shown in Fig. 6(b) and (c), corresponds with the yaw control adjustment instructions output by the obstacle avoidance planner. As shown in Fig. 6, the adjustment range obtained by the deep learning-based obstacle avoidance planners is much smaller than that of ACO, both in yaw adjustment and propeller speed. This improves the kinematics of the UUV and reduces the energy consumption of the actuators, which increases their lifespan.

4.2.2. Test case 2

To test the obstacle avoidance planning networks, a complex wharf map is considered in test case 2. The start and target positions are (262, 37) and (320, 1077), respectively. The simulation results are shown in Figs. 7 and 8. In this case, the performance of RNN2, GRU1/2, and CRNN1/2 is satisfactory, but RNN1 fails to complete the obstacle avoidance task. There is a lot of turbulence in the path planned by ACO, whereas the paths obtained by the deep learning obstacle avoidance planners are very smooth. This is because of the strong memory and learning ability of deep learning methods, which enhance the environmental awareness of the obstacle avoidance planning system.

4.2.3. Test case 3

In test case 3, a maze map tests the ability of the obstacle avoidance planning networks still further. Note that the training dataset used in this study does not contain any maze maps. The simulation results are shown in Figs. 9 and 10. In this environment, RNN1/2 and GRU1 failed to complete the obstacle avoidance planning. GRU2 and CRNN1/2 still exhibit good performance, which demonstrates their learning and generalization abilities. Although the path planned by ACO is the shortest, it is not appropriate for the kinematics of UUVs.

4.3. Simulation cases under noise jamming

4.3.1. Test case 4

In test case 4, 10% noise is added to the data detected by FLS. The performance of the seven obstacle avoidance planners is shown in Figs. 11 and 12. All deep learning-based planners perform well and are not affected by the noise. However, the ACO-based obstacle avoidance planner is sensitive to noise.

4.3.2. Test case 5

We added 20% noise to the data detected by FLS in test case 5. As shown by the simulation results in Figs. 13 and 14, the deep learning-based planners are clearly less sensitive to noise. In contrast, there are many fluctuations on the path planned by the ACO-based obstacle avoidance planner.

Fig. 12. Yaw adjustment instructions and propeller speed control feedback of UUVs for the different obstacle avoidance planners under 10% noise jamming: (a) yaw control adjustment angles; (b) left propeller speed control feedback; (c) right propeller speed control feedback.

Fig. 13. UUV paths planned by all obstacle avoidance planning systems under 20% noise jamming.

4.4. Analysis of simulation experiments

Compared with the conventional ACO obstacle avoidance planning method, the deep learning-based obstacle avoidance planning systems require offline training and learning supported by a large training dataset. Once fully trained, the deep learning-based obstacle avoidance planning systems offer shorter planning times, economical paths, and low propeller use. Because of the strong associative memory function, learning, and generalization capabilities of neural networks, the deep learning-based UUV obstacle avoidance planning systems can directly and autonomously output obstacle-avoiding motion control instructions according to the environmental sensing from FLS. As the deep learning-based planning systems have some memory of the environment, both the speed adjustment of the propeller and the heading adjustment are smooth and change over relatively small ranges. The autonomous learning ability of neural networks gives UUV obstacle avoidance planning systems a higher planning success rate and stronger environmental adaptive ability. Test cases 4 and 5 show that deep learning-based obstacle avoidance planners are less sensitive to noise, which is important for obstacle avoidance planning based on sonar.

Statistical experimental results show that CRNN1 is comparable to GRU2 and superior to GRU1 and RNN1/2. In other words, the proposed CRNN can solve the problem of UUV obstacle avoidance planning with fewer parameters and shorter computation times. The simulation results for test case 1 demonstrate that all six structures have excellent learning ability and fitting ability. Test cases 2 and 3 show that the generalization ability of CRNN is better than that of RNN and GRU, even with a similar number of parameters. For any given network, the generalization ability is related to the number of parameters. In other words, compared with RNN and GRU, CRNN has the advantages of a simpler structure, fewer parameters, faster training speed, and better learning, generalization, and robustness.
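The relative figures quoted from Table 2 can be checked with a few lines of arithmetic. The snippet below is only a worked check, with the values copied from Table 2 as printed (including the 1666.13 ms entry for GRU2 at 11.63% coverage); the variable names are illustrative.

```python
coverage = ["5.2%", "7.75%", "11.63%"]

# Mean computation time (ms) and path cost (m), copied from Table 2 as printed.
time_ms = {
    "ACO": [195.36, 198.44, 215.67],
    "CRNN1": [48.71, 49.28, 52.97],
    "GRU2": [156.94, 157.64, 1666.13],
}
path_m = {
    "ACO": [1274.8, 1575.4, 1744.0],
    "CRNN1": [1154.8, 1342.3, 1541.6],
}

# Ratio of CRNN1 to GRU2 computation time at each coverage level.
time_ratio = [c / g for c, g in zip(time_ms["CRNN1"], time_ms["GRU2"])]

# Fractional path-cost saving of CRNN1 relative to ACO.
path_saving = [(a - c) / a for a, c in zip(path_m["ACO"], path_m["CRNN1"])]

for cov, ratio, saving in zip(coverage, time_ratio, path_saving):
    print(f"coverage {cov}: CRNN1/GRU2 time = {ratio:.3f}, path saving vs ACO = {saving:.1%}")
```

At every coverage level the time ratio is below one-third, which is consistent with the statement that CRNN1 requires less than one-third the CPU time of GRU2, and the path-cost saving of CRNN1 over ACO is positive throughout.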

Fig. 14. Yaw adjustment instructions and propeller speed control feedback of UUVs for the different obstacle avoidance planners under 20% noise jamming. (a) Yaw control adjustment angles output by the different obstacle avoidance planners. (b) Left propeller speed control feedback from the different obstacle avoidance planners. (c) Right propeller speed control feedback from the different obstacle avoidance planners.

5. Conclusion

In this paper, by improving the connection between the hidden layers of an RNN, a convolution network structure has been proposed and applied to obstacle avoidance planning in UUVs. To investigate the performance of the proposed CRNN for UUV path planning, the training time, computation time, fitting, learning, and generalization of CRNN were compared with those of a regular RNN and GRU. The CRNN obstacle avoidance planner was found to have the advantages of a short training time, simple network structure, better generalization performance, and reliability. Compared with the ACO algorithm, the proposed CRNN-based obstacle avoidance planners require less computing time, obtain shorter paths, use less energy through their actuators, and are insensitive to noise. Because of its strong feature extraction ability, the CRNN-based obstacle avoidance planner can automatically reveal and memorize environmental features, thereby overcoming the weak observation characteristics of the environment and enhancing the capability of environmental perception and anti-noise jamming.
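The path comparisons summarized above rest on quantities such as path length and the amount of heading change along a path. A minimal sketch of how such quantities could be computed from a list of waypoints follows. The function names and the sample waypoints are hypothetical, and the paper may define its metrics differently.

```python
import math


def path_length(waypoints):
    """Sum of Euclidean distances between consecutive (x, y) waypoints."""
    return sum(math.dist(p, q) for p, q in zip(waypoints, waypoints[1:]))


def total_heading_change(waypoints):
    """Sum of absolute heading changes (radians) between consecutive segments."""
    headings = [math.atan2(q[1] - p[1], q[0] - p[0])
                for p, q in zip(waypoints, waypoints[1:])]
    total = 0.0
    for h0, h1 in zip(headings, headings[1:]):
        # Wrap the difference into [-pi, pi) so a turn across the +/-pi boundary is measured correctly.
        diff = (h1 - h0 + math.pi) % (2 * math.pi) - math.pi
        total += abs(diff)
    return total


# Hypothetical waypoints: a straight path and a zigzag path between the same end points.
straight = [(0, 0), (0, 10), (0, 20)]
zigzag = [(0, 0), (5, 10), (0, 20)]

print(path_length(straight), total_heading_change(straight))
print(path_length(zigzag), total_heading_change(zigzag))
```

Under these definitions, a shorter and smoother path shows up as a smaller length and a smaller accumulated heading change, which is one way to read the comparison between the learned planners and ACO.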

Acknowledgments

This research work is supported by the National Natural Science Foundation of China (No. 61633008, No. 51609046) and the Natural Science Foundation of Heilongjiang Province under Grant F2015035.