Ship detection for visual maritime surveillance from non-stationary platforms

Ocean Engineering 141 (2017) 53–63




Yang Zhang, Qing-Zhong Li⁎, Feng-Ni Zang
Shandong Provincial Key Laboratory of Ocean Engineering, Ocean University of China, Qingdao 266100, China

ARTICLE INFO

Keywords: Ship detection; Visual maritime surveillance; Object detection; Gaussian mixture model; Discrete cosine transform

ABSTRACT

This paper presents a new ship target detection algorithm to achieve efficient visual maritime surveillance from non-stationary surface platforms, e.g., buoys and ships, equipped with CCD cameras. In the proposed detector, the three main steps (horizon detection, background modeling, and background subtraction) are all based on the Discrete Cosine Transform (DCT). By exploiting the characteristics of DCT blocks, we extract the horizon line, which provides an important cue for sea-surface modeling. DCT-based feature vectors are calculated as the sample input to a Gaussian mixture model, which is effective in representing dynamic ocean textures such as waves, wakes and foams. Having modeled the sea regions, we perform ship detection using background subtraction followed by foreground segmentation. Experimental results with various maritime images demonstrate that the proposed ship detection algorithm outperforms traditional techniques in terms of both detection accuracy and real-time performance, especially for complex sea-surface backgrounds with large waves.

1. Introduction

Sea-surface platforms are commonly deployed for a wide variety of tasks and operations, such as open-ocean exploration, supervisory control of Autonomous Underwater Vehicles (AUVs) and Remotely Operated Vehicles (ROVs), oil and gas drilling, ecological monitoring and sampling, and homeland security surveillance (Y. Zhang et al., 2016; Li et al., 2014). For non-stationary platforms, e.g., buoys and ships, it is of great importance to develop an automated maritime surveillance system, especially in wide open waters. In military applications, such a visual monitoring tool enables significant capabilities for safeguarding maritime rights and interests, strengthening supervision and management of sensitive areas, resolving maritime disputes, and detecting illegal activities. It adds a great deal of convenience to civil applications as well, e.g., port traffic management and maritime search and rescue.

Current maritime surveillance systems are mainly based on air-/space-borne Synthetic Aperture Radar (SAR), High Frequency Surface Wave Radar (HFSWR), regular ship-based radars, and air-/space-borne optical sensors (Tello et al., 2005; Sciotti et al., 2002; Zhu et al., 2010; Künzner et al.). SAR equipment can cover an ultra-wide range and operate continuously under all-weather conditions, at the expense of limited image resolution. For optical sensors, the


infrared camera provides a longer viewing distance than typical cameras, especially at night or in low visibility (Withagen et al., 1999; Broek et al., 2000). However, the low-resolution imagery and high power consumption of infrared cameras limit their deployment in autonomous surveillance systems. In comparison, visible-light images captured by optical cameras generally contain rich color and texture information, which enables people to interpret and recognize the scene more readily. Additionally, the visible-light camera has the advantages of low cost, easy installation and low power consumption. More importantly, such a camera can be employed for military security, since the passive imaging modality does not expose the location of the surveillance system. This has motivated the development and improvement of optical systems over the past decades, to meet the critical need for maritime scene monitoring at improved imaging resolution by integration with other sensors (Fefilatyev et al., 2012).

Based on the platform structure, visual maritime surveillance systems can be divided into two categories: video surveillance with stationary cameras and with non-stationary cameras. Stationary surveillance systems are generally employed in harbor, port, and coastal applications where the background remains basically unchanged. On the other hand, non-stationary surveillance equipment usually works in open waters far away from the coastline, using cameras mounted on moving

This work was supported by the National High Technology Research and Development Program of China (863 Program) under Grant 2006AA09Z237. Corresponding author. E-mail address: [email protected] (Q.-Z. Li).

http://dx.doi.org/10.1016/j.oceaneng.2017.06.022 Received 14 November 2016; Received in revised form 7 April 2017; Accepted 7 June 2017 0029-8018/ © 2017 Elsevier Ltd. All rights reserved.


ships or swaying buoys. In this case, the captured scene keeps changing due to the platform movement. This study is aimed at devising a solution for ship target detection from buoy-/ship-based visual surveillance, which can motivate more maritime applications for the realization of intelligent visual sensor networks with on-board video processing and real-time bi-directional communication.

Some earlier works on automatic detection techniques for ships or surface objects have been proposed based on video imagery information. These methods generally fall into three categories. The first class is based on background modeling and subtraction (Wang et al., 2005; Moreira et al., 2014; Prasad et al., 2017). Hu et al. (2011) proposed a vessel detection method in which the ocean background is simply modeled by median values of n video frames. Some authors (Bloisi and Iocchi, 2009; Frost and Tapamo, 2013; Grupta et al., 2009; Wei et al., 2009; Szpak and Tapamo, 2011; Robert-Inácio et al., 2007; Pires et al., 2010) used Gaussian functions to model the water surface, followed by background subtraction. The Gaussian mixture model (GMM) statistically exploits the fact that each pixel belongs either to the sea surface or to the ship (Moreira et al., 2014). Zhang and Zheng (2011) and Borghgraef et al. (2010) modified the conventional GMM to address the challenging problem of fast-moving maritime targets. However, these methods, primarily designed for fixed cameras, do not generally offer good detection/surveillance performance for non-stationary platforms with a high degree of variability.

The second category of ship detection methods is based on the human visual attention model (Prasad et al., 2017). Itti et al. (1998) utilized a visual saliency map to analyze complex natural scenes. It segregates the regions of interest (high saliency) according to the local contrast which is consistently present at various length scales. However, it is not effective in dealing with the wakes of moving ships, since they introduce a high contrast over the surrounding pixels (Prasad et al., 2017). To achieve real-time detection, Hou and Zhang (2007) constructed the saliency map in the spatial domain by extracting the corresponding spectral residual in the spectral domain. Agrafiotis et al. (2014) designed a maritime tracking system by combining a visual attention map with a GMM; the tracking results are further refined using an adaptable online neural network tracker. A further enhancement of the visual attention model was realized by a two-scale detection scheme (Liu et al., 2013). At the larger scale, the sea background is removed by a mean-shift smoothing algorithm. At the smaller scale, objects of interest are coarsely labelled using salient edge region extraction; post-processing of the chrominance components provides more useful cues to select the output targets. Albrecht et al. (2011) modeled visual maritime attention using multiple low-level image features in combination with a Bayes classifier. We conclude that these approaches based on the visual attention model aim to reduce the noise of the sea background at a larger scale as well as to enhance the salient features of object regions. However, such methods usually do not perform well when large surface waves are involved in the scene, because the visual saliency map will probably become inaccurate if the salience of the waves has the same or even higher order of magnitude than that of the original targets.

The other techniques apply edge and texture features to detect ship targets. For buoy-based visual surveillance, Fefilatyev et al. (2012) and Fefilatyev (2012) proposed a marine vehicle detection algorithm exploiting gradient information. After extracting the horizon by the Hough transform, a global thresholding algorithm segments ship targets effectively from the background region above the estimated horizon. However, this method cannot work once the targets appear below the horizon. Although edge and contour features (e.g., using the Hough transform) are widely used in ship detection (Arshad et al., 2011; Yan et al., 2012; Xu et al., 2011), these methods generally do not achieve good performance for complex backgrounds. To make full use of the various texture information in the sea-surface background, the detection accuracy can be significantly improved by incorporating fractal features (Liang et al., 2012). Using both color and texture components, Kumar and Selvi (2011) and Selvi and Kumar (2011) introduced an object classification algorithm based on the Local Binary Pattern (LBP) for ship detection. The Histogram of Oriented Gradients (HOG) (Wijnhoven et al., 2010) is a commonly used feature representation. Loomans et al. (2013) devised a combination of a multi-scale HOG detector and a hierarchical KLT feature point tracker to track ships in harbors; this tracking system also incorporates an active camera to improve the tracking results under challenging conditions. In the Pascal Visual Object Classes (VOC) challenge, the Deformable Part Model (DPM) based on HOG achieved the best detection performance for twenty classes, including boats (Everingham et al., 2015). In Sullivan and Shah (2008), an enhanced Maximum Average Correlation Height (MACH) filter was applied for vessel detection by matching appearance templates with testing sequences via the Fast Fourier Transform (FFT). The advantage of these appearance-based methods is that they do not rely on background modeling. In more recent works (Everingham et al., 2015; R. Zhang et al., 2016; Zou and Shi, 2016), Convolutional Neural Network (CNN) features have achieved a substantial improvement in detection performance. While the above feature-based detection schemes benefit from the texture consistency within the sea-surface background, their computational complexities pose a serious challenge to real-time video communication over visual sensor networks.

Aiming at visual maritime surveillance from non-stationary platforms, we propose an efficient ship target detection algorithm that achieves both high detection accuracy and real-time performance. We first develop an effective learning strategy consisting of simple horizon segmentation and complex sea-surface background modeling. In a maritime scenario, ships usually appear around the position of the horizon line, occupying both sky and ocean regions. The horizon line can be used as a reference to limit the regions of interest and reduce the execution time of detection. After horizon detection, we can simply extract the sea-surface background regions below the horizon and use only these regions for background modeling. This lowers the probability of detection mistakes caused by the presence of background motion, e.g., waves, wakes, and foams. More importantly, such independent detectors for the sky and sea regions increase the detection sensitivity to small objects around the horizon line. Therefore, an initial detector of the horizon line is required before background modeling and object detection. In the proposed scheme, we can simply detect the horizon line by exploiting the characteristics of Discrete Cosine Transform (DCT) blocks. At the step of sea-surface background modeling, we present a novel DCT-based texture Gaussian mixture model to further separate ship targets from the complex sea-surface background below the horizon. Having detected the sea-surface background, we remove it and finally obtain the ship targets according to the texture consistency. The main contribution of the proposed algorithm is to provide more accurate detection results within complex sea-surface background, which is of vital importance for ship-/buoy-based surveillance applications in the presence of large waves. Experiments with real images are presented to assess the effectiveness of the proposed ship detection approach in comparison with traditional techniques.

In the remainder of this paper, we describe the implementation details of the proposed ship detection algorithm in Section 2, compare its performance with previous techniques in Section 3, and present a summary and conclusions in Section 4.

2. The proposed ship detection algorithm

The maritime images acquired from a non-stationary platform typically contain the foreground of ship targets and the background of sea surface and sky. The main challenge for background subtraction and detection of foreground objects is the difficulty of modeling the dynamics of water, including waves, wakes and foams (Prasad et al., 2017). In order to improve the detection accuracy of maritime surveillance systems in the open sea, we present a novel ship detection algorithm using a DCT-based Gaussian mixture model (GMM). According to the characteristics of the DCT coefficients, the proposed algorithm first detects the horizon line to extract the sea-surface regions for background modeling. In the procedure of sea-surface background modeling with the GMM, we calculate the coefficient energies of three regions in each DCT block as the feature vectors that feed the learning process of the GMM. Once all the ocean regions are modeled, the vessels in these regions can be detected by classifying each image block as background or foreground. Fig. 1 outlines the procedure of the proposed ship detection scheme, including the detection of the horizon line, the sea-surface background modeling using the GMM, and the ship detection using background subtraction. Next, we give a detailed description of each part of the proposed algorithm.

Fig. 1. Block diagram of the proposed ship detection algorithm.

2.1. DCT-based horizon detection

An initial detector of the horizon line is required before background modeling and object detection. According to the characteristics of the DCT coefficients, the proposed horizon detection algorithm involves the following steps:

(1) Decompose the luminance component of an input image into 8 × 8 non-overlapping blocks, and then apply the DCT to each block:

A_ij = α_i α_j ∑_{m=0}^{7} ∑_{n=0}^{7} I_mn cos[π(2m+1)i/16] cos[π(2n+1)j/16],  i, j = 0, 1, …, 7   (1)

where I_mn is the pixel value at location (m, n) in the 8 × 8 image block, and A_ij is the coefficient at location (i, j) of the corresponding 8 × 8 DCT block. The normalization weight α_i = 1/(2√2) for i = 0; otherwise, α_i = 1/2 (and similarly for α_j). Each 8 × 8 DCT block includes 1 DC coefficient and 63 AC coefficients.

(2) Label a DCT block as a sky or sea-surface region using the measure

t = Ā / Ā_max   (2)

where Ā is the mean value of the AC coefficients in a block, and Ā_max denotes the maximum over all such mean values. A block is labelled as a sky/sea-surface region if t is below/above a threshold T. In experiments with 200 maritime images of varying type, the 95% confidence interval of T has been found to be [0.065, 0.135]; i.e., a threshold within this interval correctly labels the sky and ocean regions with 95% probability. Accordingly, a threshold value of T = 0.1 has been selected. We mark the sky and sea-surface blocks with "0" and "1", respectively.

(3) Draw an approximate horizon line using the central points of all bottommost blocks in the sky region. In some cases, ship targets may occupy a large fraction of the sky background above the horizon line. In order to avoid the misclassification caused by ship targets, we only select the bottommost blocks with smooth changes in the vertical direction (y-coordinates) for horizon detection.

2.2. Sea-surface background modeling using GMM

The background subtraction technique is an effective tool for moving object detection with stationary cameras, where a pixel-wise statistical background model is used to classify the input video stream into foreground and background regions. However, such pixel-based background modeling algorithms are mainly suitable for visual maritime surveillance from stationary platforms. Our goal is to address the issue of object detection from non-stationary platforms, such as buoys or ships. From a large range of video content types, we observe that the texture of the sea surface is generally uniform and consistent regardless of whether the camera is stable or not. Considering the texture consistency of the sea-surface background, we present a DCT-based background modeling method using texture features. According to the input video contents, this model is realized by a learning mechanism which can quickly provide texture-based features for dynamic scene analysis of the sea surface.

2.2.1. DCT-based feature selection of background texture

The texture-based features of the sea-surface background can be considered as the spatial distribution of intensity variations. Here, we define the texture feature as a three-dimensional vector in the DCT domain, as shown in Fig. 2. The DCT coefficients depicted by different colors account for the spectrum components in the corresponding directions. To elaborate, the white region R0 in the upper left corner denotes the direct-current component; the R1 (green), R2 (yellow) and R3 (gray) regions represent the vertical, diagonal and horizontal frequency variations, i.e., the horizontal, diagonal and vertical texture information.

Fig. 2. Frequency partitioning for texture features in a DCT block.

We calculate the energies E1, E2 and E3 of regions R1, R2 and R3, respectively, to generate the texture-based feature vector X:
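As an illustration, the block labelling of Eqs. (1)–(2) and the horizon-line fit of step (3) can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' code: the function names are ours, the AC mean is taken over coefficient magnitudes (the constant 1/63 cancels in the ratio of Eq. (2)), and the "smooth changes" rule of step (3) is approximated by a median-based outlier rejection.

```python
import numpy as np

# 8x8 DCT basis of Eq. (1): C[i, m] = alpha_i * cos(pi*(2m+1)*i/16)
_m = np.arange(8)
_C = np.cos(np.pi * (2 * _m[None, :] + 1) * _m[:, None] / 16)
_C *= np.where(_m == 0, 1 / (2 * np.sqrt(2)), 0.5)[:, None]

def block_labels(gray, T=0.1):
    """Label each 8x8 block of a grey image as sky (0) or sea surface (1) via Eq. (2)."""
    H, W = gray.shape[0] // 8, gray.shape[1] // 8
    t = np.empty((H, W))
    for bi in range(H):
        for bj in range(W):
            A = _C @ gray[bi*8:bi*8+8, bj*8:bj*8+8] @ _C.T   # Eq. (1)
            t[bi, bj] = np.abs(A).sum() - abs(A[0, 0])       # sum of AC magnitudes
    t /= t.max()                                             # Eq. (2): t = A_bar / A_bar_max
    return (t > T).astype(int)

def horizon_line(labels, block=8):
    """Step (3): fit y = a*x + b through centres of the bottommost sky blocks,
    rejecting columns whose y jumps away from the median (e.g. ships above the horizon)."""
    xs, ys = [], []
    for j in range(labels.shape[1]):
        sky = np.flatnonzero(labels[:, j] == 0)
        if sky.size:
            xs.append(j * block + block // 2)
            ys.append(sky[-1] * block + block // 2)
    xs, ys = np.asarray(xs, float), np.asarray(ys, float)
    keep = np.abs(ys - np.median(ys)) <= block
    return np.polyfit(xs[keep], ys[keep], 1)   # (a, b)
```

On a frame whose upper half is smooth sky and lower half is textured sea, the label map comes out 0 above and 1 below, and `horizon_line` returns a nearly horizontal line at the boundary.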

X = (E1, E2, E3)^T   (3)

where the region energy E_k (k = 1, 2, 3) is defined as

E_k = ∑_{(i,j)∈R_k} (A_ij − Ā_k)²   (4)

where A_ij ((i, j) ∈ R_k) are the DCT coefficients in region R_k (k = 1, 2, 3), and Ā_k is the average of the DCT coefficients of region R_k:

Ā_k = (1/|R_k|) ∑_{(i,j)∈R_k} A_ij,  k = 1, 2, 3   (5)

where |R_k| is the number of coefficients in R_k. Accordingly, each block in the sea-surface background yields a corresponding texture feature vector, denoted by X_i (i = 1, 2, …, N), where N is the total number of blocks (feature vectors) within the sea-surface background region. Using these texture-based feature vectors, we can model the texture background of the sea surface as described in the following section.
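A sketch of the feature extraction of Eqs. (3)–(5) is given below. Since Fig. 2 is not reproduced here, the exact partition of the block into R1, R2 and R3 is an assumption of ours: we take the strictly upper-triangular coefficients as R1, the off-DC diagonal as R2, and the strictly lower-triangular coefficients as R3.

```python
import numpy as np

def texture_feature(A):
    """Feature vector X = (E1, E2, E3)^T of Eq. (3) from an 8x8 DCT block A.
    The region masks below only approximate the partition of Fig. 2."""
    i, j = np.meshgrid(np.arange(8), np.arange(8), indexing="ij")
    masks = (i < j,                  # assumed R1: vertical-frequency region
             (i == j) & (i > 0),     # assumed R2: diagonal region (DC excluded)
             i > j)                  # assumed R3: horizontal-frequency region
    E = []
    for mask in masks:
        c = A[mask]
        E.append(float(((c - c.mean()) ** 2).sum()))   # Eqs. (4)-(5)
    return np.array(E)
```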

2.2.2. Learning of the Gaussian mixture model

After the horizon detection described in Section 2.1, we can segment the sea-surface background from the sky background. We then categorize the sea-surface background regions into K clusters using a Gaussian mixture model (GMM) (Wang et al., 2005), based on the feature vectors described in Section 2.2.1. Let D = {X_1, X_2, …, X_t} denote a sample set of the three-dimensional feature vectors X defined in Eq. (3). Each sample corresponds to a DCT block drawn from one of the K clusters with a certain probability. To quantify, the probability of X_t under the mixture is written as (Wang et al., 2005):

P(X_t) = ∑_{i=1}^{K} ω_{i,t} · η(X_t, μ_{i,t}, Σ_{i,t})   (6)

where K is the total number of Gaussian distributions (usually ranging from 3 to 5), ω_{i,t} (0 ≤ ω_{i,t} ≤ 1, ∑_{i=1}^{K} ω_{i,t} = 1) is the normalized weight of the ith Gaussian in the mixture at time t, and μ_{i,t} and Σ_{i,t} are the mean and covariance matrix of the ith Gaussian in the mixture at time t, respectively. The Gaussian probability density function η is defined as:

η(X, μ_i, Σ_i) = (2π)^{−n/2} |Σ_i|^{−1/2} exp(−(1/2)(X − μ_i)^T Σ_i^{−1} (X − μ_i))   (7)

Since the DCT is an orthogonal transform, the coefficients in a DCT block are independent of each other (Wang et al., 2005). Accordingly, the three components E1, E2 and E3 (coefficient energies) of X are mutually independent. As the covariance of independent variables is zero, the covariance matrix Σ_{i,t} is diagonal:

Σ_{i,t} = diag(σ²_{1,i,t}, σ²_{2,i,t}, σ²_{3,i,t})   (8)

where σ_1, σ_2 and σ_3 are the standard deviations of E1, E2 and E3, respectively. The procedure of texture-based background modeling is to fit a mixture of Gaussian distributions to the feature vectors of the DCT blocks. Specifically, we need to estimate the unknown parameters ω_i, μ_i and Σ_i of the Gaussian probability density functions. The learning mechanism of the proposed background modeling is described as follows:

(1) Initialize the GMM parameters as:

ω_{i,0} = 0   (9)

μ_{i,0} = [M_0  M_0  M_0]^T   (10)

Σ_{i,0} = diag(σ²_{1,i,0}, σ²_{2,i,0}, σ²_{3,i,0}) = diag(V_0², V_0², V_0²)   (11)

where M_0 is set to 0, and V_0 is set to a large number.

(2) Calculate the Mahalanobis distance d_i from an observed input sample X_t to each Gaussian distribution:

d_i = ((X_t − μ_{i,t−1})^T Σ_{i,t−1}^{−1} (X_t − μ_{i,t−1}))^{1/2}   (12)

We use d_i to measure the matching degree between X_t and the ith Gaussian. Let T_d be the threshold of the maximum Mahalanobis distance from a feature vector to a cluster center. If d_i is smaller than the threshold T_d, categorize X_t into the ith Gaussian, and then update the parameters of the ith Gaussian distribution:

ω_{i,t} = (1 − α) ω_{i,t−1} + α   (13)

μ_{i,t} = (1 − β) μ_{i,t−1} + β X_t   (14)

Σ_{i,t} = (1 − β) Σ_{i,t−1} + β diag[(X_t − μ_{i,t})(X_t − μ_{i,t})^T]   (15)

where α denotes the learning rate of ω_i, and β denotes the learning rate of μ_i and Σ_i. It should be noted that the parameters of the other K − 1 Gaussian distributions are kept unchanged. If d_i is larger than the threshold T_d for every i, i.e., the sample X_t does not match any of the K Gaussian distributions, rearrange the distributions in descending order according to the fitness function

Fit_i = ω_i / |Σ_i|^{1/2}   (16)

and update the parameters of the mth Gaussian with the lowest fitness value by the following equations:

m = argmin_i Fit_i   (17)

ω′_{m,t} = α   (18)

μ_{m,t} = X_t   (19)

Σ_{m,t} = diag(σ²_{1,m,t}, σ²_{2,m,t}, σ²_{3,m,t}) = diag(V_0², V_0², V_0²)   (20)

Moreover, all the weights are normalized as

ω_{i,t} = ω′_{i,t} / ∑_{j=1}^{K} ω′_{j,t},  i = 1, 2, …, K   (21)

where ω′_{i,t} and ω_{i,t} are the original and normalized weights, respectively.

(3) Repeat the above learning step until all the samples have been processed. Then rearrange all the distributions in decreasing order of the fitness function, and select the first B distributions as the output background model,

B = argmin_b (∑_{i=1}^{b} Fit_i > T_b)   (22)
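The learning loop of Eqs. (9)–(22) can be sketched as follows. This is our own minimal reading of the procedure, not the authors' implementation: the class name and the concrete values of M_0 and V_0 are illustrative, and since the paper does not state a normalization for the fitness values, the cumulative test of Eq. (22) is applied here to fitness values normalized to sum to one.

```python
import numpy as np

class TextureGMM:
    """Online GMM over 3-D DCT texture features (a sketch of Eqs. (9)-(22))."""

    def __init__(self, K=3, alpha=0.01, beta=0.02, Td=5.0, Tb=0.8, M0=0.0, V0=100.0):
        self.alpha, self.beta, self.Td, self.Tb, self.V0 = alpha, beta, Td, Tb, V0
        self.w = np.zeros(K)                      # Eq. (9)
        self.mu = np.full((K, 3), M0)             # Eq. (10)
        self.var = np.full((K, 3), V0 ** 2)       # Eq. (11), diagonal covariances

    def _dist(self, X):
        # Mahalanobis distance of Eq. (12) for diagonal covariances
        return np.sqrt((((X - self.mu) ** 2) / self.var).sum(axis=1))

    def _fitness(self):
        return self.w / np.sqrt(self.var.prod(axis=1))     # Eq. (16)

    def update(self, X):
        d = self._dist(X)
        i = int(np.argmin(d))
        if d[i] < self.Td:                        # matched: Eqs. (13)-(15)
            self.w[i] = (1 - self.alpha) * self.w[i] + self.alpha
            self.mu[i] = (1 - self.beta) * self.mu[i] + self.beta * X
            self.var[i] = (1 - self.beta) * self.var[i] + self.beta * (X - self.mu[i]) ** 2
        else:                                     # unmatched: replace lowest-fitness Gaussian
            m = int(np.argmin(self._fitness()))   # Eq. (17)
            self.w[m] = self.alpha                # Eq. (18)
            self.mu[m] = X                        # Eq. (19)
            self.var[m] = self.V0 ** 2            # Eq. (20)
        self.w /= self.w.sum()                    # Eq. (21)

    def is_background(self, X):
        """True if X matches one of the first B distributions of Eq. (22)."""
        fit = self._fitness()
        if fit.sum() == 0:
            return False
        order = np.argsort(-fit)
        B = int(np.searchsorted(np.cumsum(fit[order]) / fit.sum(), self.Tb)) + 1
        return bool((self._dist(X)[order[:B]] < self.Td).any())
```

After feeding the model feature vectors drawn from sea-surface blocks, `is_background` accepts samples near the learned clusters and rejects outliers such as ship blocks.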


where T_b is a threshold to select the number of Gaussian distributions for the background model; in this paper, T_b is set to 0.8. It should be noted that the number of Gaussian distributions K is related to the type of maritime scenario. For cameras installed on buoys, especially in the open sea, the background regions typically do not contain many elements, which means a relatively small K is sufficient for background modeling. In experiments with varying image types, applying three to five distributions (K = 3–5), each with its corresponding mean vector and covariance matrix, has proven sufficient to initialize our Gaussian mixture model. In this paper, we select K manually according to the practical maritime scene. Specifically, for highly dynamic backgrounds, e.g., large waves, wakes and foams, K is set to 5; for a flat water surface, K is set to 3. Clearly, the samples with high probabilities play a key role in modeling the sea-surface background, while the low-probability samples are gradually eliminated during the above-mentioned learning procedure.

2.3. Ship detection using background subtraction

Since ships may appear within both sky and sea-surface regions, we implement the object detection in two independent ways. To elaborate, if the measure value t in Eq. (2) of a block above the detected horizon line is larger than the threshold T, the block is labelled as part of the foreground region. Further, we apply the above-mentioned GMM background modeling to achieve background subtraction when marine vehicles appear below the horizon line. An image block is labelled as a foreground/background region if the Mahalanobis distance d_i defined in Eq. (12) is larger/smaller than the threshold T_d, and the GMM parameters are then updated using Eqs. (13)–(15). This process is repeated until all the blocks within the sea-surface background have been inspected.

After background subtraction, the objects are enclosed in bounding boxes by morphological operations (Prasad et al., 2017). The morphological opening and closing of an image P are combinations of dilation and erosion using the same structuring element Q for both operations:

Opening: P ∘ Q = (P ⊝ Q) ⊕ Q   (23)

Closing: P ∙ Q = (P ⊕ Q) ⊝ Q   (24)

where ⊕ and ⊝ denote the dilation and erosion, respectively. In this paper, we apply the improved morphological close-minus-open (CMO) technique (Westall et al., 2008) to enhance the foreground segmentation results in the detected image blocks:

I′ = (I − (I ∙ SE)) + (I − (I ∘ SE)) = 2I − ((I ∙ SE) + (I ∘ SE))   (25)

where SE denotes the structuring element, I denotes the input image blocks containing foreground, and I′ denotes the enhanced foreground result.

3. Experimental results

Since most works in maritime image processing are designed for military or commercial purposes, their datasets and codes are not available. The 2000 maritime images used in our experiments were downloaded from the Internet and taken from the Singapore Maritime Dataset (SMD), which was created using Canon 70D cameras around Singapore waters (Prasad et al., 2017). These images were captured at different locations under various environmental conditions, e.g., rain and haze. Detailed information on these images is listed in Table 1. All the test images were resized to 320 × 240 pixels. The sample images in Fig. 3 depict the various characteristics and texture contents of real ocean conditions.

Table 1
Detailed information of the test images (2000 images in total).

Image source:                      Internet 804; Dataset 1196
Camera location:                   On board 1533; On shore 467
Weather condition:                 Sunny 1838; Rainy 31; Hazy 131
Number of ship objects per image:  Zero 172; One 872; Multiple 956
Other image content:               Horizon 1500; Buildings 35; Animals 9

In order to verify the effectiveness of the proposed horizon detection, we adopt the precision rate and recall rate:

Precision = N_TP / (N_TP + N_FP)   (26)

Recall = N_TP / (N_TP + N_FN)   (27)

where N_TP is the number of true positives, N_FP is the number of false positives, and N_FN is the number of false negatives. Obviously, the higher the precision and recall are, the better the horizon detection performance is. For the 1500 test images containing a horizon, the average precision and recall rates are 98.80% and 95.40%, respectively, as listed in Table 2. The proposed horizon detection algorithm still achieves 87.50% precision and 96.60% recall averaged over all 2000 images, in which 500 images without a horizon are involved. Fig. 4(a) gives four sample images used in this experiment. The detected horizon lines, marked in red, match the original horizons very well; see Fig. 4(b). Such an effective horizon detection method, i.e., accurate background segmentation, provides a solid basis for the subsequent background modeling and ship detection.

It should be noted that our ship detection method depends on the DCT-based GMM to discriminate between objects and sea regions. In order to achieve accurate sea background modeling, we have to select an appropriate threshold T_d for measuring the Mahalanobis distance in Eq. (12). If the threshold T_d is too small, some sea-wave regions will probably be identified as ship targets. In contrast, an overestimated threshold may increase the probability of missed detections. In experiments with 2000 maritime images, the 90% confidence interval of T_d has been found to be [4.5, 5.5]; i.e., a threshold within this interval correctly labels the target and background regions with 90% probability. Consequently, a threshold value of T_d = 5 has been selected in this paper.

In the GMM, the learning rates α and β have a critical impact on the stability of detection systems. From Eq. (13), 1/α defines the time constant which determines the speed at which the distribution's parameters change. During this time period of 1/α, multimodal distributions caused by water dynamics (e.g., waves, wakes and foams) are likely to occur. In order to accurately estimate the covariance matrices of multiple Gaussian distributions, the time window 1/β of the statistical features (mean and covariance) must be smaller than 1/α, i.e., β > α. Table 3 lists the detection results corresponding to different learning rates. Smaller α and β lead to slow convergence, resulting in poor background modeling. In contrast, larger learning rates make excessive use of Gaussian distributions; some parts of the foreground are then classified as background, leading to low average precision. Here, we set the learning rates α and β to 0.01 and 0.02, respectively.

Detections enclosed in bounding boxes are compared against the ground-truth objects and judged to be true or false positives by measuring bounding box overlap. To be considered a correct detection (true positive), the overlap a_0 between the detected bounding box B_d and the ground-truth bounding box B_gt must exceed 50%
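The close-minus-open enhancement of Eqs. (23)–(25) in Section 2.3 can be sketched as below. This is an illustrative NumPy version; the fixed 3 × 3 structuring element and edge padding are our assumptions, since the paper does not state the SE size.

```python
import numpy as np

def _morph(I, op):
    """Grey-scale 3x3 dilation (op=np.max) or erosion (op=np.min), edge-padded."""
    P = np.pad(I, 1, mode="edge")
    win = [P[di:di + I.shape[0], dj:dj + I.shape[1]]
           for di in range(3) for dj in range(3)]
    return op(np.stack(win), axis=0)

def cmo_enhance(I):
    """Eq. (25): I' = 2I - ((I . SE) + (I o SE)), using Eqs. (23)-(24)."""
    opening = _morph(_morph(I, np.min), np.max)   # Eq. (23): erode then dilate
    closing = _morph(_morph(I, np.max), np.min)   # Eq. (24): dilate then erode
    return 2 * I - (closing + opening)
```

A single bright pixel on a flat background survives the closing but is removed by the opening, so the CMO output keeps the peak while flattening the background.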


Fig. 3. Samples from the test images.

Table 2
Precision and recall rates of the proposed horizon line detection.

                           Precision (%)   Recall (%)
1500 images with horizon   98.80           95.40
2000 images                87.50           96.60

Table 3
Average precision of the proposed method with different learning rates.

α       β       Average precision (%)
0.001   0.005   55.3
0.005   0.01    60.9
0.01    0.02    61.7
0.02    0.05    56.1
0.05    0.1     49.9

(Everingham et al., 2015):

a_0 = Area(B_d ∩ B_gt) / Area(B_d ∪ B_gt)   (28)
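The overlap criterion of Eq. (28) can be sketched as follows; the box convention (x1, y1, x2, y2) and the function names are ours.

```python
def bbox_overlap(bd, bgt):
    """a0 of Eq. (28) for axis-aligned boxes given as (x1, y1, x2, y2)."""
    iw = min(bd[2], bgt[2]) - max(bd[0], bgt[0])
    ih = min(bd[3], bgt[3]) - max(bd[1], bgt[1])
    inter = max(0, iw) * max(0, ih)                 # intersection area
    area = lambda b: (b[2] - b[0]) * (b[3] - b[1])
    union = area(bd) + area(bgt) - inter            # union area
    return inter / union if union else 0.0

def is_true_positive(bd, bgt):
    """Detection counts as correct when the overlap exceeds 50%."""
    return bbox_overlap(bd, bgt) > 0.5
```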

where Bd ⋂Bgt denotes the intersection of the detected and ground truth bounding boxes, Bd ⋃Bgt denotes their union, and Area denotes the number of pixels. The unassociated objects or associated objects with a 0 less than 50% are labelled as false positives. Ground truth objects with no matching detection are false negatives. We also use the precision and recall rates in Eqs. (26) and (27) to measure the overall performance for ship detection. Since there are no public source codes or executables for ship detection, we compare the proposed algorithm with Itti et al. (1998) and Hou and Zhang (2007) algorithms based on visual attention model. Fig. 5 depicts the precision-recall curves of the three object detection algorithms. The proposed method outperforms the other two saliency-based methods in terms of both precision and recall rates. From the ending points of the curves, we can see that 85% ships in all test images are detected by our method, but the other two detectors

Fig. 5. Precision-recall curves of the three object detection methods. The legend indicates the average precision (%) obtained by the corresponding method.

Fig. 4. Horizon detection results. (a) Original images. (b) Detected horizons by the proposed algorithm.


detect only 68% and 75% of the objects, respectively. The proposed detector also achieves the highest average precision (61.7%) among the three methods, indicating high overall accuracy and stability in detection performance. Table 4 lists the average precision scores for two kinds of sea-background content, i.e., small and large waves. The average precision of the proposed algorithm is only slightly higher than those of Itti's and Hou's algorithms in the case of small waves, but the improvement becomes much more noticeable for sea backgrounds with large waves. This confirms the significance of the proposed DCT-based texture GMM for complex background modeling.

Next, we give some examples of subjective detection comparison. Fig. 6 presents the comparison results of varying scene content using the three object detection methods. Both Itti's and Hou's algorithms suffer from false detections caused by the waves and wakes; see Fig. 7(a′′, b′, c′, c′′, d′, d′′). The wakes behind the ships are also enclosed in the bounding boxes, since they have the same level of saliency/contrast as the ship regions. Moreover, some small regions of the sea-surface background are incorrectly labelled as foreground, because the waves in these regions have high contrast with respect to their surroundings. This problem becomes more serious in complex backgrounds with large waves, as shown in Fig. 7, where the saliency-based detection algorithms even identify entire sea-surface regions as foreground. Fig. 8 depicts the saliency maps created by Itti's and Hou's algorithms in the same scenario. The intensities in these saliency maps are quite similar between the vessel and sea regions, leading to the false detections in Fig. 7. In comparison, the proposed block-wise background subtraction followed by foreground segmentation does not label any sea regions as objects; see Fig. 8(a′′′, b′′′). Consequently, our ship detector achieves better detection performance in Figs. 6 and 7 by suppressing the dynamics of the water. This confirms that the GMM used in our method is very effective in representing a complex background with a multimodal histogram: the dynamics of the water, including waves, wakes and foams, are readily classified into different Gaussian distributions and thus discriminated from the vessels.

Table 4
Average precision scores of the three algorithms for small and large waves.

Background content    Number of images    Number of ships    Itti's (%)    Hou's (%)    Proposed (%)
Small waves           1209                2253               68.5          68.6         68.9
Large waves           791                 1280               49.7          51.2         54.5
Average               –                   –                  59.1          59.9         61.7
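The block-wise decision that discriminates water dynamics from vessels can be sketched as follows: each 8 × 8 block yields a DCT-energy feature vector that is tested against the Gaussian modes of the background model. The feature layout (DC term plus low- and high-frequency AC energies) and the spherical-covariance matching rule are simplifying assumptions; the paper's exact feature definition and thresholds may differ.

```python
import numpy as np

def dct2(block):
    """Orthonormal 2-D DCT-II of an 8x8 block via the DCT matrix."""
    n = block.shape[0]
    k = np.arange(n)
    C = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    C[0, :] = np.sqrt(1.0 / n)
    return C @ block @ C.T

def block_feature(block):
    """DC term plus low/high-frequency AC energies (band split assumed)."""
    d = dct2(block.astype(float))
    low = np.sum(d[:4, :4] ** 2) - d[0, 0] ** 2
    high = np.sum(d ** 2) - np.sum(d[:4, :4] ** 2)
    return np.array([d[0, 0], low, high])

def is_background(x, means, sigmas, weights, thresh=2.5):
    """Match x against GMM modes, most reliable (high weight/sigma) first."""
    for k in np.argsort(-weights / sigmas):
        if np.linalg.norm(x - means[k]) / sigmas[k] < thresh:
            return True
    return False  # unmatched block -> foreground (candidate ship)
```

A flat sea block matches a calm-water mode and is suppressed, while a block whose AC energy departs from all learned modes is flagged as a candidate ship.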

Fig. 6. Detection results by the three methods. (a-d) Original images. (a′-d′) Detection results by Itti et al. (1998). (a′′-d′′) Detection results by Hou and Zhang (2007). (a′′′-d′′′) Detection results by the proposed algorithm.


Fig. 7. Detection results of complex ocean background with large waves by the three methods. (a-d) Original images. (a′-d′) Detection results by Itti et al. (1998). (a′′-d′′) Detection results by Hou and Zhang (2007). (a′′′-d′′′) Detection results by the proposed algorithm.

Fig. 8. (a, b) Original images. (a′, b′) Saliency maps by Itti et al. (1998). (a′′, b′′) Saliency maps by Hou and Zhang (2007). (a′′′, b′′′) Block-wise detections by the proposed method.


Fig. 9. Detection comparison of small objects with large waves. (a) Original images. (b) Detection result by Hou and Zhang (2007). (c) Detection result by the proposed algorithm.

Fig. 10. Detection comparison of a hazy image. (a) Original images. (b) Detection result by Hou and Zhang (2007). (c) Detection result by the proposed algorithm.

Fig. 11. Detection comparison in the case of glints. (a) Original images. (b) Detection result by Hou and Zhang (2007). (c) Detection result by the proposed algorithm.
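Robustness to glints (Fig. 11) relies on the GMM modes adapting online to sudden illumination changes. The sketch below uses a standard GMM background-update form with the learning rates α and β from Table 3 (the best-performing pair as defaults); the exact roles of α and β in the paper's update equations are an assumption, so treat this as illustrative:

```python
def update_matched_mode(mu, var, w, x, alpha=0.01, beta=0.02):
    """Move the matched Gaussian mode toward the new sample x.
    Here alpha drives the mode weight and beta the mean/variance
    (assumed roles, not the paper's verbatim equations)."""
    w = (1.0 - alpha) * w + alpha        # matched mode gains weight
    mu = (1.0 - beta) * mu + beta * x    # mean tracks the new sample
    var = (1.0 - beta) * var + beta * (x - mu) ** 2
    return mu, var, w
```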

In extreme environments with large waves, small ships are hardly identified by traditional detectors. Fig. 9 demonstrates the improvement of the proposed method in such a scenario. There are in fact four objects around the horizon line and two objects within the sea background, some of which can hardly be observed even by the human eye. In Fig. 9(b), Hou's algorithm generates a false negative in the sea background and two false negatives above the horizon, whereas our detector does not miss these small vessels. Such accurate detection of small objects among large waves benefits from the GMM built on DCT-based texture features: the DCT blocks provide an efficient texture representation for background modeling by removing the pixel correlation in the spatial domain. Furthermore, the proposed algorithm detects the sky region independently after horizon detection, which enhances the sensitivity of small-object detection above the horizon line.

A similar false detection by Hou's algorithm occurs in a hazy scene; see Fig. 10(b). The ship in the far distance is not detected completely due to the presence of haze. The proposed detector handles this case in Fig. 10(c), demonstrating robust detection performance under blurred imaging conditions. Fig. 11 shows a subjective comparison in the case of glints. Hou's method creates overlarge bounding boxes for each ship, since the glints on the right side of the image introduce high intensity/contrast around the ground-truth objects. In contrast, our method allows the GMM to adapt to sudden illumination changes, thus achieving better ship detection.

Fig. 12. Failure examples of the proposed algorithm due to the presence of marine animals.

We assess the real-time performance on an Intel i7 MacBook Pro with 16 GB RAM; all the algorithms were implemented in C++. Table 5 gives the average processing time per image for the three detection algorithms. Itti's algorithm consumes the longest time, whereas our average processing time is the shortest of the three.

Table 5
Average processing time per image.

             Itti's    Hou's    Proposed
Time (ms)    1229      69       44

Such good real-time performance arises from two reasons: (1) The texture features are


calculated based on efficient 8 × 8 DCT blocks instead of pixel-wise information; (2) the model learning mechanism followed by updating the GMM parameters saves a great deal of time for the proposed detection algorithm. In contrast, Hou's algorithm has to perform a point-by-point analysis of the logarithmic amplitude spectrum in the Discrete Fourier Transform (DFT) domain, leading to higher computational complexity. Itti's algorithm takes advantage of multi-scale Gaussian pyramids to generate feature maps in terms of intensity, color and orientation features, but such complex feature extraction (at least 48 feature maps per image) makes the detection process rather time-consuming. The proposed detector, with its low computational complexity, has the potential to be embedded in a camera system-on-chip. Furthermore, our algorithm can be incorporated into a DCT-based video compression system (e.g., an HEVC standard encoder) (Zhang et al., 2016) to achieve real-time wireless video communications over maritime surveillance networks.

Fig. 12 shows three failure examples of the proposed algorithm due to the presence of marine animals. The dolphins jumping out of the water are incorrectly identified as ship objects, since our detector does not incorporate a classifier to distinguish ships from other foreground elements. Similarly, many detection/tracking errors may occur due to the presence of birds, planes above the horizon, and even unknown floating objects. Improving the recognition capability of the proposed detection algorithm will be part of our future work.

4. Conclusions

A DCT-based ship detection algorithm has been designed for maritime surveillance on non-stationary platforms. According to the characteristics of DCT coefficients, the horizon detection method can extract the sea regions accurately for complex background modeling. Considering the wavy nature of water, the GMM background modeling significantly improves the detection efficiency by extracting important texture features from DCT blocks. The experimental results on various scenes have demonstrated that the proposed ship detection approach achieves both improved detection accuracy and enhanced real-time performance in comparison with traditional object detection techniques.

The main contribution is the texture features based on DCT coefficient energy for sea-surface modeling. These texture feature vectors are effective in representing highly dynamic backgrounds, including large waves, wakes and foams. Using the GMM with DCT-based feature vectors, our detection scheme can deal with complex maritime scenes and extreme imaging conditions, e.g., haze and glints. Moreover, the independent detectors for the sky and sea regions increase the detection sensitivity to small objects around the horizon line. The proposed horizon detection method also reduces the computational complexity by limiting the modeling regions. Future work will focus on improving the recognition ability to discriminate ships from other foreground objects; a ship classifier will be incorporated into our detection system.

References

Agrafiotis, P., Doulamis, A., Doulamis, N., Georgopoulos, A., May 2014. Multi-sensor target detection and tracking system for sea ground borders surveillance. In: Proceedings of the 7th International Conference on PErvasive Technologies Related to Assistive Environments, Rhodes, Greece.
Albrecht, T., West, G.A., Tan, T., 2011. Visual maritime attention using multiple low-level features and naïve Bayes classification. In: Proceedings of the International Conference on IEEE Digital Image Computing Techniques and Applications, pp. 243–249.
Arshad, N., Moon, W.S., Kim, J.N., 2011. An adaptive moving ship detection and tracking based on edge information and morphological operations. In: Proceedings of the SPIE International Conference on Graphic and Image Processing, pp. 8285–8290.
Bloisi, D., Iocchi, L., 2009. Argos: a video surveillance system for boat traffic monitoring in Venice. Int. J. Pattern Recognit. Artif. Intell. 23 (7), 1477–1502.
Borghgraef, A., Barnich, O., Lapierre, F., 2010. An evaluation of pixel-based methods for the detection of floating objects on the sea surface. EURASIP J. Adv. Signal Process. 23, 451–461.
Broek, S.P., Bakker, E.J., Lange, D., Theil, A., Jul. 2000. Detection and classification of infrared decoys and small targets in a sea background. In: Proceedings of the SPIE 4029, Targets and Backgrounds VI: Characterization, Visualization, and the Detection Process, Orlando, USA.
Everingham, M., Eslami, S.A., Gool, L.V., Williams, C., Winn, J., Zisserman, A., 2015. The PASCAL visual object classes challenge: a retrospective. Int. J. Comput. Vision 111 (1), 98–136.
Fefilatyev, S., 2012. Algorithms for Visual Maritime Surveillance with Rapidly Moving Camera (Ph.D. dissertation). University of South Florida.
Fefilatyev, S., Goldgof, D., Shreve, M., Lembke, C., 2012. Detection and tracking of ships in open sea with rapidly moving buoy-mounted camera system. Ocean Eng. 54, 1–12.
Frost, D., Tapamo, J.R., 2013. Detection and tracking of moving objects in a maritime environment with level-set with shape priors. EURASIP J. Image Video Process. 1 (42), 1–16.
Gupta, K.M., Aha, D.W., Hartley, R., Moore, P.G., Apr. 2009. Adaptive maritime video surveillance. In: Proceedings of the SPIE 7346, Visual Analytics for Homeland Defense and Security, Orlando, USA.
Hou, X., Zhang, L., Jun. 2007. Saliency detection: a spectral residual approach. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.
Hu, W., Yang, C., Huang, D., 2011. Robust real-time ship detection and tracking for visual surveillance of cage aquaculture. J. Vis. Commun. Image Represent. 3 (9), 543–556.
Itti, L., Koch, C., Niebur, E., 1998. A model of saliency-based visual attention for rapid scene analysis. IEEE Trans. Pattern Anal. Mach. Intell. 20 (11), 1254–1259.
Kumar, S.S., Selvi, M.U., 2011. Sea objects detection using color and texture classification. Int. J. Comput. Appl. Eng. Sci. 1 (1), 59–63.
Künzner, N., Kushauer, J., Katzenbeißer, S., Nov. 2010. Modern electro-optical imaging system for maritime surveillance applications. In: Proceedings of the International WaterSide Security Conference, pp. 1–4.
Li, Q., Zhang, Y., Zang, F., 2014. Fast multi-camera video stitching for underwater wide field-of-view observation. J. Electron. Imaging 23 (2).
Liang, S., Wu, W., Chen, C., 2012. A detection framework for ship in sea-sky background based on constrained feature points. In: Proceedings of the International Conference on Control Engineering and Communication Technology, pp. 21–23.
Liu, Z., Zhou, F., Bai, X., Yu, X., 2013. Automatic detection of ship target and motion direction in visual images. Int. J. Electron. 100 (1), 94–111.
Loomans, M., de With, P., Wijnhoven, R., Sept. 2013. Robust automatic ship tracking in harbours using active cameras. In: Proceedings of the 20th IEEE International Conference on Image Processing, pp. 4117–4121.
Moreira, R., Ebecken, N., Alves, A., Livernet, F., Campillo-Navetti, A., 2014. A survey on video detection and tracking of maritime vessels. Int. J. Res. Rev. Appl. Sci. 20 (1), 37–50.
Pires, N., Guinet, J., Dusch, E., 2010. ASV: an innovative automatic system for maritime surveillance. NAVIGATION 58 (232), 47–66.
Prasad, D.K., Rajan, D., Rachmawati, L., Rajabaly, E., Quek, C., 2017. Video processing from electro-optical sensors for object detection and tracking in maritime environment: a survey. IEEE Trans. Intell. Transp. Syst.
Robert-Inácio, F., Raybaud, A., Clément, É., Dec. 2007. Multispectral target detection and tracking for seaport video surveillance. In: Proceedings of Image and Vision Computing, Hamilton, New Zealand, pp. 169–174.
Sciotti, M., Pastina, D., Lombardo, P., 2002. Exploiting the polarimetric information for the detection of ship targets in non-homogeneous SAR images. In: Proceedings of the Geoscience and Remote Sensing Symposium, pp. 1911–1913.
Selvi, M.U., Kumar, S.S., 2011. Sea object detection using shape and hybrid color texture classification. In: Proceedings of Trends in Computer Science, Engineering and Information Technology, pp. 19–31.
Sullivan, M., Shah, M., Mar. 2008. Visual surveillance in maritime port facilities. In: Proceedings of the SPIE 6978, Visual Information Processing XVII, Orlando, USA.
Szpak, Z.L., Tapamo, J.R., 2011. Maritime surveillance: tracking ships inside a dynamic background using a fast level-set. Expert Syst. Appl. 38 (6), 6669–6680.
Tello, M., López-Martínez, C., Mallorqui, J.J., 2005. A novel algorithm for ship detection in SAR imagery based on the wavelet transform. IEEE Geosci. Remote Sens. Lett. 2 (2), 201–205.
Wang, W., Chen, D., Gao, W., Yang, J., Oct. 2005. Modeling background from compressed video. In: Proceedings of the 2nd Joint IEEE International Workshop on Visual Surveillance and Performance Evaluation of Tracking and Surveillance, Beijing, China, pp. 161–168.
Wei, H., Nyguien, H., Ramu, P., Raju, C., Liu, X., Yadegar, J., May 2009. Automated intelligent video surveillance system for ships. In: Proceedings of the SPIE 7306, Optics and Photonics in Global Homeland Security V and Biometric Technology for Human Identification VI, Orlando, USA.
Westall, P., Ford, J.J., O'Shea, P., Hrabar, S., Dec. 2008. Evaluation of maritime vision techniques for aerial search of humans in maritime environments. In: Proceedings of Digital Image Computing: Techniques and Applications, pp. 176–183.
Wijnhoven, R., Rens, K.V., Jaspers, E., de With, P., 2010. Online learning for ship detection in maritime surveillance. In: Proceedings of the 31st Symposium on Information Theory in the Benelux, pp. 73–80.
Withagen, P.J., Schutte, K., Vossepoel, A.M., Breuers, M.G., Jul. 1999. Automatic classification of ships from infrared (FLIR) images. In: Proceedings of the SPIE 3720, Signal Processing, Sensor Fusion, and Target Recognition VIII, Orlando, USA.
Xu, J., Fu, K., Sun, X., 2011. An invariant generalized Hough transform based method of inshore ships detection. In: Proceedings of the International Symposium on Image and Data Fusion, pp. 1309–1314.
Yan, Z., Yan, X., Xie, L., Wang, Z., 2012. Inland ship edge detection algorithm based on improved Canny operator. J. Converg. Inf. Technol. 7 (22), 567–575.
Zhang, R., Yao, J., Zhang, K., Feng, C., Zhang, J., Jul. 2016. S-CNN-based ship detection from high-resolution remote sensing images. In: Proceedings of the International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Prague, Czech Republic, pp. 423–430.
Zhang, W., Zheng, Y., 2011. Intelligent ship-bridge collision avoidance algorithm research based on a modified Gaussian Mixture Model. In: Proceedings of the International Conference on Multimedia Technology, pp. 6414–6419.
Zhang, Y., Negahdaripour, S., Li, Q., 2016. Low bit-rate compression of underwater imagery based on adaptive hybrid wavelets and directional filter banks. Signal Process.: Image Commun. 47, 96–114.
Zhu, C., Zhou, H., Wang, R., 2010. A novel hierarchical method of ship detection from spaceborne optical image based on shape and texture features. IEEE Trans. Geosci. Remote Sens. 48 (9), 3446–3456.
Zou, Z., Shi, Z., 2016. Ship detection in spaceborne optical image with SVD networks. IEEE Trans. Geosci. Remote Sens. 54 (10), 5832–5845.