Pattern Recognition 96 (2019) 106967
Novel event analysis for human-machine collaborative underwater exploration

Yang Cong a,b,∗, Baojie Fan d, Dongdong Hou a,b,c, Huijie Fan a,b, Kaizhou Liu a,b, Jiebo Luo e

a State Key Laboratory of Robotics, Shenyang Institute of Automation, Chinese Academy of Sciences, China
b Institutes for Robotics and Intelligent Manufacturing, Chinese Academy of Sciences, China
c University of Chinese Academy of Sciences, China
d College of Automation, Nanjing University of Posts and Telecommunications, China
e Department of Computer Science, University of Rochester, USA
Article history: Received 27 November 2018; Revised 27 March 2019; Accepted 11 July 2019; Available online 19 July 2019

Keywords: Underwater; Underwater robot; Visual summarization; Visual saliency; Visual tracking; Robot vision; Video analysis; Novel event; Deep sea
Abstract: One of the main tasks of a deep sea submersible is human-machine collaborative scientific exploration, e.g., humans drive the submersible and monitor the cameras around it to observe new fish species or strange topography in a tedious way. In this paper, by defining novel marine animals or any extreme events as novel events, we design a new deep sea novel visual event analysis framework to improve both the efficiency and the accuracy of human-machine collaboration. Specifically, our visual framework covers more diverse functions than most state-of-the-art methods, including novel event detection, tracking and summarization. Due to the power and computation resource limitations of the submersible, we design an efficient deep learning based visual saliency method for novel event detection and propose an online object tracking strategy as well. All the experiments rely on the Chinese Jiaolong, a manned deep sea submersible, which mounts several pan-tilt-zoom (PTZ) cameras and static cameras. We build a new novel deep sea event dataset, and the results justify that our human-machine collaborative visual observation framework can automatically detect, track and summarize novel deep sea events.
1. Introduction

Underwater robots [1] are widely applied to marine scientific exploration. For example, the manned Chinese deep sea submersible [2,3], Jiaolong, is adopted for human-machine collaborative observation of deep sea events and collection of deep sea samples, e.g., marine salvage, deep sea rescue, mineral/oil resource exploitation, deep sea biological research, deep sea gene acquisition, etc. Scientists can reach the deep sea via the submersible, where they can observe the underwater world, map the topography, and use various tools such as the robot arm to grab biological samples from the seabed. Generally, there
This work is supported by the National Natural Science Foundation under Grants 61722311, U1613214, 61821005 and 61533015, the CAS-Youth Innovation Promotion Association Scholarship (2012163) and the Liaoning Revitalization Talents Program (XLYC1807053).
∗ Corresponding author.
E-mail addresses: [email protected] (Y. Cong), [email protected] (B. Fan), [email protected] (D. Hou), [email protected] (H. Fan), [email protected] (K. Liu), [email protected] (J. Luo).
https://doi.org/10.1016/j.patcog.2019.106967 0031-3203/© 2019 Elsevier Ltd. All rights reserved.
are several cameras mounted around the submersible, and all the monitoring work is done by human operators manually, e.g., online watching and analyzing the videos on the screen. Therefore, how to improve the efficiency of human-machine collaboration for deep sea exploration and accomplish more tasks under the energy and computation limitations is a key problem; solving it could reduce human labor as well as the errors or missed important events caused by subjective or objective factors. In this paper, by treating novel events as any unknown marine animals, interesting objects or moving particles, we focus on semi-automatic novel event analysis for the Chinese Jiaolong [2,3]. Since several cameras are mounted around Jiaolong and monitored online by human crews manually, our mission is to help the onboard crews reduce the intensity of their work and make the work more efficient and accurate accordingly. Some previous works have focused on the problem of underwater visual analysis; however, most of them intend to handle a single task, e.g., marine creature detection and tracking [4]. For example, Zhou and Clark [5] propose to track underwater fish via a monocular camera. Clark et al. [6] and Forney et al. [7] follow tagged leopard sharks using an autonomous underwater vehicle (AUV).
Lin et al. [8] intend to track marine life via a multi-AUV platform. Yim et al. [9] use a remotely operated underwater vehicle (ROV) to track a shallow water nocturnal squid. Hsiao and Chen [10] propose a sparse representation method for fish tracking. Chuang et al. [11] use low frame rate stereo videos with low-contrast quality to track fish. Chuang et al. [12] design a multiple kernel tracker to track deformable fish on a moving platform. Gebali et al. [13] detect visual saliency from underwater video. Chuang et al. [14] recognize underwater fish species depending on supervised and unsupervised features [15]. Ravanbakhsh et al. [16] detect underwater fish via shape-based level sets. Instead of focusing on the detection and tracking of marine creatures as most state-of-the-art methods do, we design a general framework for novel deep sea event visual analysis, including novel event detection, novel event tracking and novel event summarization. Several cameras are mounted around "Jiaolong": for the PTZ cameras, our framework first uses visual saliency to detect novel events automatically, initializes the template of the corresponding event, and tracks it with our designed online tracker. Moreover, in order to achieve human-in-the-loop control, the tracking process can be re-initialized by human crews manually at any time to improve the accuracy. For the static cameras, we adopt our previous group sparsity based video summarization to extract key frames for an efficient overview. For a fair comparison, we collect and build a new deep sea novel event dataset to verify the effectiveness of our visual framework. Generally, there are three main contributions as follows:
i We propose a new problem, i.e., novel deep sea event analysis for deep sea scientific exploration, to reduce the intensity of the work and also improve the efficiency. To the best of our knowledge, this is the first work to analyze novel deep sea events using multimedia technologies, including novel deep sea event detection, tracking and summarization simultaneously.
ii Due to the energy/power resource limitation, we propose a general low cost visual analysis framework, including visual saliency detection via simple structured deep learning, especially our new online novel event tracker to overcome non-rigid deformation, and novel event summarization by our efficient key frame extraction.
iii We collect original videos from several sea tests of the Chinese Jiaolong and build a new deep sea video dataset divided into three subdatasets with manual annotations. This new deep sea event analysis dataset will be released soon.

2. Related works

By treating any unknown marine creatures, interesting objects or moving particles as novel deep sea events, in this paper we focus on novel deep sea event analysis, including novel deep sea event detection, tracking and summarization. For novel event detection, most previous works focus on fish or underwater creature detection [17], which is crucial for evaluating sea fish behaviors and studying the behavior of underwater creatures. For example, Leow et al. [18] intend to identify copepods using a neural network, Chuang et al. [19] propose to recognize underwater fish via unsupervised feature learning, Huang et al. [20] design a hierarchical classifier method for live fish recognition, and Spampinato et al. [21] propose a sea fish classification framework to help marine biologists understand fish behavior. Most of the above methods actually identify known creatures.
However, the intention of a deep sea submersible is scientific exploration, i.e., we may find new kinds of creatures or new topographies. Therefore, we cannot have enough prior knowledge about the deep sea environment or collect enough samples to
train a classifier. In this paper, we treat novel event detection as a saliency detection issue. Saliency detection [22–25] aims to distinguish arbitrary image changes, e.g., visual saliency for robot perception [26], visual saliency for object segmentation [22], spectral residual based visual saliency [27], context-aware visual saliency [23], deep learning based visual saliency [24], non-local deep feature based visual saliency [25], visual attention based saliency detection for rapid scene analysis [28], and graph-based visual saliency [29]. In this paper, we intend to detect novel events in an efficient way.

For novel event tracking, there are many prominent works on object tracking [30], e.g., kernelized correlation filters (KCF) [31], DSST [32], CN [33], SAMF [34], the sparsity based collaborative model (SCM) [35], Struck [36], tracking-learning-detection (TLD) [37], online metric learning trackers [38,39] and deep learning based tracking [40]. Most previous underwater works mainly concern underwater creature tracking, e.g., fish tracking [41,42]. For instance, Rui et al. [43] perform fish trajectory tracking via a dynamic model hypothesis using an Extreme Learning Machine (ELM). Zhou et al. [44] adopt a Gabor filter to accomplish fish tracking. Trackers relying on a fixed stereo camera suffer from poor motion continuity and cannot handle object tracking against a dynamic background. To address this problem, Chuang et al. [12] design a deformable part model (DPM) based multi-fish tracker for moving platforms; however, it is difficult to overcome all the problems posed by the uneven illumination and ubiquitous noise of the underwater scenario. Since deep sea fish tracking is an online learning process in which fish frequently overlap with each other, suffer serious self-occlusion and change shape with deformable appearance, we design a new efficient online tracking method.

For novel event summarization, there are few works focusing on visual summarization for the deep sea environment. For example, Gebali et al. [13] intend to detect interesting deep sea events via a video abstraction method that overcomes the imaging issues of small and slowly moving fish. Sooknanan et al. [45] enhance and summarize Nephrops habitats from underwater videos. Actually, visual summarization is a hot topic in the multimedia domain, which intends to extract key frames or video skims from long video sequences to achieve knowledge condensation and knowledge search, e.g., egocentric video summarization [46], story-driven video summarization [47], video summarization depending on large-scale web image priors [48], video summarization from consumer video [49], video summarization via group sparsity [50–52], multi-view video summarization [53] and deep learning based video summarization [54]. In comparison with most state-of-the-art methods that only concern a single task, e.g., marine fish tracking, our semi-automatic novel deep sea event analysis framework contains much more diverse functions, including novel deep sea event detection, tracking and summarization, which intends to reduce the work intensity of the onboard crews and improve the work accuracy and efficiency accordingly.

3. Overview of the platform "Jiaolong"

In this paper, we use the Chinese "Jiaolong" [2] as the testing platform as shown in Fig.
1, where the maximum dive depth is more than 7,000 meters with no more than 3 human crew members (1 pilot and 2 scientists), the total weight of "Jiaolong" in air is 22 tons, and the power comes from silver-zinc batteries. The main performance parameters of Jiaolong are shown in Table 1, covering the propulsion system, automatic control system, observation system, acoustic system, underwater communication system, self-navigation system and work operation tools. To overcome various disturbances and improve the robustness of automatic control, e.g., the asynchronous problem
Table 1. The major performance parameters of the Chinese manned deep submersible "Jiaolong".

Index                 Parameter
Size                  8.2 m × 3.2 m × 3.2 m
Speed                 1 kn, max 2.5 kn
Num of Crews          3 in total
Weight                22 tons in air
Propulsion System     1 front propeller, 2 middle propellers, and 4 tail propellers
Control System        automatic orientation, depth, position, emergency manned control
Observation System    8 LED lights, 8 cameras, imaging sonar, side sweep sonar
Acoustic System       7 collision sonars, depth sonar, ultrashort/long baseline sonars
Communication System  VHF, underwater acoustic telephone / communication machine
Navigation System     GPS, motion sensor, depth meter, Doppler meter
Work Tools            7-DOF robot arm, 5-DOF robot arm, hydrothermal sampler, sampling basket, drilling, etc.
Fig. 2. The sensing framework of the Chinese “Jiaolong”.
Fig. 1. The Chinese manned deep sea submersible “Jiaolong”.
between the control period and the measurement period, time-varying system parameters, and various uncertainties of the closed loop, both the adaptive unscented Kalman filter and fuzzy control theory are adopted to achieve robust control and self-navigation. Therefore, "Jiaolong" can achieve precise localization, automatic navigation control, complex information monitoring of the manned cabin, surface real-time monitoring, virtual reality with semi-physical digital simulation and black-box data analysis; it can also move flexibly under automatic control of depth, heading, 3-DOF altitude, velocity and dynamic 3-DOF positioning. Jiaolong is a hybrid deep sea platform for marine research, which is mainly used for deep sea observation and deep sea sample collection, e.g., new species of marine creatures or new minerals. One 7-DOF and one 5-DOF hydraulic manipulator arms are mounted in front of "Jiaolong" to collect various marine samples, and it is also equipped with a hydrothermal sampler, sampling basket, drilling tools, etc. As shown in Fig. 2, the observation system contains several fixed cameras, HD cameras, PTZ cameras, LED lights, an ICCD, and an imaging sonar mounted in the front of and around "Jiaolong", which can be used for deep sea creature observation, self-security surveillance, visual navigation operation and sample collection. In this paper, we focus on novel deep sea event analysis using the videos collected from the observation system. Since the videos of Jiaolong are currently monitored and analyzed by human crews manually, our intention in this paper is to assist the human operators to reduce the intensity of onboard work and improve the work efficiency accordingly.
4. Our framework for novel deep sea event visual analysis

We now present our human-machine collaborative framework for novel deep sea event visual analysis. We consider novel events as any unknown or interesting objects or events, for example, new deep sea plants, new marine creatures (lobster, crab, fish), or minerals (e.g., manganese nodules). Several PTZ and static cameras are mounted around the submersible, and our purpose is to help the onboard scientists reduce their work load so that they do not have to continuously watch the monitors all the time; i.e., ours could be considered a warning system that reduces the probability of false alarms and misses caused by various human subjective factors. The general framework is demonstrated in Fig. 3, which mainly includes three components: 1) novel deep sea event detection, which is activated during the sailing stage and detects various novel events via visual saliency; 2) novel deep sea event tracking, which is initialized based on the results of visual saliency; moreover, the tracking process can be re-initialized by human crews manually to keep the human in the loop; 3) novel deep sea event summarization, which is activated during the diving, sailing and floating stages and is adopted for a quick overview of the large stored video data and for novel event warning with the generated key frames. Details are given in the following subsections.

4.1. Novel deep sea event detection

Generally, for the scientific exploration of novel events, we do not have sufficient prior knowledge about the deep sea environment. Therefore, it is hard to collect enough samples for object or event detection, e.g., detecting various underwater creatures via deep learning. Another strategy for object detection is motion detection,
4
Y. Cong, B. Fan and D. Hou et al. / Pattern Recognition 96 (2019) 106967
Fig. 3. The general visual framework for analyzing novel deep sea events including detection, tracking and summarization.
Fig. 4. The framework of our deep sea novel event detection model.
where the basic assumption is that objects with rapid motion should be novel events. However, all cameras are fixed on the testing platform "Jiaolong" rather than being anchored to the seabed, so motion based object detection cannot work well either. Therefore, we treat novel deep sea event detection as a visual saliency detection issue [22–25,55,56]. Visual saliency models have been broadly used for object tracking, outlier detection, etc. Most earlier saliency detection methods are motivated by the human visual cognition system and compute saliency maps using various hand-crafted features. However, without prior knowledge, these methods cannot generate satisfactory results in all cases. Recently, fully convolutional neural networks (FCNs) have been used for visual saliency detection. Due to the computation resource limitation and motivated by [24], we design a visual saliency method based on simple structured deep learning, whose framework is shown in Fig. 4. Our model is a top-down network with 5 layers, where we use short connections within the HED structure over the skip-layer structure for deep supervision, and fuse the corresponding side outputs with weights accordingly. For testing, non-maximum suppression is adopted to generate bounding boxes from the saliency map, where the events or objects inside a bounding box are assumed to be novel events, and the score of a potential novel event is computed by averaging the saliency map within the corresponding bounding box. A tuning threshold is preset manually to detect novel events; therefore, the greater the score of an event, the more likely it is a novel deep sea event.
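As a rough illustration of this detection step, the sketch below turns a normalized saliency map into scored candidate boxes. It groups salient pixels into connected blobs instead of running the paper's non-maximum suppression, and the function name and threshold values are illustrative assumptions rather than the authors' settings; the saliency network itself is omitted.

```python
import numpy as np
from scipy import ndimage

def detect_novel_events(saliency_map, bin_thresh=0.5, score_thresh=0.3):
    """Turn a saliency map in [0, 1] into scored candidate boxes (x0, y0, x1, y1, score)."""
    # Binarize the saliency map and group salient pixels into connected blobs.
    mask = saliency_map >= bin_thresh
    labels, _ = ndimage.label(mask)
    candidates = []
    for sl in ndimage.find_objects(labels):
        if sl is None:
            continue
        y0, y1 = sl[0].start, sl[0].stop
        x0, x1 = sl[1].start, sl[1].stop
        # Score a candidate by the average saliency inside its bounding box.
        score = float(saliency_map[y0:y1, x0:x1].mean())
        if score >= score_thresh:
            candidates.append((x0, y0, x1, y1, score))
    # The greater the score, the more likely the box contains a novel event.
    return sorted(candidates, key=lambda b: b[-1], reverse=True)
```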
4.2. Novel deep sea event tracking

The tracker template is initialized by the novel event detection. Deep sea event tracking suffers from additional challenges, e.g., the non-rigid deformation of marine creatures, uneven illumination, and back-scatter of the water; to achieve robust tracking, we treat novel deep sea event tracking as an online learning issue and design a new online learning based tracker to handle these challenges. Moreover, the onboard power budget is limited, so low computational complexity is also a significant factor that must be considered carefully.

The developed online tracking algorithm works within the particle filter tracking framework. Given one particle, we denote its feature as $X$, combining HOG and color features to represent the particle. The matrix $X$ is the linear combination of the dictionary templates $Z$ plus an error matrix $E$, which accounts for the occluded or disturbed content in $X$:

$$X = DZ + E = AW, \quad A = [D, I], \quad W = [Z; E],$$

where $X = [X_{0,0}, \ldots, X_{m,n}, \ldots, X_{M-1,N-1}]$ ($M, N$ are the size of the particles) and $D$ is defined as $D = [A_1, \ldots, A_k, \ldots, A_K]$. Here, $A_k$ contains all the circular shifts of the $k$-th base sample $A^k_{m,n}$, $k = 1, \ldots, K$, where $m \in [0, \ldots, M-1]$, $n \in [0, \ldots, N-1]$, and $A_k = [a_k^{0,0}, \ldots, a_k^{M-1,N-1}]$ with $K$ template bases. Therefore, each $A_k$ is circulant and $A$ is block-wise circulant. The particle correlation filtering model solves the following tracking objective function:

$$\min_{W} \; \|X - AW\|_F^2 + \gamma_1 \|W\|_F^2, \qquad (1)$$

where $\gamma_1$ denotes a weight parameter.
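Eq. (1) is a ridge regression problem with the closed-form solution $W = (A^T A + \gamma_1 I)^{-1} A^T X$. A minimal dense sketch is given below; it ignores the circulant structure of $A$ that a practical correlation filter would exploit in the Fourier domain, and the function name is only illustrative.

```python
import numpy as np

def solve_ridge_filter(A, X, gamma1=1e-3):
    """Closed-form solution of Eq. (1): W = (A^T A + gamma1 * I)^{-1} A^T X."""
    d = A.shape[1]
    # Dense solve for illustration; the circulant structure would allow an FFT-based solve.
    return np.linalg.solve(A.T @ A + gamma1 * np.eye(d), A.T @ X)
```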
In order to increase the accuracy of the discriminative filter during the learning stage, we add contextual information to the above objective. The context patches are sampled around the object of interest and can be considered as hard negative samples, so that the learned filter $W$ produces a large response for the target patch and a nearly zero response for the context patches. We enhance Eq. (1) by formulating the context patches as a new regularizer:

$$\min_{W} \; \|X - AW\|_F^2 + \gamma_1 \|W\|_F^2 + \gamma_2 \sum_{i=1}^{p} \|B_i W\|_F^2, \qquad (2)$$

where $B_i$ is the circulant matrix of the context patch $b_i$, $p$ denotes the number of context patches, and $\gamma_2$ is a weight parameter. The objective function in Eq. (2) can be reformulated by stacking the context image patches below the target image patch to form a new matrix $C$:

$$\min_{W} \; \|X_R - CW\|_F^2 + \gamma_1 \|W\|_F^2, \qquad (3)$$

where $X_R = [X, 0, \ldots, 0]^T$ and $C = [A, \sqrt{\gamma_2} B_1, \ldots, \sqrt{\gamma_2} B_p]$.
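The stacking that links Eq. (2) to Eq. (3) can be written in a few lines; the sketch below assumes every context matrix $B_i$ has the same number of columns as $A$ and compatible row counts, and the helper name is hypothetical.

```python
import numpy as np

def stack_target_and_context(A, X, context_mats, gamma2=0.01):
    """Build C and X_R of Eq. (3) so that Eq. (2) reduces to a single ridge problem."""
    # Context matrices are scaled by sqrt(gamma2) and stacked below the target data.
    C = np.vstack([A] + [np.sqrt(gamma2) * B for B in context_mats])
    # The desired response for every context patch is zero.
    XR = np.vstack([X] + [np.zeros((B.shape[0], X.shape[1])) for B in context_mats])
    return C, XR
```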
In order to achieve a more robust correlation filter tracker, we exploit the anisotropy of the response and develop a robust correlation filter tracking algorithm with an elastic net loss function:

$$\min_{W} \; L(X_R - CW) + \gamma_1 \|W\|_F^2, \qquad (4)$$

where $L(\cdot)$ denotes the elastic net loss function. Note that the reformulated model is convex with respect to both $W$ and $E$, and an iterative algorithm is required to approximate the solution. To solve the model in Eq. (4), we rewrite it as

$$\min_{W,E} \; \|CW - X_R + E\|_F^2 + \gamma_1 \|W\|_F^2 + \delta L(E), \qquad (5)$$

where $\delta$ is a tuning parameter. Eq. (5) can be separated into two subproblems, i.e.,

$$\min_{W} \; \|CW - X_R + E\|_F^2 + \gamma_1 \|W\|_F^2, \qquad (6)$$

$$\min_{E} \; \|CW - X_R + E\|_F^2 + \delta L(E). \qquad (7)$$

Eq. (5) can be optimized by solving these two subproblems alternately until the objective function value of the model converges. Eq. (6) can be efficiently solved in the least squares manner. The optimal solution of $E$ in Eq. (7) is
$$E = \sigma\left(\frac{\delta}{4 + 2\delta}, \; \frac{2\,\mathcal{F}^{-1}(X_R - \Theta \odot \Phi)}{2 + \delta}\right), \qquad (8)$$

where $\sigma$ is the shrinkage operator defined as $\sigma(u, v) = \mathrm{sign}(v)\max(0, |v| - u)$, $\mathcal{F}^{-1}$ represents the inverse Fourier transform, and $\odot$ refers to element-wise multiplication. $\Theta$ is the dual conjugate of $W$ with $W = \sum_i \theta_i \varphi_i$, where $\varphi_i$ is the feature-space projector, and $\Phi$ denotes the kernel matrix with each element $\Phi_{i,j} = \varphi_i^T \varphi_j$.
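A rough sketch of the alternating scheme behind Eqs. (5)–(7) is given below, under simplifying assumptions: the W-step is solved densely by ridge regression instead of in the Fourier domain, and the elastic net proximal step of Eq. (8) is approximated by plain element-wise soft shrinkage of the residual. Function names and defaults are illustrative, not the authors' implementation.

```python
import numpy as np

def soft_shrink(x, t):
    """Element-wise soft shrinkage: sign(x) * max(|x| - t, 0)."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def solve_tracking_filter(C, XR, gamma1=1e-3, delta=1e-3, n_iter=10):
    """Alternate between the W-subproblem (Eq. (6)) and the E-subproblem (Eq. (7))."""
    d = C.shape[1]
    E = np.zeros_like(XR)
    W = np.zeros((d, XR.shape[1]))
    for _ in range(n_iter):
        # W-step: ridge least squares on the residual X_R - E.
        W = np.linalg.solve(C.T @ C + gamma1 * np.eye(d), C.T @ (XR - E))
        # E-step: shrink the residual X_R - CW as a simple stand-in for Eq. (8).
        E = soft_shrink(XR - C @ W, delta / 2.0)
    return W, E
```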
4.3. Novel deep sea event summarization

We summarize the novel deep sea visual events by treating the problem as a key frame extraction issue [49,52]. Even if we record and store all the video data as a backup, monitoring it is time consuming and tedious. Our intention is to recommend meaningful key frames to the human crews in order to reduce their work burden and to improve the efficiency and accuracy affected by various subjective factors. Our previous model, video summarization based on group sparse dictionary selection [50], is adopted here, where the key idea is to use group sparsity via the $\ell_{2,1}$ norm to obtain a sparser set of key frames with a global optimum. The novel deep sea event summarization model can be formulated as

$$\min_{S} : \; f(S) = \frac{\lambda}{2}\|P - PS\|_F^2 + \frac{(1-\lambda)}{2}\|S\|_{2,1}, \qquad (9)$$

where $P \in \mathbb{R}^{d \times n}$ is the feature pool extracted from the original video shot; $S \in \mathbb{R}^{n \times n}$ is the group sparse coefficient matrix to be pursued; and $\lambda \in [0, 1]$ is a pre-set tuning parameter. The first term evaluates the reconstruction error of recovering the whole feature pool with the selected dictionary. The second term of Eq. (9) enforces the sparsity of the dictionary selection via the $\ell_{2,1}$ norm, which induces a sparse solution of $S$, where the rows of $S$ with $\|S_{i\cdot}\|_2 \neq 0$ indicate the selected dictionary features. Our model attains a global optimum with a convergence rate of $O(1/T^2)$, compared with $O(1/\sqrt{T})$ for the traditional sub-gradient descent method ($T$ denotes the iteration number). We run our novel event summarization model on each video channel independently: we first extract the features of each frame, segment the long video into small video shots online, and then build the feature pool $P$ from the features of each shot. Finally, the key frames are summarized from each video shot by our group sparse model in Eq. (9), and the key frames from all video shots are collected and recommended to the human crews accordingly.
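A minimal sketch of Eq. (9) under stated assumptions: a plain (non-accelerated) proximal gradient solver, so it does not reproduce the accelerated $O(1/T^2)$ rate mentioned above, and the function names, step size choice and defaults are illustrative.

```python
import numpy as np

def row_shrink(S, t):
    """Proximal operator of t * ||S||_{2,1}: shrink the l2 norm of every row by t."""
    norms = np.linalg.norm(S, axis=1, keepdims=True)
    return S * np.maximum(1.0 - t / np.maximum(norms, 1e-12), 0.0)

def select_key_frames(P, lam=0.5, n_iter=200):
    """Minimize f(S) of Eq. (9) by proximal gradient descent and return key frame indices."""
    n = P.shape[1]
    S = np.zeros((n, n))
    step = 1.0 / (lam * np.linalg.norm(P, 2) ** 2 + 1e-12)  # 1 / Lipschitz constant
    for _ in range(n_iter):
        grad = lam * P.T @ (P @ S - P)                 # gradient of the smooth term
        S = row_shrink(S - step * grad, step * (1.0 - lam) / 2.0)
    # Frames whose rows of S have non-zero norm are the selected key frames.
    key_frames = np.where(np.linalg.norm(S, axis=1) > 1e-8)[0]
    return key_frames, S
```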
Table 2. The summary of our novel deep sea video dataset.

ID   Name                        #Clips   #Frames
1    Novel Event Detection       11       5174
2    Novel Event Tracking        19       9830
3    Novel Event Summarization   2        11,376
5. Comparisons and experiments

We first collect and annotate a new dataset for novel deep sea event visual analysis, and then present various experiments and comparisons to verify the effectiveness of our method accordingly.
Fig. 5. Demo images of our novel deep sea event video dataset.
5.1. Novel deep sea video dataset

All original videos are collected from the Chinese Jiaolong during several real sea tests, where each sea test lasts more than 10 h including diving, sailing and floating as shown in Fig. 3. The sea tests were conducted between August 2009 and October 2009, May 2010 and July 2010, July 2011 and August 2011, and June 2012 and July 2012 [57]. Several biologists were asked to collect and annotate the novel deep sea events, where the novel events include new underwater creatures, novel moving particles, etc. Some demo images are shown in Fig. 5 and the summary of the dataset is given in Table 2. We then categorize it into three individual subdatasets depending on each specific task:
Fig. 6. Demo figures of the novel deep sea event detection by comparing ours (bottom row) with both FASA (second row) and NLDF (third row).
Table 3. The statistical IOU results of ours compared with the state-of-the-arts. The best one is marked in bold.

ID    FASA     NLDF     Ours
1     0.2779   0.2147   0.2302
2     0.2696   0.2136   0.2285
3     0.2602   0.2080   0.2246
4     0.2038   0.1969   0.2076
5     0.2030   0.2007   0.2102
6     0.2091   0.2072   0.2129
7     0.2134   0.2122   0.2109
8     0.2178   0.2155   0.2133
9     0.2173   0.2149   0.2131
10    0.2118   0.2271   0.2235
11    0.2089   0.2266   0.2227
Avg   0.2106   0.2126   0.2143
i The Novel Deep Sea Event Detection Subdataset: There are 11 video clips in total, with lengths varying from 50 to 2868 frames. Various challenges are contained in these videos, e.g., pose variation, abrupt light changes, object rotation, large scale changes, abrupt particle motion and mutual occlusion. The resolution of the original frames is 768 × 576. Bounding boxes are manually annotated by human annotators every five frames. We randomly select 581 RGB images from these videos, manually annotate them, and then separate them into training and testing sets with a 7:3 ratio (a minimal split sketch is given after this list).
ii The Novel Deep Sea Event Tracking Subdataset: There are 19 video clips in total, each containing several hundred frames. The novel events of each video clip are annotated manually and the ground truth is recorded as a bounding box described by the (x, y) coordinates of its top left and bottom right corners for evaluation.
iii The Novel Deep Sea Event Summarization Subdataset: There are 2 videos in total, where each video contains various novel deep sea events. Several humans are invited to label the key frames separately, and we fuse their results to generate the ground truth. Our algorithm is required to select key frames from the corresponding videos for comparison.
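The 7:3 split mentioned in item (i) can be reproduced with a few lines; the helper below is a hypothetical sketch, since the actual split procedure used by the authors, including its random seed, is not specified.

```python
import random

def split_dataset(frame_ids, train_ratio=0.7, seed=0):
    """Randomly split annotated frames into training and testing sets (e.g., 7:3)."""
    ids = list(frame_ids)
    random.Random(seed).shuffle(ids)
    cut = int(len(ids) * train_ratio)
    return ids[:cut], ids[cut:]

# Example: 581 annotated images -> roughly 406 for training and 175 for testing.
train_ids, test_ids = split_dataset(range(581))
```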
5.2. Novel deep sea event detection results

We use our simple structured deep learning based visual saliency model to detect novel deep sea events, where we first estimate the saliency map by defining the pixel value as the probability of being novel. Then the non-maximum suppression method is used to mark each novel event with a bounding box depending on the saliency map. In this subsection, we compare against two salient object detection methods because they are also efficient: 1. FASA [58]: Fast, Accurate, and Size-Aware Salient Object Detection. 2. NLDF [25]: Non-Local Deep Features for Salient Object Detection. The Intersection-over-Union (IOU) is adopted as the evaluation criterion:

$$\mathrm{IOU} = \frac{|\mathrm{Detection\ Result} \cap \mathrm{GroundTruth}|}{|\mathrm{Detection\ Result} \cup \mathrm{GroundTruth}|}.$$
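For concreteness, the IOU of two axis-aligned boxes can be computed as below; the box format (x0, y0, x1, y1) and the function name are assumptions for illustration.

```python
def iou(box_a, box_b):
    """Intersection-over-Union of two axis-aligned boxes given as (x0, y0, x1, y1)."""
    ix0, iy0 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix1, iy1 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix1 - ix0) * max(0, iy1 - iy0)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```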
The statistical results are shown in Table 3, where we can see that the average IOU of ours is 0.2143, greater than both FASA and NLDF. The demo results are shown in Fig. 6, and it is obvious that the novel fish and lobster are found, and some other tiny animals are also recognized. The tracking template can be initialized from these results for further novel deep sea object tracking. Specifically, the hyper-parameters are set as: learning rate (1e-8), weight decay (0.0005), momentum (0.9), and loss weight for each side output (1). Our fusion layer weights are all initialized to 0.1667 in the training phase.

5.3. Novel deep sea event tracking results

We evaluate our online tracking algorithm on the tracking benchmark [59] and on our deep sea scenarios in this subsection, respectively. We use the same parameters and initialization for all the sequences; specifically, we set $\gamma_1 = 0.001$, $\gamma_2 = 0.01$, $\delta = 0.001$. For a fair comparison, two typical evaluation criteria are adopted here. 1) The precision of tracking, defined as the percentage of frames with location errors less than a preset threshold, where the location error is measured by the Euclidean distance between the tracked target and the human labeled ground truth. 2) The success rate of tracking, evaluated as the percentage of frames with overlap rates greater than a preset tuning parameter. The overlap rate is defined following the PASCAL challenge object detection score as $\mathrm{area}(\mathrm{ROI}_T \cap \mathrm{ROI}_G) / \mathrm{area}(\mathrm{ROI}_T \cup \mathrm{ROI}_G)$, where ROI indicates the bounding box, and $T$, $G$ represent the current tracking result and the labelled ground truth, respectively. For the comparison on the tracking benchmark dataset [59], we perform the One-Pass Evaluation (OPE) using our online tracker on the public benchmark, and adopt the online toolbox [59] to compute the evaluation plots. Although there are 29 different object trackers in the benchmark dataset [59], we only compare ours with seven state-of-the-art trackers, including KCF [31], DSST [32], CN [33], SAMF [34], SCM [35], Struck [36] and TLD [37]. Specifically, we adopt the source codes and data of these benchmark trackers from the original authors without tuning any parameters. The precision and success rate OPE curves are plotted in Fig. 7.
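The two criteria can be turned into the usual OPE curves as sketched below; the threshold grids (0–50 pixels for precision, 0–1 for overlap) follow the common benchmark convention and are assumptions here, as are the function names.

```python
import numpy as np

def precision_curve(center_errors, thresholds=np.arange(0, 51)):
    """Fraction of frames whose center location error is within each pixel threshold."""
    errors = np.asarray(center_errors, dtype=float)
    return np.array([(errors <= t).mean() for t in thresholds])

def success_curve(overlap_rates, thresholds=np.linspace(0.0, 1.0, 21)):
    """Fraction of frames whose overlap rate exceeds each overlap threshold."""
    overlaps = np.asarray(overlap_rates, dtype=float)
    return np.array([(overlaps > t).mean() for t in thresholds])
```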
Fig. 7. Comparison of the precision and success rate of our online tracker with the state-of-the-arts in [59] on the benchmark datasets.

Table 4. Score of the precision plot comparing ours with the state-of-the-arts on 11 attributes. The top three results are annotated in bold, italic and bold-italic, respectively.

Attribute   Our     SCM [35]   Struck [36]   TLD [37]   KCF [31]   DSST [32]   SAMF [34]   CN [33]
MB          0.748   0.339      0.551         0.518      0.650      0.547       0.650       0.550
FM          0.763   0.333      0.604         0.551      0.602      0.517       0.663       0.480
OV          0.734   0.429      0.539         0.576      0.650      0.515       0.709       0.434
DEF         0.816   0.586      0.521         0.512      0.740      0.660       0.796       0.620
BC          0.800   0.578      0.585         0.428      0.753      0.694       0.708       0.642
IV          0.789   0.594      0.558         0.537      0.728      0.735       0.727       0.587
SV          0.798   0.672      0.639         0.606      0.679      0.730       0.723       0.598
OCC         0.820   0.640      0.564         0.563      0.749      0.716       0.840       0.629
LR          0.501   0.305      0.545         0.381      0.396      0.497       0.458       0.405
IPR         0.798   0.597      0.617         0.584      0.725      0.766       0.690       0.675
OPR         0.834   0.618      0.597         0.596      0.729      0.733       0.763       0.652

Table 5. Score of the success rate plot comparing ours with the state-of-the-arts on 11 attributes. The top three results are annotated in bold, italic and bold-italic, respectively.

Attribute   Our     SCM [35]   Struck [36]   TLD [37]   KCF [31]   DSST [32]   SAMF [34]   CN [33]
MB          0.572   0.298      0.433         0.404      0.497      0.464       0.519       0.410
FM          0.575   0.296      0.462         0.417      0.460      0.435       0.515       0.373
OV          0.599   0.361      0.459         0.457      0.551      0.459       0.611       0.410
DEF         0.586   0.448      0.393         0.378      0.534      0.510       0.622       0.438
BC          0.578   0.450      0.458         0.345      0.535      0.517       0.526       0.453
IV          0.556   0.473      0.428         0.399      0.494      0.563       0.534       0.417
SV          0.521   0.518      0.425         0.421      0.427      0.541       0.516       0.384
OCC         0.563   0.487      0.413         0.402      0.514      0.534       0.621       0.428
LR          0.367   0.279      0.372         0.309      0.312      0.409       0.361       0.311
IPR         0.551   0.458      0.444         0.416      0.497      0.560       0.509       0.469
OPR         0.580   0.470      0.432         0.420      0.496      0.535       0.555       0.443
The statistical tracking results are summarized in Tables 4 and 5 using 11 different attributes. We can conclude that our proposed online tracker achieves more favorable performance than the other state-of-the-art trackers on the tracking benchmark. From these results, we can see that our online tracker outperforms the state-of-the-arts. For the comparison on our novel deep sea event tracking dataset, we select 3 trackers with low computation cost, i.e., KCF [31], TLD [37] and Staple [60]. The corresponding result curves measured by both center error and overlap rate are demonstrated in Fig. 9. By adopting the center position error (CP) as the evaluation criterion, our online learning based tracker achieves a CP of 12.37 and performs better than KCF, Staple and TLD (whose CPs are 48.47, 62.43 and 23.16, respectively). By adopting the overlap rate (OR) criterion, our method, with an OR of 0.59, also outperforms KCF, Staple and TLD (whose ORs are 0.42, 0.32 and 0.57, respectively). The demo results are shown in Fig. 8, where the deep sea fish are affected by various disturbances, such as large scale changes and deformation, heavy occlusion, and bad illumination. It is obvious that our proposed online tracker can robustly keep tracking the deep sea object, whereas some of the other tracking methods lose the target and drift rapidly.
Fig. 8. Sample results of the novel deep sea event tracking.
Fig. 9. The results of both the center position error (CP) and overlap rate (OR) for novel deep sea event tracking.
5.4. Novel deep sea event summarization results

We summarize the novel deep sea events in this subsection. Some results are demonstrated in Fig. 10, where the selected key frames of novel deep sea events are extracted from 5 static cameras. We can observe that the novel / unusual / interesting deep sea events are summarized, for example, various deep sea fish approaching the Jiaolong submersible from far and near, gathering various deep sea mineral samples, or inserting the logo flag using the onboard robot arms. With these extracted key frames, the work intensity of the onboard scientists can be relieved, as they no longer need to continuously monitor the screens all the time; moreover, these results can act as a warning system to reduce the probability of false alarms or misses caused by various objective or subjective factors.

5.5. Comparison of time consumption

In this subsection, we compare the time consumption of our method with other state-of-the-art methods. For novel deep sea event detection, we first compare the runtime of our model with FASA [58] and NLDF [25], where the results are shown in Table 6. Ours is more efficient than the state-of-the-arts. For object tracking, the average running time is about 56.32 fps on the tracking benchmark with 50 particle filters. The platform is equipped with a single NVIDIA TITAN Xp GPU and a 4.0 GHz Intel processor.
Fig. 10. The results of novel deep sea event summarization from the multi-camera system.
Fig. 11. Comparison of the tuning parameters γ1, γ2 and δ of our online tracker.
Table 6. The runtime of ours compared with the state-of-the-arts. The best one is marked in bold.

Method        FASA    NLDF    Ours
Runtime (s)   0.341   0.042   0.029
5.6. Comparison of the tuning parameters γ1, γ2 and δ

In this subsection, we compare the tuning parameters γ1, γ2 and δ of our online tracker. As shown in Fig. 11, when these parameters are tuned within a relatively wide range, the performance of our model does not change abruptly. Therefore, we can conclude that our online tracking model is robust in practice.

6. Conclusions and future plans

We introduce a new problem of analyzing novel deep sea visual events by considering various deep sea creatures or any interesting / unknown events as novel events. To the best of our knowledge, ours is an early work to adopt a general human-machine collaborative framework for automatic deep sea extreme event analysis, which mainly includes novel event detection, tracking and summarization simultaneously. We then design a semi-automatic visual framework to reduce the work intensity of the onboard crews and also improve the work efficiency for deep sea observation, with three components included, i.e., novel event detection, novel event tracking and novel event summarization, respectively. Due to the power consumption limitation, both the computational complexity and the online learning ability are considered carefully. A novel deep sea video dataset is also gathered and labeled based on Jiaolong, the Chinese deep sea submersible. Various experiments and evaluations verify the effectiveness of the proposed framework. Some future work plans and ideas are as follows:

• The deep sea environment is an unknown world for humans; new technologies and new intelligent platforms are urgently needed to explore it. Therefore, many new problems could be defined for research scientists in the future.
• Deep sea exploration is a high-cost task and the collection of deep sea videos is hard. Therefore, the size of the video dataset is still limited, and more video data will be collected, annotated and added to our video dataset. Moreover, we plan to release this dataset for research purposes soon.
• Although we try to design visual algorithms with lower computational complexity that consume less power, all the experiments and evaluations are conducted on the collected video dataset. In the future, we plan to deploy our novel event visual analysis framework on the Jiaolong platform online, which will actually assist the human scientists in reducing their work intensity and improving the work efficiency.
• Our proposed problem is a general one for deep sea / underwater observation, so our proposed framework could be extended to other platforms with similar tasks as well, such as remotely operated vehicles (ROV) and autonomous underwater vehicles (AUV).
References

[1] L. Kang, L. Wu, Y. Wei, S. Lao, Y.-H. Yang, Two-view underwater 3d reconstruction for cameras with unknown poses under flat refractive interfaces, Pattern Recognit. 69 (2017) 251–269.
[2] K. Liu, P. Zhu, Y. Zhao, S. Cui, X. Wang, Research on the control system of the human occupied vehicle "Jiaolong", Chin. Sci. Bull. 58 (S2) (2014) 40–48.
[3] L. Feng, Z. Huaiyang, W. Chunsheng, L. Xiangyang, H. Zhen, C. Cunben, Chinese Jiaolong's first scientific cruise in 2013, in: IEEE OCEANS, IEEE, 2014, pp. 1–8.
[4] A. Plotnik, S. Rock, Hybrid estimation using perceptional information: robotic tracking of deep ocean animals, IEEE J. Ocean. Eng. 36 (2011) 298–315.
[5] J. Zhou, C.M. Clark, Autonomous fish tracking by ROV using monocular camera, in: The 3rd Canadian Conference on Computer and Robot Vision (CRV'06), IEEE, 2006, pp. 68–68.
[6] C.M. Clark, C. Forney, E. Manii, D. Shinzaki, C. Gage, M. Farris, C.G. Lowe, M. Moline, Tracking and following a tagged leopard shark with an autonomous underwater vehicle, J. Field Robot. 30 (3) (2013) 309–322.
[7] C. Forney, E. Manii, M. Farris, M.A. Moline, C.G. Lowe, C.M. Clark, Tracking of a tagged leopard shark with an AUV: sensor calibration and state estimation, in: ICRA, IEEE, 2012, pp. 5315–5321.
[8] Y. Lin, J. Hsiung, R. Piersall, C. White, C.G. Lowe, C.M. Clark, A multi-autonomous underwater vehicle system for autonomous tracking of marine life, J. Field Robot. 34 (4) (2017) 757–774.
[9] S. Yim, C.M. Clark, T. Peters, V. Prodanov, P. Fidopiastis, ROV-based tracking of a shallow water nocturnal squid, in: Oceans - San Diego, 2013, IEEE, 2013, pp. 1–8.
[10] Y.-H. Hsiao, C.-C. Chen, A sparse sample collection and representation method using re-weighting and dynamically updating OMP for fish tracking, in: IEEE ICIP, 2016, pp. 3494–3497.
[11] M.-C. Chuang, J.-N. Hwang, K. Williams, R. Towler, Tracking live fish from low-contrast and low-frame-rate stereo videos, IEEE Trans. Circ. Syst. Video Technol. 25 (1) (2015) 167–179.
[12] M.C. Chuang, J.N. Hwang, J.H. Ye, S.C. Huang, Underwater fish tracking for moving cameras based on deformable multiple kernels, IEEE Trans. Syst. Man Cybern. Syst. 47 (9) (2017) 2467–2477.
[13] A. Gebali, A.B. Albu, M. Hoeberechts, Detection of salient events in large datasets of underwater video, IEEE, 2012.
[14] M.-C. Chuang, J.-N. Hwang, K. Williams, Supervised and unsupervised feature extraction methods for underwater fish species recognition, in: Computer Vision for Analysis of Underwater Imagery (CVAUI), 2014 ICPR Workshop on, IEEE, 2014, pp. 33–40.
[15] Y. Zheng, B. Jeon, L. Sun, J. Zhang, H. Zhang, Student t-hidden Markov model for unsupervised learning using localized feature selection, IEEE Trans. Circ. Syst. Video Technol. 28 (10) (2018) 2586–2598.
[16] M. Ravanbakhsh, M. Shortis, F. Shaifat, A.S. Mian, E. Harvey, J. Seager, An application of shape-based level sets to fish detection in underwater images, GSR, 2014.
[17] M. Mehrnejad, A.B. Albu, D. Capson, M. Hoeberechts, Detection of stationary animals in deep-sea video, in: Oceans - San Diego, 2013, pp. 1–5.
[18] L.K. Leow, L.-L. Chew, V.C. Chong, S.K. Dhillon, Automated identification of copepods using digital image processing and artificial neural network, BMC Bioinform. 16 (18) (2015) S4.
[19] M.-C. Chuang, J.-N. Hwang, K. Williams, A feature learning and object recognition framework for underwater fish images, IEEE Trans. Image Process. 25 (4) (2016) 1862–1872.
[20] P.X. Huang, B.J. Boom, R.B. Fisher, Hierarchical classification with reject option for live fish recognition, Mach. Vis. Appl. 26 (1) (2015) 89–102.
[21] C. Spampinato, D. Giordano, R. Di Salvo, Y.-H.J. Chen-Burger, R.B. Fisher, G. Nadarajan, Automatic fish classification for underwater species behavior understanding, in: ACM International Workshop on Analysis and Retrieval of Tracked Events and Motion in Imagery Streams, ACM, 2010, pp. 45–50.
[22] Y. Li, X. Hou, C. Koch, J.M. Rehg, A.L. Yuille, The secrets of salient object segmentation, in: IEEE CVPR, 2014, pp. 280–287.
[23] S. Goferman, L. Zelnik-Manor, A. Tal, Context-aware saliency detection, IEEE Trans. Pattern Anal. Mach. Intell. 34 (10) (2012) 1915–1926.
[24] Q. Hou, M.-M. Cheng, X. Hu, A. Borji, Z. Tu, P. Torr, Deeply supervised salient object detection with short connections, in: CVPR, IEEE, 2017, pp. 5300–5309.
[25] Z. Luo, A. Mishra, A. Achkar, J. Eichel, S. Li, P.-M. Jodoin, Non-local deep features for salient object detection, in: CVPR, 2017.
[26] Y. Yu, J. Gu, G.K. Mann, R.G. Gosine, Development and evaluation of object-based visual attention for automatic perception of robots, IEEE Trans. Autom. Sci. Eng. 10 (2) (2013) 365–379.
[27] X. Hou, L. Zhang, Saliency detection: a spectral residual approach, in: CVPR, IEEE, 2007, pp. 1–8.
[28] L. Itti, C. Koch, E. Niebur, A model of saliency-based visual attention for rapid scene analysis, IEEE Trans. Pattern Anal. Mach. Intell. 20 (11) (1998) 1254–1259.
[29] J. Harel, C. Koch, P. Perona, Graph-based visual saliency, in: NIPS, 2007, pp. 545–552.
[30] P. Liu, C. Liu, W. Zhao, X. Tang, Multi-level context-adaptive correlation tracking, Pattern Recognit. 87 (2019) 216–225.
[31] J.F. Henriques, R. Caseiro, P. Martins, J. Batista, High-speed tracking with kernelized correlation filters, IEEE Trans. Pattern Anal. Mach. Intell. 37 (3) (2015) 583–596.
[32] M. Danelljan, G. Häger, F. Khan, M. Felsberg, Accurate scale estimation for robust visual tracking, in: BMVC, BMVA Press, 2014, pp. 1–11.
[33] M. Danelljan, F.S. Khan, M. Felsberg, J. van de Weijer, Adaptive color attributes for real-time visual tracking, in: CVPR, IEEE, 2014, pp. 1090–1097.
[34] Y. Li, J. Zhu, A scale adaptive kernel correlation filter tracker with feature integration, in: ECCV, 2014, pp. 1–12.
[35] W. Zhong, H. Lu, M.-H. Yang, Robust object tracking via sparsity-based collaborative model, in: CVPR, 2012, pp. 1–8.
[36] S. Hare, A. Saffari, P.H.S. Torr, Struck: structured output tracking with kernels, in: ICCV, 2011, pp. 1–8.
[37] Z. Kalal, K. Mikolajczyk, J. Matas, Tracking-learning-detection, IEEE Trans. Pattern Anal. Mach. Intell. 34 (7) (2012) 1409–1422.
[38] Y. Cong, B. Fan, J. Liu, J. Luo, H. Yu, Speeded up low-rank online metric learning for object tracking, IEEE Trans. Circ. Syst. Video Technol. 25 (6) (2015) 922–934.
[39] W. Liu, D. Xu, I.W. Tsang, W. Zhang, Metric learning for multi-output tasks, IEEE Trans. Pattern Anal. Mach. Intell. 41 (2) (2019) 408–422.
[40] L. Bertinetto, J. Valmadre, J.F. Henriques, A. Vedaldi, P.H. Torr, Fully-convolutional siamese networks for object tracking, in: European Conference on Computer Vision, Springer, 2016, pp. 850–865.
[41] A. Attanasi, A. Cavagna, L. Del Castello, I. Giardina, GReTA - a novel global and recursive tracking algorithm in three dimensions, IEEE Trans. Pattern Anal. Mach. Intell. 37 (12) (2015).
[42] Z. Wu, T.H. Kunz, M. Betke, Efficient track linking methods for track graphs using network-flow and set-cover techniques, in: IEEE CVPR, 2011, pp. 1185–1192.
[43] N. Rui, B. He, B. Zheng, M.V. Heeswijk, Q. Yu, Y. Miche, A. Lendasse, Extreme learning machine towards dynamic model hypothesis in fish ethology research, Neurocomputing 128 (5) (2014) 273–284.
[44] J. Zhou, C.M. Clark, Autonomous fish tracking by ROV using monocular camera, in: CRV, 2006, pp. 68–68.
[45] K. Sooknanan, Enhancement, Summarization and Analysis of Underwater Videos of Nephrops Habitats, Ph.D. thesis, Citeseer, 2014.
[46] Y.J. Lee, J. Ghosh, K. Grauman, Discovering important people and objects for egocentric video summarization, in: CVPR, 2012.
[47] Z. Lu, K. Grauman, Story-driven summarization for egocentric video, in: CVPR, IEEE, 2013, pp. 2714–2721.
[48] A. Khosla, R. Hamid, C.-J. Lin, N. Sundaresan, Large-scale video summarization using web-image priors, in: CVPR, IEEE, 2013, pp. 2698–2705.
[49] J. Luo, C. Papin, K. Costello, Towards extracting semantically meaningful key frames from personal video clips: from humans to computers, IEEE Trans. Circ. Syst. Video Technol. 19 (2) (2009) 289–301.
[50] Y. Cong, J. Yuan, J. Luo, Towards scalable summarization of consumer videos via sparse dictionary selection, IEEE Trans. Multimed. 14 (1) (2012) 66–75.
[51] S. Wang, Y. Cong, J. Cao, Y. Yang, Y. Tang, H. Zhao, H. Yu, Scalable gastroscopic video summarization via similar-inhibition dictionary selection, Artif. Intell. Med. 66 (2016) 1–13.
[52] Y. Cong, J. Liu, G. Sun, Q. You, Y. Li, J. Luo, Adaptive greedy dictionary selection for web media summarization, IEEE Trans. Image Process. 26 (1) (2017) 185–195.
[53] J. Meng, S. Wang, H. Wang, J. Yuan, Y.-P. Tan, Video summarization via multi-view representative selection, IEEE Trans. Image Process. 27 (5) (2018) 2134–2145.
[54] K. Kumar, D.D. Shrimankar, Deep event learning boost-up approach: DELTA, Multimed. Tools Appl. (2018) 1–21.
[55] D. Zhang, D. Meng, J. Han, Co-saliency detection via a self-paced multiple-instance learning framework, IEEE Trans. Pattern Anal. Mach. Intell. 39 (5) (2017) 865–878.
[56] G. Cheng, P. Zhou, J. Han, Learning rotation-invariant convolutional neural networks for object detection in VHR optical remote sensing images, IEEE Trans. Geosci. Remote Sens. 54 (12) (2016) 7405–7415.
[57] Y. Cong, B. Fan, K. Liu, H. Fan, Unusual event analysis for deep sea submersible, in: International Conference on Advanced Robotics and Mechatronics (ICARM), IEEE, 2017, pp. 529–534.
[58] G. Yildirim, S. Süsstrunk, FASA: fast, accurate, and size-aware salient object detection, in: Asian Conference on Computer Vision, 2014, pp. 514–528.
[59] Y. Wu, J. Lim, M.-H. Yang, Online object tracking: a benchmark, in: CVPR, IEEE, 2013, pp. 2411–2418.
[60] L. Bertinetto, J. Valmadre, S. Golodetz, O. Miksik, P.H. Torr, Staple: complementary learners for real-time tracking, in: CVPR, 2016, pp. 1401–1409.
Yang Cong is a full professor at the Chinese Academy of Sciences. He received the B.Sc. degree from Northeast University in 2004, and the Ph.D. degree from the State Key Laboratory of Robotics, Chinese Academy of Sciences in 2009. He was a Research Fellow at the National University of Singapore (NUS) and Nanyang Technological University (NTU) from 2009 to 2011, respectively, and a visiting scholar at the University of Rochester. He has served on the editorial board of the Journal of Multimedia. His current research interests include image processing, computer vision, machine learning, multimedia, medical imaging, data mining and robot navigation. He has authored over 60 technical papers. He is a senior member of the IEEE.

Baojie Fan received the B.S. degree in automation from Qufu Normal University, Qufu, China, in 2006; the M.S. degree in automation from Northwest University, Xi'an, China, in 2008; and the Ph.D. degree in pattern recognition and intelligent systems from the State Key Laboratory of Robotics, Shenyang Institute of Automation, Chinese Academy of Sciences, Beijing, China. His research interests include UAV vision systems, space robots, object tracking, and pattern recognition.
Dongdong Hou is currently a Ph.D. candidate at the State Key Laboratory of Robotics, Shenyang Institute of Automation, Chinese Academy of Sciences, University of Chinese Academy of Sciences. She received the B.S. degree from Hebei University of Technology, China, in 2014. Her current research interests include abnormal event detection, dictionary selection, and sparse representation.
Huijie Fan received the B.S. degree in automation from the University of Science and Technology of China, P. R. China, in 2007 and the doctoral degree in pattern recognition and intelligent systems from the University of Chinese Academy of Sciences, P. R. China, in 2014. She is a research associate at the Shenyang Institute of Automation, Chinese Academy of Sciences. Her research interests include medical image processing and machine learning.
Kaizhou Liu received his Ph.D. degree in Mechatronic Engineering from the University of Chinese Academy of Sciences in 2007. Since 2004, he has been with the Shenyang Institute of Automation, Chinese Academy of Sciences, where he is currently a professor. He has published more than 80 journal and conference papers. His research interests include modeling and simulation, path planning and obstacle avoidance, autonomous navigation, and virtual reality for unmanned/manned underwater vehicles.
Jiebo Luo joined the Department of Computer Science at the University of Rochester in 2011 after a prolific career of 15+ years with Kodak Research. His research spans computer vision, machine learning, data mining, social media, and biomedical informatics. He has authored 300+ technical papers and 90+ US patents. He has served as the program chair of ACM Multimedia 2010, IEEE CVPR 2012, ACM ICMR 2016, and IEEE ICIP 2017, as well as on the editorial boards of the IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), IEEE Transactions on Multimedia (TMM), IEEE Transactions on Circuits and Systems for Video Technology (TCSVT), Pattern Recognition, Machine Vision and Applications (MVA), and ACM Transactions on Intelligent Systems and Technology (TIST). He is a Fellow of the SPIE, IEEE, and IAPR.