Integrating manifold ranking with boundary expansion and corners clustering for saliency detection of home scene

Zhongli Wang, Guohui Tian*
School of Control Science and Engineering, Shandong University, 73 Jingshi Road, Jinan, China
* Corresponding author. E-mail address: [email protected] (G. Tian).
Article history: Received 12 June 2019; Revised 30 September 2019; Accepted 17 October 2019. Communicated by Dr. Jianbing Shen.

Keywords: Home scene; Saliency detection; Manifold ranking; Boundary expansion; Corners clustering; Saliency map
Abstract

In this paper, we propose a novel framework for saliency detection of home scenes by exploiting manifold ranking, boundary expansion, and corners clustering. Our method first combines color cues in RGB and CIELab to select image boundary seeds, excluding as far as possible the seeds that might be located on salient objects. Then, we use the boundary seeds on each image boundary as the queries of manifold ranking to compute saliency, and integrate the results into a background-based saliency map. For foreground-based saliency detection, boundary expansion combined with the background-based saliency map highlights foreground regions, which are regarded as queries for a foreground-based saliency map. Moreover, we obtain a center prior saliency map through multi-scale Harris corner detection and corners clustering to further highlight salient regions and suppress background regions. Finally, we integrate the three saliency maps via the proposed unified framework into a more accurate and smoother saliency map. Both qualitative and quantitative experimental results indicate that our method delivers better overall performance than several state-of-the-art saliency detection methods.
1. Introduction

Salient object detection aims to locate salient and important regions in an image, which differs from traditional models that predict human eye fixations [1]. Research on saliency detection can reveal the visual attention mechanism of humans, as well as model human selection behavior. The field has therefore attracted wide interest and a large body of research, and it has been applied to various visual tasks, such as image segmentation [2], image retrieval [3], image compression [4], object classification [5], and object recognition [6]. In recent years, home service robots have become a hot research field; they can accompany the elderly and help people with housework to alleviate their burdens, such as floor cleaning, food heating, item passing and desktop arranging. However, to accomplish these tasks well, a robot needs excellent visual information processing capabilities to detect objects relevant to the task. Thereby, image/video processing of home scenes has attracted much attention. On the one hand, with the development of imaging systems for home scenes, obtaining useful information from massive home image/video data is becoming difficult and time-consuming. On the other hand, the arrival of an aging society, the reduction of the labor force and the acceleration of people's life rhythm all create an urgent need for service robots that enter the family and perform tasks independently. Thus, precise detection of home scene objects is one of the major and necessary issues in research on autonomous task execution of robots. Saliency detection of home scenes, serving as a pre-processing step, can not only locate important or interesting regions or salient objects in home scenes with many practical applications, but also benefit the development of visual information processing technology. In contrast to traditional visual saliency detection methods, our proposed saliency detection method is mainly used to detect salient objects or regions in home scene images. Home scenes generally refer to people's daily indoor living environment, with different family backgrounds such as kitchen, living room, bedroom and bathroom scenes. In practice, a home scene image has complex background information, and even some objects with low contrast to the background. Specifically, there may exist multiple objects in an image, or objects touching the image boundary. The objects in a home scene are of different sizes and categories, with large differences between categories, and they may serve as background for each other, leading to complex and changeable object backgrounds. In contrast, the objects in outdoor scenes are relatively uniform, with the earth, sky, roads, plants and large buildings as the background, and the differences between categories in the same picture are relatively small. All these factors interact
with each other, which increases the difficulty of saliency detection and poses a challenging problem for traditional saliency detection methods. Therefore, modeling the saliency detection mechanism in home scenes is a new and open research issue, and further efforts on saliency detection of home scenes will support the development of home vision information processing technology.

In this paper, we propose a new method combining graph-based manifold ranking with boundary expansion and corners clustering for saliency detection of home scenes. Our first contribution is a novel and reliable background measure. We measure boundary saliency through boundary color contrast in RGB and CIELab to select image boundary regions that do not include the salient object as background regions, and then select the background regions on each boundary of the image as background seeds. Thus, four boundary-based saliency maps are obtained according to the similarity rankings with the background seeds on each image boundary, and integrated into a background-based saliency map. Our second contribution is that we expand the boundary by means of boundary expansion to highlight foreground regions. We combine these foreground regions with the background-based saliency map as queries to gain a foreground-based saliency map via graph-based manifold ranking. Our third contribution is that we gain a center prior saliency map via corners clustering. A multi-scale Harris corner detector is used to detect image corners. In order to obtain a saliency map based on the center prior and to reduce the computational load, we exploit the foreground regions obtained by combining boundary expansion with the background-based saliency map to filter out corners located in background regions, and then cluster the remaining corners to compute center prior saliency. Finally, these three different saliency maps are integrated to produce the final saliency map for saliency detection of home scenes.

The remainder of this paper is organized as follows. Section 2 gives an overview of the related work. Section 3 describes the details of our algorithm. The experimental results and conclusions are given in Sections 4 and 5, respectively.

2. Related work

Recently, many efforts have been made to propose algorithms for salient object detection. For lack of high-level information about the salient target, the number of algorithms favoring low-level information has increased, and most of them are based on superpixel units obtained by segmentation [7-9]. Based on the observation that salient regions show high contrast in some cases, Achanta et al. [10] detect salient regions using low-level features of luminance and color contrast to the entire image, which is simple and efficient but struggles with complex backgrounds. Ma and Zhang [11] measure image saliency based on the color contrast between each pixel and its neighboring pixels. Shen et al. [12] transform a higher-order energy function to a lower-order one at a local region of an image through first- or second-order Taylor expansion for image object segmentation. Goferman et al. [13] exploit local contrast by computing the dissimilarity of a region only with its relevant context. These methods all rely on local contrast, which can generate sharper saliency maps; however, they tend to highlight high-frequency components such as object boundaries rather than the global salient target [14].
Some algorithms based on global contrast have been investigated to measure image saliency, which can reduce the effects caused by local contrast. For example, the global region contrast based method [15] uses spatially weighted color contrast to compute saliency, which uniformly highlights the interior of the target but is sensitive to the target size. Zhai and Shah [16] propose an efficient algorithm for measuring saliency utilizing the color histograms of images. Perazzi et al. [17] formulate complete contrast and saliency
estimation by rating the uniqueness and the spatial distribution of color. In addition, more and more prior information is commonly used to improve performance in salient object detection. The center prior assumes that salient objects are located at the center of an image. Li et al. [18] utilize a predefined Gaussian model to measure salient regions via the center prior. Yang et al. [19] measure regional saliency scores based on the relevances to pseudo-background queries taken from each side of the image via manifold ranking on an undirected weighted graph. The background prior assumes that background regions are located at the boundary of an image. Wang et al. [20] utilize two types of saliency distance measures based on foreground and background cues for precisely separating the salient object from the background. Zhu et al. [21] integrate boundary connectivity based on geodesic distance into a quadratic objective function to gain the final saliency map. Qin et al. [22] formulate a background-based saliency map utilizing color and space contrast with clustered boundary seeds. Wang et al. [23] propose a reliable and temporally consistent saliency measurement of superpixels as a prior for pixel-wise labeling based on geodesic distance. Guo et al. [24] utilize an object-level saliency prior to rank and select the salient regions. Edges and disparity boundaries, as two kinds of cues, are integrated in [25] for stereo saliency detection. Intra-frame boundary information and inter-frame motion information are applied together in [26] to estimate salient regions in videos. Specially, with the rise of deep learning, it has been universally exploited in salient object detection. Traditional methods rely on manually designed features, just like the methods above, while deep learning can extract convolutional features through deep neural networks [27-29]. Han et al. [30] propose a novel deep learning framework for salient object detection by first modeling the background and then separating salient objects from it. Li et al. [31] create a novel deep network architecture that automatically converts an existing deep contour detection model into a salient object detection model without using any manual salient object masks, and in [32] propose a network model that employs a multi-scale cascade. Zhang et al. [33] train a deep salient object detector without any human annotation by means of "supervision by fusion". Fully convolutional neural networks (FCNs) are applied in [34-38] for salient object detection. Most of these methods behave differently in salient object detection, and each has distinctive advantages, so we absorb their merits to propose our method for salient object detection of home scenes.

3. Proposed saliency detection method

This section illustrates the framework of our proposed method, which incorporates manifold ranking, boundary expansion and corners clustering for saliency detection, as presented in Fig. 1. The simple linear iterative clustering (SLIC) algorithm [7] is first used to segment a home scene image into a series of superpixels, which are regarded as the minimum processing units for capturing the structural information of the home scene image.
Secondly, we adopt the boundary color contrast method to obtain background seeds, excluding salient object patches located at the image boundary. For background-based saliency detection, we regard the background seeds on each boundary of the image as queries in turn and integrate the resulting maps. Thirdly, boundary expansion combined with the background-based saliency map highlights foreground regions; we regard these foreground regions as queries to rank against the other image patches and gain a foreground-based saliency measure. Fourthly, a multi-scale Harris corner detector is used to detect image corners
Fig. 1. Framework of our proposed salient object detection model.
and filter out corners located in background regions; the remaining corners are then clustered to compute the center prior saliency map. Finally, all these different types of saliency information are integrated to form a smoother and more accurate saliency map.
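To make the pipeline concrete, the following is a minimal pre-processing sketch using scikit-image's SLIC implementation; the function name, parameter values (e.g., 200 segments) and the returned statistics are our illustrative choices, not the paper's exact configuration:

```python
import numpy as np
from skimage.segmentation import slic
from skimage.color import rgb2lab

def preprocess(image, n_segments=200, compactness=10.0):
    """Segment an RGB image into superpixels and collect per-superpixel
    statistics used throughout Section 3 (mean Lab color, centroid,
    adjacency, and the set of boundary superpixels)."""
    labels = slic(image, n_segments=n_segments,
                  compactness=compactness, start_label=0)
    lab = rgb2lab(image)
    n = labels.max() + 1
    lab_means = np.array([lab[labels == k].mean(axis=0) for k in range(n)])
    ys, xs = np.mgrid[0:labels.shape[0], 0:labels.shape[1]]
    centroids = np.array([[xs[labels == k].mean(), ys[labels == k].mean()]
                          for k in range(n)])
    # adjacency: pairs of labels touching horizontally or vertically
    adjacency = set(zip(labels[:, :-1].ravel(), labels[:, 1:].ravel()))
    adjacency |= set(zip(labels[:-1, :].ravel(), labels[1:, :].ravel()))
    adjacency = {(i, j) for i, j in adjacency if i != j}
    # superpixels touching any of the four image borders
    boundary_idx = np.unique(np.concatenate(
        [labels[0], labels[-1], labels[:, 0], labels[:, -1]]))
    return labels, lab_means, centroids, adjacency, boundary_idx
```

The returned superpixel statistics are reused by the sketches in the following subsections.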
3.1. Graph-based manifold ranking

The graph-based manifold ranking problem is formulated as follows: some nodes are regarded as queries, and the remaining nodes are ranked based on their relevances to the given queries. The goal is to learn a ranking function that defines the relevance between unlabeled nodes and queries; manifold ranking exploits the intrinsic manifold structure of the data [39]. SLIC splits the original image into a series of superpixels, each of which is viewed as a node. Therefore, we define a graph G = (V, E), where V denotes the node set consisting of image patches, i.e., superpixels, and E represents the set of undirected edges. Some superpixels are the queries, and the rest are ranked according to their relevances to the queries. Let the affinity matrix $W = [w_{i,j}]_{N \times N}$ record the weight between any two adjacent nodes i and j, defined as:

$$w_{i,j} = \exp\left(-\frac{\lVert c_i - c_j \rVert}{\sigma^2}\right) \tag{1}$$
where $c_i$ and $c_j$ are the mean color vectors of superpixels i and j in CIELab, respectively, and $\sigma^2$ controls the strength of the weight. Let $y = [y_1, y_2, \dots, y_N]^T$ denote an indication vector, in which $y_i = 1$ marks superpixel i as a query and $y_i = 0$ otherwise. A ranking algorithm is then needed to estimate the relevance of every node to the queries. Following the ranking algorithm in [40], the resulting ranking function is defined as:

$$f = (D - \alpha W)^{-1} y \tag{2}$$
where $D = \mathrm{diag}\{d_{11}, d_{22}, \dots, d_{NN}\}$ is the degree matrix of the affinity matrix W, namely $d_{ii} = \sum_{j=1}^{N} w_{ij}$, and $\alpha$ specifies the relative contributions to the ranking scores from the neighbors and the initial ranking scores. The final ranking result f indicates the degree of association between all image superpixels and the query items. More information on the graph construction and manifold ranking can be found in [18,39].
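As an illustration of Eqs. (1)-(2), a minimal NumPy sketch of the ranking step follows; it assumes the superpixel statistics produced by the pre-processing sketch above, and the default values of α and σ² are placeholders rather than the paper's tuned settings:

```python
import numpy as np

def manifold_ranking(lab_means, adjacency, y, alpha=0.99, sigma2=0.1):
    """Rank all superpixels against the query indicator y (Eqs. (1)-(2))."""
    n = len(lab_means)
    W = np.zeros((n, n))
    for i, j in adjacency:
        # Eq. (1): edge weight from the CIELab color distance of adjacent nodes
        w = np.exp(-np.linalg.norm(lab_means[i] - lab_means[j]) / sigma2)
        W[i, j] = W[j, i] = w
    D = np.diag(W.sum(axis=1))            # degree matrix of W
    # Eq. (2): ranking function f = (D - alpha * W)^{-1} y
    return np.linalg.solve(D - alpha * W, y)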
3.2. Saliency detection based on background information

3.2.1. Background information selection

Based on the observation that the salient object is likely to appear at or near the center of an image, most of the image background is easily found on the boundary. It therefore makes sense to define saliency as contrast versus the boundary, and we can extract superpixels along the image boundary as background prior regions. Although the image background is large and homogeneous, there may be some foreground noise in the boundary regions: when the salient object touches the boundary of an image, it introduces foreground noise and may cause salient object detection to fail. Therefore, we propose a background seed selection measure to remove foreground noise. We take the boundary superpixels whose centroids lie within a certain number of pixels from the four borders of the image as the boundary set. When the salient object touches the image boundary, the boundary superpixels located on the object are more salient than the other boundary superpixels and show large contrast to them. So, we select background seeds by computing the saliency $s_i$ of every boundary superpixel in the RGB and CIELab color spaces, defined as:

$$s_i = \sum_{f=1}^{2} \frac{1}{n} \sum_{j=1, j \neq i}^{n} \lVert m_i^f - m_j^f \rVert \tag{3}$$
Fig. 2. Example of boundary extraction. (a) input image, (b) SLIC, (c) whole boundary set, (d) salient regions on the boundary, (e) selected boundary set, (f) ground truth.
where f indexes the color space (f = 1 for RGB and f = 2 for CIELab), $m_i^f$ is the mean color of boundary superpixel i in color space f, and n is the number of boundary superpixels. Let $avg_f$ and $var_f$ denote the average saliency value and variance of all the boundary superpixels in color space f, respectively. If the inequality $s_i - \sum_{f=1}^{2} avg_f > \sum_{f=1}^{2} var_f$ is satisfied, the boundary superpixel $v_i$ is considered part of the salient object and is excluded from the boundary set. Through this treatment, foreground noise is removed as much as possible, and the remaining superpixels serve as background seeds. Fig. 2 shows that our boundary selection method excludes salient object parts located on the boundary as much as possible.
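A sketch of this seed-selection rule (Eq. (3) plus the mean/variance test); the per-color-space bookkeeping follows our reading of the inequality above and should be treated as an assumption:

```python
import numpy as np

def select_background_seeds(rgb_means, lab_means, boundary_idx):
    """Keep boundary superpixels that pass the mean/variance contrast test."""
    n = len(boundary_idx)
    s_total = np.zeros(n)
    threshold = 0.0
    for means in (rgb_means[boundary_idx], lab_means[boundary_idx]):
        # Eq. (3): average color contrast to the other boundary superpixels
        diff = np.linalg.norm(means[:, None, :] - means[None, :, :], axis=2)
        s_f = diff.sum(axis=1) / n
        s_total += s_f
        threshold += s_f.mean() + s_f.var()   # avg_f + var_f per color space
    # exclude superpixels with s_i - sum(avg_f) > sum(var_f)
    return boundary_idx[s_total <= threshold]
```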
3.2.2. Background-based saliency detection

We rank against the selected boundary seeds on each boundary of the image separately, rather than ranking against all boundary superpixels at once. In our scheme, the background seeds on each image boundary are regarded as queries, i.e., labeled nodes, and the remaining image superpixels as unlabeled nodes. We rank the relevances between the background seeds on each image boundary and the rest of the image superpixels, so we can construct four saliency maps based on the boundary prior and integrate them into a background-based saliency map. Taking one image boundary as an example, we use the selected background seeds on this boundary as queries and the other nodes as unlabeled data. The indication vector y is thus given, and all nodes are ranked by Eq. (2) to obtain the vector f, an N-dimensional vector with one element per image superpixel. Each element indicates the relevance between the background seeds and the corresponding superpixel; the complement of each element of f is the saliency measure assigned to that superpixel. We normalize this vector to the range [0, 1] and obtain the saliency map based on the boundary seeds of one image boundary:

$$s_b^j(i) = 1 - \bar{f}_b^j(i), \quad i = 1, 2, \dots, N; \; j = 1, 2, 3, 4 \tag{4}$$
where $s_b^j$ is the boundary-based saliency measure, i is a superpixel node on the constructed graph, and j indexes the image boundary (j = 1 top, j = 2 right, j = 3 bottom, j = 4 left). Following this approach, we obtain four saliency maps based on the four image boundaries, respectively. As shown in Fig. 3, each saliency map has its own advantages in presenting the salient object, but these maps contain much noise and fail to detect the entire object. So, we integrate the four saliency maps into a better saliency map $S_b$:
$$S_b(i) = \prod_{j=1}^{4} s_b^j(i) \tag{5}$$
A background-based saliency map is obtained via Eq. (5). Although the background-based saliency map (Fig. 3(e)) is better than the four per-boundary saliency maps, some background noise remains (Fig. 3(d)), so further processing is needed.
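A sketch of the per-boundary ranking and fusion (Eqs. (4)-(5)), reusing manifold_ranking() from Section 3.1; reading Eq. (5) as a product over the four maps and the min-max normalization are our assumptions:

```python
import numpy as np

def background_saliency(lab_means, adjacency, seeds_per_side):
    """Fuse the four boundary-seeded maps into one background-based map."""
    n = len(lab_means)
    S_b = np.ones(n)
    for seeds in seeds_per_side:          # top, right, bottom, left seed sets
        y = np.zeros(n)
        y[seeds] = 1.0                    # queries: seeds on one border
        f = manifold_ranking(lab_means, adjacency, y)
        f = (f - f.min()) / (f.max() - f.min() + 1e-12)
        S_b *= 1.0 - f                    # Eq. (4): complement of the ranking
    return S_b                            # Eq. (5): integrate the four maps
```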
3.3. Saliency detection based on boundary expansion and corners clustering

The center prior usually assumes that salient object regions lie at the center of an image. When the salient object deviates from the image center or touches the image boundary, it may not be effectively detected by the center prior. However, the texture of a salient object is richer than that of the background, so the salient object contains more corners. A corner is a point whose testing function energy changes intensely in every direction; it is one of the most important features of image data [41], and it not only reduces data redundancy but also improves detection efficiency [42]. Therefore, we use corners to compute center prior saliency, and design a multi-scale Harris corner detector to find image corners. Moreover, boundary expansion combined with the background-based saliency map can highlight foreground regions, which we use to filter out corners located in background regions and reduce computation. The foreground regions are also used as queries to be ranked by graph-based manifold ranking for foreground saliency.

3.3.1. Boundary expansion

Many foreground-based salient object detection methods take a background-based saliency map as the foundation for computing the saliency map. However, if the background-based saliency map does not highlight the foreground well or contains much background noise, the saliency map computed from it may perform poorly. So we expand the boundary to highlight foreground regions for a better saliency map. The difference metric between image superpixels is defined as:

$$m_{er}(i, j) = \exp\left(\frac{d_p^2(i, j)}{\sigma_3^2}\right) d_c(i, j) \tag{6}$$
where $d_p(i, j)$ is the Euclidean distance between the mean position vectors of superpixels i and j, with the two-dimensional coordinates normalized by the width and height of the image, and $d_c(i, j)$ is the Euclidean distance between the mean color vectors of image patches i and j in CIELab. $\sigma_3$ adjusts the intensity of the differences between image blocks; when the value of $\sigma_3$ is between 0.5 and 1, the boundary expansion result is relatively stable.
Fig. 3. Background-based saliency map. (a) input image, (b) SLIC, (c) boundary extraction, (d) saliency maps based on each boundary, (e) background-based saliency map, (f) ground truth.
According to Eq. (6), the difference metric between an image superpixel and the image boundary superpixels is defined as:

$$m(v, B) = \min_{j \in B} m_{er}(v, j), \quad v \notin B \tag{7}$$
where B is the set of image boundary superpixels and v belongs to the remaining superpixels. When an image superpixel $\rho$ satisfies the following inequality, it is expanded into the background:

$$m(\rho, B) < \frac{1}{M} \sum_{i=1}^{M} m(i, B) \tag{8}$$
where M is the number of remaining superpixels. Boundary expansion grows the background from the image boundary toward the image interior, which captures as many background superpixels as possible and highlights the foreground superpixels. Moreover, we segment the background-based saliency map with an adaptive threshold to obtain a set of superpixels located in salient regions. The intersection of the two sets of superpixels is selected to highlight the foreground regions:

$$R = A_f \cap A_b \tag{9}$$
where R is the intersection set, $A_f$ is the set of foreground superpixels obtained from boundary expansion, and $A_b$ is the set of foreground superpixels obtained from threshold segmentation of the background-based saliency map. The superpixels in R are regarded as the final foreground regions and the remaining superpixels as background regions. We rank the relevances between the foreground regions, selected as queries, and the remaining regions, treated as unlabeled nodes, and construct a saliency map based on the foreground prior:

$$S_f(i) = \bar{f}_f(i) \tag{10}$$
where $\bar{f}_f$ is the ranking vector normalized to [0, 1]. Fig. 4 shows some examples of foreground regions and foreground-based saliency maps. The result of boundary expansion (Fig. 4(d)) covers most of the background regions. The final foreground regions (Fig. 4(g)) are the intersection of the foreground regions from boundary expansion (Fig. 4(e)) and those from adaptive threshold segmentation of the background-based saliency map (Fig. 4(f)); it can be noted that the final foreground regions highlight the salient regions better. We use manifold ranking with the foreground regions as queries to obtain a foreground-based saliency map (Fig. 4(h)).
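A sketch of the boundary expansion step (Eqs. (6)-(9)); the sign of the exponent in Eq. (6) and the use of the mean of the background-based map as the "adaptive threshold" are our assumptions:

```python
import numpy as np

def final_foreground(lab_means, positions, boundary_idx, S_b, sigma3=0.75):
    """Foreground regions via boundary expansion intersected with the
    thresholded background-based saliency map."""
    n = len(lab_means)
    inner = np.setdiff1d(np.arange(n), boundary_idx)
    # Eq. (6): position-amplified color difference to the border superpixels
    dp = np.linalg.norm(positions[inner, None] - positions[boundary_idx][None, :], axis=2)
    dc = np.linalg.norm(lab_means[inner, None] - lab_means[boundary_idx][None, :], axis=2)
    mer = np.exp(dp ** 2 / sigma3 ** 2) * dc
    m = mer.min(axis=1)                    # Eq. (7): distance to the border set
    expanded = inner[m < m.mean()]         # Eq. (8): below average => background
    A_f = set(inner) - set(expanded)       # foreground left after expansion
    A_b = set(np.flatnonzero(S_b >= S_b.mean()))   # adaptive threshold (assumed)
    return np.array(sorted(A_f & A_b))     # Eq. (9): R = A_f ∩ A_b
```

The returned foreground indices can then be used to build the query vector y in manifold_ranking() to produce the foreground-based map of Eq. (10).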
3.3.2. Corners clustering

The Harris corner detection algorithm was developed from the Moravec algorithm; the main idea of the Harris algorithm can be found in [43]. In order to detect corners and compute center prior saliency effectively, we adopt multi-scale Harris, an improved Harris corner detection [44]. The method works as follows: a wavelet transform is used to define the grayscale change of the image pixel located at (x, y), and an autocorrelation matrix $M_{2^{j+1}}$ is established for each pixel at scale $2^{j+1}$. The corner response function $C_{2^{j+1}}(x, y) = \det(M_{2^{j+1}}) - k\,(\mathrm{trace}(M_{2^{j+1}}))^2$ is used to detect corners. Given a threshold T for the image, a pixel located at (x, y) with $C_{2^{j+1}}(x, y) > T$ is regarded as a corner and denoted as $(a_i, b_i)$, $i = 1, 2, \dots, k$, where k is the number of corners. Corners may appear anywhere in an image, and corners on the background interfere with salient object detection, so it is necessary to filter out the corners located in the background. We use the foreground regions gained by combining boundary expansion with the background-based saliency map, so the corners located in background regions can be partly removed. Most of the remaining corners $(a_i, b_i)$, $i = 1, 2, \dots, k$ lie on the foreground region, while a small number are still located on the background. For these remaining corners, we employ K-means clustering to group them into one cluster with center $(a_0, b_0)$. Finally, a two-dimensional Gaussian model is established to define the saliency value based on the center prior:
$$\bar{s}_c(i) = \sum_{j \in M} \lVert c_i - c_j \rVert \cdot w_p(i, j) \cdot w_c(i) \tag{11}$$

where

$$w_p(i, j) = \exp\left(-\frac{(x_i - x_j)^2}{2\sigma_x^2} - \frac{(y_i - y_j)^2}{2\sigma_y^2}\right) \tag{12}$$

$$w_c(i) = \exp\left(-\frac{(x_i - a_0)^2}{2\sigma_x^2} - \frac{(y_i - b_0)^2}{2\sigma_y^2}\right) \tag{13}$$
$\bar{s}_c(i)$ is the saliency value of superpixel i, and $(x_i, y_i)$ is the center of superpixel i. M is the set of superpixels over which the contrast is computed. $w_p(i, j)$ is a Gaussian weight function that gives a larger value when superpixels i and j are close and a smaller value otherwise; $w_c(i)$ gives a larger value when superpixel i is close to the location $(a_0, b_0)$ and a smaller value otherwise. $\sigma_x$ and $\sigma_y$ are the horizontal and vertical variances of the image, respectively.
Fig. 4. Example of foreground regions and foreground-based saliency map. (a) input image, (b) SLIC, (c) boundary extraction, (d) boundary expansion, (e) foreground regions obtained by boundary expansion, (f) foreground regions obtained by adaptive threshold segmentation of the background-based saliency map, (g) final foreground regions from the intersection of (e) and (f), (h) foreground-based saliency map, (i) ground truth.
Fig. 5. Center prior saliency measure. (a) input image, (b) corner detection, (c) foreground regions, (d) corners filtering, (e) corners clustering, (f) center prior saliency map, (g) ground truth.
In addition, a correction is applied to adjust the saliency value:

$$S_c(i) = \begin{cases} \bar{s}_c(i), & k \ge m \\ \dfrac{1}{k}, & k < m \end{cases} \tag{14}$$
where k is the number of corners located in the foreground regions and m is a threshold on the number of corners. Fig. 5 shows some examples of the center prior saliency measure. Corner filtering reduces the number of corners (Fig. 5(d)), and corner clustering merges them into one cluster center (Fig. 5(e)), which is regarded as the center point of the image. Fig. 5(f) shows the center prior saliency map, from which we can see that it locates the salient object accurately in the image.
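A sketch of the center prior computation (Eqs. (11)-(14)); for brevity, OpenCV's single-scale cornerHarris evaluated at two Gaussian blur scales stands in for the paper's wavelet-based multi-scale detector, and the scale values, response threshold and m are illustrative assumptions:

```python
import cv2
import numpy as np
from sklearn.cluster import KMeans

def center_prior(gray, fg_mask, centers, lab_means, sigma_x, sigma_y, m=10):
    """Center prior saliency per superpixel from clustered foreground corners.
    gray: float32 grayscale image; fg_mask: boolean foreground pixel mask."""
    corners = []
    for scale in (1.0, 2.0):                       # two scales (assumed)
        blurred = cv2.GaussianBlur(gray, (0, 0), scale)
        resp = cv2.cornerHarris(blurred, 2, 3, 0.04)
        ys, xs = np.nonzero(resp > 0.01 * resp.max())
        corners.append(np.stack([xs, ys], axis=1))
    corners = np.concatenate(corners)
    # keep only the corners that fall inside the foreground regions
    corners = corners[fg_mask[corners[:, 1], corners[:, 0]]]
    k = len(corners)
    if k < m:                                      # Eq. (14): too few corners
        return np.full(len(centers), 1.0 / max(k, 1))
    # cluster the remaining corners into a single center (a0, b0)
    a0, b0 = KMeans(n_clusters=1, n_init=10).fit(corners).cluster_centers_[0]
    # Eq. (13): proximity of each superpixel to the cluster center
    wc = np.exp(-(centers[:, 0] - a0) ** 2 / (2 * sigma_x ** 2)
                - (centers[:, 1] - b0) ** 2 / (2 * sigma_y ** 2))
    # Eq. (12): pairwise spatial weights between superpixels
    d = centers[:, None, :] - centers[None, :, :]
    wp = np.exp(-d[..., 0] ** 2 / (2 * sigma_x ** 2)
                - d[..., 1] ** 2 / (2 * sigma_y ** 2))
    # Eq. (11): color contrast weighted by w_p and w_c
    dc = np.linalg.norm(lab_means[:, None] - lab_means[None, :], axis=2)
    return (dc * wp).sum(axis=1) * wc
```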
3.4. Final saliency map integration

The background-based saliency map, the foreground-based saliency map and the center prior saliency map are obtained from the background, foreground and center prior information in the sections above. The background-based and foreground-based saliency maps not only reveal the global color distinction well but also preserve the whole salient object; however, when an image has low contrast between foreground and background, their accuracy may drop. Conversely, the center prior saliency map favorably highlights the salient regions, but when salient objects are scattered across an image it may reduce detection accuracy. All of these maps are still bumpy and noisy. To make full use of the advantages of each saliency map, we present a principled framework that integrates them into the final clean saliency map $S_F(i)$:

$$S_F(i) = \exp(S_f(i)) + \sum_{j \in \{b, f\}} \left(1 - \exp(-S_j(i) \cdot S_c(i))\right) \tag{15}$$
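A direct transcription of Eq. (15); the final min-max rescaling to [0, 1] is our assumption:

```python
import numpy as np

def integrate_maps(S_b, S_f, S_c):
    """Eq. (15): fuse the background-, foreground- and center-prior maps."""
    S_F = np.exp(S_f)
    for S_j in (S_b, S_f):                 # j ranges over {b, f}
        S_F += 1.0 - np.exp(-S_j * S_c)
    return (S_F - S_F.min()) / (S_F.max() - S_F.min() + 1e-12)
```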
Fig. 6 shows the results of each stage of our method. By combining the three saliency maps with Eq. (15), the integrated saliency map (Fig. 6(e)) both suppresses background noise (Fig. 6(b), (d)) and highlights the salient objects.

4. Experimental results

In this section, we evaluate our proposed saliency detection model on three public datasets and one dataset established by us. The classic or representative methods we compare against are IT98 [48], SR07 [49], FT09 [50], SEG10 [51], MSS10 [52], CA10 [53], FES11 [54], HC11 [55], SF12 [56], PCA13 [57], GMR13 [19], GC13 [58], LPS15 [59] and RCRR18 [60]. For a fair comparison, we use saliency maps that were either provided by the authors or generated by the authors' code with default parameters.

4.1. Datasets

To evaluate the performance of our method, this section compares it with the classic or representative methods on three public datasets, ECSSD [45], DUT-OMRON [19] and MSRA10K [46], as well as our own dataset HOME-VISION. The ECSSD dataset contains 1000 images with multiple-scale objects surrounded by complex but semantically meaningful backgrounds. The DUT-OMRON dataset contains 5168 images, each selected from more than 14,000 images, with one or more salient objects and relatively complex backgrounds. The MSRA10K dataset consists of pixel-level ground truth annotations for 10,000 images, each of which has an unambiguous salient object accurately annotated with pixel-wise labeling. HOME-VISION is a home scene dataset established by us. It contains a total of 846 home scene images covering a variety of items, such as tables, chairs, cups, pans, sofas, refrigerators, TVs, shoes, beds and computers, and has different family
Fig. 6. Example of saliency map integration. (a) input image, (b) background-based saliency map, (c) center prior saliency map, (d) foreground-based saliency map, (e) integrated saliency map, (f) ground truth.
Fig. 7. Pipeline of the HOME-VISION dataset. The first row shows labeling of object contour coordinates, and the second row shows segmentation of object instances, taking four kinds of objects as examples. (a) TV, (b) handbag and umbrella, (c) pot.
backgrounds, such as kitchen, living room, bedroom, balcony, corridor, bathroom and home study scenes. Some of these images were downloaded directly from the internet, and some were collected from home scenes using Kinect 2.0 cameras. In order to distinguish salient from non-salient objects, ten students in the Service Robot Laboratory of Shandong University were invited to observe each image, record the object they saw at first sight, and count the probabilities; the object with the highest probability is designated the salient object. The salient objects of the images in our HOME-VISION dataset are then labeled manually with the Labelme tool [47]. Fig. 7 shows the labeling pipeline.
4.2. Evaluation metrics

In this paper, we evaluate the performance of our proposed method using standard precision-recall curves. To draw a precision-recall curve, we binarize the continuous saliency map at thresholds from 0 to 255 in steps of 5 to obtain a series of binary maps, and then compare the binary maps against the ground truth. An effective method needs both high precision and high recall, so we also compute the combination of precision and recall known as the F-measure:

$$F_\beta = \frac{(1 + \beta^2) \cdot \mathit{precision} \cdot \mathit{recall}}{\beta^2 \cdot \mathit{precision} + \mathit{recall}} \tag{16}$$
where we set $\beta^2 = 0.3$ to emphasize precision. Although the precision-recall curve is widely used to evaluate salient object detection, it fails to account for true negative saliency assignments. To address this limitation, we adopt the mean absolute error (MAE) [22] as a metric, which calculates the average difference between the saliency map S and the ground truth G. This metric is mainly used in image segmentation and reveals the similarity between the saliency map and the ground truth; the lower the value, the better the method. It is defined as:
$$\mathit{MAE} = \frac{1}{W \times H} \sum_{x=1}^{W} \sum_{y=1}^{H} \lvert S(x, y) - G(x, y) \rvert \tag{17}$$
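For reference, minimal implementations of the two metrics above (the per-threshold binarization mirrors the evaluation protocol; the small epsilon guard is ours):

```python
import numpy as np

def f_measure(sal, gt, thresh, beta2=0.3):
    """Eq. (16) at one binarization threshold; sal in [0, 255], gt boolean."""
    pred = sal >= thresh
    tp = np.logical_and(pred, gt).sum()
    precision = tp / max(pred.sum(), 1)
    recall = tp / max(gt.sum(), 1)
    return (1 + beta2) * precision * recall / (beta2 * precision + recall + 1e-12)

def mae(sal, gt):
    """Eq. (17): mean absolute difference to the ground truth, both in [0, 1]."""
    return np.abs(sal.astype(float) - gt.astype(float)).mean()
```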
Moreover, the AUC (area under the ROC curve) is also used to test the performance of our method. The AUC is obtained by calculating the area under the receiver operating characteristic (ROC) curve, which plots the true positive rate (TPR) against the false positive rate (FPR) as the saliency map threshold varies over [0, 255]. AUC scores range from 0 to 1; the closer the score is to 1, the better the salient object detection method performs.
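And a corresponding AUC sketch; sweeping 256 thresholds and trapezoidal integration are standard choices, assumed rather than taken from the paper:

```python
import numpy as np

def auc_score(sal, gt, n_thresh=256):
    """Area under the ROC curve for one saliency map (sal in [0, 255])."""
    gt = gt.astype(bool)
    tpr, fpr = [], []
    for t in np.linspace(0, 255, n_thresh):
        pred = sal >= t
        tpr.append(np.logical_and(pred, gt).sum() / max(gt.sum(), 1))
        fpr.append(np.logical_and(pred, ~gt).sum() / max((~gt).sum(), 1))
    order = np.argsort(fpr)                # sort so FPR is increasing
    return np.trapz(np.array(tpr)[order], np.array(fpr)[order])
```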
Fig. 8. Performance of background-based saliency through three kinds of boundary prior methods on ECSSD (a), (b) and HOME-VISION (c), (d).
4.3. Validation of our proposed saliency detection model

To evaluate the proposed boundary prior, we carry out an ablation experiment analyzing the performance of the background-based saliency map on the ECSSD and HOME-VISION datasets. Fig. 8 shows the performance of the background-based saliency map under three kinds of boundary prior, from which we can see that the background-based saliency map obtained with our boundary selection (BSBS) is better than that obtained by using all four image boundaries as background seeds (BSFB), or by combining four saliency maps computed with each complete boundary as background seeds according to Eq. (5) (BSECB). In terms of F-measure, the background-based saliency map obtained by our method improves by 7.5% and 2.6% on the ECSSD dataset, and by 9.7% and 1.4% on the HOME-VISION dataset, compared to BSFB
and BSECB, respectively. This is because boundary selection excludes foreground regions as much as possible, so the boundary set contains more background; boundary seed selection therefore improves the performance of saliency detection. To further analyze the effectiveness of our method, we conduct more ablation experiments on each part of the proposed method. Fig. 9 presents the effectiveness analysis on the ECSSD and HOME-VISION datasets. It is clearly seen that the precision-recall curve of the foreground-based saliency map lies above that of the background-based saliency map. This is because the boundary expansion method combined with the background-based saliency map highlights the foreground regions better, and these regions serve as seeds for ranking instead of ranking only through the background-based saliency map. The center prior saliency map, obtained from the Gaussian model based on corner filtering and corners clustering, can determine the approximate location of the salient regions in an image, so we use it to optimize the background-based saliency map, which can minimize
Fig. 9. Effectiveness analysis of our proposed method on ECSSD (a), (c) and HOME-VISION (b), (d).
background noise. After optimization, we obtain a better precision-recall curve and a higher F-measure on the two datasets, shown as BGCS in Fig. 9. For the foreground-based saliency map (FGS), after combining the current saliency map with the center prior saliency map, the performance is slightly worse on the two datasets, shown as FCENS in Fig. 9. This is because the combination of the two maps highlights the salient object regions while ignoring the object boundary regions, which shrinks the salient regions and degrades performance. So, we use the second term of Eq. (15) to obtain a rough combined saliency map (FBCS), which has a better precision-recall curve and F-measure. Finally, we obtain the final saliency map (INS) using our proposed method via Eq. (15), whose precision-recall curve and F-measure are better than those of the other components of our method. Fig. 9 also shows an ablation experiment validating our proposed integration framework. The ablation experiment is designed as follows: with the other settings of our method unchanged, the final saliency map is obtained by using
the four complete boundaries (INS+FB), by using each complete boundary separately (INS+ECB), and by omitting the center prior (INS+NCP). From the ablation experiment, we can see that the final saliency map obtained by our proposed method is superior to the other three variants and gains better precision-recall curves and F-measures, which illustrates the effectiveness of the various prior information extraction and fusion steps.

4.4. Overall performance evaluation

4.4.1. Evaluation on the ECSSD, DUT-OMRON and MSRA10K datasets

Compared with other state-of-the-art methods, we illustrate the qualitative evaluation on the three public datasets by plotting precision-recall curves and F-measures in Figs. 10-12. Moreover, quantitative results comprising AUC, F-measure and MAE are shown in Table 1. (1) Evaluation on the ECSSD dataset: the ECSSD dataset contains complex backgrounds in which it is difficult to detect entire objects, so we select it to evaluate the performance of our proposed
Fig. 10. Precision-recall curves and F-measure comparing with different state-of-the-art methods on ECSSD dataset.
Fig. 11. Precision-recall curves and F-measure comparing with different state-of-the-art methods on DUT-OMRON dataset.

Table 1
Quantitative comparisons of AUC and MAE scores with different methods on the ECSSD, DUT-OMRON and MSRA10K datasets.

Method   ECSSD AUC   ECSSD MAE   DUT-OMRON AUC   DUT-OMRON MAE   MSRA10K AUC   MSRA10K MAE
IT       0.577       0.271       0.636           0.198           0.640         0.213
SR       0.632       0.264       0.688           0.181           0.736         0.232
FT       0.663       0.289       0.682           0.250           0.790         0.235
SEG      0.810       0.341       0.825           0.337           0.882         0.298
MSS      0.780       0.243       0.817           0.177           0.875         0.203
CA       0.784       0.309       0.815           0.254           0.818         0.237
FES      0.862       0.212       0.848           0.156           0.898         0.185
HC       0.704       0.330       0.734           0.311           0.867         0.215
SF       0.817       0.228       0.803           0.183           0.905         0.175
PCA      0.876       0.247       0.887           0.206           0.941         0.185
GMR      0.890       0.187       0.853           0.189           0.944         0.189
GC       0.804       0.214       0.796           0.197           0.917         0.233
LPS      0.872       0.186       0.856           0.147           0.947         0.123
RCRR     0.893       0.184       0.853           0.182           0.949         0.122
Our      0.915       0.176       0.898           0.173           0.950         0.120
method against 14 state-of-the-art methods. From Fig. 10, it is clear that our method obtains a better precision-recall curve and outperforms the other methods in terms of the F-measure bars. Our method achieves the best AUC and MAE in Table 1, which further demonstrates that it suppresses background noise and makes the detected salient object similar to the ground truth. Combining the above results, our proposed method outperforms the existing state-of-the-art methods on the ECSSD dataset. (2) Evaluation on the DUT-OMRON dataset: DUT-OMRON has complex backgrounds with pixel-wise ground truth, so we select it to evaluate the performance of our method
Fig. 12. Precision-recall curves and F-measure comparing with different state-of-the-art methods on MSRA10K dataset.
with 14 state-of-the-art methods. From Fig. 11, it can be seen that the precision-recall curve of our proposed method lies above almost all the curves of the candidates, and the F-measure bar outperforms the other methods. From Table 1, our method achieves the best AUC and the third-best MAE. Combining these facts, our method performs well on the DUT-OMRON dataset. (3) Evaluation on the MSRA10K dataset: the MSRA10K dataset contains 10,000 images, each with an unambiguous salient object, so we select it to evaluate the performance of our method against the 14 methods. From Fig. 12, the precision-recall curve of our method covers most of the candidate methods, and the F-measure bar is lower than RCRR and higher than the other methods. From Table 1, our method achieves the best AUC and the third-best MAE. Combining all these facts, our method achieves a better performance on the MSRA10K dataset. To further illustrate the performance of our method, Fig. 13 shows a visual comparison of various saliency detection methods on the three public datasets, from which we can intuitively see that our method performs better on the selected images as a whole. IT [48] is based on a biologically inspired structure, and its saliency maps have low resolution, so it cannot detect objects in images well. FT [50] can highlight the inner regions of the salient object, but its maps contain much background noise. CA [53] only highlights the contour and cannot highlight the inner regions of the salient object. HC [55] uses the color difference between salient and background regions to compute saliency, so it performs well when the salient object has an obvious color difference from the background; however, if the inner regions of the salient object contain different colors, it may fail to highlight the entire object. SF [56] is better than HC [55] at suppressing background noise, but it cannot highlight an entire salient object that contains regions of different colors. GMR [19] uses superpixels as processing units and performs well via graph-based manifold ranking when salient objects are located at the center of an image; if the salient object touches the image boundary, it may fail to detect it and its maps contain much background noise. LPS [59] and RCRR [60] suppress background noise well, but they cannot highlight an entire salient object containing different colors.
Table 2
Quantitative comparisons of AUC and MAE scores with different methods on the HOME-VISION dataset.

Method   LPS     GMR     RCRR    Our
AUC      0.890   0.902   0.906   0.924
MAE      0.179   0.173   0.171   0.175
Compared with the above methods, our proposed method generates saliency maps that are visually similar to the ground truth. It shows high robustness in various cases, even with complex background regions; when salient objects touch the image boundary, our method still highlights the entire object and suppresses background noise well. As a whole, our proposed method is better than the other fourteen methods on these datasets.

4.4.2. Evaluation on the HOME-VISION dataset

According to the above experimental results on the ECSSD, DUT-OMRON and MSRA10K datasets, RCRR, LPS and GMR perform better than the other methods, so we compare our proposed method with these three classic methods on the HOME-VISION dataset to evaluate salient object detection in home scenes. Fig. 14 shows the precision-recall curves and F-measures on the HOME-VISION dataset. Our method gains a better precision-recall curve and higher precision, recall and F-measure than RCRR, LPS and GMR. Table 2 shows that our saliency detection results have a higher AUC than RCRR, LPS and GMR. In terms of MAE, our method performs worse than GMR and RCRR but better than LPS. Fig. 15 shows further comparison results with GMR, LPS and RCRR using visual saliency maps. It can be noted that our proposed method locates almost entire salient objects and highlights object outlines better than the other three methods. From Fig. 13, we can see that most of the outdoor images have relatively simple backgrounds, with earth, sky and plants as the background; however, Fig. 15 shows that the background in a home scene is relatively complex, and other objects may serve as the background. An image taken from a home scene often contains multiple salient objects, and there are even
Fig. 13. Some saliency maps of the compared methods. (a) input image, (b) IT, (c) FT, (d) CA, (e) HC, (f) SF, (g) GMR, (h) LPS, (i) RCRR, (j) Our, (k) ground truth.
Fig. 14. Precision-recall curves and F-measure comparing with different state-of-the-art methods on HOME-VISION dataset.
objects with small volume or low contrast to the background, which makes salient object detection difficult. When there exist multiple salient or low-contrast targets in an image, current salient object detection methods may lose targets or detect them incompletely. Our proposed method is superior to GMR, RCRR and LPS in saliency detection of multiple objects or low-contrast objects. The reason is attributed to multi-scale corner detection and corners clustering, which are designed to effectively reflect the approximate position of salient objects in an image, rather than assuming that the salient objects are at the image center. These experimental
results verify that, although other methods can be applied to saliency detection of home scenes, our proposed method produces more reliable and promising results.

4.5. Running time

Table 3 shows the average running time of some state-of-the-art models over 400 images of the HOME-VISION dataset on a PC with an Intel Core i7-8700 at 3.2 GHz and 8 GB RAM. We implemented our proposed method in Python, and the code
Fig. 15. Visual comparison of our method with GMR, LPS and RCRR. (a) input image, (b) GMR, (c) LPS, (d) RCRR, (e) Our, (f) ground truth.
Fig. 16. Failure cases. (a) input image, (b) background-based saliency map, (c) center prior saliency map, (d) foreground-based saliency map, (e) integrated saliency map, (f) ground truth.
Table 3
Average running time of different methods.

Method    CA      PCA     GMR     LPS     RCRR    Our
Code      Matlab  Matlab  Matlab  Matlab  Matlab  Python
Time (s)  36.05   3.03    0.94    2.45    0.95    1.03
for the other models comes directly from the code published by their respective authors; here, we list only some saliency detection methods with good performance. Our method is faster than CA, PCA and LPS, but slower than GMR and RCRR. Our proposed model spends 0.8% of its time on selecting the boundary seeds, 11.3% on corner filtering and clustering, 15.3% on the background-based saliency map, 20.2% on the foreground-based saliency map, and 36.4% on the center prior saliency map.

4.6. Failure cases

This paper presents a novel framework for saliency detection of home scene images by exploiting manifold ranking, boundary expansion and corners clustering, and it is highly effective for most salient object detection tasks in home scenes. However, when a non-salient object occupies a large area of the image, as shown in the first row of Fig. 16, or when salient objects have appearances extremely similar to the background, as shown in the second row of Fig. 16, the saliency detection method may fail.
5. Conclusion and future work

In this paper, we propose a salient object detection method that accurately locates salient objects in home scenes through manifold ranking, boundary expansion and corners clustering. We first extract the image boundary and take the seeds on each image boundary as queries to gain four saliency maps, which are integrated into a background-based saliency map. Then, boundary expansion combined with the background-based saliency map highlights foreground regions, which are used as queries for a foreground-based map. Thirdly, we use a multi-scale Harris corner detector to detect image corners; the corners in the foreground regions are preserved and clustered for a center prior map. We generate the final saliency map by integrating the background-based saliency map, the foreground-based saliency map and the center prior map. Experimental results demonstrate that our proposed method achieves better performance than other state-of-the-art methods in terms of several evaluation metrics on three public datasets as well as the HOME-VISION dataset. Although our proposed method is superior to most existing algorithms, many research questions remain for our future work, such as how to effectively detect complete salient objects in high-resolution and complex home scene images, and how
to add depth or high-level information of objects to saliency detection. Due to the rise of deep learning, high-level information can be added to the network to effectively extract deep convolution features and improve saliency detection accuracy. Therefore, our future work will focus on deep neural network learning methods based on high-order information of objects to further improve the accuracy of detection of salient objects in home scenes. Declaration of Competing Interest There are no conflicts of interest. Acknowledgment This work is supported by the National Key R&D Program of China (2018YFB1307101) and the National Natural Science Foundation of China (U1813215). References [1] Z. Chen, H. Wang, L. Zhang, et al., Visual saliency detection based on homology similarity and an experimental evaluation, J. Vis. Commun. Image Represent. 40 (2016) 251–264. [2] E. Rahtu, J. Kannala, M. M. Salo, et al., Segmenting salient objects from images and videos, in: Proceedings of the 11th European Conference on Computer Vision, Part V, Springer, Berlin Heidelberg, 2010. [3] T. Chen, M.M. Cheng, P. Tan, et al., Sketch2photo: internet image montage, ACM transactions on graphics (TOG) 28 (5) (2009) 124. [4] Y. Fang, Z. Chen, W. Lin, et al., Saliency detection in the compressed domain for adaptive image retargeting, IEEE Trans. Image Process. 21 (9) (2012) 3888–3901. [5] C. Siagian, L. Itti, Rapid biologically-inspired scene classification using features shared with visual attention, IEEE Trans. Pattern Anal. Mach. Intell. 29 (2) (20 07) 30 0–312. [6] Z. Ren, S. Gao, L.T. Chia, et al., Region-based saliency detection and its application in object recognition, IEEE Trans. Circuits Syst. Video Technol. 24 (5) (2014) 769–779. [7] S. Goferman, L. Zelnik-Manor, A. Tal, Context-aware saliency detection, IEEE Trans. Pattern Anal. Mach. Intell. 34 (10) (2012) 1915–1926. [8] J. Shen, Y. Du, W. Wang, et al., Lazy random walks for superpixel segmentation, IEEE Trans. Image Process. 23 (4) (2014) 1451–1462. [9] J. Shen, X. Hao, Z. Liang, et al., Real-time superpixel segmentation by DBSCAN clustering algorithm, IEEE Trans. Image Process. 25 (12) (2016) 5933–5942. [10] R. Achanta, F. Estrada, F. Wils, et al., Salient region detection and segmentation, in: Proceedings of the International Conference on Computer Vision Systems, Springer, Berlin, Heidelberg, 2008, pp. 66–75. [11] Y.F. Ma, H. J. Zhang, Contrast-based image attention analysis by using fuzzy growing, in: Proceedings of the 11th ACM International Conference on Multimedia, 2003, pp. 374–381. [12] J. Shen, et al., Higher order energies for image segmentation, IEEE Trans. Image Process. 26 (10) (2017) 4911–4922. [13] S. Goferman, L. Zelnik-Manor, A. Tal, Context-aware saliency detection, IEEE Trans. Pattern Anal. Mach. Intell. 34 (10) (2012) 1915–1926. [14] A. Borji, M.M. Cheng, H. Jiang, et al., Salient object detection: a benchmark, IEEE Trans. Image Process. 24 (12) (2015) 5706–5722. [15] M.M. Cheng, N.J. Mitra, X. Huang, et al., Global contrast based salient region detection, IEEE Trans. Pattern Anal. Mach. Intell. 37 (3) (2015) 569–582. [16] Y. Zhai, M. Shah, Visual attention detection in video sequences using spatiotemporal cues, in: Proceedings of the 14th ACM international conference on Multimedia, ACM, 2006, pp. 815–824. [17] A. Hornung, Y. Pritch, P. Krahenbuhl, et al., Saliency filters: contrast based filtering for salient region detection, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 
[18] C. Li, Y. Yuan, W. Cai, et al., Robust saliency detection via regularized random walks ranking, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 2710–2717.
[19] C. Yang, L. Zhang, H. Lu, et al., Saliency detection via graph-based manifold ranking, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2013, pp. 3166–3173.
[20] W. Wang, J. Shen, L. Shao, F. Porikli, Correspondence driven saliency transfer, IEEE Trans. Image Process. 25 (11) (2016) 5025–5034.
[21] W. Zhu, S. Liang, Y. Wei, et al., Saliency optimization from robust background detection, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2014, pp. 2814–2821.
[22] Y. Qin, H. Lu, Y. Xu, et al., Saliency detection via cellular automata, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 110–119.
[23] W. Wang, J. Shen, R. Yang, et al., Saliency-aware video object segmentation, IEEE Trans. Pattern Anal. Mach. Intell. 40 (1) (2017) 20–33.
[24] F. Guo, W. Wang, J. Shen, et al., Video saliency detection using object proposals, IEEE Trans. Cybern. 48 (11) (2017) 3159–3170.
[25] W. Wang, J. Shen, Y. Yu, et al., Stereoscopic thumbnail creation via efficient stereo saliency detection, IEEE Trans. Vis. Comput. Graph. 23 (8) (2016) 2014–2027.
[26] W. Wang, J. Shen, L. Shao, Consistent video saliency using local gradient flow optimization and global refinement, IEEE Trans. Image Process. 24 (11) (2015) 4185–4196.
[27] X. Dong, J. Shen, D. Wu, et al., Quadruplet network with one-shot learning for fast visual object tracking, IEEE Trans. Image Process. 28 (7) (2019) 3516–3527.
[28] W. Wang, J. Shen, H. Ling, A deep network solution for attention and aesthetics aware photo cropping, IEEE Trans. Pattern Anal. Mach. Intell. 41 (7) (2018) 1531–1544.
[29] X. Dong, J. Shen, Triplet loss in siamese network for object tracking, in: Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 459–474.
[30] J. Han, D. Zhang, X. Hu, et al., Background prior-based salient object detection via deep reconstruction residual, IEEE Trans. Circuits Syst. Video Technol. 25 (8) (2015) 1309–1321.
[31] X. Li, F. Yang, H. Cheng, et al., Contour knowledge transfer for salient object detection, in: Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 355–370.
[32] X. Li, F. Yang, H. Cheng, et al., Multi-scale cascade network for salient object detection, in: Proceedings of the 25th ACM International Conference on Multimedia, ACM, 2017, pp. 439–447.
[33] D. Zhang, J. Han, Y. Zhang, Supervision by fusion: towards unsupervised learning of deep salient object detector, in: Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 4048–4056.
[34] W. Wang, J. Shen, Deep visual attention prediction, IEEE Trans. Image Process. 27 (5) (2017) 2368–2378.
[35] G. Li, Y. Yu, Visual saliency based on multiscale deep features, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 5455–5463.
[36] G. Lee, Y.W. Tai, J. Kim, Deep saliency with encoded low level distance map and high level features, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 660–668.
[37] Y. Wei, X. Liang, Y. Chen, et al., STC: a simple to complex framework for weakly-supervised semantic segmentation, IEEE Trans. Pattern Anal. Mach. Intell. 39 (11) (2017) 2314–2320.
[38] W. Wang, J. Shen, L. Shao, Video salient object detection via fully convolutional networks, IEEE Trans. Image Process. 27 (1) (2017) 38–49.
[39] D. Zhou, J. Weston, A. Gretton, et al., Ranking on data manifolds, in: Proceedings of the Advances in Neural Information Processing Systems, 2004, pp. 169–176.
[40] D. Zhou, et al., Learning with local and global consistency, in: Proceedings of the Advances in Neural Information Processing Systems, 2004, p. 16.
[41] L. Yi-bo, L. Jun-Jun, Harris corner detection algorithm based on improved contourlet transform, Procedia Eng. 15 (2011) 2239–2243.
[42] L. Cai, Y. Liao, D. Guo, Study on image stitching methods and its key technologies, Comput. Technol. Dev. 18 (3) (2008) 1–4.
[43] C.G. Harris, M. Stephens, A combined corner and edge detector, in: Proceedings of the Alvey Vision Conference, 1988, pp. 147–151.
[44] X. Zhang, B. Li, D. Yang, A novel Harris multi-scale corner detection algorithm, J. Electron. Inf. Technol. 29 (7) (2007) 1735–1738.
[45] Q. Yan, L. Xu, J. Shi, et al., Hierarchical saliency detection, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2013, pp. 1155–1162.
[46] A. Borji, M.M. Cheng, Q. Hou, et al., Salient object detection: a survey, arXiv preprint arXiv:1411.5878, 2014.
[47] B.C. Russell, A. Torralba, K.P. Murphy, et al., LabelMe: a database and web-based tool for image annotation, Int. J. Comput. Vis. 77 (1–3) (2008) 157–173.
[48] L. Itti, C. Koch, E. Niebur, A model of saliency-based visual attention for rapid scene analysis, IEEE Trans. Pattern Anal. Mach. Intell. 20 (11) (1998) 1254–1259.
[49] X. Hou, L. Zhang, Saliency detection: a spectral residual approach, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, IEEE, 2007, pp. 1–8.
[50] R. Achanta, S. Hemami, F. Estrada, et al., Frequency-tuned salient region detection, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2009, pp. 1597–1604.
[51] E. Rahtu, J. Kannala, M. Salo, et al., Segmenting salient objects from images and videos, in: Proceedings of the European Conference on Computer Vision, Springer, Berlin, Heidelberg, 2010, pp. 366–379.
[52] R. Achanta, S. Süsstrunk, Saliency detection using maximum symmetric surround, in: Proceedings of the IEEE International Conference on Image Processing, IEEE, 2010, pp. 2653–2656.
[53] S. Goferman, L. Zelnik-Manor, A. Tal, Context-aware saliency detection, IEEE Trans. Pattern Anal. Mach. Intell. 34 (10) (2012) 1915–1926.
[54] M.M. Cheng, G.X. Zhang, N.J. Mitra, et al., Global contrast based salient region detection, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2011.
[55] M.M. Cheng, N.J. Mitra, X. Huang, et al., Global contrast based salient region detection, IEEE Trans. Pattern Anal. Mach. Intell. 37 (3) (2015) 569–582.
[56] A. Hornung, Y. Pritch, P. Krahenbuhl, Saliency filters: contrast based filtering for salient region detection, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2012, pp. 733–740.
[57] R. Margolin, A. Tal, L. Zelnik-Manor, What makes a patch distinct? in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2013, pp. 1139–1146.
[58] Y. Wei, F. Wen, W. Zhu, et al., Geodesic saliency using background priors, in: Proceedings of the European Conference on Computer Vision, 2012, pp. 29–42.
[59] H. Li, H. Lu, Z. Lin, et al., Inner and inter label propagation: salient object detection in the wild, IEEE Trans. Image Process. 24 (10) (2015) 3176–3186.
[60] Y. Yuan, C. Li, J. Kim, et al., Reversion correction and regularized random walk ranking for saliency detection, IEEE Trans. Image Process. 27 (3) (2018) 1311–1322.

Zhongli Wang received the B.S. degree in automation from Qufu Normal University, Rizhao, China, in 2013, and the M.S. degree in control science and engineering from Taiyuan University of Technology, Taiyuan, China, in 2018. He is currently pursuing the Ph.D. degree in control theory and control engineering at Shandong University, Jinan, China. His current research interests include object recognition and deep learning, saliency detection, and robot task planning.
Guohui Tian was born in Hejian, Hebei, China, in 1969. He received the B.S. degree from the Department of Mathematics, Shandong University, Jinan, China, in 1990, the M.S. degree from the Department of Automation, Shandong University of Technology, in 1993, and the Ph.D. degree from the School of Automation, Northeastern University, Shenyang, China, in 1997. From 1999 to 2001, he was a Post-Doctoral Researcher with the School of Mechanical Engineering, Shandong University. From 2003 to 2005, he was a Visiting Professor with the Graduate School of Engineering, Tokyo University, Tokyo, Japan. He was a Lecturer from 1997 to 1998 and an Associate Professor from 1998 to 2002 with Shandong University, where he is currently a Professor with the School of Control Science and Engineering. His current research interests include service robots, intelligent space, cloud robotics, and brain-inspired intelligent robotics. Dr. Tian is a member of the IEEE Robotics and Automation Society. He is the Vice Director of the Intelligent Robot Specialized Committee of the Chinese Association for Artificial Intelligence and of the Intelligent Manufacturing System Specialized Committee of the Chinese Association for Automation.