Salient Object Detection via Two-Stage Absorbing Markov Chain Based on Background and Foreground

Wei Tang1,2,*, Zhijian Wang1, Jiyou Zhai3, Zhangjing Yang2

1 College of Computer and Information, Hohai University, Nanjing, 211100, P. R. China [e-mail: [email protected]]
2 School of Information Engineering, Nanjing Audit University, Nanjing, 211815, P. R. China
3 School of Computer Engineering, Nanjing Institute of Technology, Nanjing, 211167, P. R. China
*Corresponding author: Wei Tang

Abstract

This paper proposes a saliency detection method via a two-stage absorbing Markov chain based on background and foreground for detecting salient objects in images. Firstly, image preprocessing is performed, followed by convex hull construction and superpixel segmentation, to prepare for subsequent processing. Secondly, according to the boundary connectivity, the superpixels with lower background probability in the candidate boundary background set $B_0$ are deleted, and the boundary background set $B_1$ is obtained. Using the saliency values of the nodes in the boundary-prior saliency map $S_{bg1}$, background seeds are added appropriately in the region outside the candidate boundary background set $B_0$ and the convex hull $H$, and the background seed set $B$ is obtained after the update. Then, the background-absorbing Markov chain is constructed to generate the background-absorbing saliency map $S_{bg2}$. By fusing the saliency maps $S_{bg1}$ and $S_{bg2}$, the first-stage background-based saliency map $S_{bg}$ is obtained. Thirdly, within the range of the convex hull $H$, the foreground seed set $F$ is determined according to the saliency map $S_{bg}$. Then, the foreground-absorbing Markov chain is constructed to obtain the second-stage foreground-absorbing saliency map $S_{fg}$. Finally, the saliency maps $S_{bg}$ and $S_{fg}$ of the two stages are combined to obtain a fused saliency map $S$, and the final saliency map $S^*$ is obtained after optimization through a smoothing mechanism. The proposed method is tested on three public image datasets, where it detects salient objects with high accuracy and significantly outperforms the traditional methods compared against.

Keywords: Saliency detection, Markov chain, background absorbing, foreground absorbing

1. Introduction

The human visual system has a selective attention mechanism that allows only a portion of the regions of interest (ROI) in the visual scene to enter the brain for analysis, reducing the complexity of the scene and environment. Image saliency detection uses a computer to simulate the human visual system: it calculates the saliency of each part of an image (the degree to which it attracts visual attention) and then extracts the most salient (most attractive) region. Image saliency detection reduces the complexity of understanding image content by computer, perceives its main content, reduces computational cost, and improves image processing efficiency.

The research on image saliency detection began with Koch and Ullman [1], who proposed the concept of the saliency map. Depending on the mathematical model used, image saliency detection methods can be divided into saliency detection based on feature integration theory [1-4], saliency detection based on information theory [5], graph-based saliency detection [6-7], saliency detection based on decision theory [8-9], saliency detection based on Bayesian theory [10-11], saliency detection based on frequency-domain analysis [12-13], and saliency detection based on machine learning [14-19], etc. Depending on the purpose of detection and the function of the mathematical model used, research on image saliency detection can be divided into eye fixation prediction [20] and salient object detection [21].

In 1998, Itti proposed the first landmark model for saliency computation, the IT method [3], which belongs to the eye fixation prediction models. The IT method extracts multi-scale visual features, such as color, intensity, and orientation; it constructs feature maps using center-surround differences and then fuses the multi-feature maps to highlight the salient region. Before 2008, many saliency detection methods followed the eye fixation prediction model. The purpose of eye fixation prediction is to predict the points that the eyes fixate on, that is, the image positions that human eyes focus on. However, such results only indicate the location of salient objects in an image and do not yield a clear boundary of the salient objects. After 2008, the focus of image saliency detection turned to salient object detection, whose purpose is to highlight the salient region in an image; it emphasizes the integrity and uniformity of salient objects in a saliency map and uses manually annotated ground truth (GT) maps with pixel-level accuracy. The research of this paper also focuses on salient object detection. In recent years, a large number of salient object detection models have been proposed and widely used in image segmentation [22], object detection and recognition [23], image compression [24], image retrieval [25], and image classification [26], etc. Salient object detection has become one of the important research directions in the field of computer vision.

Salient object detection algorithms include bottom-up models [27] and top-down models [28]. A bottom-up model is driven directly by low-level visual stimuli; it does not involve high-level image information and is a fast, unconscious, data-driven visual attention mechanism. A top-down model is a slow, conscious, task-driven visual attention mechanism that typically requires learning various characteristics of an image and using the learned information to detect objects. When researching bottom-up models, it is generally assumed that the observer is in a state of free viewing, that is, the observer has no specific purpose. The main focus of this paper is the bottom-up salient object detection model.

Graph-based salient object detection has become one of the commonly used strategies in the field of saliency detection. In 2006, Harel et al. proposed the graph-based visual saliency (GBVS) algorithm for region extraction [6], applying the random walk theory of graphs to visual saliency detection for the first time. In graph-based methods, a graph model is constructed based on graph theory, and an image is segmented into multiple regions; each region corresponds to one node in the graph, and the edges between nodes are also defined. Starting from the low-level cues of the image, prior knowledge can be used to mark some nodes of the image as seed nodes, and a propagation model then spreads the saliency of the seed nodes. After propagation and diffusion, each node in the graph is assigned a corresponding saliency value.
Common prior forms include the boundary prior [29], center prior [30], convex hull prior [31], shape prior [32], and color prior [33], etc. Common propagation models include the manifold ranking model [34], cellular automata model [35], label propagation model [36], and Markov model [37], etc.

In Reference [34], Yang et al. proposed an image saliency detection method via graph-based manifold ranking (MR). The MR algorithm first constructs a k-regular graph for the superpixels in the graph model and then performs two-stage saliency detection. In the first stage, the nodes in the

top, bottom, left, and right boundaries are used as background queries. The remaining nodes are ranked according to their relevance to these queries, four saliency maps are obtained from the ranking results, and they are then fused to obtain the initial saliency map. In the second stage, threshold segmentation is performed on the initial saliency map to select foreground queries, and manifold ranking is applied again for enhancement to obtain the final saliency map.

In Reference [35], Qin et al. proposed the background-based maps optimized by single-layer cellular automata (BSCA) algorithm. The BSCA algorithm treats each superpixel as a cell. Firstly, it calculates the contrast between each cell and the boundary seeds according to color features and spatial distance, and establishes the background-based saliency map. Then, it propagates the obtained background-based saliency map as the initial saliency map. Finally, each cell automatically evolves into a more accurate and steady state.

In Reference [36], Li et al. proposed a label propagation (LP) algorithm. Beyond the traditional use of boundary superpixels as background labels for inner propagation, it further extracts the foreground labels of the object region for complex scenes. The LP algorithm propagates the effective information according to the similarity between superpixels, and the final saliency map is the fusion result of the inter propagation via the combination of background and foreground labels. The use of inner propagation or inter propagation is determined by a compactness criterion.

In Reference [37], Jiang et al. proposed an image saliency detection method based on the absorbing Markov chain (MC). This method uses the boundary nodes as the background seed set and copies them as virtual absorbing nodes, while all the nodes in the image are taken as transient nodes. Starting a random walk from any transient node, the saliency of each transient node is measured by its absorbed time to the absorbing nodes. The MC method takes the acquired superpixels as transient nodes in the Markov chain and copies the superpixels of the four boundaries into virtual absorbing nodes, thereby calculating the absorbed time of transient nodes.

The disadvantages of the MC method are as follows. Firstly, the selection of boundary background seeds is not accurate, because salient objects may touch one or two boundaries of some images. Secondly, the boundary background seeds cover only a part of the sample space of background seeds, which affects the efficiency of propagation to a certain extent. Thirdly, during propagation, for some special images, some background areas may not be well suppressed, and the saliency values of individual singular nodes are very large, which fails to effectively highlight the salient region. Fourthly, only the absorption and propagation of the background seeds are considered, that is, only a single propagation mode is used. Fifthly, the foreground and background in the saliency map are not uniform enough, which requires further optimization.

In view of the above analysis, this paper proposes a two-stage absorbing Markov chain based on background and foreground for detecting salient objects in images. The framework of the proposed method is shown in Fig. 1. The proposed method can be divided into four steps: the preprocessing step, the first-stage processing step, the second-stage processing step, and the fusion and smoothing step.
Firstly, image preprocessing is performed to complete convex hull construction and superpixel segmentation, preparing for subsequent processing. Secondly, according to the boundary connectivity, the superpixels with lower background probability in the candidate boundary background set are deleted, and the boundary seeds are obtained. Using the saliency values of the nodes in the boundary-prior saliency map, background seeds are added appropriately in the region outside the candidate boundary background set and the convex hull. Then, the background-absorbing Markov chain is constructed to generate the background-absorbing saliency map. When fusing the boundary-prior saliency map and the background-absorbing saliency map to obtain the first-stage background-based saliency map, the background-absorbing saliency map takes the larger weight. Thirdly, within the range of the convex hull, the foreground seeds are selected according to the first-stage saliency map. Then, the foreground-absorbing Markov chain is constructed to obtain the second-stage foreground-absorbing saliency map. Finally, the saliency maps of the two stages are fused, and the final saliency map is obtained after optimization through a smoothing mechanism. Compared with the traditional methods, the performance of the proposed method is significantly improved; tested on three public image datasets, it detects salient objects with high accuracy.

The main contributions of our work are summarized as follows:
(1) Find the similar region by superpixel similarity instead of just by the weight of boundaries, thereby optimizing the boundary connectivity algorithm; calculate the probability that each superpixel on the boundary belongs to the background, and remove superpixels with lower background probability from the candidate boundary background set to get an accurate boundary background set.
(2) Based on spatially weighted color contrast, in the region outside the boundary and the convex hull, add some background nodes appropriately as background seeds and combine them with the boundary background set to obtain the background seed set, improving the algorithm's efficiency.
(3) In order to highlight the salient objects, fuse the boundary-prior saliency map with the background-absorbing saliency map to obtain the first-stage background-based saliency map.
(4) According to the adjacency matrix W of the original graph model G, improve the connections between the nodes in the associated matrix of the absorbing Markov chain.
(5) Based on the first-stage saliency map, select the foreground seeds within the range of the convex hull.
(6) Make full use of the complementarity between background-based and foreground-based detection methods, perform effective propagation for each, and reasonably fuse and smooth the propagation results.

[Fig. 1: input image → pre-processing (superpixels, convex hull) → Stage 1: background-based saliency map (boundary seeds → result of boundary prior; background seeds → Markov graph construction → result of background absorbing → background-based result) → Stage 2: foreground-absorbing saliency map (foreground seeds → Markov graph construction → result of foreground absorbing) → fusing and smoothing → final saliency map]

Fig. 1. The framework of our proposed method

The remainder of the paper is organized as follows. Section 2 reviews work related to our approach. Section 3 presents the framework of our saliency detection method in detail. Section 4 reports our experimental results on three public image datasets and compares them with other state-of-the-art saliency detection methods. The final section concludes the paper by summarizing our findings.

2. Related Work

Given a set of states $S = \{s_1, s_2, \ldots, s_n\}$, a Markov chain is represented by an $n \times n$ transition matrix $P$, where $P_{ij}$ is the transition probability of moving from state $s_i$ to state $s_j$. A state $s_i$ with transition probability $P_{ii} = 1$ is called an absorbing state, which means $P_{ij} = 0$ for all $j \neq i$. If a Markov chain contains at least one absorbing state, and, starting from any transient node, a random walker can reach some absorbing node with positive probability in a finite number of steps, then this Markov chain is called an absorbing chain.

In the MC method [37], the SLIC algorithm is first used to divide the input image into $k$ superpixels. In general, the superpixels on the four boundaries of an image do not contain salient objects, so they can be used as the nodes to be copied, and the $m$ copied nodes serve as virtual absorbing nodes in the absorbing chain. The $n$ nodes ($n = k + m$) form a single-layer graph model $G = \langle V, E \rangle$, where $V_i \in V$ represents a superpixel node in the graph $G$ and $E$ is the set of edges. The nodes in graph $G$ are renumbered so that the first $k$ nodes are transient nodes and the last $m$ nodes are absorbing nodes. The associated matrix $A = (a_{ij})_{n \times n}$ of the nodes in the graph $G$ is defined by:

$$a_{ij} = \begin{cases} w'_{ij} & j \in N(i),\ 1 \le i \le k \\ 1 & i = j \\ 0 & \text{otherwise} \end{cases} \tag{1}$$

where $N(i)$ represents the set of nodes connected to node $V_i$. In the graph model $G$, each node (transient or absorbing) is connected to the transient nodes that neighbor it or share common boundaries with its neighboring nodes. In addition, all the transient nodes on the image boundary are connected to each other. The edge weight $w'_{ij}$ between the nodes $V_i$ and $V_j$ is defined as:

$$w'_{ij} = e^{-\frac{\|X_i - X_j\|}{\sigma^2}} \tag{2}$$

where $X_i$ and $X_j$ are the feature vectors of mean CIELAB color of superpixels $V_i$ and $V_j$, respectively, and $\sigma$ is a constant. Based on the associated matrix $A$, the degree matrix $D = \mathrm{diag}(\sum_j a_{ij})$ is obtained; each element on the diagonal of $D$ is the sum of the weights of the edges connected to the corresponding node. The transition matrix of the graph $G$ is therefore defined as:

$$P = D^{-1}A \tag{3}$$

The transition matrix $P$ of the absorbing chain can be written in the standard form:

$$P = \begin{pmatrix} Q & R \\ 0 & I \end{pmatrix} \tag{4}$$

where the sub-matrix $Q = (q_{ij})_{k \times k}$ contains the transition probabilities between any two transient nodes; the sub-matrix $R = (r_{ij})_{k \times m}$ contains the probabilities of moving from any transient node to any absorbing node; $0$ is the $m \times k$ zero sub-matrix; and $I$ is the $m \times m$ identity sub-matrix, meaning that a random walker cannot move between any pair of absorbing nodes and that the transition probability of moving from any absorbing node to itself is 1. The fundamental matrix of an absorbing Markov chain is $N = (n_{ij})_{k \times k}$, where $n_{ij}$ denotes the average number of times that a random walker starting from the transient node $V_i$ passes through the transient node $V_j$ before arriving at some absorbing node:

$$N = \sum_{s=0}^{\infty} Q^s = (I - Q)^{-1} \tag{5}$$

A random walker starting from the transient node $V_i$ finally reaches some absorbing node; the total number of times it passes through all the transient nodes in this process is called the absorbed time of $V_i$, with value $\sum_j n_{ij}$. For a $k$-dimensional column vector $c = [1, 1, \ldots, 1]^T$, the absorbed time is calculated by:

$$y = Nc \tag{6}$$

where $y = [y(1), y(2), \ldots, y(k)]^T$ is a $k$-dimensional column vector recording the absorbed time of each transient node. By normalizing the absorbed time vector $y$, a saliency map $S$ is obtained:

$$S(i) = \bar{y}(i),\quad i \in [1, k] \tag{7}$$

where $i$ denotes the index of a transient node in the image and $\bar{y}$ denotes the normalized absorbed time vector.
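To make the mechanics of Eqs. (3)-(7) concrete, the following NumPy sketch computes the absorbed-time saliency. It assumes the associated matrix $A$ of Eq. (1) has already been built with the $k$ transient nodes numbered first; the function name and the min-max normalization are illustrative choices, not taken from a released implementation.

```python
import numpy as np

def absorbed_time_saliency(A, k):
    """A: (n, n) associated matrix with the k transient nodes first."""
    P = A / A.sum(axis=1, keepdims=True)        # Eq. (3): P = D^{-1} A
    Q = P[:k, :k]                               # transient-to-transient block of Eq. (4)
    N = np.linalg.inv(np.eye(k) - Q)            # fundamental matrix, Eq. (5)
    y = N @ np.ones(k)                          # absorbed time, Eq. (6)
    return (y - y.min()) / (y.max() - y.min() + 1e-12)   # Eq. (7), normalized
```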

3. The proposed algorithm

In order to obtain a more accurate and robust saliency map, this paper proposes a two-stage Markov chain method. This section details the four steps of the proposed method, which is summarized as Algorithm 1.

3.1 Image preprocessing

3.1.1 Convex hull construction
Using the color-enhanced Harris operator [38] and color features, Harris corner detection is performed on the original image $I$, and the minimum convex hull $H$ is constructed from the detected salient feature points. The convex hull $H$ roughly determines the range of the salient region, and the region outside the convex hull is regarded as the background region. The salient objects are contained in the convex hull, but the convex hull still contains many background areas.

3.1.2 Superpixel segmentation
The simple linear iterative clustering (SLIC) algorithm [39] is used to segment the input image $I$ into $k$ superpixels, which serve as the basic elements of graph-based saliency detection.

3.1.3 Graph model construction
An undirected graph $G = \langle V, E \rangle$ is constructed for the superpixel-segmented image $I$, where $V_i \in V$ represents a superpixel node in the graph $G$ and $k$ is the number of superpixel nodes. The similarity between the superpixel nodes $V_i$ and $V_j$ is defined as:

$$sim_{ij} = e^{-\frac{d_{color}(V_i, V_j)}{\sigma^2}} \tag{8}$$

In graph $G$, $sim_{ij}$ can be applied to any two nodes. The closer the value of $sim_{ij}$ is to 1, the more similar the nodes are; even if $V_j$ is not a neighbor of $V_i$, the two nodes may be similar. In Eq. (8), $\sigma^2$ is a balance parameter used to control the strength of the weight, and $\sigma^2 = 0.1$ in this paper; $d_{color}(V_i, V_j)$ is the Euclidean distance between the superpixels $V_i$ and $V_j$ in the CIELAB color space:

$$d_{color}(V_i, V_j) = \|c_i - c_j\| = \sqrt{(l_i - l_j)^2 + (a_i - a_j)^2 + (b_i - b_j)^2} \tag{9}$$

where $c_i$ and $c_j$ are the feature vectors of mean CIELAB color of superpixels $V_i$ and $V_j$, respectively.
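The preprocessing of Sections 3.1.2-3.1.3 can be sketched as follows. `slic()` and `rgb2lab()` are real scikit-image functions; the helper name, the choice of $k$, and the dense pairwise distance computation are illustrative assumptions.

```python
import numpy as np
from skimage.color import rgb2lab
from skimage.segmentation import slic

def preprocess(image_rgb, k=250, sigma2=0.1):
    labels = slic(image_rgb, n_segments=k)              # SLIC superpixels [39]
    lab = rgb2lab(image_rgb)
    ids = np.unique(labels)
    # mean CIELAB color per superpixel (the feature vector c_i of Eq. (9))
    colors = np.array([lab[labels == i].mean(axis=0) for i in ids])
    # superpixel center coordinates, used later for the spatial distance d_dis
    ys, xs = np.mgrid[:labels.shape[0], :labels.shape[1]]
    centers = np.array([[xs[labels == i].mean(), ys[labels == i].mean()] for i in ids])
    d = np.linalg.norm(colors[:, None, :] - colors[None, :, :], axis=2)  # Eq. (9)
    sim = np.exp(-d / sigma2)                           # similarity, Eq. (8)
    return labels, colors, centers, sim
```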

3.2 The first stage: saliency map $S_{bg}$ based on background

3.2.1 Determining the boundary background set
In many cases, all four boundaries of an image are background, but in some images part of a salient object appears on one or two boundaries. If such a part is selected as background nodes, it will reduce the accuracy of saliency detection based on the boundary prior and background absorbing.

1) Definition of the candidate boundary background set $B_0$
In an image with a resolution of $w \times h$, the superpixels located in the image boundary of width $d$ are taken as the candidate boundary background set $B_0$. The abscissa of $B_0$ is $\{x \mid 0 \le x \le d \,\cup\, w - d \le x \le w\}$; the ordinate of $B_0$ is $\{y \mid 0 \le y \le d \,\cup\, h - d \le y \le h\}$. We take $d = 10$ here. Let $\{I_1, I_2, I_3, \ldots\}$ be the pixels in the superpixel $V_i$; as long as one pixel $I_k$ is in $B_0$, $V_i$ is considered to belong to the candidate boundary background set, that is, $B_0 = \{V_i \mid \exists I_k \in B_0, I_k \in V_i\}$. To filter the boundary background seeds, we need to calculate the background probability of each superpixel $V_i$ in $B_0$.

2) Finding a similar region $R$ of the boundary node $V_i$
For the superpixel $V_i$ on the image boundary, we need to find the superpixel nodes similar to $V_i$ in the entire image. The similar region $R$ is expressed as:

$$R = \{V_j \mid sim_{ij} \ge \beta_1,\ \beta_1 \in [0.7, 0.9]\} \tag{10}$$

If the similarity $sim_{ij}$ is not smaller than the adaptive threshold $\beta_1$, the superpixels $V_j$ and $V_i$ are considered similar. The superpixel $V_i$ and all its similar nodes form the similar region $R$.

3) Calculating the boundary connectivity $BCon$ of the similar region $R$
Boundary connectivity [40] quantifies the extent to which a region $R$ is connected to the image boundary. The boundary connectivity $BCon$ of the similar region $R$ of the boundary node $V_i$ is expressed as:

$$BCon = \frac{\sum_j sim_{ij} \cdot \delta(V_j \in B_0)}{\sum_j sim_{ij}},\quad V_j \in R \tag{11}$$

where $\delta(\cdot)$ is an indicator function: when the superpixel $V_j \in B_0$, the value of $\delta(V_j \in B_0)$ is 1, otherwise it is 0. The numerator $\sum_j sim_{ij} \cdot \delta(V_j \in B_0)$ is the sum of the similarities of the boundary node $V_i$ (including the similarity of $V_i$ to itself) over the intersection of the candidate boundary background set $B_0$ and the similar region $R$; the denominator $\sum_j sim_{ij}$ is the sum of the similarities of $V_i$ over the whole similar region $R$. As shown in Fig. 2, the boundary connectivity of a background region is usually large, while that of an object region is usually small.

Fig. 2. An illustrative example of boundary connectivity
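A minimal sketch of steps 2)-3), assuming `sim` is the similarity matrix of Eq. (8) (so `sim[i, i] = 1`) and `in_B0` is a boolean mask over superpixels marking membership in $B_0$; both names are illustrative.

```python
import numpy as np

def boundary_connectivity(sim, in_B0, i, beta1=0.8):
    """BCon of the similar region R of boundary node V_i, Eqs. (10)-(11)."""
    R = sim[i] >= beta1                                 # similar region R, Eq. (10)
    return sim[i][R & in_B0].sum() / sim[i][R].sum()    # Eq. (11)
```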

4) Calculating the background probability of the boundary node $V_i$
The greater the boundary connectivity of the similar region $R$, the greater the probability that $R$ belongs to the background, and hence the higher the probability that the boundary node $V_i$ belongs to the background. In the candidate boundary background set $B_0$, the background probability of the superpixel $V_i$ is calculated by:

$$P(i) = 1 - e^{-\frac{BCon^2(R)}{2\sigma_B^2}} \tag{12}$$

In the experiment, we take $\sigma_B^2 = 1$.

5) Filtering the boundary background seeds and determining the boundary background set $B_1$
In the candidate boundary background set $B_0$, the superpixels $V_i$ with a lower background probability $P(i)$ are removed to obtain the final boundary background set $B_1$:

$$B_1 = B_0 - \{V_i \mid P(i) < \beta_2,\ V_i \in B_0,\ \beta_2 \in [0.1, 0.4]\} \tag{13}$$

As shown in Fig. 2, two superpixels at the bottom of the person are contained in the image boundary. Compared with the other superpixels in the candidate boundary background set $B_0$, the background probabilities of these two superpixels are the smallest, so they are removed from $B_0$.

3.2.2 Boundary-prior saliency map $S_{bg1}$
Considering that there may be multiple objects in an image, this paper does not compute a center-based saliency map from the convex hull (i.e., taking the center of gravity $(x_0, y_0)$ of the convex hull as the center of the salient region and assigning each superpixel in the hull a saliency value by its Euclidean distance to that center). Instead, this paper calculates the boundary-prior saliency map $S_{bg1}$ based on spatially weighted color contrast. The more a superpixel differs from the boundary background set, the more likely it belongs to a salient object; if the difference is small, it more likely belongs to the background region. In addition, the contribution of the boundary background set to the superpixel saliency is also related to spatial distance: the smaller the spatial distance, the higher the contrast. Taking the boundary background set $B_1$ as a prior, the saliency value of the superpixel $V_i$ is given by:

$$S_{bg1}(i) = s_{color}(i) \cdot w_{dis}(i) \tag{14}$$

$$s_{color}(i) = 1 - \frac{1}{n_b}\sum_j e^{-\frac{d_{color}(V_i, V_j)}{\sigma_1^2}},\quad V_j \in B_1 \tag{15}$$

$$w_{dis}(i) = \frac{1}{n_b}\sum_j e^{-\frac{d_{dis}(V_i, V_j)}{\sigma_2^2}},\quad V_j \in B_1 \tag{16}$$

where $s_{color}(i)$ represents the color difference between the superpixel $V_i$ and the boundary background set $B_1$; $w_{dis}(i)$ represents the spatial difference between the superpixel $V_i$ and the boundary background set $B_1$ and acts as a distance-contrast balance factor; $n_b$ is the total number of superpixels in the boundary background set $B_1$; and $d_{dis}(V_i, V_j)$ is the Euclidean distance between the center coordinates of the superpixels $V_i$ and $V_j$. In the experiment, $\sigma_1^2 = 0.2$ and $\sigma_2^2 = 1.5$. Finally, $S_{bg1}(i)$ is normalized.

๐‘‘๐‘๐‘œ๐‘™๐‘œ๐‘Ÿ(๐‘‰๐‘–,๐‘‰๐‘—) ๐œŽ2

๐‘ค๐‘–๐‘—=๐‘’ (18) 3) Associated matrix A of the background-absorbing Markov chain G With the graph model G, we construct an absorbing Markov chain. The m nodes in the background seed set B are copied, and are regarded as virtual absorbing nodes. All nodes (k nodes) in the original image are used as transient nodes, thus obtaining a graph model of the backgroundabsorbing Markov chain G =, and graph G is a partially directed graph. Based on the graph G, the new connections between the nodes in graph G are further defined as follows. a) Transient node's own connection: Each transient node has a self-loop, that is, a transient node can connect to itself, and the edge weight is 1. b) Connections between transient nodes: The connections between transient nodes are consistent with the connections between the corresponding nodes in the undirected graph G. c) Connections between transient nodes and virtual absorbing nodes: If a transient node has a corresponding virtual absorbing node for its replication, the transient node is unidirectionally connected to its virtual absorbing node, with edge weight of 1. The other transient nodes connected to this transient node are unidirectionally connected to the virtual absorbing nodes corresponding to the transient node. Such connections are unidirectional, which means that each absorbing node is not connected to any transient node. d) Virtual absorbing nodeโ€™s own connection: each virtual absorbing node has a self-loop, and the edge weight is 1. e) Connections between virtual absorbing nodes: There is no connection between any two virtual absorbing nodes. In graph G, the total number of nodes is n=k+m. The nodes in the graph G are rearranged, so that the k transient nodes are ranked in front, and the m absorbing nodes are ranked behind. According to the rules described above, the associated matrix A = (๐‘Ž๐‘–๐‘—)๐‘› โˆ— ๐‘› of the graph G is defined. As the nodes are locally connected, so A is a sparse matrix containing a small number of non-zero elements.

$$a_{ij} = \begin{cases} w_{ij} & i \le k,\ j \le k,\ j \in N(i) \\ 1 & i \le k,\ j > k,\ j = i' \\ w_{il} & i \le k,\ j > k,\ i \in N(l),\ j = l' \\ 1 & i = j \\ 0 & \text{otherwise} \end{cases} \tag{19}$$

where $N(i)$ represents the set of indices of the nodes connected to the node $V_i$, and $i'$ is the index of the copy node of $V_i$.

4) Transition matrix $P$ of the background-absorbing Markov chain
According to Eq. (3), the transition matrix $P$ is obtained. It is a sparse matrix whose element $P_{ij} \in [0, 1]$ represents the probability of moving from node $V_i$ to node $V_j$:

$$P_{ij} = \frac{a_{ij}}{\sum_j a_{ij}} \tag{20}$$
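Eqs. (19)-(20) translate into a sparse-matrix construction such as the following sketch, where `W` is the $k \times k$ adjacency matrix of Eq. (18), `seeds` lists the indices of the $m$ copied nodes, and `neighbors[i]` gives the connection set $N(i)$; all three inputs are illustrative names.

```python
import numpy as np
import scipy.sparse as sp

def transition_matrix(W, seeds, neighbors):
    """Build A per Eq. (19) and row-normalize to P per Eq. (20)."""
    k, m = W.shape[0], len(seeds)
    A = sp.lil_matrix((k + m, k + m))
    A.setdiag(1.0)                                  # self-loops, rules a) and d)
    for i in range(k):
        for j in neighbors[i]:
            A[i, j] = W[i, j]                       # transient-transient, rule b)
    for a, l in enumerate(seeds):                   # the copy of V_l has index k + a
        A[l, k + a] = 1.0                           # node to its own copy, rule c)
        for i in neighbors[l]:
            A[i, k + a] = W[i, l]                   # neighbors of V_l to the copy, rule c)
    A = A.tocsr()
    D_inv = sp.diags(1.0 / np.asarray(A.sum(axis=1)).ravel())
    return D_inv @ A                                # Eq. (20): P = D^{-1} A
```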

5) Calculating the saliency values based on absorbed time
According to Eqs. (4)-(7), the saliency values of the nodes in the original image are finally obtained, yielding the background-absorbing saliency map $S_{bg2}$, where $S_{bg2}(i)$ is the saliency value of superpixel node $V_i$:

$$S_{bg2}(i) = \bar{y}(i),\quad i \in [1, k] \tag{21}$$

The larger the $\bar{y}(i)$ value of a transient node $V_i$, the longer it takes $V_i$ to be absorbed by the background seed nodes, and the more likely $V_i$ belongs to a salient object; conversely, the smaller the $\bar{y}(i)$ value, the shorter the absorbed time, and the more likely $V_i$ belongs to the background.

3.2.4 Fusion of boundary-prior saliency map and background-absorbing saliency map
The boundary-prior saliency map $S_{bg1}$ is fused with the background-absorbing saliency map $S_{bg2}$, with the background-absorbing saliency map given the larger weight. The first-stage saliency map $S_{bg}$ based on background is then obtained; the saliency value of superpixel $V_i$ is calculated by:

$$S_{bg}(i) = (1 - e^{-\theta \cdot S_{bg1}(i)}) \cdot S_{bg2}(i) \tag{22}$$

where $\theta$ is the balance factor, set to $\theta = 5$ in the experiment. The factor $(1 - e^{-\theta \cdot S_{bg1}(i)})$ can suppress, to some extent, the background noise in the background-absorbing saliency map. The obtained background-based saliency map $S_{bg}$ is also normalized.

3.3 The second stage: foreground-absorbing saliency map $S_{fg}$
3.3.1 Determining the foreground seed set $F$
After the first stage, the main objects are roughly highlighted, but the saliency map $S_{bg}$ still contains some background noise that should be suppressed, that is, part of the background region is misidentified as salient objects. Moreover, some foreground regions may be mistaken for the background. Therefore, it is necessary to start from the potential foreground region and perform back propagation based on the absorbing Markov chain to further improve the background-based saliency map $S_{bg}$. To select the foreground seeds, within the range of the convex hull $H$, the superpixels with large values in the saliency map $S_{bg}$ are selected into the foreground seed set $F$:

$$F = \{V_i \mid S_{bg}(i) \ge \beta_4 \cdot \max(S_{bg}) \text{ and } S_{bg}(i) \ge avg(S_{bg}),\ V_i \in H\} \tag{23}$$

where $\max(S_{bg})$ is the maximum saliency value of the superpixel nodes in the saliency map $S_{bg}$, $avg(S_{bg})$ is the average saliency value of all superpixels in the convex hull $H$, and $\beta_4 = 0.7$ in the experiment.
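A compact sketch of the fusion of Eq. (22) followed by the seed selection of Eq. (23); `in_H` is an assumed boolean mask marking the superpixels inside the convex hull $H$.

```python
import numpy as np

def fuse_and_select_foreground(S_bg1, S_bg2, in_H, theta=5.0, beta4=0.7):
    S_bg = (1.0 - np.exp(-theta * S_bg1)) * S_bg2               # Eq. (22)
    S_bg = (S_bg - S_bg.min()) / (S_bg.max() - S_bg.min() + 1e-12)
    avg_H = S_bg[in_H].mean()                                   # avg(S_bg) over the hull
    F = in_H & (S_bg >= beta4 * S_bg.max()) & (S_bg >= avg_H)   # Eq. (23)
    return S_bg, F
```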

3.3.2 Constructing the foreground-absorbing Markov chain
The steps to construct the foreground-absorbing Markov chain are similar to those for the background-absorbing Markov chain. The main differences are as follows. Firstly, at this stage, all the nodes in the foreground seed set $F$ are copied and used as virtual absorbing nodes. Secondly, after normalizing the absorbed time vector $y$, the saliency value of each original image node is calculated according to Eq. (24), yielding the foreground-absorbing saliency map $S_{fg}$, where $S_{fg}(i)$ is the saliency value of superpixel node $V_i$:

$$S_{fg}(i) = 1 - \bar{y}(i),\quad i \in [1, k] \tag{24}$$

Thirdly, the larger the $\bar{y}(i)$ value of a transient node $V_i$, the longer it takes $V_i$ to be absorbed by the foreground seed nodes, and the more likely $V_i$ belongs to the background; conversely, the smaller the $\bar{y}(i)$ value, the shorter the absorbed time, and the more likely $V_i$ belongs to a salient object.

3.4 Saliency map fusion and smoothing optimization
3.4.1 Fusion of saliency maps at the two stages
The first-stage background-based saliency map $S_{bg}$ and the second-stage foreground-absorbing saliency map $S_{fg}$ are complementary: the former can highlight the salient objects, while the latter can better suppress the background noise. By combining these two saliency maps, we obtain a fused saliency map $S$:

$$S = \alpha S_{bg} + (1 - \alpha) S_{fg} \tag{25}$$

where $\alpha$ is the balance factor. Considering that the contributions of the two saliency maps are the same, we take $\alpha = 0.5$ in the experiment.

3.4.2 Smoothing optimization of the fused saliency map
In order to highlight the salient objects, weaken the background region, and obtain a more uniform foreground and background, this paper uses a smoothing mechanism to optimize the fused saliency map by solving the optimization problem:

$$S^* = \arg\min\left(\mu \sum_{i,j} w_{ij}\,(S^*(i) - S^*(j))^2 + \sum_i (S^*(i) - S(i))^2\right) \tag{26}$$

where $S = [S(1), S(2), \ldots, S(k)]^T$ is the fused saliency map and $S^* = [S^*(1), S^*(2), \ldots, S^*(k)]^T$ is the final saliency map. The first term on the right side of Eq. (26) is the smoothness constraint, and the second term is the fitting constraint; the parameter $\mu$ controls the balance between them. That is to say, a good saliency map $S^*$ should not change too much between nearby superpixels, and should not differ too much from the initial saliency map (the fused saliency map $S$).

Setting the derivative of the objective function in Eq. (26) to 0 gives the minimum. After the transformation, the final saliency map $S^*$ is calculated by:

$$S^* = \lambda (D - W + \lambda I)^{-1} S \tag{27}$$

where $W = (w_{ij})_{k \times k}$ is the adjacency matrix of graph $G$; $D = \mathrm{diag}(\sum_j w_{ij})$ is the degree matrix; $I$ is the identity matrix; and $\lambda = 1/(2\mu)$, with $\lambda = 0.02$ in the experiment.

3.5 Algorithm representation
Sections 3.1-3.4 presented a saliency detection method via two-stage absorbing Markov chain based on background and foreground. The main steps of the proposed method are summarized in Algorithm 1.

Algorithm 1.
Input: Input image $I$
1. Use Harris corner detection to construct the convex hull $H$
2. Segment image $I$ into superpixels
3. Define the candidate boundary background set $B_0$
4. Calculate the background probability of the boundary nodes using boundary connectivity according to Eqs. (10)-(12)
5. Filter the boundary background seeds in the candidate boundary background set $B_0$ to determine the boundary background set $B_1$ according to Eq. (13)
6. Calculate the boundary-prior saliency map $S_{bg1}$ based on the spatially weighted color contrast according to Eqs. (14)-(16)
7. In the region outside the candidate boundary background set $B_0$ and the convex hull $H$, add background nodes appropriately to determine the background seed set $B$ according to Eq. (17)
8. Copy the nodes in the background seed set $B$ as virtual absorbing nodes, to obtain the associated matrix $A$ of the background-absorbing Markov chain according to Eqs. (18)-(19)
9. Obtain the transition matrix $P$ of the background-absorbing Markov chain according to Eq. (3)
10. Generate the background-absorbing saliency map $S_{bg2}$ according to Eqs. (4)-(7) and Eq. (21)
11. Fuse the saliency maps $S_{bg1}$ and $S_{bg2}$ to obtain the background-based saliency map $S_{bg}$ according to Eq. (22)
12. According to Eq. (23), select the foreground seeds within the convex hull $H$ based on the saliency map $S_{bg}$, and determine the foreground seed set $F$
13. Similar to steps 8-9, copy the nodes in the foreground seed set $F$ as virtual absorbing nodes, then obtain the associated matrix $A$ and the transition matrix $P$ of the foreground-absorbing Markov chain
14. Generate the foreground-absorbing saliency map $S_{fg}$ according to Eqs. (4)-(7) and Eq. (24)
15. According to Eq. (25), fuse the two saliency maps $S_{bg}$ and $S_{fg}$ to obtain the fused saliency map $S$
16. According to Eqs. (26)-(27), use the smoothing mechanism to optimize the fused saliency map to obtain the final saliency map $S^*$
Output: Final saliency map $S^*$
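As an illustration of step 16, the closed form of Eq. (27) is a single linear solve; the sketch below assumes a dense $k \times k$ adjacency matrix `W` and uses $\lambda = 0.02$ as in the experiments.

```python
import numpy as np

def smooth(S, W, lam=0.02):
    """Closed-form smoothing of Eq. (27): S* = lam * (D - W + lam*I)^{-1} S."""
    D = np.diag(W.sum(axis=1))                              # degree matrix
    return lam * np.linalg.solve(D - W + lam * np.eye(len(S)), S)
```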

4. Experimental Results and Analysis

In order to verify the performance of the proposed algorithm, it is compared with 12 traditional methods, namely MC [37], DRFI [16], MR [34], SEG [41], BMS [42], SVO [43], DSR [44], PCA [45], HS [46], SF [47], SWD [48], and CA [49]. The experimental images are from ECSSD [50], PASCAL-S [51], and SED2 [52], each of which includes original images and GT maps whose salient regions are manually marked.

The ECSSD dataset is an extended set of the CSSD [53] dataset, with a total of 1000 images. The dataset has large objects and includes semantically rich and complex images, covering plants, animals, and human activities in natural scenes. The GT maps of all images are marked as binary images with pixel-level precision.

The PASCAL-S dataset is a subset of the PASCAL VOC [54] dataset: 850 images with complex scenes are selected from the PASCAL VOC dataset, involving 20 object classes. The GT maps of all images are marked as binary images with pixel-level precision.

The SED2 dataset is a subset of the SED dataset, with a total of 100 images. Although the dataset is small in scale, it is relatively difficult, as it contains small objects and each image contains two objects. The GT maps of all images are marked as binary images with pixel-level precision.

4.1 Qualitative analysis

Qualitative analysis is a rough judgment of macroscopic results by human eyes. Fig. 3 shows a visual comparison of the results obtained by the different methods on the three datasets.

The MC method is currently one of the better-performing methods, but the background in the middle of an image is also easily highlighted due to the inherent characteristics of the absorbing Markov chain. The DRFI method uses a random forest regression algorithm based on multi-feature learning and uses supervised learning to map regional feature vectors to saliency values; overall, the DRFI method has high accuracy, but when the foreground and background vary in complex ways, missed or false detections may occur. The MR method uses the manifold ranking principle, which can generally highlight the entire salient region, but when the background is close to the objects in color, or the salient objects are close to the image boundary, some salient regions are wrongly detected as background. Within the Bayesian framework, the SEG method measures the saliency of an image by comparing the center and surround features of a sliding window; it is not good at weakening the background, and objects are greatly interfered with by noise. The BMS method is based on a Gestalt principle of figure-ground segregation, calculating saliency maps by analyzing the topological structure of Boolean maps; since only global color topological information is used, the color features in BMS are not sensitive to changes in the pixel direction, which reduces the saliency values of some regions and prevents the salient objects from being uniformly highlighted. The SVO method is a top-down saliency detection algorithm that uses the results of other saliency algorithms as prior knowledge; its saliency map is largely subject to background interference, and some backgrounds are wrongly detected as salient objects. Based on a constructed background dictionary, the DSR method obtains the reconstruction coefficients of the image under test and uses the corresponding reconstruction error to describe image saliency; it suppresses most image backgrounds well but sometimes cannot effectively retain the object boundary. The PCA method uses the pattern and color features of image patches to perform saliency detection; it can detect the outline of salient objects but does not highlight some internal details. The HS method is based on the over-segmentation of an image and uses a hierarchical model of three scales for saliency detection; for some images with cluttered backgrounds, errors may occur in the detection results. The SF method proposes saliency filters for salient region detection, which can outline the foreground objects, but the inside of the object region is not uniform, and the saliency values of some pixels in the salient region are wrongly suppressed. The SWD method is a saliency detection algorithm based on spatially weighted dissimilarity, calculating the dissimilarities between patches in a relatively low-dimensional space after dimensionality reduction; it can predict human fixations but loses image details and produces some noise. The CA method combines global prior knowledge with feature-based local contrast and can detect the object boundary, but cannot uniformly highlight the inner region of the objects.

The proposed method can highlight the salient region in a clear, bright, and uniform manner, and can also effectively suppress the noise of the background region (including the background in the middle of an image). When the foreground objects touch the image boundary, the proposed method can still highlight the salient objects. In addition, for images with multiple objects or complex backgrounds, the proposed method performs better than the other methods. The visual comparison shows that the saliency maps obtained by the proposed method have the best visual effect, indicating the effectiveness of the proposed algorithm.

Fig. 3. A selection of visual comparisons of the saliency maps produced by different algorithms on the ECSSD, PASCAL-S, and SED2 datasets

4.2 Quantitative analysis

In the comparison experiments for quantitative analysis, this paper uses the Precision-Recall (PR) curve and the Precision/Recall/F-measure histogram to measure the effectiveness of the proposed algorithm [12, 21]. For the saliency maps produced by a saliency detection algorithm, each threshold $T \in \{0, 1, 2, \ldots, 255\}$ is used for binary segmentation, and the Precision and Recall of each saliency map at the 256 thresholds are calculated from the segmented binary map and the GT map. Precision is defined as the ratio of the correctly-detected foreground region to the foreground region in the saliency map. Recall is defined as the ratio of the correctly-detected foreground region to the foreground region in the GT map:

$$Precision = \frac{TP}{TP + FP} \tag{28}$$

$$Recall = \frac{TP}{TP + FN} \tag{29}$$

where $TP$ is the number of pixels correctly identified as belonging to the objects, $FP$ is the number of pixels misidentified as belonging to the objects, and $FN$ is the number of pixels misidentified as belonging to the non-objects. At the 256 thresholds, the average values of Precision and Recall over all saliency maps are calculated, and the resulting 256 pairs are plotted as points on a two-dimensional plane with Recall on the X-axis and Precision on the Y-axis. By smoothing these points, the PR curve is obtained.

When evaluating the saliency effect, the two indicators of Precision and Recall are mutually constrained: when Recall increases, Precision often decreases. Usually, the F-measure index is used to consider the importance of the two indicators comprehensively. F-measure is the weighted harmonic mean of Precision and Recall; the larger the F-measure value, the better the performance of the detection algorithm:

$$F_\beta = \frac{(1 + \beta^2) \cdot Precision \cdot Recall}{\beta^2 \cdot Precision + Recall} \tag{30}$$

where $\beta^2$ is the balance factor between Precision and Recall. In this paper, $\beta^2$ is set to 0.3 to increase the weight of Precision, that is, Precision is considered more important than Recall.

When drawing the Precision/Recall/F-measure histogram, we do not sweep the thresholds in $\{0, 1, 2, \ldots, 255\}$ as for the PR curve. For each saliency map $S^*$, the threshold for binarization is determined adaptively: the adaptive threshold $T$ is defined as twice the average saliency value of all pixels in the saliency map $S^*$:

$$T = \frac{2}{W \times H}\sum_{x=1}^{W}\sum_{y=1}^{H} S^*(x, y) \tag{31}$$

where $H$ and $W$ are the numbers of rows and columns of pixels in the saliency map $S^*$, respectively, and $S^*(x, y)$ is the saliency value of the pixel at position $(x, y)$. For a saliency detection algorithm, a binary map is obtained with the adaptive threshold $T$ for each saliency map output over the entire image dataset, and a set of Precision, Recall, and F-measure values is calculated. Then, the average Precision, Recall, and F-measure over all saliency maps are computed and represented as a histogram, so that each saliency detection algorithm yields three bars and the performance of the various algorithms can be compared.

The quantitative comparison is shown in Fig. 4-Fig. 6. Fig. 4 shows the PR curves and the Precision/Recall/F-measure histograms of the various methods on the ECSSD dataset; Fig. 5 and Fig. 6 show the comparison results on the PASCAL-S and SED2 datasets, respectively. On the three datasets ECSSD, PASCAL-S, and SED2, the two-stage MC method shows the best PR curve. On the ECSSD and PASCAL-S datasets, the Precision, Recall, and F-measure values of the two-stage MC method are the largest in the Precision/Recall/F-measure histograms. On the SED2 dataset, the Precision value of the two-stage MC method is slightly smaller than that of the SEG method and its Recall value is slightly smaller than that of the DRFI method, but the most important F-measure value is the largest.

The two-stage MC method proposed in this paper improves on the MC method, so the F-measure values of the two methods are compared to illustrate the performance gain. On the ECSSD dataset, the F-measure value of the proposed method is 5.21% higher than that of the MC method; on the PASCAL-S dataset, it is 6.05% higher; on the SED2 dataset, it is 5.87% higher. The results of the quantitative comparison show that the proposed method detects the salient region of images with higher accuracy, and its overall performance is better than that of the other 12 traditional methods.

Fig. 4. Quantitative comparisons of saliency maps produced by different approaches on the ECSSD dataset (PR curves and Precision/Recall/F-measure histogram for Ours, MC, DRFI, MR, SEG, BMS, SVO, DSR, PCA, HS, SF, SWD, and CA)

Fig. 5. Quantitative comparison of saliency maps produced by different approaches on the PASCAL-S dataset (PR curves and Precision/Recall/F-measure histogram for the same methods)

Fig. 6. Quantitative comparisons of saliency maps produced by different approaches on the SED2 dataset (PR curves and Precision/Recall/F-measure histogram for the same methods)

5. Conclusion

This paper proposes a saliency detection method via a two-stage absorbing Markov chain based on background and foreground. The proposed method aims to solve the problems of the MC method, such as the inaccurate selection of boundary background seeds, the insufficient coverage of background seeds, and the single propagation mode. With the boundary connectivity, boundary prior, and convex hull techniques, the proposed method excludes the non-background seed nodes on the boundary, appropriately adds background seeds in the region outside the candidate boundary background set and the convex hull, and selects the foreground seeds within the convex hull. Making full use of the complementarity between background-based and foreground-based detection methods, we perform effective propagation on the two-stage absorbing Markov chain based on background and foreground, and the propagation results are subjected to reasonable fusion and smoothing optimization. Qualitative and quantitative experiments show that, compared with the 12 traditional methods, the proposed method can not only effectively highlight the salient regions and suppress the background, but also improve the Precision, Recall, and F-measure values. Future work should focus on higher-level salient feature extraction and apply the relevant theories and methods of machine learning to the field of visual saliency, to further improve the detection performance of the proposed method on complex images.

Acknowledgements This work is supported by the National Natural Science Foundation of China (Grant No. U1831127), the Natural Science Foundation of Jiangsu Higher Education Institutions of China (Grant No. 16KJB520020).


Author contributions

Wei Tang, the first author, contributed the most to this paper. The second author, Zhijian Wang, and the third author, Jiyou Zhai, conducted the experiments to verify the effectiveness of the proposed method.

Highlights

1) The boundary-prior saliency map is fused with the background-absorbing saliency map to obtain the first-stage saliency map based on background.
2) Based on the first-stage saliency map, the foreground seeds are selected within the range of the convex hull.
3) Based on spatially weighted color contrast, some background nodes in the region outside the boundary and the convex hull are appropriately added as background seeds.

Conflict of interest

All authors confirm that there is no conflict of interest.