Journal Pre-proof

Salient Object Detection via Two-Stage Absorbing Markov Chain Based on Background and Foreground
Wei Tang, Zhijian Wang, Jiyou Zhai, Zhangjing Yang

PII: S1047-3203(19)30348-7
DOI: https://doi.org/10.1016/j.jvcir.2019.102727
Reference: YJVCI 102727
To appear in: J. Vis. Commun. Image R.
Received Date: 24 October 2019
Revised Date: 28 November 2019
Accepted Date: 28 November 2019

Please cite this article as: W. Tang, Z. Wang, J. Zhai, Z. Yang, Salient Object Detection via Two-Stage Absorbing Markov Chain Based on Background and Foreground, J. Vis. Commun. Image R. (2019), doi: https://doi.org/10.1016/j.jvcir.2019.102727
© 2019 Published by Elsevier Inc.
Salient Object Detection via Two-Stage Absorbing Markov Chain Based on Background and Foreground
Wei Tang1,2, Zhijian Wang1, Jiyou Zhai3, Zhangjing Yang2
1 College of Computer and Information, Hohai University, Nanjing, 211100, P. R. China [e-mail: [email protected]]
2 School of Information Engineering, Nanjing Audit University, Nanjing, 211815, P. R. China
3 School of Computer Engineering, Nanjing Institute of Technology, Nanjing, 211167, P. R. China
*Corresponding author: Wei Tang
Abstract
This paper proposes a saliency detection method based on a two-stage absorbing Markov chain over background and foreground for detecting salient objects in images. Firstly, image preprocessing is performed: a convex hull is constructed and superpixel segmentation is carried out to prepare for subsequent processing. Secondly, according to boundary connectivity, the superpixels with low background probability are removed from the candidate boundary background set B0, yielding the boundary background set B1. Using the saliency values of the nodes in the boundary-prior saliency map S_bp1, background seeds are added appropriately in the region outside the candidate boundary background set B0 and the convex hull H, and the background seed set B is obtained after this update. Then, the background-absorbing Markov chain is constructed to generate the background-absorbing saliency map S_ba2. Fusing the saliency maps S_bp1 and S_ba2 yields the first-stage background-based saliency map S_bm. Thirdly, within the range of the convex hull H, the foreground seed set F is determined according to the saliency map S_bm. The foreground-absorbing Markov chain is then constructed to obtain the second-stage foreground-absorbing saliency map S_fm. Finally, the saliency maps S_bm and S_fm of the two stages are combined into a fused saliency map S, and the final saliency map S* is obtained after optimization through a smoothing mechanism. Compared with traditional methods, the performance of the proposed method is significantly improved. The proposed method is tested on three public image datasets and detects salient objects with high accuracy.
Keywords: Saliency detection, Markov chain, background absorbing, foreground absorbing
1. Introduction
The human visual system has a selective attention mechanism that allows only a portion of the regions of interest (ROI) in a visual scene to enter the brain for analysis, reducing the complexity of the scene and environment. Image saliency detection uses a computer to simulate the human visual system: it calculates the saliency of each part of an image (the degree to which it attracts human visual attention) and then extracts the most salient (most attractive) region. Image saliency detection reduces the complexity of understanding image content by computer, helps perceive the main content of an image, lowers the computational cost, and improves image processing efficiency. Research on image saliency detection began with Koch [1], who proposed the concept of the saliency map. Depending on the mathematical model used, image saliency detection methods can be divided into saliency detection based on feature integration theory [1-4], saliency detection based on information theory [5], graph-based saliency detection [6-7], saliency detection based on
decision theory [8-9], saliency detection based on Bayesian theory [10-11], saliency detection based on frequency-domain analysis [12-13], and saliency detection based on machine learning [14-19], etc. Depending on the purpose of detection and the function of the mathematical model used, research on image saliency detection can be divided into eye fixation prediction [20] and salient object detection [21]. In 1998, Itti proposed the first landmark model for saliency calculation, the IT method [3], which belongs to the eye fixation prediction category. The IT method extracts multiscale visual features, such as color, intensity, and orientation; it constructs feature maps using center-surround differences and then fuses the multi-feature maps to highlight the salient region. Before 2008, many saliency detection methods used the eye fixation prediction model. The purpose of eye fixation prediction is to predict the fixation points of eye movements, that is, the image positions on which human eyes focus. However, its result only indicates the location of salient objects in an image and does not yield a clear boundary of the salient objects. After 2008, the focus of image saliency detection began to shift to salient object detection. The purpose of salient object detection is to highlight the salient region in an image; it emphasizes the integrity and uniformity of salient objects in a saliency map and uses manually annotated, pixel-accurate ground truth (GT) maps. The research of this paper also focuses on salient object detection. In recent years, a large number of salient object detection models have been proposed, and they have been widely used in image segmentation [22], object detection and recognition [23], image compression [24], image retrieval [25], and image classification [26], etc. Salient object detection has become one of the important research directions in the field of computer vision.
Salient object detection algorithms include bottom-up models [27] and top-down models [28]. A bottom-up model is driven directly by low-level visual stimuli; it involves no high-level image information and is a fast, unconscious, data-driven visual attention mechanism. A top-down model is a slow, conscious, task-driven visual attention mechanism that typically requires learning various characteristics of an image and using the learned information to detect objects. When studying bottom-up models, it is generally assumed that the observer is in a free-viewing state, that is, the observer has no specific task. The main focus of this paper is the bottom-up salient object detection model. Graph-based salient object detection has become one of the commonly used strategies in the field of saliency detection. In 2006, Harel et al. proposed the graph-based visual saliency (GBVS) algorithm for region extraction [6], applying the random walk theory of graphs to visual saliency detection for the first time. In graph-based methods, a graph model is constructed based on graph theory, and the image is segmented into multiple regions; each region corresponds to a node in the graph, and the edges between nodes are also defined. Starting from the low-level cues of the image, prior knowledge can be used to mark some nodes of the image as seed nodes, after which a propagation model spreads the saliency of the seed nodes. After propagation and diffusion, each node in the graph is assigned a saliency value. Common priors include the boundary prior [29], center prior [30], convex hull prior [31], shape prior [32], and color prior [33], etc. Common propagation models include the manifold ranking model [34], cellular automata model [35], label propagation model [36], and Markov model [37], etc. In Reference [34], Yang et al. proposed an image saliency detection method via graph-based manifold ranking (MR).
The MR algorithm first constructs a k-regular graph for the superpixels in the graph model, and then performs two-stage saliency detection. In the first stage, the nodes in the
top, bottom, left, and right boundaries are used as background queries. The remaining nodes are ranked according to their relevance to these queries; four saliency maps are obtained from the ranking results and then fused into the initial saliency map. In the second stage, threshold segmentation is performed on the initial saliency map to select foreground queries, and manifold ranking is applied again for enhancement to obtain the final saliency map. In Reference [35], Qin et al. proposed the background-based maps optimized by single-layer cellular automata (BSCA) algorithm. The BSCA algorithm treats each superpixel as a cell. Firstly, it calculates the contrast between each cell and the boundary seeds according to color features and spatial distance, establishing a background-based saliency map. Then, it propagates this background-based saliency map as the initial saliency map. Finally, each cell automatically evolves into a more accurate and stable state. In Reference [36], Li et al. proposed a label propagation (LP) algorithm. Beyond the traditional use of boundary superpixels as background labels for inner propagation, it further extracts foreground labels from the object region for complex scenes. The LP algorithm propagates the effective information according to the similarity between superpixels, and the final saliency map is the fusion result of inter propagation, which combines background and foreground labels. Whether inner or inter propagation is used is determined by a compactness criterion. In Reference [37], Jiang et al. proposed an image saliency detection method based on an absorbing Markov chain (MC). This method uses the boundary nodes as the background seed set and copies them as virtual absorbing nodes, while all the nodes in the image are taken as transient nodes.
A random walk is started from each transient node, and the saliency of the node is measured by its absorbed time, i.e., the expected time until it is absorbed by the absorbing nodes. The MC method takes the obtained superpixels as the transient nodes of the Markov chain and copies the superpixels of the four boundaries into virtual absorbing nodes, from which the absorbed times of the transient nodes are calculated. The disadvantages of the MC method are as follows. Firstly, the selection of boundary background seeds is not accurate, because salient objects may touch one or two boundaries of some images. Secondly, in the sample space of background seeds, the boundary background seeds cover only a part, which limits the efficiency of propagation to a certain extent. Thirdly, during propagation, some background areas of certain images may not be well suppressed, and a few singular nodes receive very large saliency values, which fails to effectively highlight the salient region. Fourthly, only the absorption and propagation of background seeds are considered; that is, only a single propagation mode is used. Fifthly, the foreground and background of the saliency map are not uniform enough, which requires further optimization. In view of the above analysis, this paper proposes a two-stage absorbing Markov chain based on background and foreground for detecting salient objects in images. The framework of the proposed method is shown in Fig. 1. The proposed method consists of four steps: a preprocessing step, a first-stage processing step, a second-stage processing step, and a fusion-and-smoothing step. Firstly, image preprocessing is performed to construct the convex hull and segment the image into superpixels, in preparation for subsequent processing. Secondly, according to boundary connectivity, the superpixels with low background probability are removed from the candidate boundary background set, and the boundary seeds are obtained.
With the saliency values of the nodes in the boundary-prior saliency map, background seeds are added appropriately in the region outside the candidate boundary background set and the convex hull. Then, the background-absorbing Markov chain is constructed to generate the background-absorbing saliency map. When fusing the boundary-prior saliency map and the background-absorbing saliency map into the first-stage background-based saliency map, the background-absorbing saliency map is given the larger weight. Thirdly, within the range of the convex hull, the foreground seeds are selected according to the first-stage saliency map, and the foreground-absorbing Markov chain is constructed to obtain the second-stage foreground-absorbing saliency map. Finally, the saliency maps of the two stages are fused, and the final saliency map is obtained after optimization through a smoothing mechanism. Compared with traditional methods, the performance of the proposed method is significantly improved. The proposed method is tested on three public image datasets and shows high accuracy in detecting salient objects. The main contributions of our work are summarized as follows: (1) Find similar regions by superpixel similarity instead of only by the weights of boundaries, thereby optimizing the boundary connectivity algorithm; calculate the probability that each boundary superpixel belongs to the background, and remove superpixels with low background probability from the candidate boundary background set to obtain an accurate boundary background set. (2) Based on spatially weighted color contrast, add some background nodes in the region outside the boundary and the convex hull as background seeds, and combine them with the boundary background set into the background seed set to improve the algorithm's efficiency. (3) In order to highlight the salient objects, fuse the boundary-prior saliency map with the background-absorbing saliency map to obtain the first-stage background-based saliency map.
(4) According to the adjacency matrix W of the original graph model G, improve the connections between the nodes in the associated matrix of the absorbing Markov chain. (5) Based on the first-stage saliency map, select the foreground seeds within the range of the convex hull. (6) Make full use of the complementarity between the background-based and foreground-based detection methods, perform effective propagation for each, and reasonably fuse and smooth the propagation results.
[Fig. 1 here: the pipeline runs from the input image through pre-processing (superpixels, convex hull) into Stage 1 (background-based saliency map: boundary seeds, result of boundary prior, background seeds, Markov graph construction, result of background absorbing, background-based result) and Stage 2 (foreground-absorbing saliency map: foreground seeds, Markov graph construction, result of foreground absorbing), followed by fusing and smoothing to produce the final saliency map.]
Fig. 1. The framework of our proposed method
The remainder of the paper is organized as follows. Section 2 reviews work related to our approach. Section 3 describes the framework of our saliency detection method in detail. Section 4 presents our experimental results on three public image datasets and compares them with other state-of-the-art saliency detection methods. The final section concludes the paper by summarizing our findings.
2. Related Work
Given a set of states S = {s1, s2, ..., sn}, a Markov chain is represented by an n×n transition matrix P, where pij is the probability of moving from state si to state sj. A state si with pii = 1 is called an absorbing state, which implies pij = 0 for all j ≠ i. If a Markov chain contains at least one absorbing state, and a random walker starting from any transient node can reach some absorbing node with positive probability in a finite number of steps, the chain is called an absorbing chain.
In the MC method [37], the SLIC algorithm is first used to divide the input image into k superpixels. In general, the superpixels on the four boundaries of an image do not contain salient objects, so they are copied, and the m copies serve as virtual absorbing nodes of the absorbing chain. With n = k + m nodes, a single-layer graph model G = <V, E> is created, where vi ∈ V is a superpixel node and E is the edge set. The nodes of G are renumbered so that the first k nodes are transient and the last m nodes are absorbing. The associated matrix A = (aij)n×n of the nodes in G is defined by:

aij = { w'ij,  j ∈ N(i), 1 ≤ i ≤ k
      { 1,     i = j                                                    (1)
      { 0,     otherwise

where N(i) is the set of nodes connected to node vi. In the graph model G, each node (transient or absorbing) is connected to the transient nodes that neighbor it or that share common boundaries with its neighbors; in addition, all transient nodes on the image boundary are connected to each other. The edge weight w'ij between nodes vi and vj is defined as:

w'ij = exp(−‖xi − xj‖ / σ²)                                             (2)

where xi and xj are the mean CIELAB color feature vectors of superpixels vi and vj, respectively, and σ is a constant. From the associated matrix A, the degree matrix D = diag(Σj aij) is obtained; each diagonal element of D is the sum of the edge weights incident to the corresponding node. The transition matrix of the graph G is therefore:

P = D^(-1)A                                                             (3)

The transition matrix P of the absorbing chain can be written in the simple standard form:

P = [ Q  R ]
    [ 0  I ]                                                            (4)

where the sub-matrix Q = (qij)k×k contains the transition probabilities between any two transient nodes; the sub-matrix R = (rij)k×m contains the probabilities of moving from any transient node to any absorbing node; 0 is the m×k zero sub-matrix; and I is the m×m identity sub-matrix, meaning that a random walker cannot move between absorbing nodes and stays at an absorbing node with probability 1. The fundamental matrix of the absorbing Markov chain is N = (nij)k×k, where nij denotes the expected number of visits to transient node vj before absorption, for a walker starting from transient node vi:

N = Σ_{p=0}^{∞} Q^p = (I − Q)^(-1)                                      (5)

The absorbed time of transient node vi is the expected total number of transient-node visits made by a walker that starts from vi and is eventually absorbed, i.e., Σj nij. For the k-dimensional all-ones column vector c = [1, 1, ..., 1]^T, the absorbed time is calculated by:

y = Nc                                                                  (6)

where y = [y(1), y(2), ..., y(k)]^T is a k-dimensional column vector recording the absorbed time of each transient node. By normalizing the absorbed time vector y, a saliency map S is obtained:

S(i) = ȳ(i),  i ∈ [1, k]                                                (7)

where i indexes the transient nodes of the image and ȳ denotes the normalized absorbed time vector.
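As a concrete illustration of Eqs. (3)-(7), the absorbed time can be computed directly from the associated matrix. The following is a minimal sketch on a toy 4-node chain; the node layout and edge weights are illustrative assumptions, not values from the paper:

```python
# Toy illustration of Eqs. (3)-(7): absorbed time of an absorbing Markov chain.
import numpy as np

def absorbed_time(A, k):
    """A: (n x n) associated matrix, first k nodes transient.
    Returns the normalized absorbed time of each transient node."""
    D_inv = np.diag(1.0 / A.sum(axis=1))                 # D^-1
    P = D_inv @ A                                        # Eq. (3)
    Q = P[:k, :k]                                        # transient-to-transient block
    N = np.linalg.inv(np.eye(k) - Q)                     # Eq. (5), fundamental matrix
    y = N @ np.ones(k)                                   # Eq. (6), absorbed time
    return (y - y.min()) / (y.max() - y.min() + 1e-12)   # Eq. (7), normalized

# Toy chain: 3 transient nodes in a line; node 1 is also linked to the
# single absorbing node (index 3), which has only a self-loop.
A = np.array([[1.0, 0.5, 0.0, 0.0],
              [0.5, 1.0, 0.5, 1.0],
              [0.0, 0.5, 1.0, 0.0],
              [0.0, 0.0, 0.0, 1.0]])
s = absorbed_time(A, k=3)
```

Node 1 is adjacent to the absorbing node, so it is absorbed fastest and receives the lowest normalized absorbed time, while the symmetric end nodes 0 and 2 receive the highest.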
3. The proposed algorithm
In order to obtain a more accurate and robust saliency map, this paper proposes a two-stage Markov chain method. This section details the four steps of the proposed method, which is summarized as Algorithm 1.
3.1 Image preprocessing
3.1.1 Convex hull construction
Using the color enhanced Harris operator [38] and color features, Harris corner detection is performed on the original image I, and the minimum convex hull H is constructed from the detected salient feature points. The convex hull H roughly delimits the salient region, and the region outside the convex hull is regarded as background. The salient objects are contained in the convex hull, but the convex hull still contains many background areas.
3.1.2 Superpixel segmentation
The simple linear iterative clustering (SLIC) algorithm [39] is used to segment the input image I into k superpixels, which serve as the basic elements of the graph-based saliency detection.
3.1.3 Graph model construction
An undirected graph G = <V, E> is constructed for the image I after superpixel segmentation, where vi ∈ V is a superpixel node and k is the number of superpixel nodes in G. The similarity between superpixel nodes vi and vj is defined as:

sim_ij = exp(−d_color(vi, vj) / σ²)                                     (8)

In graph G, sim_ij applies to any two nodes. The closer sim_ij is to 1, the more similar the two nodes are; even if vj is not a neighbor of vi, the two nodes may be similar. In Eq. (8), σ² is a balance parameter used to control the strength of the weight; we take σ² = 0.1 in this paper. d_color(vi, vj) is the Euclidean distance between superpixels vi and vj in the CIELAB color space:

d_color(vi, vj) = ‖xi − xj‖ = sqrt((li − lj)² + (ai − aj)² + (bi − bj)²)  (9)

where xi and xj are the mean CIELAB color feature vectors of superpixels vi and vj, respectively.
3.2 The first stage: background-based saliency map S_bm
3.2.1 Determining the boundary background set
In many cases, all four boundaries of an image are background, but in some images a part of the salient objects appears on one or two boundaries. If such a part is selected as background nodes, it will reduce the accuracy of saliency detection based on boundary prior and background absorbing.
1) Definition of the candidate boundary background set B0
In an image of resolution w×h, the superpixels located within a boundary band of width d are taken as the candidate boundary background set B0. The abscissa of the band satisfies {x | 0 ≤ x ≤ d ∪ w − d ≤ x ≤ w}; the ordinate satisfies {y | 0 ≤ y ≤ d ∪ h − d ≤ y ≤ h}. We take d = 10 here. Let {I1, I2, I3, ...} denote the pixels of superpixel vi; as long as one pixel Ig lies in the boundary band, vi is considered to belong to the candidate boundary background set, that is, B0 = {vi | ∃ Ig ∈ B0, Ig ∈ vi}. To filter the boundary background seeds, we need to calculate the background probability of each superpixel vi in B0.
2) Finding a similar region R of the boundary node vi
For a superpixel vi on the image boundary, we find the superpixel nodes similar to vi in the entire image. The similar region R is expressed as:

R = {vj | sim_ij ≥ β1},  β1 ∈ [0.7, 0.9]                                (10)

If the similarity sim_ij is not smaller than the adaptive threshold β1, the superpixels vi and vj are considered similar. The superpixel vi together with all of its similar nodes forms the similar region R.
3) Calculating the boundary connectivity BCon of the similar region R
Boundary connectivity [40] is a method to quantify the connection extent between region R
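Eqs. (8)-(10) can be sketched as follows; σ² = 0.1 follows the paper, while the CIELAB colors and the threshold β1 = 0.8 are illustrative toy values:

```python
# Toy illustration of Eqs. (8)-(10): superpixel similarity and the similar region R.
import numpy as np

def similarity(colors, sigma2=0.1):
    """colors: (k, 3) mean CIELAB colors. Returns the (k, k) matrix sim_ij."""
    d = np.linalg.norm(colors[:, None, :] - colors[None, :, :], axis=2)  # Eq. (9)
    return np.exp(-d / sigma2)                                           # Eq. (8)

def similar_region(i, sim, beta1=0.8):
    """Indices j with sim_ij >= beta1: the similar region R of node i (Eq. 10)."""
    return np.flatnonzero(sim[i] >= beta1)

colors = np.array([[50.0, 0.0, 0.0],      # node 0
                   [50.0, 0.01, 0.0],     # node 1: nearly identical to node 0
                   [80.0, 20.0, -10.0]])  # node 2: clearly different
sim = similarity(colors)
R = similar_region(0, sim)                # similar region of boundary node 0
```

With these toy colors, only nodes 0 and 1 end up in the similar region of node 0.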
and an image boundary. The boundary connectivity BCon of the similar region R of the boundary node vi is expressed as:

BCon = ( Σ_{vj ∈ R} sim_ij · δ(vj ∈ B0) ) / ( Σ_{vj ∈ R} sim_ij )       (11)

where δ(·) is an indicator function: δ(vj ∈ B0) is 1 when superpixel vj ∈ B0 and 0 otherwise. The numerator is the sum of the similarities of the boundary nodes vj (including vi itself) lying in the intersection of the candidate boundary background set B0 and the similar region R; the denominator is the sum of the similarities of all nodes vj (including vi itself) in the similar region R. As shown in Fig. 2, the boundary connectivity is usually large for a background region and small for an object region.
Fig. 2. An illustrative example of boundary connectivity
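The boundary connectivity of Eq. (11) can be sketched as follows, with a toy similarity matrix and boundary membership (all values illustrative):

```python
# Toy illustration of Eq. (11): boundary connectivity of a similar region R.
import numpy as np

def boundary_connectivity(i, R, sim, in_B0):
    """R: indices of the similar region of node i; in_B0: boolean membership of
    every node in the candidate boundary background set B0."""
    sims = sim[i, R]
    return sims[in_B0[R]].sum() / sims.sum()       # Eq. (11)

sim = np.array([[1.0, 0.9, 0.8],
                [0.9, 1.0, 0.1],
                [0.8, 0.1, 1.0]])
in_B0 = np.array([True, True, False])  # nodes 0 and 1 touch the boundary
R = np.array([0, 1, 2])                # similar region of boundary node 0
bcon = boundary_connectivity(0, R, sim, in_B0)   # (1.0 + 0.9) / (1.0 + 0.9 + 0.8)
```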
4) Calculating the background probability of boundary node vi
The greater the boundary connectivity of the similar region R, the greater the probability that R belongs to the background, and hence the higher the probability that the boundary node vi belongs to the background. In the candidate boundary background set B0, the background probability of the superpixel vi is calculated by:

q(i) = 1 − exp(−BCon²(R) / (2σB²))                                      (12)

In the experiment, we take σB² = 1.
5) Filtering the boundary background seeds and determining the boundary background set B1
In the candidate boundary background set B0, the superpixels vi with a low background probability q(i) are removed to obtain the final boundary background set B1:

B1 = B0 − {vi | q(i) < β2, vi ∈ B0},  β2 ∈ [0.1, 0.4]                   (13)

As shown in Fig. 2, two superpixels at the bottom of the person fall on the image boundary. Compared with the other superpixels in the candidate boundary background set B0, the background probabilities of these two superpixels are the smallest, so they are removed from B0.
3.2.2 Boundary-prior saliency map S_bp1
Considering that an image may contain multiple objects, this paper does not compute a center saliency map based on the convex hull (i.e., taking the center of gravity (x0, y0) of the convex hull as the center of the salient region and assigning each superpixel in the hull a saliency value from its Euclidean distance to that center). Instead, this paper calculates the boundary-prior saliency map S_bp1 based on spatially weighted color contrast. The more a superpixel differs from the boundary background set, the more likely it belongs to the salient objects; if the difference is small, it more likely belongs to the background region. In addition, the contribution of the boundary background set to a superpixel's saliency is also related to the spatial distance: the smaller the spatial distance, the higher the contrast. Taking the boundary background set B1 as a prior, the saliency value of superpixel vi is given by:

S_bp1(i) = s_color(i) · w_dis(i)                                        (14)

s_color(i) = (1/Nb) Σ_{vj ∈ B1} (1 − exp(−d_color(vi, vj) / σ1²))       (15)

w_dis(i) = (1/Nb) Σ_{vj ∈ B1} exp(−d_dis(vi, vj) / σ2²)                 (16)
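Eqs. (12)-(13) can be sketched as follows; σB² = 1 and β2 = 0.2 (within the stated [0.1, 0.4] range) follow the paper, while the BCon values are toy numbers:

```python
# Toy illustration of Eqs. (12)-(13): background probability and seed filtering.
import numpy as np

def background_probability(bcon, sigma_B2=1.0):
    return 1.0 - np.exp(-bcon ** 2 / (2.0 * sigma_B2))   # Eq. (12)

def filter_boundary_seeds(B0, bcon, beta2=0.2):
    """Keep the nodes of B0 whose background probability reaches beta2 (Eq. 13)."""
    q = background_probability(bcon)
    return B0[q >= beta2]

B0 = np.array([0, 1, 2, 3])
bcon = np.array([2.5, 2.0, 0.1, 0.3])  # nodes 2, 3: object parts on the boundary
B1 = filter_boundary_seeds(B0, bcon)   # the two object-like nodes are dropped
```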
where s_color(i) represents the color difference between superpixel vi and the boundary background set B1; w_dis(i) represents the spatial relation between superpixel vi and B1, acting as a distance-contrast balance factor; Nb is the total number of superpixels in the boundary background set B1; d_dis(vi, vj) is the Euclidean distance between the center coordinates of superpixels vi and vj. In the experiment, σ1² = 0.2 and σ2² = 1.5. Finally, S_bp1(i) is normalized.
3.2.3 Background-absorbing saliency map S_ba2
1) Determining the background seed set B
In addition to the boundary background set B1, some background nodes are added as background seeds, and the two sets are combined into a background seed set B to further optimize the algorithm:

B = B1 ∪ {vi | S_bp1(i) < β3 and S_bp1(i) < avg(S_bp1), vi ∉ B0, vi ∉ H}  (17)

In Eq. (17), background nodes are added appropriately in the region outside the candidate boundary background set B0 and the convex hull H. The saliency value S_bp1(i) of an added node vi must be smaller than the threshold β3 and smaller than the average saliency value avg(S_bp1) of all superpixels in that region. In the experiment, β3 = 0.05.
2) Adjacency matrix W of the graph model G
For the undirected graph G = <V, E>, the edge set E is defined as follows: each node is connected not only to its neighboring nodes but also to the neighbors of those neighbors, and all nodes in the boundary background set B1 are connected to each other, which reduces the geodesic distance between similar nodes. The adjacency matrix of the graph model G is W = (wij)k×k, where wij is the weight of the edge (vi, vj) ∈ E, identical to the similarity sim_ij described above:

wij = exp(−d_color(vi, vj) / σ²)                                        (18)

3) Associated matrix A of the background-absorbing Markov chain
With the graph model G, we construct an absorbing Markov chain. The m nodes in the background seed set B are copied and regarded as virtual absorbing nodes. All k nodes of the original image are used as transient nodes, which yields the graph model of the background-absorbing Markov chain, a partially directed graph. Based on the graph G, the connections between the nodes of the absorbing-chain graph are defined as follows.
a) A transient node's own connection: each transient node has a self-loop, that is, it connects to itself with edge weight 1.
b) Connections between transient nodes: the connections between transient nodes are the same as the connections between the corresponding nodes in the undirected graph G.
c) Connections between transient nodes and virtual absorbing nodes: if a transient node has been copied to a virtual absorbing node, the transient node is unidirectionally connected to its virtual absorbing node with edge weight 1, and the other transient nodes connected to that transient node are unidirectionally connected to its virtual absorbing node. These connections are unidirectional, meaning that no absorbing node is connected to any transient node.
d) A virtual absorbing node's own connection: each virtual absorbing node has a self-loop with edge weight 1.
e) Connections between virtual absorbing nodes: there is no connection between any two virtual absorbing nodes.
In the absorbing-chain graph, the total number of nodes is n = k + m. The nodes are rearranged so that the k transient nodes come first and the m absorbing nodes come last. According to the rules described above, the associated matrix A = (aij)n×n of the absorbing-chain graph is defined.
Because the nodes are only locally connected, A is a sparse matrix containing a small number of non-zero elements.
aij = { wij,  i ≤ k, j ≤ k, j ∈ N(i)
      { 1,    i ≤ k, j > k, j = i′
      { wil,  i ≤ k, j > k, i ∈ N(l), j = l′                            (19)
      { 1,    i = j
      { 0,    otherwise

where N(i) represents the set of sequence numbers of the nodes connected to node vi in G, and i′ is the sequence number of the copy (absorbing) node of vi.
4) Transition matrix P of the background-absorbing Markov chain
According to Eq. (3), the transition matrix P of the absorbing-chain graph is obtained. P is a sparse matrix whose element pij ∈ [0, 1] represents the probability of moving from node vi to node vj:

pij = aij / Σj aij                                                      (20)
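The construction of the associated matrix A by rules a)-e) and Eq. (19), followed by the row normalization of Eq. (20), can be sketched as follows; the 3-node graph, its weights, and the choice of node 2 as the only background seed are illustrative assumptions:

```python
# Toy illustration of Eqs. (19)-(20): associated and transition matrices of
# the background-absorbing chain.
import numpy as np

def absorbing_chain(W, neighbors, seeds):
    """W: (k, k) symmetric weights; neighbors[i]: nodes adjacent to i in G;
    seeds: transient nodes copied as absorbing nodes. Returns (A, P)."""
    k, m = W.shape[0], len(seeds)
    A = np.zeros((k + m, k + m))
    for i in range(k):
        A[i, i] = 1.0                          # rule a: transient self-loop
        for j in neighbors[i]:
            A[i, j] = W[i, j]                  # rule b: transient-transient edges
    for a, s in enumerate(seeds):              # seed s is copied to node k + a
        A[s, k + a] = 1.0                      # rule c: seed -> its own copy
        for i in neighbors[s]:
            A[i, k + a] = W[i, s]              # rule c: neighbors -> the copy
    for a in range(m):
        A[k + a, k + a] = 1.0                  # rules d, e: absorbing self-loop only
    P = A / A.sum(axis=1, keepdims=True)       # Eq. (20), row normalization
    return A, P

W = np.array([[0.0, 0.6, 0.0],
              [0.6, 0.0, 0.4],
              [0.0, 0.4, 0.0]])
neighbors = {0: [1], 1: [0, 2], 2: [1]}
A, P = absorbing_chain(W, neighbors, seeds=[2])
```

Note that the connections to the absorbing copy are one-directional: the copy's row contains only its self-loop, so its transition row is the identity row required by Eq. (4).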
5) Calculating the saliency values based on absorbed time
According to Eqs. (4)-(7), the saliency values of the nodes of the original image are obtained, and the background-absorbing saliency map S_ba2 is generated, where S_ba2(i) represents the saliency value of superpixel node vi:

S_ba2(i) = ȳ(i),  i ∈ [1, k]                                            (21)

The larger the absorbed time y(i) of transient node vi, the longer it takes vi to be absorbed into the background seed nodes, and the more likely vi belongs to the salient objects; conversely, the smaller y(i), the shorter the absorption time, and the more likely vi belongs to the background.
3.2.4 Fusion of the boundary-prior and background-absorbing saliency maps
The boundary-prior saliency map S_bp1 is fused with the background-absorbing saliency map S_ba2, with the background-absorbing map given the larger weight, to obtain the first-stage background-based saliency map S_bm. The saliency value of superpixel vi is calculated by:

S_bm(i) = (1 − exp(−θ · S_bp1(i))) · S_ba2(i)                           (22)

where θ is a balance factor; we take θ = 5 in the experiment. The factor (1 − exp(−θ · S_bp1(i))) can suppress, to some extent, the background noise in the background-absorbing saliency map. The obtained background-based saliency map S_bm is also normalized.
3.3 The second stage: foreground-absorbing saliency map S_fm
3.3.1 Determining the foreground seed set F
After the first stage, the main objects are roughly highlighted, but the saliency map S_bm still contains some background noise that should be suppressed; that is, part of the background region is misidentified as salient objects. Moreover, some foreground regions may conversely be mistaken for the background.
Therefore, it is necessary to start from the potential foreground region and perform back propagation based on the absorbing Markov chain, to further improve the background-based saliency map S_bg. To select the foreground seeds, within the range of the convex hull H, the superpixels with large values in the saliency map S_bg are selected into the foreground seed set F:

F = {v_i | S_bg(i) ≥ β4 × max(S_bg) and S_bg(i) ≥ avg(S_bg), v_i ∈ H}    (23)

where max(S_bg) represents the maximum saliency value of the superpixel nodes in the saliency map S_bg, and β4 = 0.7 is taken in the experiment. avg(S_bg) represents the average saliency value of all superpixels in the convex hull H.

3.3.2 Constructing the foreground-absorbing Markov chain

The steps to construct the foreground-absorbing Markov chain are similar to those for the background-absorbing Markov chain. The main differences between the two are as follows. Firstly, at this stage, all the nodes in the foreground seed set F are copied and used as virtual absorbing nodes;
Secondly, after normalizing the absorbed-time vector y, the saliency value of each original image node is calculated according to Eq. (24), thereby obtaining the foreground-absorbing saliency map S_fg, where S_fg(i) represents the saliency value of superpixel node v_i:

S_fg(i) = 1 − y(i),  i ∈ [1, k]    (24)

Thirdly, the larger the y(i) value of transient node v_i, the longer it takes for v_i to be absorbed into the foreground seed nodes, and the more likely v_i belongs to the background; conversely, the smaller the y(i) value, the shorter the absorbed time, and the more likely v_i belongs to a salient object.

3.4 Saliency map fusion and smoothing optimization

3.4.1 Fusion of saliency maps at two stages

The background-based saliency map S_bg and the foreground-absorbing Markov saliency map S_fg are complementary: the former can highlight the salient objects, while the latter can better suppress the background noise. By combining these two saliency maps, we obtain a fused saliency map S:

S = α S_bg + (1 − α) S_fg    (25)

where α is the balance factor. Considering that the contributions of the two saliency maps are the same, we take α = 0.5 in the experiment.

3.4.2 Smoothing optimization of the fused saliency map

In order to highlight the salient objects, weaken the background region, and obtain a more uniform foreground and background, this paper uses a smoothing mechanism to optimize the fused saliency map by solving the optimization problem:

S* = argmin_{S*} ( μ Σ_{i,j} w_ij (S*(i) − S*(j))² + Σ_i (S*(i) − S(i))² )    (26)

where S is the fused saliency map, S = [S(1), S(2), …, S(k)]^T; S* is the final saliency map, S* = [S*(1), S*(2), …, S*(k)]^T. The first term on the right side of Eq. (26) is the smoothness constraint, and the second term is the fitting constraint, where the parameter μ controls the balance between the smoothness constraint and the fitting constraint.
That is to say, a good saliency map S* should not change too much between nearby superpixels, and should not differ too much from the initial saliency map (the fused saliency map S).

The derivative of the objective μ Σ_{i,j} w_ij (S*(i) − S*(j))² + Σ_i (S*(i) − S(i))² is set to 0 to obtain the minimum. Through the transformation, the final saliency map S* is calculated by:

S* = λ (D − W + λI)^(−1) S    (27)

where W is the adjacency matrix of graph G, W = (w_ij)_{k×k}; D is the degree matrix, D = diag(Σ_j w_ij); I is the identity matrix; λ = 1/(2μ), and we take λ = 0.02 in the experiment.

3.5 Algorithm representation

In Sections 3.1~3.4, this paper proposes a saliency detection method via two-stage absorbing Markov chain based on background and foreground. The main steps of the proposed method are summarized in Algorithm 1.

Algorithm 1.
Input: Input image I
1. Use Harris corner detection to construct the convex hull H
2. Segment image I into superpixels
3. Define the candidate boundary background set B0
4. Calculate the background probability of the boundary nodes using boundary connectivity according to Eq. (10)-(12)
5. Filter the boundary background seeds in the candidate boundary background set B0 to determine the boundary background set B1 according to Eq. (13)
6. Calculate the boundary-prior saliency map S_bp1 based on the spatially weighted color contrast according to Eq. (14)-(16)
7. In the region outside the candidate boundary background set B0 and the convex hull H, add the
background nodes appropriately to determine the background seed set B according to Eq. (17)
8. Copy the nodes in the background seed set B as virtual absorbing nodes, and obtain the association matrix A of the background-absorbing Markov chain G according to Eq. (18)-(19)
9. Obtain the transition matrix P of the background-absorbing Markov chain G according to Eq. (3)
10. Generate the background-absorbing saliency map S_bp2 according to Eq. (4)-(7) and Eq. (21)
11. Fuse the saliency maps S_bp1 and S_bp2 to obtain the background-based saliency map S_bg according to Eq. (22)
12. According to Eq. (23), select the foreground seeds within the range of the convex hull H according to the saliency map S_bg, and determine the foreground seed set F
13. Similar to steps 8~9, copy the nodes in the foreground seed set F as virtual absorbing nodes, then obtain the association matrix A and the transition matrix P of the foreground-absorbing Markov chain G
14. Generate the foreground-absorbing saliency map S_fg according to Eq. (4)-(7) and Eq. (24)
15. According to Eq. (25), fuse the two saliency maps S_bg and S_fg to obtain the fused saliency map S
16. According to Eq. (26)-(27), use the smoothing mechanism to optimize the fused saliency map and obtain the final saliency map S*
Output: Final saliency map S*
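The fusion and smoothing steps at the end of the pipeline (Eq. (25)-(27)) can be sketched as follows. This is a minimal NumPy illustration assuming precomputed per-superpixel saliency vectors and an adjacency matrix W; the function name and arguments are illustrative:

```python
import numpy as np

def fuse_and_smooth(S_bg, S_fg, W, alpha=0.5, lam=0.02):
    """Fuse the two stage maps (Eq. 25) and apply the closed-form
    smoothing of Eq. (27): S* = lam * (D - W + lam*I)^(-1) * S."""
    S = alpha * S_bg + (1.0 - alpha) * S_fg       # Eq. (25)
    D = np.diag(W.sum(axis=1))                    # degree matrix
    k = W.shape[0]
    # D - W is the graph Laplacian (positive semi-definite), so adding
    # lam*I makes the system positive definite and safely solvable.
    S_star = lam * np.linalg.solve(D - W + lam * np.eye(k), S)
    # Normalize to [0, 1] for display.
    S_star -= S_star.min()
    if S_star.max() > 0:
        S_star /= S_star.max()
    return S_star
```

A small λ (the paper uses 0.02) lets the smoothness term dominate, pulling neighboring superpixels toward similar values while the fitting term keeps the result anchored to the fused map S.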
4. Experimental Results and Analysis

To verify the performance of the proposed algorithm, it is compared with 12 traditional methods, namely MC [37], DRFI [16], MR [34], SEG [41], BMS [42], SVO [43], DSR [44], PCA [45], HS [46], SF [47], SWD [48], and CA [49]. The experimental images are from ECSSD [50], PASCAL-S [51], and SED2 [52], each of which includes original images and GT maps whose salient regions are manually marked.

The ECSSD dataset is an extended set of the CSSD [53] dataset, with a total of 1000 images. The dataset has large objects and includes semantically rich, complex images covering plants, animals, and human behaviors in natural scenes. The GT maps of all images are marked as binary images with pixel-level precision.

The PASCAL-S dataset is a subset of the PASCAL VOC [54] dataset: 850 images with complex scenes are selected from PASCAL VOC, involving 20 object classes. The GT maps of all images are marked as binary images with pixel-level precision.

The SED2 dataset is a subset of the SED dataset, with a total of 100 images. Although the dataset is small in scale, it is relatively difficult, as it contains small objects and each image contains two objects. The GT maps of all images are marked as binary images with pixel-level precision.

4.1 Qualitative analysis

Qualitative analysis is a rough judgment of macroscopic results by the human eye. Fig. 3 shows a visual comparison of the results obtained by different methods on the three datasets. Qualitative analysis is performed on the experimental results obtained by the various methods. The MC method is currently one of the better-performing methods, but the background in the middle of an image is also easily highlighted due to the inherent characteristics of the absorbing Markov chain. The DRFI method uses a random forest regression algorithm based on multi-feature learning, and uses supervised learning to map regional feature vectors to saliency values.
Overall, the DRFI method has high accuracy, but when the foreground and background change in complex ways, missed or false detections may occur. The MR method uses the manifold ranking principle, which can generally highlight the entire salient region; but when the background is close to the objects in color, or the salient objects are close to the image boundary, some salient regions will be wrongly detected as background. Within the Bayesian framework, the SEG method measures the saliency of an image by comparing the center and surround features of a sliding window. The SEG method does not weaken the background well, and the objects suffer considerable interference from noise. The BMS method is based on the Gestalt principle of figure-ground segregation. In this method, the saliency maps are calculated by analyzing the topological structure of Boolean maps. Since only the global color topological information is used, the color features in BMS are
not sensitive to changes in the pixel direction, which reduces the saliency values of some regions and makes it hard to highlight the salient objects uniformly. The SVO method is a top-down saliency detection algorithm that uses the results of other saliency algorithms as prior knowledge. In this method, the saliency map is largely subject to background interference, and some background regions are wrongly detected as salient objects. Based on a constructed background dictionary, the DSR method obtains the reconstruction coefficients of the image under test and uses the corresponding reconstruction error to describe image saliency; this has a good inhibitory effect on most image backgrounds, but sometimes cannot effectively retain the object boundary. The PCA method uses the pattern and color features of image patches to perform saliency detection; it can detect the outline of salient objects, but does not highlight some internal details. The HS method is based on over-segmentation of an image and uses a three-layer hierarchical model of scales for saliency detection. For some images with a cluttered background, some errors may occur in the detection results obtained by this method. The SF method proposes a saliency-filters algorithm for salient region detection, which can outline the foreground objects, but the inside of the object region is not uniform, and the saliency values of some pixels in the salient region are wrongly suppressed. The SWD method is a saliency detection algorithm based on spatially weighted dissimilarity, and calculates the dissimilarities between patches in a relatively low-dimensional space after dimensionality reduction. The SWD method can predict human fixations, but it loses image details and produces some noise. The CA method combines global prior knowledge with feature-based local contrast, and can detect the object boundary, but cannot uniformly highlight the inner region of the objects.
The proposed method can highlight the salient region in a clear, bright, and uniform manner, and can also effectively suppress the noise of the background region (including the background in the middle of an image). For the case where the foreground objects touch the image boundary, the proposed method can still highlight the salient objects. In addition, for images with multiple objects or complex backgrounds, the proposed method performs better than the other methods. It can be seen from the visual comparison that the saliency maps obtained by the proposed method have the best visual effect, indicating the effectiveness of the proposed algorithm.
Fig. 3. Visual comparisons of saliency maps produced by different algorithms on the ECSSD, PASCAL-S and SED2 datasets
4.2 Quantitative analysis

In the comparison experiment for quantitative analysis, this paper uses the Precision-Recall (PR) curve and the Precision/Recall/F-measure histogram to measure the effectiveness of the proposed algorithm [12, 21]. For the saliency maps obtained by a saliency detection algorithm, the thresholds T ∈ [0, 1, 2, …, 255] are used for binary segmentation, and the Precision and Recall of each saliency map are then calculated at the 256 thresholds according to the segmented binary map and the GT map. Precision is defined as the ratio of the correctly-detected foreground region to the foreground region in the saliency map. Recall is defined as the ratio of the correctly-detected foreground region to the foreground region in the GT map. The Precision and Recall are defined as:

Precision = TP / (TP + FP)    (28)

Recall = TP / (TP + FN)    (29)

where TP refers to the number of pixels that are correctly identified as belonging to the objects, FP refers to the number of pixels that are misidentified as belonging to the objects, and FN refers to the number of pixels that are misidentified as belonging to the non-objects. At the 256 thresholds, we calculate the average values of Precision and Recall over all saliency maps, respectively, and plot the resulting 256 data points on a two-dimensional plane with Recall on the X-axis and Precision on the Y-axis. By smoothing these points, the PR curve is obtained.

When evaluating the saliency effect, the two indicators of Precision and Recall are mutually constrained: when Recall increases, Precision is often reduced. Usually, the F-measure index is used to comprehensively weigh the two indicators. The F-measure is the weighted harmonic mean of Precision and Recall; the larger the F-measure value, the better the performance of the detection algorithm. The F-measure is defined as:

F_β = (1 + β²) × Precision × Recall / (β² × Precision + Recall)    (30)

where β² is the balance factor between Precision and Recall. In this paper, β² is set to 0.3 to increase the weight of Precision, that is, Precision is treated as more important than Recall.

When drawing the Precision/Recall/F-measure histogram, we do not need to sweep the thresholds in [0, 1, 2, …, 255] as for the PR curve. For each saliency map S*, the threshold for binarization is adaptively determined: the adaptive threshold T is defined as twice the average saliency value of all pixels in the saliency map S*, calculated by:

T = (2 / (W × H)) Σ_{x=1}^{W} Σ_{y=1}^{H} S*(x, y)    (31)

where H and W are the numbers of rows and columns of pixels in the saliency map S*, respectively, and S*(x, y) is the saliency value of the pixel at position (x, y) in the saliency map S*. For a saliency detection algorithm, a binary map is obtained via the adaptive threshold T for each saliency map output on the entire image dataset, and a set of Precision, Recall, and F-measure values is calculated. Then, the average values of Precision, Recall, and F-measure over all saliency maps are calculated, and these three averages are represented by a histogram. Finally, each saliency detection algorithm generates these three bars, enabling performance comparison between the various algorithms.

The quantitative comparison is shown in Fig. 4-Fig. 6. Fig. 4 shows the PR curves and the Precision/Recall/F-measure histograms of the various methods on the ECSSD dataset. Fig. 5 and Fig. 6 are the comparison results on the PASCAL-S and SED2 datasets, respectively. On the three datasets ECSSD, PASCAL-S, and SED2, the two-stage MC method shows the best PR curve. On the ECSSD and PASCAL-S datasets, the Precision, Recall, and F-measure values of the two-stage MC method are the largest in the Precision/Recall/F-measure histograms.
On the SED2 dataset, the Precision value of the two-stage MC method in the Precision/Recall/F-measure histogram is slightly smaller than that of the SEG method, and its Recall value is slightly smaller than that of the DRFI method, but the most important F-measure value is the largest. The two-stage MC method proposed in this paper is an improvement on the MC method, so the F-measure values of the two methods are compared to illustrate the performance improvement of the proposed method. On the ECSSD dataset, the F-measure value of the proposed method is 5.21% higher than that of the MC method. On the PASCAL-S dataset, it is 6.05% higher. On the SED2 dataset, it is 5.87% higher. It can be seen from the quantitative comparison that the proposed method can detect the salient regions of images with higher accuracy, and its overall performance is better than that of the other 12 traditional methods.
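The adaptive-threshold evaluation protocol above (Eq. (28)-(31), with β² = 0.3) can be sketched as follows; this is a minimal NumPy illustration, with all array names illustrative:

```python
import numpy as np

def adaptive_fmeasure(sal, gt, beta2=0.3):
    """Binarize a saliency map with the adaptive threshold of Eq. (31)
    (twice the mean saliency) and compute Precision, Recall, and
    F-measure (Eq. 28-30) against the binary ground-truth map."""
    sal = sal.astype(float)
    T = 2.0 * sal.mean()                   # adaptive threshold, Eq. (31)
    pred = sal >= T
    gt = gt.astype(bool)

    tp = np.logical_and(pred, gt).sum()    # correctly detected foreground
    fp = np.logical_and(pred, ~gt).sum()   # background marked as foreground
    fn = np.logical_and(~pred, gt).sum()   # foreground marked as background

    precision = tp / (tp + fp) if tp + fp else 0.0   # Eq. (28)
    recall = tp / (tp + fn) if tp + fn else 0.0      # Eq. (29)
    if precision + recall == 0:
        return precision, recall, 0.0
    f = (1 + beta2) * precision * recall / (beta2 * precision + recall)  # Eq. (30)
    return precision, recall, f
```

For the PR curve, the same TP/FP/FN counting is simply repeated at each fixed threshold T ∈ [0, 255] instead of the adaptive one, and the per-threshold averages over the dataset are plotted.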
Fig. 4. Quantitative comparisons of saliency maps produced by different approaches on ECSSD dataset
Fig. 5. Quantitative comparisons of saliency maps produced by different approaches on PASCAL-S dataset
Fig. 6. Quantitative comparisons of saliency maps produced by different approaches on SED2 dataset
5. Conclusion

This paper proposes a saliency detection method via two-stage absorbing Markov chain based on background and foreground. The proposed method aims to solve the problems of the MC method, such as the inaccurate selection of boundary background seeds, the lack of background seed coverage, and the over-simplified propagation mode. With boundary connectivity, boundary prior, and convex hull techniques, the proposed method excludes the non-background seed nodes on the boundary, appropriately adds background seeds in the region outside the candidate boundary background set and the convex hull, and selects the foreground seeds within the convex hull. Making full use of the complementarity between background-based and foreground-based detection, we perform effective propagation on the two-stage absorbing Markov chain based on background and foreground, and the propagation results undergo fusion and smoothing optimization. Qualitative and quantitative experiments show that, compared with 12 traditional methods, the proposed method can not only effectively highlight the salient regions and suppress the background, but also improve the Precision, Recall, and F-measure values. Future work should focus on higher-level salient feature extraction and apply the relevant theories and methods of machine learning to the field of visual saliency, to further improve the detection performance of the proposed method on complex images.
Acknowledgements This work is supported by the National Natural Science Foundation of China (Grant No. U1831127) and the Natural Science Foundation of the Jiangsu Higher Education Institutions of China (Grant No. 16KJB520020).
References
[1] Koch C, Ullman S. Shifts in selective visual attention: towards the underlying neural circuitry[M]//Matters of Intelligence. Springer, Dordrecht, 1987: 115-141.
[2] Treisman A M, Gelade G. A feature-integration theory of attention[J]. Cognitive Psychology, 1980, 12(1): 97-136.
[3] Itti L, Koch C, Niebur E. A model of saliency-based visual attention for rapid scene analysis[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 1998, 20(11): 1254-1259.
[4] Murray N, Vanrell M, Otazu X, et al. Saliency estimation using a non-parametric low-level vision model[C]. Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2011: 433-440.
[5] Bruce N D B, Tsotsos J K. Saliency, attention, and visual search: An information theoretic approach[J]. Journal of Vision, 2009, 9(3): 5-5.
[6] Harel J, Koch C, Perona P. Graph-based visual saliency[C]. Proceedings of Advances in Neural Information Processing Systems (NIPS), 2007: 545-552.
[7] Gopalakrishnan V, Hu Y, Rajan D. Random walks on graphs to model saliency in images[C]. Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2009: 1698-1705.
[8] Gao D, Vasconcelos N. Bottom-up saliency is a discriminant process[C]. Proceedings of IEEE International Conference on Computer Vision (ICCV), 2007: 1-6.
[9] Mahadevan V, Vasconcelos N. Spatiotemporal saliency in dynamic scenes[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2009, 32(1): 171-177.
[10] Xie Y, Lu H, Yang M H. Bayesian saliency via low and mid level cues[J]. IEEE Transactions on Image Processing, 2013, 22(5): 1689-1698.
[11] Xie Y, Lu H. Visual saliency detection based on Bayesian model[C]. Proceedings of IEEE International Conference on Image Processing, 2011: 645-648.
[12] Achanta R, Hemami S, Estrada F, et al. Frequency-tuned salient region detection[C]. Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2009: 1597-1604.
[13] Hou X, Harel J, Koch C. Image signature: Highlighting sparse salient regions[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2011, 34(1): 194-201.
[14] Liu T, Sun J, Zheng N N, et al. Learning to detect a salient object[C]. Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2007: 1-8.
[15] Liu T, Yuan Z, Sun J, et al. Learning to detect a salient object[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2011, 33(2): 353-367.
[16] Jiang H, Wang J, Yuan Z, et al. Salient object detection: A discriminative regional feature integration approach[C]. Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2013: 2083-2090.
[17] Li G, Yu Y. Visual saliency based on multiscale deep features[C]. Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015: 5455-5463.
[18] Zhao R, Ouyang W, Li H, et al. Saliency detection by multi-context deep learning[C]. Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015: 1265-1274.
[19] Li G, Yu Y. Deep contrast learning for salient object detection[C]. Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016: 478-487.
[20] Wang J, Borji A, Kuo C C J, et al. Learning a combined model of visual saliency for fixation prediction[J]. IEEE Transactions on Image Processing, 2016, 25(4): 1566-1579.
[21] Cheng M M, Mitra N J, Huang X, et al. Global contrast based salient region detection[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2014, 37(3): 569-582.
[22] Qin C, Zhang G, Zhou Y, et al. Integration of the saliency-based seed extraction and random walks for image segmentation[J]. Neurocomputing, 2014, 129: 378-391.
[23] Shehnaz M, Naveen N. An object recognition algorithm with structure-guided saliency detection and SVM classifier[C]. Proceedings of International Conference on Power, Instrumentation, Control and Computing (PICC), 2015: 1-4.
[24] Han S, Vasconcelos N. Object-based regions of interest for image compression[C]. Proceedings of Data Compression Conference, 2008: 132-141.
[25] Papushoy A, Bors A G. Image retrieval based on query by saliency content[J]. Digital Signal Processing, 2015, 36: 156-173.
[26] Sharma G, Jurie F, Schmid C. Discriminative spatial saliency for image classification[C]. Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2012: 3506-3513.
[27] Wang L, Xue J, Zheng N, et al. Automatic salient object extraction with contextual cue[C]. Proceedings of International Conference on Computer Vision (ICCV), 2011: 105-112.
[28] Yang J, Yang M H. Top-down visual saliency via joint CRF and dictionary learning[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2016, 39(3): 576-588.
[29] Wei Y, Wen F, Zhu W, et al. Geodesic saliency using background priors[C]. Proceedings of European Conference on Computer Vision (ECCV), 2012: 29-42.
[30] Gao D, Mahadevan V, Vasconcelos N. The discriminant center-surround hypothesis for bottom-up saliency[C]. Proceedings of Advances in Neural Information Processing Systems (NIPS), 2007: 497-504.
[31] Yang C, Zhang L, Lu H. Graph-regularized saliency detection with convex-hull-based center prior[J]. IEEE Signal Processing Letters, 2013, 20(7): 637-640.
[32] Jiang H, Wang J, Yuan Z, et al. Automatic salient object segmentation based on context and shape prior[C]. Proceedings of British Machine Vision Conference (BMVC), 2011: 1-12.
[33] Shen X, Wu Y. A unified approach to salient object detection via low rank matrix recovery[C]. Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2012: 853-860.
[34] Yang C, Zhang L, Lu H, et al. Saliency detection via graph-based manifold ranking[C]. Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2013: 3166-3173.
[35] Qin Y, Lu H, Xu Y, et al. Saliency detection via cellular automata[C]. Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015: 110-119.
[36] Li H, Lu H, Lin Z, et al. Inner and inter label propagation: salient object detection in the wild[J]. IEEE Transactions on Image Processing, 2015, 24(10): 3176-3186.
[37] Jiang B, Zhang L, Lu H, et al. Saliency detection via absorbing Markov chain[C]. Proceedings of IEEE International Conference on Computer Vision (ICCV), 2013: 1665-1672.
[38] Xie Y, Lu H. Visual saliency detection based on Bayesian model[C]. Proceedings of the 18th IEEE International Conference on Image Processing, 2011: 645-648.
[39] Achanta R, Shaji A, Smith K, et al. SLIC superpixels compared to state-of-the-art superpixel methods[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2012, 34(11): 2274-2282.
[40] Zhu W, Liang S, Wei Y, et al. Saliency optimization from robust background detection[C]. Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2014: 2814-2821.
[41] Rahtu E, Kannala J, Salo M, et al. Segmenting salient objects from images and videos[C]. Proceedings of European Conference on Computer Vision (ECCV), 2010: 366-379.
[42] Zhang J, Sclaroff S. Saliency detection: A Boolean map approach[C]. Proceedings of IEEE International Conference on Computer Vision (ICCV), 2013: 153-160.
[43] Chang K Y, Liu T L, Chen H T, et al. Fusing generic objectness and visual saliency for salient object detection[C]. Proceedings of IEEE International Conference on Computer Vision (ICCV), 2011: 914-921.
[44] Li X, Lu H, Zhang L, et al. Saliency detection via dense and sparse reconstruction[C]. Proceedings of IEEE International Conference on Computer Vision (ICCV), 2013: 2976-2983.
[45] Margolin R, Tal A, Zelnik-Manor L. What makes a patch distinct?[C]. Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2013: 1139-1146.
[46] Yan Q, Xu L, Shi J, et al. Hierarchical saliency detection[C]. Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2013: 1155-1162.
[47] Perazzi F, Krähenbühl P, Pritch Y, et al. Saliency filters: Contrast based filtering for salient region detection[C]. Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2012: 733-740.
[48] Duan L, Wu C, Miao J, et al. Visual saliency detection by spatially weighted dissimilarity[C]. Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2011: 473-480.
[49] Goferman S, Zelnik-Manor L, Tal A. Context-aware saliency detection[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2011, 34(10): 1915-1926.
[50] Shi J, Yan Q, Xu L, et al. Hierarchical image saliency detection on extended CSSD[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015, 38(4): 717-729.
[51] Li Y, Hou X, Koch C, et al. The secrets of salient object segmentation[C]. Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2014: 280-287.
[52] Alpert S, Galun M, Basri R, et al. Image segmentation by probabilistic bottom-up aggregation and cue integration[C]. Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2007: 1-8.
[53] Yan Q, Xu L, Shi J, et al. Hierarchical saliency detection[C]. Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2013: 1155-1162.
[54] Everingham M, Van Gool L, Williams C K I, et al. The PASCAL visual object classes (VOC) challenge[J]. International Journal of Computer Vision, 2010, 88(2): 303-338.
Author contributions
The first author, Wei Tang, contributed the most to this paper. The second author, Zhijian Wang, and the third author, Jiyou Zhai, conducted experiments to verify the effectiveness of the proposed method.
Highlights 1) The boundary-prior saliency map is fused with the background-absorbing saliency map to obtain the first-stage saliency map based on background. 2) Based on the first-stage saliency map, the foreground seeds are selected within the range of the convex hull. 3) Based on spatially weighted color contrast, in the region outside the boundary and the convex hull, some background nodes are added appropriately as background seeds.
Conflict of interest All authors confirm that there is no conflict of interest.