Pattern Recognition 92 (2019) 52–63
SQL: Superpixels via quaternary labeling
Dengfeng Chai
Institute of Spatial Information Technique, Zhejiang University, No. 38, Zheda Road, Hangzhou, Zhejiang 310027, China
Article info
Article history: Received 20 April 2017; Revised 28 April 2018; Accepted 19 March 2019; Available online 19 March 2019.
Keywords: Superpixels; Segmentation; Seaming; Pixel labeling; Graph cuts.
Abstract
This paper formulates superpixel segmentation as a pixel labeling problem and proposes a quaternary labeling algorithm to generate a superpixel lattice. Segmentation is achieved by seaming overlapping patches placed regularly on the image plane. Patch seaming is formulated as a pixel labeling problem in which each label indexes one patch; once the optimal seaming is completed, all pixels covered by one retained patch constitute one superpixel. Further, four kinds of patches are distinguished and assembled into four corresponding layers, and the patch indexes are mapped to quaternary layer indexes. This significantly reduces the number of labels and greatly improves labeling efficiency. Furthermore, an objective function is developed to achieve optimal segmentation. The lattice structure is guaranteed by fixing patch centers to be superpixel centers, compact superpixels are assured by horizontal and vertical constraints enforced on the smooth terms, and coherent superpixels are achieved by iteratively refining the data terms. Extensive experiments on the BSDS data set demonstrate that the SQL algorithm significantly improves labeling efficiency, outperforms the other superpixel lattice methods, and is competitive with state-of-the-art methods that offer no lattice guarantee. A superpixel lattice allows contextual relationships among superpixels to be easily modeled by either MRFs or CNNs.
© 2019 Elsevier Ltd. All rights reserved.
1. Introduction

Superpixels have become an effective alternative to pixels since their introduction [1]. Superpixels have two prime advantages over pixels: perceptual meaning and reduced complexity. In contrast with raw pixels generated by digital sampling, superpixels are formed by pixel grouping whose principles are based on classical Gestalt theory [2], giving superpixels enhanced perceptual meaning. Since many pixels are grouped into one superpixel, the number of superpixels is much smaller than the number of pixels. When superpixels instead of pixels serve as atoms, the size of an image with respect to its atoms is greatly reduced. This size reduction accelerates processing in subsequent tasks and, in turn, makes it possible to employ advanced methods that would be computationally infeasible for the huge number of pixels. For example, compared with a pixel-based convolutional neural network (CNN), a superpixel-based CNN (SuperCNN) enables efficient analysis of large context and is much more effective for salient region detection [3]. A variety of computer vision and pattern recognition problems have benefited from the above advantages [4]: feature extraction [5], clustering [6], classification [7], segmentation [8–10], saliency detection [11], contour detection [12], stereo computation [13–15], objectness measures [16], proposal generation
[17], object localization [18] and object tracking [19–21], to name a few. They also cover domain-specific applications such as remotely sensed image analysis [22,23] and medical image analysis [24,25]. Few approaches produce superpixels that conform to a regular lattice [26–28]. A lattice gives superpixels the same neighborhood system as pixels, which makes it much more convenient to establish their contextual relationships in Markov random field (MRF) modeling [29]. Moreover, a lattice is a prerequisite for some models such as CNNs. For example, without a lattice structure, SuperCNN has to treat the segmented image as a 1D array of superpixels rather than a 2D array; as a result, the structural information of the image is destroyed in SuperCNN [3]. On the other hand, segmentation performance may be sacrificed to some extent in order to maintain the grid structure, and this impaired performance has prevented superpixel lattices from being widely used. It is therefore an important issue to improve segmentation performance while maintaining the lattice structure.

1.1. Pixel labeling

Pixel labeling has been widely studied in the image analysis community [29]. The label of each pixel can denote quantities such as gray level, disparity, category and so on. The label field of an image can be elegantly expressed as a Markov random field (MRF). In a Bayesian framework, pixel labeling is formulated as maximum
a posteriori estimation of the MRF (MAP-MRF), which results in an energy minimization problem. In the past two decades, graph cut and belief propagation algorithms have been developed to solve this energy minimization problem [30]. Benefiting from these effective energy minimizers, pixel labeling has become an effective formulation for many problems such as image restoration [31], stereo matching [32] and image segmentation [33]. The power of Markov random fields stems from the contextual relationships among sites. These relationships depend on a neighborhood system, which is easy to define on a regular grid but not on a general structure. Pixel labeling can be easily extended to superpixel labeling if the superpixels conform to a regular lattice [26–28,34]. Superpixel labeling can then serve as a natural framework for superpixel-based image analysis.

1.2. Superpixel methods

Superpixel segmentation methods can be grouped into different categories according to different criteria. By formulation, graph partition, boundary evolution and data clustering can be distinguished. The first type includes Normalized Cut (NC) [1], Graph-based Superpixels (GS) [35], Lattice Cut (LC) [27], Entropy Rate Superpixels (ERS) [36], Compact Superpixels (CS), Variable Patch Superpixels (VPS) and Constant Intensity Superpixels (CIS) [37], etc. The second formulation covers Turbopixel (TP) [38], Structure-sensitive Superpixels (SS) [39] and Superpixels Extracted via Energy-Driven Sampling (SEEDS) [40]. Simple Linear Iterative Clustering (SLIC) [41] and VCells [42] belong to the last formulation. SLIC, VCells, Mean Shift (MS) [43] and Quick Shift (QS) [44] work in a feature space, whereas most methods, such as TP, work in the original image space. SLIC and SEEDS need region centers and boundaries, respectively, to initialize the segmentation, while NC and GS need no initialization. Most methods, such as GS, place no explicit constraint on a superpixel's spatial extent, while SLIC, CIS and Superpixels via Pseudo-Boolean Optimization (SPBO) [28] prevent each superpixel from extending outside a predefined rectangle. Superpixel Lattice (SL) [26], LC and SPBO guarantee a lattice structure of superpixels, while the other methods produce general superpixels. Although the initial states of some methods are regular grids, the lattice structure is not maintained during their segmentation and postprocessing. For example, the seeds for region growing in TP are distributed on a regular grid [38], the cluster centers in SLIC are also selected regularly [41], the initial boundary to be adjusted by SEEDS is a regular lattice [40], and the initial patches to be seamed by CS, VPS and CIS are the cells of a lattice [37]. However, no constraints are enforced to maintain a lattice structure of the superpixels during segmentation. Such regularity may be further impaired by postprocessing, as in SLIC. Superpixel segmentation conforming to a lattice structure can be viewed as a strip seaming problem, which will be described in Section 2.1. An image is covered by some overlapping horizontal and vertical strips as shown in Fig. 1a. Strip seaming stitches the strips such that the seams are encouraged to align with image edges. Seams determine the strip borders and generate non-overlapping strips. All pixels covered by the same non-overlapping strips constitute one superpixel. SL employs dynamic programming or an s-t min-cut method to find an optimal vertical or horizontal seam with respect to a boundary map of the image [26].
These vertical and horizontal seams are found alternately and progressively. As an improved version, LC finds all the vertical seams or all the horizontal seams as a whole by assigning each pixel a label corresponding to a vertical strip or a horizontal strip [27]. The basic graph construction for this multi-label MRF is similar to Ishikawa's [45], and three extra constraints are introduced to maintain lattice regularity. Superpixels are expected to be coherent based on local color models, and they are encouraged to align with image
Fig. 1. Three types of seaming for superpixel segmentation. (1a) and (1b) depict strip-based seaming and patch-based seaming respectively, (1c) illustrates a novel schema for patch-based seaming. In (1c), all non-overlapped patches are integrated as one layer, and there are only 4 layers.
edges computed as a boundary map. They are introduced into the objective function as data and smooth terms, respectively, and the constructed objective function is optimized via a single graph cut computation. These methods usually produce superpixels with nonuniform sizes since they are dominated by the pre-computed boundary map. SPBO is a different formulation of the aforementioned pixel labeling problem [28]. All strips have a fixed width and neighboring strips overlap with each other. Both horizontal and vertical strips are indexed by consecutive numbers and separated into an odd group and an even group. Both horizontal seaming and vertical seaming are achieved by binary labeling, that is, by assigning each pixel to either an odd strip or an even strip. The horizontal and vertical seams are found independently. The constructed objective function is a pseudo-Boolean function without a data term, and its smooth terms are either submodular or nonsubmodular. Instead of finding whole horizontal or vertical seams, Iterative Refining Superpixel Lattice (IRSL) refines local seams iteratively [34]. The seams are refined to align with image edges indicated by large gradients. All of the above methods find horizontal and vertical seams separately and independently, which is a negative factor for global optimality. Patch seaming is an alternative to strip seaming. As depicted in Fig. 1b, the image is covered by many overlapping patches instead of strips. CS, VPS and CIS find all seams as a whole [37]. However, the seams are neither horizontal nor vertical, so no lattice structure is guaranteed. Moreover, the number of labels equals the number of patches (superpixels), which is quite large in practice. The large number of labels prevents efficient labeling, since the optimization algorithms take either linear or quadratic time with respect to the number of labels [46]. This paper proposes superpixels via quaternary labeling (SQL) to generate a superpixel lattice. The basic formulation is patch seaming. Four kinds of patches are distinguished and assembled into four corresponding layers, as illustrated in Fig. 1c. Superpixel segmentation
is achieved via layer seaming instead of patch seaming, that is, each pixel is assigned to one layer instead of one patch. In this way, only 4 labels are involved, the complexity of optimization is reduced greatly, and the efficiency of segmentation is improved significantly. Further, superpixel centers are fixed and seams are enforced to be either horizontal or vertical; a lattice structure is guaranteed by such centers and seams. Furthermore, the color of each patch is measured by its average color rather than the color of its center, and it is iteratively refined during segmentation, which yields more coherent superpixels. Finally, horizontal and vertical constraints are developed to produce more compact superpixels.
2. Superpixels via pixel labeling

2.1. Segmentation via seaming

Without loss of generality, suppose that the image shown on the right of Fig. 2 needs to be segmented into top and bottom parts as indicated by their colors. Two overlapping patches are selected to cover the top and bottom parts as shown on the left of Fig. 2. These patches are stitched to eliminate their overlap. Seaming is the procedure of finding the optimal dividing line between the two parts and clipping the patches along this dividing line to discard the areas below and above it in the top and bottom patches, respectively. The dividing line is called a seam. Seams constitute region (or superpixel) boundaries. In this image, the optimal seam is the line splitting the blue and red areas. Optimality is reflected by how faithfully the seam fits the image edges. The retained patches cover the whole image without overlap, that is, every pixel is covered by one and only one patch. In this regard, patch seaming assigns a determinate patch to each pixel such that the seams align well with image edges.

Superpixel segmentation can be formulated as either strip seaming or patch seaming. In Fig. 1a, some overlapping horizontal strips and vertical strips are stitched. All pixels lying in both the same retained horizontal strip and the same retained vertical strip constitute one superpixel. In contrast, many overlapping patches are stitched together in Fig. 1b, and all pixels in the same retained patch constitute one superpixel. Horizontal and vertical strips are stitched separately, whereas all patches are seamed as a whole. Since separate and independent labeling is a negative factor for global optimality, patch seaming is preferred to achieve optimal seaming. Therefore, this paper develops methods based on patch seaming instead of strip seaming, and focuses on how to seam patches to achieve global optimality.

2.2. Pixel labeling problem

Let P and L be the sets of all pixels and labels respectively. Pixel labeling is the problem of assigning one label l ∈ L to each pixel p ∈ P so as to fit the observed image as well as possible. The optimal labeling is achieved by minimizing the following energy function:

\hat{f} = \arg\min_f E(f) = \arg\min_f \big( E_d(f) + \lambda E_s(f) \big)    (1)

where f is a specific labeling, E_d(f) is the data term reflecting the agreement between the labeling and the image, E_s(f) is the smooth term reflecting the contextual relationships among neighboring pixels, and λ is a parameter balancing the data term and the smooth term. In a pairwise random field, the data term and smooth term take the following forms:

E_d(f) = \sum_{p \in P} D_p(f_p)    (2)

E_s(f) = \sum_{(p,q) \in N} V_{p,q}(f_p, f_q)    (3)

where D_p(f_p) measures how well f_p fits the observed image at p, and V_{p,q}(f_p, f_q) makes f vary smoothly across neighboring pixels.

2.3. Superpixels via pixel labeling

Since patch seaming assigns a determinate patch to each pixel such that the seams fit the image edges as well as possible, it can be formulated as a pixel labeling problem in which the labels are patch indexes. In a labeled image, the seams (superpixel boundaries) are indicated by all pairs of neighboring pixels with different labels. To allow labels to vary across edges, so that superpixel boundaries are encouraged to align with image edges, contrast-sensitive penalties are introduced into the smooth terms. To generate coherent superpixels, the data terms are constructed based on the similarity and affinity between pixels and superpixels. The details are elaborated in Section 3.4.

The key to the above pixel labeling problem is the energy minimization. The computation time is dominated by the number of labels n, which equals the expected number of superpixels. This is usually several hundred for a general image and may reach millions for a remotely sensed image. Although CS, VPS and CIS exploit patch coverage to reduce the number of pixels involved in each expansion [37], the computation is still slow even for a general image. Most energy minimizers take either linear or quadratic time with respect to n. For example, the complexities of the α-expansion and α-β swap algorithms are O(n) and O(n²) respectively [46]. These algorithms are slow for a large number of labels. Although a bisection technique can be employed to improve labeling efficiency, as in Chai et al. [47], it may impair segmentation performance to some extent. In contrast, the quaternary labeling scheme proposed in Section 3.4 reduces the number of labels without sacrificing segmentation performance.
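For concreteness, the following is a minimal sketch (not the paper's C++ implementation) of how the energy in Eqs. (1)–(3) can be evaluated for a given labeling on a 4-connected grid; the callables `data_cost` and `smooth_cost` stand in for D_p and V_{p,q} and are placeholders for whichever terms are used.

```python
import numpy as np

def labeling_energy(labels, data_cost, smooth_cost, lam=1.0):
    """Evaluate E(f) = E_d(f) + lambda * E_s(f) for a labeling on a 4-connected grid.

    labels      : (H, W) integer array, one label per pixel
    data_cost   : function (y, x, label) -> D_p(f_p)
    smooth_cost : function (y, x, y2, x2, l1, l2) -> V_{p,q}(f_p, f_q)
    """
    H, W = labels.shape
    E_d = sum(data_cost(y, x, labels[y, x]) for y in range(H) for x in range(W))

    E_s = 0.0
    for y in range(H):
        for x in range(W):
            if x + 1 < W:   # right neighbor
                E_s += smooth_cost(y, x, y, x + 1, labels[y, x], labels[y, x + 1])
            if y + 1 < H:   # bottom neighbor
                E_s += smooth_cost(y, x, y + 1, x, labels[y, x], labels[y + 1, x])
    return E_d + lam * E_s
```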
Fig. 2. Segmentation via seaming. Two overlapping patches are seamed together. The seam separates the top and bottom parts coming from the two patches, respectively. Seaming is to find the optimal seam, which is indicated by the border between the blue area and the red area. The optimal seam splits the image into two coherent segments. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

3. Superpixel via quaternary labeling

3.1. Initial patch placement
A regular and even partition of the image plane is depicted in Fig. 3a, where each small square represents one ideal superpixel and its center is depicted as a circle. When their centers are kept fixed and their side lengths are multiplied by 2, these squares serve as the initial patches, as depicted in Fig. 3b, where each patch is filled with a specific color. Since the side lengths are doubled, the enlarged squares overlap with each other. The pixels in the center area are covered by four patches, the pixels in the four border
Fig. 3. Patch placement. (3a) is a regular and even partition of the image plane. When their centers are fixed and their side lengths are multiplied by 2, the squares serve as the initial patches depicted in (3b). Each patch has one specific color, and the illustrated colors are mixtures of the colors of the overlapping patches.
areas are covered by two patches, and the pixels in the four corner areas are covered by only one patch. Therefore, pixels take different colors when they are covered by different sets of patches. The only free parameter for initial patch placement is the square's side length. It can be specified directly or calculated from the number of superpixels. Let N be the total number of pixels in the image and K be the expected number of superpixels; the average number of pixels in each superpixel is N/K, and the side length is d = \sqrt{N/K} pixels. Let N_r and N_c be the total numbers of rows and columns in the image; the numbers of rows and columns of patches are R = N_r/d and C = N_c/d, respectively.
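A minimal sketch of this initialization step follows; the rounding of d and the ceiling division for R and C are assumptions, since the paper only states d = √(N/K), R = N_r/d and C = N_c/d.

```python
import math

def init_patch_grid(n_rows, n_cols, K):
    """Compute the patch side length d and the patch grid size (R, C).

    n_rows, n_cols : image size in pixels (N_r, N_c)
    K              : expected number of superpixels
    """
    N = n_rows * n_cols
    d = max(1, round(math.sqrt(N / K)))   # side length of one ideal superpixel
    R = math.ceil(n_rows / d)             # rows of patches
    C = math.ceil(n_cols / d)             # columns of patches
    return d, R, C

# Example for a BSDS image segmented into about 500 superpixels:
# init_patch_grid(321, 481, 500) -> (18, 18, 27)
```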
Fig. 4. Patch assembling. All patches are assembled into four layers indicated by different colors, and the patches within each layer do not overlap with each other. The white region in each layer indicates the border area not covered by its patches.
3.2. Patch assembling As shown in Fig. 3, all patches are placed row by row and column by column. Both the rows and the columns can be distinguished as odd ones and even ones. All patches can be classified into four groups:
1. odd row and odd column;
2. odd row and even column;
3. even row and odd column;
4. even row and even column.
All patches in one group are assembled into one layer, as illustrated in Fig. 4. Since the patch side length is twice the distance between neighboring patch centers, the interleaved patches within a layer do not overlap with each other. This means that a unique patch can be determined for each pixel once its layer is given. Accordingly, a layer seaming method is proposed to replace the patch seaming method: instead of assigning an initial patch to each pixel directly, a layer is assigned to each pixel. Once a layer is assigned to a pixel, the original patch in this layer can be determined for that pixel, since the pixel is covered by only one patch in each layer.

3.3. Label mapping

Let the patches be indexed by {0, 1, . . . , R·C − 1}. These patch indexes are mapped to layer indexes. Since there are only four layers, only four labels are needed to distinguish the layers. The labels {0, 1, 2, 3} are called quaternary labels and, correspondingly, the pixel labeling is called quaternary labeling. An original label l ∈ L = {0, 1, . . . , R·C − 1} is mapped to a quaternary label l′ ∈ L′ = {0, 1, 2, 3} by

l′ = \delta_p(l / C) \cdot 2 + \delta_p(l \% C)    (4)

where δ_p(x) is a parity indicator that takes 0 when x is even and 1 when x is odd, and / and % denote integer division and the modulo operator, respectively.
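The following is an illustrative sketch of the mapping in Eq. (4) and of recovering a pixel's patch index from its quaternary label. The recovery step uses the patch geometry of Section 3.1 (patches of side 2d centered on the cells of a d-spaced grid); the function names and the border handling are assumptions, not the paper's implementation.

```python
def to_quaternary(l, C):
    """Map an original patch index l in {0, ..., R*C-1} to a quaternary label (Eq. 4)."""
    return ((l // C) % 2) * 2 + (l % C) % 2

def recover_patch_index(y, x, q, d, R, C):
    """Recover the original patch index covering pixel (y, x), given its layer q.

    A pixel is covered by at most two patch rows and two patch columns, whose
    indexes have different parities; the two bits of q select one of each.
    q is assumed to be a layer that actually covers (y, x).
    """
    row_parity, col_parity = q // 2, q % 2
    # Candidate patch rows/columns whose enlarged patches cover the pixel, clamped to the grid.
    rows = {min(max(r, 0), R - 1) for r in ((y - d // 2) // d, (y + d // 2) // d)}
    cols = {min(max(c, 0), C - 1) for c in ((x - d // 2) // d, (x + d // 2) // d)}
    # At image borders only one parity is geometrically valid; fall back to it.
    r = next((r for r in rows if r % 2 == row_parity), min(rows))
    c = next((c for c in cols if c % 2 == col_parity), min(cols))
    return r * C + c
```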
Fig. 5. Label mapping. (5a) distinguishes the corner, border, and center regions covered by 1, 2 or 4 patches. (5b) enumerates four kinds of patches according to the parity of their corners.
The inverse mapping from quaternary labels to original labels does not exist in general. However, for each pixel the original label can still be recovered from its quaternary label. As indicated by the numbers in Fig. 5a, each pixel is covered by at most four patches. Further, the row and column indexes of these covering patches have different parity combinations; that is, the covering patches lie in different layers. Restricted to the patches covering a given pixel, the mapping is therefore injective and its inverse exists, so the patch index can be recovered from the quaternary label.

3.4. Quaternary labeling

Let S be the intersection of four patches, that is, S is a square whose corners are the centers of four patches labeled 0, 1, 2 and 3. As depicted in Fig. 5b, there are four kinds of S according to their corners' parity. It is necessary to distinguish them in order to define the data and smooth terms correctly.
Fig. 6. Constraints of quaternary labeling. The constraints are depicted for an intersection of four patches labeled 0, 1, 2, 3 and indicated by red, green, blue and cyan circles. The black line in (6a) separates pixels horizontally into a left part and a right part labeled {0, 2} and {1, 3} respectively; the black line in (6b) separates pixels vertically into a top part and a bottom part labeled {0, 1} and {2, 3} respectively. (6c) shows disconnected superpixels resulting from these constraints. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
Without loss of generality, suppose that the top-left, top-right, bottom-left and bottom-right corners are labeled 0, 1, 2 and 3 respectively, as shown in Fig. 6. To assure the lattice structure, the square should be split along horizontal and vertical seams, as depicted in Fig. 6a and b respectively. Labels can change from 0, 2 to 1, 3 across horizontal lines, or from 0, 1 to 2, 3 across vertical lines, but they cannot change in the reverse direction. These constraints will be utilized to construct the smooth terms elaborated in Section 3.4.2. However, as illustrated in Fig. 6c, these constraints are not enough to enforce connectivity; this is repaired by an extra labeling step described in Section 3.4.3.

3.4.1. Data term

The color difference between a pixel p and a superpixel (i.e. a patch labeled f_p) is used to measure their similarity. The spatial distance between the pixel and the superpixel is used to measure their affinity. Similarity and affinity are combined to measure how well a label f_p ∈ {0, 1, 2, 3} fits the observed image at p. The data term in Eq. (2) is defined as follows:

D_p(f_p) = \Delta r^2 + \Delta g^2 + \Delta b^2 + \Delta x^2 + \Delta y^2    (5)

with

\Delta r = r_p - r_{s(f_p)}    (6)

\Delta g = g_p - g_{s(f_p)}    (7)

\Delta b = b_p - b_{s(f_p)}    (8)

\Delta x = k\,(x_p - x_{s(f_p)})    (9)

\Delta y = k\,(y_p - y_{s(f_p)})    (10)

where \Delta r, \Delta g, \Delta b, \Delta x, \Delta y denote the differences in the red, green and blue channels and in the x and y coordinates respectively, the subscript p denotes a pixel value at p, k is a scaling factor balancing colors against coordinates, and s(f_p) denotes the superpixel labeled f_p. The formulation can easily be extended to multi-spectral images. By minimizing the data term, a pixel is grouped into a superpixel with minimal color difference and spatial distance; in this way, coherent superpixels are assured. At the beginning, the color of a superpixel (patch) is initialized to the color of its central pixel. Once all pixels are labeled, the color of a superpixel is revised to its average color. Then, based on the new superpixel colors, the energy can be updated and minimized again. In this way, the superpixel colors and labels are refined iteratively. As supported by the evaluation in Section 4.3, two iterations are enough to converge. Unlike the colors, both x_{s(f_p)} and y_{s(f_p)} are fixed to the patch's central coordinates; such fixed coordinates facilitate patch assembling and pixel labeling.

3.4.2. Smooth term

To make f vary smoothly across neighboring pixels, a penalty is introduced into V_{p,q}(f_p, f_q) when f_p ≠ f_q. That is, the smooth term is built upon the Potts model [48]:

V_{p,q}(f_p, f_q) = \beta_{p,q}\,(1 - \delta(f_p, f_q))    (11)

where the Kronecker delta δ(f_p, f_q) equals one whenever f_p = f_q and zero otherwise, and the contrast-sensitive penalty β_{p,q} is calculated from the color difference between neighboring pixels [33]:

\beta_{p,q} = \exp\!\left( -\frac{\| [r_p, g_p, b_p]^T - [r_q, g_q, b_q]^T \|^2}{\beta} \right)    (12)

where [r_p, g_p, b_p] and [r_q, g_q, b_q] are the color vectors of p and q respectively, the Euclidean distance is adopted to measure color difference, and β is the color difference averaged over the whole image. When p and q straddle an image edge, [r_p, g_p, b_p] and [r_q, g_q, b_q] are usually quite different; correspondingly, β_{p,q} is small and contributes little to the smooth term. This allows p and q to take different labels without a significant penalty and thereby encourages superpixel boundaries to align with image edges.

Further, the above smooth term is improved to reflect both a horizontal constraint and a vertical constraint. Without loss of generality, let the square illustrated in Fig. 6 cover both p and q, and let q be the right neighbor or the bottom neighbor of p. For a right-neighbor pair, the smooth term reflecting the horizontal constraint is

V_{p,q}(f_p, f_q) = \begin{cases} +\infty & \text{if } f_p \% 2 > f_q \% 2 \\ \beta_{p,q}\,(1 - \delta(f_p, f_q)) & \text{otherwise} \end{cases}    (13)

Similarly, for a bottom-neighbor pair, the smooth term reflecting the vertical constraint is

V_{p,q}(f_p, f_q) = \begin{cases} +\infty & \text{if } f_p / 2 > f_q / 2 \\ \beta_{p,q}\,(1 - \delta(f_p, f_q)) & \text{otherwise} \end{cases}    (14)
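As an illustration, the following sketch evaluates the data term of Eqs. (5)–(10) and the constrained smooth term of Eqs. (11)–(13) for individual pixels and pixel pairs. It is a simplified reading of the formulas (the Eq. (15) correction and the per-square parity bookkeeping are omitted), and all function names are placeholders rather than the paper's implementation.

```python
import numpy as np

INF = 1e9  # stands in for the infinite penalty

def data_term(pixel_rgb, pixel_xy, sp_rgb, sp_center_xy, k):
    """D_p(f_p) of Eq. (5): squared color difference plus scaled squared spatial distance."""
    dr, dg, db = np.asarray(pixel_rgb, float) - np.asarray(sp_rgb, float)
    dx, dy = k * (np.asarray(pixel_xy, float) - np.asarray(sp_center_xy, float))
    return dr**2 + dg**2 + db**2 + dx**2 + dy**2

def contrast_penalty(rgb_p, rgb_q, beta):
    """beta_{p,q} of Eq. (12); beta is the mean squared color difference over the image."""
    diff2 = float(np.sum((np.asarray(rgb_p, float) - np.asarray(rgb_q, float)) ** 2))
    return np.exp(-diff2 / beta)

def smooth_term_horizontal(f_p, f_q, rgb_p, rgb_q, beta):
    """V_{p,q} of Eq. (13) for q being the right neighbor of p within the same square.

    Labels may change from the left layers {0, 2} to the right layers {1, 3},
    but not in the reverse direction.
    """
    if f_p % 2 > f_q % 2:
        return INF
    return contrast_penalty(rgb_p, rgb_q, beta) * (0.0 if f_p == f_q else 1.0)
```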
Furthermore, the above smooth term is improved to correct an inconsistency of quaternary labeling illustrated in Fig. 7. Without loss of generality, let two different squares cover p and q respectively, and let q be the right neighbor of p. Both p and q are labeled 0 and shown
Fig. 7. Inconsistency of quaternary labeling. The inconsistency is demonstrated by the red pixels, which have the same label but belong to different superpixels. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
Fig. 8. Procedure of the SQL algorithm: (a) input image, (b) first labeling, (c) second labeling, (d) third labeling, (e) inverse mapping, (f) final result. (8a) is the input to the SQL algorithm, (8b) and (8c) are the results at step 20 after one and two iterations respectively, and (8d) and (8e) are the results at steps 29 and 30, respectively. The segmented superpixels are depicted in (8f), in which the superpixel boundaries are drawn as green lines and the original patch centers are drawn as red circles. The blue circles and green circles in (8c) and (8d) indicate some corrected labels compared with their former states. The red circles depict the lattice structure. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
as red pixels; however, they belong to different patches and correspond to different original labels. To correct this inconsistency of quaternary labeling, the smooth term is constructed as follows:

V_{p,q}(f_p, f_q) = \begin{cases} \beta_{p,q} & \text{if } f_p = 0,\ f_q = 0 \\ \beta_{p,q} & \text{if } f_p = 2,\ f_q = 2 \\ \beta_{p,q}\,(1 - \delta(f_p, f_q)) & \text{otherwise} \end{cases}    (15)

The smooth term for a vertical pair of neighboring pixels can be constructed in a similar way. It should be pointed out that the conditions on the right of the above equations depend on the parity of the patches and vary from patch to patch. Therefore, the patch parity needs to be checked in order to develop correct smooth terms.

3.4.3. Connectivity repairing

As depicted in Fig. 6c, horizontal and vertical seams may intersect at more than one point. In this case, more than four subregions are generated and unconnected superpixels are produced. To repair the connectivity, all pixels connected to the superpixel centers are fixed and the remaining pixels are relabeled in an extra step. The relabeling is achieved by minimizing an energy function with smooth terms only. Since a penalty is introduced into the energy for each pair of neighboring pixels with different labels, a unique label is expected to be optimal for a connected component. By minimizing the smooth terms, the component is merged into one of its neighbors. In Fig. 6c, the red, green, blue and cyan regions around the four corners are fixed, and the other two components are relabeled; both are merged into their neighbors. This relabeling assures the connectivity of superpixels.

3.4.4. Energy minimization

The optimal labeling can be achieved by minimizing the energy function with the above data and smooth terms. Since the number of labels is reduced to 4, both the α-expansion and the α-β swap algorithm are efficient energy minimizers [46]. The expansion algorithm is employed to find an optimal labeling. From an initial labeling, it performs a series of expansion moves to reduce the energy. In each expansion move, all pixels either change their labels to α or keep their old labels. Each expansion move is solved by minimizing the involved energy via an s-t cut algorithm. The expansion move is looped over α = 0, 1, 2, 3, and this loop is carried out iteratively until a local minimum is reached.

3.5. SQL algorithm

The above sections are summarized by the pseudocode in Algorithm 1. It requires five inputs: I is an image, K is the expected number of superpixels, T is the expected number of iterations, λ is the parameter balancing the data and smooth terms, and k is the parameter balancing colors and spatial coordinates. The first part performs initialization. The outer for-loop (lines 5–20) iteratively refines the labeling; the inner while-loop corresponds to α-expansion, and the details of lines 10 and 11 are described in Section 3.4. The for-loop (lines 22–29) repairs connectivity. Fig. 8 demonstrates the procedure of SQL by depicting the intermediate results. The quaternary labelings after one and two iterations are presented in Fig. 8b and c respectively. The connectivity-repaired labeling and the final labeling with original labels are presented in Fig. 8d and e respectively.

4. Experimental results

4.1. Experiment setup

All experiments are carried out on a computer with an Intel Core i7-4770 CPU @ 3.40 GHz and 8 GB RAM running Windows 7. Both the SQL algorithm and the superpixel evaluation code are implemented in C++, and no parallelization is employed. For simplicity, λ in Eq. (1) is set to 1. To be adaptive to variable patch sizes, k in Eqs. (9) and (10) is set to 20/\sqrt{N/K}, where N is the total number of pixels in the image and K is the expected number of superpixels.

4.1.1. Data sets

The Berkeley Segmentation Dataset (BSDS) [49,50] is employed to test and evaluate the methods. BSDS is widely used for superpixel evaluation. It consists of 500 natural images, each with a size of either 481 × 321 or 321 × 481. These images have been manually segmented, and the human annotations serve as a benchmark for evaluating segmentation methods.
Fig. 9. Strip seaming and patch seaming. Green lines depict superpixel boundaries. The left image (9a) is generated by horizontal and vertical strip seaming (HV) and the right image (9b) is generated by patch seaming (SQL). (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
Fig. 10. Evaluation of superpixels based on strip seaming and patch seaming. BR, UE, ASA and EV are reported against the number of superpixels for superpixels generated by strip seaming and patch seaming. HV indicates horizontal and vertical strip seaming, whereas SQL indicates patch seaming.
4.1.2. Metrics

Four metrics are employed to measure the performance of superpixel segmentation. Boundary Recall (BR) [36,38,40,41] and Undersegmentation Error (UE) [37,38,40,41] are criteria of adherence to boundaries. Achievable Segmentation Accuracy (ASA) [36,40] is a criterion of segmentation accuracy. Explained Variation (EV) [26] measures superpixels' coherency. BR calculates the coincidence between the ground-truth boundary and the superpixel boundary as
BR = \frac{\sum_{p \in B_g} \mathbb{I}\big[\exists\, q \in B_s:\ \|p - q\| < \epsilon\big]}{|B_g|}    (16)
where B_g and B_s are the sets of pixels on the ground-truth boundary and the superpixel boundary respectively, \mathbb{I}[x] is an indicator function taking 0 and 1 when x is false and true respectively, and ε = 2 as in the literature [36,40,41]. UE calculates the fraction of superpixel area leaking across the boundaries of ground-truth segments as
UE = \frac{\sum_i \sum_{j:\, s_j \cap g_i \neq \emptyset} |s_j - g_i|}{\sum_i |g_i|}    (17)
where s_j and g_i are a superpixel and a ground-truth segment respectively. ASA is an upper bound of performance when superpixels serve as units for object segmentation. It is calculated as
ASA = \frac{\sum_j \max_i |s_j \cap g_i|}{\sum_i |g_i|}    (18)
EV describes the proportion of image variation that is explained when the image is compressed using superpixels as the units of representation. It is calculated as
EV = \frac{\sum_i (\mu_i - \mu)^2}{\sum_i (x_i - \mu)^2}    (19)

where x_i is the value of pixel i, μ is the global mean, and μ_i is the mean of the superpixel containing pixel i. Unlike the other metrics, this one is human independent: it is not susceptible to the variation in manual annotations resulting from human perception.
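As an illustration of Eqs. (17)–(19), the following sketch computes UE, ASA and EV from a superpixel label map and a ground-truth label map with numpy; it is a straightforward reading of the formulas, not the paper's C++ evaluation code, and it assumes contiguous non-negative labels.

```python
import numpy as np

def ue_asa(sp, gt):
    """Undersegmentation Error (Eq. 17) and Achievable Segmentation Accuracy (Eq. 18).

    sp, gt : integer label maps of the same shape (superpixels / ground-truth segments)
    """
    # Contingency table: overlap[j, i] = |s_j ∩ g_i|
    n_sp, n_gt = sp.max() + 1, gt.max() + 1
    overlap = np.zeros((n_sp, n_gt), dtype=np.int64)
    np.add.at(overlap, (sp.ravel(), gt.ravel()), 1)

    sp_size = overlap.sum(axis=1)           # |s_j|
    total = overlap.sum()                   # Σ_i |g_i| = number of pixels
    # UE: for every overlapping pair (j, i), the leaked area is |s_j| - |s_j ∩ g_i|
    leak = np.where(overlap > 0, sp_size[:, None] - overlap, 0).sum()
    ue = leak / total
    # ASA: each superpixel is credited to the ground-truth segment it overlaps most
    asa = overlap.max(axis=1).sum() / total
    return ue, asa

def explained_variation(image, sp):
    """Explained Variation (Eq. 19) for a gray-level image and a superpixel label map."""
    x = image.astype(float).ravel()
    labels = sp.ravel()
    mu = x.mean()
    sums = np.bincount(labels, weights=x)
    counts = np.maximum(np.bincount(labels), 1)   # guard against unused labels
    mu_i = (sums / counts)[labels]                # per-pixel superpixel mean
    return np.sum((mu_i - mu) ** 2) / np.sum((x - mu) ** 2)
```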
4.2. Strip seaming vs. patch seaming

As stated in Section 2.1, superpixel segmentation can be formulated as either strip seaming or patch seaming. Horizontal and vertical strips are seamed separately, whereas patches are seamed together as a whole. The claim that patch seaming outperforms strip seaming is supported by the comparison and evaluation depicted in Figs. 9 and 10 respectively. In this experiment, both the data term and the smooth term are constructed as described in Sections 3.4.1 and 3.4.2 respectively. The objective function is optimized in two ways. One is by the SQL algorithm (Algorithm 1). The other is by two binary labelings: in the first labeling, either {0, 1} or {2, 3} is assigned to each pixel so that it is classified into either the top part or the bottom part; in the second labeling, either {0, 2} or {1, 3} is assigned to each pixel so that it is classified into the left part or the right part. From the two binary labelings, each pixel gets its determinate label. The first and second binary labelings correspond to horizontal and vertical seaming respectively, whereas SQL corresponds to patch seaming. The comparison reveals that the quality of quaternary labeling is better than that of the two binary labelings even though the objective function is the same. In other words, patch seaming outperforms strip seaming.

4.3. Refining superpixel colors

As stated in Section 3.4.1, the color of a superpixel is initialized from its center and refined iteratively from all of its labeled pixels. The superpixel color is used as a reference of similarity in the
Fig. 11. Performance evaluation against the number of iterations for refining patch colors and data terms. BR, UE, ASA and EV are reported for superpixels generated by the SQL algorithm with different numbers of iterations. SQL(0), SQL(1) and SQL(2) indicate no, one and two refinements respectively. The black lines are covered by green lines. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
Algorithm 1 SQL algorithm.
Input: I, K, T, λ, k.
Output: Superpixel indexes for all pixels.
1: Calculate d, R, C;
2: Partition the image plane into R·C square patches;
3: Initialize the label of each pixel according to the initial partition;
4: Initialize the color of each patch to be its central pixel's color;
5: for i = 1 to T do
6:   success = 1;
7:   while success do
8:     success = 0;
9:     for α = 0 to 3 do
10:      Calculate D_p(f_p) for each pixel p;
11:      Calculate V_{p,q}(f_p, f_q) for each pair of neighboring pixels p, q;
12:      f̂ = arg min E(f′) subject to f′ being an α-expansion of f;
13:      if E(f̂) < E(f) then
14:        f = f̂;
15:        success = 1;
16:      end if
17:    end for
18:  end while
19:  Update the color of each patch by its averaged color;
20: end for
21: Grow pixels from the patch centers to find connected components.
22: for α = 0 to 3 do
23:   Calculate D_p(f_p) for each pixel p in disconnected components;
24:   Calculate V_{p,q}(f_p, f_q) for each pair p, q in (or linked to) disconnected components;
25:   f̂ = arg min E(f′) subject to f′ being an α-expansion of f;
26:   if E(f̂) < E(f) then
27:     f = f̂;
28:   end if
29: end for
30: Map the quaternary label to the superpixel index for each pixel.
31: return Superpixel indexes for all pixels.
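To make the control flow of Algorithm 1 concrete, here is a minimal Python sketch of the main loop (lines 5–20). The callables data_costs, smooth_costs, expand_label and refresh_colors are placeholders for Eqs. (5)–(15), one graph-cut expansion move, and the color update; they are assumptions, not the paper's C++ API, and the connectivity-repair pass and the inverse label mapping are omitted.

```python
import numpy as np

def sql_main_loop(image, init_labels, patch_centers, T, k, lam,
                  data_costs, smooth_costs, expand_label, refresh_colors):
    """Iteratively refine the quaternary labeling (lines 5-20 of Algorithm 1).

    image         : (H, W, 3) array
    init_labels   : (H, W) initial quaternary labels in {0, 1, 2, 3}
    patch_centers : fixed patch center coordinates (never updated)
    """
    labels = init_labels.copy()
    # Line 4: superpixel colors start as the colors of the patch centers.
    colors = np.array([image[y, x] for (y, x) in patch_centers], dtype=float)

    for _ in range(T):                      # outer refinement loop (lines 5-20)
        improved = True
        while improved:                     # sweep until no expansion lowers E
            improved = False
            for alpha in range(4):          # only four labels are needed
                D = data_costs(image, colors, patch_centers, k)    # Eqs. (5)-(10)
                V = smooth_costs(image, lam)                       # Eqs. (11)-(15)
                labels, lowered = expand_label(labels, alpha, D, V)
                improved = improved or lowered                     # accept if E decreased
        # Line 19: refresh each patch's color with the mean color of its pixels.
        colors = refresh_colors(image, labels, patch_centers)
    return labels
```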
data term. As illustrated in Fig. 11, the first refinement improves segmentation performance significantly, as indicated by the difference between the green lines and the red lines. However, no observable improvement is achieved by the second refinement, as the black lines are covered by the green lines. This means that two iterations are enough for convergence. Therefore, T = 2 is recommended for the SQL algorithm.
4.4. Effects of balancing parameters

In Eq. (1), the data and smooth terms are balanced by λ. When λ = 0, the smooth terms do not contribute to the objective function at all; when λ → ∞, the relative contribution of the data term approaches 0. In between, the contribution of the smooth terms increases as λ increases. Both the color and the spatial coordinates of each pixel contribute to the data term in Eq. (5). Their balance is controlled by k in Eqs. (9) and (10). The effect of k is similar to that of λ. When k = 0, the spatial coordinates do not contribute to the data term. Their contribution increases as k increases and finally dominates the data term when k → ∞. λ = 0, 1, 10, 100, 1000 and k = 0, 1, 10, 100, 1000 are input to the SQL algorithm. Segmentation results under these parameters are presented in Fig. 12 to demonstrate their effects. When λ = 0, the superpixel boundaries are very noisy since no smooth term is involved at all. The boundaries become smoother as λ increases. On the other hand, superpixels become more uniform and compact as k increases. When k = 1000, the data term of each pixel is dominated by its spatial coordinates; as a result, it produces a pure superpixel lattice, as shown in the last column.

4.5. Comparison with state-of-the-art

4.5.1. Comparison with seaming methods

SQL is compared with LC [27], SPBO [28], IRSL [34] and CIS [37], whose code is available at the websites listed below. LC, SPBO and IRSL are based on strip seaming whereas CIS is based on patch seaming. CS, VPS and CIS are formulated in the same framework, but only CIS is selected for comparison since it has the better performance among them. Superpixels are generated by the publicly available codes; the code for LC is an executable, whereas the C++ codes for SPBO and CIS are built to generate superpixels. All parameters are set to their default values. The superpixels of three images are presented in Fig. 13 for visual comparison. Both a coarse segmentation and a fine segmentation of each image are presented for illustration. As shown, LC, SPBO, IRSL and SQL generate superpixel lattices. An interesting characteristic of LC is that the superpixel boundaries are straight lines in homogeneous regions. However, these superpixels are uneven, since strips are not enforced to have an explicit width in its formulation: some neighboring boundaries are very close while others are far apart. This drawback is overcome by SPBO since
LC code: http://pvl.cs.ucl.ac.uk/
SPBO code: http://yuhang.rsise.anu.edu.au/yuhang/
CIS code: http://www.csd.uwo.ca/faculty/olga/Code/superpixels1pt1.zip
Fig. 12. Effect of the balancing parameters. The superpixel boundaries change as λ varies row by row and k varies column by column. λ = 0, 1, 10, 100, 1000 from top to bottom; k = 0, 1, 10, 100, 1000 from left to right.
Fig. 13. Visual comparison of different superpixels. From left to right, the columns are superpixels generated by LC [27], SPBO [28], IRSL [34], CIS [37], SEEDS [40], SLIC [41] and SQL. The six rows demonstrate three examples from BSDS; 321 × 481 pixels are grouped into about 200 and 1000 superpixels in the two rows of each example, respectively.
Fig. 14. Performance evaluation of seaming methods against BSDS. BR, UE, ASA and EV are reported for superpixels based on strip seaming (LC [27], SPBO [28], IRSL [34]), patch seaming (CIS [37]), and layer seaming (SQL).
Fig. 15. Performance evaluation of state-of-the-art general superpixel methods against BSDS. BR, UE, ASA and EV are reported for superpixels generated by SEEDS [40], SLIC [41] and SQL.
strips are enforced to have a limited and uniform width. However, SPBO is sensitive to noise and generates noisy boundaries even in homogeneous regions. The boundaries of superpixels generated by IRSL are smoother than those of SPBO and more even than those of LC. However, these superpixels are not uniform in size, since superpixel colors are not modeled: from the viewpoint of pixel labeling, the model for IRSL has only smooth terms but no data terms. In contrast, both data and smooth terms are carefully developed for SQL. Further, instead of separate horizontal and vertical labelings, pixel labeling is achieved by SQL as a whole, which favors global optimality. As expected, SQL generates an even superpixel lattice and uniform superpixels. Although CIS superpixels are produced via seaming a regular grid of patches, not all patches are required to be present in the final segmentation; as a result, CIS does not conform to a regular lattice. Their evaluation is presented in Fig. 14. As a whole, layer seaming is better than strip seaming. That is, SQL has superior performance to LC, SPBO and IRSL, except that SQL's BR and LC's BR are competitive. When images are segmented into a small number of superpixels, SQL's BR is lower than LC's BR. As the number of superpixels increases, SQL's BR increases faster than LC's, and the former exceeds the latter when n is about 600. One should note that LC is based on a pre-computed boundary map, which improves BR. Although patches are organized and assembled into four layers to reduce the number of labels, the performance of patch seaming is not sacrificed by layer seaming. SQL has superior BR, UE and ASA to CIS; its EV is slightly inferior to that of CIS. That is, SQL assures both the lattice structure and better performance.
4.5.2. Comparison with general methods

SQL is compared with SEEDS [40] and SLIC [41], two state-of-the-art methods for superpixels without a lattice structure [4]; their code is available at the websites listed below. Starting from some initial centers, SLIC iteratively clusters pixels and adjusts the cluster centers until the clustering is stable. Starting from an initial partition, SEEDS adjusts the boundaries to achieve local optimality by exchanging pixels or blocks between neighboring superpixels. Superpixels are generated by the publicly available C++ codes downloaded from the listed websites. All parameters are set to their default values. The superpixels are also presented in Fig. 13 for visual comparison. As shown, a lattice structure is not assured by SEEDS and SLIC. For the superpixels produced by SEEDS, one can hardly observe a lattice structure even in homogeneous regions. SLIC produces a lattice structure in homogeneous regions but not in cluttered areas. To conform to a regular lattice, SQL may sacrifice some performance, as expected. However, SQL is still competitive with SEEDS and SLIC, as demonstrated in Fig. 15.
4.6. Complexity and efficiency

Superpixel segmentation via patch seaming is formulated as a pixel labeling problem, which is solved by the α-expansion algorithm. Its complexity is O(n), where n is the total number of labels. This is usually a large number, since it equals the number of superpixels K.
SEEDS code: http://www.vision.ee.ethz.ch/software
SLIC code: http://ivrg.epfl.ch/supplementary_material/RK_SLICSuperpixels
By label mapping based on layer seaming, the number of labels is reduced to 4. Therefore, as described in Algorithm 1, the inner for-loop (lines 9–17) runs over only 4 labels. With this reduced complexity, the segmentation speed is improved significantly. The time consumed by SQL (layer seaming) and CIS (patch seaming) is recorded and presented in Fig. 16. As shown, the average time taken by SQL and CIS to segment one BSDS image is about 0.4 and 4 s respectively. Since a superpixel is assumed to lie within a small patch, only a local patch is involved in each α-expansion of CIS, whereas all pixels are involved in each expansion of SQL; as a result, the ratio of their running times is not 4/K. When K increases, the patches become smaller and fewer pixels are involved in each expansion. This is why the computation time does not increase as K increases.

Fig. 16. Comparison of segmentation efficiency. Average computational cost for segmenting one 481 × 321 image by CIS [37] and SQL.

5. Conclusion

This paper formulates superpixel segmentation as a patch seaming and pixel labeling problem. It assembles all patches into four layers according to the parities of their row and column indexes, and generates superpixels via layer seaming. With such patch assembling, the number of labels is reduced from the number of patches to four. All original labels indexing patches are mapped to quaternary labels indexing layers, and the original labels can be recovered from the quaternary labels by inverse mapping. The efficiency of optimization is improved greatly thanks to the reduced number of labels, which makes the SQL algorithm efficient for superpixel segmentation. In the new framework, patch colors and patch positions are introduced into the data term, and horizontal and vertical constraints are introduced into the smooth term. By combining the data term and smooth term, SQL generates a lattice of superpixels and achieves good segmentation performance, as supported by the evaluation. It is straightforward to extend SQL to supervoxel segmentation.

SQL provides a new framework for superpixels via pixel labeling. It allows more image features, such as edges, to be introduced into the data and smooth terms; in this way, more sophisticated models can be proposed to achieve better segmentation performance. This is an issue to be investigated in the near future. SQL produces a superpixel lattice, which has the same neighborhood system as that of pixels. This facilitates superpixel labeling, which is another direction for further investigation.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (no. 41571335).

References
[1] X. Ren, J. Malik, Learning a classification model for segmentation, in: Proceedings of the Ninth IEEE International Conference on Computer Vision, volume 1, 2003, pp. 10–17. October 13, 2003 - October 16, 2003. [2] M. Wertheimer, Laws of organization in perceptual forms, in: A Sourcebook of Gestalt Psycychology, 1938, pp. 71–88. [3] S. He, R.W. Lau, W. Liu, Z. Huang, Q. Yang, Supercnn: a superpixelwise convolutional neural network for salient object detection, Int. J. Comput. Vis. 115 (3) (2015) 330–344. [4] D. Stutz, A. Hermans, B. Leibe, Superpixels: an evaluation of the state-of-the-art, Comput. Vis. Image Underst. 166 (2018) 1–27. [5] X. Zhang, X. Sun, C. Xu, G. Baciu, Multiple feature distinctions based saliency flow model, Pattern Recognit. 54 (2016) 190–205. [6] S.P. Chatzis, A Markov random field-regulated Pitman–Yor process prior for spatially constrained data clustering, Pattern Recognit. 46 (6) (2013) 1595–1603. [7] L. Zhang, B. Verma, D. Stockwell, Spatial contextual superpixel model for natural roadside vegetation classification, Pattern Recognit. 60 (2016) 444–457. [8] J. Xiao, L. Quan, Multiple view semantic segmentation for street view images, in: Proceedings of the IEEE Twelfth International Conference on Computer Vision, IEEE, 2009, pp. 686–693. [9] X. Boix, J.M. Gonfaus, J.V. de Weijer, A.D. Bagdanov, J. Serrat, J. Gonzàlez, Harmony potentials, Int. J. Comput. Vis. 96 (1) (2012) 83–102. [10] S. Yin, Y. Qian, M. Gong, Unsupervised hierarchical image segmentation through fuzzy entropy maximization, Pattern Recognit. 68 (2017) 245–259. [11] A. Aksac, T. Ozyer, R. Alhajj, Complex networks driven salient region detection based on superpixel segmentation, Pattern Recognit. 66 (2017) 268–279. [12] A. Levinshtein, C. Sminchisescu, S. Dickinson, Optimal contour closure by superpixel grouping, in: Proceedings of the European Conference on Computer Vision, Springer, 2010, pp. 480–493. [13] C.L. Zitnick, S.B. Kang, Stereo for image-based rendering using image over-segmentation, Int. J. Comput. Vis. 75 (1) (2007) 49–65. [14] B. Micˇ ušík, J. Košecká, Multi-view superpixel stereo in urban environments, Int. J. Comput. Vis. 89 (1) (2010) 106–119. [15] F. Cheng, H. Zhang, M. Sun, D. Yuan, Cross-trees, edge and superpixel priors-based cost aggregation for stereo matching, Pattern Recognit. 48 (7) (2015) 2269–2278. [16] B. Alexe, T. Deselaers, V. Ferrari, Measuring the objectness of image windows, IEEE Trans. Pattern Anal. Mach. Intell. 34 (11) (2012) 2189–2202. [17] J. Hosang, R. Benenson, P. Dollár, B. Schiele, What makes for effective detection proposals? IEEE Trans. Pattern Anal. Mach. Intell. 38 (4) (2016) 814–830. [18] B. Fulkerson, A. Vedaldi, S. Soatto, Class segmentation and object localization with superpixel neighborhoods, in: Proceedings of the IEEE Twelfth International Conference on Computer Vision, 2009, pp. 670–677. [19] S. Wang, H. Lu, F. Yang, M.H. Yang, Superpixel tracking, in: Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2011, pp. 1323–1330. [20] G. Wu, W. Kang, Exploiting superpixel and hybrid hash for kernel-based visual tracking, Pattern Recognit. 68 (2017) 175–190. [21] J. Xiao, R. Stolkin, A. Leonardis, Dynamic multi-level appearance models and adaptive clustered decision trees for single target tracking, Pattern Recognit. 69 (2017) 169–183. [22] C. Shi, C.M. Pun, Superpixel-based 3D deep neural networks for hyperspectral image classification, Pattern Recognit. 74 (2018) 600–616. [23] Y. Duan, F. Liu, L. Jiao, P. Zhao, L. 
Zhang, SAR image segmentation based on convolutional-wavelet neural network and Markov random field, Pattern Recognit. 64 (2017) 255–267. [24] A. Lucchi, K. Smith, R. Achanta, G. Knott, P. Fua, Supervoxel-based segmentation of mitochondria in em image stacks with learned shape features, IEEE Trans. Med. Imaging 31 (2) (2012) 474–486. [25] F. Kanavati, T. Tong, K. Misawa, M. Fujiwara, K. Mori, D. Rueckert, B. Glocker, Supervoxel classification forests for estimating pairwise image correspondences, Pattern Recognit. 63 (2017) 561–569. [26] A.P. Moore, S.J. Prince, J. Warrell, U. Mohammed, G. Jones, Superpixel lattices, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2008, 2008, pp. 1–8. [27] A.P. Moore, S.J. Prince, J. Warrell, lattice cut-constructing superpixels using layer constraints, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2010, pp. 2117–2124. [28] Y. Zhang, R. Hartley, J. Mashford, S. Burn, Superpixels via pseudo-boolean optimization, in: Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2011, pp. 1387–1394. [29] S.Z. Li, Markov Random Field Modeling in Image Analysis, Springer Science & Business Media, 2009.
[30] R. Szeliski, R. Zabih, D. Scharstein, O. Veksler, V. Kolmogorov, A. Agarwala, M. Tappen, C. Rother, A comparative study of energy minimization methods for Markov random fields with smoothness-based priors, IEEE Trans. Pattern Anal. Mach. Intell. 30 (6) (2008) 1068–1080. [31] P.F. Felzenszwalb, D.P. Huttenlocher, Efficient belief propagation for early vision, Int. J. Comput. Vis. 70 (1) (2006) 41–54. [32] M.F. Tappen, W.T. Freeman, Comparison of graph cuts with belief propagation for stereo, using identical MRF parameters, in: Proceedings of the Ninth IEEE International Conference on Computer Vision, volume 2, 2003, pp. 900–906. [33] C. Rother, V. Kolmogorov, A. Blake, "GrabCut" - interactive foreground extraction using iterated graph cuts, ACM Trans. Graph. 23 (3) (2004) 309–314. [34] D. Chai, Y. Huang, Y. Bao, IRSL: iterative refining superpixel lattice, IEEE Geosci. Remote Sens. Lett. 14 (3) (2017) 344–348. [35] P.F. Felzenszwalb, D.P. Huttenlocher, Efficient graph-based image segmentation, Int. J. Comput. Vis. 59 (2) (2004) 167–181. [36] M.Y. Liu, O. Tuzel, S. Ramalingam, R. Chellappa, Entropy rate superpixel segmentation, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2011, pp. 2097–2104. [37] O. Veksler, Y. Boykov, P. Mehrani, Superpixels and supervoxels in an energy optimization framework, in: Proceedings of the European Conference on Computer Vision, Springer, 2010, pp. 211–224. [38] A. Levinshtein, A. Stere, K.N. Kutulakos, D.J. Fleet, S.J. Dickinson, K. Siddiqi, Turbopixels: fast superpixels using geometric flows, IEEE Trans. Pattern Anal. Mach. Intell. 31 (12) (2009) 2290–2297. [39] P. Wang, G. Zeng, R. Gan, J. Wang, H. Zha, Structure-sensitive superpixels via geodesic distance, Int. J. Comput. Vis. 103 (1) (2013) 1–21. [40] M.V.d. Bergh, X. Boix, G. Roig, L.V. Gool, SEEDS: superpixels extracted via energy-driven sampling, Int. J. Comput. Vis. 111 (3) (2015) 298–314. [41] R. Achanta, A. Shaji, K. Smith, A. Lucchi, P. Fua, S. Süsstrunk, SLIC superpixels compared to state-of-the-art superpixel methods, IEEE Trans. Pattern Anal. Mach. Intell. 34 (11) (2012) 2274–2282. [42] J. Wang, X. Wang, VCells: simple and efficient superpixels using edge-weighted centroidal voronoi tessellations, IEEE Trans. Pattern Anal. Mach. Intell. 34 (6) (2012) 1241–1247.
[43] D. Comaniciu, P. Meer, Mean shift: a robust approach toward feature space analysis, IEEE Trans. Pattern Anal. Mach. Intell. 24 (5) (2002) 603–619. [44] A. Vedaldi, S. Soatto, Quick shift and kernel methods for mode seeking, in: Proceedings of the European Conference on Computer Vision, Springer, 2008, pp. 705–718. [45] H. Ishikawa, Exact optimization for Markov random fields with convex priors, IEEE Trans. Pattern Anal. Mach. Intell. 25 (10) (2003) 1333–1336. [46] Y. Boykov, O. Veksler, R. Zabih, Fast approximate energy minimization via graph cuts, IEEE Trans. Pattern Anal. Mach. Intell. 23 (11) (2001) 1222–1239. [47] D. Chai, H. Lin, Q. Peng, Bisection approach for pixel labelling problem, Pattern Recognit. 43 (5) (2010) 1826–1834. [48] F.Y. Wu, The Potts model, Rev. Mod. Phys. 54 (1) (1982) 235. [49] D. Martin, C. Fowlkes, D. Tal, J. Malik, A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics, in: Proceedings of the Eighth IEEE International Conference on Computer Vision ICCV 2001, Vol. 2, IEEE, 2001, pp. 416–423. [50] P. Arbelaez, M. Maire, C. Fowlkes, J. Malik, Contour detection and hierarchical image segmentation, IEEE Trans. Pattern Anal. Mach. Intell. 33 (5) (2011) 898–916.

Dengfeng Chai received the Bachelor's degree in surveying engineering from Wuhan University, Wuhan, China, the Master's degree in photogrammetry and remote sensing from the State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University, and the Doctoral degree in applied mathematics from the State Key Laboratory of CAD&CG, Zhejiang University, Hangzhou, China, in 1997, 2000, and 2006, respectively. He is an associate professor in the Institute of Spatial Information Technique, Zhejiang University. From 2010 to 2011, he was a postdoctoral fellow in the Department of Photogrammetry, University of Bonn, Bonn, Germany. His research interests include many topics in computer vision and pattern recognition, photogrammetry and remote sensing, such as image segmentation, object recognition, object extraction and stereo matching. He has published many papers in top-level journals and conferences including Pattern Recognition, ICCV and CVPR.