Ensembling over-segmentations: From weak evidence to strong segmentation


Dong Huang a,b, Jian-Huang Lai b,c,*, Chang-Dong Wang b,c, Pong C. Yuen d

a College of Mathematics and Informatics, South China Agricultural University, Guangzhou, China
b School of Data and Computer Science, Sun Yat-sen University, Guangzhou, China
c Guangdong Key Laboratory of Information Security Technology, Guangzhou, China
d Department of Computer Science, Hong Kong Baptist University, Hong Kong, China

* Corresponding author at: School of Data and Computer Science, Sun Yat-sen University, Guangzhou Higher Education Mega Center, Panyu District, Guangzhou, Guangdong 510006, China. Tel.: +86 13168313819; fax: +86 20 84110175.
E-mail addresses: [email protected] (D. Huang), [email protected] (J.-H. Lai), [email protected] (C.-D. Wang), [email protected] (P.C. Yuen).

Article history: Received 27 September 2015; received in revised form 26 February 2016; accepted 5 May 2016. Communicated by Yi Zhe Song.

Abstract

Due to the high diversity of image data, image segmentation remains a very challenging problem after decades of development. Each segmentation algorithm has its merits as well as its drawbacks. Instead of segmenting images via conventional techniques, and inspired by the idea of ensemble clustering, which combines a set of weak clusterers to obtain a strong clusterer, we propose to achieve a consensus segmentation by fusing evidence accumulated from multiple weak segmentations (or over-segmentations). We present a novel image segmentation approach that exploits multiple over-segmentations and achieves segmentation results by hierarchical region merging. The cross-region evidence accumulation (CREA) mechanism is designed for collecting information among over-segmentations: the pixel-pairs across regions are treated as a bag of independent voters, and the cumulative votes from multiple over-segmentations are fused to estimate the coherency of adjacent regions. We further integrate the brightness, color, and texture cues for measuring the appearance similarity between regions in an over-segmentation, which, together with the CREA information, is utilized for making the region merging decisions. Experiments are conducted on multiple public datasets and demonstrate the superiority of our approach in terms of both effectiveness and efficiency when compared to the state-of-the-art.

Keywords: Image segmentation; Ensemble clustering; Over-segmentation; Ensemble segmentation; Segmentation fusion

1. Introduction

Image segmentation is a fundamental yet challenging problem in the field of computer vision. Its purpose is to partition an image into a certain number of regions with coherent properties. A large body of literature on image segmentation has accumulated over the past few decades [1–19]. However, most of the existing approaches focus on combining low-level visual features with global optimization methods. In this paper, we explore an alternative strategy, which accumulates cues from multiple over-segmentations (generated by different segmentation methods or by the same method with different parameters) to obtain a more robust and better segmentation.

Over-segmentation occurs when coherent regions are split into smaller segments; producing an over-segmentation is generally an easier task than obtaining a good segmentation. On the one hand, over-segmenting is usually not desired as a final result. On the other hand, many object details and pixel-wise relationships are well preserved among the segments of an over-segmentation, so an over-segmentation can be viewed as a weak segmentation. Similar to the ensemble clustering techniques [20–26], which combine a set of weak clusterers to obtain a strong clusterer, this paper addresses the problem of accumulating evidence from multiple weak segmentations (or over-segmentations) to obtain a strong segmentation (see Fig. 1). We propose a novel image segmentation approach based on over-segmentation fusion and hierarchical region merging. A set of over-segmentations generated by different methods are used, each treated as an independent source of evidence. The information in the set of over-segmentations, together with the brightness, color, and texture cues, is incorporated into the region merging process to construct the final segmentation. The cross-region evidence accumulation (CREA) mechanism is presented for collecting information from multiple over-segmentations via a regional voting strategy.


Fig. 1. Image segmentation using multiple over-segmentations. (a) The original image. (b) Multiple over-segmentations produced by the mean-shift method [3] and the F–H method [4]. (c) After building a hierarchy of segmentations by fusing evidence from multiple over-segmentations, the final segmentation is obtained by choosing a segmentation number (or level) for the hierarchy. Here, the PRI-optimized segmentation number is adopted.

The experiments on three public datasets, i.e., BSDS300, BSDS500, and MSRC, show that the proposed approach produces significantly better segmentation results than the state-of-the-art techniques, with a low computational cost in execution time and memory usage. The remainder of this paper is organized as follows. We review the related work on image segmentation and ensemble clustering in Section 2. The proposed segmentation approach with cross-region evidence accumulation and cue integration is introduced in Section 3. The experimental results are reported in Section 4. We conclude this paper in Section 5.

2. Related work

2.1. Image segmentation

During the past few decades, many image segmentation approaches have been developed by exploiting a wide variety of techniques, such as mode seeking [3,11], graph partitioning [2,6,7,13,27], region merging [4,5,9], fuzzy clustering [28,29], variational methods [12,18,30], Markov random fields [8,17], and level set methods [14,19,31]. Among these categories, the mode seeking, graph partitioning, and region merging methods are three of the most popular techniques for segmenting natural images.

The mode seeking methods [3,11] provide a versatile tool for image segmentation via feature space analysis. The mean-shift method [3] is one of the most widely used mode seeking methods: local density maxima (or modes) are detected in the feature space, which is then partitioned into clusters with the density modes as cluster centroids. The mean-shift method is capable of generating clusters of arbitrary shapes. Although image details are well respected, the mean-shift method tends to produce artifacts by splitting a coherent region into pieces. In practical applications it is therefore often exploited as a pre-processing step to generate a set of primitive segments [32,33], also called superpixels. The graph partitioning based methods [2,6,7,13] offer a way to incorporate global information into the image segmentation process. Typically, a graph is constructed with the image pixels mapped onto the graph vertices and the relationships between pixels onto the weighted graph links. Spectral clustering [34,35] is often utilized to partition the graph into a certain number of non-overlapping subgraphs and thereby achieve the segmentation of the image. Shi and Malik [2] were, to our knowledge, the first to introduce spectral clustering into the field of computer vision, developing the normalized cuts (Ncuts) for graph partitioning and image segmentation. In the naive implementation of Ncuts [2], only the links connecting pixels within a small spatial distance are considered, so as to avoid a huge computational cost.


To address this limitation, Cour et al. [6] constructed links at multiple scales over the image and proposed the multiscale normalized cuts (MNcuts), a computationally efficient approach to spectral clustering based image segmentation. Sharon et al. [7] proposed an efficient algorithm inspired by algebraic multigrid [36], applying a multiscale approach to recursively reduce the normalized cuts problem. To capture both short- and long-range grouping cues in the image, Wang et al. [37] proposed to construct a sparse global/local affinity graph over superpixels and partition the graph using the Transfer Cut (Tcut) [33], an extension of the normalized cut. However, for most graph partitioning methods, a single optimal partition of the graph is not easy to obtain. Additionally, the normalized cut based methods generally favor normalized segments and tend to break large uniform regions into several pieces.

The region merging based methods [4,5,9,38] are able to produce a hierarchy of image segmentations, which can typically be represented as a segmentation tree with each level corresponding to a specific segmentation. Felzenszwalb and Huttenlocher [4] developed a graph-based region merging algorithm, aiming to greedily achieve a segmentation that is neither too coarse nor too fine. Nock and Nielsen [5] formulated the region merging problem in a statistical framework, where it is implicitly assumed that the color variation between true regions should be greater than that within true regions. Arbeláez et al. [9] combined multiple local cues into a globalization framework and proposed a new contour detector termed gPb; a region merging based algorithm termed owt-ucm is then performed on the output of gPb to obtain a hierarchical region tree. The final segmentation with a certain number of segments can be obtained by specifying a level (or scale) of the hierarchical region tree.

2.2. Ensemble clustering and ensemble segmentation

Data clustering is a classical problem in machine learning and data mining. Many clustering approaches have been proposed in the past few decades [39]. However, no single clustering algorithm is capable of dealing with all sorts of data with various structures. To combine the results of different clustering methods into a more robust and better clustering, the ensemble clustering technique has emerged and has been drawing increasing attention in recent years [40]. Ensemble clustering aims to combine multiple clusterings into a consensus clustering that is expected to be more robust and of higher quality. Fred and Jain [20] introduced the concept of evidence accumulation clustering (EAC) to fuse the information of multiple weak clusterings into a knowledge pool referred to as the co-association matrix, whose entries are defined as

$$C(i, j) = \frac{m_{ij}}{M}, \qquad (1)$$

where $m_{ij}$ is the number of times that objects $i$ and $j$ occur in the same cluster among the $M$ clusterings. The partitions of multiple clusterings are thus mapped into a new similarity measure via the co-association matrix $C$, over which further clustering methods can be performed to achieve a consensus clustering. Wang et al. [21] generalized the EAC approach using probability accumulation, which takes the cluster sizes of the original clusterings into consideration. Mimaroglu and Erdil [22] proposed an evidence accumulation based approach that combines multiple clusterings via a similarity graph. Huang et al. [24] took the reliability of different clusterings into consideration and proposed the weighted evidence accumulation clustering (WEAC) method. These approaches all require the construction of the co-association matrix over the data objects.
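For illustration, Eq. (1) can be computed in a few lines; the following is a minimal Python sketch (the function and variable names are ours, not from [20]):

```python
import numpy as np

def co_association(labelings):
    """Build the EAC co-association matrix C of Eq. (1).

    labelings: list of M cluster-label arrays, each of length n
               (one labeling per base clustering).
    Returns an n x n matrix whose (i, j) entry is the fraction of
    the M clusterings in which objects i and j share a cluster.
    """
    labelings = [np.asarray(l) for l in labelings]
    n = labelings[0].size
    C = np.zeros((n, n))
    for labels in labelings:
        # 1 where objects i and j fall in the same cluster, else 0
        C += (labels[:, None] == labels[None, :]).astype(float)
    return C / len(labelings)
```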


They are, however, not computationally feasible for the image segmentation problem. Image data generally consist of a large number of pixels: an image of size $300 \times 400$ has 120,000 pixels, which results in a co-association matrix of size $120{,}000 \times 120{,}000$. Dealing with such a co-association matrix would lead to prohibitively expensive computation. To overcome this computational limitation of ensemble clustering, Franek et al. [41] used superpixels as the primitive objects to reduce the problem size and then performed general ensemble clustering methods on the superpixels to obtain the consensus segmentation. However, the approach in [41] only considers the information of the multiple segmentations in the ensemble and lacks the ability to exploit multiple cues in the image, e.g., color, lightness, and texture, to enhance the robustness of the consensus process. Mignotte [42] proposed to combine multiple segmentations by fusing local histograms of the class labels of the different segmentations, which is based on the assumption that all input segmentations have the same number of clusters and is therefore not applicable to the more general scenario where the multiple segmentations may have different numbers of clusters. Kim et al. [43] proposed to segment an image by combining an ensemble of multiple hierarchical segmentations. Each ensemble member in [43] must be a hierarchical segmentation, i.e., a segmentation tree; the purpose is to combine multiple segmentation trees into a consensus segmentation [43], which is not feasible for combining general segmentation results. Besides these, several median partition based methods [44–47] have been proposed to address the ensemble segmentation problem, i.e., the segmentation fusion problem. The median partition based methods [44–47] typically formulate ensemble segmentation as an optimization problem that seeks the median partition (or segmentation) maximizing the similarity between this segmentation and the multiple input segmentations in terms of some similarity measure. Vega-Pons et al. [44] proposed a median partition based method for ensemble segmentation, which is developed from the weighted partition consensus via kernels (WPCK) [48] and exploits the Rand index as the similarity measure. Alush and Goldberger [45] proposed an ensemble segmentation approach based on integer linear programming, aiming to find an average segmentation in the "space of segmentations" which is close to all the individual segmentations. Mignotte [46] proposed to achieve the consensus segmentation by minimizing a cost function based on variation of information (VoI). Wang et al. [47] cast the ensemble segmentation problem as a combinatorial optimization problem and developed a Bayesian image segmentation fusion (BISF) model to obtain the consensus segmentation. These median partition based approaches [44–47] aim to achieve a better segmentation by averaging the ensemble of multiple base segmentations, which implicitly assumes that there is a set of good, or at least not-bad, segmentations. However, obtaining a set of good segmentations is itself a very challenging task. Compared to good segmentations, over-segmentations are generally much easier to obtain.
As over-segmentation results contain rich information about object details and pixel relations, it is a practical strategy to segment an image by taking advantage of multiple over-segmentations [32,33,37]. Recently, several efforts have been made to exploit multiple over-segmentations for the image segmentation problem [32,33,37]. Kim et al. [32] incorporated the information of multiple over-segmentations in semi-supervised learning to derive a dense affinity matrix over pixels for spectral segmentation. Li et al. [33] formulated the problem of segmentation with multiple over-segmentations as a bipartite graph model, where both pixels from the image and superpixels from the over-segmentations are treated as graph vertices, and the final segmentation is obtained by the Transfer Cut (Tcut) algorithm, a variant of the normalized cuts.


Wang et al. [37] followed the model of [33] and fused the superpixels from multiple over-segmentations by constructing a bipartite graph that captures both short- and long-range grouping cues of the image; the method in [37] also exploits the Tcut algorithm [33] to partition the bipartite graph into a given number of segments. However, these methods [32,33,37] are based on normalized cuts, which favor normalized partitions and tend to break large uniform regions into several pieces. Moreover, solving the eigen-problem for large matrices is time-consuming [32,33,37]. How to produce good segmentations for natural images by exploiting multiple over-segmentations effectively and efficiently remains a very challenging problem.

3. Image segmentation using multiple over-segmentations

In this paper, we address the problem of image segmentation using multiple over-segmentations and propose a novel approach termed Image Segmentation by Cross-region evidence accumulation and Cue integration (ISCC). As shown in Fig. 2, given an input image, a set of diverse over-segmentations are generated by different segmentation methods or by the same segmentation method with different parameters. The cross-region evidence accumulation (CREA) mechanism is proposed to fuse information from multiple over-segmentations by means of a regional voting strategy. The accumulated cross-region evidence, together with the appearance similarity w.r.t. multiple cues, is then utilized to build a hierarchy of segmentations in an iterative region merging manner. The final segmentation is obtained by choosing a level (or scale) of the segmentation hierarchy. In the proposed approach, the over-segmentations are exploited in two ways. One of the over-segmentations is treated as the primary layer (PL), upon which the region merging process is performed, whereas the other over-segmentations are treated as the supporting layers (SLs). The region merging process of our approach starts from the PL. It is desirable that the PL preserve more local details and have more segments. Typically, we suggest selecting the over-segmentation with the most segments as the PL and using the other over-segmentations, generated by different methods, as the SLs.

[Fig. 2 flow chart: the input image is over-segmented into $\tilde{P}$ (the primary layer) and $P^{(1)}, \ldots, P^{(K-1)}$ (the supporting layers); cross-region evidence accumulation and cue integration then drive iterative region merging, yielding a hierarchical segmentation tree.]

Fig. 2. The generation of the hierarchical segmentation tree by the proposed image segmentation approach based on cross-region evidence accumulation and cue integration.

Formally, consider $K$ over-segmentations of an image $I$, and select one of the over-segmentations as the PL and the other $K-1$ over-segmentations as the SLs. Let $\mathcal{P} = \{\tilde{P}, P^{(1)}, \ldots, P^{(K-1)}\}$ represent the set of $K$ over-segmentations, where $\tilde{P}$ denotes the PL and $P^{(k)}$ denotes the $k$-th SL. Each over-segmentation consists of a set of regions (or segments), that is,

$$\tilde{P} = \{R_1, R_2, \ldots, R_{n_0}\}, \qquad (2)$$

$$P^{(1)} = \{R^1_1, R^1_2, \ldots, R^1_{n_1}\}, \qquad (3)$$

$$\vdots$$

$$P^{(K-1)} = \{R^{K-1}_1, R^{K-1}_2, \ldots, R^{K-1}_{n_{K-1}}\}, \qquad (4)$$

where $R_i$ is the $i$-th region in $\tilde{P}$, $n_0$ is the number of regions in $\tilde{P}$, $R^k_i$ is the $i$-th region in $P^{(k)}$, and $n_k$ is the number of regions in $P^{(k)}$.

Over the regions in the PL, the region merging algorithm proceeds according to the similarity measure between adjacent regions. The similarity measure is one of the most crucial factors for region merging based image segmentation approaches [4,5,9]. In this work, aiming to exploit the over-segmentations effectively and efficiently, we introduce a joint similarity measure between adjacent regions, which takes into consideration the cross-region evidence accumulated from multiple over-segmentations as well as the appearance similarity of the two regions w.r.t. multiple cues. Formally, the joint similarity between regions $R_i$ and $R_j$ is computed as

$$\mathrm{Sim}_{joint}(R_i, R_j) = \frac{1}{T}\,[E_{crea}(R_i, R_j) + E_{as}(R_i, R_j)], \qquad (5)$$

where $E_{crea}(R_i, R_j)$ and $E_{as}(R_i, R_j)$ are the influence of the CREA mechanism and the influence of the appearance similarity w.r.t. multiple cues, respectively, and $1/T$ is a normalization term that keeps $\mathrm{Sim}_{joint}(R_i, R_j)$ in the interval $[0, 1]$. The CREA mechanism is designed to fuse information from multiple over-segmentations via a novel regional voting strategy, which combines multiple weak pieces of evidence into a strong bias for the region merging process. The appearance similarity integrates brightness, color, and texture cues and encourages the merging of similar regions. The construction of $E_{crea}(R_i, R_j)$ and $E_{as}(R_i, R_j)$ will be described in Sections 3.1 and 3.2, respectively.

Having computed the joint similarity between regions, the region merging process proceeds in a greedy and iterative manner. In each iteration, the two regions with the highest joint similarity are merged into a new, larger region. We then update the joint similarity between the new region and each of its adjacent regions w.r.t. Eq. (5) and proceed to the next iteration. The iterative merging over the regions in the PL leads to a hierarchical segmentation tree, where the root is the entire image and the leaves are the initial regions in $\tilde{P}$. Each level of the tree is associated with a segmentation of the image with a certain number of segments. Because each iteration merges two regions and decreases the number of segments by 1, the $t$-th level of the tree is associated with the segmentation result with $t$ segments. The final segmentation can be obtained by specifying a level (or scale) of the hierarchical segmentation tree.

3.1. Cross-region evidence accumulation

The EAC method [20] proposed by Fred and Jain is based on the intuition that the more frequently two objects occur in the same cluster among multiple clusterings, the more likely it is that they belong to the same cluster in the consensus clustering. This mechanism is effective for fusing evidence from multiple clusterings. However, for the image segmentation problem the EAC is infeasible due to its prohibitively high computational expense when dealing with a huge pixel-wise co-association matrix (see Section 2).


In this paper, instead of accumulating pixel-wise evidence, we explore another mechanism that accumulates regional evidence among multiple over-segmentations via a novel voting strategy. Compared to the huge number of pixels in an image, the number of segments in an over-segmentation, generally ranging from tens to at most thousands, is a far smaller quantity to deal with. While respecting the pixel-wise relations, the proposed CREA mechanism is able to accumulate evidence from multiple over-segmentations at the larger scale of regions, and provides an effective and efficient solution for fusing multiple over-segmentations.

Different from pixel-wise evidence accumulation, where two pixels can only be either included in the same segment or not, the task of obtaining evidence between regions needs to deal with partial occlusion situations. A region consists of a certain number of pixels. If the pixels of two regions all occur in the same segment of another over-segmentation, i.e., if the two regions are fully occluded by the same segment of another over-segmentation, this is evidence that the two regions may belong to a coherent region. If only part of the pixels of the two regions occur in the same segment of another over-segmentation, which is more often the case, this is also evidence of coherency, and the strength (or reliability) of this evidence should be related to the number of occluding pixels and the sizes of the two regions. In this work, both the full occlusion and partial occlusion situations are taken into consideration and the cross-region evidence is accumulated over multiple over-segmentations.

The relationship between two regions can be viewed as the relationship between two sets of pixels. Two pixels taken from the two regions respectively are called a pixel-pair across these two regions. The regional voting problem can then be divided into a number of pixel-pair voting issues. We treat each pixel-pair across two regions as an independent voter. For two regions $R_i, R_j \in \tilde{P}$ in the PL, there are in total $|R_i| \cdot |R_j|$ pixel-pairs (or voters) across them. We then compute the ratio of voters that support the coherency of $R_i$ and $R_j$. Let $R^k_h \in P^{(k)}$ be a region in the $k$-th SL. If the two pixels of a voter across $R_i$ and $R_j$ both occur in the region $R^k_h$, we say this voter supports the coherency of $R_i$ and $R_j$ w.r.t. $R^k_h$. By considering the occluding portions between $R_i$, $R_j$, and $R^k_h$, we compute the ratio of voters that support the coherency of $R_i$ and $R_j$ w.r.t. $R^k_h$ as

$$\mathrm{vote}^k_h(R_i, R_j) = \frac{|R_i \cap R^k_h| \cdot |R_j \cap R^k_h|}{|R_i| \cdot |R_j|}, \qquad (6)$$

where $R_i \cap R^k_h$ denotes the set of pixels that occur in both $R_i$ and $R^k_h$. As defined, when all pixels in $R_i$ and $R_j$ appear in $R^k_h$, i.e., when all voters support the coherency of $R_i$ and $R_j$, $\mathrm{vote}^k_h(R_i, R_j)$ reaches its maximum of 1.

The voters across $R_i$ and $R_j$ may occur in more than one region of an over-segmentation. Specifically, some voters across $R_i$ and $R_j$ may support their coherency w.r.t. $R^k_h$, while other voters may support their coherency w.r.t. $R^k_g$ with $g \neq h$. To obtain the ratio of voters that support the coherency of $R_i$ and $R_j$ w.r.t. the over-segmentation $P^{(k)}$, we collect the votes for $R_i$ and $R_j$ w.r.t. the different regions in $P^{(k)}$. That is,

$$\mathrm{vote}^k(R_i, R_j) = \sum_{R^k_h \in P^{(k)}} \mathrm{vote}^k_h(R_i, R_j), \qquad (7)$$

where $\mathrm{vote}^k(R_i, R_j)$ is the ratio of voters that support the coherency of $R_i$ and $R_j$ w.r.t. $P^{(k)}$. The regions within an over-segmentation are always non-overlapping, i.e., $\forall g \neq h,\ R^k_g \cap R^k_h = \emptyset$. For $R_i, R_j \in \tilde{P}$ and $k = 1, \ldots, K-1$, it holds that $\mathrm{vote}^k(R_i, R_j) \in [0, 1]$.

The ratio of voters that support the coherency of two regions w.r.t. an over-segmentation (as defined in Eq. (7)) can be viewed as a measure of the possibility that the two regions belong to a coherent region given the information of that over-segmentation. In this work, multiple over-segmentations are treated as the SLs, and this ratio is averaged over the multiple SLs. Given the SLs, the cross-region evidence accumulation (CREA) term for two regions $R_i$ and $R_j$ is computed as

$$\mathrm{CREA}(R_i, R_j) = \frac{1}{K-1} \sum_{k=1}^{K-1} \mathrm{vote}^k(R_i, R_j), \qquad (8)$$

where $\mathrm{CREA}(R_i, R_j)$ is the CREA term for $R_i$ and $R_j$ w.r.t. all the $K-1$ SLs. The influence of the CREA over the joint similarity in Eq. (5) is then defined as

$$E_{crea}(R_i, R_j) = \lambda_c \cdot \mathrm{CREA}(R_i, R_j), \qquad (9)$$

where $\lambda_c \geq 0$ is a weighting parameter.
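For illustration, Eqs. (6)-(8) can be computed directly from label maps; the following is a minimal Python sketch, assuming each over-segmentation is stored as an integer label map with 0-based region ids (a representation we choose for the sketch, not one prescribed by the paper):

```python
import numpy as np

def crea(pl_labels, sl_label_maps, i, j):
    """CREA term of Eq. (8) for regions i and j of the primary layer.

    pl_labels:     integer label map of the primary layer (H x W).
    sl_label_maps: list of the K-1 supporting-layer label maps (H x W each).
    """
    mask_i = (pl_labels == i)
    mask_j = (pl_labels == j)
    size_i, size_j = mask_i.sum(), mask_j.sum()
    total = 0.0
    for sl in sl_label_maps:
        # |R_i ∩ R_h^k| for every SL region h at once (Eq. (6)),
        # then summed over h as in Eq. (7)
        n_regions = sl.max() + 1
        hist_i = np.bincount(sl[mask_i], minlength=n_regions)
        hist_j = np.bincount(sl[mask_j], minlength=n_regions)
        total += (hist_i * hist_j).sum() / float(size_i * size_j)
    return total / len(sl_label_maps)   # average over the K-1 SLs, Eq. (8)
```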

3.2. Integrating brightness, color, and texture cues

Besides the cross-region evidence accumulated from multiple over-segmentations, the proposed approach further takes the regional similarity in appearance into consideration. In this paper, the brightness, color, and texture cues are integrated to measure the appearance similarity between adjacent regions. The input image is converted into the CIE-Lab color space, where the L channel corresponds to the brightness and the a and b channels correspond to the color. Over each channel of the Lab color space, we construct a histogram for each region in the PL. We utilize the bag of textons [9,11] for depicting the texture of regions. To obtain the textons, the image is convolved with a 17-dimensional filter bank [9], so that each image pixel is associated with a 17-dimensional vector of filter responses. k-means clustering is performed on these vectors, and a set of textons is thereby defined by the obtained $Z$ cluster centers. Each pixel is assigned the texton id in $[1, Z]$ of its nearest cluster center. The histogram of textons can then be computed for each region in the PL.
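A minimal sketch of the texton assignment follows, assuming the 17-dimensional filter responses have already been computed; the texton count Z below is an illustrative choice, as the paper does not fix a value:

```python
import numpy as np
from sklearn.cluster import KMeans

def texton_map(filter_responses, image_shape, Z=32, seed=0):
    """Assign a texton id in [0, Z) to every pixel.

    filter_responses: (num_pixels, 17) array of filter-bank responses,
                      one 17-D vector per pixel, computed beforehand.
    """
    km = KMeans(n_clusters=Z, n_init=10, random_state=seed)
    ids = km.fit_predict(filter_responses)   # nearest of the Z cluster centers
    return ids.reshape(image_shape)          # per-pixel texton ids

def region_histogram(values, mask, n_bins):
    """Normalized histogram of integer `values` inside a region mask."""
    h = np.bincount(values[mask], minlength=n_bins).astype(float)
    return h / h.sum()
```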

The dissimilarity between two histograms, say $h$ and $h'$, is measured by the $\chi^2$ test [1,9]. That is,

$$\chi^2(h, h') = \frac{1}{2} \sum_{i=1}^{N_h} \frac{[h(i) - h'(i)]^2}{h(i) + h'(i)}, \qquad (10)$$

where $N_h$ is the number of bins of the histograms $h$ and $h'$. It holds that $\chi^2(h, h') \in [0, 1]$. For a region $R_i \in \tilde{P}$, let $h^L_i$, $h^a_i$, $h^b_i$, and $h^t_i$ denote its histograms w.r.t. the brightness, color a, color b, and texture channels, respectively. The similarity between two regions $R_i$ and $R_j$ w.r.t. each channel is defined as

$$S_L(R_i, R_j) = 1 - \chi^2(h^L_i, h^L_j), \qquad (11)$$

$$S_a(R_i, R_j) = 1 - \chi^2(h^a_i, h^a_j), \qquad (12)$$

$$S_b(R_i, R_j) = 1 - \chi^2(h^b_i, h^b_j), \qquad (13)$$

$$S_t(R_i, R_j) = 1 - \chi^2(h^t_i, h^t_j). \qquad (14)$$

The influence of the appearance similarity w.r.t. multiple cues is then defined as

$$E_{as}(R_i, R_j) = \lambda_t S_t(R_i, R_j) + \lambda_{Lab}\,[S_L(R_i, R_j) + S_a(R_i, R_j) + S_b(R_i, R_j)], \qquad (15)$$

where $\lambda_{Lab} \geq 0$ and $\lambda_t \geq 0$ are weighting parameters.
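For illustration, Eqs. (10)-(15) amount to a few lines of code; a minimal Python sketch (the small eps guard against empty histogram bins is our addition):

```python
import numpy as np

def chi2(h1, h2, eps=1e-12):
    """Chi-squared distance between two normalized histograms, Eq. (10)."""
    return 0.5 * np.sum((h1 - h2) ** 2 / (h1 + h2 + eps))

def appearance_similarity(hists_i, hists_j, lam_t=1.0, lam_lab=1.0):
    """E_as of Eq. (15) from per-channel histograms.

    hists_i, hists_j: dicts with keys 'L', 'a', 'b', 't' holding the
                      brightness, color, and texton histograms of the
                      two regions.
    """
    s = {c: 1.0 - chi2(hists_i[c], hists_j[c]) for c in ('L', 'a', 'b', 't')}
    return lam_t * s['t'] + lam_lab * (s['L'] + s['a'] + s['b'])
```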


With Eqs. (9) and (15), we can rewrite the joint similarity in Eq. (5) as

$$\mathrm{Sim}_{joint}(R_i, R_j) = \frac{1}{T}\,\big\{\lambda_c\,\mathrm{CREA}(R_i, R_j) + \lambda_t S_t(R_i, R_j) + \lambda_{Lab}\,[S_L(R_i, R_j) + S_a(R_i, R_j) + S_b(R_i, R_j)]\big\}. \qquad (16)$$

The three weighting parameters, namely $\lambda_c$, $\lambda_t$, and $\lambda_{Lab}$, should not be simultaneously set to 0. The normalization term in Eq. (16) is then obtained as $T = \lambda_c + \lambda_t + 3\lambda_{Lab}$. In this work, we mainly consider the task of segmenting natural images, which are generally color images. However, our approach is also applicable to grayscale image segmentation: for grayscale images, the L channel is simply the grayscale intensity and the two color channels are not used.
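To make the merging procedure concrete, the following is a simplified Python sketch of the greedy loop; the adjacency bookkeeping and the assumption that a caller-supplied sim(i, j) implements Eq. (16), including its values for newly created (merged) region ids, are ours, not the authors' code:

```python
def greedy_merge(n_regions, adjacency, sim):
    """Greedy region merging over the primary layer (Section 3).

    n_regions: number of initial PL regions, with ids 0..n_regions-1.
    adjacency: iterable of (i, j) pairs, i < j, of adjacent regions.
    sim:       sim(i, j) -> joint similarity of Eq. (16); assumed to be
               defined for merged (newly created) region ids as well.
    Returns the merge sequence (i, j, new_id); undoing the last t-1
    merges yields the level of the tree with t segments.
    """
    adjacency = {tuple(sorted(e)) for e in adjacency}
    merges, next_id = [], n_regions
    while adjacency:
        i, j = max(adjacency, key=lambda e: sim(*e))   # most similar pair
        merges.append((i, j, next_id))
        # neighbors of i or j (other than i, j) become neighbors of next_id
        neighbors = {k for e in adjacency if i in e or j in e
                     for k in e if k not in (i, j)}
        adjacency = {e for e in adjacency if i not in e and j not in e}
        adjacency |= {(min(k, next_id), max(k, next_id)) for k in neighbors}
        next_id += 1
    return merges
```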

4. Experiments

In this section, we evaluate the proposed approach, termed Image Segmentation by Cross-region evidence accumulation and Cue integration (ISCC), on three public image datasets and compare it with state-of-the-art image segmentation approaches. The experiments are conducted in Matlab R2014a 64-bit on a workstation with 8 Intel 2.40 GHz processors and 16 GB of RAM.

4.1. Datasets

Our experiments are conducted on three public image segmentation datasets, described as follows. BSDS300 [49]: the Berkeley Segmentation Dataset (300) contains 300 natural images of a wide variety of scene categories. Five to ten human-produced ground-truth segmentations are provided for each image. The images in BSDS300 are divided into a training set of 200 images and a test set of 100 images. BSDS500 [9]: the BSDS500 is an extension of the BSDS300 that includes 200 new test images. MSRC [50]: the MSRC object recognition dataset consists of 591 images with objects of 21 classes. The performances of the test approaches are evaluated using the cleaner and more precise ground-truth labeling of [51].

4.2. Evaluation methods and experimental setup

To quantitatively evaluate the segmentation results, we make use of five different measures in our experiments, namely, the probabilistic rand index (PRI) [52], variation of information (VoI) [53], global consistency error (GCE) [49], boundary displacement error (BDE) [54], and segmentation covering [9].

As a generalization of the rand index (RI) [55], the PRI allows the comparison of a test segmentation against multiple ground-truth segmentations. Given a test segmentation $P$ and a set of ground-truth segmentations $\{G_k\}$, the PRI is defined as follows [52]:

$$\mathrm{PRI}(P, \{G_k\}) = \frac{1}{\binom{N}{2}} \sum_{i<j} [c_{ij} p_{ij} + (1 - c_{ij})(1 - p_{ij})], \qquad (17)$$

where $N$ is the number of pixels in the image, $c_{ij} \in \{0, 1\}$ indicates the event that pixels $i$ and $j$ have the same label in the test segmentation $P$, and $p_{ij}$ is the corresponding probability estimated with the sample mean.

The GCE measures the extent to which one segmentation can be deemed a refinement of the other. Let $\setminus$ denote set difference and $R(P, i)$ the region in segmentation $P$ that contains pixel $i$. The GCE is defined as follows [49]:

$$\mathrm{GCE}(P, G) = \frac{1}{N} \min\left\{\sum_i E(P, G, i),\ \sum_i E(G, P, i)\right\}, \qquad (18)$$

where

$$E(P, G, i) = \frac{|R(P, i) \setminus R(G, i)|}{|R(P, i)|} \qquad (19)$$

is the local refinement error [49].

The BDE measures the segmentation quality in terms of the precision of the region boundaries [54]. The displacement error of a boundary pixel in one segmentation is defined as the distance between that pixel and the closest pixel in the boundary image of the other segmentation; the BDE is computed by averaging the displacement errors of boundary pixels between two segmentations.

The covering of a test segmentation $P$ by a ground-truth segmentation $G$ is defined as follows [9]:

$$\mathrm{Covering}(P, G) = \frac{1}{N} \sum_{R \in P} |R| \cdot \max_{R' \in G} O(R, R'), \qquad (20)$$

where

$$O(R, R') = \frac{|R \cap R'|}{|R \cup R'|} \qquad (21)$$

is the overlap between the two regions $R$ and $R'$ [9].
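As an illustration, the covering of Eqs. (20) and (21) can be computed from two integer label maps via a contingency table; a minimal sketch (assuming region ids are consecutive and 0-based, which is our convention for the sketch):

```python
import numpy as np

def covering(test_labels, gt_labels):
    """Segmentation covering of Eqs. (20)-(21) from two integer label maps
    (both assumed to use consecutive region ids starting at 0)."""
    n = test_labels.size
    # contingency table: counts[r, g] = |R ∩ R'| for test region r, GT region g
    counts = np.zeros((test_labels.max() + 1, gt_labels.max() + 1))
    np.add.at(counts, (test_labels.ravel(), gt_labels.ravel()), 1)
    sizes_t = counts.sum(axis=1, keepdims=True)   # |R|
    sizes_g = counts.sum(axis=0, keepdims=True)   # |R'|
    union = sizes_t + sizes_g - counts            # |R ∪ R'|
    overlap = counts / np.maximum(union, 1)       # O(R, R') of Eq. (21)
    return float((sizes_t[:, 0] * overlap.max(axis=1)).sum() / n)
```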

A segmentation is viewed as better if the PRI and segmentation covering are larger and the VoI, GCE, and BDE are smaller.

In our experiments, we utilize two classical segmentation methods to produce the over-segmentations, i.e., the mean-shift method [3] and the F–H method [4]. For each image, we use six over-segmentations. Two of them are produced by the mean-shift method [3] with parameters $(h_s, h_r, M) \in \{(7, 9, 150), (7, 11, 250)\}$, where $h_s$ and $h_r$ are the bandwidth parameters in the spatial and range domains, respectively, and $M$ is the minimum size of each region. The other four over-segmentations are produced by the F–H method [4] with parameters $(\sigma, c, M) \in \{(0.5, 100, 150), (0.5, 200, 150), (0.8, 200, 150), (0.8, 350, 150)\}$, where $\sigma$ and $c$ are the smoothing and scaling parameters, respectively, and $M$ is the minimum size of each region. Note that the same over-segmentation setting is adopted for all images in all three benchmark datasets. Intuitively, an over-segmentation with more segments is more likely to preserve local details well. In our approach, the over-segmentation with the largest number of segments is therefore exploited as the PL by default, while the others serve as the SLs, in all experiments on the three benchmark datasets.
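For illustration, the F–H members of this ensemble could be generated with scikit-image's felzenszwalb function, whose scale, sigma, and min_size arguments play the roles of c, σ, and M; this is a sketch under the assumption that scikit-image's implementation behaves like the one used in the paper, and the mean-shift members (for which a separate implementation is needed) are not sketched:

```python
from skimage import io
from skimage.segmentation import felzenszwalb

# The four F-H settings (sigma, c, M) listed above; in scikit-image,
# `scale` plays the role of c and `min_size` the role of M.
FH_SETTINGS = [(0.5, 100, 150), (0.5, 200, 150),
               (0.8, 200, 150), (0.8, 350, 150)]

image = io.imread('input.jpg')                       # any BSDS image
fh_layers = [felzenszwalb(image, scale=c, sigma=sigma, min_size=m)
             for (sigma, c, m) in FH_SETTINGS]

# The over-segmentation with the most segments serves as the primary layer.
pl = max(fh_layers, key=lambda labels: labels.max() + 1)
```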

4.3. Parameter analysis

In this section, we test the performance of our image segmentation approach with varying parameters on the training set of BSDS300. There are three parameters in our approach, namely, $\lambda_c$, $\lambda_t$, and $\lambda_{Lab}$. As shown in Eq. (16), $\lambda_c$ is the weight of the CREA information, $\lambda_t$ is the weight of the texture information, and $\lambda_{Lab}$ is the weight of the Lab information. We test the performance of our approach with each parameter varied in turn (see Figs. 3–5); while testing one parameter, the other parameters are set to be equal. The details of the experiments are given as follows.

• We test the influence of the CREA information by varying the value of $\lambda_c/T$ (as shown in Fig. 3). $\lambda_c/T$ varies from 0 to 1, whereas the other two parameters are set such that $\lambda_{Lab} = \lambda_t = (T - \lambda_c)/4$.
• We test the influence of the texture information by varying the value of $\lambda_t/T$ (as shown in Fig. 4). $\lambda_t/T$ varies from 0 to 1, whereas the other parameters are set such that $\lambda_{Lab} = \lambda_c = (T - \lambda_t)/4$.
• We test the influence of the Lab information by varying the value of $\lambda_{Lab}/T$ (as shown in Fig. 5). $\lambda_{Lab}/T$ varies from 0 to 1/3, whereas the other parameters are set such that $\lambda_c = \lambda_t = (T - 3\lambda_{Lab})/2$.


Fig. 3. Testing the parameter $\lambda_c$, which indicates the influence of the CREA information. The four sub-figures correspond to the average scores of PRI, VoI, GCE, and BDE, respectively, as $\lambda_c/T$ varies from 0 to 1. For PRI, higher values indicate better segmentation; for VoI, GCE, and BDE, lower values indicate better segmentation.

Fig. 4. Testing the parameter $\lambda_t$, which indicates the influence of the texture information. The four sub-figures correspond to the average scores of PRI, VoI, GCE, and BDE, respectively, as $\lambda_t/T$ varies from 0 to 1. For PRI, higher values indicate better segmentation; for VoI, GCE, and BDE, lower values indicate better segmentation.

Fig. 5. Testing the parameter $\lambda_{Lab}$, which indicates the influence of the Lab information. The four sub-figures correspond to the average scores of PRI, VoI, GCE, and BDE, respectively, as $\lambda_{Lab}/T$ varies from 0 to 1/3. For PRI, higher values indicate better segmentation; for VoI, GCE, and BDE, lower values indicate better segmentation.

The parameters $\lambda_c/T$, $\lambda_t/T$, and $\lambda_{Lab}/T$ correspond to the influence of the CREA, texture, and Lab information, respectively. As shown in Fig. 4, setting $\lambda_t/T$ to a relatively small value leads to better performance. As shown in Figs. 3 and 5, when either $\lambda_c/T$ or $\lambda_{Lab}/T$ approaches its minimum or maximum, the segmentation performance declines. The segmentation performance of the proposed approach is consistently good (in terms of PRI, VoI, GCE, and BDE) when $\lambda_c/T$ and $\lambda_{Lab}/T$ are set to moderate values, e.g., in the intervals (0.3, 0.8) and (0.05, 0.2), respectively. In all of the following experiments on the three benchmark datasets, the same parameters are used, namely, $\lambda_c = 4$, $\lambda_t = 1$, and $\lambda_{Lab} = 1$, which gives $T = \lambda_c + \lambda_t + 3\lambda_{Lab} = 8$.

4.4. Performance of our approach with different combinations of components

The proposed approach makes use of three types of information for making the region merging decisions, namely, the cross-region evidence accumulation (CREA) information from multiple over-segmentations, the texture information, and the Lab information (see Eq. (16)). In this section, we compare the segmentation quality of our approach using different combinations of these three components. For each combination, the PRI-optimized segmentation number is adopted for yielding the final segmentation. As shown in Fig. 6, the combinations of two or three components lead to better segmentation performance than those with only one component. The best segmentation performance is achieved with the combination of all three components; the second best is achieved with the combination of the CREA and Lab components.



Fig. 6. Performance of the proposed approach using different combinations of the three components, namely, the CREA, the texture, and the Lab information, in terms of the average scores of (a) PRI, (b) VoI, (c) GCE, and (d) BDE on the BSDS300 dataset. For PRI, higher values indicate better segmentation; for VoI, GCE, and BDE, lower values indicate better segmentation.

Table 1
Average performances on the BSDS300 dataset (bold indicates the best score of all the methods). For PRI and segmentation covering, higher values indicate better segmentation; for VoI, lower values indicate better segmentation.

Methods            PRI    VoI    Covering
Human              0.87   1.16   0.73
ISCC               0.86   1.66   0.65
GL-graph [37]      0.84   1.80   –
TPG [61]           0.82   1.77   –
l0-graph [60]      0.84   2.00   –
gPb-owt-ucm [9]    0.85   1.71   0.65
SAS [33]           0.84   1.67   0.61
MLSS [32]          0.81   1.86   –
TBES [59]          0.80   1.76   –
SDTV [58]          0.78   1.82   –
JSEG [57]          0.78   2.32   –
NTP [56]           0.75   2.50   –
MNcuts [6]         0.76   2.47   –
mean-shift [3]     0.80   1.97   –
F–H [4]            0.71   3.39   –
Ncuts [2]          0.72   2.91   –

Table 2
Average performances on the BSDS500 dataset (bold indicates the best score of all the methods). For PRI and segmentation covering, higher values indicate better segmentation; for VoI, lower values indicate better segmentation.

Methods            PRI    VoI    Covering
Human              0.88   1.17   0.72
ISCC               0.86   1.70   0.64
gPb-owt-ucm [9]    0.86   1.73   0.64
SAS [33]           0.84   1.71   0.60

4.5. Comparison against other image segmentation approaches

In this section, we compare the proposed approach, termed ISCC, against fourteen baseline approaches, namely, Ncuts [2], mean-shift [3], F–H [4], MNcuts [6], NTP [56], JSEG [57], SDTV [58], TBES [59], MLSS [32], gPb-owt-ucm [9], SAS [33], l0-graph [60], TPG [61], and GL-graph [37]. The experiments are conducted on three benchmark datasets, i.e., BSDS300, BSDS500, and MSRC. The PRI-optimized segmentation number (or scale) is adopted for evaluation, following the strategy in [33,37], and [61]. Of the fourteen baseline approaches, the scores of gPb-owt-ucm and SAS are obtained by running the software provided by their authors, while the scores of the other twelve baseline approaches are collected from [33,37], and [61].

In Table 1, we report the segmentation performances of the proposed ISCC approach and the fourteen baseline approaches on the BSDS300 dataset. For each evaluation metric, the average score over the 300 images in BSDS300 is computed. As shown in Table 1, the proposed approach outperforms the fourteen baseline approaches in terms of PRI and VoI.

Table 3
Average performances on the MSRC dataset (bold indicates the best score of all the methods). For PRI and segmentation covering, higher values indicate better segmentation; for VoI, lower values indicate better segmentation.

Methods            PRI    VoI    Covering
ISCC               0.85   1.27   0.74
TPG [61]           0.81   1.26   –
gPb-owt-ucm [9]    0.84   1.28   0.73
SAS [33]           0.80   1.39   0.67

The average PRI score of our approach over the 300 images in BSDS300 is 0.86, while the second best PRI score, 0.85, is achieved by gPb-owt-ucm. Our approach yields an average VoI of 1.66, which significantly outperforms most baseline approaches. In terms of segmentation covering, our approach and the gPb-owt-ucm approach achieve the best score of 0.65.

In Table 2, the average performances of gPb-owt-ucm, SAS, and our approach on the BSDS500 dataset are reported. The proposed ISCC approach achieves the same scores as gPb-owt-ucm in terms of PRI and segmentation covering, but a better score than gPb-owt-ucm in terms of VoI. Our approach outperforms the SAS approach in terms of all three evaluation metrics.

In Table 3, we report the average performances of TPG, gPb-owt-ucm, SAS, and our approach on the MSRC dataset. The proposed ISCC approach achieves a VoI score of 1.27, which is slightly poorer than the TPG approach, but superior to gPb-owt-ucm and SAS. Compared to the three baseline approaches, the proposed approach yields the best performance on MSRC in terms of PRI and segmentation covering.

The visual segmentation results of MNcuts [6], SAS [33], gPb-owt-ucm [9], and the proposed approach for some sample images in BSDS300 are given in Fig. 7. For each of the test approaches, the PRI-optimized segmentation number is adopted. We refer to the seven images, each corresponding to a column in Fig. 7, as images 1–7. The MNcuts and SAS approaches are based on normalized cuts, which favor normalized segments and tend to break large uniform regions. The gPb-owt-ucm approach produces good segmentations for images 2 and 5, which are visually comparable to those of the proposed ISCC approach, but yields poorer segmentations than the proposed approach for the other images. For image 7, none of the four test approaches achieves a very good segmentation. The proposed ISCC approach tends to over-segment some regions of image 7, probably due to the ambiguity around the soldier's hands and shoes; even so, the segmentation of ISCC for image 7 is still better than or comparable to those of the other three baseline approaches. To summarize, the visual comparison suggests that the proposed approach tends to produce overall better segmentation results than the baseline approaches (see Fig. 7).



Fig. 7. Some sample images and their segmentations produced by different approaches. Row 1 corresponds to the original images. Rows 2, 3, 4 and 5 respectively correspond to the segmentations produced by MNcuts [6], SAS [33], gPb-owt-ucm [9] and the proposed ISCC approach. The PRI-optimized segmentation number is adopted for each of the test approaches.

4.6. Performance with various ensembles of over-segmentations

In this section, we further compare ISCC against SAS [33] with various ensembles of over-segmentations. Specifically, we run ISCC and SAS repeatedly on the BSDS300 dataset and report their average performances over a large number of runs, using randomly decided over-segmentation settings at each run. The random over-segmentation settings are obtained as follows. First, for each run, the number of over-segmentations, i.e., $K$, is randomly selected in the interval $[5, 10]$. Then, to generate each of the $K$ over-segmentation settings, we randomly choose a method between mean-shift [3] and F–H [4]. If mean-shift is chosen, its three parameters $h_s$, $h_r$, and $M$ are randomly selected in the intervals $[5, 10]$, $[5, 15]$, and $[50, 200]$, respectively. Similarly, if F–H is chosen, its three parameters $\sigma$, $c$, and $M$ are randomly selected in the intervals $[0.5, 0.9]$, $[100, 400]$, and $[50, 200]$, respectively.
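A sketch of this sampling procedure follows; the paper specifies only the intervals, so drawing uniformly within them (and drawing M as an integer) is our assumption:

```python
import random

def random_ensemble_settings(seed=None):
    """Sample one ensemble of over-segmentation settings as in Section 4.6."""
    rng = random.Random(seed)
    settings = []
    for _ in range(rng.randint(5, 10)):              # K in [5, 10]
        if rng.random() < 0.5:                       # mean-shift member
            params = (rng.uniform(5, 10),            # h_s in [5, 10]
                      rng.uniform(5, 15),            # h_r in [5, 15]
                      rng.randint(50, 200))          # M   in [50, 200]
            settings.append(('mean-shift', params))
        else:                                        # F-H member
            params = (rng.uniform(0.5, 0.9),         # sigma in [0.5, 0.9]
                      rng.uniform(100, 400),         # c     in [100, 400]
                      rng.randint(50, 200))          # M     in [50, 200]
            settings.append(('F-H', params))
    return settings
```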


Fig. 8. Average performances (over 100 different ensembles of over-segmentations) of the proposed ISCC approach and the SAS approach [33] on the BSDS300 dataset in terms of (a) PRI, (b) VoI, (c) GCE, and (d) BDE. For PRI, higher values indicate better segmentation; for VoI, GCE, and BDE, lower values indicate better segmentation.

With the $K$ over-segmentation settings decided, we generate $K$ over-segmentations for each image in BSDS300. The ISCC approach and the SAS approach are then performed on the $K$ over-segmentations of each image, and their average performances (in terms of PRI, VoI, GCE, and BDE) over the 300 images in BSDS300 are thereby obtained. We repeat this procedure 100 times, with the over-segmentation settings randomly decided at each run, and report the grand average performances over the 100 runs in Fig. 8. As shown in Fig. 8, the proposed ISCC approach yields consistent performance over various ensembles of over-segmentations. With the ensemble of $K$ over-segmentation settings randomly decided at each run, the proposed approach consistently outperforms the SAS approach over the 100 runs on the BSDS300 dataset in terms of PRI, VoI, GCE, and BDE, which demonstrates the robustness of our approach to various ensembles of over-segmentations.

4.7. Computational cost

To evaluate the computational cost, we use different segmentation methods to segment a $481 \times 321$ color image from the BSDS300 dataset. As shown in Table 4, the proposed ISCC approach and the SAS approach are the fastest two of the five test approaches, consuming 29 s and 23 s, respectively, to segment an image in BSDS300. Of the 29 s of CPU time for our approach, 6 s are spent producing the over-segmentations and 11 s computing the textons. It takes 343 s for gPb-owt-ucm to segment a $481 \times 321$ image, which is significantly more computationally expensive than the other test approaches. In terms of memory usage, the proposed approach has a huge advantage over gPb-owt-ucm and a significant advantage over SAS, MNcuts, and Ncuts: the proposed approach consumes about 80 MB of memory for segmenting a $481 \times 321$ image, whereas gPb-owt-ucm consumes over 6 GB and the other three baseline methods consume about 300 MB, 250 MB, and 800 MB, respectively. To summarize, according to the experimental results on segmentation quality and computational cost on the benchmark datasets (see Tables 1–4), the proposed approach outperforms the baseline image segmentation approaches in segmentation quality while requiring a low computational cost (in execution time and memory usage).

Table 4
Computational cost of segmenting a color image in BSDS300 using different methods.

Methods            Execution time (in seconds)   Memory usage (in Megabytes)
ISCC               29                            ≈80
gPb-owt-ucm [9]    343                           ≥6000
SAS [33]           23                            ≈300
MNcuts [6]         60                            ≈250
Ncuts [2]          76                            ≈800

5. Conclusion

In this paper, we propose a novel image segmentation approach, termed Image Segmentation by Cross-region evidence accumulation and Cue integration (ISCC), which is based on over-segmentation fusion and hierarchical region merging. The CREA mechanism is presented for collecting cross-region evidence from multiple over-segmentations. Instead of accumulating pixel-wise evidence, the proposed CREA mechanism handles the task of fusing over-segmentations at the larger scale of voting among regions, where the situations of both full occlusion and partial occlusion are taken into consideration and a CREA term is obtained for estimating the coherency of adjacent regions w.r.t. multiple over-segmentations. The brightness, color, and texture cues are further integrated to measure the appearance similarity between regions in an over-segmentation, which, together with the CREA information, is exploited for making the region merging decisions. A hierarchy of segmentations is obtained by iterative region merging. We conduct experiments on three public image datasets, i.e., BSDS300, BSDS500, and MSRC. The experimental results demonstrate that the proposed approach achieves better segmentation performance than the state-of-the-art approaches with a low computational cost in execution time and memory usage.

Acknowledgment

This work was supported by NSFC (61573387 & 61502543), Guangdong Natural Science Funds for Distinguished Young Scholar (16050000051), the PhD Start-up Fund of Natural Science Foundation of Guangdong Province, China (2016A030310457 & 2014A030310180), the Fundamental Research Funds for the Central Universities (16lgzd15), South China Agricultural University Special Funds for Young Scientific Talents, and the GuangZhou Program (Grant no. 201508010032).

References

[1] J. Malik, S. Belongie, J. Shi, T. Leung, Textons, contours and regions: cue integration in image segmentation, in: Proceedings of the IEEE International Conference on Computer Vision (ICCV'99), 1999.
[2] J. Shi, J. Malik, Normalized cuts and image segmentation, IEEE Trans. Pattern Anal. Mach. Intell. 22 (8) (2000) 888–905.


[3] D. Comaniciu, P. Meer, Mean shift: a robust approach toward feature space analysis, IEEE Trans. Pattern Anal. Mach. Intell. 24 (5) (2002) 603–619.
[4] P.F. Felzenszwalb, D.P. Huttenlocher, Efficient graph-based image segmentation, Int. J. Comput. Vis. 59 (2) (2004) 167–181.
[5] R. Nock, F. Nielsen, Statistical region merging, IEEE Trans. Pattern Anal. Mach. Intell. 26 (11) (2004) 1452–1458.
[6] T. Cour, F. Bénézit, J. Shi, Spectral segmentation with multiscale graph decomposition, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR'05), 2005.
[7] E. Sharon, M. Galun, D. Sharon, R. Basri, A. Brandt, Hierarchy and adaptivity in segmenting visual scenes, Nature 442 (2006) 810–813.
[8] B. Zhao, L. Fei-Fei, E.P. Xing, Image segmentation with topic random field, in: Proceedings of the European Conference on Computer Vision (ECCV'10), 2010.
[9] P. Arbeláez, M. Maire, C. Fowlkes, J. Malik, Contour detection and hierarchical image segmentation, IEEE Trans. Pattern Anal. Mach. Intell. 33 (5) (2011) 898–916.
[10] B. Peng, L. Zhang, D. Zhang, Automatic image segmentation by dynamic region merging, IEEE Trans. Image Process. 20 (2011) 3592–3605.
[11] Z. Yu, A. Li, O. Au, C. Xu, Bag of textons for image segmentation via soft clustering and convex shift, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR'12), 2012.
[12] Y. Yang, L. Guo, T. Wang, W. Tao, G. Shao, Q. Feng, Unsupervised multiphase color-texture image segmentation based on variational formulation and multilayer graph, Image Vis. Comput. 32 (2) (2014) 87–106.
[13] B. Peng, L. Zhang, D. Zhang, A survey of graph theoretical approaches to image segmentation, Pattern Recognit. 46 (3) (2013) 1020–1038.
[14] L. Wang, C. Pan, Robust level set image segmentation via a local correntropy-based k-means clustering, Pattern Recognit. 47 (5) (2014) 1917–1925.
[15] C. Qin, G. Zhang, Y. Zhou, W. Tao, Z. Cao, Integration of the saliency-based seed extraction and random walks for image segmentation, Neurocomputing 129 (2014) 378–391.
[16] H. Yang, N. Ahuja, Automatic segmentation of granular objects in images: combining local density clustering and gradient-barrier watershed, Pattern Recognit. 47 (2014) 2266–2279.
[17] O.O. Karadag, F.T.Y. Vural, Image segmentation by fusion of low level and domain specific information via Markov random fields, Pattern Recognit. Lett. 46 (2014) 75–82.
[18] X. Cai, Variational image segmentation model coupled with image restoration achievements, Pattern Recognit. 48 (2015) 2029–2042.
[19] H. Min, W. Jia, X.-F. Wang, Y. Zhao, R.-X. Hu, Y.-T. Luo, F. Xue, J.-T. Lu, An intensity-texture model based level set method for image segmentation, Pattern Recognit. 48 (4) (2015) 1543–1558.
[20] A.L.N. Fred, A.K. Jain, Combining multiple clusterings using evidence accumulation, IEEE Trans. Pattern Anal. Mach. Intell. 27 (6) (2005) 835–850.
[21] X. Wang, C. Yang, J. Zhou, Clustering aggregation by probability accumulation, Pattern Recognit. 42 (5) (2009) 668–675.
[22] S. Mimaroglu, E. Erdil, Combining multiple clusterings using similarity graph, Pattern Recognit. 44 (3) (2011) 694–703.
[23] D. Huang, J.-H. Lai, C.-D. Wang, Exploiting the wisdom of crowd: a multi-granularity approach to clustering ensemble, in: Proceedings of the International Conference on Intelligence Science and Big Data Engineering (IScIDE'13), 2013.
[24] D. Huang, J.-H. Lai, C.-D. Wang, Combining multiple clusterings via crowd agreement estimation and multi-granularity link analysis, Neurocomputing 170 (2015) 240–250.
[25] D. Huang, J. Lai, C.-D. Wang, Ensemble clustering using factor graph, Pattern Recognit. 50 (2016) 131–142.
[26] D. Huang, J.-H. Lai, C.-D. Wang, Robust ensemble clustering using probability trajectories, IEEE Trans. Knowl. Data Eng. 28 (5) (2016) 1312–1326.
[27] S. Kim, C.D. Yoo, S. Nowozin, P. Kohli, Image segmentation using higher-order correlation clustering, IEEE Trans. Pattern Anal. Mach. Intell. 36 (2014) 1761–1774.
[28] M. Gong, Y. Liang, J. Shi, W. Ma, J. Ma, Fuzzy c-means clustering with local information and kernel metric for image segmentation, IEEE Trans. Image Process. 22 (2013) 573–584.
[29] L. Chen, J. Zou, C.P. Chen, Kernel spatial shadowed c-means for image segmentation, Int. J. Fuzzy Syst. 16 (2014).
[30] H. Ali, N. Badshah, K. Chen, G.A. Khan, A variational model with hybrid images data fitting energies for segmentation of images with intensity inhomogeneity, Pattern Recognit. 51 (2016) 27–42.
[31] Y. Wu, C. He, Indirectly regularized variational level set model for image segmentation, Neurocomputing 171 (2016) 194–208.
[32] T.H. Kim, K.M. Lee, S.U. Lee, Learning full pairwise affinities for spectral segmentation, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR'10), 2010.
[33] Z. Li, X.-M. Wu, S.-F. Chang, Segmentation using superpixels: a bipartite graph partitioning approach, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR'12), 2012.
[34] A.Y. Ng, M.I. Jordan, Y. Weiss, On spectral clustering: analysis and an algorithm, in: Advances in Neural Information Processing Systems (NIPS'01), MIT Press, 2001, pp. 849–856.
[35] U. von Luxburg, A tutorial on spectral clustering, Stat. Comput. 17 (4) (2007) 395–416.

11

[36] A. Brandt, S. McCormick, J. Ruge, Sparsity and its Applications, Cambridge University Press, 1984. [37] X. Wang, Y. Tang, S. Masnou, L. Chen, A global/local affinity graph for image segmentation, IEEE Trans. Image Process. 24 (4) (2015) 1399–1411. [38] R.S. Medeiros, J. Scharcanski, A. Wong, Natural scene segmentation based on a stochastic texture region merging approach, in: Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP'13), pp. 1464–1467. [39] A.K. Jain, Data clustering: 50 years beyond k-means, Pattern Recognit. Lett. 31 (8) (2010) 651–666. [40] S. Vega-Pons, J. Ruiz-Shulcloper, A survey of clustering ensemble algorithms, Int. J. Pattern Recognit. Artif. Intell. 25 (2011) 337–372. [41] L. Franek, D.D. Abdala, S. Vega-Pons, X. Jiang, Image segmentation fusion using general ensemble clustering methods, in: Proceedings of the Asian Conference on Computer Vision (ACCV'10), pp. 373–384. [42] M. Mignotte, Segmentation by fusion of histogram-based k-means clusters in different color spaces, IEEE Trans. Image Process. 17 (2008) 780–787. [43] H. Kim, J.J. Thiagarajan, P.T. Bremer, Image segmentation using consensus from hierarchical segmentation ensembles, in: Proceedings of the IEEE International Conference on Image Processing (ICIP'14), pp. 3272–3276. [44] S. Vega-Pons, X. Jiang, J. Ruiz-Shulcloper, Segmentation ensemble via kernels, in: Proceedings of the Asian Conference on Pattern Recognition (ACPR'11), 2011. [45] A. Alush, J. Goldberger, Ensemble segmentation using efficient integer linear programming, IEEE Trans. Pattern Anal. Mach. Intell. 34 (10) (2012) 1966–1977. [46] M. Mignotte, A label field fusion model with a variation of information estimator for image segmentation, Inf. Fusion 20 (2014) 7–20. [47] H. Wang, Y. Zhang, R. Nie, Y. Yang, B. Peng, T. Li, Bayesian image segmentation fusion, Knowl.-Based Syst. 71 (2014) 162–168. [48] S. Vega-Pons, J. Correa-Morris, J. Ruiz-Shulcloper, Weighted partition consensus via kernels, Pattern Recognit. 43 (8) (2010) 2712–2724. [49] D. Martin, C. Fowlkes, D. Tal, J. Malik, A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics, in: Proceedings of the IEEE International Conference on Computer Vision (ICCV'01), 2001. [50] J. Shotton, J. Winn, C. Rother, A. Criminisi, Textonboost: Joint appearance, shape and context modeling for multi-class object recognition and segmentation, in: Proceedings of the European Conference on Computer Vision (ECCV'06), 2006. [51] T. Malisiewicz, A.A. Efros, Improving spatial support for objects via multiple segmentations, in: Proceedings of the British Machine Vision Conference (BMVC'07), 2007. [52] R. Unnikrishnan, C. Pantofaru, M. Hebert, Toward objective evaluation of image segmentation algorithms, IEEE Trans. Pattern Anal. Mach. Intell. 29 (6) (2007) 929–944. [53] M. Meilǎ, Comparing clusterings: an axiomatic view, in: Proceedings of the International Conference on Machine Learning (ICML'05), 2005. [54] J. Freixenet, X. Muñoz, D. Raba, J. Martí, X. Cufí, Yet another survey on image segmentation: region and boundary information integration, in: Proceedings of the European Conference on Computer Vision (ECCV'02), 2002. [55] W.M. Rand, Objective criteria for the evaluation of clustering methods, J. Am. Stat. Assoc. 66 (336) (1971) 846–850. [56] J. Wang, Y. Jia, X.-S. Hua, C. Zhang, L. 
Quan, Normalized tree partitioning for image segmentation, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR'08), 2008. [57] Y. Deng, B. Manjunath, Unsupervised segmentation of color-texture regions in images and video, IEEE Trans. Pattern Anal. Mach. Intell. 23 (8) (2001) 800–810. [58] M. Donoser, M. Urschler, M. Hirzer, H. Bischof, Saliency driven total variation segmentation, in: Proceedings of the IEEE International Conference on Computer Vision (ICCV'09), 2009. [59] H. Mobahi, S. Rao, A. Yang, S. Sastry, Y. Ma, Segmentation of natural images by texture and boundary compression, Int. J. Comput. Vis. 95 (1) (2011) 86–98. [60] X. Wang, H. Li, C.E. Bichot, S. Masnou, L. Chen, A graph-cut approach to image segmentation using an affinity graph based on l0-sparse representation of features, in: Proceedings of the IEEE International Conference on Image Processing (ICIP'13), pp. 4019–4023. [61] X. Yang, L. Prasad, L.J. Latecki, Affinity learning with diffusion on tensor product graph, IEEE Trans. Pattern Anal. Mach. Intell. 35 (1) (2013) 28–38.

Dong Huang received his B.S. degree in Computer Science in 2009 from South China University of Technology, China, and his M.Sc. degree in 2011 and Ph.D. degree in 2015, both in Computer Science, from Sun Yat-sen University, China. He joined South China Agricultural University in 2015 as an Assistant Professor with the College of Mathematics and Informatics. His research interests include data mining and pattern recognition.


Jian-Huang Lai received his M.Sc. degree in Applied Mathematics in 1989 and his Ph.D. degree in Mathematics in 1999 from Sun Yat-sen University, China. He joined Sun Yat-sen University in 1989 as an Assistant Professor, where he is currently a Professor with the Department of Automation, School of Information Science and Technology. His current research interests are in the areas of digital image processing, pattern recognition, multimedia communication, and wavelets and their applications. He has published over 100 scientific papers in international journals and conferences on image processing and pattern recognition, e.g. IEEE TPAMI, IEEE TKDE, IEEE TNN, IEEE TIP, IEEE TSMC (Part B), Pattern Recognition, ICCV, CVPR and ICDM. Prof. Lai serves as a standing member of the Image and Graphics Association of China and as a standing director of the Image and Graphics Association of Guangdong.

Chang-Dong Wang received his B.S. degree in Applied Mathematics in 2008, his M.Sc. degree in Computer Science in 2010, and his Ph.D. degree in Computer Science in 2013, all from Sun Yat-sen University, China. He was a visiting student at the University of Illinois at Chicago from January 2012 to November 2012. He joined Sun Yat-sen University in 2013 as an Assistant Professor with the School of Mobile Information Engineering. His current research interests include machine learning and data mining. He has published over 40 scientific papers in international journals and conferences such as IEEE TPAMI, IEEE TKDE, IEEE TSMC-C, Pattern Recognition, KAIS, Neurocomputing, ICDM and SDM. His ICDM 2010 paper won an Honorable Mention for the Best Research Paper Award.

Pong C. Yuen received his B.Sc. degree in Electronic Engineering with first class honors in 1989 from City Polytechnic of Hong Kong, and his Ph.D. degree in Electrical and Electronic Engineering in 1993 from the University of Hong Kong. He joined the Department of Computer Science, Hong Kong Baptist University, in 1993, and is currently a Professor and Head of the Department. Dr. Yuen was a recipient of the University Fellowship to visit the University of Sydney in 1996, where he was associated with the Laboratory of Imaging Science and Engineering, Department of Electrical Engineering. In 1998, he spent a six-month sabbatical leave at the University of Maryland Institute for Advanced Computer Studies (UMIACS), University of Maryland at College Park, where he was associated with the Computer Vision Laboratory, CFAR. From June 2005 to January 2006, he was a Visiting Professor in the GRAVIR Laboratory (GRAphics, VIsion and Robotics) of INRIA Rhône-Alpes, France, where he was associated with the PRIMA group. Dr. Yuen was the director of the Croucher Advanced Study Institute (ASI) on Biometric Authentication in 2004 and the director of the Croucher ASI on Biometric Security and Privacy in 2007. He has been actively involved in many international conferences as an organizing committee and/or technical program committee member, such as FG and ICB, and was a track co-chair of the International Conference on Pattern Recognition in 2006. Currently, Dr. Yuen is an editorial board member of Pattern Recognition. His current research interests include human face processing and recognition, biometric security and privacy, and context modeling and learning for human activity recognition.
