Interactive color image segmentation via iterative evidential labeling


Yin Chen a, Armin B. Cremers b, Zhiguo Cao a,*

a National Key Laboratory of Science and Technology on Multi-spectral Information Processing, School of Automation, Huazhong University of Science and Technology, Luo Yu Road, No. 1037, Hongshan District, 430074 Wuhan, China
b Institute of Computer Science III, Rheinische Friedrich-Wilhelms-Universität Bonn, Römerstr. 164, 53117 Bonn, Germany

* Corresponding author. Tel.: +86 27 87558918; fax: +86 27 87540131. E-mail address: [email protected] (Z. Cao).

Article info

Article history: Received 9 November 2013; Received in revised form 21 February 2014; Accepted 24 March 2014; Available online xxxx

Keywords: Interactive image segmentation; Markov random fields (MRFs); Dempster–Shafer's (D–S) theory of evidence; Bayesian information criterion (BIC); Information fusion

Abstract

We develop an interactive color image segmentation method in this paper. The method combines the concepts of Markov random fields (MRFs) and D–S evidence theory to obtain segmentation results by considering both likelihood information and prior information under a Bayesian framework. The method first uses the expectation maximization (EM) algorithm to estimate the parameters of the user-input regions, with the Bayesian information criterion (BIC) used for model selection. The beliefs of each pixel are then assigned by a predefined scheme. The result is obtained by iteratively fusing the pixel likelihood information and the pixel contextual information until convergence. The method is initially designed for two-label segmentation, but it can be easily generalized to multi-label segmentation. Experimental results show that the proposed method is comparable to other prevalent interactive image segmentation algorithms in most two-label segmentation tasks, both qualitatively and quantitatively.

© 2014 Elsevier B.V. All rights reserved.

1. Introduction

Image segmentation is the first step of many computer vision tasks. It involves partitioning an image into several homogeneous parts, i.e., spatially connected clusters of pixels, such that the union of any two neighboring parts is heterogeneous. In general, image segmentation methods can be categorized as fully automatic, semi-automatic, or manual. Manually segmenting images is time consuming and tedious, as well as lacking in precision; such methods are impractical for large images or long image sequences. On the other hand, fully automatic methods segment images without human intervention, which greatly simplifies the operation, and they can achieve high accuracy in many uncomplicated image scenes. However, fully automatic methods often fail when the image scene is complex. In these situations, semi-automatic methods can be the best choice: the segmentation is obtained after a few interactions (usually scribbles or strokes) are provided to indicate the region of interest. This kind of user interaction helps to segment difficult scenes. From a pattern classification viewpoint, such user inputs can be viewed as supervised information that provides visual hints to model and group visual patterns, and many existing machine learning algorithms can be employed to segment images with such supervised information [1].

In practice, image segmentation is not an easy task due to all sorts of difficulties, such as noise pollution, illumination variation and background clutter. Color images contain more information, which makes them more difficult to segment [2]. There has thus been much research on color image segmentation, and it has received much attention for visual surveillance, intelligent transportation, special film effects, and so on. Many different color image segmentation methods have been reported, for example methods based on mathematical morphology [3], MRFs [4,5], neural networks [6], and support vector machines (SVMs) [7]. MRFs consider the spatial-contextual information contained in images within the framework of Bayesian decision theory: the labels representing the segmentation result are decided by considering both the likelihood information given by the pixel values and the prior information given by the labels of the neighborhoods [8].

Data fusion has attracted considerable research interest in the last decade [9,10]. There are many data fusion techniques: fusion by Bayesian inference [11], probabilistic fusion [12], fuzzy fusion [13], and evidence theory, also known as D–S theory [14,15], which is the basis of this work. D–S theory has been used for MRI segmentation and classification [16–20]. In [16], an unsupervised algorithm based on D–S evidence theory is proposed for segmenting and visualizing left heart ventricles. In [17], some key features of D–S evidence theory are pointed out, along with examples of brain tissue classification in pathological dual-echo MR images. In [18], a segmentation scheme for multi-echo MR images is proposed; the scheme combines spatial information by the fusion


of the information of spatial neighborhoods. In [19], spatial neighborhood information is introduced into Evidential C-Means to deal with the problem of multi-source image segmentation, with applications to prostate multi-parametric MRI. In [20], a pixel labeling method based on evidence theory is proposed. In [21], the problem of color image segmentation is tackled by considering the tristimulus values R, G and B as three independent information sources and fusing the information they provide. However, in some cases the information of the different color channels may conflict, which leads to meaningless fusion results [22].

The main contribution of this paper is a new interactive color image segmentation method built on the concepts of Markov random fields (MRFs) and D–S evidence theory. In [18], spatial information is introduced by fusing the basic belief assignments of neighboring pixels, but that treatment is heuristic. Since segmentation results are decided by considering both likelihood information and prior information under a Bayesian framework, we instead incorporate spatial information by generalizing the Bayesian formulation under the evidential framework. The proposed method has only one parameter. Starting with two-label segmentation, the method can be generalized to multi-label segmentation. Experimental results demonstrate the effectiveness of our method. Although the pixel labeling method of [20] is also based on evidence theory, there are essential differences between the two methods. In particular, the contributions of our method include: (1) the spatial contextual information is introduced in a Bayesian framework; (2) the assignment of non-singleton belief is inspired by [23], which considers how large the difference among the involved singletons is; (3) class variances are considered in assigning the belief of an unlabeled pixel.

The rest of the paper is organized as follows. Related works are summarized in Section 2. The basic concepts of MRFs and D–S evidence theory are introduced in Section 3. Section 4 describes our segmentation scheme. Experimental results are given in Section 5, and discussions and conclusions are given in Sections 6 and 7.

2. Related works

In [24], image segmentation is formulated as a labeling problem in which image pixels or features are assigned labels. In particular, a set of sites and a set of labels are defined, with a neighborhood system representing the interrelationship between sites. The contextual constraints are integrated into energy functions under the Bayesian decision rule [24], and the labeling result is obtained by different optimization methods. Another interactive image segmentation method related to MRFs is graph cut, proposed in [25,26]. It determines a globally optimal solution using a fast min-cut/max-flow algorithm. The graph cut method boasts high speed, high stability and a strong mathematical foundation, and has become popular. In [27], the "GrabCut" algorithm based on graph cuts is proposed; it uses the min-cut/max-flow algorithm iteratively to minimize the energy, instead of the one-shot algorithm in [26]. In [20], image segmentation is also formulated as a pixel labeling problem. Unlike the above-mentioned methods, however, the strategy adopted in that work is to first label the pixels with low degrees of doubt; those with high degrees of doubt are labeled progressively by iterative regularization, achieving more accurate results step by step. Although that strategy is similar to our proposed method, there are several differences between the two, including the assignment of belief and the regularization scheme. In [28], a region merging strategy is proposed which uses a maximal-similarity mechanism to guide the merging process. The input image is first segmented into regions by

any general-purpose method, such as mean shift or watershed. Then, a region is merged with one of its neighboring regions if the similarity between the two is the highest among all neighboring region pairs. The region merging process requires neither a similarity threshold nor any dependence on the image content. Besides the above-mentioned methods, there are other classifiers such as "linear discriminant analysis (LDA) + K-nearest neighbor (KNN)" [29,30], support vector machines (SVMs) [7], random walks (RW) [31], lazy snapping [32], paint selection [33], etc. These methods are effective in many cases, but they may generate unsatisfactory results in complex natural scenes. In graph cuts [25,26] and GrabCut [27], the energy function is optimized by finding the minimal cut of a graph; however, the result may be inaccurate due to the complexity of the scene and inaccurate parameter estimation. In [20] the pixel labeling process utilizes D–S evidence theory, but the regularization method is not in the sense of maximum a posteriori (MAP). Intuitively, data fusion has the potential to improve the performance of image segmentation. In our method we combine D–S evidence theory with the MRF framework, yielding a new method which is in the sense of MAP. In the following sections, we present our interactive color image segmentation method based on MRFs and D–S evidence theory, together with experiments and discussions.

3. Basics of MRF and D–S evidence theory

3.1. MRF image segmentation

MRFs formulate image segmentation as a maximum a posteriori (MAP) problem: the class labels are iteratively optimized by maximizing the global posterior probability. Let S be the pixel set, X = {x_1, x_2, ..., x_S} the observation field, and Ω = {ω_1, ω_2, ..., ω_S} the corresponding label field, where each ω_i ∈ {1, 2, ..., C} can take any label of the class set {1, 2, ..., C}. The optimization of the global posterior probability p(Ω|X) under the Bayesian rule is given by [8]:

\hat{\Omega} = \arg\max_{\Omega} p(\Omega \mid X) = \arg\max_{\Omega} \{ p(X \mid \Omega)\, p(\Omega) \} \qquad (1)

Here in (1), p(X|Ω) is the likelihood of the observation field X conditioned on the label field Ω, while p(Ω) is the prior probability of the full-scene label field. If a Gaussian distribution is assumed for the observation field, we have:

p(X \mid \Omega) = \prod_{i=1}^{S} \frac{1}{\sqrt{2\pi}\,\sigma_i} \exp\left[ -\frac{(x_i - \mu_i)^2}{2\sigma_i^2} \right] \qquad (2)

where μ_i and σ_i are the mean and standard deviation of the Gaussian distribution corresponding to the class ω_i, respectively. The prior probability of the label field, p(Ω), is computationally intractable, because for an image with I × J pixels and C possible class labels per pixel, Ω has C^{IJ} different configurations. We therefore consider the posterior probability at the individual pixel level rather than for the whole scene. Making use of the concept of neighborhood systems, we have:

^ c ¼ arg max pðxc jxi ; x@i Þ x

ð3Þ

c

where ∂i is the neighborhood of pixel i. An example of a neighborhood system is shown in Fig. 1; the neighborhood size chosen in this study is 3 × 3. Assuming that the central observation x_i is independent of the neighboring labels, it can be shown that [34]:

p(\omega_i \mid x_i, \omega_{\partial i}) \propto p(x_i \mid \omega_i)\, p(\omega_i \mid \omega_{\partial i}) \qquad (4)


Fig. 1. Neighborhood system with different orders. The numbers n = 1, ..., 5 indicate the outermost neighboring sites in the nth-order neighborhood system.

To make use of optimization methods, the negative logarithm of the above equation is usually taken, replacing multiplications by summations. Therefore we have [8]:

\hat{\Omega} = \arg\max_{\Omega} \{ p(X \mid \Omega)\, p(\Omega) \} = \arg\min_{\Omega} \sum_{i=1}^{S} -\log\left[ p(x_i \mid \omega_i)\, p(\omega_i \mid \omega_{\partial i}) \right] = \arg\min_{\Omega} \sum_{i=1}^{S} \left( U_{\mathrm{data}} + U_{\mathrm{context}} \right) \qquad (5)

where

U_{\mathrm{data}} = -\log p(x_i \mid \omega_i) = \sum_{i=1}^{S} \frac{(x_i - \mu_i)^2}{2\sigma_i^2} \qquad (6)

U_{\mathrm{context}} = -\log p(\omega_i \mid \omega_{\partial i}) = \sum_{i=1}^{S} \sum_{j \in \partial i} \beta\, \delta_k(\omega_i, \omega_j) \qquad (7)

U_data and U_context are the two energy terms representing, respectively, how well the observation field fits the assumed distribution model and how well the label field satisfies the smoothness constraint. In MRF-based methods, β is a constant parameter tuning the weight between the two energy terms. Large values of β penalize spatial discontinuities heavily and yield smoother segmentations at the cost of detail, while small values yield noisier segmentations with more detail. δ_k is the so-called Kronecker delta function, given by:

\delta_k(\omega_i, \omega_j) = \begin{cases} 0, & \text{if } \omega_i = \omega_j \\ 1, & \text{if } \omega_i \neq \omega_j \end{cases} \qquad (8)

Unlike the optimization-based MRF segmentation process, we here use D–S evidence theory to obtain the optimal class labels for the image pixels, treating the likelihood probability and the prior probability in (1) as two different information sources. The final decision is made by fusing the information of the two sources. The basic concepts of D–S evidence theory are briefly introduced in the next subsection.
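To make the energy terms concrete, here is a minimal sketch (in Python; the class parameters, the β value and the function names are illustrative choices of ours, not from the paper) of evaluating Eqs. (6)-(8) for a single pixel and picking the label that minimizes its contribution to Eq. (5):

```python
def data_energy(x, mu, sigma):
    """Per-pixel U_data under the Gaussian likelihood of Eq. (6)."""
    return (x - mu) ** 2 / (2.0 * sigma ** 2)

def context_energy(label, neighbor_labels, beta):
    """Per-pixel U_context of Eq. (7): beta times the number of neighbors
    with a different label (the delta function of Eq. (8))."""
    return beta * sum(1 for lj in neighbor_labels if lj != label)

# Toy setup: a bright pixel whose 3x3 neighborhood is mostly foreground.
x = 200.0
params = {0: (50.0, 10.0), 1: (190.0, 15.0)}  # class -> (mu, sigma), illustrative
neighbors = [1, 1, 1, 0, 1, 1, 1, 1]          # labels of the 8 surrounding pixels
beta = 0.8                                    # smoothness weight, illustrative
energies = {c: data_energy(x, *params[c]) + context_energy(c, neighbors, beta)
            for c in params}
best_label = min(energies, key=energies.get)  # per-pixel arg min of Eq. (5)
```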

3.2. D–S evidence theory

Under D–S evidence theory, a frame of discernment Θ is defined, composed of N hypotheses H_n. The problem's solution set is Θ = {H_1, H_2, ..., H_N}. The power set of all 2^N propositions defined on Θ is:

2^{\Theta} = \{ \emptyset, H_1, H_2, \ldots, \{H_1 \cup H_2\}, \{H_1 \cup H_3\}, \ldots, \Theta \} \qquad (9)

where ∅ represents the empty set. A proposition A can be either a singleton H_n or a union of hypotheses. The basic belief assignment (bba) m assigned by a source is defined as:

m: 2^{\Theta} \to [0, 1], \qquad \sum_{A \subseteq \Theta} m(A) = 1 \qquad (10)

If we consider the closed-world case, the bba satisfies m(∅) = 0, which means that the solution belongs to the discernment frame [22]. If A is a union of hypotheses, redistributing a part of m(A) to any subset of A is not possible without additional information. An element A ⊆ Θ is called a focal element if m(A) ≠ 0. The credibility (Bel) and plausibility (Pl) functions are defined as:

\mathrm{Bel}(A) = \sum_{B \subseteq A,\ B \neq \emptyset} m(B) \qquad (11)

\mathrm{Pl}(A) = \sum_{A \cap B \neq \emptyset} m(B) \qquad (12)

We can interpret Bel(A) and Pl(A), respectively, as the total amount of belief in the proposition A and the maximum amount of belief potentially assignable to A. The functions m, Bel and Pl can be viewed as three different representations of the same information [15]; using the Möbius transformation, they can be converted into one another [35]. According to Dempster's combination rule, the combination of J masses of belief from J distinct sources is defined as:

m = m_1 \oplus m_2 \oplus \cdots \oplus m_J \qquad (13)

\forall A \subseteq \Theta: \quad m(A) = \frac{m_{\cap}(A)}{1 - \kappa}, \qquad m_{\cap}(A) = \sum_{A_1 \cap \cdots \cap A_J = A} m_1(A_1)\, m_2(A_2) \cdots m_J(A_J) \qquad (14)

where κ is the normalization term, given by

\kappa = \sum_{A_1 \cap \cdots \cap A_J = \emptyset} m_1(A_1)\, m_2(A_2) \cdots m_J(A_J) \qquad (15)

The term κ, with 0 ≤ κ ≤ 1, is a measure of conflict between the J information sources. When it is high, combining the sources is incoherent [22]. Moreover, when κ = 1, the sources completely contradict each other and data fusion is not possible. There are different solutions for dealing with the conflict: for example, it is proposed in [36] to avoid the normalization, and in [37] a general framework unifying different classical combination rules is developed.

4. Segmentation scheme
4.1. Learning the user-input data

Let us begin with a two-class problem, which is equivalent to extracting the foreground from the background of an image. When users draw scribbles or strokes on an image X, a few pixels are designated as foreground and background training pixels, denoted X_f and X_b. Assuming that the image data of both the foreground and the background follow Gaussian mixture models (GMMs), the distribution of either X_f or X_b is a combination of finitely many Gaussian distributions, i.e.

p(X_i) = \sum_{j=1}^{n_i} p(X_i \mid u_{ij})\, P(u_{ij}), \qquad i \in \{b, f\}, \quad n_i < \infty \qquad (16)

where p(X_i|u_{ij}) is the likelihood of X_i and P(u_{ij}) is the a priori probability of the Gaussian component u_{ij}; n_i is the number of Gaussian components used for modeling the background or foreground. The distribution parameters θ_{ij} = {μ_{ij}, Σ_{ij}, P(u_{ij})} are estimated by the expectation maximization (EM) algorithm [1]. The EM algorithm iteratively executes two steps until convergence, namely an expectation step (E-step) and a maximization step (M-step). The E-step uses the current parameter estimates to compute the unknown


underlying variables conditioned on the observations, while new parameter estimates are obtained in the maximization step. More about EM can be found in [1,38].

4.2. Determining optimal class numbers

For model selection we use a criterion from statistics, the Bayesian information criterion (BIC), also known as the Schwarz criterion (SBC, SBIC) [39]. The BIC balances fitting accuracy against model complexity. For a distribution density X_i, i ∈ {b, f}, with N samples, modeled by k parameters, the BIC is given by [39]:

\mathrm{BIC}(k) = -2L + k \ln N \qquad (17)

where L is the maximum of the likelihood function of the estimated model. According to [40], the BIC is approximately equivalent to:

\mathrm{BIC}(k) = N \ln(\hat{\sigma}_e^2) + k \ln N \qquad (18)

where σ̂_e² is the variance of the modeling error. The best model is indicated by the lowest BIC value. Fig. 2 shows an example of a mixture density consisting of 3 Gaussian distributions and its BIC computed using different parameter numbers; the model with three components generates the lowest BIC value.

Fig. 2. (a) An example of a mixture density consisting of 3 Gaussian distributions; (b) the BIC value of the mixture density evaluated using different numbers of Gaussian distributions. The lowest BIC value indicates that the mixture density consists of 3 Gaussian distributions.
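As a hedged sketch of the EM estimation of Section 4.1 and the BIC-based model selection of Section 4.2, the snippet below fits GMMs with an increasing number of components and keeps the lowest-BIC model; the use of scikit-learn's GaussianMixture and the placeholder scribble data are our assumptions, since the paper does not specify an implementation:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_gmm_with_bic(samples, max_components=5):
    """Fit GMMs by EM for k = 1..max_components (Eq. (16)) and keep the
    model with the lowest BIC (Eq. (17))."""
    best_model, best_bic = None, np.inf
    for k in range(1, max_components + 1):
        gmm = GaussianMixture(n_components=k, covariance_type="full",
                              random_state=0).fit(samples)
        bic = gmm.bic(samples)             # -2 log L + (#params) ln N
        if bic < best_bic:
            best_model, best_bic = gmm, bic
    return best_model

# Placeholder for an (N, 3) array of CIELab values from foreground strokes.
scribble_pixels_fg = np.random.rand(200, 3)
fg_model = fit_gmm_with_bic(scribble_pixels_fg)
```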

4.3. The assignment of belief

As mentioned before, we treat the likelihood and prior parts of (1) as two sources of information. For a foreground extraction problem, there are only two classes. The label set is therefore L = {C_1, C_2}, with C_1 and C_2 representing the background and foreground, respectively. To exploit evidence theory's ability to deal with uncertainties, we define an augmented label set L⁺ = {C_1, C_2, C_12}, with C_12 representing uncertain knowledge, meaning that at a certain stage we do not know how to dispose of the current pixel. At the initial step, the user-scribbled pixels are labeled as either C_1 or C_2; all other pixels are labeled C_12. By iteratively fusing both likelihood and prior information, more and more pixels become labeled as either C_1 or C_2, and pixels labeled C_12 become fewer and fewer. Eventually, a stopping criterion is satisfied and the iteration ends.

4.3.1. Assigning belief to the likelihood part of MRF

There are three different kinds of evidence models in the literature, namely Denœux's model [41], Shafer's model [15], and Appriou's model [10]. In [18], the authors use Denœux's model to segment multi-echo MR images. Here we believe Appriou's model is more suitable for color image segmentation, for the following reasons:

• The mass function of Denœux's model is based on distances between samples and class centers, which does not consider the variance of each class. This is not appropriate for classes whose variances differ drastically.
• Shafer's model considers the variance of classes, but it does not directly assign values to mass functions; instead, it assigns values to plausibility functions. A transformation from plausibility to mass is therefore needed for every pixel, which increases the computational burden.
• Appriou's model is not only consistent with the Bayesian approach and the probabilistic association of sources, but it also considers the variance of each class. It is therefore suitable for images with large differences between class variances, such as natural scene images.

However, some changes need to be made. In particular, for user-labeled pixels there is no uncertainty about the label within the augmented label set L⁺; only non-user-labeled pixels carry uncertainty. That is to say, if a pixel p ∈ X_b, then its likelihood mass is:

m'_{\mathrm{likelihood}}(C_1) = 1, \qquad m'_{\mathrm{likelihood}}(C_2) = 0, \qquad m'_{\mathrm{likelihood}}(C_{12}) = 0 \qquad (19)

For non-user-labeled pixels, the singleton masses are assigned as:

m'_{\mathrm{likelihood}}(C_1) = \alpha \cdot \max_{1 \le j \le n_1} \exp\left[ -\frac{(X - \mu_{1j})^2}{2\sigma_{1j}^2} \right], \qquad m'_{\mathrm{likelihood}}(C_2) = \alpha \cdot \max_{1 \le j \le n_2} \exp\left[ -\frac{(X - \mu_{2j})^2}{2\sigma_{2j}^2} \right] \qquad (20)

where 0 < α < 1 is a constant parameter, explained as a belief discounting parameter in [18]. Inspired by [23], we define the mass of the undecided label C_12 by considering how large the difference between the two singleton masses is. That is:

m'_{\mathrm{likelihood}}(C_{12}) = 1 - \left| m'_{\mathrm{likelihood}}(C_1) - m'_{\mathrm{likelihood}}(C_2) \right| \qquad (21)

If the difference between the two singleton masses is large, the ambiguity in assigning either label is low, so m'_likelihood(C_12) is small; on the contrary, if the difference between the two singleton masses is small, then m'_likelihood(C_12) is large. To satisfy (10), normalization is needed:


m_{\mathrm{likelihood}}(\omega_i) = \frac{m'_{\mathrm{likelihood}}(\omega_i)}{\sum_{\omega_i \in L^+} m'_{\mathrm{likelihood}}(\omega_i)}, \qquad \omega_i \in L^+ \qquad (22)
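A minimal sketch of the likelihood belief assignment of Eqs. (20)-(22) for one unscribbled pixel; for brevity it assumes a scalar feature and GMMs given as lists of (mean, std) pairs, whereas the paper works with CIELab color vectors. All names and values are illustrative:

```python
import math

def likelihood_bba(x, bg_gmm, fg_gmm, alpha=0.6):
    """Return normalized masses (m(C1), m(C2), m(C12)) for one pixel."""
    def best_fit(gmm):
        # Eq. (20): max over components of exp(-(x - mu)^2 / (2 sigma^2))
        return max(math.exp(-(x - mu) ** 2 / (2.0 * s ** 2)) for mu, s in gmm)
    m1 = alpha * best_fit(bg_gmm)          # background singleton C1
    m2 = alpha * best_fit(fg_gmm)          # foreground singleton C2
    m12 = 1.0 - abs(m1 - m2)               # undecided label C12, Eq. (21)
    total = m1 + m2 + m12                  # normalization of Eq. (22)
    return m1 / total, m2 / total, m12 / total

# Example: a pixel far from the background model and near the foreground one.
bg = [(40.0, 8.0), (70.0, 12.0)]           # illustrative (mu, sigma) pairs
fg = [(180.0, 15.0)]
masses = likelihood_bba(175.0, bg, fg)
```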

4.3.2. Assigning belief to the a priori part of MRF

A heuristic assignment is used for the a priori part of the MRF, which considers only the numbers of different labels in the neighborhood. That is:

m_{\mathrm{prior}}(\omega_i) = N_{\omega_i} / N, \qquad \omega_i \in L^+ \qquad (23)

where N_{ω_i} is the number of pixels currently labeled ω_i and N is the total number of pixels in the neighborhood. Fig. 3 shows an example of a neighborhood with different labels and how belief is assigned to the a priori part of the MRF. Note that the central pixel is included in the neighborhood. The a priori part has a smoothing effect, like the smoothness energy of an MRF. For example, if a label is rarely seen in a neighborhood, say with only one or two occurrences, the situation is most likely caused by noise; the belief assigned to this label is then small according to (23), and the fusion result will favor other labels for the central pixel. The effect of the prior information is shown in Section 5.3.

Fig. 3. An example of a neighborhood with different labels and how to assign belief to the a priori part of MRF.

4.4. Decision making

The label of a pixel is chosen by comparing the evidence after fusing the likelihood and a priori parts of the MRF, which is expressed as:

\omega_{\mathrm{opt}} = \arg\max_{\omega_i} \left\{ m_{\mathrm{likelihood}}(\omega_i) \cdot m_{\mathrm{prior}}(\omega_i) \right\}, \qquad \omega_i \in L^+ \qquad (24)

We choose the mass as the decision function because it need not be transformed into other kinds of functions for each pixel, which saves a lot of computation time. Moreover, the purpose of the decision in each iteration is to determine whether there is enough evidence to believe that a pixel previously labeled C_12 belongs to either the background or the foreground. Such decisions could not be made with other functions, such as the plausibility function or the pignistic probability function.
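Putting Eqs. (23) and (24) together, the following sketch (with illustrative names and mass values) assigns the prior masses from a 3×3 neighborhood and makes the fused decision for the central pixel:

```python
def prior_bba(neighborhood_labels, label_set):
    """Eq. (23): the mass of each label is its frequency in the 3x3
    neighborhood, central pixel included."""
    n = len(neighborhood_labels)
    return {l: neighborhood_labels.count(l) / n for l in label_set}

def decide(m_likelihood, m_prior):
    """Eq. (24): choose the label maximizing the product of the masses."""
    return max(m_likelihood, key=lambda l: m_likelihood[l] * m_prior.get(l, 0.0))

# A neighborhood dominated by C1 pulls an ambiguous center toward C1.
neigh = ["C1"] * 6 + ["C2", "C12", "C12"]
m_pri = prior_bba(neigh, {"C1", "C2", "C12"})
label = decide({"C1": 0.45, "C2": 0.2, "C12": 0.35}, m_pri)
```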

4.5. Iterative evidential labeling and stopping criterion

The evidential labeling process iteratively executes parameter estimation, model selection, belief assignment and decision making. Two stopping criteria are commonly used: (1) the iteration count exceeds a predefined value, or (2) the proportion of changed labels to the total number of image pixels between two consecutive steps falls below a predefined threshold. We choose the latter for our method. After the iteration stops, some pixels may still be labeled C_12; the labels of these pixels can be decided by:

\omega_{\mathrm{opt}} = \arg\max_{\omega_i} m(\omega_i), \qquad \omega_i \in L \qquad (25)

which means that the labels are decided by comparing the belief assignments on the singletons. The aim of (24) is to eliminate uncertainties so that the number of undecided pixels decreases step by step. Ideally, all pixels eventually receive singleton labels such as C_1, C_2, ..., which can be viewed as the convergence of the algorithm. In practice, however, a small number of pixels may still carry undecided labels such as C_12, C_13, ... at the last iteration, and the purpose of (25) is to force these undecided pixels into a singleton label by comparing their singleton masses. If the singletons have equal values, the current pixel has the same probability of belonging to either class, and a random (arbitrary) label can be assigned.

4.6. Overall algorithm

We summarize our iterative evidential labeling algorithm in 7 steps:

(1) User input.
(2) EM parameter estimation.
(3) Model selection by BIC.
(4) Assigning bbas according to (22) and (23).
(5) Fusion of information and decision making according to (24).
(6) Iteratively executing Steps (2)–(5) until the stopping criterion is satisfied.
(7) Solving undecided pixels according to (25).

The flowchart of the iterative evidential labeling algorithm is shown in Fig. 4.

Fig. 4. The flowchart of our proposed method.
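The stopping criterion used in step (6), i.e., the proportion of labels changed between two consecutive iterations from Section 4.5, could be computed as in the sketch below; the threshold value is illustrative, as the paper does not report the one it uses:

```python
import numpy as np

def proportion_changed(prev_labels, new_labels):
    """Fraction of pixels whose label changed between consecutive steps."""
    prev, new = np.asarray(prev_labels), np.asarray(new_labels)
    return np.count_nonzero(prev != new) / prev.size

prev = np.array([[1, 1, 0], [0, 0, 0]])
new = np.array([[1, 1, 1], [0, 0, 0]])
converged = proportion_changed(prev, new) < 0.01   # illustrative threshold
```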


4.7. Generalization to multi-label segmentation

The generalization of our method to multi-label segmentation is straightforward. The EM parameter estimation, model selection, information fusion and decision making steps are the same as in two-label segmentation; the only difference lies in the belief assignment step. Taking a C-label problem as an example, the label set becomes L = {1, 2, ..., C}, and the augmented label set is accordingly defined as L⁺ = 2^L \ {∅}. For pixels labeled as l_0 by the user, the likelihood mass is:

m'_{\mathrm{likelihood}}(l) = 1, \quad l = l_0; \qquad m'_{\mathrm{likelihood}}(l) = 0, \quad l \neq l_0,\ l \in L^+ \qquad (26)

For non-user-labeled pixels, the mass of any singleton is defined as:

m'_{\mathrm{likelihood}}(l) = \alpha \cdot \max_{1 \le j \le n_l} \exp\left[ -\frac{(X - \mu_{lj})^2}{2\sigma_{lj}^2} \right] \qquad (27)

where n_l is the number of components in the GMM for class l. Eq. (27) means that, for each pixel, its belief assignment to class l is calculated by taking the maximum likelihood of belonging to any of the components of class l, multiplied by the discounting parameter α. The mass of a non-singleton is defined by considering how large the difference between the corresponding singletons is. In particular:

m'_{\mathrm{likelihood}}(l_1 l_2 \cdots l_k) = \max\left( 1 - \sum_{i=1}^{k} \left| m'_{\mathrm{likelihood}}(l_i) - \frac{1}{k} \sum_{j=1}^{k} m'_{\mathrm{likelihood}}(l_j) \right|,\ 0 \right) \qquad (28)

The meaning of Eq. (28) is that, to measure the uncertainty among several singletons, we first calculate the sum of the differences between every singleton and their mean, and then subtract this sum from 1. If the belief assignments of several singletons are very close, their ambiguity is large, which is consistent with the result of Eq. (28); conversely, if the singletons have very different belief assignments, their ambiguity is insignificant, which coincides with the fact that the result of Eq. (28) is small. Now suppose there are three singleton labels, two of which are similar and one dissimilar, say m'(C_1) = 0.2, m'(C_2) = 0.8, m'(C_3) = 0.8 [note that they do not sum to 1 before the normalization step in (22)]. Then m'(C_23) should have a higher value than m'(C_123), because m'(C_23) measures the basic belief of assigning either label 2 or label 3 to the current pixel, while m'(C_123) measures the basic belief of assigning label 1, label 2 or label 3. Conversely, if the three singletons are m'(C_1) = 0.8, m'(C_2) = 0.2, m'(C_3) = 0.2, then according to Eq. (28), m'(C_123) = 0.2 and m'(C_23) = 1. In this situation, the current likelihood assignment of m'(C_23) is indeed larger than m'(C_1). However, this is not the labeling result of the algorithm, because the prior information has not been used yet, and the prior information will help remove the uncertainty of the assignment. This situation is therefore temporary, and the uncertainty tends to decrease. In some situations, subtracting the sum of differences from 1 may give a negative value. To deal with this, we simply replace negative values by 0, i.e., we take the maximum of the subtraction result and 0. The rationale is that Eq. (28) represents the ambiguity of a decision; a negative value implies that the ambiguity is extremely low, so 0 is used to indicate such low ambiguity. It is easy to show that (21) is the special case of (28) with k = 2. As before, the mass is finally normalized according to (22), and the mass of the a priori part of the MRF is defined by (23).
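A short sketch of the non-singleton mass of Eq. (28), reproducing the worked example above (the function name is ours):

```python
import numpy as np

def nonsingleton_mass(singleton_masses):
    """Eq. (28): one minus the total deviation of the singleton masses
    from their mean, clipped at zero."""
    m = np.asarray(singleton_masses, dtype=float)
    return max(1.0 - np.abs(m - m.mean()).sum(), 0.0)

print(nonsingleton_mass([0.2, 0.2]))        # m'(C23) = 1.0
print(nonsingleton_mass([0.8, 0.2, 0.2]))   # m'(C123) ≈ 0.2
```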

4.8. Implementation details

One important aspect of our method is that it involves EM parameter estimation in each iteration, which increases the computational complexity. To save computation time, we adopt a simple strategy in which the EM algorithm is initialized with the labels of the last iteration; with this warm start, only two or three steps are needed for each EM procedure. Secondly, when computing the pixel prior, the labels of the previous iteration are used. That is, we store the neighboring labels of all pixels from the last iteration in an array and then compute the numbers of neighboring labels in each class for all pixels at once, instead of updating the labels one by one; computing in this way is effectively a parallel treatment of the image pixels. Finally, the labels at the image boundaries are replicated from the original labels so that the prior information at the boundaries can be computed normally; at such places, however, the prior information is inaccurate, especially at corners.

5. Experimental results

We show two features of our method: its convergence speed, and the effect of the parameter α in (20) and (27). We tested our method on different real natural images from the Berkeley Segmentation Dataset [42] and the GrabCut dataset [43]. In addition, we compared our method with algorithms based on evidence-based pixel labeling (referred to as EBPL) [20], SVM [7], graph cut [25], GrabCut [27], and maximum-similarity-based region merging (referred to as MSRM) [28]. The images used in our experiments, the initial strokes, and the manually labeled ground truth are shown in Fig. 5. In [21], the color space chosen for cell image segmentation is RGB; the three channels are considered as three information sources and the final decision is made by combining the information they provide. For natural scene images, however, the RGB color space is not suitable, because it is affected by illumination changes. We therefore choose the CIELab color space [44] for our method.

5.1. The convergence speed of our method

To illustrate how our method works, we show in Figs. 6 and 7 the labeling process at different iteration steps. The parameter α used in this experiment is set to 0.5. In the figures, red represents pixels labeled as foreground, while blue represents pixels labeled as background; in earlier iterations, pixels still labeled C_12 are shown as white regions. At the first iteration step, the pixels labeled as either foreground (red) or background (blue) are the user inputs; all other pixels are non-user inputs, shown in white. In the following steps, with the fusion of the likelihood and prior information of the MRF, the ambiguity in labeling the pixels is removed little by little, and the number of white pixels becomes smaller and smaller. Finally, the iteration converges to a stable state.

5.2. The effect of parameter α on our method

As mentioned in Section 4.3.1, the parameter α is a belief discounting parameter. Intuitively, the higher the parameter, the less ambiguity exists in the likelihood information of the MRF, and consequently the fewer steps are needed for convergence.


Fig. 5. The images used in our experiments, the initial strokes and the manually labeled ground truth. (a) The original Images 1–7 with initial strokes, (b) the ground truth of Images 1–7, (c) the original Images 8–14 with initial strokes and (d) the ground truth of Images 8–14.

Fig. 6. The labeling process of Image 1 at different iteration steps. (a) Initial scribbles, (b) Step 1, (c) Step 10, (d) Step 20, (e) Step 30 and (f) Step 40.


Fig. 7. The labeling process of Image 2 at different iteration steps. (a) Initial scribbles, (b) Step 1, (c) Step 10, (d) Step 20, (e) Step 30 and (f) Step 50.

We show in Fig. 8 the ratio of pixels labeled C_12 to the total pixel number, for α from 0.5 to 0.9 in steps of 0.05. In Table 1 we also show the correct ratio of the final segmentation result for different values of α. The images used in this experiment are Images 6 and 7 of Fig. 5, and the segmentation result is represented by the average color of each segment. It can be seen in Fig. 8 that a low parameter value does slow down convergence, while a higher value needs fewer steps: when α = 0.5, more than 100 steps are needed, whereas when α = 0.9, a few steps are sufficient. Furthermore, α affects the final segmentation results. For both images, the best segmentation is obtained with α = 0.6. In fact, α = 0.6 is the optimal parameter for most complex images, whereas the parameter has less effect on simple images. We therefore set α = 0.6 for all images in our experiments.

5.3. The effect of prior information

As mentioned in Section 4.3.2, the effect of the prior information is to smooth the result. Here we show the results of our method with and without prior information in Fig. 9 for comparison; the images used are Images 3 and 9. It is clear that the prior information does have a smoothing effect: the results without it contain many isolated points and regions, while the results with it are smooth.

Table 1
Segmentation accuracy (%) under different belief discounting parameters α for Images 6 and 7.

α      Image 6   Image 7
0.50   97.02     96.32
0.55   97.02     96.32
0.60   98.57     96.98
0.65   98.52     95.85
0.70   98.44     95.41
0.75   98.14     95.09
0.80   97.99     94.97
0.85   97.86     94.86
0.90   97.84     94.74

For both images, the highest correct ratio of the final segmentation is obtained at the belief discounting parameter α = 0.60.

5.4. Segmentation results of the two-label case

It is worth noting that the evaluation of segmentation performance depends both on the inaccuracy of boundary pixels and on the imprecision of the object region (such as large holes or missing pieces of the object). We therefore show both qualitative and quantitative results. The segmentation results for Images 1–7 are shown in Fig. 10 and those for Images 8–14 in Fig. 11; the quantitative results for the proposed method and the other methods are tabulated in Table 2. From the figures and the table, it is clear that all the methods obtain relatively good results. However, the SVM-based method yields inaccurate segmentation results

Fig. 8. The ratio of the pixels labeled as C_12 when using different values of the parameter α. (a) Image 6 and (b) Image 7.


Fig. 9. Results of our method with and without prior information. (a) Results of our method without prior information for Image 3. (b) Results of our method with prior information for Image 3. (c) Results of our method without prior information for Image 9. (d) Results of our method with prior information for Image 9.

in many cases, because it does not consider spatial pixel relevance; it only considers the maximum-margin hyperplane of the training samples from the foreground and background inputs. In some cases, such as Images 3, 6 and 7, where the foreground and background contain similar pixels, the results of the SVM-based method contain a large number of misclassified pixels and much noise. On the contrary, graph cuts and GrabCut consider both the pixel likelihood and the spatial contextual information; their results are usually smooth, small isolated regions are seldom seen, and in many cases the results are close to the ground truth. The proposed evidence-based method additionally considers the uncertainty of the labeling process; the segmentation is achieved by iteratively fusing the pixel likelihood and the spatial contextual information. The results of our method are comparable to the other methods in most cases.

5.5. Segmentation results of multi-label cases

As mentioned before, the proposed method can be easily generalized to multi-label segmentation problems. We show

in Fig. 12 the results of our method using three and four labels in the input scribbles. Since the goodness of a solution to the multi-label problem depends more on subjective assessment, no ground truth is provided for this experiment. It can be seen in Fig. 12 that the results of our method are generally satisfactory: the final segmentations are in accordance with the initial scribbles, the segments of each image are smooth, and small isolated areas are seldom seen.

6. Discussion

An interesting feature of our method is that when the pixel likelihood information is not useful for discriminating between two labels, the decision process depends only on the inter-pixel contextual information. A simple example demonstrates this feature. The image used in this experiment is synthetic, with a noisy background of constant mean gray value and a spatially varying colored foreground. The image and the initial strokes are shown in Fig. 13(a).

Fig. 10. Segmentation results of our method and other methods for Images 1–7. (a) EBPL, (b) SVM-based method, (c) graph cuts, (d) GrabCut, (e) MSRM and (f) proposed.


Fig. 11. Segmentation results of our method and other methods for Images 8–14. (a) EBPL, (b) SVM-based method, (c) graph cuts, (d) GrabCut, (e) MSRM and (f) proposed.

Table 2
The segmentation accuracy of different methods (%).

Image   DS      SVM     GraphCut   GrabCut   MSRM    Proposed
1       98.49   98.28   98.90      98.70     97.88   99.01
2       97.11   95.39   95.04      96.78     97.63   96.13
3       95.46   97.49   98.31      98.82     96.57   98.95
4       99.27   93.33   97.46      86.65     99.70   97.53
5       95.61   98.76   98.83      98.69     98.71   98.89
6       98.19   97.90   97.27      97.59     94.45   97.84
7       97.45   96.73   99.63      99.66     97.39   99.37
8       98.32   96.99   97.91      98.31     99.14   98.23
9       98.84   98.18   98.68      93.52     98.92   98.48
10      97.63   96.83   97.89      98.05     99.13   97.96
11      92.69   93.24   97.15      98.31     97.87   98.57
12      91.64   88.76   96.02      98.51     98.30   96.98
13      98.26   97.56   96.14      99.53     99.42   99.25
14      98.41   97.12   98.40      99.42     99.46   98.58

The bold values represent the correct ratios of the two most accurate segmentation results for each image.

This picture is difficult for the SVM and graph cut methods due to the variation of the foreground. As can be seen in the figure, the results of SVM and graph cut are inaccurate. The SVM method fails because it lacks training samples in the middle of the foreground object; as a result, it can only recognize pixels similar in color to those on the two scribbles. Let us consider why the middle part of the object is not correctly classified by graph cuts. The final graph cut result is the solution minimizing the sum of the data term and the smoothness term. The color of the foreground object varies gradually, so if a pixel in the middle is assigned the foreground label, the smoothness energy for this pixel is low. However, the color of this pixel is very bright, even close to the background, so its data term is very high when labeled as foreground. Graph cuts will therefore label this pixel as background, yielding the inaccurate result. For GrabCut and our method, the GMM models are re-estimated in each iteration, so when the pixel labels change after an iteration, the GMM models are updated to learn new components. Using D–S evidence theory to obtain the optimal class labels has the advantage that, within a single iteration, if we are not sure which class the current pixel belongs to, we can give it an undecided label; determined labels are given only to those pixels with enough evidence of belonging to a certain class. Parameter re-estimation is executed on pixels with determined labels, so the estimation accuracy can be higher.

Regarding the number of user scribbles, it generally depends on the complexity of the objects. Simple images generally need fewer user inputs and complex images need more. We choose two images of different complexity to demonstrate this (see Fig. 14). The user input has less effect on the simple image and more effect on the complex one. The reason is that, in the second image, some regions of the foreground are similar to the background; if no scribbles reach these regions, our method cannot learn the parameters of these regions and will eventually get stuck in a local minimum. A useful strategy for users is therefore to give more strokes at places where the colors of the background and the foreground are similar.

A drawback of our method is that when the image has a very complex foreground and background, our method is unable to


Fig. 12. Multi-label segmentation results of our method. (a) Initial scribbles for 3-label segmentation, (b) segmentation results of initial scribbles shown in (a), (c) initial scribbles for 4-label segmentation and (d) segmentation results of initial scribbles shown in (c).

Fig. 13. (a) An example image with a gray background and a varying color object with initial strokes; and segmentation results by using (b) SVM, (c) graph cuts, (d) GrabCut and (e) proposed method. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)


Fig. 14. Segmentation results of our method using different input strokes. (a) fewer initial strokes, (b) results of our method using strokes in (a), (c) more initial strokes and (d) results of our method using strokes in (c).

Fig. 15. (a) An example image with very complex components, (b) the initial scribbles and (c) the final result of our method.

successfully segment the image. For example, an image with a complex scene and the segmentation result of our method are shown in Fig. 15. This scene is difficult for our method because the foreground and background contain similar pixels: the color of the tree trunk is like that of the sand, and the color of the leaves is like that of the sea. The segmentation result contains a large number of misclassified pixels. Another drawback of our method is its computational complexity, which limits its application when the number of labels is high. Since the total number of propositions defined on the frame of discernment L is 2^{|L|}, the computational complexity of our method is, in theory, O(2^N) for an N-label segmentation problem, compared with O(N) for a non-evidential method.

Concerning future work, the most needed investigation is a fast implementation of a multi-source, multi-proposition information fusion algorithm. In [45], a fast max-flow/min-cut algorithm for vision is developed which is several times faster than other methods, making near real-time performance possible; however, this fast algorithm is not applicable to information fusion, and new algorithms suitable for information fusion need to be developed. If the computation time were near real-time, we could design better user interfaces, for example like the "paint selection" in [33], where users progressively obtain the object of interest by painting it, thanks to the instant feedback. On the other hand, some global priors for image segmentation could be integrated into our framework. For example, in [46] the limitation that MRFs or CRFs can only model local interactions is overcome by deriving a potential function that enforces the output labeling to be connected. In [47], a novel nonlocal MRF prior is

proposed to exploit global information using large neighborhoods and a new weighting method. However, other issues must be considered when incorporating such global prior information into our framework, for example the scheme for combining the global prior information with the likelihood information, the computational complexity, and so on. Furthermore, region-based methods have advantages in dealing with such complex images. In these methods, the minimum semantic image unit is not a pixel but a region; a region is less affected by noise and other disadvantageous factors, because it is represented by the common properties of its pixels. As mentioned before, the proposed method lacks the ability to cope with very complex scenes, so a potential line of future work is to develop a method that combines regional information with evidence theory. It is also promising to use pattern analysis algorithms to preprocess the input training data, such as kernel methods [48], which map the data into a high-dimensional feature space. In addition, there are different ways of treating conflicting information [49–51]; the effect of different conflict-handling schemes on the performance of our method remains to be studied.

7. Conclusion

In this study, an interactive color image segmentation method via iterative evidential labeling is developed. The proposed method combines the concepts of MRFs and D–S evidence theory. It first adopts the EM algorithm to estimate parameters and the BIC to select the best model. The evidential labeling process is achieved by iteratively fusing the pixel likelihood information and the


pixel contextual information until convergence. The study starts with the two-label segmentation problem and then generalizes the method to multi-label segmentation problems. The only parameter of our method is the belief discounting parameter used in the belief assignment. Experimental results show that this parameter affects both the convergence speed of our method and the segmentation results; the optimal parameter value was found to be 0.6. Moreover, the segmentation performance of our method is comparable with that of prevalent methods, in terms of both subjective visual interpretation and objective classification accuracy. Although the preliminary results of our method show some interesting aspects, there are still drawbacks, in terms of, for example, the user input/interface, global optimality and computation time, as discussed above.

Acknowledgements

We thank Dr. Jörg Zimmermann for helpful and useful discussions on our work. We also thank the anonymous reviewers for their helpful and constructive suggestions and comments.

References

[1] C.M. Bishop, N.M. Nasrabadi, Pattern Recognition and Machine Learning, Springer, New York, 2006.
[2] H.-D. Cheng, X. Jiang, Y. Sun, J. Wang, Color image segmentation: advances and prospects, Pattern Recognit. 34 (2001) 2259–2281.
[3] A. Gillet, L. Macaire, C. Botte-Lecocq, J.-G. Postaire, Color image segmentation based on fuzzy mathematical morphology, in: Proceedings 2000 International Conference on Image Processing, IEEE, 2000, pp. 348–351.
[4] D.K. Panjwani, G. Healey, Unsupervised segmentation of textured color images using Markov random field models, in: Proceedings 1993 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, CVPR'93, IEEE, 1993, pp. 776–777.
[5] S. Wesolkowski, P. Fieguth, Adaptive color image segmentation using Markov random fields, in: Proceedings 2002 International Conference on Image Processing, IEEE, 2002, pp. 769–772.
[6] G. Dong, M. Xie, Color clustering and learning for image segmentation based on neural networks, IEEE Trans. Neural Networks 16 (2005) 925–936.
[7] J. Wang, M.F. Cohen, An iterative optimization approach for unified image segmentation and matting, in: Tenth IEEE International Conference on Computer Vision, ICCV 2005, IEEE, 2005, pp. 936–943.
[8] S.Z. Li, Markov Random Field Modeling in Image Analysis, Springer, 2009.
[9] D. Dubois, H. Prade, Representation and combination of uncertainty with belief functions and possibility measures, Comput. Intell. 4 (1988) 244–264.
[10] A. Appriou, Multisensor signal processing in the framework of the theory of evidence, in: Lecture Series 216 on Application of Mathematical Signal Processing Techniques to Mission Systems, 1999, pp. 5–31.
[11] P.L. Bogler, Shafer–Dempster reasoning with applications to multisensor target identification systems, IEEE Trans. Syst., Man Cybern. 17 (1987) 968–977.
[12] J. Manyika, H. Durrant-Whyte, On sensor management in decentralized data fusion, in: Proceedings of the 31st IEEE Conference on Decision and Control, IEEE, 1992, pp. 3506–3507.
[13] J. Keller, G. Hobson, J. Wootton, A. Nafarieh, K. Luetkemeyer, Fuzzy confidence measures in midlevel vision, IEEE Trans. Syst., Man Cybern. 17 (1987) 676–683.
[14] A.P. Dempster, Upper and lower probabilities induced by a multivalued mapping, Ann. Math. Stat. 38 (1967) 325–339.
[15] G. Shafer, A Mathematical Theory of Evidence, Princeton University Press, Princeton, 1976.
[16] R.A. Isoardi, D.E. Oliva, G. Mato, Maximum evidence method for classification of brain tissues in MRI, Pattern Recognit. Lett. 32 (2011) 12–18.
[17] I. Bloch, Some aspects of Dempster–Shafer evidence theory for classification of multi-modality medical images taking partial volume effect into account, Pattern Recognit. Lett. 17 (1996) 905–919.
[18] A.-S. Capelle, O. Colot, C. Fernandez-Maloigne, Evidential segmentation scheme of multi-echo MR images for the detection of brain tumors using neighborhood information, Inf. Fusion 5 (2004) 203–216.
[19] N. Makni, N. Betrouni, O. Colot, Introducing spatial neighbourhood in Evidential C-Means for segmentation of multi-source images: application to prostate multi-parametric MRI, Inf. Fusion (2012).
[20] P. Vannoorenberghe, L. Macaire, O. Colot, Evidence-based pixel labeling for color image segmentation, Comput. Vision Res. Prog. (2008) 279–296.
[21] S. Ben Chaabane, M. Sayadi, F. Fnaiech, E. Brassart, Color image segmentation based on Dempster–Shafer evidence theory, in: 14th IEEE Mediterranean Electrotechnical Conference, MELECON 2008, IEEE, 2008, pp. 862–866.
[22] P. Smets, The combination of evidence in the transferable belief model, IEEE Trans. Pattern Anal. Mach. Intell. 12 (1990) 447–458.
[23] M. Tabassian, R. Ghaderi, R. Ebrahimpour, Knitted fabric defect classification for uncertain labels based on Dempster–Shafer theory of evidence, Expert Syst. Appl. 38 (2011) 5259–5267.
[24] S.Z. Li, Invariant surface segmentation through energy minimization with discontinuities, Int. J. Comput. Vision 5 (1990) 161–194.
[25] Y. Boykov, O. Veksler, R. Zabih, Fast approximate energy minimization via graph cuts, IEEE Trans. Pattern Anal. Mach. Intell. 23 (2001) 1222–1239.
[26] Y. Boykov, G. Funka-Lea, Graph cuts and efficient N-D image segmentation, Int. J. Comput. Vision 70 (2006) 109–131.
[27] C. Rother, V. Kolmogorov, A. Blake, GrabCut: interactive foreground extraction using iterated graph cuts, ACM Trans. Graphics (TOG) (2004) 309–314.
[28] J. Ning, L. Zhang, D. Zhang, C. Wu, Interactive image segmentation by maximal similarity based region merging, Pattern Recognit. 43 (2010) 445–456.
[29] R.O. Duda, P.E. Hart, D.G. Stork, Pattern Classification, John Wiley & Sons, 2012.
[30] P. Parveen, B. Thuraisingham, Face recognition using multiple classifiers, in: 18th IEEE International Conference on Tools with Artificial Intelligence, ICTAI'06, IEEE, 2006, pp. 179–186.
[31] L. Grady, Random walks for image segmentation, IEEE Trans. Pattern Anal. Mach. Intell. 28 (2006) 1768–1783.
[32] Y. Li, J. Sun, C.-K. Tang, H.-Y. Shum, Lazy snapping, ACM Trans. Graphics (TOG) (2004) 303–308.
[33] J. Liu, J. Sun, H.-Y. Shum, Paint selection, ACM Trans. Graphics (TOG) (2009) 69.
[34] J.A. Richards, Remote Sensing Digital Image Analysis: An Introduction, Springer, 2013.
[35] R. Kennes, Computational aspects of the Möbius transformation of graphs, IEEE Trans. Syst., Man Cybern. 22 (1992) 201–223.
[36] P. Smets, The transferable belief model for quantified belief representation, in: Handbook of Defeasible Reasoning and Uncertainty Management Systems, vol. 1, 1998, pp. 267–301.
[37] E. Lefevre, O. Colot, P. Vannoorenberghe, Belief function combination and conflict management, Inf. Fusion 3 (2002) 149–162.
[38] A.P. Dempster, N.M. Laird, D.B. Rubin, Maximum likelihood from incomplete data via the EM algorithm, J. R. Stat. Soc. Ser. B (Methodological) (1977) 1–38.
[39] G. Schwarz, Estimating the dimension of a model, Ann. Stat. 6 (1978) 461–464.
[40] G. Jenkins, M. Priestley, The spectral analysis of time-series, J. R. Stat. Soc. Ser. B (Methodological) (1957) 1–12.
[41] T. Denœux, A k-nearest neighbor classification rule based on Dempster–Shafer theory, IEEE Trans. Syst., Man Cybern. 25 (1995) 804–813.
[42] D. Martin, C. Fowlkes, D. Tal, J. Malik, A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics, in: Proceedings Eighth IEEE International Conference on Computer Vision, ICCV 2001, IEEE, 2001, pp. 416–423.
[43] http://research.microsoft.com/en-us/um/cambridge/projects/visionimagevideoediting/segmentation/DATA/data_GT.zip.
[44] R.W.G. Hunt, M.R. Pointer, Measuring Colour, John Wiley & Sons, 2011.
[45] Y. Boykov, V. Kolmogorov, An experimental comparison of min-cut/max-flow algorithms for energy minimization in vision, IEEE Trans. Pattern Anal. Mach. Intell. 26 (2004) 1124–1137.
[46] S. Nowozin, C.H. Lampert, Global connectivity potentials for random field models, in: IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2009, 2009, pp. 818–825.
[47] C. Yang, F. Qianjin, S. Pengcheng, C. Wufan, A novel nonlocal quadratic MRF prior model for positron emission tomography, in: 4th IEEE International Symposium on Biomedical Imaging: From Nano to Macro, ISBI 2007, 2007, pp. 149–152.
[48] J. Shawe-Taylor, N. Cristianini, Kernel Methods for Pattern Analysis, Cambridge University Press, 2004.
[49] D. Dubois, H. Prade, On the unicity of Dempster rule of combination, Int. J. Intell. Syst. 1 (1986) 133–142.
[50] F. Voorbraak, On the justification of Dempster's rule of combination, Artif. Intell. 48 (1991) 171–197.
[51] P. Smets, Resolving misunderstandings about belief functions, Int. J. Approx. Reason. 6 (1992) 321–344.
