Evidential segmentation scheme of multi-echo MR images for the detection of brain tumors using neighborhood information




Information Fusion 5 (2004) 203–216 www.elsevier.com/locate/inffus

A.-S. Capelle (a), O. Colot (a, b, *), C. Fernandez-Maloigne (a)

(a) Laboratoire Signal Image et Communications (SIC), UMR CNRS 6615, Université de Poitiers, Bât. SP2MI, Boulevard Marie et Pierre Curie, B.P. 30179, 86962 Futuroscope-Chasseneuil Cedex, France
(b) Laboratoire d'Automatique I3D, FRE CNRS 2497, Bât. P2, Cité Scientifique, Université des Sciences et Technologies de Lille, 59655 Villeneuve d'Ascq Cedex, France

Received 3 April 2003; received in revised form 9 October 2003; accepted 9 October 2003. Available online 13 November 2003.

Abstract

In this paper we propose and study an evidential segmentation scheme of multi-echo MR images for the detection of brain tumors. We show that modeling by means of evidence theory is well suited to the processing of redundant and complementary data such as MR images. Moreover, the neighborhood relationship between voxels is taken into account via Dempster's combination rule. We show that using this information improves the classification results previously obtained and leads to a true region-based segmentation. Furthermore, combining spatial information makes it possible to compute a measure of conflict, which reflects the spatial organization of the data: the conflict is higher at the boundaries between different structures. It thus provides a new source of evidence that the specialist can aggregate with the segmentation results to temper his or her own decision.

© 2003 Elsevier B.V. All rights reserved.

Keywords: Data fusion; Evidence theory; Decision; Multi-modality imaging; Segmentation; Neighborhood relationship; Conflict

1. Introduction

Technical progress in medical imaging provides physicians with many tools for observing human anatomy or the functional behavior of organs. Different imaging modalities exist: classical radiology, echography, functional positron emission tomography (PET), magnetic resonance (MR) imaging, etc. Since the different sources highlight particular regions, tissues or pathologies, physicians may use multi-source imaging and take advantage of each source. In this way, multi-source approaches give the opportunity to refine the diagnosis. Besides being able to choose the most suitable technology, physicians may simultaneously analyze different images and combine them in order to obtain complementary and/or redundant data. We distinguish two kinds of fusion. The first one is the fusion of images acquired by a single imaging technique but with different acquisition parameters (for instance, multi-echo MR images).

* Corresponding author. Tel.: +33-320436928; fax: +33-320436567. E-mail address: [email protected] (O. Colot).

1566-2535/$ - see front matter © 2003 Elsevier B.V. All rights reserved. doi:10.1016/j.inffus.2003.10.001

The second one is the fusion of images acquired with different imaging techniques, such as anatomical MR images and functional PET images. The use of different techniques for the analysis of a given pathological region of interest (ROI), such as a tumor, provides a large amount of information that physicians naturally integrate and merge for their diagnosis. Data fusion has been a growing research field over the last decade [1–4]. The aim of data fusion is to obtain a synthesis of information by taking into account different pieces of evidence. Data coming from different sources (sensors, observers, etc.) are usually redundant but also complementary: the observed scene is the same, but it is acquired from different points of view, so the amount of available information is increased. These data are typically imprecise, uncertain and incomplete. They come either from an observer (who has his or her own interpretation of the scene) or from a sensor (the data are distorted by the sensor itself, by the numerical acquisition device including the associated image formation algorithm, etc.). In such a context, the aim of the fusion process is to synthesize more reliable and elaborate information and thus to improve the decision.


A.-S. Capelle et al. / Information Fusion 5 (2004) 203–216

Fusion techniques are based on various theories: probabilistic fusion, Bayesian inference, fuzzy set theory, possibility theory and evidence theory. In this paper, we focus on evidence theory, also called Dempster–Shafer theory [5]. Although it has often been used in pattern recognition, evidence theory has so far found very few applications in medical imaging for image analysis and diagnosis. In [6,7], evidence theory is used for the segmentation of MR brain images. In [8], Suh et al. use it for the segmentation and visualization of the left ventricle of the heart. In [9], Bloch classifies brain tissues in pathological dual-echo MR images. In this paper, we propose an evidential scheme for the segmentation of multi-echo MR images for the detection and 3D visualization of brain tumors. The innovation of the proposed segmentation process is to include the spatial neighborhood relationship between voxels within a segmentation scheme based on evidence theory. In Section 2, we present the main aspects of the theory. In Section 3, its use for our problem is discussed in more detail. The segmentation scheme is then described in Section 4. In Section 5, we discuss the results and the consequences of introducing the spatial neighborhood relationship into the segmentation process, in particular regarding conflicting information. We show that conflicting information is not only a consequence of data fusion but also a source of information about the spatial organization of the data.

2. The fundamentals of evidence theory

Evidence theory, or the theory of belief functions, was initially introduced through Dempster's work on lower and upper bounds for a set of compatible probability distributions [5]. In [10], Shafer formalizes the theory and shows the advantage of using belief functions to model imprecise and uncertain data. Different interpretations of the original "Dempster–Shafer" theory subsequently appeared [11]. Smets and Kennes [12] depart from the initial probabilistic interpretation of evidence theory with the Transferable Belief Model (TBM), which gives a clear and coherent interpretation of the concepts underlying the theory.

2.1. Belief structures

Within the context of evidence theory, imprecise and uncertain data are modeled by belief structures called credibility and plausibility, each derived from the elementary belief structure $m$ (also called basic belief assignment, bba). Defining and using these functions requires a frame of discernment $\Omega$, composed of $N$ exhaustive and mutually exclusive hypotheses $H_n$, the solutions of the problem:

$$\Omega = \{H_1, H_2, \ldots, H_N\}. \qquad (1)$$

From the frame of discernment, we define the power set $2^\Omega$ of all $2^N$ propositions $A$ defined on $\Omega$:

$$2^\Omega = \{\emptyset, H_1, H_2, \ldots, \{H_1 \cup H_2\}, \{H_1 \cup H_3\}, \ldots, \Omega\}. \qquad (2)$$

Evidence theory provides a theoretical and mathematical framework to quantify the degree of belief of an agent that the actual world belongs to a proposition $A$, or equivalently that a given proposition is true [13]. Note that $A$ can be either a singleton $H_n$ or a disjunction of hypotheses. This is one of the main differences with respect to probabilistic approaches, which only consider singletons. The bba $m$ assigned by a source $S$ is defined by

$$m : 2^\Omega \to [0, 1]. \qquad (3)$$

In particular, this function satisfies

$$\sum_{A \subseteq \Omega} m(A) = 1. \qquad (4)$$

Under the closed-world assumption [14], as opposed to the open-world assumption, the bba satisfies $m(\emptyset) = 0$, which means that the solution belongs to the frame of discernment. The quantity $m(A)$ measures the amount of belief that is committed exactly to $A$. In the case of a disjunction of hypotheses ($A$ being a compound hypothesis), no part of the evidence $m(A)$ can be redistributed to any subset of $A$, for lack of information; only the injection of new information makes it possible to reassign the belief more precisely. An element $A \subseteq \Omega$ such that $m(A) \neq 0$ is called a focal element. Let $F$ denote the set of focal elements associated with a bba $m$. From the bba $m$, the credibility (Bel) and plausibility (Pl) functions are defined by

$$Bel(A) = \sum_{B \subseteq A,\ B \neq \emptyset} m(B) \qquad (5)$$

and

$$Pl(A) = \sum_{A \cap B \neq \emptyset} m(B). \qquad (6)$$
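To make Eqs. (5) and (6) concrete, the following minimal Python sketch stores a bba as a dictionary mapping focal elements (frozensets of hypothesis labels) to masses. This representation, the helper names and the 3-class numbers are illustrative assumptions, not part of the original method; the sketch also lets one check numerically the duality between Bel and Pl.

```python
from itertools import combinations


def powerset(frame):
    """All 2^N subsets of the frame of discernment, as frozensets (Eq. (2))."""
    items = sorted(frame)
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]


def bel(m, a):
    """Credibility, Eq. (5): total mass of the non-empty focal sets contained in A."""
    return sum(v for b, v in m.items() if b and b <= a)


def pl(m, a):
    """Plausibility, Eq. (6): total mass of the focal sets that intersect A."""
    return sum(v for b, v in m.items() if b & a)


# Hypothetical bba on a 3-class frame (numbers chosen only for illustration).
omega = frozenset({"WM", "GM", "CF"})
m = {
    frozenset({"WM"}): 0.5,
    frozenset({"WM", "GM"}): 0.3,
    omega: 0.2,
}
assert abs(sum(m.values()) - 1.0) < 1e-12  # normalization, Eq. (4)

for a in powerset(omega):
    if a:
        print(sorted(a), round(bel(m, a), 3), round(pl(m, a), 3))
```

Here $Bel(\{WM\}) = 0.5$ while $Pl(\{WM\}) = 1$: the masses on $\{WM, GM\}$ and $\Omega$ do not contradict WM but do not commit to it either.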

The quantity $Bel(A)$ can be interpreted as the total amount of belief in the proposition $A$. The plausibility $Pl(A)$ quantifies the maximum amount of belief that could potentially be assigned to $A$. Credibility and plausibility are thus dual notions: the plausibility is given by $Pl(A) = Bel(\Omega) - Bel(\bar{A})$, where $\bar{A}$ is the complement of $A$. Note that the functions $m$, Bel and Pl are three different representations of the same information [10]; any of them can be transformed into another by means of the Möbius transformation [15]. Note also that if the set of focal elements $F$ is composed only of singleton hypotheses, then $m$, Bel and Pl are three equivalent functions.

An important key point within the evidence theory framework is to properly model the knowledge given by the different sources of information $S$ in order to initialize the associated bba $m$. Generally, the models depend on the problem considered. Two main approaches can nevertheless be distinguished: the distance-based models initially proposed by Denœux [16–18], which take into account neighborhood information, and the models based on likelihood functions [1,4,10,19]. These models are described more precisely in Section 4.2.

2.2. Belief discounting

If a source $S$, associated with a belief function $m$, is considered not fully reliable, the belief issued from $S$ can be attenuated through a discounting process. If $\alpha$ denotes the coefficient representing our belief in the reliability of the source $S$, the discounted belief function $m^\alpha$ is defined by

$$m^\alpha(A) = \alpha\, m(A) \quad \forall A \subseteq \Omega,\ A \neq \Omega, \qquad (7)$$

$$m^\alpha(\Omega) = 1 - \alpha + \alpha\, m(\Omega), \qquad (8)$$

with $0 \leq \alpha \leq 1$.

2.3. Combination

In classification problems dealing with uncertain and imprecise data, it is often useful to aggregate the information coming from different sources in order to obtain more relevant information. Evidence theory provides reliable tools to combine the knowledge given by different sources; the resulting information is a synthesis of all of them. The decision process is therefore more confident, because it takes into account all the information of the sources, which is partially redundant and partially complementary. The orthogonal rule, also called Dempster's rule of combination [10], is the first combination rule defined within the framework of evidence theory. Let $m_1, \ldots, m_J$ denote $J$ masses of belief coming from $J$ distinct sources.¹ The belief function $m_\oplus$ resulting from the combination of the $J$ sources by means of Dempster's combination rule is defined by

$$m_\oplus = m_1 \oplus \cdots \oplus m_i \oplus \cdots \oplus m_J. \qquad (9)$$

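The Dempster combination operator $\oplus$ is associative and commutative. For all $A \subseteq \Omega$, $m_\oplus(A)$ is given by

$$m_\oplus(A) = \frac{m_\cap(A)}{1 - k} \quad \forall A \subseteq \Omega,\ A \neq \emptyset, \qquad m_\oplus(\emptyset) = 0, \qquad (10)$$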

¹ Note that Smets and Kruse detailed in [20] the notion of distinctness: the belief of a particular source should not interfere with the belief of any other source.


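where $m_\cap$ is the conjunctive combination defined by

$$m_\cap(A) = \sum_{A_1 \cap \cdots \cap A_J = A} m_1(A_1)\, m_2(A_2) \cdots m_J(A_J) \quad \forall A \subseteq \Omega, \qquad (11)$$

and where the term $k$ is given by

$$k = \sum_{A_1 \cap \cdots \cap A_J = \emptyset} m_1(A_1)\, m_2(A_2) \cdots m_J(A_J). \qquad (12)$$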
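Under the same assumed dictionary representation (a bba maps frozensets to masses), Eqs. (7)–(12) can be sketched as follows. `conjunctive` returns the unnormalized result, whose mass on the empty set is the conflict $k$ (returning it as-is is also Smets's rule, Eq. (13)), and `dempster` renormalizes it by $1 - k$. The function names and bbas are hypothetical. Enumerating every combination of focal elements is exponential in the number of sources, which is acceptable only for illustration.

```python
import math
from itertools import product


def discount(m, alpha, omega):
    """Discounting, Eqs. (7)-(8): keep a fraction alpha of each mass, move 1 - alpha to Omega."""
    out = {a: alpha * v for a, v in m.items() if a != omega}
    out[omega] = 1.0 - alpha + alpha * m.get(omega, 0.0)
    return out


def conjunctive(*bbas):
    """Unnormalized conjunctive combination, Eq. (11).

    The mass left on the empty set is the conflict k of Eq. (12).
    """
    out = {}
    for focal in product(*(b.items() for b in bbas)):
        inter = frozenset.intersection(*(a for a, _ in focal))
        out[inter] = out.get(inter, 0.0) + math.prod(v for _, v in focal)
    return out


def dempster(*bbas):
    """Dempster's rule, Eq. (10): conjunctive combination renormalized by 1 - k."""
    m_cap = conjunctive(*bbas)
    k = m_cap.pop(frozenset(), 0.0)
    if k >= 1.0:
        raise ValueError("total conflict (k = 1): the sources cannot be combined")
    return {a: v / (1.0 - k) for a, v in m_cap.items()}, k


omega = frozenset({"H1", "H2"})
m1 = {frozenset({"H1"}): 0.8, omega: 0.2}  # hypothetical source supporting H1
m2 = {frozenset({"H2"}): 0.6, omega: 0.4}  # hypothetical source supporting H2
m, k = dempster(m1, m2)
print(round(k, 2), {tuple(sorted(a)): round(v, 3) for a, v in m.items()})
```

With these two partly contradictory sources, $k = 0.8 \times 0.6 = 0.48$, and the remaining mass is rescaled by $1/(1 - k)$.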

The term $k$, with $0 \leq k \leq 1$, can be interpreted as a measure of the conflict between the $J$ sources to be combined, and it is directly taken into account in the combination as a normalization factor. It also represents the mass attributed to the empty set if the masses are not normalized. Dempster's rule has been justified theoretically by several authors [21,22]. However, the normalization step has also been criticized [22–24]. It is very important to take the value of the normalization term into account: when it is high ($k$ close to 1), combining the sources makes no sense, leads to incoherence [14,24] and produces counter-intuitive behaviors. Moreover, when $k = 1$ the sources are in complete opposition and data fusion is impossible. Different solutions have been proposed to deal with the conflict. In the case of reliable sources, Smets [20], like Dempster, assumes that the conflict can only come from a bad definition of the frame of discernment. In other words, the problem is ill-posed, and Smets proposes to skip the normalization step. In this case, the value $k$ represents the mass assigned to one or several hypotheses that were not initially taken into account. The advantage of this approach is that it allows a possible revision of the initial frame of discernment. Smets thus proposes the following combination:

$$\begin{cases} m_S(A) = m_\cap(A) & \forall A \subseteq \Omega,\ A \neq \emptyset, \\ m_S(\emptyset) = k. \end{cases} \qquad (13)$$

Note that a similar reasoning is proposed by Yager [25]. In the case of unreliable sources, solutions have been proposed by Yager [25] and by Dubois and Prade [3]. In [26], Lefevre et al. define a formalism describing a family of combination operators and develop a generic framework that unifies the classical combination rules cited above.

2.4. Decision making

In most applications, the decision to be taken consists in choosing a single hypothesis.
Within the particular context of the TBM, which departs from the probabilistic interpretation of the theory, Smets distinguishes the credal level from the pignistic level. The former consists in modeling knowledge and combining sources; the latter is entirely oriented toward decision making [20]. Smets proposes a decision rule based on a probability function called the pignistic probability function [27], defined by


$$BetP(H_n) = \sum_{A \subseteq \Omega,\ H_n \in A} \frac{m(A)}{|A|\,\big(1 - m(\emptyset)\big)} \quad \forall H_n \in \Omega, \qquad (14)$$

where $|A|$ is the cardinality of $A$. Let $X$ denote a pattern to be assigned to one of the $N$ hypotheses of the frame of discernment $\Omega$. In the context of decision theory, we consider a finite set of actions $\mathcal{A} = \{a_1, \ldots, a_N\}$. Typically, the action $a_i$ corresponds to the assignment to the hypothesis $H_i$. We denote by $\lambda(a_i \mid H_n)$ the loss incurred when we select $a_i$ whereas the truth belongs to the hypothesis $H_n$. The Bayesian decision rule defines the expected loss associated with each possible action $a_i \in \mathcal{A}$ by

$$R_{BetP}(a_i) = \sum_{H_j \in \Omega} \lambda(a_i \mid H_j)\, BetP(H_j). \qquad (15)$$

For a pattern vector $X$, the pignistic decision rule is then defined by

$$D_{BetP}(X) = a_i \quad \text{with} \quad a_i = \arg\min_{a_j \in \mathcal{A}} R_{BetP}(a_j). \qquad (16)$$

If we consider the simplest case, where the loss equals 1 for a wrong decision and 0 for a correct one ($\lambda(a_i \mid H_j) = 1 - \delta_{i,j}$), the pignistic decision rule becomes

$$D_{BetP}(X) = a_i \quad \text{with} \quad a_i = \arg\max_{H_j \in \Omega} BetP(H_j). \qquad (17)$$

Other decision rules can be defined [28]. Considering $\{0, 1\}$ losses, a lower and an upper decision rule are defined by

$$D_*(X) = a_i \quad \text{with} \quad a_i = \arg\max_{H_j \in \Omega} Bel(H_j), \qquad (18)$$

$$D^*(X) = a_i \quad \text{with} \quad a_i = \arg\max_{H_j \in \Omega} Pl(H_j). \qquad (19)$$

Maximizing the credibility thus corresponds to a pessimistic decision, and maximizing the plausibility to an optimistic one. In pattern recognition, it is often useful to add a new action $a_r$, called rejection, used when the uncertainty is too high [29,30]. The rejection action $a_r$ is associated with a constant cost $\lambda(a_r \mid H_j) = C_r$. If the function $\Psi(\cdot)$ denotes the credibility, the plausibility or the pignistic probability, the decision rule is defined by

$$D(X) = \begin{cases} a_r, & \text{if } \max_{H_j \in \Omega} \Psi(H_j) < 1 - C_r, \\ a_i = \arg\max_{H_j \in \Omega} \Psi(H_j), & \text{with } \Psi(H_i) \geq 1 - C_r. \end{cases} \qquad (20)$$

3. MR image segmentation and evidence theory

This section deals with a particular application in the field of medical imaging: the segmentation of multi-echo images for the detection and 3D visualization of brain tumors.

3.1. Problematic

The problem that physicians and oncologists encounter in the treatment of brain tumors is to determine exactly the extension areas and the volume of tumors. These parameters, besides the nature of the lesion, allow them to plan treatments. This work aims to supply efficient segmentation and brain volume estimation tools to help physicians in their diagnosis. For that purpose, we have several MR brain volumes for the same patient. Each volume, formed by a stack of slices, was acquired with the same MR scanner but with different acquisition parameters (e.g. T1 echo, T2 echo). Each volume can thus be considered as an information source, and the data are numerous, redundant and complementary. The idea is to merge the data, as the specialist mentally does, in order to obtain a reliable segmentation and, consequently, a good estimation of the brain tumor volume.

3.2. The interest of using evidence theory

To perform the data fusion, we adopt evidence theory, which is also well suited to the problem of brain tumor detection and to the nature of the data used:

• The images are obtained by the same imaging technique but with different acquisition parameters: we are within the framework of multi-echo analysis. The same scene, and thus the same anatomical regions, are observed, so the information to be processed is redundant.
• Each echo highlights some elements and particular tissues that the others do not. For instance, the T1 echo is an anatomical modality, so the anatomical structures are well differentiated, whereas the T2 echo highlights, totally or partially, tumors and œdemas. Using several echoes thus makes the data complementary.
• Although scanner performance keeps increasing, we have to keep in mind that the images come from an acquisition system and that their quality therefore depends on the quality of the scanner. The images are, by nature, corrupted by noise (noise coming from the acquisition system or from the patient, quantization noise). The images are thus uncertain and imprecise.
• Furthermore, we are particularly interested in tumors with badly defined boundaries, which are inherently uncertain.
• Finally, we have seen in Section 2 that the conflict, produced by the combination process, is an important piece of information in evidence theory. From our point of view, however, the conflict is not only a consequence of the combination process but also an information source that increases the knowledge about the problem. We will see in Section 5.3.3 a possible use of the conflict in the segmentation process.

4. Segmentation scheme

We propose an automatic method for the segmentation of pathological multi-echo MR volumes. The method relies on two main points. Firstly, we model the data according to an evidential parametric model; in particular, we propose the use of three models. Secondly, we make use of spatial information through a weighted Dempster's combination rule.

4.1. Notations

Let $x_i$ denote the gray level of the voxel located at site $s$ of the MR volume $V_i$. If we consider the set of $p$ different echoes, the pattern vector associated with the voxel at $s$ is defined by

$$X = [x_1, \ldots, x_p]. \qquad (21)$$

The feature space $\mathcal{X}$ is a $p$-dimensional space defined by the gray levels of the volumes. Moreover, let $\mathcal{X}_t \subseteq \mathcal{X}$ be a training set. Let $\Omega = \{H_1, \ldots, H_N\}$ be the set of hypotheses (the solutions of the problem), where $N$ is a constant number, fixed arbitrarily; $\Omega$ is also called the frame of discernment. The classification problem is to divide the volume into $N$ classes. Ideally, each class represents a particular anatomical structure. Usually, $N$ is equal to 5, in order to segment the volume into white matter (WM), gray matter (GM), cerebrospinal fluid (CF) and two classes for tumor and œdema.

4.2. Evidential modeling

We propose the use of three evidential models: a distance-based model introduced by Denœux [16,31] and two models based on a likelihood function, the first one defined by Shafer [10] and the second one by Appriou [4].

4.2.1. Denœux's model

Denœux's model is an evidence-theoretic, distance-based k-nearest-neighbor (kNN) classifier. For each pattern $X \in \mathcal{X}$ to classify, this model considers the $k$ nearest neighbors of $X$ in the training set $\mathcal{X}_t$ as $k$ independent information sources, which are used to determine the class membership of $X$. The information brought by a training neighbor $X_s$, with $s \in \{1, \ldots, k\}$, is modeled by a bba $m_s$ defined on $\Omega$. If $X_s$ belongs to the hypothesis $H_n$, a weighted fraction of the unit mass is assigned to the singleton hypothesis $H_n$ and the rest to the frame of discernment $\Omega$ (which thus quantifies the degree of ignorance). The mass $m_s(\{H_n\})$ is then a decreasing function of the distance $d_s$ between $X$ and $X_s$:

$$m_s(\{H_n\}) = \alpha\, \phi_n(d_s), \qquad m_s(\Omega) = 1 - \alpha\, \phi_n(d_s), \qquad (22)$$

where $0 < \alpha < 1$ is a constant and $\phi_n$ is a monotonically decreasing function such that $\phi_n(0) = 1$ and $\lim_{d \to \infty} \phi_n(d) = 0$. We use for $\phi_n$ the following exponential function:

$$\phi_n(d_s) = \exp\big(-\gamma_n (d_s)^2\big), \qquad (23)$$

where $\gamma_n > 0$ depends on the class $H_n$. Note that in [17], Zouhal and Denœux propose to optimize the parameter $\gamma_n$ by minimizing an error criterion. Since the $k$ neighbors are considered as $k$ independent sources of information, modeled by $k$ bbas $m_s$ with $s \in \{1, \ldots, k\}$, a decision on the membership class of the vector $X$ can be made after aggregating all the information with Dempster's rule of combination. To reduce the high computation time, Denœux proposes a "prototype" version, in which each hypothesis $H_n$ is associated with a center (a prototype) $x_n$ considered as a source of information. Thus, $N$ bbas $m_n$, $n \in \{1, \ldots, N\}$, are built according to Eq. (22), where the distance is the distance between the pattern $X$ and the center $x_n$. The final bba $m$ is obtained by merging the $N$ initial bbas.

4.2.2. Shafer's model

Shafer's model [10] is based on the likelihood function. We suppose that the class-conditional probability density function $f(X \mid H_n)$ is known; the likelihood associated with the pattern $X$ is thus defined by $L(H_n \mid X) = f(X \mid H_n)$. The bba is defined from the knowledge of all the hypotheses $H_n$: it is a global method. Moreover, it is a consonant method based on two axioms. Firstly, the plausibility of a single hypothesis $H_n$ is proportional to its likelihood. Let $c$ denote a normalization factor. The plausibility is thus given by

$$Pl(H_n) = c\, L(H_n \mid X) \quad \forall H_n \in \Omega. \qquad (24)$$

Secondly, if we assume the plausibility to be consonant, Pl satisfies the condition

$$Pl(A \cup B) = \max[Pl(A), Pl(B)] \quad \forall A, B \subseteq \Omega. \qquad (25)$$

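The plausibility of a set $A$ is thus given by

$$Pl(A) = c \max_{H_n \in A} L(H_n \mid X). \qquad (26)$$

As the plausibility function satisfies $Pl(\Omega) = 1$, we deduce

$$Pl(A) = \frac{\max_{H_n \in A} L(H_n \mid X)}{\max_{H_n \in \Omega} L(H_n \mid X)} \quad \forall A \subseteq \Omega. \qquad (27)$$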

The bba is finally determined by means of the Möbius transformation [15].
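Because the plausibility of Eqs. (24)–(27) is consonant, its focal elements are nested, and the Möbius transformation reduces to differences between successive normalized likelihoods: with hypotheses ranked by decreasing likelihood, $A_i = \{H_{(1)}, \ldots, H_{(i)}\}$ receives $m(A_i) = Pl(H_{(i)}) - Pl(H_{(i+1)})$. The Python sketch below follows that shortcut rather than a general Möbius inversion; the function name and the likelihood values are hypothetical.

```python
def shafer_consonant_bba(likelihoods):
    """Consonant bba of Shafer's model, Eqs. (24)-(27).

    likelihoods: dict mapping each hypothesis H_n to L(H_n | X) >= 0, not all zero.
    Returns {frozenset: mass}, with nested focal sets.
    """
    ranked = sorted(likelihoods, key=likelihoods.get, reverse=True)
    top = likelihoods[ranked[0]]
    if top <= 0:
        raise ValueError("at least one likelihood must be positive")
    # Normalized singleton plausibilities, Eq. (27); pl[0] == 1, and a trailing 0 closes the chain.
    pl = [likelihoods[h] / top for h in ranked] + [0.0]
    bba = {}
    for i in range(len(ranked)):
        mass = pl[i] - pl[i + 1]  # Moebius inversion for nested focal sets
        if mass > 0:
            bba[frozenset(ranked[: i + 1])] = mass
    return bba


# Hypothetical likelihood values for one voxel.
b = shafer_consonant_bba({"WM": 0.8, "GM": 0.4, "CF": 0.1})
print({tuple(sorted(a)): round(v, 3) for a, v in b.items()})
```

Summing the masses of the nested sets that contain a given hypothesis recovers its normalized likelihood, which is exactly Eq. (24) with $c = 1 / \max_{H_n \in \Omega} L(H_n \mid X)$.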


4.2.3. Appriou's model

Appriou's model, like Shafer's, is based on likelihood functions $L(H_n \mid X)$. The models defined by Appriou satisfy three axioms [1]:

• consistency with the Bayesian approach;
• separability of the evaluation of the hypotheses $H_n$;
• consistency with the probabilistic association of sources $S_j$.

An exhaustive search shows that only two models satisfy these axioms [1]. For the first model, each information source $S_j$ is associated with $N$ elementary bbas defined by

$$\begin{cases} m_{nj}(H_n) = 0, \\ m_{nj}(\bar{H}_n) = \alpha_{nj}\,\{1 - R_j\, L(H_n \mid x_j)\}, \\ m_{nj}(\Omega) = 1 - \alpha_{nj}\,\{1 - R_j\, L(H_n \mid x_j)\}. \end{cases} \qquad (28)$$

For the second model, the bbas are defined by

$$\begin{cases} m_{nj}(H_n) = \alpha_{nj}\, R_j\, L(H_n \mid x_j) / \{1 + R_j\, L(H_n \mid x_j)\}, \\ m_{nj}(\bar{H}_n) = \alpha_{nj} / \{1 + R_j\, L(H_n \mid x_j)\}, \\ m_{nj}(\Omega) = 1 - \alpha_{nj}, \end{cases} \qquad (29)$$

where $R_j$ is a normalization factor constrained by

$$R_j \in \Big[0,\ \big(\sup_{x_j} \max_{n \in [1, N]} \{L(H_n \mid x_j)\}\big)^{-1}\Big], \qquad (30)$$

and $\alpha_{nj}$ is a reliability factor depending on the hypothesis $H_n$ and on the source $S_j$. If the confidence in the training is high, Appriou proposes to set $\alpha_{nj}$ to 1, i.e. not to discount the bba; otherwise, $\alpha_{nj}$ is set to 0.9. Other methods have been proposed to compute the reliability factors automatically [32,33]. A mass $m$ is finally obtained by computing the orthogonal sum of the different bbas $m_{nj}$ and $m_j$:

$$m_j(\cdot) = \bigoplus_{n} m_{nj}(\cdot), \qquad (31)$$

$$m(\cdot) = \bigoplus_{j} m_j(\cdot). \qquad (32)$$

According to [34], the first model seems preferable because it is more consistent with the Generalized Bayes Theorem (GBT) introduced by Smets [35]. In the sequel, we only use the first model proposed by Appriou. A particularity of Appriou's models is their multisensor aspect. In our application, each MR echo is considered as an information source, and each echo is therefore treated separately before being merged by the fusion process in order to build a synthesized piece of information. This is especially interesting for tumor segmentation with MR images, which provide complementary and redundant information. Moreover, this choice is guided by the fact that, with a vectorial version, the values of the likelihood functions are very weak and, consequently, too large an amount of belief is placed on the set $\Omega$.

4.3. Model parameter estimation

In this part, we deal with the estimation of the parameters of the belief models described above. In particular, Denœux's model requires a training set, and for Appriou's and Shafer's models the likelihood functions must be estimated. We suppose that each pattern $X$ is assigned to one hypothesis $H_n$ depending only on its intensity level. If we assume that each tissue class of the MR volume can be modeled by a Gaussian distribution [36–38], the conditional density function is

$$f(X \mid H_n) = \frac{1}{(2\pi)^{p/2}\, |\Sigma_n|^{1/2}} \exp\Big(-\frac{1}{2} (X - \mu_n)^T \Sigma_n^{-1} (X - \mu_n)\Big), \qquad (33)$$

where $\mu_n$ and $\Sigma_n$ are respectively the mean $p$-vector and the $p \times p$ covariance matrix associated with the hypothesis $H_n$. In the probabilistic context, $X$ is a particular realization of a stochastic process. Let $\mathcal{X}_t$ be the set of pattern vectors used to estimate the means and the variances. This training set satisfies $\mathcal{X}_t \subseteq \mathcal{X}$, where $\mathcal{X}$ is the set of all the pattern vectors of the volume. If we suppose that the volume is composed of $N$ distinct classes, associated with the hypotheses $H_n$ for $n = 1, \ldots, N$, and that the realizations are independent and identically distributed (i.i.d.), we are in the presence of a mixture model described by the mean, the variance and the proportion of each class. Let $\Theta$ denote these parameters. Since the realizations are i.i.d., the conditional probability of $\mathcal{X}_t$ is

$$P(\mathcal{X}_t \mid \Theta) = \prod_{X \in \mathcal{X}_t} \sum_{H_n \in \Omega} f(X \mid H_n)\, P(H_n). \qquad (34)$$
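Because Appriou's model treats each echo as a separate source, it uses one-dimensional likelihoods, i.e. Eq. (33) with $p = 1$ for each echo. The Python sketch below chains that density with the first model, Eq. (28), and the two combinations (31)–(32). It rests on several assumptions that are not from the original: bbas are stored as `{frozenset: mass}` dictionaries, $R_j$ is taken at the upper bound of Eq. (30) (for 1-D Gaussians the supremum of the likelihood is $\max_n 1/(\sigma_n\sqrt{2\pi})$), all $\alpha_{nj}$ share the value 0.95 used in Section 5.1, and the class parameters and gray levels are hypothetical.

```python
import math
from itertools import product


def gauss(x, mu, sigma):
    """Eq. (33) with p = 1: 1-D Gaussian class-conditional density."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def dempster(*bbas):
    """Dempster's rule, Eqs. (10)-(12), on bbas stored as {frozenset: mass}."""
    out = {}
    for focal in product(*(b.items() for b in bbas)):
        inter = frozenset.intersection(*(a for a, _ in focal))
        out[inter] = out.get(inter, 0.0) + math.prod(v for _, v in focal)
    k = out.pop(frozenset(), 0.0)
    return {a: v / (1.0 - k) for a, v in out.items()}


def appriou_model1(x, params, omega, alpha=0.95):
    """Appriou's first model, Eq. (28), merged with Eqs. (31)-(32).

    x      : gray levels [x_1, ..., x_p], one per echo (source S_j)
    params : params[j][H_n] = (mu, sigma) of class H_n on echo j
    alpha  : common reliability factor alpha_nj
    """
    per_source = []
    for xj, pj in zip(x, params):
        # Assumption: R_j at the upper bound of Eq. (30), i.e. 1 / max_n (1 / (sigma_n sqrt(2 pi))).
        r_j = min(sigma for _, sigma in pj.values()) * math.sqrt(2.0 * math.pi)
        bbas = []
        for h, (mu, sigma) in pj.items():
            not_h = alpha * (1.0 - r_j * gauss(xj, mu, sigma))  # m_nj(complement of H_n)
            bbas.append({omega - {h}: not_h, omega: 1.0 - not_h})
        per_source.append(dempster(*bbas))  # Eq. (31): m_j = (+)_n m_nj
    return dempster(*per_source)  # Eq. (32): m = (+)_j m_j


omega = frozenset({"A", "B"})
params = [
    {"A": (100.0, 10.0), "B": (200.0, 10.0)},  # hypothetical classes on echo 1
    {"A": (50.0, 5.0), "B": (80.0, 5.0)},      # hypothetical classes on echo 2
]
m = appriou_model1([105.0, 55.0], params, omega)
print({tuple(sorted(a)): round(v, 3) for a, v in m.items()})
```

In this model, a likelihood only ever removes support from the complement $\bar{H}_n$; belief in $H_n$ itself emerges from the combination of the $N$ elementary bbas, so a low likelihood for every class except one concentrates the mass on that class.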

The parameters of this mixture model are then estimated using the expectation–maximization (EM) algorithm [39]. The definition of the training set $\mathcal{X}_t$ is discussed in Section 5.1. The likelihood of Shafer's model is then defined using $L(H_n \mid X) = f(X \mid H_n)$ and Eq. (33). For Appriou's model, we need to estimate $p$ likelihood functions $L(\cdot \mid x_j)$, $j = 1, \ldots, p$. They are estimated by means of the EM algorithm while keeping the same frame of discernment for every information source. For Denœux's model, each estimated mean and covariance is associated with a class prototype. For each pattern $X$, the belief function $m_n$ (Eq. (22)) is computed using the Mahalanobis distance defined by

$$d_s^2 = (X - \mu_n)^T \Sigma_n^{-1} (X - \mu_n). \qquad (35)$$

The optimized parameters $\gamma_n$ (Eq. (23)) are computed with the method proposed in [17], using the classification obtained by the EM algorithm as the training set $\mathcal{X}_t$.
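The prototype version of Denœux's model, with the Mahalanobis distance of Eq. (35), can be sketched as follows for the dual-echo case ($p = 2$) treated in Section 5, which allows a closed-form $2 \times 2$ inverse; for larger $p$ one would solve a linear system instead. The prototype means and covariances stand in for EM estimates, and the values of $\gamma_n$ are hypothetical placeholders for the optimized values of [17]. The dictionary representation and function names are likewise assumptions.

```python
import math
from itertools import product


def mahalanobis2(x, mu, cov):
    """Squared Mahalanobis distance, Eq. (35), for p = 2 via the closed-form 2 x 2 inverse."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    dx, dy = x[0] - mu[0], x[1] - mu[1]
    # Sigma^{-1} = [[d, -b], [-c, a]] / det, expanded in the quadratic form
    return (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det


def dempster(*bbas):
    """Dempster's rule, Eqs. (10)-(12), on bbas stored as {frozenset: mass}."""
    out = {}
    for focal in product(*(b.items() for b in bbas)):
        inter = frozenset.intersection(*(a for a, _ in focal))
        out[inter] = out.get(inter, 0.0) + math.prod(v for _, v in focal)
    k = out.pop(frozenset(), 0.0)
    return {a: v / (1.0 - k) for a, v in out.items()}


def denoeux_prototypes(x, prototypes, gammas, omega, alpha=0.95):
    """Prototype version of Denoeux's model: one simple bba per class, Eqs. (22)-(23)."""
    bbas = []
    for h, (mu, cov) in prototypes.items():
        # alpha * phi_n(d), with phi_n(d) = exp(-gamma_n d^2) and d^2 from Eq. (35)
        support = alpha * math.exp(-gammas[h] * mahalanobis2(x, mu, cov))
        bbas.append({frozenset({h}): support, omega: 1.0 - support})
    return dempster(*bbas)  # merge the N prototype bbas


omega = frozenset({"WM", "GM"})
prototypes = {  # hypothetical (mean, covariance) pairs standing in for EM estimates
    "WM": ((100.0, 60.0), ((25.0, 5.0), (5.0, 16.0))),
    "GM": ((80.0, 80.0), ((30.0, 0.0), (0.0, 30.0))),
}
gammas = {"WM": 0.5, "GM": 0.5}  # hypothetical; [17] optimizes gamma_n
m = denoeux_prototypes((98.0, 62.0), prototypes, gammas, omega)
print({tuple(sorted(a)): round(v, 3) for a, v in m.items()})
```

Using the Mahalanobis rather than the Euclidean distance lets each prototype account for the spread and correlation of its class across the two echoes.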


4.4. Introduction of spatial information

The evidential models presented above make it possible to classify all the patterns of the volume: the knowledge about each pattern is modeled by a bba and a decision (assignment to a class) is taken. However, such a scheme classifies each pattern independently, without taking into account any information provided by its spatial neighborhood. The originality of the proposed scheme is to introduce spatial information within the framework of evidence theory. Indeed, in image segmentation, we implicitly assume that each region of the image (equivalent to a class in the problem posed) shares the same properties. Thus, in a homogeneous region, each pattern (modeled by a bba) reinforces the knowledge about its neighbors. Moreover, if a corrupted pattern is present in this region, the knowledge provided by its neighbors attenuates its belief, much like a denoising process. Introducing spatial information into the modeling process by merging data therefore gives more accurate information about the patterns and leads to a more reliable decision. The process can then be regarded as a true segmentation process rather than a mere classification process.

Let $m$ denote the bba of a pattern $X$ and $m_k$ the bba of the $k$-th spatial neighbor $X_k$ of $X$. The basic idea is that a pattern $X$ can probably be assigned to $H_n$ if the evidence for $H_n$ provided by its $K$ spatial neighbors is high. Denoting by $m'$ the bba of the pattern $X$ obtained after introducing spatial information, we propose to define the distance-based bba $m'$ by

$$m' = m \oplus m_1^{\alpha_1} \oplus \cdots \oplus m_K^{\alpha_K}, \qquad (36)$$

where $m_i^{\alpha_i}$, for $i = 1, \ldots, K$, is the discounted bba $m_i$. The resulting mass $m'$ thus depends on $m$, the bba associated with the pattern $X$, but also on the weighted masses of its $K$ neighbors. Intuitively, the contribution of a neighbor should depend on its distance to the pattern $X$, which is to be assigned to one of the competing hypotheses supported by $X$ itself and by its neighbors. We therefore propose to define $\alpha_k$ by

$$\alpha_k = \exp\{-(d_k)^2\}, \qquad (37)$$

where $d_k$ is the Euclidean distance between the positions of $X$ and of its spatial neighbor $X_k$. Besides weighting the bbas according to the position of the neighbors, another advantage of using a distance-based function is that it takes the true dimensions of the voxels into account. This is particularly interesting in MR imaging because voxels are usually anisotropic: the z-extent of a voxel includes both the slice thickness and the gap between slices, so the z-distance is often about twice as large as the x- or y-distance. The introduction of spatial information is thus consistent with the data and avoids the approximations due to digitization. Given the mass $m'$ for each pattern $X$ of the volume, the decision rule consists in choosing a hypothesis $H_i$ that satisfies Eq. (20).
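For a single voxel, the spatial step of Eqs. (36)–(37) followed by the decision of Eq. (20), with $\Psi = BetP$ from Eq. (14), can be sketched in Python as below. Several choices are assumptions and do not come from the original: bbas are stored as `{frozenset: mass}` dictionaries, only the 6 face neighbors are used, $d_k$ is expressed in millimetres using the 0.93 × 0.93 × 1.2 mm voxel size of Section 5.1, and the bbas and the value of $C_r$ are hypothetical.

```python
import math
from itertools import product


def dempster(*bbas):
    """Dempster's rule, Eqs. (10)-(12), on bbas stored as {frozenset: mass}."""
    out = {}
    for focal in product(*(b.items() for b in bbas)):
        inter = frozenset.intersection(*(a for a, _ in focal))
        out[inter] = out.get(inter, 0.0) + math.prod(v for _, v in focal)
    k = out.pop(frozenset(), 0.0)
    return {a: v / (1.0 - k) for a, v in out.items()}


def discount(m, alpha, omega):
    """Discounting, Eqs. (7)-(8)."""
    out = {a: alpha * v for a, v in m.items() if a != omega}
    out[omega] = 1.0 - alpha + alpha * m.get(omega, 0.0)
    return out


def spatial_bba(m, neighbors, omega):
    """Eqs. (36)-(37): m' = m (+) m_1^alpha_1 (+) ... (+) m_K^alpha_K with alpha_k = exp(-d_k^2).

    neighbors: list of (bba, d_k) pairs, d_k being the physical distance in mm (assumption).
    """
    discounted = [discount(mk, math.exp(-dk ** 2), omega) for mk, dk in neighbors]
    return dempster(m, *discounted)


def betp(m, omega):
    """Pignistic probability, Eq. (14)."""
    empty = m.get(frozenset(), 0.0)
    out = {h: 0.0 for h in omega}
    for a, v in m.items():
        for h in a:  # the empty set contributes nothing
            out[h] += v / (len(a) * (1.0 - empty))
    return out


def decide(psi, c_r):
    """Eq. (20); returns None for the rejection action a_r."""
    best = max(psi, key=psi.get)
    return best if psi[best] >= 1.0 - c_r else None


# Hypothetical noisy voxel leaning towards GM, whose 6 face neighbors support WM.
omega = frozenset({"WM", "GM"})
m_voxel = {frozenset({"GM"}): 0.6, omega: 0.4}
m_nb = {frozenset({"WM"}): 0.8, omega: 0.2}
neighbors = [(m_nb, 0.93)] * 4 + [(m_nb, 1.2)] * 2  # in-plane, then through-plane
m_spatial = spatial_bba(m_voxel, neighbors, omega)
print(decide(betp(m_voxel, omega), c_r=0.3))    # GM
print(decide(betp(m_spatial, omega), c_r=0.3))  # WM
```

The two through-plane neighbors, being farther away, are discounted more strongly ($\alpha \approx 0.24$ against $\alpha \approx 0.42$ in-plane). The agreeing neighbors are nevertheless enough to override the isolated GM evidence, which is the denoising effect described above.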

5. Experiments and results

5.1. Image description

The MR images come from the 1.5 T scanner of the Regional University Hospital Center (CHRU) of Poitiers. For each patient, we have a pair of MR volumes of the same brain region. Depending on the case and on the needs of the physicians, the pairs of echoes are {T1, T2}, {T1, T1 Gado²} and {T1 Gado, T2}. Note that the method can easily be extended to $J$ sources with $J > 2$. Figs. 1(a) and (b) show a dual-echo image (a T1 Gado echo and a T2 echo, respectively). These images highlight both the redundancy and the complementarity of the data. The data were acquired axially. Each volume is composed of voxels with near-millimetric dimensions: 0.93 × 0.93 × 1.2 mm. The voxels are anisotropic, but the slice thickness and the spacing between slices are such that slices do not overlap, which avoids degrading the data quality. Note that the acquisition protocols are such that the voxel dimensions and the acquired zones are the same for all the sequences, so the different volumes do not need to be registered. Moreover, the segmentation scheme is applied only to the brain region (the brain is first isolated from the whole data by a preprocessing step based on morphological operations [40]). The segmentation scheme proposed in Section 4 is applied to six dual-echo volumes. For simplicity, we only present the results for a single patient, but similar behaviors are observed with the other volumes. For the volume presented, the expected anatomical regions are the WM, the GM, the CF, the tumor and the œdema; the number of classes $N$ is therefore 5. The three models presented in Section 4.2 are used and their behaviors are compared. The parameters are estimated as described in Section 4.3.
To obtain a completely automatic segmentation (no user interaction) and to make sure that all the classes are represented in the training set $X_t$, we set $X_t = X$, where $X$ is the set of all the pattern vectors of the volume. For this particular volume, $X$ contains more than 800 000 patterns.

2 T1 Gado denotes a T1-weighted acquisition performed after injection of a gadolinium contrast agent into the patient.


A.-S. Capelle et al. / Information Fusion 5 (2004) 203–216

Fig. 1. A dual-echo MR image. (a) T1 Gado echo. (b) T2 echo.

The discounting parameters $\alpha$ (Eq. (22)) and $\alpha_{nj}$ (Eq. (28)) depend on the reliability we assign to the hypotheses $H_n$ and to the training set $X_t$. Our confidence in $X_t$, which is obtained by an unsupervised process, is not complete, mainly because the image characteristics depend on acquisition conditions that we did not control. Consequently, the bbas have to be discounted. Moreover, to avoid favoring any hypothesis, we impose the same value on all these parameters and set them to 0.95. A consequence of this choice is that part of the belief is always assigned to the frame $\Omega$, which rules out any situation of total conflict when the bbas are combined. Decisions about class membership use a loss equal to 1 for a wrong decision and 0 otherwise.

5.2. Comparison between the different models

5.2.1. Without using spatial information

To evaluate the impact of the neighborhood information on the different models, we first segment the volumes without spatial information. This is equivalent to setting $\alpha_k = 0$ for $k \in [1, K]$ (Eq. (36)), where $K$ is the number of spatial neighbors. We present the final segmentation into five classes of a dual-echo {T1 Gado, T2} volume. The decision is taken by maximizing the pignistic probability. Fig. 2 shows the segmentation results obtained with the three evidential models. The results seem consistent with the expected anatomical structures for all the models. However, we observe two different behaviors, which correspond to the information used to build the bbas: a distance-based approach for Denœux's model and a likelihood-based approach for Appriou's and Shafer's models. Fig. 3 shows the evolution of the rejection rate as the rejection cost $C_r$ increases from 0 to 1. The level of rejection depends on the model: Appriou's model rejects more than Shafer's model, and Denœux's model rejects more than Appriou's model. This ordering reflects the specificity of the models: the more specific the model, the lower its rejection rate. The preferred model therefore depends on the classification strategy: with a cautious approach, Denœux's model is preferred; with an optimistic approach, Shafer's model is preferred.

5.2.2. Introduction of spatial information

As explained in Section 4.4, spatial information is introduced in the segmentation scheme via the combination of the bba of a voxel with the bbas of its neighbors. This combination provides additional information that allows a more accurate decision. Fig. 4 shows the segmentation obtained when spatial information is introduced via the combination of masses of belief. The visual quality of the segmentation seems to improve. In particular, the WM is better recognized with Denœux's model: the segmented region is more homogeneous and seems less corrupted by misclassifications.
Section 5.3 details the differences between the results obtained with and without spatial information.
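The role of the discounting parameters can be illustrated with a small Python sketch. It is not the authors' implementation: bbas are stored as dictionaries mapping focal sets (frozensets) to masses, the class names are placeholders, and we assume that the value 0.95 is the fraction of each mass that is kept, the remainder being transferred to Ω. Two categorical bbas supporting different classes are in total conflict (k = 1), so Dempster's rule is undefined for them; once discounted, they keep a little mass on Ω and can be combined.

```python
from itertools import product

# Placeholder frame of discernment: the five classes of Section 5.1.
OMEGA = frozenset({"CF", "oedema", "tumor", "WM", "GM"})


def discount(m, alpha, omega=OMEGA):
    """Classical discounting (assumed form): keep a fraction alpha of
    every mass and transfer the remaining 1 - alpha to the frame omega."""
    out = {A: alpha * v for A, v in m.items()}
    out[omega] = out.get(omega, 0.0) + (1.0 - alpha)
    return out


def dempster(m1, m2):
    """Dempster's rule. Returns the normalized combined bba and the
    conflict k, i.e. the mass the conjunctive product puts on the empty set."""
    joint, k = {}, 0.0
    for (A, a), (B, b) in product(m1.items(), m2.items()):
        C = A & B
        if C:                      # non-empty intersection
            joint[C] = joint.get(C, 0.0) + a * b
        else:                      # disjoint focal sets: conflicting mass
            k += a * b
    if k >= 1.0:
        raise ValueError("total conflict (k = 1): Dempster's rule is undefined")
    return {C: v / (1.0 - k) for C, v in joint.items()}, k


# Two categorical bbas that support different classes.
m_wm = {frozenset({"WM"}): 1.0}
m_gm = {frozenset({"GM"}): 1.0}

# dempster(m_wm, m_gm) would raise ValueError because k = 1.
m, k = dempster(discount(m_wm, 0.95), discount(m_gm, 0.95))
print(round(k, 4))                     # 0.9025 = 0.95 * 0.95
print(round(m[frozenset({"WM"})], 4))  # 0.4872: WM and GM share the belief
```

The conflict remains very high, but it is strictly below 1, which is exactly what choosing discounting parameters smaller than 1 guarantees.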

Fig. 2. Segmentation results into 5 classes without using spatial information on a middle slice of the volume. From darker to lighter: CF, œdema, tumor, WM and GM. (a) Denœux’s model. (b) Appriou’s model. (c) Shafer’s model.
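The rejection curves of Fig. 3 (and those of Fig. 7 in Section 5.3.2) rest on decisions under a 0–1 loss with a rejection cost C_r. The Python sketch below is a minimal illustration with hypothetical function names. Following our reading of the expected-loss framework of [28], the credibility rule corresponds to the pessimistic upper expected loss 1 - Bel({h}), the plausibility rule to the optimistic lower expected loss 1 - Pl({h}), and the pignistic rule lies in between; a pattern is rejected when the smallest expected loss exceeds C_r.

```python
def singleton_measures(m, omega):
    """Bel, BetP and Pl of every singleton {h}, for a normalized bba m
    (no mass on the empty set)."""
    bel = {h: m.get(frozenset({h}), 0.0) for h in omega}
    betp = {h: sum(v / len(A) for A, v in m.items() if h in A) for h in omega}
    pl = {h: sum(v for A, v in m.items() if h in A) for h in omega}
    return {"bel": bel, "betp": betp, "pl": pl}


def decide(m, omega, rule, c_r):
    """0-1 loss with a rejection option of cost c_r.

    The hypothesis with the largest score (Bel, BetP or Pl) is chosen;
    the pattern is rejected (None) if its expected loss 1 - score
    exceeds c_r."""
    score = singleton_measures(m, omega)[rule]
    best = max(sorted(omega), key=lambda h: score[h])  # deterministic ties
    return best if 1.0 - score[best] <= c_r else None


def rejection_rate(bbas, omega, rule, c_r):
    """Fraction of the patterns rejected by one decision rule."""
    return sum(decide(m, omega, rule, c_r) is None for m in bbas) / len(bbas)


OMEGA3 = frozenset({"A", "B", "C"})
m = {frozenset({"A"}): 0.5, frozenset({"A", "B"}): 0.3, OMEGA3: 0.2}
for rule in ("bel", "betp", "pl"):
    print(rule, decide(m, OMEGA3, rule, c_r=0.4))
# bel None   -> 1 - Bel({A}) = 0.5 > 0.4, rejected
# betp A     -> 1 - BetP({A}) is about 0.28
# pl A       -> 1 - Pl({A}) = 0
```

Because Bel({h}) ≤ BetP({h}) ≤ Pl({h}) for every hypothesis, for any C_r the credibility rule rejects at least as many patterns as the pignistic rule, which in turn rejects at least as many as the plausibility rule. Evaluating `rejection_rate` over a grid of C_r values traces curves of the kind shown in Figs. 3 and 7.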


Fig. 3. Evolution of the rejection rate.

Table 1
Tumor classification results (the tumor ground truth is composed of 72 405 voxels; all percentages are relative to this volume)

Model      True positives     False positives    False negatives
Denœux     69 857 (96.48%)    5233 (7.23%)       2548 (3.52%)
Appriou    64 181 (88.64%)    6500 (8.98%)       8224 (11.36%)
Shafer     67 288 (92.93%)    9708 (13.41%)      5117 (7.07%)

To analyze the quality of the different models within the context of our medical application, we compare their ability to detect the tumor and the œdema. For that purpose, we extract from the segmentation the regions corresponding to the tumor and the œdema. If label 0 denotes the darkest region of a segmentation and label 4 the lightest, the tumor and the œdema correspond to the largest connected component labeled 1 and 2. These regions cannot be extracted in this way without neighborhood information, because many small regions are then still connected to the œdema and the tumor; the neighborhood information is therefore necessary for our application with this data volume. The extracted ROI, composed of the tumor and the œdema, is compared with an expert segmentation. Even though the expert boundaries are drawn manually by a radiologist, they cannot be considered as "the gold standard" because of the variability of expert decisions; they nevertheless remain a good reference for quality evaluation. Note that this expert segmentation is never used during the segmentation steps: the segmentation is fully automatic. The evidential segmentations are evaluated through three classification rates: the true positive rate, the false positive rate and the false negative rate.
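The extraction of the tumor and œdema ROI can be sketched as a breadth-first search for the largest connected component among voxels labeled 1 or 2. This minimal pure-Python illustration rests on two assumptions that the text does not state: the ROI is taken as the largest component of the union of labels 1 and 2, and voxels are 26-connected (the neighborhood size used for the spatial combination). The function name `largest_component` is hypothetical, and labels are stored as a nested list indexed [z][y][x].

```python
from collections import deque
from itertools import product

# 26-connectivity: every offset in the 3 x 3 x 3 cube except the center.
OFFSETS = [d for d in product((-1, 0, 1), repeat=3) if d != (0, 0, 0)]


def largest_component(labels, keep=(1, 2)):
    """Largest 26-connected set of voxels whose label is in `keep`.

    `labels` is a nested list indexed as labels[z][y][x]; the result is a
    set of (z, y, x) tuples (empty if no voxel has a label in `keep`)."""
    nz, ny, nx = len(labels), len(labels[0]), len(labels[0][0])
    seen, best = set(), set()
    for start in product(range(nz), range(ny), range(nx)):
        z, y, x = start
        if start in seen or labels[z][y][x] not in keep:
            continue
        component, queue = {start}, deque([start])
        seen.add(start)
        while queue:  # breadth-first search from `start`
            cz, cy, cx = queue.popleft()
            for dz, dy, dx in OFFSETS:
                n = (cz + dz, cy + dy, cx + dx)
                if (0 <= n[0] < nz and 0 <= n[1] < ny and 0 <= n[2] < nx
                        and n not in seen and labels[n[0]][n[1]][n[2]] in keep):
                    seen.add(n)
                    component.add(n)
                    queue.append(n)
        if len(component) > len(best):
            best = component
    return best


# One slice: 0 = CF (darkest), 1 = oedema, 2 = tumor, 3 = WM, 4 = GM.
labels = [[[1, 2, 0, 0, 1],
           [0, 2, 0, 3, 0],
           [0, 0, 0, 4, 2]]]
print(sorted(largest_component(labels)))  # [(0, 0, 0), (0, 0, 1), (0, 1, 1)]
```

On real volumes of several hundred thousand voxels, a compiled routine such as `scipy.ndimage.label` with a full 3 × 3 × 3 structuring element would normally be preferred to this loop.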

Table 1 summarizes the results. Considering the true positive rate, Denœux's model provides the best detection (96.48% of the tumor volume is detected), although none of the three models detects the whole tumor volume. We must keep in mind, however, that oncologists tend to overestimate tumor volumes. In a medical application, the true positive rate alone is not sufficient to evaluate a segmentation result; the false positive and false negative rates must also be considered. In particular, when the ROI is a pathological structure such as a lesion, it is critical to avoid false negative decisions. On this criterion, Denœux's model is again the most efficient, with a false negative rate of 3.52% of the ground-truth tumor volume. The false negative rates of Appriou's and Shafer's models are higher: 7.07% of the tumor volume is missed with Shafer's model and 11.36% with Appriou's model. The false positive rate matters less in the evaluation of a tumor segmentation; it ranges from 7.23% (Denœux) to 13.41% (Shafer). With the highest false positive rate, Shafer's model provides the most inclusive, and in that sense the most cautious, segmentation. Finally, these rates are obtained without any rejection class; using such a class could avoid some misclassifications. This case study suggests that evidence theory is well suited to tumor segmentation problems. The segmentation results are very encouraging and close to the expert segmentation. Moreover, the use of spatial information increases the visual accuracy of the segmentation (see the results obtained with Denœux's model). The influence of the spatial information is detailed further in the next section.
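The rates of Table 1 follow directly from voxel counts. The Python sketch below (function names are hypothetical) computes them either from sets of voxel coordinates or from the counts alone. Note that, as the table values show, the false positive rate is expressed relative to the ground-truth volume (72 405 voxels), not to the segmented volume.

```python
def detection_rates(segmented, truth):
    """Counts and rates (%) of true positives, false positives and false
    negatives. Both arguments are sets of voxel coordinates; every rate
    is relative to the ground-truth volume, as in Table 1."""
    n = len(truth)
    counts = {
        "TP": len(segmented & truth),   # tumor voxels that were detected
        "FP": len(segmented - truth),   # detected voxels outside the expert ROI
        "FN": len(truth - segmented),   # tumor voxels that were missed
    }
    return {key: (c, 100.0 * c / n) for key, c in counts.items()}


def rates_from_counts(tp, fp, fn):
    """Same rates from counts only; the ground truth has TP + FN voxels."""
    n = tp + fn
    return tuple(round(100.0 * c / n, 2) for c in (tp, fp, fn))


# Denœux row of Table 1: 69 857 + 2548 = 72 405 ground-truth voxels.
print(rates_from_counts(69857, 5233, 2548))  # (96.48, 7.23, 3.52)
```

Applied to the Appriou and Shafer rows, `rates_from_counts` returns (88.64, 8.98, 11.36) and (92.93, 13.41, 7.07), matching the table.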

Fig. 4. Segmentation results with the introduction of spatial information (the pignistic decision rule is used). (a) Denœux’s model. (b) Appriou’s model. (c) Shafer’s model.


5.3. Contribution of spatial information

This section analyzes the results obtained when spatial information is included in the segmentation process. We address three questions. Which modifications does it bring to the final segmentation (Section 5.3.1)? How does it influence the behavior of the different decision rules (Section 5.3.2)? How does the conflicting information behave (Section 5.3.3)?

5.3.1. Segmentation results

This section compares the segmentations obtained with and without neighborhood information. We show in particular how this additional information improves the segmentation and how it can resolve some ambiguous situations. Fig. 5(a) shows the segmentation obtained with spatial information using Shafer's model. The mass of each neighbor is attenuated by a factor that takes into account its distance to the voxel being classified and the dimensions of the voxels (Eq. (37)). Although the segmentation obtained with Shafer's model without spatial information is already acceptable (Fig. 2(c)), some changes can be noticed. Comparing Figs. 5(b) and (c), which are zooms of the segmentations obtained without and with spatial information respectively, we observe that many isolated points of the initial segmentation disappear when spatial information is introduced. Moreover, the regions are smoother, as can be seen on the boundaries of the œdema, which are more regular. Figs. 6(a) and (b) show the segmentations obtained using Shafer's model with a pignistic decision rule, without and with spatial information respectively.
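Eq. (37) is not reproduced in this section, so the exact attenuation law is not shown here. The Python sketch below only illustrates the geometric ingredient it relies on: with the anisotropic 0.93 × 0.93 × 1.2 mm voxels of Section 5.1, the 26 neighbors lie at different physical distances. The weighting α_k = α_0 · d_min / d_k used here is a hypothetical choice made for illustration, not the paper's formula, and the function names are hypothetical as well.

```python
from itertools import product
from math import sqrt

VOXEL_MM = (0.93, 0.93, 1.2)   # voxel size (x, y, z) in mm, Section 5.1


def neighbor_distances(voxel=VOXEL_MM):
    """Physical distance (mm) from a voxel to each of its 26 neighbors,
    keyed by the integer offset (dx, dy, dz)."""
    return {
        d: sqrt(sum((di * si) ** 2 for di, si in zip(d, voxel)))
        for d in product((-1, 0, 1), repeat=3)
        if d != (0, 0, 0)
    }


def hypothetical_weights(alpha0=0.95, voxel=VOXEL_MM):
    """HYPOTHETICAL attenuation alpha_k = alpha0 * d_min / d_k (not Eq. (37)):
    the closest neighbors keep the full weight alpha0, farther ones less."""
    dist = neighbor_distances(voxel)
    d_min = min(dist.values())
    return {d: alpha0 * d_min / dk for d, dk in dist.items()}


w = hypothetical_weights()
print(len(w))                  # 26
print(round(w[(1, 0, 0)], 2))  # 0.95: in-plane face neighbor, 0.93 mm away
print(round(w[(0, 0, 1)], 2))  # 0.74: through-slice neighbor, 1.2 mm away
```

With this hypothetical law, the in-plane face neighbors keep the full weight 0.95, the through-slice neighbors receive about 0.74, and the corner neighbors (about 1.78 mm away) about 0.50.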

Fig. 5. Effects of the introduction of spatial information. (a) Segmentation result with Shafer's model, spatial information and pignistic decision, on the same slice as Fig. 2(b). (b) and (c), on the right: zooms without and with spatial information, respectively.

Fig. 6. Segmentation results with Shafer’s model. The rejection rate is equal to 0.7. In white, the rejected voxels. (a) Without spatial information. (b) With spatial information.

The white points represent the voxels rejected by the classification. The rejection rate equals 0.7. Without spatial information (Fig. 6(a)), the rejected points are located both on the region boundaries and inside the regions. The first kind of rejection is an ambiguity rejection: the point is close to several regions. The second kind is a distance rejection: the point is too far from every region, so it may be a noisy point or a point that does not belong to any class of the learning set. With spatial information (Fig. 6(b)), the rejected points are fewer and mainly located on the boundaries. The spatial combinations seem to eliminate most of the noisy points and to reduce the ambiguity on the boundaries. The segmentation is thus more accurate with spatial information: the boundaries are smoothed while the finest details are kept. These results are obtained thanks to the additional information and to the discounting factors that tune the contribution of each voxel of the neighborhood. The neighbors confirm or invalidate the previous knowledge, so that some ambiguous or noisy points are reclassified; this is a spatially guided belief revision process. The discounting factors make it possible to smooth the regions finely while keeping the details; without them, the segmentation would be coarser and of poorer quality.

5.3.2. Modification of the decision plots

As explained in Section 2.4, different decision functions can be used with evidence theory. Here, we examine how spatial neighborhood information modifies the decision plots. For that purpose, we study the variations of the rejection rate when the rejection cost $C_r$ varies from 0 to 1. Fig. 7 shows the results obtained with and without spatial information for Appriou's model. In Fig. 7(a), we recover the ordering of the credibility, pignistic and plausibility decision rules: $\mathrm{Bel}(X) \le \mathrm{BetP}(X) \le \mathrm{Pl}(X)$ for all $X \subseteq \Omega$. With spatial information (Fig. 7(b)), the three plots tend to converge, so that choosing a pessimistic or an optimistic decision becomes almost equivalent. Moreover, the rejection rate is globally lower with spatial information. The same behavior is observed for Denœux's and Shafer's models.

Fig. 7. Evolution of the number of rejected points according to the rejection threshold for Appriou's model. (a) Without spatial information. (b) With spatial information.

5.3.3. Conflicting information

This section deals with the origin and the interpretation of the conflicting information. Mechanically, conflict results from the combination of masses of belief and reflects how much the sources disagree. If the conflict is maximal ($k = 1$), the sources are in complete opposition; when $k = 0$, they completely agree (Section 2.3). The interpretation and use of the conflict are, however, not obvious. Three main causes of conflict are usually put forward [26]. The first concerns the quality of the sources: abnormal measurements produce conflicting mass during the combination. The second is linked to the modeling of the masses of belief: an inappropriate or imprecise model induces variations in the belief functions and thus conflict. The last concerns the number of sources: the amount of conflict grows with the number of sources combined. In our application, two main processes induce conflict. On the one hand, conflict can be attributed to the imprecision of the models. Even if the learning step is efficient, some vagueness remains and corrupts the bbas, so that their combination produces conflict. This conflict is inherent to the training: the less efficient the training, the higher the conflict. It corrupts the belief functions and decreases the classification efficiency.
Thus, it is undesirable. On the other hand, some conflict appears when the neighboring bbas are combined (Section 4.4). This conflict, by contrast, is expected: if the different regions of an image are homogeneous, combining neighbors near region boundaries produces conflicting information that reflects those boundaries. Fig. 8 shows the conflict generated by the spatial combination for the three models. In each image, high conflict is shown as a high grey level and the absence of conflict as black. The conflict is concentrated on the boundaries of the regions. Inside a region, i.e., in homogeneous parts of the volume, the information brought by the different neighbors agrees with and consolidates the previous knowledge, so the generated conflict is low. Near the boundaries, on the contrary, the beliefs associated with the neighbors are opposed: some neighbors support one hypothesis while the others support another, which creates high conflict. Appriou's model, however, must be distinguished from the two others. It is a multi-sensor model: each echo is a sensor with its own interpretation of the scene. In particular, the tumor is not represented in the same way in the two echoes used, which naturally induces conflict. Thus, in Fig. 8 (middle), the conflict is high not only on the boundaries but also within the tumor, which complicates the interpretation of this information. Interpreting and using the conflicting information is difficult because the part of the conflict due to the imprecision of the training cannot be separated from the part due to local boundary information. We may, however, assume that the latter is dominant. The local maxima of the conflict then indicate the presence of a boundary, where the decision about class membership is not obvious.

Fig. 8. Conflicting information generated by the combination of 26 neighbors (from left to right: Denœux, Appriou and Shafer models). A high conflict is expressed by a high grey level and the absence of conflict corresponds to black points.

We therefore propose to use the conflicting information as an image of the confidence in the decisions of the segmentation process. Let $c(s)$ be the intensity of the conflict at voxel $s$. We define the confidence $c_{\mathrm{conf}}(s)$ on the segmentation at $s$ by

$$c_{\mathrm{conf}}(s) = \begin{cases} 0 & \text{if } c(s) < k_c, \\ c(s) & \text{if } c(s) \ge k_c, \end{cases} \qquad (38)$$

where $k_c$ is a threshold with $0 \le k_c \le 1$. Experiments show that the threshold $k_c = \mu_c$, where $\mu_c$ is the mean conflict over the whole volume, removes most of the conflict located inside the regions while keeping the boundary information (Fig. 9(a)). Appriou's model is a particular case: some conflicting points located in the tumor and the œdema remain in the confidence image. This behavior is due to the multi-sensor nature of the model, which highlights the complementarity of the echoes used. The conflict boundaries can nevertheless be extracted with a higher threshold. Fig. 9(b) shows the confidence image with $k_c = \mu_c + 0.25\,\sigma_c$, where $\sigma_c$ is the standard deviation of the conflict; the boundaries between the œdema and the tumor then appear. Overall, the conflict information matches the boundaries between the main anatomical structures (WM, GM, CF). It differs, however, for the tumor and the œdema: Denœux's model mainly highlights the tumor boundaries, Shafer's model mainly highlights the œdema boundaries, and Appriou's model highlights both. The three models thus seem to provide complementary information. Figs. 10(a) and (b) show the confidence images superposed on the original T1 Gado slice. The conflict is indeed located at the interfaces between the anatomical structures and is a reliable indicator of the boundary locations. In a decision-support context, observing such an image can help the user (e.g., the physician) adjust his belief and his decision.

Fig. 9. Confidence images obtained using Eq. (38). (a) $k_c = \mu_c$; (b) $k_c = \mu_c + 0.25\,\sigma_c$.

Fig. 10. Confidence images superposed on the original T1 Gado slice. (a) $k_c = \mu_c$; (b) $k_c = \mu_c + 0.25\,\sigma_c$.
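The link between conflict and boundaries, and the thresholding of Eq. (38), can be illustrated on a one-dimensional toy profile. This Python sketch simplifies the scheme of Section 4.4: each position is combined with its two immediate neighbors instead of 26, without discounting, using the unnormalized conjunctive rule, and c(s) is the mass assigned to the empty set, i.e. the conflict that Dempster's rule would normalize away. The bba values and function names are made up for illustration, and the threshold is k_c = μ_c.

```python
from itertools import product
from statistics import mean

EMPTY = frozenset()


def conjunctive(m1, m2):
    """Unnormalized conjunctive rule: products of disjoint focal sets
    accumulate on the empty set instead of being renormalized."""
    out = {}
    for (A, a), (B, b) in product(m1.items(), m2.items()):
        C = A & B
        out[C] = out.get(C, 0.0) + a * b
    return out


def conflict_profile(bbas):
    """c(s): mass on the empty set after combining the bba at position s
    with the bbas of its immediate neighbors s - 1 and s + 1."""
    c = []
    for s, m in enumerate(bbas):
        for t in (s - 1, s + 1):
            if 0 <= t < len(bbas):
                m = conjunctive(m, bbas[t])
        c.append(m.get(EMPTY, 0.0))
    return c


def confidence_image(c, k_c):
    """Eq. (38): keep c(s) where it reaches the threshold k_c, 0 elsewhere."""
    return [v if v >= k_c else 0.0 for v in c]


OMEGA = frozenset({"A", "B"})
A, B = frozenset({"A"}), frozenset({"B"})
tissue_a = {A: 0.8, OMEGA: 0.2}
tissue_b = {B: 0.8, OMEGA: 0.2}
noisy_a = {A: 0.6, B: 0.1, OMEGA: 0.3}   # slightly inconsistent voxel

# Toy profile: tissue A at positions 0-3, tissue B at positions 4-7.
profile = [tissue_a, noisy_a, tissue_a, tissue_a] + [tissue_b] * 4

c = conflict_profile(profile)
conf = confidence_image(c, k_c=mean(c))   # threshold k_c = mean conflict
print([round(v, 2) for v in c])     # [0.08, 0.1, 0.1, 0.77, 0.77, 0.0, 0.0, 0.0]
print([round(v, 2) for v in conf])  # [0.0, 0.0, 0.0, 0.77, 0.77, 0.0, 0.0, 0.0]
```

The slightly inconsistent voxel at position 1 creates a weak conflict (between 0.08 and 0.1 at positions 0 to 2), whereas the A/B boundary at positions 3 and 4 yields a conflict of 0.768; thresholding at the mean conflict keeps only the boundary, which mirrors the behavior described for Fig. 9(a).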

6. Conclusion

In this paper, we have presented an original scheme for the segmentation of multi-echo MR images of the brain for the detection of brain tumors. This scheme models the knowledge by means of evidence theory and integrates spatial neighborhood information. Modeling with evidence theory makes it possible to account for the characteristics of multi-echo MR images: complementarity, redundancy and incompleteness. Moreover, evidence theory provides the theoretical support and the tools for combining such information. The introduction of spatial information (i.e., of spatial dependency) turns the classification into a true image segmentation process. Our work focuses on the analysis of different evidential models and on the influence of the spatial information. The study of the different evidential models shows their effectiveness in segmenting MR images. Thanks to the neighborhood information, the evidence about the class membership of each voxel grows and the final decision is more accurate. The effects of neighborhood information were studied according to different parameters. In particular, we have shown that it provides a more accurate segmentation: the fine regions are better detected and the region boundaries are smoothed while the small details are preserved. The process takes local information into account and leads to a true region-based segmentation. By eliminating the ambiguities produced by noisy data and boundary points, spatial information also modifies the decision plots: taking a pessimistic decision becomes nearly equivalent to taking an optimistic one. The remaining ambiguities are mainly located on the region boundaries. Finally, the comparison of the detected tumor volumes with a segmentation made manually by a physician highlights the ability of our evidential scheme to segment pathological MR images.
In particular, among the three models studied, the one proposed by Denœux, which detects more than 96% of the tumor volume, is the most accurate. In the second part of our study, we examined the conflicting information created by the combination of spatial information. The location of the conflicting points confirms the previous observations: they are concentrated on the region boundaries. We have shown that the conflicting information, which evidence theory allows us to quantify, is not only a consequence of data fusion but also a source of information about the spatial organization of the data: the conflict is meaningful information that we interpret in the segmentation context. In our case, it indicates the position of the region boundaries and thus provides the specialist with a new source of information to temper his decision. This study showed that evidence theory using neighborhood information can enhance the segmentation of multi-echo images. We are currently continuing our investigations on the analysis of the conflicting information and on its more complete integration in the segmentation process.

Acknowledgements

The authors thank J.C. Ferrie of the Regional University Hospital of Poitiers for providing the MR images used in this study and for validating the segmentation results.

References
[1] A. Appriou, Probabilités et incertitudes en fusion de données multi-senseurs, Revue Scientifique et Technique de la Défense 11 (1991) 27–40.
[2] I. Bloch, Incertitude, imprécision et additivité en fusion de données: point de vue historique, Traitement du Signal 13 (4) (1996) 267–288.
[3] D. Dubois, H. Prade, Representation and combination of uncertainty with belief functions and possibility measures, Computational Intelligence 4 (1998) 244–264.
[4] A. Appriou, Multisensor signal processing in the framework of the theory of evidence, NATO/RTO – Lecture Series 216 on Application of Mathematical Signal Processing Techniques to Mission Systems, November 1999, pp. 5–31.
[5] A. Dempster, Upper and lower probabilities induced by multivalued mapping, Annals of Mathematical Statistics 38 (1967) 325–339.
[6] R.H. Lee, R. Leahy, Multi-spectral classification of MR images using sensor fusion approaches, in: SPIE Medical Imaging IV: Image Processing 1233, 1990, pp. 149–157.
[7] S.Y. Chen, W.C. Lin, C.T. Chen, Evidential reasoning based on Dempster–Shafer theory and its application to medical image analysis, in: SPIE 2032, 1993, pp. 35–46.
[8] D.Y. Suh, R.M. Mersereau, R.L. Eisner, R.I. Pettigrew, Automatic boundary detection on cardiac magnetic resonance image sequences for four dimensional visualisation of the left ventricle, in: First Conference on Visualization in Biomedical Computing, Atlanta, 1990, pp. 149–156.
[9] I. Bloch, Some aspects of Dempster–Shafer evidence theory for classification of multi-modality medical images taking partial volume into account, Pattern Recognition Letters 17 (1996) 905–919.
[10] G. Shafer, A Mathematical Theory of Evidence, Princeton University Press, Princeton, NJ, 1976.
[11] P. Smets, What is Dempster–Shafer's model? in: R.R. Yager, M. Fedrizzi, J. Kacprzyk (Eds.), Advances in the Dempster–Shafer Theory of Evidence, New York, pp. 5–34.
[12] P. Smets, R. Kennes, The transferable belief model, Artificial Intelligence 66 (2) (1994) 191–234.
[13] P. Smets, Practical uses of belief functions, in: Proceedings of the Fifteenth Conference on Uncertainty in Artificial Intelligence, San Francisco, 1999, pp. 612–621.
[14] P. Smets, The combination of evidence in the transferable belief model, IEEE Transactions on Pattern Analysis and Machine Intelligence 12 (5) (1990) 447–458.
[15] R. Kennes, Computational aspect of the Möbius transformation of graphs, IEEE Transactions on Systems, Man and Cybernetics 22 (1992) 201–223.
[16] L.M. Zouhal, T. Denœux, An adaptative k-NN rule based on Dempster–Shafer theory, in: 6th International Conference on Computer Analysis of Images and Patterns, September 1995, pp. 310–317.
[17] L.M. Zouhal, T. Denœux, An evidence-theoretic K-NN rule with parameter optimization, IEEE Transactions on Systems, Man and Cybernetics 28 (1998) 263–271.
[18] T. Denœux, A neural network classifier based on Dempster–Shafer theory, IEEE Transactions on Systems, Man and Cybernetics 30 (2) (2000) 131–150.
[19] P. Walley, Belief function representations of statistical evidence, The Annals of Statistics 15 (4) (1987) 1439–1465.
[20] P. Smets, R. Kruse, The transferable belief model for belief representation, in: A. Motro, P. Smets (Eds.), Uncertainty Management in Information Systems: From Needs to Solutions, Kluwer, Boston, 1997.
[21] D. Dubois, H. Prade, On the unicity of Dempster rule of combination, International Journal of Intelligent Systems (1996) 133–142.
[22] F. Voorbraak, On the justification of Dempster's rule of combinations, Artificial Intelligence (1991) 171–197.
[23] P. Smets, Resolving misunderstandings about belief functions, International Journal of Approximate Reasoning 6 (1992) 321–344.
[24] L.A. Zadeh, On the validity of Dempster's rule of combination of evidence, University of California, Berkeley, 1979, ERL Memo M79/24.
[25] R.R. Yager, On the Dempster–Shafer framework and new combination rules, Information Sciences 41 (1987) 93–137.
[26] E. Lefèvre, O. Colot, P. Vannoorenberghe, Belief function combination and conflict management, Information Fusion (2002) 149–162.
[27] P. Smets, in: M. Henrion, R.D. Schachter, L.N. Kanal, J.F. Lemmers (Eds.), Constructing the Pignistic Probability Function in a Context of Uncertainty, Amsterdam, North-Holland, 1990.
[28] T. Denœux, Analysis of evidence-theoretic decision rules for pattern classification, Pattern Recognition 30 (7) (1997) 1095–1107.
[29] C.K. Chow, On optimum recognition error and reject tradeoff, IEEE Transactions on Information Theory 16 (1970) 41–46.
[30] B. Dubuisson, M. Masson, A statistical decision rule with incomplete knowledge about classes, Pattern Recognition 26 (1) (1993) 155–165.
[31] T. Denœux, An evidence-theoretic neural network classifier, IEEE International Conference on Systems, Man and Cybernetics 3 (1995) 712–717.
[32] E. Lefèvre, O. Colot, P. Vannoorenberghe, D. de Brucq, Contribution des mesures d'information à la modélisation crédibiliste de connaissances, Traitement du Signal 17 (2) (2000) 1–11.
[33] S. Mathevet, L. Trassoudaine, P. Checchin, J. Alizon, Application de la théorie de l'évidence à la combinaison de segmentations en région, in: GRETSI, Vannes, France, 1999, pp. 635–639.
[34] P. Vannoorenberghe, T. Denœux, Likelihood-based vs distance-based evidential classifiers, in: FUZZ-IEEE'2001, Melbourne, Australia, December 2001.
[35] P. Smets, Belief functions: The disjunctive rule of combination and the Generalized Bayesian Theorem, International Journal of Approximate Reasoning 9 (1993).
[36] L.P. Clarke, R.P. Velthuizen, S. Phuphanich, J.D. Schellenberg, J.A. Arrington, M. Silberger, MRI: stability of three supervised segmentation techniques, Magnetic Resonance Imaging 11 (1993) 95–106.
[37] G. Gerig, J. Martin, R. Kikinis, O. Kübler, M. Shenton, F.A. Jolesz, Unsupervised tissue type segmentation of 3D dual-echo MR head data, Image and Vision Computing 10 (1992) 349–360.
[38] J.R. Mitchell, S.J. Kalik, D.H. Lee, A. Fenster, Computer-assisted identification and quantification of multiple sclerosis lesions in MR imaging volumes in the brain, Journal of Magnetic Resonance Imaging 4 (1994) 197–208.
[39] A. Dempster, N. Laird, D. Rubin, Maximum likelihood from incomplete data via the EM algorithm, Journal of the Royal Statistical Society 39 (1977) 1–38.
[40] A.-S. Capelle, O. Alata, C. Fernandez, S. Lefèvre, J.-C. Ferrie, Unsupervised segmentation for automatic detection of brain tumors in MRI, in: IEEE International Conference on Image Processing, ICIP2000, vol. 1, Vancouver, Canada, 10–13 September 2000, pp. 613–616.