Classifier fusion with interval-valued weights


Pattern Recognition Letters 34 (2013) 1623–1629

Contents lists available at SciVerse ScienceDirect

Pattern Recognition Letters journal homepage: www.elsevier.com/locate/patrec

Robert Burduk, Department of Systems and Computer Networks, Wroclaw University of Technology, Wybrzeze Wyspianskiego 27, 50-370 Wroclaw, Poland. doi: 10.1016/j.patrec.2013.05.022

Article info: Available online 4 June 2013. Communicated by Manuel Graña.

Keywords: Classifier fusion; Interval information; Ensemble classifier; Multiple classifier system

Abstract: The article presents a new approach to calculating the weights of the base classifiers in a committee of classifiers. The obtained weights are interpreted in the context of interval-valued sets. The work proposes four different ways of calculating weights which consider both the correctness and the incorrectness of the classification. The proposed weights have been used in algorithms which combine the outputs of the base classifiers; both outputs represented at the rank level and at the measurement level are used. The research experiments involved several data sets available in the UCI repository and two data sets with generated distributions. The performed experiments compare algorithms which calculate the weights according to resubstitution with the algorithms proposed in the work. The ensemble of classifiers has also been compared with the base classifiers entering the committee. © 2013 Elsevier B.V. All rights reserved.

1. Introduction

Creating a group of classifiers is one of the ways to improve the classification accuracy in the recognition process (Kuncheva, 2004; Kittler et al., 1998). In particular, the aim is to allow the ensemble of classifiers to obtain greater values of the classification correctness than in the case of a single classifier from the pool. Research regarding this recognition problem has been in focus for more than fifteen years now. Due to the large number of different methods for combining classifiers and creating an ensemble of classifiers it is difficult to identify the best method for a particular recognition task (Alkoot and Kittler, 1999; Chen and Cheng, 2001; Kittler and Alkoot, 2003; Zhang and Duin, 2011). Problems in these areas are still evolving and new concepts associated with them keep appearing (Cyganek, 2012; Woloszynski et al., 2011). While analysing the outcomes of the base classifiers we can distinguish three primary situations (Xu et al., 1992). In the first one, the label description (crisp label) of the class of the recognized object is available. In the second, the labels obtained from a base classifier are ranked in a queue. In the third, each base classifier returns the a posteriori probability of membership of the recognized object in each of the possible class labels. In recent years, many studies have presented different issues related to this recognition task. One of them is how to define the weights assigned to the given component classifiers. The selection of an appropriate system of weights has been widely discussed in the literature regarding the construction of complex classifiers (Woods et al., 1997; Wozniak et al., 2009). This problem can also be formulated as a

separate element of the process of learning in the classifiers' committee (Kuncheva et al., 2000). Recently, many papers have described the use of interval information in pattern recognition (Bhadra et al., 2009; Kulczycki et al., 2011; Silva et al., 2006; Viertl, 1996). In particular, they refer to the inaccurate data description expressed by interval information. Interval information is based on interval analysis, a field of mathematics (Alefeld and Herzberger, 1983). Its advantage is that it models the uncertainty of a given value in the simplest way possible: the investigated value is regarded as a value lying within a range. In this article, four ways of calculating the weights of the classifiers entering the classifiers' committee will be suggested. These weights will be interpreted as the lower and upper values of the base classifiers' weights. The defined boundaries refer to the correctness or incorrectness of these classifiers. The text is organized as follows: in Section 2 the definitions associated with classifier fusion are presented; in particular, two approaches are introduced that differ in the type of data received from the base classifiers' outcomes. In Section 3 the new methods of assigning weights to the individual base classifiers are presented. Section 4 includes the description of the research experiments comparing the suggested algorithms with others that are based on the same data received at the outcome of the base classifiers. Finally, conclusions from the experiments are presented.

2. Classifier fusion

Let us assume that we possess K different classifiers

$W_1, W_2, \ldots, W_K$. Such a set of classifiers, which is constructed on the basis of the same learning sample, is called an ensemble classifier or a combining classifier. However, each of the $W_i$ classifiers is


described as a component or base classifier. As a rule, K is assumed to be an odd number and each of the $W_i$ classifiers makes an independent decision. As a result of all the classifiers' actions, their K responses are obtained. Having a set of base classifiers at one's disposal, one should determine the procedure of making the ultimate decision regarding the allocation of the object to the given class. It implies that the output information from all K component classifiers is applied to make the ultimate decision.

2.1. Fusion techniques for class labels

One of the possible types of information obtained from the component classifiers is a class label that is assigned to the given observation. Having at one's disposal a set of K labels, different methods of combining the classifiers' outputs are applied in order to obtain the final decision. The way of reaching the decision based on counting the votes is defined as the voting method. One of the most common methods of combining classifiers is majority voting. The method implies that each of the component classifiers of the committee casts an equal vote and the object is assigned to the class which gets the largest number of votes from the base classifiers. The advantage of the method is its simplicity and the lack of any calculation apart from counting the votes of the individual classifiers. One of the drawbacks of this approach is the draw situation, in which the same number of classifiers points to more than one class. In binary classification tasks this problem can be avoided by using an odd number of base classifiers. The algorithm of making the ultimate decision by the set of classifiers in this approach is the following:

$$W_{MV}(x) = \arg\max_{1 \le i \le M} \sum_{k=1}^{K} I(W_k(x) = i), \qquad (1)$$

where $i$ denotes a class label and $I(\cdot)$ is the indicator function. Another method of combining classifiers is weighted voting. In this approach each of the classifiers has an allocated weight, which is taken into account when reaching the final decision of the group. Weights depend largely on the quality of the base classifiers. In the case when each classifier has one weight for all the possible classes, or for the complete feature space, the group classification formula is the following:

$$W_{wMV}(x) = \arg\max_{1 \le i \le M} \sum_{k=1}^{K} w_k \, I(W_k(x) = i), \qquad (2)$$

where $w_k = 1 - Pe_{W_k}$, and $Pe_{W_k}$ is the empirical error of the classifier $W_k$ estimated on the testing set. In the case when the error is estimated on the learning set, we can talk about the estimation error based on the resubstitution method. Then the weight $w_k$ of each component classifier is calculated as:

$$w_k = \frac{\sum_{n=1}^{N} I(W_k(x_n) = j_n)}{N}. \qquad (3)$$

The value N refers to the number of observations in the learning set, which is used for estimating the classifiers' weights, and $j_n$ is the class number of the object with index n. Another way of calculating weights is to give each classifier as many weights as there are pre-defined classes in the recognition task. In this case, the classification rule is the following:

$$W_{wcMV}(x) = \arg\max_{1 \le i \le M} \sum_{k=1}^{K} w_{ki} \, I(W_k(x) = i), \qquad (4)$$

with

$$w_{ki} = \frac{\sum_{n=1}^{N} I(W_k(x_n) = i,\ j_n = i)}{\sum_{n=1}^{N} I(j_n = i)}.$$

The obtained weights are normalised for each of the classes according to the formula:

$$\sum_{k=1}^{K} w_{ki} = 1, \qquad (5)$$

which means that the sum of the weights of all the base classifiers for the given class $i$ is equal to unity.
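As an illustration, the resubstitution weights (3) and the weighted voting rule (2) can be sketched in a few lines of Python. This is a minimal sketch, not the author's code; the function names are invented for illustration and class labels are assumed to be integers.

```python
import numpy as np

def resubstitution_weights(train_preds, labels):
    # Eq. (3): w_k = (1/N) * sum_n I(W_k(x_n) = j_n), one weight per classifier.
    # train_preds: (K, N) labels predicted by each classifier on the learning set.
    return (np.asarray(train_preds) == np.asarray(labels)).mean(axis=1)

def weighted_majority_vote(votes, weights):
    # Eq. (2): pick the class with the largest weighted vote mass.
    votes, weights = np.asarray(votes), np.asarray(weights)
    classes = np.unique(votes)
    scores = [weights[votes == c].sum() for c in classes]
    return classes[int(np.argmax(scores))]

# Three classifiers evaluated on a four-element learning set:
w = resubstitution_weights([[0, 1, 1, 0], [0, 1, 0, 0], [1, 0, 1, 0]],
                           [0, 1, 1, 0])
# w = [1.0, 0.75, 0.5]; for the votes (1, 0, 1) the weighted decision is class 1.
```

With all weights equal to one the rule reduces to plain majority voting, eq. (1).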

2.2. Fusion techniques for a posteriori probability

Suppose that we have at our disposal, for all base classifiers, the evaluation of the probability that the tested object belongs to each of the classes. In particular, this evaluation is the a posteriori probability, which can be obtained by parametric as well as nonparametric estimation. Let us denote the a posteriori probability estimate by $\hat{p}_k(i|x)$, $k = 1, 2, \ldots, K$, $i = 1, 2, \ldots, M$. In the literature several methods of combining the scores of groups of classifiers have been proposed for this type of problem. We can distinguish the sum, prod and mean methods. In the sum method the score of the group of classifiers is based on the following sums:

$$s_i(x) = \sum_{k=1}^{K} \hat{p}_k(i|x), \quad i = 1, 2, \ldots, M, \qquad (6)$$

whereas following the prod method the products of the a posteriori probabilities are applied:

$$pr_i(x) = \prod_{k=1}^{K} \hat{p}_k(i|x), \quad i = 1, 2, \ldots, M. \qquad (7)$$

The final decision of the group of classifiers is made following the maximum rule and is presented accordingly, depending on the sum method (6) or the prod method (7):

$$W_{SUM}(x) = \arg\max_i s_i(x), \qquad (8)$$

$$W_{PROD}(x) = \arg\max_i pr_i(x). \qquad (9)$$
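The sum and prod rules (6)-(9) can be sketched as follows (a minimal sketch, not the author's code; `P[k, i]` is assumed to hold $\hat{p}_k(i|x)$ for one object $x$):

```python
import numpy as np

def fuse_sum(P):
    # Eqs. (6) and (8): add the a posteriori probabilities over classifiers,
    # then take the class with the largest total.
    return int(np.argmax(P.sum(axis=0)))

def fuse_prod(P):
    # Eqs. (7) and (9): multiply the a posteriori probabilities instead.
    return int(np.argmax(P.prod(axis=0)))

# Three classifiers, two classes. The prod rule is vetoed by the single
# near-zero estimate for class 0, while the sum rule still prefers class 0.
P = np.array([[0.8, 0.2],
              [0.8, 0.2],
              [0.0, 1.0]])
```

The example shows the well-known sensitivity of the prod rule to a single confident dissenter, which the sum rule averages out.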

In the presented methods (8) and (9) the discrimination functions obtained from the individual classifiers take an equal part in building the combined classifier. The weighted versions of these methods can also be created effortlessly. In this case, similarly as for the classifiers (2) and (4), one first needs to formulate the way of calculating the individual weights of the classifiers. It can be done according to the dependence (3), which means that each classifier is assigned a weight depending on its classification error.

3. Interval-valued weights in classifier fusion

The methods of calculating weights presented in the previous section take into account the quality of classification of the individual base classifiers. The calculation is done simultaneously for all the classifiers and is concluded by the normalisation process. The calculated weights therefore do not consider decisions made by other classifiers. We will now suggest a new way of calculating weights which, for each of the learning objects, takes into account the result of all the base classifiers entering the committee. Two main cases will be discussed. In the first one the correctness of the classification will be taken into consideration. The second case will refer to the incorrectness of the base classifiers. For each of the cases an upper and a lower value of the base classifiers' weight will be suggested. The obtained range between the upper and lower values defines the uncertainty in estimating the quality

1625

R. Burduk / Pattern Recognition Letters 34 (2013) 1623–1629

of the classification of the particular classifier in the context of the classification quality of the other classifiers entering the committee. We will now suggest the method of determining weights for the individual base classifiers using an interval-valued description. It means that the particular weights will not be provided precisely; instead, their lower and upper values will be used. Therefore, each weight $w_k$ of the $k$th component classifier will be represented by an upper value $\overline{w}_k$ and a lower value $\underline{w}_k$. Let us differentiate between two approaches to determining the individual weights. In the first one, the correct classification of the relevant component classifiers will be used. The second one will be based on applying the incorrect classification in order to obtain the weights.

3.1. Interval-valued weights for the accuracy of the base classifiers

The basis for the determination of the upper and lower values of the weights will be the correct classification of the component classifiers. Firstly, the approach will be presented with every component classifier having one weight allocated. Then, we will introduce the situation in which the number of weights of each classifier is equal to the number of pre-defined classes. Having at our disposal a group of K component classifiers $W_1, W_2, \ldots, W_K$, we ascribe on the learning set the probability of correct classification $Pc_{W_k}$ for each of them. The upper weight $\overline{w}_k$ of the $k$th classifier refers to the situation in which the $k$th classifier was correct while the other committee classifiers also gave the correct prediction. The lower weight $\underline{w}_k$ describes the situation in which the $k$th classifier made errors while the other committee classifiers did not make any errors. The upper value of the weight is obtained from the dependence:

$$\overline{w}_k = \frac{\sum_{n=1}^{N} UC_k^n}{\max_{1 \le l \le K} \sum_{n=1}^{N} UC_l^n}, \qquad (10)$$

where

$$UC_k^n = I(W_k(x_n) = j_n) \cdot \frac{\sum_{l=1,\, l \ne k}^{K} I(W_l(x_n) = j_n)}{K - 1}. \qquad (11)$$

However, the lower value of the weight is obtained from the dependence:

$$\underline{w}_k = 1 - \frac{\sum_{n=1}^{N} LC_k^n}{\max_{1 \le l \le K} \sum_{n=1}^{N} LC_l^n}, \qquad (12)$$

where

$$LC_k^n = I(W_k(x_n) \ne j_n) \cdot \frac{\sum_{l=1,\, l \ne k}^{K} I(W_l(x_n) = j_n)}{K - 1}. \qquad (13)$$

Let us now present a simple calculation example of the obtained lower and upper values of the weights for a task of binary classification with five component classifiers. In this task we assume that the class labels $j \in \{0, 1\}$ and a ten-element learning set are available. In Table 1 each row is associated with one observation from the training set. The number of the observation is marked as $n$, and $j_n$ is the true class label of the $n$th observation. The base classifiers are marked as $W_i(x_n)$, $i \in \{1, 2, \ldots, 5\}$. The corresponding table cell shows the prediction of the class label made by the base classifier and the factor $UC_k^n$ needed to calculate the upper value of the weight $\overline{w}_k$. The results obtained in the corresponding rows of Table 1 represent the fraction of classifiers which made a correct decision, given that the classifier in question did not make an error either. So, for the first row, the classifier $W_1$ has the value 0.25 assigned. It denotes that the classifier $W_1$ did not make an error and, furthermore, 25% of the remaining classifiers provided the correct decision. In the case of the samples $n \in \{7, 9\}$, each of the classifiers pointed to the correct decision; therefore the relevant values are equal to one. In the relevant rows of Table 2 we have values related to the fraction of classifiers that did not make errors, while the given classifier made an error. For example, for the sample $n = 1$ the classifier $W_2$ made an error, and 50% of the other classifiers pointed to the appropriate class. In the case of the sample $n = 10$, this classifier made an error and all the other ones gave the correct prediction. The classification rule of the classifiers' group that uses weights obtained according to the dependence (10) and that is based on majority voting will be denoted as $W_{IVucwMV}$. The algorithm of making the final decision is the following:

$$W_{IVucwMV}(x) = \arg\max_{1 \le i \le M} \sum_{k=1}^{K} \overline{w}_k \, I(W_k(x) = i). \qquad (14)$$

Before making the final decision according to Eq. (14), the upper values of the weights $\overline{w}_1, \ldots, \overline{w}_K$ are normalised to unity. Analogically, the rule for the group of classifiers that uses weights obtained according to the dependence (12) is marked as $W_{IVlcwMV}$ and has the following form:

$$W_{IVlcwMV}(x) = \arg\max_{1 \le i \le M} \sum_{k=1}^{K} \underline{w}_k \, I(W_k(x) = i). \qquad (15)$$

As previously, the relevant weights $\underline{w}_1, \ldots, \underline{w}_K$ are normalised to unity. Let us now present the method of calculating the upper and lower values of the relevant weights in the case when each of the classifiers possesses as many weights as there are predefined classes in the recognition task. The upper value of the weight $\overline{w}_{ki}$ is obtained from the following dependence:

$$\overline{w}_{ki} = \frac{\sum_{n=1}^{N} UC_{ki}^n}{\max_{1 \le l \le K} \sum_{n=1}^{N} UC_{li}^n}, \qquad (16)$$

where

$$UC_{ki}^n = I(W_k(x_n) = j_n,\ j_n = i) \cdot \frac{\sum_{l=1,\, l \ne k}^{K} I(W_l(x_n) = j_n,\ j_n = i)}{K - 1}. \qquad (17)$$

However, the lower value of the weight $\underline{w}_{ki}$ is obtained from the dependence:

$$\underline{w}_{ki} = 1 - \frac{\sum_{n=1}^{N} LC_{ki}^n}{\max_{1 \le l \le K} \sum_{n=1}^{N} LC_{li}^n}, \qquad (18)$$

where

$$LC_{ki}^n = I(W_k(x_n) \ne j_n,\ j_n = i) \cdot \frac{\sum_{l=1,\, l \ne k}^{K} I(W_l(x_n) = j_n,\ j_n = i)}{K - 1}. \qquad (19)$$
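The single-weight case (10)-(13) can be sketched with numpy and checked against the ten-element, five-classifier example of Tables 1 and 2 (a minimal sketch, not the author's code; `preds[k, n]` holds $W_k(x_n)$ and `labels[n]` holds $j_n$):

```python
import numpy as np

labels = np.array([1, 1, 0, 1, 1, 1, 0, 0, 0, 0])
preds = np.array([
    [1, 0, 0, 0, 1, 1, 0, 0, 0, 0],   # W1
    [0, 0, 0, 1, 1, 1, 0, 0, 0, 1],   # W2
    [1, 1, 0, 1, 0, 1, 0, 0, 0, 0],   # W3
    [0, 0, 1, 0, 1, 0, 0, 1, 0, 0],   # W4
    [0, 0, 0, 0, 1, 0, 0, 1, 0, 0],   # W5
])

def accuracy_interval_weights(preds, labels):
    K, _ = preds.shape
    correct = preds == labels                         # I(W_k(x_n) = j_n)
    # fraction of the *other* classifiers that are correct on x_n
    frac_others = (correct.sum(axis=0) - correct) / (K - 1)
    UC = correct * frac_others                        # eq. (11)
    LC = (~correct) * frac_others                     # eq. (13)
    w_up = UC.sum(axis=1) / UC.sum(axis=1).max()      # eq. (10)
    w_lo = 1 - LC.sum(axis=1) / LC.sum(axis=1).max()  # eq. (12)
    return w_up, w_lo

w_up, w_lo = accuracy_interval_weights(preds, labels)
# w_up ~ (1, 0.86, 0.91, 0.64, 0.77) and w_lo ~ (0.8, 0.53, 0.73, 0, 0.27),
# the bottom rows of Tables 1 and 2.
```

The per-class variants (16)-(19) follow the same pattern with the indicator additionally restricted to the learning objects of class $i$.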

The rule for the group of classifiers that uses weights obtained according to the relation (16) and that are based on the majority voting is marked as WIVucwcMV ðxÞ and has the form:

$$W_{IVucwcMV}(x) = \arg\max_{1 \le i \le M} \sum_{k=1}^{K} \overline{w}_{ki} \, I(W_k(x) = i). \qquad (20)$$

Analogically, the rule for the set of classifiers that uses weights obtained following the dependency (18) is marked as WIVlcwcMV and has the form:

$$W_{IVlcwcMV}(x) = \arg\max_{1 \le i \le M} \sum_{k=1}^{K} \underline{w}_{ki} \, I(W_k(x) = i). \qquad (21)$$

Table 3 presents data needed to calculate the appropriate weights wki .


Table 1. The outputs of the base classifiers, the values of the $UC_k^n$ coefficients and the upper values of the weights $\overline{w}_k$ – the case of the component classifiers' correctness.

n    j_n   W1  W2  W3  W4  W5    UC_1^n  UC_2^n  UC_3^n  UC_4^n  UC_5^n
1    1     1   0   1   0   0     0.25    0       0.25    0       0
2    1     0   0   1   0   0     0       0       0       0       0
3    0     0   0   0   1   0     0.75    0.75    0.75    0       0.75
4    1     0   1   1   0   0     0       0.25    0.25    0       0
5    1     1   1   0   1   1     0.75    0.75    0       0.75    0.75
6    1     1   1   1   0   0     0.5     0.5     0.5     0       0
7    0     0   0   0   0   0     1       1       1       1       1
8    0     0   0   0   1   1     0.5     0.5     0.5     0       0
9    0     0   0   0   0   0     1       1       1       1       1
10   0     0   1   0   0   0     0.75    0       0.75    0.75    0.75
Sum of UC_k^n                    5.5     4.75    5       3.5     4.25
$\overline{w}_k$                 1       0.86    0.91    0.64    0.77

Table 2. The outputs of the base classifiers, the values of the $LC_k^n$ coefficients and the lower values of the weights $\underline{w}_k$ – the case of the component classifiers' correctness.

n    j_n   W1  W2  W3  W4  W5    LC_1^n  LC_2^n  LC_3^n  LC_4^n  LC_5^n
1    1     1   0   1   0   0     0       0.5     0       0.5     0.5
2    1     0   0   1   0   0     0.25    0.25    0       0.25    0.25
3    0     0   0   0   1   0     0       0       0       1       0
4    1     0   1   1   0   0     0.5     0       0       0.5     0.5
5    1     1   1   0   1   1     0       0       1       0       0
6    1     1   1   1   0   0     0       0       0       0.75    0.75
7    0     0   0   0   0   0     0       0       0       0       0
8    0     0   0   0   1   1     0       0       0       0.75    0.75
9    0     0   0   0   0   0     0       0       0       0       0
10   0     0   1   0   0   0     0       1       0       0       0
Sum of LC_k^n                    0.75    1.75    1       3.75    2.75
$\underline{w}_k$                0.8     0.53    0.73    0       0.27

Table 3. The outputs of the base classifiers, the values of the $UC_{ki}^n$ coefficients and the upper values of the weights $\overline{w}_{ki}$ – the case of the component classifiers' correctness.

n    j_n   UC_{10}^n  UC_{20}^n  UC_{30}^n  UC_{40}^n  UC_{50}^n   UC_{11}^n  UC_{21}^n  UC_{31}^n  UC_{41}^n  UC_{51}^n
1    1     –          –          –          –          –           0.25       0          0.25       0          0
2    1     –          –          –          –          –           0          0          0          0          0
3    0     0.75       0.75       0.75       0          0.75        –          –          –          –          –
4    1     –          –          –          –          –           0          0.25       0.25       0          0
5    1     –          –          –          –          –           0.75       0.75       0          0.75       0.75
6    1     –          –          –          –          –           0.5        0.5        0.5        0          0
7    0     1          1          1          1          1           –          –          –          –          –
8    0     0.5        0.5        0.5        0          0           –          –          –          –          –
9    0     1          1          1          1          1           –          –          –          –          –
10   0     0.75       0          0.75       0.75       0.75        –          –          –          –          –
Sum of UC_{ki}^n
           4          3.25       4          2.75       3.5         1.5        1.5        1          0.75       0.75
$\overline{w}_{ki}$
           1          0.8        1          0.7        0.9         1          1          0.7        0.5        0.5

Before the application of the decision rules (20) and (21), the upper and lower values of the weights (16) and (18) are normalised to unity for each class. It means that the sum of the upper and lower values of the weights of all the base classifiers for the given class is equal to unity.

3.2. Interval-valued weights for the incorrectness of the base classifiers

Now, the way of calculating the upper and lower values of the weights will be presented in which the classification error of the individual component classifiers is considered. In this case the upper value of the weight $\overline{w}_k$ of the $k$th classifier refers to the situation when the $k$th classifier was correct while the other committee classifiers pointed to the incorrect decision. However, the lower value of the weight $\underline{w}_k$ describes the situation in which the $k$th classifier made errors and the other classifiers made errors as well. The upper value of the weight is obtained from the dependence:

$$\overline{w}_k = \frac{\sum_{n=1}^{N} UE_k^n}{\max_{1 \le l \le K} \sum_{n=1}^{N} UE_l^n}, \qquad (22)$$

where

$$UE_k^n = I(W_k(x_n) = j_n) \cdot \frac{\sum_{l=1,\, l \ne k}^{K} I(W_l(x_n) \ne j_n)}{K - 1}. \qquad (23)$$

Then, the lower value of the weight is obtained from the dependence:

$$\underline{w}_k = 1 - \frac{\sum_{n=1}^{N} LE_k^n}{\max_{1 \le l \le K} \sum_{n=1}^{N} LE_l^n}, \qquad (24)$$

where

$$LE_k^n = I(W_k(x_n) \ne j_n) \cdot \frac{\sum_{l=1,\, l \ne k}^{K} I(W_l(x_n) \ne j_n)}{K - 1}. \qquad (25)$$
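The error-based weights (22)-(25) differ from (10)-(13) only in what is counted among the other committee members: a classifier is now rewarded for being correct where the others fail, and penalised for failing together with them. A minimal sketch (not the author's code):

```python
import numpy as np

def error_interval_weights(preds, labels):
    # preds[k, n] is W_k(x_n), labels[n] is j_n.
    K, _ = preds.shape
    correct = preds == labels
    wrong = ~correct
    # fraction of the *other* classifiers that are wrong on x_n
    frac_others_wrong = (wrong.sum(axis=0) - wrong) / (K - 1)
    UE = correct * frac_others_wrong                  # eq. (23)
    LE = wrong * frac_others_wrong                    # eq. (25)
    w_up = UE.sum(axis=1) / UE.sum(axis=1).max()      # eq. (22)
    w_lo = 1 - LE.sum(axis=1) / LE.sum(axis=1).max()  # eq. (24)
    return w_up, w_lo
```

On the toy example of Table 1 this scheme singles out $W_3$, the only classifier that is correct on the hard sample $n = 2$ where all the others fail.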

1627

R. Burduk / Pattern Recognition Letters 34 (2013) 1623–1629

The classification rules for the classifiers’ committee are marked as WIVuewMV and WIVlewMV in the described case. In the rule WIVuewMV the upper values of weights are applied. These are calculated following the dependence (22). The rule is presented as:

$$W_{IVuewMV}(x) = \arg\max_{1 \le i \le M} \sum_{k=1}^{K} \overline{w}_k \, I(W_k(x) = i). \qquad (26)$$

In the rule WIVlewMV the lower values of weights are applied. These are calculated following the dependence (24). The rule is presented as:

$$W_{IVlewMV}(x) = \arg\max_{1 \le i \le M} \sum_{k=1}^{K} \underline{w}_k \, I(W_k(x) = i). \qquad (27)$$

As in the case described before the values of weights are normalised to unity before the application of the adequate decision rules. If every classifier has as many weights as there are predefined classes in the recognition task, then the upper value of weight wki for the described problem is obtained from the following dependence:

$$\overline{w}_{ki} = \frac{\sum_{n=1}^{N} UE_{ki}^n}{\max_{1 \le l \le K} \sum_{n=1}^{N} UE_{li}^n}, \qquad (28)$$

where

$$UE_{ki}^n = I(W_k(x_n) = i,\ j_n = i) \cdot \frac{\sum_{l=1,\, l \ne k}^{K} I(W_l(x_n) \ne i,\ j_n = i)}{K - 1}. \qquad (29)$$

However, the lower value of the weight $\underline{w}_{ki}$ is obtained from the dependence:

$$\underline{w}_{ki} = 1 - \frac{\sum_{n=1}^{N} LE_{ki}^n}{\max_{1 \le l \le K} \sum_{n=1}^{N} LE_{li}^n}, \qquad (30)$$

where

$$LE_{ki}^n = I(W_k(x_n) \ne i,\ j_n = i) \cdot \frac{\sum_{l=1,\, l \ne k}^{K} I(W_l(x_n) \ne i,\ j_n = i)}{K - 1}. \qquad (31)$$

The classification rules for the classifiers’ committee are marked as WIVuewcMV and WIVlewcMV in the described case. They are presented according to the following dependencies:

$$W_{IVuewcMV}(x) = \arg\max_{1 \le i \le M} \sum_{k=1}^{K} \overline{w}_{ki} \, I(W_k(x) = i), \qquad (32)$$

$$W_{IVlewcMV}(x) = \arg\max_{1 \le i \le M} \sum_{k=1}^{K} \underline{w}_{ki} \, I(W_k(x) = i). \qquad (33)$$

In this case, the upper and lower values of the weights obtained from the dependencies (28) and (30) are also normalised to unity in each of the classes before the application of the above rules. Having weights calculated in accordance with the dependencies (10), (12), (16), (18), (22), (24), (28) and (30), the decision rules for the classifiers' ensemble in which each base classifier returns the a posteriori probability can be easily defined. For example, in the sum rule $\hat{p}_k(i|x)$ is multiplied by the appropriate weight.

4. Experimental studies

In the experimental research 12 data sets were tested. Ten data sets come from the UCI repository (2007). Two of them have random observations generated in accordance with an assumed distribution: one has objects generated according to the procedure of (Duin et al., 2007), the so-called banana distribution, while the second has random objects drawn in accordance with the procedure of (Highleyman, 1962), the Highleyman distribution. The numbers of attributes, classes and available examples of the investigated data sets are given in Table 4.

Table 4. Description of data sets selected for the experiments.

Data set                   Examples  Attributes  Classes
Banana                     400       2           2
Breast Tissue              106       10          6
Dermatology                366       33          6
Glass Identification       214       10          6
Haberman's Survival        306       3           2
Highleyman                 400       2           2
Ionosphere                 351       34          2
Iris                       150       4           3
Pima Indians Diabetes      768       8           2
Sonar (Mines vs. Rocks)    208       60          2
Vertebral Column           310       6           3
Wine                       178       13          3

The research assumes that the group of classifiers is composed of seven base classifiers. Three of them work according to the k-NN rule with the parameter k from the set k ∈ {3, 5, 7}. For the four remaining base classifiers decision trees are used, with the Gini index or entropy used in the decision-making nodes. The results are obtained via the 10-fold cross-validation method. Table 5 presents the classification accuracy of the base classifiers and their respective average ranks obtained from the Friedman test (Derrac et al., 2011; García et al., 2010). Table 6, in turn, shows the results for the ensembles of classifiers with the voting method and their respective average ranks. In the abbreviations used in Tables 6–9, 1w denotes weights calculated from the formula (10), 1wc weights calculated from the formula (16), 2w weights calculated from the formula (12), 2wc weights calculated from the formula (18), and so on. In particular, a few groups of algorithms were distinguished, and for these the average ranks were calculated. One of the groups consisted of all the base classifiers as well as the ensembles of classifiers using the voting method; the average ranks for this case are labelled in Tables 5 and 6 as "RL, AR (all)". The second major group comprises all the base classifiers and the ensembles of classifiers using a fusion of the a posteriori probability; the ranks obtained for this group are labelled accordingly as "ML, AR (all)". In Tables 7–9 the average ranks within the appropriate group of classifiers using the same way of combining the base classifiers' outputs are also presented, labelled "AR (group)".

Table 5. The classification accuracy for the base classifiers and their respective average ranks obtained from the Friedman test.

Data set     W1     W2     W3     W4     W5     W6     W7
Banana       0.971  0.97   0.969  0.958  0.95   0.949  0.939
Breast       0.57   0.541  0.5    0.681  0.627  0.695  0.595
Dermat.      0.875  0.863  0.848  0.922  0.891  0.93   0.932
Glass        0.657  0.613  0.612  0.638  0.617  0.684  0.594
Haber.       0.746  0.739  0.739  0.687  0.691  0.694  0.7
Highl.       0.921  0.914  0.914  0.918  0.915  0.904  0.909
Ionos.       0.841  0.833  0.823  0.887  0.881  0.883  0.864
Iris         0.965  0.967  0.976  0.954  0.952  0.954  0.948
Pima         0.739  0.746  0.759  0.731  0.744  0.735  0.736
Sonar        0.803  0.792  0.778  0.694  0.692  0.698  0.634
Verteb.      0.819  0.83   0.829  0.827  0.793  0.821  0.791
Wine         0.854  0.823  0.793  0.927  0.904  0.884  0.868
ML,AR (all)  9.67   11.13  11.54  11.71  14.75  13.38  16.25
RL,AR (all)  22.13  25.58  26.92  28.54  34.58  30.21  36.29


Table 6. The classification accuracy for the ensembles of classifiers with the voting method and their respective average ranks obtained from the Friedman test.

Data set     V      wV     wcV    1wV    1wcV   2wV    2wcV   3wV    3wcV   4wV    4wcV
Banana       0.964  0.964  0.964  0.964  0.964  0.967  0.966  0.969  0.966  0.966  0.965
Breast       0.632  0.657  0.657  0.657  0.657  0.681  0.676  0.7    0.678  0.692  0.681
Dermat.      0.943  0.943  0.943  0.943  0.944  0.939  0.94   0.934  0.94   0.939  0.943
Glass        0.675  0.69   0.699  0.687  0.697  0.691  0.693  0.687  0.687  0.699  0.697
Haber.       0.718  0.718  0.718  0.718  0.718  0.718  0.718  0.713  0.718  0.729  0.727
Highl.       0.915  0.915  0.915  0.915  0.915  0.918  0.916  0.916  0.92   0.916  0.918
Ionos.       0.908  0.908  0.908  0.908  0.908  0.906  0.907  0.907  0.904  0.905  0.905
Iris         0.952  0.952  0.952  0.952  0.952  0.959  0.957  0.952  0.961  0.974  0.963
Pima         0.76   0.76   0.76   0.76   0.76   0.756  0.757  0.752  0.754  0.757  0.757
Sonar        0.783  0.783  0.783  0.783  0.783  0.756  0.756  0.741  0.75   0.734  0.767
Verteb.      0.845  0.843  0.844  0.844  0.844  0.846  0.844  0.844  0.839  0.839  0.83
Wine         0.907  0.912  0.911  0.912  0.911  0.916  0.911  0.92   0.909  0.911  0.904
RL,AR (all)  8.33   7.67   7.21   7.58   7.13   6.54   7.46   8.17   8.46   6.88   7.17
AR (group)   6.67   6.08   5.63   6.00   5.54   5.13   5.96   6.67   7.04   5.63   5.67

Table 7. The classification accuracy for the ensembles of classifiers with the sum method and their respective average ranks obtained from the Friedman test.

Data set     S      wS     wcS    1wS    1wcS   2wS    2wcS   3wS    3wcS   4wS    4wcS
Banana       0.96   0.961  0.961  0.96   0.96   0.966  0.964  0.966  0.962  0.963  0.962
Breast       0.681  0.692  0.681  0.697  0.692  0.703  0.695  0.711  0.686  0.714  0.7
Dermat.      0.949  0.949  0.949  0.949  0.949  0.947  0.947  0.945  0.948  0.942  0.947
Glass        0.717  0.719  0.717  0.717  0.713  0.728  0.726  0.717  0.713  0.712  0.719
Haber.       0.72   0.72   0.718  0.72   0.719  0.717  0.717  0.714  0.716  0.723  0.722
Highl.       0.917  0.917  0.917  0.917  0.917  0.921  0.921  0.92   0.92   0.918  0.92
Ionos.       0.911  0.912  0.912  0.912  0.912  0.919  0.915  0.918  0.909  0.908  0.915
Iris         0.95   0.95   0.95   0.95   0.95   0.954  0.95   0.952  0.948  0.963  0.952
Pima         0.765  0.767  0.767  0.765  0.767  0.77   0.771  0.766  0.77   0.767  0.767
Sonar        0.787  0.784  0.783  0.787  0.787  0.767  0.767  0.745  0.753  0.748  0.758
Verteb.      0.848  0.847  0.848  0.848  0.848  0.847  0.85   0.841  0.844  0.841  0.839
Wine         0.921  0.921  0.921  0.921  0.921  0.916  0.912  0.92   0.912  0.904  0.905
ML,AR (all)  18.46  16     17.42  16.96  17.29  10.96  13.38  16.38  19.92  17.17  15.92
AR (group)   6.54   5.67   6.25   5.88   6.13   4.08   4.75   6.33   7.75   6.88   5.75

Table 8. The classification accuracy for the ensembles of classifiers with the prod method and their respective average ranks obtained from the Friedman test.

Data set     P      wP     wcP    1wP    1wcP   2wP    2wcP   3wP    3wcP   4wP    4wcP
Banana       0.961  0.961  0.961  0.961  0.961  0.961  0.961  0.964  0.964  0.964  0.958
Breast       0.643  0.643  0.643  0.643  0.649  0.643  0.627  0.646  0.624  0.646  0.589
Dermat.      0.92   0.92   0.92   0.92   0.92   0.92   0.919  0.92   0.924  0.92   0.904
Glass        0.658  0.658  0.659  0.658  0.661  0.658  0.659  0.67   0.662  0.658  0.652
Haber.       0.725  0.725  0.725  0.725  0.725  0.725  0.729  0.718  0.732  0.723  0.725
Highl.       0.922  0.922  0.922  0.922  0.922  0.922  0.92   0.921  0.911  0.918  0.916
Ionos.       0.915  0.915  0.915  0.915  0.915  0.915  0.912  0.917  0.916  0.919  0.912
Iris         0.952  0.952  0.952  0.952  0.952  0.952  0.959  0.954  0.961  0.961  0.959
Pima         0.762  0.762  0.763  0.762  0.763  0.762  0.764  0.767  0.767  0.767  0.764
Sonar        0.741  0.741  0.741  0.741  0.741  0.741  0.741  0.742  0.748  0.742  0.742
Verteb.      0.84   0.84   0.84   0.84   0.84   0.84   0.841  0.842  0.837  0.836  0.825
Wine         0.841  0.841  0.841  0.841  0.841  0.841  0.841  0.841  0.845  0.841  0.841
ML,AR (all)  23.79  23.79  23.25  23.79  22.71  23.79  23.67  20     20.42  20.63  28.54
AR (group)   6.71   6.71   6.17   6.71   5.63   6.71   6.29   4.04   3.71   5.08   8.25

In the case of the ensembles of classifiers that use the voting method together with the ways of calculating weights suggested in this work, minor differences in the average ranks were obtained (Table 6). The best outcomes were received for the suggested ensembles of classifiers in which the weights are calculated according to the dependence (12). However, the improvement of the classification quality in comparison with the other ensembles of classifiers is not statistically significant. In the analysis that also takes into account the outcomes obtained by the base classifiers, the critical value calculated on the basis of the post-hoc Nemenyi test was applied. For the twelve investigated data sets and the eighteen classifiers constituting the appropriate group, this value at the significance level 0.05 is 7.6. The best

ensemble of classifiers from the pool is therefore statistically better than two of the base classifiers, whereas the classifier $W_{MV}$ is better than only one. The post-hoc analysis using the Bonferroni–Dunn test for each of the groups of classifiers (Tables 7–9) that are based on a different way of combining the classifiers' outputs is linked to critical values equal to 3.8 and 3.5 at the significance levels 0.05 and 0.1, respectively. The ways of calculating weights suggested in this work therefore contribute to the improvement in the quality of the ensembles of classifiers; in two cases the best results belong to the ensembles of classifiers using the weights calculated with the dependence (12). It must be noted, however, that the improvement is not statistically significant.
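The rank analysis underlying Tables 5–9 can be reproduced with scipy (a minimal sketch, not the author's code). The six rows below are the accuracies of the V, wV and 2wV ensembles on the first six data sets of Table 6; ties receive averaged ranks, as in the Friedman procedure.

```python
import numpy as np
from scipy.stats import friedmanchisquare, rankdata

# rows = data sets, columns = classifiers (V, wV, 2wV from Table 6)
acc = np.array([
    [0.964, 0.964, 0.967],   # Banana
    [0.632, 0.657, 0.681],   # Breast
    [0.943, 0.943, 0.939],   # Dermat.
    [0.675, 0.690, 0.691],   # Glass
    [0.718, 0.718, 0.718],   # Haber.
    [0.915, 0.915, 0.918],   # Highl.
])
# average rank per classifier; rank 1 = best accuracy on a data set
avg_rank = np.mean([rankdata(-row) for row in acc], axis=0)
# Friedman test takes one sample of measurements per classifier
stat, p = friedmanchisquare(*acc.T)
```

On this fragment the 2wV ensemble obtains the lowest (best) average rank, consistent with the discussion above; significance would then be assessed with the Nemenyi or Bonferroni–Dunn critical differences.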


R. Burduk / Pattern Recognition Letters 34 (2013) 1623–1629

Table 9
The classification accuracy for ensemble of classifiers with average method and their respective average rank obtained from the Friedman test.

Data set      A      wA     wcA    1wA    1wcA   2wA    2wcA   3wA    3wcA   4wA    4wcA
Banana        0.96   0.961  0.961  0.96   0.96   0.966  0.964  0.966  0.962  0.963  0.961
Breast        0.681  0.692  0.681  0.697  0.689  0.703  0.681  0.711  0.686  0.714  0.673
Dermat.       0.949  0.949  0.949  0.949  0.949  0.947  0.947  0.945  0.946  0.942  0.937
Glass         0.717  0.719  0.719  0.717  0.717  0.728  0.725  0.717  0.712  0.712  0.717
Haber.        0.72   0.72   0.718  0.72   0.719  0.717  0.717  0.714  0.716  0.723  0.72
Highl.        0.917  0.917  0.917  0.917  0.917  0.921  0.921  0.92   0.916  0.918  0.919
Ionos.        0.911  0.912  0.912  0.912  0.912  0.919  0.915  0.918  0.905  0.908  0.91
Irys          0.95   0.95   0.95   0.95   0.95   0.954  0.959  0.952  0.957  0.963  0.961
Pima          0.765  0.767  0.767  0.765  0.767  0.77   0.771  0.766  0.77   0.767  0.767
Sonar         0.787  0.784  0.783  0.787  0.787  0.767  0.767  0.745  0.753  0.748  0.77
Verteb.       0.848  0.847  0.848  0.848  0.848  0.847  0.85   0.841  0.844  0.841  0.836
Wine          0.921  0.921  0.921  0.921  0.921  0.916  0.912  0.92   0.914  0.904  0.912

ML,AR (all)   18.46  16     16.88  16.96  17     10.96  12.38  16.38  19.92  17.17  19.46
AR (group)     6.38   5.33   5.79   5.75   5.75   4.29   4.63   6.42   7.83   6.58   7.25
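The "average method" evaluated in Table 9 belongs to the family of weighted fusion rules over a posteriori probabilities. The sketch below illustrates the generic weighted-sum formulation and a simple interval variant that propagates lower and upper weight bounds; it is an illustration under common assumptions, not the paper's exact decision rule, and the posterior and weight values are made up for the example.

```python
def weighted_fusion(posteriors, weights):
    """Weighted-sum fusion: posteriors[t][c] is classifier t's a posteriori
    probability for class c; returns the index of the winning class."""
    n_classes = len(posteriors[0])
    support = [sum(w * p[c] for w, p in zip(weights, posteriors))
               for c in range(n_classes)]
    return max(range(n_classes), key=support.__getitem__)

def interval_supports(posteriors, lo, hi):
    """Interval-valued weights [lo[t], hi[t]] yield an interval of support
    per class; one simple decision rule then ranks classes, e.g. by the
    midpoint of each interval."""
    n_classes = len(posteriors[0])
    return [(sum(l * p[c] for l, p in zip(lo, posteriors)),
             sum(h * p[c] for h, p in zip(hi, posteriors)))
            for c in range(n_classes)]

# Three base classifiers, two classes.
posteriors = [[0.7, 0.3], [0.4, 0.6], [0.2, 0.8]]
print(weighted_fusion(posteriors, [0.5, 0.3, 0.2]))   # class 0 (support 0.51 vs 0.49)
print(interval_supports(posteriors, [0.4, 0.2, 0.1], [0.6, 0.4, 0.3]))
```

With point weights the first classifier dominates and class 0 wins; with interval weights each class receives a support interval whose width reflects the uncertainty of the weight estimates.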

The obtained values of the average ranks and the post-hoc analysis using the Nemenyi test show that the ways of calculating weights for the ensemble of classifiers suggested in this work improve the quality of classification. In particular, this statement refers to the comparison of the suggested algorithms with the base classifiers. In the group consisting of all the base classifiers and the ensembles of classifiers using the fusion of a posteriori probabilities, the lowest ranks were obtained for the weighted fusion methods of the base classifiers' outputs proposed in this article, WIVlcwS and WIVlcwM. The critical value for the discussed group of forty classifiers is 18.55 and 17.55 for the significance levels 0.05 and 0.01, respectively. The classification quality of WIVlcwS and WIVlcwM is therefore statistically significantly different from that of three or four base classifiers. The quality of the classifiers WwS and WwM is statistically significantly different from only two base classifiers (at the investigated levels of significance). The research carried out also confirms that ensembles of classifiers using the discrimination functions of the base classifiers achieve better classification quality than ensembles based on the voting idea.

5. Conclusion

In the paper two new ways of calculating the weights of classifiers entering the committee were presented. In the process of calculating the weights, the proposed methods take into account the predictions of all classifiers. For each way of calculating weights, the relevant upper and lower values were suggested in this work. The obtained range between the upper and lower value of a weight denotes the uncertainty of estimating the classification quality of an individual classifier in the context of the quality of the other classifiers entering the committee.
In the experimental research ten benchmark data sets and two sets with randomly generated distributions were used. Nonparametric statistical tests were applied in the analysis. The obtained average ranks indicate an improvement in classification quality for the proposed methods of calculating weights. For the investigated data sets this improvement is in most cases not statistically significant. However, the number of base classifiers that give statistically worse results in comparison to the committees of classifiers using the suggested weights rises.

Acknowledgement

This work is supported by the Polish National Science Center under Grant N N519 650440 for the period 2011–2014.

References

Alefeld, G., Herzberger, J., 1983. Introduction to Interval Computations. Computer Science and Applied Mathematics. Academic Press.
Alkoot, F.M., Kittler, J., 1999. Experimental evaluation of expert fusion strategies. Pattern Recogn. Lett. 20 (11–13), 1361–1369.
Asuncion, A., Newman, D.J., 2007. UCI Machine Learning Repository. University of California, Irvine, School of Information and Computer Sciences. http://www.ics.uci.edu/~mlearn/MLRepository.html.
Bhadra, S., Nath, J., Ben-Tal, A., Bhattacharyya, C., 2009. Interval data classification under partial information: a chance-constraint approach. In: Theeramunkong, T., Kijsirikul, B., Cercone, N., Ho, T.-B. (Eds.), Advances in Knowledge Discovery and Data Mining, Lecture Notes in Computer Science, vol. 5476. Springer, Berlin Heidelberg, pp. 208–219.
Chen, D., Cheng, X., 2001. An asymptotic analysis of some expert fusion methods. Pattern Recogn. Lett. 22 (8), 901–904.
Cyganek, B., 2012. One-class support vector ensembles for image segmentation and classification. J. Math. Imaging Vision 42 (2–3), 103–117.
Derrac, J., García, S., Molina, D., Herrera, F., 2011. A practical tutorial on the use of nonparametric statistical tests as a methodology for comparing evolutionary and swarm intelligence algorithms. Swarm Evol. Comput. 1 (1), 3–18.
Duin, R., Juszczak, P., Paclik, P., Pekalska, E., de Ridder, D., Tax, D., Verzakov, S., 2007. PR-Tools 4.1, A Matlab Toolbox for Pattern Recognition. Delft University of Technology.
García, S., Fernández, A., Luengo, J., Herrera, F., 2010. Advanced nonparametric tests for multiple comparisons in the design of experiments in computational intelligence and data mining: experimental analysis of power. Inf. Sci. 180 (10), 2044–2064.
Highleyman, W.H., 1962. The design and analysis of pattern recognition experiments. Bell Syst. Tech. J. 41, 723–744.
Kittler, J., Alkoot, F.M., 2003. Sum versus vote fusion in multiple classifier systems. IEEE Trans. Pattern Anal. Mach. Intell. 25 (1), 110–115.
Kittler, J., Hatef, M., Duin, R.P.W., Matas, J., 1998. On combining classifiers. IEEE Trans. Pattern Anal. Mach. Intell. 20 (3), 226–239.
Kulczycki, P., Kowalski, P.A., 2011. Bayes classification of imprecise information of interval type. Control Cybern. 40 (1), 101–123.
Kuncheva, L., Jain, L., 2000. Designing classifier fusion systems by genetic algorithms. IEEE Trans. Evol. Comput. 4 (4), 327–336.
Kuncheva, L.I., 2004. Combining Pattern Classifiers: Methods and Algorithms. Wiley-Interscience.
Silva, F., de Carvalho, F.A.T., Souza, R., Silva, J., 2006. A modal symbolic classifier for interval data. In: King, I., Wang, J., Chan, L.-W., Wang, D. (Eds.), Neural Information Processing, Lecture Notes in Computer Science, vol. 4233. Springer, Berlin Heidelberg, pp. 50–59.
Viertl, R., 1996. Statistical Methods for Non-precise Data. CRC Press.
Woloszynski, T., Kurzynski, M., 2011. A probabilistic model of classifier competence for dynamic ensemble selection. Pattern Recogn. 44 (10–11), 2656–2668.
Woods, K., Kegelmeyer Jr., W.P., Bowyer, K., 1997. Combination of multiple classifiers using local accuracy estimates. IEEE Trans. Pattern Anal. Mach. Intell. 19 (4), 405–410.
Wozniak, M., Jackowski, K., 2009. Some remarks on chosen methods of classifier fusion based on weighted voting. In: Corchado, E., Wu, X., Oja, E., Herrero, A., Baruque, B. (Eds.), Hybrid Artificial Intelligence Systems, Lecture Notes in Computer Science, vol. 5572. Springer, Berlin Heidelberg, pp. 541–548.
Xu, L., Krzyzak, A., Suen, C.Y., 1992. Methods for combining multiple classifiers and their applications to handwriting recognition. IEEE Trans. Syst. Man Cybern. 22, 418–435.
Zhang, C.-X., Duin, R.P.W., 2011. An experimental study of one- and two-level classifier fusion for different sample sizes. Pattern Recogn. Lett. 32 (14), 1756–1767.