Pattern Recognition Letters 34 (2013) 1623–1629
Classifier fusion with interval-valued weights

Robert Burduk

Department of Systems and Computer Networks, Wroclaw University of Technology, Wybrzeze Wyspianskiego 27, 50-370 Wroclaw, Poland
Article info

Article history: Available online 4 June 2013. Communicated by Manuel Graña.

Keywords: Classifier fusion; Interval information; Ensemble classifier; Multiple classifier system

Abstract

The article presents a new approach to calculating the weights of base classifiers in a committee of classifiers. The obtained weights are interpreted in the context of interval-valued sets. The work proposes four different ways of calculating weights, which consider both the correctness and the incorrectness of the classification. The proposed weights are used in algorithms which combine the outputs of base classifiers. In this work we use outputs represented at both the rank level and the measure level. The experiments involve several data sets available in the UCI repository and two data sets generated from assumed distributions. The experiments compare algorithms whose weights are calculated by resubstitution with the algorithms proposed in this work. The ensemble of classifiers is also compared with the base classifiers entering the committee. © 2013 Elsevier B.V. All rights reserved.
1. Introduction

Creating a group of classifiers is one of the ways to improve the classification accuracy in the recognition process (Kuncheva, 2004; Kittler et al., 1998). In particular, the aim is for the ensemble of classifiers to achieve greater classification correctness than any single classifier from the pool. Research on this problem has been in focus for more than fifteen years now. Due to the large number of different methods for combining classifiers and creating an ensemble of classifiers, it is difficult to identify the best method for a particular recognition task (Alkoot and Kittler, 1999; Chen and Cheng, 2001; Kittler and Alkoot, 2003; Zhang and Duin, 2011). Problems in these areas are still evolving and new concepts are associated with them (Cyganek, 2012; Woloszynski et al., 2011).

While analysing the outcomes of the base classifiers we can distinguish three primary situations (Xu et al., 1992). In the first one, the label description (crisp label) of the class of the recognized object is available. In the second, the labels obtained from a base classifier are ranked in a queue. In the third, each base classifier returns the a posteriori probability of membership of the recognized object for each of the possible class labels.

In recent years, many studies have presented different issues related to this recognition task. One of them is defining the weights assigned to the given component classifiers. The selection of the appropriate system of weights has been widely discussed in the literature regarding the construction of complex classifiers (Woods et al., 1997; Wozniak et al., 2009). This problem can also be formulated as a separate element of the process of learning in the classifiers committee (Kuncheva et al., 2000).

Recently, many papers have described the use of interval information in pattern recognition (Bhadra et al., 2009; Kulczycki et al., 2011; Silva et al., 2006; Viertl, 1996). In particular, they refer to an inaccurate data description expressed by interval information. Interval information is based on interval analysis, which belongs to the field of mathematics (Alefeld and Herzberger, 1983). Its advantage is modelling the uncertainty of the given value in the simplest way possible. The investigated value is bounded from below and above and can thus be regarded as a value within a range.

In the article, four ways of calculating the weights of the classifiers entering the classifiers' committee are suggested. These weights are interpreted as the lower and upper values of the base classifiers' weights. The defined boundaries refer to the correctness or incorrectness of these classifiers.

The text is organized as follows: in Section 2 the definitions associated with classifier fusion are presented. In particular, two approaches are introduced that differ in the type of data received from the base classifiers' outputs. In Section 3 the new methods of assigning weights to the individual base classifiers are presented. Section 4 describes the experiments comparing the suggested algorithms with others that are based on the same data received from the outputs of the base classifiers. Finally, conclusions from the experiments are presented.

2. Classifier fusion

Let us assume that we possess K different classifiers
$W_1, W_2, \ldots, W_K$. Such a set of classifiers, which is constructed on the basis of the same learning sample, is called an ensemble classifier or a combining classifier. Each of the $W_i$ classifiers, in turn, is described as a component or base classifier. As a rule, K is assumed to be an odd number and each of the $W_i$ classifiers makes an independent decision. As a result of the action of all the classifiers, their K responses are obtained. Given a set of base classifiers, one should determine the procedure of making the ultimate decision regarding the allocation of the object to the given class. It implies that the output information from all K component classifiers is applied to make the ultimate decision.

⇑ Tel.: +48 71 320 2877; fax: +48 71 320 2902. 0167-8655/$ - see front matter © 2013 Elsevier B.V. All rights reserved. http://dx.doi.org/10.1016/j.patrec.2013.05.022

2.1. Fusion techniques for class labels
One of the possible types of information obtained from the component classifiers is a class label that is assigned to the given observation. Given a set of K labels, different methods of combining the outputs of the classifiers are applied in order to obtain the final decision. Reaching the decision by counting the votes is called the voting method. One of the most common methods of combining classifiers is majority voting. In this method each of the component classifiers of the committee casts one equally valid vote and the object is assigned to the class which gets the largest number of votes from the base classifiers. The advantage of the method is its simplicity and the lack of any calculation apart from counting the votes of the individual classifiers. One of the drawbacks of this approach is the draw situation, in which the same number of classifiers points to more than one class. In binary classification tasks the problem can be solved by using an odd number of base classifiers. The algorithm of making the ultimate decision by the set of classifiers in this approach is the following:

$$W_{MV}(x) = \arg\max_{1 \le i \le M} \sum_{k=1}^{K} I(W_k(x) = i), \tag{1}$$

where $i \in \{1, \ldots, M\}$ denotes a class label and $I(\cdot)$ is the indicator function.

Another method of combining the classifiers is weighted voting. In this approach each of the classifiers has an allocated weight, which is taken into account when reaching the final decision of the group. The weights depend largely on the quality of the base classifiers. In the case when each classifier has one weight for all the possible classes, or for the complete feature space, the group classification formula is as follows:

$$W_{wMV}(x) = \arg\max_{1 \le i \le M} \sum_{k=1}^{K} w_k \, I(W_k(x) = i), \tag{2}$$

where $w_k = 1 - Pe_{W_k}$, and $Pe_{W_k}$ is the empirical error of the classifier $W_k$ estimated on the testing set. When the error is estimated on the learning set, we speak of error estimation based on the resubstitution method. The weight $w_k$ of each component classifier is then calculated as:

$$w_k = \frac{\sum_{n=1}^{N} I(W_k(x_n) = i,\ j_n = i)}{N}. \tag{3}$$

The value N is the number of learning set observations used for estimating the classifiers' weights, and $j_n$ is the class number of the object with index n. Another way of calculating weights is to give each of the classifiers as many weights as there are pre-defined classes in the recognition task. In this case, the classification rule is the following:

$$W_{wcMV}(x) = \arg\max_{1 \le i \le M} \sum_{k=1}^{K} w_{ki} \, I(W_k(x) = i), \tag{4}$$

with

$$w_{ki} = \frac{\sum_{n=1}^{N} I(W_k(x_n) = i,\ j_n = i)}{\sum_{n=1}^{N} I(j_n = i)}.$$

The obtained weights are normalised for each of the classes according to the formula:

$$\sum_{k=1}^{K} w_{ki} = 1, \tag{5}$$

which means that the sum of the weights of all the base classifiers for the given class i is equal to unity.

2.2. Fusion techniques for a posteriori probability
Suppose that, for all base classifiers, we have at our disposal the estimated probability that the tested object belongs to each of the classes. In particular, this estimate is the a posteriori probability, which can be obtained by parametric as well as nonparametric estimation. Let us denote the a posteriori probability estimate by $\hat{p}_k(i|x)$, $k = 1, 2, \ldots, K$, $i = 1, 2, \ldots, M$. In the literature several methods of determining the scores of groups of classifiers have been proposed for this type of problem. We can distinguish the sum, prod and mean methods. In the sum method the score of the group of classifiers is based on the following sums:

$$s_i(x) = \sum_{k=1}^{K} \hat{p}_k(i|x), \quad i = 1, 2, \ldots, M, \tag{6}$$

whereas in the prod method the products of the a posteriori probabilities are applied:

$$pr_i(x) = \prod_{k=1}^{K} \hat{p}_k(i|x), \quad i = 1, 2, \ldots, M. \tag{7}$$

The final decision of the group of classifiers is made following the maximum rule and is given, depending on the sum method (6) or the prod method (7), by:

$$W_{SUM}(x) = \arg\max_{i} s_i(x), \tag{8}$$

$$W_{PROD}(x) = \arg\max_{i} pr_i(x). \tag{9}$$
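The label-level rules (1) and (2) and the measure-level rules (8) and (9) can be sketched in a few lines of Python. This is only an illustrative sketch: the function names, the tie-breaking choice and the example votes and posteriors are assumptions made for the illustration, not part of the paper.

```python
from collections import defaultdict
from math import prod


def weighted_vote(labels, weights=None):
    """Rules (1)-(2): labels[k] is the class chosen by base classifier k.

    With weights=None every vote counts 1, which gives plain majority voting.
    """
    if weights is None:
        weights = [1.0] * len(labels)
    score = defaultdict(float)
    for label, w in zip(labels, weights):
        score[label] += w
    # Assumption: ties are broken in favour of the smallest class label.
    return min(score, key=lambda c: (-score[c], c))


def sum_rule(posteriors):
    """Rules (6) and (8): posteriors[k][i] is the estimate of p_k(i | x)."""
    n_classes = len(posteriors[0])
    s = [sum(p[i] for p in posteriors) for i in range(n_classes)]
    return max(range(n_classes), key=s.__getitem__)


def prod_rule(posteriors):
    """Rules (7) and (9): product of the a posteriori estimates per class."""
    n_classes = len(posteriors[0])
    pr = [prod(p[i] for p in posteriors) for i in range(n_classes)]
    return max(range(n_classes), key=pr.__getitem__)


# Hypothetical outputs of five label-level and three measure-level classifiers.
votes = [1, 0, 1, 0, 0]
posteriors = [[0.8, 0.2], [0.8, 0.2], [0.02, 0.98]]

print(weighted_vote(votes))                              # majority voting
print(weighted_vote(votes, [0.9, 0.3, 0.9, 0.3, 0.3]))   # weighted voting
print(sum_rule(posteriors), prod_rule(posteriors))
```

In the hypothetical example the sum and prod rules disagree, because one very small posterior estimate dominates a product but not a sum.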
In the presented methods (8) and (9), the discrimination functions obtained from the individual classifiers take an equal part in building the combined classifier. Weighted versions of these methods can also be created easily. In this case, as for classifiers (2) and (4), one first needs to formulate the way of calculating the individual weights of the classifiers. It can be done according to the dependence (3), which means that each classifier is assigned a weight depending on its classification error.

3. Interval-valued weights in classifier fusion

The methods of calculating weights presented in the previous section take into account the quality of classification of the individual base classifiers. The calculation is done simultaneously for all the classifiers and is concluded by the normalisation process. The calculated weights therefore do not consider the decisions made by other classifiers. We will now suggest a new way of calculating weights which takes into account the results of all base classifiers entering the committee, for each of the learning objects. Two main cases will be discussed. In the first one, the correctness of the classification is taken into consideration. The second case refers to the incorrectness of the base classifiers. For each of the cases the upper and lower value of the base classifiers' weight will be suggested. The range between the upper and lower values defines the uncertainty in estimating the classification quality of the particular classifier in the context of the classification quality of the other classifiers entering the committee.

We will now suggest a method of determining weights for the individual base classifiers which uses the interval-valued description. It means that the particular weights are not provided precisely; instead, their lower and upper values are used. Therefore, each weight $w_k$ of the k-th component classifier is represented by its upper value $\overline{w}_k$ and lower value $\underline{w}_k$. Let us differentiate between two approaches to determining the individual weights. In the first one, the correct classifications of the relevant component classifiers are used. The second one is based on the incorrect classifications.
3.1. Interval-valued weights for the accuracy of the base classifiers

The basis for the determination of the upper and lower values of the weights will be the correct classification of the component classifiers. Firstly, the approach will be presented with every component classifier having one weight allocated. Then, we will introduce the situation in which the number of weights of each classifier is equal to the number of the pre-defined classes. Given a group of K component classifiers $W_1, W_2, \ldots, W_K$, we estimate on the learning set the probability of correct classification $Pc_{W_k}$ for each of them. The upper weight $\overline{w}_k$ of the k-th classifier refers to the situation in which the k-th classifier was correct while the other committee classifiers also gave the correct prediction. The lower weight $\underline{w}_k$ describes the situation in which the k-th classifier made errors while the other committee classifiers did not make any errors. The upper value of the weight is obtained from the dependence:

$$\overline{w}_k = \frac{\sum_{n=1}^{N} UC_k^n}{\max_{1 \le l \le K} \sum_{n=1}^{N} UC_l^n}, \tag{10}$$

where

$$UC_k^n = I(W_k(x_n) = j_n) \, \frac{\sum_{l=1, l \ne k}^{K} I(W_l(x_n) = j_n)}{K - 1}. \tag{11}$$

However, the lower value of the weight is obtained from the dependence:

$$\underline{w}_k = 1 - \frac{\sum_{n=1}^{N} LC_k^n}{\max_{1 \le l \le K} \sum_{n=1}^{N} LC_l^n}, \tag{12}$$

where

$$LC_k^n = I(W_k(x_n) \ne j_n) \, \frac{\sum_{l=1, l \ne k}^{K} I(W_l(x_n) = j_n)}{K - 1}. \tag{13}$$

Let us now present a simple example of calculating the lower and upper values of the weights for a binary classification task with five component classifiers. In this task we assume that the class labels are $j \in \{0, 1\}$ and that a ten-element learning set is available. In Table 1 each row is associated with one observation from the training set. The number of the observation is marked as n, and $j_n$ is the true class label of the n-th observation. The base classifiers are marked as $W_i(x_n)$, $i \in \{1, 2, \ldots, 5\}$. The corresponding table cells show the class label predicted by the base classifier and the factor $UC_k^n$ needed to calculate the upper value of the weight $\overline{w}_k$. The values in the rows of Table 1 represent the fraction of the other classifiers which did not make errors (made a correct decision) when the given classifier did not make an error either. So, for the first row, the classifier $W_1$ has the value 0.25 assigned. It denotes that the classifier $W_1$ did not make an error and, furthermore, 25% of the rest provided the correct decision. In the case of samples $n \in \{7, 9\}$, each of the classifiers pointed to the correct decision. Therefore, the relevant values are equal to unity. The rows of Table 2 contain the fraction of the other classifiers that did not make errors when the given classifier made an error. For example, for the sample $n = 1$ the classifier $W_2$ made an error, and 50% of the other classifiers pointed to the appropriate class. In the case of the sample $n = 10$, this classifier made an error and all the other ones gave the correct prediction.

The classification rule of the group of classifiers that uses weights obtained according to the dependence (10) and is based on majority voting will be denoted as $W_{IVucwMV}$. The algorithm of making the final decision is as follows:

$$W_{IVucwMV}(x) = \arg\max_{1 \le i \le M} \sum_{k=1}^{K} \overline{w}_k \, I(W_k(x) = i). \tag{14}$$

Before making the final decision according to Eq. (14), the upper values of the weights $\overline{w}_1, \ldots, \overline{w}_K$ are normalised to unity. Analogously, the rule for the group of classifiers that uses weights obtained according to the dependence (12) is marked as $W_{IVlcwMV}$ and has the following form:

$$W_{IVlcwMV}(x) = \arg\max_{1 \le i \le M} \sum_{k=1}^{K} \underline{w}_k \, I(W_k(x) = i). \tag{15}$$

As previously, the relevant weights $\underline{w}_1, \ldots, \underline{w}_K$ are normalised to unity. Let us now present the method of calculating the upper and lower values of the relevant weights in the case when each of the classifiers possesses as many weights as there are predefined classes in the recognition task. The upper value of the weight $\overline{w}_{ki}$ is obtained from the following dependence:

$$\overline{w}_{ki} = \frac{\sum_{n=1}^{N} UC_{ki}^n}{\max_{1 \le l \le K} \sum_{n=1}^{N} UC_{li}^n}, \tag{16}$$

where

$$UC_{ki}^n = I(W_k(x_n) = j_n,\ j_n = i) \, \frac{\sum_{l=1, l \ne k}^{K} I(W_l(x_n) = j_n,\ j_n = i)}{K - 1}. \tag{17}$$

However, the lower value of the weight $\underline{w}_{ki}$ is obtained from the dependence:

$$\underline{w}_{ki} = 1 - \frac{\sum_{n=1}^{N} LC_{ki}^n}{\max_{1 \le l \le K} \sum_{n=1}^{N} LC_{li}^n}, \tag{18}$$

where

$$LC_{ki}^n = I(W_k(x_n) \ne j_n,\ j_n = i) \, \frac{\sum_{l=1, l \ne k}^{K} I(W_l(x_n) = j_n,\ j_n = i)}{K - 1}. \tag{19}$$

The rule for the group of classifiers that uses weights obtained according to the relation (16) and is based on majority voting is marked as $W_{IVucwcMV}(x)$ and has the form:

$$W_{IVucwcMV}(x) = \arg\max_{1 \le i \le M} \sum_{k=1}^{K} \overline{w}_{ki} \, I(W_k(x) = i). \tag{20}$$

Analogously, the rule for the set of classifiers that uses weights obtained following the dependence (18) is marked as $W_{IVlcwcMV}$ and has the form:

$$W_{IVlcwcMV}(x) = \arg\max_{1 \le i \le M} \sum_{k=1}^{K} \underline{w}_{ki} \, I(W_k(x) = i). \tag{21}$$

Table 3 presents the data needed to calculate the appropriate weights $\overline{w}_{ki}$.
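Table 1. The outputs of the base classifiers, the values of the $UC_k^n$ coefficients and the upper values of the weights $\overline{w}_k$ (the case of the component classifiers' correctness).

| $n$ | $j_n$ | $W_1(x_n)$ | $W_2(x_n)$ | $W_3(x_n)$ | $W_4(x_n)$ | $W_5(x_n)$ | $UC_1^n$ | $UC_2^n$ | $UC_3^n$ | $UC_4^n$ | $UC_5^n$ |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | 1 | 0 | 1 | 0 | 0 | 0.25 | 0 | 0.25 | 0 | 0 |
| 2 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | 0 | 0 | 0 | 0 | 1 | 0 | 0.75 | 0.75 | 0.75 | 0 | 0.75 |
| 4 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 0.25 | 0.25 | 0 | 0 |
| 5 | 1 | 1 | 1 | 0 | 1 | 1 | 0.75 | 0.75 | 0 | 0.75 | 0.75 |
| 6 | 1 | 1 | 1 | 1 | 0 | 0 | 0.5 | 0.5 | 0.5 | 0 | 0 |
| 7 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 |
| 8 | 0 | 0 | 0 | 0 | 1 | 1 | 0.5 | 0.5 | 0.5 | 0 | 0 |
| 9 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 |
| 10 | 0 | 0 | 1 | 0 | 0 | 0 | 0.75 | 0 | 0.75 | 0.75 | 0.75 |
| $\sum_{n=1}^{N} UC_k^n$ | – | – | – | – | – | – | 5.5 | 4.75 | 5 | 3.5 | 4.25 |
| $\overline{w}_k$ | – | – | – | – | – | – | 1 | 0.86 | 0.91 | 0.64 | 0.77 |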
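The upper weights of Table 1 can be recomputed directly from Eqs. (10) and (11). The sketch below is an illustration only: the function names are hypothetical, and $W_5(x_7)$ is taken as 0, which is what the statement that every classifier is correct for $n \in \{7, 9\}$ requires.

```python
# True labels j_n and outputs W_k(x_n) of the ten-sample example.
# Assumption: W_5(x_7) = 0, so that all classifiers are correct for n = 7.
J = [1, 1, 0, 1, 1, 1, 0, 0, 0, 0]
PRED = [
    [1, 0, 0, 0, 1, 1, 0, 0, 0, 0],  # W_1
    [0, 0, 0, 1, 1, 1, 0, 0, 0, 1],  # W_2
    [1, 1, 0, 1, 0, 1, 0, 0, 0, 0],  # W_3
    [0, 0, 1, 0, 1, 0, 0, 1, 0, 0],  # W_4
    [0, 0, 0, 0, 1, 0, 0, 1, 0, 0],  # W_5
]


def uc(pred, truth, k, n):
    """Coefficient UC_k^n of Eq. (11)."""
    if pred[k][n] != truth[n]:
        return 0.0  # classifier k is wrong on sample n
    n_clf = len(pred)
    others_correct = sum(pred[l][n] == truth[n] for l in range(n_clf) if l != k)
    return others_correct / (n_clf - 1)


def upper_weights(pred, truth):
    """Column sums of UC_k^n and the upper weights of Eq. (10)."""
    totals = [
        sum(uc(pred, truth, k, n) for n in range(len(truth)))
        for k in range(len(pred))
    ]
    top = max(totals)
    return totals, [t / top for t in totals]


totals, w_upper = upper_weights(PRED, J)
print(totals)                          # [5.5, 4.75, 5.0, 3.5, 4.25]
print([round(w, 2) for w in w_upper])  # [1.0, 0.86, 0.91, 0.64, 0.77]
```

The weights printed here are the values before the normalisation to unity that precedes rule (14).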
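Table 2. The outputs of the base classifiers, the values of the $LC_k^n$ coefficients and the lower values of the weights $\underline{w}_k$ (the case of the component classifiers' correctness).

| $n$ | $j_n$ | $W_1(x_n)$ | $W_2(x_n)$ | $W_3(x_n)$ | $W_4(x_n)$ | $W_5(x_n)$ | $LC_1^n$ | $LC_2^n$ | $LC_3^n$ | $LC_4^n$ | $LC_5^n$ |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 0.5 | 0 | 0.5 | 0.5 |
| 2 | 1 | 0 | 0 | 1 | 0 | 0 | 0.25 | 0.25 | 0 | 0.25 | 0.25 |
| 3 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 |
| 4 | 1 | 0 | 1 | 1 | 0 | 0 | 0.5 | 0 | 0 | 0.5 | 0.5 |
| 5 | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 0 | 1 | 0 | 0 |
| 6 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0.75 | 0.75 |
| 7 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 8 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0.75 | 0.75 |
| 9 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 10 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
| $\sum_{n=1}^{N} LC_k^n$ | – | – | – | – | – | – | 0.75 | 1.75 | 1 | 3.75 | 2.75 |
| $\underline{w}_k$ | – | – | – | – | – | – | 0.8 | 0.53 | 0.73 | 0 | 0.27 |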
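The lower weights of Table 2 follow in the same way from Eqs. (12) and (13). As before, this is a sketch with hypothetical function names, and $W_5(x_7)$ is assumed to be 0 so that all classifiers are correct for $n = 7$.

```python
# True labels j_n and outputs W_k(x_n) of the ten-sample example.
# Assumption: W_5(x_7) = 0, so that all classifiers are correct for n = 7.
J = [1, 1, 0, 1, 1, 1, 0, 0, 0, 0]
PRED = [
    [1, 0, 0, 0, 1, 1, 0, 0, 0, 0],  # W_1
    [0, 0, 0, 1, 1, 1, 0, 0, 0, 1],  # W_2
    [1, 1, 0, 1, 0, 1, 0, 0, 0, 0],  # W_3
    [0, 0, 1, 0, 1, 0, 0, 1, 0, 0],  # W_4
    [0, 0, 0, 0, 1, 0, 0, 1, 0, 0],  # W_5
]


def lc(pred, truth, k, n):
    """Coefficient LC_k^n of Eq. (13)."""
    if pred[k][n] == truth[n]:
        return 0.0  # classifier k is correct on sample n
    n_clf = len(pred)
    others_correct = sum(pred[l][n] == truth[n] for l in range(n_clf) if l != k)
    return others_correct / (n_clf - 1)


def lower_weights(pred, truth):
    """Column sums of LC_k^n and the lower weights of Eq. (12)."""
    totals = [
        sum(lc(pred, truth, k, n) for n in range(len(truth)))
        for k in range(len(pred))
    ]
    top = max(totals)
    return totals, [1 - t / top for t in totals]


totals, w_lower = lower_weights(PRED, J)
print(totals)                          # [0.75, 1.75, 1.0, 3.75, 2.75]
print([round(w, 2) for w in w_lower])  # [0.8, 0.53, 0.73, 0.0, 0.27]
```

The classifier with the largest sum of $LC_k^n$ (here $W_4$) always receives the lower weight 0, since its sum equals the maximum in the denominator of Eq. (12).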
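Table 3. The values of the $UC_{ki}^n$ coefficients and the upper values of the weights $\overline{w}_{ki}$ (the case of the component classifiers' correctness).

| $n$ | $j_n$ | $UC_{10}^n$ | $UC_{20}^n$ | $UC_{30}^n$ | $UC_{40}^n$ | $UC_{50}^n$ | $UC_{11}^n$ | $UC_{21}^n$ | $UC_{31}^n$ | $UC_{41}^n$ | $UC_{51}^n$ |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | – | – | – | – | – | 0.25 | 0 | 0.25 | 0 | 0 |
| 2 | 1 | – | – | – | – | – | 0 | 0 | 0 | 0 | 0 |
| 3 | 0 | 0.75 | 0.75 | 0.75 | 0 | 0.75 | – | – | – | – | – |
| 4 | 1 | – | – | – | – | – | 0 | 0.25 | 0.25 | 0 | 0 |
| 5 | 1 | – | – | – | – | – | 0.75 | 0.75 | 0 | 0.75 | 0.75 |
| 6 | 1 | – | – | – | – | – | 0.5 | 0.5 | 0.5 | 0 | 0 |
| 7 | 0 | 1 | 1 | 1 | 1 | 1 | – | – | – | – | – |
| 8 | 0 | 0.5 | 0.5 | 0.5 | 0 | 0 | – | – | – | – | – |
| 9 | 0 | 1 | 1 | 1 | 1 | 1 | – | – | – | – | – |
| 10 | 0 | 0.75 | 0 | 0.75 | 0.75 | 0.75 | – | – | – | – | – |
| $\sum_{n=1}^{N} UC_{ki}^n$ | – | 4 | 3.25 | 4 | 2.75 | 3.5 | 1.5 | 1.5 | 1 | 0.75 | 0.75 |
| $\overline{w}_{ki}$ | – | 1 | 0.8 | 1 | 0.7 | 0.9 | 1 | 1 | 0.7 | 0.5 | 0.5 |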
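The class-specific upper weights of Eqs. (16) and (17) can be checked against Table 3 with the following sketch. The function name is hypothetical, and $W_5(x_7)$ is again assumed to be 0 in accordance with the statement that all classifiers are correct for $n = 7$.

```python
# True labels j_n and outputs W_k(x_n) of the ten-sample example.
# Assumption: W_5(x_7) = 0, so that all classifiers are correct for n = 7.
J = [1, 1, 0, 1, 1, 1, 0, 0, 0, 0]
PRED = [
    [1, 0, 0, 0, 1, 1, 0, 0, 0, 0],  # W_1
    [0, 0, 0, 1, 1, 1, 0, 0, 0, 1],  # W_2
    [1, 1, 0, 1, 0, 1, 0, 0, 0, 0],  # W_3
    [0, 0, 1, 0, 1, 0, 0, 1, 0, 0],  # W_4
    [0, 0, 0, 0, 1, 0, 0, 1, 0, 0],  # W_5
]


def class_upper_weights(pred, truth, classes):
    """Per-class sums of UC_ki^n (Eq. 17) and upper weights (Eq. 16)."""
    n_clf, n_obs = len(pred), len(truth)
    totals = {}
    for i in classes:
        totals[i] = []
        for k in range(n_clf):
            s = 0.0
            for n in range(n_obs):
                # Only samples of class i that classifier k labels correctly count.
                if truth[n] == i and pred[k][n] == i:
                    others = sum(pred[l][n] == i for l in range(n_clf) if l != k)
                    s += others / (n_clf - 1)
            totals[i].append(s)
    weights = {i: [t / max(totals[i]) for t in totals[i]] for i in classes}
    return totals, weights


totals, weights = class_upper_weights(PRED, J, classes=(0, 1))
print(totals[0])   # [4.0, 3.25, 4.0, 2.75, 3.5]
print(totals[1])   # [1.5, 1.5, 1.0, 0.75, 0.75]
print(weights[0])  # [1.0, 0.8125, 1.0, 0.6875, 0.875]
```

Rounded to one decimal place, the weights for class 0 give the values 1, 0.8, 1, 0.7 and 0.9 listed in the last row of Table 3.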
Before the application of the decision rules (20) and (21), the upper and lower values of the weights (16) and (18) are normalised to unity for each class. It means that, for the upper values and for the lower values separately, the sum of the weights of all the base classifiers for the given class is equal to unity.

3.2. Interval-valued weights for the incorrectness of the base classifiers

Now, the way of calculating the upper and lower values of the weights will be presented in which the classification error of the individual component classifiers is considered. In this case the upper value of the weight $\overline{w}_k$ of the k-th classifier refers to the situation in which the k-th classifier was correct while the other committee classifiers pointed to the incorrect decision. The lower value of the weight $\underline{w}_k$, in turn, describes the situation in which the k-th classifier made errors and the other classifiers made errors as well. The upper value of the weight is obtained from the dependence:
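$$\overline{w}_k = \frac{\sum_{n=1}^{N} UE_k^n}{\max_{1 \le l \le K} \sum_{n=1}^{N} UE_l^n}, \tag{22}$$

where

$$UE_k^n = I(W_k(x_n) = j_n) \, \frac{\sum_{l=1, l \ne k}^{K} I(W_l(x_n) \ne j_n)}{K - 1}. \tag{23}$$

Then, the lower value of the weight is obtained from the dependence:

$$\underline{w}_k = 1 - \frac{\sum_{n=1}^{N} LE_k^n}{\max_{1 \le l \le K} \sum_{n=1}^{N} LE_l^n}, \tag{24}$$

where

$$LE_k^n = I(W_k(x_n) \ne j_n) \, \frac{\sum_{l=1, l \ne k}^{K} I(W_l(x_n) \ne j_n)}{K - 1}. \tag{25}$$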
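For comparison, Eqs. (22)–(25) can be applied to the same ten-sample example used in Tables 1 and 2. The paper does not tabulate these values, so the numbers produced below are only an illustration; the function names are hypothetical and $W_5(x_7)$ is assumed to be 0, as the statement that all classifiers are correct for $n \in \{7, 9\}$ requires.

```python
# True labels j_n and outputs W_k(x_n) of the ten-sample example.
# Assumption: W_5(x_7) = 0, so that all classifiers are correct for n = 7.
J = [1, 1, 0, 1, 1, 1, 0, 0, 0, 0]
PRED = [
    [1, 0, 0, 0, 1, 1, 0, 0, 0, 0],  # W_1
    [0, 0, 0, 1, 1, 1, 0, 0, 0, 1],  # W_2
    [1, 1, 0, 1, 0, 1, 0, 0, 0, 0],  # W_3
    [0, 0, 1, 0, 1, 0, 0, 1, 0, 0],  # W_4
    [0, 0, 0, 0, 1, 0, 0, 1, 0, 0],  # W_5
]


def others_wrong(pred, truth, k, n):
    """Fraction of the other K - 1 classifiers that misclassify sample n."""
    n_clf = len(pred)
    wrong = sum(pred[l][n] != truth[n] for l in range(n_clf) if l != k)
    return wrong / (n_clf - 1)


def error_based_weights(pred, truth):
    """Sums of UE_k^n (Eq. 23) and LE_k^n (Eq. 25) with weights (22), (24)."""
    n_clf, n_obs = len(pred), len(truth)
    ue = [0.0] * n_clf
    le = [0.0] * n_clf
    for k in range(n_clf):
        for n in range(n_obs):
            frac = others_wrong(pred, truth, k, n)
            if pred[k][n] == truth[n]:
                ue[k] += frac   # k correct while others err
            else:
                le[k] += frac   # k errs together with others
    upper = [u / max(ue) for u in ue]
    lower = [1 - v / max(le) for v in le]
    return ue, le, upper, lower


ue, le, upper, lower = error_based_weights(PRED, J)
print(ue)     # [2.5, 2.25, 4.0, 0.5, 0.75]
print(le)     # [1.25, 1.25, 0.0, 2.25, 2.25]
print(upper)  # [0.625, 0.5625, 1.0, 0.125, 0.1875]
```

In this example $W_3$ obtains both the largest upper weight and the lower weight 1, because it is often correct when the others fail and its only error is made alone.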
The classification rules for the classifiers' committee are marked as $W_{IVuewMV}$ and $W_{IVlewMV}$ in the described case. In the rule $W_{IVuewMV}$ the upper values of the weights are applied. These are calculated following the dependence (22). The rule is presented as:

$$W_{IVuewMV}(x) = \arg\max_{1 \le i \le M} \sum_{k=1}^{K} \overline{w}_k \, I(W_k(x) = i). \tag{26}$$

In the rule $W_{IVlewMV}$ the lower values of the weights are applied. These are calculated following the dependence (24). The rule is presented as:

$$W_{IVlewMV}(x) = \arg\max_{1 \le i \le M} \sum_{k=1}^{K} \underline{w}_k \, I(W_k(x) = i). \tag{27}$$

As in the case described before, the values of the weights are normalised to unity before the application of the respective decision rules. If every classifier has as many weights as there are predefined classes in the recognition task, then the upper value of the weight $\overline{w}_{ki}$ for the described problem is obtained from the following dependence:

$$\overline{w}_{ki} = \frac{\sum_{n=1}^{N} UE_{ki}^n}{\max_{1 \le l \le K} \sum_{n=1}^{N} UE_{li}^n}, \tag{28}$$

where

$$UE_{ki}^n = I(W_k(x_n) = i,\ j_n = i) \, \frac{\sum_{l=1, l \ne k}^{K} I(W_l(x_n) \ne i,\ j_n = i)}{K - 1}. \tag{29}$$

However, the lower value of the weight $\underline{w}_{ki}$ is obtained from the dependence:

$$\underline{w}_{ki} = 1 - \frac{\sum_{n=1}^{N} LE_{ki}^n}{\max_{1 \le l \le K} \sum_{n=1}^{N} LE_{li}^n}, \tag{30}$$
where

$$LE_{ki}^n = I(W_k(x_n) \ne i,\ j_n = i) \, \frac{\sum_{l=1, l \ne k}^{K} I(W_l(x_n) \ne i,\ j_n = i)}{K - 1}. \tag{31}$$

The classification rules for the classifiers' committee are marked as $W_{IVuewcMV}$ and $W_{IVlewcMV}$ in the described case. They are presented according to the following dependencies:

$$W_{IVuewcMV}(x) = \arg\max_{1 \le i \le M} \sum_{k=1}^{K} \overline{w}_{ki} \, I(W_k(x) = i) \tag{32}$$

and

$$W_{IVlewcMV}(x) = \arg\max_{1 \le i \le M} \sum_{k=1}^{K} \underline{w}_{ki} \, I(W_k(x) = i). \tag{33}$$

In this case, the upper and lower values of the weights obtained from the dependencies (28) and (30), respectively, are also normalised to unity in each of the classes before the application of the above rules.

Having weights calculated in accordance with the dependencies (10), (12), (16), (18), (22), (24), (28) and (30), the decision rules for an ensemble in which each base classifier returns the a posteriori probability can be easily defined. For example, in the sum rule $\hat{p}_k(i|x)$ is multiplied by the appropriate weight.

4. Experimental studies

In the experimental research 12 data sets were tested. Ten data sets come from the UCI repository (2007). The remaining two have random observations generated in accordance with an assumed distribution. One of them has objects generated according to the procedure of Duin et al. (2007); this is the so-called banana distribution. The second one has random objects drawn in accordance with the procedure of Highleyman (1962), the Highleyman distribution. The numbers of attributes, classes and available examples of the investigated data sets are given in Table 4.

The research assumes that the group of classifiers is composed of seven base classifiers. Three of them work according to the k-NN rule, where the parameter k is from the set $k \in \{3, 5, 7\}$. The four remaining base classifiers are decision trees, in which the Gini index or entropy is used in the decision-making nodes. The results are obtained via the 10-fold cross-validation method.

Table 5 presents the classification accuracy for the base classifiers and their respective average ranks obtained from the Friedman test (Derrac et al., 2011; García et al., 2010). Table 6 shows the results for the ensembles of classifiers with the voting method and their respective average ranks. The following abbreviations are used in Tables 6–9: 1w, weights calculated from formula (10); 1wc, weights calculated from formula (16); 2w, weights calculated from formula (12); 2wc, weights calculated from formula (18); and so on.

Several groups of algorithms were distinguished, and the average ranks were calculated for each of them. One of the groups consists of all the base classifiers as well as the ensembles of classifiers using the voting method. The average ranks for this case are labelled in Tables 5 and 6 as "RL, AR (all)". The second major group consists of all the base classifiers and the ensembles of classifiers using a fusion of a posteriori probabilities. The ranks obtained for this group are labelled accordingly as "ML, AR (all)". Tables 7–9 also present the average ranks within the group of classifiers that combine the base classifiers' outputs in the same way.

Table 4. Description of data sets selected for the experiments.

| Data set | Examples | Attributes | Classes |
|---|---|---|---|
| Banana | 400 | 2 | 2 |
| Breast Tissue | 106 | 10 | 6 |
| Dermatology | 366 | 33 | 6 |
| Glass Identification | 214 | 10 | 6 |
| Haberman's Survival | 306 | 3 | 2 |
| Highleyman | 400 | 2 | 2 |
| Ionosphere | 351 | 34 | 2 |
| Iris | 150 | 4 | 3 |
| Pima Indians Diabetes | 768 | 8 | 2 |
| Sonar (Mines vs. Rocks) | 208 | 60 | 2 |
| Vertebral Column | 310 | 6 | 3 |
| Wine | 178 | 13 | 3 |
Table 5. The classification accuracy for base classifiers and their respective average rank obtained from the Friedman test.

| Data set | $W_1$ | $W_2$ | $W_3$ | $W_4$ | $W_5$ | $W_6$ | $W_7$ |
|---|---|---|---|---|---|---|---|
| Banana | 0.971 | 0.97 | 0.969 | 0.958 | 0.95 | 0.949 | 0.939 |
| Breast | 0.57 | 0.541 | 0.5 | 0.681 | 0.627 | 0.695 | 0.595 |
| Dermat. | 0.875 | 0.863 | 0.848 | 0.922 | 0.891 | 0.93 | 0.932 |
| Glass | 0.657 | 0.613 | 0.612 | 0.638 | 0.617 | 0.684 | 0.594 |
| Haber. | 0.746 | 0.739 | 0.739 | 0.687 | 0.691 | 0.694 | 0.7 |
| Highl. | 0.921 | 0.914 | 0.914 | 0.918 | 0.915 | 0.904 | 0.909 |
| Ionos. | 0.841 | 0.833 | 0.823 | 0.887 | 0.881 | 0.883 | 0.864 |
| Iris | 0.965 | 0.967 | 0.976 | 0.954 | 0.952 | 0.954 | 0.948 |
| Pima | 0.739 | 0.746 | 0.759 | 0.731 | 0.744 | 0.735 | 0.736 |
| Sonar | 0.803 | 0.792 | 0.778 | 0.694 | 0.692 | 0.698 | 0.634 |
| Verteb. | 0.819 | 0.83 | 0.829 | 0.827 | 0.793 | 0.821 | 0.791 |
| Wine | 0.854 | 0.823 | 0.793 | 0.927 | 0.904 | 0.884 | 0.868 |
| RL, AR (all) | 9.67 | 11.13 | 11.54 | 11.71 | 14.75 | 13.38 | 16.25 |
| ML, AR (all) | 22.13 | 25.58 | 26.92 | 28.54 | 34.58 | 30.21 | 36.29 |
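The average ranks used in the Friedman test can be sketched as follows: on every data set the classifiers are ranked by accuracy (rank 1 for the best, tied classifiers share the mean rank), and the ranks are then averaged over the data sets. The function name is hypothetical, and the example uses only five rows and three columns of Table 5, so its output is an illustration and does not reproduce the ranks reported in the paper, which are computed over larger groups of classifiers.

```python
def average_ranks(acc):
    """acc[d][c] is the accuracy of classifier c on data set d; rank 1 is best."""
    n_clf = len(acc[0])
    rank_sums = [0.0] * n_clf
    for row in acc:
        order = sorted(range(n_clf), key=lambda c: -row[c])
        pos = 0
        while pos < n_clf:
            end = pos
            # Extend the block while the next classifier has the same accuracy.
            while end + 1 < n_clf and row[order[end + 1]] == row[order[pos]]:
                end += 1
            mean_rank = (pos + end) / 2 + 1  # tied classifiers share this rank
            for c in order[pos:end + 1]:
                rank_sums[c] += mean_rank
            pos = end + 1
    return [s / len(acc) for s in rank_sums]


# Accuracies of W_1, W_2, W_3 on five of the data sets of Table 5.
ACC = [
    [0.971, 0.970, 0.969],  # Banana
    [0.570, 0.541, 0.500],  # Breast
    [0.875, 0.863, 0.848],  # Dermat.
    [0.746, 0.739, 0.739],  # Haber. (W_2 and W_3 are tied)
    [0.965, 0.967, 0.976],  # Iris
]
print(average_ranks(ACC))  # [1.4, 2.1, 2.5]
```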
Table 6. The classification accuracy for ensemble of classifiers with voting method and their respective average rank obtained from the Friedman test.

| Data set | V | wV | wcV | 1wV | 1wcV | 2wV | 2wcV | 3wV | 3wcV | 4wV | 4wcV |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Banana | 0.964 | 0.964 | 0.964 | 0.964 | 0.964 | 0.967 | 0.966 | 0.969 | 0.966 | 0.966 | 0.965 |
| Breast | 0.632 | 0.657 | 0.657 | 0.657 | 0.657 | 0.681 | 0.676 | 0.7 | 0.678 | 0.692 | 0.681 |
| Dermat. | 0.943 | 0.943 | 0.943 | 0.943 | 0.944 | 0.939 | 0.94 | 0.934 | 0.94 | 0.939 | 0.943 |
| Glass | 0.675 | 0.69 | 0.699 | 0.687 | 0.697 | 0.691 | 0.693 | 0.687 | 0.687 | 0.699 | 0.697 |
| Haber. | 0.718 | 0.718 | 0.718 | 0.718 | 0.718 | 0.718 | 0.718 | 0.713 | 0.718 | 0.729 | 0.727 |
| Highl. | 0.915 | 0.915 | 0.915 | 0.915 | 0.915 | 0.918 | 0.916 | 0.916 | 0.92 | 0.916 | 0.918 |
| Ionos. | 0.908 | 0.908 | 0.908 | 0.908 | 0.908 | 0.906 | 0.907 | 0.907 | 0.904 | 0.905 | 0.905 |
| Iris | 0.952 | 0.952 | 0.952 | 0.952 | 0.952 | 0.959 | 0.957 | 0.952 | 0.961 | 0.974 | 0.963 |
| Pima | 0.76 | 0.76 | 0.76 | 0.76 | 0.76 | 0.756 | 0.757 | 0.752 | 0.754 | 0.757 | 0.757 |
| Sonar | 0.783 | 0.783 | 0.783 | 0.783 | 0.783 | 0.756 | 0.756 | 0.741 | 0.75 | 0.734 | 0.767 |
| Verteb. | 0.845 | 0.843 | 0.844 | 0.844 | 0.844 | 0.846 | 0.844 | 0.844 | 0.839 | 0.839 | 0.83 |
| Wine | 0.907 | 0.912 | 0.911 | 0.912 | 0.911 | 0.916 | 0.911 | 0.92 | 0.909 | 0.911 | 0.904 |
| RL, AR (all) | 8.33 | 7.67 | 7.21 | 7.58 | 7.13 | 6.54 | 7.46 | 8.17 | 8.46 | 6.88 | 7.17 |
| AR (group) | 6.67 | 6.08 | 5.63 | 6.00 | 5.54 | 5.13 | 5.96 | 6.67 | 7.04 | 5.63 | 5.67 |
Table 7. The classification accuracy for ensemble of classifiers with sum method and their respective average rank obtained from the Friedman test.

| Data set | S | wS | wcS | 1wS | 1wcS | 2wS | 2wcS | 3wS | 3wcS | 4wS | 4wcS |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Banana | 0.96 | 0.961 | 0.961 | 0.96 | 0.96 | 0.966 | 0.964 | 0.966 | 0.962 | 0.963 | 0.962 |
| Breast | 0.681 | 0.692 | 0.681 | 0.697 | 0.692 | 0.703 | 0.695 | 0.711 | 0.686 | 0.714 | 0.7 |
| Dermat. | 0.949 | 0.949 | 0.949 | 0.949 | 0.949 | 0.947 | 0.947 | 0.945 | 0.948 | 0.942 | 0.947 |
| Glass | 0.717 | 0.719 | 0.717 | 0.717 | 0.713 | 0.728 | 0.726 | 0.717 | 0.713 | 0.712 | 0.719 |
| Haber. | 0.72 | 0.72 | 0.718 | 0.72 | 0.719 | 0.717 | 0.717 | 0.714 | 0.716 | 0.723 | 0.722 |
| Highl. | 0.917 | 0.917 | 0.917 | 0.917 | 0.917 | 0.921 | 0.921 | 0.92 | 0.92 | 0.918 | 0.92 |
| Ionos. | 0.911 | 0.912 | 0.912 | 0.912 | 0.912 | 0.919 | 0.915 | 0.918 | 0.909 | 0.908 | 0.915 |
| Iris | 0.95 | 0.95 | 0.95 | 0.95 | 0.95 | 0.954 | 0.95 | 0.952 | 0.948 | 0.963 | 0.952 |
| Pima | 0.765 | 0.767 | 0.767 | 0.765 | 0.767 | 0.77 | 0.771 | 0.766 | 0.77 | 0.767 | 0.767 |
| Sonar | 0.787 | 0.784 | 0.783 | 0.787 | 0.787 | 0.767 | 0.767 | 0.745 | 0.753 | 0.748 | 0.758 |
| Verteb. | 0.848 | 0.847 | 0.848 | 0.848 | 0.848 | 0.847 | 0.85 | 0.841 | 0.844 | 0.841 | 0.839 |
| Wine | 0.921 | 0.921 | 0.921 | 0.921 | 0.921 | 0.916 | 0.912 | 0.92 | 0.912 | 0.904 | 0.905 |
| ML, AR (all) | 18.46 | 16 | 17.42 | 16.96 | 17.29 | 10.96 | 13.38 | 16.38 | 19.92 | 17.17 | 15.92 |
| AR (group) | 6.54 | 5.67 | 6.25 | 5.88 | 6.13 | 4.08 | 4.75 | 6.33 | 7.75 | 6.88 | 5.75 |
Table 8. The classification accuracy for ensemble of classifiers with prod method and their respective average rank obtained from the Friedman test.

| Data set | P | wP | wcP | 1wP | 1wcP | 2wP | 2wcP | 3wP | 3wcP | 4wP | 4wcP |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Banana | 0.961 | 0.961 | 0.961 | 0.961 | 0.961 | 0.961 | 0.961 | 0.964 | 0.964 | 0.964 | 0.958 |
| Breast | 0.643 | 0.643 | 0.643 | 0.643 | 0.649 | 0.643 | 0.627 | 0.646 | 0.624 | 0.646 | 0.589 |
| Dermat. | 0.92 | 0.92 | 0.92 | 0.92 | 0.92 | 0.92 | 0.919 | 0.92 | 0.924 | 0.92 | 0.904 |
| Glass | 0.658 | 0.658 | 0.659 | 0.658 | 0.661 | 0.658 | 0.659 | 0.67 | 0.662 | 0.658 | 0.652 |
| Haber. | 0.725 | 0.725 | 0.725 | 0.725 | 0.725 | 0.725 | 0.729 | 0.718 | 0.732 | 0.723 | 0.725 |
| Highl. | 0.922 | 0.922 | 0.922 | 0.922 | 0.922 | 0.922 | 0.92 | 0.921 | 0.911 | 0.918 | 0.916 |
| Ionos. | 0.915 | 0.915 | 0.915 | 0.915 | 0.915 | 0.915 | 0.912 | 0.917 | 0.916 | 0.919 | 0.912 |
| Iris | 0.952 | 0.952 | 0.952 | 0.952 | 0.952 | 0.952 | 0.959 | 0.954 | 0.961 | 0.961 | 0.959 |
| Pima | 0.762 | 0.762 | 0.763 | 0.762 | 0.763 | 0.762 | 0.764 | 0.767 | 0.767 | 0.767 | 0.764 |
| Sonar | 0.741 | 0.741 | 0.741 | 0.741 | 0.741 | 0.741 | 0.741 | 0.742 | 0.748 | 0.742 | 0.742 |
| Verteb. | 0.84 | 0.84 | 0.84 | 0.84 | 0.84 | 0.84 | 0.841 | 0.842 | 0.837 | 0.836 | 0.825 |
| Wine | 0.841 | 0.841 | 0.841 | 0.841 | 0.841 | 0.841 | 0.841 | 0.841 | 0.845 | 0.841 | 0.841 |
| ML, AR (all) | 23.79 | 23.79 | 23.25 | 23.79 | 22.71 | 23.79 | 23.67 | 20 | 20.42 | 20.63 | 28.54 |
| AR (group) | 6.71 | 6.71 | 6.17 | 6.71 | 5.63 | 6.71 | 6.29 | 4.04 | 3.71 | 5.08 | 8.25 |
For the ensembles of classifiers that use the voting method together with the ways of calculating weights suggested in this work, only minor differences in the average ranks were obtained (Table 6). The best outcomes were obtained for the suggested ensembles in which the weights are calculated according to the dependence (12). However, the improvement of the classification quality in comparison to the other ensembles of classifiers is not statistically significant. For the analysis that also takes into account the outcomes obtained by the base classifiers, the critical value was calculated on the basis of the post-hoc Nemenyi test. For the twelve investigated data sets and the eighteen classifiers constituting the group, this value at the significance level 0.05 is 7.6. The best ensemble of classifiers from the pool is therefore statistically better than two base classifiers, whereas the classifier $W_{MV}$ is better than only one base classifier.

The post-hoc analysis using the Bonferroni–Dunn test for each of the groups of classifiers (Tables 7–9) that share the same way of combining the classifiers' outputs gives critical values equal to 3.8 and 3.5 for the significance levels 0.05 and 0.1, respectively. The ways of calculating weights suggested in this work improve the average ranks of the ensembles of classifiers. In two cases these are the ensembles using the weights calculated with the dependence (12). It must be noted that the improvement is not statistically significant.
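The Bonferroni–Dunn critical values quoted above can be checked with a short calculation. The sketch assumes the common critical-difference form $CD = q_\alpha \sqrt{k(k+1)/(6N)}$, with $q_\alpha$ taken as the two-sided normal quantile corrected for $k - 1$ comparisons with a control; the paper does not state the formula, so this is an assumption, and the function name is hypothetical. Each group in Tables 7–9 contains $k = 11$ ensembles evaluated on $N = 12$ data sets.

```python
from math import sqrt
from statistics import NormalDist


def bonferroni_dunn_cd(k, n_datasets, alpha):
    """Critical difference of average ranks for k classifiers on n_datasets sets.

    Assumption: q_alpha is the two-sided standard normal quantile with the
    Bonferroni correction for k - 1 comparisons against a control classifier.
    """
    q_alpha = NormalDist().inv_cdf(1 - alpha / (2 * (k - 1)))
    return q_alpha * sqrt(k * (k + 1) / (6 * n_datasets))


print(round(bonferroni_dunn_cd(11, 12, 0.05), 1))  # 3.8
print(round(bonferroni_dunn_cd(11, 12, 0.10), 1))  # 3.5
```

Under this assumption, two ensembles in the same group differ significantly only if their "AR (group)" values differ by more than the critical difference.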
1629
Table 9
The classification accuracy for ensemble of classifiers with average method and their respective average rank obtained from the Friedman test.
Data set    A      wA     wcA    1wA    1wcA   2wA    2wcA   3wA    3wcA   4wA    4wcA
Banana      0.96   0.961  0.961  0.96   0.96   0.966  0.964  0.966  0.962  0.963  0.961
Breast      0.681  0.692  0.681  0.697  0.689  0.703  0.681  0.711  0.686  0.714  0.673
Dermat.     0.949  0.949  0.949  0.949  0.949  0.947  0.947  0.945  0.946  0.942  0.937
Glass       0.717  0.719  0.719  0.717  0.717  0.728  0.725  0.717  0.712  0.712  0.717
Haber.      0.72   0.72   0.718  0.72   0.719  0.717  0.717  0.714  0.716  0.723  0.72
Highl.      0.917  0.917  0.917  0.917  0.917  0.921  0.921  0.92   0.916  0.918  0.919
Ionos.      0.911  0.912  0.912  0.912  0.912  0.919  0.915  0.918  0.905  0.908  0.91
Irys        0.95   0.95   0.95   0.95   0.95   0.954  0.959  0.952  0.957  0.963  0.961
Pima        0.765  0.767  0.767  0.765  0.767  0.77   0.771  0.766  0.77   0.767  0.767
Sonar       0.787  0.784  0.783  0.787  0.787  0.767  0.767  0.745  0.753  0.748  0.77
Verteb.     0.848  0.847  0.848  0.848  0.848  0.847  0.85   0.841  0.844  0.841  0.836
Wine        0.921  0.921  0.921  0.921  0.921  0.916  0.912  0.92   0.914  0.904  0.912
AR (all)    18.46  16     16.88  16.96  17     10.96  12.38  16.38  19.92  17.17  19.46
AR (group)  6.38   5.33   5.79   5.75   5.75   4.29   4.63   6.42   7.83   6.58   7.25
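The AR (group) row of Table 9 can be recomputed from the accuracies in the table. The sketch below is not the author's code: it assumes Friedman-style ranking within each data set, where rank 1 goes to the highest accuracy and tied values share the mean of the ranks they occupy. The names `rank_row`, `average_ranks`, `METHODS`, `ACCURACY` and `AVERAGE_RANKS` are hypothetical.

```python
METHODS = ["A", "wA", "wcA", "1wA", "1wcA", "2wA",
           "2wcA", "3wA", "3wcA", "4wA", "4wcA"]

# Accuracies from Table 9: one row per data set, one value per method.
ACCURACY = {
    "Banana":  [0.96, 0.961, 0.961, 0.96, 0.96, 0.966, 0.964, 0.966, 0.962, 0.963, 0.961],
    "Breast":  [0.681, 0.692, 0.681, 0.697, 0.689, 0.703, 0.681, 0.711, 0.686, 0.714, 0.673],
    "Dermat.": [0.949, 0.949, 0.949, 0.949, 0.949, 0.947, 0.947, 0.945, 0.946, 0.942, 0.937],
    "Glass":   [0.717, 0.719, 0.719, 0.717, 0.717, 0.728, 0.725, 0.717, 0.712, 0.712, 0.717],
    "Haber.":  [0.72, 0.72, 0.718, 0.72, 0.719, 0.717, 0.717, 0.714, 0.716, 0.723, 0.72],
    "Highl.":  [0.917, 0.917, 0.917, 0.917, 0.917, 0.921, 0.921, 0.92, 0.916, 0.918, 0.919],
    "Ionos.":  [0.911, 0.912, 0.912, 0.912, 0.912, 0.919, 0.915, 0.918, 0.905, 0.908, 0.91],
    "Irys":    [0.95, 0.95, 0.95, 0.95, 0.95, 0.954, 0.959, 0.952, 0.957, 0.963, 0.961],
    "Pima":    [0.765, 0.767, 0.767, 0.765, 0.767, 0.77, 0.771, 0.766, 0.77, 0.767, 0.767],
    "Sonar":   [0.787, 0.784, 0.783, 0.787, 0.787, 0.767, 0.767, 0.745, 0.753, 0.748, 0.77],
    "Verteb.": [0.848, 0.847, 0.848, 0.848, 0.848, 0.847, 0.85, 0.841, 0.844, 0.841, 0.836],
    "Wine":    [0.921, 0.921, 0.921, 0.921, 0.921, 0.916, 0.912, 0.92, 0.914, 0.904, 0.912],
}


def rank_row(scores):
    """Rank one data set's scores: 1 = best, ties share the mean rank."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = [0.0] * len(scores)
    pos = 0
    while pos < len(order):
        # Extend the run of tied scores starting at position pos.
        end = pos
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1  # mean of the 1-based ranks pos+1 .. end+1
        for j in range(pos, end + 1):
            ranks[order[j]] = shared
        pos = end + 1
    return ranks


def average_ranks(table):
    """Average each method's rank over all data sets."""
    rows = [rank_row(scores) for scores in table.values()]
    return [sum(column) / len(rows) for column in zip(*rows)]


AVERAGE_RANKS = dict(zip(METHODS, average_ranks(ACCURACY)))
for method, rank in AVERAGE_RANKS.items():
    print(f"{method:5s} {rank:.3f}")
```

With the rounded accuracies above, this gives 6.375 for A and about 4.292 for 2wA, in line with the 6.38 and 4.29 listed in the table. The AR (all) row cannot be reproduced this way, because it also ranks classifiers that do not appear in Table 9.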
The obtained values of the average ranks and the post hoc analysis using the Nemenyi test show that the ways of calculating weights suggested in this work improve the classification quality of the ensemble of classifiers. In particular, this statement refers to the comparison of the suggested algorithms with the base classifiers. In the group consisting of all the base classifiers and the ensembles of classifiers that fuse a posteriori probabilities, the lowest ranks were obtained for the weighted fusion methods WIVlcwS and WIVlcwM proposed in this article. The critical value for this group of forty classifiers is 18.55 and 17.55 for the significance levels 0.05 and 0.01, respectively. The classification quality of WIVlcwS and WIVlcwM is therefore statistically significantly different from that of three or four base classifiers. The quality of the classifiers WwS and WwM is statistically significantly different from that of only two base classifiers (at the investigated levels of significance). The research also confirms that an ensemble of classifiers using the discrimination functions of the base classifiers achieves better classification quality than an ensemble based on voting.
5. Conclusion
In the paper two new ways of calculating the weights of classifiers entering the committee were presented. In calculating the weights, the proposed methods take into account the predictions of all classifiers. For each way of calculating weights, the relevant upper and lower values were suggested in this work. The range between the upper and lower value of a weight denotes the uncertainty in estimating the classification quality of an individual classifier in the context of the quality of the other classifiers entering the committee.
In the experimental research, ten benchmark data sets and two sets with randomly generated distributions were used. Nonparametric statistical tests were used in the analysis. The obtained average ranks indicate an improvement in classification quality for the proposed methods of calculating weights. For the investigated data sets, this improvement is in most cases not statistically significant. However, the number of base classifiers that give statistically worse results than the committees of classifiers using the suggested weights increases.
Acknowledgement
This work is supported by the Polish National Science Center under a Grant N N519 650440 for the period 2011–2014.
References
Asuncion, A., Newman, D.J., 2007. UCI Machine Learning Repository. University of California, Irvine, School of Information and Computer Sciences. http://www.ics.uci.edu/~mlearn/MLRepository.html.
Alefeld, G., Herzberger, J., 1983. Introduction to interval computations. Computer science and applied mathematics. Academic Press.
Alkoot, F.M., Kittler, J., 1999. Experimental evaluation of expert fusion strategies. Pattern Recogn. Lett. 20 (11–13), 1361–1369.
Bhadra, S., Nath, J., Ben-Tal, A., Bhattacharyya, C., 2009. Interval data classification under partial information: a chance-constraint approach. In: Theeramunkong, T., Kijsirikul, B., Cercone, N., Ho, T.-B. (Eds.), Advances in Knowledge Discovery and Data Mining, Lecture Notes in Computer Science, 5476. Springer, Berlin Heidelberg, pp. 208–219.
Chen, D., Cheng, X., 2001. An asymptotic analysis of some expert fusion methods. Pattern Recogn. Lett. 22 (8), 901–904.
Cyganek, B., 2012. One-class support vector ensembles for image segmentation and classification. J. Math. Imaging Vision 42 (2–3), 103–117.
Derrac, J., García, S., Molina, D., Herrera, F., 2011. A practical tutorial on the use of nonparametric statistical tests as a methodology for comparing evolutionary and swarm intelligence algorithms. Swarm Evol. Comput. 1 (1), 3–18.
Duin, R., Juszczak, P., Paclik, P., Pekalska, E., de Ridder, D., Tax, D., Verzakov, S., 2007. PR-Tools4.1. A matlab toolbox for pattern recognition. Delft University of Technology.
García, S., Fernández, A., Luengo, J., Herrera, F., 2010. Advanced nonparametric tests for multiple comparisons in the design of experiments in computational intelligence and data mining: experimental analysis of power. Inf. Sci. 180 (10), 2044–2064.
Highleyman, W.H., 1962. The design and analysis of pattern recognition experiments. Bell Syst. Tech. J. 41, 723–744.
Kittler, J., Alkoot, F.M., 2003. Sum versus vote fusion in multiple classifier systems. IEEE Trans. Pattern Anal. Mach. Intell. 25 (1), 110–115.
Kittler, J., Hatef, M., Duin, R.P.W., Matas, J., 1998. On combining classifiers. IEEE Trans. Pattern Anal. Mach. Intell. 20 (3), 226–239.
Kulczycki, P., Kowalski, P.A., 2011. Bayes classification of imprecise information of interval type. Control Cybern. 40 (1), 101–123.
Kuncheva, L., Jain, L., 2000. Designing classifier fusion systems by genetic algorithms. IEEE Trans. Evol. Comput. 4 (4), 327–336.
Kuncheva, L.I., 2004. Combining Pattern Classifiers: Methods and Algorithms. Wiley-Interscience.
Silva, F., de Carvalho, F.A.T., Souza, R., Silva, J., 2006. A modal symbolic classifier for interval data. In: King, I., Wang, J., Chan, L.-W., Wang, D. (Eds.), Neural Information Processing, Lecture Notes in Computer Science, 4233. Springer, Berlin Heidelberg, pp. 50–59.
Viertl, R., 1996. Statistical Methods for Non-precise Data. CRC Press Inc.
Woloszynski, T., Kurzynski, M., 2011. A probabilistic model of classifier competence for dynamic ensemble selection. Pattern Recogn. 44 (10–11), 2656–2668.
Woods, K., Kegelmeyer Jr., W.P., Bowyer, K., 1997. Combination of multiple classifiers using local accuracy estimates. IEEE Trans. Pattern Anal. Mach. Intell. 19 (4), 405–410.
Wozniak, M., Jackowski, K., 2009. Some remarks on chosen methods of classifier fusion based on weighted voting. In: Corchado, E., Wu, X., Oja, E., Herrero, A., Baruque, B. (Eds.), Hybrid Artificial Intelligence Systems, Lecture Notes in Computer Science, 5572. Springer, Berlin Heidelberg, pp. 541–548.
Xu, L., Krzyzak, A., Suen, C.Y., 1992. Methods for combining multiple classifiers and their applications to handwriting recognition. IEEE Trans. Syst. Man Cybern. 22, 418–435.
Zhang, C.-X., Duin, R.P.W., 2011. An experimental study of one- and two-level classifier fusion for different sample sizes. Pattern Recogn. Lett. 32 (14), 1756–1767.