Dissimilarity-based multi-instance learning using dictionary learning and sparse coding ensembles

Nazanin Moarref, Yusuf Yaslan

Istanbul Technical University, Faculty of Computer and Informatics Engineering, Sariyer 34469, Istanbul, Turkey
Article history: Received 18 January 2019; Revised 26 September 2019; Accepted 26 September 2019
Keywords: Dictionary learning; Sparse coding; Multi-instance learning; Random subspace; Bagging
Abstract

In multi-instance learning problems, samples are represented by multisets called bags, and each bag includes a set of feature vectors called instances. This distinguishes multi-instance learning from classical supervised learning. In this paper, to convert a multi-instance learning problem into a supervised learning problem, fixed-size feature vectors of bags are computed using a dissimilarity-based method. Then, dictionary learning based bagging and random subspace ensemble classification models are proposed to exploit the underlying discriminative structure of the dissimilarity-based features. Experimental results are obtained on 11 datasets from different multi-instance learning problem domains. It is shown that the proposed random subspace based dictionary ensemble algorithm gives the best results on 8 datasets in terms of classification accuracy and area under the curve.
1. Introduction

The multi-instance learning (MIL) problem was proposed in [1] and has recently been applied to many real-world problems such as drug activity prediction, content-based image retrieval and classification, and text/document classification [2]. In a MIL problem, classifiers deal with sets of instances that are grouped into bags. In a traditional supervised classification model, each example is a fixed-size feature vector, whereas in a MIL problem each example (bag) consists of multiple feature vectors [3]. The standard assumption is that the label of a bag is positive if and only if it contains at least one positive instance; otherwise, the bag is labeled as negative. Classifiers use bags (for bag-based or embedded-based methods) or instances (for instance-based methods) and their labels to train models [4]. In instance-based methods, classifiers use instances to predict the labels of the bags; classifiers such as Bayesian approaches, neural networks, decision trees, random forests, and SVMs have been used in this setting [5].

In addition to the classification model, the feature representation is also important for the performance of any classification problem. Sparse coding and dictionary learning methods have recently attracted researchers' interest by representing each sample as sparsely as possible, i.e., as a linear combination of basic elements called atoms. Many research fields have benefited from sparse coding and dictionary learning techniques, such as signal denoising, image processing [6], feature extraction [7], supervised learning [8], unsupervised clustering [9], and semi-supervised dictionary learning [10]. Dictionary learning has previously been used for MIL problems in [11], by generating class-specific, bag-level diverse
classifiers. The impressive performance of dictionary learning showed that sparse representations are naturally discriminative [11]. Among the many methods that convert a MIL problem into a supervised learning problem [4], the dissimilarity-based method applied in [12] has been shown to be a successful approach. However, SVM does not appear to be sufficient to discriminate bags in dissimilarity feature spaces. As a contribution, in this paper we build on these two related works and apply a discriminative dictionary learning framework that uses the dissimilarity values to learn a structured dictionary, and ensembles of these dictionaries are proposed as MIL classifiers.

During the last decade, ensemble methods have been applied to many classification problems. Making decisions according to more than one classifier yields more reliable decisions and increases classification performance. Random subspace and bagging are among the most successful ensemble models, and extensive research has been done in these fields [13]. We therefore propose two methods to train ensemble classifiers for MIL problems. The first utilizes the random subspace algorithm to select instances as prototypes, and the dissimilarities of each bag to the selected prototypes are used as feature values. In the second, all instances are used as prototypes and the dissimilarities of each bag to all prototypes are taken into consideration; bagging is then applied to the bags represented by dissimilarity-based feature vectors.

The remainder of the paper is organized as follows. In Section 2, the dissimilarity-based MIL problem is explained, followed by its combination with the ensemble-based random subspace technique [12], one of the recently successful approaches in MIL; dictionary learning and sparse coding are described afterward. In Section 3, the proposed method is explained in detail. In Section 4, the experimental results are presented, and in Section 5, we conclude the article by emphasizing the main findings.

2. Methodology

2.1. Dissimilarity-based multi-instance learning

In a MIL problem, a bag is a set $B_j = \{x_{ij}, y_j \mid i = 1, 2, \ldots, N_j\}$, where the $x_{ij} \in \mathbb{R}^d$ are instances (feature vectors), $y_j$ is the bag label, and $N_j$ is the number of instances in the $j$th bag, which can vary from bag to bag. All instances belong to the $d$-dimensional feature space called the instance space. The goal is to learn a model that can predict the labels of unseen bags.

Unlike most existing work on MIL, in this paper a bag is not represented by its instances; instead, bags are represented by their dissimilarities to reference instances called prototypes. In [14] the prototypes are selected by clustering. However, clustering the instances risks excluding informative instances, and the obtained feature space may not carry sufficient information about the problem. The dissimilarity-based MIL method was proposed in [12]. The aim is to convert the MIL problem into a supervised learning problem by computing the dissimilarities between bags and preselected prototypes. Let $d(\cdot)$ be the dissimilarity function, $T = \{B_i, y_i \mid i = 1, 2, \ldots, M\}$ the set of training bags, $r_T = \{x_{ij} \mid i = 1, 2, \ldots, N_j,\ j = 1, 2, \ldots, M\}$ the set of all instances in the training set, and $r = \{r_1, r_2, \ldots, r_z\}$ a set of $z$ instances randomly selected from $r_T$ as prototypes. Using the dissimilarity values, each bag $B_i$ in the training set is represented by a vector $v_{\text{instance}} = [d(B_i, r_1), d(B_i, r_2), \ldots, d(B_i, r_z)] \in \mathbb{R}^z$, and the MIL problem is thus converted into a supervised learning problem. Several dissimilarity metrics were evaluated in [5], and the experimental results show that the Euclidean distance is a reasonable choice for many datasets; it is therefore used here as the dissimilarity function $d(\cdot)$. The instance-based dissimilarity between bags and prototypes can be written as follows:
$$v_{\text{instance}} = [\,d(B_i, r_1),\, d(B_i, r_2),\, \ldots,\, d(B_i, r_z)\,] \qquad (1)$$

where, for a prototype instance $r_p$, the dissimilarity of $B_i$ to $r_p$ is

$$d(B_i, r_p) = \min_{l \in \{1, 2, \ldots, N_i\}} d(x_{li}, r_p) \qquad (2)$$
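For concreteness, the following is a minimal numpy sketch of the bag representation in Eqs. (1) and (2). The array shapes and the function name are illustrative assumptions, not part of the original implementation.

```python
import numpy as np

def bag_dissimilarity(bag, prototypes):
    """Eqs. (1)-(2): represent a bag by its minimum Euclidean distance
    to each prototype instance (illustrative sketch).

    bag        : (N_i, d) array of the instances in one bag
    prototypes : (z, d) array of prototype instances r_1, ..., r_z
    returns    : (z,) dissimilarity feature vector v_instance
    """
    # pairwise Euclidean distances between bag instances and prototypes
    diff = bag[:, None, :] - prototypes[None, :, :]   # shape (N_i, z, d)
    dists = np.sqrt((diff ** 2).sum(axis=-1))         # shape (N_i, z)
    return dists.min(axis=0)                          # min over instances, Eq. (2)
```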
After representing the bags by dissimilarity feature vectors, SVM is used for classification, and it has been shown that this method, Dissimilarity-based Random Subspace with SVM (DRSSVM), outperforms counterparts such as MILES [4], MinimaxSVM [15], MILBoost [16], MI-SVM [17], and EM-DD [18]. Following this approach, in this paper the Dissimilarity-based Random Subspace (DRS) algorithm is combined with the dictionary learning method (DL). We describe these algorithms in more detail in the next section.

2.2. Dictionary learning and sparse coding

In this section, we give details about general dictionary learning and sparse coding methods. Consider an input sample $y \in \mathbb{R}^n$. The dictionary is a matrix of normalized basis vectors (atoms) $d_i$ with $d_i^T d_i = 1$, written $D = [d_1, d_2, \ldots, d_k]$, where $D \in \mathbb{R}^{n \times k}$. The number of atoms is usually greater than the signal dimension ($k > n$); in this case, the dictionary is called over-complete. The coefficient vector $\alpha \in \mathbb{R}^k$ is called the sparse code. Using the dictionary, the input signal can be represented as a linear combination of atoms, formulated as:
$$\min_{\alpha} \|\alpha\|_0 \quad \text{s.t.} \quad y = D\alpha \qquad (3)$$
$\|\alpha\|_0$ is the $L_0$ norm of the coefficient vector $\alpha$, i.e., the number of its non-zero elements. To represent the signal as sparsely as possible, we seek the minimum number of non-zero elements.
When the dictionary is over-complete, finding the sparsest representation is difficult: it requires a combinatorial search, which makes the problem NP-hard. Instead of the $L_0$ norm, the $L_1$ norm can therefore be used, and the sparse representation is formulated as:
$$\min_{\alpha} \|\alpha\|_1 \quad \text{s.t.} \quad y = D\alpha \qquad (4)$$
Then the general form can be formulated as follows:
$$\alpha^{*} = \arg\min_{\alpha} \|\alpha\|_1 \quad \text{s.t.} \quad \|D\alpha - y\|_2 \le \epsilon \qquad (5)$$
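As an illustration of the $L_1$ formulation above, the sketch below computes a sparse code in the penalized (Lagrangian) form used in Eq. (6), with the dictionary held fixed, using scikit-learn's Lasso solver. The solver choice and the penalty value are assumptions, and Lasso's internal $1/(2n)$ rescaling of the data-fitting term means its alpha corresponds to $\lambda$ only up to a constant factor.

```python
import numpy as np
from sklearn.linear_model import Lasso

def sparse_code(y, D, lam=0.1):
    """Approximate alpha* = argmin ||y - D alpha||_2^2 + lam * ||alpha||_1.

    y : (n,) input signal; D : (n, k) dictionary with unit-norm atoms.
    """
    lasso = Lasso(alpha=lam, fit_intercept=False, max_iter=5000)
    lasso.fit(D, y)          # columns of D play the role of regression features
    return lasso.coef_       # sparse coefficient vector alpha in R^k
```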
Dictionary learning constructs the dictionary directly from the input signals. To this end, the dictionary can be obtained by solving the following minimization problem:
$$\min_{D, \alpha} \|y - D\alpha\|_2^2 + \lambda \|\alpha\|_1 \qquad (6)$$
where $y$ is the input signal, $\|y - D\alpha\|_2^2$ is the data-fitting term that measures the reconstruction error, $\|\alpha\|_1$ is the regularization term that controls the sparsity of the decomposition, and $\lambda$ is the penalty parameter that balances the trade-off between the data-fitting and regularization terms. With both $D$ and $\alpha$ as variables, this optimization problem is non-convex. The standard solution fixes one of the variables so that the minimization becomes a convex problem in the other, which guarantees an optimal solution with respect to that variable. Consequently, the objective is solved by two optimization steps, applied iteratively until the desired convergence is reached:

• Sparse approximation step: using the dictionary $D$, the coefficients $\alpha$ of the signal $y$ are computed by solving the resulting convex problem. (At the start, the dictionary is initialized arbitrarily.)
• Dictionary update step: using the computed sparse coding matrix $\alpha$, the dictionary is updated. Updating the dictionary reduces the approximation error at each iteration.

3. Proposed dictionary learning based ensemble multiple-instance learning

In this study, dissimilarity-based MIL is addressed using dictionary learning as a base classifier. MIL datasets consist of bags and their associated feature vectors (instances). To convert the MIL problem into a standard supervised learning problem, the dissimilarities of the bags to the selected instance prototypes are calculated using Eq. (1); each bag is thereby represented by dissimilarity values as features. Sparse coding can help decrease the dimensionality, so the classifiers deal with less complex problems and their performance can increase [19].

Dictionary learning is not only used for reconstruction but can also serve discriminative purposes. Discriminative dictionaries are an appropriate strategy for classifying input data. To this end, the class labels of the input data are involved in learning the dictionaries, which results in a different representation for each class; for each class of input data, a separate dictionary is obtained. For input data with $m$ class labels, the base dictionary $D$ is the concatenation of $m$ sub-dictionaries, $D = [D_1, D_2, \ldots, D_m]$, where $D_i \in \mathbb{R}^{n \times k}$. To classify a test sample, the signal is encoded with each of the sub-dictionaries, and it is assigned to the class of the sub-dictionary that yields the smallest reconstruction error together with the sparsest representation. In more detail, the steps are:

• Obtain the sparse code $\alpha_i$ of the signal $y$ for each sub-dictionary $D_i$, which is trained on one class of the input data.
• Compare the representation cost $\delta_i(y)$ of each sub-dictionary with its corresponding sparse code, and assign the class label of the sub-dictionary with the minimum representation cost:
$$i^{*} = \arg\min_{i \in \{1, \ldots, m\}} \delta_i(y) \qquad (7)$$

where

$$\delta_i(y) = \min_{\alpha_i} \|y - D_i \alpha_i\|_2^2 + \lambda \|\alpha_i\|_1 \qquad (8)$$
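A compact sketch of the class-specific scheme in Eqs. (7) and (8) is given below. It assumes the training samples are the dissimilarity feature vectors of Section 2.1 (rows of X) with labels y, and it uses scikit-learn's DictionaryLearning as a stand-in for the SPAMS routines used in the experiments; the atom count and λ are illustrative values.

```python
import numpy as np
from sklearn.decomposition import DictionaryLearning

def train_class_dictionaries(X, y, n_atoms=25, lam=0.1):
    """Learn one sub-dictionary D_i per class. The alternating sparse-coding /
    dictionary-update steps described above are handled inside DictionaryLearning."""
    dicts = {}
    for c in np.unique(y):
        dl = DictionaryLearning(n_components=n_atoms, alpha=lam,
                                transform_algorithm='lasso_lars',
                                transform_alpha=lam)
        dl.fit(X[y == c])          # fit on the samples of class c only
        dicts[c] = dl
    return dicts

def classify(x, dicts, lam=0.1):
    """Eqs. (7)-(8): assign x to the class whose sub-dictionary yields the
    smallest reconstruction error plus sparsity cost."""
    costs = {}
    for c, dl in dicts.items():
        alpha = dl.transform(x[None, :])[0]      # sparse code of x under D_c
        recon = alpha @ dl.components_           # D_c alpha
        costs[c] = np.sum((x - recon) ** 2) + lam * np.abs(alpha).sum()
    return min(costs, key=costs.get)
```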
Using Eqs. (7) and (8), supervised dictionary learning and sparse coding can be used as a classifier that generates the sparse representations of signals and classifies them simultaneously.

A classifier can suffer from the curse of dimensionality when the number of features is too large. Similarly, when the number of prototypes is too large, the dissimilarity-based MIL algorithm can also suffer from high dimensionality and from many redundant, uninformative dissimilarities. Therefore, ensemble learning methods are applied to deal with this problem; in this paper, random subspace and bagging are considered.

Random subspace is one of the most commonly used ensemble learning methods: it generates subspaces from randomly selected features, and in each subspace the model is trained on a subset of features randomly selected from the entire feature set. The advantage of this method is that, for high-dimensional data, it may reduce the problems arising from the curse of dimensionality. In our random subspace-based method, Dissimilarity-based Random Subspace with Dictionary Learning (DRSDL), some instances are randomly selected from all training instances as prototypes, the instance-based dissimilarity of each bag to the prototype instances is calculated, and the bags with the obtained feature values are used to train the dictionary learning and sparse coding model and to classify unseen bags.
Fig. 1. Flow chart of the proposed DRSDL model.

Table 1
Pseudo-code for dissimilarity-based MIL using random subspace and dictionary learning.

Algorithm: Dissimilarity-Based Multi-Instance Learning Using Random Subspace and Dictionary Learning
Input: training set T, the number of ensemble classifiers s, subspace size z, base classifier D, number of samples in the test set t, prototypes r = {r1, r2, ..., rz}
Output: labels of the test set y = [y1, y2, ..., yt]
Training:
For i = 1 : s
    Create a subspace using z randomly selected instances: r = {r1, r2, ..., rz}.
    Represent each bag in T in v_instance form (Eq. (1)).
    Train two separate dictionaries: D_i1, D_i2 (Eqs. (5) and (6)).
End for
Test:
Represent each bag in the test set in v_instance form.
For i = 1 : s
    Predict the test bag labels using D_i1, D_i2 (Eq. (7)).
End for
Output: Classify each bag using the posterior probability: y
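To connect Table 1 with the earlier sketches, the following is a minimal, illustrative version of the DRSDL training and prediction loop. It reuses the hypothetical bag_dissimilarity, train_class_dictionaries, and classify helpers sketched above, uses a simple majority vote in place of the posterior-probability combination, and treats the ensemble size s and subspace size z as placeholders.

```python
import numpy as np

def train_drsdl(bags, labels, all_instances, s=20, z=50, seed=0):
    """Sketch of DRSDL training (Table 1): s random prototype subspaces,
    each paired with per-class dictionaries learned on dissimilarity features."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    ensemble = []
    for _ in range(s):
        idx = rng.choice(len(all_instances), size=z, replace=False)
        protos = all_instances[idx]                                   # random prototypes
        X = np.array([bag_dissimilarity(b, protos) for b in bags])    # Eq. (1)
        ensemble.append((protos, train_class_dictionaries(X, labels)))
    return ensemble

def predict_drsdl(bag, ensemble):
    """Combine the ensemble members by majority vote (a simple stand-in for
    the posterior-probability combination in Table 1)."""
    votes = [classify(bag_dissimilarity(bag, protos), dicts)
             for protos, dicts in ensemble]
    return max(set(votes), key=votes.count)
```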
The second approach used in this paper is bagging, the most straightforward approach for manipulating the inputs of the training set. In this technique, samples are selected randomly (with replacement) from the entire training set, which helps to create diverse classifiers [20]. In our bagging-based method, Dissimilarity-based Bagging Subspace with Dictionary Learning (DBSDL), all training instances are selected as prototypes, so the feature space consists of the instance-based dissimilarities of each bag to all existing instances. Each subspace is then generated by randomly selecting bags, and on each subspace the model is trained by dictionary learning and sparse coding-based classifiers; a minimal sketch of this procedure is given at the end of this section.

The pseudo-code of the proposed DRSDL and DBSDL techniques is given in Tables 1 and 2 respectively, and their flowcharts are shown in Figs. 1 and 2 respectively. The computational complexity of the proposed methods is dominated by the dictionary learning and sparse coding part, which has two steps:
• Approximating the sparse representation of the training data.
• Updating the dictionary using the sparse representation.
These steps can be interpreted as a PCA approach with multiple subspaces, or as the K-means method clustering a signal over multiple locations. A detailed complexity analysis of dictionary learning is given in [21].
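Analogously to the DRSDL sketch above, a minimal sketch of the DBSDL training loop described in this section (and formalized in Table 2 below) is given here. All instances serve as prototypes, each ensemble member is trained on a bootstrap sample of bags, and the helpers are the hypothetical ones from the earlier sketches.

```python
import numpy as np

def train_dbsdl(bags, labels, all_instances, s=20, seed=0):
    """Sketch of DBSDL training (Table 2): fixed dissimilarity feature space
    using all training instances as prototypes, with bagging over bags."""
    rng = np.random.default_rng(seed)
    X = np.array([bag_dissimilarity(b, all_instances) for b in bags])  # Eq. (1)
    y = np.asarray(labels)
    ensemble = []
    for _ in range(s):
        idx = rng.choice(len(bags), size=len(bags), replace=True)      # bootstrap sample
        ensemble.append(train_class_dictionaries(X[idx], y[idx]))
    return ensemble
```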
4. Experimental results

DRSDL and DBSDL are compared with DRSSVM, MILES, MILBoost, and minimaxSVM on 11 different MIL datasets [22]. To obtain an accurate picture of the role of DL in the ensembles, Dissimilarity-based Bagging Subspace with SVM (DBSSVM) is also evaluated.
Table 2
Pseudo-code for dissimilarity-based MIL using bagging and dictionary learning.

Algorithm: Dissimilarity-Based Multi-Instance Learning Using Bagging and Dictionary Learning
Input: training set T, the number of ensemble classifiers s, base classifier D, the number of samples in the test set t, prototypes r = {r1, r2, ..., rz}
Output: labels of the test set y = [y1, y2, ..., yt]
Training:
For i = 1 : s
    Use all the instances in the training set as prototypes.
    Represent each bag in the training set in v_instance form (Eq. (1)).
    Create a subspace by randomly selecting bags from T with replacement.
    Train two separate dictionaries: D_i1, D_i2 (Eqs. (5) and (6)).
End for
Test:
Represent each bag in the test set in v_instance form (Eq. (1)).
For i = 1 : s
    Predict the test bag labels using D_i1, D_i2 (Eq. (7)).
End for
Output: Classify each bag using the posterior probability: y

Table 3
MIL datasets used for experimental results.

Dataset              Total bags   Positive bags   Negative bags   Total instances
Tiger                200          100             100             1220
Fox                  200          100             100             1320
Elephant             200          100             100             1391
Musk2                102          63              39              6598
Musk1                92           47              45              476
alt.atheism          100          51              49              5443
comp.graphics        100          52              48              3094
rec.autos            100          51              49              3458
sci.crypt            100          51              49              4284
sci-med              100          51              49              3045
talk.politics.guns   100          51              49              3558
Similarly, to assess the role of the ensembles, Dissimilarity-based Dictionary Learning (DDL) and Dissimilarity-based SVM (DSVM), in which no ensemble method is used, are also examined. Table 3 lists the number of positive and negative bags and the number of instances in each dataset. The Tiger, Fox, and Elephant datasets are among the most frequently used benchmarks in MIL and come from the image categorization field: the bags are images and the instances are image segments. Images containing the relevant animal are considered positive bags, and images that do not contain it are negative bags. Musk1 and Musk2 concern molecule activity prediction problems.
Fig. 2. Flowchart of the proposed DBSDL model.
Table 4
Number of atoms and percentage of selected prototypes.

Dataset              DDL   DBSDL   DRSDL   Percentage
Tiger                100   50      200     20%
Fox                  150   50      200     50%
Elephant             50    25      200     5%
Musk2                25    25      25      10%
Musk1                25    25      25      30%
alt.atheism          25    25      50      20%
comp.graphics        50    50      200     50%
rec.autos            50    50      100     50%
sci.crypt             25    25      100     30%
sci-med              50    25      200     50%
talk.politics.guns   50    25      100     50%
Table 5
Accuracy (%) and standard error (SE) results.

Dataset              MILES          MILBoost       minimaxSVM     DRSSVM
                     ACC (SE)       ACC (SE)       ACC (SE)       ACC (SE)
Tiger                76.00 (2.59)   76.00 (5.57)   61.50 (2.69)   81.00 (2.56)
Fox                  65.00 (3.07)   55.50 (2.52)   58.50 (2.11)   64.50 (2.17)
Elephant             78.50 (2.11)   87.00 (2.26)   69.00 (3.48)   83.50 (3.08)
Musk2                93.00 (2.60)   63.00 (4.48)   37.00 (4.73)   89.00 (3.48)
Musk1                82.22 (5.02)   66.67 (6.83)   60.00 (4.12)   88.94 (3.32)
alt.atheism          51.00 (2.33)   42.00 (3.27)   66.00 (3.71)   69.05 (1.69)
comp.graphics        46.00 (6.00)   43.00 (4.48)   60.00 (2.58)   54.02 (1.95)
rec.autos            49.00 (4.58)   53.00 (7.16)   61.00 (2.33)   61.23 (2.43)
sci.crypt            53.00 (5.97)   52.00 (3.27)   62.00 (2.91)   70.14 (2.79)
sci-med              45.00 (3.73)   59.00 (5.04)   63.00 (1.53)   71.94 (3.62)
talk.politics.guns   41.00 (5.04)   54.00 (6.36)   63.00 (2.13)   76.05 (2.98)
In this problem, the classifiers try to decide whether a molecule has a musky smell or not. A molecule can take different shapes, which are folded into conformers; each bag is therefore a molecule and each instance is one of its conformers. If at least one of the conformers can make the molecule smell musky, the bag has a positive label. The remaining datasets are from the 20Newsgroups dataset, a text categorization problem generated from 20 different categories. A bag consists of different posts (instances) from different categories; positive bags contain 3% of their posts from the relevant category and 97% from other categories.

The SPAMS tool [23], LIBSVM [24], and PRtools [25] are used for the dictionary learning-based methods, the SVM-based methods, and the MIL-based approaches (MILES, MILBoost, and MinimaxSVM), respectively. Experimental results are obtained using ten-fold cross-validation, with all methods evaluated on the same training and test sets. All dissimilarity values for the training and test sets are normalized to zero mean and unit variance.

The number of atoms differs for each dictionary-based method (DRSDL, DBSDL, and DDL). For each method and dataset, the accuracies are computed for different numbers of atoms (25, 50, 100, 150, 200, 250, or 300), and the number of atoms is selected on the validation set, so the dictionaries are constructed with the number of atoms that gives the best prediction accuracy. Beyond 300 atoms, the accuracies of the models drop to around 50% and are not reported. Since the number of features in each dataset is data-dependent, in the random subspace ensemble methods (DRSSVM and DRSDL) the number of selected features differs per dataset. Because there is no information about the number of redundant features, several subspace sizes are examined: 5%, 10%, 20%, 30%, 40%, and 50% of the dissimilarity features are evaluated for each dataset, and the size that gives the best accuracy is selected as the random subspace size. For example, the Elephant dataset achieves its best accuracy using 5% of the training instances, whereas Fox performs best with 50%. In this way, we aimed to obtain better results using at most 50% of the features in the training sets. In the bagging subspace ensemble methods (DBSDL, DBSSVM), all of the training bags are sampled with replacement. In both ensemble methods, the number of subspaces is set to 20; using the same number of subspaces for bagging and random subspace keeps the parameters fixed so that the performance of each ensemble method can be compared fairly. Table 4 lists the number of atoms used for each dictionary-based method and the percentage of prototypes (z) selected from all training bag instances (the random subspace dimension).

In our initial experiments we compared the previously proposed DRSSVM method with three strong MIL approaches: MILES, MILBoost, and MinimaxSVM. The classification accuracy and AUC results are given in Tables 5 and 6, respectively. As shown in [12], DRSSVM generally outperforms its counterparts in our experiments; therefore, the proposed methods are compared with DRSSVM. These classification accuracy and AUC results are given in Tables 7 and 8, respectively.
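A minimal sketch of this evaluation protocol (ten-fold cross-validation with per-fold standardization of the dissimilarity features) is shown below. It assumes the bags have already been converted to a dissimilarity matrix X with bag labels y, and fit_and_score is a placeholder for any of the compared models (DRSDL, DBSDL, DRSSVM, ...).

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.preprocessing import StandardScaler

def cross_validate(X, y, fit_and_score, n_splits=10, seed=0):
    """Ten-fold CV with zero-mean/unit-variance normalization of the
    dissimilarity features, as described in the text (illustrative sketch)."""
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    accs = []
    for train_idx, test_idx in skf.split(X, y):
        scaler = StandardScaler().fit(X[train_idx])   # statistics from the training fold only
        acc = fit_and_score(scaler.transform(X[train_idx]), y[train_idx],
                            scaler.transform(X[test_idx]), y[test_idx])
        accs.append(acc)
    # mean accuracy and its standard error over the folds
    return np.mean(accs), np.std(accs) / np.sqrt(n_splits)
```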
Fig. 3. Classification accuracy of DRSSVM and DBSDL on test datasets.

Table 6
AUC (%) and standard error (SE) results.

Dataset              MILES          MILBoost       minimaxSVM     DRSSVM
                     AUC (SE)       AUC (SE)       AUC (SE)       AUC (SE)
Tiger                87.24 (2.40)   79.53 (8.13)   73.85 (2.93)   84.50 (2.54)
Fox                  72.78 (2.79)   64.12 (4.23)   64.10 (3.48)   68.85 (2.36)
Elephant             85.84 (1.84)   92.43 (2.05)   89.86 (2.11)   89.95 (2.19)
Musk2                97.14 (1.90)   68.56 (5.78)   76.55 (5.05)   90.33 (3.21)
Musk1                87.89 (5.04)   70.76 (8.07)   78.04 (6.09)   95.75 (1.75)
alt.atheism          46.49 (6.38)   52.52 (5.49)   51.42 (2.55)   76.70 (3.76)
comp.graphics        48.72 (6.36)   41.63 (4.26)   54.33 (4.12)   52.00 (2.00)
rec.autos            55.49 (4.89)   54.94 (7.28)   46.67 (3.71)   61.90 (2.81)
sci.crypt            50.65 (5.61)   51.69 (4.33)   56.11 (6.19)   78.28 (3.36)
sci-med              52.93 (7.68)   66.41 (4.31)   54.53 (2.86)   78.05 (4.46)
talk.politics.guns   35.11 (6.15)   55.53 (6.63)   61.12 (4.93)   89.30 (2.83)
Table 7
Accuracy (%) and standard error (SE) results.

Dataset              DSVM           DDL            DBSSVM         DBSDL          DRSSVM         DRSDL
                     ACC (SE)       ACC (SE)       ACC (SE)       ACC (SE)       ACC (SE)       ACC (SE)
Tiger                83.00 (2.71)   80.00 (3.16)   82.50 (2.50)   83.89 (1.93)   81.00 (2.56)   87.00 (2.00)
Fox                  63.00 (2.81)   57.50 (1.86)   74.00 (2.96)   73.50 (3.17)   64.50 (2.17)   65.50 (1.57)
Elephant             71.50 (2.59)   84.00 (2.97)   75.00 (2.98)   83.00 (3.67)   83.50 (3.08)   87.50 (3.27)
Musk2                85.09 (3.10)   76.45 (4.16)   89.00 (3.48)   80.27 (2.63)   89.00 (3.48)   82.27 (3.28)
Musk1                87.80 (2.61)   85.02 (3.85)   90.02 (2.51)   90.28 (3.11)   88.94 (3.32)   86.16 (3.98)
alt.atheism          69.80 (3.07)   73.07 (3.34)   68.92 (2.86)   81.07 (2.29)   69.05 (1.69)   90.18 (2.02)
comp.graphics        54.02 (1.95)   62.88 (2.64)   55.13 (2.32)   76.32 (3.54)   54.02 (1.95)   90.25 (2.79)
rec.autos            59.23 (2.87)   65.25 (3.90)   60.12 (3.11)   73.12 (4.02)   61.23 (2.43)   91.00 (2.77)
sci.crypt            68.94 (3.81)   73.16 (4.17)   69.93 (2.16)   78.07 (2.88)   70.14 (2.79)   90.07 (2.02)
sci-med              64.92 (3.10)   67.83 (2.28)   68.03 (2.39)   79.85 (2.75)   71.94 (3.62)   83.98 (3.71)
talk.politics.guns   66.14 (2.89)   63.92 (3.12)   72.94 (3.04)   83.05 (2.50)   76.05 (2.98)   87.87 (2.59)
Table 8
AUC (%) and standard error (SE) results.

Dataset              DSVM           DDL            DBSSVM         DBSDL          DRSSVM         DRSDL
                     AUC (SE)       AUC (SE)       AUC (SE)       AUC (SE)       AUC (SE)       AUC (SE)
Tiger                83.00 (2.71)   80.00 (3.16)   88.75 (2.66)   89.20 (2.14)   84.50 (2.54)   92.05 (2.05)
Fox                  63.00 (2.81)   58.00 (2.00)   85.70 (2.30)   84.70 (2.67)   68.85 (2.36)   71.50 (2.16)
Elephant             71.50 (2.59)   74.11 (4.83)   89.35 (2.62)   91.55 (2.75)   89.95 (2.19)   93.80 (2.21)
Musk2                85.65 (3.29)   74.11 (4.83)   96.46 (2.06)   92.77 (2.17)   90.33 (3.21)   79.73 (4.12)
Musk1                88.00 (2.32)   85.25 (3.81)   98.40 (1.06)   96.60 (2.00)   95.75 (1.75)   93.24 (2.43)
alt.atheism          70.00 (2.98)   73.17 (3.39)   80.42 (2.51)   91.73 (2.67)   76.70 (3.76)   97.60 (1.36)
comp.graphics        52.00 (2.00)   62.25 (2.42)   53.25 (2.24)   90.23 (3.29)   52.00 (2.00)   96.57 (1.25)
rec.autos            58.50 (2.79)   64.92 (3.82)   61.05 (3.43)   88.83 (3.74)   61.90 (2.81)   98.80 (1.20)
sci.crypt            68.67 (3.81)   72.92 (4.11)   82.60 (3.30)   86.87 (2.84)   78.28 (3.36)   97.87 (0.95)
sci-med              64.92 (3.10)   67.83 (2.33)   75.27 (4.16)   92.38 (1.27)   78.05 (4.46)   96.27 (1.75)
talk.politics.guns   65.67 (2.80)   63.83 (3.14)   88.30 (3.23)   93.65 (2.30)   89.30 (2.83)   95.83 (1.59)
Fig. 4. Classification accuracy of DRSSVM and DBSSVM on test datasets.
Fig. 5. Classification accuracy of DRSDL and DBSDL on test datasets.
Fig. 6. Classification accuracy of DDL and DSVM on test datasets.
As shown in these two tables, the DRSDL approach outperforms not only the DRSSVM technique but also the bagging-based DBSDL and DBSSVM algorithms, as well as the simple dissimilarity-based methods without ensembles, DDL and DSVM. The best performance of the proposed method is observed on the newsgroup datasets: on rec.autos, the proposed DRSDL algorithm achieves 91% classification accuracy and 98.8% AUC, whereas DRSSVM achieves 61.23% and 61.90%; DBSDL 73.12% and 88.83%; DBSSVM 60.12% and 61.05%; DDL 65.25% and 64.92%; and DSVM 59.23% and 58.50%, in terms of classification accuracy and AUC respectively. Tables 7 and 8 thus show the prominence of DRSDL. The second most successful method, DBSDL, is also compared with DRSSVM in Fig. 3, where it can be seen that DBSDL performs better in most cases.
Fig. 7. Area under curve of MIL datasets for DRSDL, DRSSVM, DBSDL, DBSSVM, DDL, DSVM methods.
It can be inferred that these two proposed methods outperform the other MIL methods as well.

The next comparisons concern the ensemble strategies. To compare the performance of the random subspace and bagging methods, we also compare dissimilarity-based random subspace and bagging methods using different base classifiers (DRSSVM with DBSSVM in Fig. 4, and DRSDL with DBSDL in Fig. 5). DRSSVM and DBSSVM show approximately similar results in most cases, with DRSSVM performing slightly better. Comparing DRSDL with DBSDL in Fig. 5, DRSDL has higher performance in most cases; with dictionary learning, the random subspace method clearly outperforms bagging. To illustrate the benefit of the ensembles, one can compare the results of the DSVM and DDL methods in Fig. 6, where only single classifiers are trained on all the dissimilarity values: performance decreases significantly. Comparing DDL and DSVM, DL appears to perform slightly better than SVM.

The area under the curve performance of all methods on each dataset is shown in Fig. 7. DRSDL has the best AUC performance in most cases, and DBSDL has the second-best AUC performance compared to the remaining approaches (DRSSVM, DBSSVM, DSVM, and DDL). Combining DL with ensembles outperforms the SVM ensembles; DL-based ensembles appear to generate more diverse classifiers than SVM-based ensembles. Notably, in ensemble methods what matters is not using a high-performance base classifier, but applying methods that generate more diverse classifiers in each subspace. The main limitation of dissimilarity-based models is the number of prototypes, and this limitation also holds for the proposed algorithms. In addition, the number of atoms in the dictionaries is another parameter to be determined. However, parameter optimization is a general issue for any classifier, and these parameters can be selected on validation sets.

5. Conclusion

In this paper, dissimilarity-based dictionary learning and sparse coding ensemble methods are proposed for multi-instance learning problems. The dissimilarity-based part converts the problem into a supervised problem, and dictionary learning and sparse coding are combined with random subspace and bagging for classification. Detailed experimental results on benchmark datasets show significantly higher performance compared to current state-of-the-art models. As future work, we will investigate the diversity of the classifiers in the ensembles and its effect on overall performance.

Declaration of Competing Interest

The authors declare that there is no conflict of interest.

Supplementary material

Supplementary material associated with this article can be found, in the online version, at doi:10.1016/j.compeleceng.2019.106482.

References

[1] Dietterich TG, Lathrop RH, Lozano-Pérez T. Solving the multiple instance problem with axis-parallel rectangles. Artif Intell 1997;89(1):31–71.
[2] Kotzias D, Denil M, De Freitas N, Smyth P. From group to individual labels using deep features. In: Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM; 2015. p. 597–606.
[3] Foulds J, Frank E. A review of multi-instance learning assumptions. Knowl Eng Rev 2010;25(01):1–25.
[4] Chen Y, Bi J, Wang JZ. MILES: multiple-instance learning via embedded instance selection. IEEE Trans Pattern Anal Mach Intell 2006;28(12):1931–47.
[5] Cheplygina V. Dissimilarity-based multiple instance learning. Ph.D. thesis. TU Delft; 2015.
[6] Hand EM, Castillo C, Chellappa R. Doing the best we can with what we have: multi-label balancing with selective learning for attribute prediction. In: Thirty-Second AAAI Conference on Artificial Intelligence; 2018.
[7] Kocyigit G, Yaslan Y. DEMIAL: an active learning framework for multiple instance image classification using dictionary ensembles. Turk J Electr Eng Comput Sci 2018;26(1):593–604.
[8] Cheng E-J, Prasad M, Puthal D, Sharma N, Prasad OK, Chin P-H, et al. Deep learning based face recognition with sparse representation classification. In: International Conference on Neural Information Processing. Springer; 2017. p. 665–74.
[9] Ramirez I, Sprechmann P, Sapiro G. Classification and clustering via dictionary learning with structured incoherence and shared features. In: Computer Vision and Pattern Recognition (CVPR), 2010 IEEE Conference on. IEEE; 2010. p. 3501–8.
[10] Wang Z, Dong Y, Mao S, Wang X. Internet multimedia traffic classification from QoS perspective using semi-supervised dictionary learning models. China Commun 2017;14(10):202–18.
[11] Qiao M, Liu L, Yu J, Xu C, Tao D. Diversified dictionaries for multi-instance learning. Pattern Recognit 2017;64:407–16.
[12] Cheplygina V, Tax DM, Loog M. Dissimilarity-based ensembles for multiple instance learning. IEEE Trans Neural Netw Learn Syst 2016;27(6):1379–91.
[13] Polikar R. Ensemble based systems in decision making. IEEE Circuits Syst Mag 2006;6(3):21–45.
[14] Akbas E, Ghanem B, Ahuja N. MIS-Boost: multiple instance selection boosting. arXiv:1109.2388, 2011.
[15] Gärtner T, Flach PA, Kowalczyk A, Smola AJ. Multi-instance kernels. In: ICML, 2; 2002. p. 179–86.
[16] Zhang C, Platt JC, Viola PA. Multiple instance boosting for object detection. In: Advances in Neural Information Processing Systems; 2005. p. 1417–24.
[17] Andrews S, Tsochantaridis I, Hofmann T. Support vector machines for multiple-instance learning. In: Advances in Neural Information Processing Systems; 2002. p. 561–8.
[18] Zhang Q, Goldman SA. EM-DD: an improved multiple-instance learning technique. In: Advances in Neural Information Processing Systems; 2001. p. 1073–80.
[19] Tosic I, Frossard P. Dictionary learning. IEEE Signal Process Mag 2011;28(2):27–38.
[20] Efron B, Tibshirani RJ. An introduction to the bootstrap. CRC Press; 1994.
[21] Vainsencher D, Mannor S, Bruckstein AM. The sample complexity of dictionary learning. J Mach Learn Res 2011;12(Nov):3259–81.
[22] Cheplygina V. Multi instance learning data sets. http://www.miproblems.org/datasets/, Accessed: 2016-08-12.
[23] Mairal J. SPAMS tool. http://spams-devel.gforge.inria.fr/index.htmll, Accessed: 2015-02-01.
[24] LIBSVM. https://www.csie.ntu.edu.tw/~cjlin/libsvm/, Accessed: 2015-02-01.
[25] PRtools. http://prtools.org/software, Accessed: 2017-03-01.
Nazanin Moarref received her B.Sc. degree in Electrical and Electronic Engineering from the University of Tabriz, Iran, in 2013. She received her M.Sc. degree in Computer Engineering from ITU in 2017 and is currently pursuing her Ph.D. in Computer Engineering at ITU. Her research activities are concentrated on machine learning theory and applications and deep learning.

Yusuf Yaslan received his B.Sc. degree in Computer Science Engineering from Istanbul University, Turkey, in 2001. He received his M.Sc. degree in Telecommunication Engineering and his Ph.D. in Computer Engineering from Istanbul Technical University, in 2004 and 2011 respectively. His research interests are machine learning, data mining, and recommendation systems.