Pattern Recognition Letters 128 (2019) 122–130

Supervised classification using graph-based space partitioning

Nicola Yanev (a), Ventzeslav Valev (a), Adam Krzyżak (b,*), Karima Ben Suliman (b)

(a) Institute of Mathematics and Informatics, Bulgarian Academy of Sciences, 1113 Sofia, Bulgaria
(b) Department of Computer Science and Software Engineering, Concordia University, Montreal, Quebec, H3G 1M8, Canada

Article history: Received 1 March 2019; Revised 6 July 2019; Accepted 27 July 2019; Available online 8 August 2019

Keywords: Supervised classification; Feature space partitioning; Graph partitioning; Nearest neighbor rule; Box algorithm
Abstract

In this paper we consider the supervised classification problem using space partitioning into multidimensional rectangular boxes. We show that the problem at hand can be reduced to a computational geometry problem involving a heuristically solved minimum clique cover problem satisfying the k-nearest neighbor rule. We first apply a heuristic algorithm for partitioning a graph into a minimal number of maximum cardinality cliques, each inscribed into the smallest (in volume) rectangular parallelepiped, called a box. The main advantage of the new classifier, called the Box algorithm, which optimally utilizes the geometrical structure of the training set, is the decomposition of the l-class problem (l > 2) into l binary classification problems. We discuss the computational complexity of the proposed method and the resulting classification rules. Extensive experiments performed on real and simulated data show that in almost all cases the Box algorithm performs significantly better than k-NN, SVM and decision trees.
1. Introduction

This paper considers the supervised classification problem in which a pattern is assigned to one of a finite number of classes. The goal of supervised classification is to learn a function f(x) that maps features x ∈ X to a discrete label (color) y ∈ Y = {1, 2, ..., l} using the available data {(x_1, y_1), ..., (x_m, y_m)}. We propose to approximate f by partitioning the feature space into uni-colored box-like regions whose sides are orthogonal to the coordinate axes. The optimization problem of finding the minimal number of such regions is reduced to the well-known problem of the minimal clique cover of a properly constructed graph. The solution results in a feature space partitioning.

This geometrical approach has recently been actively pursued in the literature; we provide a brief survey of relevant results. In [3] sequential feature partitioning is applied to regression estimation problems and in [2] to the problem of diagnosis of gastric carcinoma using a classifier that maximizes the overall quality. The weakness of this approach is that the n-dimensional information contained in the descriptions of the patterns is lost. In [4] principal component analysis is used for the feature partitioning approach, where subspaces are constructed from the initial n-dimensional feature space. Each pattern description is divided into sub-patterns from which features are extracted and used for
selection at the next level. Valev [7] proposed the so-called G-cut algorithm, which solves the classification problem by parallel feature partitioning of the initial n-dimensional space. In this approach an n-dimensional rectangular hyperparallelepiped is divided into a minimal number of hyperparallelepipeds, each containing patterns from only one of the classes or being empty. The classification problem is reduced to an integer optimization problem, which leads to the construction of a minimal covering.

Many important intractable problems are easily reducible to the Maximum Clique Problem (MCP), where the maximum clique is the largest subset of vertices such that each vertex is connected to every other vertex in the subset. They include the Boolean satisfiability problem, the independent set problem, the subgraph isomorphism problem, and the vertex cover problem. MCP has many important practical applications in diverse domains such as information retrieval, group technology analysis, VLSI layout design, mobile networks, qualitative data analysis, computer vision, and alignment of DNA with protein sequences. In the literature much attention has been devoted to developing efficient heuristic approaches for MCP, for which no formal guarantee of performance exists. These approaches are nevertheless useful in practical applications. In [13] a flexible annealing chaotic neural network has been introduced which, on graphs from the Center for Discrete Mathematics and Theoretical Computer Science (DIMACS), achieved optimal or near-optimal solutions. In [1] a graph partition algorithm is proposed. It uses the min-max clustering principle with a simple min-max function: the similarity between two subgraphs is minimized, while the
similarity within each subgraph is maximized. Combinatorial and decision-tree approaches for solving supervised classification problems are discussed in [8]. Valev and Yanev [9] proposed a new approach to the supervised classification problem by reducing it to the solution of an optimization problem for partitioning a graph into the minimal number of maximal cliques. This approach is similar to the one-versus-all SVM with a Gaussian radial basis function kernel; however, unlike in that case, no assumptions are made about the statistical distributions of the classes. The approach differs from the integer programming formulation of the binary classification problem proposed in [12], where the classification rule is a hyperplane which misclassifies the fewest patterns in the training set. Initial results concerning the proposed approach were presented in [10] and a few experimental studies in [11].

The rest of the paper is organized as follows. The class covering problem using a minimum number of cliques is discussed in Section 2, while an alternative approach to class covering by colored boxes is discussed in Section 3. The classification rule based on the Box algorithm and the nearest neighbor rule is introduced in Section 4. The computational complexity of the proposed algorithm is investigated in Section 5, and its relationship to the nearest neighbor rule and decision trees is presented in Section 6. Results of extensive simulations and experiments on real data with the Box algorithm classifier, SVM, k-NN and decision trees are presented in Section 7. Finally, in Section 8 we draw some conclusions.

2. Covering classes by colored boxes using the maximum cardinality clique approach

The paper addresses the solution of the supervised classification problem by reducing it to heuristically solving a minimal clique cover problem satisfying the nearest neighbor rule. In this section we apply a heuristic algorithm for partitioning a graph into a minimal number of cliques. We then cover the cliques by minimum size rectangular boxes and introduce a classifier based on the nearest neighbor principle and the Manhattan distance.

Recall that the patterns x = (x_1, x_2, ..., x_n) are points in R^n, and let M = {x_1, ..., x_m} be the feature vectors (patterns) and T = {y_1, ..., y_m} be their labels. In the sequel, the rectangular hyperparallelepiped P = {x = (x_1, x_2, ..., x_n) : x ∈ I_1 × I_2 × ··· × I_n}, where I_i is a closed interval, will be referred to as a box. Let K_c be the set of all patterns belonging to class c, i.e., patterns painted in color c. For any compact S ⊂ R^n, denote by P(S) the smallest (in volume) box containing the set S, i.e., I_i = [l_i, u_i], where l_i = min{x_i : x ∈ S} and u_i = max{x_i : x ∈ S}. We say that a box P^c(∗) is painted in color c if it contains at least one pattern x ∈ M and all patterns in the box are of the same color c, i.e., P^c(∗) ∩ M ≠ ∅ and P^c(∗) ∩ M ⊂ K_c. Under these notations we obtain the following Master Problem (MP):

MP: Cover all points in M with a minimal number of painted boxes.

Note that in the classification phase a pattern x is assigned to a class c if x falls in some P^c(∗). It is not necessary to require the non-intersecting property for equally painted boxes. Suppose now that P(c) = {P^c(S_1), P^c(S_2), ..., P^c(S_{t_c})} (a minimal set of t_c boxes of color c covering all c-colored points) is an optimal solution to the following problem:

MP(c): Find the minimal cover of the points painted in color c by painted boxes.
Then one can easily prove that the union ∪_c P(c) (minimal cover) is an optimal solution to MP. Thus MP is decomposable into MP(c), c = 1, 2, ..., l. In [9] the MP(c) problem has been considered as a problem of partitioning the vertex set of a graph into a minimal number of maximal cardinality cliques. In general this problem is NP-complete, but as will be shown below, the instances originating from the boxes approach are polynomially solvable.
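For concreteness, the smallest enclosing box P(S) and the painted-box test can be written down directly from the definitions above. The following is a minimal sketch (the function names are ours, not from the paper):

```python
import numpy as np

def smallest_box(S):
    """P(S): the smallest (in volume) box containing the point set S,
    i.e. I_i = [min x_i, max x_i] taken over x in S."""
    S = np.asarray(S, dtype=float)
    return S.min(axis=0), S.max(axis=0)

def is_painted(box, X, y, c):
    """A box is painted in color c iff it contains at least one training
    pattern and every training pattern falling inside it has label c."""
    lo, up = box
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    inside = np.all((X >= lo) & (X <= up), axis=1)
    return bool(inside.any()) and bool(np.all(y[inside] == c))
```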
Fig. 1. Black and white points are patterns from two classes. The graph and the minimal clique cover are shown for one of the classes. Source: Valev et al. [11].
We will now summarize the MP(c) algorithm from Valev and Yanev [9]. We begin by describing an algorithm for constructing the graph for each class. Let X^c_1, X^c_2, ..., X^c_s be the patterns in the training set colored in c, and let G_c = (V_c, E_c) be the graph related to the patterns of color c. The vertex set V_c = {v^c_1, v^c_2, ..., v^c_s} corresponds to the patterns of color c. We construct the graph G_c in the following way: the edge (v^c_k, v^c_l) ∈ E_c iff P(v^c_k, v^c_l) is a painted box. The following pseudocode can be used as a G_c graph-builder.

Algorithm for G_c graph-builder:
1.  E_c = ∅
2.  for i = 1 to s do
3.    for j = i + 1 to s do
4.    begin
5.      noedge = False
6.      for k = 1 to m do
7.      begin
8.        if noedge = False then
9.          if v_k ∉ V_c and v_k ∈ P(v^c_i, v^c_j) then noedge = True
10.     end
11.     if noedge = False then E_c = E_c ∪ {(v^c_i, v^c_j)}
12.   end
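A direct implementation of this construction is straightforward. The sketch below is ours (names such as build_class_graph are illustrative, not from the paper): it builds the edge set of G_c for one class by testing, for every pair of class-c patterns, whether the box they span contains a pattern of another color.

```python
import numpy as np

def build_class_graph(X, y, c):
    """Return the vertices (indices of class-c patterns) and edges of G_c.
    An edge (i, j) is added iff the smallest box spanned by patterns i and j
    contains no pattern of a different color, i.e. the box is painted in c."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    vertices = np.where(y == c)[0]
    others = X[y != c]                      # patterns of all other colors
    edges = []
    for a in range(len(vertices)):
        for b in range(a + 1, len(vertices)):
            lo = np.minimum(X[vertices[a]], X[vertices[b]])
            up = np.maximum(X[vertices[a]], X[vertices[b]])
            # the box is painted iff no differently colored pattern lies inside
            inside = np.all((others >= lo) & (others <= up), axis=1)
            if not inside.any():
                edges.append((vertices[a], vertices[b]))
    return vertices, edges
```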
Under this setting, a painted box necessarily corresponds to a clique in G_c, but in R^n with n > 2 there could be cliques that do not correspond to painted boxes. We call a clique feasible if it corresponds to a painted box. MP(c) is then reduced to the well-known NP-complete Minimum Clique Cover Problem (MCCP), that is, partitioning the vertex set of a graph into a minimum number of cliques. We want to point out the possibility of generating greedy algorithms for MP based on consecutive calls to a maximum cardinality clique finder such as the one in [5]. An illustration of the proposed algorithms for graph construction and for the minimum clique cover is shown in Fig. 1. Since we are not aware of any published efficient algorithms for general graphs, the following heuristic seems quite appropriate for MP(c). The heuristic algorithm is based on a greedy approach, using recursive calls to a function MCCP(G) [6], which finds the maximum cardinality clique in a given graph G. This MCCP(G) function is considered to be the fastest at the moment. The previously cited algorithm [5] is faster but requires extra effort for converting the initial graph into a k-partite one.

Algorithm for solving MP(c):
1. Set G = G_c.
2. while V_c ≠ ∅ do
3.   V_cur = MCCP(G)  /* V_cur is the vertex set of the maximum cardinality clique found by MCCP(G) */
4.   Set G to be the subgraph of G induced by the vertex set V_c − V_cur and set V_c = V_c − V_cur.
5. end do
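As a quick way to experiment with this greedy scheme, one can substitute a general-purpose maximum clique routine for the MCCP(G) solver of [6]. The sketch below is our own illustration and uses NetworkX's max_weight_clique (with unit node weights it returns a maximum cardinality clique) as a stand-in:

```python
import networkx as nx

def greedy_clique_cover(vertices, edges):
    """Greedy MP(c) heuristic: repeatedly extract a maximum cardinality
    clique until no vertices remain; each clique becomes one painted box."""
    G = nx.Graph()
    G.add_nodes_from(vertices)
    G.add_edges_from(edges)
    cover = []
    while G.number_of_nodes() > 0:
        clique, _ = nx.max_weight_clique(G, weight=None)  # unit node weights
        # For n > 2 one should additionally check that the clique is feasible,
        # i.e. that its spanning box is painted (see the remark in the text).
        cover.append(clique)
        G.remove_nodes_from(clique)
    return cover
```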
This heuristic algorithm can easily be turned into an exact one if the do–end do body is replaced by a call to an MCCP solver. By presenting this algorithm we simply emphasize the possibility of starting to solve MP(c) with existing general-purpose tools and, in the case of promising results, switching to special-purpose algorithms. In the next section we propose a heuristic algorithm, called the Box algorithm, for constructing the minimum cover of the maximal cliques by colored rectangular parallelepipeds (boxes).

3. Box algorithm

To introduce the Box algorithm (BA) we first recall the box definition and the problem we are trying to solve. The box (in vector notation) B(l, u) is the set of points x ∈ R^n such that l ≤ x ≤ u, or B(l, u) = {x : l_i ≤ x_i ≤ u_i, i = 1, ..., n}, where l, u ∈ R^n. The 0-dimensional boxes (single points in R^n) are those with l = u. If u_i − l_i > 0 for i = 1, ..., n, we call B a full-dimensional box. For the sake of clarity and without loss of generality we consider only the 2-class problem. Suppose that two sets X_b and X_r of training patterns (points in the hypercube F ⊂ R^n) are given, colored blue and red, respectively. We call the box B colored iff it contains only points of the same color. A pair of points y = (y_1, y_2, ..., y_n) and z = (z_1, z_2, ..., z_n) generates B if l_i = min{y_i, z_i} and u_i = max{y_i, z_i}, i = 1, ..., n. Consider again the master problem MP(c).

Problem A: Find a cover of X_b ∪ X_r with the minimal number of colored full-dimensional boxes such that X_b is covered by blue boxes and X_r by red boxes.

We first construct the box cover for X_b; the cover for X_r is constructed analogously and its description is therefore omitted. Define a graph G_{X_b} = (V, E), V = X_b, E = {e = (v_i, v_j)}, where each edge e is a colored box generator. An edge e is colored green if it is a full-dimensional box generator. Let e = (a, b) and f = (c, d) be green edges and let B_e and B_f be the corresponding full-dimensional boxes. The operation e ⊕ f is color-preserving if the full-dimensional box C = B_e ⊕ B_f, with l_i = min{a_i, b_i, c_i, d_i} and u_i = max{a_i, b_i, c_i, d_i}, is colored. An edge e dominates f (written e > f) if B_e ⊃ B_f. The key operation A ⊕ B over a pair of blue boxes A(l^1, u^1) and B(l^2, u^2) creates the box C(l, u), where l = min{l^1, l^2} and u = max{u^1, u^2} (the min and max operations are component-wise). In the case of 0-dimensional boxes (points) y = (y_1, ..., y_n) and z = (z_1, ..., z_n) this operation generates a box of higher dimension B(l, u), l_i = min{y_i, z_i}, u_i = max{y_i, z_i}, i = 1, ..., n. In any case, we call a pair of boxes eligible only if the resulting box is a blue box. Obviously, there is a one-to-one correspondence between full-dimensional boxes and green edges. The dominance relation on the set of full-dimensional boxes (say B_e > B_f) can easily be established. When the full-dimensional box C is colored it dominates B_e and B_f, and the appropriate application of the ⊕ operation allows the generation of maximal colored cliques. We call a clique colored if it contains green edges. The points contained in the full-dimensional box C form the minimum clique cover, i.e., the vertex set (the points in C) is partitioned into cliques and the number of cliques is minimal. Now we can reformulate Problem A as follows.

Problem A: Cover the graph G_{X_b} with the minimal number of blue colored cliques.

The algorithm for solving Problem A is as follows.
Define a graph G_{X_b} = G = (V, E), V = X_b, with (v_i, v_j) ∈ E iff (v_i, v_j) is an eligible pair. The vertices are labeled by the previously defined vectors (l, u). It can easily be seen that the problem of finding the minimal blue box cover is equivalent to finding the minimum number of cliques covering all vertices of V. Since each edge e ∈ E corresponds to a blue box, we call e non-dominated if its box is not
a sub-box of the box of any other edge. The use of non-dominated edges allows G to be simplified to a graph whose vertices correspond to the set of non-dominated edges, with edges created by applying the ⊕ operation (Step 1 in the algorithm below). This simplification is not free; its cost is O(|E|^2) edge comparisons.

BOX-ALGO
Input: graph G(V, E)
Step 1 (reduce the graph G): create the graph WG = (WV, WE) from the non-dominated edges of G.
Step 2 (clique enlargement): while WV ≠ ∅ do
  – call try-to-extend(WG);
  – create WE, save all isolated vertices (boxes) in FB (final boxes) and remove them from WG;
end do

try-to-extend(WG):
Input: graph WG.
Output: vertex set WV resulting from a d-clique cover of the input graph WG (here a 2-clique cover of WG): for all (u, v) ∈ WE save u ⊕ v in BS; return WV = BS.

BOX-ALGO is based on the following idea: starting from the cover by 0-dimensional boxes corresponding to the vertices of G, the boxes are enlarged (by adding new instances) through calls to try-to-extend(WG). The enlargement is done by performing the binary u ⊕ v operation, which is equivalent to finding the convex hull, in the Manhattan distance, of the two given boxes (in the case of a 2-clique cover). The enlarged boxes are returned for possible further enlargement. The 1-cliques (the isolated vertices above) are saved in the list of final boxes FB. The try-to-extend(WG) function is included explicitly only to emphasize that one could also use a clique cover by cliques of cardinality greater than 2; otherwise the 2-clique cover is given by the edges in WE, which on return become vertices of the next version of the graph WG. To reduce the number of boxes subject to enlargement (the list WV above, Step 1) we use the non-dominated edges, because each dominated edge (u, v) creates a sub-box u ⊕ v. The finiteness of the algorithm follows from |V| < ∞, which guarantees the finiteness of the number of enlargement calls. Also recall that the vertices covered by a box form a clique in G.

Remark: In the special case when all boxes have dimension zero and no rotation is performed, the Box algorithm reduces to the nearest neighbor approach.
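The core of the enlargement step can be sketched as follows (our own illustration for the two-class case; Step 1, the reduction to non-dominated edges, is omitted for brevity, and the function names are not from the paper). Merging two boxes is the ⊕ operation, and a merge is accepted only when the resulting box is still blue, i.e., contains no red pattern:

```python
import numpy as np

def merge(box_a, box_b):
    """The ⊕ operation: component-wise convex hull of two boxes (l, u)."""
    (l1, u1), (l2, u2) = box_a, box_b
    return np.minimum(l1, l2), np.maximum(u1, u2)

def is_blue(box, X_red):
    """A candidate box is blue (colored) iff it contains no red pattern."""
    lo, up = box
    inside = np.all((X_red >= lo) & (X_red <= up), axis=1)
    return not inside.any()

def box_cover(X_blue, X_red):
    """Greedy 2-clique enlargement in the spirit of BOX-ALGO: start from the
    0-dimensional boxes (the blue points) and repeatedly merge eligible pairs
    until no further color-preserving merge is possible."""
    X_red = np.asarray(X_red, dtype=float)
    boxes = [(p.copy(), p.copy()) for p in np.asarray(X_blue, dtype=float)]
    changed = True
    while changed:
        changed = False
        merged, used = [], set()
        for i in range(len(boxes)):
            if i in used:
                continue
            for j in range(i + 1, len(boxes)):
                if j in used:
                    continue
                candidate = merge(boxes[i], boxes[j])
                if is_blue(candidate, X_red):        # (i, j) is an eligible pair
                    merged.append(candidate)
                    used.update((i, j))
                    changed = True
                    break
            else:
                merged.append(boxes[i])              # kept as-is (final box candidate)
        boxes = merged
    return boxes                                     # list of (lower, upper) blue boxes
```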
Fig. 2. Overlapping example. Left: overlapping of boxes of the same color. Right: overlapping of two boxes of different colors; the shaded area is empty. Source: Valev et al. [11].
4. Classification rule based on Box algorithm

Cliques-to-painted boxes. Let S be any clique in the optimal solution of MP(c). The box painted in color c that corresponds to this clique is defined by P(S) = {x = (x_1, x_2, ..., x_n) : x ∈ I_1 × I_2 × ··· × I_n}, where I_i = [min_{x∈S} x_i, max_{x∈S} x_i] and the points x correspond to the vertices in S. Geometrically, by converting cliques to boxes one can obtain overlapping boxes of the same color. The union of such boxes is not a box, but in the classification phase the point being classified is trivially resolved as belonging to the union of boxes instead of a single box. If a pattern x from the test dataset falls in a single colored box or in the union of boxes of the same color, then x is assigned to the class that corresponds to this color. It is also possible to encounter overlapping boxes painted in different colors whose cross-section is empty. Such an empty cross-section is easily determined and corresponds to an empty box in the definition of the G-cut problem. Both cases are illustrated in Fig. 2. The colored boxes constructed in this way are used to make decisions about the class membership of classified patterns.

If a pattern x from the test dataset falls in an empty (uncolored) box, then the pattern x is not classified. Another possible classification rule is to assign the pattern x to the class whose color corresponds to the majority of adjacent colored boxes, with ties resolved as in the k-NN rule by indices or by random splitting. We would like to point out that the proposed new classifier is more general than the linear classifier. Note that considering only blue and not-blue points does not diminish the applicability of the approach to more than two classes of patterns. In the case of l classes, for some integer l > 2, we use the one-against-the-rest approach: our classifier is applied sequentially for each class separately, and the final decision is based on a majority vote. The class membership is only used in the process of building G_c, which is another advantage of the proposed algorithm.
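A minimal sketch of this classification phase, under our own naming (classify_by_membership and the list-of-(box, color) representation are illustrative, not from the paper): a test pattern receives the color of the colored box, or union of equally colored boxes, containing it, and is otherwise left unclassified or handed to a fallback rule such as the adjacent-box majority vote above or the nearest-box rule discussed in Section 6.1:

```python
import numpy as np

def contains(box, x):
    """True iff pattern x lies inside the axis-aligned box (lower, upper)."""
    lo, up = box
    return bool(np.all((x >= lo) & (x <= up)))

def classify_by_membership(x, colored_boxes):
    """colored_boxes: list of (box, color) pairs produced for every class.
    Returns the unique color of the boxes containing x, or None when x falls
    in no colored box (or only in an empty cross-section of differently
    colored boxes), in which case x is left unclassified."""
    hits = {color for box, color in colored_boxes if contains(box, x)}
    return hits.pop() if len(hits) == 1 else None
```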
5. Computational complexity

Like many other methods, finding the optimal solution to the graph partitioning problem is NP-complete because of its combinatorial nature. While in both versions of the above-mentioned graph algorithm there is a call to a solver of a classical NP-complete problem, it is far from evident that the instances of MP(c) are not polynomially solvable. This is due to the fact that the vertices of the generated graphs are points in a metric space, and clustering the points according to the Euclidean distance could result in forming cliques in the respective graphs. We would like to point out that a new platform for solving the classification problem has been proposed, which in the exact case leads to solving an NP-complete problem. This can be avoided if an approximate solution is sought. To keep the polynomial complexity of the algorithm we sacrifice optimality by using the threshold d as a parameter in the try-to-extend procedure. Call now the speed-up s_up = |X_b|/NB, where NB is the cardinality of the clique cover. Since the above approach is the nearest neighbor in disguise, the bigger s_up is, the faster the classification procedure becomes. Step 2 finds a clique cover in O(|X_b|^3) time. To keep this complexity in practical use of the algorithm, one could adjust the threshold d to achieve a satisfactory s_up. Note that the main idea of the algorithm is to reduce the size of the clique cover problem on a graph with |X_b| nodes to a much smaller graph WG_{X_b}, which is decomposed into its connected components.

6. Relation of Box algorithm to the nearest neighbor rule and decision trees

6.1. Relation of Box algorithm to the nearest neighbor rule

A reasonable classification rule, known as the nearest neighbor rule, is to classify the pattern x as red (blue, etc.) if argmin_{x′ ∈ X_b ∪ X_r} ρ(x, x′) = y∗ and y∗ is red (blue, etc.), or, equivalently, when x falls in the appropriate region of the Voronoi diagram. Let us shed more light on the relation of this rule and its k-NN generalization to BA. To this end, introduce a function f mapping a set of patterns X to a set of labels Y = {1, 2, ..., l}. (In the one-dimensional case such a function is called a stepwise function.) Without loss of generality we can take X to be a box (a rectangle in 2-d) and Y to be a finite set of l colors, say Y = {red, blue, ...}. Thus knowing f is equivalent to knowing the coloring of X based on {(x_1, y_1), ..., (x_m, y_m)}. The nearest neighbor rule is then constructed as follows: for each point x_i colored red (blue, etc.), paint red (blue, etc.) all points in X that are closest neighbors to x_i (these are regions of the Voronoi diagram). The k-NN rule is obtained as follows: for each k-element subset A of the set M of training patterns having a majority of red (blue, etc.) points, paint red (blue, etc.) all points in X whose k closest points are the points in A (not all A provide paintings). A good classifier decomposes P(M) into painted areas (in the linear case there are only two) having the nearest neighbor property, i.e., for any point in a red (blue) area the nearest neighbor rule classifies the recognized pattern as red (blue). If the box B = {x : l_i ≤ x_i ≤ u_i, i = 1, ..., n} contains training patterns and ρ is the Manhattan distance, then for a pattern y the distance to B equals ρ(y, B) = Σ_{i=1}^{n} [max(0, l_i − y_i) + max(0, y_i − u_i)]. Now the idea of the previously defined boxes becomes clear. We first approximate the above-mentioned painted areas (not known in advance) by painted boxes (perfect candidates for the Manhattan distance) and then classify patterns according to the point-to-box distance rule. BA is similar to NN if the Voronoi cells are taken to be boxes. Let us note that instead of boxes we could consider the convex hulls of the patterns. Applying the nearest neighbor rule to these would give better classification, but this approach is computationally intractable, both for constructing the convex hulls and for computing the point-to-set distances. Now the MP(c) problem can be formulated as a heuristic clique cover problem satisfying the nearest neighbor rule. Regarding the similarity between BA and k-NN, one can adjust BA to apply k-NN, but only for a subset of points of M, if the boxes retain the list of their points. Then, if the distances of a new instance to, say, two boxes are r_1 > 0 and r_2 > 0, one simply applies k-NN to the appropriate lists of points.

6.2. Relation of Box algorithm to decision trees

(i) The leaves of a decision tree (DT) are painted rectangles (boxes) obtained from preliminarily given sets of intervals and a priority (for univariate trees) for creating the successors of a given node. Internal nodes are rectangles containing a mixture of colored points. (ii) BA creates painted boxes (under the Manhattan distance) that are the convex hulls of given sets of unicolored points. For certain instances, (i) and (ii) could create the same coloring if the sets of intervals and the priority are properly chosen. In general, one can say that BA creates a DT with a root node and a list of successors (colored boxes containing unicolored points, whose coordinates satisfy the test on the arc for box membership). We can go further by saying that BA aims to create trees with a minimum number of leaves, and thus it attempts to reduce the generalization error.
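To make the point-to-box rule of Section 6.1 concrete, here is a short sketch of the Manhattan point-to-box distance ρ(y, B) defined above and the resulting nearest-box classification (our own illustration; the names are not from the paper):

```python
import numpy as np

def box_distance(y, box):
    """Manhattan distance from pattern y to the box B(l, u):
    rho(y, B) = sum_i max(0, l_i - y_i) + max(0, y_i - u_i)."""
    lo, up = box
    return float(np.sum(np.maximum(0.0, lo - y) + np.maximum(0.0, y - up)))

def nearest_box_classify(y, colored_boxes):
    """Assign y the color of the closest colored box (point-to-box rule)."""
    best_box, best_color = min(colored_boxes, key=lambda bc: box_distance(y, bc[0]))
    return best_color
```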
7. Experimental results

In this section we compare the performance of our Box algorithm classifier with other classifiers, including k-NN, SVM and decision trees (DT), on synthetic data generated from 3-variate normal distributions and on real datasets from the UCI Machine Learning Repository listed in Table 1. All datasets in Table 1 represent binary classification problems except for the last one (Lung Cancer), which is a 3-class problem. The Box algorithm is based on similarity, so it is natural to compare it with algorithms of the same type such as SVM, k-NN and decision trees. Simulations were carried out using MATLAB R2016b on an Intel(R) Pentium CPU B960 @ 2.20 GHz with 4.00 GB RAM.
Table 1. Datasets from the UCI repository used in the computer experiments.

  Name                                        Data type           # of attributes   # of instances
  Monks                                       Categorical         7                 1711
  Breast Cancer Wisconsin (original)          Integer             10                683
  Haberman's Survival                         Integer             3                 306
  Mammographic Mass                           Integer             6                 830
  Caesarian Section Classification Dataset    Integer             5                 80
  SPECTF Heart                                Integer             44                267
  Blood Transfusion Service Center            Integer             5                 748
  Diabetic Retinopathy Debrecen Data Set      Integer             20                1151
  Statlog (Heart)                             Categorical, Real   13                270
  Parkinsons                                  Real                23                195
  EEG Eye State                               Integer, Real       15                200
  Lung Cancer                                 Integer             56                28

Table 2. Parameter settings for k-NN and SVM.

  Dataset                         k in k-NN   SVM kernel
  Normal distribution 1           5           polynomial
  Normal distribution 2           5           polynomial
  Normal distribution 3           5           polynomial
  Monk 1                          3           polynomial
  Monk 2                          3           polynomial
  Monk 3                          3           RBF
  Diabetic retinopathy            5           RBF
  SPECTF Heart                    5           RBF
  Haberman Survival Data          5           polynomial
  Mammographic Data               5           RBF
  Breast Cancer Wisconsin Data    3           RBF
  Caesarian Section               5           RBF
  Blood Transfusion               5           RBF
  Statlog (Heart)                 7           RBF
  Parkinsons                      5           RBF
  EEG Eye State                   3           polynomial
  Lung Cancer                     3           RBF

Table 3. Parameter settings for the normal distributions (e = (1, 1, 1)^T).

  Case   Covariance matrices   Mean vectors
  1      I, I                  0, 0.5e
  2      I, 2I                 0, 0.6e
  3      I, 4I                 0, 0.8e

7.1. Data splitting

For some datasets in the UCI repository the split into training and test sets is already provided and is as follows:

• Monk 1. Training: 124 instances, testing: 432 instances
• Monk 2. Training: 169 instances, testing: 432 instances
• Monk 3. Training: 122 instances, testing: 432 instances
• SPECTF Heart. Training: 80 instances, testing: 187 instances

For the other datasets from the UCI repository we selected 80% of the data for training and 20% for testing, yielding the following splits:

• Haberman Survival Data. Training: 244 instances, testing: 62 instances
• Mammographic Data. Training: 664 instances, testing: 166 instances
• Breast Cancer Wisconsin Data. Training: 546 instances, testing: 137 instances
• Caesarian Section. Training: 64 instances, testing: 16 instances
• Blood Transfusion. Training: 598 instances, testing: 150 instances
• Diabetic Retinopathy. Training: 920 instances, testing: 231 instances
• Statlog (Heart). Training: 216 instances, testing: 54 instances
• Parkinsons. Training: 156 instances, testing: 39 instances
• EEG Eye State. Training: 160 instances, testing: 40 instances
• Lung Cancer. Training: 22 instances, testing: 6 instances
Table 4. Confusion matrices (in percentage ratio) for the k-NN, DT, SVM classifiers and the Box algorithm on the normally distributed data (rows: true class, columns: predicted class).

First normal distribution
                 k-NN          DT            SVM           Box algorithm
                 Red   Blue    Red   Blue    Red   Blue    Red   Blue
  Red Points     0.61  0.39    0.59  0.41    0.72  0.28    0.98  0.02
  Blue Points    0.39  0.61    0.42  0.58    0.45  0.55    0.02  0.98

Second normal distribution
  Red Points     0.73  0.27    0.62  0.38    0.81  0.19    0.97  0.03
  Blue Points    0.40  0.60    0.37  0.63    0.43  0.57    0.01  0.99

Third normal distribution
  Red Points     0.86  0.14    0.75  0.25    0.89  0.11    0.96  0.04
  Blue Points    0.30  0.70    0.26  0.74    0.27  0.73    0.01  0.99

Table 5. Accuracy, sensitivity, specificity, and precision of the k-NN, DT, SVM classifiers and the Box algorithm for the normally distributed data (columns 1, 2, 3 correspond to the three cases of Table 3).

                  Accuracy            Sensitivity
                  1     2     3       1     2     3
  k-NN            0.61  0.66  0.78    0.61  0.60  0.70
  DT              0.58  0.62  0.75    0.58  0.63  0.74
  SVM             0.64  0.69  0.81    0.55  0.57  0.73
  Box algorithm   0.98  0.98  0.98    0.98  0.99  0.99

                  Specificity         Precision
                  1     2     3       1     2     3
  k-NN            0.60  0.73  0.86    0.61  0.69  0.83
  DT              0.59  0.62  0.75    0.58  0.62  0.75
  SVM             0.72  0.81  0.89    0.66  0.75  0.87
  Box algorithm   0.98  0.97  0.96    0.98  0.97  0.96
Table 6. Confusion matrices (in percentage ratio) for the k-NN, DT, SVM classifiers and the Box algorithm on the Monk's datasets (rows: true class, columns: predicted class).

Monk 1
                 k-NN          DT            SVM           Box algorithm
                 Red   Blue    Red   Blue    Red   Blue    Red   Blue
  Red Points     0.74  0.26    0.78  0.22    0.88  0.12    1.00  0.00
  Blue Points    0.29  0.71    0.11  0.89    0.08  0.92    0.00  1.00

Monk 2
  Red Points     0.83  0.17    0.71  0.29    0.85  0.15    1.00  0.00
  Blue Points    0.30  0.70    0.35  0.65    0.11  0.89    0.00  1.00

Monk 3
  Red Points     0.86  0.14    0.91  0.09    0.97  0.03    1.00  0.00
  Blue Points    0.18  0.82    0.03  0.97    0.14  0.86    0.00  1.00
Table 7. Accuracy, sensitivity, specificity, and precision of the k-NN, DT, SVM classifiers and the Box algorithm on the Monk's datasets.

                  Accuracy                    Sensitivity
                  Monk 1  Monk 2  Monk 3      Monk 1  Monk 2  Monk 3
  k-NN            0.72    0.79    0.84        0.71    0.70    0.82
  DT              0.83    0.69    0.94        0.89    0.65    0.97
  SVM             0.90    0.86    0.91        0.92    0.89    0.86
  Box algorithm   1.00    1.00    1.00        1.00    1.00    1.00

                  Specificity                 Precision
                  Monk 1  Monk 2  Monk 3      Monk 1  Monk 2  Monk 3
  k-NN            0.74    0.83    0.86        0.73    0.67    0.87
  DT              0.78    0.71    0.91        0.80    0.52    0.93
  SVM             0.88    0.84    0.97        0.89    0.74    0.97
  Box algorithm   1.00    1.00    1.00        1.00    1.00    1.00
Table 8. Confusion matrices (in percentage ratio) for the k-NN, DT, SVM classifiers and the Box algorithm on three datasets from UCI (rows: true class, columns: predicted class).

Wisconsin Data
                 k-NN          DT            SVM           Box algorithm
                 Red   Blue    Red   Blue    Red   Blue    Red   Blue
  Red Points     0.99  0.01    0.97  0.03    0.99  0.01    1.00  0.00
  Blue Points    0.03  0.97    0.11  0.89    0.00  1.00    0.00  1.00

Mammography Data
  Red Points     0.87  0.13    0.87  0.13    1.00  0.00    1.00  0.00
  Blue Points    0.37  0.63    0.36  0.64    0.48  0.52    0.01  0.99

Heart Data
  Red Points     1.00  0.00    0.53  0.47    0.67  0.33    0.47  0.53
  Blue Points    0.50  0.50    0.36  0.64    0.25  0.75    0.16  0.84
Table 9. Confusion matrices (in percentage ratio) for the k-NN, DT, SVM classifiers and the Box algorithm on four datasets from UCI (rows: true class, columns: predicted class).

Haberman's Survival Data
                 k-NN          DT            SVM           Box algorithm
                 Red   Blue    Red   Blue    Red   Blue    Red   Blue
  Red Points     0.93  0.07    0.72  0.28    0.96  0.04    1.00  0.00
  Blue Points    0.81  0.19    0.44  0.56    0.87  0.13    0.00  1.00

Blood Transfusion
  Red Points     0.99  0.01    0.94  0.06    0.99  0.01    1.00  0.00
  Blue Points    0.93  0.07    1.00  0.00    1.00  0.00    0.00  1.00

Caesarian Section
  Red Points     0.57  0.43    0.57  0.43    0.57  0.43    1.00  0.00
  Blue Points    0.44  0.56    0.33  0.67    0.44  0.56    0.00  1.00

Diabetic Retinopathy
  Red Points     0.59  0.41    0.53  0.47    0.60  0.40    0.91  0.09
  Blue Points    0.36  0.64    0.36  0.64    0.24  0.76    0.08  0.92
Table 10. Accuracy and sensitivity of the k-NN, DT, SVM classifiers and the Box algorithm on seven datasets from UCI.

                         Accuracy                              Sensitivity
                         k-NN   DT     SVM    Box algorithm    k-NN   DT     SVM    Box algorithm
  Wisconsin              0.99   0.95   0.99   1.00             0.76   0.89   1.00   1.00
  Mammography            0.75   0.76   0.77   0.99             0.63   0.64   0.52   0.99
  Heart                  0.54   0.63   0.74   0.81             0.50   0.64   0.75   0.84
  Haberman's Survival    0.74   0.68   0.74   1.00             0.19   0.56   0.13   1.00
  Blood Transfusion      0.90   0.85   0.89   1.00             0.07   0.00   0.00   1.00
  Caesarian Section      0.56   0.63   0.56   1.00             0.56   0.67   0.56   1.00
  Diabetic Retinopathy   0.61   0.58   0.68   0.92             0.64   0.64   0.76   0.92
Table 11. Specificity and precision of the k-NN, DT, SVM classifiers and the Box algorithm on seven datasets from UCI.

                         Specificity                           Precision
                         k-NN   DT     SVM    Box algorithm    k-NN   DT     SVM    Box algorithm
  Wisconsin              0.99   0.97   0.99   1.00             0.97   0.91   0.97   1.00
  Mammography            0.87   0.87   1.00   1.00             0.82   0.83   1.00   1.00
  Heart                  1.00   0.53   0.67   0.47             1.00   0.94   0.96   0.95
  Haberman's Survival    0.93   0.72   0.96   1.00             0.50   0.41   0.50   1.00
  Blood Transfusion      0.99   0.94   0.99   1.00             0.33   0.00   0.00   1.00
  Caesarian Section      0.57   0.57   0.57   1.00             0.63   0.67   0.63   1.00
  Diabetic Retinopathy   0.59   0.53   0.60   0.91             0.61   0.58   0.66   0.91
7.2. Normal attributes

In this section we compare the performance of the Box algorithm classifier with the k-NN, SVM and DT classifiers on the normal data. Parameter settings for k-NN and SVM are provided in Table 2. The samples for a binary classification problem with normal attributes are generated for three sets of parameters from 3-dimensional normal distributions with the mean vectors and covariance matrices given in Table 3, where e = (1, 1, 1)^T. For each distribution 100 samples are generated and divided into 50 training samples and 50 testing samples.
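For reproducibility, synthetic data following Table 3 can be generated along these lines (a sketch under the stated parameters; the function name and the per-class sample count are our assumptions, since the text only specifies 100 samples split 50/50 into training and testing):

```python
import numpy as np

def make_normal_case(case, n_per_class=50, seed=0):
    """Draw one sample for the binary problems of Table 3:
    class 0 ~ N(0, I) and class 1 ~ N(mu, Sigma) in R^3."""
    rng = np.random.default_rng(seed)
    e = np.ones(3)
    params = {1: (0.5 * e, 1.0), 2: (0.6 * e, 2.0), 3: (0.8 * e, 4.0)}
    mu, scale = params[case]
    X0 = rng.multivariate_normal(np.zeros(3), np.eye(3), size=n_per_class)
    X1 = rng.multivariate_normal(mu, scale * np.eye(3), size=n_per_class)
    X = np.vstack([X0, X1])
    y = np.array([0] * n_per_class + [1] * n_per_class)
    return X, y
```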
The simulation results presented in Tables 4 and 5 are averages over 50 runs. It can be noticed that in all cases the Box algorithm significantly outperforms the other classifiers in terms of accuracy, sensitivity, specificity and precision.

7.3. Nominal attributes

In this section we present experimental results on the datasets from the UCI Machine Learning Repository listed in Table 1, obtained with the following classifiers: Box algorithm, SVM, k-NN and DT. The parameter settings of k-NN and SVM are listed in Table 2. Each experiment is run on the training and testing datasets described in Section 7.1. Tables 6, 8, 9, 12 and 15 present the confusion matrices for the Box algorithm, SVM, k-NN and DT classifiers on the UCI datasets. Accuracy, sensitivity, specificity and precision are provided in Tables 7, 10, 11, 13, 14 and 16. It can be seen that in almost all cases the Box algorithm classifier significantly outperforms the SVM, k-NN and DT classifiers in terms of accuracy. Likewise, it almost always beats the other classifiers with respect to sensitivity, specificity and precision. Consequently, the experimental results on real data from the UCI repository presented in this section show that the Box algorithm classifier performs significantly better than SVM, k-NN and DT in almost all cases.
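The four reported quantities are the standard metrics derived from the binary confusion matrix; as a reference, a minimal sketch (with one of the two classes taken as the positive class):

```python
def binary_metrics(tp, fn, fp, tn):
    """Accuracy, sensitivity (recall), specificity and precision computed
    from a 2x2 confusion matrix; tp/fn/fp/tn are counts for the class
    chosen as positive."""
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    return accuracy, sensitivity, specificity, precision
```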
Table 12. Confusion matrices (in percentage ratio) for the k-NN, DT, SVM classifiers and the Box algorithm on three datasets from UCI (rows: true class, columns: predicted class).

Statlog (Heart)
                 k-NN          DT            SVM           Box algorithm
                 Red   Blue    Red   Blue    Red   Blue    Red   Blue
  Red Points     0.90  0.10    0.87  0.13    0.90  0.10    0.90  0.10
  Blue Points    0.17  0.83    0.29  0.71    0.17  0.83    0.13  0.87

Parkinsons
  Red Points     0.17  0.83    0.12  0.88    0.21  0.79    0.21  0.79
  Blue Points    0.00  1.00    0.07  0.93    0.00  1.00    0.00  1.00

EEG Eye State
  Red Points     1.00  0.00    1.00  0.00    1.00  0.00    1.00  0.00
  Blue Points    0.00  1.00    0.30  0.70    0.00  1.00    0.00  1.00
Table 13. Accuracy and sensitivity of the k-NN, DT, SVM classifiers and the Box algorithm on three datasets from UCI.

                  Accuracy                                       Sensitivity
                  Statlog (Heart)  Parkinsons  EEG Eye State     Statlog (Heart)  Parkinsons  EEG Eye State
  k-NN            0.87             0.49        1.00              0.83             1.00        1.00
  DT              0.80             0.44        0.85              0.71             0.93        0.70
  SVM             0.87             0.51        1.00              0.83             1.00        1.00
  Box algorithm   0.89             0.51        1.00              0.88             1.00        1.00

Table 14. Specificity and precision of the k-NN, DT, SVM classifiers and the Box algorithm on three datasets from UCI.

                  Specificity                                    Precision
                  Statlog (Heart)  Parkinsons  EEG Eye State     Statlog (Heart)  Parkinsons  EEG Eye State
  k-NN            0.90             0.17        1.00              0.87             0.43        1.00
  DT              0.87             0.13        1.00              0.81             0.40        1.00
  SVM             0.90             0.21        1.00              0.87             0.44        1.00
  Box algorithm   0.90             0.21        1.00              0.88             0.44        1.00

Table 15. Confusion matrices (in percentage ratio) for the k-NN, DT, SVM classifiers and the Box algorithm for the 3-class lung cancer classification problem from UCI (rows: true class, columns: predicted class).

k-NN
                  Red   Blue  Green
  Red Points      0.50  0.50  0.00
  Blue Points     0.00  0.00  1.00
  Green Points    0.00  0.00  1.00

DT
  Red Points      1.00  0.00  0.00
  Blue Points     0.00  0.00  1.00
  Green Points    0.00  0.50  0.50

SVM
  Red Points      0.50  0.50  0.00
  Blue Points     0.00  0.00  1.00
  Green Points    0.00  0.00  1.00

Box algorithm
  Red Points      0.50  0.50  0.00
  Blue Points     0.00  0.50  0.50
  Green Points    0.00  0.00  1.00

Table 16. Accuracy, sensitivity, specificity and precision of the k-NN, DT, SVM classifiers and the Box algorithm for the 3-class lung cancer classification problem from UCI.

                  Accuracy  Sensitivity  Specificity  Precision
  k-NN            0.50      0.50         0.75         0.50
  DT              0.50      0.50         0.75         0.44
  SVM             0.50      0.50         0.75         0.50
  Box algorithm   0.67      0.67         0.83         0.72
8. Conclusions

We introduced a new geometrical approach for solving the supervised classification problem. We applied a graph optimization approach using the well-known problem of partitioning a graph into a minimum number of cliques, which are subsequently merged using the nearest neighbor rule. Equivalently, the supervised classification problem is solved by means of a heuristic maximal clique cover approach satisfying the nearest neighbor rule. The Box algorithm showed superior performance, in comparison with the SVM, k-NN and DT classifiers, on simulated normal data and on several real datasets from the UCI repository. We analyzed the computational complexity of the proposed algorithm. Its low complexity results from utilizing an efficient algorithm for partitioning a graph into a minimum number of maximal cliques. Last but not least, the proposed approach optimally utilizes the geometrical structure of the training set by decomposing the l-class problem into l binary classification problems.

Declaration of Competing Interest

There is no conflict of interest.

Acknowledgments

Research of N. Yanev was partially supported by the French-Bulgarian contract "RILA", 01/4, 2018. Research of A. Krzyżak was supported by the Natural Sciences and Engineering Research Council under Grant RGPIN-2015-06412. Research of K. Ben Suliman was supported by the Libyan Government.

References

[1] C. Ding, X. He, H. Zha, M. Gu, H. Simon, A min-max cut algorithm for graph partitioning and data clustering, in: Proc. of Int. Conf. on Data Mining, 2001, pp. 107–114.
[2] H. Güvenir, N. Emeksiz, N. Ikizler, N. Ormeci, Diagnosis of gastric carcinoma by classification on feature projections, Artif. Intell. Med. 23 (2004) 231–240.
[3] H. Güvenir, I. Sirin, Classification by feature partitioning, Mach. Learn. 23 (1996) 47–67.
[4] K. Kumar, A. Negi, SubXPCA and a generalized feature partitioning approach to principal component analysis, Pattern Recognit. 41 (2008) 1398–1409.
[5] N. Malod-Dognin, R. Andonov, N. Yanev, Maximum cliques in protein structure comparison, Lect. Notes Comput. Sci. 6049 (2010) 106–117.
[6] P. Östergård, A fast algorithm for the maximum clique problem, Discrete Appl. Math. 120 (2002) 197–207.
[7] V. Valev, Supervised pattern recognition by parallel feature partitioning, Pattern Recognit. 37 (2004) 463–467.
[8] V. Valev, From binary features to non-reducible descriptors in supervised pattern recognition problems, Pattern Recognit. Lett. 45 (2014) 106–114.
[9] V. Valev, N. Yanev, Classification using graph partitioning, in: Proc. of the 21st Int. Conf. on Pattern Recognition, IEEE Xplore, Tsukuba, Japan, 2012, pp. 1261–1264.
[10] V. Valev, N. Yanev, A. Krzyżak, A new geometrical approach for solving the supervised pattern recognition problem, in: Proc. of the 23rd Int. Conf. on Pattern Recognition, IEEE Xplore, Cancun, Mexico, 2016, pp. 1648–1652.
[11] V. Valev, N. Yanev, A. Krzyżak, K.B. Suliman, Supervised classification using feature space partitioning, in: E. Hancock, X. Bai, T.K. Ho, R. Wilson (Eds.), 2018 Joint International Workshop on Statistical, Structural and Syntactic Pattern Recognition, S+SSPR 2018, Beijing, China, LNCS, vol. 11004, Springer, 2018, pp. 194–203.
[12] N. Yanev, S. Balev, A combinatorial approach to the classification problem, Eur. J. Oper. Res. 115 (1999) 339–350.
[13] G. Yang, Z. Tang, Z. Zhang, Y. Zhu, A flexible annealing chaotic neural network to maximum clique problem, Int. J. Neural Syst. 17 (2007) 183–192.