Improving the efficiency of a mixed integer linear programming based approach for multi-class classification problem

Alaleh Maskooki
Department of Applied Mathematics, Ferdowsi University of Mashhad (FUM), Pardis Campus, Azadi Square, Mashhad, Iran

Article info
Article history: Received 19 June 2012; Received in revised form 4 May 2013; Accepted 8 July 2013; Available online 17 July 2013.
Keywords: Data classification; Mixed integer linear programming; Hyper-boxes; Iterative algorithm.

Abstract
Data classification is one of the fundamental issues in data mining and machine learning. A great deal of effort has been devoted to reducing the time required to learn a classification model. In this research, a new model and algorithm are proposed to improve the work of Xu and Papageorgiou (2009). Computational comparisons on real and simulated patterns with different characteristics (including dimension, high overlap or heterogeneity in the attributes) confirm that the improved method considerably reduces the training time in comparison to the original model, while generally maintaining its accuracy. This speed-up is particularly significant in the case of high overlap. In addition, the training time of the proposed model grows much more slowly than that of the original model as the set size or the number of overlapping samples increases. © 2013 Elsevier Ltd. All rights reserved.

1. Introduction

Data classification is generally known as a method for constructing a set of classification rules (a classifier), based on data attributes, in order to predict the class membership of unknown data. This task is accomplished in two phases. In the training phase, a set of data with known membership (the training dataset) is used to estimate the parameters of the boundaries between classes. After these boundaries are determined, in the second phase, new data (test data) not used for training are assigned to classes according to the boundaries obtained, in order to evaluate the accuracy of the model. The latter phase is called the testing procedure. For instance, a dataset can be a set of vectors representing chemical attributes of patients' blood, and each class can be a disease. The extensive applications of data classification in different fields of engineering, medicine, industry and finance have resulted in the development of a variety of methods. Decision tree induction (Safavian & Landgrebe, 1991), neural networks (Hertz, Palmer, & Krogh, 1991), Bayesian networks (Russell & Norvig, 2010) and genetic algorithms are some examples. A simple classifier, constructed by a linear programming (LP) formulation, can take the form of a hyper-plane separating two sets (Bennett & Mangasarian, 1992a; Freed & Glover, 1986; Glover, 1990; Glover, Keene, & Duea, 1988; Lam, Choo, & Moy, 1996; Lam & Moy, 2002). Erenguc and Koehler (1990) studied different types of LP models according to their objective functions. In addition, Koehler (1990) studied the problems that may

appear in formulating an LP model, such as choosing an objective function, unbounded or unacceptable solutions, side constraints, and data translation and transformation. Surveys on mathematical programming models can be found in Pai (2009) and Lee and Wu (2009). Since hyper-planes are not sufficiently accurate in many real cases, piecewise linear classifiers (as approximations of non-linear boundaries) have been studied extensively. The Multi Surface Method (MSM) (Mangasarian, 1968) is a piecewise LP-based classifier that forms a hyper-plane at each stage until all data points are completely separated. Ryoo (2006) proposed a Piecewise Linear and Convex (PLC) discriminant function and used it to develop a mixed integer linear programming (MILP) model. He compared the PLC method with two common methods, including MSM. Results show that MSM constructs more complex boundaries than the ideal shape when used on datasets with high overlap. On the other hand, PLC is appropriate for data with convex boundaries and may fail to perform accurately in other cases. Astorino and Gaudioso (2002) introduced a piecewise linear function that constructs multiple hyper-planes to form a convex polyhedron; compared with other linear methods, this model requires heavy numerical computation. All the above models address the two-class classification problem. There are fewer mathematical programming models for the multi-class case. Bennett and Mangasarian (1992b) proposed a piecewise linear classifier that extends their earlier model for the two-class case. Bal and Orkcu (2011) proposed an LP model that combines three LP models proposed by Gehrlein (1986), Gochet, Stam, Srinivasan, and Chen (1995) and Sueyoshi (2006).


Freed and Glover (1981) presented a linear programming model that assigns intervals to classes. In this model, the data are classified into intervals according to their discriminant scores. A drawback of this method is that an appropriate order for the classes must be determined in advance; otherwise, it yields low accuracy. Gehrlein (1986) suggested an MILP method that overcomes the sorting problem by introducing binary variables for each pair of classes. If the intervals in Gehrlein's model are defined for each direction (attribute) separately, and data coordinates in R^m are used instead of discriminant scores in R, then hyper-boxes are produced instead of a single hyper-plane. Uney and Turkay (2006) presented a new data classification method based on the use of hyper-boxes for determining the class boundaries. They developed an MILP model by converting the relationships among discrete variables to equivalent integer constraints using Boolean algebra, and evaluated the performance of their method on the Iris dataset (Fisher, 1936). Xu and Papageorgiou (2009) formulated an MILP model with a similar concept, enclosing the training data in hyper-boxes, and suggested an algorithm that applies the MILP model iteratively in order to improve accuracy. Kone and Karwan (2011) extended this model to predict cost-to-serve (CTS) values of new customers in the industrial gas business. Although hyper-planes are efficient in classifying data into two sets, they can be inaccurate and inefficient when applied to multi-class problems (Uney & Turkay, 2006). Fig. 1 shows a schematic representation of classifying three datasets using hyper-planes and hyper-boxes. As can be seen in Fig. 1, hyper-boxes are more flexible than rigid hyper-planes for estimating the ideal class boundaries of a pattern, particularly when classifying multiple classes. For instance, if the square or circle data points are omitted from the pattern in Fig. 1, the other two sets can be completely separated by a single hyper-plane, but there are misclassified points when hyper-planes are used to separate all three sets. In addition, the boundaries constructed using hyper-boxes are closer to the actual class boundaries than the borders obtained on the same dataset using hyper-planes. However, integer linear programming (ILP) problems are in general NP-complete (Schrijver, 1998), so large-scale ILP models can be impractically time-consuming for real-world applications. This research focuses on Xu and Papageorgiou's MILP-based method, which can be inefficient when the size of the training set or of the overlapping area increases. Modifications are suggested to improve the efficiency of their model and algorithm.

In the next section, the MILP model and the related iterative algorithm suggested by Xu and Papageorgiou (2009) are briefly stated. The new approach is proposed in Section 3. Numerical computations and comparisons of the two methods are presented in Section 4. Section 5 concludes the discussion.

Notation is as follows. Consider a classification problem with G classes and n samples (i = 1, ..., n) whose class membership is known. Each sample is a vector in R^m, where m is the number of attributes, and the parameter a_{ij} represents the value of sample i on attribute j (j = 1, ..., m).

2. Xu and Papageorgiou's MILP model

In this section, the model of Xu and Papageorgiou (2009) for the multi-class classification problem is summarized. The training process is performed in two stages. In the first stage, an MILP model produces boundaries that form one hyper-box for each class. A hyper-box r is characterized by its central coordinate B_{rj} and its length LE_{rj} on each attribute j (j = 1, ..., m), and is assigned to one of the classes 1, ..., G. The index r_i exists if sample i is assigned to hyper-box r. The binary variable E_i equals 1 if sample i is included in its corresponding hyper-box and 0 otherwise, so the objective of minimizing the total number of misclassifications is expressed by counting the samples with E_i = 0. In addition, the binary variable Y_{rsj} is introduced to prevent boxes with different class labels from overlapping: Y_{rsj} is 0 if boxes r and s do not overlap on attribute j, and 1 otherwise. U and \varepsilon are, respectively, large and small positive constants with arbitrary values, and the parameter A is the initial number of hyper-boxes. The complete MILP model for the Multi-Class data classification Problem (MCP) is formulated as follows:

\min \sum_{i=1}^{n} (1 - E_i)

Subject to:

a_{ij} \geq B_{rj} - \frac{LE_{rj}}{2} - U(1 - E_i)    \forall i, r_i, j    (1)

a_{ij} \leq B_{rj} + \frac{LE_{rj}}{2} + U(1 - E_i)    \forall i, r_i, j    (2)

B_{rj} - B_{sj} + U Y_{rsj} \geq \frac{LE_{rj} + LE_{sj}}{2} + \varepsilon    \forall j, \forall r, s = 1, ..., A, s \neq r    (3)

\sum_{j=1}^{m} (Y_{rsj} + Y_{srj}) \leq 2m - 1    \forall r = 1, ..., A-1, s = r+1, ..., A    (4)

E_i, Y_{rsj} \in \{0, 1\};  LE_{rj} \geq 0;  B_{rj} unrestricted in sign.

[Fig. 1. Linear classifiers: hyper-boxes have more flexibility for estimating the ideal class boundaries of a pattern in comparison to rigid hyper-planes.]
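To make the formulation concrete, the following is a minimal sketch of the MCP model in Python using the PuLP modeling library. It is illustrative only: the paper's experiments use GAMS with CPLEX, and the function name build_mcp, the data layout (a as an n x m array, assign mapping each sample to its box r_i, box_class mapping box indices to class labels) and the default values of U and eps are assumptions, not the authors' code.

```python
# Illustrative sketch of the MCP model (constraints (1)-(4)) in PuLP.
import pulp

def build_mcp(a, assign, box_class, U=1000.0, eps=1e-4):
    n, m = len(a), len(a[0])
    boxes = sorted(box_class)                    # box indices, assumed integers
    prob = pulp.LpProblem("MCP", pulp.LpMinimize)

    E = pulp.LpVariable.dicts("E", range(n), cat="Binary")
    B = pulp.LpVariable.dicts("B", [(r, j) for r in boxes for j in range(m)])
    LE = pulp.LpVariable.dicts("LE", [(r, j) for r in boxes for j in range(m)],
                               lowBound=0)
    Y = pulp.LpVariable.dicts(
        "Y", [(r, s, j) for r in boxes for s in boxes if r != s
              for j in range(m)], cat="Binary")

    # Objective: number of misclassified samples (those with E_i = 0)
    prob += pulp.lpSum(1 - E[i] for i in range(n))

    # (1)-(2): sample i must lie inside its assigned box whenever E_i = 1
    for i in range(n):
        r = assign[i]
        for j in range(m):
            prob += B[(r, j)] - 0.5 * LE[(r, j)] - U * (1 - E[i]) <= a[i][j]
            prob += B[(r, j)] + 0.5 * LE[(r, j)] + U * (1 - E[i]) >= a[i][j]

    # (3): Y_rsj = 0 forces separation of boxes r and s on attribute j
    # (written, as in the algorithm, only for boxes with different labels)
    for r in boxes:
        for s in boxes:
            if r != s and box_class[r] != box_class[s]:
                for j in range(m):
                    prob += (B[(r, j)] - B[(s, j)] + U * Y[(r, s, j)]
                             >= 0.5 * (LE[(r, j)] + LE[(s, j)]) + eps)

    # (4): each mixed-label pair must be separated on at least one attribute
    for r in boxes:
        for s in boxes:
            if r < s and box_class[r] != box_class[s]:
                prob += pulp.lpSum(Y[(r, s, j)] + Y[(s, r, j)]
                                   for j in range(m)) <= 2 * m - 1
    return prob, E, B, LE
```

Solving the returned problem with prob.solve() would invoke PuLP's default CBC solver; the original study solved the models with CPLEX under GAMS, as noted in Section 4.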

Note that constraints (1) and (2) are generated for hyper-box r and sample i only if the corresponding r_i exists. A single B_{rj} and LE_{rj} is created on each attribute j for all the samples assigned to hyper-box r. The non-overlapping constraints (3) and (4) prevent each pair of boxes with different class labels from sharing the same region simultaneously in all dimensions. The model is interpreted in detail by Xu and Papageorgiou (2009).

In order to improve the accuracy of the MCP model, an iterative algorithm is introduced in the second stage of the training process. Multiple boxes are constructed for each class, and the number and position of these boxes are optimized during the second stage. The steps of their algorithm are as follows:

2.1. Xu and Papageorgiou's algorithm

1. Let H be the set of boxes assigned to the same class, D = \emptyset and A = G.
2. Solve the MCP model using A hyper-boxes, each of which represents one of the G classes.
3. According to the solution obtained, identify the samples located outside the produced boxes and let D include all such samples, that is, D = {i | E_i = 0, i = 1, ..., n}.
4. Add one more box for each class that has misclassified samples in D. Update A and H. Update r_i by assigning the misclassified samples to the corresponding new boxes.
5. Formulate the new MCP model using the updated values. Write overlapping constraints only for boxes with different classes.
6. Solve the new model using the updated A boxes.
7. If the objective values of two successive iterations are equal, then stop. Otherwise, let D = \emptyset and go to step 3.

Thus the boundaries for the classes are identified. In the testing procedure, if a new test sample is located inside the boundaries of a produced hyper-box, it is assigned to the class represented by that hyper-box. Otherwise (if it is located outside all produced boxes), it is assigned to the nearest box using the Euclidean norm in the solution space.

According to the algorithm above, the iterative procedure continues until no better solution can be achieved by adding another new hyper-box. In each iteration, a new MILP model is solved regardless of the boundaries obtained for the hyper-boxes in the previous iteration. In other words, a great number of training samples are repeatedly classified by constraints (1) and (2) in every iteration. As mentioned earlier, a drawback of MILP models is their high computational time, which leads to low efficiency of the algorithm when applied to large datasets. In the next section, a different algorithm is proposed which considerably reduces the training time.

3. Proposed improved model

As mentioned above, a great number of samples (which are correctly classified) are reconsidered during the calculation of boundaries for the hyper-boxes in every iteration, and thus extra enclosing constraints (of types (1) and (2)) are produced. The simple idea of the suggested algorithm is to use the boundaries obtained so far (from previous iterations) and to eliminate the correctly classified points from the training set in each iteration, instead of using the whole dataset to calculate new boundaries. In order to implement this idea, some new notation is defined and the MCP model is reformulated. g_r denotes the class to which hyper-box r is assigned. T is the training dataset, which initially includes all samples. P is the set of indices of the hyper-boxes newly added in the current iteration, and K is the cardinality of P. \bar{P} includes the indices of boxes that have already been constructed during past iterations. The parameters B'_{rj} and LE'_{rj} are defined in order to save the B_{rj} and LE_{rj} values obtained in the previous iteration. Y'_{rsj} is a binary variable which equals 0 if boxes r and s do not overlap on attribute j, and 1 otherwise. The new model (named MCPm) is stated as follows:

\min \sum_{i=1}^{n} (1 - E_i)

Subject to:

a_{ij} \geq B_{rj} - \frac{LE_{rj}}{2} - U(1 - E_i)    \forall i, r_i, j    (5)

a_{ij} \leq B_{rj} + \frac{LE_{rj}}{2} + U(1 - E_i)    \forall i, r_i, j    (6)

B_{rj} - B_{sj} + U Y_{rsj} \geq \frac{LE_{rj} + LE_{sj}}{2} + \varepsilon    \forall j, \forall r, s \in P, s \neq r    (7)

\sum_{j=1}^{m} (Y_{rsj} + Y_{srj}) \leq 2m - 1    \forall r, s \in P, r = 1, ..., K-1, s = r+1, ..., K    (8)

B'_{rj} - B_{sj} + U Y'_{rsj} \geq \frac{LE'_{rj} + LE_{sj}}{2} + \varepsilon    \forall j, \forall r \in \bar{P}, \forall s \in P, g_r \neq g_s    (9)

B_{sj} - B'_{rj} + U Y'_{srj} \geq \frac{LE_{sj} + LE'_{rj}}{2} + \varepsilon    \forall j, \forall r \in \bar{P}, \forall s \in P, g_r \neq g_s    (10)

\sum_{j=1}^{m} (Y'_{rsj} + Y'_{srj}) \leq 2m - 1    \forall r \in \bar{P}, \forall s \in P, g_r \neq g_s    (11)

E_i, Y_{rsj}, Y'_{rsj} \in \{0, 1\};  LE_{rj} \geq 0;  B_{rj} unrestricted in sign.
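Continuing the illustrative PuLP sketch from Section 2, the key computational point of constraints (9)-(11) is that the previously fixed boxes enter only through the numbers B'_{rj} and LE'_{rj}, so no new enclosing variables are created for them. The helper below is a hedged sketch under the reconstruction of (9)-(11) given above; the function name, the argument layout, and the assumption that the index sets P and \bar{P} are disjoint are all illustrative choices, not the authors' code.

```python
# Illustrative continuation: non-overlap between a new box s (decision
# variables B, LE) and a fixed box r from a past iteration, whose centre
# B0[r][j] and edge length LE0[r][j] are plain floats, not variables.
import pulp

def add_fixed_box_constraints(prob, B, LE, B0, LE0, new_boxes, old_boxes,
                              box_class, old_class, m, U=1000.0, eps=1e-4):
    # Binary Y' variables for each (old box, new box) pair and attribute,
    # in both orientations, matching constraints (9)-(11).
    Yp = pulp.LpVariable.dicts(
        "Yp",
        [(r, s, j) for r in old_boxes for s in new_boxes for j in range(m)]
        + [(s, r, j) for r in old_boxes for s in new_boxes for j in range(m)],
        cat="Binary")
    for r in old_boxes:                  # r in P-bar: fixed from past iterations
        for s in new_boxes:              # s in P: currently being optimized
            if old_class[r] == box_class[s]:
                continue                 # same label: overlap is allowed
            for j in range(m):
                # (9): separation in one direction unless Y'_rsj = 1
                prob += (B0[r][j] - B[(s, j)] + U * Yp[(r, s, j)]
                         >= 0.5 * (LE0[r][j] + LE[(s, j)]) + eps)
                # (10): separation in the other direction unless Y'_srj = 1
                prob += (B[(s, j)] - B0[r][j] + U * Yp[(s, r, j)]
                         >= 0.5 * (LE[(s, j)] + LE0[r][j]) + eps)
            # (11): each mixed-label pair must be separated on some attribute
            prob += pulp.lpSum(Yp[(r, s, j)] + Yp[(s, r, j)]
                               for j in range(m)) <= 2 * m - 1
    return Yp
```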

3.1. Suggested algorithm

1. Let T be the set of all training samples and P be the set of G initial hyper-box indices, each of which represents a class. Let g_r, for all r \in P, be the class label of box r, and let \bar{P} = \emptyset.
2. Solve the MCPm model.
3. According to the solution obtained:
   i. Set B'_{rj} = B_{rj} and LE'_{rj} = LE_{rj} for all j and all r \in P.
   ii. Let \bar{P} = \bar{P} \cup P.
   iii. Eliminate all samples with E_i \neq 0 from the training set, that is, T = {i | E_i = 0, i = 1, ..., n}.
   iv. Let P = \emptyset.
4. Add one more box for each class that has a sample in T:
   i. Let P include the newly added boxes and update K.
   ii. Update g_r for all r \in P.
   iii. Update r_i by assigning the samples in T to the corresponding new boxes.
5. Formulate the MCPm model using the updated values and solve the new model.
6. If the objective values of two successive iterations are equal, then stop. Otherwise, go to step 3.

Therefore, only the misclassified samples (from different classes, which usually form a small subset of the training dataset) are passed to the next iteration. In this way, the corresponding constraints of


types (5) and (6) are not produced, and thus the number of binary variables E_i decreases step by step. Henceforth, Xu and Papageorgiou's MCP model is simply referred to as MCP, and the related algorithm as the MCP algorithm. Similarly, the proposed new model and the suggested algorithm are referred to as MCPm and the MCPm algorithm, respectively. Note that constraints of types (7) and (8), together with (9)-(11), in MCPm play the same role as the non-overlapping constraints of types (3) and (4) in MCP. It is worth noting that the total number of constraints (7)-(11) is far smaller than the number of constraints of types (3) and (4) when a large number of boxes is produced; as a result, the number of corresponding binary variables is reduced. In fact, if h is the maximum number of same-label boxes, then the number of non-overlapping constraints is a function of h. It can be shown that the number of constraints of type (3) in MCP is always greater than the total number of constraints (7), (9) and (10) in MCPm. In addition, the growth rate of these non-overlapping constraints is O(h^2) in the MCP algorithm, while it is O(h) in the MCPm algorithm. For a proof of this claim, see Appendix A.

4. Comparison of two models

In order to evaluate the proposed improved model, its performance is compared with the MCP method on five real datasets used by Xu and Papageorgiou (2009). Two real datasets (Firm1 and Firm2) refer to financial ratios of bankrupt and non-bankrupt firms; these examples are taken from Xu and Papageorgiou (2009). The other three real datasets (Iris, Glass and E.coli) are available at the UCI machine learning repository, http://archive.ics.uci.edu/ml/datasets.html (Frank & Asuncion, 2010). The Iris dataset (Fisher, 1936) consists of 150 samples from three different types of Iris plant, each characterized by four attributes (sepal length, sepal width, petal length and petal width). The Glass dataset includes 9 structural attributes for 214 instances, classified into 6 glass types. The last dataset represents 336 protein sequences of Escherichia coli, characterized by 7 attributes and labeled according to 8 classes of localization sites. A summary of the data characteristics is given in Table 1.

The comparison is done using m-fold cross validation (m-fold CV) for m = 3 and 8. In the m-fold CV method, the training set is divided into m subsets of almost equal size and the model is trained m times. Each time, one subset is left out and the model is trained on the remaining subsets to determine the decision boundaries; the class membership of the omitted samples is then predicted using the boundaries obtained from the training phase. When the class membership of all samples has been predicted, the number of correctly classified samples is divided by the total number of samples. This ratio is known as the model accuracy. The value of U in the two models is set to the smallest integer greater than the maximum absolute value among the vectors' components, and \varepsilon is set to 10^{-(t+1)}, where t is the maximum number of decimal places of the vectors' components in each dataset.
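As an illustration of this protocol, the sketch below (hypothetical code, not the authors') combines the nearest-box prediction rule from Section 2, the m-fold CV accuracy computation, and the rule for setting U and \varepsilon just described; train stands in for a routine that runs the MCP or MCPm training algorithm and returns the fitted boxes as (label, centre, edge-length) triples.

```python
# Illustrative evaluation sketch; `train` is a hypothetical stand-in for the
# MCP/MCPm training algorithm, returning a list of (label, B, LE) triples.
import math
import random

def predict(boxes, x):
    # Inside a box -> that class (distance 0); otherwise the nearest box in
    # the Euclidean norm, measuring per-attribute overshoot beyond the faces.
    best, best_d2 = None, float("inf")
    for label, B, LE in boxes:
        d2 = sum(max(abs(x[j] - B[j]) - 0.5 * LE[j], 0.0) ** 2
                 for j in range(len(x)))
        if d2 < best_d2:
            best, best_d2 = label, d2
    return best

def choose_params(samples, t):
    # U: smallest integer greater than the largest absolute attribute value;
    # eps: 10^-(t+1), with t the maximum number of decimal places in the data.
    U = math.floor(max(abs(v) for x in samples for v in x)) + 1
    return U, 10.0 ** -(t + 1)

def cross_validate(samples, labels, train, folds=3, seed=0):
    idx = list(range(len(samples)))
    random.Random(seed).shuffle(idx)
    parts = [idx[k::folds] for k in range(folds)]     # folds of near-equal size
    correct = 0
    for k in range(folds):
        held_out = set(parts[k])
        train_set = [(samples[i], labels[i]) for i in idx if i not in held_out]
        boxes = train(train_set)                      # hypothetical MCP/MCPm call
        correct += sum(predict(boxes, samples[i]) == labels[i]
                       for i in parts[k])
    return correct / len(samples)                     # reported model accuracy
```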

The mathematical programming models and algorithms have been implemented in the GAMS modeling software using the CPLEX solver with default settings on a 2 GHz PC (Intel Core2 CPU T7200) with 1 GB of RAM. According to the computational results (see Table 2), on the Firm1, Firm2 and Iris datasets the number of misclassified samples obtained by the two models is equal in almost all situations, so the accuracy of the two algorithms is the same, whereas the MCPm algorithm reaches the solution 1.5-2 times faster than MCP. The accuracy of MCPm on the larger datasets (Glass and E.coli), evaluated using the 3-fold CV method, is equal to or better (in the Glass example) than that of MCP. In addition, the execution speed of the suggested algorithm is up to 5.2 times higher on the E.coli dataset than that of the MCP algorithm. In the 8-fold CV method, the accuracy of MCPm is slightly decreased; in return, the solution is reached within 2.24 h and more than 9.7 h are saved in the last example. As can be seen in Table 2, the training time of the MCP algorithm grows much faster than that of MCPm as the size of the training set rises.

In order to evaluate the performance of the proposed improved model on patterns with different characteristics (in terms of high overlap or heterogeneity in the attributes), two additional artificial datasets are also considered. The two patterns are simulated according to what is reported by Xu and Papageorgiou (2009). The vectors are randomly generated with continuous attributes uniformly distributed in a two-dimensional space in the range of 2 cm x 3 cm. The first artificial example consists of 100 vectors (50 per class); the two classes are linearly separable, but the samples of each class are located in different disjoint regions. The second artificial example includes 200 vectors (100 per class), of which 50 vectors of each class overlap the other class in a common region. The results are given in Table 3.

The first artificial example shows that the MCPm method is not appropriate for datasets with disjoint regions for the classes. This happens because the MCPm method retains, from the initial stages, the errors of samples lying within wrong hyper-boxes, whereas the MCP method eliminates the old boundaries in each iteration and so has access to all the data. On the second dataset, with high overlap, both algorithms result in an equal number of misclassifications, and hence their accuracies are the same, whereas there is a great increase (more than 29 times) in the execution speed of MCPm, as can be seen in Table 3.

Using the complete datasets of the above (real and simulated) examples for training, it is observed that, although the MCP algorithm replaces old boundaries with new ones in each iteration, unlike MCPm which keeps the old boundaries, many of the final borders calculated by the two algorithms are similar; these are the values that are repeatedly computed by the MCP algorithm. According to the computational results in Tables 2 and 3, the suggested algorithm is always faster than the MCP algorithm. In particular, this speed-up can be substantial on overlapping datasets. In addition, there is no significant difference between the accuracies of MCP and MCPm in most cases.

Table 1. Summary of data characteristics.

Dataset   Samples   Classes   Attributes   Samples in each class
Firm1     46        2         4            21; 25
Firm2     83        2         13           61; 22
Iris      150       3         4            50; 50; 50
Glass     214       6         9            70; 76; 17; 13; 9; 29
E.coli    336       8         7            143; 77; 2; 2; 35; 20; 5; 52

Table 2. Comparisons of the two models using the CV technique for five real datasets.

Example   Algorithm   3-Fold CV (%)   Execution time (s)   Time of MCP/MCPm   8-Fold CV (%)   Execution time (s)   Time of MCP/MCPm
Firm1     MCP         78.26           3                    2                  78.26           8                    1.6
          MCPm        78.26           1.5                                     78.26           5
Firm2     MCP         90.36           3                    1.5                91.57           12                   1.5
          MCPm        90.36           2                                       91.57           8
Iris      MCP         92.67           3                    2                  93.33           11                   2
          MCPm        92.67           1.5                                     92.67           5.5
Glass     MCP         47.20           1245                 1.1                50.47           8443                 1
          MCPm        49.53           1086                                    48.60           8019
E.coli    MCP         85.42           15645                5.2                85.71           43015                5.3
          MCPm        85.42           3013                                    83.93           8054

Table 3. Comparisons of the two models using the CV technique for two artificial datasets.

Example                Algorithm   3-Fold CV (%)   Execution time (s)   Time of MCP/MCPm   8-Fold CV (%)   Execution time (s)   Time of MCP/MCPm
Artificial example 1   MCP         94              4                    1.3                94              19                   4.7
                       MCPm        69              3                                       70              4
Artificial example 2   MCP         70.5            17                   1.2                72.5            595                  29.7
                       MCPm        70.5            14                                      72.5            20

5. Conclusion

In this research, a new model and algorithm are proposed to improve the work of Xu and Papageorgiou (2009), in which an MILP-based model was introduced for solving the multi-class data classification problem. In Xu and Papageorgiou's model, the training process is performed in two stages. In the first stage, the MILP model produces boundaries that form one hyper-box for each class; the objective is to minimize the total number of misclassifications. In order to improve the accuracy of the model (named MCP), an iterative algorithm is introduced in the second stage of the training process, which allows multiple boxes to be produced for each class and optimizes the number and position of these boxes. The iterative procedure continues until no better solution can be achieved by adding another new hyper-box. In each iteration, a new MILP model is solved, which can result in high computational time and, therefore, low efficiency of the algorithm when applied to large datasets.

The improved model proposed in this paper considerably reduces the training time of the original model. The simple idea of the suggested algorithm is to use the boundaries obtained from previous iterations and to eliminate the correctly classified points from the training set in each iteration, instead of using the whole dataset to calculate new boundaries. Therefore, the enclosing constraints corresponding to these points are not produced, and the number of binary variables decreases step by step. In addition, it can be shown that the number of non-overlapping constraints (and thus the number of corresponding binary variables) in MCP is always greater than the number of such constraints in the suggested new model (named MCPm); as a function of the maximum number of same-label boxes, this number grows quadratically in MCP and linearly in MCPm.

The performance of the suggested algorithm is compared with the MCP algorithm on five real and two artificial datasets used by Xu and Papageorgiou (2009). Results show that the suggested algorithm is always faster than the other algorithm, while its accuracy is generally maintained. According to Tables 2 and 3, the training time of the MCPm algorithm grows much more slowly than that of MCP as the size of the training set or the number of overlapping samples rises. In particular, the speed-up of MCPm can be substantial on overlapping regions.

In future work, the author would like to compare the training time versus accuracy of the MCPm method with other standard classification methods for solving both multi-class and binary problems, particularly on real datasets. Extending the MCPm algorithm to deal with data samples with disjoint class regions is also an interesting direction for future study. More generally, the existence of binary variables makes ILP-based models time-consuming. As mentioned earlier, the hyper-box enclosure method provides the desired flexibility to approximate boundaries accurately. Hence, designing algorithms

using this method that are based on optimization techniques other than ILP also opens a new area for research.

Acknowledgements

The author is greatly indebted to Dr. Taghizadeh and Dr. Ghanbari for their encouragement, assistance and valuable comments. The author would also like to thank the anonymous referees for their constructive comments and suggestions that helped improve the quality of this manuscript.

Appendix A

Suppose that, at the beginning of step 6 of the MCP algorithm, there are h_u same-label boxes for the u-th class. Then the number of new constraints of type (3) produced for each attribute equals

\sum_{u=1}^{G} \sum_{v=1, v \neq u}^{G} h_u h_v    (12)

In the MCPm algorithm, constraints (7), (9) and (10) have a similar function to (3). Consider the worst case, in which the maximum number G of new boxes is added to the existing boxes. Then 2\binom{G}{2} constraints of type (7) and 2\sum_{v=1}^{G}\sum_{u=1, u \neq v}^{G} (h_u - 1) constraints of types (9) and (10) are produced for each attribute after step 5 of the MCPm algorithm. Suppose the maximum value of h_u over all classes is h. Then the equivalent term 2h^2\binom{G}{2} is obtained for (12), and 2(2h-1)\binom{G}{2} for the sum 2\binom{G}{2} + 2\sum_{v=1}^{G}\sum_{u=1, u \neq v}^{G} (h_u - 1). The comparison therefore reduces to the inequality 2h^2\binom{G}{2} \geq 2(2h-1)\binom{G}{2}, that is, h^2 \geq 2h - 1, which holds for every h \geq 1 since it is equivalent to (h-1)^2 \geq 0. The first iteration implies that the minimum possible value of h is 1. Since the first iterations of both algorithms are identical, the number of non-overlapping constraints in MCP is always


greater than the number of these constraints in MCPm, and the growth rate of these non-overlapping constraints is O(h^2) in the MCP algorithm, whereas it is O(h) in the MCPm algorithm.

References

Astorino, A., & Gaudioso, M. (2002). Polyhedral separability through successive LP. Journal of Optimization Theory and Applications, 112(2), 265–293.
Bal, H., & Orkcu, H. H. (2011). A new mathematical programming approach to multigroup classification problems. Computers and Operations Research, 38(1), 105–111.
Bennett, K. P., & Mangasarian, O. L. (1992a). Robust linear programming discrimination of two linearly inseparable sets. Optimization Methods and Software, 1(1), 23–34.
Bennett, K. P., & Mangasarian, O. L. (1992b). Multicategory discrimination via linear programming. Optimization Methods and Software, 3(1–3), 27–39.
Erenguc, S. S., & Koehler, G. J. (1990). Survey of mathematical programming models and experimental results for linear discriminant analysis. Managerial and Decision Economics, 11(4), 215–225.
Fisher, R. (1936). The use of multiple measurements in taxonomic problems. Annals of Eugenics, 7(2), 179–188.
Frank, A., & Asuncion, A. (2010). UCI machine learning repository. Irvine, CA: University of California, School of Information and Computer Science. Accessed 07.04.13.
Freed, N., & Glover, F. (1981). Simple but powerful goal programming models for discriminant problems. European Journal of Operational Research, 7(1), 44–60.
Freed, N., & Glover, F. (1986). Evaluating alternative linear programming models to solve the two-group discriminant problem. Decision Sciences, 17(2), 151–162.
Gehrlein, W. V. (1986). General mathematical programming formulations for the statistical classification problem. Operations Research Letters, 5(6), 299–304.
Glover, F. (1990). Improved linear programming models for discriminant analysis. Decision Sciences, 21(4), 771–785.
Glover, F., Keene, S., & Duea, B. (1988). A new class of models for the discriminant problem. Decision Sciences, 19(2), 269–280.
Gochet, W., Stam, A., Srinivasan, V., & Chen, C. (1995). Multi-group discriminant analysis using linear programming. Operations Research, 45(2), 213–225.
Hertz, J., Palmer, R., & Krogh, A. (1991). Introduction to the theory of neural computation. Boulder, Colorado: Westview Press.

Koehler, G. J. (1990). Considerations for mathematical programming models in discriminant analysis. Managerial and Decision Economics, 11(4), 227–234.
Kone, E. R. S., & Karwan, M. H. (2011). Combining a new data classification technique and regression analysis to predict the cost-to-serve new customers. Computers and Industrial Engineering, 61(1), 184–197.
Lam, K. F., Choo, E. U., & Moy, J. W. (1996). Minimizing deviations from the group mean: A new linear programming approach for the two-group classification problem. European Journal of Operational Research, 88(2), 358–367.
Lam, K. F., & Moy, J. W. (2002). Combining discriminant methods in solving classification problems in two-group discriminant analysis. European Journal of Operational Research, 138(2), 294–301.
Lee, E. K., & Wu, T. L. (2009). Classification and disease prediction via mathematical programming. In P. M. Pardalos & H. E. Romeijin (Eds.), Handbook of Optimization in Medicine (pp. 381–430). New York: Springer.
Mangasarian, O. L. (1968). Multi-surface method of pattern separation. IEEE Transactions on Information Theory, 14(6), 801–807.
Pai, D. R. (2009). Determining the Efficacy of Mathematical Programming Approaches for Multi-Group Classification (Ph.D. thesis). Retrieved from ProQuest Dissertations and Theses database. (UMI No. 3387968).
Russell, S., & Norvig, P. (2010). Artificial intelligence: A modern approach (3rd ed.). New Jersey: Pearson Education, Inc.
Ryoo, H. S. (2006). Pattern classification by concurrently determined piecewise linear and convex discriminant functions. Computers and Industrial Engineering, 51(1), 79–89.
Safavian, S. R., & Landgrebe, D. A. (1991). A survey of decision tree classifier methodology. IEEE Transactions on Systems, Man, and Cybernetics, 21(3), 660–674.
Schrijver, A. (1998). Theory of linear and integer programming. London: John Wiley & Sons.
Sueyoshi, T. (2006). DEA-discriminant analysis: Methodological comparison among eight discriminant analysis approaches. European Journal of Operational Research, 169(1), 247–272.
Uney, F., & Turkay, M. (2006). A mixed-integer programming approach to multi-class data classification problem. European Journal of Operational Research, 173(3), 910–920.
Xu, G., & Papageorgiou, L. G. (2009). A mixed integer optimisation model for data classification. Computers and Industrial Engineering, 56(4), 1205–1215.