Extracting rules for classification problems: AIS based approach




Expert Systems with Applications 36 (2009) 10494–10502



Humar Kahramanli *, Novruz Allahverdi
Electronic and Computer Education Department, Selcuk University, Konya, Turkey


Keywords: Hybrid neural networks; Artificial Immune Systems; Optimization; Rule extraction; Opt-aiNET

Abstract: Although Artificial Neural Networks (ANNs) usually reach high classification accuracy, the results they produce are in most cases incomprehensible, which is a serious problem in data mining applications. To solve this problem, rules must be extracted from the trained ANN, and various methods have been developed for this purpose. In our previous work, a hybrid neural network was presented for classification (Kahramanli & Allahverdi, 2008). In this study, a method that uses an Artificial Immune Systems (AIS) algorithm is presented to extract rules from the trained hybrid neural network. The data were obtained from the University of California at Irvine (UCI) machine learning repository: the Cleveland heart disease and Hepatitis datasets. The proposed method achieved accuracies of 96.4% and 96.8% for the Cleveland heart disease dataset and the Hepatitis dataset, respectively. These results are among the best compared with the results obtained in related previous studies and reported on the UCI web site. © 2009 Published by Elsevier Ltd.

1. Introduction

One of the important tasks in data mining is classification. In classification, there is a target variable that is partitioned into predefined groups or classes. The classification system takes labeled data instances and generates a model that determines the target variable of new data instances (Mohamadi, Habibi, Abadeh, & Sadi, 2008). One of the most commonly used classifier techniques is the Artificial Neural Network. ANNs are widely used because they offer properties such as learning from examples and some capability for generalization beyond the training data (Mukhopadhyay, Tang, Huang, Yu, & Palakal, 2002). In data mining, neural networks are competitive classifiers because of their imperviousness to "the curse of dimensionality" and their low computational cost on high-dimensional data of huge volume (Wang, 2005). An important drawback of many Artificial Neural Networks, however, is their lack of explanation capability (Andrews, Diederich, & Tickle, 1996): they are black boxes, and consequently it is very difficult to understand how an ANN has solved a problem (Mantas, Puche, & Mantas, 2006). This may cause problems in some cases. To solve this problem, researchers have worked on developing a humanly understandable representation for neural networks, which can be achieved by extracting production rules from trained neural networks (Huang & Xing, 2002).

* Corresponding author. Tel.: +90 332 2233331; fax: +90 332 2412179. E-mail addresses: [email protected] (H. Kahramanli), [email protected] (N. Allahverdi).
0957-4174/$ - see front matter © 2009 Published by Elsevier Ltd. doi:10.1016/j.eswa.2009.01.029

As more algorithms for extracting classification

rules from networks are developed, neural networks are becoming an attractive alternative to other machine learning methods when one faces a decision-making problem and an explanation of how each decision is made must be given. An advantage of neural networks is usually their higher predictive accuracy, and with a rule extraction algorithm the drawback of "black box" neural network prediction can be overcome (Odajima, Hayashi, Tianxia, & Setiono, 2008). Rule extraction techniques are grouped into three approaches: decompositional, pedagogical and eclectic. In contrast with the decompositional approach, which analyzes the activations and weights of the hidden layers of the neural network, the pedagogical approach treats the ANN as a black box and extracts rules by looking only at the input and output activations (Andrews et al., 1996; Tickle, Andrews, Golea, & Diederich, 1997). The pedagogical approach aims at extracting symbolic rules that map the input–output relationship as closely as possible to the way the ANN understands the relationship; the number of these rules and their form do not directly correspond to the number of weights or the architecture of the ANN (Saad & Wunsch, 2007). Finally, the eclectic approach is characterized by any use of knowledge concerning the internal architecture and/or weight vectors of a trained ANN to complement a symbolic learning algorithm (Keedwell, Narayanan, & Savic, 2000a). In our previous work we presented a hybrid neural network for classification. The aim of this study is to develop a new method that uses Artificial Immune Systems to extract rules from this trained hybrid neural network. The work is organized as follows: In the second chapter, previous studies related to this work are introduced. In the third chapter, the framework of this study and the related background theory are presented. In the fourth chapter,


performance metrics are explained. In the fifth chapter, evaluation methods are described. The results of the experiments and their evaluation are presented in the sixth chapter. The final chapter concludes the paper.

2. Literature review

Many researchers have identified and elaborated various properties, both theoretical and empirical, of ANNs for practical pattern classification tasks (Dorado, Rabunal, Rivero, Santos, & Pazos, 2002; Simpson, 1992). To reveal the information concealed in an ANN, researchers have proposed a number of rule extraction techniques. One of the first techniques for extracting rules from neural networks was proposed by Gallant (1988), who was working on connectionist expert systems in which each ANN node represents a conceptual entity. This approach makes the NN understandable, but requires available domain knowledge (Narazaki, Shigaki, & Watanabe, 1995). Also, since redundant rules can appear in the rule extraction process, this method presents the inconvenient necessity of establishing subjective criteria for choosing the appropriate rules (Baron, 1994). McMillan, Mozer, and Smolenski (1992) described a neural network called RuleNet, which learns symbolic rules in array manipulation domains; RuleNet was applied to natural language processing. Towell and Shavlik showed how to use ANNs for rule refinement (Towell & Shavlik, 1993); their algorithm, called SUBSET, is based on the analysis of the weights that make a specific neuron active. Alexander and Mozer developed a rule extraction method, based on connection weights, that assumes activation functions showing approximately Boolean behavior (Alexander & Mozer, 1995). Sethi et al. suggested a rule extraction method based on the connection weights (Sethi & Yoo, 1996). Lu et al. proposed an approach for rule extraction from ANNs based on the clustering of hidden unit activation values (Lu, Setiono, & Liu, 1996). Weijters et al. developed an algorithm called BP-SOM, which trains NNs and automatically extracts classification rules; BP-SOM is based on the clustering of hidden unit activations, performed by Kohonen networks (Weijters, Bosh, & Herik, 1997).
Das and Mozer suggested an ANN learning algorithm that provides a discrete representation more suitable for rule extraction. The method was intended for recurrent networks, but the authors note that it is possible to employ it in other ANNs (Das & Mozer, 1998). Keedwell et al. developed a system in which a genetic algorithm is used to search for rules in the ANN input space (Keedwell et al., 2000a; Keedwell, Narayanan, & Savic, 2000b). Setiono and Leow presented a fast method based on the relevance of hidden units, considering their information gains (Setiono & Leow, 2000). Palade et al. presented a method of rule extraction from ANNs based on interval propagation across the network, using a procedure that inverts the ANN (Palade, Neagu, & Puscasu, 2000). Duch et al. developed a methodology for extracting optimized crisp and fuzzy rules (Duch, Adamczak, & Grabczwski, 2000). Garcez et al. suggested a method to extract non-monotonic rules from ANNs formed by discrete input units (Garcez, Broda, & Gabbay, 2001). Snyders and Omlin compared the performance of symbolic rules extracted from ANNs trained with and without adaptive bias, giving empirical results for a molecular biology problem (Snyders & Omlin, 2001). Jiang et al. described a method that combines ANNs and rule learning (Jiang, Zhou, & Chen, 2002); the proposed algorithm utilizes an ANN ensemble as the front-end process, which generates abundant training instances for the back-end rule learning process. Setiono et al. presented an approach for extracting rules from ANNs trained on regression problems (Setiono, Leow, & Zuarada, 2002). Elalfi


et al. presented an algorithm for extracting rules from databases via a trained ANN using a genetic algorithm (Elalfi, Haque, & Elalami, 2004). In summary, most of the approaches described in the literature have two basic motivations. On the one hand, some authors noticed the need to simplify neural networks to facilitate the rule extraction process, and favor specialized training schemes and architectures for this task; the assumption underlying these approaches is that neural networks can help the extraction of interesting rules. On the other hand, some papers have proposed algorithms mainly intended to clarify the knowledge encoded in previously trained ANNs (Hruschka & Ebecken, 2006). This study focuses on the problem of extracting rules from a previously trained hybrid neural network by using AIS, and is inspired by the work of Elalfi et al. (2004).

3. Study environment and background theories

In our previous work, a hybrid neural network that includes an artificial neural network (ANN) and a fuzzy neural network (FNN) was developed (Kahramanli & Allahverdi, 2008). In this study, a method is proposed that uses AIS for extracting rules from this hybrid neural network.

3.1. Hybrid Neural Network

The architecture of the hybrid neural network is shown in Fig. 1. The backpropagation algorithm has been used for training the network. In the first stage, crisp data are coded as binary and fuzziable data are fuzzified. In the second stage, the crisp data are given to ANN1 and the fuzzy data to the FNN as inputs, and their outputs are obtained. In the third stage, these two outputs are given to ANN2 as inputs, and the output of the hybrid neural network is obtained. The obtained output is compared to the desired output, and the weights of the network are updated using the weight update method of the backpropagation algorithm. The training algorithm for the system is executed as follows:

Step 0. Initialize weights.
Step 1. Code crisp data as binary.
Step 2. Fuzzify fuzziable data.
Step 3. While the stopping condition is false, do Steps 4–9.
Step 4. For each training pair, do Steps 5–8.
Step 5. Present fuzzy data to the input of the FNN and obtain the result.
Step 6. Present crisp data to the input of ANN1 and obtain the result.
Step 7. Present the results of Steps 5 and 6 to the input of ANN2 and calculate the output.
Step 8. If the obtained value differs from the expected value, update the weights.
Step 9. Test the stopping condition.
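Steps 1 and 2 prepare the two input branches of the hybrid network: crisp attributes become one-hot binary strings, and fuzziable attributes become membership degrees. The sketch below illustrates both; the category list and membership breakpoints are hypothetical, not the paper's actual fuzzification parameters.

```python
def one_hot(value, categories):
    # Step 1: code a crisp attribute as a binary string (one bit per category)
    return [1 if value == c else 0 for c in categories]

def triangular(x, a, b, c):
    # Step 2: a triangular membership function for fuzzifying a numeric attribute
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

# Hypothetical attribute definitions, loosely modeled on the paper's datasets.
crisp = one_hot("asympt", ["angina", "asympt", "notang", "abnang"])
fuzzy = [triangular(54, 20, 35, 55),    # membership in "young"
         triangular(54, 35, 50, 65),    # membership in "middle-aged"
         triangular(54, 50, 80, 110)]   # membership in "old"

assert crisp == [0, 1, 0, 0]
assert max(fuzzy) > 0.0
```

The crisp vector feeds ANN1 and the membership vector feeds the FNN (Steps 5–6).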

3.2. Artificial Immune System The human immune system protects the body from a large variety of bacteria, viruses, and other pathogenic organisms (Zheng, Zhang, & Nahavandi, 2004). There is no central organ controlling the immune system: various distributed elements perform complementary tasks (Musilek, Lau, Reformat, & Wyard-Scott, 2006). The main purpose of the immune system is to recognize all cells (or molecules) within the body and categorize those cells as self



[Fig. 1. Hybrid neural network: fuzzy inputs pass through fuzzification into the FNN and then defuzzification; crisp inputs feed ANN1; the outputs of both branches feed ANN2.]

or non-self (Kalinli & Karaboga, 2005), to protect the organism against disease-causing cells called pathogens, and to eliminate malfunctioning cells (Musilek et al., 2006). All elements recognizable by the immune system are called antigens (Musilek et al., 2006). There are two types of antigens: self and non-self. Non-self antigens are disease-causing elements, whereas self-antigens are harmless to the body (Kumar, Prakash, Shankar, & Tiwari, 2006). For every antigen, the acquired immune response must be able to produce a corresponding antibody molecule, so that the antigen can be recognized and defended against (Zheng et al., 2004). There are two major groups of immune cells, B-cells and T-cells, which help in recognizing an almost limitless range of antigenic patterns. It was discovered that people who had been inoculated against diseases contained certain agents that could in some way bind to other infectious agents; these agents were named antibodies (de Castro & Timmis, 2002).

AIS is a computational technique inspired by ideas coming from immunology and used to develop adaptive systems capable of solving problems in different domains (Seredynski & Bouvry, 2007). AIS (de Castro & Timmis, 2002) have become popular in recent years. Applications of AIS include pattern recognition, fault and anomaly detection, data mining and classification, scheduling, machine learning, autonomous navigation, and search and optimization (Hou, Su, & Chang, 2008).

The acronym opt-aiNET stands for "Optimization version of an Artificial Immune Network" (de Castro & Timmis, 2002). It is a particular type of Artificial Immune System developed to solve optimization problems (de Attux et al., 2005). Opt-aiNET is capable of both unimodal and multimodal optimization and can be characterized by five main features (Timmis & Edmonds, 2004):

- The population size is dynamically adjustable.
- It demonstrates exploitation and exploration of the search space.
- It determines the locations of multiple optima.
- It has the capability of maintaining many optimal solutions.
- It has defined stopping criteria.

Opt-aiNET thus presents a number of interesting features, such as dynamic variation of the population size, local and global search, and the ability to maintain any number of optima (Campelo, Guimarães, Igarashi, Ramírez, & Noguchi, 2006). It is a valuable tool for solving a wide range of optimization problems for two main reasons:

1. It presents a good balance between exploration and exploitation of the search space.
2. Differently from other evolutionary proposals, it contains a mechanism devised to regulate the population size and to maintain diversity (de Attux et al., 2005).

The opt-aiNET algorithm borrows ideas from two main theories about how the immune system operates, namely clonal selection and immune network theory (de Attux et al., 2005). The algorithm can be described as follows (de Castro & Timmis, 2002):

1. Initialization: create an initial random population of network antibodies.
2. Local search: while the stopping criterion is not met, do:
   - Clonal expansion: for each network antibody, determine its fitness (an objective function to be optimized) and normalize the vector of fitnesses. Generate a clone for each antibody, i.e., a set of antibodies which are exact copies of their parent.
   - Affinity maturation: mutate each clone inversely proportionally to the fitness of its parent antibody, which is itself kept unmutated. For each set of mutated clones, select the antibody with the highest fitness, and calculate the average fitness of the selected antibodies.
   - Local convergence: if the average fitness of the population does not vary significantly from one iteration to the next, go to the next step; else, return to Step 2.
3. Network interactions: determine the affinity (similarity) between each pair of network antibodies.
4. Network suppression: eliminate all network antibodies whose affinity is less than a pre-specified threshold, and determine the number of remaining antibodies in the network; these are named memory antibodies.
5. Diversity: introduce a number of new randomly generated antibodies into the network and return to Step 2.

4. Performance metrics

Accuracy, sensitivity and specificity are the common performance metrics used in medical diagnosis tasks. Accuracy measures the ability of the classifier to produce an accurate diagnosis; sensitivity measures the ability of the model to identify occurrences of the target class accurately; specificity measures the ability of the model to separate the target class. Accuracy, sensitivity and specificity are calculated as follows (Loo, 2005):

$$\text{Accuracy} = \frac{\text{Total number of correctly diagnosed cases}}{\text{Total number of cases}} \tag{4.1}$$

$$\text{Sensitivity} = \frac{\text{Total number of positive cases correctly diagnosed}}{\text{Total number of positive cases}} \tag{4.2}$$

$$\text{Specificity} = \frac{\text{Total number of negative cases correctly diagnosed}}{\text{Total number of negative cases}} \tag{4.3}$$
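The opt-aiNET loop described in Section 3.2 can be sketched on a toy one-dimensional objective. This is only an illustration of the loop's structure: the objective function, population sizes, mutation schedule and all parameter values below are arbitrary choices, not the authors' settings.

```python
import math
import random

rng = random.Random(7)

def fitness(x):
    # Toy objective to maximize; in the paper this role is played by the
    # function C(Y) built from the trained hybrid network.
    return math.exp(-(x - 2.0) ** 2)

def opt_ainet(n_init=10, n_clones=8, sigma_s=0.3, n_new=4, outer_iters=15):
    pop = [rng.uniform(-5.0, 5.0) for _ in range(n_init)]
    for _ in range(outer_iters):
        # Local search: clonal expansion + affinity maturation, repeated
        # until the average fitness of the population stabilizes.
        prev_avg = None
        while True:
            fmax = max(fitness(p) for p in pop)
            new_pop = []
            for p in pop:
                fstar = fitness(p) / fmax           # normalized fitness
                step = math.exp(-2.0 * fstar)       # small steps for fit parents
                clones = [p + rng.gauss(0.0, step) for _ in range(n_clones)]
                new_pop.append(max([p] + clones, key=fitness))  # parent kept
            pop = new_pop
            avg = sum(fitness(p) for p in pop) / len(pop)
            if prev_avg is not None and abs(avg - prev_avg) < 1e-4:
                break
            prev_avg = avg
        # Network suppression: keep one representative of each cluster of
        # antibodies closer than the threshold sigma_s (memory antibodies).
        memory = []
        for p in sorted(pop, key=fitness, reverse=True):
            if all(abs(p - m) > sigma_s for m in memory):
                memory.append(p)
        # Diversity: inject fresh random antibodies and iterate again.
        pop = memory + [rng.uniform(-5.0, 5.0) for _ in range(n_new)]
    return max(pop, key=fitness)

best = opt_ainet()
```

With the seed above, the loop should settle near the single optimum at x = 2; on a multimodal objective the suppression step would instead leave one memory antibody per peak.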

5. Study environment

In this paper we propose a novel approach for extracting rules from the previously trained hybrid neural network. The idea behind the suggested approach is to use Artificial Immune Systems to optimize the function produced from the trained neural network. The proposed rule extraction algorithm is composed of three parts:



(1) Data coding. (2) Classification of coded data. (3) Rule extraction.

5.1. Data coding

The attributes in the database are grouped into two parts, crisp and fuzziable, and each part is coded separately. Crisp attributes are coded as binary strings by the following method. Let the data have $N$ crisp attributes. Every attribute $A_n$ ($n = 1, 2, \ldots, N$) is divided into $m_n$ subintervals $\{a_1, a_2, \ldots, a_{m_n}\}$ and coded as $\{b_{n1}, b_{n2}, \ldots, b_{nm_n}\}$. If attribute $A_n$ belongs to subinterval $a_i$ ($i = 1, 2, \ldots, m_n$), then $b_{nj}$ is defined as

$$b_{nj} = \begin{cases} 1, & i = j \\ 0, & i \neq j \end{cases} \qquad j = 1, 2, \ldots, m_n \tag{5.1}$$

The coded state of the crisp attributes is

$$X = \bigcup_{n=1}^{N} \bigcup_{i=1}^{m_n} b_{ni} \tag{5.2}$$

and the length of vector $X$ is

$$\sum_{n=1}^{N} m_n \tag{5.3}$$

The following method has been used for coding the fuzziable data. Let the data have $K$ fuzziable attributes. Attribute $L_k$ ($k = 1, 2, \ldots, K$) has $t_k$ fuzzy values $\{\mu_{k1}, \mu_{k2}, \ldots, \mu_{kt_k}\}$. The coded state of the fuzziable attributes is

$$XB = \bigcup_{k=1}^{K} \bigcup_{i=1}^{t_k} \mu_{ki} \tag{5.4}$$

and the length of vector $XB$ is

$$\sum_{k=1}^{K} t_k \tag{5.5}$$

As a result, the common input of the hybrid neural network is

$$Y = X \cup XB = \left( \bigcup_{n=1}^{N} \bigcup_{i=1}^{m_n} b_{ni} \right) \cup \left( \bigcup_{k=1}^{K} \bigcup_{i=1}^{t_k} \mu_{ki} \right) \tag{5.6}$$

and the total length of the input is

$$\sum_{n=1}^{N} m_n + \sum_{k=1}^{K} t_k \tag{5.7}$$

5.2. Data classification

The datasets used in this study consist of two classes, so the output layer can consist of one neuron: the output will be 1 when the presented vector belongs to class 1 and 0 when it belongs to class 0. Let us form the function computed by the output neuron of this neural network, starting with ANN1. The weighted sum of the input signals for the $j$th hidden-layer neuron of ANN1 is

$$ty_j = \sum_{i=1}^{n} w_{ij} x_i, \qquad j = 1, 2, \ldots, k \tag{5.8}$$

where $k$ is the number of neurons in the hidden layer of ANN1 and $n$ is the number of elements of the ANN1 input vector. The output of the $j$th neuron of the hidden layer is

$$y_j = \frac{1}{1 + e^{-(ty_j + \theta_j)}}, \qquad j = 1, 2, \ldots, k \tag{5.9}$$

If we use Eq. (5.8) in Eq. (5.9), we obtain

$$y_j = \frac{1}{1 + e^{-\left( \sum_{i=1}^{n} w_{ij} x_i + \theta_j \right)}} \tag{5.10}$$

The weighted sum of the input signals for the output neuron of ANN1 is

$$tz = \sum_{j=1}^{k} v_j y_j \tag{5.11}$$

If we use Eq. (5.10) in Eq. (5.11), we obtain

$$tz = \sum_{j=1}^{k} v_j \frac{1}{1 + e^{-\left( \sum_{i=1}^{n} w_{ij} x_i + \theta_j \right)}} \tag{5.12}$$

The output of ANN1 is

$$z = \frac{1}{1 + e^{-(tz + \eta)}} \tag{5.13}$$

If we use Eq. (5.12) in Eq. (5.13), for the output neuron of ANN1 we obtain the function

$$z = \left[ 1 + e^{-\left( \sum_{j=1}^{k} v_j \left( 1 + e^{-\left( \sum_{i=1}^{n} w_{ij} x_i + \theta_j \right)} \right)^{-1} + \eta \right)} \right]^{-1} \tag{5.14}$$

If we apply Eqs. (5.8)–(5.14) to the FNN in the same way, we obtain the following function for the output of the FNN:

$$zb = \left[ 1 + e^{-\left( \sum_{j=1}^{l} vb_j \left( 1 + e^{-\left( \sum_{i=1}^{m} wb_{ij}\, xb_i + \theta b_j \right)} \right)^{-1} + \eta b \right)} \right]^{-1} \tag{5.15}$$

Here $m$ is the number of elements of the FNN input vector and $l$ is the number of neurons in its hidden layer. The values $z$ and $zb$ are the inputs of ANN2. Hence, the weighted sum of the input signals for the output neuron of ANN2 is

$$tf = ww_1 z + ww_2\, zb \tag{5.16}$$

The output of the hybrid neural network will then be

$$C = \frac{1}{1 + e^{-(tf + \varphi)}} \tag{5.17}$$

If we use Eq. (5.16) in Eq. (5.17):

$$C = \frac{1}{1 + e^{-(ww_1 z + ww_2\, zb + \varphi)}} \tag{5.18}$$

If we use Eqs. (5.14) and (5.15) in Eq. (5.18), we obtain the following function for the output of the network:

$$C(Y) = \left[ 1 + e^{-\left( ww_1 \left[ 1 + e^{-\left( \sum_{j=1}^{k} v_j \left( 1 + e^{-\left( \sum_{i=1}^{n} w_{ij} x_i + \theta_j \right)} \right)^{-1} + \eta \right)} \right]^{-1} + ww_2 \left[ 1 + e^{-\left( \sum_{j=1}^{l} vb_j \left( 1 + e^{-\left( \sum_{i=1}^{m} wb_{ij}\, xb_i + \theta b_j \right)} \right)^{-1} + \eta b \right)} \right]^{-1} + \varphi \right)} \right]^{-1} \tag{5.19}$$
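Written out in code, the composed output function of Eq. (5.19) is three nested layers of sigmoids. The sketch below uses tiny arbitrary weight values purely for illustration; in the paper these parameters come from training the hybrid network.

```python
import math

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

# Illustrative (untrained) parameters; shapes follow Section 5.2 with k = n = 2.
w = [[0.4, -0.2], [0.1, 0.3]]    # ANN1 hidden weights w_ij
theta = [0.05, -0.1]             # ANN1 hidden biases
v, eta = [0.6, -0.5], 0.2        # ANN1 output weights and bias
wb = [[0.2, 0.1], [-0.3, 0.4]]   # FNN counterparts (l = m = 2)
thetab, vb, etab = [0.0, 0.1], [0.3, 0.2], -0.1
ww1, ww2, phi = 0.7, -0.4, 0.1   # ANN2 output weights and bias

def C(x, xb):
    # Eq. (5.14): output of ANN1 from the crisp input vector x
    z = sigmoid(sum(vj * sigmoid(sum(wj[i] * x[i] for i in range(len(x))) + tj)
                    for wj, tj, vj in zip(w, theta, v)) + eta)
    # Eq. (5.15): output of the FNN from the fuzzified input vector xb
    zb = sigmoid(sum(vbj * sigmoid(sum(wbj[i] * xb[i] for i in range(len(xb))) + tbj)
                     for wbj, tbj, vbj in zip(wb, thetab, vb)) + etab)
    # Eqs. (5.16)-(5.19): output of the hybrid network
    return sigmoid(ww1 * z + ww2 * zb + phi)

out = C([1, 0], [0.8, 0.2])
assert 0.0 < out < 1.0
```

It is this scalar function of the full coded input Y = (x, xb) that Section 5.3 hands to opt-aiNET: antibodies are candidate inputs, and their fitness is how close C(Y) is to the target class value.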

5.3. Extraction of rules

To extract rules from the hybrid neural network, the obtained function C(Y) is optimized; the opt-aiNET algorithm has been used for this purpose. The algorithm proposed for extracting rules from the hybrid neural network is as follows (Kahramanli, 2008):

Step 0. Optimize the function that is formed as the result of training the network.
Step 1. Decode the antibodies that are obtained as the result of the optimization.
Step 2. Divide each antibody into two subsets: the first m elements of the antibody are the crisp attributes, which compose X, and the following t elements are the fuzzy attributes, which compose XB (see Eqs. (5.2)–(5.5)).
  Step 2.1. Decode the subset X that determines the crisp attributes.
    Step 2.1.1. Divide set X into N segments. Every segment determines an attribute An and has length mn.
    Step 2.1.2. Decode according to Eq. (5.1).
    Step 2.1.3. Combine different values of the same attribute with "OR" and different attributes with "AND".
  Step 2.2. Decode the subset XB that determines the fuzzy attributes.
    Step 2.2.1. Divide subset XB into K segments. Every segment determines an attribute Lk and has length tk.
    Step 2.2.2. Decode by applying defuzzification.
    Step 2.2.3. Combine different attributes with "AND".
Step 3. Form the rule base by combining the subrules obtained in Steps 2.1.3 and 2.2.3 with "AND".

6. Evaluation

Two different medical datasets are used in the application part of this study: the Cleveland heart disease and Hepatitis data. The dataset chosen for the first experiment is the Cleveland heart disease dataset from the UCI Machine Learning Repository (http://www.ics.uci.edu/mlearn/MLRepository.html). This dataset contains 303 samples taken from patients with heart problems. The dataset has 13 attributes, namely age, sex, chest pain type, resting blood pressure, serum cholesterol in mg/dl, fasting blood sugar > 120 mg/dl, resting electrocardiographic results, maximum heart rate achieved, exercise induced angina, oldpeak (ST depression induced by exercise relative to rest), the slope of the peak exercise ST segment, number of major vessels (0–3) colored by fluoroscopy, and thal (3 = normal; 6 = fixed defect; 7 = reversable defect). 165 of the samples belong to the healthy class and 138 to the sick class. The dataset has two classes, and the classes are coded as zero and one for presence and absence, respectively. The range values of the attributes are shown in Table 1.

Every crisp attribute value is coded as binary using Table 2. The length of the binary string changes according to how many different values the attribute can take, and every bit of the string is "0" or "1" according to which subinterval the attribute belongs to. For example, if Chest Pain Type = asympt, then "Chest Pain Type" is coded as [0, 1, 0, 0]. With the coding scheme shown in Table 2 we had a total of 23 binary inputs for ANN1. The fuzziable attributes take the values shown in Table 3; triangular and trapezoidal membership functions are used for fuzzification, and 15 inputs were formed for the FNN from these fuzzified data. The hidden layers of ANN1 and the FNN have six and four neurons, respectively. All parameters were chosen empirically for the best convergence rate between the actual and desired output. The neural network was trained using all of the data, with the purpose of rule extraction for the Cleveland heart disease dataset. According to this training result, the network learned 300 of the 303 data.

Table 1. Range values and attribute names for the Cleveland heart disease database.
 1. Age: 29–77
 2. Sex: male, female
 3. Chest pain type: angina, asympt, notang, abnang
 4. Resting blood pressure: 94–200
 5. Serum cholesterol in mg/dl: 126–564
 6. Fasting blood sugar > 120 mg/dl: 0, 1
 7. Resting electrocardiographic results: norm, abn, hyper
 8. Maximum heart rate achieved: 71–202
 9. Exercise induced angina: 0, 1
10. Oldpeak (ST depression induced by exercise relative to rest): 0–6.2
11. The slope of the peak exercise ST segment: up, flat, down
12. Number of major vessels (0–3) colored by fluoroscopy: 0–3
13. Thal: norm, fixed, rever

Table 2. Coding of the crisp attributes of the Cleveland heart disease dataset (attribute: no. of inputs; subintervals).
- Sex: 2; {male}, {female}
- Chest pain type (CPT): 4; {angina}, {asympt}, {notang}, {abnang}
- Fasting blood sugar > 120 mg/dl (FBS): 2; {0}, {1}
- Resting electrocardiographic results (RECR): 3; {norm}, {abn}, {hyper}
- Exercise induced angina (EIA): 2; {0}, {1}
- The slope of the peak exercise ST segment (Slope): 3; {up}, {flat}, {down}
- Number of major vessels (0–3) colored by fluoroscopy (NMV): 4; {0}, {1}, {2}, {3}
- Thal: 3; {norm}, {fixed}, {rever}
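Step 2.1 of the rule extraction algorithm turns each one-hot segment of an antibody back into a condition, OR-ing values within the same attribute and AND-ing across attributes. A minimal sketch, using two of the Table 2 attributes (the helper name and the all-ones convention are assumptions for illustration):

```python
def decode_crisp(antibody_bits, attributes):
    """attributes: list of (name, categories); bit segments follow Table 2."""
    conditions, pos = [], 0
    for name, cats in attributes:
        seg = antibody_bits[pos:pos + len(cats)]
        pos += len(cats)
        active = [c for bit, c in zip(seg, cats) if bit == 1]
        # Skip an all-ones segment: every value allowed means no constraint.
        if active and len(active) < len(cats):
            conditions.append(" OR ".join(f"{name} = {c}" for c in active))
    # AND across attributes; parenthesize OR-combined values (Step 2.1.3).
    return " & ".join(f"({c})" if " OR " in c else c for c in conditions)

attrs = [("Sex", ["male", "female"]),
         ("CPT", ["angina", "asympt", "notang", "abnang"])]
rule = decode_crisp([1, 0, 0, 1, 1, 0], attrs)
assert rule == "Sex = male & (CPT = asympt OR CPT = notang)"
```

The fuzzy segment (Step 2.2) is handled analogously, except that each segment is defuzzified into an interval condition instead of a set of categorical values.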

Table 3. Coding of the fuzziable attributes of the Cleveland heart disease dataset (attribute: no. of inputs; subintervals).
- Age: 3; {Young}, {Middle-aged}, {Old}
- Resting blood pressure (RBP): 3; {Low}, {Medium}, {High}
- Serum cholesterol in mg/dl (SC): 3; {Low}, {Medium}, {High}
- Maximum heart rate achieved (MHRA): 3; {Low}, {Medium}, {High}
- Oldpeak (ST depression induced by exercise relative to rest): 3; {Low}, {Medium}, {High}

Table 4. Classification accuracies obtained by the proposed algorithm and other classifiers for the Cleveland heart disease database (http://www.fizyka.umk.pl/kmk/projects/datasets.html) (method: accuracy (%); reference).
- Suggested approach in this study: 96.4; this study
- C-MLP2LN: 82.5; RA, estimated
- FSM: 82.2; Rafał Adamczak

The obtained function C(Y) (see Eq. (5.19)) has been optimized using the opt-aiNET algorithm. As the result of the optimization, class 0 was formed by decoding the antibodies for which C(Y) equals 0, and class 1 by decoding the antibodies for which C(Y) equals 1. Each antibody consists of 23 binary and 15 real numbers. The method explained in Section 5.3 has been applied for decoding the antibodies. As a result,

Table 5. Classification accuracy, sensitivity and specificity obtained for the Cleveland heart disease database.
- Accuracy: 96.37%
- Sensitivity: 96.37%
- Specificity: 96.38%
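The metrics in Table 5 follow Eqs. (4.1)–(4.3), which reduce to simple ratios over the confusion counts. A sketch with made-up counts (not the paper's actual figures):

```python
def accuracy(tp, tn, fp, fn):
    # Eq. (4.1): correctly diagnosed cases over all cases
    return (tp + tn) / (tp + tn + fp + fn)

def sensitivity(tp, fn):
    # Eq. (4.2): correctly diagnosed positives over all positives
    return tp / (tp + fn)

def specificity(tn, fp):
    # Eq. (4.3): correctly diagnosed negatives over all negatives
    return tn / (tn + fp)

# Hypothetical confusion counts, for illustration only.
tp, tn, fp, fn = 90, 95, 5, 10
assert round(accuracy(tp, tn, fp, fn), 3) == 0.925
assert round(sensitivity(tp, fn), 2) == 0.90
assert round(specificity(tn, fp), 2) == 0.95
```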


36 rules were obtained for class 0 and 40 rules for class 1. The classification accuracy of the rules extracted by the suggested approach is shown in Table 4, together with the accuracies obtained for the same problem by other methods, as reported at http://www.fizyka.umk.pl/kmk/projects/datasets.html for the Cleveland heart disease dataset. As can be seen in Table 5, the classification accuracy, sensitivity and specificity of the proposed system for the Cleveland heart disease database according to Eqs. (4.1)–(4.3) are 96.37%, 96.37% and 96.38%, respectively. It is not possible to show all of the generated rules due to lack of space; examples of the extracted rules for the Cleveland heart disease database are presented in Tables 6 and 7. The dataset chosen for the second experiment is the Hepatitis dataset from the same repository (http://www.ics.uci.edu/mlearn/MLRepository.html). This dataset was formed by Bojan Cestnik and investigates whether people who have this disease continue living or not. It was formed using 19 hepatitis tests and consists of 155

Table 8. Range values and attribute names for the Hepatitis database.
 1. Age: 7–78
 2. Sex: Male, Female
 3. Steroid: No, Yes
 4. Antivirals: No, Yes
 5. Fatigue: No, Yes
 6. Malaise: No, Yes
 7. Anorexia: No, Yes
 8. Liver Big: No, Yes
 9. Liver Firm: No, Yes
10. Spleen Palpable: No, Yes
11. Spiders: No, Yes
12. Ascites: No, Yes
13. Varices: No, Yes
14. Bilirubin: 0.3–8
15. Alk phosphate: 26–295
16. SGOT: 14–648
17. Albumin: 2.1–6.4
18. Protime: 0–100
19. Histology: No, Yes

Table 6. Examples of the extracted set of rules for Class 0 for the Cleveland heart disease database.
1. If Age ≤ 68 & (CPT = asympt OR CPT = abnang) & RBP ≤ 147 & FBS = 0 & MHRA ∈ [92, 175] & EIA = 1 & Oldpeak ∈ [0.3, 3.5] & Slope ≠ up & Thal = rever Then Class 0
2. If Age ≤ 63 & Sex = male & CPT = asympt & RBP ≤ 141 & MHRA ∈ [86, 172] & Oldpeak ≤ 1.8 & Slope ≠ down & (NMV = 1 OR NMV = 2) & Thal ≠ fixed Then Class 0
3. If Age ≥ 50 & RBP ≥ 110 & SC ≥ 195 & RECR = hyper & MHRA ≤ 180 & Oldpeak ∈ [0.6, 3.2] & Slope ≠ up & (NMV = 1 OR NMV = 2) Then Class 0
4. If Age ≤ 63 & RBP ≥ 110 & FBS = 0 & MHRA ≤ 183 & Oldpeak ∈ [2.4, 6.2] & Slope ≠ up & Thal ≠ rever Then Class 0
5. If Age ≤ 63 & Sex = male & CPT = asympt & RBP ≥ 108 & RECR = hyper & MHRA ≤ 174 & Oldpeak ∈ [0.2, 3.7] & Slope ≠ down & NMV = 4 & Thal ≠ fixed Then Class 0

Table 7. Examples of the extracted set of rules for Class 1 for the Cleveland heart disease database.
1. If Age ∈ [37, 67] & Sex = female & CPT ≠ asympt & RBP ≤ 145 & RECR ≠ abn & MHRA ≤ 181 & Oldpeak ≤ 2.3 & Slope ≠ flat & NMV ≤ 1 & Thal ≠ rever Then Class 1
2. If Age ≥ 45 & Sex = female & CPT ≠ angina & RBP ≤ 147 & MHRA ≤ 170 & EIA = 0 & Oldpeak ∈ [0.1, 3.4] & (NMV = 0 OR NMV = 2) Then Class 1
3. If Age ≤ 69 & (CPT = angina OR CPT = notang) & RBP ≥ 111 & SC ≤ 238 & RECR ≠ abn & MHRA ∈ [101, 180] & Oldpeak ≤ 2.5 & Slope ≠ flat & NMV ≤ 1 & Thal ≠ rever Then Class 1
4. If Age ≤ 66 & Sex = male & RBP ∈ [107, 142] & MHRA ≥ 124 & EIA = 0 & Oldpeak ≤ 3.1 & Slope ≠ flat & NMV = 1 & Thal = norm Then Class 1
5. If Age ≤ 66 & Sex = female & CPT ≠ angina & RBP ∈ [105, 141] & SC ≤ 233 & RECR ≠ hyper & MHRA ≥ 132 & EIA = 0 & Oldpeak ≤ 3.4 & NMV ≠ 3 & Thal = norm Then Class 1
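Each extracted rule is directly executable as a predicate over a patient record. For example, rule 2 of Table 7 can be sketched as follows (the record keys and value encodings are hypothetical):

```python
def rule2_class1(r):
    # Rule 2 of Table 7 for the Cleveland dataset, written as a predicate.
    return (r["Age"] >= 45 and r["Sex"] == "female"
            and r["CPT"] != "angina" and r["RBP"] <= 147
            and r["MHRA"] <= 170 and r["EIA"] == 0
            and 0.1 <= r["Oldpeak"] <= 3.4
            and r["NMV"] in (0, 2))

patient = {"Age": 52, "Sex": "female", "CPT": "notang", "RBP": 130,
           "MHRA": 150, "EIA": 0, "Oldpeak": 1.2, "NMV": 0}
assert rule2_class1(patient)
```

In contrast to the trained network itself, such a predicate is fully inspectable, which is the point of the extraction exercise.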



Table 9. Coding of the crisp attributes of the Hepatitis dataset (attribute; subintervals).
- Sex: {Male}, {Female}
- Steroid, Antivirals, Fatigue, Malaise, Anorexia, Liver Big, Liver Firm, Spleen Palpable, Spiders, Ascites, Varices, Histology: {No}, {Yes}

Table 11. Classification accuracies obtained by the proposed algorithm and other classifiers for the Hepatitis database (http://www.fizyka.umk.pl/kmk/projects/datasets.html) (method: accuracy (%); reference).
- Suggested approach in this study: 96.78; this study
- C-MLP2LN: 96.1; http://www.fizyka.umk.pl
- FSM: 90; http://www.fizyka.umk.pl
- CART (decision tree): 82.7; http://www.fizyka.umk.pl

Table 12
The obtained Hepatitis database classification accuracy, sensitivity and specificity.

Accuracy     96.78%
Sensitivity  97.56%
Specificity  93.75%
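The figures in Table 12 follow the usual confusion-matrix definitions; a minimal sketch is shown below. The paper's Eqs. (4.1)–(4.3) are assumed to be the standard forms, and the counts used here are an illustration chosen to be consistent with the reported 97.56% sensitivity and 93.75% specificity on the 155-sample Hepatitis data (treating the 123 "living" cases as positives); they are not taken from the paper.

```python
def metrics(tp, tn, fp, fn):
    """Accuracy, sensitivity and specificity from confusion-matrix counts
    (assumed standard forms of the paper's Eqs. (4.1)-(4.3))."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)   # true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    return accuracy, sensitivity, specificity

# Illustrative counts only (120 of 123 positives and 30 of 32 negatives correct):
acc, sens, spec = metrics(tp=120, tn=30, fp=2, fn=3)
```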

Table 10
Coding of the fuzzifiable attributes of the Hepatitis dataset.

Attribute      No. of inputs  Subintervals
Age            3              {Young}, {Middle-aged}, {Old}
Bilirubin      3              {Low}, {Medium}, {High}
Alk phosphate  2              {Normal}, {High}
SGOT           2              {Normal}, {High}
Albumin        3              {Low}, {Medium}, {High}
Protime        3              {Low}, {Medium}, {High}

samples. Thirty-two of the samples have died and 123 have continued living. The dataset has 19 attributes, namely age, sex, steroid, antivirals, fatigue, malaise, anorexia, liver big, liver firm, spleen palpable, spiders, ascites, varices, bilirubin, alk phosphate, SGOT, albumin, protime, and histology. The outputs are coded as 0 and 1: 0 means that the person continues living, and 1 means that the person has died. The value ranges of the attributes are shown in Table 8.

Crisp attribute values are coded as binary by using Table 9: "Yes" is coded as {0,1}, "No" as {1,0}, "Male" as {1,0}, and "Female" as {0,1}. Twenty-six binary inputs are thus formed for ANN1 via the coding scheme shown in Table 9. Fuzzy attributes take the values shown in Table 10; triangular and trapezoidal membership functions have been used for fuzzification, and sixteen inputs are formed for FNN from these fuzzified data. The hidden layers of ANN1 and FNN have four and five neurons, respectively. All parameters were chosen empirically for the best convergence rate between the actual and desired outputs. With the purpose of rule extraction for the Hepatitis dataset, the neural network has been trained on all of the data; as a result of this training, the network has learned all 155 samples. The obtained function C(Y) has then been optimized using the Opt-aiNET algorithm. Class 0 has been formed by
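The input coding described above can be sketched as follows. The two-bit coding of the crisp attributes is exactly that of Table 9; the membership-function breakpoints for Age, however, are hypothetical, since the paper does not list the fuzzy-set parameters it used.

```python
# Crisp attributes: two-bit coding as in Table 9.
CRISP = {"Yes": (0, 1), "No": (1, 0), "Male": (1, 0), "Female": (0, 1)}

def triangular(x, a, b, c):
    """Triangular membership function rising on [a, b] and falling on [b, c].
    (The paper also uses trapezoidal functions, built analogously.)"""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

# Fuzzification of Age into three FNN inputs (Table 10). The breakpoints
# below are illustrative guesses, not the paper's actual values.
AGE_SETS = {"Young": (0, 20, 45), "Middle-aged": (30, 50, 70), "Old": (55, 80, 100)}

def fuzzify_age(age):
    return [triangular(age, *abc) for abc in AGE_SETS.values()]
```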

Table 13
Examples of the extracted set of rules for Class 0 for the Hepatitis database.

1. If Age ∈ [13, 67] & Sex = Male & Anorexia = Yes & Liver_Firm = Yes & Spleen Palpable = Yes & Ascites = Yes & Varices = Yes & Bilirubin ≤ 2 & Alk phosphate ≤ 121 & SGOT ≥ 35 & Albumin ∈ [3, 6] & Protime ≥ 13 & Histology = No Then Class 0
2. If Age ∈ [16, 67] & Sex = Male & Anorexia = Yes & Spiders = Yes & Bilirubin ≤ 2 & Alk phosphate ∈ [65, 112] & SGOT ≤ 58 & Albumin ∈ [3, 5] & Protime ≥ 29 Then Class 0
3. If Age ∈ [19, 67] & Steroid = No & Antiviral = Yes & Spiders = Yes & Varices = Yes & Bilirubin ≤ 2 & Alk phosphate ∈ [70, 117] & Albumin ∈ [3, 6] & Protime ≥ 28 & Histology = No Then Class 0
4. If Age ∈ [9, 58] & Malaise = Yes & Liver_Firm = Yes & Spleen Palpable = Yes & Ascites = Yes & Varices = Yes & Bilirubin ≤ 2 & Alk phosphate ≤ 115 & Albumin ∈ [3, 6] & Protime ≥ 24 & Histology = Yes Then Class 0
5. If Age ∈ [36, 72] & Sex = Male & Liver_Firm = No & Spiders = No & Ascites = Yes & Bilirubin ≤ 1 & Alk phosphate ∈ [63, 115] & SGOT ≤ 57 & Albumin ∈ [3, 6] & Protime ≤ 68 Then Class 0

Table 14
Examples of the extracted set of rules for Class 1 for the Hepatitis database.

1. If Age ∈ [19, 70] & Sex = Male & Antiviral = Yes & Fatigue = No & Ascites = No & Bilirubin ≤ 2 & Alk phosphate ≥ 64 & SGOT ≤ 54 & Albumin ≤ 5 & Protime ≤ 66 & Histology = Yes Then Class 1
2. If Age ∈ [12, 68] & Sex = Male & Steroid = No & Antiviral = Yes & Fatigue = No & Anorexia = Yes & Liver_Big = Yes & Liver_Firm = Yes & Bilirubin ≥ 2 & Alk phosphate ≥ 64 & SGOT ≥ 16 & Albumin ≤ 3.8 & Protime ≤ 51 & Histology = Yes Then Class 1
3. If Age ∈ [11, 70] & Sex = Male & Steroid = No & Antiviral = Yes & Fatigue = No & Malaise = No & Liver_Big = Yes & Liver_Firm = Yes & Spiders = No & Bilirubin ≤ 2 & Alk phosphate ≤ 117 & SGOT ≥ 16 & Albumin ≤ 4 & Protime ≤ 49 Then Class 1
4. If Age ∈ [35, 69] & Sex = Male & Malaise = No & Anorexia = Yes & Liver_Big = Yes & Spleen Palpable = No & Spiders = No & Ascites = Yes & Bilirubin ≤ 2 & Alk phosphate ≥ 68 & Albumin ∈ [3, 6] & Protime ≤ 48 & Histology = Yes Then Class 1
5. If Age ∈ [35, 55] & Liver_Big = Yes & Liver_Firm = Yes & Spleen Palpable = No & Spiders = No & Varices = Yes & Bilirubin ≤ 2 & Alk phosphate ≥ 71 & Albumin ≤ 4 & Protime ∈ [20, 59] & Histology = Yes Then Class 1
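Once decoded, each extracted rule is simply a conjunction of interval and equality tests and can be applied directly to a record. As a sketch, rule 2 of Table 13 written as a predicate over a patient dictionary (the attribute keys and the sample patient are our own, for illustration only):

```python
def rule_class0_2(r):
    """Rule 2 of Table 13: fires -> predicts Class 0 (continues living)."""
    return (16 <= r["Age"] <= 67 and r["Sex"] == "Male"
            and r["Anorexia"] == "Yes" and r["Spiders"] == "Yes"
            and r["Bilirubin"] <= 2 and 65 <= r["Alk phosphate"] <= 112
            and r["SGOT"] <= 58 and 3 <= r["Albumin"] <= 5
            and r["Protime"] >= 29)

# Hypothetical patient record satisfying every condition of the rule:
patient = {"Age": 34, "Sex": "Male", "Anorexia": "Yes", "Spiders": "Yes",
           "Bilirubin": 1.0, "Alk phosphate": 85, "SGOT": 40,
           "Albumin": 4.0, "Protime": 35}
rule_class0_2(patient)  # → True: the rule fires, assigning Class 0
```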


decoding the antibodies for which the function C(Y) equals 0, and Class 1 has been formed by decoding the antibodies for which C(Y) equals 1. Antibodies consist of 26 binary and 16 real numbers, and the decoding method explained above has been applied to them. As a result, 45 rules have been obtained for Class 0 and 22 rules for Class 1.

The classification accuracy of the extracted rules is shown in Table 11, together with the accuracies obtained for the same problem by other methods as reported at http://www.fizyka.umk.pl/kmk/projects/datasets.html. As can be seen in Table 12, the classification accuracy, sensitivity and specificity of the proposed system for the Hepatitis database, computed according to (4.1), (4.2) and (4.3), are 96.78%, 97.56% and 93.75%, respectively. It is not possible to show all generated rules due to lack of space; examples of the extracted rules for the Hepatitis database are presented in Tables 13 and 14.

7. Conclusion

Mining classification rules is an important task of data mining. In previous work we presented a hybrid neural network for classification (Kahramanli & Allahverdi, 2008). In this paper, an algorithm for extracting comprehensible rules from the trained hybrid neural network has been presented. This algorithm takes all input attributes into consideration and extracts the rules efficiently. The approach consists of three phases: (1) data coding; (2) classification of the coded data; (3) rule extraction. The proposed approach has been applied to two real-world classification problems, with data obtained from the University of California at Irvine (UCI) Machine Learning Repository. The comparative experimental results show that the developed approach can generate accurate rules.
Although the method is successful, a great number of rules are formed. In future work, we aim to achieve the classification with fewer rules by reducing this number.

Acknowledgement

This study is supported by the Scientific Research Projects Unit of Selcuk University.

References

Alexander, J. A., & Mozer, M. C. (1995). Template-based algorithm for connectionist rule extraction. In G. Tesauro, D. Touretzky, & T. Leen (Eds.), Advances in neural information processing systems (Vol. 7). Cambridge, MA: MIT Press.
Andrews, R., Diederich, J., & Tickle, A. B. (1996). A survey and critique of techniques for extracting rules from trained artificial neural networks. Knowledge-Based Systems, 8, 373–389.
Baron (1994). Knowledge extraction from neural networks: A survey. Report No. 94-17, Laboratoire de l'Informatique du Parallélisme, Ecole Normale Supérieure de Lyon.
Campelo, F., Guimarães, F. G., Igarashi, H., Ramírez, J., & Noguchi, S. (2006). A modified immune network algorithm for multimodal electromagnetic problems. IEEE Transactions on Magnetics, 42(4).
Das, S., & Mozer, M. (1998). Dynamic on-line clustering and state extraction: An approach to symbolic learning. Neural Networks, 11(1), 53–64.
de Attux, R. R., Duarte, L. T., Ferrari, R., Panazio, C. M., de Castro, L. N., Von Zuben, F. J., et al. (2005). MLP-based equalization and pre-distortion using an artificial immune network. In Machine learning for signal processing: 2005 IEEE workshop (pp. 177–182).
de Castro, L. N., & Timmis, J. (2002). Artificial immune systems: A new computational intelligence approach. UK: Springer.
Dorado, J., Rabunal, J. R., Rivero, D., Santos, A., & Pazos, A. (2002). Automatic recurrent ANN rule extraction with genetic programming. In Proceedings of the 2002 international joint conference on neural networks (pp. 1552–1557).
Duch, W., Adamczak, R., & Grabczewski, K. (2000). A new methodology of extraction, optimization and application of crisp and fuzzy logical rules. IEEE Transactions on Neural Networks, 11(2), 1–31.
Elalfi, A. E., Haque, R., & Elalami, M. E. (2004). Extracting rules from trained neural network using GA for managing E-business. Applied Soft Computing, 4.
Gallant, S. I. (1988). Connectionist expert systems. Communications of the ACM, 31(2), 152–169.
Garcez, A. S. D., Broda, K., & Gabbay, D. M. (2001). Symbolic knowledge extraction from trained neural networks: A sound approach. Artificial Intelligence, 125, 155–207.
Hou, T., Su, C., & Chang, H. (2008). Using neural networks and immune algorithms to find the optimal parameters for an IC wire bonding process. Expert Systems with Applications, 34, 427–436.
Hruschka, E. R., & Ebecken, N. F. F. (2006). Extracting rules from multilayer perceptrons in classification problems: A clustering-based approach. Neurocomputing, 70, 384–397.
Huang, S. H., & Xing, H. (2002). Extract intelligible and concise fuzzy rules from neural networks. Fuzzy Sets and Systems, 132, 233–243.
Jiang, Y., Zhou, Z., & Chen, Z. (2002). Rule learning based on neural network ensemble. In Proceedings of the international joint conference on neural networks, Honolulu (pp. 1416–1420).
Kahramanli, H. (2008). Developing a classification and rule extraction system using a hybrid fuzzy neural network. PhD thesis, Selcuk University.
Kahramanli, H., & Allahverdi, N. (2008). Design of a hybrid system for the diabetes and heart diseases. Expert Systems with Applications, 35(1–2), 82–89.
Kalinli, A., & Karaboga, N. (2005). Artificial immune algorithm for IIR filter design. Engineering Applications of Artificial Intelligence, 18, 919–929.
Keedwell, E., Narayanan, A., & Savic, D. (2000a). Evolving rules from neural networks trained on continuous data. In Proceedings of the 2000 congress on evolutionary computation.
Keedwell, E., Narayanan, A., & Savic, D. (2000b). Creating rules from trained neural networks using genetic algorithms. International Journal of Computers, Systems and Signals (IJCSS), 1(1), 30–42.
Kumar, A., Prakash, A., Shankar, R., & Tiwari, M. K. (2006). Psychoclonal algorithm based approach to solve continuous flow shop scheduling problem. Expert Systems with Applications, 31, 504–514.
Loo, C. K. (2005). Accurate and reliable diagnosis and classification using probabilistic ensemble simplified fuzzy ARTMAP. IEEE Transactions on Knowledge and Data Engineering, 17(11).
Lu, H., Setiono, R., & Liu, H. (1996). Effective data mining using neural networks. IEEE Transactions on Knowledge and Data Engineering, 8(6), 957–961.
Mantas, C. J., Puche, J. M., & Mantas, J. M. (2006). Extraction of similarity based fuzzy rules from artificial neural networks. International Journal of Approximate Reasoning, 43, 202–221.
McMillan, C., Mozer, M. C., & Smolensky, P. (1992). Rule induction through integrated symbolic and subsymbolic processing. In J. E. Moody, S. J. Hanson, & R. P. Lippmann (Eds.), Advances in neural information processing systems (Vol. 4, pp. 969–976). Los Altos: Morgan Kaufmann.
Mohamadi, H., Habibi, J., Abadeh, M. S., & Sadi, H. (2008). Data mining with a simulated annealing based fuzzy classification system. Pattern Recognition, 41, 1824–1833.
Mukhopadhyay, S., Tang, C., Huang, J., Yu, M., & Palakal, M. (2002). A comparative study of genetic sequence classification algorithms. In Neural networks for signal processing: Proceedings of the 2002 12th IEEE workshop, 4–6 September (pp. 57–66).
Musilek, P., Lau, A., Reformat, M., & Wyard-Scott, L. (2006). Immune programming. Information Sciences, 176, 972–1002.
Narazaki, H., Shigaki, I., & Watanabe, T. (1995). A method for extracting approximate rules from neural networks. In Proceedings of the fourth IEEE international conference on fuzzy systems and second international fuzzy engineering symposium, Yokohama, Japan.
Newman, D. J., Hettich, S., Blake, C. L., & Merz, C. J. (1998). UCI repository of machine learning databases. Irvine, CA: University of California, Department of Information and Computer Science. (Last accessed January 2007).
Odajima, K., Hayashi, Y., Tianxia, G., & Setiono, R. (2008). Greedy rule generation from discrete data and its use in neural network rule extraction. Neural Networks. doi:10.1016/j.neunet.2008.01.003.
Palade, V., Neagu, D., & Puscasu, G. (2000). Rule extraction from neural networks by interval propagation. In Proceedings of the fourth IEEE international conference on knowledge-based intelligent engineering systems, Brighton, UK (pp. 217–220).
Saad, E. W., & Wunsch, D. C., II (2007). Neural network explanation using inversion. Neural Networks, 20, 78–93.
Seredynski, F., & Bouvry, P. (2007). Anomaly detection in TCP/IP networks using immune systems paradigm. Computer Communications, 30, 740–749.
Sethi, I., & Yoo, J. (1996). Multi-valued logic mapping of neurons in feedforward networks. Engineering Intelligent Systems, 4(4), 153–243.


Setiono, R., & Leow, W. K. (2000). FERNN: An algorithm for fast extraction of rules from neural networks. Applied Intelligence, 12(1–2), 15–25.
Setiono, R., Leow, W. K., & Zurada, J. M. (2002). Extraction of rules from artificial neural networks for nonlinear regression. IEEE Transactions on Neural Networks, 13(3), 564–577.
Simpson, P. (1992). Fuzzy min–max neural networks. Part 1: Classification. IEEE Transactions on Neural Networks, 3, 776–786.
Snyders, S., & Omlin, C. (2001). Rule extraction from knowledge-based neural networks with adaptive inductive bias. In Proceedings of the eighth international conference on neural information processing (ICONIP) (Vol. 1, pp. 143–148).
Tickle, A. B., Andrews, R., Golea, M., & Diederich, J. (1997). Rule extraction from trained artificial neural networks. Neural Networks: Analysis, Architectures and Applications, 61–69.

Timmis, J., & Edmonds, C. (2004). A comment on Opt-aiNET: An immune network algorithm for optimization. In K. Deb et al. (Eds.), Genetic and evolutionary computation, Lecture Notes in Computer Science (Vol. 3102, pp. 308–317). Springer.
Towell, G. G., & Shavlik, J. (1993). Extracting refined rules from knowledge-based neural networks. Machine Learning, 13, 71–101.
Wang, S. (2005). Classification with incomplete survey data: A Hopfield neural network approach. Computers and Operations Research, 32, 2583–2594.
Weijters, T., van den Bosch, A., & van den Herik, J. (1997). Intelligible neural networks with BP-SOM. In Proceedings of the ninth Dutch conference on artificial intelligence (pp. 27–36).
Zheng, H., Zhang, J., & Nahavandi, S. (2004). Learning to detect texture objects by artificial immune approaches. Future Generation Computer Systems, 20, 1197–1208.