Computer Methods and Programs in Biomedicine 106 (2012) 287–307
An associative memory approach to medical decision support systems

Mario Aldape-Pérez a,b,∗, Cornelio Yáñez-Márquez a, Oscar Camacho-Nieto a, Amadeo J. Argüelles-Cruz a

a Center for Computing Research, CIC-IPN Building, Nueva Industrial Vallejo, G.A. Madero, Mexico City 07738, Mexico
b Superior School of Computing, ESCOM-IPN Building, Lindavista, G.A. Madero, Mexico City 07738, Mexico

Article history: Received 10 September 2010; received in revised form 1 April 2011; accepted 13 May 2011.

Keywords: Associative memories; Decision support systems; Supervised Machine Learning algorithms; Pattern classification

Abstract: Classification is one of the key issues in medical diagnosis. In this paper, a novel approach to performing pattern classification tasks is presented. This model is called the Associative Memory based Classifier (AMBC). Throughout the experimental phase, the proposed algorithm is applied to help diagnose diseases; in particular, it is applied to seven different diagnosis problems in the medical field. The performance of the proposed model is validated by comparing the classification accuracy of AMBC against the performance achieved by twenty other well-known algorithms. Experimental results show that AMBC achieved the best performance in three of the seven pattern classification problems in the medical field. Similarly, it should be noted that our proposal achieved the best classification accuracy averaged over all datasets. © 2011 Elsevier Ireland Ltd. All rights reserved.
1. Introduction
Expert systems (ESs) as we know them today have their origins in the ground-breaking work of Feigenbaum, Buchanan and Lederberg [1–3] in the late sixties and early seventies. From that time until now, demonstrable successes of ESs have resulted in the emergence of knowledge-based applications and, more particularly, of decision support systems. Unlike most daily decisions, many health-care decisions have important implications for the quality of life of the patient, and involve significant uncertainties and trade-offs. The uncertainties may concern the diagnosis, the accuracy of available diagnostic tests, the prevalence of the disease and its attendant risk factors. For such complex decisions, which are inherently affected by so many uncertainties, it is indispensable to have computational tools that help identify which variables of the problem should have a major impact on the decision. It is also necessary to apply effective mathematical models, as well as efficient algorithms, that reduce the level of uncertainty in the diagnosis of the disease. Early models of learning matrices appeared more than four decades ago [4–6], and since then associative memories have attracted the attention of major research groups worldwide. From a connectionist perspective, an associative memory can be considered a special case of the neural computing approach to pattern recognition [7–9]. Furthermore, associative memories have a number of properties, including rapid,
This work was supported by the Science and Technology National Council of Mexico under Grant No. 174952, by the National Polytechnic Institute of Mexico (Project No. SIP-IPN 20101709) and by the ICyTDF (Grants No. PIUTE10-77 and PICSO10-85).
∗ Corresponding author at: Center for Computing Research, CIC-IPN Building, Nueva Industrial Vallejo, G.A. Madero, Mexico City 07738, Mexico. Tel.: +52 55 5729 6000x52032, 56584; fax: +52 55 5754 0506. E-mail address: [email protected] (M. Aldape-Pérez). URL: http://www.aldape.org.mx (M. Aldape-Pérez).
0169-2607/$ – see front matter © 2011 Elsevier Ireland Ltd. All rights reserved. doi:10.1016/j.cmpb.2011.05.002
computationally efficient best-match retrieval and intrinsic noise tolerance, that make them ideal for many applications [10–12]. As a consequence, associative memories have emerged as a computational paradigm for solving pattern recognition tasks efficiently. The first known mathematical model of associative memory is the Lernmatrix, developed in 1961 by the German scientist Karl Steinbuch, who published his article in the German journal Kybernetik [4]. Eight years after the Lernmatrix, Scottish scientists created the Correlograph, an elementary optical device able to behave as an associative memory [13]. In 1972, supported by UCLA, James A. Anderson proposed his Interactive Memory [12]. In April of the same year, Teuvo Kohonen, then a professor at the Helsinki University of Technology, introduced his Correlation Matrix Memories [9]. Three months later, Kaoru Nakano from Todai (Tokyo University) unveiled his Associatron [14]. In that year, Shun-ichi Amari, a professor at Todai, published a theoretical work on self-organizing nets of threshold elements [15]. This work by Amari set a precedent for what would become one of the most important associative memory models: the Hopfield model. The ideas of Anderson and Kohonen, and to some extent Nakano's, gave rise to the model currently known as the Linear Associator. In 1982, John J. Hopfield [16] published an iterative model based on spin glasses. Two years later he published a second article, in which he introduced an extension of the original model: a continuous model [17]. Hopfield's results caused great excitement throughout the associative memory and neural network communities, so much so that many scientists who had so far remained on the sidelines became interested in these topics. Thus, in the late 1980s many scientists took the classic models and gave birth to new kinds of associative memories [18]. Hopfield models are also appealing to many cognitive modelers because of their apparent similarity to human episodic memory: they can recall patterns after a single exposure using a Hebbian learning rule, and they are capable of retrieval from partial or noisy patterns [19]. The Hopfield model is one of the most popular models that use Hebbian learning, and it owes some of its advantages regarding the learning and recall of altered patterns to Hebbian learning rules [20]. Among the myriad contributions and innovations in the field of associative memories, an associative memory approach for pattern recognition termed the Distributed Hierarchical Graph Neuron (DHGN) was presented by Khan and Muhamad Amin [21]. DHGN is a scalable, distributed, one-shot learning pattern recognition algorithm which uses graph representations for pattern matching without increasing the computational complexity of the algorithm. This model has been successfully tested on character patterns with structural and random distortions. The pattern recognition process is completed in one shot and within a fixed number of steps [22–26]. In 1998, Ritter et al. [27] introduced a novel class of artificial neural networks, called morphological neural networks, in which the operations of multiplication and addition are replaced by addition and maximum (or minimum),
respectively. By taking the maximum (or minimum) of sums instead of the sum of products, morphological network computation is nonlinear before any application of a nonlinear activation function. The main difference between morphological associative memories [27–38] and classical associative memories, such as the Linear Associator and the Hopfield model, is that, while the classical models base their operation on the usual addition and multiplication operations over the ring of rational numbers, morphological memories are based on two lattice operations, dilation and erosion, which are immersed in a belt [39]. According to Haykin [40], the Hopfield model is a classical example of a recurrent neural network which is, at the same time, an associative memory model. Even though the Hopfield model has been a cornerstone for both neural networks and associative memories, it has two crippling disadvantages. First, the model shows a very low recall capacity, about 0.15n, where n is the dimension of the stored patterns. Second, the Hopfield model is autoassociative, which means that it is not able to learn, and thus recall, input patterns that are different from the output patterns. In the late 1980s, Kosko [41] developed a heteroassociative memory from two Hopfield memories with the aim of remedying the second disadvantage of the Hopfield model. The resulting Bidirectional Associative Memory (BAM) is based on an iterative algorithm, just as the Hopfield model is. Associative models, including both associative memories and BAMs, have found applications in many fields of human endeavour. They have been widely used to create knowledge databases for expert agents [42], as classifiers [43], for data compression [44], fingerprint recognition [45] and border detection [46], and as English–Spanish/Spanish–English translators [47,48], among others. In this paper, an associative memory based classifier is presented. The paper is organized as follows. In Section 2, a succinct description of associative memory fundamentals is presented. In Section 3, the foundations of the Associative Memory based Classifier (AMBC) are presented. Section 4 provides a concise description of the most important characteristics of the datasets that were used as test sets to validate the experimentation. Section 5 provides a brief description of each of the algorithms that were used during the experimental phase. Section 6 discusses how to make a consistent comparison between the classification performance achieved by our proposal and that achieved by some well known algorithms on different pattern classification problems. Section 7 describes how the experimental phase was conducted. In Section 8, the classification accuracy results achieved by each of the compared algorithms on seven different pattern classification problems in the medical field are presented. Finally, the advantages of the Associative Memory based Classifier, as well as some conclusions, are discussed in Section 9.
2. Associative memories
An associative memory M is a system that relates input patterns and output patterns as follows:
x → M → y
with x and y the input and output pattern vectors, respectively. Each input vector forms an association with its corresponding output vector. For each positive integer μ, the corresponding association will be denoted as (x^μ, y^μ). An associative memory M is represented by a matrix whose ij-th component is m_ij. An associative memory M is generated from an a priori finite set of known associations, called the fundamental set of associations. If μ is an index, the fundamental set is represented as {(x^μ, y^μ) | μ = 1, 2, ..., p}, with p as the cardinality of the set. The patterns that form the fundamental set are called fundamental patterns. If it holds that x^μ = y^μ ∀ μ ∈ {1, 2, ..., p}, M is autoassociative; otherwise it is heteroassociative, in which case it is possible to establish that ∃ μ ∈ {1, 2, ..., p} for which x^μ ≠ y^μ. If we consider the fundamental set of patterns {(x^μ, y^μ) | μ = 1, 2, ..., p}, where n and m are the dimensions of the input patterns and output patterns, respectively, it is said that x^μ ∈ A^n, A = {0, 1}, and y^μ ∈ A^m. Then the j-th component of an input pattern x^μ is x_j^μ ∈ A. Analogously, the i-th component of an output pattern y^μ is represented as y_i^μ ∈ A. Therefore, the fundamental input and output patterns are represented as follows:

x^μ = (x_1^μ, x_2^μ, ..., x_n^μ)^t ∈ A^n,    y^μ = (y_1^μ, y_2^μ, ..., y_m^μ)^t ∈ A^m

A distorted version of a pattern x to be recalled will be denoted as x̃. An unknown input pattern to be recalled will be denoted as x^ω. If, when an unknown input pattern x^ω is fed to an associative memory M, the output corresponds exactly to the associated pattern y^ω, it is said that recalling is correct.

3. Associative memory based classifier

In any associative memory there are two phases that determine the particular performance of each model, namely the learning phase and the classification phase. In our proposal, the Associative Memory based Classifier (AMBC), besides the two phases that are intrinsic to every associative memory, incorporates a procedure for estimating the quality of learning. In what follows, let M be an associative memory whose ij-th component is denoted by m_ij. Let y^μ ∈ A^m, A = {0, 1}, be the μ-th fundamental output pattern of size m ∈ Z+, and let μ be an index such that μ ∈ {1, 2, ..., p}, with p as the cardinality of the set. Let x^ω ∈ R^n be an unknown input pattern to be classified, where n ∈ Z+ is the dimension of the unknown input pattern, and let r be an index such that r ∈ {1, 2, ..., (2^n − 1)}.

Definition 3.1. Let x^1, x^2, ..., x^p be fundamental input patterns; the mean vector x̄ is obtained according to the following expression:

x̄ = (1/p) ∑_{μ=1}^{p} x^μ    (1)

Definition 3.2. Let x̂^1, x̂^2, ..., x̂^p be displaced input patterns, obtained according to the following expression:

x̂^μ = (x^μ − x̄)    ∀ μ ∈ {1, 2, ..., p}    (2)

Definition 3.3. Let m ∈ Z+ be the number of different classes and let k ∈ {1, 2, ..., m} be the class to which a fundamental input pattern x^μ belongs.

Definition 3.4. Let y^1, y^2, ..., y^p be the fundamental output patterns; the i-th component of each fundamental output pattern is coded according to the following expression:

y_i^μ = 1 if i = k, and y_i^μ = 0 if i ∈ {1, 2, ..., k − 1, k + 1, ..., m}    ∀ μ ∈ {1, 2, ..., p}    (3)

Definition 3.5. Let x^ω ∈ R^n be an unknown input pattern to be classified; a displaced unknown input pattern to be classified, denoted as x̂^ω, is obtained according to the following expression:

x̂^ω = (x^ω − x̄)    (4)

Definition 3.6. Let A = {0, 1} and let e^r be the r-th learning reinforcement vector of size n, represented as:

e^r = (e_1^r, e_2^r, ..., e_n^r)^t ∈ A^n    (5)

Definition 3.7. Let A = {0, 1} and let n ∈ Z+ be the dimension of an input pattern. Given an integer value r ∈ Z+, the IntegerToVector operator takes r as input and returns a column vector e^r with the value of r expressed in its binary representation. Note that e_1^r is the Most Significant Bit (MSB) while e_n^r is the Least Significant Bit (LSB).

In order to understand the IntegerToVector operator, consider the following example.

Example 3.1. Let A = {0, 1}, let n = 4 and let r = 11. Obtain the r-th learning reinforcement vector of size n by applying the IntegerToVector operator as stated in Definition 3.7. To convert an integer to its binary representation, we divide r by two repeatedly until the quotient is zero. If we take the remainder of each division, then the number 11 can be expressed as

11 = (1 × 2^3) + (0 × 2^2) + (1 × 2^1) + (1 × 2^0)

After applying the IntegerToVector operator, we obtain the 11th learning reinforcement vector of size 4:

e^11 = (1, 0, 1, 1)^t

In summary, the IntegerToVector operator yields a column vector e^r with r ∈ Z+ expressed in its binary representation.
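To make Definition 3.7 concrete, the following is a minimal Python sketch of the IntegerToVector operator; it reproduces the result of Example 3.1. The function name integer_to_vector and the use of NumPy are our own illustrative choices, not part of the original paper.

```python
import numpy as np

def integer_to_vector(r, n):
    # IntegerToVector operator (Definition 3.7): the n-bit binary
    # representation of r, most significant bit first.
    return np.array([(r >> (n - 1 - j)) & 1 for j in range(n)])

# Example 3.1: r = 11, n = 4 yields e^11 = (1, 0, 1, 1)^t
print(integer_to_vector(11, 4))   # -> [1 0 1 1]
```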
3.1. Learning phase

Find adequate operators and a way to generate an associative memory M that will store the p associations of the fundamental set. Note that there are m different classes, so each one of the input patterns belongs to a class k ∈ {1, 2, ..., m}, represented by a column vector whose components are coded as stated in expression (3). Obtain an associative memory M by performing the following steps:

1. Given the fundamental set of associations {(x^μ, y^μ) | μ = 1, 2, ..., p}, obtain the displaced fundamental set of associations {(x̂^μ, y^μ) | μ = 1, 2, ..., p} using expression (1), expression (2) and expression (3).
2. Consider each one of the p associations (x̂^μ, y^μ), so that an m × n matrix is obtained according to the following expression:

y^μ · (x̂^μ)^t = [ y_i^μ x̂_j^μ ]_{m×n}    (6)

where

y^μ · (x̂^μ)^t = (y_1^μ, y_2^μ, ..., y_m^μ)^t · (x̂_1^μ, x̂_2^μ, ..., x̂_n^μ) =
[ y_1^μ x̂_1^μ   ···   y_1^μ x̂_j^μ   ···   y_1^μ x̂_n^μ ]
[      ⋮                  ⋮                   ⋮       ]
[ y_i^μ x̂_1^μ   ···   y_i^μ x̂_j^μ   ···   y_i^μ x̂_n^μ ]
[      ⋮                  ⋮                   ⋮       ]
[ y_m^μ x̂_1^μ   ···   y_m^μ x̂_j^μ   ···   y_m^μ x̂_n^μ ]    (7)

3. Obtain an associative memory M by adding all the p matrices according to the following expression:

M = ∑_{μ=1}^{p} y^μ · (x̂^μ)^t    (8)

In this way, the ij-th component of the associative memory M is expressed as follows:

m_ij = ∑_{μ=1}^{p} y_i^μ x̂_j^μ

3.2. Classification phase

This phase consists of finding the class to which an unknown input pattern x^ω ∈ R^n belongs. Finding the class means obtaining the y^ω ∈ A^m that corresponds to x^ω.

Definition 3.8. Let y^ω be a column vector that represents the classification result of a displaced test pattern x̂^ω ∈ R^n; the i-th component of y^ω is obtained according to the following expression:

y_i^ω = 1 if ∑_{j=1}^{n} m_ij · x̂_j^ω · e_j^r = θ, and y_i^ω = 0 otherwise    (9)

where θ represents the maximum threshold value

θ = ⋁_{h=1}^{m} [ ∑_{j=1}^{n} m_hj · x̂_j^ω · e_j^r ]    (10)

and ⋁ is the maximum operator.

3.2.1. Classification accuracy

The classification accuracy of any algorithm can be estimated by taking into account the overall number of test patterns that are correctly classified. In the present paper, classification accuracy results were estimated using the following expression:

accuracy(T) = [ ∑_{ω=1}^{|T|} assess(x^ω) ] / |T| ,    x^ω ∈ T    (11)

where T is the set of unknown input patterns to be classified (the test set). Each time the classification result of a test pattern x^ω ∈ T is equal to the actual condition of that pattern, an integer value equal to 1 is assigned to the assessment function, as shown in the following expression:

assess(x^ω) = 1 if classify(x^ω) = y, and 0 otherwise    (12)

where y is the actual condition of a test pattern x^ω and classify(x^ω) returns the classification result of the test pattern x^ω given by the AMBC algorithm.

3.3. AMBC algorithm

This section describes the proposed algorithm, called the Associative Memory based Classifier (AMBC). The algorithm is divided into three phases: the first is the learning phase, the second is the learning reinforcement phase and the third is the classification phase. Given the fundamental set of patterns {(x^μ, y^μ) | μ = 1, 2, ..., p}, with p as the cardinality of the set, an Associative Memory based Classifier is obtained by following the steps outlined below.
Fig. 1 – Learning phase. At this stage we obtain the mean vector x̄, according to expression (1).
3.3.1. Learning phase

1. Let n be the dimension of each input pattern in the fundamental set, grouped into m different classes.
2. Obtain the mean vector x̄, according to expression (1).
3. Obtain the displaced input patterns x̂^1, x̂^2, ..., x̂^p, according to expression (2).
4. Each one of the input patterns belongs to a class k ∈ {1, 2, ..., m}, represented by a column vector whose components are assigned as y_k^μ = 1 and y_j^μ = 0 for j = 1, 2, ..., k − 1, k + 1, ..., m, as stated in expression (3).
5. Create a classifier using expression (6), expression (7) and expression (8).

As a result of the learning phase, we obtain an associative memory M (Figs. 1–3).
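The following NumPy sketch summarizes the learning phase (expressions (1)–(3) and (6)–(8)). It is only an illustration, assuming patterns are stored row-wise in an array X and class labels are the integers 1..m; the function name ambc_learn is ours, not the paper's.

```python
import numpy as np

def ambc_learn(X, labels, m):
    """Learning phase sketch.

    X      : p x n array, one fundamental input pattern per row
    labels : length-p integer array with the class k (1..m) of each pattern
    m      : number of classes
    """
    x_bar = X.mean(axis=0)                      # mean vector, expression (1)
    X_disp = X - x_bar                          # displaced patterns, expression (2)
    Y = np.zeros((X.shape[0], m))
    Y[np.arange(X.shape[0]), labels - 1] = 1.0  # one-hot output patterns, expression (3)
    M = Y.T @ X_disp                            # sum of outer products, expressions (6)-(8)
    return M, x_bar
```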
Fig. 2 – Learning phase. At this stage we obtain the displaced input patterns x̂^1, x̂^2, ..., x̂^p, according to expression (2).

3.3.2. Learning reinforcement phase
1. Initialize r = 1.
2. Initialize r_max = 2^n − 1.
3. Use the IntegerToVector operator to get the r-th learning reinforcement vector of size n, as stated in expression (5).
4. Classify the fundamental set of patterns {(x̂^μ, y^μ) | μ = 1, 2, ..., p} that was used during the learning phase, according to expression (9), so that an r-th classification accuracy parameter is obtained.
5. Store both parameters (the r-th classification accuracy parameter and the r-th learning reinforcement vector).
6. Compare the r-th classification accuracy parameter with the (r − 1)-th classification accuracy parameter. The best classification accuracy value is stored.
7. Increment r.
8. If r < r_max, go to Step 3 of Section 3.3.2.
9. Else, go to Step 10 of Section 3.3.2.
10. End of the learning reinforcement phase.
As a result of the learning reinforcement phase, we obtain the r-th learning reinforcement vector of size n, which allows us to reinforce learning.
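A compact sketch of the learning reinforcement phase, together with the classification rule of expressions (9) and (10), is given below. It assumes the memory M and mean vector x̄ produced by the learning-phase sketch above; names such as ambc_classify and ambc_reinforce are illustrative only, and ties at the maximum threshold are resolved here by taking the first maximal row.

```python
import numpy as np

def ambc_classify(M, x_bar, x, e):
    # Expressions (9)-(10): displace the pattern, mask it with the learning
    # reinforcement vector e, and pick the row of M that reaches the maximum
    # threshold value (ties broken by the first maximum).
    scores = M @ ((x - x_bar) * e)
    return int(np.argmax(scores)) + 1          # predicted class k in 1..m

def ambc_reinforce(M, x_bar, X, labels):
    # Try every r in 1..2^n - 1 and keep the reinforcement vector that
    # classifies the fundamental set best (the quality-of-learning estimate).
    n = X.shape[1]
    best_e, best_hits = None, -1
    for r in range(1, 2 ** n):
        e = np.array([int(b) for b in np.binary_repr(r, width=n)])
        hits = sum(ambc_classify(M, x_bar, x, e) == k for x, k in zip(X, labels))
        if hits > best_hits:
            best_e, best_hits = e, hits
    return best_e, best_hits
```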
3.3.3. Classification phase
Given an unknown input pattern x^ω ∈ R^n to be classified and the r-th learning reinforcement vector of size n, obtain the unambiguously recalled class vector y^ω.

1. Obtain the displaced input test pattern x̂^ω, according to expression (2).
2. Classify the displaced input test pattern x̂^ω, according to expression (9).

The classification phase is applied repeatedly for each unknown input pattern x^ω ∈ R^n to be classified. In order to illustrate each step of the proposed algorithm, consider the following example.

Notation 3.1. Numerical values of the patterns used in this example were randomly taken from the Haberman survival dataset [49]. Each instance of this database has three attributes and a class label. The most important characteristics of this dataset are summarized in Table 3, while a more detailed description of its contents appears in Section 4.1.

Example 3.2. Let p = 8 be the cardinality of the fundamental set of associations and let n = 3 be the dimension of the fundamental input patterns. The fundamental set of associations consists of pairs {(x^μ, y^μ) | μ = 1, 2, ..., 8}. Each input pattern x^μ is a column vector whose components take values in R. Similarly, each output pattern y^μ is a column vector whose components are assigned according to expression (3). Fundamental input patterns x^1, x^2, x^3, x^4 belong to class 1, while x^5, x^6, x^7, x^8 belong to class 2. The fundamental input patterns are as follows:

x^1 = (30, 64, 1)^t,  x^2 = (30, 62, 3)^t,  x^3 = (30, 65, 0)^t,  x^4 = (31, 59, 2)^t
x^5 = (34, 59, 0)^t,  x^6 = (34, 66, 9)^t,  x^7 = (38, 69, 21)^t,  x^8 = (39, 66, 0)^t

As indicated in step 2 of Section 3.3.1, obtain the mean vector x̄, according to expression (1):

x̄ = (33.25, 63.75, 4.5)^t

As indicated in step 3 of Section 3.3.1, obtain the displaced input patterns x̂^1, x̂^2, ..., x̂^8, according to expression (2); that is

x̂^1 = (−3.25, 0.25, −3.50)^t,  x̂^2 = (−3.25, −1.75, −1.50)^t
x̂^3 = (−3.25, 1.25, −4.50)^t,  x̂^4 = (−2.25, −4.75, −2.50)^t
x̂^5 = (0.75, −4.75, −4.50)^t,  x̂^6 = (0.75, 2.25, 4.50)^t
x̂^7 = (4.75, 5.25, 16.50)^t,  x̂^8 = (5.75, 2.25, −4.50)^t

Fig. 3 – Learning phase. As a result of the learning phase, we obtain an associative memory M.

Once the displaced input patterns x̂^1, x̂^2, ..., x̂^8 are available, obtain their corresponding output patterns according to expression (3); that is

y^1 = y^2 = y^3 = y^4 = (1, 0)^t,    y^5 = y^6 = y^7 = y^8 = (0, 1)^t

As indicated in step 5 of Section 3.3.1, create a classifier using expression (6), expression (7) and expression (8):

M =
[ −12  −5  −12 ]
[  12   5   12 ]

As a result of the learning phase, we obtain an associative memory M whose dimensions are m × n.
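As a quick, self-contained check of the learning-phase numbers above, the following snippet (ours, not part of the original paper) reproduces x̄ and M for Example 3.2:

```python
import numpy as np

# Fundamental input patterns of Example 3.2 (one per row) and their classes
X = np.array([[30, 64, 1], [30, 62, 3], [30, 65, 0], [31, 59, 2],
              [34, 59, 0], [34, 66, 9], [38, 69, 21], [39, 66, 0]], dtype=float)
labels = np.array([1, 1, 1, 1, 2, 2, 2, 2])

x_bar = X.mean(axis=0)                      # -> [33.25 63.75  4.5 ]
Y = np.zeros((8, 2)); Y[np.arange(8), labels - 1] = 1.0
M = Y.T @ (X - x_bar)                       # -> [[-12. -5. -12.], [12. 5. 12.]]
print(x_bar, M, sep="\n")
```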
The next step is to apply the learning reinforcement phase. Throughout this phase, we conduct an iterative process to find the r-th learning reinforcement vector of size n that allows us to reinforce learning (Figs. 4–6). As indicated in step 1 and step 2 of Section 3.3.2, initialize r = 1 and r_max = 2^3 − 1. Applying step 3 of Section 3.3.2, we have

e^1 = (0, 0, 1)^t

Fig. 4 – Learning reinforcement phase. Use the IntegerToVector operator to get the r-th learning reinforcement vector of size n, as stated in expression (5).

Fig. 5 – Learning reinforcement phase. Classify the fundamental set of patterns {(x̂^μ, y^μ) | μ = 1, 2, ..., p} that was used during the learning phase, according to expression (9), so that an r-th classification accuracy parameter is obtained.

As indicated in step 4 of Section 3.3.2, classify the displaced fundamental set of patterns that was used during the learning phase. Calculate the first component of the vector y^1 according to expression (9) from Definition 3.8:

(−12 × −3.25 × 0) + (−5 × 0.25 × 0) + (−12 × −3.50 × 1) = 42

Calculate the second component of the vector y^1 according to expression (9) from Definition 3.8:

(12 × −3.25 × 0) + (5 × 0.25 × 0) + (12 × −3.50 × 1) = −42

so that

M · x̂^1 · e^1 = (42, −42)^t,    θ = 42

According to expression (10) from Definition 3.8, the maximum threshold value for this pattern is θ = 42; in this way, the classification result of the displaced test pattern x̂^1 is obtained according to expression (9) from Definition 3.8:

y^1 = (42, −42)^t → (1, 0)^t

As can be seen, the classification result of the displaced test pattern x̂^1 is equal to the actual condition of that pattern; thus, the displaced test pattern x̂^1 was correctly classified (Figs. 7–9).

For the second pattern, we have the following result:

M · x̂^2 · e^1 = (18, −18)^t,    θ = 18

thus, the classification result of the displaced test pattern x̂^2 is

y^2 = (18, −18)^t → (1, 0)^t

As can be seen, the displaced test pattern x̂^2 was correctly classified.
Fig. 6 – Learning reinforcement phase. As a result of the learning reinforcement phase, we obtain the r-th learning reinforcement vector of size n.

Fig. 7 – Classification phase. Given an unknown input pattern x^ω ∈ R^n to be classified and the r-th learning reinforcement vector of size n, obtain the unambiguously recalled class vector y^ω.

For the third pattern, we have the following result:

M · x̂^3 · e^1 = (54, −54)^t,    θ = 54

thus, the classification result of the displaced test pattern x̂^3 is

y^3 = (54, −54)^t → (1, 0)^t

As can be seen, the displaced test pattern x̂^3 was correctly classified.

For the fourth pattern, we have the following result:

M · x̂^4 · e^1 = (30, −30)^t,    θ = 30

thus, the classification result of the displaced test pattern x̂^4 is

y^4 = (30, −30)^t → (1, 0)^t

As can be seen, the displaced test pattern x̂^4 was correctly classified.

For the fifth pattern, we have the following result:

M · x̂^5 · e^1 = (54, −54)^t,    θ = 54

thus, the classification result of the displaced test pattern x̂^5 is

y^5 = (54, −54)^t → (1, 0)^t

As can be seen, the classification result of the displaced test pattern x̂^5 is different from the actual condition of that pattern, that is

(1, 0)^t ≠ (0, 1)^t

consequently, the displaced test pattern x̂^5 was not correctly classified.
Fig. 8 – Classification phase. Obtain the displaced input test pattern x̂^ω, according to expression (2).

Fig. 9 – Classification phase. Classify the displaced input test pattern x̂^ω, according to expression (9).

For the sixth pattern, we have the following result:

M · x̂^6 · e^1 = (−54, 54)^t,    θ = 54

thus, the classification result of the displaced test pattern x̂^6 is

y^6 = (−54, 54)^t → (0, 1)^t

As can be seen, the displaced test pattern x̂^6 was correctly classified.

For the seventh pattern, we have the following result:

M · x̂^7 · e^1 = (−198, 198)^t,    θ = 198

thus, the classification result of the displaced test pattern x̂^7 is

y^7 = (−198, 198)^t → (0, 1)^t

As can be seen, the displaced test pattern x̂^7 was correctly classified.

For the eighth pattern, we have the following result:

M · x̂^8 · e^1 = (54, −54)^t,    θ = 54

thus, the classification result of the displaced test pattern x̂^8 is

y^8 = (54, −54)^t → (1, 0)^t

As can be seen, the classification result of the displaced test pattern x̂^8 is different from the actual condition of that pattern, that is

(1, 0)^t ≠ (0, 1)^t
consequently, the displaced test pattern x̂^8 was not correctly classified.

In summary, for r = 1, six out of eight patterns were correctly classified. According to step 5 of Section 3.3.2, store the classification performance for r = 1. As indicated in step 7 of Section 3.3.2, increment r. This procedure is repeated for each of the patterns, but with different values of r.

Notation 3.2. If we take the number of patterns that were correctly classified for each value of r, we can estimate the quality of learning; that is, if we take the number of patterns that were correctly classified, we can identify the r-th learning reinforcement vector of size n that allows us to reinforce learning.

Table 1 shows the classification performance achieved with different values of r, ranging from r = 1 to r = 4. Table 2 shows the classification performance achieved with different values of r, ranging from r = 5 to r = 7.

Table 1 – Classification performance achieved with different values of r, ranging from r = 1 to r = 4.

Pattern | r = 1 | r = 2 | r = 3 | r = 4
x̂^1 | √ | × | √ | √
x̂^2 | √ | √ | √ | √
x̂^3 | √ | × | √ | √
x̂^4 | √ | √ | √ | √
x̂^5 | × | × | × | √
x̂^6 | √ | √ | √ | √
x̂^7 | √ | √ | √ | √
x̂^8 | × | √ | × | √
Accuracy (%) | 75.0 | 62.5 | 75.0 | 100.0

Table 2 – Classification performance achieved with different values of r, ranging from r = 5 to r = 7.

Pattern | r = 5 | r = 6 | r = 7
x̂^1 | √ | √ | √
x̂^2 | √ | √ | √
x̂^3 | √ | √ | √
x̂^4 | √ | √ | √
x̂^5 | × | × | ×
x̂^6 | √ | √ | √
x̂^7 | √ | √ | √
x̂^8 | √ | √ | √
Accuracy (%) | 87.5 | 87.5 | 87.5

We can see from Table 1 that for r = 1 there were two instances wrongly classified, for r = 2 there were three instances wrongly classified, for r = 3 there were two instances wrongly classified, and for r = 4 all instances were correctly classified. As we can see from Table 2, for r = 5, r = 6 and r = 7 there was only one instance wrongly classified. As a result of the learning reinforcement phase, we obtain the r-th learning reinforcement vector of size n that allows us to reinforce learning. Considering the results shown in Table 1 and Table 2, we can see that the best classification performance is achieved for r = 4. In this case, the 4th learning reinforcement vector of size n is e^4 = (1, 0, 0)^t.

The next step is to apply the classification phase as stated in Section 3.3.3. Given an unknown input pattern x^ω ∈ R^n to be classified and the r-th learning reinforcement vector of size n, obtain the unambiguously recalled class vector y^ω.

Notation 3.3. Numerical values of the test patterns (unknown input patterns to be classified) used in this example were randomly taken from the Haberman survival dataset [49]. Each instance of this database has three attributes and a class label. The most important characteristics of this dataset are summarized in Table 3, while a more detailed description of its contents appears in Section 4.1.

Table 3 – Characteristics of datasets used in the experimental phase.

Dataset | Instances | Attributes | Missing values
1. Haberman | 306 | 3 | No
2. Liver | 345 | 6 | No
3. Inflammation | 120 | 6 | No
4. Diabetes | 768 | 8 | No
5. Breast | 699 | 9 | Yes
6. Heart | 270 | 13 | No
7. Hepatitis | 155 | 19 | Yes

The test set consists of the following patterns:

x^9 = (31, 65, 4)^t,  x^10 = (33, 58, 10)^t,  x^11 = (41, 60, 23)^t,  x^12 = (41, 64, 0)^t

x^9 and x^10 belong to class 1, while x^11 and x^12 belong to class 2. In the same way as with the training patterns, obtain a set of displaced test patterns. As indicated in step 1 of Section 3.3.3, obtain the displaced test patterns x̂^9, x̂^10, ..., x̂^12, according to expression (2); that is

x̂^9 = (−2.25, 1.25, −0.50)^t,  x̂^10 = (−0.25, −5.75, 5.50)^t
x̂^11 = (7.75, −3.75, 18.50)^t,  x̂^12 = (7.75, 0.25, −4.50)^t

As we can see from Table 1, the r-th learning reinforcement vector of size n that allows us to reinforce learning is e^4. As indicated in step 2 of Section 3.3.3, classify the displaced test patterns x̂^9, x̂^10, ..., x̂^12 according to expression (9). For the ninth pattern, we have the following result:

M · x̂^9 · e^4 = (27, −27)^t
According to expression (10) from Definition 3.8, the maximum threshold value for this pattern is θ = 27; thus, the classification result of the displaced test pattern x̂^9 is obtained according to expression (9) from Definition 3.8:

y^9 = (27, −27)^t → (1, 0)^t

As can be seen, the displaced test pattern x̂^9 was correctly classified.

For the tenth pattern, we have the following result:

M · x̂^10 · e^4 = (3, −3)^t,    θ = 3

thus, the classification result of the displaced test pattern x̂^10 is

y^10 = (3, −3)^t → (1, 0)^t

As can be seen, the displaced test pattern x̂^10 was correctly classified.

For the eleventh pattern, we have the following result:

M · x̂^11 · e^4 = (−93, 93)^t,    θ = 93

thus, the classification result of the displaced test pattern x̂^11 is

y^11 = (−93, 93)^t → (0, 1)^t

As can be seen, the displaced test pattern x̂^11 was correctly classified.

For the twelfth pattern, we have the following result:

M · x̂^12 · e^4 = (−93, 93)^t,    θ = 93

thus, the classification result of the displaced test pattern x̂^12 is

y^12 = (−93, 93)^t → (0, 1)^t

As can be seen, the displaced test pattern x̂^12 was correctly classified.

In summary, in this example we have shown the steps of the proposed algorithm. We obtained a vector of size n that allows us to reinforce learning, and we have also shown the behavior of the proposed algorithm when classifying unknown input patterns.
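The classification of the four test patterns above can be reproduced with a few lines of code. This is only an illustrative check: the arrays below are the example's values, while the variable names are ours.

```python
import numpy as np

M = np.array([[-12.0, -5.0, -12.0], [12.0, 5.0, 12.0]])
x_bar = np.array([33.25, 63.75, 4.5])
e4 = np.array([1, 0, 0])                       # 4th learning reinforcement vector

tests = np.array([[31, 65, 4], [33, 58, 10], [41, 60, 23], [41, 64, 0]], dtype=float)
for x in tests:
    scores = M @ ((x - x_bar) * e4)            # expressions (9)-(10)
    print(scores, "-> class", np.argmax(scores) + 1)
# classes printed: 1, 1, 2, 2 (patterns 9-12, all correctly classified)
```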
3.4. AMBC algorithm complexity analysis
Complexity theory investigates the amount of computational resources needed to execute an algorithm. An algorithm is a finite set of precise rules for a computational procedure that solves a problem [50]. It is generally accepted that an algorithm provides a satisfactory solution when it produces a correct answer efficiently. The efficiency of an algorithm can be estimated in two ways. One measure of efficiency is the time required by the computer to solve a problem using a given algorithm. A second measure of efficiency is the amount of memory required to implement that algorithm when the input data are of a given size. In this section we analyze the behavior of the proposed algorithm taking into account time complexity as well as space complexity.
3.4.1. Time complexity
The worst-case time complexity of an algorithm is defined as a function of the size of the input. For a given input size, the worst-case time complexity is the maximal number of execution steps needed to execute the program on an arbitrary input of that size. The operations used to measure time complexity can be single-precision floating point comparison, single-precision floating point addition, single-precision floating point division, variable assignment, logical comparison, or any other elemental operation. The following notation is defined:
EO: elemental operation. n: dimension of input patterns. p: cardinality of the fundamental set of patterns.
Notation 3.4. Only the learning reinforcement phase will be analyzed, since it is the phase that requires the greatest number of elemental operations.
***************************************
Learning Reinforcement Phase
***************************************
 1  r_max = (2^(n));
 2  for r = 1 : r_max - 1
 3      class_hit = 0;
 4      class_miss = 0;
 5      e_r = int_to_vector(r);
 6      for i = 1:p
 7          y_mu_1 = sum(x_mu(i) .* e_r .* M(1));
 8          y_mu_2 = sum(x_mu(i) .* e_r .* M(2));
 9          if y_mu_1 > y_mu_2
10              class_label = class_1;
11          else
12              class_label = class_2;
13          end
14          if class_label == x_mu(i, n)
15              class_hit = class_hit + 1;
16          else
17              class_miss = class_miss + 1;
18          end
19      end
20  end
***************************************
***************************************
Time complexity analysis
***************************************
Line 1:   1 EO, assignment
Line 2:   max_iter EO, comparison
Line 3:   max_iter EO, assignment
Line 4:   max_iter EO, assignment
Line 5:   max_iter · n EO, assignment
Line 6:   max_iter · p EO, comparison
Line 7a:  max_iter · n · p EO, multiplication
Line 7b:  max_iter · n · p EO, multiplication
Line 7c:  max_iter · n · p EO, addition
Line 7d:  max_iter · p EO, assignment
Line 8a:  max_iter · n · p EO, multiplication
Line 8b:  max_iter · n · p EO, multiplication
Line 8c:  max_iter · n · p EO, addition
Line 8d:  max_iter · p EO, assignment
Line 9:   max_iter · p EO, comparison
Line 10:  max_iter · p EO, assignment
Line 14:  max_iter · p EO, comparison
Line 15:  max_iter · p EO, assignment
***************************************

The total number of Elemental Operations, with max_iter = 2^n − 1, is:

Total EOs = 1 + 3(−1 + 2^n) + (−1 + 2^n)n + 7(−1 + 2^n)p + 6(−1 + 2^n)np

By grouping some terms, we have the following:

Total EOs = −2 + 3(2^n) + 7(−1 + 2^n)p + (−1 + 2^n)n(1 + 6p)

If we factor some terms, we have the following:

Total EOs = −2 + 3(2^n) − n + (2^n)n − 7p + 7(2^n)p − 6np + 3(2^{n+1})np

Finally, the equation for the total number of Elemental Operations can be written as

Total EOs = −2 − 7p + (−1 + 2^n)n(1 + 6p) + (2^n)(3 + 7p)

The growth of time and space complexity with increasing input size n is a suitable measure of the efficiency of the algorithm. To obtain an estimate of the complexity of the algorithm when it is applied to a known test set, we took the dataset with the largest number of features, which is the Hepatitis disease dataset [49]. As shown in Table 3, each of its 155 instances has 19 features and a class label. The number of fundamental input patterns is therefore p = 155. The growth of functions is usually described using big-O notation [50].

Definition 3.9. Let f and g be functions from the integers or the real numbers to the real numbers. We say that f(n) is O(g(n)) if there are constants C and k such that f(n) ≤ C g(n) whenever n > k.

With p = 155, the total number of Elemental Operations can be written as

f(n) = Total EOs = 1 + 1088(−1 + 2^n) + 931(−1 + 2^n)n

A function g(n) and constants C and k must be found such that the inequality holds. Dropping the negative terms,

f(n) = 1 − 1088 − 931n + 1088(2^n) + 931n(2^n) ≤ 1(2^n) + 1088(2^n) + 931n(2^n)

Then, taking g(n) = 2^n, C = 20000 and k = 1, we have that

f(n) ≤ 20000 g(n), whenever n > 1

(note that for the feature dimensions considered in this work, n ≤ 19, so this constant suffices). Therefore, f(n) is O(2^n).

3.4.2. Space complexity

The space complexity of a program (for a given input) is the number of elementary objects that the program needs to store during its execution. This number is computed with respect to the size of the input data. Let m be the dimension of an output pattern y^μ, let n be the dimension of an input pattern x^μ, and let μ be an index such that μ ∈ {1, 2, ..., p}, with p as the cardinality of the set. In order to store the fundamental set of patterns, a matrix is needed; this matrix has dimensions p × (n + m). The mean vector x̄, as well as the r-th learning reinforcement vector, has dimensions 1 × n. Similarly, another matrix is needed to store the set of displaced patterns; this matrix also has dimensions p × (n + m). The resulting associative memory M of the learning phase has dimensions m × n. The number of elementary objects that the proposed algorithm needs to store during its execution is therefore:

TotalObj = p × (n + m) + (1 × n) + (1 × n) + p × (n + m) + (m × n)

The number of bytes required to store a single-precision floating point value can be determined by NumOfBytes = sizeof(float). Similarly, the number of bytes required to store an integer value can be determined by NumOfBytes = sizeof(int). It is noteworthy that in either case NumOfBytes = 4. Since each of the components x_j^μ ∈ R of an input pattern x^μ ∈ R^n can be represented by a single-precision floating point value, the number of bytes required to store an input pattern x^μ ∈ R^n is n × NumOfBytes. Similarly, each of the components y_i^μ ∈ A, A = {0, 1}, of an output pattern y^μ ∈ A^m can be represented by an integer value; consequently, the number of bytes required to store an output pattern y^μ ∈ A^m is m × NumOfBytes. The total number of bytes required to implement the proposed algorithm is:

TotalBytes = NumOfBytes · (p(n + m) + n + n + p(n + m) + mn)
TotalBytes = 8p(n + m) + 8n + 4mn
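As an illustration of the storage estimate, the helper below evaluates TotalBytes for the Hepatitis dataset dimensions used in the time complexity estimate (p = 155, n = 19, and m = 2 classes); the function name is ours, not the paper's.

```python
def ambc_total_bytes(p, n, m, num_of_bytes=4):
    # TotalBytes = NumOfBytes * (2*p*(n + m) + 2*n + m*n)
    return num_of_bytes * (2 * p * (n + m) + 2 * n + m * n)

print(ambc_total_bytes(p=155, n=19, m=2))   # -> 26344 bytes
```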
4. Datasets
This section provides a brief description of the most important characteristics of the datasets that were used during the experimental phase. All of these were taken from the University of California at Irvine machine learning repository [49]. Characteristics of datasets used in the experimental phase are shown in Table 3.
4.1. Haberman survival dataset

This database contains cases from a study that was conducted at the University of Chicago's Billings Hospital on the survival of patients who had undergone surgery for breast cancer. The purpose of the dataset is to identify the survival status of patients who had undergone surgery for breast cancer. The Haberman survival dataset consists of 306 instances belonging to two different classes (225 "the patient survived 5 years or longer" cases and 81 "the patient died within 5 years" cases). Each instance consists of 4 attributes, including the class attribute.
4.2. Liver disorders dataset
This database was created by BUPA Medical Research Ltd and was donated by Richard S. Forsyth. This dataset contains cases from a study that was conducted on liver disorders that might arise from excessive alcohol consumption. Liver disorders dataset consists of 345 instances belonging to two different classes. Each instance consists of 7 attributes, including the class attribute.
4.3. Acute inflammations dataset
This database contains cases from a study that was conducted on the diagnosis of diseases of the urinary system. The dataset consists of 120 instances. Each instance consists of 6 attributes and two decision labels. The main idea of this dataset is to perform the diagnosis of two diseases of the urinary system.
4.4. Pima Indians diabetes dataset
This database was originally owned by the National Institute of Diabetes and Digestive and Kidney Diseases, U.S. The dataset contains cases from a study that was conducted on female patients of Pima Indian heritage who were at least 21 years old. The dataset consists of 768 instances belonging to two different classes (268 "the patient tested positive for diabetes" cases and 500 "the patient tested negative for diabetes" cases). Each instance consists of 9 attributes, including the class attribute.
4.5. Breast cancer dataset
This database was obtained from the University of Wisconsin Hospitals, Madison, from Dr. William H. Wolberg, and was donated by Olvi Mangasarian. The dataset contains periodically collected samples of clinical cases. The breast cancer dataset consists of 699 instances belonging to two different classes (458 "benign" cases and 241 "malignant" cases). Each instance consists of 10 attributes, including the class attribute.
4.6. Heart disease dataset
This database comes from the Cleveland Clinic Foundation and was supplied by Robert Detrano, M.D., Ph.D. of the V.A. Medical Center, Long Beach, CA. The purpose of the dataset is to predict the presence or absence of heart disease given the results of various medical tests carried out on a patient. This dataset consists of 270 instances belonging to two different classes: presence and absence (of heart-disease). Each instance consists of 14 attributes, including the class attribute.
4.7. Hepatitis disease dataset
This dataset was donated by the Jozef Stefan Institute, former Yugoslavia, now Slovenia. The purpose of the dataset is to predict the presence or absence of hepatitis disease in a patient. Hepatitis disease dataset consists of 155 instances belonging to two different classes (32 “die” cases, 123 “live” cases). Each instance consists of 20 attributes, 13 binary, 6 attributes with discrete values and a class label.
5. Machine Learning algorithms
This section provides a brief description of each of the algorithms that were used during the experimental phase. It has to be mentioned that, although WEKA 3: Data Mining Software in Java [51] has more than seventy well known algorithms implemented, only the twenty best-performing algorithms were considered for comparison purposes. Further details on the implementation of these algorithms can be found in the following references [52,53].
5.1. AdaBoostM1
The AdaBoost.M1 algorithm, proposed by Yoav Freund and Robert E. Schapire [54], obtains a single composite classifier constructed through the combination of various classifiers produced by repeatedly running a given "weak" learning algorithm on various distributions over the training data. The "weak" learning algorithm is executed for T rounds in order to obtain T "weak" hypotheses; finally, the booster combines the T "weak" hypotheses into a single final hypothesis.
5.2. Bagging
Bagging predictors method [55] works by generating various versions of a predictor and using these to obtain an amalgamated predictor. In order to predict a class label, each predictor casts a vote and a plurality voting scheme is applied. Similarly, when predicting a numeric class value, the multiple versions of a predictor are averaged.
5.3. BayesNet
Bayesian networks are alternative ways of representing a conditional probability distribution by means of directed acyclic graphs (DAGs). In this graphical model, each node represents a random variable and the arrow connecting a parent node with a child node indicates that there is a relationship between them [56]. This relationship is calculated in terms of conditional probability among variables of interest.
5.4. Dagging
This meta classifier, proposed by Ting and Witten [57], creates a number of disjoint, stratified folds out of the data and feeds each chunk of data to a copy of the supplied base classifier. Predictions are made via majority vote, since all the generated base classifiers are put into the Vote meta classifier.
5.5. DecisionTable
This algorithm builds a simple classifier based on a decision table with a default rule mapping to the majority class. This representation called Decision Table Majority (DTM) [58] has two components: a schema which is a set of features that are included in the table and a body consisting of labeled instances from the space defined by the features in the schema.
5.6. DTNB
DTNB, proposed by Hall and Frank [59], builds a decision table/naive Bayes hybrid classifier. The method is based on a simple Bayesian network in which the decision table (DT) represents a conditional probability table. At each point in the search, the algorithm evaluates the merit of dividing the attributes into two disjoint subsets: one for the decision table, the other for naive Bayes.
5.7. FT

This algorithm focuses on the construction of Functional Trees, which are classification trees that can have logistic regression functions at the inner nodes and/or leaves [60]. The effects of using combinations of attributes at decision nodes, leaf nodes, or both were studied by Gama [61].

5.8. LMT

Logistic Model Trees (LMTs) are based on two basic approaches: tree induction and logistic regression [60]. LMTs are classification trees with logistic regression functions at the leaves, which can deal with binary and multi-class target variables, numeric and nominal attributes, and missing values [62].

5.9. Logistic

This algorithm focuses on building and using a multinomial logistic regression model with a ridge estimator. le Cessie and van Houwelingen [63] showed how ridge estimators can be used in logistic regression to improve the parameter estimates and to diminish the error made by further predictions.

5.10. MultiClassClassifier

MultiClassClassifier, proposed by Eibe Frank, Len Trigg and Richard Kirkby, builds a metaclassifier for handling multi-class datasets with 2-class classifiers. This classifier is also capable of applying error correcting output codes for increased accuracy.

5.11. NaiveBayes

This algorithm is based on two important simplifying assumptions. NaiveBayes assumes that the predictive attributes are conditionally independent given the class, and it posits that no hidden or latent attributes influence the prediction process [64]. Numeric estimator precision values are chosen based on an analysis of the training data.

5.12. NaiveBayesSimple

Class for building and using a simple Naive Bayes classifier; numeric attributes are modeled by a normal distribution. For more information, see [65].

5.13. NaiveBayesUpdateable

Class for a Naive Bayes classifier using estimator classes. This is the updateable version of NaiveBayes. For more information on Naive Bayes classifiers, see [64].

5.14. RandomCommittee

Class for building an ensemble of randomizable base classifiers. Each base classifier is built using a different random number seed (but based on the same data). The final prediction is a straight average of the predictions generated by the individual base classifiers.

5.15. RandomForest

A random forest is a classifier consisting of a collection of tree-structured classifiers such that each tree depends on the values of a random vector sampled independently and with the same distribution for all trees in the forest. For more information on Random Forest classifiers, see [66].

5.16. RandomSubSpace

This method constructs a decision tree based classifier that maintains the highest accuracy on training data and improves generalization accuracy as it grows in complexity. The classifier consists of multiple trees constructed systematically by pseudorandomly selecting subsets of components of the feature vector, that is, trees constructed in randomly chosen subspaces. For more information on the Random Subspace Method for constructing decision forests, see [67].

5.17. RBFNetwork

Class that implements a normalized Gaussian radial basis function network. It uses the k-means clustering algorithm to provide the basis functions and learns either a logistic regression (discrete class problems) or a linear regression (numeric class problems). Symmetric multivariate Gaussians are fit to the data from each cluster. If the class is nominal it uses the given number of clusters per class. It standardizes all numeric attributes to zero mean and unit variance. For more information on Radial Basis Functions, see [68].

5.18. RotationForest

Rotation Forest, proposed by Rodríguez et al. [69], is a method for generating classifier ensembles based on feature extraction. To create the training data for a base classifier, the feature set is randomly split into K subsets (K is a parameter of the algorithm) and Principal Component Analysis (PCA) is applied to each subset. All principal components are retained in order to preserve the variability information in the data. Thus, K axis rotations take place to form the new features for a base classifier.

5.19. SimpleLogistic

Classifier for building linear logistic regression models. LogitBoost with simple regression functions as base learners is used for fitting the logistic models. The optimal number of LogitBoost iterations to perform is cross-validated, which leads to automatic attribute selection. For more information, see [62,60].

5.20. SMO

Class that implements Platt's Sequential Minimal Optimization algorithm for training a Support Vector Machine. For more information, see [70–72].
6. Algorithm comparisons
One of the main objectives of this study is to make a consistent comparison between the classification performance achieved by our proposal and the classification performance achieved by some well known algorithms on different pattern classification problems in the medical field. Two fundamental questions naturally arise. The first is: which test is appropriate for comparing the differences between algorithms? The second is: how should classification performance (error rate) be compared? In order to answer the first question, Mitchell [73] presented an approach to determine the level of significance with which one algorithm outperforms another. The classification accuracies and standard deviations are considered to differ from one another significantly if the result of a t-test is less than or equal to 0.05. Following this approach, good outcomes should have high accuracies and low standard deviations. In order to answer the second question, there are several approaches to making such comparisons. Clark [74] compared the accuracies and standard deviations of each pair of algorithms, averaged over all the experimental datasets. The classification accuracies to be averaged are the average accuracies of each algorithm over five runs. This approach can be strongly criticized, since its effect is to ignore the underlying distribution of each dataset [75]. Murthy et al. [76] compared the number of datasets on which an algorithm achieves higher classification accuracy averaged over five runs. An algorithm is considered better than its paired algorithm if it achieves higher classification accuracy on a greater number of datasets. Kohavi [77] reviewed accuracy estimation methods and compared cross-validation and bootstrap. Experimental results showed that bootstrap has low variance but extremely large bias on some problems; as a consequence, stratified 10-fold cross-validation is recommended for model selection. Kohavi and John [78] pointed out that when comparing a pair of algorithms, it is critical to understand that when 10-fold cross-validation is used for classification accuracy evaluation, this cross-validation is an independent outer loop. They also pointed out that some researchers have reported accuracy results from the inner cross-validation loop; such results are optimistically biased and are a subtle means of training on the test set. In order to make a consistent comparison between the classification performance achieved by our proposal and the classification performance achieved by some well known algorithms on different pattern classification problems in the medical field, we followed the approaches of Kohavi [77] and Kohavi and John [78].
7. Experimental phase
Throughout the experimental phase, seven datasets were used as test sets to estimate the classification performance of each of the compared algorithms. These databases were taken from the UCI machine learning repository [49], from which full documentation for all datasets can be obtained. The main characteristics of these datasets were expounded in Section 4. AMBC performance was compared against the performance achieved by the twenty best-performing algorithms of the seventy-six available in WEKA 3: Data Mining Software in Java [51]. WEKA is open source software issued under the GNU General Public License, freely available on the Web [52]. Further information on each of the algorithms that were used during the experimental phase can be found in [53]. All experiments were conducted using a personal computer with an Intel Core 2 Duo Processor E6700 (4M cache, 2.66 GHz, 1066 MHz FSB) running the Windows XP Professional operating system with 2048 MB of RAM. In order to carry out such a comparison, we applied the same conditions and validation schemes in each experiment. The classification accuracy of each of the compared algorithms was calculated using a 50-50 training-test split, a 70-30 training-test split, 10-fold cross-validation and leave-one-out cross-validation.
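For reference, the evaluation protocol (expressions (11) and (12) combined with k-fold cross-validation) can be sketched as follows. This is a generic, plain (non-stratified) NumPy illustration, not the authors' original experimental code, and the function names are ours.

```python
import numpy as np

def kfold_accuracy(X, labels, train_and_classify, k=10, seed=0):
    """Estimate accuracy by k-fold cross-validation (expressions (11)-(12)).

    train_and_classify(X_train, y_train, X_test) must return predicted labels
    as a NumPy array.
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    folds = np.array_split(idx, k)
    hits, total = 0, 0
    for f in range(k):
        test = folds[f]
        train = np.concatenate([folds[g] for g in range(k) if g != f])
        pred = train_and_classify(X[train], labels[train], X[test])
        hits += int(np.sum(pred == labels[test]))   # assess(x^omega), expression (12)
        total += len(test)
    return hits / total                             # accuracy(T), expression (11)
```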
8. Results and discussion
In this section we analyze the classification accuracy results achieved by each one of the compared algorithms in seven different pattern classification problems in the medical field. Although WEKA 3: Data Mining Software in Java [51] has more than seventy well known algorithms implemented, only the twenty best-performing algorithms were considered for comparison purposes. According to the type of learning scheme, each of these can be grouped in one of the following types of classifiers: Bayesian classifiers, Functions based classifiers, Meta classifiers, Rules based classifiers and Decision Trees classifiers. The twenty best-performing algorithms are as follows: • Four algorithms based on the Bayesian approach (BayesNet [56], NaiveBayes [64], NaiveBayesSimple [65] and NaiveBayesUpdateable [64]). • Four functions based classifiers (Logistic [63], RBFNetwork [68], SimpleLogistic [62] and SMO [70]). • Seven meta classifiers (AdaBoostM1 [54], Bagging [55], Dagging [57], MultiClassClassifier [52,53], RandomCommittee [52,53], RandomSubSpace [67], RotationForest [69]).
• Two rules based classifiers (DecisionTable [58] and DTNB [59]).
• Three decision trees classifiers (FT [60], LMT [60], RandomForest [66]).
Tables 4–7 show the classification accuracy achieved by each of the compared algorithms in seven different pattern classification problems in the medical field, using a 50-50 training-test split, a 70-30 training-test split, 10-fold cross-validation and leave-one-out cross-validation, respectively. For each compared algorithm, the classification accuracy averaged over all datasets is given at the end of each row. For each dataset, the highest classification accuracy is highlighted with boldface. As shown in Tables 4–7, there is no particular method that surpasses all other algorithms in all sorts of problems. This should not be surprising, since Wolpert and Macready [79] demonstrated that what an algorithm gains in performance on one class of problems is necessarily offset by its performance on the remaining problems.

Table 4 shows the classification accuracy achieved by each of the compared algorithms using a 50-50 training-test split. Classification results are as follows: two of the four functions based classifiers (SimpleLogistic [62] and SMO [70]) achieved the best performance in two of the seven pattern classification problems. Similarly, one of the three decision trees classifiers (RandomForest [66]) achieved the best performance in two of the seven datasets. It is worth noting that our proposal achieved the best performance in three of the seven pattern classification problems in the medical field, using a 50-50 training-test split.

Table 5 shows the classification accuracy achieved by each of the compared algorithms using a 70-30 training-test split. Classification results are as follows: one of the seven meta classifiers (Bagging [55]) achieved the best performance in two of the seven pattern classification problems. Two of the four functions based classifiers (SimpleLogistic [62] and SMO [70]) achieved the best performance in two of the seven pattern classification problems. Similarly, one of the three decision trees classifiers (LMT [60]) achieved the best performance in two of the seven datasets. It is worth noting that our proposal achieved the best performance in three of the seven pattern classification problems in the medical field, using a 70-30 training-test split.

Table 6 shows the classification accuracy achieved by each of the compared algorithms using 10-fold cross-validation. Classification results are as follows: two of the seven meta classifiers (Bagging [55] and RotationForest [69]) achieved the best performance in two of the seven pattern classification problems. One of the three decision trees classifiers (LMT [60]) achieved the best performance in two of the seven datasets. Similarly, two of the four functions based classifiers (RBFNetwork [68] and SimpleLogistic [62]) achieved the best performance in two of the seven datasets. It is worth noting that our proposal achieved the best performance in four of the seven pattern classification problems in the medical field, using 10-fold cross-validation.

Table 7 shows the classification accuracy achieved by each of the compared algorithms using leave-one-out cross-validation. Classification results are as follows: two of the seven meta classifiers (AdaBoostM1 [54] and MultiClassClassifier [52,53]) achieved the best performance in two of the seven pattern classification problems. One of the four algorithms based on the Bayesian approach (BayesNet [56]) achieved the best performance in two of the seven datasets. Similarly, two of the three decision trees classifiers (FT [60] and LMT [60]) achieved the best performance in two of the seven pattern classification problems. Two of the four functions based classifiers (Logistic [63] and SimpleLogistic [62]) achieved the best performance in two of the seven datasets. It is worth noting that our proposal achieved the best performance in three of the seven pattern classification problems in the medical field, using leave-one-out cross-validation.

In summary, throughout the experimental phase we used different validation techniques, namely a 50-50 training-test split, a 70-30 training-test split, 10-fold cross-validation and leave-one-out cross-validation, to show how accurately the proposed model can be expected to perform in practice. After carrying out the experiments, and as a consequence of the analysis of the results shown in Tables 4–7, we can say that the proposed algorithm performs competitively against the twenty best-performing algorithms of the seventy-six available in WEKA 3: Data Mining Software in Java [51].
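For clarity, the following sketch (Python, illustrative only) shows how the two summary quantities discussed above are obtained from a results matrix: the accuracy of each algorithm averaged over all datasets, and the best-performing algorithm per dataset. The values used are a small subset of Table 4, so the averages printed here differ from the table's Average column, which is computed over all seven datasets.

```python
# Illustrative sketch: summarizing a results table such as Tables 4-7.
# Rows are algorithms, columns are datasets; values are taken from a subset
# of Table 4 (50-50 training-test split).
import numpy as np

algorithms = ["AdaBoostM1", "SMO", "AMBC"]
datasets = ["Haberman", "Liver", "Hepatitis"]
accuracy = np.array([[72.87, 66.95, 62.58],    # AdaBoostM1
                     [73.52, 58.55, 67.09],    # SMO
                     [74.83, 65.40, 83.76]])   # AMBC (our proposal)

# Accuracy averaged over the datasets considered (only three here)
row_means = accuracy.mean(axis=1)
for name, mean in zip(algorithms, row_means):
    print(f"{name:12s} average = {mean:.2f}%")

# Best algorithm per dataset (the boldfaced value in the typeset tables)
best = accuracy.argmax(axis=0)
for j, ds in enumerate(datasets):
    print(f"{ds}: best is {algorithms[best[j]]} ({accuracy[best[j], j]:.2f}%)")
```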
9.
Summary
In this paper, a novel approach to perform pattern classification tasks is presented. This model is called Associative Memory based Classifier (AMBC). Throughout the experimental phase, the proposed algorithm is applied to help diagnose diseases; in particular, it is applied to seven different diagnosis problems in the medical field. The performance of the proposed model is validated by comparing the classification accuracy of AMBC against the performance achieved by the twenty best-performing algorithms of the seventy-six available in WEKA 3: Data Mining Software in Java [51].

An important point to note is that, even when it seems that the calculation of repetitive matrices could be an impediment to addressing larger problems, the proper use of tools developed for matrix operations and structured data, such as MATLAB, allows arrays of considerable size to be manipulated. For instance, when MATLAB is used on a computer running a 64-bit operating system, the total workspace size and the largest matrix size are each below 8 TB, and the number of elements in the largest real double array, as well as in the largest int8 array, is 2^48 − 1 (about 2.8e14) [80].

It is also necessary to note that, even though the most demanding phase of this algorithm is the learning reinforcement phase, Figs. 4–6 show that once the learning phase is completed, the learning reinforcement phase can be carried out fully in parallel. This means that the computation of the r-th learning reinforcement vector of size n can be divided among as many cores or nodes as are available.
Table 4 – Classification accuracy using 50-50 training-test split. The first twenty methods are included in WEKA 3: Data Mining Software in Java [51].

Algorithm                  Haberman   Liver  Inflammation  Diabetes  Breast   Heart  Hepatitis  Average
 1. AdaBoostM1                72.87   66.95        100.00     73.95   95.02   83.33      62.58    79.24
 2. Bagging                   72.22   68.11         95.00     75.00   96.77   82.96      64.51    79.22
 3. BayesNet                  73.20   57.97        100.00     75.00   97.65   82.22      67.09    79.02
 4. Dagging                   73.52   57.68        100.00     71.35   96.48   83.33      66.45    78.40
 5. DecisionTable             73.20   57.97        100.00     74.73   94.72   81.48      61.93    77.72
 6. DTNB                      73.20   57.97        100.00     74.73   97.51   84.44      63.87    78.81
 7. FT                        71.56   68.40        100.00     75.91   97.36   82.96      62.58    79.82
 8. LMT                       73.52   66.37        100.00     76.56   96.04   81.48      61.29    79.32
 9. Logistic                  73.20   64.92        100.00     77.08   96.77   82.59      57.41    78.85
10. MultiClassClassifier      73.20   64.92        100.00     77.08   96.77   82.59      57.41    78.85
11. NaiveBayes                74.50   52.17         96.66     75.91   96.19   84.81      69.67    78.56
12. NaiveBayesSimple          74.18   53.91         96.66     75.78   96.04   83.70      69.67    78.56
13. NaiveBayesUpdateable      74.50   52.17         96.66     75.91   96.19   84.81      69.67    78.56
14. RandomCommittee           62.74   65.50        100.00     72.91   96.63   80.00      65.80    77.65
15. RandomForest              67.32   70.14        100.00     74.34   96.92   79.25      63.87    78.83
16. RandomSubSpace            73.52   66.95         98.33     72.52   95.46   77.03      60.00    77.69
17. RBFNetwork                72.22   61.73        100.00     75.52   96.48   81.48      65.80    79.03
18. RotationForest            73.85   68.98        100.00     76.82   97.21   80.74      62.58    80.02
19. SimpleLogistic            74.18   66.37        100.00     77.86   96.63   81.48      61.29    79.68
20. SMO                       73.52   58.55        100.00     77.86   96.92   83.33      67.09    79.61
    AMBC (our proposal)       74.83   65.40         91.66     70.57   97.80   83.33      83.76    81.05
Experimental results have shown that AMBC achieved the best performance in three of the seven pattern classification problems in the medical field using a 50-50 training-test split, a 70-30 training-test split and leave-one-out cross-validation, as shown in Tables 4, 5 and 7. Likewise, AMBC achieved the best performance in four of the seven pattern classification problems in the medical field using 10-fold cross-validation, as shown in Table 6.
It should be noted that our proposal achieved the best classification accuracy averaged over all datasets. The proposed approach has proven to be an effective alternative for performing pattern recognition tasks in the medical field. The results presented in this paper demonstrate the potential of associative memories for medical decision support systems.
Table 5 – Classification accuracy using 70-30 training-test split. The first twenty methods are included in WEKA 3: Data Mining Software in Java [51].

Algorithm                  Haberman   Liver  Inflammation  Diabetes  Breast   Heart  Hepatitis  Average
 1. AdaBoostM1                73.52   68.98        100.00     74.08   95.16   82.96      62.58    79.61
 2. Bagging                   73.20   72.17        100.00     75.13   95.75   81.11      64.51    80.26
 3. BayesNet                  72.22   58.55        100.00     74.73   97.36   82.96      69.03    79.26
 4. Dagging                   73.85   58.26        100.00     73.95   96.63   83.33      65.16    78.74
 5. DecisionTable             72.22   58.55        100.00     74.86   93.99   80.37      69.67    78.52
 6. DTNB                      72.22   58.55        100.00     75.52   97.07   82.59      65.80    78.82
 7. FT                        73.20   69.85        100.00     77.21   96.92   82.22      57.41    79.54
 8. LMT                       73.85   71.59        100.00     76.69   96.63   84.44      63.87    81.01
 9. Logistic                  74.18   66.37        100.00     76.69   96.48   83.33      61.29    79.76
10. MultiClassClassifier      74.18   66.37        100.00     76.69   96.48   83.33      61.29    79.76
11. NaiveBayes                75.49   56.23         95.83     75.78   96.33   83.33      72.25    79.32
12. NaiveBayesSimple          75.49   55.07         95.83     75.78   96.19   84.07      70.32    78.96
13. NaiveBayesUpdateable      75.49   56.23         95.83     75.78   96.33   83.33      72.25    79.32
14. RandomCommittee           64.70   68.40        100.00     73.30   95.60   80.37      62.58    77.85
15. RandomForest              67.64   66.37        100.00     73.56   96.33   78.88      60.64    77.63
16. RandomSubSpace            74.50   67.53        100.00     73.43   95.90   78.14      61.93    78.78
17. RBFNetwork                73.52   61.73        100.00     74.34   96.33   82.59      71.61    80.02
18. RotationForest            74.18   70.72        100.00     75.39   97.36   83.70      63.22    80.65
19. SimpleLogistic            73.85   67.82        100.00     76.69   96.63   84.44      65.16    80.65
20. SMO                       73.52   57.68        100.00     77.47   96.48   84.07      64.51    79.10
    AMBC (our proposal)       77.30   59.593        94.16     70.18   97.64   83.33      84.86    81.01
Table 6 – Classification accuracy using 10-fold cross-validation. The first twenty methods are included in WEKA 3: Data Mining Software in Java [51].

Algorithm                  Haberman   Liver  Inflammation  Diabetes  Breast   Heart  Hepatitis  Average
 1. AdaBoostM1                73.20   66.66        100.00     74.34   95.60   82.22      67.09    79.87
 2. Bagging                   73.20   73.04        100.00     74.60   96.19   83.70      69.67    81.49
 3. BayesNet                  72.54   56.81        100.00     74.34   97.21   82.22      69.03    78.88
 4. Dagging                   73.52   57.97        100.00     74.08   96.77   82.22      66.45    78.72
 5. DecisionTable             72.54   57.97        100.00     71.22   95.75   83.33      72.25    79.01
 6. DTNB                      72.54   57.97        100.00     73.82   97.51   82.59      68.38    78.97
 7. FT                        72.87   70.43        100.00     77.34   96.92   82.22      69.03    81.26
 8. LMT                       73.85   69.85        100.00     77.47   96.48   82.22      67.09    80.99
 9. Logistic                  74.50   68.69        100.00     77.21   96.63   83.70      68.38    81.30
10. MultiClassClassifier      74.50   68.69        100.00     77.21   96.63   83.70      68.38    81.30
11. NaiveBayes                74.50   54.20         95.83     76.30   96.19   83.33      71.61    78.85
12. NaiveBayesSimple          73.85   55.07         95.83     76.30   96.33   82.96      70.96    78.76
13. NaiveBayesUpdateable      74.50   54.20         95.83     76.30   96.19   83.33      71.61    78.85
14. RandomCommittee           64.37   68.11        100.00     75.26   96.48   82.22      63.22    78.52
15. RandomForest              67.97   70.72        100.00     72.39   97.07   83.70      65.16    79.57
16. RandomSubSpace            72.22   64.05        100.00     75.26   95.54   82.22      67.74    79.60
17. RBFNetwork                72.87   66.08        100.00     75.39   95.90   84.07      69.67    80.57
18. RotationForest            73.20   73.04        100.00     76.82   97.21   82.59      66.45    81.33
19. SimpleLogistic            73.85   71.01        100.00     77.47   96.63   82.22      66.45    81.09
20. SMO                       73.52   57.97        100.00     77.34   96.92   83.33      72.25    80.19
    AMBC (our proposal)       76.33   65.50        100.00     70.39   97.80   83.70      85.16    82.70
Table 7 – Classification accuracy using leave-one-out cross-validation. The first twenty methods are included in WEKA 3: Data Mining Software in Java [51].

Algorithm                  Haberman   Liver  Inflammation  Diabetes  Breast   Heart  Hepatitis  Average
 1. AdaBoostM1                75.49   63.76        100.00     75.65   95.60   81.85      60.00    78.90
 2. Bagging                   72.87   70.43        100.00     74.73   95.90   81.11      68.38    80.49
 3. BayesNet                  74.18   63.18        100.00     75.78   97.36   83.70      70.96    80.74
 4. Dagging                   73.52   57.68        100.00     74.21   96.92   82.96      63.87    78.45
 5. DecisionTable             68.95   63.18        100.00     74.86   95.75   82.96      73.54    79.89
 6. DTNB                      68.95   63.18        100.00     74.86   97.07   80.74      69.67    79.21
 7. FT                        73.20   71.59        100.00     76.69   97.21   80.37      69.03    81.15
 8. LMT                       72.87   69.85        100.00     77.08   96.19   83.70      66.45    80.88
 9. Logistic                  74.18   68.40        100.00     77.73   96.77   82.96      69.67    81.39
10. MultiClassClassifier      74.18   68.40        100.00     77.73   96.77   82.96      69.67    81.39
11. NaiveBayes                75.49   55.94         95.83     75.65   96.19   82.96      71.61    79.09
12. NaiveBayesSimple          74.50   55.36         95.83     75.26   96.33   83.70      70.96    78.85
13. NaiveBayesUpdateable      75.49   55.94         95.83     75.65   96.19   82.96      71.61    79.09
14. RandomCommittee           64.70   68.11        100.00     75.39   96.63   82.22      62.58    78.52
15. RandomForest              66.66   67.82        100.00     74.60   96.48   82.59      60.64    78.40
16. RandomSubSpace            71.56   68.69         98.33     73.30   95.90   81.11      67.09    79.43
17. RBFNetwork                74.50   64.63        100.00     73.69   96.19   81.85      71.61    80.35
18. RotationForest            72.87   70.72        100.00     76.56   97.51   79.62      64.51    80.25
19. SimpleLogistic            73.52   69.27        100.00     77.34   96.48   83.70      65.16    80.78
20. SMO                       73.20   57.97        100.00     76.82   97.07   82.96      69.67    79.67
    AMBC (our proposal)       74.18   60.57        100.00     70.70   97.80   83.33      85.16    81.68
9.1.
Concluding remark
Here are some relevant points that are useful to highlight the differences between the current proposal, named Associative Memory based Classifier, and some previous models proposed by the Alfa-Beta research group.

First, previous associative models work with column vectors with binary components, while the current proposal works with vectors with real components, which means significant savings in the encoding of information.

Second, in order to be robust to noise, previous associative models need to encode the information using the Johnson-Möbius code [81], whereas the current proposal does not require any special coding.

Third, previous associative models do not have the ability to identify the relevant features that allow classification performance to be increased; on the contrary, the current proposal increases classification performance by means of the learning reinforcement phase.

But perhaps the crucial point is that the current proposal is completely parallelizable and can therefore take advantage of technological advances such as multi-core computers and parallel computing, among others.
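As a minimal illustration of this point (not the authors' implementation: the actual AMBC reinforcement rule is defined earlier in the paper, and a hypothetical placeholder is used here in its place), the sketch below divides the computation of one reinforcement vector of size n among the available cores with Python's multiprocessing module; each component is computed independently from one column of the learning matrix.

```python
# Illustrative sketch only: `reinforce_component` is a hypothetical placeholder
# standing in for the AMBC componentwise reinforcement rule, used to show how the
# n components of the r-th reinforcement vector can be computed on several cores.
import numpy as np
from multiprocessing import Pool

def reinforce_component(args):
    j, column = args
    # Placeholder componentwise rule; the real rule comes from the AMBC model.
    return j, float(np.max(column) - np.min(column))

def reinforcement_vector(M):
    """Compute one reinforcement vector of size n = M.shape[1] in parallel."""
    n = M.shape[1]
    with Pool() as pool:                       # one worker process per available core
        results = pool.map(reinforce_component, ((j, M[:, j]) for j in range(n)))
    v = np.empty(n)
    for j, value in results:                   # reassemble the vector in order
        v[j] = value
    return v

if __name__ == "__main__":
    M = np.random.rand(100, 16)                # toy learning matrix
    print(reinforcement_vector(M))
```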
Acknowledgments

The authors of the present paper would like to thank the following institutions for their financial support of this work: Science and Technology National Council of Mexico (CONACyT Grant No. 174952), SNI, National Polytechnic Institute of Mexico (COFAA, SIP, ESCOM, and CIC) and ICyTDF (Grant No. PIUTE10-77 and PICSO10-85).
References
[1] E.A. Feigenbaum, B.G. Buchanan, J. Lederberg, On generality and problem solving: a case study using the dendral program, Tech. Rep. CS-TR-70-176, Stanford University, Department of Computer Science, Stanford, CA, USA (1970).
[2] B.G. Buchanan, E.A. Feigenbaum, J. Lederberg, A heuristic programming study of theory formation in science, in: IJCAI, 1971, pp. 40–50.
[3] E.A. Feigenbaum, The art of artificial intelligence: themes and case studies of knowledge engineering, in: IJCAI, 1977, pp. 1014–1029.
[4] K. Steinbuch, Die lernmatrix, Kybernetik 1 (1) (1961) 36–45.
[5] K. Steinbuch, H. Frank, Nichtdigitale lernmatrizen als perzeptoren, Kybernetik 1 (3) (1961) 117–124.
[6] K. Steinbuch, Adaptive networks using learning matrices, Kybernetik 2 (4) (1964) 148–152.
[7] H. Kazmierczak, K. Steinbuch, Adaptive systems in pattern recognition, IEEE Transactions on Electronic Computers EC-12 (6) (1963) 822–835.
[8] K. Steinbuch, B. Widrow, A critical comparison of two kinds of adaptive classification networks, IEEE Transactions on Electronic Computers EC-14 (5) (1965) 737–740.
[9] T. Kohonen, Correlation matrix memories, IEEE Transactions on Computers C-21 (4) (1972) 353–359.
[10] K. Steinbuch, U.A.W. Piske, Learning matrices and their applications, IEEE Transactions on Electronic Computers EC-12 (6) (1963) 846–862.
[11] J.A. Anderson, A memory storage model utilizing spatial correlation functions, Kybernetik 5 (3) (1968) 113–119.
[12] J.A. Anderson, A simple neural network generating an interactive memory, Mathematical Biosciences 14 (1972) 197–220.
[13] D.J. Willshaw, O.P. Buneman, H.C. Longuet-Higgins, Non-holographic associative memory, Nature 222 (5197) (1969) 960–962.
[14] K. Nakano, Associatron—a model of associative memory, IEEE Transactions on Systems, Man, and Cybernetics SMC-2 (3) (1972) 380–388.
[15] S.-I. Amari, Pattern learning by self-organizing nets of threshold elements, System and Computing Controls 3 (4) (1972) 15–22.
[16] J.J. Hopfield, Neural networks and physical systems with emergent collective computational abilities, Proceedings of the National Academy of Sciences 79 (1982) 2554–2558.
[17] J.J. Hopfield, Neurons with graded response have collective computational properties like those of two-state neurons, Proceedings of the National Academy of Sciences of the United States of America 81 (1984) 3088–3092.
[18] I.G.L. Personnaz, G. Dreyfus, Information storage and retrieval in spin glass like neural networks, Journal of Physical Letters 46 (1985) L359–L365.
[19] A. Liwanag, S. Becker, Improving associative memory capacity: one-shot learning in multilayer Hopfield networks, in: Proceedings of the 19th Annual Conference of the Cognitive Science Society, 1997, pp. 442–447.
[20] F.T. Sommer, G. Palm, Improved bidirectional retrieval of sparse patterns stored by Hebbian learning, Neural Networks 12 (1999) 281–297.
[21] R.R.M.A.H. Muhamad Amin, A. Khan, Analysis of pattern recognition algorithms using associative memory approach: a comparative study between the Hopfield network and Distributed Hierarchical Graph Neuron (DHGN), in: IEEE 8th International Conference on Computer and Information Technology Workshops, 2008, pp. 153–158.
[22] A.I. Khan, A.H.M. Amin, One shot associative memory method for distorted pattern recognition, in: AI 2007: Advances in Artificial Intelligence, vol. 4830 of Lecture Notes in Computer Science, 2007, pp. 705–709.
[23] A.H.M. Amin, A.I. Khan, Parallel pattern recognition using a single-cycle learning approach within wireless sensor networks, in: Ninth International Conference on Parallel and Distributed Computing, Applications and Technologies, 2008, pp. 305–308.
[24] A.H.M.A. Amir, H. Basirat, A.I. Khan, Under the cloud: a novel content addressable data framework for cloud parallelization to create and virtualize new breeds of cloud applications, in: Ninth IEEE International Symposium on Network Computing and Applications, 2010.
[25] A.H.M. Amin, A.I. Khan, A divide-and-distribute approach to single-cycle learning HGN network for pattern recognition, in: 11th International Conference on Control, Automation, Robotics and Vision, 2010.
[26] A.H.M. Amin, A.I. Khan, Distributed multi-feature recognition scheme for greyscale images, Neural Processing Letters 33 (2011) 45–59.
[27] G. Ritter, P. Sussner, J. Diaz-de Leon, Morphological associative memories, IEEE Transactions on Neural Networks 9 (2) (1998) 281–293.
[28] S. Peter, A fuzzy autoassociative morphological memory, in: Proceedings of the International Joint Conference on Neural Networks, 2003, pp. 326–331.
[29] S.T. Wang, H.J. Lu, On new fuzzy morphological associative memories, IEEE Transactions on Fuzzy Systems 12 (3) (2004) 316–323.
[30] P. Sussner, New results on binary auto- and heteroassociative morphological memories, in: Proceedings of International Joint Conference on Neural Networks, 2005, pp. 1199–1204.
[31] G. Urcid, G.X. Ritter, Noise masking for pattern recall using a single lattice matrix associative memory, Ch. 5, pp. 81–100, in: Studies in Computational Intelligence, no. 67, Springer-Verlag, Berlin Heidelberg, 2007.
[32] P. Sussner, M.E. Valle, Morphological and certain fuzzy morphological associative memories for classification and prediction, Ch. 8, pp. 149–171, in: Studies in Computational Intelligence, vol. 67, Springer-Verlag, Berlin Heidelberg, 2007.
[33] M. Wang, R. Chu, Economizing enhanced fuzzy morphological associative memory, in: Proceedings of the Seventh International Conference on Machine Learning and Cybernetics, 2008, pp. 495–500.
[34] T. Saeki, T. Miki, Effectiveness of scale free network to the performance improvement of a morphological associative memory without a kernel image, in: Neural Information Processing, vol. 4984 of Lecture Notes in Computer Science, 2008, pp. 358–364.
[35] M.E. Valle, P. Sussner, A general framework for fuzzy morphological associative memories, Fuzzy Sets and Systems 159 (2008) 747–768.
[36] M.E. Valle, Permutation-based finite implicative fuzzy associative memories, Information Sciences 180 (2010) 4136–4152.
[37] Y.S. Boutalis, A new method for constructing kernel vectors in morphological associative memories of binary patterns, Computer Science and Information Systems 8 (2011) 141–166.
[38] M.E. Valle, P. Sussner, Storage and recall capabilities of fuzzy morphological associative memories with adjunction-based learning, Neural Networks 24 (2011) 75–90.
[39] J. Serra, Image Analysis and Mathematical Morphology, vol. 2, Academic Press, London, 1992.
[40] H. Simon, Neural Networks—A Comprehensive Foundation, Prentice Hall International, Inc., 1999.
[41] B. Kosko, Bidirectional associative memories, IEEE Transactions on Systems, Man, and Cybernetics 18 (1980) 49–60.
[42] R. Bogacz, Knowledge database implemented as a neural networks, in: Proceedings of 2nd Conference on Neural Networks and their Application, 1996, pp. 66–71.
[43] G. Mathai, B. Upadhyaya, Performance analysis and application of the bidirectional associative memory to industrial spectral signatures, in: International Joint Conference on Neural Networks, 1989.
[44] E. Guzmán, O.B. Pogrebnyak, C. Yáñez, J.A. Moreno, Image compression algorithm based on morphological associative memories, in: CIARP, 2006, pp. 519–528.
[45] M. Aldape-Pérez, I. Román-Godínez, O. Camacho-Nieto, Thresholded learning matrix for efficient pattern recalling, pp. 445–452, in: CIARP'08: Proceedings of the 13th Ibero-American Congress on Pattern Recognition, Springer-Verlag, Berlin, Heidelberg, 2008.
[46] S. Chartier, R. Lepage, Learning and extracting edges from images by a modified Hopfield neural network, in: Proceedings of the 16th International Conference on Pattern Recognition (ICPR'02), 2002.
[47] M.E. Acevedo-Mosqueda, Alpha–beta bidirectional associative memories (in Spanish), Ph.D. thesis, Center for Computing Research, México (2006).
[48] M.E. Acevedo-Mosqueda, C. Yáñez-Márquez, I. López-Yáñez, Alpha-beta bidirectional associative memories: theory and applications, Neural Processing Letters 26 (1) (2007) 1–40.
[49] A. Asuncion, D. Newman, UCI machine learning repository (2007). URL http://archive.ics.uci.edu/ml/.
[50] K.H. Rosen, Discrete Mathematics and Its Applications, 6th ed., McGraw-Hill, 2007.
[51] M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutemann, I.H. Witten, The WEKA data mining software: an update, SIGKDD Explorations 11 (1) (2009) 10–18.
[52] M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutemann, I.H. Witten, WEKA 3: Data mining software in Java (2010). URL http://www.cs.waikato.ac.nz/ml/weka/.
[53] I.H. Witten, E. Frank, Data Mining: Practical Machine Learning Tools and Techniques, in: Morgan Kaufmann Series in Data Management Systems, 2nd ed., Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 2005.
[54] Y. Freund, R.E. Schapire, Experiments with a new boosting algorithm, in: Thirteenth International Conference on Machine Learning, Morgan Kaufmann, San Francisco, 1996, pp. 148–156.
[55] L. Breiman, Bagging predictors, Machine Learning 24 (2) (1996) 123–140.
[56] N. Christofides, Graph Theory: An Algorithmic Approach (Computer Science and Applied Mathematics), Academic Press, Inc., Orlando, FL, USA, 1975.
[57] K.M. Ting, I.H. Witten, Stacking bagged and dagged models, in: D.H. Fisher (Ed.), Fourteenth International Conference on Machine Learning, Morgan Kaufmann Publishers, San Francisco, CA, 1997, pp. 367–375.
[58] R. Kohavi, The power of decision tables, in: 8th European Conference on Machine Learning, Springer, 1995, pp. 174–189.
[59] M. Hall, E. Frank, Combining Naive Bayes and decision tables, in: Proceedings of the 21st Florida Artificial Intelligence Society Conference (FLAIRS), AAAI Press, 2008, pp. 318–319.
[60] N. Landwehr, M. Hall, E. Frank, Logistic model trees, Machine Learning 59 (1–2) (2005) 161–205.
[61] J. Gama, Functional trees, Machine Learning 55 (3) (2004) 219–250.
[62] M. Sumner, E. Frank, M. Hall, Speeding up logistic model tree induction, in: 9th European Conference on Principles and Practice of Knowledge Discovery in Databases, Springer, 2005, pp. 675–683.
[63] S. le Cessie, J. van Houwelingen, Ridge estimators in logistic regression, Applied Statistics 41 (1) (1992) 191–201.
[64] G.H. John, P. Langley, Estimating continuous distributions in Bayesian classifiers, in: Eleventh Conference on Uncertainty in Artificial Intelligence, San Mateo, Morgan Kaufmann, 1995, pp. 338–345.
[65] R. Duda, P. Hart, Pattern Classification and Scene Analysis, Wiley, New York, 1973.
[66] L. Breiman, Random forests, Machine Learning 45 (1) (2001) 5–32.
[67] T.K. Ho, The random subspace method for constructing decision forests, IEEE Transactions on Pattern Analysis and Machine Intelligence 20 (8) (1998) 832–844.
[68] M.D. Buhmann, Radial Basis Functions: Theory and Implementations (Cambridge Monographs on Applied and Computational Mathematics), Cambridge University Press, 2003.
[69] J.J. Rodriguez, L.I. Kuncheva, C.J. Alonso, Rotation forest: a new classifier ensemble method, IEEE Transactions on Pattern Analysis and Machine Intelligence 28 (10) (2006) 1619–1630.
[70] J.C. Platt, Fast training of support vector machines using sequential minimal optimization, in: B. Schoelkopf, C. Burges, A. Smola (Eds.), Advances in Kernel Methods—Support Vector Learning, MIT Press, 1998.
[71] T. Hastie, R. Tibshirani, Classification by pairwise coupling, in: M.I. Jordan, M.J. Kearns, S.A. Solla (Eds.), Advances in Neural Information Processing Systems, vol. 10, MIT Press, 1998.
[72] S. Keerthi, S. Shevade, C. Bhattacharyya, K. Murthy, Improvements to Platt's SMO algorithm for SVM classifier design, Neural Computation 13 (3) (2001) 637–649.
[73] T.M. Mitchell, Machine Learning, 1st ed., McGraw-Hill, 1997.
[74] P. Clark, R. Boswell, Rule induction with CN2: some recent improvements, pp. 151–163, in: Y. Kodratoff (Ed.), Machine Learning—Proceedings of the Fifth European Conference (EWSL-91), 1991.
[75] P.W. Eklund, A. Hoang, A performance survey of public domain Supervised Machine Learning algorithms, Tech. Rep., Griffith University, School of Information Technology, Parklands Drive, Southport, Queensland 9726, Australia (2002).
[76] S.K. Murthy, S. Kasif, S. Salzberg, A system for induction of oblique decision trees, Journal of Artificial Intelligence Research 2 (1994) 1–32.
[77] R. Kohavi, A study of cross-validation and bootstrap for accuracy estimation and model selection, in: Proceedings of the Fourteenth International Joint Conference on Artificial Intelligence (IJCAI 95), 1995, pp. 1137–1145.
[78] R. Kohavi, G.H. John, Wrappers for feature subset selection, Artificial Intelligence 97 (1) (1997) 273–324.
[79] D.H. Wolpert, W.G. Macready, No free lunch theorems for optimization, IEEE Transactions on Evolutionary Computation 1 (1) (1997) 67–82.
[80] MathWorks, Maximum matrix size by platform (2010). URL http://www.mathworks.com/support/technotes/1100/1110.html.
[81] C. Yáñez, E.M.F. Riverón, I. López-Yáñez, R. Flores-Carapia, A novel approach to automatic color matching, in: CIARP, 2006, pp. 529–538.