Engineering Applications of Artificial Intelligence 65 (2017) 12–28
A morphological neural network for binary classification problems

Ricardo de A. Araújo (a,*), Adriano L.I. Oliveira (b), Silvio Meira (b)

(a) Laboratório de Inteligência Computacional do Araripe, Instituto Federal do Sertão Pernambucano, Brazil
(b) Centro de Informática, Universidade Federal de Pernambuco, Brazil
Keywords: Morphological neural network; Mathematical morphology; Lattice theory; Descending gradient-based learning; Binary classification
Abstract

The dilation–erosion perceptron (DEP) is a hybrid morphological processing unit, composed of a balanced combination between dilation and erosion morphological operators, recently presented in the literature. However, a single DEP cannot build the complex decision surfaces required by nonlinearly separable data. To overcome this drawback, we present a particular class of morphological neural networks with a multilayer structure, called the dilation–erosion neural network (DENN), to deal with binary classification problems. Each processing unit of the DENN is composed of a DEP processing unit. Also, a descending gradient-based learning process is presented to train the DENN, following ideas from Pessoa and Maragos. Furthermore, we conduct an experimental analysis with the DENN using a relevant set of binary classification problems, and the obtained results indicate classification performance similar or superior to that achieved by classical and state-of-the-art models presented in the literature. © 2017 Elsevier Ltd. All rights reserved.
1. Introduction

Morphological neural networks (MNN) (Davidson and Ritter, 1990; Davidson, 1991; Davidson and Talukder, 1993; Ritter and Sussner, 1996) are a particular class of artificial neural networks (ANN) based on fundamental concepts of mathematical morphology (Matheron, 1975; Serra, 1982). The development of MNN has grown due to an important observation made by Ritter et al. (1989, 1990) and Ritter and Wilson (2001), demonstrating that image algebra (Sternberg, 1985) can be used to develop several ANN models. In this sense, several MNN have been presented in the literature to solve relevant problems, such as automatic target detection (Khabou and Gader, 2000; Khabou et al., 2000), landmine detection (Gader et al., 2000), rule extraction (Kaburlasos and Petridis, 2000), handwritten character recognition (Pessoa and Maragos, 2000), image restoration (Sousa et al., 2000), hyperspectral image analysis (Graña et al., 2003; Ritter et al., 2009), time series forecasting (Sussner and Valle, 2007; Araújo, 2011, 2013), pattern recognition (Sussner and Esmi, 2011a; Jamshidi and Kaburlasos, 2014), software development cost estimation (Araújo et al., 2012; Araújo, 2012), computer vision (Sussner et al., 2012), vision-based self-localization in mobile robotics (Esmi et al., 2014), high-frequency financial data prediction (Araújo et al., 2015) and image understanding (Graña and Chyzhyk, 2016), amongst others.

In general, an MNN performs morphological operators among complete lattices within each processing unit of the network, because complete lattice theory is accepted as the theoretical framework of mathematical morphology (Serra, 1988; Ronse, 1990; Banon and Barrera, 1993; Heijmans, 1994). Therefore, the set of all structuring elements of the morphological operators employed within each processing unit of an MNN represents the network's weights (Sussner and Esmi, 2011a).

In the context of MNNs, it is worth mentioning the existence of a hybrid morphological processing unit, called the dilation–erosion perceptron (DEP) (Araújo, 2011, 2013), which was recently presented in the literature to solve some nonlinear problems. Such a model can be represented by a balanced combination between morphological operators (dilation and erosion) within a single operator with a hybrid morphological structure. However, Araújo (2011, 2013) argues that a single processing unit is not able to produce decision surfaces complex enough to efficiently classify nonlinearly separable data, and this is the main drawback of the DEP model. As future work, Araújo (2011, 2013) suggested the development of a neural network with a multilayer structure using the DEP as the fundamental processing unit to overcome this limitation.

In this sense, to overcome such a drawback, we propose to extend the main ideas presented in Araújo (2011, 2013) to a neural network with a multilayer feed-forward structure, called the dilation–erosion neural network (DENN). Each morphological processing unit of the DENN model is represented by a balanced combination between dilation and erosion operators, followed by an activation function.
* Corresponding author.
E-mail addresses: [email protected] (R. de A. Araújo), [email protected] (A.L.I. Oliveira), [email protected] (S. Meira).
http://dx.doi.org/10.1016/j.engappai.2017.07.014
Received 23 January 2017; Received in revised form 25 May 2017; Accepted 16 July 2017
0952-1976/© 2017 Elsevier Ltd. All rights reserved.
Besides, based on Pessoa and Maragos (2000), a gradient-based learning process is presented to train the DENN model, in which a smoothed rank-based approach is employed to overcome the nondifferentiability of the dilation and erosion morphological operators. Finally, we conduct an experimental analysis with the proposed DENN using a relevant set of binary classification problems, in which three measures are used to assess performance and to compare against classical and state-of-the-art models presented in the literature. Therefore, the main contributions of this work are: (i) the proposal of a morphological neural network architecture to deal with binary classification problems, (ii) the proposal of a descending gradient-based approach using ideas from the back-propagation algorithm to train the proposed model, and (iii) an experimental analysis with benchmark datasets to assess classification performance.

This work is organized as follows. Section 2 presents the needed concepts about lattice theory for morphological neural networks. Section 3 presents the formal definition of the proposed model. Section 4 presents the descending gradient-based learning process to train the proposed model. Section 5 presents the simulations and experimental results. Finally, Section 6 presents the final remarks of this work.
2. Background on morphological neural networks

Mathematical morphology (MM) (Matheron, 1975; Serra, 1982) is a theory initially applied to image processing and image analysis using structuring elements (SE) (Serra, 1982, 1988; Heijmans, 1994; Soille, 1999). In this context, Ritter et al. (1989, 1990) and Ritter and Wilson (2001) demonstrated that image algebra (Sternberg, 1985) can be used to develop a particular class of artificial neural networks, called morphological neural networks (MNN) (Davidson and Ritter, 1990; Davidson, 1991; Davidson and Talukder, 1993; Ritter and Sussner, 1996). In this sense, we can observe that MNN are based on complete lattice theory (Ritter et al., 1997; Ritter and Sussner, 1997; Petridis and Kaburlasos, 1998; Kaburlasos and Petridis, 2000; Ritter and Urcid, 2003; Ritter et al., 2004; Sussner and Esmi, 2011b), since lattice theory (Dedekind, 1987; Birkhoff, 1993) can be seen as the theoretical framework of MM (Serra, 1988; Ronse, 1990; Banon and Barrera, 1993; Heijmans, 1994). Next, we present some fundamental concepts and relevant theories for the development of the DENN. Additional information can be found in Banon and Barrera (1993), Birkhoff (1993), Cuninghame-Green (1995), Ritter and Urcid (2003), Sussner and Valle (2006), Araújo and Sussner (2010) and Sussner and Esmi (2011a).

Definition 2.1. Let X be a nonempty set, and let ≤ be a binary relation on X. Then ≤ is a partial order if and only if (∀x, y, z ∈ X):
(1) x ≤ x;
(2) x ≤ y and y ≤ x ⇒ x = y;
(3) x ≤ y and y ≤ z ⇒ x ≤ z.

Definition 2.2. Let (X, ≤) be a partially ordered set and let Y be a subset of X (Y ⊆ X). Then x ∈ X is an upper bound of Y if and only if y ≤ x, ∀y ∈ Y. Similarly, x ∈ X is a lower bound of Y if and only if x ≤ y, ∀y ∈ Y.

Note that the smallest upper bound of Y is defined as the supremum of Y and denoted by ⋁_{i∈I} y_i when Y = {y_i, i ∈ I} for a finite index set I. Similarly, the greatest lower bound of Y is defined as the infimum of Y and denoted by ⋀_{i∈I} y_i when Y = {y_i, i ∈ I}.

Definition 2.3. A partially ordered set L is a lattice if and only if every finite subset of L has a supremum and an infimum in L, that is, ∀X ⊆ L finite, ⋁X ∈ L and ⋀X ∈ L.

Note that a lattice L is bounded if and only if it has a least element 0_L and a greatest element 1_L.

Definition 2.4. A lattice L is a complete lattice if and only if every subset of L, finite or infinite, has a supremum and an infimum in L, that is, ∀X ⊆ L we have ⋁X ∈ L and ⋀X ∈ L.

Definition 2.5. Let L and M be complete lattices, let δ and ε be operators from the complete lattice L to the complete lattice M, and let X ⊆ L. Then:
(1) The operator δ is an algebraic dilation if and only if
δ(⋁X) = ⋁_{x∈X} δ(x).  (1)
(2) The operator ε is an algebraic erosion if and only if
ε(⋀X) = ⋀_{x∈X} ε(x).  (2)

Definition 2.6. Let L and M be lattices. An operator Ψ: L → M is increasing if and only if (∀x, y ∈ L):
x ≤ y ⇒ Ψ(x) ≤ Ψ(y).  (3)

Banon and Barrera (1993) presented important theorems regarding the decomposition of mappings between complete lattices in terms of elementary morphological operators. In this work we focus on the decomposition of increasing operators, since the morphological operators of dilation and erosion are increasing. The Banon and Barrera decomposition for increasing operators leads to the following theorem.

Theorem 2.1. Let L and M be complete lattices, and let Ψ: L → M be an increasing mapping. Then:
(1) There are dilations δ_i for an index set I such that
Ψ = ⋀_{i∈I} δ_i.  (4)
(2) There are erosions ε_i for an index set I such that
Ψ = ⋁_{i∈I} ε_i.  (5)

It is worth mentioning that dilation and erosion operators require an additional algebraic structure besides the complete lattice structure. In this way, we use the same extension proposed by Green (1979) and Cuninghame-Green (1995) and employed by Sussner and Esmi (2011a). Some fundamental concepts of this approach are presented as follows.

Definition 2.7. Let G be a nonempty set, and let ≤ be a partial order relation. Then G is a group if and only if:
(1) G is a lattice;
(2) ∀a, b, x, y ∈ G, a ≤ b ⇒ xay ≤ xby.

Definition 2.8. Let G be a lattice and also a group, and let "+" be an addition operation on the group. Then G is a lattice ordered group if every group translation using the operation "+" is isotone, that is, if and only if (∀x, y ∈ G such that x ≤ y):
a + x + b ≤ a + y + b,  ∀a, b ∈ G.  (6)

Let "+′" be an addition operation over (G × G) ⧵ {(+∞, −∞), (−∞, +∞)}; then "+′" differs from the conventional addition operation "+" according to the following rules:
(−∞) + (+∞) = (+∞) + (−∞) = −∞,  (7)
and
(−∞) +′ (+∞) = (+∞) +′ (−∞) = +∞.  (8)
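To make these operators concrete, the following minimal sketch (our own illustration in Python, not code from the paper) implements a vector dilation and erosion over the extended reals and encodes the two additions of Eqs. (7) and (8):

```python
# Minimal sketch (ours, not from the paper): max-plus dilation/erosion and
# the extended additions of Eqs. (7)-(8) on the reals with -inf/+inf.
import numpy as np

def plus(x, y):
    """Addition "+" of Eq. (7): opposite infinities yield -inf."""
    if np.isinf(x) and np.isinf(y) and x != y:
        return -np.inf
    return x + y

def plus_prime(x, y):
    """Dual addition "+'" of Eq. (8): opposite infinities yield +inf."""
    if np.isinf(x) and np.isinf(y) and x != y:
        return np.inf
    return x + y

def dilation(y, a):
    """delta_a(y) = sup_i (y_i + a_i); commutes with suprema (Eq. (1))."""
    return max(plus(yi, ai) for yi, ai in zip(y, a))

def erosion(y, b):
    """eps_b(y) = inf_i (y_i +' b_i); commutes with infima (Eq. (2))."""
    return min(plus_prime(yi, bi) for yi, bi in zip(y, b))

y = [0.2, -np.inf, 0.5]
print(dilation(y, [0.1, 0.0, -0.2]))    # ~0.3; the -inf entry is absorbed
print(erosion(y, [0.1, np.inf, -0.2]))  # ~0.3; (-inf) +' (+inf) = +inf by Eq. (8)
```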
Table 1. Testing performance of 10-fold cross validation for the breast cancer wisconsin data.

Model   Statistic  KS2     AUC     ACC
DENN    MEAN       0.9475  0.9905  0.9685
        PERC25     0.8485  0.9719  0.9286
        PERC975    1.0000  1.0000  1.0000
DT      MEAN       0.8376  0.9188  0.9270
        PERC25     0.7880  0.8940  0.8986
        PERC975    0.9149  0.9574  0.9571
KNN     MEAN       0.9096  0.9548  0.9613
        PERC25     0.7065  0.8533  0.8857
        PERC975    0.9783  0.9891  0.9857
MLP     MEAN       0.9345  0.9912  0.9613
        PERC25     0.8636  0.9737  0.9143
        PERC975    1.0000  1.0000  0.9859
NB      MEAN       0.9312  0.9656  0.9613
        PERC25     0.8514  0.9257  0.9275
        PERC975    0.9783  0.9891  0.9857
RBF     MEAN       0.8911  0.9803  0.9513
        PERC25     0.7841  0.9592  0.9000
        PERC975    0.9583  0.9983  0.9857
SVM-L   MEAN       0.9278  0.9639  0.9656
        PERC25     0.8514  0.9257  0.9286
        PERC975    0.9783  0.9891  0.9859
SVM-P   MEAN       0.8796  0.9398  0.9456
        PERC25     0.7065  0.8533  0.8857
        PERC975    0.9783  0.9891  0.9857
SVM-R   MEAN       0.9421  0.9711  0.9699
        PERC25     0.8732  0.9366  0.9420
        PERC975    1.0000  1.0000  1.0000

Table 2. Results of the Friedman test for the breast cancer wisconsin data.

Measure  χ²       p-value
KS2      42.1375  1.2761e−6
AUC      63.8328  8.2074e−11
ACC      39.9213  3.3136e−6

Rank (KS2): DENN 2.65, SVM-R 3.05, MLP 3.95, SVM-L 4.15, NB 4.35, KNN 4.75, RBF 6.65, SVM-P 7.00, DT 8.45
Rank (AUC): MLP 1.55, DENN 1.80, RBF 3.95, SVM-R 4.35, NB 5.40, SVM-L 5.60, KNN 6.00, SVM-P 7.70, DT 8.65
Rank (ACC): SVM-R 3.05, DENN 3.25, SVM-L 3.75, KNN 4.40, NB 4.45, MLP 4.70, RBF 6.05, SVM-P 7.05, DT 8.30

Table 3. Results of the Tukey test for the breast cancer wisconsin data.

Pair        KS2 statistic  KS2 p-value  AUC statistic  AUC p-value  ACC statistic  ACC p-value
DENN–DT     −5.80          1.3008e−6    −6.85          1.0761e−8    −5.05          7.5012e−6
DENN–KNN    −2.10          7.9725e−2    −4.20          4.5474e−4    −1.15          3.0775e−1
DENN–MLP    −1.30          2.7803e−1     0.25          8.3469e−1    −1.45          1.9843e−1
DENN–NB     −1.70          1.5604e−1    −3.60          2.6539e−3    −1.20          2.8719e−1
DENN–RBF    −4.00          8.4480e−4    −2.15          7.2688e−2    −2.80          1.3014e−2
DENN–SVM-L  −1.50          2.1070e−1    −3.80          1.5130e−3    −0.50          6.5743e−1
DENN–SVM-P  −4.35          2.8372e−4    −5.90          8.4263e−7    −3.80          7.5087e−4
DENN–SVM-R  −0.40          7.3855e−1    −2.55          3.3279e−2     0.20          8.5921e−1

Table 4. Testing performance of 10-fold cross validation for the credit screening data.

Model   Statistic  KS2     AUC     ACC
DENN    MEAN       0.7616  0.9219  0.8637
        PERC25     0.6282  0.8468  0.7971
        PERC975    0.8395  0.9656  0.9143
DT      MEAN       0.6459  0.8230  0.8260
        PERC25     0.4754  0.7377  0.7536
        PERC975    0.7974  0.8987  0.8986
KNN     MEAN       0.6962  0.8481  0.8477
        PERC25     0.5458  0.7729  0.7826
        PERC975    0.7974  0.8987  0.8986
MLP     MEAN       0.6759  0.8686  0.8144
        PERC25     0.5433  0.7725  0.7101
        PERC975    0.7861  0.9454  0.9000
NB      MEAN       0.5102  0.7551  0.7712
        PERC25     0.3319  0.6660  0.6812
        PERC975    0.7070  0.8535  0.8676
RBF     MEAN       0.5708  0.7672  0.7537
        PERC25     0.4497  0.6641  0.6957
        PERC975    0.6827  0.8243  0.8088
SVM-L   MEAN       0.7239  0.8620  0.8550
        PERC25     0.6019  0.8009  0.7971
        PERC975    0.8158  0.9079  0.9000
SVM-P   MEAN       0.6706  0.8353  0.8348
        PERC25     0.5985  0.7992  0.8116
        PERC975    0.7394  0.8697  0.8696
SVM-R   MEAN       0.6849  0.8425  0.8419
        PERC25     0.5458  0.7729  0.7826
        PERC975    0.8098  0.9049  0.9000

Recall that minimax algebra considers a bounded lattice ordered group (BLOG), which is a bounded lattice L whose set of finite elements forms a group, given by L ⧵ {+∞, −∞}, with the terms +∞ and −∞ denoting ⋁L (the greatest element of L) and ⋀L (the least element of L), respectively (Green, 1979; Cuninghame-Green, 1995). In the particular case when L is a complete lattice whose set of finite elements forms a group, L is defined as a complete lattice ordered group. In this case, Green (1979) and Cuninghame-Green (1995) demonstrated that any elementary morphological operator of kind L^n → L^m can be defined in terms of matrix products of maximum and minimum from minimax algebra.

Definition 2.9. Let G be a complete lattice ordered group, and consider the matrices X ∈ G^{m×p} and Y ∈ G^{p×n}. Then the max product of X and Y is given by
B = X ∨ Y.  (9)
An element of the matrix B is given by
b_ij = ⋁_{k=1}^{p} (x_ik + y_kj).  (10)
Similarly, the min product of X and Y is given by
C = X ∧ Y.  (11)
An element of the matrix C is given by
c_ij = ⋀_{k=1}^{p} (x_ik +′ y_kj).  (12)
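As a concrete illustration of Definition 2.9 (ours, assuming finite matrix entries so that "+" and "+′" coincide), the max and min products can be computed with numpy broadcasting:

```python
# Sketch (ours, not code from the paper) of the max product (Eqs. (9)-(10))
# and min product (Eqs. (11)-(12)) of minimax algebra for finite matrices.
import numpy as np

def max_product(X, Y):
    # b_ij = max_k (x_ik + y_kj)
    return np.max(X[:, :, None] + Y[None, :, :], axis=1)

def min_product(X, Y):
    # c_ij = min_k (x_ik + y_kj); on finite entries "+'" equals "+"
    return np.min(X[:, :, None] + Y[None, :, :], axis=1)

X = np.array([[0., 1.], [2., -1.]])
Y = np.array([[1., 0.], [0., 3.]])
print(max_product(X, Y))  # [[1., 4.], [3., 2.]]
print(min_product(X, Y))  # [[1., 0.], [-1., 2.]]
```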
Table 5. Results of the Friedman test for the credit screening data.

Measure  χ²       p-value
KS2      31.8927  9.7354e−5
AUC      45.3445  3.1661e−7
ACC      29.4605  2.6317e−4

Rank (KS2): DENN 1.95, SVM-L 3.70, KNN 4.55, SVM-P 4.55, SVM-R 4.70, MLP 5.20, DT 5.45, RBF 7.30, NB 7.60
Rank (AUC): DENN 1.20, MLP 3.60, SVM-L 3.90, SVM-R 4.90, KNN 4.95, SVM-P 5.10, DT 5.75, RBF 7.70, NB 7.90
Rank (ACC): DENN 2.80, SVM-L 3.65, KNN 4.10, SVM-P 4.35, SVM-R 4.45, MLP 5.30, DT 5.35, NB 7.10, RBF 7.90

Table 6. Results of the Tukey test for the credit screening data.

Pair        KS2 statistic  KS2 p-value  AUC statistic  AUC p-value  ACC statistic  ACC p-value
DENN–DT     −3.50          4.1555e−03   −4.55          1.9823e−04   −2.55          3.3664e−02
DENN–KNN    −2.60          3.3245e−02   −3.75          2.1623e−03   −1.30          2.7887e−01
DENN–MLP    −3.25          7.7819e−03   −2.40          4.9662e−02   −2.50          3.7303e−02
DENN–NB     −5.65          3.7150e−06   −6.70          4.2613e−08   −4.30          3.4126e−04
DENN–RBF    −5.35          1.1811e−05   −6.50          1.0602e−07   −5.10          2.1554e−05
DENN–SVM-L  −1.75          1.5184e−01   −2.70          2.7229e−02   −0.85          4.7893e−01
DENN–SVM-P  −2.60          3.3245e−02   −3.90          1.4244e−03   −1.55          1.9667e−01
DENN–SVM-R  −2.75          2.4326e−02   −3.70          2.4774e−03   −1.65          1.6932e−01

Table 7. Testing performance of 10-fold cross validation for the iris 2D data.

Model   Statistic  KS2     AUC     ACC
DENN    MEAN       0.9400  0.9800  0.9400
        PERC25     0.6000  0.8800  0.8000
        PERC975    1.0000  1.0000  1.0000
DT      MEAN       0.8800  0.9400  0.9400
        PERC25     0.6000  0.8000  0.8000
        PERC975    1.0000  1.0000  1.0000
KNN     MEAN       0.8800  0.9400  0.9400
        PERC25     0.4000  0.7000  0.7000
        PERC975    1.0000  1.0000  1.0000
MLP     MEAN       0.9000  0.9760  0.9200
        PERC25     0.6000  0.8800  0.8000
        PERC975    1.0000  1.0000  1.0000
NB      MEAN       0.8800  0.9400  0.9400
        PERC25     0.4000  0.7000  0.7000
        PERC975    1.0000  1.0000  1.0000
RBF     MEAN       0.7383  0.7400  0.7600
        PERC25     0.3333  0.3600  0.6000
        PERC975    1.0000  1.0000  1.0000
SVM-L   MEAN       0.9000  0.9500  0.9500
        PERC25     0.4000  0.7000  0.7000
        PERC975    1.0000  1.0000  1.0000
SVM-P   MEAN       0.9000  0.9500  0.9500
        PERC25     0.6000  0.8000  0.8000
        PERC975    1.0000  1.0000  1.0000
SVM-R   MEAN       0.8800  0.9400  0.9400
        PERC25     0.4000  0.7000  0.7000
        PERC975    1.0000  1.0000  1.0000

Definition 2.10. Let G be a complete lattice ordered group, and let δ_X and ε_X be operators for X, Y ∈ G^{n×m}. Then we have
δ_X(Y) = Y^T ∨ X,  (13)
and
ε_X(Y) = Y^T ∧ X,  (14)
in which "T" represents transposition.

Note that the operators δ_X and ε_X represent, respectively, a dilation and an erosion. In this context, Theorem 2.1 reveals that there are matrices A_i and B_j, for index sets I and J, such that
Ψ = ⋀_{i∈I} δ_{A_i},  (15)
and
Ψ = ⋁_{j∈J} ε_{B_j}.  (16)
In the same way, Eqs. (15) and (16) suggest that an increasing operator Ψ: ℝ^n → ℝ can be estimated in terms of vectors a_i, b_j ∈ ℝ^n for finite index sets Ī and J̄, so that
Ψ ≃ ⋀_{i∈Ī} δ_{a_i},  (17)
and
Ψ ≃ ⋁_{j∈J̄} ε_{b_j}.  (18)
When the index sets are singletons (Ī = J̄ = 1), an increasing operator Ψ: ℝ^n → ℝ can be estimated in terms of vectors a, b ∈ ℝ^n:
Ψ ≃ δ_a,  (19)
and
Ψ ≃ ε_b.  (20)
The hypothesis of Eqs. (19) and (20) provides the basis to deal with classification problems by means of increasing morphological operators.

3. The proposed dilation–erosion neural network (DENN)

The proposed model has a multilayer feed-forward architecture, in which each processing unit is given by a combination between dilation and erosion operators. The formal definition of the DENN is presented as follows. The nth output of a processing unit of the lth layer of the DENN is given by
y_n^(l) = f(u_n^(l)),  n = 1, …, N_l,  (21)
in which
u_n^(l) = θ_n^(l) δ_n^(l) + (1 − θ_n^(l)) ε_n^(l),  θ_n^(l) ∈ [0, 1],  (22)
with
δ_n^(l) = δ_{a_n^(l)}(y^(l−1)) = ⋁_{i=1}^{N_{l−1}} (y_i^(l−1) + a_{n,i}^(l)),  (23)
and
ε_n^(l) = ε_{b_n^(l)}(y^(l−1)) = ⋀_{i=1}^{N_{l−1}} (y_i^(l−1) +′ b_{n,i}^(l)),  (24)
where the term θ_n^(l) ∈ ℝ and the terms a_n^(l), b_n^(l) ∈ ℝ^{N_{l−1}}. The term N_l denotes the number of processing units within the lth layer.
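The processing unit of Eqs. (21)–(24) is straightforward to compute. The following minimal sketch (our own, not the authors' code) evaluates one unit, using the binary sigmoid the paper adopts in Eq. (27); on finite inputs "+′" reduces to ordinary addition:

```python
# Sketch (ours) of one DENN processing unit, Eqs. (21)-(24): a convex mix
# of a max-plus dilation and a min-plus erosion, followed by a sigmoid.
import numpy as np

def dep_unit(y_prev, a, b, theta):
    """y_prev: previous-layer outputs, shape (N,).
    a, b: structuring elements of dilation/erosion, shape (N,).
    theta: mixing coefficient in [0, 1]."""
    delta = np.max(y_prev + a)                     # Eq. (23): dilation
    eps   = np.min(y_prev + b)                     # Eq. (24): erosion
    u     = theta * delta + (1.0 - theta) * eps    # Eq. (22): balanced mix
    return 1.0 / (1.0 + np.exp(-u))                # Eq. (21) with sigmoid f

y = np.array([0.2, 0.7, 0.5])
print(dep_unit(y, a=np.array([0.1, -0.3, 0.0]),
                  b=np.array([0.0, 0.2, -0.1]), theta=0.6))
```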
Table 8. Results of the Friedman test for the iris 2D data.

Measure  χ²       p-value
KS2      11.6152  1.6922e−1
AUC      25.9786  1.0592e−3
ACC      24.5390  1.8601e−3

Rank (KS2): DENN 3.95, MLP 4.70, SVM-L 4.70, SVM-P 4.75, DT 5.10, KNN 5.15, NB 5.15, SVM-R 5.15, RBF 6.35
Rank (AUC): DENN 3.50, MLP 3.60, SVM-L 4.80, SVM-P 4.95, DT 5.25, KNN 5.25, NB 5.25, SVM-R 5.25, RBF 7.15
Rank (ACC): SVM-L 4.25, SVM-P 4.25, DENN 4.65, DT 4.65, KNN 4.70, NB 4.70, SVM-R 4.70, MLP 5.50, RBF 7.60

Table 9. Results of the Tukey test for the iris 2D data.

Pair        KS2 statistic  KS2 p-value  AUC statistic  AUC p-value  ACC statistic  ACC p-value
DENN–DT     −1.15          1.2393e−01   −1.75          3.6805e−02    0.00          1.0000e+00
DENN–KNN    −1.20          1.0841e−01   −1.75          3.6805e−02   −0.05          9.5251e−01
DENN–MLP    −0.75          3.1569e−01   −0.10          9.0503e−01   −0.85          3.1138e−01
DENN–NB     −1.20          1.0841e−01   −1.75          3.6805e−02   −0.05          9.5251e−01
DENN–RBF    −2.40          1.3240e−03   −3.65          1.3318e−05   −2.95          4.4241e−04
DENN–SVM-L  −0.75          3.1569e−01   −1.30          1.2089e−01    0.40          6.3379e−01
DENN–SVM-P  −0.80          2.8451e−01   −1.45          8.3631e−02    0.40          6.3379e−01
DENN–SVM-R  −1.20          1.0841e−01   −1.75          3.6805e−02   −0.05          9.5251e−01

Table 10. Testing performance of 10-fold cross validation for the parkinson data.

Model   Statistic  KS2     AUC     ACC
DENN    MEAN       0.9067  0.9563  0.9195
        PERC25     0.6667  0.8167  0.7500
        PERC975    1.0000  1.0000  1.0000
DT      MEAN       0.7514  0.8757  0.8926
        PERC25     0.3333  0.6667  0.8000
        PERC975    1.0000  1.0000  1.0000
KNN     MEAN       0.7057  0.8529  0.8976
        PERC25     0.5286  0.7643  0.7895
        PERC975    1.0000  1.0000  1.0000
MLP     MEAN       0.5557  0.7734  0.7174
        PERC25     0.1429  0.4800  0.6111
        PERC975    1.0000  1.0000  0.8000
NB      MEAN       0.5581  0.7790  0.7064
        PERC25     0.2667  0.6333  0.5500
        PERC975    0.9333  0.9667  0.9500
RBF     MEAN       0.9340  0.9670  0.9282
        PERC25     0.4833  0.8133  0.8421
        PERC975    1.0000  1.0000  1.0000
SVM-L   MEAN       0.5490  0.7745  0.8778
        PERC25     0.2000  0.6000  0.8000
        PERC975    0.8000  0.9000  0.9500
SVM-P   MEAN       0.7145  0.8573  0.8879
        PERC25     0.4667  0.7333  0.7895
        PERC975    1.0000  1.0000  1.0000
SVM-R   MEAN       0.5833  0.7917  0.8934
        PERC25     0.2000  0.6000  0.8000
        PERC975    1.0000  1.0000  1.0000

Fig. 1. The nth processing unit of the DENN.

Note that the input of the DENN is given by
y^(0) = x = (x_1, …, x_{N_0}),  (25)
and its output is given by
y^(L) = y = (y_1, …, y_{N_L}).  (26)
In this work we have employed the activation function f as a binary sigmoid, given by
f(u_n^(l)) = 1 / (1 + exp(−u_n^(l))).  (27)
The reason for this choice is to ensure that the output of the processing units lies within the interval [0, 1] (that is, in a complete lattice). Recall that the internal activation u_n^(l) is composed of a balanced combination (whose mixing term is given by θ_n^(l)) between a dilation operator (defined by δ_n^(l)) and an erosion operator (defined by ε_n^(l)). The vectors a and b represent, respectively, the structuring elements of the dilation and erosion operators. Fig. 1 depicts the architecture of a processing unit of the proposed model.

4. The proposed learning process

According to the definition of the DENN, it requires the adjustment of a weight vector defined by
w_n^(l) = (θ_n^(l), a_n^(l), b_n^(l)),  (28)
in which n = 1, …, N_l and l = 1, …, L, with L representing the number of layers of the DENN. The proposed gradient-based learning process employs a supervised training algorithm to adjust the weight vector. It is based on an error criterion defined in terms of a cost function J, given by
J(i) = (1/M) Σ_{m=1}^{M} ξ(m),  (29)
in which
ξ(m) = ‖e(m)‖²,  (30)
with
‖e(m)‖² = Σ_{n=1}^{N_L} [e_n(m)]²,  (31)
where M is the number of training patterns and e(m) = [e_1(m), …, e_{N_L}(m)] is the error signal for the mth training pattern. The term e_n(m) is the instantaneous error, given by
e_n(m) = t_n(m) − y_n^(L)(m),  (32)
in which t_n(m) and y_n^(L)(m) are the nth target and the nth output, respectively.
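A compact sketch (ours, not the authors' code) of the full forward pass of Eqs. (21)–(27) and the cost of Eqs. (29)–(32) helps fix the notation; each layer is stored as a (θ, A, B) triple, with the rows of A and B holding the structuring elements a_n^(l) and b_n^(l):

```python
# Sketch (ours) of the DENN forward pass and cost function.
import numpy as np

def forward(x, weights):
    y = x                                             # Eq. (25): y^(0) = x
    for theta, A, B in weights:                       # theta: shape (N_l,)
        delta = np.max(y[None, :] + A, axis=1)        # Eq. (23), all units at once
        eps   = np.min(y[None, :] + B, axis=1)        # Eq. (24)
        u     = theta * delta + (1.0 - theta) * eps   # Eq. (22)
        y     = 1.0 / (1.0 + np.exp(-u))              # Eqs. (21), (27)
    return y                                          # Eq. (26): y^(L)

def cost(X, T, weights):
    # Eq. (29): mean over patterns of xi(m) = ||t(m) - y(m)||^2, Eqs. (30)-(32)
    return np.mean([np.sum((t - forward(x, weights)) ** 2)
                    for x, t in zip(X, T)])
```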
Fig. 2. Boxplot graphic produced by all investigated models for the breast cancer wisconsin data (testing set); panels: (a) KS2, (b) AUC, (c) ACC.

Table 11. Results of the Friedman test for the parkinson data.

Measure  χ²       p-value
KS2      36.9081  1.1965e−5
AUC      37.9619  7.6514e−6
ACC      41.8904  1.4199e−6

Rank (KS2): RBF 1.90, DENN 2.40, DT 4.30, KNN 5.15, SVM-P 5.15, SVM-R 6.10, MLP 6.45, SVM-L 6.55, NB 7.00
Rank (AUC): RBF 1.80, DENN 2.40, DT 4.30, SVM-P 5.15, KNN 5.20, SVM-R 6.15, MLP 6.45, SVM-L 6.55, NB 7.00
Rank (ACC): RBF 3.05, DENN 3.30, DT 4.20, KNN 4.20, SVM-P 4.45, SVM-R 4.50, SVM-L 4.90, NB 7.95, MLP 8.45

The main problem in adjusting the weight vector is minimizing the error between target and output, that is, finding a minimum point on the error surface of the cost function. According to the descending gradient approach, the weight vector can be adjusted by
w_n^(l)(i+1) = w_n^(l)(i) + μ ∇J_n^(l)(i),  (33)
in which μ is the learning rate and ∇J_n^(l) is the gradient of the cost function J, given by
∇J_n^(l)(i) = ∂ξ(i)/∂w_n^(l).  (34)
The term ∂ξ/∂w_n^(l) can be evaluated using the chain rule:
∂ξ/∂w_n^(l) = (∂ξ/∂y_n^(l)) (∂y_n^(l)/∂u_n^(l)) (∂u_n^(l)/∂w_n^(l)),  (35)
in which
∂u_n^(l)/∂w_n^(l) = (∂u_n^(l)/∂θ_n^(l), ∂u_n^(l)/∂a_n^(l), ∂u_n^(l)/∂b_n^(l)),  (36)
and
∂y_n^(l)/∂u_n^(l) = ḟ(u_n^(l)),  (37)
and when l = L
∂ξ/∂y_n^(l) = e_n,  (38)
or when l < L
∂ξ/∂y^(l) = Σ_{n=1}^{N_{l+1}} (∂ξ/∂y_n^(l+1)) (∂y_n^(l+1)/∂y^(l)),  (39)
in which
∂ξ/∂y_n^(l+1) = e_n^(l+1),  (40)
and
∂y_n^(l+1)/∂y^(l) = ḟ(u_n^(l+1)) · ∂u_n^(l+1)/∂y^(l).  (41)
The main problem of the learning process is how to evaluate the following derivatives: ḟ, ∂u_n^(l)/∂y^(l−1), ∂u_n^(l)/∂θ_n^(l), ∂u_n^(l)/∂a_n^(l) and ∂u_n^(l)/∂b_n^(l).
Table 12. Results of the Tukey test for the parkinson data.

Pair        KS2 statistic  KS2 p-value  AUC statistic  AUC p-value  ACC statistic  ACC p-value
DENN–DT     −1.90          1.1366e−01   −1.90          1.1366e−01   −0.90          4.4545e−01
DENN–KNN    −2.75          2.2040e−02   −2.80          1.9737e−02   −0.90          4.4545e−01
DENN–MLP    −4.05          7.4604e−04   −4.05          7.4604e−04   −5.15          1.2642e−05
DENN–NB     −4.60          1.2814e−04   −4.60          1.2814e−04   −4.65          8.0702e−05
DENN–RBF     0.50          6.7719e−01    0.60          6.1738e−01    0.25          8.3214e−01
DENN–SVM-L  −4.15          5.4961e−04   −4.15          5.4961e−04   −1.60          1.7494e−01
DENN–SVM-P  −2.75          2.2040e−02   −2.75          2.2040e−02   −1.15          3.2957e−01
DENN–SVM-R  −3.70          2.0654e−03   −3.75          1.7945e−03   −1.20          3.0898e−01

Table 13. Testing performance of 10-fold cross validation for the Ripley data.

Model   Statistic  KS2     AUC     ACC
DENN    MEAN       0.8118  0.9603  0.9024
        PERC25     0.7637  0.9319  0.8800
        PERC975    0.8569  0.9772  0.9355
DT      MEAN       0.7647  0.8823  0.8823
        PERC25     0.7279  0.8639  0.8640
        PERC975    0.8254  0.9127  0.9127
KNN     MEAN       0.8016  0.9008  0.9008
        PERC25     0.7440  0.8720  0.8720
        PERC975    0.8710  0.9355  0.9355
MLP     MEAN       0.7833  0.9534  0.8816
        PERC25     0.7448  0.9342  0.8480
        PERC975    0.8413  0.9714  0.9206
NB      MEAN       0.7759  0.8880  0.8880
        PERC25     0.6956  0.8478  0.8480
        PERC975    0.8254  0.9127  0.9127
RBF     MEAN       0.8170  0.9576  0.8968
        PERC25     0.7594  0.9291  0.8720
        PERC975    0.8728  0.9813  0.9435
SVM-L   MEAN       0.7648  0.8824  0.8824
        PERC25     0.7276  0.8638  0.8640
        PERC975    0.8095  0.9048  0.9048
SVM-P   MEAN       0.8016  0.9008  0.9008
        PERC25     0.7276  0.8638  0.8640
        PERC975    0.8571  0.9286  0.9286
SVM-R   MEAN       0.8048  0.9024  0.9024
        PERC25     0.7276  0.8638  0.8640
        PERC975    0.8548  0.9274  0.9274

Table 14. Results of the Friedman test for the Ripley data.

Measure  χ²       p-value
KS2      26.0323  1.0370e−3
AUC      62.9211  1.2417e−10
ACC      21.9283  5.0504e−03

Rank (KS2): RBF 2.60, SVM-R 3.70, SVM-P 4.20, DENN 4.25, KNN 4.55, MLP 5.25, NB 6.35, DT 7.00, SVM-L 7.10
Rank (AUC): DENN 1.70, RBF 2.00, MLP 2.30, SVM-R 5.25, KNN 5.70, SVM-P 5.70, NB 7.15, DT 7.60, SVM-L 7.60
Rank (ACC): SVM-R 3.30, SVM-P 3.70, DENN 3.80, KNN 4.10, RBF 4.55, NB 5.75, MLP 6.50, SVM-L 6.60, DT 6.70

Clearly, from Eq. (27), we can estimate the derivative ḟ by
ḟ(u_n^(l)) = f(u_n^(l)) [1 − f(u_n^(l))].  (42)
Similarly, from Eq. (22), we can estimate the partial derivative ∂u_n^(l)/∂θ_n^(l) by
∂u_n^(l)/∂θ_n^(l) = δ_n^(l) − ε_n^(l).  (43)
Also, from Eq. (22), we can estimate the derivative ∂u_n^(l)/∂y^(l−1) by
∂u_n^(l)/∂y^(l−1) = ∂u_n^(l)/∂a_n^(l) + ∂u_n^(l)/∂b_n^(l).  (44)
Considering that dilation and erosion operators are not differentiable in the usual way, a problem arises when evaluating the partial derivatives ∂u_n^(l)/∂a_n^(l) and ∂u_n^(l)/∂b_n^(l). In this sense, we can note that the morphological operators of dilation and erosion can be seen as particular cases of the rank operator. The rth rank operator of a vector x = (x_1, x_2, …, x_n)^T ∈ ℝ^n is the rth component of the vector ordered in decreasing way (x_(1) ≥ x_(2) ≥ ⋯ ≥ x_(n)), and is given by
r_r(x) = x_(r),  r = 1, 2, …, n.  (45)
Therefore, from Eq. (45), we have
⋁(y^(l−1) + a_n^(l)) ≡ r_1(y^(l−1) + a_n^(l)),  (46)
and
⋀(y^(l−1) +′ b_n^(l)) ≡ r_n(y^(l−1) +′ b_n^(l)).  (47)
Therefore, the evaluation of these partial derivatives can be done using the concept of the rank indicator vector, originally proposed by Pessoa and Maragos (2000) and extended in this work in terms of morphological operators. However, such an approach has a problem that can lead to abrupt changes in the gradient estimation, compromising the numerical robustness of the learning process (Pessoa and Maragos, 2000). In this way, the smoothed rank indicator vector, extended in terms of morphological operators, is employed to overcome such a drawback. The reason is that the smoothed rank operator approximates morphological operators in terms of differentiable operators. The smoothed rank-based approach depends on a smoothed impulse function Q_σ(x) = [q_σ(x_1), q_σ(x_2), …, q_σ(x_n)], in which
q_σ(x_i) = exp[−(1/2)(x_i/σ)²],  ∀i = 1, …, n,  (48)
where σ is a smoothing factor. Next, we present how to evaluate these partial derivatives.
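The effect of the smoothed impulse of Eq. (48) can be seen in the following sketch (ours, not code from the paper): for small σ, the normalized vector Q_σ(r_1(v)·1 − v) approaches the (one-hot) rank indicator of the maximum entry of v, yet remains differentiable.

```python
# Sketch (ours) of the smoothed impulse (Eq. (48)) and the resulting
# smoothed rank indicator used in Eqs. (51) and (54).
import numpy as np

def q_sigma(x, sigma):
    return np.exp(-0.5 * (x / sigma) ** 2)   # Eq. (48)

def smoothed_argmax_indicator(v, sigma=0.05):
    q = q_sigma(np.max(v) - v, sigma)        # peaks where v attains its max
    return q / np.sum(q)                     # normalization, as in Eq. (51)

v = np.array([0.3, 0.9, 0.7])
print(smoothed_argmax_indicator(v))          # ~[0, 1, 0] for small sigma
```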
Table 15. Results of the Tukey test for the Ripley data.

Pair        KS2 statistic  KS2 p-value  AUC statistic  AUC p-value  ACC statistic  ACC p-value
DENN–DT     −2.75          2.3378e−02   −5.90          1.1736e−06   −2.90          1.4075e−02
DENN–KNN    −0.30          8.0465e−01   −4.00          9.8441e−04   −0.30          7.9950e−01
DENN–MLP    −1.00          4.0969e−01   −0.60          6.2114e−01   −2.70          2.2254e−02
DENN–NB     −2.10          8.3396e−02   −5.45          7.1437e−06   −1.95          9.8738e−02
DENN–RBF     1.65          1.7373e−01   −0.30          8.0481e−01   −0.75          5.2543e−01
DENN–SVM-L  −2.85          1.8791e−02   −5.90          1.1736e−06   −2.80          1.7756e−02
DENN–SVM-P   0.05          9.6712e−01   −4.00          9.8441e−04    0.10          9.3253e−01
DENN–SVM-R   0.55          6.5023e−01   −3.55          3.4527e−03    0.50          6.7205e−01

Table 16. Testing performance of 10-fold cross validation for the sonar data.

Model   Statistic  KS2     AUC     ACC
DENN    MEAN       0.7255  0.8954  0.8176
        PERC25     0.5278  0.7677  0.6667
        PERC975    0.8889  0.9909  0.9500
DT      MEAN       0.3901  0.6914  0.6929
        PERC25     0.0364  0.4818  0.4762
        PERC975    0.6273  0.8136  0.8095
KNN     MEAN       0.4231  0.7116  0.7215
        PERC25     0.2182  0.6091  0.6190
        PERC975    0.8889  0.9444  0.9500
MLP     MEAN       0.5830  0.7762  0.7224
        PERC25     0.1727  0.5182  0.5238
        PERC975    1.0000  1.0000  0.9500
NB      MEAN       0.3941  0.6971  0.6874
        PERC25     0.0505  0.5253  0.5000
        PERC975    0.8182  0.9091  0.9048
RBF     MEAN       0.5754  0.8000  0.7268
        PERC25     0.2857  0.6455  0.5714
        PERC975    0.8333  0.9798  0.9000
SVM-L   MEAN       0.5556  0.7778  0.7789
        PERC25     0.3030  0.6515  0.6500
        PERC975    1.0000  1.0000  1.0000
SVM-P   MEAN       0.7434  0.8717  0.8749
        PERC25     0.4646  0.7323  0.7500
        PERC975    1.0000  1.0000  1.0000
SVM-R   MEAN       0.5811  0.7906  0.8042
        PERC25     0.2000  0.6000  0.6190
        PERC975    1.0000  1.0000  1.0000

Table 17. Results of the Friedman test for the sonar data.

Measure  χ²       p-value
KS2      38.4836  6.1273e−6
AUC      42.0506  1.3250e−6
ACC      37.8634  7.9786e−6

Rank (KS2): SVM-P 2.15, DENN 2.65, SVM-L 4.55, RBF 4.60, SVM-R 4.70, MLP 4.85, NB 6.70, KNN 7.30, DT 7.50
Rank (AUC): DENN 2.15, SVM-P 2.25, RBF 4.50, SVM-R 4.55, SVM-L 4.70, MLP 5.20, NB 6.90, KNN 7.30, DT 7.45
Rank (ACC): SVM-P 1.65, SVM-R 3.45, DENN 3.50, SVM-L 4.20, RBF 6.15, KNN 6.40, MLP 6.45, DT 6.55, NB 6.65

The partial derivative ∂u_n^(l)/∂a_n^(l) is given by
∂u_n^(l)/∂a_n^(l) = (∂u_n^(l)/∂δ_n^(l)) (∂δ_n^(l)/∂a_n^(l)),  (49)
in which
∂u_n^(l)/∂δ_n^(l) = θ_n^(l),  (50)
and
∂δ_n^(l)/∂a_n^(l) = Q_σ(δ_n^(l) · 1 − [y^(l−1) + a_n^(l)]) / (Q_σ(δ_n^(l) · 1 − [y^(l−1) + a_n^(l)]) · 1^T),  (51)
in which 1 = (1, …, 1) and ·^T denotes transposition. Similarly, the partial derivative ∂u_n^(l)/∂b_n^(l) is given by
∂u_n^(l)/∂b_n^(l) = (∂u_n^(l)/∂ε_n^(l)) (∂ε_n^(l)/∂b_n^(l)),  (52)
in which
∂u_n^(l)/∂ε_n^(l) = 1 − θ_n^(l),  (53)
and
∂ε_n^(l)/∂b_n^(l) = Q_σ(ε_n^(l) · 1 − [y^(l−1) +′ b_n^(l)]) / (Q_σ(ε_n^(l) · 1 − [y^(l−1) +′ b_n^(l)]) · 1^T).  (54)
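Putting Eqs. (43), (48) and (49)–(54) together, the following sketch (our own reading of the learning process, for a single output unit with the error convention of Eq. (32)) computes the smoothed-rank gradients and applies the update of Eq. (33):

```python
# Single-unit sketch (ours) of the smoothed-rank gradients and the update
# rule; a multilayer implementation would add the chain rule of Eqs. (35)-(41).
import numpy as np

def unit_gradients(y_prev, a, b, theta, sigma=0.05):
    va, vb = y_prev + a, y_prev + b
    delta, eps = np.max(va), np.min(vb)           # Eqs. (23)-(24)
    qa = np.exp(-0.5 * ((delta - va) / sigma) ** 2)
    qb = np.exp(-0.5 * ((eps - vb) / sigma) ** 2)
    du_dtheta = delta - eps                       # Eq. (43)
    du_da = theta * qa / qa.sum()                 # Eqs. (49)-(51)
    du_db = (1.0 - theta) * qb / qb.sum()         # Eqs. (52)-(54)
    return du_dtheta, du_da, du_db

def update(y_prev, a, b, theta, t, mu=0.01):
    """One Eq. (33)-style step for an output unit with sigmoid activation."""
    u = theta * np.max(y_prev + a) + (1 - theta) * np.min(y_prev + b)
    y = 1.0 / (1.0 + np.exp(-u))
    e = t - y                                     # Eq. (32)
    g = e * y * (1.0 - y)                         # Eqs. (37)-(38) with Eq. (42)
    du_dtheta, du_da, du_db = unit_gradients(y_prev, a, b, theta)
    return (theta + mu * g * du_dtheta,           # Eq. (33)
            a + mu * g * du_da,
            b + mu * g * du_db)
```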
5. Simulations and experimental results

A relevant set of benchmark binary classification problems (breast cancer wisconsin, credit screening, iris 2-dimensional (2D), parkinson, Ripley and sonar) is used as a test bed for the evaluation of the proposed model. Several models have been proposed in the literature to solve classification problems (Ganesh et al., 2014; Baraldi et al., 2016; Jiang et al., 2016; Sarda-Espinosa et al., 2017). However, in the attempt to establish a performance study, we investigate the results obtained by representative models presented in the literature: (i) decision tree (DT) (Breiman et al., 1984; Esposito et al., 1997; Parvin et al., 2015), (ii) k-nearest neighbors (KNN) (Devroye et al., 1996; Han et al., 2013b), (iii) multilayer perceptron (MLP) (Haykin, 2007; Gacquer et al., 2011), (iv) naive Bayes classifier (NB) (Duda et al., 2000; Gacquer et al., 2011), (v) radial-basis function network (RBF) (Haykin, 2007; Dash et al., 2013), and (vi) support vector machine (SVM) (Haykin, 2007; Han et al., 2013a) with linear (SVM-L), polynomial (SVM-P) and RBF (SVM-R) kernels.

Notice that we have defined a basic architecture for the proposed model, denoted DENN(I; H; O; μ; σ). The term I represents the input dimensionality, the term H the number of processing units in the hidden layer (note that only one hidden layer is used in our experimental analysis), the term μ the learning rate, and the term σ the smoothing factor. The choice of the input dimensionality depends on the investigated classification problem. For the number of processing units in the hidden layer (H), we used an empirical methodology (based on cross validation) in which the values 1, 5, 10, 25 and 50 were investigated. For the learning rate (μ), we also used an empirical methodology (based on cross validation) in which the values 0.001, 0.01 and 0.1 were investigated. For the smoothing factor (σ), we employed the same empirical methodology to investigate the values 0.005, 0.05 and 0.5. The initial values of the DENN parameters are a_n^(l), b_n^(l) ∈ [−1, 1] and θ_n^(l) ∈ [0, 1]. For the learning process, two stop conditions are used, according to Prechelt (1994): (i) the number of training epochs (epoch = 10 000), and (ii) the training progress (Pt ≤ 10^−6).
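The empirical hyperparameter selection described above amounts to a plain grid search over H, μ and σ; in the sketch below (ours), train_denn and validation_score are hypothetical placeholders standing in for the training procedure of Section 4 and a chosen performance measure.

```python
# Sketch (ours) of the cross-validated grid search over H, mu and sigma;
# train_denn and validation_score are hypothetical helpers, not a real API.
from itertools import product

H_GRID     = [1, 5, 10, 25, 50]
MU_GRID    = [0.001, 0.01, 0.1]
SIGMA_GRID = [0.005, 0.05, 0.5]

def select_configuration(folds):
    best, best_score = None, -float("inf")
    for H, mu, sigma in product(H_GRID, MU_GRID, SIGMA_GRID):
        # mean validation score over the cross-validation folds
        score = sum(validation_score(train_denn(tr, H, mu, sigma), va)
                    for tr, va in folds) / len(folds)
        if score > best_score:
            best, best_score = (H, mu, sigma), score
    return best
```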
Fig. 3. Boxplot graphic produced by all investigated models for the credit screening data (testing set); panels: (a) KS2, (b) AUC, (c) ACC.
Table 18. Results of the Tukey test for the sonar data.

Pair        KS2 statistic  KS2 p-value  AUC statistic  AUC p-value  ACC statistic  ACC p-value
DENN–DT     −4.85          6.8443e−05   −5.30          1.3435e−05   −3.05          1.0687e−02
DENN–KNN    −4.65          1.3484e−04   −5.15          2.3400e−05   −2.90          1.5215e−02
DENN–MLP    −2.20          7.0902e−02   −3.05          1.2246e−02   −2.95          1.3546e−02
DENN–NB     −4.05          8.8460e−04   −4.75          9.5724e−05   −3.15          8.3774e−03
DENN–RBF    −1.95          1.0941e−01   −2.35          5.3600e−02   −2.65          2.6556e−02
DENN–SVM-L  −1.90          1.1880e−01   −2.55          3.6232e−02   −0.70          5.5795e−01
DENN–SVM-P   0.50          6.8146e−01   −0.10          9.3454e−01    1.85          1.2153e−01
DENN–SVM-R  −2.05          9.2383e−02   −2.40          4.8710e−02    0.05          9.6662e−01
Furthermore, three measures are used to assess classification performance: the two-sample Kolmogorov–Smirnov statistic (KS2) (Drew et al., 2000), the area under the receiver operating characteristic curve (AUC) (Fawcett, 2006) and the accuracy (ACC). 10-fold cross-validation is used to assess the generalization performance of the classification models investigated in this work, where we calculate, for each performance measure, the mean (MEAN), the 2.5% percentile (PERC25) and the 97.5% percentile (PERC975). Besides, a boxplot analysis is performed to evaluate the empirical distribution of the obtained results. Also, in order to determine, statistically, the model with the greatest classification performance, we apply the Friedman test (Friedman, 1940) with significance level α = 0.05, since it establishes a performance rank for the investigated models. Furthermore, we use a post hoc test, the Tukey test (Tukey, 1949) with α = 0.05, to evaluate the pairwise performance of all investigated models. It is worth mentioning that we have used both tests in two scenarios: (i) considering each dataset individually, and (ii) considering all datasets together (using the approach proposed by Demsar (2006) and Nobrega and Oliveira (2015)).
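For reference, the three measures can be computed with standard tooling; the sketch below (ours, on hypothetical labels and scores) computes KS2 as the two-sample Kolmogorov–Smirnov statistic between the score distributions of the two classes.

```python
# Sketch (ours) of the KS2, AUC and ACC measures for a binary problem with
# labels in {0, 1} and model scores in [0, 1].
import numpy as np
from scipy.stats import ks_2samp
from sklearn.metrics import roc_auc_score, accuracy_score

def evaluate(y_true, scores, threshold=0.5):
    ks2 = ks_2samp(scores[y_true == 1], scores[y_true == 0]).statistic
    auc = roc_auc_score(y_true, scores)
    acc = accuracy_score(y_true, scores >= threshold)
    return ks2, auc, acc

y = np.array([0, 0, 1, 1, 1])          # hypothetical fold labels
s = np.array([0.2, 0.6, 0.4, 0.8, 0.9])  # hypothetical model scores
print(evaluate(y, s))
```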
5.1. The breast cancer wisconsin data

The breast cancer wisconsin data (Lichman, 2013) is a classification problem in which each sample represents a clinical case with a malignant or benign diagnosis for breast cancer. All samples have a 10-dimensional feature vector. According to empirical results, the best parameter configuration of the DENN model for this dataset is H = 10, μ = 0.01 and σ = 0.05. Table 1 presents the testing performance of the 10-fold cross validation performed with all investigated models for the breast cancer wisconsin data, and Fig. 2 depicts the boxplot graphic of the results presented in Table 1.
Fig. 4. Boxplot graphic produced by all investigated models for the iris 2D data (testing set); panels: (a) KS2, (b) AUC, (c) ACC.

Table 19. Summary of the testing performance (MEAN) for all datasets.

Measure  Dataset    DENN    DT      KNN     MLP     NB      RBF     SVM-L   SVM-P   SVM-R
KS2      Wisconsin  0.9475  0.8376  0.9096  0.9345  0.9312  0.8911  0.9278  0.8796  0.9421
         Credit     0.7616  0.6459  0.6962  0.6759  0.5102  0.5708  0.7239  0.6706  0.6849
         Iris 2D    0.9400  0.8800  0.8800  0.9000  0.8800  0.7383  0.9000  0.9000  0.8800
         Parkinson  0.9067  0.7514  0.7057  0.5557  0.5581  0.9340  0.5490  0.7145  0.5833
         Ripley     0.8118  0.7647  0.8016  0.7833  0.7759  0.8170  0.7648  0.8016  0.8048
         Sonar      0.7255  0.3901  0.4231  0.5830  0.3941  0.5754  0.5556  0.7434  0.5811
AUC      Wisconsin  0.9912  0.9188  0.9548  0.9905  0.9656  0.9803  0.9639  0.9398  0.9711
         Credit     0.9219  0.8230  0.8481  0.8686  0.7551  0.7672  0.8620  0.8353  0.8425
         Iris 2D    0.9800  0.9400  0.9400  0.9760  0.9400  0.7400  0.9500  0.9500  0.9400
         Parkinson  0.9563  0.8757  0.8529  0.7734  0.7790  0.9670  0.7745  0.8573  0.7917
         Ripley     0.9603  0.8823  0.9008  0.9534  0.8880  0.9576  0.8824  0.9008  0.9024
         Sonar      0.8954  0.6914  0.7116  0.7762  0.6971  0.8000  0.7778  0.8717  0.7906
ACC      Wisconsin  0.9685  0.9270  0.9613  0.9613  0.9613  0.9513  0.9656  0.9456  0.9699
         Credit     0.8637  0.8260  0.8477  0.8144  0.7712  0.7537  0.8550  0.8348  0.8419
         Iris 2D    0.9400  0.9400  0.9400  0.9200  0.9400  0.7600  0.9500  0.9500  0.9400
         Parkinson  0.9195  0.8926  0.8976  0.7174  0.7064  0.9282  0.8778  0.8879  0.8934
         Ripley     0.9024  0.8823  0.9008  0.8816  0.8880  0.8968  0.8824  0.9008  0.9024
         Sonar      0.8176  0.6929  0.7215  0.7224  0.6874  0.7268  0.7789  0.8749  0.8042
According to the results presented in Table 1 and in Fig. 2, we can note that the DENN, MLP and SVM-R models obtained the greatest performance for the KS2 (0.9475), AUC (0.9912) and ACC (0.9699) measures, respectively. Even though it did not produce the greatest performance for the AUC (0.9905) and ACC (0.9685) measures, the DENN model achieved performance similar to that of the MLP and SVM-R models.

Table 2 presents the results of the Friedman test for the breast cancer wisconsin data. Table 2 statistically confirms the results obtained in Table 1 and Fig. 2, in which the proposed DENN model achieves the best performance for the KS2 measure and the second best performance for the AUC and ACC measures. Table 3 presents the results of the Tukey test for the breast cancer wisconsin data. According to Table 3, the pairwise evaluation reveals that, except for the MLP model (considering the AUC measure) and the SVM-R model (considering the ACC measure), the DENN model performs better in all pairs, for all investigated measures.
Fig. 5. Decision surfaces produced by all investigated models for the iris 2D data; panels: (a) DT, (b) KNN, (c) MLP, (d) NB, (e) RBF, (f) SVM-L, (g) SVM-P, (h) SVM-R, (i) DENN.

Table 20. Results of the Friedman test considering all datasets.

Measure  χ²       p-value
KS2      38.4836  6.1273e−6
AUC      42.0506  1.3250e−6
ACC      37.8634  7.9786e−6

Rank (KS2): DENN 1.50, SVM-R 4.25, SVM-P 4.42, MLP 4.67, RBF 5.17, KNN 5.33, SVM-L 5.50, NB 6.92, DT 7.25
Rank (AUC): DENN 1.17, MLP 4.00, RBF 4.33, SVM-P 4.83, SVM-R 4.92, SVM-L 5.58, KNN 5.83, NB 7.08, DT 7.25
Rank (ACC): DENN 2.25, SVM-R 3.08, SVM-L 4.08, SVM-P 4.17, KNN 4.42, RBF 6.00, DT 6.83, NB 7.00, MLP 7.17
5.2. The credit screening data

The credit screening data (Lichman, 2013) is a classification problem in which each sample represents a customer profile with an approval or rejection decision for granting credit. All samples have a 15-dimensional feature vector. According to empirical results, the best parameter configuration of the DENN model for this dataset is H = 10, μ = 0.1 and σ = 0.05. Table 4 presents the testing performance of the 10-fold cross validation performed with all investigated models for the credit screening data, and Fig. 3 depicts the boxplot graphic of the results presented in Table 4. Table 4 and Fig. 3 reveal that the DENN model outperformed all investigated models, considering the KS2 (0.7616), AUC (0.9219) and ACC (0.8637) measures. Table 5 presents the results of the Friedman test for the credit screening data and statistically confirms the results presented in Table 4 and Fig. 3, in which the DENN model has the best performance. Table 6 presents the results of the Tukey test for the credit screening data; the pairwise evaluation in Table 6 confirms that the DENN model performs better in all pairs, for all investigated measures.

5.3. Iris 2D data

The iris 2D data (Lichman, 2013) considered in this work is a classification problem (iris-versicolor vs. iris-virginica). All samples have a 2-dimensional feature vector (petal length and petal width).
Fig. 6. Boxplot graphic produced by all investigated models for the parkinson data (testing set); panels: (a) KS2, (b) AUC, (c) ACC.

Table 21. Results of the Tukey test considering all datasets.

Pair        KS2 statistic  KS2 p-value  AUC statistic  AUC p-value  ACC statistic  ACC p-value
DENN–DT     −5.75          2.3775e−04   −6.08          1.0450e−04   −4.58          3.1299e−03
DENN–KNN    −3.83          1.4283e−02   −4.67          2.9169e−03   −2.17          1.6249e−01
DENN–MLP    −3.17          4.2973e−02   −2.83          7.0750e−02   −4.92          1.5266e−03
DENN–NB     −5.42          5.3607e−04   −5.92          1.6090e−04   −4.75          2.1977e−03
DENN–RBF    −3.67          1.9102e−02   −3.17          4.3417e−02   −3.75          1.5629e−02
DENN–SVM-L  −4.00          1.0570e−02   −4.42          4.8487e−03   −1.83          2.3726e−01
DENN–SVM-P  −2.92          6.2296e−02   −3.67          1.9358e−02   −1.92          2.1661e−01
DENN–SVM-R  −2.75          7.8806e−02   −3.75          1.6769e−02   −0.83          5.9112e−01
According to empirical results, the best parameter configuration of the DENN model for this dataset is H = 10, μ = 0.1 and σ = 0.05. Table 7 presents the testing performance of the 10-fold cross validation performed with all investigated models for the iris 2D data, and Fig. 4 depicts the boxplot graphic of the results presented in Table 7. According to Table 7 and Fig. 4, although the proposed model did not achieve the greatest performance for the ACC (0.9400) measure, it outperformed all investigated models considering the KS2 (0.9400) and AUC (0.9800) measures.

Table 8 presents the results of the Friedman test for the iris 2D data and statistically confirms the results in Table 7 and Fig. 4. Note that the DENN model has the best performance for the KS2 measure. However, even producing the best performance for the AUC measure, the DENN model is statistically similar to the MLP model. With respect to the ACC measure, the DENN model produced the third best performance (with the same statistical performance as the DT model). Table 9 presents the results of the Tukey test for the iris 2D data. According to Table 9, the pairwise evaluation reveals that, except for the SVM-L, SVM-P and DT models (considering the ACC measure) and for the MLP model (considering the AUC measure), the DENN model performs better in all pairs, for all investigated measures.

The decision surfaces generated by all investigated models for the iris 2D data are depicted in Fig. 5. According to Fig. 5, we can observe that the decision surface generated by the proposed DENN model is able to efficiently classify the iris 2D data. Note that the decision surfaces generated by the KNN, MLP, NB, SVM-L and SVM-R models are quite similar, as are those generated by the DT and SVM-P models. Besides, we can observe that only the RBF model is not able to produce a good decision surface to generalize the iris 2D data.

5.4. Parkinson data

The parkinson data (Lichman, 2013) is a classification problem in which each sample represents a patient having a positive or negative diagnosis for parkinson disease. All samples have a 22-dimensional feature vector.
Fig. 7. Boxplot graphic produced by all investigated models for the Ripley data (testing set); panels: (a) KS2, (b) AUC, (c) ACC.
According to empirical results, the best parameter configuration of the DENN model for this dataset is H = 25, μ = 0.1 and σ = 0.05. Table 10 presents the testing performance of the 10-fold cross validation performed with all investigated models for the parkinson data, and Fig. 6 depicts the boxplot graphic of the results presented in Table 10. According to Table 10 and Fig. 6, we can note that the RBF model produced the greatest performance, considering the KS2 (0.9340), AUC (0.9670) and ACC (0.9282) measures. However, the DENN model achieved similar performance and the second best performance for the KS2 (0.9067), AUC (0.9563) and ACC (0.9195) measures. Table 11 presents the results of the Friedman test for the parkinson data and statistically confirms the results presented in Table 10 and Fig. 6. Table 12 presents the results of the Tukey test for the parkinson data. According to Table 12, the pairwise evaluation reveals that, except for the RBF model, the DENN model performs better in all pairs, for all investigated measures.

5.5. Ripley data

The Ripley data (Ripley, 1996) is a synthetic classification problem in which each sample has a 2-dimensional feature vector. According to empirical results, the best parameter configuration of the DENN model for this dataset is H = 10, μ = 0.01 and σ = 0.05. Table 13 presents the testing performance of the 10-fold cross validation performed with all investigated models for the Ripley data, and Fig. 7 depicts the boxplot graphic of the results presented in Table 13. According to Table 13 and Fig. 7, although the proposed model did not achieve the greatest performance for the KS2 (0.8118) measure, it outperformed all investigated models considering the AUC (0.9603) and ACC (0.9024) measures; however, the same ACC performance is achieved by the SVM-R model. Table 14 presents the results of the Friedman test for the Ripley data. According to Table 14, we can confirm that the DENN model has the best performance, in statistical terms, only for the AUC measure; considering the KS2 and ACC measures, the DENN model has the fourth and third best performances, respectively. Table 15 presents the results of the Tukey test for the Ripley data. According to Table 15, the pairwise evaluation reveals that, except for the RBF, SVM-P and SVM-R models (considering the KS2 measure) and for the SVM-P and SVM-R models (considering the ACC measure), the DENN model performs better in all pairs, for all investigated measures.

The decision surfaces generated by all investigated models for the Ripley data are depicted in Fig. 8. Once again, according to Fig. 8, we can verify that the decision surface generated by the proposed DENN model is able to efficiently classify the Ripley data. Also, it is possible to observe that the decision surfaces generated by the KNN, MLP, RBF, SVM-P, SVM-R and DENN models have some similarities, as do those generated by the DT, NB and SVM-L models.

5.6. Sonar data

The sonar data (Lichman, 2013) is a classification problem in which each sample represents a sonar signal with characteristics of a metal cylinder or a roughly cylindrical rock. All samples have a 60-dimensional feature vector. According to empirical results, the best parameter configuration of the DENN model for this dataset is H = 25, μ = 0.01 and σ = 0.05.
Fig. 8. Decision surfaces produced by all investigated models for the Ripley data; panels: (a) DT, (b) KNN, (c) MLP, (d) NB, (e) RBF, (f) SVM-L, (g) SVM-P, (h) SVM-R, (i) DENN.
Table 16 presents the testing performance of the 10-fold cross validation performed with all investigated models for the sonar data, and Fig. 9 depicts the boxplot graphic of the results presented in Table 16. Table 16 and Fig. 9 reveal that the DENN model produced the greatest performance only for the AUC (0.8954) measure; note that the SVM-P model has the best performance for the KS2 (0.7434) and ACC (0.8749) measures. Table 17 presents the results of the Friedman test for the sonar data. According to Table 17, we can statistically confirm the results presented in Table 16 and Fig. 9, in which the DENN model has the best performance only for the AUC measure. Table 18 presents the results of the Tukey test for the sonar data. According to Table 18, the pairwise analysis reveals that, except for the SVM-P and SVM-R models, the DENN model performs better in all pairs, for all investigated measures.

5.7. Statistical analysis

In Table 19, we summarize the obtained results for all datasets, considering the MEAN statistic, for all performance measures. In this sense, in order to establish a performance rank and a pairwise performance comparison for the investigated models considering all datasets, we applied the Friedman test (Friedman, 1940) with α = 0.05 and the Tukey test (Tukey, 1949) with α = 0.05. Recall that we used the approach proposed by Demsar (2006) and Nobrega and Oliveira (2015) to evaluate the investigated models using multiple datasets. Table 20 presents the results of the Friedman test considering all datasets. According to Table 20, we can statistically confirm the results presented in Table 19, in which the DENN model outperformed all investigated models for all performance measures, considering all datasets. Table 21 presents the results of the Tukey test considering all datasets. According to Table 21, the pairwise analysis reveals that the DENN model performs better in all pairs, for all investigated measures, considering all datasets.

According to the experimental analysis considering all datasets, we can observe significant improvements (statistically confirmed by both the Friedman and Tukey tests) achieved by the proposed DENN model when compared to state-of-the-art models from the literature. Note that the obtained results are overall significant, since the proposed DENN obtained the first place in the rank position (considering the KS2, AUC and ACC measures) in the Friedman test and no loss in the pairwise analysis (also considering the KS2, AUC and ACC measures) in the Tukey test. Both statistics validated the effectiveness of the proposed DENN model for all investigated datasets.
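The Friedman test itself is available in standard tooling; the following sketch (ours, with hypothetical per-fold scores) illustrates the kind of comparison performed in this section.

```python
# Sketch (ours) of a Friedman test over per-fold scores of several models;
# the score vectors below are hypothetical, not the paper's data.
from scipy.stats import friedmanchisquare

denn  = [0.95, 0.93, 0.96, 0.94, 0.97, 0.95, 0.96, 0.94, 0.95, 0.96]
mlp   = [0.93, 0.92, 0.95, 0.93, 0.95, 0.94, 0.94, 0.93, 0.94, 0.95]
svm_r = [0.94, 0.93, 0.94, 0.94, 0.96, 0.94, 0.95, 0.93, 0.95, 0.95]

stat, p = friedmanchisquare(denn, mlp, svm_r)
print(stat, p)  # reject equal performance at alpha = 0.05 when p < 0.05
```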
Fig. 9. Boxplot graphic produced by all investigated models for the sonar data (testing set); panels: (a) KS2, (b) AUC, (c) ACC.
Besides, analyzing the rank positions and the pairwise analysis of all investigated models in both the Friedman and Tukey tests, it is possible to verify that the measured performance of a model in our experiments is strongly related to the choice of the performance measure. Note that opposite results can be found for the same dataset under distinct measures, that is, the best model for a particular measure is not necessarily the best model for another measure; this is the main reason a robust performance analysis needs measures with distinct properties to assess classification performance.

6. Conclusions

In this work we have presented the dilation–erosion neural network (DENN) to deal with binary classification problems. Each processing unit of the DENN is given by a balanced combination of morphological operators, followed by a binary sigmoid activation function, so that each unit combines the characteristics of dilation and erosion operators in a single hybrid morphological operator. In this context, a descending gradient-based learning process using the back-propagation algorithm was presented to train the DENN. Since morphological operators can be seen as special cases of the rank operator, we employed a smoothed rank-based approach to overcome the nondifferentiability of the dilation and erosion operators; such an approach is a necessary step to evaluate the derivatives within the proposed learning process.

The empirical analysis performed in this work demonstrated the effectiveness of the proposed DENN, which achieved classification performance similar or superior to that of classical and state-of-the-art models from the literature (DT, KNN, MLP, NB, RBF, SVM-L, SVM-P and SVM-R), considering the statistical analysis of each dataset individually. When we consider the results from all datasets together, the statistical analysis revealed that the proposed DENN outperformed all these models. Therefore, we can conclude that the proposed model can be successfully applied as a solution for binary classification problems, having superior performance with respect to the investigated models.

As future work, a sensitivity analysis must be done in terms of the number of hidden processing units, the learning rate and the smoothing factor, since we used an empirical methodology to determine these parameters. Also, other approaches must be developed to circumvent the nondifferentiability of morphological operators, in an attempt to improve the gradient estimation of the dilation and erosion operators. In the same way, other gradient-based learning processes must be considered to improve learning performance. Besides, a study about computational complexity and computation time must be done in order to establish a complete cost-performance evaluation of the proposed model. Also, the distributive lattice case can be explored in future works. Finally, an experimental analysis must be done with multi-class classification problems to assess the performance of the proposed model, as well as with other real-life applications such as those defined in Zhang and Chau (2009a, b), Wu et al. (2009), Taormina and Chau (2015), Wang et al. (2015) and Chau and Wu (2010).
References

Araújo, R.A., 2011. A class of hybrid morphological perceptrons with application in time series forecasting. Knowl.-Based Syst. 24, 513–529.
Araújo, R.A., 2012. Hybrid morphological methodology for software development cost estimation. Expert Syst. Appl. 39, 6129–6139.
Araújo, R.A., 2013. Evolutionary learning processes to design the dilation-erosion perceptron for weather forecasting. Neural Process. Lett. 37, 303–333.
Araújo, R.A., Oliveira, A.L.I., Meira, S.R.L., 2015. A hybrid model for high-frequency stock market forecasting. Expert Syst. Appl. 42 (8), 4081–4096.
Araújo, R.A., Oliveira, A., Soares, S., Meira, S., 2012. An evolutionary morphological approach for software development cost estimation. Neural Netw. 32, 285–291.
Araújo, R.A., Sussner, P., 2010. An increasing hybrid morphological-linear perceptron with pseudo-gradient-based learning and phase adjustment for financial time series prediction. In: IEEE International Joint Conference on Neural Networks.
Banon, G.J.F., Barrera, J., 1993. Decomposition of mappings between complete lattices by mathematical morphology, Part 1. General lattices. Signal Process. 30 (3), 299–327.
Baraldi, P., Cannarile, F., Maio, F.D., Zio, E., 2016. Hierarchical k-nearest neighbours classification and binary differential evolution for fault diagnostics of automotive bearings operating under variable conditions. Eng. Appl. Artif. Intell. 56, 1–13.
Birkhoff, G., 1993. Lattice Theory, third ed. American Mathematical Society, Providence.
Breiman, L., Friedman, J., Olshen, R., Stone, C., 1984. Classification and Regression Trees. Wadsworth and Brooks, Monterey, CA.
Chau, K.W., Wu, C.L., 2010. A hybrid model coupled with singular spectrum analysis for daily rainfall prediction. J. Hydroinform. 12 (4), 458–473.
Wang, W.-c., Chau, K.-w., Xu, D.-m., Chen, X.-Y., 2015. Improving forecasting accuracy of annual runoff time series using ARIMA based on EEMD decomposition. Water Resour. Manage. 29 (8), 2655–2675.
Cuninghame-Green, R., 1995. Minimax algebra and applications. In: Hawkes, P. (Ed.), Advances in Imaging and Electron Physics, Vol. 90. Academic Press, New York, NY, pp. 1–121.
Dash, C.S.K., Dash, A.P., Dehuri, S., Cho, S.-B., Wang, G.-N., 2013. DE+RBFNs based classification: A special attention to removal of inconsistency and irrelevant features. Eng. Appl. Artif. Intell. 26 (10), 2315–2326.
Davidson, J.L., 1991. Template learning in morphological neural nets. In: Image Algebra and Morphological Image Processing II. Proceedings of SPIE, vol. 1568, pp. 176–187.
Davidson, J.L., Ritter, G.X., 1990. A theory of morphological neural networks. Proc. SPIE 1215, 378–388.
Davidson, J.L., Talukder, A., 1993. Template identification using simulated annealing in morphology neural networks. In: Proceedings of 2nd Annual Midwest Electro-Technology Conference, Ames, IA, April, pp. 64–67.
Dedekind, R., 1987. Gesammelte mathematische Werke. Math. Ann. Braunschweig.
Demsar, J., 2006. Statistical comparisons of classifiers over multiple data sets. J. Mach. Learn. Res. 7, 1–30.
Devroye, L., Gyorfi, L., Lugosi, G., 1996. A Probabilistic Theory of Pattern Recognition. Springer.
Drew, J.H., Glen, A.G., Leemis, L.M., 2000. Computing the cumulative distribution function of the Kolmogorov-Smirnov statistic. Comput. Statist. Data Anal. 34 (1), 1–15.
Duda, R.O., Hart, P.E., Stork, D.G., 2000. Pattern Classification, second ed. Wiley-Interscience.
Esmi, E., Sussner, P., Bustince, H., Fernandez, J., 2014. Theta-fuzzy associative memories (Theta-FAMs). IEEE Trans. Fuzzy Syst. 23, 313–326.
Esposito, F., Malerba, D., Semeraro, G., 1997. A comparative analysis of methods for pruning decision trees. IEEE Trans. Pattern Anal. Mach. Intell. 19 (5), 476–491.
Fawcett, T., 2006. An introduction to ROC analysis. Pattern Recognit. Lett. 27 (8), 861–874.
Friedman, M., 1940. A comparison of alternative tests of significance for the problem of m rankings. Ann. Math. Statist. 11 (1), 86–92.
Khabou, M.A., Gader, P.D., 2000. Automatic target detection using entropy optimized shared-weight neural networks. IEEE Trans. Neural Netw. 11 (1), 186–193.
Khabou, M.A., Gader, P.D., Keller, J.M., 2000. LADAR target detection using morphological shared-weight neural networks. Mach. Vis. Appl. 11 (6), 300–305.
Lichman, M., 2013. UCI Machine Learning Repository. University of California, School of Information and Computer Sciences, Irvine. URL http://archive.ics.uci.edu/ml.
Matheron, G., 1975. Random Sets and Integral Geometry. Wiley, New York.
Graña, M., Chyzhyk, D., 2016. Image understanding applications of lattice autoassociative memories. IEEE Trans. Neural Netw. Learn. Syst. 27 (9), 1920–1932.
Nobrega, J.P., Oliveira, A.L., 2015. Kalman filter-based method for online sequential extreme learning machine for regression problems. Eng. Appl. Artif. Intell. 44, 101–110.
Parvin, H., MirnabiBaboli, M., Alinejad-Rokny, H., 2015. Proposing a classifier ensemble framework based on classifier selection and decision tree. Eng. Appl. Artif. Intell. 37, 34–42.
Pessoa, L.F.C., Maragos, P., 2000. Neural networks with hybrid morphological rank linear nodes: a unifying framework with applications to handwritten character recognition. Pattern Recognit. 33, 945–960.
Petridis, V., Kaburlasos, V.G., 1998. Fuzzy lattice neural network (FLNN): a hybrid model for learning. IEEE Trans. Neural Netw. 9 (5), 877–890.
Prechelt, L., 1994. Proben1: A set of neural network benchmark problems and benchmarking rules. Tech. Rep. 21/94.
Ripley, B.D., 1996. Pattern Recognition and Neural Networks. Cambridge University Press, Cambridge, United Kingdom.
Ritter, G.X., Iancu, L., Schmalz, M.S., 2004. A new auto-associative memory based on lattice algebra. Lecture Notes in Comput. Sci. 3287, 148–155.
Ritter, G.X., Li, D., Wilson, J.N., 1989. Image algebra and its relationship to neural networks. In: Technical Symposium Southeast on Optics, Electro-Optics, and Sensors. Proceedings of SPIE, Orlando, FL, March, pp. 90–101.
Ritter, G.X., Sussner, P., 1996. An introduction to morphological neural networks. In: Proceedings of the 13th International Conference on Pattern Recognition, Vienna, Austria, pp. 709–717.
Ritter, G.X., Sussner, P., 1997. Morphological perceptrons. In: ISAS'97, Intelligent Systems and Semiotics, Gaithersburg, Maryland.
Ritter, G.X., Sussner, P., Hacker, W.B., 1997. Associative memories with infinite storage capacity. In: InterSymp'97, 9th International Conference on Systems Research, Informatics and Cybernetics, Baden-Baden, Germany, pp. 281–293, invited plenary paper.
Ritter, G.X., Urcid, G., 2003. Lattice algebra approach to single-neuron computation. IEEE Trans. Neural Netw. 14 (2), 282–295.
Ritter, G.X., Urcid, G., Schmalz, M.S., 2009. Autonomous single-pass endmember approximation using lattice auto-associative memories. Neurocomputing 72 (10–12), 2101–2110.
Ritter, G.X., Wilson, J.N., 2001. Handbook of Computer Vision Algorithms in Image Algebra, second ed. CRC Press, Boca Raton.
Ritter, G.X., Wilson, J.N., Davidson, J.L., 1990. Image algebra: An overview. Comput. Vis. Graph. Image Process. 49 (3), 297–331.
Ronse, C., 1990. Why mathematical morphology needs complete lattices. Signal Process. 21 (2), 129–154.
Sarda-Espinosa, A., Subbiah, S., Bartz-Beielstein, T., 2017. Conditional inference trees for knowledge extraction from motor health condition data. Eng. Appl. Artif. Intell. 62, 26–37.
Serra, J., 1982. Image Analysis and Mathematical Morphology. Academic Press, London.
Image Analysis and Mathematical Morphology. Academic Press, London. Serra, J., 1988. Image Analysis and Mathematical Morphology, Volume 2: Theoretical Advances. Academic Press, New York. Soille, P., 1999. Morphological Image Analysis. Springer Verlag, Berlin. Sousa, R.P., Carvalho, J.M., Assis, F.M., Pessoa, L.F.C., 2000. Designing translation invariant operations via neural network training. In: Proc. of the IEEE Intl Conference on Image Processing, Vancouver, Canada. Sternberg, S.R., 1985. Overview of image algebra and related issues. In: Levialdi, S. (Ed.), Integrated Technology for Parallel Image Processing. Academic Press, London. Sussner, P., Esmi, E., 2011a. Morphological perceptrons with competitive learning: Latticetheoretical framework and constructive learning algorithm. Inform. Sci. 181 (10), 1929–1950. Sussner, P., Esmi, E.L., 2011b. Morphological perceptrons with competitive learning: Lattice-theoretical framework and constructive learning algorithm. Inform. Sci. 181 (10), 1929–1950. Sussner, P., Esmi, E., Villaverde, I., na, M.G., 2012. The kosko subsethood fuzzy associative memory (ks-fam): Mathematical background and applications in computer vision. J. Math. Imaging Vision 42, 134–149. Sussner, P., Valle, M.E., 2006. Grayscale morphological associative memories. IEEE Trans. Neural Netw. 17 (3), 559–570. Sussner, P., Valle, M., 2007. Morphological and certain fuzzy morphological associative memories for classification and prediction. In: Kaburlassos, V.G., Ritter, G.X. (Eds.), Computational Intelligence Based on Lattice Theory, Vol. 67. Springer Verlag, Heidelberg, Germany, pp. 149–173. Taormina, R., Chau, K.-W., 2015. Data-driven input variable selection for rainfall-runoff modeling using binary-coded particle swarm optimization and Extreme Learning Machines. J. Hydrol. 529 (Part 3), 1617–1632. Tukey, J.W., 1949. Comparing individual means in the analysis of variance. Biometrics 5, 99–114.
Araújo, R.A., Oliveira, A.L.I., Meira, S.R.L., 2015. A hybrid model for high-frequency stock market forecasting. Expert Syst. Appl. 42 (8), 4081–4096. Araújo, R.A., Oliveira, A., Soares, S., Meira, S., 2012. An evolutionary morphological approach for software development cost estimation. Neural Netw. 32, 285–291. Araújo, R.A., Susner, P., 2010. An increasing hybrid morphological-linear perceptron with pseudo-gradient-based learning and phase adjustment for financial time series prediction. In: IEEE International Joint Conference on Neural Networks. Banon, G.J.F., Barrera, J., 1993. Decomposition of mappings between complete lattices by mathematical morphology, Part 1. general lattices. Signal Process. 30 (3), 299–327. Baraldi, P., Cannarile, F., Maio, F.D., Zio, E., 2016. Hierarchical k-nearest neighbours classification and binary differential evolution for fault diagnostics of automotive bearings operating under variable conditions. Eng. Appl. Artif. Intell. 56, 1–13. Birkhoff, G., 1993. Lattice Theory, third ed.. American Mathematical Society, Providence. Breiman, L., Friedman, J., Olshen, R., Stone, C., 1984. Classification and Regression Trees. Wadsworth and Brooks, Monterey, CA. Chau, K.W., Wu, C.L., 2010. A hybrid model coupled with singular spectrum analysis for daily rainfall prediction. J. Hydroinform. 12 (4), 458–473. chuan Wang, W., wing Chau, K., mei Xu, D., Chen, X.-Y., 2015. Improving forecasting accuracy of annual runoff time series using arima based on eemd decomposition. Water Resour. Manage. 29 (8), 2655–2675. Cuninghame-Green, R., 1995. Minimax algebra and applications. In: Hawkes, P. (Ed.), Advances in Imaging and Electron Physics, Vol. 90. Academic Press, New York, NY, pp. 1–121. Dash, C.S.K., Dash, A.P., Dehuri, S., Cho, S.-B., Wang, G.-N., 2013. De+rbfns based classification: A special attention to removal of inconsistency and irrelevant features. Eng. Appl. Artif. Intell. 26 (10), 2315–2326. Davidson, J.L., 1991. Template learning in morphological neural nets. In: Image Algebra and Morphological Image Processing II. In: Proceedings of SPIE, vol. 1568, pp. 176– 187. Davidson, J.L., Ritter, G.X., 1990. A theory of morphological neural networks. Proc. SPIE 1215, 378–388. Davidson, J.L., Talukder, A., 1993. Template identification using simulated annealing in morphology neural networks. In: Proceedings of 2nd Annual Midwest ElectroTechnology Conference, Ames, IA, April, pp. 64–67. Dedekind, R., 1987. Gesammelte mathematische werke. Math. Ann. Braunschweig. Demsar, J., 2006. Statistical comparisons of classifiers over multiple data sets. J. Mach. Learn. Res. 7, 1–30. Devroye, L., Gyorfi, L., Lugosi, G., 1996. A Probabilistic Theory of Pattern Recognition. Springer. Drew, J.H., Glen, A.G., Leemis, L.M., 2000. Computing the cumulative distribution function of the Kolmogorov-Smirnov statistic. Comput. Statist. Data Anal. 34 (1), 1– 15. Duda, R.O., Hart, P.E., Stork, D.G., 2000. Pattern Classification, second ed.. WileyInterscience. Esmi, E., Sussner, P., Bustince, H., Fernandez, J., 2014. Theta-Fuzzy associative memories (Theta-FAMs). IEEE Trans. Fuzzy Syst. 23, 313–326. Esposito, F., Malerba, D., Semeraro, G., 1997. A comparative analysis of methods for pruning decision trees. IEEE Trans. Pattern Anal. Mach. Intell. 19 (5), 476–491. Fawcett, T., 2006. An introduction to ROC analysis. Pattern Recognit. Lett. 27 (8), 861– 874. Friedman, M., 1940. A comparison of alternative tests of significance for the problem of 𝑚 rankings. Ann. Math. Statist. 11 (1), 86–92. 
Gacquer, D., Delcroix, V., Delmotte, F., Piechowiak, S., 2011. Comparative study of supervised classification algorithms for the detection of atmospheric pollution. Eng. Appl. Artif. Intell. 24 (6), 1070–1083. Gader, P.D., Khabou, M.A., Koldobsky, A., 2000. Morphological regularization neural networks. Pattern Recogn. Spec. Issue Math. Morphol. Appl. 33 (6), 935–945. Ganesh, M.R., Krishna, R., Manikantan, K., Ramachandran, S., 2014. Entropy based binary particle swarm optimization and classification for ear detection. Eng. Appl. Artif. Intell. 27, 115–128. Graña, M., Gallego, J., Torrealdea, F.J., D’Anjou, A., 2003. On the application of associative morphological memories to hyperspectral image analysis. Lecture Notes in Comput. Sci. 2687, 567–574. Green, R.C., 1979. Minimax Algebra: Lecture Notes in Economics and Mathematical Systems, Vol. 166. Springer-Verlag, New York. Han, L., Han, L., Zhao, H., 2013a. Orthogonal support vector machine for credit scoring. Eng. Appl. Artif. Intell. 26 (2), 848–862. Han, X., Quan, L., Xiong, X., Wu, B., 2013b. Facing the classification of binary problems with a hybrid system based on quantum-inspired binary gravitational search algorithm and K-NN method. Eng. Appl. Artif. Intell. 26 (10), 2424–2430. Haykin, S., 2007. Neural Networks and Learning Machines. McMaster University, Canada. Heijmans, H.J.A.M., 1994. Morphological Image Operators. Academic Press, New York, NY. Jamshidi, Y., Kaburlasos, V.G., 2014. gsaINknn: A {GSA} optimized, lattice computing knn classifier. Eng. Appl. Artif. Intell. 35, 277–285. Jiang, L., Li, C., Wang, S., Zhang, L., 2016. Deep feature weighting for naive Bayes and its application to text classification. Eng. Appl. Artif. Intell. 52, 26–39. Kaburlasos, V.G., Petridis, V., 2000. Fuzzy lattice neurocomputing (FLN) models. Neural Netw. 13 (10), 1145–1170.
27
R.de A. Araújo et al.
Engineering Applications of Artificial Intelligence 65 (2017) 12–28
Wu, C., Chau, K., Li, Y., 2009. Methods to improve neural network performance in daily flows prediction. J. Hydrol. 372 (1–4), 80–93. Zhang, J., Chau, K.-W., 2009a. Multilayer ensemble pruning via novel multi-sub-swarm particle swarm optimization. J.-Jucs 15 (4), 840–858.
Zhang, S., Chau, K.-W., 2009b. Dimension reduction using semi-supervised locally linear embedding for plant leaf classification. In: Emerging Intelligent Computing Technology and Applications: 5th International Conference on Intelligent Computing, ICIC 2009, Ulsan, South Korea, September 16-19, 2009. Proceedings. Springer Berlin Heidelberg, Berlin, Heidelberg, pp. 948–955.