International Journal of Approximate Reasoning 106 (2019) 228–243
Design methodology for Radial Basis Function Neural Networks classifier based on locally linear reconstruction and Conditional Fuzzy C-Means clustering ✩

Seok-Beom Roh a, Sung-Kwun Oh a,b,∗, Witold Pedrycz c,d,e, Kisung Seo f, Zunwei Fu b

a Department of Electrical Engineering, The University of Suwon, Wauan-gil, Bongdam-eup, Hwaseong-si, Gyeonggi-do, South Korea
b Key Laboratory of Complex Systems and Intelligent Computing in Universities of Shandong, Linyi University, Linyi, 276005, China
c Department of Electrical & Computer Engineering, University of Alberta, Edmonton AB T6R 2G7, Canada
d Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland
e Department of Electrical and Computer Engineering, Faculty of Engineering, King Abdulaziz University, Jeddah, 21589, Saudi Arabia
f Department of Electronic Engineering, Seokyeong University, Jungneung-Dong 16-1, Sungbuk-gu, Seoul, 02713, South Korea
Article info

Article history:
Received 2 August 2018
Received in revised form 18 October 2018
Accepted 15 January 2019
Available online 21 January 2019

Keywords:
Conditional Fuzzy C-Means clustering
Fuzzy Radial Basis Function Neural Networks
Auxiliary information
Locally linear reconstruction
Outlier detection

Abstract

In this study, a new design method for a Fuzzy Radial Basis Function Neural Networks classifier is proposed. The proposed approach is based on the conditional Fuzzy C-Means clustering algorithm realized with the aid of auxiliary information, which is extracted by the locally linear reconstruction algorithm. Conditional Fuzzy C-Means can analyze the distribution of data (patterns) over the input space when supervised by the auxiliary information. Conditional fuzzy C-Means clustering can substitute for the conventional fuzzy C-Means clustering that is usually used to define the radial basis functions over the input space. It is advocated that the auxiliary information extracted by using locally linear reconstruction can determine which patterns among the entire data set are more important than the others. This assumption is based on the observation that data which cannot be fully reconstructed by the linear combination of their neighbors may convey much more information than data that can be so reconstructed. It is well known that, in the case of a radial basis function neural networks classifier, the classification performance is predominantly based on the distribution of the radial basis functions over the input space. Several experiments are provided to verify the proposed design method for classification problems. © 2019 Elsevier Inc. All rights reserved.
1. Introduction

Among various intelligent algorithms, fuzzy neural networks (FNNs) have become popular over the past few decades [1,15]. Fuzzy neural networks can be considered a type of hybrid system, which integrates the superb learning capabilities of neural networks with the reasoning abilities of fuzzy inference systems.
✩ This paper is part of the Virtual Special Issue on Fuzzy Systems and Fuzzy-Statistical Modeling: Theory and Applications, edited by Jin Hee Yoon, Vilém Novák and Inuiguchi.
∗ Corresponding author.
E-mail address: [email protected] (S.-K. Oh).
https://doi.org/10.1016/j.ijar.2019.01.008 0888-613X/© 2019 Elsevier Inc. All rights reserved.
Neural networks, which are one of the functional components of fuzzy neural networks, come with sound learning algorithms such as the error back propagation (BP) algorithm to estimate their parameters. Fuzzy inference systems, being the other functional component of fuzzy neural networks, support reasoning abilities and deliver an efficient way to represent linguistic knowledge of experts in a formal manner. It can be concluded that fuzzy neural networks combine the advantages of neurocomputing and fuzzy sets and help eliminate their limitations.

Fuzzy radial basis function neural networks (FRBFNNs) are a class of fuzzy neural networks. FRBFNNs have been used widely in various areas such as system modeling, control, and classification [2,5–13]. FRBFNNs are another type of hybrid system, which stems from fuzzy inference systems and neural networks. In particular, FRBFNNs have several attractive properties such as a simple structure, good local approximation performance, and functional equivalence with a simplified class of fuzzy inference systems [2].

One of the key issues in FRBFNNs is the distribution of radial basis functions (RBFs) in the input space. The classification performance of FRBFNNs is closely related to the location of the RBFs over the input space. To determine the location of the RBFs, clustering algorithms, which analyze the distribution of data points over the input space, are generally used. General clustering algorithms can be considered a type of unsupervised learning. The general objective of any data clustering technique is somewhat different from the main requirement that arises when RBFNNs are constructed to deal with classification problems [4].

In this paper, the Conditional Fuzzy C-Means (CFCM) clustering algorithm, which was proposed by Pedrycz [4], is used to determine the distribution of RBFs in the input space with the aid of supervisory information called "auxiliary information", which can help improve the classification performance. The auxiliary information, which is necessary for the CFCM algorithm, is defined by the locally linear reconstruction error based on Locally Linear Embedding (LLE) [14]. In addition, the weighted least square estimation technique is used to estimate the parameters of the consequent parts of the proposed FRBFNNs. To demonstrate the classification performance of the proposed classifier, several machine learning data sets are used.

The contributions of this study are summarized as follows:
– The outlier elimination procedure is applied to reduce the effect of outliers while analyzing the distribution of data patterns over the input space.
– We demonstrate how the importance levels of data patterns are assigned based on the locally linear reconstruction error.
– We introduce how to analyze the distribution of data patterns over the input space and then locate the RBFs according to the importance levels acquired by using the locally linear reconstruction error. In the sequel, this leads to better classification performance.
– The Friedman test and the Bonferroni–Dunn test are used to compare the classification performance of the proposed classifier.

To show the applicability of the proposed classification system, we use two categories of data sets: conventional machine learning data and Laser Induced Breakdown Spectroscopy (LIBS) spectrum data. The conventional data sets are used to compare the classification capability of the proposed classifier with previously studied classifiers.
In contrast to the machine learning data, the LIBS spectrum data come from a real-world application and can be used to assess the viability of the proposed classifier when it is applied to a real-world problem.

The paper is organized as follows. First, in Section 2, we discuss the Conditional Fuzzy C-Means clustering method, which analyzes the distribution of data patterns under the supervision of auxiliary information. In Section 3, we introduce Fuzzy Radial Basis Function Neural Networks with the Conditional Fuzzy C-Means clustering algorithm. Comprehensive studies exploiting a series of experiments are reported in Section 4. Finally, concluding remarks are covered in Section 5.

2. Conditional Fuzzy C-Means clustering with locally linear reconstruction error

As mentioned before, the key issue related to FRBFNNs is where the RBFs should be located over the input space to improve the classification performance of the FRBFNNs. In this study, the locally linear reconstruction error is used as the criterion that determines the importance of each data point. With the aid of the importance degrees of the data points, the locations of the RBFs can be determined by using the conditional fuzzy C-Means clustering algorithm.

2.1. Locally linear reconstruction

To improve the classification performance of FRBFNNs, it becomes necessary to locate the RBFs appropriately across the input space. Generally, clustering methods are used to analyze the data distribution and determine the positions of the centroids of the clusters (viz. RBFs). However, generic clustering methods are usually carried out from the viewpoint of unsupervised learning and exploit only the input data. In order to enhance the classification performance, it seems necessary to locate the RBFs over the input space in the mode of supervised learning. If a supervised clustering algorithm is to be used, a supervision signal that guides the clustering algorithm toward better locations of the RBFs must first be extracted.
The locally linear reconstruction error (LLRE) is used as the supervision signal to help the performance of the clustering algorithm. Locally linear reconstruction (LLR) is the reconstruction phase of locally linear embedding, which is one of the representative nonlinear dimension reduction methods; other such methods include isometric mapping (ISOMAP), local tangent space alignment (LTSA), diffusion maps, and kernel principal component analysis (KPCA) [14]. Locally linear reconstruction can be considered a kind of instance-based learning. In LLR, if it is possible to describe a given data pattern well with its neighbors (i.e., its K-nearest neighbors), the given data pattern carries little unique information; in other words, the information involved in the given data pattern can be described by the linear combination of the information of its neighbors. The importance of a data pattern that can be described by the linear combination of its neighbors is therefore considered to be low. Conversely, a data pattern is considered very important when it is difficult to describe it well with its neighbors. To elaborate on the essence of the LLR method, let us consider a set of patterns $X = \{\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_N\}$ (where $N$ stands for the number of patterns), $\mathbf{x}_k \in \mathbb{R}^m$ (where $m$ denotes the dimensionality of the input space). The objective function of LLR is defined as follows
$$E(\mathbf{w}) = \Big\| \mathbf{x} - \sum_{j=1}^{K} w_j \tilde{\mathbf{x}}_j \Big\|^2 \quad \text{s.t.} \quad \sum_{j=1}^{K} w_j = 1 \tag{1}$$
Here, $\tilde{\mathbf{x}}_j$ denotes the $j$-th neighbor of a given data pattern $\mathbf{x}$, $K$ is the number of neighbors, and $\mathbf{w} \in \mathbb{R}^K$ stands for the weight vector. In addition, $w_j$ denotes the weight of the $j$-th neighbor of the given data pattern $\mathbf{x}$, which is used as a component of the coefficient vector defined as follows.
$$\mathbf{w} = [w_1\ w_2\ \cdots\ w_K]^T \in \mathbb{R}^K \tag{2}$$
The coefficient vector $\mathbf{w}$ is used to reconstruct the given data pattern $\mathbf{x}$ through the linear combination of its neighbors weighted by the components of $\mathbf{w}$. The reconstructed data pattern $\hat{\mathbf{x}}$ is obtained as follows.
$$\hat{\mathbf{x}} = \sum_{j=1}^{K} w_j \tilde{\mathbf{x}}_j = [\tilde{\mathbf{x}}_1\ \tilde{\mathbf{x}}_2\ \cdots\ \tilde{\mathbf{x}}_K]\begin{bmatrix} w_1 \\ w_2 \\ \vdots \\ w_K \end{bmatrix} = \tilde{\mathbf{X}}\mathbf{w} \tag{3}$$
The objective function of LLR can be considered the reconstruction error, i.e., the squared Euclidean distance between the data pattern $\mathbf{x}$ and the pattern reconstructed from its neighbors $\tilde{\mathbf{x}}_j$ with the weight vector $\mathbf{w}$. The optimal weights $\mathbf{w}$ can be estimated by minimizing the objective function (1). The objective function can be expressed in matrix form.
$$E(\mathbf{w}) = \frac{1}{2}\Big\|\mathbf{x} - \sum_{j=1}^{K} w_j\tilde{\mathbf{x}}_j\Big\|^2 = \frac{1}{2}\Big\|\sum_{j=1}^{K} w_j(\mathbf{x} - \tilde{\mathbf{x}}_j)\Big\|^2 = \frac{1}{2}\big[(\mathbf{X}-\tilde{\mathbf{X}})\mathbf{w}\big]^T(\mathbf{X}-\tilde{\mathbf{X}})\mathbf{w} \quad \text{s.t.} \quad \sum_{j=1}^{K} w_j = 1 \tag{4a}$$

$$\mathbf{X} = [\mathbf{x}\ \cdots\ \mathbf{x}] \in \mathbb{R}^{m\times K} \tag{4b}$$

$$\tilde{\mathbf{X}} = [\tilde{\mathbf{x}}_1\ \cdots\ \tilde{\mathbf{x}}_K] \in \mathbb{R}^{m\times K} \tag{4c}$$

$$\mathbf{w} = [w_1\ \cdots\ w_K]^T \in \mathbb{R}^{K\times 1} \tag{4d}$$
The Lagrangian of the above optimization problem is expressed in the form

$$L(\mathbf{w}, \lambda) = \frac{1}{2}\big[(\mathbf{X}-\tilde{\mathbf{X}})\mathbf{w}\big]^T(\mathbf{X}-\tilde{\mathbf{X}})\mathbf{w} + \lambda(\mathbf{e}_K\mathbf{w} - 1) = \frac{1}{2}\mathbf{w}^T(\mathbf{X}-\tilde{\mathbf{X}})^T(\mathbf{X}-\tilde{\mathbf{X}})\mathbf{w} + \lambda(\mathbf{e}_K\mathbf{w} - 1) \tag{5}$$
Here, $\mathbf{e}_K = [1\ \cdots\ 1] \in \mathbb{R}^{1\times K}$. The optimal solution to the optimization problem is expressed as

$$\frac{\partial L(\mathbf{w},\lambda)}{\partial \mathbf{w}} = (\mathbf{X}-\tilde{\mathbf{X}})^T(\mathbf{X}-\tilde{\mathbf{X}})\mathbf{w} + \lambda\mathbf{e}_K^T = \mathbf{0} \tag{6}$$

$$\frac{\partial L(\mathbf{w},\lambda)}{\partial \lambda} = \mathbf{e}_K\mathbf{w} - 1 = 0 \tag{7}$$

$$\mathbf{w} = \lambda\big[(\mathbf{X}-\tilde{\mathbf{X}})^T(\mathbf{X}-\tilde{\mathbf{X}})\big]^{-1}\mathbf{e}_K^T \tag{8}$$

$$\mathbf{e}_K\mathbf{w} = \lambda\,\mathbf{e}_K\big[(\mathbf{X}-\tilde{\mathbf{X}})^T(\mathbf{X}-\tilde{\mathbf{X}})\big]^{-1}\mathbf{e}_K^T = 1 \tag{9}$$

$$\lambda = \frac{1}{\mathbf{e}_K\big[(\mathbf{X}-\tilde{\mathbf{X}})^T(\mathbf{X}-\tilde{\mathbf{X}})\big]^{-1}\mathbf{e}_K^T} \tag{10}$$

$$\mathbf{w} = \frac{\big[(\mathbf{X}-\tilde{\mathbf{X}})^T(\mathbf{X}-\tilde{\mathbf{X}})\big]^{-1}\mathbf{e}_K^T}{\mathbf{e}_K\big[(\mathbf{X}-\tilde{\mathbf{X}})^T(\mathbf{X}-\tilde{\mathbf{X}})\big]^{-1}\mathbf{e}_K^T} \tag{11}$$
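Equations (6)–(11) admit a direct implementation. Below is a minimal numpy sketch of the closed-form weights (11); the function name is ours, and the small ridge term is a common practical safeguard, not part of the derivation, for the case where the Gram matrix is singular.

```python
import numpy as np

def llr_weights_closed_form(x, neighbors):
    """Closed-form LLR weights per Eq. (11); `neighbors` is a (K, m) array."""
    K = neighbors.shape[0]
    D = x[None, :] - neighbors             # rows are x - x_tilde_j, shape (K, m)
    G = D @ D.T                            # (X - X_tilde)^T (X - X_tilde), shape (K, K)
    G += 1e-8 * np.trace(G) * np.eye(K)    # ridge term: guards against a singular G
    e = np.ones(K)
    w = np.linalg.solve(G, e)              # proportional to G^{-1} e_K^T, Eq. (8)
    return w / (e @ w)                     # normalization enforces sum(w) = 1, Eq. (10)
```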
We modify the constraint of the LLR objective function (1) as follows

$$E(\mathbf{w}) = \frac{1}{2}\Big\|\mathbf{x} - \sum_{j=1}^{K} w_j\tilde{\mathbf{x}}_j\Big\|^2 = \frac{1}{2}(\mathbf{x} - \tilde{\mathbf{X}}\mathbf{w})^T(\mathbf{x} - \tilde{\mathbf{X}}\mathbf{w}) \quad \text{subject to} \quad w_j \geq 0\ \ \forall j, \quad \mathbf{e}_K\mathbf{w} = 1 \tag{12}$$
The parameters $\mathbf{w} = [w_1\ \cdots\ w_K]$ should be nonnegative (i.e., $w_j \geq 0,\ \forall j$). Quadratic programming is used to solve the optimization problem (12). The procedure of LLR is shown in Table 1.

Table 1
Pseudo code of locally linear reconstruction.

Step 1: Calculate the distance between each pair of data patterns
    for i = 1 to N
        for j = 1 to N
            d2(i,j) = (xi − xj)T (xi − xj)
        end
    end
Step 2: Sort the distances in ascending order
    [sorted_distance, sorted_index] = sort(d, 'ascend')
Step 3: Find the K-nearest neighbors of each data pattern
    for j = 1 to K
        x̃j = x(sorted_index(j))
    end
Step 4: Solve the quadratic program:
    Q = [x − x̃1 ··· x − x̃K]T [x − x̃1 ··· x − x̃K]
    eK = [1 ··· 1], |eK| = K, b = 1
    lb = [0 ··· 0], |lb| = K
    w = quadprog(Q, [ ], [ ], [ ], eK, b, lb)
Step 5: Reconstruct the new data point as the linear combination of the neighbors of the given data point with the estimated parameters w.
$$\hat{\mathbf{x}} = \tilde{\mathbf{X}}\mathbf{w} \tag{13}$$
Here, $\hat{\mathbf{x}}$ denotes the reconstructed data point, and $\mathbf{w}$ is the parameter vector obtained by quadratic programming.
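As a complement to the MATLAB-style pseudocode of Table 1, the following is a sketch of the nonnegative reconstruction (12)–(13) in Python, assuming numpy/scipy; the function and variable names are ours, and a general-purpose SLSQP solver stands in for quadprog.

```python
import numpy as np
from scipy.optimize import minimize

def llr_reconstruct(x, X, K=5):
    """Reconstruct x from its K nearest neighbors in X (an (N, m) array),
    with weights constrained to w_j >= 0 and sum(w) = 1 as in Eq. (12)."""
    d2 = np.sum((X - x) ** 2, axis=1)          # Step 1: squared distances
    idx = np.argsort(d2)[1:K + 1]              # Steps 2-3: skip x itself, keep K neighbors
    Xt = X[idx]                                # (K, m); rows are the x_tilde_j
    Dm = x[None, :] - Xt
    Q = Dm @ Dm.T                              # quadratic form of Eq. (12)
    cons = ({'type': 'eq', 'fun': lambda w: np.sum(w) - 1.0},)
    res = minimize(lambda w: 0.5 * w @ Q @ w, np.full(K, 1.0 / K),
                   method='SLSQP', bounds=[(0.0, None)] * K, constraints=cons)
    x_hat = Xt.T @ res.x                       # Step 5: Eq. (13)
    return x_hat, res.x
```

The squared residual of this reconstruction is precisely the quantity used as the supervision signal in what follows.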
In the case of conventional RBFNNs, the locations of the RBFs are determined by using a conventional clustering algorithm (in particular, the Fuzzy C-Means clustering algorithm is used to determine the centroids of the clusters and the activation levels of the data patterns). In some studies, supervised clustering algorithms have been used to analyze the distribution of data patterns under the supervision of a supervision signal. In this study, a clustering algorithm operating in a supervised manner is used to determine the locations of the RBFs. The supervision signal is defined by using the locally linear reconstruction error. We assume that the importance of a data pattern depends on the locally linear reconstruction error, which is expressed as
$$RE(\mathbf{x}, \mathbf{w}) = \|\mathbf{x} - \hat{\mathbf{x}}\|^2 = \Big\|\mathbf{x} - \sum_{j=1}^{K} w_j\tilde{\mathbf{x}}_j\Big\|^2 = (\mathbf{x} - \tilde{\mathbf{X}}\mathbf{w})^T(\mathbf{x} - \tilde{\mathbf{X}}\mathbf{w}) \tag{14}$$
Here, RE stands for the reconstruction error. A pattern with a high reconstruction error cannot be easily reconstructed by the linear combination of its neighbors. This means that the information involved in such a pattern cannot be replaced by the linear combination of the information involved in its neighboring data points. In other words, the data pattern is too unique (different) to be replaced by the reconstruction from other data patterns. The uniqueness of a data pattern underlines its importance.
Fig. 1. Membership function used in the quantification of importance of data.
Therefore, the uniqueness, as indicated by the reconstruction error, can be used in the realization of a supervision signal. We use a membership function to transform the uniqueness of data patterns into their importance levels defined in the [0, 1] interval. The domain and the range of the transforming function $f$ are $x \geq 0$ and $0 \leq y \leq 1$, respectively. In addition, the transforming function should be monotonically increasing. In other words, the more unique a data pattern is, the more important it should be considered when analyzing the distribution of data patterns. The membership function used to determine the importance of data patterns based on their reconstruction errors is described as follows
$$B(\mathbf{x}) = f\big(RE(\mathbf{x}); a, b\big) \tag{15a}$$

$$f(x; a, b) = \begin{cases} 0, & x \le a \\ 2\left(\dfrac{x-a}{b-a}\right)^2, & a \le x \le \dfrac{a+b}{2} \\ 1 - 2\left(\dfrac{x-b}{b-a}\right)^2, & \dfrac{a+b}{2} \le x \le b \\ 1, & x \ge b \end{cases} \tag{15b}$$
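Numerically, (15b) is the standard S-shaped membership function. A minimal sketch, assuming numpy; the parameter values in the usage comment are purely illustrative.

```python
import numpy as np

def importance(re, a, b):
    """Map reconstruction errors re >= 0 to importance levels in [0, 1], Eq. (15b)."""
    re = np.asarray(re, dtype=float)
    mid = 0.5 * (a + b)
    return np.where(re <= a, 0.0,
           np.where(re <= mid, 2.0 * ((re - a) / (b - a)) ** 2,
           np.where(re <= b, 1.0 - 2.0 * ((re - b) / (b - a)) ** 2, 1.0)))

# e.g., importance([0.1, 0.5, 0.9], a=0.2, b=0.8) -> array([0., 0.5, 1.])
```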
Here, $B$ denotes the importance of a data pattern, and $a$ and $b$ are parameters that control the shape of the transforming function. The membership function transforming a reconstruction error into the corresponding importance of a given pattern is shown in Fig. 1. As shown there, we can change the relation between uniqueness and importance by adjusting the values of the shape parameters $a$ and $b$.

2.2. Conditional Fuzzy C-Means clustering with the auxiliary information

The idea of Conditional Fuzzy C-Means (c-FCM, for short) clustering proposed in [4] was applied to the design of RBF neural networks as presented in [7]. To elaborate on the essence of the method, let us consider a set of patterns $X = \{\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_N\}$, $\mathbf{x}_k \in \mathbb{R}^m$ (where $m$ stands for the dimensionality of the input space), along with an auxiliary information granule, which is defined over the boundary area. Each element of $X$ is then associated with the auxiliary information granule (fuzzy set) $B$ given by (15). In conditional clustering, the data pattern $\mathbf{x}_k$ is clustered by taking into consideration the conditions (auxiliary information expressed in the form $B(\mathbf{x}_1), B(\mathbf{x}_2), \ldots, B(\mathbf{x}_N)$) based on some linguistic term expressed as a fuzzy set $B$ ($B : \mathbb{R}^m \to [0, 1]$). The objective function used in conditional fuzzy clustering is the same as the one used in FCM, namely
J=
c n (u ik ) p · xk − vi 2
(16)
i =1 k =1
where $J$ is the objective function, $u_{ik}$ is the activation level associated with the linguistic term $B$ defining the boundary area, $\mathbf{v}_i$ is the $i$-th cluster prototype, and $c$ is the number of rules (clusters) formed for this context. The difference between FCM and c-FCM comes in the form of the constraint imposed on the partition matrix, where we now have
Fig. 2. Data patterns of toy problem.
$$\sum_{i=1}^{c} u_{ik} = B(\mathbf{x}_k) \tag{17}$$
Here, $B(\mathbf{x}_k)$ is the linguistic term (fuzzy set) expressing the degree to which the input datum $\mathbf{x}_k$ is involved in the boundary area. Now the optimization problem is formulated in the following form
$$\min_{\mathbf{U},\,\mathbf{v}} J \quad \text{subject to} \quad \sum_{i=1}^{c} u_{ik} = B(\mathbf{x}_k) \tag{18}$$
The iterative optimization scheme is governed by two update formulas, by means of which we successively modify the partition matrix and the prototypes
$$u_{ik} = \frac{B(\mathbf{x}_k)}{\sum_{j=1}^{c}\left(\dfrac{\|\mathbf{x}_k - \mathbf{v}_i\|}{\|\mathbf{x}_k - \mathbf{v}_j\|}\right)^{2/(p-1)}} \tag{19a}$$

$$\mathbf{v}_i = \frac{\sum_{k=1}^{n}(u_{ik})^p\,\mathbf{x}_k}{\sum_{k=1}^{n}(u_{ik})^p} \tag{19b}$$
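A compact sketch of the alternating updates (19a)–(19b), assuming numpy; the function name, the random initialization scheme, and the fixed iteration count are our choices, not prescriptions of the paper.

```python
import numpy as np

def conditional_fcm(X, B, c, p=2.0, n_iter=100, seed=0):
    """c-FCM: column k of the partition matrix U sums to B(x_k), not to 1."""
    rng = np.random.default_rng(seed)
    V = X[rng.choice(X.shape[0], size=c, replace=False)]   # initial prototypes
    for _ in range(n_iter):
        d = np.linalg.norm(X[None, :, :] - V[:, None, :], axis=2) + 1e-12  # (c, n)
        inv = d ** (-2.0 / (p - 1.0))
        U = B[None, :] * inv / inv.sum(axis=0, keepdims=True)              # Eq. (19a)
        Up = U ** p
        V = (Up @ X) / Up.sum(axis=1, keepdims=True)                       # Eq. (19b)
    return U, V
```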
So far, we have explained how to determine the center points of the RBFs by using FCM with the aid of the auxiliary information defined by the reconstruction error and the membership function. To illustrate the proposed procedure for locating the RBFs over the input space, the data patterns shown in Fig. 2 are used. In Fig. 3, the allocation of the centroids of the clusters (i.e., the centroids of the RBFs) is shown.

In the first step, the outliers included in the data patterns should be determined, as shown in Fig. 3-a. To identify which data patterns are regarded as outliers, we use the minimum distance between a data pattern and the other data patterns that belong to different classes. The distances between a given data pattern and the data patterns belonging to different classes are calculated in the form
$$D_i = \{d_{i1}, d_{i2}, \ldots, d_{i,N-|C_k|}\}, \qquad d_{ij} = \|\mathbf{x}_i - \mathbf{x}_j\|, \quad \mathbf{x}_i \in C_k,\ \mathbf{x}_j \in C_l,\ l \neq k \tag{20}$$
Here, $C_k$ denotes the set of data patterns belonging to the $k$-th class, $|C_k|$ is the number of data of the $k$-th class, and $N$ is the size of the entire data set. The distance between a given data pattern $\mathbf{x}_i$ and the other classes is taken as the minimum distance among the elements of the distance set
$$\text{Dist}(\mathbf{x}_i) = d_{i,i^*}, \qquad i^* = \arg\min_{j} d_{ij} \tag{21}$$
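A sketch of (20)–(21) in numpy; `y` holds integer class labels, and `L` (the number of patterns flagged as outliers, introduced just below) is a design parameter. All names are ours.

```python
import numpy as np

def interclass_min_distance(X, y):
    """Dist(x_i): minimum distance from x_i to any pattern of a different class."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)   # all pairwise d_ij
    d[y[:, None] == y[None, :]] = np.inf                        # keep only l != k pairs
    return d.min(axis=1)                                        # Eq. (21)

def flag_outliers(X, y, L):
    """Indices of the L patterns with the largest inter-class distances."""
    return np.argsort(interclass_min_distance(X, y))[-L:]
```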
Fig. 3. Procedure to calculate auxiliary information.
We determine the outliers among the whole set of data patterns in terms of the distance of each pattern to the other classes. In other words, the L data patterns with the largest distances are determined as outliers.

3. Fuzzy radial basis function neural networks with Conditional Fuzzy C-Means clustering algorithm

As mentioned before, the auxiliary information for CFCM is defined by using outlier elimination and the LLRE. The LLR algorithm is based on simple geometric intuitions [3]. Provided there are sufficient data, each data point and its neighbors are expected to lie on or close to a locally linear patch of the manifold. The geometry of these patches is characterized by linear coefficients that reconstruct each data point from its neighbors. In this study, FRBFNNs constructed by using CFCM with the aid of the auxiliary information obtained from the LLRE are used as a classifier. RBFNNs are reported to have several advantages, including global optimal approximation and classification capabilities as well as rapid convergence of the underlying learning procedures. The generic topology of FRBFNNs is depicted in Fig. 4.
Fig. 4. An overall topology of radial basis function neural networks with auxiliary information.
In Fig. 4, $u_i$, $i = 1, \ldots, R$ denote the receptive fields (radial basis functions), while the parameter $m$ denotes the number of input variables. The output of the FRBFNNs is expressed as a linear combination of the outputs ($u_i$) of the corresponding hidden nodes with the connection weights ($f_i$), as in (22).

$$\hat{y}(\mathbf{x}_k) = \sum_{i=1}^{R} u_i f_i(\mathbf{x}_k) \tag{22}$$
To estimate the parameters, we use the orthogonal least square method and the weighted least square estimation method. Proceeding with the optimization details, the objective function of Least Square Error (LSE) reads as follows
J=
n
gi −
i =1
g1
u ji f j (xi )
= (G − Θ a)T (G − Θ a)
(23)
m
j =1 ai j x j ,
⎤
⎡
⎢ g2 ⎥ ⎢ ⎥ G = ⎢ . ⎥, ⎣ .. ⎦
2
j
where f i (x) = ai0 +
⎡
c
gn
u 11 Θ = ⎣ u 21 un1
a = a10
u 11 x11 u 21 x12 un1 x1n
⎤ · · · u 11 xm1 u 12 u 12 x11 · · · u 12 xm1 · · · u 1c xm1 · · · u 21 xm2 u 22 u 22 x12 · · · u 22 xm2 · · · u 2c xm2 ⎦ , · · · un1 xmn un2 un2 x1n · · · un2 xmn · · · unc xmn
· · · a1m a20 a21 · · · a2m · · · acm
a11
T
∈ c(m+1)
The optimized values of the coefficients are expressed in a well-known manner

$$\mathbf{a} = (\boldsymbol{\Theta}^T\boldsymbol{\Theta})^{-1}\boldsymbol{\Theta}^T\mathbf{G} \tag{24}$$
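A sketch of how the design matrix $\boldsymbol{\Theta}$ of (23) can be assembled and (24) solved, assuming numpy; `U` is the (n, c) array of activation levels $u_{ik}$, and the helper names are ours.

```python
import numpy as np

def build_theta(U, X):
    """Row k of Theta concatenates, for each RBF i, the block u_ik * [1, x_k1, ..., x_km]."""
    n = X.shape[0]
    Xe = np.hstack([np.ones((n, 1)), X])                  # [1, x_1, ..., x_m] per pattern
    return np.hstack([U[:, [i]] * Xe for i in range(U.shape[1])])

def lse(Theta, G):
    """Ordinary least squares, Eq. (24); lstsq is used for numerical stability."""
    return np.linalg.lstsq(Theta, G, rcond=None)[0]
```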
When we use the weighted LSE to estimate the coefficients of the local models, we assume that each data pattern comes with its priority (relevance): data patterns with high priority significantly affect the estimation process, whereas data with low
priority participate to a limited degree and can be almost neglected. The activation levels of the linguistic variable defining the boundary area can be considered as the priority index. As noted earlier, we emphasize the data positioned within the boundary area. Unlike the conventional LSE, the objective function of the weighted LSE is defined as follows
J=
n
2 c
q B (xi ) gi − u ji f j (xi ) = (G − Θ a)T Dq (G − Θ a)
i =1
⎡ ⎢ ⎢ ⎣
B (x1 )
0 B (x2 )
where D = ⎢
(25)
j =1
..
.
⎤ ⎥ ⎥ ⎥. ⎦
0 B (xn ) In the above expression, q denotes the linguistic modifier of the activation level of the boundary area. If the values of q get higher than 1, we arrive at higher specificity of the underlying linguistic information while an opposite effect becomes present when dealing with the lower values of q [3]. Note that the diagonal partition matrix D is the reduced matrix, which is composed of the activation levels of all data pairs to the linguistic term B as the diagonal elements. The optimal values of the coefficients by using the weighted LSE are determined in a well-known manner.
$$\mathbf{a} = (\boldsymbol{\Theta}^T\mathbf{D}^q\boldsymbol{\Theta})^{-1}\boldsymbol{\Theta}^T\mathbf{D}^q\mathbf{G} \tag{26}$$
The final output of the RBF NNs comes in the following form

$$\hat{\mathbf{Y}} = \boldsymbol{\Theta}\mathbf{a} \tag{27}$$
The estimated class label is determined by using the well-known decision rule

$$\hat{g}_i = \begin{cases} 0, & \hat{y}_i < 0 \\ 1, & \hat{y}_i \ge 0 \end{cases} \tag{28}$$
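A sketch of the weighted estimation (26) together with (27)–(28), assuming numpy; `B` is the vector of importance levels $B(\mathbf{x}_i)$ and `q` the linguistic modifier. The diagonal matrix $\mathbf{D}^q$ is never formed explicitly.

```python
import numpy as np

def wlse(Theta, G, B, q=1.0):
    """Weighted least squares, Eq. (26): a = (Theta^T D^q Theta)^{-1} Theta^T D^q G."""
    Tw = Theta * (B ** q)[:, None]             # D^q Theta, computed row-wise
    return np.linalg.solve(Theta.T @ Tw, Tw.T @ G)

def predict_labels(Theta, a):
    y_hat = Theta @ a                          # Eq. (27)
    return (y_hat >= 0).astype(int)            # decision rule, Eq. (28)
```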
The pseudocode used to construct the classifier is summarized in Table 2. Considering this pseudocode, there are several essential parameters of the proposed RBF NNs that impact the performance of the classifier. These parameters concern the number of clusters (C), the value of the fuzzification coefficient (p), and the linguistic modifier (q).
Table 2
Pseudocode for the construction of the RBF NNs classifier with the boundary area.

Main Procedure of Radial Basis Function Neural Networks Classifier
    Decide upon the design parameters:
        a. The number of clusters (R) used to define the boundary area
        b. The number of RBFs (C)
        c. Order of the polynomial (O)
        d. Fuzzification coefficient (p)
        e. Linguistic modifier (q)
        f. The number of nearest neighbors (K)
    Define auxiliary information based on the locally linear reconstruction error:
        a. Reconstruct the given data patterns by using (12)
        b. Calculate the reconstruction error by using (14)
        c. Calculate the importance of the data patterns by using (15)
    Determine the location of the RBFs with the auxiliary information by using c-FCM:
        a. Calculate the partition matrix and the prototypes of the RBFs using (19a) and (19b)
    Estimate the coefficients of the local models:
        a. Weighted least square estimation using (26)
    Calculate the output of the radial basis function neural networks:
        a. Calculate the final output of the RBF NNs using (27)
    Decide the class label of each data pattern:
        a. Determine the class label using decision rule (28)
End Procedure of Radial Basis Function Neural Networks Classifier
Table 3
Selected numeric values of the parameters of the proposed model.

Parameter                         Value
Polynomial order (O)              0 (constant), 1 (linear), or 2 (quadratic)
Number of RBFs (c)                2–9
Fuzzification coefficient (p)     1.5–3.0, varying with a step of 0.5
Number of nearest neighbors (K)   1.0 or 2.0
Fig. 5. Two-dimensional synthetic data.
4. Experimental studies

In order to evaluate and quantify the classification effectiveness of the proposed classifier, it is experimented with by making use of a series of numeric data, including two synthetic datasets and several Machine Learning datasets (http://www.ics.uci.edu/~mlearn/MLRepository.html). In the assessment of the performance of the classifiers, we use the error rate of the resulting classifier. We investigate and report the results of each experiment in terms of the mean and the standard deviation of the performance index (error rate).

We consider some predefined values of the parameters of the network, which are summarized in Table 3. The choice of these particular numeric values has been motivated by the need to investigate the performance of the model in a fairly comprehensive range of scenarios. The numeric values in Table 3 were selected through a trial-and-error process by running a number of experiments, monitoring the pace of learning, and assessing its convergence.

4.1. Two-dimensional data

In order to illustrate the characteristics and performance of the proposed classifier, such as the location of the centroids of the clusters determined by CFCM under the supervision of the auxiliary information and the boundary surface estimated by the proposed classifier, two-dimensional synthetic examples are of instructional value. The two-dimensional synthetic data are shown in Fig. 5. Each cluster consists of 100 data points.

The classification results of the proposed classifier are shown in Fig. 6. To locate the centroids of the clusters over the input space, we first extract the auxiliary information by using the locally linear reconstruction error, as shown in Fig. 6-b. Based on the obtained auxiliary information, CFCM is used to determine the locations of the centroids of the clusters, as depicted in Fig. 6-c. In Fig. 6-d, we can see the difference between the center points obtained by using FCM and those obtained by using CFCM. The centroids determined by using CFCM are located along the boundary surface.

In Fig. 7, the center points of the clusters and the boundary surface estimated by WLSE or LSE are shown. The number of clusters and the fuzzification coefficient are 4 and 2.0, respectively. In addition, linear functions are used as the consequent part of the RBFNNs classifier.
Fig. 6. Results of localization of clusters with the aid of auxiliary information.
In Fig. 8, the center points of the clusters and the boundary surface of the RBFNNs estimated by WLSE or LSE are shown. The number of clusters and the fuzzification coefficient are 4 and 2.0, respectively. In addition, the order of the consequent part of the RBFNNs is 2 (i.e., a quadratic polynomial is used as the consequent part of the RBFNNs classifier).

4.2. Machine learning data sets

In what follows, we report on several experiments using machine learning data sets coming from the UCI Machine Learning Repository (http://www.ics.uci.edu/~mlearn/MLRepository.html). Table 4 summarizes the pertinent details of the 13 data sets, such as the number of features and the number of patterns. In addition, Table 5 contrasts the classification performance of the proposed classifier with the performance of other classification methods such as Bayesian Networks, Logistic Classifier, Support Vector Machine (SVM), K-Nearest Neighbor, and Multilayer Perceptron. These classifiers were experimented with within the data mining framework of WEKA [16].

The Friedman test is well known as a non-parametric equivalent of the repeated-measures analysis of variance (ANOVA). In this study, the Friedman test is used to compare the classification performance of the proposed classifier with several previously studied classifiers, namely Bayesian Networks, Logistic Regression, Support Vector Machine, K-Nearest Neighbors, and Multilayer Neural Networks. The Friedman test checks whether the measured average ranks are significantly different from the mean rank. The rank of each classifier for each data set is reported in Table 5 (denoted inside parentheses).
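As a sketch of how such a comparison can be run, assuming scipy: the snippet below computes the Friedman statistic and the average ranks from a (datasets x classifiers) accuracy table. The array contents are placeholders, and the FF statistic quoted later in this section appears to be the Iman-Davenport correction shown in the last line.

```python
import numpy as np
from scipy.stats import friedmanchisquare, rankdata

# placeholder for the 13 x 6 accuracy entries of Table 5
rng = np.random.default_rng(0)
acc = rng.uniform(60.0, 100.0, size=(13, 6))

chi2, pval = friedmanchisquare(*acc.T)        # one sample per classifier
ranks = rankdata(-acc, axis=1).mean(axis=0)   # average ranks (1 = best), as in Table 5

N, k = acc.shape                              # N datasets, k classifiers
FF = (N - 1) * chi2 / (N * (k - 1) - chi2)    # F statistic with df (k-1, (k-1)(N-1))
```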
Fig. 7. Center points of clusters and boundary surface of proposed classifier and conventional RBFNNs classifier (with 4 RBFs, fuzzification coefficient set to 2, and linear consequent function).
Table 4
Machine learning datasets used in the experiments.

Datasets            Number of features   Number of patterns (data)   Number of classes
Australian (UCI)    42                   690                         2
Balance (UCI)       4                    625                         3
Diabetes (UCI)      8                    768                         2
Glass (UCI)         9                    214                         6
Hayes (UCI)         5                    132                         3
Ionosphere (UCI)    34                   351                         2
Iris (UCI)          4                    150                         3
Liver (UCI)         6                    345                         2
Sonar (UCI)         60                   208                         2
Thyroid (UCI)       5                    215                         3
Vehicle (UCI)       18                   846                         4
Wine (UCI)          13                   178                         3
Zoo (UCI)           16                   101                         7
Fig. 8. Center points of clusters and boundary surface of proposed classifier and conventional RBFNNs classifier (with 4 RBFs, 2.0 fuzzification coefficient, and quadratic consequent function).
From the average ranks shown in Table 5, the Friedman test rejects the null hypothesis, since the statistic FF = 6.177 is greater than the critical value F(5, 60) = 2.368 at the significance level of 0.05. Since the null hypothesis was rejected by the Friedman test, a post-hoc test is carried out to examine whether the proposed classifier is statistically significantly better than the other classifiers. To compare the classification performance statistically, the Bonferroni–Dunn test is used: a classifier is deemed better than another when the difference between the ranks of the two classifiers is greater than the critical difference (CD). Under the conditions of this experiment, CD = 2.091. The differences between the average rank of the proposed model and those of the previously studied classifiers (Bayesian Networks, Logistic Classifier, SVM, K-Nearest Neighbor, and Multilayer Perceptron) are 2.077, 1.731, 2.346, 3.154, and 1.077, respectively. From the Bonferroni–Dunn test, we can say that the proposed classifier is better than SVM and K-Nearest Neighbor.

From the experimental results shown in Table 5, the proposed classifier shows better classification performance than the Bayesian Network on 11 of the 13 datasets (win: 11, loss: 2). When compared with Logistic Regression, the proposed classifier is also preferred on 9 of the 13 data sets (win: 9, loss: 2, tie: 2). Finally, when it comes to the Multilayer Perceptron, the proposed classifier yields better classification performance on 10 of the 13 data sets (win: 10, loss: 2, tie: 1).

4.3. Laser induced breakdown spectroscopy spectrum data to recycle black plastic waste

In this experiment, the spectra of black plastic wastes obtained by using LIBS are used to evaluate the proposed classifier. Environmental problems and energy depletion have become major issues all over the world. Among the various
Table 5
Results of comparative analysis for machine learning data sets (the best results shown in boldface).

Data sets      Proposed classifier   BayesNet*   Logistic*    SVM* (Poly)   KNN* (K = 5)   Multilayer perceptron*
Australian     87.39(1)              85.13(4)    86.67(2)     85.46(3)      84.42(5)       83.13(6)
Balance        87.84(3)              73.73(6)    89.14(2)     87.46(4)      87.22(5)       91.15(1)
Diabetes       74.49(5)              75.19(3)    77.42(1)     76.98(2)      73.61(6)       74.62(4)
Glass          64.51(4)              70.6(1)     63.88(5)     56.83(6)      65.42(3)       66.09(2)
Hayes          82.71(1)              60.63(3)    56.13(4)     54.68(5)      34.46(6)       72.7(2)
Ionosphere     91.46(1)              89.94(3)    86.87(5)     88.03(4)      84.67(6)       90.08(2)
Iris           98(1)                 93.93(6)    96.33(3.5)   96.33(3.5)    96.13(5)       96.4(2)
Liver          76.42(1)              57.36(6)    68.52(3)     57.97(5)      60.61(4)       68.75(2)
Sonar          85.61(1)              75.09(5)    71.07(6)     76.39(4)      80.29(3)       82.01(2)
Thyroid        95.81(1)              95.21(3)    95.67(2)     88.51(6)      93.3(4)        91.35(5)
Vehicle        84.99(1)              60.06(6)    79.78(3)     73.5(4)       70.79(5)       80.95(2)
Wine           100(1)                98.31(3)    97.14(5)     98.59(2)      95.68(6)       98.03(4)
Zoo            96.0(2)               96.84(1)    95.06(4)     94.28(5)      93.46(6)       95.96(3)
Average rank   1.769                 3.846       3.5          4.115         4.923          2.846
* Classifiers of WEKA [16].
Fig. 9. Laser induced breakdown spectroscopy.
issues, this experiment focuses especially on black plastic wastes. To recycle plastic wastes, they should be categorized into several classes according to their resins. While colored plastic wastes can easily be identified according to their resins by using Near Infrared (NIR) spectroscopy, it is difficult to identify black plastic wastes because of their characteristics: black colored matter absorbs NIR light, so black plastic wastes cannot be identified by using NIR spectroscopy. To overcome this drawback of NIR spectroscopy, Laser Induced Breakdown Spectroscopy (LIBS) is applied to acquire the characteristic spectra of plastic resins such as Acrylonitrile Butadiene Styrene (ABS), Polypropylene (PP), and Polystyrene (PS). LIBS is a form of emission spectroscopy of laser-produced plasma and a reliable technique for identifying the resin of black plastic. The LIBS system, which was used to acquire the spectra of black plastic wastes in this experiment, is shown in Fig. 9.

The black plastic waste samples, namely 400 PP, 400 PS, and 400 ABS samples, were collected at the Hwasung Recycle Center in Korea. All spectra of the 400 PP, 400 PS, and 400 ABS black plastic samples are shown in Fig. 10. Table 6 summarizes the pertinent details of the LIBS data, such as the number of features, the number of classes, and the number of patterns. Table 7 presents a comparison of the proposed identification algorithm with other well-known and widely used classification algorithms.

5. Conclusions

In this study, we have proposed and investigated fuzzy radial basis function neural networks whose radial basis functions are located by using the conditional fuzzy c-means clustering method under the supervision of the importance of
Fig. 10. Spectrum of black plastic wastes such as (a) PP, (b) PS, and (c) ABS.
Table 6
Application dataset (laser induced breakdown spectroscopy) used in the experiment.

Datasets   Number of features   Number of patterns (data)   Number of classes
LIBS       10240                1200                        3
Table 7
Results of comparative analysis on large data sets (the best results shown in boldface).

Data sets   Proposed classifier   BayesNet*   Logistic*   SVM* (Poly)   KNN* (K = 5)   Multilayer perceptron*
LIBS        96.17                 89.50       92.91       91.83         95.00          95.08
* Classifiers coming from WEKA [16].
each data pattern. In addition, the parameters of the consequent polynomial are estimated by the weighted least square estimation method, which is also guided by the importance of each data pattern. From the experimental results, we can conclude that the effectiveness of the proposed RBFNNs classification system is visibly enhanced by applying the importance of each data pattern to locate the RBFs and to estimate the consequent parameters.
In particular, in the case of real application data such as the LIBS data, the proposed classification algorithm shows better generalization capability than the well-known classification systems. In future research, we will investigate various functions that satisfy the two conditions described above for the membership function, from the viewpoint of classification performance. In addition, we will apply optimization methods to optimize the shape of the membership function and thereby further improve the classification performance of the proposed classifier.

Acknowledgements

This work was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (NRF-2018R1D1A1B07051331 & NRF-2017R1D1A1B03032333).

References

[1] J.-S.R. Jang, C.-T. Sun, E. Mizutani, Neuro-Fuzzy and Soft Computing: A Computational Approach to Learning and Machine Intelligence, Prentice-Hall, Englewood Cliffs, NJ, 1997.
[2] W. Li, Y. Hori, An algorithm for extracting fuzzy rules based on RBF neural network, IEEE Trans. Ind. Electron. 53 (4) (2006) 1269–1276.
[3] W. Pedrycz, Conditional fuzzy clustering in the design of radial basis function neural networks, IEEE Trans. Neural Netw. 9 (4) (1998) 601–612.
[4] G. Tsekouras, H. Sarimveis, E. Kavakli, G. Bafas, A hierarchical fuzzy-clustering approach to fuzzy modeling, Fuzzy Sets Syst. 150 (2) (2005) 245–266.
[5] W.W.Y. Ng, A. Dorado, D.S. Yeung, W. Pedrycz, E. Izquierdo, Image classification with the use of radial basis function neural networks and the minimization of the localized generalization error, Pattern Recognit. 40 (2007) 19–32.
[6] B.J. Park, W. Pedrycz, S.K. Oh, Polynomial-based radial basis function neural networks (P-RBFNNs) and their application to pattern classification, Appl. Intell. 32 (1) (2008) 27–46; W. Pedrycz, Conditional fuzzy C-means, Pattern Recognit. Lett. 17 (6) (1996) 625–632.
[7] W. Pedrycz, Conditional fuzzy clustering in the design of radial basis function neural networks, IEEE Trans. Neural Netw. 9 (4) (1998) 601–612.
[8] W. Pedrycz, H.S. Park, S.K. Oh, A granular-oriented development of functional radial basis function neural networks, Neurocomputing 72 (2008) 420–435.
[9] C. Renjifo, D. Barsic, C. Carmen, K. Norman, G.S. Peacock, Improving radial basis function kernel classification through incremental learning and automatic parameter selection, Neurocomputing 72 (2008) 3–14.
[10] M. Rocha, P. Cortez, J. Neves, Simultaneous evolution of neural network topologies and weights for classification and regression, in: Computational Intelligence and Bioinspired Systems: 8th Int. Workshop on Artificial Neural Networks, in: Lect. Notes Comput. Sci., vol. 3512, 2005, pp. 59–66.
[11] F. Ros, M. Pintore, J.R. Chretien, Automatic design of growing radial basis function neural networks based on neighborhood concepts, Chemom. Intell. Lab. Syst. 87 (2007) 231–240.
[12] M.R. Senapati, I. Vijaya, P.K. Dash, Rule extraction from radial basis functional neural networks by using particle swarm optimization, J. Comput. Sci. 3 (8) (2007) 592–599.
[13] A. Staiano, R. Tagliaferri, W. Pedrycz, Improving RBF networks performance in regression tasks by means of a supervised fuzzy clustering, Neurocomputing 69 (13–15) (2006) 1570–1581.
[14] Y. Liu, Y. Zhang, Z. Yu, M. Zeng, Incremental supervised locally linear embedding for machinery fault diagnosis, Eng. Appl. Artif. Intell. 50 (2016) 60–70.
[15] W. Huang, S.-K. Oh, Optimized polynomial neural network classifier designed with the aid of space search simultaneous tuning strategy and data preprocessing techniques, J. Electr. Eng. Technol. 12 (2) (2017) 911–917.
[16] E. Frank, M.A. Hall, I.H. Witten, Data Mining: Practical Machine Learning Tools and Techniques, fourth edition, Morgan Kaufmann, 2016.