International Journal of Approximate Reasoning 106 (2019) 228–243
Design methodology for Radial Basis Function Neural Networks classifier based on locally linear reconstruction and Conditional Fuzzy C-Means clustering ✩

Seok-Beom Roh a, Sung-Kwun Oh a,b,∗, Witold Pedrycz c,d,e, Kisung Seo f, Zunwei Fu b

a Department of Electrical Engineering, The University of Suwon, Wauan-gil, Bongdam-eup, Hwaseong-si, Gyeonggi-do, South Korea
b Key Laboratory of Complex Systems and Intelligent Computing in Universities of Shandong, Linyi University, Linyi, 276005, China
c Department of Electrical & Computer Engineering, University of Alberta, Edmonton AB T6R 2G7, Canada
d Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland
e Department of Electrical and Computer Engineering, Faculty of Engineering, King Abdulaziz University, Jeddah, 21589, Saudi Arabia
f Department of Electronic Engineering, Seokyeong University, Jungneung-Dong 16-1, Sungbuk-gu, Seoul, 02713, South Korea
Article info

Article history:
Received 2 August 2018
Received in revised form 18 October 2018
Accepted 15 January 2019
Available online 21 January 2019

Keywords:
Conditional Fuzzy C-Means clustering
Fuzzy Radial Basis Function Neural Networks
Auxiliary information
Locally linear reconstruction
Outlier detection

Abstract

In this study, a new design method for a Fuzzy Radial Basis Function Neural Networks classifier is proposed. The proposed approach is based on the conditional Fuzzy C-Means clustering algorithm realized with the aid of auxiliary information, which is extracted by the locally linear reconstruction algorithm. Conditional Fuzzy C-Means can analyze the distribution of data (patterns) over the input space when supervised by the auxiliary information. Conditional fuzzy C-Means clustering can substitute for the conventional fuzzy C-Means clustering that is usually used to define the radial basis functions over the input space. It is advocated that the auxiliary information extracted by using locally linear reconstruction can determine which patterns among the entire data set are more important than the others. This assumption is based on the observation that data which cannot be fully reconstructed by the linear combination of their neighbors may convey much more information than data that can be so reconstructed. It is well known that, in the case of a radial basis function neural networks classifier, the classification performance is predominantly based on the distribution of the radial basis functions over the input space. Several experiments are provided to verify the proposed design method for classification problems. © 2019 Elsevier Inc. All rights reserved.
1. Introduction

Among various intelligent algorithms, fuzzy neural networks (FNNs) have become popular over the past few decades [1,15]. Fuzzy neural networks can be considered a type of hybrid system, which integrates the superb learning capabilities of neural networks with the reasoning abilities of fuzzy inference systems.
✩ This paper is part of the Virtual Special Issue on Fuzzy Systems and Fuzzy-Statistical Modeling: Theory and Applications, edited by Jin Hee Yoon, Vilém Novák and Inuiguchi.
∗ Corresponding author.
E-mail address: [email protected] (S.-K. Oh).
https://doi.org/10.1016/j.ijar.2019.01.008 0888-613X/© 2019 Elsevier Inc. All rights reserved.
Neural networks, which are one of the functional components of fuzzy neural networks, come with sound learning algorithms such as the error back propagation (BP) algorithm to estimate their parameters. Fuzzy inference systems, being the other functional component of fuzzy neural networks, support reasoning abilities and deliver an efficient way to represent linguistic knowledge of experts in a formal manner. It can be concluded that fuzzy neural networks combine the advantages of neurocomputing and fuzzy sets and help eliminate their limitations.

Fuzzy radial basis function neural networks (FRBFNNs) are a class of fuzzy neural networks. FRBFNNs have been used widely in various areas such as system modeling, control, and classification [2,5–13]. FRBFNNs are another type of hybrid system, which stems from fuzzy inference systems and neural networks. In particular, FRBFNNs have several attractive properties such as a simple structure, good local approximation performance, and functional equivalence with a simplified class of fuzzy inference systems [2].

One of the key issues in FRBFNNs is the distribution of radial basis functions (RBFs) in the input space. The classification performance of FRBFNNs is closely related to the location of the RBFs over the input space. To determine the location of the RBFs, clustering algorithms, which analyze the distribution of data points over the input space, are generally used. General clustering algorithms can be considered a type of unsupervised learning. The general objective of any data clustering technique is somewhat different from the main requirement that arises when RBFNNs are constructed to deal with classification problems [4].

In this paper, the Conditional Fuzzy C-Means (CFCM) clustering algorithm, which was proposed by Pedrycz [4], is used to determine the distribution of RBFs in the input space with the aid of supervisory information called "auxiliary information", which can help improve the classification performance. The auxiliary information, which is necessary for the CFCM algorithm, is defined by the locally linear reconstruction error based on Locally Linear Embedding (LLE) [14]. In addition, the weighted least square estimation technique is used to estimate the parameters of the consequent parts of the proposed FRBFNNs. To demonstrate the classification performance of the proposed classifier, several machine learning data sets are used.

The contributions of this study are summarized as follows:
– The outlier elimination procedure is applied to reduce the effect of outliers while analyzing the distribution of data patterns over the input space.
– We demonstrate how the importance levels of data patterns are assigned based on the locally linear reconstruction error.
– We introduce how to analyze the distribution of data patterns over the input space and then locate the RBFs according to the importance levels acquired by using the locally linear reconstruction error. In the sequel, this leads to better classification performance.
– The Friedman test and the Bonferroni–Dunn test are used to compare the classification performance of the proposed classifier.

To show the applicability of the proposed classification system, we use two categories of data sets: conventional machine learning data and Laser Induced Breakdown Spectroscopy (LIBS) spectrum data. The conventional data sets are used to compare the classification capability of the proposed classifier with previously studied classifiers.
In contrast to the machine learning data, the LIBS spectrum data come from a real-world application and can be used to assess the viability of the proposed classifier when it is applied to a real-world problem.

The paper is organized as follows. First, in Section 2, we discuss the Conditional Fuzzy C-Means clustering method, which analyzes the distribution of data patterns under the supervision of auxiliary information. In Section 3, we introduce Fuzzy Radial Basis Function Neural Networks with the Conditional Fuzzy C-Means clustering algorithm. Comprehensive studies exploiting a series of experiments are reported in Section 4. Finally, concluding remarks are covered in Section 5.

2. Conditional Fuzzy C-Means clustering with locally linear reconstruction error

As mentioned before, the key issue related to FRBFNNs is where the RBFs should be located over the input space to improve the classification performance of the FRBFNNs. In this study, the locally linear reconstruction error is used as the criterion that determines the importance of each data point. With the aid of the importance degrees of the data points, the locations of the RBFs can be determined by using the conditional fuzzy C-Means clustering algorithm.

2.1. Locally linear reconstruction

To improve the classification performance of FRBFNNs, it becomes necessary to locate the RBFs appropriately across the input space. Generally, clustering methods are used to analyze the data distribution and determine the positions of the centroids of the clusters (viz. RBFs). However, generic clustering methods are usually carried out from the viewpoint of unsupervised learning and exploit only the input data. In order to enhance the classification performance, it seems necessary to locate the RBFs over the input space in the mode of supervised learning. If a supervised clustering algorithm is to be used, a supervision signal that guides the clustering algorithm toward better locations of the RBFs must first be extracted.
The locally linear reconstruction error (LLRE) is used as the supervision signal to help the performance of the clustering algorithm. Locally linear reconstruction (LLR) is the reconstruction phase of locally linear embedding, which is one of the representative nonlinear dimension reduction methods; other such methods include isometric mapping (ISOMAP), local tangent space alignment (LTSA), diffusion maps, and kernel principal component analysis (KPCA) [14]. Locally linear reconstruction can be considered a kind of instance-based learning. In LLR, if it is possible to describe a given data pattern well with its neighbors (i.e., its K-nearest neighbors), the given data pattern carries little unique information; in other words, the information involved in the given data pattern can be described by the linear combination of the information of its neighbors. The importance of a data pattern that can be described by the linear combination of its neighbors is therefore considered to be low. Conversely, a data pattern is considered very important when it is difficult to describe it well with its neighbors. To elaborate on the essence of the LLR method, let us consider a set of patterns $X = \{\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_N\}$ (where $N$ stands for the number of patterns), $\mathbf{x}_k \in \mathbb{R}^m$ (where $m$ denotes the dimensionality of the input space). The objective function of LLR is defined as follows
$$E(\mathbf{w}) = \Big\| \mathbf{x} - \sum_{j=1}^{K} w_j \tilde{\mathbf{x}}_j \Big\|^2 \quad \text{s.t.} \quad \sum_{j=1}^{K} w_j = 1 \tag{1}$$
Here, $\tilde{\mathbf{x}}_j$ denotes the $j$-th neighbor of a given data pattern $\mathbf{x}$, $K$ is the number of neighbors, and $\mathbf{w} \in \mathbb{R}^K$ stands for the weight vector. In addition, $w_j$ denotes the weight of the $j$-th neighbor of the given data pattern $\mathbf{x}$, which is used as a component of the coefficient vector defined as follows.
$$\mathbf{w} = [w_1\ w_2\ \cdots\ w_K]^T \in \mathbb{R}^K \tag{2}$$
The coefficient vector $\mathbf{w}$ is used to reconstruct the given data pattern $\mathbf{x}$ through the linear combination of its neighbors weighted by the components of $\mathbf{w}$. The reconstructed data pattern $\hat{\mathbf{x}}$ is obtained as follows.
$$\hat{\mathbf{x}} = \sum_{j=1}^{K} w_j \tilde{\mathbf{x}}_j = [\tilde{\mathbf{x}}_1\ \tilde{\mathbf{x}}_2\ \cdots\ \tilde{\mathbf{x}}_K]\begin{bmatrix} w_1 \\ w_2 \\ \vdots \\ w_K \end{bmatrix} = \tilde{\mathbf{X}}\mathbf{w} \tag{3}$$
The objective function of LLR can be considered the reconstruction error, i.e., the squared Euclidean distance between the data pattern $\mathbf{x}$ and the pattern reconstructed from its neighbors $\tilde{\mathbf{x}}_j$ with the weight vector $\mathbf{w}$. The optimal weights $\mathbf{w}$ can be estimated by minimizing the objective function (1). The objective function can be expressed in matrix form.
$$E(\mathbf{w}) = \frac{1}{2}\Big\|\mathbf{x} - \sum_{j=1}^{K} w_j\tilde{\mathbf{x}}_j\Big\|^2 = \frac{1}{2}\Big\|\sum_{j=1}^{K} w_j(\mathbf{x} - \tilde{\mathbf{x}}_j)\Big\|^2 = \frac{1}{2}\big[(\mathbf{X}-\tilde{\mathbf{X}})\mathbf{w}\big]^T(\mathbf{X}-\tilde{\mathbf{X}})\mathbf{w} \quad \text{s.t.} \quad \sum_{j=1}^{K} w_j = 1 \tag{4a}$$

$$\mathbf{X} = [\mathbf{x}\ \cdots\ \mathbf{x}] \in \mathbb{R}^{m\times K} \tag{4b}$$

$$\tilde{\mathbf{X}} = [\tilde{\mathbf{x}}_1\ \cdots\ \tilde{\mathbf{x}}_K] \in \mathbb{R}^{m\times K} \tag{4c}$$

$$\mathbf{w} = [w_1\ \cdots\ w_K]^T \in \mathbb{R}^{K\times 1} \tag{4d}$$
The Lagrangian of the above optimization problem is expressed in the form

$$L(\mathbf{w}, \lambda) = \frac{1}{2}\big[(\mathbf{X}-\tilde{\mathbf{X}})\mathbf{w}\big]^T(\mathbf{X}-\tilde{\mathbf{X}})\mathbf{w} + \lambda(\mathbf{e}_K\mathbf{w} - 1) = \frac{1}{2}\mathbf{w}^T(\mathbf{X}-\tilde{\mathbf{X}})^T(\mathbf{X}-\tilde{\mathbf{X}})\mathbf{w} + \lambda(\mathbf{e}_K\mathbf{w} - 1) \tag{5}$$
Here, $\mathbf{e}_K = [1\ \cdots\ 1] \in \mathbb{R}^{1\times K}$. The optimal solution to the optimization problem is expressed as

$$\frac{\partial L(\mathbf{w},\lambda)}{\partial \mathbf{w}} = (\mathbf{X}-\tilde{\mathbf{X}})^T(\mathbf{X}-\tilde{\mathbf{X}})\mathbf{w} + \lambda\mathbf{e}_K^T = \mathbf{0} \tag{6}$$

$$\frac{\partial L(\mathbf{w},\lambda)}{\partial \lambda} = \mathbf{e}_K\mathbf{w} - 1 = 0 \tag{7}$$

$$\mathbf{w} = \lambda\big[(\mathbf{X}-\tilde{\mathbf{X}})^T(\mathbf{X}-\tilde{\mathbf{X}})\big]^{-1}\mathbf{e}_K^T \tag{8}$$

$$\mathbf{e}_K\mathbf{w} = \lambda\,\mathbf{e}_K\big[(\mathbf{X}-\tilde{\mathbf{X}})^T(\mathbf{X}-\tilde{\mathbf{X}})\big]^{-1}\mathbf{e}_K^T = 1 \tag{9}$$

$$\lambda = \frac{1}{\mathbf{e}_K\big[(\mathbf{X}-\tilde{\mathbf{X}})^T(\mathbf{X}-\tilde{\mathbf{X}})\big]^{-1}\mathbf{e}_K^T} \tag{10}$$

$$\mathbf{w} = \frac{\big[(\mathbf{X}-\tilde{\mathbf{X}})^T(\mathbf{X}-\tilde{\mathbf{X}})\big]^{-1}\mathbf{e}_K^T}{\mathbf{e}_K\big[(\mathbf{X}-\tilde{\mathbf{X}})^T(\mathbf{X}-\tilde{\mathbf{X}})\big]^{-1}\mathbf{e}_K^T} \tag{11}$$
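Equations (6)–(11) admit a direct implementation. Below is a minimal numpy sketch of the closed-form weights (11); the function name is ours, and the small ridge term is a common practical safeguard, not part of the derivation, for the case where the Gram matrix is singular.

```python
import numpy as np

def llr_weights_closed_form(x, neighbors):
    """Closed-form LLR weights per Eq. (11); `neighbors` is a (K, m) array."""
    K = neighbors.shape[0]
    D = x[None, :] - neighbors             # rows are x - x_tilde_j, shape (K, m)
    G = D @ D.T                            # (X - X_tilde)^T (X - X_tilde), shape (K, K)
    G += 1e-8 * np.trace(G) * np.eye(K)    # ridge term: guards against a singular G
    e = np.ones(K)
    w = np.linalg.solve(G, e)              # proportional to G^{-1} e_K^T, Eq. (8)
    return w / (e @ w)                     # normalization enforces sum(w) = 1, Eq. (10)
```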
We modify the constraint of the LLR objective function (1) as follows

$$E(\mathbf{w}) = \frac{1}{2}\Big\|\mathbf{x} - \sum_{j=1}^{K} w_j\tilde{\mathbf{x}}_j\Big\|^2 = \frac{1}{2}(\mathbf{x} - \tilde{\mathbf{X}}\mathbf{w})^T(\mathbf{x} - \tilde{\mathbf{X}}\mathbf{w}) \quad \text{subject to} \quad w_j \geq 0\ \ \forall j, \quad \mathbf{e}_K\mathbf{w} = 1 \tag{12}$$
The parameters $\mathbf{w} = [w_1\ \cdots\ w_K]$ should be nonnegative (i.e., $w_j \geq 0,\ \forall j$). Quadratic programming is used to solve the optimization problem (12). The procedure of LLR is shown in Table 1.

Table 1
Pseudo code of locally linear reconstruction.

Step 1: Calculate the distance between each pair of data patterns
    for i = 1 to N
        for j = 1 to N
            d2(i,j) = (xi − xj)T (xi − xj)
        end
    end
Step 2: Sort the distances in ascending order
    [sorted_distance, sorted_index] = sort(d, 'ascend')
Step 3: Find the K-nearest neighbors of each data pattern
    for j = 1 to K
        x̃j = x(sorted_index(j))
    end
Step 4: Solve the quadratic program:
    Q = [x − x̃1 ··· x − x̃K]T [x − x̃1 ··· x − x̃K]
    eK = [1 ··· 1], |eK| = K, b = 1
    lb = [0 ··· 0], |lb| = K
    w = quadprog(Q, [ ], [ ], [ ], eK, b, lb)
Step 5: Reconstruct the new data point as the linear combination of the neighbors of the given data point with the estimated parameters w.
$$\hat{\mathbf{x}} = \tilde{\mathbf{X}}\mathbf{w} \tag{13}$$
Here, $\hat{\mathbf{x}}$ denotes the reconstructed data point, and $\mathbf{w}$ is the parameter vector obtained by quadratic programming.
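As a complement to the MATLAB-style pseudocode of Table 1, the following is a sketch of the nonnegative reconstruction (12)–(13) in Python, assuming numpy/scipy; the function and variable names are ours, and a general-purpose SLSQP solver stands in for quadprog.

```python
import numpy as np
from scipy.optimize import minimize

def llr_reconstruct(x, X, K=5):
    """Reconstruct x from its K nearest neighbors in X (an (N, m) array),
    with weights constrained to w_j >= 0 and sum(w) = 1 as in Eq. (12)."""
    d2 = np.sum((X - x) ** 2, axis=1)          # Step 1: squared distances
    idx = np.argsort(d2)[1:K + 1]              # Steps 2-3: skip x itself, keep K neighbors
    Xt = X[idx]                                # (K, m); rows are the x_tilde_j
    Dm = x[None, :] - Xt
    Q = Dm @ Dm.T                              # quadratic form of Eq. (12)
    cons = ({'type': 'eq', 'fun': lambda w: np.sum(w) - 1.0},)
    res = minimize(lambda w: 0.5 * w @ Q @ w, np.full(K, 1.0 / K),
                   method='SLSQP', bounds=[(0.0, None)] * K, constraints=cons)
    x_hat = Xt.T @ res.x                       # Step 5: Eq. (13)
    return x_hat, res.x
```

The squared residual of this reconstruction is precisely the quantity used as the supervision signal in what follows.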
In the case of conventional RBFNNs, the locations of the RBFs are determined by using a conventional clustering algorithm (in particular, the Fuzzy C-Means clustering algorithm is used to determine the centroids of the clusters and the activation levels of the data patterns). In some studies, supervised clustering algorithms have been used to analyze the distribution of data patterns under the supervision of a supervision signal. In this study, a clustering algorithm operating in a supervised manner is used to determine the locations of the RBFs. The supervision signal is defined by using the locally linear reconstruction error. We assume that the importance of a data pattern depends on the locally linear reconstruction error, which is expressed as
$$RE(\mathbf{x}, \mathbf{w}) = \|\mathbf{x} - \hat{\mathbf{x}}\|^2 = \Big\|\mathbf{x} - \sum_{j=1}^{K} w_j\tilde{\mathbf{x}}_j\Big\|^2 = (\mathbf{x} - \tilde{\mathbf{X}}\mathbf{w})^T(\mathbf{x} - \tilde{\mathbf{X}}\mathbf{w}) \tag{14}$$
Here, RE stands for the reconstruction error. A pattern with a high reconstruction error cannot be easily reconstructed by the linear combination of its neighbors. This means that the information involved in such a pattern cannot be replaced by the linear combination of the information involved in its neighboring data points. In other words, the data pattern is too unique (different) to be replaced by the reconstruction from other data patterns. The uniqueness of a data pattern underlines its importance.
Fig. 1. Membership function used in the quantification of importance of data.
Therefore, the uniqueness, as indicated by the reconstruction error, can be used in the realization of a supervision signal. We use a membership function to transform the uniqueness of data patterns into their importance levels defined in the [0, 1] interval. The domain and the range of the transforming function $f$ are $x \geq 0$ and $0 \leq y \leq 1$, respectively. In addition, the transforming function should be monotonically increasing. In other words, the more unique a data pattern is, the more important it should be considered when analyzing the distribution of data patterns. The membership function used to determine the importance of data patterns based on their reconstruction errors is described as follows
$$B(\mathbf{x}) = f\big(RE(\mathbf{x}); a, b\big) \tag{15a}$$

$$f(x; a, b) = \begin{cases} 0, & x \le a \\ 2\left(\dfrac{x-a}{b-a}\right)^2, & a \le x \le \dfrac{a+b}{2} \\ 1 - 2\left(\dfrac{x-b}{b-a}\right)^2, & \dfrac{a+b}{2} \le x \le b \\ 1, & x \ge b \end{cases} \tag{15b}$$
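Numerically, (15b) is the standard S-shaped membership function. A minimal sketch, assuming numpy; the parameter values in the usage comment are purely illustrative.

```python
import numpy as np

def importance(re, a, b):
    """Map reconstruction errors re >= 0 to importance levels in [0, 1], Eq. (15b)."""
    re = np.asarray(re, dtype=float)
    mid = 0.5 * (a + b)
    return np.where(re <= a, 0.0,
           np.where(re <= mid, 2.0 * ((re - a) / (b - a)) ** 2,
           np.where(re <= b, 1.0 - 2.0 * ((re - b) / (b - a)) ** 2, 1.0)))

# e.g., importance([0.1, 0.5, 0.9], a=0.2, b=0.8) -> array([0., 0.5, 1.])
```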
Here, $B$ denotes the importance of a data pattern, and $a$ and $b$ are parameters that control the shape of the transforming function. The membership function transforming a reconstruction error into the corresponding importance of a given pattern is shown in Fig. 1. As shown there, we can change the relation between uniqueness and importance by adjusting the values of the shape parameters $a$ and $b$.

2.2. Conditional Fuzzy C-Means clustering with the auxiliary information

The idea of Conditional Fuzzy C-Means (c-FCM, for short) clustering proposed in [4] was applied to the design of RBF neural networks as presented in [7]. To elaborate on the essence of the method, let us consider a set of patterns $X = \{\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_N\}$, $\mathbf{x}_k \in \mathbb{R}^m$ (where $m$ stands for the dimensionality of the input space), along with an auxiliary information granule, which is defined over the boundary area. Each element of $X$ is then associated with the auxiliary information granule (fuzzy set) $B$ given by (15). In conditional clustering, the data pattern $\mathbf{x}_k$ is clustered by taking into consideration the conditions (auxiliary information expressed in the form $B(\mathbf{x}_1), B(\mathbf{x}_2), \ldots, B(\mathbf{x}_N)$) based on some linguistic term expressed as a fuzzy set $B$ ($B : \mathbb{R}^m \to [0, 1]$). The objective function used in conditional fuzzy clustering is the same as the one used in FCM, namely
J=
c n (u ik ) p · xk − vi 2
(16)
i =1 k =1
where $J$ is the objective function, $u_{ik}$ is the activation level associated with the linguistic term $B$ defining the boundary area, $\mathbf{v}_i$ is the $i$-th cluster prototype, and $c$ is the number of rules (clusters) formed for this context. The difference between FCM and c-FCM comes in the form of the constraint imposed on the partition matrix, where we now have
Fig. 2. Data patterns of toy problem.
$$\sum_{i=1}^{c} u_{ik} = B(\mathbf{x}_k) \tag{17}$$
Here, $B(\mathbf{x}_k)$ is the linguistic term (fuzzy set) expressing the degree to which the input datum $\mathbf{x}_k$ is involved in the boundary area. Now the optimization problem is formulated in the following form
$$\min_{\mathbf{U},\,\mathbf{v}} J \quad \text{subject to} \quad \sum_{i=1}^{c} u_{ik} = B(\mathbf{x}_k) \tag{18}$$
The iterative optimization scheme is governed by two update formulas, by means of which we successively modify the partition matrix and the prototypes
$$u_{ik} = \frac{B(\mathbf{x}_k)}{\sum_{j=1}^{c}\left(\dfrac{\|\mathbf{x}_k - \mathbf{v}_i\|}{\|\mathbf{x}_k - \mathbf{v}_j\|}\right)^{2/(p-1)}} \tag{19a}$$

$$\mathbf{v}_i = \frac{\sum_{k=1}^{n}(u_{ik})^p\,\mathbf{x}_k}{\sum_{k=1}^{n}(u_{ik})^p} \tag{19b}$$
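A compact sketch of the alternating updates (19a)–(19b), assuming numpy; the function name, the random initialization scheme, and the fixed iteration count are our choices, not prescriptions of the paper.

```python
import numpy as np

def conditional_fcm(X, B, c, p=2.0, n_iter=100, seed=0):
    """c-FCM: column k of the partition matrix U sums to B(x_k), not to 1."""
    rng = np.random.default_rng(seed)
    V = X[rng.choice(X.shape[0], size=c, replace=False)]   # initial prototypes
    for _ in range(n_iter):
        d = np.linalg.norm(X[None, :, :] - V[:, None, :], axis=2) + 1e-12  # (c, n)
        inv = d ** (-2.0 / (p - 1.0))
        U = B[None, :] * inv / inv.sum(axis=0, keepdims=True)              # Eq. (19a)
        Up = U ** p
        V = (Up @ X) / Up.sum(axis=1, keepdims=True)                       # Eq. (19b)
    return U, V
```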
So far, we have explained how to determine the center points of the RBFs by using FCM with the aid of the auxiliary information defined by the reconstruction error and the membership function. To illustrate the proposed procedure for locating the RBFs over the input space, the data patterns shown in Fig. 2 are used. In Fig. 3, the allocation of the centroids of the clusters (i.e., the centroids of the RBFs) is shown.

In the first step, the outliers included in the data patterns should be determined, as shown in Fig. 3-a. To identify which data patterns are regarded as outliers, we use the minimum distance between a data pattern and the other data patterns that belong to different classes. The distances between a given data pattern and the data patterns belonging to different classes are calculated in the form
$$D_i = \{d_{i1}, d_{i2}, \ldots, d_{i,N-|C_k|}\}, \qquad d_{ij} = \|\mathbf{x}_i - \mathbf{x}_j\|, \quad \mathbf{x}_i \in C_k,\ \mathbf{x}_j \in C_l,\ l \neq k \tag{20}$$
Here, $C_k$ denotes the set of data patterns belonging to the $k$-th class, $|C_k|$ is the number of data of the $k$-th class, and $N$ is the size of the entire data set. The distance between a given data pattern $\mathbf{x}_i$ and the other classes is taken as the minimum distance among the elements of the distance set
$$\text{Dist}(\mathbf{x}_i) = d_{i,i^*}, \qquad i^* = \arg\min_{j} d_{ij} \tag{21}$$
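A sketch of (20)–(21) in numpy; `y` holds integer class labels, and `L` (the number of patterns flagged as outliers, introduced just below) is a design parameter. All names are ours.

```python
import numpy as np

def interclass_min_distance(X, y):
    """Dist(x_i): minimum distance from x_i to any pattern of a different class."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)   # all pairwise d_ij
    d[y[:, None] == y[None, :]] = np.inf                        # keep only l != k pairs
    return d.min(axis=1)                                        # Eq. (21)

def flag_outliers(X, y, L):
    """Indices of the L patterns with the largest inter-class distances."""
    return np.argsort(interclass_min_distance(X, y))[-L:]
```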
Fig. 3. Procedure to calculate auxiliary information.
We determine the outliers among the whole set of data patterns in terms of the distance of each pattern to the other classes. In other words, the L data patterns with the largest distances are determined as outliers.

3. Fuzzy radial basis function neural networks with Conditional Fuzzy C-Means clustering algorithm

As mentioned before, the auxiliary information for CFCM is defined by using outlier elimination and the LLRE. The LLR algorithm is based on simple geometric intuitions [3]. Provided there are sufficient data, each data point and its neighbors are expected to lie on or close to a locally linear patch of the manifold. The geometry of these patches is characterized by linear coefficients that reconstruct each data point from its neighbors. In this study, FRBFNNs constructed by using CFCM with the aid of the auxiliary information obtained from the LLRE are used as a classifier. RBFNNs are reported to have several advantages, including global optimal approximation and classification capabilities as well as rapid convergence of the underlying learning procedures. The generic topology of FRBFNNs is depicted in Fig. 4.
Fig. 4. An overall topology of radial basis function neural networks with auxiliary information.
In Fig. 4, $u_i$, $i = 1, \ldots, R$ denote the receptive fields (radial basis functions), while the parameter $m$ denotes the number of input variables. The output of the FRBFNNs is expressed as a linear combination of the outputs ($u_i$) of the corresponding hidden nodes with the connection weights ($f_i$), as in (22).

$$\hat{y}(\mathbf{x}_k) = \sum_{i=1}^{R} u_i f_i(\mathbf{x}_k) \tag{22}$$
To estimate the parameters, we use the orthogonal least square method and the weighted least square estimation method. Proceeding with the optimization details, the objective function of Least Square Error (LSE) reads as follows
J=
n
gi −
i =1
g1
u ji f j (xi )
= (G − Θ a)T (G − Θ a)
(23)
m
j =1 ai j x j ,
⎤
⎡
⎢ g2 ⎥ ⎢ ⎥ G = ⎢ . ⎥, ⎣ .. ⎦
2
j
where f i (x) = ai0 +
⎡
c
gn
u 11 Θ = ⎣ u 21 un1
a = a10
u 11 x11 u 21 x12 un1 x1n
⎤ · · · u 11 xm1 u 12 u 12 x11 · · · u 12 xm1 · · · u 1c xm1 · · · u 21 xm2 u 22 u 22 x12 · · · u 22 xm2 · · · u 2c xm2 ⎦ , · · · un1 xmn un2 un2 x1n · · · un2 xmn · · · unc xmn
· · · a1m a20 a21 · · · a2m · · · acm
a11
T
∈ c(m+1)
The optimized values of the coefficients are expressed in a well-known manner

$$\mathbf{a} = (\boldsymbol{\Theta}^T\boldsymbol{\Theta})^{-1}\boldsymbol{\Theta}^T\mathbf{G} \tag{24}$$
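A sketch of how the design matrix $\boldsymbol{\Theta}$ of (23) can be assembled and (24) solved, assuming numpy; `U` is the (n, c) array of activation levels $u_{ik}$, and the helper names are ours.

```python
import numpy as np

def build_theta(U, X):
    """Row k of Theta concatenates, for each RBF i, the block u_ik * [1, x_k1, ..., x_km]."""
    n = X.shape[0]
    Xe = np.hstack([np.ones((n, 1)), X])                  # [1, x_1, ..., x_m] per pattern
    return np.hstack([U[:, [i]] * Xe for i in range(U.shape[1])])

def lse(Theta, G):
    """Ordinary least squares, Eq. (24); lstsq is used for numerical stability."""
    return np.linalg.lstsq(Theta, G, rcond=None)[0]
```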
When we use the weighted LSE to estimate the coefficients of the local models, we assume that each data pattern comes with its priority (relevance): data patterns with high priority significantly affect the estimation process, whereas data with low
priority participate to a limited degree and can be almost neglected. The activation levels of the linguistic variable defining the boundary area can be considered as the priority index. As noted earlier, we emphasize the data positioned within the boundary area. Unlike the conventional LSE, the objective function of the weighted LSE is defined as follows
J=
n
2 c
q B (xi ) gi − u ji f j (xi ) = (G − Θ a)T Dq (G − Θ a)
i =1
⎡ ⎢ ⎢ ⎣
B (x1 )
0 B (x2 )
where D = ⎢
(25)
j =1
..
.
⎤ ⎥ ⎥ ⎥. ⎦
0 B (xn ) In the above expression, q denotes the linguistic modifier of the activation level of the boundary area. If the values of q get higher than 1, we arrive at higher specificity of the underlying linguistic information while an opposite effect becomes present when dealing with the lower values of q [3]. Note that the diagonal partition matrix D is the reduced matrix, which is composed of the activation levels of all data pairs to the linguistic term B as the diagonal elements. The optimal values of the coefficients by using the weighted LSE are determined in a well-known manner.
$$\mathbf{a} = (\boldsymbol{\Theta}^T\mathbf{D}^q\boldsymbol{\Theta})^{-1}\boldsymbol{\Theta}^T\mathbf{D}^q\mathbf{G} \tag{26}$$
The final output of the RBF NNs comes in the following form

$$\hat{\mathbf{Y}} = \boldsymbol{\Theta}\mathbf{a} \tag{27}$$
The estimated class label is determined by using the well-known decision rule

$$\hat{g}_i = \begin{cases} 0, & \hat{y}_i < 0 \\ 1, & \hat{y}_i \ge 0 \end{cases} \tag{28}$$
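A sketch of the weighted estimation (26) together with (27)–(28), assuming numpy; `B` is the vector of importance levels $B(\mathbf{x}_i)$ and `q` the linguistic modifier. The diagonal matrix $\mathbf{D}^q$ is never formed explicitly.

```python
import numpy as np

def wlse(Theta, G, B, q=1.0):
    """Weighted least squares, Eq. (26): a = (Theta^T D^q Theta)^{-1} Theta^T D^q G."""
    Tw = Theta * (B ** q)[:, None]             # D^q Theta, computed row-wise
    return np.linalg.solve(Theta.T @ Tw, Tw.T @ G)

def predict_labels(Theta, a):
    y_hat = Theta @ a                          # Eq. (27)
    return (y_hat >= 0).astype(int)            # decision rule, Eq. (28)
```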
The pseudocode used to construct the classifier is summarized in Table 2. Considering this pseudocode, there are several essential parameters of the proposed RBF NNs that impact the performance of the classifier. These parameters concern the number of clusters (C), the value of the fuzzification coefficient (p), and the linguistic modifier (q).
Table 2
Pseudocode for the construction of the RBF NNs classifier with the boundary area.

Main Procedure of Radial Basis Function Neural Networks Classifier
    Decide upon the design parameters:
        a. The number of clusters (R) used to define the boundary area
        b. The number of RBFs (C)
        c. Order of the polynomial (O)
        d. Fuzzification coefficient (p)
        e. Linguistic modifier (q)
        f. The number of nearest neighbors (K)
    Define auxiliary information based on the locally linear reconstruction error:
        a. Reconstruct the given data patterns by using (12)
        b. Calculate the reconstruction error by using (14)
        c. Calculate the importance of the data patterns by using (15)
    Determine the location of the RBFs with the auxiliary information by using c-FCM:
        a. Calculate the partition matrix and the prototypes of the RBFs using (19a) and (19b)
    Estimate the coefficients of the local models:
        a. Weighted least square estimation using (26)
    Calculate the output of the radial basis function neural networks:
        a. Calculate the final output of the RBF NNs using (27)
    Decide the class label of each data pattern:
        a. Determine the class label using decision rule (28)
End Procedure of Radial Basis Function Neural Networks Classifier
Table 3
Selected numeric values of the parameters of the proposed model.

Parameter                         Value
Polynomial order (O)              0 (constant), 1 (linear), or 2 (quadratic)
Number of RBFs (c)                2–9
Fuzzification coefficient (p)     1.5–3.0, varying with a step of 0.5
Number of nearest neighbors (K)   1.0 or 2.0
Fig. 5. Two-dimensional synthetic data.
4. Experimental studies

In order to evaluate and quantify the classification effectiveness of the proposed classifier, it is experimented with by making use of a series of numeric data, including two synthetic datasets and several Machine Learning datasets (http://www.ics.uci.edu/~mlearn/MLRepository.html). In the assessment of the performance of the classifiers, we use the error rate of the resulting classifier. We investigate and report the results of each experiment in terms of the mean and the standard deviation of the performance index (error rate).

We consider some predefined values of the parameters of the network, which are summarized in Table 3. The choice of these particular numeric values has been motivated by the need to investigate the performance of the model in a fairly comprehensive range of scenarios. The numeric values in Table 3 were selected through a trial-and-error process by running a number of experiments, monitoring the pace of learning, and assessing its convergence.

4.1. Two-dimensional data

In order to illustrate the characteristics and performance of the proposed classifier, such as the location of the centroids of the clusters determined by CFCM under the supervision of the auxiliary information and the boundary surface estimated by the proposed classifier, two-dimensional synthetic examples are of instructional value. The two-dimensional synthetic data are shown in Fig. 5. Each cluster consists of 100 data points.

The classification results of the proposed classifier are shown in Fig. 6. To locate the centroids of the clusters over the input space, we first extract the auxiliary information by using the locally linear reconstruction error, as shown in Fig. 6-b. Based on the obtained auxiliary information, CFCM is used to determine the locations of the centroids of the clusters, as depicted in Fig. 6-c. In Fig. 6-d, we can see the difference between the center points obtained by using FCM and those obtained by using CFCM. The centroids determined by using CFCM are located along the boundary surface.

In Fig. 7, the center points of the clusters and the boundary surface estimated by WLSE or LSE are shown. The number of clusters and the fuzzification coefficient are 4 and 2.0, respectively. In addition, linear functions are used as the consequent part of the RBFNNs classifier.
Fig. 6. Results of localization of clusters with the aid of auxiliary information.
In Fig. 8, the center points of the clusters and the boundary surface of the RBFNNs estimated by WLSE or LSE are shown. The number of clusters and the fuzzification coefficient are 4 and 2.0, respectively. In addition, the order of the consequent part of the RBFNNs is 2 (i.e., a quadratic polynomial is used as the consequent part of the RBFNNs classifier).

4.2. Machine learning data sets

In what follows, we report on several experiments using machine learning data sets coming from the UCI Machine Learning Repository (http://www.ics.uci.edu/~mlearn/MLRepository.html). Table 4 summarizes the pertinent details of the 13 data sets, such as the number of features and the number of patterns. In addition, Table 5 contrasts the classification performance of the proposed classifier with the performance of other classification methods such as Bayesian Networks, Logistic Classifier, Support Vector Machine (SVM), K-Nearest Neighbor, and Multilayer Perceptron. These classifiers were experimented with within the data mining framework of WEKA [16].

The Friedman test is well known as a non-parametric equivalent of the repeated-measures analysis of variance (ANOVA). In this study, the Friedman test is used to compare the classification performance of the proposed classifier with several previously studied classifiers, namely Bayesian Networks, Logistic Regression, Support Vector Machine, K-Nearest Neighbors, and Multilayer Neural Networks. The Friedman test checks whether the measured average ranks are significantly different from the mean rank. The rank of each classifier for each data set is reported in Table 5 (denoted inside parentheses).
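As a sketch of how such a comparison can be run, assuming scipy: the snippet below computes the Friedman statistic and the average ranks from a (datasets x classifiers) accuracy table. The array contents are placeholders, and the FF statistic quoted later in this section appears to be the Iman-Davenport correction shown in the last line.

```python
import numpy as np
from scipy.stats import friedmanchisquare, rankdata

# placeholder for the 13 x 6 accuracy entries of Table 5
rng = np.random.default_rng(0)
acc = rng.uniform(60.0, 100.0, size=(13, 6))

chi2, pval = friedmanchisquare(*acc.T)        # one sample per classifier
ranks = rankdata(-acc, axis=1).mean(axis=0)   # average ranks (1 = best), as in Table 5

N, k = acc.shape                              # N datasets, k classifiers
FF = (N - 1) * chi2 / (N * (k - 1) - chi2)    # F statistic with df (k-1, (k-1)(N-1))
```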
Fig. 7. Center points of clusters and boundary surface of proposed classifier and conventional RBFNNs classifier (with 4 RBFs, fuzzification coefficient set to 2, and linear consequent function).
Table 4
Machine learning datasets used in the experiments.

Datasets            Number of features   Number of patterns (data)   Number of classes
Australian (UCI)    42                   690                         2
Balance (UCI)       4                    625                         3
Diabetes (UCI)      8                    768                         2
Glass (UCI)         9                    214                         6
Hayes (UCI)         5                    132                         3
Ionosphere (UCI)    34                   351                         2
Iris (UCI)          4                    150                         3
Liver (UCI)         6                    345                         2
Sonar (UCI)         60                   208                         2
Thyroid (UCI)       5                    215                         3
Vehicle (UCI)       18                   846                         4
Wine (UCI)          13                   178                         3
Zoo (UCI)           16                   101                         7
Fig. 8. Center points of clusters and boundary surface of proposed classifier and conventional RBFNNs classifier (with 4 RBFs, 2.0 fuzzification coefficient, and quadratic consequent function).
From the average ranks shown in Table 5, the Friedman test rejects the null hypothesis, since the statistic FF = 6.177 is greater than the critical value F(5, 60) = 2.368 at the significance level of 0.05. Since the null hypothesis was rejected by the Friedman test, a post-hoc test is carried out to examine whether the proposed classifier is statistically significantly better than the other classifiers. To compare the classification performance statistically, the Bonferroni–Dunn test is used: a classifier is deemed better than another when the difference between the ranks of the two classifiers is greater than the critical difference (CD). Under the conditions of this experiment, CD = 2.091. The differences between the average rank of the proposed model and those of the previously studied classifiers (Bayesian Networks, Logistic Classifier, SVM, K-Nearest Neighbor, and Multilayer Perceptron) are 2.077, 1.731, 2.346, 3.154, and 1.077, respectively. From the Bonferroni–Dunn test, we can say that the proposed classifier is better than SVM and K-Nearest Neighbor.

From the experimental results shown in Table 5, the proposed classifier shows better classification performance than the Bayesian Network on 11 of the 13 datasets (win: 11, loss: 2). When compared with Logistic Regression, the proposed classifier is also preferred on 9 of the 13 data sets (win: 9, loss: 2, tie: 2). Finally, when it comes to the Multilayer Perceptron, the proposed classifier yields better classification performance on 10 of the 13 data sets (win: 10, loss: 2, tie: 1).

4.3. Laser induced breakdown spectroscopy spectrum data to recycle black plastic waste

In this experiment, the spectra of black plastic wastes obtained by using LIBS are used to evaluate the proposed classifier. Environmental problems and energy depletion have become major issues all over the world. Among the various
Table 5
Results of comparative analysis for machine learning data sets (the best results shown in boldface).

Data sets      Proposed classifier   BayesNet*   Logistic*    SVM* (Poly)   KNN* (K = 5)   Multilayer perceptron*
Australian     87.39(1)              85.13(4)    86.67(2)     85.46(3)      84.42(5)       83.13(6)
Balance        87.84(3)              73.73(6)    89.14(2)     87.46(4)      87.22(5)       91.15(1)
Diabetes       74.49(5)              75.19(3)    77.42(1)     76.98(2)      73.61(6)       74.62(4)
Glass          64.51(4)              70.6(1)     63.88(5)     56.83(6)      65.42(3)       66.09(2)
Hayes          82.71(1)              60.63(3)    56.13(4)     54.68(5)      34.46(6)       72.7(2)
Ionosphere     91.46(1)              89.94(3)    86.87(5)     88.03(4)      84.67(6)       90.08(2)
Iris           98(1)                 93.93(6)    96.33(3.5)   96.33(3.5)    96.13(5)       96.4(2)
Liver          76.42(1)              57.36(6)    68.52(3)     57.97(5)      60.61(4)       68.75(2)
Sonar          85.61(1)              75.09(5)    71.07(6)     76.39(4)      80.29(3)       82.01(2)
Thyroid        95.81(1)              95.21(3)    95.67(2)     88.51(6)      93.3(4)        91.35(5)
Vehicle        84.99(1)              60.06(6)    79.78(3)     73.5(4)       70.79(5)       80.95(2)
Wine           100(1)                98.31(3)    97.14(5)     98.59(2)      95.68(6)       98.03(4)
Zoo            96.0(2)               96.84(1)    95.06(4)     94.28(5)      93.46(6)       95.96(3)
Average rank   1.769                 3.846       3.5          4.115         4.923          2.846
* Classifiers of WEKA [16].
Fig. 9. Laser induced breakdown spectroscopy.
issues, this experiment focuses especially on black plastic wastes. To recycle plastic wastes, they should be categorized into several classes according to their resins. While colored plastic wastes can easily be identified according to their resins by using Near Infrared (NIR) spectroscopy, it is difficult to identify black plastic wastes because of their characteristics: black colored matter absorbs NIR light, so black plastic wastes cannot be identified by using NIR spectroscopy. To overcome this drawback of NIR spectroscopy, Laser Induced Breakdown Spectroscopy (LIBS) is applied to acquire the characteristic spectra of plastic resins such as Acrylonitrile Butadiene Styrene (ABS), Polypropylene (PP), and Polystyrene (PS). LIBS is a form of emission spectroscopy of laser-produced plasma and a reliable technique for identifying the resin of black plastic. The LIBS system, which was used to acquire the spectra of black plastic wastes in this experiment, is shown in Fig. 9.

The black plastic waste samples, namely 400 PP, 400 PS, and 400 ABS samples, were collected at the Hwasung Recycle Center in Korea. All spectra of the 400 PP, 400 PS, and 400 ABS black plastic samples are shown in Fig. 10. Table 6 summarizes the pertinent details of the LIBS data, such as the number of features, the number of classes, and the number of patterns. Table 7 presents a comparison of the proposed identification algorithm with other well-known and widely used classification algorithms.

5. Conclusions

In this study, we have proposed and investigated fuzzy radial basis function neural networks whose radial basis functions are located by using the conditional fuzzy c-means clustering method under the supervision of the importance of
Fig. 10. Spectrum of black plastic wastes such as (a) PP, (b) PS, and (c) ABS.
Table 6
Application dataset (laser induced breakdown spectroscopy) used in the experiment.

Datasets   Number of features   Number of patterns (data)   Number of classes
LIBS       10240                1200                        3
Table 7
Results of comparative analysis on large data sets (the best results shown in boldface).

Data sets   Proposed classifier   BayesNet*   Logistic*   SVM* (Poly)   KNN* (K = 5)   Multilayer perceptron*
LIBS        96.17                 89.50       92.91       91.83         95.00          95.08
* Classifiers coming from WEKA [16].
each data pattern. In addition, the parameters of the consequent polynomial are estimated by the weighted least square estimation method, which is also guided by the importance of each data pattern. From the experimental results, we can conclude that the effectiveness of the proposed RBFNNs classification system is visibly enhanced by applying the importance of each data pattern to locate the RBFs and to estimate the consequent parameters.
In particular, in the case of real application data such as the LIBS data, the proposed classification algorithm shows better generalization capability than the well-known classification systems. In future research, we will investigate various functions that satisfy the two conditions described above for the membership function, from the viewpoint of classification performance. In addition, we will apply optimization methods to optimize the shape of the membership function and thereby further improve the classification performance of the proposed classifier.

Acknowledgements

This work was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (NRF-2018R1D1A1B07051331 & NRF-2017R1D1A1B03032333).

References

[1] J.-S.R. Jang, C.-T. Sun, E. Mizutani, Neuro-Fuzzy and Soft Computing: A Computational Approach to Learning and Machine Intelligence, Prentice-Hall, Englewood Cliffs, NJ, 1997.
[2] W. Li, Y. Hori, An algorithm for extracting fuzzy rules based on RBF neural network, IEEE Trans. Ind. Electron. 53 (4) (2006) 1269–1276.
[3] W. Pedrycz, Conditional fuzzy clustering in the design of radial basis function neural networks, IEEE Trans. Neural Netw. 9 (4) (1998) 601–612.
[4] G. Tsekouras, H. Sarimveis, E. Kavakli, G. Bafas, A hierarchical fuzzy-clustering approach to fuzzy modeling, Fuzzy Sets Syst. 150 (2) (2005) 245–266.
[5] W.W.Y. Ng, A. Dorado, D.S. Yeung, W. Pedrycz, E. Izquierdo, Image classification with the use of radial basis function neural networks and the minimization of the localized generalization error, Pattern Recognit. 40 (2007) 19–32.
[6] B.J. Park, W. Pedrycz, S.K. Oh, Polynomial-based radial basis function neural networks (P-RBFNNs) and their application to pattern classification, Appl. Intell. 32 (1) (2008) 27–46; W. Pedrycz, Conditional fuzzy C-means, Pattern Recognit. Lett. 17 (6) (1996) 625–632.
[7] W. Pedrycz, Conditional fuzzy clustering in the design of radial basis function neural networks, IEEE Trans. Neural Netw. 9 (4) (1998) 601–612.
[8] W. Pedrycz, H.S. Park, S.K. Oh, A granular-oriented development of functional radial basis function neural networks, Neurocomputing 72 (2008) 420–435.
[9] C. Renjifo, D. Barsic, C. Carmen, K. Norman, G.S. Peacock, Improving radial basis function kernel classification through incremental learning and automatic parameter selection, Neurocomputing 72 (2008) 3–14.
[10] M. Rocha, P. Cortez, J. Neves, Simultaneous evolution of neural network topologies and weights for classification and regression, in: Computational Intelligence and Bioinspired Systems: 8th Int. Workshop on Artificial Neural Networks, in: Lect. Notes Comput. Sci., vol. 3512, 2005, pp. 59–66.
[11] F. Ros, M. Pintore, J.R. Chretien, Automatic design of growing radial basis function neural networks based on neighborhood concepts, Chemom. Intell. Lab. Syst. 87 (2007) 231–240.
[12] M.R. Senapati, I. Vijaya, P.K. Dash, Rule extraction from radial basis functional neural networks by using particle swarm optimization, J. Comput. Sci. 3 (8) (2007) 592–599.
[13] A. Staiano, R. Tagliaferri, W. Pedrycz, Improving RBF networks performance in regression tasks by means of a supervised fuzzy clustering, Neurocomputing 69 (13–15) (2006) 1570–1581.
[14] Y. Liu, Y. Zhang, Z. Yu, M. Zeng, Incremental supervised locally linear embedding for machinery fault diagnosis, Eng. Appl. Artif. Intell. 50 (2016) 60–70.
[15] W. Huang, S.-K. Oh, Optimized polynomial neural network classifier designed with the aid of space search simultaneous tuning strategy and data preprocessing techniques, J. Electr. Eng. Technol. 12 (2) (2017) 911–917.
[16] E. Frank, M.A. Hall, I.H. Witten, Data Mining: Practical Machine Learning Tools and Techniques, fourth edition, Morgan Kaufmann, 2016.