Applied Soft Computing 54 (2017) 121–140
Review Article
Knowledge base to fuzzy information granule: A review from the interpretability-accuracy perspective
Md. Manjur Ahmed a,b, Nor Ashidi Mat Isa b,∗
a Faculty of Computer Systems and Software Engineering, Universiti Malaysia Pahang, Lebuhraya Tun Razak, 26300 Gambang, Kuantan, Pahang, Malaysia
b Imaging and Intelligent Systems Research Team (ISRT), School of Electrical and Electronic Engineering, Engineering Campus, Universiti Sains Malaysia, 14300 Nibong Tebal, Penang, Malaysia
Article info
Article history: Received 3 February 2016; received in revised form 22 November 2016; accepted 30 December 2016; available online 13 January 2017
Keywords: Fuzzy information granule; Interpretability-accuracy tradeoff; Granularity-accuracy dilemma; Overfitting/underfitting situation; Conflict situation
Abstract
Fuzzy information granules indicate sufficiently interpretable fuzzy sets for achieving a high level of human cognitive abstraction. Furthermore, granularity, complexity, and accuracy are associated with fuzzy information granules. Measuring granularity is a promising means of verifying the effectiveness of the fuzzy granular model. Higher granularity indicates fine partitions, whereas coarser partitions suggest lower granularity. Therefore, accuracy is directly proportional to the granularity, such that the higher the granularity, the more accurate and more complex the model is. Consequently, the granularity-simplicity tradeoff is also a significant criterion in considering the interpretability-accuracy tradeoff. This paper thoroughly reviews diverse ideas to understand the fuzzy information granule and addresses a sensible compromise between interpretability-accuracy and granularity-simplicity. Those requirements contradict each other, thus certain conceptual and mathematical considerations are necessary in designing a granular framework. Moreover, a double axis taxonomy is introduced in this paper: “complexity-based granularity versus semantic-based granularity” (which considers granularity measures) and “granular partition level versus granular rule base level” (regarding knowledge base stages). However, several constraints should be considered in designing a granular framework such as the granularity-accuracy dilemma, the overfitting/underfitting situation, the granular rule base level conflict, the interpretability constraint threshold, the stability-plasticity dilemma, and the parameter optimization. This paper primarily aims to present a conceptual framework to better understand existing methods, as well as how these methods can inspire future research.
© 2017 Elsevier B.V. All rights reserved.
Contents
1. Introduction
2. Review method
3. Information granule and fuzzy model
   3.1. Knowledge transfer to fuzzy model
   3.2. Double axis taxonomy
4. Interpretability-accuracy tradeoff in the fuzzy information granule
   4.1. Interpretability, accuracy, and granularity measures
   4.2. T1: granularity-simplicity tradeoff
      4.2.1. Granularity, complexity, and accuracy: associations on the fuzzy information granule
      4.2.2. Overfitting and underfitting situation
      4.2.3. Justifiable information granularity
   4.3. T2: conflict decision and its resolver for the fuzzy granular model
∗ Corresponding author. E-mail addresses:
[email protected], manjur
[email protected] (Md.M. Ahmed),
[email protected],
[email protected] (N.A.M. Isa). http://dx.doi.org/10.1016/j.asoc.2016.12.055 1568-4946/© 2017 Elsevier B.V. All rights reserved.
Md.M. Ahmed, N.A.M. Isa / Applied Soft Computing 54 (2017) 121–140
   4.4. T3: interpretability constraints and parameter optimization for interpretability-accuracy tradeoff
      4.4.1. Constraint realization for semantic blocks
      4.4.2. Parameter optimization
      4.4.3. Constraint realization and parameter optimization discussions
   4.5. Research on interpretability-accuracy tradeoff
      4.5.1. Context-based fuzzy system
      4.5.2. Evolving granule methods to realize application environment
      4.5.3. Self-adaptation methods
      4.5.4. Sequential decision-making and progressive computing
5. Discussion
   5.1. Vicinity for designing fuzzy granular framework
   5.2. Problem statement
6. Conclusions
Acknowledgments
References
1. Introduction
Various conceptual frameworks have realized the intelligent analysis of a knowledge base (information). These frameworks usually focus the analysis on several algorithmic and methodological features (such as evolutionary computing, pattern recognition, and symbolic processing) [1] (Lu et al., 2014). One of the recent research developments is computational intelligence (CI), a novel and intelligent methodology of data analysis [2]. According to Pedrycz [3], “computational intelligence encompass of neural networks, fuzzy set technology, and evolutionary computing”. Granular computing (GrC) is a computing paradigm of CI that processes information in the form of information granules [4]. Information granules are described as a conceptual framework of basic entities [5] that arise when knowledge is derived from the knowledge structure. Information granules can be defined in many frameworks, such as rough sets and interval analysis (details in [6]). Furthermore, fuzzy set theory (FST) is prominent across all GrC frameworks because it can express human-centered concepts [7]. Thus, GrC with FST yields the theory of fuzzy information granule (TFIG). Fuzzy information granules (FIG) have fuzzy sets (or linguistic terms) that are sufficiently interpretable (meaning, the information granule) to achieve a high level of human cognitive abstraction. Therefore, research on FIG and research on fuzzy models diverge. Two different areas in fuzzy model research can be considered depending on the main requirement pursued:
1. Linguistic fuzzy modeling (LFM): Here, the main objective is a fuzzy model with good semantic interpretability. LFM is based on linguistic rules (Mamdani-type fuzzy rules) with antecedent and consequent parts that may be referred to as linguistic terms [8]. The associated fuzzy sets are defined through such linguistic terms.
2. Precise fuzzy modeling (PFM): This primarily aims to achieve a fuzzy model with good accuracy; it is mainly developed by the Takagi-Sugeno method [9]. This model considers fuzzy sets without associated meanings.
The primary focus of this article is to present a systematic review of the diverse ideas for reaching a compromise between interpretability and accuracy. These characteristics contradict each other; hence, a conceptual framework for FIG in which the interpretability-accuracy tradeoff is achieved is required. Therefore, an interpretable or Mamdani-type fuzzy model is essential to realize a mathematical and conceptual FIG. Research interest in interpretable or Mamdani-type fuzzy models is increasing because of the human cognitive aspect that such models can achieve.
The automatic generation of information granules from knowledge is a very promising undertaking because of its close relation to human behaviors and its depiction of high abstraction levels. Therefore, the FIG should emphasize knowledge acquisition such that it can be naturally labeled in linguistic terms; consequently, the fuzzy model should favor natural language. Furthermore, the interpretability of model performance is affected. Numerous studies have proposed the fuzzy information granule to soundly compromise between interpretability and accuracy. This paper reviews the FIG and its limitations with consideration of the interpretability-accuracy tradeoff.
The methodology used to carry out the review is described in Section 2. Section 3 starts with a discussion of the information granule and its relation to fuzzy systems. Section 3.2 proposes a FIG taxonomy for the interpretability-accuracy tradeoff. A detailed overview of the interpretability-accuracy dilemma is given in Section 4, given that interpretability and accuracy are intuitively conflicting conditions. Furthermore, compromising between granularity and accuracy is also vital to finely partition the knowledge base (see Section 4.2.1). To achieve the interpretability-accuracy tradeoff, the following factors should be resolved: overfitting/underfitting situation (Section 4.2.2), justifiable granularity (Section 4.2.3), and conflict decision (Section 4.3). Moreover, parameter optimization and interpretability constraints are discussed in line with the recent models in Section 4.4 to realize a highly interpretable model with reasonable accuracy. The latest fuzzy granularity models are reviewed in Section 4.5. Several research scopes are discussed in Section 5.1. A few recommendations inspired by recent research for future studies are presented in Section 5.2. Finally, the conclusions are presented in Section 6.

2. Review method
To carry out the review, a systematic and structured methodology based on [97] and [98] has been followed, as described below:
(i) Identification of research or identifying the relevant literature: The main objective of this review paper is to present a systematic review of FIG from the perspective of interpretability and accuracy. This objective is already presented in Section 1: Introduction. Detailed discussions regarding the relation of interpretability, accuracy, and granularity can be found in Sections 3.1 and 4.1. To ensure a structured approach, the source materials have been drawn from top journals (e.g., IEEE Transactions, Wiley Publications, Elsevier Journals, and Springer Journals). All of these references are ISI-Web of Science journals. Moreover, works by renowned researchers have been taken into consideration for this review process. Some of those renowned researchers are J. M. Alonso, C. Quek, A. Bargiela, W. Pedrycz, J. A. Keane, R. Alcalá, F. Herrera, M. J. Gacto, N. Kasabov, B. Kosko, E. Lughofer and C. Mencar.
(ii) Structuring the review: To make sense of the accumulated knowledge, this review paper proposes a double axis taxonomy as in Table 1 (Section 3.2). Furthermore, this work employs the concept-centric method rather than the author-centric one, as described by [98]. This concept-centric method can be observed in this review process, especially in Tables 1 and 3 (Sections 3.2 and 4.5, respectively).
(iii) Tone: In this review process, we have recognized the previous works that have accumulated slowly in a piecemeal fashion. In particular, Sections 4 and 5 describe the existing research. In addition, critical comments on this research are included. These critical comments are supported hypothetically and/or by appropriate references.
(iv) Theoretical development and evaluation: To identify the critical knowledge gaps, this article discusses the research scope for designing a fuzzy granular framework (i.e., Section 5.1). Furthermore, based on Sections 4 and 5.1, we have highlighted some problem statements, which are built from multiple paradigms (or references). Motivated by these problem statements, a hypothesis can be made for a conceptual framework in which a bridge between the fuzzy granular partition, accuracy, and complexity (i.e., number of fuzzy terms or rules) is considered. This framework can focus on integrating the evolving and self-organizing system (i.e., Sections 5.1 and 5.2).

3. Information granule and fuzzy model
The concept of information granule consists of knowledge derivation and data processing [1]. Information granules are considered as a certain conceptual framework of basic entities [5].
The GrC is a unified conceptual and computing framework of the information granules, which exhibits the descriptive and functional representation of a global concept [10,11]. Information granularity and its relation to fuzzy systems can also be expressed as a functional model to achieve the application environment [11].

3.1. Knowledge transfer to fuzzy model
Pedrycz discussed two general scenarios of knowledge transfer to fuzzy model depicted in Fig. 1 [12]. In the first general scenario, the individual knowledge source is captured as a model and formed at the higher level as shown in Fig. 1(A). Models from knowledge sources are referred to as granular models. Knowledge sources are diversely distributed in terms of their nature; therefore, to obtain a balanced granular model, knowledge management is locally adapted. For example, knowledge management can anticipate computational intelligence (CI) models such as fuzzy models, neural network, and regression models. The second category of knowledge transfer to information granule is illustrated in Fig. 1(B). In this category, a conceptual junction is developed by a knowledge structure. Afterwards, knowledge sources collaborate and interact with that conceptual junction, such that individual models are realized with some computational mechanisms.
The generation (or extraction) of information granules can be accomplished through the application of learning techniques (such as fuzzy models, neural networks, regression models, fuzzy cognitive maps, and rough sets) suited to the conceptual framework. This conceptual framework is based on the definitions of the information granules. Granular models and their relation with fuzzy systems are well-documented in the literature. However, FST holds a prominent position among all GrC frameworks, because the linguistic terms of FST can be represented in a procedural manner and this technique is used to express human-centered concepts [7]. This paper develops the theory of fuzzy information granule (TFIG), which leads to a mathematical framework for modeling the cognitive tasks of humans based on granulating information [4].
However, dissimilarities between the interpretable and the transparent fuzzy models exist. Interpretability is more related to the cognitive aspect and its presence can be easily perceived by humans. In contrast, transparency is associated with mathematical concepts that can be explained based on fuzzy linguistic terms (components) and their relation within the fuzzy model. Therefore, fuzzy models are inherently transparent; however, their interpretability remains uncertain until further analyses are conducted [13,14]. Hence, a framework based on an interpretable fuzzy model can be described as an interpretable information granule or fuzzy information granule [13].
Moreover, the fuzzy rule base system (FRBS) and the neural fuzzy system (NFS) are hybrid systems that capitalize on the functionalities of fuzzy systems and neural networks [15]. Resolving the black-box nature of a neural network can be accomplished by integrating the interpretability of a fuzzy system into a connectionist structure [15,16]. In addition, introducing the learning capabilities of a neural network into a fuzzy system allows the system to automatically refine its parameters [17].
Numerous algorithms have been developed to represent fuzzy granular models, as well as adaptive neural and evolving fuzzy systems [6,18,8,16]. When dealing with a granular fuzzy rule-based system (GFRBS), linguistic fuzzy modeling (LFM) or the classic Mamdani rule-based model is the obvious choice. LFM is based on linguistic rules with antecedent and consequent parts. These parts may be regarded as linguistic terms wherein associated fuzzy sets are defined. Therefore, it would be advantageous for constructing meaningful information granules based on experimental evidence. The ith training datum (evidence) is denoted as $[x, d]_i$, where $x = (x_1, x_2, \ldots, x_p, \ldots, x_n)$ is an input vector, $d_i$ is the corresponding output, and $i = 1, 2, \ldots, N$. This modeling problem has n input variables and N data samples. A rule r ($r = 1, 2, \ldots, R_t$), which encodes an IF-THEN Mamdani-type fuzzy rule at evolving stage t, is represented as a granular fuzzy rule as follows:

$$R^{t,r}: \text{IF } x_1 \text{ is } A_1^{t,(r)}, \text{ and } x_2 \text{ is } A_2^{t,(r)}, \ldots, \text{ and } x_p \text{ is } A_p^{t,(r)} \text{ THEN } y \text{ is } C^{t,(r)}, \tag{1}$$

where $C^{t,(r)}$ is the rth consequent part, and $A_p^{t,(r)}$ is the rth antecedent part associated with the pth input variable. Both $C^{t,(r)}$ and $A_p^{t,(r)}$ are information granules, where granular membership functions (MFs) (including Gaussian, triangular, or trapezoidal) are utilized for both linguistic levels $C^{t,(r)}$ and $A_p^{t,(r)}$ [14]. Recall that both input and output information granules ($A_p^{t,(r)}$ and $C^{t,(r)}$) are semantically interpretable. A Gaussian membership function ($\mu$) is employed for both linguistic levels $C^{t,(r)}$ and $A_p^{t,(r)}$:

$$\mu = e^{-(x-c)^2/\sigma}, \tag{2}$$
Table 1
Taxonomy for the analysis of the interpretability-accuracy tradeoff in the fuzzy information granule.

                              | Granular rule base level           | Granular partition level
Complexity-based granularity  | -                                  | T1: (i) number of granules in input-output domain; (ii) underfitting/overfitting; (iii) justifiable granularity
Semantic-based granularity    | T2: conflict decision and resolver | T3: (i) interpretability constraint: (a) constraints for fuzzy set, (b) constraints for frame of cognition; (ii) parameter optimization
Fig. 1. Knowledge transfer to information granule [12]. (A) Individual sources of knowledge to a granular model, and (B) interaction linkages of knowledge sources (f1 , f2 ,. . ., fc ).
$$\sigma = -\frac{(a_k - b_k)^2}{\ln \alpha}, \tag{3}$$

where $c$ and $\sigma$ are the center and width of the linguistic (and interpretable) levels, respectively. $a_k$ (or $b_k$) indicates the data located at the border of the kth granule ($C^{t,(r)}$ or $A_p^{t,(r)}$). $\alpha > 0$ is the minimum membership value, as well as the distinguishability factor that keeps the semantic value for both $C^{t,(r)}$ and $A_p^{t,(r)}$.
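As a minimal sketch of how Eqs. (2) and (3) fit together, the following Python fragment computes the width σ from the border points and the distinguishability factor, and then evaluates the Gaussian membership degree. The function names are hypothetical, and the restriction 0 < α < 1 is an assumption added here so that ln α is negative and σ stays positive; the text itself only states α > 0.

```python
import math


def gaussian_width(a_k: float, b_k: float, alpha: float) -> float:
    """Width sigma as in Eq. (3): sigma = -(a_k - b_k)^2 / ln(alpha)."""
    # Assumption (not stated in the text): alpha < 1, so ln(alpha) < 0 and sigma > 0.
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha is assumed to lie strictly between 0 and 1")
    return -((a_k - b_k) ** 2) / math.log(alpha)


def gaussian_mf(x: float, c: float, sigma: float) -> float:
    """Membership degree as in Eq. (2): mu = exp(-(x - c)^2 / sigma)."""
    return math.exp(-((x - c) ** 2) / sigma)
```

With these two formulas as written, the membership degree equals 1 at the center c and falls to exactly α at a distance |a_k − b_k| from c, which is one way to read α as the minimum membership value retained by a granule.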
3.2. Double axis taxonomy
To analyze the interpretability-accuracy tradeoff, a taxonomy is designed for the fuzzy information granule (Table 1). This taxonomy is based on a double axis, namely, complexity-based granularity versus semantic-based granularity, and granular partition level versus granular rule base level. The following sections will explain this taxonomy in detail.
First, the number of granules in the complexity-based granularity in the granular partition level (T1) should be optimized, such that the tradeoff between granularity and simplicity is achieved (details in Section 4.2). Furthermore, this optimized granularity level reduces the MFs in both input and output domains; hence, the number of rules (or size of the rule base) and rule premises (or the number of fuzzy sets) are reduced.
Second is the semantic-based granularity in the granular rule base level using a conflict resolver (T2), where a conflict means rules fired at the same time. The simultaneous firing of rules leads to the conflict situation in the rule base level, which shows a lack of interpretability that also reduces accuracy. Conflict decision and its resolver will be discussed in Section 4.3.
The third category is the semantic-based granularity in the granular partition level (T3). In this category, a few properties should be maintained to retain the semantic interpretability of the granules. The taxonomy for the interpretability constraint of the fuzzy information granule suggested by Mencar and Fanelli [1] involves two properties considered as significant for granular computing, namely, constraints for the fuzzy set (i.e., normality, convexity, unimodality, and continuity) and for the frame of cognition (i.e., proper ordering, distinguishability, completeness or coverage, complementarity, and uniform granulation). To realize the interpretable granule in the granular partition level, the interpretability constraints and the parameter optimization will be specifically described in Section 4.4 in the context of fuzzy granular computing.

4. Interpretability-accuracy tradeoff in the fuzzy information granule
The interpretability-accuracy tradeoff of fuzzy granular models is essential, although reconciling these contradicting objectives is still an unresolved problem [19]. Changing the abstraction level conceived by humans enables this tailored approach to address the interpretability-accuracy tradeoff of the fuzzy model [18]. The definition of accuracy in a specific application is uncomplicated; however, the definition of interpretability is relatively problematic [9,20,21].
4.1. Interpretability, accuracy, and granularity measures
Interpretability and accuracy are two contradictory requirements of the fuzzy information granule [22,23]. Interpretability is the ability to explain the behavior of an application system comprehensively. The abstraction level conceived by global knowledge implies interpretability, supporting the existence of information granules. Therefore, granularity is the interpretability outcome. Moreover, accuracy refers to the capability to represent the similarity between the real test data and the proposed model. The sum of square error (SSE), mean square error (MSE), and root of MSE (RMSE) measure how closely the model matches the real test data. The SSE of the tth evolving process is defined in (4) as follows:

$$SSE = E^{(t)} = \sum_{i=1}^{N} \left( o(x_i) - d_{des_i} \right)^2, \tag{4}$$

where N is the number of training data. $o(x_i)$ and $d_{des_i}$ are the model and desired outputs of the ith training data, respectively. MSE and RMSE are defined as the mean and root of the mean of SSE, respectively. Nevertheless, interpretability is a subjective property, and its measure remains an open problem [24,25,9,19]. Most researchers have used the following aspects to measure interpretability: fewer rules, fuzzy linguistic terms with semantic properties, and rule premises [20]. However, Gacto et al. [9] proposed a taxonomy to analyze the interpretability of the FRBS containing the following quadrants:
• Complexity-based interpretability in the rule base level by considering the numbers of rules and conditions. Simplifying the readability of the rules means minimizing the conditions to as few as possible in the antecedent (fuzzy linguistic terms in the input domain) of the rule.
• Complexity-based interpretability in the fuzzy partition level by viewing the numbers of MFs and features.
• Semantic-based interpretability in the rule base level by considering the consistency of rules, as well as the rules fired concurrently.
• Semantic-based interpretability in the fuzzy partition level by maintaining important properties such as completeness (or coverage), normalization, distinguishability, and complementarity. Such properties can be tackled using constraints in MFs, thereby improving the desirable characteristics.
However, research on interpretability and granularity measures has failed to lead to a unique direction. Granularity measure is a promising means to verify the effectiveness of the fuzzy granular model. Pedrycz [22,23] recently proposed justifiable granularity, a remarkable concept that realizes the semantic information granule; details of this concept are discussed in Section 4.2.3. Nevertheless, very few studies have considered granularity measurement. The authors of [96] proposed a performance index (P) to measure the accuracy of the time series granular model. Inspired by justifiable granularity [22,23], they used two performance criteria: a coverage criterion ($Q_i$) to measure data legitimacy (or justification) and a specificity criterion ($V_i$) to measure granule compactness. Assume that $X_i = \{x_1^i, x_2^i, \ldots, x_w^i\}$ is a feature vector included in the time window $T_i$ and the interval on $T_i$ is $[a_i^+, b_i^+]$. Therefore,

$$Q_i\% = \frac{\sum_{j=1}^{w} f_i(j)}{w} \times 100, \tag{5}$$

$$f_i(j) = \begin{cases} 1, & x_j^i \in [a_i^+, b_i^+] \\ 0, & \text{otherwise} \end{cases} \qquad j = 1, 2, \ldots, w, \tag{6}$$

$$V_i = b_i^+ - a_i^+. \tag{7}$$

If the number of time windows is p, then

$$P\% = \frac{\sum_{i=1}^{p} \frac{Q_i}{V_i}}{p} \times 100, \tag{8}$$

where P is high if $Q_i$ and $V_i$ are high and low, respectively. Moreover, larger P values signify a more accurate representation of the granular model. To assess the quality of the granular model, Pedrycz et al. [26] also proposed an area under curve (AUC) based on justifiable granularity (Section 4.2.3):

$$AUC(cov) = \int_0^1 cov(\alpha)\, d\alpha, \qquad AUC(spec) = \int_0^1 spec(\alpha)\, d\alpha, \tag{9}$$

where AUC is the global form of the coverage (cov) and specificity (spec) criteria, and $\alpha$ is the calibration parameter to regulate the specificity criteria of the constructed information granule. Moreover, Reyes-Galaviz and Pedrycz [27] used the concept of justifiable granularity (Section 4.2.3) to evaluate the granular fuzzy model. A performance index $Q_{max}$ was considered from the maximum value of Q as follows:

$$Q = f_1 \times f_2 \tag{10}$$
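The accuracy measures of Eq. (4) and the coverage-specificity index of Eqs. (5) to (8) can be sketched in a few lines of Python. This is only an illustrative reading of the formulas: the function names are hypothetical, each time window is assumed to be a plain list of numbers, and every interval is assumed to be a closed pair (a, b) with b > a.

```python
import math
from typing import Sequence, Tuple


def sse(outputs: Sequence[float], desired: Sequence[float]) -> float:
    """Sum of square error between model outputs and desired outputs, Eq. (4)."""
    return sum((o - d) ** 2 for o, d in zip(outputs, desired))


def mse(outputs: Sequence[float], desired: Sequence[float]) -> float:
    """Mean of the SSE over the N samples."""
    return sse(outputs, desired) / len(outputs)


def rmse(outputs: Sequence[float], desired: Sequence[float]) -> float:
    """Square root of the MSE."""
    return math.sqrt(mse(outputs, desired))


def coverage_percent(window: Sequence[float], a: float, b: float) -> float:
    """Q_i: percentage of the w samples lying inside [a, b], Eqs. (5) and (6)."""
    inside = sum(1 for x in window if a <= x <= b)
    return 100.0 * inside / len(window)


def specificity_width(a: float, b: float) -> float:
    """V_i: interval width b - a, Eq. (7); a smaller width is a more compact granule."""
    if b <= a:
        # Assumption: degenerate intervals are rejected to avoid dividing by zero in P.
        raise ValueError("interval must satisfy b > a")
    return b - a


def performance_index(
    windows: Sequence[Sequence[float]],
    intervals: Sequence[Tuple[float, float]],
) -> float:
    """P: mean of Q_i / V_i over the p time windows, scaled by 100, Eq. (8)."""
    ratios = [
        coverage_percent(w, a, b) / specificity_width(a, b)
        for w, (a, b) in zip(windows, intervals)
    ]
    return 100.0 * sum(ratios) / len(ratios)
```

Under this reading, widening an interval raises the coverage term but also raises the width in the denominator, so the index rewards intervals that capture many samples while staying narrow.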
4.2. T1 : granularity-simplicity tradeoff Table 1 (Section 3.2) shows the designation of Taxonomy T1 in relation to complexity-based granularity. To realize the interpretability-accuracy tradeoff in the fuzzy information granule, the number of granules in the input-output domain should be within a certain (optimized) level that will discuss in Section 4.2.1. After that, the following sections (Sections 4.3.3 and 4.2.3) will provide in-depth description of the criteria for preserving the semantic and compact granules while considering accuracy. 4.2.1. Granularity, complexity, and accuracy: associations on the fuzzy information granule The tradeoff between granularity and complexity is important for the application environment of the fuzzy granular approach. Higher granularity refers to increased MFs, which creates a more complex fuzzy system. Fig. 2 compares lower and higher granularities [20]. However, lower granularity seldom represents the data and exploits the bloated fuzzy rules. Consequently, an overlapping condition is realized for A11 and A12 as shown in Fig. 2(A). Conversely, higher granularity aptly shows the semantic information, such that, linguistic fuzzy MFs are realized [see Fig. 2(B)]. Nevertheless, reaching a balanced number of granularities is significant to compromise with complexity, thereby achieving an effective fuzzy system. Furthermore, higher granularity represents fine partitions, whereas coarser partitions are visible in lower granularity. Therefore, accuracy is directly proportional to granularity, which means that higher granularity signifies a more accurate and complex model. Considering the complexity reduction of the knowledge base, a granular framework is crucial in designing a tradeoff approach between simplicity (inverse of complexity) and accuracy [13]. Such strategy is consistent with the minimum description
Md.M. Ahmed, N.A.M. Isa / Applied Soft Computing 54 (2017) 121–140
Fig. 2. (A) Lower granularity with bloated rules and (B) higher granularity. Rule centroids are shown with “+” [20].
length (MDL) principle [28], also called the Occam’s Razor Principle, which states, “the best model is the simplest one fitting the system behavior well” [9]. Therefore, granularity should be kept at a certain level (or evolving stage) in a granular framework, at which model performance is traded off against granularity and a satisfactory level of human cognitive abstraction is preserved. Nevertheless, that level (or stage) of granularity does not guarantee the interpretability of the information granules if other properties are unsatisfied. To guarantee interpretability, numerous authors proposed the adoption of several knowledge structure constraints; these interpretability constraints define the mathematical framework of the interpretability concept. A number of definitions and constraints for the FIG formation are defined by Mencar and Fanelli [1]. A detailed review of interpretability constraints is presented in Section 4.4.1. The literature describes several complexity reduction methods for the fuzzy information granule, including the selection of certain granularity levels [8,29], the extraction of granularity stages by avoiding the overfitting situation [30,31], the elimination of low-relevance information granules [32,33,16], the search for information granules based on approximation error tolerance and sampling data distribution [34], and the fusion of similar information granules [35,31]. Another typical method to reduce knowledge structure complexity is to diminish the dimensionality of the problem (for example, feature selection) [36,37]. Nevertheless, reduced dimensionality (feature extraction) may lead to the loss of interpretable knowledge because the disregarded features can have semantic meanings. Therefore, reducing the dimensionality of the knowledge base may leave unspecified test data uninterpreted, thus reducing accuracy.
In summary, the prominence of the FIG in GrC arises from its capability to model the cognitive tasks of humans. Moreover, granularity and complexity are significant in designing a conceptual framework for FIG. Higher granularity is desirable for proper data representation; thus, higher granularity also provides a more accurate FIG representation. However, this gain must be balanced against complexity. Hence, an optimized level (or a certain evolving stage) of granularity can justify the knowledge base achieving a higher abstraction level. The following sections describe the tradeoff between granularity and complexity in depth.
4.2.2. Overfitting and underfitting situation

To preserve the semantic and compact granules described in Section 4.2.1, the overfitting/underfitting criterion is a significant issue to consider. When a model is applied to a dataset, three situations may be observed (see Fig. 3). First, underfitting occurs when the model is inadequate to represent the dataset [Fig. 3(A)]. Afterwards, as the model evolves, a stage is reached that attains a good compromise between the dataset and its pattern [Fig. 3(B)]. At some later evolving stages, the dataset can be overfitted, and the model fails to represent the dataset properly, as shown in Fig. 3(C). The model is generally considered overfitted if it is more accurate for relevant information (known data) but less accurate for irrelevant information (noise data). The overfitting and underfitting conditions regarding the granular computing method (T1 in Table 1) are discussed below.

Training error is expected to decrease as the evolving stage (the addition of more information granules) proceeds. However, both underfitting and overfitting occur in practice and generate unbalanced information granules. Underfitting occurs when granularity is low, as depicted in Fig. 2(A). Given the high variance of the membership function, the bloated granule cannot represent the application data properly, which decreases accuracy. As information granules continue to evolve, the granule variance diminishes and very fine partitions are produced in the application data. At specific evolving stages, the variance of the granules becomes extremely small, resulting in an unbalanced state that cannot properly represent the data. Such an unbalanced state leads to a fuzzy system with a large number of rules, causing overfitting (meaning the data fit is very close because of the small width of the granules) as shown in Fig. 3(C), and reducing accuracy. However, very few studies considered this significant characteristic in designing the granular framework. Di et al. [8] measured the underfitting and overfitting conditions for each evolving stage based on the error-evolving rate (EER) index. The EER is the percentage change of the model error at evolving stage t, as follows [8]:
$$R_{EER}(t) = \frac{E(t-1) - E(t)}{E(t-1)} \times 100\% = \left(1 - \frac{E(t)}{E(t-1)}\right) \times 100\%, \tag{11}$$
where $E(t-1)$ and $E(t)$ are the model-training errors at the $(t-1)$ and $t$ evolving stages, respectively. This index is characterized by an error-decrease rate, and $R_{EER}(t)$ always takes its value between 0% and 100%. The $R_{EER}(t)$ index decreases quickly at initial stages. Afterwards, it becomes very small, indicating that further evolving processes are at risk of overfitting the model. In this model [8], EER = 3% is the efficient threshold value to realize the best-fitted fuzzy model. Ahmed and Isa [30] proposed an evolving method, which begins from the underfitting state by translating the entire output domain knowledge. Their method used the first training data as the reference
Fig. 3. Dataset to information model, (A) underfitting, (B) good compromise, (C) overfitting [38].
data for the underfitting state. Therefore, this underfitting state, which is too coarse to fit the data (i.e., the first coarser partition), is used to locate the data sample with the largest error. Afterwards, the evolving granulation process continues until the overfitting situation is recognized through the evolving granule error (EGE) index. At evolving stage t [30], we have
$$\mathrm{EGE}(t) = \frac{E(t)}{E(t-1)}. \tag{12}$$
The characteristic equation for the EGE index was defined as (13), where the effective and overfitted rule base(s) are found when EGE < 1 and EGE ≥ 1, respectively.
$$\mathrm{EGE}(t) = \begin{cases} < 1, & \text{effective} \\ \geq 1, & \text{over-fitting} \end{cases} \tag{13}$$
The two conditions proposed in this granular method to analyze the EGE(t) index are as follows [30]:
(a) If the number of distinct context increases in the consecutive evolving stages, i.e., $S^t > S^{t-1}$, then $E(t) < E(t-1)$. Therefore, $E(t) \geq E(t-1)$ causes the overfitting situation, and EGE(t) is computed using (12).
(b) If the number of distinct context remains the same in the consecutive evolving stages, i.e., $S^t = S^{t-1}$, then a new rule is not created. Let $S^t = S^{t+1} = \ldots = S^{t+v}$ and $S^{t-1} = S^{t-2} = \ldots = S^{t-w}$. Therefore, $E(t-1) = \max\{E(t-1), E(t-2), \ldots, E(t-w)\}$. Hence, $\{E(t) \geq E(t-1), E(t+1) \geq E(t-1), E(t+2) \geq E(t-1), \ldots, E(t+v) \geq E(t-1)\}$ causes the overfitting. Thus, the EGE(t + v) index can be defined as
$$\mathrm{EGE}(t+v) = \left. \frac{E(t+v)}{E(t-1)} \right|_{v=0}^{v}. \tag{14}$$
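As an illustration of conditions (a) and (b), the following Python sketch computes the EGE index over a sequence of evolving stages. It is a minimal sketch under stated assumptions and not the implementation of [30]: the names `ege_series` and `classify`, the list-based inputs, and the use of the first-stage error as the initial reference are hypothetical. While the number of distinct contexts stays constant, the reference error is held at the maximum error of the preceding run of stages, which is one possible reading of condition (b).

```python
from typing import List, Sequence


def ege_series(errors: Sequence[float], contexts: Sequence[int]) -> List[float]:
    """Return EGE values for stages 1..T-1.

    errors[t]   -- training error E(t) at evolving stage t
    contexts[t] -- number of distinct contexts S^t at stage t
    """
    ege = []
    reference = errors[0]   # assumed initial reference error
    run_max = errors[0]     # largest error in the current run of equal S
    for t in range(1, len(errors)):
        if contexts[t] != contexts[t - 1]:
            # A new context appeared: the finished run supplies the reference.
            reference = run_max
            run_max = errors[t]
        else:
            # Same number of contexts: keep the reference fixed.
            run_max = max(run_max, errors[t])
        ege.append(errors[t] / reference)
    return ege


def classify(ege_value: float) -> str:
    """Eq. (13): EGE < 1 is effective, EGE >= 1 signals over-fitting."""
    return "effective" if ege_value < 1 else "over-fitting"
```

For example, with errors `[1.0, 0.5, 0.4, 0.45, 0.6]` and context counts `[1, 2, 3, 3, 3]`, the last stage is compared against the error recorded before the run of three contexts began, so its growth is flagged even though no new rule was created.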
Another recursive fuzzy model, proposed by Ahmed et al. [31], is applied to the condition monitoring of electrical hotspots; this model defines the overfitting characteristic based on the number of distinct fuzzy terms (or granules). Similar to Ahmed and Isa [30], the method also starts from the underfitting state by taking the first training data as the reference point. Afterwards, the evolving process continues until the termination criteria are fulfilled. The balanced number of rules (BNR) index is introduced to recognize the overfitting state and terminate the recursion procedure. $\mathrm{BNR}_t$, a recursive controller index at the evolving stage t, detects the unbalanced (i.e., overfitting) situation in the recursive process to prevent the further evolution of the algorithm.
$$\mathrm{BNR}_t = \frac{(NR_t - NR_{t-1})^2}{NR_t}, \tag{15}$$
where $NR_t$ and $NR_{t-1}$ are the numbers of rules at the evolving stages t and (t − 1), respectively. BNR ∈ [0, 1] shows the balanced partition for the fuzzy model. The characteristic equation for BNR at the evolving stage t is as follows [31]:
$$\mathrm{BNR}_t = \begin{cases} \geq 0, & \text{balanced partition} \\ > 1, & \text{unbalanced partition} \end{cases}. \tag{16}$$
The EGE and BNR estimations are fully online and are not predefined thresholds; these are also approximations from the current and previous evolving stages. The underfitting state is considered from the first incoming data, so the evolving process can be delayed to obtain the expected outcome; hence, the overfitting state for EGE and BNR can be nonlinear. For example, consider the first performance analysis of EGE in Ahmed and Isa [30], i.e., the identification of a non-linear system. The EGE index appears nonlinear in nature as the evolution process continues. Therefore, the underfitting state and the estimation of these indexes (EGE and BNR) should be refined and refocused. However, these indexes are evaluated based on the application data, thereby demonstrating the significance of the proposed underfitting/overfitting characteristics.
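The EER index of Eq. (11) and the BNR index of Eq. (15) are simple enough to be evaluated stage by stage. The Python sketch below shows one way to do so; it is an illustration under assumptions and not code from [8] or [31]. The function names are hypothetical, and stopping at the first stage whose EER falls below the threshold is an assumed reading of how the 3% value is applied.

```python
from typing import Optional, Sequence


def eer(prev_error: float, error: float) -> float:
    """Error-evolving rate of Eq. (11), in percent."""
    return (1.0 - error / prev_error) * 100.0


def bnr(prev_rules: int, rules: int) -> float:
    """Balanced number of rules index of Eq. (15)."""
    return (rules - prev_rules) ** 2 / rules


def first_stage_below(errors: Sequence[float], threshold: float = 3.0) -> Optional[int]:
    """Return the first stage t whose EER is below the threshold, else None."""
    for t in range(1, len(errors)):
        if eer(errors[t - 1], errors[t]) < threshold:
            return t
    return None
```

With training errors `[1.0, 0.5, 0.3, 0.295]`, the EER drops from 50% to 40% and then to about 1.7%, so the third stage is the first one below the 3% threshold. Growing a rule base from 4 to 5 rules gives a BNR of 0.2 (within [0, 1]), whereas growing it from 2 to 6 rules gives a BNR above 1.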
4.2.3. Justifiable information granularity

Compact and semantic granules are vital to reduce complexity and to realize the optimum level of granularity. The term “compact” implies that the granule should include as many data as possible. However, the granule should also maintain semantic interpretability. The concept of justifiable granularity [39–41] (Lu et al., 2014) has recently become one of the fundamental building blocks of GrC. The principle of justifiable granularity (JG) is concerned with the formation of a semantic information granule. The term “justifiable” relates to the realization of the information granule, which is constructed such that it is (a) highly legitimate (justified) with regard to the experimental data, and (b) sufficiently specific to indicate the realization of well-defined semantics [41]. These two requirements naturally conflict with each other. Assume that the information granule $\Omega$ is based on available experimental evidence (data) $D = \{x_1, x_2, \ldots, x_N\}$. The first requirement is for the numeric evidence of $\Omega$ to be as high as possible. By doing so, the new information granule is expected to be well justified, which suggests good data representation. For example, if $\Omega$ is considered as a set, the set becomes more legitimate as more data fall within it. In the case of a fuzzy set, higher justifiability is quantified when a higher sum of membership degrees of the data is realized in $\Omega$. Assume that $f_1$ is an increasing function of $\mathrm{card}\{x_k \mid x_k \in \Omega\}$, and simply $f_1(u) = u$. The second requirement of JG is that $\Omega$ should be specific to realize well-defined semantics. Therefore, the cardinality of $\Omega$, $\mathrm{card}(\Omega)$, should be as low as possible, which means that a more compact information granule is desirable. Assume a decreasing function $f_2(m_\Omega)$, where $m_\Omega$ is the interval length of $\Omega = [a, b]$. A more semantic
Fig. 4. Optimization of the interval for information granule $\Omega = [a, b]$ [5].
information granule can be generated for higher values of $f_2(m_\Omega)$. The optimization is shown in Fig. 4. Because the two requirements conflict, the lower and upper bounds of the interval can be determined separately using the composite multiplicative index, V [23]:
$$V(b) = f_1(\mathrm{card}\{x_k \in D \mid \mathrm{med}(D) \leq x_k \leq b\}) * f_2(|\mathrm{med}(D) - b|), \tag{17}$$
and
$$V(a) = f_1(\mathrm{card}\{x_k \in D \mid a \leq x_k \leq \mathrm{med}(D)\}) * f_2(|\mathrm{med}(D) - a|). \tag{18}$$
Optimal upper and lower bounds can be constructed as [39]:
$$V(b_{opt}) = \max_{b > \mathrm{med}(D)} V(b), \tag{19}$$
$$V(a_{opt}) = \max_{a < \mathrm{med}(D)} V(a). \tag{20}$$
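The construction of the two bounds can be sketched in Python as follows. This is a minimal illustration under assumptions and not the procedure of [23] or [39]: it takes $f_1(u) = u$ and an exponentially decreasing $f_2(u) = e^{-\alpha u}$, restricts the candidate bounds to the observed data points, and uses the hypothetical function name `justifiable_interval`. Each bound is chosen on its own side of the median by maximizing the product of the number of data covered between the median and the bound and the specificity term.

```python
import math
from statistics import median
from typing import Sequence, Tuple


def justifiable_interval(data: Sequence[float], alpha: float = 1.0) -> Tuple[float, float]:
    """Return (a_opt, b_opt) for an interval granule built around med(D)."""
    med = median(data)

    def v_upper(b: float) -> float:
        # Coverage between the median and b, times specificity of |med - b|.
        covered = sum(1 for x in data if med <= x <= b)
        return covered * math.exp(-alpha * abs(med - b))

    def v_lower(a: float) -> float:
        # Coverage between a and the median, times specificity of |med - a|.
        covered = sum(1 for x in data if a <= x <= med)
        return covered * math.exp(-alpha * abs(med - a))

    upper = [x for x in data if x > med]
    lower = [x for x in data if x < med]
    b_opt = max(upper, key=v_upper) if upper else med
    a_opt = max(lower, key=v_lower) if lower else med
    return a_opt, b_opt
```

For the data `[1.0, 2.0, 2.5, 3.0, 3.5, 4.0, 9.0]`, `alpha = 1.0` yields the compact interval `(2.5, 3.5)`, whereas `alpha = 0.0` ignores specificity and stretches the interval to `(1.0, 9.0)` so that all data are covered.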
Alternatively, the JG requirements can be designed by the following forms (Lu et al., 2014):
$$f_1(u) = u, \tag{21}$$
$$f_2(u) = e^{-\alpha u}, \tag{22}$$
where $\alpha$ is the calibration parameter that regulates the specificity criterion of the constructed information granule. The specificity criterion achieves higher abstraction levels for higher $\alpha$ values; $\alpha = 0$ indicates that the specificity criterion is ignored.

4.3. T2: conflict decision and its resolver for the fuzzy granular model

Conflict decisions refer to the number of rules concurrently fired for an input in the fuzzy rule base (T2 in Table 1). This situation is observed when the data samples are distributed unevenly over the input domain with low space coverage [20]. Furthermore, the loss of information [42] is another important issue for conflict decisions, because it reduces accuracy when the interpretability-accuracy tradeoff is considered. To circumvent conflict decisions at the granular fuzzy rule base, fuzzy model studies focused on improving decision boundaries (i.e., MFs) or on customizing the decision boundaries by rule weights [18,9,20]. Other methods included rule compression [32,33] and reduction [43,44,35,16,36], which preserved the semantics at the fuzzy rule-base level. The aforementioned methods modeled the whole system comprehensively without considering the individual meaning of inputs. Therefore, a conflict situation leads to the incorrect classification of an unspecified part of the input space, thereby reducing accuracy. The information granule is considered optimal when it lessens the number of rules (i.e., reduces the complexity described in Section 4.2.1) and achieves high accuracy. Unlike the grid-partitioning approach, where the rule centroids are isolated to ensure interpretability [8], rules isolate themselves from one another in the fuzzy rule base system [20], as depicted in Fig. 5(A). Conflict decisions [Fig. 5(B)] arise under the uneven data distribution noted above. To resolve the conflict, studies on fuzzy rule-based systems focused on rule compression, which discarded less significant antecedents from individual rules and improved the interpretability and accuracy of the classifier [32,33,9]. Fig. 5(B) shows the conflict situation of two rules; their compressed rules are: (i) If x2 is A32, then y belongs to class 3; (ii) If x1 is A12, then y belongs to class 1. For example, Pizzileo et al. [36] employed forward rule selection, backward rule refinement, and optimization criteria to achieve a good tradeoff between interpretability and accuracy.
Rule compression may cause conflict situations in the unusual part of the input space and bias the system toward inaccurate classification of unseen samples [20]. Consistency checking (and deletion of the inconsistent rules) is another method to resolve conflicts [35,16]. However, this method is similar to rule reduction as reported by Alonso et al. [32] and Mencar et al. [33]. All antecedent parts generally function in the system, and their roles are either high or low. Therefore, rule reduction is illogical because a conflict situation leads to the incorrect classification of an unspecified part of the input space, which reduces accuracy. Another method in the fuzzy rule-based system focuses on improving decision boundaries to obtain as much accuracy as possible. Rule weights [18,9,45,20], which have rather limited capacities
Fig. 5. Fuzzy rule-base and its conflict (A) A five-rule fuzzy classifier (B) conflict situation for the first (R1 ) and third (R3 ) rules.
Fig. 7. Fuzzy partition with poor semantic interpretability [9].
Fig. 6. Decision boundaries and its effect on membership function [20].
to adapt to the system, are used to control decision boundaries. Furthermore, a rule weight affects all MFs associated with the rule under observation; therefore, a permanent decision border is attained relative to the rule weight. Fig. 6 shows the rules (rectangular form of V1–V3) and their corresponding MFs. A permanent decision border may be forcibly defined by the rule weight (shaded area in Fig. 6), which bounds the system when it has to handle an unseen sample. However, a dynamic decision border is more desirable than a permanent decision border because of the generalization ability required to represent the data. A dynamic decision border may be regarded as a conflict resolver wherein the decision border updates dynamically with respect to the evolving [8,30], the iterative [16], or the granular processes [40]. Note that a dynamic decision border is considered in the granular partition level, but not in the rule base level. A detailed discussion on controlling the decision border in the granular partition (i.e., through interpretability constraints and parameter optimization) is presented in Section 4.4. Unlike the improvement of decision boundaries, a 2-tuple fuzzy linguistic model [19,42] was proposed in which the linguistic representation is expressed by means of 2-tuples, namely a linguistic term and a symbolic translation of a numeric value. This expression leads to a continuous representation of the linguistic information, and thus to a tradeoff between interpretability and accuracy. Herrera and Martínez [42] developed multigranular linguistic contexts based on the 2-tuple fuzzy model without loss of information. Here, the concept of a hierarchical linguistic structure was introduced to improve accuracy. However, because of noise and unevenly distributed input data, a conflict situation can still exist in the rule base level although the interpretability constraint is used in the granular partition level.
Table 2 depicts the conflict situation in the granular rule base for new input information $x^{new} = (x_1^{new}, \ldots, x_p^{new}, \ldots, x_n^{new})$ (and $A^{new}$ as its fuzzy set), which is supposed to be fired at a specific granular rule. The maximum compatibility grade with the new pattern $x^{new}$ is used to determine the winner rule [46]. For the rule form of Eq. (1), the winner rule is expressed in Eq. (23).
$$y = C^{t,(r)}, \quad \arg\max_{1 \leq r \leq R} \left\{ \mu_{A^{t,(r)}_{1,2,\ldots,p,\ldots,n}}(x) \right\}, \quad \text{and} \quad \mu_{A^{t,(r)}_{1,2,\ldots,p,\ldots,n}}(x) = \max\left( \mu^{t,(r)}(x_1) \times \mu^{t,(r)}(x_2) \times \ldots \times \mu^{t,(r)}(x_n) \right). \tag{23}$$
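The winner-rule selection and the resulting conflict can be illustrated with a short Python sketch. It is a hedged example and not the method of [46]: Gaussian MFs, the product of the antecedent membership grades as the compatibility grade, the tolerance `tol`, and the names `compatibility` and `fire` are all assumptions made for illustration.

```python
import math
from typing import List, Sequence, Set, Tuple

# A rule is (antecedents, consequent label); each antecedent is (center, sigma).
Rule = Tuple[List[Tuple[float, float]], str]


def gaussian(x: float, center: float, sigma: float) -> float:
    """Gaussian membership grade (assumed MF shape)."""
    return math.exp(-((x - center) ** 2) / (2.0 * sigma ** 2))


def compatibility(antecedents: Sequence[Tuple[float, float]], x: Sequence[float]) -> float:
    """Product of the antecedent membership grades for input x."""
    grade = 1.0
    for (center, sigma), value in zip(antecedents, x):
        grade *= gaussian(value, center, sigma)
    return grade


def fire(rules: Sequence[Rule], x: Sequence[float], tol: float = 1e-9) -> Tuple[List[int], Set[str], bool]:
    """Return winner rule indices, their labels, and a conflict flag."""
    grades = [compatibility(ants, x) for ants, _ in rules]
    best = max(grades)
    winners = [r for r, g in enumerate(grades) if best - g <= tol]
    labels = {rules[r][1] for r in winners}
    # A conflict arises when equally compatible rules disagree on the output.
    return winners, labels, len(labels) > 1
```

For two rules centered at (0, 0) and (2, 2) with different consequents, the input (1, 1) is equally compatible with both, so `fire` reports a conflict; an input such as (0.2, 0.1) yields a single winner.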
Therefore, to resolve the conflict situation for the granular rule base (as described in Table 1), the rule firing method can be a significant GrC research scope. A number of the common reasoning methods for rule firing were described in Nakanishi et al. [47]. However, the 2-tuple fuzzy model [19] is a significant contribution for avoiding the loss of information; thus, the fusion of linguistic information (i.e., rule firing) yields more precise (i.e., more accurate) information. A very promising method regarding the firing level determination of individual antecedent terms based on new information was suggested by Yager [48]. Here, $A_p$ is an antecedent fuzzy set and $\mu_p$ is a fuzzy measure on space $x^{new}$, where p is the input attribute (feature). In addition, the method determines the firing level of the rule antecedent fuzzy set $A_p$ (simply A) when the input information about the variable is expressed using the measure $\mu_p$ (simply $\mu$); the satisfaction of A given $\mu$ is defined as the validation, or $\mathrm{Val}(A/\mu)$. The determination of $\mathrm{Val}(A/\mu)$ was carried out by Yager [48] based on the Choquet integral by considering $\mu$ as a probability measure. Furthermore, $\mathrm{Val}(A/\mu)$ was also derived using the Sugeno integral and the median of a function under the measure $\mu$.

In summary, conflict decision resolution is a very promising undertaking even when a granular (interpretable) rule base is considered. However, unlike rule weights, rule compression and reduction in the granular rule base level are significant concepts in firing the antecedent condition based on the new information. Nevertheless, very few works highlighted the firing of the individual antecedent part to maintain the interpretability-accuracy tradeoff in the granular rule base level.

4.4. T3: interpretability constraints and parameter optimization for interpretability-accuracy tradeoff

Designing a granular framework in which interpretability constraints and parameter optimization are realized is essential, such that semantic ability and accuracy are maintained (T3 in Table 1). Constraint realization is important for realizing semantic blocks and consequently achieving high interpretability. Notably, parameter optimization should not be considered alone, because its purpose is to increase accuracy, which reduces interpretability [35]. Therefore, constraint realization and parameter optimization should be executed in a manner in which the granular framework avoids the interpretability-accuracy dilemma.

4.4.1.
Constraint realization for semantic blocks

To realize the interpretable fuzzy information granule, it is vital to preserve the semantics of the MFs or to measure criteria such as distinguishability and coverage [1]. The semantics of the MFs actually determine the fuzzy partition level semantics [9]. Furthermore, constraining the MFs is important to avoid poor semantic interpretability such as that depicted in Fig. 7. Numerous approaches in the literature have attempted to ensure semantic integrity by applying constraints on the MFs. Some approaches use the FCM method for the semantic interpretability of the information granule [18,33] (Lu et al., 2014). In this study, the weighted information granule refers to rule weight and affects the decision boundaries of MFs as depicted in Fig. 6. Nevertheless, automatic generation of the information granules from data is a significant approach considering its dynamic adaptation to the application environment behavior [1]. The simplified structure evolving method (SSEM) [29] and the evolving-construction scheme for fuzzy system (ECSFS) [8] are error-reducing evolving methods. In both studies, the structure of the fuzzy rule base system evolved, and the evolving
Table 2
Conflict situation example in the granular (interpretable) rule base (the number of rules is five at the tth evolving stage) for new input information, $x^{new}$. This new input information should be fired at this rule base. Fired antecedent parts are shown as shadow.

Rule number | Granular fuzzy set for the antecedent part, x1 | Granular fuzzy set for the antecedent part, x2 | Granular fuzzy set for the consequent part, y
1 | $A_1^{t,(1)}$ | $A_2^{t,(1)}$ | $C^{t,(1)} = C^1$
2 | $A_1^{t,(2)}$ | $A_2^{t,(2)}$ | $C^{t,(2)} = C^1$
3 | $A_1^{t,(3)}$ | $A_2^{t,(3)}$ | $C^{t,(3)} = C^2$
4 | $A_1^{t,(4)}$ | $A_2^{t,(4)}$ | $C^{t,(4)} = C^3$
5 | $A_1^{t,(5)}$ | $A_2^{t,(5)}$ | $C^{t,(5)} = C^3$
Winner rule using Eq. (23) | Conflict: Outputs of rules 2 and 4 are realized as $C^1$ and $C^3$, respectively.
Fig. 8. Creation and update of fuzzy rules and MFs for a closed selected region [8].
processes continued until the desired accuracy is achieved. Fig. 8 illustrates the evolving procedure of the fuzzy information granule and the dynamic adaptation of its semantic block. In addition, extremum and inflexion points were computed using the least square method (LSM). Consequently, variances of the information granules are dynamically attained to obtain a semantic block. Pulkkinen and Koivisto [49] proposed “a dynamically constrained multiobjective genetic fuzzy system for regression problems” that considers these two objectives: mean square error (MSE) and total rule length (sum of the rule lengths). This evolving approach tunes the three parameter MFs with dynamic constraint to achieve the interpretability-accuracy tradeoff, and therefore guarantee the distinguishability and coverage of the fuzzy partition based on the following criteria:
(1) Symmetry condition: All MFs should be symmetrical in shape. This criterion can be guaranteed by definition if Gaussian, triangular, or trapezoidal MFs are used.
(2) $\alpha$-condition: The intersection point of two MFs should be constrained. “At any intersection point of two MFs, the membership values are at most $\alpha$”.
(3) $\gamma$-condition: The overlap at the center of each MF should be constrained. “At the center of each MF, no other MF receives membership values larger than $\gamma$”.
(4) $\beta$-condition: The strongly covered universe of discourse (UoD) should be constrained. “At least one MF has its membership value at $\beta$”.
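The last three conditions can be checked numerically for a candidate partition. The Python sketch below is an illustration under assumptions and not the tuning procedure of [49]: it assumes Gaussian MFs, samples the universe of discourse on a uniform grid (the `steps` parameter is a hypothetical knob), reads the coverage condition as requiring that some MF reaches at least the coverage level at every sampled point, and bounds the height of each intersection point by the largest pointwise minimum of the two MFs involved.

```python
import math
from itertools import combinations
from typing import Dict, List, Tuple


def gaussian(x: float, center: float, sigma: float) -> float:
    """Gaussian MF value at x (assumed MF shape for this sketch)."""
    return math.exp(-((x - center) ** 2) / (2.0 * sigma ** 2))


def check_partition(
    mfs: List[Tuple[float, float]],
    lo: float,
    hi: float,
    alpha: float = 0.8,
    gamma: float = 0.25,
    beta: float = 0.05,
    steps: int = 1000,
) -> Dict[str, bool]:
    """Check the intersection, center-overlap, and coverage conditions.

    mfs is a list of (center, sigma) pairs defined over [lo, hi].
    """
    grid = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    values = [[gaussian(x, c, s) for (c, s) in mfs] for x in grid]

    # Intersection condition: where two MFs cross, both grades are equal,
    # so the pointwise minimum of a pair never exceeds alpha.
    alpha_ok = all(
        min(row[i], row[j]) <= alpha
        for row in values
        for i, j in combinations(range(len(mfs)), 2)
    )
    # Center-overlap condition: at each center, other MFs stay at or below gamma.
    gamma_ok = all(
        gaussian(ci, cj, sj) <= gamma
        for i, (ci, _) in enumerate(mfs)
        for j, (cj, sj) in enumerate(mfs)
        if i != j
    )
    # Coverage condition: every sampled point has some grade of at least beta.
    beta_ok = all(max(row) >= beta for row in values)
    return {"alpha": alpha_ok, "gamma": gamma_ok, "beta": beta_ok}
```

Three unit-width MFs centered at 0, 3, and 6 on [0, 6] satisfy all three checks with the default thresholds, whereas two unit-width MFs centered at 0 and 0.5 overlap too strongly and fail the first two checks.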
These constraint values should be predefined before applying the dynamic tuning strategies. The authors suggest the following suitable values: $\alpha = 0.8$, $\gamma = 0.25$, and $\beta = 0.05$. For highly interpretable fuzzy partitions, they also recommended using the following values: $\alpha = 0.6$, $\gamma = 0.4$, and $\beta = 0.1$. eFSM [35] is a self-organizing model that ensures online learning. This fuzzy system attempts to design a compact fuzzy rule base system that ascertains a clear semantic meaning of fuzzy partitions. Furthermore, eFSM uses the uniform coverage criterion (i.e., a threshold) during structure learning. Nevertheless, the threshold causes unbalanced MFs when the data are unevenly distributed over the application domain. SaFIN [16] also utilizes a self-adaptation method and introduces a new clustering technique defined as categorical learning-induced partitioning (CLIP), which derives “inspiration from the behavioral category learning process demonstrated by humans”. Fig. 9 shows the CLIP procedure, which considers both antecedent and consequent parts. The proposed method initiates the first information granule with the boundary defined by $\mathrm{min}_p$ and $\mathrm{max}_p$, and the same procedure repeats for the output domain. Afterwards, adaptation and evolution continue for the next incoming training tuple $(x_p, d)$. The first fuzzy cluster (antecedent part, $A_{1p}$, with its parameters center, $c_{1p}$, and variance, $\sigma_{1p}$) in the input dimension p can be formed using Eqs. (24)–(25), as illustrated in Fig. 9(A).
$$c_{1p} = x_p, \tag{24}$$
Fig. 9. CLIP procedures for each input–output dimension (A) Initialization. (a) First cluster formed if the center coincides with the midpoint of the domain. (b) First cluster formed before regulation. (c) First cluster formed after regulation. (B) Clustering. (a) Introduction of a novel data point with no left neighbor. (b) Creation of a new cluster before regulation. (c) Final appearance of the fuzzy partitioning after regulation. (d) Introduction of a novel data point with both left and right neighbors. (e) Creation of a new cluster before regulation. (f) Final appearance of the fuzzy partitioning after regulation [16].
$$\sigma_{1p} = R\left( -\frac{(\mathrm{min}_p - x_p)^2}{\log \alpha}, \; -\frac{(\mathrm{max}_p - x_p)^2}{\log \alpha} \right), \tag{25}$$
where $\alpha > 0$ is the minimum membership value that strongly covers the UoD. To maintain the distinct semantic meaning, the proposed system defines the regulator function $R(\sigma_1, \sigma_2) := \frac{1}{2}[\sigma_1 + \sigma_2]$, which buffers for either side from the center. In SaFIN, the similarity threshold ($\beta$) represents the similarity between two consecutive fuzzy levels. Fig. 9(A) exhibits the initialization process, which considers whether or not the center coincides with the midpoint of the domain. Afterwards, the variance is regulated for the first cluster using R(.) based on the center [Fig. 9A(c)]. After the initialization of the first fuzzy cluster, evolution and adaptation continue using the incremental method shown in Fig. 9B(a–f). In this stage, SaFIN considers three conditions: (a) only the left neighbor exists, (b) only the right neighbor exists, and (c) both left and right neighbors exist. Consider a novel data point (based on $\beta$) with no left neighbor (i.e., the right neighbor exists) as shown in Fig. 9B(a–c). The final appearance after regulating the fuzzy MFs is shown in Fig. 9B(c). Consider another condition with a novel data point that has both left and right neighbors (Fig. 9B(d–f)). The final form of the fuzzy cluster after applying the regulator function R(.) is illustrated in Fig. 9B(f). Nevertheless, SaFIN uses the initial boundary of $\mathrm{min}_p$ and $\mathrm{max}_p$, which is arbitrarily regarded as the first cluster; the number of clusters might differ depending on the chosen value of this boundary. Evidently, the variances of the new fuzzy cluster are calculated simply by averaging the left and right neighbors. Therefore, the variances attain a constant value after a new cluster is formed, without any room for further adaptation. Hence, an ignored novel
data point might belong in an existing fuzzy cluster, consequently reducing semantic interpretability and accuracy.

4.4.2. Parameter optimization

Parameter optimization primarily aims to maximize the average agreement of the linguistic model with the experimental numerical data [50]. The designer articulates a model based on some specific perspectives; therefore, the developed model pivots almost exclusively on those perspectives [18]. Consequently, the model may not fully reveal the nature of the experimental data unless its parameters are optimized. Therefore, a successive optimization technique is significant for refining and readapting the information granules to achieve the interpretability-accuracy tradeoff. However, parameter optimization of the information granules mainly improves the decision boundaries of MFs [35]. Assume that Q is the optimization factor and N is the number of training data. The optimization problem can be expressed as [18]
$$Q = \frac{1}{N} \sum_{k=1}^{N} [T_k(y_k) - 1] \rightarrow \mathrm{Min}, \tag{26}$$
which should be optimized to achieve the highest human abstraction level (or to match the highest possible level). Ideally, the main goal is to reach the highest degree of matching between the numeric output data point $y_k$ and the granular output $T_k$ [i.e., the membership grade $T_k(y_k)$ at $y_k$ is considered as equal to 1].

4.4.2.1. Gradient-descent method and its drawback. Gradient-descent is a popular iterative refinement technique for the linguistic model in the successive optimization course of the information granules. This method starts from a given initial point and follows the negative of the gradient to move toward a critical point
Fig. 10. Gradient-descent technique (A) initial stage with a guess and, (B) its recursive procedure until a stopping criterion is fulfilled [51].
(Fig. 10); it is a recursive procedure, where optimization proceeds toward a desired local minimum. Moreover, Pedrycz and Keun-Chang [18] proposed a user-centric model, which modeled an information granule system based on the fuzzy rule and which optimized its parameters using a gradient-based technique. Let the optimization factor (Q) be represented using (26) as the sum of squared errors, such that,
$$E = \frac{1}{N} \sum_{i=1}^{N} (\hat{y}_i - y_i)^2, \tag{27}$$
where $\hat{y}_i$ and $y_i$ are the model and desired outputs, respectively. The new gradient point ($w_{new}$) is the weight vector from the FCM algorithm determined based on the gradient vector ($\Delta w$) and on a step down from the current state ($w_{old}$).
$$w_{new} = w_{old} + \Delta w. \tag{28}$$
The gradient vector $\Delta w$ is determined from the derivative of the error measure on each parameter. Therefore, for each jth step of the gradient-descent technique [18],
$$\Delta w_j = -\eta \frac{\partial E_i}{\partial w_j} = -\eta \frac{\partial E_i}{\partial \hat{y}_i} \frac{\partial \hat{y}_i}{\partial w_j}, \tag{29}$$
where $E_i$ is the error for the ith training data and
$$\frac{\partial E_i}{\partial \hat{y}_i} = 2(\hat{y}_i - y_i), \tag{30}$$
$$\frac{\partial \hat{y}_i}{\partial w_j} = u_j. \tag{31}$$
Thus, $w_{new}$ can be rewritten for each jth step of the gradient-descent technique as follows [18]:
$$w_{new} = w_{old} - 2\eta(\hat{y}_i - y_i) u_j, \tag{32}$$
where $\eta$ and $u_j$ are the iteration rate and output parameter with rule weight, respectively. However, the gradient-descent learning method only focuses on improving the global performance or output accuracy of the fuzzy rule base model. Given that the information granule extracts meaningful knowledge from low-level training data, considering global accuracy without local semantic interpretability may negatively affect the interpretability-accuracy tradeoff. Undoubtedly, any optimization considered in the output domain (or consequent part) of the fuzzy information granule inclines the fuzzy model to optimize the input domain (antecedent part). However, if optimization is considered concurrently in both global (output) and local (input) domains, it will hinder the semantic interpretability of the fuzzy model. Therefore, an operational framework which incorporates both global and local optimization is proposed by eFSM
Fig. 11. Optimized fuzzy rule bases which cover the extrema of a function. The extrema points are defined as the fuzzy system center [52].
[35]. A localized parameter adaptation (LPA) is developed in eFSM alongside the global accuracy; it maintains clear semantic interpretability and thereby achieves the interpretability-accuracy tradeoff. Furthermore, eFSM uses the back propagation method, and the iterative process continues until a stopping criterion is fulfilled.
4.4.2.2. Extremum and inflexion points. Designing a mathematical framework that considers the concurrent execution of the evolving information granule and its optimization is interesting. The concept of extremum points was suggested by Kosko [52] and mathematically explained by Yongsheng et al. [53] to split the input-output domain in an optimized manner. As shown in Fig. 11, the extrema or bumps of the function f(x) can be obtained naturally, and these bumps are regarded as the centers of the rule patches to minimize the model error, E (Eq. (27)). To determine the bumps (i.e., x points) that minimize E, learning the derivative map of E with respect to x and then setting this derivative to zero is suggested. Various algorithms and clustering approaches are used to build the granular block shown in Table 3 (fifth, sixth, seventh, and tenth columns); these methods mainly aim to attain information granules with an interpretability-accuracy tradeoff. Similarly, SSEM [29] and ECSFS [8] both use extremum and inflexion points to split the input-output domain. LSM is used in these methods to realize the extremum or inflexion points. The splitting point of a sub-region is selected based on the maximal error point as determined by LSM. Moreover, an underfitting or overfitting condition in the ECSFS (see Section 4.2.2) is checked at each evolving stage based on the error-evolving rate (EER) index. In addition, 3% is considered the efficient threshold value to realize the best-fitted fuzzy model. In contrast, SSEM uses the gradient-descent technique to improve global accuracy. Furthermore, Ahmed and Isa [30] and Ahmed et al. [31] also utilized the concept of extremum points.
These methods are regarded as the concurrent execution of evolving and self-organizing methods. In addition, dynamic constraints and underfitting/overfitting situations are also considered
Table 3 Summary of selected fuzzy models to assess the interpretability-accuracy tradeoff.
Columns: Authors | Type | NR | Acc. | CM | CONST. | PO | ACD | O/U Condition | CBC
Models summarized: Pedrycz and Keun-Chang [18]; Liu et al. [44]; Herrera et al. [54]; Pulkkinen and Koivisto [49]; Tung and Chai [35]; Mencar et al. [33,55]; Tung et al. [16]; Di et al. [56]; Pizzileo et al. [36]; Carrasco and Villar [57]; Di et al. [8], Wang et al. [29]; Solis and Panoutsos [50]; Sanz et al. [45]; Zhao et al. [58]; Sanchez et al. [59]; Oh et al. [60]; Sanchez et al. [61]; Fazzolari et al. [62]; Lu et al. (2014); Pedrycz and Izakian [63]; Cpałka et al. [64]; Leski [65]; Ahmed and Isa [30]; Wang et al. [66]; Ahmed et al. [31]; Reyes-Galaviz and Pedrycz [27]; Reyes-Galaviz and Pedrycz [67].
NR = Number of Rules, CBC = Condition-based Clustering, CM = Clustering Method, CONST. = Constraint, Acc. = Accuracy, O/U = Overfitting/Underfitting, PO = Parameter Optimization, ACD = Avoid Conflict Decision in rule base, FCM = Fuzzy c-means, C-FCM = Conditional FCM, M-FCM = Modified FCM, c and p = number of clusters and prototypes, REG. = Regression, CLAS. = Classification, GFS = Genetic Fuzzy System, HL = Hebbian Learning, SM = Similarity Measure, LSM = Least Square Mean, EM = Evolving Method, DS = Distance Similarity, DM = Decision Making, UE = Uncertainty Estimation, FOU = Footprint of Uncertainty, HCD = Hybrid Centroid Density, MOEA = Multi-objective Evolutionary Algorithm, FD = Fuzzy Discretization, TW = Time Window, GD = Gradient Descent, TE = Taylor Expansion, NFN = Neuro-fuzzy Network, IAIC = Improved Akaike’s Information Criteria, LPA = Localized Parameter Adaptation, CLIP = Categorical Learning-induced Partitioning, APV = Adjusted p-value, ST = Search Tree, PSO = Particle Swarm Optimization, UG = Uncertainty-based Granule, DC = Dynamic Constraint, Ev M = Evolutionary Method, ST = Separability Technique, SIFS = Self-structuring Interpretable Fuzzy System, CPN = Context-based Polynomial Neuron, PN = Polynomial Neuron, LSE = Least Square Error, RMSE = Root Mean Square Error, DE = Differential Evolution, JG = Justifiable Granularity, NN = Neural Network, ST = Semantic Translation, LS = Linguistic Summarization, DR = Degree of Representativeness, 2-TFLM = 2-tuple Fuzzy Linguistic Model, LH = Linguistic Hierarchies.
for each concurrent execution, thereby, realizing prominent distinction points in the knowledge base.
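As a minimal sketch of the two ingredients discussed in Section 4.4.2, the following Python fragment first fits a single linear consequent with the per-sample update of Eq. (32) and then takes the maximal-error point as the candidate split, in the spirit of the error-driven splitting described for SSEM and ECSFS. The one-input model, the bump-shaped test function, the iteration rate, and the helper names `sgd_fit_line` and `max_error_point` are assumptions made for illustration only; the LSM-based realization of [8,29] is not reproduced here.

```python
import math


def sgd_fit_line(xs, ys, alpha=0.01, epochs=200):
    """Fit y_hat = w[0] + w[1] * x with the per-sample update of Eq. (32)."""
    w = [0.0, 0.0]
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            u = (1.0, x)                          # u_j: regressors of the linear consequent
            err = w[0] * u[0] + w[1] * u[1] - y   # (y_hat_i - y_i)
            # Eq. (32): w_new = w_old - 2 * alpha * (y_hat_i - y_i) * u_j
            w = [wj - 2.0 * alpha * err * uj for wj, uj in zip(w, u)]
    return w


def max_error_point(xs, ys, w):
    """Return the input with the largest absolute model error, and that error."""
    errors = [abs(w[0] + w[1] * x - y) for x, y in zip(xs, ys)]
    k = max(range(len(xs)), key=errors.__getitem__)
    return xs[k], errors[k]


# Hypothetical data: a single bump centred at x = 0 (cf. the extrema in Fig. 11).
xs = [i / 10.0 for i in range(-20, 21)]
ys = [math.exp(-4.0 * x * x) for x in xs]

w = sgd_fit_line(xs, ys)
split_x, split_err = max_error_point(xs, ys, w)

# Split the region at the maximal-error point into two candidate sub-regions.
left = [(x, y) for x, y in zip(xs, ys) if x <= split_x]
right = [(x, y) for x, y in zip(xs, ys) if x > split_x]
print(split_x, round(split_err, 3), len(left), len(right))
```

In an evolving scheme, each sub-region would receive its own rule and the procedure would be repeated until a termination condition is met; that outer loop is omitted here.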
4.4.2.3. Information granularity allocation. The recently proposed allocation of information granularity [39–41] (Lu et al., 2014) is one of the fundamental building blocks of granular computing. The optimal allocation of information granularity (OAIG) maximizes the coverage criterion, which means that the granular realization covers the maximum amount of data [39]. Let there be experimental evidence y = f(x) in the form of input-output pairs $[x_i, t_i]$. Consider the mapping function y = f(x, a), where a is the vector of mapping parameters. The mechanism of optimal allocation G is applied to a as A = G(a); therefore, the granular mapping with optimal allocation is
$$Y = G\left(f(x, a)\right) = f\left(x, G(a)\right) = f(x, A). \tag{33}$$
Two criteria should be considered in this optimization. The first is the coverage criterion for the output data, $t_i$, $i = 1, 2, \ldots, N$. For all N input-output pairs, the inclusion degree of the output data is analysed as follows:
$$Q = \frac{1}{N}\sum_{i=1}^{N}\mathrm{incl}\left(t_i, Y_i\right). \tag{34}$$
The second criterion concerns the specificity of $Y_i$, which is preferred to be as high as possible. Thus, the length of $Y_i$ is computed based on the length of its β-cuts, $\int_0^1 \mathrm{length}\left(Y_i^{\beta}\right)\beta\,d\beta$. As a consequence of the decreasing function of length, $g(\mathrm{length}(Y_i))$, consider the execution of the second criterion based on the following satisfaction:
$$\varepsilon = \sum_{j=1}^{\dim(a)}\varepsilon_j, \tag{35}$$
where $Y_i = f(x_i, G(a))$, ε is the information granule level, and $\boldsymbol{\varepsilon} = \left[\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_{\dim(a)}\right]^T$.
Fig. 12. Relationship between Q and ε.
Therefore, the two-criterion optimization problem is expressed as follows:
$$\text{Maximize: } Q = \frac{1}{N}\sum_{i=1}^{N}\mathrm{incl}\left(t_i, Y_i\right) \quad \text{subject to: } \varepsilon = \sum_{j=1}^{\dim(a)}\varepsilon_j. \tag{36}$$
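The coverage criterion of Eq. (34) and the Q versus ε curve can be illustrated with a small Python sketch. It assumes a linear mapping f(x, a), interval-valued granular parameters $A_j = [a_j - \varepsilon_j/2,\ a_j + \varepsilon_j/2]$, a binary inclusion measure (1 if $t_i$ lies in $Y_i$, 0 otherwise), and a uniform allocation $\varepsilon_j = \varepsilon/\dim(a)$ used as a baseline. All of these choices, the synthetic data, and the function names are assumptions for illustration; an optimal allocation would instead search over the $\varepsilon_j$ under the constraint of Eq. (36).

```python
import math


def granular_output(x, a, eps):
    """Interval Y = f(x, G(a)) for a linear model with interval coefficients."""
    lo = hi = 0.0
    for xj, aj, ej in zip(x, a, eps):
        p, q = (aj - ej / 2.0) * xj, (aj + ej / 2.0) * xj
        lo += min(p, q)
        hi += max(p, q)
    return lo, hi


def coverage(data, a, eps):
    """Q = (1/N) * sum_i incl(t_i, Y_i), with incl = 1 if t_i is inside Y_i."""
    hits = 0
    for x, t in data:
        lo, hi = granular_output(x, a, eps)
        if lo <= t <= hi:
            hits += 1
    return hits / len(data)


def coverage_curve(data, a, eps_levels):
    """Q as a function of the total granularity level, uniform allocation."""
    dim = len(a)
    return [coverage(data, a, [e / dim] * dim) for e in eps_levels]


def area_under_curve(xs, ys):
    """Trapezoidal area under the Q(eps) curve."""
    return sum((xs[k + 1] - xs[k]) * (ys[k] + ys[k + 1]) / 2.0
               for k in range(len(xs) - 1))


# Hypothetical evidence: t = 1 + 2*x plus a bounded deterministic disturbance.
data = [((1.0, i / 10.0), 1.0 + 2.0 * (i / 10.0) + 0.3 * math.sin(7.0 * i))
        for i in range(21)]
a = [1.0, 2.0]                              # numeric parameters of f(x, a)
eps_levels = [k / 10.0 for k in range(21)]  # total granularity 0.0 .. 2.0

q_curve = coverage_curve(data, a, eps_levels)
auc = area_under_curve(eps_levels, q_curve)
print(q_curve[0], q_curve[-1], round(auc, 3))
```

Under these assumptions the intervals widen as ε grows, so coverage cannot decrease, while specificity is lost; the area under the curve condenses this tradeoff into a single number.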
The relationship between Q and ε gives a global view of the performance of the information granule, as shown in Fig. 12. A higher area under the curve (AUC) indicates better optimization performance. For both OAIG and JG (Section 4.2.3), parameters are optimized after the initial information granules are constructed. Furthermore, the numeric evidence of ε depends on the input/output domain distribution; it might be too low to properly represent the data. Therefore, a significant framework, which concurrently considers justifiability and specificity at each evolving stage, can be designed.
4.4.3. Constraint realization and parameter optimization discussions
In Section 4.4, constraint realization and parameter optimization are regarded as essential parts of GrC. Inspired by the aforementioned methods, a dynamic constraint based on a self-organizing method can be a suitable design for a granular framework. Unlike SaFIN, where the initial boundary choice may change the number of clusters, the use of extremum and inflexion points might be the best way to reduce error. Furthermore, ECSFS uses the LSM technique to realize the extremum and inflexion points. Therefore, a dynamic constraint using a self-organizing technique might be significant, particularly with a design wherein the extremum and inflexion points are self-determined based on experimental evidence. Moreover, the design framework can be considered as the concurrent execution of justifiability and specificity. Thus, for each evolving stage, this concurrent execution can be realized in a FIG, which compromises between interpretability and accuracy.
4.5. Research on interpretability-accuracy tradeoff
Various studies have proposed interpretable fuzzy models (or fuzzy information granules) to achieve the interpretability-accuracy tradeoff. Table 3 lists selected fuzzy models from the literature, grouped by publication year.
Table 3 also summarizes the works that considered the interpretability-accuracy tradeoff for fuzzy models. The number of rules and accuracy are the common interpretability and accuracy measures. Nevertheless, the interpretability constraint (Section 4.4.1) and the over/under-fitting condition (Section 4.2.2) are significant in considering the interpretability-accuracy tradeoff. Furthermore, an information granule should be specific in order to carry well-defined semantics. Therefore, the interpretability constraint can be significantly considered for GrC to provide a descriptive representation of the experimental evidence. Various
Fig. 13. Web of information granules based on the context-based clustering approach [70].
models considered the interpretability constraint depicted in the sixth column of Table 3. An in-depth review of the interpretability constraint is given in Section 4.4.1. Consequently, various clustering approaches applied the interpretability constraint (fifth column of Table 3). Conflict resolver and parameter optimization are also important factors in increasing accuracy (eighth and seventh columns in Table 3) as examined in Sections 4.3 and 4.4.2, respectively. Nevertheless, few studies considered context- or condition-based approaches and overfitting or underfitting situations as the evolving granulation process continues. Details of context-based and over/under-fitting approaches are found in Sections 4.5.1 and 4.2.2, respectively. Selected fuzzy models are explored in the following subsections.
4.5.1. Context-based fuzzy system
The conditional or context-based fuzzy granular model proposed by Pedrycz [68,69,18] involves the conditional fuzzy C-means. This model mainly aims to define the output context partition and then cluster the corresponding inputs. A consistent approach in the conditional clustering operation can be formulated as follows [70]: determine the structure in X under condition (context) D. The fuzzy subset of X looks more promising under a specific condition. Hence, the fuzzy context-based clustering algorithm emphasizes only the fuzzy subset of X implied (or conditioned) by the context D [70]. Therefore, directionality or context is imposed upon the application data to realize the promising fuzzy subset of X. A web of information granules can be found for a set of contexts (F, G, and H) as illustrated in Fig. 13. Similar to X, fuzzy subsets (shaded balls in Fig. 13) are conditioned by F, G, and H. The numbers of contexts and clusters per context are predefined and fixed in Oh et al.
[60], Pedrycz and Keun-Chang [18], Reyes-Galaviz and Pedrycz [27], and Reyes-Galaviz and Pedrycz [67]; hence, a computational model of the fuzzy system is manually designed by experts [8]. In addition, the number of output contexts and their corresponding input clusters are based on the distinct nature of the data and are considered locally distributed. The result is often highly prejudiced and uncertain because of the limited prior information available to humans when designing fuzzy models. Without considering the input space, the output domain partition may cause underfitting or overfitting (Section 4.2.2), which can lead to inaccurate performance. Because data are unevenly distributed in the input space, the input space should be considered when partitioning the output domain to avoid an imbalanced partition. This imbalanced partition of the output (or input)
domain may be referred to as the overfitting condition. Moreover, because the output context and its corresponding input clusters are realized fully independently, they require refinement and refocusing. Hence, the stability–plasticity dilemma [35,16] in Pedrycz’s method prevents the fuzzy system from incorporating past and future knowledge. Furthermore, Di et al. [8,56] and Wang et al. [29] also used the same theme for the context-based fuzzy system. Their works involved evolving methods that change the context and the input cluster until a satisfactory result is achieved (details in Sections 4.4.1 and 4.5.2). Moreover, the semantic cointension-based fuzzy rule-based classifier [33] also uses the conditional-based fuzzy approach. The data granulation process of the input cluster uses the projection method based on the class level. The bound values (membership function variance) and cut sequences are all based on the output constraint. Unlike the classification setting considered by Mencar et al. [33], regression problems should granulate the data such that the data granulation concurrently processes both output and input domains, thereby avoiding underfitting/overfitting situations. The results in Di et al. [56] and Mencar et al. [33] are highly uncertain because prior knowledge (a user-defined number of clusters) was used to design these fuzzy systems. Their results are subjective considering that the user-defined number of clusters is applied to the application environment. Nonlinear training or testing errors can be observed in the evaluation, and an overfitting/underfitting assessment is absent. Therefore, uncertain results (nonlinear training or testing error) are obtained for some clusters. For example, based on Mencar et al. [33], a nonlinear trend in the testing errors was observed for the ionosphere dataset as the number of clusters increases. For the automobile dataset, Di et al.
[56] showed that the training error increases and the testing error decreases as the number of rules increases. Nevertheless, inconsistent results between automobile and census datasets were derived. Similar to the method of Oh et al. [60] and Pedrycz and Keun-Chang [18], the output domain is evenly partitioned; therefore, the output domain ignores the local distribution of the input data. An evenly partitioned output domain may also cause underfitting or overfitting, thereby leading to inaccurate performance.
4.5.2. Evolving granule methods to realize application environment
To clarify the evolving granulation process, Fig. 14 compares the fixed grid-partitioning approach with the evolving method. In the grid-partitioning approach, a priori partitioning is considered for the domains of the input (antecedent) variables; thus, a specific number of MFs is realized for each domain [8]. This approach transforms the learning problem of fuzzy systems into a simple linear-learning problem. However, this transformation has several drawbacks: 1) the number of fuzzy terms and rules increases exponentially with the number of input variables, and 2) not all the fuzzy terms are effective because the even partitioning does not consider how knowledge is distributed in the input domains. In contrast, the evolving method [Fig. 14(C–E)] adds fuzzy terms and rules based on the knowledge distribution in the input (or output) domain. For example, Pedrycz and Keun-Chang [18], Oh et al. [60], Pedrycz and Izakian [63] and Leski [65] used the FCM clustering method (fifth column of Table 3) to evolve the fuzzy terms and rules in the input/output domain. Furthermore, Reyes-Galaviz and Pedrycz [27] utilized an arbitrarily chosen output interval for the output granule and the FCM, whereas Reyes-Galaviz and Pedrycz [67] used the same output interval for the output granule and the min-max neural network for the input cluster.
Nevertheless, a threshold is used in Reyes-Galaviz and Pedrycz [67] to control the size of the hyperboxes (clusters); the number of hyperboxes relies heavily on the value of this threshold. However,
the automatic generation of the information granule from knowledge has significant potential because it relates more closely to human behavior and depicts a high abstraction level. Several classic and evolving methods that consider the automatic generation of information granules include EFuNN [71], Hebb-R–R [44], POPFNN by Zhou and Quek [72], RSPOP [43], and DENFIS [73]. As for other systems, error-reducing evolving methods are described in the SSEM [29] and the ECSFS [8]. In both studies, the structure of the fuzzy rule base system evolved, and the errors were reduced to fit the changes within the given system. These evolving processes continued until the desired threshold accuracy was achieved. In addition, extremum and inflexion points were computed by using LSM to obtain optimal accuracy. The detailed parameter realization of these methods is described in Section 4.4. Learning methods employed in SSEM and ECSFS are based on global and localized learning for the consequent and antecedent rule parameters, respectively. Without considering the antecedent part, the lack of localized learning in the consequent part may cause an imbalanced partition of the output domain. Furthermore, the existing ECSFS and SSEM models prevent the fuzzy system from incorporating past and future knowledge. Therefore, the stability–plasticity dilemma [16] is not addressed in the present evolving models, and ECSFS and SSEM may select noise data as an extremum or inflexion point. The output-constrained cluster approach [56] was proposed to consider the output domain for partitioning the input data. In this approach, the output space is first roughly partitioned by fuzzy c-means, and then the data within each output constraint are further refined based on “separability,” which refers to the connectivity of the inputs.
The output-constrained cluster approach results are highly uncertain because prior knowledge for designing the fuzzy system comes from users with limited competence (see Section 4.5.1). The output domain is evenly partitioned as in Pedrycz’s method [18], thereby ignoring the local distribution of the input data. An evenly partitioned output domain may also cause underfitting or overfitting, thus leading to inaccurate performance.
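The context-based clustering idea of Sections 4.5.1 and 4.5.2 (fix an output context first, then cluster the inputs under it) can be sketched with the conditional fuzzy c-means membership update, in which the memberships of each datum sum to its context membership $f_k$ rather than to 1. The fragment below is a one-dimensional illustration under stated assumptions: the triangular context "output is low", the toy data, the initialization, and the function names are hypothetical and do not come from any of the reviewed models.

```python
def tri(y, a, b, c):
    """Triangular membership function with support (a, c) and peak at b."""
    if y <= a or y >= c:
        return 0.0
    return (y - a) / (b - a) if y <= b else (c - y) / (c - b)


def memberships(x, centers, f, m):
    """Conditional FCM update: u_i = f / sum_j (d_i / d_j)^(2/(m-1))."""
    d = [abs(x - v) for v in centers]
    zeros = [i for i, di in enumerate(d) if di == 0.0]
    if zeros:  # datum coincides with a prototype: share f among those prototypes
        return [f / len(zeros) if i in zeros else 0.0 for i in range(len(d))]
    p = 2.0 / (m - 1.0)
    return [f / sum((di / dj) ** p for dj in d) for di in d]


def conditional_fcm(xs, context, c=2, m=2.0, iters=50):
    """Cluster inputs xs under the context memberships f_k (1-D sketch)."""
    lo, hi = min(xs), max(xs)
    centers = [lo + (i + 0.5) * (hi - lo) / c for i in range(c)]
    for _ in range(iters):
        U = [memberships(x, centers, f, m) for x, f in zip(xs, context)]
        for i in range(c):
            den = sum(u[i] ** m for u in U)
            if den > 0.0:
                centers[i] = sum((u[i] ** m) * x for u, x in zip(U, xs)) / den
    U = [memberships(x, centers, f, m) for x, f in zip(xs, context)]
    return centers, U


# Hypothetical input-output pairs; the last one lies outside the context.
xs = [0.0, 0.1, 0.2, 1.0, 1.1, 1.2, 5.0]
ys = [0.0, 0.1, 0.0, 0.2, 0.1, 0.3, 2.0]
context = [tri(y, -1.0, 0.0, 1.0) for y in ys]   # context D: "output is low"

centers, U = conditional_fcm(xs, context)
print([round(v, 3) for v in centers])
```

Because the datum at x = 5.0 has zero membership in the context, it does not influence the prototypes, which illustrates how the output context directs the clustering of the input space. The number of contexts and clusters is still fixed by the user here, which is exactly the limitation discussed above.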
4.5.3. Self-adaptation methods
SONFIN [74], eFSM [35] and SaFIN [16] are self-organizing models that ensure online learning. These fuzzy systems attempt to design a consistent and compact fuzzy rule base that ensures a clear semantic meaning of fuzzy partitions with reasonable accuracy. These models considered the stability–plasticity dilemma, such that previous knowledge and new information are integrated. Adaptation occurs independently at the consequent and antecedent parts. Therefore, structure learning includes pruning inconsistent or identical rules and deleting orphaned rules. Hence, an operational framework for granular computing is required to synchronize the self-adaptation in both consequent and antecedent parts. The formation of the distinct information granule should also consider the aforementioned limitations of the existing methods. Moreover, the similarity threshold (β ∈ [0, 1]) is used in self-adaptation and online learning. Consequently, accuracy and interpretability vary for different values of β [16]. SONFIN and eFSM use the uniform coverage criterion (i.e., threshold) during structure learning, similar to Reyes-Galaviz and Pedrycz [67]. Rational partitions highly depend on the input-output data distribution. Hence, the threshold causes uncertain partitions in both models. In SaFIN, the clustering approach effectively addresses the stability–plasticity condition. SaFIN integrates new and old knowledge, such that a distinct cluster is formed simply by averaging the variances of left and right neighbors. Thus, SaFIN also uses the uniform coverage criterion of each cluster because the center of a cluster remains constant. Hence, an adaptation that uses a dynamic uncertainty function is necessary to achieve a compact, consistent, and effective rule base.
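The sensitivity to the similarity threshold β noted above can be illustrated with a small Python sketch that merges neighboring Gaussian fuzzy sets whenever their similarity exceeds β. The Jaccard similarity computed on a sampling grid, the merge rule (averaging centers and widths), and all names are assumptions chosen for illustration; they are not the structure-learning procedures of SONFIN, eFSM, or SaFIN.

```python
import math


def gauss(x, c, s):
    """Gaussian membership function with center c and width s."""
    return math.exp(-0.5 * ((x - c) / s) ** 2)


def jaccard(a, b, n=2000):
    """Similarity |A and B| / |A or B| of two Gaussian sets, sampled on a grid."""
    lo = min(a[0] - 4.0 * a[1], b[0] - 4.0 * b[1])
    hi = max(a[0] + 4.0 * a[1], b[0] + 4.0 * b[1])
    inter = union = 0.0
    for k in range(n + 1):
        x = lo + (hi - lo) * k / n
        ma, mb = gauss(x, *a), gauss(x, *b)
        inter += min(ma, mb)
        union += max(ma, mb)
    return inter / union


def merge_similar(sets, beta):
    """Merge adjacent fuzzy sets (center, width) whose similarity exceeds beta."""
    sets = sorted(sets)
    merged = True
    while merged:
        merged = False
        for i in range(len(sets) - 1):
            if jaccard(sets[i], sets[i + 1]) > beta:
                (c1, s1), (c2, s2) = sets[i], sets[i + 1]
                sets[i:i + 2] = [((c1 + c2) / 2.0, (s1 + s2) / 2.0)]
                merged = True
                break
    return sets


# Hypothetical partition: two nearly identical terms and one distinct term.
terms = [(0.0, 1.0), (0.2, 1.0), (3.0, 1.0)]
compact = merge_similar(terms, beta=0.5)    # the two overlapping terms are merged
detailed = merge_similar(terms, beta=0.9)   # a stricter threshold keeps all terms
print(len(compact), len(detailed))
```

A lower β yields fewer, more distinguishable terms at the possible cost of accuracy, whereas a higher β keeps more terms; this is one way to see why a fixed, user-chosen threshold makes the resulting partition depend on the threshold rather than on the data alone.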
Fig. 14. Granular region (Gi ) creation for the grid-partitioning approach (A to B) and evolving method (C to E). Consider two input features x1 and x2 , where Ajk is the fuzzy term.
Direct adaptive self-structuring fuzzy control (DASFC) [66] is another self-organizing model that guarantees online learning. DASFC shares the aforementioned limitations and uses the uniform coverage criterion (i.e., a threshold similar to that of eFSM) during structure learning. Moreover, Oh et al. [60] proposed a granular-oriented self-organizing method called the hybrid fuzzy polynomial neural network (HFPNN). This method is based on a multi-layer perceptron with context-based polynomial neurons (CPNs) and polynomial neurons (PNs). The polynomial neural network (PNN) has two parts: the first layer contains CPNs, whereas PNs are adjusted in the second or higher layers. The coefficients of CPNs and PNs are self-determined using LSE. Given that this method uses the C-FCM method (or simply FCM), its use of generic parameters is restricted. These generic parameters include the maximal number of inputs for a node, the number of nodes forming a layer, the number of contexts, the number of clusters, and the order of the polynomial in CPNs and PNs. Therefore, the self-organizing ability can be constricted in the HFPNN method.
4.5.4. Sequential decision-making and progressive computing
The optimal allocation of information granularity (Section 4.4.2.3) can be employed in group decision-making problems [75–78] in which the initial preferences of the decision maker can be adapted to achieve higher agreement. Furthermore, Cabrerizo et al. [79] described group decision-making problems in heterogeneous contexts based on information granules. Some fundamentals regarding granular fuzzy decision support systems can be found in Pedrycz et al. [80,81]. Moreover, Herrera and Martínez [42] solved multi-expert decision-making based on the 2-tuple fuzzy linguistic model [19] and multi-granular contexts. Similarly, Carrasco and Villar [57] proposed the linguistic summarization of heterogeneous data, which is also based on the 2-tuple fuzzy linguistic model. This model has been successfully used in a wide range of applications [82–87]. An overview of the 2-tuple fuzzy model can be found in Martínez et al. [88].
Moreover, the initial preferences were often arbitrarily decided, and the designed model failed to achieve the desired performance because the granularity depends on the distribution of the application problem. Therefore, Yao [89] proposed granular computing with sequential decision-making, which considers a finer granulation level with more detailed information. The relationship between progressive and granular computing was proposed in the top-down progressive computing [90], the ECSFS [8], the SSEM [29], the EIG [30], and the recursive construction of output-context fuzzy systems [31]. Progressive computing in these models realizes an evolving granule system from coarser to finer information granulation (see Fig. 2). The SSEM and ECSFS used overfitting and underfitting criteria to continue progressive computing, as depicted in Table 3. Furthermore, EIG and recursive construction [31] are initiated from the first rule and then concurrently execute the evolving and self-adaptive processes to realize an effective rule base.
5. Discussion
This section discusses the main achievements of the models and their drawbacks. Afterwards, the open problems and new solutions are identified.
5.1. Vicinity for designing fuzzy granular framework
Existing models that overcome the aforementioned limitations (Section 4) are well-documented in the literature. Numerous algorithms that model the fuzzy granular methods, as well as the adaptive and evolving fuzzy systems, have been developed. These fuzzy systems were used to design a consistent and compact fuzzy rule-based granular system, which ensures a clear semantic meaning of fuzzy partitions/clusters with reasonable accuracy. Nevertheless, these fuzzy systems suffer from the following major drawbacks (which can serve as the research scope while designing
a fuzzy granular framework): 1) the need for prior knowledge, such as the number of information granules (or clusters) to be computed, or the need for a threshold to terminate the evolving process; 2) the dependence of adaptation on a threshold during structure learning; 3) the stability-plasticity dilemma of the system; 4) inconsistent rules; and 5) granulation process conflict. The first consideration is the choice of the fuzzy partitioning technique used to design a fuzzy system. Numerical methods such as fuzzy C-means [68,18], fuzzy Kohonen partitioning [91], and linear vector quantization [92] are popular methods for partitioning the input-output domain. These methods reduce the amount of subjective information from experts, but still require prior knowledge of the number of clusters [16]. Such information is often arbitrarily decided, and the system might fail to achieve a desirable performance for the application problem. Furthermore, the termination index for the evolving process is significant to obtain a balanced fuzzy model. The evolving process continues by adding fuzzy rules and terms. Consequently, the process is affected by the termination threshold, given that the desired threshold may inadequately represent the application environment. However, a termination index that considers the overfitting/underfitting state is worth considering because this index can be evaluated through the unbalanced situation of the fuzzy model. Thus, the optimized level of granularity (i.e., optimal evolving stage) can be achieved to realize a higher abstraction level. A second important consideration is the adaptation method during structure learning. Adaptation regulates new knowledge and existing information [35,93]. Given that the data cover the input-output space of the fuzzy model only sparsely [94,49], rational partitions highly depend on the input-output data distribution. Thus, uncertain partitions will be observed if the adaptation threshold is considered in structure learning.
In addition, adaptation and termination thresholds also prevent the fuzzy system from accommodating both past and future knowledge. This situation is known as the stability-plasticity dilemma [35] and is a significant criterion for developing the current system for modeling the application environment. Furthermore, inconsistent rules are observed in structure learning, and these rules need to be excised [35,16]. A rule base is inconsistent when there are two fuzzy granular rules whose antecedent conditions are similar but whose consequences differ. The lack of semantic adaptation between the consequent and antecedent parts primarily accounts for the inconsistent rule base. A rule base with semantic adaptation allows for a logical interpretation of the knowledge, thereby realizing a consistent fuzzy granular rule base. Hence, a granular framework with a proper adaptation algorithm, which can avoid inconsistent rules, is available in the literature [18,8,29,61]. However, similar to Tung and Chai [35] and Tung et al. [16], a self-organizing method is significant because it determines the number of clusters through self-adaptation. Moreover, synchronizing self-adaptation in both consequent and antecedent parts is a crucial factor in avoiding inconsistent rules. Finally, the conflict situation in the fuzzy granular rule base is another important problem. This situation is observed when a test input causes more than one rule to fire at the same time. Resolving the conflict situation is a very rewarding undertaking for the granular rule base, although interpretability issues are fulfilled in the granular partition level. Conflict situations and conflict resolvers are discussed in Section 4.3.
5.2. Problem statement
Models that overcome the interpretability-accuracy tradeoff are well-documented in research.
Numerous algorithms representing fuzzy granular models have been developed, including adaptive neural and evolving fuzzy systems. Based on the aforementioned
existing models, the following concerns are significant when considering a computing framework for the fuzzy information granule: (1) the evolving granule process from lower granularity (coarser partition) to higher granularity (finer partition), (2) the interpretability constraint for granular computing, (3) overfitting and underfitting situations in the evolving process, and (4) the stability-plasticity tradeoff. First, the information granule evolves from coarser to finer partitions, which consequently yields oblique decision boundaries [20]; thus, the evolving process and its decision boundaries achieve a low model error. Numerous studies on the fuzzy granular approach focus on improving interpretability constraints (or decision borders) to achieve a low model error [59,95,50,12] (Lu et al., 2014); this is the second significant consideration for granular computing. Existing models use the uniform coverage criterion (i.e., threshold) [35,16], the rule weight [20], or the LSM [8,29] to achieve a significant decision border. For example, the models by Di et al. [8] and Wang et al. [29] are error-reducing evolving methods that used boundary constraints in the LSM-based evolving method. However, LSM requires the LSE computation of the entire dataset for each evolving stage and may treat noise data as extremum or inflexion points because of the desynchronization of the rule antecedent and consequent parameters. Furthermore, the threshold causes uncertain partitions given that datasets are unevenly distributed in the input-output domain, whereas the rule weight affects all MFs associated with the rule under observation and causes a permanent decision border. Therefore, a dynamic constraint can be designed such that it updates the decision border of the linguistic terms. Consequently, the evolving granule approach and dynamic interpretability constraint should coexist (Section 4.4.3). Hence, this computing framework can provide a tradeoff between interpretability and accuracy.
The third important consideration is the overfitting and underfitting situations in the evolving granule approach. Underfitting occurs when the information granule is too coarse to fit the data, resulting in poor testing accuracy. Furthermore, as the evolving granule process continues, a number of evolving stages cannot properly represent the data, leading to an unbalanced state. This unbalanced state leads to overfitting of the fuzzy system (i.e., the model fits the data too closely because of small and unbalanced information granules), thereby resulting in poor testing accuracy. Hence, the evolving granule approach should consider both the underfitting and overfitting states of each evolving stage. Underfitting or overfitting conditions were checked by Di et al. [8] at each evolving stage, and 3% was considered the efficient threshold value to realize the best-fitted fuzzy model. However, instead of relying on a threshold, it is desirable to decide underfitting/overfitting based on the current evolving stage, so that these unbalanced situations are avoided in that stage. Furthermore, Mencar et al. [33] and Di et al. [56] utilized the conditional-based fuzzy approach where the data granulation process of the input cluster uses the projection method based on class levels (or output-contexts). The results of these models are highly uncertain (nonlinear training or testing error) because of the underfitting/overfitting situations. Therefore, the computational framework can be designed for the concurrent execution of the data granulation process for both output and input domains, as well as for the optimized level (or certain evolving stage) of granularity to realize the simplicity (inverse of complexity) and accuracy tradeoff (Section 4.2.1), thereby avoiding the underfitting/overfitting situation. Moreover, the realization of these unbalanced states can enhance system performance if the stability-plasticity tradeoff is considered.
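To make the role of a termination threshold concrete, the following Python sketch selects an evolving stage from a sequence of testing errors: the process stops as soon as the relative improvement of the testing error falls below a threshold (3% by default) or the testing error rises. This rule is only a stand-in assumption; the EER index of ECSFS is not defined in this section, and the error values and the function name `select_stage` are hypothetical.

```python
def select_stage(test_errors, min_gain=0.03):
    """Return the index of the last stage whose relative test-error gain is >= min_gain."""
    best = 0
    for s in range(1, len(test_errors)):
        prev, cur = test_errors[s - 1], test_errors[s]
        gain = (prev - cur) / prev if prev > 0 else 0.0
        if gain < min_gain:   # stagnating or rising test error: further refinement overfits
            break
        best = s
    return best


# Hypothetical errors per evolving stage (stage 0 = coarsest granulation).
train_errors = [0.50, 0.30, 0.20, 0.15, 0.10]   # keeps falling as granules get finer
test_errors = [0.55, 0.35, 0.25, 0.26, 0.30]    # starts rising once the model overfits

stage = select_stage(test_errors)
print(stage, train_errors[stage], test_errors[stage])
```

Stopping before the selected stage would leave the model underfitted (both errors are still high), whereas continuing past it lowers the training error while the testing error grows. A fixed `min_gain` is exactly the kind of user-defined threshold that the text argues should be replaced by a decision based on the current evolving stage.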
The stability-plasticity tradeoff is the fourth significant consideration in designing a granular framework. This tradeoff combines the past and future knowledge from the training data
to achieve a current and up-to-date system to model the application environment [35,16]. In Pedrycz’s method [18], refining and refocusing are significant to achieve the output context and its corresponding input clusters given that the realization procedure is fully independent. Hence, the stability-plasticity dilemma prevents the fuzzy system from incorporating past and future knowledge. However, the stability-plasticity tradeoff is significant for each evolving stage to realize an up-to-date and consistent fuzzy model. This tradeoff is also significant for developing the interpretability constraint and the overfitting/underfitting index; therefore, a consistent and compact fuzzy information granule can be achieved to ensure the interpretability-accuracy tradeoff. Based on these four considerations, a conceptual framework can be hypothesized that bridges semantics at the fuzzy granular partition (i.e., information granule), accuracy, and complexity (number of fuzzy terms or rules). This framework can focus on integrating the evolving and self-organizing system, such that it can consider the interpretability-accuracy and stability-plasticity tradeoffs. Unlike the grid partitioning approach discussed in Pedrycz and Keun-Chang [18], achieving some distinction points in the output (or input) domain is significant for attaining reasonable accuracy [52,53]. The partitions can be made in the output (or input) domain at a higher error region, wherein the evolving process can realize a low model error. An interpretability constraint can be considered so that it is dynamically updated for each training sample. This dynamic constraint controls the uncertainty of semantic interpretability. Therefore, a semantic rule base is important for enhancing accuracy, such that it can be competently handled by humans [33].
Furthermore, the overfitting/underfitting index, which depends on the previous evolving stage, can be realized for each evolving stage. If both the interpretability constraint and the overfitting/underfitting index incorporate the previous and current stages, then the stability-plasticity dilemma can be avoided.
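The idea of placing new partitions in the region of higher error can be illustrated with a deliberately simplified sketch. It assumes a single input variable, crisp (non-overlapping) intervals in place of fuzzy membership functions, a constant consequent per interval (the mean of the outputs), and a midpoint split; none of these choices comes from the reviewed models, and `refine_partition` is a hypothetical name.

```python
from typing import List, Sequence


def _sse(ys: Sequence[float]) -> float:
    """Sum of squared errors of a constant (mean) fit to ys."""
    if not ys:
        return 0.0
    m = sum(ys) / len(ys)
    return sum((v - m) ** 2 for v in ys)


def refine_partition(x: Sequence[float], y: Sequence[float],
                     lo: float, hi: float, n_stages: int) -> List[float]:
    """Evolve from one coarse granule [lo, hi] toward finer ones.

    At every stage the interval with the largest error is split at its
    midpoint; the process stops early if no interval has any error left.
    """
    cuts = [lo, hi]
    for _ in range(n_stages):
        worst_err, worst_i = 0.0, None
        for i in range(len(cuts) - 1):
            a, b = cuts[i], cuts[i + 1]
            is_last = i == len(cuts) - 2  # the last interval is closed on the right
            ys = [yj for xj, yj in zip(x, y)
                  if a <= xj < b or (is_last and xj == b)]
            err = _sse(ys)
            if err > worst_err:
                worst_err, worst_i = err, i
        if worst_i is None:
            break  # every granule already fits its data exactly
        mid = 0.5 * (cuts[worst_i] + cuts[worst_i + 1])
        cuts.insert(worst_i + 1, mid)
    return cuts


if __name__ == "__main__":
    xs = [0.05, 0.15, 0.25, 0.35, 0.45, 0.55, 0.65, 0.75, 0.85, 0.95]
    ys = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0]
    print(refine_partition(xs, ys, 0.0, 1.0, n_stages=5))
```

In this toy data set the output changes only near x = 0.7, so the refinement concentrates its cut points on the right half of the domain and leaves the flat left half as one coarse granule, in contrast to a uniform grid partition.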
6. Conclusions
The conflict between the semantic information granule and accuracy is a challenging issue that must be addressed when forming an operational framework to construct information granules. Fuzzy information granules suffer from various drawbacks, which cause accuracy and interpretability dilemmas in solving real-world problems. This paper presented a methodical review of the diverse ideas for realizing an effective fuzzy information granule that addresses the interpretability-accuracy tradeoff. Furthermore, the conceptual framework should be designed such that the data are adequately represented by the granule. For a fuzzy information granule that already has reasonably high accuracy, it is desirable to avoid conflicting decisions and overfitting/underfitting situations. Correspondingly, interpretability constraints and parameter optimization are significant for the interpretability-accuracy tradeoff when designing a highly interpretable fuzzy information granule. Despite producing granular frameworks with competitive performance, most existing models suffer from different tradeoffs. For instance, both ECSFS and SSEM use extremum/inflexion criteria to determine the optimized data point, forming an interpretability constraint to avoid the interpretability-accuracy dilemma. Nevertheless, these models use LSM to find the extremum/inflexion data point and avoid the stability-plasticity criteria for its sub-region. Thus, the selection of noise data as an extremum/inflexion point is highly possible. Furthermore, most models utilize thresholds to realize the constraints for semantic blocks; however, it is desirable to design a mathematical framework in which a dynamic constraint is realized. Intuitively, the concurrent evolution of the information granule and parameter optimization can further improve model performance. To introduce the extremum/inflexion [52,8], the output-context fuzzy information granule [18], and the self-organizing method [16,35], the constraint realization and parameter optimization techniques are described in this paper. Note that the advantages and disadvantages of these methods can inspire future research.
Acknowledgments
This work was supported by the Malaysian Ministry of Higher Education under the Commonwealth Scholarship and Fellowship Plan, and in part by the Universiti Sains Malaysia through the research university grants entitled “Development of an Intelligent Auto-Immune Diseases Diagnostic System by Classification of Hep2 Immunoflourescence Patterns”.
References
[1] C. Mencar, A.M. Fanelli, Interpretability constraints for fuzzy information granulation, Inform. Sci. 178 (2008) 4585–4618.
[2] J.C. Bezdek, On the relationship between neural networks: pattern recognition and intelligence, Int. J. Approx. Reason. 6 (1992) 85–107.
[3] W. Pedrycz, Computational Intelligence: An Introduction, CRC Press, Boca Raton, FL, 1997.
[4] L.A. Zadeh, Graduation and granulation are keys to computation with information described in natural language, in: IEEE International Conference on Granular Computing, Atlanta, Georgia, USA, 10–12 May 2006, p. 30.
[5] A. Gacek, Granular modelling of signals: a framework of granular computing, Inform. Sci. 221 (2013) 1–11.
[6] W. Pedrycz, Computing with granular information: fuzzy sets and fuzzy relations, in: Knowledge-Based Clustering, John Wiley & Sons, Inc., 2005.
[7] W. Ostasiewicz, Towards fuzzy logic, in: L. Zadeh, J. Kacprzyk (Eds.), Computing with Words in Information/Intelligent Systems 1, Physica-Verlag, HD, 1999.
[8] W. Di, Z. Xiao-Jun, J.A. Keane, An evolving-construction scheme for fuzzy systems, IEEE Trans. Fuzzy Syst. 18 (2010) 755–770.
[9] M.J. Gacto, R. Alcalá, F. Herrera, Interpretability of linguistic fuzzy rule-based systems: an overview of interpretability measures, Inform. Sci. 181 (2011) 4340–4360.
[10] A. Bargiela, W. Pedrycz, Granular computing as an emerging paradigm of information processing, in: Granular Computing, Springer, US, 2003.
[11] L.A. Zadeh, Some reflections on soft computing, granular computing and their roles in the conception, design and utilization of information/intelligent systems, Soft Comput. 2 (1998) 23–25.
[12] W. Pedrycz, Information granules and their use in schemes of knowledge management, Sci. Iran. 18 (2011) 602–610.
[13] C. Mencar, Theory of Fuzzy Information Granulation: Contributions to Interpretability Issues, PhD Dissertation, Department of Informatics, University of Bari, Italy, 2004.
[14] W. Pedrycz, From fuzzy data analysis and fuzzy regression to granular fuzzy data analysis, Fuzzy Sets Syst. 274 (2015) 12–17.
[15] D. Nauck, F. Klawonn, R. Kruse, Foundations of Neuro-Fuzzy Systems, John Wiley & Sons, Inc., 1997.
[16] S.W. Tung, Q. Chai, G. Cuntai, SaFIN: a self-adaptive fuzzy inference network, IEEE Trans. Neural Netw. 22 (2011) 1928–1940.
[17] G. Bosque, I. Del Campo, J. Echanobe, Fuzzy systems: neural networks and neuro-fuzzy systems: a vision on their hardware implementation and platforms over two decades, Eng. Appl. Artif. Intell. 32 (2014) 283–331.
[18] W. Pedrycz, K. Keun-Chang, Linguistic models as a framework of user-centric system modeling, IEEE Trans. Syst. Man Cybernet. A: Syst. Hum. 36 (2006) 727–745.
[19] F. Herrera, L. Martínez, A 2-tuple fuzzy linguistic representation model for computing with words, IEEE Trans. Fuzzy Syst. 8 (6) (2000) 746–752.
[20] A. Riid, E. Rüstern, Adaptability: interpretability and rule weights in fuzzy rule-based systems, Inform. Sci. 257 (2014) 301–312.
[21] S. Kar, S. Das, P.K. Ghosh, Applications of neuro fuzzy systems: a brief review and future outline, Appl. Soft Comput. 15 (2014) 243–259.
[22] W. Pedrycz, Granular Computing: Analysis and Design of Intelligent Systems, CRC Press/Francis Taylor, Boca Raton, 2013, pp. 239–269.
[23] W. Pedrycz, Collaborative and linguistic models of decision making, in: Granular Computing: Analysis and Design of Intelligent Systems, CRC Press/Francis Taylor, Boca Raton, FL, 2013, pp. 239–269.
[24] E. Lughofer, On-line assurance of interpretability criteria in evolving fuzzy systems–achievements, new concepts and open issues, Inform. Sci. 251 (2013) 22–46.
[25] K. Trawinski, O. Cordon, L. Sanchez, A. Quirin, A genetic fuzzy linguistic combination method for fuzzy rule-based multiclassifiers, IEEE Trans. Fuzzy Syst. 21 (2013) 950–965.
[26] W. Pedrycz, R. Al-Hmouz, A. Balamash, A. Morfeq, Designing granular fuzzy models: a hierarchical approach to fuzzy modeling, Knowl.-Based Syst. 76 (2015) 42–52.
[27] O.F. Reyes-Galaviz, W. Pedrycz, Granular fuzzy models: analysis, design, and evaluation, Int. J. Approx. Reason. 64 (2015) 1–19.
[28] J. Rissanen, Modeling by shortest data description, Automatica 14 (1978) 465–471.
[29] D. Wang, X.-J. Zeng, J.A. Keane, A simplified structure evolving method for Mamdani fuzzy system identification and its application to high-dimensional problems, Inform. Sci. 220 (2013) 110–123.
[30] M.M. Ahmed, N.M.A. Isa, Information granularity model for evolving context-based fuzzy system, Appl. Soft Comput. 33 (2015) 183–196.
[31] M.M. Ahmed, A.S.N. Huda, N.M.A. Isa, Recursive construction of output-context fuzzy systems for the condition monitoring of electrical hotspots based on infrared thermography, Eng. Appl. Artif. Intell. 39 (2015) 120–131.
[32] J.M. Alonso, L. Magdalena, S. Guillaume, HILK: A new methodology for designing highly interpretable linguistic knowledge bases using the fuzzy logic formalism, Int. J. Intell. Syst. 23 (2008) 761–794.
[33] C. Mencar, C. Castiello, R. Cannone, A.M. Fanelli, Design of fuzzy rule-based classifiers with semantic cointension, Inform. Sci. 181 (2011) 4361–4377.
[34] F. Wan, H. Shang, L.-X. Wang, Y.-X. Sun, How to determine the minimum number of fuzzy rules to achieve given accuracy: a computational geometric approach to SISO case, Fuzzy Sets Syst. 150 (2005) 199–209.
[35] W.L. Tung, Q. Chai, eFSM: a novel online neural-fuzzy semantic memory model, IEEE Trans. Neural Netw. 21 (2010) 136–157.
[36] B. Pizzileo, L. Kang, G.W. Irwin, Z. Wanqing, Improved structure optimization for fuzzy-neural networks, IEEE Trans. Fuzzy Syst. 20 (2012) 1076–1089.
[37] W. Pedrycz, M. Song, A genetic reduction of feature space in the design of fuzzy models, Appl. Soft Comput. 12 (2012) 2801–2816.
[38] Mathbabe, 2012. http://mathbabe.org/2012/11/20/columbia-data-sciencecourse-week-12-predictive-modeling-data-leakage-model-evaluation/ (Accessed 22 September 2015).
[39] W. Pedrycz, A. Bargiela, An optimization of allocation of information granularity in the interpretation of data structures: toward granular fuzzy clustering, IEEE Trans. Syst. Man Cybernet. B: Cybernet. 42 (2012) 582–590.
[40] W. Pedrycz, R. Al-Hmouz, A. Morfeq, A. Balamash, The design of free structure granular mappings: the use of the principle of justifiable granularity, IEEE Trans. Cybernet. 43 (2013) 2105–2113.
[41] W. Pedrycz, W. Homenda, Building the fundamentals of granular computing: a principle of justifiable granularity, Appl. Soft Comput. 13 (2013) 4209–4218.
[42] F. Herrera, L. Martínez, A model based on linguistic 2-tuples for dealing with multigranular hierarchical linguistic contexts in multi-expert decision-making, IEEE Trans. Syst. Man Cybernet. B: Cybernet. 31 (2) (2001) 227–234.
[43] K.K. Ang, C. Quek, RSPOP: rough set-based pseudo outer-product fuzzy rule identification algorithm, Neural Comput. 17 (2005) 205–243.
[44] F. Liu, C. Quek, G.S. Ng, A novel generic hebbian ordering-based fuzzy rule base reduction approach to mamdani neuro-fuzzy system, Neural Comput. 19 (2007) 1656–1680.
[45] J.A. Sanz, A. Fernández, H. Bustince, F. Herrera, IVTURS: a linguistic fuzzy rule-based classification system based on a new interval-valued fuzzy reasoning method with tuning and rule selection, IEEE Trans. Fuzzy Syst. 21 (3) (2013) 399–412.
[46] H. Ishibuchi, T. Nakashima, Effect of rule weights in fuzzy rule-based classification systems, IEEE Trans. Fuzzy Syst. 9 (4) (2001) 506–515.
[47] H. Nakanishi, I.B. Turksen, M. Sugeno, A review and comparison of six reasoning methods, Fuzzy Sets Syst. 57 (1993) 257–294.
[48] R.R. Yager, Firing fuzzy rules with measure type inputs, IEEE Trans. Fuzzy Syst. 23 (4) (2015) 939–950.
[49] P. Pulkkinen, H. Koivisto, A dynamically constrained multiobjective genetic fuzzy system for regression problems, IEEE Trans. Fuzzy Syst. 18 (2010) 161–177.
[50] A.R. Solis, G. Panoutsos, Granular computing neural-fuzzy modelling: a neutrosophic approach, Appl. Soft Comput. 13 (2013) 4010–4021.
[51] Bayen, A.M., 2015. University of California, Berkeley. http://bayen.eecs.berkeley.edu/bayen/?q=webfm send/246 (Accessed 1 July 2015).
[52] B. Kosko, Optimal fuzzy rules cover extrema, Int. J. Intell. Syst. 10 (1995) 249–255.
[53] D. Yongsheng, Y. Hao, S. Shihuang, Necessary conditions on minimal system configuration for general MISO Mamdani fuzzy systems as universal approximators, IEEE Trans. Syst. Man Cybernet. B: Cybernet. 30 (2000) 857–864.
[54] F. Herrera, E. Herrera-Viedma, L. Martínez, A fuzzy linguistic methodology to deal with unbalanced linguistic term sets, IEEE Trans. Fuzzy Syst. 16 (2) (2008) 354–370.
[55] C. Mencar, C. Castiello, R. Cannone, A.M. Fanelli, Interpretability assessment of fuzzy knowledge bases: a cointension based approach, Int. J. Approx. Reason. 52 (2011) 501–518.
[56] W. Di, Z. Xiao-Jun, J.A. Keane, An output-constrained clustering approach for the identification of fuzzy systems and fuzzy granular systems, IEEE Trans. Fuzzy Syst. 19 (2011) 1127–1140.
[57] R.A. Carrasco, P. Villar, A new model for linguistic summarization of heterogeneous data: an application to tourism web data sources, Soft Comput. 16 (1) (2012) 135–151.
[58] W. Zhao, K. Li, G.W. Irwin, New gradient descent approach for local learning of fuzzy neural model, IEEE Trans. Fuzzy Syst. 21 (1) (2013) 30–44.
[59] M.A. Sanchez, J.R. Castro, F. Perez-Ornelas, O. Castillo, A hybrid method for IT2 TSK formation based on the principle of justifiable granularity and PSO for spread optimization, in: IFSA World Congress and NAFIPS Annual Meeting (IFSA/NAFIPS), Edmonton, Canada, 24–28 June, 2013, pp. 1268–1273.
[60] S. Oh, W.-D. Kim, B.-J. Park, W. Pedrycz, A design of granular-oriented self-organizing hybrid fuzzy polynomial neural networks, Neurocomputing 119 (2013) 292–307.
[61] M. Sanchez, O. Castillo, J. Castro, Uncertainty-based information granule formation, in: O. Castillo, P. Melin, W. Pedrycz, J. Kacprzyk (Eds.), Recent Advances on Hybrid Approaches for Designing Intelligent Systems, Springer International Publishing, 2013.
[62] M. Fazzolari, R. Alcalá, F. Herrera, A multi-objective evolutionary method for learning granularities based on fuzzy discretization to improve the accuracy-complexity trade-off of fuzzy rule-based classification systems: d-MOFARC algorithm, Appl. Soft Comput. 24 (2014) 470–481.
[63] W. Pedrycz, H. Izakian, Cluster-centric fuzzy modeling, IEEE Trans. Fuzzy Syst. 22 (6) (2014) 1585–1597.
[64] K. Cpałka, K. Łapa, A. Przybył, M. Zalasiński, A new method for designing neuro-fuzzy systems for nonlinear modelling with interpretability aspects, Neurocomputing 135 (2014) 203–217.
[65] J.M. Leski, Fuzzy (c+p)-means clustering and its application to a fuzzy rule-based classifier: toward good generalization and good interpretability, IEEE Trans. Fuzzy Syst. 23 (4) (2015) 802–812.
[66] N. Wang, J.-C. Sun, Y.-C. Liu, Direct adaptive self-structuring fuzzy control with interpretable fuzzy rules for a class of nonlinear uncertain systems, Neurocomputing (2015), http://dx.doi.org/10.1016/j.neucom.2015.09.036, in press.
[67] O.F. Reyes-Galaviz, W. Pedrycz, Granular fuzzy modeling with evolving hyperboxes in multi-dimensional space of numerical data, Neurocomputing 168 (2015) 240–253.
[68] W. Pedrycz, Conditional fuzzy clustering in the design of radial basis function neural networks, IEEE Trans. Neural Netw. 9 (1998) 601–612.
[69] A. Bargiela, W. Pedrycz, Granular prototyping in fuzzy clustering, in: Granular Computing, Springer, US, 2003.
[70] W. Pedrycz, Conditional fuzzy clustering, in: Knowledge-Based Clustering, John Wiley & Sons, Inc., 2005.
[71] N. Kasabov, Evolving fuzzy neural networks for supervised/unsupervised online knowledge-based learning, IEEE Trans. Syst. Man Cybernet. B: Cybernet. 31 (2001) 902–918.
[72] R.W. Zhou, C. Quek, POPFNN: a pseudo outer-product based fuzzy neural network, Neural Netw. 9 (1996) 1569–1581.
[73] N.K. Kasabov, S. Qun, DENFIS: dynamic evolving neural-fuzzy inference system and its application for time-series prediction, IEEE Trans. Fuzzy Syst. 10 (2002) 144–154.
[74] J. Chia-Feng, L. Chin-Teng, An online self-constructing neural fuzzy inference network and its applications, IEEE Trans. Fuzzy Syst. 6 (1998) 12–32.
[75] Y.-P. Jiang, Z.-P. Fan, J. Ma, A method for group decision making with multi-granularity linguistic assessment information, Inform. Sci. 178 (2008) 1098–1109.
[76] Z.-P. Fan, Y. Liu, A method for group decision-making based on multi-granularity uncertain linguistic information, Expert Syst. Appl. 37 (2010) 4000–4008.
[77] Z. Zhang, C. Guo, A method for multi-granularity uncertain linguistic group decision making with incomplete weight information, Knowl.-Based Syst. 26 (2012) 111–119.
[78] F.J. Cabrerizo, R. Ureña, W. Pedrycz, E. Herrera-Viedma, Building consensus in group decision making with an allocation of information granularity, Fuzzy Sets Syst. 255 (2014) 115–127.
[79] F.J. Cabrerizo, E. Herrera-Viedma, W. Pedrycz, A method based on PSO and granular computing of linguistic information to solve group decision making problems defined in heterogeneous contexts, Eur. J. Oper. Res. 230 (3) (2013) 624–633.
[80] W. Lu, W. Pedrycz, X. Liu, J. Yang, P. Li, The modeling of time series based on fuzzy information granules, Expert Syst. Appl. 41 (2014) 3799–3808.
[81] W. Pedrycz, R. Al-Hmouz, A. Morfeq, A. Balamash, Building granular fuzzy decision support systems, Knowl.-Based Syst. 58 (2014) 3–10.
[82] S. Alonso, F.J. Cabrerizo, F. Chiclana, F. Herrera, E. Herrera-Viedma, Group decision making with incomplete fuzzy linguistic preference relations, Int. J. Intell. Syst. 24 (2) (2009) 201–222.
[83] Y. Dong, C.C. Li, Y.F. Xu, X. Gu, Consensus-based group decision making under multi-granular unbalanced 2-tuple linguistic preference relations, Group Decis. Negotiat. 24 (2) (2015) 217–242.
[84] L. Martínez, Sensory evaluation based on linguistic decision analysis, Int. J. Approx. Reason. 44 (2) (2007) 148–164.
[85] L. Martínez, M. Espinilla, L.G. Pérez, A linguistic multigranular sensory evaluation model for olive oil, Int. J. Comput. Intell. Syst. 1 (2) (2012) 148–158.
[86] S.Y. Wang, Applying 2-tuple multigranularity linguistic variables to determine the supply performance in dynamic environment based on product-oriented strategy, IEEE Trans. Fuzzy Syst. 16 (1) (2008) 29–39.
[87] C.P. Wei, N. Zhao, X.J. Tang, Operators and comparisons of hesitant fuzzy linguistic term sets, IEEE Trans. Fuzzy Syst. 22 (3) (2014) 575–585.
[88] L. Martínez, R.M. Rodriguez, F. Herrera, The 2-tuple Linguistic Model: Computing with Words in Decision Making, Springer, New York, 2015.
[89] Y. Yao, Granular computing and sequential three-way decisions, in: P. Lingras, M. Wolski, C. Cornelis, S. Mitra, P. Wasilewski (Eds.), Rough Sets and Knowledge Technology, Springer, Berlin Heidelberg, 2013.
[90] Y. Yao, J. Luo, Top-down progressive computing, in: J. Yao, S. Ramanna, G. Wang, Z. Suraj (Eds.), Rough Sets and Knowledge Technology, Springer, Berlin Heidelberg, 2011.
[91] S.-B. Roh, T.-C. Ahn, W. Pedrycz, The design methodology of radial basis function neural networks based on fuzzy K-nearest neighbors approach, Fuzzy Sets Syst. 161 (2010) 1803–1822.
[92] T. Kohonen, Self-organized formation of topologically correct feature maps, Biol. Cybern. 43 (1982) 59–69.
[93] C. Quek, M. Pasquier, B. Lim, A novel self-organizing fuzzy rule-based system for modelling traffic flow behaviour, Expert Syst. Appl. 36 (2009) 12167–12178.
[94] H.K.H. Chow, K.L. Choy, W.B. Lee, A strategic knowledge-based planning system for freight forwarding industry, Expert Syst. Appl. 33 (2007) 936–954.
[95] M. Sanchez, O. Castillo, J. Castro, An analysis on the intrinsic implementation of the principle of justifiable granularity in clustering algorithms, in: O. Castillo, P. Melin, J. Kacprzyk (Eds.), Recent Advances on Hybrid Intelligent Systems, Springer, Berlin Heidelberg, 2013.
[96] W. Lu, W. Pedrycz, X. Liu, J. Yang, P. Li, The modeling of time series based on fuzzy information granules, Expert Syst. Appl. 41 (2014) 3799–3808.
[97] B. Kitchenham, Procedures for performing systematic reviews, Technical Report TR/SE 0401, Software Engineering Group, Department of Computer Science, Keele University, 2004.
[98] J. Webster, R.T. Watson, Analyzing the past to prepare for the future: writing a literature review, MIS Quarterly 26 (2) (2002) 13–23.