Evolving Type-2 Web News Mining

Choiru Za'in, Mahardhika Pratama, Edwin Lughofer, Sreenatha G. Anavatti

To appear in: Applied Soft Computing
PII: S1568-4946(16)30603-2
DOI: http://dx.doi.org/10.1016/j.asoc.2016.11.034
Received: 31 July 2016; Revised: 31 October 2016; Accepted: 18 November 2016
Corresponding author: Choiru Za'in (La Trobe University)

Highlights:

 Web news mining using an evolving type-2 fuzzy system algorithm is proposed.

 Online news articles are characterized as non-stationary data.

 Non-stationary data in web news mining can be viewed as a change of category distribution over time (concept drift).

 The algorithm's framework adopts an open structure that can handle non-stationary environments and works in single-pass mode.

 Evolving learning typically works with minimal, or often no, manual operator supervision, which is desirable in an online real-time learning environment.

Abstract

Web news articles are generated in continuous, time-varying, and rapid modes. This environment causes an explosion of information which needs to be stored, processed, and analyzed. Conventional machine learning algorithms applied to web news mining work in an offline environment and cannot efficiently handle data streams. In this paper, we propose an evolving web news mining framework based on the recently published Evolving Type-2 Classifier (eT2Class). The eT2Class adopts an open structure that can be used in non-stationary environments and works in a single-pass learning mode, which is applicable for online real-time applications. The efficacy of our methodology has been numerically validated with real local Australian news articles, namely The Age, spanning from 26/2/2016 to 13/3/2016, and compared with six state-of-the-art algorithms. Our algorithm outperforms other consolidated algorithms and achieves a tradeoff between complexity and accuracy, with almost 10% improvement in terms of complexity.

Keywords: evolving fuzzy classifier, online learning, web news mining

I. Introduction

Nowadays, websites are a common platform for exchanging information. The type of information most commonly accessed on websites is textual data, such as emails, blogs, social media, and web news articles. Of these, web news articles are the most popular medium for sharing recent world events and the most accessible way for people to obtain up-to-date information.

Unlike traditional newspaper articles, web-based news articles are updated in a near real-time manner. A survey conducted by CNNIC [1] indicates that the number of online news articles grew by around 20 percent from mid-2012 to early 2014. The growth of web-based news articles has resulted in a significant increase in the volume of digital news data stored on servers or in the cloud. This creates a data stream with rapidly changing characteristics: continuously arriving, time-varying, unpredictable, and unbounded [2].

Many text mining techniques can be applied to news articles for various purposes: automatic news categorization, opinion mining, and semantic analysis. Traditional text mining techniques focus on processing text in an offline mode [3]. A typical text mining technique builds its model from historical data stored on a server in an offline condition. Such an algorithm does not take into account any additional data after the model has been built and requires a fixed number of training samples to build the model [3]. It does not cater for the rapidly changing characteristics of web-based news articles, such as the appearance of new topics (leading to new text categorization classes) or changes (drifts) in the textual description of events and occasions (e.g., due to political influences, writing-style fashions, or the rare occurrence of an incident/event). These algorithms impose a complete retraining from scratch with an up-to-date dataset to handle new training patterns. The retraining phase leads to the so-called catastrophic forgetting of previously valid knowledge and a prohibitive computational and memory burden, which grows linearly with the number of training data.

Incremental learning algorithms have been proposed to accommodate the changing characteristics of data in a data stream. They update the model from incoming data without catastrophically erasing existing knowledge. There are two main methodologies for learning data incrementally: instance-based incremental learning and batch-based incremental learning [3].

In order to update the existing model, a batch-based incremental learning algorithm needs to retrain it with an updated dataset incorporating the new data, whereas an instance-based incremental algorithm updates the model based only on the new instances [4-7]. Ade [8] further divides the batch-based incremental learning algorithm into two main approaches: ensemble learning and data accumulation learning. The ensemble learning approach learns a model on each data batch separately and makes a final decision based on voting or weighting the model outputs (typically, older ones receive lower weights), whereas the data accumulation learning approach iteratively learns a model on all data (including previously learnt data) whenever an update is requested. The drawback of these approaches lies in the fact that they require time-intensive retraining steps, which are computationally intractable in an online learning environment. Although classical incremental learning algorithms are applicable to online real-time applications, they assume a prefixed model structure, which does not fully reflect the nonlinearity of a problem and does not follow the changing dynamics of a decision boundary. Conventional incremental learning merely involves a parameter learning scenario without the self-evolving property [9].

Therefore, the field of evolving intelligent systems [10] has emerged during the last 10 years, equipped with so-called evolving learning algorithms. In these algorithms, the model constructs itself in an online, real-time manner. An evolving learning algorithm can start its learning process from scratch with no initial structure, and it can add, remove, or merge structural model components (such as neurons, fuzzy rules, and leaves) on demand in single-pass learning mode. Moreover, an evolving learning algorithm can typically work with minimal, or often no, manual operator supervision, which is desirable in an online real-time learning environment.

To the best of our knowledge, there is limited work on evolving web news mining algorithms. Although Iglesias [11] developed web news mining based on the eClass0 classifier [12], which can handle data streams, it still relies on a type-1 fuzzy system, which cannot deal with higher-level information uncertainty because of its crisp and certain nature [13].

Comprehensive surveys covering more than 40 methods can be found in [14] and [15]. Existing approaches [12, 16-20] are mostly built on a type-1 fuzzy system, which relies on crisp fuzzy sets. A type-1 fuzzy system is prone to information uncertainty as a result of conflicting expert knowledge, noisy data, noisy measurements, and inexact fuzzy rule parameter identification scenarios [21]. To tackle these problems, the type-2 fuzzy system was proposed by Zadeh [22] and has been significantly extended by the group around Jerry Mendel over the last 20 years [13, 23]. A type-2 fuzzy system constructs a fuzzy-fuzzy set, which employs fuzzy membership features. In the context of evolving learning of models from streams, type-2 fuzzy systems deserve in-depth study. Some approaches can be found in [24-26] for regression problems, but they use standard axis-parallel type-2 fuzzy rules and hyper-plane-type consequent functions, whereas the classification case was first addressed in [27].

In this paper, we propose an evolving web news mining framework using the Evolving Type-2 Classifier (eT2Class) [27]. eT2Class can learn data online in a non-stationary environment. In addition, it can forget unnecessary knowledge while remaining computationally efficient, because all learning procedures are committed to the single-pass learning mode. The algorithmic development of the eT2Class involves six new salient learning components: 1) the fuzzy inference scheme of the eT2Class depends on a generalized version of the interval type-2 Gaussian rule with uncertain standard deviations (SDs), where the rule premise is constructed by a multivariate Gaussian function with an uncertain non-diagonal covariance matrix, and the rule consequent is controlled by the Chebyshev polynomial [28], expanding the degrees of freedom of the Takagi-Sugeno-Kang (TSK) rule consequent. It is worth noting that the use of a type-2 fuzzy system within the framework of web news mining is urgently required due to the uncertain characteristics of online news articles; 2) the fuzzy rules are extracted automatically from data streams by two rule growing cursors, the Type-2 Datum Significance (T2DS) and Type-2 Data Quality (T2DQ) methods, which are extended from the type-1 fuzzy classifier model used in [11] to suit the type-2 fuzzy rule; 3) the initialization of a new fuzzy rule considers an influence zone strategy to overcome class overlapping, which can degrade the classifier's generalization [29]; 4) outdated and unnecessary fuzzy rules can be identified and, in turn, pruned by two rule pruning scenarios, the Type-2 Extended Rule Significance (T2ERS) method and the Type-2 Potential+ (T2P+) method, both extensions of the methods proposed in [16] incorporating the type-2 fuzzy rule. T2P+ also has a rule recall mechanism, in which a fuzzy rule deactivated in previous learning can be recalled in the future to handle a recurring or cyclic concept drift; 5) to increase the interpretability of rule semantics, the eT2Class is equipped with a novel rule merging scenario using multi-faceted merging criteria: the vector similarity measure in [30, 31] and the blow-up check; and 6) the eT2Class proposes a novel adaptation scheme based on Zero Error Density Maximization (ZEDM) [32] to adapt the type-reducer $q_l$ and $q_r$ factors. In addition, the convergence of this method is mathematically proven through the Lyapunov stability criterion, and the rule outputs are fine-tuned through a local learning scenario of the Fuzzily Weighted Generalized Recursive Least Square (FWGRLS) method [33]. Our experiments with eT2Class delivered remarkable results in terms of accuracy and running time in comparison with other benchmarked algorithms.

Our web news mining framework has the following characteristics:

 it updates the model's structure and parameters based on news article content;
 it analyzes each article in a single-pass mode without any pre-recorded data;
 it automatically extracts relevant terms from various web news articles and classifies them in an evolving manner;
 it automatically adapts to non-stationary, changing environments.

The rest of the paper is organized as follows. Section II discusses the web news mining background and the evolving fuzzy classifier for data streams. Section III describes the general structure of the proposed web news mining approach. Section IV discusses the evolving classification. Section V describes the experimental framework design and Section VI presents the conclusion. Furthermore, this paper uses many abbreviations formed from acronyms and initials of terms and algorithms; Table 1 tabulates the main abbreviations used in this paper.

II. Background and Related Work

2.1 Web Mining Using Machine Learning

Web mining is the process of data exploration in the web environment, such as web services or web documents, for information extraction purposes [34]. Fayyad [35] defines web mining as the process of finding hidden context and knowledge in the web environment. Web mining covers several tasks, such as resource finding, information selection, pattern generalization, and pattern analysis [34].

Kosala [36] categorizes web mining into three major areas: web content mining, web structure mining, and web usage mining. Web content mining discovers hidden information from a broad range of web content types, such as text, image, audio, and video. Web structure mining builds the connection between webpages using its link structure. Web usage mining learns the user behavior by analyzing the user interaction with the web.

Web content encompasses a wide range of data types, such as image, audio, video, and textual data. Web content mining of images, audio, and video is known as web multimedia mining [37], whereas web content mining of text is known as web news mining. Web news mining has drawn vast research attention, since there is an explosion of news articles generated by online media. News companies contribute to this volume of online news articles, as they provide news articles on the web in addition to the traditional printed newspaper. This is also driven by the dynamic nature of today's news articles, which are updated continuously at a rapid rate. In this paper, we focus on web news mining with a specific application to online news articles.

Web news articles are updated continuously over time; they are generated in an online manner without any time restriction. Thus, web news mining works in a data stream environment, which leads to an urgent need for an efficient computational tool that is scalable against the information explosion. Therefore, an advanced text mining algorithm is required to handle data stream characteristics [2, 38-40]. The algorithm must encompass the following traits: (1) incremental and fast, in order to cope with the speed of data arrival; (2) able to adapt to changing concepts under the influence of new data; (3) compact, robust, and able to cope with the growth of data; and (4) able to handle outliers.

To the best of our knowledge, in the realm of web news mining, an evolving learning algorithm was used in [11] to classify news by topic category. However, this classifier, originally proposed in [12], can be seen as a conventional evolving fuzzy classifier, because it is based on a type-1 fuzzy system architecture. In this paper, we go one step further with a more advanced algorithm, namely eT2Class [41], which is based on a type-2 fuzzy classifier. In our numerical study, eT2Class is capable of producing the most encouraging results in attaining a tradeoff between complexity and accuracy. In addition, our new web news mining framework is developed for local online news articles in Australia.

2.2 Evolving System

The concept of Evolving Intelligent Systems (EIS) was initiated by Juang and Lin [42] in their work on the self-constructing neural fuzzy inference network (SONFIN) with online structure and parameter learning, which learns structure and parameters in single-pass mode. A dynamic evolving neuro-fuzzy inference system (DENFIS) was proposed by Kasabov and Song [43]; DENFIS clusters the fuzzy rules with the evolving clustering method (ECM). It was followed by eTS [44], which develops an online version of evolving algorithms with the use of mountain clustering, inspired by Yager and Filev [45]. Simpl_eTS [46], a simplified version of eTS, utilizes the concept of scatter to replace the information potential in eTS and replaces the Gaussian exponential membership function with the Cauchy function; the use of these two methods significantly reduces the computational cost of eTS. A modification of rule growing and pruning in EIS was introduced in the sequential adaptive fuzzy inference system (SAFIS) [47]; these methods adapt GAP-RBF [48] and GGAP-RBF [49] for fuzzy systems. Lughofer proposed FLEXFIS [50], which is developed with an incremental version of vector quantization [51] and which has been expanded in the Gen-Smart-EFS approach [52] to generalized rules, smart and compact rule bases, and reduced dimensionality by outweighing unimportant input features. Angelov appended rule base simplification and online dimensionality reduction to eTS, namely eTS+ [10], and enhanced the performance of eTS+ by adding a rule addition process using the density increment method in simp_eTS+ [53].

Lemos et al. proposed eMG [54], an extension of ePL [55], which uses a multivariable Gaussian function. AnYa [56], a prominent work proposed by Angelov and Yager, provides cloud-based rules with arbitrary shapes. PANFIS, developed by Pratama [57], modified the statistical contribution theory of SAFIS [47] toward the multivariate Gaussian function; it contains a proof of convergence of the model error over time, which is unique compared with other approaches. PANFIS was also extended with a new online feature selection scenario introduced in GENEFIS [17]. Lughofer [58] modified the evolving clustering engine in FLEXFIS (termed eVQ, evolving Vector Quantization) with a non-diagonal covariance matrix, geometric rule merging criteria to achieve maximal compactness, and dynamic splitting concepts in the case of cluster heterogeneities. In the realm of evolving fuzzy classifiers, a generalized concept of meta-cognitive learning was introduced in rClass [59] to incorporate the fundamental pillars of human learning: what-to-learn, when-to-learn, and how-to-learn. These fundamental pillars of human learning were previously introduced for an incremental classifier in gClass [60]. Comprehensive surveys on EFS can be found in [14] and [15].

All these algorithms mentioned above are built based on a type-1 fuzzy system which features crisp and certain membership. The underlying problem of the type-1 fuzzy system lies in its incapability to overcome the issue of uncertainty, particularly when dealing with the inexact, inaccurate, and imprecise nature of real-world data streams. It relies heavily on the accurate parameter identification strategy.

The type-2 fuzzy system has been developed to overcome the type-1 fuzzy system's drawbacks regarding uncertainty. Zadeh [22] introduced the concept of the fuzzy-fuzzy set, which leads to fuzzy membership in the type-2 fuzzy system. Nevertheless, the type-2 fuzzy system suffers from an over-complex working principle, which cannot be handled by type-1 fuzzy mathematics. The type-2 fuzzy system also suffers from algorithmic complexity due to the mechanism of type reduction from type-2 to type-1. This complexity triggered the idea of interval type-2 fuzzy systems [61], which simplify the working principle of the type-2 fuzzy system by assuming a unity secondary grade. Note that a single interval primary membership in type-2 fuzzy systems is functionally equivalent to interval-valued fuzzy systems [62]. The interval type-2 fuzzy concept was incorporated into EIS in [24], which significantly enhances SONFIN with type-2 fuzzy sets. The extension of this concept was applied in a local recurrent structure in [63] and an interactive recurrent structure in [64]. Castro et al. [65] proposed a hybrid learning scenario for the type-2 fuzzy system; it utilizes gradient descent back-propagation for three different configurations of interval type-2 network architectures. Tung et al. [26] developed interval type-2 concepts in the Mamdani-type fuzzy system. The interpretability problem in interval type-2 EIS was also addressed by Juang et al. [66] by advocating a new parameter scenario, which is useful for the rule semantic improvement of interval type-2 EIS. The extension of this work to meta-cognitive learning in an evolving type-2 classifier environment was introduced in ST2Class [67], while an incremental type-2 extreme learning machine was put forward in eT2ELM [68]. However, interval type-2 systems suffer from a scalability issue as a result of the intensive use of the KM type reduction method.

Bouchachia et al. proposed GT2FC [69], an evolving interval type-2 fuzzy classifier, which learns in a truly sequential learning scenario. It is, nevertheless, a zero-order classifier, which uses a class label as the rule consequent. Although the working principle of a sequential learning scenario can avoid using the KM method, it predicts a class label directly instead of a decision surface. Abiyev et al. [70] introduced the $q$ coefficient for the KM method in a fixed-structure interval type-2 FNN, and Lin [32, 71] adopted this method in the interval type-2 EIS.

The concept of a type-2 metacognitive neuro-fuzzy system was put into perspective in [72], which presents the interval type-2 version of McFIS. Although some seminal work on interval type-2 EFSs can be seen in the literature [24, 25, 29, 32, 69-72], further study in this area is necessary for three reasons: 1) this work still depends on a firing-strength-based clustering strategy, which is prone to outliers; 2) the interval type-2 EFSs in [24, 25] and [32, 70, 71] suffer from the absence of a rule base simplification strategy at both the fuzzy rule and input feature levels. This limitation often leads to a complex rule base, as no rule pruning scenario is embedded. In fact, fuzzy rules can be inconsequential during the training process, as a result of either improperly integrating outliers as fuzzy rules or rules becoming overlapping/redundant over time due to data cluster fusion effects. The fuzzy rules can also become obsolete due to changes in data distribution (drifts). On the contrary, although the interval type-2 EFSs in [71], [66], and [26] exhibit a rule merging scenario to address the redundancy issue, they do not address the issue of inactive and/or obsolete rules (due to insignificance, representing outliers, etc.), whereas in [72] the redundancy aspect is unexplored; 3) the use of interval type-2 EFSs for classification problems is still uncharted territory in the existing literature.

2.3 Application of Evolving Fuzzy Systems

Evolving Fuzzy Systems (EFS) have been widely used in many real-time applications, such as web news mining and online residential premises prediction, and in many applications that require a computationally efficient algorithm to process data streams. A real-time approach to automatic detection, object identification, and video tracking is used in [73]. The evolving part is based on recursive density estimation (RDE), which is used for object identification; this method utilizes a Cauchy-type kernel that replaces the Gaussian function of the kernel density estimation (KDE) method. More precisely, object detection is carried out by analyzing frames recursively, frame by frame, using an evolving algorithm instead of a window that usually contains several dozen frames in a non-evolving algorithm. This significantly reduces the memory space required to process and detect an object. An evolving system is also utilized for the fault detection/monitoring of waste-water treatment plants (WWTPs) in eFuMo [74]. This algorithm optimizes control quality by minimizing the influence of sensor faults (malfunctions); improved control quality can increase pollutant removal, reduce the need for chemicals, and save energy. Precup et al. [75] utilize evolving TSK fuzzy models applied to a crane system. These TSK fuzzy models are derived from a simple and transparent online identification algorithm; input selection is based on a ranking of important factors obtained from the online identification algorithm, which dynamically evolves to modify its rules and parameters using the potentials of new data points. A robust fuzzy adaptive law for evolving systems is implemented in [76]; it proposes an adaptive law that can be applied to Takagi-Sugeno-based control in the consequent part and evolving control systems in the antecedent part, resulting in high-performance, robust control of nonlinear and slowly varying systems. Lughofer et al. [77] investigate the prediction of residential premises prices using evolving fuzzy models; this model updates its parameters and expands its inner structure using concepts of uncertainty modeling in a possibilistic and linguistic manner. NOx prediction models for a diesel engine are applied in [78] to control pollutant formation and emissions from selective catalyst systems or diesel particulate filters; this prediction model utilizes FLEXFIS [50], which automatically extracts a suitable number of rules and fuzzy sets and estimates the consequent parameters of Takagi-Sugeno fuzzy systems. Iglesias introduced an automated, evolving approach to daily activity recognition for intelligent home environments in [79] to cope with dynamic human activities; this evolving activity recognition can not only recognize specific activities but also support future action prediction, remote health monitoring, and interventions. Another evolving system framework in the web news mining domain is introduced in [11] to classify online news articles in a large corpus of data.

III. Our Approach

3.1. General Structure

Our main goal is to develop a web news article classification framework incorporating an advanced evolving classifier, namely the Evolving Type-2 Classifier (eT2Class) [41]. The salient feature of eT2Class lies in its capability to update the model's structure and parameters based on news article content in an online real-time mode. Fig. 1 shows the structure of our framework, which comprises two separate phases (modules):

1. Term extraction: selecting the prominent terms among the articles. Term extraction comprises two sub-modules, term generation and term filtering, which are described in sections 3.2.1 and 3.2.2.

2. Evolving classification: classifying the terms extracted in the previous phase using an evolving classifier. The prominent property of this classifier is its open structure, which can add, update, prune, and merge fuzzy rules continuously with each incoming news article (a set of terms in the news article). Here, we limit our discussion to the supervised learning case, where ground truth is always available.

Each article is analyzed in a single-pass mode without the requirement of initial data and/or structure. Our approach does not require a predefined number of categories of collected data, whereas traditional (common) classifiers require categorized and labeled data; moreover, a traditional classifier's structure is static over time. Our approach extends the classification method developed in [11] by embedding the evolving classifier eT2Class, which employs type-2 fuzzy sets instead of type-1 to better model uncertainty in news articles. Our approach automatically extracts relevant terms from various web news articles and classifies them in an evolving manner.

The eT2Class classifier has the capability of online mining algorithms to deal with data stream challenges:

1. It needs to be fast enough to cope with real-time demands in web news mining.
2. It needs to adapt to non-stationary environments.
3. It needs the ability to handle outliers.

3.2. Term Extraction

The main functionality of the term extraction module is to acquire clean terms from news articles across various topic areas. This module records the term distribution of each document/news article. The initial step of term extraction is crawling the text from the web news article. We manually crawl the web news pages and acquire the textual article data from the websites. We utilize the Python HTML parsing library BeautifulSoup, which pulls data out of HTML and XML files. Its parsers allow us to retrieve particular article content according to the website's structure.

In this work, we crawl Australian-based web news articles in www.theage.com.au. We extract textual data content such as date, article content, article link and its category from the articles. We acquired textual data from news articles dated from 26/2/2016 to 13/3/2016 as a dataset. The obtained articles are labeled based on the category link in the website. There are 6 categories: business, entertainment, lifestyle, politics, sport, and technology.
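To make the crawling step concrete, a minimal Python sketch is given below. The requests library and the tag selectors are our own illustrative assumptions; the actual selectors must match the markup of theage.com.au at crawl time.

    # A minimal article-extraction sketch. The selectors below are hypothetical
    # placeholders; real ones depend on theage.com.au's page markup.
    import requests
    from bs4 import BeautifulSoup

    def extract_article(url):
        html = requests.get(url, timeout=10).text
        soup = BeautifulSoup(html, "html.parser")
        title_tag = soup.find("h1")
        title = title_tag.get_text(strip=True) if title_tag else ""
        # Concatenate paragraph text as the article content.
        content = " ".join(p.get_text(strip=True) for p in soup.find_all("p"))
        return {"link": url, "title": title, "content": content}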

The framework of web news article classification is depicted in Fig. 1. This figure shows that categorized news is obtained by labeling the web news article. Every categorized article is further processed and analyzed. The process covers two main sub-modules, namely term generation and term filtering which are explained further in sections 3.2.1 and 3.2.2 respectively.

3.2.1. Term Generation

We define a term as a root word, i.e., a word without a prefix or suffix that exists in the dictionary. The term generation sub-module generates a set of terms from the chunks of strings in the article. Unprocessed (original) articles may contain root words, compound words, and complex words. A compound word is a combination of two words, such as dragonfly, grandmother, or biography. A complex word is a root word with an added suffix and/or prefix. Examples of complex words built from a root word include unlikely and vitality; examples of complex words built from a compound word include biographical and hot-bloodedness.

Based on the complexity of words in an article, several steps are required to obtain a set of clean terms/root words from the set of complex words (affixed words and compound words). The following phases are required to obtain the terms in the article:

1. Tokenization (symbol removal): this step breaks a chunk of strings into separate words. At the same time, meaningless strings such as symbols and spaces are removed. For example, the string "Friends, Romans, Countrymen, lend me your ears;" becomes the separate words: | Friends | Romans | Countrymen | lend | me | your | ears |.

2. Stopword removal: this step eliminates stopwords in the article. Stopwords are a list of meaningless words stored in the database; in general, they consist of prepositions, articles, and pronouns, although other meaningless words can be appended. The result of this step is the extracted meaningful words. For example, given the string taken from part of a poem by William Shakespeare, "Friends, Romans, countrymen, lend me your ears", the string is first tokenized; then the words 'me' and 'your' are removed because they are regarded as unnecessary (meaningless) words in the stopword list. The final result is: | Friends | Romans | Countrymen | lend | ears |.

3. Stemming or lemmatization: this step reduces complex words to root words, so that only root words are retained in the article. Stemming is the process of slashing the ends of words to acquire the root word, whereas lemmatization is the process of grouping words into the same lemma. As an example of stemming, the word "cats" becomes "cat" and "ponies" becomes "poni". As an example of lemmatization, run, runs, ran, and running are regarded as the same lemma. In this work, the stemming process is carried out using the Porter Stemmer algorithm [80].

The processes of stopword removal and tokenization are depicted in Fig. 2 and Fig. 3. In order to analyze the number and distribution of terms in the article, we count the frequency of each term in the article and calculate the distribution of terms over the articles. In this case, a term is a stemmed word generated by tokenization, stopword removal, and stemming or lemmatization.
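The three phases above can be sketched in a few lines of Python. The paper specifies the Porter stemmer; the use of NLTK and the toy stopword list here are our assumptions for illustration.

    # A minimal term-generation sketch: tokenization, stopword removal, stemming.
    import re
    from nltk.stem import PorterStemmer

    STOPWORDS = {"me", "your", "the", "a", "an", "of", "and"}  # toy stopword list

    def extract_terms(text):
        # 1. Tokenization: keep letter runs only, dropping symbols and spaces.
        tokens = [t.lower() for t in re.findall(r"[A-Za-z]+", text)]
        # 2. Stopword removal: drop meaningless words.
        tokens = [t for t in tokens if t not in STOPWORDS]
        # 3. Stemming: slash word endings to reach the root form.
        stemmer = PorterStemmer()
        return [stemmer.stem(t) for t in tokens]

    print(extract_terms("Friends, Romans, countrymen, lend me your ears"))
    # ['friend', 'roman', 'countrymen', 'lend', 'ear']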

3.2.2. Term Filtering

Term filtering is the process of determining the importance of a term in each article, expressed as the term's weight. The weight of a term is influenced by the occurrence of that term in a particular document relative to its appearance in the other documents in the database. We utilize a popular term weighting scheme called TF-IDF (Term Frequency-Inverse Document Frequency) [81].

Term frequency (TF) is the frequency of a particular term in a particular document, whereas inverse document frequency (IDF) relates to the number of documents in which the particular term appears. The TF-IDF weight of term i in document j is defined by:

$weight(i,j) = \begin{cases}(1 + tf_{i,j})\log_2\left(\dfrac{n}{df_i}\right) & \text{if } tf_{i,j} \ge 1,\\ 0 & \text{if } tf_{i,j} = 0,\end{cases}$ (1)

where $tf_{i,j}$ is the frequency of the ith term within the jth document, $df_i$ expresses the number of documents in which the ith term appears, and $n$ is the total number of documents in the database. From equation (1), it is clear that TF plays an important role in determining the importance of a term in a particular document, whereas IDF reduces the significance of a term if it appears frequently in the other documents.
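A small numerical sketch of equation (1) follows; the use of numpy and the toy term-document matrix are our own illustration.

    # TF-IDF weighting per Eq. (1): weight = (1 + tf) * log2(n/df) when tf >= 1.
    import numpy as np

    def tfidf(tdm):
        """tdm: (terms x documents) matrix of raw term frequencies."""
        n = tdm.shape[1]                            # number of documents
        df = np.count_nonzero(tdm, axis=1)          # documents containing each term
        w = (1.0 + tdm) * np.log2(n / df)[:, None]  # (1 + tf) * log2(n/df)
        w[tdm == 0] = 0.0                           # weight is 0 when tf = 0
        return w

    tdm = np.array([[3, 0, 1],   # term in 2 of 3 documents: informative
                    [1, 1, 1]])  # term in every document: idf = 0
    print(tfidf(tdm))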

3.3. The Dynamic Learning Problem in Web News Mining

In web news mining, articles are the source of the dataset. We gathered articles ranging from 26/2/2016 to 13/3/2016, comprising 6 categories with different proportions per category. It is worth noting that this problem possesses non-stationary characteristics, because online news articles are highly influenced by numerous external factors and change rapidly over time. For example, the general election in Australia caused news about politics to appear more frequently than other categories. This is confirmed by the distribution of news articles, which is not fixed from day to day: the numbers of documents in business, entertainment, lifestyle, politics, sport, and technology are 311, 169, 138, 144, 226, and 199, respectively.

Terms that are part of the articles become the input of the Evolving Classifier. There are 34,639 terms generated from all the documents after the preprocessing step. Term distribution over all the documents in the dataset can be mapped to a matrix called the Term and Document Matrix (TDM).

We do not use all the terms for the classification task. We select only the most significant terms by calculating the weight of the terms over all the documents. In order to determine how many terms are used in the classification task, the TF-IDF term filtering method is used to calculate the term weights from the TDM, and a threshold is applied to the TF-IDF values to select important terms.

The TF-IDF matrix expresses the importance of each term for the classification task. A term with a low TF-IDF weight is not informative for the classification task, and such terms can be discarded without significant loss of accuracy. For each category, we assign a unique threshold to arrive at a balanced distribution across the categories.

For the 1st, 2nd, 3rd, 4th, 5th, and 6th categories, thresholds of 0.26, 0.41, 0.23, 0.13, 0.06, and 0.007 are applied, resulting in 54, 46, 61, 46, 42, and 41 selected terms, respectively. Different thresholds are set for each category because different numbers of web news articles are crawled per category; thus, in order to avoid an imbalanced data problem, which is beyond the scope of this paper, we apply different thresholds. Table 2 shows the threshold values applied to the TF-IDF matrix and the number of terms selected by the term selection process. Note that the higher the threshold set for a category, the lower the number of terms selected from the TDM. The selected terms are then fixed for the classification task. The number of input features depends on the number of categories, because each category brings a different feature subset. For example, for a 2-category classification task in which the 1st and 2nd categories are classified, the number of feature vectors is 100 (the sum of 54 and 46).
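The per-category selection amounts to a simple threshold test on each category's TF-IDF weights, as in the sketch below (the function name and vector layout are ours):

    # Per-category term selection: keep terms whose TF-IDF weight reaches the
    # category's threshold (e.g., 0.26 for the 1st category, per Table 2).
    import numpy as np

    def select_terms(weights, threshold):
        """weights: (n_terms,) TF-IDF weights for one category."""
        return np.flatnonzero(weights >= threshold)   # indices of retained terms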

3.4. Training Process in Web News Mining

The training process in web news mining is carried out by presenting the Term Document Matrix (TDM), generated from the multiple documents in the corpus, to the classifier. This initial TDM may still contain inconsequential terms, so terms need to be selected to reduce the input dimension in the training phase. Therefore, term extraction is applied to the initial TDM, and the terms that represent the important terms in every class category are selected. We then combine all selected terms into what we call the "Selected Term" set, as depicted in Fig. 1 and, in more detail, in Fig. 4 (part A, Term Document Matrix Generation). The "Selected Term" set forms a new TDM that is ready to be classified; the new TDM has far fewer terms than the initial TDM. The training and testing of the TDM are depicted in Fig. 4. Part B of Fig. 4 shows the TDM transformation during the training and testing process in the classifier. Phase 1 describes the TDM (with the selected terms). Then, the label is added for training purposes in the 2nd phase. The 3rd phase conducts a random permutation of all document indices in the corpus, yielding a TDM with a shuffled index. Phases 4 and 5 depict the training and testing parts of the shuffled TDM. The training part forms the model that is used to infer (predict) the output. The classification performance is obtained by comparing the predicted output and the testing label. This training process is repeated 50 times to validate the classification performance. In addition to classification performance, we also measure the running time and the number of rules generated for every classification process.
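The protocol above can be summarized by the following sketch. The fit/predict classifier interface and the 70/30 split fraction are our assumptions; the paper does not state the split ratio, and eT2Class itself is not exposed through this interface.

    # Shuffle-split evaluation repeated 50 times, mirroring phases 3-5 above.
    import numpy as np

    def evaluate(tdm, labels, make_classifier, runs=50, train_frac=0.7):
        rng = np.random.default_rng(0)
        n = tdm.shape[0]                      # documents as rows of the TDM
        scores = []
        for _ in range(runs):
            idx = rng.permutation(n)          # random permutation of document index
            cut = int(train_frac * n)
            tr, te = idx[:cut], idx[cut:]
            model = make_classifier()
            model.fit(tdm[tr], labels[tr])    # training part forms the model
            pred = model.predict(tdm[te])     # inference on the testing part
            scores.append(np.mean(pred == labels[te]))
        return float(np.mean(scores)), float(np.std(scores))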

IV. Evolving Type-2 Classifier

eT2Class is an evolving classifier based on the type-2 fuzzy system framework to learn data streams. The eT2Class starts the learning process from an empty rule base in the single-pass and local learning modes. Its fuzzy rules can be automatically grown, pruned, recalled, and merged. This section briefly recounts the network architecture of the eT2Class in section 4.1 and the learning policy of the eT2Class in section 4.2.

4.1. Network Architecture

The prominent eT2Class learning properties adopt a fuzzy rule mechanism in which a Gaussian function assembles the premise part and a functional-link-based Chebyshev polynomial [28] develops the consequent part.

eT2Class utilizes a type-2 fuzzy inference scheme which applies upper and lower multivariate Gaussian functions with an uncertain inverse covariance matrix. It is capable of generating non-axis-parallel ellipsoidal clusters, which are suitable for covering irregular data distributions. The firing strength of eT2Class is defined as follows:

$\tilde{R}_i = \exp\left(-(X_n - C_i)\,\tilde{\Sigma}_i^{-1}(X_n - C_i)^T\right), \quad \tilde{\Sigma}_i^{-1} = [\Sigma_{i,1}^{-1}, \Sigma_{i,2}^{-1}]$ (1)

where $\tilde{R}_i$ defines the spatial firing strength of the ith rule through the t-norm operator, which can be written as:

$\tilde{R}_i = [\underline{R}_i, \overline{R}_i]$ (2)

The uncertain non-diagonal covariance matrix is denoted as $\tilde{\Sigma}_i^{-1} = [\Sigma_{i,1}^{-1}, \Sigma_{i,2}^{-1}]$, $C_i \in \Re^{1\times u}$ denotes the centroid of the ith cluster, and $u$ denotes the number of input dimensions. The transformation of the non-axis-parallel ellipsoidal cluster into its corresponding fuzzy set is undertaken by applying the following method [66]:

$\tilde{\sigma}_i = \dfrac{r}{\sqrt{\tilde{\Sigma}_{ii}}}, \quad \tilde{\sigma}_i \in [\sigma_i^1, \sigma_i^2]$ (3)

where $\tilde{\sigma}_i$ denotes the uncertain radii of the jth input dimension in the ith rule, obtained from the non-diagonal covariance matrix, which can be split into upper and lower uncertain radii. Once the upper and lower radii are obtained, the membership degree of the jth input of the current data sample to the ith rule, using an interval type-2 Gaussian fuzzy set with uncertain standard deviations (SDs) [82], can be formulated as follows:

$\tilde{\mu}_{i,j} = \exp\left(-\left(\dfrac{x_j - c_{i,j}}{\tilde{\sigma}_{i,j}}\right)^2\right), \quad \tilde{\sigma}_{i,j} \in [\sigma_{i,j}^1, \sigma_{i,j}^2]$ (4)

The membership degree can likewise be split into upper and lower membership degrees, leading to upper and lower spatial firing strengths. The upper and lower spatial firing strengths of the ith rule $\tilde{R}_i = [\underline{R}_i, \overline{R}_i]$ (equation 2), using the product t-norm operator, are:

$\overline{R}_i = \prod_{j=1}^{u} \overline{\mu}_{i,j}, \quad \underline{R}_i = \prod_{j=1}^{u} \underline{\mu}_{i,j}$ (5)
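For intuition, a small numerical sketch of equations (4) and (5) is given below, assuming diagonal (axis-parallel) uncertainty for simplicity, whereas eT2Class itself operates with full non-diagonal covariance matrices.

    # Interval type-2 Gaussian membership with uncertain SDs, Eqs. (4)-(5).
    import numpy as np

    def firing_strength(x, c, sigma1, sigma2):
        """x, c: (u,) input and centroid; sigma1 < sigma2 elementwise."""
        mu_lower = np.exp(-((x - c) / sigma1) ** 2)  # narrower SD: lower membership
        mu_upper = np.exp(-((x - c) / sigma2) ** 2)  # wider SD: upper membership
        # Product t-norm over input dimensions, Eq. (5).
        return np.prod(mu_lower), np.prod(mu_upper)

    R_lo, R_hi = firing_strength(np.array([0.2, 0.5]), np.array([0.0, 0.4]),
                                 np.array([0.3, 0.3]), np.array([0.5, 0.5]))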

eT2Class also exploits the $q$ design factor to perform the type reduction mechanism. This type reduction is computationally efficient, reducing execution time compared with the classical Karnik-Mendel (KM) iterative procedure [21]. The coefficient is adaptively adjusted by the Zero Error Density Maximization (ZEDM) method [83] and controls the lower and upper outputs $[y_l, y_r]$, defined as follows:

$y_{l,o} = \dfrac{(1-q_l^o)\sum_{i=1}^{P}\underline{R}_i y_{i,o}^l + q_l^o\sum_{i=1}^{P}\overline{R}_i y_{i,o}^l}{\sum_{i=1}^{P}(\underline{R}_i + \overline{R}_i)}$ (6)

$y_{r,o} = \dfrac{(1-q_r^o)\sum_{i=1}^{P}\overline{R}_i y_{i,o}^r + q_r^o\sum_{i=1}^{P}\underline{R}_i y_{i,o}^r}{\sum_{i=1}^{P}(\underline{R}_i + \overline{R}_i)}$ (7)

$y_o = y_{l,o} + y_{r,o}, \quad Y = \max_{o=1,\dots,m}(\hat{y}_o)$ (8)

where $P$ denotes the number of fuzzy rules. The lower and upper outputs $[y_l, y_r]$ describe the reduced set, and the final output is produced from the upper and lower outputs as $y_o = y_l + y_r$.

The network topology of eT2Class from the input and inference is depicted in Fig. 5.
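A sketch of the q-factor type reduction of equations (6)-(8) follows; the assignment of the upper and lower firing strengths to the $q$ and $(1-q)$ terms reflects our reading of the source layout and should be checked against the published paper.

    # q-factor type reduction, Eqs. (6)-(8), for one output class o.
    import numpy as np

    def reduce_output(r_lo, r_hi, y_l, y_r, q_l, q_r):
        """r_lo, r_hi: (P,) firing strengths; y_l, y_r: (P,) rule consequents."""
        denom = np.sum(r_lo + r_hi)
        out_l = ((1 - q_l) * r_lo @ y_l + q_l * r_hi @ y_l) / denom  # Eq. (6)
        out_r = ((1 - q_r) * r_hi @ y_r + q_r * r_lo @ y_r) / denom  # Eq. (7)
        return out_l + out_r                                         # Eq. (8)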

4.2. Learning Policy

The learning policy of eT2Class comprises three parts: rule generation, rule pruning, and rule merging. Other essential concepts corresponding to these three issues are fuzzy rule initialization and rule premise adaptation. The eT2Class procedure is depicted in Fig. 6.

a. Rule Generation

The rule generation mechanism handles the rule growing process. It is orchestrated by two main methods, namely T2DS and T2DQ. T2DS measures data significance, whereas T2DQ measures data quality.

The T2DS method estimates the generalization potential and summarization power of the data stream. This first rule growing cursor is modified from [57] to accommodate the interval type-2 fuzzy system. It predicts the statistical contribution of the fuzzy rules, i.e., their expected future contribution to model outputs/predictions. Under the prior assumption that the training samples are uniformly distributed, the statistical contribution of the (P+1)th hypothetical cluster can be estimated with the formula:

$DS_n = \dfrac{1}{2}\left(\dfrac{V_{P+1}^1}{\sum_{i=1}^{P+1} V_i^1} + \dfrac{V_{P+1}^2}{\sum_{i=1}^{P+1} V_i^2}\right)$ (9)

where $P$ is the number of currently existing clusters (rules) and $V_{P+1}$ expresses the volume of the (P+1)th multivariate Gaussian rule, which can be calculated by the det operator [16]. To be qualified as a new rule, the hypothetical cluster needs to have a higher volume than all currently existing clusters. This rule generation condition governed by T2DS is expressed as follows:

$(V_{P+1}^1 + V_{P+1}^2) > \max_{i=1,\dots,P}(V_i^1 + V_i^2)$ (10)

The left-hand side expresses the upper and lower predictions of the volume, whereas $V_i$ expresses the volume of a multivariate Gaussian rule formed previously, before the new incoming datum.

The second rule growing cursor is T2DQ (Type-2 Data Quality) [16]. T2DQ quantifies the spatial proximity of an incoming datum to previously trained data patterns and thus exhibits a data density measure. Type-2 Data Quality is defined as follows:

$DQ_N = \sqrt{\dfrac{U_n}{U_n(1+b_n) - 2h_n + g_n}}$ (11)

$U_n = U_{n-1} + DQ_{N-1}, \quad b_n = \sum_{j=1}^{u}(x_j^N)^2$ (12)

$h_n = \sum_{j=1}^{u} x_j^N p_j^n, \quad p_n = p_{n-1} + DQ_{N-1}X_n, \quad g_n = g_{n-1} + DQ_{N-1}b_n$ (13)

The training process initializes all recursive parameters to zero. A new incoming datum can be considered as a new fuzzy rule when it stands in a dense area or in an area remote from the current influence zone. This condition is formulated as:

$DQ_N \ge \max_{i=1,\dots,P}(DQ_i)$ or $DQ_N \le \min_{i=1,\dots,P}(DQ_i)$ (14)

where $DQ_N$ denotes the current data quality and $DQ_i$ denotes the data quality of a previously generated rule. Even though this data quality measure incurs only trivial computational complexity, these two conditions suffer from an outlier bottleneck; therefore, to reduce the influence of outliers, the weighting factor $DQ_{N-1}$ must be applied [84]. The $DQ_N \ge \max_{i=1,\dots,P}(DQ_i)$ condition is satisfied when the newly created fuzzy region is deemed too close to, or overlaps, the current region.
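The two growing cursors can be summarized as in the sketch below; whether eT2Class requires one or both conditions to fire is not explicit in the text above, so the disjunction here is an assumption.

    # Rule-growing checks, Eqs. (10) and (14).
    import numpy as np

    def should_grow(v_new, v_rules, dq_new, dq_rules):
        """v_new: (2,) upper/lower volumes of the hypothetical rule;
        v_rules: (P, 2) volumes of existing rules; dq_*: data qualities."""
        volume_check = v_new.sum() > np.max(v_rules.sum(axis=1))              # Eq. (10)
        quality_check = dq_new >= dq_rules.max() or dq_new <= dq_rules.min()  # Eq. (14)
        return volume_check or quality_check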

b. Rule Initialization

In [29], the distance ratio between inter- and intra-class clusters was proposed to avoid the overlapping situation. This work was enhanced by [60] to include the case of a non-purified cluster, in which many classes may exist in the overlapping region. It formulates the quality-per-class concept, in which samples of different classes may fall in the same cluster, and applies it in the type-2 fuzzy system structure.

The quality-per-class method is formulated as follows:

$DQ_o = \sqrt{\dfrac{N_o - 1}{(N_o - 1)(ab_n + 1) + (cb_n - 2bb_n)}}$ (15)

where $ab_n = \sum_{j=1}^{u+m}(x_j^N)^2$, $cb_{n_o} = cb_{n_o-1}$, $bb_{n_o} = \sum_{j=1}^{u+m} x_j^N d_j$, and $d_{n_o} = d_{n_o-1} + x^{N_o-1}$. The number of samples labeled as the oth class is denoted as $N_o$; $x_j^N$ and $x_j^{n_o}$ denote the latest incoming datum and the latest one falling in the oth class, respectively.

The first class overlapping condition is indicated by $\max_{o=1,\dots,m}(DQ_o) \ne true\_class\_label$, which may deteriorate the classifier's generalization. Thus, shifting and shrinking the cluster are reasonable ways to handle this situation. Henceforth, the new rule is formulated as follows:

$c_{P+1,j}^{1,2} = x_j - \rho_2(c_{ie,j}^{1,2} - x_j), \quad dist_1^j = \dfrac{\rho_1}{N_{win}}\left|c_{P+1,j}^1 - c_{ie,j}^1\right|, \quad dist_2^j = \dfrac{2\rho_1}{N_{win}}\left|c_{P+1,j}^2 - c_{ie,j}^2\right|, \quad \tilde{\Sigma}_{P+1}^{-1} = (dist_{1,2}^T\, dist_{1,2})^{-1}$ (16)

where $ie$ and $ir$ are the closest inter- and intra-class clusters, respectively; $\rho_1 = r_{ir}^j / r_{ie}^j$ denotes the overlapping factor, determining the fuzzy region size; and $\rho_2 \in [0.01, 0.1]$ denotes the shifting factor. The upper fuzzy set is factored by the standard deviation of the $N_{win}$ supports of the winning cluster, whereas the lower fuzzy set can be assigned the standard deviation of an $N_{win}/2$ population [66].

The second class overlapping condition occurs when the rule has a stronger neighboring relationship with the intra-class data clouds, denoted by $\max_{o=1,\dots,m}(DQ_o) = true\_class\_label$. The decision surface is not affected, yet there is still a chance of misclassification. A confidence parameter is assigned as follows:

$c_{P+1,j}^{1,2} = x_j + \rho_2(c_{ir,j}^{1,2} - c_{ie,j}^{1,2}), \quad dist_1^j = \dfrac{\rho_1}{N_{win}}\left|c_{P+1,j}^1 - c_{ir,j}^1\right|, \quad dist_2^j = \dfrac{2\rho_1}{N_{win}}\left|c_{P+1,j}^2 - c_{ir,j}^2\right|, \quad \tilde{\Sigma}_{P+1}^{-1} = (dist_{1,2}^T\, dist_{1,2})^{-1}$ (17)

The new fuzzy rule is then evolved as follows:

$c_{P+1,j}^{1,2} = x_j, \quad dist_1^j = \dfrac{\rho_1}{N_{win}}\left|x_j - c_{ir,j}^1\right|, \quad dist_2^j = \dfrac{2\rho_1}{N_{win}}\left|x_j - c_{ir,j}^2\right|, \quad \tilde{\Sigma}_{P+1}^{-1} = (dist_{1,2}^T\, dist_{1,2})^{-1}$ (18)

The rule consequent of the new fuzzy rule is formed as follows:

$\Omega_{P+1}^{l,r} = \Omega_{win}^{l,r}, \quad \Psi_{P+1}^{l,r} = \omega I$ (19)

where $\omega$ is a large positive constant valued $\omega = 10^5$, so that $\Psi_{P+1}$ is a large positive definite matrix.

c. Rule Recall Mechanism

The rule recall mechanism is carried out if the following condition is met:

$\max_{i^*=1,\dots,P^*} \varkappa_{i^*} > \max_{i=1,\dots,P+1}(DQ_i)$ (20)

where $\varkappa_{i^*}$ denotes the potential of the pruned rules, $P^*$ denotes the number of fuzzy rules that have already been pruned, and $DQ_i$ expresses the current data distribution. This condition checks whether the maximum potential value of the previously pruned rules is higher than the maximum of the current data distribution. If it is met, the rule recall mechanism is activated; otherwise, eT2Class adds and initializes a new fuzzy rule. The rule recall mechanism aims to handle cyclic drift situations without inducing catastrophic forgetting of previously valid knowledge.

d. Rule Premise Adaptation

Rule premise adaptation occurs when the rule growing conditions of equations (10) and (14) are not met. In this case, the rule premise of the winning rule is adjusted to refine the coverage area of the winning cluster. A marginal conflict between a datum and the existing knowledge base can be easily alleviated by adapting the premise of the winning rule as follows:

$C_{win}^{1,2}(N) = \dfrac{N_{win}^{N-1}}{N_{win}^{N-1}+1}\,C_{win}^{1,2}(N-1) + \dfrac{X_N - C_{win}^{1,2}(N-1)}{N_{win}^{N-1}+1}$ (21)

$\Sigma_{win,1,2}^{-1}(N) = \dfrac{\Sigma_{win,1,2}^{-1}(N-1)}{1-\alpha} - \dfrac{\alpha}{1-\alpha}\,\dfrac{\left(\Sigma_{win,1,2}^{-1}(N-1)(X_N - C_{win,1,2}^{N-1})\right)\left(\Sigma_{win,1,2}^{-1}(N-1)(X_N - C_{win,1,2}^{N-1})\right)^T}{1 + \alpha(X_N - C_{win,1,2}^{N-1})\,\Sigma_{win,1,2}^{-1}(N-1)\,(X_N - C_{win,1,2}^{N-1})^T}$ (22)

$N_{win}^N = N_{win}^{N-1} + 1$ (23)

where $\alpha = \dfrac{1}{N_{win}^{N-1}+1}$. The rule premise adaptation scheme is derived from sequential maximum likelihood estimation for a spherical cluster, adjusted to the non-axis-parallel ellipsoidal cluster. Moreover, equation (22) enables a direct update of the inverse covariance matrix, evading a further re-inversion step; it is based on the update scheme in [58], derived with the help of the Neumann series. A re-inversion step can trigger numerical instability if the rank of the matrix becomes low, and requires additional computation time.
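A sketch of this update follows; the rank-one inverse-covariance correction is written in its standard Sherman-Morrison form, which matches the description of a direct inverse update that avoids re-inversion.

    # Premise adaptation of the winning rule, following Eqs. (21)-(23) as printed.
    import numpy as np

    def update_premise(c, inv_cov, n_win, x):
        alpha = 1.0 / (n_win + 1)
        c_new = n_win / (n_win + 1) * c + (x - c) / (n_win + 1)      # Eq. (21)
        v = inv_cov @ (x - c)
        inv_cov_new = inv_cov / (1 - alpha) - (alpha / (1 - alpha)) * \
            np.outer(v, v) / (1 + alpha * (x - c) @ inv_cov @ (x - c))  # Eq. (22)
        return c_new, inv_cov_new, n_win + 1                         # Eq. (23)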

e. Rule Pruning

If the rule-growing criteria are not fulfilled and the rule premise has been adjusted by rule premise adaptation, the rule pruning scenario is executed. The pruning procedures are orchestrated by two methods, the T2ERS (Type-2 Extended Rule Significance) method and the T2P+ (Type-2 Potential+) method, which are extensions of their type-1 versions in [16]. Both methods can detect inconsequential and obsolete clusters. Inconsequential rules are those that have played a minimal role during their lifespan; this condition can be interpreted as outliers having initiated fuzzy rules in previous training observations. An obsolete rule is one no longer relevant to capture the current data trend, e.g., due to concept drifts or due to sample insignificance (the latter meaning that the obsolete rule denotes an outlier rule).

The first rule pruning method, T2ERS, utilizes the statistical contribution of the rule antecedent and consequent. It is formulated as:

$ERS_i^n = \dfrac{1}{2}\left|\sum_{j=1}^{2u+1}\sum_{o=1}^{m}\left(\Omega_{i,l}^{o,j} + \Omega_{i,r}^{o,j}\right)\right|\left(\dfrac{V_i^1}{\sum_{i=1}^{P}V_i^1} + \dfrac{V_i^2}{\sum_{i=1}^{P}V_i^2}\right)$ (24)

This formula is based on the assumption of uniformly distributed training data, where the contribution of the rule antecedent can be approximated by its volume, while the rule consequent can be quantified by its accumulated weight. A rule is pruned when it meets the condition:

$ERS_i^n \le mean(ERS_i^n) - m \cdot std(ERS_i^n)$ (25)

where ERS is the abbreviation for Extended Rule Significance, and the mean and standard deviation (std) are taken over past values; per default m = 1, which makes the algorithm rather optimistic, thus favoring compact rule bases with a low number of rules.
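The pruning test of equation (25) reduces to a one-line statistical check:

    # T2ERS pruning test, Eq. (25): prune rules whose significance falls m
    # standard deviations below the mean significance of all rules.
    import numpy as np

    def prune_mask(ers, m=1.0):
        """ers: (P,) ERS values of the current rules; True marks a pruned rule."""
        return ers <= ers.mean() - m * ers.std()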

The other rule pruning method, T2P+, detects outdated fuzzy rules that have become irrelevant to cover the current data trend. This scenario becomes very relevant in the case of drifts. The pruning method should be able to trace the cluster evolution with respect to the current data distribution as:

$\varkappa_i = \sqrt{\dfrac{(N-1)\,\varkappa_{N-1,i}^2}{2\varkappa_{N-1,i}^2 + 2\varkappa_{N-1,i}\sum_{j=1}^{m+o}\left(x_{i,j}^{N-1} - c_{i,j}\right)^2 + (N-2)}}$ (26)

Note that equation (26) is derived under the assumption of identical upper and lower centroids. The rule is pruned if the following condition is satisfied:

$\varkappa_i^n \le mean(\varkappa_i^n) - std(\varkappa_i^n)$ (27)

f. Rule Merging Scenario

This scenario aims to guarantee the compactness and interpretability of the rule base and is developed from the vector similarity measure of [30]. The rule merging scenario is based on both distance and shape to calculate the similarity degree, and it is enhanced by a blow-up test, which prevents the coalition of two non-homogeneous clusters from harming the classifier's accuracy. The combined proximity and shape formula is expressed as:

$s_{v,j}(win, i) = s_{1,j}(win, i) \times s_{2,j}(win, i)$ (28)

where $s_{1,j}(win, i) \in [0,1]$ and $s_{2,j}(win, i) \in [0,1]$ describe the shape-based and distance-based similarity measures, respectively. A particular characteristic of this formula lies in its alignment procedure, which addresses the two types of similarity separately. To quantify the shape-based similarity measure, the $\widetilde{win}$ and $\tilde{i}$ fuzzy sets are aligned until $c_{win,j}^{1,2} = c_{i,j}^{1,2}$ is reached, meaning that the two clusters coincide with each other. Then, the extended Jaccard similarity measure is applied to the type-2 fuzzy rule with the use of the average cardinality method as:

$s_{1,j}(win, i) = \dfrac{M(\underline{\mu}_{win,j} \cap \underline{\mu}_{i,j}) + M(\overline{\mu}_{win,j} \cap \overline{\mu}_{i,j})}{M(\underline{\mu}_{win,j} \cup \underline{\mu}_{i,j}) + M(\overline{\mu}_{win,j} \cup \overline{\mu}_{i,j})}$ (29)

On the other hand, the distance-based similarity measure relies on the kernel-based metric method [57]. The average cardinality method is used to infer the resultant type-2 distance-based similarity, which is formulated as:

$s_{2,j}(win, i) = \dfrac{\exp(-A) + \exp(-B)}{2}$ (30)

$A = |c_{win,j}^1 - c_{i,j}^1| - |\sigma_{win,j}^1 - \sigma_{i,j}^1|, \quad B = |c_{win,j}^2 - c_{i,j}^2| - |\sigma_{win,j}^2 - \sigma_{i,j}^2|$

Two fuzzy rules are deemed identical by calculating $s_{v,j}(win, i)$ for each input dimension and combining the results with the t-norm operator. The condition is formulated as:

$S_v \ge \rho_3, \quad S_v = \min_{j=1,\dots,u}(S_{v,j})$ (31)

where $\rho_3$ is a predefined constant set to the value of 0.5.

The blow-up effect method proposed in [14] is applied to deal with non-homogeneous clusters having different orientations. In this case, merging can lead to an oversized volume of the merged cluster (the blow-up effect). When the merged cluster is significantly greater than the total volume of the two independent clusters, the rule merging scenario should be avoided. Therefore, merging can only be accomplished if the following condition is met:

$V_{merged}^1 + V_{merged}^2 \le u(V_{win}^1 + V_i^1) + (V_{win}^2 + V_i^2)$ (32)

The term $u$ is used to compensate for the curse of dimensionality effect [14]. Merging two overlapping clusters into one is conducted by:

$C_{merged}^{1,2} = \dfrac{C_{win}^{1,2} N_{win}^{old} + C_i^{1,2} N_i^{old}}{N_{win}^{old} + N_i^{old}}$ (33)

$\Sigma_{merged,1,2}^{-1} = \dfrac{\Sigma_{win,1,2}^{-1} N_{win}^{old} + \Sigma_{i,1,2}^{-1} N_i^{old}}{N_{win}^{old} + N_i^{old}}$ (34)

$N_{merged}^{1,2} = N_{win}^{old} + N_i^{old}$ (35)

where $C_{merged}$ expresses the merged cluster centroid, $\Sigma_{merged}^{-1}$ defines the inverse covariance matrix of the merged cluster, and $N$ expresses the number of data samples being merged.
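Equations (33)-(35) amount to support-weighted averaging, as in the sketch below:

    # Cluster merging, Eqs. (33)-(35): support-weighted averages of the
    # centroids and inverse covariance matrices of the two clusters.
    import numpy as np

    def merge_rules(c_win, icov_win, n_win, c_i, icov_i, n_i):
        n_m = n_win + n_i                                   # Eq. (35)
        c_m = (c_win * n_win + c_i * n_i) / n_m             # Eq. (33)
        icov_m = (icov_win * n_win + icov_i * n_i) / n_m    # Eq. (34)
        return c_m, icov_m, n_m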

g. Adaptation of the q Coefficient and Rule Output

After the previous processes are executed (rule growing, rule pruning, or rule merging), the q coefficients ($q_l$ and $q_r$) need to be adapted, which is achieved using the ZEDM method [83]. Since ZEDM applies the error entropy as the objective function, the optimization process aims to concentrate the error distribution between the classifier's output and the target class around zero. The Parzen window estimation method is exploited to form the cost function as follows:

$\hat{f}(0) = \dfrac{1}{Nh\sqrt{2\pi}}\sum_{n=1}^{N}\exp\left(-\dfrac{e_{n,o}^2}{2h^2}\right) = \dfrac{1}{Nh\sqrt{2\pi}}\sum_{n=1}^{N} K\left(-\dfrac{e_{n,o}^2}{2h^2}\right)$ (36)

where $N$ denotes the number of data samples processed so far, $h$ defines the smoothing factor, valued 1, and $e_{n,o}$ presents the system error of the nth training cycle for the oth class. The gradient ascent update in the eT2Class sequential learning environment is formulated as follows:

$A_N = \sum_{n=1}^{N}\exp\left(-\dfrac{e_n^2}{2}\right) = A_{N-1} + \exp\left(-\dfrac{e_{N,o}^2}{2}\right), \quad \dfrac{\partial\hat{f}(0)}{\partial q_{l,r}^o} = \dfrac{A_N}{N\sqrt{2\pi}}\,\dfrac{\partial E}{\partial q_{l,r}^o}$ (37)

The learning rate to support the training process is defined as follows:

$\eta_o(N) = \begin{cases}\rho_5\,\eta_o(N-1), & \hat{f}(0)_N \ge \hat{f}(0)_{N-1}\\ \rho_4\,\eta_o(N-1), & \hat{f}(0)_N < \hat{f}(0)_{N-1}\end{cases}, \quad 0 < \rho_4 < \rho_5$ (38)

where $\rho_5 \in (1, 1.5]$ and $\rho_4 \in [0.5, 1)$ stand for the learning rate factors, which respectively grow and reduce the learning rate. Convergence is assured by keeping the learning rate in the range $0 < \eta_o < \dfrac{2N\sqrt{2\pi}}{(P_{o,max})^2 A_N}$,

which comes from the Lyapunov stability criterion. In addition, we develop a local learning scenario called Fuzzily Weighted Generalized Recursive Least Square (FWGLRS) which is inspired by the Generalized Recursive Least Square (GRLS) method [85]. The salient characteristic of this method can be seen in its weight decay term. The weight decay term is capable of keeping the weight vector to only hover around the small bounded interval. This method is useful for improving the stability and compactness of the rule base. The FWGRLS method is expressed as follows: ∆(𝑛)

In addition, we develop a local learning scenario called Fuzzily Weighted Generalized Recursive Least Squares (FWGRLS), which is inspired by the Generalized Recursive Least Squares (GRLS) method [85]. The salient characteristic of this method is its weight decay term, which keeps the weight vector hovering within a small bounded interval; this is useful for improving the stability and compactness of the rule base. The FWGRLS method is expressed as follows:

ψ^{l,r}(n) = Ψ_i^{l,r}(n−1) F^T(n) (Λ̃(n) + F(n) Ψ_i^{l,r}(n−1) F^T(n))^{−1}   (39)

Ψ_i^{l,r}(n) = Ψ_i^{l,r}(n−1) − ψ^{l,r}(n) F(n) Ψ_i^{l,r}(n−1)   (40)

Ω_i^{l,r}(n) = Ω_i^{l,r}(n−1) − ϖ Ψ_i^{l,r}(n) ∇ξ(Ω_i^{l,r}(n−1)) + ψ^{l,r}(n)(t(n) − y(n))   (41)

y(n) = x_e Ω_i^{l,r}(n),   F(n) = ∂y(n)/∂Ω_i^{l,r}(n) = x_e   (42)

where Λ̃(n) ∈ ℜ^{(P+1)×(P+1)} defines a diagonal matrix whose diagonal elements comprise the firing strengths of the fuzzy rule R̃_i. The output covariance matrix is denoted by Δ(n), which can be set as an identity matrix [85]. Furthermore, the gradient of the weight decay function is expressed by ∇ξ(Ω_i^{l,r}(n−1)), and ϖ ≈ 10^{−15} is a problem-specific predefined constant. The weight decay function can be specified as any nonlinear function, which usually has an inexact gradient solution. eT2Class utilizes a quadratic weight decay function due to its capability of shrinking the weight vector proportionally to its current values.
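For illustration, a single FWGRLS update for one rule with a scalar output can be sketched in Python as below. This is a simplified reading of Eqs. (39)-(42), with the fuzzy weighting term reduced to a scalar lam and the quadratic weight decay whose gradient is the weight vector itself; it is a sketch under those assumptions, not the authors' exact implementation.

import numpy as np

def fwgrls_step(Omega, Psi, x_e, t, lam, varpi=1e-15):
    # Omega: (d,1) weight vector; Psi: (d,d) covariance; x_e: (d,) extended input.
    F = x_e.reshape(-1, 1)                                   # F(n) = x_e, Eq. (42)
    gain = Psi @ F / (lam + (F.T @ Psi @ F).item())          # Eq. (39)
    Psi = Psi - gain @ F.T @ Psi                             # Eq. (40)
    y = (F.T @ Omega).item()                                 # y(n), Eq. (42)
    Omega = Omega - varpi * (Psi @ Omega) + gain * (t - y)   # Eq. (41), quadratic decay
    return Omega, Psi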

V. Experimental Design and Result

In order to compare algorithm performance, eT2Class and six benchmarked classifier algorithms, namely ANFIS, eTS, Simpl_eTS, DFNN, GDFNN, and FAOSPFNN, are tested on the web news article data. The data collection technique is explained in Section 3. The classification procedure of each algorithm in this experiment is depicted in Fig. 7. The classification task specifies the number of categories to be classified. For the 2-category classification task, all 15 possible combinations are tested. For the classification tasks involving a higher number of categories (the 3-, 4-, 5-, and 6-category tasks), the details are shown in Table 3.

This section comprises three subsections. Subsection 5.1 discusses the preprocessing step, Subsection 5.2 illustrates the experimental setup, and Subsection 5.3 presents the comprehensive results of the algorithms.

5.1 Preprocessing Step

Web news articles are characterized as non-stationary data. Their content is generated by unpredictable events and arrives at any time from many sources (news editors), each contributing a large amount of news. Therefore, news articles can be regarded as a data stream with characteristics that change (drift) over time. In this sense, an evolving model, i.e., a model able to appropriately expand and shrink its structure on demand and on the fly, is necessary for classifying such articles with sufficient accuracy.

eT2Class is utilized to analyze news articles for several reasons: (1) it is incremental and evolving, handling the rapid arrival of data by fast model adaptation in the form of parameter updates and structural evolution; (2) it has the ability to handle concept drift in unpredictable data; (3) it can deal with uncertainty (due to noise) in streams on a higher level.

The generated dataset consists of text document articles in six categories: business, entertainment, lifestyle, politics, sport, and technology. Document articles are stored in a local repository for analysis purposes. Data crawling is carried out manually due to the necessity of analyzing up-to-date, real-time (online) data.

5.2 Experiment

The experiments are carried out on the following computer: a MacBook Pro with a 2.9 GHz Intel Core i7 processor and 8 GB of 1600 MHz DDR3 memory. The software used to run the experiments is MATLAB R2015b 64-bit (MacI64).

The implementation and data used in this paper are publicly available at http://homepage.cs.latrobe.edu.au/mpratama/appliedsoftcomputing/WebNewsMining.zip.

The number of terms used for the classification task, as described in Section 5.1, is 54, 46, 61, 46, 42, and 41 for the 1st, 2nd, 3rd, 4th, 5th, and 6th categories, respectively. For comparison, eT2Class is benchmarked against six other algorithms: ANFIS [86], eTS [87], Simpl_eTS [46], DFNN [88], GDFNN [89], and FAOSPFNN [90], measuring three criteria: accuracy, running time, and the number of fuzzy rules generated.

The reasons we choose these algorithms for comparison are as follows:
1. ANFIS is a well-known and widely used neuro-fuzzy classifier working in a fully batch, non-evolving mode, thus serving as a hard benchmark against our algorithm.
2. GDFNN, DFNN, and FAOSPFNN are widely used semi-evolving classifiers which can automatically generate rules and handle concept drift but still work in a batched learning scenario.
3. eTS and Simpl_eTS are used for comparison against eT2Class because both algorithms are seen as evolving classifiers (and are also used in the related work by Iglesias [11]). However, both of them apply a type-1 classifier architecture, so this comparison illustrates the efficacy of eT2Class against its type-1 counterparts.
Several classification scenarios across different combinations of categories are carried out in this experiment. A summary of our experiment is given in Table 3.

Table 3 lists the classification tasks carried out in this experiment for every algorithm. Each classification task is executed 50 times with random permutation in order to avoid data-order dependency problems. For this task, we number the categories business, entertainment, lifestyle, politics, sport, and technology as 1, 2, 3, 4, 5, and 6, respectively. For example, classification task [1,2,3] classifies the 1st, 2nd, and 3rd categories, i.e., business, entertainment, and lifestyle. All category subsets of a given size can be enumerated as in the sketch below.
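An illustrative Python sketch of how the task lists in Table 3 arise (the 2- and 3-category columns are exactly all size-2 and size-3 subsets of the six categories); this is not part of the released implementation:

from itertools import combinations

categories = [1, 2, 3, 4, 5, 6]  # business, entertainment, lifestyle, politics, sport, technology
two_cat_tasks = list(combinations(categories, 2))    # 15 tasks
three_cat_tasks = list(combinations(categories, 3))  # 20 tasks
print(len(two_cat_tasks), len(three_cat_tasks))      # 15 20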

The classifier performance is measured by calculating the classification rate, running time, and number of rules generated for each combination of categories.

5.3 Results

This section details the performance of every classifier on every classification task with 2, 3, 4, 5, and 6 categories in terms of accuracy, running time, and number of rules generated. The supporting tables and figures of the performance results can be found in Appendix A of this document.

Generally, for the 2-, 3-, 4-, and 5-category classification tasks, all the algorithms achieve an accuracy above 90 percent, except for the DFNN and FAOSPFNN algorithms, which in the 5-category task achieve 84.8% and 80.5%, respectively, as shown in Tables 4 to 10. In the 6-category classification, the eTS algorithm also shows a decreasing performance, down to 88.4% accuracy. On the other hand, in the 6-category classification, ANFIS, eT2Class, GDFNN, and Simpl_eTS stay above 90 percent accuracy, with 100%, 96.4%, 95.6%, and 94.9%, respectively, as shown in Table 11. The average accuracy across the varying numbers of categories is shown in Fig. 8. It can be seen that ANFIS and eT2Class have comparable accuracy for all these classification tasks, regardless of the number of categories. It is worth mentioning, though, that ANFIS is a classical static benchmark algorithm with the big advantage of seeing all data at once, but it suffers from demanding computational time, especially when the number of categories increases.

In terms of runtime, Simpl_eTS is the best-performing algorithm for the 2-category classification task, with an average runtime of 0.46 s (see the running-time sections of Tables 4 and 5), followed by ANFIS and eT2Class at 1.312 and 3.11 seconds, whereas the other algorithms take more than 10 seconds. In the 3-category classification task, eT2Class outperforms Simpl_eTS with an average running time of around 3 seconds for the former versus around 4 seconds for the latter, whereas ANFIS takes around 9 seconds. The other algorithms have incomparable running times in the larger tasks, at around 80, 90, 175, 350, and 1400 seconds for DFNN, eTS, GDFNN, ANFIS, and FAOSPFNN, respectively. As the number of categories in the classification task increases (4, 5, and 6 categories), both ANFIS and Simpl_eTS exhibit longer running times than eT2Class. The most notable runtime result is in the 6-category classification task (Table 11), where Simpl_eTS and ANFIS require 17.636 and 488.672 seconds to accomplish the task, whereas eT2Class needs only 2.27 seconds.

In terms of the number of rules, Simpl_eTS, ANFIS, and eT2Class generate the fewest rules. Although Simpl_eTS achieves the best numerical result (generating only one rule no matter how many categories are in the classification task), eT2Class shows performance comparable with Simpl_eTS, generating only 2.13 rules in the 6-category classification.

In conclusion, eT2Class delivers the most encouraging balance of runtime performance, competitive accuracy, and number of rules. Fig. 10 shows that eT2Class achieves the fastest running time, at around 3 seconds on average across all numbers of categories. Even though ANFIS has the highest performance with 100% accuracy, the results for eT2Class are competitive, with 97% average accuracy. Note that ANFIS suffers from a complexity problem due to its batch learning characteristic: it requires many parameters to be maintained because it stores all training data in every learning process. eT2Class also generates few rules (around 3), as shown in Fig. 13. Although Simpl_eTS performs better in the number of rules, eT2Class achieves better accuracy and computational speed than the former.

VI. Discussion

This web news mining framework is built on the recently published evolving algorithm eT2Class. In addition, our web news mining framework is customized for local Australian web news articles, namely the Age. The motivation for evolving web news mining lies in the real-time, non-stationary characteristics of web news articles, which need to be handled by a fast and evolving algorithm. The working principle of eT2Class guarantees the validity of the network structure, because it evolves on demand, in accordance with the up-to-date context, and on the fly, thereby being scalable for online real-time applications. eT2Class is also capable of starting its learning process from scratch without any pre-recorded data.

VII. Conclusion and Future Work

This paper aims to present a new paradigm of web news mining using an evolving algorithm, namely eT2Class. In this work, the efficacy of eT2Class has been demonstrated and proven to be very effective in comparison with state-of-the-art algorithms. The algorithm proposed in this framework can be extended to various fields of text mining. It can be used in other text mining domains, such as social media mining for trend and sentiment analysis and web link analysis to evaluate customer behavior in real time.

In the future, this framework can be extended to distributed learning settings, handling heterogeneous data streams from different sources.

References

[1] CNNIC, "Statistical report on internet development in China, July 2014," 2014.
[2] B. Babcock, S. Babu, M. Datar, R. Motwani, and J. Widom, "Models and issues in data stream systems," 2002, pp. 1-16.
[3] J. Read, A. Bifet, B. Pfahringer, and G. Holmes, "Batch-incremental versus instance-incremental learning in dynamic and evolving data," in Advances in Intelligent Data Analysis XI, Springer, 2012, pp. 313-323.
[4] S. Guha, N. Mishra, R. Motwani, and L. O'Callaghan, "Clustering data streams," 2000, pp. 359-366.
[5] S. Guha, C. Kim, and K. Shim, "XWAVE: optimal and approximate extended wavelets," 2004, pp. 288-299.
[6] G. Hulten, L. Spencer, and P. Domingos, "Mining time-changing data streams," 2001, pp. 97-106.
[7] X. H. Dang, V. Lee, W. K. Ng, A. Ciptadi, and K. L. Ong, "An EM-based algorithm for clustering data streams in sliding windows," 2009, pp. 230-235.
[8] R. R. Ade and P. R. Deshmukh, "Methods for incremental learning: a survey," International Journal of Data Mining & Knowledge Management Process, vol. 3, p. 119, 2013.
[9] M. Sayed-Mouchaweh and E. Lughofer, Learning in Non-Stationary Environments: Methods and Applications, Springer Science & Business Media, 2012.
[10] P. Angelov, "Evolving Takagi-Sugeno fuzzy systems from streaming data," in Evolving Intelligent Systems: Methodology and Applications, vol. 12, p. 21, 2010.
[11] J. A. Iglesias, A. Tiemblo, A. Ledezma, and A. Sanchis, "Web news mining in an evolving framework," Information Fusion, vol. 28, pp. 90-98, 2016.
[12] P. P. Angelov and X. Zhou, "Evolving fuzzy-rule-based classifiers from data streams," IEEE Transactions on Fuzzy Systems, vol. 16, pp. 1462-1475, 2008.
[13] J. M. Mendel, "Advances in type-2 fuzzy sets and systems," Information Sciences, vol. 177, pp. 84-110, 2007.
[14] E. Lughofer, Evolving Fuzzy Systems - Methodologies, Advanced Concepts and Applications, vol. 53, Springer, 2011.
[15] E. Lughofer, "Evolving fuzzy systems - fundamentals, reliability, interpretability, useability, applications," in Handbook of Computational Intelligence, P. Angelov, Ed., World Scientific, 2015.
[16] M. Pratama, S. G. Anavatti, M. J. Er, and E. D. Lughofer, "pClass: an effective classifier for streaming examples," IEEE Transactions on Fuzzy Systems, vol. 23, pp. 369-386, 2015.
[17] M. Pratama, S. G. Anavatti, and E. Lughofer, "GENEFIS: toward an effective localist network," IEEE Transactions on Fuzzy Systems, vol. 22, pp. 547-562, 2014.
[18] A. Lemos, W. Caminhas, and F. Gomide, "Adaptive fault detection and diagnosis using an evolving fuzzy classifier," Information Sciences, vol. 220, pp. 64-85, 2013.
[19] E. Lughofer and O. Buchtala, "Reliable all-pairs evolving fuzzy classifiers," IEEE Transactions on Fuzzy Systems, vol. 21, pp. 625-641, 2013.
[20] H.-J. Rong, N. Sundararajan, G.-B. Huang, and G.-S. Zhao, "Extended sequential adaptive fuzzy inference system for classification problems," Evolving Systems, vol. 2, pp. 71-82, 2011.
[21] N. N. Karnik, J. M. Mendel, and Q. Liang, "Type-2 fuzzy logic systems," IEEE Transactions on Fuzzy Systems, vol. 7, pp. 643-658, 1999.
[22] L. A. Zadeh, "The concept of a linguistic variable and its application to approximate reasoning—I," Information Sciences, vol. 8, pp. 199-249, 1975.
[23] J. M. Mendel, "On the importance of interval sets in type-2 fuzzy logic systems," in Joint 9th IFSA World Congress and 20th NAFIPS International Conference, 2001, pp. 1647-1652.
[24] C.-F. Juang and Y.-W. Tsao, "A self-evolving interval type-2 fuzzy neural network with online structure and parameter learning," IEEE Transactions on Fuzzy Systems, vol. 16, pp. 1411-1424, 2008.
[25] C.-F. Juang and Y.-W. Tsao, "A type-2 self-organizing neural fuzzy system and its FPGA implementation," IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 38, pp. 1537-1548, 2008.
[26] S. W. Tung, C. Quek, and C. Guan, "eT2FIS: an evolving type-2 neural fuzzy inference system," Information Sciences, vol. 220, pp. 124-148, 2013.
[27] M. Pratama, J. Lu, and G. Zhang, "An incremental interval type-2 neural fuzzy classifier," in Proceedings of the IEEE Conference on Fuzzy Systems (FUZZ-IEEE), 2015, pp. 1-8.
[28] S. W. Tung, C. Quek, and C. Guan, "An evolving type-2 neural fuzzy inference system," in PRICAI 2010: Trends in Artificial Intelligence, Springer, 2010, pp. 535-546.
[29] K. Subramanian, S. Suresh, and N. Sundararajan, "A metacognitive neuro-fuzzy inference system (McFIS) for sequential classification problems," IEEE Transactions on Fuzzy Systems, vol. 21, pp. 1080-1095, 2013.
[30] D. Wu and J. M. Mendel, "A vector similarity measure for linguistic approximation: interval type-2 and type-1 fuzzy sets," Information Sciences, vol. 178, pp. 381-402, 2008.
[31] D. Wu and J. M. Mendel, "A comparative study of ranking methods, similarity measures and uncertainty measures for interval type-2 fuzzy sets," Information Sciences, vol. 179, pp. 1169-1192, 2009.
[32] Y.-Y. Lin, J.-Y. Chang, and C.-T. Lin, "A TSK-type-based self-evolving compensatory interval type-2 fuzzy neural network (TSCIT2FNN) and its applications," IEEE Transactions on Industrial Electronics, vol. 61, pp. 447-459, 2014.
[33] M. Pratama, S. G. Anavatti, and E. Lughofer, "Evolving fuzzy rule-based classifier based on GENEFIS," in 2013 IEEE International Conference on Fuzzy Systems (FUZZ), 2013, pp. 1-8.
[34] O. Etzioni, "The World-Wide Web: quagmire or gold mine?," Communications of the ACM, vol. 39, pp. 65-68, 1996.
[35] U. M. Fayyad, G. Piatetsky-Shapiro, and P. Smyth, "From data mining to knowledge discovery: an overview," 1996.
[36] R. Kosala and H. Blockeel, "Web mining research: a survey," ACM SIGKDD Explorations Newsletter, vol. 2, pp. 1-15, 2000.
[37] O. R. Zaiane, J. Han, Z.-N. Li, S. H. Chee, and J. Y. Chiang, "MultiMediaMiner: a system prototype for multimedia data mining," 1998, pp. 581-583.
[38] D. Barbará, "Requirements for clustering data streams," ACM SIGKDD Explorations Newsletter, vol. 3, pp. 23-27, 2002.
[39] D. K. Tasoulis, N. M. Adams, and D. J. Hand, "Unsupervised clustering in streaming data," 2006, pp. 638-642.
[40] A. Bifet, "Adaptive stream mining: pattern learning and mining from evolving data streams," 2010, pp. 1-212.
[41] M. Pratama, J. Lu, and G. Zhang, "Evolving interval type-2 fuzzy classifier," IEEE Transactions on Fuzzy Systems, vol. 24, pp. 574-589, 2016.
[42] C.-F. Juang and C.-T. Lin, "An online self-constructing neural fuzzy inference network and its applications," IEEE Transactions on Fuzzy Systems, vol. 6, pp. 12-32, 1998.
[43] N. K. Kasabov and Q. Song, "DENFIS: dynamic evolving neural-fuzzy inference system and its application for time-series prediction," IEEE Transactions on Fuzzy Systems, vol. 10, pp. 144-154, 2002.
[44] P. Angelov, J. Victor, A. Dourado, and D. Filev, "On-line evolution of Takagi-Sugeno fuzzy models," 2004.
[45] R. R. Yager and D. P. Filev, "Generation of fuzzy rules by mountain clustering," Journal of Intelligent & Fuzzy Systems, vol. 2, pp. 209-219, 1994.
[46] P. Angelov and D. Filev, "Simpl_eTS: a simplified method for learning evolving Takagi-Sugeno fuzzy models," 2005, pp. 1068-1073.
[47] H.-J. Rong, N. Sundararajan, G.-B. Huang, and P. Saratchandran, "Sequential adaptive fuzzy inference system (SAFIS) for nonlinear system identification and prediction," Fuzzy Sets and Systems, vol. 157, pp. 1260-1275, 2006.
[48] G.-B. Huang, P. Saratchandran, and N. Sundararajan, "An efficient sequential learning algorithm for growing and pruning RBF (GAP-RBF) networks," IEEE Transactions on Systems, Man and Cybernetics, vol. 34, pp. 2284-2292, 2004.
[49] G.-B. Huang, P. Saratchandran, and N. Sundararajan, "A generalized growing and pruning RBF (GGAP-RBF) neural network for function approximation," IEEE Transactions on Neural Networks, vol. 16, pp. 57-67, 2005.
[50] E. D. Lughofer, "FLEXFIS: a robust incremental learning approach for evolving Takagi-Sugeno fuzzy models," IEEE Transactions on Fuzzy Systems, vol. 16, pp. 1393-1410, 2008.
[51] R. Gray, "Vector quantization," IEEE ASSP Magazine, vol. 1, pp. 4-29, 1984.
[52] E. Lughofer, C. Cernuda, S. Kindermann, and M. Pratama, "Generalized smart evolving fuzzy systems," Evolving Systems, vol. 6, pp. 269-292, 2015.
[53] P. Angelov, "Fuzzily connected multimodel systems evolving autonomously from data streams," IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 41, pp. 898-910, 2011.
[54] A. Lemos, W. Caminhas, and F. Gomide, "Multivariable Gaussian evolving fuzzy modeling system," IEEE Transactions on Fuzzy Systems, vol. 19, pp. 91-104, 2011.
[55] E. Lima, M. Hell, R. Ballini, and F. Gomide, "Evolving fuzzy modeling using participatory learning," in Evolving Intelligent Systems: Methodology and Applications, pp. 67-86, 2010.
[56] P. Angelov and R. Yager, "A new type of simplified fuzzy rule-based system," International Journal of General Systems, vol. 41, pp. 163-185, 2012.
[57] M. Pratama, S. G. Anavatti, P. P. Angelov, and E. Lughofer, "PANFIS: a novel incremental learning machine," IEEE Transactions on Neural Networks and Learning Systems, vol. 25, pp. 55-68, 2014.
[58] E. Lughofer and M. Sayed-Mouchaweh, "Autonomous data stream clustering implementing split-and-merge concepts - towards a plug-and-play approach," Information Sciences, vol. 304, pp. 54-79, 2015.
[59] M. Pratama, S. G. Anavatti, and J. Lu, "Recurrent classifier based on an incremental metacognitive-based scaffolding algorithm," IEEE Transactions on Fuzzy Systems, vol. 23, pp. 2048-2066, 2015.
[60] M. Pratama, S. Anavatti, E. Lughofer, and C. P. Lim, "gClass: an incremental meta-cognitive-based scaffolding theory," submitted to a special issue of IEEE Computational Intelligence Magazine, 2014.
[61] Q. Liang and J. M. Mendel, "Interval type-2 fuzzy logic systems: theory and design," IEEE Transactions on Fuzzy Systems, vol. 8, pp. 535-550, 2000.
[62] H. Bustince Sola, J. Fernandez, H. Hagras, F. Herrera, M. Pagola, and E. Barrenechea, "Interval type-2 fuzzy sets are generalization of interval-valued fuzzy sets: toward a wider view on their relationship," IEEE Transactions on Fuzzy Systems, vol. 23, pp. 1876-1882, 2015.
[63] C.-F. Juang, Y.-Y. Lin, and C.-C. Tu, "A recurrent self-evolving fuzzy neural network with local feedbacks and its application to dynamic system processing," Fuzzy Sets and Systems, vol. 161, pp. 2552-2568, 2010.
[64] Y.-Y. Lin, J.-Y. Chang, N. R. Pal, and C.-T. Lin, "A mutually recurrent interval type-2 neural fuzzy system (MRIT2NFS) with self-evolving structure and parameters," IEEE Transactions on Fuzzy Systems, vol. 21, pp. 492-509, 2013.
[65] J. R. Castro, O. Castillo, P. Melin, and A. Rodríguez-Díaz, "A hybrid learning algorithm for a class of interval type-2 fuzzy neural networks," Information Sciences, vol. 179, pp. 2175-2193, 2009.
[66] C.-F. Juang and C.-Y. Chen, "Data-driven interval type-2 neural fuzzy system with high learning accuracy and improved model interpretability," IEEE Transactions on Cybernetics, vol. 43, pp. 1781-1795, 2013.
[67] M. Pratama, J. Lu, E. Lughofer, G. Zhang, and S. Anavatti, "Scaffolding type-2 classifier for incremental learning under concept drifts," Neurocomputing, vol. 191, pp. 304-329, 2016.
[68] M. Pratama, G. Zhang, M. J. Er, and S. Anavatti, "An incremental type-2 meta-cognitive extreme learning machine," 2020.
[69] A. Bouchachia and C. Vanaret, "GT2FC: an online growing interval type-2 self-learning fuzzy classifier," IEEE Transactions on Fuzzy Systems, vol. 22, pp. 999-1018, 2014.
[70] R. H. Abiyev and O. Kaynak, "Type 2 fuzzy neural structure for identification and control of time-varying plants," IEEE Transactions on Industrial Electronics, vol. 57, pp. 4147-4159, 2010.
[71] Y.-Y. Lin, S.-H. Liao, J.-Y. Chang, and C.-T. Lin, "Simplified interval type-2 fuzzy neural networks," IEEE Transactions on Neural Networks and Learning Systems, vol. 25, pp. 959-969, 2014.
[72] K. Subramanian, A. K. Das, S. Sundaram, and S. Ramasamy, "A meta-cognitive interval type-2 fuzzy inference system and its projection based learning algorithm," Evolving Systems, vol. 5, pp. 219-230, 2014.
[73] P. Angelov, P. Sadeghi-Tehran, and R. Ramezani, "An approach to automatic real-time novelty detection, object identification, and tracking in video streams based on recursive density estimation and evolving Takagi-Sugeno fuzzy systems," International Journal of Intelligent Systems, vol. 26, pp. 189-205, 2011.
[74] D. Dovžan, V. Logar, and I. Škrjanc, "Implementation of an evolving fuzzy model (eFuMo) in a monitoring system for a waste-water treatment process," IEEE Transactions on Fuzzy Systems, vol. 23, pp. 1761-1776, 2015.
[75] R.-E. Precup, H.-I. Filip, M.-B. Rădac, E. M. Petriu, S. Preitl, and C.-A. Dragoş, "Online identification of evolving Takagi-Sugeno-Kang fuzzy models for crane systems," Applied Soft Computing, vol. 24, pp. 1155-1163, 2014.
[76] S. Blažič, I. Škrjanc, and D. Matko, "A robust fuzzy adaptive law for evolving control systems," Evolving Systems, vol. 5, pp. 3-10, 2014.
[77] E. Lughofer, B. Trawiński, K. Trawiński, and T. Lasota, "On-line valuation of residential premises with evolving fuzzy models," in International Conference on Hybrid Artificial Intelligence Systems, 2011, pp. 107-115.
[78] E. Lughofer, V. Macián, C. Guardiola, and E. P. Klement, "Identifying static and dynamic prediction models for NOx emissions with evolving fuzzy systems," Applied Soft Computing, vol. 11, pp. 2487-2500, 2011.
[79] J. A. Iglesias, P. Angelov, A. Ledezma, and A. Sanchis, "Human activity recognition based on evolving fuzzy systems," International Journal of Neural Systems, vol. 20, pp. 355-364, 2010.
[80] M. F. Porter, "An algorithm for suffix stripping," Program, vol. 14, pp. 130-137, 1980.
[81] G. Salton and C. Buckley, "Term-weighting approaches in automatic text retrieval," Information Processing & Management, vol. 24, pp. 513-523, 1988.
[82] J. M. Mendel and R. I. B. John, "Type-2 fuzzy sets made simple," IEEE Transactions on Fuzzy Systems, vol. 10, pp. 117-127, 2002.
[83] L. M. Silva, L. A. Alexandre, and J. M. de Sá, "Neural network classification: maximizing zero-error density," in Pattern Recognition and Data Mining, Springer, 2005, pp. 127-135.
[84] L. Wang, H.-B. Ji, and Y. Jin, "Fuzzy Passive-Aggressive classification: a robust and efficient algorithm for online classification problems," Information Sciences, vol. 220, pp. 46-63, 2013.
[85] Y. Xu, K.-W. Wong, and C.-S. Leung, "Generalized RLS approach to the training of neural networks," IEEE Transactions on Neural Networks, vol. 17, pp. 19-34, 2006.
[86] J.-S. R. Jang, "ANFIS: adaptive-network-based fuzzy inference system," IEEE Transactions on Systems, Man and Cybernetics, vol. 23, pp. 665-685, 1993.
[87] P. P. Angelov and D. P. Filev, "An approach to online identification of Takagi-Sugeno fuzzy models," IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 34, pp. 484-498, 2004.
[88] S. Wu and M. J. Er, "Dynamic fuzzy neural networks - a novel approach to function approximation," IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 30, pp. 358-364, 2000.
[89] S. Wu, M. J. Er, and Y. Gao, "A fast approach for automatic generation of fuzzy rules by generalized dynamic fuzzy neural networks," IEEE Transactions on Fuzzy Systems, vol. 9, pp. 578-594, 2001.
[90] N. Wang, M. J. Er, and X. Meng, "A fast and accurate online self-organizing scheme for parsimonious fuzzy neural networks," Neurocomputing, vol. 72, pp. 3818-3829, 2009.

APPENDIX B

Fig. 1 Web news article classification with the eT2Class Framework

Pseudocode of stopword removal

Define: list of stopwords
For i = 1 to number_of_words_in_the_document do
    For j = 1 to number_of_words_in_stopword_list do
        If Words(i) == Stopwords(j) THEN
            Eliminate Words(i)
        End If
    End For
End For

Fig. 2 Pseudocode of stopword removal
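A runnable Python counterpart of the stopword-removal pseudocode, assuming a small illustrative stopword list rather than the one used in the experiments:

STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is"}  # illustrative sample

def remove_stopwords(words):
    # Drop every token found in the stopword list; a set lookup replaces
    # the inner loop of the pseudocode in Fig. 2.
    return [w for w in words if w.lower() not in STOPWORDS]

print(remove_stopwords(["the", "market", "is", "volatile"]))  # ['market', 'volatile']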

Pseudocode of tokenization

Define: list of unwanted characters
Define: String_article
For i = 1 to number_of_characters_in_String_article do
    If String_article[i] == 'unwanted character' THEN
        Remove String_article[i]
    End If
End For
String_split(String_article)

Fig. 3 Pseudocode of tokenization
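A runnable Python counterpart of the tokenization pseudocode; treating every non-alphanumeric, non-whitespace symbol as an unwanted character is our assumption, since the paper does not enumerate them:

import re

def tokenize(article):
    # Remove unwanted characters, then split the article into tokens,
    # mirroring the two steps of the pseudocode in Fig. 3.
    cleaned = re.sub(r"[^A-Za-z0-9\s]", " ", article)
    return cleaned.split()

print(tokenize("Stocks rallied, again -- up 2.3%!"))  # ['Stocks', 'rallied', 'again', 'up', '2', '3']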

Fig. 4 Details of TDM transformation in training and testing in the Classifier

Fig. 5 Network architecture of the eT2Class inference system

Define: input attributes and desired labels (X_n, T_n) = (x_1, …, x_u, t_1, …, t_m)
Predefined thresholds: ρ_2 = 0.01, ρ_3 = 0.8, ρ_4 = 0.9, ρ_5 = 1.1, ϖ = 10^{−15}
Current fuzzy classifier containing P rules
For i = 1 to P do
    Compute the posterior probability of the fuzzy rule and update the T2DQ method for all rules (11)
    Compute the volume of the existing rule using the det operation
End For
IF (10) and (14) == TRUE THEN
    IF max_{i*=1,…,P} κ_{i*} > max_{i=1,…,P+1}(DQ_i) THEN
        Activate the rule recall mechanism
    ELSE
        Add a new fuzzy rule and initialize it
    END IF
ELSE
    Adapt the rule premises (21-23)
    For i = 1 to P do
        Execute T2ERS (24)
        Execute T2P+ (26)
        IF ERS_i^n ≤ mean(ERS_i^n) − std(ERS_i^n) THEN
            Prune the fuzzy rule
        ELSE IF κ_i^n ≤ mean(κ_i^n) − std(κ_i^n) THEN
            Prune the fuzzy rule
        ELSE
            For j = 1 to u do
                Compute the shape-based and distance-based similarity measures
                Quantify the vector similarity measure
            End For
            IF (31) and (32) == TRUE THEN
                Merge the fuzzy rules (33-35)
            END IF
        END IF
    End For
END IF
For i = 1 to P do
    Adapt the q coefficients, fuzzy rule consequents, and design factors (39)-(42)
End For

Fig. 6. Pseudocode of eT2Class

Fig. 7. Classification Procedure for Each Classifier Algorithm

Fig. 8. Performance comparison based on the accuracy of each algorithm

Fig. 9 Running time comparison of eTS, DFNN, and GDFNN

Fig. 10 Running time comparison of Simpl_eTS and eT2Class

Fig. 11. Running time comparison of FAOSPFNN and ANFIS

Fig. 12. Comparison of number of rules for DFNN, GDFNN, and FAOSPFNN

Fig. 13. Comparison of number of rules for eTS, Simpl_eTS, ANFIS, and eT2Class

APPENDIX A

Table 1 List of main abbreviations

TDM | Term Document Matrix
EIS | Evolving Intelligent Systems
KM | Karnik-Mendel
FNN | Fuzzy Neural Network
ANFIS | Adaptive Neuro-Fuzzy Inference System (Adaptive Network-Based Fuzzy Inference System)
eTS | Evolving Takagi-Sugeno Fuzzy Systems
Simpl_eTS | Simplified Method for Learning Evolving Takagi-Sugeno Fuzzy Models
DFNN | Dynamic Fuzzy Neural Network
GDFNN | Generalized Dynamic Fuzzy Neural Network
FAOSPFNN | Fast and Accurate Online Self-organizing scheme for Parsimonious Fuzzy Neural Networks
eT2Class | Evolving Type-2 Classifier (an incremental interval type-2 neural fuzzy classifier)

Table 2 The number of terms selected from different thresholds applied in the TF-IDF calculation

Threshold (term weight) | Category 1 (business) | Category 2 (entertainment) | Category 3 (lifestyle) | Category 4 (politics) | Category 5 (sport) | Category 6 (technology)
0.007 | 12125 | 19800 | 20818 | 10261 | 4177 | 41
0.06 | 1671 | 1030 | 435 | 156 | 42 | 0
0.13 | 357 | 259 | 175 | 46 | 0 | 0
0.23 | 77 | 94 | 61 | 26 | 0 | 0
0.26 | 54 | 86 | 28 | 0 | 0 | 0
0.41 | 8 | 46 | 0 | 0 | 0 | 0

Table 3 Classification tasks for 2, 3, 4, 5, and 6 categories

No. | 2 categories | 3 categories | 4 categories | 5 categories | 6 categories
1 | [1,2] | [4,5,6] | [3,4,5,6] | [1,2,3,4,6] | [1,2,3,4,5,6]
2 | [1,3] | [3,5,6] | [1,2,4,6] | - | -
3 | [1,4] | [3,4,6] | [2,3,5,6] | - | -
4 | [1,5] | [3,4,5] | - | - | -
5 | [1,6] | [2,5,6] | - | - | -
6 | [2,3] | [2,4,6] | - | - | -
7 | [2,4] | [2,4,5] | - | - | -
8 | [2,5] | [2,3,6] | - | - | -
9 | [2,6] | [2,3,5] | - | - | -
10 | [3,4] | [2,3,4] | - | - | -
11 | [3,5] | [1,5,6] | - | - | -
12 | [3,6] | [1,4,6] | - | - | -
13 | [4,5] | [1,4,5] | - | - | -
14 | [4,6] | [1,3,6] | - | - | -
15 | [5,6] | [1,3,5] | - | - | -
16 | - | [1,3,4] | - | - | -
17 | - | [1,2,6] | - | - | -
18 | - | [1,2,5] | - | - | -
19 | - | [1,2,4] | - | - | -
20 | - | [1,2,3] | - | - | -

Table 4 Performance results (accuracy, running time, and number of rules) for the 2-category classification for every classifier in every classification task (15 tasks), sub-tasks 1-7

Average accuracy of the 2-category classification (50 permutations)

Classifier | [1,2] | [1,3] | [1,4] | [1,5] | [1,6] | [2,3] | [2,4]
eTS | 0.99±0.006 | 0.988±0.007 | 0.988±0.005 | 0.988±0.006 | 0.976±0.007 | 0.994±0.004 | 0.994±0.003
Simpl_eTS | 0.988±0.01 | 0.988±0.007 | 0.986±0.008 | 0.991±0.005 | 0.966±0.006 | 0.984±0.049 | 0.991±0.003
DFNN | 0.905±0.042 | 0.904±0.037 | 0.918±0.025 | 0.907±0.04 | 0.906±0.032 | 0.93±0.138 | 0.968±0.014
GDFNN | 0.933±0.014 | 0.931±0.015 | 0.929±0.015 | 0.941±0.011 | 0.919±0.019 | 0.988±0.011 | 0.983±0.013
FAOSPFNN | 0.949±0.028 | 0.943±0.033 | 0.905±0.067 | 0.947±0.033 | 0.94±0.044 | 0.951±0.087 | 0.97±0.029
ANFIS | 1±0 | 1±0 | 1±0 | 1±0 | 1±0 | 1±0 | 1±0
eT2Class | 0.993±0.01 | 0.988±0.02 | 0.961±0.01 | 0.994±0.01 | 0.929±0.02 | 0.991±0.01 | 0.973±0.01

Average running time (s) of the 2-category classification (50 permutations)

Classifier | [1,2] | [1,3] | [1,4] | [1,5] | [1,6] | [2,3] | [2,4]
eTS | 2.467±0.604 | 2.839±0.518 | 3.965±0.905 | 4.315±0.916 | 3.758±0.706 | 2.664±0.666 | 2.163±0.49
Simpl_eTS | 0.46±0.096 | 0.507±0.109 | 0.513±0.116 | 0.666±0.163 | 0.609±0.169 | 0.355±0.056 | 0.325±0.022
DFNN | 23.08±12.602 | 20.975±6.832 | 19.664±6.369 | 19.871±4.988 | 24.823±6.136 | 21.587±7.821 | 9.772±1.099
GDFNN | 14.447±2.965 | 14.851±2.153 | 13.695±1.659 | 15.802±3.84 | 18.024±2.777 | 15.806±2.52 | 19.978±5.855
FAOSPFNN | 20.186±25.40 | 54.813±89.37 | 24.587±19.45 | 16.285±13.00 | 46.032±88.48 | 10.947±12.73 | 4.753±5.39
ANFIS | 1.312±0.205 | 1.435±0.074 | 1.077±0.018 | 1.787±0.561 | 1.394±0.058 | 0.835±0.048 | 0.611±0.022
eT2Class | 3.11±0.1 | 2.22±0.2 | 4.16±0.08 | 2.01±0.02 | 5.13±0.07 | 3.16±0.01 | 3.23±0.01

Average number of rules generated for the 2-category classification (50 permutations)

Classifier | [1,2] | [1,3] | [1,4] | [1,5] | [1,6] | [2,3] | [2,4]
eTS | 6.22±1.418 | 5.76±1.709 | 5.52±1.515 | 5.84±1.742 | 5.74±1.352 | 5.98±1.348 | 5.5±1.344
Simpl_eTS | 1±0 | 1±0 | 1±0 | 1±0 | 1±0 | 1±0 | 1±0
DFNN | 43.16±7.206 | 33.82±5.749 | 32.86±4.807 | 42.84±8.632 | 48.54±6.175 | 31.38±7.123 | 24.72±5.361
GDFNN | 26.96±3.22 | 23.24±3.255 | 23.94±3.053 | 30.2±3.301 | 32.34±3.317 | 28.92±3.607 | 28.16±3.46
FAOSPFNN | 10.64±4.227 | 12.82±4.877 | 12.72±3.995 | 10.56±3.308 | 14.12±5.576 | 7.48±2.435 | 6.82±2.723
ANFIS | 2±0 | 2±0 | 2±0 | 2±0 | 2±0 | 2±0 | 2±0
eT2Class | 2.86±0.03 | 1.99±0.02 | 3.90±0.03 | 1.99±0.04 | 4.27±0.03 | 2.93±0.03 | 2.96±0.02

Table 5 Performance results (accuracy, running time, and number of rules) for the 2-category classification for every classifier in every classification task (15 tasks), sub-tasks 8-15

Average accuracy of the 2-category classification (50 permutations)

Classifier | [2,5] | [2,6] | [3,4] | [3,5] | [3,6] | [4,5] | [4,6] | [5,6]
eTS | 0.993±0.003 | 0.992±0.005 | 0.992±0.005 | 0.991±0.004 | 0.991±0.006 | 0.99±0.004 | 0.993±0.005 | 0.989±0.005
Simpl_eTS | 0.992±0.003 | 0.983±0.01 | 0.988±0.005 | 0.99±0.003 | 0.983±0.004 | 0.992±0.003 | 0.985±0.005 | 0.985±0.004
DFNN | 0.95±0.029 | 0.944±0.031 | 0.956±0.026 | 0.951±0.027 | 0.913±0.134 | 0.951±0.02 | 0.935±0.033 | 0.909±0.135
GDFNN | 0.987±0.01 | 0.973±0.018 | 0.976±0.014 | 0.984±0.007 | 0.969±0.013 | 0.979±0.013 | 0.968±0.013 | 0.966±0.018
FAOSPFNN | 0.969±0.026 | 0.955±0.015 | 0.959±0.085 | 0.973±0.018 | 0.935±0.084 | 0.961±0.037 | 0.92±0.132 | 0.954±0.022
ANFIS | 1±0 | 1±0 | 1±0 | 1±0 | 1±0 | 1±0 | 1±0 | 1±0
eT2Class | 0.998±0.01 | 0.994±0.02 | 0.963±0.02 | 0.998±0.02 | 1 | 1 | 0.993±0.01 | 0.996±0.02

Average running time (s) of the 2-category classification (50 permutations)

Classifier | [2,5] | [2,6] | [3,4] | [3,5] | [3,6] | [4,5] | [4,6] | [5,6]
eTS | 2.455±0.481 | 2.375±0.45 | 2.382±0.514 | 2.635±0.564 | 3.01±0.744 | 2.283±0.434 | 2.328±0.442 | 2.576±0.544
Simpl_eTS | 0.49±0.094 | 0.729±0.086 | 0.736±0.149 | 1.511±0.699 | 1.086±0.387 | 0.712±0.198 | 0.639±0.258 | 0.876±0.19
DFNN | 13.61±2.614 | 17.951±3.54 | 15.958±5.32 | 18.328±3.83 | 18.843±4.13 | 13.159±6.50 | 24.421±8.64 | 17.3±5.782
GDFNN | 20.829±5.65 | 20.189±5.37 | 19.305±3.10 | 22.308±6.3 | 20.643±2.25 | 14.474±1.48 | 16.498±1.52 | 14.236±1.44
FAOSPFNN | 6.086±5.001 | 11.241±8.62 | 12.902±28.6 | 11.463±12.3 | 22.646±17.7 | 6.147±4.756 | 14.854±11.4 | 28.659±85.7
ANFIS | 0.719±0.029 | 0.629±0.009 | 0.743±0.011 | 0.887±0.016 | 0.8±0.006 | 0.664±0.021 | 0.592±0.013 | 0.665±0.008
eT2Class | 3.16±0.01 | 3.13±0.02 | 2.14±0.03 | 2.12±0.02 | 4.27±0.2 | 4.18±0.05 | 3.45±0.02 | 3.05±0.01

Average number of rules generated for the 2-category classification (50 permutations)

Classifier | [2,5] | [2,6] | [3,4] | [3,5] | [3,6] | [4,5] | [4,6] | [5,6]
eTS | 5.6±1.4 | 6.26±1.549 | 5.54±1.343 | 4.82±1.19 | 6.12±1.547 | 5.34±1.222 | 5.96±1.616 | 6.1±1.607
Simpl_eTS | 1±0 | 1±0 | 1±0 | 1±0 | 1±0 | 1±0 | 1±0 | 1±0
DFNN | 36.68±11.94 | 46.62±9.24 | 17.98±7.19 | 29.42±9.639 | 46.28±8.33 | 30.64±7.312 | 49.58±8.669 | 46.24±8.46
GDFNN | 35.98±4.307 | 35.5±3.442 | 25.66±2.973 | 34±3.849 | 36.82±4.322 | 33.68±3.443 | 36.7±3.77 | 33.64±3.757
FAOSPFNN | 7.28±2.516 | 9.54±2.549 | 7.2±2.688 | 7.28±2.391 | 10.44±2.943 | 7.78±2.46 | 10.7±2.88 | 11.94±6.677
ANFIS | 2±0 | 2±0 | 2±0 | 2±0 | 2±0 | 2±0 | 2±0 | 2±0
eT2Class | 2.46±0.01 | 2.78±0.01 | 1.96±0.02 | 1.96±0.01 | 3.495±0.06 | 3.18±0.1 | 2.91±0.1 | 2.895±0.3

Table 6 Performance results (accuracy, running time, and number of rules) for the 3-category classification for every classifier in every classification task (20 tasks), sub-tasks 1-7

Average accuracy of the 3-category classification (50 permutations)

Classifier | [4,5,6] | [3,5,6] | [3,4,6] | [3,4,5] | [2,5,6] | [2,4,6] | [2,4,5]
eTS | 0.985±0.005 | 0.983±0.005 | 0.987±0.006 | 0.987±0.004 | 0.985±0.005 | 0.987±0.005 | 0.988±0.003
Simpl_eTS | 0.972±0.008 | 0.969±0.008 | 0.966±0.022 | 0.984±0.003 | 0.975±0.01 | 0.972±0.011 | 0.973±0.087
DFNN | 0.892±0.133 | 0.897±0.046 | 0.904±0.04 | 0.92±0.034 | 0.897±0.048 | 0.916±0.035 | 0.906±0.05
GDFNN | 0.98±0.007 | 0.977±0.009 | 0.975±0.011 | 0.988±0.004 | 0.981±0.008 | 0.981±0.01 | 0.992±0.003
FAOSPFNN | 0.917±0.072 | 0.923±0.037 | 0.908±0.048 | 0.948±0.04 | 0.909±0.126 | 0.904±0.08 | 0.952±0.024
ANFIS | 1±0 | 1±0 | 1±0 | 1±0 | 1±0 | 1±0 | 1±0
eT2Class | 0.981±0.01 | 0.964±0.04 | 0.971±0.02 | 0.981±0.1 | 0.987±0.01 | 1 | 0.988±0.08

Average running time (s) of the 3-category classification (50 permutations)

Classifier | [4,5,6] | [3,5,6] | [3,4,6] | [3,4,5] | [2,5,6] | [2,4,6] | [2,4,5]
eTS | 8.724±2.877 | 13.079±3.449 | 11.745±2.934 | 10.595±1.848 | 8.435±1.124 | 8.041±1.132 | 8.124±1.448
Simpl_eTS | 0.802±0.144 | 3.881±3.82 | 2.294±0.471 | 2.588±0.473 | 2.626±0.321 | 2.13±0.357 | 2.408±0.606
DFNN | 31.546±8.965 | 71.739±21.76 | 33.618±12.23 | 23.118±3.421 | 26.472±5.007 | 28.676±6.552 | 24.278±2.732
GDFNN | 33.016±2.866 | 40.374±5.874 | 43.138±8.891 | 45.762±21.66 | 35.444±2.845 | 35.199±2.72 | 35.786±10.75
FAOSPFNN | 101.895±188 | 95.39±109.92 | 94.615±73.26 | 59.905±41.83 | 75.947±91.73 | 81.715±94.58 | 61.953±65.45
ANFIS | 5.58±0.842 | 6.437±0.316 | 6.261±0.438 | 6.589±0.471 | 5.329±0.202 | 5.244±0.328 | 5.643±0.453
eT2Class | 3.635±0.08 | 2.925±0.05 | 3.805±0.05 | 2.005±0.4 | 3.62±0.35 | 2 | 4.89±0.04

Average number of rules generated for the 3-category classification (50 permutations)

Classifier | [4,5,6] | [3,5,6] | [3,4,6] | [3,4,5] | [2,5,6] | [2,4,6] | [2,4,5]
eTS | 7.38±1.76 | 7.62±1.725 | 7.5±1.799 | 6.42±1.372 | 7.46±1.809 | 7.6±1.414 | 6.6±1.841
Simpl_eTS | 1±0 | 1±0 | 1±0 | 1±0 | 1±0 | 1±0 | 1±0
DFNN | 67.06±12.27 | 69.98±6.745 | 60.74±6.983 | 48.7±7.967 | 69.52±13.159 | 62.38±6.064 | 58.42±6.658
GDFNN | 48.2±3.356 | 46.74±4.095 | 45.2±3.162 | 42.84±3.371 | 47.54±3.677 | 45.46±3.424 | 42.82±4.034
FAOSPFNN | 13.46±5.011 | 12.02±3.426 | 11.96±3.585 | 9.38±2.49 | 12.86±4.209 | 12.12±3.635 | 10.3±3.564
ANFIS | 3±0 | 3±0 | 3±0 | 3±0 | 3±0 | 3±0 | 3±0
eT2Class | 4.22±0.05 | 3.25±0.08 | 4.12±0.05 | 2.25±0.02 | 4.12±0.12 | 2.05±0.07 | 5.05±0.03

Table 7 Performance results (accuracy, running time, and number of rules) for the 3-category classification for every classifier in every classification task (20 tasks), sub-tasks 8-14

Average accuracy of the 3-category classification (50 permutations)

Classifier | [2,3,6] | [2,3,5] | [2,3,4] | [1,5,6] | [1,4,6] | [1,4,5] | [1,3,6]
eTS | 0.986±0.005 | 0.989±0.003 | 0.989±0.003 | 0.973±0.006 | 0.973±0.007 | 0.981±0.005 | 0.97±0.007
Simpl_eTS | 0.961±0.04 | 0.985±0.005 | 0.973±0.077 | 0.96±0.005 | 0.959±0.007 | 0.974±0.004 | 0.956±0.005
DFNN | 0.881±0.057 | 0.883±0.066 | 0.928±0.037 | 0.876±0.045 | 0.862±0.131 | 0.882±0.044 | 0.862±0.131
GDFNN | 0.979±0.011 | 0.992±0.003 | 0.992±0.005 | 0.94±0.017 | 0.935±0.016 | 0.952±0.008 | 0.935±0.019
FAOSPFNN | 0.921±0.053 | 0.953±0.024 | 0.947±0.046 | 0.895±0.056 | 0.88±0.125 | 0.922±0.038 | 0.895±0.051
ANFIS | 1±0 | 1±0 | 1±0 | 1±0 | 1±0 | 1±0 | 1±0
eT2Class | 1 | 0.988±0.09 | 0.984±0.05 | 0.963±0.06 | 1 | 0.979±0.02 | 0.967±0.07

Average running time (s) of the 3-category classification (50 permutations)

Classifier | [2,3,6] | [2,3,5] | [2,3,4] | [1,5,6] | [1,4,6] | [1,4,5] | [1,3,6]
eTS | 8.869±1.657 | 9.564±1.859 | 8.775±1.941 | 11.098±1.365 | 10.532±1.483 | 10.314±1.476 | 10.66±1.629
Simpl_eTS | 2.222±0.445 | 2.552±0.448 | 2.568±0.258 | 3.426±0.404 | 3.047±0.361 | 6.353±1.427 | 5.958±0.945
DFNN | 33.422±5.553 | 38.827±7.912 | 29.778±7.795 | 29.407±3.883 | 26.101±4.734 | 49.811±18.98 | 41.299±11.52
GDFNN | 40.032±3.561 | 40.675±7.691 | 36.556±5.748 | 30.588±3.342 | 30.672±3.666 | 29.977±6.843 | 39.126±8.148
FAOSPFNN | 107.792±177 | 60.305±57.30 | 54.115±48.14 | 417.71±1358 | 275.2±904.36 | 77.494±78.05 | 437.5±590.91
ANFIS | 7.458±0.603 | 8.022±0.566 | 7.001±0.794 | 8.436±0.177 | 8.335±0.903 | 8.519±0.436 | 11.928±0.89
eT2Class | 2.98±0.08 | 3.205±0.06 | 3.69±0.5 | 2.02±0.34 | 2.335±0.06 | 2.94±0.08 | 2.91±0.05

Average number of rules generated for the 3-category classification (50 permutations)

Classifier | [2,3,6] | [2,3,5] | [2,3,4] | [1,5,6] | [1,4,6] | [1,4,5] | [1,3,6]
eTS | 7.6±2.119 | 7.22±1.595 | 7.48±1.717 | 7.56±1.786 | 7.06±1.889 | 6.42±1.785 | 7.2±1.906
Simpl_eTS | 1±0 | 1±0 | 1±0 | 1±0 | 1±0 | 1±0 | 1±0
DFNN | 68.26±8.815 | 64.56±11.838 | 53.02±6.073 | 72.22±7.985 | 61.64±11.878 | 57.12±9.886 | 60±11.925
GDFNN | 45.32±3.582 | 44.86±3.435 | 40.58±3.934 | 44.56±4.674 | 41±4.286 | 37.12±3.931 | 40.42±3.592
FAOSPFNN | 12.3±4.042 | 9.76±3.242 | 9.58±3.233 | 17.64±9.45 | 16.52±9.407 | 11.24±3.166 | 17.98±7.124
ANFIS | 3±0 | 3±0 | 3±0 | 3±0 | 3±0 | 3±0 | 3±0
eT2Class | 3.32±0.12 | 5.22±0.03 | 4.05±0.06 | 2.35±0.08 | 2.75±0.05 | 3.32±0.02 | 3.12±0.03

Table 8 Performance results (accuracy, running time, and number of rules) for the 3-category classification for every classifier in every classification task (20 tasks), sub-tasks 15-20

Average accuracy of the 3-category classification (50 permutations)

Classifier | [1,3,5] | [1,3,4] | [1,2,6] | [1,2,5] | [1,2,4] | [1,2,3]
eTS | 0.981±0.006 | 0.979±0.008 | 0.975±0.006 | 0.982±0.005 | 0.982±0.006 | 0.984±0.006
Simpl_eTS | 0.973±0.005 | 0.971±0.006 | 0.96±0.008 | 0.964±0.098 | 0.977±0.008 | 0.959±0.092
DFNN | 0.876±0.072 | 0.882±0.046 | 0.882±0.05 | 0.861±0.076 | 0.896±0.038 | 0.869±0.056
GDFNN | 0.949±0.009 | 0.943±0.011 | 0.939±0.016 | 0.952±0.009 | 0.949±0.008 | 0.949±0.01
FAOSPFNN | 0.926±0.069 | 0.907±0.063 | 0.859±0.099 | 0.923±0.056 | 0.91±0.045 | 0.92±0.041
ANFIS | 1±0 | 1±0 | 1±0 | 1±0 | 1±0 | 1±0
eT2Class | 0.973±0.2 | 1 | 0.979±0.07 | 0.980±0.05 | 0.983±0.11 | 0.971±0.02

Average running time (s) of the 3-category classification (50 permutations)

Classifier | [1,3,5] | [1,3,4] | [1,2,6] | [1,2,5] | [1,2,4] | [1,2,3]
eTS | 10.75±1.261 | 11.004±1.502 | 10.358±1.048 | 10.858±1.679 | 9.562±1.501 | 10.024±1.55
Simpl_eTS | 8.492±1.989 | 7.047±1.192 | 8.321±1.825 | 6.17±2.164 | 6.442±2.898 | 6.945±2.43
DFNN | 38.477±9.474 | 35.14±13.429 | 32.771±11.64 | 56.724±29.83 | 28.175±6.014 | 47.239±25.467
GDFNN | 34.015±5.377 | 52.988±23.16 | 52.215±17.54 | 32.766±6.028 | 32.744±5.249 | 39.09±13.091
FAOSPFNN | 113.113±134 | 149.911±123 | 455.01±1073 | 86.545±95.51 | 108.519±88.5 | 166.891±179.691
ANFIS | 11.686±0.704 | 10.714±0.341 | 9.907±1.312 | 13.901±1.859 | 10.489±0.309 | 12.84±1.886
eT2Class | 4.64±0.06 | 2.94±0.05 | 1.99±0.03 | 3.395±0.02 | 3.315±0.05 | 2.8±0.01

Average number of rules generated for the 3-category classification (50 permutations)

Classifier | [1,3,5] | [1,3,4] | [1,2,6] | [1,2,5] | [1,2,4] | [1,2,3]
eTS | 6.64±1.411 | 7.22±1.682 | 6.96±1.442 | 6.96±2.128 | 6.98±1.985 | 7±2.01
Simpl_eTS | 1±0 | 1±0 | 1±0 | 1±0 | 1±0 | 1±0
DFNN | 58.32±9.859 | 51.38±7.442 | 67.38±6.593 | 66.1±7.872 | 60.12±8.138 | 60.4±8.094
GDFNN | 37.94±3.841 | 35.12±4.029 | 42.84±3.683 | 38.82±4.327 | 37.02±4.172 | 35.84±3.733
FAOSPFNN | 11.16±3.904 | 13.28±3.314 | 19.48±9.29 | 11.36±3.658 | 12.96±3.817 | 13.4±4.305
ANFIS | 3±0 | 3±0 | 3±0 | 3±0 | 3±0 | 3±0
eT2Class | 5.21±0.05 | 3.12±0.06 | 2.05±0.03 | 4.12±0.03 | 4.12±0.01 | 3.05±0.03

Table 9 Performance results (accuracy, running time, and number of rules) for the 4-category classification for every classifier in the 3 classification tasks

Average accuracy of the 4-category classification (50 permutations)

Classifier | [2,3,5,6] | [3,4,5,6] | [1,2,4,6]
eTS | 0.982±0.004 | 0.98±0.005 | 0.971±0.007
Simpl_eTS | 0.954±0.056 | 0.965±0.01 | 0.954±0.007
DFNN | 0.857±0.061 | 0.835±0.18 | 0.856±0.133
GDFNN | 0.983±0.008 | 0.98±0.009 | 0.943±0.014
FAOSPFNN | 0.887±0.134 | 0.877±0.109 | 0.839±0.14
ANFIS | 1±0 | 1±0 | 1±0
eT2Class | 0.9699±0.04 | 0.905±0.12 | 0.955±0.08

Average running time (s) of the 4-category classification (50 permutations)

Classifier | [2,3,5,6] | [3,4,5,6] | [1,2,4,6]
eTS | 19.323±5.125 | 20.978±7.906 | 17.237±3.558
Simpl_eTS | 6.068±1.31 | 3.176±0.734 | 6.235±0.152
DFNN | 48.773±5.521 | 39.65±8.319 | 59.512±19.055
GDFNN | 96.972±23.445 | 160.848±59.587 | 67.676±6.178
FAOSPFNN | 614.556±2784.931 | 444.648±349.816 | 979.542±1438.856
ANFIS | 35.869±1.398 | 34.675±1.297 | 41.948±1.447
eT2Class | 2.605±0.01 | 2.985±0.06 | 2.6±0.02

Average number of rules generated for the 4-category classification (50 permutations)

Classifier | [2,3,5,6] | [3,4,5,6] | [1,2,4,6]
eTS | 8.6±2.483 | 8.46±1.992 | 7.84±2.271
Simpl_eTS | 1±0 | 1±0 | 1±0
DFNN | 89.08±11.56 | 77.48±18.618 | 81.14±14.175
GDFNN | 85.86±5.507 | 87.98±5.798 | 77.84±5.797
FAOSPFNN | 13.42±8.079 | 15.04±3.907 | 18.64±6.945
ANFIS | 4±0 | 4±0 | 4±0
eT2Class | 3.335±0.08 | 3.456±0.02 | 3.213±0.12

Table 10 Performance results (accuracy, running time, and number of rules) for the 5-category classification for every classifier in 1 classification task

Average accuracy, running time (s), and number of rules for the 5-category classification (50 permutations), classification task [1,2,3,4,6]

Classifier | Accuracy | Running time (s) | Number of rules
eTS | 0.924±0.014 | 67.357±1785 | 6.92±1.412
Simpl_eTS | 0.959±0.012 | 8.198±1.435 | 1±0
DFNN | 0.848±0.066 | 64.526±6.163 | 90.84±8.235
GDFNN | 0.95±0.012 | 120.362±15.668 | 95.76±5.192
FAOSPFNN | 0.805±0.177 | 1453.357±1508.906 | 16.76±5.52
ANFIS | 1±0 | 156.521±8.4 | 5±0
eT2Class | 0.977±0.06 | 2.025±0.01 | 2.23±0.06

Table 11 Performance results (accuracy, running time, and number of rules) for the 6-category classification for every classifier in 1 classification task

Average accuracy, running time (s), and number of rules for the 6-category classification (50 permutations), classification task [1,2,3,4,5,6]

Classifier | Accuracy | Running time (s) | Number of rules
eTS | 0.884±0.044 | 99.5857±30.7341 | 7.12±1.662
Simpl_eTS | 0.949±0.009 | 17.636±8.64 | 1±0
DFNN | 0.787±0.135 | 88.382±13.589 | 103.34±18.019
GDFNN | 0.956±0.01 | 177.078±15.965 | 116.74±6.131
FAOSPFNN | 0.856±0.082 | 804.411±1236.232 | 12.98±3.485
ANFIS | 1±0 | 488.672±130.646 | 6±0
eT2Class | 0.964±0.07 | 2.27±0.8 | 2.13±0.36