Towards a Granular Computing Approach based on FCA for Discovering Periodicities in Data

Vincenzo Loia, Francesco Orciuoli, Witold Pedrycz

To appear in: Knowledge-Based Systems
PII: S0950-7051(18)30048-0
DOI: 10.1016/j.knosys.2018.01.032
Reference: KNOSYS 4205
Received date: 11 September 2017; Revised date: 24 January 2018; Accepted date: 29 January 2018

Please cite this article as: Vincenzo Loia, Francesco Orciuoli, Witold Pedrycz, Towards a Granular Computing Approach based on FCA for Discovering Periodicities in Data, Knowledge-Based Systems (2018), doi: 10.1016/j.knosys.2018.01.032


Highlights

• Granular Computing (GrC) for discovering event periodicities in temporal data. GrC is a paradigm useful to describe and analyse data at different levels of abstraction; data can therefore be exploited to create information granules, starting from both the selection of a given time unit and the construction of a set of time slots in a time interval of interest decomposed into periodic segments.

• Formal Concept Analysis (FCA) with time-related attributes is used to realize granulations of data with respect to periodic time slots represented as specific attributes, called temporal attributes of the formal context. Granules including temporal attributes are used to discover periodic occurrences and co-occurrences.

• A set of measures, i.e., Information Granulation (IG), Information Entropy (IE), Separation (SEP), Coverage (COV) and Specificity (SP), can be used to assess a granulation and the resulting granules according to their capability to elicit useful knowledge related to periodic occurrences and co-occurrences. In brief, these measures help us discover relevant temporal occurrences and co-occurrences by guiding us across multiple granulations and, within a specific granulation, by identifying the granules providing more interesting and/or unique knowledge.

• The work provides both an illustrative example and a case study based on a dataset of forest fires that occurred in the Montesinho natural park (Portugal).

• The main original aspect of the proposed approach is the definition of a novel time-guided granulation approach, including the contextualization of a set of measures useful to assess the quality of granulations with respect to the interestingness and uniqueness of the discovered knowledge related to the periodicity of occurrences and co-occurrences of events. Unlike existing works, the proposed approach helps the human operator find the granularities (along the time dimension) that provide interesting and unique knowledge.

Towards a Granular Computing Approach based on FCA for Discovering Periodicities in Data

Vincenzo Loia (c), Francesco Orciuoli (c), Witold Pedrycz (a,b)

(a) Department of Electrical and Computer Engineering, University of Alberta, Edmonton, AB T6R 2G7, Canada
(b) Systems Research Institute, Polish Academy of Sciences, 01-224 Warsaw, Poland
(c) Dipartimento di Scienze Aziendali - Management & Innovation Systems, University of Salerno, Via Giovanni Paolo II, 132 - 84084 Fisciano (SA), Italy

Abstract

Studying aspects related to the occurrences and co-occurrences of events enables many interesting applications in several domains, such as Public Safety and Security. In Digital Forensics, for instance, it is useful to construct the timeline of a suspect, reconstructed by analysing social networking applications like Facebook and Twitter. One of the main limitations of the existing data analysis techniques addressing these issues is that they work on a single view of the data and, thus, may miss the elicitation of interesting knowledge. This limitation can be overcome by considering more views and applying methods to assess such views, allowing human operators to move from one view to a more suitable one. This paper focuses on the temporal aspects of data and proposes an approach based on Granular Computing to build multiple time-related views and to interpret the extracted knowledge concerning the periodic occurrences of events. The proposed approach adopts Formal Concept Analysis (with time-related attributes) as an algorithm to realize granulations of data and defines a set of Granular Computing measures to interpret the formal concepts, whose extensional parts are formed by co-occurring events, in the lattices constructed by such an algorithm. The applicability of the approach is demonstrated by providing a case study concerning a public dataset on forest fires that occurred in the Montesinho natural park in Portugal.

Keywords: Formal Concept Analysis, Temporal Data, Periodicity, Knowledge Discovery, Granular Computing

1. Introduction

According to [1], an event is something that happens at a certain time and place, involving actors, objectives, and action features whose statuses change. The notions of occurrence and co-occurrence of events can be found in [2], where the authors provide definitions of basic co-occurrence, temporal co-occurrence, spatial co-occurrence and spatio-temporal co-occurrence. In particular, a temporal co-occurrence between two events emerges when the absolute value of the difference between the timestamps of such events is lower than a given threshold. Moreover, a spatial co-occurrence emerges when two events are co-located, i.e., they both occur in the same space (place). Lastly, a spatio-temporal co-occurrence deals with events occurring at the same time in the same place. Also in this case, the concepts of same time and same space should be formally defined. A periodic event is an event occurring periodically, e.g., an event occurring every Monday, every month or every weekend. In a wide range of application domains, it is important to deal (discovering, studying, analysing, etc.) with occurrences, co-occurrences and also with periodicities of events. For instance, in Digital Forensics, the timeline of a suspect or a group of suspects can be enriched by eliciting hidden relationships among persons in a group who are in the same place at the same moment; furthermore, it is possible to discover a behavioural pattern of a person (or of a group of persons) by considering the sequence of his/her periodic actions. In a completely different domain like Smart Cities, it is possible to discover periodicities in phenomena such as traffic jams in order to plan early and avoid hardships for citizens. In this context, the present work provides an approach grounded on Granular Computing (GrC) for discovering event periodicities in temporal data. GrC is a paradigm useful to describe and analyse data at different levels of abstraction; data can therefore be exploited to create information granules, starting from both the selection of a given time unit and the construction of a set of time slots in a time interval of interest decomposed into periodic segments.

The paper is organized as follows. Section 2 describes the motivations for the present work and the originality of its results with respect to related works. Section 3 provides basic notations, definitions, and the formulation of the problem. Section 4 provides an overall description of the proposed approach, the definition and contextualization of the measures related to granulations, some background knowledge on Formal Concept Analysis (FCA) and GrC, and a discussion on spatio-temporal occurrences and co-occurrences. Section 5 provides some illustrative examples recalling the main aspects of the proposed approach. Section 6 provides a case study demonstrating the key features of the approach and, lastly, Section 7 offers some final remarks and future works.

2. Motivations and Related Works


Discovering periodic occurrences or co-occurrences is an important result that can be obtained by applying data analysis techniques. One of the most interesting challenges, with respect to the aforementioned issue, is to find suitable views [3] from which the observed data must be analysed. For instance, in a dataset of ten events, discovering seven events always occurring on Saturday mornings could be more interesting than discovering nine events always occurring on weekends. This aspect depends on both the data and the objectives. In general, observing data with an erroneous view leads to the incapacity of discovering interesting knowledge. In this work, such views are defined by selecting temporal intervals and modifying time units in a way that allows a kind of temporal zoom-in/zoom-out operation on data. Therefore, given that the most useful views depend on both the available data and the objectives of the analysis, a computational approach is needed to support human operators in effectively and efficiently evaluating different views on data in order to discover interesting periodic temporal occurrences and co-occurrences of events. The proposed approach uses Formal Concept Analysis (FCA) with time-related attributes to realize granulations of data with respect to periodic time slots represented as specific attributes, called temporal attributes of the formal context. Granules including temporal attributes are used to discover periodic occurrences and co-occurrences. Moreover, the paper discusses a set of measures, i.e., Information Granulation (IG), Information Entropy (IE), Separation (SEP), Coverage (COV) and Specificity (SP), which can be used to assess a granulation and the resulting granules according to their capability to elicit useful knowledge related to periodic occurrences and co-occurrences. In brief, these measures help us discover relevant temporal occurrences and co-occurrences by guiding us across multiple granulations and, within a specific granulation, by identifying the granules providing more interesting and/or unique knowledge.

The application of FCA (or other concept learning methods) to implement data analysis approaches based on the paradigm of GrC has already been discussed in several works in the specialized literature [4, 5, 6]. With respect to the aforementioned works, this paper introduces an approach to granulate data along the time dimension by using FCA (borrowing the idea of Temporal Concept Analysis, already described in works like [7]) to learn granules/concepts representing temporal periodicities of events in data. Thus, the main original aspect of the paper is the definition of a novel time-guided granulation approach including a framework for assessing the quality (interestingness and uniqueness) of the discovered knowledge (periodic occurrences and/or co-occurrences of events). Unlike existing works such as [2], the proposed approach helps the human operator find the granularities (along the time dimension) that provide interesting and unique knowledge.

3. Periodic Occurrences and Co-occurrences of Events

An event can be represented by a tuple of features including, at least, a timestamp. According to [8], it is possible to assume different timelines, based on their specific time unit, to represent the time information associated with events and their patterns (occurrences, co-occurrences, periodicity, etc.). For instance, it is possible to consider a timeline based on days (the time unit is the day, Tday) or based on months (the time unit is the month, Tmonth) and so on. Different timelines provide a first type of temporal granularity and, in general, let different knowledge emerge. A time slot is a sequence of time units within a timeline. For instance, if we are considering the timeline Tday, each day could correspond to a time slot, or it is possible to group more days into a single time slot. Following these criteria we obtain a more or less detailed representation of the timeline. Based on a given time unit and a specific definition of time slots, it is possible to introduce the segment. Segments are used to model periodicities. Given some t > 0, if we have a total of n time slots, such slots are grouped into ⌈n/t⌉ segments, such that each segment consists of t consecutive slots and the k-th segment (k ∈ {0, 1, ..., ⌈n/t⌉ − 1}), denoted by Ik, consists of the t slots from slot (t · k) to slot (t · k + t − 1). In this sense, t represents the number of parts (consecutive and non-overlapping time intervals) a segment is decomposed in. The set Ik[i] is the set of all events occurring in the i-th slot (i ∈ {0, 1, ..., t − 1}) of the k-th segment. Note that it is possible to map a given event onto different timelines by using its timestamp. With respect to this segmentation, an event v is said to periodically occur if there exists i ∈ {0, 1, ..., t − 1} such that v is found in the i-th slot of every segment (strict periodicity) or in most segments (relaxed periodicity). Moreover, two events v1 and v2 periodically co-occur if there exists i ∈ {0, 1, ..., t − 1} such that v1 and v2 are found in the i-th slot of every segment or in most segments. Lastly, a time interval of interest (e.g., 3 years, from 1999-03-01 to 2002-03-31) is the whole time window in which the events of interest fall.

Let us present an example. Assume a time interval of interest going from 2018-01-08 to 2018-01-21 and the following set of events with their associated timestamps (the information on the time at which each event occurred): (e1, 2018-01-08), (e2, 2018-01-09), (e3, 2018-01-13), (e4, 2018-01-15), (e5, 2018-01-20) and (e6, 2018-01-21). Assume also that we choose a time unit equal to an individual day (i.e., Tday) and a segment equal to seven days (i.e., a week). Now, it is possible to decompose the week into three parts and define three time slots: time slot 0 represented by ⟨Monday, Tuesday⟩, time slot 1 represented by ⟨Wednesday, Thursday⟩ and time slot 2 represented by ⟨Friday, Saturday, Sunday⟩. Thus it is possible to affirm that events e1 and e2 co-occur in time slot 0 of the first segment, e4 occurs in time slot 0 of the second segment, e3 occurs in time slot 2 of the first segment and e5 and e6 co-occur in time slot 2 of the second segment. In other words: I0[0] = {e1, e2}, I1[0] = {e4}, I0[1] = {}, I1[1] = {}, I0[2] = {e3} and I1[2] = {e5, e6}. Lastly, it is possible to affirm that e1, e2 and e4 periodically co-occur in time slot 0 and e3, e5 and e6 periodically co-occur in time slot 2.
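The segmentation just described can be made concrete with a short sketch. The following Python fragment is only illustrative (it is not part of the paper); names such as slot_of and slot_lengths are assumptions introduced here. It maps the example events onto (segment, slot) pairs and reports the slots that are populated in every segment, i.e., strict periodic (co-)occurrences.

```python
# Illustrative sketch: two one-week segments starting 2018-01-08, decomposed
# into the periodic slots {Mon,Tue}, {Wed,Thu}, {Fri,Sat,Sun}.
from datetime import date

interval_start = date(2018, 1, 8)   # start of the time interval of interest
segment_length = 7                  # segment length t, in time units (days)
slot_lengths = [2, 2, 3]            # length of each periodic slot; sums to 7
n_segments = 2

def slot_of(day_offset):
    """Index of the periodic slot containing a 0-based offset within a segment."""
    bound = 0
    for i, length in enumerate(slot_lengths):
        bound += length
        if day_offset < bound:
            return i
    raise ValueError("offset exceeds segment length")

events = {"e1": date(2018, 1, 8), "e2": date(2018, 1, 9), "e3": date(2018, 1, 13),
          "e4": date(2018, 1, 15), "e5": date(2018, 1, 20), "e6": date(2018, 1, 21)}

# I[(k, i)]: events occurring in slot i of segment k
I = {}
for name, ts in events.items():
    k, within = divmod((ts - interval_start).days, segment_length)
    I.setdefault((k, slot_of(within)), set()).add(name)

for i in range(len(slot_lengths)):
    per_segment = [I.get((k, i), set()) for k in range(n_segments)]
    if all(per_segment):  # slot i populated in every segment -> strict periodicity
        print(f"slot {i}:", sorted(set.union(*per_segment)))
```

Running the sketch prints slot 0 with {e1, e2, e4} and slot 2 with {e3, e5, e6}, matching the discussion above.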

4. A GrC-based Approach to Discover Temporal Periodicity in Data

The approach is based on the paradigm of Granular Computing (GrC) [9]: a human operator establishes the time-related parameters (time unit, segment, periodic time slots, time interval of interest) of the view he/she would like to use in order to analyse the dataset. Such a view is used to pre-process the dataset and obtain a formal context, which enables the work of one of the algorithms for generating formal concepts and constructing formal concept lattices (in brief, FCA algorithms [10]); these can be used to realize granulation and produce a granulated view of the universe [11]. This is what we call time-guided granulation. After a first granulation, accomplished by considering the aforementioned formal context to obtain a lattice/granulation (time-guided granulation) and observe the events from the given view, the human operator assesses the lattice/granulation by using the proposed measures and decides whether to go ahead or to go back and adjust or change the considered view by modifying some time-related parameters. In this way he/she searches for suitable views. Next, given a granulation, he/she can evaluate concepts/granules to discover strict or relaxed periodicities with respect to significant event occurrences or co-occurrences. The overall approach is depicted in Fig. 1 and follows a schema similar to that adopted in [12] for Cognitive Cities.

Figure 1: Graphical representation of the overall approach.

4.1. Granular Computing

Intelligent data analysis can be accomplished by using a plethora of existing approaches for data mining, machine learning, knowledge discovery, and statistics. The main drawback of such approaches is that they consider a specific view of the data when being applied. Often, it is convenient to observe and study data from multiple views. This consideration leads to multiview intelligent data analysis, which has been introduced to explore different types of knowledge, different features of data, and different interpretations of data [3, 13]. In this context, Granular Computing has emerged as a multidisciplinary paradigm for problem solving and information processing [14, 15] that provides a general, systematic and natural way to analyze, understand, represent, and solve real-world problems. By adopting such a paradigm it is possible to realize multiview intelligent data analysis. Granulation and granular structures are the two fundamental issues in granular computing. Different types of granulations and granular structures reflect multiple aspects of data. Granulation is related to the process of constructing granules, whereas the granular structure is related to the relationships among the granules. Different granular structures could describe different characteristics of the data or of the knowledge embedded in the data [3]. Furthermore, granulation of a universe of discourse (of available elements described by data) involves the grouping of individual elements into classes that decompose the set of all elements into granules. The relationships that keep together different elements in a granule could be indistinguishability, similarity, proximity or functionality [15]. Therefore, a granulation provides a granulated view of the universe. Potentially, we can realize many granulated views of the same universe. Fig. 1 (the image on the right) also provides a graphical sketch depicting three different time-guided granulations of the same universe of events.

4.2. Formal Concept Analysis with Time-related Attributes

Formal Concept Analysis (FCA) [16] is a method for data analysis, knowledge representation and information management [17]. In this paper, FCA is adopted to realize granulation over a set of events/objects (the universe) [18] in order to investigate temporal phenomena [7]. Let us introduce basic definitions in the area of FCA. A formal context is defined as a set structure K = (G, M, I) for which G and M are sets while I is a binary relation between G and M, i.e., I ⊆ G × M. G is a set of objects, M is a set of attributes and gIm, i.e., (g, m) ∈ I, states that the object g has the attribute m. For constructing formal concepts from a formal context, two derivation operators are needed. Given A ⊆ G and B ⊆ M, it is possible to define the following two operators:

A^I = {m ∈ M | gIm for all g ∈ A},
B^I = {g ∈ G | gIm for all m ∈ B}.

The derivation operators generate, respectively, the set of all attributes held by a given set of objects, and the set of all objects holding a given set of attributes.

A formal concept of a formal context is defined as a pair (A, B) with A ⊆ G, B ⊆ M, B = A^I and A = B^I. A and B are called the extent and the intent of the formal concept (A, B). Moreover, the subconcept-superconcept relation is defined as:

(A1, B1) ≤ (A2, B2) ⇔ A1 ⊆ A2 (⇔ B1 ⊇ B2),

where (A1, B1) and (A2, B2) are two formal concepts of the formal context K = (G, M, I), (A1, B1) is a subconcept of (A2, B2) and (A2, B2) is a superconcept of (A1, B1). Starting from the derivation operators and their properties, it is possible to define algorithms that construct the formal concepts of a formal context and represent the extracted knowledge as a lattice [19].

195

reconstruct and represent objects, segments, events of the surrounding world. If objects in formal contexts come with information related to time, it is possible to apply FCA to realize time-related knowledge discovery and representation [7, 21, 22]. The resulting formal concepts (A, B) are information granules. Among all the constructed information granules, those having a B ∩ D 6= ∅

M

200

are called timed information granules (TIGs). TIGs are interpreted as periodic

ED

occurrencese or co-occurrences of events. In order to semantically interpret a TIG it is needed to consider the formal concept (A, B) representing the TIG, the temporal attributes D included in B, the other attributes included in B that characterize the events in A and all the time-related parameters used to

PT

205

realize the granulation.

CE

Therefore, we further specify the definition of formal context by considering D ⊆ M , where D = {d1 , d2 , ..., dk }. The generic element di is an attribute representing a discrete time information. Definitely, injecting time information, that is extending the set of attributes, in a formal context, with time-related

AC

210

attributes, is a plausible approach to extract time-enriched concepts from data and represent them by means of a well-known knowledge structure (the lattice) [7]. Let us describe a short example. If we consider information related to the daily presence of two employees (e1 and e2) in one of the existing sites

215

(Rome, M ilan) of their company, we have the following set of attributes: M = 11

ACCEPTED MANUSCRIPT

{Rome, M ilan, M onday, T uesday, W ednesday, T hursday, F riday} and D = {M onday, T uesday, W ednesday, T hursday, F riday}. By applying FCA it is possible to obtain formal concepts organized into the lattice shown in Fig. 2

220

CR IP T

(obtained from the formal context in Table 1), where it is possible to understand, by analysing formal concepts cointaining information for both e1 and e2, that

e1 periodically (strict periodicity) is at the Rome site every Monday, and e2

periodically (relaxed periodicity) is at the Milan site on Tuesday. Moreover,

the presence of e1 and e2 co-occur (relaxed periodicity) at Rome on Monday. Note that it is possible to read, from the lattice, the attributes simultaneously held by the aforementioned objects by exploring the attributes attached to all

AN US

225

nodes within all paths conducting (from the top) to the node containing such objects. Moreover, e1 1, e1 2, ..., e2 1, ... represent the set of observations for

x

x

e1 2

x

x

e1 3

x

x

x

x

e1 4 e2 1

x

x

PT

e2 2

ED

e1 1

M

Ro me Mi lan Mo nd ay Tu sda We y d Th nesd urs ay Fri da da y y

the employees e1 and e2.

x

e2 4

x

x

x x

CE

e2 3

x

Table 1: Sample formal context.

AC

230

Figure 2: Sample formal lattice.

4.3. Preparing Formal Contexts to support Time-guided Granulation

Granulation needs to start from data, thus the original dataset to be analysed (e.g., a log file), also called raw data, is typically structured as a set of rows where each row includes information about an event and the time at which it occurs. By using a conceptual scaling approach, the raw data are transformed into a formal context (G, M, I) where the events (or entities) are the objects belonging to G, described by means of the attributes in M. Periodic time slots are represented by the subset of attributes D ⊆ M, namely the temporal attributes. In more detail, we have to select: i) a time interval of interest Ω, excluding from the raw data the information related to events that do not occur in such an interval; ii) the time unit to consider (e.g., Tday, Tweek); and iii) the number and lengths (in terms of time units) of the time slots, based on the length t (measured in time units) of the segments the timeline is decomposed into. By using the above parameters it is possible to define the set of periodic time slots {S1, S2, ..., Sk}, where the length of Si is given by the function len(Si) and the slot lengths sum to the whole length of a segment, i.e., Σ_{i=1}^{k} len(Si) = t.

Subsequently, given the above information, we need a function fT that maps an event timestamp onto a periodic time slot. Being g an event, fT(g) is equal to i, with i = 1, ..., k, if the timestamp of g, namely τ(g), falls in the defined time interval Ω and falls in exactly one element of the set {S1, S2, ..., Sk}; otherwise it is equal to −1. Starting from fT it is possible to conceptually scale the raw data into a formal context (G, M, I). In particular, we are interested in focusing on how to define the set D ⊆ M of temporal attributes. The set D includes the attributes d1, d2, ..., dk such that gIdi if and only if g falls in Si, i.e., fT(g) = i. Thus, by using fT it is possible to fill the part of the formal context related to the temporal attributes in D.

Figure 3: Mapping a time interval onto a formal context.

Fig. 3 graphically shows an example of how original data can be pre-processed to build a formal context. In particular, the figure provides insights into the transformation from time slots to time-related attributes. Thus, the attribute Tuesday, in the formal context, represents the union of all the Tuesday slots, i.e., one for each segment in the time interval of interest. In brief, each attribute in D represents a periodic time slot. Note that, after the conceptual scaling operations, it could be useful to execute attribute reduction operations [23, 24] to improve the performance of the FCA algorithms.
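A minimal sketch of this scaling step follows. It is illustrative only: the function name f_T mirrors the definition above, while omega, slot_lengths and the event names are assumptions introduced here.

```python
# Illustrative sketch of conceptual scaling along the time dimension: f_T maps a
# timestamp onto one of the periodic slots S_1..S_k (or -1 outside Ω), and the
# result fills the temporal attributes d_1..d_k of the formal context.
from datetime import date

omega = (date(2018, 1, 8), date(2018, 1, 21))  # time interval of interest Ω
segment_length = 7                              # segment length t, in time units
slot_lengths = [2, 2, 3]                        # len(S_1), ..., len(S_k); sum is t

def f_T(timestamp):
    if not (omega[0] <= timestamp <= omega[1]):
        return -1
    within = (timestamp - omega[0]).days % segment_length
    bound = 0
    for i, length in enumerate(slot_lengths, start=1):
        bound += length
        if within < bound:
            return i
    return -1

# Object g holds the temporal attribute d_i iff f_T(g) = i.
raw_events = {"e1": date(2018, 1, 8), "e3": date(2018, 1, 13), "ex": date(2018, 3, 1)}
temporal_part = {g: {f"d{f_T(ts)}"} for g, ts in raw_events.items() if f_T(ts) != -1}
print(temporal_part)   # {'e1': {'d1'}, 'e3': {'d3'}}  ('ex' is outside Ω and dropped)
```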

4.4. Assessing Granulations

A problem related to the interpretation of concept lattices is often caused by the fact that the number and the structure (intensional and extensional parts) of the extracted formal concepts are not satisfactory. A large number of formal concepts, corresponding to the information granules, provides an overly fine granulation of the input objects. On the other hand, a small number of formal concepts (information granules) may not provide sufficient information about the input objects for the decision makers. Therefore, adjusting the granularity level of the attributes in order to acquire suitable knowledge from the corresponding formal concepts becomes an important issue [4]. We need an approach to assess granulations in order to evaluate the knowledge provided by granules. Such an approach can be based on one or more of the existing measures introduced below.

Given the formal context K(G, M, I) and the corresponding lattice L(G, M, I), a first pair of measures [25, 26] with which it is possible to evaluate a lattice/granulation is represented by the Information Entropy (IE):

IE(L) = (1/|G|) · Σ_{g∈G} ( 1 − |g^{II}| / |G| )    (1)

and the Information Granulation (IG): IG = 1 − IE. IG (and IE) provides information on the granules within a lattice by taking into account the number of objects (extensional part) included in each granule. For each pair of formal contexts K1(G1, M, I1) and K2(G2, M, I2), if IG(L1) > IG(L2) then K1 provides a granulation that is more interesting than that of K2 with respect to our aim of finding co-occurrences. This is intuitive because L1 provides bigger granules, i.e., granules with more objects, while the set of attributes is the same, i.e., the time slots have the same lengths. Moreover, consider two formal contexts K1(G1, M1, I1) and K2(G2, M2, I2) where: i) M1 = M ∪ {a}, M2 = M ∪ {a1, a2}, and ii) ∀g1 ∈ G1 ∃g2 ∈ G2 such that when g1I1m holds then g2I2m holds, and when g1I1a holds then exactly one of g2I2a1 and g2I2a2 holds. Then IG(L1) ≥ IG(L2). If IG(L1) = IG(L2) then K2 is more suitable than K1; otherwise we need more information to compare K2 and K1 with respect to their interestingness related to the discovery of co-occurrences. This is intuitive because we are decomposing an attribute, providing finer intervals in place of a coarser interval.

IG and IE give us a framework to evaluate a set of granulations driven by different definitions of the periodic time slots, which are represented by the set of temporal attributes included in the set of attributes of the formal context.
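A small sketch of how Eq. (1) can be computed follows; it is illustrative, assumes the reading of Eq. (1) given above (g^{II} is the extent of the smallest concept containing g), and uses an invented toy context.

```python
# Illustrative sketch of IE/IG: g^{II} = down(up({g})) is the extent of the
# object concept of g; bigger shared granules lower IE and raise IG = 1 - IE.
def information_entropy(G, M, I):
    def up(A):    # attributes shared by all objects in A
        return {m for m in M if all((g, m) in I for g in A)}
    def down(B):  # objects holding all attributes in B
        return {g for g in G if all((g, m) in I for g in B)}
    return sum(1 - len(down(up({g}))) / len(G) for g in G) / len(G)

G = {"o1", "o2", "o3"}
M = {"a", "b"}
I = {("o1", "a"), ("o2", "a"), ("o3", "b")}
IE = information_entropy(G, M, I)
print("IE =", round(IE, 3), " IG =", round(1 - IE, 3))
```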

4.5. Evaluating the uniqueness level of granules

In order to elicit more information on individual granules it is possible to use the measure of separation (SEP), introduced in [27, 28] and defined as:

SEP(A, B) = |A| · |B| / ( Σ_{g∈A} |g^I| + Σ_{m∈B} |m^I| − |A| · |B| )    (2)

where (A, B) is a formal concept. The SEP index can be used to describe how well a formal concept sorts out the objects it covers from the other objects and, jointly, how well it sorts out the attributes it covers from the other attributes of the formal context. Therefore, SEP is defined by calculating the ratio between the area covered in the formal context by a formal concept (A, B) and the total area covered by its objects and attributes. The SEP index gives us a relative measure of how well a specific granule is separated from the other ones, and it can be useful to identify granules with unique characteristics (in terms of both objects and attributes). The SEP index does not take the attribute semantics into account.
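The following fragment sketches Eq. (2) on a toy context; as before, it is illustrative and the context is invented.

```python
# Illustrative sketch of the SEP index: area |A|*|B| of the concept divided by
# the total area covered by its objects and attributes in the context.
def separation(A, B, G, M, I):
    def up(A_):   return {m for m in M if all((g, m) in I for g in A_)}
    def down(B_): return {g for g in G if all((g, m) in I for g in B_)}
    covered_by_objects = sum(len(up({g})) for g in A)        # Σ_{g∈A} |g^I|
    covered_by_attributes = sum(len(down({m})) for m in B)   # Σ_{m∈B} |m^I|
    return len(A) * len(B) / (covered_by_objects + covered_by_attributes - len(A) * len(B))

G = {"o1", "o2", "o3"}
M = {"a", "b"}
I = {("o1", "a"), ("o2", "a"), ("o2", "b"), ("o3", "b")}
print(round(separation({"o1", "o2"}, {"a"}, G, M, I), 2))  # concept ({o1,o2},{a}) -> 0.67
```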


4.6. Evaluating the interestingness of granules and granulations

For this aim, the Granular Computing literature [29, 30] provides general measures to evaluate individual granules within a given granulation. The most important measures are coverage (COV) and specificity (SP). As reported in [31], coverage is concerned with the ability of an information granule to represent (cover) data; in general, the larger the amount of data being covered, the higher the coverage of the information granule. The author of [31] also states that specificity, intuitively, relates to the level of abstraction of the information granule: the higher the specificity, the lower the level of abstraction. These two measures have to be specialized by taking into account both the nature and the structure of the granules. In particular, given a lattice L(G, M, I), for each formal concept (information granule) (A, B) in such a lattice, it is possible to calculate its coverage and specificity:

COV(A, B) = |A| / |G|,    (3)

SP(A, B) = 1 − Σ_{d∈B} len(d) / range,    (4)

where len(d) is the length of the periodic time slot Sd represented by the attribute d and range is the sum of the lengths of all the considered time slots D = {d0, d1, ..., dt−1}, corresponding to the number of time units in a single segment. The previous measures allow us to describe the characteristics of an information granule, represented by a formal concept, along two coordinates [32]. This capability will be exploited in the next parts of this study to graphically describe granules. Moreover, such measures can be aggregated in order to provide a unique index representing the interestingness of the granule: Q(A, B) = COV(A, B) · SP(A, B)^ζ, where the power ζ offers some flexibility in expressing the relative importance of the conflicting requirements of coverage and specificity. The higher the value of ζ, the more essential is the facet of specificity. Furthermore, if we would like to consider the whole granulation, it is possible to calculate its average interestingness. This can be accomplished by using the following formula:

Q̄(L) = Σ_{(A,B)∈L, B∩D≠∅} Q(A, B) / n    (5)

to calculate the average interestingness of it. This can be accomplished by using

X

(A,B)∈L,

B∩D6=∅

Q(A, B) n

(5)

where n is the cardinality of the set of formal concepts in L having a temporal

M

attribute in its intensional part, i.e., those (A, B) given that B ∩ D 6= ∅. 4.7. Combining measures to support granulation decisions Making reference to the process in Fig. 1, once the assessment results of

ED

345

the current granulation are observed and interpreted by the human operator, he/she can modify the time-related parameters in order to re-granulate and

PT

changing the lens (e.g., zooming in or zooming out) using which observing the events. This section provides a set of recommendations useful to configure a new granulation. In particular, when performing a further time-guided granu-

CE

350

lation starting from the results of a previous one, two important decisions to make concern which temporal attributes (in the formal context) have to be de-

AC

composed and how to decompose them. The attribute selection can be done by considering the temporal attribute included in the granule with big coverage

355

and small specificity, i.e. with greater

COV SP

index. Such choice allows us to

further investigate the granule characterized by numerous events occurring in a large periodic time slot.

17

ACCEPTED MANUSCRIPT

On the one hand, we can separate these events and continue to have a sufficient coverage. On the other hand, we can reduce the length of the periodic 360

time slot, given that the current one is not significant. The second issue is to

CR IP T

decide how to decompose the selected temporal attribute. First of all, we can decide on the number of parts two obtain by decomposing the given attribute. For simplicity, we choose to consider only three parts that are defined by means

of two values α and β. Finding optimal values of α and β is a subject to 365

several existing studies. There exist several approaches based, for instance, on Information Theoretic Rough Sets [33].

AN US

Additionally, we would like to define the decomposition operator ` that can be used to split a specific temporal attribute d ∈ D, related to a given formal

context K(G, M, I) and D ⊆ M , into a set of temporal attributes {d01 , ..., d0n } 370

in order to obtain a new formal context K 0 (G0 , M 0 , I 0 ) where M 0 = M − {d} ∪

{d01 , ..., d0n } and ∀g ∈ G ∃g 0 ∈ G0 such that when gIm holds then g 0 I 0 m0 also

holds and when gId holds then exactly one of g 0 I 0 d0i (for i = 1, ..., n) also holds.

M

For instance, the statement d ` d1 d2 indicates that the temporal attribute d is decomposed into the set {d1 , d2 }.

After realizing a new granulation by decomposing a temporal attribute d

ED

375

into a set of smaller temporal attributes d01 , d02 , ..., d0k it could be useful to understand if the new granulation provides more interesting knowledge with

CE

PT

respect to the previous one. In this case, the following statement holds:

(6)

where B 0 = B − {d} ∪ {d0i } (i = 1...k).

AC

380

COV (A, B) ≥ COV (A0 , B 0 ),

Additionally, also the following statement holds: SP (A, B) ≤ SP (A0 , B 0 ),

where B 0 = B − {d} ∪ {d0i } (i = 1...k). 18

(7)

ACCEPTED MANUSCRIPT

¯ applied to specific periodic Moreover, we can provide the definition of Q time slots: ¯ Q(d) =

X

B∩{d}6=∅

(A,B)∈L,

Q(A, B) nd

(8)

CR IP T

385

where nd is the cardinality of the set of formal concepts in L having a temporal attribute d in its intensional part. It is possible to know if a decomposition leads ¯ 0 ), Q(d ¯ 0 ),..., Q(d ¯ 0 )) ≥ Q(d). ¯ to a better granulation by checking if max(Q(d 1 2 k

Lastly, we can assess two alternative decompositions (obtained by applying

390

AN US

two times the operator ` and using different target sets for the function range)

of the same temporal attribute by: i) realizing the associated granulations and ii) comparing the two sums (one for each granulation) of Q index of all timed information granules. The greater is associated with the most suitable granulation 395

in terms of interestingness.

M

4.8. Including Spatial Information

In order to extend the proposed approach to include also spatial information,

ED

it is possible to construct a formal context with additional attributes related to locations/places. In particular, if M is the set of attributes, A is the set of 400

objects and D ⊆ M is a subset of all time-related attributes (periodic time

PT

slots as defined in Section 4.3) we consider a further subset of M , namely P = {p0 , p1 , ..., ph−1 }, of places in which the event of interest has occurred. The

CE

construction of the subset P can be realized by considering a granulation process similar to that employed for the granulation along the time dimension. The sets

405

D and P are disjoint: D ∩ P = ∅. The resulting lattice will contain concepts

AC

including both temporal and spatial information within their intentional parts. In this case, it will be possible to look for periodic spatio-temporal occurrences or co-occurrences. For instance, Fig. 4(c) shows an information granule containing events (or entities) {oo1, oo5} that periodically co-occur at time w1,2,2 in the

410

place place-r. Therefore, we will consider a new type of information granules represented by formal concepts (A, B) with B = P ∪ {d} ∪ {p}. 19

ACCEPTED MANUSCRIPT

5. Illustrative Example The initial dataset consists of six records providing information related to space and time of activities executed by a specific actors (users). We use FCA to detect periodic co-occurrences. Therefore, the first step is to choose time

CR IP T

415

unit, segment and time slots in order to construct the set D of temporal at-

tributes in the formal context. In this example, we describe three different ways of granulation guided by three different decomposition of a segment. In the first decomposition, the time unit is set to Tday (days), the segment is represented 420

by the week (seven days) and, initially, there is only one periodic time slot (i.e.,

AN US

w1 ) representing all the days in a week. Subsequently, for the second decomposition, there are two periodic time slots obtained by grouping the seven time

units in a segment into two parts: the first part (first slot w1,1 ) including Monday and Tuesday and the second part (second slot w1,2 ) including Wednesday, 425

Thursday, Friday, Saturday and Sunday. The third decomposition foresees three

M

different slots: the first one composed by Monday and Tuesday (equal to the previous decomposition w1,1 ), the second one (w1,2,1 ) including Wednesday and Thursday, and, lastly, the third one (w1,2,2 ) represented by Friday, Saturday

430

ED

and Sunday. The above three decompositions can be described by the operation chain w1 ` w1,1 w1,2 ` w1,1 w1,2,1 w1,2,2 . This chain leads to three different

PT

granulations of the universe of available events and can be seen as a sequence of zooming in operations. In order to execute a lattice building algorithm it is needed the construction

CE

of formal contexts. Therefore, the attributes have to be conceptually scaled. In

435

more detail, we have three attributes related to space (i.e., place-r, place-n and place-m), three related to the activity type (i.e., act-1 and act-2) and, lastly,

AC

the temporal attributes as we have indicated above. Table 2 shows the three formal contexts constructed by accomodating the initial dataset with respect to the three aforementioned decompositions of the temporal attributes. The

440

original dataset comes from a log file in which activities of two types (act-1 or act-2) have been executed at a given time (the timestamp is scaled on the

20

ACCEPTED MANUSCRIPT

Table 2: Three formal contexts

x

oo4

x

oo5

x

oo6

oo2

x

oo3

x

oo4

x

x

x

oo5

x

x

x

oo6

x

x

x

x x

x

x

x

oo5

x

2 1,

w

x

2, 2

1,

x x

x x

w

1, 2, 1

t-2

1, 1

w

t-1

ac

ac

w

x

x

x

x

x

x

x

x

M

oo4

x

x

AN US -m

-n

ce

ce

x

x x

x

x

x

oo3

oo6

pla

-r ce

pla

pla

x

oo2

x

x

(c) Decomposition w1,1 , w1,2,1 , w1,2,2

oo1

1, 1

ce -r pla ce -n pl a ce -m ac t-1

oo1

x

x x

x

pla

1

x

x

oo3

w

x

oo2

CR IP T

x

ac t-2

ce -r pla ce -n pla ce -m ac t-1

pla

oo1

ac t-2 w

(b) Decomposition w1,1 , w1,2

(a) Decomposition w1

x

x

ED

periodic time slots represented by the temporal attributes) in a specific location (place-r, place-n or place-m) by different human actors. Once, the granulation parameters have been established, we have to execute the FCA algorithm for three times, one for each formal context obtained before. If we calculate the IG

PT

445

for each lattice, construted respectively from the three provided formal contexts,

CE

we obtain the values: IG(L1 ) = 0.28, IG(L2 ) = 0.28 and IG(L3 ) = 0.22. This result is convincing given that information granules are known to become larger when attributes are removed from the previous formal context [25]. Thus if we decompose temporal attributes in finer parts we obtain a minor or equal

AC

450

IG value. Lattices generated from the first and the second context provide the same IG values because they have de-facto the same number of attributes

(note that w1,1 has no object supporting it). If we increase the number of temporal attributes (by decomposing in the way adopted before) and the IG

21

CR IP T

ACCEPTED MANUSCRIPT

455

AN US

Figure 4: Information granules in lattices constructed from the three formal contexts

value does not change then we achieve a more detailed view of data. In brief, IG provides a useful approach to evaluate the degree of finess (or coarseness) of information granules: the larger IG of a lattice the coarser the information granules and, consequently, the view on data will be less focused. Fig. 4 shows

460

M

how granules vary along different granulations. It is important to underline that the three results must be considered in the light of the size of periodic time slots we are considering. In general, using finer time slots allows to gather more

ED

precise knowledge about periodic co-occurrences.

In the provided example

AC

CE

PT

Table 3: COV SP and Q measures applied to the granules of the third granulation. GRANULE (INT)

COV

SP

Q

place-m act-2 w1,2,2

0.17

0.57

0.10

place-n act-1 w1,2,2

0.17

0.57

0.10

place-r act-2 w1,2,2

0.17

0.57

0.10

place-n act-1 w1,2,1

0.17

0.71

0.12

place-r act-1 w1,2,2

0.33

0.57

0.19

act-2 w1,2,2

0.33

0.57

0.19

place-n act-1

0.33

N/A

N/A

act-1 w1,2,2

0.50

0.57

0.29

place-r w1,2,2

0.50

0.57

0.29

act-1

0.0

N/A

N/A

w1,2,2

0.0

0.57

0.0

it is possible to see that the granulation degree calculated by applying the IG

22

ACCEPTED MANUSCRIPT

measure is the same for the first and the second lattice. This is due to the fact 465

that we have the same granules in both lattices, in turn, this is given by the executed decomposition that does not change the granule composition. Taking

CR IP T

into account the discussion of Section 4.7 we can assert that if we choose a time decomposition leading to a granulation with an unsuitable IG value, we have to go back and try with another decomposition. The attribute to decompose can be 470

selected, empirically, by considering those attributes representing time slots with

great length and holding a great number of objects. Now, it is possible to employ the measures described in Section 4.6 in order to evaluate the interestingness of

AN US

every granules in available lattices. COV , SP and Q (calculated by using ζ = 1)

indexes are applied to all granules of the three lattices in Fig. 4. In particular, 475

Table 3 reports such values for the third granulation. We can conclude that the two granules of the third granulation seem to be more interesting than others. Such granules are positioned in the same point and provide the same Q value of approximately 0.29: (place-r, w1,2,2 ) and (act-1, w1,2,2 ). Given that, for this

480

M

example, we are focusing only on temporal aspects it can be acceptable that the above granules have the same Q value representing their interestingness. The

ED

interpretation of such values is that there is a periodic temporal (co-)occurrence of activity act-1, executed periodically in the time slot w1,2,2 and another one

PT

for any activity in the same time slot in a specific location called place-r.

6. Forest Fires Periodicity: a case study The goal of this case study is to analyse the periodicity of forest fires

CE

485

in the Montesinho natural park (Portugal). The study results can be used, for instance, to predict next years forest fires. The data used in the exper-

AC

iments was collected from January 2000 to December 2003 and it was built using two sources.

490

The first database was collected by the inspector that

was responsible for the Montesinho fire occurrences. At a daily basis, every time a forest fire occurred, several features were registered, such as the time, date, spatial location within a 99 grid the type of vegetation involved, the

23

ACCEPTED MANUSCRIPT

six components of the FWI system and the total burned area. The second database was collected by the Braganc, a Polytechnic Institute, containing sev495

eral weather observations (e.g. wind speed) that were recorded with a 30 minute

CR IP T

period by a meteorological station located in the center of the Montesinho park. The two databases were stored in tens of individual spreadsheets, under distinct formats, and a substantial manual effort was performed to integrate them into a single dataset with a total of 517 entries. This data is available at: 500

http://www.dsi.uminho.pt/pcortez/forestfires/. The dataset has been pre-processed (as indicated in the Section 4 and, in particular, in the Section

AN US

4.3) in order to construct a set of formal contexts that are needed to execute

the FCA algorithm and evaluate the corresponding time granulations. To preprocess data we have assumed the time unit Tmonth and the time interval of 505

interest Ω = [January 2000, December 2003]. Time unit and time interval of interest are fixed also for the next granulations. The number and lengths of the time slots vary along the granulation sequence. The starting granulation has

M

been guided by the periodic time slots month tri1, month tri2 and month tri3 corresponding to the three parts of the year (each part of four months) repeated for each segment (each segment has length of one year). This means

ED

510

that each event (forest fire) falls in only one of the defined periodic time slots and supports only one of the attributes month tri1, month tri2 and month tri3

PT

after performing the conceptual scaling operation along the time dimension. An additional conceptual scaling operation has been realized for the location information included in each row of the dataset. In particular, we have added one

CE

515

attribute for each combination of x and y values corresponding to a specific cell in the 9x9 grid. More in details, a forest file occurred in the location (x1 , y1 )

AC

holds the attribute P lace x1 y1 . The location attributes and values will not vary

along the granulation sequence and, additionally, for this case study, we have

520

not used the other information (fire and weather) attached to each event. After realizing the granulation by considering the above configuration the IG measure is 0.025. The application of the Q index on the lattice obtained by starting from the first formal context is reported in Fig. 5 as an heat map whose cells represent 24

ACCEPTED MANUSCRIPT

timed information granules values are Q values organized into a grid of times

ED

M

AN US

CR IP T

(x axis) and places (y axis). As observed in Fig. 5, the most interesting granule

Figure 5: Q values for the first granulation.

525

PT

is the one holding P lace 86 and falling in the periodic time slot month tri2. In brief, the major number of forest fires occurr in the months of May, June, July and August. Next, Fig. 6 shows the values corresponding to the ratios for all the timed information granules. In this case we have that a granule

CE

COV SP

530

holding month tri2 attribute has the highest

COV SP

value. Thus, it is plausible

AC

to decompose the slot corresponding to such attribute to better observe data by a zooming-in operation. Therefore it is possible to apply the decomposition operator ` to split the periodic time slot containing May, June, July and August

into two periodic time slots: the first one containing May and June, the second

535

one containing July and August. Take care that the parameters (number and lengths of periodic time slots) adopted for the decomposition operation have 25

Figure 6:

M

AN US

CR IP T

ACCEPTED MANUSCRIPT

COV SP

values for the first granulation.

ED

been choosed arbitrary. The obtained periodic time slots are represented by two new temporal attributes month tri2mj and month tri2ja. Thus, we have built a second formal context by dropping from the previous one the attribute month tri2 and inserting month tri2mj and month tri2ja. Every event pre-

PT

540

viously holding month tri2 holds now exaclty one between month tri2mj and

CE

month tri2ja.

The second granulation produced a lattice that shows the most interesting

granule holding month tri2ja and again P lace 86. It is clear than there is a strong periodicity in the period [July, August] for the cell (8, 6) of the 9x9 grid

AC 545

attached to the Montesinho park. It is important to note that if we calculate the interestingness of the at-

¯ index, it is smaller tribute month tri2 for the first granulation by means of the Q ¯ indexes of the attributes month tri2mj and than the maximum between the Q

26

ACCEPTED MANUSCRIPT

550

month tri2ja in the second granulation. As stated in Section 4.7 we can affirm that the second granulation is better than the first in terms of acquired knowl¯ ¯ edge because max(Q(month tri2mj) = 0.00278, Q(month tri2ja) = 0.01243) ≥ Q formula. If we take a look to the

555

COV SP

CR IP T

¯ ¯ values are obtained with ζ = 1.0 in the Q(month tri2) = 0.00977. Q and Q

values for the second granulation

it is possible to know that the attribute month tri2ja is the most suitable to be decomposed to find a better granulation. In fact, the third granulation de-

composes the attribute month tri2ja into two attributes month tri2ja j (July) and month tri2ja a (August). First of all, the third granulation, for ζ = 1.0,

560

AN US

¯ is better than the second one if we consider that max(Q(month tri2ja j) =

¯ ¯ 0.00355, Q(month tri2ja a) = 0.01255) ≥ Q(month tri2ja) = 0.01243. Also for the third granulation the most interesting granule continues to be the one holding P lace 86 along to the space dimension. However, in this granulation we have that the above granule holds month tri2ja a along the time dimension.

fires in the cell (8, 6) of the 9x9 grid. With respect to the same granula-

M

565

This indicates that in August there is a strong periodicity of forest

tion it is interesting to note that if we set to 0.5 the value for the parame-

ED

ter ζ we obtain that the third granulation is not better than the second one. ¯ ¯ In fact max(Q(month tri2ja j) = 0.00370, Q(month tri2ja a) = 0.01310) < ¯ Q(month tri2ja) = 0.01362. Next, if we would like to realize a further zoomingin operation on data it is possible to observe again the

PT

570

COV SP

-values. In this case,

we would need a new time unit if splitting the attribute month tri2ja a that COV SP

value. Thus, we decide to keep the above attribute and

CE

has the maximum

split month tri3, providing the second higher value for the

COV SP

index, in order

to obtain the new two temporal attributes month tri3so (for September and October) and month tri3nd (for November and December). Once the new for-

AC

575

mal context has been built and the FCA algorithm has been applied, the fourth granulation can be evaluated. In particular, another strong periodicity of forest fires occurs in the period from September to October and, in particular, in cells (7, 4) and (6, 5). The fourth granulation is better than the previous one with

580

respect to both ζ = 1.0 and ζ = 0.5. If we follow the previous approach we can 27

ACCEPTED MANUSCRIPT

further decompose month tri3so into month tri3so s and month tri3so o for a fifth granulation. In this case, if ζ = 1 than the granulation is considered better than the previous one. This is not true if ζ = 0.5. At this point we decided

585

given the

COV SP

CR IP T

to stop the process given that we would not like to change the time unit and, values, we had not other suitable attribute to decompose. If

we caluclate the SEP -values for the fourth granulation we obtain higher values

for P lace 86 in month tri2ja a (August) and for P lace 4,6 in month tri3nd

(November-December). Therefore, the aforementioned two granules have high

uniqueness levels given that their events have temporal and/or spatial components that are not shared with many other events in the considered universe.

AN US

590

In this case, in November and December we have found that the ratio between the number of forest fires localized in P lace 46 and that localized in other cells in the 9x9 grid is anomalous. This ratio is relatively much greater for P lace 46 than for the other cells.

7. Final Remarks and Future Works

M

595

The paper proposes an overall approach to use FCA and the paradigm of

ED

Granular Computing to analyse data, containing information about events, in terms of their periodicity. FCA is used as a granulation mechanism able to gen-

600

PT

erate different granulated structures based on its current configuration. Such structures contain timed information granules, i.e., a group of related events, occurring periodically in the same (periodic) time slot and also sharing other

CE

characteristics (space, category, etc.). Multi-views on data, supported by the granular computing paradigm, are needed because they metaphorically offer the chance to observe data through different lens with different zoom levels. This allows to elicit additional knowledge (if we can correctly interpret it) with respect

AC 605

to the one obtained by means of a single-view. The main contribution of the paper is to provide solutions to support the discovery of significant periodicities in data by defining an approach to perform time-guided granulations of data and a framework to evaluate the quality of these granulations. At the moment,

28

ACCEPTED MANUSCRIPT

610

the proposed approach does not foresee a (semi-) automatic procedure for finding, in an effective and efficient way, the most suitable granular structures and

CR IP T

granules but the authors may address this issue in future works.

References References 615

[1] Y. Zhang, W. Liu, N. Ding, X. Wang, Y. Tan, An event ontology descrip-

tion framework based on skos, in: IEEE 15th International Conference on

AN US

Scalable Computing and Communications and Its Associated Workshops (UIC-ATC-ScalCom), 2015, pp. 1774–1779.

[2] H. W. Lauw, E.-P. Lim, H. Pang, T.-T. Tan, Social network discovery by mining spatio-temporal events, Computational & Mathematical Organiza-

620

tion Theory 11 (2) (2005) 97–118.

M

[3] Y. Chen, Y. Yao, Multiview intelligent data analysis based on granular computing., in: Granular Computing (GrC), 2006, pp. 281–286.

ED

[4] Q. Zhang, Y. Xing, Formal concept analysis based on granular computing, J. Comput. Inf. Syst 6 (7) (2010) 2287–2296.

625

PT

[5] J. Li, C. Mei, W. Xu, Y. Qian, Concept learning via granular computing: A cognitive viewpoint, Information Sciences 298 (Supplement C) (2015) 447 – 467. doi:https://doi.org/10.1016/j.ins.2014.12.010.

CE

URL

http://www.sciencedirect.com/science/article/pii/

S0020025514011475

630

AC

[6] J. Li, C. Huang, J. Qi, Y. Qian, W. Liu, Three-way cognitive concept

635

learning via multi-granularity, Information Sciences 378 (Supplement C) (2017) 244 – 263. doi:https://doi.org/10.1016/j.ins.2016.04.051. URL

http://www.sciencedirect.com/science/article/pii/

S0020025516303048

29

ACCEPTED MANUSCRIPT

[7] K. E. Wolff, Temporal concept analysis, in: ICCS-2001 International Workshop on Concept Lattices-Based Theory, Methods and Tools for Knowledge Discovery in Databases, Stanford University, Palo Alto (CA), 2001, pp. 91–

640

CR IP T

107. [8] A. Le, M. Gertz, Mining periodic event patterns from rdf datasets, in: East

European Conference on Advances in Databases and Information Systems, Springer, 2013, pp. 162–175.

[9] J. T. Yao, A. V. Vasilakos, W. Pedrycz, Granular computing: perspectives

645

AN US

and challenges, IEEE Transactions on Cybernetics 43 (6) (2013) 1977–1989.

[10] S. O. Kuznetsov, S. A. Obiedkov, Comparing performance of algorithms for generating concept lattices, Journal of Experimental & Theoretical Artificial Intelligence 14 (2-3) (2002) 189–216.

[11] J. Hu, T. Li, H. Wang, H. Fujita, Hierarchical cluster ensemble model based

M

on knowledge granulation, Knowledge-Based Systems 91 (2016) 179–188. doi:10.1016/j.knosys.2015.10.006.

650

ED

URL https://doi.org/10.1016/j.knosys.2015.10.006 [12] G. Wilke, E. Portmann, Granular computing as a basis of human–data interaction: a cognitive cities use case, Granular Computing 1 (3) (2016)

655

PT

181–197.

[13] Y. Jing, T. Li, H. Fujita, Z. Yu, B. Wang, An incremental attribute re-

CE

duction approach based on knowledge granularity with a multi-granulation view, Inf. Sci. 411 (2017) 23–38. doi:10.1016/j.ins.2017.05.003.

AC

URL https://doi.org/10.1016/j.ins.2017.05.003

[14] A. Bargiela, W. Pedrycz, The roots of granular computing, in: Granular

660

Computing, 2006 IEEE International Conference on, IEEE, 2006, pp. 806– 809.

30

ACCEPTED MANUSCRIPT

[15] L. A. Zadeh, Toward a theory of fuzzy information granulation and its centrality in human reasoning and fuzzy logic, Fuzzy sets and systems 90 (2) (1997) 111–127. [16] R. Wille, Restructuring lattice theory: an approach based on hierarchies of

CR IP T

665

concepts, in: Ordered sets, Springer, 1982, pp. 445–470.

[17] U. Priss, Formal concept analysis in information science, Arist 40 (1) (2006) 521–543.

[18] G.-Y. WANG, Q.-H. ZHANG, X.-A. MA, Q.-S. YANG, Granular comput-

AN US

ing models for knowledge uncertainty, Journal of Software 4 (2011) 007.

670

[19] M. C. Lee, Z. L. Liu, H. H. Chen, J. B. Lai, Y. T. Lin, Fca based concept constructing and similarity measurement algorithms, in: Advanced Information Management and Service (IMS), 2010 6th International Conference on, IEEE, 2010, pp. 384–388.

[20] R. Wille, Formal concept analysis as mathematical theory of concepts and

M

675

ED

concept hierarchies, in: Formal concept analysis, Springer, 2005, pp. 1–33. [21] C. De Maio, G. Fenza, V. Loia, F. Orciuoli, Unfolding social content evolution along time and semantics, Future Generation Computer Systems 66

680

PT

(2017) 146–159.

[22] C. D. Maio, G. Fenza, V. Loia, F. Orciuoli, Making sense of cloud-sensor

CE

data streams via fuzzy cognitive maps and temporal fuzzy concept analysis, Neurocomputing 256 (Supplement C) (2017) 35 – 48, fuzzy Neuro Theory

AC

and Technologies for Cloud Computing.

[23] W. Wei, J. Wang, J. Liang, X. Mi, C. Dang, Compacted decision tables

685

based attribute reduction, Knowledge-Based Systems 86 (2015) 261–277.

[24] Y. Jing, T. Li, C. Luo, S.-J. Horng, G. Wang, Z. Yu, An incremental approach for attribute reduction based on knowledge granularity, KnowledgeBased Systems 104 (2016) 24–38. 31

ACCEPTED MANUSCRIPT

[25] C. Huang, J. Li, S. Dias, Attribute significance, consistency measure and attribute reduction in formal concept analysis, Neural Network World 26 (6)

690

(2016) 607–623.

CR IP T

[26] P. K. Singh, A. K. Cherukuri, J. Li, Concepts reduction in formal concept analysis with fuzzy setting using shannon entropy, International Journal of Machine Learning and Cybernetics (2015) 1–11. 695

[27] M. Klimushkin, S. Obiedkov, C. Roth, Approaches to the selection of rel-

evant concepts in the case of noisy data, in: International Conference on

AN US

Formal Concept Analysis, Springer, 2010, pp. 255–266.

[28] S. O. Kuznetsov, T. Makhalova, On interestingness measures of formal concepts, arXiv preprint arXiv:1611.02646. 700

[29] W. Pedrycz, Concepts and design aspects of granular models of type-1 and

(2015) 87–95.

M

type-2, International Journal of Fuzzy Logic and Intelligent Systems 15 (2)

[30] W. Pedrycz, The principle of justifiable granularity and an optimization of

ED

information granularity allocation as fundamentals of granular computing, Journal of Information Processing Systems 7 (3) (2011) 397–412.

705

PT

[31] W. Pedrycz, Algorithmic Developments of Information Granules of Higher Type and Higher Order and Their Applications, Springer International

CE

Publishing, Cham, 2017, pp. 27–41. [32] W. Pedrycz, Granular Computing: Analysis and Design of Intelligent Systems, Industrial Electronics, Taylor & Francis, 2013.

AC

710

[33] X. Deng, Y. Yao, An information-theoretic interpretation of thresholds in probabilistic rough sets., in: RSKT, Springer, 2012, pp. 369–378.

32