Accepted Manuscript

Towards a Granular Computing Approach based on FCA for Discovering Periodicities in Data

Vincenzo Loia, Francesco Orciuoli, Witold Pedrycz

PII: S0950-7051(18)30048-0
DOI: 10.1016/j.knosys.2018.01.032
Reference: KNOSYS 4205

To appear in: Knowledge-Based Systems

Received date: 11 September 2017
Revised date: 24 January 2018
Accepted date: 29 January 2018

Please cite this article as: Vincenzo Loia, Francesco Orciuoli, Witold Pedrycz, Towards a Granular Computing Approach based on FCA for Discovering Periodicities in Data, Knowledge-Based Systems (2018), doi: 10.1016/j.knosys.2018.01.032

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
Highlights

• Granular Computing (GrC) for discovering event periodicities in temporal data. GrC is a paradigm useful to describe and analyse data at different levels of abstraction; data can therefore be exploited to create information granules starting from both the selection of a given time unit and the construction of a set of time slots in a time interval of interest decomposed into periodic segments.

• Formal Concept Analysis (FCA) with time-related attributes to realize granulations of data with respect to periodic time slots represented as specific attributes, called temporal attributes of the formal context. Granules including temporal attributes are used to discover periodic occurrences and co-occurrences.

• A set of measures, i.e., Information Granulation (IG), Information Entropy (IE), Separation (SEP), Coverage (COV) and Specificity (SP), which can be used to assess granulations and the resulting granules according to their capability to elicit useful knowledge related to periodic occurrences and co-occurrences. In brief, these measures help us discover relevant temporal occurrences and co-occurrences by guiding us across multiple granulations and, within a specific granulation, to identify the granules providing more interesting and/or unique knowledge.

• The work provides both an illustrative example and a case study realized by using a dataset related to forest fires that occurred in the natural park of Montesinho (Portugal).

• The main original aspect of the proposed approach lies in the definition of a novel time-guided granulation approach, including the contextualization of a set of measures useful to assess the quality of granulations with respect to the interestingness and uniqueness of the discovered knowledge related to the periodicity of occurrences and co-occurrences of events. Unlike existing works, the proposed approach helps the human operator find the granularities (along the time dimension) providing interesting and unique knowledge.
Towards a Granular Computing Approach based on FCA for Discovering Periodicities in Data

Vincenzo Loia^c, Francesco Orciuoli^c, Witold Pedrycz^{a,b}

^a Department of Electrical and Computer Engineering, University of Alberta, Edmonton, AB T6R 2G7, Canada
^b Systems Research Institute, Polish Academy of Sciences, 01-224 Warsaw, Poland
^c Dipartimento di Scienze Aziendali - Management & Innovation Systems, University of Salerno, Via Giovanni Paolo II, 132 - 84084 Fisciano (SA), Italy

Email addresses: [email protected] (Vincenzo Loia), [email protected] (Francesco Orciuoli), [email protected] (Witold Pedrycz)

Abstract

Studying aspects related to the occurrences and co-occurrences of events enables many interesting applications in several domains like Public Safety and Security. In particular, in Digital Forensics, it is useful to construct the timeline of a suspect, reconstructed by analysing social networking applications like Facebook and Twitter. One of the main limitations of the existing data analysis techniques addressing the above issues is that they work on a single view of the data and, thus, may miss the elicitation of interesting knowledge. This limitation can be overcome by considering more views and applying methods to assess such views, allowing human operators to move from one view to a more suitable one. This paper focuses on the temporal aspects of data and proposes an approach based on Granular Computing to build multiple time-related views and to interpret the extracted knowledge concerning the periodic occurrences of events. The proposed approach adopts Formal Concept Analysis (with time-related attributes) as an algorithm to realize granulations of data and defines a set of Granular Computing measures to interpret the formal concepts, whose extensional parts are formed by co-occurring events, in the lattices constructed by such algorithm. The applicability of the approach is demonstrated by providing a case study concerning a public dataset on forest fires that occurred in the Montesinho natural park in Portugal.

Keywords: Formal Concept Analysis, Temporal Data, Periodicity, Knowledge Discovery, Granular Computing
1. Introduction

According to [1], an event is a thing that happens in a certain time and place, which involves some actors, objectives and action features with changing statuses. The notions of occurrence and co-occurrence of events can be found in [2], where the authors provide definitions of basic co-occurrence, temporal co-occurrence, spatial co-occurrence and spatio-temporal co-occurrence. In particular, a temporal co-occurrence between two events emerges when the absolute value of the difference between the timestamps of such events is lower than a given threshold. Moreover, a spatial co-occurrence emerges when two events are co-located, i.e., they both occur in the same space (place). Lastly, a spatio-temporal co-occurrence deals with events occurring at the same time in the same place. Also in this case, the concepts of both same time and same space should be formally defined. A periodic event is an event occurring periodically, e.g., an event occurring every Monday, every month or every weekend. In a wide range of application domains, it is important to deal (discovering, studying, analysing, etc.) with occurrences, co-occurrences and also with periodicities of events. For instance, in Digital Forensics, the timeline of a suspect or a group of suspects can be enriched by considering the possibility to elicit some hidden relationships among persons in a group who are in the same place at the same moment; moreover, it is possible to discover a behavioural pattern of a person (or a group of persons) by considering the sequence of his/her periodic actions. In a completely different domain like Smart Cities, it is possible to discover periodicities in phenomena like, for instance, traffic jams, in order to plan early and avoid hardships for citizens. In this context, the present work provides an approach grounded on Granular Computing (GrC) for discovering event periodicities in temporal data. GrC is a paradigm useful to describe and analyse data at different levels of abstraction; data can therefore be exploited to create information granules starting from both the selection of a given time unit and the construction of a set of time slots in a time interval of interest decomposed into periodic segments. The paper is organized as follows. Section 2 describes the motivations for the present work and the originality of its results with respect to related works. Section 3 provides basic notations, definitions, and the formulation of the problem. Section 4 provides an overall description of the proposed approach, the definition and contextualization of the measures related to granulations, some background knowledge on Formal Concept Analysis (FCA) and GrC, and a discussion on spatio-temporal occurrences and co-occurrences. Section 5 provides some illustrative examples recalling the main aspects of the proposed approach. Section 6 provides a case study demonstrating the key features of the approach and, lastly, Section 7 offers some final remarks and future works.
2. Motivations and Related Works

Discovering periodic occurrences or co-occurrences is an important result that can be obtained by applying data analysis techniques. One of the most interesting challenges, with respect to the aforementioned issue, is to find suitable views [3] from which the observed data must be analysed. For instance, in a dataset of ten events, discovering seven events occurring always on Saturday mornings could be more interesting than discovering nine events occurring always on weekends. This aspect depends on both the data and the objectives. In general, observing data with an erroneous view leads to the incapacity of discovering interesting knowledge. In this work, such views are defined by selecting temporal intervals and modifying time units in a way that allows making a kind of temporal zoom-in/zoom-out operation on the data. Therefore, given that the most useful views to adopt depend on both the available data and the objectives of the analysis, a computational approach is needed to support human operators in effectively and efficiently evaluating different views on data in order to discover interesting periodic temporal occurrences and co-occurrences of events. The proposed approach uses Formal Concept Analysis (FCA) with time-related attributes to realize granulations of data with respect to periodic time slots represented as specific attributes, called temporal attributes of the formal context. Granules including temporal attributes are used to discover periodic occurrences and co-occurrences. Moreover, the paper discusses a set of measures, i.e., Information Granulation (IG), Information Entropy (IE), Separation (SEP), Coverage (COV) and Specificity (SP), which can be used to assess granulations and the resulting granules according to their capability to elicit useful knowledge related to periodic occurrences and co-occurrences. In brief, these measures help us discover relevant temporal occurrences and co-occurrences by guiding us across multiple granulations and, within a specific granulation, to identify the granules providing more interesting and/or unique knowledge. The application of FCA (or other concept learning methods) to implement data analysis approaches based on the paradigm of GrC has already been discussed in several works in the specialized literature [4, 5, 6]. With respect to the aforementioned works, this paper introduces an approach to granulate data along the time dimension by using FCA (borrowing the idea of Temporal Concept Analysis, already described in works like [7]) to learn granules/concepts representing temporal periodicities of events (in data). Thus, the main original aspect of the paper lies in the definition of a novel time-guided granulation approach including a framework for assessing the quality (interestingness and uniqueness) of the discovered knowledge (periodic occurrences and/or co-occurrences of events). Unlike existing works like [2], the proposed approach helps the human operator find the granularities (along the time dimension) providing interesting and unique knowledge.
3. Periodic Occurrences and Co-occurrences of Events

An event can be represented by a tuple including, among other information, the time at which it occurred. According to [8], it is possible to assume different timelines, based on their specific time units, to represent the time information associated with events and their patterns (occurrences, co-occurrences, periodicity, etc.). For instance, it is possible to consider a timeline based on days and, in this case, the time unit is the day (T_day), or based on months (the time unit is the month, i.e., T_month), and so on. Different timelines provide a first type of temporal granularity and, in general, make different knowledge emerge. A time slot is a sequence of time units within a timeline. For instance, if we are considering the timeline T_day, each day could correspond to a time slot, or it is possible to consider more days for a single time slot. Following these criteria we obtain a more or less detailed representation of the timeline. Based on a given time unit and a specific definition of time slots it is possible to introduce the segment. Segments are used to model periodicities. Given some t > 0, if we have a total of n time slots, such slots are grouped into n/t segments, such that each segment consists of t consecutive slots and the k-th segment (k ∈ {0, 1, ..., n/t − 1}), denoted by I_k, consists of the t slots from slot (t · k) to slot (t − 1 + t · k). In this sense, t represents the number of parts (consecutive and non-overlapping time intervals) a segment is decomposed in. The set I_k[i] is the set of all events occurring in the i-th slot (i ∈ {0, 1, ..., t − 1}) of the k-th segment. Note that it is possible to map a given event on different timelines by using its timestamp. By considering the above approach to segmentation, an event v is said to periodically occur if there exists i ∈ {0, 1, ..., t − 1} such that v is found in the i-th slot of every segment (strict periodicity) or in most segments (relaxed periodicity). Moreover, two events v1 and v2 periodically co-occur if there exists i ∈ {0, 1, ..., t − 1} such that v1 and v2 are found in the i-th slot of every segment or in most segments. Lastly, a time interval of interest (e.g., 3 years, from 1999-03-01 to 2002-03-31) is the whole time window in which the events of interest fall.

Let us present an example.
Assume we have a time interval of interest going from 2018-01-08 to 2018-01-21 and the following set of events with their associated timestamps (the information on the time at which each event occurred): (e1, 2018-01-08), (e2, 2018-01-09), (e3, 2018-01-13), (e4, 2018-01-15), (e5, 2018-01-20) and (e6, 2018-01-21). Assume also that we choose a time unit equal to an individual day (i.e., T_day) and a segment equal to seven days (i.e., a week). Now, it is possible to decompose the week into three parts and define three time slots: time slot 0 represented by ⟨Monday, Tuesday, Wednesday⟩, time slot 1 represented by ⟨Thursday, Friday⟩ and time slot 2 represented by ⟨Saturday, Sunday⟩. Thus it is possible to affirm that events e1 and e2 co-occur in time slot 0 of the first segment, e4 occurs in time slot 0 of the second segment, e3 occurs in time slot 2 of the first segment, and e5 and e6 co-occur in time slot 2 of the second segment. In other words: I_0[0] = {e1, e2}, I_1[0] = {e4}, I_0[1] = {}, I_1[1] = {}, I_0[2] = {e3} and I_1[2] = {e5, e6}. Lastly, it is possible to affirm that e1, e2 and e4 periodically co-occur in time slot 0, and e3, e5 and e6 periodically co-occur in time slot 2.
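As a purely illustrative sketch (not part of the original formulation; the weekly slot partition and all identifiers such as slot_of are assumptions made here for concreteness), the segmentation of the example above can be reproduced in Python: each timestamp is mapped onto a (segment, slot) pair and the sets I_k[i] are collected.

from datetime import date
from collections import defaultdict

# Time interval of interest: two one-week segments starting on Monday 2018-01-08.
START = date(2018, 1, 8)
SEGMENT_LEN = 7                                      # segment length in days (time unit T_day)
SLOTS = [("slot0", 3), ("slot1", 2), ("slot2", 2)]   # assumed partition: Mon-Wed, Thu-Fri, Sat-Sun

events = {
    "e1": date(2018, 1, 8),  "e2": date(2018, 1, 9),
    "e3": date(2018, 1, 13), "e4": date(2018, 1, 15),
    "e5": date(2018, 1, 20), "e6": date(2018, 1, 21),
}

def slot_of(day_offset: int) -> int:
    """Return the index of the periodic slot containing a day offset within a segment."""
    pos = day_offset % SEGMENT_LEN
    for i, (_, length) in enumerate(SLOTS):
        if pos < length:
            return i
        pos -= length
    raise ValueError("offset outside segment")

# I[k][i]: events occurring in the i-th slot of the k-th segment.
I = defaultdict(lambda: defaultdict(set))
for name, day in events.items():
    offset = (day - START).days
    k, i = offset // SEGMENT_LEN, slot_of(offset)
    I[k][i].add(name)

# Events found in the same slot of every (or most) segments periodically (co-)occur.
for i, (label, _) in enumerate(SLOTS):
    union = set().union(*(I[k][i] for k in I))
    print(label, "->", sorted(union))

Running the sketch groups e1, e2 and e4 under slot 0 and e3, e5 and e6 under slot 2, mirroring the periodic co-occurrences described above.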
4. A GrC-based Approach to Discover Temporal Periodicity in Data

The approach is based on the paradigm of Granular Computing (GrC) [9], in which a human operator establishes the time-related parameters (time unit, segment, periodic time slots, time interval of interest) of the view he/she would like to use in order to analyse the dataset. Such a view will be used to pre-process the dataset in order to obtain a formal context that enables the work of one of the algorithms for generating formal concepts and constructing formal concept lattices (in brief, FCA algorithms [10]), which can be used to realize granulation and produce a granulated view of the universe [11]. This is what we call time-guided granulation. After a first granulation, accomplished by considering the aforementioned formal context to obtain a lattice/granulation (time-guided granulation) in order to observe events from the given view, the human operator assesses the lattice/granulation by using the proposed measures and decides whether to go ahead or to go back and adjust or change the considered view by modifying some time-related parameters. In this way he/she searches for suitable views. Next, given a granulation, he/she can evaluate concepts/granules to discover strict or relaxed periodicities with respect to significant event occurrences or co-occurrences. The overall approach is depicted in Fig. 1 and it follows a similar schema to that adopted in [12] for Cognitive Cities.

4.1. Granular Computing
Figure 1: Graphical representation of the overall approach.
Intelligent data analysis can be accomplished by using a plethora of existing approaches for data mining, machine learning, knowledge discovery, and statistics. The main drawback of such approaches is that they consider a specific view of the data when being applied. Often, it is convenient to observe and study data from multiple views. This consideration leads to multiview intelligent data analysis, which has been introduced to explore different types of knowledge, different features of data, and different interpretations of data [3, 13]. In this context, Granular Computing has emerged as a multidisciplinary paradigm for problem solving and information processing [14, 15] that provides a general, systematic and natural way to analyze, understand, represent, and solve real-world problems. By adopting such a paradigm it is possible to realize multiview intelligent data analysis. Granulations and granular structures are the two fundamental issues in granular computing. Different types of granulations and granular structures reflect multiple aspects of data. Granulation is related to the process of constructing granules, while a granular structure is related to the relationships among the granules. Different granular structures could describe different characteristics of data or knowledge embedded in data [3]. Furthermore, granulation of a universe of discourse (of available elements described by data) involves the grouping of individual elements into classes that decompose the set of all elements into granules. The relationships that keep together different elements in a granule could be indistinguishability, similarity, proximity or functionality [15]. Therefore, a granulation provides a granulated view of the universe. Potentially, we can realize many granulated views of the same universe. Fig. 1 (the image on the right) also provides a graphical sketch depicting three different time-guided granulations of the same universe of events.
4.2. Formal Concept Analysis with Time-related Attributes

Formal Concept Analysis (FCA) [16] is a method for data analysis, knowledge representation and information management [17]. In this paper, FCA is adopted to realize granulation over a set of events/objects (universe) [18] in order to investigate temporal phenomena [7]. Let us introduce basic definitions in the area of FCA. A formal context is defined as a set structure K = (G, M, I) for which G and M are sets while I is a binary relation between G and M, i.e., I ⊆ G × M. G is a set of objects, M is a set of attributes and gIm, i.e., (g, m) ∈ I, states that the object g has the attribute m. For constructing formal concepts from a formal context two derivation operators are needed. Given A ⊆ G and B ⊆ M, it is possible to define the following two operators:

A^I = {m ∈ M | gIm for all g ∈ A},
B^I = {g ∈ G | gIm for all m ∈ B}.

The previously defined derivation operators generate, respectively, the set of all attributes held by a given set of objects, and the set of all objects holding a given set of attributes. A formal concept of a formal context is defined as a pair (A, B) with A ⊆ G, B ⊆ M, B = A^I and A = B^I. A and B are called extent and intent of the formal concept (A, B). Moreover, the subconcept-superconcept relation is defined as:

(A1, B1) ≤ (A2, B2) ⇔ A1 ⊆ A2 (⇔ B1 ⊇ B2),

where (A1, B1) and (A2, B2) are two formal concepts of the formal context K = (G, M, I), (A1, B1) is a subconcept of (A2, B2) and (A2, B2) is a superconcept of (A1, B1). Starting from the derivation operators and their properties, it is possible to define algorithms to construct formal concepts from a formal context and to represent the extracted knowledge as a lattice [19].

As reported in [20], concepts are primarily cognitive structures used to reconstruct and represent objects, segments and events of the surrounding world. If objects in formal contexts come with information related to time, it is possible to apply FCA to realize time-related knowledge discovery and representation [7, 21, 22]. Therefore, we further specify the definition of formal context by considering D ⊆ M, where D = {d1, d2, ..., dk}. The generic element di is an attribute representing discrete time information. The resulting formal concepts (A, B) are information granules. Among all the constructed information granules, those having B ∩ D ≠ ∅ are called timed information granules (TIGs). TIGs are interpreted as periodic occurrences or co-occurrences of events. In order to semantically interpret a TIG one needs to consider the formal concept (A, B) representing the TIG, the temporal attributes of D included in B, the other attributes included in B that characterize the events in A, and all the time-related parameters used to realize the granulation. Definitely, injecting time information, that is, extending the set of attributes of a formal context with time-related attributes, is a plausible approach to extract time-enriched concepts from data and represent them by means of a well-known knowledge structure (the lattice) [7].

Let us describe a short example. If we consider information related to the daily presence of two employees (e1 and e2) in one of the existing sites (Rome, Milan) of their company, we have the following set of attributes: M = {Rome, Milan, Monday, Tuesday, Wednesday, Thursday, Friday} and D = {Monday, Tuesday, Wednesday, Thursday, Friday}. By applying FCA it is possible to obtain formal concepts organized into the lattice shown in Fig. 2 (obtained from the formal context in Table 1), where it is possible to understand, by analysing the formal concepts containing information for both e1 and e2, that e1 is periodically (strict periodicity) at the Rome site every Monday, and e2 is periodically (relaxed periodicity) at the Milan site on Tuesday. Moreover, the presences of e1 and e2 co-occur (relaxed periodicity) at Rome on Monday. Note that it is possible to read, from the lattice, the attributes simultaneously held by the aforementioned objects by exploring the attributes attached to all nodes within all paths conducting (from the top) to the node containing such objects. Moreover, e1_1, e1_2, ..., e2_1, ... represent the sets of observations for the employees e1 and e2.

Table 1: Sample formal context (objects: the daily observations e1_1, ..., e1_4, e2_1, ..., e2_4; attributes: Rome, Milan, Monday, Tuesday, Wednesday, Thursday, Friday).
Figure 2: Sample formal lattice.
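To make the construction of such a lattice tangible, the following Python sketch enumerates the formal concepts of a small context by brute force; the incidence data is an assumed toy example (it is not the exact content of Table 1), and a real implementation would rely on one of the efficient FCA algorithms surveyed in [10].

from itertools import combinations

# Toy formal context (assumed data): objects are daily observations of e1 and e2.
context = {
    "e1_1": {"Rome", "Monday"},  "e1_2": {"Rome", "Monday"},
    "e1_3": {"Rome", "Wednesday"}, "e2_1": {"Rome", "Monday"},
    "e2_2": {"Milan", "Tuesday"}, "e2_3": {"Milan", "Tuesday"},
}
G = set(context)
M = set().union(*context.values())

def up(objects):          # A -> A^I: attributes shared by all objects in A
    return set.intersection(*(context[g] for g in objects)) if objects else set(M)

def down(attrs):          # B -> B^I: objects having all attributes in B
    return {g for g in G if attrs <= context[g]}

# Brute-force enumeration of formal concepts (fine for small contexts).
concepts = set()
for r in range(len(G) + 1):
    for A in combinations(sorted(G), r):
        extent = down(up(set(A)))                      # closure of the object set
        concepts.add((frozenset(extent), frozenset(up(extent))))

D = {"Monday", "Tuesday", "Wednesday", "Thursday", "Friday"}   # temporal attributes
for extent, intent in sorted(concepts, key=lambda c: len(c[0])):
    tag = "TIG" if intent & D else "   "
    print(tag, sorted(extent), sorted(intent))

Concepts whose intent contains a temporal attribute are flagged as timed information granules (TIGs), as defined above.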
4.3. Preparing Formal Contexts to support Time-guided Granulation

Granulation needs to start from data: the original dataset (e.g., a log file) to be analysed, also called raw data, is typically structured into a set of rows where each row includes information about an event and the time at which it occurs. By using a conceptual scaling approach, the raw data are transformed into a formal context (G, M, I) where the events (or entities) are the objects belonging to G, described by means of the attributes in M. Periodic time slots are represented by the subset of attributes D ⊆ M, namely the temporal attributes. More in detail, we have to select: i) a time interval of interest Ω, excluding from the raw data the information related to events that do not occur in such an interval; ii) the time unit to consider (e.g., T_day, T_week); and iii) the number and lengths (in terms of time units) of the time slots, based on the length t (measured in time units) of the segments the timeline is decomposed in. By using the above parameters it is possible to define the set of periodic time slots {S1, S2, ..., Sk}, where the length of Si is given by the function len(Si) such that \sum_{i=1}^{k} len(S_i) = t, with t being the whole length of a segment in number of time units.

Figure 3: Mapping a time interval onto a formal context.

Subsequently, given the above information, we need a function fT that maps an event timestamp onto a periodic time slot. Being g an event, fT(g) is equal to i, with i = 1, ..., k, if the timestamp of g, namely τ(g), falls in the defined time interval Ω and falls in exactly one element of the set {S1, S2, ..., Sk}; otherwise it is equal to −1. Starting from fT it is possible to conceptually scale the raw data into a formal context (G, M, I). In particular, we are interested in focusing on how to define the set D ⊆ M of temporal attributes. The set D includes the attributes d1, d2, ..., dk such that gId_i if and only if g falls in Si, i.e., fT(g) = i. Thus, by using fT it is possible to fill the part of the formal context related to the temporal attributes in D. Fig. 3 graphically shows an example of how original data can be pre-processed to build a formal context. In particular, such a figure provides insights into the transformation from time slots to time-related attributes. Thus, the attribute Tuesday, in the formal context, represents the union of all the Tuesday slots, i.e., one for each segment in the time interval of interest. In brief, each attribute in D represents a periodic time slot. Note that after the conceptual scaling operations it could be useful to execute attribute reduction operations [23, 24] to improve the performance of FCA algorithms.
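A minimal sketch of this conceptual scaling step is given below; the interval, slot names and raw rows are illustrative assumptions, not data from the paper, and only the temporal part of the scaling is shown.

from datetime import date

# Granulation parameters (illustrative values): time unit = day, segment = 7 days,
# periodic slots S_1..S_k given as (name, length in time units).
OMEGA = (date(2018, 1, 8), date(2018, 1, 21))      # time interval of interest
SEGMENT_LEN = 7
SLOTS = [("w_mon_wed", 3), ("w_thu_fri", 2), ("w_sat_sun", 2)]

def f_T(timestamp: date) -> int:
    """Map an event timestamp onto the index of a periodic time slot, or -1."""
    if not (OMEGA[0] <= timestamp <= OMEGA[1]):
        return -1
    pos = (timestamp - OMEGA[0]).days % SEGMENT_LEN
    for i, (_, length) in enumerate(SLOTS):
        if pos < length:
            return i
        pos -= length
    return -1

# Raw data: one row per event with its timestamp and non-temporal attributes.
raw = [
    ("e1", date(2018, 1, 8),  {"place-r", "act-1"}),
    ("e3", date(2018, 1, 13), {"place-n", "act-2"}),
    ("e9", date(2018, 3, 1),  {"place-m", "act-1"}),   # outside Omega: dropped
]

# Conceptual scaling: each retained event holds exactly one temporal attribute d_i
# (gId_i if and only if f_T(g) = i).
formal_context = {}
for name, ts, attrs in raw:
    i = f_T(ts)
    if i >= 0:
        formal_context[name] = attrs | {SLOTS[i][0]}

for g, attrs in formal_context.items():
    print(g, sorted(attrs))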
4.4. Assessing Granulations

A problem related to the interpretation of concept lattices is often caused by the fact that the number and structure (intensional and extensional parts) of the extracted formal concepts are not satisfactory. A large number of formal concepts, corresponding to the information granules, provides an overly fine granulation of the input objects. On the other hand, a small number of formal concepts (information granules) may not provide sufficient information about the input objects for the decision makers. Therefore, adjusting the granularity level of attributes in order to acquire suitable knowledge from the corresponding formal concepts becomes an important issue [4]. We need an approach to assess granulations in order to evaluate the knowledge provided by granules. Such an approach can be based on one or more existing measures, introduced below.

Given a formal context K(G, M, I) and the corresponding lattice L(G, M, I), a first couple of measures [25, 26] with which it is possible to evaluate a lattice/granulation is represented by the Information Entropy (IE):

IE(L) = \frac{1}{|G|} \sum_{g \in G} \left( 1 - \frac{|g^{II}|}{|G|} \right)     (1)

and the Information Granulation (IG): IG = 1 − IE. IG (and IE) provides information on the granules within a lattice by taking into account the number of objects (extensional part) included in each granule. For each couple of formal contexts K1(G1, M, I1) and K2(G2, M, I2), if IG(L1) > IG(L2) then K1 provides a granulation that is more interesting than that of K2 with respect to our aim of finding co-occurrences. This is intuitive because L1 provides bigger granules, i.e., granules with more objects, while the set of attributes is the same, i.e., the time slots have the same lengths. Moreover, given two formal contexts K1(G1, M1, I1) and K2(G2, M2, I2) where: i) M1 = M ∪ {a}, M2 = M ∪ {a1, a2}, and ii) ∀g1 ∈ G1 ∃g2 ∈ G2 such that when g1I1m holds then g2I2m holds and when g1I1a holds then exactly one of g2I2a1 and g2I2a2 holds, then IG(L1) ≥ IG(L2). If IG(L1) = IG(L2) then K2 is more suitable than K1; otherwise we need more information to compare K2 and K1 with respect to their interestingness related to the discovery of co-occurrences. This is intuitive because we are decomposing an attribute, providing finer intervals in place of a coarser interval.

IG and IE give us a framework to evaluate a set of granulations driven by different definitions of specific periodic time slots, which are represented by the set of temporal attributes included in the set of attributes of the formal context.
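A direct computation of Eq. (1) from a formal context, stored as a dictionary mapping each object to its attribute set, can be sketched as follows (an assumed toy context is used; |g^{II}| is obtained as the extent size of the object concept of g).

def object_closure_sizes(context):
    """For each object g, compute |g''|: the extent size of its object concept."""
    G = set(context)
    sizes = {}
    for g in G:
        intent = context[g]                                   # g' (attributes of g)
        extent = {h for h in G if intent <= context[h]}       # g'' (objects sharing them)
        sizes[g] = len(extent)
    return sizes

def information_entropy(context):
    G = set(context)
    sizes = object_closure_sizes(context)
    return sum(1 - sizes[g] / len(G) for g in G) / len(G)     # Eq. (1)

def information_granulation(context):
    return 1 - information_entropy(context)                   # IG = 1 - IE

# Assumed toy context: larger IG values correspond to coarser granules.
ctx = {"g1": {"a", "w1"}, "g2": {"a", "w1"}, "g3": {"b", "w2"}}
print(information_granulation(ctx))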
4.5. Evaluating the uniqueness level of granules

In order to elicit more information on individual granules it is possible to use the measure of separation (SEP), introduced in [27, 28] and defined as:

SEP(A, B) = \frac{|A| \, |B|}{\sum_{g \in A} |g^{I}| + \sum_{m \in B} |m^{I}| - |A| \, |B|}     (2)

where (A, B) is a formal concept. The SEP index can be used to describe how well a formal concept sorts out the objects it covers from the other objects and, jointly, how well it sorts out the attributes it covers from the other attributes of the formal context. Therefore, SEP is defined by calculating the ratio between the area covered in the formal context by a formal concept (A, B) and the total area covered by its objects and attributes. The SEP index gives us a relative measure of how well a specific granule is separated from the other ones and it can be useful to identify granules with unique characteristics (in terms of both objects and attributes). The SEP index does not take the attribute semantics into account.
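Under the same dictionary-based representation of a formal context used above, Eq. (2) can be sketched as follows (the function name and the representation are assumptions of this sketch, not part of the original formulation).

def separation(extent, intent, context):
    """SEP(A, B): area of the concept over the total area covered by its objects and attributes."""
    obj_area = sum(len(context[g]) for g in extent)                                   # sum of |g'|
    attr_area = sum(sum(1 for g in context if m in context[g]) for m in intent)       # sum of |m'|
    area = len(extent) * len(intent)
    return area / (obj_area + attr_area - area) if area else 0.0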
4.6. Evaluating the interestingness of granules and granulations

For this aim, the Granular Computing literature [29, 30] provides general measures to evaluate individual granules within a given granulation. The most important measures are coverage (COV) and specificity (SP). As reported in [31], coverage is concerned with the ability of an information granule to represent (cover) data: in general, the larger the number of data being covered, the higher the coverage of the information granule. The author of [31] also states that specificity, intuitively, relates to the level of abstraction of the information granules: the higher the specificity, the lower the level of abstraction. These two measures have to be specialized by taking into account both the nature and the structure of granules. In particular, given a lattice L(G, M, I), for each formal concept (information granule) (A, B) in such a lattice, it is possible to calculate its coverage and specificity:

COV(A, B) = \frac{|A|}{|G|},     (3)

SP(A, B) = 1 - \sum_{d \in B \cap D} \frac{len(d)}{range},     (4)

where len(d) is the length of the periodic time slot S_d represented by the attribute d and range is the sum of the lengths of all the considered time slots D = {d_0, d_1, ..., d_{t-1}}, corresponding to the number of time units in a single segment.
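A possible rendering of Eqs. (3) and (4), together with the aggregated index Q(A, B) = COV(A, B) · SP(A, B)^ζ discussed next, is sketched here; the slot lengths (2, 2 and 3 time units over a 7-unit segment) anticipate the third decomposition of the illustrative example of Section 5, while the three-object extent is a placeholder, since only its size matters for COV.

def coverage(extent, num_objects):
    return len(extent) / num_objects                          # COV(A, B) = |A| / |G|

def specificity(intent, slot_lengths, segment_length):
    """SP(A, B): one minus the fraction of the segment covered by the slots in B.
    Granules without temporal attributes get SP = 1 here."""
    covered = sum(slot_lengths[d] for d in intent if d in slot_lengths)
    return 1 - covered / segment_length

def interestingness(extent, intent, num_objects, slot_lengths, segment_length, zeta=1.0):
    return coverage(extent, num_objects) * specificity(intent, slot_lengths, segment_length) ** zeta

# Assumed usage: three placeholder objects out of six, one temporal attribute of length 3.
slot_lengths = {"w1_1": 2, "w1_2_1": 2, "w1_2_2": 3}
print(interestingness({"o1", "o2", "o3"}, {"act-1", "w1_2_2"}, 6, slot_lengths, 7))  # about 0.29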
The previous measures allow us to describe the characteristics of an information granule, represented by a formal concept, along two coordinates [32]. This capability will be exploited in the next parts of this study to graphically describe granules. Moreover, such measures can be aggregated in order to provide a unique index representing the interestingness of the granule: Q(A, B) = COV(A, B) · SP(A, B)^ζ, where the power ζ offers some flexibility in expressing the relative importance of the conflicting requirements of coverage and specificity. The higher the value of ζ, the more essential is the facet of specificity. Furthermore, if we would like to consider the whole granulation, it is possible to calculate its average interestingness. This can be accomplished by using the following formula:

\bar{Q}(L) = \sum_{(A,B) \in L,\; B \cap D \neq \emptyset} \frac{Q(A, B)}{n}     (5)

where n is the cardinality of the set of formal concepts in L having a temporal attribute in their intensional part, i.e., those (A, B) such that B ∩ D ≠ ∅.
4.7. Combining measures to support granulation decisions

Making reference to the process in Fig. 1, once the assessment results of the current granulation are observed and interpreted by the human operator, he/she can modify the time-related parameters in order to re-granulate, changing the lens (e.g., zooming in or zooming out) through which the events are observed. This section provides a set of recommendations useful to configure a new granulation. In particular, when performing a further time-guided granulation starting from the results of a previous one, two important decisions concern which temporal attributes (in the formal context) have to be decomposed and how to decompose them. The attribute selection can be done by considering the temporal attribute included in the granule with big coverage and small specificity, i.e., with the greatest COV/SP index. Such a choice allows us to further investigate the granule characterized by numerous events occurring in a large periodic time slot.

On the one hand, we can separate these events and continue to have a sufficient coverage. On the other hand, we can reduce the length of the periodic time slot, given that the current one is not significant. The second issue is to decide how to decompose the selected temporal attribute. First of all, we can decide on the number of parts to obtain by decomposing the given attribute. For simplicity, we choose to consider only three parts, defined by means of two values α and β. Finding optimal values of α and β is the subject of several existing studies; there exist several approaches based, for instance, on Information Theoretic Rough Sets [33].

Additionally, we would like to define the decomposition operator ⊢ that can be used to split a specific temporal attribute d ∈ D, related to a given formal context K(G, M, I) with D ⊆ M, into a set of temporal attributes {d'_1, ..., d'_n} in order to obtain a new formal context K'(G', M', I') where M' = M − {d} ∪ {d'_1, ..., d'_n} and ∀g ∈ G ∃g' ∈ G' such that when gIm holds then g'I'm also holds and when gId holds then exactly one of g'I'd'_i (for i = 1, ..., n) also holds. For instance, the statement d ⊢ d1 d2 indicates that the temporal attribute d is decomposed into the set {d1, d2}.

After realizing a new granulation by decomposing a temporal attribute d into a set of smaller temporal attributes d'_1, d'_2, ..., d'_k, it could be useful to understand whether the new granulation provides more interesting knowledge with respect to the previous one. In this case, the following statement holds:

COV(A, B) ≥ COV(A', B'),     (6)

where B' = B − {d} ∪ {d'_i} (i = 1, ..., k). Additionally, the following statement also holds:

SP(A, B) ≤ SP(A', B'),     (7)

where B' = B − {d} ∪ {d'_i} (i = 1, ..., k).

Moreover, we can provide the definition of \bar{Q} applied to specific periodic time slots:

\bar{Q}(d) = \sum_{(A,B) \in L,\; B \cap \{d\} \neq \emptyset} \frac{Q(A, B)}{n_d}     (8)

where n_d is the cardinality of the set of formal concepts in L having the temporal attribute d in their intensional part. It is possible to know whether a decomposition leads to a better granulation by checking if max(\bar{Q}(d'_1), \bar{Q}(d'_2), ..., \bar{Q}(d'_k)) ≥ \bar{Q}(d).

Lastly, we can assess two alternative decompositions (obtained by applying the operator ⊢ two times and using different target sets for the function range) of the same temporal attribute by: i) realizing the associated granulations and ii) comparing the two sums (one for each granulation) of the Q index of all timed information granules. The greater sum is associated with the most suitable granulation in terms of interestingness.
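The decomposition operator ⊢ and the comparison of granulations via \bar{Q} can be sketched as follows (the helper names and the context representation are assumptions of this sketch).

def decompose(context, d, assign):
    """Sketch of the decomposition operator: split the temporal attribute d.

    `assign(g)` must return, for every object g holding d, exactly one of the
    finer temporal attributes replacing d (e.g., chosen from g's timestamp).
    """
    new_context = {}
    for g, attrs in context.items():
        attrs = set(attrs)
        if d in attrs:
            attrs.discard(d)
            attrs.add(assign(g))
        new_context[g] = attrs
    return new_context

def mean_interestingness(concepts, q, d):
    """Q-bar(d): average Q over the concepts whose intent contains attribute d (Eq. 8)."""
    values = [q[(extent, intent)] for extent, intent in concepts if d in intent]
    return sum(values) / len(values) if values else 0.0

A decomposition d ⊢ d'_1 ... d'_k is then retained when the maximum of the \bar{Q}(d'_i) values computed on the new lattice is at least \bar{Q}(d) computed on the previous one, as stated above.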
4.8. Including Spatial Information
In order to extend the proposed approach to also include spatial information, it is possible to construct a formal context with additional attributes related to locations/places. In particular, if M is the set of attributes, G is the set of objects and D ⊆ M is the subset of all time-related attributes (periodic time slots as defined in Section 4.3), we consider a further subset of M, namely P = {p0, p1, ..., p_{h−1}}, of places in which the events of interest have occurred. The construction of the subset P can be realized by considering a granulation process similar to that employed for the granulation along the time dimension. The sets D and P are disjoint: D ∩ P = ∅. The resulting lattice will contain concepts including both temporal and spatial information within their intensional parts. In this case, it will be possible to look for periodic spatio-temporal occurrences or co-occurrences. For instance, Fig. 4(c) shows an information granule containing events (or entities) {oo1, oo5} that periodically co-occur at time w1,2,2 in the place place-r. Therefore, we will consider a new type of information granules, represented by formal concepts (A, B) whose intent B contains both a temporal attribute d ∈ D and a spatial attribute p ∈ P.
5. Illustrative Example

The initial dataset consists of six records providing information related to the space and time of activities executed by specific actors (users). We use FCA to detect periodic co-occurrences. Therefore, the first step is to choose the time unit, segment and time slots in order to construct the set D of temporal attributes in the formal context. In this example, we describe three different ways of granulation guided by three different decompositions of a segment. In the first decomposition, the time unit is set to T_day (days), the segment is represented by the week (seven days) and, initially, there is only one periodic time slot (i.e., w1) representing all the days in a week. Subsequently, for the second decomposition, there are two periodic time slots obtained by grouping the seven time units in a segment into two parts: the first part (first slot w1,1) including Monday and Tuesday, and the second part (second slot w1,2) including Wednesday, Thursday, Friday, Saturday and Sunday. The third decomposition foresees three different slots: the first one composed by Monday and Tuesday (equal to the previous decomposition w1,1), the second one (w1,2,1) including Wednesday and Thursday, and, lastly, the third one (w1,2,2) represented by Friday, Saturday and Sunday. The above three decompositions can be described by the operation chain w1 ⊢ w1,1 w1,2 ⊢ w1,1 w1,2,1 w1,2,2. This chain leads to three different granulations of the universe of available events and can be seen as a sequence of zooming-in operations.

In order to execute a lattice building algorithm, the construction of formal contexts is needed. Therefore, the attributes have to be conceptually scaled. In more detail, we have three attributes related to space (i.e., place-r, place-n and place-m), two related to the activity type (i.e., act-1 and act-2) and, lastly, the temporal attributes indicated above. Table 2 shows the three formal contexts constructed by accommodating the initial dataset with respect to the three aforementioned decompositions of the temporal attributes. The original dataset comes from a log file in which activities of two types (act-1 or act-2) have been executed at a given time (the timestamp is scaled onto the periodic time slots represented by the temporal attributes) in a specific location (place-r, place-n or place-m) by different human actors.
Table 2: Three formal contexts. (a) Decomposition w1; (b) decomposition w1,1, w1,2; (c) decomposition w1,1, w1,2,1, w1,2,2. Each sub-table reports the objects oo1, ..., oo6 against the spatial attributes (place-r, place-n, place-m), the activity attributes (act-1, act-2) and the temporal attributes of the corresponding decomposition.
Once the granulation parameters have been established, we have to execute the FCA algorithm three times, once for each formal context obtained before. If we calculate the IG for each lattice, constructed respectively from the three provided formal contexts, we obtain the values IG(L1) = 0.28, IG(L2) = 0.28 and IG(L3) = 0.22. This result is convincing, given that information granules are known to become larger when attributes are removed from the formal context [25]. Thus, if we decompose temporal attributes into finer parts we obtain a smaller or equal IG value. The lattices generated from the first and the second context provide the same IG value because they have de facto the same number of attributes (note that w1,1 has no object supporting it). If we increase the number of temporal attributes (by decomposing in the way adopted before) and the IG value does not change, then we achieve a more detailed view of the data. In brief, IG provides a useful approach to evaluate the degree of fineness (or coarseness) of information granules: the larger the IG of a lattice, the coarser the information granules and, consequently, the less focused the view on the data. Fig. 4 shows how granules vary along the different granulations. It is important to underline that the three results must be considered in the light of the size of the periodic time slots we are considering. In general, using finer time slots allows us to gather more precise knowledge about periodic co-occurrences.

Figure 4: Information granules in the lattices constructed from the three formal contexts.
Table 3: COV, SP and Q measures applied to the granules of the third granulation.

GRANULE (INT)            COV    SP     Q
place-m act-2 w1,2,2     0.17   0.57   0.10
place-n act-1 w1,2,2     0.17   0.57   0.10
place-r act-2 w1,2,2     0.17   0.57   0.10
place-n act-1 w1,2,1     0.17   0.71   0.12
place-r act-1 w1,2,2     0.33   0.57   0.19
act-2 w1,2,2             0.33   0.57   0.19
place-n act-1            0.33   N/A    N/A
act-1 w1,2,2             0.50   0.57   0.29
place-r w1,2,2           0.50   0.57   0.29
act-1                    0.0    N/A    N/A
w1,2,2                   0.0    0.57   0.0
In the provided example, it is possible to see that the granulation degree calculated by applying the IG measure is the same for the first and the second lattice. This is due to the fact that we have the same granules in both lattices; in turn, this is given by the executed decomposition, which does not change the granule composition. Taking into account the discussion of Section 4.7, we can assert that if we choose a time decomposition leading to a granulation with an unsuitable IG value, we have to go back and try another decomposition. The attribute to decompose can be selected, empirically, by considering those attributes representing time slots with great length and holding a great number of objects. Now, it is possible to employ the measures described in Section 4.6 in order to evaluate the interestingness of all granules in the available lattices. The COV, SP and Q indexes (calculated by using ζ = 1) are applied to all granules of the three lattices in Fig. 4. In particular, Table 3 reports such values for the third granulation. We can conclude that two granules of the third granulation seem to be more interesting than the others. Such granules are positioned in the same point and provide the same Q value of approximately 0.29: (place-r, w1,2,2) and (act-1, w1,2,2). Given that, for this example, we are focusing only on temporal aspects, it can be acceptable that the above granules have the same Q value representing their interestingness. The interpretation of such values is that there is a periodic temporal (co-)occurrence of activity act-1, executed periodically in the time slot w1,2,2, and another one for any activity in the same time slot in a specific location called place-r.
6. Forest Fires Periodicity: a case study

The goal of this case study is to analyse the periodicity of forest fires in the Montesinho natural park (Portugal). The study results can be used, for instance, to predict next year's forest fires. The data used in the experiments was collected from January 2000 to December 2003 and it was built using two sources. The first database was collected by the inspector who was responsible for the Montesinho fire occurrences. On a daily basis, every time a forest fire occurred, several features were registered, such as the time, the date, the spatial location within a 9x9 grid, the type of vegetation involved, the six components of the FWI system and the total burned area. The second database was collected by the Bragança Polytechnic Institute and contains several weather observations (e.g., wind speed) that were recorded with a 30-minute period by a meteorological station located in the center of the Montesinho park. The two databases were stored in tens of individual spreadsheets, under distinct formats, and a substantial manual effort was performed to integrate them into a single dataset with a total of 517 entries. This data is available at: http://www.dsi.uminho.pt/pcortez/forestfires/. The dataset has been pre-processed (as indicated in Section 4 and, in particular, in Section 4.3) in order to construct a set of formal contexts that are needed to execute the FCA algorithm and evaluate the corresponding time granulations. To pre-process the data we have assumed the time unit T_month and the time interval of interest Ω = [January 2000, December 2003]. The time unit and the time interval of interest are fixed also for the next granulations; the number and lengths of the time slots vary along the granulation sequence. The starting granulation has been guided by the periodic time slots month_tri1, month_tri2 and month_tri3, corresponding to the three parts of the year (each part of four months) repeated for each segment (each segment has a length of one year). This means that each event (forest fire) falls in only one of the defined periodic time slots and supports only one of the attributes month_tri1, month_tri2 and month_tri3 after performing the conceptual scaling operation along the time dimension. An additional conceptual scaling operation has been realized for the location information included in each row of the dataset. In particular, we have added one attribute for each combination of x and y values corresponding to a specific cell in the 9x9 grid. More in detail, a forest fire occurred in the location (x1, y1) holds the attribute Place_x1_y1. The location attributes and values do not vary along the granulation sequence and, additionally, for this case study, we have not used the other information (fire and weather) attached to each event. After realizing the granulation by considering the above configuration, the IG measure is 0.025. The application of the Q index on the lattice obtained from the first formal context is reported in Fig. 5 as a heat map whose cells represent timed information granules; the cell values are the Q values, organized into a grid of times (x axis) and places (y axis).
Figure 5: Q values for the first granulation.
As observed in Fig. 5, the most interesting granule is the one holding Place_8_6 and falling in the periodic time slot month_tri2. In brief, the largest number of forest fires occurs in the months of May, June, July and August. Next, Fig. 6 shows the COV/SP values for all the timed information granules. In this case, the granule holding the month_tri2 attribute has the highest COV/SP value. Thus, it is plausible to decompose the slot corresponding to such an attribute to better observe the data by a zooming-in operation. Therefore, it is possible to apply the decomposition operator ⊢ to split the periodic time slot containing May, June, July and August into two periodic time slots: the first one containing May and June, the second one containing July and August.
one containing July and August. Take care that the parameters (number and lengths of periodic time slots) adopted for the decomposition operation have 25
Figure 6:
M
AN US
CR IP T
ACCEPTED MANUSCRIPT
COV SP
values for the first granulation.
ED
been choosed arbitrary. The obtained periodic time slots are represented by two new temporal attributes month tri2mj and month tri2ja. Thus, we have built a second formal context by dropping from the previous one the attribute month tri2 and inserting month tri2mj and month tri2ja. Every event pre-
PT
540
viously holding month tri2 holds now exaclty one between month tri2mj and
CE
month tri2ja.
The second granulation produced a lattice that shows the most interesting
granule holding month tri2ja and again P lace 86. It is clear than there is a strong periodicity in the period [July, August] for the cell (8, 6) of the 9x9 grid
AC 545
attached to the Montesinho park. It is important to note that if we calculate the interestingness of the at-
¯ index, it is smaller tribute month tri2 for the first granulation by means of the Q ¯ indexes of the attributes month tri2mj and than the maximum between the Q
26
ACCEPTED MANUSCRIPT
550
month tri2ja in the second granulation. As stated in Section 4.7 we can affirm that the second granulation is better than the first in terms of acquired knowl¯ ¯ edge because max(Q(month tri2mj) = 0.00278, Q(month tri2ja) = 0.01243) ≥ Q formula. If we take a look to the
555
COV SP
CR IP T
¯ ¯ values are obtained with ζ = 1.0 in the Q(month tri2) = 0.00977. Q and Q
values for the second granulation
it is possible to know that the attribute month tri2ja is the most suitable to be decomposed to find a better granulation. In fact, the third granulation de-
composes the attribute month tri2ja into two attributes month tri2ja j (July) and month tri2ja a (August). First of all, the third granulation, for ζ = 1.0,
560
AN US
¯ is better than the second one if we consider that max(Q(month tri2ja j) =
¯ ¯ 0.00355, Q(month tri2ja a) = 0.01255) ≥ Q(month tri2ja) = 0.01243. Also for the third granulation the most interesting granule continues to be the one holding P lace 86 along to the space dimension. However, in this granulation we have that the above granule holds month tri2ja a along the time dimension.
fires in the cell (8, 6) of the 9x9 grid. With respect to the same granula-
M
565
This indicates that in August there is a strong periodicity of forest
tion it is interesting to note that if we set to 0.5 the value for the parame-
ED
ter ζ we obtain that the third granulation is not better than the second one. ¯ ¯ In fact max(Q(month tri2ja j) = 0.00370, Q(month tri2ja a) = 0.01310) < ¯ Q(month tri2ja) = 0.01362. Next, if we would like to realize a further zoomingin operation on data it is possible to observe again the
PT
570
COV SP
-values. In this case,
we would need a new time unit if splitting the attribute month tri2ja a that COV SP
value. Thus, we decide to keep the above attribute and
CE
has the maximum
split month tri3, providing the second higher value for the
COV SP
index, in order
to obtain the new two temporal attributes month tri3so (for September and October) and month tri3nd (for November and December). Once the new for-
AC
575
mal context has been built and the FCA algorithm has been applied, the fourth granulation can be evaluated. In particular, another strong periodicity of forest fires occurs in the period from September to October and, in particular, in cells (7, 4) and (6, 5). The fourth granulation is better than the previous one with
580
respect to both ζ = 1.0 and ζ = 0.5. If we follow the previous approach we can 27
ACCEPTED MANUSCRIPT
further decompose month tri3so into month tri3so s and month tri3so o for a fifth granulation. In this case, if ζ = 1 than the granulation is considered better than the previous one. This is not true if ζ = 0.5. At this point we decided
585
given the
COV SP
CR IP T
to stop the process given that we would not like to change the time unit and, values, we had not other suitable attribute to decompose. If
we caluclate the SEP -values for the fourth granulation we obtain higher values
for P lace 86 in month tri2ja a (August) and for P lace 4,6 in month tri3nd
(November-December). Therefore, the aforementioned two granules have high
uniqueness levels given that their events have temporal and/or spatial components that are not shared with many other events in the considered universe.
AN US
590
In this case, in November and December we have found that the ratio between the number of forest fires localized in P lace 46 and that localized in other cells in the 9x9 grid is anomalous. This ratio is relatively much greater for P lace 46 than for the other cells.
7. Final Remarks and Future Works
M
595
The paper proposes an overall approach to use FCA and the paradigm of
ED
Granular Computing to analyse data, containing information about events, in terms of their periodicity. FCA is used as a granulation mechanism able to gen-
600
PT
erate different granulated structures based on its current configuration. Such structures contain timed information granules, i.e., a group of related events, occurring periodically in the same (periodic) time slot and also sharing other
CE
characteristics (space, category, etc.). Multi-views on data, supported by the granular computing paradigm, are needed because they metaphorically offer the chance to observe data through different lens with different zoom levels. This allows to elicit additional knowledge (if we can correctly interpret it) with respect
AC 605
to the one obtained by means of a single-view. The main contribution of the paper is to provide solutions to support the discovery of significant periodicities in data by defining an approach to perform time-guided granulations of data and a framework to evaluate the quality of these granulations. At the moment,
28
ACCEPTED MANUSCRIPT
610
the proposed approach does not foresee a (semi-) automatic procedure for finding, in an effective and efficient way, the most suitable granular structures and
CR IP T
granules but the authors may address this issue in future works.
References References 615
[1] Y. Zhang, W. Liu, N. Ding, X. Wang, Y. Tan, An event ontology descrip-
tion framework based on skos, in: IEEE 15th International Conference on
AN US
Scalable Computing and Communications and Its Associated Workshops (UIC-ATC-ScalCom), 2015, pp. 1774–1779.
[2] H. W. Lauw, E.-P. Lim, H. Pang, T.-T. Tan, Social network discovery by mining spatio-temporal events, Computational & Mathematical Organiza-
620
tion Theory 11 (2) (2005) 97–118.
M
[3] Y. Chen, Y. Yao, Multiview intelligent data analysis based on granular computing., in: Granular Computing (GrC), 2006, pp. 281–286.
ED
[4] Q. Zhang, Y. Xing, Formal concept analysis based on granular computing, J. Comput. Inf. Syst 6 (7) (2010) 2287–2296.
625
PT
[5] J. Li, C. Mei, W. Xu, Y. Qian, Concept learning via granular computing: A cognitive viewpoint, Information Sciences 298 (Supplement C) (2015) 447 – 467. doi:https://doi.org/10.1016/j.ins.2014.12.010.
CE
URL
http://www.sciencedirect.com/science/article/pii/
S0020025514011475
630
AC
[6] J. Li, C. Huang, J. Qi, Y. Qian, W. Liu, Three-way cognitive concept
635
learning via multi-granularity, Information Sciences 378 (Supplement C) (2017) 244 – 263. doi:https://doi.org/10.1016/j.ins.2016.04.051. URL
http://www.sciencedirect.com/science/article/pii/
S0020025516303048
29
ACCEPTED MANUSCRIPT
[7] K. E. Wolff, Temporal concept analysis, in: ICCS-2001 International Workshop on Concept Lattices-Based Theory, Methods and Tools for Knowledge Discovery in Databases, Stanford University, Palo Alto (CA), 2001, pp. 91–
640
CR IP T
107. [8] A. Le, M. Gertz, Mining periodic event patterns from rdf datasets, in: East
European Conference on Advances in Databases and Information Systems, Springer, 2013, pp. 162–175.
[9] J. T. Yao, A. V. Vasilakos, W. Pedrycz, Granular computing: perspectives
645
AN US
and challenges, IEEE Transactions on Cybernetics 43 (6) (2013) 1977–1989.
[10] S. O. Kuznetsov, S. A. Obiedkov, Comparing performance of algorithms for generating concept lattices, Journal of Experimental & Theoretical Artificial Intelligence 14 (2-3) (2002) 189–216.
[11] J. Hu, T. Li, H. Wang, H. Fujita, Hierarchical cluster ensemble model based
M
on knowledge granulation, Knowledge-Based Systems 91 (2016) 179–188. doi:10.1016/j.knosys.2015.10.006.
650
ED
URL https://doi.org/10.1016/j.knosys.2015.10.006 [12] G. Wilke, E. Portmann, Granular computing as a basis of human–data interaction: a cognitive cities use case, Granular Computing 1 (3) (2016)
655
PT
181–197.
[13] Y. Jing, T. Li, H. Fujita, Z. Yu, B. Wang, An incremental attribute re-
CE
duction approach based on knowledge granularity with a multi-granulation view, Inf. Sci. 411 (2017) 23–38. doi:10.1016/j.ins.2017.05.003.
AC
URL https://doi.org/10.1016/j.ins.2017.05.003
[14] A. Bargiela, W. Pedrycz, The roots of granular computing, in: Granular
660
Computing, 2006 IEEE International Conference on, IEEE, 2006, pp. 806– 809.
30
ACCEPTED MANUSCRIPT
[15] L. A. Zadeh, Toward a theory of fuzzy information granulation and its centrality in human reasoning and fuzzy logic, Fuzzy sets and systems 90 (2) (1997) 111–127. [16] R. Wille, Restructuring lattice theory: an approach based on hierarchies of
CR IP T
665
concepts, in: Ordered sets, Springer, 1982, pp. 445–470.
[17] U. Priss, Formal concept analysis in information science, Arist 40 (1) (2006) 521–543.
[18] G.-Y. WANG, Q.-H. ZHANG, X.-A. MA, Q.-S. YANG, Granular comput-
AN US
ing models for knowledge uncertainty, Journal of Software 4 (2011) 007.
670
[19] M. C. Lee, Z. L. Liu, H. H. Chen, J. B. Lai, Y. T. Lin, Fca based concept constructing and similarity measurement algorithms, in: Advanced Information Management and Service (IMS), 2010 6th International Conference on, IEEE, 2010, pp. 384–388.
[20] R. Wille, Formal concept analysis as mathematical theory of concepts and
M
675
ED
concept hierarchies, in: Formal concept analysis, Springer, 2005, pp. 1–33. [21] C. De Maio, G. Fenza, V. Loia, F. Orciuoli, Unfolding social content evolution along time and semantics, Future Generation Computer Systems 66
680
PT
(2017) 146–159.
[22] C. D. Maio, G. Fenza, V. Loia, F. Orciuoli, Making sense of cloud-sensor
CE
data streams via fuzzy cognitive maps and temporal fuzzy concept analysis, Neurocomputing 256 (Supplement C) (2017) 35 – 48, fuzzy Neuro Theory
AC
and Technologies for Cloud Computing.
[23] W. Wei, J. Wang, J. Liang, X. Mi, C. Dang, Compacted decision tables
685
based attribute reduction, Knowledge-Based Systems 86 (2015) 261–277.
[24] Y. Jing, T. Li, C. Luo, S.-J. Horng, G. Wang, Z. Yu, An incremental approach for attribute reduction based on knowledge granularity, KnowledgeBased Systems 104 (2016) 24–38. 31
ACCEPTED MANUSCRIPT
[25] C. Huang, J. Li, S. Dias, Attribute significance, consistency measure and attribute reduction in formal concept analysis, Neural Network World 26 (6)
690
(2016) 607–623.
CR IP T
[26] P. K. Singh, A. K. Cherukuri, J. Li, Concepts reduction in formal concept analysis with fuzzy setting using shannon entropy, International Journal of Machine Learning and Cybernetics (2015) 1–11. 695
[27] M. Klimushkin, S. Obiedkov, C. Roth, Approaches to the selection of rel-
evant concepts in the case of noisy data, in: International Conference on
AN US
Formal Concept Analysis, Springer, 2010, pp. 255–266.
[28] S. O. Kuznetsov, T. Makhalova, On interestingness measures of formal concepts, arXiv preprint arXiv:1611.02646. 700
[29] W. Pedrycz, Concepts and design aspects of granular models of type-1 and
(2015) 87–95.
M
type-2, International Journal of Fuzzy Logic and Intelligent Systems 15 (2)
[30] W. Pedrycz, The principle of justifiable granularity and an optimization of
ED
information granularity allocation as fundamentals of granular computing, Journal of Information Processing Systems 7 (3) (2011) 397–412.
705
PT
[31] W. Pedrycz, Algorithmic Developments of Information Granules of Higher Type and Higher Order and Their Applications, Springer International
CE
Publishing, Cham, 2017, pp. 27–41. [32] W. Pedrycz, Granular Computing: Analysis and Design of Intelligent Systems, Industrial Electronics, Taylor & Francis, 2013.
AC
710
[33] X. Deng, Y. Yao, An information-theoretic interpretation of thresholds in probabilistic rough sets., in: RSKT, Springer, 2012, pp. 369–378.
32