Applied Soft Computing 9 (2009) 786–805
Applied Soft Computing journal homepage: www.elsevier.com/locate/asoc
Granular Rough Theory: A representation semantics oriented theory of roughness
Bo Chen *, Ming Sun, Mingtian Zhou
College of Computer Science & Engineering, University of Electronic Science & Technology of China, Chengdu 610054, China
ARTICLE INFO
Article history:
Received 27 October 2005
Received in revised form 27 July 2008
Accepted 28 July 2008
Available online 5 November 2008
Keywords: Granular Representation Calculus; Granular Rough Theory; Granular-Rough Computational Web Intelligence
ABSTRACT
The present work is an archival paper for a series of contributions proposed in the last few years on building a theory of roughness over purely mereological relations among information granules. Five major efforts are made in the present paper: (1) emphasizing the representational semantics of the theory of roughness, namely to approximately represent a class of entities characterized by some aspects in terms of entity collections described by other aspects; (2) defining a representation model, Granular Representation Calculus (GrRC), to synthesize complex information systems from information granules; (3) establishing the notion of Granular Rough Theory (GrRT) over information granules operated on in terms of GrRC; (4) extending GrRC/GrRT to various computational environments such as multi-agent systems and ontological computing environments; (5) exploring pragmatic aspects of GrRC/GrRT by implementing prototypes with data-model and object-programming orientations, and proposing an Ontology-Driven Web Information System as a granular-rough computational Web intelligence framework over GrRC/GrRT. © 2008 Elsevier B.V. All rights reserved.
* Corresponding author. Tel.: +86 28 8320 8861. E-mail address: [email protected] (B. Chen).
1568-4946/$ – see front matter © 2008 Elsevier B.V. All rights reserved. doi:10.1016/j.asoc.2008.07.008
1. Introduction
Rough Set Theory, founded by Pawlak [1], is recognized as a major soft computing methodology for capturing the semantics of vagueness as defined by Frege: ‘‘imprecision is presented as set boundary rather than fuzzy membership’’ [2]. Mereology is the theory of ‘‘part-to-whole’’ relations, which has long been a philosophical discipline investigating the mutual relationships among entities in ontological research. Lesniewski's Mereology is a specific formalization of Mereology developed by the Polish logician Lesniewski to avoid antinomies of Cantor's Set Theory such as the ‘‘vicious circle’’ and the ‘‘maximum set and its superset’’. Together with Protothetic and Ontology, Lesniewski built up a system [3] on par with that of Principia Mathematica by Whitehead and Russell [4]. As the first attempt to combine Rough Set Theory with Lesniewski's Mereology, L. Polkowski and A. Skowron created the theory of Rough Mereology [5], which replaces the crisp part-to-whole primitive of the original Mereology with the notion of ‘‘being a part to some degree’’, viz. Rough Inclusion, which quantifies to what degree one entity is a part of another. Rough Mereology thus extends classic Mereology by introducing roughness into the part-to-whole relation, and it has been applied to quality control in complex component system synthesis [6]. However, Rough Mereology is only an application of the set-theoretic orientation of roughness to classic Mereology, as can be seen from the definition of Standard Rough Inclusion, which is based on the Rough Membership Function [7]. Since Mereology is intended to be a substitute for Set Theory, the authors seek a purely mereological approach to roughness, that is, to define the concepts of upper approximation, lower approximation and boundary using only the part-to-whole relation between information units rather than set-theoretic membership and inclusion. This ultimate intention is driven by the requirement for a more agile representation model for abstracting Information Systems, so as to: (1) explicitly encode the semantic context of an entry of the Information Table in Rough Set Theory into information granules, emphasizing the representational semantics of roughness; (2) naturally extend the applicability of the roughness methodology by accommodating both tabular and semi-structured information sources. In light of these motivations, and based on a series of preliminary works examining the semantics of roughness [8–10], tentative constructions [11], and intuitive extensions [12–14], the authors formally clarified Granular Rough Theory in [15] in its motivational, theoretic, and pragmatic aspects. The present paper is intended to be a thorough archival account of Granular Rough Theory for interdisciplinary audiences. It emphasizes the representational semantics of Rough Set Theory and, by approaching the notion of roughness in a purely mereological way without the terms of Set Theory, validates that roughness theory is not merely an extension of classic Set Theory. Moreover, different adaptation scenarios are proposed, leaving unseen usages of roughness theory in wider domains for further exploration.
As for the layout: in Section 2, the representational semantics of Rough Set Theory is clarified by analyzing the motivation leading to its emergence in terms of common Set Theory notions, investigating its granular nature, and representing this characterization in terms of Lesniewski's Mereology. This approach of deriving Information Granules from the partitioning of an Information System follows naturally from the primitive definitions of Rough Set Theory. Then, in Section 3, Granular Representation Calculus (GrRC) is presented formally, based on the granular intuition stated, as the construction model for a theory of roughness built upon purely mereological (part-to-whole) relations among information granules, which is referred to as Granular Rough Theory (GrRT). In Section 4, GrRC is extended to describe Information Systems in the multi-agent context. In Section 5, the compliance of GrRC in its current status with the framework of Kant's Synthetic a priori is explicated, and a tentative expansion is explored to complement the ontological expressivity of GrRC. In Section 6, two prototypes of Granular Rough Theory are presented in terms of data representation model mapping and object-oriented programming, respectively, and the building blocks of an Ontology-Driven Web Information System framework are clarified as a computational Web intelligence infrastructure for upper-level applications. In Section 7, open issues and a summary of contributions are presented as concluding remarks.
2. Re-examining the essence of roughness
2.1. Motivation leading to RST
In Rough Set Theory, a fundamental notion is the Information System (IS) I = (U, A), in which U is the universe of entities of interest in the system, and A is the set of all attributes.
In a particular case of Information System, the Decision System (DS), $A = C \cup \{d\}$, where C is the set of all Conditional Attributes and {d} is the singleton set of the only Decision Attribute d. One of the initial purposes of Rough Set Theory is to find Decision Rules (DR) in Decision Systems; from the point of view of Knowledge Discovery in Databases, Association Rule (AR) generation [16] is a special case of DR finding. As a general example, an IS I* is given arbitrarily in the form of an Information Table in Table 1. For simplicity, two assumptions are made about I*. First, it has been reduced with respect to mutual dependency between attributes (dependency among attributes does exist objectively; the assumption is made only for convenience of demonstration and does not invalidate the soundness of the system). Second, the numbers in the table stand for specific discrete values of the corresponding Conditional Attribute $c_i$, and the same number in different columns has a different meaning in its concrete semantic context. The DR of I* then takes the form: for $B \subseteq C$, $u \in U$, $Val_B(u) \Rightarrow Val_d(u)$, which means that a valuation of the conditional attribute set B for object u can determine the corresponding value of its decision attribute d.

Table 1
Information Table of I*.
      c1   c2   c3   c4   c5   d
u1    0    1    0    2    4    d1
u2    1    0    1    2    3    d2
u3    0    1    0    2    3    d1
u4    2    0    2    1    2    d2
u5    2    1    1    1    4    d1
u6    2    1    1    0    1    d1
u7    1    0    2    0    0    d1
u8    0    1    1    2    3    d3

This kind of inference is common in disease diagnostics, where the symptoms are the conditional attributes and the status of the disease is the decision attribute; a DR can then be exemplified as an inference from symptoms to disease status. The above formula can also be interpreted from the aspect of Universe partitioning. The left side $Val_B(u)$ forms a partitioning of U based on the valuation of the attributes in B; e.g. for B = {c2, c3}, the partitioning of U based on $Val_B(u)$ is {{u1, u3}, {u2}, {u4, u7}, {u5, u6, u8}}, which we name the Conditional Partitioning (CP). Similarly, the partitioning corresponding to $Val_d(u)$ is {{u1, u3, u5, u6, u7}, {u2, u4}, {u8}}, named the Decision Partitioning (DP). A DR can be regarded as a relationship between partitions in these two different partitionings of U. A DR may state that ‘‘if u has the conditional attributes c2 = 1 and c3 = 0, then u has the decision attribute d = d1’’; as we can observe from the Information Table of I*, there is a set inclusion relationship between the partition {u1, u3} in CP and the partition {u1, u3, u5, u6, u7} in DP. The DR example above is not enough to acquire the complete reasoning power of a given IS, for we need to know all the valuations of B that determine an object's decision attribute to be a particular value. In I*, besides {u1, u3}, arising from the valuation (1, 0) of B = {c2, c3}, we also want to know which valuations of B could lead to the remaining entities {u5, u6, u7} in the same decision partition. From Table 1, these objects have B values (1, 1) and (0, 2), but such valuations of B cannot yield proper DRs, for they also occur as conditional attribute values of objects in other decision partitions, such as u4 and u8. Here the partitions in CP of U corresponding to (1, 1) and (0, 2) are {u5, u6, u8} and {u4, u7}, respectively.
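The partitioning argument above can be checked mechanically. The following minimal Python sketch is only an illustration: `TABLE` is our own transcription of Table 1, and the helper names `partition` and `relation` are hypothetical, not part of the original work. It groups entities by their valuation of B = {c2, c3} and of d, then labels each conditional partition as included in, partially intersecting, or disjoint from the decision partition of d1.

```python
from collections import defaultdict

# Our transcription of Table 1 (information system I*): entity -> attribute values.
TABLE = {
    "u1": {"c1": 0, "c2": 1, "c3": 0, "c4": 2, "c5": 4, "d": "d1"},
    "u2": {"c1": 1, "c2": 0, "c3": 1, "c4": 2, "c5": 3, "d": "d2"},
    "u3": {"c1": 0, "c2": 1, "c3": 0, "c4": 2, "c5": 3, "d": "d1"},
    "u4": {"c1": 2, "c2": 0, "c3": 2, "c4": 1, "c5": 2, "d": "d2"},
    "u5": {"c1": 2, "c2": 1, "c3": 1, "c4": 1, "c5": 4, "d": "d1"},
    "u6": {"c1": 2, "c2": 1, "c3": 1, "c4": 0, "c5": 1, "d": "d1"},
    "u7": {"c1": 1, "c2": 0, "c3": 2, "c4": 0, "c5": 0, "d": "d1"},
    "u8": {"c1": 0, "c2": 1, "c3": 1, "c4": 2, "c5": 3, "d": "d3"},
}


def partition(table, attrs):
    """Group entities by their valuation of `attrs` (equivalence classes of IND)."""
    blocks = defaultdict(set)
    for u, row in table.items():
        blocks[tuple(row[a] for a in attrs)].add(u)
    return dict(blocks)


def relation(cp_block, dp_block):
    """Set-theoretic relation of a conditional block to a decision block."""
    if cp_block <= dp_block:
        return "included"
    if cp_block & dp_block:
        return "partial"
    return "disjoint"


cp = partition(TABLE, ["c2", "c3"])  # Conditional Partitioning, B = {c2, c3}
dp = partition(TABLE, ["d"])         # Decision Partitioning
target = dp[("d1",)]                 # decision partition {u1, u3, u5, u6, u7}

for valuation, block in sorted(cp.items()):
    print(valuation, sorted(block), relation(block, target))
```

Running it labels {u1, u3} as included, {u4, u7} and {u5, u6, u8} as partial, and {u2} as disjoint, which matches the discussion of I* above.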
From the viewpoint of set theory, each of them only partly intersects the specified decision partition {u1, u3, u5, u6, u7}. The partition {u2} in CP has nothing to do with the decision partition of interest in DP. So far, from the previous investigation, we have the following intuitive propositions:
Proposition 1. For an Information System, there are two kinds of partitioning of its Universe, viz. the Conditional Partitioning and the Decision Partitioning.
Proposition 2. For any B-partition $cp_i$ in CP, its corresponding valuation of attribute set B can absolutely determine the decision attribute d of its elements to be $d_i$ iff $cp_i \subseteq dp_i$, where $dp_i$ is the partition in DP with $d = d_i$. We call such a partition in CP a Regular B-partition with respect to $d_i$, denoted by $Rp_B(d_i)$.
Proposition 3. For any B-partition $cp_i$ in CP, its corresponding valuation of attribute set B can lead its decision attribute d to be $d_i$ among more than one value of d iff $cp_i \cap dp_i \neq \emptyset$ and $cp_i \not\subseteq dp_i$, where $dp_i$ is the partition in DP with $d = d_i$. We call such a partition in CP an Irregular B-partition with respect to $d_i$, denoted by $IRp_B(d_i)$.
It is the ambiguity of $IRp_B(d_i)$ that calls for Rough Set Theory. Let $X = \{u \mid d(u) = d_i, u \in U\}$; then the lower approximation of X with respect to B, $\underline{B}X$, is the union of all $Rp_B(d_i)$, and the upper approximation of X, $\overline{B}X$, is the union of $\underline{B}X$ with all $IRp_B(d_i)$. Although we did not explicitly use the crucial basic notions of the Indiscernibility Relation (IND) and its equivalence classes, our narration is rooted in them: IND is represented by identical valuations of a set of objects' conditional attributes, and an equivalence class is represented by a partition in CP over U.
2.2. Granular intuition
Partitions in CP and DP of the Universe of an IS can be defined as Conditional Granules (CGr) and Decision Granules (DGr),
respectively. A valuation of the conditional attribute set B determines a CGr, denoted by $CGr_B(Val_B)$, which is on par with a B-partition in CP. In the same way, $DGr_d(Val_d)$ is equivalent to a partition in DP. Viewed from the universe, when partitions are regarded as ‘‘Information Granules’’, the whole Universe granule is partitioned and the system is decomposed into finer granules of sub-systems. In most cases, only some of the sub-systems are of interest, and a detailed investigation of an individual sub-system can be less complex. This decomposition aspect of Information Granules is common in practical domains and is similar to the Divide and Conquer strategy. In the context of Decision Rule finding in an Information System, the IS is decomposed into multiple Decision Granules, the DRs for each DGr are first found locally, and then the complete DRs of the global IS can be synthesized. In contrast to this top-down perception of the system, when we begin with the finest granules, one for each elementary entity in the Universe, the granular representation of a partition is coarser and the intricacy among the objects in one partition can be concealed, so that the partition becomes a totality of objects having the same characteristics and presents itself as an identifiable and integral individual entity. There is a hierarchy of granularity in an Information System, in which the topmost is the all-in-one granule standing for the universe, the bottommost are the atomic granules for each elementary object, and between them are granules for the different kinds of partitions of the universe. It is tempting to regard this hierarchy of granularity as one constructed by the set inclusion and membership relations of common Set Theory. But in such a view, the relationship of an elementary granule to a partition granule is membership, while the relationship between a partition granule and the Universe granule is set inclusion.
They are not identical in classical Set Theory. Moreover, by a granule we want to represent a set of objects as an individual entity; however, the notions of set and subset in Set Theory by nature denote a collection of entities rather than an entity.
2.3. Mereological interpretation
For a consistent description of granules, regardless of whether they represent an elementary entity or a collection of entities, we employ S. Lesniewski's Mereology as an alternative to Set Theory. Lesniewski's Mereology is a formal theory dealing with the part-to-whole relation between objects. The preliminaries below follow Tarski's concise summary in [17], with the formulas corresponding to Tarski's statements given in [18]. In Mereology, the only primitive notion is the part-to-whole relation (P), which is treated as a relation between individuals. In terms of this notion, there are three definitions of other notions and two postulates:
Def. MERE-1 An individual X is called a proper part of an individual Y if X is a part of Y and X is not identical with Y.
$PP(x, y) \stackrel{\text{def}}{=} P(x, y) \land \lnot(x = y)$
Def. MERE-2 An individual X is said to be disjoint from an individual Y if no individual Z is a part of both X and Y.
$DR(x, y) \stackrel{\text{def}}{=} \lnot \exists z\,[P(z, x) \land P(z, y)]$
Def. MERE-3 An individual X is called a sum of all elements of a class $a$ of individuals if every element of $a$ is a part of X and if no part of X is disjoint from all elements of $a$.
$SUM(a, x) \stackrel{\text{def}}{=} \forall y\,[y \in a \rightarrow P(y, x)] \land \lnot \exists z\,[P(z, x) \land \forall y\,[y \in a \rightarrow DR(y, z)]]$
Postulate 1 If X is a part of Y and Y is a part of Z, then X is a part of Z.
$\forall x\, \forall y\, \forall z\,[P(x, y) \land P(y, z) \rightarrow P(x, z)]$
Postulate 2 For every non-empty class $a$ of individuals there exists exactly one individual X which is a sum of all elements of $a$.
$\forall a\,[\exists x\,[x \in a] \rightarrow \exists! x\,[SUM(a, x)]]$
For convenience, [18] defines the notions of Overlap (O) and Partial Overlap (PO) in terms of the negation of DR:
$O(x, y) \stackrel{\text{def}}{=} \lnot DR(x, y)$
$PO(x, y) \stackrel{\text{def}}{=} O(x, y) \land \lnot P(x, y) \land \lnot P(y, x)$
Now the inconsistent relations between granules of different granularity, membership and set inclusion, are replaced by a unified relation, viz. the primitive notion of Mereology, the part-to-whole relation (part of), which is also called the ‘‘Ingredient of’’ relation in some publications. By unifying the relation between granules, all granules are of equal significance in that they are all Individuals, and no semantic gap exists any longer, since a collection of individuals is also an individual: the contained individuals can be regarded as parts of the integral entity. The hierarchy of granularity can then be described as follows: an atomic granule is a part of its corresponding partition granule, and all the partition granules are parts of the Universe granule. Replacing the set relationships in the propositions given in Section 2.1 with Mereological notions, we obtain the following alternative propositions:
Proposition 2′. For any Conditional Granule $CGr_B(Val_B)$, its corresponding valuation of attribute set B can absolutely determine the decision attribute d of its atomic parts to be $d_i$ iff $P(CGr_B(Val_B), DGr_d(d_i))$, where $DGr_d(d_i)$ is the Decision Granule with $d = d_i$. We call such a Conditional Granule a Regular B-granule with respect to $d_i$, denoted by $RGr_B(d_i)$.
Proposition 3′. For any Conditional Granule $CGr_B(Val_B)$, its corresponding valuation of attribute set B can lead the decision attribute d of its atomic parts to take $d_i$ as well as other values of d iff $PO(CGr_B(Val_B), DGr_d(d_i))$, where $DGr_d(d_i)$ is the Decision Granule with $d = d_i$. We call such a Conditional Granule an Irregular B-granule with respect to $d_i$, denoted by $IRGr_B(d_i)$.
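Propositions 2′ and 3′ can be illustrated on I* with a small finite model of Mereology. The sketch below is an assumption-laden illustration rather than the authors' formalization: each granule is represented by the frozenset of its atomic (entity) granules, so that P reduces to inclusion between non-empty sets, and None plays the role of a non-existent sum. The names `P`, `DR`, `O`, `PO` and `SUM` mirror the definitions above; `granules_by`, `VAL_B` and `VAL_D` are hypothetical helpers. Summing the regular and irregular B-granules of d1 reproduces the lower approximation, the boundary and the upper approximation described in Section 2.1.

```python
from collections import defaultdict

# Assumed finite model: a granule is the non-empty frozenset of its atomic
# (entity) granules, so the part-of relation P behaves like set inclusion.


def P(x, y):
    return x <= y


def DR(x, y):
    """Disjoint: no atomic granule is a part of both x and y."""
    return not (x & y)


def O(x, y):
    return not DR(x, y)


def PO(x, y):
    return O(x, y) and not P(x, y) and not P(y, x)


def SUM(granules):
    """Mereological sum; Postulate 2 only guarantees it for a non-empty class."""
    granules = list(granules)
    if not granules:
        return None  # stands in for the placeholder Ø
    return frozenset().union(*granules)


# Valuations of B = {c2, c3} and of d, transcribed from Table 1 (I*).
VAL_B = {"u1": (1, 0), "u2": (0, 1), "u3": (1, 0), "u4": (0, 2),
         "u5": (1, 1), "u6": (1, 1), "u7": (0, 2), "u8": (1, 1)}
VAL_D = {"u1": "d1", "u2": "d2", "u3": "d1", "u4": "d2",
         "u5": "d1", "u6": "d1", "u7": "d1", "u8": "d3"}


def granules_by(valuation):
    """One granule per distinct valuation, made of the entities sharing it."""
    groups = defaultdict(set)
    for u, v in valuation.items():
        groups[v].add(u)
    return {v: frozenset(us) for v, us in groups.items()}


cgr = granules_by(VAL_B)        # conditional granules CGr_B(Val_B)
dgr = granules_by(VAL_D)["d1"]  # decision granule DGr_d(d1)

regular = [g for g in cgr.values() if P(g, dgr)]     # Proposition 2'
irregular = [g for g in cgr.values() if PO(g, dgr)]  # Proposition 3'
lower = SUM(regular)
boundary = SUM(irregular)
upper = SUM(regular + irregular)
print(sorted(lower), sorted(boundary), sorted(upper))
```

The conditional granule {u2} satisfies DR with the decision granule of d1, so it is neither regular nor irregular and contributes to neither approximation.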
Proposition 2′ shows that a Regular B-granule of $d_i$ is part of the decision granule of $d_i$, whereas Proposition 3′ shows that an Irregular B-granule of $d_i$ partially overlaps the decision granule of $d_i$. Set approximation can thus be transformed into the approximation of the decision granule $DGr_d(d_i)$ by the conditional granules $CGr_B(Val_B)$ for a specified attribute set B. The key point here is that the process of approximating a granule is simply the process of constructing a complex individual from its simpler parts. Analogous to their set-theoretic counterparts, we can define the notions of lower and upper granule approximation as below.
Definition 1. The individual granule constructed from all the Regular B-granules of $d_i$, $RGr_B(d_i)$, is called the Kernel of the decision granule $DGr_d(d_i)$ of $d_i$, denoted by $KERNELGr_B(d_i)$, which is the lower approximation.
Definition 2. The individual granule constructed from all the Irregular B-granules of $d_i$, $IRGr_B(d_i)$, is called the Hull of the decision granule $DGr_d(d_i)$ of $d_i$, denoted by $HULLGr_B(d_i)$, standing for the boundary between the lower and upper approximations.
Definition 3. The Kernel granule and the Hull granule of $DGr_d(d_i)$ together compose the upper approximation of $DGr_d(d_i)$, which we call the Corpus Granule, denoted by $CORPUSGr_B(d_i)$.
From the definitions above, it is clear that substituting Lesniewski's Mereology for Set Theory can also yield an equivalent theory of roughness, with the advantage that the representational semantics are built into the process of roughness formation.
3. Granular Rough Theory: formalization efforts
3.1. Evolution of the representation model
3.1.1. Implicit to explicit semantic context
Rough Set Theory operates on a single tabular Information System $I = (U, C \cup D)$, where U stands for the universe of discourse, C for conditional attributes and D for decision attributes. In roughness formation, the ‘‘indiscernibility relation’’ is used to partition the entities in U into equivalence classes with respect to specific conditional and decision attributes, respectively. Each conditional equivalence class is then compared to a given decision class to determine whether the two sets of entity identities are subsumed, intersecting or disjoint. Based on these differences in set-theoretic relations, the entity identities in conditional partitions are used to approximate decision equivalence classes from both the lower and the upper direction. The output of roughness analysis is a pair of sets of entity identities, $\underline{X} = X_1$ and $\overline{X} = X_2$ ($X_1 \subseteq X \subseteq X_2$), in which the target conception X is a set of entity identities qualified by the valuation of the decision attribute, whereas the source conceptions $X_1$ and $X_2$ are sets of entity identities qualified by valuations of some conditional attributes.
This pair cannot be meaningfully interpreted without implicit references to semantic contexts, such as metadata and concrete valuations, that are inherent in the Information Table. This viewpoint highlights that the representational semantics of the roughness methodology is to approximately represent ‘‘an equivalence class of entity identities qualified by decision aspects’’ with ‘‘several collections of entity identities qualified by conditional aspects’’. To emphasize these representational semantics, it is desirable for the representation model of the roughness methodology to be more informative than mere sets of entity symbols: the represented information units should explicitly encode the semantic contexts of the underlying information system.
3.1.2. Structured to semi-structured information sources
The structured tabular information system of classic Rough Set Theory is compliant with the standard Entity-Relationship (ER) model [19], but it does not fit semi-structured information sources. As proposed in the last subsection, a new representation model should be designed to encode semantic contexts explicitly. If the new model follows some general design guidelines for representing semi-structured data, it is feasible to achieve a more agile model that naturally extends the applicability of the roughness methodology to a wider range of information sources. Three major existing data models for semi-structured data are the Entity-Attribute-Value (EAV) model for clinical study data management [20], the Object Exchange Model (OEM) for Knowledge Integration style Web information systems [21], and the Resource Description Framework
(RDF) standard for resources on the Semantic Web [22]. Regardless of concrete implementation details, the ‘‘Attribute-Value’’ tuple-based paradigm is widely used in all the semi-structured models mentioned above. This design consideration leads to the triple form of the primitive information unit (atomic granule) in the representation model of Granular Rough Theory, the Granular Representation Calculus (GrRC).
3.1.3. Set theoretic to mereological relations among information granules
In GrRC, information blocks of different granularity are referred to as Information Granules, which are more complex than mere sets of entity identity symbols. For such complex blocks, relations that are more oriented to application semantics are desirable. The WCH (Winston, Chaffin, and Herrmann) taxonomy of mereological relationships gives at least six different interpretations of ‘‘part-to-whole’’ for concrete application contexts [23]. With operations that synthesize individual granules into more intricate information sources, the mereological relations among them are encoded to identify the inner structure of information sources. This provides an entry point for substituting Mereology for Set Theory in the roughness methodology.
3.2. Granular Representation Calculus
3.2.1. Atomic granules and compound granules
Def.GrRC-1: An atomic granule is a semantic unit extracted from an information system that carries complete, meaningful information at the finest granularity. It is formalized as a triple (u; c; v), whose three elements correspond to the entity identity u, the attribute name c and the attribute value v, stating the simple fact that ‘‘entity u has an attribute c with value v’’, and is denoted by j or j(u; c; v). Equivalence of atomic granules is defined as equivalence of each element. The atomic granule represents the simplest individual that contains a complete conception in GrRC.
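Def.GrRC-1 maps naturally onto an immutable record. In the minimal Python sketch below, the class `Atom` and the function `atoms_of` are hypothetical names, and only rows u1 and u7 of Table 1 are transcribed. A frozen dataclass compares field by field, which matches the element-wise definition of atomic granule equivalence, and each table cell becomes exactly one triple.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Atom:
    """Atomic granule j(u; c; v): 'entity u has attribute c with value v'."""
    u: str     # entity identifier, a bare reference to the described entity
    c: str     # attribute name
    v: object  # attribute value (must be hashable)


def atoms_of(table):
    """Decompose an information table {entity: {attr: value}} into atomic granules."""
    return {Atom(u, c, v) for u, row in table.items() for c, v in row.items()}


# Two rows of Table 1 (I*), transcribed for illustration only.
ROWS = {
    "u1": {"c1": 0, "c2": 1, "c3": 0, "c4": 2, "c5": 4, "d": "d1"},
    "u7": {"c1": 1, "c2": 0, "c3": 2, "c4": 0, "c5": 0, "d": "d1"},
}

atoms = atoms_of(ROWS)
print(Atom("u1", "c1", 0) in atoms, Atom("u7", "c3", 2) in atoms)  # True True
print(len(atoms))  # 12: one triple per cell of the two rows
```

Because `Atom` is immutable and hashable, atomic granules can be collected into sets, which is convenient when building compound granules from them.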
Besides the notions from Mereology, the atomic granule is the only primitive in GrRC, and more complex information structures are constructed from atomic granules. In I*, examples of atomic granules are j(u1, c1, 0) and j(u7, c3, 2). If the triple (u; c; v) is the basic individual, it should be clarified what u is. From a linguistic perspective, the three elements of the triple correspond to the subject, predicate and object of a propositional sentence. In the vocabulary space, subject, predicate and object are individuals. But in the sentence space, none of them is a complete propositional sentence; there the basic individual is the sentence rather than its ingredients. In short, u is only an entity identifier, a reference to the described entity, and nothing else can be obtained from the symbol. In roughness formation in classic Rough Set Theory, most computations operate on sets of entity identifiers and implicitly depend on the semantic context defined by the underlying Information Table. In Granular Rough Theory, the three elements of a triple explicitly encapsulate the semantic context of a given identifier. Complying with the convention of Lesniewski's Mereology, no Null information granules are allowed in our system: an information granule does not exist when any of its three elements is unspecified. For occasions where no granules satisfy given conditions and the result of an operation does not exist, a placeholder Ø is used, just as NaN (Not a Number) is used in arithmetic operations. The placeholder is not an information granule and should not be used in further operations with granules.
Def.GrRC-2: Atomic Aggregation is the operation performed on multiple atomic granules to synthesize a more complex information granule, bearing the meaning that all the facts contained in the operand granules are true. The aggregation operation puts multiple individual granules together in a pair of parentheses and delimits
them with colons. For a binary operation, aggregating j1 and j2 yields (j1:j2) by definition; for multiple granules, aggregating j1, j2, ..., jn yields (j1:j2:...:jn). The result of atomic aggregation is defined as a Compound Granule, denoted by Q.
Def.GrRC-3: Aggregation and Fusion operations for compound granules. Aggregation over compound granules has the same function as aggregation over atomic ones, binding independent information parts into a whole: aggregating Q1 and Q2 yields (Q1:Q2). Fusion extracts all the atomic granules contained in its operands and re-aggregates them into a new compound granule: fusing $Q_a = (j_{a1} : j_{a2} : \dots : j_{an})$ and $Q_b = (j_{b1} : j_{b2} : \dots : j_{bm})$ yields $(j_{a1} : j_{a2} : \dots : j_{an} : j_{b1} : j_{b2} : \dots : j_{bm})$. According to the convention of Mereology, granules participating in an aggregation are referred to as Ingredients of the resulting compound granule, and the mereological part-to-whole relation holds between them, denoted as P(j, Q) and P(Q1, Q2). Aggregation is the primary operation in GrRC; it provides a way for information granules of different granularity to interact and synthesize more complex information structures constructively. The above definitions of aggregation do not presuppose any concrete application context, do not specify which participants are applicable to the operation, and do not define which WCH mereological relation holds in the resulting compound granule. Aggregation bears parameterized application semantics, which should be explicitly defined by application developers when GrRC is used to represent a concrete information system, especially regarding the mereological context between granules.
For example, in the sample Information Table I*, aggregation means co-occurrence of data cells, whereas in an information system described by an XML tree, aggregation may mean that two sibling leaf nodes compose their parent node, and in more complex environments such as business process management systems, aggregation may involve intricate business rules.
Def.GrRC-4: Self Aggregation is the operation of aggregating an information granule with itself (or an equivalent one), yielding (j1:j1). There are two remarks: (1) No new fact is added when an information granule is aggregated with itself; hence any number of redundant granules in an aggregation occurs only once in the result, viz. aggregating j1 with itself yields (j1). (2) Since the compound granule (j1) expresses no more information than j1, a mereological equivalence holds, viz. (j1) = j1. The same remarks apply to compound granules. A compound granule with only one atomic granule as its ingredient is referred to as a trivial compound granule. Due to this mereological equivalence, atomic granules can be operands of operations for compound granules. This is quite different from Set Theory, where an element is not equivalent to the set containing it as its only element.
Def.GrRC-5: Structural Categories of Non-trivial Compound Granules. Different operations over different ingredients lead to two structural categories of compound granules: the Plain Compound Granule, comprising only atomic ingredients, and the Higher Order Compound Granule, having non-trivial compound ingredients. When the operands are compound granules, aggregation results in a higher order compound granule, whereas fusion results in a plain compound granule. A higher order compound granule encodes structural information among its parts, viz. the application semantic context in the form of different types of mereological relations.
For example, consider two compound granules standing for a simple ER table and an XML tree, respectively: the latter has a much more complex hierarchical structure than the former. In terms of the WCH taxonomy, the mereological relation among the ingredients of the ER-table granule is the ordinary Member/Collection relation, whereas for the XML granule it is Component/Integral-Object. It is therefore significant to discriminate between aggregation and fusion in the latter case. From the structural perspective of compound granules, the iteration rule from (n−1)-ary to n-ary atomic aggregation can be clarified. The first n−1 atomic granules aggregate into a plain compound granule $Q_{n-1} = (j_1 : j_2 : \dots : j_{n-1})$. For the nth atomic granule $j_n$, if the iteration rule were $Q_n = (Q_{n-1} : j_n)$, a higher order granule would be generated; but atomic aggregation always produces plain results, so the correct iteration rule is that $Q_n = (j_1 : j_2 : \dots : j_n)$ is the fusion of $Q_{n-1}$ and $j_n$.
Def.GrRC-6: Self-fusion r is a unary operation over a higher order compound granule that removes the structural information of its ingredients, resulting in a plain compound granule. Let $Q = (Q_a : Q_b : j_c)$, where $Q_a = (j_{a1} : j_{a2} : \dots : j_{an})$ and $Q_b = (j_{b1} : j_{b2} : \dots : j_{bm})$; then $r(Q) = (j_{a1} : j_{a2} : \dots : j_{an} : j_{b1} : j_{b2} : \dots : j_{bm} : j_c)$.
Def.GrRC-7: Granularity of Information Granules. To provide quantitative measures, GrRC defines the granularity of information granules from two aspects: the number of primitive concepts and the structural complexity of ingredients. The Granular Cardinality of a granule Q, denoted by @(Q), is defined as the number of atomic granules it contains. For any atomic granule j, @(j) = 1; for higher order granules, self-fusion should be performed first to calculate the granular cardinality; for the placeholder Ø, @(Ø) = 0 by definition, which only means that when no information is present, the granular cardinality is zero. The Granular Order of a granule Q is defined as follows: for any trivial compound granule (an atomic granule), the granular order is 1, and for a non-trivial compound granule, the order equals the highest order of its ingredients plus 1.
3.2.2. Special compound granules for roughness formation
Several special compound granules dedicated to roughness formation are defined in this subsection.
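Before these special granules are introduced, the general operations of Defs. GrRC-2 to GrRC-7 can be sketched in Python. This is one possible encoding under stated assumptions, not the authors' implementation: ingredients are stored as an unordered frozenset (so self-aggregation is idempotent), a trivial compound granule collapses to its single ingredient, None stands for the placeholder Ø, and all class and function names are hypothetical. Application-specific WCH semantics of aggregation are not modelled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Atom:
    """Atomic granule j(u; c; v)."""
    u: str
    c: str
    v: object


@dataclass(frozen=True)
class Compound:
    """Compound granule; ingredients are unordered and duplicate-free."""
    parts: frozenset


def aggregate(*granules):
    """Aggregation: bind the operands as ingredients of one granule."""
    parts = frozenset(granules)
    if not parts:
        return None                # no Null granules: placeholder Ø
    if len(parts) == 1:
        return next(iter(parts))   # trivial compound (j) equals j
    return Compound(parts)


def atoms(g):
    """All atomic granules contained in g, at any depth."""
    if isinstance(g, Atom):
        return frozenset([g])
    return frozenset().union(*(atoms(p) for p in g.parts))


def fuse(*granules):
    """Fusion: extract every atom of the operands and re-aggregate them."""
    return aggregate(*frozenset().union(*(atoms(g) for g in granules)))


def self_fuse(q):
    """Self-fusion r(Q): drop structural information, leaving a plain granule."""
    return fuse(q)


def cardinality(g):
    """Granular cardinality @(Q): number of atoms; 0 for the placeholder."""
    return 0 if g is None else len(atoms(self_fuse(g)))


def order(g):
    """Granular order: 1 for atoms, otherwise 1 + highest ingredient order."""
    if isinstance(g, Atom):
        return 1
    return 1 + max(order(p) for p in g.parts)


j1, j2, j3 = Atom("u1", "c1", 0), Atom("u1", "c3", 0), Atom("u7", "c3", 2)
qa = aggregate(j1, j2)   # plain compound granule (j1:j2)
q = aggregate(qa, j3)    # higher order compound granule (Qa:j3)
print(order(qa), order(q))                           # 2 3
print(aggregate(j1, j1) == j1)                       # True
print(fuse(qa, j3) == aggregate(j1, j2, j3))         # True
print(self_fuse(q) == fuse(qa, j3), cardinality(q))  # True 3
```

The third check mirrors the iteration rule discussed above: fusing Q_{n−1} with j_n gives the plain granule (j1:j2:j3), whereas aggregating them gives the higher order granule q of order 3.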
Def.GrRC-8: Cluster Granule is the result of atomic aggregation on all the atomic granules $j_k$ (k = 1, 2, …, n) with the same value $v_t$ for the relevant attribute $c_j$, i.e. the aggregation of $j_1(u_{i_1}, c_j, v_t), j_2(u_{i_2}, c_j, v_t), \ldots, j_n(u_{i_n}, c_j, v_t)$, abbreviated as $((u_{i_1}, u_{i_2}, \ldots, u_{i_n}), c_j, v_t)$ and denoted by $J(c_j, v_t)$. For example, in I*, J(c3, 1) = ((u2, u5, u6, u8), c3, 1) and J(c5, 0) = ((u7), c5, 0). By Postulate MERE-2, a specific cluster granule can be regarded as the sum (totality) of all atomic granules belonging to the class of atomic granules describing ‘‘the entity having the attribute $c_j$ with value $v_t$’’. Through the semantic mapping from granules to the entities described, this interpretation is equivalent to saying that a cluster granule is an information granule describing the sum of all entities that have the attribute $c_j$ with value $v_t$. Def.GrRC-9: Aspect Granule is the result of atomic aggregation on an arbitrary number of atomic granules $j_k$ (k = 1, 2, …, n) describing the same entity $u_i$, i.e. the aggregation of $j_1(u_i, c_{j_1}, v_1), j_2(u_i, c_{j_2}, v_2), \ldots, j_n(u_i, c_{j_n}, v_n)$, abbreviated as $(u_i, (c_{j_1}, c_{j_2}, \ldots, c_{j_n}), (v_1, v_2, \ldots, v_n))$ and denoted by $C(u_i, (c_{j_1}, c_{j_2}, \ldots, c_{j_n}))$. An aspect granule encapsulates multiple aspects of a given entity. For example, in I*, C(u1, (c1, c3, c4)) = (u1, (c1, c3, c4), (0, 0, 2)) states that entity u1 has attributes c1 = 0, c3 = 0 and c4 = 2. Def.GrRC-10: Aspect Cluster Granule is the result of aggregation on all the aspect granules $C_k$ (k = 1, 2, …, n) with the same valuation of attributes $(c_{j_1}, c_{j_2}, \ldots, c_{j_m})$, i.e. the aggregation of the compound granules $C_1(u_{i_1}, (c_{j_1}, c_{j_2}, \ldots, c_{j_m})), C_2(u_{i_2}, (c_{j_1}, c_{j_2}, \ldots, c_{j_m})), \ldots, C_n(u_{i_n}, (c_{j_1}, c_{j_2}, \ldots, c_{j_m}))$, denoted by $G((c_{j_1}, c_{j_2}, \ldots, c_{j_m}), (v_1, v_2, \ldots, v_m))$ and abbreviated as $((u_{i_1}, u_{i_2}, \ldots, u_{i_n}), (c_{j_1}, c_{j_2}, \ldots, c_{j_m}), (v_1, v_2, \ldots, v_m))$.
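Def.GrRC-8 to Def.GrRC-10 can be sketched over a fragment of I*. The table below is an assumption, reconstructed only from the values of c1, c2 and d that the worked examples in this section reveal; the other attributes of I* are omitted. The function names are hypothetical, and granules are returned in the abbreviated forms given in the definitions.

```python
from collections import defaultdict
from typing import Dict

# Fragment of I*: only c1, c2 and d, as read off this section's worked examples.
I_STAR: Dict[str, Dict[str, object]] = {
    "u1": {"c1": 0, "c2": 1, "d": "d1"},
    "u2": {"c1": 1, "c2": 0, "d": "d2"},
    "u3": {"c1": 0, "c2": 1, "d": "d1"},
    "u4": {"c1": 2, "c2": 0, "d": "d2"},
    "u5": {"c1": 2, "c2": 1, "d": "d1"},
    "u6": {"c1": 2, "c2": 1, "d": "d1"},
    "u7": {"c1": 1, "c2": 0, "d": "d1"},
    "u8": {"c1": 0, "c2": 1, "d": "d3"},
}


def cluster_granule(table, attribute, value):
    """J(c_j, v_t), abbreviated as ((u_i1, ..., u_in), c_j, v_t)."""
    entities = tuple(u for u in sorted(table) if table[u][attribute] == value)
    return (entities, attribute, value)


def aspect_granule(table, entity, attributes):
    """C(u_i, (c_j1, ..., c_jn)), abbreviated as (u_i, (c_j1, ...), (v_1, ...))."""
    return (entity, tuple(attributes), tuple(table[entity][a] for a in attributes))


def aspect_cluster_granules(table, attributes):
    """All G(A, V) for attribute segment A: aspect granules grouped by value segment V."""
    groups = defaultdict(list)
    for u in sorted(table):
        _, _, values = aspect_granule(table, u, attributes)
        groups[values].append(u)
    return {v: (tuple(us), tuple(attributes), v) for v, us in groups.items()}


print(cluster_granule(I_STAR, "d", "d1"))
print(aspect_cluster_granules(I_STAR, ("c1", "c2"))[(0, 1)])
```

The second call reproduces G((c1, c2), (0, 1)) = ((u1, u3, u8), (c1, c2), (0, 1)) from the text; an aspect cluster granule whose entity segment has a single entity, such as the one for (2, 0), retrogresses to an aspect granule.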
For convenience, $E = (u_{i_1}, u_{i_2}, \ldots, u_{i_n})$, $A = (c_{j_1}, c_{j_2}, \ldots, c_{j_m})$ and $V = (v_1, v_2, \ldots, v_m)$ are referred to as the Entity Segment, Attribute Segment and Value Segment of the aspect cluster granule. An aspect cluster granule stands for the sum of
aspect granules that describe ‘‘the entity having the attribute collection A with value collection V’’. In I*, G((c1, c2), (0, 1)) = ((u1, u3, u8), (c1, c2), (0, 1)). If the entity segment of an aspect cluster granule has only one ingredient, the granule retrogresses to an ordinary aspect granule, which describes the characteristics of one entity. Likewise, when the attribute segment and value segment stand for only a single attribute-value pair, the aspect cluster granule retrogresses to an ordinary cluster granule. 3.2.3. Granule operations over special compound granules for roughness formation Besides the common operations on compound granules, there are additional operations that generate or operate on the special compound granules defined above. Def.GrRC-11: Merge Operation () is the operation over cluster granules or aspect cluster granules that generates a new aspect cluster granule. Its valid forms are: merging $G(A_j, V_j)$ with $G(A_k, V_k)$ defines $G(A_l, V_l)$; merging $J(c_j, v_j)$ with $G(A_k, V_k)$ defines $G(A_l, V_l)$; and merging $J(c_j, v_j)$ with $J(c_k, v_k)$ defines $G((c_j, c_k), (v_j, v_k))$. In each case merge operates in three steps; taking the merge of $J(c_j, v_j)$ and $J(c_k, v_k)$ for clarification: (1) the two operands are first fused into a new plain compound granule Q; (2) the atomic granules in Q are aggregated, for each entity, into aspect granules with maximal granular cardinality; (3) all the aspect granules with attribute segment $(c_j, c_k)$ and value segment $(v_j, v_k)$ are kept as ingredients of the resulting compound granule. When the merge operation can generate no result for the relevant attribute segment and valuation, the placeholder Ø is used. For example, in I*, to merge J(c3, 1) = ((u2, u5, u6, u8), c3, 1) and G((c1, c2), (0, 1)) = ((u1, u3, u8), (c1, c2), (0, 1)): (1) Perform the fusion operation over the operands:
fusing $J(c_3, 1)$ with $G((c_1, c_2), (0, 1))$ gives $((u_2, c_3, 1):(u_5, c_3, 1):(u_6, c_3, 1):(u_8, c_3, 1):(u_1, c_1, 0):(u_3, c_1, 0):(u_8, c_1, 0):(u_1, c_2, 1):(u_3, c_2, 1):(u_8, c_2, 1))$ (2) Find all aspect granules with maximal granular cardinality: $((u_2, c_3, 1):(u_5, c_3, 1):(u_6, c_3, 1):(u_1, (c_1, c_2), (0, 1)):(u_3, (c_1, c_2), (0, 1)):(u_8, (c_1, c_2, c_3), (0, 1, 1)))$ (3) Remove ingredients irrelevant to the attribute segment (c1, c2, c3) and its valuation (0, 1, 1): $((u_8, (c_1, c_2, c_3), (0, 1, 1))) = (C(u_8, (c_1, c_2, c_3))) = C(u_8, (c_1, c_2, c_3))$
The result is a retrogressed aspect cluster granule. For convenience in expressing common operations over the ingredients of a compound granule, three unary internal operations on plain compound granules are defined below: Horizontal Convergence, Vertical Convergence and Full Convergence. Def.GrRC-12: Horizontal convergence is applied to a plain compound granule to aggregate, from its atomic ingredients, aspect granules with maximal granular cardinality, viz. with the maximal number of possible attributes, so as to transform the plain granule into a higher order granule. Def.GrRC-13: Vertical convergence is applied to a plain compound granule to aggregate cluster granules from its atomic ingredients, so as to transform the plain granule into a higher order granule. Def.GrRC-14: Full convergence is applied to a plain compound granule to aggregate aspect cluster granules with the maximal number
of possible attributes from its atomic ingredients, so as to transform the plain granule into a higher order granule. These internal operations, together with the self-fusion operation, are the facilities for conversion between plain and higher order compound granules. The second step of the merge operation is exactly a horizontal convergence. Def.GrRC-15: Aspect Shift Operation ($d_{A_j}$). Given an aspect cluster granule $(E, A_i, V_i)$, from the aspect of its attribute segment $A_i$, its entity segment E stands for the mereological sum (class) of entities with attributes $A_i$ valuated by $V_i$. When this class of entities is observed from another perspective, say $A_j$, how are these entities distributed among the new classes defined by the valuation of $A_j$? The aspect shift operation describes how an aspect cluster granule evolves from a single entity class defined by the original aspect into one or more entity classes defined by the valuation of the target aspect, and thus provides a mapping between different entity aspects. Aspect shift first fuses all the atomic granules of each entity in E with respect to each attribute in $A_j$, and then performs the full convergence operation over the resulting compound granule. For example, in I*, given the aspect cluster granule G((c1, c2), (0, 1)) = ((u1, u3, u8), (c1, c2), (0, 1)), the aspect shift operation finds the distribution of its entities in the different equivalence classes defined by indiscernibility on the decisional attribute d, and the result is $d_d(G((c_1, c_2), (0, 1))) = (((u_1, u_3), d, d_1):(u_8, d, d_3))$. 3.2.4. Identification of mereological relations Due to the atomicity of atomic granules, all of them are disjoint from each other, viz. $\forall j_i, j_j,\ DR(j_i, j_j),\ i \neq j$. For two compound granules that have common ingredients, there is an overlap relation O between them, viz. $O(Q_1, Q_2)$. Def.GrRC-16: Generic Wrap ($\cap_G$) operates on two compound granules to obtain the atomic ingredients of their overlapping part. Let $r(Q_a) = (j_{a_1}:j_{a_2}:\ldots:j_{a_n})$ and $r(Q_b) = (j_{b_1}:j_{b_2}:\ldots:j_{b_m})$; then $Q_a \cap_G Q_b \overset{\text{def}}{=} (j_{c_1}:j_{c_2}:\ldots:j_{c_l})$, where $j_{c_1}, j_{c_2}, \ldots, j_{c_l}$ are the atomic ingredients common to both $Q_a$ and $Q_b$. The wrap operation first performs self-fusion on both granules to remove the encoded structural information, so as to investigate the shared atomic granules. It is not uncommon that the two operands are disjoint, i.e. $DR(Q_a, Q_b)$. Since the null granule is excluded from GrRC, the wrap operation over them produces nothing but the placeholder Ø. By definition, the operands of an aggregation operation become ingredients of the resulting granule, viz. the part-to-whole relation is maintained; furthermore, based on the transitivity of the part-to-whole relation stated in Postulate Mere-1, parts of the ingredients are also ingredients of the whole. For example, if $Q_r$ is aggregated from $Q_1$ and $j_0$, where $Q_1$ is aggregated from $j_1$ and $j_2$, then $P(Q_1, Q_r)$, $P(j_0, Q_r)$, $P(j_1, Q_r)$ and $P(j_2, Q_r)$ all hold. A problem now arises: given another granule $Q_2$ aggregated from $j_0$ and $j_1$, does $P(Q_2, Q_r)$ also hold? The problem lies in the structural information inherent in the application-specific semantics defined by the aggregation operation. As mentioned previously, due to the parameterized semantics of aggregation, the significance of the structural information involved differs considerably among systems. To avoid ambiguity while enabling GrRC to accommodate complex structural representations, part-to-whole relations amongst information granules are classified into Canonical and Generalized part-to-whole relations. The canonical part-to-whole relation requires that the part-granule be the whole-granule itself, an operand in the aggregation operation for the whole-granule, or an ingredient of such operands, which imposes exact structural consistency based on the concrete mereological context. In the sense of the canonical mereological relation, $P(Q_2, Q_r)$ in the above example may not hold in most cases.
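The merge operation (Def.GrRC-11) and the generic wrap (Def.GrRC-16) can be sketched on self-fused, plain granules. This minimal Python version assumes that a plain compound granule can be treated as a frozenset of atomic triples (u, c, v), which discards ingredient order, and it models the placeholder Ø as `None`; all function names are hypothetical. It reproduces the merge of J(c3, 1) and G((c1, c2), (0, 1)) worked through above, and includes the cardinality-based test for the generalized part-to-whole relation.

```python
from collections import defaultdict
from typing import FrozenSet, Optional, Tuple

Atom = Tuple[str, str, object]   # (u_i, c_j, v_t)
Plain = FrozenSet[Atom]          # self-fused (plain) compound granule


def expand(abbrev) -> Plain:
    """Turn an abbreviated cluster / aspect cluster granule ((E), A, V) into atoms."""
    entities, attrs, values = abbrev
    return frozenset((u, c, v) for u in entities for c, v in zip(attrs, values))


def fuse(*granules: Plain) -> Plain:
    """Fusion: pool the atomic ingredients of the operands into one plain granule."""
    return frozenset().union(*granules)


def horizontal_convergence(plain: Plain):
    """Aggregate atoms per entity into aspect granules with maximal cardinality."""
    aspects = defaultdict(dict)
    for u, c, v in plain:
        aspects[u][c] = v
    return aspects


def merge(a, b):
    """Merge two abbreviated granules; return an aspect cluster granule or None (Ø)."""
    segment = dict(zip(a[1], a[2]))
    segment.update(zip(b[1], b[2]))  # assumes the operands' attribute segments are distinct
    aspects = horizontal_convergence(fuse(expand(a), expand(b)))
    kept = tuple(sorted(u for u, attrs in aspects.items()
                        if all(attrs.get(c) == v for c, v in segment.items())))
    if not kept:
        return None
    attrs = tuple(sorted(segment))
    return (kept, attrs, tuple(segment[c] for c in attrs))


def wrap(qa: Plain, qb: Plain) -> Optional[Plain]:
    """Generic wrap Qa ∩G Qb: shared atomic ingredients, or None (Ø) if disjoint."""
    common = qa & qb
    return common or None


def generalized_part_of(qa: Plain, qb: Plain) -> bool:
    """True when @(Qa ∩G Qb) equals the smaller of @(Qa) and @(Qb)."""
    common = wrap(qa, qb)
    return common is not None and len(common) == min(len(qa), len(qb))


J_c3_1 = (("u2", "u5", "u6", "u8"), ("c3",), (1,))
G_c1c2 = (("u1", "u3", "u8"), ("c1", "c2"), (0, 1))
print(merge(J_c3_1, G_c1c2))                 # (('u8',), ('c1', 'c2', 'c3'), (0, 1, 1))
print(wrap(expand(J_c3_1), expand(G_c1c2)))  # None: the operands share no atoms
```

Because every granule is compared after self-fusion, this sketch can only express the generalized relation; a canonical test would need the nested structure that the frozenset encoding deliberately throws away.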
On the other hand, the generalized part-to-whole relation only requires that the atomic ingredients of the part-granule be ingredients of the whole
granule, without considering the internal structural differences introduced by aggregation. In terms of the generalized part-to-whole relation, $P(Q_2, Q_r)$ in the above example is valid. The distinction between the two mereological relations results from the different significance of structural information in various mereological contexts. Systems that require the canonical mereological relation to preserve semantic invariance should provide a comparison mechanism to identify canonical part-to-whole relations. Due to the diversity of realistic information systems, the implementation of canonical part-to-whole identification is system-specific, and realizing such a mechanism for typical mainstream structures such as XML, RDF and the relational model requires major future efforts in developing Granular Rough Theory. In single-table information systems, where the mereological context is simple, the generalized part-to-whole relation is sufficient for roughness formation. The generalized part-to-whole relation between two compound granules $Q_a$ and $Q_b$ is identified as follows: (1) compute $Q_r = Q_a \cap_G Q_b$; (2) compute the granular cardinalities @(Qa), @(Qb) and @(Qr); (3) compare these three cardinalities: if the granular cardinality of the result, @(Qr), equals that of the operand with the smaller cardinality, the generalized part-to-whole relation holds. 3.3. Remarks on web information source representation 3.3.1. Information repository Based on atomic granules as primitive constructs, and through the operations defined in GrRC, information granules of different granularity can be aggregated or fused from finer granules to eventually give a formal representation of realistic information systems. As a major motivation for defining GrRC, accommodating semi-structured or relational data is expected to expand the range of information systems to which the roughness methodology applies. An important reference for the design of GrRC is the Object Exchange Model (OEM), a methodology for modeling semi-structured data on the Web addressed by Papakonstantinou et al.
in [21] for the knowledge integration Web information project TSIMMIS. OEM is a lightweight data representation method, used in LOREL [24], a well-known query language for semi-structured data. Each object in OEM has the structure (Label, Type, Value, Object-ID), where Label is a string describing what the object represents, Type is the data type of the object's value, Value is the value of the object, and Object-ID is an identifier for the object. An object is atomic when its value belongs to a basic atomic data type such as string or integer, whereas an object whose value is a set of object references of the form (label, object-id) is a complex object. For example, (city, string, ‘‘Paris’’, &13) is an atomic object, and (address, set, {(city, &13), (zip code, &12), (country, &14)}) is a complex object. Some remarks should be made on complex object representation in GrRC for such systems. First, from a relational database viewpoint, a single Information Table stands for only one relation of the whole database schema. A relation contains tuples that describe a class of homogeneous entities; for instance, a relation cannot simultaneously represent on-line transactions and diagnostic records. As in database design, when multiple classes of entities are of interest, there are multiple relations standing for them. Secondly, when an attribute value of one entity is itself a first-class object, embedded information granules occur, just as complex objects do in OEM. By enlarging the universe of discourse U with multiple sub-universes, each of which contains entities of a specific type, we can bridge this gap with the notion of an Information Repository. For example, the Information Repository for an information source
describing bibliographies may contain Authors and Books; R is then composed of IAuthors and IBooks, with IAuthors = (UA, AA) and IBooks = (UB, AB). Entities in UA are human beings who wrote some books, whereas entities in UB are publications. Even in common sense, they are of different types and not comparable, so it would be improper to present them in one single Information Table. But presenting them together in separate Information Tables of one Information Repository is reasonable and necessary. The necessity lies in the fact that both the authors and the books are of interest when investigating a bibliography source, viz. the universe of discourse should be the union of the two sub-universes. As far as the attribute set of either class of entities is concerned, the attributes describe corresponding aspects of that specific entity class, and sometimes the value of an attribute for an entity in one class is by nature an entity in the other class. In the bibliography example, one attribute of an author may be ‘‘the first book’’, whose value may be an entity described by IBooks = (UB, AB). Such a case resembles relationships in relational database systems. Without modifying the atomic granule, which is still of the form $(u_i, c_j, v_t)$ with the same semantic interpretation, GrRC is applicable to the entities described inside each sub-universe of the Information Repository, decomposing the tabular representation into information granules. After the decomposition, the mutual mereological relations between atomic granules from different sub-universes may evolve. As described previously, all atomic granules in GrRC are disjoint; but are two information granules from different sub-universes, as in the above bibliography example, still disjoint when one of them can be regarded as the value of an attribute of the other? Such a relation may not be easy to recognize.
If the granule is (author1, ‘‘magnum opus’’, ‘‘Ulysses’’), the value of the attribute ‘‘magnum opus’’ is easily, but superficially, regarded as a mere string. By the semantic nature of the description, however, the value refers to ‘‘a book which has the title ‘Ulysses’’’, which is an atomic granule (book1, ‘‘title’’, ‘‘Ulysses’’) from UB. The granule (author1, ‘‘magnum opus’’, ‘‘Ulysses’’) can therefore be rewritten as (author1, ‘‘magnum opus’’, (book1, ‘‘title’’, ‘‘Ulysses’’)). In terms of the WCH taxonomy, a Component/Integral-Object relation exists between them. 3.3.2. Mapping RDF described information sources The Resource Description Framework (RDF) is the mainstream standard for describing web information sources, as well as a reference model that inspires GrRC. In its model and syntax specification, the triple is central to the data model and can be naturally mapped to an atomic granule in GrRC. This key element makes it possible to map RDF notations into expressions stated in terms of GrRC. Fig. 1, a typical example from [22], illustrates how RDF uses a graph to represent an information source with a property bearing a structured value, with the semantics ‘‘‘http://www.w3.org/Home/Lassila’ has creator something and something has name ‘Ora Lassila’ and email ‘[email protected]’’’. This can be mapped into GrRC from the top down. The URI ‘‘http://www.w3.org/Home/Lassila’’ is an entity identifier, and its attribute ‘‘Creator’’ is described by an atomic information granule $j_1$ = (‘‘http://www.w3.org/Home/Lassila’’, Creator, $Q_1$), where $Q_1$ is the compound granule aggregated from $j_2$ and $j_3$.
Fig. 1. A typical example of RDF representation for information sources.
Designating the unnamed entity with an arbitrary identifier e1, $j_2$ = (e1, Name, ‘‘Ora Lassila’’) and $j_3$ = (e1, Email, ‘‘[email protected]’’), and their aggregation forms a compound granule describing multiple aspects of the same entity e1, viz. the compound granule is an aspect granule with respect to e1. Artifacts beyond Basic RDF, such as Bag, Sequence and other Container models, can be expressed by compound granules aggregated from ingredient granules. This general instance exhibits the representational ability of GrRC; a formal proof and mapping of this representational ability is underway. 3.4. Formation of roughness over GrRC In terms of GrRC, the representational semantics of roughness can be explicated as approximating a decisional cluster granule with several conditional aspect cluster granules. This section discusses the approach to roughness formation over information systems represented in GrRC. Given a conditional attribute collection B, a series of aspect cluster granules relevant to B can be aggregated in the information system, one for each possible valuation of the attributes in B. Likewise, several decisional cluster granules can be constructed according to the valuation of the decisional attribute d. Let $G_k(B, V_k)$ stand for the conditional aspect cluster granule with value segment $V_k$ for B in the aggregated series, and let $J(d, d_i)$ stand for the decisional cluster granule with valuation $d_i$ for d. Applying the aspect shift operation $d_d(G_k(B, V_k))$ yields a compound granule $Q_k = (J_1(d, v_1):J_2(d, v_2):\ldots:J_t(d, v_t))$, where each $v_i$ (i = 1, 2, …, t) is a value of d. $Q_k$ exhibits the value versatility of the entities in the entity segment of $G_k(B, V_k)$ when observed from the decisional aspect. For information systems where the structural significance introduced by aggregation is crucial, the canonical part-to-whole relation between $Q_k$ and $J(d, d_i)$ should be identified according to the concrete mereological context; otherwise, the wrap operation $Q_k \cap_G J(d, d_i)$ can be applied to identify the generalized part-to-whole relation.
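The formation procedure just described, together with the three-way classification and the kernel, hull and corpus granules defined next (Def.GrRough-1 to Def.GrRough-6), can be sketched in Python. The sketch assumes a single-table setting, so only the generalized part-to-whole relation is used, and it handles granules as frozensets of atomic triples. The table is an assumed fragment of I* containing only c1, c2 and d, reconstructed from the values revealed by this section's worked examples; all function names are hypothetical. The part-to-whole test is applied in the direction from $Q_k$ to $J(d, d_i)$, as Def.GrRough-1 requires.

```python
from collections import defaultdict

# Assumed fragment of I*: c1, c2 and d only, as revealed by the worked examples.
I_STAR = {
    "u1": {"c1": 0, "c2": 1, "d": "d1"}, "u2": {"c1": 1, "c2": 0, "d": "d2"},
    "u3": {"c1": 0, "c2": 1, "d": "d1"}, "u4": {"c1": 2, "c2": 0, "d": "d2"},
    "u5": {"c1": 2, "c2": 1, "d": "d1"}, "u6": {"c1": 2, "c2": 1, "d": "d1"},
    "u7": {"c1": 1, "c2": 0, "d": "d1"}, "u8": {"c1": 0, "c2": 1, "d": "d3"},
}


def aspect_clusters(table, B):
    """Entity segments of all G_k(B, V_k), keyed by value segment V_k."""
    groups = defaultdict(list)
    for u in sorted(table):
        groups[tuple(table[u][a] for a in B)].append(u)
    return dict(groups)


def aspect_shift(table, entities, d):
    """d_d(G_k): the atoms (u, d, value) of the entity segment, i.e. Q_k self-fused."""
    return frozenset((u, d, table[u][d]) for u in entities)


def approximate(table, B, d, di):
    """Classify each G_k(B, V_k) against J(d, d_i) and build kernel, hull, corpus."""
    J = frozenset((u, d, di) for u in table if table[u][d] == di)
    kernel, hull, irrelevant = [], [], []
    for Vk, entities in aspect_clusters(table, B).items():
        Qk = aspect_shift(table, entities, d)
        common = Qk & J                    # generic wrap Q_k ∩G J(d, d_i)
        granule = (tuple(entities), tuple(B), Vk)
        if len(common) == len(Qk):         # P(Q_k, J): every atom of Q_k lies in J
            kernel.append(granule)         # regular B-granule
        elif common:                       # O(Q_k, J) and not P(Q_k, J)
            hull.append(granule)           # irregular B-granule
        else:                              # DR(Q_k, J)
            irrelevant.append(granule)
    return {"kernel": kernel, "hull": hull,
            "corpus": kernel + hull, "irrelevant": irrelevant}


result = approximate(I_STAR, ("c1", "c2"), "d", "d1")
for name, granules in result.items():
    print(name, [g[2] for g in granules])
```

On this fragment the sketch yields the same classification as the worked example below: G4(B, (2, 1)) is regular, G1(B, (0, 1)) and G2(B, (1, 0)) are irregular, and G3(B, (2, 0)) is irrelevant to d1.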
According to the mereological relation between $Q_k$ and the decisional cluster granule $J(d, d_i)$, the aspect cluster granules under consideration can be classified into three classes: Def.GrRough-1: If $Q_k$ is part of $J(d, d_i)$, i.e. $P(Q_k, J(d, d_i))$, $G_k(B, V_k)$ is called a Regular B-Granule with respect to $d_i$, denoted by $R_B(d_i)$. Def.GrRough-2: If $Q_k$ overlaps $J(d, d_i)$ but is not part of it, i.e. $O(Q_k, J(d, d_i)) \wedge \neg P(Q_k, J(d, d_i))$, $G_k(B, V_k)$ is called an Irregular B-Granule with respect to $d_i$, denoted by $\hat{R}_B(d_i)$. Def.GrRough-3: If $Q_k$ is disjoint from $J(d, d_i)$, i.e. $DR(Q_k, J(d, d_i))$, $G_k(B, V_k)$ is called Irrelevant to $d_i$. Analogous to their set-theoretic counterparts, the notions of lower and upper granule approximation can be defined as follows. Def.GrRough-4: The compound granule aggregated from all the regular B-granules with respect to $d_i$, $R_B(d_i)$, is called the Kernel Granule with respect to $J(d, d_i)$, denoted by $L_B(d_i)$, which is interpreted as the lower approximation of the decisional granule. Def.GrRough-5: The compound granule aggregated from all the irregular B-granules with respect to $d_i$, $\hat{R}_B(d_i)$, is called the Hull Granule with respect to $J(d, d_i)$, denoted by $D_B(d_i)$, standing for the boundary between the lower and upper approximations. Def.GrRough-6: The kernel and hull granules of $J(d, d_i)$ together aggregate into the upper approximation of $J(d, d_i)$, which we call the Corpus Granule with respect to $J(d, d_i)$, denoted by $V_B(d_i)$. Taking I* as an example, let B = (c1, c2) and $d_i = d_1$. (1) Calculate the current decisional cluster granule through the aggregation operation:
$J(d, d_1) = ((u_1, u_3, u_5, u_6, u_7), d, d_1)$;
(2) Calculate the aspect cluster granules corresponding to the different valuations of (c1, c2):
$G_1(B, (0, 1)) = ((u_1, u_3, u_8), B, (0, 1))$; $G_2(B, (1, 0)) = ((u_2, u_7), B, (1, 0))$; $G_3(B, (2, 0)) = ((u_4), B, (2, 0))$; $G_4(B, (2, 1)) = ((u_5, u_6), B, (2, 1))$; (3) Apply the aspect shift operation to each aspect cluster granule obtained in step (2), from the conditional attribute segment B to the decisional attribute d:
$Q_1 = d_d(G_1) = (((u_1, u_3), d, d_1):(u_8, d, d_3))$; $Q_2 = d_d(G_2) = ((u_2, d, d_2):(u_7, d, d_1))$; $Q_3 = d_d(G_3) = (u_4, d, d_2)$; $Q_4 = d_d(G_4) = ((u_5, u_6), d, d_1)$; (4) Apply the wrap operation to compare each compound granule obtained in step (3) with the decisional cluster granule of step (1), to identify the generalized part-to-whole relations between them: $O(Q_1, J(d, d_1)) \wedge \neg P(Q_1, J(d, d_1))$; $O(Q_2, J(d, d_1)) \wedge \neg P(Q_2, J(d, d_1))$; $DR(Q_3, J(d, d_1))$; $P(Q_4, J(d, d_1))$; (5) Classify each conditional aspect cluster granule of step (2) according to the results of step (4): Regular B-Granule with respect to d1: $R_B(d_1) = G_4(B, (2, 1))$; Irregular B-Granules with respect to d1: $\hat{R}_{B,1}(d_1) = G_1(B, (0, 1))$, $\hat{R}_{B,2}(d_1) = G_2(B, (1, 0))$; Irrelevant B-Granule with respect to d1:
$G_3(B, (2, 0))$; (6) Construct the approximations of $J(d, d_1)$: Kernel Granule:
$L_B(d_1) = (R_B(d_1)) = G_4(B, (2, 1))$; Hull Granule: $D_B(d_1) = (\hat{R}_{B,1}(d_1):\hat{R}_{B,2}(d_1)) = (G_1(B, (0, 1)):G_2(B, (1, 0)))$; Corpus Granule:
$V_B(d_1) = (L_B(d_1):D_B(d_1))$. 3.5. Semantic superiority of granular approach Each atomic information granule encapsulates a piece of information about some entity in the real world, in the form of a triple $(u_i, c_j, v_t)$. From the linguistic point of view, the triple gives a syntactic structure equivalent to a propositional sentence, where the subject is $u_i$, the predicate is $c_j$, and the object is $v_t$. The semantics of a triple is then identical to that revealed by the corresponding propositional sentence. In the real world, the entity $u_i$ is a first-class object that exists objectively, while an information granule is a second-order object that exists only on some ontological level, committing to some conception of entities in the real world. In our granular calculus, such a second-order object is defined as the primitive individual for further operations. The decision to select atomic granules, rather than the entities they describe, as the primitive constructs rests on the following points: (1) A symbol itself is just a name; nothing is really encoded in the symbol, say $u_i$. What makes an entity be $u_i$ is not available
solely from the symbol, so the extensional aspects of $u_i$ must be presented to represent knowledge related to the entity, viz. its attributes should also be accessible in an operational computing environment. Through information granules, such information is bound to the entity as a whole, ready for access. (2) An atomic granule is regarded as an individual with detailed information encapsulated in it; in this respect, it is somewhat like an atomic proposition P in formal logic. The great convenience of atomic propositions is their capability to form formulae stating more intricate logical sentences, while the semantics of each atomic proposition is preserved and contributes to the semantics of the formulae. Via the primitives and operations defined in our granular calculus, granules of higher complexity can easily be generated from simpler ones in a purely formal way, without taking account of what information a granule is meant for. (3) In the classical set approach to roughness research, the notion of an equivalence class due to a specified attribute set is critical. An equivalence class is a set of entities partitioned from the universe by the valuation of the relevant attributes, for example, {u1, u2, …, uk}. From such a set, nothing more than the set of symbols can be extracted; the semantic contexts, including entity information, the related attribute set and the valuation of the attributes for the given partition, are absent. During the set approximation of classical rough sets, these semantic contexts are used implicitly and depend heavily on external references, such as the subscripts of set denotations. This implicitness is prone to producing intertwined external references when multiple simpler constructs are used to model a more complex one. Through cluster and other compound information granules, our system explicitly encodes such vital information inside the constructs, together with the related entities.
(4) By distinguishing granules that describe distinct attributes of a given entity as different aspect granules, the semantic intension of roughness theory, viz. ‘‘to present a way of approximately representing a class of entities characterized by some aspects in terms of entity collections described at other aspects’’, is revealed naturally by the process of finding granule approximations. Because the contexts are used implicitly, this intension of the original Rough Set Theory has long been concealed, which has led to abuses and mistakes in applications of approximation, producing results of little semantic significance. 4. Granular Rough Theory adapted to multi-agent context Encapsulating all relevant elements into individual granules and then developing a roughness theory over granules is motivated by the expectation that, with proper effort, the rough approach could play a crucial role in knowledge-central systems. As mentioned above, each data cell of an Information Table gives the minimal significant semantics of the corresponding entity, viz. a triple of the form $(u_i, a_j, v_k)$, with the sentential meaning ‘‘Entity $u_i$ has the attribute $a_j$ with the value of $v_k$’’. Such a triple is referred to as the elementary unit, the atomic granule, of the Granular Representation Calculus. With the facilities of GrRC, a theory of roughness is extracted from the granular aspects based on pure mereological relations over information granules, referred to as Granular Rough Theory. Now, shifting the context of information granules to a multi-agent system, each agent can have her own knowledge/belief/etc. of the outer world, which means there would be multiple Information Tables in the entire system. Hence, the two-dimensional Information Table can only represent the viewpoint of a specific agent, whereas the triple $(u_i, a_j, v_k)$ encodes incomplete information without mentioning the corresponding agent. Our new approach therefore sets out by incorporating the missing element, the viewpoint of the agent, into the atomic granule, bringing in a quadruple $(ag_t, u_i, a_j, v_k)$ with the complete semantics a data cell can express, i.e., ‘‘Agent $ag_t$ knows/believes/etc. that entity $u_i$ has the attribute $a_j$ with the value of $v_k$’’. In this section, based on the new quadruple form of atomic information granules, we investigate some issues of adapting granular rough theory to multi-agent systems, so as to enrich the fundamental research of roughness. 4.1. Granule Space 4.1.1. Coordinate alternative for granule representation The tabular Information Table has some equivalent extensional forms that visualize it with other usual mathematical diagrams. For a given triple, i.e. an atomic information granule $j(u_i, a_j, v_k)$ in the original Information Table, it is natural to suppose that there exists a functional relation $F: U \times A \rightarrow V$, which states that the specific attribute value of a granule is determined by the entity identity and the attribute type, viz. $v_k = F(u_i, a_j)$. This equation can be represented by a valued point in a two-dimensional coordinate system, as shown in Fig. 2. Some remarks are needed to make such a representation a viable auxiliary tool. First, each triple is now a visualized point $Pt(u_i, a_j)$ with two coordinate arguments, the entity identity and the attribute type, whereas the value $v_k$ is assigned to the point as its associated functional value dependent on its coordinates. Secondly, each coordinate is a discrete set, sorted only by the given index, which distinguishes the coordinate system in Fig. 2 from common mathematical plot systems in the following ways: (1) there are only non-negative indices, which makes the plot system a quarter plane; (2) since the indices are arbitrarily given, the relative position of two points, which is essential in a real Euclidean coordinate system, makes little sense; (3) the value associated with a point is a valuation of the attribute type related to this point, so the values of two different points may not be comparable. With the shift of discourse context to multi-agent systems, a quadruple $(ag_t, u_i, a_j, v_k)$ becomes the core presentation form of an atomic information granule. F is adapted as $F': Ag \times U \times A \rightarrow V$ with $v_k = F'(ag_t, u_i, a_j)$. In a similar manner, this extended atomic granule can be plotted as a valued point in the three-dimensional coordinate system illustrated in Fig. 3. Even though many restrictions prevent the coordinate system for information granules from being spatially significant in the Euclidean sense, the three-dimensional system given in Fig. 3 does present us with a convenient way of clarifying the
Fig. 2. Coordinate representation of triples.
Fig. 3. Coordinate representation of quadruples.
relationship between information granules and their dependent factors in a multi-agent environment. In the rest of this section, we attempt to give the necessary definitions and descriptions of such a system, which is referred to as Granule Space. 4.1.2. Rudiments of Granule Space Informally, Granule Space is a hypothetical, qualitative, quasi-Cartesian coordinate system with three non-negative axes, standing respectively for arbitrarily ordered discrete agent viewpoints, entity identities in the universe of discourse, and attribute types for entities. It holds three-dimensional points qualified by the three coordinates, each of which is valued as the specific attribute value of an entity in the viewpoint of the associated agent. The following properties of Granule Space should be noted from the definition given above: (1) By the term quasi-Cartesian, the coordinate system of Granule Space is similar to a restricted or pruned form of the common Cartesian coordinate system. The similarity consists in the convention of coordinate representation and interpretation for points, which means that in Fig. 3 the corresponding coordinate value can be obtained by an orthogonal projection from a point to a specific axis. (2) Granule Space is a hypothetical system, which means it is only used to locate conceptual objects and is not fit for physical spatial purposes without deliberate modification. (3) Granule Space is not a quantitative but a qualitative system, owing to the differences of its axes from their counterparts in general Cartesian systems stated in the last subsection, including the non-numerical semantics of the coordinates and the incomparability of values associated with different points for distinct attributes. In such a system, it is not feasible to define a quantitative metric evaluating the distance between two information granules denoted by points.
(4) The Granule Space is intrinsically discrete, in two respects: the coordinate axes stand for discrete notions such as agent views, entities and attribute types, and the contents of a Granule Space are individual information granules dispersed in this conceptual space. The Granule Space is therefore never continuous: the interval between two neighboring coordinate graduations carries no meaning, and nothing lies between two neighboring points. The meaningful coordinate graduation planes thus crisscross to form a 3D grid. Moreover, the point standing for an information granule can be replaced with a more natural visualization, a sphere centered on the point $Pt(ag, u_i, a_j)$, as indicated by the dashed circles in Fig. 3.
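The coordinate conventions above can be made concrete with a small sketch. It assumes a hypothetical Python representation in which each point $Pt(ag, u_i, a_j)$ is a named tuple used as a dictionary key, with the attribute value as the stored value; `GranulePoint`, `project` and `graduations` are illustrative names, not part of GrRC. No distance function is defined, in keeping with property (3).

```python
from typing import Dict, List, NamedTuple


class GranulePoint(NamedTuple):
    """A point Pt(ag, u_i, a_j) in the (hypothetical) Granule Space."""
    agent: str       # agent-viewpoint coordinate
    entity: str      # entity-identity coordinate
    attribute: str   # attribute-type coordinate


def project(point: GranulePoint, axis: str) -> str:
    """Orthogonal projection of a point onto one axis ('agent', 'entity' or 'attribute')."""
    return getattr(point, axis)


def graduations(space: Dict[GranulePoint, object], axis: str) -> List[str]:
    """Discrete labels occupied on an axis; nothing lies 'between' two of them.

    The labels are sorted only to give a deterministic listing: the axis order
    itself is arbitrary and carries no metric meaning.
    """
    return sorted({project(p, axis) for p in space})


# Each point is valued with the attribute value of an entity in one agent's view.
space: Dict[GranulePoint, object] = {
    GranulePoint("ag1", "u1", "color"): "red",
    GranulePoint("ag2", "u1", "color"): "orange",
    GranulePoint("ag2", "u2", "size"): "big",
}
```

For instance, `project(GranulePoint("ag2", "u2", "size"), "entity")` returns `"u2"`, and the two agents' differing values for `u1` coexist as distinct points.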
Fig. 4. An information cube in granule space.
4.1.3. Information Cube
Such a three-dimensional Information Cube is a natural extension of the original 2D Information Table structure to multi-agent systems. It encodes the knowledge/belief of each agent and so gives the information structure itself additional functions, viz. the ability to evaluate, synthesize and harmonize distributed viewpoints. Fig. 4 gives a qualitative view of the Information Cube, omitting the attribute value associated with each granule; a quantitative view can also be given as a 3D grid in which each cell holds an attribute value, like a pile of layered Information Tables.
For a builder or analyzer of Information or Knowledge Systems, the value of an information representation methodology lies in the connotations that can be extracted from it. We now make a naïve exploration of the implicit connotations the Information Cube incorporates. As with any cubic object, we begin with its three inherent planes parallel to the axis planes: the Entity-Attribute plane, the Agent-Attribute plane and the Agent-Entity plane. A plane of the Information Cube is equivalent to a layer of arrayed information granules. The Entity-Attribute plane cut at a specific agent graduation $ag_t$ represents the complete viewpoint of agent $ag_t$ on the universe of discourse, namely the Information Table in her mind, also referred to as the Agent Sight. The Agent-Attribute plane intersecting the Entity axis at a specific entity identity $u_i$ stands for all agent views of entity $u_i$, referred to as the Entity Perspective. The Agent-Entity plane meeting the Attribute axis at graduation $a_j$ indicates the extensional values of attribute $a_j$ over all entities from all agents' viewpoints, referred to as the Attribute Extension.
4.2. Quadruples instead of triples
An entity has many aspects expressed by its multiple attributes, whereas the value of the same attribute may be inconsistent across different observers' sights. This is the case in a multi-agent system, in which the quadruple structure of an information granule is appropriate for conveying the personality of agents in their views of objective entities. As the adapted form of the conventional Information System in such a context, we define a Multi-Agent oriented M-Information System $I_M = (Ag, U, A)$, given by an Information Cube, where $Ag$ is the set of agent views, $U$ is the universe of discourse and $A$ is the set of attribute types. If rough methods are applied to analyze $I_M$, it is crucial to take into account the semantic goals that can be achieved through roughness; otherwise the analysis is pure imaginary symbolism that makes no sense.
Substituting the quadruple $\varphi(ag_t, u_i, c_j, v_k)$ for the triple form atomic granule $\varphi(u_i, c_j, v_k)$ modifies the underlying granular calculus into an extended version, the M-Granular Calculus $C_M$. In $C_M$, most of the basic operations are consistent with the original system, while the internal structures of the generated compound granules incorporate the additional agent information. This leads to modified definitions of some special compound granules and to the introduction of new ones. Consider a compound granule $Q_1 = (\varphi_1, \varphi_2, \ldots, \varphi_n)$ aggregated from $n$ atomic granules $\varphi_1, \varphi_2, \ldots, \varphi_n$ via Aggregation. Grouping each element of the quadruples of $Q_1$ into the Agent Segment $Ag_{Q_1} = (ag_{t_1}, ag_{t_2}, \ldots, ag_{t_n})$, the Entity Segment $E_{Q_1} = (u_{i_1}, u_{i_2}, \ldots, u_{i_n})$, the Attribute Segment $A_{Q_1} = (a_{j_1}, a_{j_2}, \ldots, a_{j_n})$ and the Value Segment $V_{Q_1} = (v_{k_1}, v_{k_2}, \ldots, v_{k_n})$, we have $Q_1 = (Ag_{Q_1}, E_{Q_1}, A_{Q_1}, V_{Q_1})$.
The agent segment of a compound granule is critical in our analysis, since it decides whether the granule at hand lies in the local scope of one agent view or spans multiple views. Two cases of the agent segment are important: one in which it contains only one specific agent view $ag_t$, and one in which it contains all agent views of the entire M-Information System. The corresponding compound granules are named $ag_t$-Local Granules and Global Granules, respectively.
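The decomposition of a quadruple compound granule into its segments, and the local/global distinction read off the Agent Segment, can be sketched as follows. This is a minimal Python illustration under assumed names (`Quad`, `segments`, `scope`); the label `"partial"` for granules spanning several but not all agent views is a hypothetical placeholder, since the text names only the two extreme cases.

```python
from typing import Iterable, List, NamedTuple, Tuple


class Quad(NamedTuple):
    """Quadruple atomic granule phi(ag_t, u_i, c_j, v_k)."""
    agent: str
    entity: str
    attribute: str
    value: object


def segments(q: List[Quad]) -> Tuple[tuple, tuple, tuple, tuple]:
    """Split Q = (phi_1, ..., phi_n) into (Ag_Q, E_Q, A_Q, V_Q)."""
    return (
        tuple(p.agent for p in q),
        tuple(p.entity for p in q),
        tuple(p.attribute for p in q),
        tuple(p.value for p in q),
    )


def scope(q: List[Quad], all_agents: Iterable[str]) -> str:
    """Classify a compound granule by the agents present in its Agent Segment."""
    agents = set(segments(q)[0])
    if len(agents) == 1:
        return f"{next(iter(agents))}-local"   # ag_t-Local Granule
    if agents == set(all_agents):
        return "global"                        # Global Granule
    return "partial"                           # hypothetical label for other cases
```

A granule built only from `ag1`'s quadruples is classified `"ag1-local"`; once every agent of the system contributes at least one quadruple, it becomes `"global"`.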
Such a classification applies to the original definitions of Cluster Granule, Aspect Granule, Aspect Cluster Granule, and so on. It is natural for each participating agent to have the right to analyze her own Agent Sight of the universe and achieve a local perception of roughness. That is, by slicing the given Information Cube into layers parallel to the Entity-Attribute axis plane, the methods of the 2D Granular Rough Theory can be applied to the local compound granules in each layer, so as to roughly approximate agent-local decisional granules with agent-local conditional aspect cluster granules. For the M-Information System itself, the extended Granular Rough Theory should attain not only local roughness for each agent but also global roughness. A process similar to defining Granular Roughness from an Information Table applies directly: classify all the global Aspect Cluster Granules into Regular, Irregular and Irrelevant Granules with respect to a given global Decisional (Aspect) Cluster Granule, and define the global Kernel, Hull and Corpus Granules with respect to this Decisional (Aspect) Cluster Granule. Note that the Shift operation relies on an internal connection among information granules, which is interpreted as aspects belonging to the same entity in an Information Table, whereas in an Information Cube it is confined to aspects of the same entity in the same agent's view. By nature, both approaches reduce the 3D information structure given by an Information Cube to a 2D structure.
The former applies the Slicing method to cut the Information Cube into layers and confines each pass of investigation to a single Information Table; the latter implicitly uses the Flattening method to merge all layers of the Information Cube into one large global Information Table, in which the original universe of entities is enlarged by relabeling each entity with a new identity that incorporates its agent view. In this sense, the extended version of Granular Rough Theory is consistent with its triple version. But the extended version offers more than this reduction.
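The two reductions can be compared on a toy cube. The sketch below is an illustration under assumptions, not the paper's procedure: the cube is a Python dictionary keyed by (agent, entity, attribute); `slice_agent` returns one Agent Sight, `flatten` relabels entities as (agent, entity) pairs, and `approximations` computes classic lower and upper approximations, which we take to play the roles of the Kernel and Corpus Granules (their difference corresponding to the Hull). All names and data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Set, Tuple

Cube = Dict[Tuple[str, str, str], object]      # (agent, entity, attribute) -> value
Table = Dict[Hashable, Dict[str, object]]      # entity -> {attribute: value}


def slice_agent(cube: Cube, agent: str) -> Table:
    """Slicing: the Agent Sight (a 2D Information Table) of one agent."""
    table: Table = defaultdict(dict)
    for (ag, u, a), v in cube.items():
        if ag == agent:
            table[u][a] = v
    return dict(table)


def flatten(cube: Cube) -> Table:
    """Flattening: one global table whose entities are relabeled as (agent, entity)."""
    table: Table = defaultdict(dict)
    for (ag, u, a), v in cube.items():
        table[(ag, u)][a] = v
    return dict(table)


def approximations(table: Table, conds: List[str], decision: str,
                   target: object) -> Tuple[Set, Set]:
    """Lower/upper approximation of {u : table[u][decision] == target}."""
    classes: Dict[tuple, Set] = defaultdict(set)
    for u, row in table.items():
        classes[tuple(row[c] for c in conds)].add(u)   # indiscernibility classes
    x = {u for u, row in table.items() if row[decision] == target}
    lower = set().union(*(c for c in classes.values() if c <= x))
    upper = set().union(*(c for c in classes.values() if c & x))
    return lower, upper


# Toy cube: readability (r), innovation (i) and decision (d) for two papers.
cube: Cube = {
    ("ag1", "u1", "r"): 3, ("ag1", "u1", "i"): 4, ("ag1", "u1", "d"): "acc",
    ("ag1", "u2", "r"): 3, ("ag1", "u2", "i"): 4, ("ag1", "u2", "d"): "rej",
    ("ag2", "u1", "r"): 3, ("ag2", "u1", "i"): 4, ("ag2", "u1", "d"): "acc",
    ("ag2", "u2", "r"): 3, ("ag2", "u2", "i"): 4, ("ag2", "u2", "d"): "acc",
}
```

Here agent `ag1` obtains an empty lower approximation (its `u1` and `u2` are indiscernible but decided differently), agent `ag2` obtains `{u1, u2}`, and the flattened global table again has an empty lower approximation. Because flattened entities are (agent, entity) pairs, aspects connect only within one agent's view, matching the restriction on Shift noted above.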
4.3. Challenges of multi-agent context
Here we are confronted with a puzzle. In terms of the motivation of roughness theory, roughness is developed to discover the decision rules of an Information System, so that reasoning based on these rules can infer the decision attribute value of an entity from some of its conditional attribute values. On the other hand, the most important contribution of roughness theory is a way of approximating a set of entities from inside and outside the set, leaving the undecided area as the boundary, or the Hull Granule in Granular Rough Theory. In a non-agent-oriented system, there is no great conflict between these two methodological connotations. In an M-Information System, however, different agents may have diverse knowledge/beliefs about the outer world, and drastic inconsistency arises. For instance, suppose there are two agents $ag_1$ and $ag_2$ in the system, and both infer the decision rule “each paper with a readability of 3 points and an innovation of 4 is accepted” from their own Information Tables. This rule translates into the representation form “the class of papers that will be accepted can be approximated by the class of papers whose attribute readability has value 3 and whose attribute innovation has value 4”. Since the attributes “readability” and “innovation” are both somewhat subjective, the local Information Tables of the two agents may differ considerably in their actual data distributions, yet happen to yield common rules that describe only the inference relation between attribute values. For a concrete paper, it may then be hard to decide whether it qualifies without further effort to coordinate the contradictions between the agents' views. Conversely, if the Information Tables of all agents were identical, the multi-agent setting would add nothing worth analyzing. The rough approach to information analysis is thus challenged in the context of multi-agent systems.
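The puzzle can be reproduced in a few lines. In this hypothetical sketch, both agents hold the same rule, yet they rate one concrete paper differently, so the rule fires in one Agent Sight and not in the other. The ratings and the names `RULE`, `rule_fires` and `views` are invented for illustration.

```python
from typing import Dict

# Shared rule antecedent: readability = 3 and innovation = 4  =>  accepted.
RULE: Dict[str, int] = {"readability": 3, "innovation": 4}


def rule_fires(row: Dict[str, int]) -> bool:
    """True if an agent's description of the paper matches the rule antecedent."""
    return all(row.get(attr) == val for attr, val in RULE.items())


# The same concrete paper as seen by two agents (subjective attribute values).
views: Dict[str, Dict[str, int]] = {
    "ag1": {"readability": 3, "innovation": 4},
    "ag2": {"readability": 2, "innovation": 4},
}

verdicts = {agent: rule_fires(row) for agent, row in views.items()}
```

`verdicts` is `{"ag1": True, "ag2": False}`: the rule is common to both agents, but the decision for this paper is not.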
This puzzle stems from multiple factors that affect the agents' views of the outer world, including the epistemic characteristics of each agent, the system deployment and other concrete environmental conditions. Establishing a systematic methodology to resolve it is beyond the scope of the present paper and is left as an open issue for future research. In the next subsection, some measurements of the diverse perspectives on entities in different agent sights are given as a starting point.
4.4. Auxiliary measurements
Returning to the three planes standing for Agent Sight, Entity Perspective and Attribute Extension in a given Information Cube, we can establish further measurements that may help alleviate the invalidity of the rough approach in some cases, or at least evaluate its current applicability. Slicing a given Information Cube parallel to the Agent-Attribute axis plane yields an array of information granules called the Entity Perspective of a specific entity. In the quadruple granular calculus, such an array of atomic granules is a special case of a compound granule, $Q_{EP_0} = (Ag, u_0, A, V)$, in which the Agent, Attribute and Value collections are the respective entire sets given by the Information Cube and the Entity collection contains the single entity $u_0$. Named after its plane, $Q_{EP_0}$ is called the Entity Perspective Granule; it contains all the perspectives on an entity in multiple agents' views. By the same process, slicing parallel to the Agent-Entity plane yields the Attribute Extension Granule $Q_{AE_0} = (Ag, U, a_0, V)$, which stands for the extensional valuation of a specific attribute type $a_0$ over all entities in each agent's view. By evaluating the similarity among the rows of a specific Entity Perspective Granule, the degree of inconsistency of an entity's perspectives across agent views can be calculated; whereas
by assessing the similarity among the rows of a given Attribute Extension Granule, the subjectivity of an attribute, viz. its degree of dependence on the agents' personalities, can be found. If an attribute type is too subjective, that is, if its value depends too much on the arbitrary epistemic state of an agent, the attribute is bound to differ drastically in value for most entities, aggravating the inconsistency of the entities' perspectives. In such a case, we can consider new attribute types that characterize the entities more objectively, in order to obtain a more rational decision system. On the other hand, if subjectivity is not obvious but some entities show a much higher degree of inconsistency in their perspectives than the average, we can try to find the implicit reasons behind these special cases, so as to adjust the system or these special entities accordingly.
5. GrRC adapted as an upper-level formal ontology language
An appealing motivation for Granular Rough Theory is to exploit the underlying relationships between Mereology and Ontology, so as to apply the roughness methodology naturally and effectively in ontological computing environments. In this section, inspired by Kant's Synthetic a priori, the necessary elements of an upper-level formal ontology language are investigated. The compliance of GrRC in its current status with this framework is explicated, and tentative expansions that complement the ontological expressivity of GrRC are explored. In information systems, ontology is used to provide facilities for interoperation among multiple applications, especially in multi-agent centered systems, where multiple autonomous participants automate the business process. An ontology aimed at facilitating a specific interoperation purpose is referred to as a Domain Ontology.
An Upper-Level Ontology (ULO) is an abstract, application-independent description of entities in the real world, intended as an infrastructure for building domain ontologies. As the theory of being, Ontology is a cornerstone of philosophy that seeks the definitive and exhaustive classification of entities in all spheres of being [25]. The widely accepted definition of ontology in information systems, proposed by Gruber, is based on conceptualization, which represents objects, other entities and their mutual relations in the discourse: an ontology is an explicit representation of a conceptualization [26]. For ULOs such as GOL [27] and SUMO [28,29], the research objectives are to enable the development of a general-purpose ontology of common concepts, so as to provide the basis for domain ontologies, and to ensure interoperability among different domain ontologies via ULO compliance.
5.1. Synthetic a priori of Kant as fundamental framework of ULO
In [30], Colomb explored the connection of ULOs in information systems and Semantic Web research with Kant's Synthetic a priori. Following Kant's idea of discriminating structure from content in knowledge processing, Colomb suggested classifying ULO into two facets of representation: material ontology and formal ontology. Material ontology describes what is there in the discourse, whereas formal ontology concerns the expressive form of material ontology and is used to build new material ontologies. The Synthetic a priori supports an application-independent formal ULO with a comprehensive framework comprising the following elements.
Space: All beings in the world are presented to the human epistemic process in some spatial extensional form.
Time: Time exhibits the possibility of external objects being recognized simultaneously or consecutively. In information systems, the temporal order of events is more important than spatial information.
Quantity category: The Quantity category concerns Unity, Plurality and Totality. In this category, Kant holds that unified cognition of an entity with multiple parts is crucial; hence the composition principles should be a priori, which emphasizes the necessity of mereology in ontology building. In information systems, facts such as data cells composing tuples and then tables, and XML nodes composing trees, exhibit the key role played by part-to-whole composition. It is necessary to build mereological relations into the ontology language so as to encode the structure of information explicitly.
Quality category: The Quality category concerns Reality, Negation and Limitation. Reality specifies the objects an information system handles and responds to; negation gives rise to status changes of beings; limitation confines the boundary an information system deals with.
Modality category: The Modality category concerns Possibility/Impossibility, Existence/Non-existence and Necessity/Contingency.
Relation category: The Relation category concerns Inherence/Subsistence, Causality/Dependence, Community, etc. Inherence and subsistence relations specify relations between substances and properties; in GOL, they correspond to the relations between Substance and Moment. A causality relation exhibits the necessary consequence of an object, and dependency exhibits its necessary premise. A community relation shows mutual dependence and causality among multiple participants in a collection.
5.2. Extending ontological expressivity of GrRC
Following the necessary elements for representing entities and their relations in the objective world stated in Kant's Synthetic a priori, this subsection explores tentative extensions of the ontological expressivity of GrRC.
5.2.1. Space
Def.GrRCOnto-1: A Spatial Atomic Granule, $\varphi_{spat}$, is a quadruple $(u, s, c, v)$ adapted from the triple form atomic granule with an additional spatial element, stating that entity $u$ has attribute $c$ with value $v$ under the restriction of spatial information $s$.
Def.GrRCOnto-2: A Spatial Compound Granule, $Q_{spat}$, is generated by aggregation over spatial atomic granules and can be further aggregated or fused into new spatial compound granules of higher granularity.
Def.GrRCOnto-3: A Topological Relation is a mutual relationship resulting from relative spatial location, qualified by the valuation of the spatial element $s$. Specific to the application semantics of a given spatial information system, primitive topological relations include connection (overlapping), co-location, proper-part and so on. According to the WCH taxonomy, a topological relation is a special case of a mereological relation.
Two points must be considered in adapting GrRC to a specific spatial information system. First, the triple form atomic granule is modified into a quadruple incorporating an element for spatial location, and corresponding compound granules can be generated over the atomic ones. Second, operations dedicated to spatial information granules must be provided, especially for identifying and using the mereological relations among spatial granules that describe significant topological semantics. Lesniewski's Mereology, together with its underlying logical system, provides the theoretical foundation for Tarski's axiomatic Geometry of Solids [17]. In spatial informatics, methodologies such as the Region Connection Calculus [31] and Region-based Qualitative Geometry [18] are extensions of Tarski's Geometry of Solids. Since GrRC is built on pure mereological relations, future efforts should focus
on exploiting the affinities among these systems that originate from Mereology. As a source of inspiration for future work on the spatial ontological extension of GrRC, there are Casati and Varzi's Formal Theory of Holes [32], Barry Smith's Ontology of Boundaries [33], and Formal Mereotopology from Guarino et al. [34], all of which have been incorporated into SUMO to deal with spatial issues.
5.2.2. Time
In a clinical study information system, a patient record has the form “patient-id, diagnostic-date, diagnostic-attribute, diagnostic-value”. When GrRC is applied to represent this record, an atomic granule can be (patient-id & diagnostic-date, diagnostic-attribute, diagnostic-value), in which patient-id together with diagnostic-date becomes a unique identifier. This approach maps a clinical record to a GrRC representation without modifying GrRC, but it impairs the temporal significance carried by diagnostic-date, which narrows the applicability of the granule. A better solution is to make the temporal element inherent in the atomic granule, in the same way that spatial atomic granules are defined. With the introduction of a temporal element, two kinds of temporal representation must be considered, corresponding to time-point and time-interval. A time-point granule is a slice of a time-interval granule, which is a kind of framing. This resembles the case in GOL, where a situation is a frame on a situoid. Furthermore, owing to the inherent sequential character of time, time-point granules can be compared by their temporal order, and besides the sequential order, the overlapping parts of two time-interval granules can be identified.
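The two representations of the clinical record, and the temporal comparisons mentioned above, can be sketched as follows. `TemporalAtom`, `as_plain_triple` and `interval_overlap` are hypothetical names; intervals are assumed to be closed (start, end) pairs of dates, and the patient data are invented.

```python
from datetime import date
from typing import NamedTuple, Optional, Tuple

Interval = Tuple[date, date]   # closed interval [start, end]


class TemporalAtom(NamedTuple):
    """Temporal atomic granule (u, t, c, v): entity u has c = v at time-point t."""
    entity: str
    t: date
    attribute: str
    value: object


def as_plain_triple(g: TemporalAtom) -> tuple:
    """Mapping without modifying GrRC: (patient-id & date) becomes the entity identifier."""
    return ((g.entity, g.t), g.attribute, g.value)


def interval_overlap(a: Interval, b: Interval) -> Optional[Interval]:
    """Overlapping part of two time intervals, or None if they are disjoint."""
    start, end = max(a[0], b[0]), min(a[1], b[1])
    return (start, end) if start <= end else None


g1 = TemporalAtom("p1", date(2008, 1, 5), "temperature", 38.5)
g2 = TemporalAtom("p1", date(2008, 1, 2), "temperature", 39.1)
# Time-point granules can be ordered by t, which the plain triple obscures.
history = sorted([g1, g2], key=lambda g: g.t)
```

In the plain-triple mapping, `g1` and `g2` become two unrelated entities, whereas the temporal form keeps one entity `p1` whose readings can be ordered in time.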
For temporal information granules, the status transitions of given entities across time points or intervals are also of interest, which leads to the notion of process. In a process, the factors that fire status transitions correspond to the notion of events.
Def.GrRCOnto-4: A Temporal Atomic Granule, $\varphi_{temp}$, is a quadruple $(u, t, c, v)$ adapted from the triple form atomic granule with an additional time-point element, stating that entity $u$ has attribute $c$ with value $v$ at time-point $t$.
Def.GrRCOnto-5: A Temporal Compound Granule, $Q_{temp}$, is generated by aggregation over temporal atomic granules and can be further aggregated or fused into new temporal compound granules of higher granularity.
Def.GrRCOnto-6: An Interval Granule, $Q_{int}$, is a special temporal compound granule that takes all temporal atomic granules in time interval $T$ as ingredients.
Def.GrRCOnto-7: A Process Granule, $Q_{proc}$, is a special temporal granule sequentially aggregated from temporal atomic granules and interval granules, to model the status transitions of a temporal conception.
Def.GrRCOnto-8: An Event Extension Granule is an ordered pair aggregated from two temporal information granules, whose operands correspond to the pre-event and post-event statuses of specific entities. Since an event has its own intensional content, this granule is called an “extension” granule when the event is represented by a status transition.
Def.GrRCOnto-9: The Framing Operation is applied to a temporal interval granule to take a snapshot, yielding the temporal atomic granule of the specified time-point as its result.
5.2.3. Quantity category
Aggregation and fusion of information granules in GrRC provide a way of composition from parts to whole. With the notions of
granular order and granular cardinality, the numerical semantics of the internal structure are quantified. Among special compound information granules, the notion of totality is embodied by cluster and aspect cluster granules, corresponding to “class” in major ontology description languages. Note that cluster and aspect cluster granules are extensional representations of a class, viz. they qualify collections of information granules with attributes and their concrete valuations. In future efforts to extend GrRC, an intensional representation of class could be defined as an abstract conception without temporal-spatial extension.
5.2.4. Quality category
In GrRC, the notion of reality is presented by abstracting the system structures of real information systems as triple (or quadruple) form atomic granules and their aggregations. The notion of negation finds support in the temporal granule definitions, especially in event granules, which record status transitions of entities, where negation leads to status change. Limitation is ensured by the combination of the entity segment, attribute segment and value segment, which deterministically qualifies the system boundary.
5.2.5. Modality category
The Modality category finds a natural representation in GrRC-based roughness formation. The core concepts of modality are necessity and possibility. In modal logic [35], $(\Phi, \wedge, \vee, \neg, \Box, \Diamond)$, besides the standard logical connectives of propositional logic $(\Phi, \wedge, \vee, \neg)$, the necessity connective $\Box$ and the possibility connective $\Diamond$ are introduced to capture modality. Modal semantics can be explicated with the notion of possible worlds. Let $W$ be the totality of possible worlds $w$. If proposition $P$ is valid in all possible worlds in $W$, then $\Box P$ holds; if $P$ is valid in some of the possible worlds in $W$, then $\Diamond P$ holds.
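The possible-worlds reading can be checked directly. In the sketch below, the hypothetical helpers `box` and `diamond` evaluate $\Box P$ and $\Diamond P$ over a set of worlds. Taking the worlds accessible from an entity $x$ to be its indiscernibility class $[x]$, which is the standard rough-set reading of modal logic, $\Box$ applied to “$x$ belongs to $X$” yields the lower approximation and $\Diamond$ yields the upper approximation. The partition and target set are invented.

```python
from typing import Callable, Hashable, Iterable, Set, Tuple


def box(p: Callable[[Hashable], bool], worlds: Iterable[Hashable]) -> bool:
    """Necessity: P holds in every world."""
    return all(p(w) for w in worlds)


def diamond(p: Callable[[Hashable], bool], worlds: Iterable[Hashable]) -> bool:
    """Possibility: P holds in at least one world."""
    return any(p(w) for w in worlds)


def modal_approximations(universe: Set, eq_class: Callable[[Hashable], Set],
                         target: Set) -> Tuple[Set, Set]:
    """Lower = {x | box(in target) over [x]}, upper = {x | diamond(in target) over [x]}."""
    def in_target(w: Hashable) -> bool:
        return w in target

    lower = {x for x in universe if box(in_target, eq_class(x))}
    upper = {x for x in universe if diamond(in_target, eq_class(x))}
    return lower, upper


# Partition of U = {1, 2, 3, 4} into indiscernibility classes {1, 2}, {3}, {4}.
PARTITION = [{1, 2}, {3}, {4}]


def eq_class(x: Hashable) -> Set:
    """Indiscernibility class [x] in PARTITION."""
    return next(block for block in PARTITION if x in block)


lower, upper = modal_approximations({1, 2, 3, 4}, eq_class, {1, 3})
```

With $X = \{1, 3\}$, the lower approximation is $\{3\}$ (necessarily in $X$) and the upper approximation is $\{1, 2, 3\}$ (possibly in $X$).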
Yao and Lin generalize the roughness conception with modal logic [36], showing how the lower and upper approximations map to the necessity and possibility connectives. The lower approximation determines that the entities in it necessarily satisfy the specification of the decisional attribute, whereas the upper approximation states that the entities in it possibly satisfy the specification. From this point of view, modality can be represented in terms of GrRC by introducing the kernel granule and the corpus granule in a proper context.
5.2.6. Relation category
For inherence and subsistence relations, the inherence of property types in an entity is ensured by the intended meaning of each element of the triple model of information granules, and, following the conventions of Lesniewski's Mereology, the subsistence of a null entity is denied in GrRC. Representing causality and dependence relations requires further extension of temporal information granules, especially the process and event granules, which are designed specifically for expressing such relations.
6. Pragmatic aspects of Granular Rough Theory
GrRC is by nature a representation calculus, viz. it provides a way of expressing information structures through its operations rather than specifying the underlying implementation. Since a system contains a great number of information granules, efficient comparison and query mechanisms for information granules are crucial for this approach to be practical. In other words, a mapping from the symbolic operations of GrRC to computationally realizable constructs is the key to a full-fledged system. Two prototypes of Granular Rough Theory are presented, in terms of data representation model
mapping and object-oriented programming, respectively. The data representation model mapping implementation uses an open-source clinical information system based on the EAV model. The object-oriented programming implementation is developed with Java classes and objects that embody the notion of information granules. Moreover, the building blocks of an Ontology-Driven Web Information System framework are clarified, as a Granular-Rough computational Web intelligence infrastructure for upper-level applications.
6.1. EAV-based prototyping for GrRC
Aiming at rapid prototyping with a data-model mapping orientation, this subsection builds an implementation of GrRC on TrialDB, an open-source EAV system. Information granules in an information system are mapped to data records in an EAV database, and the corresponding operations on granules are mapped to database statements.
6.1.1. Mapping granular representation to EAV model in clinical systems
Knowledge discovery and data mining in clinical information systems are important applications of Rough Set Theory [37,38]. The semi-structured EAV model is the mainstream for representing heterogeneous data in clinical systems. As Nadkarni et al. pointed out in [39], major commercial clinical data management systems such as Oracle Clinical, ClinTrial and MetaTrial use EAV as their underlying model. As reported by the Clinical Informatics Management System team at the NIH (National Institutes of Health) [40], in tests of clinical data systems based on EAV, ER and CLOB (Character Large Object), the EAV model exhibited superior performance, making it the preferable data structure for analyzing clinical data. When classic Rough Set Theory is applied to analyze EAV systems, the data must first be pre-processed and transformed into ER tables. To accommodate a wide enough scope of information sources, the design of GrRC complies with what EAV, OEM and RDF have in common, viz. the “attribute-value” paradigm.
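The pre-processing step that classic Rough Set Theory requires, and the attribute-value reading that GrRC adopts instead, can be contrasted in a short sketch. Each EAV row (entity, attribute, value) is read directly as a triple form atomic granule $(u, c, v)$, whereas `pivot` rebuilds the ER-style Information Table. The function names and sample rows are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Tuple

Row = Tuple[Hashable, str, object]       # one EAV row: (entity, attribute, value)


def row_to_granule(row: Row) -> Tuple[Hashable, str, object]:
    """An EAV row already has the shape of an atomic granule (u, c, v)."""
    entity, attribute, value = row
    return (entity, attribute, value)


def pivot(granules: List[Tuple[Hashable, str, object]]) -> Dict[Hashable, Dict[str, object]]:
    """Rebuild the ER-style Information Table that classic Rough Set Theory expects.

    Attributes never recorded for an entity are simply absent, which is the
    sparsity the EAV model avoids storing as empty cells.
    """
    table: Dict[Hashable, Dict[str, object]] = defaultdict(dict)
    for u, c, v in granules:
        table[u][c] = v
    return dict(table)


rows: List[Row] = [
    ("p1", "temperature", 39.1),
    ("p1", "cough", "yes"),
    ("p2", "temperature", 37.0),
]
granules = [row_to_granule(r) for r in rows]
```

The triviality of `row_to_granule` is the point: GrRC can consume EAV rows as they are, and only the classic route needs `pivot`.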
It is more natural to apply rough data analysis to clinical information systems with GrRC, and open-source implementations of EAV-based clinical systems, such as TrialDB [39], provide a substantial foundation for a GrRC prototype. To implement GrRC over existing semi-structured systems, a feasible solution must address two issues: (1) how information granules are mapped to the representation units of the target system, and (2) how operations over granules are mapped to the computational mechanisms of the target system. As the primitive of GrRC, the triple form atomic granule $(u, c, v)$, stating “entity u has attribute c with value v”, corresponds to a single cell of an Information Table. This reflects the primary idea of the EAV model, in which ER tables are decomposed into discrete cells, and the value in each cell, together with its entity identifier and attribute name, constitutes a row of the EAV table. In [41], Anhøj summarizes the evolution of the EAV model in clinical databases. Fig. 5, from Anhøj, illustrates a schema for a simple EAV model. The Data table is a typical EAV table; patientID and date together form the primary key that uniquely identifies a test item for an entity. To express the Data table in Fig. 5 in terms of GrRC, the first problem is how to encapsulate a record of the form (patientID, date, attributeID, value) in a triple form atomic granule $(u, c, v)$. The solution depends on the specific requirements for mining association rules in clinical systems. When the mined association rules concern only relationships between symptoms and diseases, and the temporal characteristics of a given patient are not significant, the entity-identifier element of the atomic granule can be the
Fig. 5. Database schema for simple EAV model [41].
pair of patientID together with date. The EAV-based prototype of GrRC adopts this simple solution. When the data mining scenario must trace the temporal evolution of a symptom for a given patient, the mapping should instead extend the original atomic granule into a temporal atomic granule, a quadruple $(u, t, c, v)$ with the additional element $t$, as described in Section 5.2.2. This extension paradigm based on dimension augmentation is an important method for the further development of GrRC.
6.1.2. Mapping granular operations to EAV-DB queries
After atomic granules are mapped to the EAV representation, the granular operations relevant to roughness extraction must be translated into EAV-DB queries, so as to accomplish roughness formation over EAV. A full GrRC implementation should define computational mechanisms for all notions and operations, but as a rapid prototype, this subsection is concerned only with the immediately needed operations: aggregation, self-fusion, internal fusion, aspect shift and wrap. These operations are described in terms of the Structured Query Language (SQL), since most EAV systems, such as TrialDB, are built on standard commercial relational databases. The specific semantics of the aggregation operation vary with aggregation requirements. For the special compound granules dedicated to roughness formation, aggregation can be mapped to queries over a specified attribute collection qualified by a WHERE clause on the given attribute-value pairs. The full convergence operation is translated into an SQL statement qualified by GROUP BY, whereas self-fusion removes the grouping information. Aspect shift corresponds to more complex operations: the entity identifiers of the initial aspect cluster granule are used as conditions in subsequent queries, and the results are then grouped by the values of the target attributes. The wrap operation is realized by an INNER JOIN between the temporary tables representing two aspect cluster granules.
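The query mappings above can be tried on a minimal EAV table. The sketch uses Python's built-in `sqlite3` module rather than TrialDB, whose own query facilities are not reproduced here; the table name `eav`, the sample data and the function names are hypothetical, and joining the two temporary tables on the entity column is our assumption about how wrap pairs the granules.

```python
import sqlite3
from typing import List, Tuple

ROWS = [
    ("p1", "fever", "yes"), ("p1", "cough", "yes"), ("p1", "flu", "yes"),
    ("p2", "fever", "yes"), ("p2", "cough", "no"), ("p2", "flu", "no"),
    ("p3", "fever", "no"), ("p3", "cough", "no"), ("p3", "flu", "no"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE eav (entity TEXT, attribute TEXT, value TEXT)")
conn.executemany("INSERT INTO eav VALUES (?, ?, ?)", ROWS)


def aspect_cluster(attribute: str, value: str) -> List[str]:
    """Aggregation qualified by WHERE on one attribute-value pair."""
    cur = conn.execute(
        "SELECT entity FROM eav WHERE attribute = ? AND value = ? ORDER BY entity",
        (attribute, value))
    return [r[0] for r in cur]


def full_convergence(attribute: str) -> List[Tuple[str, int]]:
    """Full convergence via GROUP BY: entity counts per value of the attribute."""
    return conn.execute(
        "SELECT value, COUNT(*) FROM eav WHERE attribute = ? "
        "GROUP BY value ORDER BY value", (attribute,)).fetchall()


def aspect_shift(attribute: str, value: str, target: str) -> List[Tuple[str, int]]:
    """Entity ids of the initial granule condition a query grouped by the target."""
    return conn.execute(
        "SELECT value, COUNT(*) FROM eav WHERE attribute = ? AND entity IN "
        "(SELECT entity FROM eav WHERE attribute = ? AND value = ?) "
        "GROUP BY value ORDER BY value", (target, attribute, value)).fetchall()


def wrap(first: Tuple[str, str], second: Tuple[str, str]) -> List[str]:
    """Wrap as an INNER JOIN between two temporary tables (join key assumed)."""
    for name, (attr, val) in (("g1", first), ("g2", second)):
        conn.execute(f"DROP TABLE IF EXISTS {name}")
        conn.execute(f"CREATE TEMP TABLE {name} (entity TEXT)")
        conn.execute(f"INSERT INTO {name} SELECT entity FROM eav "
                     "WHERE attribute = ? AND value = ?", (attr, val))
    cur = conn.execute("SELECT g1.entity FROM g1 INNER JOIN g2 "
                       "ON g1.entity = g2.entity ORDER BY g1.entity")
    return [r[0] for r in cur]
```

For example, `aspect_shift("fever", "yes", "flu")` groups the flu values of the two feverish patients, giving one “no” and one “yes”.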
These operations can be conveniently realized with the ad hoc query mechanisms built into TrialDB [42].
6.1.3. Limitations of EAV-based prototype
The operations of relational database management systems take classic set theory as their theoretical foundation, and so do the EAV databases implemented over them. For a full implementation of GrRC, a purely mereological approach to roughness formation, a major limitation is that set-theoretic operations cannot describe all possible types of mereological relations, as well
800
B. Chen et al. / Applied Soft Computing 9 (2009) 786–805
as compound granules involving such relations amongst ingredients. Moreover, core components of TrialDB are based on WebEAV and SQLGEN, which are poorly encapsulated, and hardly portable. Albeit this, EAV-based prototyping provides a practical system in which notions and operations of GrRC are all computable elements, and it is crucial for GrRC to find its first natural application in clinical system. 6.2. Design for Java-based prototyping of Granular Rough Theory Due to limitations of EAV-based prototype for GrRC, it is more viable to a full implementation of Granular Rough Theory with an object orientation, where the granules are mapped to classes and their instantiated objects, and the granular operations are mapped to methods of objects. In this way, persistent storage and computational logic of information granules are isolated, making the framework more extensible and unified. Java prototype system for Granular Rough Theory is referred to as jGrRT, which implements various granules and operations of GrRC as its primary computational framework, with translators to process information sources of different representations into granules defined in GrRC, and builders to form roughness over granules. Major class diagrams are listed in Appendix A. In the jGrRT.GrRC package, the information granule hierarchy in GrRC is mapped to the class hierarchy. The bottommost granule class is AtomicGranule to present a triple, with an auxiliary class AVPair for convenience of parsing correspondence between attribute and value. All compound granule classes should implement the interface CompoundGranule, which specifies following methods: aggregate, fuse, wrap, order, cardinality. For convenience of unified parameter presentation in method stereotype, with the notion of trivial compound granule, atomic granule is operated as instance of a special compound granule, TrivialCompoundGranule. 
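To illustrate how such a hierarchy might look, the following Python sketch mirrors the class names given above (jGrRT itself is written in Java, and its code is not reproduced here). The `atoms` helper and the concrete semantics chosen for aggregate, fuse, wrap, order and cardinality are assumptions made for this sketch, not definitions taken from jGrRT; HigherOrderCompoundGranule and the roughness-specific classes are omitted.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass(frozen=True)
class AVPair:
    """Attribute-value correspondence parsed from an atomic granule."""
    attribute: str
    value: str


@dataclass(frozen=True)
class AtomicGranule:
    """Triple (entity, attribute, value), the bottommost granule."""
    entity: str
    attribute: str
    value: str

    def av_pair(self) -> AVPair:
        return AVPair(self.attribute, self.value)


class CompoundGranule(ABC):
    """Common interface of compound granules; method names follow the jGrRT text."""

    @abstractmethod
    def atoms(self) -> frozenset:
        """Atomic ingredients of the granule (a helper assumed by this sketch)."""

    @abstractmethod
    def order(self) -> int:
        """Nesting order (assumed here: 0 for trivial, 1 for plain granules)."""

    def cardinality(self) -> int:
        # Assumed meaning: number of distinct atomic ingredients.
        return len(self.atoms())

    def aggregate(self, other: CompoundGranule) -> PlainCompoundGranule:
        # Assumed meaning: collect the ingredients of both granules.
        return PlainCompoundGranule(self.atoms() | other.atoms())

    def fuse(self) -> PlainCompoundGranule:
        # Assumed meaning: flatten any nesting into a plain granule.
        return PlainCompoundGranule(self.atoms())

    def wrap(self, other: CompoundGranule) -> PlainCompoundGranule:
        # Assumed meaning, mirroring an INNER JOIN on entity identifiers:
        # keep the ingredients of entities that occur in both granules.
        shared = {a.entity for a in self.atoms()} & {a.entity for a in other.atoms()}
        return PlainCompoundGranule(
            a for a in self.atoms() | other.atoms() if a.entity in shared)


class TrivialCompoundGranule(CompoundGranule):
    """An atomic granule viewed as a compound one, for uniform method parameters."""

    def __init__(self, atom: AtomicGranule):
        self._atom = atom

    def atoms(self) -> frozenset:
        return frozenset({self._atom})

    def order(self) -> int:
        return 0


class PlainCompoundGranule(CompoundGranule):
    """A compound granule aggregated directly from atomic granules."""

    def __init__(self, atoms):
        self._atoms = frozenset(atoms)

    def atoms(self) -> frozenset:
        return self._atoms

    def order(self) -> int:
        return 1
```

With this design, an atomic granule is passed wherever a compound one is expected simply by wrapping it as `TrivialCompoundGranule(atom)`, which is the uniformity the text attributes to trivial compound granules.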
In the jGrRT hierarchy, the classes for non-trivial compound granules are PlainCompoundGranule and HigherOrderCompoundGranule. The former provides the internal convergence methods horizonConv, verticalConv and fullConv; the latter provides a selfFuse method that transforms it into a plain granule. AspectGranule, ClusterGranule and AspectClusterGranule are special compound granule classes for roughness formation, with an additional merge method that generates aspect cluster granules of higher granularity. Within an information repository, the value of a granule in one sub-universe of discourse can itself be a granule in another sub-universe; such granules are represented by the GrValueGranule class. Determining the mereological relation between granules is crucial to the success of roughness formation. These methods are encapsulated in a series of classes implementing the MereologicalRelationDeterminer interface, which specifies the possible mereological relations: A is-part-of B, B is-part-of A, A is-equal-to B, A is-proper-part-of B, B is-proper-part-of A, proper-overlap, disjoint, etc. The generalized part-to-whole relation is determined by the class GeneralizedMereRDeterminer using the wrap operation on granules. For the canonical part-to-whole relation, CanonicMereRDeterminer is an abstract class whose key methods are left undefined until they are implemented for concrete sources in specific sub-packages. jGrRT is concerned with two mainstream representation models. The sub-package jGrRT.GrRC.RelationalModel supports multi-table relational information sources, which are typical of real-world databases. The intention is that roughness formation should span multiple tables without unnatural preprocessing, such as preparing a single Information Table through a view operation. CompoundCell is a subclass of GrValueGranule that contains an information granule as its value. The RelationSchema class is an aggregation of CompoundCell objects used to present the
database schema; it provides a generateSchemaLinkMap method that produces a linked list for the schema so that relationships can be traced conveniently. RelationalCanonicMereRDeterminer extends CanonicMereRDeterminer to implement the key methods for mereological relations in relational data. RDF model support is provided in the jGrRT.GrRC.RDFModel sub-package. The RDFTreeNode interface specifies unified access to all leaf and non-leaf nodes of an RDF tree. NonLeafRDFTreeNode extends GrValueGranule; its values are granules, which in tree terms are its child nodes. CompoundNode is a special non-leaf node standing for Bag, Sequence and other container types. RDFTreeCanonicMereRDeterminer handles canonical part-to-whole relation determination for the RDF model. In a complete implementation, the RDF sub-package should be able to describe arbitrarily complex RDF sources; its completeness in this respect still needs verification.

6.3. Towards an Ontology-Driven Web Information System (ODWIS)

In [43], Guarino proposes Ontology-Driven Information Systems, emphasizing the distinction between ontology-aware and ontology-driven systems. In the first case, a component of an information system is merely aware of the existence of an ontology and can query it for application-specific purposes. In the second case, the ontology is an indispensable component in its own right, cooperating at run time towards the overall system goal. In this section, an Ontology-Driven Web Information System is suggested, in the expectation that it could be a candidate engine for computational Web intelligence [44]. ODWIS acts as a mediator between users and information sources: it collects descriptions of sources across the Internet, answers queries from clients, and additionally supports knowledge discovery over the descriptions or the query logs.

6.3.1. Upper level formal ontology

An upper level formal ontology specifies the expressive forms of domain ontologies, so that different domain ontologies share a common representation infrastructure. When agents in different domains need to collaborate, they can exploit this commonality to map one domain ontology to another. In systems where domain ontologies are generated, maintained or updated through ontology learning, the upper level formal ontology provides a meta-framework: directed by its specifications, these systems parse concrete descriptions of information sources into domain ontologies. As mentioned previously, with the thorough extension of GrRC along the framework of Kant's synthetic a priori, the upper level ontology in ODWIS can be described in terms of GrRC.

6.3.2. Domain ontology

Domain ontology plays a core role in ODWIS, providing the global worldview for all domains of interest. A domain is a particular universe of discourse standing for phenomena in a specific context, for instance the bibliographical, medical or automobile manufacturing domain. Each domain has a domain-specific ontology, which specifies what classes of entities are present in the information sources, what aspects can be inspected over those entities, what component/integral-object mereological relations may exist between granules, what attribute sets the entities in a sub-universe can have, and so on.

6.3.3. Domain-specific description repositories

There are as many domain-specific description Information Repositories as there are domains of interest. Each conforms to the “schema” defined by the corresponding domain ontology and is termed a Domain-Specific Description Repository (DescRep). A DescRep contains information granules describing aspects of Web information sources, extracted from RDF-like annotations of these sources. Each information granule must describe aspects of entities of exactly the types specified in the domain-specific ontology. Information granules in a DescRep need not be complete in the sense of describing all aspects of an entity, which makes the repository tolerant of incomplete information.

6.3.4. Description collecting

Since ODWIS is intended to act as a mediator system, it naturally needs facilities for collecting descriptions of widespread heterogeneous Web information sources, in the form of annotations, so that it can answer complex queries. A prerequisite for description collecting is that a meta-level annotation mechanism complying with the domain ontologies is widely adopted in the construction of Web information sources; this is one major focus of the future development of this work. Two styles can fulfill this functionality: ODWIS-Initiated Collecting and a Description Registry Service Provider. The former is closely related to agent technologies: it uses agents (so-called Web crawlers) that crawl the Web to gather descriptions from the annotations residing with each information source. The latter provides open description registry facilities, in the same way that some global name servers supply naming services. The collected annotations are then validated for conformance to the specified domain ontology in ODWIS. If there are inconsistencies, either a semantics-oriented domain-specific ontology translation is performed or the inconsistent descriptions are discarded. The resulting annotations are then transformed into information granules and added to the corresponding domain-specific DescRep.

6.3.5. Information and knowledge retrieval

An end user initiates an ODWIS query through one of several communication mechanisms. In particular, dedicated domain ontologies can be developed for different kinds of end-user equipment, to separate information presentation from content. The server side can then focus on the content of results, leaving the formatting required by the various access methods to special functional components. From the end user's viewpoint, the most desirable form of query is a propositional sentence in natural language. Because of their intrinsic affinity, such a sentence can be parsed by auxiliary utilities into information granules of triple form. It is essential that queries follow the specifications of the domain ontologies; such conformance can be enforced by predefined Web pages that provide standard, domain-ontology-oriented query generation forms. On the ODWIS server side, received queries are matched to their corresponding domains, mainly on the basis of the domain information encoded in the query granules. If no such information is encoded, it is hard to determine in which domain the user intends to retrieve information sources. In such a situation, the domain of some queries can be inferred by comparing the attribute collections of the query granules, with their names and value types, against the domain-specific ontologies. This issue is potentially deep and requires further work. Once the domain-specific ontology is matched, the domain-specific DescRep, i.e., the totality of description granules of information sources in that domain, is also determined. Then these description
granules are compared with the query granules to obtain the resulting granules. From the viewpoint of GrRC, this comparison finds the aspect cluster granule with respect to the attribute and valuation collection of the query granule. The resulting granules are transformed into sentences returned to users as answers. Optionally, a query result cache can be incorporated into the implemented system for higher performance.

6.3.6. Knowledge discovery and ontology learning

The knowledge discovery components in ODWIS take advantage of the intrinsic connections of the underlying representation model. By applying Granular Rough Theory to the description granules in each domain-specific description repository, association or decision rules between multiple aspects of information sources can be obtained, and information sources can be classified into predefined classes. Clustering of information sources can also be achieved. It is often more desirable to analyze the activities of end users, which calls for mining historical queries. A Query Logging component logs all compound query granules into domain-specific Log Repositories (LogRep). The knowledge discovery mechanisms applied to DescRep can then be adapted to the logging context, with additional requirements such as mining sequential user activities.

7. Concluding remarks

Rough Set Theory has established a major area of soft computing, and its potential has not yet been fully revealed. The present paper is not intended as an alternative that replaces it, but as a complement to classical Rough Set Theory built by a different construction, so as to emphasize the specific representational semantics of roughness.
Another purpose of constructing a system purely on Mereology is to call for more effort in this area, so as to take advantage of its intrinsic relationship with ontology in building new methodologies and theories; we hope this will lead to a renaissance of Lesniewski's great systems in the computing community. The related efforts of the present work result in the following major contributions:

(1) The representational semantics of roughness is clarified as an essential feature that makes the roughness methodology independent of other soft computing approaches. It is proposed to adjust the underlying representation model of classic Rough Set Theory in order to explicitly encode the semantic contexts underlying the schema of the original Information Tables. The design of the new representation model aims to widen the natural applicability of the roughness methodology by accommodating semi-structured data with tuples of the “attribute-value” form. Mereological relations are used to describe the structural relations in the new representation model because of their rich application semantics. The motivation for building a pure mereological approach to roughness lies not only in exploiting the close relationship among mereology, spatial informatics and ontology, but also in exhibiting the potential of mereology for building interdisciplinary methodologies.

(2) A representation model, Granular Representation Calculus, is presented. In GrRC, the primitive notion is the triple-form atomic granule, which encapsulates the minimal complete semantic unit of an information system. Compound granules are aggregated from atomic ones and then compose more complex structures through aggregation and fusion operations. Since GrRC serves as the common representation model for both ordinary information sources and the roughness methodology, some special kinds of compound granules with dedicated operations are defined, and a mechanism for identifying mereological relations is discussed, in order to support roughness formation. A roughness formation approach based on GrRC is then proposed. Aspect shift is performed from the conditional granules to the decisional attribute, and the results are wrapped with the decisional granules to identify the reciprocal part-to-whole relations; on this basis, the conditional granules are classified into regular, irregular and irrelevant granules with respect to a given decisional granule. All regular granules aggregate into the kernel granule, which stands for the lower approximation. The irregular ones form the boundary, named the hull granule. Aggregating the kernel and the hull granules yields the upper approximation, called the corpus granule. The distinction between Granular Rough Theory and Rough Mereology is clarified in terms of their different viewpoints on incorporating mereology.

(3) Tentative solutions are presented for adapting Granular Rough Theory to ontological computing and multi-agent contexts. Based on design considerations in major upper level ontologies, ontologically applicable concepts in GrRC are explicated or added in terms of the space, time, quantity, quality, modality and relation categories of Kant's framework of the synthetic a priori. For multi-agent systems, a representation of their particular information systems is defined. The notion of Information Cube is used to visualize special compound granules and to analyze how roughness is formed in multi-agent systems. Epistemic collision among agents is explored, and two auxiliary measurements are defined to alleviate the problem.

(4) Two prototypes implementing Granular Rough Theory are presented, oriented respectively towards data representation and object-oriented programming. The data-representation-oriented implementation uses an open source clinical information system based on the Entity-Attribute-Value model. The object-oriented implementation uses Java classes and objects to incarnate the notion of information granules. Moreover, the functional aspects of the Ontology-Driven Web Information System framework are illustrated as a rough-granular computational Web intelligence infrastructure for upper-level applications.
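The roughness formation procedure summarized in contribution (2) can be sketched under simplifying assumptions: granules are represented as sets of entity identifiers, conditional cluster granules are obtained by grouping entities on their conditional attribute values (standing in for aspect shift and wrap), and mereological relations are decided by set comparison. This set-based reading departs from GrRT's pure mereological foundation and only shows how the kernel, hull and corpus granules relate; the data and all names (`TABLE`, `mereological_relation`, `cluster_granules`, `form_roughness`) are hypothetical.

```python
from collections import defaultdict

# Hypothetical decision table: entity -> {attribute: value}.
TABLE = {
    "p1": {"fever": "high", "cough": "yes", "flu": "yes"},
    "p2": {"fever": "high", "cough": "yes", "flu": "no"},
    "p3": {"fever": "normal", "cough": "no", "flu": "no"},
    "p4": {"fever": "high", "cough": "no", "flu": "yes"},
}


def mereological_relation(a: frozenset, b: frozenset) -> str:
    """Set-based stand-in for a mereological relation determiner (an assumption)."""
    if not (a & b):
        return "disjoint"
    if a == b:
        return "is-equal-to"
    if a < b:
        return "is-proper-part-of"
    if b < a:
        return "has-proper-part"
    return "proper-overlap"


def cluster_granules(table, attributes):
    """Group entities by their valuation of the given (conditional) attributes."""
    clusters = defaultdict(set)
    for entity, row in table.items():
        clusters[tuple(row[a] for a in attributes)].add(entity)
    return {key: frozenset(members) for key, members in clusters.items()}


def form_roughness(table, conditional, decision_attr, decision_value):
    """Classify conditional granules and build kernel, hull and corpus granules."""
    decisional = frozenset(
        e for e, row in table.items() if row[decision_attr] == decision_value)
    labels, kernel, hull = {}, set(), set()
    for key, granule in cluster_granules(table, conditional).items():
        relation = mereological_relation(granule, decisional)
        if relation in ("is-equal-to", "is-proper-part-of"):
            labels[key] = "regular"      # wholly part of the decisional granule
            kernel |= granule
        elif relation == "disjoint":
            labels[key] = "irrelevant"   # shares nothing with it
        else:
            labels[key] = "irregular"    # overlaps it without being part of it
            hull |= granule
    corpus = kernel | hull               # kernel aggregated with hull
    return labels, frozenset(kernel), frozenset(hull), frozenset(corpus)


if __name__ == "__main__":
    labels, kernel, hull, corpus = form_roughness(TABLE, ["fever", "cough"], "flu", "yes")
    print(labels)
    print(sorted(kernel), sorted(hull), sorted(corpus))  # ['p4'] ['p1', 'p2'] ['p1', 'p2', 'p4']
```

In the sample table, the cluster of patients with high fever and no cough is wholly part of the decisional granule (flu = yes) and forms the kernel, the high-fever-with-cough cluster only overlaps it and forms the hull, and the corpus is their aggregation, matching the lower approximation, boundary region and upper approximation of classical rough sets.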
The following open issues require further effort to improve the current status of Granular Rough Theory:

(1) Theoretical issues of classic Rough Set Theory, such as the crucial problems of attribute reduction and of calculating the core of an information system, should also be formally treated in Granular Rough Theory.
(2) The key to successful roughness formation with GrRT is an efficient mechanism for identifying canonical mereological relations in information sources originally represented in various models. This issue is closely related to algorithmic improvements on various data structures in graph theory.
(3) To extend GrRC to ontological computing environments, GrRC needs practical improvements to become an expressive ontology language capable of describing a comprehensive upper level formal ontology for granular-rough computational Web intelligence.
(4) In multi-agent contexts, different strategies for adapting GrRT can be implemented from a representational-semantics-oriented viewpoint, especially roughness formation spanning the views of multiple agents, which could be applied to collision resolution and social collaboration among agents.
(5) Techniques for the spatial representation and visualization of information granules should be investigated, so as to build a structured and visual methodology for analyzing and solving problems, which is the essence of granular computing.
(6) The Ontology-Driven Web Information System framework should be fully implemented and encapsulated as a pluggable component of granular-rough computational Web intelligence that can be deployed conveniently, dynamically and flexibly on heterogeneous infrastructures.
(7) Branches of logic, especially modal logic, epistemic logic and description logic, should be investigated to provide theoretical support for GrRT.
Acknowledgments

The research in the present work is partially supported by the Chinese national eleventh Five-Year high-tech support program project “research on key-tech of service interaction infrastructure in modern service industry” (Grant Number: 2006BAH02A0407). The authors are grateful for the open source code of TrialDB provided by the Medical Information Center of Yale. The authors are also grateful to the reviewers and editors for their comments and suggestions on improving the quality of this work.
Appendix A. Major class diagrams of jGrRT
References

[1] Z. Pawlak, Rough sets, International Journal of Computer and Information Sciences 11 (1982) 341–356.
[2] Z. Pawlak, A treatise on rough sets, Transactions on Rough Sets IV (2005) 1–17.
[3] E.C. Luschei, The Logical Systems of Lesniewski, North-Holland Publishing Company, Amsterdam, 1962.
[4] A. Whitehead, B. Russell, Principia Mathematica, second ed., Cambridge University Press, reprinted 1957.
[5] L. Polkowski, A. Skowron, Rough mereology, in: Z.W. Ras, M. Zemankova (Eds.), Proceedings of the 8th International Symposium on Methodologies for Intelligent Systems, Lecture Notes in Computer Science, vol. 869, Springer-Verlag, October 1994, pp. 85–94.
[6] A. Skowron, L. Polkowski, Rough mereological foundations for design, analysis, synthesis, and control in distributed systems, Information Sciences, Elsevier Science Inc., 1998.
[7] Z. Pawlak, A. Skowron, Rough membership functions, in: R.R. Yager, M. Fedrizzi, J. Kacprzyk (Eds.), Advances in the Dempster-Shafer Theory of Evidence, Wiley, New York, 1994, pp. 251–271.
[8] B. Chen, M. Zhou, A naïve exploration on intensions of rough mereology, in: Proceedings of the 2nd National Conference on Rough Set and Soft Computing of China (CRSSC’2002), Special Issue of Computer Science, vol. 29, no. 9, 09/2002, pp. 7–10 (in Chinese).
[9] B. Chen, M. Zhou, Re-examine semantics of rough theory, in: Proceedings of the 2nd National Conference on Rough Set and Soft Computing of China (CRSSC’2002), Special Issue of Computer Science, vol. 29, no. 9, 09/2002, pp. 106–109.
[10] B. Chen, M. Zhou, A Lesniewski mereological analysis on roughness theory, Computer Science, vol. 33, no. 7, 2006, pp. 171–175 (in Chinese).
[11] B. Chen, M. Zhou, A pure mereological approach to roughness, in: G. Wang, Q. Liu, A. Skowron, et al. (Eds.), Proceedings of the RSFDGrC 2003, Chongqing, China, LNCS/LNAI 2639, Springer-Verlag, 2003, pp. 425–429.
[12] B. Chen, M. Zhou, Adapting granular rough theory to multi-agent context, in: G. Wang, Q. Liu, A. Skowron (Eds.), Proceedings of the RSFDGrC 2003, Chongqing, China, LNCS/LNAI 2639, Springer-Verlag, 2003, pp. 701–705.
[13] B. Chen, M. Zhou, Extending granular representation calculus for Internet media resources, in: J. Li, et al. (Eds.), Proceedings of the 3rd International Conference on Active Media Technology (ICAMT’04), World Scientific, Chongqing, China, 2004, pp. 571–576.
[14] B. Chen, M. Zhou, ODWIS as a prototype of knowledge service layer in semantic grid, in: K.M. Liew, et al. (Eds.), Proceedings of the 5th International Conference on Parallel and Distributed Computing, Applications and Technologies, Singapore, LNCS 3320, Springer-Verlag, 2004, pp. 772–776.
[15] B. Chen, M. Zhou, Granular rough theory research, Journal of Software 19 (3) (2008) 565–583 (in Chinese).
[16] R. Agrawal, T. Imielinski, A. Swami, Mining association rules between sets of items in large databases, in: Proceedings of the ACM SIGMOD Conference on Management of Data, Washington, DC, May 1993.
[17] A. Tarski, Foundations of the geometry of solids, in: Logic, Semantics, Metamathematics, translated into English by Woodger, Clarendon Press, Oxford, 1956, pp. 24–29.
[18] B. Bennett, A.G. Cohn, P. Torrini, et al., A foundation for region-based qualitative geometry, in: W. Horn (Ed.), Proceedings of the ECAI-2000, 2000, pp. 204–208.
[19] E.F. Codd, A relational model of data for large shared data banks, Communications of the ACM 13 (June (6)) (1970) 377–387.
[20] P.M. Nadkarni, C. Brandt, S. Frawley, et al., Managing attribute-value clinical trials data using the ACT/DB client-server database system, Journal of the American Medical Informatics Association 5 (2) (1998) 139–151.
[21] Y. Papakonstantinou, H. Garcia-Molina, J. Widom, Object exchange across heterogeneous information sources, in: Proceedings of the Eleventh International Conference on Data Engineering, Taipei, Taiwan, 1995, pp. 251–260.
[22] F. Manola, E. Miller (Eds.), RDF Primer, W3C Recommendation, 2004, p. 2, http://www.w3.org/TR/rdf-primer.
[23] M. Winston, R. Chaffin, D. Herrmann, A taxonomy of part-whole relations, Cognitive Science 11 (1987) 417–444.
[24] S. Abiteboul, D. Quass, J. McHugh, et al., The LOREL query language for semistructured data, International Journal on Digital Libraries 1 (1) (1997).
[25] B. Smith, C. Welty, Ontology: towards a new synthesis, in: Proceedings of the Conference on Formal Ontology in Information Systems, 2001, p. iii.
[26] T.R. Gruber, Toward principles for the design of ontologies used for knowledge sharing, Technical Report KSL 93-04, Knowledge Systems Laboratory, Stanford University, 1993.
[27] W. Degen, B. Heller, H. Herre, et al., GOL: towards an axiomatized upper level ontology, in: Proceedings of the International Conference on Formal Ontology in Information Systems, ACM Press, 2001, pp. 34–46.
[28] I. Niles, A. Pease, Towards a standard upper ontology, in: Proceedings of the International Conference on Formal Ontology in Information Systems, ACM Press, 2001, pp. 2–9.
[29] I. Niles, A. Pease, Origins of the IEEE Standard Upper Ontology, Working Notes of the IJCAI-2001 Workshop on the IEEE Standard Upper Ontology, Seattle, pp. 37–42.
[30] R.M. Colomb, Formal versus material ontologies for information systems interoperation in the Semantic Web, Computer Journal 49 (1) (2006) 4–19.
[31] D. Randell, Z. Cui, A. Cohn, A spatial logic based on regions and connection, in: Proceedings of the Knowledge Representation and Reasoning, Morgan Kaufmann Publishers, San Mateo, 1992, pp. 165–176.
[32] R. Casati, B. Smith, A. Varzi, Ontological tools for geographic representation, in: Formal Ontology in Information Systems, IOS Press, 1998, pp. 77–85.
[33] B. Smith, Mereotopology: a theory of parts and boundaries, Data and Knowledge Engineering 20 (1996) 287–303.
[34] S. Borgo, N. Guarino, C. Masolo, An ontological theory of physical objects, in: Proceedings of the Eleventh International Workshop on Qualitative Reasoning (QR’97), 1997, pp. 223–231.
[35] G.E. Hughes, M.J. Cresswell, A New Introduction to Modal Logic, Routledge, London, 1996.
[36] Y.Y. Yao, T.Y. Lin, Generalization of rough sets using modal logic, Intelligent Automation and Soft Computing 2 (1996) 103–120.
[37] S. Tsumoto, H. Tanaka, Induction of expert system rules from databases based on rough set theory and resampling methods, in: Proceedings of the 9th International Symposium on Methodologies for Intelligent Systems (ISMIS’96), LNCS 1079, Springer-Verlag, Heidelberg, 1996, pp. 128–138.
[38] K. Farion, W. Michalowski, R. Slowinski, et al., Rough set methodology in clinical practice: controlled hospital trial of the MET system, in: Proceedings of the 4th International Conference on Rough Sets and Current Trends in Computing (RSCTC 2004), Springer-Verlag, Heidelberg, 2004, pp. 805–814.
[39] C.A. Brandt, A.M. Deshpande, C. Lu, et al., TrialDB: a web-based clinical study data management system, AMIA 2003 open source expo, in: M. Musen (Ed.), Proceedings of the 2003 AMIA (American Medical Informatics Association) Annual Symposium, AMIA Press, Washington, 2003, p. 794.
[40] S.A. Wang, Y. Fann, H. Cheung, et al., Performance of using Oracle XMLDB in the evaluation of CDISC ODM for a clinical study informatics system, in: Proceedings of the 17th IEEE Symposium on Computer-Based Medical Systems, IEEE Computer Society, Washington, 2004, pp. 594–599.
[41] J. Anhøj, Generic design of web-based clinical databases, Journal of Medical Internet Research 5 (4) (2003) e27, http://www.jmir.org/2003/4/e27/.
[42] P.M. Nadkarni, C.A. Brandt, Data extraction and ad hoc query of an entity-attribute-value database, Journal of the American Medical Informatics Association 5 (6) (1998) 511–527.
[43] N. Guarino, Formal ontology, conceptual analysis and knowledge representation, International Journal of Human and Computer Studies 43 (5–6) (1995) 625–640.
[44] Y.Q. Zhang, T.Y. Lin, Computational Web Intelligence (CWI): synergy of computational intelligence and Web technology, in: Proceedings of the 2002 IEEE International Conference on Fuzzy Systems, 2002, pp. 1104–1107.