Enhanced entity-relationship modeling with description logic


Knowledge-Based Systems 93 (2016) 12–32 Contents lists available at ScienceDirect Knowledge-Based Systems journal homepage: www.elsevier.com/locate/...



Fu Zhang∗, Z.M. Ma, Jingwei Cheng

College of Information Science & Engineering, Northeastern University, Shenyang 110819, China

Article info

Article history: Received 21 April 2015; Revised 3 October 2015; Accepted 30 October 2015; Available online 10 November 2015

Keywords: EER (Enhanced Entity-Relationship) model; Description Logic; Representation; Reasoning

Abstract

Owing to their high expressive power and effective reasoning services, description logics (DLs) have been employed in data modeling to support the development and maintenance of data models. The basic idea is that once correspondences between data models and DLs are established, DL reasoning techniques become applicable to reasoning on data models. This paper proposes a complete DL approach for representing and reasoning on EER (Enhanced Entity-Relationship) models. We develop an equivalence-preserving transformation approach and a prototype tool for transforming an EER model into a DL knowledge base, and propose methods to reduce reasoning on the EER model to reasoning on the transformed DL knowledge base. As a result, the reasoning capabilities of the DL can provide the basic reasoning services needed in EER modeling. In detail, we first propose a formal definition and semantic interpretation method of EER models, which covers all features of EER models. Then, by analyzing these features, we present a DL called ALCQIK as the language for representing and reasoning on EER models. On this basis, we propose an approach for transforming EER models into ALCQIK knowledge bases. The correctness of the transformation is proved and a transformation example is provided. Further, a prototype transformation tool is implemented, and case studies show that the approach and the tool work in practice. Finally, based on the transformed ALCQIK knowledge bases, we propose methods to reduce reasoning on EER models to reasoning on the transformed ALCQIK knowledge bases.

© 2015 Elsevier B.V. All rights reserved.

∗ Corresponding author. Tel.: +86 2483681582. E-mail address: [email protected] (F. Zhang).

http://dx.doi.org/10.1016/j.knosys.2015.10.029
0950-7051/© 2015 Elsevier B.V. All rights reserved.

1. Introduction

The Enhanced Entity-Relationship (EER) model [49], an extension of the Entity-Relationship (ER) model [23], includes all the modeling concepts of the ER model and incorporates additional constructs such as subclass/superclass, specialization/generalization, categorization, and aggregation. Over the years, EER models have been widely used in the conceptual analysis stage to support the development of databases, application software, websites, and other systems [25,26,29,33,38,41,46,48,49]. In the process of EER modeling, the complexity of the EER notions and their interactions may lead to problems such as redundancies and inconsistencies. Checking for such problems manually is a complex and time-consuming task [9,12,14,18]. A practical question therefore arises: how can these problems be detected in EER modeling? It can be addressed with the well-known family of knowledge representation languages called Description Logics (DLs) [11]. One of the basic ideas behind applying DLs to data management is that data models can be expressed as DL knowledge bases, so that DL reasoning techniques can be used to reason about the data models, e.g., to detect whether a model is consistent or whether an entity is a subtype of another entity [4,9,18,20,50].

In recent years, DLs have been shown to be useful for reasoning about data models (e.g., ER [18,27,37] and UML [12]). A detailed account of data modeling with DLs can be found in Section 6 of this paper. However, to the best of our knowledge, there is no complete and detailed report on EER modeling with DLs. Several important issues are still missing, including the treatment of EER constructs (e.g., categorization, aggregation, and total participation constraints of entities in relationships), the detailed transformation rules from EER to DL, a prototype transformation tool, and the reasoning on EER models with DLs. At present, the following issues still need to be addressed in detail:

(i) How to choose a DL for EER modeling. The choice should make the representation and reasoning of EER models with the DL easy to implement and exploit.
(ii) How to provide a complete report on EER modeling with DLs. The report should give a formal approach for transforming all constructs of EER into DL knowledge bases, so that readers can understand EER modeling with DLs well.
(iii) What the common reasoning problems in EER modeling are, and how to deal with these problems with DLs.

If all of these issues are solved, the possible uses of automated reasoning to improve the quality of EER modeling are greatly expanded.

F. Zhang et al. / Knowledge-Based Systems 93 (2016) 12–32

On this basis, in this paper we propose a DL approach for representing and reasoning on EER models. We develop an approach and a tool for transforming an EER model into a DL knowledge base, and further propose methods for reasoning on the EER model with the DL. In brief, the paper makes the following contributions:

• How to formalize EER models and choose a DL? In Section 3, we first propose a formal definition and semantic interpretation method of EER models. Then, based on the features of EER models, we propose a DL called ALCQIK, which is used to represent and reason on EER models.
• How to transform EER models into ALCQIK knowledge bases? In Section 4, we propose an approach and develop a tool for transforming EER models into ALCQIK knowledge bases, including: (i) proposing complete transformation rules; (ii) proving the correctness of the transformation and providing a transformation example; (iii) implementing a prototype tool. Case studies show that the approach and the tool work in practice.
• How to reason on EER models with the transformed ALCQIK knowledge bases? Based on the knowledge bases obtained in Section 4, in Section 5 we propose methods to reduce reasoning on EER models to reasoning on the transformed ALCQIK knowledge bases. A reasoning example is also provided. As a result, the reasoning capabilities of the DL can provide the basic reasoning services needed in EER modeling.


The remainder of this paper is organized as follows. Section 2 recalls preliminaries. Section 3 formalizes EER models and presents the DL ALCQIK. Section 4 develops a transformation approach and tool. Section 5 studies reasoning on EER models with ALCQIK. Section 6 introduces related work. Section 7 gives conclusions.

2. Preliminaries on EER models and DLs

In this section, some preliminaries on EER models and DLs are recalled.

2.1. EER models

The Enhanced Entity-Relationship (EER) model is an extension of the Entity-Relationship (ER) model [23]. Since the 1980s, new database applications with more demanding requirements have emerged, and the basic concepts of ER modeling are not sufficient to represent the requirements of these newer, more complex applications. The EER model was developed in response. EER includes the notions of ER (e.g., entity, attribute, relationship) and incorporates additional constructs such as subclass/superclass, specialization/generalization, categorization, and aggregation. Unfortunately, there is no standard terminology for these notions, so we use the most common terms. Fig. 1 describes a diagrammatic technique for displaying these notions when they arise in an EER model. Detailed introductions are provided in the subsequent subsections.

Fig. 1. EER diagram notations.

2.1.1. Entity/attribute

In an EER model, (weak) entities and attributes are defined as follows:

(i) Entity
♦ The object instances in the real world with the same attributes are grouped into an entity type. For example, Student is an entity, and an object John is an instance of the entity Student.

(ii) Attribute
♦ Attribute: Each entity has a set of attributes. For example, each student has attributes StuNo, Name, and Phone. Attributes may be single or multivalued, simple or composite, or derived.
– Single and simple attribute: An attribute is single and simple by default.
– Multivalued attribute: An attribute with a set of possible values for the same object instance of an entity. E.g., a student may have more than one phone. Here, Phone is a multivalued attribute.
– Composite attribute: An attribute composed of several simple attributes.
– Derived attribute: An attribute whose values can be calculated or derived from others.
♦ Domain: With each attribute a domain is associated, i.e., a set of permitted values for the attribute. Possible domains are integer, string, etc.
♦ Key: A key is a set of one or more attributes whose values uniquely determine each object instance of an entity. For an entity, several candidate keys may exist. During conceptual design, one of the candidate keys is selected to be the primary key of the entity.

In an EER model, Rectangle denotes Entity; Ellipse denotes Attribute; Double Ellipse denotes Multivalued Attribute; Dashed Ellipse denotes Derived Attribute; and Underline denotes Primary Key, as shown in Fig. 1. Fig. 2 shows an entity with attributes in an EER model, which denotes "A Student entity has a StuNo (primary key), a Name, a composite attribute Address (with HouseNo, Street, District), and a multivalued attribute Phone".

Fig. 2. An entity Student with attributes in an EER model.

(iii) Weak entity
♦ An entity that does not have a primary key is referred to as a weak entity. The existence of a weak entity depends on the existence of an identifying entity; it must relate to the identifying entity via a one-to-many identifying relationship from the identifying entity to the weak entity. A weak entity typically has a discriminator (or partial key) for distinguishing its instances. Note that a weak entity depends on its identifying entity to generate its primary key. That is, the primary key of a weak entity is formed by the primary key of the identifying entity on which the weak entity is existence dependent, plus the weak entity's discriminator.

In an EER model, Double Rectangle denotes Weak Entity; Double Diamond denotes Identifying Relationship; and Dashed Underline denotes Discriminator, as shown in Fig. 1. Fig. 3 shows a weak entity Dependent, whose existence depends on the existence of its identifying entity Employee through the identifying relationship Dependents_of. The primary key of the weak entity Dependent is formed by the primary key EmpNo of the identifying entity Employee and its discriminator DeName.

2.1.2. Relationship

In an EER model, relationships and their constraints are defined as follows.

(i) Relationships
♦ Relationships: Relationships associate several entities. They are pictured by Diamond Nodes. A relationship can have attributes, which describe properties of the relationship.
♦ Roles: Roles are optional; they clarify the semantics of a relationship, i.e., the way in which an entity participates in a relationship.

(ii) Constraints of relationships
♦ Cardinality ratio constraint: specifies the number of relationship instances an entity can participate in. The cardinality ratio constraint may be one of the following types:
– One to One (1:1)
– One to Many (1:N)
– Many to Many (M:N)
♦ Participation constraint: specifies the participation of entities in relationships. The participation constraint may be one of the following types:
– Total participation (displayed by a double line): each instance of an entity participates in the relationship. For example, the participation of Loan in Borrower is total, i.e., every loan must have a customer associated with it via Borrower.
– Partial participation (displayed by a single line): some instances of an entity may not participate in the relationship. For example, the participation of Customer in Borrower is partial.
♦ Cardinality constraint: optional; specifies the lower and upper bounds on the number of relationship instances each instance of an entity can participate in. Instead of a cardinality ratio or participation constraint, more precise cardinality limits can be associated with relationships.


Fig. 3. A weak entity Dependent in an EER model.

Moreover, although relationships may involve more than two entities, such relationships are rare in an EER model and most relationships are binary [24,46]. n-ary relationships are usually converted to binary relationships for ease of implementation, as presented in [25,33].

2.1.3. Specialization/generalization

In an EER model, the concepts of specialization and generalization are defined as follows.

(i) Specialization/generalization

A new entity, called a subclass, is produced from another entity, called a superclass, by inheriting all attributes of the superclass, overriding some attributes of the superclass, and defining some new attributes. Any instance belonging to the subclass must belong to the superclass. This characteristic can be used to determine whether two entities have a subclass/superclass relationship. Formally, if an entity $E_1$ is a subclass of another entity $E_2$ (equivalently, $E_2$ is a superclass of $E_1$), we must always have:

$$E_1 \subseteq E_2$$

A specialization is the process of defining a set of subclasses of an entity, and this entity is called the superclass of the specialization. Generalization is the inverse process of generalizing several entities into a higher-level abstract entity that includes the instances of all these entities. Specialization is conceptual refinement, whereas generalization is conceptual synthesis. Formally, a specialization $E = \{E_1, \ldots, E_n\}$ is a set of subclasses that have the same superclass $E$; that is, $E$ is called a generalized entity (or the superclass of the specialization, or the generalization of the subclasses $\{E_1, \ldots, E_n\}$). Such a specialization/generalization relationship can be expressed as:

$$\bigcup_{i=1}^{n} E_i \subseteq E$$

(ii) Constraints of specialization/generalization

♦ Disjoint and overlap constraint: Disjoint specifies that the subclasses of the specialization must be disjoint. If the subclasses are not constrained to be disjoint, their sets of instances may overlap. Formally, a specialization $E = \{E_1, \ldots, E_n\}$ is said to be disjoint if we always have:

$$E_i \cap E_j = \emptyset \quad \text{for } i, j \in \{1, \ldots, n\} \text{ and } i \neq j$$

Otherwise, $E$ is said to be overlapping.

♦ Total and partial constraint: Total specifies that every instance of the superclass must be a member of at least one subclass in the specialization; Partial allows an instance not to belong to any of the subclasses. Formally, a specialization $E = \{E_1, \ldots, E_n\}$ is said to be total if we always have:

$$\bigcup_{i=1}^{n} E_i = E$$

Otherwise, $E$ is said to be partial.

Fig. 4. The specialization/generalization in an EER model.

Fig. 4 shows a specialization/generalization, which specializes Person into {Faculty, Student}; the overlapping constraint is chosen because a person may belong to more than one of the subclasses. Also, Student is specialized into {Graduate_Student, Undergraduate_Student}, and these two subclasses are disjoint.

2.1.4. Category

The notion of category in an EER model is defined as follows.

(i) Category

All of the superclass/subclass relationships seen so far in Section 2.1.3 have a single superclass. It is not uncommon, however, that the need arises for modeling a single superclass/subclass relationship with more than one superclass, where the superclasses represent different entities. In this case, the subclass represents a collection of objects that is a subset of the union of distinct entities, and such a relationship is called a union type or a category. Formally, a category $E = \{E_1, \ldots, E_k\}$ is an entity that is a subset of the union of $k$ defining superclasses $E_1, \ldots, E_k$, and is formally specified as follows:

$$E \subseteq \bigcup_{i=1}^{k} E_i$$

(ii) Constraints of category

A category can be total or partial. A total category holds the union of all instances of its superclasses, whereas a partial category can hold a subset of the union. A total category is represented by a double line connecting the category and the circle, whereas a partial category is indicated by a single line. Formally, a category $E = \{E_1, \ldots, E_k\}$ is said to be total if we always have:

$$E = \bigcup_{i=1}^{k} E_i$$

Otherwise, $E$ is said to be partial.
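The set-theoretic conditions for specializations and categories can be checked directly on finite sets of instances. The following is a minimal Python sketch, assuming each entity is represented by a set of instance identifiers. The helper names (`is_specialization`, `is_disjoint`, `is_total`, `is_category`) and the sample instances are hypothetical; they only mirror the entities of Figs. 4 and 5.

```python
from itertools import combinations

def is_specialization(superclass, subclasses):
    """The union of the subclasses is contained in the superclass."""
    return set().union(*subclasses) <= superclass

def is_disjoint(subclasses):
    """No instance belongs to two different subclasses."""
    return all(a.isdisjoint(b) for a, b in combinations(subclasses, 2))

def is_total(superclass, subclasses):
    """Every instance of the superclass belongs to some subclass."""
    return set().union(*subclasses) == superclass

def is_category(category, superclasses):
    """A category is a subset of the union of its superclasses."""
    return category <= set().union(*superclasses)

# Illustrative instances (assumed for this sketch, not taken from the paper).
person = {"ann", "bob", "eve", "dan"}
faculty = {"ann", "bob"}
student = {"bob", "eve"}
graduate = {"bob"}
undergraduate = {"eve"}
instructor_researcher = {"ann"}
```

With these sample sets, Person specialized into {Faculty, Student} is overlapping and partial, while Student specialized into {Graduate_Student, Undergraduate_Student} is disjoint and total.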


The diagrams of category and its constraints can be found in Fig. 1. Fig. 5 shows a category Instructor_Researcher, which is a subset of the union of Faculty and Graduate_Student and includes faculty members or graduate students who are supported by teaching or research.

Fig. 5. A category in an EER model.

2.1.5. Aggregation

An aggregation models a part-whole relationship between entities, i.e., a relationship specifying that each instance of an entity (the containing entity) contains a set of instances of another entity (the contained entity). Aggregations are a particular kind of binary relationship. In an EER model, an aggregation is pictured with a diamond that indicates the containing entity.

Fig. 6 shows an aggregation, which models phone bills containing phone calls: a PhoneCall is contained in one and only one PhoneBill, while a PhoneBill contains at least one PhoneCall.

Fig. 6. An aggregation in an EER model.

2.2. Description Logics (DLs)

In the last decade a substantial amount of work has been carried out in the context of Description Logics (DLs). DLs are a logical reconstruction of the so-called frame-based knowledge representation languages, with the aim of providing a simple, well-established Tarski-style declarative semantics to capture the meaning of the most popular features of structured representation of knowledge. DLs have gained even more popularity due to their applications in the context of the Semantic Web [11]. In the following, we survey DL languages according to their expressive power, beginning with AL (attributive language) [11]. AL was introduced as a minimal language of practical interest, and the different DL languages are distinguished by their constructors.

2.2.1. The basic DL AL

Elementary descriptions are atomic concepts and atomic roles (also called concept names and role names), and complex descriptions can be built from them inductively with concept and role constructors. In abstract notation, we use the letter A for atomic concepts, R for atomic roles, and C and D for concept descriptions. Concept descriptions in AL are formed according to the following syntax rule:

$$
\begin{array}{rcll}
C, D & \rightarrow & \top & \text{(Universal concept)} \\
 & \mid & \bot & \text{(Bottom concept)} \\
 & \mid & A & \text{(Atomic concept)} \\
 & \mid & C \sqcap D & \text{(Intersection)} \\
 & \mid & \forall R.C & \text{(Value restriction)} \\
 & \mid & \exists R.\top & \text{(Limited existential quantification)}
\end{array}
$$

To define a formal semantics of AL-concepts, we consider an interpretation $\mathcal{I}$ that consists of a non-empty set $\Delta^{\mathcal{I}}$ (the domain of the interpretation) and an interpretation function $\cdot^{\mathcal{I}}$, which assigns to every atomic concept $A$ a set $A^{\mathcal{I}} \subseteq \Delta^{\mathcal{I}}$ and to every atomic role $R$ a binary relation $R^{\mathcal{I}} \subseteq \Delta^{\mathcal{I}} \times \Delta^{\mathcal{I}}$. The interpretation function is extended to concept descriptions by the following inductive definitions:

$$
\begin{aligned}
\top^{\mathcal{I}} &= \Delta^{\mathcal{I}} \\
\bot^{\mathcal{I}} &= \emptyset \\
(\neg A)^{\mathcal{I}} &= \Delta^{\mathcal{I}} \setminus A^{\mathcal{I}} \\
(C \sqcap D)^{\mathcal{I}} &= C^{\mathcal{I}} \cap D^{\mathcal{I}} \\
(\forall R.C)^{\mathcal{I}} &= \{a \in \Delta^{\mathcal{I}} \mid \forall b.\,(a, b) \in R^{\mathcal{I}} \rightarrow b \in C^{\mathcal{I}}\} \\
(\exists R.\top)^{\mathcal{I}} &= \{a \in \Delta^{\mathcal{I}} \mid \exists b.\,(a, b) \in R^{\mathcal{I}}\}
\end{aligned}
$$

The above constructors can be used to build DL knowledge bases that represent the knowledge of an application domain (the "world") by first defining the relevant concepts of the domain (i.e., its terminology), and then using these concepts to specify properties of objects and individuals occurring in the domain (i.e., the world description). Generally speaking, a DL knowledge base KB comprises two components, the TBox $\mathcal{T}$ and the ABox $\mathcal{A}$:

• A TBox $\mathcal{T}$ introduces the terminology (i.e., the vocabulary of an application domain) and is a finite set of terminological axioms of the form $C \sqsubseteq D$ or $C \equiv D$. An interpretation $\mathcal{I}$ satisfies $C \sqsubseteq D$ if $C^{\mathcal{I}} \subseteq D^{\mathcal{I}}$, and satisfies $C \equiv D$ if $C^{\mathcal{I}} = D^{\mathcal{I}}$.
• An ABox $\mathcal{A}$ contains assertions about named individuals in terms of this vocabulary and is a finite set of assertions of the form $C(a)$ or $R(a, b)$. An interpretation $\mathcal{I}$ satisfies $C(a)$ if $a^{\mathcal{I}} \in C^{\mathcal{I}}$, and satisfies $R(a, b)$ if $(a^{\mathcal{I}}, b^{\mathcal{I}}) \in R^{\mathcal{I}}$.

An interpretation $\mathcal{I}$ satisfies a DL KB if it satisfies all axioms and assertions in KB.
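To make these satisfaction conditions concrete, the following minimal Python sketch evaluates AL concept descriptions over a small finite interpretation and checks a TBox inclusion and two ABox assertions. The nested-tuple encoding of concepts, the function names (`ext`, `satisfies_inclusion`, and so on), and the sample domain are assumptions made purely for illustration.

```python
# A finite interpretation: a domain plus extensions of atomic concepts and roles.
DOMAIN = {"john", "mary", "cs101"}
CONCEPTS = {
    "Student": {"john", "mary"},
    "Person": {"john", "mary"},
    "Course": {"cs101"},
}
ROLES = {"attends": {("john", "cs101")}}
INDIVIDUALS = {"john": "john", "cs101": "cs101"}  # individual name -> domain element

def ext(c):
    """Extension of an AL concept encoded as a string or nested tuple."""
    if c == "TOP":
        return set(DOMAIN)
    if c == "BOTTOM":
        return set()
    if isinstance(c, str):                 # atomic concept A
        return set(CONCEPTS[c])
    op = c[0]
    if op == "not":                        # atomic negation
        return DOMAIN - ext(c[1])
    if op == "and":                        # intersection
        return ext(c[1]) & ext(c[2])
    if op == "all":                        # value restriction: forall R.C
        _, role, filler = c
        target = ext(filler)
        return {a for a in DOMAIN
                if all(b in target for (x, b) in ROLES[role] if x == a)}
    if op == "some":                       # limited existential: exists R.TOP
        return {a for (a, _) in ROLES[c[1]]}
    raise ValueError(f"unknown constructor: {op}")

def satisfies_inclusion(c, d):
    """TBox axiom: C is subsumed by D."""
    return ext(c) <= ext(d)

def satisfies_concept_assertion(c, name):
    """ABox assertion C(a)."""
    return INDIVIDUALS[name] in ext(c)

def satisfies_role_assertion(role, a, b):
    """ABox assertion R(a, b)."""
    return (INDIVIDUALS[a], INDIVIDUALS[b]) in ROLES[role]
```

Note that a value restriction holds vacuously for elements without any role successor, which is why `ext(("all", "attends", "Course"))` covers the whole domain in this sample.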

2.2.2. The family of AL-languages

More expressive languages are obtained by adding constructors to AL. The following constructors are often added:

• Union (indicated by the letter U): written as $C \sqcup D$, and interpreted as

$$(C \sqcup D)^{\mathcal{I}} = C^{\mathcal{I}} \cup D^{\mathcal{I}}$$

• Full existential quantification (indicated by the letter E): written as $\exists R.C$, and interpreted as

$$(\exists R.C)^{\mathcal{I}} = \{a \in \Delta^{\mathcal{I}} \mid \exists b.\,(a, b) \in R^{\mathcal{I}} \wedge b \in C^{\mathcal{I}}\}$$

• Number restrictions (indicated by the letter N): written as $\geq n\,R$ (at-least restriction) and $\leq n\,R$ (at-most restriction), where $n$ ranges over the nonnegative integers. They are interpreted as

$$
\begin{aligned}
(\geq n\,R)^{\mathcal{I}} &= \{a \in \Delta^{\mathcal{I}} \mid \#\{b \mid (a, b) \in R^{\mathcal{I}}\} \geq n\} \\
(\leq n\,R)^{\mathcal{I}} &= \{a \in \Delta^{\mathcal{I}} \mid \#\{b \mid (a, b) \in R^{\mathcal{I}}\} \leq n\}
\end{aligned}
$$

where "#{}" denotes the cardinality of a set.

• Qualified number restrictions (indicated by the letter Q): written as $\geq n\,R.C$ (at-least restriction) and $\leq n\,R.C$ (at-most restriction), where $n$ ranges over the nonnegative integers. They are interpreted as

$$
\begin{aligned}
(\geq n\,R.C)^{\mathcal{I}} &= \{a \in \Delta^{\mathcal{I}} \mid \#\{b \mid (a, b) \in R^{\mathcal{I}} \wedge b \in C^{\mathcal{I}}\} \geq n\} \\
(\leq n\,R.C)^{\mathcal{I}} &= \{a \in \Delta^{\mathcal{I}} \mid \#\{b \mid (a, b) \in R^{\mathcal{I}} \wedge b \in C^{\mathcal{I}}\} \leq n\}
\end{aligned}
$$

• Negation (indicated by the letter C, for "complement"): written as $\neg C$, and interpreted as

$$(\neg C)^{\mathcal{I}} = \Delta^{\mathcal{I}} \setminus C^{\mathcal{I}}$$

• Inverse (indicated by the letter I, for "inverse"): written as $R^{-}$, and interpreted as

$$(R^{-})^{\mathcal{I}} = \{(b, a) \mid (a, b) \in R^{\mathcal{I}}\}$$

Extending AL by any subset of the above constructors yields a particular AL-language. We name each AL-language by a string of the form

AL[U][E][N][Q][C][I],

where a letter in the name stands for the presence of the corresponding constructor. Moreover, since union (U) and full existential quantification (E) can be expressed using negation (C), the letters UE in language names are replaced by the letter C. For example, we write ALC instead of ALUE. Furthermore, to avoid very long names for expressive DLs, the abbreviation S was introduced for $\mathcal{ALC}_{R^+}$, i.e., the DL that extends ALC with transitive roles. Prominent members of the S-family are SIN (which extends $\mathcal{ALC}_{R^+}$ with number restrictions and inverse roles), SHIQ (which extends $\mathcal{ALC}_{R^+}$ with role hierarchies, inverse roles, and qualified number restrictions), and so on. For further introductions to and extensions of DLs, please refer to [11].
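The reason the letters U and E become redundant once full negation is available rests on two standard equivalences, stated here for reference (they are well-known DL identities rather than results of this paper):

```latex
C \sqcup D \;\equiv\; \neg(\neg C \sqcap \neg D),
\qquad
\exists R.C \;\equiv\; \neg\,\forall R.\neg C
```

Both follow from the set semantics of the constructors: the first by De Morgan's law on the extensions, the second by the duality between the universal and existential quantifiers.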



3. Formalization of EER models and selection of DLs



To establish correspondences between EER models and DLs (i.e., to represent and reason on EER models with DLs), formal definitions of EER models are needed. In addition, a well-suited DL should be chosen: it should be expressive enough to account for the essential features of EER models while keeping the computational complexity of reasoning low, so that the representation and reasoning of EER models with the DL are easy to implement and exploit. To this end, in this section:



- We propose a formal definition and semantic interpretation method of EER models, and provide a complete EER model example (see Section 3.1).
- Based on the features of EER models, we propose a DL called ALCQIK (see Section 3.2), which will be used to represent and reason on EER models in the later sections of this paper.

3.1. Formalization of EER models

In the following, we propose a formal definition and semantic interpretation method of EER models, and we also provide an example of a full EER model including its graphical model and formal representation. Here we first assume that, for two finite sets $X$ and $Y$, a function from a subset of $X$ to $Y$ is called an $X$-labeled tuple over $Y$. The labeled tuple $\tau$ that maps $x_i \in X$ to $y_i \in Y$, for $i \in \{1, \ldots, n\}$, is denoted $[x_1 : y_1, x_2 : y_2, \ldots, x_n : y_n]$. We also write $\tau[x_i]$ to denote $y_i$.
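Since a labeled tuple is a finite partial function, it can be pictured as a dictionary. The short Python sketch below assumes this encoding. The entities Dependent and Employee come from the weak-entity example of Fig. 3, whereas the label names and the helper `is_labeled_tuple` are hypothetical.

```python
# An X-labeled tuple over Y, encoded as a dict from labels in X to values in Y.
X = {"dependent", "supporter", "unused_label"}   # label set (assumed names)
Y = {"Dependent", "Employee"}                    # value set

tau = {"dependent": "Dependent", "supporter": "Employee"}

def is_labeled_tuple(t, labels, values):
    """True if t is a function from a subset of `labels` into `values`."""
    return set(t) <= labels and set(t.values()) <= values
```

In this encoding, the notation $\tau[x_i]$ corresponds to the ordinary lookup `tau["supporter"]`, and the tuple need not use every label in `X`.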

Definition 1 (Formalization of EER models). An EER model $M_{EER}$ can be formalized as a tuple $M_{EER} = (L_{SET}, \xi_{EA}, \xi_{RA}, \xi_{RE}, \theta_{card}, \delta_{ISA}, \delta_{SG}, \delta_{AGG}, \delta_{CAT})$:

• $L_{SET} = E_{SET} \cup A_{SET} \cup D_{SET} \cup R_{SET} \cup U_{SET}$ is a finite alphabet partitioned into the sets $E_{SET}$ (entity and weak entity symbols), $A_{SET}$ (attribute symbols), $D_{SET}$ (domain symbols), $R_{SET}$ (relationship symbols), and $U_{SET}$ (role symbols). The basic alphabets $E_{SET}$, $R_{SET}$, and $D_{SET}$ are usually assumed to be pairwise disjoint.
• $\xi_{EA} : E \rightarrow \tau(A, D)$ is a function that maps each (weak) entity symbol $E$ in $E_{SET}$ to an $A$-labeled tuple over $D$, i.e., $\xi_{EA}(E) \rightarrow [A_1 : D_1, \ldots, A_n : D_n]$.
• $\xi_{RA} : R \rightarrow \tau(A, D)$ is a function that maps each relationship symbol $R$ in $R_{SET}$ to an $A$-labeled tuple over $D$, i.e., $\xi_{RA}(R) \rightarrow [A_1 : D_1, \ldots, A_n : D_n]$.
• $\xi_{RE} : R \rightarrow \tau(U, E)$ is a function that maps each relationship symbol $R$ in $R_{SET}$ to a $U$-labeled tuple over $E$, i.e., $\xi_{RE}(R) = [U_1 : E_1, \ldots, U_k : E_k]$, where each $E_i$ may carry an optional participation constraint in $R$. Without loss of generality:
  - Each role is specific to exactly one relationship, i.e., for two relationships $R_1$, $R_2$ with $R_1 \neq R_2$, if $\xi_{RE}(R_1) = [U_1^1 : E_1^1, \ldots, U_k^1 : E_k^1]$ and $\xi_{RE}(R_2) = [U_1^2 : E_1^2, \ldots, U_k^2 : E_k^2]$, then $\{U_1^1, \ldots, U_k^1\} \cap \{U_1^2, \ldots, U_k^2\} = \emptyset$.
  - For each role $U$ there is a relationship $R$ and an entity $E$ such that $\xi_{RE}(R) = [\ldots, U : E, \ldots]$.
  - If a relationship $R$ has explicit role names in the EER model, then $U_1, \ldots, U_k$ above are such names. Otherwise, they are arbitrary names used to clarify the semantics of the relationship, i.e., the way in which an entity participates in the relationship.
• $\theta_{card}$ is a function from $E \times R \times U$ to $\mathbb{N}_0 \times (\mathbb{N}_0 \cup \{\infty\})$ (where $\mathbb{N}_0$ denotes the non-negative integers) such that, for a relationship $R$ with $\xi_{RE}(R) = [U_1 : E_1, \ldots, U_k : E_k]$, $\theta_{card}(E_i, R, U_i) = ((E_i, R, U_i)_{min}, (E_i, R, U_i)_{max})$. If not stated otherwise, $(E_i, R, U_i)_{min}$ is assumed to be 0 and $(E_i, R, U_i)_{max}$ is assumed to be $\infty$. The function $\theta_{card}$ specifies the cardinality constraints on the minimum and maximum number of times an instance of an entity may participate in a relationship via some role.
• $\delta_{ISA} \subseteq E_{SET} \times E_{SET}$ is a binary relation over $E_{SET}$ specifying the subclass/superclass relationship.
• $\delta_{SG}(E) = E_1 \times \ldots \times E_n$ is a relation specifying the generalization/specialization relationship between the superclass $E$ and the subclasses $E_1, \ldots, E_n$; there may be an optional constraint <Disjoint, Total>, <Disjoint, Partial>, <Overlap, Total>, or <Overlap, Partial>.
• $\delta_{AGG}(E_1) = E_2$ is a relation specifying aggregation between a containing entity $E_1$ and a contained entity $E_2$.
• $\delta_{CAT}(E) = E_1 \times \ldots \times E_n$ is a relation specifying the category between a subclass $E$ and the union of $n$ superclasses $E_1, \ldots, E_n$; there may be an optional constraint <Total> denoting that the category is total.

After formalizing EER models as in Definition 1, we further give the semantics of EER models. Analogously to the semantics of ER models [11,18], the semantics of EER models can be given by object instance states (Definition 2).

Definition 2. An object instance state $O$ with respect to an EER model $M_{EER}$ consists of a nonempty finite set $\Delta^{O}$ and a function $\cdot^{O}$ that maps:

• Each domain symbol $D \in D_{SET}$ to the corresponding basic domain $D^{O}$.
• Each entity $E \in E_{SET}$ to a subset $E^{O}$ of $\Delta^{O}$, that is, $E^{O} \subseteq \Delta^{O}$.
• Each attribute $A \in A_{SET}$ to a set $A^{O} \subseteq \Delta^{O} \times D^{O}$.
• Each relationship $R \in R_{SET}$ to a set $R^{O}$ of $U$-labeled tuples over $\Delta^{O}$.

Further, an object instance state $O$ is considered acceptable if it satisfies all constraints of the EER model. This is captured by the definition of legal object instance states (Definition 3).

Definition 3. An object instance state $O$ is said to be legal for an EER model $M_{EER} = (L_{SET}, \xi_{EA}, \xi_{RA}, \xi_{RE}, \theta_{card}, \delta_{ISA}, \delta_{SG}, \delta_{AGG}, \delta_{CAT})$ if it satisfies the following conditions:

• For each entity $E$ with $\xi_{EA}(E) \rightarrow [A_1 : D_1, \ldots, A_n : D_n]$, there is at least one element $a_i = \langle e, d_i \rangle \in A_i^{O}$, $e \in E^{O}$, $d_i \in D_i^{O}$.
• For each relationship $R$ with $\xi_{RE}(R) = [U_1 : E_1, \ldots, U_k : E_k]$, all instances of $R$ are of the form $[U_1 : e_1, \ldots, U_k : e_k]$, where $U_i \in U_i^{O}$, $e_i \in E_i^{O}$, $i \in \{1, \ldots, k\}$.
• For each relationship $R$ with $\xi_{RE}(R) = [\ldots, U : E, \ldots]$, $(E, R, U)_{min} \leq \#\{r \in R^{O} \mid r[U] = e\} \leq (E, R, U)_{max}$ holds for each instance $e \in E^{O}$, where #{} denotes the cardinality of the set {}.
• For each pair of entities $E_1, E_2 \in E_{SET}$ such that $\delta_{ISA}(E_1, E_2)$, $E_1^{O} \subseteq E_2^{O}$ holds.
• For each generalization/specialization $\delta_{SG}(E) = E_1 \times \cdots \times E_n$, it holds that $\bigcup_{i=1}^{n} E_i^{O} \subseteq E^{O}$;
  - if there is the disjoint constraint, then it holds that $E_i^{O} \cap E_j^{O} = \emptyset$ for $i, j \in \{1, \ldots, n\}$ and $i \neq j$;
  - if there is the total constraint, then it holds that $\bigcup_{i=1}^{n} E_i^{O} = E^{O}$.
• For each aggregation $\delta_{AGG}(E_1) = E_2$, it holds that $\forall e_1, e_2.\ AGG(e_1, e_2) \rightarrow e_1 \in E_1^{O} \wedge e_2 \in E_2^{O}$.
• For each category $\delta_{CAT}(E) = E_1 \times \cdots \times E_n$, it holds that $E^{O} \subseteq \bigcup_{i=1}^{n} E_i^{O}$; if there is the total constraint, then it holds that $E^{O} = \bigcup_{i=1}^{n} E_i^{O}$.

Fig. 7 gives an example of a full EER model MEER1 to illustrate the EER notions mentioned in the previous sections. In Fig. 7:

• Some entities are defined: Person, Student, Faculty, and so on.
• A weak entity is defined: The existence of the weak entity Dependent depends on the existence of its identifying entity Faculty through the identifying relationship Dependents_of. The primary key of the weak entity Dependent is formed by the primary key FacNo of the identifying entity Faculty and its discriminator DeName.
• Some attributes are specified: The specific attributes of Student are StuNo (primary key), Name, the composite attribute Address (with HouseNo, Street, District), and the multivalued attribute Phone; other attributes such as FacNo and Sex are specified similarly.
• Some relationships are defined: All faculty members are related to the department(s) with which they are affiliated [Belongs] (a faculty member can be associated with several departments, so the relationship is M:N); other relationships such as [Major] and [CD] are defined similarly.
• Two specialization relationships are defined: First, two subclasses of the Person entity are identified, Faculty and Student, and the overlapping constraint is chosen because a person may belong to more than one of the subclasses; second, Student is specialized into {Graduate_Student, Undergraduate_Student}, and the two subclasses are total and disjoint.
• An aggregation relationship is defined: A Course is contained in one and only one Curriculum system, while a Curriculum system contains at least one Course.
• A category relationship is defined: The category Instructor_Researcher is a subset of the union of Faculty and Graduate_Student and includes faculty members or graduate students who are supported by teaching or research.

Fig. 7. An EER model MEER1 for a UNIVERSITY database.
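The cardinality condition of Definition 3 can be checked mechanically on a finite object instance state. The sketch below is a minimal Python illustration, assuming relationship instances are encoded as dictionaries from role names to objects. The function name `satisfies_cardinality`, the role names `whole` and `part`, and the sample instances are hypothetical. It uses the aggregation of Fig. 7, in which a Course is contained in exactly one Curriculum system and a Curriculum system contains at least one Course.

```python
import math

def satisfies_cardinality(entity_instances, relationship, role, lo=0, hi=math.inf):
    """Check lo <= #{r in R | r[role] = e} <= hi for every instance e.

    The defaults 0 and infinity follow the convention of Definition 1
    for bounds that are not stated explicitly.
    """
    for e in entity_instances:
        count = sum(1 for r in relationship if r[role] == e)
        if not lo <= count <= hi:
            return False
    return True

# Illustrative object instance state (assumed for this sketch).
curriculum = {"cur1"}
course = {"db", "ai"}
contains = [
    {"whole": "cur1", "part": "db"},
    {"whole": "cur1", "part": "ai"},
]
```

Here every course occurs in exactly one tuple via the role `part`, so the bounds (1, 1) hold for Course, while the single curriculum satisfies the lower bound 1 via the role `whole`.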

The EER model MEER1 in Fig. 7 can be formalized as shown in Fig. 8 according to the formalization method of Definition 1. Further, on the basis of the formalization in Fig. 8 and Definitions 2 and 3, the semantics of MEER1 can be explained precisely. For example, for the specialization $\delta_{SG}(Student) = Graduate\_Student \times Undergraduate\_Student$ with the total and disjoint constraints, it holds that $Graduate\_Student^{O} \cup Undergraduate\_Student^{O} = Student^{O}$ and $Graduate\_Student^{O} \cap Undergraduate\_Student^{O} = \emptyset$, where $O$ is an object instance state with respect to MEER1; for the category $\delta_{CAT}(Instructor\_Researcher) = Faculty \times Graduate\_Student$, it holds that $Instructor\_Researcher^{O} \subseteq Faculty^{O} \cup Graduate\_Student^{O}$.

3.2. A description logic ALCQIK

To represent and reason on EER models with DLs, a well-suited DL should be chosen. The DL should be expressive enough to account for the essential features of EER models while keeping the computational complexity of reasoning low, so that the representation and reasoning of EER models with the DL are easy to implement and exploit. Based on the features of EER models, in this section we propose a DL called ALCQIK, which will be used to represent and reason on EER models in the later sections of this paper.

3.2.1. The syntax, semantics, and knowledge base of ALCQIK

The DL ALCQIK proposed in our work is a minor extension of ALCQI with the key construct. The well-known DL ALCQI [11] (as recalled in Section 2.2) is the logical underpinning of OWL 2 [43] (the W3C recommendation Web Ontology Language) and is supported by many reasoning engines. Key constructs have been discussed in DLs [10,35], and attempts have been made to add keys to DLs such as DLR [20,21], ALC(D) [39], and OWL 2 [43,30]. All of these proposals give us good hints for adding keys to ALCQI (the resulting DL is called ALCQIK) to represent and reason on EER models.

F. Zhang et al. / Knowledge-Based Systems 93 (2016) 12–32


Fig. 8. Formalization of the EER model MEER1 in Fig. 7.

In the following, we give the syntax and semantics of ALCQIK.

Definition 4 (syntax). The arbitrary concepts (denoted C or D), the arbitrary roles (denoted R), and the key construct in ALCQIK are formed according to the following syntax rules:

$$
\begin{array}{rcll}
C, D & \rightarrow & \top & \text{(Universal concept)} \\
 & \mid & \bot & \text{(Bottom concept)} \\
 & \mid & A & \text{(Atomic concept)} \\
 & \mid & \neg C & \text{(Concept negation)} \\
 & \mid & C \sqcap D & \text{(Intersection)} \\
 & \mid & C \sqcup D & \text{(Union)} \\
 & \mid & \forall R.C & \text{(Value restriction)} \\
 & \mid & \exists R.C & \text{(Full existential quantification)} \\
 & \mid & {\geq} n\, R.C & \text{(Qualified number restrictions)} \\
 & \mid & {\leq} n\, R.C & \text{(Qualified number restrictions)} \\
R & \rightarrow & P & \text{(Atomic role)} \\
 & \mid & P^{-} & \text{(Role inverse)} \\
 & & (\mathrm{key}\ C\ [i_1]R_1, \ldots, [i_h]R_h) & \text{(Key assertion)}
\end{array}
$$

Comments: $R_j$ is a role, $j \in \{1, \ldots, h\}$, and $i_j$ denotes one component of $R_j$. Intuitively, such an assertion states that two instances of C cannot agree on the participation in $R_1, \ldots, R_h$ via components $i_1, \ldots, i_h$, respectively.

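Definition 5 (semantics). The semantics of ALCQIK are provided by an interpretation $\mathcal{I} = (\Delta^{\mathcal{I}}, \cdot^{\mathcal{I}})$, where the domain $\Delta^{\mathcal{I}}$ of the interpretation is a non-empty set, and $\cdot^{\mathcal{I}}$ is an interpretation function that maps:

$$
\begin{aligned}
A^{\mathcal{I}} &\subseteq \Delta^{\mathcal{I}} \\
R^{\mathcal{I}} &\subseteq \Delta^{\mathcal{I}} \times \Delta^{\mathcal{I}} \\
\top^{\mathcal{I}} &= \Delta^{\mathcal{I}} \\
\bot^{\mathcal{I}} &= \emptyset \\
(\neg C)^{\mathcal{I}} &= \Delta^{\mathcal{I}} \setminus C^{\mathcal{I}} \\
(C \sqcap D)^{\mathcal{I}} &= C^{\mathcal{I}} \cap D^{\mathcal{I}} \\
(C \sqcup D)^{\mathcal{I}} &= C^{\mathcal{I}} \cup D^{\mathcal{I}} \\
(\forall R.C)^{\mathcal{I}} &= \{a \in \Delta^{\mathcal{I}} \mid \forall b.\ (a, b) \in R^{\mathcal{I}} \rightarrow b \in C^{\mathcal{I}}\} \\
(\exists R.C)^{\mathcal{I}} &= \{a \in \Delta^{\mathcal{I}} \mid \exists b.\ (a, b) \in R^{\mathcal{I}} \wedge b \in C^{\mathcal{I}}\} \\
({\geq} n\, R.C)^{\mathcal{I}} &= \{a \in \Delta^{\mathcal{I}} \mid \#\{b \mid (a, b) \in R^{\mathcal{I}} \wedge b \in C^{\mathcal{I}}\} \geq n\} \\
({\leq} n\, R.C)^{\mathcal{I}} &= \{a \in \Delta^{\mathcal{I}} \mid \#\{b \mid (a, b) \in R^{\mathcal{I}} \wedge b \in C^{\mathcal{I}}\} \leq n\} \\
(R^{-})^{\mathcal{I}} &= \{(a, b) \mid (b, a) \in R^{\mathcal{I}}\}
\end{aligned}
$$

$\mathcal{I}$ satisfies $(\mathrm{key}\ C\ [i_1]R_1, \ldots, [i_h]R_h)$ iff for all $a, b \in C^{\mathcal{I}}$ and for all $t_1, s_1 \in R_1^{\mathcal{I}}, \ldots, t_h, s_h \in R_h^{\mathcal{I}}$, the conditions

$$
a = t_1[i_1] = \cdots = t_h[i_h], \qquad b = s_1[i_1] = \cdots = s_h[i_h], \qquad t_j[i] = s_j[i] \ \text{for each } j \in \{1, \ldots, h\} \text{ and each } i \neq i_j
$$

imply $a = b$.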
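To make the semantics of the concept constructors concrete, the following minimal sketch computes the extension of a concept in a finite interpretation by direct recursion over the clauses above. The tuple encoding of concepts (for example `("atleast", n, R, C)`), the function names, and the sample interpretation are assumptions made for illustration only; this is a model-checking aid, not a DL reasoner and not part of the authors' tool.

```python
def role_ext(role, roles):
    """Extension of an atomic role name, or of its inverse ("inv", name)."""
    if isinstance(role, tuple):  # ("inv", P): swap the pairs of P
        return {(b, a) for (a, b) in roles[role[1]]}
    return roles[role]


def ext(c, domain, concepts, roles):
    """Extension of concept c in the finite interpretation (domain, concepts, roles)."""
    op = c[0]
    if op == "top":
        return set(domain)
    if op == "bot":
        return set()
    if op == "atom":
        return set(concepts[c[1]])
    if op == "not":
        return set(domain) - ext(c[1], domain, concepts, roles)
    if op in ("and", "or"):
        left = ext(c[1], domain, concepts, roles)
        right = ext(c[2], domain, concepts, roles)
        return left & right if op == "and" else left | right
    if op in ("all", "some"):
        r = role_ext(c[1], roles)
        filler = ext(c[2], domain, concepts, roles)
        succ = lambda a: {b for (x, b) in r if x == a}
        if op == "all":  # every R-successor lies in the filler
            return {a for a in domain if succ(a) <= filler}
        return {a for a in domain if succ(a) & filler}
    if op in ("atleast", "atmost"):
        n, r = c[1], role_ext(c[2], roles)
        filler = ext(c[3], domain, concepts, roles)
        count = lambda a: len({b for (x, b) in r if x == a and b in filler})
        if op == "atleast":
            return {a for a in domain if count(a) >= n}
        return {a for a in domain if count(a) <= n}
    raise ValueError(f"unknown constructor: {op!r}")


# Illustrative interpretation (names borrowed from Fig. 7, instances made up).
domain = {"f1", "d1", "d2"}
concepts = {"Faculty": {"f1"}, "Department": {"d1", "d2"}}
roles = {"Belongs": {("f1", "d1"), ("f1", "d2")}}

at_least_two = ("atleast", 2, "Belongs", ("atom", "Department"))
print(sorted(ext(at_least_two, domain, concepts, roles)))  # ['f1']
```

Because value restrictions are satisfied vacuously by elements without successors, `("all", "Belongs", ("atom", "Department"))` holds for every element of this sample domain.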

The above constructors can be used to build DL knowledge bases for representing the knowledge of an application domain (the “world”) by first defining the relevant concepts of the domain (i.e., its terminology), and then using these concepts to specify properties of objects and individuals occurring in the domain (i.e., the world description). An ALCQIK knowledge base can be defined as follows:

Definition 6 (knowledge base). An ALCQIK knowledge base KB consists of a TBox $\mathcal{T}$, an ABox $\mathcal{A}$, and a set of key assertions $\mathcal{K}$, i.e., $KB = \langle \mathcal{T}, \mathcal{A}, \mathcal{K} \rangle$:

• A TBox $\mathcal{T}$ is a finite set of terminological axioms of the form $C \sqsubseteq D$ or $C \equiv D$. An interpretation $\mathcal{I}$ satisfies $C \sqsubseteq D$ if $C^{\mathcal{I}} \subseteq D^{\mathcal{I}}$, and satisfies $C \equiv D$ if $C^{\mathcal{I}} = D^{\mathcal{I}}$.
• An ABox $\mathcal{A}$ is a finite set of assertions of the form $C(a)$, $R(a, b)$, $a \approx b$, or $a \not\approx b$. An interpretation $\mathcal{I}$ satisfies $C(a)$, $R(a, b)$, $a \approx b$, or $a \not\approx b$ if $a^{\mathcal{I}} \in C^{\mathcal{I}}$, $(a^{\mathcal{I}}, b^{\mathcal{I}}) \in R^{\mathcal{I}}$, $a^{\mathcal{I}} = b^{\mathcal{I}}$, or $a^{\mathcal{I}} \neq b^{\mathcal{I}}$, respectively.
• A set of key assertions $\mathcal{K}$ is a set of assertions of the form $(\mathrm{key}\ C\ [i_1]R_1, \ldots, [i_h]R_h)$.

An interpretation $\mathcal{I}$ satisfies an ALCQIK knowledge base KB if it satisfies all axioms and assertions in KB; in this case, $\mathcal{I}$ is called a model of KB.

3.2.2. Reasoning on ALCQIK

As in many DLs (e.g., ALC, ALCQI, SIN, SHIQ as mentioned in Section 2.2.2), the basic reasoning tasks over an ALCQIK knowledge base KB are as follows:

• Knowledge base satisfiability: An ALCQIK knowledge base KB is satisfiable iff there is an interpretation $\mathcal{I}$ which satisfies all axioms and assertions in KB.
• Concept satisfiability: A concept C is satisfiable in an ALCQIK knowledge base KB iff there is some model $\mathcal{I}$ of KB such that $C^{\mathcal{I}} \neq \emptyset$.
• Concept subsumption: A concept C is subsumed by a concept D (written as $C \sqsubseteq D$) in an ALCQIK knowledge base KB if for every model $\mathcal{I}$ of KB it holds that $C^{\mathcal{I}} \subseteq D^{\mathcal{I}}$. Moreover, concept equivalence can be expressed by concept subsumption, i.e., the equivalence of C and D can be expressed as $C \sqsubseteq D$ and $D \sqsubseteq C$. Also, concept disjointness can be expressed by concept subsumption, i.e., the disjointness of two concepts C and D can be expressed as $C \sqcap D \sqsubseteq \bot$.
• Logical implication: An ALCQIK knowledge base KB implies an axiom or assertion $\eta$, written $KB \models \eta$, iff all models of KB satisfy $\eta$.

All of these reasoning problems can be reduced to each other [11]. For example, concept subsumption can be reduced to concept unsatisfiability: $C \sqsubseteq D$ iff $C \sqcap \neg D$ is unsatisfiable. Concept satisfiability can be reduced to logical implication: a concept C is satisfiable in an ALCQIK knowledge base KB iff $KB \not\models C \sqsubseteq \bot$. Also, logical implication and knowledge base satisfiability can be mutually reduced, e.g., $KB \models C \sqsubseteq D$ iff $KB \cup \{C \sqcap \neg D\}$ is unsatisfiable. In particular, it is shown in [11,12,22] that checking an ALCQI knowledge base for satisfiability is EXPTIME-complete.

Therefore, based on the ideas in [20,21,39,35], we now deal with the problem of reasoning on ALCQIK knowledge bases by reducing ALCQIK knowledge bases to ALCQI knowledge bases. Based on the semantics of key assertions in Definition 5, a key assertion $(\mathrm{key}\ C\ [i_1]R_1, \ldots, [i_h]R_h)$ can be reformulated in terms of the following set of assertions $\mathcal{A}_K$:

• $C(a)$, $C(b)$, $a \not\approx b$;
• $R_j(t_j)$, $R_j(s_j)$, $j \in \{1, \ldots, h\}$, where $t_j$ and $s_j$ are tuples whose arity is that of $R_j$, with $t_j[i_j] = a$, $s_j[i_j] = b$, and $t_j[i] = s_j[i]$ for $i \neq i_j$.

Therefore, an ALCQIK knowledge base $KB = \langle \mathcal{T}, \mathcal{A}, \mathcal{K} \rangle$ can be reformulated as an ALCQI knowledge base $KB' = \langle \mathcal{T}, \mathcal{A} \cup \mathcal{A}_K \rangle$, where $\mathcal{A} \cup \mathcal{A}_K$ is a set of assertions without key assertions as follows:

• a finite set of assertions of the form $C(a)$, $R(a, b)$, $a \approx b$, or $a \not\approx b$ as mentioned in Definition 6;
• for each key assertion $(\mathrm{key}\ C\ [i_1]R_1, \ldots, [i_h]R_h)$, $\mathcal{A} \cup \mathcal{A}_K$ contains: (i) either $C(a)$ or $\neg C(a)$; (ii) either $C(b)$ or $\neg C(b)$; (iii) either $R_j(t_j)$ or $\neg R_j(t_j)$; (iv) either $R_j(s_j)$ or $\neg R_j(s_j)$.

The following theorems show that reasoning on ALCQIK can be reduced to reasoning on ALCQI.

Theorem 1. An ALCQIK knowledge base $KB = \langle \mathcal{T}, \mathcal{A}, \mathcal{K} \rangle$ is satisfiable if and only if the ALCQI knowledge base $KB' = \langle \mathcal{T}, \mathcal{A} \cup \mathcal{A}_K \rangle$ is satisfiable.

Theorem 1 is a straightforward consequence of the reformulation process described above, and thus its proof is omitted here.
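The key-assertion semantics of Definition 5, on which the reformulation relies, can be illustrated by a direct check over a finite interpretation. The sketch below is an assumption-laden illustration: role extensions are encoded as sets of equal-length tuples, components are 1-based as in the `[i]R` notation, and the function names and sample data are hypothetical. It only verifies a key in a given interpretation and does not implement the reduction to ALCQI.

```python
from itertools import combinations


def agree_elsewhere(t, s, i):
    """True if tuples t and s coincide on every component except index i (0-based)."""
    return all(t[k] == s[k] for k in range(len(t)) if k != i)


def key_violations(c_ext, keyed_roles):
    """Pairs of distinct instances of C that violate (key C [i_1]R_1, ..., [i_h]R_h).

    c_ext:       extension of C, a set of (sortable) individuals
    keyed_roles: list of (i_j, R_j) with i_j 1-based and R_j a set of tuples
    """
    bad = []
    for a, b in combinations(sorted(c_ext), 2):
        # a and b clash if, for every keyed role, they participate through
        # component i_j in tuples that agree on all other components.
        if all(
            any(t[i - 1] == a and s[i - 1] == b and agree_elsewhere(t, s, i - 1)
                for t in rel for s in rel)
            for (i, rel) in keyed_roles
        ):
            bad.append((a, b))
    return bad


# Illustrative data: two dependents of the same faculty member with the same name.
dependent = {"d1", "d2"}
dependents_of = {("d1", "f1"), ("d2", "f1")}
de_name = {("d1", "Ann"), ("d2", "Ann")}

print(key_violations(dependent, [(1, dependents_of), (1, de_name)]))
```

Changing the name of d2 to a different value removes the violation, since the two dependents then no longer agree on all keyed roles.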

Theorem 2. Satisfiability in ALCQIK is EXPTIME-complete.

Proof (sketch). By Theorem 1, satisfiability of an ALCQIK knowledge base $KB = \langle \mathcal{T}, \mathcal{A}, \mathcal{K} \rangle$ reduces to solving a set of tests, where each test involves a set of assertions $\mathcal{A} \cup \mathcal{A}_K$ and consists in directly verifying all key assertions in $\mathcal{A} \cup \mathcal{A}_K$ and checking the satisfiability of the ALCQI knowledge base $KB' = \langle \mathcal{T}, \mathcal{A} \cup \mathcal{A}_K \rangle$. Satisfiability in ALCQI is known to be EXPTIME-complete [11,12,22]. Therefore, satisfiability in ALCQIK is EXPTIME-complete. □

The DL ALCQIK is well suited to representing and reasoning about EER models, as will be discussed in the following sections.

4. Representing EER models with ALCQIK

In this section, we propose a formal approach and develop a prototype tool for transforming EER models into ALCQIK knowledge bases, including:

- proposing complete and detailed transformation rules from EER models to ALCQIK knowledge bases, and providing a transformation example (see Section 4.1);
- proving the correctness of the transformation (see Section 4.2);
- implementing a prototype transformation tool EER2DL (see Section 4.3).

4.1. Transforming EER models into ALCQIK knowledge bases

In the following we propose a formal approach for representing EER models with the DL ALCQIK, i.e., transforming an EER model into an ALCQIK knowledge base (Definition 7).

Definition 7 (transformation). An EER model $M_{EER} = (L_{SET}, \xi_{EA}, \xi_{RA}, \xi_{RE}, \theta_{card}, \delta_{ISA}, \delta_{SG}, \delta_{AGG}, \delta_{CAT})$ as in Definition 1 can be transformed into an ALCQIK knowledge base $KB = \psi(M_{EER}) = \langle \mathcal{T}, \mathcal{A}, \mathcal{K} \rangle$ by the function $\psi$ defined in Table 1.

Here we provide an example to help readers understand the transformation rules in Table 1.

Example 1. Given the full EER model MEER1 in Fig. 7 of Section 3.1, following the proposed rules in Table 1, it can be transformed into an ALCQIK knowledge base $KB = \psi(M_{EER1})$ as shown in Fig. 9.

Table 1
Transformation rules from an EER model $M_{EER} = (L_{SET}, \xi_{EA}, \xi_{RA}, \xi_{RE}, \theta_{card}, \delta_{ISA}, \delta_{SG}, \delta_{AGG}, \delta_{CAT})$ to an ALCQIK knowledge base $KB = \psi(M_{EER}) = \langle \mathcal{T}, \mathcal{A}, \mathcal{K} \rangle$.

Transformation of symbols $L_{SET}$ in MEER (left) into ALCQIK concepts and roles (right):

- Each entity $E \in E_{SET}$: an atomic concept $\psi(E)$.
- Each weak entity $E \in E_{SET}$: an atomic concept $\psi(E)$.
- Each relationship without attributes $R \in R_{SET}$: an atomic role $\psi(R)$.
- Each relationship $R \in R_{SET}$ with attributes $A_i \in A_{SET}$, $i \in \{1, \ldots, n\}$: an atomic concept $\psi(R)$ and an atomic role $\psi(A_i)$.
- Each role $U \in U_{SET}$: an atomic role $\psi(U)$.
- Each (derived/multivalued) attribute $A \in A_{SET}$ with its domain $D$: an atomic role $\psi(A)$ and an atomic concept $\psi(D)$.
- Each composite attribute $A \in A_{SET}$ which is composed of attributes $A_1, \ldots, A_n$: an atomic concept $\psi(A)$ and $n$ atomic roles $\psi(A_1), \ldots, \psi(A_n)$.

Transformation of constraints in MEER (left) into ALCQIK axioms and assertions (right):

- Each entity $E \in E_{SET}$ such that $\xi_{EA}(E) \rightarrow [A : D, A_1 : D_1, \ldots, A_n : D_n]$, where $A$ is the primary key attribute and $A_1, \ldots, A_n$ are (derived) attributes:
  $\psi(E) \sqsubseteq \exists\psi(A).\psi(D) \sqcap {=}1\,\psi(A) \sqcap \forall\psi(A_1).\psi(D_1) \sqcap {\leq}1\,\psi(A_1) \sqcap \cdots \sqcap \forall\psi(A_n).\psi(D_n) \sqcap {\leq}1\,\psi(A_n)$
- Each entity $E \in E_{SET}$ with a multivalued attribute $A$:
  $\psi(E) \sqsubseteq \forall\psi(A).\psi(D)$
- Each entity $E \in E_{SET}$ with a composite attribute $A$ which is composed of simple attributes $A_1, \ldots, A_n$: adding an ALCQIK role $\psi(E_{\text{has-c-}A})$, with
  $\psi(E) \sqsubseteq \forall\psi(E_{\text{has-c-}A}).\psi(A) \sqcap {\leq}1\,\psi(E_{\text{has-c-}A})$
  $\psi(A) \sqsubseteq \forall\psi(A_1).\psi(D_1) \sqcap {\leq}1\,\psi(A_1) \sqcap \cdots \sqcap \forall\psi(A_n).\psi(D_n) \sqcap {\leq}1\,\psi(A_n)$
- Each relationship $R \in R_{SET}$ with $\xi_{RE}(R) = [U_1 : E_1, U_2 : E_2]$, where $U_1, U_2 \in U_{SET}$ and $E_1, E_2 \in E_{SET}$:
  Case 1: $\top \sqsubseteq \forall\psi(R).\psi(E_2) \sqcap \forall\psi(R)^{-}.\psi(E_1)$; $\psi(E_1) \sqsubseteq {\leq}1\,\psi(R).\top$; $\psi(E_2) \sqsubseteq {\leq}1\,\psi(R)^{-}.\top$
  Case 2: $\top \sqsubseteq \forall\psi(R).\psi(E_2) \sqcap \forall\psi(R)^{-}.\psi(E_1)$; $\psi(E_2) \sqsubseteq {\leq}1\,\psi(R)^{-}.\top$
  Case 3: $\top \sqsubseteq \forall\psi(R).\psi(E_2) \sqcap \forall\psi(R)^{-}.\psi(E_1)$
- Each relationship $R \in R_{SET}$ with $\xi_{RE}(R) = [U_1 : E_1, U_2 : E_2]$ and $\theta_{card}(E_i, R, U_i) = ((E_i, R, U_i)_{min}, (E_i, R, U_i)_{max})$, where $U_1, U_2 \in U_{SET}$ and $E_1, E_2 \in E_{SET}$:
  $\top \sqsubseteq \forall\psi(R).\psi(E_2) \sqcap \forall\psi(R)^{-}.\psi(E_1)$
  $\psi(E_1) \sqsubseteq {\geq}min_2\,\psi(R).\top \sqcap {\leq}max_2\,\psi(R).\top$
  $\psi(E_2) \sqsubseteq {\geq}min_1\,\psi(R)^{-}.\top \sqcap {\leq}max_1\,\psi(R)^{-}.\top$
- Each relationship $R \in R_{SET}$ with $\xi_{RE}(R) = [U_1 : E_1, U_2 : E_2]$, where $R$ has its own attribute $A$, $U_1, U_2 \in U_{SET}$, and $E_1, E_2 \in E_{SET}$:
  Case 1: $\psi(R) \sqsubseteq \exists\psi(U_1).\psi(E_1) \sqcap {\leq}1\,\psi(U_1) \sqcap \exists\psi(U_2).\psi(E_2) \sqcap {\leq}1\,\psi(U_2) \sqcap \forall\psi(A).\psi(D) \sqcap {\leq}1\,\psi(A)$; $\psi(E_1) \sqsubseteq {=}1\,\psi(U_1)^{-}.\psi(R)$; $\psi(E_2) \sqsubseteq {=}1\,\psi(U_2)^{-}.\psi(R)$
  Case 2: $\psi(R) \sqsubseteq \exists\psi(U_1).\psi(E_1) \sqcap {\leq}1\,\psi(U_1) \sqcap \exists\psi(U_2).\psi(E_2) \sqcap {\leq}1\,\psi(U_2) \sqcap \forall\psi(A).\psi(D) \sqcap {\leq}1\,\psi(A)$; $\psi(E_2) \sqsubseteq {\leq}1\,\psi(U_2)^{-}.\psi(R)$
  Case 3: $\psi(R) \sqsubseteq \exists\psi(U_1).\psi(E_1) \sqcap {\leq}1\,\psi(U_1) \sqcap \exists\psi(U_2).\psi(E_2) \sqcap {\leq}1\,\psi(U_2) \sqcap \forall\psi(A).\psi(D) \sqcap {\leq}1\,\psi(A)$
- Each relationship $R \in R_{SET}$ with $\xi_{RE}(R) = [U_1 : E_1, U_2 : E_2]$ and $\theta_{card}(E_i, R, U_i) = ((E_i, R, U_i)_{min}, (E_i, R, U_i)_{max})$, where $R$ has its attribute $A$, $U_1, U_2 \in U_{SET}$, and $E_1, E_2 \in E_{SET}$:
  $\psi(R) \sqsubseteq \exists\psi(U_1).\psi(E_1) \sqcap {\leq}1\,\psi(U_1) \sqcap \exists\psi(U_2).\psi(E_2) \sqcap {\leq}1\,\psi(U_2) \sqcap \forall\psi(A).\psi(D) \sqcap {\leq}1\,\psi(A)$
  $\psi(E_1) \sqsubseteq {\geq}min_2\,\psi(U_1)^{-}.\psi(R) \sqcap {\leq}max_2\,\psi(U_1)^{-}.\psi(R)$
  $\psi(E_2) \sqsubseteq {\geq}min_1\,\psi(U_2)^{-}.\psi(R) \sqcap {\leq}max_1\,\psi(U_2)^{-}.\psi(R)$
- An entity $E_2$ totally participates in a relationship $R$:
  $\top \sqsubseteq \forall\psi(R).\psi(E_2) \sqcap \forall\psi(R)^{-}.\psi(E_1)$; $\psi(E_2) \sqsubseteq \exists\psi(R)^{-}.\top$
- A weak entity $E_2$ depends on the existence of an identifying entity $E_1$ via an identifying relationship $R$, where $A_1$ is the primary key of $E_1$ and $A_2$ is the discriminator of $E_2$:
  $\top \sqsubseteq \forall\psi(R).\psi(E_2) \sqcap \forall\psi(R)^{-}.\psi(E_1)$
  $\psi(E_2) \sqsubseteq {\leq}1\,\psi(R)^{-}.\top$
  $\psi(E_1) \sqsubseteq \exists\psi(A_1).\top \sqcap {=}1\,\psi(A_1)$
  $(\mathrm{key}\ \psi(E_2)\ [1]\psi(A_1), [1]\psi(R), [1]\psi(A_2))$
- Each subclass/superclass relationship $\delta_{ISA} \subseteq E_1 \times E_2$, where $E_1, E_2 \in E_{SET}$:
  $\psi(E_1) \sqsubseteq \psi(E_2)$
- Each specialization/generalization with the optional constraints disjointness/overlap/total/partial between a superclass $E$ and $n$ subclasses $E_1, \ldots, E_n$, i.e., $\delta_{SG}(E) = E_1 \times \cdots \times E_n$:
  Case 1 (total and disjoint): $\psi(E_1) \sqcup \cdots \sqcup \psi(E_n) \equiv \psi(E)$; $\psi(E_i) \sqcap \psi(E_j) \sqsubseteq \bot$, $i \neq j$, $i, j \in \{1, \ldots, n\}$
  Case 2 (partial and disjoint): $\psi(E_1) \sqcup \cdots \sqcup \psi(E_n) \sqsubseteq \psi(E)$; $\psi(E_i) \sqcap \psi(E_j) \sqsubseteq \bot$, $i \neq j$, $i, j \in \{1, \ldots, n\}$
  Case 3 (total and overlapping): $\psi(E_1) \sqcup \cdots \sqcup \psi(E_n) \equiv \psi(E)$
  Case 4 (partial and overlapping): $\psi(E_1) \sqcup \cdots \sqcup \psi(E_n) \sqsubseteq \psi(E)$
- Each aggregation $\delta_{AGG}(E_1) = E_2$ between a containing entity $E_1$ and a contained entity $E_2$: adding an ALCQIK role $\psi(AGG)$, with
  $\top \sqsubseteq \forall\psi(AGG).\psi(E_2) \sqcap \forall\psi(AGG)^{-}.\psi(E_1)$
  $\psi(E_1) \sqsubseteq {\geq}1\,\psi(AGG).\top$
  $\psi(E_2) \sqsubseteq {=}1\,\psi(AGG)^{-}.\top$
- Each category between a subclass $E$ and the union of $n$ superclasses $E_1, \ldots, E_n$ with the optional constraints Total/Partial, i.e., $\delta_{CAT}(E) = E_1 \times \cdots \times E_n$:
  Case 1: $\psi(E) \equiv \psi(E_1) \sqcup \cdots \sqcup \psi(E_n)$
  Case 2: $\psi(E) \sqsubseteq \psi(E_1) \sqcup \cdots \sqcup \psi(E_n)$
  Case 3 (without constraints): $\psi(E) \sqsubseteq \psi(E_1) \sqcup \cdots \sqcup \psi(E_n)$
- Each pair of symbols $x, y \in E_{SET} \cup R_{SET} \cup D_{SET}$ with $x \neq y$ and $x \in R_{SET}$:
  $\psi(x) \sqsubseteq \neg\psi(y)$

Fig. 9. A transformation example from the EER model MEER1 in Fig. 7 to an ALCQIK knowledge base.

4.2. The correctness of the transformation approach

In this section we prove the correctness of the transformation approach proposed in Section 4.1, i.e., we prove that the approach is a semantics-preserving transformation. As mentioned in Section 3, the semantics of an EER model are given by object instance states, and the semantics of an ALCQIK knowledge base are defined by interpretations. Following ideas similar to those in [9,11,18], the semantics preservation of the approach in Section 4.1 can be established by defining mappings between the object instance states (w.r.t. the EER models) and the interpretations of the transformed ALCQIK knowledge bases.

Theorem 3. Given an EER model $M_{EER} = (L_{SET}, \xi_{EA}, \xi_{RA}, \xi_{RE}, \theta_{card}, \delta_{ISA}, \delta_{SG}, \delta_{AGG}, \delta_{CAT})$, there are mappings $\mu_M$, from an object instance state w.r.t. MEER to an interpretation of the transformed ALCQIK knowledge base $\psi(M_{EER})$, and $\lambda_M$, from an interpretation of the ALCQIK knowledge base $\psi(M_{EER})$ to an object instance state w.r.t. MEER, such that:

• For each legal object instance state $O$ for MEER, $\mu_M(O)$ is an interpretation of $\psi(M_{EER})$, and for each symbol $X \in L_{SET}$, $X^{O} = (\psi(X))^{\mu_M(O)}$ holds.
• For each interpretation $\mathcal{I}$ of $\psi(M_{EER})$, $\lambda_M(\mathcal{I})$ is a legal object instance state for MEER, and for each symbol $X \in L_{SET}$, $(\psi(X))^{\mathcal{I}} = X^{\lambda_M(\mathcal{I})}$ holds.

Proof. We give the proof of the first part of Theorem 3. Given an object instance state $O$ of the EER model MEER, which is constituted by a finite set of values $\Delta^{O}$ as mentioned in Definition 2 of Section 3.1, an interpretation $\mathcal{I} = \mu_M(O)$ of $\psi(M_{EER})$ can be defined as follows:

• The interpretation domain $\Delta^{\mu_M(O)}$ is constituted by the values of the object instance state $O$, i.e., $\Delta^{\mu_M(O)} = \Delta^{O} \cup \bigcup_{R \in R_{SET}} R^{O}$. Here, according to Definition 2 of Section 3.1, each relationship $R$ of the EER model MEER is assigned a set of U-labeled tuples over $\Delta^{O}$, and thus we use $\bigcup_{R \in R_{SET}} R^{O}$ to explicitly represent the structure of the relationships in the ALCQIK knowledge base.
• The atomic concepts and roles in the interpretation $\mathcal{I} = \mu_M(O)$ of $\psi(M_{EER})$ are constituted by the set of symbols $L_{SET} = E_{SET} \cup A_{SET} \cup D_{SET} \cup R_{SET} \cup U_{SET}$ in the EER model MEER, i.e., $(\psi(X))^{\mu_M(O)} = X^{O}$ for each $X \in L_{SET}$. Further, for each relationship $R \in R_{SET}$ such that $\xi_{RE}(R) = [U_1 : E_1, \ldots, U_k : E_k]$, $(\psi(U_i))^{\mu_M(O)} = \{\langle r, e_i \rangle \in \Delta^{\mu_M(O)} \times \Delta^{\mu_M(O)} \mid r \in R^{O} \wedge e_i \in E_i^{O} \wedge r[U_i] = e_i\}$, where $i = 1, \ldots, k$.

Further, we show that the interpretation $\mu_M(O)$ defined above satisfies every axiom and assertion in $\psi(M_{EER})$ as proposed in Definition 7 of Section 4.1. To simplify the proof, the axioms and assertions in $\psi(M_{EER})$ of Definition 7 are partitioned into the following main cases.

Case 1: Let $E \in E_{SET}$ be an entity such that $\xi_{EA}(E) \rightarrow [A : D, A_1 : D_1, \ldots, A_n : D_n]$, and consider an instance $e \in (\psi(E))^{\mu_M(O)}$. Firstly, by the definition of $\mu_M(O)$ above, $e \in E^{O}$. Further, by the definition of legal object instance states in Definition 3 of Section 3.1, there is exactly one element $a \in A^{O} = (\psi(A))^{\mu_M(O)}$ whose first component is $e$ and whose second component is an element $d \in D^{O} = (\psi(D))^{\mu_M(O)}$, i.e., $a = \langle e, d \rangle \in A^{O}$. Similarly, there may also be one element $a_i \in A_i^{O} = (\psi(A_i))^{\mu_M(O)}$ whose first component is $e$ and whose second component is an element $d_i \in D_i^{O}$, i.e., $a_i = \langle e, d_i \rangle \in A_i^{O} = (\psi(A_i))^{\mu_M(O)}$, where $i \in \{1, \ldots, n\}$. That is to say, the interpretation $\mu_M(O)$ satisfies the transformed axioms corresponding to the constraints $\xi_{EA}(E)$ in Definition 7 of Section 4.1.

Case 2: Let $R \in R_{SET}$ be a relationship such that $\xi_{RE}(R) = [U_1 : E_1, \ldots, U_k : E_k]$, and consider an instance $r \in (\psi(R))^{\mu_M(O)}$. We have to show that for each $i \in \{1, \ldots, k\}$ there is exactly one element $e_i \in \Delta^{\mu_M(O)}$ such that $\langle r, e_i \rangle \in (\psi(U_i))^{\mu_M(O)}$, and that moreover $e_i \in (\psi(E_i))^{\mu_M(O)}$. Firstly, by the definition of $\mu_M(O)$ above, $r \in R^{O}$. Then, by the definition of legal object instance states in Definition 3 of Section 3.1, $r$ is a labeled tuple of the form $[U_1 : e_1', \ldots, U_k : e_k']$, where $e_i' \in E_i^{O}$, $i \in \{1, \ldots, k\}$. That is, $r$ is a function defined on $\{U_1, \ldots, U_k\}$. Further, again by the definition of $\mu_M(O)$ above, we know that $e_i$ is unique and equal to $e_i'$, and $e_i \in (\psi(E_i))^{\mu_M(O)}$. Moreover, suppose that a cardinality constraint $\theta_{card}(E_i, R, U_i) = ((E_i, R, U_i)_{min}, (E_i, R, U_i)_{max})$ is defined on $R \in R_{SET}$. By the definition of legal object instance states in Definition 3 of Section 3.1, $(E_i, R, U_i)_{min} \leq \#\{r \in R^{O} \mid r[U_i] = e_i\} \leq (E_i, R, U_i)_{max}$, where $\#\{\cdot\}$ denotes the cardinality of a set. By the definition of $\mu_M(O)$ above, $(\psi(E_i))^{\mu_M(O)} \subseteq \{e_i \mid (E_i, R, U_i)_{min} \leq \#\{r \in (\psi(R))^{\mu_M(O)} \mid \langle r, e_i \rangle \in (\psi(U_i))^{\mu_M(O)}\} \leq (E_i, R, U_i)_{max}\}$. That is to say, the interpretation $\mu_M(O)$ satisfies the transformed axioms corresponding to the constraints $\xi_{RE}(R)$ in Definition 7 of Section 4.1. Analogously, since aggregations are a particular kind of binary relationship as mentioned in Section 2.1, the case $\delta_{AGG}$ can be proved in the same way.

Case 3: Let $\delta_{ISA} \subseteq E_1 \times E_2$ be a subclass/superclass relationship. Firstly, by the definition of legal object instance states in Definition 3 of Section 3.1, $E_1^{O} \subseteq E_2^{O}$. Further, by the definition of $\mu_M(O)$ above, $E_1^{O} = (\psi(E_1))^{\mu_M(O)}$ and $E_2^{O} = (\psi(E_2))^{\mu_M(O)}$. Therefore, $(\psi(E_1))^{\mu_M(O)} \subseteq (\psi(E_2))^{\mu_M(O)}$. That is to say, the interpretation $\mu_M(O)$ satisfies the transformed axioms corresponding to the constraints $\delta_{ISA}$ in Definition 7 of Section 4.1.

Case 4: Let $\delta_{SG}(E) = E_1 \times \cdots \times E_n$ be a generalization/specialization relationship. Firstly, by the definition of legal object instance states in Definition 3 of Section 3.1, $\bigcup_{i=1}^{n} E_i^{O} \subseteq E^{O}$. Further, by the definition of $\mu_M(O)$ above, $E_i^{O} = (\psi(E_i))^{\mu_M(O)}$ and $E^{O} = (\psi(E))^{\mu_M(O)}$. Therefore, $\bigcup_{i=1}^{n} (\psi(E_i))^{\mu_M(O)} \subseteq (\psi(E))^{\mu_M(O)}$, where $i \in \{1, \ldots, n\}$. Moreover, suppose that the disjoint constraint is defined on $\delta_{SG}(E) = E_1 \times \cdots \times E_n$. By the definition of legal object instance states in Definition 3 of Section 3.1, $E_i^{O} \cap E_j^{O} = \emptyset$ for $i, j \in \{1, \ldots, n\}$ and $i \neq j$. Again by the definition of $\mu_M(O)$ above, $(\psi(E_i))^{\mu_M(O)} \cap (\psi(E_j))^{\mu_M(O)} = \emptyset$. Similarly, suppose that the total constraint is defined on $\delta_{SG}(E) = E_1 \times \cdots \times E_n$. Again by the definition of legal object instance states in Definition 3 of Section 3.1, $\bigcup_{i=1}^{n} E_i^{O} = E^{O}$. By the definition of $\mu_M(O)$ above, $\bigcup_{i=1}^{n} (\psi(E_i))^{\mu_M(O)} = (\psi(E))^{\mu_M(O)}$. In summary, the interpretation $\mu_M(O)$ satisfies the transformed axioms corresponding to the constraints $\delta_{SG}(E)$ in Definition 7 of Section 4.1.

Case 5: Let $\delta_{CAT}(E) = E_1 \times \cdots \times E_n$ be a category relationship. Firstly, by the definition of legal object instance states in Definition 3 of Section 3.1, $E^{O} \subseteq \bigcup_{i=1}^{n} E_i^{O}$. Further, by the definition of $\mu_M(O)$ above, $E^{O} = (\psi(E))^{\mu_M(O)}$ and $E_i^{O} = (\psi(E_i))^{\mu_M(O)}$. Therefore, $(\psi(E))^{\mu_M(O)} \subseteq \bigcup_{i=1}^{n} (\psi(E_i))^{\mu_M(O)}$. Moreover, suppose that the total constraint is defined on $\delta_{CAT}(E) = E_1 \times \cdots \times E_n$. By the definition of legal object instance states in Definition 3 of Section 3.1, $E^{O} = \bigcup_{i=1}^{n} E_i^{O}$. Again by the definition of $\mu_M(O)$ above, $E^{O} = (\psi(E))^{\mu_M(O)}$ and $E_i^{O} = (\psi(E_i))^{\mu_M(O)}$. Therefore, $(\psi(E))^{\mu_M(O)} = \bigcup_{i=1}^{n} (\psi(E_i))^{\mu_M(O)}$. In summary, the interpretation $\mu_M(O)$ satisfies the transformed axioms corresponding to the constraints $\delta_{CAT}(E)$ in Definition 7 of Section 4.1.

Case 6: Let $x, y \in E_{SET} \cup R_{SET} \cup D_{SET}$ be a pair of symbols such that $x \neq y$ and $x \in R_{SET}$. If $y \in E_{SET} \cup D_{SET}$, then by Definition 1 of Section 3.1, we know that the basic alphabets $E_{SET}$, $R_{SET}$, and $D_{SET}$ are assumed to be pairwise disjoint; if $y \in R_{SET}$, we know that labeled tuples corresponding to different relationships cannot be equal since they are defined over different sets of roles; that is, $x^{O} \cap y^{O} = \emptyset$. Further, by the definition of $\mu_M(O)$ above, $x^{O} = (\psi(x))^{\mu_M(O)}$ and $y^{O} = (\psi(y))^{\mu_M(O)}$. Therefore, $(\psi(x))^{\mu_M(O)} \cap (\psi(y))^{\mu_M(O)} = \emptyset$. That is to say, the interpretation $\mu_M(O)$ satisfies the transformed axioms $\psi(x) \sqsubseteq \neg\psi(y)$ in Definition 7 of Section 4.1.

The two parts of Theorem 3 are mutually inverse. The proof of the second part of Theorem 3, which can be treated analogously to the first part above, is omitted here.

In summary, given an EER model MEER and its transformed ALCQIK knowledge base $\psi(M_{EER})$ according to the approach in Section 4.1, Theorem 3 allows us to establish mappings between the object instance states of MEER and the interpretations of $\psi(M_{EER})$. That is, the transformation from EER models to ALCQIK knowledge bases proposed in Section 4.1 is correct and semantics-preserving. Furthermore, in order to automate the representation of EER models with the DL ALCQIK, in the following section we develop a prototype transformation tool.

4.3. Prototype transformation tool

Following the proposed approach in Section 4.1, we implemented a prototype representation tool called EER2DL, which can transform an EER model into an ALCQIK knowledge base.
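As a rough illustration of what such a transformation produces, the sketch below emits ALCQIK axioms as plain strings for a few of the rules in Table 1 (entity attributes, specialization, and category). It is not the EER2DL implementation: the function names are hypothetical, $\psi$ is taken to be the identity on names, disjointness is written as $\sqsubseteq \bot$, and the attribute domain `String` is an assumed placeholder that does not come from the paper.

```python
def entity_axiom(entity, key, attrs):
    """Axiom for xi_EA(E) -> [A : D, A_1 : D_1, ...] with single-valued attributes.

    key:   (A, D), the primary key attribute and its domain
    attrs: list of (A_i, D_i) pairs
    """
    a, d = key
    parts = [f"∃{a}.{d}", f"=1 {a}"]
    for ai, di in attrs:
        parts += [f"∀{ai}.{di}", f"≤1 {ai}"]
    return f"{entity} ⊑ " + " ⊓ ".join(parts)


def specialization_axioms(superclass, subs, total, disjoint):
    """Axioms for delta_SG(E) = E_1 x ... x E_n under the optional constraints."""
    union = " ⊔ ".join(subs)
    axioms = [f"{union} ≡ {superclass}" if total else f"{union} ⊑ {superclass}"]
    if disjoint:
        axioms += [f"{a} ⊓ {b} ⊑ ⊥"
                   for i, a in enumerate(subs) for b in subs[i + 1:]]
    return axioms


def category_axiom(category, supers, total):
    """Axiom for delta_CAT(E) = E_1 x ... x E_n (total or partial)."""
    union = " ⊔ ".join(supers)
    return f"{category} ≡ {union}" if total else f"{category} ⊑ {union}"


# Entity names follow Fig. 7; the domain "String" is an assumed placeholder.
print(entity_axiom("Student", ("StuNo", "String"), [("Name", "String")]))
for axiom in specialization_axioms(
        "Student", ["Graduate_Student", "Undergraduate_Student"],
        total=True, disjoint=True):
    print(axiom)
print(category_axiom("Instructor_Researcher", ["Faculty", "Graduate_Student"],
                     total=False))
```

Generating strings keeps the sketch independent of any particular DL library; a real tool would more likely build axiom objects that a reasoner can consume.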
In the following we first introduce the design and implementation of the prototype transformation tool EER2DL. Then, using one of the running examples of EER2DL, we show that the proposed approach is feasible and the implemented tool is efficient.

The core function of EER2DL is to transform an EER model into an ALCQIK knowledge base. The implementation of EER2DL is based on the Java 2 JDK 1.7.0 platform, and the graphical user interface is built with the java.awt and javax.swing packages. Fig. 10 shows the overall architecture of EER2DL, which includes three main modules, i.e., the parsing module, the transformation module, and the output module:

• The parsing module uses regular expressions to parse the input XML-coded EER model file and stores the parsed information in Java ArrayList objects. The symbols and constraints of the EER model in the XML-coded file (such as EER entities, attributes, roles, relationships, specializations, aggregations, and so on) can be extracted and represented in the formalization of the EER model proposed in Section 3.1.
• The transformation module transforms the parsed results into the corresponding ALCQIK knowledge base according to the algorithm EER2DL_Trans in Fig. 11, which is based on the approach proposed in Section 4.1.
• The output module finally produces the resulting ALCQIK knowledge base, which is saved as a text file. Also, the source EER model in XML format, the parsed results, and the target ALCQIK knowledge base are displayed on the tool screen (see Fig. 13).

Fig. 10. The overall architecture of the prototype tool EER2DL.

Here we briefly analyze the time complexity of the algorithm EER2DL_Trans by measuring the amount of work done by the algorithm. Fig. 11 shows that the algorithm mainly performs two kinds of operations, i.e., the transformation from EER symbols to ALCQIK concepts and roles, and the transformation from EER constraints to ALCQIK axioms and assertions. Since the transformation from EER symbols to ALCQIK concepts and roles (Step 1 of the algorithm) can be carried out as sub-operations while creating ALCQIK axioms and assertions (Step 2 of the algorithm), we can ignore the amount of work done in the first step and consider only the creation of axioms and assertions in the second step.
Also, we ignore the preprocessing operations (i.e., EER model parsing and element extraction as mentioned in Fig. 10); that is, we exclude the amount of work done by an XML parser (e.g., the DOM API for Java in our implementation) that parses the EER model (i.e., an XMI-coded file) and prepares the element data in memory for use in the transformation procedure. On this basis, the time complexity of the algorithm mainly depends on the structure of the EER model. Suppose the scale of the EER model MEER is $N = N_E + N_R + N_A + N_U + N_{ISA} + N_{SG} + N_{CAT} + N_{AGG}$, where $N_E$, $N_R$, $N_A$, $N_U$, $N_{ISA}$, $N_{SG}$, $N_{CAT}$, and $N_{AGG}$ denote the cardinalities of the sets of entities, relationships, attributes, roles, subclass/superclass relationships, generalizations/specializations, categories, and aggregations, respectively. Then, the number of axiom/assertion creations is at most $N_E + N_R + N_A$ for the cases $\xi_{EA}(E)$ and $\xi_{RA}(R)$, at most $N_R + N_U + 1$ for the case $\xi_{RE}(R)$, at most $N_{ISA}$ for the case $\delta_{ISA}(E)$, at most $N_{SG} + (N_E^2 - 3N_E - 2)/2$ for the case $\delta_{SG}(E)$, at most $N_{CAT}$ for the case $\delta_{CAT}(E)$, $N_{AGG}$ for the case $\delta_{AGG}(E)$, and $N_R (N_R - 1)/2 + N_R N_E = N_R (N_E + N_R/2 - 1/2)$ for the last case. Therefore, in the worst case, the total number of creations is

$$
T = N_E + N_R + N_A + N_R + N_U + 1 + N_{ISA} + N_{SG} + (N_E^2 - 3N_E - 2)/2 + N_{CAT} + N_{AGG} + N_R (N_E + N_R/2 - 1/2) < N^2 + 5N,
$$

that is, the time complexity of the algorithm is at most $O(N^2)$.

Fig. 11. The brief transformation algorithm EER2DL_Trans from EER models to ALCQIK knowledge bases.

We carried out several transformation experiments from EER models to ALCQIK knowledge bases using the implemented tool EER2DL on a PC (Intel Core i7 processor, 8.0 GB RAM, and Windows 7). To our knowledge, there is currently no widely accepted or standard dataset of EER models. Therefore, the EER models used in our experiments come from the following sources:

• Some come from existing EER models (e.g., from the references [48,49,46,25], and the website¹);
• Some were created manually by us with the CASE tools PowerDesigner², Lucidchart³, and creately⁴ (e.g., the EER model MEER1 in Fig. 7);
• Some others were derived by adding constraints to existing ER models⁵ by means of the tools above.

Fig. 12 shows the actual execution time of several transformations from EER models to ALCQIK knowledge bases by the EER2DL tool, where preprocessing denotes the operations of parsing the XML-coded EER model files and preparing the element data in memory for use in the transformation procedure.

Fig. 12. The actual execution time of testing several EER models by the tool EER2DL.

¹ https://www.google.com.hk/search?q=Enhanced+Entity-Relationship+diagram
² http://www.powerdesigner.de/
³ https://www.lucidchart.com/pages/enhanced-entity-relationship-diagram
⁴ http://creately.com/
⁵ http://creately.com/diagram-community/popular/t/erd


Fig. 13. Screen snapshot of the transformation tool EER2DL.

Case studies show that our approach and the prototype tool actually work. In the following, we give a screen snapshot of EER2DL together with an example that illustrates how the tool runs. Fig. 13 shows the screen snapshot of EER2DL, which displays the transformation of the EER model in Fig. 7 to an ALCQIK knowledge base. In Fig. 13, the source XML-coded EER model file produced by the CASE tool, the parsed results, and the transformed ALCQIK knowledge base are displayed in the left, middle, and right areas, respectively. A more complete XML-coded EER model file can be found in Appendix A. Moreover, it should be noted that after an EER model is designed with the CASE tools mentioned above, it can be saved as a file in XML format as shown in Appendix A, and such XML files can be used as input to the prototype tool EER2DL. In addition, the prototype tool currently requires some user participation (e.g., to resolve conflicts or correct mistakes). In our future work, we will further enhance the prototype tool in these respects.

So far, on the basis of the proposed approach and the implemented tool, EER models can be represented as ALCQIK knowledge bases. Accordingly, it is possible to take advantage of the associated reasoning techniques of ALCQIK to reason about the EER models. In the following section we will further investigate how to reason on EER models with ALCQIK.

5. Reasoning on EER models with ALCQIK

On the basis of the ALCQIK knowledge bases transformed from EER models in Section 4, this section further investigates how to reason on EER models with ALCQIK, including:

- We introduce some reasoning tasks considered in EER models with an example in Section 5.1;
- We give formal definitions of the reasoning tasks in Section 5.2;
- We propose methods of reasoning on EER models with ALCQIK in Section 5.3, and the methods are used to deal with the reasoning tasks in the example of Section 5.1.

5.1. Reasoning tasks of EER models

Here we first give a brief example to help readers understand the reasoning tasks of EER models. In the later sections we will further illustrate how to reason on EER models with ALCQIK.

Example 2. We take the previous EER model MEER1 in Fig. 7 as an example. To introduce the reasoning tasks more clearly, we assume that an additional subclass/superclass relationship between the subclass Undergraduate_Student and the superclass Graduate_Student is added to Fig. 7 by the designer. The following reasoning tasks may occur in the EER model:

(a) As shown in Fig. 7, Student is specialized into two sub-entities {Undergraduate_Student, Graduate_Student} with the total and disjoint constraints. That is, there is no object instance which belongs to the two sub-entities simultaneously. For illustration purposes, assume that the object instances $s_1, s_2$ belong to Undergraduate_Student in a real application. Accordingly, it can be inferred that:

$$ s_1, s_2 \notin Graduate\_Student \qquad (1) $$

(b) As mentioned at the beginning of Example 2, the entity Undergraduate_Student is a subclass of Graduate_Student, i.e., any object instance belonging to the subclass must belong to the superclass. Moreover, as stated in (a), $s_1, s_2 \in Undergraduate\_Student$. Thus, it can be inferred that:

$$ s_1, s_2 \in Graduate\_Student \qquad (2) $$

(c) Conclusions (1) and (2) about the object instances of Graduate_Student contradict each other. Therefore, it can be inferred that the object instance set of Undergraduate_Student is an empty set, because the empty set is the only set that can be at the same time disjoint from and contained in the entity Graduate_Student. In this case, we find that Undergraduate_Student is unsatisfiable since it is an empty entity.

(d) The union of the two entities Graduate_Student and Undergraduate_Student completely covers the entity Student as shown in Fig. 7, and it follows from (c) that Undergraduate_Student is an empty set. Therefore, it can be inferred that Student is equivalent to Graduate_Student, i.e., there is redundancy in the EER model.

(e) The above cases may cause other undesirable problems in a complex EER model.

In summary, similarly to other data models (such as ER, UML, and object-oriented data models [4,6,9,11,12,18]), the common reasoning problems of EER models include satisfiability, subsumption, equivalence, disjointness, consistency, and redundancy.

- Satisfiability is to determine whether an entity stands for a nonempty set of object instances;
- Subsumption is to determine whether an entity is a subclass of another entity;
- Equivalence is to determine whether two entities are equivalent, i.e., they denote the same set of object instances;
- Disjointness is to determine whether two entities are disjoint, i.e., there is no object instance which belongs to the two entities simultaneously;
- Consistency is to check whether an EER model is consistent, i.e., whether the notions in the EER model taken together are not contradictory;
- Redundancy is to check whether there is redundancy in an EER model, e.g., if there are two equivalent entities in a model, then the model is redundant.
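The inferences of Example 2 can be reproduced on a toy scale by enumerating all object instance states over a small domain and keeping those that satisfy the constraints. The sketch below is an illustration under stated assumptions: the two-element domain, the encoding of states as dictionaries of frozensets, and the function names are hypothetical, and a bounded enumeration of this kind is not a decision procedure and not the DL-based method of the paper.

```python
from itertools import product

ENTITIES = ("Student", "Graduate_Student", "Undergraduate_Student")


def legal(state):
    """Constraints of Example 2: total and disjoint specialization of Student,
    plus the added subclass relationship Undergraduate_Student ISA Graduate_Student."""
    s, g, u = (state[e] for e in ENTITIES)
    return (g | u == s) and not (g & u) and u <= g


def states(domain):
    """All assignments of subsets of the domain to the three entities."""
    subsets = [frozenset(x for x, keep in zip(domain, bits) if keep)
               for bits in product((False, True), repeat=len(domain))]
    for combo in product(subsets, repeat=len(ENTITIES)):
        yield dict(zip(ENTITIES, combo))


legal_states = [st for st in states(("o1", "o2")) if legal(st)]

# An entity is satisfiable (within this bound) if some legal state populates it.
satisfiable = {e: any(st[e] for st in legal_states) for e in ENTITIES}
# Two entities are equivalent (within this bound) if they coincide in all legal states.
equivalent = all(st["Student"] == st["Graduate_Student"] for st in legal_states)

print(satisfiable)
print(equivalent)
```

Within this bound, Undergraduate_Student is empty in every legal state while Student and Graduate_Student always coincide, which mirrors conclusions (c) and (d).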


All of the reasoning problems above may occur during EER modeling, and the burden of checking them is left to the designers. Therefore, it would be highly desirable to improve the ability to reason on EER models. Based on the transformation work presented earlier in this paper, in the following sections we attempt to resolve these reasoning problems by means of the DL ALCQIK.

5.2. Formal definitions of reasoning problems of EER models

In the following we give formal definitions of the reasoning tasks of EER models mentioned in Section 5.1.

Definition 8 (satisfiability). An entity $E$ in an EER model $M_{EER}$ is satisfiable if there is at least one object instance state $O$ of $M_{EER}$ (see Definition 2) such that $E^O \neq \emptyset$.

Comments: (i) An entity $E$ in an EER model $M_{EER}$ is satisfiable if $M_{EER}$ admits at least one object instance state $O$ and at least one object instance $e$ such that $e \in E^O$. (ii) An unsatisfiable entity weakens the understandability of an EER model, since it stands for an empty entity and thus, at the very least, is inappropriately named. The designers should modify or delete it to increase understandability.

Definition 9 (subsumption). Let $E_1$ and $E_2$ be two entities in an EER model $M_{EER}$. If for each object instance state $O$ of $M_{EER}$ it holds that $E_1^O \subseteq E_2^O$, then $E_1$ is a subclass of $E_2$.

Comments: Determining subsumption relationships is the basis for the classification of entities in an EER model. If an inferred subsumption is unintended, i.e., the object instances of the more specific entity are not all supposed to be instances of the more general entity, then something may be wrong with the EER model, since it forces an undesired conclusion.

Definition 10 (equivalence). Let $E_1$ and $E_2$ be two entities in an EER model $M_{EER}$. If for each object instance state $O$ of $M_{EER}$ it holds that $E_1^O = E_2^O$, i.e., the two entities denote the same set of object instances, then $E_1$ is equivalent to $E_2$.

Comments: The task of determining the equivalence relationship between two entities $E_1$ and $E_2$ can be reduced to checking subsumption relationships. In detail, if $E_1$ is a subclass of $E_2$, and $E_2$ is also a subclass of $E_1$, then $E_1$ is equivalent to $E_2$.

Definition 11 (disjointness). Let $E_1$ and $E_2$ be two entities in an EER model $M_{EER}$. $E_1$ and $E_2$ are disjoint if for each object instance state $O$ of $M_{EER}$ it holds that $E_1^O \cap E_2^O = \emptyset$.

Comments: Two disjoint entities cannot refer to the same object instance. Disjointness is useful for detecting inconsistencies in an EER model, e.g., if two entities $E_1$ and $E_2$ are declared to be disjoint, but the reasoning process finds some object instance $e$ such that $e \in E_1^O$ and $e \in E_2^O$, then it can be inferred that the EER model may be inconsistent.

Definition 12 (consistency). An EER model is consistent if the notions in the EER model, taken together, are not contradictory.

Comments: (i) An EER model is consistent if there is at least one object instance state $O$ that satisfies the constraints of the model. (ii) An inconsistent EER model may be due to a design error or over-constraining. Observe that the interaction of various types of constraints may make it difficult to detect inconsistencies.

Definition 13 (redundancy). An EER model is redundant if some entity stands for an empty entity, or if there are two equivalent entities.

Comments: Removing redundancy may reduce the complexity and increase the understandability of an EER model. If some entity is an empty entity, the designers should modify or delete it. Also, when two entities are equivalent, one of them can be removed and replaced by the other.

5.3. Reasoning on EER models with ALCQIK

In this section we propose methods for reasoning on EER models with ALCQIK. After an EER model has been transformed into an ALCQIK knowledge base according to our approach in Sections 3 and 4, the following theorems reduce reasoning on the EER model to reasoning on the ALCQIK knowledge base in an equivalence-preserving way, so that the reasoning problems of EER models can be checked by means of the reasoning ability of ALCQIK.

Theorem 4 (satisfiability). Let $M_{EER}$ be an EER model, $E$ an entity in $M_{EER}$, $\psi(M_{EER})$ the transformed ALCQIK knowledge base, and $\psi(E)$ the concept in $\psi(M_{EER})$ corresponding to $E$. Then $E$ is satisfiable in $M_{EER}$ iff $\psi(M_{EER}) \not\models \psi(E) \sqsubseteq \bot$.

Proof. “⇒”: If $E$ is satisfiable, then there is an object instance state $O$ of $M_{EER}$ such that $E^O \neq \emptyset$. By part 1 of Theorem 3, $\mu_M(O)$ is an interpretation of $\psi(M_{EER})$, and $E^O = (\psi(E))^{\mu_M(O)}$. Combining these results, it follows that $(\psi(E))^{\mu_M(O)} \neq \emptyset$. That is, $\psi(M_{EER}) \not\models \psi(E) \sqsubseteq \bot$.

“⇐”: If $\psi(M_{EER}) \not\models \psi(E) \sqsubseteq \bot$, then $\psi(E)$ is satisfiable, i.e., there is an interpretation $I$ of $\psi(M_{EER})$ such that $(\psi(E))^I \neq \emptyset$. By part 2 of Theorem 3, $\lambda_M(I)$ is an object instance state for $M_{EER}$, and $(\psi(E))^I = E^{\lambda_M(I)}$. Combining these results, it follows that $E^{\lambda_M(I)} \neq \emptyset$. That is, $E$ is satisfiable. □

Theorem 5 (subsumption). Let $M_{EER}$ be an EER model, $E_1$ and $E_2$ two entities in $M_{EER}$, $\psi(M_{EER})$ the transformed ALCQIK knowledge base, and $\psi(E_1)$ and $\psi(E_2)$ the concepts in $\psi(M_{EER})$ corresponding to $E_1$ and $E_2$, respectively. Then $E_1$ is a subclass of $E_2$ in $M_{EER}$ iff $\psi(M_{EER}) \models \psi(E_1) \sqsubseteq \psi(E_2)$.

Proof. “⇒”: Suppose, to the contrary, that $\psi(M_{EER}) \not\models \psi(E_1) \sqsubseteq \psi(E_2)$. Then there is an interpretation $I$ of $\psi(M_{EER})$ such that $(\psi(E_1) \sqcap \neg\psi(E_2))^I \neq \emptyset$. More specifically, $\exists e.\ e \in (\psi(E_1))^I$ and $e \notin (\psi(E_2))^I$. By part 2 of Theorem 3, $\lambda_M(I)$ is an object instance state for $M_{EER}$, and $(\psi(E_1))^I = E_1^{\lambda_M(I)}$ and $(\psi(E_2))^I = E_2^{\lambda_M(I)}$. Combining these results, it follows that $\exists e.\ e \in E_1^{\lambda_M(I)}$ and $e \notin E_2^{\lambda_M(I)}$. That is, $E_1$ is not a subclass of $E_2$, which contradicts the premise. Therefore, $\psi(M_{EER}) \models \psi(E_1) \sqsubseteq \psi(E_2)$.

“⇐”: Suppose, to the contrary, that $E_1$ is not a subclass of $E_2$. Then there are an object instance state $O$ of $M_{EER}$ and an object instance $e$ such that $e \in E_1^O$ and $e \notin E_2^O$. By part 1 of Theorem 3, $\mu_M(O)$ is an interpretation of $\psi(M_{EER})$, and $E_1^O = (\psi(E_1))^{\mu_M(O)}$ and $E_2^O = (\psi(E_2))^{\mu_M(O)}$. Combining these results, it follows that $e \in (\psi(E_1))^{\mu_M(O)}$ and $e \notin (\psi(E_2))^{\mu_M(O)}$. That is, $\psi(M_{EER}) \not\models \psi(E_1) \sqsubseteq \psi(E_2)$, which contradicts the premise. Therefore, $E_1$ is a subclass of $E_2$. □
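Theorems 4 and 5 turn EER questions into entailment checks of the form "knowledge base entails C ⊑ D", and the equivalence, disjointness, and redundancy checks of Theorems 6, 7, and 9 follow the same pattern. The Python sketch below illustrates that pattern on a deliberately small, role-free fragment (atomic concepts, ⊥, ¬, ⊓, ⊔), where entailment can be decided by enumerating the possible memberships of a single element. It is an assumption-laden toy, not ALCQIK and not the EER2DL tool: the tuple encoding, all function names, and the hand-written TBox for Example 2 are hypothetical, and real ALCQIK knowledge bases (with roles, number restrictions, and keys) would require a DL reasoner.

```python
from itertools import product

# Concepts are nested tuples: ("atom", name), ("bottom",), ("not", C),
# ("and", C, D), ("or", C, D). Roles are deliberately left out.
BOTTOM = ("bottom",)


def atom(name):
    return ("atom", name)


def holds(concept, true_atoms):
    """Evaluate a role-free concept at one element with the given atomic memberships."""
    tag = concept[0]
    if tag == "atom":
        return concept[1] in true_atoms
    if tag == "bottom":
        return False
    if tag == "not":
        return not holds(concept[1], true_atoms)
    if tag == "and":
        return holds(concept[1], true_atoms) and holds(concept[2], true_atoms)
    if tag == "or":
        return holds(concept[1], true_atoms) or holds(concept[2], true_atoms)
    raise ValueError(f"unknown constructor: {tag}")


def entails(tbox, names, c, d):
    """True iff every element allowed by the TBox that is in C is also in D."""
    for bits in product((False, True), repeat=len(names)):
        true_atoms = {n for n, b in zip(names, bits) if b}
        respects_tbox = all((not holds(lhs, true_atoms)) or holds(rhs, true_atoms)
                            for lhs, rhs in tbox)
        if respects_tbox and holds(c, true_atoms) and not holds(d, true_atoms):
            return False  # counterexample: an element in C but not in D
    return True


def is_satisfiable(tbox, names, e):        # pattern of Theorem 4
    return not entails(tbox, names, e, BOTTOM)


def is_subclass(tbox, names, e1, e2):      # pattern of Theorem 5
    return entails(tbox, names, e1, e2)


def are_equivalent(tbox, names, e1, e2):   # pattern of Theorem 6
    return entails(tbox, names, e1, e2) and entails(tbox, names, e2, e1)


def are_disjoint(tbox, names, e1, e2):     # pattern of Theorem 7
    return entails(tbox, names, ("and", e1, e2), BOTTOM)


def is_redundant(tbox, names):             # pattern of Theorem 9
    concepts = [atom(n) for n in names]
    if any(not is_satisfiable(tbox, names, c) for c in concepts):
        return True
    return any(are_equivalent(tbox, names, c1, c2)
               for i, c1 in enumerate(concepts) for c2 in concepts[i + 1:])


# Hypothetical hand-written encoding of Example 2 (not the output of the paper's tool).
NAMES = ("Student", "Graduate_Student", "Undergraduate_Student")
S, G, U = (atom(n) for n in NAMES)
TBOX = [
    (G, S), (U, S),            # both are subclasses of Student
    (U, G),                    # the erroneous subclass declaration
    (("and", G, U), BOTTOM),   # disjointness
    (S, ("or", G, U)),         # total covering of Student
]

print(is_satisfiable(TBOX, NAMES, U))     # False
print(are_equivalent(TBOX, NAMES, S, G))  # True
print(is_redundant(TBOX, NAMES))          # True
```

Checking a single element suffices here only because the fragment has no roles, so every axiom is evaluated pointwise; this shortcut does not carry over to ALCQIK.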



Theorem 6 (equivalence). Let $M_{EER}$ be an EER model, $E_1$ and $E_2$ two entities in $M_{EER}$, $\psi(M_{EER})$ the transformed ALCQIK knowledge base, and $\psi(E_1)$ and $\psi(E_2)$ the concepts in $\psi(M_{EER})$ corresponding to $E_1$ and $E_2$, respectively. Then $E_1$ is equivalent to $E_2$ in $M_{EER}$ iff $\psi(M_{EER}) \models \psi(E_1) \equiv \psi(E_2)$.

Proof. “⇒”: If $E_1$ is equivalent to $E_2$, then for any object instance state $O$ of $M_{EER}$ it follows that $\forall e.\ e \in E_1^O$ iff $e \in E_2^O$. By part 1 of Theorem 3, $\mu_M(O)$ is an interpretation of $\psi(M_{EER})$, and $E_1^O = (\psi(E_1))^{\mu_M(O)}$ and $E_2^O = (\psi(E_2))^{\mu_M(O)}$. Combining these results, it follows that $\forall e.\ e \in (\psi(E_1))^{\mu_M(O)}$ iff $e \in (\psi(E_2))^{\mu_M(O)}$. That is, $\psi(M_{EER}) \models \psi(E_1) \equiv \psi(E_2)$.

“⇐”: If $\psi(M_{EER}) \models \psi(E_1) \equiv \psi(E_2)$, then for each interpretation $I$ of $\psi(M_{EER})$ it follows that $\forall e.\ e \in (\psi(E_1))^I$ iff $e \in (\psi(E_2))^I$. By part 2 of Theorem 3, $\lambda_M(I)$ is an object instance state for $M_{EER}$, and $(\psi(E_1))^I = E_1^{\lambda_M(I)}$ and $(\psi(E_2))^I = E_2^{\lambda_M(I)}$. Combining these results, it follows that $\forall e.\ e \in E_1^{\lambda_M(I)}$ iff $e \in E_2^{\lambda_M(I)}$. That is, $E_1$ is equivalent to $E_2$. □

Theorem 7 (disjointness). Let $M_{EER}$ be an EER model, $E_1$ and $E_2$ two entities in $M_{EER}$, $\psi(M_{EER})$ the transformed ALCQIK knowledge base, and $\psi(E_1)$ and $\psi(E_2)$ the concepts in $\psi(M_{EER})$ corresponding to $E_1$ and $E_2$, respectively. Then $E_1$ is disjoint from $E_2$ iff $\psi(M_{EER}) \models \psi(E_1) \sqcap \psi(E_2) \sqsubseteq \bot$.

Proof. “⇒”: Suppose, to the contrary, that $\psi(M_{EER}) \not\models \psi(E_1) \sqcap \psi(E_2) \sqsubseteq \bot$. Then there is an interpretation $I$ of $\psi(M_{EER})$ such that $(\psi(E_1) \sqcap \psi(E_2))^I \neq \emptyset$. By part 2 of Theorem 3, $\lambda_M(I)$ is an object instance state for $M_{EER}$, and $(\psi(E_1))^I = E_1^{\lambda_M(I)}$ and $(\psi(E_2))^I = E_2^{\lambda_M(I)}$. Combining these results, it follows that $E_1^{\lambda_M(I)} \cap E_2^{\lambda_M(I)} \neq \emptyset$. That is, $E_1$ is not disjoint from $E_2$, which contradicts the premise. Therefore, $\psi(M_{EER}) \models \psi(E_1) \sqcap \psi(E_2) \sqsubseteq \bot$.

“⇐”: Suppose, to the contrary, that $E_1$ is not disjoint from $E_2$. Then there is an object instance state $O$ of $M_{EER}$ such that $E_1^O \cap E_2^O \neq \emptyset$. By part 1 of Theorem 3, $\mu_M(O)$ is an interpretation of $\psi(M_{EER})$, and $E_1^O = (\psi(E_1))^{\mu_M(O)}$ and $E_2^O = (\psi(E_2))^{\mu_M(O)}$. Combining these results, it follows that $(\psi(E_1))^{\mu_M(O)} \cap (\psi(E_2))^{\mu_M(O)} \neq \emptyset$. That is, $\psi(M_{EER}) \not\models \psi(E_1) \sqcap \psi(E_2) \sqsubseteq \bot$, which contradicts the premise. Therefore, $E_1$ is disjoint from $E_2$. □

Theorem 8 (consistency). Let $M_{EER}$ be an EER model and $\psi(M_{EER})$ the transformed ALCQIK knowledge base. The EER model $M_{EER}$ is consistent iff $\psi(M_{EER})$ is satisfiable.

From the transformation of EER models to ALCQIK knowledge bases in Section 4.1 and the proof of semantics preservation of the transformation in Section 4.2, Theorem 8 is a straightforward consequence of Definitions 7 and 12, and thus its proof is omitted here.

Theorem 9 (redundancy). Let $M_{EER}$ be an EER model, $E$, $E_1$ and $E_2$ three entities in $M_{EER}$, $\psi(M_{EER})$ the transformed ALCQIK knowledge base, and $\psi(E)$, $\psi(E_1)$ and $\psi(E_2)$ the concepts in $\psi(M_{EER})$ corresponding to $E$, $E_1$ and $E_2$, respectively. Then $M_{EER}$ is redundant iff at least one of the following conditions is satisfied: (i) $\psi(M_{EER}) \models \psi(E) \sqsubseteq \bot$; (ii) $\psi(M_{EER}) \models \psi(E_1) \equiv \psi(E_2)$.

Proof. “⇒”: If $M_{EER}$ is redundant, then some entity stands for an empty set, or there are two equivalent entities. If some entity $E$ stands for an empty set, then for any object instance state $O$ of $M_{EER}$ it follows that $E^O = \emptyset$. By part 1 of Theorem 3, $\mu_M(O)$ is an interpretation of $\psi(M_{EER})$, and $E^O = (\psi(E))^{\mu_M(O)}$. Combining these results, it follows that $(\psi(E))^{\mu_M(O)} = \emptyset$. That is, $\psi(M_{EER}) \models \psi(E) \sqsubseteq \bot$. If there are two equivalent entities $E_1$ and $E_2$, then by Theorem 6 it holds that $\psi(M_{EER}) \models \psi(E_1) \equiv \psi(E_2)$.

“⇐”: If $\psi(M_{EER}) \models \psi(E) \sqsubseteq \bot$, then for each interpretation $I$ of $\psi(M_{EER})$ it follows that $(\psi(E))^I = \emptyset$. By part 2 of Theorem 3, $\lambda_M(I)$ is an object instance state for $M_{EER}$, and $(\psi(E))^I = E^{\lambda_M(I)} = \emptyset$. That is, $E$ is an empty entity, and hence $M_{EER}$ is redundant. If $\psi(M_{EER}) \models \psi(E_1) \equiv \psi(E_2)$, then by Theorem 6, $E_1$ is equivalent to $E_2$. That is, $M_{EER}$ is redundant. □

At this point, an EER model can be represented and reasoned on with the DL. To help readers understand the approach, let us return once more to Example 2 of Section 5.1 and describe the complete process of representing and reasoning on the EER model of Example 2 based on our approach. As Fig. 14 shows, four main steps are executed to represent and reason on an EER model with the DL:

- Step 1. Formalization of an EER model: A graphical EER model can be formalized and interpreted according to the definition and semantic interpretation method of EER models proposed in Section 3.1;
- Step 2. Transformation of the EER model: The formal EER model can be transformed into an ALCQIK knowledge base by our approach and tool in Section 4;
- Step 3. Reduction of reasoning tasks of the EER model: Based on the transformed ALCQIK knowledge base, the reasoning tasks of the EER model can be further reduced to the reasoning problems of ALCQIK by our approach in Section 5;
- Step 4. Reasoning on ALCQIK: This can be done by means of the reasoning mechanism of ALCQIK mentioned in Section 3.2. Finally, the reasoning results of ALCQIK (i.e., the reasoning results of the EER model) are returned.

Overall, our work establishes complete correspondences between EER and DL: the DL provides adequate expressive power to account for the essential features of EER models, and its reasoning capabilities provide the basic reasoning services needed in EER modeling. Once the correspondences are established, the approach may be further used in some practical applications:

- In the conceptual analysis stage, data models can take advantage of both the algorithms and the complexity results developed for DL languages.
- The approach may be further used in the area of information integration. As mentioned in [11], DLs can also be considered as expressive variants of data models with incorporated reasoning facilities. This is of particular importance in the context of information integration, where high expressiveness is required to capture as well as possible the complex relationships that hold between data in different information sources.
- The approach may be considered a basic step towards developing intelligent systems that provide computer-aided support to improve some applications of data modeling.
For example, the reasoning services provided by a DL system can be integrated into CASE tools and profitably exploited to support the designer in the analysis phase. An example is the i.com tool [28] for conceptual modeling, which combines a user-friendly graphical interface with the ability to automatically infer properties of a schema (e.g., inconsistency of a class, or implicit IS-A relations) by invoking the FaCT DL reasoner [11]. In our future work, we will investigate these applications in depth.

Fig. 14. The overall description of representing and reasoning on the EER model with ALCQIK.

6. Related work

Over the years, DLs have been used in data management to improve data processing techniques (e.g., query optimization and data maintenance [4,8,9,13,14,17,32,42,45]). In particular, thanks to the high expressive power and effective reasoning services of DLs, data modeling with DLs has gained a privileged place in recent years. Table 2 compares our work in this paper with the existing work most closely related to our research.

Table 2. Comparison between our work and the existing work closely related to our research. Rows (data model: DLs used for modeling): ER: ALUNI [18], ALCFI [27], ALENI+ [37], ERL [31], DL-Lite [6]; UML: DLR [12]; Object-oriented: ALUNI [18]; EER (our work): ALCQIK. Columns (main notions): Entity (entity, weak entity); Attribute (single attribute, multi-valued attribute, composite/derived attribute); Relationship; Constraints of relationship (cardinality ratio constraint, participation constraint, cardinality constraint); subclass/superclass constraint; specialization/generalization constraint; category; aggregation.

As Table 2 shows, research on DLs for data modeling has investigated the correspondences between DLs and several data models (e.g., ER, the object-oriented data model, and UML). In [18,27,37], ER models were transformed into knowledge bases of the DLs ALUNI, ALCFI, and ALENI+, respectively. The work in [31] investigated the connections between the conceptual database model (ER model) and the DL called ERL. Also, the complexity of reasoning over ER with the DL DL-Lite was discussed in [6]. Temporal ER modeling with DLs was investigated in [1,5]. Moreover, a simple object-oriented data model was transformed into an ALUNI knowledge base in [18]. UML modeling with DLs was investigated in [12], where UML class diagrams were encoded in DLRifd and ALCIQ knowledge bases, respectively. Also, a DL was used to reason on UML conceptual schemas with arbitrary object constraint language (OCL) constraints in [44]. The work in [36] proposed an automatic approach to analyze the consistency and satisfiability of UML models using DL reasoners. A DL approach was proposed in [19] for representing and reasoning on XML documents. In addition, in [2], a DL was employed to check the full satisfiability of a conceptual model, i.e., whether there is an instantiation of the conceptual model where all classes and associations are non-empty and all the constraints are respected. A prototype design tool, i.com, for intelligent conceptual modeling with the DL DLR was developed in [28]. Also, the author of [34] presented a simple framework for ORM (Object Role Modeling), ER, EER and UML in terms of a new generic conceptual data modeling language, CMcom, that is based on the DL DLRifd. The problem, however, is that not all constructs of DLRifd are implemented by current reasoning engines. An ontological modeling of e-Catalogs using EER and DL was presented in [40]. Its goal is not to represent and reason on EER models with a DL, but to describe the semantic information of an e-Catalog in a conceptual EER model and then to construct a formal ontology using a DL. More recently, the authors of [3] used DLs not only for static reasoning about data models, but also for reasoning about the evolution and change over time of graph-structured data that results from executing actions. The authors of [51] presented a framework that allows the use of ontology and DL technologies to describe and reason on domain-specific modeling languages.

The existing work above gives us useful hints for developing our approach in this paper, but its goals and approaches differ from those of our research, as shown in Table 2 and compared in the following. In summary, there are several major differences, as shown in Table 2:

(i) Differences of data models: The existing work did not focus on or fully cover the EER model.
As is well known, there are some differences between EER and the data models discussed in the existing work (ER, UML, and object-oriented models), e.g., EER includes the notions of ER but adds constraints of specialization/generalization, categorization, and aggregation [26,48]; also, the notions of multi-valued attribute, composite attribute, weak entity, and categorization are usually not considered in UML and object-oriented models [7,15]. More detailed comparisons of these models can be found in [7,15,16,38,47]. Therefore, the existing approaches for data modeling with DLs are not sufficient to represent and reason on EER models with DLs.

(ii) Differences of approaches: To the best of our knowledge, there is so far no complete and detailed report on representing and reasoning on EER models with DLs. Some important issues, including the complete set of EER constructs (e.g., categorization, aggregation, and total participation constraints of entities in relationships), the detailed transformation rules from EER to DL, the prototype transformation tool, and the reasoning of EER with DL, are still missing at present, as shown in Table 2. To this end, in this paper we investigated and proposed a DL approach for representing and reasoning on EER models.

7. Conclusions and future work

We proposed a complete Description Logic (DL) approach for representing and reasoning on EER (Enhanced Entity-Relationship) models. First, we proposed a formal definition and semantic interpretation method of EER models, which summarizes and includes all features of EER models. Then, by analyzing the features of EER models, a DL called ALCQIK (a minor extension of ALCQI with the key construct) was presented as the language for representing and reasoning on EER models. On this basis, we proposed an approach for transforming EER models into ALCQIK knowledge bases. The correctness of the transformation was proved and a transformation example was provided. Further, a prototype transformation tool called EER2DL was implemented. Case studies show that our approach and prototype tool work in practice. Finally, based on the transformed ALCQIK knowledge bases, we proposed methods for reducing reasoning on EER models to reasoning on the transformed ALCQIK knowledge bases. As a result, reasoning techniques from the DL become applicable to the EER reasoning tasks.

In the near future, we will further investigate the approach and tool in the following respects: (i) we will investigate in depth some practical applications of this approach in a real-world context, as mentioned at the end of Section 5.3; (ii) we will further enhance the prototype tool in several aspects (e.g., solving conflicts, correcting mistakes, etc.); (iii) we will publish the studied data sets (EER models in XML format) on the Internet in order to make them available to other researchers; (iv) we will investigate in depth the efficiency of the process in terms of memory consumption; (v) we also aim at developing a unified DL-based system that integrates EER design, transformation, and reasoning.
Acknowledgments

The authors thank the anonymous referees for their very valuable comments and suggestions, which improved the technical content and the presentation of the paper. The work is supported by the National Natural Science Foundation of China (61202260, 61370075, 61370154, 61370155) and the Fundamental Research Funds for the Central Universities (N140404005, N140404010).

Appendix A. The XML format file of an EER model

The appendix provides a relatively complete XML format file of the EER model in Fig. 7 produced from the CASE tool. The file is then used as input of the transformation tool EER2DL in Fig. 13. The other similar data sets (e.g., some EER models in XML format used in our tests as mentioned in Section 4.3) are not listed here for reasons of space.


References

[1] A. Artale, R. Kontchakov, V. Ryzhikov, M. Zakharyaschev, A cookbook for temporal conceptual data modelling with description logics, ACM Trans. Comput. Logic 15 (3) (2014) 1–50.
[2] A. Artale, D. Calvanese, A. Ibáñez-García, Checking full satisfiability of conceptual models, in: Proceedings of the 23rd International Workshop on Description Logics (DL2010), 2010, pp. 55–66.
[3] S. Ahmetaj, D. Calvanese, M. Ortiz, M. Šimkus, Managing change in graph-structured data using description logics, in: Proceedings of the 28th AAAI Conference on Artificial Intelligence (AAAI 2014), 2014, pp. 966–973.
[4] A. Artale, F. Cesarini, G. Soda, Describing database objects in a concept language environment, IEEE Trans. Knowl. Data Eng. 8 (2) (1996) 345–351.


[5] A. Artale, E. Franconi, Temporal ER modeling with description logics, in: Proceedings of the 18th International Conference on Conceptual Modeling, 1999, pp. 81–95.
[6] A. Artale, D. Calvanese, R. Kontchakov, V. Ryzhikov, M. Zakharyaschev, Reasoning over extended ER models, in: Proceedings of the 26th International Conference on Conceptual Modeling (ER 2007), 2007, pp. 277–292.
[7] A. Al-Shamailh, An experimental comparison of ER and UML class diagrams, Int. J. Hybrid Inf. Technol. 8 (2) (2015) 279–288.
[8] D. Beneventano, S. Bergamaschi, C. Sartori, Description logics for semantic query optimization in object-oriented database systems, ACM Trans. Database Syst. 28 (2003) 1–50.
[9] A. Borgida, Description logics in data management, IEEE Trans. Knowl. Data Eng. 7 (5) (1995) 671–682.


[10] A. Borgida, G.E. Weddell, Adding uniqueness constraints to description logics, in: Proceedings of the 5th International Conference on Deductive and Object-Oriented Databases (DOOD’97), 1997, pp. 85–102.
[11] F. Baader, D. McGuinness, D. Nardi, P.F. Patel-Schneider, The Description Logic Handbook: Theory, Implementation, and Applications, Cambridge University Press, 2003.
[12] D. Berardi, D. Calvanese, G. De Giacomo, Reasoning on UML class diagrams, Artif. Intell. 168 (1-2) (2005) 70–118.
[13] A. Bonifati, L. Palopoli, D. Saccà, D. Ursino, Discovering description logic assertions from database schemes, in: Proceedings of the International Workshop on Description Logics (DL97), 1997, pp. 144–148.
[14] P. Bresciani, Some research trends in KR&DB, in: Proceedings of the 3rd Workshop on Knowledge Representation Meets Databases (KRDB’96), 1996, pp. 1–3.
[15] G. Bavota, C. Gravino, R. Oliveto, A. De Lucia, G. Tortora, M. Genero, J.A. Cruz-Lemus, Identifying the weaknesses of UML class diagrams during data model comprehension, in: Proceedings of the 14th International Conference Model Driven Engineering Languages and Systems (MODELS 2011), 2011, pp. 168–182.
[16] D. Bock, T. Ryan, Accuracy in modeling with extended entity relationship and object oriented data models, J. Database Manag. 4 (1993) 30–39.
[17] O. Curé, F. Jochaud, Preference-based integration of relational databases into a description logic, in: Proceedings of the DEXA’07, 2007, pp. 854–863.
[18] D. Calvanese, M. Lenzerini, D. Nardi, Unifying class-based representation formalisms, J. Artif. Intell. Res. 11 (2) (1999) 199–240.
[19] D. Calvanese, G. De Giacomo, M. Lenzerini, Representing and reasoning on XML documents: a description logic approach, J. Logic Comput. 9 (3) (1999) 295–318.
[20] D. Calvanese, G. De Giacomo, M. Lenzerini, Identification constraints and functional dependencies in description logics, in: Proceedings of the 17th International Joint Conference on Artificial Intelligence (IJCAI 2001), 2001, pp. 155–160.
[21] D. Calvanese, G. De Giacomo, M. Lenzerini, Keys for free in description logics, in: Proceedings of the 2000 International Workshop in Description Logics (DL2000), 2000, pp. 79–88.
[22] D. Calvanese, G. De Giacomo, M. Lenzerini, D. Nardi, Reasoning in expressive description logics, in: A. Robinson, A. Voronkov (Eds.), Handbook of Automated Reasoning, Elsevier Science Publishers, Amsterdam, 2001.
[23] P.P. Chen, The entity-relationship model−toward a unified view of data, ACM Trans. on Database Syst. 1 (1) (1976) 9–36.
[24] A. Deutsch, CSE 132B database system applications, 2006.
[25] R. Elmasri, S. Navathe, Fundamentals of database system, in: Enhanced Entity-Relationship and UML Modeling, 2003, Chapter 4.
[26] R. Elmasri, J. Weeldreyer, A. Hevner, The category concept: an extension to the entity-relationship model, Data Knowl. Eng. 1 (1) (1985) 75–116.
[27] E. Franconi, U. Sattler, A data warehouse conceptual data model for multidimensional aggregation, in: Proceedings of the Workshop on Design and Management of Data Warehouses (DMDW ’99, Heidelberg, Germany, June), 1999.
[28] E. Franconi, G. Ng, The i.com tool for intelligent conceptual modeling, in: Proceedings of the 7th International Workshop on Knowledge Representation meets Databases (KRDB 2000), 2000, pp. 45–53.
[29] R. Fidalgo, E.D. Souza, S. España, J. Castro, O. Pastor, EERMM: a metamodel for the enhanced entity-relationship model, in: Proceedings of the ER Conference, 2012, pp. 515–524.
[30] B.C. Grau, I. Horrocks, B. Motik, B. Parsia, P. Patel-Schneider, U. Sattler, OWL 2: the next step for OWL, Web Semant. Sci. Serv. Agents World Wide Web 6 (4) (2008) 309–322.

[31] M.S. Hacid, J.M. Petit, F. Toumani, Representing and reasoning on database conceptual schemas, Knowl. Inf. Syst. 3 (1) (2001) 52–80.
[32] G. Hao, S. Ma, Y. Sui, J. Lv, An unified dynamic description logic model for databases: relational data, relational operations and queries, in: Proceedings of the 26th International Conference on Conceptual Modeling (ER 2007), 2007, pp. 121–126.
[33] T.H. Jones, I. Song, Analysis of binary/ternary cardinality combinations in entity-relationship modeling, Data Knowl. Eng. (1996) 39–64.
[34] C.M. Keet, Unifying industry-grade class-based conceptual data modeling languages with CMcom, in: Proceedings of the 21st International Workshop on Description Logics (DL’08), 2008.
[35] V.L. Khizder, D. Toman, G.E. Weddell, On decidability and complexity of description logics with uniqueness constraints, in: Proceedings of the ICDT, 2001.
[36] A.H. Khan, I. Porres, Consistency of UML class, object and statechart diagrams using ontology reasoners, J. Vis. Lang. Comput. 26 (2015) 42–65.
[37] Y.X. Lei, J.Y. Tian, B.X. Cao, A faithful translation from entity-relationship schemas to the description logic ALENI+, J. Softw. 8 (2) (2013) 296–301.
[38] A.D. Lucia, C. Gravino, R. Oliveto, G. Tortora, An experimental comparison of EER and UML class diagrams for data modelling, Emp. Softw. Eng. (2010) 455–492.
[39] C. Lutz, C. Areces, I. Horrocks, U. Sattler, Keys, nominals, and concrete domains, J. Artif. Intell. Res. 23 (2005) 667–726.
[40] H. Lee, J. Shim, D. Kim, Ontological modeling of e-catalogs using EER and description logic, in: Proceedings of the International Workshop on Data Engineering Issues in E-Commerce (DEEC 2005), IEEE Society, 2005.
[41] J. Mamčenko, Introduction to data modeling and MSAccess, in: Lecture Notes on Information Resources Part I, 2004.
[42] B. Motik, I. Horrocks, U. Sattler, Integrating Description Logics and Relational Databases, University of Manchester, UK, 2006 Technical Report.
[43] OWL 2 Web ontology language document overview (second ed.), http://www.w3.org/TR/owl2-overview/, W3C Recommendation, 11 December 2012 (accessed 12.04.15).
[44] A. Queralt, A. Artale, D. Calvanese, E. Teniente, OCL-Lite: Finite reasoning on UML/OCL conceptual schemas, Data Knowl. Eng. 73 (2012) 1–22.
[45] M. Roger, A. Simonet, M. Simonet, Bringing together description logics and database in an object oriented model, in: Proceedings of International Conference on Database and Expert Systems Applications (DEXA 2002), 2002, pp. 504–513.
[46] I. Song, M. Evans, E.K. Park, A comparative analysis of entity-relationship diagrams, J. Comput. Softw. Eng. 3 (4) (1995) 427–459.
[47] P. Shoval, I. Frumermann, OO and EER conceptual schemas: a comparison of user comprehension, J. Database Manag. 5 (4) (1994) 28–38.
[48] B. Thalheim, The enhanced entity-relationship model, Handbook of Conceptual Modeling, Springer, Heidelberg, 2011, pp. 165–206.
[49] T.J. Teorey, D.Q. Yang, J.P. Fry, A logical design methodology for relational databases using the extended entity-relationship model, ACM Comput. Surv. (CSUR) 18 (2) (1986) 197–222.
[50] P. Warren, P. Mulholland, T. Collins, E. Motta, The usability of description logics, in: Proceedings of the 11th Extended Semantic Web Conference (ESWC 2014), 2014, pp. 550–564.
[51] T. Walter, F. Parreiras, S. Staab, An ontology-based framework for domain-specific modeling, Softw. Syst. Model. 13 (1) (2014) 83–108.