Classification and syntax of constraints in binary semantical networks


Information Systems Vol. 15, No. 5, pp. 497-513, 1990
Printed in Great Britain. All rights reserved
0306-4379/90 $3.00 + 0.00
Copyright © 1990 Pergamon Press plc

CLASSIFICATION AND SYNTAX OF CONSTRAINTS IN BINARY SEMANTICAL NETWORKS

SHAN-HWEI NIENHUYS-CHENG
Department of Computer Science, Erasmus University, P.O. Box 1738, 3000 DR Rotterdam, The Netherlands
(Received 5 April 1989; in revised form 10 January 1990; received for publication 19 June 1990)

Abstract-We use binary semantical networks to design or to build information systems. A good system should have a good conceptual scheme with well-defined constraints. This article classifies the constraints and defines their syntax so that there is a clear and easy way to express constraints. The constraints language is a generalization of graphical constraints. This article is based on a former article of the author (Proceedings of the New Generation Computers Conference) and a Control Data DMRL Report by R. Meersman (1982). Examples come mostly from the hospital case of De Brock and Remmen.

Key words: NIAM, binary semantical network, conceptual scheme, constraint, support, range, tuple, set, injective, surjective, path, route, source, target, lot, nolot, relation.

1. INTRODUCTION

There are two phases in developing a database, namely, the design phase and the implementation phase. In the design phase we need a way to express the relationships between different types of data and the restrictions on the data. That means we need a good conceptual scheme with a set of constraints. How to express the scheme and the constraints is a question of available models. There are different kinds of models for this purpose. However, when we choose one we can consider the following questions:

(1) Can the important ingredients of a conceptual scheme be graphically expressed?
(2) Can some constraints be integrated in the graphical conceptual schemes?
(3) If it is not possible to integrate all the constraints in the graphical conceptual scheme, is there a language which can express the constraints unambiguously?
(4) Is it possible to check the quality of the conceptual scheme including the related constraints syntactically and semantically (e.g. consistency) before the implementation?
(5) If we can accomplish (1)-(4), can we then transform the conceptual scheme in a programming environment to a database scheme of a datamodel, for example, the scheme of a relational database?

A binary semantical network according to NIAM (Nijssen Information Analysis Method) expresses the important ingredients of a conceptual scheme graphically. Some constraints are also integrated in the graphical scheme. These are called graphical constraints [in contrast to the constraints of (3)]. The project RIDL*, led by Professor Meersman of Tilburg University, uses this method as a basis to build a tool to accomplish all the purposes stated in (1)-(4). This tool is divided into different modules which perform the different tasks stated above. The project intends to create an integrated programming environment with which a user can make the conceptual scheme, define the constraints, check the consistency and completeness of the scheme and transform the graphical conceptual scheme with graphical constraints (and eventually also other constraints) into a relational database scheme with constraint specifications. It is realized on Apollo workstations with the help of Oracle. At the end of this article there will be a short but more concrete introduction to the different modules of this tool and how far the implementation goes. For more details of this project, the reader should consult Refs [1, 2].

The present article represents part of the work on the RIDL-C module for the constraint language. This language intends to generalize graphical constraints. Constraints on binary semantical networks were not formalized enough, not always transparent, sometimes ambiguous and unstructured. My work began with interpreting a role as a binary relation instead of as a column (see Section 1.1). The idea of composition of relations is then used as a thread through the whole theory. On this basis we can classify, unify and express constraints easily with the mathematical concepts of domain, range, image, injection, etc.

After the classification of constraints is done, we can consider how to define the syntax. The ideas of the syntax are: it should not be too colloquial, so that the structures are still transparent; it should not give too many possibilities for the same constraint, so that people can easily remember the syntax; and it should not be too mathematical, so that nonmathematicians can also understand it. This syntax was first implemented in YACC under UNIX in the Oracle environment, and Ref. [3] comprises a report on this work. More recent developments are improvements in the language. It can express more constraints, and it is more independent of implementation. This is achieved by expressing constraints with an attribute grammar. The other improvements on Ref. [3] are listed briefly here:

(1) There are more data types defined here. For example, elements in a nolot and elements with more than one or two co-ordinates are also allowed. The sets with such elements are also considered for expressing constraints.
(2) There are more standard functions and more possibilities to define sets in this article.
(3) The examples here come from the Hospital Case instead of the Cris Case. See Refs [1, 2, 4-6].

Details of the grammar are available in a technical report from the author.

1.1. What is a Binary Semantical Network?

Suppose we want to make an information analysis about the classification, structure and mutual relations of certain things in the real world. We can then choose the method of binary semantical networks according to NIAM. The essential ingredients of this method are the concepts of nolot (non-lexical object type), lot (lexical object type) and the binary relations between them. Let us look at the following example. We can design an information system with this method for a hospital. In fact, we are going to use such a binary semantical network for all examples of constraints in this article. This network originates from the example of Remmen and De Brock [4, 5] for their set model of databases. Many examples in this article also come from them, applied in the network situation. The Appendix of this article also contains the graphical conceptual scheme of the binary semantical network of the hospital case. Because the original scheme is too big and complicated, we take only the part of it which we need for our examples of constraints. For those who are interested in the whole scheme, see Ref. [6].

Nolots are sets which represent real world entities or abstract concepts. A lot is a set of lexical objects, like names and numbers, which are used to code properties of the elements in a nolot. Among the nolots in the hospital case there are two important ones: specialist and patient. We may think of specialist and patient as sets of real people. The relation that someone is the patient of some specialist is a binary relation between the two sets specialist and patient. If we want to build an information system which includes these two sets and their relationships, we need some way to code these two groups of real people, for example, the names of the specialists and the names of the patients. In the network we have for these reasons two nolots specialist and patient and also two lots sp_name and pat_name. Besides the relations between the patients and their specialists, we also have the relations between the specialists and their names, and between the patients and their names. Figure 1 is usually used to denote such a situation in a binary semantical network.

Notice that a nolot is expressed by a solid circle and a lot by a dashed circle. We sometimes also use a two-layered circle, dashed and solid, to mean a nolot which is identified with a lot when there is a one-to-one correspondence between their elements and when the distinction between the lot and the corresponding nolot is not so important. We say then that we have a lotnolot. Let us use the name ot for a lot, nolot or lotnolot in general. The two rectangles between two ots are used symbolically to represent a table for a binary relation. The columns are usually called roles and they have names; for example, the two roles of the table between specialist and patient are called having and with. Let us use #1, #2, etc. to denote the elements in a nolot. The table between specialist and sp_name can have for example the following elements at a certain moment:

with    of
#1      Smith J.
#2      Koffman M.
#3      Smith J.

Notice that the specialists #1 and #3 have the same name. Although a role originally means a column, we can think of it as a set of pairs. For example,

of = {(#1, Smith J.), (#2, Koffman M.), (#3, Smith J.)},
with = {(Smith J., #1), (Koffman M., #2), (Smith J., #3)}.
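As a small illustration of this viewpoint, the population of the table can be written down directly as sets of pairs. The following Python sketch is not part of the constraint language; the strings "#1", "#2", "#3" merely stand in for nolot elements, and the variable names are chosen here for illustration.

```python
# Illustrative sketch: a role viewed as a set of pairs.
# The strings "#1", "#2", "#3" are stand-ins for elements of the nolot specialist.

# Role "of": from specialist to sp_name.
of = {("#1", "Smith J."), ("#2", "Koffman M."), ("#3", "Smith J.")}

# Role "with": from sp_name to specialist, obtained by swapping the co-ordinates.
with_ = {(name, spec) for (spec, name) in of}

# Specialists #1 and #3 share a name, so a name alone does not identify a specialist.
specialists_named_smith = {spec for (name, spec) in with_ if name == "Smith J."}
print(sorted(specialists_named_smith))  # prints ['#1', '#3']
```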

From this viewpoint we can consider an ot as a set of simple data and a role as a set of pairs of such data. According to the philosophy of binary semantical networks there should not exist a direct relation between two lots. The relations between lots have to come from nolots. For example, a patient may have the name Jansen and he may have as specialist Koffman M. The connection between the two names comes from the relation between the corresponding patient and specialist in the nolots and it does not come from the two names themselves. In other words, a lot contains data to code elements in a nolot. Lots and the relations between lots do not exist independently. A design of a binary semantical network can be expressed by a diagram which is the combination of some simple diagrams such as the one in Fig. 1. Such a diagram, which tells how all ots are related to each other by roles, is called a graphical representation of a conceptual scheme. We sometimes say just a conceptual scheme. The conceptual scheme in the Appendix is referred to as HOSP in this article.

Fig. 1

1.2. What are the Element Types, Set Types and their Permissible Operations?

In semantical networks we operate on two levels: sets or elements in sets. For example, if we add two numbers together then we operate on the level of elements; if we take the union of two sets then we operate on the level of sets. We notice also that the sets under consideration always have a homogeneous structure. That means, for example, that if one element in the set is a number then every other element in this set is also a number. Just consider for example a lot with the name salary. It contains the data of salaries of certain people, which are numbers. We should also be careful with the type restrictions when we do operations. For example, it is not allowed to take the union of a set of numbers and a set of strings. We are going to introduce all the datatypes and set types we need. The main concern of this article is to make the concepts of constraints clear, so we use only a minimum number of types. In practice we can of course define more types to make some practical problems easier.

1.2(a). Element types

(1) Number. This type corresponds with the mathematical concept of the set of real numbers. It includes integers. The usual arithmetical operations, +, -, *, /, are of course allowed here. We can also compare two numbers by using the numerical order.
(2) String. A string means a sequence of printable characters. We can compare two strings in lexicographical order by ASCII code.
(3) Element of a nolot. An element in a nolot is something unique. We do not operate on such elements with arithmetical operations. We do not compare two elements with lexicographical order. We can only say if two such elements are the same or not.
(4) Pair of two co-ordinates. A role is a set of pairs. We thus need to consider an element with two co-ordinates. A co-ordinate can be a string, a number or a nolot element.
(5) n-tuple. We can generalize a pair to an n-tuple. We can denote it by $(a_1, a_2, \ldots, a_n)$, where $a_i$ is one of the types in (1)-(3) above.

1.2(b). Set types

(1) Set of numbers. A lot can be a set of numbers (that is to say, it has a set of numbers as its value at a certain moment). We thus need to consider a type which consists of sets of numbers. Two such sets can be combined by operations like union, intersection and difference. You can also apply relational operators between two such sets to get true or false as value. For example, you can ask if one number set is a subset of another number set.
(2) Set of strings. A lot can be a set of strings. The set operations and set comparisons work just like the corresponding situations for sets of numbers. We need only to pay attention to the type control when we use such operations and comparisons. There is a special set of strings, namely, a set of dates. For example, the lot in HOSP with the name date is such a set. From this set we can define some number functions which give the year, month, etc. of a date. We shall talk about such functions in the non-declarative constraints.
(3) Nolots. A nolot defines a type. This type consists of all subsets of this nolot. You can consider the unions, intersections and differences of two such subsets. Comparisons of two subsets of the same nolot by relational operators are allowed.
(4) Set of pairs. As we have noticed, a role is a set of pairs. For example, the role of in the example in Section 1.1 is a set of three pairs. The first co-ordinate of a pair is an element from the nolot specialist and the second co-ordinate is a string. We need to consider such kinds of sets in general. Two such set types can be different because the co-ordinates are of different types. If we want to apply union, intersection, etc. to two such sets or we want to compare two such sets, we should first check if they have compatible types.
(5) Set of n-tuples. A set type can consist of sets of n-tuples with the following property: for every i, where $1 \le i \le n$, the ith co-ordinates are all of the same type, namely, one of (1)-(3) from Section 1.2(a). This is just a generalization of a set of pairs.
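The type control described here can be sketched as a check that is performed before a set operation is applied. The Python fragment below is only an illustration under simplifying assumptions: numbers and strings are approximated by Python's own types, nolot elements by a hypothetical NolotElement class, and the function names are invented for this sketch rather than taken from the constraint language.

```python
from dataclasses import dataclass
from numbers import Real


@dataclass(frozen=True)
class NolotElement:
    """Hypothetical stand-in for an element of a nolot."""
    nolot: str   # name of the nolot the element belongs to
    ident: int   # internal identifier, e.g. 1 for "#1"


def element_type(x):
    """Return a type tag for a simple element or for a tuple of elements."""
    if isinstance(x, tuple):
        return tuple(element_type(c) for c in x)
    if isinstance(x, NolotElement):
        return ("nolot", x.nolot)
    if isinstance(x, str):
        return "string"
    if isinstance(x, Real) and not isinstance(x, bool):
        return "number"
    raise TypeError(f"unsupported element: {x!r}")


def set_type(s):
    """Return the common element type of a homogeneous set (None if empty)."""
    types = {element_type(x) for x in s}
    if len(types) > 1:
        raise TypeError("set is not homogeneous")
    return next(iter(types), None)


def checked_union(s, t):
    """Union of two sets, allowed only when their set types are compatible."""
    ts, tt = set_type(s), set_type(t)
    if ts is not None and tt is not None and ts != tt:
        raise TypeError(f"incompatible set types: {ts} and {tt}")
    return s | t


# A role such as "of" is a set of pairs (nolot element, string).
of = {(NolotElement("specialist", 1), "Smith J."),
      (NolotElement("specialist", 2), "Koffman M.")}
print(set_type(of))
```

In this sketch the union of a set of numbers and a set of strings raises a TypeError, which mirrors the rule that such a union is not allowed.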

1.3. What are Constraints?

We have to have certain restrictions on inserting, deleting or changing data in a semantical network. We call such restrictions constraints. For example, a patient should not be released on a date earlier than the date of his admission. If you try to insert such a date for a patient in such a role in HOSP, then the system should reject this insertion. Constraints can be defined as Boolean expressions which should always be true. We thus need to build up tools which can express a Boolean expression in a semantical network. We divide constraints into two categories: declarative constraints and non-declarative constraints. For declarative constraints we need declarative Boolean expressions and for non-declarative constraints we need non-declarative Boolean expressions. The declarative constraints in their simplest situations can be expressed in the form of diagrams with special symbols. In this article we shall also use such diagrams for examples. The distinction between declarative and non-declarative is not absolute. Any constraint can be expressed non-declaratively.

2. RELATIONS AND PATHS

We have seen in the introduction how the state of a role at some moment can be viewed as a set of pairs. The state can change from time to time but the related constraints always have to be met. To be able to describe the states of roles we need the mathematical concepts of relations, composition of relations, etc. Therefore we introduce such elementary concepts in this chapter.

Definition

A relation from A to B, denoted by $f: A \to B$, is a subset of the Cartesian product $A \times B$. We call A the source of the relation f and B the target of f. We denote them by source(f) and target(f), respectively. The support of f, denoted by support(f), is the set defined by

$$\mathrm{support}(f) = \{x \in A \mid (x, y) \in f\}.$$

Notice that support is usually called domain in mathematics. However, domain is also a term used in information systems with a quite different meaning. This is why we choose another name for domain. The range of f, denoted by range(f), is the set defined by

$$\mathrm{range}(f) = \{y \in B \mid (x, y) \in f\}.$$

An element y in B is an image of x under f if $(x, y) \in f$. In this situation x is an original of y. The inverse of the relation f is the relation $f^{-1}$ which is defined by

$$f^{-1} = \{(x, y) \mid (y, x) \in f\}.$$

If $f: A \to B$ and $A \supseteq V$ then

$$f(V) = \{y \in B \mid \exists x \in V \text{ such that } (x, y) \in f\}.$$

If V and f(V) contain only one element, i.e. $V = \{a\}$ and $\{f(a)\} = f(\{a\})$, then we use f(a) to denote the only image of a.

Interpretation

In a binary semantical network, a role from an ot to an ot is a relation. We have seen such examples in Section 1.1. Using the same diagram and the notations introduced now, we have for example the following two relations:

of: specialist → sp_name
with: sp_name → specialist.

These two relations are inverses of each other. With the same population as in Section 1.1, we can say

of({#1}) = {Smith J.}
with({Smith J.}) = {#1, #3}.

We notice that in the same diagram there are two roles with the name of and two roles with the name with. Let us assume that a role is uniquely defined in a semantical network if the object types on both sides are also given. We thus use for example sp_name of specialist to denote the unique role with name of from specialist to sp_name. In this way we can write:

sp_name of specialist({#1}) = {Smith J.}
specialist with sp_name({Smith J.}) = {#1, #3}.

Definition

A path is just a composition of relations. It is also a relation but we emphasize here that it is the composition of some other known relations. Given

$$f_1: A_1 \to A_2,\ f_2: A_2 \to A_3,\ \ldots,\ f_n: A_n \to A_{n+1},$$

we can define the composition $f_n f_{n-1} \cdots f_1$ of these relations as the relation f from $A_1$ to $A_{n+1}$ with the following property: $(x, y) \in f \Leftrightarrow$ there exist $z_2, z_3, \ldots, z_n$ such that

$$(x, z_2) \in f_1,\ (z_2, z_3) \in f_2,\ \ldots,\ (z_n, y) \in f_n.$$

We denote the composition in the following way:

$$f_n f_{n-1} \cdots f_1: A_1 \to A_2 \cdots \to A_n \to A_{n+1}.$$

The way $x \to z_2 \to \cdots \to z_n \to y$ is called a route from x to y. We denote the route by $(x, z_2, z_3, \ldots, z_n, y)$ under $f_n f_{n-1} \cdots f_1$. We have just seen that the composition $f_n f_{n-1} \cdots f_1$ is the set of pairs where the first co-ordinate is the beginning of a route and the second co-ordinate is the end of the same route. This means also that two different routes can give the same element in the path because they have the same beginning and ending. We need later also the concept of a set of routes, thus:

Definition

We use $R(f_n f_{n-1} \cdots f_1)$ to denote the set of routes $(x, z_2, z_3, \ldots, z_n, y)$ under $f_n f_{n-1} \cdots f_1$. This is a subset of the Cartesian product $A_1 \times A_2 \times \cdots \times A_{n+1}$.

Example

If we consider Fig. 1, we have a path in the semantical network HOSP expressed in the following way:

pat_name of patient with specialist with sp_name.

This path is the set of all pairs where the first co-ordinate is the name of a specialist and the second co-ordinate is the name of one of his patients. We can also consider the set of routes

R(pat_name of patient with specialist with sp_name).

For example, two patients with the same name under the treatment of the same specialist shall give different elements here even though they give the same element in the related path.
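The notions of support, range, image, inverse, path and route can be tried out on a small population. The Python sketch below is only an illustration: the function names are chosen here, the specialist data repeat the population of Section 1.1, and the patients "p1", "p2", "p3" with their names are hypothetical.

```python
def support(f):
    """Set of first co-ordinates of the relation f."""
    return {x for (x, _) in f}


def range_(f):
    """Set of second co-ordinates of the relation f."""
    return {y for (_, y) in f}


def inverse(f):
    """Relation obtained by swapping the co-ordinates of every pair."""
    return {(y, x) for (x, y) in f}


def image(f, v):
    """f(V): all images of elements of the set v under f."""
    return {y for (x, y) in f if x in v}


def routes(*relations):
    """Set of routes under f_n ... f_1; relations are passed as f_1, ..., f_n."""
    result = set(relations[0])
    for f in relations[1:]:
        result = {r + (y,) for r in result for (x, y) in f if r[-1] == x}
    return result


def compose(*relations):
    """The path f_n ... f_1: pairs (beginning, end) of all routes."""
    return {(r[0], r[-1]) for r in routes(*relations)}


# Population of Section 1.1.
of = {("#1", "Smith J."), ("#2", "Koffman M."), ("#3", "Smith J.")}
with_ = inverse(of)  # sp_name -> specialist

# Hypothetical data: specialist -> patient and patient -> pat_name.
patient_with_specialist = {("#2", "p1"), ("#2", "p2"), ("#1", "p3")}
pat_name_of_patient = {("p1", "Jansen"), ("p2", "Jansen"), ("p3", "Visser")}

print(image(of, {"#1"}))                  # {'Smith J.'}
print(sorted(image(with_, {"Smith J."}))) # ['#1', '#3']

# pat_name of patient with specialist with sp_name
all_routes = routes(with_, patient_with_specialist, pat_name_of_patient)
path = compose(with_, patient_with_specialist, pat_name_of_patient)
print(len(all_routes), len(path))         # 3 2
```

The two hypothetical patients named Jansen of specialist #2 give two different routes but only one pair (Koffman M., Jansen) in the path, which is the situation described in the example.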

Remark

We can see that in a semantical network we are dealing with sets of elements with one co-ordinate (ots), with two co-ordinates (roles or paths) and with n co-ordinates (sets of routes in a path).

3. CONSTRAINTS ABOUT SUBTYPES

When we look at a conceptual scheme, we often see diagrams such as Fig. 2. This means the nolot A is a subtype of the nolot B. The concept of subtype can be considered as a constraint. We define it in the following way.

Definition

If A and B are nolots and $B \supseteq A$, then we say A is a subtype of B and B is a supertype of A. It is expressed in Fig. 2. We can think of the arrow as a trivial role, namely, the identity relation.

Fig. 2

Notice that in this situation a subset of A is also a subset of B and the operations like unions, intersections, etc. for subsets of A can be considered as operations for the set type defined by B. Theoretically, we should discuss the constraints about subtypes after some other declarative constraints. On the other hand, we use paths to build up constraints in general. We need to make agreements about paths which pass through subtypes and supertypes. So we begin with the constraints about subtypes.

Path through subtypes

If a role is between a supertype and some object type then we can also consider it as a role between a subtype and that object type. To formulate it more precisely, let us consider A, a subtype of B, and $f: B \to C$, a role. Then define $f_1: A \to C$ as

$$f_1 = f \cap (A \times C).$$

Because we always specify the object types which are source and target of a role, we can use f instead of $f_1$. For example, consider the following path in HOSP:

specialist with secretary having salary.

Secretary is a subtype of employee and having is a role from employee to salary. We now consider having as a role from secretary to salary. An element (a, b) is in this path if a is a specialist and b is the salary of one of his secretaries. On the other hand, if we have a role between a subtype and some object type, we can also consider it as a role between its supertype and that object type. We only have to consider it as a subset of a bigger Cartesian product. We can also generalize the idea of subtypes transitively. That means if A is a subtype of B and B is a subtype of C then A is a subtype of C.

Definition

If B has several subtypes $A_1, A_2, \ldots, A_n$, then we say $A_1, A_2, \ldots, A_n$ satisfy the total constraint for subtypes with respect to B if $B = A_1 \cup A_2 \cup \cdots \cup A_n$. We often use a diagram to denote the total constraint for subtypes. We use Fig. 3 for such a diagram, for n = 3.

Fig. 3

Definition

Given $A_1, A_2, \ldots, A_n$, subtypes of B. We say these subtypes satisfy the exclusion constraint for subtypes if $A_i \cap A_j = \emptyset$ for every $i \ne j$. For example,

specialist, secretary subtype-excl employee.

4. UNIQUENESS CONSTRAINTS

4.1. Convention

Observe that a set is usually defined without multiple elements. That is to say, we are not interested in multisets. Since a relation is also a set of pairs, an arbitrary element (x, y) does not repeat. This means that such a condition should always be checked for every state of every semantical network. We do not consider it as a constraint for a special semantical network or a special role of a network. In other words, in Fig. 4 the constraint should always be satisfied.

Fig. 4

4.2. Injective Constraints

Definition

A relation $f: A \to B$ is injective if for every $b \in B$ there is at most one $a \in A$ such that $(a, b) \in f$. In other words, if $(a, b), (a', b) \in f$, then $a = a'$. If we consider a path in a semantical network

$$f_n f_{n-1} \cdots f_1: A_1 \to A_2 \cdots \to A_n \to A_{n+1},$$

then this path satisfies the injective constraint if $f_n f_{n-1} \cdots f_1$ is injective. In other words, there can be different ways to reach an element in $A_{n+1}$ from $A_1$, but they all have to start from the same element.

Proposition

Given a path $f = f_n f_{n-1} \cdots f_1$ in a semantical network. If every $f_i$ satisfies injectivity, then f satisfies the injective constraint.

Diagram

The injective constraint occurs often for a simple path, namely, a role. Figure 5 denotes the injective constraint of f.

Fig. 5

Example

If one insurance number covers only one patient, then we have an injective constraint to be expressed in the following way:

patient identified-by insur_nr of insurance of patient.

Definition

Let us consider n ($n \ge 1$) paths

$$g_1: A \to \cdots \to B_1,\ g_2: A \to \cdots \to B_2,\ \ldots,\ g_n: A \to \cdots \to B_n.$$

Then the n-tuple $(g_1, g_2, \ldots, g_n)$ satisfies the injective constraint if the following two conditions are satisfied:

(1) $\mathrm{support}(g_1) = \mathrm{support}(g_2) = \cdots = \mathrm{support}(g_n)$. In other words, all $g_i$ are defined on the same subset of A.
(2) If $(a, b_1), (a', b_1) \in g_1, \ldots, (a, b_n), (a', b_n) \in g_n$, then $a = a'$.

Remark

This constraint is equivalent with condition (1) above and the injectivity of the product function

$$g_1 \times g_2 \times \cdots \times g_n: A \to B_1 \times B_2 \times \cdots \times B_n,$$

where, writing g for this product, g(a) is defined as $(b_1, \ldots, b_n)$ if $(a, b_1) \in g_1, \ldots, (a, b_n) \in g_n$.

Diagram

For n = 2 and $g_1, g_2$ roles, we can express the injective constraint by the graphical constraint shown in Fig. 6.

Fig. 6

Example

It is not allowed to give a patient more than one patient-treatment in one day. This constraint can be expressed in the following way:

pat-treat identified-by date of pat-treat, patient of pat-treat.

4.3. Functional Constraints

Definition

A relation $f: A \to B$ is a function if $(a, b), (a, b') \in f \Rightarrow b = b'$.

Definition

Let

$$g_1: B_1 \to \cdots \to A,\ g_2: B_2 \to \cdots \to A,\ \ldots,\ g_n: B_n \to \cdots \to A$$

be n paths in a semantical network. Then the n-tuple $(g_1, g_2, \ldots, g_n)$ is said to satisfy the functional constraint if the n-tuple $(g_1^{-1}, g_2^{-1}, \ldots, g_n^{-1})$ satisfies the injective constraint.

Remark

The functional constraint is just the dual concept of the injective constraint. Thus we are not going to implement a special language for functional constraints.
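The two conditions of the injective constraint for an n-tuple of paths, and the functional constraint as its dual, can be checked mechanically on a given state. The Python sketch below is an illustration only; the function names are chosen here and the treatment data ("t1", "t2", ...) are hypothetical.

```python
from itertools import combinations


def support(g):
    return {a for (a, _) in g}


def image(g, a):
    """Set of images of the single element a under g."""
    return {b for (x, b) in g if x == a}


def inverse(g):
    return {(b, a) for (a, b) in g}


def tuple_injective(*paths):
    """Injective constraint for paths g_1, ..., g_n with a common source."""
    supports = [support(g) for g in paths]
    # Condition (1): all paths are defined on the same subset of the source.
    if any(s != supports[0] for s in supports):
        return False
    # Condition (2): two different elements a, a' may not share an image
    # under every g_i at the same time.
    for a, a2 in combinations(supports[0], 2):
        if all(image(g, a) & image(g, a2) for g in paths):
            return False
    return True


def tuple_functional(*paths):
    """Functional constraint: the inverses satisfy the injective constraint."""
    return tuple_injective(*(inverse(g) for g in paths))


# Hypothetical state for: pat-treat identified-by date of pat-treat, patient of pat-treat.
date_of = {("t1", "1990-01-10"), ("t2", "1990-01-10"), ("t3", "1990-01-11")}
patient_of = {("t1", "p1"), ("t2", "p2"), ("t3", "p1")}
print(tuple_injective(date_of, patient_of))   # True

# A second treatment of patient p1 on the same day violates the constraint.
date_bad = date_of | {("t4", "1990-01-10")}
patient_bad = patient_of | {("t4", "p1")}
print(tuple_injective(date_bad, patient_bad))  # False
```

For n = 1 the check reduces to ordinary injectivity of a single relation, so the same function also covers the simple case of a role.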

5. TOTALITY CONSTRAINTS

5.1. Total Constraints

Definition

Given $f: A \to B$. We say f is surjective if range(f) = B. Let $f_n f_{n-1} \cdots f_1$ be a path in a semantical network. If it is required to satisfy such a surjective property, then we say that this is a surjective constraint for this path. Traditionally, however, people use the terminology total constraint instead of surjective constraint. We shall follow this tradition.

Proposition

Given a path $f = f_n f_{n-1} \cdots f_1$ in a semantical network. If every $f_i$ satisfies the total constraint, then f satisfies the total constraint.

Definition

We can generalize the definition above to more than one path. Let

$$g_1: A_1 \to \cdots \to B,\ g_2: A_2 \to \cdots \to B,\ \ldots,\ g_n: A_n \to \cdots \to B$$

be n paths in a semantical network. The n-tuple $(g_1, g_2, \ldots, g_n)$ satisfies the total constraint if $\mathrm{range}(g_1) \cup \mathrm{range}(g_2) \cup \cdots \cup \mathrm{range}(g_n) = B$.

Diagram

For n = 2 and $g_1, g_2$ roles, we can use Fig. 7 to illustrate the total constraint.

Fig. 7

Example

Every prescription of medicine comes from some specialist. This constraint can be expressed in the following way:

med_prescr total_in med_prescr by specialist.

5.2. Constraints for Total Support

Definition

A relation $f: A \to B$ has total support if support(f) = A. Thus a path f in a semantical network is said to satisfy the constraint for total support if support(f) = source(f).

Definition

We can generalize the definition above as follows. Given n paths

$$g_1: A \to \cdots \to B_1,\ g_2: A \to \cdots \to B_2,\ \ldots,\ g_n: A \to \cdots \to B_n$$

in a semantical network. This n-tuple $(g_1, g_2, \ldots, g_n)$ is said to satisfy the constraint for total support if $\mathrm{support}(g_1) \cup \cdots \cup \mathrm{support}(g_n) = A$. Total support is the dual property of the total constraint. The above diagram can also be used to illustrate the constraint of total support for the pair $(g_1^{-1}, g_2^{-1})$.

6. KEY CONSTRAINTS

The concept of key constraints is not new. It is a kind of combination of total support, injective and functional constraints. Because the concept of key is important for databases, we need to know the corresponding concept in semantical networks.

Definition

Given

$$f = f_n f_{n-1} \cdots f_1: A_1 \to A_2 \cdots \to A_n \to A_{n+1}$$

in a semantical network where $A_1$ is a nolot. f is said to satisfy the key constraint if

(1) $\mathrm{support}(f) = A_1$,
(2) if $(a, b), (a', b) \in f$, then $a = a'$,
(3) if $(a, b), (a, b') \in f$, then $b = b'$.

This means that there is a one-to-one correspondence between $A_1$ and range(f). In other words, f satisfies the properties of total support, injectivity and functionality. We say b is the key of a if $(a, b) \in f$. Observe that not every element of $A_{n+1}$ is a key of an element in $A_1$, but for every element in $A_1$ there is a key in $A_{n+1}$.

Definition

Given n paths from a nolot A as follows:

$$g_1: A \to \cdots \to B_1,\ g_2: A \to \cdots \to B_2,\ \ldots,\ g_n: A \to \cdots \to B_n.$$

Then the n-tuple $(g_1, g_2, \ldots, g_n)$ is said to satisfy the key constraint if $g_1 \times g_2 \times \cdots \times g_n: A \to B_1 \times B_2 \times \cdots \times B_n$ satisfies the properties of total support, injectivity and functionality.

Diagram

Figure 8 can be used to illustrate the key constraint for the pair of roles $(g_1, g_2)$ because it is easy to prove that the combination of the constraints in this diagram is equivalent with the key constraint defined above.

Fig. 8

Example

The following words describe that the paths from the nolot med_prescr to the ots pnr, med_code and date define a key constraint:

med_prescr having-key pnr of patient with med_prescr, med_code of medicine of med_prescr, date of med_prescr.

7. SUBSET CONSTRAINTS

Subset constraints express a subset relationship between the supports or the ranges of two paths, or a subset relationship between the two paths themselves. Because of the dual property between the subset relationship of supports and the subset relationship of ranges, we are again going to define the language for one of them, namely, for supports.

7.1. Subsupport Constraint

Definition

For two paths f and g in a semantical network with source(f) = source(g), if $\mathrm{support}(g) \supseteq \mathrm{support}(f)$, then we say (f, g) satisfies the subsupport constraint.

Diagram

Figure 9 is often used for the subsupport constraint for two roles.

7.2. Subrange Constraints

Definition

Let us consider two paths f and g in a semantical network where target(f) = target(g). We say the pair (f, g) satisfies the subrange constraint if $\mathrm{range}(g) \supseteq \mathrm{range}(f)$.

7.3. Subpath Constraints

Definition

Consider two paths f and g in a semantical network with source(f) = source(g) and target(f) = target(g). We say the pair (f, g) satisfies the subpath constraint if $g \supseteq f$.

Diagram

Figure 10 shows the subpath constraint of two roles.

Fig. 10

Example

A secretary who works for a specialist of some department is also a secretary of some department where the specialist works. We express this constraint in the following way:

secretary working-for specialist subpath-of secretary working-for department of specialist.

8. EQUIVALENCE CONSTRAINTS

The equivalence constraints express the set equality between supports or ranges of two paths or the set equality between the two paths themselves.

8.1. Support Equivalence

Definition

Consider two paths f and g in a semantical network where source(f) = source(g). The pair (f, g) is said to be support equivalent if support(f) = support(g). If this property is always required for f and g, then we have a constraint of support equivalence for these two paths.

Diagram

Figure 11 shows that the two roles f and g should satisfy the support equivalence.

Fig. 11

8.2. Range Equivalence

Definition

Suppose target(f) = target(g) for two paths f and g in a semantical network. The pair (f, g) is said to satisfy the constraint of range equivalence if range(f) = range(g).
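The totality, key, subset and equivalence constraints defined so far all reduce to comparisons of supports, ranges or the paths themselves, so each can be written as a one-line predicate on a state. The Python sketch below is an illustration under assumptions: the function names are chosen here, and the populations (prescriptions "m1" to "m3" and the lot values "S01" to "S03") are hypothetical.

```python
def support(f):
    return {a for (a, _) in f}


def range_(f):
    return {b for (_, b) in f}


def is_total(target, *paths):
    """Total constraint: the ranges of the paths together cover the target."""
    return set().union(*(range_(g) for g in paths)) == set(target)


def has_total_support(source, *paths):
    """Total support: the supports of the paths together cover the source."""
    return set().union(*(support(g) for g in paths)) == set(source)


def satisfies_key(source, f):
    """Key constraint for one path: total support, injective and functional."""
    return (support(f) == set(source)
            and len(range_(f)) == len(f)     # injective: every b occurs once
            and len(support(f)) == len(f))   # functional: every a occurs once


def subsupport(f, g):
    return support(f) <= support(g)


def subrange(f, g):
    return range_(f) <= range_(g)


def subpath(f, g):
    return f <= g


def support_equivalent(f, g):
    return support(f) == support(g)


def range_equivalent(f, g):
    return range_(f) == range_(g)


specialist = {"#1", "#2", "#3"}
med_prescr = {"m1", "m2", "m3"}

# med_prescr total_in med_prescr by specialist (hypothetical state).
by = {("#1", "m1"), ("#2", "m2"), ("#1", "m3")}
print(is_total(med_prescr, by))              # True
print(has_total_support(specialist, by))     # False: #3 prescribed nothing

# Names do not form a key for specialist, a hypothetical code does.
of = {("#1", "Smith J."), ("#2", "Koffman M."), ("#3", "Smith J.")}
code_of = {("#1", "S01"), ("#2", "S02"), ("#3", "S03")}
print(satisfies_key(specialist, of), satisfies_key(specialist, code_of))  # False True
```

Because a relation is a set of distinct pairs, comparing the number of pairs with the number of distinct first or second co-ordinates is enough to test functionality and injectivity in this sketch.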

8.3. Path Equivalence

Definition

If two paths f and g in a semantical network satisfy source(f) = source(g) and target(f) = target(g), then the pair (f, g) satisfies the path equivalence if f = g. Thus two paths are equivalent if they both contain precisely the same pairs.

Example

A patient of a certain patient treatment must also be a patient of some admission with this patient treatment and vice versa. This can be expressed by

patient of admission having pat-treat path_eq patient of pat-treat.

9. EXCLUSION CONSTRAINTS

An exclusion constraint has to do with the disjointness of the supports of two paths, of the ranges of two paths, or of the two paths themselves.

9.1. Support Exclusion

Definition

For two paths f and g in a semantical network with source(f) = source(g), we say the pair (f, g) is support exclusive if support(f) ∩ support(g) = ∅. The constraint of support exclusion with respect to (f, g) is the constraint which requires this property to be always true.

Diagram

Figure 12 is used to express the constraint of support exclusion for two roles f and g.

Fig. 12

Proposition

Given two paths f : A → B and g : A → C which satisfy the support exclusion constraint. Then the pair (f_1 f, g_1 g) also satisfies the support exclusion if f_1 : B → B_1 and g_1 : C → C_1.

9.2. Range Exclusion

Definition

For two paths f and g in a semantical network with target(f) = target(g), we say the pair (f, g) is range exclusive if range(f) ∩ range(g) = ∅. The constraint of range exclusion with respect to (f, g) is the constraint which requires this property to be always true.

Remark

The requirement of range exclusion between f and g is equivalent to the requirement of support exclusion between f^-1 and g^-1.

9.3. Path Exclusion

Definition

For two paths f and g in a semantical network with source(f) = source(g) and target(f) = target(g), we say the pair (f, g) is path exclusive if f ∩ g = ∅. The requirement of this property with respect to the pair (f, g) is the constraint of path exclusion.

Diagram

Figure 13 shows the constraint with respect to two roles.

Fig. 13

Example

If we require that the date for admitting a patient is not the same as the date for his leaving, then we can give the following constraint of path exclusion.

date to-begin admission path_excl date to-end admission.

Proposition

If f, g : A → B are path exclusive, h : B → C is injective and u : D → A is functional, then (hg, hf) and (gu, fu) also satisfy the path exclusion.

Example

The following constraint is superfluous if the constraint in the above example is already given.

date to-begin admission of patient path_excl date to-end admission of patient.

10. NUMERIC CONSTRAINTS

A numeric constraint can be considered as a declaration of types of some lots (lotnolots). This constraint tells which lots are sets of numbers. If we know which lots are sets of numbers then we also know which

ones are sets of strings. Thus we can use different types of operations with respect to different lots. This constraint can be easily expressed. For example,

numeric salary, storage, mintime, maxtime, dose, price, times.

11. NON-DECLARATIVE CONSTRAINTS

11.1. Introduction

11.1.1. Basic building bricks for non-declarative constraints

Paths and routes are important concepts for building declarative constraints, as we have seen in the preceding sections. We can also build up the non-declarative constraints from routes and paths. We give an example here. Consider the path

salary of specialist working-for department.

Suppose we have the following routes in the path:

working-for department    of specialist    salary
&1                        #4               60,000
&2                        #1               60,000
&2                        #2               60,000
&2                        #3               50,000

In the role working-for there are four elements and in the role of there are four elements. If we want to express that the salary of the specialists in department &2 should not exceed 70,000 dollars, then we have to deal with the following set of pairs of the related path

{(&2, 60,000), (&2, 50,000)},

where the second co-ordinate should not exceed 70,000. On the other hand, if we require that the sum of the salaries of the specialists in department &2 should not exceed 200,000 dollars, then we have to deal with the three routes beginning from &2 above. That is to say

{(&2, #1, 60,000), (&2, #2, 60,000), (&2, #3, 50,000)}.

We notice that the sum of the second co-ordinates of the first set is not enough to express this concept. Instead we should sum up the last co-ordinates of these routes of three co-ordinates. We should design our language so that it can express all these kinds of concepts.

11.2. Standard Set Functions

We need some special set functions to express the non-declarative constraints more easily.

Definition

Given n roles f_1 : A_1 → A_2, f_2 : A_2 → A_3, ..., f_n : A_n → A_{n+1} in a semantical network. We want the following to be defined as standard functions. For a ∈ A_1 and A_1 ⊇ V and f = f_n f_{n-1} ... f_1:

f(V) = {b | (a, b) ∈ f and a ∈ V}.

Notice if V = {a} then we have the image set of a. We also use the notation f(a) = b if (a, b) ∈ f and b is the only image of a in A_{n+1} under f. Thus if f({a}) has only one element, then {f(a)} = f({a}). Furthermore, when V = source(f), we are allowed to use range(f) instead of f(V).

f|V = {(a, b) | (a, b) ∈ f and a ∈ V}

is the restriction of the path f on V.

R(f_n f_{n-1} ... f_1)|V = {(a_1, a_2, ..., a_{n+1}) | (a_i, a_{i+1}) ∈ f_i for all i, a_1 ∈ V}

is the subset of all routes which begin at an element in V under f_n f_{n-1} ... f_1. This gives the same result as f|V if n = 1.

Definition

If we have a set S of n-tuples then define

F(S) = {a_1 | (a_1, a_2, ..., a_n) ∈ S};
L(S) = {a_n | (a_1, a_2, ..., a_n) ∈ S}.

11.3. Standard Element Functions

Definition

For a set S we can use the following function to find the number of elements in this set:

card(S) = #{x | x ∈ S}.

Definition

If we have a set S of n-tuples, and a = (a_1, a_2, ..., a_n) an n-tuple, then define

first(a) = a_1;
last(a) = a_n;
sumf(S) = x_11 + x_21 + ... + x_m1, where x_1, x_2, ..., x_m are the elements in S and the first co-ordinates x_11, x_21, ..., x_m1 are all numbers;
suml(S) = x_1n + x_2n + ... + x_mn, where the last co-ordinates x_1n, x_2n, ..., x_mn are all numbers.

Definition

Consider a set S of strings or numbers. We can compare two elements in such a set to decide which one is bigger. Thus there exists a maximum and a minimum for S:

max(S) = x, where x ∈ S and x ≥ y for every y ∈ S;
min(S) = x, where x ∈ S and x ≤ y for every y ∈ S.

Definition

Let us consider a non-empty set of numbers S = {a_1, a_2, ..., a_n}. Then:

sum(S) = a_1 + a_2 + ... + a_n;
avg(S) = sum(S)/card(S).
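The difference between a path and its routes in the salary example of Section 11.1.1 can be made concrete with a small sketch. It assumes that a role is modelled as a set of pairs oriented from department to specialist and from specialist to salary; this orientation, the Python function names and the data layout are assumptions made for illustration, and only the standard functions themselves (image, restriction, R, F, L, card, suml) come from the definitions above.

```python
def image(path, V):
    """f(V): the targets reached from the elements of V."""
    return {b for (a, b) in path if a in V}

def restrict(path, V):
    """f|V: the pairs of the path whose first co-ordinate lies in V."""
    return {(a, b) for (a, b) in path if a in V}

def routes(roles, V):
    """R(f_n ... f_1)|V, with the roles given in the order f_1, ..., f_n."""
    result = {(a,) for a in V}
    for role in roles:
        # Extend every partial route by each pair that starts where it ends.
        result = {r + (b,) for r in result for (a, b) in role if a == r[-1]}
    return result

def F(S):
    return {t[0] for t in S}

def L(S):
    return {t[-1] for t in S}

def card(S):
    return len(S)

def suml(S):
    return sum(t[-1] for t in S)

# The routes of the example: department -> specialist -> salary.
working_for = {("&1", "#4"), ("&2", "#1"), ("&2", "#2"), ("&2", "#3")}
salary_of = {("#4", 60000), ("#1", 60000), ("#2", 60000), ("#3", 50000)}

r = routes([working_for, salary_of], {"&2"})
path = {(t[0], t[-1]) for t in r}   # begin and end points only

print(card(r), suml(r))        # 3 170000
print(card(path), suml(path))  # 2 110000
print(suml(r) <= 200000)       # True
```

The two specialists with the same salary collapse into one pair of the path, which is why the sum over the path (110,000) understates the sum over the routes (170,000).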

two numbers, or two sets of the same type. In the last situation they mean include or included, respectively.
(4) If, for each, there exist: The syntax for such Boolean expressions is:

if Boolean expression then Boolean expression.
if Boolean expression then Boolean expression else Boolean expression.
for each identifiers from set expression; ...; identifiers from set expression: Boolean expression.
there exist identifiers from set expression; ...; identifiers from set expression: Boolean expression.

Definition

For the lotnolot date there are a few special number functions:

day(x) = The day related to x; for example, day(840302) = 2.
month(x) = The month related to x; for example, month(840302) = 3.
year(x) = The year related to x; for example, year(840302) = 1984.
daysof(x, y) = The days between two dates x, y; for example, daysof(840302, 840405) = 34.

Remark

It is convenient to know what the current day is. Thus there should be a function without parameters which gives the current date as the result. We call it today.

11.4. Expressions and Operators


The first two constructions look like the standard if-statement. The difference is that they deliver values: true or false. if ... then ... can be replaced by imply. The for each expression becomes true if the Boolean expression after the colon is true for every permissible choice of values of the identifiers. The there exist expression becomes true if the Boolean expression is true for at least one permissible choice of values of the identifiers.

11.5. Examples of Non-declarative Constraints

Datatypes and operators

Example

We have seen in Section 1 what the element types are. There are essentially four kinds of element types: string, number, an element in a nolot and tuple. A tuple can have n co-ordinates. If n = 1, then we have the special situation of a string, a number or an element from a nolot, etc. An element of a role or a path can be described as a tuple with n = 2. Furthermore, a set can be a set of elements of the element types described above. It is important that all elements in a set are of the same type. The following operators are important. They are closed under operations. That means, if we apply such operators on expressions of the proper type, we still get an expression of the same type:

Every specialist is allowed to use a number of beds. The sum of beds of all specialists should not exceed 15% of the total beds. The constraint can be expressed as:

+, -, *, /: for number expressions;
∪ (union), ∩ (intersect), minus: for sets of the same type.

Boolean expressions and Boolean operators

A constraint can be defined as a Boolean expression which should always be true. We want to classify constraints so we should first define Boolean expressions. All other types of expressions here are only used for constructing Boolean expressions:

(1) Boolean constants: true, false.
(2) Boolean operators: and, or, not, iff, imply.
(3) Relational operators: We can use the relational operators =, ≠ (< >) for two expressions of the same type. We can apply ≥ (> =), ≤ (< =) on two strings,

suml(bed of specialist) < = card(bed) * 0.15.

Notice that this is not the same as sum(L(bed of specialist)). The latter can be much less because there can be some specialists having the same number of beds.

Example

The number of patients who live in Eindhoven should be at least half of the total number of patients.

2 * card(patient living-in place({“Eindhoven”})) > = card(patient).
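The two constraints above can be checked on a small sample state. The sketch below is an illustration under assumptions: the pairs, the total of 80 beds and the variable names are invented for the example, and a path is again modelled as a set of pairs whose first co-ordinate is the source element.

```python
# Hypothetical state: (specialist, number of beds) pairs of "bed of specialist".
bed_of_specialist = {("s1", 3), ("s2", 3), ("s3", 4)}
total_beds = 80   # stands in for card(bed)

# suml adds the last co-ordinate of every pair; L first collapses duplicates.
suml_beds = sum(beds for _, beds in bed_of_specialist)
sum_of_L = sum({beds for _, beds in bed_of_specialist})

print(suml_beds, sum_of_L)                # 10 7
print(suml_beds <= total_beds * 0.15)     # True

# Hypothetical state: (place, patient) pairs of "patient living-in place".
living_in = {("Eindhoven", "p1"), ("Eindhoven", "p2"), ("Tilburg", "p3")}
patients = {"p1", "p2", "p3"}

# Image set of {"Eindhoven"}: the patients who live there.
in_eindhoven = {p for place, p in living_in if place in {"Eindhoven"}}
print(2 * len(in_eindhoven) >= len(patients))   # True
```

Specialists s1 and s2 both have three beds, so sum(L(...)) counts that value once and gives 7, whereas suml gives the intended total of 10.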

Example

Department 9 employs only nurses who live in Eindhoven.

for each x from nurse: if department of nurse(x) = “9” then place lived-by nurse(x) = “Eindhoven”.

Notice we use the notation (x) after the path place lived-by nurse because we suppose this path satisfies the functional constraint. There is only one place where a nurse lives officially.
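The reading of for each and if ... then as value-delivering Boolean expressions can be sketched as follows. This is an illustration only: functional paths are modelled as Python dictionaries, the data are invented, and the helper implies is a hypothetical name standing for the operator imply.

```python
def implies(p, q):
    """Boolean value of 'if p then q' (the operator imply)."""
    return (not p) or q

# Hypothetical state; functional paths are modelled as dictionaries.
nurses = {"n1", "n2", "n3"}
department_of_nurse = {"n1": "9", "n2": "9", "n3": "4"}
place_lived_by_nurse = {"n1": "Eindhoven", "n2": "Eindhoven", "n3": "Tilburg"}

def constraint_holds(place_of):
    # for each x from nurse: if department(x) = "9" then place(x) = "Eindhoven"
    return all(
        implies(department_of_nurse[x] == "9", place_of[x] == "Eindhoven")
        for x in nurses
    )

print(constraint_holds(place_lived_by_nurse))   # True

# Moving nurse n2 (department 9) to Tilburg violates the constraint.
moved = dict(place_lived_by_nurse, n2="Tilburg")
print(constraint_holds(moved))                  # False
```

A there exist expression would use any in place of all over the same kind of generator.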


Example

Department 9 should have at least three nurses:

card(nurse working-for department({9})) > = 3.

Example

The number of specialists who work for department 9 and who live in Eindhoven should be more than 3:

card(specialist working-for department({9}) intersect specialist living-in place({“Eindhoven”})) > 3.

We can express the same constraint in the following way:

card({x from R(place lived-by specialist working-for department) | first(x) = 9 and last(x) = “Eindhoven”}) > 3.

This is clearly an example showing that the concept of path is not enough. If you consider only the begin and end points of routes here then we have only one pair, namely (9, “Eindhoven”). The different specialists make different routes and the cardinal number of the set of such routes is important for the constraint.

Example

The date of entering the hospital of an admission of a patient should not be later than the date of leaving.

for each x from admission: date to-begin admission(x) < = date to-end admission(x).

Example

If the admission reason of a patient is “informaritis”, then the place of staying in the hospital should have code 5 and the patient should stay more than 4 days in the hospital:

for each x from admission with adm_reason({“informaritis”}): daysof(date to-begin admission(x), date to-end admission(x)) > 4 and hasp_place of admission(x) = 5.

Example

Comparing the rate (price per hour) of all treatments, the maximum rate should not be more than 10 times the minimum rate:

max(range(rate of treatment)) < = 10 * min(range(rate of treatment)).

Example

If the treatment is called KN08 then the patient treatment should be done by specialists 12 or 14 and the place of the patient treatment should not be A12:

for each x from pat-treat: if treat-code of treatment of pat-treat(x) = “KN09” then sp-id of specialist of pat-treat(x) in {12,14} and hasp_place of pat-treat(x) < > “A12”.

Example

If someone gets a treatment of “KN02” more than one time, then the difference of the two dates of treatments should be more than 4 days:

for each x, y from pat-treat with treat-code({“KN02”}): if patient of pat-treat(x) = patient of pat-treat(y) and x < > y then daysof(date of pat-treat(x), date of pat-treat(y)) > 4.

Example

If the med_code of a medicine is “M12”, then it is not allowed to be taken by a patient more than two times a day. Furthermore, each time he may not take more than four pieces and the total amount he takes per day may not be more than six pieces:

for each x from med_prescr with medicine with med_code({“M12”}): freq of med_prescr(x) < = 2 and num_units of med_prescr(x) < = 4 and freq of med_prescr(x) * num_units of med_prescr(x) < = 6.

Example

If the danger code of the medicine is 12, then there should be still a medicine of the same kind with only half of the danger (i.e. code = 6):

for each x from medicine with dang_code({12}): there exists y from medicine with dang_code({6}): med_kind of medicine(x) = med_kind of medicine(y).

Example

If a patient has more than one medicine prescription with code “M12”, then the two dates of prescriptions should be at least 15 days away from each other.

for each x, y from med_prescr with medicine with med_code({“M12”}): if x < > y and patient of med_prescr(x) = patient of med_prescr(y) then daysof(date of med_prescr(x), date of med_prescr(y)) > = 15.
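The date functions of Section 11.3 and the pairwise constraint above can be sketched together. The sketch rests on assumptions: dates are six-digit YYMMDD integers in the 1900s, as in the examples day(840302) = 2 and year(840302) = 1984, and daysof is taken to be the absolute number of days between two dates, because the constraint is checked for both orders of x and y. The prescription data and the function name far_enough are hypothetical.

```python
from datetime import date
from itertools import permutations

def to_date(x):
    """Convert a YYMMDD integer such as 840302 to a date (assumes 19YY)."""
    return date(1900 + x // 10000, (x // 100) % 100, x % 100)

def day(x):
    return x % 100

def month(x):
    return (x // 100) % 100

def year(x):
    return 1900 + x // 10000

def daysof(x, y):
    """Number of days between two YYMMDD dates, regardless of order."""
    return abs((to_date(y) - to_date(x)).days)

print(day(840302), month(840302), year(840302))  # 2 3 1984
print(daysof(840302, 840405))                    # 34

# Hypothetical "M12" prescriptions: identifier -> (patient, date).
prescriptions = {
    "rx1": ("p1", 840302),
    "rx2": ("p1", 840405),
    "rx3": ("p2", 840310),
}

def far_enough(prescr, min_days=15):
    # for each x, y: if x < > y and same patient then daysof(...) > = min_days
    return all(
        daysof(dx, dy) >= min_days
        for (px, dx), (py, dy) in permutations(prescr.values(), 2)
        if px == py
    )

print(far_enough(prescriptions))                 # True
```

permutations yields every ordered pair of two different prescriptions, which corresponds to the condition x < > y in the constraint.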

Example

If a patient was born before 1920 and he is admitted to hospital because of “informaritis” then he has to stay in hospital more than 15 days:

for each x from admission: if birthday of patient with admission(x) < 200101 and adm-reason of admission(x) = “informaritis” then daysof(date to-begin admission(x), date to-end admission(x)) > 15.

Example

A medicine may only be given to a patient who is staying in the hospital:

for each (x, y) from patient of med_prescr: there exists z from admission of patient({y}): date to-begin admission(z) < = date of med_prescr(x) and date of med_prescr(x) < = date to-end admission(z).
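The nesting of for each over the pairs of a path and there exists over an image set can be sketched as follows. The data, the dictionary representation of the functional paths and the function name are assumptions made for the illustration; dates are YYMMDD integers within one century, so that integer comparison agrees with date order.

```python
# Hypothetical state. Pairs of "patient of med_prescr": (prescription, patient).
patient_of_prescr = {("rx1", "p1"), ("rx2", "p2")}
# Pairs of "admission of patient": (patient, admission).
admission_of_patient = {("p1", "adm1"), ("p2", "adm2"), ("p2", "adm3")}
# Functional paths modelled as dictionaries.
date_begin = {"adm1": 840301, "adm2": 840101, "adm3": 840401}
date_end = {"adm1": 840310, "adm2": 840110, "adm3": 840420}
date_of_prescr = {"rx1": 840305, "rx2": 840410}

def given_during_stay(prescr_dates):
    # for each (x, y) from patient of med_prescr:
    #   there exists z from admission of patient({y}): begin(z) <= date(x) <= end(z)
    return all(
        any(
            date_begin[z] <= prescr_dates[x] <= date_end[z]
            for (p, z) in admission_of_patient
            if p == y          # z ranges over the image set of {y}
        )
        for (x, y) in patient_of_prescr
    )

print(given_during_stay(date_of_prescr))   # True

# A prescription dated outside every admission of p2 violates the constraint.
late = dict(date_of_prescr, rx2=840501)
print(given_during_stay(late))             # False
```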

Here we use (x, y) to represent an element in the path, that means x is the first co-ordinate and y is the second. We can also use x to represent an element with two co-ordinates. We then have to use first and last to get the two co-ordinates:

for each x from patient of med_prescr: there exists z from admission of patient({last(x)}): date to-begin admission(z) < = date of med_prescr(first(x)) and date of med_prescr(first(x)) < = date to-end admission(z).

Example

The total number of working hours in a week of all specialists in department 5 is no more than 300 and no less than 200 h:

suml(R(work_hours of specialist working-for department)|{5}) > = 200 and suml(R(work_hours of specialist working-for department)|{5}) < = 300.

Example

If an employee earns less than $10,000 a year, he should have the type of health insurance with code 1.

for each x from employee: if salary of employee(x) < 10000 then insur_code of insurance of employee(x) = 1.

Example

Every department should have a department head who is an employee in the same department. Especially, departments 1, 3, 4 must have a specialist as department head.

for each x from department: depart-head of department(x) in employee working-for department({x}) and if x in {1,3,4} then depart-head of department(x) in specialist working-for department({x}).

Remarks

As we have mentioned before, the examples come mostly from Refs [4, 5]. Their set model is used to develop a theory about the logical structure and constraints of databases. Intuitively, we can think of the set model as a kind of mathematical abstraction of the relational databases. In their model there are four kinds of constraints: attribute constraints, tuple constraints, table constraints and database constraints. Our theory is mathematically well founded, just like theirs, but easier to understand. In comparison with their mathematical notation, our syntax is more colloquial and more understandable. Furthermore, a database constraint has to combine different tables and its expression is thus clumsier. Such a constraint (e.g. the 14th example in Section 11.5) can be expressed naturally in the same way as other constraints in binary semantical networks.

12. CONCLUDING REMARKS

As we know, developing a large relational database is complicated work. It has many phases and it requires the participation of many different people: users, analysts, database administrators, programmers. Since most large systems have very long lifetimes, the decisions which have been taken have long-lasting effects. The relational database in the present implementation ignores the semantics of the data, and consequently these semantics have to be handcrafted later with application programs. Database design for this model often starts out too early from rough, application-inspired aggregations of data which are later made correct by using decomposition (normalization) rules, a technique which needs a lot of intuition on the part of the database engineer. An integrated design environment for information systems such as RIDL* makes sure that you pay a lot of attention to semantics in the design phase. The correctness of the conceptual scheme and constraints is checked before you choose the DBMS. The transformation module transforms the conceptual scheme automatically into a suitable (by default normalized) relational database scheme. Here follows a short introduction to a few of the most important modules in RIDL*. RIDL-G allows a user to make a graphical conceptual scheme with related graphical constraints according to the rules of NIAM. In the Appendix is


a part of a scheme (Hospital Case) made by this module. After this scheme is made, you can save it with the RIDL-DB module. RIDL-DB contains a meta-database and also provides the facilities to query this database and generate the cross-reference reports on conceptual schemes. RIDL-A is the analyser module of RIDL*. Its purpose is to check if the conceptual scheme is correctly constructed according to the rules of the binary model (NIAM). For example, it checks if every object type in the model has a unique name, if a subset constraint is superfluous, if some types of constraints are consistent, if a nolot can be referred to lexically, etc. This analyser checks in fact whether the scheme is good enough for a correct transformation to a relational database scheme by RIDL-M later. This transformation can only correctly take place after the analyser is satisfied with its analysis. RIDL-M is a module that maps a binary conceptual scheme into a relational scheme (by default fully normalized). The generated relational scheme together with the generated additional constraint specifications for the semantics given in the binary conceptual scheme (graphical constraints at this moment) is equivalent to the given binary conceptual scheme [2]. This transformation is based on database schema transformation theory [7]. Most constraints are transformed into procedural constraints because they are not supported by current implementations of the relational model. The relational scheme built internally by RIDL-M is independent of any target DBMS; it is called a generic relational scheme. From this generic relational scheme a scheme definition for any relational DBMS can be derived by using the specific database definition language related to the DBMS. At this time RIDL-M generates fully operational Oracle, Ingres and DB2 scheme definitions, and a neutral scheme definition based on the SQL2 standard. RIDL-C is the generalization of graphical constraints. This is a module which tries to express constraints with a certain language and this is what this article is about. After the author left Infolab in Tilburg University, work on it has proceeded. The essential idea and syntax here are adopted. These constraint checks, just like the graphical constraints, should also be analyzed by the analyser RIDL-A before the transformation of RIDL-M takes place. That means there should also be further developments in the analyser (RIDL-A) and mapper (RIDL-M). What we have discussed here are essentially static constraints. The dynamic constraints are considered as an extension of this language and others in the project are working on them. We can go further to make a DBMS based on binary semantical networks. The formalism here supplies a method to analyze the related problems mathematically, thereby preventing a lot of trouble in implementation. It is in the next part of the project.

Acknowledgements-I hereby thank Professor R. Meersman and my colleagues at Infolab, Tilburg University where I used to work, for their advice and help.

REFERENCES

[1] O. M. F. de Troyer. RIDL*: a tool for the computer-assisted engineering of large database in the presence of integrity constraints. Proc. ACM-SIGMOD Int. Conf. Management of Data, Portland, Ore. (1989).
[2] O. de Troyer, R. Meersman and P. Verlinden. RIDL* on the CRIS case: a workbench for NIAM. Proc. IFIP WG 8.1 Working Conf. Computerized Assistance during the Information System Life Cycle, London, England (1988).
[3] S. H. Nienhuys-Cheng. Constraints in semantical networks. Proc. Conf. New Generation Computers, pp. 5-14, Beijing, China, Apr. (1989).
[4] E. O. de Brock. De Grondslagen van Semantische Databases. Academic Service, Schoonhoven, The Netherlands (1989); also to appear in English, Prentice-Hall, Englewood Cliffs, N.J.
[5] F. Remmen. Databases, Grondslagen van de logische structuur. Academic Service, Schoonhoven, The Netherlands (1982).
[6] D. de Reus. CNS on the hospital case. Technical Report, Infolab, Tilburg University, Tilburg, The Netherlands (1989).
[7] I. Kobayashi. Losslessness and semantic correctness of database schema transformation: another look of schema equivalence. Information Systems 11(1), 41-59 (1986).

(Appendix on pp. 511-513)

APPENDIX

Graphical Conceptual Scheme of the Hospital Case

Fig. A.1. HOSP: outline

Fig. A.2. HOSP: specialist, employee

Fig. A.3. HOSP: patient

Fig. A.4. HOSP: patient treatment

Fig. A.5