On rules and integrity constraints in database systems

On rules and integrity constraints in database systems

On rules and integrity constraints in database systems T W Ling and P K Teo Recently, several researchers have proposed incorporating rules into data...

1MB Sizes 16 Downloads 107 Views

On rules and integrity constraints in database systems T W Ling and P K Teo

Recently, several researchers have proposed incorporating rules into database systems. These rules typically embody several diverse concepts, e.g., deductive rules, production rules, and authori=ation rules. However, most systems treat these diverse eon- ' cepts homogeneously. Their roles are often misunderstood because of this lack o f distinction. The paper clarifies the concepts Of deductive rules, production rules, authorization rules, and integrity constraints and points out the essential differences between them. A Jramework for comparison is established. Several problems with current implementations o f the rule mechanisms are highlighted. The advantages and disadvantages of each o f these concepts and the domains Jbr which the)' are suitable are also described. databases, database systems, rules, constraints, object-oriented database systems

Recently, several researchers have proposed incorporating rules into database systems ~-4. These rules capture additional semantics (knowledge) for databases. The rule mechanisms employed in these systems often embody several diverse concepts, e.g., deductive rules, production rules, and authorization rules. The different forms of rules have different functions. However, most systems treat these diverse concepts homogeneously. Their roles are often misunderstood because of this lack of distinction. Deductive rules 5 provide a mechanism to derive data that are not explicitly stored in databases. This is the concept of virtual or derived data. Deductive rules are more powerful and expressive than relational views. However, support for deductive rules can be problematic with negative information and recursion. One of the areas where production rules have been successfully used .is in the representation of domain knowledge in rule-based expert systems such as Mycin 6 and XCON v. Triggers and alerters (both are types of production rules) have been used to monitor and react to events that occur in databases ~,3,~. In some objectoriented database management systems (OODBMSs), e.g., Postgres 2, triggers are also used to enforce integrity constraints. However, integrity constraints exist that are difficult to enforce using triggers.

Department of Information Systemsand Computer Science, National University of Singapore. Lower Kent Ridge Road. Singapore0511 Vol 34 No 3 March 1992

In a deductive or relational database, integrity constraints are data dependencies that database states are compelled to obey 9. These constraints represent knowledge that captures the business rules of the application domain. A database operation whose completion violates any integrity constraint should be denied. This has given rise to several proposals for integrity constraint checking 1~2. An efficient form of incremental integrity constraint checking has been proposed 9. Authorization rules 13 are used to control access to sensitive information (e.g., salary). Access is denied if it leads to a violation of an authorization rule defined in the database. This paper makes several contributions. First, the differences between deductive rules, production rules, authorization rules, and integrity constraints are highlighted. A framework is established against which this comparison can be made. Second, several problems with current implementations of these notions in some OODBMSs are highlighted. Third, the advantages and disadvantages of each of these concepts and the domains for which they are suitable are discussed. The next section describes the framework used for comparison. Then, each of these concepts is compared in terms of the framework and their use in some OODBMSs discussed. After that, advantages and disadvantages of each of these concepts are pointed out and the domains for which they are suitable discussed. The comparison is summarized, and finally conclusions are given.

F R A M E W O R K FOR C O M P A R I S O N A framework is established against which to compare deductive rules, production rules, authorization rules, and integrity constraints. The basis for comparison in this paper is given in Table 1. Each of these criteria is discussed in the subsequent sections and a summary of the comparison is then presented. The following schema will be used in many of the examples in this paper: PERSON (IC#, Name, Address) EMPLOYEE (Emp#, IC#, SalesAmt, Salary, Bonus, Deptno) MANAGES (Mgrno, Deptno) SUPPLIER (S#, Sname) PART (P#, Pname, Colour, Qty On Hand, Reorder Point) SUPPLIES (St, P#, Qty)

095(~5849/92/030147 12 © 1992 Butterworth Heinemann Ltd

147

On rules and integrity constraints in database systems Table 1. Comparison criteria Criterion

Description

Underlying theory

Whether there is some underlying theory, e.g., logic on which concept is based

Possible representations (syntax)

How concept is represented syntactically

Semantics

Interpretation or meaning of concept. Declarative or procedural aspects of concept are discussed

Support for negation and recursion

Support of these notions can enhance expressiveness of these concepts

Impact on extensional database

Whether each activation adds to extensional database

Possible conflicts

Whether conflicts are possible within set of each type of rules

Order of activation/evaluation of rules

Whether there is ordering of each rule within rule set. Possibility of h/~ving nondeterministic activation of rules is explored

Optimization

Whether optimization is possible

Control strategies

Whether evaluation strategy is forward chaining (bottom-up) or backward chaining (top-down)

Role in database operations

Part that each concept plays in database operations (e.g., select, updates, etc.)

COMPARISON OF CONCEPTS Underlying theory and syntax In a deductive database, deductive rules comprise the intensional database and are usually represented by H o r n clauses 14. H o r n deductive rules are based on firstorder logic. Logic can be used as a uniform language for representing database facts, views, integrity constraints, and data manipulation and as a programming language. For instance, the programming language Prolog ~5 is based on a Horn-clause subset of logic. Datalog ~6 is another deductive rule-based language (in fact a subset of Prolog) designed specifically to interact with databases. Examples 1 and 2 show the use of H o r n clauses in representing views and data manipulation, respectively.

Example 1: views Clauses can be used to define views. For example: HIGH_SAL_EMP(Emp//, Salary, Deptno):EMPLOYEE(Emp#, IC#, SalesAmt, Salary, Bonus, Deptno), Salary > = 30000. defines a view of highly paid employees. Another view, V E M P L O Y E E , can be defined based on a join o f the E M P L O Y E E and the P E R S O N predicates over the IC# attribute: VEMPLOYEE(Emp#,IC//,Name,Address,SalesAmt, Salary,Commission):148

EMPLOYEE(Emp#,IC#, SalesAmt,Salary,Bonus, Deptno), PERSON(IC#,Name, Address), Commission = SalesAmt * 0.25. Notice that Commission is a virtual attribute that is derived from the SalesAmt field. Derived data are generated on the fly in response to a query and automatically reflect any changes made to the base predicates from which they are derived. For instance, whenever the value of the SalesAmt attribute is changed, the Commission field is automatically changed. The idea of virtual columns has been suggested elsewhere 2. Views can also be defined on other views. For instance, another view L A Z Y - S A L E S M A N , based on the VEMP L O Y E E view, can be defined as: LAZY-SALESMAN(Emp#,Name,Commission):VEMPLOYEE(Emp//,IC#,Name,Address,SalesAmt, Salary,Commission), Salary + Commission < 5000.

Example 2." data manipulation In queries, subject to proper authorization, users may access views and virtual attributes as they would access data from other parts of the schema. The following query, in SQL, retrieves two attributes, N a m e and Commission (which is a virtual attribute), from the view called L A Z Y - S A L E S M A N : Select Name, Commission From LAZY-SALESMAN The answering of a query on a deductive database becomes more of a computation rather than being simply a retrieval of information ~4. Different ways of representing integrity constraints within a logic formalism have been proposed 9:7,18, e.g., tuple calculus, closed first-order formula, clause, etc. Example 3 provides some integrity constraints using a logic formalism called IC-formula 9. It has been shown 9 that an integrity constraint which involves universal quantifiers, existential quantifiers, negative conditions, and/or disjunctive conditions can be expressed by an ICformula easily. IC-formula is more powerful and expressive than some of the other methods of representing integrity constraints, e.g., tuple calculus, closed firstorder formula, clause j7:8, etc. Refer to Ling 9 for a formal definition of IC-formulas.

Example 3: integrity constraints ICI: All parts are either red or blue. PART( . . . . Colour . . . . ) --> Colour = 'red' V Colour = 'blue' where '_' represents the anonymous variable ~5. IC2: N o supplier supplies all the parts. SUPPLIER(S//,_) PLIES(S#,P#, ))

-- >

PART(P// . . . . . . . . ),

not(SUP-

Information and Software Technology

T W L I N G AND P K TEO

Unfortunately, unlike deductive rules and integrity constraints, production rules are not based on any particular underlying theory. Production rules generally reside in the rule base of a production system 19 and usually take the form: IF condition is satisfied THEN perform some action(s) or draw some conclusion(s). Note that production rules can make inferences in their consequent portion. This kind of rule is called a Mycintype rule, after the Mycin expert system 6 that uses this kind of rule. There appears then to be some similarity with deductive rules. However, they differ from deductive rules in at least two areas. First, their execution semantics is different, as shall be seen later. Second, many rule-based expert systems, e.g., Mycin, attach certainty factors to their production rules. These factors measure how strongly the conclusions are believed to be true and allow for probabilistic reasoning. Although certainty factors can also be attached to deductive rules if really desired, it is not normally done. Recently, several database researchers 3,2°,2~ have extended the general form of a production rule to a triple (E, C, A), where E is an event or events being monitored, C is the condition(s) that must be satisfied before the production rule can be activated, and A is the action(s) to be performed when the rule is activated. Such a rule is commonly referred to as an ECA rule. An event can be a database operation (e.g., insert, delete), a temporal event (e.g., an elapsed time), or an application-generated event. When A is a sequence of database updates, the ECA rule functions as a trigger. When A is a signal that is despatched to a user or an external program, the ECA rule functions as an alerter.

Example 4 An employee is automatically rewarded with a $50 bonus, up to a m a x i m u m bonus of $300 (a constraint), whenever each update to the SalesAmt field of the employee's record in the E M P L O Y E E table results in an increase of $1000 in the SalesAmt field. This can be done automatically using two triggers defined as follows (using a form similar to that implemented in Vax Rdb/ VMS22): AFTER UPDATE OF SalesAmt ON EMPLOYEE REFERENCING OLD AS OLD_EMPLOYEE NEW AS NEW_EMPLOYEE WHEN N EW_EMPLOYEE.SalesAmt OLD_EMPLOYEE.SalesAmt > = 1000 (UPDATE EMPLOYEE SET Bonus = Bonus + 50 WHERE Emp# = NEW_EMPLOYEE.Emp#) FOR EACH ROW . . . . . . . . . (TI) AFTER UPDATE OF Bonus ON EMPLOYEE REFERENCING NEW AS NEW_EMPLOYEE WHEN NEW_EMPLOYEE.Bonus > 300 (UPDATE EMPLOYEE SET Bonus = 300 Vol 34 No 3 March 1992

WHERE Emp# = NEW_EMPLOYEE.Emp#) FOR EACH ROW . . . . . . . . . (T2) The keywords ' O L D ' and ' N E W ' refer to the old and new values of the updated tuples, respectively2L The clause ' F O R E A C H R O W ' indicates that the action part of the trigger is executed for each row of the subject table that satisfies the condition part of the trigger. This example illustrates two points. First, the action of one trigger can fire another trigger, whose action may in turn cause other triggers to fire, leading to a cascading activation of triggers. For example, if an update to an employee's SalesAmt field exceeds the old SalesAmt value by $1000, T1 will be fired. T l ' s action will activate T2 to check for a constraint violation on the Bonus field. If a violation occurs, T2's action will be executed, which sets the Bonus field to the m a x i m u m of $300. T2 triggers itself once more before it stops. The second point is that a trigger can be used to enforce simple integrity constraints, as illustrated by the trigger T2. However, more complex integrity constraints are difficult to enforce using triggers. See later for an example.

Example 5 The department manager will be interested to identify immediately those parts whose quantity on hand falls below the reorder point. An alerter can be defined as follows: AFTER UPDATE OF Qty_On_Hand ON PART REFERENCING OLD AS OLD_PART NEW AS NEW_PART WHEN OLD_PART.Qty_On_Hand > = O L D P A R T . Reorder Point AND NEW_PART.Qty_On_Hand < O L D P A R T . Reorder_Point THEN alert-the-manager The 'alert-the-manager' action is implementation-specific. It can be, for example, the sending of a mail to the manager. Some production rules can be considered as instanceoriented rules 3. These rules are applied once for each data item satisfying the condition part of the rule. For example, the rules in Mycin are in this class. Such rules are in fact based on propositional calculus. Authorization information in relational database systems is typically kept in an access rights table or an access control list. For instance, in SQL/DS, the user's access rights are maintained in the SYSTEM.SYSTAB A U T H catalogue table. Owners of database resources can grant or revoke access rights to other users, which will update the access rights table. Authorization rules can augment the existing access rights table by handling access conditions that cannot be handled by the usual access rights table.

Example 6 An authorization rule can state that only a department manager can access his subordinates' data during a speci149

On rules and integrity constraints in database systems

fled time period. This authorization rule can be coded using an SQL G R A N T command. Standard SQL does not permit the qualification of the G R A N T command with specific conditions such as time periods. For the above example, the VAX Rdb/VMS 22 G R A N T command is extended to handle specific time periods. The authorization rule is as follows: GRANT SELECT ON VEMP TO PUBLIC WHERE SYSTIME BETWEEN 0800 AND 1730; The keyword SYSTIME (which is not part of standard SQL, but which is used for exposition) defines the current time. VEMP is a view that defines the tuples that a manager in the above example can access: CREATE VIEW VEMP AS SELECT * FROM EMPLOYEE WHERE Deptno = (SELECT Deptno FROM MANAGES WHERE Mgrno = USER); The keyword USER denotes the person currently logged on. Note that the W H E R E clause in the G R A N T command can be used to specify more conditions, e.g., the terminals/devices that can be used to access the specific VEMP data. This authorization rule can then be invoked during user queries. For instance, whenever a user selects from the view VEMP, the authorization subsystem can invoke the authorization rule to retrieve the correct, authorized data. This can be done by, for example, using a query rewriting technique 23,24that transforms a user's query accordingly: SELECTName,Salary= > FROMVEMP

SELECTName,Salary FROMEMPLOYEE WHEREDeptno= (SELECTDeptno FROMMANAGES WHEREMgrno = USER) and SYSTIMEBETWEEN0800AND 1730;

Note that when a new employee's data are inserted into E M P L O Y E E , the data are automatically accessible by the manager without any explicit update.

Semantics Integrity constraints and authorization rules have a purely declarative semantics. However, both deductive rules and production rules can be viewed from a declarative or procedural perspective. A purely declarative set of deductive rules is best exemplified by a Datalog program. In contrast, a Prolog program has an operational semantics and employs a control strategy that is dependent on the order of the rules and the literals in the body of each rule (see later). Prolog rules are declarative, but there are several system predicates, such as the cut, that makes a Prolog program procedural. The semantics of Datalog and Prolog have been compared ~6. Note that a pure Datalog program does not support updates. Extensions to Datalog have been proposed to incorporate updates j6, e.g., allowing a 150

negative literal in the head of a rule and treating the negative literal in the head as a deletion of facts. In general, to ensure no ambiguity in updates, some operational semantics, such as a meta-rule that determines when a deletion should take place, needs to be defined. This introduces some procedurality to a Datalog program extended with updates. Rule-based expert systems are production systems that have been shown to have the power of a Turing machine, i.e., can be used to model any computable procedure 25. Viewed from this perspective, production rules have a procedural semantics. Note that the consequent part of a production rule can comprise a series of declarative SQL statements that performs some actions. The execution of these actions still requires some form of procedural control.

Negation In deductive databases, there is a problem of expressing negative information and of interpreting rules with negation. For instance, the intuitive definition of a negated literal is to take its complement. Unfortunately, the complement is not well defined and may yield an infinite set 24. Assumptions can be made that will allow negative information to be derived. The open-world assumption 5 requires that all facts, both positive and negative, be represented in the extensional database. Unfortunately, the number of negative facts can be overwhelming. Another possibility is to assume a closed-world assumption (CWAy, under which information that is not declared as an explicit fact and that cannot be inferred from the deductive rules is taken to be false. Unfortunately, the CWA is undecidable. Another less powerful inference rule, the negation-as-failure rule 26, can be used to infer negative information. This rule states that the proof that P is not provable from a particular logic program is always the exhaustive but unsuccessful search for a proof of P. Implementing negation as failure is easy by using S L D - N F resolution ~4, but it may not terminate ~4. Other implementations, e.g., Datalog extended with negation, allow negative clauses to appear in the body of rules. A Datalog program with negation may not have a unique least modeP 6, and therefore may yield ambiguous semantics. One way to circumvent this is to stratify, if possible, a Datalog program with negation so that each stratum has a unique model. Note that not all programs are stratifiable 24. It is possible to include the N O T primitive into the condition and action portions of production rules, just as it is possible to include conjunction and disjunction operators. However, it may not make much sense to specify a NOT in the event specification of an ECA rule. The semantics of NOT(event) is not clearly defined. It is possible to have negative information in integrity constraints specified using IC-formulas. Authorization rules can incorporate simple N O T conditions, e.g., using SQL-NOT conditions 27. Information and Software Technology

T W L I N G A N D P K TEO

Recursion Consider the following recursive deductive rules, which are based on a stored relation P A R E N T ( P a r e n t - N a m e , Child-Name): ANCESTOR (X,Y):- PARENT (X,Y) . . . . . . (R1) ANCESTOR (X,Y):- ANCESTOR (X,Z), PARENT(Z,Y). . . . . . (R2) Here, the A N C E S T O R relation is inferred from the extensional facts given by the P A R E N T relation and computes the transitive closure of the P A R E N T predicate. Deductive rules are more powerful and expressive than relational views. The A N C E S T O R predicate can participate in database queries. Given that A N C E S T O R is defined in the schema as A N C E S T O R ( A n c name,Desc_name), a possible SQL query could be: Select Desc_name From ANCESTOR Where Anc name = 'JIM'; The processing of this query invokes R1 and R2 to retrieve all descendants of ' J I M ' . Recursion is possible in triggers, which can possibly lead to infinite loops. Consider the triggers T1 and T2 given in Example 4. It is possible for a malicious programmer to code the following meaningless production rule (T3): AFTER UPDATE OF Bonus ON EMPLOYEE REFERENCING NEW AS NEW_EMPLOYEE WHEN NEW_EMPLOYEE.Bonus > = 50 (UPDATE EMPLOYEE SET SalesAmt = SalesAmt + 1000 WHERE Emp# = NEW_EMPLOYEE.Emp#) FOR EACH ROW . . . . . (T3) Clearly, when an update of the SalesAmt field of an employee record (with Bonus = 0) results in an increase of $1000 in the SalesAmt field, TI of Example 4 will be triggered. TI will, in turn, trigger T2 (of Example 4) and T3. In the absence of any integrity violation on the Bonus field, only the action portion of T3 will be executed. The net effect is that TI and T3 recursively trigger each other. When the Bonus field eventually exceeds $300, the action portion o f T 2 must be considered. In this case, regardless of the order or choice of execution, T1, T2, and T3 will recursively trigger one another indefinitely. Therefore, there is a danger that triggers may not terminate. Unfortunately, because of the lack of a strong theoretical formalism, there is no generally accepted way to handle this situation. Instead, several proposals have been made, e.g., use of a time-out mechanism, or use of a slatic rule analysis program 3.

Impact on extensional database Integrity constraints checking, authorization rules checking, and the activation of deductive rules do not change Vol 34 No 3 March 1992

the extensional database. Deductive rules infer new facts on the fly. Authorization rules build up the user access profile, also on the fly. The inferred facts and the user access profile are not physically stored in the database and do not extend the extensional database. Integrity constraints do not generate any facts at all and hence do not change the database. Triggers change the extensional database with delete, update, and insert actions. Mycin-type production rules do not change the extensional database, but affect the working memory, which is one part of a production system. Alerters do not change the extensional database and the working memory.

Possibility of conflicts No conflicting answers are possible in retrievals from a deductive database. As no negative information is explicitly stored in a deductive database, only positive information can be a logical consequence of the database. For instance, consider the following trivial rules: good(X):- person(X), helpful(X). bad(X):- person(X), commit_crime(X). Given the following extensional database: person(a). person(b). commit_crime(a). helpful(a). the queries ?good(a) and ?bad(a) both return the answer true. It would seem that there is a conflict in the two answers given. This is not the case, however, because the system has no notion of the semantics of the terms 'good' and 'bad'. Both answers are logical consequences of the above program. Unless an explicit integrity constraint which states that good(X) - - > not(bad(X)) is given, no conflict can arise. It is possible to encounter conflicts with production rules, integrity constraints, and authorization rules. For instance, the following two integrity constraints are in conflict: All employees must earn a salary of at least $500. All employees must earn a salary of at most $400. It is left to an integrity constraint subsystem to ensure that such conflicts do not occur by rejecting the specification of conflicting integrity constraints. Similarly, authorization rules can possibly be specified that return two contrasting access profiles, e.g., one giving access to a resource and the other denying access. Conflicts in production rules can occur in two ways. First, it is possible for the action of one production rule to interfere with that of another production rule. For instance, the action of the following trigger (T4) conflicts with that of T I in Example 4: 151

On rules and integrity constraints in database systems

AFTER UPDATE OF Bonus ON EMPLOYEE REFERENCING NEW AS NEW_EMPLOYEE WHEN NEW EMPLOYEE.Bonus > = 50 (UPI~ATE EMPLOYEE SET Bonus = Bonus - 50 WHERE Emp# = NEW_EMPLOYEE.Emp#) FOR EACH ROW . . . . . (T4) Here, T l ' s action triggers T4, which undoes the updates of T1. Second, when an event happens, several rules may be candidates to fire. These candidate rules form a conflict set. Unless all these rules are fired simultaneously, a strategy must be available to select the rule to fire from the conflict set. This is called conflict resolution. Several approaches to conflict resolution have been proposed, such as selecting rules arbitrarily, assi~.ning priorities to rules, firing the most recently executed rule, or using the matching rule that has the greatest number of conditions. None of these proposals is based on any specific theory that allows the behaviour of these rules to be a~atysed. Under these schemes, it is possible to have'ruies that never fire (e.g., schemes using arbitrary selection or priorities) or rules that can fire indefinitely. An example of the latter kind of rules was described earlier. The firing of the selected rule may produce a database state that can either invalidate the conditions for some of the remaining rules in the conflict set or satisfy the conditions for other rules. The new rules can be added to the conflict set, but it is debatable whether the invalidated rules should be allowed to execute or should be removed from the conflict set. Again, there is no underlying theory to handle this situation.

Order of evaluation/activation of rules Integrity constraints clauses and authorization rules can be evaluated/activated without considering the order of the clauses or rules. However, certain implementations for deductive rules are sensitive to the order of deductive rules, while other implementations are not. This is best exemplified by Datalog and Prolog. The order of Datalog rules has no impact on query evaluation. In contrast, Prolog uses a depth-first search strategy that selects rules top-down and processes the literals in the body of each selected rule from left to right. For instance, if the order of the two rules R 1 and R2 mentioned earlier is changed in a Prolog program, then the processing of the query ANCESTOR(X,Y) will not terminate. The same query will terminate in a Datalog program. Production rules are also sensitive to the order in which they are activated. However, the execution semantics of production rules differs from that of deductive rules. Deductive rules are activated by user queries and always return the same consistent results at different invocations of the same queries. In contrast, when an event happens, a set of production rules may fire in a particular order. When the same event happens again, a possibly different set/order of production rules may be chosen to fire, depending on the conflict resolution strategy used. Different actions and results are obtained, even though the same event occurred. 152

Optimization and control strategy There is no standard evaluation strategy for query processing. Two common approaches are the top-down and bottom-up evaluation strategies. The bottom-up approach is basically a forward-chaining (or data-driven) process that starts from the extensional database and proceeds to generate further facts from the rules and the facts until the answer set fails to grow. Bottom-up query evaluation of a Horn program without function symbols always terminates and returns a set of answers; it also generates a lot of useless results because it does not consider the query. Generally, optimization using methods such as magic sets 28 and counting 28is necessary here. Note that magic sets and counting handle only positive information. The top-down approach is a backward-chaining (goaldriven) process that starts from the query and generates further subqueries that can be recursively processed to enable answers to be obtained from the database. This approach is more efficient because it does not generate unnecessary results, but it produces an answer at a time and may not terminate (as in Prolog). Several algorithms have been proposed based on these two approaches, e.g., naive and semi-naive evaluations (bottom-up) and query-subquery evaluation (topdown) 28. Triggers and alerters can be processed in a forwardchaining manner; backward chaining does not make much sense for triggers and alerters. For Mycin-type production rules, both forward and backward chaining are entirely possible. Optimization of a set of production rules is difficult because of the procedural nature of this set. Production rules communicate with each other by placing data in the database. The entire system behaviour changes when new input is placed in the database. Computation in production rules is data driven and works mainly by side-effects. This makes optimization difficult. The checking of whether a database state obeys an integrity constraint or not is called full integrity constraint checking. This is time-consuming, but can be improved by using incremental integrity constraint checking 9A7,18. Authorization rules are based on queries and therefore employ standard query optimization techniques 24.

Role in database operations Each of these rule concepts has a different role in database operations. Deductive rules can be invoked directly from queries using a query language such as SQL. Authorization rules ensure that users have the authority to access the resource specified in the database operation before the operation can proceed. Integrity constraints are data dependencies that must be true on completion of all database operations. Any operation that violates any constraints is denied. Production rules automatically react to events arising from database operations (e.g., select, update, insert). In turn, these production rules Information and Software Technology

T W L I N G A N D P K TEO

change the database state, which may trigger other production rules, or even the same production rules, causing a cascading activation of rules, which may not terminate.

Further examples This subsection presents three further examples.

Example 7: deductive rules and production rules In general, it is difficult to simulate deductive rule behaviour by using production rules. For instance, consider the following three production rules, which attempt to simulate the deductive A N C E S T O R rules given earlier: On Insert to PARENT (X, Y) Insert ANCESTOR (X, Y). On Insert to PARENT (Z, Y) If ANCESTOR (X, Z) is in the database Insert ANCESTOR (X, Y). On Insert to ANCESTOR (X, Z) If PARENT (Z, Y) is in the database Insert ANCESTOR (X, Y). Unfortunately, there are several undesirable side-effects of this simulation. First, because the A N C E S T O R predicate is materialized, some storage is used up. Second, updating anomalies can occur. For instance, consider an extensional database comprising the following two facts: PARENT (mary, lucy). ANCESTOR (mary, lucy). When a new fact P A R E N T (lucy, john) is inserted into the extensional database, the first two of the above three production rules are triggered. These two rules form a conflict set. In some systems, a conflict resolution technique is used to select one of the two rules to fire 3. Clearly, this is not correct because both rules must fire to produce the following new A N C E S T O R facts: ANCESTOR(lucy, john). ANCESTOR(mary, john). If the fact P A R E N T (lucy, john) is subsequently deleted, the facts A N C E S T O R (lucy, john) and A N C E S T O R (mary, john) still remain in the database. This is an anomaly. To restore consistency, some compensating actions need to be taken, e.g., by deleting A N C E S T O R (lucy, john) and A N C E S T O R (mary, john). Note, however, that deleting a fact P A R E N T (x,y) does not necessarily imply that the corresponding fact A N C E S T O R (x,y) should also be deleted automatically. This is because A N C E S T O R (x,y) could potentially have been generated through other means. For instance, if there exist two facts, P A R E N T (z,y) and A N C E S T O R (x,z), in the database, then A N C E S T O R (x,y) cannot be deleted. Note that it is even more difficult to simulate non-Horn deductive rules using production rules. This is because such simulation must consider both insertion and deletion of facts. For example, consider the use of producVol 34 No 3 March 1992

tion rules to simulate the following three deductive rules, the first of which is non-Horn: SINGLE (X) :- NOT MARRIED (X). MARRIED (X) :- REGISTERED (X, Y). MARRIED (X) :- REGISTERED (Y, X). Assuming the presence of facts S I N G L E ( a ) and SINGLE(b), an insertion of a fact R E G I S T E R E D ( a , b ) must trigger a production rule that inserts facts M A R RIED(a) and M A R R I E D ( b ) . In response to the above insertions, another production rule is needed to delete facts S I N G L E ( a ) and SINGLE(b). This is to ensure database consistency. Clearly, it can be difficult to prove the correctness of a simulation of more complex nonHorn deductive rules using production rules. Consider the use of production rules to materialize the H I G H SAL_EMP view of Example 1. This materialization produces data redundancy, which, if not controlled, can lead to updating anomalies. Furthermore, each update to the salary field of a tuple of the E M P L O Y E E predicate (the base table from which H I G H _ S A L _ E M P is obtained) may affect the materialized H I G H SAL_ EMP view, according to the following matrix:

Updates the salaryto an amountlessthan $30000

Updatesthe salaryto an amount > = $30000

Employeeearnsless than $30 000

No impacton HIGH SAL_EMP

Insert Employeetuple into HIGH SAL EMP

Employeeearns > = $30 000

DeleteEmployeetuple fromHIGH_SAL EMP

No impacton HIGH SAL_EMP

Update action Status beforeupdate

Two production rules must be coded to handle the situations that have an impact on the H I G H SAL E M P view. The use of production rules to simulate more complex view definitions may be even more troublesome. The examples in this section show that the materialization of views and the simulation of deductive rule behaviour by using production rules should be avoided.

Example 8." deductive rules and integrity constraints Given the following two deductive rules D l: PARENT(X,Y) :- MOTHER(X,Y). PARENT(X,Y) :- FATHER(X,Y).

. . . . . (D1)

the addition of an extensional fact M O T H E R ( b , d ) will generate an inferred fact PARENT(b,d). However, consider the case when the two deductive rules are replaced by the integrity constraints: MOTHER(X,Y) - > PARENT(X,Y) . . . . . IC3 FATHER(X,Y) - > PARENT(X,Y) . . . . . 1C4 As no deductive rules are available to generate tuples for 153

On rules and integrity constraints in database systems Table 2. Six triggers are needed to enforce constraint

Database operation

Relation on

Constraint checking needed

Insert

Part Supplier Supplies Part Supplier Supplies Part Supplier Supplies

Yes, if part is red Yes No Yes, if part is blue or green No Yes Yes, if colour is updated No Yes, if quantity is updated and only if new quantity is less than 100

Delete Update

the Parent predicate, consider the following extensional database: MOTHER(a,b) PARENT(a,b) This extensional database satisfies the integrity constraints. However, the addition of the new fact M O T H ER(b,d) will violate the integrity constraint IC3. The operation to insert the new fact M O T H E R ( b , d ) should either be done t o g e t h e r with an insertion of PARENT(b,d) or be denied. Example 9." integrity constraints and production rules Production rules have often been cited as a useful mechanism to enforce integrity constraints 2.3,z°. However, there exist integrity constraints that are difficult to enforce using the current implementation of production rules. For instance, consider this constraint:

Suppliers who do not supply all red parts must supply a blue part or a green part with quantity greater than 100. To enforce this constraint, more than one trigger has to be implemented, as shown in Table 2. For example, when a supplier is inserted, the following trigger could be activated (because of its complexity, pseudocode is used to represent this trigger): On insert to SUPPLIER IF SUPPLIER does not supply all the red parts and (supplier does not supply any blue part with qty > 100 or supplier does not supply any green part with qty > 100) THEN ROLLBACK This is not a trivial trigger to code. For the above constraint, several such complicated triggers need to be coded. Even if such triggers can be coded, it cannot be verified that these triggers are coded correctly and do what they are supposed to do. Coding triggers is essentially a p r o g r a m m e r responsibility as opposed to, say, a transaction, which is a DBMS responsibility. As such, a trigger can be improperly coded, or not coded at all. The assumption that a trigger has been coded, when it has not, can lead to undesirable consequences. For example, a trigger to delete all dependents of an employee who has 154

just been fired may not have been coded, but a programmer who assumes that such a trigger exists will be left with an unexpected database state. On the other hand, properly coded production rules can simplify programming. For instance, when an employee is deleted, his dependents can be automatically deleted without explicit programming. Table 2 shows that not all database operations require a check on the integrity constraint. Full integrity constraint checking is time-consuming, and therefore several methods for incremental constraint checking have been proposed W8,29. These methods use different ways of representing integrity constraints. Nicholas 29 used closed well formed formulas to express integrity constraints in relational databases. His method treats some update operations as a deletion operation followed by an insertion operation. Therefore, there may be some unnecessary incremental checking. Henschen et al. ~° represent integrity constraints by first-order formulas. The method of incremental checking is to assert a constraint for the current database state, express the new database state in terms of both the old database state and the general form of the update, deny that the constraint holds in the new database state, and attempt to obtain a contradiction. However, the method is not complete, i.e., failure to pass all tests does not imply the violation of some integrity constraints. Chakravarthy et al.~8 use clauses to represent integrity constraints. The form of clauses cannot easily be used to represent integrity constraints that involve existential quantifiers, negative conditions, and/or disjunctions. Note that it is difficult to implement tuple calculus and closed first-order formulas in Prolog systems 9. A special form of closed first-order formula, called ICformula, has been proposed to represent integrity constraints in nonrecursive, range-restricted deductive databases 9. IC-formulas are more expressive than other methods of representing integrity constraints. The full and incremental checking of the IC-formulas of deductive databases can be easily implemented in most Prolog systems 9. However, further research is required to apply IC-formulas to recursive databases, as well as to handle update operations in transactions. Note that integrity constraint checking involving transactions is a difficult problem. Within a transaction, some operations may temporarily violate some constraints, but when the transaction commits, all constraints must hold. RULES IN OBJECT-ORIENTED DATABASE SYSTEMS

Orion 3° does not support deductive rules. Iris 3~ supports conjunctive and nonrecursive rules. This is restrictive. The first Postgres rules system 2 uses a single syntax to support production rules, authorization rules, and inference rules (views), thus blurring the distinction between these concepts. It is also not clear by looking at the Postgres rule syntax when a rule will fire. Other problems with this implementation, such as no support for view processing, and problems in controlling rule activation, Information and Software Technology

T W LING AND P K TEO

have been mentioned elsewhere 2t. A second Postgres rules system, primarily a production rules system, was proposed to eliminate the above restrictions 2~. In this system, however, some rules may never fire (those with least priorities) and some rules may loop indefinitely (see example given earlier). 0232 provides production rules as objects for manipulation 33. There is a strong interest to incorporate production rules into OODBMSs. It has been proposed that the SQL view mechanism can be enhanced with the expressive power of deductive rules 2. Consider the following view definition 2 for the recursive A N C E S T O R relation defined earlier: range of P is PARENT range of A is ANCESTOR define view ANCESTOR (P.all) define view* ANCESTOR (A.Parent-Name, P.ChildName) where A.Child-Name = P.Parent-Name The * is a transitive closure operator that iteratively retrieves from the P A R E N T relation until the answer set fails to grow. This is equivalent to a least fixpoint computation. Extension to the query optimizer must be made to support this mechanism. Note that there are two A N C E S T O R view definitions. Current DBMSs do not support multiple views with the same name, but this can be implemented. Note that the above example does not explicitly declare the column names of the A N C E S T O R predicate. It implicitly assumes that the A N C E S T O R predicate has the same column name as the P A R E N T predicate. This is semantically incorrect, as the following example further shows. Consider the S I B L I N G rule, defined as: SIBLING (YI, Y2) :- PARENT(X, YI), PARENT(X, Y2). Adhering to the convention above, the S I B L I N G predicate will have two columns with the same name (ChildName). The use of authorization rules in O O D B M S s has been relatively unexplored, although some work has been done in defining access controls in O O D B M S s 34-35. Dittrich e t al. 34 applied authorization information to the composite parts of complex objects, but the semantics for inheriting access controls through the inheritance hierarchy was not defined. Neither Dittrich e t al. 34 nor Fernandez e t al. 35 considered the behaviour (methods) of objects in their respective models. Except for those O O D B M S s that are based on the extended relational model (e.g., Postgres), most O O D B M S s do not provide definitions for views and external schemas. Views provide a mechanism for protecting information 27. Many OODBMSs, such as Orion, 02, and Iris, enforce referential integrity through the use of unique object identifiers. Others, such as Postgres, use triggers to enforce simple integrity constraints. It is difficult to code and verify the correctness of triggers used for enforcing complex integrity constraints. An execution model has been proposed ~,-'° that specifies how production rules are processed in the context of database transactions. In this model, each rule is an Vol 34 No 3 March 1992

object that can have coupling modes as attributes. These modes decide whether the execution of the rule (if triggered) is immediate or can be deferred till the end of the triggering transaction or even spawned as a separate transaction. However, the determination of the best coupling mode for each rule is manual and can be difficult. Triggers can also be defined to maintain indexes in OODBMSs. For instance, if an attribute of an object class is indexed, then any update to the attribute value can set off a trigger that will update the index 36. STRENGTHS, WEAKNESSES, SUITABLE DOMAINS

AND

This section discusses the advantages and disadvantages of each of the rule concepts and suggests suitable domains in which to apply these concepts. Deductive rules provide several advantages. • Deductive facts are generated on the fly in response to a query and are not stored in the extensional database. Therefore, storage is saved. • Derived data do not introduce any redundancy and therefore there will not be any updating anomalies. • Deductive rules are more powerful and expressive than relational views. They can be recursively defined and can compute the transitive closure of a relation. However, the advantages must be considered against the disadvantages of using deductive rules. In a query using deductive rules, time must be spent computing these rules. The top-down evaluation of a deductive query, using for example S L D - N F (as in Prolog), may not terminate. Unsound implementation, e.g., Prolog, may return the wrong results to queries. Note that the bottom-up query evaluation of a Horn program without function symbols always terminates. Support for negation and recursion can be problematic. Several recent implementations, e.g., L D L 37, provide support for both negation and recursion. L D L extends Datalog with negation, updates, and support for complex terms. It handles negation and recursion by stratifying 24 programs. The following are advantages of production rules: • Triggers simplify programming. For example, to fire an employee, the application just needs to delete the employee. Other employee-related details, e.g., dependents, can be deleted by triggers, if necessary. • Mycin-type production rules are an easy and relatively natural way to express knowledge. The rules are both accessible and easy to modify. • Production rules are powerful enough to model any computable procedure. The following are disadvantages of production rules: • Production rules can fire indefinitely and may go into a loop. They are difficult to program and control.

155

On rules and integrity constraints in database systems

• Conflicts are possible in production rules, both during the evaluation of the rules and when they fire. This means that a conflict resolution strategy must be employed. • The correctness criteria for production rules are not clearly defined, especially when used with transactions. • Triggers change the database states and may produce updating anomalies. • The enforcement of a complex integrity constraint using triggers may require the coding of many such rules. Unless these triggers are automatically generated, there is no guarantee that the coded triggers are correct. • The order of execution of triggers is important and may produce different results on different invocations as a result of a different order of invocation of the triggers.

The main advantage of integrity constraints is that their enforcement ensures the consistency of data in the database. Their major disadvantage is the computational overhead involved in their enforcement. This overhead varys according to the method that is used for integrity constraint checking. The main advantage of authorization rules is that they can handle special access rules that cannot be handled by the existing authorization schemes in DBMSs (elg., access rights table of SQL/DS, access control lists in Vax/VMS). Their main disadvantage is the time involved in building up the access profile of the user. Deductive rules have been applied in domains such as theorem proving, knowledge representation, and nonmonotonic reasoning. They find applications in disparate areas such as computer-aided design, scientific databases, and query support for databases 38. Production rules are suitable for domains in which there are many independent states, and where knowledge can be declaratively expressed '9. This allows multiple, nontrivial production rules to be written. Production rules are also suitable in domains where the computational process consists of a set of independent actions. Such a process requires only limited communication between actions, similar to the execution semantics for production rules. Integrity constraints can be applied to database schema design. A few commercial database systems provide support for simple integrity constraint checking, such as key constraints and referential constraints. Authorization rules can augment the existing authorization schemes in DBMSs.

CONCLUSION A framework was established that pointed out the differences among deductive rules, production rules, authorization rules, and integrity constraints. The advantages and disadvantages of each of these concepts was also pointed out and suitable domains suggested where each of these rules can be used. These rule concepts can be integrated usefully in any database system. For instance, a user query can be checked first by an authorization subsystem for possible access violation. If the query is an authorized access, the deductive rules within the query, if any, are parsed and incorporated into the query by the deductive rule subsystem. Then the integrity constraint subsystem checks the final query for any possible integrity constraint violation. If no violation is detected, the query is passed to the query manager for execution. This execution may activate some triggers. When triggers finish, the query commits. This integration needs to be further developed, especially in the context of object-oriented database systems.

REFERENCES 1 Chakravarthy, S 'Rule management and evaluation: an active DBMS perspective' SIGMOD Record Vol 18 No 3

(September 1989) pp 20-28 2 Stonebraker, M, Hanson, E and Potamianos, S 'A rule manager for relational database systems' in The Postgres papers, Memo No UCB/ERL M86/85 (revised) University

of California, Berkeley, CA, USA (June 1987) 3 Widom, J and Finkelstein, S 'A syntax and semantics for

4 5 6 7 8

1989 A C M SIGMOD Int. Conf. Management of Data

9 10

11 12 SUMMARY DIFFERENT

OF COMPARISON RULE CONCEPTS

OF

Table 3 gives a summary of the comparison of deductive rules, production rules, authorization rules, and integrity constraints. 156

set-oriented production rules in relational database systems' A C M SIGMOD Record Vol 18 No 3 (September 1989) pp 36-45 Committee for Advanced DBMS Function 'Third-generation database system manifesto' Memo No UCB/ERL M90/28 University of California, Berkeley, CA, USA (April 1990) Reiter, R 'On closed world data bases' in Gallaire, H and Minker, J (eds) Logic and databases Plenum Press (1978) Shortliffe, E H Computer-based medical consultations." Mycin Elsevier (1976) Barker, V and O'Connor, D 'Expert systems for configuration at Digital: XCON and beyond' Commun. A C M Vol 32 No 3 (March 1989) pp 298-318 Agrawal, R and Gehani, N H 'ODE (Object Database and Environment): the language and the data model' in Proc.

13 14

(1989) pp 3&45 Ling, T W 'Integrity constraint checking in deductive databases using the prolog not-predicate' Data Knowl. Eng. Vol 2 (1987) pp 145-168 Henschen, L, McCune, W and Nagui, S 'Compiling constraint checking programs from first order formulas' in Gallaire, H et aL (eds) Advances in database theory, Vol 2 Plenum Press (1984) pp 145-169 Kobayashi, ! 'Validating database updates' Inf. Syst. Vol 9 No 1 (1984)pp 1-17 ,, Lloyd, J, Sonenberg, E A and Topor, R W 'Integrity constraint checking in stratified databases' TR 86/5 Department of Computer Science, University of Melbourne, Australia (1986) Thuraisingham, M B 'Mandatory security in object-oriented database systems' in Proc. OOPSLA '89 (October 1989) Lloyd, J W Foundations of. logic programming SpringerVerlag (1984) Information and Software Technology

T W LING AND P K TEO

Table 3. Summary of comparison of different rule concepts Feature

Deductive rule

Production rule

Integrity constraint

Authorization rule

Adds to extensional database

No

Mycin-type rule and alerters--no Triggers--yes

No

No

Possible conflicts

No

Yes

Yes

Yes

Is order important in activating/evaluating rules?

Depends on implementation. For example, in Prolog, order is important; in Datalog, order is not important. Deductive rules always return same results through different invocations

Yes; it is possible to get different results with different invocations or different order of invocation of production rules

No

No

Possible loop

Depending on algorithm used to process rules, loop is possible

Yes, depending on how production rules are coded

Yes, if integrity constraint checking invokes some rule that loops

No

Representation (syntax)

Horn clause

IF X THEN Y format (Mycin-type rule); ECA triple or its variant

Tuple calculus, Horn clause, IC-formula, etc.

Access-rights tables; declarative SQL statements

Role in database operations

Can be invoked directly from within query; used to generate facts to answer queries

Triggered as result of database state changes that satisfy certain conditions

Semantic condition that must be true on completion of all database operations. Database operation that leads to violation of integrity constraint is denied

Ensures that users have authority to access database resource before operation is allowed to proceed

Underlying theory

Logic

No fundamental theory

Logic

Proposal is based on query rewriting

Semantics

Contrast Prolog's operational semantics with Datalog's declarative semantics

Mycin-type rules have declarative semantics. Execution of triggers have procedural semantics; however, condition and action statements in triggers can be declarative

Declarative

Declarative

Control strategy

Backward and/or forward chaining

Triggers and alerters primarily forward chaining; Mycin-type rules can be both forward and backward chained

Not applicable

Execution of queries to build user profile

Optimization

Several algorithms, e.g., magic sets, Alexander method, etc., have been proposed to optimize logic programs

Hard to optimize because of imperative nature

Incremental integrity constraint checking

Query optimization techniques

Support for negation

Yes

Possible in condition and action portion of ECA rule, but a not (E) does not make sense

Yes

SQL-not

Recursion

Yes

Yes

No

No

15 Clocksin, W and Mellish, C Programming #1 Prolog (2nd ed) Springer-Verlag (I 984) 16 Ceri, S, Gottlob, G and Tanea, L ~What you always wanted to know about Datalog (and never dared to ask)' IEEE Trans. Knowl. Data Eng. Vol 1 No 1 (March 1989) pp 146166 17 Asirelli, P, Santis, M D and Martelli, M 'Integrity constraints in logic databases' J. Logic Prog. Vol 3 (1985) pp 221 232 Vol 34 No 3 March 1992

18 Chakravarthy, U, Minker, J and Grant, J 'Semantic query optimization: additional constraints and control strategies' in Proc. 1st Int. Conf. Expert Database Systems (1986) 19 Davis, R and King, J 'An overview of production systems' in Eleock, E and Mitchie, D (eds) Machine intelligence 8 EllisHorwood (1977) pp 300-332 20 McCarthy, D R et al. 'The architecture of an active data base management system' in Proc. A C M SIGMOD (May 1989) 157

On rules and integrity constraints in database systems 21 Stonebraker, M e t al. 'The implementation of Postgres' IEEE Trans. Knowl. Data Eng. Vol 2 No 1 (March 1990) 22 Vax Rdb/VMS SQL reference manual (December 1989) 23 Stonebraker, M 'Implementation of integrity constraints and views by query modification' in Proc. A C M SIGMOD Int. Conf. Management of Data (1975) 24 Uliman, J Principles of database and knowledge based systems, Vol I Computer Science Press (1988) 25 Rich, E Artificial intelligence McGraw-Hill (1983) 26 Clark, K 'Negation as failure' in Gallaire, H and Minker, J (eds) Logic and databases Plenum Press (1978) pp 293-322 27 Date, C An introduction to database systems, Vol 1 (4th ed) Addison-Wesley (1986) 28 Bancilhon, F and Ramakrlshnan, R 'An amateur's introduction to recursive query processing' in Proc. A C M SIGMOD Int. Conf. Management of Data (1986)pp 16-52 29 Nicholas, J 'Logic for improving integrity checking in relational data bases' Acta Inf. Vol 18 (1982) pp 227-253 30 Banerjee, J et aL 'Data model issues for object-oriented applications' A C M Trans. Office Inf. Syst. Vol 5 No 1 (January 1987) pp 3-26 31 Fishman, D et al. 'Iris: an object oriented database management system' A C M Trans. Office Inf. Syst. Vol 5 No 1

158

(January 1987) pp 48~9 32 Deux, O e t al. 'The story of 02' IEEE Trans. Knowl. Data Eng. Vol 2 No 1 (March 1990) pp 91-108 33 Medeiros, C and Pfeffer, P 'A mechanism for managing rules in an object-oriented database' Research report Altair (1990) 34 Dittrich, K, Hartig, M and Pfefferle, H 'Discretionary access control in structurally object-oriented database systems' in Landwehr, C E (ed) Database security II." status and prospects North-Holland (1989) 35 Fernandez, E, Gudes, E and Song, H 'A security model for object-oriented databases' in Proc. IEEE Computer Society Syrup. Security and Privacy Oakland, CA, USA (I-3 May 1989) 36 Zdonik, S and Maier, D 'Fundamentals of object-oriented databases' in Readings in object-oriented database systems Morgan Kaufmann (1990) pp 1-32 37 Tsur, S and Zaniolo, C 'LDL: a logic based data language' in Proc, Int. Conf. Very Large Data Bases Kyoto, Japan (August 1986) 38 Ullman, 3 'Deductive databases: achievements and future directions' A C M SIGMOD Record Vol 19 No 4 (December 1990) pp 75-82

Information and Software Technology