Information Systems Vol. 18, No. 6, pp. 391-404, 1993
Printed in Great Britain. All rights reserved
0306-4379/93 $6.00 + 0.00
Copyright © 1993 Pergamon Press Ltd

FLEXIBLE CONSISTENCY MODES FOR ACTIVE DATABASES APPLICATIONS

OPHER ETZION
Technion-Israel Institute of Technology, Faculty of Industrial and Management Engineering, Haifa, 32000, Israel
(Received 10 February 1992; in final revised form 25 February 1993)

Abstract-The concept of transaction is a basic one in database theory. Yet, in some cases the atomicity of a transaction may be relaxed. This happens either to avoid the damage inflicted by the fact that many data elements are locked in long transactions, or to support different viewpoints of the database that may relax atomicity constraints. This paper defines the concept of transaction in the environment of PARDES, an intelligent active database, and outlines a model in which different modes of consistency co-exist as local decisions. These modes are defined and analyzed. Among the basic modes are: fully consistent (consistent at all times), quasi consistent (eventually consistent) and loosely consistent (failures in derivations are not compensated). This is an extension to the active database definition language that enables the support of implementation considerations using high-level abstractions.

Key words: transaction, completeness, interoperability, active databases
1. BACKGROUND
1.1. Prologue
Classical database theoreticians may find the idea of relaxing the atomicity of transactions to be a faulty one. Practical needs, however, sometimes indicate otherwise, especially in the context of active databases. This is one of the cases in which a cozy theory has to be adjusted to cope with practitioners’ requirements. This section revisits the transaction concept, discusses our frame of reference (the PARDES active database model), surveys related work and motivates this research.

1.2. The major players

1.2.1. Transactions. The concept of atomic transaction is one of the fundamental concepts in database management systems. A transaction [1] is a set of database transitions (i.e. actions that modify a database, or create a new database state) that has the following desired properties:
Atomicity: Either all the transitions terminate successfully (commit) or the database retreats to the state before the transaction began (rollback).
Soundness: If a transaction operates upon a consistent database state, it results in a consistent database state.
Durability: If a transaction commits, it is guaranteed that the database modifications inflicted by its transitions will not be lost.

1.2.2. Active databases. An active database [2] is a database that contains a mechanism to detect events (such as a certain update operation) and to trigger actions as a result of this detection. This works in the following fashion:

1. A database operation is identified as an event.
2. For each such event, rules may be attached. Each rule is an E-C-A (event-condition-action) rule.
3. For each such rule invoked by the event, the condition is evaluated. The condition is a query in the database language, which returns to the control mechanism an indication whether this condition is satisfied or not.
4. If the condition is satisfied, then an action is fired. Typically, the action is a program that is invoked by the control mechanism. The action’s logic is not controlled by the “active” model, and is left to the system designer’s discretion. Any such action may generate database events that invoke the “active” mechanism again.

1.2.3. The PARDES model. The PARDES model [3] is an intelligent active database, in which the active mechanism is defined in the form of dependencies among database entities. These dependencies are expressed in the form of invariants. An example of such an invariant is:

Balance := sum(Credits) - sum(Debits).
This is a higher-level language than that of E-C-A rules. In an E-C-A rule language one has to designate all the events in which a credit or a debit may occur (such as: Deposit Cash, Withdraw from automatic teller machine, Getting Interest). The language used in the model is an abstract one, and there is an intelligent mechanism that converts it to an executable form [3]. The equivalent of E-C-A is inferred. The major benefits of this approach are:

1. The logic of dependencies is also part of the active model, and not only the control mechanism. Thus, we can capture in a high-level language a larger portion of the application.
2. Furthermore, inclusion of the logic of dependencies enables avoidance of semantic or pragmatic fallacies, such as: contradicting actions (two actions assigning conflicting values that are fired by the same event), and update redundancy (the system performs, in the worst case, an exponential number of redundant update operations, because the control mechanism cannot anticipate update operations that will be triggered by fired actions [4]).
3. The soundness property of a transaction can be provably guaranteed by the system, and is not left to the system designer’s discretion.

1.3. The honey and sting of atomic transactions

There are obvious benefits of keeping any transaction as an atomic entity. Yet, there are cases in which atomicity might be too restrictive. Some of the cases are:

Long-lived transactions: These are transactions that contain either many transitions or heavy ones [5]. Consequently, the total execution time is long. The atomicity property requires locking all the data-elements that have been modified by any transition in the transaction, until the transaction completes. This may damage the usability of the system.
Distributed transactions: In an MDBS (multiple database system) context [6] atomicity may not be practical, either because of technical constraints (unguaranteed serialization) or because the distribution overhead leads to the same effect as long-lived transactions.

Restricted consistency transactions: In some cases, some of the transitions may be classified as major, and others as minor. In this case, we may not want to undo a major transition because of a failure of a minor one. Also, there might be cases where we want to allow different viewpoints to coexist. In the PARDES model, this might be translated to a sequence of derived transitions, some of which are not executed.

1.4. Related work
Related work has been performed in several contexts: in the context of multidatabase systems, in the context of long transactions within a single or multiple database, and in the context of active databases.

1.4.1. Multidatabase system. In the context of a multidatabase system, the issue of interdependent data requires some state of consistency. A substantial amount of work has been done on replica control [7]. The primary copy approach [8] designates one copy as a primary database and other copies as secondary databases. The primary copy assumes the responsibility of maintaining consistency. In this approach an update of the primary database is not translated into a single atomic transaction, but into a series of transactions for the different databases. Replica control algorithms work in a predetermined fixed consistency mode. They usually deal only with a restricted type of derived data.

More recent work in this area is the notion of quasi copy [9]. Quasi copy models the relationship between the primary copy and a secondary one. In this context the concept of coherency conditions has been introduced. These include: delay condition, which signifies how much time a secondary copy can lag behind the primary one, and version condition, which signifies how many versions a secondary copy can lag behind the primary one. This policy may lead to consistency problems in a secondary copy that do not exist in the primary one. These problems occur in situations where some, but not all, of the objects may be replicated without violating the coherency conditions. Enforcement of such constraints may be requested by the users, and adds more constraints to the replica protocol. Temporal constraints have also been used in the contexts of materialized views [10] and of federated databases [11]. The idea is similar to that of coherency conditions.

A more flexible approach to mutual consistency has been developed recently [12]. This is the Eventual Consistency approach, based on the premise that the actual replication may not be necessary, and different data-elements may have different degrees of consistency, specified by a set of constraints about the mutual consistency. The polytransactions approach [13] defines inter-database dependencies as Data Description Dependencies (D³), each of which is a 5-tuple (S, U, P, C, A) where:

S is the set of source data objects.
U is the set of target data objects.
P is a predicate that specifies the relationship between the source and target.
C is a predicate that specifies consistency requirements in terms of temporal relations.
A is the action component, which contains a set of procedures to restore consistency.
A Polytransaction is the transitive closure of a transaction to satisfy the dependencies as specified by D³. A Polytransaction is a design tool, modelling the behavior of inter-database dependencies. It is not an active database mechanism that enforces these constraints. A system that is modelled by the polytransaction model should behave in the following way:

For each member s of S that is being modified DO:
    Find a set of actions in A that are aimed to maintain the relationship in P between s and the relevant members of U
    Schedule the execution of each relevant action in a way that satisfies the relevant consistency predicates in C
End For
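The loop above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from [13]: the class name `D3`, the reduction of the consistency predicate C to a single delay bound `max_delay`, the `schedule` helper, and the primary/replica data.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set, Tuple

State = Dict[str, float]

@dataclass
class D3:
    """Hypothetical encoding of one (S, U, P, C, A) tuple."""
    sources: Set[str]                  # S: source data objects
    targets: Set[str]                  # U: target data objects
    predicate: Callable[[State], bool] # P: relationship between source and target
    max_delay: int                     # C: simplified here to a delay bound (assumption)
    action: Callable[[State], None]    # A: procedure that restores consistency

def schedule(deps: List[D3], modified: Set[str], now: int) -> List[Tuple[int, D3]]:
    """For every dependency with a modified source, schedule its action
    no later than now + max_delay; earliest deadline first."""
    agenda = [(now + d.max_delay, d) for d in deps if d.sources & modified]
    return sorted(agenda, key=lambda entry: entry[0])

db: State = {"primary": 100.0, "replica": 100.0}
dep = D3(
    sources={"primary"},
    targets={"replica"},
    predicate=lambda s: s["primary"] == s["replica"],
    max_delay=5,
    action=lambda s: s.update(replica=s["primary"]),
)

db["primary"] = 130.0                      # a source object is modified
agenda = schedule([dep], {"primary"}, now=0)
for deadline, d in agenda:                 # run the scheduled actions in order
    d.action(db)
```

After the scheduled action runs, the predicate P holds again. A real scheduler would interleave these actions with other work while respecting the deadlines; the sketch executes them immediately for brevity.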
The consistency requirements in C are of the same type as the coherency conditions that were discussed earlier.

1.4.2. Long transactions. Long transactions are transactions which contain many or long update operations. In these cases, it may not be desirable (or even possible) to lock all the participating data elements until the end of the transaction is reached. In such models, the transaction is partitioned into sub-transactions that may commit separately. If any of the sub-transactions fails, then the system should initiate compensations that emulate the rollback mechanism. The saga model [5] follows this approach for sequential partitioning of transactions. The migrating transaction model [14] supports it for concurrent sub-transactions. The ATM model [15] extends this approach to deal with nested transactions, i.e. sub-transactions specified as a tree of transactions. The model permits the user to define both compensation modules and exception-handlers with a script-oriented control mechanism.

1.4.3. Active databases. The HiPAC model [2] is an Event-Condition-Action model. As in any database system, a transaction is a sequence of update operations specified by the user, but because of the active environment, each update operation is an event that might trigger additional update operations. The execution mode is determined by the coupling between the event and the action. A coupling is either immediate, deferred or detached. The first two possibilities treat the process as an atomic transaction, where the distinction is in the order of the execution of the update operations (immediate stands for a depth-first protocol, while deferred stands for a breadth-first one). The detached coupling scheme creates a different transaction for the action. The original transaction and the derived transaction in this case are detached, and compensation for failure in the derived transaction, if done, is left to the system designer’s discretion.
In many of these models the consistency of dependent data is periodically coordinated, according to some criteria. We propose to add more flexibility in the definition of the consistency requirements. This is possible in the PARDES model, due to the fact that the ACTION part is embedded in it.
Our motivation is to construct a model that will:

1. enable the system designer to define the consistency requirements for each database dependency using high-level abstractions. This is achieved by using consistency modes and applying them for each derivation or constraint. Some consistency modes are: Fully Consistent (the dependency invariant should be kept at all times), Quasi Consistent (the original transaction appears to commit but a compensation may be issued), Loosely Consistent (consistency may be relaxed).
2. integrate these abstractions within the execution model of an intelligent active database, so that these definitions will be directly executable, rather than yield only a descriptive model of the application.

The rest of the paper is structured in the following way: Section 2 discusses the transaction model of PARDES, which serves as the kernel of the proposed model. Section 3 defines and demonstrates the consistency modes. Section 4 concludes the paper and discusses implementation issues.

2. THE PARDES TRANSACTION MODEL
2.1. The concept of transaction is revisited

The concept of transaction in the context of the PARDES model shares the desired properties that have been mentioned for the general concept of transaction. However, a few outstanding properties exist in this environment:

1. Any transaction is a nested transaction. There are two types of transaction connections:
A bridged connection: Transitions $tr_i$, $tr_j$ are said to be bridge-connected if they are both direct transitions (created by the user), and they are explicitly defined by the system designer as members of the same transaction.
A triggered connection: Transitions $tr_i$, $tr_j$ are said to be trigger-connected if either $tr_i$ is a direct transition and $tr_j$ has been triggered (directly or indirectly) by the active mechanism as a result of $tr_i$, or there exists a direct transition $tr_k$ such that $tr_i$, $tr_j$ are both triggered from it.
2. One of the desired properties of a transaction system is the soundness property: given a consistent state, any transaction that operates on it will result in a consistent state. In a regular database system the meaning of consistent is hard-coded within the user-defined transitions; namely, the model assumes that the user-written programs are correct and take care of the consistency issues, but this is not guaranteed by the model itself, since the model cannot capture the semantics of consistency. In the PARDES framework, the consistency constraints are explicitly defined, and translated by the system into internal transitions. Thus, the soundness of the system is kept by the system, and not by user-defined programs.

2.2. Database definition
1. A database D is a history of database states $A_0, \ldots, A_n$, where $A_0$ is the database kernel, which includes only meta-information entities, and $A_n$ is the current database state.
2. A database state A is a tuple (STS, PDISET, M, EDG, CMS) where:

• STS is the static schema. An element in the static schema is called IU (Information Unit), and designates a property of some object.
• PDISET is the set of derived information definitions. Each element in this set is called PDI (Persistent Derived Information). It refers to an IU (element of the schema) that includes the derived value, and to an invariant that designates a relationship which is required to be maintained by all the IU’s instances.
• M is the set of messages. The concept of message is explained below.
• EDG is a derivation graph that denotes derivation relationships among information units (dependency graph). EDG is inferred by the system as a function of the STS and PDISET definitions. It is defined and discussed in Section 2.4.
• CMS (Consistency Mode Set) is a set of consistency requirement relationships among elements in the dependency graph. It is discussed in Section 4.1.
This definition bears some similarity to the polytransaction approach, mentioned earlier. However, the action component (A) in the polytransaction model assumes that there are user-defined procedures that take care of the consistency, while the PARDES model contains an executable specification of the dependencies. Consistency modes are specified in the PARDES model as part of the database definition. Temporal and other constraints are discussed in Section 3.6.

3. Each database state contains as one of its entities a set of messages: $p_1, \ldots, p_k$. Each message is a possible update operation (such as: Update Employee, Update Salary, ...) that can be activated by the user or by the system. A message that is activated by the user is called a direct message. A message that is triggered by the system is called a derived message.
4. A database transition tr is an instance of a message defined for the database state.
5. An operation of a transition tr upon a database state $A_i$ creates a new database state $A_{i+1}$. The semantics of this change is embedded in the message definition.

2.3. Transaction definition

The concept of transaction in our framework is defined in the following way:
1. A transaction $\Theta$ is a sequence of Basic Transactions (BTR) $\phi_1, \ldots, \phi_n$, where each basic transaction $\phi_i$ is a sequence of transitions $tr_{i0}, tr_{i1}, \ldots, tr_{im}$. $\forall i: 1 \le i \le n$: $tr_{i0}$ is an instance of a direct message. $\forall j$: 1
the paper.
1. The derivation of Total-Cost is interpreted as:
$\forall x: x \in \text{Project} \rightarrow [\text{Total-Cost}(x) = \sum \{\text{Activity-Cost}(y) \mid \text{Project-Affiliation}(y) = x\}]$.
2. The derivation of Activity-Cost is interpreted as:
$\forall x: x \in \text{Activity} \rightarrow [\text{Activity-Cost}(x) = \text{Duration}(x) \times \text{Cost-Per-Day}(\text{Activity-Type}(x)) + \text{Overhead-Cost}(\text{Project-Affiliation}(x))]$.
3. Assertion-1 is a consistency assertion that has to be verified if a relevant modification occurs.
2.4. Derivation graphs

Formally, given a database schema, the executable data structure is called EDG (extended derivation graph) and is defined as follows: EDG = (V, E) is a directed acyclic graph, where V is a set of vertices and E is a set of edges. V contains elements of three different types: IU (information-unit), MSG (direct and derived messages) and PATH. PATH entities are created when a PDI belongs to one class, while a deriver (an IU that participates in the derivation) of the same PDI belongs to another class, for example:
Schema:
    Project: is an entity with attributes: Overhead-Cost, Budget, Total-Cost[PDI].
    Activity: is an entity with attributes: Project-Affiliation, Activity-Type, Duration, Activity-Cost[PDI].
    Activity-Type: is an entity with attributes: Cost-Per-Day.

Invariants:
    Activity-Cost := Duration × Cost-Per-Day + Overhead-Cost
    Total-Cost := sum(Activity-Cost)
    Assertion-1: Budget ≥ Total-Cost

A stable state:

    Project   Overhead-Cost   Budget   Total-Cost
    1         10              150      120
    2         20              200      100

    Activity   Project-Affiliation   Duration   Activity-Type   Activity-Cost
    1          1                     10         1               60
    2          1                     25         2               60
    3          2                     16         1               100

    Activity-Type   Cost-Per-Day
    1               5
    2               2

Messages:
    External Messages: Update Project, Update Activity, Update Activity-Type.
    Derived Messages: Update Activity-Cost, Update Total-Cost, Check-Assertion-1.

Given the following transaction:
    $tr_{10}$ = <Update Activity with Operation-Code = insert, Activity = 4, Project-Affiliation = 2, Duration = 10, Activity-Type = 1>.
The derived transitions are as follows:
    $tr_{11}$ = Update Activity with Operation-Code = modify, Activity = 4, Activity-Cost = 70
    $tr_{12}$ = Update Project with Operation-Code = modify, Project = 2, Total-Cost = 170
    $tr_{13}$ = Check Assertion-1 with: Budget = 200, Total-Cost = 170.
The transaction issues commit.
These transitions create successive database states, the last of which is a stable state.

Fig. 1. A project management example.
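The arithmetic behind Fig. 1 can be checked with a short Python sketch. It is only an illustration of the two invariants and the assertion: the dictionary layout and function names are assumptions, and the derived values are recomputed on demand here, whereas PARDES stores them as PDIs and updates them through derived transitions.

```python
# Base (non-derived) data of the stable state in Fig. 1.
projects = {1: {"overhead": 10, "budget": 150}, 2: {"overhead": 20, "budget": 200}}
activity_types = {1: {"cost_per_day": 5}, 2: {"cost_per_day": 2}}
activities = {
    1: {"project": 1, "duration": 10, "type": 1},
    2: {"project": 1, "duration": 25, "type": 2},
    3: {"project": 2, "duration": 16, "type": 1},
}

def activity_cost(a):
    """Activity-Cost := Duration x Cost-Per-Day + Overhead-Cost."""
    act = activities[a]
    per_day = activity_types[act["type"]]["cost_per_day"]
    return act["duration"] * per_day + projects[act["project"]]["overhead"]

def total_cost(p):
    """Total-Cost := sum(Activity-Cost) over the activities affiliated with p."""
    return sum(activity_cost(a) for a, act in activities.items() if act["project"] == p)

def assertion_1(p):
    """Assertion-1: Budget >= Total-Cost."""
    return projects[p]["budget"] >= total_cost(p)

stable_totals = {p: total_cost(p) for p in projects}

# The direct transition of Fig. 1: insert Activity 4 into Project 2.
activities[4] = {"project": 2, "duration": 10, "type": 1}
new_activity_cost = activity_cost(4)   # value carried by the first derived transition
new_total_cost = total_cost(2)         # value carried by the second derived transition
assertion_holds = assertion_1(2)       # outcome of Check Assertion-1
```

The recomputed values agree with the figure: the stable totals are 120 and 100, the new activity costs 70, the total cost of Project 2 becomes 170, and the assertion holds against a budget of 200.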
There is a need to determine which instances of Activity affect the values of a certain instance of Project, or the converse: which instances of Project are affected by a change in a given instance of Activity. In conventional models this matching is done explicitly by designating the conditions for this match† or by using path expressions [16]. In PARDES there is an inference process that determines the matching when matching conditions are omitted. The automatic matching protocol attempts to infer such a matching by using semantic equivalences (properties are semantically equivalent if they are mapped to the same set and have the same meaning). Automatic matching simplifies the language and makes it matching independent. If there is a unique matching between two objects and there is a unique pair of properties of these objects which are semantically equivalent, then this matching is assumed by the translator. If there is no matching at all between the two objects, then it is considered to be a system design error. If the matching is not unique, then the user is prompted with the possible matches and is asked to make a selection.

†Join in relational algebra is a kind of “matching”.
PATH = p1, PDI = Activity-Cost, context = Activity, reference = Overhead-Cost, anchor = Project-Affiliation, connector = Project, relationship = m:1

PATH = p2, PDI = Activity-Cost, context = Activity, reference = Cost-Per-Day, anchor = Activity.Activity-Type, connector = Activity-Type, relationship = m:1

PATH = p3, PDI = Total-Cost, context = Project, reference = Activity-Cost, anchor = Project, connector = Project-Affiliation, relationship = 1:n

Fig. 2. The PATH entities in the example.
The result of any matching operation is a meta-information entity called PATH. PATH is a tuple (PDI, context, reference, anchor, connector, relationship), where: PDI is the derived property; context is the class to which the PDI belongs; reference is the property that is referred to in the PDI’s invariant and does not belong to context; anchor is the matching property that belongs to context; connector is the matching property that belongs to the reference’s class; and relationship is the cardinality of the attachments between the two classes. Possibilities are: 1:1, 1:n, m:1, m:n.
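As an illustration of how a PATH tuple can drive the matching, the following Python sketch encodes the three PATH entities of the case study and resolves the matching instances by comparing the anchor value with the connector value. The `Path` class, the `match` function and the row layout are assumptions made for this sketch; in particular, each class is assumed to carry a key property named after the class itself, and the anchor of p2 is written without its class qualifier.

```python
from typing import Dict, List, NamedTuple

class Path(NamedTuple):
    pdi: str           # the derived property
    context: str       # class to which the PDI belongs
    reference: str     # property referred to in the invariant, outside context
    anchor: str        # matching property in context
    connector: str     # matching property in the reference's class
    relationship: str  # cardinality: "1:1", "1:n", "m:1" or "m:n"

P1 = Path("Activity-Cost", "Activity", "Overhead-Cost", "Project-Affiliation", "Project", "m:1")
P2 = Path("Activity-Cost", "Activity", "Cost-Per-Day", "Activity-Type", "Activity-Type", "m:1")
P3 = Path("Total-Cost", "Project", "Activity-Cost", "Project", "Project-Affiliation", "1:n")

Row = Dict[str, int]

def match(path: Path, context_row: Row, reference_rows: List[Row]) -> List[Row]:
    """Return the reference-class instances attached to one context instance."""
    return [r for r in reference_rows if r[path.connector] == context_row[path.anchor]]

projects = [{"Project": 1, "Overhead-Cost": 10}, {"Project": 2, "Overhead-Cost": 20}]
activities = [
    {"Activity": 1, "Project-Affiliation": 1, "Activity-Type": 1},
    {"Activity": 2, "Project-Affiliation": 1, "Activity-Type": 2},
    {"Activity": 3, "Project-Affiliation": 2, "Activity-Type": 1},
]

project_of_activity_3 = match(P1, activities[2], projects)    # m:1, a single project
activities_of_project_1 = match(P3, projects[0], activities)  # 1:n, several activities
```

The same function serves both directions of the Activity/Project attachment because p1 and p3 swap the roles of anchor and connector.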
Figure 2 shows the PATH entities in our case study example. An EDG for the example in Fig. 1 is shown in Fig. 3. Circles designate information units, boxes designate messages, and triangles designate the PATH matching constructs. An assertion is triggered in a similar way to a message. A transition is an instance of a message, i.e. a message whose variables are bound to values.

The transaction derivation graph $(V_p, E_p)$ is the transitive closure of a direct message p in EDG. The definition is:

$V_p = \{p\} \cup \{v \in EDG \mid \text{there exists a path in EDG between } p \text{ and } v\}$.
$E_p = \{e \in EDG \mid e = (v_1, v_2) \text{ and } v_1, v_2 \in V_p\}$.

MS(p) is the message-set of p, defined as $MS(p) = \{v \mid v \in V_p \text{ and } type(v) = MSG\}$; it designates all the nodes of type MSG in the transaction derivation graph. Example: in the EDG described in Fig. 3,

MS(Update Activity) = {Update Activity, Update Activity-Cost, Update Total-Cost}.
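A minimal Python sketch of the message-set computation follows. The edge list is an assumption: it is a simplified guess at the relevant part of the EDG, written only so that the closure can be run, and it is not a transcription of Fig. 3. The assertion node is given its own type so that, as in the example above, it does not appear in MS(p).

```python
# Node types of a simplified, assumed EDG fragment for the case study.
NODE_TYPE = {
    "Update Activity": "MSG",
    "Duration": "IU",
    "Project-Affiliation": "IU",
    "Update Activity-Cost": "MSG",
    "Activity-Cost": "IU",
    "p3": "PATH",
    "Update Total-Cost": "MSG",
    "Total-Cost": "IU",
    "Check-Assertion-1": "ASSERTION",
}

# Directed edges (assumed): a message updates IUs, an IU triggers derived messages.
EDGES = {
    "Update Activity": ["Duration", "Project-Affiliation"],
    "Duration": ["Update Activity-Cost"],
    "Update Activity-Cost": ["Activity-Cost"],
    "Activity-Cost": ["p3"],
    "p3": ["Update Total-Cost"],
    "Update Total-Cost": ["Total-Cost"],
    "Total-Cost": ["Check-Assertion-1"],
}

def reachable(start):
    """V_p: the start node plus every node reachable from it in the EDG."""
    seen, stack = {start}, [start]
    while stack:
        node = stack.pop()
        for nxt in EDGES.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

def message_set(p):
    """MS(p): the nodes of type MSG in the transaction derivation graph of p."""
    return {v for v in reachable(p) if NODE_TYPE[v] == "MSG"}

ms_update_activity = message_set("Update Activity")
```

Under these assumed edges the result coincides with the message set given in the text for Update Activity.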
This set of MSGs designates the MSGs that are potentially triggered by the execution of an instance of Update Activity. In our example, instances of all these MSGs are activated; in the general case, a derivation may be conditioned (in the form: a := b when cond1; c when cond2 ...), thus some decisions about MSG activation are determined only at run time. The control algorithm creates a set of transitions $\{tr_0, \ldots, tr_m\}$, where $tr_0$ is an instance of p and the rest of the transitions are instances of members of MS(p). The set $\{tr_0, \ldots, tr_m\}$ is called a basic transaction; it designates transitions that have trigger connection relationships. In the rest of the discussion we shall concentrate upon basic transactions.

The Object Derivation Graph (ODG) is a subgraph of EDG defined as follows: ODG = $(V_o, E_o)$, where:

$V_o = \{v \in V \mid type(v) = IU\}$.
$E_o = \{(v_1, v_2) \mid v_1, v_2 \in V_o \text{ and there exists a path } (v_1, u_1, \ldots, u_n, v_2) \text{ in EDG, where for all } 1 \le i \le n,\ type(u_i) \ne IU\}$.†

Let $o_1$, $o_2$ be two information-units. If $e = (o_1, o_2)$ is an edge in ODG, then $o_1$ is called an immediate deriver of $o_2$ and $o_2$ is called an immediate derivative of $o_1$. The definitions of derivers and derivatives are obtained by extending the previous definitions to their transitive closure, in the appropriate direction.

Fig. 3. An EDG.

3. CONSISTENCY MODES

3.1. Overview

This section proposes six consistency modes:

1. the fully consistent mode.
2. the quasi consistent mode.
3. the loosely consistent mode.
4. the periodically coordinated mode.
5. the semi consistent mode.
6. the virtual mode.
The general issue handled by these modes is the level of consistency required between original information and derived information, and between various levels of derived information. A mode is an abstraction that denotes the consistency requirement of a PDI w.r.t. some or all of its derivers. A mode

†This condition denotes the existence of a path in the graph that contains either PATH nodes or MSG nodes, but not an information unit node.
We shall limit the discussion to basic transactions, i.e. a direct transition and the set of all its derived transitions. We are looking at the consistency problem at three levels:

The global level: The consistency mode of a PDI p with respect to all its non-PDI derivers.
The local level: The consistency mode of a PDI p with respect to all its immediate derivers (either PDI or non-PDI).
The link level: The consistency mode of a PDI p with respect to a specific deriver.
CMS (Consistency Modes Set) is a set that is given by the user (via an appropriate user interface), in which the user may designate consistency modes at all three levels. The system designer may specify a level of granularity, and also defaults (where the given default is: every PDI is global fully consistent, taking the most cautious approach). The system checks CMS for internal contradictions and permits the system designer to resolve them. The definitions of the different modes at the various levels follow.

3.2. The fully consistent mode

3.2.1. General description. The fully consistent mode is the classic mode of consistency. If a PDI is fully consistent, its consistency is kept at all times; thus the atomicity of all the derivations that lead to the calculation of this value must be kept. The definition of the term fully consistent at the three levels follows.

3.2.2. Definitions. A PDI p is local fully consistent if each instance of the PDI’s values is identical to the computation results in its invariant definition. Example: Total-Cost is local fully consistent if the invariant Total-Cost := sum(Activity-Cost) is always true. This definition does not say anything about the consistency of instances of Activity-Cost relative to their derivers.

A PDI p is global fully consistent if it is local fully consistent and all its immediate derivers that are PDIs are also global fully consistent. This recursive definition ensures that p is consistent with all its derivers, not only the immediate ones. Example: the PDI Total-Cost is global fully consistent if both invariants (of Total-Cost and Activity-Cost) are true at all times.
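The difference between the local and the global level can be made concrete with a small Python sketch over the data of Fig. 1. It is a snapshot check only, whereas the definitions require the invariants to hold at all times; the names `stored`, `expected` and `PDI_DERIVERS`, and the tuple layout of the activities, are assumptions made for the illustration.

```python
overhead = {1: 10, 2: 20}        # Overhead-Cost per Project
cost_per_day = {1: 5, 2: 2}      # Cost-Per-Day per Activity-Type
# Activity -> (Project-Affiliation, Duration, Activity-Type)
activities = {1: (1, 10, 1), 2: (1, 25, 2), 3: (2, 16, 1)}

# Stored values of the two PDIs, as in the stable state of Fig. 1.
stored = {"Activity-Cost": {1: 60, 2: 60, 3: 100}, "Total-Cost": {1: 120, 2: 100}}

# Immediate derivers of each PDI that are themselves PDIs.
PDI_DERIVERS = {"Activity-Cost": [], "Total-Cost": ["Activity-Cost"]}

def expected(pdi, key):
    """Value required by the invariant, computed from the immediate derivers."""
    if pdi == "Activity-Cost":
        project, duration, a_type = activities[key]
        return duration * cost_per_day[a_type] + overhead[project]
    # Total-Cost depends on the stored Activity-Cost values of its activities.
    return sum(stored["Activity-Cost"][a]
               for a, (project, _, _) in activities.items() if project == key)

def local_fully_consistent(pdi):
    return all(value == expected(pdi, key) for key, value in stored[pdi].items())

def global_fully_consistent(pdi):
    return local_fully_consistent(pdi) and all(
        global_fully_consistent(d) for d in PDI_DERIVERS[pdi])

before = global_fully_consistent("Total-Cost")

# Change a Duration without re-running any derivation.
activities[3] = (2, 20, 1)
local_after = local_fully_consistent("Total-Cost")
global_after = global_fully_consistent("Total-Cost")
```

After the change, Total-Cost still equals the sum of the stored Activity-Cost values, so it remains local fully consistent in this snapshot, but it is no longer global fully consistent because Activity-Cost has fallen behind its own derivers.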
A PDI p is fully consistent with respect to an IU i if p is a derivative of i and the last update for each instance of i has triggered an update operation to derive p, which has been completed.† Example: Total-Cost is fully consistent with respect to Overhead-Cost if any update in an instance of Overhead-Cost is reflected (via the relevant instances of Activity-Cost) in the value of Total-Cost for the same Project. Note that this does not require Total-Cost to be fully consistent with respect to other derivers (such as Cost-Per-Day).

The following properties can be proved‡ from these definitions:

• If the system contains only atomic transactions, then all the PDIs are global fully consistent, and the database is sound.
• If a PDI p is local fully consistent, then it is fully consistent with respect to each of its immediate derivers.
• If all the PDIs are local fully consistent, then the system contains only atomic transactions (and the database is sound).
• If a PDI p is fully consistent with respect to an IU i, then for any PDI q such that there exists a path in ODG from i via q to p, q is fully consistent with respect to i.

†The contribution of i to the computation of the value of p.
‡Proofs are straightforward given the definitions.

3.3. The quasi-consistent mode

3.3.1. General description. The idea of quasi-consistent stems from the relaxation of transaction atomicity, in order to enable asynchronous execution of sub-transactions. The idea is to commit, i.e. to unlock the involved information-units, while other sub-transactions have not yet concluded. If such a delayed sub-transaction fails, the original transaction also fails, and the system should emulate transaction rollback. This involves:

1. Retreat to the original values that existed before this transaction.
2. Re-activate all subsequent transactions that have used these values.

A PDI is quasi-consistent if it is updated by such a delayed sub-transaction. The formal definition is, thus:

3.3.2. Definitions.

1. A PDI p is local quasi consistent if p is not local fully consistent and, for each instance of p, if p is not identical with the computation results in the invariant definition, then:
(a) A transition to update the instance of p has been submitted.
(b) The inconsistency exists only during the time that this transition has not been completed.
Example: If Total-Cost is local quasi consistent then each update of Activity-Cost creates a sub-transaction whose execution may be delayed. If, in addition, Activity-Cost is global fully consistent then the transaction is partitioned into two sub-transactions: (a) the main sub-transaction, consists of: Update Activity, Update Activity-Cost; (b) the delayed sub-transaction, consists of: Update Total-Cost. The main sub-transaction quasi commits, that is behaves as if the entire transaction issues commit. If the delayed sub-transaction fails (a possible failure may be violation of Assertion 1) then the main sub-transaction retreats to the original values, and any transaction that has used these values since the quasi commit should issue rollback and be re-activated. 2. A PDI p is global quasi consistent if exists a PDI q, in the set DEREQ(p) = {xix is a PDI and x E Derivers(p)} u {p} that is local quasi consistent and all the members of DEREQ(p) are either local quasi consistent or local fully consistent. Example: DEREQ(TotaI-Cost) = {Activity-Cost, Total-Cost). It is sufficient that one member in the set is local quasi consistent, while the other may be either localfully consistent or local quasi consistent for Total-Cost to be global fully consistent. 3. A PDI p is quasi consistent w.r.t. an IU i, if p is a derivative of i and p is global quasi consistent and p is not fully consistent w.r.t. i. Example: If Total-Cost isglobal quasi consistent and Total-Cost is not fully consistent w.r.t. Cost- Per- Day then Total- Cost is quasi consistent w.r.t. Cost- Per- Day. 4. A sub-transaction is quasi consistent if: (a) All the PDI’s updated by all its transitions are global quasi consistent (b) If p 1, p2 are updated by this sub-transaction and p 1 is a deriver of p2, then p2 is fully consistent w.r.t. p 1. These definitions lead to the following properties: 1. 
If a quasi consistent sub-transaction fails, then the database should behave as if the entire transaction is fully consistent and the transaction has failed (i.e. a ROLLBACK is issued). 2. Consequently, any transaction that has been executed and used any information-unit that belongs to the failed transaction, should rollback and be re-activated. 3. A PDI p is quasi consistent w.r.t. to an IU i, if i is a deriver of p, and p is not fully consistent w.r.t. i, and for every path in ODG that starts in i and ends in p, exists a PDI q, s.t. q is either fully or quasi consistent w.r.t. i and p is either fully consistent or quasi consistent w.r.t. q. 4. A sub-transaction (fro, . . . , tr,,), where tr, updates pi is quasi consistent when: (a) pO is local quasi consistent. (b) for all 0 < i,j G n, if pi is a derivative of pi then pi is fully consistent w.r.t. pi.
Property 4 determines how these sub-transactions are formed. A quasi consistent sub-transaction is constructed when there exists an edge (i, p) in ODG and p is quasi consistent w.r.t. i. The sub-transaction starts with the transition to update p. Examples:
1. Given that Activity-Cost is local quasi consistent and Total-Cost is local fully consistent, the main sub-transaction consists of Update Activity and there is one quasi consistent sub-transaction that consists of Update Activity-Cost and Update Total-Cost.
2. Given that both Activity-Cost and Total-Cost are local quasi consistent, the main sub-transaction consists of Update Activity and there are two quasi consistent sub-transactions, one for each of the derived MSGs.
3.4. The loosely consistent mode
3.4.1. General description. The loosely consistent mode applies in cases where maintaining the PDI's consistency is desirable but not mandatory. The consistency of any loosely consistent PDI with its invariant definition is not guaranteed. As in the quasi consistent mode, asynchronous sub-transactions are created, but unlike the quasi consistent case no compensation is issued, and a state of inconsistency might persist.
3.4.2. Definitions.
1. A PDI p is local loosely consistent if there exists i, an immediate deriver of p, such that when a transition that modifies i occurs, the appropriate transition to update p is submitted without any feedback to the original transition.
2. A PDI p is global loosely consistent if there exists a PDI q in the set DEREQ(p) (as defined in the global quasi consistent case) that is local loosely consistent.
3. A PDI p is loosely consistent w.r.t. a deriver i when:
(a) if i is an immediate deriver: when a transition that modifies i occurs, the appropriate transition to update p is submitted without any feedback to the original transition;
(b) otherwise: there exists a pair (q1, q2) s.t. q1, q2 are nodes on any path from i to p in ODG, and q2 is loosely consistent w.r.t. q1.
4. A sub-transaction (tr_0, ..., tr_n), where tr_i updates p_i, is loosely consistent when:
(a) p_0 is local loosely consistent;
(b) for all 0 < i, j ≤ n, if p_j is a derivative of p_i then p_j is fully consistent w.r.t. p_i.
Note that the loosely consistent definitions are similar to the quasi consistent ones. The only distinction is the binding of sub-transactions to the main transaction. A consequence of the loosely consistent mode definitions is: If p is global loosely consistent and q is a derivative of p, then q is also global loosely consistent. Inconsistencies that are created as a result of update failures are treated as exceptions. The exception handling component of PARDES is discussed in [17].
3.5. Non-active modes
This section presents consistency modes that do not require the support of an active database model.
3.5.1. The periodically coordinated mode. In this mode derived update operations are not triggered at all during the normal course of the transaction. Instead, at designated times, the PDI instances are recalculated according to their invariants. Efficient implementation of periodical coordination is based upon logging all the IU instances that have been modified since the last coordination. This is equivalent to the consistency mode that is created by replica control models.
3.5.2. The semi consistent mode. When a PDI-instance is updated in any of the modes, its value is calculated. In some cases we might prefer to execute the actual update when the first retrieval request for this PDI-instance occurs. In this case, instead of
Mode                  Fully       Quasi       Loosely     Periodically  Semi        Virtual
                      Consistent  Consistent  Consistent  Coordinated   Consistent
Persistence           t           t           t           t             t           f
Triggering            t           t           t           f             (t)         f
Eventual Consistency  t           t           f           f             t or f      ir
Grace Period          f           t           t           t             t           ir
Compensation          ir          t           f           f             t or f      ir
Fig. 4. Comparison of consistency modes.
performing the actual calculation, we attach a tag indicating that this is an out-of-date value†. The semi consistent mode is coupled with any of the other modes (i.e. it is on another dimension relative to the other modes). If a PDI p is semi consistent then all its derivatives are also semi consistent regardless of the coupled mode.
3.5.3. The virtual mode. In the virtual mode all derived data elements are treated as virtual, i.e. they are calculated whenever they are needed. No mechanism is needed in the update process to support this mode.
3.6. Coherency conditions
Coherency conditions as defined in [9], namely temporal and version conditions, can be applied in our model in the following fashion:
The fully consistent case: In real-time database systems temporal conditions can be applied to an update operation. This may affect the optimization decision about using the semi consistent solution. Coherency conditions in the sense of inconsistency periods are not compatible with the fully consistent mode.
The quasi consistent case: Coherency conditions can be applied to monitor the order of execution of quasi consistent sub-transactions.
The periodically coordinated case: This is the mode that is comparable with the models of replica control as described in Section 2. Coherency conditions can be applied here to initiate periodical coordination.
3.7. Summary: comparison among the various modes
Figure 4 summarizes the differences among the various consistency modes. The comparison criteria are:
Persistence: Is the value of a derived data element stored in the database?
Triggering: Is the derivation triggered as part of the transaction that updates base data-elements?
Eventual consistency: Are derived data elements required to be consistent?
Grace period: Is there a grace period during which a required consistency may not be maintained?
Compensation: If there is a grace period, is compensation issued if a delayed update fails?
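The persistence and triggering criteria can be illustrated with a small sketch that contrasts three ways of maintaining one derived value. It is a toy model under stated assumptions, not the PARDES mechanism: the class `DerivedItem`, the policy names `eager`, `semi` and `virtual`, and the use of a sum as the derivation are all hypothetical. Here `eager` stands for any mode in which the derivation is triggered during the update process, `semi` only sets an out-of-date tag and recalculates on the first retrieval, and `virtual` stores nothing and recalculates on every retrieval.

```python
class DerivedItem:
    """A derived value (the sum of base costs) kept under one policy."""

    POLICIES = ("eager", "semi", "virtual")

    def __init__(self, policy):
        if policy not in self.POLICIES:
            raise ValueError(f"unknown policy: {policy}")
        self.policy = policy
        self.costs = {}           # base data: activity name -> cost
        self.stored = 0           # persisted derived value (unused if virtual)
        self.out_of_date = False  # tag used only by the semi policy
        self.derivations = 0      # how many times the value was recalculated

    def _derive(self):
        self.derivations += 1
        return sum(self.costs.values())

    def update(self, activity, cost):
        """Modify a base data element and apply the policy."""
        self.costs[activity] = cost
        if self.policy == "eager":
            self.stored = self._derive()   # derivation triggered by the update
        elif self.policy == "semi":
            self.out_of_date = True        # only the tag is updated
        # virtual: nothing happens in the update process

    def read(self):
        """Retrieve the derived value."""
        if self.policy == "virtual":
            return self._derive()          # no persistence at all
        if self.policy == "semi" and self.out_of_date:
            self.stored = self._derive()   # deferred until first retrieval
            self.out_of_date = False
        return self.stored


def count_derivations(policy):
    """Three updates followed by two retrievals; return derivation count."""
    item = DerivedItem(policy)
    for activity, cost in [("a1", 10), ("a2", 20), ("a1", 15)]:
        item.update(activity, cost)
    first, second = item.read(), item.read()
    assert first == second == 35
    return item.derivations


for policy in DerivedItem.POLICIES:
    print(policy, count_derivations(policy))
```

With this workload the eager policy derives once per update, the semi policy derives once in total, and the virtual policy derives once per retrieval, which is the trade-off the persistence and triggering rows of Figure 4 describe.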
The values in the table are: t if the answer to the question with respect to the mode is positive, f if the answer to the question is negative, and ir if the question is irrelevant for this mode.
Comments:
1. The virtual mode is the only one in which there is no persistence of derived data.
2. The first three modes require an active database mechanism: a derivation is triggered during the update process. The semi consistent mode also requires an active database mechanism, but the update operation issued is not a value derivation; instead, the out-of-date indication is updated.
3. The fully consistent mode and the quasi consistent mode require and enforce eventual consistency. The semi consistent mode has eventual consistency requirements, and they may or may not be enforced. If consistency is enforced then semi consistent is similar to quasi consistent (except for the triggering aspect), otherwise semi consistent is a special case of periodically coordinated.
† A similar idea has been proposed in the CACTIS model [4].
4. CONCLUSION
4.1. Implementation issues
Analysis of the consistency definitions leads to the creation of sub-transactions. Each sub-transaction is treated by the regular database transaction manager as a separate transaction. A consistency manager should be devised to support the compensation part of the quasi consistent mode. The consistency manager applies non-monotonic reasoning in a similar fashion to TMS [19]. The implementation of this consistency manager is a subject for further research. The sub-transaction creator has been implemented in the framework of the PARDES prototype, on an X86 platform.
4.2. Contributions
This research is an attempt to relax the strict consistency that is required in active database transaction systems, without giving up the ability to monitor the consistency state and without leaving it to the user's discretion. The added flexibility features are:
1. the ability to relax the full consistency requirement and determine other modes.
2. the ability to optimize the persistency level of derived information (fully persistent, semi persistent or virtual).
3. the ability to make decisions at either the global or the local level.
The model proposed here supports this flexibility and thus extends the usability of the active database paradigm. This is another component of the application that can now be implemented using a high-level language in active database applications. Compared with existing models, the fact that the transaction's soundness is embedded in the model, and not assumed (left to application programs), is a significant advantage.
4.3. Further research
Further work still has to be done on both optimization and implementation issues. In the optimization area, subsequent work is now being performed within the PARDES project that issues recommendations about optimized consistency modes, given a schema, PDISET, EDG and expected utilities and resource consumptions. Extension of this work to a multidatabase system is also being investigated. Concurrency issues within and among sub-transactions should also be investigated. Implementation work has to be done in order to construct the consistency manager. Finally, we are using the idea of consistency models to create an integrator in a CIM database.
Acknowledgement-I would like to thank the reviewers for their helpful comments that have helped me to improve the presentation.
REFERENCES
[1] N. S. Barghouti and G. E. Kaiser. Concurrency control in advanced database applications. ACM Comput. Surv. 23(3), 269-317 (1991).
[2] Xerox Advanced Information Technologies. HiPAC: a research project in active, time-constrained database management. Final technical report XAIT-89-02, July 1989.
[3] O. Etzion. PARDES-A data-driven oriented active database model. Sigmod Record 22(1), 1-7 (1993).
[4] S. Hudson and R. King. CACTIS: a database for specifying functionally defined data. In Readings in Object-oriented Database Systems (Edited by S. Zdonik, D. Maier). Morgan-Kaufmann, San Francisco, 432-443 (1986).
[5] H. Garcia-Molina and K. Salem. Sagas. In Proc. ACM Sigmod, pp. 249-257 (1987).
[6] Y. Breitbart, H. Garcia-Molina and A. Silberschatz. Overview of multidatabase transaction management. VLDB J. 1(2), 181-239 (1992).
[7] D. Skeen, S. B. Davidson and H. Garcia-Molina. Consistency in partitioned network. ACM Comput. Surv. 17(3) (1985).
[8] M. Stonebraker. Concurrency control and consistency of multiple copies of data in distributed INGRES. IEEE TOSE 3(3), 188-194 (1979).
[9] TODS 15(3), 359-384 (1990).
[10] A. Segev and W. Fang. Currency-based optimal update policies for distributed materialized views. Mgmt Sci. 37(7), 851-870 (1989).
[11] G. Wiederhold and X. Qian. Consistency control of replicated data in federated databases. In Proc. Workshop of Replicated Data (1990).
[12] M. Rusinkiewicz, A. Sheth and G. Karabatis. Specifying interdatabase dependencies in a multidatabase environment. IEEE Comput. 24(12), 46-54 (1991).
[13] A. Sheth, M. Rusinkiewicz and G. Karabatis. Using polytransactions to manage interdependent data. In Transaction Models for Advanced Database Applications (Edited by A. Elmagarmid), Chap. 14. Morgan-Kaufmann, San Francisco (1992).
[14] J. Klein and A. Reuter. Migrating transactions. In Proc. Future Trends in Distributed Computer Systems in the '90s (1988).
[15] U. Dayal, M. Hsu and R. Ladin. A transactional model for long-running activities. In Proc. VLDB 1991, Barcelona, pp. 113-122.
[16] M. Kifer, W. Kim and Y. Sagiv. Querying object-oriented databases. In Proc. Sigmod 1992, San Diego, pp. 393-402 (1992).
[17] O. Etzion. Active handling of incomplete or exceptional information in database systems. In Proc. WITS 91, Cambridge, MA, pp. 46-60 (1991).
[18] J. Doyle. A truth maintenance system. Artif. Intell. 12, 231-272 (1981).