Concurrency control: adaptation of rax protocol for telecommunication systems
G Nicaud
The paper describes an adaptation of the 'rax' protocol for concurrency control in distributed multiservice telecommunication systems. This adaptation has been developed on a prototype distributed database management system (DBMS) implemented at CNET on a version of the Sabrina centralized DBMS. The adaptation consists of the transposition of the rax protocol into a multisite architecture, namely, one operation and maintenance site and several real-time sites, and the implementation of particular techniques for control of concurrent access by two types of transactions, namely, real-time transactions and management transactions.

Keywords: concurrency control, databases, real-time transactions, distributed databases, telecommunication systems
Numerous protocols for the management of concurrent access have been proposed [1, 2]. Nevertheless, such protocols sometimes need to be adapted or combined to meet the particular requirements of an application or family of applications. Recently published papers deal with concurrency control for real-time environments [3, 4], main-memory-resident database environments [5], or object-oriented environments [6]. The adaptation presented here is compatible with these methods when they are based on locking. This article describes how the 'rax' protocol [1, 2, 7, 8] was adapted to meet the requirements of future distributed multiservice telecommunication systems. This adaptation has been developed on a prototype distributed database management system (DBMS) [9] implemented at CNET on a version of the Sabrina centralized DBMS [10]. It could also be useful in other applications with a similar architecture and characteristics (such as data servers in telematic networks). The paper is organized as follows. The next section gives a brief overview of the 'rx' and 'rax' protocols. The third section presents the application features and the requirements that the concurrency control protocol must meet. The fourth section describes the proposed algorithms, and the fifth section concludes the paper. An Appendix gives the results of a quantitative evaluation carried out at CNET [11].
France Telecom, CNET LAA/SLC/BSA, Route de Tregastel, 22301 Lannion, France
Vol 34 No 9 September 1992
| Requested \ Held | r | x |
|---|---|---|
| r | yes | no |
| x | no | no |

Figure 1. Compatibilities for rx protocol
RX AND RAX PROTOCOLS

Rx protocol with wait-for graph

The rx protocol with wait-for graph management is the basic two-phase locking protocol (2PL) [2, 8, 12]. To read a data item, a transaction must obtain a read lock (r) on it. Similarly, to update a data item, a transaction must obtain a write lock (x) on it. A transaction obtains an r-lock if the data item is not already x-locked; if it is x-locked, the transaction must wait for the termination of the transaction that has x-locked the data item. A transaction obtains an x-lock only if the data item is not locked; if it is r-locked or x-locked, the transaction must wait for the termination of the transaction(s) holding the lock. Each time a transaction must wait, it is added to a directed wait-for graph. If this addition creates a cycle in the graph, there is a deadlock: one of the transactions in the cycle must be aborted. The compatibility table for the rx protocol is shown in Figure 1.
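To make the rx rules concrete, the following Python sketch combines the Figure 1 compatibility table with a wait-for graph that detects a deadlock at the moment a waiting edge would be added. It is a minimal single-site illustration under stated assumptions: the names `RX_COMPATIBLE`, `WaitForGraph` and `request` are hypothetical, lock release at termination is omitted, and the choice of which transaction to abort is left to the caller.

```python
from collections import defaultdict

# Figure 1 (hypothetical encoding): RX_COMPATIBLE[requested][held] -> grantable?
RX_COMPATIBLE = {
    "r": {"r": True, "x": False},
    "x": {"r": False, "x": False},
}


class WaitForGraph:
    """Directed graph in which an edge T -> U means that T waits for U."""

    def __init__(self):
        self.edges = defaultdict(set)

    def _reaches(self, start, target):
        # Iterative depth-first search: is there a path start -> ... -> target?
        stack, seen = [start], set()
        while stack:
            node = stack.pop()
            if node == target:
                return True
            if node not in seen:
                seen.add(node)
                stack.extend(self.edges[node])
        return False

    def add_wait(self, waiter, holders):
        """Make waiter wait for all holders unless that would close a cycle."""
        if any(self._reaches(h, waiter) for h in holders):
            return False  # deadlock: one transaction in the cycle must be aborted
        self.edges[waiter].update(holders)
        return True


def request(locks, graph, tx, item, mode):
    """Handle tx asking for mode ('r' or 'x') on item; returns the outcome."""
    holders = locks.setdefault(item, {})  # transaction -> mode it holds
    blockers = {t for t, held in holders.items()
                if t != tx and not RX_COMPATIBLE[mode][held]}
    if not blockers:
        # Never downgrade an x-lock the transaction already holds.
        holders[tx] = "x" if "x" in (mode, holders.get(tx)) else "r"
        return "granted"
    return "wait" if graph.add_wait(tx, blockers) else "deadlock"
```

Checking reachability from each blocking holder back to the waiter before adding the edges is sufficient, because any new cycle must pass through one of the new edges.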
0950-5849/92/090563-10 © 1992 Butterworth-Heinemann Ltd

Rax protocol with wait-for graph

For safety reasons, a transaction does not write directly into the database; it writes into its private workspace. While a transaction prepares the new value of a data item in its workspace, the rax protocol allows other transactions to read the item in the database. The new value of the data item is visible only to the updating transaction, in its workspace. If the transaction commits, its workspace is integrated into the database and replaces the former data. If, on the other hand, the transaction aborts, its workspace is released and the database remains intact.

The active phase of an update transaction is the phase between the beginning of the transaction and the time when the transaction wants to commit. As long as a transaction that modifies some data in its workspace has not reached the end of its active phase, other transactions can read these data in the database. To modify a data item, a transaction T must hold an a-lock on it. At the end of T's active phase, the a-locks of T are converted into x-locks, and if nonterminated transactions have read items x-locked by T, T waits for the end of these transactions before committing.

If a transaction T issues an a-lock request for a data item that is a-locked or x-locked, the request is denied, and T must wait for the termination of the transaction that holds the a-lock or the x-lock. If T issues an r-lock request for a data item that is x-locked, the request is denied, and T must wait for the termination of the transaction that holds the x-lock. Each time a transaction must wait, it is added to a directed wait-for graph. If this addition creates a cycle in the graph, there is a deadlock: one of the transactions in the cycle must be aborted.

Figure 2 is the compatibility table for any transaction that requests an r-lock or an a-lock on a data item during its active phase. The rows show the lock requested by a transaction in its active phase: either an r-lock or an a-lock. The columns show the data item's locking state: the item is either r-locked by one or several transactions, a-locked (and possibly r-locked by one or several transactions), or x-locked (and possibly r-locked by one or several transactions).

| Requested \ Held | r | a | x |
|---|---|---|---|
| r | yes | yes | no |
| a | yes | no | no |

Figure 2. Compatibilities for rax protocol

Dependencies and wait-for between transactions
When a transaction T1 r-locks a data item already a-locked by a transaction T2 (as T2 holds a-locks, T2 is still active), T2 remains active and becomes dependent on T1. This means that T2 cannot complete while T1 is not terminated. At the end of T2's active phase, the a-locks of T2 are converted into x-locks and, if T1 is not terminated, T2 waits for T1's termination. T2 is therefore added to the wait-for graph. This can lead to a deadlock (a cycle in the graph), in which case one of the transactions in the cycle will have to abort. Therefore, in the rax protocol, two types of wait states for a transaction are distinguished:
• A transaction can await, during its active phase, the release of the lock on a data item.
• A transaction can await, at the end of its active phase, the termination of transactions that have read one or more items that the transaction has x-locked.

Rax protocol option

| Requested \ Held | r | a | x |
|---|---|---|---|
| r | yes | yes | yes/no |
| a | yes | no | no |

Figure 3. Compatibilities of rax protocol with rax protocol option
A transaction T1 can obtain an r-lock on a data item x-locked by a transaction T2, provided that T2 is dependent on T1. Thus when T1 issues an r-lock request for the data item x-locked by T2, T1 must wait for T2 and so is added to the wait-for graph. If such waiting creates a cycle in the graph, this means that T2 is already dependent on T1, and, consequently, an r-lock on this data item can be granted to T1. If there is no cycle in the graph, T1 must effectively wait. This option cannot leave T2 waiting indefinitely, as the number of transactions on which T2 is dependent when its a-locks are converted into x-locks is limited: after the lock conversion, T2 can no longer become dependent on new transactions. The compatibility table (see Figure 3) is slightly modified, as there is now partial r-x compatibility.

Figure 4 shows a case of r-x compatibility. Consider d1 and d2, two data items constrained to be equal, and T1 and T2, two transactions that operate on these data. T1 reads d1 and d2; T2 updates them (d1 := 200; d2 := 200). Suppose the initial value of d1 and d2 is 100. At the end of T2's active phase, T2 is dependent on T1 because T1, which has read d1, is not terminated. Therefore, to commit, T2 must wait for T1's termination. T2 can wait, as adding it to the wait-for graph does not create a cycle. T1 can r-lock d2, even though d2 is already x-locked, because T2 is dependent on T1. It is known that T2 is dependent on T1 because making T1 wait for T2 would create a cycle in the wait-for graph. When T2 commits, its workspace is integrated into the database: the values of d1 and d2 in the base are now 200. The former locations of d1 and d2 are freed and the locks on them are released. The correctness of the rax protocol has been demonstrated elsewhere [1].

TELECOMMUNICATION APPLICATION FEATURES
| T1 | T2 |
|---|---|
| r-locks d1; reads d1 (100) in the database | |
| | a-locks d1; writes d1 (200) in its workspace |
| | a-locks d2; writes d2 (200) in its workspace |
| | end of its active phase: converts the a-locks on d1 and d2 into x-locks; waits for the end of T1 |
| r-locks d2; reads d2 (100) in the database | |
| terminates and releases its locks | |
| | commits |

Figure 4. r-x compatibility (time runs downwards)

Information and Software Technology

The main function of a telecommunication system is to manage communication services, namely, telephone, data, and image transmission. A distributed telecommunication system consists of an operation and maintenance site and several communications call processing sites, which for simplicity are called real-time sites here. Real-time sites provide telecommunications services;
they are the operational part of the system. They provide services to users, for example, call set-up in a switching system, authentication in a credit card public phone system, and network routing in a multiple server system (e.g., toll-free number servers). The operation and maintenance site supervises the other sites: it observes and monitors their correct operation, draws up statistics, etc. Management transactions are created and controlled by the operation and maintenance site. They execute at the operation and maintenance site and at the real-time sites (distributed transactions). Real-time transactions are created and controlled by the real-time sites. They execute only at their site of creation (local transactions) (see Figure 5). Two main classes of data are distinguished: local data and replicated data. Local data are stored either on the operation and maintenance site or on the real-time sites. Replicated data are replicated or fragmented on real-time sites, with a full reference copy on the operation and maintenance site. An update of this reference copy implies updates of the copies on the real-time sites to keep them mutually consistent. Replicated data are only queried by real-time transactions. Local data of real-time sites may be handled by management transactions. Management transactions are initiated infrequently, but are long-lived. They are distributed and perform many read and write operations. They are not subject to real-time constraints and they may wait. Real-time transactions are very frequent, but short. They are local: they execute only at their site of creation. They essentially perform read operations. They have strong real-time constraints and they are not allowed to wait. The decision to reinitiate an aborted real-time transaction is handled at the application level.

Figure 5. Telecommunication application features (diagram: the operation and maintenance site creates management transactions and performs concurrency control between management transactions; each real-time site creates real-time transactions and performs concurrency control between real-time transactions, between management transactions and real-time transactions, and between management transactions)

The objective of the concurrency control protocol is to provide a compromise between management transaction control and real-time transaction control. Management transactions should interfere as little as possible with real-time transactions. In other words, they should:
• lead to few real-time transaction abortions
• decrease the real-time transaction throughput as little as possible
Nevertheless, management transactions, which are long and costly to recover, must be completed with high probability within a reasonable time.

Figure 6. Rax protocol and application (timeline: while the management transaction Tm writes x, y, z and w in its workspace, the real-time transactions T1, T2 and T3 read data in the base; at commitment, the workspace of Tm is copied into the base)
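The behaviour exploited in Figure 6 rests on the rax workspace rule: a transaction's updates stay in its private workspace, other transactions keep reading the database, and commitment integrates the workspace. The sketch below illustrates only that rule, together with the Figure 2 table encoded as a dictionary. The class `RaxStore`, its methods and `RAX_COMPATIBLE` are hypothetical; locking, dependencies and the commit-time wait for readers are deliberately left out.

```python
# Figure 2 (hypothetical encoding): RAX_COMPATIBLE[requested][held] for a
# transaction in its active phase.
RAX_COMPATIBLE = {
    "r": {"r": True, "a": True, "x": False},
    "a": {"r": True, "a": False, "x": False},
}


class RaxStore:
    """Database plus private workspaces; locking and commit-time waits omitted."""

    def __init__(self, data):
        self.database = dict(data)
        self.workspaces = {}  # transaction name -> {item: new value}

    def begin(self, tx):
        self.workspaces[tx] = {}

    def write(self, tx, item, value):
        # Updates go to the private workspace only; the database is untouched.
        self.workspaces[tx][item] = value

    def read(self, tx, item):
        # A transaction sees its own pending updates; others see the database.
        workspace = self.workspaces.get(tx, {})
        if item in workspace:
            return workspace[item]
        return self.database[item]

    def commit(self, tx):
        # The workspace is integrated into the database and replaces the old data.
        self.database.update(self.workspaces.pop(tx))

    def abort(self, tx):
        # The workspace is released; the database remains intact.
        del self.workspaces[tx]
```

A real-time reader such as T1 in Figure 6 keeps seeing the old value of x while Tm prepares the new one, which is why it never has to wait during Tm's active phase.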
PRESENTATION OF PROPOSED PROTOCOL

Why rax protocol?

On the one hand, real-time transactions are short, mainly perform read operations, and are not allowed to wait. On the other hand, management transactions, which update the database, are long but may wait. The rax protocol is therefore well suited to this application: while data are updated by a long management transaction in its workspace, short real-time transactions are allowed to read these data in the database (see Figure 6). With the conventional rx protocol, real-time transactions T1, T2, and T3 would have had to wait for the commitment of the management transaction Tm. With rax, the throughput of real-time transactions is not interrupted, except during the short time when a management transaction waits, at the end of its active phase, for the termination of residual real-time transactions. However, the rax protocol needed adaptations: it had to be transposed into a distributed architecture, while maintaining and improving transaction throughput and access compatibility. Before the adaptation of the rax protocol is described, the rac protocol [13] and the reasons for not adapting it are briefly presented.

Why not rac protocol?

The rac protocol [1, 2, 8, 13] is the logical outcome of the rax protocol: a transaction is always allowed to read without waiting, in any situation. Consider T, a dependent transaction at the end of its active phase. T is immediately committed, the a-locks of T are converted into c-locks, and the workspace of T is maintained until the termination of the transactions on which T is dependent. A transaction T' is always allowed to read, without waiting, a data item committed by T: T' reads the old value in the database if T is dependent on T', or the new value in the workspace of T if T is not dependent on T'. The compatibility table for the rac protocol is shown in Figure 7.

| Requested \ Held | r | a | c |
|---|---|---|---|
| r | yes | yes | yes |
| a | yes | no | no |

Figure 7. Compatibilities for rac protocol

The rac protocol might seem better than the rax protocol, since a read operation of a real-time transaction would be executed immediately in any case. The author has not adapted the rac protocol because its consistency level is lower than that of the rax protocol: with the rac protocol, a transaction may read a value that is not the last committed value. As the rac and rax protocols are similar, a comparable adaptation based on the rac protocol is probably possible for applications that require less consistency.
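The difference in consistency between the two protocols shows up in what a reader sees for an item updated by T. The following sketch is a hypothetical illustration (the `Updater` record and both functions are assumptions, not part of the prototype): with rac, T is already committed and the reader gets either the old or the new value without waiting; with rax and the rax option, a reader on which T does not depend must wait, signalled here by `None`.

```python
from dataclasses import dataclass, field


@dataclass
class Updater:
    """Transaction T whose new values are held in its workspace."""
    workspace: dict
    depends_on: set = field(default_factory=set)  # readers T depends on


def rac_read(reader, item, t, database):
    """rac: the reader never waits (T is already committed, holding c-locks)."""
    if reader in t.depends_on:
        return database[item]  # old value: T is serialized after this reader
    return t.workspace[item]   # new value from T's retained workspace


def rax_read(reader, item, t, database):
    """rax with the rax option: read the old value only if T depends on the reader.

    Returns None when the reader would have to wait for T to commit.
    """
    if reader in t.depends_on:
        return database[item]
    return None
```

In the rac case, a reader on which T depends obtains 100 although T has already committed 200: this is the weaker consistency that led the author to keep rax.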
Adaptation of rax protocol

To transpose the rax protocol into a distributed environment, transactions are timestamped using a method comparable to those proposed for the rx protocol [14]. Transactions are timestamped when they are created and should access the data according to their timestamps (a recent transaction bears a higher timestamp than an older one). Transaction timestamping avoids deadlock. Distributed management of a wait-for graph, with its complex and poorly resolved associated problems [15], is thus avoided. Nevertheless, in a general distributed environment, a clock synchronization mechanism [16] must be provided for timestamping.

To reduce the risk of abortions, the general rax protocol with timestamping allows only partial compatibility between reads in the database and writes in workspaces. For the application, full compatibility between real-time transaction reads in the database and management transaction writes in workspaces has been implemented through a particular timestamping mechanism. The rax protocol implemented for the application includes other adaptations that improve real-time transaction throughput while allowing management transactions to execute under normal conditions. The following points are now described:
• the principles of the general rax protocol with transaction timestamping
• the particular timestamping mechanism
• the rax protocol implemented for the application

General rax protocol with transaction timestamping

The principles of the general rax protocol with transaction timestamping are presented through the following read, write, commit, and abort primitives. d is a data item, T is a transaction that reads or writes d, and ts(T) is the timestamp of T. T' is a transaction with a lock on d, and ts(T') is the timestamp of T'.

Algorithms

```
READ(d): CASE d OF
  d is unlocked or r-locked:
    T r-locks d; T reads d in the base;
  d is a-locked or x-locked by T':
    if ts(T) < ts(T') then
      T r-locks d; T reads d in the base;
      T' is made dependent on T;  /* the end of T must necessarily */
                                  /* precede the end of T'         */
    if ts(T) = ts(T') then        /* T and T' are the same transaction, which */
                                  /* wants to read a data item it has already a-locked */
      T reads d in its workspace;
    if ts(T) > ts(T') then
      T must wait for T';
END CASE
```

Comments
• T wants to read d and T' has a-locked d (T' is in its active phase). As T' has a-locked d, T' is preparing a new value of d in its workspace. One or several other transactions may have r-locked d. If these transactions are not terminated when T' reaches the end of its active phase, T' waits for them to end. If T' did not wait, these transactions could subsequently read d (or another item) in the base after the end of T', and would therefore precede T' for d but follow T' for d (or another item): the execution would not be equivalent to any serial execution (see Bernstein et al.'s serializability theory).
• T wants to read d and T' has x-locked d. T' is waiting for the end of one or several transactions before its effective commitment begins. (T cannot read d during the effective commitment of T', because the effective commitment is treated as a critical section from which the execution of any other transaction is excluded.) T' must also wait for the end of T. This cannot leave T' waiting indefinitely, as the number of transactions with timestamps lower than that of T' is finite when T' x-locks and can only decrease after this lock conversion.

```
WRITE(d): CASE d OF
  d is unlocked:
    T a-locks d; T writes d into its workspace;
  d is r-locked:                  /* by one or several T' transactions */
    T a-locks d; T writes d into its workspace;
    T is made dependent on all T' with an r-lock on d;
  d is a-locked or x-locked by T':
    if ts(T) > ts(T') then
      T must wait for T';
    if ts(T) = ts(T') then        /* T wants to re-write a data item it has already a-locked */
      T re-writes d into its workspace;
    if ts(T) < ts(T') then
      abort(T);                   /* deadlock prevention */
END CASE
```
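The READ and WRITE cases of the general protocol reduce to comparisons of timestamps. The Python sketch below encodes them as pure decision functions that return a description of the action to take; the names `State`, `read` and `write` and the returned strings are hypothetical, and lock bookkeeping and dependency lists are not modelled.

```python
from enum import Enum


class State(Enum):
    UNLOCKED = "unlocked"
    R = "r-locked"  # by one or several transactions
    A = "a-locked"  # by T' (possibly also r-locked)
    X = "x-locked"  # by T' (possibly also r-locked)


def read(ts_t, state, ts_holder=None):
    """READ(d) for T with timestamp ts_t; ts_holder is ts(T') if d is a- or x-locked."""
    if state in (State.UNLOCKED, State.R):
        return "r-lock, read base"
    if ts_t < ts_holder:
        # Older reader: no wait; T' becomes dependent on T.
        return "r-lock, read base, T' depends on T"
    if ts_t == ts_holder:
        # Equal timestamps: T and T' are the same transaction.
        return "read own workspace"
    return "wait for T'"  # a younger reader waits for an older writer


def write(ts_t, state, ts_holder=None):
    """WRITE(d) for T with timestamp ts_t."""
    if state == State.UNLOCKED:
        return "a-lock, write workspace"
    if state == State.R:
        # Always allowed; T depends on every reader (checked again at COMMIT).
        return "a-lock, write workspace, T depends on readers"
    if ts_t > ts_holder:
        return "wait for T'"
    if ts_t == ts_holder:
        return "re-write own workspace"
    return "abort T"  # an older T never waits for a younger T': deadlock prevention
```

In both functions, waits only ever go from a younger (higher-timestamp) transaction to an older one, so no cycle can form and no wait-for graph is needed.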
Comments
• T wants to write d and one or several T' transactions have r-locked d. If d is r-locked by one or several transactions (but neither a-locked nor x-locked), T is always allowed to a-lock d, even if its timestamp is lower than the timestamps of the transactions that have r-locked d. But if, at the end of its active phase, T is dependent on another transaction T' having a higher timestamp than T, T will be aborted (deadlock prevention). This strategy gives some chance of avoiding abortions. If T has already r-locked d, T simply converts its r-lock into an a-lock.
• T wants to write d and T' has a-locked or x-locked d. If ts(T) is lower than ts(T'), T is aborted, because T' may wait, or may subsequently wait, for T on another site: this would be a distributed deadlock.

| T1 (timestamp 1) | T2 (timestamp 2) |
|---|---|
| a-locks d1; writes d1 into its workspace | |
| a-locks d2; writes d2 into its workspace | |
| | requests an r-lock on d1 |

Figure 8. Partial r-a compatibility (time runs downwards)

```
COMMIT(T):  /* T has reached the end of its active phase */
  IF ∃ T' transactions with r-locks on data that T has a-locked
     and having a higher timestamp than T THEN
    T is aborted;  /* deadlock prevention */
  ELSE
    a-locks of T are converted into x-locks;
    IF ∃ T' transactions with r-locks on data that T has x-locked
       (and having a lower timestamp than T) THEN
      /* T is still dependent on all these T' transactions */
      T waits for the end of all T' transactions;
    /* T depends on no transaction: EFFECTIVE COMMITMENT (critical section) */
    the workspace of T is integrated into the database;
    locks held by T are released;
    T is withdrawn from all dependency lists;
    transactions waiting for T are reactivated according to their timestamps;
    /* END EFFECTIVE COMMITMENT */
```

Comments
T has reached the end of its active phase. The effective commitment begins once T depends on no transaction. For simplicity of presentation, the effective commitment is considered as a single critical section from which the execution of any other transaction is excluded. The reality must be more complex: the effective commitment is based on a distributed two-phase commit protocol and must include mutual exclusions and internal locks. If the effective commitment includes mutual exclusions, the transactions that want to access a data item being effectively committed by a transaction T must wait for T. This waiting cannot create a deadlock, as T will never wait for a data item of the base (serializability is preserved, as T was not dependent when the effective commitment began). The situation is similar for the abort operation.

```
ABORT(T):  /* critical section */
  the workspace of T is released;
  locks held by T are released;
  T is withdrawn from all dependency lists;
  transactions waiting for T are reactivated according to their timestamps;
```
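The COMMIT and ABORT primitives of the general protocol can be sketched in the same spirit. In this minimal Python illustration, the `Txn` record and its `reader_ts` field (assumed to hold the timestamps of unterminated transactions with r-locks on items that T has updated) are hypothetical; lock conversion, the distributed two-phase commit and the critical section appear only as comments, and a real implementation would re-run the check each time one of those readers terminates.

```python
from dataclasses import dataclass, field


@dataclass
class Txn:
    ts: int
    workspace: dict = field(default_factory=dict)
    # Timestamps of unterminated transactions holding r-locks on items T updated.
    reader_ts: set = field(default_factory=set)


def abort(t):
    t.workspace.clear()  # workspace released; the database is untouched
    return "aborted"


def commit(t, database):
    """One attempt at COMMIT(T); 'waiting' means retry once older readers end."""
    if any(r > t.ts for r in t.reader_ts):
        return abort(t)  # deadlock prevention: never wait for a younger reader
    # (the a-locks of T would be converted into x-locks at this point)
    if t.reader_ts:
        return "waiting"  # every remaining reader is older than T
    database.update(t.workspace)  # effective commitment (critical section)
    t.workspace.clear()
    return "committed"
```

Because T can only be left waiting for older readers, and their number can only decrease once T holds x-locks, the waiting state always ends.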
Partial r-a compatibility and full r-a compatibility

The transposed rax protocol allows only partial r-a compatibility: read transactions may have to wait for write transactions that hold a-locks. Consider the following execution: T1 and T2 are two transactions, timestamped 1 and 2 respectively (T1's timestamp < T2's timestamp), which access d1 and d2; d1 and d2 are located on two different sites (see Figure 8). According to the protocol, T2 must wait for T1's completion because T2 has a higher timestamp than T1. If transactions were allowed to r-lock data already a-locked by transactions with smaller timestamps, the transactions holding the a-locks would be preventively aborted if, at the end of their active phases, they were still dependent on transactions with higher timestamps. Thus, if T2 had been allowed to r-lock d1, and if T1 had reached the end of its active phase before T2's completion, T1 would have had to be aborted. Moreover, after r-locking d1, T2 could have issued an a-lock request for d2; in this case, if T1 had not been aborted, T2 would have created a distributed deadlock. In the application, if T1 and T2 are management transactions, the protocol thus avoids the risk of abortion for T1.
The particular timestamping mechanism is now presented. It provides both full r-a compatibility between real-time transaction reads and management transaction writes, and partial r-a compatibility between management transactions.
Timestamping mechanism

To achieve concurrency control between real-time transactions and management transactions, the timestamp of real-time transactions must always be lower than the timestamps of management transactions. A real-time transaction can thus always read data a-locked by a management transaction, as its timestamp is always lower than the management transaction's timestamp (full r-a compatibility between real-time transaction reads and management transaction writes). Conversely, management transactions are always allowed to wait for real-time transactions. Moreover, as real-time transactions are not allowed to wait, they cannot create deadlocks. Therefore, for concurrency control between real-time transactions, it is unnecessary to timestamp them or to construct a wait-for graph. It is sufficient to check that real-time transactions behave in accordance with the rules of the rax protocol with a wait-for graph, and to abort them whenever they would have to wait. Consequently, each real-time transaction is given a zero-valued timestamp. This timestamp is not used for concurrency control between real-time transactions, but is used to control concurrency between real-time transactions and management transactions. Management transactions are normally timestamped at the operation and maintenance site with a positive value, which is thus always higher than real-time transaction timestamps. This mechanism provides:
• full r-a compatibility between real-time transaction reads and management transaction writes
• full r-a compatibility between real-time transactions
• partial r-a compatibility between management transactions (to reduce the risk of abortion for management transactions)
Finally, as management transactions are timestamped at a single site (the operation and maintenance site), no clock synchronization mechanism is needed.
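The timestamping rule itself fits in a few lines. The sketch assumes a single allocator object living at the operation and maintenance site; `REAL_TIME_TS`, `OMTimestamper` and `is_real_time` are hypothetical names chosen for illustration.

```python
import itertools

REAL_TIME_TS = 0  # every real-time transaction receives timestamp 0


class OMTimestamper:
    """Hypothetical allocator run only at the operation and maintenance site.

    As it is the single source of management timestamps, a local counter is
    enough: no clock synchronization between sites is required.
    """

    def __init__(self):
        self._counter = itertools.count(1)  # positive, strictly increasing

    def next_management_ts(self):
        return next(self._counter)


def is_real_time(ts):
    return ts == REAL_TIME_TS
```

Since every management timestamp is at least 1, the test ts(T) < ts(T') always holds when a real-time transaction T meets data a-locked by a management transaction T', which is what yields full r-a compatibility in that direction.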
However, such a mechanism still allows the system to evolve towards fully distributed control, in which management transactions can be created at several sites.
Description of rax protocol implemented for application

Before presenting the algorithms, the case of the permanent dependency of a management transaction and the case of a condemned real-time transaction are discussed.
Permanent dependency of management transaction

When T, a management transaction with x-locks, is waiting only for the termination of real-time transactions, other real-time transactions are not allowed to r-lock the data x-locked by T. Otherwise, T could remain dependent on real-time transactions indefinitely. Nevertheless, if T is waiting for at least one management transaction, real-time transactions are allowed to r-lock the data x-locked by T. The problem of permanent dependency cannot arise because:
• the number of management transactions with a lower timestamp than that of T is finite
• real-time transactions are not allowed to r-lock data x-locked by a management transaction that is waiting only for real-time transactions
Thus, when T is waiting for at least one management transaction, r-x compatibility is provided between real-time transaction reads and management transaction writes.
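The rule that prevents permanent dependency can be expressed as a single predicate, evaluated when a real-time transaction meets data x-locked by a management transaction T. In this hypothetical sketch, `holder_waits_for` is an assumed list of the timestamps of the transactions T is still waiting for, using the zero/positive encoding of the timestamping mechanism.

```python
def real_time_may_read(holder_waits_for):
    """May a real-time transaction r-lock data x-locked by management transaction T?

    holder_waits_for: timestamps of the transactions T is still waiting for
    (0 = real-time, > 0 = management). A False answer means the real-time
    transaction is aborted, since it is not allowed to wait.
    """
    # Only while T waits for some management transaction: that wait is bounded
    # because finitely many management transactions are older than T.
    return any(ts > 0 for ts in holder_waits_for)
```

Refusing new real-time readers once T waits only for real-time transactions guarantees that the set T waits for can only shrink.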
Condemned real-time transactions

As there are many real-time read transactions, a real-time write transaction at the end of its active phase will often be dependent on other real-time read transactions. But such a transaction is not allowed to wait. So it is committed, and the real-time transactions on which it depends are not aborted but condemned. A condemned real-time transaction is aborted if it tries to access the database again. If a real-time transaction is condemned between its last access to the database and the end of its active phase, it completes normally. As real-time transactions perform few accesses to the base, this option gives condemned real-time transactions some chance of terminating normally.

Algorithms

Management transactions are distinguished by their timestamps and their identifiers, while real-time transactions are distinguished only by their identifiers. Read and/or write locks are associated with each data item. For each transaction T, there is a list of the identifiers of the transactions on which T depends and a list of the identifiers of the transactions that are waiting for T. The messages exchanged between the operation and maintenance site and the real-time sites are not discussed; it is simply stated that the operation and maintenance site is informed of the management transactions that are waiting at the real-time sites. Hence, for each real-time site, the operation and maintenance site manages the list of management transactions waiting there. The effective commitment of a management transaction is performed in two phases; this does not appear in the algorithm, but it is implemented in the prototype. In the prototype, the effective commitment is executed in a single critical section from which the execution of any other transaction is excluded.
If the effective commitment included several critical sections or mutual exclusions, management transactions that wanted to access a data item being effectively committed by another management transaction T would have to wait for T, whereas real-time transactions would have to be aborted (they are not allowed to wait). In future work, improvements will be made to implement the effective commitment with several mutual exclusions. The situation is similar for the abort operation.

A transaction can read the same data item several times or be dependent several times on the same transaction. These situations are not indicated in the description of the algorithms. The algorithms are now presented for the read, write, commit, and abort primitives. T is a transaction that reads or writes a data item d, ts(T) is the timestamp of T, and id(T) is its identifier. T' is a transaction with a lock on d, ts(T') is the timestamp of T', and id(T') is its identifier. Finally, for the sake of readability, the read and write primitives are described assuming that the transaction is not condemned (every condemned transaction that attempts to read or write is aborted).
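The READ and WRITE rules of the implemented protocol, listed next in pseudocode, can be condensed into two decision functions. The following Python sketch is illustrative only: the `Tx` record, the string state names, the `holder_waits_for_mgmt` and `condemned` flags, and the returned action strings are hypothetical, and locks and dependency lists are not maintained.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tx:
    ts: int  # 0 for real-time transactions, > 0 for management transactions
    id: str


def read(t, state, holder=None, holder_waits_for_mgmt=False, condemned=False):
    """READ(d): state is 'unlocked', 'r', 'a' or 'x'; holder is T' for 'a'/'x'."""
    if condemned:
        return "abort"
    if state in ("unlocked", "r"):
        return "read base"
    if state == "a":
        if t.ts < holder.ts:  # T' is a management transaction
            return "read base, T' depends on T"
        if t.ts == holder.ts and t.id == holder.id:  # same transaction
            return "read own workspace"
        if t.ts == holder.ts:  # two different real-time transactions
            return "read base, T' depends on T"
        return "wait for T'"  # T is a management transaction
    # state == 'x': T differs from T', which has ended its active phase
    if t.ts == 0:  # real-time reader: may not wait
        if holder_waits_for_mgmt:
            return "read base, T' depends on T"
        return "abort"
    if t.ts < holder.ts:
        return "read base, T' depends on T"
    return "wait for T'"


def write(t, state, holder=None, condemned=False):
    """WRITE(d) of the implemented protocol."""
    if condemned:
        return "abort"
    if state == "unlocked":
        return "a-lock, write workspace"
    if state == "r":
        return "a-lock, write workspace, T depends on readers"
    if t.ts > holder.ts:  # T is a management transaction
        return "wait for T'"
    if t.ts == holder.ts and t.id == holder.id:
        return "re-write own workspace"
    # Two different real-time transactions, or T' is a younger management one.
    return "abort"
```

Real-time transactions never reach a "wait" branch: every case that would make them wait either grants the read or aborts them, as the timestamping mechanism requires.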
READ(d):
CASE d OF
  d is unlocked or r-locked:
    T r-locks d; T reads d in the base;
  d is a-locked by T':*
    if ts(T) < ts(T') then   /* T' is a management transaction, because ts(T') is necessarily higher than 0 */
      T r-locks d; T reads d in the base; T' is made dependent on T;
    if ts(T) = ts(T') and id(T) = id(T') then   /* T and T' are the same transaction (management or real-time) */
      T reads d in its workspace;
    if ts(T) = ts(T') and id(T) <> id(T') then   /* T and T' are different real-time transactions */
      T r-locks d; T reads d in the base; T' is made dependent on T;
    if ts(T) > ts(T') then   /* T is a management transaction, because ts(T) is necessarily higher than 0 */
      T must wait for T';
  d is x-locked by T':†
    if ts(T) = 0 then   /* T is a real-time transaction */
      if T' waits for the end of a management transaction then
        T r-locks d; T reads d in the base; T' is made dependent on T;
      else abort(T);
    if ts(T) <> 0 then   /* T is a management transaction */
      if ts(T) < ts(T') then
        T r-locks d; T reads d in the base; T' is made dependent on T;
      else T must wait for T';
END CASE

WRITE(d):
CASE d OF
  d is unlocked:
    T a-locks d; T writes d into its workspace;
  d is r-locked:   /* by one or several T' transactions */‡
    T a-locks d; T writes d into its workspace; T is made dependent on all T' with an r-lock on d;
  d is a-locked or x-locked by T':
    if ts(T) > ts(T') then   /* T is a management transaction */
      T must wait for T';
    if ts(T) = ts(T') and id(T) = id(T') then   /* T and T' are the same management or real-time transaction */
      T re-writes d into its workspace;
    if ts(T) = ts(T') and id(T) <> id(T') then   /* T and T' are two different real-time transactions */
      abort(T);
    if ts(T) < ts(T') then   /* T' is a management transaction */§
      abort(T);
END CASE

COMMIT(T):   /* T has reached the end of its active phase */
IF ∃ T' transactions with r-locks on data that T has a-locked and having a higher timestamp than T
THEN T is aborted;   /* deadlock prevention */
ELSE
  a-locks of T are converted into x-locks;
  IF ts(T) = 0 THEN   /* T is a real-time transaction */
    if ∃ T' real-time transactions with r-locks on data that T has x-locked then
      /* T is still dependent on these T' real-time transactions */
      these real-time transactions are condemned;   /* they will be aborted if they later attempt to lock new data */
  IF ts(T) <> 0 THEN   /* T is a management transaction */
    if ∃ T' transactions with r-locks on data that T has x-locked (and having a lower timestamp than T) then
      /* T is still dependent on all these T' transactions */
      T waits for the end of all T' transactions;
  /* T depends on no transaction; EFFECTIVE COMMITMENT (critical section): */
  the workspace of T is integrated into the database;
  locks held by T are released;
  T is withdrawn from all dependency lists;
  transactions waiting for T are reactivated according to their timestamps;
  /* END EFFECTIVE COMMITMENT */

ABORT(T):   /* critical section */
  the workspace of T is released;
  locks held by T are released;
  T is withdrawn from all dependency lists;
  transactions waiting for T are reactivated according to their timestamps;

* If T is a real-time transaction, T can always read the data item.
† T is different from T', as T', having x-locked d, has terminated its active phase, while T is still in its active phase.
‡ T is allowed to write d into its workspace, but if, at the end of its active phase, T is dependent on other T' transactions having timestamps higher than that of T, T will be aborted (deadlock prevention). If T has already r-locked d, T converts its r-lock into an a-lock.
§ T and T' may both be management transactions and, in this case, T is aborted for deadlock prevention. If necessary, the concurrency between management transactions can still be improved by inserting the wound option for these transactions¹⁴ or the dynamic timestamps¹⁷, or both.

Information and Software Technology, Vol 34 No 9, September 1992

Analysis of abort situations
The most critical abort case is that of a real-time transaction that wants to r-lock data already x-locked by a management transaction. The simulation described in the Appendix estimates the frequency of this situation with an effective commitment during which the execution of other transactions is not permanently excluded, and compares the behaviour of the usual rx protocol with that of the rax protocol. Another abort case arises when a management transaction wants to a-lock data already a-locked or x-locked by another management transaction timestamped in reverse order. This rarely occurs, because management transactions are not frequently activated. If necessary, however, the concurrency between management transactions can still be improved by inserting the wound option for these transactions¹⁴ or the dynamic timestamps¹⁷, or both.

CONCLUSIONS
This adapted rax protocol has been implemented in a prototype DBMS designed by CNET for handling the data management aspect of future distributed multiservice communication systems. The prototype was developed from a version of the Sabrina centralized DBMS. The full compatibility between real-time transaction reads in the database and management transaction writes in workspaces allows a high throughput of real-time transactions (which are short and essentially perform read operations) while management transactions (which are long and perform numerous write operations) execute at the same site and on the same data. The read requests of management transactions never cause these transactions to be aborted, and real-time transactions never cause management transactions to be aborted. Thus, as long as they are few in number, management transactions have a high probability of being executed to completion within a reasonable time, while interfering as little as possible with the real-time transaction traffic.

ACKNOWLEDGEMENTS
The author thanks Y Gicquel, G Le Gac, Y Lepetit, and B Rodet, who designed and implemented the model incorporating the concurrency control system described in this paper. The author also thanks A Gravey, who designed and developed the quantitative evaluation partially presented in the Appendix, R Kerboul, with whom he had fruitful discussions, and the anonymous referees for their valuable comments.

REFERENCES
1 Bernstein, P A, Hadzilacos, V and Goodman, N Concurrency control and recovery in database systems Addison-Wesley (1987)
2 Kerboul, R, Nicaud, G and Pageot, J M 'Le controle de la concurrence dans les bases de donnees' Technical report NT/LAA/SLC 353 CNET, France (March 1991)
3 Marzullo, K 'Concurrency control for transactions with priorities' Technical report TR89-996 Cornell University, New York, USA (1989)
4 Abbott, R and Garcia-Molina, H 'Scheduling real-time transactions' SIGMOD Record Vol 17 No 1 (March 1988) pp 71-81
5 Lehman, T J and Carey, M J 'A concurrency control algorithm for memory-resident database systems' in Proc. 3rd Int. Conf. FODO (Lecture Notes in Computer Science Vol 367) Springer-Verlag (1989) pp 490-504
6 Cart, M, Ferrie, J and Richy, H 'Le controle de concurrence des transactions dans les environnements orientes objects' IVeme Journees Base de Donnees Avancees Vol 1 (May 1988) pp 117-138
7 Bayer, R 'Integrity, concurrency and recovery in databases' in Proc. ECI Conf. 1976 (Lecture Notes in Computer Science Vol 44) Springer-Verlag (1976) pp 77-106
8 Kiessling, W and Landherr, G 'A quantitative comparison of lockprotocols for centralized databases' in Proc. 9th VLDB Conf. Florence, Italy (31 October-2 November 1983) pp 120-130
9 Driouche, M, Gicquel, Y, Kerherve, B, Le Gac, G, Lepetit, Y and Nicaud, G 'Sabrina-RT, a distributed DBMS for telecommunications' in Proc. Int. Conf. Extending Database Technology (Lecture Notes in Computer Science Vol 303) Springer-Verlag (1988) pp 594-599
10 Gardarin, G et al. 'Sabrina: un système de gestion de bases de données relationnelles issu de la recherche' TSI Vol 5 No 6 (1986) pp 453-474
11 Gravey, A 'Evaluation et performances de Sabrina-RT' Internal report CNET (1987, revised 1991)
12 Eswaran, K P, Gray, J N, Lorie, R A and Traiger, I L 'The notions of consistency and predicate locks in a database system' Commun. ACM Vol 19 No 11 (November 1976) pp 624-633
13 Bayer, R, Heller, H and Reiser, A 'Parallelism and recovery in database systems' ACM Trans. Database Syst. Vol 5 No 2 (June 1980) pp 139-156
14 Rosenkrantz, D J, Stearns, R E and Lewis, P M 'System level concurrency control for distributed database systems' ACM Trans. Database Syst. Vol 3 No 2 (June 1978) pp 178-198
15 Elmagarmid, A K 'Survey of distributed deadlock detection algorithms' SIGMOD Record Vol 15 No 3 (September 1986) pp 37-45
16 Lamport, L 'Time, clocks, and the ordering of events in a distributed system' Commun. ACM Vol 21 No 7 (July 1978) pp 558-565
17 Bayer, R, Elhardt, K, Heigert, J and Reiser, A 'Dynamic timestamp allocation for transactions in database systems' in Proc. 2nd Int. Sympos. Distributed Data Bases Germany (September 1982) pp 9-20

Figure 9. Mean number of lost TI per TG on a real-time site (20 real-time sites, 20 relations per site, 150 TI/s/site)

Protocol   CPU load = 15%   CPU load = 45%
rax        2.53             7.85
rx         5.80             17.82
APPENDIX: QUANTITATIVE EVALUATION
A simulation (written in the Simscript 2.5 language) was designed and developed at CNET¹¹ to evaluate the performance expected of a real system based on the principles of the DBMS prototype⁹. The main results concerning concurrency control are presented here. The simulation measures the number of real-time transactions lost because of concurrency between real-time transactions (TI) and management transactions (TG). Two types of TG have been considered, namely, TGrx and TGrax (management transactions operating under the rx and the rax protocol, respectively).
Simulation hypotheses
System
The evaluation was made for 20 real-time sites and 20 relations per site. The relation is the locking granularity. These 20 relations can be regarded as the highly active part of the database system, where the conflict rate is most significant.
Transactions
The TG (long write transactions) write on all the sites: they update replicated data. A TG modifies only one relation. One TG is activated every two minutes (30 TG per hour), and at any time there is at most one TG in the system. The TI (short read transactions) are activated every 1/150 second and are executed sequentially. Each TI reads only one relation and holds the local processor during its read. The read time of a TI (CPUTI) is the base time unit of the simulation; the authors chose 1 ms as the read time of a TI.
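To give a sense of scale, the arrival rates above can be turned into the number of TI executed between two consecutive TG activations. This is a small arithmetic sketch that assumes strictly periodic arrivals, as the description suggests; the constant names are illustrative only.

```python
TI_PER_SECOND_PER_SITE = 150  # one TI every 1/150 s on each real-time site
TG_INTERVAL_S = 120           # one TG activated every two minutes
REAL_TIME_SITES = 20


def workload_per_tg_interval():
    """TI volume between two consecutive TG activations, assuming periodic arrivals."""
    ti_per_site = TI_PER_SECOND_PER_SITE * TG_INTERVAL_S
    return {
        "tg_per_hour": 3600 // TG_INTERVAL_S,
        "ti_per_site": ti_per_site,
        "ti_all_sites": ti_per_site * REAL_TIME_SITES,
    }


print(workload_per_tg_interval())
# {'tg_per_hour': 30, 'ti_per_site': 18000, 'ti_all_sites': 360000}
```

Each TG therefore runs against a background of roughly 18 000 TI per real-time site, which is why the loss figures are reported per TG.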
The TG are executed in two steps: 'prewrite' ('a' reservation) and 'write' ('x' reservation during the commit phase). Each step comprises two states: a 'CPU state', in which the TG holds the processor (CPUa and CPUx), and a 'wait state' ('waita' and 'waitx'), in which the TG has locked the data but does not hold the processor. The commit phase can therefore be regarded as an effective commitment during which the TG does not permanently exclude the execution of other transactions. TI reads are incompatible with 'waitx' for TGrax, and with both 'waita' and 'waitx' for TGrx. CPUa and CPUx were assumed to be five times CPUTI. The duration of 'waita' and 'waitx' is determined by the time the TG must wait for the end of the prewrites or writes on all the real-time sites. A time constant equal to CPUa was added to 'waita' to include the period during which the TG handles local data on the operation and maintenance site before the commit phase. This constant is a low estimate of the time taken by the work of a TG on the operation site during its prewrites ('a' reservations) on the real-time sites. The rax protocol becomes even more advantageous as this work grows, because a longer 'waita' blocks TI reads only under the rx protocol.
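The difference between TGrx and TGrax comes down to which wait states block TI reads. The following Python sketch adds up, per TG, the time during which TI reads on the modified relation conflict. CPUa = CPUx = 5 × CPUTI and the constant added to 'waita' come from the text; the 20 ms remote wait is a purely hypothetical value chosen for illustration, and the names are not from the simulator.

```python
from enum import Enum


class TGState(Enum):
    """States of a management transaction (TG) on a real-time site."""
    CPU_A = "CPUa"    # prewrite step, TG holds the processor
    WAIT_A = "waita"  # prewrite step, data a-locked, processor released
    CPU_X = "CPUx"    # write step, TG holds the processor
    WAIT_X = "waitx"  # write step, data x-locked, processor released


# Wait states in which a TI read on the TG's relation is incompatible.
INCOMPATIBLE = {
    "rax": {TGState.WAIT_X},
    "rx": {TGState.WAIT_A, TGState.WAIT_X},
}


def blocking_time_ms(protocol, durations_ms):
    """Total time per TG during which TI reads on the modified relation conflict."""
    return sum(d for state, d in durations_ms.items() if state in INCOMPATIBLE[protocol])


CPU_TI_MS = 1.0
REMOTE_WAIT_MS = 20.0  # hypothetical wait for the other real-time sites

example_durations = {
    TGState.CPU_A: 5 * CPU_TI_MS,
    TGState.WAIT_A: REMOTE_WAIT_MS + 5 * CPU_TI_MS,  # plus the constant equal to CPUa
    TGState.CPU_X: 5 * CPU_TI_MS,
    TGState.WAIT_X: REMOTE_WAIT_MS,
}

for protocol in ("rax", "rx"):
    print(protocol, blocking_time_ms(protocol, example_durations))
# rax 20.0
# rx 45.0
```

Under these assumed durations the conflict window of a TGrx is more than twice that of a TGrax, and the gap widens as 'waita' grows.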
Scheduling policy on real-time sites
The TI have priority: a TG is activated only when no TI is waiting for the processor, but it is reactivated with priority for its write step. The case of a real-time transaction that has not terminated when the management transaction wants to commit is not considered; conflicts between TG and TI during the commit phase of the TG are, however, taken into account. If a TI holds the processor for 1 ms, the CPU load of a real-time processor is 15% for 150 TI per second:
CPUload (%) = (number of TI per second × CPUTI in ms)/10.
The authors consider the CPU load of the TG to be negligible. The simulation was executed with a CPU load of 15% and with a CPU load of 45% (1 ms and 3 ms read time for a TI, respectively).
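The CPU load formula can be checked directly. The factor 1/10 converts milliseconds of busy time per second into a percentage (divide by 1000 ms, multiply by 100). The function name below is illustrative.

```python
def cpu_load_percent(ti_per_second: float, cpu_ti_ms: float) -> float:
    """CPU load of a real-time processor due to TI reads, in percent.

    ti_per_second * cpu_ti_ms is the busy time in ms per second; dividing by
    1000 ms and multiplying by 100 gives a percentage, hence the factor 1/10.
    """
    return ti_per_second * cpu_ti_ms / 10


print(cpu_load_percent(150, 1))  # 15.0
print(cpu_load_percent(150, 3))  # 45.0
```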
Simulation results
Figure 9 shows the results of the simulation for 1000 generated TG. In both cases (CPU load = 15% and CPU load = 45%), the mean number of lost TI with the rax protocol is less than half that with the rx protocol (2.53 versus 5.80, and 7.85 versus 17.82, respectively).
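The "less than half" claim follows from the Figure 9 values. A short Python check of the rx/rax ratio for each CPU load:

```python
# Mean number of lost TI per TG on a real-time site (Figure 9), keyed by CPU load (%).
LOST_TI_PER_TG = {
    15: {"rax": 2.53, "rx": 5.80},
    45: {"rax": 7.85, "rx": 17.82},
}


def rx_over_rax():
    """Ratio of TI lost under rx to TI lost under rax, for each CPU load."""
    return {load: v["rx"] / v["rax"] for load, v in LOST_TI_PER_TG.items()}


for load, ratio in rx_over_rax().items():
    print(f"CPU load {load}%: rx/rax = {ratio:.2f}")
# CPU load 15%: rx/rax = 2.29
# CPU load 45%: rx/rax = 2.27
```

The ratio stays close to 2.3 at both loads, so tripling the TI read time roughly triples the losses under each protocol without eroding the relative advantage of rax.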