Computers ind. Engng, Vol. 31, No. 3/4, pp. 901-905, 1996

Pergamon

Copyright © 1995. Published by Elsevier Science Ltd. Printed in Great Britain. S0360-8352(96)00274-4 0360-8352/96 $15.00 + 0.00

Improving the Flexibility for Replicated Data Management in Distributed Database Systems

Chienwen Wu
Department of Industrial Engineering
National Taipei Institute of Technology

Geneva G. Belford
Department of Computer Science
University of Illinois at Urbana-Champaign
Abstract: This paper presents a novel scheme for implementing the flexible replica control protocol [13] in distributed database systems. The scheme requires fewer nodes to be locked to perform read (write) operations. This not only provides better performance, but also gives the system designer extra flexibility in implementing the protocol.

Key words: Replica control, Distributed database systems, Quorum.

1. INTRODUCTION

Many industrial and commercial applications require data to be accessed efficiently. Manufacturing systems, banking systems, and management information systems are such examples. One way to access data efficiently is to replicate it at different sites of the system. In this way, a user can access the nearest data copy for the best performance. In addition, even if some data copies fail, the user can still access the data. However, a synchronization protocol is needed to keep the data copies consistent. Conventional protocols [1-11] for replica control are either very efficient or very fault tolerant, but not both. Once a system adopts a particular protocol, it is very difficult to change the protocol for better performance or better fault tolerance.

A flexible protocol for replicated data management is presented in [13]. The protocol has the following advantages:

(1) The protocol is very flexible. The performance and fault tolerance of the protocol can be adjusted by appropriately setting parameters.
(2) The protocol is a general case of many existing protocols. It is shown that the quorum consensus protocol [2], the grid protocol [3], the hierarchical quorum consensus protocol [5] and the hierarchical grid protocol [6] are all special cases of our protocol.
(3) The smallest quorum size of this protocol is O(√N), which is proved optimal in a fully distributed environment.
(4) The protocol offers asymptotically high availability, i.e., the availability of this protocol increases to 1 as the number of data copies increases to infinity.

This paper presents a novel scheme for implementing the flexible replica control protocol [13] in distributed database systems. The scheme requires fewer nodes to be locked to perform read (write) operations. This not only provides better performance, but also gives the system designer extra flexibility to implement the protocol to fit system requirements.

This paper is organized as follows. A brief description of the protocol proposed in [13] is presented in section 2. In section 3, we provide an improved implementation of the protocol in distributed database systems. Conclusions are provided in section 4.

2. A BRIEF DESCRIPTION OF THE PROTOCOL

In this section, a brief description of the flexible protocol proposed in [13] is presented. For detailed descriptions of this protocol, please refer to [13]. In this protocol, data copies are organized as an m-level tree. The root is at level m. Real data copies are at level 0. The non-leaf nodes do not have real data copies on them and are used only to define the protocol. Each internal node at level (i+1) of the tree has l_{i+1} children, each of which is a node at level i. Figure 1 shows a data object with nine data copies organized as a two-level tree. In this example, l_1 = 3 and l_2 = 3. Nodes e to m are real data copies and nodes a to d are non-leaf nodes.


18th International Conference on Computers and Industrial Engineering

Figure 1: An example of a data object with nine data copies organized as a two-level tree.

A read (write) quorum of data copies must be locked first to perform a read (write) operation. The write quorum is defined in terms of a read quorum and a partial write quorum. The read quorum, the partial write quorum and the write quorum are defined recursively.

Definition 1: The read quorum of a leaf node is defined as the set containing the real data copy on it. A read quorum of a non-leaf node (including the root) at level (i+1) is defined as containing read quorums of r_{i+1} of its children, where 0 < r_{i+1} <= l_{i+1}. A read quorum of a data object is defined as a read quorum of the root of the tree for the data object.

Definition 2: The partial write quorum of a leaf node is defined as the set containing the real data copy on it. A partial write quorum of a non-leaf node at level (i+1) is defined as containing partial write quorums of pw_{i+1} of its children, where 0 < pw_{i+1} <= l_{i+1}. A partial write quorum of a data object is defined as a partial write quorum of the root of the tree for the data object.

Definition 3: A write quorum of a data object is defined as the union of a read quorum and a partial write quorum of the data object.

In order to ensure one-copy serializability [14], it is required that for each i, r_i + pw_i > l_i. For the proof of one-copy serializability, please refer to [13].

Example 2: Take the two-level tree in Figure 1 for example. Assume r_1 = 1, r_2 = 3, pw_1 = 3 and pw_2 = 1. {e,h,k} is a read quorum of the data object and {e,f,g} is a partial write quorum of the data object. Thus {e,f,g,h,k} is a write quorum of the data object.

3. THE IMPLEMENTATION

This section provides a scheme for implementing this protocol in distributed database systems. In this protocol, we require each node in the system to keep information on which node is in which position of the tree.
Because the read quorum and the partial write quorum are defined in very similar ways, they can share a common scheme. In the following, we first describe the common scheme, Lock_Basic_Quorum. By passing different parameters to it, the scheme can be used to lock either a read quorum or a partial write quorum of data copies.

3.1 A common scheme for the read quorum and the partial write quorum

When a node wants to perform a read (partial write) operation on a data object, the node has to lock a read (partial write) quorum of data copies first. The following scheme Lock_Basic_Quorum(NL, i, r_pw) is provided to lock a read (partial write) quorum of the non-leaf node NL at level i. The scheme is described using a C-like language. The scheme Lock_Basic_Quorum(NL, i, r_pw) returns 1 if it can lock a read (partial write) quorum of the non-leaf node NL at level i. Otherwise it returns 0.

Scheme Lock_Basic_Quorum(NL, i, r_pw)
{
    if (r_pw == READ) b_i = r_i; else b_i = pw_i;
    if (i == 0) {
        status = request_lock(NL);
        if (status == lock_obtained) return(1);
        else return(0);
    }
    else {
        num_locked = 0;
        num_fail = 0;
        for (num_examined = 1;
             (num_examined <= l_i) && (num_locked < b_i) && (num_fail < l_i - b_i + 1);
             num_examined++) {
            child = (num_examined)th child of NL;
            if (Lock_Basic_Quorum(child, i-1, r_pw) == 1) num_locked++;
            else num_fail++;
        }
        if (num_locked >= b_i) return(1);
        else return(0);
    }
}

The scheme Lock_Basic_Quorum(NL, i, r_pw) first checks whether it is to lock a read quorum or a partial write quorum. If it is to lock a read (partial write) quorum, b_i is set to r_i (pw_i). It then checks whether i is equal to 0. If i is 0, NL is a real data copy, and the scheme issues a lock request to that data copy. If the lock can be obtained, the scheme returns 1; otherwise it returns 0. If i is not 0, it tries to lock read (partial write) quorums of r_i (pw_i) of NL's children. The scheme Lock_Basic_Quorum is called recursively to lock read (partial write) quorums of NL's children. This process is repeated until read (partial write) quorums of r_i (pw_i) of NL's children have been locked, or it is found to be impossible to lock read (partial write) quorums of r_i (pw_i) of NL's children.

3.2. Scheme to lock a read quorum of data copies

This is done by calling Lock_Basic_Quorum(NL, m, READ), where NL is the root of the tree for the data object.

3.3. Scheme to lock a partial write quorum of data copies

This is done by calling Lock_Basic_Quorum(NL, m, PARTIAL_WRITE), where NL is the root of the tree for the data object.

3.4. Scheme to lock a write quorum of data copies

Because the write quorum of a data object is defined as the union of a read quorum of the data object and a partial write quorum of the data object, one easy way to construct the algorithm is to call the schemes in the previous two sections to construct the read quorum and the partial write quorum. This scheme is shown as follows.

Scheme Lock_Write_Quorum(NL, i)
{
    Lock_Basic_Quorum(NL, i, READ);
    Lock_Basic_Quorum(NL, i, PARTIAL_WRITE);
}

This scheme is indeed a correct algorithm. However, its disadvantage is that when we try to lock the partial write quorum, we do not know which data copies have already been locked in Lock_Basic_Quorum(NL, i, READ). If we knew which data copies had been locked in Lock_Basic_Quorum(NL, i, READ), we could include those data copies in the partial write quorum. One solution is that when we try to lock a read quorum, we also consider locking a partial write quorum at the same time. This means we might need to consider locking a read quorum and a partial write



quorum of a non-leaf node at the same time. To assist in this, we provide a recursive definition for the write quorum.

Definition 4: A write quorum of a non-leaf node at level (i+1) is defined as the union of a read quorum of the non-leaf node and a partial write quorum of the non-leaf node.

To lock a read quorum and a partial write quorum of a non-leaf node at the same time is thus to lock a write quorum of the non-leaf node. In the following, we provide the scheme Lock_Write_Quorum(NL, i) to lock a write quorum of a non-leaf node NL at level i. The scheme returns BOTH_R_AND_PW, R_ONLY, PW_ONLY or NOT_R_AND_NOT_PW according to whether it locks both quorums, only a read quorum, only a partial write quorum, or neither.

Scheme Lock_Write_Quorum(NL, i)
{
    if (i == 0) {
        status = request_lock(NL);
        if (status == lock_obtained) return(BOTH_R_AND_PW);
        else return(NOT_R_AND_NOT_PW);
    }
    else {
        num_R_locked = 0; num_R_fail = 0;
        num_PW_locked = 0; num_PW_fail = 0;
        for (num_examined = 1;
             (num_examined <= l_i) &&
             (num_R_locked < r_i) && (num_R_fail < l_i - r_i + 1) &&
             (num_PW_locked < pw_i) && (num_PW_fail < l_i - pw_i + 1);
             num_examined++) {
            child = (num_examined)th child of NL;
            status = Lock_Write_Quorum(child, i-1);
            if ((status == BOTH_R_AND_PW) || (status == R_ONLY)) num_R_locked++;
            if ((status == BOTH_R_AND_PW) || (status == PW_ONLY)) num_PW_locked++;
            if ((status == NOT_R_AND_NOT_PW) || (status == PW_ONLY)) num_R_fail++;
            if ((status == NOT_R_AND_NOT_PW) || (status == R_ONLY)) num_PW_fail++;
        }
        for (; (num_examined <= l_i) && (num_R_locked < r_i) && (num_R_fail < l_i - r_i + 1);
             num_examined++) {
            child = (num_examined)th child of NL;
            if (Lock_Basic_Quorum(child, i-1, READ) == 1) num_R_locked++;
            else num_R_fail++;
        }
        for (; (num_examined <= l_i) && (num_PW_locked < pw_i) && (num_PW_fail < l_i - pw_i + 1);
             num_examined++) {
            child = (num_examined)th child of NL;
            if (Lock_Basic_Quorum(child, i-1, PARTIAL_WRITE) == 1) num_PW_locked++;
            else num_PW_fail++;
        }
        if ((num_R_locked >= r_i) && (num_PW_locked >= pw_i)) return(BOTH_R_AND_PW);
        else if (num_R_locked >= r_i) return(R_ONLY);
        else if (num_PW_locked >= pw_i) return(PW_ONLY);
        else return(NOT_R_AND_NOT_PW);
    }
}

Because the Lock_Write_Quorum(NL, i) scheme considers locking a read quorum and a partial write quorum of a non-leaf node at the same time, fewer nodes need to be locked to collect a write quorum, so better performance results. In addition, this gives system designers extra flexibility in data access and data copy placement.

4. CONCLUSIONS

This paper presents an improved scheme for implementing the flexible protocol. The improved implementation scheme requires fewer nodes to be locked, which results in better performance. It also provides system designers extra flexibility in data access and data copy placement.

REFERENCES

1. Agrawal, D. and A. El Abbadi, "Exploiting Logical Structures in Replicated Databases", Information Processing Letters, no. 33, pp. 255-260, 1990.
2. Gifford, D. K., "Weighted Voting for Replicated Data", In Proceedings of the 7th Symposium on Operating Systems Principles, pp. 150-162, 1979.
3. Cheung, S. Y., M. Ammar, and M. Ahamad, "The Grid Protocol: A High Performance Scheme for Maintaining Replicated Data", In Proceedings of the International Conference on Data Engineering, pp. 438-445, 1990.
4. Agrawal, D. and A. El Abbadi, "An Efficient Solution to the Distributed Mutual Exclusion Problem", In Proceedings of the Principles of Distributed Computing Systems, pp. 193-200, 1989.
5. Kumar, A., "Hierarchical Quorum Consensus: A New Algorithm for Managing Replicated Data", IEEE Transactions on Computers, no. 40, pp. 996-1004, 1991.
6. Kumar, A. and S. Y. Cheung, "A High Availability Hierarchical Grid Algorithm for Replicated Data", Information Processing Letters, 1992.
7. Maekawa, M., "An O(√N) Algorithm for Mutual Exclusion in Decentralized Systems", ACM Transactions on Computer Systems, no. 3, pp. 145-159, 1985.
8. Thomas, R. H., "A Majority Consensus Approach to Concurrency Control for Multiple Copy Databases", ACM Transactions on Database Systems, no. 4, 1979.
9. Wu, C. and Geneva G. Belford, "The Triangular Lattice Protocol: A Highly Fault Tolerant and Highly Efficient Protocol for Replicated Data", In Proceedings of the 11th Symposium on Reliable Distributed Systems, pp. 66-73, 1992.
10. Wu, C., "A Fault Tolerant O(√N) Algorithm for Distributed Mutual Exclusion", In Proceedings of the 12th International Conference on Computers and Communications, pp. 175-150, 1993.
11. Wu, C. and Geneva G. Belford, "Achieving High Performance and Fault Tolerance for Distributed Mutual Exclusion", In Proceedings of the Conference on Information Sciences and Systems, pp. 251-255, 1993.
12. Wu, C., "A High Performance Scheme for Managing Replicated Data in Distributed Database Systems", In Proceedings of the Chinese Institute of Industrial Engineers National Conference, pp. 524-529, 1994.
13. Wu, C. and Geneva G. Belford, "On Implementing a Replica Control Protocol for Flexible Performance and Fault Tolerance in Distributed Database Systems", 10th Technical Education Conference, 1994.
14. Bernstein, P. A., V. Hadzilacos, and N. Goodman, "Concurrency Control and Recovery in Database Systems", Addison-Wesley, MA, 1987.
