Performance analysis of send-on-demand: a distributed database concurrency control protocol for high-speed networks

Sujata Banerjee*, Victor O K Li† and Chihping Wang‡
In this paper, we consider an application of high-speed wide area networks, viz. that of distributed database systems (DDBS). In a high-speed environment, although the transmission delay decreases, the propagation delay (communication latency) now becomes the throughput thwarting factor. Existing Concurrency Control (CC) protocols do not address the communication latency problem. This paper is a continuation of our research work in this area. First, we very briefly summarize our previously reported research results, viz. the issues that are important for DDBS in high-speed wide area networks, and the description of a new CC protocol called 'send-on-demand'. A detailed comparison of this protocol with that of locking is provided, using analytical and simulation techniques. The datacycle scheme performs very well in a read-oriented environment, and the rest of the paper deals with developing and comparing hybrid CC schemes using datacycling for queries. Altogether, the performance of four concurrency control schemes is compared.
Keywords: distributed databases, high-speed WANs, concurrency control protocol, performance analysis
*Department of Information Science, University of Pittsburgh, Pittsburgh, PA 15260, USA. †Department of Electrical Engineering-Systems, University of Southern California, Los Angeles, CA 90089-2565, USA. ‡OmniScience Object Technology, Santa Clara, CA 95054, USA. This research was performed at the University of Southern California. Paper received: 15 February 1993; revised paper received: 24 August 1993.

The rapid evolution of high-speed networks is prompting research in many areas, particularly in the distributed systems of the future. In this paper, we report our continuing research on distributed database systems in a
high-speed environment. Our interest in this area is motivated by our belief that most of the traffic in the networks of the future will be due to distributed applications such as this. Further, the network speed is expected to continue to increase, and it is important to develop efficient algorithms that scale well with the network data rate. In the OSI network hierarchy, these applications would be implemented in the highest layer (application layer). We attempt to merge two rapidly growing research areas, high-speed networks and distributed systems, and propose database Concurrency Control (CC) algorithms that can sustain a high transaction throughput in the new environment.

In a distributed database system (DDBS), the data is partitioned into several smaller databases and distributed over multiple computer sites in a computer network. Traditionally in DDBS research, the computer network has been considered a performance bottleneck, and a lot of research effort has been directed towards the design of database operations that minimize the data transmission cost. With the development of high-speed networks, the network transmission costs go down, and new algorithms that efficiently utilize the huge available bandwidth are required. At first glance, it seems as if DDBS (with traditional protocols) performance will improve if the low-speed computer network is transformed overnight into a high-speed network. This is in fact true. However, as the network data rate continues to increase in the future, the performance will soon saturate, and there will be no noticeable improvement with further increases in the network speed. The reason is that at high network data rates, the transmission cost (which continues to decrease as the speed is increased) is no longer the bottleneck. The new
0140-3664/94/030189-16 © 1994 Butterworth-Heinemann Ltd
computer communications volume 17 number 3 march 1994
Performance analysis of the send-on-demand protocol: S Banerjee et al.
bottleneck is the propagation delay, which is independent of the network data rate. Hence, in our attempts to derive better performance from the high-speed network, solutions to this communication latency problem will have to be found[1, 2]. Owing to the importance of this topic, several of the gigabit network testbed projects (specifically, CASA and Aurora) have ongoing efforts in the distributed computing applications area[3, 4]. The only database system known to us that exploits the huge bandwidths available to obtain higher transaction throughputs is the datacycle database machine[5-9]. The datacycle machine is a centralized database system, although it can also be implemented in a distributed manner. If the distributed datacycle architecture were to be employed for a DDBS, the query processing scheme would work well, but the update mechanism (described later) would prove to be a severe limitation. We have concentrated our efforts on developing a new concurrency control protocol called send-on-demand that works better than traditional protocols (for example, locking) in a high-speed environment[10]. In this paper, we present a very detailed comparison study between send-on-demand and locking at different data rates and different read access distributions*. Then, going a step further, we propose and compare two hybrid CC schemes, combining the datacycle concept with send-on-demand and locking. The performance comparison was done using analytical models for each of the CC protocols proposed. This performance model, which was validated using simulations, is also described in this paper. All of the results obtained point towards the superiority of the send-on-demand CC protocol. The superior performance is due to the capability of the proposed protocol to hide the communication latency, and to efficiently convert the available communication bandwidth into transaction throughput. Some preliminary results on the failure-recovery aspect have been obtained.
However, in this paper, those issues are not addressed, so as to keep the paper focused on the concurrency control problem. For the sake of continuity, the next section summarizes our previously reported research results, albeit very briefly. Details may be obtained elsewhere[10].
PREVIOUS RESEARCH

DDBSs in a high-speed environment have some unique characteristics. These were identified earlier[10], and based on them the new concurrency control protocol termed 'send-on-demand' was developed. The most important consideration in a high-speed WAN is the communication latency, or the signal propagation delay. The success of any protocol in a high-speed environment depends upon the effective camouflaging of the propagation delay.

*In our earlier paper[10], the performance was compared in a write-only environment.
Minimizing sequential message passing is one way to reduce the total propagation delay incurred. Further, owing to the anticipated heavy volume of traffic in a high-speed DDBS, the level of data contention is also quite high. Thus, optimistic CC schemes will not be very effective[11, 12]. Also, transaction rollbacks and deadlocks should be avoided to the maximum possible extent, as their occurrence in a high data contention environment will potentially affect a very large number of transactions. These issues have been taken into account while designing the send-on-demand protocol, as seen in the following subsection.
Send-on-demand: a new CC protocol

A detailed explanation of the send-on-demand scheme may be obtained elsewhere[10]. A brief description follows, along with an illustrative example. In a traditional DDBS, each data object is located at a particular computer site, and all requests to access it have to be processed by the computer site where the data object resides. This leads to an exchange of a sequence of messages between the transaction-initiating site and the site where the data object is located. For instance, in the locking protocol, a sequence of three messages (lock request, lock grant and lock release) is required every time a transaction wants to access a data object. In our effort to reduce the number of sequential messages, and also promote cooperative execution between all the sites in the DDBS, the following CC algorithm termed 'send-on-demand' was proposed. The database correctness notion was that of conflict serializability[11, 12]. To avoid rollbacks to the maximum extent, the consistency criterion of the highest degree (degree 3, as defined by Gray et al.[13]) was used. In the new scheme, every incoming transaction T broadcasts its access set information along with its arrival (at the head-of-line position) timestamp TS(T) to all the sites in the DDBS. Every site receiving the broadcast message compiles this information to construct a 'claim-queue' for each of the data objects located at that site. The claim-queue for a data object contains a list of transactions that require access to that data object, in the order of their arrival timestamps. Further, no transaction is processed until every site in the DDBS has received the corresponding broadcast information. The transaction is then said to be confirmed, and the duration a site has to wait until a transaction is confirmed is termed the confirmation duration. Thus a claim-queue at any point in time contains a set of confirmed transactions and a set of unconfirmed transactions.
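The claim-queue bookkeeping described above can be sketched as follows. This is a minimal illustration in our own notation (the class and function names are ours, not from the paper); it shows only how timestamp-ordered insertion gives every site an identical queue ordering once a transaction is confirmed.

```python
# Hypothetical sketch of claim-queue maintenance, assuming each broadcast
# message carries (transaction id, arrival timestamp, access set).
import bisect

class ClaimQueue:
    """Per-data-object queue of claims, ordered by arrival timestamp."""
    def __init__(self):
        self.claims = []  # list of (timestamp, txn_id), kept sorted

    def add_claim(self, timestamp, txn_id):
        # Insert in timestamp order; identical ordering at every site
        # (after the confirmation duration) is what rules out deadlock.
        bisect.insort(self.claims, (timestamp, txn_id))

    def head(self):
        return self.claims[0] if self.claims else None

def process_broadcast(queues, txn_id, timestamp, access_set):
    """On receiving a transaction's broadcast, file a claim for each
    data object in its access set."""
    for obj in access_set:
        queues.setdefault(obj, ClaimQueue()).add_claim(timestamp, txn_id)
```

Because every site files the same (timestamp, transaction) pairs and orders them identically, the claim queues agree everywhere once the confirmation duration has elapsed, regardless of the order in which broadcasts arrive.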
The confirmation duration (B) of a transaction is set greater than or equal to the time required to send the access set information to all the sites. This ensures that at every site the claim queues have entries in the same order, and thus no deadlocks can occur. B is decided during DDBS design,
and is one of the most important parameters in this protocol. From the illustrative example that follows, it becomes clear how B is calculated. Initially, all the data objects reside at certain sites. When a confirmed transaction reaches the HOL of a particular claim queue, the site holding the data object transmits it to the site that initiated the confirmed transaction. Along with the physical copy of the data object, all associated responsibilities of maintaining the data object are also transferred to that site. Every transaction-initiating site waits for the entire access set to arrive at its location, finishes processing the transaction, and then sends out the data objects to other sites that are next on the respective claim queues. In traditional systems, to maintain the atomicity of transactions, global commit protocols like the two-phase commit[11] may have to be implemented, which requires message passing between the sites that hold the data objects in the access set. However, in the new protocol, at commit time all the data objects in the access set are physically located at one site, requiring only a local commit procedure, which has a significantly smaller communication cost. Consecutive reads can be processed concurrently[12]; this has been taken care of in the send-on-demand algorithm, as summarized in Figure 1. A simple DDBS example with a 3-node network and three data objects illustrating the send-on-demand protocol is given in Figure 2. The communication links are labelled with the communication time (transmission + propagation delay, the major component of which in high-speed networks is the propagation time). The confirmation duration B is the time required to broadcast a message to all the sites, and thus is set to a value of two units. To exemplify the differences
All sites:
• Update claim queues as and when transactions are confirmed.

Sites processing updates:
• Broadcast the entire access set (read set + write set) of the update to all the sites.
• Wait for the following events:
  - All the data objects in the access set arrive.
  - The 'completion' message arrives from the transactions that hold a read-copy of a data object in the write set.
• Execute the update.
• If holding a read-copy of data object Dj, send a 'completion' message to the next write in the claim queue for Dj.
• If holding the write-copy of Dj, send a read-copy of Dj to all sites which have a read entry in the claim queue before the next write, and send the write-copy to the next write site.

Figure 1 Send-on-demand CC algorithm
between locking and send-on-demand, the same example has been worked out in Figure 3 with locking as the CC protocol. In this simple example, send-on-demand saves 1 unit of time over locking in executing the two transactions. Next, a brief description of the datacycle architecture follows, to pave the way for a discussion of some hybrid protocols.
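The copy-distribution step of the algorithm in Figure 1 (send read-copies to all reads ahead of the next write, and the write-copy to the next writer) can be sketched as below. The function and the claim-queue representation are our own simplification, not the paper's notation.

```python
# Sketch (our naming) of the copy-distribution rule: given a claim queue
# for one data object as a list of (site, op) pairs with op in {'r', 'w'},
# decide which sites get a read-copy and which site gets the write-copy.
def next_recipients(claim_queue):
    """Returns (read_sites, write_site_or_None)."""
    read_sites = []
    for site, op in claim_queue:
        if op == 'r':
            read_sites.append(site)   # consecutive reads may proceed concurrently
        else:
            return read_sites, site   # the write-copy goes to the first writer
    return read_sites, None           # no pending write claim
```

For example, a queue of two reads followed by a write yields read-copies to both readers and the write-copy to the writer, which is exactly the concurrency the last bullet of Figure 1 permits.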
Datacycle database machine

The datacycle database machine[5-9], a Bellcore effort, was developed to achieve very high read-oriented transaction throughputs. In this architecture, the entire database contents are broadcast periodically over a fibre optic broadcast bus. A central database site (also called the storage pump) is responsible for the broadcasting mechanism. User Access Units (UAU) equipped with data filters read the data 'on the fly' as the data files pass by on the broadcast bus. An optimistic multiversion concurrency control scheme is employed, whereby the data objects are read from the broadcast bus and the updates are submitted to a central Update Manager (UM) via an upstream network. The central update unit is responsible for resolving update conflicts and accepts only a non-conflicting set of updates. The updates are never introduced into the broadcast streams in the middle of the current cycle. The update manager delays the updates until the current cycle is over and a new cycle begins. This ensures the mutual consistency of the data objects read in a single cycle. The main bottleneck in this strategy is the centralized update manager, whose saturation point ultimately limits the update throughput. The query throughput is essentially unlimited, although long transactions that extend over one or more data cycles will have to undergo local validation. The datacycle architecture also gets around the I/O bottleneck for record retrieval. In our previous paper[10], the datacycle idea was adopted for a read-only DDBS in a high-speed WAN. For a true DDBS, the data objects have to be distributed. While a distributed storage pump is feasible, the distribution of the update manager is nontrivial. In our previous paper[10], analytical and simulation performance models were developed for the datacycle architecture applied to a wide area, high-speed, bi-directional ring network.
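The role of the central update manager can be illustrated with a deliberately simplified conflict filter: accept a submitted update only if nothing it read or writes was written by an update already accepted in the current cycle. This is our own sketch of the idea, not Bellcore's actual validation algorithm.

```python
# Simplified per-cycle conflict filtering (our illustration). Updates are
# examined in submission order; an update is accepted only if its read set
# and write set are disjoint from everything written by updates accepted
# earlier in the same cycle.
def accept_nonconflicting(updates):
    """updates: list of (txn_id, read_set, write_set), in submission order.
    Returns the ids of updates accepted for the next broadcast cycle."""
    written = set()
    accepted = []
    for txn_id, read_set, write_set in updates:
        if written.isdisjoint(read_set) and written.isdisjoint(write_set):
            accepted.append(txn_id)
            written |= set(write_set)
        # rejected updates would be resubmitted in a later cycle
    return accepted
```

The accepted set is installed only at the next cycle boundary, which is what guarantees that all objects read within one cycle are mutually consistent.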
Since the processing of a query may begin anywhere in the middle of a data cycle, it is quite possible that the query may not be able to access its entire access set in the remaining part of the current data cycle. Two extreme scenarios were studied[10]: in one case, every query was able to access a mutually consistent set of data objects regardless of whether a cycle boundary was crossed; in the other, some queries could not access their entire access set in the current cycle, and had to wait until the end of the current cycle and then access all the data
Figure 2 Simple example illustrating send-on-demand execution (B = 2 units; a three-site network with data objects D1, D2 and D3; the timeline from t = 0 to t = 5 shows the broadcast messages for T1 and T2, the confirmation of each transaction, the claim queues (transactions above the horizontal bar are confirmed, those below unconfirmed), and the transfer of the data objects between sites; T1 finishes execution and departs at t = 4, and T2's execution continues at t = 5)
Figure 3 Same system as in Figure 2 using locking (the timeline from t = 0 to t = 12 shows the lock request, lock grant and lock release messages for T1 at Site 1 and T2 at Site 2, and the departure of each transaction)
objects in the new data cycle. The query throughput in the datacycle scheme is guaranteed to lie between the query throughputs derived in these two extreme cases. It was shown that in a read-only environment, the maximum attainable query throughput increases almost linearly with the network data rate. It is our belief that in a read-only situation, at a high enough data rate, the datacycle scheme will outperform any other existing scheme. The results of a comparison study[10] between the datacycle and locking schemes in a read-only environment support this.
Hybrid algorithm

From the algorithm description in Figure 1, it is clear that queries (read-only transactions) may be processed by the send-on-demand scheme, but there will be a broadcasting overhead and other delays associated with it. As mentioned before, the datacycle scheme works well for queries and also scales very well with the network data rate. Thus a hybrid scheme with send-on-demand for update transactions and datacycle for queries (hybrid(s-o-d)) may easily be envisaged*. For a fair comparison, the updates could be handled by any other conventional algorithm (for instance, locking) and the queries by datacycle, and so we also look at the hybrid algorithm of locking with datacycle (hybrid(lck)). To implement the datacycle concept, a broadcast bus is necessary, while the updates may be handled via any topology. Our simulation study was implemented on a bi-directional ring topology, with the broadcast bus running parallel to the ring, as shown in Figure 4. We assumed that a single central storage pump
handled the entire datacycle operation, although we would ultimately want to use a distributed version of the data storage pump. All the update information in the system has to be transmitted to the storage pump periodically. Thus the central data pump has a copy of the entire database. Another copy of the database is distributed among the DDBS sites. The queries require a consistent set of data objects to obtain correct results. The data set need not be the most recent, however. As in the datacycle architecture, the updates are incorporated at the beginning of a data cycle and never in the middle. If a query obtains its data set entirely from a single data cycle, the data objects read are guaranteed to be mutually consistent. Next, analytical models for the pure and hybrid send-on-demand schemes are developed to compare the performance of the different protocols.
ANALYTICAL MODELS

Our approach to the analytical modelling of the send-on-demand protocol has been to modularize the whole process, work out each module, and finally integrate the modules. Our objective in this section is to develop performance models for the pure send-on-demand protocol with both read and write operations, and for the hybrid send-on-demand scheme where queries (read-only transactions) are handled by the datacycling mechanism. The analysis of the datacycle operation was done in our earlier work[10]. Here, we first look at the pure send-on-demand protocol with write-only transactions and no read operations. Then we generalize this to the case where both read and write operations are allowed, but the queries are handled in the same way as updates (i.e. they go through the broadcasting phase, and then wait for the appropriate data objects to arrive at the site). Finally, we incorporate the datacycling scheme for queries, and build a fairly accurate performance model for the hybrid send-on-demand protocol. Several analytical models are available for the locking protocol and, with minor modifications, they can be made to work in this scenario. Thus we do not work out the analysis for the hybrid(lck) scheme here, and concentrate only on the send-on-demand protocol.
Figure 4 DDBS network topology with one data pump

*In our earlier paper[10], this hybrid scheme was termed the integrated send-on-demand CC algorithm, and described in more detail.

Pure send-on-demand: write-only transactions
For illustration purposes, it might be best to look at a timing diagram depicting the chronological events in the life of a typical incoming update transaction as in Figure 5. In the pure send-on-demand protocol, when an update transaction reaches the head-of-line (HOL) position, it enters a broadcasting phase of constant duration B, during which the access set of the transaction is broadcast to all the sites in the DDBS. The
Figure 5 Chronological events in the execution of an update using the send-on-demand protocol (the transaction arrives at the HOL and broadcasting begins; at the end of the broadcasting phase of duration B, the data objects D_1, D_2, ..., D_k arrive after waiting times W_j and transit times t_j; processing of duration E then begins, ending with the end of the transaction)
confirmation duration is set to the value of B. In the analysis to follow, it is assumed that the broadcast duration is accurately known and thus the confirmation duration is set to this value. Obviously, this is an ideal situation which does not hold in the real world; in reality, messages may be delayed, and this is studied in the next section. After the transaction is confirmed, the site processing the update transaction has to wait for each of the k data objects to arrive at that site before the processing can begin. The duration of the update execution is denoted by a random variable E. After the broadcasting is completed, the claims for each of the data objects in the access set of the update enter the respective claim queues of the data objects. Now, each claim has to reach the HOL of its claim queue before the data object can be sent to the site under consideration. The service time and the waiting time at the claim queue of data object D_j are denoted by the random variables X_j and W_j, respectively. The random variable t_j, whose distribution is known, denotes the time (transmission + propagation) it takes for the data object D_j to reach the site under consideration. The service time for a transaction accessing k data objects at any site is denoted by S^k. From the timing diagram in Figure 5, the following two relationships may be derived:
S^k = B + max{W_1 + t_1, W_2 + t_2, ..., W_k + t_k} + E    (1)

X_i = S^k - W_i - B,    1 ≤ i ≤ k    (2)
In the above two equations lies the main problem of analysing such a system. Since W_i is a function of X_i, from equation (1) we find that S^k is a function of X_i, and from equation (2), X_i is a function of S^k. To solve this problem, an iterative technique has been employed[14, 19]. To simplify the notation, we now drop the superscript from S^k and replace the random variable by S, implicitly assuming that all the updates access the same number (K) of data objects. We denote the mean of a random variable by a bar over its symbol. We are interested in finding the average response time of any update, defined as the total time spent in the DDBS by the update. For that, we need to find the average waiting time W̄ of any transaction, which is related to S̄ and S̄², the first and second moments of S, by the
Pollaczek-Khinchin mean value formula[20], as below. The update arrival process at a site is assumed to be Poisson with rate λ, and the transaction multiprogramming level at each site is assumed to be unity:

W̄ = λS̄² / (2(1 - λS̄))    (3)

The average response time R̄ is then given by:

R̄ = S̄ + W̄    (4)
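Equations (3) and (4) are straightforward to evaluate numerically. The small sketch below (with our own helper names) computes the mean waiting time and response time from λ and the first two moments of S.

```python
# Numerical sketch of equations (3) and (4): Pollaczek-Khinchin mean
# waiting time and the resulting mean response time. lam is the Poisson
# arrival rate at a site; s1 and s2 are the first and second moments of
# the service time S.
def pk_waiting_time(lam, s1, s2):
    rho = lam * s1                 # utilization; must be < 1 for stability
    assert rho < 1.0, "queue is unstable"
    return lam * s2 / (2.0 * (1.0 - rho))

def response_time(lam, s1, s2):
    return s1 + pk_waiting_time(lam, s1, s2)
```

For instance, with deterministic unit service (S̄ = 1, S̄² = 1) and λ = 0.5, the formula gives W̄ = 0.5 and R̄ = 1.5.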
The problem now boils down to calculating S̄ and S̄². Let Z = max{W_1 + t_1, W_2 + t_2, ..., W_K + t_K} and let each of the random variables (W_i + t_i), 1 ≤ i ≤ K, have the same probability density function (pdf), denoted f_{W+t}(x). Further, assuming independence of these K random variables, the pdf of Z may be derived by elementary probability theory:

f_Z(z) = K f_{W+t}(z) [F_{W+t}(z)]^(K-1)    (5)
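As a sanity check on the max-of-K construction behind equation (5) (our own illustration, not from the paper): if each W_i + t_i were exponential with mean 1, the mean of the maximum of K such variables is the harmonic number H_K, and numerically integrating z K f(z) F(z)^(K-1) recovers it.

```python
# Check (ours) of the order-statistic density K f(z) F(z)^(K-1) for the
# maximum of K i.i.d. Exp(1) variables, whose mean is known to be the
# harmonic number H_K = 1 + 1/2 + ... + 1/K.
import math

def mean_max_exp(K, n_grid=200000, z_max=40.0):
    """Integrate z * K f(z) * F(z)^(K-1) numerically for f = Exp(1)."""
    dz = z_max / n_grid
    total = 0.0
    for i in range(1, n_grid + 1):
        z = i * dz
        f = math.exp(-z)            # Exp(1) pdf
        F = 1.0 - math.exp(-z)      # Exp(1) cdf
        total += z * K * f * F ** (K - 1) * dz
    return total

def harmonic(K):
    return sum(1.0 / j for j in range(1, K + 1))
```

The same integration applied to the assumed f_{W+t} yields the moments Z̄ and Z̄² used below.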
where F_{W+t}(z) denotes the probability distribution function corresponding to f_{W+t}(z). Now, Z̄ and Z̄² may be easily calculated provided F_{W+t}(z) is known. The probability distribution of W_i depends on the distribution of X_i, and from equation (2) we find that the distribution of X_i depends on that of W_i. Under these circumstances, the best that one can do is to assume some distributions f_{W+t}(z) and f_X(x) (pdf of X_i) and then validate the analysis using simulation results. The mean waiting time W̄_i at a claim queue is given by the Pollaczek-Khinchin formula, as below:

W̄_i = λ_c X̄_i² / (2(1 - λ_c X̄_i))    (6)

where λ_c is the arrival rate of claims for that particular data object from all the sites in the DDBS. To proceed with the analysis, we make the assumption that all the data objects are uniformly accessed by every incoming transaction. This assumption can be relaxed very easily, except that the arrival rates to each claim queue will then be different. In a DDBS of N sites and D uniformly accessed data objects, where each update accesses K data objects, the following relationship holds:

λ_c = λKN/D    (7)
The iterative algorithm to calculate Z̄ and Z̄², along the lines of the technique of Shyu and Li[14, 19], is now described below. The convergence properties of this iterative technique are discussed by Shyu[19]:

1. n = 0. Set W̄_i^(0) = 0. (The superscript is the iteration number.)
2. Compute Z̄ and Z̄² using f_Z(z).
3. Compute X̄_i = Z̄ - W̄_i^(n) + Ē and X̄_i² using f_X(x).
4. n = n + 1. Compute W̄_i^(n) using equation (6).
5. If |W̄_i^(n) - W̄_i^(n-1)| > ε, then go to step 2, otherwise stop. (ε is the tolerance.)
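A schematic implementation of the five-step iteration is sketched below. To keep it self-contained we make stronger assumptions than the analysis: a deterministic transit time t, and W_i + t approximated as exponential, so that the mean of the max of K i.i.d. terms is (W̄ + t)·H_K and E[X²] ≈ 2E[X]². These distributional choices and all names are ours, standing in for the assumed f_{W+t} and f_X of the analysis.

```python
# Schematic fixed-point iteration (our simplifications): deterministic
# transit time t; W + t treated as exponential so the mean of the max of
# K i.i.d. terms is (W + t) * H_K, and the second moment of the claim
# service time is approximated as 2 * (mean)^2.
def iterate_moments(lam_c, K, t, E_mean, eps=1e-6, max_iter=10000):
    H_K = sum(1.0 / j for j in range(1, K + 1))
    W = 0.0                                   # step 1: W^(0) = 0
    for _ in range(max_iter):
        Z = (W + t) * H_K                     # step 2: mean of the max
        X1 = Z - W + E_mean                   # step 3: mean claim service time
        X2 = 2.0 * X1 * X1                    # exponential approximation
        if lam_c * X1 >= 1.0:
            raise ValueError("claim queue unstable")
        W_new = lam_c * X2 / (2.0 * (1.0 - lam_c * X1))  # step 4: eq (6)
        if abs(W_new - W) <= eps:             # step 5: convergence test
            return W_new, X1
        W = W_new
    raise RuntimeError("iteration did not converge")
```

For K = 1 the max reduces to W + t, so X̄ settles at t + Ē and the waiting time at the plain Pollaczek-Khinchin value, which makes a convenient correctness check.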
Once Z̄ and Z̄² have been computed, S̄ and S̄² may be calculated as below. It is assumed, of course, that the first and second moments of E are known:
Figure 6 Comparison of simulation and analysis of the pure send-on-demand protocol (average transaction response time versus total arrival rate to the system (tps) at 1 Gbit/s, for K = 1 and K = 4)
S̄ = B + Z̄ + Ē

S̄² = B² + Z̄² + Ē² + 2BZ̄ + 2BĒ + 2Z̄Ē    (8)
Validation

The above analytical model was implemented in the C programming language and some results obtained. We compared our analytical results to simulation results for the bi-directional ring topology described earlier, making the same assumptions in the simulation model. The simulation programs, written in the C programming language, typically executed 150,000 transactions in each run, and required a few minutes of CPU time for each simulation point on a Sun SPARC workstation. The execution time E was set to zero, and the broadcasting phase duration B was set to a constant value equal to the time required by a message to propagate halfway down the bi-directional ring. The transmission time for a data object was assumed to be a constant, as in the simulation model. f_{W+t}(z) and f_X(x) were both assumed to be uniform density functions. This is obviously an approximation that worsens as K (the number of data objects accessed by an update) increases, but under these circumstances it seemed to be a good one. We reiterate here that the analysis in the previous section rests heavily on these two distributions, and a bad choice would leave the analysis results far from the simulation results. The value of the tolerance ε in the iterative algorithm was set to 10^-6. In Figure 6 we present a comparison between the results obtained by analysis and simulation for K = 1 and K = 4. The graphs plot the average transaction response time against the total update arrival rate (Nλ) into the DDBS. The graphs in Figure 6 have been plotted with the DDBS parameters N = 10 sites, D = 100 data objects, circumference of the ring network L = 6000 miles, total database size G = 50 Mbit, and network data rate C = 1 Gbit/s. The system chosen for our study is small and for illustrative purposes alone.
The analysis and simulation results match fairly well at low arrival rates. At high arrival rates, there are two factors that cause the simulation results to differ from the analysis. One factor is, of course, the error induced by the uniform density approximations for f_{W+t}(z) and f_X(x), which gets worse as K increases. The other factor is due to the dependencies introduced by the order in which the claims arrive at the different claim queues. One must remember that in this protocol the data objects hop from site to site; the time it takes a data object to reach a particular site depends upon which site accessed it earlier. Averaging the results over several simulation runs will reduce this discrepancy.
Pure send-on-demand: updates with read operations
In the design and implementation of any concurrency control protocol, the underlying principle is the fact that read operations can be executed concurrently by several transactions, while write operations must be executed serially, i.e. one after another. So far, we have been looking at the send-on-demand protocol in a write-only situation. In this section, we incorporate the read operations, as mentioned in the algorithm in Figure 1. It is important to realize that we have not yet brought in the datacycling idea. We intend to use the datacycling idea for queries (pure read transactions) only; this will be dealt with in the next section. In this section we analyse the case where each data object may be accessed for a read or a write operation. If a transaction both reads and writes a data object, the access is considered the same as a write operation. As in the algorithm description of the send-on-demand protocol given earlier, reads on a data object may occur concurrently, while writes have to be serialized. As before, we assume that each incoming transaction
accesses a certain set of data objects uniformly. Only now, each data object may be accessed for read or write purposes. The read probability is denoted by p_r, while the write probability is denoted by p_w = 1 - p_r. Although it is simplistic to assume that all the data objects have the same read (and hence write) access probability, assigning different read access probabilities to different data objects does not gain us significantly more insight. Our motivation here is to show how the performance analysis may be done in a relatively simple case; the analysis may easily be upgraded to handle different read access probabilities for different data objects, with a slight increase in computational complexity. Again, we are interested in calculating the average transaction response time. In this case, the distribution and moments of the service time S are derived as in the previous section, using the iterative procedure. We handle the concurrent servicing of the read claims for a particular data object by grouping consecutive read requests into one super-read request. The read requests are assumed to arrive in batches, with a batch-size distribution dependent upon the relative read and write average arrival rates. All read requests that arrive in a sequence, irrespective of the actual arrival instants of the individual read requests, form a batch. The number of consecutive read requests is distributed geometrically, as given below:

Prob{j read requests occur in a row} = p_r^j p_w,    j ≥ 0    (9)
$$\mathrm{Prob}\{\text{Super-read sequence length} = j \mid \text{at least 1 read request occurs}\} = p_r^{j-1} p_w, \quad j \geq 1 \quad (10)$$

The average length of a super-read request $\bar{J}$ (assuming that at least one read occurs) is as given below:

$$\bar{J} = \sum_{j=1}^{\infty} j \,\mathrm{Prob}\{\text{Super-read sequence length} = j \mid \text{at least 1 read request occurs}\} = \sum_{j=1}^{\infty} j\, p_r^{j-1} p_w = \frac{1}{p_w} \quad (11)$$
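The batch-length distribution of equations (10) and (11) is easy to check numerically. The sketch below (function name and the value $p_r = 0.8$ are our own illustrative choices) sums the truncated geometric series and confirms that the mean super-read length is $1/p_w$:

```python
# Mean super-read length: truncated sum of j * pr**(j-1) * pw over j >= 1.
# Illustrative check that the mean equals 1/pw, as in equation (11).

def superread_mean(pr: float, jmax: int = 10_000) -> float:
    """Mean of Prob{length = j} = pr**(j-1) * (1-pr), j >= 1 (truncated)."""
    pw = 1.0 - pr
    return sum(j * pr ** (j - 1) * pw for j in range(1, jmax + 1))

pr = 0.8
assert abs(superread_mean(pr) - 1.0 / (1.0 - pr)) < 1e-6
```

With $p_r = 0$ every batch has length one, and the truncated sum returns exactly 1, as expected.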
As before, $\lambda_c$ is the average arrival rate of claim requests to the claim queue of data object $D_i$. However, now we have to differentiate between the read arrival rate $\lambda_c^r$ and the write arrival rate $\lambda_c^w$, which are given by:

$$\lambda_c^r = p_r \lambda_c, \qquad \lambda_c^w = p_w \lambda_c \quad (12)$$

Let $\lambda_c^{sr}$ denote the average arrival rate of super-read requests to a claim queue. Assuming that super-read requests arrive in batches of average size $\bar{J}$, we have:

$$\lambda_c^{sr} = \frac{\lambda_c^r}{\bar{J}} = p_w \lambda_c^r \quad (13)$$
Intuitively, the above equation may be explained as follows. If $p_r \to 0$, i.e. $\lambda_c^r \to 0$, the rate of arrival of read sequences $\lambda_c^{sr} \to 0$. If $p_r \to 1$, i.e. $p_w \to 0$, then $\lambda_c^{sr} \to 0$, since the arriving read requests will form very large sequences, reducing the overall read sequence arrival rate. When $\lambda_c^w \gg \lambda_c^r$, $p_w \to 1$, which in turn implies that $\bar{J} \to 1$. Since the read arrival rate is very small, almost all the read requests are served individually, and thus $\lambda_c^{sr} \to \lambda_c^r$. The service time for a super-read request of size $j$ at the $i$th claim queue, $X_{j,i}^{sr}$, is the maximum of the service times for each of the $j$ individual read requests, the density function of which is given by:

$$f_{X_{j,i}^{sr}}(s) = j f_X(s)\,(F_X(s))^{j-1}, \quad s \geq 0 \quad (14)$$
where $f_X(s)$ is the pdf of the service time of a claim queue. The first two moments of $X_{j,i}^{sr}$, namely $\bar{X}_{j,i}^{sr}$ and $\bar{X}_{j,i}^{sr2}$, may be derived from the above density function. Averaging these two moments over all possible lengths of a super-read request, we can calculate the average super-read service time $\bar{X}_i^{sr}$ and the second moment $\bar{X}_i^{sr2}$:

$$\bar{X}_i^{sr} = \sum_{j=1}^{\infty} \bar{X}_{j,i}^{sr}\, p_r^{j-1} p_w, \qquad \bar{X}_i^{sr2} = \sum_{j=1}^{\infty} \bar{X}_{j,i}^{sr2}\, p_r^{j-1} p_w \quad (15)$$
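To illustrate equations (14) and (15), suppose for the moment that the claim service time $X$ is exponential with rate $\mu$; this is purely an illustrative assumption of ours, the analysis itself keeps $f_X$ general. The maximum of $j$ i.i.d. exponentials has mean $H_j/\mu$ (the $j$th harmonic number over $\mu$) and variance $(\sum_{k \leq j} 1/k^2)/\mu^2$, so the batch-averaged moments of equation (15) can be computed directly:

```python
# Moments of the super-read service time for exponential X ~ exp(mu).
# max of j i.i.d. exp(mu): mean = H_j/mu, var = (sum_{k<=j} 1/k^2)/mu^2.
# Batch length j is geometric: Prob{j} = pr**(j-1) * (1-pr), j >= 1.

def superread_service_moments(pr: float, mu: float, jmax: int = 2000):
    pw = 1.0 - pr
    m1 = m2 = 0.0
    harmonic = 0.0   # running H_j
    sq_sum = 0.0     # running sum of 1/k^2
    for j in range(1, jmax + 1):
        harmonic += 1.0 / j
        sq_sum += 1.0 / j ** 2
        mean_j = harmonic / mu
        second_j = sq_sum / mu ** 2 + mean_j ** 2  # E[X^2] = Var + mean^2
        w = pr ** (j - 1) * pw                      # batch-size weight
        m1 += w * mean_j
        m2 += w * second_j
    return m1, m2

# pr = 0: every batch has size 1, so the moments are just those of X
m1, m2 = superread_service_moments(pr=0.0, mu=2.0)
assert abs(m1 - 0.5) < 1e-9 and abs(m2 - 0.5) < 1e-9
```

As $p_r$ grows, the batches lengthen and both moments grow, which is exactly the effect equation (15) captures.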
Now the claim queue system reduces to a two-class queueing system, with arrivals consisting of write claims and super-read claims and no priority between them. The average waiting time at the claim queue of, say, data object $D_i$ is now given by:

$$W_i = \frac{\lambda_c^{sr} \bar{X}_i^{sr2} + \lambda_c^w \bar{X}_i^2}{2\left(1 - \left(\lambda_c^{sr} \bar{X}_i^{sr} + \lambda_c^w \bar{X}_i\right)\right)} \quad (16)$$
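Equation (16) has the familiar M/G/1 (Pollaczek-Khinchine) form with two customer classes and no priority. A direct transcription follows (names ours); as a sanity check, with a single class it reduces to $W = \lambda \bar{X^2}/(2(1 - \lambda\bar{X}))$, which for exponential service is the M/M/1 waiting time:

```python
def claim_queue_wait(lam_sr, xsr1, xsr2, lam_w, x1, x2):
    """Average wait at a claim queue, as in equation (16).
    lam_sr, lam_w: super-read and write claim arrival rates;
    xsr1/xsr2, x1/x2: first/second moments of the two service times."""
    rho = lam_sr * xsr1 + lam_w * x1   # total utilization
    assert rho < 1.0, "claim queue must be stable"
    return (lam_sr * xsr2 + lam_w * x2) / (2.0 * (1.0 - rho))

# single class, exponential service with rate mu = 1 and lam = 0.5:
# the M/M/1 waiting time lam/(mu*(mu-lam)) = 1.0
w = claim_queue_wait(0.0, 0.0, 0.0, 0.5, 1.0, 2.0)
assert abs(w - 1.0) < 1e-9
```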
Incorporating the above into the iterative procedure mentioned before, we can calculate the average transaction response time as a function of the total arrival rate in the system, as well as of the probability of read, $p_r$. Minimizing the average transaction response time has the effect of increasing the overall transaction throughput. Here a slightly different set of performance curves from the ones considered so far is presented. Figure 7 contains the maximum throughput as a function of the read probability for a DDBS at 2 Gbit/s and $K = 4$ (with all other parameters remaining the same), obtained by both simulation and analysis. The maximum transaction throughput is defined as the largest overall transaction throughput with an average transaction response time of 0.5 s.
Using datacycling for queries

The datacycle scheme, as demonstrated in our earlier work, works very well for queries. So it seems only logical to introduce a hybrid scheme (hybrid(s-o-d)) where the queries are handled by the datacycle scheme and the update transactions are handled by the send-on-demand
Figure 7 Comparison of simulation and analysis of the send-on-demand protocol with read operations

Figure 8 Comparison of simulation and analysis of the hybrid send-on-demand protocol
CC protocol. As the fraction of queries in the system increases from 0 to 1, the hybrid scheme goes from pure send-on-demand at one end to pure datacycle at the other. If each incoming transaction accesses $K$ data objects and the probability of read is $p_r$, then the fraction of queries in the DDBS is given by $q = p_r^K$. Since the queries are processed separately from the updates, they do not contribute to the load on the claim queues. Thus, the average arrival rate of claims to a claim queue is now given by:

$$\lambda_c = \frac{\lambda (1 - q) K N}{D} \quad (17)$$

As mentioned in our earlier work, it is sufficient to look at the two extreme cases of the datacycle operation: the no updates scenario and the frequent updates scenario. Let $\bar{\Psi}_{K,C}^{nu}$ and $\bar{\Psi}_{K,C}^{fu}$ denote the average service times of queries under the no updates and frequent updates scenarios, respectively, when each query accesses $K$ data objects and the datacycle runs at $C$ Gbit/s. The corresponding second moments are denoted by $\bar{\Psi}_{K,C}^{nu2}$ and $\bar{\Psi}_{K,C}^{fu2}$. As before, $\bar{S}$ and $\bar{S^2}$ denote the moments of the service time of an update with the send-on-demand protocol. At each site we now have two broad classes of transactions, updates and queries, coming in with an overall average rate of $\lambda$ transactions per second. No priorities between these two classes are implemented. Thus, applying the previously used formula, the average response time of a transaction, assuming the frequent update scenario, is as given below. $R$ may be calculated for the no update case in the same way, substituting the appropriate variables:

$$R^{fu} = q \bar{\Psi}_{K,C}^{fu} + (1 - q)\bar{S} + \frac{\lambda\left(q \bar{\Psi}_{K,C}^{fu2} + (1 - q)\bar{S^2}\right)}{2\left(1 - \lambda\left(q \bar{\Psi}_{K,C}^{fu} + (1 - q)\bar{S}\right)\right)} \quad (18)$$

In Figure 8, the simulation and analytical results for the hybrid send-on-demand scheme are presented for $K = 4$ and $C = 2$ Gbit/s. Here it was assumed that the datacycle operation was under the frequent updates scenario, and thus the results represent the upper bound of the achievable throughput. So far, an important assumption has been that all the sites receive the broadcast message within the confirmation duration. However, since broadcast times may vary for different sites, some problems may arise, and these are discussed in the next section.
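Equations (17) and (18) translate directly into code. The sketch below (function and variable names are ours; the moment values in the checks are arbitrary) computes the query fraction $q = p_r^K$, the reduced claim arrival rate, and the frequent-updates response time. Note that with $q = 0$, equation (18) collapses to the pure send-on-demand response time $\bar{S} + \lambda\bar{S^2}/(2(1 - \lambda\bar{S}))$:

```python
def query_fraction(pr: float, K: int) -> float:
    # a transaction is a query iff all K of its accesses are reads
    return pr ** K

def claim_rate(lam: float, pr: float, K: int, N: int, D: int) -> float:
    # equation (17): queries are served off the datacycle and
    # contribute no load to the claim queues
    return lam * (1.0 - query_fraction(pr, K)) * K * N / D

def hybrid_response_time(lam, q, psi1, psi2, s1, s2):
    """Equation (18): average response time (frequent updates scenario).
    psi1, psi2: first two moments of query (datacycle) service time;
    s1, s2: first two moments of update (send-on-demand) service time."""
    m1 = q * psi1 + (1.0 - q) * s1    # mixed first moment
    m2 = q * psi2 + (1.0 - q) * s2    # mixed second moment
    assert lam * m1 < 1.0, "system must be stable"
    return m1 + lam * m2 / (2.0 * (1.0 - lam * m1))

assert claim_rate(30.0, 0.0, 4, 10, 100) == 12.0   # no queries
assert claim_rate(30.0, 1.0, 4, 10, 100) == 0.0    # all queries
# q = 0 reduces to the M/G/1-style update-only response time:
assert abs(hybrid_response_time(0.5, 0.0, 0.3, 0.2, 1.0, 2.0) - 2.0) < 1e-9
```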
ANALYSIS WITH VARIABLE BROADCAST DURATIONS

Due to the variable delays in the network, some sites may not receive the broadcast information within the confirmation duration of the transaction under consideration. This may cause data objects to be sent to sites in the wrong order. Due to this wrong routing, there is an increase in the response time of a transaction. If the wrong routing of data objects occurs frequently, the overall performance of the send-on-demand protocol may suffer. Thus the probability of wrong routing $P_{wr}$ becomes a crucial parameter in the performance evaluation of the send-on-demand protocol. However, deriving $P_{wr}$ is a formidable task, and in this section we merely attempt to indicate the analytical procedure, and the associated complexity, of calculating it.
Problem description

The first fact to note is that the wrong routing of a data object $d$ may occur only if there are at least two transactions that conflict over $d$. Any two transactions that access common data objects, such that one of the accesses is a write operation, are said to conflict with
Figure 9 Timing diagram of two conflicting transactions
each other. To get a better feel for the problem at hand, it is worth looking at Figure 9, containing the timing diagram of two conflicting transactions $T_i$ and $T_j$ that have data object $d$ in their conflict set. We shall calculate the probability of wrong routing for data object $d$. $T_i$ and $T_j$ have been generated at sites $i$ and $j$, respectively. Further, $T_i$ has a lower timestamp than $T_j$ and begins its processing $l_j$ seconds earlier than $T_j$. Each transaction, at the beginning of its processing, broadcasts its access set to all the sites. The access set information of $T_i$ and $T_j$ takes $a_i$ and $a_j$ seconds, respectively, to arrive at the claim queue of data object $d$. However, no action is taken until the transaction confirmation duration $B'$ is over. If $a_i > B'$, then the transaction $T_i$ is confirmed at that data site immediately on the arrival of the access set information*. Otherwise, it has to wait for a time duration $B' - a_i$ before confirming $T_i$ and accepting it into the claim queue. At the time of confirmation of a transaction, the existing entries in the claim queue of the data object are re-ordered in timestamp order, to reflect the arrival of the new claim. The claims get served in FCFS order, and the arriving claim may have to wait for some time to be served ($W_d$). In Figure 9, we find that a situation exists whereby the data object $d$ will get sent to site $j$ instead of site $i$, although transaction $T_i$ had a lower timestamp. One easy solution to this problem is to make $B'$ very large. However, this solution would entail a larger response time, and so a trade-off exists. The probability of wrong routing in terms of the parameters listed above is given below:

$$P_{wr} = \mathrm{Prob}\{\text{the claim for } d \text{ from } T_j \text{ arrives at the HOL of the claim queue before the claim from } T_i \mid TS(T_i) < TS(T_j),\ T_i \text{ and } T_j \text{ conflict over } d,\ \text{each accesses } K \text{ data objects}\}$$
$$= \phi_d(K, K)\,\mathrm{Prob}\{l_j + \max(B', a_j) + W_d < a_i \mid a_j + l_j < a_i\}\,\mathrm{Prob}\{a_j + l_j < a_i\} \quad (19)$$

*In the previous section, it was implicitly assumed that $a_i \leq B'$, $i = 1, 2, \ldots, N$. Thus, the broadcast duration $B = \max\{a_1, a_2, \ldots, a_N\} \leq B'$, and wrong routing never occurred.

In the above equation, $\phi_d(K, K)$ denotes the probability of transactions $T_i$ and $T_j$, each accessing $K$ data objects, conflicting over $d$. More generally, the probability of two transactions accessing $L$ and $M$ data objects, respectively, conflicting over data object $d$, $\phi_d(L, M)$, is given below. The derivation is provided in Appendix A:

$$\phi_d(L, M) = \frac{LM}{D^2}\,(p_w^2 + 2 p_r p_w) \quad (20)$$

Computation of $P_{wr}$

To calculate $P_{wr}$, the distributions of the random variables $a_i$, $l_j$, $W_d$, $\max(B', a_j)$ and $a_j + l_j$ have to be determined. In general, this is not an easy task, and the key difficulty in analysing even simple systems lies here. In Li$^{21}$, the performance analysis of timestamp-ordering CC protocols has been carried out analytically by approximating some of the key parameters as exponentially distributed random variables. Exponential random variables have many simplifying characteristics, some of which are listed without proof in Appendix B. The proofs may be found in Li$^{21}$. Let us assume that $a_i$ ($a_j$) is exponentially distributed with mean $1/\mu_i = \bar{t}_p + \bar{t}_a$, where $t_p$ and $t_a$ represent the propagation delay and the transmission plus queueing delay, respectively. The transmission delay may be neglected at high data rates. $t_p$ depends on the topology of the network; for instance, in the ring topology described earlier, the average propagation delay $\bar{t}_p = \tau/2$, where $\tau$ is the propagation delay for information to travel half way down the ring. Now, the derivation of $P_{wr}$ is continued, as below:

$$\mathrm{Prob}\{a_j + l_j < a_i\} = \mathrm{Prob}\{a_j < a_i\}\,\mathrm{Prob}\{l_j < a_i\} \quad \text{(Appendix B: Lemma 3)}$$
$$= \frac{\mu_j}{\mu_i + \mu_j}\,\mathrm{Prob}\{l_j < a_i\} \quad \text{(Appendix B: Lemma 1)} \quad (21)$$
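The conflict probability of equation (20) can be validated by brute force: draw two access sets uniformly without replacement, mark each access to the tagged object read or write, and count conflicts. The sketch below (names and parameter values ours) agrees with $LM/D^2\,(p_w^2 + 2 p_r p_w)$ to Monte Carlo accuracy:

```python
import random

def phi_d(L, M, D, pr):
    # equation (20): both transactions access d, and not both reads
    pw = 1.0 - pr
    return (L * M / D ** 2) * (pw ** 2 + 2 * pr * pw)

def phi_d_mc(L, M, D, pr, trials=200_000, seed=1):
    rng = random.Random(seed)
    d = 0  # the tagged data object
    hits = 0
    for _ in range(trials):
        s1 = rng.sample(range(D), L)   # uniform access set of size L
        s2 = rng.sample(range(D), M)   # uniform access set of size M
        if d in s1 and d in s2:
            # the two accesses to d conflict unless both are reads
            if not (rng.random() < pr and rng.random() < pr):
                hits += 1
    return hits / trials

est = phi_d_mc(4, 4, 20, 0.5)          # exact value is 0.03
assert abs(est - phi_d(4, 4, 20, 0.5)) < 0.005
```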
$$\mathrm{Prob}\{l_j + \max(B', a_j) + W_d < a_i \mid a_j + l_j < a_i\}$$
$$= \mathrm{Prob}\{\max(B', a_j) - a_j + W_d < a_i - a_j - l_j \mid a_j + l_j < a_i\}$$
$$= \mathrm{Prob}\{\max(B', a_j) - a_j + W_d < a_i\} \quad \text{(Appendix B: Lemma 2)}$$
$$= \mathrm{Prob}\{\max(B', a_j) - a_j < a_i\}\,\mathrm{Prob}\{W_d < a_i\} \quad \text{(Appendix B: Lemma 3)} \quad (22)$$
Note that in using Lemma 3 in the above equation, we must satisfy the constraint that $\max(B', a_j) - a_j \geq 0$. Since $\max(B', a_j) \geq a_j$ is true, this constraint is satisfied. Using the exponential distribution assumption for $a_i$ and $a_j$, it can be shown that:

$$\mathrm{Prob}\{\max(B', a_j) - a_j < a_i\} = \begin{cases} e^{-\mu_j B'} + \dfrac{\mu_j\left(e^{-\mu_j B'} - e^{-\mu_i B'}\right)}{\mu_i - \mu_j}, & \mu_i \neq \mu_j \\[2ex] e^{-\mu B'}(1 + \mu B'), & \mu_j = \mu_i = \mu \end{cases} \quad (23)$$

Appendix C contains the details of the derivation. An observation that can be made from the above is that:

$$\mathrm{Prob}\{\max(B', a_j) - a_j < a_i\} = \begin{cases} 1, & B' = 0 \\ 0, & B' = \infty \end{cases} \quad (24)$$

Further, it can be proved that the above probability is a monotonically decreasing function of $B'$, with a maximum of 1.0 at $B' = 0$. When $B'$ is made very large ($\infty$), the probability of wrongly routing a data object is zero. However, each data site then has to wait for a very long time before confirming any transaction (confirmation duration $= B'$), and thus the response time will also become infinitely large. We still have to calculate $\mathrm{Prob}\{l_j < a_i\}$ and $\mathrm{Prob}\{W_d < a_i\}$. Unfortunately, the density functions of $l_j$ and $W_d$ are mathematically intractable, since they depend on the density function of $S$ (the service time of a transaction queue), which is unknown. If these two distributions are assumed to be of some form, the derivation of $P_{wr}$ can be continued further. In fact, using some common distributions for $l_j$ and $W_d$, $P_{wr}$ can be analytically derived, and then matched with simulation results. The quality of the match will, of course, depend upon the distributions chosen. This work is planned for the future.

There is one more level of complexity involved in deriving $P_{wr}$. $\bar{S}$ and $\bar{S^2}$ had been derived earlier under the assumption that the confirmation duration $B'$ was set equal to the broadcast duration, which in turn was known to be a fixed value. However, since those assumptions are no longer true, we have to recalculate the moments of the service time $S$. The new timing diagram for an incoming transaction is given in Figure 10. From this diagram, the following two equations, similar to those derived earlier, may be obtained. There is a push-pull kind of relationship between $P_{wr}$ and $S$: the higher the probability of wrong routing, the higher the transaction service time $S$; however, increasing $S$ has the effect of decreasing the probability of wrong routing (since $W_d$ increases), thus decreasing $S$. Again an iterative solution may be required. This phenomenon has been neglected for the time being, to simplify matters:

$$S = \max\{\max(a_1, B') + W_1 + t_1, \ldots, \max(a_K, B') + W_K + t_K\} + E$$
$$= B' + \max\{\max(a_1, B') - B' + W_1 + t_1, \ldots, \max(a_K, B') - B' + W_K + t_K\} + E \quad (25)$$

$$X_i = S - \max(a_i, B') - W_i = S - B' - (\max(a_i, B') - B') - W_i \quad (26)$$

Similar to the procedure given in the previous section, we can substitute $Z = \max\{\max(a_1, B') - B' + W_1 + t_1, \max(a_2, B') - B' + W_2 + t_2, \ldots, \max(a_K, B') - B' + W_K + t_K\}$. This can be further simplified to $Z = \max\{Y_1, Y_2, \ldots, Y_K\}$, where each of the random variables $Y_i = \max(a_i, B') - B' + W_i + t_i$, $1 \leq i \leq K$, is identically distributed. To compute $\bar{S}$ and $\bar{S^2}$, we now need to calculate the first and second moments of $Z$. To derive the distribution of $Z$, the distribution of $Y_i$, $1 \leq i \leq K$, has to be determined. Then the iterative procedure is followed as before to compute the moments of $S$.
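The equal-rate branch of the wrong-routing probability in equation (23), $\mathrm{Prob}\{\max(B', a_j) - a_j < a_i\} = e^{-\mu B'}(1 + \mu B')$, can be checked by simulation. The sketch below (names and the values $\mu = 1$, $B' = 1$ are our own choices) samples the two exponential broadcast delays directly:

```python
import math
import random

def wrong_order_prob_mc(mu, B, trials=200_000, seed=3):
    """Estimate P{max(B, a_j) - a_j < a_i} for i.i.d. a_i, a_j ~ exp(mu)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        ai = rng.expovariate(mu)
        aj = rng.expovariate(mu)
        if max(B, aj) - aj < ai:
            hits += 1
    return hits / trials

mu, B = 1.0, 1.0
analytic = math.exp(-mu * B) * (1.0 + mu * B)   # equal-rate case of eq. (23)
assert abs(wrong_order_prob_mc(mu, B) - analytic) < 0.01
```

At $B' = 0$ the estimator returns 1 exactly, matching equation (24).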
RESULTS AND THEIR INTERPRETATION

This section contains the results of a fairly detailed comparison study between the proposed send-on-demand protocol and the concurrency control scheme of locking. We compare our scheme with that of locking, since locking has been accepted as an industry
Figure 10 Chronological events in the execution of an update using the send-on-demand protocol under variable broadcast durations
standard. The comparison is carried out in a high-speed wide area ring environment. The DDBS considered is fairly small, and the results derived are for illustrative purposes only. Four concurrency control algorithms are considered in this comparison study, viz. pure send-on-demand, locking, hybrid(s-o-d) and hybrid(lck). The comparison study was carried out with the same DDBS parameters, and under the same assumptions as mentioned before. These assumptions are listed below:

• All transactions at all the sites had the same data access distribution.
• The transaction arrival distribution at each site was Poisson with rate λ.
• Each transaction had a fixed access set cardinality K.
• The D data objects were accessed with equal probability (uniform distribution).
• The size of the database was 50 Mbit, with 100 data objects (D = 100) and 10 computer sites (N = 10).
• The network topology chosen was a ring with a circumference of 6000 miles (L = 6000).
• Each site operated at a multiprogramming level of unity, i.e. each site processed only one transaction at a time.
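For reference, the study's parameter set can be collected in one place. The sketch below (class and field names are ours; λ = 30 tps is an arbitrary illustrative rate) simply records the assumptions listed above and derives two quantities the analysis uses repeatedly: the size of one data object, and the per-claim-queue arrival rate λKN/D, interpreting λ as the per-site rate, which is the q = 0 case of equation (17):

```python
from dataclasses import dataclass

@dataclass
class DDBSParams:
    D: int = 100            # data objects
    N: int = 10             # computer sites
    K: int = 4              # access-set cardinality per transaction
    db_size_mbit: float = 50.0
    ring_miles: float = 6000.0
    lam: float = 30.0       # Poisson transaction arrival rate per site (tps)

    def object_size_mbit(self) -> float:
        # uniform partition of the database over D objects
        return self.db_size_mbit / self.D

    def claim_rate(self) -> float:
        # write-only system: each claim queue sees lam*K*N/D claims/s
        return self.lam * self.K * self.N / self.D

p = DDBSParams()
assert p.object_size_mbit() == 0.5     # 50 Mbit / 100 objects
assert abs(p.claim_rate() - 12.0) < 1e-9
```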
The confirmation duration B was set to the time to propagate data half way down the bi-directional ring. The broadcast message from any site is sent in both directions down the ring, thus ensuring that all the sites obtain the message within B units of time. A transaction is assumed to be complete the moment it gains access to all the data objects in its access set, i.e. E, as defined earlier, was set to zero. This assumption, although not correct in reality, does not affect our results in any significant way. In a real system, after accessing all the data objects in its access set, some processing will be required for the transaction to complete, and this will cause the transaction response time to increase. The locking protocol suffers from the occurrence of deadlocks. However, in this study, the effect of deadlocks is neglected and no overhead is attributed to resolving deadlocks or investing in deadlock detection procedures. Thus, the performance of locking as derived in this study is actually an upper bound. First, we consider a write-only system to compare the send-on-demand and the locking protocols. In Figure 11, the average response time of an update is plotted against the update arrival rate for both locking and the send-on-demand protocol with two values of K, viz. 1 and 4, at a network data rate of 1 Gbit/s. K = 1 corresponds to the low data contention case, while K = 4 corresponds to the high data contention case. When the data contention is high, the send-on-demand protocol outperforms locking, while at low data contention the situation is reversed. The send-on-demand protocol has an initial overhead, the broadcasting phase, before any transaction can be confirmed,
Figure 11 Comparison of send-on-demand and traditional locking for a write-only system
i.e. officially accepted into the system. However, the data objects themselves are locked for a relatively short period compared to the locking scheme, which suffers from the extra overhead of unlock messages. Thus, the queueing delay for data objects is larger in locking, and degrades the performance in high contention situations. At low contention, the queueing delay is negligible, and the broadcasting overhead in the send-on-demand scheme causes its performance to be worse than locking. Now we relax the assumption of a write-only system, and consider a general DDBS where both read and write operations are allowed. Each data access is of type read with probability p_r, and of type write with probability (1 − p_r). Our performance parameter is the maximum sustainable transaction throughput with an average transaction response time of 0.5 s. In Figures 12 and 13, the maximum transaction throughputs for both locking and send-on-demand, as a function of the read probability p_r, have been plotted at link speeds of 1 Gbit/s and 100 Gbit/s*. In Figure 12, each transaction accesses 1 data object (K = 1: low data contention), while in Figure 13 each transaction accesses 4 data objects (K = 4: high data contention). Several observations may be made from this data. As expected, the maximum transaction throughput increases as the read probability increases, although the rate of increase is higher for locking than for send-on-demand. At low data contention, locking outperforms send-on-demand (Figure 12), while at high data contention, at low to medium read probabilities, send-on-demand outperforms locking (Figure 13). At high data contention and high read probabilities, locking does better due to the initial broadcasting overhead incurred in send-on-demand.

*It may be noted that a network data rate of 100 Gbit/s is unrealistic at present. However, there is no question that network speeds will continue to increase in the future. We have included our data at 100 Gbit/s to foretell the performance of distributed database systems of the future.

Figure 12 Comparison of send-on-demand and locking under low data contention: K = 1

Figure 13 Comparison of send-on-demand and locking under high data contention: K = 4

Figure 14 Comparison of pure send-on-demand and the hybrid scheme under relatively high data contention: K = 4

Another important (and dismaying!) observation is that the increase in transaction throughput obtained by raising the network data rate from 1 to 100 Gbit/s is very small. This is due to the fact that both schemes under consideration are propagation delay bound (and the propagation delay does not depend on the network data rate). Next we look at the performance of the hybrid CC algorithm, the hybrid of the datacycle scheme with send-on-demand (hybrid(s-o-d)). In Figure 14, the maximum transaction throughput under high data contention (K = 4) for both hybrid(s-o-d) and send-on-demand, as a function of the read probability at a network data rate of 2 Gbit/s, is given. At 1 Gbit/s, the performance of the hybrid scheme is actually worse than that of pure send-on-demand. The reason for this is the relatively high datacycle duration at 1 Gbit/s. The datacycle concept has been incorporated in the system assuming a worst case scenario (updates in every datacycle). Thus the maximum transaction throughput as shown in Figure 14 for the hybrid scheme is actually a lower bound. In Figure 15, to estimate the improvement in performance, we compare locking at 100 Gbit/s and the hybrid(s-o-d) protocol at 2 Gbit/s. As expected, the hybrid scheme outperforms traditional locking for all values of the read probability. This proves, to a large extent, our point that simply increasing the network data rate, without addressing the communication latency issue, will not buy us extra transaction throughput in a DDBS. Finally, in Figure 16, the performance of all four algorithms is presented at 2 Gbit/s and K = 4. The hybrid(s-o-d) algorithm performs better than all the other schemes, except at very high values of p_r, when the hybrid(lck) scheme performs better. The reason for this is that as the read access probability for data objects increases in the DDBS, the overall level of data contention decreases, and as we have seen earlier, the locking mechanism performs well in a low data contention situation. For the same reason, the hybrid(lck) and
Figure 15 Comparison of the hybrid scheme and locking under relatively high data contention: K = 4
Figure 16 Comparison of the four schemes under relatively high data contention: K = 4

the locking schemes outperform the pure send-on-demand scheme at high values of p_r. In Figures 17 and 18, we plot the maximum attainable transaction throughputs as a function of the read probability p_r for the hybrid(s-o-d) and hybrid(lck) protocols at a network data rate of 2 Gbit/s. The two plots are at three different data contention levels, determined by the cardinality of the transaction access set, K. As K increases, so does the data contention level. As the data contention in the DDBS increases, the hybrid(s-o-d) protocol performs better than the hybrid(lck) protocol at all values of p_r. At relatively lower levels of data contention, the hybrid(lck) performs well at high values of p_r. In all of the above hybrid algorithms, a worst case scenario for the datacycle operation has been assumed. Thus, the performance of the hybrid schemes obtained is in reality a lower bound. In Figure 19, the upper bound (assuming a best case scenario for the datacycle scheme) and the lower bound of the performance of the hybrid(s-o-d) algorithm for
Figure 18 Comparison of the two hybrid schemes under relatively high data contention: K = 8
Figure 19 Lower and upper bounds on the performance of the hybrid(s-o-d) algorithm: K = 6

K = 6 at 2 Gbit/s is plotted as a function of the read probability. We find that there is not too large a gap between the two bounds in this example.
Figure 17 Comparison of the two hybrid schemes under relatively high data contention: K = 6

CONCLUSIONS AND WORK IN PROGRESS
In this paper, we have analysed a new concurrency control scheme applicable to DDBS in a high-speed environment. Further, the effect of variable broadcast durations on the performance of the send-on-demand protocol was briefly studied. Using a small example, we have demonstrated that the traditional protocols do not overcome the communication latency problem, and new protocols are required. There is no justification for implementing DDBS on high-speed networks if no gain in throughput is observed with the increase in the network speed. A study comparing four concurrency control schemes was conducted. The hybrid(s-o-d) algorithm performed the best, except in situations of
very low data contention. Due to the high communication latency, the locking scheme at a very high network data rate failed to perform better than the new protocol at a much lower data rate. The operation of the send-on-demand protocol in the face of network and computer site failures is being considered. The effect of different data granularity, different data access distributions and multiple copies of data objects on the performance of the send-on-demand protocol is also being looked at. Since locking performs well at low data contention, we are also looking into a protocol that is a hybrid of the locking, send-on-demand and datacycle concepts. Work is in progress with regard to making the original send-on-demand protocol robust to failures, without having to give up too much of the transaction throughput in the process.
ACKNOWLEDGEMENTS

This research is supported in part by the National Science Foundation under grant No. NCR-9016348, by the Department of Defense Joint Services Electronics Program under contract No. F49620-91-0028, and by the Pacific Bell External Technology Program.
REFERENCES l 2 3 4
5 6 7
1. Kleinrock, L 'The latency/bandwidth tradeoff in gigabit networks', IEEE Commun. Mag., Vol 30 (April 1992) pp 36-40
2. Partridge, C 'Protocols for high speed networks: Some questions and a few answers', Comput. Networks & ISDN Syst., Vol 25 (April 1993) pp 1019-1028
3. 'Gigabit network testbeds', Special Report, IEEE Computer Magazine (September 1990)
4. Clark, D D, Davie, B S, Farber, D J, Gopal, I S, Kadaba, B K, Sincoskie, W D and Smith, J M 'The AURORA gigabit testbed', Comput. Networks & ISDN Syst., Vol 25 (January 1993) pp 599-621
5. Bowen, T, Gopal, G, Herman, G and Mansfield, W 'A scalable database architecture for network services', IEEE Commun. Mag., Vol 29 (January 1991) pp 52-59
6. Bowen, T F, Gopal, G, Herman, G, Hickey, T M, Lee, K C, Mansfield, W, Raitz, J and Weinrib, A 'The Datacycle architecture', Commun. ACM, Vol 35 (December 1992) pp 71-81
7. Herman, G and Gopal, G 'The case for orderly sharing', in Lecture Notes in Computer Science on High Performance Transaction Systems (D Gawlick, M Haynie and A Reuter, eds.), Springer-Verlag, Berlin (1989) pp 148-174
8. Herman, G, Gopal, G, Lee, K and Weinrib, A 'The Datacycle architecture for very high throughput database systems', Proc. ACM SIGMOD, San Francisco, CA (1987) pp 97-103
9. Weinrib, A and Gopal, G 'Decentralized resource allocation for distributed systems', Proc. INFOCOM '87, San Francisco, CA (April 1987) pp 328-336
10. Banerjee, S, Li, V O K and Wang, C 'Distributed database systems in high-speed wide-area networks', IEEE J. Selected Areas in Commun., Special Issue on Gigabit Network Protocols and Applications, Vol 11 (May 1993) pp 617-630
11. Ozsu, M T and Valduriez, P Principles of Distributed Database Systems, Prentice Hall, Englewood Cliffs, NJ (1991)
12. Cellary, W, Gelenbe, E and Morzy, T Concurrency Control in Distributed Database Systems, Studies in Computer Science and Artificial Intelligence, Elsevier, Amsterdam (1988)
13. Gray, J, Lorie, R, Putzolu, G and Traiger, I 'Granularity of locks and degrees of consistency in a shared data base', in Modelling in Data Base Management Systems (G Nijssen, ed.), North-Holland, Amsterdam (1976) pp 365-394
14. Shyu, S-C and Li, V O K 'Performance analysis of static locking in distributed database systems', IEEE Trans. Comput., Vol 39 (June 1990) pp 741-751
15. Tay, Y, Suri, R and Goodman, N 'A mean value performance model for locking in databases: The no-waiting case', J. ACM, Vol 32 (July 1985) pp 618-651
16. Mitra, D and Weinberger, P J 'Probabilistic models of database locking: Solutions, computational algorithms and asymptotics', J. ACM, Vol 31 (October 1984) pp 855-878
17. Potier, D and Leblanc, P 'Analysis of locking policies in database management systems', Commun. ACM, Vol 23 (October 1980) pp 584-593
18. Johnson, T 'Approximate analysis of reader and writer access to a shared resource', Proc. ACM SIGMETRICS Conf., Boulder, CO (1990) pp 106-114
19. Shyu, S-C Design and Performance Analysis of Locking Algorithms for Distributed Databases, PhD thesis, University of Southern California, Los Angeles (December 1989)
20. Kleinrock, L Queueing Systems, Vol. 1: Theory, McGraw-Hill, New York (1975)
21. Li, V O K 'Performance models of timestamp-ordering concurrency control algorithms in distributed databases', IEEE Trans. Comput., Vol 36 (September 1987) pp 1041-1051

APPENDIX A: CALCULATING $\psi_d(L, M)$

Here the probability-of-conflict formula is derived assuming uniform access and both exclusive (write) and share (read) locks. If there is to be no conflict between two transactions (accessing L and M data objects, respectively, out of a total of D data objects), the following conditions must be met:

1. There should be no common elements (data objects) between the write-set of the first transaction and the entire access set of the second transaction.
2. The read-set of the first transaction should have no data objects in common with the write-set of the second transaction.

For two transactions to conflict over the same data object d, both transactions must access d (with probability L/D and M/D, respectively), and at least one of them must write-access d. Letting $p_w$ and $p_r$ denote the write- and read-access probabilities ($p_w + p_r = 1$), the above gives:

$$\psi_d(L, M) = \frac{L}{D} \cdot \frac{M}{D} \left( p_w^2 + 2\, p_r\, p_w \right) \qquad (27)$$
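Equation (27) can be sanity-checked by simulation. The sketch below (not from the paper; the parameter values D = 20, L = 5, M = 4 and $p_w = 0.3$ are illustrative) draws uniform access sets for two transactions and counts how often they conflict over one fixed object d:

```python
import random

def conflict_prob_mc(D, L, M, p_w, trials=200_000, seed=1):
    """Monte Carlo estimate of the probability that two transactions
    conflict over one fixed data object d: both access d, and at
    least one of them write-accesses it."""
    rng = random.Random(seed)
    d = 0
    hits = 0
    for _ in range(trials):
        s1 = rng.sample(range(D), L)   # uniform access set of transaction 1
        s2 = rng.sample(range(D), M)   # uniform access set of transaction 2
        if d in s1 and d in s2:
            # each access to d is a write with probability p_w;
            # a conflict needs at least one of the two to be a write
            if rng.random() < p_w or rng.random() < p_w:
                hits += 1
    return hits / trials

def conflict_prob_formula(D, L, M, p_w):
    """Equation (27): psi_d(L, M) = (L/D)(M/D)(p_w^2 + 2 p_r p_w)."""
    p_r = 1.0 - p_w
    return (L / D) * (M / D) * (p_w**2 + 2 * p_r * p_w)

est = conflict_prob_mc(20, 5, 4, 0.3)
exact = conflict_prob_formula(20, 5, 4, 0.3)
```

Note that $p_w^2 + 2 p_r p_w = 1 - p_r^2$, i.e. the complement of the event that both accesses to d are reads.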
APPENDIX B: BASIC FACTS ON EXPONENTIALLY DISTRIBUTED RANDOM VARIABLES

Lemma 1: Given two independent, exponentially distributed random variables $X_1$ and $X_2$ with means $1/\mu_1$ and $1/\mu_2$, respectively:

$$\mathrm{Prob}\{X_1 \ge X_2\} = \frac{\mu_2}{\mu_1 + \mu_2} \qquad (28)$$
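Lemma 1 is easy to verify numerically; the following sketch (illustrative rates $\mu_1 = 0.5$, $\mu_2 = 1.5$, not from the paper) compares a Monte Carlo estimate against equation (28):

```python
import random

# Monte Carlo check of Lemma 1: for independent X1 ~ Exp(mu1) and
# X2 ~ Exp(mu2), Prob{X1 >= X2} = mu2 / (mu1 + mu2).
rng = random.Random(7)
mu1, mu2 = 0.5, 1.5
trials = 200_000
count = sum(rng.expovariate(mu1) >= rng.expovariate(mu2)
            for _ in range(trials))
estimate = count / trials
exact = mu2 / (mu1 + mu2)   # equation (28), here 0.75
```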
computer communications volume 17 number 3 march 1994

Performance analysis of the send-on-demand protocol: S Banerjee et al.

Lemma 2: Given independent random variables X and Y, such that X is exponentially distributed with parameter $\mu$ and Y is positive with an arbitrary distribution, the distribution of $X - Y$, given $X \ge Y$, is the same as the distribution of X. Mathematically, this implies that:

$$f_{X-Y \mid X \ge Y}(z \mid X \ge Y) = f_X(z) = \mu e^{-\mu z}\, U(z) \qquad (29)$$

where $U(\cdot)$ denotes the unit step function.
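This memorylessness property can also be checked by simulation. In the sketch below (illustrative choices, not from the paper: $\mu = 2$ and Y uniform on (0, 2)), the conditional samples of $X - Y$ should match the Exp($\mu$) mean $1/\mu$ and tail $e^{-\mu z}$:

```python
import math
import random

# Monte Carlo check of Lemma 2 (memorylessness): conditional on X >= Y,
# X - Y has the same Exp(mu) distribution as X itself.
rng = random.Random(42)
mu = 2.0
samples = []
while len(samples) < 100_000:
    x = rng.expovariate(mu)       # X ~ Exp(mu)
    y = rng.uniform(0.0, 2.0)     # Y: arbitrary positive distribution
    if x >= y:                    # condition on X >= Y
        samples.append(x - y)

mean = sum(samples) / len(samples)                     # ~ 1/mu = 0.5
tail = sum(s > 0.5 for s in samples) / len(samples)    # ~ e^{-mu*0.5}
```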
Lemma 3: Given independent random variables X, Y and Z, such that X is exponentially distributed (the distributions of Y and Z are arbitrary, but Y must be positive):

$$\mathrm{Prob}\{X > Y + Z\} = \mathrm{Prob}\{X > Y\}\,\mathrm{Prob}\{X > Z\} \qquad (30)$$

APPENDIX C: DERIVATION OF $\mathrm{Prob}\{\max(\mathcal{G}, a_j) - a_j < a_i\}$

Let the random variable $Z = \max(\mathcal{G}, a_j) - a_j$, where $\mathcal{G}$ is a positive constant and $a_i$, $a_j$ are independent, exponentially distributed random variables with parameters $\mu_i$ and $\mu_j$, respectively. Thus, we are interested in calculating $\mathrm{Prob}\{Z < a_i\}$. Then:

$$Z = \begin{cases} \mathcal{G} - a_j, & 0 \le a_j \le \mathcal{G} \\ 0, & a_j > \mathcal{G} \end{cases} \qquad (31)$$

Using standard techniques, the probability density function of Z, $f_Z(z)$, may be found to be:

$$f_Z(z) = \mu_j e^{-\mu_j(\mathcal{G} - z)}, \quad 0 < z \le \mathcal{G}; \qquad \mathrm{Prob}\{Z = 0\} = e^{-\mu_j \mathcal{G}} \qquad (32)$$

The above may be combined to get an integrated expression for $f_Z(z)$ as:

$$f_Z(z) = e^{-\mu_j \mathcal{G}}\,\delta(z) + \mu_j e^{-\mu_j(\mathcal{G} - z)}\,[U(z) - U(z - \mathcal{G})] \qquad (33)$$

Let $f_{Z,a_i}(z, x)$ represent the joint density function of the random variables Z and $a_i$:

$$
\begin{aligned}
\mathrm{Prob}\{Z < a_i\} &= \int_{z=0}^{\infty} \int_{x=z}^{\infty} f_{Z,a_i}(z, x)\,dx\,dz \\
&= \int_{z=0}^{\infty} \int_{x=z}^{\infty} f_Z(z)\,f_{a_i}(x)\,dx\,dz, \quad \text{since } Z \text{ and } a_i \text{ are independent} \\
&= \int_{z=0}^{\mathcal{G}} \left[ e^{-\mu_j \mathcal{G}}\,\delta(z) + \mu_j e^{-\mu_j(\mathcal{G} - z)} \right] \int_{x=z}^{\infty} \mu_i e^{-\mu_i x}\,dx\,dz \\
&= \int_{z=0}^{\mathcal{G}} \left[ e^{-\mu_j \mathcal{G}}\,\delta(z) + \mu_j e^{-\mu_j(\mathcal{G} - z)} \right] e^{-\mu_i z}\,dz \\
&= e^{-\mu_j \mathcal{G}} \left[ 1 + \frac{\mu_j \left( e^{(\mu_j - \mu_i)\mathcal{G}} - 1 \right)}{\mu_j - \mu_i} \right] \qquad (34)
\end{aligned}
$$

The above was derived under the condition that $\mu_i \ne \mu_j$. A similar procedure may be followed to calculate the above probability for the $\mu_i = \mu_j = \mu$ case.
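Equation (34) can be cross-checked by direct simulation of the event $\max(\mathcal{G}, a_j) - a_j < a_i$. The sketch below (illustrative parameters $\mathcal{G} = 1$, $\mu_i = 2$, $\mu_j = 1$, not from the paper) compares the closed form against a Monte Carlo estimate:

```python
import math
import random

def prob_formula(G, mu_i, mu_j):
    """Equation (34): Prob{max(G, a_j) - a_j < a_i}, valid for mu_i != mu_j."""
    return math.exp(-mu_j * G) * (
        1.0 + mu_j * (math.exp((mu_j - mu_i) * G) - 1.0) / (mu_j - mu_i))

def prob_mc(G, mu_i, mu_j, trials=200_000, seed=3):
    """Monte Carlo estimate of the same probability, with
    a_i ~ Exp(mu_i) and a_j ~ Exp(mu_j) independent."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        ai = rng.expovariate(mu_i)
        aj = rng.expovariate(mu_j)
        if max(G, aj) - aj < ai:
            hits += 1
    return hits / trials

exact = prob_formula(1.0, 2.0, 1.0)
estimate = prob_mc(1.0, 2.0, 1.0)
```

For these parameters the closed form reduces to $e^{-1}(2 - e^{-1})$, so the two values should agree to within Monte Carlo noise.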