The Journal of Systems and Software 41 (1998) 63–73
Satisfying timing constraints of real-time databases

S.V. Vrbsky *, Sasa Tomic

Department of Computer Science, The University of Alabama, Tuscaloosa, AL 35487-0290, USA

* Corresponding author. Tel.: +1 205 348 7352; e-mail: vrbsky@cs.ua.edu.

Received 1 September 1996; received in revised form 9 November 1996; accepted 22 December 1996
Abstract

A real-time database has deadlines for processing transactions. Approximate query processing (AQP) has been presented as a strategy to satisfy these timing constraints by providing approximate answers to queries at the deadline instead of missing deadlines. In order to produce approximate answers, semantic information is maintained about the database and a computational overhead is required. In this paper the performance of AQP is examined to determine its effect on satisfying the timing constraints of real-time databases. Query and update transactions are modeled as periodic tasks with hard deadlines. A lock-based concurrency control is used, and the effects of workload characteristics, such as the scheduling algorithm, the number of transactions and the percentage of update transactions, are examined. We compare the number of missed deadlines and approximate answers produced during AQP to the number of missed deadlines occurring during traditional query processing (TQP). Results demonstrate that despite the overhead, fewer deadlines are missed during approximate query processing than during TQP. © 1998 Elsevier Science Inc. All rights reserved.

Keywords: Approximation; Approximate query processing; Imprecise computation; Performance evaluation; Real-time databases
1. Introduction

Real-time databases (Bestavros, 1996; Ramamritham, 1993) are distinguished from conventional databases because they have timing constraints that require the result of a transaction to be produced by a specified deadline. Examples of systems with such real-time databases are avionics and air traffic control, computer integrated manufacturing, and robotics. In order to meet deadlines, real-time databases require not only speed but also predictability of response (Ramamritham, 1993). Many factors can compromise the predictability of the response in a real-time database. For example, a network partition or a host failure can cause some needed data to become inaccessible, and as a result a deadline can be missed. Likewise, when transactions share data, the concurrency control that is necessary to ensure data integrity can cause transactions to miss their deadlines. If an exact answer cannot be produced by a deadline, producing an approximate answer by the deadline is more useful for some systems than producing no answer at all or producing an exact answer after a deadline.
For example, if a pilot requires a list of available runways at airports within landing distance, having a partial list of those runways may be better than no list at all. Similarly, we consider a real-time database system for a large power generating dam (Durrans et al., 1997) that is designed to define and control dam operation policies and water resource management. The data monitored in such a system consist of reservoir pool levels, precipitation in rain-gauges, current gate positions, etc. A query is used to derive the reservoir inflows every 15 min in order to determine future gate positions. If the information from all of the rain-gauges is not available, the reservoir inflow cannot be derived by the deadline in traditional database query processing. Missing a deadline will cause a delay in determining the gate positions, and this can be critical in the case of catastrophic rainfall. However, having an approximate answer which uses information from the available rain-gauges and the past record of the data for the unavailable rain-gauges will allow a reaction to the current situation to be undertaken. Predictability is increased if an answer (exact or approximate) can be provided by the deadline.

Approximate query processing (Vrbsky and Liu, 1993, 1995; Vrbsky, 1996) has been presented as a strategy that increases the capability of a system to meet the deadlines of a real-time database.
Approximate query processing is based on the model of imprecise computation (Chung et al., 1990; Lin et al., 1987), which trades off the quality of the result for the time to process the result. Approximate query processing produces approximate answers to database queries. The approximate answers converge to the exact answer if there is enough time to retrieve and process all of the data. If there is not enough time, the most recently computed approximate answer is provided. However, approximate query processing requires a computational overhead. In this paper, we examine the performance of approximate query processing to determine its effect on satisfying the timing constraints of a real-time database.

Other studies (Sha and Rajkumar, 1988; Song and Liu, 1995; Ulusoy and Belford, 1993) have examined the performance of concurrency control approaches and scheduling algorithms in real-time databases using the percentages of missed or satisfied deadlines. Song and Liu compare the percentage of missed deadlines for optimistic concurrency control versus two-phase locking (Song and Liu, 1995). They model database transactions with hard deadlines as periodic tasks and utilize the rate monotone and earliest deadline first scheduling algorithms. Ulusoy and Belford present a new lock-based protocol and compare its performance to existing protocols (Ulusoy and Belford, 1993). They consider database transactions with soft deadlines and use the earliest deadline first and least slack first scheduling algorithms. None of these studies includes approximate answers in its results.

For this study the database transactions are modeled as periodic tasks with hard deadlines. A hard deadline means that a result produced after the deadline is useless. We compare the number of missed deadlines and approximate answers produced during approximate query processing to the number of missed deadlines occurring during traditional query processing. A lock-based concurrency control is used, and the effects of workload characteristics, such as the scheduling algorithm, the number of transactions and the percentage of update transactions, are examined.

In Section 2 of this paper we describe the approximate query processing strategy. Section 3 presents the model used for the simulation study. The results of the simulation are presented in Section 4, and in Section 5 we provide the conclusions.

2. Approximate query processing

Approximate query processing (AQP) is based on the following processing strategy. Similar to traditional query processing, the approximate query processor begins by representing a query by a query tree. During traditional query processing, a node at a higher level in the query tree can be evaluated only when all operands represented by its children are available. No answer is produced unless all the read requests are granted and all operations performed. During AQP, segments of a relation, instead of the entire relation, are retrieved and processed at a time, with the approximate results repeatedly propagated up the query tree. The value of a leaf node improves as more segments are returned from the database. An improvement in a leaf node is propagated upward at each node to the root node. At each node the appropriate database operation is applied to the improved result. The value of the root node is updated with better values each time the root node is re-evaluated. If there is enough time to process all of the data, the exact answer is returned; if not, the most recent value of the root node is returned to the user as an approximate answer.
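To make the strategy concrete, the following is a minimal sketch of segment-at-a-time evaluation. The class names and driver are our own illustration (a single select over one relation), not the query processor of (Vrbsky and Liu, 1993).

```python
# A sketch of segment-at-a-time query evaluation (illustrative names only).
# A leaf holds a relation split into segments; after each newly retrieved
# segment, the operator tree is re-evaluated from the leaf to the root, so
# some approximate answer is always available when the deadline arrives.

class Leaf:
    def __init__(self, segments):
        self.segments = list(segments)   # segments still to be retrieved
        self.rows = []                   # rows retrieved so far

    def advance(self):
        """Retrieve one more segment; return False if the leaf is exhausted."""
        if not self.segments:
            return False
        self.rows.extend(self.segments.pop(0))
        return True

class Select:
    """A query-tree node; the operation is re-applied to the child's
    improved operand each time the child changes."""
    def __init__(self, child, pred):
        self.child, self.pred = child, pred

    def value(self):
        return [row for row in self.child.rows if self.pred(row)]

def evaluate(root, leaves, have_time):
    answer = root.value()
    while any(leaf.advance() for leaf in leaves):  # one more segment read
        answer = root.value()                      # propagate up the tree
        if not have_time():                        # deadline reached:
            break                                  # keep the latest value
    return answer                                  # exact if all segments read

stations = Leaf(segments=[[(3, 80), (5, 79)], [(1, 70), (7, 82)]])
print(evaluate(Select(stations, lambda r: r[1] > 75), [stations],
               have_time=lambda: True))            # exact: [(3, 80), (7, 82)]
```

With a deeper tree, each inner node would re-apply its operation whenever a child improved, exactly as described above.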
2.1. Approximate answers

An approximate answer of AQP is based on the approximate relational model (Smith and Liu, 1989), which asserts that there is a natural way to approximate answers of set-valued queries. An exact answer E to such a query is a set of data objects. All the data objects in a subset of an exact answer E certainly belong to E, and a data object in a superset of E is possibly also a member of E. Therefore, a meaningful approximation of any exact answer E can be defined in terms of a subset and a superset of E. An approximation A of an exact answer E to a set-valued query is defined as the union of two sets of data objects, (C, P), where C is a certain set and P a possible set. C is the set of data objects certainly in E; it is produced from the data processed thus far, hence C ⊆ E. P is the set of data objects that may be in E; it is produced based on information about the data that is maintained by the approximate query processor. P is the data to be retrieved and processed as query processing continues, and C ∪ P ⊇ E. An approximation allows the user to distinguish the data objects processed thus far, the set C, from those not yet processed, the set P.

As an example of a real-time system we consider the automated factory described in (Ramamritham, 1993), which consists of the floor of the factory containing robots, assembling stations and assembled parts, as well as computer and human interfaces to manage and coordinate the activities occurring on the floor. The real-time database for this system contains information about production parameters, such as temperature and pressure at different stations on the floor of the factory. Each of the stations has a manager (e.g., Jones, Lee or Smith). Fig. 1(a) illustrates an exact answer E to a query requesting the id, Manager and Temperature of all stations on the factory floor where the temperature exceeds 75°F. Fig. 1(b) is an approximation A0 of E. It is the set of temperatures at all stations on the floor of the factory. A0 is a superset of the exact answer in Fig. 1(a).
Fig. 1. Approximations of exact answer: (a) the exact answer E; (b) approximation A0; (c) approximation A1; (d) improved approximation A2.

Fig. 1(c) is another approximation A1 of E. The horizontal line in Fig. 1(c) distinguishes the certain set from the possible set. The first two data objects, the set C, have been identified as certainly in the exact answer; they are a subset of the exact answer. The last five data objects, the set P, are possibly in the exact answer. The data objects in Fig. 1(c) are a superset of those in Fig. 1(a). An improvement in approximation A1 is illustrated by approximation A2 in Fig. 1(d). The data object with id 1 is identified as a data object certainly not in the exact answer and deleted from the set P. A further improvement results if the data object with id 7 is identified as certainly in the exact answer and moved to set C.

More formally, given a set of approximations (C, P) to a query, a partial-order relation ⪰ (better than) is defined over the set as follows. One approximation is considered to be better than another, Ai ⪰ Aj, if: Ci ⊇ Cj, Ci ∪ Pi ⊆ Cj ∪ Pj, and Pi ⊆ Pj. Ai is better than Aj because Ci has at least as many objects from E as Cj: the data objects in Ci − Cj that are in E have become certain, and/or the data objects in (Cj ∪ Pj) − (Ci ∪ Pi) that are not in E have become certainly excluded.

This partially ordered set of all approximations of E is a lattice. In the lattice, A0 = (∅, V) is the least (first) element and the worst possible approximation of E, where V is the Cartesian product of all the domains in the schema of E. It is the set of all possible data objects which could be in E. A key assumption is that the number of elements in V is finite and, at least theoretically, can be computed without accessing the data itself. This is necessary because the possible part is the data objects which have not been processed yet. The greatest (last) element of the lattice is the best possible approximation, which is E itself, represented by (E, ∅).

This partial-order relation is compatible with a common sense notion of which approximations are better and which are worse. If the approximation Ai has more certain tuples and fewer possible tuples than the approximation Aj, it is known that more tuples have been retrieved and processed to produce Ai. Thus, it makes sense to say that Ai ⪰ Aj. We compare only those approximate answers that have the subset/superset relation described above, so that we are comparing only those answers that are obtained at different stages of processing.
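The ordering can be checked mechanically. Below is a small sketch using Python sets; the example objects are ours, chosen to mirror Fig. 1, and the function simply encodes the three conditions above.

```python
# The "better than" relation on approximations A = (C, P), as defined above:
# the certain set can only grow, the possible set can only shrink, and the
# superset C ∪ P of E can only shrink toward E.

def better_or_equal(Ci, Pi, Cj, Pj):
    return Ci >= Cj and Pi <= Pj and (Ci | Pi) <= (Cj | Pj)

V = {1, 2, 3, 4, 5, 6, 7}       # all possible data objects (finite)
E = {2, 3, 7}                   # the exact answer
worst = (set(), V)              # least element A0 = (empty set, V)
best = (E, set())               # greatest element (E, empty set)
A1 = ({2, 3}, {1, 4, 5, 6, 7})  # two objects identified as certain
A2 = ({2, 3}, {4, 5, 6, 7})     # object 1 identified as certainly not in E

assert better_or_equal(*A2, *A1) and better_or_equal(*A1, *worst)
assert better_or_equal(*best, *A2)
```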
2.2. Semantic support

The possible tuples are those tuples not processed yet, but they cannot be identified without being retrieved and processed. Hence, the following strategy is used for the design and implementation of the approximate query processor described in (Vrbsky and Liu, 1993). Similar to the approach used to produce intensional answers (Chu et al., 1991), data objects are categorized into classes (Hammer and McLeod, 1990; Peckham and Maryanski, 1988). A class corresponds to a relation or a segment of a relation. Classes can be organized into a class hierarchy, such as a specialization hierarchy. A class in a class hierarchy is described by the class attributes and methods that are specified by the database manager when the class hierarchy is designed. The following semantic information is maintained for each class: (1) the number of instances of the class, (2) the attributes of the instance variables of the class, and (3) the domains for the attributes of the class, called the domain variables. Maintaining this information provides an alternative to processing the possible data objects P during query processing. This information is accessed along with the base relations during query processing and allows the value of an approximation to be stored in an approximate object A = (C, P) that has two parts: the certain_part and the possible_part. The value of the certain_part is a certain set C containing all the data objects that are known to be in the exact answer E. The value of its possible_part is a set P of possible classes, P = {S1, S2, …, Sn}, where a possible class Si is a class in a class hierarchy. The set of all instances of the classes in P is the possible set in an approximation of the exact answer.

Given a set of approximations corresponding to all approximate objects of an exact answer E, a partial order over the set is defined as follows. The approximate object Ai ⪰ Aj if (1) Ci ⊇ Cj, (2) Pi ⊆ Pj, and (3) Ci − Cj is a subset of the instances of the possible classes in Pj − Pi.
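As an illustration of how an approximate object evolves, the following sketch moves the instances of one possible class into the certain set; the station tuples and class names are modeled loosely on Fig. 1 and Fig. 2 and are otherwise hypothetical.

```python
# A sketch of an approximate object whose possible_part is a set of classes;
# the station tuples and class names are illustrative, not from the paper.

from dataclasses import dataclass, field

@dataclass
class ApproximateObject:
    certain: set = field(default_factory=set)             # tuples known to be in E
    possible_classes: dict = field(default_factory=dict)  # class -> unread instances

    def process_class(self, name, predicate):
        """Read one possible class, then move its qualifying instances to
        the certain set (the object improves monotonically)."""
        for tup in self.possible_classes.pop(name):
            if predicate(tup):
                self.certain.add(tup)

a = ApproximateObject(
    certain={(3, "Jones", 80), (5, "Jones", 79)},
    possible_classes={"Smith Stations": [(1, "Smith", 70), (7, "Smith", 82)],
                      "Lee Stations": [(2, "Lee", 76)]})
a.process_class("Smith Stations", lambda t: t[2] > 75)  # temperature > 75°F
print(a.certain)            # station 7 became certain; station 1 was dropped
print(a.possible_classes)   # only "Lee Stations" remains possible
```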
Fig. 2 illustrates an approximate object that corresponds to the approximation in Fig. 1(c). The certain sets in both figures are the same. The factory station data objects have been categorized into classes according to the attribute Manager. Hence, instead of the possible data objects, the possible set of the approximation in Fig. 2 consists of the two classes ("Smith Stations" and "Lee Stations"). The instances of these possible classes are the data objects in the possible set of stations that are managed by Smith and Lee.

Fig. 2. An approximate object.

The complete set of relational algebra operations, as defined in (Vrbsky and Liu, 1993), is extended to accept approximate objects as operands and to produce an approximate object as a result. In (Vrbsky and Liu, 1993) the approximate relational algebra operations were shown to be monotonically improving according to the partial order described above. An additional primitive designed to accommodate approximate operations is an approximate_read that returns the instances of a class. The approximate answer returned to the user is a set of certain tuples and the domain variables of the attributes of the possible classes for the most recent value of the root node of the query tree. This provides the user with a superset of values of the data objects in the exact answer. Hence, the semantic information allows an approximate answer to be provided to the user without requiring that the actual data values of the possible part be read in. This semantics of approximation can also be used to approximate queries with aggregate functions (Jukic and Vrbsky, 1996).

3. Simulation study

Real-time database systems can have both periodic and aperiodic transactions with real-time constraints. The data in a real-time database are comprised of different collections of data objects: image objects, derived objects and invariant objects (Ramamritham, 1993; Song and Liu, 1995). An image object is obtained directly from sensors monitoring the real world. An example of an image object is the temperature sampled by the sensors at stations on the factory floor. A derived object is an object computed from a set of image objects and/or other derived data objects. The value of a derived object in the database may be updated. An example of a derived object is the rate at which a reaction is progressing in a factory. This derived object depends on past conditions, such as temperature and pressure values. An invariant object has a value that is constant with time. An example of an invariant object is the number of assembling stations on the floor of the factory.

3.1. Periodic tasks
For this study, all database transactions are modeled as periodic tasks. (Aperiodic transactions can be scheduled as periodic tasks that run during the execution time of a periodic server task (Lehoczky et al., 1987).) A periodic task is defined by a period q and an execution time s. The task utilization u is derived from q and s, where u = s/q. The deadline for each task is the end of each period, and each task must execute once (and only once) during its period in order to meet a deadline. We assume that each task is ready at the beginning of its period (in-phase). The total utilization U is the sum of the task utilizations, U = Σ_{i=1}^{n} u_i, where n is the number of tasks in the schedule.

Satisfying the timing constraints of a real-time database is measured by: the number of missed deadlines over the total number of deadlines, and the number of approximate answers produced over the total number of deadlines. The percentage of missed deadlines that occurs during traditional query processing is compared with the percentage that occurs during AQP.

We assume each periodic task is one of the following three types of database transactions: a query, an update transaction or a sensor transaction. Transactions may share data with each other, so concurrency control is needed. Queries are read-only transactions and can conflict with update transactions. While during traditional query processing queries can miss their deadlines, during AQP queries do not miss deadlines because approximate answers are produced instead. Update transactions are read-and-write transactions to derived objects. Update transactions can conflict with other update transactions and queries, and they can miss deadlines. Sensor transactions are write-only transactions to image objects. Similar to other studies, we assume sensor transactions need negligible processor time (Song and Liu, 1995) or can execute concurrently with update transactions and queries.

An alternative approach for implementing imprecise computations (Chung et al., 1990) is to logically decompose a process into a mandatory part and an optional part. The mandatory part is the first part of the computation and it must be completed to produce a useful result. If it cannot be completed by the deadline, no result is produced at all and the deadline is missed. The optional part is the computation that follows the mandatory part and improves the result produced by the mandatory part. The optional part can be left unfinished at the expense of the quality of the result produced by the process. This approach can also be used by approximate query processing, as sketched below. The data needed to process the query is partitioned into two parts: mandatory data and optional data. The mandatory data must be retrieved and processed to produce a result of acceptable quality. If it cannot be retrieved and processed by the deadline, the deadline is missed. However, this approach allows the optional data to remain unprocessed so that deadlines can be met, and with this approach it is possible for no exact answers to be produced.
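A sketch of this mandatory/optional decomposition for a query is given below; the control flow is our reading of the model, and all names are illustrative.

```python
# Mandatory classes must be processed by the deadline or the deadline is
# missed; optional classes are processed only while time remains, trading
# result quality for timeliness (names and structure are illustrative).

import time

def run_query(mandatory, optional, process, deadline):
    result = []
    for cls in mandatory:
        if time.monotonic() >= deadline:
            return None                  # mandatory part unfinished: missed
        result.extend(process(cls))
    for cls in optional:                 # optional part only improves quality
        if time.monotonic() >= deadline:
            break                        # acceptable result at the deadline
        result.extend(process(cls))
    return result

answer = run_query(["A"], ["B", "C"], lambda c: [c],
                   deadline=time.monotonic() + 0.01)
```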
In order to compare AQP to traditional query processing, for this study we assume that as many exact answers as possible will be provided. We also assume that any approximate answer is useful, regardless of how much of the data was processed. Even if a query does not start executing before its deadline, a list of the possible classes can be provided to the user as an approximate answer.

The priority-driven, preemptive algorithms rate monotone and earliest deadline first (Liu and Layland, 1973) are used to schedule the execution of the periodic tasks. For the rate monotone (RM) algorithm, the task with the shortest period has the highest priority. The ready task with the highest priority is allocated the CPU. For the earliest deadline first (EDF) algorithm, the task with the earliest deadline has the highest priority, and this priority may change during its execution. When a new task with the earliest deadline becomes ready, it is assigned the highest priority and allocated the CPU. A set of tasks is schedulable (no task misses its deadline) using the EDF algorithm if the total utilization is less than or equal to the feasible utilization, which is 1 for EDF. The feasible utilization of a set of tasks using the RM algorithm is n(2^{1/n} − 1), where n is the number of tasks.

Concurrency control, used to ensure data integrity, can cause transactions to miss their deadlines even when U is less than the feasible value. The concurrency control used for this study is conservative two-phase locking (Song and Liu, 1995; Ulusoy and Belford, 1993). We assume that an update transaction conflicts with a query transaction and with other update transactions. A transaction must acquire all locks to the data before it can execute; this scheme avoids deadlock. Transactions that cannot acquire all locks, because they conflict with an executing or blocked transaction that holds the lock, will wait until the locks are released.
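The following sketch shows this conservative locking discipline with exclusive locks only, which is a simplification (read locks could be shared among queries); the class and data-item names are assumptions.

```python
# Conservative two-phase locking as used in the simulation model: a
# transaction acquires all of its locks atomically before executing, so no
# deadlock can arise, and conflicting transactions wait for release.

class LockTable:
    def __init__(self):
        self.held = set()                     # data items currently locked

    def try_acquire_all(self, items):
        """Grant either every requested lock or none (conservative 2PL)."""
        if self.held & items:
            return False                      # conflict: transaction must wait
        self.held |= items
        return True

    def release_all(self, items):
        self.held -= items

locks = LockTable()
assert locks.try_acquire_all({"x", "y"})      # update transaction locks x and y
assert not locks.try_acquire_all({"y"})       # query on y blocks until release
locks.release_all({"x", "y"})
```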
3.2. AQP overhead

The overhead of queries during AQP includes the class-at-a-time processing required by an approximate_read, the overhead of the approximate relational algebra operations, and the time required to generate an approximate answer. The time to retrieve and process the data objects during traditional query processing in order to produce an exact answer is

T_trad = io_time + op_time.

The total time T_aqp to process a query during AQP is

T_aqp = ar_time + io_time + op_time + (Σ_{i=1}^{m} aop_time(op_i)) × classes_proc + aa_time.
The ar_time is the overhead required by an approximate_read. Both io_time and op_time are the same for AQP and traditional query processing. The aop_time(op_i) is the overhead of an approximate relational algebra operation op_i, where m is the number of relational operations in the query expression. The overhead required by an approximate_read and the overhead for each of the complete set of relational algebra operations were presented in (Vrbsky and Jukic, 1995). The classes_proc is the number of classes processed before query processing is terminated. The time to evaluate the domains of the possible classes and print an approximate answer is aa_time. If an exact answer is produced during AQP, classes_proc is equal to the total number of classes and, since no approximate answer needs to be produced, aa_time = 0.

The overhead in execution times during AQP is presented in (Vrbsky and Jukic, 1995). The number of classes into which a relation is categorized is chosen to maximize the percentage of data processed to produce the approximate answer and to minimize the overhead. The results presented indicate that the overhead can vary depending upon the increase in disk I/Os required for an approximate_read. If there is one additional random I/O for each class processed, the overhead is 15% for relations of size 1 K, 6% for relations of size 10 K, 3% for relations of size 100 K and 2% for relations of size 1 M, for a query containing the select and union operations. If there is one additional page accessed for each class processed and sequential prefetch is used, the overhead is 6% for relations of size 1 K, 3% for relations of size 10 K, 2% for relations of size 100 K and 1% for relations of size 1 M. When the data is buffered and no additional disk I/O is incurred due to class-at-a-time processing, the overhead varies from 2% for relations with 1 K tuples to 0.05% for relations with 1 M tuples. For this study we assume that transactions with longer execution times involve larger relations and, likewise, transactions with shorter execution times involve smaller relations. The overhead values are assigned to the transaction times accordingly.

During AQP, the overhead of update transactions is the time required to update the semantic information maintained by the approximate query processor about the derived data. For example, the range or the minimum and maximum values of the set of domain values described in Section 2.2 must be stored and updated with any changes in attribute values. If new instances are added to existing classes, the domain information must also be updated. The semantic information is stored in memory and we assume a 1% overhead for each update transaction as a conservative estimate of the time to update the values in memory.
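To make the formulas concrete, the sketch below evaluates T_trad and T_aqp; the parameter values are illustrative stand-ins, not the measured overheads reported above.

```python
# Evaluating the overhead formulas under assumed (illustrative) inputs.

def t_trad(io_time, op_time):
    return io_time + op_time

def t_aqp(ar_time, io_time, op_time, aop_times, classes_proc, aa_time):
    # T_aqp = ar_time + io_time + op_time
    #         + (sum of aop_time(op_i) over the m operations) * classes_proc
    #         + aa_time
    return ar_time + io_time + op_time + sum(aop_times) * classes_proc + aa_time

exact = t_aqp(ar_time=2.0, io_time=60.0, op_time=40.0,
              aop_times=[0.1, 0.1],      # e.g., a select and a union
              classes_proc=10,           # all classes processed: exact answer
              aa_time=0.0)               # so no approximate answer is printed
print(exact / t_trad(60.0, 40.0))        # relative cost of AQP: 1.04 here
```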
4. Results

The workload characteristics for this simulation are: the total utilization, the percentage of query transactions, the number of transactions, the type of scheduling algorithm, the ratio of the longest to the shortest period (the period ratio), the execution times, and the increased overhead in time required by the approximate operations. The total utilization ranges from U = 0.2 to 1.0. The percentage of query transactions is varied at Q = 20%, 50% and 80%. The number of transactions is varied at n = 5, 10 and 20. The intermediate overhead values described above are assigned as follows: the overhead is 6% for the 25% of the transactions with the smallest execution times, 3% for the 25% of the transactions with the next smallest execution times, 2% for the next 25% of the transactions, and 1% for the 25% of the transactions with the largest execution times. The period ratio is varied at b = 20, 50 and 100. The periods for a set of n tasks are randomly generated. The average execution time of the tasks is 100 time units. The execution time of the last task is deterministic; it is chosen to satisfy the total utilization U. The execution times for the tasks are varied among the following three choices: (1) equal utilizations (u_i = const for all i = 1, …, n), where the execution times are varied so that the utilization is constant for all tasks; (2) random; and (3) equal execution times (s_i = const for all i = 1, …, n), where the task with the longest period has the smallest utilization. All simulation results of percentage of missed deadlines and percentage of approximate answers are mean values, with 95% confidence that the simulation error is less than 10% of the mean values.
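A condensed sketch of such a simulation for EDF is shown below. The structure and parameter choices are our assumptions, and it omits the lock conflicts, the AQP overhead terms and the RM variant that the actual study models.

```python
# In-phase periodic tasks scheduled preemptively by EDF over discrete time
# units; a query that reaches its deadline unfinished yields an approximate
# answer, while an unfinished update misses its deadline.

import random

def simulate(tasks, horizon):
    """tasks: dicts with period q, execution time s, and kind ('query' or
    'update'); returns (missed deadlines, approximate answers)."""
    missed = approx = 0
    jobs = []                                     # [deadline, remaining, kind]
    for t in range(horizon):
        for task in tasks:
            if t % task["q"] == 0:                # job released at period start
                jobs.append([t + task["q"], task["s"], task["kind"]])
        jobs.sort()                               # EDF: earliest deadline first
        if jobs:
            jobs[0][1] -= 1                       # run the highest-priority job
            if jobs[0][1] == 0:
                jobs.pop(0)                       # job met its deadline
        for job in [j for j in jobs if j[0] <= t + 1]:
            jobs.remove(job)                      # deadline reached unfinished
            if job[2] == "query":
                approx += 1                       # AQP returns an approximation
            else:
                missed += 1
    return missed, approx

random.seed(0)
tasks = [{"q": random.choice([10, 20, 50]), "s": 2, "kind": k}
         for k in ["query", "update"] * 5]        # n = 10, Q = 50%
print(simulate(tasks, horizon=1000))
```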
4.1. Total utilization

Fig. 3 presents the percentage of deadlines missed (% Miss) during traditional query processing (TQP), the percentage of deadlines missed during AQP (AQP) and the percentage of approximate answers (AA) produced during AQP when n = 10, b = 50, Q = 50%, with equal task utilizations and U = 0.2–1.0, for the EDF scheduling algorithm. As illustrated in Fig. 3, the percentage of missed deadlines and approximate answers increases as the utilization increases. The percentage of missed deadlines during AQP is less than the percentage of missed deadlines during TQP for all utilization values because approximate answers are provided instead of missed deadlines for the query transactions. The percentage of missed deadlines ranges from 0% to 4.48% for AQP and is less than the percentage for TQP, which ranges from 0% to 5.37%, for all utilization values. The percentage of approximate answers ranges from 0% to 2.41%. During AQP the percentage of missed deadlines is an average of 79% of the percentage of missed deadlines during TQP for U ≤ 1. There are 2–3 times more missed deadlines than approximate answers produced during AQP for U ≤ 1.

Fig. 3. Q = 50%, n = 10.

4.2. Percentage of query transactions

Fig. 4 illustrates that when the percentage of query transactions Q is decreased to 20%, the percentage of approximate answers decreases and the percentage of missed deadlines during AQP increases compared to Fig. 3. (Unless otherwise noted, the EDF scheduling algorithm is assumed in the remaining figures.) The percentage of missed deadlines during AQP surpasses the percentage of missed deadlines during TQP when U = 1. This is because fewer of the transactions are queries, and fewer approximate answers can be produced. The percentage of missed deadlines during AQP is an average of 87% of the percentage of missed deadlines during traditional query processing for U ≤ 1.

Fig. 4. Q = 20%, n = 10.
In Fig. 5, the percentage of query transactions is increased to Q = 80%. The percentage of missed deadlines during AQP decreases to an average of 77% of the percentage of missed deadlines during traditional query processing. The percentage of approximate answers remains less than the percentage of missed deadlines for AQP when U < 1. When U = 1, the percentage of approximate answers surpasses the missed deadlines during AQP. The sum of the percentage of missed deadlines and approximate answers during AQP is always greater than the percentage of missed deadlines which occur during TQP, due to the overhead of AQP. This sum is the largest for Q = 80% because the overhead of a query is larger than the overhead of an update during AQP.

Fig. 5. Q = 80%, n = 10.

For values of U greater than 1, the percentage of missed deadlines increases considerably during both traditional and approximate query processing. A set of tasks is no longer schedulable when U > 1, even when there are no data conflicts due to concurrency control. As illustrated in Fig. 6, for values of U = 0.9–1.3 and Q = 80%, there are considerably more approximate answers produced than missed deadlines during AQP. The percentage of missed deadlines during AQP is an average of 26% of the percentage of missed deadlines during TQP. When Q = 50% the percentages of missed deadlines and approximate answers are close in value. For Q = 20%, there are more missed deadlines than approximate answers produced during AQP, but fewer missed deadlines than during traditional query processing.

Fig. 6. U > 1, Q = 80%, n = 10.

When the percentage of query transactions is increased to Q = 100%, all of the transactions are read transactions. There are no data conflicts and no deadlines are missed when U ≤ 1 during traditional query processing. When U > 1, the set of tasks is no longer schedulable and the percentage of missed deadlines during TQP increases from 12% at U = 1.1 to 35% at U = 1.5. No deadlines are missed during AQP when Q = 100% for any value of U; approximate answers are produced instead. The percentage of approximate answers provided increases from 2.82% at U = 1.0 to 39.4% at U = 1.5.

Fig. 7 summarizes the effect of the percentage of query transactions on the percentage of missed deadlines and approximate answers during traditional and approximate query processing for U = 80%. More deadlines are missed during TQP than during AQP. For smaller values of Q, the percentage of missed deadlines is similar for both traditional and AQP. As Q increases, the percentage of missed deadlines decreases for both AQP and TQP. The percentage of approximate answers increases slightly at Q = 50%, because when there are more read transactions, more approximate answers can be provided than for Q = 20%. There is also less overhead for Q = 50% than for Q = 80% because there are more update transactions, which have a lower overhead. When Q = 100%, all of the transactions are read transactions. While no deadlines are missed when Q = 100%, during AQP approximate answers are produced due to the overhead of AQP.

Fig. 7. Increase in queries, U = 80%, n = 10.

4.3. Number of tasks

When the number of tasks is decreased to n = 5 and Q = 50%, there is an overall increase in the percentage of missed deadlines for both TQP and AQP compared to n = 10. However, the percentage of missed deadlines during AQP decreases to an average of 74% of the percentage of missed deadlines during TQP. The percentage of approximate answers is larger than the percentage of missed deadlines at a smaller total utilization value (U = 1 instead of U = 1.1) for Q = 80%. As n is increased to 20 tasks and Q = 50%, there is an overall decrease in the percentage of missed deadlines for both traditional and approximate query processing. However, the percentage of missed deadlines during AQP increases to an average of 90% of the missed deadlines for TQP. This occurs because as the number of tasks increases, there is an increase in the total overhead due to AQP as U increases. This causes more update transactions to miss their deadlines, although the percentage of missed deadlines still remains less during AQP than TQP. As Q increases, more approximate answers are produced than missed deadlines. As illustrated in Fig. 8, the percentage of approximate answers produced begins to increase considerably and the percentage of missed deadlines decreases during AQP when U = 1 and Q = 80%.

Fig. 8. n = 20, Q = 80%.

Fig. 9 summarizes the effect of the number of transactions on the percentage of missed deadlines and approximate answers during AQP and TQP for Q = 50% and U = 0.9. As illustrated in Fig. 9, as the number of transactions increases, the percentage of missed deadlines and the percentage of approximate answers during AQP decrease.

Fig. 9. Number of transactions, Q = 50%, U = 0.9.

4.4. Scheduling algorithm

When the RM scheduling algorithm is used, there is little difference between the percentages of missed deadlines and approximate answers compared to EDF for smaller total utilization values. The average percentage of missed deadlines during AQP compared to TQP is similar to that using the EDF algorithm for Q = 20%, 50% and 80%. For larger values of U and all values of Q, the ratio of missed deadlines to approximate answers during AQP is larger for the RM than for the EDF algorithm. Fig. 10 illustrates the results for RM and EDF when Q = 80%.

Fig. 10. RM vs. EDF, Q = 80%, n = 10.

4.5. Task utilization

In order to examine the effect of the task utilization on missed deadlines, all of the task execution times are set equal to a constant, so that tasks with shorter periods have higher utilizations. As illustrated in Fig. 11, almost no deadlines are missed for either traditional or approximate query processing when U < 1 and EDF is used. (We assume EDF for the remaining figures.) When U = 1 there is an increase in the percentage of missed deadlines and approximate answers during AQP. Because all of the execution times are equal, there are no transactions with longer execution times and lower AQP overhead. Hence, the overall overhead of AQP is greater for equal task execution times, and more missed deadlines and approximate answers result at higher utilization values.

Fig. 11. Constant task utilization, Q = 80%, n = 10.

As another variation in task utilizations, random task utilizations are assigned. Fig. 12 illustrates that there is a decrease in the percentage of missed deadlines for random utilizations during AQP versus TQP, for U < 1 and Q = 50%. This is also true for Q = 20% and 80%. The percentage of approximate answers is less for random utilizations than for equal utilizations for Q = 20% and 50%. At Q = 80%, the ratio of approximate answers to missed deadlines during AQP is larger for random utilizations. When U = 1 and Q = 80%, the percentage of approximate answers during AQP is higher for random task utilizations compared to equal utilizations.

Fig. 12. Random task utilization, Q = 50%, n = 10.

4.6. Period ratio

As illustrated in Fig. 13, when the period ratio b is decreased to 20, with Q = 50% and equal task utilizations, the percentage of missed deadlines and approximate answers decreases when U = 0.9. The percentage of missed deadlines during AQP is 91% of the missed deadlines during TQP for b = 20. This is an increase from 82% when b = 50. When U = 1, the ratio of the sum of the missed deadlines and approximate answers during AQP to the missed deadlines during traditional processing is also greater for b = 20 than for b = 50: 1.81 and 1.28, respectively. This is because the period ratio is smaller and the task utilizations are equal, so the periods of the tasks and their execution times are closer when b = 20. As discussed above, when all of the periods and execution times are equal in value there are no transactions with longer execution times, and thus less overhead due to AQP. Hence, more missed deadlines and approximate answers result.

When the period ratio is increased to 100, the percentage of missed deadlines during AQP decreases to 76% of the missed deadlines during TQP for b = 100 and U = 0.9. When U = 1, the ratio of the sum of the missed deadlines and approximate answers during AQP to missed deadlines during traditional processing decreases to 1.15 for b = 100. The results in Fig. 13 illustrate that as the period ratio increases, the percentage of missed deadlines and the sum of the missed deadlines and approximate answers during AQP decrease compared to TQP.

Fig. 13. Period ratio, Q = 50%, n = 10, U = 0.9.
4.7. Overhead

The overhead for AQP is determined by various factors, as described in Section 3.2 and (Vrbsky and Jukic, 1995). In order to examine the effect of the overhead on the percentage of missed deadlines, the overhead values are increased to 15%, 6%, 3% and 2%. As illustrated in Fig. 14, the percentage increase in missed deadlines and approximate answers is larger compared to the lower overhead values as U increases for Q = 50%. When Q = 80% there is a considerable increase in the percentage of approximate answers for U = 1. For Q = 20%, the results are similar for both sets of overhead values.
Fig. 14. Overhead, Q = 50%, n = 10.
4.8. Aggregates

As illustrated in the above results, the sum of the percentage of missed deadlines and approximate answers during AQP is greater than the percentage of missed deadlines during TQP. However, results presented in (Jukic and Vrbsky, 1996) indicate that when processing aggregates, such as MIN and MAX, not all of the data needs to be processed in order to obtain an exact answer, due to the semantic information maintained by AQP. (The execution times of aggregate transactions are decreased by an average of 70%.) The semantic information is used to eliminate classes from retrieval and processing as follows. During AQP of an aggregate, such as MIN, AQP maintains a range of values as the approximate answer. This range of values always contains the exact answer and, as query processing continues, the approximate answer converges to the minimum value that is the exact answer. During AQP, if the semantic information maintained by AQP for a class indicates that the smallest value of that class is greater than the largest value of the current approximate answer, that class cannot possibly contain the minimum value and does not need to be retrieved and processed.

Aggregates were included in our study to determine whether AQP can result in fewer missed deadlines than TQP. Fig. 15 illustrates that including aggregates decreases the percentage of missed deadlines and approximate answers during AQP. The sum of the percentage of missed deadlines and approximate answers during AQP is less than the percentage of missed deadlines during TQP. This is because during TQP, all of the data must be retrieved and processed for an aggregate. In Fig. 15, the results are shown when the tasks with the shortest execution times are assumed to be aggregates (SUM Short) and when the tasks with the longest execution times are assumed to be aggregates (SUM Long).

Fig. 15. Aggregates, Q = 80%, n = 10.
Even though the tasks with shorter execution times have a higher overhead due to AQP than tasks with longer execution times, the most dramatic decrease in the percentage of missed deadlines and approximate answers occurs when the tasks with the longest execution times are assumed to be the aggregates. This is because the impact of eliminating classes from data retrieval and processing is greater for tasks with the longest execution times.
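The following sketch encodes this class-elimination idea for MIN; the per-class minima stand in for the stored domain information, and all names are assumptions.

```python
# MIN under AQP: the answer is kept as a range [lo, hi] that always contains
# the exact MIN; a class whose stored minimum exceeds hi is skipped entirely.

def approximate_min(classes, class_min):
    """classes: name -> values; class_min: stored per-class minimum
    (the semantic information maintained by the approximate query processor)."""
    hi = float("inf")                     # current approximate answer: [lo, hi]
    lo = min(class_min.values())          # the exact MIN can never be below lo
    for name, values in classes.items():
        if class_min[name] > hi:
            continue                      # class cannot contain the minimum
        hi = min(hi, *values)             # processing the class tightens hi
        if hi == lo:
            break                         # range collapsed: answer is exact
    return hi

classes = {"A": [9, 12], "B": [4, 7], "C": [15, 20]}
mins = {"A": 9, "B": 4, "C": 15}
print(approximate_min(classes, mins))     # 4; class C is never retrieved
```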
5. Conclusions

AQP has been presented as a strategy to satisfy the timing constraints of real-time databases. When a deadline cannot be met, during AQP approximate answers are provided by the deadline rather than no answer at all. In order to produce the approximate answers, an overhead is required. In this paper we have examined the performance of AQP by comparing the percentage of missed deadlines during AQP to the percentage of missed deadlines during traditional query processing. Database transactions are either updates, which can miss deadlines, or queries, which can provide approximate answers. Results have demonstrated that by providing approximate answers, predictability for real-time databases is increased because AQP allows more transactions to meet their deadlines despite the overhead. Fewer transactions miss their deadlines during AQP.

The percentage of missed deadlines during AQP compared to traditional query processing was less for fewer transactions, equal task utilizations, larger period ratios and smaller total utilizations. The percentage of missed deadlines during AQP decreases as the percentage of query transactions increases. For the EDF scheduling algorithm and 50% query transactions, the percentage of missed deadlines during AQP was an average of 79% of the missed deadlines during traditional query processing. As the total utilization increased, the percentage of missed deadlines during AQP increased also, but for most of the results it remained less than the percentage of missed deadlines during TQP.

The sum of the percentage of missed deadlines and approximate answers during AQP is always greater than the percentage of missed deadlines during TQP due to the overhead of AQP (except when aggregates are included). For smaller total utilization values, the sum is only slightly larger. As the total utilization increases to 1, there is an increase in the percentage of missed deadlines and approximate answers produced because the overhead of AQP causes the utilization to exceed the feasible value. More deadlines were missed during AQP than during TQP when the total utilization value was 1 for smaller period ratios and larger percentages of update transactions. More deadlines were also missed during AQP than during TQP for larger total utilizations when transactions with shorter periods had longer execution times. Fewer approximate answers and more missed deadlines occurred at larger total utilization values for the rate monotone than for the earliest deadline first scheduling algorithm.

If there is an overload in the system, causing the total utilization to exceed the feasible utilization value for TQP, AQP was shown to miss fewer deadlines than TQP. This was particularly dramatic when the total utilization of the tasks was greater than 1 and more of the transactions were queries than updates. For example, the percentage of missed deadlines during AQP was an average of 26% of the missed deadlines during TQP when the percentage of query transactions was 80%.

Real-time databases require the maintenance of temporal consistency constraints as well as deadlines. In the future we will examine the effect of AQP on the temporal consistency of real-time databases. The effect of AQP on the age of the data will be compared to traditional query processing. We will also examine the effect of AQP on the percentage of missed deadlines when optimistic concurrency control algorithms are used instead of two-phase locking.

Acknowledgements

This material is based upon work supported by the National Science Foundation under Grant No. IRI-9409694.

References

Bestavros, A., 1996. Advances in real-time database systems research. SIGMOD Record 25, 2–7.
Chu, W.W., Lee, R.-C., Chen, Q., 1991. Using type inference and induced rules to provide intensional answers. In: Proceedings of the Seventh International Conference on Data Engineering, pp. 396–403.
Chung, J.Y., Liu, J.W.S., Lin, K.J., 1990. Scheduling periodic jobs that allow imprecise results. IEEE Transactions on Computers 39, 1156–1174.
Durrans, S.R., Tomic, S., Nix, S.J., 1997. An evaluation of data needs to support flood frequency estimation at regulated sites. In: Proceedings of the XXVII International Association for Hydraulic Research Congress, pp. 494–499.
Hammer, M., McLeod, D.M., 1990. Database description with SDM: A semantic database model. In: Zdonik, S.B., Maier, D. (Eds.), Readings in Object-Oriented Database Systems. Morgan Kaufmann, Los Altos, CA.
Jukic, N., Vrbsky, S.V., 1996. Feasibility of aggregates in time-constrained queries. Information Systems 21, 595–614.
Lehoczky, J.P., Sha, L., Strosnider, J., 1987. Enhancing aperiodic responsiveness in a hard real-time environment. In: Proceedings of the Eighth IEEE Real-Time Systems Symposium, pp. 261–270.
Lin, K.J., Natarajan, S., Liu, J.W.S., 1987. Imprecise results: utilizing computations in real-time systems. In: Proceedings of the IEEE Real-Time Systems Symposium, pp. 210–217.
Liu, C.L., Layland, J.W., 1973. Scheduling algorithms for multiprogramming in a hard real-time environment. Journal of the ACM 20, 46–61.
Peckham, J., Maryanski, F., 1988. Semantic data models. ACM Computing Surveys 20, 153–189.
Ramamritham, K., 1993. Real-time databases. International Journal of Distributed and Parallel Databases 1, 199–226.
Sha, L., Rajkumar, R., 1988. Concurrency control for distributed real-time databases. SIGMOD Record 5, 82–98.
Smith, K.P., Liu, J.W.S., 1989. Monotonically improving approximate answers to relational algebra queries. In: Proceedings of COMPSAC '89, pp. 234–241.
Song, X., Liu, J.W.S., 1995. Maintaining temporal consistency: pessimistic vs. optimistic concurrency control. IEEE Transactions on Knowledge and Data Engineering 7, 786–796.
Ulusoy, O., Belford, G., 1993. Real-time transaction scheduling in database systems. Information Systems 18, 559–580.
Vrbsky, S.V., Jukic, N., 1995. Analysis of approximate query processing overhead. In: Proceedings of the ISMM Intelligent Information Management Systems, pp. 21–25.
Vrbsky, S.V., Liu, J.W.S., 1993. APPROXIMATE: A query processor that produces monotonically improving approximate answers. IEEE Transactions on Knowledge and Data Engineering 5, 1056–1068.
Vrbsky, S.V., Liu, J.W.S., 1995. Producing approximate answers to database queries. In: Natarajan, S. (Ed.), Imprecise and Approximate Computations. Kluwer Academic Publishers, Dordrecht.
Vrbsky, S.V., 1996. A data model for approximate query processing of real-time databases. Data and Knowledge Engineering 21, 79–102.

Susan V. Vrbsky is an Assistant Professor of Computer Science at the University of Alabama, Tuscaloosa, AL. She received her Ph.D. in Computer Science in 1993 from the University of Illinois, Urbana-Champaign, and an M.S. from Southern Illinois University, Carbondale, IL. Her research interests include real-time database systems; temporal, object-oriented and federated database systems; and database security.

Sasa Tomic is a graduate student in the Computer Science Department and the Civil and Environmental Engineering Department at the University of Alabama, Tuscaloosa, AL. He received the B.S. and M.S. degrees in Civil Engineering from the University of Sarajevo, Sarajevo, Bosnia and Herzegovina in 1991, and the University of Alabama, Tuscaloosa in 1995, respectively. He is currently working on his Ph.D. dissertation in Civil Engineering and an M.S. thesis in Computer Science. His research interests include computer modeling of hydrologic processes, temporal and real-time databases, and distributed systems.