Data & Knowledge Engineering 3 (1988) 49-61 North-Holland
Issues in relational database performance (invited paper ISDBMS)

David A. BELL

University of Ulster at Jordanstown, Shore Road, Newtownabbey, County Antrim BT37 0QB, Northern Ireland

Abstract. This paper presents some of the most notable recent contributions to the subject of relational database performance. The major topics addressed are access path selection and implementation, data clustering, buffer management, and query optimisation, and some work in these areas in the author's laboratory is briefly outlined.
Keywords. Access path selection and implementation, Data clustering, Buffer management, Query optimisation.
1. Introduction
Database performance evaluation is not an easy subject to write about. The problem is basically that although there is a huge amount of accumulated knowledge in this area, it tends to be spread over many technical papers, research reports, user manuals and a great variety of published papers and books. To treat the topic exhaustively is clearly beyond the scope of a paper such as this. The best we can hope to do is to present a systematic approach to the subject and to indicate where research on some of the most conspicuous issues is moving. The choice of the "most conspicuous issues" is clearly an idiosyncratic one, but the issues treated here are intended to be those whose importance is fairly self-evident and whose selection is unlikely to be too controversial.

Before indicating what is meant by "database performance", there are two measures which must be acknowledged as database "success indicators" but which are not considered here to be "performance". One is database system effectiveness: the extent to which a data system provides flexibility of access to data while securing the consistency and minimising the redundancy of data. This is what the vendors of database system software promise, and we assume that it is provided. The other measure which is excluded here is usability: the ease of invocation of the system's capability--how "naive" a user of the system is allowed to be while still being able to request sophisticated retrievals. For the present purposes we also take this as granted.

A system's value depends on how well it accomplishes its functions in order to provide an effective and usable service. This is what is meant by performance here, and it is assumed to be characterised by three fundamental indicators: responsiveness, productivity and resource utilisation. Obviously these performance indices depend on lower-level, "internal", indices such as lengths of queues for facilities, paging rates and buffer pool statistics.

The next section of this paper provides a brief systematic view of database performance. A fuller discussion is presented elsewhere [2]. The subsequent sections focus on a number of issues currently being actively researched in the author's laboratory and elsewhere.
2. Database systems

Database systems have built-in contention because they are sharing systems, and therefore many, often conflicting, applications with perhaps incompatible demands can be expected to arise. Performance optimisation is clearly desirable in a world with limited resources, including time, but the database system cannot be tailored to any one application--compromise is needed. Some method of ranking applications is required to help in conflict resolution, and privileges must be granted accordingly. However, suboptimisation, where one application's needs are exclusively met at the expense of others', must be avoided. Tolerances in the requirements for the higher-ranked applications should be exploited.

As well as suboptimisation between applications, there can be suboptimisation between system modules. It is very easy to lose the global perspective when tuning a particular module. The interworking and "pipe-lining" of modules must be clearly borne in mind by the system architect. Bottlenecks can occur in various parts of the system, and there is a fascinating relationship between the design of system parameters, the work-load addressed to the system, and performance. The system parameters include the timing, capacity and architecture of hardware, the physical and logical structure of the database, and the algorithms used in the application and system software. The latter include buffer management algorithms, techniques for query optimisation, and data storage and access strategies and structures (such as data placement and indexing mechanisms).

These are related closely to decisions made by the database designer--for example, it is possible to include some design decisions, such as index selection or data placement details, as functions provided by the system software. The alternative is to decouple these design issues from the database management system (DBMS) and include them in a separate design tool. An example of the latter approach is the EOS system for CODASYL database design [72], which has resulted in a commercial product, DBAP [20]. Our approach resembles this closely, but we consider relational databases only and include reorganisation in our design. An alternative approach, which we have not yet finally discarded, is to use an Expert-System based approach. This was illustrated in a commercial setting by Kao [40], who produced the DECIDES system. In this "the knowledge of the DBA was incorporated... in the form of about 150 rules representing the questions typically asked... and the decisions made...". Like EOS this was based on a layered approach to database design, and we also adopt this design approach.

In this paper we do not consider whether or not the performance-impacting packages we discuss are in the DBMS. However, we distinguish between design tools for the current generation of relational DBMS's and tools for more futuristic systems. The reason for this distinction is that current or imminent commercial systems give little opportunity to select or tune modules or algorithms for query optimisation, access methods, placement methods or buffer management. The distinction is made in a project we are conducting at present, in which we intend to provide a design tool for the "available" DBMS's and another which assumes functionality not offered in current packages [73]. We hope that this may ultimately influence the design of the next generation of DBMS's.
The remainder of this paper discusses a number of issues of special challenge and fundamental importance to database performance. We treat them individually but realise that they interact in a way which is not understood fully at a general level. The intention is to be thought-provoking by showing some approaches to these problems rather than by providing original research results.
3. Access path implementation and selection

The data in a database is stored physically in accordance with some organisation method on storage cells of a computer system. The physical access method, PAM, upon which any given organisation is based specifies a particular relationship between the cells. To satisfy a transaction on the database, an ordered array of cells has to be visited, forming a physical access path. Choosing PAMs from among the many contenders is one of the most fundamental database system design problems.

Zezula and Zizka [80] have pointed out that many important performance-prediction subsystems, such as those for access path selection, query optimisation and record design, make over-simplifying assumptions about the PAMs available. These assumptions limit the practical applicability of the subsystems--finer tuning of their relationships with the PAMs is required. Zezula and Zizka [80] present a File Organisation Performance Evaluation System, FOPES, which is an analytical tool implemented in Pascal for predicting the performance of individual file organisations. Separability Theory [75] is invoked to permit the decomposition of the database into sub-objects which can be optimised individually.

Researchers have identified many different issues in PAM design and selection, but it is possible to identify two major approaches, viz. primary-key based methods and multi-attribute methods. Most of the methods one finds discussed today are of the former kind, but increasingly multi-attribute methods are appearing. An example of the multi-attribute method is the Grid File [58], and a number of partial match techniques have been suggested
[51].

Traditionally, the methods available for data access have had a crude and expensive overflow-based method of handling situations where too many records "belong" to a particular set of storage cells. The problem is that as overflow chains lengthen, the time to get a particular tuple directly tends towards that needed for a scan of the file. To service applications requiring large volumes of insertions and deletions, dynamic file methods have been designed. These accommodate expansion and contraction gracefully without degrading performance too much. B-trees are the best-known example of this type of structure, but many extensible hashing schemes seeking to improve on B-trees have appeared this decade. Well-documented examples are Dynamic Hashing [47], Extendible Hashing [25], Trie Hashing [49], and Hash Trees [21]. Such methods have been categorised [3], and it is clear that there is no single best method for all sorts of access profiles, application scenarios and value distributions. The discussion which follows tries merely to give an appreciation of the diversity of recent approaches.

The problem with B-trees is that we need large page sizes for low retrieval cost. An exploration of file size:index size ratios for real systems shows that we could easily need about 116 K for single access retrieval [50]. Doubling the page size from, say, 4K to 8K bytes doubles the ratio--but we need a much greater increase. Page sizes are increasing, but this problem should be tackled in other ways also. We would like to identify some technique of increasing the file size:index size ratio while keeping the unit of data transfer (page size), utilisation and overflow under control. One approach, used in a number of the methods listed above, is to combine indexing and hashing methods in a single PAM.

Recent developments of extensible key-based methods include combining the home and overflow areas of the files, and a method of "partial expansion" was introduced [48] so that any number of insertion and deletion updates can be made efficiently without periodic file reorganisation. Spiral Storage [57] and Bounded Index Exponential Hashing [52] are similar developments which give stable average performance when the storage demand fluctuates.
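To make the flavour of these dynamic schemes concrete, here is a minimal sketch of Extendible Hashing [25]: a directory of 2^d pointers addresses buckets by d bits of a key's hash, and a full bucket is split (doubling the directory when necessary) instead of growing an overflow chain. The bucket capacity, class names and the use of Python's built-in hash function are illustrative assumptions, not details of any cited implementation.

```python
# Minimal sketch of Extendible Hashing: split buckets instead of chaining.
BUCKET_CAPACITY = 4

class Bucket:
    def __init__(self, local_depth):
        self.local_depth = local_depth   # hash bits this bucket "owns"
        self.keys = []

class ExtendibleHashFile:
    def __init__(self):
        self.global_depth = 1
        self.directory = [Bucket(1), Bucket(1)]   # 2^1 entries

    def _index(self, key):
        # low global_depth bits of the hash choose a directory slot
        return hash(key) & ((1 << self.global_depth) - 1)

    def search(self, key):
        return key in self.directory[self._index(key)].keys

    def insert(self, key):
        bucket = self.directory[self._index(key)]
        if len(bucket.keys) < BUCKET_CAPACITY:
            bucket.keys.append(key)
            return
        # bucket overflow: split it rather than growing an overflow chain
        if bucket.local_depth == self.global_depth:
            self.directory = self.directory * 2     # double the directory
            self.global_depth += 1
        new_bucket = Bucket(bucket.local_depth + 1)
        bucket.local_depth += 1
        # slots whose distinguishing bit is 1 now point at the new bucket
        for i, b in enumerate(self.directory):
            if b is bucket and (i >> (bucket.local_depth - 1)) & 1:
                self.directory[i] = new_bucket
        # rehash the overflowing bucket's keys, then retry the insert
        old_keys, bucket.keys = bucket.keys, []
        for k in old_keys + [key]:
            self.insert(k)

f = ExtendibleHashFile()
for k in range(40):
    f.insert(k)
print(f.global_depth, f.search(7))   # directory has grown; 7 is found
```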
The main advantage of the Hash Trees method [21] was that the key-address link in secondary indexes was broken. Keys are associated with "surrogates" so that merging or splitting of relation fragments does not cause updates of secondary indexes. Similar ideas have appeared more recently in other methods [19, 50]. A recent contribution [10] enhances the Hash Trees hashing method under particular circumstances, using information in the key.

Garg and Gotlieb [15] present ways of constructing key transforms which preserve the key sequence. This removes the need to store an index to provide direct and sequential access to tuples. They make non-uniformly distributed key spaces "piecewise uniform" (not to be confused with the same term used in Section 6), and the results are uniformly distributed as well as order-preserving, with small overheads. This is clearly a useful contribution to Extendible Hashing [25], for example, which is enhanced by an order-preserving hash method. Few currently available DBMS's have incorporated these techniques.

Relational languages are set based and therefore are somewhat at odds with key-based access methods. It would be very nice to see a package which incorporated some of the key-based methods in, for example, the Predicate Tree method [19]. This method permits the designer to define set clustering, and its advocates consider it a generalisation of some of the methods already mentioned. Each relation has a predicate tree of its own--the predicates branch out from the root, and the leaves are the logical intersections of the predicates associated with their branches. All contiguous sub-trees with the same father are clumped together in a single leaf, and it is assigned an appropriate number of cylinders. The resulting "clustering tree" for a given relation is represented as a relation, linking a signature prefix "describing the path to the leaf from the root, with its address". Values specified in a query are hashed to get signature envelopes indicating the partitions holding the pertinent data. Like Hash Trees, a level of independence between secondary key and address is provided by using signatures instead of addresses. Predicate Trees offer a method of finding which parts of a relation are likely to contribute to a join. Clearly, clustering on common attributes enhances the join evaluation. Pre-joining in the directory and parallel execution of joins is possible.

Since a file contributing to a database will typically have several attributes appearing in queries, secondary indexes are very important. Partial matching is also very important because exact matching is not always possible. It would be nice if the extensible methods could be extended to provide this facility. Kelley and Rusinkiewicz [41] present a design and describe an implementation for an addressing scheme in which multi-key hashing and multi-attribute pseudo-keys are used for relational database access (a small sketch of the idea follows). Another contribution [61] in this area is essentially a multi-dimensional version of Dynamic or Linear Hashing. Lloyd and Ramamohanarao [51] discovered that although Linear Hashing gives better retrieval time than Extendible Hashing for single keys, this is reversed for partial matching. Yuen and Du [79] explored this further. KD Trees [12] are associative file structures which are also claimed to improve on "flat" file structures by allowing partial, best match and nearest-neighbour matching and range searching as well as exact matching.
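The following is a small sketch of the multi-key hashing idea, in the spirit of (though not copied from) the scheme of Kelley and Rusinkiewicz [41]: each attribute contributes a fixed number of bits to a composite pseudo-key, so a partial-match query fixes the bit fields of its known attributes and probes only the buckets that agree with them. The attribute names and bit widths are invented for the example.

```python
# Multi-key hashing with a composite pseudo-key for partial-match queries.
from itertools import product

BITS = {"dept": 3, "city": 2}            # bits contributed per attribute
FIELDS = list(BITS)

def pseudo_key(record):
    """Concatenate hash fragments of each attribute into one bucket number."""
    key = 0
    for f in FIELDS:
        key = (key << BITS[f]) | (hash(record[f]) & ((1 << BITS[f]) - 1))
    return key

def candidate_buckets(query):
    """Buckets that can hold tuples matching a partial-match query.
    Attributes present in the query fix their bit fields; absent ones
    range over all values of their field."""
    choices = []
    for f in FIELDS:
        mask = (1 << BITS[f]) - 1
        if f in query:
            choices.append([hash(query[f]) & mask])
        else:
            choices.append(range(1 << BITS[f]))
    buckets = []
    for parts in product(*choices):
        key = 0
        for f, part in zip(FIELDS, parts):
            key = (key << BITS[f]) | part
        buckets.append(key)
    return buckets

# a query specifying only dept probes 2^2 = 4 of the 2^5 = 32 buckets
print(candidate_buckets({"dept": "sales"}))
```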
An interesting file structure comparison was conducted by Beckley et al. [13] when, as others have noted, they found that no single file structure was best for all query types, hence explaining the need for good index and PAM selection methods.

The problem of selecting indexing attributes is to find a set of indices which minimises the cost of processing transactions on a database. Finding the set of attributes for which indices are to be constructed is not easy, and it has cost-performance tradeoffs. Presumably the indices enhance the efficiency of retrieval by providing fast access to data objects addressed in queries, but, since they are redundant data representations, they have a negative effect on storage utilisation and on updates. So the extreme solution of indexing on all attributes (a fully inverted file) is very costly. Great care is needed in the selection of
attributes for indexing; the subset of keys chosen must be small and efficient. We consider only secondary keys.

Hoffer [33] has discovered that typically database designers select indices based on crude rules-of-thumb and informal methods. Less than 3% of his sample were systematic in their selection methods. Reindexing was common because of poor initial choices. When all the details of the cost equations for multi-attribute retrievals and updates were enumerated, the complexity of the problem became apparent [78]. This was formalised by Piatetsky-Shapiro [62]. Heuristic methods for problem reduction are therefore attractive. Schkolnick [68] and Hammer and Chan [31] made useful contributions. Hoffer and Kovacevic [34] extended Schkolnick's branch-and-bound search technique, but the method was still restricted to single files. This is the case with most past work, and the problem of accounting for clustering (on the primary key) is still outstanding.

Whang [76] presents an algorithm which extends the capability in these directions (in the latter case using the separability property). It is similar to that of Hammer and Chan [31], but instead of starting with no indexes and adding new ones, it starts with full indexing and proceeds by dropping selected ones (a sketch of this drop heuristic is given below). Since the problem is NP-complete [62], the results obtained by these algorithms cannot be guaranteed optimal. Whang's method was found to be optimal in many cases where the index-adding heuristic gave suboptimal results. It should be noted that Whang considered only the accessing/maintenance cost of indexes (not storage), but this can be included in the maintenance cost if required. This is probably the most important of the index selection algorithms currently available.
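A minimal sketch of such a drop heuristic follows, under the assumption of a caller-supplied cost function (real systems would derive one from workload and catalog statistics). It illustrates the general approach of starting from a fully inverted file and dropping indexes while that reduces total cost; it is not Whang's published algorithm.

```python
# Greedy drop heuristic for index selection: start fully indexed, then
# repeatedly drop the index whose removal lowers total cost the most.
def drop_heuristic(attributes, workload_cost):
    """attributes: candidate index attributes.
    workload_cost(index_set): estimated total (access + maintenance) cost
    of the workload when exactly the indices in index_set exist."""
    current = set(attributes)              # full indexing to start
    current_cost = workload_cost(current)
    improved = True
    while improved and current:
        improved = False
        best_drop, best_cost = None, current_cost
        for a in current:                  # try dropping each index in turn
            c = workload_cost(current - {a})
            if c < best_cost:
                best_drop, best_cost = a, c
        if best_drop is not None:
            current -= {best_drop}
            current_cost = best_cost
            improved = True
    return current

# illustrative cost model: each index speeds queries on its attribute but
# adds update overhead (the numbers are invented for the example)
QUERY_BENEFIT = {"name": 50, "dept": 30, "salary": 5}
UPDATE_COST_PER_INDEX = 12

def toy_cost(index_set):
    scans = sum(b for a, b in QUERY_BENEFIT.items() if a not in index_set)
    return scans + UPDATE_COST_PER_INDEX * len(index_set)

print(drop_heuristic(QUERY_BENEFIT, toy_cost))   # -> {'name', 'dept'}
```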
4. Placement of data
It is desirable for pairs of data items which are required consecutively on some query access path to be stored physically "near" to each other. As the number of queries increases, the complexity of arranging for this becomes clear, and so usually only the most prominent queries are privileged to be considered for "optimisation" of their placements. Clearly, automated design aids are called for to supplement the human designer's skill and experience.

Work in this area extends downwards to theoretical work on the Consecutive Retrieval Property [28], which requires that tuples pertinent to a query be stored in consecutive storage locations. Further idealised work was done by Yue and Wong [77], where only the frequency of access was considered, giving the well-known organ-piping arrangement as a solution. Eswaran [24] and Bell [4] have demonstrated the inherent difficulty of many more realistic placement problems, including that outlined above. Hence many of the proposed solutions are heuristic and may give sub-optimal results.

The solutions are not as extensive as one might have expected. Early work was done by Flory [27], who presented a technique to minimise first the number and then the length of inter-cluster queries for inverted files. Jakobsson [36] minimised the scattering of attribute values. Schkolnick [67] partitioned IMS "type trees" using a linear time optimising algorithm. For more general network data structures, the need to have tuples clustered near many other tuples of a single relation and then near many other tuples in other relations has led to a search for heuristic placement algorithms which can be invoked as data is being loaded into the database.
A linear time algorithm was suggested by Malmquist et al. [53] to collapse a weighted directed graph representing a CODASYL database, by continually following a policy of minimising the total weight of the edges, which represent the inter-record links initially. As the algorithm executes, the edges come to represent the associativity between the distinct partitions, which are constrained to be page sized. Bell [5] extended this method to a multi-layer case in which the pages themselves could be clustered (e.g. on disks), these new clusters could be clustered, and so on. Using simulation studies it was shown that the two-level version, which kept the O(n) time complexity function, improved substantially on the one-level version. Subsequently [8] further experiments confirmed this result for a wider selection of cases, but it was discovered that when the fanout between the nodes was high the advantage was less significant than for lower fanouts.

First the database is expressed as a graph G(V, E), like that of Malmquist et al., where V is the set of tuples and E is the set of edges as defined above. The edges are weighted in accordance with their "importance" [6]. The nodes are really pages, assigned one tuple each initially. Each page is also assigned to a cylinder initially, and both pages and cylinders (and any higher level storage units) fill up or are removed from consideration.

begin
  express E as a queue Q in weight order
  while Q is non-empty
  begin
    for the first edge e in Q
      if the corresponding page occupancies are small enough
        then COLLAPSE
      else if the corresponding cylinder occupancies are small enough
        then CLUSTER
  end
  merge contents of pages which can go on a single page
  merge contents of cylinders which can go on one cylinder
end

The details of COLLAPSE, for merging the nodes connected by an edge onto one page, and of CLUSTER, for merging the contents of the cylinders holding these nodes, can be found in [2].
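The following is a minimal runnable sketch of the single-layer version of this graph-collapsing heuristic. The page capacity, tuple identifiers and edge weights are illustrative assumptions; the full two-level method and the COLLAPSE/CLUSTER details are in [2].

```python
# Single-layer graph collapsing: take edges in descending weight order and
# merge the endpoint clusters whenever the result still fits on one page.
PAGE_CAPACITY = 4          # tuples per page, for illustration

def graph_collapse(tuples, edges):
    """tuples: iterable of tuple ids; edges: list of (weight, t1, t2).
    Returns a list of pages, each a set of tuple ids."""
    cluster_of = {t: {t} for t in tuples}        # each tuple starts alone
    for weight, t1, t2 in sorted(edges, reverse=True):
        c1, c2 = cluster_of[t1], cluster_of[t2]
        if c1 is not c2 and len(c1) + len(c2) <= PAGE_CAPACITY:
            c1 |= c2                              # COLLAPSE onto one page
            for t in c2:
                cluster_of[t] = c1
    # return the distinct resulting pages
    return list({id(c): c for c in cluster_of.values()}.values())

# tuples linked by heavily weighted edges end up on the same page
edges = [(9, "a", "b"), (7, "b", "c"), (5, "d", "e"), (1, "c", "d")]
print(graph_collapse("abcde", edges))   # [{'a','b','c'}, {'d','e'}]
```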
Usually the heuristic algorithms one finds in the literature to solve problems such as that above are greedy, in that only forward, objective-improving steps are taken. Examples of algorithms for combinatorial optimisation problems which allow controlled backward steps are the MMC algorithm [55] and the Simulated Annealing (SA) algorithm based on it. These are unusual, however, in that they take account of the state of the search process at the time when accepting or rejecting the backward steps.

The MMC algorithm is a method of simulating the behaviour of particles in substances as they are melted and then carefully cooled in order to form crystals. When the temperature is high enough there is a chance that some particle will acquire enough energy to surmount an energy barrier and actually end up with lower energy than it had at the start. If the temperature is too high, thermal equilibrium will never be attained; if it is too low, freezing to a metastable state results. Simulated Annealing is based on MMC in that the melting and freezing are modelled. Temperature is modelled simply by a control variable. Kirkpatrick et al. [44] have performed experiments to show that SA is a good technique for such optimisation problems as the arrangement of chips in a ceramic. We decided to try it for data placement.

The structure of SA is summarised as follows by Mitra [56].

begin
  X := J0
  m := 0
  while (termination criterion is not met)
  begin
    while (inner loop criterion is not met)
    begin
      j := GENERATE(X)
      if (ACCEPT(D, Tm)) then X := j
    end
    Tm+1 := UPDATE(Tm)
    m := m + 1
  end
end

J0 is the initial configuration, D is the difference in cost between the new configuration, j, and the old one, X, and Tm is the mth value of the control variable representing temperature. In our case GENERATE(X) is a routine which randomly selects a tuple from X and a page to try to move it to. ACCEPT(D, Tm) returns the value "True" if the selected page has room for it and either D is negative, or r, a random number on (0, 1), is less than exp(-D/KbTm), where Kb is Boltzmann's constant. UPDATE is a routine to select the next temperature Tm+1 in the sequence. Initially Tm+1 = 0.9 Tm was used, but more sophisticated "annealing schedule" definitions are possible [56].

In [9] experiments to compare the Graph Collapsing (GC) and SA approaches were conducted. The GC method outperformed SA at low fanouts, but as the fanout was increased SA was the better. SA had better overall performance, but the maximum improvement was a mere 8%. A major disadvantage of SA was the high CPU time required to do the clustering: the CPU time for GC was in the range 2-55 minutes; the corresponding figures for SA were 3-16 hours! Similar heavy processing overheads were noted by other authors. The extent of these costs and benefits rules SA out as a contender for a practical physical design tool, and we will be incorporating a single-layer version of GC in our prototype.

Further research will be conducted into the costs and benefits of incorporating redundancy into the placement, and to consider when de-clustering [26] may be used in some circumstances of physical design. A further approach which should be considered is that of adaptive clustering [78]. This method allows sudden changes in access patterns to be detected and a new placement determined. It needs to be generalised to the multi-file case.
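A minimal sketch of this scheme applied to page assignment is given below. The cost function (total weight of edges cut by page boundaries), the capacities and the schedule constants are illustrative assumptions, and the physical constant Kb is folded into the temperature.

```python
# Simulated annealing for data placement: GENERATE proposes moving a random
# tuple to a random page; ACCEPT takes the move if it improves the cost or,
# with probability exp(-D/T), even if it worsens it; T decays by T := 0.9 T.
import math
import random

PAGE_CAPACITY = 4

def anneal(tuples, edges, n_pages, t0=10.0, inner=200, t_min=0.01):
    """edges: list of (weight, t1, t2). Cost = total weight of edges whose
    endpoints lie on different pages. Returns a tuple -> page assignment."""
    place = {t: i % n_pages for i, t in enumerate(tuples)}   # J0

    def cost(p):
        return sum(w for w, a, b in edges if p[a] != p[b])

    current = cost(place)
    temp = t0
    while temp > t_min:                      # termination criterion
        for _ in range(inner):               # inner loop criterion
            t = random.choice(list(place))   # GENERATE: pick tuple and page
            new_page = random.randrange(n_pages)
            occupancy = sum(1 for x in place.values() if x == new_page)
            if occupancy >= PAGE_CAPACITY:   # no room: reject outright
                continue
            trial = dict(place)
            trial[t] = new_page
            d = cost(trial) - current        # D: change in objective
            # ACCEPT: always if D < 0, else with probability exp(-D/T)
            if d < 0 or random.random() < math.exp(-d / temp):
                place, current = trial, d + current
        temp *= 0.9                          # UPDATE: annealing schedule
    return place

edges = [(9, "a", "b"), (7, "b", "c"), (5, "d", "e"), (1, "c", "d")]
print(anneal("abcde", edges, n_pages=2))
```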
5. Buffer management In modern computing systems, space is provided for temporary storage of some pages of information. The purpose of providing buffer space is to allow harmonisation of the diverse speeds of processor and peripherals. Generally a DBMS will provide private buffer space which is better tailored to the needs of database systems than the available operating system buffers. But this can degrade overall system performance unless particular care is taken with the design and control of the buffering activities [71].
The database buffer manager maps the database address space onto the buffer pool pages. Secondary storage is accessed only if data required for the current transaction is not present in the buffer. Clearly such paging activity is affected by any mismatch between the buffer manager and the operating system's page replacement algorithms. Typically a physical data access uses up tens of milliseconds of access time as well as time for the execution of several thousand CPU instructions.

The well-known least-recently-used (LRU) page replacement algorithm is generally acceptable, but it only works well for databases if there is locality of reference--a page has a good chance of being rereferenced while it is in the buffer [71]. So direct or cyclic access patterns are served poorly by this approach. Many operating systems "prefetch" automatically, which compounds the issue. Stonebraker [71] showed how a 10-15% improvement is promised by private database buffering. The database system has "knowledge" of access patterns which should be exploited in buffer managers. RTI's Ingres has incorporated such "intelligent" buffering in its most recent versions [63]. Some buffer managers permit the fixing and unfixing of pages in the buffer pool [22] so that they are not swapped out inadvertently. Clearly the fixing period needs to be short if thrashing is to be avoided. The least recently addressed page is no longer necessarily the same as the least recently referenced one, and so the replacement algorithm is affected.

Buffer management is a somewhat neglected topic for research compared to, for example, concurrency control, but recently there has been a raising of its profile. Three interesting contributions appearing recently in the literature are based respectively on the Hot-Set model (HS) (e.g. [65]), the query-locality-set model (QLSM) [18], and a new policy of buffering for nested-loop joins [32].

Hagmann [32] points out that conventional wisdom governing buffering for nested-loop joins would dictate that there should be room for all of the smaller relation, plus at least one page from the larger relation, to be accommodated in memory. He argues that modern hardware and software developments, with larger buffers (thousands of pages rather than a few pages) and the predicted continued dominance of access time as a component of total I/O time, coupled with the clustering of adjacent pages in files on the storage devices by operating systems, dictate that the number of I/Os should replace the number of pages transferred as a metric of performance for buffering. He shows that the best buffering policy for the nested-loop method, when no preselection is possible and all the tuples in a page are from the same relation, is to split the buffers evenly between the two relations if we "rock the relation through the buffer" [42] (a sketch is given below).
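A toy sketch of the "rocking" policy may make the saving clearer: after each chunk of the outer relation, the inner relation is scanned in the opposite direction, so pages still buffered at the turning point need not be read again. The single-page carry-over modelled here is a deliberate simplification for illustration, not Hagmann's analysis; the data, page layout and join predicate are invented.

```python
# Nested-loop join that "rocks" the inner relation through the buffer [42],
# counting the inner page reads actually needed.
def rocking_nested_loop_join(outer_pages, inner_pages, outer_buf):
    """outer_pages/inner_pages: lists of pages (lists of tuples).
    outer_buf: number of buffer pages given to the outer relation."""
    result, inner_reads = [], 0
    forward = True
    for i in range(0, len(outer_pages), outer_buf):
        chunk = outer_pages[i:i + outer_buf]          # fill outer buffers
        scan = inner_pages if forward else list(reversed(inner_pages))
        for j, inner_page in enumerate(scan):
            # the first inner page after reversing direction is still
            # buffered, so it costs no read (one-page carry-over model)
            if not (j == 0 and i > 0):
                inner_reads += 1
            for outer_page in chunk:
                for r in outer_page:
                    for s in inner_page:
                        if r[0] == s[0]:              # join on first column
                            result.append((r, s))
        forward = not forward                         # rock the direction
    return result, inner_reads

outer = [[(1, "r")], [(2, "r")]]
inner = [[(1, "s")], [(2, "s")], [(3, "s")]]
print(rocking_nested_loop_join(outer, inner, outer_buf=1))   # 5 reads, not 6
```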
Sacco and Schkolnick [65] have introduced a new model to characterise the buffer space needed by queries. Because the reference patterns are often known for database queries, the hot-set model can estimate buffer requirements before execution, and this makes the method useful for query optimisation. Also the method allows processes always to be allocated enough buffer pages, and so thrashing (page stealing) can be avoided. Obviously we do not want to leave too much space for a process either. The basic concept of this model is derived from the observation that a plotted curve of page faults against buffer size has a number of conspicuous discontinuities in it. These occur when a set of rereferenced pages does not fit in the buffer. Stable intervals are characterised by the buffer size at their lower boundary, called the hot point. Other buffer sizes in a stable interval use more resources but give no benefit in terms of fault reduction. Sacco and Schkolnick introduce hot-point identifying tools. This model improves on earlier models by exploiting the predictability of reference strings. But it is based on the LRU algorithm, which is unsuitable for certain looping patterns. Often it is the most-recently-used (MRU) page in a loop which should be replaced (the experiment below illustrates why). Two recent developments have been by Chou and DeWitt [18] and by Sacco [66].
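The following small experiment illustrates the point: with a loop of N pages and a buffer of N - 1 frames, LRU evicts exactly the page about to be needed and faults on every reference, while an MRU-style policy keeps most of the loop resident. The sizes are illustrative assumptions.

```python
# Count page faults for LRU and MRU on a cyclic reference pattern.
def count_faults(reference_string, frames, policy):
    buffer, n_faults = [], 0          # buffer is ordered oldest -> newest
    for page in reference_string:
        if page in buffer:
            buffer.remove(page)       # refresh recency on a hit
            buffer.append(page)
            continue
        n_faults += 1
        if len(buffer) == frames:
            # LRU evicts the head (least recent), MRU the tail (most recent)
            buffer.pop(0 if policy == "LRU" else -1)
        buffer.append(page)
    return n_faults

loop = list(range(5)) * 20            # scan pages 0..4 repeatedly
print("LRU faults:", count_faults(loop, 4, "LRU"))   # 100: every access faults
print("MRU faults:", count_faults(loop, 4, "MRU"))   # far fewer
```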
The query locality set model, QLSM [18], separates the model from the replacement algorithm. The associated algorithm, DBMIN, gives each file a local buffer pool to hold its set of buffered pages (its "locality set"), the size of which is calculated in advance and can be varied depending on the file access method used. The local pool can also have a tuned replacement policy (a sketch of the per-file pool idea appears at the end of this section). A simulation performance evaluation study was conducted and, with the synthetic database and query workload used, DBMIN gave modestly improved throughput compared to the hot-set algorithm, with lower overheads.

Sacco [66] investigated the value of the Hot Set model when a new replacement algorithm, ILRU, was used. In this case it was not the LRU page that was replaced, but there was an inverted list for each file, somewhat like MRU. This was found to provide advantages over the LRU policy.
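To fix ideas, here is a sketch of the per-file pool concept: each file gets a pool sized from its predicted locality set, with a replacement policy suited to its access pattern, and a query is admitted only if its locality sets fit, so no process can steal another's pages. This illustrates the concept only; it is not the published DBMIN algorithm, and the names and sizes are assumptions.

```python
# Per-file buffer pools with individually chosen replacement policies.
class FilePool:
    def __init__(self, locality_set_size, policy):
        self.size = locality_set_size   # frames reserved for this file
        self.policy = policy            # "LRU" or "MRU"
        self.pages = []                 # ordered oldest -> newest
        self.faults = 0

    def access(self, page):
        if page in self.pages:
            self.pages.remove(page)     # refresh recency on a hit
            self.pages.append(page)
            return
        self.faults += 1
        if len(self.pages) == self.size:
            self.pages.pop(0 if self.policy == "LRU" else -1)
        self.pages.append(page)

class BufferManager:
    """Admits a file only if its locality set fits in the free frames,
    so queries cannot thrash each other by stealing pages."""
    def __init__(self, total_frames):
        self.free = total_frames
        self.pools = {}

    def open_file(self, name, locality_set_size, policy):
        if locality_set_size > self.free:
            raise MemoryError("locality set does not fit; delay the query")
        self.free -= locality_set_size
        self.pools[name] = FilePool(locality_set_size, policy)
        return self.pools[name]

mgr = BufferManager(total_frames=10)
emp = mgr.open_file("emp", locality_set_size=4, policy="MRU")  # looped scan
for page in list(range(5)) * 3:
    emp.access(page)
print(emp.faults)   # 7 faults here; LRU would fault on all 15 accesses
```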
6. Query optimisation

Given a particular placement of data, a choice of access paths and buffering techniques, and assuming that the other necessary design decisions have been taken, the database can be implemented and queries addressed to it.

By providing tabular views of data, Relational Database Management Systems offer major advantages to database users who have only had procedural interfaces to data in the past. Using procedural languages they have had to define what has to be done to fetch the qualified data and also to control the logic flow of the program. Non-procedural languages allow the users to specify the result and do not require them to specify the actions needed to obtain it. Physical access paths, and ideally even logical access paths (see e.g. the approach of System/U [46] or deductive interfaces (e.g. [16])), are hidden from the end-user. The languages are physical- (and perhaps logical-) structure independent. The advantage of such an approach is increased productivity from programmers and easier "layman" access to databases.

This means, however, that the database system must be capable of specifying a strategy for evaluating any query. Since there are many such strategies possible for even a modestly complex query, much attention has been paid in recent years to techniques for optimising the evaluation of non-procedural queries. This is intended to enable the performance of query evaluation to match its flexibility. Good overviews of these techniques are presented by Ullman [74] and Jarke and Koch [37] for centralised databases, and by Ceri and Pelagatti [16] and Bell et al. [7] for distributed databases.

Recently the power of relational query languages has been enhanced to allow 'second-order operations' such as transitive closure and aggregation, and a whole new set of query optimisation techniques has been introduced (see Bancilhon and Ramakrishnan [1] for a survey). There appears to be considerable scope for improvement of the performance of queries on deductive databases by executing them on parallel devices and perhaps using concurrent languages. This raises a number of additional issues and alternative approaches to those encountered for non-deductive databases [70].

In the rest of this section, however, we restrict our attention to the optimisation of queries in conventional relational databases and list some areas receiving the attention of researchers in this field. These have been used to classify modern algorithms [11]. We assume that the reader is familiar with the decomposition of relational queries into trees for evaluation, and with the notion of a query transformation phase (where certain transformations of the order of execution of relational operations are always beneficial) followed by a query mapping phase (where the operations are mapped to the low-level structures and algorithms of the system) (see e.g. [74]). A simple example of an always-beneficial transformation is given below.
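The toy example pushes a selection below a join: both plans give the same answer, but the transformed one builds a far smaller intermediate result. The relations and predicate are invented for the illustration.

```python
# Pushing a selection below a join: same result, smaller intermediate.
emp = [{"eno": i, "dept": "D" + str(i % 50)} for i in range(1000)]
dept = [{"dept": "D" + str(j), "city": "Belfast" if j == 7 else "Derry"}
        for j in range(50)]

def join(r, s, col):
    return [{**a, **b} for a in r for b in s if a[col] == b[col]]

def select(r, pred):
    return [t for t in r if pred(t)]

# untransformed: join first (1000 x 50 comparisons), then select
plan1 = select(join(emp, dept, "dept"), lambda t: t["city"] == "Belfast")

# transformed: select first, so the join sees only one dept tuple
plan2 = join(emp, select(dept, lambda t: t["city"] == "Belfast"), "dept")

assert plan1 == plan2          # same answer, much smaller intermediate
print(len(plan2))              # 20 tuples
```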
6.1. Static or dynamic algorithm?

In static algorithms the processing strategy, including the handling of intermediate results, is determined completely before query execution. In dynamic algorithms, not all of the information needed for strategy selection is available (or estimated) at the initiation of execution but becomes known during the course of execution. In the former case the estimation of intermediate result sizes is required. Clearly this introduces inaccuracies which are propagated throughout the query tree. We consider this in Section 6.2.

6.2. Are attribute values assumed to be uniformly distributed?

If the values of the join-columns, for example, are assumed to be uniformly distributed, serious errors in the estimated sizes of intermediate relations can arise [17]. Query optimisation is costly, and so to get good value from it we must be assured that it is not wildly inaccurate. Epstein [23] carried out a comparison of estimation using uniformity and a few other crude estimation techniques, and found that even these improved upon its accuracy. Much work remains to be done in this area and in the related area of characterising the correlation between attributes. One promising technique is to use a "piecewise uniform" technique [11], where the range of the attribute values is divided into, say, 10 fixed "pieces". The uniformity assumption is made for each piece rather than for the whole relation, and this gives rise to the obvious hypothesis that, because the attribute value distribution is followed more closely, this method will give better estimations. This hypothesis has been verified by both analysis and simulation. The technique is used in two prototype multi-database systems, viz. the Experimental Distributed Database System [59] and MULTISTAR [60].
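A minimal sketch of the piecewise-uniform idea follows: tuple counts are kept for a fixed number of pieces of the attribute's range, and uniformity is assumed only within a piece, so estimates for range predicates track skewed data far better than a single global uniformity assumption. The piece count and data are illustrative assumptions, not the design of [11].

```python
# Piecewise-uniform (equi-width) selectivity estimation.
def build_pieces(values, n_pieces=10):
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_pieces
    counts = [0] * n_pieces
    for v in values:
        i = min(int((v - lo) / width), n_pieces - 1)
        counts[i] += 1
    return lo, width, counts

def estimate_range(pieces, a, b):
    """Estimated number of tuples with a <= value <= b."""
    lo, width, counts = pieces
    total = 0.0
    for i, c in enumerate(counts):
        p_lo, p_hi = lo + i * width, lo + (i + 1) * width
        overlap = max(0.0, min(b, p_hi) - max(a, p_lo))
        total += c * overlap / width     # uniform within the piece
    return total

# skewed data: small values are far more frequent
values = [v for v in range(100) for _ in range(100 - v)]
pieces = build_pieces(values)
actual = sum(1 for v in values if 80 <= v <= 99)
print("piecewise estimate:", round(estimate_range(pieces, 80, 99)))  # ~198
print("global-uniform estimate:", len(values) * 20 / 100,            # 1010
      "actual:", actual)                                              # 210
```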
6.3. Which join method is used?

Clearly the join operator is the most influential of the relational operators on performance, and so it is important to search for good implementation techniques. An interesting method for distributed databases has recently been added to the well-known merge-scan and nested-loop methods, i.e. Bloom-Join. It is based on Bloom Filtering [14] and amounts to a bit-hashed Semi-Join. The outer relation's join attribute is scanned and values are hashed to positions in a binary vector, setting them to 1. The vector is used subsequently to compare with the hashed values from the inner relation's join attribute (a sketch is given after these subsections). Mackert and Lohman [54] showed that it is twice as fast as Semi-Join for many networks and relation cardinalities. An alternative join method was presented by Cheiney et al. [19].

6.4. Is simultaneous query traffic considered?

Ways of exploiting global information on batches of queries addressed simultaneously to a database have been suggested by Kim [45], Jarke [39] and Satoh [64], amongst others. An interesting idea is to look for patterns in the query trees and exploit the common expressions represented by these. A starting-point is to use work done by researchers in other areas and apply it to this problem [38].

6.5. Is the search exhaustive?

Because of the combinatorial explosion of the time taken to execute a search for an optimal query plan, most algorithms use heuristics to prune the search tree. For example, in IBM's System R a heuristic is to reduce the permutations for multi-way join execution by considering only join orders where the predicates relate the inner relation to relations already in the join [69]. Again there is considerable scope for innovation in this area.
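The sketch promised in Section 6.3: the outer site hashes its join-column values into a bit vector, ships the small vector instead of the column, and the inner site returns only tuples whose hashed join value finds a 1. False positives are removed by the final join; no true match is lost. The vector size, the pair of hash functions and the data are illustrative assumptions.

```python
# Bloom-Join sketch: a bit-hashed Semi-Join using a Bloom filter [14].
M = 1 << 12                              # bits in the filter

def positions(value):
    # two hash positions per value (an assumption of this sketch; any
    # family of hash functions could be used)
    return [hash(value) % M, hash(str(value) + "#") % M]

def build_filter(outer_join_values):
    bits = bytearray(M)
    for v in outer_join_values:
        for p in positions(v):
            bits[p] = 1
    return bits

def probe(bits, inner_tuples, col):
    """Inner tuples that *may* join; only these cross the network, after
    which an ordinary join removes the false positives."""
    return [t for t in inner_tuples
            if all(bits[p] for p in positions(t[col]))]

outer_keys = [3, 17, 42]
inner = [{"k": k, "payload": k * k} for k in range(1000)]
candidates = probe(build_filter(outer_keys), inner, "k")
print(len(candidates))    # close to 3; a few false positives are possible
```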
6.6. Is semantic query transformation used?

Conventional query optimisation involves searching for efficient, possibly interleaved, sequences of database operations to satisfy a query. Semantic query transformation was introduced by Hammer and Zdonik [35] as a means of transforming any query into a semantically equivalent one (i.e. giving the same result) which executes faster. Knowledge of the semantics of the application and computing environment was used by King [43] to get equivalent queries in QUIST. Recently this possibility has been studied more intensively, and various means of "cutting corners" have been suggested. For example, Grant and Minker [30] suggest that knowledge of a functional dependency can avoid fully scanning a large table or index, due to the fact that only a single value of the right-hand-side attributes (of the functional dependency) is possible for given left-hand-side attributes.
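A toy sketch of the functional-dependency shortcut just described: if dept determines city, a query for the cities of a given dept can stop at the first qualifying tuple, because the dependency guarantees that every other match carries the same value. The data and names are invented.

```python
# Exploiting a functional dependency (dept -> city) to cut a scan short.
dept_city = [("sales", "Belfast"), ("it", "Derry"), ("sales", "Belfast"),
             ("hr", "Belfast"), ("sales", "Belfast")]   # FD: dept -> city

def cities_full_scan(rows, dept):
    return {city for d, city in rows if d == dept}      # scans every row

def cities_with_fd(rows, dept):
    for d, city in rows:
        if d == dept:
            return {city}       # FD dept -> city: no other value can occur
    return set()

assert cities_full_scan(dept_city, "sales") == cities_with_fd(dept_city, "sales")
print(cities_with_fd(dept_city, "sales"))   # {'Belfast'}
```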
7. Conclusion

In this paper we have surveyed some of the issues impacting on the performance of relational database systems, and looked at some research or development studies addressing those issues. As we stated at the outset, neither the list of issues nor the list of studies in each case is meant to be exhaustive. Important issues such as concurrency control, data compression and hardware support are omitted. The survey is essentially a personal view and is therefore biased towards the author's own projects and viewpoints.

However, some conclusions are possible. One is that the interdependence of these problems and their solutions is ill-understood. Work is needed to improve this situation. The other main conclusion is that, with so much going on in this problem area, there are bound to be practical benefits in terms of performance "on the ground" in database shops. Those suffering from bottlenecks and poor utilisation at present may not take much heart from this, but future generations of DBMS's and associated tools will have much better performance modelling, tuning and optimising features than the present generation. And not before time...
References

[1] F. Bancilhon and R. Ramakrishnan, An amateur's introduction to recursive query processing strategies, Proc. ACM SIGMOD Conf. (1986) pp. 16-52.
[2] D.A. Bell (ed.), Database Performance, Pergamon-Infotech State-of-the-Art Report 12 (4) (1984).
[3] D.A. Bell and S.M. Deen, Key space compression and handling in PRECI, Computer J. 25 (4) pp. 486-492.
[4] D.A. Bell, Difficult data placement problems, Computer J. 26 (4) (1984).
[5] D.A. Bell, Physical record clustering in databases, Kybernetes 13 (1984) pp. 31-37.
[6] D.A. Bell, Capture and analysis of data placement requirements, Pergamon-Infotech State-of-the-Art Report in Database Performance (1986).
[7] D.A. Bell, D.H.O. Ling and J.B. Grimson, Query Optimisation, Deliverable 7(3), ECMAP Project 773B (available from University of Ulster).
[8] D.A. Bell, F. McErlean, P. Stewart and W. Arbuckle, Clustering related tuples in databases, to be published in Computer J. (1988).
[9] D.A. Bell, S. McClean, F. McErlean and P. Stewart, Application of simulated annealing to clustering tuples in databases, to be published in J. American Soc. Inf. Syst.
[10] D.A. Bell and H. Jenkins, Key compression by forbidden zones, Kybernetes 16 (1987).
[11] D.A. Bell, D.H.O. Ling and S. McClean, Working Paper on Query Optimisation, University of Ulster (1987).
[12] D.A. Beckley, M.W. Evens and V.K. Raman, Multi-key retrieval from KD trees and quad trees, Proc. ACM SIGMOD Conf. (1985) pp. 291-303.
[13] D.A. Beckley, M.W. Evens and V.K. Raman, Empirical comparison of associative file structures, Proc. Int. Conf. on Foundations of Data Organisation (Kyoto, 1985) pp. 320-328.
[14] B.A. Bloom, Space/time trade-offs in hash coding with allowable errors, Comm. ACM 13 (7) (1970).
[15] A.K. Garg and C.C. Gotlieb, Order-preserving key transformations, ACM Trans. on DB Syst. 11 (2) pp. 213-234.
[16] S. Ceri and G. Pelagatti, Distributed Databases: Principles and Systems (McGraw-Hill, 1984).
[17] S. Christodoulakis, Estimating block transfers and join sizes, Proc. ACM SIGMOD Conf. (1983).
[18] H.T. Chou and D.J. DeWitt, An evaluation of buffer management strategies for relational database systems, Proc. Eleventh Conf. on VLDB (Stockholm, 1985) pp. 127-141.
[19] J.P. Cheiney, P. Faudemay and R. Michel, An extension of access paths to improve joins and selections, Proc. IEEE Int. Conf. on Data Engineering (1986) pp. 270-280.
[20] DBAP Manual--Database Analyzer and Predictor: Concepts and Facilities, CRAI (Italy, September 1986).
[21] S.M. Deen, An implementation of impure surrogates, Proc. Eighth SIGMOD Conf. (1982).
[22] W. Effelsberg, Fixing pages in a database buffer, ACM SIGMOD Record 13 (2) (Jan. 1983).
[23] R. Epstein, Analysis of Distributed Database Processing Strategies, Tech. Rep. UCB/ERL M80/25, Electronics Research Lab, Univ. of California, Berkeley (Apr. 1980).
[24] K.P. Eswaran, Placement of records in a file and file allocation in a computer network, Proc. IFIP Conf. (1974).
[25] R. Fagin, J. Nievergelt, N. Pippenger and H.R. Strong, Extendible hashing--a fast access method for dynamic files, ACM Trans. on DB Syst. 4 (3) pp. 315-344.
[26] M.T. Fang, R.C.T. Lee and C.C. Chang, The idea of de-clustering and its applications, Proc. Twelfth VLDB Conf. (1986) pp. 181-188.
[27] A. Flory, J. Gunther and J. Kouloudijan, Database organisation by clustering methods, Inf. Syst. 3 (1) (1978).
[28] S.P. Ghosh, File organisation: the consecutive retrieval property, Comm. ACM 15 (1972) pp. 802-808.
[29] H. Gallaire, J. Minker and J.M. Nicolas, Logic and databases--a deductive approach, ACM Comp. Surveys 16 (2) (1984) pp. 153-185.
[30] J. Grant and J. Minker, Optimisation in deductive and conventional database systems, in Advances in DB Theory (Plenum Press, 1981).
[31] M. Hammer and A. Chan, Index selection for self-adaptive database systems, Proc. Second ACM SIGMOD Conf. (1976).
[32] R.B. Hagmann, An observation on database buffering performance metrics, Proc. Twelfth Conf. on VLDB (Kyoto, 1986) pp. 289-293.
[33] J.A. Hoffer, Methods of Primary and Secondary Key Selections (QED Information Sciences Inc., Wellesley MA, 1980).
[34] J.A. Hoffer and A. Kovacevic, Optimal performance of inverted files, Oper. Res. (1981).
[35] M. Hammer and S.B. Zdonik, Knowledge-based query processing, Proc. Sixth Conf. on VLDB (1980).
[36] M. Jakobsson, Reducing block accesses in inverted files by partial clustering, Inform. Systems 5 (1980).
[37] M. Jarke and J. Koch, Query optimisation in database systems, ACM Comp. Surveys 16 (2) (Jun. 1984).
[38] C.M. Hoffmann and M.J. O'Donnell, Pattern matching in trees, J. ACM 29 (1) (1982).
[39] M. Jarke, Common subexpression isolation in multiple query optimisation, in Query Processing in Database Systems (Springer-Verlag, 1985).
[40] S. Kao, DECIDES: an expert system tool for physical database design, Proc. IEEE Conf. on Data Engineering (1986) pp. 671-675.
[41] K.L. Kelley and M. Rusinkiewicz, Implementation of multi-key extendible hashing as an access method for a relational DBMS, Proc. IEEE Tenth Conf. on Data Engineering (Feb. 1986) pp. 124-131.
[42] W. Kim, A new way to compute the product and join of relations, Proc. ACM SIGMOD Conf. (Santa Monica, May 1980) pp. 179-187.
[43] J.J. King, QUIST--a system for semantic optimisation in relational databases, Proc. Seventh VLDB Conf. (1981).
[44] S. Kirkpatrick, C.D. Gelatt and M.P. Vecchi, Optimization by simulated annealing, Science 220 (4598) (13 May 1983).
[45] W. Kim, Global optimisation of relational queries: a first step, in Query Processing in Database Systems (Springer-Verlag, 1985).
[46] H.F. Korth and J.D. Ullman, System/U: a database system based on the universal relation assumption, ACM TODS 9 (3) (1984) pp. 331-347.
[47] P.A. Larson, Dynamic hashing, BIT 18 pp. 184-201.
[48] P.A. Larson, A single-file version of linear hashing with partial expansion, Proc. Eighth VLDB Conf. (1982) pp. 300-309.
[49] W. Litwin, Trie hashing, Proc. ACM Seventh SIGMOD Conf. (1981).
[50] W. Litwin and D.B. Lomet, The bounded disorder access method, Proc. IEEE Int. Conf. on Data Engineering (1986) pp. 38-48.
[51] J. Lloyd and K. Ramamohanarao, Partial match retrieval for dynamic files, BIT 22 (1982) pp. 150-168.
[52] D.B. Lomet, Bounded index exponential hashing, ACM Trans. on DB Syst. 8 (1) pp. 136-165.
[53] J.P. Malmquist, E. Gudes and E.L. Robertson, Storage allocation for access path minimisation in network structured databases, Proc. Int. Conf. on Databases: Improving Usability and Responsiveness (Aug. 1978).
[54] L.F. Mackert and G.M. Lohman, R* optimiser validation and performance evaluation for distributed queries, Proc. Twelfth Conf. on VLDB (1986).
[55] N. Metropolis, A.W. Rosenbluth, M.N. Rosenbluth, A.H. Teller and E. Teller, Equation of state calculations by fast computing machines, J. Chemical Physics 21 (6) (June 1953).
[56] D. Mitra, F. Romeo and A. Sangiovanni-Vincentelli, Convergence and finite-time behaviour of simulated annealing, Adv. Appl. Prob. 18 (1986) pp. 747-771.
[57] J.K. Mullin, Spiral storage: efficient dynamic hashing with constant performance, Computer J. 28 (3) pp. 330-334.
[58] J. Nievergelt, H. Hinterberger and K.C. Sevcik, The grid file: an adaptable multi-key file structure, ACM Trans. on DB Syst. 9 (1) pp. 38-76.
[59] D. O'Sullivan, J.B. Grimson, D.A. Bell and D.H.O. Ling, Convenient multi-database access from a microcomputer, Proc. Vienna Conf. (March 1987).
[60] D. O'Sullivan, D.A. Bell, A. Fernandez, J.B. Grimson, N. Gianotti, A. Hutt and G. Turco, The Multi-Star System (MIE 1987).
[61] E.J. Otoo, A multidimensional digital hashing scheme for files with composite keys, Proc. ACM SIGMOD Conf. (1985) pp. 214-231; Proc. Int. Conf. on Foundations of Data Organisation (Kyoto, 1985) pp. 283-296.
[62] G. Piatetsky-Shapiro, The optimal selection of secondary indexes is NP-complete, ACM SIGMOD Record 13 (2) (1983).
[63] RTI, Ingres cache management, Newsletter (June 1987).
[64] K. Satoh, M. Tsuchida, F. Nakamura and K. Oomachi, Local and global query optimisation mechanisms in relational databases, Proc. Eleventh Conf. on VLDB (1985).
[65] G.M. Sacco and M. Schkolnick, Buffer management in relational database systems, ACM TODS 11 (4) (Dec. 1986) pp. 473-498.
[66] G.M. Sacco, Index access with a finite buffer, Proc. Fourteenth Conf. on VLDB (1987).
[67] M. Schkolnick, A clustering algorithm for hierarchical structures, ACM Trans. on DB Syst. 2 (1) (1977).
[68] M. Schkolnick, The optimal selection of secondary indices for files, Inf. Systems 1 (Mar. 1977).
[69] P.G. Selinger, M.M. Astrahan, D.D. Chamberlin, R.A. Lorie and T.G. Price, Access path selection in a relational database management system, Proc. ACM SIGMOD Conf. (1979) pp. 23-34.
[70] J. Shao, D.A. Bell and E. Hull, A Scheme for Classifying Deductive Database Interfaces, Internal Report, Univ. of Ulster (1987).
[71] M. Stonebraker, Operating system requirements for databases, CACM (1980).
[72] W. Staniszkis, S. Orlando and P. Rullo, Performance-oriented database design laboratory, in Database Performance, Pergamon-Infotech State-of-the-Art Report 12 (4) (1984) pp. 131-160.
[73] P. Stewart and D.A. Bell, A Tool for Database Design and Reorganisation, Internal Report, Univ. of Ulster (1987).
[74] J.D. Ullman, Principles of Database Systems (Computer Science Press, 1982).
[75] K.-Y. Whang, G. Wiederhold and D. Sagalowicz, Separability--an approach to physical database design, Proc. Seventh Int. Conf. on VLDB (1981).
[76] K.-Y. Whang, Index selection in relational databases, Proc. Int. Conf. on Foundations of Data Organisation (Kyoto, 1985) pp. 369-378.
[77] P.C. Yue and C.K. Wong, Storage considerations in secondary index selections, Int. J. of Comp. and Inf. Science 4 (4) (1975).
[78] C.T. Yu, C.-M. Suen, K. Lam and M.K. Siu, Adaptive record clustering, ACM Trans. on DB Syst. 10 (2) (1985) pp. 180-204.
[79] T.S. Yuen and H.C. Du, Dynamic file for partial match retrieval based on linear hashing, Proc. IEEE Int. Conf. on Data Engineering (Feb. 1986) pp. 116-123.
[80] P. Zezula and J. Zizka, File organisation performance prediction, Proc. Int. Conf. on Foundations of Data Organisation (Kyoto, 1985) pp. 310-319.