A rule-based query optimizer with multiple search strategies


I DATA& KNOWLEDGE ENGINEERING ELSEVIER Data & Knowledge Engineering 13 (1994) 1-29 A rule-based query optimizer with multiple search strategies B6at...

2MB Sizes 3 Downloads 76 Views

I DATA& KNOWLEDGE ENGINEERING ELSEVIER

Data & Knowledge Engineering 13 (1994) 1-29

Béatrice Finance a, Georges Gardarin a,b,*

a CNRS-PRiSM Laboratory, Versailles-St Quentin University, 45 Avenue des Etats-Unis, 78000 Versailles, France
b INRIA, RODIN Project, BP 105, 78153 Le Chesnay, France

Received 27 July 1993; revised 27 October 1993; accepted 8 March 1994

Abstract

This paper describes a rule-based query optimizer. The originality of the approach lies in a uniform high-level rule language used to model query rewriting and planning as well as search strategies. Rules are given to specify operation permutation, recursive query optimization and integrity constraint addition, and to model join ordering and access path selection. Furthermore, meta-rules are presented to model multiple search strategies, including enumerative and randomized search. To illustrate these ideas, we describe a query optimizer for an extensible database server that supports abstract data types, complex objects, deductive capabilities and integrity constraints. A prototype of the query optimizer proposed in this paper is operational and has been demonstrated at the 1991 ESPRIT week in the EDS project.

Keywords: Query optimization; Object and deductive databases; Extensible optimizer; Query rewriting; Query planning; Search strategy

1. Introduction

Query optimization in extended relational, object-oriented and deductive systems is a key issue in current database research (see [13] for a survey). Traditionally, query optimization translates a high-level user query into an efficient plan for accessing the database facts. Query optimization can be divided into two phases [20]: query rewriting transforms queries into equivalent simpler ones with better expected performance, and query planning primarily determines the method for accessing objects. Query rewriting includes syntactic transformations such as query modification [35] with views, redundant sub-query elimination, select migration through join and fixpoint using methods such as Magic Sets [4]. It also includes semantic transformations such as query



simplification using integrity constraints [8, 34] and operator properties (e.g. transitivity of equality). Query planning includes data dependent transformations such as operation ordering (e.g. join ordering) and access path selection. Typically, query planning explores the search space of access plans to find a minimum cost access plan using a database processing cost model.

Several search strategies have been proposed both for query rewriting and planning. For query rewriting, a strategy based on heuristics is usually chosen: rules are ordered and alternative plans are not memorized. In contrast, query planning memorizes many alternatives to find the plan that has the minimum cost. In the literature, several strategies have been proposed, including variations of enumerative search [33] and randomized search [21, 36, 37].

The diversity of the tasks that must be integrated in a query optimizer makes it one of the most complex components of a DBMS to write. Recently, extensible optimizers have been proposed [6, 11, 12, 17, 20, 29, 32] to support user-supplied ADTs and to deal with the complexity of query rewriting and/or query planning. The key idea is to generate a query optimizer from rules for transforming plans into alternative plans. A few extensible optimizers are operational, among them the EXODUS optimizer [17] and the STARBURST optimizer [18].

EXODUS was the first system to include a query optimizer generator based on a rule language that specifies legal transformations of query trees. Although the EXODUS optimizer generator has revolutionized query optimization, it falls short in several ways. First, it is not well adapted to query rewriting, as it requires each operator to have an implementation. Second, the internal query representation is limited to query trees, which cannot represent common sub-expressions and recursive queries. Finally, it has a single fixed search strategy.

The STARBURST optimizer is divided into two phases, each one having its own rule language. The query rewrite rules are written in C. STARBURST designers have chosen to permit the full generality of C rather than commit to a fixed, declarative language. The use of C avoids the design of a fixed set of primitives for query rewriting; it keeps the rule language open and facilitates the optimization of the rules. However, a disadvantage is that the rule language is hard to use without deep knowledge of the STARBURST internal structure. The query planning language is based on production rules. This architecture leads to several problems. As the designers themselves say [18], the presence of two rule systems within the optimizer alone is particularly unsatisfactory. Ideally, they would like to find some unified set of rules that could be input as data to the optimizer, yet could be interpreted efficiently by a single rule processor.

In this paper, we propose a unified rule language for expressing query transformations in an extensible query optimizer. The transformations integrate query rewriting and planning for extensible and deductive databases. We show that the uniformity and the high level of abstraction of a language based on extended term rewriting rules can model many aspects of logical and physical query optimization. We also show that the rule language can be used to compute query cost variations and to define meta-rules. Meta-rules represent knowledge about the optimization strategy. Meta-rules are powerful tools to model various search strategies including exhaustive search, simulated annealing and iterative improvement. Rules and meta-rules can be added or modified by a database implementor.


We study the query optimization problem in the context of an extensible database server that supports ADTs, objects, deductive capabilities and integrity constraints. We use the EDS database system as a testbed. This system is an extended relational parallel server developed as part of the EDS ESPRIT project. The server query language is called ESQL2 (Extended SQL2) [15]. ESQL2 integrates the essential concepts of relational, object-oriented and deductive databases. It provides the ability to extend the data model through abstract data types and sub-typing.

This paper is organized as follows. Section 2 provides the background required to understand the query optimizer of EDS. It summarizes the data model and the extended algebra that we will use in our examples. Section 3 presents the rule language and the rule control language of the extensible optimizer. Section 4 describes the problem of query rewriting in an extensible DBMS. It gives significant examples of rewriting rules for syntactic and semantic optimization. In Section 5, query planning rules are introduced. In Section 6, we focus on the optimization strategies. We show how the rule language can be used for describing various search strategies with meta-rules. Section 7 describes the current implementation of the EDS query optimizer. It shows how the rule library can be divided into modules, each of them possibly integrating a different search strategy. Section 8 concludes the paper.

2. Background

In this section, we introduce ESQL2, the source language of the optimizer, which is an extension of SQL with object-oriented and deductive capabilities [15]. Then, we focus on the LERA target language of the query optimizer, which is an extended form of relational algebra.

2.1. The ESQL2 query language

The ESQL2 query language is an extension of SQL2 with complex objects and recursive views. The main advantages provided by ESQL2 over SQL are strong support for abstract data types, the introduction of classes as tables of complex objects with object identifiers, and a deductive capability to infer new data from stored data. Thus, ESQL2 supports both classes of objects and relations of values. They are both implemented as tables (with or without system generated identifiers) and are referred to as such in this paper. The reader can refer to [15] for a detailed description of ESQL2.

To illustrate the ESQL2 complex object facilities enhancing relational features, we give below a database example. Fig. 1 gives an example of a database mixing objects and values, respectively in classes and relations. The class FILM describes films as objects with a number (Numf), a list-valued (Title) and a set-valued (Categories) attribute. The class ACTOR simply describes actors as objects with Name, Firstnames and Salary attributes. The relation APPEARS_IN indicates which actor (as a reference to an object describing the actor) appears in which film (referred to by a relational key). The relation DOMINATE records the results of tennis matches organized between actors during film production. Fig. 2 gives an example of a query on this database.


[Type Definitions]
TYPE Category ENUMERATION OF ('Comedy', 'Adventure', 'Science Fiction', 'Western') ;
TYPE SetCategory SET OF Category ;
TYPE Sentence LIST OF CHAR VARYING ;
TYPE Pairs LIST OF TUPLE (Pros INT, Cons INT) ;

[Table Definitions]
CLASS Film ( Numf NUMERIC PRIMARY KEY, Title Sentence, Categories SetCategory)
RELATION Appears_In ( Numf NUMERIC, RefActor Actor)
CLASS Actor ( Name CHAR(20), Firstnames Sentence, Salary NUMERIC)
RELATION Dominate ( Numf NUMERIC, RefActor1 Actor, RefActor2 Actor, Score Pairs)

Fig. 1. Example of a database schema.

SELECT Title, Categories, RefActor.Salary
FROM Appears_In, Film
WHERE Film.Numf = Appears_In.Numf
AND RefActor.Name = 'Quinn'
AND 'Adventure' MEMBER Categories ;

Fig. 2. Example of an ESQL2 query.

Note that an attribute in a nested tuple is designated using the attribute name as a function applied with dot notation. ESQL2 provides a deductive capability as an extension of the SQL view mechanism. This provides the ESQL2 user with the power of the DATALOG logic-based language using statements already available in SQL. Recursive views are defined through ESQL2 query expressions that refer to the view itself. To illustrate the ESQL2 deductive capabilities, we give in Fig. 3 a recursive view definition. It classifies actors (by their reference) according to tennis results. The query in Fig. 4 gives the actors who dominate Quinn.

2.2. The LERA target language

LERA is an extended relational algebra that handles collections of values and objects. This language has been chosen for its power of abstraction from both the physical implementation and the query. LERA operators can easily represent ESQL2 queries. An ESQL2 query is a LERA expression that maps collections into a collection, as defined in [31]. In this section, we summarize the main LERA operators required to extend the relational algebra to support the additional ESQL2 capabilities. The additional operators include a fixpoint operator, operators to transform simple values into collections, and facilities to invoke ADT functions in qualification and projection expressions.

CREATE VIEW Better_Than (RefActor1, RefActor2) AS
( SELECT RefActor1, RefActor2 FROM Dominate
  UNION
  SELECT D.RefActor1, B.RefActor2
  FROM Dominate D, Better_Than B
  WHERE D.RefActor2 = B.RefActor1 ) ;

Fig. 3. A recursive view definition.

SELECT RefActor1.Name
FROM Better_Than
WHERE RefActor2.Name = 'Quinn'

Fig. 4. A query on the recursive view.



2.2.1. Basic operations

The basic operations of our algebra are slight extensions of Codd's algebra, as defined for example in [14]. These operations are: (1) Filter, which produces, from a complex table, a table of the same schema whose tuples satisfy a possibly complex condition; (2) Project, which produces a new table from a given table by computing expressions over source attributes as target attributes; (3) Join, which may be defined as a Cartesian product of two tables followed by a filter. We also include the traditional set operations on tables: union, difference and intersection. A LERA tree is the internal representation of a user query for both logical and physical optimizations.

Sometimes a sequence of basic operators can be implemented efficiently in a single algorithm (e.g. filter followed by project, or an n-ary join). Consequently, we define a macro algebra with compound operations. The compound operations are the Filter_Project operation, which corresponds to the usual selection, the n-ary union (denoted union*, which performs the union of N tables) and the n-ary join (denoted join*, which performs the Cartesian product of N tables followed by a filtering). They also include the search operation, which is a composition of restrictions, n-ary joins and projections. These compound operations are close to tuple calculus expressions. They provide the system with the necessary degree of freedom to optimize queries correctly. Tuple calculus is a good starting point for query optimization since it provides an optimizer with only the basic properties of the query; optimization opportunities may become hidden in a particular sequence of algebra operators like projection, restriction and join [25].

For illustration, the ESQL2 query of Fig. 2 is represented in Fig. 5 using one of the more powerful compound operators of LERA, the search; for clarity, table names are in capital letters in LERA expressions. To avoid ambiguity, an attribute reference is the composition of the variable number of the relation in the first search argument with the column identifier.

search ((APPEARS_IN v_1, FILM v_2),
        v_1.1 = v_2.1 ∧ name(v_1.2) = 'Quinn' ∧ member(v_2.3, 'Adventure'),
        (v_2.2, v_2.3, salary(v_1.2)))

Fig. 5. Example of the search function application.
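To make the semantics of the compound search operator concrete, the following Python sketch (ours, not part of LERA or the EDS prototype) evaluates a search over in-memory tables represented as lists of tuples; the Actor class and the sample rows are invented stand-ins for the schema of Fig. 1.

from dataclasses import dataclass
from itertools import product

@dataclass
class Actor:
    name: str
    salary: int

# Invented instances of the Fig. 1 schema.
quinn = Actor('Quinn', 5000)
FILM = [(1, ['Zorba', 'the', 'Greek'], {'Adventure'})]  # (Numf, Title, Categories)
APPEARS_IN = [(1, quinn)]                               # (Numf, RefActor)

def search(tables, condition, projection):
    # Cartesian product of N tables, filtered by a complex condition and
    # projected on target expressions, all in a single pass.
    return [projection(*vs) for vs in product(*tables) if condition(*vs)]

# The LERA expression of Fig. 5:
print(search([APPEARS_IN, FILM],
             lambda a, f: a[0] == f[0] and a[1].name == 'Quinn'
                          and 'Adventure' in f[2],
             lambda a, f: (f[1], f[2], a[1].salary)))
# [(['Zorba', 'the', 'Greek'], {'Adventure'}, 5000)]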

2.2.2. The fixpoint operation

A fixpoint operator is useful to compute recursive views, as shown in [7, 21, 39]. This operator simply produces the saturation of a relation computed recursively by an algebra expression. More precisely, the algebra expression computes new tuples at each cycle. Thus, the general form of the fixpoint operator is fix(R, E(R)), where E(R) is a relational expression using R. The fix operator computes the smallest relation R such that R = E^i(R) = E^(i+1)(R). For example, the fixpoint of the Better_Than relation can be computed by the expression given in Fig. 6.

fix (Better_Than,
     union ((Dominate,
             search ((Dominate v_1, Better_Than v_2),
                     v_1.2 = v_2.1,
                     (v_1.1, v_2.2)))))

Fig. 6. Example of the fix operator application.
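Read operationally, fix(R, E(R)) can be sketched in Python as the naive iteration below; this is our illustration under an encoding of relations as sets of tuples and invented DOMINATE data, not the EDS evaluator (a real engine would use semi-naive, differential evaluation).

def fix(E):
    # Iterate E until saturation, i.e. until R = E^i(R) = E^(i+1)(R).
    R = set()
    while True:
        new = E(R)
        if new <= R:       # no new tuple was produced
            return R
        R = R | new

DOMINATE = {(1, 'a', 'b'), (2, 'b', 'c')}   # (Numf, RefActor1, RefActor2)

def better_than(R):
    # The body of Fig. 6: Dominate united with the join of Dominate and
    # the recursively computed Better_Than relation R.
    base = {(a1, a2) for _, a1, a2 in DOMINATE}
    step = {(a1, b2) for (a1, a2) in base for (b1, b2) in R if a2 == b1}
    return base | step

print(fix(better_than))   # {('a', 'b'), ('b', 'c'), ('a', 'c')}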

2.2.3. Support of ADT functions

Complex object support is provided in LERA by generalizing the restriction, projection and join relational operators. Mainly, built-in and user-defined function symbols can now appear in the criterion of a restriction or a join, and in the attribute list of a projection. More generally, they can appear in the complex condition or attribute list of a search.

2.2.4. Support of collections

Collections are first supported in LERA by the collection built-in functions. For example, the MEMBER Boolean function checks whether an element is a member of a set in a complex condition (e.g. MEMBER('Adventure', Categories)). More generally, LERA includes all the collection manipulation functions of ESQL2.

In addition, the nest and unnest operators are added to LERA. They transform columns into collections and conversely, and provide the necessary support for nested relations. The nest operator transforms one or more columns of a table into a collection, according to one or more grouping attributes. By default, the generated collection is a bag (with duplicates). It is transformed into a set by using the MakeSet function, or into a list by using the MakeList function with an ordering criterion, etc. Other functions can be applied to the resulting collection, for example the standard SQL aggregate functions (i.e. SUM, COUNT, AVG, ...), which are set or list functions. The general format of the nest operator is:

nest(Table, <Grouping Attributes>, <Nested Attributes>, [<Expression>])

For example, nest(APPEARS_IN, (Numf), (RefActor), Count(MakeSet(RefActor))) returns tuples of the form {Numf, {Actor_oid}, COUNT}, where COUNT is the number of actors who played in the given film.

The unnest operator flattens a collection into one or more table columns. It transforms collections of atomic values into single-valued columns. The general format of the unnest operator is:

unnest(Relation, [<Ungrouping Attributes>], <Nested Attributes>)

For example, unnest(FILM, (Numf, Title), (Categories)) gives a relation where each film number and film title are repeated for each category they refer to.
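The following Python sketch gives one possible reading of nest and unnest over rows modeled as dictionaries; representing the generated bag as a Python list, and the sample rows, are our assumptions for illustration only.

from collections import defaultdict

def nest(table, grouping, nested):
    # nest(Table, <Grouping Attributes>, <Nested Attributes>): group the rows
    # and collect the nested attributes into a bag (a list, duplicates kept).
    groups = defaultdict(list)
    for row in table:
        key = tuple(row[a] for a in grouping)
        groups[key].append(tuple(row[a] for a in nested))
    return [dict(zip(grouping, key)) | {'Nested': bag}
            for key, bag in groups.items()]

def unnest(table, ungrouping, nested):
    # unnest(Relation, [<Ungrouping Attributes>], <Nested Attributes>):
    # repeat the ungrouping attributes for every element of the collection.
    return [{a: row[a] for a in ungrouping} | {nested: v}
            for row in table for v in row[nested]]

APPEARS_IN = [{'Numf': 1, 'RefActor': 'oid1'}, {'Numf': 1, 'RefActor': 'oid2'}]
print(nest(APPEARS_IN, ['Numf'], ['RefActor']))
# [{'Numf': 1, 'Nested': [('oid1',), ('oid2',)]}]

FILM = [{'Numf': 1, 'Title': 'Zorba', 'Categories': {'Comedy', 'Adventure'}}]
print(unnest(FILM, ['Numf', 'Title'], 'Categories'))
# one output row per category, with Numf and Title repeated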

3. The rule language

A term rewriting formalism has been chosen to unify the various aspects of query optimization. Term rewrite rules have proven useful in many research areas. For example,


rewrite rules are helpful to synthesize programs [1]. The rule language we present here can be perceived as an extension of the one proposed in [32]. The main extensions are: the introduction of complex terms and their associated variables, the representation of path expressions, the addition of constraints expressed as method calls in the conditional part of rules, and the inclusion of differential cost computation within rules. In the remainder of this section, we introduce the concept of term, then the syntax and semantics of our rewriting rule language, and finally the control rule language.

3.1. Term definition

Because the job of a query optimizer is to synthesize a program from a specification, it is no surprise that optimizers can be specified using term rewrite rules. However, it is important to know exactly how expressive terms are. In the context of advanced databases, the notion of term should be powerful enough to represent complex and cyclic structures, as well as common sub-expressions. In the following, we give the alphabet and the construction rules of the words of our rule language.

3.1.1. The alphabet

The term construction alphabet is composed of the following elements:
- constants, such as integers, reals, characters and Booleans;
- function symbols, such as join, insert, search, ...;
- constructors, denoted {} for set, [] for list, () for tuple and . for path traversal;
- identifiers, such as name, age, firstname, ...;
- three sets of variables: (1) term variables, denoted u, v, w, x, y, z, ...; (2) function variables, denoted H, I, J, K, ...; (3) collection variables, denoted u*, v*, w*, x*, y*, z*, ....

Let C be the set of constants, F the set of function symbols, Vt the set of term variables, Vf the set of function variables and Vc the set of collection variables. The alphabet presented above is such that the F and Vf sets are disjoint, and the C, Vt and Vc sets are pairwise disjoint.

3.1.2. The term construction rules

Terms are constructed using the following rules:
- a constant is a term;
- a term variable is a term;
- (a1 : τ1, ..., an : τn) is a term, which defines a tuple of named terms;
- [γ1, ..., γn] is a term, which defines a list of terms;
- {γ1, ..., γn} is a term, which defines a set of terms;
- δ1. ... .δn is a term, which defines a path expression;
- f(τ1, ..., τn) is a term, which defines a functional term;
where the ai's are identifiers, the τi's are terms, the γi's are terms or collection variables, the δi's are identifiers, term variables or collection variables, and finally f is a function symbol or a function variable.

Fig. 7 gives some examples of terms.


{[1, 2], {2, 3}} is a term
contains(x, {1, 2, 3}) is a term
I(1, 2) is a term
(name : 'Pierre', age : 10) is a term
Intersection({1, x*}, {1, y*}) is a term
person.address.street is a term

Fig. 7. Examples of terms.

Functions may use collections as arguments, which improves the expressive power of the language. Collection variables are symbols representing collections. The use of collection variables allows the specification of expressions involving lists of arguments. For example, using sets as arguments eliminates the need for permutation rules, as sets are unordered. Terms as introduced above may be labelled [2] to model cyclic sub-queries or common sub-expressions. For example, the labelled term 'L1 : join(A, :L1)' represents a cyclic term, and the term '{L2 : 'a', surface(10, 'b', :L2), 'green'}' expresses sharing of the constant term 'a'. As an example, we synthesize in Fig. 8 the LERA program given in Fig. 5.

search ({[APPEARS_IN, v_1], [FILM, v_2]},
        and ({=(v_1.1, v_2.1), name(v_1.2) = 'Quinn', member(v_2.3, 'Adventure')}),
        [v_2.2, v_2.3, salary(v_1.2)])

Fig. 8. A program synthesized as a term.

3.2. Syntax and semantics of the rule language

In the previous section, we focused on the representation of a program. We now describe the term rewrite rule language used to specify query transformations in the optimizer. Traditionally, a term rewrite rule is an expression τ1 → τ2, where all variables in τ2 also occur in τ1 [9]. A substitution maps variables to terms. If σ is a substitution, then for any term τ the expression σ(τ) denotes the term obtained by replacing every variable V in τ by σ(V). Suppose that τ1 → τ2 is a term rewrite rule. Let τ be a term containing a subterm τ1' such that there is a substitution σ with σ(τ1) = τ1'. Let τ' be the term obtained by replacing τ1' in τ by σ(τ2). Then we say that τ rewrites to τ'.

In the following, we give an example of such a rewriting. Suppose that τ1' is a subterm of τ equal to [f('a'), g('b'), 'c', 'd']. Suppose that τ1 = [x, I('b'), y*] is the left term of a rewrite rule of the form [x, I('b'), y*] → x. See Fig. 9 for the graphic representation of these two terms. The substitution σ gives each of the variables x, I and y* its term value.

[l" [],i f~'

:

g

x b Fig. 9. E x a m p l e of u n i f y i n g t e r m s .

i ! b


The substitution σ = {x : f('a'), I : g, y* : ['c', 'd']} is such that σ(τ1) = τ1'. The unification succeeds. The rewritten term τ' is obtained by replacing τ1' in τ by σ(x), i.e. f('a').

However, traditional term rewrite rules are not powerful enough to express the complex query transformations required in a query optimizer. A rewrite rule only focuses on the structural properties of terms. In certain cases, it can be useful to interpret the semantics of queries; for example, 'is the sum of two constant terms x and y equal to 10?' may be a valid condition. Furthermore, it is not always possible to instantiate all the variables of the right term using the unification algorithm alone. For example, in the context of deductive databases, it can be useful to apply a complex transformation such as the Magic Sets technique to propagate constraints through a cycle. The complexity of such rules is such that we need to invoke external functions. Another important need of query optimization is the ability of the optimizer to decide whether to apply a rule in view of its effect on the query cost. So, a very important aspect of query optimization, namely the cost criterion, should be optionally expressible in the rules. It is for all these reasons that we extend the power of term rewrite rules with constraints, methods and differential cost computation. The proposed syntax of a rewrite rule is the following:

<rule name> : IF <Term> [under <Constraint>] [delta_cost = <Function Expression>]
              THEN [execute <Method List>] rewrite <Term'> ;

This rule is interpreted as:

"if the left term appears in the selected query plan under the given set of constraints, it is rewritten as the given right term after the execution of the given list of methods. The delta_cost expression gives the estimated cost variation induced by the application of the rewrite rule; it is computed before the rewriting." More precisely, the rule is decomposed in two parts. The first one, called the pre-condition, consists in doing first the pattern-matching, second the constraint verification and finally the computation of the estimated cost variation. The second part, called the post-condition, consists in doing first the execution of the method list and next the rewriting. The evaluation of a rule is done in two steps: the pre-condition and then the post-condition. An ELSE part is also possible as post-condition, but it has been omitted for simplicity. Constraints, methods and the delta_cost definitions are optional. The constraint verifies the properties of the variables appearing in the left term. (Constraint) is a Boolean expression, such as (fa and f2) or (not(f3)) where the fi's are Boolean functions that take as parameters left term variables or constant terms. Method calls can be useful to deal with complex optimization problems, that require external functions programmed in C or C + +. For example, if all variables in a criterion are bound, it can be useful to apply an evaluation function. (Method List) defines a list of term affection x a = ml, x 2 = m 2 , . . . , x n = m n. The mi's are methods that take as parameters constant terms or term variables appearing in the left term or computed by a method mi_ a. The methods compute new constant terms that take place in the right term. (Function expression) defines the a priori estimated differential cost when applying the rewrite rule. The function takes as parameters variables appearing in the left term only. The full syntax of the rule language is given in Annex.


Rule_1 : IF first_element([x, y*]) THEN x;

Rule_2 : IF and({false, x*}) THEN false;

Rule_3 : IF L1: equal(x, y) under ConstantTerm(x) and ConstantTerm(y)
         THEN execute val = Evaluate(:L1) rewrite val;

Rule_4 : IF L1: filter(union(x, y), name = 'John')
         delta_cost = CostPermFilterUnion(:L1)
         THEN union(filter(x, name = 'John'), filter(y, name = 'John'));

Fig. 10. Examples of term rewrite rules.

Fig. 10 shows four examples of syntactically correct rules. They express, successively, ADT function simplification, Boolean expression simplification, the use of constraints and methods, and finally the use of the delta_cost clause. More complex rules are given in the following sections.

The traditional semantics of a term rewriting system is based on the reduction principle. This principle consists in rewriting a constant term into another constant term by application of the term rewriting rules. The semantics is well defined if each variable that appears in the right part of a rule occurs in the left term or in the method list definition. The semantics of our language is based on the same principle. Therefore, we guarantee that in each rule, each variable appearing as a parameter of a function, a method or the right term is completely instantiated when calling the function or the method, or when rewriting the left term. For more details see [10].
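To make the matching and rewriting machinery concrete, here is a small Python sketch (our encoding, not the prototype's code): constants are plain values, functional terms are tuples whose head is a function symbol or a function variable, and list terms are Python lists, possibly ending with a collection variable. It replays the example of Section 3.2: matching [x, I('b'), y*] against [f('a'), g('b'), 'c', 'd'] and rewriting to x.

from dataclasses import dataclass

@dataclass(frozen=True)
class Var:
    name: str          # term variable (x, y, ...)

@dataclass(frozen=True)
class FunVar:
    name: str          # function variable (H, I, J, ...)

@dataclass(frozen=True)
class CollVar:
    name: str          # collection variable (y*, ...): binds a sub-list

def match(pat, term, s):
    # Try to extend substitution s so that s(pat) equals term; None on failure.
    if isinstance(pat, (Var, FunVar, CollVar)):
        if pat.name in s:
            return s if s[pat.name] == term else None
        return {**s, pat.name: term}
    if isinstance(pat, tuple) and isinstance(term, tuple) and len(pat) == len(term):
        for p, t in zip(pat, term):          # functional term: (head, arg1, ...)
            s = match(p, t, s)
            if s is None:
                return None
        return s
    if isinstance(pat, list) and isinstance(term, list):
        if pat and isinstance(pat[-1], CollVar):     # trailing collection variable
            k = len(pat) - 1
            if len(term) < k:
                return None
            for p, t in zip(pat[:k], term[:k]):
                s = match(p, t, s)
                if s is None:
                    return None
            return match(pat[-1], term[k:], s)       # bind y* to the remainder
        if len(pat) != len(term):
            return None
        for p, t in zip(pat, term):
            s = match(p, t, s)
            if s is None:
                return None
        return s
    return s if pat == term else None

def subst(t, s):
    # Apply substitution s to term t, splicing collection variables into lists.
    if isinstance(t, (Var, FunVar, CollVar)):
        return s[t.name]
    if isinstance(t, tuple):
        return tuple(subst(a, s) for a in t)
    if isinstance(t, list):
        out = []
        for a in t:
            if isinstance(a, CollVar):
                out.extend(s[a.name])
            else:
                out.append(subst(a, s))
        return out
    return t

lhs = [Var('x'), (FunVar('I'), 'b'), CollVar('y*')]
term = [('f', 'a'), ('g', 'b'), 'c', 'd']
s = match(lhs, term, {})
print(s)                   # {'x': ('f', 'a'), 'I': 'g', 'y*': ['c', 'd']}
print(subst(Var('x'), s))  # ('f', 'a'): the rule [x, I('b'), y*] -> x fires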

3.3. Rule control language

Traditionally in the literature, optimizer designers focus on rule definition and very little on the strategy component. Rules define possible local optimizations, whereas the strategy addresses the global optimization process. Rewrite rules perform query transformations and allow the optimizer to explore the search space of query plans. The strategy is important to ensure the efficiency and the termination of the exploration. The strategy determines what rules to apply and when, since at any point of the rewrite process several rules may be eligible to fire. Defining a strategy supposes a language to control the ordering of rules. That is why we propose the control language introduced hereafter.

A strategy consists of three steps: MATCH, SELECT and APPLY [17]. The first one determines the conflict set, which contains all rules firable on the query. The second one chooses one rule in the conflict set. Finally, the selected rule is applied. There are mainly two strategies to select a rule in a conflict set: based on priorities, or randomly. Accordingly, we define two basic constructors that allow the database implementor to define blocks or sequences of rules to run up to a certain limit. The block constructor corresponds to the random strategy. The syntax to define a block is:

{rule, rule, ...}credit


The set of rules specifies the rules that are in the block. The credit is a maximum number of rule application trials. Each time a rule of the conflict set is randomly chosen and evaluated, the credit of the block is decreased by one. An infinite credit (denoted *) means application up to saturation. A block is totally evaluated when the credit reaches zero or when there are no more rules in the conflict set. By default, the credit is one.

The sequence constructor corresponds to the rule priority strategy. The syntax to define a sequence is:

[rule, rule, ...]credit

The list of rules specifies the rule ordering of the sequence according to the rule priority. This strategy is more efficient than the previous one: as each rule is evaluated according to its list ordering, it is not necessary to compute the whole conflict set, which simply contains the first firable rule. Each time a rule is applied, the credit of the sequence is decreased by one. A sequence is totally evaluated when the credit reaches zero or when no more rules in the list are firable. By default, the credit is one.

These two basic constructors can be nested to produce more complex strategies. For example, one may specify the strategy [r1, r2, {r2, r3}N, {r4, r5}*]. This means that r1 should be applied first, then r2, then the block {r2, r3} with credit N, and finally the block {r4, r5} up to saturation. A list of rules is evaluated in sequence while a set is evaluated at random. The credit of a block or a sequence is decreased by one each time a rule is applied in the block or in the sequence. A block or a sequence is totally evaluated when the credit reaches zero or when all rules are saturated (i.e., the conflict set is empty). In the remainder of this paper, we illustrate how the rule language and the rule control language are used to model query rewriting, query planning and complex search strategies in the EDS optimizer.
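Operationally, the two constructors can be read as in the following hedged Python sketch, where a rule is modeled as a function returning the rewritten plan, or None when the rule is not firable; this encoding is ours, chosen only to illustrate the credit semantics.

import random

def run_block(rules, plan, credit=1):
    # {r1, r2, ...}credit: choose a firable rule at random; each trial
    # consumes one credit; stop at zero credit or on an empty conflict set.
    while credit > 0:
        conflict = [(r, r(plan)) for r in rules]
        conflict = [(r, p) for (r, p) in conflict if p is not None]
        if not conflict:
            break
        _, plan = random.choice(conflict)
        credit -= 1
    return plan

def run_sequence(rules, plan, credit=1):
    # [r1, r2, ...]credit: apply the first firable rule in priority order;
    # each application consumes one credit.
    while credit > 0:
        for r in rules:
            new = r(plan)
            if new is not None:
                plan, credit = new, credit - 1
                break
        else:
            break            # no rule is firable: the sequence is saturated
    return plan

# A nested strategy such as [r1, r2, {r2, r3}N, {r4, r5}*] could then be
# expressed by composing these calls, with float('inf') as the infinite credit.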

4. Query rewrite

Query rewriting transforms queries into equivalent simpler ones with better expected performance. It includes syntactic and semantic transformations. Syntactic optimization is one approach to query optimization; it is based on properties of relational algebra operations [38] and generally consists of permuting algebra operations. In the context of deductive databases, syntactic optimization also includes pushing selections before recursion, using the Magic Sets and Alexander methods [3, 30]. A complementary approach, called semantic optimization, transforms the query using semantic knowledge about the database [8]. When a query is evaluated against a database, integrity constraints can be used to make data selection in a more intelligent way. Solving queries that contradict integrity constraints can be avoided. Certain integrity constraints can simply be added to the query plan to optimize access path selection later. Using integrity constraints to optimize queries in a deductive environment with complex objects can generate valuable cost savings.

In the context of EDS, the knowledge exists in several forms. It includes equivalences of algebraic expressions (for LERA expressions), inference rules (for semantic transformations), logic formula simplification rules (for ADT function simplification and for detecting inconsistencies) and complex algorithms (e.g., rewriting methods). In this section, we show how both syntactic and semantic query optimization transformations are specified using our rule language.


MorganRules :
IF not(and(x, y)) THEN or(not(x), not(y))
IF not(or(x, y)) THEN and(not(x), not(y))
IF not(not(x)) THEN x

Disjunctive_Rule :
IF and(L1: x, or(y, z)) THEN or(and(:L1, y), and(:L1, z))

Conjunctive_Rule :
IF or(L1: x, and(y, z)) THEN and(or(:L1, y), or(:L1, z))

Fig. 11. Finding qualification normal form.

4.1. Syntactic optimization rules

Syntactic rewriting rules are used after parsing the query to perform a straightforward translation of an ESQL2 query into a LERA functional expression. They structure and order LERA operations to yield a canonical form of queries. More precisely, syntactic transformations are divided into three important activities: standardization, simplification and amelioration of the query.

4.1.1. Standardization

The standardization phase transforms the query into a canonical form. Standardization rules can transform the query qualifications into a conjunctive or disjunctive normal form, provide a compact representation of the query by merging LERA operations, or rewrite predicates. The standardization step is very important as it can ease the application of other rules. The rules in Fig. 11 find the disjunctive or conjunctive normal form of a qualification.

Merging rules reduce the size of a LERA program by removing unnecessary temporary relations. Qualifications are merged to detect inconsistencies and to ease predicate simplification. In Fig. 12, we illustrate the operation merging rules with the search merging rule: two successive search operators are simply merged and their qualifications are connected by 'and'. The SUBSTITUTE function maps all attributes in the qualification or in the expression of the external search to their corresponding attribute in the internal search. We also give the union merging rule. These rules eliminate the arbitrary processing order imposed by the user-written queries

Union_merging_rule :
IF UNION({UNION(x), y*})
THEN execute z = APPEND(x, y*)
     rewrite UNION(z) ;

Search_merging_rule :
IF SEARCH({[SEARCH(x, Qual1, Exp1), Var], y*}, Qual2, Exp2)
THEN execute New_qual = SUBSTITUTE(Qual2, Var, Exp1),
             New_exp = SUBSTITUTE(Exp2, Var, Exp1),
             z = APPEND(x, y*)
     rewrite SEARCH(z, and(Qual1, New_qual), New_exp);

Fig. 12. Examples of merging rules.


Equation_Evaluation_Rule : IF =(-(x, y), 0) THEN =(x, y) ;

Constant_Evaluation_Rule : IF L1: F(x, y) under Constant_term(x) and Constant_term(y)
                           THEN execute val = EVAL(:L1) rewrite val ;

Fig. 13. Predicate standardization rules.

through normalization. This normalization provides more opportunities to find the best access plan, as all constraints are merged. Predicates can be written in different ways in queries. The rules in Fig. 13 eliminate some predicate complexity by selecting a standard form.

4.1.2. Simplification

The simplification phase simplifies the query by finding redundancies or inconsistencies in a program. For example, the simplification rules in Fig. 14 detect inconsistency and redundancy in a qualification. In general, determining whether an expression is inconsistent is an NP-complete problem. However, it is possible to detect simple inconsistencies. Fig. 14 illustrates a few predicate simplification rules expressed with the presented rule language.

4.1.3. Amelioration

The amelioration of a query consists in the permutation of LERA operations. Permutation rules push constraints onto relations stored in the database and focus the query on relevant facts. Permutation rules are heuristics and do not guarantee a better processing plan; their role is to propagate constraints onto base relations when possible. We illustrate them by giving in Fig. 15 two rules that push a search down a LERA program. The first one is simple: it decomposes a search applied to a union of two relations into a union of two searches. The second one is

Idempotence_rules :
IF and(true, x) THEN x ;
IF and(or(L1: x, y), :L1) THEN :L1 ;
IF or(and(L1: x, y), :L1) THEN :L1 ;
IF and(false, x) THEN false ;
IF or(true, x) THEN true ;
IF and(L1: x, not(:L1)) THEN false ;

Fig. 14. Examples of predicate simplification rules.

Search_through_Union_Pushing_Rule :
IF SEARCH({[L1: UNION({y, z*}), Var], x*}, Qual, Exp)
THEN UNION({ SEARCH({[y, Var], x*}, Qual, Exp),
             SEARCH({[UNION({z*}), Var], x*}, Qual, Exp) }) ;

Search_through_Nest_Pushing_Rule :
IF SEARCH({[NEST(x, Gatt, Natt), Var], y*}, Qual, Exp)
   under REFER(Qual, Var, Gatt)
THEN execute Qual1 = EXTRACT(Qual, Var, Gatt),
             Qual2 = DIFF(Qual, Qual1),
             Exp1 = APPEND(Gatt, Natt)
     rewrite SEARCH({[NEST(SEARCH({[x, Var]}, Qual1, Exp1), Gatt, Natt), Var], y*}, Qual2, Exp);

Fig. 15. Examples of permutation rules.


more complex: it pushes a search through a nest when the search condition does not refer to nested attributes. This additional condition is added to the rule as a constraint using the REFER Boolean external function. This function checks whether the attributes of a given qualification belong to a given list of attributes. The EXTRACT function separates the predicates that refer to grouping attributes, then renames them.

In the case of recursive predicates, the permutation between operators is difficult. The application of a rewriting method such as Magic Sets [3] is helpful. Such methods transform recursive expressions into expressions that focus on relevant facts. In the rule given in Fig. 16, we apply the Magic Sets method once for every recursive predicate. One possibility is to add a condition for triggering the rule. This can be useful if we can characterize typical recursions that are computable with an efficient algorithm without applying the Magic Sets method.

4.2. Semantic rewriting rules

Traditionally in query optimizers, decisions are made mainly based on syntactic knowledge and data structures. Semantic query optimization explores all semantically equivalent queries. This increases the search space of the optimizer. Research in semantic query optimization has demonstrated the potential time savings possible with proper use of inference rules; this should be even more significant with complex objects and deductive rules. Database semantic properties are defined at the schema level, and should be integrated in the logical optimizer as one possibility to improve the query for physical optimization. Profitable predicates can be added. Predicate elimination is also possible, to simplify the qualification in case of contradictions.

ESQL2 queries are expressed over well-understood structures such as sets, lists, arrays and bags. These structures have natural algebraic operations (e.g., inclusion, intersection, etc.) and privileged predicates (equality, membership, etc.) associated with them. The properties of these algebraic operations and predicates comprise the implicit semantic knowledge. It defines properties of system constructs, which are specified by the database implementor. In the same manner, integrity constraints define properties satisfied by objects, which are declared by the user. As noted in [41], implicit and explicit semantic knowledge play important and complementary roles in passing from one representation to another. We now study the integration of these two kinds of semantic knowledge in our extensible optimizer.

The explicit semantic knowledge is composed of the set of integrity constraints declared by the database administrator. An integrity constraint is an axiom that must be satisfied by all data inserted in the database. Integrity constraints can also refer to complex objects. In the context of ESQL, integrity constraints may be declared either on extensions or on types. The ESQL syntax is derived from SQL2, extended to type constraints. For example, we define in Fig. 17 three integrity constraints on the database schema of Fig. 1. Implicit semantic knowledge corresponds to well-known properties of ADT functions.

Magic_Set_Rule :
IF SEARCH({[L1: FIX(x, Alg_exp), Var], y*}, Qual, Exp)
THEN execute Signature = ADORNMENT(Qual, Var),
             z = MAGIC(:L1, Signature)
     rewrite SEARCH({[z, Var], y*}, Qual, Exp)

Fig. 16. The Magic Sets invocation rule.


CREATE CONSTRAINT Positive_Salary ON TABLE APPEARS_IN
CHECK RefActor.Salary BETWEEN 0 AND 999999 ;

CREATE CONSTRAINT Possible_Categories ON TYPE SetCategory
CHECK VALUE IN {'Comedy', 'Adventure', 'Western'} ;

CREATE CONSTRAINT Greater_Score ON TABLE DOMINATE
CHECK Score.Pros > Score.Cons ;

Fig. 17. Examples of explicit integrity constraints.

Positive_Salary :
IF SEARCH(L1: {['APPEARS_IN', Var], x*}, Qual, Exp)
THEN SEARCH(:L1, and(Qual, between(Var.actor.salary, 0, 999999)), Exp);

Possible_Categories :
IF SEARCH(L1: {['FILM', Var], x*}, Qual, Exp)
THEN SEARCH(:L1, and(Qual, include(Var.categories, {'Comedy', 'Adventure', 'Western'})), Exp);

Greater_Score :
IF SEARCH(L1: {['DOMINATE', Var], x*}, Qual, Exp)
THEN SEARCH(:L1, and(Qual, sup(Var.score.pros, Var.score.cons)), Exp);

Fig. 18. Examples of explicit semantic rules.

For defining implicit semantic knowledge, we propose that the database implementor use the rule language for specifying integrity constraints. For illustration, we give some fundamental properties of the equality operator and the inclusion relationship, as defined in [41], namely the transitivity of equality and of set inclusion, and the equality substitution rule (i.e., (x = y) and P(x) implies P(y)). The addition of semantic knowledge to queries may be useful to further simplify the predicates in qualifications and to improve access planning. The addition of the defined constraints to a query is simply done by adding the rules given in Fig. 18 to the query rewriter. The implicit constraints can be expressed in several ways in the query rewriter, depending on the choices made by the database implementor. For example, in Fig. 19, we presume that the qualification is transformed into a canonical form based on an n-ary and predicate; this avoids declaring the commutativity of the 'and' predicate.

Choosing appropriate explicit integrity constraints or implicit constraints that guarantee profitability for the query is a hard problem. In our language, this can be achieved by strengthening the rule condition. A classification of semantic inference rules is helpful to choose appropriate ones.

Transitivity_of_equality_Rule :
IF and({L1: =(x, y), L2: =(y, z), u*})
THEN and({=(x, z), :L1, :L2, u*});

Transitivity_of_set_inclusion_Rule :
IF and({L1: include(x, y), L2: include(y, z), u*})
THEN and({include(x, z), :L1, :L2, u*});

Equality_of_substitution_Rule :
IF and({L1: =(x, y), L2: I(x), z*})
THEN and({I(y), :L1, :L2, z*});

Fig. 19. Examples of implicit semantic rules.


Filter_Project_Rule :
IF SEARCH({[x, Var], x*}, and({y, y*}), Exp)
   under Mono_Predicate(x, y)
THEN SEARCH({[FILTER_PROJECT(x, y), Var], x*}, and({y*}), Exp);

Fig. 20. Filter_Project extraction rule.

For example, domain integrity constraints can be very useful: if there exists another constraint on the attribute, an inconsistency can be detected quickly. For example, MEMBER('Cartoon', Categories) is inconsistent because MEMBER('Cartoon', {'Comedy', 'Adventure', 'Western'}) is false.

5. Query planning

Query planning includes data dependent transformations such as the decomposition and ordering (e.g. join ordering) of operations, and access path selection. In this section, we present some rules illustrating the possible support of query planning using our rule language.

5.1. Decomposition and ordering of operations

In this sub-section, we give the rewrite rules used to transform SEARCH operations into a sequence of FILTER_PROJECT and JOIN operations. For example, the rule given in Fig. 20 analyzes the qualification predicates to extract mono-relation predicates. The rule given in Fig. 21 extracts join predicates from the search qualification. Pushing restrictions before joins is a useful heuristic of relational systems. This heuristic is easy to define with our rule control language: it consists of applying the Filter_Project extraction rule up to saturation before the join extraction rule. However, if join indexes are supported or if costly ADT functions appear in a Filter_Project, the strategy defined above should be changed. Indeed, this heuristic is then no longer valid and cost variations should be considered.

In the following, we assume that restrictions are done at the beginning. Thus, the optimizer is mainly concerned with join ordering. In join ordering methods, several join processing trees have been considered, among them linear trees and bushy trees. A linear tree is a left-deep linear tree if all join right operands are base relations. A bushy tree is a join processing tree in which each operand can independently be a base relation or an intermediate result. For example, the rule given in Fig. 21 builds bushy trees. Left-deep trees can easily be obtained by constraining the join extraction rule; for example, we can specify that the x2 relation should be a base relation. Notice that the join extraction rule does not investigate Cartesian products.

Join_Rule :
IF SEARCH({[x1, Var1], [x2, Var2], x*}, and({y, y*}), Exp)
   under Join_Predicate(x1, x2, y)
THEN SEARCH({[JOIN(x1, x2, y), Var], x*}, and({y*}), Exp);

Fig. 21. Join extraction rule.


Join_Commutativity_Rule :
IF JOIN(R1, R2) THEN JOIN(R2, R1)

Join_Associativity_Rule :
IF L1: JOIN(JOIN(R1, R2), R3)
   delta_cost = DeltaCost_ChangeJoin(:L1)
THEN JOIN(R1, JOIN(R2, R3))

Fig. 22. Join transformation rules.

In Fig. 22, we define the usual valid transformation rules on bushy trees [23]. We intentionally omit join predicates to increase the readability of the rules. Note that in the second rule a delta_cost method is specified, which computes the variation of cost induced by changing the order of the joins; the result can be positive or negative.
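For illustration, the following Python sketch (our encoding, not the optimizer's internal plan representation) treats a join tree as a nested tuple and enumerates the neighbor plans produced by one application of the commutativity or associativity rule of Fig. 22; iterating such moves from a start plan spans the bushy-tree search space explored by the strategies of Section 6.

def moves(tree):
    # A join tree is ('join', left, right); leaves are relation names.
    if not isinstance(tree, tuple):
        return
    _, l, r = tree
    yield ('join', r, l)                       # Join_Commutativity_Rule
    if isinstance(l, tuple):                   # Join_Associativity_Rule:
        _, l1, l2 = l                          # (R1 join R2) join R3
        yield ('join', l1, ('join', l2, r))    #   -> R1 join (R2 join R3)
    for i, child in ((1, l), (2, r)):          # apply the rules in subtrees too
        for alt in moves(child):
            t = list(tree)
            t[i] = alt
            yield tuple(t)

plan = ('join', ('join', 'FILM', 'APPEARS_IN'), 'ACTOR')
for alt in moves(plan):
    print(alt)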

5.2. Algorithm selection

Up to now, algebraic transformations have been defined. In this sub-section, the given rules associate with each algebraic operation the possible algorithms provided by the system. We present only significant examples. Notice that for all the following rules the delta_cost clause is specified; it gives, for each selected algorithm, the estimated variation of cost when applying the algorithm. By doing this, it becomes possible to choose the algorithm that gives the better cost.

Traditionally, different access methods are proposed to increase the efficiency of the Filter_Project operation. In Fig. 23, we illustrate rules to force selection through B-trees or R-trees. The first rule simply states that if there exists a B-tree on an attribute checked for equality, then the table might be accessed through the B_Tree function. The second rule is similar if there exists an R-tree on an attribute checked for inclusion in a rectangle.

Join methods generally distinguish the two operand relations, the first one being external and the other internal. The external relation is pre-processed (e.g. hashed) first and the internal relation second, possibly using values derived from the processing of the external relation (e.g. bit arrays). Next, a final processing of both pre-processed relations is necessary (e.g. a merge of the corresponding buckets). Rules generating all alternatives are required in the optimizer as the cost formulas are not symmetrical. In Fig. 24, we give the rules corresponding to the possible selection of the hash join algorithm. Another way of accelerating joins is to use indexes [40]. The existence of a join index can be checked in a rule and marked in a join node, as done in the rule of Fig. 25. Further algorithms can be selected by extending the types of additional structures checked by the EXIST function and by supporting more physical operations (with corresponding algorithms executed at run time).

BTree_Rule :
IF L1: FILTER_PROJECT([x, Var], and({=(Var.i, Const1), y*}), Exp)
   under EXIST('BTree', i, x)
   delta_cost = DeltaCost_Btree(:L1)
THEN FILTER_PROJECT([B_Tree(x, i, Const1), Var], and({y*}), Exp);

RTree_Rule :
IF L1: FILTER_PROJECT([x, Var], and({inside(Var.i, Const1), y*}), Exp)
   under EXIST('RTree', i, x)
   delta_cost = DeltaCost_Rtree(:L1)
THEN FILTER_PROJECT([R_Tree(x, i, Const1), Var], and({y*}), Exp)

Fig. 23. Access method selection.


Hash_Join_1 : /* R1 as the external relation */
IF JOIN(R1, R2, Pred, Proj_Exp)
   delta_cost = DeltaCost_HashJoin(R1, R2)
THEN HASH_JOIN(R1, R2, Pred, Proj_Exp);

Hash_Join_2 : /* R2 as the external relation */
IF JOIN(R1, R2, Pred, Proj_Exp)
   delta_cost = DeltaCost_HashJoin(R2, R1)
THEN HASH_JOIN(R2, R1, Pred, Proj_Exp);

Fig. 24. Hash join algorithm selection.

Join_index :
IF JOIN(R1, R2, Pred, Proj_Exp)
   under EXIST('Join_index', R1, R2, Pred)
   delta_cost = DeltaCost_IndexJoin(R2, R1)
THEN INDEX_JOIN(R1, R2, Pred, Proj_Exp);

Fig. 25. Join indices algorithm selection.

6. Implementing search strategies with meta-rules

One key aspect of query planning is the control strategy, which determines what transformation rule to apply and when. The control strategy implements a search among the set of plans that the rules can generate from the initial query plan. In the literature, several search strategies have been proposed to explore the search space of query plans [22, 36]. Object-oriented implementations of several of them are nicely described in [27].

6.1. Principles

In this paper, we give meta-rules for implementing variations of exhaustive search, iterative improvement and simulated annealing. Meta-rules are high-level rules referencing the generated plans and the transformation rules. The syntax of the meta-rules is similar to that of rules. Meta-rules work on special objects:
• States, which record query plans with contexts including the estimated costs;
• The traversed search space, which contains the list of evaluated states;
• Rules, which are mapping functions from one state to another state.
To define meta-rules, the search space is considered as a term:

Search_Space([State(P0), State(P1), ..., State(Pn)])

The specification of a search strategy requires rules to generate query plans and to select the best generated plan. Generated plans are memorized as successive states in the search space. One problem is to find the rules that apply to the working state. Then, among them, a rule is elected. Finally, the chosen rule is applied to produce another state. To illustrate the introduced meta-rule language, we give in Fig. 26 meta-rules that are useful for specifying various strategies. These meta-rules use three main built-in methods provided in our extensible optimizer, Match, Select and Apply, which respectively correspond to the three steps just defined and mentioned in our rule control language.


Conflict_Set_State :
IF Search_Space([State(x), z*])
THEN execute l = Match(x, 'R')
     rewrite Search_Space([Conflict_Set(x, l), z*]) ;

Transform_State :
IF Search_Space([Conflict_Set(x, l), z*])
   under Not_Empty(l)
THEN execute subst = Select(l), new = Apply(x, subst)
     rewrite Search_Space([State(new), z*]) ;
ELSE rewrite Search_Space([z*, Goal(x)]) ;

Fig. 26. Two meta-rules generating an alternative state of one state.

The Match method takes as parameters a term and a specific list of potential rewrite rules. It returns the conflict set of the term, including the list of valid substitutions. A valid substitution is characterized by a sub-term, a variable substitution list and the cost variation of the rule (i.e., the delta_cost value). If the delta_cost clause is empty, the default value is 0. For example, a possible value of the result of the Match function is [(R1, {(τ1', σ1, +10), (τ2', σ2, -20)}), (R2, {(τn', σn, 0)})]. The Select function returns the first selected rule with the associated substitution (e.g., (R1, (τ1', σ1, δ1))); a variation is Select_Best, which returns the minimum cost substitution (e.g., (R1, (τ2', σ2, -20))). Finally, the Apply function applies the valid substitution to a term and produces another term.

The first meta-rule of Fig. 26 generates the conflict set of one state from a specific list of potential rewrite rules denoted R. The working state is rewritten into a Conflict_Set term that represents a state in the middle of its transformation process. Thus, a second mandatory meta-rule follows the first one. The Transform_State meta-rule chooses one rule in the conflict set and applies it to the working state to produce another state. The Not_Empty function checks whether there exists a valid substitution. Instead of defining two rules corresponding to the cases where Not_Empty is true or false, we introduce in the second meta-rule an 'ELSE' construct to facilitate readability. For example, if the conflict set is empty, no rule can be applied to the state. The rewriting is consequently done to reach a terminal state denoted Goal.

Many variations of the given rules are possible. For example, we can use another Select function (e.g. Select_Best or Select_Random) or another Apply function (e.g. Apply_All, which returns all possible alternative states of one state from a given list of potential rewrite rules). In the remainder of this section, we show how meta-rules are used to define well-known search strategies. This approach yields more extensibility: the database implementor can change the query optimizer strategy using the rule language he/she is used to. This does not require complex programming or reprogramming. Thus, it becomes possible to implement various search strategies in an optimizer in an easy way, for example one for each module of rules. The value of the approach is also to model the search strategy of the optimizer with the rule language, which is a good exercise.
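Read procedurally, Select and Select_Best behave as in the short Python sketch below; the tuple shapes mirror the Match result given above, and all the concrete values are illustrative.

def select(conflict_set):
    # Select: the first rule with its first valid substitution.
    rule, substitutions = conflict_set[0]
    return rule, substitutions[0]

def select_best(conflict_set):
    # Select_Best: the substitution with the minimum delta_cost (third field).
    return min(((rule, s) for rule, subs in conflict_set for s in subs),
               key=lambda pair: pair[1][2])

# Match result shape: [(rule, [(subterm, substitution, delta_cost), ...]), ...]
conflict = [('R1', [('t1', 'sigma1', +10), ('t2', 'sigma2', -20)]),
            ('R2', [('tn', 'sigman', 0)])]
print(select(conflict))        # ('R1', ('t1', 'sigma1', 10))
print(select_best(conflict))   # ('R1', ('t2', 'sigma2', -20))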

6.2. Greedy search

The greedy search strategy is very simple. It starts from the initial plan derived from the query (using the SetInitPlan method), then applies all profitable transformations in sequence to find a local minimum. At each step, it generates the conflict set and looks for the best


immediate neighbor of the current state. A profitable transformation can be chosen based on the estimated cost variation of a rule (i.e. the delta_cost value) or on the real cost of a state after the application of the rewrite rule. The first alternative is used when the estimated differential costs are meaningful and given in the rewrite rules. The second one is used when the delta_cost clause is often omitted. On the basis of the rules given in Fig. 26, the greedy search strategy is described in Fig. 27. The first rule defines the initial plan. The second and third rules are very similar to those given in Fig. 26 and look for the best improving transformation. As said before, these two rules should be applied in sequence. To reach the local minimum, this sequence is applied up to saturation. The control expression of the meta-rules captures the entire strategy: it applies the first rule once, and then the sequence of the second and third rules up to saturation.

6.3. Iterative improvement Iterative Improvement is a randomized strategy. It selects start plans at random in the search space. From a start state, it applies profitable transformations also chosen at random. Finally, the least costly traversed plan is kept as the optimum. The selected start plans must represent a good sample of the search space. Several solutions exist to generate the start plans. The first start plan is the initial query plan. Then, when a local minimum is reached, a new start is generated from it and added to the search space. In Fig. 28, we give the rules corresponding to an implementation of Iterative Improvement. We start from the plan generated by the method SetInitPlan. Then, we try to reach a local minimum by applying profitable transformations chosen at random. This task is performed by the second, third and fourth rules. Again, we find a variation of the sequence of rules given in Fig. 26 and 27. In the Transform_State rule the valid substitution is chosen at random. The major difference with the initial rules consists in the way profitable transformations are defined. Previously, we based our choice on the estimated cost variation of the rule. In the context of randomized strategies, this is difficult to estimate the delta_cost of a rule. For example, if we want to apply a 3-ways join permutation on a plan, it is not easy to predict the delta_cost. Thus, it is sometimes necessary to compute again the global cost of the entire plan. The Choose_Profitable_Move rule compares the two alternatives and decides to keep the current state if there is no benefit, or to change it by a new one. When the transformation is Set_Initial_Plan: IF Query(x) THEN execute y=SetlnitPlan(x) rewrite Search_Space([State(y)]) ; Conflict_Set_State: IF Search_Space([ State(x)]) THEN execute 1 =Match (x, 'R') rewrite Search Space([ Conflict_Set(x, l)]) ; TransformState: IF Search_Space([Conflict_Set(x,1)]) under Not_Empty(l) THEN execute subst = SelectBest(l), new = Apply(x, subst) rewrite Search_Space( [ State(new)]) ; ELSE rewrite SearchSpace( [ Goal(x)D ; Control: [Set_Initial_Plan, [Conflict_Set_State, Transform_State]*] ; Fig. 27. G r e e d y search meta-rules.

B. Finance, G. Gardarin / Data & Knowledge Engineering 13 (1994) 1-29

21

Set._Initial_Plan: IF Query(x) THEN execute y=SetlnitPlan(x) rewrite Search_Space([State(y)]) ; Conflict_Set_State: IF Search_Space([ State(x), z*]) THEN execute I =Match (x, 'R') rewrite Search_Space([ Conflict_Set(x, I), z*]) ; TransformState: IF Search_Space([Conflict_Set(x,l), z*]) under Not_Empty(l) THEN execute subst=Select Random(l), new = Apply(x, subst), I' = Minus(l, subst) rewrite Search_Space( [ State(new), Conflict_set(x, l'), z*]) ; ELSE rewrite Search_Space( [ Goal (x), z*]) ; Choose_ProfitableMove: IF Search_Space([State(new), Conflict_Set(x,l), z*]) under Cost(new) < Cost(x) THEN rewrite Search_Space( [ State(new), z*]) ; ELSE rewrite Search_Space( [ Conflict_Set(x,l), z*]) ; Generate_New_Start: IF Search_Space([State(x), z*]) THEN execute new = New_Start(x) rewrite Search_Space( [ State(new), State (x), z*]) ; Find_Best_Plan: IF Search Space([State(x), State(y), z*]}) u n d e r Cost(x) < Cost(y) THEN Search_Space([State(x), z*]) ELSE Search_Space([State(y), z*]) ; Control:

[ Set_Initial_Plan, [

[Conflict_Set_State, [Transform_State, ChooseProfitable_Move ]* ]10, Generate_New_Start] 5, [Find Best Plan]* ]; Fig. 28. Iterative i m p r o v e m e n t meta-rules.

not profitable, the Transform_State and Choose_Profitable_Move rules are applied up to saturation. The process will stop when no more profitable transformation is found, which means that we reach a local minimum. These three rules are applied up to saturation or up to a certain number of times depending on the number of profitable moves that are accepted. For example, in Fig. 28, this number is 10, which means that only ten profitable transformations are accepted. When the limit is reached a new starting plan is generated from the working plan; this is achieved by the Generate_New_Start rule. This new plan is added to the search space, then the sequence of rules defined just above is applied again. The number of start plans is fixed to 5 in our example. Finally, when all start plans have been explored, the Find_Best_Plan rule is applied recursively to select the best one. In summary, the control sequence given in Fig. 28 captures the entire strategy by requiring to apply once the first rule, then to apply 5 times a specific sequence of rules. This sequence aims to perform 10 profitable moves on each start plan. It requires to apply once the Conflict_Set_State rule, then to apply in sequence and up to saturation the Transform_State and Choose_Profitable_Move rules. The traversed search space, which is fully recorded, is finally pruned to extract the best plan.
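A procedural reading of Fig. 28 may help. The C++ sketch below (again illustrative, not the prototype's code) uses the same hypothetical stand-ins as the greedy sketch, plus a NewStart callback for the Generate_New_Start rule, and hard-codes the limits 10 and 5 from the control sequence.

    #include <functional>
    #include <random>
    #include <vector>

    // Hypothetical stand-ins, as in the greedy sketch (illustrative only).
    struct Plan { double cost = 0; };
    using Cost      = std::function<double(const Plan&)>;
    using Neighbors = std::function<std::vector<Plan>(const Plan&)>;
    using NewStart  = std::function<Plan(const Plan&)>;

    // Iterative Improvement with the limits of Fig. 28: 5 start plans and
    // at most 10 accepted (profitable) moves per start plan.
    Plan iterative_improvement(Plan start, const Cost& cost,
                               const Neighbors& neighbors, const NewStart& newStart,
                               int nStarts = 5, int maxMoves = 10) {
        std::mt19937 rng{42};
        Plan best = start;
        for (int s = 0; s < nStarts; ++s) {
            Plan current = start;
            for (int moves = 0; moves < maxMoves; ) {
                std::vector<Plan> conflictSet = neighbors(current);  // Match
                bool improved = false;
                while (!conflictSet.empty()) {                 // saturate
                    std::uniform_int_distribution<std::size_t>
                        pick(0, conflictSet.size() - 1);
                    std::size_t i = pick(rng);                 // Select_Random
                    if (cost(conflictSet[i]) < cost(current)) {
                        current = conflictSet[i];              // profitable move
                        ++moves;
                        improved = true;
                        break;                                 // rebuild conflict set
                    }
                    conflictSet.erase(conflictSet.begin() + i); // Minus(l, subst)
                }
                if (!improved) break;       // local minimum: no profitable move left
            }
            if (cost(current) < cost(best)) best = current;    // Find_Best_Plan
            start = newStart(current);                         // Generate_New_Start
        }
        return best;
    }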

6.4. Simulated annealing

Simulated Annealing is also a randomized search strategy. It starts from the initial plan and tries to improve it as Iterative Improvement does. However, a temporary deterioration in the cost of the plan is permitted, in order to explore the search space more thoroughly. A temperature is defined for the system, which varies from a maximum value down to 0. At 0, the system is frozen and only profitable moves are accepted. Before the system freezes, the criterion for accepting a transformation differs from that of Iterative Improvement: a transformation is accepted if it is profitable, but also, if it is not, with a probability that depends on the temperature and the increase in cost. The basic idea is to accept a move that deteriorates the cost with a threshold depending on the system temperature. For example, the probability of accepting a bad move may be given by the formula [21]:

  Prob(temperature, CostBefore, CostAfter) = e^(-(CostAfter - CostBefore)/temperature)

A possible choice for the temperature is the maximum allowed number of moves minus the number of executed moves. The SetInitTemp and ReduceTemp methods compute the temperature. Fig. 29 gives the meta-rules that implement Simulated Annealing. Note that the meta-rules are very similar to those given in Fig. 28. The major differences are that Simulated Annealing only works on one start plan (i.e. the Generate_New_Start and Find_Best_Plan rules are removed), and that the constraint and execute clauses of the initial rules are modified to integrate the SetInitTemp and ReduceTemp methods. The Probability(CostBefore, CostAfter) function is a Boolean function that returns true if the transformation is acceptable. In contrast to the Iterative Improvement strategy, the control sequence is applied up to saturation; the process naturally stops when the system freezes.

Set_Initial_Plan:
  IF Query(x)
  THEN execute y = SetInitPlan(x), SetInitTemp()
       rewrite Search_Space([State(y)]) ;
Conflict_Set_State:
  IF Search_Space([State(x)])
  THEN execute l = Match(x, 'R')
       rewrite Search_Space([Conflict_Set(x, l)]) ;
Transform_State:
  IF Search_Space([Conflict_Set(x, l)]) under Not_Empty(l) or Frozen()
  THEN execute subst = Select_Random(l), new = Apply(x, subst), l' = Minus(l, subst)
       rewrite Search_Space([State(new), Conflict_Set(x, l')]) ;
  ELSE rewrite Search_Space([Goal(x)]) ;
Choose_Acceptable_Move:
  IF Search_Space([State(new), Conflict_Set(x, l)]) under Probability(Cost(x), Cost(new))
  THEN execute ReduceTemp()
       rewrite Search_Space([State(new)]) ;
  ELSE execute ReduceTemp()
       rewrite Search_Space([Conflict_Set(x, l)]) ;
Control: [Set_Initial_Plan, [Conflict_Set_State, [Transform_State, Choose_Acceptable_Move]*]*] ;

Fig. 29. Simulated annealing meta-rules.
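Under the same hypothetical stand-ins as in the previous sketches, the acceptance loop can be written in C++ as follows. The temperature schedule (maximum number of moves minus moves executed) is the one suggested above, and the acceptance test implements the probability formula of [21]; the code is an illustration of the strategy, not the prototype's implementation.

    #include <cmath>
    #include <functional>
    #include <random>
    #include <vector>

    // Hypothetical stand-ins (illustrative only).
    struct Plan { double cost = 0; };
    using Cost      = std::function<double(const Plan&)>;
    using Neighbors = std::function<std::vector<Plan>(const Plan&)>;

    // Simulated Annealing: accept any profitable move; accept a
    // deteriorating move with probability e^(-delta/temperature).
    Plan simulated_annealing(Plan current, const Cost& cost,
                             const Neighbors& neighbors, int maxMoves = 100) {
        std::mt19937 rng{42};
        std::uniform_real_distribution<double> uniform(0.0, 1.0);
        Plan best = current;
        for (int move = 0; move < maxMoves; ++move) {
            double temperature = double(maxMoves - move); // SetInitTemp/ReduceTemp
            std::vector<Plan> conflictSet = neighbors(current);
            if (conflictSet.empty()) break;               // nothing left to try
            std::uniform_int_distribution<std::size_t>
                pick(0, conflictSet.size() - 1);
            const Plan& candidate = conflictSet[pick(rng)];
            double delta = cost(candidate) - cost(current);
            // Probability(CostBefore, CostAfter): true for profitable moves,
            // true with probability e^(-delta/temperature) otherwise.
            if (delta < 0 || uniform(rng) < std::exp(-delta / temperature)) {
                current = candidate;
                if (cost(current) < cost(best)) best = current;
            }
        }
        return best;
    }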

7. Architecture and implementation of the query optimizer

Up to now, we have introduced a term rewriting rule language for expressing query transformations. Meta-rules have been added to express search strategies using the same rule language on sets of plans. We have demonstrated that many kinds of query transformations and optimization techniques, including search strategies, can be expressed with this language.


In the following, we present the current architecture of the optimizer that we have implemented in the EDS ESPRIT project and the status of the implementation.

7.1. Architecture of the optimizer

The general architecture of the optimizer is shown in Fig. 30. The optimizer transforms a LERA program, generated by the ESQL2 parser, into a LERA_opt program (that is, an optimized LERA program). The main module is the strategy evaluator, which executes the global strategy. The global strategy defines how the optimizer is organized. For example, the optimizer can be divided in two phases (i.e., query rewriting and query planning) or in more complex phases. The global strategy is defined using the rule control language, which is applied under the direction of the strategy evaluator; the evaluator also controls the number of rule applications according to the limits given in the meta-rules. In fact, the global strategy chains together the different control strategies defined by meta-rules. The evaluation of the control sequence corresponds to the activation of specific search strategies. The optimizer engine is based on the Match, Select and Apply procedures as defined in Section 3.3. Rules and meta-rules are stored in the optimizer library. They are organized in modules so that new optimization techniques or new search strategies can be added, modified or deleted without affecting other modules in the library. A module refers either to optimization techniques (i.e. semantic query optimization, syntactic query optimization, query planning) or to search strategies (i.e. Greedy Search, Iterative Improvement, Simulated Annealing). An optimization module can be linked to a strategy module, meaning that the rules given in the optimization module are applied under a specific strategy. Many links can be defined, each one corresponding to a specific implementation of the optimizer. In Fig. 30, c4 refers to query planning with Simulated Annealing. Rules and meta-rules refer to built-in or user-defined methods. The best-known methods are Match, Select and Apply to specify strategies, Cost to evaluate the cost of a plan, and catalog manager methods to test, for example, the presence of an index on a relation or a class.

Fig. 30. The extensible optimizer architecture.

As the global strategy is interpreted by the optimizer engine, it is easy to skip or reorder a module. For example, semantic query optimization can be avoided, syntactic optimization can be applied after semantic optimization, or a randomized strategy can be selected for query planning rather than an exhaustive one. For instance, the global strategy can be [c1, c4] or [c1, c2, c3]. The performance of the optimizer engine is very dependent on the efficiency of the Match, Select and Apply methods. The Match procedure performs pattern matching in the LERA program to find the conflict set among the active rules. The LERA program is implemented as a graph of nodes, each node being a C++ object describing a LERA operation. The Match procedure considers the first rule of the active ones and traverses the LERA graph from the root in a depth-first way. If the type of a node matches the type of the condition (e.g. both are SEARCH), then unification of terms is attempted. If it succeeds, Match returns the variable substitutions. The Select procedure is rather simple: unless specified otherwise, it selects the first active rule that matches. The Apply procedure first executes the methods to bind arguments and then rewrites the query plan as required by the substitution. A more sophisticated inference engine could be implemented to converge more quickly towards the optimal plan.
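As an illustration of how these three procedures cooperate, here is a minimal C++ sketch of one saturation pass of the engine. Node, Rule and the match/apply callbacks are simplified stand-ins for the prototype's actual C++ classes, which are not reproduced here.

    #include <functional>
    #include <optional>
    #include <vector>

    // Simplified stand-ins for the prototype's classes (illustrative only).
    struct Node { std::vector<Node*> children; /* LERA operation, arguments... */ };
    struct Substitution { Node* site = nullptr; /* variable bindings at the match */ };

    struct Rule {
        // Match: depth-first traversal from the root, unifying the rule's
        // condition pattern with each node; returns bindings on success.
        std::function<std::optional<Substitution>(Node&)> match;
        // Apply: execute the rule's methods, then rewrite the graph in place.
        std::function<void(Node&, const Substitution&)> apply;
    };

    // One saturation pass: Select is "the first active rule that matches";
    // the loop runs until no rule fires (an external time limit may also stop it).
    void saturate(Node& root, const std::vector<Rule>& rules) {
        bool fired = true;
        while (fired) {
            fired = false;
            for (const Rule& r : rules) {
                if (auto subst = r.match(root)) {  // a conflict-set entry found
                    r.apply(root, *subst);         // transform the plan graph
                    fired = true;
                    break;                         // restart from the first rule
                }
            }
        }
    }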

7.2. Status of the implementation

A prototype of part of the optimizer has been developed at the PRiSM Laboratory in Versailles, in cooperation with INFOSYS, INRIA and BULL. It validates the approach given in this paper. The current implementation is done in C++. The query is represented by a LERA graph. The graph may have cycles in the case of fixpoint. The inference engine, including the strategy evaluator, Match, Select and Apply components, has been implemented. Two optimization modules have been fully implemented: the syntactic transformation module and the query planning module. Only syntactic optimization rules have been introduced in the rule library under the form of rules. The corresponding rule module is called the logical optimizer. A fixed irrevocable strategy is specified for the syntactic optimization rules. The query graph is scanned node by node and each rule that qualifies is applied until no more rules can be applied or the maximum optimization time is exhausted. Thus, the current version of the logical optimizer module applies transformation rules in an irrevocable manner, without taking cost considerations into account. It delivers a canonical form of queries. This canonical form is then processed in a second module called the physical optimizer, which performs query planning [27]. The physical optimizer is not implemented as a rule base but rather as a procedural program; however, an abstraction with rules using the presented rule language has been done. The physical optimizer has been exercised with various search strategies, including greedy search, iterative improvement and simulated annealing [28]. The optimizer is operational and has been demonstrated at ESPRIT reviews and conferences. However, we do not directly implement the rules in the formal form given in this paper: a 'by hand' compilation into C++ was performed. The rule compiler is currently being developed. Each rule is translated into a Horn clause with a head and a tail composed of a sequence of C++ procedure calls. The head of the clause is its name. The head includes the search for the particular pattern in the query graph (i.e. a call to Match) and the constraint predicate checks; the tail includes the method calls and the graph transformation (i.e. a call to Apply). The next step of the implementation is to develop the rule compiler for generating optimizers in C++. From our 'by hand' experience, it seems to be a feasible task. Further, we would like to introduce semantic knowledge rules in the optimization rule library, which requires improving the pattern matching component of the optimizer engine. Then, strategies including differential cost computation should be implemented to fully support the specified approach. Our ultimate goal is to compare various classes of optimizers generated with different optimization rule libraries incorporating various strategies.
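As an example of this compilation scheme, the Transform_State rule of Fig. 27 might be translated by hand into a C++ function along the following lines. PlanGraph, Bindings and the four built-ins are declarations standing in for the engine's actual methods (their real signatures are not given in the paper), so the fragment shows the head/tail structure rather than working code.

    #include <optional>

    // Declarations standing in for the engine's built-ins (illustrative only).
    struct PlanGraph { /* graph of LERA operation nodes */ };
    struct Bindings  { /* variable substitutions found by Match */ };

    std::optional<Bindings> Match(PlanGraph& g, const char* pattern);
    bool     Not_Empty(const Bindings& l);
    Bindings SelectBest(const Bindings& l);
    void     Apply(PlanGraph& g, const Bindings& subst);

    // Transform_State (Fig. 27) compiled by hand: the head performs the
    // pattern search and the constraint check; the tail runs the methods
    // and the graph rewrite, mirroring the Horn-clause translation.
    // (The rule's ELSE branch, producing Goal, is left to the caller here.)
    bool Transform_State(PlanGraph& g) {
        auto l = Match(g, "Search_Space([Conflict_Set(x, l)])"); // head: Match
        if (!l || !Not_Empty(*l)) return false;                  // head: constraint
        Bindings subst = SelectBest(*l);                         // tail: execute
        Apply(g, subst);                                         // tail: rewrite
        return true;
    }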

8. Conclusion

This paper has presented an extensible optimizer for extensible databases supporting complex objects and user methods. A high-level rule language has been proposed to express transformation rules and optimization strategies. The rule language is uniform (based on term rewriting under constraints) and supports syntactic, semantic and cost-based optimization rules. Reusability of generic methods for developing an extensible optimizer is important. Extensibility is provided by offering a rule-based system invoking reusable methods that can be extended by database implementors. One of the main original features of the proposed extensible optimizer is to allow the database implementor to specify various search strategies using meta-rules. Meta-rules and rules are written in the same language; this is possible because the optimized query plans are accessible as terms to which rewrite rules apply. Examples of the best-known strategies have been given as typical meta-rule programs. The power of the rule language and the separation of the control structures from the rules make them compact and relatively easy to understand. Meta-rules are interpreted by the strategy evaluator component of the extensible optimizer.

Organizing query processing knowledge in transformation rules and strategy control meta-rules yields a modular architecture for the query optimizer. The optimization rule base is divided into modules; each module has its own strategy associated as meta-rules. Thus, each optimization module is clearly parametrized at two levels, by rules and meta-rules.

Further research is necessary to understand better what can and cannot be done. Clearly, it appears that various rule modules embedding powerful search strategies can be added to the optimization knowledge base. Such modules may lead to long processing times if the limits on the number of rule applications are too high or the strategy too exhaustive. If one stops too early (low limit), the optimization can be very poor. Thus, a tradeoff has to be found. Thanks to the organization of the rule library in modules, each module having a different search strategy, various strategies can be combined and experimented with. The limit given to a module, in units of time, could also be allocated dynamically, according to the complexity and type of the query. Simple queries (e.g. a search on a key) do not need sophisticated optimization: a small limit can then be given to all modules. Complex queries need optimization: a high limit can then be given to the modules. Various local strategies could even be selected according to the type of the query. In summary, the specification of a query optimizer in modules of rules, each module having its own search strategy, is a promising approach to generate highly adaptable and extensible optimizers. This is feasible with a high-level term rewriting rule language. This general approach has proved efficient in implementing the EDS optimizer. Further work remains to be done to integrate a complete optimizer (e.g. including semantic optimization) following the lines presented in this paper.

Acknowledgments

The authors are indebted to Rosana Lanzelotte for pointing out the problem of extensible search strategies and for helpful discussions. The authors wish to thank both Rosana Lanzelotte and Patrick Valduriez for helpful comments on a first draft of this paper.

Annex: Grammar of the rule language

<Rule> ::= <Rule Name> : IF <Term> [under <Constraint>] [delta_cost = <Functional Expression>]
           THEN [execute <Method List>] [rewrite] <Term>
           [ELSE [execute <Method List>] [rewrite] <Term>]

<Rule Name> ::= <identifier>

<Term> ::= [<label> :] <Functional Expression>
         | [<label> :] <Complex Term>

<Functional Expression> ::= <Function Variable> ( <Term> [, <Term> ...] )
                          | <Function Name> ( <Term> [, <Term> ...] )

<Complex Term> ::= <Constant>
                 | <Path Expression>
                 | <Term Variable>
                 | <Collection Variable>
                 | { <Complex Term List> }
                 | [ <Complex Term List> ]
                 | ( <Named Complex Term List> )

<Constant> ::= <Real> | <Integer> | <String> | <Boolean>

<Path Expression> ::= <Path Term> [. <Path Term> ...]

<Path Term> ::= <identifier> | <Term Variable> | <Collection Variable>

<Term Variable> ::= u | v | w | x | y | z ...

<Function Variable> ::= F | G | H | I | J | K ...

<Collection Variable> ::= u* | v* | w* | x* | y* | z* ...

<C_Term> ::= <Collection Variable> | <Term>

<Complex Term List> ::= <C_Term> [, <C_Term> ...]

<Named Complex Term List> ::= <identifier> : <Term> [, <identifier> : <Term> ...]

<Constraint> ::= <Formula>

<Formula> ::= NOT ( <Formula> )
            | ( <Boolean Expression> AND <Formula> )
            | ( <Boolean Expression> OR <Formula> )
            | <Boolean Expression>

<Boolean Expression> ::= <Functional Expression> | <Predicate> | true | false

<Predicate> ::= <Num_exp> <Comparator> <Num_exp>

<Comparator> ::= = | < | > | <= | >= | ...

<Num_exp> ::= <Constant> | <Functional Expression>

<Method List> ::= <Method> [, <Method> ...]

<Method> ::= [<label> :] <Term Variable> = <Functional Expression>
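For illustration, here is one hypothetical rule that exercises most of the optional clauses of this grammar; the rule name, the JOIN pattern, the constraint predicates and the methods are invented for the example and do not come from the paper's rule library:

Join_Commutativity:
  IF p : JOIN(x, y)
     under (Is_Relation(x) AND Is_Relation(y))
     delta_cost = Delta_Join_Swap(x, y)
  THEN execute s = Swap_Statistics(x, y)
       rewrite p : JOIN(y, x) ;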

References

[1] A.V. Aho, R. Sethi and J.D. Ullman, Compilers: Principles, Techniques and Tools (Addison-Wesley, Reading, MA, 1986).
[2] H. Ait-Kaci, Solving type equations by graph rewriting, Lecture Notes in Comput. Sci. 202 (1985).
[3] F. Bancilhon, D. Maier, Y. Sagiv and J.D. Ullman, Magic Sets and other strange ways to implement logic programs, ACM PODS Symp., Cambridge, MA (1986).
[4] C. Beeri and R. Ramakrishnan, On the power of Magic, ACM PODS Symp. (1987).
[5] B. Bergsten, M. Couprie and P. Valduriez, Prototyping DBS3, a shared-memory parallel DBS, Int. Conf. on PDIS, Miami (1991).
[6] M.J. Carey, D. DeWitt et al., The architecture of the EXODUS extensible DBMS: A preliminary report, IEEE Int. Workshop on Object-Oriented Database Systems, Asilomar, CA (1986).
[7] S. Ceri, G. Gottlob and L. Lavazza, Translation and optimization of logic queries: the algebraic approach, Int. Conf. on VLDB, Kyoto (1986).
[8] U.S. Chakravarthy, J. Grant and J. Minker, Foundations of semantic query optimization, Int. Workshop on Foundations of Deductive Database in Logic Programming (7) (1986) 243-273.
[9] N. Dershowitz and J.P. Jouannaud, Rewriting Systems, in: J. van Leeuwen, ed., Handbook of Theoretical Computer Science (North-Holland, Amsterdam, 1987).


[10] B. Finance, Une plate-forme pour la génération d'optimiseurs extensibles, Ph.D. Thesis, University of Paris 6, 1992.
[11] J. Freytag and N. Goodman, Rule-based translation of relational queries into iterative programs, ACM SIGMOD Int. Conf., Washington, DC (1986).
[12] J.C. Freytag, A rule-based view of query optimization, ACM SIGMOD Int. Conf., San Francisco (1987).
[13] J.C. Freytag, The basic principles of query optimization in relational DBMS, Informat. Process. Lett. (IFIP, 1989) 801-807.
[14] G. Gardarin and P. Valduriez, Relational Databases and Knowledge Bases (Addison-Wesley, Reading, MA, 1989).
[15] G. Gardarin and P. Valduriez, ESQL2: An object-oriented SQL with F-Logic semantics, Int. Conf. on Data Engineering, Phoenix (1992).
[16] G. Gardarin and R. Lanzelotte, Optimizing object-oriented database queries using cost-controlled rewriting, Int. Conf. on Extended Data Base Technology (EDBT), Vienna (1992).
[17] G. Graefe and D. DeWitt, The EXODUS optimizer generator, Ph.D. Thesis, University of Wisconsin, August 1987; also ACM SIGMOD Int. Conf., San Francisco (1987).
[18] L.M. Haas, J.C. Freytag, G.M. Lohman and H. Pirahesh, Extensible query processing in Starburst, IBM Research Report RJ 6610 (63921).
[19] L.M. Haas, W. Chang, G.M. Lohman, J. McPherson, P.F. Wilms, G. Lapis, B. Lindsay, H. Pirahesh, M.J. Carey and E. Shekita, STARBURST mid-flight: As the dust clears, IEEE Trans. Knowledge and Data Eng. 2(1) (1990).
[20] W. Hasan and H. Pirahesh, Query rewrite optimization in Starburst, IBM Research Report RJ 6367 (62349).
[21] Y. Ioannidis and E. Wong, Query optimization by simulated annealing, ACM SIGMOD Int. Conf., San Francisco, CA (1987).
[22] Y. Ioannidis and Y. Cha Kang, Randomized algorithms for optimizing large join queries, ACM SIGMOD Int. Conf., Atlantic City, NJ (1990).
[23] Y. Ioannidis and Y. Cha Kang, Left-deep vs. bushy trees: An analysis of strategy spaces and its implication for query optimization, ACM SIGMOD Int. Conf., Denver (1991) 168-177.
[24] ISO/IEC JTC1/SC21 N5739, Database Language SQL, ISO/DIS 9075:199x (E), Draft International Standard, April 1991.
[25] M. Jarke and J. Koch, Query optimization in database systems, ACM Comput. Surv. (16) (1984) 111-152.
[26] G. Kiernan, C. de Maindreville and E. Simon, Making deductive databases a practical technology: a step forward, ACM SIGMOD Int. Conf., Atlantic City, NJ (1990).
[27] R. Lanzelotte and P. Valduriez, Extending the search strategy in a query optimizer, Int. Conf. on VLDB, Barcelona, Spain (1991) 363-373.
[28] R. Lanzelotte, P. Valduriez and M. Zait, Optimization of object-oriented recursive queries using cost-controlled strategies, ACM SIGMOD Int. Conf. (1992) 256-265.
[29] K.M. Lee, J.C. Freytag and G.M. Lohman, Implementing an interpreter for functional rules in a query optimizer, Int. Conf. on VLDB, Los Angeles (1988).
[30] J. Rohmer, R. Lescoeur and J.M. Kerisit, The Alexander method, a technique for the processing of recursive axioms in deductive databases, New Generation Comput. (4) (Springer-Verlag, 1986) 273-285.
[31] G. Saake, V. Linnemann, P. Pistor and L. Wegner, Sorting, grouping and duplicate elimination in the advanced information management prototype, Int. Conf. on VLDB, Amsterdam (1989) 307-316.
[32] E. Sciore and J. Sieg, A modular optimizer generator, IEEE Trans. Knowledge and Data Eng. 2(1) (1990).
[33] P.G. Selinger, M.M. Astrahan, D.D. Chamberlin, R.A. Lorie and T.G. Price, Access path selection in a relational database management system, ACM SIGMOD Int. Conf., Boston (1979).
[34] S. Shenoy and Z. Ozsoyoglu, A system for semantic query optimization, ACM SIGMOD Int. Conf., San Francisco (1987).
[35] M. Stonebraker, E. Wong, P. Kreps and G. Held, The design and implementation of INGRES, ACM Trans. Database Syst. 1(3) (1976).
[36] A. Swami and A. Gupta, Optimization of large join queries, ACM SIGMOD Int. Conf., Chicago (1988).
[37] A. Swami, Optimization of large join queries: Combining heuristics and combinatorial techniques, ACM SIGMOD Int. Conf., Portland, OR (1989).


[38] J.D. Ullman and A.V. Aho, Universality of data retrieval languages, ACM POPL Int. Conf. (1979).
[39] J.D. Ullman, Principles of Database and Knowledge-Base Systems (Computer Science Press, Rockville, MD, 1988).
[40] P. Valduriez, Optimization of complex queries using join indices, Database Eng. IEEE 9(4), Special Issue on query optimization (G. Lohman, ed., 1987).
[41] E. Wong and W. Hong, Separable semantic optimization, Internal Report, EECS Department, U.C. Berkeley, CA, 1989.

Georges Gardarin was born in Riom, France, in 1947. He received the B.S. degree from the Ecole Normale Supérieure de l'Enseignement Technique and the M.S. and French Ph.D. degrees in computer science from the University of Paris VI, in 1970 and 1978, respectively. In 1979, he was a visiting researcher at UCLA. From 1980 to 1989, he was a full professor at the University of Paris VI and scientific adviser at the Institut National de Recherche en Informatique et en Automatique (INRIA), where he led the SABRE research project, which was developing a parallel system. He is now a professor at the University of Versailles and director of the PRiSM research laboratory. He has worked in several ESPRIT projects, among them the EDS parallel database project and now the IRO-DB interoperable DBMS project. Georges Gardarin's main interests are databases, parallel and distributed systems. Prof. Gardarin has published more than 60 papers in international conferences and journals. He has served as chairman or member of several international conference program committees, among them ACM SIGMOD, VLDB and IEEE Data Engineering. He is a member of the VLDB endowment. He is also the author of several French books (Ed. Eyrolles) and two English books (Ed. Addison-Wesley).


Béatrice Finance was born in Châlons, France, in 1965. She is an assistant professor in computer science at the University of Versailles. She received her B.S. and M.S. in computer science from the University of Paris VI, France, in 1987 and 1988, respectively. In 1992, she received her French Ph.D. in computer science from the University of Paris VI. Her research interests include object-oriented databases, deductive databases and their query optimization. She worked in several ESPRIT projects, among them the EDS parallel database project, the Pythagoras project, whose aim is the performance quality assessment of advanced database systems, and now the IRO-DB interoperable DBMS project. She has acted as a member of national and international conference program committees, including the French national conference 'Bases de Données Avancées' and the 9th Symposium on Applied Computing (SAC'94), Special Track on AI/DB.