Inform. Systems Vol. 4, pp. 293-305. Pergamon Press Ltd., 1979. Printed in Great Britain
LEGOL 2.0: A RELATIONAL SPECIFICATION LANGUAGE FOR COMPLEX RULES†
SUSAN JONES, PETER MASON and RONALD STAMPER
London School of Economics and Political Science, Houghton Street, London WC2A 2AE, England
(Received 11 December 1978; revised 25 April 1979)
Abstract: LEGOL is a language for writing rules such as those which might appear in legislation or system specifications, in such a way that they can be interpreted automatically and tested to discover whether they will have the desired effect. In particular it is intended for database applications where the correct handling of time is an important issue. The following paper describes a subset of the syntax of this language and makes some comparisons with other relational formalisms.
INTRODUCTION
One of the prime requisites of a high level system specification language is that it provide a means of stating what actions must be performed under what circumstances, without going immediately into detailed questions of implementation methods. The designers of LEGOL, which is intended to serve such a purpose, have used legislation as their model for the following reasons: (a) legislation states general principles to be applied in a precise, formal and yet general way; (b) it provides the basis for many actual financial and administrative systems, where the provisions of the law must be translated into information processing tasks to be performed by persons or machines. A formal language which can handle legislation is likely to be adequate for expressing complex rules in other application areas where information systems must be defined. This paper will show LEGOL at work in both legal and non-legal areas, in an attempt to show that their requirements are comparable.

One very necessary feature of a legally-oriented language is the ability to deal consistently and automatically with time. The correct application of a law may require access to facts about events taking place over a long time span; past history can never be definitively deemed irrelevant. This perspective differs from that adopted in many conventional data-processing applications, where decisions to destroy quite recent information are commonplace, and where items such as dates are recorded and manipulated in the same way as any other pieces of data. LEGOL is based on the philosophy that information about time provides a framework which defines the validity of all other information stored in a data base, and that it must be handled in a special way. Only then will a high level system specification have the necessary degree of generality, and decisions to “forget” past information be seen as what they are: part of the lower level process of system design and implementation.
LEGOL is intended to allow its users to refer to entities and processes which have some observed existence outside a computer system. Because it aims also to be interpretable by computer, it must do so by providing: (a) representations of entities in terms of recorded items of information “about” them, e.g. names, measurements, classifications, etc.; (b) manipulations of this information whose results reflect as directly as possible the results of the corresponding external processes. The use of LEGOL involves both aspects: specification of the entity representations to be set up, and of the rules to manipulate them. In more familiar terms, we are talking about “data definition” and “program specification”, but with the proviso that the language has many built-in constraints to ensure a sensible correspondence between data structures and what they represent, and between symbol manipulations and the processes which they simulate. These constraints have a basis in an “epistemological semantic model” discussed in Stamper [1] and are not elaborated here. The present paper is devoted to describing the syntactic aspects of the language, illustrating some of the representations and operations available and the consequences of using them. Examples will be taken from both legal and data processing sources, mainly the Family Allowances Act of 1965, stock control systems and personnel applications.

†This research was funded by the British Science Research Council and the British Social Science Research Council.

1. THE REPRESENTATION OF ENTITIES
In general terms, an entity is something to which we want to refer within the context of a particular problem application. Examples: (a) To express the provisions of the Family Allowances Act we need to refer to families, parents, children, sums of money, periods of time, the school leaving age, etc. (b) In order to specify a stock control system we need to refer to commodities, store locations, quantities of commodities at locations, issues and receipts. Each one of these can be considered as an entity. LEGOL entity representations are based on the relational model. Using this system, all recorded information is stored in the form of tables, with one or more rows and columns. Each individual table represents an entity set, that is, a collection of entities of the same type. Each column stores one item of information in the entity representation. Each row or “tuple” represents one instance of the entity.
Example: the following table represents four instances of an entity set of persons, and records three items of information or “attributes” about each.
Table 1(a).
Person     Date of Birth   Date of Death
Arthur     1870            1956
Freda      1890            1956
Florence   1895            1965
William    1897            1973
LEGOL has some conventions for describing and using attributes which are not found in normal relational theory. For instance, the first or left-most attribute (sometimes called the “characteristic”) has a special status, since language references to the whole table are in many cases treated by default as particular references to this attribute. Typical contents of the characteristic attribute position are names, e.g. the person’s name in Table 1(a) above, and values, e.g. the “quantity in stock” of Table 1(b) below. Examples of typical uses will be seen in later sections.

Table 1(b).
Quantity   Stock   Item        Location        Start time   End time
100        2       4" elbows   Wolverhampton   1/1/78       3/1/78
35         2       4" elbows   Wolverhampton   3/1/78       unknown
90         1       t-pieces    Coventry        1/1/78       10/1/78
50         1       t-pieces    Coventry        10/1/78      15/1/78
40         1       t-pieces    Coventry        15/1/78      unknown
Start times are inclusive; end times exclusive.

The other possible attribute types are “identifiers” and “time attributes”. Identifiers are items of information essential to the process of distinguishing one entity from another; together with the characteristic they may be used for relational joins. Table 1(a) has no identifiers, but the columns headed STOCK, ITEM and LOCATION are the identifiers of the “quantity in stock” of Table 1(b).

Time attributes define the beginning and end of the existence of any entity. LEGOL entity representations always have a time dimension; we must record not only the facts but the period at or during which these facts are or were true. In certain cases, of course, the contents of a time attribute may be unknown (e.g. the date of death of a still-living person) or irrelevant to the problem in hand. But every table must contain columns in which time attributes may be recorded. Examples: (a) the date of birth and date of death of a person (see Table 1(a)); (b) the first date on which the quantity in stock of a commodity had a particular value, and the last date on which it had that value before being changed (see Table 1(b)). In this case, the characteristic value (quantity in stock) is a time-dependent property of a particular combination of identifier attributes.

To summarise, every entity representation is made up of, in this order: one characteristic attribute; zero or more identifier attributes; two time attributes.
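As a minimal sketch of this attribute structure, assuming a Python encoding that is not part of LEGOL itself, each tuple below carries its characteristic, its identifiers and its two time attributes, and the state of stock on any day is derived from the recorded history rather than by overwriting it. The names Entry, holds_on and state_on are hypothetical, and treating an unknown end time as still open is an assumption of the sketch.

```python
from datetime import date
from typing import List, NamedTuple, Optional, Tuple

class Entry(NamedTuple):
    """One tuple of an entity representation (illustrative encoding)."""
    characteristic: object            # left-most attribute, e.g. quantity in stock
    identifiers: Tuple[object, ...]   # zero or more identifier attributes
    start: Optional[date]             # start time, inclusive (None = unknown)
    end: Optional[date]               # end time, exclusive (None = unknown)

# Table 1(b): QUANTITY(STOCK, ITEM, LOCATION); dates are day/month/year.
QUANTITY = [
    Entry(100, (2, '4" elbows', "Wolverhampton"), date(1978, 1, 1), date(1978, 1, 3)),
    Entry(35, (2, '4" elbows', "Wolverhampton"), date(1978, 1, 3), None),
    Entry(90, (1, "t-pieces", "Coventry"), date(1978, 1, 1), date(1978, 1, 10)),
    Entry(50, (1, "t-pieces", "Coventry"), date(1978, 1, 10), date(1978, 1, 15)),
    Entry(40, (1, "t-pieces", "Coventry"), date(1978, 1, 15), None),
]

def holds_on(entry: Entry, day: date) -> bool:
    """True if the recorded fact was true on `day`; an unknown end is treated as open."""
    started = entry.start is not None and entry.start <= day
    return started and (entry.end is None or day < entry.end)

def state_on(table: List[Entry], day: date) -> List[Entry]:
    """The entries valid on a given day; nothing in the history is discarded."""
    return [e for e in table if holds_on(e, day)]

for e in state_on(QUANTITY, date(1978, 1, 5)):
    print(e.characteristic, *e.identifiers)
```

Because past entries stay in the table, any earlier state can be recovered by calling state_on with an earlier date.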
The entities which can be represented in this way are, inevitably, a subset of the possible ones which a language for formalising legislation or complex organisational rules may require. Nevertheless they allow a useful class of problems to be solved [1]. Very briefly, they are characterised as being:
(a) Static rather than dynamic. The language provides facilities for representing and manipulating information about particular states of affairs, e.g. that such and such an entity exists, that it has particular properties or is in a particular relationship to another entity. It allows the user to define when one state can be derived from another. But it cannot, in the present version, refer directly to external events or processes which in reality bring about changes of state, or to relationships, such as causality, between such events or processes.
(b) “Substantive” rather than “procedural”. (i) Substantive entities are those which have an existence independent of any formal information system. The ones just illustrated, i.e. persons and quantities of goods in a particular store, are of this kind. (ii) Procedural entities are those which are brought into existence only to serve the needs of a formal information system. The most obvious examples are the documents used to convey messages through a system: requisitions, job sheets, claim forms, certificates, etc.
Procedural entities are, of course, important in system specification and in legislation, but obviously the conventions for representing and manipulating them may be of an entirely separate kind, and this paper does not discuss them.
2. REFERRING TO ENTITY REPRESENTATIONS
References to LEGOL entity representations are made by using the entity name, optionally followed by a list of one or more identifier elements, separated by commas and enclosed in brackets. Most frequently, the identifier elements are simple identifier labels. Examples: (a) To refer to an entity set of persons, with no identifiers, we simply use the entity label, e.g. PERSON. (b) To refer to a state or property of such a person, we use the entity label plus an identifier label, e.g. STATUS(PERSON). (c) To refer to a relationship between a person and an object we require two identifier labels, e.g. OWNERSHIP(PERSON, OBJECT). It will be noted that we have made no reference to the time attributes. There is no need to do so when we are talking about the entity as a whole; the reference automatically includes them. When we wish to single out a time attribute explicitly we have a way of doing so, described later in this section.

Entity labels are very straightforward: they simply name a table of data. Identifier elements are more complex, since there are a number of possibilities as to the form they may take. The most frequent form is that of an alphabetic string called an “identifier label”. For example, Table 1(b) above could be referred to as QUANTITY(STOCK, ITEM, LOCATION), where the labels STOCK, ITEM and LOCATION each stand for one identifier attribute.

Although relational operators act upon whole tables, there are many operations to which only one attribute is relevant. (It does not make sense, for instance, to talk about arithmetic addition between two tables, only between two suitable components of those tables.) LEGOL therefore needs a way of “pointing” at individual components or attributes. By default, if an operation requires a single attribute, the characteristic is used. So the expression:

QUANTITY(STOCK, ITEM, LOCATION) + 16

will automatically add 16 to the value stored as the quantity. If we wish to single out an identifier attribute we must make a special reference to the identifier label, e.g. the expression:

LOCATION of QUANTITY(STOCK, ITEM, LOCATION)

picks out the third identifier attribute for whatever processing is specified. For referring to time attributes, two reserved words, “start” and “end”, are used. Either of these may be placed in front of the entity label in order to point to a time attribute, e.g. the expression:

start of PERSON

singles out the start time attribute of that table. It will be seen that these conventions are somewhat different from those commonly adopted in a relational language such as Alpha [3], where all individual components are referred to by attribute name qualified by table name, e.g. SUPPLIER.CITY. The principal points of difference are:
(a) The identity of the table name and first attribute name in LEGOL permits the default reference as a kind of pun. Within the context of a strictly controlled attribute structure this allows many operations to be specified in a very succinct way, as examples in the following sections will show.
(b) LEGOL identifier labels are bound variables. In the reference given above, meaningful labels such as STOCK, ITEM and LOCATION were used, but in practice exactly the same effect would have been produced by the expression L of QUANTITY(S, I, L) in selecting the third identifier element of the table. It is the order of items in the list which is significant. This fact has two consequences: it is possible to write very abbreviated code if required and, more important, to exploit the possibilities of identifier label matching in order to specify the relational joins, projections and selections which underlie the LEGOL operators. This point will be developed in Section 4.

3. THE LEGOL “RULE”

We specify an application in LEGOL (whether a piece of legislation or a system definition) by writing a series of LEGOL “rules” to be applied to entity representations. The rule is the basic LEGOL syntactic unit, which specifies one or more operations to be performed, and how the result is to be represented. A rather simplified syntax† of a rule would read as follows:
rule ::= target update source
target ::= entity reference
update ::= "←" | "⇐"
source ::= expression {operator expression}
expression ::= entity reference | "[" expression {operator expression} "]"

†The form of BNF used here is due to Wirth [7]. A complete syntax definition is given in the appendix.

In words, a rule consists of a “target” expression (a reference to an entity representation to be updated) followed by an update symbol, then by a “source” expression (containing references to entity representations and the operations to be performed upon them).
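The referencing conventions of Section 2 and the rule layout above can be sketched as a small parser. This is an illustration only: it assumes rules written as plain text with the characters ← and ⇐ as update symbols, accepts only a single entity reference (optionally preceded by “X of”) as a target, and the names EntityRef, parse_ref, attribute_position and parse_rule are hypothetical.

```python
import re
from typing import NamedTuple, Optional, Tuple, Union

class EntityRef(NamedTuple):
    pointer: Optional[str]        # None, "start", "end", or an identifier label
    entity: str                   # entity label, e.g. "QUANTITY"
    identifiers: Tuple[str, ...]  # identifier elements, in order

_REF = re.compile(
    r"^\s*(?:(?P<pointer>[\w-]+)\s+of\s+)?"   # optional "X of" pointer
    r"(?P<entity>[\w-]+)\s*"                  # entity label
    r"(?:\((?P<ids>[^)]*)\))?\s*$",           # optional bracketed identifier list
    re.IGNORECASE,
)

def parse_ref(text: str) -> EntityRef:
    m = _REF.match(text)
    if m is None:
        raise ValueError(f"not an entity reference: {text!r}")
    ids = m.group("ids")
    identifiers = tuple(x.strip() for x in ids.split(",")) if ids else ()
    return EntityRef(m.group("pointer"), m.group("entity"), identifiers)

def attribute_position(ref: EntityRef) -> Union[int, str]:
    """Column an operation acts on; 0 is the characteristic, the default."""
    if ref.pointer is None:
        return 0
    if ref.pointer.lower() in ("start", "end"):
        return ref.pointer.lower()            # one of the two time attributes
    # Identifier labels are bound variables: only their position in the list matters.
    return 1 + ref.identifiers.index(ref.pointer)

def parse_rule(text: str):
    """Split a rule into its target reference, update symbol and source text."""
    for symbol in ("⇐", "←"):
        if symbol in text:
            target, source = text.split(symbol, 1)
            return parse_ref(target), symbol, source.strip()
    raise ValueError("rule has no update symbol")

ref = parse_ref("LOCATION of QUANTITY(STOCK, ITEM, LOCATION)")
print(ref.entity, attribute_position(ref))  # QUANTITY 3
```

Since only positions are used, "L of QUANTITY(S, I, L)" resolves to the same column as the version with meaningful labels.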
Examples: below are some typical LEGOL rules.

    Target expression            Update symbol   Source expression
a)  QUANTITY(S,I,L)              ←               QUANTITY(S,I,L) + RECEIPTS(S,I,L)
b)  start of CHILDHOOD(PERSON)   ←               start of PERSON
c)  ALLOWANCE(FAMILY)            ⇐               RATE(NCHILDREN) while NCHILDREN(FAMILY)
They are intended to be automatically interpretable and applicable to entity representations stored on a data base. The interpretation process involves: (a) evaluating the source expression, taking into account matching identifier labels and relevant time attributes; (b) restructuring the resultant table to be compatible with the target entity representation, again taking into account matching identifier labels; (c) merging the resultant tuples with the existing target, and storing the updated entity representation. It is important to point out at this stage that the individual rules illustrated here would normally be seen as part of a longer sequence. In general each rule may use output from, and supply input to, other rules. The object of applying such a sequence would be to derive, from tables recording observed facts, some results representing the legal or actual consequences of those facts, and this derivation could be quite long and complex. There is a contrast here with the conventional use of relational formalisms in query languages, where “completeness” is defined informally in terms of the ability to answer any data base query with a single self-contained statement [3]. LEGOL can function successfully as a query language, but design decisions about its mode of operation are made on the basis of its likely use for simulating the effect of a piece of legislation. This paper, however, does not show a continuous example, since its purpose is to illustrate the language syntax in some detail, but the reader is referred to [4, 8], which do.
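Steps (b) and (c) can be illustrated with a deliberately simplified sketch. It assumes the source has already been evaluated into flat tuples (non-time attributes with the characteristic first, then start and end), that the target names its identifiers by label, and that merging is a plain duplicate-free union; the full update operation described later in the paper does more than this. The function names are hypothetical, and the sample input reuses two entries of the RATE(NCHILDREN) while NCHILDREN(FAMILY) result worked out in Section 5.

```python
from datetime import date
from typing import List, Sequence

def restructure(labels: Sequence[str], rows: List[tuple],
                target_ids: Sequence[str], double_arrow: bool) -> List[tuple]:
    """Step (b): project evaluated source tuples onto the target's layout.

    labels       -- label of each non-time source attribute, characteristic first
    target_ids   -- identifier labels written in the target reference
    double_arrow -- True for the double arrow, which carries the characteristic over
    """
    positions = [list(labels).index(label) for label in target_ids]  # match by label
    out = []
    for row in rows:
        *attrs, start, end = row
        characteristic = attrs[0] if double_arrow else None  # None: placeholder only
        out.append((characteristic, *(attrs[p] for p in positions), start, end))
    return out

def merge(existing: List[tuple], new: List[tuple]) -> List[tuple]:
    """Step (c), simplified: append new tuples not already stored."""
    merged = list(existing)
    for row in new:
        if row not in merged:
            merged.append(row)
    return merged

# Two evaluated source tuples whose attributes are labelled RATE, NCHILDREN, FAMILY.
source = [
    ("18/-", 3, "Jones", date(1965, 1, 1), date(1967, 6, 1)),
    ("8/-", 2, "Jones", date(1967, 6, 1), date(1970, 1, 1)),
]
# Target ALLOWANCE(FAMILY) with the double arrow: keep RATE as characteristic, FAMILY as identifier.
allowance = merge([], restructure(["RATE", "NCHILDREN", "FAMILY"], source, ["FAMILY"], True))
print(allowance)
```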
4. MATCHING IDENTIFIER LABELS

The use of matching identifier labels in entity references provides a powerful and concise means of specifying which attribute values from different tables are to be compared and transferred during LEGOL operations. The principles are defined and illustrated below. If an identifier label of one entity reference in a source expression matches any label of another entity reference in the same source expression, the match will cause an equijoin to be performed on the corresponding columns of the two entity representations referred to. Thus, for example, the expression
QUANTITY(S,I,L) + RECEIPTS(S,I,L)
clearly demands a match on all three identifier attributes of the referenced tables before the addition of characteristic attribute values can take place. Not all identifiers need be matched however; in the expression
QUANTITY(S,I,L) * COST(I)
obviously only one is relevant. These expressions appear to have a considerable advantage over the equivalent Alpha formalism, for instance, where attribute equality must be specified explicitly, e.g.

GET W (QUANTITY-TABLE.QUANTITY, COST-TABLE.COST): QUANTITY-TABLE.ITEM = COST-TABLE.ITEM

In the third of the example rules given in Section 3 it will be noticed that the identifier label of one entity reference matches the entity label of another, i.e. RATE(NCHILDREN) while NCHILDREN(FAMILY). Exactly the same principle is observed here: the identifier attribute of the RATE table is matched with the characteristic of the NCHILDREN table for the purpose of performing the relational join. The only important difference is that the label NCHILDREN is invariant: it actually names a relational table stored in the data base. This particular expression will be used again in the following section to illustrate the effect of the LEGOL operator “while”.

A match between an entity or identifier label in a source expression and an entity or identifier label in a target expression will cause the referenced component of the evaluated source to be transferred to the corresponding component of the target. The transfer from source to target of attributes pointed to by matching identifiers is one of a number of processes forming part of the “update” operation. Update is described in Section 9; here we will simply illustrate the transfer of identifier attributes, using the current examples:
(a) QUANTITY(S,I,L) ← QUANTITY(S,I,L) + RECEIPTS(S,I,L)
As one might expect, all three identifiers, STOCK, ITEM and LOCATION, are transferred to become attributes of the newly created QUANTITY.
(b) start of CHILDHOOD(PERSON) ← start of PERSON
The characteristic attribute of the PERSON table (the person’s name) is transferred to become the identifier attribute of a new table denoting a particular time-dependent state of that person.
(c) ALLOWANCE(FAMILY) ⇐ RATE(NCHILDREN) while NCHILDREN(FAMILY)
The attribute labelled FAMILY (identifier of table NCHILDREN) becomes the identifier of the new ALLOWANCE table. Once again there is a favourable comparison to be made with Alpha, where qualified names would need to be mentioned in the target specification, e.g.

GET W (RATE-TABLE.RATE, NCHILDREN-TABLE.FAMILY): RATE-TABLE.NCHILDREN = NCHILDREN-TABLE.NCHILDREN

We can also mention at this point that the identifier label position in an entity reference may be occupied by: (a) a literal, e.g. CHILDHOOD(“JACK”), which acts as a selector of the relevant entry or entries in the table; (b) a “null” symbol, e.g. QUANTITY( - ,ITEM,LOCATION), to be used when the value of a particular attribute is not significant in the current rule. The following sections illustrate some particular LEGOL operations, showing how far they go beyond those of the normal relational algebra or calculus.

5. TIME-RELATED OPERATIONS

In Section 1 of this paper we saw that every entity representation contained two time attributes, delimiting the period of existence of each member of the entity set. Most LEGOL operations take these attributes into account in some way; those which are the subject of this section are specifically concerned with time. A general point to note about them is that, conceptually, they deal with all represented periods of time simultaneously, just as they act simultaneously on all members of an entity set. They are, in effect, set operations with additional temporal connotations. By default, LEGOL makes no distinction between “historical” and “current” states, although in special circumstances it allows the user to select instances of current or past states. The time-related operators will be listed here and one of them will be examined in detail. They are: (a) time intersect (“while”); (b) one-sided time intersect (“since” and “until”); (c) time difference (“while not”); (d) time union (“or while”); (e) time set membership (“during”).

Time intersect is the most important operation, not only in its own right, but because it underlies other operations as well. It identifies states which co-occur in time and creates new table entries, with appropriate attributes, to represent them. Example: this expression has already been used to illustrate the function of matching identifier labels. We now introduce the time dimension, which, for the sake of simplicity, was previously ignored. The expression:

RATE(NCHILDREN) while NCHILDREN(FAMILY)

is used to associate a standard rate of allowance with individual families, according to number of children. Both the rate of the allowance and the number of children in a family are liable to fluctuate over time. We might have the following representations:
Table 2a.
RATE   NCHILDREN   start    end
8/-    2           1/1/65   1/1/70
18/-   3           1/1/65   1/1/70
25/-   2           1/1/70   -
40/-   3           1/1/70   -

NCHILDREN   FAMILY   start     end
3           Jones    21/3/59   1/6/67
2           Jones    1/6/67    3/9/72
2           Smith    2/4/66    8/10/73
3           Smith    8/10/73   -
(Where an end time of “-” is assumed to be an unknown time in the future, and therefore later than any recorded time.) The time intersect allows us to say which families had which allowances at which periods. For example, Jones had 3 children in 1959. The Act came into force in 1965, so the first period relevant to both entities was 1965-67, when Jones was eligible for eighteen shillings a week:

Table 2b.
RATE   NCHILDREN   FAMILY   start    end
18/-   3           Jones    1/1/65   1/6/67

In 1967 the eldest child left school and the entitlement was reduced to eight shillings a week:

8/-    2           Jones    1/6/67   1/1/70

But in 1970 the allowance went up, and until his other two children ceased to be eligible he was entitled to twenty-five shillings a week:

25/-   2           Jones    1/1/70   3/9/72

The result for Smith shows a similar pattern:

RATE   NCHILDREN   FAMILY   start     end
8/-    2           Smith    2/4/66    1/1/70
25/-   2           Smith    1/1/70    8/10/73
40/-   3           Smith    8/10/73   -
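A minimal sketch of “while” may make the mechanics concrete. It assumes entries of the form (values, start, end) with the characteristic first, uses None for an open end (“-”), and joins on every label the two references share, so the identifier NCHILDREN of RATE meets the characteristic of NCHILDREN. Rather than decomposing the overlap into its four cases, it computes each common period directly as the later start and the earlier end. The names Row, overlap and while_ are hypothetical. Applied to Table 2a it yields the six entries derived above for Jones and Smith.

```python
from datetime import date
from typing import List, NamedTuple, Optional, Sequence

class Row(NamedTuple):
    values: tuple           # characteristic first, then identifiers
    start: date             # inclusive
    end: Optional[date]     # exclusive; None = "-" (unknown future time)

def overlap(a: Row, b: Row):
    """Common period of two entries, or None if they never co-exist."""
    start = max(a.start, b.start)
    ends = [e for e in (a.end, b.end) if e is not None]
    end = min(ends) if ends else None
    return None if end is not None and start >= end else (start, end)

def while_(left: List[Row], llabels: Sequence[str],
           right: List[Row], rlabels: Sequence[str]):
    """Time intersect plus an equijoin on every label the references share."""
    shared = [x for x in llabels if x in rlabels]
    extra = [i for i, x in enumerate(rlabels) if x not in llabels]
    rows = []
    for a in left:
        for b in right:
            if all(a.values[llabels.index(x)] == b.values[rlabels.index(x)] for x in shared):
                period = overlap(a, b)
                if period is not None:
                    rows.append(Row(a.values + tuple(b.values[i] for i in extra), *period))
    return rows, list(llabels) + [rlabels[i] for i in extra]

RATE = [  # Table 2a: RATE(NCHILDREN)
    Row(("8/-", 2), date(1965, 1, 1), date(1970, 1, 1)),
    Row(("18/-", 3), date(1965, 1, 1), date(1970, 1, 1)),
    Row(("25/-", 2), date(1970, 1, 1), None),
    Row(("40/-", 3), date(1970, 1, 1), None),
]
NCHILDREN = [  # Table 2a: NCHILDREN(FAMILY)
    Row((3, "Jones"), date(1959, 3, 21), date(1967, 6, 1)),
    Row((2, "Jones"), date(1967, 6, 1), date(1972, 9, 3)),
    Row((2, "Smith"), date(1966, 4, 2), date(1973, 10, 8)),
    Row((3, "Smith"), date(1973, 10, 8), None),
]

rows, labels = while_(RATE, ["RATE", "NCHILDREN"], NCHILDREN, ["NCHILDREN", "FAMILY"])
for r in rows:
    print(*r.values, r.start, r.end or "-")
```

The left operand's characteristic (RATE) stays first in each result entry, and FAMILY is appended as a further identifier.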
Obviously this is an extremely powerful operation, and the equivalent formulation in a relational language which does not give time attributes a special status would be complicated.† It certainly can be expressed using a series of relational algebra operations, and was implemented in this way on the Peterlee Relational Test Vehicle [2]. It has the effect, generally, of splitting up time into smaller slices, and one of its possible uses is to produce entity representations identified by arbitrarily delimited periods, such as those required for administering tax or national insurance. The general significance of the other time-related operators can probably be grasped by analogy, and by considering the natural language interpretation of the words used. In Section 7 we shall also see some one-place functions which act upon and produce time attributes. LEGOL also contains generalised forms of the normal set operators, which do not affect the time attributes but construct table entries according to the pattern of matching identifier labels.

6. OPERATIONS ON INDIVIDUAL ATTRIBUTES

Operations between the individual attributes of two entities always imply a time intersect between the periods during which those attributes exist. Only attributes which co-exist in time can be operands of the same LEGOL operator, and the result of applying that operator will always have times associated with it. The meaning of the arithmetic operators (+, -, *, /) is usually self-evident. They are of course applicable only to attributes with numeric values. We have already seen one expression containing an addition of two characteristic attribute values, i.e. QUANTITY(S,I,L) + RECEIPTS(S,I,L), and the full implications of the underlying time intersect in this example will be explained in Section 9. A second example illustrates addition on a time attribute, i.e. start of PERSON + SCHOOL-LEAVING-AGE, which will add the relevant figure to every date-of-birth attribute in the table of persons.
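The implied time intersect can be sketched for QUANTITY(S,I,L) * COST(I), where only the ITEM label is shared. The COST figures below are hypothetical values invented for the illustration (the paper gives none), an unknown end time is treated as open, and the names Row, overlap and arithmetic are assumptions. Because a change of cost can fall inside a stock period, that period is split in the result.

```python
import operator
from datetime import date
from typing import Callable, List, NamedTuple, Optional, Sequence

class Row(NamedTuple):
    values: tuple           # characteristic first, then identifiers
    start: date             # inclusive
    end: Optional[date]     # exclusive; None = open-ended

def overlap(a: Row, b: Row):
    """Common period of two entries, or None if they never co-exist."""
    start = max(a.start, b.start)
    ends = [e for e in (a.end, b.end) if e is not None]
    end = min(ends) if ends else None
    return None if end is not None and start >= end else (start, end)

def arithmetic(op: Callable, left: List[Row], llabels: Sequence[str],
               right: List[Row], rlabels: Sequence[str]) -> List[Row]:
    """Apply op to the characteristics of co-existing entries with matching identifiers."""
    shared = [x for x in llabels[1:] if x in rlabels[1:]]
    extra = [i for i in range(1, len(rlabels)) if rlabels[i] not in llabels]
    result = []
    for a in left:
        for b in right:
            if any(a.values[llabels.index(x)] != b.values[rlabels.index(x)] for x in shared):
                continue                      # identifiers do not match
            period = overlap(a, b)
            if period is None:
                continue                      # operands never co-exist in time
            value = op(a.values[0], b.values[0])
            ids = a.values[1:] + tuple(b.values[i] for i in extra)
            result.append(Row((value, *ids), *period))
    return result

QUANTITY = [  # Table 1(b)
    Row((100, 2, '4" elbows', "Wolverhampton"), date(1978, 1, 1), date(1978, 1, 3)),
    Row((35, 2, '4" elbows', "Wolverhampton"), date(1978, 1, 3), None),
    Row((90, 1, "t-pieces", "Coventry"), date(1978, 1, 1), date(1978, 1, 10)),
    Row((50, 1, "t-pieces", "Coventry"), date(1978, 1, 10), date(1978, 1, 15)),
    Row((40, 1, "t-pieces", "Coventry"), date(1978, 1, 15), None),
]
COST = [  # hypothetical unit costs, for illustration only
    Row((2, '4" elbows'), date(1978, 1, 1), None),
    Row((3, "t-pieces"), date(1978, 1, 1), date(1978, 1, 12)),
    Row((4, "t-pieces"), date(1978, 1, 12), None),
]

stock_value = arithmetic(operator.mul, QUANTITY, ["QUANTITY", "S", "I", "L"], COST, ["COST", "I"])
for r in stock_value:
    print(*r.values, r.start, r.end or "unknown")
```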
Arithmetic comparison operators (>, ≥, <, ≤) are applicable to attributes which are either numeric or taken from a domain which has an ordering defined upon it. They imply a time intersect and consequently a selection of time periods; they also involve a more explicit selection of those entries for which the result of the comparison is “true”. Using this operator, the famous data base conundrum about finding the employee who earns more than his manager can be answered easily in LEGOL; the expression:

SALARY(EMPLOYEE) while MANAGES(MANAGER,EMPLOYEE) > SALARY(MANAGER)

will produce the result and the time periods during which this situation was in force. Comparisons with literals may also be made, e.g. NCHILDREN(FAMILY) > 1, and, of course, time attributes can also be compared, e.g. end of PERSON > end of CHILDHOOD(PERSON) selects persons who survived their childhood. Equals and not-equals (=, ≠) work in the same way as the other comparison operators. However, they do not require their operands to be taken from a numeric or ordered domain; they may be used to test equality with a literal character string, e.g. GRADE(EMPLOYEE,ORGANISATION) = “SENIOR”, or to compare any two attributes generally. It will be evident that “equals” allows equijoins to be specified without label matching. This is especially necessary for comparing the characteristic attributes of two entity representations, which must by definition be differently labelled.

†There are four ways in which two periods, both delimited by a start and end time, can overlap. A sequence in the relational algebra to implement “while” might therefore be:
- four selections, each specifying the kind of overlap (e.g. start.RATE > start.NCHILDREN & end.RATE > end.NCHILDREN);
- four projections, each specifying the times delimiting the overlap period (e.g. start.RATE and end.NCHILDREN);
- a union of these projections to give all possible combinations of overlap.
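The manager example can be sketched in the same style, evaluating left to right: first the time intersect of SALARY(EMPLOYEE) with MANAGES(MANAGER,EMPLOYEE), then the comparison with SALARY(MANAGER), which joins on the MANAGER identifier and keeps only entries for which “>” holds during their common period. The salaries and the management relationship are hypothetical data, the characteristic of MANAGES is a placeholder string, and this sketch keeps only the left operand's attributes in the comparison result; all function names are assumptions.

```python
import operator
from datetime import date
from typing import Callable, Iterator, List, NamedTuple, Optional

class Row(NamedTuple):
    values: tuple           # characteristic first, then identifiers
    start: date             # inclusive
    end: Optional[date]     # exclusive; None = open-ended

def overlap(a: Row, b: Row):
    start = max(a.start, b.start)
    ends = [e for e in (a.end, b.end) if e is not None]
    end = min(ends) if ends else None
    return None if end is not None and start >= end else (start, end)

def join(left, llabels, right, rlabels, match) -> Iterator:
    """Pairs agreeing on every label in `match` that also co-exist in time."""
    for a in left:
        for b in right:
            if all(a.values[llabels.index(x)] == b.values[rlabels.index(x)] for x in match):
                period = overlap(a, b)
                if period is not None:
                    yield a, b, period

def while_(left, llabels, right, rlabels):
    shared = [x for x in llabels if x in rlabels]
    extra = [i for i, x in enumerate(rlabels) if x not in llabels]
    rows = [Row(a.values + tuple(b.values[i] for i in extra), *p)
            for a, b, p in join(left, llabels, right, rlabels, shared)]
    return rows, list(llabels) + [rlabels[i] for i in extra]

def compare(pred: Callable, left, llabels, right, rlabels) -> List[Row]:
    """Time intersect, then keep entries whose characteristics satisfy pred."""
    shared = [x for x in llabels[1:] if x in rlabels[1:]]  # identifiers only
    return [Row(a.values, *p) for a, b, p in join(left, llabels, right, rlabels, shared)
            if pred(a.values[0], b.values[0])]

SALARY = [  # hypothetical SALARY(EMPLOYEE)
    Row((5000, "Brown"), date(1970, 1, 1), None),
    Row((4000, "Green"), date(1970, 1, 1), date(1972, 1, 1)),
    Row((6000, "Green"), date(1972, 1, 1), None),
]
MANAGES = [  # hypothetical MANAGES(MANAGER, EMPLOYEE)
    Row(("manages", "Brown", "Green"), date(1970, 1, 1), None),
]

step, labels = while_(SALARY, ["SALARY", "EMPLOYEE"], MANAGES, ["MANAGES", "MANAGER", "EMPLOYEE"])
answer = compare(operator.gt, step, labels, SALARY, ["SALARY", "MANAGER"])
print(answer)
```

With these figures Green out-earns Brown only from 1/1/72, so the result carries that period rather than the whole history.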
Like other relational languages, LEGOL provides a number of functions which act upon tables. They may be classified in two different ways:
(a) Whether they aggregate or select. Aggregating functions produce results which are a summary of their argument, and will usually have a different attribute structure. Familiar examples are functions to sum attribute values or count the number of entries in a table. Selecting functions use some criterion to choose a subset of their argument, whose structure and content remain unchanged. Examples are functions to identify entries with maximum or minimum attribute values.
(b) Their action with regard to time. Some functions return results valid at each separately identifiable period of time, i.e. a time distribution; others yield results valid over a total period. A third group are explicitly defined to act upon or produce time attributes.
The latter classification will obviously be less familiar, as it has no counterpart in other relational languages, so it will be illustrated. In Alpha, the maximum quantity in a table would be selected by an expression such as GET(MAX(QUANTITY-TABLE.QUANTITY)), which would return a single value. In a time-based system, however, the maximum quantity is changeable, and we may be interested in seeing the pattern of change. By applying the equivalent LEGOL function to the table of quantities given in Table 1(b):

max of QUANTITY(STOCK,ITEM,LOCATION)

we would produce the result:
QUANTITY   STOCK   ITEM        LOCATION        start    end
100        2       4" elbows   Wolverhampton   1/1/78   3/1/78
90         1       t-pieces    Coventry        3/1/78   10/1/78
showing for which time period the maximum quantity occurred. Note that the function is applied by default to the characteristic value of the argument, and that all the other attributes are retained. It will be evident that the application of this function involves comparison between time periods within the same table and gives a distribution over time of the selected values. We may also wish to know the maximum quantity overall, and for this purpose LEGOL provides another function, highest, which, applied to Table 1(b), would take only the entry where QUANTITY = 100. There are similar counterparts for other selecting and aggregating functions, namely:

At each time   Over all time
min            lowest
sum            accumulate
number         count
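The difference between “max” and “highest” can be sketched on Table 1(b). The sketch assumes entries of the form (values, start, end), treats an unknown end time as open, splits the time line at every recorded start and end, and merges consecutive periods that select the same entry; the function names are hypothetical. Its first two periods are the 100 and 90 entries shown above, while highest returns the single 100 entry.

```python
from datetime import date
from typing import List, NamedTuple, Optional

class Entry(NamedTuple):
    values: tuple          # (QUANTITY, STOCK, ITEM, LOCATION): characteristic first
    start: date            # inclusive
    end: Optional[date]    # exclusive; None = unknown, treated as open-ended

QUANTITY = [  # Table 1(b), dates as day/month/year
    Entry((100, 2, '4" elbows', "Wolverhampton"), date(1978, 1, 1), date(1978, 1, 3)),
    Entry((35, 2, '4" elbows', "Wolverhampton"), date(1978, 1, 3), None),
    Entry((90, 1, "t-pieces", "Coventry"), date(1978, 1, 1), date(1978, 1, 10)),
    Entry((50, 1, "t-pieces", "Coventry"), date(1978, 1, 10), date(1978, 1, 15)),
    Entry((40, 1, "t-pieces", "Coventry"), date(1978, 1, 15), None),
]

def max_at_each_time(table: List[Entry]) -> List[Entry]:
    """'max': for each elementary period, the entry with the largest characteristic."""
    bounds = sorted({e.start for e in table} | {e.end for e in table if e.end is not None})
    periods = list(zip(bounds, bounds[1:]))
    if any(e.end is None for e in table):
        periods.append((bounds[-1], None))     # the open-ended final period
    result: List[Entry] = []
    for start, end in periods:
        alive = [e for e in table if e.start <= start and (e.end is None or e.end > start)]
        if not alive:
            continue
        best = max(alive, key=lambda e: e.values[0])
        if result and result[-1].values == best.values and result[-1].end == start:
            result[-1] = result[-1]._replace(end=end)   # same entry continues
        else:
            result.append(Entry(best.values, start, end))
    return result

def highest(table: List[Entry]) -> Entry:
    """'highest': the single entry with the largest characteristic over all time."""
    return max(table, key=lambda e: e.values[0])

for e in max_at_each_time(QUANTITY):
    print(e.values[0], e.values[2], e.start, e.end or "unknown")
print("highest:", highest(QUANTITY).values[0])
```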
Although at first sight this distinction may appear complicated, experience has shown that it is necessary if a temporal data base is to be handled satisfactorily. It could be crucial, for instance, in the provision of management information about past events. In the legal context, and using another current example, the distribution over time shown in NCHILDREN(FAMILY) (Table 2a) could be derived only by applying the number function to a representation of relationships between children and their families. The corresponding count of those children would not have the desired effect. This example leads on to another important point. The function calls illustrated so far give results for the whole table, whereas in practice it might be required to produce them, say, for each item or location. In LEGOL this is achieved by the use of control identifiers. For instance, in the expression sum for ITEM of QUANTITY( - ,ITEM, - ), ITEM is the control identifier, causing the function to sum quantities for each item, regardless of location or stock. The result will, of course, be a time distribution; the use of the corresponding “overall” summation function, accumulate, would make no sense at all here. A second example shows a function taking an expression as argument, i.e. sum for LOCATION of [QUANTITY( - ,ITEM,LOCATION) * COST(ITEM)]
which could compute the total value of stocks at each location, once again distributed over time. More than one control identifier may be specified, in which case results are returned for each combination of control identifier values encountered in the table. Function calls with control identifiers are comparable to the Alpha “image functions” (IMAX, IMIN, ICOUNT, ITOTAL etc.), which produce results for each different value of a specified attribute. Similarly, Sequel, the relational language for System R, contains a “GROUPED BY” operator, the effect of which is “to partition the table concerned into groups, such that within any one group all rows have the same value within the indicated column(s)”, and this allows it, like LEGOL, to “use the same function as both ‘simple’ and ‘image’ function” [3].

Of the LEGOL functions which deal with time directly, four are selectors. First and last select entries with the earliest and latest start times by default, although end times can be specified instead. They may use control identifiers, e.g. first for FAMILY of ALLOWANCE(FAMILY), but there is no question of producing a time distribution. Current and past select instances of current or past states, so allowing a specification to bypass the normal “historical” mode of operation where necessary. “Current” entries are defined as those without a recorded end time; “past” entries are the converse. The use of control identifiers is not applicable here. Two other functions of interest are whenever, which aggregates overlapping or adjacent periods of time associated with the same control identifier(s), and duration, which computes the length of given time periods by subtracting the start date from the end date. These are essential for describing regulations (such as those for taxation and social security) involving time-based conditions.

8. EVALUATION
Having looked at the various components (references, operators, functions) of a LEGOL rule, we are in a position to consider the general process of evaluation. A complete grammar for the language is given in Appendix 1; here we will simply note a few salient points. Evaluation of a LEGOL source expression proceeds from left to right. There are no degrees of precedence between the dyadic operators, but brackets may be used to alter the normal evaluation order. Function calls and attribute references have a higher binding power than dyadic operators, so that, for instance, an expression to which a function is applied must be bracketed.

It is a basic principle that each step in the evaluation process should produce a result whose structure is in every respect like that of a normal entity representation. With this condition it is possible to define operations and functions in a standard way, so that they will work when applied either to original entity representations or to the result of some previous operation. Details of the attribute structure of the intermediate results produced by each function and operator are given in Appendices 2 and 3. Briefly, a new characteristic attribute value will be given by arithmetic operators, or by functions like sum and count, which actually compute such a value. Otherwise the existing characteristic attribute of the left-most operand will be used, so that, for instance, after evaluating

RATE(NCHILDREN) while NCHILDREN(FAMILY)

the value of RATE will become the new characteristic. The remaining operand attributes will be treated as identifiers of the derived table, and its time attributes will usually be those computed by the underlying intersect or other time operation.

Attributes are not generally discarded until the whole of the source part of the rule has been evaluated, when the form of the target reference is used to direct the final restructuring of the result. This involves using the relational projection operation to select and re-order attributes. The process is affected by:

(a) The form of the "update" symbol. The grammar in Section 3 shows this to have two possible forms, namely "⇐" or "←". The double arrow indicates that the characteristic attribute is to be transferred to the target, the single arrow that it is not. Thus,
ALLOWANCE(FAMILY) ⇐ RATE(NCHILDREN) while NCHILDREN(FAMILY)

will create tuples of the ALLOWANCE table having a characteristic attribute taken from that of the evaluated source expression, whereas

ALLOWANCE(FAMILY) ← RATE(NCHILDREN) while NCHILDREN(FAMILY)

will cause the characteristic attribute to be discarded. The characteristic attribute of the source may also update another attribute of the target, e.g.

end of CHILDHOOD(PERSON) ⇐ start of PERSON + 16.

The derived characteristic attribute here is the result of the addition. The form of the target reference indicates that it is to be transferred to a time attribute of the CHILDHOOD table.

In general, then, the double update arrow indicates that the source characteristic attribute is to be transferred, and the form of the target reference indicates where it is to be transferred to.

(b) Matching identifier labels. This point has already been illustrated in Section 3. Identifier labels which match between source and target specify the transfer of the corresponding identifier attributes.

(c) Whether the target contains an explicit time reference. Unless there is a direct assignment to a target time attribute, the derived time attributes of the evaluated source expression will be transferred automatically to the target. Thus, given the "allowance" rule above, it will be precisely the period defined by the "while" operator which will be assigned as the time of existence of the derived allowance. A time attribute reference in the target overrides this automatic transfer, giving the necessary facility for computing start and end times separately by different formulae, e.g.

start of CHILDHOOD(PERSON) ⇐ start of PERSON
end of CHILDHOOD(PERSON) ⇐ start of PERSON + 16.

When one target time attribute is explicitly set by a rule, the other one, whether start or end, is left undefined in the newly created tuple. Matching between corresponding start and end times, by way of the identifiers, is part of the updating process described in Section 9.

9. UPDATING

The process of merging newly created source tuples into the entity representation referred to by the target expression is an extension of the relational union operation. In the simplest case, the new tuples are simply appended to the appropriate table to be stored on the data base. Duplicate entries are discarded, since there is no point in representing the same piece of information more than once. The LEGOL update goes further than relational union, however, in allowing individual tuples to be merged under certain conditions. At present we can give some simple default rules, which may be modified according to the particular semantic properties of the entity representation being updated. In general, a pair of tuples will be merged if, for every one of their attributes, the values are either equal or undefined.

Example: We saw in the previous section a pair of rules to create a table representing those periods when persons recorded in the data base were under 16 years of age. Application of the first rule might produce the following tuples:
Table 3(a). CHILDHOOD
PERSON   start     end
Jack     1/7/54    -
Jim      3/9/53    -
Peter    2/4/51    -
And application of the second rule would produce:

Table 3(b). CHILDHOOD
PERSON   start     end
Jack     -         1/7/70
Jim      -         3/9/69
Peter    -         2/4/67
Clearly these sets of tuples can be merged, given the above conditions and assuming the necessary continuity of time between the start and end of childhood. The characteristic attributes of both sets are undefined, so they can be ignored. The identifier attributes contain equal values, which allow matching. The start and end attributes are obviously complementary. So the result of applying both rules would be:

Table 3(c). CHILDHOOD
PERSON   start     end
Jack     1/7/54    1/7/70
Jim      3/9/53    3/9/69
Peter    2/4/51    2/4/67
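The default merging condition stated above (for every attribute, the two values are equal or one is undefined) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: tuples are modelled as dictionaries, `None` stands for an undefined value, and the helper names `mergeable`, `merge` and `update` are hypothetical. The sketch ignores the semantic conditions (such as continuity of time between start and end) that, as noted above, may modify the default rule, and it merges a new tuple greedily into the first compatible one.

```python
from typing import Optional

# Hypothetical model: a tuple is a dict of attribute name -> value; None means "undefined".
LegolTuple = dict[str, Optional[str]]


def mergeable(a: LegolTuple, b: LegolTuple) -> bool:
    """Default rule: for every attribute the values are equal, or at least one is undefined."""
    return all(
        a.get(k) is None or b.get(k) is None or a.get(k) == b.get(k)
        for k in a.keys() | b.keys()
    )


def merge(a: LegolTuple, b: LegolTuple) -> LegolTuple:
    """Combine two mergeable tuples, keeping whichever value is defined."""
    return {k: a.get(k) if a.get(k) is not None else b.get(k) for k in a.keys() | b.keys()}


def update(table: list[LegolTuple], new: list[LegolTuple]) -> list[LegolTuple]:
    """Union-like update: merge each new tuple into the first compatible existing tuple
    (so an exact duplicate adds nothing); otherwise append it. No defined value is overwritten."""
    result = [dict(t) for t in table]
    for t in new:
        for i, existing in enumerate(result):
            if mergeable(existing, t):
                result[i] = merge(existing, t)
                break
        else:
            result.append(dict(t))
    return result


# Tables 3(a) and 3(b): tuples produced by the "start" rule and by the "end" rule.
childhood_start = [
    {"PERSON": "Jack", "start": "1/7/54", "end": None},
    {"PERSON": "Jim", "start": "3/9/53", "end": None},
    {"PERSON": "Peter", "start": "2/4/51", "end": None},
]
childhood_end = [
    {"PERSON": "Jack", "start": None, "end": "1/7/70"},
    {"PERSON": "Jim", "start": None, "end": "3/9/69"},
    {"PERSON": "Peter", "start": None, "end": "2/4/67"},
]

childhood = update(childhood_start, childhood_end)  # reproduces Table 3(c)
```

Applying `update` to a table and an identical copy of itself leaves the table unchanged, which matches the rule that duplicate entries are discarded.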
It is important to note that this tuple-merging process is quite different from the overwriting implied by assignment statements or file-update commands in more conventional data-processing languages. No defined item of data is ever destroyed by a LEGOL update. If an incoming piece of information contradicts some fact already represented in the data base, the system should detect and point out the inconsistency. What constitutes an inconsistency in any particular case is, of course, dependent upon the semantic definition of the entity representation concerned. The following example shows one which should be resolved automatically by the proposed LEGOL interpreter. Example: if no overwriting or replacement of information can occur, how will a rule be interpreted in which a reference to the same entity representation occurs both as source and target? The rule
expresses the fact that issues of an ITEM reduce the current stock level. Suppose we have the following current stock levels:

Table 3(d). QUANTITY
QUANTITY   ITEM     STOCK   start     end
50         washer   W/2     17/1/78   -
100        pipe     W/2     20/1/78   -
200        washer   C/1     19/1/78   -
400        pipe     C/1     28/1/78   -
and details of the following issues made:

Table 3(e). ISSUE
ISSUE   ITEM     STOCK   start     end
20      washer   W/2     30/1/78   -
20      pipe     C/1     30/1/78   -
Evaluation of the source will produce:

Table 3(f).
(result)   QUANTITY   ISSUE   ITEM     STOCK   start     end
30         50         20      washer   W/2     30/1/78   -
380        400        20      pipe     C/1     30/1/78   -
restructured as:

Table 3(g).
QUANTITY   ITEM     STOCK   start     end
30         washer   W/2     30/1/78   -
380        pipe     C/1     30/1/78   -
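The step from Tables 3(d) and 3(e) to Tables 3(f) and 3(g) can be sketched as follows. Because the rule itself is not legible in this copy, the sketch assumes that the source subtracts each ISSUE from the QUANTITY with the same ITEM and STOCK; the pairing on matching identifiers, the half-open treatment of periods, and the names `Row`, `time_intersect`, `subtract` and `project_to_quantity` are all assumptions for illustration. Following the evaluation principles of Section 8, the arithmetic operator computes a new characteristic, keeps both operands' attributes, and takes the time intersect of their periods; the final projection then restructures the result to the target form. Only the two stock tuples that have matching issues are included.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Row:
    value: int             # characteristic attribute (QUANTITY or ISSUE)
    item: str              # identifier ITEM
    stock: str             # identifier STOCK
    start: date
    end: Optional[date]    # None = undefined end, i.e. still current


def time_intersect(s1: date, e1: Optional[date], s2: date, e2: Optional[date]):
    """Intersect [s1, e1) and [s2, e2), where None means open-ended.
    Returns (start, end), or None when the periods do not overlap."""
    start = max(s1, s2)
    ends = [e for e in (e1, e2) if e is not None]
    end = min(ends) if ends else None
    if end is not None and start >= end:
        return None
    return start, end


def subtract(left: list[Row], right: list[Row]) -> list[tuple]:
    """left - right: pair tuples with matching ITEM and STOCK, compute the new
    characteristic, keep both operands' characteristics, time-intersect the periods."""
    out = []
    for a in left:
        for b in right:
            if (a.item, a.stock) != (b.item, b.stock):
                continue
            period = time_intersect(a.start, a.end, b.start, b.end)
            if period is not None:
                out.append((a.value - b.value, a.value, b.value, a.item, a.stock, *period))
    return out


def project_to_quantity(rows: list[tuple]) -> list[Row]:
    """Restructure for a target QUANTITY(ITEM, STOCK): keep result, identifiers and times."""
    return [Row(res, item, stock, start, end)
            for res, _qty, _issue, item, stock, start, end in rows]


quantity = [Row(50, "washer", "W/2", date(1978, 1, 17), None),
            Row(400, "pipe", "C/1", date(1978, 1, 28), None)]
issue = [Row(20, "washer", "W/2", date(1978, 1, 30), None),
         Row(20, "pipe", "C/1", date(1978, 1, 30), None)]

derived = subtract(quantity, issue)           # cf. Table 3(f)
restructured = project_to_quantity(derived)   # cf. Table 3(g)
```

The projection is deliberately a separate step: as Section 8 describes, attributes are carried along until the whole source has been evaluated, and only then does the target reference decide which of them survive.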
An attempt to add these two tuples to the existing QUANTITY table will again throw up a kind of contradiction. If an undefined end time is considered as later than any recorded time, then, for instance, the tuples

Table 3(h). QUANTITY
             QUANTITY   ITEM     STOCK   start     end
(previous)   50         washer   W/2     17/1/78   -
(new)        30         washer   W/2     30/1/78   -
are inconsistent for the period after 30/1/78. Obviously this can be resolved automatically, provided that QUANTITY has been defined such that the start time of a new current state may be taken as the end time of the previously current state. If so, in this particular example the tuples will appear as follows after update:

Table 3(i). QUANTITY
QUANTITY   ITEM     STOCK   start     end
50         washer   W/2     17/1/78   30/1/78
30         washer   W/2     30/1/78   -
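The automatic resolution described above, from Table 3(h) to Table 3(i), can be sketched as follows, assuming QUANTITY has the semantics just stated: a new current state for the same ITEM and STOCK closes the previously current state at its own start time. The function name `supersede` and the choice to raise an error for any clash it cannot resolve are illustrative assumptions, not part of LEGOL. Consistent with the text, no tuple is deleted and no defined value is changed; the old state only receives the end time it lacked.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Row:
    value: int            # QUANTITY
    item: str             # ITEM
    stock: str            # STOCK
    start: date
    end: Optional[date]   # None = undefined, i.e. the current state


def supersede(table: list[Row], new: list[Row]) -> list[Row]:
    """Add new current states, ending any previously current state for the same
    identifiers at the new start time. Exact duplicates are discarded; a clash
    that cannot be resolved this way is reported rather than overwritten."""
    result = list(table)
    for n in new:
        if n in result:
            continue  # the same fact is never represented twice
        for i, old in enumerate(result):
            if (old.item, old.stock) != (n.item, n.stock) or old.end is not None:
                continue
            if old.start >= n.start:
                raise ValueError(f"inconsistent states for {old.item} at {old.stock}")
            result[i] = replace(old, end=n.start)  # old tuple only gains an end time
        result.append(n)
    return result


previous = [Row(50, "washer", "W/2", date(1978, 1, 17), None)]
new = [Row(30, "washer", "W/2", date(1978, 1, 30), None)]
after_update = supersede(previous, new)   # cf. Table 3(i)
```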
In general, the question of consistency checking is a very complex one. Specification of consistency rules involves expressing constraints based upon the semantics of what is being represented, in terms of permissible relationships between attribute values in the representation. It is particularly important for LEGOL to get this right since the tables produced by the application of a rule are not, as we have seen, simply output as answers to queries but used in a long derivation chain simulating the effect of a system specification or piece of legislation.
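As one concrete instance of such a consistency rule, the sketch below checks a hypothetical single-valuedness constraint: for any one set of identifier values, two tuples with different characteristic values must not exist over overlapping periods. The constraint, the `Fact` record and the helper names are assumptions chosen for illustration; undefined end times are treated as later than any recorded time, as in the stock example. Applied to Table 3(h) it reports the clash; applied to Table 3(i) it reports none.

```python
from dataclasses import dataclass
from datetime import date
from itertools import combinations
from typing import Optional


@dataclass(frozen=True)
class Fact:
    value: int              # characteristic attribute
    ids: tuple[str, ...]    # identifier values, e.g. (ITEM, STOCK)
    start: date
    end: Optional[date]     # None = undefined end, later than any recorded time


def overlaps(a: Fact, b: Fact) -> bool:
    """Half-open periods [start, end) overlap; an undefined end is open-ended."""
    a_starts_before_b_ends = b.end is None or a.start < b.end
    b_starts_before_a_ends = a.end is None or b.start < a.end
    return a_starts_before_b_ends and b_starts_before_a_ends


def conflicts(table: list[Fact]) -> list[tuple[Fact, Fact]]:
    """Pairs violating the assumed rule: one characteristic value per identifier set at a time."""
    return [(a, b) for a, b in combinations(table, 2)
            if a.ids == b.ids and a.value != b.value and overlaps(a, b)]


table_3h = [Fact(50, ("washer", "W/2"), date(1978, 1, 17), None),
            Fact(30, ("washer", "W/2"), date(1978, 1, 30), None)]
table_3i = [Fact(50, ("washer", "W/2"), date(1978, 1, 17), date(1978, 1, 30)),
            Fact(30, ("washer", "W/2"), date(1978, 1, 30), None)]
```

Other entity representations would need different constraints; this one suits a characteristic such as QUANTITY, which can have only one value at a time for a given item and stock.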
The language described in this paper is, at the least, an extremely powerful and concise formalism for the specification of relational data-base commands. Operationally it owes more to the relational algebra than to the calculus, but the basic construction of LEGOL rules as "TARGET ⇐ SOURCE" makes it possible to define one's requirements in terms of a desired result. The scope of LEGOL goes well beyond that of a relational query language. In addition to the features already described, it incorporates control structures such as conditionals and iteration, which will be discussed in another paper. The question obviously arises: is it actually feasible to implement such a language? A working implementation of an earlier version, which already contained such important operations as time intersect and update, was produced in 1975, based upon facilities provided by the Peterlee Relational Test Vehicle[2]. Work has now begun on the design of a more ambitious LEGOL 2.0 interpreter, but it is clear that at present a system intended for use with a large data base would not be a practical proposition. Nevertheless, the language will be a valuable tool if the user's aim is principally to test the effectiveness of a system specification or piece of legislation on a small, judiciously selected set of test data.

The declared purpose of LEGOL was to serve as a language with which to specify what must be done, without going into detail about how to do it. Inevitably, when rules are to be interpreted automatically, questions of "how" and "in what order" do become significant to some extent, and it is interesting to compare a piece of original legislation with the equivalent LEGOL version from this point of view[4]. But the principle of maintaining the closest possible correspondence with "real" entities and processes, avoiding decisions based solely on data-processing considerations, is of paramount importance. That point is exemplified by what is perhaps the most unusual feature of the language described here, the prominence which it gives to the matter of time. In practice, almost all entities and relationships are time-dependent, and LEGOL structures and operations are intended to reflect this fact. By contrast, the more conventional data-processing approach, taking it for granted that information in a data base represents a snapshot of the "current" state of affairs which may be replaced by more recently acquired facts, is one step further into abstraction (see [5, 6] for some interesting discussions on this point). In a large-scale working system, discriminating between current and historical data is, of course, necessary on the grounds of efficiency, but it should be recognised that the decision to do so is based upon expediency rather than any "natural" criterion. By retaining the time dimension within its area of active concern, then, LEGOL allows users to specify rules at a higher level of generality, and to tackle applications, particularly those based upon legislation, for which the currently orthodox methods are inadequate.

REFERENCES
[1] R. K. Stamper: Verso un modello semantico per l'analisi della legislazione. Informatica e Diritto, Anno IV (Ed. by A. A. Martino, E. Maretti and C. Ciampi). Special monographic edition on informatics, logic and law. Le Monnier, Firenze (1978).
[2] R. K. Stamper: The LEGOL project: a survey. IBM UKSC Rep. No. 0081, May 1976.
[3] C. Date: An Introduction to Database Systems, 2nd Edn. Addison-Wesley, Reading, Mass. (1978).
[4] S. Jones: A LEGOL example: intestate succession. L.S.E. paper on Informatics L. 19, 1978.
[5] J. Bubenko: The temporal dimension in information modelling. Architecture and Models in Data-base Management Systems (Ed. by Nijssen). North Holland, Amsterdam (1977).
[6] B. M. Schueler: Update reconsidered. Architecture and Models in Data-base Management Systems (Ed. by Nijssen). North Holland, Amsterdam (1977).
[7] N. Wirth: What can we do about the unnecessary diversity of notations for syntactic definition? CACM 20(11) (1977).
[8] P. Mason and S. Jones: P~~~~n8 the law. IFIP TC8 WG8.2 Conf. on The Information Systems Environment, Bonn, 11-13 June 1979.
APPENDIX 1. LEGOL syntax

Rule ::= target update source
Target ::= reference
Update ::= "⇐" | "←"
Source ::= expression
Expression ::= simple expression {operator simple expression}
Simple expression ::= reference | literal | function call | "(" expression ")"
Reference ::= entity reference | attribute reference
Entity reference ::= entity label ["(" identifier list ")"]
Attribute reference ::= attribute label "of" entity reference
Entity label ::= alphabetic string
Attribute label ::= "start" | "end" | alphabetic string
Identifier list ::= identifier element {"," identifier element}
Identifier element ::= identifier label | "-"
Identifier label ::= alphabetic string
Literal ::= quoted string | numeric constant
Function call ::= function name ["for" control list] "of" simple expression
Control list ::= identifier label {"," identifier label}
Function name ::= select | aggregate | value
Select ::= "max" | "min" | "highest" | "lowest" | "first" | "last" | "current" | "past"
Aggregate ::= "sum" | "number" | "accumulate" | "count" | "whenever"
Value ::= "duration" | "today"
Operator ::= setop | timeop | arithmeticop | comparisonop
Setop ::= "union"
Timeop ::= "while" | "while not" | "or while" | "since" | "until" | "during"
Arithmeticop ::= "+" | "-" | "*" | "/"
Comparisonop ::= "=" | "≠" | "<" | ">" | "is" | "is not"

(Non-terminals such as 'non-negative integer', 'alphabetic string' etc. bear an obvious meaning and are not defined here.)
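The Expression production, together with the rule in Section 8 that dyadic operators have no precedence, can be illustrated by a small recursive-descent evaluator. This is a sketch restricted to non-negative integer literals, the four arithmetic operators and brackets; it does not handle LEGOL references, functions or time attributes, and the names `tokenize` and `evaluate` are hypothetical. Exact rational arithmetic (`fractions.Fraction`) is used so that division does not introduce rounding.

```python
import re
from fractions import Fraction

TOKEN = re.compile(r"\s*(\d+|[-+*/()])")
OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    "/": lambda a, b: a / b,
}


def tokenize(text: str) -> list[str]:
    text, pos, tokens = text.strip(), 0, []
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected input: {text[pos:]!r}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


def evaluate(text: str) -> Fraction:
    """Expression ::= simple expression {operator simple expression}, applied
    strictly left to right: no operator precedence, only brackets."""
    tokens, pos = tokenize(text), 0

    def simple() -> Fraction:
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of expression")
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            value = expression()
            if pos >= len(tokens) or tokens[pos] != ")":
                raise SyntaxError("missing ')'")
            pos += 1
            return value
        if tok.isdigit():
            return Fraction(int(tok))
        raise SyntaxError(f"unexpected token {tok!r}")

    def expression() -> Fraction:
        nonlocal pos
        value = simple()
        while pos < len(tokens) and tokens[pos] in OPS:
            op = tokens[pos]
            pos += 1
            value = OPS[op](value, simple())
        return value

    result = expression()
    if pos != len(tokens):
        raise SyntaxError("unexpected trailing input")
    return result
```

Under this scheme `evaluate("2 + 3 * 4")` gives 20 rather than the conventional 14; the conventional reading has to be written with brackets, as `evaluate("2 + (3 * 4)")`.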
APPENDIX 2. Attribute Structure of Function Results

Columns: function type; result characteristic; identifiers; time attributes.
Aggregate "at" time (sum and number): characteristic is the result of aggregation; identifiers are the "control" identifiers.
Aggregate "over" time (accumulate and count): characteristic is the total; identifiers are the "control" identifiers.
Aggregate time periods (whenever): characteristic is null; time attributes are the aggregated time periods.
Select "at" time and select "over" time (max, min, highest, lowest, first, last): characteristic and identifiers are those of the argument.
current and past: time attributes are the original times of the argument.
duration: characteristic is (end - start) of the argument.
today: characteristic and time attributes are today's date.
APPENDIX 3. Attribute Structure of Dyadic Operator Results

Columns: operator; result characteristic; identifiers; time attributes.
Arithmetic (+, -, *, /): characteristic is the result of the operation; identifiers are all attributes of the operands; time attributes are the result of time intersect.
Comparison, "while", "since", "until", "during": characteristic is that of the left operand; identifiers are all attributes of the operands; time attributes are the result of time intersect.
"while not": identifiers are the attributes of the left operand; time attributes are the result of time-difference.
Union operators: identifiers are the matching identifiers only; time attributes are the result of time-union.