TARILAN: an embedded functional data processing language




The Journal of Systems and Software 43 (1998) 93–102

Koen De Bosschere ¹

Vakgroep Elektronica en Informatiesystemen, Universiteit Gent, Sint-Pietersnieuwstraat 41, 9000 Gent, Belgium

Received 19 September 1996; received in revised form 3 February 1997; accepted 2 April 1997

Abstract

TARILAN is a functional programming language, specialized for data processing tasks. Its intended application field is complex data processing tasks where flexibility is more important than high performance. The flexibility results from separating the business rules that control the transformation from the transformation itself. The present work shows how a subset of functional and logic programming can be used to express complex transformation rules. The TARILAN system has been in use for more than two years and has proved to be a powerful environment in practice. © 1998 Elsevier Science Inc. All rights reserved.

Keywords: Functional programming; Data processing; TARILAN

1. Introduction

Traditionally, data processing is done either with the database programming language of a (relational) database system (Date, 1986), or with a general purpose imperative programming language such as COBOL, or more recently an object oriented language such as C++ or Borland's Delphi. A general purpose programming language offers full flexibility for expressing any data processing algorithm, at the cost of increased complexity, more mistakes, and longer development time. Database systems and their associated programming languages offer the user more support, but tend to be hard to use when it comes to expressing algorithms that are not directly supported and that fall outside the class of applications the database system was originally designed for. In that case, developing an application can be very time-consuming and error-prone too.

Fortunately, most data processing tasks are fairly regular, and database systems are nowadays so sophisticated that they can easily handle the majority of these tasks. There is however a class of applications that are not easily handled by standard database tools (De Bosschere, 1996b). These are applications that need more than just iterating over the database records, or that must take care of many special cases or exceptions. One such application, the one we will use as running example in this paper, is the processing of claims for a medical insurance (De Bosschere, 1996a). An insurance company receives all kinds of receipts for drugs, doctor's visits, hospital stays, and so on. Depending on the kind of insurance contract, the status of the patient, the kind of disease, the kind of physician, and the type of hospital, it has to compute the amount of money to be reimbursed, after deducting the amounts that were already reimbursed by other insurances (normally the obligatory basic insurance), or taking into account certain copayment rules. There is really no end to the complexity of some rules.
It may be the case that some drug is reimbursed only if it is prescribed in combination with some other drug, for a particular set of diseases, for persons in a particular age window, not more than twice a year, and so on.

If one wants to automate the processing of this kind of information, there are two problems that should be solved. The first problem is the representation of the claims, the second one is the processing of the claim information. The first problem is not too difficult to solve as almost every drug and medical service has a unique code which can be used to identify it (Algemene Pharmaceutische Bond, 1997). So, basically, it is sufficient to specify (i) the date, (ii) the amount paid, and (iii) the unique code to have access to any information needed to correctly process the claim. There are however a number of expenses that are not (cannot be) standardized, such as medical expenses from other

¹ E-mail: [email protected].

0164-1212/98/$ – see front matter © 1998 Elsevier Science Inc. All rights reserved. PII: S0164-1212(98)10025-0


countries, alternative therapies, and expenses that are not strictly medical, such as a taxi ride to a hospital. For these cases, additional (technical) codes can be created by the insurance company. ² Once everything has been codified, the (i) date, (ii) amount, and (iii) code for all the claimed expenses can be stored in a relational table.

The second problem is much harder to solve. On the one hand, one could try to use the relational database system to express an algorithm that produces the final amount to be reimbursed per person (Date, 1986). There are however many special cases, and the functionality offered by the database system is not really adequate to express them. Standard database operations can no longer be used, and consequently one has to traverse the database record by record and apply the algorithm to the individual records. This solution is both slow and error-prone. Furthermore, given the rate of change of the reimbursement rules, this solution is not practical as everything is coded in one program.

On the other hand, using e.g. COBOL to express the algorithm in full detail is also too low level. Although one can express anything one needs, too many (static) low level details (file pointers, database layout, ...) are visible to the programmer. The use of object orientation as in C++ or Delphi can hide some of these low level details in class libraries, and is therefore better than a non-object oriented language, especially when we want to let the users understand and modify the business rules. String processing languages such as Awk (Aho et al., 1988), A* (Ladd and Ramming, 1995), Perl (Wall and Schwartz, 1992), and Icon (Griswold and Griswold, 1990) could be used to hide many of the low level details. Although programs in these languages are normally very concise (especially for small tasks), their use for complicated tasks involving multiple large files, index files, etc. is not straightforward.
Furthermore, long scripts in these languages tend to become very unclear due to the sometimes cryptic syntax.

All the approaches discussed so far suffer from a last major disadvantage for the application described here: they are not declarative, which implies that the (declarative) reimbursement rules that control the actual processing of the data must first be translated into a more imperative form before they can be used. Changing an existing rule means first undoing the effect of the old rule on the program, and then translating the new rule into the program. Every rule change thus requires a program change, and possibly the redistribution of the complete program if it is compiled (e.g., C++).

Ideally, what we want is to store the rules in such a way that they remain readable (and modifiable) for the user and are automatically executable at the same time (a so-called executable specification). Then, changes will normally be limited to the rule base, and will not require a modification of the program that executes them. Changing the rules means redistributing the rules, but not the complete program. Hence, a system that uses the unpreprocessed rules as an executable specification, stored in a separate file, would have many advantages (De Bosschere, 1996b).

The approach we present in this paper, which has already proved successful for two non-trivial applications and which has been independently advocated in (Poo, 1992), is the combination of a declarative programming language (in our case the functional language TARILAN, which stands for Tarification Language) to express the business rules, with a vanilla database management system that basically offers access to the various files used by the system (currently about 15, index files included). The choice of a declarative language is motivated by the fact that declarative languages make a clear distinction between what has to be done, and how (Sterling and Shapiro, 1986).
This separation between the rules and their execution turns out to be very useful for the users of this environment. They do not have to worry about operational aspects of the system, and the rules are so simple, and so close to the language they use to talk about claims, that even people who do not have a programming background can understand the rules, and start modifying them after a while. As a TARILAN program only contains the rules, it is normally rather short. One hundred rules can be expressed in 2–3 pages of program text.

TARILAN is called an embedded language because it is aimed at being integrated into a classical data processing application, while preserving its own visibility through the rule database. Hence, the integration is not complete, and that is why we prefer to call it embedded.

Another important difference with traditional data processing is that the processing philosophy is completely different. Instead of having an application specific data processing program that iterates over the records of a database, the TARILAN system consists of a generic event engine that accesses the database and generates several events per record. These events are linked with TARILAN functions that implement the business rules and control the event engine (a feedback loop: at initialization, the event engine generates one event; the subsequent events are controlled by the TARILAN program). In other words, the event engine contains the general database access routines while the TARILAN program contains the application specific information.

² Ideally, all the codes should be different. Unfortunately some of the codes for medical expenses coincide with drug codes. This problem is solved by storing the type of code with its value.


This paper describes a practical experience in medical tarification where the data sets are relatively small, but the algorithms are complex. It also aims to show that although declarative programming seems to have lost the 'battle of paradigms' against object oriented programming, there is still a huge opportunity for non-mainstream applications that are unusual and need more flexibility than is normally offered by compiled object oriented environments. The application is actually a good illustration of how different paradigms can be combined to build a good application (a database application programmed in TARILAN, a declarative language with functional and logic programming characteristics, and implemented in an object oriented language). This application also illustrates our belief that programming languages should not be opposed to each other, but that their strengths should be combined.

The remainder of the paper mainly contains a description of the TARILAN system. Next comes a description of the language, followed by a description of the system that generates the events, and how they are processed. The paper ends with a discussion of the implementation, its performance, and related work.

2. TARILAN: The language

Syntactically, the TARILAN language is simple, and can be described in less than one page of BNF. Its syntax is related to the syntax of functional logic programming languages such as Escher (Lloyd, 1995), and committed-choice logic programming languages such as Janus (Gudeman et al., 1992). The language currently supports only strings, booleans and integers; it supports neither floating point numbers nor structured data types. The main reason is that the current set of applications does not require these data types. They could however easily be added to the language.
    clause        ::= head := body :- condition . | head := body .
    head          ::= name | name ( headarguments )
    headarguments ::= headargument | headargument , headarguments
    headargument  ::= variable | constant
    body          ::= expression
    condition     ::= expression
    expression    ::= term | expression + term | expression - term | expression '|' term | expression ++ term
    term          ::= factor | term * factor | term / factor | term % factor | term & factor
    factor        ::= compfactor | compfactor comparison compfactor
    compfactor    ::= - factor | ! factor | ( expression ) | variable | constant | call
    call          ::= name | name ( arguments )
    arguments     ::= argument | argument , arguments
    argument      ::= expression
    constant      ::= number | truefalse | string
    variable      ::= uppercase | uppercase alphastring | _
    name          ::= lowercase | lowercase alphastring
    charstring    ::= char | char charstring
    truefalse     ::= true | false
    number        ::= digit | digit number
    string        ::= 'charstring'
    uppercase     ::= A | B | C | D | E | F | G | H | I | J | K | L | M | N | O | P | Q | R | S | T | U | V | W | X | Y | Z
    lowercase     ::= a | b | c | d | e | f | g | h | i | j | k | l | m | n | o | p | q | r | s | t | u | v | w | x | y | z
    digit         ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
    char          ::= any ascii-char except '
    comparison    ::= > | >= | = | <> | < | <=
We will explain the semantics of the language by means of a couple of examples. The integer division of two numbers can be specified as follows

    quotient(X, Y) := X/Y :- Y <> 0.

which means that the result of the function application is X/Y if Y <> 0. If Y is 0, the function fails and an exception is raised. If we do not want this to happen, but want to return 0 instead, we can write:

    quotient(X, Y) := X/Y :- Y <> 0.
    quotient(X, Y) := 0 :- Y = 0.


Note that the order of execution is (i) bind the formal function arguments to the actual arguments, (ii) evaluate the conditions of the clauses in program order, and (iii) return the value of the function body of the first clause whose condition evaluates to true. Since the second clause will only be used if Y <> 0 fails, we can actually replace its condition by true, which can be omitted as condition.

    quotient(X, Y) := X/Y :- Y <> 0.
    quotient(_, _) := 0.

Note that we have also replaced X and Y by anonymous variables (_) as they occur only once in the clause. Notice that _ can only occur in the head of a clause: every clause variable must occur at least in the head of a clause (this is the only place where it can be given a value), and hence variables with only one occurrence must necessarily occur in the clause head. Additional variables can only be created by calling functions, e.g., by recursion.

The clause order is important as it determines the order of condition evaluation. Changing the order of clauses will normally require some editing. The previous program is semantically equivalent with

    quotient(_, Y) := 0 :- Y = 0.
    quotient(X, Y) := X/Y.

which can be syntactically sugared and rewritten as

    quotient(_, 0) := 0.
    quotient(X, Y) := X/Y.

For performance reasons, it is however better to put the clause with the highest probability on top. As is the case in our toy example too, the fastest program might not always be the shortest or the most elegant program, and depending on the complexity of the rules, a user might prefer not to write the fastest program, but the program that is easiest to understand and to modify, and that will probably contain the smallest number of errors.

Other examples are the smallest of 2 or 3 values.

    min(A, B) := A :- A <= B.
    min(A, B) := B.
    min(A, B, C) := min(A, min(B, C)).

Notice that identical function names with different arities denote different functions.
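The clause selection rule described above can be sketched in Python. This is only an illustrative model under stated assumptions, not the TARILAN implementation: the names call, NoClauseApplies, quotient, partial_quotient and minimum are hypothetical, each clause is modelled as a (condition, body) pair, and Python's floor division stands in for TARILAN's integer division (the two may differ for negative operands).

```python
from typing import Any, Callable, Sequence, Tuple

# Hypothetical model: a clause is a (condition, body) pair of callables.
Clause = Tuple[Callable[..., bool], Callable[..., Any]]


class NoClauseApplies(Exception):
    """Raised when no clause has a condition that evaluates to true."""


def call(clauses: Sequence[Clause], *args: Any) -> Any:
    """Evaluate conditions in program order; return the first matching body."""
    for condition, body in clauses:
        if condition(*args):
            return body(*args)
    raise NoClauseApplies(f"no clause applies to arguments {args!r}")


# quotient(X, Y) := X/Y :- Y <> 0.
# quotient(_, _) := 0.
quotient = [
    (lambda x, y: y != 0, lambda x, y: x // y),
    (lambda x, y: True, lambda x, y: 0),  # omitted condition means true
]

# Only the first clause: the function fails when Y is 0.
partial_quotient = quotient[:1]

# min(A, B) := A :- A <= B.
# min(A, B) := B.
minimum = [
    (lambda a, b: a <= b, lambda a, b: a),
    (lambda a, b: True, lambda a, b: b),
]
```

With these definitions, call(quotient, 7, 0) is answered by the second clause, whereas call(partial_quotient, 7, 0) raises the exception because no clause applies.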
Functions can also return strings as in

    formattedclause(H, B, '') := H ++ ':=' ++ B ++ '.'.
    formattedclause(H, B, C) := H ++ ':=' ++ B ++ ':-' ++ C ++ '.'.

and boolean values as in

    specialnumber(3) := true.
    specialnumber(5) := true.
    specialnumber(7) := true.
    specialnumber(_) := false.

which is semantically equivalent with

    specialnumber(3).
    specialnumber(5).
    specialnumber(7).
    specialnumber(_) := false.

because true as body can always be omitted. Using the logical or, this function could also be rewritten as

    specialnumber(G) := (G = 3)|(G = 5)|(G = 7).

Recursion can be expressed in the usual way


    gcd(A, 0) := A.
    gcd(A, B) := gcd(B, A%B) :- A >= B.
    gcd(A, B) := gcd(B, A).

Since recursion is not an easy concept for people who do not have a programming background, TARILAN also offers the concept of an accumulation function. The accumulation function is a higher order function. It does not evaluate its argument, but parses it and extracts the information it needs from it. The accumulation function can be used to express linear recursion in a non-linear (easier) way (Burge, 1975). The function sumsquare/1 returns the sum of the squares of the integers between 1 and its argument N.

    square(X) := X * X.
    sumsquare(N) := accumulation(square(1) + square(N)).

In general, the accumulation function requires as argument an expression with one binary operator between two invocations of the same unary function of a numerical argument. This expression is validated, the unary function is then iterated over all the integers ranging between the two given values, and the function results are combined using the binary operator. In the previous example, sumsquare(5) evaluates the squares from 1 to 5 and returns the sum.

There are two more occasions where higher order functions are used in TARILAN. The first one is for input/output, where a function returning a filename is used as file handle.

    filename := 'test.out'.
    makefile := rewrite(filename) & write(filename, 'this is a test.') & close(filename).

The function filename/0 is not evaluated before calling the input/output operations. Instead the function address itself is used as file handle. When needed, the input/output functions will evaluate the file handle to have access to the filename itself. All input/output functions are boolean functions returning true when successful. As TARILAN uses short-circuit evaluation for boolean expressions, the evaluation of a function such as makefile returns false as soon as there is a fatal input/output error, and subsequent input/output operations will not be executed.
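The accumulation function described earlier in this section can be approximated in Python as follows. This is a sketch under assumptions rather than the actual mechanism: TARILAN parses the unevaluated expression square(1) + square(N), whereas here the unary function, the binary operator and the two bounds are passed explicitly. The parameter names func, op, low and high are hypothetical, and rejecting an empty range is an assumption, since the text does not say what happens in that case.

```python
import operator
from functools import reduce
from typing import Callable


def accumulation(func: Callable[[int], int],
                 op: Callable[[int, int], int],
                 low: int, high: int) -> int:
    """Combine func(low), func(low + 1), ..., func(high) with the operator op."""
    if low > high:
        # Assumption: an empty range is treated as an error.
        raise ValueError("lower bound exceeds upper bound")
    return reduce(op, (func(i) for i in range(low, high + 1)))


def square(x: int) -> int:
    # square(X) := X * X.
    return x * x


def sumsquare(n: int) -> int:
    # sumsquare(N) := accumulation(square(1) + square(N)).
    return accumulation(square, operator.add, 1, n)
```

Calling sumsquare(5) adds 1, 4, 9, 16 and 25; replacing operator.add by operator.mul would multiply the squares instead, which mirrors the role of the single binary operator in the TARILAN expression.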
The last occasion where higher order functions are used is when redefining functions. Functions having a constant in their body and no condition, such as f := 1 or filename := 'test.out', can be redefined at runtime with the define/2 function. This feature can e.g. be used to simulate the behavior of global variables, which are not directly supported in TARILAN.

    total := 0.
    updatetotal(X) := define(total, total + X).

The first argument of define is not evaluated, but passed as such. The second argument is evaluated, and adds X to the previous value of total. Their sum is used as the new function value. Given the complexity of this mechanism, users are not encouraged to use it.
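The effect of define/2 can be imitated in Python with a table of nullary functions. The sketch below rests on assumptions: the table functions and the Python signature of define are hypothetical, the function name is passed as a string because Python has no unevaluated arguments, and define is assumed to return true (the text only shows it being used as the body of boolean event functions).

```python
from typing import Callable, Dict, Union

Value = Union[int, str, bool]

# Hypothetical function table: total := 0.
functions: Dict[str, Callable[[], Value]] = {"total": lambda: 0}


def define(name: str, value: Value) -> bool:
    """Redefine the constant function `name` so that it returns `value`."""
    functions[name] = lambda: value
    return True  # assumption: define succeeds and yields true


def updatetotal(x: int) -> bool:
    # updatetotal(X) := define(total, total + X).
    # The second argument is evaluated with the old definition of total.
    return define("total", functions["total"]() + x)
```

After updatetotal(5) and updatetotal(7), evaluating functions["total"]() gives 12, so the constant function behaves like a global variable.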

3. TARILAN: The event engine

As mentioned earlier, the driving force behind a TARILAN program is not the program itself, but the event engine generating events. These events are then processed by a TARILAN program (see Fig. 1). In principle, it would be sufficient to generate one event per record. In the application we describe in this paper, we have however chosen to generate eight different events, depending on the type of record. Although this approach could be implemented by inserting a dispatch function in the main event handler, we decided to let the event engine generate these events as this turned out to be slightly more convenient for this application. The eight events reflect the hierarchical structure of the contents of the database. Per level of the hierarchy, there is one event generated before the processing starts, and one event after the processing has finished. The four hierarchical levels are: file, patient, claim, code. Hence, the eight possible events are:


Fig. 1. TARILAN architecture.

    beginfile      endfile
    beginpatient   endpatient
    beginclaim     endclaim
    begincode      endcode

Before the processing starts, the event engine sorts the records per patient and per claim. As explained in the introduction, a claim consists of a multiset of codes. Typically, the event sequence starts like

    beginfile, beginpatient, beginclaim, begincode, endcode,
    begincode, endcode, begincode, endcode, endclaim,
    beginclaim, begincode, endcode, endclaim, endpatient,
    beginpatient, ...

Hence, an event is generated before the processing of the file starts (beginfile), followed by an event indicating that the processing of the first patient will start (beginpatient), followed by an event announcing the processing of the first claim of this patient (beginclaim), and then two events per code that is mentioned on the claim: one before the code is processed (begincode), and one after the code has been processed (endcode).

The link with the TARILAN program is simple. Every event calls the corresponding (nullary) function in the program. If a particular event function is not defined, the program raises an exception. The event functions must be boolean functions, which can be used to control the event engine. A begin-function should return true. If it returns false, the event engine will skip the current hierarchical level (file, patient, claim, code). An end-function should return true to indicate the successful processing of a given level. If it returns false, the complete level is restarted. Each level has an individual iteration counter that indicates the current iteration.

This mechanism is useful if the processing of a level depends on the contents of the level. E.g., a patient will only receive a document if something is to be reimbursed, and this is only known after all the claims of the patient are processed. In that case, there is a first pass to see whether something should be reimbursed, if necessary followed by a second pass that prints the documents. The decision whether to make another pass is made by the function endpatient.

The next example illustrates how the mechanism works. Suppose we want to print all the claims that fulfill a certain condition. As we only know whether a claim fulfills the condition after it has been completely scanned, we have to process the claim twice: a first time to evaluate the condition, and a second time to print it if necessary. The event functions look as follows.
    beginfile.
    endfile.
    beginpatient.
    endpatient.
    beginclaim := resetflag :- claimcounter = 1.
    beginclaim := printheader.


    endclaim := !flag :- claimcounter = 1.
    endclaim.
    begincode := markflag :- claimcounter = 1.
    begincode := printcode.
    endcode.

When the processing of a new claim starts, claimcounter is set to 1 by the event engine and hence, the first clauses of the dual-clause functions will be selected. Consequently, flag is set to false, and the functions begincode and endcode are called per code belonging to the claim, thereby setting the flag to true if appropriate. Finally, endclaim is called. As we are still in the first iteration of the claim, the first clause is executed, and !flag is returned. Hence, the processing of this claim stops here if the condition is not fulfilled. However, if the condition is fulfilled, endclaim will return false, and the event engine will again call beginclaim on the same claim, but this time with 2 as value for claimcounter, which will select the second clause of the dual-clause functions. In this second pass, the complete claim is printed. After having printed the claim, endclaim will finally return true, which will finish its processing.

The behavior of the event engine for failing end functions can be considered as a form of backtracking (Deransart et al., 1996) over the database records. ³ Notice however that it is strictly limited to the event engine, and that the TARILAN language itself does not feature backtracking. Notice how concisely this behavior can be expressed in TARILAN. For most cases, one or two pages of code are sufficient to implement quite complicated processing algorithms.

The events are however useless if the event functions cannot access the database records they are associated with. Therefore, the event engine not only creates events, but also defines functions per database field. In the current implementation there are about 45 such functions; some of them correspond to overlaid database fields, but have distinct function names to hide the database record layout from the user.
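The two-pass behavior of the claim level described above can be modelled in Python. The sketch is illustrative only: run_claim, two_pass_handlers, the state dictionary and the wanted predicate are hypothetical names, only the claim level is modelled, restarting the code level is left out, and it is assumed that a begin-function returning false skips the level without calling the matching end-function.

```python
from typing import Callable, Dict, List

Handlers = Dict[str, Callable[[], bool]]


def run_claim(codes: List[int], handlers: Handlers, state: dict) -> None:
    """Process one claim, restarting the level while endclaim returns false."""
    state["claimcounter"] = 1
    while True:
        # Assumption: a false begin-function skips the level entirely.
        if not handlers["beginclaim"]():
            return
        for code in codes:
            state["code"] = code
            if handlers["begincode"]():
                handlers["endcode"]()  # code-level restart is not modelled
        if handlers["endclaim"]():
            return
        state["claimcounter"] += 1  # next pass over the same claim


def two_pass_handlers(state: dict, output: List[str],
                      wanted: Callable[[int], bool]) -> Handlers:
    """Event functions that print a claim only if some code satisfies `wanted`."""
    def beginclaim() -> bool:
        if state["claimcounter"] == 1:
            state["flag"] = False           # resetflag
        else:
            output.append("claim")          # printheader
        return True

    def begincode() -> bool:
        if state["claimcounter"] == 1:
            if wanted(state["code"]):
                state["flag"] = True        # markflag
        else:
            output.append(f"  code {state['code']}")  # printcode
        return True

    def endclaim() -> bool:
        if state["claimcounter"] == 1:
            return not state["flag"]        # false requests a second pass
        return True

    return {"beginclaim": beginclaim, "begincode": begincode,
            "endcode": lambda: True, "endclaim": endclaim}
```

A claim whose codes never satisfy the predicate is scanned once and produces no output; a claim with at least one matching code is scanned a second time, with claimcounter equal to 2, and printed in full.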
Every single bit of the database file can be accessed by the functions. Besides this set of functions that return database fields, there are about 20 other auxiliary functions that return data from auxiliary files such as the description of the codes, price information, etc. These functions, combined with about 20 built-in TARILAN functions to do format conversion, input/output, function manipulation, etc., enable the user to express data processing rules in a very natural way.

The following example illustrates how naturally reimbursement rules can be specified in TARILAN. The aim of this program is to print one line per patient, containing the name of the patient and the amount to be reimbursed. Since we want to know the total amount, we start by creating a function sum to keep the amount. When a new patient is started, this function is reset to zero. There is no work to be done at the claim level. At the level of the individual expenses (codes), sum is increased by the amount of money to be reimbursed, which is returned by the function amountreimbursed. The definition of this function is straightforward. All identifiers that were not defined earlier are functions that return particular fields of the database record being processed (code, patient, amountpaid). Notice how easy it is to understand the definition of the function amountreimbursed. This definition is very close to the application domain, and is therefore also comprehensible to the users of the system who are not programmers.

    beginfile.
    endfile.
    beginpatient := define(sum, 0).
    endpatient := write('Sum for patient %s is %d\n\n', patient, sum).
    beginclaim.
    endclaim.
    begincode := define(sum, sum + amountreimbursed).
    endcode.

³ Technically speaking, this is not exactly true as claimcounter is incremented per extra iteration.


    amountreimbursed := 0 :- transportation(code).
    amountreimbursed := amountpaid :- dentalcare(code) & (age(patient) < 18).
    amountreimbursed := amountpaid/2 :- dentalcare(code).
    amountreimbursed := amountpaid :- prescriptiondrugs(code).
    amountreimbursed := amountpaid :- otherdrugs(code) & veteran(patient).
    amountreimbursed := 100 :- otherdrugs(code).
    amountreimbursed := min(amountpaid, 10000) :- hospitalcare(code).

    transportation(C) := (C >= 23000) & (C < 24000).
    dentalcare(C) := (C >= 30000) & (C < 40000).
    prescriptiondrugs(C) := (C < 10000) & (C mod 10 = 0).
    otherdrugs(C) := (C > 10000) & (C <= 20000).
    hospitalcare(C) := (C >= 60000) & (C < 70000).

4. Implementation and performance

The TARILAN language and the event engine are implemented in Turbo Pascal 6.0, and comprise about 4000 lines of Pascal code (comments included). It took about one week to design and implement the system from scratch to a fully working state. The system is a language interpreter. Hence, it starts by loading a TARILAN source (which can be a multifile program as there are file inclusion primitives). The program is scanned, parsed and stored in an internal data structure which makes extensive use of the object oriented features of Turbo Pascal 6.0. Every program item, even variables and constants, is stored as an object, which might not be very efficient, but facilitated the implementation. Evaluating a function now just means evaluating the object in which it is stored, and this object will automatically evaluate all the other objects it depends on.

An additional feature of this implementation is that (multifile) programs can be merged into one file, and scrambled. This allows TARILAN programs to be distributed freely without giving away the know-how that is stored in the files. Indeed, in contrast to a compiled program, a TARILAN program is distributed as a source file, and as a result of its simple syntax and semantics, it can be understood and modified even by non-experts.

The current performance of the prototype implementation varies between 100 and 2000 records processed per second (on a 66 MHz 80486 processor), depending on the complexity of the TARILAN program. This means that it takes about 10 s to 3 min to process 1 MB of data.
Given the moderate size of the data files that are processed in this stage, this is an acceptable speed, and several times faster than a comparable program in a commercial database language on the same platform. Hence, we did not actually trade speed for flexibility; we have both at the same time. A better implementation (e.g., compilation to intermediate code (Aït-Kaci, 1991), or compilation to native code, combined with optimizations) can certainly improve the performance of the prototype, although as a data processing application, it will always be input/output bound.

5. Related work

Although the idea of combining a kind of declarative programming language with a more general environment in order to customize it is not new (ranging from setting a number of environment variables, as is the case in operating systems, to full-fledged programming languages for e.g. the emacs editor), we are aware of only one author who advocates a similar approach (Poo, 1992) for a comparable application (defining business policies in a public library). There, a distinction is made between objects (the database of records), the functional requirements (the built-in methods accessing the database), and the business policies (comparable to the TARILAN program), aiming at the same objectives as TARILAN does.


We believe however that TARILAN goes beyond Poo's proposal in several ways. First of all, we formally define a language to express the business rules, whereas in (Poo, 1992) an ad hoc mix of rules and functions is used. Secondly, since we are using a functional (declarative) language with some (but limited) metaprogramming capabilities, there is no need to make a distinction between the areas identified in (Poo, 1992) (object actions, pre-action constraints, action sequence policies, object attributes, condition derivatives and computational derivatives), as they can all be expressed in one uniform framework. Finally, our work is being used in practice by a couple of companies, and it is not clear from (Poo, 1992) whether this is more than just a proposal.

The languages most closely related to TARILAN are string processing languages such as Awk (Aho et al., 1988), A* (Ladd and Ramming, 1995), Perl (Wall and Schwartz, 1992), or Icon (Griswold and Griswold, 1990). These languages allow file transformations to be specified in a concise way. Although these languages can in principle also be used to process database files, their intended use is more in the processing of files with a free format. Furthermore, as soon as multiple files are involved, the scripts one needs to write also become quite complicated, especially when one has to access large files that have to be indexed. Functionally, Perl is closest to TARILAN. It too allows most of the file processing routines to be hidden. Even at the syntactic level, statement modifiers could be used to simulate the structure of the clauses of TARILAN. The major disadvantage of Perl for inexperienced computer users is however the fact that long scripts are often difficult to read.

6. Conclusion

TARILAN is an embedded functional data processing language. The aim of the language design was to create a specialized language for the processing of insurance claims.
Therefore, we borrowed features from functional and logic programming, but no more than was needed to do the processing. Our approach combines the best of two worlds. On the one hand, the file input/output and database manipulations are done efficiently by the event engine, in an imperative language (Pascal). This part is not likely to change often during the lifetime of an application. On the other hand, everything that is related to the processing is expressed in a declarative language. This part might need several modifications a year. The user has access only to the TariLan program, which is expressed in the language he/she is familiar with, namely the language of the application domain. TariLan has now been in use for more than two years, and has proved to be a very useful tool in practice.

Acknowledgements

The author is a research associate with the Belgian National Fund. This work has been carried out in collaboration with the Belgian Pharmaceutical Association (APB). The author would like to thank J. Van Campenhout for support and advice. He is also indebted to P. Meuwissen for introducing him to the field of medical tarification, and to the numerous users of the program for their feedback.

References

Aho, A., Kernighan, B., Weinberger, P., 1988. The AWK Programming Language. Addison-Wesley, Reading, MA.
Aït-Kaci, H., 1991. Warren's Abstract Machine, A Tutorial Reconstruction, Series in Logic Programming. MIT Press, Cambridge, MA.
Algemene Pharmaceutische Bond, 1997. Tarief APB der Farmaceutische Specialiteiten. APB, Brussels.
Burge, W., 1975. Recursive Programming Techniques, The Systems Programming Series. Addison-Wesley, Reading, MA.
Date, C., 1986. An Introduction to Database Systems, vol. 1, fourth edition. Addison-Wesley, Reading, MA.
De Bosschere, K., 1996a. TariLan. ELIS Technical Report PARIS 96-05, Vakgroep Elektronica en Informatiesystemen, Universiteit Gent.
De Bosschere, K., 1996b. Tarivite: An application of dynamic views in a constrained committed choice deductive database. International Journal of Mini and Microcomputers 18 (2), 61-67.
Deransart, P., Ed-Dbali, A., Cervoni, L., 1996. Prolog: The Standard. Springer, Berlin.
Griswold, R., Griswold, M., 1990. The ICON Programming Language, second edition. Prentice-Hall, Englewood Cliffs, NJ.
Gudeman, D., De Bosschere, K., Debray, S., 1992. jc: An efficient and portable sequential implementation of Janus. In: Apt, K. (Ed.), Proceedings of the Joint International Conference and Symposium on Logic Programming, Washington, DC. MIT Press, Cambridge, MA, pp. 399-413.
Ladd, D., Ramming, J., 1995. A*: A language for implementing language processors. IEEE Trans. Software Eng. 21 (11), 894-901.
Lloyd, J., 1995. Declarative programming in Escher. Technical Report CSTR-95-013, Department of Computer Science, University of Bristol.
Poo, C.-C., 1992. A framework for software maintenance. In: CAiSE 1992, Lecture Notes in Computer Science, vol. 593, Manchester, pp. 88-104.
Sterling, L., Shapiro, E., 1986. The Art of Prolog: Advanced Programming Techniques, Series in Logic Programming. MIT Press, Cambridge, MA.
Wall, L., Schwartz, R., 1992. Programming Perl. O'Reilly & Associates, Sebastopol, CA.


Koen De Bosschere was born in Belgium in 1963. He received degrees in Electrotechnical Engineering and Computer Science from the University of Gent, in 1986 and 1987, respectively. He obtained his Ph.D. from the same university in 1992 (Multi-Prolog, a blackboard-based logic programming language). Since 1996, he has been a research associate with the Belgian National Fund and a part-time senior lecturer at the University of Gent, where he teaches courses on computer architecture, operating systems and non-procedural programming languages at the Faculty of Applied Sciences. His research interests include logic programming, systems programming, parallelism and parallel debugging. Koen De Bosschere is a member of IEEE, ACM, and the Koninklijke Vlaamse Ingenieursvereniging (KViv).