Information and Software Technology 40 (1998) 463–474
VC t: a formal language for the specification of diagrammatic modelling techniques
J. Artur Serrano a, Ray Welland b,*
a Departamento de Electronica e Telecomunicacoes, Campus Universitario, 3810 Aveiro, Portugal
b Department of Computing Science, University of Glasgow, Glasgow G12 8QQ, UK
Received 18 July 1997; received in revised form 16 July 1998; accepted 20 July 1998
Abstract
In this paper we propose a language to produce formal high-level specifications of modelling techniques based on diagrammatic notations such as Dataflow Diagrams, State Transition Diagrams or the Entity-Relationship technique. The language, called VC t, is based on set theory and a form of predicate logic. It is aimed at expressing the semantics of modelling techniques. Because the language is formal, the specifications written in it are unambiguous. Complete VC t specifications have been produced for a number of modelling techniques. The specifications are used as the input to a system for the automatic generation of design tools. The system includes a parser for the VC t language and a code generator. Each generated tool is dedicated to a specified modelling technique, providing both diagrammatic and semantic support. © 1998 Published by Elsevier Science B.V. All rights reserved.
Keywords: Software modelling techniques; Software design methods; CASE tools
* Corresponding author.
0950-5849/98/$ - see front matter © 1998 Published by Elsevier Science B.V. All rights reserved. PII: S0950-5849(98)00081-0

1. Introduction

The overall goal of our research is to build a system that automatically generates configurable tools supporting software design notations, such as dataflow diagrams and entity-relationship diagrams. In order to generate such tools automatically we need to start with a precise and unambiguous description of a design notation and its semantics. The main purpose of this paper is to describe our approach to designing and evaluating a language for writing such descriptions.

Diagram-based conceptual Modelling Techniques (MTs) such as Dataflow Diagrams (DFD) or the Entity-Relationship technique (ER) are normally described in natural language. These descriptions can be ambiguous and imprecise and, as a result, it is difficult to implement software tools supporting MTs based on such descriptions. Our approach is to define a special-purpose formal specification language called 'VC t', where 'VC' stands for Visual Concepts, which formally specify the constructs of a MT as sets, and 't' stands for textual.

VC t is a simple language; this was a major requirement for its design. We managed to achieve simplicity mainly because the scope of the language is well delimited, as we are only interested in specifying the semantics of MTs. VC t descriptions are concise, clear and easy to read.

Several studies [1, 2] show that some companies find it more profitable to tailor standard MTs to suit their own application domain than to use general purpose commercial MTs and supporting tools. We designate these specially tailored MTs as Application Specific Modelling Techniques (ASMTs). Because they are based on informal descriptions of MTs, they inherit the problems of those descriptions and would therefore benefit from being formally described. This is another potential application area for the VC t language.

The VC t language was tested on a number of well known MTs. Complete specifications were obtained for the following MTs: DFD and State Transition Diagrams (both defined in Ref. [3]) and ER [4]. The former two techniques are used to provide dynamic viewpoints of the universe of discourse, whereas the latter is a semantic data modelling technique. These examples demonstrate the expressive power of the language, although some further development of the language is required to handle partitioning of diagrams, such as multi-level DFDs.

The language is aimed at the automatic generation of software design tools to support the specified MTs. This means that VC t specifications can be parsed to automatically generate the code implementing the MT design tools. A compiler has been designed and implemented for the VC t language. The compiler is divided into a front-end and a back-end. The front-end parses the specification and checks its correctness at the lexical, syntactic and semantic levels. MT design tools are automatically generated by the back-end, which produces executable code for the chosen target language. The scope of this paper is principally the presentation and discussion of the language; its implementation is only briefly described.

In the next section of this paper we discuss the language design principles for VC t; this is followed by an example of the use of the language to describe a DFD. We then discuss the formal aspects of the language and briefly describe the overall framework for tool generation. In the related work section we discuss other work in the area and compare our approach with the major alternatives. Finally, we summarise our work and discuss possible future developments.

2. VC t language design

The language is not intended to be a general purpose one. We believe that a simpler language, which produces clearer and more readable specifications, can be obtained if its scope is well delimited. The following requirements guided the language design:

• it must be expressive enough to capture the semantics of a range of widely used MTs such as State Transition Diagrams (STD), DFD or ER;
• the specifications produced with the language must be usable as input to a parser, and automatic code generation must be possible from any legal specification;
• the language has to be more than a theoretical exercise: simplicity and usability of the language are desirable properties.

The VC t language is able to express the semantics of MTs. This is done exclusively by using semantic constraints. A semantic constraint is a rule, expressing part of the semantics of the MT under specification, that can be checked for validity. In the VC t language, semantic constraints are written using a form of predicate logic with equality.
A specification of a MT, obtained with the VC t language, includes two main sections: the 'Preamble', where all the sets and their properties are declared; and the 'Semantic Constraints', where each constraint is specified by a statement in predicate logic. In the specification, a natural language description is associated with each constraint. The natural language descriptions will be used later, in the design tool automatically generated to support the MT. They will appear as messages in the user interface of the tool, giving semantic feedback to the user during the design task. These descriptions also add legibility to the specification.

The semantics of a MT are orthogonal to the particular representation chosen, i.e. multiple graphical notations can be defined for a given MT without changing its semantics. Hence, aspects related to the graphical notation are intentionally left out of the language. This has contributed to the simplicity of the language.

Specifications obtained with the VC t language are, at the same time, formal descriptions of MTs and high-level descriptions of the software tools generated to support them.

Descriptions obtained with the VC t language present major advantages to the user (footnote 1) when compared with natural language ones:

• Because they are formal they have a well defined meaning and consequently they are precise and unambiguous.
• The VC t language has been developed explicitly to express MTs; being specialised, the descriptions obtained with it have a more consistent structure and terminology than those obtained with a highly general-purpose language such as natural language.
• In the case of the ASMTs, the specification task may have an additional benefit: uncovering possible faults or inconsistencies in their semantics.

Note that what is said about natural language in the second point also applies to those formal specification languages which can be classified as general purpose, meaning that they are able to describe a wide range of systems, such as 'Z' [5].

The advantages stated above are important, but they can be totally negated by a single disadvantage: lack of readability. It is important that the specifications can be well understood and used as a means of sharing information within a team. This will also allow them to provide an insight into the supporting tools. We want the user to be able to obtain a specification easily and quickly. That is only achievable if the language is quick to learn.
We must bear in mind that the language users will be MT designers (or will at least have a fair knowledge of software modelling) but will not necessarily have expertise in formal methods or a deep mathematical background. So, we have tried to reduce the complexity of the formal aspects of the language to the necessary minimum. Formality has been employed only as a mathematical tool to guide the design of the language, never as an end in itself. The main goal is to obtain a language that may be used by someone who is already able to specify a MT using natural language; the use of a specialised formal language should not make the task of specifying a MT more complex.
3. A sample specification: Dataflow Diagrams

A detailed non-formal description in natural language (English) of Dataflow Diagrams (DFD) will be presented. Afterwards, an explanation of how to obtain a formal VC t specification from this natural language description will be given.

For this sample specification we have deliberately chosen a very basic DFD notation consisting of the four symbols shown in Fig. 1, each symbol having only one associated label. Since the graphical symbols are not part of the VC t specification it would be possible to instantiate the same VC t specification with a different set of symbols, such as the DFD symbols used in SSADM. However, in practice, there are some other variations in semantics which would necessitate changes in the VC t specification.

Fig. 1. DFD graphical representation.

Footnote 1: In this context, the term 'user' designates the designer of the modelling techniques ('MT designer').

3.1. Semantics definition in natural language

A DFD is, structurally, a connected graph in which the nodes and edges are graphically represented, respectively, by icons and connections.

A DFD includes four distinct constructs: Process (or Transform), Datastore, External Entity and Dataflow. The first three constructs are represented by icons and the last one is represented by a connection. The graphical representation of DFD is shown in Fig. 1 and an example is given in Fig. 2.

Fig. 2. DFD example.

The semantics of DFD can be written in natural language as follows.

Each Process has a unique name and must have at least one input Dataflow and one output Dataflow which are different.

Each Datastore has a unique name and must have at least one input Dataflow and one output Dataflow; all Dataflows connected to a Datastore at one end must be connected to a Process at the other.

Each External Entity has a unique name and must have at least one Dataflow, either input or output; all Dataflows connected to an External Entity at one end must be connected to a Process at the other.

A Dataflow is named, not necessarily uniquely, and connects two different icons of the diagram; no self-loops are allowed. If two icons are connected by two or more Dataflows in the same direction then these Dataflows must have distinct names.

All elements on a DFD must be named; no anonymous elements are allowed.

A DFD must include at least one Process. Every DFD must include at least one External Entity which provides input to the system and at least one External Entity to which output is directed; these External Entities need not be distinct.

A DFD must be connected; i.e. there should be no isolated icons or disconnected partitions of the diagram.

3.1.1. Assumptions

The following simplifications to the notation have been made:

• diagrams are assumed to be single level without any explosion of Processes;
• the identifier of a Process is a single string; usually it is split into a serial number and a description;
• the convention that allows duplicate copies of either External Entities or Datastores to be drawn for topological reasons has not been considered.

A complete definition of DFD is given in Ref. [3].

3.2. The VC t specification

The VC t specifications can be used as input to a compiler. Because that compiler is not able to parse mathematical symbols, such as $\forall$ or $\Rightarrow$, an alternative notation based on English lexemes is used. The correspondence between the mathematical notation and the English based one, for the symbols used in this section, is presented in Table 1.

Table 1
Correspondence between the mathematical and English based notations

Mathematical notation | English based notation
$\neg$ | NOT
$\exists$ | EXISTS
$\forall$ | FORALL
$\Rightarrow$ | IMPLIES
$\Leftrightarrow$ | DOUBLEIMPLICATION
$\in$ | BELONGING
$\land$ | AND
$\lor$ | OR
$\#$ | CARDINALITY

The Lexical Analyser of the VC t compiler accepts, for most lexemes, multiple alternatives. For instance, 'FORALL' may also be written 'UNIVERSAL', 'P' may be written 'POWERSET', 'BELONGING' may be 'MEMBERSHIP', 'IN' or 'E'.

In this section the English based notation is used, so that the resulting VC t specification may be compiled. In the remainder of this paper, the English based notation will be used in situations relating to a VC t specification which is meant to be used as input to the compiler; in all other situations the mathematical notation will be used.

The first aspect of DFD we must specify is its concept structure: the constructs and their properties. For this purpose we will write the 'PREAMBLE' section (see Fig. 3).

Fig. 3. VC t specification of DFD (part 1).

We must declare DFD as a Cartesian product of the power sets of the VCs which specify the MT's constructs. This is done in section 'A.MT_MODEL'. A variable of type DFD is also declared.

Now, we separate the power sets that are represented by icons from the ones that are represented by connections, and declare extractors to isolate each one of them from the MT. In section B1 we declare the extractors for the power sets represented by icons. In section B2 we declare the extractor for the only power set represented by connections. In the same way that there are multiple icons in this specification, it is also legal to have multiple connections. Icons and Connections are formally equivalent. This means that they are specified similarly. They differ only in their graphical implementation; icons are mapped as images whereas connections are represented as line styles of variable length.

In section 'C.SETS_DEFINITIONS' we can declare auxiliary sets as expressions composed of those declared in section B, and the corresponding extractors. Auxiliary sets simplify the writing of some constraints. Without them, the specifications would become clumsy and difficult to read.

Next, in section 'D.SET_PROPERTIES', the properties of the VCs are expressed. It is important to notice that VCs represented by connections will usually have a property that returns the icon connected at its start, and another returning the icon connected at its end. The equality property is expressed to allow for comparisons.

Now, we must express the semantics already described informally above, using constraints formally written in a logic-based style. For this purpose we will write the 'SEMANTIC_CONSTRAINTS' section (see Fig. 4). For clarity, we have only specified the constraints we find most illustrative. For the remaining constraints only the description in natural language is given.

It is important to write meaningful descriptions of the constraints in a specification because they will be used to give feedback to the user of the generated design tool. The constraint numbering must be 'C$:' where '$' takes a sequential integer value starting at 1.

C1 is a name uniqueness constraint. 'FORALL' is the Universal Quantifier; it binds the variables p1 and p2 ranging over the elements of the VC Process; p1 and p2 belong to the set extracted by the function Processes.
We are then saying that for all possible pairs of Processes in the diagram, for instance (p1, p2), if the name of p1 is equal to the name of p2, then p1 and p2 must be the same Process. This kind of constraint is so common in MTs that we have defined a predicate to simplify it: uniqueName(extractor). This predicate has been used in C3.

'EXISTS' is the Existential Quantifier. C2 expresses that a Process must have at least one input and one output Dataflow. Moreover, there must be a Dataflow that leaves the Process and a different one that reaches the Process; it cannot be the same Dataflow leaving and reaching the Process, i.e. it cannot be a self-loop Dataflow. Note that constraint C7 also expresses that no self-loops are allowed. The fact that we have overlapping constraints does not constitute an error in the specification. It would only be an error if they were conflicting constraints (over-specification).

Every specification must define, either implicitly in several constraints or explicitly in one single constraint, the minimum diagram that can be obtained and still makes sense. In this example the minimum diagram is expressed by C8, C9 and C10.

C11 is a connectivity constraint. This kind of constraint cannot be expressed in plain VC t. The solution to this problem is provided in the form of an auxiliary function: 'Connect'. This is a pre-defined function that, given a DFD icon, returns the set of DFD icons that are connected to one another and that includes that DFD icon. The algorithmic definition of this function is given in Section 4.

The specification ends with a dot at the beginning of a line after the 'END' of the Constraints Section.

4. The formal aspects of VC t specifications

It is vital that the specifications have a formal basis to make them unambiguous. The formal aspects are presented using the mathematical symbolism proposed by Ref. [6]. In what follows, the examples given refer to the DFD specification shown in Figs 3 and 4 (Section 3.2).

A VC t specification has two main sections: the Preamble, where the Modelling Technique (MT) and its VCs are specified; and the Semantic Constraints, which consists of a sequence of predicate logic sentences. The formal and mathematical details of each section are presented in what follows.

A specification of a MT starts with an identifier (its name) followed by the reserved word 'SEMANTICS_SPECIFICATION'. For simplicity, we will omit the reserved words in the remainder of this section.

4.1. The Preamble section

The Preamble expresses the MT in terms of its VC types and declares all the sets that will be used in the specification. It comprises four sub-sections: the MT model, set extractor declarations, set definitions and set properties.

To specify a MT and its VC types we have employed elementary set theory. The MT model is the Cartesian product of the power sets of all its VC types. A generic MT is expressed as:

$MTX = \mathbb{P}\,VCt_0 \times \mathbb{P}\,VCt_1 \times \dots \times \mathbb{P}\,VCt_n$

where MTX is an identifier, called the name of the MT; $VCt_0$ to $VCt_n$ are the VC types defined for the MT.
A variable of type MTX is also declared to be used in the Semantic Constraints section. As an example, DFD has its model specified as:

$DFD = \mathbb{P}\,Process \times \mathbb{P}\,Datastore \times \mathbb{P}\,ExternalEntity \times \mathbb{P}\,Dataflow$

VAR dfd : DFD

In the set extractors sub-section, power sets are divided into the ones that are represented by icons and the ones that are represented by connections. This is not a closed classification; at the moment our diagrams only contain icons and connections, yet, in the future, other diagram constructs may be added, e.g. object inclusion. It is necessary to specify which power sets are represented by icons and which are represented by connections, so that the code generator can establish the mapping from each diagram component (the icons and connections) to the correct underlying abstract graph components (nodes and edges).

Fig. 4. VC t specification of DFD (part 2).

For each power set, an extractor is defined. An extractor is a triple $\langle mtd, ext, vcr \rangle$, where: mtd is the MT for which the extractor is declared, called the domain MT; ext is an identifier, called the name of the extractor; vcr is a power set, called the range VC. So, for the MT named MTX we have:

$ext_i : MTX \rightarrow \mathbb{P}\,VCt_i, \quad i \in \{0..n\}$
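To make the set-theoretic reading concrete, the following Python sketch mirrors the DFD model, two extractors and constraints C1 and C2 as they are described in prose in Section 3.2. It is only an illustration under stated assumptions: Figs 3 and 4 are not reproduced here, so the class names, the function names `c1_unique_process_names` and `c2_process_has_distinct_input_and_output`, and the use of Python sets are our own choices for this sketch and are neither VC t syntax nor the code produced by the compiler.

```python
from dataclasses import dataclass, field


# eq=False keeps identity-based equality, so two distinct icons may share a name.
@dataclass(eq=False)
class Process:
    name: str


@dataclass(eq=False)
class Datastore:
    name: str


@dataclass(eq=False)
class ExternalEntity:
    name: str


@dataclass(eq=False)
class Dataflow:
    name: str
    source: object       # icon connected at the start of the connection
    destination: object  # icon connected at the end of the connection


@dataclass
class DFD:
    """One value of the model: a tuple of four sets, one per VC type."""
    processes: set = field(default_factory=set)
    datastores: set = field(default_factory=set)
    external_entities: set = field(default_factory=set)
    dataflows: set = field(default_factory=set)


# Extractors: functions from the MT to one of its component sets.
def Processes(dfd):
    return dfd.processes


def Dataflows(dfd):
    return dfd.dataflows


def c1_unique_process_names(dfd):
    """FORALL p1, p2 in Processes(dfd): name(p1) = name(p2) IMPLIES p1 = p2."""
    return all(
        p1 is p2
        for p1 in Processes(dfd)
        for p2 in Processes(dfd)
        if p1.name == p2.name
    )


def c2_process_has_distinct_input_and_output(dfd):
    """Each Process has an outgoing Dataflow and a different incoming one."""
    return all(
        any(
            f1.source is p and f2.destination is p and f1 is not f2
            for f1 in Dataflows(dfd)
            for f2 in Dataflows(dfd)
        )
        for p in Processes(dfd)
    )


user = ExternalEntity("User")
check = Process("Check order")
store = Datastore("Orders")
dfd = DFD(
    processes={check},
    datastores={store},
    external_entities={user},
    dataflows={Dataflow("order", user, check),
               Dataflow("valid order", check, store)},
)
print(c1_unique_process_names(dfd))                   # True
print(c2_process_has_distinct_input_and_output(dfd))  # True
dfd.processes.add(Process("Check order"))             # second Process, same name
print(c1_unique_process_names(dfd))                   # False
```

The nested generator expressions play the role of the nested quantifications: `all` corresponds to FORALL and `any` to EXISTS, each ranging over the set returned by an extractor.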
For the Process power set of DFD, the elements of which are represented by icons, the following extractor has been defined under 'B1.ICONS':

$Processes : DFD \rightarrow \mathbb{P}\,Process$

A VC is a maximal set in the sense that its values may belong to just that VC type. No subtyping is allowed amongst VCs. However, it is possible to specify auxiliary sets in the set definitions sub-section, as a union of VC sets. For instance, for DFD the following set was defined:

$DFDelement = Process \cup Datastore \cup ExternalEntity \cup Dataflow$

Auxiliary sets can also be specified in extension, which is useful, for example, in the specification of pre-defined label strings. For instance, cardinality labels for the ER technique could be specified as:

$Cardinality = \{\text{``1,1''}, \text{``1,n''}, \text{``n,m''}\}$

Auxiliary sets aim at simplifying the constraints in the Semantic Constraints section. See, for example, constraint C11.

Properties are declared for each VC in the set properties sub-section. Properties are the constrainable components of VCs. They are used in the predicate logic statements of the constraints expressed in the Semantic Constraints section. For instance, the following statement has been used in constraint C7 of the DFD specification:

$\neg\,(source(f) = destination(f))$

where 'f' is an instance of 'Dataflow'; 'source' and 'destination' are properties of 'Dataflow' declared in the set properties sub-section.

Set properties are defined as triples $\langle vcd, prop, tr \rangle$, where vcd is a set or a power set, called the domain; prop is an identifier, called the property name; tr, called the range, can be a VC type, a pre-defined type (String, Natural or Boolean) or an auxiliary set. For example, the equality property used in the predicate logic statement above has been defined for the DFD 'Process' as:

$equal : Process \times Process \rightarrow Boolean$

4.2. The Semantic Constraints section

A semantic constraint is a rule expressed as a sentence in a form of predicate logic with equality. The constraints can be divided into two groups: the instantiated predicate logic statements and the quantified predicate logic statements.

An instantiated predicate logic statement is simply a constraint where the predicates have no variables (excluding the variable declared in the MT model, e.g. 'dfd' in the DFD specification). This means that the predicates are not quantified. Examples of this kind of constraint are C3 and C10.

A quantified predicate logic statement is a constraint where the predicates use quantification of variables. This means that all the variables in the statement must be bound by quantifications. Quantifications can be existential or universal. Nested quantifications are allowed with any number of levels, and each quantification can be either an existential or a universal one.

An existential quantification is specified as:

$\exists x : T \bullet x \in S \land P(x)$

where x is a variable bound by the existential quantification; T and S are sets such that S is a subset of T; P(x) denotes any boolean expression with one or more predicates on the variable x. Likewise, a universal quantification is specified as:

$\forall x : T \bullet x \in S \Rightarrow P(x)$

Constraint C7 is a quantified predicate statement with a universal quantification.

N-ary predicates, denoted by $P(x_1, x_2, \dots, x_n)$ with the variables $x_1, x_2, \dots, x_n$ ranging over the same or different VC types, are also allowed. As for unary predicates, all the variables must be bound by quantifications, e.g.

$\forall x : T \bullet x \in S \Rightarrow \exists y : Q \bullet y \in R \land P(x, y)$

In this situation we say that there are nested quantifications as in, for instance, constraint C2. The variables in the constraints, bound by the quantifications, may range over one VC type or a union of several VC types.

Short-cuts can be defined for the specification of constraints [6]:

$\exists x : T \bullet x \in S \land P(x)$ can be written as $\exists x : S \bullet P(x)$
$\forall x : T \bullet x \in S \Rightarrow P(x)$ can be written as $\forall x : S \bullet P(x)$

A predicate logic statement is a boolean expression including the usual boolean operators, i.e. 'negation' ($\neg$), 'implication' ($\Rightarrow$), 'double implication' ($\Leftrightarrow$), 'and' ($\land$), 'or' ($\lor$). A boolean expression can include one or more natural number expressions combined by the following natural number comparison operators: 'greater' ($>$), 'less' ($<$), 'at least' ($\geq$), 'at most' ($\leq$); 'equality' ($=$) is allowed over sets and the pre-defined types String and Natural.

4.3. Conclusions

The language proved to be able to capture most of the constraints defined by the semantics of several standard MTs covering static aspects of software (data modelling) and dynamic aspects (process modelling). However, we do not claim that the language is able to express all the semantics. For example, a constraint asserting that a diagram must be connected cannot be expressed in plain VC t. To achieve that expressiveness, one of the main advantages of the VC t language, its simplicity, would have had to be compromised. Therefore, we have provided the possibility of including function calls in the specifications. These pre-defined functions constitute a library that extends and completes the VC t language. For the code generation process, implementations of the library functions must be provided in the target language. Three of these functions have already been defined: 'Reach(i)', 'Connect(i)' (where 'i' is an icon) and 'uniqueName(extractor)'. The last two have been used in the DFD specification.

The algorithmic definition of 'Connect' for DFD is shown in Fig. 5. The function is defined recursively. The result of the function is the union of the given icon with the result of the function applied to each icon which is connected with it by a dataflow (independently of the direction of the dataflow; this is expressed by the boolean connective $\lor$). That is, for each icon it finds out which other icons are connected with it by a dataflow, and the function is applied recursively to each of the connected icons. The result will be the set of all icons found.

Fig. 5. Algorithmic definition of the function 'Connect'.

'uniqueName' is a boolean function, also applied to the whole diagram, in the sense that it performs its complete traversal, but it checks only the VCs determined by the extractor given as a parameter; it returns 'true' if all the names are different.

While the first two functions add expressiveness to the language, as they capture a class of constraints that the VC t language is unable to capture, the 'uniqueName' function only simplifies the specifications. The uniqueness of names given to the objects in a diagram (for instance, ER entities) can be expressed in the VC t language (see, for example, constraint C1 in the DFD specification), but this constraint is so common that a function replacing the full specification in VC t becomes very useful.

5. Implementing the VC t language

5.1. The compiler

Fig. 6. The architecture of the VC t compiler.

A compiler has been built for the VC t language [7]. Its construction was aided by the software tools Lex and YACC [8, 9]. A BNF formal description of the VC t syntax was used as input to YACC to produce the parser. The implementation language used by these tools is 'C' [10].
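As noted in Section 4.3, implementations of the library functions must be provided in the target language. As a rough illustration of what such implementations might look like, the sketch below gives Python versions of 'Connect' and 'uniqueName', together with a connectivity check in the spirit of C11. It is not the Napier88 code of the prototype, nor the definition of Fig. 5 (which is not reproduced here): the names `Icon`, `connect`, `unique_name` and `is_connected` and their signatures are assumptions, and the traversal uses a worklist with a set of visited icons instead of plain recursion so that it also terminates on diagrams containing cycles.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Icon:
    kind: str   # "Process", "Datastore" or "ExternalEntity"
    name: str


@dataclass(frozen=True)
class Dataflow:
    name: str
    source: Icon
    destination: Icon


def connect(icon, dataflows):
    """Return the set of icons linked to `icon`, ignoring dataflow direction."""
    found = {icon}
    pending = [icon]
    while pending:
        current = pending.pop()
        for f in dataflows:
            # Direction-independent: either end of the dataflow may match.
            if f.source == current or f.destination == current:
                neighbour = f.destination if f.source == current else f.source
                if neighbour not in found:
                    found.add(neighbour)
                    pending.append(neighbour)
    return found


def unique_name(elements):
    """True if no two of the given elements share a name."""
    names = [e.name for e in elements]
    return len(names) == len(set(names))


def is_connected(icons, dataflows):
    """C11-style check: every icon is in the component of an arbitrary icon."""
    icons = set(icons)
    if not icons:
        return True
    start = next(iter(icons))
    return connect(start, dataflows) == icons


user = Icon("ExternalEntity", "User")
check = Icon("Process", "Check order")
store = Icon("Datastore", "Orders")
audit = Icon("Process", "Audit")  # not attached to any dataflow
flows = [Dataflow("order", user, check),
         Dataflow("valid order", check, store)]

print(connect(store, flows) == {user, check, store})     # True
print(is_connected([user, check, store], flows))         # True
print(is_connected([user, check, store, audit], flows))  # False
print(unique_name([check, audit]))                       # True
```

'Reach(i)' is left out of the sketch because its definition is not given in the text.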
The languages used in the compiler are, therefore: VC t, the source language; C, the implementation language; and Napier88 [11], the target language (see Fig. 6).

The VC t compiler's model comprises two main blocks: a front-end and a back-end. The front-end is composed of the lexical analyser and the parser; it performs the analysis phase of the compilation process, i.e. it translates the source program (VC t specification) into an intermediate representation. The parser also performs semantic analysis, consisting of type checking and enforcing scope rules. The back-end is the code generator; it performs the synthesis phase of the compilation, i.e. the target code is generated from the intermediate representation.

In this model any component of the compiler related to the target language, which in the implemented prototype is Napier88, is restricted to the back-end. The main advantage of this model, apart from facilitating testing and maintenance of the compiler, is that it produces a retargetable compiler, i.e. the parser is independent of the target language used and therefore fully portable to other target platforms. The current implementation uses the Napier88 persistent programming language as its target. To make the compiler use a different target language, e.g. Java or C++, it would only be necessary to replace the back-end.

The output of the compilation of a VC t specification of a particular MT is an executable program in the target language (Napier88 in our prototype). The automatically generated code implements a software design tool that supports the specified MT. This tool is briefly described below.

5.2. Applying the VC t language for the generation of MT design tools

After the compiler's front-end has successfully parsed a VC t specification, the code generator (compiler's back-end) produces a visual and interactive design tool supporting the semantics of the underlying MT. For a particular MT, a visual language to be used within the generated design tool is also automatically generated. This visual language is also constraint based. In the visual language, a diagram is composed of Visual Objects (VOs), which can be icons or connections. Each VO has associated with it a set of constraints. Constraints are increasingly being used to specify the graphical layout and behaviour of an application [12, 13].
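The idea that each VO carries its own set of constraints, and that the natural language description of a violated constraint is shown to the user as feedback, can be pictured with a small Python sketch. This is a hypothetical illustration only: the classes `Constraint` and `VisualObject`, the `violations` method and the dictionary of properties are assumptions made for the example and do not describe the generated Napier88 code.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Constraint:
    description: str  # natural language text taken from the specification
    holds: Callable[["VisualObject"], bool]


@dataclass
class VisualObject:
    kind: str         # e.g. "Process" or "Dataflow"
    properties: dict
    constraints: List[Constraint] = field(default_factory=list)

    def violations(self) -> List[str]:
        """Descriptions of the constraints this object currently breaks."""
        return [c.description for c in self.constraints if not c.holds(self)]


no_self_loop = Constraint(
    "A Dataflow must connect two different icons",
    lambda vo: vo.properties["source"] != vo.properties["destination"],
)
flow = VisualObject(
    "Dataflow",
    {"name": "order", "source": "User", "destination": "User"},
    [no_self_loop],
)
print(flow.violations())  # ['A Dataflow must connect two different icons']
flow.properties["destination"] = "Check order"
print(flow.violations())  # []
```

Keeping the description next to the executable check is what allows a tool to report a violation in the words chosen by the MT designer.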
In the visual language context, a constraint is a rule on the properties of VOs that can be checked for validity. These constraints are executable versions of the semantic constraints written in the corresponding specification. Executable constraints are used to determine the behaviour of a VO at design time, i.e. how it acts individually and how it interacts with other VOs. This way, a VO is more than just a visual representation: it has a specific behaviour. All aspects related to the visual language are fully discussed in Ref. [14].

6. Related work

6.1. Evolution from the ECLIPSE design editing system

The design editing system we will examine in this section was developed as part of the ECLIPSE system [15]. A requirement for ECLIPSE was the ability to support multiple methods. It is here that the ECLIPSE design editor (ECLIPSE-DE) can be compared with our approach. We too intend to achieve a generic approach which is applicable across a range of methods (or modelling techniques).

6.1.1. The description language

The language used in the ECLIPSE-DE to describe a design method is called GDL (Graph Description Language) [16, 17]. In the same way as our VC t language, GDL is used to describe those methods which exhibit a 'node and link' graph structure. Both nodes and links are typed; they are distinguished by their visual representation. GDL allows new types to be derived from pre-defined ones, thus giving rise to a type hierarchy.

One of the requirements that guided the design of the VC t language was 'simplicity'. An immediate advantage is the ease of use of the language. It is also easier to reason about a simple, clear and non-ambiguous specification. GDL is a very expressive language, but we believe 'expressiveness' should not compromise 'simplicity' in a context where the goal is to facilitate the development of design tools, not make it more difficult.

The specifications obtained with GDL seem far more complex than those obtained with VC t. Two possible reasons for this can be pointed out. The first one is that VC t is supported by a very simple formal mathematical basis, namely elementary set theory is used to specify the MT constructs and a form of predicate logic with equality is used for the semantic constraints. An explicit formalization of GDL is not given in the literature.
The second reason is that the main purpose of VC t is to express the semantics of a MT, whereas GDL indiscriminately expresses semantic aspects; layout information, e.g. how labels are placed in a diagram (inside a box, below it, etc.); rules to ensure the production of good quality designs (related to the metrics provided by some design methods); and project-specific rules, such as the maximum number of characters allowed in a label.

6.1.2. Compilation of GDL descriptions and code generation

The output of the compilation phase of a GDL description consists of a number of tables stored in files. These tables
are used to drive the generic design editor at execution time. The GDL compiler does not generate an intermediate representation (such as a syntax tree). This approach provides no independence from the underlying implementation platform (target system); the output tables are directly used to drive the generic design editor.

We believe that our system has achieved considerable progress in both the language and its implementation (the compiler and code generator) when compared with the ECLIPSE-DE system.

6.2. Picture Specification Notation (PSN)

A paper presented at the European Software Engineering Conference (ESEC) [18] describes a language for the specification of MTs called PSN (Picture Specification Notation). PSN is to be used within a prototype of a software tool building system. That system includes a graphic editor which is driven by PSN specifications of MTs in a way that guarantees syntactic correctness.

In the PSN based approach, the graphical notation `G' of a MT is seen as having three components: lexical, which denotes the symbols used in G; syntactic, the rules governing the combination of symbols in the production of a diagram; and semantic, ``which denotes the meaning attributed to each syntactically valid picture in G''. The paper does not explain how the semantic information is used in the system.

This definition differs from the one used in our approach, in which the syntactic level dictates the valid geometrical relationships between any graphical objects, e.g. ``a shape can only be connected to a line style from its perimeter, not from its centre'', or ``shapes cannot overlap''. These syntactic rules are independent of the type of the shapes, e.g. `Processes' in DFD or `Entities' in ER, and of the type of the line styles, e.g. `Dataflows' in DFD or `Relationships' in ER. The semantic level corresponds to the rules the MT imposes on its concepts, e.g. ``a DFD must have at least one External Entity which provides input to the system''.
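The separation between the two levels can be illustrated with a minimal Python sketch. It is not VC t and is not taken from the generated tools; the classes `Box`, `Node` and `Dataflow`, the function names, and the bounding-box representation of shapes are all assumptions made for illustration. The syntactic check looks only at geometry, whereas the semantic check looks only at the MT concepts attached to the nodes.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Box:
    """Geometry only: an axis-aligned bounding box (hypothetical representation)."""
    x: float
    y: float
    width: float
    height: float


@dataclass(frozen=True)
class Node:
    name: str
    kind: str   # MT concept, e.g. "Process" or "ExternalEntity" (assumed labels)
    box: Box


@dataclass(frozen=True)
class Dataflow:
    name: str
    origin: Node
    destination: Node


def boxes_overlap(a: Box, b: Box) -> bool:
    """True if the two bounding boxes share interior area."""
    return (a.x < b.x + b.width and b.x < a.x + a.width
            and a.y < b.y + b.height and b.y < a.y + a.height)


def no_shapes_overlap(nodes) -> bool:
    """Syntactic rule "shapes cannot overlap": never inspects node.kind."""
    return not any(boxes_overlap(p.box, q.box) for p, q in combinations(nodes, 2))


def has_input_external_entity(dataflows) -> bool:
    """Semantic rule: some External Entity sends a dataflow into the system.

    "Into the system" is read here as "to a node that is not itself an
    External Entity"; this reading is an assumption of the sketch.
    """
    return any(f.origin.kind == "ExternalEntity"
               and f.destination.kind != "ExternalEntity"
               for f in dataflows)


# Usage: one External Entity feeding one Process.
customer = Node("Customer", "ExternalEntity", Box(0, 0, 10, 10))
handle_order = Node("Handle order", "Process", Box(20, 0, 10, 10))
order = Dataflow("order", customer, handle_order)

print(no_shapes_overlap([customer, handle_order]))
print(has_input_external_entity([order]))
```

The same `no_shapes_overlap` function would apply unchanged to an ER diagram, while `has_input_external_entity` is meaningful only for DFD.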
We believe this two-level structure is neater in that it provides a clear separation between the geometrical relationships, which are valid for all notations, and the semantic aspects, which are characteristic of each notation.

PSN specifies the rules of a notation, i.e. the syntax according to their definition. The alphabet (symbols) is defined separately using an interactive editor. We use a similar approach: the symbols (shapes and line styles) are defined using a graphical objects editor, while the rules (constraints) are specified in the VC t language.

An important aspect of PSN is its ability to express refinement. An object can be refined into a diagram and a logical connection is always maintained between the two. This is possible even when the diagram is expressed in a notation different from that used for the object. Our formalism does not include such a feature at the moment. We do intend to
extend the formalism to make it able to capture this kind of object refinement (or explosion, as we call it).

In our approach, executable code is automatically generated from VC t specifications of MTs. PSN specifications are translated into a format that can be interpreted; they cannot be translated or compiled into a widely used programming language, and the system is therefore not portable across implementation platforms. PSN is a very expressive formal language, but because of its expressiveness it is also too complex for automatic code generation, i.e. the designer is able to write specifications from which the generation of code would be virtually impossible. For example, the `functions' section in a PSN specification allows the designer to specify, in algorithmic form, the computations that must be used in the rules. This seems very difficult for a code generator to support. For all its expressiveness, PSN was not designed for the purpose of code generation. In our approach we require an expressive formal specification language, but one which can also be used in the automatic generation of executable code.

The implementation of the system using PSN was still under way at the time the paper was written, and we have been unable to find any further references to the continuation of this work.

6.3. Metamodelling community

The work done by Cooper on Configurable Data Modelling Systems [19] provided the motivation for the development of our formalism to describe MTs, although it follows an approach that is more system oriented (the system uses a toolkit of high level modelling primitives) than metamodel based. The toolkit includes a set of data modelling primitives and a set of user interface primitives. Specific data models can be composed, by a user, out of these primitives. For any data model, a user interface can also be built. The system also allows for the specification of constraints which are part of the data model definition.
A good overview of concepts and systems for metamodelling is presented in Ref. [20]. Blaha also addresses metamodelling in an introductory way [21]: the OMT object model notation is used to obtain a number of restricted metamodels of some widely known MTs.

Most of the research currently being done in method specification is more related to method integration than to tool generation. A language for the definition of a variety of data models, called MDL (Model Definition Language), is proposed in Ref. [22]. The goal of this metamodelling approach is the translation of a scheme from one model to another. The models must have been previously defined in MDL. One of the interesting points is the possibility of automatically generating a Schema Definition Language (SDL) from the corresponding MDL definition (unfortunately not presented in detail in the paper). In Ref. [23] the integration of the diagrams resulting from the various editors provided by a CASE tool, e.g. Entity Relationship,
Dataflow Diagrams or Structure Charts, is discussed. A framework, using a new construct named ViewPoints, for the development of systems requiring the use of multiple methods (which include notations and development strategies) is given in Ref. [24].

A formal language able to express constraints is presented in Ref. [25]. The language, called LISA-D (Language for Information Structure and Access Descriptions), is a formal extension of RIDL (Reference and IDea Language). LISA-D is based on the conceptual modelling technique PSM (Predicate Set Model), an extension of PM (Predicate Model), which in turn is a formalization of NIAM. The paper claims that ``a conceptual data modelling technique should not only be capable of representing complex structures but also rules (constraints) that must hold for these structures''. It is said that LISA-D is (in principle) also applicable to other object-role modelling techniques such as ER or FDM. Models obtained must generally satisfy complex rules imposed by the MTs. To capture such rules, a powerful constraint modelling technique is required. For that purpose the approach uses both the graphical representation of constraints in PSM, e.g. total role or uniqueness constraints, and the constraint modelling language LISA-D for constraints that cannot be expressed graphically (which, according to the authors, are in fact the majority).

The work of the team led by Lyytinen is done in the context of the development of a CASE shell called MetaEdit. A CASE shell is defined as a tool that can be customised by users to support their own preferred methodologies [26]. Ref. [1] cites, as a motivation for this work, the weak support given by CASE tools to the users' native methods and methodologies (i.e. the ones they normally use). Although it takes a different approach, MetaEdit has a common goal with our work in that both support high-level specification of methods, or MTs, using an easy to use specification language.
MetaEdit is a metamodelling editor based on the OPRR (Object Property Role Relationship) data model [26] (it is in fact a meta-metamodel, for it is used to obtain metamodels of methods or MTs). OPRR offers a graphical notation with which methodology models can be constructed. MetaEdit can be used either as a CASE shell [27] or as an interface to other CASE shells, by generating their input configuration files in a (semi)automatic way [26]. In the latter situation the output generator of MetaEdit translates methodology specifications to the formats needed by the CASE shells.

The process of method adaptation is described in Ref. [27]. During this process, a formal model of the method is derived. Note that this is also done in our approach, which can therefore be considered a metamodelling approach.

Although it is possible to express some integrity rules (or constraints, as we call them) when modelling a given MT in MetaEdit, they must be very simple ones. The following rule for DFD is given as an example in Ref. [28]: ``a string property must be a dotted sequence of numbers''. This rule
forbids combinations such as `Fred' or `2.'. More complex integrity rules cannot be expressed, such as ``there cannot exist two Dataflows with the same name and in the same direction which share the same origin and destination''. The literature indicates that a desired direction for the work on MetaEdit is ``to increase the capabilities to describe integrity constraints within and between method specifications''. We believe our work has provided a positive contribution to this topic.

7. Conclusions and further work

A formal language for the specification of diagram-based MTs, named VC t, has been designed and tested. The language is formal, and therefore unambiguous, but it is also simple to use and yields easily readable specifications. Formality was employed only as a mathematical tool; the main goal was to obtain a language that could be used by someone without specialist knowledge of formal notations.

The current version of VC t is expressive enough to describe most of the widely used MTs, and we have already written specifications for a number of such MTs. However, the language does not include meta-concepts, i.e. concepts used in a MT specification, such as refinement and inclusion. Refinement occurs when a node can be expanded into further detail, for example, the explosion of a DFD process into a more detailed DFD with a consistent signature of incoming and outgoing dataflows. Inclusion means that a group of connected nodes is contained within another node in a single diagram, for example a superstate in a statechart. VC t is not yet able to describe these meta-concepts. The solution is to extend the language to make it more expressive. This must be done, however, while preserving the usability of the language. Our aim is that specifications of MTs which do not include such concepts must not become more complex than they are now.
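To make the refinement example concrete, the following Python sketch checks that the explosion of a DFD process has a signature consistent with the process it refines. It is only an illustration of the property a future extension would have to capture, not a proposal for VC t syntax; the representation of dataflows as `(name, origin, destination)` tuples and all function names are assumptions.

```python
def process_signature(process, dataflows):
    """Names of the dataflows entering and leaving `process` in the parent DFD."""
    incoming = {name for name, origin, dest in dataflows if dest == process}
    outgoing = {name for name, origin, dest in dataflows if origin == process}
    return incoming, outgoing


def boundary_signature(child_nodes, child_dataflows):
    """Names of the dataflows crossing the boundary of the child DFD.

    A dataflow crosses the boundary when exactly one of its end points
    belongs to `child_nodes` (the nodes drawn inside the explosion).
    """
    incoming = {name for name, origin, dest in child_dataflows
                if origin not in child_nodes and dest in child_nodes}
    outgoing = {name for name, origin, dest in child_dataflows
                if origin in child_nodes and dest not in child_nodes}
    return incoming, outgoing


def refinement_is_consistent(process, parent_dataflows, child_nodes, child_dataflows):
    """True if the explosion preserves the incoming and outgoing dataflows."""
    return (process_signature(process, parent_dataflows)
            == boundary_signature(child_nodes, child_dataflows))


# Usage: "Handle order" is exploded into two sub-processes.
parent_flows = [
    ("order", "Customer", "Handle order"),
    ("invoice", "Handle order", "Customer"),
]
child_nodes = {"Validate order", "Bill customer"}
child_flows = [
    ("order", "Customer", "Validate order"),
    ("valid order", "Validate order", "Bill customer"),
    ("invoice", "Bill customer", "Customer"),
]

print(refinement_is_consistent("Handle order", parent_flows, child_nodes, child_flows))
```

Comparing dataflows by name alone is the simplest possible reading of `consistent signature'; a real extension might also need to compare the external end points of each dataflow.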
High-level specifications written in the language can be compiled to automatically generate software design tools supporting the specified MT. The implementation platform used for the generated MT design tools was the Napier88 persistent programming system. The design of the compiler makes the task of porting it to another target language, for instance Java or C++, very simple. A system to generate MT design tools implemented in Java is now being planned.

We see the production of a MT design tool as an iterative process in which an initial specification is written and compiled to produce a working prototype design editor. The user can test this prototype tool, revise the specification and automatically generate a new prototype for further testing.

We are working on finding a more precise way of expressing the usage of a MT within a design tool. For example, in a DFD every process must have at least one input dataflow and one output dataflow. During the editing of a diagram, it
will be necessary to allow this constraint to be violated, at least for a short time. Can this be expressed in a specification? To specify such usage aspects, a novel theory of semantic constraints has been formulated. This theory is used to produce refinements of initial VC t specifications. The MT designer may specify usage by classifying the semantic constraints in terms of their checking and enforcement. This means that, by using only the specifications, the designer may determine how the generated tools will behave at editing time.

Acknowledgements

We acknowledge the financial support given by the Portuguese governmental institution JNICT in the scope of PRAXIS XXI and by the Portuguese research institute INESC. Many thanks to Dr Alex Bunkenburg for his expert comments regarding the formal aspects of the language. We are very grateful to the reviewers of this paper for their helpful comments.
References

[1] P. Marttiin, K. Lyytinen, M. Rossi et al., Modeling requirements for future CASE: modeling issues and architectural considerations, Information Resources Management Journal 8(1) (1995) 15–25.
[2] J.L. Wynekoop, N.L. Russo, System development methodologies: unanswered questions and the research–practice gap, presented at Fourteenth International Conference on Information Systems, Orlando, FL, 1993.
[3] D. Budgen, Software Design, Addison-Wesley, Reading, MA, 1994.
[4] P. Chen, The entity relationship model: toward a unified view of data, ACM Transactions on Database Systems 1(1) (1976).
[5] B. Potter, J. Sinclair, D. Till, An Introduction to Formal Specification and Z, Prentice-Hall, Englewood Cliffs, NJ, 1991.
[6] J. Woodcock, M. Loomes, Software Engineering Mathematics, Pitman, 1988.
[7] J.A. Serrano, Automatic generation of software design tools supporting semantics of modelling techniques, Ph.D. thesis, University of Glasgow, 1997.
[8] A.V. Aho, R. Sethi, J.D. Ullman, Compilers: Principles, Techniques and Tools, Addison-Wesley, Reading, MA, 1986.
[9] A.T. Schreiner, H.G. Friedman, Jr., Introduction to Compiler Construction with UNIX, Prentice-Hall, Englewood Cliffs, NJ, 1985.
[10] B.W. Kernighan, D.M. Ritchie, The C Programming Language, 2nd ed., Prentice-Hall, Englewood Cliffs, NJ, 1988.
[11] R. Morrison, F. Brown, R. Connor et al., The Napier88 Reference Manual Release 2.0, University of St Andrews, Research Report CS/94/8, 1994.
[12] B.V. Zanden, B.A. Myers, D. Giuse, P. Szekely, The importance of pointer variables in constraint models, presented at UIST, 1991.
[13] A. Borning, The programming language aspects of ThingLab, a constraint-oriented simulation laboratory, ACM Transactions on Programming Languages and Systems 3(10) (1981) 353–387.
[14] J.A. Serrano, The use of semantic constraints on diagram editors, presented at 11th International IEEE Symposium on Visual Languages, VL'95, Darmstadt, Germany, 1995.
[15] F. Bott, ECLIPSE: an integrated project support environment, in IEE Computing Series, vol. 14, Peter Peregrinus Ltd, London, 1989.
[16] R. Welland, A toolbuilders guide to the ECLIPSE design editing system, University of Strathclyde, Glasgow, Research Report CS/ST/1/88, 1988.
[17] S.J. Beer, Supporting checking in a generic, graphical, software design environment, Ph.D. thesis, University of Strathclyde, Glasgow, 1988.
[18] S. Hekmatpour, M. Woodman, Formal specification of graphical notations and graphical software tools, presented at 1st European Software Engineering Conference, Strasbourg, France, 1987.
[19] R. Cooper, Configurable data modelling systems, presented at 9th Entity Relationship, Lausanne, 1990.
[20] A. Alderson, Meta-CASE technology, presented at European Symposium on Software Development Environments and CASE Technology, Königswinter, Germany, 1991.
[21] M. Blaha, Models of models, JOOP 5(5) (1992) 13–18.
[22] P. Atzeni, R. Torlone, A metamodel approach for the management of multiple models and the translation of schemes, Information Systems 18 (1993) 349–362.
[23] S. Brinkkemper, Integrating diagrams in CASE tools through modelling transparency, Information and Software Technology 35 (1993) 101–105.
[24] B. Nuseibeh, A. Finkelstein, ViewPoints: a vehicle for method and tool integration, presented at International Workshop on CASE (CASE92), Montreal, Canada, 1992.
[25] A.H.M. ter Hofstede, H.A. Proper, Th.P. van der Weide, Formal definition of a conceptual language for the description and manipulation of information models, Information Systems 18(7) (1993) 489–523.
[26] K. Smolander, K. Lyytinen, V.-P. Tahvanainen et al., MetaEdit: a flexible graphical environment for methodology modelling, presented at Advanced Information Systems Engineering 3rd International Conference, CAiSE'91, Trondheim, Norway, 1991.
[27] J.-P. Tolvanen, K. Lyytinen, Flexible method adaptation in CASE: the metamodeling approach, Scandinavian Journal of Information Systems 5 (1993) 51–77.
[28] S. Kelly, K. Lyytinen, M. Rossi, MetaEdit+: a fully configurable multi-user and multi-tool CASE and CAME environment, presented at Advanced Information Systems Engineering 8th International Conference, CAiSE'96, Heraklion, Crete, Greece, 1996.