Information and Software Technology 46 (2004) 309–314 www.elsevier.com/locate/infsof
A method for the automatic generation of test suites from object models

Alessandra Cavarra*, Charles Crichton, Jim Davies

Oxford University Computing Laboratory, Wolfson Building, Parks Road, Oxford OX1 3QD, UK
Abstract

This paper shows how object-oriented specifications, written in the Unified Modeling Language (UML), can be translated into formal, behavioural descriptions and used as a basis for automatic test generation. The behavioural descriptions are written in a language of communicating state machines: the Intermediate Format (IF). The translation from UML to IF is based upon an earlier formal semantics, written in the Abstract State Machine (ASM) notation. Descriptions written in IF can be automatically explored; the results of these explorations are test trees, ready for input to a variety of testing packages. © 2003 Elsevier B.V. All rights reserved.

Keywords: Unified Modeling Language (UML); Formal semantics; Testing; Object modelling
1. Introduction

Software systems are extremely complex; the amount of information contained in an implementation is hard to comprehend in its entirety. As we cannot test without first understanding what the implementation is supposed to do, we need a way to manage this complexity; one way of doing this is to create a suitable abstract model of the system.

The potential benefits of model-based testing are clear, but difficult to obtain: manually extracting behavioural information from an object-oriented model, and writing test suites to determine whether an implementation exhibits only acceptable behaviours, is a time-consuming and error-prone activity. Fortunately, this test generation process can be automated. This paper shows how test suites can be generated from a precise, abstract model, written in the Unified Modeling Language (UML) [9], while addressing particular combinations of test directives: test purposes, test constraints, and coverage criteria. The generation process is based in part upon a formal semantics for a subset of UML.

This paper is intended as an introduction to this approach. It explains the underlying principles, as well as the functionality of the prototypical toolset. The mathematical details of the behavioural semantics, and the translation from UML to the language of the toolset, have been omitted for reasons of space and readability: suitable references are provided.

* Corresponding author. Tel.: +44-1865-283666; fax: +44-1865-273839. E-mail address: [email protected] (A. Cavarra).
© 2003 Elsevier B.V. All rights reserved. doi:10.1016/j.infsof.2003.09.004
2. Adequate models The UML provides a set of notations designed to meet the needs of typical software modeling projects. For the purposes of this paper, we are interested in three of these notations: class diagrams, state diagrams, and object diagrams; using these, we can completely characterise the behaviour of a system at a particular level of abstraction: the class diagram identifies the entities in the system; the object diagram specifies an initial configuration; the state diagrams explain how these entities may evolve. In a class diagram, we use associations in place of data attributes referring to other classes. Associations may be annotated with roles—a name at one end reveals the (formal) name used for an object of the adjacent class, within the context of an object of the other class. An object diagram shows the state of the system at a certain point in time, as a collection of objects, each in a particular state. The state of an object may be constrained using an assertion, a state name from the corresponding state diagram, or by constraining the values of its attributes. The presence of a link between objects indicates that communication is possible. A link may be decorated with
Fig. 1. A UML state diagram.
information about roles: a name at one end of a link reveals the actual name used, within the context of the adjacent object, for the object at the other end. A state diagram shows how an object will react to the arrival of an event, by performing a sequence of actions, possibly accompanied by a transition from one named state to another. An event represents the receipt of a signal, or the effect of an operation call. An action represents the sending of a signal, or the call of an operation. A transition may be annotated with an event, a guard, and an action expression. The transition begins, or fires, with the occurrence of the trigger event. If there is a guard, it is evaluated before the action list is considered—should it prove to be false, no change in state will take place; in a sense, the transition is cancelled. If there is no guard, or if the guard is true, then the exit actions of the source state are performed, followed by the actions of the transition itself, and then, finally, the entry actions of the target state. For example, in Fig. 1, if ‘event1’ were to occur in ‘State1’, then the sequence of actions performed in response would be: ‘actionB; actionC; actionD’. If two outgoing transitions are enabled at the same time—either because they are both labelled with the same event, or because neither requires an event, and both guards are true—then either may fire: in this way, state diagrams can be used to represent non-deterministic behaviour. To produce an adequate model, we must ensure that there is a state diagram for each class; the identifiers used in each state diagram are all declared within the class diagram; there is a signal class for each and every signal that may be sent or received.
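The firing rule described above (evaluate the guard; then perform the exit actions of the source state, the actions of the transition itself, and the entry actions of the target state) can be sketched as follows. This is a minimal illustration in Python, not part of the toolset; the state and action names are hypothetical, chosen to mirror Fig. 1.

```python
# A minimal sketch of the UML transition firing rule described above.
# State names and actions are hypothetical, mirroring Fig. 1.

class State:
    def __init__(self, name, entry=None, exit=None):
        self.name = name
        self.entry = entry or []   # entry action list
        self.exit = exit or []     # exit action list

class Transition:
    def __init__(self, source, target, event, guard, actions):
        self.source, self.target = source, target
        self.event = event
        self.guard = guard         # predicate over the object's attributes
        self.actions = actions     # the transition's own action list

def fire(machine_state, transition, event, attrs):
    """Return (new_state, actions_performed) for one event occurrence."""
    if machine_state is not transition.source or event != transition.event:
        return machine_state, []
    if not transition.guard(attrs):          # guard false: transition cancelled
        return machine_state, []
    performed = (machine_state.exit          # exit actions of the source...
                 + transition.actions        # ...then the transition's actions...
                 + transition.target.entry)  # ...then entry actions of the target
    return transition.target, performed

# Reproducing the example of Fig. 1:
state1 = State('State1', exit=['actionB'])
state2 = State('State2', entry=['actionD'])
t = Transition(state1, state2, 'event1', lambda a: True, ['actionC'])
_, actions = fire(state1, t, 'event1', {})
print('; '.join(actions))   # actionB; actionC; actionD
```

Note that a false guard leaves the machine in its source state with no actions performed, matching the "cancelled" behaviour described in the text.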
3. Semantics 3.1. The IF language The Intermediate Format (IF) was developed as a machine-readable interchange language: a staging point in the translation of models between higher-level specification languages and a variety of analysis tools. In IF, a system is
described as a collection of finite state machines, or processes, communicating with each other by sending signals along signalroutes. Each process is described as a list of states and transitions. The arrival of a signal at a process triggers a transition, leading to a change in state, and a sequence of actions. A signalroute is a communication medium between processes, or between processes and the environment of the system. A signalroute class describes: the set of signals that may be passed; the queueing policy (FIFO, multiset); delay properties; and reliability. A signal declaration comprises a name, and a list of parameter types.

The subset of the IF language that interests us here is described (partially) by the following incomplete grammar. The complete language includes constructs for variable and type declarations, clocks, timed transitions, and the creation and destruction of process and signalroute instances.

  system ::= SYSTEM {system-component}* ENDSYSTEM;
  system-component ::= process-decl | signalroute-decl | signal-decl
  process-decl ::= PROCESS process-id; {state}* ENDPROCESS;
  signalroute-decl ::= SIGNALROUTE signalroute-id {signalroute-option}*
                         FROM {process-id | ENV} TO {process-id | ENV}
                         WITH signal-id {, signal-id}*;
  signal-decl ::= SIGNAL signal-id (type-id {, type-id}*);
  state ::= STATE state-id {state-option}* {state-component}*
              [SAVE signal-id {, signal-id}] ENDSTATE
  state-option ::= #START | #STABLE | #UNSTABLE
  state-component ::= transition | state
  transition ::= [PROVIDED expression;]
                   [INPUT signal-id ([expression {, expression}*]);]
                   {action}* terminator
  action ::= SKIP;
           | TASK expression := expression;
           | OUTPUT signal-id ([expression {, expression}*]) [VIA signalroute-id];
  terminator ::= NEXTSTATE state-id; | STOP;
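To make the shape of this grammar concrete, the following sketch assembles the text of a process declaration conforming to it. The helper functions, and the process and signal names, are our own illustrative inventions, not part of the IF toolset.

```python
# A sketch of assembling an IF process declaration conforming to the
# grammar above; the process and signal names are hypothetical.

def if_transition(signal=None, guard=None, actions=(), next_state=None):
    """Render one transition: [PROVIDED ...] [INPUT ...] actions terminator."""
    parts = []
    if guard:
        parts.append(f"PROVIDED {guard};")
    if signal:
        parts.append(f"INPUT {signal}();")
    parts.extend(actions)
    parts.append(f"NEXTSTATE {next_state};" if next_state else "STOP;")
    return " ".join(parts)

def if_state(state_id, options=(), transitions=()):
    head = " ".join([f"STATE {state_id}"] + list(options))
    body = "\n".join("  " + t for t in transitions)
    return f"{head}\n{body}\nENDSTATE;"

def if_process(process_id, states):
    return f"PROCESS {process_id};\n" + "\n".join(states) + "\nENDPROCESS;"

print(if_process("controller", [
    if_state("stateA", ["#START", "#STABLE"],
             [if_transition(signal="signal1", guard="guard",
                            actions=["OUTPUT signal2() VIA route1;"],
                            next_state="stateB")]),
]))
```

The rendered transition places PROVIDED before INPUT, as the grammar requires.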
The actions performed within a transition are described by imperative programs. They include assignment to a local variable (a TASK action) and the output of a signal on a particular signalroute. At the end of a transition, a process instance moves to another state (NEXTSTATE state-id), or is destroyed (STOP). The IF language shares many of the features of the (informal) state machine semantics of UML [9], making it an ideal vehicle for the formal representation of our UML models. The fact that IF descriptions can be used as input to existing tools for analysis, animation, and test generation, is also important. 3.2. Translation Our translation into IF captures our semantics for this subset of UML. Given an adequate model (Section 2), we declare: an IF process for each object in the model; an IF signalroute for each link; an IF signal for each signal class; an acknowledgement signal, including a return value parameter, for each synchronous operation. To produce the process description for an object of a given class, we declare an IF state for each state in the corresponding state diagram. The state-option clause is used to indicate whether a state is stable or unstable: a process can accept an event only if the current state is #STABLE. A start state in a state diagram becomes a #START state in IF. Before we translate any other state, we transform the action and transition information as follows: we append any entry actions to every incoming transition; we prepend any exit actions to every outgoing transition; we transform any internal transition into an external self-transition. In
the Abstract State Machine (ASM) semantics [1], this is a semantics-preserving transformation. Having done this, we construct an IF transition for each transition in the state diagram. For example, a transition from stateA to stateB, triggered by the arrival of signal1, guarded by expression guard, with an action sequence actions, will appear in the declaration of stateA as

  PROVIDED guard; INPUT signal1(); actions; NEXTSTATE stateB;

If the current state is stateA, then this transition will fire whenever signal1 is received from the input buffer for the current process, provided that the guard is true. The input buffer for a process accepts values from signalroutes representing links to the corresponding object. In UML, the guard expression may contain variables whose value is determined by the input signal itself. In this case, we declare an additional, unstable state stateC, and map the UML transition to three separate IF transitions: in stateA we declare

  INPUT signal1(); NEXTSTATE stateC;

and in stateC, marked as #UNSTABLE, we declare two more:

  PROVIDED guard; actions; NEXTSTATE stateB;
  PROVIDED (not guard); NEXTSTATE stateA;

If guard is true, then the first transition is enabled; if not, then the second will take the process back to stateA. Send actions are mapped to simple OUTPUT statements: the target component of the action is used to select the required signalroute, based upon the role annotations on the associations in the class diagram, or the links in the object diagram. For example, a send action represented by the transmission of signal2 would be mapped to

  OUTPUT signal2 VIA signalroute-id;

where signalroute-id is determined by the current process-id and the target process-id. Call actions may have return values; in any case, for a synchronous operation the current process must wait until the operation has completed before moving to the next state. For each call action, we declare an additional, stable state in which the process waits for the return.
For example, if the call action represented by the output of signal3 is performed as part of a transition from stateA to stateB, with a subsequent action sequence actions, and the additional state declared has the name stateD, then the UML transition is
mapped to a pair of IF transitions: in stateA,

  OUTPUT signal3 VIA signalroute-id; NEXTSTATE stateD;

and in stateD,

  INPUT return; actions; NEXTSTATE stateB;

where signalroute-id is determined as for send actions. Finally, assignment actions are mapped to TASK statements, changing the values of local attributes: we declare a local attribute for each of the data attributes in the corresponding class.

This translation is based upon a formal semantics for UML state diagrams in the ASM notation [1,2]. This semantics covers the whole of the state diagram notation, only part of which is required here: it elucidates and elaborates upon the semantic variation points, and apparent conflicts, of the language definition. However, this semantics (and the current version of the translation) does not provide an adequate treatment of concurrent invocation of operations upon the same object. An extension of the semantics along the lines proposed in Refs. [3,4] is required. Furthermore, the ASM language supports the representation of call actions as synchronous communications between processes: in IF, we use signalroutes, just as we do with send actions. Some additional work is required here before we can claim that the semantic basis of our translation is complete.
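The splitting rule for guards that depend upon values carried by the input signal can be sketched as a source-to-source mapping. The dictionary representation of IF transitions below is our own illustrative encoding, not the toolset's.

```python
# A sketch of the mapping described above: a UML transition whose guard
# depends on the input signal becomes three IF transitions, via an
# additional #UNSTABLE state. The dictionary encoding is illustrative.

def split_guarded_transition(source, target, signal, guard, actions, fresh_state):
    """Return IF transition declarations, keyed by the state they belong to."""
    return {
        source: [
            # 1. Accept the signal (binding its parameters) and move to
            #    the unstable intermediate state.
            {'input': signal, 'next': fresh_state},
        ],
        fresh_state: [
            # 2. Guard true: perform the actions and complete the transition.
            {'provided': guard, 'actions': actions, 'next': target},
            # 3. Guard false: return silently to the source state.
            {'provided': f'not ({guard})', 'actions': [], 'next': source},
        ],
    }

mapping = split_guarded_transition('stateA', 'stateB', 'signal1',
                                   'x > 0', ['OUTPUT signal2() VIA route1;'],
                                   'stateC')
assert mapping['stateC'][0]['next'] == 'stateB'   # guard-true branch
assert mapping['stateC'][1]['next'] == 'stateA'   # guard-false branch
```

The intermediate state must be unstable so that the process cannot accept further events while the guard is being resolved.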
4. Test generation

The toolset that we are working with is being developed by the academic and industrial partners in an EU-funded research project called AGEDIS (Automated Generation and Execution of test suites for DIstributed component-based Software).

4.1. Directives

For a model of any size, the state space will be too large to explore exhaustively: exploration must be guided by a set of test directives. Such a directive may combine a test purpose, a set of test constraints, and a selection of coverage criteria. A test purpose presents an abstraction of the state space of the system, expressible as a UML state diagram with system-level transitions and attributes drawn from any object in the model. In this diagram, some states may be labelled accept or reject, indicating that further exploration is acceptable, or that the current path of exploration should be aborted.
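The role of accept and reject labels in guiding exploration can be sketched as a bounded search over a transition system: paths reaching an accept state are kept as candidate tests, and exploration is cut off at reject states. The example system below is hypothetical, not drawn from the AGEDIS toolset.

```python
# A sketch of purpose-guided exploration: paths reaching an 'accept'
# state are kept as tests; exploration is pruned at 'reject' states.
# The example transition system is hypothetical.

def explore(transitions, start, accept, reject, bound=10):
    """Return all paths (as label lists) from start to an accept state."""
    tests = []
    def dfs(state, path):
        if state in reject:          # abort this path of exploration
            return
        if state in accept:          # a complete behaviour worth testing
            tests.append(path)
            return
        if len(path) >= bound:       # depth bound, for finiteness
            return
        for (src, label, dst) in transitions:
            if src == state:
                dfs(dst, path + [label])
    dfs(start, [])
    return tests

# Hypothetical system: s0 --a--> s1 --b--> ok, and s0 --c--> bad
system = [('s0', 'a', 's1'), ('s1', 'b', 'ok'), ('s0', 'c', 'bad')]
print(explore(system, 's0', accept={'ok'}, reject={'bad'}))  # [['a', 'b']]
```

The depth bound stands in for the more sophisticated termination strategies a real generation engine would use.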
Test constraints describe additional restrictions based upon particular global states, expressible as UML object diagrams. There are four common kinds of test constraint: we may constrain the state of the system at the start of a test; we may insist that, at some point during a test, the system should be in a particular state; we may insist that at no point during a test does the system enter a particular state; and we may constrain the state of the system when the test terminates. Only tests that satisfy these constraints will be generated.

Coverage criteria are expressions in which the variables are object attributes from the model. The collection of tests generated should be sufficient to ensure that, for every possible value of the expression, there is a state of the system, reached during some test, in which the expression actually takes this value.

4.2. Model compilation

The UML model of the system, together with the specified test directives, can be created in any of the standard UML tools, such as Rational Rose, Objecteering, Together, or Poseidon.

4.3. Test generation

The generation engine is based on TGV, a tool developed by Verimag and Irisa [10] and based on a sound testing theory [15]. This has been extended in several ways: in particular, test constraints are compiled into a simulation API, and an extended version of the system is used, incorporating a vector of observable attribute values. TGV produces tests based upon an input–output labelled transition system, representing the specification, and a test purpose: a complete automaton with accept and reject states. The former are used to select behaviours for testing; the latter to constrain the exploration of the system. The labels of the test purpose are regular expressions, matched against the labels of the specification. The tool computes a transition system based upon the product of the specification and test purpose.
It then constructs one of two objects: a test graph, consisting of all traces leading to an accept state, corresponding to a pass verdict, together with branches that might produce an inconclusive result; or an individual test case (a subgraph of the test graph) if this is necessary to avoid a conflict between an output and another action. In both cases, whenever an output is observable, the appearance of an unspecified output corresponds to a fail verdict upon the test. The current version of the tool provides output in Tree and Tabular Combined Notation (TTCN), a standard format in the telecommunications industry, but this output can be translated to produce test cases in the language of any API, whether this is C, C++, or Java. It may also be useful to provide output in the form of UML diagrams, using the sequence diagram notation.
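The product computation described above can be sketched as follows: states of the product pair a specification state with a test purpose state, and a product edge exists where a purpose label, read as a regular expression, matches a specification label. The two transition systems here are hypothetical, and the code is an illustration, not the TGV implementation.

```python
# A sketch of the product construction described above: test purpose
# labels are regular expressions matched against specification labels.
# The two transition systems below are hypothetical.

import re

def product(spec, purpose, spec_init, purpose_init):
    """Synchronous product; purpose labels are regexes over spec labels."""
    states, edges, todo = set(), [], [(spec_init, purpose_init)]
    while todo:
        (s, p) = todo.pop()
        if (s, p) in states:
            continue
        states.add((s, p))
        for (s1, label, s2) in spec:
            for (p1, pattern, p2) in purpose:
                # An edge fires only when both components can move together
                # and the purpose's regex matches the specification label.
                if s1 == s and p1 == p and re.fullmatch(pattern, label):
                    edges.append(((s, p), label, (s2, p2)))
                    todo.append((s2, p2))
    return edges

spec = [('q0', 'login', 'q1'), ('q1', 'logout', 'q0')]
purpose = [('t0', 'log.*', 't1')]           # regex matching both labels
edges = product(spec, purpose, 'q0', 't0')
assert (('q0', 't0'), 'login', ('q1', 't1')) in edges
```

In TGV the product is further analysed to extract the test graph; here we stop at the reachable product edges.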
5. Discussion

The methodology described in this paper has two features of particular interest: the use of precise UML models, with an associated formal, behavioural semantics; and the way in which test directives, also defined as UML diagrams, can be used to guide the generation process. The former makes the methodology accessible to a wide range of potential users; the latter makes it possible to generate useful tests from a complex, realistic model.

The development of the methodology, and the associated toolset, has been informed by a number of industrial experiments: at IBM Hursley Laboratories (UK); at France Telecom (France); and at imbus AG (Germany). Perhaps the most important lesson learned from the experiments was methodological, rather than technological: it proved extremely difficult for software testing professionals to create suitably abstract models of existing software. From a testing perspective, the software is often presented purely as executable code. In reverse engineering this code into models, testers tend to take the path of least resistance, and base the abstraction too closely upon the attributes of the implementation. In most cases, the resulting abstractions are quite uninformative: state diagrams tend to have a single state, with transitions labelled with large amounts of translated code. In this case, there is little advantage in using something as powerful as a model-driven test generation framework. The modeling effort required would be better expended annotating the implementation with testing methods [7], or translating the methods into a rule-based unit testing framework such as Jython [13]. The expected return on modeling investment, in terms of scalability and compositionality (the ability to generate tests for distributed systems on the basis of individual component descriptions), can be obtained only through suitable abstraction.
It is for this reason that we now recommend that the modeling process should start strictly before coding, or should be conducted separately, even where the models are intended purely for testing purposes.

5.1. Related research

The research described here has been influenced by the work on the two toolsets previously developed, separately, by the AGEDIS partners. GOTCHA [8] sets out an architecture for automated test generation and execution. The Umlaut/TGV combination [5,14] has been used to conduct experiments in generating tests from smaller UML models.

It is important to distinguish the research aims discussed here from the work of various open source projects and tool vendors. For example, the JUnit [7] framework includes a test case generator for Java; however, the basis for generation is an implementation-level description:
the framework provides a pattern, but the user must write the test code. Similarly, many UML modeling tools allow for the automatic creation of empty stub classes. These can be completed by the programmer to form the basis of a unit test. The tools can then run these tests on recompilation, providing for automatic regression testing. This work is complementary to the test generation described here.

The need for improvements in software testing infrastructure has led the Object Management Group to issue a call for proposals for a testing profile for UML. The current submission [12] aims to support an effective, efficient and, as far as possible, automated approach to model-driven testing. This will complement the influential approach of the model-driven architecture [6,11]. Unlike the methodology described here, the profile is aimed at a particular approach, namely functional black-box testing. In this approach, the system under test is not described by a separate model; instead, the test architecture package imports a complete design model. Although such a model could serve our purpose perfectly well, a model developed purely for design may contain a great deal of information that is not relevant for test generation purposes. Such a model would be unlikely to produce a useful suite of tests: the underlying state space would be too large. A test model written in the UML profile will require the a priori specification of test objectives, usually in natural language, and the corresponding test cases. In the approach described in this paper, we focus instead upon the definition of precise test directives, preferring that abstract test suites should be generated automatically.
Acknowledgements The authors gratefully acknowledge the contributions made by their partners in the AGEDIS project, and in particular the extensive feedback on language design and formalisation provided by Laurent Mounier (Verimag). They are grateful also to Jim Woodcock and Ian Craggs (IBM) for their role in initiating this research; and to IBM, for their support under the Faculty Partnership Program.
References

[1] E. Börger, A. Cavarra, E. Riccobene, Modeling the dynamics of UML state machines, in: ASMs: Theory and Applications, Springer, Berlin, 2000.
[2] E. Börger, A. Cavarra, E. Riccobene, A precise semantics of UML state machines: making semantic variation points and ambiguities explicit, in: ETAPS 2002, 2002.
[3] J. Davies, C. Crichton, Refinement and concurrency in UML, Formal Aspects of Computing, 2003.
[4] J. Davies, C. Crichton, Using state diagrams to describe concurrent behaviour, in: ICFEM 2003, LNCS, Springer, Berlin, 2003.
[5] J.-C. Fernandez, C. Jard, T. Jéron, C. Viho, On-the-fly verification techniques for the generation of test suites, in: Computer Aided Verification (CAV) 96, Springer, Berlin, 1996.
[6] D. Frankel, Model Driven Architecture: Applying MDA to Enterprise Computing, Wiley, New York, 2003.
[7] E. Gamma, K. Beck, JUnit: a regression testing framework, www.junit.org
[8] I. Gronau, A. Hartman, A. Kirshin, K. Nagin, S. Olvovsky, A methodology and architecture for automated software testing, http://www.haifa.il.ibm.com/projects/verification/, 2000.
[9] Object Management Group, Unified modeling language specification, version 1.5, http://www.omg.org/
[10] T. Jéron, P. Morel, Test generation derived from model-checking, in: CAV '99, Trento, Italy, LNCS, Springer, Berlin, 1999.
[11] A. Kleppe, J. Warmer, W. Bast, MDA Explained: The Model Driven Architecture: Practice and Promise, Addison-Wesley, Reading, MA, 2003.
[12] OMG, UML testing profile, 2003, http://doc.omg.org/ad/2003-03-26
[13] S. Pedroni, N. Rappin, Jython Essentials, 2002, see also www.jython.org
[14] S. Pickin, C. Jard, Y. Le Traon, T. Jéron, J.-M. Jézéquel, A. Le Guennec, System test synthesis from UML models of distributed software, in: D. Peled, M. Vardi (Eds.), FORTE 2002, LNCS, 2002.
[15] J. Tretmans, Test generation with inputs, outputs and repetitive quiescence, Software—Concepts and Tools, 1996.