Information and Software Technology 46 (2004) 281–286 www.elsevier.com/locate/infsof
An integrated framework for formal development of open distributed systems
Issa Traoré a,*, Demissie Aredo b, Hong Ye a
a Department of Electrical and Computer Engineering, University of Victoria, Victoria, BC, Canada V8W 3P6
b Norwegian Computing Center, P.O. Box 114 Blindern, N-0314 Oslo, Norway
Abstract
This paper contributes to the discussion on issues related to the formal development of open distributed systems (ODSs). Deficiencies of traditional formal notations in this setting are highlighted. We argue that there is no single formalism exhibiting all the features required to capture properties of ODSs. As a solution, we propose an integrated development framework that involves two notations: the Unified Modeling Language and the Prototype Verification System. We discuss the motivation for the choice of these notations, provide an overview of a CASE tool we have developed to support the proposed framework, and present a case study to demonstrate usability of our approach.
© 2003 Elsevier B.V. All rights reserved.
Keywords: Formal methods; Open distributed systems; Unified Modeling Language; Prototype Verification System; Multi-formalism; Object-orientated programming
An earlier version of this paper appeared in the Proceedings of the ACM Symposium on Applied Computing (SAC03), March 9–12, 2003, Melbourne, FL, USA.
* Corresponding author. Tel.: +1-250-721-8697; fax: +1-250-721-6052. E-mail address: [email protected] (I. Traoré).
0950-5849/$ - see front matter © 2003 Elsevier B.V. All rights reserved. doi:10.1016/j.infsof.2003.09.012
1. Introduction
Motivated by the need to model the dynamic features of object-oriented programming languages and openness in distributed applications, the study of open and dynamically extendable systems has become a very popular research area. In fact, since the late 1980s, much research within theoretical computer science has been directed towards this kind of system. The emphasis has mainly been put on semantic issues; in particular, on how such systems should be represented faithfully and fully abstracted. The emphasis in our work is not on the semantics of systems, but rather on formal system development.
On the one hand, most specification techniques supporting the development of open distributed systems (ODSs), e.g. the Unified Modeling Language (UML) [6], lack the formal semantics and the rigorous reasoning facilities necessary for the formal development of software systems. On the other hand, the existing formal development methods suffer from limitations that constrain their application to large-scale projects in industrial settings; their esoteric nature, in particular, is a major obstacle. Moreover, we are not aware of any conventional formal development method that is capable of fully handling the flexibility, extendability and dynamic features characterizing contemporary distributed systems. In RMODP [4], formal description techniques such as LOTOS, Z, SDL and Estelle are proposed for the specification of systems from various viewpoints. Yet, as pointed out in Ref. [1], these languages are only partly satisfactory.
Taking the above remarks into account, the challenge is to build a development framework, with a supporting CASE tool, that exhibits the following capabilities:
– can be grasped and used in an industrial context;
– supports description of major aspects such as openness and dynamic reconfigurability exhibited by ODSs;
– supports formal system specifications that are amenable to rigorous reasoning;
– has strong and efficient tool support.
In this context, based on an evaluation of several existing methods and CASE tools, we propose a multi-formalism approach in which we integrate existing technologies. More specifically, we propose a formal development framework and a supporting tool that are based on the UML for specification and refinement, and on the Prototype Verification System (PVS) [7] for semantic foundation and rigorous reasoning.
I. Traoré et al. / Information and Software Technology 46 (2004) 281–286
The rest of the paper is organized as follows. In Section 2, we give a brief overview of UML and discuss the rationale behind our choice of notations. In Section 3, we briefly discuss our formalization approach. Then, in Section 4, we present a case study of a network reconfiguration protocol. Finally, in Section 5, we make some concluding remarks.
2. Modeling open distributed systems in UML
The choice of the UML notation was dictated by the fact that it is built on an object-oriented paradigm and provides several capabilities, such as extension mechanisms (e.g. stereotyping) and dynamic and multiple classification, which are useful for the description of open distributed systems (ODSs). In addition, UML provides an underlying methodology for specification and refinement and a graphical notation, which contributes to communicability and user-friendliness. Very importantly, UML is an international standard for object-oriented modeling techniques.
In spite of the benefits it provides, UML has limitations in the context of the formal development of ODSs. The graphical UML constructs are not precise enough to achieve complete and formal descriptions of ODSs. For instance, Ref. [2] reports several instances of incompleteness in the static semantic model of the UML, especially concerning the definitions of the concepts of aggregation, inheritance, constraints on inheritance hierarchies and abstract operation descriptions. In order to fill this gap, the UML notation needs to be extended to achieve two main objectives:
(i) To improve the description of additional constraints on modeling elements, e.g. invariants on classes and types.
(ii) To provide formal semantics for the different constructs involved.
Currently, the first issue is generally addressed by using natural language, which results in ambiguities and misinterpretations. An alternative is to use the Object Constraint Language (OCL) [8], an assertional language that is used to specify well-formedness of the modeling abstractions provided by UML. Unfortunately, the expressiveness of OCL is relatively limited with respect to the dynamic aspects of systems and, as pointed out in Ref. [2], the semantics of OCL is not mathematically defined. Hence, in order to achieve the objectives mentioned above, we decided to use PVS as the underlying semantic foundation for our development framework.
3. Formalization of graphical OO models
Several works have attempted to provide a mathematical basis for the concepts underlying object-oriented graphical models using different approaches [2]. Some of these approaches consist of adapting or extending a novel or existing formal description technique with object-oriented concepts. Others derive a formal specification from the semi-formal (or informal) model built with existing object-oriented notations such as UML. The main problem with these approaches is that users have to deal with a certain amount of formal artifacts and, as we have already argued, this can be a barrier to their practical usability in industrial settings.
A more workable approach, which is adopted in our development framework, integrates semi-formal modeling techniques with formal methods by assigning formal semantics to the graphical modeling constructs of an existing notation. In this case, the formal ‘stuff’ is hidden behind the graphical notation: users deal only with the graphical models they develop, while the formal stuff is processed automatically at the back-end.
UML consists of nine standard diagrams; our formalization work has so far focused on only three of them, namely class, sequence, and statechart diagrams. In Section 3.1, we present a brief sketch of the formal semantic definition we proposed for UML statecharts.
3.1. Formal semantics of UML statecharts
The formalization scheme we adopted for UML statecharts defines the formal semantics as a transition system consisting of the triple (init, V, next), where init is an initialization predicate that describes the initial global states, V defines the global state in which the machine may be at a given time, and next is a global transition relation that describes the execution sequences of the underlying state machine.
A statechart diagram consists of a collection of state nodes, also called state vertices. The state nodes are related by transitions that are triggered by events and may result in the execution of a series of actions.
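As a rough illustration of the triple (init, V, next), the following Python sketch models a transition system over a small finite state set and computes its reachable states. It is only a sketch under stated assumptions: the names TransitionSystem, next_rel and reachable are hypothetical, V is taken to be a finite set of integers so that it can be enumerated, and none of this is part of the PVS formalization itself.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Set


@dataclass(frozen=True)
class TransitionSystem:
    """Hypothetical container for the triple (init, V, next)."""
    states: FrozenSet[int]                 # V, assumed finite here
    init: Callable[[int], bool]            # initialization predicate
    next_rel: Callable[[int, int], bool]   # global transition relation


def reachable(ts: TransitionSystem) -> Set[int]:
    """Collect every state reachable from an initial state via next_rel."""
    frontier = [s for s in ts.states if ts.init(s)]
    seen = set(frontier)
    while frontier:
        s = frontier.pop()
        for t in ts.states:
            if t not in seen and ts.next_rel(s, t):
                seen.add(t)
                frontier.append(t)
    return seen


# Toy instance (an assumption for illustration): states 0..4, start at 0,
# and each step adds 2 to the current state.
toy = TransitionSystem(
    states=frozenset(range(5)),
    init=lambda s: s == 0,
    next_rel=lambda s, t: t == s + 2,
)
reach = reachable(toy)
```

In the toy instance only the even states are reachable from the initial state, which is the kind of question the predicates init and next are meant to support.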
A transition is characterized by a source state, a target state, a triggering event, a guard condition, and an associated action, which is executed when the transition is fired. Hence, we define the abstract syntax of a transition as a PVS record type whose fields capture these elements.
  Transition: TYPE+ = [# source: Vertex, trigger: Event, guard: Condition,
                         effect: Action, target: Vertex #]
The main step in the formalization approach adopted in our work consists of defining a set of elementary predicates that describe properties of a system state or a system operation. We represent the concrete state V as a record type whose fields correspond to the concrete state variables. We define three categories of predicates associated, respectively, with the notions of state vertex, guard condition, and action. The predicate associated with a state corresponds to a condition that must hold for the state to be activated. The predicate associated with an action corresponds to a condition that holds after the execution of the action; it can be regarded as the postcondition of the action. The state and guard conditions are functions of the current values of the state variables, whereas the postcondition of an action is a function of both the current and the future values of the state variables.
  VC: TYPE = [# current: V, next: V #]
  vc: VAR VC; v: VAR V
  % Predicates for states, conditions, and actions
  pred: [Vertex -> PRED[V]]
  pred: [Condition -> PRED[V]]
  pred: [Action -> PRED[VC]]
In a statechart diagram, more than one state can be active at once. If a simple state is active, then all the composite states that contain it, either directly or transitively, are also active. The set of all the states that are active simultaneously defines what is called a state configuration. We define the initial state configuration initConf of a statechart as the set of all the default states involved in the diagram.
  Configuration: TYPE+ = finite_set[Vertex]
  init: PRED[V] = pred(initConf)
A transition is enabled if the event generated matches its trigger, its guard condition is true and its source state is active. An enabled transition may be eligible for firing. Firing a transition activates its target state and leads to the execution of its action. Below we define the predicates enabled and fired to describe, respectively, the enabling and firing conditions of a transition. More than one transition may be enabled within a state machine, resulting in a conflict. The set of transitions that will actually be fired in the whole state machine is a maximal set of enabled transitions that have the highest priorities and are not mutually conflicting.
  e: VAR Event; tr, tr1, tr2: VAR Transition
  a: VAR set[Transition]; v1, v2: VAR V
  enabled(e,tr,v): bool = pred(source(tr))(v) AND (trigger(tr) = e) AND pred(guard(tr))(v)
  fired(tr,v,v1): bool = pred(target(tr))(v1) AND pred(effect(tr))(vc)
      WHERE vc = (# current := v, next := v1 #)
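The enabling and firing conditions can be mirrored in a few lines of Python. This is a minimal sketch rather than the PVS semantics itself: it assumes, for simplicity, that each field of a transition already holds the predicate that pred would return for it, that the concrete state V is a dictionary of integer variables, and that the toy transition (its variables mode and count, and the event "go") is purely hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict

State = Dict[str, int]  # assumed concrete state V: variable name -> value


@dataclass(frozen=True)
class Transition:
    source: Callable[[State], bool]         # stands for pred(source(tr))
    trigger: str                            # triggering event
    guard: Callable[[State], bool]          # stands for pred(guard(tr))
    effect: Callable[[State, State], bool]  # postcondition over (current, next)
    target: Callable[[State], bool]         # stands for pred(target(tr))


def enabled(e: str, tr: Transition, v: State) -> bool:
    """Source state active, event matches the trigger, guard holds."""
    return tr.source(v) and tr.trigger == e and tr.guard(v)


def fired(tr: Transition, v: State, v1: State) -> bool:
    """Target state active in v1 and the action's postcondition relates v to v1."""
    return tr.target(v1) and tr.effect(v, v1)


# Hypothetical transition: from mode 0 to mode 1 on event "go" while
# count < 3; the action increments count.
tr = Transition(
    source=lambda v: v["mode"] == 0,
    trigger="go",
    guard=lambda v: v["count"] < 3,
    effect=lambda v, v1: v1["count"] == v["count"] + 1,
    target=lambda v1: v1["mode"] == 1,
)
before = {"mode": 0, "count": 2}
after = {"mode": 1, "count": 3}
```

With these definitions, enabled("go", tr, before) and fired(tr, before, after) both hold, while a different event or a next state that leaves count unchanged is rejected.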
  maxEnabled(a,v,e): bool = subset?(a, transitions(sm)) AND
      FORALL (tr: (a)): enabled(e,tr,v) AND
        (FORALL (tr1: (a)): NOT conflict(tr,tr1)) AND
        (FORALL (tr2 | enabled(e,tr2,v) AND NOT member(tr2,a)):
            hasPriority(tr,tr2) OR samePriority(tr,tr2))
The semantics of UML statecharts is based on the run-to-completion assumption, meaning that events are dispatched and processed one at a time. At the beginning of a run-to-completion step, a statechart is in a stable state configuration, with all the actions completed. At the end of the step, the same conditions apply as well. Before starting a run-to-completion step, a maximal set of enabled transitions is chosen non-deterministically and then fired. We define below a function called eprocess that describes event processing operations. Event processing consists of selecting and firing a maximal set of enabled transitions. In the informal semantics of UML statecharts, there are no assumptions on the order of event dequeuing; in this work we adopt a simple priority scheme based on the first-come, first-served principle. We also define the global transition relation next based on the function eprocess.
  c1, c2: VAR Configuration; st: VAR set[Transition]
  eprocess(e,v,v1): bool = EXISTS st: subset?(st, transitions(sm)) AND
      maxEnabled(st,v,e) => (FORALL (tr: (st)): fired(tr,v,v1))
  next(v1, v2): bool = EXISTS (e: (events(sm)), c1, c2):
      ((pred(c1)(v1) AND pred(c2)(v2)) => eprocess(e,v1,v2))
4. A case study
In this section, we illustrate the usability of our approach through a case study of a network reconfiguration protocol.
4.1. Summary of requirements
The IEEE 1394 tree identify protocol [3] is used by the 1394 high performance serial bus for leader election tasks. It has an open and scalable architecture that allows the addition and removal of devices and peripherals at any time. After a bus-reset, i.e. when a node is added to, or removed from, the network, all the nodes in the network have equal status and they know only the nodes to which they are directly connected. The IEEE 1394 tree identify protocol is based on a leader election algorithm that allows the election of a leader (root) that will act as the manager of the bus for subsequent phases of the IEEE 1394 protocol.
4.2. UML specifications
We describe the system by providing UML class and statechart diagrams, shown in Figs. 1 and 2, respectively. The class diagram consists of two classes: Node and Network. The class Node represents the individual nodes involved in the network. An instance of Node is characterized by a name, possibly a parent node, and three collections of nodes corresponding, respectively, to the neighbors, the actual children and the potential children. Potential children are represented by the role name pending. The class Network corresponds to the collection of nodes involved in the network. An instance of Node may be either a regular child or the manager in an instance of Network. The two associations between the two classes capture this property.
Fig. 1. Class diagram of IEEE 1394 protocol.
The statechart diagram shown in Fig. 2 describes the dynamic behavior of the Network class in terms of the messages it sends and receives. Initially, a Network object is in the initial state Init, which is entered immediately after the bus-reset. Then the election starts with the occurrence of the electLeader event, bringing the Network object to the Electing state. If a leader is elected, represented by condition c4, the object moves to the LeaderElected state, ending the statechart. If a cycle is detected, represented by condition c5, an error is reported, and the object evolves to the ErrorDetected state. The Electing state is a concurrent state whose direct substates, also called regions, describe the individual behavior of the elements, e.g. the nodes, involved in the collection underlying a Network object. The regions of a concurrent state are specified by dividing it with dashed lines.
Given i such that 1 ≤ i ≤ N, region NodeiStatus starts in a Waiting state where the corresponding node waits for a ‘be my parent’ request, represented by event beMyParent, from its neighbors. If a request is received from a neighbor that is not a child (condition c1), an acknowledgement is generated (action accept), followed by an acknowledgement of the acceptance (event confirm), and an update of the number of children of the node (action update). The update may lead to the Voting state, in case the number of neighbors that are not children is exactly 1. In that state, the node can send a ‘be my parent’ request, represented by event vote, to that neighbor. The node may also receive at the same time a ‘be my parent’ request from the same node, resulting in contention, described by state Contention. After a timeout, the node returns to the Voting state. If the request is accepted (condition c2), the node evolves to the ParentElected state, which represents the final state of the NodeiStatus region. When all the nodes but one have their parents elected, the election process is completed, and the node without any parent becomes the elected leader (condition c4).
4.3. Complementary semantics and system properties
The standard UML document [6] provides only a partial specification of a system. The UML specification produced needs to be extended by providing complementary semantics for the elementary features (e.g. states, actions, conditions) and the properties involved, using languages like the OCL [8] or any mathematical or textual language. We give some examples of complementary semantics and properties of the statechart diagram shown in Fig. 2 using OCL. The context of the expressions is a Network object, and two interacting Node objects k and n in a collection. Let us say that node k corresponds to one of the nodes whose behavior is described by NodekStatus. As an example of a guard condition, we define c1 as follows:
  c1(n: Node, k: Node): Boolean
    self.nodes->includes(n) AND self.nodes->includes(k) AND
    k.children->excludes(n) AND k.neighbours->includes(n)
As an example of an action, we specify update by the predicate predUpdate. Its outcome consists of moving the requesting node from the pending list (i.e. the list of the nodes for which a beMyParent request has been received) to the children list.
  predUpdate(k: Node, n: Node): Boolean
    k.children->includes(n) AND (n.parent = k) AND k.pending->excludes(n)
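To make the two OCL expressions concrete, the sketch below restates c1 and predUpdate as Python predicates over a small object model. The classes follow the attributes named in the text (parent, neighbours, children, pending, nodes), but their representation as Python sets is an assumption, and the update function is only one hypothetical action that happens to satisfy the postcondition; it is not taken from the paper.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass(eq=False)  # identity-based hashing, so nodes can be stored in sets
class Node:
    name: str
    parent: Optional["Node"] = None
    neighbours: Set["Node"] = field(default_factory=set)
    children: Set["Node"] = field(default_factory=set)
    pending: Set["Node"] = field(default_factory=set)


@dataclass(eq=False)
class Network:
    nodes: Set[Node] = field(default_factory=set)


def c1(net: Network, n: Node, k: Node) -> bool:
    """Guard: n and k belong to the network, n is a neighbour of k but not yet its child."""
    return (n in net.nodes and k in net.nodes
            and n not in k.children and n in k.neighbours)


def pred_update(k: Node, n: Node) -> bool:
    """Postcondition of update: n is a child of k, has k as parent, and is no longer pending."""
    return n in k.children and n.parent is k and n not in k.pending


def update(k: Node, n: Node) -> None:
    """Hypothetical action: move n from k's pending list to its children list."""
    k.pending.discard(n)
    k.children.add(n)
    n.parent = k


# Two connected nodes; b has sent a beMyParent request to a.
a, b = Node("a"), Node("b")
a.neighbours.add(b)
b.neighbours.add(a)
net = Network({a, b})
a.pending.add(b)
guard_before = c1(net, b, a)
update(a, b)
```

Before the update the guard c1 holds for the pair (b, a); afterwards pred_update(a, b) holds and c1 no longer does, because b has become a child of a.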
Fig. 2. Statechart diagram of IEEE 1394 protocol.
We also give an example of a property, named Prop1, that characterizes a Network object. Prop1 ensures that there is at most one root in the network.
  Prop1: self.nodes->forAll(p1, p2 | p1 = self.root AND p2 = self.root implies p1 = p2)
4.4. Formal analysis
In order to reason formally about the UML models, we need a formal description of the system. As we already stated, we use PVS for that purpose. More specifically, we translate the OCL specification into PVS and, based on our semantic framework, we do the same for the UML graphical specification. The two PVS specification fragments, derived from UML and OCL, are integrated into a single homogeneous PVS specification that serves as a basis for the formal verification. We have developed a supporting environment, called the Precise UML Development Environment (PrUDE), to assist specifiers in generating PVS models. PrUDE also gives the specifier the possibility to invoke the PVS tools either in batch mode or interactively. Fig. 3 shows a snapshot of property verification using the PrUDE tool. The lower window shows the log report generated by running the PVS tool in batch mode.
Fig. 3. Automatic verification of Prop1 using PrUDE.
The verification of the model is conducted by expressing system properties as PVS theorems, and then by checking them using the PVS tools. For instance, property Prop1 (cf. Section 4.3), which states that there is at most one root in the network, is expressed in PVS as follows:
  p1, p2: VAR VNode
  prop1: THEOREM (member(p1, nodes(v)) AND member(p2, nodes(v)) =>
                    (root(v) = p1 AND root(v) = p2 => p1 = p2))
By invoking the PVS theorem-prover interactively from PrUDE, we obtain the following proof of property Prop1:
  prop1 :
    |-------
  {1}   FORALL (p1, p2: VNode, v: V):
          (member(p1, nodes(v)) AND member(p2, nodes(v)) =>
            (root(v) = p1 AND root(v) = p2 => p1 = p2))
  Rerunning step: (SKOSIMP*)
  Repeatedly Skolemizing and flattening,
  this simplifies to:
  prop1 :
  {-1}  member(p1!1, nodes(v!1))
  {-2}  member(p2!1, nodes(v!1))
  {-3}  root(v!1) = p1!1
  {-4}  root(v!1) = p2!1
    |-------
  {1}   p1!1 = p2!1
  Rerunning step: (EXPAND "member")
  Expanding the definition of member,
  this simplifies to:
  prop1 :
  {-1}  nodes(v!1)(p1!1)
  {-2}  nodes(v!1)(p2!1)
  [-3]  root(v!1) = p1!1
  [-4]  root(v!1) = p2!1
    |-------
  [1]   p1!1 = p2!1
  Rerunning step: (GROUND)
  Applying propositional simplification and decision procedures,
  Q.E.D.
  Run time = 0.17 s.
  Real time = 0.22 s.
  NIL
  PVS(33):
Conducting interactive proof-checking, even from the PrUDE environment, is quite often tedious and time-consuming. The properties expressed in our framework are based on a common template. Using that general structure, we have succeeded in defining general PVS proof strategies based on the notion of configuration pairs [5]. Each strategy consists of primitive strategies, and can be used to check system properties automatically. The following is a proof strategy for a statechart:
  (defstep property-proof-strategy
    (then
      (auto-rewrite "user_defined_axiom1" "user_defined_axiom2" …)
      (skosimp)
      (expand "ConfigurationPair")
      (grind)))
The proof strategy, denoted property-proof-strategy, collects the complementary semantics (e.g. user-defined axioms) as auto-rewrite rules, then invokes the skosimp command to replace universal quantifications in the target formulas with constants. The expand command is then used to expand the configuration pair definition. Finally, the grind command, a catch-all strategy, is invoked to apply all the necessary simplifications and complete the proof. These proof strategies are implemented in PrUDE and can be invoked to check automatically any proof obligation based on our framework. If the proof fails, a counterexample is generated to trace errors in the original UML model. Fig. 3 presents a snapshot of the automatic verification of property Prop1: the property is edited using a property editor (the upper window) and then checked automatically by invoking the prover.
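For readers who want an executable intuition for the case study, the following Python sketch simulates a much simplified leader election in the spirit of Section 4.1 and checks Prop1 on the outcome. It rests on several assumptions that are not in the paper: the network is given as a list of undirected edges, nodes are processed sequentially in name order (so the contention handled by timeouts in the statechart never arises), and the function names elect_leader and prop1 are hypothetical. It is a testing aid, not a substitute for the PVS proof.

```python
from typing import Dict, List, Set, Tuple


def elect_leader(edges: List[Tuple[str, str]]) -> Tuple[List[str], Dict[str, str]]:
    """Return (nodes left without a parent, parent map) for the given network."""
    neighbours: Dict[str, Set[str]] = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    parent: Dict[str, str] = {}
    children: Dict[str, Set[str]] = {n: set() for n in neighbours}
    progress = True
    while progress:
        progress = False
        for n in sorted(neighbours):
            if n in parent:
                continue
            # Neighbours that are not yet children of n.
            candidates = neighbours[n] - children[n]
            if len(candidates) == 1:
                # Exactly one such neighbour: ask it to be n's parent.
                (p,) = candidates
                parent[n] = p
                children[p].add(n)
                progress = True
    roots = [n for n in neighbours if n not in parent]
    return roots, parent


def prop1(roots: List[str]) -> bool:
    """Prop1: there is at most one root in the network."""
    return len(roots) <= 1


# Acyclic network a - b - c: a single leader is elected.
roots_path, parent_path = elect_leader([("a", "b"), ("b", "c")])
# Cyclic network: no node ever has exactly one candidate, so no leader emerges.
roots_cycle, _ = elect_leader([("a", "b"), ("b", "c"), ("c", "a")])
```

On the acyclic example the election ends with c as the only parentless node, so Prop1 holds; on the cyclic example every node remains parentless, which corresponds to the error case signalled by condition c5.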
5. Concluding remarks
In this paper, we have presented a framework for the formal development of ODSs and an automated platform that supports the framework. One of the main objectives of our platform is to minimize the formal artifacts that users of the platform have to deal with. This in turn facilitates the industrial usability of the platform. In this respect, we have decided to use the PVS-SL as the underlying semantic foundation and not as a specification language. As a result, the user need not have in-depth knowledge of the PVS formal notation and proof system. The PVS-SL offers a very general semantic foundation and a set of powerful tools. It is highly expressive and offers several mechanisms for formal analysis. In order to enhance the automation of the formal verification process, we have defined suitable proof patterns and strategies for the kinds of properties that can be derived from our semantic model. These strategies are implemented in the current version of the PrUDE tool, and allow automatic processing of proof obligations.
References
[1] O.J. Dahl, O. Owe, Formal Methods and the RMODP, Research Report No. 261, Department of Informatics, University of Oslo, Norway, 1998.
[2] A. Evans, UML class diagrams – filling the semantic gap, Technical Report, York University, 1998.
[3] IEEE, IEEE Standard for a High Performance Serial Bus, Standard 1394-1995, August 1995.
[4] ISO-IEC JTC1/SC21/WG7, The Reference Model of Open Distributed Processing, 1995.
[5] M. Liu Yanguo, Proof Patterns for UML-based Verification, Master Thesis, ECE Department, University of Victoria, Victoria, Canada, October 2002.
[6] The OMG, OMG Unified Modeling Language Specification, version 1.3, OMG standard document, June 1999.
[7] S. Owre, N. Shankar, J. Rushby, D.W. Stringer-Calvert, PVS Language Reference, version 2.3, September 1999.
[8] J.B. Warmer, A.G. Kleppe, The Object Constraint Language: Precise Modeling with UML, Addison Wesley Longman Inc., Reading, MA, 1999.