Control in Multi-threaded Information Systems*
PABLO A. STRAUB AND CARLOS A. HURTADO
Depto. de Ciencia de la Computación
Universidad Católica de Chile
Santiago, Chile
Abstract
Information systems design has traditionally dealt with both data modeling and process modeling. Regarding process modeling, most design techniques based on structured design or object-oriented design specify possible data flows and interactions among components, but are not very precise in specifying system control flow. On the other hand, the introduction of computer networks has enabled technologies like workflow automation and EDI to coordinate collaborative work in and across organizations, allowing process re-engineering to shorten process times by introducing parallelism. The introduction of these technologies requires a precise specification of control for a parallel process. Moreover, process specifiers are not necessarily computer scientists, nor are they willing or able to learn complex languages. Several languages have been developed to specify control in workflow systems. Most languages specify control using diagrams similar both to traditional single-threaded control flow diagrams and CPM charts. These languages can represent both choice and parallelism. But this combination of language constructs, required for modeling processes, can introduce control anomalies, like useless work or even worse, deadlock. This paper is a first treatment of multi-threaded control flow in information processes. It presents common language constructs and some extensions, semantics, analysis methods, and a theory of threads of control which is used to analyze process models and to define the semantics for some of the language extensions. The formalization is given using Petri nets.
1. Introduction
   1.1 Automated Processes
   1.2 Process Model Components
   1.3 Control Anomalies
   1.4 Contents of this Article
2. Process Model Control Specification
*This work is funded in part by CONICYT through project FONDECYT 1940677.
ADVANCES IN COMPUTERS, VOL. 45
Copyright © 1997 by Academic Press Ltd. All rights of reproduction in any form reserved.
   2.1 Basic Control Constructs
   2.2 CICN: A Standard Control Language
   2.3 Advanced Control Constructs
3. Petri Nets
   3.1 Place/Transition Nets
   3.2 Free-choice Nets
   3.3 Partial Order Behavior
   3.4 Behavioral Properties and Invariants
4. The Simple Control Property
   4.1 The Control Model
   4.2 Petri Net Semantics
   4.3 Behavioral Properties
   4.4 Discussion
5. A Theory of Threads of Control
   5.1 Thread Labels and Threads
   5.2 Threads and Behavior
6. Applications of Thread Theory
   6.1 Base Model
   6.2 Exceptions
   6.3 Alternatives within a Thread
   6.4 Alternatives between Multiple Threads
   6.5 Unbalanced Connectors
   6.6 Summary: Incremental Case Composition
   6.7 Dealing with Unspecified Situations
   6.8 General Unbalanced Connectors
7. Conclusion
Appendix: Proofs of Theorems
References
1. Introduction
Organizations are distributed, interactive, parallel systems that handle incomplete and inconsistent information. This definition suggests that computer science can help to understand and support work within organizations, for parallelism, interaction, data handling, etc., are clearly in its realm. In fact, in the last decade or so the coordination of collaborative work has been automated by so-called collaborative systems, which are based on techniques loosely identified by the term Computer-Supported Collaborative Work (CSCW) [21].
The idea of process automation traces its origins back to the invention of the assembly line at the beginning of this century. Taylor's theories on rationalization of functions within organizations led to the definition of organizational processes, defined as sets of interrelated functions performed by several individuals. Only recently, with the development of inexpensive networked computers, has the possibility of automating coordination of work by many people been realized.
1.1 Automated Processes
Automated processes are usually more complex than similar manual processes because of added cases and parallelism. Automated process models usually have more cases (i.e., choices of possible execution paths) than manual processes, because manual processes, followed by intelligent people, can be changed during process execution if the need arises. In fact, most procedures from organizational procedure manuals are just a sequence of unconditionally performed steps. While most organizational processes have intrinsic parallelism due to the relative independence of some activities, manual procedures usually cannot take advantage of it, because coordination becomes too complex. Automatically coordinated processes, on the other hand, can safely handle complex parallel procedures. Another dimension of complexity that can be handled in automatically coordinated process models is the structure into models and submodels, with complex call relations, even including recursion.
Many organizational processes are fairly rigid or at least there is a preestablished set of possible choices. They are called routine processes. There are also many organizational activities that cannot easily be structured in terms of fixed procedures. They are called non-routine processes. Most processes fall between these two categories, so they are usually executed as routine processes, but may sometimes include non-routine parts. It is not possible to create a meaningful process model for a completely non-routine process. Of course, automation of routine processes is simpler than that of non-routine processes (see Table I).
Support for non-routine processes has been an active line of research and development. Some systems use messaging to handle exceptions by supporting informal communication, which is initiated once the process is off its normal course of action [S]. Another approach is to explicitly model the interactions among actors involved in the process [14].
However, those aspects pertaining to the control flow of a process during and after exception handling have not been treated in depth. In particular, when processes have parallelism not every state is acceptable because of the possibility of control anomalies, like deadlock.

TABLE I
ATTRIBUTES OF AUTOMATED PROCESSES

                      Kind of process
Attribute             Routine    Semi-routine    Non-routine
Process definition    simple     hard            impractical
Tool support          good       fair            poor
1.2 Process Model Components
To reliably automate a process, the process has to be formally defined, that is, there must be a formal procedure or process model, usually written using a graphical language. The process model is enacted (i.e., executed), creating particular processes from the same model.
Process models comprise function, data, behavior, organization, and other process features [10]. These features can be dealt with separately by a family of related process modeling languages (e.g., Kellner [10] used the Statemate family of languages [8] to model processes). A complete process model must describe not only the activities performed, but also their sequencing, parallelism, resources, organizational context, etc. Kellner [10] identifies four main perspectives of process models: function, behavior, data, and organization; these perspectives cover most process aspects.
Kellner's process perspectives are present in the representation of information processes under new automation technologies, because information processes can be modeled at different levels. In this work we describe these process models using a so-called generic model, which is based on several specific process modeling languages and techniques [7]. In the generic process language, there are four related submodels for a complete process model: control model, context-independent model, organizational model, and context-dependent model. The relationship among these models is shown in Fig. 1.
FIG. 1. The four submodels of the generic process language. [Figure labels: context-dependent model, context-independent model, organizational model; scripts, role assignment, object references.]

• Control model. The control model is basically a graph whose nodes are activities and connectors. Activities represent basic units of work: they are either atomic activities, calls to other models, or special activities. The control model is a transition diagram with constructs resembling control flow diagrams (CFD) and critical path method (CPM) charts. Control is a subset of process behavior. Process behavior is defined as “when and how they are performed through feedback loops, iteration, complex decision-making conditions, entry and exit criteria, and so forth” [10]. From the point of view of coordination theory [13], control models are the specification of causal interdependencies between activities. That is, the control model represents partial orders between the activities of the different processes for a given process model (as opposed to total orders or sequences, due to parallelism). The control model does not represent functionality or other aspects like resource sharing, timing, etc., even though all these aspects do determine actual process behavior. Like programming languages, process modeling languages include both basic control constructs (like selection, iteration, and synchronization) and more advanced constructs (like recursive calls to submodels and activity replication), some of which will be described in this chapter.
• Context-independent model. The context-independent model is an extension to the control model, which adds local data and a functional description. That is, this model adds the description of what data is passed from activity to activity, and how data is changed in the activities. This model is independent of the organizational context in which it is executed, not unlike the way in which a program written in a high-level language can be executed on different computers and operating systems.
• Organizational model. The organizational model includes classes of objects, types of objects, types of object relationships, actual objects and actual relations between objects. Each class has a defined set of operations or methods for objects of the class. There are two distinguished object classes called actors and roles and also a relationship between actors and roles. This model represents resources (people, machines, data) and resource organization (organizational structures, data structures).
• Context-dependent model. The context-dependent process model comprises the context-independent process model, the organizational model, and their relationships. This model assigns roles to activities, references to organizational objects, and scripts which call organizational object methods. Executors are related to activities. Object references are treated like data. Scripts are mainly used to fetch data from organizational objects before executing the activity and to store data in them after the activity executes; scripts are thus related to activities.
1.3 Control Anomalies
As was previously mentioned, the capacity to handle complex processes with parallelism and choice is the main difference between process execution under automatic coordination and process execution under manual coordination. However, even though parallelizing a business process might “dramatically reduce cycle times and the resultant costs” [17], the existence of both parallelism and choices might lead to anomalous behavior, like deadlock or the execution of useless activities. Thus, some process modeling languages constrain parallelism to forms (e.g., parbegin and parend) [2] that can only describe correct behavior. Alas, languages that do allow general forms of parallelism do not test for incorrect behavior, a notion that is not even well defined.
Sequential processes cannot have behavioral anomalies (except infinite loops). On the other hand, parallel processes without choice cannot have control anomalies. Thus, it is not surprising that the usual way to avoid these anomalies is by defining simple-minded process models that inhibit the natural parallelism of activities and abstract away the handling of exceptional cases. Oversimplification is especially relevant when dealing with complex processes. While there are many real-world simple business process models, a study cited in [3] on the modeling of 23 processes of the US Department of Defense included 17 process models with more than 50 activities, and 3 of them had more than 200.
There are three main approaches to find control anomalies:
(1) Build a model and then verify its properties. One way to verify is building the state space and exhaustively checking the properties (e.g., a deadlock is an inappropriate final state). Another way is finding net invariants; so-called place invariants and transition invariants can be computed by solving systems of linear equations derived from the net [12].
(2) Build a model that is correct by construction, because all grammatical rules used guarantee correct behavior. Abbati et al. [1] present a grammar that introduces parallelism using a construction similar to parbegin-parend pairs. DeMichelis and Grasso [2] annotate the activities in a defined manner to the same effect.
(3) Use only small models by abstracting models and submodels; the intent is that correct behavior is more likely if the models are simpler. For example, [9] suggests models with fewer than 10 activities.
The first approach does not explain why a particular model has control anomalies nor which specific part of the model is the culprit. The second approach works by limiting the forms of parallelism; in particular, it is impossible to model all the parallelism in a PERT chart. The third approach is just a rule of thumb that may or may not work and that inhibits parallelism.
Besides, in addition to the need of having correct control in a process model, exception handling poses the additional problem of ensuring correctness after an exception is handled, even if due to the exception the model needs to be changed at run time.
1.4 Contents of this Article
In this article we are concerned with the modeling and analysis of control, the definition of a notion of control correctness, a theoretical framework for these purposes that we call the theory of threads of control, and applications of the theory. Section 2 describes constructs for control specification languages in general and the particular language used in this article. Section 3 introduces those aspects of the theory of Petri nets that will be used in later sections. The main contributions of this work are in Sections 4 to 6. Section 4 formally defines CICN and its mapping into Petri nets, and then describes a series of control anomalies and control properties that characterize control correctness. Section 5 introduces an algebraic theory of threads of control that explains control anomalies. Two applications of the theory of threads are presented in Section 6: an incremental development method and a new advanced language construct called the unbalanced connector, which is used in the development method. Finally, Section 7 puts these results into perspective.
2. Process Model Control Specification
Most graphical languages to specify control flow have similar constructs. Thus, instead of doing a detailed language survey we will analyze language constructs on a small sample of representative languages. Language constructs are classified as either basic or advanced. While this classification might seem arbitrary at first, it is based on whether or not the semantics of a construct can be expressed using place/transition nets (as in Section 4.2) using a bounded number of nodes.
In most process modeling languages, control is expressed by a directed graph with several kinds of nodes. One such kind of node is an activity node that must be present in one form or another in all languages. For example, control flow diagrams (CFD) have three kinds of nodes: statements, two-way conditionals, and two-way joins¹ (a case-like statement needs also n-way conditionals and joins). Even though edges are not always drawn with arrow heads, they are directed and we will call them arrows; if the arrow head is not present we will assume the edge is directed from left to right.
2.1 Basic Control Constructs
In this section we present basic control constructs like sequencing, choice, introduction of parallelism, and synchronization. We also present abstraction and simple procedure calls. These constructs constitute a meaningful set in which to write control process models. Models written using only basic constructs are called basic models, although they might be very complex.
• Sequencing. In almost all languages, sequencing between activities is expressed using an arrow. Thus, an arrow from an activity A to an activity B means that B can start execution only after A has finished execution; if A is the only predecessor of B, usually the completion of A is the only precondition that enables B.² In fact, this is true of both CFD and the critical path method (CPM), as well as most other modeling languages. Arrows do not express sequencing in data flow diagrams or similar languages like SADT or IDEF0 [16]. This is a common source of confusion about the semantics of languages of this kind, which do not specify control flow, but merely constrain it.
• Choice. There are two common ways to represent choice: by special choice nodes or implicit in activities. In CFDs choice is represented by a diamond-shaped node to split control flow and the joining of two arrows in one (sometimes the arrows join in a dot). Languages like CICN use or-nodes, which might be either or-split nodes or or-join nodes, both of them drawn with clear circles. Informally, the behavior of an or-split is that control reaching the node flows to just one of the outputs, and the behavior of an or-join is that control reaching an input flows through the output.
¹ Joins in CFDs are implicitly represented as the joining of two arrows, or explicitly represented as a small filled circle.
² As far as control is concerned, i.e., abstracting away from resource utilization and other behavioral conditions.
A combined or-join-split node is present in several languages. In fact, in several languages in which activity nodes allow multiple inputs and multiple outputs, the activity itself behaves as an or-node, with implicit choice. That is, the activity is enabled by just one input and at the end it produces just one output.
• Iteration. Iteration is not a primitive construct in graphical languages. It is defined using choice constructs, so it is not discussed here, except to note that the possibility of infinite loops is not a concern in control flow: knowing which branch of an or-node is selected is a behavioral problem, but not a control flow problem. If a loop is forced to be infinite because there is no path that leaves the loop, then this is a control flow problem.
• Parallelism. Like choice, parallelism is represented either by special nodes or implicit in the activities. CICN uses and-splits to introduce parallelism (a thread of control is divided in two or more parallel threads) and and-joins to synchronize threads (two or more threads are replaced by a single thread). A combined and-join-split node is also usual. In the critical path method (CPM), for instance, all activities behave like an and-node, that is, the activity starts when all its predecessors finish, and upon completion control flows to all outputs. This semantics is very convenient for CPM charts, which express parallelism but cannot express choice, for there is only one kind of node: the activity itself.
• Simple abstraction. When processes are complex, it should be possible to abstract parts of them in subprocesses. Conversely, given a process, it should be possible to refine it adding more detail. For example, an activity might be replaced by another process. If a process P has an activity A that is refined by a process P' we say that P calls P'.
Thus, complex processes are represented by a set of processes and a calls relation, with one root process that directly or indirectly calls all others.³ If the calls relation does not allow cycles it represents simple abstraction. In other words, the abstraction is simple if there are no recursive calls. Simple abstraction has a semantics known as the copy rule [19, page 288], which is basically the semantics of macro expansion. Most commercial and experimental workflow systems that support abstraction have this semantics, and thus disallow recursion.
³ Lamb [11] recognizes several relations between program elements in addition to the calls relation, like the uses and defines relations. In process models, usually the defines relation is empty (i.e., there are no local definitions) and the uses relation is equal to calls (i.e., all uses are calls and vice versa).
This is not surprising for two reasons: the semantics is easily understood, and it is not always apparent why a process modeler might need recursion.
Even simple abstraction is not very simple when abstracting from a multithreaded process. If an activity A is refined by a process P', there must be a one-to-one mapping from the inputs and outputs of A to those of P', or else there will be dangling arcs once the copy rule is applied. In the particular case that both A and P' have a single input and a single output, the semantics is simple (e.g., CICN). When more than one input and output is allowed, abstracting control is powerful but might be misleading. There are several possible ways in which abstraction takes place: there is a fixed or semantics (e.g., Rasp/VPL, Copa), that is, activities are single-input-single-output; there is a fixed and semantics (e.g., CPM); there are both or and and semantics; or there is a general propositional semantics (e.g. P I ).
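The copy rule discussed above can be illustrated with a minimal sketch. It assumes a deliberately restricted setting that is not part of the original text: each model is a plain sequence of single-input, single-output steps, and the names `Models` and `expand` are hypothetical. A step that names another model is treated as a call and replaced by the body of the called model; a cycle in the calls relation is reported, since macro expansion would not terminate.

```python
from typing import Dict, List, Tuple

# Hypothetical encoding: a model is a list of steps. A step that names
# another model is a call; any other step is an atomic activity.
Models = Dict[str, List[str]]


def expand(models: Models, root: str, _active: Tuple[str, ...] = ()) -> List[str]:
    """Apply the copy rule: replace every call by the body of the called model."""
    if root in _active:
        # The calls relation has a cycle, so this is not simple abstraction.
        cycle = " -> ".join(_active + (root,))
        raise ValueError(f"recursive call, copy rule not applicable: {cycle}")
    flat: List[str] = []
    for step in models[root]:
        if step in models:      # a call to another model
            flat.extend(expand(models, step, _active + (root,)))
        else:                   # an atomic activity
            flat.append(step)
    return flat
```

For instance, with `{"loan": ["apply", "check", "notify"], "check": ["credit", "legal"]}`, expanding `"loan"` yields the flat sequence `apply, credit, legal, notify`. Multi-input or multi-output activities would additionally need the one-to-one mapping of arcs mentioned above, which this sketch does not attempt.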
2.2 CICN: A Standard Control Language
Information Control Nets (ICNs) are a family of models developed at the University of Colorado for information system analysis, simulation, and implementation [5]. The ICN is a simple but mathematically rigorous formalism that is similar to Petri nets. ICNs are actually a family of models which have evolved to incorporate control flow, data flow, goals, actors, roles, information, repositories, and other resources. This integrated model is well adapted to the traits of a generic process model as described in the introduction. ICNs have been studied in academia and applied in industrial workflow products.
The control ICN (CICN) is a simple, known, and representative language for the specification of control in ICN models. A CICN is a graph in which nodes are labeled as an activity, as an or-node, or as an and-node. There are two distinguished activities start and exit, without a predecessor and without a successor, respectively. Other activities have one predecessor and one successor. Usually or-nodes and and-nodes have either one predecessor and more than one successor (a split) or more than one predecessor and one successor (a join). An or-split represents a decision point. Graphically, activities are depicted as labeled ovals, and-nodes are depicted as dark circles, and or-nodes as clear circles.
As an example, consider the credit application process in Fig. 2, whose activities are explained in Table II. The execution of this process begins with the start node, and then executes activities A, B, C and D in parallel. After both C and D are completed, activity G is executed in parallel with the sequence E, F. When A, B, G and F are done, the process executes the exit node and stops.

FIG. 2. A mortgage loan process model in CICN notation. Activities are represented by ovals and and-nodes are represented by black circles.

TABLE II
ACTIVITIES OF THE MORTGAGE LOAN PROCESS

Activity    Description
start       Fill out application form
A           Verify creditworthiness of customer
B           Register life insurance
C           Set up expense account
D           Identify property to buy
E           Get legal data for property
F           Verify legal status of property
G           Appraise property and verify value is okay
exit        Notify customer of outcome
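The mortgage loan process described above can be encoded as a small graph to make its partial order explicit. The following is only a sketch: the figure is not available in this text, so the connector names `s1`, `j1`, `s2`, and `j2` (and-splits and and-joins) are assumptions inferred from the prose, as are the helper names `precedes` and `concurrent`. Because the model has no or-nodes, an activity precedes another exactly when there is a path between them.

```python
from typing import Dict, List, Set, Tuple

# Arcs inferred from the prose; connector names are hypothetical.
EDGES: List[Tuple[str, str]] = [
    ("start", "s1"),
    ("s1", "A"), ("s1", "B"), ("s1", "C"), ("s1", "D"),   # and-split
    ("C", "j1"), ("D", "j1"),                             # and-join of C and D
    ("j1", "s2"),
    ("s2", "G"), ("s2", "E"),                             # G parallel to E, F
    ("E", "F"),
    ("A", "j2"), ("B", "j2"), ("G", "j2"), ("F", "j2"),   # final and-join
    ("j2", "exit"),
]

SUCC: Dict[str, Set[str]] = {}
for src, dst in EDGES:
    SUCC.setdefault(src, set()).add(dst)


def reachable(node: str) -> Set[str]:
    """All nodes reachable from `node` by following one or more arcs."""
    seen: Set[str] = set()
    stack = [node]
    while stack:
        for nxt in SUCC.get(stack.pop(), ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def precedes(a: str, b: str) -> bool:
    """True if `a` must finish before `b` can start (no choice in this model)."""
    return b in reachable(a)


def concurrent(a: str, b: str) -> bool:
    """True if neither activity is ordered before the other."""
    return a != b and not precedes(a, b) and not precedes(b, a)
```

Under this encoding, `concurrent("G", "E")` holds while `precedes("C", "G")` and `precedes("E", "F")` hold as well, which matches the description of the process as a partial order rather than a sequence.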
2.3 Advanced Control Constructs
Like some programming languages, some process modeling languages have an array of relatively sophisticated control constructs. Among these advanced constructs there are exception handling constructs, which define complex state transitions, but whose semantics is expressible using P/T nets. We will show in Section 6 how the theory of threads provides a framework to define a very general state transition, called an unbalanced connector. In this section we briefly mention two other kinds of constructs: recursive calls between process models and replication of activities.
• Recursion. Consider negotiation processes between a provider and a requester. Using software such as the Coordinator system [14], the state of a negotiation between a person A acting as supplier and another person B acting as consumer can be maintained and transitions can be recorded by the computer. It is usual that as a result of a negotiation process the supplier requests something from another supplier C, that is, the negotiation process is intrinsically recursive. If process models are not recursive, the subordinated negotiation process will not have any formal link to the original process. Changes in the state of one of these processes might affect the other. For example, a renegotiation between B and A might trigger a renegotiation between A and C as a subprocess. With recursion, this trigger is automatic. Implementing recursion requires separate states for the execution of called models; the state of the computation of a model must be linked to the state of execution of the calling model. This is simpler than in a language such as Pascal where separate dynamic and static links are needed, because process models do not have local submodels (much as C functions cannot define local functions). On the other hand, recursion in the presence of parallelism cannot be implemented simply with a stack and a single current instruction pointer, as in most sequential programming languages. The semantics of recursion cannot be expressed using P/T nets, because the copy rule as semantics of procedure calls cannot be applied [19]. High-level nets can be used to express recursion.
• Replication. Replication of activities occurs when an activity must be performed several times in parallel, as opposed to several times in sequence. If “several” means a fixed number n of times, it can be modeled by an and-split followed by n copies of the activity followed by an and-join. If n is not fixed, a new construct is needed. In languages like Rasp/VPL replication is denoted by an icon that looks like a stack of regular activity icons. But does replication occur in practice? Yes, for instance, consider a software development process, where a set of modules needs to be implemented by possibly different people. This can be modeled by a replicated activity. Another use for replicated activities is defining a single activity performed by an unspecified number of people, like a meeting. Replication can be implemented in a workflow system by a loop that creates all required parallel processes; to synchronize these processes, an integer semaphore initialized to n might be used. Again, the semantics of replication cannot be expressed with a fixed P/T net, but can be expressed using high-level nets.
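The implementation idea for replication mentioned above (a loop that creates the parallel replicas, plus a counter initialized to n for synchronization) can be sketched with threads. This is an illustration under assumptions, not the mechanism of any particular workflow system: the function name `run_replicated` is hypothetical, and a lock-protected countdown with an event stands in for the integer semaphore.

```python
import threading
from typing import Callable, List


def run_replicated(activity: Callable[[int], str], n: int) -> List[str]:
    """Run n replicas of an activity in parallel and wait for all of them."""
    results: List[str] = [""] * n
    remaining = n                     # plays the role of the integer semaphore
    lock = threading.Lock()
    all_done = threading.Event()

    def replica(i: int) -> None:
        nonlocal remaining
        try:
            results[i] = activity(i)
        finally:
            with lock:
                remaining -= 1
                if remaining == 0:
                    all_done.set()    # the last replica enables the successor

    for i in range(n):                # the loop that creates the replicas
        threading.Thread(target=replica, args=(i,)).start()
    if n > 0:
        all_done.wait()               # the join: proceed only when all are done
    return results
```

A call such as `run_replicated(lambda i: f"module-{i}", 4)` models four people implementing four modules in parallel; the successor activity starts only after the last replica has finished. Because n is a run-time argument here, the sketch also reflects why a fixed net with n copies of the activity is not enough when n is not known in advance.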
3. Petri Nets
Petri nets are a mathematical formalism developed in the early 1960s from the work of C.A. Petri in 1962, in which he formulated a basic theory on the communication of asynchronous components of systems. Petri nets are a generalization of state-transition (ST) diagrams from automata theory. Unlike ST diagrams, the global state is defined as the union of local states which enable transitions in the net by themselves. Having a distributed state allows the expression of not only choices but also parallelism within a system.
Petri nets have been widely used to describe the semantics of many process modeling languages and workflow systems (see, e.g., [1, 5, 18, 25, 28, 29]). Their success is due to the possibility of formally analyzing and simulating dynamic aspects of systems.
There are many kinds of Petri nets. They are classified in two groups: low-level nets and high-level nets. In both groups, a system model is represented by a graph with two kinds of nodes (places and transitions) and a set of tokens. In the first group, tokens represent boolean conditions that enable the firing of transitions. Upon the firing of a transition, the tokens that enable the transition are deleted and new tokens are created, as if tokens were moving through the net. Important kinds of low-level nets are elementary nets and place/transition nets. In the second group, tokens represent not just boolean conditions, but complex data items or predicates. Important kinds of high-level nets are colored Petri nets, predicate/transition nets, and environment/transition nets.
3.1 Place/Transition Nets
Place/transition nets are adequate to model basic control constructs. However, they are not useful to model more complex forms of control like recursion and replication of activities, let alone other process model issues like functionality, timing, resource utilization, etc. In this section we will show some basic aspects of P/T nets, which will be used to base the semantics and analysis of basic control models.
A P/T net is a directed graph with two kinds of nodes, called places and transitions. A net is a triple $N = (P, T, F)$, where $P$ is a set of places, $T$ is a set of transitions, and $F$ is a set of edges from places to transitions and from transitions to places, that is, $F \subseteq (P \times T) \cup (T \times P)$. Places and transitions are called nodes. A node cannot be both a place and a transition, i.e., $P \cap T = \emptyset$. The flow relation $F$ defines for each node $x \in P \cup T$ a set of successors, denoted $x^\bullet$, and a set of predecessors, denoted ${}^\bullet x$.
Nets are drawn as follows: places are represented by circles, transitions are represented by rectangles, and the flow relation is represented by a set of directed arcs (arrows). For example, for the net in Fig. 3(a), $P = \{p_1, p_2, p_3, p_4, p_5\}$, $T = \{t_1, t_2, t_3\}$, and $F = \{(p_1, t_1), (t_1, p_3), (p_3, t_3), (t_3, p_5), (p_2, t_3), (p_2, t_2), (t_2, p_4)\}$.

FIG. 3. (a) A Petri net; (b) one of its processes.
A path in a net $N$ is a non-empty sequence $\pi = x_0 x_1 \ldots x_k$ such that $(x_{i-1}, x_i) \in F$ for $1 \le i \le k$. A net is said to be connected if every pair of nodes $(x, y)$ belongs to the symmetric, transitive, reflexive closure of $F$, i.e., $(x, y) \in (F \cup F^{-1})^*$. A net is strongly connected if for every pair of nodes $(x, y)$ there is a path from $x$ to $y$.
A P/T system is defined by a net and an initial marking or state, where a marking is a mapping from places to numbers (i.e., the count of tokens in each place). A P/T system is a pair $(N, M_i)$ where $N = (P, T, F)$ is a net and $M_i : P \to \mathbb{N}$ is the initial marking. For example, the initial marking in Fig. 3(a) is $M_i = (p_1 \mapsto 1, p_2 \mapsto 1, p_3 \mapsto 0, p_4 \mapsto 0, p_5 \mapsto 0)$. If in a marking $M$ all input places of a transition $t$ have tokens, the marking enables $t$. If a transition $t$ is enabled at a marking $M$, $t$ can fire and produce a marking $M'$ by removing one token from each input place and producing one token in each output place. A marking $M$ is reachable from a marking $M'$ if there is a sequence of firings enabled at $M'$ whose final marking is $M$. A marking $M$ is reachable if it is reachable from $M_i$.
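The enabling and firing rules can be made concrete with a small sketch. The code below is only an illustration: it encodes the flow relation of Fig. 3(a) as listed above, and the helper names (`preset`, `postset`, `enabled`, `fire`) are assumptions introduced here, not part of the original formalization.

```python
from collections import Counter

# Flow relation of the net in Fig. 3(a), as listed in the text.
F = {("p1", "t1"), ("t1", "p3"), ("p3", "t3"), ("t3", "p5"),
     ("p2", "t3"), ("p2", "t2"), ("t2", "p4")}

def preset(x):
    """Predecessors of node x under F."""
    return {a for (a, b) in F if b == x}

def postset(x):
    """Successors of node x under F."""
    return {b for (a, b) in F if a == x}

def enabled(marking, t):
    """A marking enables t when every input place of t holds a token."""
    return all(marking[p] > 0 for p in preset(t))

def fire(marking, t):
    """Remove one token from each input place, add one to each output place."""
    if not enabled(marking, t):
        raise ValueError(f"{t} is not enabled")
    new = Counter(marking)
    for p in preset(t):
        new[p] -= 1
    for p in postset(t):
        new[p] += 1
    return +new  # unary plus drops places whose count is zero

m_i = Counter({"p1": 1, "p2": 1})   # initial marking of Fig. 3(a)
m_1 = fire(m_i, "t1")               # token moves from p1 to p3
m_2 = fire(m_1, "t3")               # t3 synchronizes p2 and p3 into p5
```

Markings are stored as multisets of places, so a place that is absent from the `Counter` simply holds zero tokens.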
3.2 Free-choice Nets
In a P/T net, synchronization is represented by a transition having more than one input place, and choice is represented by a place having more than one output transition. In general, choice and synchronization could be mixed. For example, in the net in Fig. 3(a), choice at place $p_2$ is interfered with by synchronization at transition $t_3$, for choice depends on the existence of a token in place $p_3$. Free-choice Petri nets [4] form a subclass of the P/T nets, in which choice in a place is not interfered with by other parts of the system. This means that choice and synchronization do not mix, or that choices are in a sense free. A sufficient condition for free choice is that places with more than one output transition (where choices are made) are not connected to transitions with more than one input place (where synchronization occurs). While this condition is not necessary for free choice as defined by [4], it is satisfied by the systems considered in this paper. The importance of a net being free-choice is that many interesting properties can be decided in polynomial time.
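The sufficient condition above is a simple predicate on the flow relation, which the following sketch checks. The function name `mixed_pairs` and the two modified nets are assumptions made for illustration; only the net of Fig. 3(a) comes from the text.

```python
def preset(F, x):
    return {a for (a, b) in F if b == x}

def postset(F, x):
    return {b for (a, b) in F if a == x}

def mixed_pairs(F, places):
    """Pairs (p, t) where a choice place p feeds a synchronizing transition t.

    An empty result means the sufficient condition for free choice holds.
    """
    return sorted((p, t)
                  for p in places if len(postset(F, p)) > 1
                  for t in postset(F, p) if len(preset(F, t)) > 1)

PLACES = {"p1", "p2", "p3", "p4", "p5"}
FIG3A = {("p1", "t1"), ("t1", "p3"), ("p3", "t3"), ("t3", "p5"),
         ("p2", "t3"), ("p2", "t2"), ("t2", "p4")}

# Hypothetical variants: dropping either arc out of p2 removes the mix.
NO_CHOICE = FIG3A - {("p2", "t2")}
NO_SYNC = FIG3A - {("p2", "t3")}
```

For Fig. 3(a) the check reports the pair $(p_2, t_3)$, which is the interference described in the text.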
3.3 Partial Order Behavior
A sequence of enabled transitions in a system describes an execution of the net. These sequences, however, do not determine causal dependencies between transition occurrences. To describe causal dependencies, an alternative representation of executions is needed, in which an execution is a partial order of firings (as opposed to a sequence, which is a total ordering). This partial order is also described by a net, called a causal net. A causal net has neither choices nor cycles. A causal net is a place/transition net $N' = (B, E, F')$, where $B$ is a set of places, $E$ is a set of transitions, $F'$ is the flow relation, each place has at most one predecessor and at most one successor, i.e., $\forall b \in B : \#{}^\bullet b \le 1 \wedge \# b^\bullet \le 1$, and finally the net is acyclic, that is, there are no paths with two occurrences of the same node. For example, the causal net of Fig. 3(b) defines a process for the system in Fig. 3(a), where $q(b_1) = p_1$, $q(e_1) = t_1$, $q(b_2) = p_2$, $q(b_3) = p_3$, the second transition is mapped to $t_3$, and the last place is mapped to $p_5$. A process is represented by a causal net which is related to the execution of a P/T system.
Definition 3.1 (Process) A process is a tuple $\pi = (B, E, F', q)$ where $(B, E, F')$ is an acyclic place/transition net without choices, $q$ is a total function from elements of $\pi$ to elements of a P/T net $N = (P, T, F)$, and the following holds:
$q(B) \subseteq P \wedge q(E) \subseteq T$,
$\forall e \in E : q({}^\bullet e) = {}^\bullet q(e) \wedge q(e^\bullet) = q(e)^\bullet$.
The first condition relates places and transitions of the process with those in the system. The second condition implicitly defines $F'$ in terms of $F$. The initial state in $N$ before the execution of the process is defined by the number of places without predecessors in the process that correspond to each place in $N$:
$M_i(p) = \#\{ b \in B \mid p = q(b),\ {}^\bullet b = \emptyset \}$.
Likewise, the state reached in $N$ after the execution of the process is defined by the number of places without successors in the process that correspond to each place in $N$:
$M_f(p) = \#\{ b \in B \mid p = q(b),\ b^\bullet = \emptyset \}$.
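As a sketch of how the two markings are read off a process, the code below encodes a causal net for the system of Fig. 3(a). The node names `b4` and `e2` are assumed here for illustration; only the images under $q$ follow the mapping given in the text.

```python
from collections import Counter

# Causal net (B, E, F') of a process of Fig. 3(a); b4 and e2 are assumed names.
B = {"b1", "b2", "b3", "b4"}
E = {"e1", "e2"}
F_PRIME = {("b1", "e1"), ("e1", "b3"), ("b3", "e2"), ("b2", "e2"), ("e2", "b4")}
q = {"b1": "p1", "b2": "p2", "b3": "p3", "b4": "p5", "e1": "t1", "e2": "t3"}

def initial_marking():
    """Count, per system place, the process places without predecessors."""
    has_pred = {y for (_, y) in F_PRIME}
    return Counter(q[b] for b in B if b not in has_pred)

def final_marking():
    """Count, per system place, the process places without successors."""
    has_succ = {x for (x, _) in F_PRIME}
    return Counter(q[b] for b in B if b not in has_succ)
```

Here the initial marking has one token in each of $p_1$ and $p_2$, and the final marking has a single token in $p_5$, matching the firing sequence $t_1 t_3$.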
3.4 Behavioral Properties and Invariants
There are several important concepts and properties related to the behavior of a P/T system. Some of them are defined here.
• A deadlock in a system is a reachable state in which no transition is enabled. A system is deadlock-free if for any reachable state $M$ there is a transition $t$ enabled at $M$.
• A system is live if for any reachable state $M$ and for any transition $t$, there is a state $M'$, reachable from $M$, that enables $t$.
• A system is $n$-bounded if for any reachable state $M$ and for any place $p$, $M(p) \le n$. A system is bounded if there is such an $n$.
• A system is safe if it is 1-bounded.
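For small bounded systems these properties can be checked by exhaustive exploration of the reachable states. The sketch below does this for the system of Fig. 3(a); the function names and the `limit` guard against unbounded nets are assumptions of this illustration, and it is not the polynomial-time analysis available for free-choice nets.

```python
from collections import deque

PLACES = ("p1", "p2", "p3", "p4", "p5")
PRE = {"t1": ("p1",), "t2": ("p2",), "t3": ("p2", "p3")}
POST = {"t1": ("p3",), "t2": ("p4",), "t3": ("p5",)}
IDX = {p: i for i, p in enumerate(PLACES)}

def enabled(m, t):
    return all(m[IDX[p]] > 0 for p in PRE[t])

def fire(m, t):
    m = list(m)
    for p in PRE[t]:
        m[IDX[p]] -= 1
    for p in POST[t]:
        m[IDX[p]] += 1
    return tuple(m)

def reachable(m0, limit=10_000):
    """Breadth-first search over markings (tuples aligned with PLACES)."""
    seen, queue = {m0}, deque([m0])
    while queue:
        m = queue.popleft()
        for t in PRE:
            if enabled(m, t):
                n = fire(m, t)
                if n not in seen:
                    if len(seen) >= limit:
                        raise RuntimeError("state space exceeds limit")
                    seen.add(n)
                    queue.append(n)
    return seen

states = reachable((1, 1, 0, 0, 0))
dead = {m for m in states if not any(enabled(m, t) for t in PRE)}
deadlock_free = not dead
safe = all(max(m) <= 1 for m in states)
```

Under the definition above, every reachable marking without an enabled transition counts as a deadlock, including the marking with a single token in $p_5$.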
A comprehensive study of these and other properties is in [4]. The dynamic behavior of a P/T system depends upon the net structure and its initial marking. The influence of the structure is very important, because it holds under every initial marking. Moreover, the structure can be studied independently from the initial marking. Invariants are properties that hold in all reachable states of a system. Different kinds of nets have different kinds of invariant predicates over net states. In particular, functions that map places to rational numbers which depend only on the net structure have been studied extensively. These functions determine an invariant property of a system: the weighted sum of the tokens in the net is constant under all reachable states. By a slight abuse of notation, these functions are known as place invariants or S-invariants. Given a net, a place invariant is a rational solution to the homogeneous set of linear equations
$\sum_{p \in {}^\bullet t} I(p) = \sum_{p \in t^\bullet} I(p)$ (one equation for each transition $t$)
where the unknown variable I is a vector whose dimension is the number of places in the net. The fundamental property of a state invariant I is given by the following equation, which defines the conservation property described in the preceding paragraph:
$\sum_{p \in P} I(p) \times M_i(p) = \sum_{p \in P} I(p) \times M(p) = \text{constant}$
where $M_i$ is the initial marking and $M$ is any reachable marking.
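A short numeric check may help here. The sketch below verifies a candidate place invariant for the net of Fig. 3(a) and confirms that the weighted token sum is conserved along two firing sequences. The weights $I = (1, 1, 1, 1, 2)$ are an example chosen for this illustration, not taken from the paper.

```python
PRE = {"t1": ["p1"], "t2": ["p2"], "t3": ["p2", "p3"]}
POST = {"t1": ["p3"], "t2": ["p4"], "t3": ["p5"]}

def is_place_invariant(I):
    """One linear equation per transition: input weights equal output weights."""
    return all(sum(I[p] for p in PRE[t]) == sum(I[p] for p in POST[t])
               for t in PRE)

def weighted_sum(I, M):
    return sum(I[p] * M.get(p, 0) for p in I)

def fire(M, t):
    """Fire t, assuming the caller only fires enabled transitions."""
    M = dict(M)
    for p in PRE[t]:
        M[p] -= 1
    for p in POST[t]:
        M[p] = M.get(p, 0) + 1
    return M

I = {"p1": 1, "p2": 1, "p3": 1, "p4": 1, "p5": 2}   # assumed example weights
M_i = {"p1": 1, "p2": 1}

run_a = [M_i, fire(M_i, "t1")]
run_a.append(fire(run_a[-1], "t3"))                  # t1 then t3
run_b = [M_i, fire(M_i, "t2")]
run_b.append(fire(run_b[-1], "t1"))                  # t2 then t1
sums = {weighted_sum(I, M) for M in run_a + run_b}
```

Because every component of this $I$ is positive, its existence also indicates that the net is bounded, in line with the remark on S-invariants in this section.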
S-invariants are strongly related to behavioral properties of Petri nets. Among other things, the existence of an invariant in which all components are positive ensures that the net is bounded. We will see (Section 5) that a special kind of non-numeric invariant is related to notions of correctness in control models.
4. The Simple Control Property
Figure 4 shows a simple process model to approve a loan for a home. The first activity, a, is the customer’s application, then and-node x splits execution in two parallel activities (b, credit approval, and c, mortgage approval). After each activity, a choice is made (at or-nodes u and v). If both activities are successful, they synchronize in and-node y and the process proceeds to the exit node, so that the credit might be issued. Of course, not all applications are approved; if the credit is not approved the mortgage approval activity c becomes useless; in a realistic situation this activity is composed of several other (now useless) activities. Moreover, upon finishing the mortgage approval activity, and-node y will attempt (and fail) to synchronize with the credit approval activity b. On the other hand, if both the credit and the mortgage are rejected, the process will produce two tokens in the w or-node, both of which will reach the exit node (the process exits twice!).
4.1 The Control Model
A CICN is a directed graph with three kinds of nodes: activity nodes, or-nodes, and and-nodes. There are two special activities: the start node and the exit node. A node that is not an activity node is a control node.
Definition 4.1 (CICN) The control model is a directed graph $(A, O, N, start, exit, R)$, where $A$ is a set of activities, $O$ is a set of or-nodes, $N$ is a set of and-nodes, $start$ is the start node, $exit$ is the exit node, and $R$ is the flow relation. The set of all nodes is $V = A \cup O \cup N$.
FIG. 4. Example of a CICN net with a potentially anomalous loan approval process model:
Or-nodes are represented by white circles.
The following conditions hold:
• Start and exit are activities, i.e., $\{start, exit\} \subseteq A$.
• Activities and control nodes are disjoint, i.e., $A \cap O = A \cap N = O \cap N = \emptyset$.
• $R$ is a relation on nodes, i.e., $R \subseteq V \times V$.
• Start has no predecessors; exit has no successors.
• For every node $x$ there is a path from start to exit that includes $x$, i.e., $\forall x \in V : start\ R^*\ x \wedge x\ R^*\ exit$,
where $R^*$ is the reflexive and transitive closure of $R$.
The semantics of CICN can be expressed directly, without the use of Petri nets [5]. A marked CICN is a CICN along with a function $m$ from nodes and edges to the naturals, i.e., unlike P/T nets, all nodes and arcs might be marked. The initial marking has a single token in the start node. In general, marked edges enable the start of a node, and marked nodes represent activities in execution and thus enable the termination of the node. Thus, the initial marking enables the termination of the start node, which will mark the edge following the start node. Or-nodes require one token in one of the incoming edges to start; upon termination they add a token to one of their outgoing edges. And-nodes require one token from each incoming edge to start, and produce one token in each outgoing edge upon finishing. Activities require one token in their incoming edge to start, and upon termination produce one token in their outgoing edge. While not part of CICN, the most usual semantics for activities with several inputs and outputs is that of an or-node.
4.2 Petri Net Semantics
The behavior of CICN can be modeled by a P/T net by translating each of the elements in the net into part of a P/T net as in Fig. 5 and then connecting these parts. In CICN edges can be marked, hence we translate each CICN edge into two P/T net edges with a place in the middle. CICN nodes can also be marked and have two changes of marking: start and termination. Hence each node will be translated into a sort of sequence: transition-place-transition. An and-node is translated as a sequence of one transition connected to the place connected to one transition. An or-node with n incoming edges and m outgoing edges is translated into n transitions connected to a place connected to m transitions. A regular activity may be regarded as an or-node; if it has one incoming edge and one outgoing edge it translates into the same as an and-node. The start (respectively, exit) activity has a special translation, as a sort of incomplete activity, because it does not have a predecessor (respectively, successor).
FIG.5. Translation of CICN components into P/T net components.
It is not hard to see that the resulting P/T net has the same behavior as the CICN, because the correspondence between CICN markings and P/T system markings is trivial.
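Definition 4.2 (Place/transition net of a CICN model) The place/transition net corresponding to a control model $\mathcal{M}$ is the place/transition net $N(\mathcal{M}) = (P, T, F)$ where
$$
\begin{aligned}
P = {} & V \cup R \\
T = {} & \{a_s \mid a \in A - \{start\}\} \cup \{a_f \mid a \in A - \{exit\}\} && \text{[transitions of activities]} \\
& \cup \{x_o \mid o \in O, (x, o) \in R\} \cup \{o_x \mid o \in O, (o, x) \in R\} && \text{[transitions of or-nodes]} \\
& \cup \{n_s \mid n \in N\} \cup \{n_f \mid n \in N\} && \text{[transitions of and-nodes]} \\
F = {} & \{a_s \mapsto a \mid a \in A - \{start\}\} \cup \{a \mapsto a_f \mid a \in A - \{exit\}\} && \text{[within activities]} \\
& \cup \{x_o \mapsto o \mid o \in O, (x, o) \in R\} \cup \{o \mapsto o_x \mid o \in O, (o, x) \in R\} && \text{[within or-nodes]} \\
& \cup \{n_s \mapsto n \mid n \in N\} \cup \{n \mapsto n_f \mid n \in N\} && \text{[within and-nodes]} \\
& \cup \{x_f \mapsto (x, y) \mid x \in A \cup N, (x, y) \in R\} && \text{[from activities and and-nodes]} \\
& \cup \{o_y \mapsto (o, y) \mid o \in O, (o, y) \in R\} && \text{[from or-nodes]} \\
& \cup \{(x, y) \mapsto y_s \mid y \in A \cup N, (x, y) \in R\} && \text{[to activities and and-nodes]} \\
& \cup \{(x, o) \mapsto x_o \mid o \in O, (x, o) \in R\} && \text{[to or-nodes]}
\end{aligned}
$$
An example is in Fig. 6(a), which shows the place/transition system corresponding to the model in Fig. 4. This translation creates only free-choice nets [25], because the only places with more than one successor are those of or-nodes, but their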
FIG.6. (a) Place/transition net corresponding to the loan approval in FIG. 4. (b) One of the processes corresponding to the net, representing the case in which the mortgage is found acceptable but the credit is rejected.
successors have only the place as predecessor. That is, when there is choice there is no synchronization. The P/T net for a given model $\mathcal{M}$ becomes a P/T system if it has a defined initial marking.
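The translation can be sketched in a few lines of code. This is an illustration under stated assumptions: the function names `translate` and `is_free_choice`, the tuple encoding of transitions, and the two small models are all introduced here (they are not the loan model of Fig. 4), and activities are assumed to have at most one incoming and one outgoing edge.

```python
def translate(A, O, N, R, start="start", exit_="exit"):
    """Return (P, T, F) for a control model (A, O, N, start, exit, R)."""
    P = set(A) | set(O) | set(N) | set(R)        # nodes and edges become places
    T, F = set(), set()
    for x in set(A) | set(N):                    # start/finish transitions
        if x != start:
            T.add(("T", "s", x)); F.add((("T", "s", x), x))
        if x != exit_:
            T.add(("T", "f", x)); F.add((x, ("T", "f", x)))
    for (x, y) in R:
        if x in O:                               # or-node: one exit transition per edge
            leave = ("T", "out", x, y)
            T.add(leave); F.add((x, leave))
        else:
            leave = ("T", "f", x)
        if y in O:                               # or-node: one entry transition per edge
            enter = ("T", "in", x, y)
            T.add(enter); F.add((enter, y))
        else:
            enter = ("T", "s", y)
        F.add((leave, (x, y)))
        F.add(((x, y), enter))
    return P, T, F

def is_free_choice(P, T, F):
    """Every successor of a choice place has that place as its only input."""
    post = {p: {b for (a, b) in F if a == p} for p in P}
    pre = {t: {a for (a, b) in F if b == t} for t in T}
    return all(pre[t] == {p} for p in P if len(post[p]) > 1 for t in post[p])

ACTS = {"start", "a", "b", "exit"}
PARALLEL = translate(ACTS, set(), {"x", "y"},
                     {("start", "x"), ("x", "a"), ("x", "b"),
                      ("a", "y"), ("b", "y"), ("y", "exit")})
CHOICE = translate(ACTS, {"o1", "o2"}, set(),
                   {("start", "o1"), ("o1", "a"), ("o1", "b"),
                    ("a", "o2"), ("b", "o2"), ("o2", "exit")})
```

Transitions are encoded as tuples tagged with `"T"` so that they cannot collide with node names or with edge places, which are plain pairs.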
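Definition 4.3 (Place/transition system of a CICN model) A control model $\mathcal{M}$ has one place/transition system defined over the P/T net $N(\mathcal{M})$. The initial marking is
$$M_i(p) = \begin{cases} 1 & \text{if } p = start \\ 0 & \text{otherwise} \end{cases}$$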
Call semantics. The translation above does not include the possibility of assigning a whole process model to an activity, i.e. having an activity whose execution comprises the execution of another process. This implies that there is a hierarchy of models. The semantics for simple calls can be expressed by the copy rule as in programming languages [19], that is, the process model hierarchy is flattened before its conversion into Petri nets. Another possible translation for a call can be developed as a refinement of Petri nets. In that case, the structure of calls at the process model level is kept and there is a mapping between Petri nets. Figure 7 represents how an activity $a$ in a model $\mathcal{M}$ is mapped to another model $\mathcal{M}'$, in terms of Petri nets. The basic idea is that the place $a$ in $N(\mathcal{M})$ is refined into the whole net $N(\mathcal{M}')$.
FIG. 7. A caller model $\mathcal{M}$ with an activity $a$ that calls a called model $\mathcal{M}'$. The figure shows the translations in terms of Petri nets and the refinement mapping of $a$ into a whole net. (Panels: Caller, Caller’s translation, Called, Called’s translation.)
4.3 Behavioral Properties
One basic property of a good model is that it does not deadlock, that is, each process of the model reaches an output socket to produce a termination response to the environment. Inappropriate control structures can create deadlocks as in Fig. 8.
Definition 4.4 (Deadlock) A final marking $M_f$ is a deadlock in a process model if $M_f(exit) = 0$.
Property 4.1 (Deadlock freedom) A model is deadlock-free if none of its final markings is a deadlock.
FIG. 8. Models that deadlock: (a) guaranteed deadlock; (b) deadlocks when activity a chooses one path and b chooses the other. This is a distributed decision, i.e. a single decision must be taken by two independent executors.
Looking at Fig. 8(b), it might seem that distributed decision is a structural property of a net. A relevant question is whether there is a class of nets that do not have distributed decision, i.e. local-choice nets. However, the property of being local-choice is not a structural property that can be checked by a simple predicate on the flow relation (as the free-choice property can). Figure 9 shows a model that is not local-choice, but the and-nodes that may deadlock are far away from the activities that made the distributed decision.
Process models can suffer from prescribing too much work, so that some activities are unnecessarily performed, because in some particular execution the exit of the model does not depend on them, i.e., the activity is useless. Useless activities are those that are unnecessarily performed in some particular execution, because there is no causal relation from the activity to the exit node. In other words, the process could produce the same output without executing the activity. For example, activity c is useless in the process pictured in Fig. 6(b). If tokens are regarded as information placeholders, useless activities represent unused information. Useless activities are the consequence of choices that involve parallel activities. Given a process, a place is useless if there is no path from the place to the exit place.
To define useless activities we need a notion of behavioral semantics of Petri nets that can represent causality, i.e. true parallelism as opposed to interleaving semantics. A parallel semantics of Petri nets represents executions of a net by a process, in which precedence represents causality (Section 3). In a process, a place represents a token, a transition represents a firing occurrence, and a path between two nodes represents a causal dependency between them. Useless activities are defined in terms of processes instead of final markings, because from a final marking it is impossible to know whether an activity was even executed.
However, the existence of useless activities can be characterized in terms of final markings. A process model has no useless activities if and only if in all final markings all tokens are in the exit node [22].
FIG. 9. This process model has distributed decision between activities A and B: executors of A and B are both to decide on the future of the process (decisions are represented by the or-nodes immediately after the activities). If they take different decisions, the process deadlocks because the and-join will have two tokens on the same arc, instead of one token on each arc. The blobs represent submodels without control anomalies.
Definition 4.5 Given a process, a place is useless if there is no path from the place to the exit place. In other words, a place $e$ within a process is useless if $(e, exit) \notin F^+$, where $F^+$ is the transitive closure of $F$. An activity $a$ is useless if there is a place $e$ in a process for the model which is related to the activity, i.e., $q(e) = a$, and $e$ is useless in the process.
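Definition 4.5 amounts to a reachability test on the causal net of a process. The sketch below applies it to a small hypothetical process in which one branch of an and-split never reaches the exit; all node names and the mapping `q` are assumptions made for illustration and do not reproduce Fig. 6(b).

```python
# Hypothetical process: "split" creates tokens for activities b and c;
# only the branch through b reaches the exit place "e".
F_PRIME = {("s", "split"), ("split", "pb"), ("split", "pc"),
           ("pb", "eb"), ("eb", "e"),
           ("pc", "ec"), ("ec", "pd")}
PLACES = {"s", "pb", "pc", "pd", "e"}
q = {"pb": "b", "pc": "c"}        # places that correspond to activities

def reaches(x, target):
    """True if there is a path from x to target in the causal net."""
    stack, seen = [x], set()
    while stack:
        n = stack.pop()
        if n == target:
            return True
        if n in seen:
            continue
        seen.add(n)
        stack.extend(b for (a, b) in F_PRIME if a == n)
    return False

useless_places = {p for p in PLACES if p != "e" and not reaches(p, "e")}
useless_activities = {q[p] for p in useless_places if p in q}
```

In this example the places `pc` and `pd` have no path to the exit place, so activity `c` is reported as useless.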
Property 4.2 (Usefulness) A process model is useful if no activity is useless in any of its processes.
For example, consider the process model from Fig. 10, whose translation into a Petri net is shown in Fig. 11(a). In one possible process of that net (Fig. 11(b)), activity B is useless, hence the process model is not useful. Tokens in interior places (i.e. places that do not correspond to sockets) in a final marking that is not a deadlock define an overloaded marking.
Definition 4.6 (Overloaded state) A final marking $M_f$ is overloaded if it is not a deadlock and there is an interior place $p$ such that $M_f(p) > 0$.
FIG. 10. A process model in CICN notation: activities are represented by ovals, and-nodes by black circles and or-nodes by white circles.
FIG. 11. (a) The translation into a Petri net of the model in FIG. 10. (b) One of the processes of the Petri net, in which activity B is useless.
Useless activities are defined in terms of processes instead of final states, because from a final state it is impossible to know whether an activity was even executed. However, the existence of useless activities can be characterized in terms of final states, as is stated in the following theorem.
Theorem 4.1 A process model is useful if and only if it has no overloaded markings and is deadlock-free.
Single-input-single-output is the most commonly accepted form of model abstraction, as used in ICN, VPL/Rasp, Action Workflow, and other languages. If a process is viewed as a black box, then if there is one token in the process, eventually there will be one token in the output. Given the semantics of activities in these languages, abstraction of a process as a compound activity is only possible if the model has single-input-single-output. Some languages [6, 9] define other kinds of behavior, where and and or outputs are mixed, defining an unnecessarily complex control logic within activities. Hence, another property of a good process model is single-response, that is, each enaction of the process produces exactly one output (a single-response process model is by definition deadlock-free). If a process can produce more than one output in the same enaction we call the process multiple-response (Fig. 4).
Property 4.3 (Single response) A process model is single-response if all final markings $M_f$ satisfy $M_f(exit) = 1$. It is multiple-response if there is a final marking $M_f$ such that $M_f(exit) > 1$.
The simple control property summarizes or-behavior.
Property 4.4 (Simple control) A model has simple control if the only final marking $M_f$ is
$$M_f(p) = \begin{cases} 1 & \text{if } p = exit \\ 0 & \text{otherwise} \end{cases}$$
Simple control implies that if a model begins with a single token in one of its input sockets, it ends with a token in one of its output sockets and there are no other tokens. Theorem 4.2 provides an alternative definition of simple control in terms of other properties.
Theorem 4.2 A process model has simple control if and only if it is single-response and useful.
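For small bounded nets, the properties of this section can be checked by enumerating final markings. The sketch below is a brute-force illustration, not the polynomial-time analysis mentioned in the discussion. It assumes that a final marking is a reachable marking in which no transition is enabled, and the three tiny nets and all names are hypothetical.

```python
def final_markings(pre, post, start="start", limit=10_000):
    """Reachable markings with no enabled transition (assumed meaning of 'final')."""
    m0 = ((start, 1),)
    seen, stack, finals = {m0}, [m0], []
    while stack:
        m = dict(stack.pop())
        ready = [t for t in pre if all(m.get(p, 0) > 0 for p in pre[t])]
        if not ready:
            finals.append(m)
        for t in ready:
            n = dict(m)
            for p in pre[t]:
                n[p] -= 1
            for p in post[t]:
                n[p] = n.get(p, 0) + 1
            key = tuple(sorted((p, c) for p, c in n.items() if c > 0))
            if key not in seen:
                if len(seen) >= limit:
                    raise RuntimeError("state space exceeds limit")
                seen.add(key)
                stack.append(key)
    return finals

def classify(finals, exit_="exit"):
    return {
        "deadlock": any(m.get(exit_, 0) == 0 for m in finals),
        "multiple_response": any(m.get(exit_, 0) > 1 for m in finals),
        "overloaded": any(m.get(exit_, 0) > 0 and set(m) != {exit_} for m in finals),
        "simple_control": bool(finals) and all(m == {exit_: 1} for m in finals),
    }

# An and-split closed by an or-join: both tokens reach the exit.
AND_OR = ({"split": ["start"], "ta": ["a"], "tb": ["b"]},
          {"split": ["a", "b"], "ta": ["exit"], "tb": ["exit"]})
# An or-split closed by an and-join: the join waits forever.
OR_AND = ({"left": ["start"], "right": ["start"], "join": ["a", "b"]},
          {"left": ["a"], "right": ["b"], "join": ["exit"]})
# A plain sequence.
SEQ = ({"t": ["start"]}, {"t": ["exit"]})

results = {name: classify(final_markings(*net))
           for name, net in [("and_or", AND_OR), ("or_and", OR_AND), ("seq", SEQ)]}
```

The first net is multiple-response, the second deadlocks, and only the sequence has simple control.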
A model with simple control is said to be behaviorally correct. There are two reasons to adopt this notion of correctness. First, from the above theorems there are no control anomalies, like deadlock, useless activities,
and multiple response. Second, a simple control model behaves like an activity; this allows consistent or-abstraction in which a process model can be used safely within a larger model without assumptions of its behavior (except for simple control).
4.4 Discussion
Basic constructs are all free-choice, i.e. they lead to free-choice Petri nets [25]. This can be observed from the semantics of most languages to model process behavior. This allows simpler model analysis of control properties and the development of thread theory in the following section.
Should choice and synchronization mix in BP models? In most situations no, because synchronization occurs within the model and choices are the result of the functionality of the activities which are executed without reference to the model’s state, that is, the state is not an input to the execution of an activity. We call this the principle of individual decisions; following this principle necessarily leads to free-choice nets. This principle is desirable from a language perspective, as it allows a sort of referential transparency with respect to the meaning of an activity. That is, the functionality of an activity is independent of the model in which the activity is embedded. This is needed if a system supports abstraction so that a model can be used as an activity in several other models. For example, in a banking application, there can be a process model for credit approval which is used as an activity in several process models for different kinds of loans.
However, sometimes we want to model the situation in which an actor decides for another actor working in parallel, taking away his authority to choose. If we regard this authority as an order to make a decision, then this situation is a counterorder. Traditional modeling constructs do not allow one to specify this situation. The unbalanced connector and its supporting theory of threads is a simple, high-level and succinct construct to model this phenomenon.
Simple control is related to some behavioral properties of free-choice Petri nets. In fact we prove in [22] that a model has simple control if and only if a connected free-choice net derived from the model is live and safe, as defined in Section 3.
Because these properties can be determined in polynomial time for free-choice nets [4], this implies in turn that simple control in free-choice nets can be decided in polynomial time.
5. A Theory of Threads of Control We use a thread metaphor in which a thread is a set of strands for a single rope that goes from an input to an output in a process model. The metaphor
is best understood by considering a model with parallelism but no choice (e.g., a PERT chart). A single rope in which all strands are as long as the rope is passed from the input to the output by dividing the groups of strands into splits and uniting them in joins. In the metaphor, a thread is a set of strands. This metaphor can be extended to models that do have choice; in that case, whatever choices are made, all ropes that get to a choice point (i.e., an or-node) are part of the same thread. In other words, making a choice does not change the thread: only and-nodes change threads.
The theory defines the concept of the thread of control $\psi(n)$ of a node $n$, the subthread relation $\subseteq$, and thread addition $\oplus$. Threads are algebraic expressions whose operators are nodes in the net. The intuition behind the theory of threads is that threads are divided by and-splits and added by and-joins, in such a way that for every and-node the sum of the threads of its predecessors equals the sum of the threads of its successors. Each activity and each or-node has one and only one thread. We have shown elsewhere that a model has no control anomalies like deadlock (i.e., no output), multiple response (i.e., more than one output), or useless activities, if and only if threads are well defined: the thread of the start node equals the thread of the exit node.
The thread of a node $x$ is denoted $\psi(x)$, thread addition is denoted $\oplus$, and the subthread relation is denoted $\subseteq$. For example, in Figure 12, the following relationships among the threads are true: $\psi(A) = \psi(or_1) = \psi(B) = \psi(or_2) \subseteq \psi(start) = \psi(exit)$, $\psi(C) \subseteq \psi(start)$, $\psi(A) \oplus \psi(C) = \psi(start)$, and $\psi(A)$ is not comparable to $\psi(C)$.
In Fig. 10 threads are not well defined. The reasoning is as follows:
(a) It should be the case that $\psi(A) \oplus \psi(B) = \psi(or_2)$ so that the and-node that joins A and B is balanced.
(b) It should be the case that $\psi(A) = \psi(or_1)$ because they are connected, and also that $\psi(or_1) = \psi(or_2)$.
But these equations are unsatisfiable because there is no equivalent of zero or negative threads. Thus, activity A and or-node $or_1$ belong to two different threads. This in turn implies that the model has behavioral anomalies, in this case the (unresolved) useless activity B which may lead to an extra token upon reaching the exit node.
FIG. 12. A CICN model without choice and well-defined threads
The following goals are derived from the thread metaphor to characterize good models.
(1) An and-node joins input threads into a superthread and splits the superthread into output subthreads.
(2) Two activities connected in sequence have the same thread.
(3) Threads are executed sequentially, i.e., if two activities can be active at the same time, they belong to different threads.
(4) If a thread is active (i.e. one of its activities is active), then its subthreads are inactive. If a subthread is active then the thread is inactive.
(5) The start node and the exit node belong to the same thread, called $\mu$.
(6) Every thread is a subthread of $\mu$.
There is a strong relationship between threads and behavior. First, threads are a kind of non-numeric place invariant (i.e., the weighted sum of the tokens is constant). Second, a model has simple control if and only if every node belongs to one thread and the thread of the start node equals the thread of the exit node. Thus, the model in Figure 12 has simple control, while the model in Figure 10 does not. Moreover, if this condition does not hold, an explanation of the origin of the problem can be derived (e.g. distributed decision between A and B would be diagnosed for Figure 9, and a connection from a subthread to a superthread between the or-nodes would be diagnosed in Figure 10).
Section 5.1 defines an algebra in which terms can be interpreted as threads satisfying all these goals. Section 5.2 shows that we can assign a unique thread to each place in a model if and only if the model has simple control. Moreover, if the model does not have simple control, analysis of threads sheds light on the reasons for not having simple control.
5.1 Thread Labels and Threads
The definition of threads is done indirectly via thread labels. A thread label for a place represents one possible destiny or future history for a token in that place (i.e. a set of paths that is part of a process that begins with a token in $p$). Likewise, a thread label for a transition represents one possible destiny for a set of tokens which enable the transition. Because there are choices, hence several possible destinies, each node in the net is assigned a set of thread labels. Only those destinies that end with one token in the exit node of the model are considered. Thus, thread labels capture successful executions.
Definition 5.1 (Thread labels) The set of thread labels of a process model $\mathcal{M}$ whose net is $(P, T, F)$, denoted $L_{\mathcal{M}}$, includes the following and nothing else:
• every place or transition $x$ in $P \cup T$;
• the symbol $\mu$;
• the label multiplication $\alpha \otimes \beta$, where $\alpha$ and $\beta$ are labels;
• the label addition $\alpha \oplus \beta$, where $\alpha$ and $\beta$ are labels;
• $(\alpha)$, where $\alpha$ is a label.
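Label equality is defined by the following axioms:
(1) addition is associative, $\alpha \oplus (\beta \oplus \gamma) = (\alpha \oplus \beta) \oplus \gamma$;
(2) addition is commutative, $\alpha \oplus \beta = \beta \oplus \alpha$;
(3) multiplication is associative, $\alpha \otimes (\beta \otimes \gamma) = (\alpha \otimes \beta) \otimes \gamma$;
(4) multiplication distributes over addition, $\alpha \otimes (\beta \oplus \gamma) = \alpha \otimes \beta \oplus \alpha \otimes \gamma$, and $(\alpha \oplus \beta) \otimes \gamma = \alpha \otimes \gamma \oplus \beta \otimes \gamma$.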
As usual, multiplication has higher precedence than addition, and juxtaposition means multiplication, e.g. $\alpha \oplus \beta \otimes \gamma = \alpha \oplus (\beta \otimes \gamma) = \alpha \oplus \beta\gamma$. The meaning of a label is a set of paths in the net. For places and transitions, these paths have length one. Label multiplication $\otimes$ denotes a sort of relational cross product of paths, i.e. an element of the product is the catenation of an element from the multiplier with an element from the multiplicand. Label addition $\oplus$ denotes union of sets of paths. It is easy to check that whenever two labels are equal then they denote the same sets of paths, because cross product and union satisfy axioms 1 to 4.
A label for a place represents one future history of a token in that place. If there are two or more tokens in a process, the set of futures of the process does not include all possible combinations of futures of these tokens, because their futures might eventually interact (e.g., by synchronization). Labels are said to be consistent if whenever they refer to the same place, the future history of a token at that place is the same, i.e. decisions taken at common places are the same. The definition of label consistency is syntactic if labels are expressed in a normal form.
Definition 5.2 (Label normal form) A label is in normal form if it is written as a sum of factors, without parentheses nor multiplication symbols. Any label can be normalized, by distributing multiplication successively and then dropping all parentheses and multiplication symbols.
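Normalization by repeated distribution can be sketched as follows. The tuple encoding of labels and the helper names (`sym`, `add`, `mul`, `normalize`, `show`) are assumptions of this illustration. A normalized label is represented as a sorted list of factors, each factor being a tuple of symbols, so that associativity and commutativity of addition are built into the representation.

```python
def sym(x):
    return ("sym", x)

def add(a, b):
    return ("add", a, b)

def mul(a, b):
    return ("mul", a, b)

def normalize(label):
    """Return the normal form as a sorted list of factors (tuples of symbols)."""
    kind = label[0]
    if kind == "sym":
        return [(label[1],)]
    left, right = normalize(label[1]), normalize(label[2])
    if kind == "add":
        return sorted(left + right)
    if kind == "mul":
        # distribute: every factor on the left is catenated with every factor on the right
        return sorted(u + v for u in left for v in right)
    raise ValueError(f"unknown label kind: {kind}")

def show(factors):
    """Print a normal form with juxtaposition for multiplication."""
    return " ⊕ ".join("".join(f) for f in factors)

a, b, c = sym("a"), sym("b"), sym("c")
lhs = normalize(mul(a, add(b, c)))            # a ⊗ (b ⊕ c)
rhs = normalize(add(mul(a, b), mul(a, c)))    # a ⊗ b ⊕ a ⊗ c
```

Two labels that are equal under axioms (1) to (4) normalize to the same list, which gives a simple syntactic equality test.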
Definition 5.3 (Consistent labels) A set $X$ of normalized labels is consistent if for each place $p$ that occurs in the labels, either all occurrences of $p$ are the rightmost symbol of some label, or there is a transition $t$ such that all occurrences of $p$ are right-multiplied by $t$. In other words, a given place has one successor in all the labels, or no successor in all the labels.
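The syntactic consistency test can be sketched as below. Normalized labels are represented as lists of factors (tuples of symbols), and "rightmost symbol of some label" is interpreted here as the last symbol of a factor; that interpretation, the function names, and the example symbols are assumptions made for illustration.

```python
def followers(labels, places):
    """Map each place to the set of symbols that follow it (None = rightmost)."""
    nxt = {}
    for label in labels:
        for factor in label:
            for i, s in enumerate(factor):
                if s in places:
                    follower = factor[i + 1] if i + 1 < len(factor) else None
                    nxt.setdefault(s, set()).add(follower)
    return nxt

def is_consistent(labels, places):
    """Every place has the same successor, or no successor, in all labels."""
    return all(len(f) == 1 for f in followers(labels, places).values())

PLACES = {"u", "w", "y", "E"}                       # hypothetical places
same_choice = [[("u", "t1", "w", "t3", "E")],
               [("w", "t3", "E")]]
different_choice = [[("u", "t1", "w", "t3", "E")],
                    [("u", "t2", "y", "t4", "E")]]  # u is followed by t1 and by t2
```

The second set is inconsistent because the two labels take different decisions at the common place `u`.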
Definition 5.4 (Model labeling) The labeling of a model $\mathcal{M}$ is a function $\tau : P \cup T \to 2^{\mathcal{L}}$ from nodes of the net to sets of labels, defined by
• The only label for the place $p$ that corresponds to the exit node is the place itself: $\tau(p) = \{p\}$.
• Labels of a transition $t$ are computed by adding a consistent set of labels, one from each successor of $t$, and pre-multiplying it by $t$: $\tau(t) = \{\, t \otimes (\alpha_1 \oplus \cdots \oplus \alpha_n) \mid t^\bullet = \{p_1, \ldots, p_n\} \wedge (\bigwedge_{i=1}^{n} \alpha_i \in \tau(p_i)) \wedge \{\alpha_1, \ldots, \alpha_n\} \text{ is consistent} \,\}$.
• Labels of a place $p \neq \mathit{exit}$ are computed by pre-multiplying labels from its successors by $p$: $\tau(p) = \bigcup_{t \in p^\bullet} \{\, p \otimes \alpha \mid \alpha \in \tau(t) \,\}$.
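As an illustration of Definition 5.4, the following sketch computes the labeling of a small net by recursion from the exit place. It assumes an acyclic net, represents a label as a frozenset of factors (tuples of node names), and uses hypothetical names (`labeling`, `place_post`, `trans_post`) that do not come from the paper.

```python
from itertools import product


def is_consistent(labels):
    """Definition 5.3: each place has the same follower in all labels."""
    follow = {}
    for label in labels:
        for f in label:
            for i in range(0, len(f), 2):  # places sit at even indices
                nxt = f[i + 1] if i + 1 < len(f) else None
                follow.setdefault(f[i], set()).add(nxt)
    return all(len(s) == 1 for s in follow.values())


def prefix(node, label):
    """Pre-multiply a label by a single node."""
    return frozenset((node,) + f for f in label)


def labeling(place_post, trans_post, exit_place="exit"):
    """Return tau as a dict from node to a set of labels (acyclic nets only)."""
    memo = {}

    def tau(node):
        if node not in memo:
            if node == exit_place:
                memo[node] = {frozenset({(exit_place,)})}
            elif node in trans_post:
                out = set()
                # One label from each successor place, kept only if consistent.
                for combo in product(*(tau(p) for p in trans_post[node])):
                    if is_consistent(combo):
                        out.add(prefix(node, frozenset().union(*combo)))
                memo[node] = out
            else:
                memo[node] = {prefix(node, a)
                              for t in place_post[node] for a in tau(t)}
        return memo[node]

    nodes = set(place_post) | set(trans_post) | {exit_place}
    return {n: tau(n) for n in nodes}


# Hypothetical net: an and-split ts followed by an and-join tj.
tau = labeling({"start": ["ts"], "a": ["tj"], "b": ["tj"]},
               {"ts": ["a", "b"], "tj": ["exit"]})
print(tau["start"])
```

For this net the start place receives a single label with two factors, one for each parallel branch.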
For example, part of the labeling of the model in Fig. 6(a) is shown in Table III, where E is a shorthand for exit. There is an equivalence relation for thread labels, which is the basis of the definition of a thread.
Definition 5.5 (Label equivalence) Label equivalence, denoted $\equiv$, is the least equivalence relation that satisfies the axioms for equality and the following axioms:
(6) $\bigoplus_{p \in {}^\bullet t} p \otimes t \otimes \alpha \equiv \alpha$
(7) $p \equiv \mu$, if $p$ is the place of the exit node.

TABLE III. Labeling for FIG. 6(a)
Intuitively, axiom (5) represents the firing of a transition. Applying this axiom changes the interpretation of labels, though: they are not always paths in the original net (they are paths in a net that may be collapsed in parts). Finally, axiom (6) says that, should the model be associated with an activity $a$ in another calling model, $\mu$ denotes $\psi(a)$ in the other model. A thread is an equivalence class of thread labels. A threading is a total function from the places of a model to threads, such that the start node is mapped to the same thread as the exit node, i.e. $\mu$.
Definition 5.6 (Thread and threading) A thread is a non-empty equivalence class of labels. A labeling $\tau$ such that all labels assigned to a given place $p$ are equivalent and the start node belongs to the thread $\mu$, i.e., $(\forall \alpha, \beta \in \tau(p) : \alpha \equiv \beta) \wedge (\exists \alpha \in \tau(\mathit{start}) : \alpha \equiv \mu)$, defines a threading $\psi : P \to 2^{\mathcal{L}}$ such that $\psi(p) = \{\, \alpha \mid \exists \beta \in \tau(p) : \alpha \equiv \beta \,\}$.
Notation. We usually denote a thread by a label (one of its members), e.g., $\psi(p) = \alpha$ means $\alpha \in \psi(p)$. Likewise, operations on labels are extended to operations on threads. Thread equality is simply set equality; hence, with a little abuse of notation, we write $\alpha = \beta$ to mean that both labels belong to the same thread, i.e., $\alpha \equiv \beta$. For example, the threading of the model in Fig. 12 is as shown in Table IV, in which $t_y$ is the name of the start transition of $y$, $p_o$ is the name of the place between the final transition of $or_2$ and $t_y$, and $p_c$ is the name of the place between the final transition of $C$ and $t_y$. Now, we cannot derive a threading for Fig. 6(a). Consider the partial labeling shown in Table III. Node Y has two labels that must be equivalent. The first label (the one going to transition $t_9$) can be simplified to $\mu$, because all transitions in the label have only one predecessor. The second cannot be simplified to $\mu$; in fact its simplification is vt7yp, which is different from $\mu$ because transition $t_7$ has more than one predecessor and no addition is present in the label.
TABLE IV. Threading of the model of FIG. 12
Place $p$                      Thread $\psi(p)$
exit, start, and1, and2        $\mu$
A, or1, B, or2                 $p_o t_y \mu$
C                              $p_c t_y \mu$

Definition 5.7 (Subthread, superthread) A thread $\alpha$ is a subthread of a thread $\beta$, denoted $\alpha \sqsubseteq \beta$, if there is a thread $\delta$ such that $\alpha = \delta \otimes \beta$, or $\alpha$ is a subthread of one of the addends of $\beta$, i.e.
$\alpha \sqsubseteq \beta \iff (\exists \delta : \alpha = \delta \otimes \beta) \vee (\exists \gamma, \varepsilon : \beta = \gamma \oplus \varepsilon \wedge \alpha \sqsubseteq \gamma)$.
The inverse relation is called superthread.

Our definition of threads meets the six goals at the beginning of this section. This is because threadings can be regarded as positive non-numeric invariants, hence properties of invariants apply to threads.

Theorem 5.1 Given a model whose net is $(P, T, F)$, its connected net has an extra transition $t_c$ such that $\{\mathit{exit}\} = {}^\bullet t_c$ and $t_c^\bullet = \{\mathit{start}\}$. Then, if the model has a threading $\psi$, for every transition $t \in T \cup \{t_c\}$, $\bigoplus_{p \in {}^\bullet t} \psi(p) = \bigoplus_{p \in t^\bullet} \psi(p)$.
If the function $\psi$ is rational-valued, as opposed to thread-valued, and the symbol $\bigoplus$ is replaced by a summation symbol, the above equation is the standard definition for so-called S-invariants in Petri nets. The characteristic property of S-invariants is the fact that the weighted sum of tokens remains constant. This property also holds for threadings. Thus, if a model has a threading, the summation of the active threads (those that have a token) is $\mu$, in any reachable marking $M$.
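For the rational-valued case, the constancy of the weighted token sum follows in one step from the balance equation of the theorem. The derivation below is a sketch that assumes ordinary arcs (weight 1), writes $M'$ for the marking reached from $M$ by firing $t$, and uses Iverson brackets $[\cdot]$ (1 if the condition holds, 0 otherwise); these notational choices are not taken from the paper.

```latex
% Assumptions: \psi is rational-valued, all arcs have weight 1,
% and M' is the marking obtained from M by firing transition t.
\begin{align*}
M'(p) &= M(p) - [\,p \in {}^\bullet t\,] + [\,p \in t^\bullet\,] \\
\sum_{p \in P} \psi(p)\, M'(p)
  &= \sum_{p \in P} \psi(p)\, M(p)
     - \sum_{p \in {}^\bullet t} \psi(p)
     + \sum_{p \in t^\bullet} \psi(p) \\
  &= \sum_{p \in P} \psi(p)\, M(p)
\end{align*}
```

The last step uses the balance condition $\sum_{p \in {}^\bullet t} \psi(p) = \sum_{p \in t^\bullet} \psi(p)$, so the weighted sum is unchanged by every firing and therefore in every reachable marking.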
5.2 Threads and Behavior
Given a set of compatible labels whose summation is equivalent to $\mu$, these labels represent a (partial) process. The construction of the process from the labels is rather simple. The importance of this construction is that the final state implied by the process has a single token, in place exit; that is, the set of labels represents a correct process. Furthermore, it can be proven that a process model has a threading if and only if it has simple control.
5.2.1 Labels and processes
Different labels are related to different processes. In fact a set of labels satisfying certain constraints defines a process. Given a consistent set $\{\alpha_1, \ldots, \alpha_n\}$ of normalized labels taken from a labeling $\tau$, such that $\bigoplus_{i=1}^{n} \alpha_i \equiv \mu$, it is possible to build a causal net $(B, E, F')$ with only one place without successors. The construction has the following four steps:
S1. Let $\alpha_i = \delta_{i1} \oplus \cdots \oplus \delta_{in_i}$, where each $\delta_{ij}$ has no operators. Each $\delta_{ij}$ is an odd-length string of alternating places and transitions, whose last symbol is the place of the exit node.
S2. The set $B$ of places in the causal net is obtained from the places of the $\delta_{ij}$.
S3. The set $E$ of transitions is obtained from the transitions of the $\delta_{ij}$.
S4. A node $y$ is a successor of a node $x$ if there is a $\delta_{ij}$ of the form $\alpha x y \beta$, where $\alpha$ and $\beta$ are possibly empty strings.

The following lemmas claim that the net so constructed is a causal net and also a process of the original net $N(\mathcal{M})$, where the initial state is the one that marks all places related to the labels in the set. This process ends in a state whose only token is in the exit node (i.e. a state that satisfies the simple control condition).

Lemma 5.1 If all labels have no cycles (hence are finite), the net defined by the four-step construction is a causal net, i.e. (a) it has no cycles; (b) each place has at most one successor and (c) each place has at most one predecessor.

Lemma 5.2 If all labels have no cycles, the causal net of the four-step construction is a process of $N(\mathcal{M})$ that begins in a state with one token in each place related to the labels (and no more tokens) and ends in a state with one token in an output socket (and no more tokens).
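The four-step construction translates almost directly into code. The sketch below assumes each normalized label is given as a collection of factors, each factor being a tuple of node names that alternates places and transitions and ends in the exit place; the function name `causal_net` and the example labels are hypothetical.

```python
def causal_net(labels):
    """Build (B, E, F') from a consistent set of normalized labels."""
    # S1: split every label into its operator-free factors.
    factors = [f for label in labels for f in label]
    # S2: places are the symbols at even positions of the factors.
    places = {x for f in factors for x in f[0::2]}
    # S3: transitions are the symbols at odd positions.
    transitions = {x for f in factors for x in f[1::2]}
    # S4: y succeeds x whenever x is immediately followed by y in some factor.
    flow = {(f[i], f[i + 1]) for f in factors for i in range(len(f) - 1)}
    return places, transitions, flow


# Hypothetical labels of two parallel places that synchronize at transition tj.
labels = [[("a", "tj", "exit")], [("b", "tj", "exit")]]
B, E, F = causal_net(labels)
no_successor = {p for p in B if not any(x == p for x, _ in F)}
print(B, E, F, no_successor)
```

In this example the only place without successors is `exit`, as the construction promises for a set of labels whose sum is equivalent to the thread of the exit node.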
5.2.2 Threads and simple control
The purpose of this section is to relate simple control and threadings. Theorem 5.2 uses the results of the previous section to prove that a model that has a threading has simple control. Then a series of lemmas are introduced to prove in Theorem 5.3 that a model with simple control has a threading.

Theorem 5.2 (Threading implies simple control) If a process model $\mathcal{M}$ has a threading $\psi$, then $\mathcal{M}$ has simple control.
We want to prove that simple control and the existence of a threading are equivalent. To prove that all models with simple control have a threading we use some properties of connected free-choice place/transition nets.

Lemma 5.3 If a model has simple control, all places have at least one label.

Lemma 5.4 If a model has simple control, all labels for a given place are equivalent, i.e.,
$\forall p \in P : \forall l_1, l_2 \in \tau(p) : l_1 \equiv l_2$.
Theorem 5.3 (Simple control implies threading) A model $\mathcal{M}$ that has simple control has a threading.
There are three possible causes for not having a balanced threading, which can be interpreted in terms of behavioral properties. First, there might be a place with no labels. If a place in the model has no label, this implies there is a proper trap in the net (a proper trap is a set of places that, once it receives a token, always holds some token [4]) and the model has an overloaded state, unless a deadlock upstream impedes reaching the trap. Second, there might be two inequivalent labels for a place. If there is a place with inequivalent labels, this means that this place is an activity whose output sockets are connected to different threads: this implies either deadlock or an overloaded state. Third, it might be the case that the thread of the start node is not $\mu$; in that case it is not possible to reach the proper final state, because the threading is an invariant.
6. Applications of Thread Theory
The two main applications of the theory of threads are the development of consistent models and the definition of a new modeling construct called the unbalanced connector. These two applications are related, because the unbalanced connector is basically an and-node with a special semantics that guarantees consistency. There are basically two approaches to build consistent models: (1) build a model, then deduce that the model is consistent (otherwise modify the model); or (2) inductively develop the model using a set of primitives that guarantee consistency.
Proving consistency of a model. The usual method to deduce properties like simple control is based on a standard Petri net analysis, the reachability graph, which shows all reachable states for the net. There are many useful properties derivable from the reachability graph. Unfortunately the size of the graph might in general be exponential in the size of the net, so some methods to reduce the graph are needed.⁴ Another method to prove simple control is by computing a labeling and showing that it is (or it is not) a balanced threading. An algorithm to compute a threading is derivable from the definition. The idea is to assign labels to the outputs and compute labels of predecessors. Whenever a place has more than one label, they are proven equivalent by term reduction.⁵ If they are not equivalent, then there is no threading and the model does not have simple control. If they are, one label is kept as a representative and others are deleted, thus the algorithm does not compute full labels but it does compute a threading. Of course, full label sets can be computed if desired; they give details about all possible futures of the computation. The best method is of course to have a small net: this can be accomplished if complex processes are described in terms of simpler processes, analyzing first subprocesses and then the composition of these subprocesses.

⁴ The rank theorem [4] shows that the well formedness property of free-choice nets can be checked in polynomial time. This property is intimately related to simple control, hence it can be proven that simple control is also polynomial in free-choice nets [22,25,29].
⁵ Because thread addition commutes, unless care is taken, attempts at automatic proofs can lead to infinite iteration.

Building consistent models. It is possible to use a context-free grammar to model the primitives in the second approach. If a model can be generated using a grammar with a set of constructions, it is possible to prove by structural induction that the model has simple control, provided that each rule preserves simple control. One possible set of rules is sequential composition, alternative composition, and parallel composition. In addition, we have as an axiom the fact that each atomic activity has simple control. These rules are sound, although they are not complete (there are process models with simple control that cannot be parsed).
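The reachability-graph method mentioned above can be sketched as a breadth-first exploration of markings. The sketch makes several assumptions that are not in the paper: the net is given by hypothetical `pre` and `post` dictionaries mapping each transition to a set of places, the only proper final marking is a single token in `exit`, and the two reported anomalies (a dead marking other than the final one, and a token in `exit` together with leftover tokens) are used as a rough stand-in for the failure of simple control.

```python
from collections import deque


def enabled(m, t, pre):
    return all(m.get(p, 0) >= 1 for p in pre[t])


def fire(m, t, pre, post):
    n = dict(m)
    for p in pre[t]:
        n[p] -= 1
        if n[p] == 0:
            del n[p]
    for p in post[t]:
        n[p] = n.get(p, 0) + 1
    return n


def control_anomalies(pre, post, start="start", exit_place="exit", limit=10_000):
    """Explore reachable markings; return a list of (kind, marking) anomalies."""
    key = lambda m: frozenset(m.items())
    initial = {start: 1}
    seen, queue, found = {key(initial)}, deque([initial]), []
    while queue:
        m = queue.popleft()
        if m == {exit_place: 1}:          # proper final state
            continue
        if m.get(exit_place, 0) > 0:      # output produced, tokens left over
            found.append(("overloaded", m))
            continue
        ts = [t for t in pre if enabled(m, t, pre)]
        if not ts:
            found.append(("deadlock", m))
        for t in ts:
            n = fire(m, t, pre, post)
            if key(n) not in seen:
                if len(seen) >= limit:    # guard against state explosion
                    raise RuntimeError("state limit reached")
                seen.add(key(n))
                queue.append(n)
    return found


# Hypothetical nets: a matched and-split/and-join, and an or-split feeding an and-join.
good = control_anomalies({"ts": {"start"}, "tj": {"a", "b"}},
                         {"ts": {"a", "b"}, "tj": {"exit"}})
bad = control_anomalies({"t1": {"start"}, "t2": {"start"}, "tj": {"a", "b"}},
                        {"t1": {"a"}, "t2": {"b"}, "tj": {"exit"}})
print(good, bad)
```

The `limit` parameter reflects the exponential-size concern raised in the text: the exploration stops with an error rather than running unboundedly on large nets.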
In what follows we present a variant of the grammar approach called the incremental case composition method, where instead of using a context-free grammar the model is built incrementally by adding alternatives to a so-called base model. Each alternative is added by performing an operation on the model and creating a new model. The base model has simple control and each operation preserves this property, hence no control anomalies will occur. This method is not complete in the sense that there are models with simple control that cannot be obtained using the method. Creating a complete method in this sense is an open problem.

6.1 Base model
A base model is a model that describes one execution scenario, i.e., there are no alternatives. Most corporate procedure manuals describe processes without alternatives, i.e., they describe base models. Because behavioral inconsistency is the consequence of inappropriate use of parallelism and choice and the base model has no choices, it has no control anomalies. Figure 2 shows the base model for the loan example; it assumes there are no problems whatsoever with the credit so every activity is performed successfully and once. A process model is built beginning with a base model that represents the
"normal" case (whatever that means for the process developer). Then the model is enhanced by adding alternatives. Each alternative or case has a condition (specified as an or-split), a controlling activity or set of activities, and a next state (specified as a connection). At each step, the method checks that no behavioral anomalies are added.
Definition 6.1 (Base model) A behavioral model is a base model if it has no or-nodes.
An interesting property of a base model is that the only process denoted by the model is the same net as $N(\mathcal{M})$, because the model has no choices. This implies that those activities not related by $F^{+}$ are concurrent. In P/T-net parlance, a base model is a process of itself [20]. Moreover, the base model is the only process of $\mathcal{M}$ (defining $q$ as the identity function).
Theorem 6.1 Every base model has simple control.
Because every base model has simple control, it has a balanced threading $\psi$. To compute the threading, the net is executed backwards as a P/T-system $(P, T, F^{-1})$, initially defining $\psi(\mathit{exit}) = \mu$ and, upon firing a transition $t$, defining $\psi(p) := p \otimes t \otimes \bigoplus_{q \in t^\bullet} \psi(q)$ for $p \in {}^\bullet t$. The following algorithm gives details.
Algorithm: Threading of a base model
Input: The place/transition net $(P, T, F)$ of a base model $\mathcal{M}$
Output: The threading $\psi$ defined for each node of the net
Local variables: $T'$, the set of processed transitions, and $P'$, the set of processed places
Loop invariant: $\forall x \in P' \cup T', y \in P \cup T : x F^{*} y \Rightarrow \psi(y)$ is well defined.

(1) Let $\psi(\mathit{exit}) := \mu$.
(2) Let $P' := \{\mathit{exit}\}$, and let $T' := \emptyset$.
(3) While $T' \neq T$ do
    (a) Let $t$ be an element of $T - T'$ such that $t^\bullet \subseteq P'$.
    (b) For each $p \in {}^\bullet t$, let $\psi(p) := p \otimes t \otimes \bigoplus_{q \in t^\bullet} \psi(q)$.
    (c) Let $P' := P' \cup {}^\bullet t$, and let $T' := T' \cup \{t\}$.

Some comments on the algorithm:
• This algorithm computes a labeling that has one label for each net element. Because every place has one predecessor, only one label is defined, so the labeling is a threading.
• It is easy to show that the loop invariant is established before the loop, and that the loop invariant, together with the negated loop condition and the connectivity of the net, establishes the desired output. The fact that the invariant is indeed invariant is also simple, provided that there is always one transition $t$ that satisfies the condition of statement 3(a). But that is the case, because the set of places $p$ in $P'$ such that ${}^\bullet p$ is not in $T'$ represents a state of the system (i.e. it is a maximal cut in the process) and there must be a $t$ whose firing leads to that state.
• The threading computed by executing the net as above is an invariant. The only initial state has just one token in start, because the model has simple control. Hence, it must be the case that $\psi(\mathit{start}) = \mu$, that is, $\psi$ is a threading.
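The algorithm above can be sketched as follows. The representation is an assumption made for illustration: a thread is a sorted tuple of factors, each factor is a tuple of node names ending in the symbol `"μ"`, and the helper `reduce_thread` applies the transition-firing axiom of Definition 5.5 from left to right until no rule applies. This directed rewriting is only one possible way to carry out the term reduction mentioned in the text, and all function names are hypothetical.

```python
def reduce_thread(factors, pre):
    """Rewrite with the firing axiom: sum over p in pre[t] of p.t.rest -> rest."""
    x = list(factors)
    changed = True
    while changed:
        changed = False
        for f in x:
            if len(f) < 3:
                continue
            t, rest = f[1], f[2:]
            group = [(p, t) + rest for p in pre[t]]
            if all(g in x for g in group):
                for g in group:
                    x.remove(g)
                x.append(rest)
                changed = True
                break  # x was modified, restart the scan
    return tuple(sorted(x))


def base_model_threading(pre, post, exit_place="exit"):
    """Backward execution of a base model; pre/post map transitions to place sets."""
    psi = {exit_place: (("μ",),)}                        # step (1)
    done_places, done_trans = {exit_place}, set()        # step (2)
    while done_trans != set(pre):                        # step (3)
        t = next(t for t in pre                          # step (3a)
                 if t not in done_trans and post[t] <= done_places)
        total = reduce_thread([f for q in post[t] for f in psi[q]], pre)
        for p in pre[t]:                                 # step (3b)
            psi[p] = reduce_thread([(p, t) + f for f in total], pre)
        done_places |= pre[t]                            # step (3c)
        done_trans.add(t)
    return psi


# Hypothetical base model: and-split ts, parallel places a and b, and-join tj.
psi = base_model_threading({"ts": {"start"}, "tj": {"a", "b"}},
                           {"ts": {"a", "b"}, "tj": {"exit"}})
print(psi)
```

In the example the two parallel places keep distinct threads, while the start place reduces to the same thread as the exit place, as required of a threading.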
6.2 Exceptions
The base model describes a fixed procedure to handle a process. An exception is defined as any situation in which the base model does not apply. Exceptions are an intrinsic part of process modeling. This is especially true when modeling non-routine processes, but most routine processes do have exceptions, e.g., due to different cases or incorrect data. As in programs, exception handling in processes is done with regular branching and iteration, and also with special mechanisms that perform global control state changes, e.g., by manipulating the stack. In general, an exception comprises three parts:
(1) a control state and a condition in which the exception is raised;
(2) a new control state resumed after the exception is handled;
(3) an optional process that handles the exception.

For example in an Ada program, the exceptional condition might be an attempt to divide by zero, the controlling activity is the exception handler, and the new control state is derived from the actions of the exception handler. We have shown that without exceptions, models have no behavioral anomalies. It is easy to prove that without parallelism there are no behavioral anomalies. Adding alternatives in the presence of parallelism can create behavioral anomalies. Consider a connection from a thread $\gamma$ to a thread $\beta$. If $\beta \sqsubset \gamma$ the connection leads to deadlock. On the other hand, if $\gamma \sqsubset \beta$ the connection may lead to an overloaded state or to multiple response. In the following subsections we show how exceptions can be added to a base model preserving the simple control property. The method adds
alternatives to a base model in such a way that the weighted sum of the tokens is always $\mu$, hence the process ends correctly and preserves the simple control property.
6.3 Alternatives within a Thread

Let $\mathcal{M}$ be a model that is being developed and $\psi$ its threading. In this section we describe how to extend $\mathcal{M}$ by adding an alternative. The resulting model $\mathcal{M}'$ becomes the new version of the model. A new alternative is specified by giving: (1) an or-node where the exception is raised; (2) an optional exception handler given by either a new activity $a$; (3) an or-node where control returns after exception handling.
The addition of an alternative can be constructed with three more basic operations on the model: adding an or-node; adding a connection between two or-nodes; adding an activity.
Operation 6.1 (Adding an or-node) The new model resulting from the addition of an or-node $o$ within a connection $c = s \mapsto d$ is defined by Table V. Adding an or-node always preserves the simple control property.

Lemma 6.1 Adding an or-node $o$ within a connection $c = s \mapsto d$ always preserves the threading of existing nodes. The threading is extended by defining $\psi(o) := \psi(c)$.
Operation 6.2 (Adding a connection) The new model resulting from the addition of a connection $c = s \mapsto d$ between or-nodes is defined by Table VI.

TABLE V. Addition of an or-node
Set     New value
A'      := A
O'      := O ∪ {o}
N'      := N
R'      := R ∪ {s ↦ o, o ↦ d} − {s ↦ d}
TABLE VI. Adding a connection between or-nodes
Set     New value
A'      := A
O'      := O
N'      := N
R'      := R ∪ {s ↦ d}

This operation preserves the simple control property and the threading if and only if the source and destination belong to the same thread.

Lemma 6.2 Adding a simple connection $c = s \mapsto d$ preserves the threading of existing nodes if and only if $s$ and $d$ belong to the same thread. The threading is extended by defining $\psi(c) := \psi(d)$.
Operation 6.3 (Adding an activity) The new model resulting from the addition of an activity $a$ within a connection $c = s \mapsto d$ is defined by Table VII. Like adding an or-node, adding an activity always preserves simple control.

Lemma 6.3 Adding an activity $a$ in a connection $c = s \mapsto d$ always preserves the threading of existing nodes. The threading is extended by defining $\psi(a) := \psi(c)$.
TABLE VII. Adding an activity

6.4 Alternatives between Multiple Threads
Sometimes adding a simple connection is not enough. In a multiple connection a new and-node $n$ is added and instead of returning control to a single or-node, control is returned to a set $D = \{d_1, \ldots, d_k\}$ of or-nodes. The addition of a multiple connection can be constructed with three more basic operations on the model:
• adding an or-node;
• adding a multiple connection from an or-node to several or-nodes, through an and-node;
• adding an activity.
Of these, the first and third are already analyzed.
Operation 6.4 (Adding a multiple connection) The new model resulting from the addition of a multiple connection from an or-node $s$ to a set $D = \{d_1, \ldots, d_k\}$ of or-nodes through an and-node $n$ is defined by Table VIII. This operation preserves the simple control property if and only if the thread of control of the source equals the summation of the threads of control of the destinations.

Lemma 6.4 Adding a multiple connection preserves the threading of existing nodes, if and only if $\psi(s) = \bigoplus_{i=1}^{k} \psi(d_i)$. The threading is extended by defining $\psi(s \mapsto n) := \psi(s)$, $\psi(n) := \psi(s)$, and $\psi(n \mapsto d_i) := \psi(d_i)$ for $1 \le i \le k$.
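The feasibility conditions of Lemmas 6.2 and 6.4 can be checked mechanically once a threading is stored with the model. The sketch below is an illustration under assumptions: threads are kept in a reduced form (a sorted tuple of factors ending in `"μ"`), sums are treated as multisets so that adding the same thread twice is not collapsed, and `reduce_thread`, `simple_connection_ok` and `multiple_connection_ok` are hypothetical names. Equality of reduced forms is sufficient for equivalence here, but the sketch does not claim to decide equivalence in general.

```python
def reduce_thread(factors, pre):
    """Apply the firing axiom left to right until no rule applies."""
    x = list(factors)
    changed = True
    while changed:
        changed = False
        for f in x:
            if len(f) < 3:
                continue
            t, rest = f[1], f[2:]
            group = [(p, t) + rest for p in pre[t]]
            if all(g in x for g in group):
                for g in group:
                    x.remove(g)
                x.append(rest)
                changed = True
                break
    return tuple(sorted(x))


def simple_connection_ok(psi, s, d):
    """Lemma 6.2: source and destination must belong to the same thread."""
    return psi[s] == psi[d]


def multiple_connection_ok(psi, pre, s, dests):
    """Lemma 6.4: thread of the source equals the sum of destination threads."""
    total = reduce_thread([f for d in dests for f in psi[d]], pre)
    return psi[s] == total


# Hypothetical threading of an and-split/and-join model (join transition tj).
MU = (("μ",),)
pre = {"ts": {"start"}, "tj": {"a", "b"}}
psi = {"start": MU, "exit": MU,
       "a": (("a", "tj", "μ"),), "b": (("b", "tj", "μ"),)}

print(simple_connection_ok(psi, "a", "b"))                  # different threads
print(multiple_connection_ok(psi, pre, "start", ["a", "b"]))
```

Keeping the threading alongside the model makes each check a local comparison, so a modeling tool could run it every time a connection is drawn.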
The operation of adding a multiple connection was defined by adding an and-node that has exactly one predecessor. If an and-node with more than one predecessor is added the resulting model is necessarily not deadlock-free. In fact it leads to the phenomenon that we call distributed decision (Figure 8(b), Figure 9), in which the execution path to follow must be decided by two or more executors, but they all must make the same decision.

TABLE VIII. Addition of a multiple connection
Theorem 6.2 (Distributed decision) If an and-node with more than one input is added to a model with simple control, the resulting model is not deadlock-free.
6.5 Unbalanced Connectors
Not all exceptional conditions can be handled by adding alternatives with balanced connections. While exceptions are raised locally in a single thread, other parallel threads of control might be involved, e.g. an exception may need to abort activities in other threads of control. An unbalanced connector is used to abort activities; it has always one source and one or more destinations. Information processes need exception handling facilities to succinctly describe many possible outcomes of a process. In principle, all exception handling can be done using conditionals and iteration, by adding tests for all possible events in every relevant point in the model. The idea of special exception handling constructs is factoring out those tests. For example, if it is known that because of some event a whole range of activities become irrelevant, these activities should be canceled. Furthermore, if an exception cannot be handled at the level where it is detected, some form of exception propagation must occur. To enrich information process control languages we can draw from ideas prevalent in the programming languages community. However, because of multi-threading, it is not obvious that ideas from languages such as Ada, Lisp or C can be mapped to information processes. Thus, it is not surprising that few languages have exception-handling constructs. In Rasp/VPL and in WAM [15] when a token reaches the end of a model, all activities within the model are deleted. This semantics ensures that there cannot be multiple response, i.e., more than one output (defined in Section 4). An explicit construct to abort the execution of unneeded activities is the unbalanced connector [25]. The unbalanced connector is a sort of and-node in which one predecessor is explicitly shown, and other predecessors are implicit (all successors are explicitly shown). 
The semantics can be informally expressed as “take a token from the input and other tokens from the model so that the number of tokens is just right”. The meaning of ‘‘just right” is such that it ensures that at the end the model will produce one token on the exit node and there will be no other tokens left. The semantics of the unbalanced connector is based on the theory of threads. Unbalanced connectors are different from regular and-nodes and thus its translation into P/T-nets is different. The semantics of unbalanced connectors can be expressed using Petri nets. In the example from Figure 13, the unbalanced connector is represented in Figure 14 by three transitions, which
FIG. 13. A process model with an unbalanced connector corresponding to the model in FIG. 9.
FIG. 14. Semantics of the unbalanced connector in FIG. 13.
produce a token in the place of the following or-node, and consume a token from the connector's incoming edge and a token from the places of either B, its incoming edge, or its outgoing edge. The translation ensures that extra tokens are consumed by one of the transitions corresponding to the connector. An unbalanced connector produces dangling activities, i.e., those whose output is not needed. Dangling activities are those whose thread is less than the threads of the destinations. In Figure 13 B is dangling. Dangling activities violate the invariant, so they must be aborted.⁶ In fact, dangling activities are very much like useless activities, so if they were not aborted they would lead to overloaded states. In principle, Petri nets are useful to model unbalanced connectors, but they are not practical because the number of transitions needed to represent an unbalanced connector can grow exponentially with the number of nodes in the net due to the state explosion that may occur. In a realistic loan approval example [25], the CICN net had 24 nodes, 3 of which were unbalanced connectors, and the corresponding Petri net had 1266 nodes; for comparison, the CICN net without unbalanced connectors has 20 nodes and the corresponding Petri net has 35 nodes. On the other hand, the intuitive semantics of the unbalanced connector is relatively simple: "abort now all activities that are not required to produce output," or more technically "delete all tokens in places whose thread is a subthread of the destination of the unbalanced connector and put one token in the destination". An implementation can use the second intuitive semantics, with the subthread relation computed off-line when the process model is compiled into an internal form.

⁶ This requires a user interface where users are advised that their pending activities have been interrupted. Here we do not delve into these matters.

Formally, the semantics of a deterministic unbalanced connector $c$ is given in terms of its translation into a P/T-net, i.e., by extending the definition of $N(\mathcal{M})$ (Definition 4.2) as follows.
Definition 6.2 (Extension of Definition 4.2) The connection from an or node $s$ to a set of or nodes $D = \{d_1, \ldots, d_k\}$ through an unbalanced connector $c$ is translated by adding nodes to the P/T net, once regular nodes are translated. Let $P_1, \ldots, P_n$ be all possible sets of dangling places in the connection at any state $M$ such that $M({}^\bullet c) > 0$. The translation will add 1 place $p$ and $n + 1$ transitions $\{t, t_1, \ldots, t_n\}$, connected as follows:
${}^\bullet p = \{t\}$, $p^\bullet = \{t_1, \ldots, t_n\}$,
${}^\bullet t = \{s\}$, $t^\bullet = \{p\}$,
${}^\bullet t_i = P_i \cup \{p\}$, $t_i^\bullet = D$.
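The intuitive run-time semantics quoted earlier ("delete all tokens in places whose thread is a subthread of the destination ... and put one token in the destination") avoids enumerating the transitions $t_1, \ldots, t_n$ explicitly. A minimal sketch, assuming the set of potentially dangling places has been computed off-line and is passed in as `dangling`, and with all names hypothetical:

```python
from collections import Counter


def fire_unbalanced(marking, source, destinations, dangling):
    """Fire an unbalanced connector on a marking given as {place: tokens}.

    source       -- place on the connector's incoming edge
    destinations -- places that receive one token each
    dangling     -- places whose thread is a subthread of the destinations
                    (assumed to be precomputed when the model is compiled)
    """
    m = Counter(marking)
    if m[source] < 1:
        raise ValueError("connector not enabled: no token on its incoming edge")
    m[source] -= 1
    for place in dangling:
        m[place] = 0          # abort dangling activities
    for d in destinations:
        m[d] += 1
    return +m                 # unary plus drops places with zero tokens


# Hypothetical marking: the connector is enabled while activity B is running.
after = fire_unbalanced({"c_in": 1, "B": 1}, "c_in", ["or_next"],
                        {"B_in", "B", "B_out"})
print(after)
```

Whichever of the dangling places currently holds a token, the result is the same marking, which corresponds to firing the single matching $t_i$ in the Petri net translation.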
An important property of the semantics for an unbalanced connector is that the transitions of the connector are all balanced. Thus, even if the model seems unbalanced, the P/T-net that defines its semantics is balanced. However, an important property is lost: the nets are no longer free-choice. For example, in Figure 14 the representation for the unbalanced connector shows a place with three successors, each of them with a different set of predecessors, hence the net is not free-choice. That is, unbalanced connectors mix choice and synchronization. This means that choices are constrained by the global state, i.e. they are no longer local. However, this is precisely the intended meaning: in the presence of an exception an executor might unexpectedly lose control of its thread. Lemma 6.5 Adding a multiple connection with an unbalanced connector preserves the threading, if and only if
$\psi(s) \sqsubseteq \bigoplus_{i=1}^{k} \psi(d_i)$.
The threading is extended by defining $\psi(p) := \psi(s)$.
6.6 Summary: Incremental Case Composition
The method to build a model by incremental case composition is as follows.
(1) Identify one possible case of the model. This case might be the most likely or the one that is considered normal.
(2) Create a base model by enumerating all activities of this case and identifying the precedence relation. This process is the same as creating a critical path method (CPM) chart.
(3) Repeat, until all possible cases are covered:
    (a) Identify a condition and place of evaluation of the condition.
    (b) If a handling process is needed, develop it using this method or use one that is already developed.
    (c) Determine the new state of the process after the exception is handled.
    (d) Check that the connection is feasible. If not, this exception cannot be added to the model.

Feasibility of a connection can be checked automatically, if the system keeps the threading as the model is being developed. For example, to develop the loan approval process, the base model shown in Figure 2 is the case in which all activities succeed (this might not be the most frequent case). Figure 15 is the final model which was built by adding several cases to the base model. The first case was the possibility of not approving the credit: this case added an or node (u l) after activity A, an unbalanced connector (a,), and an or connector immediately before exit. The second case was the possibility of errors in the legal data of the property: this case added a loop in activities E and F, with two new or nodes. The third case had to do with the possibility of the property not being appraised as valuable enough to cover the credit: this case added an activity H ("to notify the customer") and an unbalanced connector (as). The fourth case involved letting the customer decide upon this notification whether or not to use the credit for another property: this case added or node o4 after activity H and an unbalanced connector (a6).

FIG. 15. The complete loan approval process, with four exceptional cases.

Because all steps in the process keep the threading, it can be proven that the model has simple control.

Theorem 6.3 Models built with the described method have simple control.

6.7 Dealing with Unspecified Situations
In most business processes it is impossible to know in advance how to handle every possible situation. It is likely that regardless of how complete the model might be, at some point none of the pre-programmed alternatives apply. Thus, if a process modeling system insists on the model being complete before enaction, once an unforeseen situation arises either the problem will not be solved (“sorry, madam, the computer doesn’t allow that”) or it will be very hard to keep the computer up to date; that is, at this point the workflow system would be more hindrance than help. A system should allow addition of unspecified exceptions at run time. When such an unforeseen exception arises, the executor of the activity manually solves the problem and creates a connection to bring the process to a desired state. The system checks that the connection is feasible; if the connector is infeasible it is rejected, else execution continues as if the connection existed before enaction. Because the connection is feasible no behavioral anomalies will ensue.
6.8 General Unbalanced Connectors
A generalization of the unbalanced connector includes the possibility of producing extra tokens instead of deleting them [26]. This happens if the sum of the threads of the incoming arcs is greater than the thread of the outgoing arc. A further generalization is allowing incomparable threads, so that some tokens are deleted while others are produced. Producing tokens in a thread poses the problem that there are several places in the same thread: which place should be chosen? If the choice is random, then we have a nondeterministic system with the power to decide on the work performed by people. We rule that out. Fortunately, there is a better choice: produce the token in the place right before the connector that will be waiting for it. This connector can be determined by looking at the thread expression. Figure 16 shows the use of such a kind of unbalanced connector, depicted with dangling arcs for both predecessors and successors. The extra token is produced in such a way that it minimizes work along its thread, i.e. no work is implicitly assigned due to the unbalanced connector. In the example, when or-node $or_2$ chooses the lower path to iterate, activity D is aborted, activities C, D, and E are re-executed, but B is executed just once. While the semantics of general unbalanced connectors imply the deletion and addition of tokens to ensure that the weighted sum of tokens is equal to $\mu$, this is not always feasible. That is, general unbalanced connectors cannot always be added from any or node to any other set of or nodes. The constraint is that the destinations must add up to something less than or equal to $\mu$. This implies in turn that no thread will be active twice, and no thread will be active if a subthread is active.
Definition 6.3 (Feasible connector) A connection from an or node $s$ to a set of or nodes $D = \{d_1, \ldots, d_k\}$ using a generalized unbalanced connector $n$ is feasible if and only if
Note that because the ordering on threads is a partial order, it might be the case that these threads are incomparable.
FIG. 16. A generalized unbalanced connector and its semantics in terms of Petri nets.
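The feasibility constraint stated before Definition 6.3 (the destinations must add up to something less than or equal to $\mu$, no thread is active twice, and no thread is active together with one of its subthreads) can be illustrated with a deliberately simplified model. The sketch below is not the thread algebra of this article. As an assumption made only for illustration, it represents a thread as a set of hypothetical atomic subthreads, the thread $\mu$ as the set of all atoms, thread addition as disjoint union, and the thread ordering as set inclusion. The names `MU` and `is_feasible` are likewise hypothetical.

```python
from itertools import combinations

# Assumption: a thread is a frozenset of hypothetical atomic subthreads,
# and MU (the thread of the whole process) is the set of all atoms.
MU = frozenset({"a", "b", "c"})


def is_feasible(destination_threads, mu=MU):
    """Return True if the destination threads add up to something <= mu."""
    threads = [frozenset(t) for t in destination_threads]
    # In this toy model addition is only defined for disjoint threads:
    # an overlap would mean a thread (or a subthread) is active twice.
    for left, right in combinations(threads, 2):
        if left & right:
            return False
    total = frozenset().union(*threads)
    return total <= mu


print(is_feasible([{"a"}, {"b", "c"}]))  # destinations add up to mu
print(is_feasible([{"a"}, {"b"}]))       # destinations add up to less than mu
print(is_feasible([{"a", "b"}, {"b"}]))  # subthread b would be active twice
```

In a run-time setting such as the one described for unspecified exceptions, a check of this kind would be the point at which an infeasible connection is rejected; the real check would compare thread expressions rather than sets.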
7. Conclusion
The modeling of process control is a non-trivial task, especially when processes are complex. These problems are not conveniently handled in the tools we are aware of, even though it does not make much sense to analyze efficiency, timing, data flow, data availability, and resource utilization if the model behaves incorrectly.
Traditional solutions to the problem of ensuring correct behavior do not seem adequate. The use of context-free grammars to define sets of allowable models ensures correctness, but overly constrains the range of expressible models, inhibiting the natural parallelism in processes. Verification of models based on reachability graphs (i.e. finding all reachable states) or on Petri net invariants is computationally tractable for free-choice nets (all basic control constructs lead to free-choice nets [25]), but these methods do not give clues on the causes of behavioral anomalies and possible corrections.
In this article we have identified a series of relevant control properties, including deadlock freedom, useless activities, consistent abstraction, etc. These properties are related and form the basis for a notion of behavioral correctness in a model. Process semantics of Petri nets helps to determine the causal relations between activities in a process, i.e., whether two activities are independent and are executed in parallel, or they are executed in sequence. This paper defines the concept of useless activity using process semantics. Surprisingly, there are situations in which the possibility of having a useless activity cannot be avoided [26]. In this case, once an activity is determined to be useless it can be aborted.
An algebraic formalization of the rather elusive concept of thread of control was given. The thread algebra is a suitable framework to understand behavioral properties; in fact, there are strong relationships between threads and behavior: behavior is correct if and only if threads of control are properly mixed in the model, a notion that has been formally defined.
We have recognized several applications of thread theory, applicable to several other languages. One application of thread theory is the incremental composition method to develop process models by iteratively adding exceptions to a so-called base model. At each step, an appropriate mix of threads is kept in the control model, hence preserving the simple control property. This method can also be applied to the problem of handling an unforeseen exceptional condition at run time. A thorough understanding of control allows the modification of an executing process due to unanticipated exceptions, guaranteeing that the modified model will not have behavioral anomalies: it suffices to (incrementally) recompute the threading for the new model, using the techniques of Section 6.
Using the theory of threads we have extended basic control models adding so-called unbalanced connectors that mix threads in a controlled way [23]. Thread theory is used to identify which activities should be canceled. This is a generalization of the rule in VPL/Rasp that once a token is put into an output socket of the model (an output socket of a model is the equivalent of an exit node) all pending activities within the model are canceled. While the semantics in terms of Petri nets is complex, the informal semantics can be simply stated as “abort now all activities that are not required to produce output”. The theory of threads identifies those places based on the threads of the predecessors and the successor of the unbalanced connector. A further generalization of the unbalanced connector not only deletes extra tokens, but also adds those that are now needed. The generalized unbalanced connector has an even more complex semantics in terms of Petri nets, but its informal semantics is still simple: “abort now all activities that are not required to produce output and create tokens so as to avoid deadlock”. Again, the theory of threads identifies those places that will be affected.
ACKNOWLEDGMENTS
This paper has been improved by the comments of one anonymous referee.
Appendix: Proofs of Theorems
This appendix includes most proofs. In some cases where the full proofs are lengthy and do not give much insight, only a sketch of the proof is provided and the reader is referred to the original proofs.
Proof of Theorem 4.1 A process model is useful if and only if it has no overloaded markings and is deadlock-free.
(If.) Let $\pi$ be a complete process that corresponds to one execution of the P/T-system of a CICN model. Assume $\pi$ has a useless place $q(b)$. Then there is no path from $b$ to a place corresponding to exit, that is, if $q(x) = \mathit{exit}$ then $(b, x) \notin F^+$. Then for every successor $b'$ of $b$ there is no path that leads to exit. Because $\pi$ is a complete process (hence finite), there must be at least one successor $b''$ of $b$ that has no successors. The state denoted by $\pi$ is such that $M(q(b'')) > 0$ and $q(b'') \neq \mathit{exit}$. That is, the final state of $\pi$ is either overloaded or a deadlock.
(Only if.) Assume the model has an overloaded state or a deadlock. In any case there must be a place $p \neq \mathit{exit}$ and a process $\pi$ whose final state $M$ satisfies $M(p) > 0$. From the definition of reachable state of a process, there must be a place $b \in B$ such that $q(b) = p$ and $b^\bullet = \emptyset$. Then $p$ is a useless place in $\pi$.
□
Proof of Theorem 4.2 A process model has simple control if and only if it is single-response and useful.
(If.) From Theorem 4.1 the model has neither overloaded states nor deadlocks, hence every final state $M$ satisfies $M(p) = 0$ for every place $p \neq \mathit{exit}$. Because the model is single-response, $M(\mathit{exit}) = 1$, so the model has simple control.
(Only if.) Immediate from the definition of simple control.
□
Proof of Theorem 5.1 Given a model whose net is $(P, T, F)$, its connected net has an extra transition $t_c$ such that ${}^\bullet t_c = \{\mathit{exit}\}$ and $t_c^\bullet = \{\mathit{start}\}$. Then, if the model has a threading $\psi$, for every transition $t \in T \cup \{t_c\}$,
$$\bigoplus_{p \in {}^\bullet t} \psi(p) = \bigoplus_{p \in t^\bullet} \psi(p).$$
We first prove that equality for $t_c$ and then for the other transitions. For $t_c$, we have ${}^\bullet t_c = \{\mathit{exit}\}$ and $t_c^\bullet = \{\mathit{start}\}$. From the definition of label equivalence, $\psi(\mathit{exit}) = \mu$, and from the definition of labeling, $\psi(\mathit{start}) = \mu$. Let $t \in T$ be a transition different from $t_c$. Because all labels in a given place are equivalent, the proof can be done by choosing any label from each predecessor and successor of $t$. Let $a_1, \ldots, a_n$ be a compatible set of labels, one for each of the successors of $t$. Let $p$ be a predecessor of $t$. Then a label for $p$ is
p @ t @ (a_1 @ ... @ a_n)
and a summation of labels from predecessors of $t$ is
[Because of rule 4] [Because of rule 7]
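The invariant equality that Theorem 5.1 states for every transition can be checked mechanically on a small net. The sketch below rests on a simplifying assumption that is not part of this article: threads are replaced by rational weights, with $\mu = 1$ and thread addition replaced by ordinary addition, so that an and-split divides the weight of its input among its outputs. The net, the place names, the transition name `connect` (standing for the extra transition from exit back to start) and the weights are all hypothetical.

```python
from fractions import Fraction

# Hypothetical net: each transition maps to (input places, output places).
TRANSITIONS = {
    "split": ({"start"}, {"p1", "p2"}),
    "A": ({"p1"}, {"p3"}),
    "B": ({"p2"}, {"p4"}),
    "join": ({"p3", "p4"}, {"exit"}),
    "connect": ({"exit"}, {"start"}),  # extra transition of the connected net
}

HALF = Fraction(1, 2)
# Assumption: numeric weights stand in for threads, with mu = 1.
WEIGHT = {
    "start": Fraction(1), "exit": Fraction(1),
    "p1": HALF, "p2": HALF, "p3": HALF, "p4": HALF,
}


def violations(transitions, weight):
    """Return the transitions whose input and output weights differ."""
    bad = []
    for name, (pre, post) in transitions.items():
        if sum(weight[p] for p in pre) != sum(weight[p] for p in post):
            bad.append(name)
    return bad


# An empty list means the weights form a place invariant of the connected net.
print(violations(TRANSITIONS, WEIGHT))
```

Numeric weights cannot express incomparable threads, so this check is weaker than a threading; it only shows the shape of the per-transition equality.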
Proof of Lemma 5.1 If all labels have no cycles (hence are finite), the net defined by the four-step construction is a causal net, i.e. (a) it has no cycles; (b) each place has at most one successor; and (c) each place has at most one predecessor.
For a proof of this lemma, please refer to [7]. The proof hinges on the fact that if the net had a place with more than one successor, then the labels would not be consistent. Now, if no place has more than one successor, a cycle would be an infinite cycle only obtainable from an infinite label or from an infinite set of finite labels, but all labels are finite. Finally, if there were a place with more than one predecessor, the labels would be irreducible to $\mu$ by the given rules.
□
Proof of Lemma 5.2 If all labels have no cycles, the causal net of the four-step construction is a process of the P/T-system of the model that begins in a state with one token in each place related to the labels (and no more tokens) and ends in a state with one token in an output socket (and no more tokens).
For a proof of this lemma, please refer to [7]. The proof hinges on the fact that all labels end in the exit node and that if the labels add up to $\mu$ they must be reducible to $\mu$.
□
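The three conditions that Lemma 5.1 lists for a causal net are easy to test on a concrete net. The following sketch assumes a hypothetical representation, a dictionary from transition names to pairs of input and output places, and a hypothetical function name `is_causal_net`; it is meant only to make conditions (a) to (c) concrete, not to reproduce the four-step construction.

```python
from collections import defaultdict


def is_causal_net(transitions):
    """Check conditions (a)-(c) for {transition: (inputs, outputs)}."""
    consumers = defaultdict(int)   # place -> number of successor transitions
    producers = defaultdict(int)   # place -> number of predecessor transitions
    successors = defaultdict(set)  # node -> nodes that directly follow it
    for name, (pre, post) in transitions.items():
        node = ("t", name)
        for place in pre:
            consumers[place] += 1
            successors[("p", place)].add(node)
        for place in post:
            producers[place] += 1
            successors[node].add(("p", place))
    if any(count > 1 for count in consumers.values()):  # violates (b)
        return False
    if any(count > 1 for count in producers.values()):  # violates (c)
        return False

    state = {}  # node -> "open" while on the search path, "done" afterwards

    def has_cycle(node):
        state[node] = "open"
        for nxt in successors.get(node, ()):
            if state.get(nxt) == "open":
                return True
            if nxt not in state and has_cycle(nxt):
                return True
        state[node] = "done"
        return False

    # Condition (a): a depth-first search must not find a cycle.
    return not any(has_cycle(n) for n in list(successors) if n not in state)


sequence = {"t1": ({"a"}, {"b"}), "t2": ({"b"}, {"c"})}
conflict = {"t1": ({"a"}, {"b"}), "t2": ({"a"}, {"c"})}
print(is_causal_net(sequence), is_causal_net(conflict))
```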
Proof of Theorem 5.2 If a process model has a threading $\psi$ then it has simple control.
Because the threading is a place invariant (proved in Theorem 5.1) every reachable state $M$ must satisfy $\psi(M) = \mu$. Let $\{a_1, \ldots, a_n\}$ be a set of consistent labels, one from each place marked in $M$ (these labels exist, just take for each successor transition the same label). Because the labeling is a threading, this set of labels adds up to $\mu$, hence it is possible to build a process that ends with a single token in the exit node. Hence, given any reachable state it is possible to reach an adequate final state (i.e. no reachable state is a deadlock, overloaded, or multiple-response). Thus the model has simple control.
□
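The conclusion of this proof, that from any reachable state an adequate final state can be reached, can be cross-checked by brute force on small models. The sketch below enumerates the reachable states of a net and tests that each of them can still reach the state with a single token in exit. It is a reachability-based check, not the thread-based analysis of this article, and it assumes a safe net with a finite state space, so a marking is simply a set of marked places (a second token in a place would go unnoticed). The net representation and the function names are hypothetical.

```python
def fire(marking, pre, post):
    """Fire a transition of a safe net; a marking is a frozenset of places."""
    return (marking - pre) | post


def reachable(transitions, initial):
    """Map every reachable marking to the set of its direct successors."""
    graph, todo = {}, [initial]
    while todo:
        m = todo.pop()
        if m in graph:
            continue
        graph[m] = {fire(m, pre, post)
                    for pre, post in transitions.values() if pre <= m}
        todo.extend(graph[m])
    return graph


def has_simple_control(transitions, start="start", exit_place="exit"):
    """True if every reachable marking can reach the marking {exit}."""
    graph = reachable(transitions, frozenset({start}))
    final = frozenset({exit_place})
    good = {final} if final in graph else set()
    changed = True
    while changed:  # backward closure from the final marking
        changed = False
        for m, succs in graph.items():
            if m not in good and succs & good:
                good.add(m)
                changed = True
    return len(good) == len(graph)


parallel = {"split": ({"start"}, {"p1", "p2"}), "A": ({"p1"}, {"p3"}),
            "B": ({"p2"}, {"p4"}), "join": ({"p3", "p4"}, {"exit"})}
# An or-split followed by an and-join: the join waits forever.
mismatch = {"left": ({"start"}, {"p1"}), "right": ({"start"}, {"p2"}),
            "join": ({"p1", "p2"}, {"exit"})}
print(has_simple_control(parallel), has_simple_control(mismatch))
```

As noted in the conclusion, a check of this kind reports that an anomaly exists but gives no clue about its cause, which is what the thread-based analysis adds.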
Proof of Lemma 5.3 If a model has simple control all places have at least one label.
For a proof of this lemma, please refer to [7]. The proof is based on two other lemmas, which state: (1) If the connected net has a trap that does not include start, then the model does not have simple control. (2) If there is a labeling that includes a place without labels, then the net has a trap that does not include start. (In Petri net theory, a trap is a set $X$ of places such that $X^\bullet \subseteq {}^\bullet X$.)
□
Proof of Lemma 5.4 If a model has simple control, all labels for a given place are equivalent, i.e. for every place $p \in P$ and all labels $l_1, l_2$ of $p$, $l_1 \equiv l_2$.
Assume there is a place $p$ with two non-equivalent labels $l_1 \not\equiv l_2$, such that all successors of $p$ only have equivalent labels (i.e. $p$ is the first offending place discovered by a labeling algorithm working backwards). Let $M$ be a reachable state that marks $p$. This state must exist because if the net has simple control, the net is live [22,29]; because the net is safe (or 1-bounded), $M(p) = 1$. Then, let $M'$ be a state obtained from $M$ by firing transitions that do not consume the token in $p$ until no more transitions can be fired without consuming that token. Because the model has simple control, from $M'$ all processes end in a state with a single token in the exit place. Thus, it is possible to choose one label from each place marked by $M'$ such that the summation of all labels is $\mu$. In particular, we might choose $l_1$ to be in that set. From the set of labels we can construct a process. In this process all places but those marked by $M'$ are successors (under $F^+$) of $p$. Now, because the choice of $l_1$ was arbitrary ...
□
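The auxiliary lemmas behind Lemma 5.3 speak of traps, that is, sets $X$ of places such that $X^\bullet \subseteq {}^\bullet X$: every transition that consumes a token from $X$ also puts a token into $X$. A minimal sketch of that test follows; the net representation (a dictionary from transition names to input and output places), the example net and the function name `is_trap` are hypothetical.

```python
def is_trap(places, transitions):
    """True if every transition consuming from `places` also produces into it."""
    places = set(places)
    for pre, post in transitions.values():
        if pre & places and not (post & places):
            return False
    return True


# Hypothetical connected net: `connect` links exit back to start.
NET = {
    "split": ({"start"}, {"p1", "p2"}),
    "A": ({"p1"}, {"p3"}),
    "B": ({"p2"}, {"p4"}),
    "join": ({"p3", "p4"}, {"exit"}),
    "connect": ({"exit"}, {"start"}),
}

print(is_trap({"start", "p1", "p3", "exit"}, NET))  # this trap includes start
print(is_trap({"p1"}, NET))                          # A empties p1 without refilling it
```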
Proof of Theorem 5.3 A model that has simple control has a threading.
From Lemma 5.3 all places have a label. From Lemma 5.4 all labels in a given place are equivalent. To prove that the labeling defines a threading we only need to prove that a label $a$ for the start node is equivalent to $\mu$. But that must be the case, because from the second part of the proof of Theorem 5.1, if all labels are equivalent the threads are an invariant. Now, because the model has simple control the threads of all final states add up to $\mu$, hence the thread of the initial state must also be $\mu$. (This can be seen more clearly by executing the net backwards; the invariant property holds both ways.)
□
REFERENCES
1. Abbati, D., Caselli, S., Conte, G., and Zanichelli, F. (1993). Synthesis of GSPN models for workload mapping on concurrent architectures. Proceedings of the International Workshop on Petri Nets and Performance Models.
2. De Michelis, G., and Grasso, M. A. (1993). How to put cooperative work in context: Analysis and design requirements. In Issues of Supporting Organizational Context in CSCW Systems (L. Bannon and K. Schmidt, Eds). 31 August.
3. Dennis, A. R., Hayes, G. S., and Daniels, R. M. (1994). Re-engineering business process modeling. Proceedings of the Twenty-Seventh Annual Hawaii International Conference on System Sciences.
4. Desel, J., and Esparza, J. (1995). Free-choice Petri Nets. Tracts in Theoretical Computer Science 40, Cambridge University Press, Cambridge.
5. Ellis, C. A., and Keddara, K. (1995). Dynamic Change within Workflow Systems. University of Colorado Technical Report, July.
6. Gulla, J. A., and Lindland, O. A. (1994). Modeling cooperative work for workflow management. 6th International Conference on Advanced Information Systems Engineering, CAISE, June.
7. Hurtado, C. A. (1995). Modelación y Análisis de Procesos de Información. Tesis de Magíster, Pontificia Universidad Católica de Chile, Santiago (in Spanish).
8. Harel, D., Lachover, H., Naamad, A., Pnueli, A., Politi, M., Sherman, R., Shtull-Trauring, A., and Trakhtenbrot, M. R. (1990). STATEMATE: A Working Environment for the Development of Complex Reactive Systems. IEEE Trans. on Software Engineering, 16(4).
9. Integration Definition for Functional Modeling (IDEF0). National Institute of Standards and Technology, USA, 1992.
10. Curtis, W., Kellner, M. I., and Over, J. (1992). Process modeling. Communications of the ACM, 35(9).
11. Lamb, D. A. (1988). Software Engineering: Planning for Change. Prentice-Hall, Englewood Cliffs, N.J.
12. Lauterbach, K. (1987). Linear algebraic techniques for place/transition nets. Lecture Notes in Computer Science, 255.
13. Malone, T. W., and Crowston, K. (1994). The Interdisciplinary Study of Coordination. ACM Computing Surveys, 26(1).
14. Medina-Mora, R., Winograd, T., Flores, R., and Flores, F. (1992). The Action Workflow approach to workflow management technology. Proceedings of CSCW, November.
15. Messer, B., and Faustmann, G. (1995). Efficient Video Conference via Workflow Management Systems. Workshop “Synergie durch Netze”, Universität Magdeburg, October. (English translation by the authors.)
16. National Institute of Standards and Technology (NIST) (1993). Integration Definition for Function Modeling (IDEF0). FIPS Pub 183, NIST, December.
17. Parry, M. (1994). Reengineering the Business Process. The Workflow Paradigm, Future Strategies Inc.
18. Peters, L., and Schultz, R. (1993). The application of Petri nets in object-oriented enterprise simulations. Proceedings of the 27th Annual Hawaii International Conference on System Sciences.
19. Pratt, T. W., and Zelkowitz, M. V. (1996). Programming Languages: Design and Implementation. Prentice-Hall, Englewood Cliffs, N.J.
20. Reisig, W. (1985). Petri Nets: An Introduction. Springer-Verlag, Berlin.
21. Robinson, M. (Ed.) (1991). Computer Supported Cooperative Work. Cases and Concepts. Proceedings of Groupware ’91. Software Engineering Research Center.
22. Straub, P., and Hurtado, C. A. (1995). The simple control property of business process models. XV International Conference of the Chilean Computer Science Society, Arica, Chile, 30 October-3 November.
23. Straub, P., and Hurtado, C. A. (1995). A theory of parallel threads in process models. Technical Report RT-PUC-DCC-95-05. Computer Science Department, Catholic University of Chile, August (in URL ftp://ftp.ing.puc.cl/pub/escuela/dcc/techReports/rt95-05.ps).
24. Straub, P., and Hurtado, C. A. (1996). Understanding behavior of business process models. In Coordination Languages and Models, First International Conference, Coordination’96, LNCS 1061, Springer, Cesena, Italy, April 15-17.
25. Straub, P., and Hurtado, C. A. (1996). Business process behavior is (almost) free-choice. In Computational Engineering in Systems Applications, Session on Petri Nets for Multi-agent Systems and Groupware, Lille, France, July 9-12.
26. Straub, P., and Hurtado, C. A. (1996). Avoiding useless work in workflow systems. International Conference on Information Systems Analysis and Synthesis, ISAS’96, International Institute of Informatics and Systemics, 14269 Lord Barclay Dr., Orlando, USA, 22-26 July.
27. Swenson, K. D. (1993). Visual support for reengineering work processes. Proceedings of the Conference on Organizational Computing Systems, November.
28. Touzeau, P. (1996). Workflow procedures as cooperative objects. In Computational Engineering in Systems Applications, Session on Petri Nets for Multi-agent Systems and Groupware, Lille, France, 9-12 July.
29. van der Aalst, W. M. P. (1995). A class of Petri nets for modeling and analyzing business processes. Computing Science Report 95/26, Dept. of Computing Science, Eindhoven University of Technology, August.
30. Workflow Management Coalition (1994). Glossary. Document no. TC00-0011, 12 August.