Improving business process intelligence by observing object state transitions

Improving business process intelligence by observing object state transitions

DATAK-01534; No of Pages 21 Data & Knowledge Engineering xxx (2015) xxx–xxx Contents lists available at ScienceDirect Data & Knowledge Engineering j...

2MB Sizes 0 Downloads 42 Views

DATAK-01534; No of Pages 21 Data & Knowledge Engineering xxx (2015) xxx–xxx

Contents lists available at ScienceDirect

Data & Knowledge Engineering journal homepage: www.elsevier.com/locate/datak

Improving business process intelligence by observing object state transitions Nico Herzberg ⁎, Andreas Meyer, Mathias Weske Hasso Plattner Institute at the University of Potsdam Prof.-Dr.-Helmert-Str. 2–3, D-14482, Potsdam, Germany

a r t i c l e

i n f o

Available online xxxx Keywords: Business process management Events Data Process Monitoring BPMN

a b s t r a c t During the execution of business processes several events happen that are recorded in the company's information systems. These events deliver insights into process executions so that process monitoring and analysis can be performed resulting, for instance, in prediction of upcoming process steps or the analysis of the run time of single steps. While event capturing is trivial when a process engine with integrated logging capabilities is used, manual process execution environments do not provide automatic logging of events, so that typically external devices, like bar code scanners, have to be used. As experience shows, these manual steps are error-prone and induce additional work. Therefore, we use object state transitions as additional monitoring information, so-called object state transition events. Based on these object state transition events, we reason about the enablement and termination of activities and provide the basis for process monitoring and analysis in terms of a large event log. In this paper, we present the concept to utilize information from these object state transition events for capturing process progress. Furthermore, we discuss a methodology to create the required design time artifacts that then are used for monitoring at run time. In a proof-of-concept implementation, we show how the design time and run time side work and prove applicability of the introduced concept of object state transition events. © 2015 Elsevier B.V. All rights reserved.

1. Introduction Nowadays, companies face a very competitive and fast changing market environment. Therefore, they strive to run their value generating operations in a process-oriented way to keep the pace. These business processes are managed by using techniques and methodologies of business process management (BPM). BPM deals with the organization, documentation, analysis, optimization, and execution of business processes [1]. One important aspect of BPM is business process intelligence (BPI) that comprises process analysis, monitoring, and mining [2,3]. For using BPI approaches, information about process behavior and events that occur during process execution need to be present in a certain quantity and quality. Otherwise, the derived conclusion might not correspond to the actual execution. Process monitoring is used to predict upcoming events and process steps by observing the actual process execution and deriving the corresponding process behavior. Process monitoring is usually applied on process models being enacted by a process engine – an information system that controls the process execution – because it generally provides logging capabilities out of the box. Therewith, the current process progress is easily recognizable from the observed events. The observation of such an event determines the current position in the course of process execution. In contrast, manually executed processes, as, for instance, usual in healthcare, lack these capabilities such that most events cannot be observed or stored, i.e., event logs do not exist or are incomplete. However, recognition and prediction of process progress as well as a holistic view on the process ⁎ Corresponding author. E-mail addresses: [email protected] (N. Herzberg), [email protected] (A. Meyer), [email protected] (M. Weske).

http://dx.doi.org/10.1016/j.datak.2015.07.008 0169-023X/© 2015 Elsevier B.V. All rights reserved.

Please cite this article as: N. Herzberg, et al., Improving business process intelligence by observing object state transitions, Data Knowl. Eng. (2015), http://dx.doi.org/10.1016/j.datak.2015.07.008

2

N. Herzberg et al. / Data & Knowledge Engineering xxx (2015) xxx–xxx

Perform conservative treatment

Admit patient

Examine patient and assess risk

Educate patient

Perform surgery

Perform aftercare

Information Sheet

Surgery Protocol

[signed]

[completed]

Release patient

Fig. 1. Abstracted treatment process in a hospital, starting with the admission of the patient, followed by its examination and the treatment alternatives, i.e., conservative treatment and surgery, and ending with the release of the patient.

execution is necessary in these domains as well. They also need to know exactly what is going on, how the process performs, and where the points of improvements are. The lack of events could be fixed by introducing additional external devices and similar logging equipment. However, it is not applicable in manual processes – especially in time critical ones as in healthcare – due to extra amount of work and complexity. Therefore, event monitoring points [4] are defined to specify where in the process events are expected. This event information is than extracted from the IT system landscape and correlated to the particular process execution. Event monitoring points can be considered as milestones indicating the achievement of important business value. In this paper, we apply the concept of event monitoring points to data objects to create more expressive event logs without adding additional logging mechanisms or devices such as scanners, RFID tags, or sensors. We further elaborate on the ideas about the concept of object state transition events (in short: transition events), which have been initially sketched in Ref. [5] and discussed in more detail in Ref. [6]1. Thereby, we reason about activity termination (enablement) by means of events indicating the data objects to be written (read) by that activity transitioned into the respective object state and vice versa. Additionally, this technique also increases the number of events observed in enacted process models such that analysis is based on a much higher number of inputs resulting in an increased reliability of the results. The remainder of the paper is structured as follows. We present a motivating example from the healthcare domain and discuss the arising challenges in Section 2. In Section 3, we introduce the foundation combined with the scope of our approach followed by an approach overview and detailed descriptions of its application in Section 4. The general methodology to apply the presented approach is defined in Section 5. Section 6 describes the application of the introduced approach to a use case during design time (model view) and during run time (instance view). In Section 7, the system architecture including an implementation is described, followed by an evaluation of the approach by applying the use case from Section 6 to the implemented software system. Section 8 discusses the additional benefits of the presented approach for BPI application in non-automated process execution environments. Finally, we discuss related work in Section 9 and conclude the paper in Section 10.

2. Motivating example Within healthcare, business processes are executed by humans in a manual fashion. Some steps are possibly connected with IT systems, e.g., the admission and release of a hospital's patient. During patient treatment several information are taken to paper and various forms are filled and brought into document systems later on. In Fig. 1, an abstracted treatment process is shown. The process starts with the admission of the patient, followed by its examination and the risk assessment for a particular treatment. Based on that, the decision between a conservative treatment and a surgery is made. In case of a surgery, the patient must be educated beforehand and needs aftercare afterwards. In any case, the patient is released after the treatment. In this process, two documents are of importance, the Information Sheet and the Surgery Protocol. These documents are valuable, because they are the manipulated artifacts during process execution and describe the processing results through their changing states. For instance, a Surgery Protocol has several parts like patient data, staff data, medical data, and surgery data. Filling those parts put the document in a certain state, e.g., patient data filled representing completeness of this respective part. The document Information Sheet is, for instance, described through the states initial, filled, and signed. During process execution, the Information Sheet could be in one of these states at some point in time but in no other. In this scenario, only the first activity Admit patient and the last activity Release patient describe work that is done in an IT system. Thus, the usual log of recorded activities for a particular process execution only contains events for these activities and is as such not efficiently usable for typical BPI applications, e.g., process mining [3] and business process monitoring [7]. Therewith, this process environment challenges the requirement for business process monitoring, as (a) several steps of the patients treatment are not represented in IT systems – in fact, only the admission and the release of a patient are –, (b) there is no central system to collect

1

This paper at hand is a special issue paper extending the paper [6] published through ER′13 conference.

Please cite this article as: N. Herzberg, et al., Improving business process intelligence by observing object state transitions, Data Knowl. Eng. (2015), http://dx.doi.org/10.1016/j.datak.2015.07.008

N. Herzberg et al. / Data & Knowledge Engineering xxx (2015) xxx–xxx

3

the information needed, and (c) various valuable information about the treatment process may be spread over various systems, but it is not easy to relate it to the particular process execution. In this paper, we focus on a solution to challenge (a). As we have valuable information about the used documents during process execution, we utilize knowledge about the utilization of data objects and their states to get additional insights about particular process executions. For instance, while only having information about the admission and release of a patient, the addition of information from the used documents will help to reason about the middle part of the process. Knowing that an Information Sheet is signed by a patient leads to the reason that in a particular treatment process the lower part, i.e., the surgery, was chosen. Having the information that the Surgery Protocol is completed but the patient is not released yet, leads to the insight that the patient is still in aftercare. However, using the information about the data objects and their states will lead to additional challenges similar to (b) and (c). The information about data objects may be stored in different systems and has to be accessed for utilizing it for business process monitoring. Furthermore, a process monitoring approach must be able to correlate existing information to the particular process instance.

3. Foundation Within our approach, we connect process models with object life cycles which describe the actions and manipulations performed upon data objects. Thereby, a data object is any piece of information being processed, manipulated, or worked with during process execution. Each data object can exist in several states which represent specific business situations of interest from the data object's point of view. The dependencies of these object states are defined by means of object life cycles. Definition 1. Object life cycle An object life cycle is a tuple L = (S, Z, i, T, Σ, η), where S is a finite non-empty set of object states, Z ⊆ S is a set of final object states, i ∈ S\Z is an initial object state, T ⊆ S × S is a finite set of object state transitions, Σ is a finite set of labels (S, T, and Σ are disjoint), and function η : T → Σ assigns to each object state transition the corresponding label representing the action initiating that transition. We represent an object life cycle as a state transition diagram, where elliptic nodes represent the object states and the transitions are represented by edges connecting two states. Each edge is labeled with the action leading to the state change. For readability reasons, we abandon from edge labels within this paper in most of the figures. Fig. 2 shows the object life cycle for the data object Information Sheet from our healthcare example in Section 2. This data object consists of three states: initial, filled, and signed. The object state transitions describe the transition of one of these states to another, i.e., from initial to filled and from filled to signed. Those state transitions might be represented through event objects during run time. We refer to an event object as real-world happening occurring in a particular point in time at a certain place in a certain context that is represented in the information system landscape [8], e.g., the documentation of the patients signature on the information sheet. Each event object can be correlated to any positive number of nodes or object state transitions and is stored in an IT system such that they are accessible as (semi-) structured data. In the scope of this paper, we focus on event objects that can be correlated to object state transitions. Thereby, we assume that all expected events are recognized and stored, i.e., we exclude the discussion of missing events from this paper. By using data objects and their states in process models, these data objects are utilized during process execution. Thus, the event objects relating to the data objects' life cycles are relevant for the corresponding process execution. We formally define the process model and set the scope of the process models, we support with our approach. As an example, we refer to the process model shown in Fig. 1. Definition 2. Process model A process model is a tuple M = (N, D, R, C, F, type), where N ⊆ (A ∪ G ∪ E) is a finite non-empty set of nodes, which comprises sets of activities A, gateways G, and events E. D is a finite non-empty set of data objects, R is a finite non-empty set of object states, C ⊆ (A ∪ G ∪ E) × (A ∪ G ∪ E) is the control flow relation, F ⊆ (A × (D × R)) ∪ ((D × R) × A) is the data flow relation representing write and read operations of activities respectively, and type : G → {xor, and} assigns to each gateway a type. Let Sd be the set of object states defined in the object life cycle for a data object d ∈ D, then the super set of object states S M ¼ ∑d ∈ D Sd for process model M denotes the set of object states specified in the object life cycles of all data objects being associated to an activity of M. The set of object states R used in M is a subset to the super set of object states, i.e., R ⊆ S M . Further, we require each process model M to fulfill basic structural requirements. First, the process model may be arbitrarily structured but it needs to be structural sound [9], i.e., M contains exactly one start and one end event and every node of M is on a path from the start to the end event. Further, each activity and event has at most one incoming and one outgoing control flow edge, while each gateway has at least three incident control flow edges with at least one incoming and at least one outgoing control flow edge. The events added to a process model only represent a small fraction of the events actually occurring during process execution such that most events are not comprised in the model but occur during process execution.

i

filled

signed

Fig. 2. Object life cycle of data object Information Sheet.

Please cite this article as: N. Herzberg, et al., Improving business process intelligence by observing object state transitions, Data Knowl. Eng. (2015), http://dx.doi.org/10.1016/j.datak.2015.07.008

4

N. Herzberg et al. / Data & Knowledge Engineering xxx (2015) xxx–xxx

The control flow semantics of the process model follows Petri net semantics [10]. The data flow semantics inherits the concept of data input and output sets from the Business Process Model and Notation (BPMN) [11]. The data input set contains a positive number of sets, each specifying the data objects in specific object states required to enable the activity. If one of the sets is completely satisfied by the existing data objects, activity enablement takes place. Otherwise, process execution is paused until satisfaction is observed. The check for data object existence for a certain activity is only done, if the control flow approaches that activity. Following, we assume that activity enablement requires control flow and data flow enablement at the same time. Analogously, the data output set specifies different options of which data objects in which object states exist after activity termination. The corresponding set is chosen during the execution of the respecting activity. Based on these data flow semantics, we assume that the observation of a data object write (the data object exists in the specific object state) indicates activity termination and vice versa. Finally, we also assume that the process model only utilizes object state transitions comprised in the object life cycle, i.e., the combination of process model and object life cycles fulfill the notion of weak conformance [12]. The object life cycle may contain object states and transitions not utilized in the process model. During process execution described by the above mentioned process models, several information about changes or exceptions in the business process environment is provided by event objects as well, e.g., the termination of a certain activity. Often those event objects are not stored at one place, but in several information systems or business process management systems [13]. We refer to events being unstructured and available in an IT system as raw events. For utilizing this event information and for further processing, structuring of raw events is required according to a certain description referred to as structured event type. In the first step, it transforms raw events into normalized events (see Fig. 3). Normalized events are the basis for further processing by, for instance, combination, aggregation, and enrichment by context data [14] and other event processing techniques. In this paper, we distinguish between two event objects building up on normalized events (a) representing a state change of a process model node, e.g., an activity, referred to as process events [8] and (b) representing a state change of a data object, referred to as object state transition event (see Section 4). These events are used during process execution for process monitoring, adjustment, and control [8,15,16]. The connection between process events and the particular places in the process models, i.e., the process model nodes, is established by process event monitoring points (PEMP). A PEMP specifies which process events are valid to describe a certain process state transition respectively process model node transition by utilizing defined node life cycles. A node life cycle describes the states and the corresponding state transitions of a process model node. In this paper, we assume a fixed node life cycle valid to all nodes consisting of state transitions initialize, enable, begin, terminate, skip, and disrupt for simplicity reasons. In figures, we restrict this to (e)nable, (b)egin, and (t)erminate. However, the presented approach also works with a separate node life cycle for each process model node.

Fig. 3. Event processing framework overview: event processing phases representation, normalization, object state transition event creation, and process event creation and their utilized concepts including the interdependencies.

Please cite this article as: N. Herzberg, et al., Improving business process intelligence by observing object state transitions, Data Knowl. Eng. (2015), http://dx.doi.org/10.1016/j.datak.2015.07.008

N. Herzberg et al. / Data & Knowledge Engineering xxx (2015) xxx–xxx

5

4. Approach In this section, we will introduce the details and the scope of our approach. First, we present the general idea in Section 4.1 before we get into the details of core concepts. Section 4.2 discusses the concept of object state transitions, names the challenges, and shows solutions for them. The notion of binding is presented in Section 4.3 followed by the description of the correlation aspects between data object and process instances in Section 4.4. At the end of this section, we will elaborate on the limitations of our approach in Section 4.5. Since our approach is generally applicable, we present the concepts of our approach with abstract and generic models. Design time representations utilize uppercase letters, e.g., X and Q , while run time representations use lowercase letters, e.g., x and q, both representing the first activity and a data object written by it on the process level in Fig. 4. Additionally, supporting concept descriptions, we also link parts the generic discussions to the healthcare example in Section 2. Fostering the general applicability, Section 6 applies our concepts to a second use case: an order process.

4.1. General idea The presented approach enables process execution analysis by utilizing the information about data objects and their life cycles. Thus, the required data objects and their life cycles need to be modeled first. In Fig. 4—Data Object Life Cycle Level, we provide an example with one data object. Assume, data object Q ∈ D may appear in five different states (init, a, b, c, and d), which are reached by the state transitions set a, set b, set c, and set d. For the given life cycle, the observable state transitions need to be selected; a transition to a particular object state is observable, if it can be monitored by a specific event named object state transition event (see Definition 3). In our motivating example with its data object Information Sheet defined by the states initial, filled, and signed, the state transition from filled to signed is of interest and need to be monitored by a transition event. In Fig. 4, we denote observable object state transitions with bold (set b) and dashed (set c) lines. Definition 3. Object state transition event An object state transition event E = (type, id, timestamp, q, s) consists of an object state transition event type referencing the corresponding data object Q ∈ D, a unique identifier, a timestamp indicating the point in time when the object transition occurred, the instance q of the data object Q , and the object state s of object Q reached when the event was triggered. An object state transition event E is an instance of an object state transition event type (see Definition 4) that describes the creation logic for the underlying transition events based on the corresponding data object. The information about the object state in the transition event is necessary to enable the correlation of the object state transition to the corresponding activity in the business process. The transition event holds a snapshot of the data object at the point in time a particular object state is reached. This is required to enable process analysis with respect to certain process data as, for instance, object attributes. Assume a stakeholder is interested in the costs of activities executed in the course of process model M (see Fig. 4) and data object Q contains an attribute in which execution costs are summed up. For a specific process model instance m of M and therefore a specific data object instance q of Q, the required process analysis would be about the costs recorded at every step in the data object instance q. Another analysis targeting execution times allows identifying bottlenecks in the process model and both mentioned analyzes may visualize room for process improvements, e.g., by knowing how long it takes from the signature of the Information Sheet by the patient until the surgery starts.

Fig. 4. Overview of the approach at design time and run time.

Please cite this article as: N. Herzberg, et al., Improving business process intelligence by observing object state transitions, Data Knowl. Eng. (2015), http://dx.doi.org/10.1016/j.datak.2015.07.008

6

N. Herzberg et al. / Data & Knowledge Engineering xxx (2015) xxx–xxx

The object state transition event aligns to structured events as defined by Herzberg et al. [8] with the event content from their definition being represented by the snapshot of the data object q and data state s reached through the respecting transition. Referring to this work, we define an object state transition event type ET for each data object that is managed in an event type repository, Fig. 4—Event Level. Definition 4. Object state transition event type An object state transition event type is a tuple ET ¼ ðQ ; BÞ, where Q ∈ D is the corresponding data object and B is a set of bindings that are required to access the information about the state transitions of the particular data object Q. The event type refers to the data object affected by a state transition represented by an object state transition event. Additionally, the object state transition event type holds information about the set of all bindings available for the particular data object. A binding links an object state transition of a specific object life cycle with an information system allowing the identification of the respecting state change information, because information about the transition of a data object to a new object state is represented in the information system landscape as shown in Fig. 4—IT Level. Such state change information may be, for instance, the insertion or update of an entry with a specific identifier in a database, e.g., the timestamp to the signature of the Information Sheet. An object state transition is observable if and only if there exists a binding for that transition. During process modeling, data objects may be associated with several process models. In our example (see Fig. 4—Process Level), data object Q is associated with process model M and is read and written in different states (a, b, and c) by particular activities, which are in sequence X, Y, Z surrounded by a start and an end event. An activity reads a data object in a certain state, creates a data object in a certain state, or transfers a data object from one state into another one. For instance, activity Z transfers Q from state b to state c only although the object life cycle would also allow a transition to state d. Altogether, the given process model conforms to the given object life cycle [17]. During the process execution (run time) the object state transition events are created according their specification described by the object state transition event type. The event store (Event Level in the Instance View in Fig. 4) comprises all transition events generated and persists them for further usage. Information about process execution can be gathered from the assignment of data objects to activities and the transition events. We assume that an activity is enabled if the activity could be executed with respect to the control flow specification and if all required input data objects are available in the corresponding object states as specified in the data object input set. Relating to our motivating example from Section 2, the surgery can be started once the Information Sheet is signed. Analogously, we assume that an activity is terminated once the data objects are present in the respective object states as specified in the data object output set. For instance, the signed Information Sheet indicates that the education of the patient is finished. A data object input respectively output set specifies for each activity the input respectively output data objects including their object states and their required combination. As aforementioned, process execution can be monitored for those activities that consume respectively provide data objects in certain object states, where the corresponding object state transitions can be observed by events. In Fig. 4, the object state transition set b is observable because of an existing binding and event type. If a corresponding event appears in the event store, we can reason about activity termination for the corresponding process instance. The object state transition set a cannot be observed such that it is not possible to derive information about the termination of activity X through data object information. Knowing about the existence of output data objects and the termination of the corresponding activity allows to derive the enablement of the directly succeeding activity from that information, if the succeeding activity requires a subset of the output data objects from the preceding activity. In Fig. 4, observing a transition event correlating to the transition set b, we consider activity Y of the respecting process instance to be terminated and activity Z gets enabled, because it uses exactly the output of Y as input. In the figure, we denote this insight by the bent arrow from t to e between both activities in the instance view. Furthermore, an event correlating to transition set c is not yet being observed although it is expected to happen at some point in time. This, we indicate with a dashed bold line between object states b and c in the object life cycle.

4.2. Object state transitions Object state transitions highly influence activity enablement and termination that are closely coupled with data object input and output sets respectively, which are specified for each activity. Both sets are comprised by further sets with each set containing a positive number of data objects in specific object states. Fig. 5 represents a process model fragment and an object life cycle comprising the state manipulations done by the activities. Activity A and B are executed in interleaving order; the actual execution order is not enforced by data dependencies due to the extensive data object input and output alternatives and therefore relies on control flow only. In Fig. 5a, the input set {{(Q , b)}, {(Q , x)}} of activity A comprises two sets each containing one entry stating that data object Q is required for enablement either in state b or in state x. In case the input set comprises multiple sets, they will be checked one by one for satisfaction and the first one satisfied by currently existing data objects is utilized to enable the corresponding activity. As already mentioned above, only state transitions with a respecting binding are observable. Referring to Fig. 5b, only the transitions corresponding to activity A in Fig. 5a are observable. Following, execution of activity B cannot be captured by means of object state transition analysis such that we cannot easily reason about enablement of activity C. In fact, between termination of activity X and termination of activity C, we cannot identify the exact current state of the process, i.e., which activity is currently enabled. However, combining the approach presented in this paper with control flow monitoring approaches, the execution of B might be detected through a positioned event monitoring point, preferably to monitor termination of B. Then, we can easily reason about the execution of both interleaving activities and following about enablement of C. Knowing about the existence of output data objects (here: from B) Please cite this article as: N. Herzberg, et al., Improving business process intelligence by observing object state transitions, Data Knowl. Eng. (2015), http://dx.doi.org/10.1016/j.datak.2015.07.008

N. Herzberg et al. / Data & Knowledge Engineering xxx (2015) xxx–xxx

Q [x]

Q [b]

Q [x]

Q [a]

Q [ab]

Activity A

7

Q [ab]

a Activity X

Activity C

x

ab b

Activity B Q [a]

Q [x]

Q [b]

Q [ab]

(a)

(b)

Fig. 5. Process fragment (a) and corresponding object life cycle (b).

and based on that control flow information, we can derive that a state transition corresponding to activity B must have been occurred. If the event monitoring point is related to another happening of B, e.g., enablement or begin, above mentioned reasoning only leads to success if B is executed before A. Assuming the life cycle from Fig. 5b with state ab being independent from states a and b and an activity Z having state x as input and state ab as output, two state transitions are comprised by that activity. If both transitions to state ab can be observed, we can reason about termination of Z. Otherwise, control flow information needs to be considered (see above). In case only one transition to ab is observable but a previous transition on the other path is also observable (as notated in Fig. 5b), we can derive at run time whether reasoning about termination of Z is possible from object information. If the transition to state x is not observable and therefore enablement of Z cannot be implied, observing of state a allows to reason about the start of Z. 4.3. Binding Organizations, which follow a model driven execution of business processes, collect the information about the execution of their processes in information systems. In our approach, we assume that information about events is extracted from existing legacy data stores. All data processed during the process execution is stored in the IT systems including process context data, processed data objects, and occurred events. In this part of the paper, we discuss how the object state transitions represented within a Data Object Life Cycle Level can be mapped to the corresponding data state representations at the IT Level (see Fig. 4). For this purpose, we define the binding of a specific object state transition of a data object to the corresponding object state representation as follows: Definition 5. Binding A binding is a mapping function bind : D × T → I, where D is a finite non-empty set of data objects, T is a finite set of object state transitions modeled within a data object life cycle, and I is the set of implementations, i.e., rules and methods, specifying how to extract event information from different data sources within the information system landscape. The defined binding-function can be applied for various data sources within an IT systems landscape, e.g., for querying databases, processing Web service invocations, analyzing structured datasets, stream processing filters, or reading a log entry. Assuming, for instance, that event information is stored within a relational database, the techniques of schema summarization [18] combined with schema matching techniques [19] and matching approaches based on linguistic comparison [19] can be applied to map the object state transitions to the corresponding data representation and to realize the binding – in this particular example – to columns and rows of a relational database. Thus, the bind function is used at design time to identify the mappings, which are then stored within the binding repository. In the second step, the object state transition events are generated based on the IT Level representation at run time, using the mapping information, which is defined by the binding function. Each discovered event will then be stored within an event store using the structure defined in Definition 3. 4.4. Correlation of data object instances to process instances At run time, the instances of data objects must be correlated to the corresponding process instance to allow reasoning about activity progress as described above. We utilize concepts from Ref. [20] where database concepts have been introduced to BPMN process Please cite this article as: N. Herzberg, et al., Improving business process intelligence by observing object state transitions, Data Knowl. Eng. (2015), http://dx.doi.org/10.1016/j.datak.2015.07.008

8

N. Herzberg et al. / Data & Knowledge Engineering xxx (2015) xxx–xxx

models. Data objects, associated to activities, get annotated with one primary key, any number of foreign keys, and the type of data operation referring to standard CRUD (create, read, update, delete) operations. A data object read by an activity is read, an object written by an activity with no specified data operation type is updated, a written object with type new is created, and a written object with type delete is deleted. The primary key is a unique value. The foreign key shows on another data object's primary key indicating a directed dependency between both objects. Such dependency can be either rather lose in terms of an association in UML class diagrams [21] or a containment in terms of an UML aggregation or composition. Additionally, Ref. [20] utilizes a concept adapted from object-centric process modeling: a case object that drives the overall execution of business process. One case object is specified for each process model (contained sub-processes or multi-instance activities require an own case object specification that might be identical to the parent process). All data objects utilized in a process model, a sub-process, or a multi-instance activity must refer to the case object either directly or indirectly via foreign key relations. Indirectly means, that an object refers to another object that directly or indirectly refers to the case object. Finally, the instance of a case object is directly related to specific process instances by primary key correlation. If the process instance id equals the primary key of the case object, the case object instance and all other object instances being directly or indirectly related to this case object instance via foreign key relations are handled in that process instance. Since an object can be utilized in multiple process models, various process instances handling the same case object instance would be required to have the same process instance id. Therefore, we slightly extend the approach in Ref. [20] by making the process instance identifier two-parted: the first part contains an identifier referencing the process model, the second contains the primary key of the case object. Thus, second part of the process instance id must correlate to the case object primary key to indicate utilization in the process instance. Referring to the abstract example in Fig. 4, the process model consists of three activities reading and writing object Q. All objects Q get assigned a primary key q_id and no foreign key since Q does not relate to any other object. Furthermore, Q is the case object of this process model. Differently to this example, a case object might have foreign key relations inherited from other process that are not used in the current. For completeness reasons, they can be modeled but can also be suppressed for clarity reasons. Object Q in state a also gets assigned new as data operation type while the other two data objects have an empty data operation type. The process model in Fig. 6 visualizes these changes to the abstract example. Assume, there exist – among others – three instances q1, q2, q3 of data object Q as given on the very left in Fig. 6. Additionally, there exist several process instances as, for instance, the four given on the very right in Fig. 6. Here, all instances belong to the same process model M. Then, we recognize an object state transition event indicating set b for the instance of object Q where the primary key q_id equals 3. Since Q is the case object, this state transition affects all process instances, where the second part of the process instance id is 3. In this case, process instance m11 is affected such that we

Fig. 6. Abstract example to correlate a data object instance to a process instance. The top shows the adapted process model from 4 based on data concepts introduced in Ref. [20] enabling such correlation. These concepts are primary key, foreign key, data operation type referring to CRUD operations, and case object. The bottom shows an example for three instances of object Q (left) and four running process instances (right). Data object instance q1 gets changed by setting state b that affects process instance m11 since for this process instance, the second part of the process instance identifier equals the primary key of q1.

Please cite this article as: N. Herzberg, et al., Improving business process intelligence by observing object state transitions, Data Knowl. Eng. (2015), http://dx.doi.org/10.1016/j.datak.2015.07.008

N. Herzberg et al. / Data & Knowledge Engineering xxx (2015) xxx–xxx

9

medication data filled i

patient data filled

staff data filled

signed surgery data filled

Fig. 7. Object life cycle of the Surgery Protocol with interleaving states medication data filled and surgery data filled, whereas only the filling of the surgery data is recorded in the underlying IT system.

can conclude based on the object state transition event that activity instance y of m11 is properly terminated and that activity instance z of m11 can get enabled. Summarizing, the presented approach to improve business process intelligence consists of several steps: (1) the definition of an object life cycle for each data object using a state machine, (2) the specification of the observable object state transitions in the object life cycle via the binding function, which maps such transition into an information system and (3) the model of the business process and the assignment of data objects to the corresponding activities. Utilizing this setup, processes can be partly monitored by analyzing the data object states of activities' input respectively output sets to predict the enablement respectively termination of those events. In the next section, we will describe the methodology of setting up the approach in detail. 4.5. Limitations The presented approach is subject to several limitations. First of all, the quality and quantity of the information about object state transitions bias the result of the process monitoring and analysis. For getting the complete picture about the process execution, it is necessary to get at least information about state changes or information about execution of every activity in the process. Further, for getting reliable process analysis results, e.g., for time or costs evaluation, not only the occurrence of state transition events are important, but also the data that is contained in the payload of the events. Performing a costs analysis on a process execution requires that the costs of every activity respectively group of activities can be determined based on event data. Secondly, depending on the object life cycle, it is possible that recognizing a state transition event does not reveal clearly which data object state is reached. As an example, we would like to refer to the object life cycle of the Surgery Protocol in our treatment process (see Fig. 7). As the medication data and the surgery data can be filled in an arbitrary order, two states can be reached when the surgery data is filled: surgery data filled or complete. If it is not possible to observe when the medication data is filled, i.e., it is not recorded in an IT system, then it could not be revealed clearly, which of the two states is reached. If this uncertainty could not be resolved by further control flow information of the process execution, no proper reason about that particular part of the process execution is possible. One further limiting aspect of the approach could be the correlation of data object instances to process instances. If the correlation logic cannot be specified in the binding definition of the object state transition event types, it is not possible to use the information in object state transition events. However, the combination of events with context data, which exists orthogonally to the corresponding process, gives a powerful capability to use huge amounts of data object state transition information [8]. The presented approach bases on object life cycles and process models that coexist independently. Over time, both the data object life cycles and the process models evolve. It has to be ensured that both still fit to each other; otherwise the application of the presented approach may lead to wrong results. Ensuring consistency between these models is outsourced to further research; e.g., Ref. [22] introducing the research field on bidirectional model transformations that, among others, allows keeping two models updated. 5. Methodology For our approach, we allow independent modeling of data objects and their life cycles and of business processes. However, data objects need to be modeled in the process model as well — to enable more detailed views on the business processes [23] and to allow the application of the approach introduced in this section. The independent modeling allows reusing data objects and their object life cycles such that they may be modeled once with all applicable manipulations for each object and then reused by coupling them to multiple process models using a subset of the specified manipulations, i.e., state transitions. Enabling business process intelligence based on object state transition events is a six-step approach with step (iv) being optional as visualized in Fig. 8. (i) Organization-wide object life cycles (O-OLC) that capture the states and the corresponding state transitions of the data objects used in the organization, e.g., order and product, have to be in place; one O-OLC for each utilized type of data objects. These models are often expressed as UML state transition diagrams [21]; for instance, the modeling of the O-OLCs of data objects Information Sheet and Surgery Protocol. (ii) Secondly, the organization's processes need to be described by process models. These models should express the dependencies of activities among themselves as well as on data objects indicating the consumption (read), creation, or change (write) of data objects by process activities. For it, we utilize BPMN [11] in this paper. For application of our approach, other notations allowing the creation of process models as defined in Definition 2 are also fine. As an example, we refer to our treatment process described in Section 2 and shown in Fig. 1. (iii) Furthermore, conformance between process models that utilize data objects and the corresponding organization-wide object life cycles is required ensuring data-compliant and deadlock-free execution of processes [12,24,25]. For setting up business process applications based on object state transition events, the life cycles of the data objects play a central role. (iv) Therefore, it is recommended to reduce the O-OLCs, that contain very detailed views about the object states, to a set of states and transitions required by the specific process models. Within this step, all states not used in the data associations in the process Please cite this article as: N. Herzberg, et al., Improving business process intelligence by observing object state transitions, Data Knowl. Eng. (2015), http://dx.doi.org/10.1016/j.datak.2015.07.008

10

N. Herzberg et al. / Data & Knowledge Engineering xxx (2015) xxx–xxx

(i) Model Organizational Object Life Cycle (O-OLC)

(ii) Model Business Process (BP)

(iii) Ensure Conformance of BP and OOLC

(v) Specify Object State Transition Event Types

(iv) Reduce O-OLC (optional)

(vi) Define Observable Object State Transitions

Fig. 8. Steps for setting up business process intelligence based on object state transition events.

models are removed while keeping the causality of the remaining states. At the end of this step, conformance checking as mentioned above should be applied again ensuring correct reduction operations. (v) For all observable object states, an object state transition event type has to be specified. The event type specification needs to consist of an object state transition event structure and a binding that describes how event information is accessed from the underlying IT systems. For instance, the link to the database in which the time of the signature of the Information Sheet is recorded needs to be specified in the particular object state transition event type. (vi) Based on the object life cycles, the state transitions that can be observed are specified. These transitions are connected to the object state transition event types modeled in the previous step. With this association the foundation for business process intelligence based on object state transition events is established. Relating to our healthcare example, the defined object state transition event type (step (v)) needs to be assigned to the object state transition from filled to signed of the Information Sheet data object. 6. Application by use case Besides the healthcare domain, we applied the presented approach to a product order process scenario also. In Section 6.1, we describe the model view of the scenario including the object life cycles, the object state transition event types including their bindings, and the business process model itself. Section 6.2 describes how the models can be used during run-time to identify the progress of a process instance. 6.1. Model view application by use case A product is requested from the customer by an order. This request could be a normal one or an urgent one that requires express shipping. The orders are fulfilled with products in stock; however, sometimes the products need to be produced by machines first. Thus, we find four different data objects in this example case: (a) the order, (b) the product, (c) machines in the production line, and (d) the type of shipment. These data objects are described by object life cycles (see Fig. 9). Based on these object life cycles the state transitions that can be observed in the information systems are selected and the bindings that allow the access to this data are defined. For an Order it is possible to observe the data state transitions to states accepted, rejected, Order created i

accepted

confirmed

shipped

invoiced

paid

received rejected

archived

(a) Machines Product in stock

shipped

set up

in operation

maintenance

retired

i

i not in stock

purchased

received

(b)

(c) Type of Shipment express i regular

(d) Fig. 9. Life cycles of data objects (a) order, (b) product, (c) machines, and (d) type of shipment used in the process model in Fig. 10. The transition labels are omitted due to space requirements.

Please cite this article as: N. Herzberg, et al., Improving business process intelligence by observing object state transitions, Data Knowl. Eng. (2015), http://dx.doi.org/10.1016/j.datak.2015.07.008

N. Herzberg et al. / Data & Knowledge Engineering xxx (2015) xxx–xxx

11

Table 1 Example of representation of the Product object instances in a relational database. product_id



in_stock

not_in_stock

purchased

received

shipped

856

… …

2012-12-11 10:40 null

2012-12-11 10:51 null

2012-12-12 15:13 null

null

857 858



2012-12-12 16:02 2012-12-14 08:34 2012-12-19 11:55

null

null

null

2012-12-19 13:01 null

confirmed, shipped and paid. In case of the product, every state transition can be tracked in the information systems, whereas for the machines none of the state transitions are tracked. For the data object type of shipment the state transition to the state express is observable. Consider the representation of the data object Product depicted exemplarily in Table 1. The presented table shows excerpt-wise a table of a relational database, which stores the states of the object Product. We omit some columns for space reasons. Each row of the table represents an instance of the Product object in a certain state in accordance with its defined life cycle, introduced in Fig. 9. The stored objects evolve over time changing their state by changing the records within the depicted table. The mapping of a state transition defined in the object life cycle to the implementation of the data extraction from the data source is established by a binding function. The implementation identifies the location of the timestamps of transition occurrence for each object at the model level. As in our case the data source is realized by a relational database, a database query stencil using, e.g, SQL as a query language, will describe the implementation. Listing 1 shows an example of a binding for an object state transition of the object Product from state received to state in stock. Listing 1. Definition of binding for an object state transition of the object Product from state received to state in stock.

For the order process, the company designed a process model (see Fig. 10) consisting of an initial activity Analyze order that uses a received order as input and transfers the Order to the state confirmed and may produce a data object Type of shipment in state express if express shipment is required. After an order has been confirmed it is checked, whether the product request is in stock or not by the activity Check stock creating the data object Product either in state in stock or not in stock. Based on this outcome, a decision is made whether the product needs to be produced or products in stock fulfill the order. In case the product is not available, the raw material for the production is purchased (activity Purchase raw material) and the production plan is made (activity Make production plan). The activity Purchase raw material utilizes the data object Product and sets the state purchased to it once the required raw material is ordered. Make production plan has the data object Product as well and outputs the machines' setup (data object Machines in state setup). Once the purchased material is received, the Product is manufactured (activity Manufacture product) ending with the products

Order

Product

[confirmed ]

[not in stock ]

Type of Shipment

Product

Product

Product

[purchased]

[received]

[in stock]

Product [not in stock ]

Make production plan

[in stock]

Check stock Product

Analyze order

Product

Order [received]

[not in stock ]

Purchase raw material

Receive purchased material

Manufacture product

Machines [set up]

Ship product

Product [in stock]

[express]

Type of Shipment [express]

Product

Product

Order

[in stock]

[shipped]

[shipped]

Fig. 10. Order process including the relevant data objects.

Please cite this article as: N. Herzberg, et al., Improving business process intelligence by observing object state transitions, Data Knowl. Eng. (2015), http://dx.doi.org/10.1016/j.datak.2015.07.008

12

N. Herzberg et al. / Data & Knowledge Engineering xxx (2015) xxx–xxx

being in stock again (data object Product in state in stock). When the produced order is in stock, the activity Ship product can be executed. This activity has a data object input set consisting of two elements requiring either the data object Product in state in stock or the data object Product in state in stock and the data object Type of shipment in state express. Once the activity is executed, the state shipped will be set to data object Product as well as to data object Order. With the usage of the data objects in the process model described above, the information about the observed data state transitions can be used for process monitoring and progress prediction. How this is achieved will be discussed in the following section by application of our approach at the instance view. 6.2. Instance view application by use case In this section, we show the application of the introduced approach at the instance level using the order process defined in the previous section as example. First, we show how the identified bindings can be used for extraction of object state transition events from data sources at the IT Level. Afterwards, we discuss how they can be utilized to predict the behavior of the concrete process instance. In the first step, we use the binding example introduced for the object Product in Listing 1 as a foundation for the definition of transition events. This binding defines exemplary the SQL stencil, which describes the access to the data set, reflecting the object state transition from state received to state in stock. This binding is part of the set of bindings, which describes object state transitions of the object Product : bind(Product, (received, in stock)) ∈ BProduct. At the instance level, the defined binding can be used to access the data, which indicates the state transition of the instance of object Product with product_id = 856 described in Table 1. For this purpose, the SQL query stencil can be used replacing the placeholder bproduct_idN with the object's instance data — 856. Further, the transition events can be extracted from the data sources using the object state transition event type ET Product ¼ ðProduct; BProduct Þ under consideration of the appropriate bindings as described in the example above. One object state transition event for the state transition towards in stock may look as follows: E 1 ¼ ET Product ; 0001; 2012-12-12 16:02; (856, …, 2012-12-12 16:02, 2012-12-11 10:40, 2012-12-11 10:51, 2012-12-12 15:13, null); in stock). The event follows the structure introduced in the object state transition event definition (cf. Definition 3). The object is a record of the database at the point of time, when the event has occurred. The event id is generated. Additionally, we observed the following object state transition events in sequence for the product 856. An event indicating the state not in stock was reached and from there the state purchased was reached. Another event indicates that the data object product was received. There is no event recognized for the shipping of the product yet (see Fig. 11). Analogously, we observed the state Product instance 856 in stock

shipped

i not in stock

purchased

received

(a)

Order

Product

[confirmed ]

[not in stock ]

Type of Shipment

Purchase raw material

Product

Product

Product

[purchased]

[received]

[in stock]

Product [not in stock ]

Make production plan

[in stock]

Check stock Product

Analyze order

Product

Order [received]

[not in stock ]

Process instance for order instance 411

Receive purchased material

Manufacture product

Machines [set up]

Ship product

Product [in stock]

[express]

Type of Shipment [express]

Product

Product

Order

[in stock]

[shipped]

[shipped]

(b) Fig. 11. Instance view of (a) the object life cycle for data object Product with id 856 and (b) the process instance for object Order with id 411. (a) shows for the product instance the current state as well as the historic ones based on the object state transition events generated from the information existing in the information systems. The specific product is currently in state in stock and reached it via states not in stock, purchased, and received (black arrows). It is expected to transition to state shipped some time in the future (dashed arrow). (b) shows the actual progress of the process instance of order 411 with activity Ship product being currently enabled. Thereby, all object state transition events relevant for this order are illustrated using the corresponding data objects: white colored data objects were observed in the respective state, gray colored data objects were not observed in the respective state yet, and gray bordered data objects cannot be observed in the respective state. Ticks visualize successful termination of activities, whereas the watch symbolizes an activity that is enabled respectively currently running.

Please cite this article as: N. Herzberg, et al., Improving business process intelligence by observing object state transitions, Data Knowl. Eng. (2015), http://dx.doi.org/10.1016/j.datak.2015.07.008

N. Herzberg et al. / Data & Knowledge Engineering xxx (2015) xxx–xxx

13

confirmed for the data object instance order 411. In the information system, there is no information available about the type of shipment. For machines, we cannot observe any state transition at all. With the existing events, we can predict the progress of the order process instance for order 411. The techniques for correlation between instances of the order and the product data objects and the process instance are not part of this paper; they can be established by using approaches as presented in Ref. [20]. Because we cannot observe the state received of data object order, it is not possible to predict the enablement of activity Analyze order. However, we can monitor the state accepted, which is between the object states received and confirmed expected in the output set, but not modeled in the process model. Nonetheless, we can derive from that information, when the activity Analyze order began. The observation of the transition event of order 411 to state confirmed fulfills one of two sets of the output set ({{(Order, confirmed)}, {(Order, confirmed), (Type of Shipment, express)}}) of activity Analyze order, thus the activity is finished. Afterwards, we recognized a transition event for reaching the state not in stock of product 856. Since this is one valid output of activity Check stock, this activity is terminated as well. Based on the state of data object Product, the decision is made for producing the product. By observing the transition event to state purchased of instance 856 of data object product, it can be predicted that activity Purchase raw material is terminated. Analogously, this holds true for activity Manufacture product by recognizing the transition event for reaching the state in stock for product 856. However, we cannot predict the termination of the activity Make production plan from the available event information only. The control flow of the process model needs to be taken into account as well, so that it can be assumed that once the activity Manufacture product is enabled, by observing object state transition event for product 856 and state received, the activity Make production plan must be finished before. With the transition event for product 856 and state in stock, the enablement of activity Ship product is denoted, because one set of the input set ({{(Product, instock)}, {(Product, instock), (Type of Shipment, express)}}) of the activity is met. The activity is not finished so far since the output set is not fulfilled, e.g., {{(Product, shipped), (Order, shipped)}}. Both object state transition events are not observed yet. Even if one of those was already recorded, the activity cannot be set as terminated as the output set requires that both data objects, Product and Order, are in state shipped. Thus, activity Ship product is enabled respectively running, indicated by the watch in Fig. 11. Termination of activity Ship product requires the observation of the object state transition events indicating that the instance of object Product with primary key 856 is set to state shipped and indicating that the instance of object Order with primary key 411 is set to state shipped as well. In the given process model, Order is the case object and Product directly relates to Order via foreign key o_id. The primary keys for Object and Product are o_id and p_id respectively. The left side of Fig. 12 presents three instances o1, o2, and o3 of object Order and two instances p1 and p2 of object Product that are part of all data object instances utilized in various process instances executing the process model from Fig. 10. These instances represent the state of our example process instance during execution of activity instance Ship product. First, we observe an object state transition event that reports about setting data state shipped for an instance of Product with primary key 856. This object instance references an instance of object Order with primary key 411 that in turn references – as case object – process instances with the second part of the id equaling 411. From the process instances represented in the second column from right in Fig. 12, process instance m11 is affected. Later, we observe another object state transition event indicating that the data state of an instance of Order with primary key 411 was set to shipped. As before, this event can be correlated to process instance m11. Since both events refer to the same process instance and they fulfill the

o2 o_id = 425 state = confirmed

o3 o_id = 436 state = received

o3 o_id = 436 state = received

references

s

p2 p_id = 856 o_id = 411 state = shipped

ct

p2 p_id = 856 o_id = 411 state = in stock

p1 p_id = 851 o_id = 436 state = shipped

fe

p1 p_id = 851 o_id = 436 state = shipped

m10 pid = M-398 m11 pid = M-411 m12 pid = M-425 m13 pid = M-456 Process instances

fe

o2 o_id = 425 state = confirmed

af

o1 o_id = 411 state = confirmed af

o1 o_id = 411 state = confirmed

cts

set shipped for object Order with o_id = 411 o1 o_id = 411 state = shipped o2 o_id = 425 state = confirmed o3 o_id = 436 state = received p1 p_id = 851 o_id = 436 state = shipped p2 p_id = 856 o_id = 411 state = shipped

set shipped for object Product with p_id = 856

Data object instances

Data object instances

Data object instances

Fig. 12. Observation of object state transition events indicating that data state shipped is set for instances of objects Order and Product. These events are correlated to the process instance for order instance 411 (process instance m11) indicating that activity instance Ship product is terminated and the end event is enabled resulting in a proper termination of process instance m11.

Please cite this article as: N. Herzberg, et al., Improving business process intelligence by observing object state transitions, Data Knowl. Eng. (2015), http://dx.doi.org/10.1016/j.datak.2015.07.008

14

N. Herzberg et al. / Data & Knowledge Engineering xxx (2015) xxx–xxx

output set condition {{(Product, shipped), (Order, shipped)}}, we can reason that activity instance Ship product of process instance m11 for order instance 411 and product instance 856 is properly terminated and thus the overall process instance gets properly terminated since the end event is enabled. Showing that the prediction of a process instance's progress, e.g., process instance for order 411, can be achieved by utilizing information about the state transitions of data objects illustrates the importance of the existence of object state transition events. 7. Evaluation Next, we describe the system architecture for software realizing the presented approach in Section 7.1. Afterwards, details about an implementation that is used for the evaluation are given in Section 7.2. In Section 7.3, the results of the evaluation are explained. 7.1. System architecture A system supporting the presented approach, cf. Section 4, needs to provide modeling capabilities for event types, process models, activity life cycles, and data object life cycles. Further, an event processing component needs to be in place that uses the model information during run time to provide valuable event information via an event consumer interface. This event consumer interface is the data source for business process intelligence applications, such as process monitoring and analysis. Fig. 13 shows the architecture for such a system. The modeling component provides (a) a capability for designing event types. For event types, the structure must be defined generally. When defining normalized event types, additionally, a binding needs to be specified to describe how the event information could be accessed from the underlying event sources. In case of defining object state transition event types, details about the utilization of underlying normalized events have to be modeled. Same holds for process event types that contain rules about using information from object state transition events and normalized events. Those event types are stored in the event type repository. Further, (b) process modeling capabilities are provided. The created process models are managed in the process repository along with the activity life cycles and object life cycles. (c) Modeling of these is also provided by the modeling component. The event processing component provides the access to event sources by event adapters. Event sources can be other information systems or devices (rectangles on the bottom in Fig. 13) as well as databases, data streams, or single data items (ellipses on the bottom

Event Processing Process Events

Business Process Intelligence Application

Process Event Creation

Object State Transition Events

Event Type Modeling

Event Type Repository

Process Modeling

Process Models

Life Cycle Modeling

Event Consumer Interface

Modeling Object State Transition Event Creation

Normalized Events

Activity Life Cycles Event Normalization Object Life Cycles Process Repository

Event Adapter

Event Source I

Event

... Source n

Event Source A

Event

... Source m

Fig. 13. System architecture for process monitoring based on object state transition events.

Please cite this article as: N. Herzberg, et al., Improving business process intelligence by observing object state transitions, Data Knowl. Eng. (2015), http://dx.doi.org/10.1016/j.datak.2015.07.008

N. Herzberg et al. / Data & Knowledge Engineering xxx (2015) xxx–xxx

15

Fig. 14. Event type modeling in the event processing platform.

Fig. 15. Modeling of life cycles in extended Signavio modeler including the assignment of event types to state transitions (arcs) in the attributes section.

in Fig. 13). The configuration of an event adapter is defined in the corresponding normalized event type. After accessing the event data by the event adapter, normalized events are created according the specification in the normalized event types. Normalized events are stored in the corresponding data store and build the basis for object state transition event creation as well as for process event creation. The process event creation component utilizes object state transition events from the corresponding storage also. Those process events are then provided by the process event consumer interface to business process intelligence applications, such as process monitoring and analysis.

7.2. Implementation For evaluation of the presented approach, we implemented a system as described in Section 7.1. Event type definition is done in our event processing platform [26] that allows manual specification of event types, but supports also XSD upload and web service calls for event type definition; see Fig. 14 for the user interface allowing specification of the type name, the timestamp name (used for process correlation and ordering of events), and attributes requiring information about the attribute name and the data type. For life cycle modeling (objects and activities), we use the Signavio modeler2 with an own-developed extension. The extension allows the modeling of state transition diagrams in which attributes for the underlying event types can be maintained. Fig. 15 shows this extension with the modeling canvas in the middle framed from the constructs to be utilized for modeling on the left side and the configuration tab on the right side. The event processing platform [26] is also used for event processing. The platform allows the extraction of event data from event sources, their further processing, i.e., normalization, and the creation of object state transition events and process events. The

2 Signavio Core Components, the open source version of the Signavio process editor, available on Google Code: https://code.google.com/p/signavio-corecomponents/.

Please cite this article as: N. Herzberg, et al., Improving business process intelligence by observing object state transitions, Data Knowl. Eng. (2015), http://dx.doi.org/10.1016/j.datak.2015.07.008

16

N. Herzberg et al. / Data & Knowledge Engineering xxx (2015) xxx–xxx

Fig. 16. Monitoring of business process based on BPMN.

connection between object state transition events and process events is derived from the process model information that contains input and output data objects connected to activities. With this information, the transformation rules are defined. Further, as a business process intelligence application, the process monitoring capability of the event processing platform is used; see Fig. 16 for an example showing that process instances with IDs 101 and 102 respectively are both currently running. This is an abstracted view on the process instances based on the information taken from the state of the single activities of the process. The process instances will be finished once the last activity is observed as terminated as shown in the application of the scenario described in Section 7.3. The build-in process monitoring function of the event processing platform is set up by defining process event types to the state transitions of the process model's nodes, i.e., activities and events. Fig. 17 shows how this is implemented.

Fig. 17. Assignment of process event types to state transitions of process model's nodes.

Please cite this article as: N. Herzberg, et al., Improving business process intelligence by observing object state transitions, Data Knowl. Eng. (2015), http://dx.doi.org/10.1016/j.datak.2015.07.008

N. Herzberg et al. / Data & Knowledge Engineering xxx (2015) xxx–xxx

17

Fig. 18. Simulator for data object state changes.

For the evaluation, we built a simulator that simulates (a) the instantiation of processes and data objects, (b) the correlation between process instances and data object instances, and (c) the state changes of the corresponding data object instances. Fig. 18 shows the simulator user interface to set the states by button click for the particular data object instances as entered in the input field. The current state of a process instance respectively a data object instance is stored in a database. While changing the state of a data object instance, an event is triggered and additionally sent to the event processing platform via a web service interface. 7.3. Evaluation results and discussion We evaluate our approach by using the implemented system and applying the example of the product order process used in Section 6. A demonstration of the whole scenario is shown in two screencasts. The modeling part is presented at http://youtu.be/ 4_LpAPmJMoU and the monitoring part is presented at http://youtu.be/0wOgCnQ-ibs. For the scenario, we defined several object state transition events in the event processing platform (modeling video 0:05). These events hold a timestamp of their creation, a data object ID of the corresponding data object, and an ID of the process instance they belong to. We use the Signavio modeler to create the object life cycles for Order, Product, Machine, and Shipment data objects (modeling video 0:35). According to the given example, we assign the event types to several transitions of the object life cycles indicating the observable state transitions, i.e., we add them to Order, Product, and Shipment life cycles (e.g., modeling video 0:50 and 1:02). For the data object Machine, we are not able to observe events as the event sources do not provide such. Furthermore, we use the Signavio modeler to specify a BPMN process model named OrderProcess including the data object dependencies (modeling video 1:56). After the modeling phase, we need to configure the event processing platform for the event processing itself, but also for the process monitoring. Therefore, we create a process named OrderProcess in the platform and assign the exported BPMN process from the Signavio modeler to it – named as OrderProcessModel – so that we can use it for process monitoring (modeling video 3:04). Derived from the process model, all necessary process event types are defined to setup the process monitoring for the order process with the implemented prototype (modeling video 3:42). To create process events from object state transition events during run time, a transformation is necessary, e.g., the transformation to a process event representing the termination of activity Ship product as shown in Listing 2. This transformation specification is linked to the particular process event types (modeling video 4:10). Listing 2. Definition of the transformation rule for process event type PETShipProduct terminate.

Please cite this article as: N. Herzberg, et al., Improving business process intelligence by observing object state transitions, Data Knowl. Eng. (2015), http://dx.doi.org/10.1016/j.datak.2015.07.008

18

N. Herzberg et al. / Data & Knowledge Engineering xxx (2015) xxx–xxx

The shown transformation rule creates a process event of process event type PETShipProduct_terminate when an object state transition event of type OSTETOrderShipped and another one of type OSTETProductShipped referring to the same process instance are observed. The correlation of object state transition event types to process event types and their transformation is derived from the process model description. For correlating the process events to the particular process instance, the correlation rule must be configured based on the processID attribute of the events (modeling video 5:59). In the last configuration step, for every monitorable activity state transition, e.g., begin or terminate, a process event type is assigned (modeling video 6:30). Therewith, the process monitoring is set up. For simplicity reasons, we assume a fixed activity life cycle for all activities containing the transitions initialize, enable, begin, terminate, skip, and disrupt in the prototype implementation. Within the simulation, we initialize the process instance 411 with its data object instances order 411, product 856, machine 123, and shipment 456 (monitoring video 0:03). We initialize also a second process instance 425 with its data object instances order 425, product 425, machine 425, and shipment 425. When changing the state of the order instance with id 411 from initialize to accepted according to its object life cycle, the object state transition event gets recognized by the event processing platform (monitoring video 0:56). As this particular object state transition event of type OSTETAcceptOrder is not used by any transformation to a process event, it is not processed further. We did not configure a transformation within the modeling phase, because we are not interested in state accepted of data object Order as it is not used in the process model of OrderProcess. Confirming the particular order (monitoring video 1:21) triggers an object state transition event as well but forces the creation of a process event of type PETAnalyzeOrder_terminate and another one of type PETCheckStock_begin (monitoring video 1:30). With this information, it can be derived that the particular process instance is running (see for example Fig. 16). When simulating the events ProductInStock and ProductShipped for the Product instance 856 and OrderShipped for Order instance 411 (monitoring video 1:51), we can derive that the corresponding process instance, i.e., the one with id 411 based on the case object's primary key, is finished as the last activity Ship Order of the order process happened (monitoring video 2:34). With the application of the product order process, we showed the applicability of our approach based on the implemented prototype. We showed the steps necessary for setting up the process monitoring consisting of (1) event type specification, (2) object life cycle modeling, and (3) process modeling and their configuration. We then showed the process monitoring by simulating process instances and the state changes of the corresponding data object instances. Without applying the presented approach to the product order process and using the data objects as source of information for process monitoring, it would be very hard to get insights about order processing. As typical BPI applications does not consider data objects and just base on logs of recorded activities, they would fail in that particular field of application.

8. Implication for event process in BPM Applying an event processing framework as, for instance, described in Ref. [8], may not deliver a detailed basis for a business process intelligence application, e.g., for process progress monitoring. Fig. 19 connects the framework from Ref. [8] with the concepts discussed in this paper utilizing an adapted version of this paper's scenario. Considering the concepts from Ref. [8] only, the beginning of activities Purchase raw material and Make production plan and the termination of activity Manufacture product could be monitored based on events available in IT systems related to the process instance (solid framed state transitions of node life cycles in Fig. 19). This provides an abstract process monitoring but it is quite uncertain, as it is not clear when the purchase of raw materials is finished. Adding the presented approach, the process monitoring can be substantiated by utilizing state information of the underlying data objects. For example, by knowing that a product is purchased (state transition in not in stock to purchased) it can be assumed according

Fig. 19. The improvement of process monitoring of non-automated processes.

Please cite this article as: N. Herzberg, et al., Improving business process intelligence by observing object state transitions, Data Knowl. Eng. (2015), http://dx.doi.org/10.1016/j.datak.2015.07.008

N. Herzberg et al. / Data & Knowledge Engineering xxx (2015) xxx–xxx

19

to our approach that activity Purchase raw material is terminated (dashed framed state transitions in node life cycles in Fig. 19). The same holds for the enablement of activity Manufacture Product with receiving the Product. However, there is still uncertainty within the process monitoring result. The termination of activity Make production plan can neither be terminated based on raw events nor on object state transition events of object Machine. Therewith, showing the enablement of Manufacture product based on the data object Product is an approximation only. When considering object life cycles, further uncertain situations can occur. For the illustration of these situations, the object life cycle for the product shown in Fig. 20 adjusts the one introduced in Section 6. Observing that a certain state is reached does not contain the information from which state this state was reached. As shown in Fig. 20 the state in stock can be reached from state initial and from state received. Assume that both state transitions are observed by an object state transition event that represents the setting of flag “in stock” in a database for a certain product. Therewith, it is not clear whether the product was previously not in stock and needed to be purchased or whether it was in stock from the beginning. Additional object state transition events can help clearing this situation by indicating that some state has been passed, e.g., state purchased if it would be observable. However, the absence of object state transition events does not prove that the lower path was not taken such that the product was in stock from the beginning. This situation may challenge process monitoring if this kind of alternative (in stock and not in stock initially) is present there as well. Furthermore, two state transitions can be represented by the same object state transition event but do not have the same target state, e.g., as shown in Fig. 20 with states shipped and registered. After taking a product out of the stock, it will be sent to the customer as well as registered in a database for handling guarantee in any order. The shipping is not observable (state shipped) whereas the registration is (state registered). The final state of a product is sold and is reached once the product is shipped and registered. Since only the registration of the product can be observed by object state transition events, it cannot be ensured whether the shipping still needs to follow the registration (state registered is reached, but not shipped and therewith not sold) or whether it was shipped already (state shipped was reached implicitly, registered also, and therewith state sold immediately). Thus, it is not possible to definitely determine the state that is currently reached. 9. Related work In this section, we discuss the state of the art for process monitoring and analysis, event recognition, data modeling in business processes, and the correlation between process modeling artifacts and information systems. As motivated, the presented approach targets on the enhancement of the event basis for business process intelligence (BPI). BPI aims to enable quality management for process executions by utilizing features like monitoring and analysis, prediction, control, and optimization [2]. A lot of literature discusses business process execution data capturing and storage [27,2]; however, most of them assume that every process step is observable and thus, the recorded event log is complete. Mutschler et al. [28] propose a reference architecture for BPI, consisting of three layers, i.e., integration, function, and visualization. The presented approach targets in particular on the integration and the combination of information about data objects and the process execution. One application of BPI is process mining [3] that benefits from the presented approach, because more data about the process execution is made available. The events about the data object state transitions can be composed with the process execution events already existing to form a more fine-grained event log to enable existing process mining techniques. The approach targets on the same vein as our previous works do where we present approaches for event recognition in especially manual executing process environments. In Ref. [4], an approach that shows how event information out of information systems can be assigned to particular activities of a business process is presented. Herzberg et al. [8] discuss how the recognition of events could be improved by utilizing process knowledge and external context data — data that exists independently of the business processes and their execution. Robinson and Purao [29] discuss the utilization and benefits of service systems monitoring based on goals that can be seen as an event source for the broader event process framework, consisting of object state transition events and process events. The authors deal with correlation of events to each other but also the correlation to a process instance as well by applying common correlation techniques, like the algorithms for the determination of correlation sets based on event attributes introduced in Ref. [30]. Robinson et al. [31] discuss the application of continuously applied real-time monitoring, such as process monitoring, for discovering process deviations, i.e., changes in user behavior, in a healthcare scenario. Modeling data in the context of business processes received much attention such that most current process modeling notations support the modeling of data information [23] — especially for enacting process models. In this regard, requirements on data modeling capabilities are determined [32,20]. Currently, the activity-centric (current standard way of modeling [11,20]) and the object-centric modeling (cf., for instance, business artifacts [33] and the PHILharmomicFlows project [32]) paradigm are distinguished. Our approach introduced in this paper generally works with both paradigms, but several adjustments need to be undertaken to migrate it from one to the other paradigm. In this paper, we focused on activity-centric modeling because it is the current standard way of modeling and

shipped

registered

registered

shipped

in stock

sold

i not in stock

purchased

received

Fig. 20. Adapted product OLC introducing uncertainty situations when using object state transition events for business process intelligence.

Please cite this article as: N. Herzberg, et al., Improving business process intelligence by observing object state transitions, Data Knowl. Eng. (2015), http://dx.doi.org/10.1016/j.datak.2015.07.008

20

N. Herzberg et al. / Data & Knowledge Engineering xxx (2015) xxx–xxx

therefore the most used one such that our approach achieves highest impact. Furthermore, we assume the correlation of data object instances to process instances exists or can be ensured by the modeling approach [20]. Additionally, to support the design time part of approach, one may use a process model describing all actions and manipulations performed on data objects and synthesize the object life cycles from that model [34,35]. The other way round, the complete process model can be extracted from object life cycles [24] potentially considering given compliance rules [36] to allow monitoring and analysis on all allowed aspects of a process. Data objects and their object life cycles to be used in process models may also be extracted from structured data sources as, for instance, ERP systems [37]. If the process model gets specified independently from the object life cycle, a conformance check can be applied to ensure that the process model only requests data manipulations allowed from the object life cycles of the data objects utilized in the process model [17,24]. Various techniques from the area of information retrieval can be used to support automatic binding identification. We argue that in cases where data sources at the IT Level can be represented in form of an XML schema, e.g., relational databases, Web service requests and responses, structured event logs etc., approaches from the area of schema summarization [19] can be used to identify the relevant data structures, which represents the object state transition in the object life cycle. Additionally, schema-matching techniques surveyed in Ref. [19] can be used to propose the matching of the object state transitions to the appropriate parts of the schema. 10. Conclusion In manual executing business process environments, e.g., in the healthcare domain, it is very valuable to provide business process monitoring and analysis capabilities similar to automatically executed business processes. However, most of the performed activities are not logged in IT systems. To tackle the challenge of incomplete event logs of process executions, we presented an approach to utilize object state transition events for reasoning about process progress and the enablement respectively termination of activities in this paper. Thereby, the transitions of a data object are linked to activities of the process model by using the data object specification within the process model itself, i.e., reading and writing of data objects. The particular transitions are indicated by object state transition events that are specified by a type containing, if they are observable, a binding, which links them into information systems that hold information to recognize state changes for the observable transitions. If such state change is recognized for an object instance, a corresponding object state transition event is generated and then utilized for reasoning about business process progress, but also to provide analysis like time and costs investigations. An implementation of the proposed software architectures showed the applicability of our approach using the example of a product order process scenario. Combined with control flow based event recognition, this approach increases the number of events and therefore the log quality to be used in the field of business process intelligence for mining, monitoring, and analysis. The presented approach can be used to introduce an event base for environments without logging capabilities or it can be used to extend already existing event bases to improve their quality. In future work, we plan to also extract events from resource information to further improve the quality of the event log used for business process intelligence. Further, we will look into the independent evolution of data object life cycles and process models to keep the business process intelligence results consistent. Acknowledgments The research leading to these results has been partly funded by the DFG under grant agreement WE 1930/8-1 and has been partly funded by the European Commission under the 7th Framework Programme (FP7) for Research and Technological Development under grant agreement 2012-318275. We thank Anne Baumgrass and the GET Service team for maintaining the event processing platform for constructive feedback and continuous support during the implementation of our concepts and coupling them to the event processing platform. References [1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] [12] [13] [14]

M. Weske, Business Process Management: Concepts, Languages, Architectures, Second Edition Springer, 2012. D. Grigori, F. Casati, M. Castellanos, U. Dayal, M. Sayal, M. Shan, Business process intelligence, Comput. Ind. 53 (3) (2004) 321–343. W.M.P. van der Aalst, Process mining: overview and opportunities, ACM Trans. Manag. Inf. Syst. 3 (2) (2012) 7:1–7:17. N. Herzberg, M. Kunze, A. Rogge-Solti, Towards process evaluation in non-automated process execution environments, 4th Central European Workshop on Services and their Composition (ZEUS), CEUR-WS 2012, pp. 97–103. N. Herzberg, A. Meyer, Improving process monitoring and progress prediction with data state transition events, 5th Central European Workshop on Services and their Composition (ZEUS), CEUR-WS 2013, pp. 20–23. N. Herzberg, A. Meyer, O. Khovalko, M. Weske, Improving business process intelligence with object state transition events, Conceptual Modeling (ER), Springer 2013, pp. 146–160. M. Zur Muehlen, Process-driven management information systems combining data warehouses and workflow technology, Proceedings of the International Conference on Electronic Commerce Research (ICECR-4), Citeseer 2001, pp. 550–566. N. Herzberg, A. Meyer, M. Weske, An event processing platform for business process management, Enterprise Distributed Object Computing Conference (EDOC), IEEE, Vancouver 2013, pp. 107–116. W.M.P. van der Aalst, The application of Petri nets to workflow management, J. Circuit Syst. Comp. 8 (1998) 21–66. T. Murata, Petri nets: properties, analysis and applications, Proc. IEEE 77 (4) (1989) 541–580. OMG, Business Process Model and Notation (BPMN), Version 2.0, 2011. A. Meyer, M. Weske, Weak conformance between process models and synchronized object life cycles, Service-Oriented Computing, Springer 2014, pp. 359–367. O. Etzion, P. Niblett, Event Processing in Action, Manning Publications Co., 2010 N. Herzberg, M. Weske, Enriching Raw Events to Enable Process Intelligence — Research Challenges, Tech. Rep. 73, Hasso Plattner Institute at the University of Potsdam, 2013.

Please cite this article as: N. Herzberg, et al., Improving business process intelligence by observing object state transitions, Data Knowl. Eng. (2015), http://dx.doi.org/10.1016/j.datak.2015.07.008

N. Herzberg et al. / Data & Knowledge Engineering xxx (2015) xxx–xxx

21

[15] G. Hermosillo, L. Seinturier, L. Duchien, Using complex event processing for dynamic business process adaptation, Services Computing (SCC), 2010 IEEE International Conference on, IEEE 2010, pp. 466–473. [16] M. Daum, M. Götz, J. Domaschka, Integrating CEP and BPM: how CEP realizes functional requirements of BPM applications (industry article), Proceedings of the 6th ACM International Conference on Distributed Event-Based Systems, ACM 2012, pp. 157–166. [17] A. Meyer, A. Polyvyanyy, M. Weske, Weak conformance of process models with respect to data objects, 4th Central European Workshop on Services and their Composition (ZEUS), CEUR-WS 2012, pp. 74–80. [18] X. Yang, C.M. Procopiuc, D. Srivastava, Summarizing relational databases, Proc. VLDB Endowment 2 (1) (2009) 634–645. [19] E. Rahm, P.A. Bernstein, A survey of approaches to automatic schema matching, the, VLDB J. 10 (4) (2001) 334–350. [20] A. Meyer, L. Pufahl, D. Fahland, M. Weske, Modeling and enacting complex data dependencies in business processes, Business Process Management, Springer 2013, pp. 171–186. [21] OMG, Unified Modeling Language (UML), Version 2.2, 2009. [22] K. Czarnecki, J.N. Foster, Z. Hu, R. Lämmel, A. Schürr, J.F. Terwilliger, Bidirectional transformations: a cross-discipline perspective, Theory and Practice of Model Transformations, Springer 2009, pp. 260–283. [23] A. Meyer, S. Smirnov, M. Weske, Data in business processes, EMISA Forum 31 (3) (2011) 5–31. [24] J. Küster, K. Ryndina, H. Gall, Generation of business process models for object life cycle compliance, Business Process Management, Springer 2007, pp. 165–181. [25] Z. Wang, A.H.M. ter Hofstede, C. Ouyang, M. Wynn, J. Wang, X. Zhu, How to Guarantee Compliance between Workflows and Product Lifecycles? Tech. rep., BPM Center Report BPM-11-10, 2011. [26] S. Buelow, M. Backmann, N. Herzberg, T. Hille, A. Meyer, B. Ulm, T.Y. Wong, M. Weske, Monitoring of business processes with complex event processing, 11th International Conference on Business Process Management Workshop on “Emerging Topics in BPM”, Springer 2013, pp. 277–290. [27] B. Azvine, Z. Cui, D. Nauck, B. Majeed, Real time business intelligence for the adaptive enterprise, E-Commerce Technology, 2006. The 8th IEEE International Conference on and Enterprise Computing, E-Commerce, and E-Services, The 3rd IEEE International Conference on, IEEE 2006, pp. 29–39. [28] B. Mutschler, J. Bumiller, M. Reichert, An approach to quantify the costs of business process intelligence, EMISA 2005, pp. 152–163. [29] W.N. Robinson, S. Purao, Monitoring service systems from a language-action perspective, IEEE Trans. Serv. Comput. 4 (1) (2011) 17–30. [30] H.R. Motahari-Nezhad, R. Saint-Paul, F. Casati, B. Benatallah, Event correlation for process discovery from web service interaction logs, VLDB J. 20 (3) (2011) 417–444. [31] W.N. Robinson, A. Akhlaghi, T. Deng, A.R. Syed, Discovery and diagnosis of behavioral transitions in patient event streams, ACM Trans. Manag. Inf. Syst. 3 (1) (2012) 4. [32] V. Künzle, M. Reichert, PHILharmonicFlows: towards a framework for object-aware process management, J. Softw. Maint. Evol. Res. Pract. 23 (4) (2011) 205–244. [33] A. Nigam, N.S. Caswell, Business artifacts: an approach to operational specification, IBM Syst. J. 42 (3) (2003) 428–445. [34] R. Eshuis, P. Van Gorp, Synthesizing object life cycles from business process models, Conceptual Modeling, Springer 2012, pp. 307–320. [35] R. Liu, F.Y. Wu, S. Kumaran, Transforming activity-centric business process models into information-centric models for SOA solutions, J. Database Manag. 21 (4) (2010) 14–34. [36] N. Lohmann, Compliance by design for artifact-centric business processes, Inf. Syst. 38 (4) (2013) 606–618. [37] E.H. Nooijen, B.F. van Dongen, D. Fahland, Automatic discovery of data-centric and artifact-centric processes, Business Process Management Workshops, Springer 2013, pp. 316–327.

Please cite this article as: N. Herzberg, et al., Improving business process intelligence by observing object state transitions, Data Knowl. Eng. (2015), http://dx.doi.org/10.1016/j.datak.2015.07.008