Process mining applied on library information systems: A case study


Library and Information Science Research xxx (xxxx) xxx–xxx

Contents lists available at ScienceDirect

Library and Information Science Research journal homepage: www.elsevier.com/locate/lisres

Process mining applied on library information systems: A case study ⁎

Elia Kouzari, Ioannis Stamelos
School of Informatics, Aristotle University of Thessaloniki, 54124 Thessaloniki, Greece

ABSTRACT

Process mining techniques have already been studied in a wide range of sectors, where they have revealed useful information about processes. In this study, a five-step methodology is applied to an integrated library system (ILS) for the first time. Using two event logs from two different organizations that run the same ILS, a process mining tool is used for process discovery and data analysis. The findings reveal that although both organizations were using the same system, the activities, sequences, and approaches each followed in daily tasks differed. The results of this kind of analysis can be used to highlight best practices and improve processes. In addition, process models can then be compared across various systems and organizations.

1. Introduction

An integrated library system (ILS) is any information system used to manage the processes of a library. Given the hundreds of transactions that take place in a library every day, information systems that address the needs of these complex operations are essential for managing digital and print resources efficiently. Fortunately, there are tools, techniques, and methods that can be used to extract, analyze, and enhance the processes implemented in ILSs. One of these methods is process mining, which this research applies to two implementations of the same ILS in different organizations. Now in its second decade, process mining is a prominent field of academic research that continues to evolve and to support a range of applications in an expanding set of disciplines. A collection of process mining algorithms and techniques can be applied to the event logs available in almost every information system for the purpose of discovering, monitoring, and improving the processes of a system (van der Aalst et al., 2012).

1.1. Problem statement

The library community has evolved a relatively standardized set of processes such as cataloguing, acquisitions, circulation, and so on. These standardized processes have in turn allowed an industry to develop that supports them with various products, most notably ILSs. Process mining has not yet been applied in libraries. Since it has been shown to make useful contributions in other fields, it seems worthwhile to explore its application in libraries, specifically to ILSs. One insight to be gained would be whether and how a library's ILS reflects the state of its business processes.

Comparing how the same ILS is used in different organizations adds further analytical information. The research questions addressed by this study are the following:

RQ1: What do the actual processes look like across organizations that use the same ILS?
RQ2: Are there any visible bottlenecks in the processes extracted after applying process mining?
RQ3: Is the same process executed differently on the same information system across different organizations?

The value of process mining, especially to those who manage library technology, including ILSs, is a deeper and more detailed understanding of which processes are at work, where they are working efficiently, and where they might need improvement. Over the long term, as process mining is applied more widely in libraries, process models can be compared across systems and organizations with a view to identifying best practices.

2. Literature review

2.1. Process mining generally

Although researchers began focusing on applications of process mining only recently, the research in this area has already improved understanding of how systems perform by extracting knowledge used to improve system processes. As a result of these successes, researchers have been studying the application of process mining in a variety of sectors, aiming not only to discover processes but also to perform conformance checking and to enhance systems to improve productivity. There has recently been an increase in process mining research in the domains of healthcare (Delias, Doumpos, Manolitzas, Grigoroudis, & Matsatsinis, 2013; Partington, Wynn, Suriadi, Ouyang, & Karnon, 2015; Rojas, Munoz-Gama, Sepulveda, & Capurro, 2016), insurance (De Weerdt, Schupp, Vanderloock, & Baesens, 2013), and governmental organizations (Park & Kang, 2016). The maturity of the field has allowed numerous tools and platforms to be developed, with ProM¹, Disco², and RapidProm³ being the most popular, and has allowed for the creation and application of formal and hybrid process mining methodologies in various application areas.

The starting point for process mining research is the Process Mining Manifesto (van der Aalst et al., 2012), considered to lay out the state of the art for process mining. The manifesto is the result of discussions among 77 researchers from 53 different institutions about the fundamentals of process mining, with the intention of promoting research, development, education, implementation, evolution, and understanding. As well as explaining the connection between data mining and business process modeling and analysis, the manifesto offers concrete guidelines for undertaking process mining, defining an event log as the starting point for every process mining procedure and listing the three types of process mining that can then be conducted: discovery, conformance, and enhancement. The manifesto also includes guiding principles to help analysts avoid obvious mistakes and enumerates the challenges faced by modern organizations trying to manage non-trivial operational processes.

Its publication was followed by process mining applications in a number of different domains, some examples of which are highlighted here. Rojas et al. (2015) point out in an extensive literature review that “healthcare processes are highly dynamic, complex, ad-hoc, and are increasingly multidisciplinary, making them interesting to analyze and improve” (p. 225). This study analyzed the literature systematically for elements such as process and data types, frequently posed questions, perspectives, tools and techniques, methodologies, implementation and analysis strategies, geographical origin, and medical field. The authors report that process mining in healthcare is mostly used for process discovery, but they point out that it is also important to promote process mining for conformance checking and compliance to help monitor various processes.

Driven by the need to monitor the impact of the daily processes of systems, De Weerdt et al. (2013) describe a methodology framework that directs process mining to the financial services sector. They emphasize data extraction and exploration and the multi-faceted nature of analyzing process execution data, and at the same time they clarify the benefits and challenges of performing realistic process mining research. The authors propose their own process mining methodology framework, which focuses on the phases of data preparation and exploration and acknowledges the multi-faceted analysis needed in a setting of highly flexible business processes, where logged data may not be strictly structured, making the analysis even more complicated. Their framework was applied in a large Belgian insurance company. The findings proved valuable to the company, as managers were able to contrast actual behavior with expectations and requirements, highlighting that a process mining analysis of real event logs can be the starting point for management to take measures towards improving business processes.

A process mining method called process diagnostics was first introduced by Bozkaya, Gabriels, and van der Werf (2009), who describe the method as giving “a broad overview of the organization's process(es) within a short period of time” (p. 22). The important features of process diagnostics are that it is relatively simple to apply and does not require domain-specific knowledge in advance. This makes it useful in cases where questions like “what does the process model actually look like”, “how well does the system perform”, and “who is involved in the process and how” need to be answered fast, with only an event log provided as a resource. The authors applied the method to a Dutch government organization, and the outcome focused on control flow, performance, and organizational perspectives.

⁎ Corresponding author.
E-mail addresses: [email protected] (E. Kouzari), [email protected] (I. Stamelos).

https://doi.org/10.1016/j.lisr.2018.09.006
Received 27 October 2017; Received in revised form 22 July 2018; Accepted 24 September 2018
0740-8188/ © 2018 Elsevier Inc. All rights reserved.

Please cite this article as: Kouzari, E., Library and Information Science Research, https://doi.org/10.1016/j.lisr.2018.09.006

2.2. Process mining and ILS

According to Müller (2011), “ILS systems are multifunction, adaptable, software applications that allow libraries to manage, catalog and circulate their materials to patrons” (p. 58), and they are essential tools for managing the processes of acquiring, describing, and making available library resources (Wang & Dawes, 2012). Disruptive changes in information technology in the world at large have affected libraries as well, largely in terms of resources and the services provided. Electronic resources, including digital collections and e-books in libraries and publicly accessible online resources on the Internet, outpace physical materials. In addition, with so much information readily available everywhere in a matter of seconds, library users expect immediate and straightforward access to the services and resources the library provides. As this situation was emerging, Breeding (2007) pointed to the lack, and the importance, of automated tools that assist library personnel effectively, and to the need for information systems that manage digital and print resources productively. Gradually, a new generation of ILS has emerged, the most prominent examples being Evergreen⁴, Koha⁵, and PMB⁶. Systems such as these can be used for process mining to gain insight into their main processes and to draw conclusions about their current performance.

3. Procedures

The study focuses on the discovery aspect of process mining as described in the Process Mining Manifesto (van der Aalst et al., 2012). More specifically, an event log extracted from the information system is modified, cleaned, and loaded into a process mining tool. Without using any a priori information, the process mining tool extracts a process model from the event log that reflects the process currently embedded in the organization's information system.
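The paper does not describe the discovery algorithm inside Disco, so the following is only a hedged illustration of the general idea of deriving a model from an event log with no a priori information. It counts the directly-follows relation (how often activity a is immediately followed by activity b within the same case), a common basic building block of discovered process maps. The tuple layout (case_id, activity, timestamp) and the function name `directly_follows` are assumptions made for this sketch; it does not reproduce Disco's own algorithm.

```python
from collections import Counter, defaultdict
from datetime import datetime


def directly_follows(events):
    """Count how often activity a is directly followed by activity b within a case.

    events: iterable of (case_id, activity, timestamp) tuples (assumed layout).
    """
    traces = defaultdict(list)
    for case_id, activity, ts in events:
        traces[case_id].append((ts, activity))

    dfg = Counter()
    for steps in traces.values():
        steps.sort(key=lambda step: step[0])  # order each case by timestamp
        activities = [activity for _, activity in steps]
        for a, b in zip(activities, activities[1:]):
            dfg[(a, b)] += 1
    return dfg


if __name__ == "__main__":
    log = [
        ("b1", "Cataloguing/Add", datetime(2016, 1, 4, 9, 0)),
        ("b1", "Authorities/Add", datetime(2016, 1, 4, 17, 0)),
        ("b1", "Cataloguing/Modify", datetime(2016, 1, 4, 17, 1)),
        ("b2", "Cataloguing/Add", datetime(2016, 1, 4, 9, 5)),
        ("b2", "Cataloguing/Modify", datetime(2016, 1, 4, 9, 6)),
    ]
    for (a, b), n in sorted(directly_follows(log).items()):
        print(f"{a} -> {b}: {n}")
```

Each counted pair corresponds to an arc of a process map and each count to the arc's absolute frequency. The frequency-based simplification discussed in the procedure (hiding infrequent paths to avoid a spaghetti-like model) would operate on counts like these.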
The methodology followed here is based on the five-phase process diagnostics method presented by Bozkaya et al. (2009; Fig. 1) and is applied to the event logs of an academic library and a municipal library in Greece, using the Disco process mining tool. An event log contains all relevant activity performed in a system, recorded in a well-defined way such that each event corresponds to an activity and is associated with a process instance. In addition, the actor of each event and a timestamp can also be recorded with each record in the event log (van der Aalst & Dustdar, 2012). Both libraries use the same open source ILS (Koha) to manage their library processes.

Fig. 1. Methodology applied for cases in the study.

3.1. Process mining steps

3.1.1. Log preparation

In this first phase, the data extracted from the ILS need to be cleaned and transformed into a format suitable for process mining. Because Koha is open source and users can freely customize the system, it is critical to create a logging scheme in which events are presented in a meaningful and helpful way. Some extra effort is expected to be needed to extract appropriate information in the right form from the event logs of open source software projects (Kouzari & Stamelos, 2013). Nevertheless, in this study it is of great importance to create comparable event logs in order to present process mining results and produce process diagrams that contain the same level of information. In general, an event log needs to contain at least the timestamp of each event, the case ID, and the activity performed. It is also helpful, but not required, to include information on supporting activities and on the user performing the activity.

3.1.2. Log inspection

In this step, the researcher investigates the data in the event log to extract useful information on the size of the log (both the number of cases and the number of columns per case); to identify possible processes, major activities, and supporting activities; and to gain insight into the size of the process and of the event log. In some cases, it might be useful to remove incomplete cases using filters provided by the process mining tool.

3.1.3. Control flow analysis

A first process model is then constructed by the process mining tool. When multiple cases in the event log do not conform to the process model, a spaghetti-like model is produced, meaning that there is no clear path that resembles a process being followed. In that case, the filters provided by the process mining tool can again be applied to exclude infrequent events from the construction of the process model.

3.1.4. Performance analysis

The process models extracted in step 3 can then be used for performance analysis to check for bottlenecks in the process. The throughput time of each activity can be used as a measure for comparing organizations, in order to identify areas for process improvement and best-case implementations.

3.1.5. Role analysis

Finally, the data available on the persons involved in the process are analyzed to determine who executes which activities and to explore the productivity of users. The system's administrators can then use this kind of insight when deciding on work allocation among team members and related tasks. At the end of the procedure, a report is prepared to communicate the findings of the research to the administrators of the system or to management.

3.2. The ILS

Koha is a web-based open source ILS written in Perl and distributed under the GNU General Public License. With a very active community of developers around the world, Koha was first released in 2000 and has since gained wide acceptance; it is used in hundreds of organizations (in both the government and private sectors) around the world. Koha interacts with a MySQL database and supports a variety of library activities, the most widely used being cataloguing, acquisitions, circulation, and administration (Macan, Fernandez, & Stojanovski, 2013). Koha can be installed and run under all operating systems and offers a very configurable and adaptable user interface, allowing each organization to set it up according to local procedures and policies. These characteristics, together with the availability of Koha implementations in Greek libraries, made it an ideal candidate for this research.

3.3. The cases

Data from the cases will be referred to from this point forward as sample1 (the municipal library) and sample2 (the academic library), respectively.

3.3.1. Municipal library network

Sample1 was extracted from the Koha ILS of a municipal library network in Greece. The municipality is one of the biggest in Greece, with a population of almost 100,000, and belongs to one of the biggest Greek urban areas. The city's library network consists of a central library building and six children's libraries. The ILS integration in the library network was initiated in 2010 and concluded in 2014. Koha 3.6 was initially installed with complete functionality, and library personnel were trained to use it and were given documentation in Greek.

3.3.2. Academic library

Sample2 was extracted from the Koha ILS of the central library building of one of the biggest universities in Greece. The library was founded in 1927 and is the second largest in Greece, after the Greek National Library. Koha was fully installed in 2015. With millions of resources and thousands of registered users, the ILS circulates close to 29,000 books a month.

¹ http://www.promtools.org
² http://fluxicon.com/disco
³ http://www.rapidprom.org
⁴ http://evergreen-ils.org
⁵ http://www.koha.org
⁶ http://www.sigb.net/pmb

4. Results

Simple text editors and spreadsheet software were used for the first step, and the remaining steps were completed with the Disco process mining tool. The event logs were extracted from the table action_logs, which consists of seven columns, listed in Table 1 and described below.

Table 1
Fields used for process mining in sample 1.

Column | Contains | Assigned as
1 | Event ID | IGNORED
2 | Timestamp | TIMESTAMP
3 | User ID | RESOURCE (User ID)
4 | Module | ACTIVITY (Main Activity)
5 | Action | ACTIVITY (Action)
6 | Object | CASE ID
7 | Additional information in SQL form | IGNORED


• action-id: the unique number automatically assigned to each event that takes place in the system
• timestamp: a complete timestamp for each event in the format yyyy-mm-dd HH:mm
• user: the numeric ID assigned to the registered user who was responsible for the action in each event
• module: the main activity (module) selected for the action
• action: the action performed in the selected module
• object: the object on which the action was performed (usually a numeric ID for a resource); for this case study the object field was selected as the case ID in the application of process mining, for the reasons explained below
• info: other information in the form of SQL sub-queries, which is ignored for the purposes of this study

Table 2
Actions and activities that create the 15 activities used in sample 1 process models.

Main activities (columns): Cataloguing, Circulation, Authorities, Members, System preference
Actions (rows): MODIFY, ADD, ISSUE, RETURN, DELETE, CREATE, ADDCIRCMESSAGE, DELCIRCMESSAGE, RENEW
[Cell markings (*) for the 15 module/action combinations are not recoverable from this copy.]
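The column assignment in Table 1 and the field descriptions above can be sketched as a small parser that turns each cleaned log row into an event with a case ID, activity, timestamp, and resource. This is a minimal sketch, assuming the prepared log is tab-delimited with the seven columns in the order of Table 1. The names `Event`, `parse_row`, and `combinations` are hypothetical, and the module and action strings are kept exactly as stored rather than re-cased as in the tables. The study itself prepared its logs with text editors and a spreadsheet, not with code like this.

```python
from datetime import datetime
from typing import NamedTuple, Optional


class Event(NamedTuple):
    case_id: str          # column 6, object (e.g. a book ID)
    activity: str         # columns 4 + 5, "module/action"
    timestamp: datetime   # column 2
    resource: str         # column 3, user ID


def parse_timestamp(text: str) -> datetime:
    # The field list gives yyyy-mm-dd HH:mm, but sample1 also shows seconds.
    for fmt in ("%Y-%m-%d %H:%M:%S", "%Y-%m-%d %H:%M"):
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    raise ValueError(f"unrecognised timestamp: {text!r}")


def parse_row(line: str) -> Optional[Event]:
    """Map one tab-delimited action_logs row to an event, following Table 1."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) < 6:
        return None  # incomplete row: skipped rather than guessed
    _event_id, ts, user, module, action, obj = cols[:6]  # column 7 (info) ignored
    return Event(obj, f"{module}/{action}", parse_timestamp(ts), user)


def combinations(events):
    """Set of (module, action) pairs that occur, i.e. the marked cells of Table 2."""
    return {tuple(e.activity.split("/", 1)) for e in events}
```

Applied to the whole log, `len(combinations(events))` would give the number of distinct activities. The paper reports 15 for sample1 and 16 for sample2.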

4.1. The municipal library

4.1.1. Log preparation

The 302 MB log file contained header and footer information regarding the server configuration and the structure of the table action_logs. The first event was dated 2015-03-02 22:42:49 and the last one 2016-12-01 09:01:01 (the time of extraction). Each event had to be placed on a new line to be treated as a separate record at a later stage. Simple text editor commands were used to remove the SQL-related part of each record (such as the “INSERT INTO” at the beginning of each record). As a result, a text file with 1,049,834 lines of comma-separated values (CSV) was formed and loaded into a spreadsheet program. Unfortunately, the file was too big for the spreadsheet program and only the first 1,048,575 records were included, moving the ending date to 2016-06-28 08:26. At this stage, string functions were applied to the first column to clean the action-ID field. The processed file was saved as a tab-delimited text file and used as the process mining event log for sample1.

4.1.2. Log inspection

The event log was loaded into Disco. The Event ID column was ignored, since it solely identifies an event and does not affect the process itself. The timestamp column was essential for the application of process mining, as was the user ID column, which was set as the resource in the process mining terms used by Disco. The module and action columns contained the activities performed in the ILS, and therefore both were set as activities in Disco's terms. The object column contained the case ID (in this case, the book identification number). Finally, the last column of the event log contained additional information in SQL and was ignored, as it was of no importance for the process mining procedure.

During log inspection, 1,045,108 events were loaded into the process mining tool, corresponding to 69,618 cases. In total, 654 users had initiated at least one event in the system. Five main activities were identified, namely Cataloguing, Circulation, Authorities, Members, and SystemPreference. In addition, nine actions were applied in the context of the main activities: Modify, Add, Issue, Return, Delete, Create, Addcircmessage, Delcircmessage, and Renew. The combinations of main activities and actions yielded 15 activities (Table 2).

4.1.3. Control flow analysis

The original process model constructed by Disco is shown in Fig. 2. This model illustrates the absolute frequency of cases for all activities in the event log, limited to showing only a single path to avoid a “spaghetti-like” process model. The results of this analysis show that Cataloguing/Modify is clearly the most frequently executed activity in the organization. Disco allows for deeper insights into the data, revealing that most cases appear in Cataloguing/Modify, Cataloguing/Add, and Authorities/Add, and that the top values for mean duration are observed in the Cataloguing/Modify activity and in Cataloguing/Delete, Members/Create, Authorities/Delete, Authorities/Modify, Cataloguing/Add, and Circulation/Issue. In the process model, all 15 activities are present, with a median frequency of 1132. Table 3 presents the most common activities. Cataloguing/Modify, Cataloguing/Add, and SystemPreference/Modify are most often the first activity in a case. However, all five activities presented in Table 3 are also the activities that most frequently appear last in a case. Most cases follow a very simple path of between 1 and 3 activities and end instantly. However, some cases (system related) are very common and run in the background, which inflates their mean active time.

To focus on the most common sequences of events in the system, filters were applied to the extracted process model. Only cases whose sequence of events was shared by at least 100 cases were kept, in order to limit exceptional behavior. Applying this variation filter results in the simplified model illustrated in Fig. 3. This model covers 79% of the cases, representing 39% of the events and 1% of the variants. Only 5 of the 15 activities are included. The model shows the case frequency along with the minimum duration per activity. The most common path is Cataloguing/Add -> Authorities/Add -> Cataloguing/Modify.

4.1.4. Performance analysis

Further analysis was performed on the events used to extract the simpler model. Fig. 4 shows the 10 most frequent variants along with their total duration for the entire event log. A variant is a path present in the system; it represents the scenario of a specific case. A variant does not show the start and end point of a process, but it does show the start and end point of the data associated with it. Variants help reveal different perspectives on the system. Variant 1 contains 12,792 cases with four events per case. A closer look shows that the sequence of the four events was Cataloguing/Add -> Cataloguing/Add -> Authorities/Add -> Cataloguing/Modify. Whereas the first event had minimal conclusion times (0 ms – 20 min), the second event had an average conclusion time of 11 h and the third approximately 8 h. The fourth and final event was executed instantly. Clearly, the time between the first and the subsequent activities is much longer. Since the log contains no information on the duration of activities, a likely explanation is that activities 2 and 3 require some kind of authorization, so the increase results from waiting for an administrator to grant access before the case can proceed. Analyzing all 10 most frequent variants led to the conclusion that all activities involving Authorities had very long waiting times, whereas all events dealing with Cataloguing activities were executed almost instantly.
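The conclusion times above are reported by Disco. Because each event carries a single timestamp, one way to approximate them is to take, for every event after the first, the gap since the previous event in the same case, and to average those gaps per variant. The sketch below does that under this assumption. The function names `ordered_cases` and `variant_gaps` and the (case_id, activity, timestamp) layout are hypothetical, and the sketch is not Disco's performance calculation.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean


def ordered_cases(events):
    """Group (case_id, activity, timestamp) tuples into time-ordered cases."""
    cases = defaultdict(list)
    for case_id, activity, ts in events:
        cases[case_id].append((ts, activity))
    for steps in cases.values():
        steps.sort(key=lambda step: step[0])
    return cases


def variant_gaps(events):
    """For each variant, return (number of cases, mean gap before each later event).

    A gap is the time since the previous event in the same case; with one
    timestamp per event, this is the only duration the log itself supports.
    """
    gaps_by_variant = defaultdict(list)
    for steps in ordered_cases(events).values():
        variant = tuple(activity for _, activity in steps)
        gaps = [later[0] - earlier[0] for earlier, later in zip(steps, steps[1:])]
        gaps_by_variant[variant].append(gaps)

    summary = {}
    for variant, gap_lists in gaps_by_variant.items():
        mean_gaps = [
            timedelta(seconds=mean(gaps[i].total_seconds() for gaps in gap_lists))
            for i in range(len(variant) - 1)
        ]
        summary[variant] = (len(gap_lists), mean_gaps)
    return summary
```

Sorting the summary by case count gives the most frequent variants. Long mean gaps in front of Authorities steps would then show up as the waiting times discussed above.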

4.1.5. Role analysis

Disco does not offer a rich variety of role analysis tools. However, since it does provide a resource indicator for each event, some insight into the users of the system was gained. The most active users are shown in Fig. 5. It is clear from the system labeling that in the majority of cases these profiles correspond to system accounts, which are used most of the time. The average user of the system (with relative frequencies ranging from 0% to 0.1%) takes part in an average of 50 events. Again, without any particular knowledge about the users (library staff), it is likely that these are users who have access to the system only for searching and other related activities. Since the activities in which these users participate are mostly Circulation/Issue and Circulation/Return, this explains their low frequency of using the system. Another group of users has a relative frequency of around 0.5% and participates in around 3000 events on average. This group most likely contains the system administrators who perform daily activities.

Fig. 2. Initial process model extracted from the event log of sample 1.

Table 3
Most common activities in sample 1.

Activity or action | Relative frequency (%)
Cataloguing/Modify | 62.81
Circulation/Issue | 10.96
Cataloguing/Add | 10.92
Circulation/Return | 10.79
Authorities/Add | 3.83

4.2. The university library

4.2.1. Log preparation

In the second case, the 4.2 MB log contained only header information, which differed from that of the first sample. For sample2, the header included the following query: “Query: SELECT * FROM koha.action_logs where year(timestamp)=2016 and month(timestamp)=11 and day(timestamp)=2,” indicating that the data provided were limited to one day. However, the structure of the table action_logs was visible through the INSERT INTO queries included. The first event was dated 2016-11-02 01:00 and the last one 2016-11-02 19:58. Sample2 contained only the events of one day because of restrictions set by the system administrator before providing the data. However, this does not create a sampling problem for the research, since the main goal is to mine the processes of the system rather than to compare the events statistically with those of sample1. For this sample, no extra processing was required, since the sample was already broken down into separate lines for each record, although text editor commands were again used to remove the SQL-related parts of each record. As a result, a CSV text file with 11,950 lines was formed and fully loaded into a spreadsheet program. The processed file was saved as a tab-delimited text file and used as the process mining event log for sample2.

4.2.2. Log inspection

The event log was loaded into Disco, and the same columns as in sample1 were assigned for process mining (Table 1). During log inspection, all 11,950 events were loaded into the process mining tool, corresponding to 4575 cases. In total, 74 users had initiated at least four events in the system on that day. Five main activities were identified, namely Fines, Cataloguing, Circulation, Authorities, and Members. In addition, ten actions were applied in the context of the main activities: Modify, Add, Issue, Return, Delete, Create, Addcircmessage, Delcircmessage, Renew, and Null. The combinations of main activities and actions yielded 16 activities (Table 4).
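The log inspection figures (events, cases, users with at least k events) and the per-user relative frequencies used in the role analysis are simple counts over the event log. The sketch below shows one way to compute them, assuming events are available as (case_id, activity, resource) tuples. The function names `inspect_log` and `resources_with_at_least` are hypothetical, and Disco's statistics views may define these figures slightly differently.

```python
from collections import Counter


def inspect_log(events):
    """Summarize a log of (case_id, activity, resource) tuples, as in log inspection."""
    events = list(events)
    total = len(events)
    per_resource = Counter(resource for _, _, resource in events)
    return {
        "events": total,
        "cases": len({case_id for case_id, _, _ in events}),
        "activities": len({activity for _, activity, _ in events}),
        "resources": len(per_resource),
        # Share of all events per resource, in percent (cf. the role analysis).
        "resource_share_pct": {
            resource: 100 * count / total
            for resource, count in per_resource.most_common()
        },
    }


def resources_with_at_least(events, k):
    """Resources (user IDs) that initiated at least k events."""
    counts = Counter(resource for _, _, resource in events)
    return sorted(resource for resource, count in counts.items() if count >= k)
```

With k = 1 this gives the number of users who initiated at least one event, as reported for sample1. With k = 4 it corresponds to the threshold stated for sample2.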

Fig. 3. Simplified process model containing the cases for which the sequence of activities is shared by at least 100 cases.

4.2.3. Control flow analysis

The original process model for sample2 (Fig. 6) illustrates the absolute frequency of cases for all activities in the event log, limited to showing only a single path to avoid a “spaghetti-like” process model. The results of this analysis show that Cataloguing/Modify and Fines are the most frequently executed activities in the library, and as a result they are also the activities that participate in the majority of the events in the event log. Using the tools provided by Disco, the top values for mean duration can also be observed for the Circulation/Issue and Circulation/Return activities and their transitions to Members/Create and Fines/Create in the first case and to Fines in the second case. In the process model, all 16 activities are present, with a mean case duration of 24.1 min and a median frequency of 78. Table 5 presents the most common activities in this case. Cataloguing/Modify, Fines, and Circulation/Issue are the first activities in most cases. These three activities are also the ones that most frequently appear last. In sample2, as in sample1, most cases follow a very simple path of between 1 and 3 activities and end instantly. However, some cases (system related) are very common and run in the background, which inflates their mean active time.

Since sample2 was limited to the events of one day, graphical illustrations of the events over time (Fig. 7) and of the active cases over time (Fig. 8) were also extracted. Starting at midnight, there is a burst of events up until 8:00 am, followed by a limited number of events. This burst corresponds to system activities that are initiated, some of which clearly take several hours to complete. From 8:00 am until 8:00 pm, the figures show an increased number of events served by the system, followed by a decrease as the end of the day approaches for librarians and users.

Filters were applied to focus on the most common sequences of events in the system. Only cases whose sequence of events was shared by at least 100 cases were kept, in order to limit exceptional behavior. Once the variation filter was applied, the process model was extracted (Fig. 9). This model covers 77% of the cases, representing 53% of the events and 3% of the variants. Only 5 of the 16 activities are included. The model shows the case frequency along with the minimum duration per activity. The most common path is still Cataloguing/Add -> Cataloguing/Modify.

4.2.4. Performance analysis

For sample2, performance analysis was carried out on the original event log rather than the filtered one, in order to keep all the available information. Fig. 10 shows the most frequent variants along with their total duration for the entire event log. Although most variants contain only a few events per case, these events mostly involve the Cataloguing/Modify activity, which is performed instantly. However, in some cases, even between events referring to the same activity (Cataloguing/Modify -> Cataloguing/Modify), there are large time deviations in case completion. For example, 7–9 h might pass before one event proceeds to the next; this may be caused by interfering system activities.

Library management in different organizations may follow quite different policies. In sample2, Fines, a common procedure for book returns, is shown to be delayed. This is illustrated in both the general and the filtered process models, as Fines is one of the most frequently appearing activities. Variant 17 (Fig. 11) reveals additional information about the process of creating fines. It consists of five events. The sub-process originates with the Fines activity, which is reset each day, and then proceeds to the Fines activity for a specific event. Circulation/Return then takes place. In the case of expired due dates, the Fines/Modify activity appears, followed immediately by Fines/Create, which creates the fine and ends the case.
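The variation filter applied to both samples (keep only cases whose sequence of events is shared by at least 100 cases) and the three coverage percentages reported for the simplified models (share of cases, events, and variants retained) can be sketched as follows. This is an assumed reconstruction of the filter's semantics, not Disco's implementation. The function name `variant_filter` is hypothetical, and the sketch expects a non-empty log.

```python
from collections import Counter


def variant_filter(traces, min_cases):
    """Keep only cases whose variant is shared by at least `min_cases` cases.

    traces: dict mapping case_id -> time-ordered list of activities (non-empty log).
    Returns the kept traces and the percentages of cases, events, and variants kept.
    """
    variant_counts = Counter(tuple(t) for t in traces.values())
    kept_variants = {v for v, n in variant_counts.items() if n >= min_cases}
    kept = {c: t for c, t in traces.items() if tuple(t) in kept_variants}

    all_events = sum(len(t) for t in traces.values())
    kept_events = sum(len(t) for t in kept.values())
    coverage = {
        "cases_pct": 100 * len(kept) / len(traces),
        "events_pct": 100 * kept_events / all_events,
        "variants_pct": 100 * len(kept_variants) / len(variant_counts),
    }
    return kept, coverage
```

With the study's threshold, this would be called as `variant_filter(traces, 100)`. A filter of this kind retains many cases but few variants, because a handful of short, very common variants dominate the log. That pattern matches the reported figures (79% of cases but 1% of variants for sample1).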

4.2.5. Role analysis
This phase highlighted the most active users for sample2 (Fig. 12). It is once again clear from the labeling in the system that resource 0 corresponds to a system account, which is used most of the time. The remaining users interacted with the system with relative frequencies ranging from 0.1% to 4.27%. At this point, and with no other information, it is not possible to discern with confidence a group of users representing employees and a group representing students.

Fig. 4. 10 most frequent variants for sample 1.

Fig. 5. Most active users in events for sample 1.

Table 4 Actions and activities that create the 16 activities used in sample 2 process models.
Main activities: Cataloguing, Circulation, Authorities, Members, Fines
Actions: MODIFY, ADD, ISSUE, RETURN, DELETE, CREATE, ADDCIRC MESSAGE, DELCIRC MESSAGE, RENEW, NULL

Table 5 Most common activities in sample 2.
Activity or action      Relative frequency (%)
Cataloguing/Modify      45.92
Fines                   30.76
Circulation/Issue        7.97
Circulation/Return       7.15
Fines/Modify             2.49

Fig. 6. First process model extracted for sample 2.

Fig. 7. Events over time for sample 2.

Fig. 8. Active cases over time for sample 2.

Fig. 9. Filtered process model for sample 2.

5. Discussion
To begin with, the difference in the number of events per system was very interesting, since it allowed different process models to be extracted and also made it possible to identify distinguishing characteristics of each system. With events covering a period of 20 months for the first case and a single day for the second, different levels of information were analyzed, leading to different insights from the extracted process models. The text files containing the raw data were almost identical in structure, and both used the same table and the same attributes to store the data. This made it easier to create comparable event logs and to arrive at process models that contained the same activities. The slight variations observed in a small number of activities were expected, since the library system is used in organizations that operate in different domains.

In the municipal library, the originally extracted process model was complicated, with Cataloguing/Modify being the most frequently observed activity in the cases. Applying the filter produced a highlighted process that is clearly simpler than the original model. However, the most frequently used activity was still Cataloguing/Modify, and the most frequently observed path was Cataloguing/Add -> Authorities/Add -> Cataloguing/Modify. In the university library, even though the number of events was limited to a few thousand, the first extracted process model was once again fairly complicated. Cataloguing/Modify was again the most frequently used activity in the cases. The Fines activity, which was not used in the first library, was the second most frequent. The mean times per case were shorter for the second library, since the data were limited to one day, but this allowed the extraction of diagrams illustrating the number of events per hour of the day. Because all the events took place within a single day, system activities that usually take place but were not scheduled to run on that particular day may be missing from the log, which would lower the mean times per case. Filtering the second sample led to a simpler model, highlighting once again the same activities as the most frequent.
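The per-hour diagrams mentioned above (events over time and active cases over time for the single-day sample) amount to simple bucketing of timestamps. The sketch below is an illustration under stated assumptions, not Disco's definition: it assumes `(case_id, activity, timestamp)` tuples that all fall on one calendar day, and it treats a case as "active" in an hour if the span from its first to its last event overlaps that hour. The function names are hypothetical.

```python
from collections import Counter, defaultdict
from datetime import datetime


def events_per_hour(log):
    """Number of events in each hour of the day (index 0 = 00:00-00:59)."""
    counts = Counter(ts.hour for _, _, ts in log)
    return [counts[h] for h in range(24)]


def active_cases_per_hour(log):
    """Number of cases whose span, from first to last event, overlaps each hour.
    Assumes, as in sample2, that all events fall on the same calendar day."""
    stamps = defaultdict(list)
    for case_id, _, ts in log:
        stamps[case_id].append(ts)
    active = [0] * 24
    for case_stamps in stamps.values():
        first, last = min(case_stamps), max(case_stamps)
        for hour in range(first.hour, last.hour + 1):
            active[hour] += 1
    return active


if __name__ == "__main__":
    # Toy events for illustration only: a long-running night-time case and two daytime cases.
    day = datetime(2020, 1, 1)
    log = [
        ("sys1", "Fines", day.replace(hour=0, minute=5)),
        ("sys1", "Fines", day.replace(hour=6, minute=40)),
        ("c1", "Circulation/Issue", day.replace(hour=9, minute=15)),
        ("c2", "Circulation/Return", day.replace(hour=9, minute=50)),
    ]
    print(events_per_hour(log))
    print(active_cases_per_hour(log))
```

A long-running system case counts as active in every hour it spans even when it produces no events there, which is how a night-time burst of events can coexist with several hours of active but quiet cases.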


Fig. 10. Most frequent variants for sample 2.
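A variant ranking like the one in Fig. 10 (each variant with its number of cases and duration), together with summary case-duration statistics, can be sketched as follows. This is an illustrative sketch, not the authors' or Disco's code: it assumes `(case_id, activity, timestamp)` tuples, defines case duration as the time between a case's first and last event, and uses hypothetical function names.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean, median


def case_summaries(log):
    """Map each case to (variant, duration): the time-ordered tuple of its
    activities and the time between its first and last event."""
    by_case = defaultdict(list)
    for case_id, activity, ts in log:
        by_case[case_id].append((ts, activity))
    summaries = {}
    for case_id, events in by_case.items():
        events.sort()
        variant = tuple(activity for _, activity in events)
        summaries[case_id] = (variant, events[-1][0] - events[0][0])
    return summaries


def variant_table(log, top=10):
    """Most frequent variants as (variant, number of cases, mean duration in minutes)."""
    durations = defaultdict(list)
    for variant, duration in case_summaries(log).values():
        durations[variant].append(duration.total_seconds() / 60)
    rows = [(v, len(d), mean(d)) for v, d in durations.items()]
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows[:top]


def case_duration_stats(log):
    """Mean and median case duration in minutes over the whole log."""
    minutes = [d.total_seconds() / 60 for _, d in case_summaries(log).values()]
    return mean(minutes), median(minutes)


if __name__ == "__main__":
    # Toy events for illustration only, not data from either library.
    t0 = datetime(2020, 1, 1, 9, 0)
    log = [
        ("c1", "Cataloguing/Modify", t0),
        ("c1", "Cataloguing/Modify", t0 + timedelta(minutes=10)),
        ("c2", "Cataloguing/Modify", t0),
        ("c2", "Cataloguing/Modify", t0 + timedelta(minutes=20)),
        ("c3", "Fines", t0),
    ]
    for variant, n_cases, mean_minutes in variant_table(log):
        print(" -> ".join(variant), n_cases, mean_minutes)
    print(case_duration_stats(log))
```

Reporting the mean alongside the median is a deliberate choice here: a few background cases that stay open for hours can pull the mean far above the typical, near-instant case.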

The performance analyses showed that, in the municipal library, all activities involving Authorities presented delays of several hours, which lengthened the response times for these cases. This was not true in sample2. Since there was no information on the actual processes, it is possible that in the university library the procedures requiring Authorities intervention had been minimized to avoid delays. If that is the case, the municipal library could benefit from following a similar approach. Performance-wise, the university library appears to have worked towards optimizing the way in which fines are handled for overdue books. Since Fines is one of the most common activities, it causes delays in some cases. However, in the majority of cases, fines are issued at night, when the library is closed and the due dates of the books expire. This ensures that by the morning the fines have already been issued, and the system is not stalled by repeated automated requests that do not require users' attention. As a result, the system is able to serve the rest of the requests, most of which concern Circulation activities.

In both libraries, role analysis was difficult. Disco is not optimized for this kind of analysis; thus, even though some user groups emerged, their interpretation was based on assumptions and on the relative frequencies provided. A more advanced tool, such as ProM, would allow more conclusions to be drawn regarding the various roles and might make it possible to identify user groups. It would be interesting to mine processes from the logs of different ILSs in comparable libraries and observe similarities and differences. In addition, it would be of great interest to replicate this methodology with other process mining tools, not only to extract various process models but also to extract social graphs and perform user clustering.

Fig. 11. Graphical representation of Variant 17 of sample 2.
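The delays discussed above (hours-long waits around Authorities activities, or 7–9 h gaps between consecutive Cataloguing/Modify events) are waiting times on directly-follows transitions. A minimal way to locate such bottlenecks is to collect, for every pair of consecutive activities within a case, the time elapsed between them. This sketch assumes `(case_id, activity, timestamp)` tuples and uses hypothetical function names; the toy events only mimic the pattern described in the text and are not data from either library.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def transition_waits(log):
    """For each directly-follows pair (a, b) within a case, return
    (occurrences, mean wait in hours, max wait in hours)."""
    by_case = defaultdict(list)
    for case_id, activity, ts in log:
        by_case[case_id].append((ts, activity))
    waits = defaultdict(list)
    for events in by_case.values():
        events.sort()
        for (t1, a), (t2, b) in zip(events, events[1:]):
            waits[(a, b)].append((t2 - t1).total_seconds() / 3600)
    return {pair: (len(w), sum(w) / len(w), max(w)) for pair, w in waits.items()}


def slowest_transitions(log, top=3):
    """Transitions ranked by mean waiting time, longest first."""
    stats = transition_waits(log)
    return sorted(stats.items(), key=lambda item: item[1][1], reverse=True)[:top]


if __name__ == "__main__":
    t0 = datetime(2020, 1, 1, 9, 0)
    log = [
        ("c1", "Cataloguing/Add", t0),
        ("c1", "Authorities/Add", t0 + timedelta(minutes=5)),
        ("c1", "Cataloguing/Modify", t0 + timedelta(hours=8, minutes=5)),
        ("c2", "Cataloguing/Modify", t0),
        ("c2", "Cataloguing/Modify", t0 + timedelta(minutes=30)),
    ]
    for (a, b), (n, mean_h, max_h) in slowest_transitions(log):
        print(f"{a} -> {b}: n={n}, mean={mean_h:.2f} h, max={max_h:.2f} h")
```

Keeping the maximum next to the mean helps separate transitions that are always slow from those that are usually instant but occasionally stall, as in the Cataloguing/Modify -> Cataloguing/Modify case.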

Fig. 12. Most frequent users for sample 2.

6. Conclusion
This methodology for process discovery across different organizations using the same ILS has been shown to be able to illustrate differences both in the activities and in the approaches followed by each organization. Analyzing several process models highlighted best practices and bottlenecks, making it possible for managers to change policies and procedures and achieve efficiencies. Comparisons with event logs from other ILSs, as well as process optimization for organizations whose processes need improvement, could be of use not only to upper-level administrators but also to the people who interact with these systems, helping them to be more efficient in their day-to-day activities. Process mining findings could be interpreted, prepared, and shared in customized reports to managers and/or master users of such systems. Any changes brought about could be monitored with further analyses. Process mining has yet to reach full maturity; there will be more technical advances and more sophisticated tools in the future. The gains will be significant, and the organizations that stand to benefit from this kind of research are numerous. This research has demonstrated that libraries can and should be among those organizations.

References
Bozkaya, M., Gabriels, J., & van der Werf, J. M. (2009). Process diagnostics: A method based on process mining. In A. Kusiak, & S.-g. Lee (Eds.). International Conference on Information, Process, and Knowledge Management, 1–7 February, 2009, Cancun, Mexico (pp. 22–27). Los Alamitos, CA: IEEE. https://doi.org/10.1109/eKNOW.2009.29.
Breeding, M. (2007). It's time to break the mold of the original ILS. Computers in Libraries, 27(10), 39–41.
De Weerdt, J., Schupp, A., Vanderloock, A., & Baesens, B. (2013). Process mining for the multi-faceted analysis of business processes: A case study in a financial services organization. Computers in Industry, 64(1), 57–67. https://doi.org/10.1016/j.compind.2012.09.010.
Delias, P., Doumpos, M., Manolitzas, P., Grigoroudis, E., & Matsatsinis, N. (2013). Clustering healthcare processes with a robust approach. European Conference on Operational Research, 26. Retrieved from http://www.researchgate.net/publication/245883299_Clustering_Healthcare_Processes_with_a_Robust_Approach/file/60b7d51d875ee204e4.pdf.
Kouzari, E., & Stamelos, I. (2013). Process mining in software events of open source software projects. Paper presented at 2nd International Symposium & 24th National Conference on Operational Research, September 26–28, 2013, Athens, Greece. Retrieved from https://www.researchgate.net/publication/276280864_Process_Mining_in_Software_Events_of_Open_Source_Software_Projects/download.
Macan, B., Fernandez, V. G., & Stojanovski, J. (2013). Open source solutions for libraries: ABCD vs Koha. Program, 47(2), 136–154.
Müller, T. (2011). How to choose a free and open source integrated library system. OCLC Systems & Services: International Digital Library Perspectives, 27(1), 57–78. https://doi.org/10.1108/10650751111106573.
Park, S., & Kang, Y. S. (2016). A study of process mining-based business process innovation. Procedia Computer Science, 91, 734–743. https://doi.org/10.1016/j.procs.2016.07.066.
Partington, A., Wynn, M. T., Suriadi, S., Ouyang, C., & Karnon, J. (2015). Process mining for clinical processes: A comparative analysis of four Australian hospitals. ACM Transactions on Management Information Systems, 5(4), 1–19.
Rojas, E., Munoz-Gama, J., Sepulveda, M., & Capurro, D. (2016). Process mining in healthcare: A literature review. Journal of Biomedical Informatics, 61, 224–236. https://doi.org/10.1016/j.jbi.2016.04.007.
van der Aalst, W., Adriansyah, A., De Medeiros, A. K. A., Arcieri, F., Baier, T., Blickle, T., & Wynn, M. (2012). Process Mining Manifesto. Retrieved from http://www.win.tue.nl/ieeetfpm/doku.php?id=shared:process_mining_manifesto.
van der Aalst, W. M. P., & Dustdar, S. (2012). Process mining put into context. IEEE Internet Computing, 16(1), 82–86. https://doi.org/10.1109/MIC.2012.12.
Wang, Y., & Dawes, T. A. (2012). The next generation integrated library system: A promise fulfilled. Information Technology and Libraries, 31(3), 76–84. https://doi.org/10.6017/ital.v31i3.1914.

Elia Kouzari is a PhD candidate in software engineering in the Department of Informatics, Aristotle University of Thessaloniki, Greece. She holds a master's degree in computer science from the University of Cyprus. She works as an IT consultant in a statistics and informatics firm based in Greece and Cyprus, where she collaborates with organizations such as Eurostat and the European Central Bank. In addition, she has been teaching in several institutions of tertiary education in Cyprus since 2011. She is currently an external associate in the Computer Science Department, European University of Cyprus, teaching courses in the area of internet technologies and information systems, and is also an external lecturer at the Mediterranean Institute of Management (Cyprus), teaching enterprise information systems. She is a member of the Cypriot Society for Free/Open Source Software and a member of the Cyprus Computer Society.

Ioannis Stamelos is a professor in the Department of Informatics, Aristotle University of Thessaloniki, Greece, where he carries out research and teaching in the area of software engineering and information systems. He holds a Diploma in electrical engineering and a PhD in computer science from the Aristotle University of Thessaloniki. He has published approximately 200 works in journals, conference proceedings, and other venues. He has been the scientific coordinator or principal investigator for the University in more than 30 research and development projects in information and communication technologies, with funding from national and international organizations. He currently runs the Open Source Excellence Center of the University and is a member of the Board of Directors of the Hellenic Society for Free/Open Source Code. He is also an adjunct professor at the Hellenic Open University.