Research stakeholders identification using a mobile agent framework

Mihai Horia Zaharia, Florin Alexandru Hodorogea

"Gheorghe Asachi" Technical University, Iași, Romania

Expert Systems With Applications 72 (2017) 18–29

Corresponding author: M.H. Zaharia, Faculty of Automatic Control and Computer Engineering, Dept. of Computer Engineering, Str. Prof. dr. doc. Dimitrie Mangeron nr. 27, Iași 700050, Romania. Fax: +40 232 231343. E-mail addresses: [email protected] (M.H. Zaharia), [email protected] (F.A. Hodorogea).

Article history: Received 9 May 2015; Revised 11 October 2016; Accepted 4 December 2016; Available online 6 December 2016.

Keywords: Distributed computing; Artificial intelligence; Business process re-engineering.

Abstract

In business process re-engineering, one of the main problems is the organizational reintegration of employees. New instruments are needed for the automatic identification of stakeholders during the evaluation of existing staff. Identifying a stakeholder is difficult, especially a science stakeholder, because a proper evaluation of a researcher's value and standing at the international level is required, and there are large contradictions concerning the validity of the various indexing services and other techniques used to rank researchers. This paper proposes the use of corporate Human Resource Management (HRM) instruments for identifying researchers as informational stakeholders. The main idea behind the method is to apply information retrieval techniques over a set of research papers that largely share the same keywords and to extract the hierarchy of the researchers involved or referenced in that set. A distributed approach was used due to its inherent scalability, and an inference engine was deemed necessary due to the complexity of the problem. The system may be used for the automatic assessment of research performance in public or private institutions, and it can easily be modified to support complex sets of rules depending on each user's needs in personnel evaluation. The main research field of this paper is distributed artificial intelligence. © 2016 Elsevier Ltd. All rights reserved.

1. Introduction

The concept of business process re-engineering (BPR) was introduced more than twenty years ago (Davenport & Short, 1990). The main idea was to convert classic ways of doing business into a more flexible and efficient form using information technology (IT) (Hammer & Champy, 1993). BPR was related to the organizational changes that were needed when the first wave of IT adoption began. Since then, IT involvement in organizational workflows at various levels has continuously increased. As a result, the BPR concept has begun to be reanalyzed (Zamudio, Zheng, & Callaghan, 2012). Nowadays the concept applies to the changes that any organization must undergo in order to merge with another one using an advanced software platform, such as one based on service oriented architecture (SOA). The new SOA-based approach gives a new dimension to BPR because the concepts must be adapted to the new architectural characteristics.


The complexity of the changes required to integrate a new organization into a larger one calls for the involvement of artificial intelligence (AI) (Yu, Mylopoulos, & Lespérance, 1996). Because most international organizations nowadays have geographically distributed branches, an approach based on intelligent agents is recommended, especially given the advancement of multi-agent systems (Gorodetskii, 2012). Yet, there are some problems concerning BPR, among which personnel efficiency still remains an issue (Terziovski, Fitzpatrick, & O'Neill, 2003). When BPR is applied within an organization, some Human Resource Management (HRM) changes must also take place. This raises the classic problem of identifying the organization's informational stakeholders, especially the latent or hidden ones (Mitchell, Agle, & Wood, 1997). This is critical because the advantage provided by BPR can be quickly lost if a stakeholder is moved from his or her area of expertise or, in the worst case, fired. Friedman and Miles (2001, 2006) define the stakeholder as an individual who has the knowledge to influence a business process. Roberts and Mahoney (2004), after conducting an inquiry among managers, found that only about 65% of the participants used the term stakeholder correctly. Mitchell classified stakeholders by their influence but also by the urgency of their claims on the organization.


The danger of neglecting the claims of some stakeholder categories has been clearly presented, e.g. by Mitchell et al. (1997). Turner proposed some methods which can be used to properly manage stakeholder satisfaction (Turner, Grude, & Thurloway, 1999). Probably the most important type of stakeholder is the hidden or dormant one. Due to external or internal factors, subjective or objective, dormant or inactive stakeholders may suddenly change their state and become active from an organizational point of view. This activation has a roughly even chance of being beneficial for the organization. For instance, such stakeholders may begin to increase their performance and involvement, and help by offering solutions to various local problems. In other cases, however, the newly activated stakeholders may use their knowledge to disturb the activity of the local branch. This kind of involvement can be difficult to prove, even when it may destroy a project or a newly initiated organizational change. Consequently, new ways of identifying stakeholders must be developed. Unfortunately, there are no accurate ways of measuring stakeholders' real influence; the mere identification of part of the hidden or dormant stakeholders would be a great asset for the organization. In the classic approach, based on face-to-face interviews and performance evaluation, stakeholder identification is a slow process with many opportunities to err (Bryson, 2004; Zhao & Hou, 2010). Employees associate this type of evaluation with subsequent personnel decisions (dismissals, transfers and promotions) and engage in simulated behavior strategies in order to keep their jobs. The large-scale integration of workflows using IT would make this kind of evaluation partially or totally automatic. A problem may appear when one believes that the experience of the HR expert cannot be replaced. This is only partially true: the HR expert will remain useful to the organization, with the computer speeding up his or her work in order to increase efficiency. A part of the expert's experience will be used to generate the rules needed by an inference engine involved in document flow analysis. A possible approach to hidden stakeholder detection would be to extract information about the real structure of the document flow. There are situations when a document does not follow the expected processing flow; for example, it is sent to a third party for a supplementary check. Due to the high amount of information to be processed, this deviation in document flow structure can be detected only by automatic processing. In the scientific world, we may witness various differences between the quality and output of private and academic research. As far as private research is concerned, the pressure on employees may be high. However, it may bear very good results and offer a high income to the researcher. Performance evaluation is done using corporate-specific techniques. In publicly funded research, some quality factors are observed, such as the Hirsch index, but these alone are not enough to evaluate the whole performance of an individual within the organization (Priem, Taraborelli, Groth, & Neylon, 2010). In most cases, the criteria used in evaluations vary a lot. As a result, only indirect methods of performance analysis can be used. Most of the common methods of performance evaluation take into account external funding and some visibility indexes, such as the Hirsch evaluation, or they use a weighted sum based on the ISI impact factor values of each publication.
If the evaluation criteria used in HR management are correct, the approach may be successful. Unfortunately, within each organization there may be deviations regarding its evaluation principles. An internal instrument is needed, one that can analyze and also retrieve other research papers from the Internet in order to properly rate the importance of a researcher. The researcher can be viewed as a stakeholder of his or her organization (Lapointe, Mignerat, & Vedel, 2010). In this case, the evaluation must be made at the level of the researcher's group. To accomplish that, the group needs to be identified in the first place, and then the position of the researcher can be estimated.


The Achilles' heel of researcher performance evaluation is the individual's real influence in the domain, which may be smaller than expected even in the case of a large number of citations. The idea is to analyze the scientific papers selected from a domain, using a combination of keywords, and to extract the influence tree of the author. Until a few decades ago, this method was not applicable because there was not enough information available in electronic form. Nowadays, this is no longer a problem, and the proof that the needed information is available is represented by the existing plagiarism detection systems. So, only a few changes in knowledge extraction are enough to obtain a better representation of the real influence of a researcher in a specific field. The same technique can be simultaneously used to analyze the same researcher's influence in the R&D department of an organization. Finally, both results can be merged to obtain a pertinent performance evaluation of the whole work of the researcher. The research presented in this paper is related to the field of clustering documents (scientific papers) based on citations. In this research, the term clustering refers to the process of finding clusters of authors who cite one another. The approach is not new; it actually preceded current research in social networks (Morlacchi, Wilkinson, & Young, 2005). The first attempts were related to the identification of the researcher network structure created by the citing process (Newman, 2001a) and to the application of graph theory to the obtained network (Newman, 2001b). After the development of data mining, research mostly focused on clustering in order to capture the critical topics in research papers (Aljaber, Stokes, Bailey, & Pei, 2010) or on combining citation semantics (Tong, Dinakarpandian, & Lee, 2009). The original problem of using citations to identify the most influential researchers can be solved nowadays by creating the citation tree and analyzing the differences in citation frequencies (Cribbin, 2011). In this paper we present the use of a document workflow analysis system for the extraction of the research relationships among researchers. This is done by analyzing the domain of their published papers and also the references used in them. This may help to properly identify the stakeholders of a domain, in conjunction with other quantitative methods that are already in use. The presented system belongs to the area of computer-assisted decision support and can be used by HRM staff in the phase of employee evaluation. As previously mentioned, there are some aspects related to personnel evaluation that are hard to quantify. Even when standard measurement techniques are used, applying them properly requires the specific training of an HR expert. The main aim of this research is to offer an instrument that provides a supplementary and intuitive way for HR experts to assess scientific throughput efficiency. The presented instrument belongs to the area of distributed artificial intelligence. It was created on top of an existing general framework for distributed computing based on intelligent agents (developed using JADE). For this research, supplementary agents were developed according to the analyzed problem. There are some tools that can estimate the throughput of a researcher. The most important are Google Scholar and Microsoft Academic; ResearchGate also offers some facilities for researchers.
The analysis is more detailed if the 'Publish or Perish' tool is used. This tool uses the previously mentioned services to compute not only the h-index or i10-index but also more complex indicators, such as the contemporary h-index, together with some supplementary analyses: the age-weighted citation rate and an analysis of the number of authors per paper (Harzing, 2016). However, these tools compute statistical parameters for one individual and do not offer a full citation relationship graph for an author.


Those tools are focused only on computing universally accepted statistical parameters. This approach can sometimes produce errors, especially when deontology problems are involved. An analysis done over the general graph can easily offer (by direct computation or by visual inspection) a measure of the degree to which citation graphs overlap among researchers. MS Office type documents were not analyzed at this stage because their structure is supported by various libraries (MarkLogic, 2016) that can be used. This type of document has its own metadata; in this case, the agent will simply retrieve the document metadata and the reference list from the document in order to provide the system with all the required information. The influence graph presented in Fig. 14 was generated after analyzing research papers (in PDF format) from the Elsevier publisher. This is not a limitation, because Springer also freely offers the needed information (the paper title, keywords and references). PDF files were mainly used in the experiments because, as expected, even extracting only the needed (text-based) information was more difficult than in other cases. The paper is organized as follows. First, the main system architecture is discussed. Then comes the basic framework for distributed computing created on top of JADE; this approach was necessary because it significantly decreases the effort of creating the next layer (a supplementary framework), which is usually tuned for each researched problem. After that, the layer of agents used to carry out this specific research is presented, together with the way they interact with each other. The research instrument details are followed by the experiment description and, finally, by the results and conclusions, together with some future development possibilities.

2. The intelligent agent-based system

From an increasing number of agent-based frameworks (Weiss, 2013) that cover all the possible uses of agents, an improved framework for distributed computing was implemented using JADE, one of the most widely used frameworks for distributed applications based on agent technologies (Bellifemine, Caire, & Greenwood, 2007). Intelligent agents are used in virtually every aspect of business. This is not a surprise, as there is a need for both simulation and control. In electronic commerce, another field where agent systems are used, the dataflow specific to virtual partnerships between players can be handled using an agent-based data mining architecture; one of the goals is to help managers in the decision-making process (Warkentin, Sugumaran, & Sainsbury, 2012). To increase the security of business-related information, intelligent agents can be used to supervise the whole system (Moradian, 2008). Estimations of merchandise prices, their trends and the probability of receiving a customer order at a given price can be obtained by simulations using intelligent agents (Ketter, Collins, Gini, Gupta, & Schrater, 2012). Demand-driven services require dynamic composition; to lower costs, it must become as automatic as possible (Zaharia, 2014). The problem of handling specific ontologies can also be solved with the use of intelligent agents (Hoxha & Agarwal, 2010). The processing of distributed documents specific to knowledge-based organizations can also rely on an agent-based system (Godlewska, 2012). Workflow management systems that were initially simulated with Petri nets are now modeled, in most cases, with intelligent agents (Delias, Doulamis, & Matsatsinis, 2011).
Multi-agent systems (MAS) are also used in integrated business information systems (IBIS) to implement the orthogonal ontological constructs required by any conceptual grammar for modeling the system (Kishore, Zhang, & Ramesh, 2006). Business distributed databases can be improved if a MAS approach is adopted (Yang, Gen, & Cao, 2010).

Fig. 1. System Architecture.

In the organizational integration process, workflows must be integrated, too. Due to its distributed nature and complexity, an approach based on MAS is suitable (Guo, Robertson, & Chen-Burger, 2008). Agent-based applications can be used in virtually any aspect of high-level applications related to business management. The presented system is able to retrieve information from all the central nodes of a company and also from the Internet. Using intelligent agents provides the scalability and flexibility needed to solve various problems that may appear during the process. In short, the documents are retrieved and analyzed, and the needed information is extracted. Because the main aim of this stage of the research was to extract the citation graph, the main supported document formats were PDF and HTML, because most on-line journals use them. While in the case of HTML the extraction process was easy, in the case of PDF files some problems appeared. Finally, the documents are stored or discarded, depending on the user's choice. To do that, the framework should be installed on all the required nodes. A previous work, which used a different custom agent framework/tier for its implementation (Zaharia, Hodorogea, & Atanasiu, 2012), presented a system that solves the problem of extracting the real structure of an organization. Since in the present case the research was done on top of a dedicated framework, the system has been modified to answer new specifications and to perform new sets of analyses.

2.1. System design

The basic design of the distributed application is a partially centralized one. As a result, the system will use in parallel as many agents as possible (within the limitations of the accessed systems) to retrieve all the needed data. In the second phase, the system will use all the data to create a complex influence graph. Then, extracting a mapping tree for each desired researcher is a simple programming problem. The main architecture of the application is presented in Fig. 1. The approach is the multilayer, multitier one typical of any distributed application. The host operating system layer was explicitly introduced because the application can be deployed either on a workstation or in a cloud virtual machine. The second layer handles the persistence problems and is similar to that of an ERP, if the system is used in this manner. The system stores all the gathered information locally, so this layer is required only in the previously mentioned case. The classical business layer of the model was divided into several basic tiers for reasons of clarity.


Fig. 2. Modular structure of the application.

The first tier contains the JADE system, which offers the basic facilities for the next layer. The distributed computing framework tier consists of a general framework designed and implemented to solve this class of applications by means of intelligent agents. Finally, the last tier encloses all the agents needed to solve the problems presented in this paper. The presentation layer currently has custom-made interfaces needed to interact with the user. In the future, it may be replaced by the standard approach typical of the ERP used in the system. Fig. 2 shows the modular structure of the application. The granularity of an agent-based architecture sometimes makes a modular paradigm approach difficult. As a result, the presented modules have the aim of conveying the main specifications of the software. When documents from the company's ERP system are needed, the communication module will access the ERP module, which provides all the required interfaces to the persistence system of the ERP. The Data Persistence Module is a general adapter to any database management system (DBMS). Using a factory-of-factories design pattern, various DBMS-specific interfaces can be added without problems, as sketched below. This module was introduced to ensure good maintainability of the system during its life cycle. The Communication Module can directly access the ERP database using the Data Persistence Manager Module, but for design reasons the supplementary ERP Interface Module was introduced. Security in this case is minimal and based on standard login procedures, because the company's internal network is presumed safe. The Communication Module is also used as a gate needed by the Document Retrieval Module to gather the needed information from the Internet. The Enterprise Resource Planning (ERP) Module refers to the special interface that must be created in order to properly interact with a selected ERP system. The Document Retrieval Module implements one or more types of crawling components, as needed. In our case, one crawler based on unsecured access was developed for basic tests. If the source of documents cannot provide separate metadata for the document, or the information is embedded into the document, further processing is required in order to extract only the metadata needed by the system. The Metadata Generation Module solves this problem using various document processing techniques in order to obtain the desired pieces of information. These are the title of the paper, the author and the keywords of that paper, as well as the same information for each of the references within it. In the process of metadata extraction, one problem concerns natural language. Here there is the advantage that a scientific report or paper has well-defined rules of metadata generation, so natural language interpretation can be avoided. Because the documents used to test the system were from the area of engineering, a simple solution was to use the IEEE taxonomy. Nevertheless, the Dictionary Module was introduced in the system design to offer the possibility of handling any type of supplementary term dictionary, if needed. It will also handle the interaction with the SAP data dictionary (Stutenbäumer, 2016). Another inherent advantage of the dictionary approach is increased processing speed, because natural language processing is computationally intensive and occasionally prone to errors (Imler, Morea, & Imperiale, 2014). The dictionary-based approach also has some problems when certain terms are not provided at the level of the used dictionary, but this can easily be managed during system use. After the information from the documents is retrieved in the desired form, the next step is to create a graph following the citations. To do that, the Relationship Graph Generator Module is used. It analyzes all available data and generates the associated graph for the identified relationships. The result is also displayed graphically using the Presentation Module. After that, the hierarchy represented by one of the graph's spanning trees is generated using domain-specific algorithms. The system stores locally, in an XML file, all intermediary and final data sets produced by the framework, because the final results occupy little space on disk. Most of the temporary data is processed locally on the nodes where the agent is temporarily or permanently located, and only the metadata is transmitted to the central point.
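As an illustration of the factory-of-factories idea behind the Data Persistence Module, the following sketch shows how a top-level registry can hand out DBMS-specific factories, which in turn create the concrete adapters. All class and method names here are illustrative assumptions, not the system's actual API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

interface MetadataStore {                      // adapter used by the agents
    void save(String paperId, String xmlMetadata);
    String load(String paperId);
}

interface PersistenceFactory {                 // per-DBMS factory
    MetadataStore createMetadataStore();
}

final class PersistenceFactoryRegistry {       // the "factory of factories"
    private static final Map<String, Supplier<PersistenceFactory>> REGISTRY = new HashMap<>();

    static void register(String dbmsName, Supplier<PersistenceFactory> factory) {
        REGISTRY.put(dbmsName, factory);
    }

    static PersistenceFactory forDbms(String dbmsName) {
        Supplier<PersistenceFactory> s = REGISTRY.get(dbmsName);
        if (s == null) throw new IllegalArgumentException("No adapter for " + dbmsName);
        return s.get();
    }
    // Usage (hypothetical adapter): PersistenceFactoryRegistry.register("PostgreSQL", PostgresFactory::new);
    // MetadataStore store = PersistenceFactoryRegistry.forDbms("PostgreSQL").createMetadataStore();
}
```

Supporting a new DBMS then amounts to registering one more factory, without touching the agent code.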

2.2. The distributed computing agent-based generic framework

The JADE framework (Bellifemine et al., 2007) is a very generic, yet not all-encompassing, tool. As it has all of the essential building blocks for developing distributed MAS, it can easily be used as a basis for building new, more context-specific frameworks. However, because it is not all-encompassing, using it for fast prototyping of very complex distributed systems is not always straightforward. For these reasons, we implemented a computational MAS upon JADE, which embeds the distributed computational model inside the multi-agent one (see Fig. 1). In Fig. 3, a basic collaboration scheme of the main components of the distributed computing framework is presented. It is based on a set of agents which handle the various aspects of a distributed computing framework. At the node level there is a coordinator agent that has the role of controlling the node-level deployment and the run time of computational tasks.


Fig. 3. Basic collaboration among system components.

Fig. 4. Basic class diagram for the coordinator agent.

The coordinator accepts resource requests from the agents which encapsulate these tasks, and then it decides, based upon internal and system-level heuristic components, their execution and/or migration times. The diagram in Fig. 4 also suggests the way in which task deployment inside a node can be customized, namely by extending the Scheduler class and injecting the desired custom behavior. Access to specific areas of the flow of a Scheduler object is guaranteed because it implements the Template Method design pattern (Gamma, Helm, Johnson, & Vlissides, 1995). Presently, there are two implementations for scheduling: one just launches tasks as they appear inside a node (FifoScheduler), and the other has a local load balancing policy which, specifically, tries to avoid overloading the current node (ThresholdScheduler); a sketch of this extension point is given below.
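The following minimal sketch illustrates the extension point just described. The base class and the two subclasses mirror the FifoScheduler/ThresholdScheduler behavior, but every name and signature here is an illustrative assumption rather than the framework's real API.

```java
// Stand-ins for framework types, stubbed for illustration.
interface TaskAgentHandle { void start(); }

final class NodeMonitor {                            // stub for node-load queries
    static double currentCpuLoad() { return 0.42; }  // would query SIGAR in practice
}

abstract class Scheduler {
    // Template method: the control flow is fixed, the decision is a hook.
    public final void onTaskRequest(TaskAgentHandle task) {
        if (acceptNow(task)) {
            launch(task);
        } else {
            defer(task);
        }
    }
    protected abstract boolean acceptNow(TaskAgentHandle task); // customization hook
    protected void launch(TaskAgentHandle task) { task.start(); }
    protected void defer(TaskAgentHandle task) { /* queue locally or request migration */ }
}

// Launches tasks as they appear, in the spirit of FifoScheduler.
class FifoLikeScheduler extends Scheduler {
    @Override protected boolean acceptNow(TaskAgentHandle task) { return true; }
}

// Refuses new tasks above a CPU-load threshold, in the spirit of ThresholdScheduler.
class ThresholdLikeScheduler extends Scheduler {
    private final double maxLoad;
    ThresholdLikeScheduler(double maxLoad) { this.maxLoad = maxLoad; }
    @Override protected boolean acceptNow(TaskAgentHandle task) {
        return NodeMonitor.currentCpuLoad() < maxLoad;
    }
}
```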

The gateway agent has the role of a gateway into the system for task requests issued by human users or any other processes that do not utilize the computational system's internal protocols. For example, there may be an SOAP protocol gateway which handles the deployment of intensive computing tasks on the system; a Java remote method invocation (RMI) gateway is already implemented. It also handles result transmissions and task redeployment in case of a failure. There are a few levels where fault detection is done. The first is given by the JADE AMS (Bellifemine et al., 2007), which will raise an exception; this is treated at the upper levels provided by the presented framework. The main exception concerns a fatal crash of the agent, and it is simply solved by restarting it with the same job. This situation may occur when a communication fault appears, and it is detected thanks to the AMS features. It may also appear when a document processing error (only for PDFs) is raised. This happens especially in cases when the PDF was generated by scanning the original source, or when a nonstandard tool (in many cases a free one) was used to generate it. This last case is not yet completely solved. As for scanned documents, this is a never-ending story, because supplementary OCR detection tools must be applied, and in this case the rate of success can be high but not 100% (Qiu et al., 2016). Therefore, a scanned document will not be analyzed, because one single mismatch in character detection can provide faulty information, so it is better to neglect it. Another possible source of errors is the fact that CLIPS must be compiled for the host OS, and sometimes problems may appear; during the experiments, this type of error did not occur. The Python interpreter can be another source of problems, but no problems have appeared in this area of the implementation either. In any case, because the framework was designed as a general solution for distributed computing, it has tolerance embedded at the design and implementation level, so even catastrophic crashes of some agents will not propagate to nearby ones. These specifications can be seen in Fig. 5, which presents the basic structure of the Java RMI Gateway agent that is already implemented in this iteration of the application. Communication with the agent is done through the IRequestHandler and IClientCallback interfaces, via the Java RMI protocol. It contains a callback manager used to keep references to all of the clients and to quickly identify which tasks belong to which clients in case of task completion or failure events. The gateway agent also acts as a source of processing agents; it is equipped with an agent spawning component needed to implement this feature. It also contains a component (TaskMapper) which keeps track of the progress of the task agents within the system, has the ability to detect when a certain task has crashed and, if the task allows it, can signal the gateway agent so that the task is redeployed. The monitoring agent, presented in Fig. 6, retrieves real-time information regarding the node on which it resides, and then reports it to the interested entities (e.g. the coordinator agent). In its current state, it is used mainly for analyzing the computational load of a node, but it can also be used for node-level fault detection. Its implementation currently uses the Hyperic System Information Gatherer and Reporter (SIGAR) API (Schulz, Blochinger, & Hannak, 2009). Using SIGAR gives this agent the ability to analyze various aspects of the node status. In this context, the term node refers to a computing node from a distributed system that holds a JADE framework and where one or more agents are deployed in order to solve their assigned tasks.


Fig. 5. RMI Gateway agent basic class diagram.

Fig. 6. Basic class diagram for the monitoring agent.

Fig. 7. Basic class diagram for the processing agent.

Because the basic framework was designed as a general framework for distributed computing, the main mechanisms, such as load balancing, were implemented. The implementation of the load balancing algorithm is encapsulated at the agent level, so it can be changed as a particular class of problems is studied. This is a common approach in various Java-based distributed computing frameworks. The structure of the monitoring agent supports both query-report and subscribe-report scenarios. It captures requests for the current state of the system and responds right away. It encloses a subsystem made up of a subscription manager and a couple of behaviors used to store subscription requests and periodically send information to the interested parties. The processing agent is the entity which encapsulates a task inside the system. In order to execute its task, it needs to register with a coordinator agent and wait for specific instructions relative to its next course of action. The current version of the application has a reactive waiting behavior, meaning that the agent does not decide anything until it receives a command to execute, migrate, abort and so forth. This agent's behavior is affected by the global load balancing strategy. As seen in Fig. 7, the processing agent offers features to manage all standard interactions within the framework, such as migration, registration to a node, detection of a node's coordinator, reaction to a coordinator's commands and status reporting. It also holds an internal executor component which handles the actual processing duties, along with the cleanup of the workspace and the compilation of the result. It runs in a separate thread from that of the agent itself and can be customized as needs dictate. The executor component implements the Template Method design pattern: from the implementation point of view, it is just an abstract class, yet all agents that achieve some sort of computational task within the system need to extend that class, as sketched below.
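A minimal sketch of such an executor base class follows; the hook-method names and the surrounding life cycle are assumptions made for illustration, not the framework's actual signatures.

```java
// Executor component as a Template Method abstract class.
abstract class TaskExecutor implements Runnable {

    // Template method: fixed life cycle, customizable steps.
    @Override
    public final void run() {
        try {
            prepareWorkspace();                   // fetch inputs, create temp dirs
            Object raw = execute();               // the actual computational work
            submitResult(compileResult(raw));     // package and report the result
        } finally {
            cleanupWorkspace();                   // always release node resources
        }
    }

    protected abstract void prepareWorkspace();
    protected abstract Object execute();
    protected Object compileResult(Object raw) { return raw; }
    protected abstract void submitResult(Object result);
    protected abstract void cleanupWorkspace();
}
```

A metadata extraction agent, for instance, would supply an execute() that parses one document; the agent then runs the executor on its own thread, e.g. with new Thread(executor).start().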

The profiling agent presented in Fig. 8 gathers information over time from different components of the system (not just from the monitoring agent). After that, it computes and displays (or just exports) statistical information regarding the general evolution of the system. It is not essential for the functioning of the framework. The best place for deploying a profiling agent is an out-of-the-way node, maybe even the node that hosts the Main-Container (master node) of the underlying JADE platform. The profiling agent can discover coordinator and monitor type agents within the system, hook into the JADE management subsystems, acquire information regarding containers, agents, platforms and so forth, and perform different types of analyses upon the collected data. The different analyses can be viewed in a separate Graphical User Interface (GUI). Load balancing agents explore the system and receive information regarding its state, thus deciding the most suitable computing nodes to which any excess tasks can be migrated. The load balancing strategies can be configured to be activated either explicitly by the node's coordinator agent or periodically (if the current strategy permits it).


Fig. 8. Basic class diagram for the profiling agent.

Currently, there are two stable implementations of the load balancing strategies. One is derived from the sender-initiated diffusion algorithm (Willebeek-LeMair & Reeves, 1993) and the other is a variation of the single-objective ant colony optimization solution (Dorigo & Stützle, 2004), based upon the research of Salehi and Denary (2006) and Ali, Belal, and Al-Zoubi (2010). Due to its design, any newly required load balancing method can be added without problems. These load balancing strategies are global and non-preemptive. They are global because any local load balancing decision is made at the coordinator level of the current node. The non-preemptive behavior is given by the limits of the Java virtual machine concerning the migration of running execution threads. Even if this problem were solved by forcing a check-pointing system upon the computational tasks, the agent would require supplementary computing resources. As long as the load imbalance is detected early enough, the costs associated with the termination of a given task and its relaunching on a different node do not significantly affect the makespan of the tasks. Moreover, killing a working thread in the middle of its execution is usually not recommended by the Java programming language community. The computational agent framework also supports XML-based deployment and execution configuration.
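As a rough illustration of the first strategy, the sketch below shows the core decision of a sender-initiated diffusion scheme: an overloaded node compares its load against its neighbours' reported loads and picks the least loaded neighbour as a migration target. The types, thresholds and margin value are illustrative assumptions, not the framework's implementation.

```java
import java.util.Comparator;
import java.util.List;

record NeighbourLoad(String nodeId, double load) {}

final class SenderInitiatedDiffusion {
    private final double overloadThreshold;

    SenderInitiatedDiffusion(double overloadThreshold) {
        this.overloadThreshold = overloadThreshold;
    }

    /** Returns the node to migrate one task to, or null to keep it local. */
    String chooseTarget(double localLoad, List<NeighbourLoad> neighbours) {
        if (localLoad <= overloadThreshold) return null;     // not overloaded
        return neighbours.stream()
                .min(Comparator.comparingDouble(NeighbourLoad::load))
                .filter(n -> n.load() + 0.1 < localLoad)     // migrate only if it helps
                .map(NeighbourLoad::nodeId)
                .orElse(null);
    }
}
```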

2.3. The context-specific framework

The agent-based computational framework described earlier has the main advantage of providing a basis on which computationally heavy applications can be deployed, while still leaving enough room for complex interactions to be developed between the different components of a system. The main agents designed and implemented here solve problems related to information retrieval, metadata extraction, document and metadata repository handling, graph analysis and task management. The role of the information retrieval agent is to communicate with different data sources and to assure the transportation, to a specified destination, of the data that will be processed at a later stage by the system. That destination becomes a document/data repository and can be accessed by other system agents through a data/document repository agent. As long as the agent performs this task and commits to its predefined communication protocols, specific implementations can be adapted to handle any data source and utilize any processing model: sequential, distributed or parallel. In the current version of the application, since there is less emphasis upon the data sources and more emphasis upon the development of the rest of the system, the information retrieval agent is sequential, is launched selectively only when new data is required and deals exclusively with data sources that communicate through the HTTP protocol. The metadata extraction agent directly extends the behavior of the processing agents from the generic computation framework. As a result, it is directly affected by the execution policies of the coordinators assigned to each node of the system, as well as by the global load balancing strategy.

Fig. 9. Interface between facts and objects.

This approach was chosen because the analyzed documents do not generally have the same structure. As expected, due to the various tools used to generate them, PDF documents exhibit different kinds of anomalies compared to the expected standards. The agent, however, uses only the information given by the Apache PDFBox library. From the whole amount of provided data, the agent takes only the information needed in this case (such as the paper metadata and its reference list) and neglects the rest. These anomalies are not visible in the printed format, yet they exist inside the document. As a result, the agent is provided with an inference rule engine and with a few clustering mechanisms. Its execution can sometimes overload the processor and the memory of the host. In the current version of the application, the system utilizes CLIPS as its reasoning engine, through the "CLIPS Java Native Interface" (Riley, 2013). The mapping between facts and objects (and vice versa) is achieved through the usage of custom fact translator objects, which implement a generic interface of the type presented in Fig. 9 (a sketch follows below). Thus, complex facts can be handled with chains of custom fact translator objects. After extracting the metadata from a document, the metadata extraction agent reports its findings to the current metadata repository or to a representative of such an entity, in case it is distributed. There is also the possibility to configure the agent to handle more than one document at a time. The role of the document repository agent is to provide an interface through which other agents inside the platform can access previously stored data from a given node without actually migrating to that node. At the same time, it also provides a mechanism for balancing the load of document queries on a given node and thus manages its bandwidth, especially if the document repository agent is implemented as a self-replicating entity. The latter solution, coupled with selective replication and a metric inside a multi-objective load balancing function, might become worthy of future pursuit. The repository managed by this agent can also contain already generated metadata relative to the documents. In fact, the system can work only with the mentioned metadata and refer to the actual document only when the type of analysis demands it. The current version of the system does not use replication, and the system works starting from the assumption that there is no metadata already present.
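A possible shape for the fact translator interface of Fig. 9 is sketched below. The method names, the string-based fact representation and the example translator are assumptions made for illustration; the actual interface in the system may differ.

```java
// Generic fact/object mapping, chainable via canHandle().
interface FactTranslator<T> {
    String toFact(T object);                 // Java object -> CLIPS fact string
    T fromFact(String fact);                 // CLIPS fact string -> Java object
    boolean canHandle(String factTemplate);  // lets translators be chained
}

// Hypothetical translator for a paper-title fact.
final class TitleFactTranslator implements FactTranslator<String> {
    @Override public String toFact(String title) {
        return "(paper-title \"" + title + "\")";
    }
    @Override public String fromFact(String fact) {
        // strips the leading '(paper-title "' and the trailing '")'
        return fact.substring("(paper-title \"".length(), fact.length() - 2);
    }
    @Override public boolean canHandle(String factTemplate) {
        return factTemplate.startsWith("(paper-title");
    }
}
```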


Fig. 10. Basic collaboration scheme of the graph generation application.

The metadata repository agent is similar to the document repository agent. It provides an interface for accessing a metadata repository, which is currently implemented as an XML database. With each new addition to the metadata database, that information is also used to incrementally generate the final graph representation. The metadata aggregation can be done in a distributed manner. However, as specified when the general method was presented, the final representation will be entirely available on one node for processing. Given a representation of the final organization graph, the graph analysis agent's task is to perform the actual processing upon that representation (e.g. the extraction of a spanning tree). Then it either exports the results in XML format or renders them to a convenient format. The current version supports all the presented options, with possibilities to render the graph as an image (for example, in a format that allows effective zooming into certain sections) or to a GUI. The task agent is simply an interface for configuring and launching an analysis session with custom parameters and, if required, with custom implementations of the aforementioned agents. It acts mostly as a master during the actual computational run. Thus, the user can easily control and monitor the execution of the application.

2.3.1. Collaboration scheme

A graphical representation of the collaboration scheme of the agents can be seen in Fig. 10. The collaboration is indirectly initiated by the task agent. That is the main reason why the agents are represented according to the activities they perform and there is no explicit naming in their collaborations.

2.3.2. Agent framework deployment

Nowadays, transnational corporations map their own WANs over the Internet using various secure communication techniques, such as VPNs. Research is not centralized from a geographical point of view, even if some degree of centralization is required to control it. Each corporation owns one or more research facilities in each country, depending on the magnitude of its economic involvement. At each facility, the researchers have to publish internal research reports regarding their current research. Researchers are also mobile inside and outside the company. Mobility inside the corporation depends mainly on the current project, because in most cases it is impractical to have large research facilities with top-level support for all the research domains of the enterprise. The same pattern of researcher mobility can be found when researchers work outside the corporation. In this case, the best results are published in venues of various degrees of importance, but with Internet-level indexing. The proposed system tries to keep track of all the researchers' published work. To accomplish this task, the system must find and analyze all the work published in public or private places by the researchers.

To do that separately for each researcher would be a waste of resources, and the results may not be as accurate as expected. So the only feasible approach is to analyze all available research papers and reports and to create the authors' hierarchy. There are differences in handling the information retrieval process between the corporation WAN and the Internet. In the case of an enterprise WAN, one must first decide whether the distributed or the centralized approach will be used. If the corporation has a document workflow that covers all its sites, it is enough to use the centralized approach and to do data mining over the internal reports database provided by the workflow system. Otherwise, the agent framework must be deployed at each research facility in order to access the local database that contains the internal research reports. When gathering information from the Internet, the same decision must be taken, but using a different approach. If the corporation considers only a well-defined set of publications to be important, then the agent framework can be installed in only one place within the company; then, the credentials needed to access the publications database must be provided to the system. In the other situation, when the corporation is interested in anything that may be published anywhere by its researchers, the agent framework must be installed at each location of the company, to properly distribute the search effort and thus increase efficiency. After choosing the deployment strategy, the system is operational. There are two phases in its use. The first phase is the initial creation of the metadata database, and it may require some time and a large amount of resources, depending on the dimension of the area where information retrieval must be done. The second phase can be seen as a maintenance one: the system indexes and processes the new data that appears at each point of the surveyed areas. In this phase, the amount of needed resources decreases significantly. Cloud deployment can be chosen by the system engineer for the centralized solution, as it is the most efficient approach economically speaking. In other situations, the inherent distribution will manage the initial overload of the system without difficulty.

2.3.3. Use of the system

The system was designed with the fact in mind that the user is usually a human resources (HR) expert. Consequently, the system provides a text or graphical representation of the relations between various researchers. It can also easily be adapted to automatically provide any performance factors about a researcher. A trivial use of the system may be to automatically assess the performance of university researchers and teaching staff, as in most cases the h-index and ISI impact factor based calculations are commonly used to generate the researcher efficiency score. Due to the use of artificial intelligence techniques, the system can also be modified to support a tailor-made complex set of rules for each institutional process of personnel evaluation.


Thus, the HR personnel will not spend too much time gathering the researcher's scientific efficiency assessment, because the system works in the background and re-computes the performance indexes at a rate that may be established by the user.

2.3.3.1. Author hierarchy

The graph construction algorithm assumes that an organizational dictionary, which maps terms, keywords and phrases to task types, and the document metadata already exist before graph generation is done (Zaharia et al., 2012). For the presented experiments, the "2009 IEEE Taxonomy", which contains a three-level term-family index constructed using the top-most terms found in the "IEEE Thesaurus" (IEEE, 2013), was used. In the general case, the relevant metadata might contain the ID of the document sender, the ID of the document receiver and the type of job the document refers to. In the specific case decided upon in this study, those fields become:

- the name of the author of an article (known as X);
- the name of a referenced author inside the article (known as Y);
- the title of the paper being referenced (known as T).

Thus, the relationship takes the following form: X gives Y a vote for expertise in the domain of C(T), where C(T) (read as "the concept of T") represents the concept or domain that the aggregation of tokens from within T points to. More specifically, C(T) = ⊕_{k=1}^{n} t_k, where n is the number of tokens within T, t_k is the k-th token within T and ⊕ is an aggregation operator over a given set of tokens. In this study, the situations when both X and Y are members of the organization are analyzed. Situations in which Y is not part of the organization may also be relevant, because they show the influence of external experts upon the corporate entity; in this study, however, we ignore those results. In the particular case of the presented experiment, the chosen domains were artificial intelligence and distributed systems. Therefore, the extracted graph (Fig. 13) contains the citation relationships among the authors of the papers in these areas of expertise. Since, in this context, neither X, Y nor T is known beforehand, a strategy for extracting them has been devised using the CLIPS engine embedded in the metadata extraction agent entities. Using a rule engine is necessary because the currently analyzed documents are in PDF format. This means that there is a very high probability that different anomalies appear inside the text embedded in the PDF file at export time. The most important examples of such anomalies are: missing spacing information, wrong characters in different places, and images used instead of blocks of text. In order to extract the text from a PDF file and also to get information regarding its position on the page, the type and size of the used font, the current page number and other supplementary data, a solution based upon the Apache PDFBox library was chosen (The Apache Software Foundation, 2012). More specifically, the selected solution involves extending one of the classes from PDFBox (the PDFTextStripper class) and utilizing one of its internal methods to capture the aforementioned details of each character (or, sometimes, group of characters); a sketch is given below. That information is then passed on to a different module (CharacterGroupClusterer) which, after receiving all of the information, begins to form character clusters based upon the position on the page of each character/group and the dominant font type. Each character cluster is analyzed, and the resulting facts are then fed into the CLIPS engine, which was loaded with a set of custom rules manually created after the examination of many documents. For example, a simple rule to identify whether a chunk is the title of a paper or not (with the journal ID being equal to 10) is presented in Fig. 11.
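The sketch below shows how this capture step can be realized with the Apache PDFBox 2.x API: PDFTextStripper and TextPosition are real PDFBox classes, and writeString() is the internal method being overridden, while the CharacterGroupClusterer it feeds is the paper's module, stubbed here as an assumption.

```java
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.text.PDFTextStripper;
import org.apache.pdfbox.text.TextPosition;

import java.io.File;
import java.io.IOException;
import java.util.List;

public class CapturingStripper extends PDFTextStripper {

    public CapturingStripper() throws IOException { super(); }

    // PDFBox calls this for each chunk of characters; we capture position,
    // font and page before the text is flattened into a plain string.
    @Override
    protected void writeString(String text, List<TextPosition> positions) throws IOException {
        for (TextPosition p : positions) {
            CharacterGroupClusterer.offer(
                p.getUnicode(),          // the character(s)
                p.getXDirAdj(),          // x position on the page
                p.getYDirAdj(),          // y position on the page
                p.getFont().getName(),   // font name
                p.getFontSizeInPt(),     // font size
                getCurrentPageNo());     // page number
        }
        super.writeString(text, positions);
    }

    public static void main(String[] args) throws IOException {
        try (PDDocument doc = PDDocument.load(new File(args[0]))) {
            CapturingStripper stripper = new CapturingStripper();
            stripper.setSortByPosition(true);
            stripper.getText(doc);   // drives the writeString() callbacks
        }
    }
}

// Stub standing in for the paper's CharacterGroupClusterer module.
class CharacterGroupClusterer {
    static void offer(String ch, float x, float y, String font, float size, int page) {
        /* cluster by position and dominant font, then emit CLIPS facts */
    }
}
```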

Fig. 11. Example of a used CLIPS rule.

2.3.3.2. Graph generation

As can be observed in Fig. 12, the idea of graph generation is not complicated. In the beginning (the information retrieval phase), the agents connect to any available (or pre-established) source of documents and use the key terms that specify the domain of interest to retrieve only the documents relevant to the experiment. In the second phase (research information and relationship extraction), each document is taken and filtered, and the required data is extracted. In the current research, the tests were done over PDF documents because their variable structure presents the most complex problems that must be handled by the system. After the metadata is extracted from each document, it is written into a common XML file. When this process is done, the system enters the third phase, graph generation. Here, all the separate relations among authors, filtered using the pre-established keywords, are merged into a large graph. Its size may vary but, when the analysis also involves the Internet, it may be very large (thousands of nodes). If a full search into a domain that accesses all the main journal websites is done, the number of graph nodes can be even higher. This is normal, because each author is represented by a graph node. After the internal graph representation is created, a dedicated library is used to display the graph. The section of a graph presented in Fig. 13 is derived from a collection of 130 free documents stored in PDF format. They had been classified by online search engines, prior to graph generation, as being on the subjects of "Artificial intelligence" and "Distributed computing." For the sake of simplicity, only the nodes which have been referenced at least once, ignoring self-referencing, have been taken into consideration. The Java Universal Network/Graph Framework (JUNG) has been used both for rendering the graph onto a Java JFrame component and for exporting it into a high-resolution image file (JUNG, 2010). The diameter of the circle representing a vertex is proportional to its in-degree. After generating the full graph, a greedy version of the BFS algorithm (Kozen, 1992) was used to generate a spanning tree; this makes the "organization's" hierarchy easier to observe. In contrast with the classical version of the BFS algorithm, this version does not introduce a node's children into the exploration queue in the order in which it iterates through them, but in decreasing order of their number of references to their parent (a sketch is given below). Therefore, any child node with more references to the current node will be expanded earlier than the others. This ensures, in the case of multiple references to the same node, that the nodes which consider the current node as being more "influential" are added into the hierarchy sooner rather than later. Also, this strategy keeps a balance between breadth-heavy and depth-heavy results. The root node of the tree was selected as the most "influential" one, i.e. the node that has been referenced the most. Supplementary tests have been made on the nodes that have an in-degree in the vicinity of the selected root node but, at least on this document sample, the results were mostly sub-trees of the selected root node.
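A compact sketch of this greedy BFS variant is given below. The graph encoding (a map from each author to the authors citing him/her, with reference counts) is an assumption made for illustration.

```java
import java.util.*;

final class GreedySpanningTree {

    /** refCounts.get(a).get(b) = number of times author b references author a. */
    static Map<String, List<String>> build(Map<String, Map<String, Integer>> refCounts,
                                           String root) {
        Map<String, List<String>> tree = new HashMap<>();
        Set<String> visited = new HashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        visited.add(root);
        queue.add(root);

        while (!queue.isEmpty()) {
            String parent = queue.poll();
            // Children enqueued in decreasing order of references to the parent,
            // so the most "influential" links enter the hierarchy first.
            List<String> children =
                new ArrayList<>(refCounts.getOrDefault(parent, Map.of()).keySet());
            children.removeIf(visited::contains);
            children.sort(Comparator.comparingInt(
                    (String c) -> refCounts.get(parent).get(c)).reversed());
            tree.put(parent, children);
            for (String c : children) {
                visited.add(c);
                queue.add(c);
            }
        }
        return tree;
    }
}
```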


Fig. 12. The graph generation algorithm.

Fig. 13. A section of the extracted graph.

For the sake of supplying a decent overview of the tree's overall shape without too much information cluttering the image in Fig. 14, we only included the names of the vertexes that have an in-degree of at least 10 (i.e. have been referenced at least 10 times) in the complete graph. Here, unlike in the example showing a section of the complete graph, the diameter of the circle representing a vertex is not meant to be proportional to its in-degree. In Fig. 14, the root vertex's circle is darker so that it can easily be identified. The system's scalability is the one provided by JADE and is thus good (Bellifemine et al., 2007); this was one of the reasons for choosing it as the base for our framework's development. The centralized approach was chosen in the first stages of development because it was necessary to see whether the proposed approach is feasible. Because the system cannot use a filtered search for a given researcher from the beginning, the initial computing effort is high. In the next development stages, however, the fully distributed approach is required, because the system must continuously improve the complex citation-based graph as the researchers continue to publish, and the area of document search can be increased. The current design may still be used but, due to the inherent problems of a partially centralized approach (e.g. a bottleneck at the main node) and also due to the inherent geographical distribution of the sources, it is better to use a distributed one.


Fig. 14. The spanning tree of the graph with “Y. Moses” as the root node.

will consist in the introduction of the geographically distributed concentrator node (a solution similar with the banking system) where partial sub-graphs will be maintained. As a result, the computing effort is distributed over the tree leafs. Even if (from fault tolerance reasons) more master nodes may exist, from the algorithm point of view it is enough. In this case the final graph must be replicated on each master computing node. The overload problems are not so important because the main computing effort is to retrieve, filter and organize the information not to assemble into a final graph (this is the main operation done at the master level). The load balancing will also be done at the intermediary nodes of the tree (where the information retrieval process id ongoing). Of course at the central nodes level a minimal load balancing is assured, too. It will be only in terms of choosing the sources for each node that controls the retrieval process in order to avoid duplication of work. 2.3.4. Future developments The presented results are encouraging but at this moment only the aspects regarding the influence (measured by citation) is automatically extracted (using information retrieval techniques). To become a fully functional tool for an HR expert, some features must be added. A new class of agents must be developed in order to retrieve and analyze from Internet all information regarding Hirsh index and any supplementary scientometric indicators (Vinkler, 2010). When this phase is accomplished some charts regarding the geographical places and the researchers from high rated university (e.g. from top 100) that cite a researcher must be generated. Therefore the HR will have the information presented into a synthetic and quantified way that is more appropriate to its way of doing things. To have a global image regarding the influence of an expert, its managerial aptitudes may also be taken into account. In these conditions another improvement at the software level is to create agents that retrieve information regarding the researcher grants. Some parameters regarding the amount of money, the starting date, the contracting unity and its role in science development are critical. When complex statistic analysis that involve all mentioned parameters are done some supplementary evaluation rules must be used. Unfortunately, the IT cannot fully replace the human at this level but tools such as the presented one may improve his effi-

2.3.4. Future developments

The presented results are encouraging, but at this moment only the aspects regarding influence (measured by citations) are extracted automatically, using information retrieval techniques. To become a fully functional tool for an HR expert, some features must be added. A new class of agents must be developed in order to retrieve from the Internet and analyze all the information regarding the Hirsch index and any supplementary scientometric indicators (Vinkler, 2010). When this phase is accomplished, charts must be generated showing the geographical distribution of the researchers who cite a given researcher and highlighting those affiliated with highly rated universities (e.g. from the top 100). The HR expert will thus have the information presented in a synthetic, quantified way that is better suited to his or her working style.

To obtain a global image of an expert's influence, his or her managerial aptitudes may also be taken into account. Under these conditions, another improvement at the software level is to create agents that retrieve information regarding the researcher's grants. Parameters such as the amount of money, the starting date, the contracting unit and the grant's role in the development of the field are critical. When complex statistical analyses involving all the mentioned parameters are performed, supplementary evaluation rules must be used. Unfortunately, IT cannot fully replace the human expert at this level, but tools such as the one presented here may improve the expert's efficiency and significantly decrease the evaluation time. Two sets of rules are involved here: one is given by the expert based on his or her experience, and the other consists of the organizational rules. Extracting rules from an expert and using them in an inference engine is a common task nowadays (Kidd, 1987). Due to the personal nature of the former, the tool must give the expert the possibility of defining and modifying the rule set used. The same liberty must also be granted with respect to the organizational rules because, just as the organization is in continuous change, so are the rules used for evaluation.

Another feature concerns papers with paid access. In this case there is no problem, from a basic software engineering point of view, in designing and implementing an agent able to handle any type of secure login (based on CA certificates, IP lists or login information) (Davies, 2011). A further problem in using this tool is that it may overload a journal website with agent requests and, in some cases, automatic measures exist that prevent this. There are two ways of solving this problem. If the journal is of the open access type, the agent may be instructed to wait as long as necessary until access is granted again. In the case of a paid journal, the situation is solved by direct negotiation with the journal regarding the accepted number of simultaneous requests. These problems are similar to the ones faced by plagiarism detection solutions (Luparenko et al., 2014), so there is enough expertise, at both the legal and the IT level, to solve them. In the long term the implications of this software may be significant, because nowadays this type of complex evaluation of a researcher is mainly performed by other experts (usually highly recognized in their organizations and in their fields of expertise). As previously mentioned, various tools on the market try to help these experts in their work but, due to the complexity of the problem, the final decision remains in their hands. Because the information retrieved from the documents is text based, as the efficiency of PDF extraction APIs improves, some supplementary metadata-extraction steps executed now at the agent level may become unnecessary.
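As a purely illustrative sketch, assuming Apache PDFBox 2.x (cited in the references) as the extraction API, the text-retrieval step on which the agent-level metadata heuristics run could look as follows; the helper class name is our own.

    import java.io.File;
    import java.io.IOException;

    import org.apache.pdfbox.pdmodel.PDDocument;
    import org.apache.pdfbox.text.PDFTextStripper;

    // Hypothetical helper: extracts the raw text of a paper so that the
    // agent-level heuristics (title, authors, references) can run on it.
    public class PaperTextExtractor {

        public static String extract(File pdf) throws IOException {
            try (PDDocument doc = PDDocument.load(pdf)) {
                PDFTextStripper stripper = new PDFTextStripper();
                stripper.setSortByPosition(true); // keep a stable reading order
                return stripper.getText(doc);
            }
        }
    }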
3. Conclusions

In this paper, an intelligent agent based approach aimed at identifying scientific stakeholders was presented. The instrument was designed to be used by HRM staff not only in their daily activities but also when major organizational changes are initiated. If the system receives large amounts of information as input, it has enough data to offer a good estimation of the existing relations among researchers and of the importance of these relations. The system also makes it possible to analyze internal research papers whose access is restricted to various organizational levels. This gives a more accurate evaluation of a researcher than the use of external tools, which have no right to access the internal company data flow. Moreover, the agent-based approach makes a security breach between the agents that crawl the Internet and the internal ones less probable. In order to test the system, a set of published papers available on the Internet was used. The experiment proved that there are, indeed, well defined relationships between researchers and, thus, the results can be used to design a better system for the evaluation and classification of researchers. The use of an automated approach can also decrease the inherent organizational frictions that may appear during the process. The costs of implementing the system are reasonable and remain so if cloud deployment is chosen.


The framework developed using JADE was designed to have a high level of generality, so it can be used in various situations related to information retrieval simply by changing the working agent core and roles. This is proven by the fact that an agent system is easily adaptable to new tasks simply by adding new dedicated or general agents to it. At the same time, the scalability of the system can be improved in relation to a given set of requirements, because the algorithms used in load balancing can easily be replaced. The usage of artificial intelligence tools (such as CLIPS) within the system can also be extended, because the framework allows the creation of new agents with custom embedded solutions. The system may be used in the automatic assessment of research performance in public or private institutions, and it can easily be modified to support complex sets of rules depending on each user's needs in personnel evaluation.

References

Ali, A.-D., Belal, M. A., & Al-Zoubi, M. B. (2010). Load balancing of distributed systems based on multiple ant colonies optimization. American Journal of Applied Sciences, 7, 428–433.
Aljaber, B., Stokes, N., Bailey, J., & Pei, J. (2010). Document clustering of scientific texts using citation contexts. Information Retrieval, 13(2), 101–131.
Bellifemine, F., Caire, G., & Greenwood, D. (2007). Developing multi-agent systems with JADE. Chippenham, Wiltshire: John Wiley & Sons Inc.
Bryson, J. M. (2004). What to do when stakeholders matter: Stakeholder identification and analysis techniques. Public Management Review, 6(1), 21–53.
Cribbin, T. F. (2011). Citation chain aggregation: An interaction model to support citation cycling. In Proceedings of the 20th ACM international conference on information and knowledge management (CIKM '11) (pp. 2149–2152). New York: ACM.
Davenport, T. H., & Short, J. E. (1990). The new industrial engineering: Information technology and business process redesign. MIT Sloan Management Review, 31(4), 1–31.
Davies, J. (2011). Implementing SSL/TLS using cryptography and PKI (1st ed.). London: Wiley.
Delias, P., Doulamis, A., & Matsatsinis, N. (2011). What agents can do in workflow management systems. Artificial Intelligence Review, 35(2), 155–189.
Dorigo, M., & Stützle, T. (2004). Ant colony optimization. Cambridge, MA: MIT Press.
Friedman, A. A. (2006). Stakeholders: Theory and practice. New York: Oxford University Press Inc.
Friedman, A., & Miles, S. (2001). Developing stakeholder theory. Journal of Management Studies, 39(1), 1–21.
Gamma, E., Helm, R., Johnson, R., & Vlissides, J. (1995). Design patterns: Elements of reusable object-oriented software. Boston: Addison-Wesley Longman Publishing Co.
Godlewska, M. (2012). Agent system for managing distributed mobile interactive documents. In Transactions on computational collective intelligence VI (pp. 121–145). Springer.
Gorodetskii, V. I. (2012). Self-organization and multiagent systems: II. Applications and the development technology. Journal of Computer and Systems Sciences International, 51(3), 391–409.
Guo, L., Robertson, D., & Chen-Burger, Y.-H. (2008). Using multi-agent platform for pure decentralised business workflows. Web Intelligence and Agent Systems, 6(3), 295–311.
Hammer, M., & Champy, J. A. (1993). Reengineering the corporation: A manifesto for business revolution. New York: Harper Business Books.
Harzing, A.-W. (2016, September 17). Publish or Perish. Retrieved from Harzing.com: http://www.harzing.com/resources/publish-or-perish.
Hoxha, J., & Agarwal, S. (2010). Semi-automatic acquisition of semantic descriptions of processes in the web. In Proceedings of the 2010 IEEE/WIC/ACM international conference on web intelligence and intelligent agent technology - Volume 01 (WI-IAT '10) (pp. 256–263). Washington, DC: IEEE Computer Society.
IEEE (2013, January 1). IEEE Thesaurus. Retrieved May 13, 2013, from http://www.ieee.org/publications_standards/publications/services/thesaurus2.html.
Imler, T. D., Morea, J., & Imperiale, T. F. (2014). Clinical decision support with natural language processing facilitates determination of colonoscopy surveillance intervals. Clinical Gastroenterology and Hepatology, 12, 1130–1136.
JUNG (2010, January 24). JUNG - Java Universal Network/Graph Framework. Retrieved May 15, 2013, from http://jung.sourceforge.net/index.html.
Ketter, W., Collins, J., Gini, M., Gupta, A., & Schrater, P. (2012). Real-time tactical and strategic sales management for intelligent agents guided by economic regimes. Information Systems Research, 23(4), 1263–1283.
Kidd, A. (1987). Knowledge acquisition for expert systems. New York: Springer US.
Kishore, R., Zhang, H., & Ramesh, R. (2006). Enterprise integration using the agent paradigm: Foundations of multi-agent-based integrative business information systems. Decision Support Systems, 42(1), 48–78.


Kozen, D. C. (1992). Depth-first and breadth-first search. In D. C. Kozen (Ed.), The design and analysis of algorithms (pp. 19–24). New York: Springer.
Lapointe, L., Mignerat, M., & Vedel, I. (2010). The IT productivity paradox in health: A stakeholder's perspective. International Journal of Medical Informatics, 80, 102–115.
Luparenko, L., Ermolayev, V., Mayr, C. H., Nikitchenko, M., Spivakovsky, A., & Zholtkevych, G. (2014). Plagiarism detection tools for scientific e-journals publishing. In V. Ermolayev, H. C. Mayr, M. Nikitchenko, A. Spivakovsky, & G. Zholtkevych (Eds.), Information and communication technologies in education, research, and industrial applications (pp. 362–370). Cham: Springer Intl.
MarkLogic (2016, October 8). Extracting metadata and text from binary documents. Retrieved from docs.marklogic.com: https://docs.marklogic.com/guide/search-dev/binary-document-metadata.
Mitchell, R. K., Agle, B. R., & Wood, D. (1997). Toward a theory of stakeholder identification and salience: Defining the principle of who and what really counts. Academy of Management Review, 22(4), 853–888.
Moradian, E. (2008). Integrating web services and intelligent agents in supply chain for securing sensitive messages. In Proceedings of the 12th international conference on knowledge-based intelligent information and engineering systems, Part III (KES '08) (pp. 771–778). Berlin, Heidelberg: Springer-Verlag.
Morlacchi, P., Wilkinson, I. F., & Young, L. C. (2005). Social networks of researchers in business-to-business marketing: A case study of the IMP Group 1984–1999. Journal of Business-to-Business Marketing, 12(1), 3–34.
Newman, M. E. (2001a). Scientific collaboration networks. I. Network construction and fundamental results. Physical Review E, 64, 016131.
Newman, M. E. (2001b). Scientific collaboration networks. II. Shortest paths, weighted networks, and centrality. Physical Review E, 64, 016132.
Priem, J., Taraborelli, D., Groth, P., & Neylon, C. (2010, October 26). Altmetrics: A manifesto. Retrieved from altmetrics.org: http://altmetrics.org/manifesto/.
Qiu, Q., Li, Z., Ahmed, K., Liu, W., Habib, S. F., & Li, H. (2016). A neuromorphic architecture for context aware text image recognition. Journal of Signal Processing Systems, 84, 355–369.
Riley, G. (2013, January 25). CLIPS: A tool for building expert systems. Retrieved May 16, 2013, from http://clipsrules.sourceforge.net/.
Roberts, R. W., & Mahoney, L. (2004). Stakeholder conceptions of the corporation: Their meaning and influence in accounting research. Business Ethics Quarterly, 14(3), 399–431.
Salehi, M. A., & Deldari, H. (2006). Grid load balancing using an echo system of intelligent ants. In Proceedings of the IASTED international conference on parallel and distributed computing and networks (pp. 47–52). Innsbruck: ACTA Press.
Schulz, S., Blochinger, W., & Hannak, H. (2009). Capability-aware information aggregation in peer-to-peer grids. Journal of Grid Computing, 7, 135–167.
Stutenbäumer, T. (2016). Practical guide to ABAP. Gleichen: Espresso Tutorials GmbH.
Terziovski, M., Fitzpatrick, P., & O'Neill, P. (2003). Successful predictors of business process reengineering (BPR). International Journal of Production Economics, 84, 35–50.
The Apache Software Foundation (2012, June 11). Apache PDFBox - A Java PDF library. Retrieved May 14, 2013, from http://pdfbox.apache.org/.
Tong, T., Dinakarpandian, D., & Lee, Y. (2009). Literature clustering using citation semantics. In Proceedings of the 42nd Hawaii international conference on system sciences (pp. 1–10). Washington: IEEE.
Turner, J. R., Grude, K. V., & Thurloway, L. (1999). The project manager as change agent: Leadership, influence and negotiation. New York: McGraw-Hill Book Co Ltd.
Vinkler, P. (2010). The evaluation of research by scientometric indicators (1st ed.). Oxford: Chandos Publishing.
Warkentin, M., Sugumaran, V., & Sainsbury, R. (2012). The role of intelligent agents and data mining in electronic partnership management. Expert Systems with Applications, 39(18), 13277–13288.
Weiss, G. (2013). Multiagent systems (2nd ed.). Cambridge: MIT Press.
Willebeek-LeMair, M. H., & Reeves, A. P. (1993). Strategies for dynamic load balancing on highly parallel computers. IEEE Transactions on Parallel and Distributed Systems, 4(9), 979–993.
Yang, L., Gen, X., & Cao, X. (2010). Application research of enterprise information integration based on intelligent agent on the CIAgent platform. In Education technology and computer science (ETCS) (pp. 531–534). Washington, DC: IEEE Society Press.
Yu, E. S., Mylopoulos, J., & Lespérance, Y. (1996). AI models for business process reengineering. IEEE Expert: Intelligent Systems and Their Applications, 11(4), 16–23.
Zaharia, M. H. (2014). Generalized demand-driven web services. In Z. Sun & J. Yearwood (Eds.), Demand-driven web services (pp. 102–132). IGI Global.
Zaharia, M. H., Hodorogea, A., & Atanasiu, G. M. (2012). Automatic extraction of the real organizational hierarchy using JADE. In Advances in intelligent systems and computing: Vol. 171 (pp. 185–196). Heidelberg: Springer Verlag.
Zamudio, V., Zheng, P., & Callaghan, V. (2012). Intelligent business process engineering: An agent based model for understanding and managing business change. In Proceedings of the 2012 eighth international conference on intelligent environments (IE '12) (pp. 141–148). Washington: IEEE Computer Society.
Zhao, Z., & Hou, J. (2010). The study of influencing factors of team brainstorming effectiveness. International Journal of Business Management, 5(1), 181–184.