Computers in Industry 102 (2018) 50–61
Computers in Industry journal homepage: www.elsevier.com/locate/compind
A cloud-based manufacturing control system with data integration from multiple autonomous agents
Silviu Răileanu*, Florin Anton, Theodor Borangiu, Silvia Anton, Maximilian Nicolae
Dept. of Automation and Applied Informatics, University Politehnica of Bucharest, Romania
ARTICLE INFO
Article history: Received 22 September 2017. Received in revised form 8 June 2018. Accepted 27 August 2018. Available online xxx.
Keywords: Cloud manufacturing; Private cloud; MES; High availability; Distributed intelligence; Mobile/intelligent devices; Agentification
ABSTRACT
The paper describes a semi-heterarchical manufacturing control solution based on a private cloud infrastructure which collects data in real-time from intelligent devices associated to shop-floor entities. The entities consist of industrial resources and mobile devices embedding the work in process on products during their manufacturing cycle. The proposed control system is developed using a common database in the cloud which handles operation synchronization and production control logic. The database component tables and the update processes are described in the article. The main functionalities of the control system are: manufacturing system configuration, control, monitoring, and optimization, and storage of historic data. An implementation framework and experimental results for the evaluation of consumed energy are reported. © 2018 Published by Elsevier B.V.
* Corresponding author.
E-mail addresses: [email protected] (S. Răileanu), [email protected] (F. Anton), [email protected] (T. Borangiu), [email protected] (S. Anton), [email protected] (M. Nicolae).
https://doi.org/10.1016/j.compind.2018.08.004
0166-3615/© 2018 Published by Elsevier B.V.
1. Introduction
Control paradigms for the manufacturing domain have evolved over time from centralized to decentralized or semi-heterarchical [1]. This evolution was mainly driven by new trends in information and communication technology (ICT) such as mobility, connectivity, increased decisional capabilities, service orientation and, more recently, the use of cloud infrastructures to host control applications, run intensive optimization procedures and store large amounts of production and resource data [2]. This is how the concept of cloud manufacturing (CMfg) arose [3]; its scope is to handle manufacturing resources and processes according to the quality of services (QoS) they provide, and thus to operate and manage them in a unified manner that provides efficiency, high availability, and flexibility [4] in realizing customer orders in both small batches and mass production.
At the manufacturing execution system (MES) level, cloud computing refers mainly to the virtualization of applications such as mixed batch planning, job scheduling and resource allocation, or product traceability and production tracking [5]. While MES implementations differ and depend on the physical infrastructure, the generic MES functions are standardized by the ISA-95.03 specification [6]. ISA-95 defines five levels for the hierarchical organization of a production enterprise, as follows: i) levels 0–2 represent the process control layer together with its associated intelligent devices and control system, ii) level 3 represents the manufacturing operating layer and performs tasks such as scheduling, quality management and production tracking, among others, and iii) level 4 is in charge of managing the business-related activities of the enterprise.
In conventional production structures with multiple workstations which allow job-shop fabrication scenarios, there are several physical control and computing entities representing single points of failure (SPoF) that can be avoided only by hardware redundancy. Such entities usually perform centralized tasks: the system scheduler doing high performance computing (HPC) tasks, the product router (the PLC responsible for routing products on pallets throughout the cell towards assigned resources), and the central user application for production tracking and resource monitoring. Private cloud platforms integrated as IaaS (Infrastructure as a Service) in the MES layer address the high availability (HA) and intensive computation requirements through resource virtualization techniques and the provisioning of computing resources (CPU, storage, I/O) and applications. Also, the system scheduler (SS) implemented in the cloud communicates with the distributed MES (dMES) in which shop floor resources and product execution orders are agentified; these agents cooperate in a multi-agent framework (MAS) [7,8], both with the SS to receive scheduled orders and report their execution, and among themselves at dMES level to reach a common goal (e.g. best job assignment) [9].
Secure communication protocols must be used in such cloud-based CMfg systems in order to access real-time production data and resource status information from agentified devices: i) mobile intelligent embedded devices (IED) containing both the order execution data (scheduled operations for product execution and the assigned resource for each operation) and work in process (WIP) data over the product's execution lifecycle; ii) stationary computing devices (local workstations) providing resource status data and receiving from the SS the application programs (CNC, robot) for operations on products assigned to resources [10,11].
The paper is structured in six sections: Section 2 describes the structure of the dual SS-dMES control architecture; Section 3 describes the dynamic model of the control architecture; Section 4 describes the cloud infrastructure which hosts the centralized system scheduler and the integration principles; Section 5 presents an experimental case study, and the last section is devoted to conclusions.
The proposed architecture has the following key elements: WIP follow-up (augmented with decisional capabilities), resource monitoring and centralized control (according to product recipes). The partitioning of the control entities is inspired by the PROSA reference architecture, but the reported research focuses on shop floor data acquisition, transfer and storage in the cloud database and on processes that are performed in the cloud [12].
2. The control architecture
The proposed semi-heterarchical control architecture performs the following tasks: i) resource team configuration, batch planning, product scheduling, resource allocation [13], and cell and production monitoring, which are done for the received batch orders on the upper SS level running in the cloud infrastructure, and ii) coordination of the execution of batch orders and rescheduling of orders in current execution, which are done on the lower dMES level.
There is an information flow from the lower dMES level to the upper SS level about the occurrence of special events such as resource breakdown / recovery, rush orders or local part storage depletion, for recording and decision making. The choice of having centralized batch planning optimization and decentralized production control (which may override the centralized optimization) is justified by the availability of cloud HPC to run complex optimization programs over the farthest (batch) horizon, while robustness and agility to changes are assured. This dual semi-heterarchical control system is able to react quickly to unexpected events through intelligence distribution, agentification, and decentralization of the control [14,15]. The semi-heterarchical control architecture can deal rapidly and locally with orders in current execution, while computing in parallel, at the upper SS level, a new optimized schedule for the remaining orders waiting to be processed. This dual behaviour reduces the myopia of the system at global batch level and preserves the system's agility.
The semi-heterarchical control solution is designed around a common database containing four types of information distributed in 10 tables: i) shop-floor level information (1 - available resources, 2 - feasible operations, 3 - products that can be manufactured, 4 - distribution of operations on resources, 5 - operations needed to obtain products); ii) client information (6 - clients' identification and 7 - placed orders); iii) information resulting from planning and scheduling (8 - sequence of orders, sequence of operations, the resources processing the operations, and the specific time duration); and iv) information needed to communicate between the web interface and the shop-floor level (9 - operation execution reports and 10 - synchronization data). The composing entities are instrumented, and the categories of information defined above are updated and processed as depicted in Fig. 1.
A) Orders, represented by WIP data [16], are instrumented with IEDs that are able to run a multi-agent system and communicate over wireless networks. From an informational point of view the WIP is represented by an order agent. Each order agent sends the order and operation status to the database located in the cloud. It receives from the cloud the product type and the operations to perform according to a certain control strategy (hierarchical / heterarchical); depending on the recommended control strategy it might also receive an optimal resource allocation.
B) Processing resources (robots, CNC machines) have computing terminals (PCs) attached to their controllers. The PCs run a multi-agent system used to integrate the resources with the mobile IEDs (located on pallet carriers) embedding WIP, and with the cloud system. From an informational point of view the resources are represented by resource agents that run on the PCs and send the periodically updated resource status to the database located in the cloud. Resource and order agents are located on the same network and exchange execution information (assign operations to resources, request execution of operations, receive acknowledgements from resources).
C) Transporting resources are represented by the devices of the conveyor (track motors, lifts, diverting elements, etc.) controlled by a PLC which is accessed through a TCP / OPC bridge. This type of resource communicates only with the mobile order agents located on the IED of each pallet where a product will be progressively manufactured by the visited resources, and fulfils transport services (e.g., an order agent requests transport to a specified resource, a transport resource realizes the transport operations, and an acknowledgement is issued upon operation completion).
D) The cloud infrastructure hosts a set of machines running the database, the optimization engine, the web server, and the agent used to synchronize the data from the database with the optimization engine and with the shop-floor level.
To conclude the description of the proposed control architecture, the decision-making process is realized in a decentralized manner using autonomous agents (mobile for orders, and fixed for resources). These agents are developed using a MAS platform. The objective of the research is the integration of the resources, orders and cloud infrastructure using the MAS platform, which performs the communication through messages and assures integrity and reliability.
Security is assured through a private communication network which is separated from the Internet. The shop floor architecture uses Wi-Fi communication between intelligent products, resources and the Cloud production scheduler, so it must address security requirements such as unauthorised access to information, theft of proprietary information, denial of service and impersonation. To this end, the proposed control architecture uses a policy-based mechanism to handle transport security by introducing a real-time Public-Key Infrastructure (PKI) platform which uses Certification Authorities to generate certificates on the fly and secure socket communication. The PKI implementation is based on OpenSSL, an Open Source toolkit implementing the Secure Sockets Layer SSL (v2/v3) and Transport Layer Security TLS (v1) protocols as well as a full-strength general-purpose cryptography library. The internal JADE agent architecture is integrated with OpenSSL. Additionally, a document-level encryption and signing mechanism is introduced as a component of the policy for all MES messages exchanged between intelligent products, sensors, shop floor resources and different MES components.
3. Database structure and updating process
The database tables, their structure, the description of fields and how they are updated are presented below (Fig. 2). The update process is detailed in the processes associated with each table definition.
1. Resources table (resources_table) – contains information about feasible resources, their description, and current state.
Fig. 1. The 2-layer generic model for cloud-based manufacturing control describing the processes of: i) updating a resource’s state and energy consumption, its associated operation execution time and energy consumption (transparent arrow); ii) updating order location, current operation and resource failure in case of resource agent breakdown; and iii) process optimization consisting of: querying the resource state, updating the optimization model, running optimization algorithm, comparison with the current situation, and updating planning and scheduling.
Processes associated with the resources table: Resource state update: a JADE [17] agent executing on the resource PC polls the physical resource continuously (every second) and, in case of change, updates the state field of the associated resource. If the resource agent goes offline and an order agent tries to obtain a service from it, the order agent will update the resource state to offline.
2. Operations table (operations_table) – contains a description of the operations that can be realized by the system. This table is filled offline by the operator in the design stage of production.
3. Operations on resources table (op_on_res_table) – links the resources with the operations they can perform, together with information collected after production such as processing time and energy consumed. After an operation is realized a new record is added to this table.
Processes associated with the operations on resources table: Execution time and energy utilization update: the resource agent receives a request for the execution of an operation; this request is forwarded to the resource controller while the agent starts a timer and reads the current energy consumption of the resource. When the resource controller signals that the operation has ended, the agent stops the timer in order to obtain the execution time and reads the current energy consumption again; by subtracting the initial energy value, the energy consumed during the operation is computed. Both values are used to update the database through the JDBC (Java Database Connectivity) connector. This information is stored in the operations on resources table for the centralized optimization model and in the operations traceability table to obtain an overview of each product's execution history. In the operations on resources table, operation and resource IDs are used to update a fixed record (only the last changes are stored), while in the operations traceability table all changes (execution times and energy consumed for a given product) are stored.
Adjusting production planning and resource scheduling in real-time. By collecting in real-time the resources' state and quality of services (QoS - how operations have been done), the optimization model can be re-run in order to reduce manufacturing costs (the optimization function). In this research, the cost is composed of the total production time (makespan) and the energy consumed. Since energy is influenced by the makespan (energy = power × time) [18], tests were carried out in the experimental part for the instantaneous power and energy update processes. The information mentioned before is captured in the "Operations on resources" and "Resources" tables, which serve as input for the optimization engine. Based on the information stored in these two tables, the events that can alter the computed schedule fall into two categories: hard changes and soft changes. Hard changes are events which alter the production in a major way. This category contains resource/operation failures which cannot be overcome without batch rescheduling in order to eliminate unavailable resources. The second category, soft changes, consists of small variations of the registered QoS parameters which alter the computed schedule but do not make it infeasible (e.g., an increase in the execution time generates a longer makespan, but the operation schedule can be maintained). With these considerations special events are defined beforehand
Fig. 2. The relations between database tables in the Cloud MES database.
and identified during production by the execution agents. By continuously monitoring the production database the MES is able to trigger the planning and rescheduling process whenever the initial conditions are no longer satisfied. Variances of the operation start and execution time or of the energy consumption (soft changes) are the most common unforeseen shop floor events during the execution of a fixed batch; they are likely to appear more frequently than hard changes (resource breakdowns) [9]. Hard changes can occur at any time (with low frequency) and they invalidate the existing operation scheduling since the affected resource/operation is no longer available (Fig. 3). In this case it is mandatory to reschedule operations both at packet level (orders currently in execution) and at batch level (orders waiting to be inserted). The rescheduling of orders in execution is started at event detection by the multi-agent dMES handling the decentralized operating mode. Orders waiting to be inserted will be centrally processed in the cloud by the optimization engine using the MES database as input (Fig. 4). Soft changes are registered after each operation termination for any of the products in simultaneous execution. They may change the result of the optimized scheduling, but not significantly since resources/operations are still available; only when the increase of
the operation execution time or energy consumption exceeds a predefined threshold thr (experimentally computed), this change at operation level will be considered as a hard change causing rescheduling. The predefined threshold divides the soft changes into two categories: minor and major changes. Both are recorded but only major changes trigger the centralized rescheduling for orders awaiting execution. Frequent operation rescheduling (at the occurrence of unexpected events) may affect the smooth global operation of the manufacturing system: the initial schedule is abandoned for a new one computed in real-time and, instead of operating in normal conditions, the system wastes time recalculating a global optimum [9]. This frequent rescheduling, also called "nervousness", can affect the normal operation of the control system: each time new changes are taken into account in the optimization model the optimization procedure is re-run, causing instability for the orders implementing the optimized command in the physical world (e.g., due to frequent updates an order can remain blocked or circulate between two stations without executing any operation). The time gap between the physical change and the model change is larger if the IT infrastructure is not powerful enough to run the optimization in real-time. Thus, it is advisable to avoid time-expensive full-scale computation of the production plan and
Fig. 3. Replanning and rescheduling when dealing with hard changes.
Fig. 4. Replanning and rescheduling when dealing with soft changes.
resource schedule during execution, allowing rescheduling only when it limits a global cost increase that exceeds a significant threshold, so that smooth operation is maintained [19]. The described production optimization process falls into the category of the flexible Job Shop Scheduling Problem (JSSP). JSSP is NP-hard; no exact algorithm is known that computes an optimal solution in polynomial time. Nevertheless, in current manufacturing practice it is not always important to aim at a global optimum, but to be agile and responsive to events like rush orders and resource failures or performance degradation [20,21]. Optimization engines such as ILOG CPLEX [22] construct a solution (allocation of operations on resources) sequentially and continuously improve it. The advantage of this operating mode is that the time interval in which a feasible solution is computed for a given search space (fixed shop floor layout, batch size, and order complexity) can be evaluated. This interval of time is used afterwards, in conjunction with the moment when an order will be completed (and a new one should be inserted), to determine when the optimization model will be verified against the updated parameters recorded from the instrumented resources. If significant changes are detected the optimization algorithm will be executed in the cloud, otherwise its previous result will be used. In either case a solution will be available when a completed order (and finished product) leaves the system and a new one has to enter it. Using this method, rescheduling is done only once before inserting a new order into production and only if a major soft change is
detected. The rescheduling is done using the updated optimization model. This method of operation assures that the undesirable characteristic of nervousness is eliminated from the control system. Depending on the magnitude of the cost change, an order with new or the same scheduled operations will be chosen for the configuration of the Order Agent in charge of product execution. This process of adjusting the production plan and resource schedule with updated information is described below:
Step 1. Run offline the production planning and operation scheduling on resources (optimization algorithm); record the running time (OM_running_time) (Fig. 5)
Step 2. Choose the first order from the optimized batch, assign it to the first available pallet and remove it from the batch
Fig. 5. Time window for starting the recalculation of a new plan and schedule at soft changes.
Step 3. If there are more pallets available GOTO Step 2
Step 4. Compute when the first pallet is ready (delay_first_exit), wait delay_first_exit – OM_running_time
Step 5. Update operations parameters (read from the Operations on resources table)
Step 6. If the model used as input for the optimization algorithm changed during operation and there are orders to be processed then rerun the optimization algorithm
Step 7. GOTO Step 2
4. Products table (products_table) – contains a listing of all products available for manufacturing. This table is filled offline by the operator in the production design stage.
5. Operations on products table (op_on_prod_table) – realizes the link between the product types and the needed operations. It also specifies the precedences between the different operations needed for a class of products through a field called operation level. Its value specifies the level on which the operation is placed in the realization of a product (e.g., all operations on level n must be completed before the ones on level n+1 are processed). This field is created initially by the user based on the product specifications.
6. Client table (clients_table) – contains the list of the clients the manufacturing enterprise works with. All the information in this table is inserted by the operator.
7. Orders table (orders_table) – contains the information associated with clients' requests together with the real-time location and completion rate of an order. It associates a request/client with a type of product. Multiple products to be manufactured generate multiple orders. The information (except the order index and the order state) is filled offline by the operator at the start of the production cycle.
Processes associated with the orders table: Update order location: operation state, location, and current operation are updated by the order agent after the completion of either a processing or a transporting operation. The JADE agent writes a record to the database using the JDBC connector (see Fig. 1, solid arrow ii).
8. Operations traceability table (real_time_operations) – contains information retrieved from the resource after it has finished the execution of an operation; this information is updated automatically by the resource through interaction with the WIP and is used for product traceability.
Processes associated with the operations traceability table: Record operation: the main process associated with the traceability table, realized by the resource after each interaction with an order. The resource agent writes a record to this database table including the operation which has been completed, the resource which has realized it, the order to which the operation belongs together with the product type, and the timestamp of the operation. All the production execution data recorded by the resource agent is logged in the operations traceability table of the production database.
9. Table for interaction with the production layer (production_table) – contains fields through which the upper layer in charge of optimization, monitoring, and reporting communicates (start/stop operation, select order) with the lower layer containing the MAS framework in charge of operation execution.
10. Operations scheduled on resources table (scheduled_operations) – contains the information about operations assigned to resources in the scheduling process for a given batch. In the case of the hierarchical operating mode this information is computed offline by the SS and is sent sequentially (order by order and operation by operation) to the agents responsible for production execution. In the case of the heterarchical operating mode only product orders are sent and the resource allocation is done in a decentralized way by the dMES through cooperation between order and resource agents. The table fields are populated automatically with data resulting from the optimization engine.
Processes associated with the operations scheduled on resources and production tables: Update schedule: write data resulting from production planning and operation scheduling to the order agents that are in charge of implementing the schedule; as operations are sent by the supervisor agent to the physical layer to effectively apply the operation_sent (scheduled_operations) strategy, the from/to_FMS fields are updated. This process, whose core is the supervisor agent, is described in Fig. 6: the agent reads information from the tables containing scheduling and resource data and, based on the commands received from the control interface, sends to the order agent either the scheduled order or only the order type and its composing operations.
11. Instantaneous power table (power_table) – contains the real-time power consumption for each resource. This information is written directly by each workstation with a frequency of 1 Hz (one sample per second). The resource energy consumption is integrated in time by an acquisition board (part of the resource agent) which updates the power table in the database each second with the average instantaneous power. The acquisition board calls a PHP script in the cloud with the instantaneous power value and the associated resource; the script then inserts a row in the database.
Fig. 6. Interaction between MES and the multi-agent framework controlling the shop-floor.
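As a rough illustration of the update and rescheduling logic described in this section, the following Python sketch integrates 1 Hz power samples into an energy value (energy = power × time), classifies the resulting change of execution time or energy against the threshold thr, and computes the waiting time of Step 4. The function names, the relative form of the threshold, the clamping of the waiting time to zero and all numeric values are assumptions made for illustration; the text only states that thr is predefined and experimentally computed.

```python
from dataclasses import dataclass


def energy_from_power_samples(power_w, period_s=1.0):
    """Approximate energy (J) as the sum of power (W) x sampling period (s)."""
    return sum(p * period_s for p in power_w)


@dataclass
class OperationQoS:
    """Execution time and energy recorded for one operation on one resource."""
    exec_time_s: float
    energy_j: float


def classify_change(planned, measured, thr=0.2):
    """Return 'none', 'minor' or 'major' for a soft change.

    Assumption: thr is a relative increase (0.2 = 20 %) applied to both
    execution time and energy; the source does not give its exact form.
    """
    time_inc = (measured.exec_time_s - planned.exec_time_s) / planned.exec_time_s
    energy_inc = (measured.energy_j - planned.energy_j) / planned.energy_j
    worst = max(time_inc, energy_inc)
    if worst <= 0:
        return "none"
    # Only major changes trigger the centralized rescheduling.
    return "major" if worst > thr else "minor"


def wait_before_check(delay_first_exit_s, om_running_time_s):
    """Step 4: wait delay_first_exit - OM_running_time.

    Assumption: the wait is clamped to zero when the optimization takes
    longer than the time left until the first pallet exits.
    """
    return max(0.0, delay_first_exit_s - om_running_time_s)


if __name__ == "__main__":
    energy = energy_from_power_samples([100.0, 120.0, 110.0])
    planned = OperationQoS(exec_time_s=10.0, energy_j=300.0)
    measured = OperationQoS(exec_time_s=10.5, energy_j=energy)
    print(energy, classify_change(planned, measured), wait_before_check(60.0, 15.0))
```

In this sketch a change is evaluated per operation; a real deployment would read the planned and measured values from the operations on resources table rather than from literals.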
4. HA design of the Cloud infrastructure The cloud control layer presented in Fig. 1 uses an Infrastructure as a Service which executes the scheduling optimization engine, the database, and the supervisor agent. The cloud infrastructure used in this research is an IBM Service Delivery Manager (ISDM) based on IBM Tivoli Service Automation Manager (TSAM) 7.2 version; the selected virtualization solution is VMware ESXi 4.1 [23]. In the most general case, applications which are migrating in cloud do not inherit all the cloud-specific features like High Availability (HA) provided by VMware or hypervisor-specific features like Distributed Resource Sharing (DRS). In real cases, the situation is more complex, because even if a private cloud system is used this will not imply that the ISA95-level 3 MES applications (planning, scheduling, monitoring, QAQC) run for CMfg will benefit of the same facilities of the cloud system they are executed on. Using an IaaS cloud system, the services offered are related to virtual machines (VMs) which can be configured to create an environment that allows executing MES applications: a supervisor agent and a complex database which is updated at differentiated time intervals. VMs (the services) created in the cloud benefit of HA, which means that VMs which are creating the solution will be always available. However, the problem is that only the VMs are guaranteed to be available, whereas the functions offered by applications within the VMs can change (the function may crash or may have bugs giving wrong information) and become unavailable. In such situations the CMfg service may become unavailable, although there is no VM malfunction. The problem of high availability of the control CMfg applications is solved using a HA cluster which offers the necessary availability for the applications and the database. Fig. 7 presents a HA solution based on the following virtual machines: 1. 
Two virtual machines which are implementing the HA load balancer; the Load Balancer has a public IP and can be reached from Internet. The Load Balancer cluster listens on port 443 for secure https requests and for JAVA agent communication; any request received is forwarded to be processed by that HA cluster with which it has a direct link in the internal network. The Load Balancer is not using direct routing because the HA cluster doesn’t have direct access in Internet; thus the routing method is Network Address Translation (NAT), and the load balancer is the gateway for the HA Nodes. IPSec (Internet Protocol Security) [24] was used to implement a VPN which creates a secure communication line between the manufacturing infrastructure and the Load Balancer, allowing JAVA agents to communicate securely. In this way Internet communication is secured with https and JAVA agent communication with VPN. The requests coming from Internet or from the manufacturing infrastructure are distributed by the Load Balancer to the HA Nodes based on the Round Robin algorithm in such way that the nodes will receive the same number of requests. Before sending the request to the HA Nodes, the Load Balancer verifies the service availability on the HA Nodes using nany agents (agents which check whether the service in accessible and the Node is up and running). If the nany agent detects a service that is unavailable on a HA Node, the load balancer will commute the session on another node and the new incoming requests are redirected to this node until the failed one recovers. The Load Balancer Backup is monitoring the status of the main Load Balancer. If the Load Balancer fails, the backup will take over the load balancing function and assign the public IP; thus the functions of the VMs in the Load Balancer cluster will change, the failed node becoming the backup node. The Piranha load balancer from RedHat [25] was used for implementation with Linux as operating system. 2. 
The HA Cluster contains two VMs and is configured to offer high availability for three services: i) web interface access using the https protocol on port 443; apache was used as web server, and certificates were used to encrypt the connections using 1024 bit SSL (Secure Socket Layer) keys [26]; ii) JAVA agent communication, the connections being made between the agent running on the HA node and the agents on the manufacturing infrastructure; IPSec was used to encrypt the communication over the Internet (from the Load Balancer to the manufacturing infrastructure), while in the internal network there is no encryption; iii) database access: a High Availability MySQL Cluster was used to store the database.
The configuration of the HA Cluster allows the services to be executed in parallel on both nodes. The RedHat High Availability Cluster was used to implement this configuration, which is unusual compared with classical HA where a service runs on only one node. This configuration was chosen to obtain a system that processes requests faster. If a service becomes unavailable, the cluster first tries to restart the service on the same node; if the restart fails and the service is already running on the other node, then the service is stopped on the current node. If the restart fails but the service is not running on the other node, then the service is relocated to the other node. The fencing function, which isolates a failed node to prevent further malfunctions of the cluster, is disabled because in this configuration all nodes can execute the same services in parallel.
Fig. 7. The VM infrastructure in cloud.
3. The MySQL cluster is composed of four VMs which are grouped in two Node Groups. The database is divided in two parts, each of them being hosted in one of the two Node Groups. For example, if we consider a database table which has four fields (F1, F2, F3, and F4), then Node 1 will store F1 and F3, Node 2 will store F3 and F1, Node 3 will store F2 and F4, and Node 4 will store F4 and F2.
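The service recovery policy described for the HA Cluster (restart locally first, then either stop the local copy or relocate it) can be summarized as a small decision function. The following Python sketch is illustrative only: the names `Action` and `recovery_action` are hypothetical and do not correspond to the RedHat cluster configuration or API.

```python
from enum import Enum


class Action(Enum):
    NONE = "none"              # service is healthy, nothing to do
    KEEP_RESTARTED = "restart" # local restart succeeded
    STOP_LOCAL = "stop_local"  # peer node already serves it, stop the local copy
    RELOCATE = "relocate"      # no running copy left, move the service to the peer


def recovery_action(available: bool, restart_ok: bool, running_on_peer: bool) -> Action:
    """Hypothetical restatement of the policy described in the text."""
    if available:
        return Action.NONE
    if restart_ok:
        return Action.KEEP_RESTARTED
    if running_on_peer:
        return Action.STOP_LOCAL
    return Action.RELOCATE


# Example: the local restart failed, but the peer still runs the service.
print(recovery_action(available=False, restart_ok=False, running_on_peer=True).name)
```

Because both nodes normally run every service in parallel, the `STOP_LOCAL` branch is the expected outcome of a failed restart; `RELOCATE` only applies when the peer has also lost the service.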
The cluster replicates the data between the nodes without losing data consistency. In the example above, if two nodes from two different node groups fail, the database will still be available and consistent. The MySQL Cluster protects against outages by offering the following facilities:
Synchronous Replication - data within each data node is synchronously replicated to another data node.
Automatic Failover - the heartbeat mechanism detects any failures in real time and automatically fails over to other nodes in the cluster, without service interruption.
Self-Healing - failed nodes are able to self-heal by automatically restarting and resynchronizing with other nodes before rejoining the cluster, with complete application transparency.
Shared Nothing Architecture, No Single Point of Failure - each node has its own disk and memory, so the risk of a failure caused by shared components such as storage is eliminated.
The system administrator is notified if an error event is detected at any level of the infrastructure (HA Cluster, Load Balancer or MySQL Cluster). The notification describes the problem and helps the administrator to locate and fix the malfunction. The environment based on VMs was designed with monitoring facilities; if an unusual situation appears, an alert is sent to the system administrator in order to debug the system.
The performance of the cloud system is more than sufficient for a 6-station shop floor MES application (see the next section); in the case of a large plant with more than a hundred workstations (robots, CNC machines) producing batches of products subject to multiple operations, the proposed HA system can be configured to scale. In this case only the VMs which implement the HA cluster and the MySQL cluster are multiplied and added to the system. The configuration of the system can be scaled depending on the complexity of production orders and the number of manufacturing resources involved in production. After the batch order is placed and analysed, the type and number of necessary manufacturing resources is computed and a command is sent to TSAM using the REST API. The command contains instructions about the number of VMs required for the HA cluster and the MySQL cluster. After receiving the command, TSAM contacts the hypervisor and starts cloning the templates of the involved VMs. After cloning, the hypervisor reconfigures the hardware of the VMs (CPU, RAM, HDD, and network) and starts the VMs. Each VM has a set of scripts which are executed only once, when the VM starts for the first time; the scripts reconfigure the MySQL cluster and the HA Cluster in order to add the new machines to the system. If, after a while, there are orders which require less computing resources, the VMs are placed by TSAM in suspend mode or they can be de-provisioned, depending on the production planning (long term planning).
The infrastructure and the tools used to implement the CMfg solution offer High Availability on two levels: the cloud system offers HA at the infrastructure level, and the Load Balancing/HA Clustering solutions offer HA at the application level. From the point of view of security, the communication between clients and the web interface is encrypted using self-signed SSL certificates that offer a good level of security by using the Diffie-Hellman algorithm [27]. The communication between JAVA agents is also encrypted over the Internet by using an IPSec VPN tunnel between the gateways of the manufacturing infrastructure and the load balancer in the cloud infrastructure. The other connections are made in the local network, which is a private network implemented with virtual LANs (VLANs). The HA Cluster and the MySQL cluster are in the same VLAN in the cloud system, but if a higher level of security is required it is possible to isolate the MySQL Cluster from the Load Balancer by using a VLAN which connects the HA Cluster with the MySQL Cluster.
5. Experimental results
In this section we provide an experimental evaluation of the data collection process of the MES platform, which gathers information from devices embedded on industrial resources (machines, robots) and uploads it to the cloud for centralized tasks. The prototype process used to validate the system is the monitoring of energy consumption. For this purpose two scenarios are analysed: the first is the communication between the embedded devices measuring instantaneous power and the MES database located on the cloud; the second is the real-time aspect of data acquired from resources. The first scenario is associated with the “Instantaneous power table”, which monitors the power profile consumed by each resource. The important aspect here is the performance of the communication link between the embedded devices distributed in the shop floor and the cloud. The interest is in a high update rate and a low latency in order to obtain a detailed energy consumption graph. The second scenario is associated with the “Operations on resources table”, which monitors energy consumption at operation level for each resource. The important aspect here is how the embedded devices store individual power samples and integrate them in time. In order to test the proposed scenarios, experiments involving a generic pick and place industrial robot operation [28] have been conducted and data has been recorded about:
a) Latency: consists in the evaluation of the time delay between the signal transmission from the embedded device and its update in the database on the cloud. In order to carry out this experiment an NTP server (time.nist.gov) has been used both on the cloud and on the embedded device, and the timestamps at message sending and receiving have been compared (Fig. 8). The method used for data transmission is a parameterized PHP script which contains the useful process information together with a timestamp from the sending point. Since the database update rate depends on the PHP calls made by the embedded device, a test was carried out where the main loop of the embedded device consists only of the script call. The obtained results are presented in the screen capture shown in Fig. 8. The average measured time for executing a simple update loop on the embedded device is 100 ms, which can also be seen from the database update on the cloud (Fig. 8, 10 samples in 1 s – from 11:14:37 to 11:14:38). The average update time for the instantaneous power is 1.24 s, which consists of the embedded device loop time, the transmission delay, and the information update delay in the database. Communication tests (Fig. 9) have been carried out on a private cloud located in the same network as the manufacturing shop floor. This information is relevant for the real-time update of the production database in the cloud.
Fig. 8. Screen capture of the table used to test the connection between the embedded device and the cloud platform.
b) Signal reproduction on the cloud: pick and place operations were executed on an industrial robot; the measured instantaneous power was sent to the cloud and stored in the “Instantaneous power table”. An external oscilloscope was used to monitor the original signal from the current transformer (Fig. 10). Data obtained from the two sources are compared in the form of two graphs (Fig. 11) using the same time base.
c) Signal integration on the embedded system associated with the resource: the same pick and place operation from scenario 2 was executed, but this time the energy was measured and integrated using the embedded device, and secondly the instantaneous power was sent to the cloud. This information goes into the “Operations on resources table” and is relevant for the planning and scheduling model. The embedded device’s main loop has four stages: i) time update (the synchronization with time.nist.gov is
done outside of the main loop), ii) instantaneous power computation and consumed energy update (this is the most intensive part since it uses 1500 samples to compute the instantaneous power [29]), iii) PHP script call for database update with the instantaneous power, and iv) TCP server connection used to externally interrogate the total consumed energy (this is used by the resource agent to precisely monitor the energy used by a certain operation). The loop is executed on average in 370 ms, most of this time being spent computing the instantaneous power (an average of 260 ms). The energy consumed for executing the pick and place operation (robot motion is limited to 50% cruise speed), as measured by the embedded device, is 1137 Ws. Measured results are reported in Fig. 12.
d) Database maximum utilization rate: in order to determine the upper limit of the database update frequency, a test setup was built according to Fig. 7: the VMs used for the two clusters have 4 cores (Intel Xeon X5660 at 2.8 GHz) and 8 GB of RAM, and the MySQL Cluster is v7.4. The HA cluster was composed of two machines and the MySQL Cluster of four machines. The benchmark machine is a 64 bit Linux one running on a VM with the same hardware configuration as the VMs in the clusters. The benchmarking tools used were HammerDB and the DBT2 benchmark tool (flexAsynch). HammerDB was configured to simulate 10 virtual users in a TPC-C benchmark (a benchmark using transactions); the total number of transactions per user was set to 1 M. For flexAsynch (which is a tool developed
to test the scalability of MySQL Cluster) the configuration used was: 4 LDM (Local Data Manager) threads, non-HT (one thread per CPU core); 4 TC (Transaction Coordinator) threads, HT (multiple threads per core); 2 send threads, non-HT; 8 receive threads, HT. In this scenario, the maximum recorded number of requests processed per second was 437,532. This value is much greater than the values needed in the case of simple data (case a, 10 records/second) or in the case of local processing (cases b and c, 10 records/second), thus allowing the monitoring of a large set of embedded devices.
Fig. 9. Screen capture of the “Instantaneous power table”.
Fig. 10. Layout for the data collection and its upload to the cloud database.
Fig. 11. Signal reproduction on the cloud compared to the signal acquired with a high frequency oscilloscope.
Fig. 12. Signal on the cloud compared to signal acquired locally with embedded device.
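To make the headroom behind the last statement concrete, the reported peak rate can be divided by the per-device update rate. The sketch below is a back-of-the-envelope estimate only: it assumes that one benchmark request corresponds to one record update and that the synthetic flexAsynch load is representative of device traffic, and it ignores the web tier (PHP calls) and network overhead. The function name `max_devices` is introduced here for illustration.

```python
MAX_REQUESTS_PER_S = 437_532   # peak rate reported for the flexAsynch run
RECORDS_PER_DEVICE_PER_S = 10  # update rate per device quoted for cases a, b and c


def max_devices(peak_rate: int, per_device_rate: int) -> int:
    """Upper bound on devices, assuming one request per record (an assumption)."""
    return peak_rate // per_device_rate


print(max_devices(MAX_REQUESTS_PER_S, RECORDS_PER_DEVICE_PER_S))  # 43753
```

Under these assumptions the database tier alone would not be the bottleneck until tens of thousands of devices are connected; in practice the limit is likely to be reached earlier in the web server or the network.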
Table 1 Comparison between the data captured by the embedded device and uploaded to the cloud and the data acquired by the oscilloscope.

| Data source | Number of samples for computing the energy consumption for the pick and place operation | Total energy for pick and place operation |
| --- | --- | --- |
| Monitoring system on the embedded device | Approx. 50,000 raw samples consisting of 31 samples of instantaneous power (Fig. 12), each sample being computed using 1500 samples of the instantaneous current | 1137 Ws |
| Cloud | Approx. 50,000 raw samples consisting of 26 samples of instantaneous power (Figs. 11 and 12), each sample being computed using 1500 samples of the instantaneous current | 1162 Ws |
| Oscilloscope | 100,000 raw samples processed according to the formulas used to compute the energy by the embedded device (Fig. 11) | 1172 Ws |
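The three energy totals in Table 1 can be compared as relative deviations. The short sketch below takes the oscilloscope value as the reference, which is an assumption made here for illustration (the source does not designate a ground truth); the function name is likewise hypothetical.

```python
REFERENCE_WS = 1172.0  # oscilloscope total from Table 1, used as reference (assumption)
MEASURED_WS = {"embedded device": 1137.0, "cloud": 1162.0}  # totals from Table 1


def relative_deviation_percent(measured: float, reference: float) -> float:
    """Signed deviation of a measured energy from the reference, in percent."""
    return 100.0 * (measured - reference) / reference


for source, energy in MEASURED_WS.items():
    print(f"{source}: {relative_deviation_percent(energy, REFERENCE_WS):+.2f} %")
# embedded device: -2.99 %
# cloud: -0.85 %
```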
By comparing data from the above described scenarios the following observations can be made (Table 1):
The embedded device used for energy monitoring limits the update frequency, but the overall precision for either cloud or embedded device energy integration is good.
Samples are lost when using the embedded device because the main loop is executed in a single thread; while it communicates, it loses samples.
There is a delay between the signal from the oscilloscope (directly measured) and the signal recorded with the embedded device (see Fig. 11).
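The integration of power samples into consumed energy, and its tolerance to lost samples, can be illustrated with a short sketch. The source does not state which integration rule the embedded device uses; the trapezoidal rule over timestamped samples shown here is an assumption, and `integrate_energy` is a hypothetical name.

```python
import math


def integrate_energy(samples: list[tuple[float, float]]) -> float:
    """Trapezoidal integration of (time in s, power in W) pairs, result in Ws.

    Using the actual timestamps means a missed sample widens one interval
    instead of silently shortening the integration window.
    """
    energy_ws = 0.0
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        energy_ws += 0.5 * (p0 + p1) * (t1 - t0)
    return energy_ws


# Constant 100 W over 10 s, sampled at irregular instants (some samples "lost").
samples = [(0.0, 100.0), (0.37, 100.0), (1.5, 100.0), (10.0, 100.0)]
print(math.isclose(integrate_energy(samples), 1000.0))  # True
```

With a slowly varying load the result is insensitive to the sampling gaps, which is consistent with the observation that the totals agree closely even though individual samples are lost; fast transients that fall inside a gap would still be missed.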
6. Conclusions
Taking into consideration these observations, a number of conclusions are formulated with respect to the real-time capability of the cloud infrastructure to acquire and process multiple data sources for global MES tasks:
Data can be processed both locally on the embedded device (in the case of instantaneous power pattern monitoring) and on the cloud (energy consumption at operation level) since the latency permits this option.
The difference between the signal acquired from the device and the signal uploaded on the cloud in terms of consumed energy is negligible.
New MES designs based on private cloud platforms are able to act in real time on predicted data, thus anticipating shop floor events and allowing prevention of failures; this new capability is based on collecting data in real time from shop floor devices and products with embedded intelligence, by considering the covariance of the multiple metrics monitored.
Manufacturing is a highly sensor-intensive and thus data-intensive environment, the data being generated at the edge of production systems, which causes challenges on features such as bandwidth, network latency, overall speed and hence agility and responsiveness. Edge computing offers a new perspective for the industrial IoT space by integrating intelligence and computing capabilities directly into small-footprint IIoT edge devices: IoT gateway devices and aggregation nodes.
Summarizing, the article describes a semi-heterarchical manufacturing control solution based on a private cloud infrastructure which collects data in real time from intelligent devices associated with shop-floor entities: resources and mobile devices embedding the work in process (WIP) on products during their execution cycle. The cloud platform acts as a centralized system scheduler (SS), planning jobs and allocating resources optimally at batch level, and integrates real-time data and status information about manufacturing resources and orders from agentified, stationary or mobile shop floor devices. The main qualities of the proposed CMfg control architecture are: i) easy integration of manufacturing data from multiple sources into a cloud infrastructure using a multi-agent framework distributed over the field entities and on the cloud (Fig. 1), ii) fault tolerance and high availability of the global applications centralized at MES level, running on the cloud (Fig. 7), iii) scalability in running, in real manufacturing time, intensive computing applications (e.g., optimization engines for product rescheduling and resource re-allocation at significant changes in the performances and utilization costs of resources) in the cloud (Fig. 7), and iv) sustainability of shop floor processes and infrastructure through the supervision of energy consumption.

References
[1] T. Borangiu, S. Răileanu, T. Berger, D. Trentesaux, Switching mode control strategy in manufacturing execution systems, Int. J. Prod. Res. 53 (7) (2015) 1950–1963, doi:http://dx.doi.org/10.1080/00207543.2014.935825.
[2] S. Kubler, J. Holmstrom, K. Främling, P. Turkama, Technological Theory of Cloud Manufacturing, Service Orientation in Holonic and Multi-Agent Manufacturing, Vol. 640 of Studies in Computational Intelligence, Springer, 2016, pp. 267–276, doi:http://dx.doi.org/10.1007/978-3-319-30337-6_24.
[3] D. Wu, M.J. Greer, D. Rosen, D. Schaefer, Cloud manufacturing: strategic vision and state-of-the-art, J. Manuf. Syst. 32 (2013) 564–579, doi:http://dx.doi.org/10.1016/j.jmsy.2013.04.008.
[4] F. Gamboa Quintanilla, O. Cardin, A. L’Anton, P. Castagna, A Petri net-based methodology to increase flexibility in service-oriented holonic manufacturing systems, Comput. Ind. 76 (2016) 53–68, doi:http://dx.doi.org/10.1016/j.compind.2015.09.002.
[5] O. Morariu, C. Morariu, Th. Borangiu, Shop-floor resource virtualization layer with private cloud support, J. Intell. Manuf. (January) (2014) 1–16, doi:http://dx.doi.org/10.1007/s10845-014-0878-7.
[6] H. Meyer, F. Fuchs, K. Thiel, Manufacturing Execution Systems, McGraw-Hill, 2009.
[7] J. Barbosa, P. Leitão, E. Adam, D. Trentesaux, Dynamic self-organization in holonic multi-agent manufacturing systems: the ADACOR evolution, Comput. Ind. 66 (2015) 99–111, doi:http://dx.doi.org/10.1016/j.compind.2014.10.011.
[8] A. Giret, V. Botti, Engineering holonic manufacturing systems, Comput. Ind. 60 (6) (2009) 428–440, doi:http://dx.doi.org/10.1016/j.compind.2009.02.007.
[9] J.M. Novas, R. Bahtiar, J. Van Belle, P. Valckenaers, An approach for the integration of a scheduling system and a multi-agent manufacturing execution system. Towards a collaborative framework, Proc. of 14th IFAC Sympos. INCOM’12, Bucharest, IFAC PapersOnLine (2012) 728–733, doi:http://dx.doi.org/10.3182/20120523-3-RO-2023.00156.
[10] O. Morariu, C. Morariu, T. Borangiu, Security issues in service oriented manufacturing architectures with distributed intelligence, in: T. Borangiu, D. Trentesaux, A. Thomas, D. McFarlane (Eds.), Service Orientation in Holonic and Multi-Agent Manufacturing, Vol. 640, Springer Series SCI, 2016, pp. 205–224, doi:http://dx.doi.org/10.1007/978-3-319-30337-6_23.
[11] L. Wang, W. Shen, S. Lang, Wise-ShopFloor: a web-based and sensor-driven shop floor environment, Proc. 7th International Conference on IEEE Computer Supported Cooperative Work in Design (2002) 413–418, doi:http://dx.doi.org/10.1109/CSCWD.2002.1047724.
[12] H.V. Van Brussel, J. Wyns, P. Valckenaers, L. Bongaerts, P. Peeters, Reference architecture for holonic manufacturing systems: PROSA, Comput. Ind. 37 (3) (1998) 255–274, Special Issue on IMS.
[13] D. Pănescu, C. Pascal, A constraint satisfaction approach for planning of multi-robot systems, Proc. 18th Int. Conf. on System Theory, Control and Computing (ICSTCC) (2014), doi:http://dx.doi.org/10.1109/ICSTCC.2014.6982408.
[14] D. Trentesaux, A. Thomas, Product-driven control: a state of the art and future trends, IFAC Proc. 45 (6) (2012) 716–721, doi:http://dx.doi.org/10.1007/978-3-642-35852-4_9.
[15] C. Indriago, O. Cardin, N. Rakoto, P. Castagna, E. Chacòn, H2CM: a holonic architecture for flexible hybrid control systems, Comput. Ind. 77 (2016) 15–28, doi:http://dx.doi.org/10.1016/j.compind.2015.12.005.
[16] D. McFarlane, Product intelligence: theory and practice, IFAC Proc. 45 (6) (2012) 9–14, doi:http://dx.doi.org/10.3182/20120523-3-RO-2023.00441.
[17] F. Bellifemine, G. Caire, D. Greenwood, Developing Multi-agent Systems With JADE, Wiley, 2007, ISBN 978-0-470-05747-6.
[18] S. Răileanu, F. Anton, A. Iatan, T. Borangiu, S. Anton, O. Morariu, Resource scheduling based on energy consumption for sustainable manufacturing, J. Intell. Manuf. 2015 (2015), doi:http://dx.doi.org/10.1007/s10845-015-1142-5.
[19] G.M. Kopanos, E. Capón-García, A. Espuña, L. Puigjaner, Costs for rescheduling actions: a critical issue for reducing the gap between scheduling theory and practice, Ind. Eng. Chem. Res. 47 (2008) 8785–8795, doi:http://dx.doi.org/10.1021/ie8005676.
[20] O. Sauer, Automated engineering of manufacturing execution systems – a contribution to "adaptivity" in manufacturing companies, Proc. of DET’08, Nantes, (2008), pp. 181–193.
[21] F.G. Quintanilla, O. Cardin, A. L’Anton, P. Castagna, A modeling framework for manufacturing services in service-oriented holonic manufacturing systems, Eng. Appl. Artif. Intell. 55 (2016) 23–36, doi:http://dx.doi.org/10.1016/j.engappai.2016.06.004.
[22] ***, Optimization modeling with IBM ILOG OPL, Instructor workbook, 2009.
[23] R. Buyya, C. Vecchiola, T. Selvi, Mastering Cloud Computing, Chapter 3 Virtualization, (2013), pp. 71–109, doi:http://dx.doi.org/10.1016/B978-0-12-411454-8.00003-6.
[24] T. Rowan, VPN technology: IPSEC vs. SSL, Netw. Secur. 2 (12) (2007) 13–17.
[25] Red Hat, Red Hat Enterprise Linux 5, Virtual Server Administration, 2009.
[26] H. Meyer, Securing intranet data with SSL client certificates, Comput. Secur. 16 (4) (1997) 334.
[27] L. Harn, C. Lin, Efficient group Diffie–Hellman key agreement protocols, Comput. Electr. Eng. 40 (August (6)) (2014) 1972–1980.
[28] T. Borangiu, Advanced Robot Motion Control, Romanian Academy Press & AGIR Press, Bucharest, 2003, pp. 1–450, ISBN 9732709681, 973-8130-99-9.
[29] https://learn.openenergymonitor.org/electricity-monitoring/ct-sensors/how-to-build-an-arduino-energy-monitor-measuring-current-only, consulted in September 2017.