Anton Friedl, Jiří J. Klemeš, Stefan Radl, Petar S. Varbanov, Thomas Wallek (Eds.) Proceedings of the 28th European Symposium on Computer Aided Process Engineering June 10th to 13th, 2018, Graz, Austria. © 2018 Elsevier B.V. All rights reserved. https://doi.org/10.1016/B978-0-444-64235-6.50176-5
Learning operation strategies from alarm management systems by temporal pattern mining and deep learning

Gyula Dorgo^a, Peter Pigler^a, Mate Haragovics^b and Janos Abonyi^a,*

^a MTA-PE Lendület Complex Systems Monitoring Research Group, Department of Process Engineering, University of Pannonia, Egyetem str. 10, Veszprém, H-8200, Hungary
^b MOL Danube Refinery, Olajmunkás str. 2., Százhalombatta, H-2443, Hungary
[email protected]
Abstract

We introduce a sequence to sequence deep learning algorithm to learn and predict sequences of process alarms and warnings. The proposed recurrent neural network model utilizes an encoder layer of Long Short-Term Memory (LSTM) units to map the input sequence of discrete events into a vector of fixed dimensionality, and a decoder LSTM layer to form a prediction of the sequence of future events. We demonstrate that the information extracted by this model from alarm log databases can be used to suppress alarms with low information content, which reduces the operator workload. To generate easily reproducible results and stimulate the development of alarm management algorithms, we define an alarm management benchmark problem based on the simulator of a vinyl acetate production technology. The results confirm that sequence to sequence learning is a useful tool for alarm rationalization and, more generally, for process engineers interested in predicting the occurrence of discrete events.

Keywords: alarm management, operator support system, recurrent neural networks
1. Introduction

The forecasting of discrete events is of critical importance in process (safety) engineering (Baptista et al., 2018). In complex chemical production systems, faults generate long sequences of alarm and warning signals. Although, according to alarm management guidelines, a single abnormal event should produce a unique and informative alarm, the high number of interacting components in modern production systems makes co-occurring and redundant alarms almost inevitable (Mehta and Reddy, 2015). Since most of the signals are redundant, the handling of malfunctions is a challenging task even for well-trained operators due to the resulting information overload. The concept of advanced alarm management is that occurring alarms should be grouped and the prediction of future events should be used for the suppression of predictable, nuisance alarms. The core concept of our methodology is the assumption that predictable alarms do not contain novel information: the state of the process can already be characterized based on the previously registered signals, and this information is sufficient to determine what operator actions are required in the given situation. As the efficiency of this approach relies on the performance of the model used for the prediction of the events, we selected the most advanced sequence to sequence deep learning neural network structure (Kiros et al., 2015) to realize our concept.
2. Methodology

2.1. Formulation of the sequence to sequence prediction problem

In process engineering practice, we frequently need to predict the occurrence of a discrete event. This task is highly relevant in alarm management, where the problem can be formulated as follows. Alarm and warning signals can be treated as states of the technology. Each state (denoted by $s$) is represented by a $\langle pv, a \rangle$ data couple, where $pv$ is the index of the process variable and $a$ is the attribute showing the process variable's value relative to the alarm and warning limits, such that $a \in \{\text{Low A}, \text{Low W}, \text{High W}, \text{High A}\}$, where A stands for Alarm and W stands for Warning. For example, a state can be represented as $s_e := \langle\text{Column Top Temperature}, \text{High A}\rangle$. An event, denoted by $e$, is the time interval in which the defined state occurs. We suppose that the sequence of events (present and past alarms) $\Phi = (e_1, \ldots, e_T)$ defines the state of the process, and we assume that based on this information we can predict the sequence of future events $\hat{\Phi} = (\hat{e}_1, \ldots, \hat{e}_{\hat{T}})$. Therefore, we look for a model $\hat{\Phi} = f(\Phi)$ that efficiently handles this sequence to sequence modeling problem.

2.2. Encoder-decoder recurrent neural network based sequence prediction

The most advanced approach to sequence to sequence learning is based on deep learning neural networks. Although Nguyen et al. (2017) applied these models for event prediction, to our knowledge, our work is the first example of their use for the prediction of alarm and warning signals. The application of this recurrent neural network is not trivial, so in this section we describe the proposed goal-oriented model structure (presented in Figure 1).

Forming the input of the model: Figure 1 highlights the important characteristics of the input sequences. As can be seen, an end-of-sequence (EOS) tag is appended to the sequence.
To ensure a fixed sequence length, we padded the sequences to the $T$-th element by adding padding symbols (PAD) after the EOS tag. Moreover, the order of the events in the input sequence is reversed, since Sutskever et al. (2014) found that the prediction accuracy improves when the beginning of the input sequence is "closer" to the beginning of the predicted sequence.
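The input-formatting step above (appending EOS, padding to a fixed length, and reversing) can be sketched as follows. This is a minimal illustration; the helper name and the event tags are ours, not the authors' code.

```python
# Sketch of the input formatting described above: append EOS, pad to a
# fixed length with PAD symbols, then reverse the event order
# (as suggested by Sutskever et al. (2014)).
EOS, PAD = "EOS", "PAD"

def format_input_sequence(events, max_len):
    """Return the reversed, padded input sequence of a fixed length."""
    seq = list(events) + [EOS]
    seq += [PAD] * (max_len - len(seq))   # padding goes after the EOS tag
    return seq[::-1]

# Illustrative alarm tags of the form <process variable, attribute>
events = ["T14_HA", "T14_LA", "T7_LW"]
print(format_input_sequence(events, max_len=6))
```

Note that after reversal the last real event of the input ends up next to the start of the predicted sequence, which is the effect the reversal is meant to exploit.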
Figure 1: Schematic illustration of the proposed methodology. The encoder maps the input sequence into a fixed-length vector representation. Using this vector as the initial state, the decoder layer determines the next event with the highest probability via the argmax function of the dense layer.
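The decoding loop sketched in the caption above (repeated greedy argmax prediction until an end-of-sequence symbol) can be illustrated as follows. The toy probability table stands in for the trained decoder and dense layer; all names and values are illustrative, not from the paper.

```python
# Greedy decoding loop as in Figure 1: repeatedly take the argmax event
# until EOS is generated or a maximum length is reached.
StOS, EOS = "StOS", "EOS"
VOCAB = ["T14_LW", "Q20_HW", "L12_LW", EOS]

def greedy_decode(predict_probs, max_len=10):
    """predict_probs(event) -> list of probabilities over VOCAB."""
    seq, current = [], StOS
    while len(seq) < max_len:
        probs = predict_probs(current)
        nxt = VOCAB[max(range(len(VOCAB)), key=probs.__getitem__)]  # argmax
        if nxt == EOS:
            break
        seq.append(nxt)      # predicted event is fed back as the next input
        current = nxt
    return seq

# Toy stand-in for the trained network's next-event distribution
toy = {StOS:     [0.90, 0.05, 0.03, 0.02],
       "T14_LW": [0.00, 0.80, 0.10, 0.10],
       "Q20_HW": [0.10, 0.00, 0.70, 0.20],
       "L12_LW": [0.00, 0.00, 0.00, 1.00]}
print(greedy_decode(toy.get))
```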
Embedding layer: To utilize the sequences of symbols as inputs of the neural network, we encode the symbols of the events as one-hot vectors, $oh_t$, in which only the bit related to the encoded signal is fired among the $n_d$ bits, where $n_d$ represents the number of symbols. The embedding layer realises the linear transformation $x_t = W_{emb}\, oh_t$, which maps the one-hot encoded vectors into a lower ($n_e$) dimension of continuous values. Note that in Figure 1 the embedded forms of the EOS and PAD symbols are represented as EOS' and PAD', respectively.

Encoder and decoder layers: The encoder LSTM layer maps the embedded input sequence into its internal states. The vector of the internal states and the activity values of the encoder layer is used for the conditioning of the LSTM units of the decoder layer, i.e. for transferring information about what has happened in the process and what kind of prediction the decoder layer should generate. The decoder itself is trained to predict the next event of the predicted sequence, and the procedure is repeated until an EOS signal is predicted or the maximum sequence length is reached.

Dense layer: The decoder layer maps the input event $\hat{x}_{\hat{t}}$ into a vector of real values $\hat{h}_{\hat{t}} = [\hat{h}^1_{\hat{t}}, \ldots, \hat{h}^{n_U}_{\hat{t}}]$, which is used to calculate the probabilities of the occurrences of the events by the softmax activation function represented by the dense layer in Figure 1,

$$P(\hat{e}_{\hat{t}+1} \mid \hat{x}_{\hat{t}}) = P(\hat{e}_{\hat{t}+1} \mid \hat{h}_{\hat{t}}) = \frac{\exp\!\left(\hat{h}_{\hat{t}}^{T} w_{s,j} + b_j\right)}{\sum_{j=1}^{n_d} \exp\!\left(\hat{h}_{\hat{t}}^{T} w_{s,j} + b_j\right)} \qquad (1)$$
where $w_{s,j}$ represents the $j$-th column vector of the $W_s$ weight matrix of the output dense layer of the network, and $b_j$ represents the bias.

Training: Deep learning requires a high number of training sequences, which can be extracted from the log files of the process control system. When the malfunctions are also logged, special sequences can be defined that can also be used for fault classification (Dorgo et al., 2018). For the training of the model, we encode the input data in one-hot vectorized form. The input data of the decoder is the one-hot vectorized form of the sequences that we want to predict. The decoder target data is identical to the decoder input data, but shifted by one timestep, since from the event $e_t$ we would like to predict the event $e_{t+1}$. This training approach is referred to as teacher forcing: the expected output from the training dataset at the current time step is used as the input of the next time step, rather than the output generated by the model (Williams and Zipser, 1989). Using this technique, we train all of the layers (the two embedding layers, the encoder, the decoder and the dense layer) simultaneously.

Prediction: Prior to the prediction, we encode the input sequence to obtain the internal state vector. The internal states of the encoder network are transferred to the decoder layer. The prediction starts with the start-of-sequence symbol (marked as StOS in Figure 1). The layer generates a prediction for the next event, which is also applied as the input of the next time step. The generated events are always appended to the predicted target sequence. This prediction process is repeated until the layer generates the end-of-sequence symbol or the previously set limit on the length of the predicted target sequence is reached.

Evaluation: The evaluation of the model should be related to its intended application.
Since we are focusing on building an alarm suppression algorithm, we defined two measures to evaluate the performance of the resulting models. First, we defined $val_1$ as the percentage of sequences having at least one well-predicted event; second, we calculate the percentage of well-predicted events ($val_2$).
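A minimal sketch of the two measures follows. The helper names are ours, and we adopt one plausible matching rule: a predicted event counts as well predicted if it occurs in the true sequence, with each true event matched at most once, regardless of order.

```python
# Sketch of the val1 / val2 evaluation measures (helper names are ours;
# the order-insensitive matching rule is our assumption).

def val2(true_seq, pred_seq):
    """Percentage of well-predicted events in a single sequence."""
    remaining = list(true_seq)
    hits = 0
    for event in pred_seq:
        if event in remaining:
            hits += 1
            remaining.remove(event)   # each true event may be matched once
    return 100.0 * hits / len(true_seq)

def val1(true_seqs, pred_seqs):
    """Percentage of sequences with at least one well-predicted event."""
    n_hit = sum(1 for t, p in zip(true_seqs, pred_seqs) if val2(t, p) > 0)
    return 100.0 * n_hit / len(true_seqs)

# Two of three events recovered -> val2 is about 66 %
print(val2(["T14_LW", "Q20_HW", "L12_LW"], ["T14_LW", "Q20_HW", "L12_HW"]))
```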
3. Results

We present a reproducible benchmark example to ensure the comparability and reproducibility of our results. We extended the widespread dynamic simulator of a vinyl acetate production system (VAC, Chen et al. (2003)) to serve case studies of alarm management and event analysis. The dynamic simulator of the vinyl acetate (VAc) process contains 27 controlled and 26 manipulated variables; therefore, it is complex enough to define alarm management problems. The model is available from the website of the authors (www.abonyilab.com). The extended simulator handles 11 malfunctions related to faults of valves or actuators (see Figure 2). The events (alarms and warnings) were defined when the process variables exceeded the threshold limits used to determine the normal operating range. The details of the alarms are given in our previous publication (Dorgo et al., 2018). In this study, the faults have a lognormal time distribution, and their effects were simulated in 200 one-hour-long operating periods. A 100-minute time window was used to identify events that we consider direct consequences of the malfunctions. We transformed these events into sequences based on their start times. The resulting sequences were filtered for a minimal length of five events; the first half of each sequence was used as the encoder input sequence, and the rest of the sequence was used as the target sequence for prediction. From the originally generated 2200 sequences, 1289 satisfied the minimal sequence length condition. The simulation and the data preprocessing were carried out in a MATLAB environment. Since we are interested in the development of open-source and industrially applicable solutions, the deep neural network was identified and applied in Python/Keras using TensorFlow as the backend. We trained the model using an Nvidia GeForce GTX 1060 6GB GPU with the application of CUDA. We selected the optimal model structure by 7-fold cross-validation experiments.
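The sequence filtering and splitting described above (minimum length of five events, first half as encoder input, second half as prediction target) can be sketched as follows; the actual preprocessing was carried out in MATLAB, so this helper is purely illustrative.

```python
# Sketch of the sequence filtering and splitting step described above
# (illustrative helper; the paper's preprocessing was done in MATLAB).

def split_sequences(sequences, min_len=5):
    """Keep sequences with at least min_len events and split each one at
    its midpoint into an (encoder input, prediction target) pair."""
    pairs = []
    for seq in sequences:
        if len(seq) < min_len:
            continue                       # filtered out, as in the paper
        half = len(seq) // 2
        pairs.append((seq[:half], seq[half:]))
    return pairs

pairs = split_sequences([["a", "b", "c", "d", "e", "f"], ["x", "y"]])
# Only the first sequence survives the length filter.
```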
The final model has an embedding dimension of 40. The number of LSTM units in both the encoder and decoder layers was 256. The number of epochs was set to 4000, with a batch size of 256, using the RMSProp optimizer of Keras. The performance of the resulting model is shown in Figure 3. As the 7-fold cross-validation illustrates, the model performs consistently well for all fault types.
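A hedged Keras sketch of such an encoder-decoder training model, using the hyperparameters reported above, could look as follows. The vocabulary size ($n_d = 50$) is our assumption, and this is an illustrative reconstruction, not the authors' released code.

```python
# Illustrative encoder-decoder model in the spirit of the paper
# (embedding dimension 40 and 256 LSTM units, as reported in the text;
# the vocabulary size n_d = 50 is an assumption).
import numpy as np
from tensorflow.keras.layers import Input, Embedding, LSTM, Dense
from tensorflow.keras.models import Model

n_d, n_e, n_u = 50, 40, 256   # vocabulary size, embedding dim, LSTM units

# Encoder: embed the (reversed, padded) input events, keep final states
enc_in = Input(shape=(None,))
enc_emb = Embedding(n_d, n_e)(enc_in)
_, state_h, state_c = LSTM(n_u, return_state=True)(enc_emb)

# Decoder: conditioned on the encoder states, predicts the next event
dec_in = Input(shape=(None,))
dec_emb = Embedding(n_d, n_e)(dec_in)
dec_out, _, _ = LSTM(n_u, return_sequences=True, return_state=True)(
    dec_emb, initial_state=[state_h, state_c])
probs = Dense(n_d, activation="softmax")(dec_out)   # softmax of Eq. (1)

model = Model([enc_in, dec_in], probs)
model.compile(optimizer="rmsprop", loss="sparse_categorical_crossentropy")
```

With teacher forcing, the decoder input is the target sequence and the training labels are the same sequence shifted by one step, matching the training procedure described in Section 2.2.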
Figure 2: Flow chart of the vinyl acetate production technology (the red circled numbers show the types of the implemented faults)
Figure 3: 7-fold cross-validation of the prediction performance. $Val_1$ represents the percentage of sequences with at least one correctly predicted event, while $Val_2$ shows the average percentage of correctly predicted events. The first column shows the overall performance, while the other columns show the prediction power of the model for each fault type.

Table 1 gives a didactic example of the sequence prediction and evaluation process. In the case of the first sequence, two event tags, $T_{14}^{LW}$ and $Q_{20}^{HW}$, are correctly predicted, therefore $val_2 = 66\%$, while in the case of the second sequence all of the events are predicted correctly (although not in the correct order), therefore $val_2 = 100\%$.

Table 1: Illustration of the sequence prediction procedure. We denote the alarms by the capital letters T, Q and L to show the type of the measured variable, i.e. temperature, quality and level, respectively. The superscript letters L, H, W and A stand for low, high, warning and alarm, respectively, and the subscript numbers show the number of the process variable. The prediction is based on input sequences used to represent the current operating state. To demonstrate how we evaluate the model performance, the predicted and the logged (true) sequences are also shown for two particular examples.

Input Sequence | True Sequence | Predicted Sequence
$T_{14}^{HA}$ $T_{14}^{LA}$ $T_{7}^{LW}$ EOS | $T_{14}^{LW}$ $Q_{20}^{HW}$ $L_{12}^{LW}$ EOS | $T_{14}^{LW}$ $Q_{20}^{HW}$ $L_{12}^{HW}$ EOS
$L_{3}^{LW}$ $T_{16}^{HW}$ $T_{16}^{HA}$ $L_{3}^{LA}$ $T_{16}^{LA}$ EOS | $L_{3}^{LW}$ $T_{16}^{HW}$ $T_{16}^{HW}$ $L_{12}^{LW}$ EOS | $L_{3}^{LW}$ $T_{16}^{HW}$ $L_{12}^{LW}$ $T_{16}^{HW}$ EOS
4. Conclusions

We proposed a sequence to sequence learning based methodology to model the temporal patterns of discrete events related to the operation of complex chemical processes. As we were interested in "translating" the sequence of past events, which defines the state of the technology and the utilized operating strategy, into the future sequence of events, the structure of the proposed recurrent neural network model was inspired by the deep learning models used for language translation. The model consists of long short-term memory units, which are specifically designed to handle long-term time dependencies and are therefore ideal for the prediction of discrete events. The encoder layer maps the input sequence into a fixed-length vector, while the decoder layer generates the next prediction based on the previously predicted elements. Both the encoder and the decoder utilize embedding layers to map the set of discrete events into a lower-dimensional space of continuous variables, while the prediction of the decoder layer is calculated by the argmax function of the dense layer. We designed a benchmark simulation example to evaluate the effectiveness of alarm suppression algorithms. The results demonstrate the applicability of the proposed methodology for the extraction of useful operating patterns, which can be transformed into alarm suppression rules.
Acknowledgements

The research has been supported by the National Research, Development and Innovation Office NKFIH, through the project OTKA 116674 (Process mining and deep learning in the natural sciences and process development) and the EFOP-3.6.1-16-2016-00015 Smart Specialization Strategy (S3) Comprehensive Institutional Development Program.
References

M. Baptista, S. Sankararaman, I. P. de Medeiros, C. Nascimento, H. Prendinger, E. M. Henriques, 2018. Forecasting fault events for predictive maintenance using data-driven techniques and ARMA modeling. Computers and Industrial Engineering 115, 41–53. URL http://www.sciencedirect.com/science/article/pii/S036083521730520X

R. Chen, K. Dave, T. J. McAvoy, M. Luyben, 2003. A nonlinear dynamic model of a vinyl acetate process. Industrial and Engineering Chemistry Research 42 (20), 4478–4487. URL http://dx.doi.org/10.1021/ie020859k

G. Dorgo, P. Pigler, J. Abonyi, 2018. Understanding the importance of process alarms based on the analysis of deep recurrent neural networks trained for fault isolation. Journal of Chemometrics, to appear.

R. Kiros, Y. Zhu, R. Salakhutdinov, R. S. Zemel, A. Torralba, R. Urtasun, S. Fidler, 2015. Skip-thought vectors. CoRR abs/1506.06726. URL http://arxiv.org/abs/1506.06726

B. Mehta, Y. Reddy, 2015. Chapter 21 - Alarm management systems. In: B. Mehta, Y. Reddy (Eds.), Industrial Process Automation Systems. Butterworth-Heinemann, Oxford, pp. 569–582. URL http://www.sciencedirect.com/science/article/pii/B9780128009390000218

D. Q. Nguyen, D. Q. Nguyen, C. X. Chu, S. Thater, M. Pinkal, 2017. Sequence to sequence learning for event prediction. CoRR abs/1709.06033. URL http://arxiv.org/abs/1709.06033

I. Sutskever, O. Vinyals, Q. V. Le, 2014. Sequence to sequence learning with neural networks. CoRR abs/1409.3215. URL http://arxiv.org/abs/1409.3215

R. J. Williams, D. Zipser, 1989. A learning algorithm for continually running fully recurrent neural networks. Neural Computation 1 (2), 270–280.