Process Safety and Environmental Protection 92 (2014) 215–223

Model selection and fault detection approach based on Bayes decision theory: Application to changes detection problem in a distillation column

Yahya Chetouani
Université de Rouen, Département Génie Chimique, Rue Lavoisier, 76821 Mont Saint Aignan Cedex, France
Abstract

The fault detection of industrial processes is very important for increasing the safety, reliability and availability of the different components involved in the production scheme. In this paper, a fault detection (FD) method is developed for nonlinear systems. The main contribution consists in the design of this FD scheme through a combination of the Bayes theorem and a neural adaptive black-box identification of such systems. The performance of the proposed fault detection system has been tested on a real plant, a distillation column. The simplicity of the developed neural model of normal condition operation, valid under all regimes (i.e. steady state and unsteady state), is achieved by means of a NARX (Nonlinear Auto-Regressive with eXogenous input) model and an experimental design. To show the effectiveness of the proposed fault detection method, it was tested on a realistic fault in a laboratory-scale distillation plant.

© 2013 The Institution of Chemical Engineers. Published by Elsevier B.V. All rights reserved.

Keywords: Fault detection; Reliability; Safety; Classification; Bayes theorem; Neural networks; Dynamic systems; Distillation column
1. Introduction
With the growing complexity of modern engineering processes and the ever-increasing demand for safety and reliability, there has been great interest in the development of fault detection (FD) schemes. An FD structure not only reduces the workload of the human operators, but also improves the effectiveness and continuity of industrial production. A "fault" is an abnormal event or an unacceptable deviation of at least one characteristic property of the process from its normal conditions (Venkatasubramanian et al., 2003). It is typically defined as a departure from the tolerable range of an observed variable. FD methods have been actively studied throughout the last decade. Traditionally, the most commonly implemented methods have been model-based approaches (Patton et al., 2000; Chetouani, 2006). The best-known structures rely on parameter identification (Isermann, 2006; Simani et al., 2000). Other methods rely on analytical redundancy (Korbicz et al., 2004), i.e. the comparison of the actual plant behaviour to that expected on the basis of a mathematical
model. The parity relation method (Scola et al., 2003) and the observer-based approach (Witczak et al., 2007; Chetouani, 2008) are also often applied. In practice, process industries such as mining, chemical and water treatment plants are characterized by complex processes which often operate in several operating regimes, and a large class of such systems is intrinsically nonlinear. It is often difficult to obtain nonlinear models that accurately describe plants in all regimes. In the general case of nonlinear plants, two possible choices for the model of normal condition operation may be considered: physical models or black-box models. In the first case, physical laws are applied in order to link the plant variables. Unfortunately, this "white-box" approach requires a state-space description of the studied process, which is often unavailable or difficult to obtain in engineering practice. Hence, in the modern process industry, there is a demand for data-driven process monitoring methods, owing to the complexity and limited availability of the chemical models (Chiang et al., 2001).
In order to meet the demands from industry concerning quality, efficiency and safety, numerous FD methods have been developed based on ANNs (Isermann, 2005; Ferentinos, 2005; Bouthiba, 2004; Gupta et al., 2003; Nelles, 2001). This neural approach is characterized by particular properties such as the ability to learn, generalization abilities, good approximation properties and simplicity of implementation (Tan et al., 2007; Uraikul et al., 2007; Vieira et al., 2004; Chetouani, 2007). Although all these techniques are well suited to fault detection, one of the most relevant techniques for fault detection is classification. Indeed, it is usual to view fault detection as a classification task whose objective is to assign new observations to one of the existing classes. Many methods have been developed: the support vector machine (Vapnik, 1995), Fisher discriminant analysis (Duda et al., 2001), the k-nearest neighbour rule (Cover and Hart, 1967), digital filtering and discriminant analysis (Tiplica et al., 2003), etc. Other recent and emerging classification approaches are based on artificial intelligence techniques, particularly the Multi-Layer Perceptron (MLP), which is a very efficient nonlinear classifier (Duda et al., 2001); other classifiers are also used, such as the naïve Bayesian network (Langley et al., 1992) and the k-dependence Bayesian classifier (Sahami, 1996).

This work is motivated by the development of a method which combines an ANN model and the Bayes classifier for detecting changes in a chemical process, namely a distillation column. Firstly, this study aims to obtain a reduced and reliable model which describes the dynamics of this chemical process. This reliable model makes it possible to reproduce the process dynamics under all operating conditions, i.e. steady state and unsteady state. The present study focuses on the development and implementation of a NARX (Nonlinear Auto-Regressive with eXogenous input) neural model for forecasting the process dynamics. Experiments were carried out in a binary distillation column, and experimental data were used both to define and to validate the identified model. The performance of this neural model was then evaluated using performance criteria such as Akaike's Information Criterion (AIC), the Final Prediction Error (FPE) and the Bayesian Information Criterion (BIC). Results show that the NARX neural model is especially representative of the dynamic behaviour of this nonlinear process. Then, the abnormal behaviour of the process is inspected when it is subjected to faults in its control parameters. Fault detection results show that the statistical test based on the Bayes classifier is a powerful tool for detecting changes in the behaviour of the distillation column.

This paper is organized as follows: Section 2 describes the fault detection strategy together with the neural training and the selection of the neural structure, Section 3 introduces the experimental set-up and presents the experimental results obtained with the proposed FD method, and Section 4 concludes the paper.
2. Fault detection method: Bayes theorem
In this paper, the Bayes classification theory is used in order to detect faults. Bayesian decision theory is a fundamental statistical approach to the problem of pattern classification. Litovski et al. (2006) showed that feed-forward artificial neural networks can be applied to the diagnosis of nonlinear dynamic analogue electronic circuits. They used a four-layer feed-forward neural network that realizes the Bayes classifier.
The ANN provides the probability that a circuit is faulty and points to the type of fault. A large representative set of faults was considered, i.e. all possible catastrophic transistor faults and qualified representatives of soft transistor faults were diagnosed in an integrated circuit. On a statistical basis, there is an optimal classifier that determines the class of unknown patterns more reliably than any other classifier (Dupret and Koda, 2001). Notice that, although a Bayes classifier has optimal performance, it may not be perfect. The performance of a Bayes classifier is determined by how much overlap exists between the classes (Duda et al., 2001; Hecht-Nielsen, 1990). The Bayes probability method is perhaps the most appealing approach among the statistical theories (Babnik and Gubina, 2002). For a set of modes of functioning, the Bayesian approach is the most direct method for selecting the most probable mode on the basis of the given data (Babnik and Gubina, 2002). This approach is based on the assumption that the decision problem is posed in probabilistic terms. Given two classes Ni (i ∈ {1, 2}) known a priori, this rule allocates a new observation ε to the class Ni with the maximum a posteriori probability p(Ni/ε), given the value of each descriptor, as defined in the following equation:

$$\varepsilon \in N_i \quad \text{if} \quad i = \arg\max_i \, p(N_i/\varepsilon) \tag{1}$$

where ε(k) = y(k) − ŷ(k) is the sequence of residuals between the actual and the predicted output of the system. This decision rule is named the "Bayes decision rule" because it is based on the Bayes rule, which gives the value of p(Ni/ε) as stated in the following equation:

$$p(N_i/\varepsilon) = \frac{p(\varepsilon/N_i)\, p(N_i)}{p(\varepsilon)}, \qquad \text{where } p(\varepsilon) = \sum_{i=1}^{2} p(\varepsilon/N_i)\, p(N_i) \tag{2}$$
In this equation, p(ε) denotes the overall probability density function for the occurrence of a data record ε. The class N1 is the class when no fault is present, and the abnormal mode is described by the class N2. p(Ni) is the a priori probability that ε belongs to the class Ni. This probability is fixed uniformly over the two classes (p(N1) = p(N2) = 0.5). The statistical threshold (0.5) delimits two distinct regions: the first region is the confidence region, where the innovation variation is considered acceptable; the second region is not acceptable (fault region) and is where the innovation variation exceeds this statistical threshold. In this paper, we make the classical assumption that the data follow a normal, i.e. Gaussian, distribution. The density function of a normal variable conditionally on a class Ni can be written as in Eq. (3):

$$p(\varepsilon/N_i) = \frac{1}{(2\pi)^{1/2}\,(\det \Sigma_i(k))^{1/2}} \exp\left(-\frac{1}{2}\,\varepsilon_i^T(k)\,\Sigma_i^{-1}(k)\,\varepsilon_i(k)\right) \tag{3}$$

where Σi(k) is the covariance matrix of the class Ni.

Hypothesis H0. The residual is symptomatic of a normal operating condition of the process. It is a random variable of Gaussian law with variance Σ1 and is assumed to follow a normal distribution.
Hypothesis H1 . The residual is symptomatic of an abnormal operating condition of the system. It is a random variable of Gaussian law with common variance 2 . A calculation of the 1 and 2 is taken at the end of each fixed sampling interval.
According to the Bayes postulate, the decision rule, which emphasizes the role of the a posteriori probabilities, is written as follows: decide N1 if p(ε/N1)p(N1) > p(ε/N2)p(N2); otherwise decide N2. For an observation for which p(N1/ε) is greater than p(N2/ε), we would naturally be inclined to decide that the true state is N1. Similarly, if p(N2/ε) is greater than p(N1/ε), we would naturally be inclined to choose N2. Since modelling errors and noise in complex engineering systems are inevitable, robustness to noise and model uncertainties is the key issue in the application of fault detection methods (Sharma et al., 2004). One of the classes of discrete-time nonlinear models is the NARX model (Previdi, 2002). In this study, an ANN-NARX is used to describe the distillation behaviour accurately. The ANN propagates the error from the output layer (yi) to the hidden layer to update the weight matrix (wi). Each neuron produces an output signal, which is a function of the sum of its inputs:
$$y_i = \Phi\left(\sum_i x_i\, w_i\right) \tag{4}$$

where Φ(·) is the activation function. In this study, the neural model is defined as follows (Zaknich, 2003; Ljung, 1999):

$$y(t) = \Phi\big(y(t-1), \ldots, y(t-n_y),\, u_i(t-n_k), \ldots, u_i(t-n_k-n_u)\big), \qquad 1 \leq i \leq m \tag{5}$$
where y(k) is the Auto-Regressive (AR) variable or system output and u(k) is the eXogenous (X) variable or system input. ny and nu are the AR and X orders, respectively, m is the number of inputs used, and nk is the time delay between u and y. In this study, the hidden-layer neurons have a log-sigmoid transfer function as the activation function and the output neuron has a linear activation function. The multilayer feed-forward ANN is trained to capture the underlying relationship between the input u(k) and the output y(k) using the training data. The Levenberg-Marquardt (LM) training algorithm is used along with back-propagation (BP). In this case, the ANN is trained iteratively on the training dataset to minimize the mean square error (MSE) performance function between the network outputs and the corresponding target values. Among gradient-based training methods, the LM algorithm (Chen et al., 2003) shows the fastest convergence during training because it is a compromise between the stability of first-order optimization methods (the steepest-descent method) and the fast convergence properties of second-order optimization methods (the Gauss-Newton method). After training, the networks thus developed are tested with the test data set to assess the generalization capability of each developed network. The best ANN model is trained off-line and then used on-line for detecting faults in the separation unit. All computations have been made in Matlab® 7.0.4.
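The sketch below illustrates, in Python, how a NARX regressor of the form of Eq. (5) could be assembled from recorded input/output sequences (one delayed lag per input) and fitted with a one-hidden-layer feed-forward network. The authors' computations were done in Matlab with Levenberg-Marquardt training; scikit-learn does not provide LM, so the L-BFGS solver is used here as a stand-in, and the lag orders, synthetic signals and split are illustrative assumptions rather than the paper's exact settings.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

def build_narx_regressors(y, u, ny, delays):
    """Stack lagged outputs y(t-1..t-ny) and delayed inputs u_i(t-d_i) as regressors.

    y      : (N,) output sequence
    u      : (N, m) input sequences
    delays : list of m integer delays (illustrative choice, one lag per input)
    """
    start = max([ny] + delays)                     # first sample with all lags available
    rows, targets = [], []
    for t in range(start, len(y)):
        lagged_y = [y[t - j] for j in range(1, ny + 1)]
        lagged_u = [u[t - d, i] for i, d in enumerate(delays)]
        rows.append(lagged_y + lagged_u)
        targets.append(y[t])
    return np.array(rows), np.array(targets)

# Synthetic data standing in for the recorded column signals (hypothetical).
rng = np.random.default_rng(0)
N = 2000
u = rng.uniform(-1.0, 1.0, size=(N, 4))
y = np.zeros(N)
for t in range(8, N):
    y[t] = 0.7 * y[t - 1] + 0.2 * u[t - 3, 0] - 0.1 * u[t - 8, 1] + 0.01 * rng.standard_normal()

X, target = build_narx_regressors(y, u, ny=1, delays=[3, 8, 4, 8])
net = MLPRegressor(hidden_layer_sizes=(9,), activation="logistic",
                   solver="lbfgs", max_iter=2000, random_state=0)
net.fit(X[:1500], target[:1500])                   # off-line training
lf_test = np.mean((net.predict(X[1500:]) - target[1500:]) ** 2)  # loss on the test split
print(f"test loss: {lf_test:.5f}")
```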
3. Experimental results

3.1. Experimental set-up
The proposed FD scheme is applied to a distillation unit. The feed tank (Fig. 1) contains the mixture to be separated (toluene–methylcyclohexane) with a mass composition of 23% methylcyclohexane. Operation in continuous mode involves charging the still with the mixture to be separated and bringing the column to equilibrium under total reflux. The product is introduced through the optimal feed tray so that the light components are volatilized, while the heavy part goes back down with the reflux into the column reboiler. The quality of the top product collected from the column depends on the reflux flow rate. The reflux ratio is varied through the magnetic valve by changing the relative quantities of material returning to the column and flowing to product storage.

The feed preheating system consists of three elements of 250 W each. In addition, it has a low-liquid-level switch in order to prevent operation if the level is excessively low. The reciprocating feed pump consists of a membrane allowing first the suction of the mixture and then its discharge towards the tank, with a flow capacity F = 4.32 L h−1. The column also has a reboiler with a 2 L hold-up capacity, an immersion heater with a power Qb = 3.3 kW and a liquid-level switch sensor which automatically stops the heating if the level is insufficient. The column can be used at atmospheric pressure (Patm) or under vacuum (Pr). The stirring of the mixture in the reboiler is ensured by the boiling mixture itself. The internal packing is made of Multiknit stainless steel 316 L, which enhances the mass transfer between the vapour and liquid phases. In order to approach adiabatic conditions, heat insulation made of glass wool is laid around the column. A condenser is placed at the column overhead in order to condense the entire vapour stream coming out of the column. Water is used as the cooling medium (Qc) in the exchangers. The heat-transfer area of the total overhead condenser is 0.08 m2.

Moreover, the reflux timer (Rt) allows the reflux ratio (Rr) to be controlled. It is monitored by the distillate temperature (Td): when the required distillate temperature (Td) is attained, the reflux timer opens; otherwise, it remains closed. The distillation supervision and control system allows the parameters to be modified and their evolution to be followed, such as the pressure drop (ΔP), the flows or the temperatures at different points of the distillation column. This control system, therefore, must hold the product compositions as near to the set points as possible. The thermocouples are coupled to a calibrated amplification circuit (4–20 mA, 0–150 °C) whose signals are fed to the computer on-line, which allows the bottom and top temperatures to be obtained. The unit has twelve sensors which continuously measure the temperature throughout the column.

Fig. 1 – Experimental device: distillation column.
3.2. Selection of the reliable and reduced structure of the ANN model

For ANN learning, it is very important to appropriately choose the input and output variables, to determine the relative importance of the inputs and to identify the time delays. Various techniques exist for determining the relevant structure. The neural network weight matrix can be used to assess the relative importance of the different input variables for the output variables of the process (Sharma et al., 2004). Garson (1991) proposed an equation based on partitioning
of connection weights. Schittenkopf et al. (1997) suggest the use of principal component analysis (PCA) for model reduction in neural networks. Kramer (1991) used a network with an internal bottleneck layer of arbitrarily small dimension to develop a compact representation of input data. McLain and Henson (2000) used the method of Kramer in the design of their neural network, and used the model thus developed in the design of a nonlinear model reference adaptive controller for a polymerization CSTR. Tholodur et al. (2000) used interpolated parameter functions to reduce the number of experiments needed to develop ANN-based models. In this work, another strategy for determining the relevant input variables is proposed: each input variable is modified in turn and the relevance of the resulting variation of the output variables is examined. It has to be borne in mind that this analysis of the relative importance is based on long-term experimental work, but it also takes the real physical nature of the process into account.
Fig. 2 – Nominal steady-state behaviour of the studied column (overhead temperature, feed temperature, reflux, preheating power and pressure drop versus time).
For this particular reason, the distillation unit operates at Td = 95 °C, obtained from the isobar diagram of the toluene–methylcyclohexane mixture using the Wilson thermodynamic model for the vapour–liquid equilibrium (VLE). In order to define this nominal mode, all the regulation systems of the column are put in a closed-loop configuration. Once the steady-state regime is achieved, the column is operated for approximately four hours in steady-state mode (Td = 95 °C) in order to collect a sufficient number of data. The data are sampled every 11 s. The set of the most important measurable variables of the process is (Rt, Qb, Pr, ΔP, Qf, Tf, F, Qc), where Qf and Tf represent the preheating power and the temperature of the feed. Among these variables, an experimental analysis of the process is carried out in order to observe the influence of each input variable on the output variables; this analysis aims at reducing the system further. The study is carried out for the nominal steady-state regime of the column shown in Fig. 2. Table 1 gives the average values of the measurable variables over the nominal steady-state regime; xb and xF are the boiler and feed compositions. Changing one input at a time makes it possible to observe visually the delay between this input and the outputs of the process: if a delay exists, there is a dependence between this input and the observed output; otherwise, there is no correlation between these quantities. Among all the experiments performed, only the study of the reflux timer is shown. The other experimental analyses were carried out in the same way, i.e. only one input variable is modified while the other input variables are maintained constant in order to observe only the evolution of the output variables due to this particular input variation.

Table 1 – The operating conditions of the nominal steady-state regime.

Rt (%)   Qb (%)   Tf (°C)   Td (°C)   F (%)   Qf (%)   Qc (L h−1)   ΔP (mbar)   Pr (mbar)   xb     xF
7        68       80.1      95.15     80      16       250          3.02        800         0.25   0.25

3.2.1. Influence of the reflux timer

The reflux timer allows part of the condensed vapour to be routed back into the column; therefore, the distillate temperature (Td) decreases as the reflux increases. It also allows the distillate to be recovered, which causes the temperature (Td) to increase. In this experiment, it is important to highlight the importance of the opening percentage of the reflux timer; in other words, the reflux ratio is modified from a minimal value to a maximal one in order to reveal clearly, in Fig. 3, the influence of this input variable. At the same time, the mean time delay (if it exists) is measured between this input variable and the output variables of the system. It is important to notice that the reflux timer has a very important influence on the distillate temperature (Td) and quality. When the opening percentage of the reflux timer is zero, all the vapours are condensed at the column overhead and routed back to the top tray. In this case, the vapour is very rich in the most volatile product, which drives the vapour temperature down. Conversely, if the reflux timer opens, the vapour is impoverished in the most volatile product; the toluene being in excess compared to the methylcyclohexane, the vapour temperature increases. It is also important to point out that if Rt is excessively low, the level in the reboiler increases fairly quickly, because most of the vapours are returned to the column while the feed rate (F), which remains constant, keeps sending feed. In the opposite case, if Rt is too high, too much distillate is recovered and the quantity of mixture vaporized becomes larger than the constant feed flow rate.

Fig. 3 – Relative importance of the reflux timer (overhead temperature, boiler temperature and reflux timer opening versus time).

3.2.2. Influence of all input variables

This part summarizes the experimental variation of all inputs, carried out as in Section 3.2.1. Table 2 collects the mean time delay measured between each input variable and the output variable.
The lower the time delay, the higher the relative importance, because if a variation of the input variable occurs, the output variable is disturbed after this time delay. On the other hand, it is important to highlight that the time delay between the reflux timer and the distillate temperature is 44 s. The feed flow rate has an influence on (Td); however, the response of this temperature to a feed variation is particularly slow (210.6 s). Therefore, the feed flow rate has less influence on (Td) than the other input variables. When a time delay is reported as infinite (represented by "∞"), it means that the input variable has no influence on the output variables.
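The paper identifies these delays by visual inspection of step-type experiments. As a complementary illustration only (not the authors' procedure), the short Python sketch below estimates the lag between an input and an output signal from their cross-correlation; the signals and the 11 s sampling period are hypothetical stand-ins for the recorded column data.

```python
import numpy as np

def estimate_delay(u, y, max_lag, dt=11.0):
    """Return the lag (in samples and seconds) maximizing the input/output cross-correlation."""
    u = (u - u.mean()) / u.std()
    y = (y - y.mean()) / y.std()
    corr = [np.mean(u[: len(u) - k] * y[k:]) for k in range(max_lag + 1)]  # corr(u(t), y(t+k))
    best = int(np.argmax(np.abs(corr)))
    return best, best * dt

# Hypothetical example: the output follows the input after 4 samples (~44 s at 11 s sampling).
rng = np.random.default_rng(1)
u = rng.standard_normal(1000)
y = np.roll(u, 4) + 0.1 * rng.standard_normal(1000)
print(estimate_delay(u, y, max_lag=30))   # expected: (4, 44.0)
```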
Table 2 – Mean time delay (in seconds) between the input variables and the output variable.

Output \ Input   Rt    Qb      ΔP     Qf     Tf     F       Qc
Td               44    104.5   93.5   95.3   29.3   201.6   ∞

3.3. Pattern generation

The generation of training data is an important procedure in the development of ANN models. To reach the best performance of the neural network, the training data should represent the full range of operating conditions of the distillation column. This study aims at modelling the distillate temperature (Td) as a function of the input variables of the process, namely ΔP, Rt, Qb, Qf, Qc, Tf, Pr and F. To obtain a database that is rich in amplitudes and in frequencies, the column behaviour is modified over the temperature range Td = 93.4 °C to Td = 95.9 °C. The test data are an independent data set used to verify the consistency and efficiency of the model. For better legibility of the figures, the parameters of the distillation column are represented in Fig. 4(a) and (b). The distillation column was operated continuously for 12 h.

Fig. 4 – (a) Training and test data; (b) training and test data.

3.4. Data normalization

To maintain the influence of smaller data values in comparison with larger input values, the generated experimental data are not introduced directly into the network as training patterns. Moreover, when raw data are applied to the ANN, there is a great risk of the simulated neurons reaching saturation. For these reasons, the experimental data are normalized before being presented to the ANN, which gives equal priority to all the input variables. In this study, data normalization compresses the range of the training data to between 0.1 and 0.9.
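A minimal sketch of such a min-max normalization to the range [0.1, 0.9], together with its inverse for recovering physical units, is given below; the function names and example values are illustrative, since the paper does not give the exact scaling formula.

```python
import numpy as np

def normalize(x, x_min, x_max, lo=0.1, hi=0.9):
    """Linearly map raw measurements from [x_min, x_max] to [lo, hi]."""
    return lo + (hi - lo) * (x - x_min) / (x_max - x_min)

def denormalize(z, x_min, x_max, lo=0.1, hi=0.9):
    """Inverse mapping back to physical units."""
    return x_min + (z - lo) * (x_max - x_min) / (hi - lo)

# Example with a hypothetical distillate-temperature range (°C).
td = np.array([93.4, 94.5, 95.0, 95.9])
z = normalize(td, td.min(), td.max())
print(z)                                   # values between 0.1 and 0.9
print(denormalize(z, td.min(), td.max()))  # recovers the original values
```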
Fig. 5 – Variation of the loss function for the learning and testing data with different numbers of hidden neurons.
Fig. 6 – Evolution of the statistical criteria.
3.5. Reliability of modelling of the distillate temperature (Td)

The numbers of nodes in the input and output layers depend on the numbers of input and output variables, respectively. The number of hidden layers and the number of nodes in each hidden layer affect the generalization capability of the neural network. With too few hidden layers and neurons, the performance of the ANN may not be satisfactory, while with too many hidden neurons there is a risk of overfitting the training data; in that case, the generalization of the ANN to new data is poor. Different methods, both heuristic and systematic, can be chosen in order to select the number of hidden layers and nodes (Haykin, 1999). In this study, the adopted strategy is the following: the initial model has a low number of parameters, and hidden neurons are gradually added during learning until the optimal result is achieved on the test subset. Indeed, neural networks of increasing model order are trained and their performances are compared using the loss function (LF), defined by the following equation:

$$LF = \frac{1}{N}\sum_{i=1}^{N} \varepsilon^2(t) \tag{6}$$

where ε(t) = y(t) − ŷ(t) represents the prediction error and N is the number of data points. In the present work, the number of hidden neurons was varied from 1 to 12, and one hidden layer was used. The complete training of the networks took approximately 80,000 epochs using the Levenberg-Marquardt algorithm. In this case, the model composed of the set of inputs (ΔP, Rt, Qb, Qf, Qc, Tf, Pr and F) and the output (Td) is reduced, according to Eq. (5) and Tables 1 and 2, to

$$T_d(t) = \Phi\big(T_d(t-1),\, T_f(t-3),\, R_t(t-4),\, \Delta P(t-8),\, Q_f(t-8)\big) \tag{7}$$

Fig. 5 shows the variation of the loss function (LF) for the learning and testing data versus the number of hidden nodes (Nh). After training, each ANN is tested with a test data set which was not used for training (Fig. 5). This figure shows that, in the training phase, the loss function decreases when the number of neurons (Nh) in the hidden layer increases. Nh has a significant effect on the performance of the ANN. It is noticed that if Nh > 9, the trained ANN over-fits the data and loses its generality; a discrepancy then appears between the ANN output and the actual target. In conclusion, as seen in Fig. 5, the adequate number (Nh) is clearly equal to 9 neurons for the test data.
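The following Python sketch illustrates this kind of sweep over the number of hidden neurons, computing the loss function of Eq. (6) on training and test subsets for each candidate size. It is an illustration under stated assumptions (synthetic data, scikit-learn's L-BFGS solver instead of Levenberg-Marquardt), not a reproduction of the paper's Matlab computations.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(1200, 5))           # stand-in for the 5 regressors of Eq. (7)
y = np.tanh(X @ np.array([0.8, -0.5, 0.3, 0.2, -0.1])) + 0.02 * rng.standard_normal(1200)
X_tr, y_tr, X_te, y_te = X[:900], y[:900], X[900:], y[900:]

def loss_function(model, X, y):
    """Eq. (6): mean squared prediction error."""
    return np.mean((y - model.predict(X)) ** 2)

for nh in range(1, 13):                               # hidden nodes Nh = 1..12
    net = MLPRegressor(hidden_layer_sizes=(nh,), activation="logistic",
                       solver="lbfgs", max_iter=5000, random_state=0)
    net.fit(X_tr, y_tr)
    print(f"Nh={nh:2d}  LF(train)={loss_function(net, X_tr, y_tr):.5f}  "
          f"LF(test)={loss_function(net, X_te, y_te):.5f}")
```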
The model which has a (5-9-1) structure exhibits an acceptable LF on the test data. The difficult trade-off between model accuracy and complexity can be clarified by using model parsimony indices from linear estimation theory (Ljung, 1999) such as Akaike's Information Criterion (AIC), the Final Prediction Error (FPE) and the Bayesian Information Criterion (BIC). These statistical criteria are given as follows:

$$AIC = N \ln\left(\frac{1}{N}\,LF\right) + 2\, n_w \tag{8}$$

$$FPE = N \ln\left(\frac{1}{N}\,LF\right) + N \ln\left(\frac{N + n_w}{N - n_w}\right) \tag{9}$$

$$BIC = N \ln\left(\frac{1}{N}\,LF\right) + n_w \ln(N) \tag{10}$$
where nw is the number of weights. Hence, the AIC, FPE and BIC are weighted functions of the LF for the validation data set, which penalize reductions in the prediction errors obtained at the expense of increased model complexity (i.e. model order and number of weights). Strict application of these statistical criteria means that the model structure with the minimum AIC, FPE or BIC is selected as the parsimonious structure. Fig. 6 shows the evolution of the criteria. A strict application of the indices selects Nh = 5, because it exhibits the lowest value of the three indices among all the model structures compared. In conclusion, the developed network architecture consists of 5 neurons in the input layer and 5 neurons in the hidden layer. This reduced neural model is considered reliable for describing the dynamic behaviour of the studied distillation column; the identified model is thus reduced from a (9-9-1) neural structure to a (5-5-1) one.

Once the training and the test of the NARX model have been completed, a residual analysis (Fig. 7) is necessary, as discussed in Chetouani (2007). Model validation tests should be performed to validate the identified model. Billings and Voon (1986) proposed some correlation-based model validity tests.
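A small sketch of how the three criteria of Eqs. (8)-(10) could be compared across candidate structures is given below; the loss-function values and weight counts are hypothetical placeholders, not the values obtained in the paper.

```python
import numpy as np

def model_selection_criteria(lf, n, nw):
    """AIC, FPE and BIC of Eqs. (8)-(10) for a loss LF, N data points and nw weights."""
    base = n * np.log(lf / n)
    aic = base + 2 * nw
    fpe = base + n * np.log((n + nw) / (n - nw))
    bic = base + nw * np.log(n)
    return aic, fpe, bic

# Hypothetical candidates: (hidden nodes, test LF, number of weights) placeholders.
candidates = [(3, 1.4e-3, 22), (5, 1.1e-3, 36), (9, 1.0e-3, 64)]
n_data = 4000
for nh, lf, nw in candidates:
    aic, fpe, bic = model_selection_criteria(lf, n_data, nw)
    print(f"Nh={nh}  AIC={aic:.1f}  FPE={fpe:.1f}  BIC={bic:.1f}")
# The structure minimizing the criteria would be retained as the parsimonious model.
```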
Fig. 7 – Evolution of the residual.
Fig. 8 – Evolution of the distillate temperature (Td) caused by the chosen fault (fault time: 8173 s).
In order to validate the identified model, it is necessary to evaluate the properties of the errors that affect the prediction of the model outputs, defined as the differences between the experimental and simulated time series. In general, the characteristics of the error are considered satisfactory when the error behaves as white noise, i.e. it has a zero mean and is not correlated (Billings and Voon, 1986; Cammarata et al., 2002). If both these conditions are satisfied, it means that the identified model has captured the deterministic part of the system dynamics, which is therefore accurately modelled (Chetouani, 2007). This section shows that ANN modelling has many attractive properties for the modelling of complex production systems such as a distillation column: universal function approximation capability, resistance to noise and good generalization ability.
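A simple numerical whiteness check in this spirit is sketched below: it computes the sample autocorrelation of the residual sequence and compares it with the usual ±1.96/√N confidence bounds. This is one common form of such a test, offered as an illustration rather than the exact Billings and Voon procedure; the residual series is a hypothetical placeholder.

```python
import numpy as np

def autocorrelation(residuals, max_lag=20):
    """Normalized sample autocorrelation of the residuals up to max_lag."""
    r = residuals - residuals.mean()
    denom = np.sum(r**2)
    return np.array([np.sum(r[:len(r) - k] * r[k:]) / denom for k in range(1, max_lag + 1)])

def looks_white(residuals, max_lag=20):
    """True if all autocorrelation coefficients stay within the 95% bounds."""
    bound = 1.96 / np.sqrt(len(residuals))
    return bool(np.all(np.abs(autocorrelation(residuals, max_lag)) < bound))

# Hypothetical residual sequence standing in for the one plotted in Fig. 7.
rng = np.random.default_rng(2)
eps = 0.05 * rng.standard_normal(3000)
print("near-zero mean:", abs(eps.mean()) < 3 * eps.std() / np.sqrt(len(eps)))
print("uncorrelated:", looks_white(eps))
```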
3.6. Application of the developed fault detection
Once the identified ANN has been trained and tested, it is ready for detecting faults. This developed neural model is used both for the prediction and for the FD procedure based on the analysis of the residuals. The proposed method is based on the Bayes classifier, which has as inputs the set Td(t − 1), Tf(t − 3), Rt(t − 4), ΔP(t − 8), Qf(t − 8) and as output the probability p(Ni/ε) indicating whether the system is in the normal or the abnormal mode. In this section, the proposed FD is applied to an experimental fault to verify its workability and effectiveness under real measurement conditions. Ten fault scenarios are realized experimentally, as listed in Table 3. To illustrate the adopted approach for fault detection, it was decided to detect, among these faults, a sudden closing of the reflux timer (Rt = 0%). This fault is frequent in distillation operation and introduces a large deviation in comparison with the normal behaviour. It is important to notice that this fault, which occurs at 8173 s, causes a large decrease of the distillate temperature (Td). This evolution is shown in Fig. 8. This decrease should be detected by the Bayes classifier because it exceeds the statistical threshold (0.5). This statistical acceptance threshold is shown clearly in Fig. 9, which represents the evolution of the Bayes classifier; two operating zones are exposed, a fault region and a confidence one. It is important to notice that the fault, which occurs at 8173 s, is detected at 8217 s, i.e. with a delay of 44 s, which corresponds to a difference (ΔT ≈ 0.7 °C) between the desired distillate temperature (Td = 95 °C) and the faulty one (Td = 94.3 °C) (Table 3).
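Putting the pieces together, the sketch below outlines an on-line detection loop of the kind described here: at each sample the neural model predicts Td, the residual is formed, the class variances are updated over a short window, and the posterior probability of the normal class is compared with the 0.5 threshold. The predictor, window length and variance values are illustrative assumptions, not the paper's implementation.

```python
import numpy as np
from collections import deque

def posterior_normal(eps, var_normal, var_fault, prior=0.5):
    """Two-class Gaussian posterior p(N1 | eps) of Eq. (2)."""
    lik = lambda v: np.exp(-0.5 * eps**2 / v) / np.sqrt(2 * np.pi * v)
    num = lik(var_normal) * prior
    return num / (num + lik(var_fault) * (1 - prior))

def detect_online(measured_td, predicted_td, window=20, var_fault=0.25):
    """Yield (time index, posterior, 'normal'/'fault') for each new sample."""
    recent = deque(maxlen=window)                   # residuals of the current interval
    for k, (y, y_hat) in enumerate(zip(measured_td, predicted_td)):
        eps = y - y_hat                             # residual eps(k) = y(k) - y_hat(k)
        recent.append(eps)
        var_normal = max(np.var(recent), 1e-6)      # simplified running estimate for class N1
        p_n1 = posterior_normal(eps, var_normal, var_fault)
        yield k, p_n1, ("normal" if p_n1 > 0.5 else "fault")

# Hypothetical signals: the prediction tracks Td until a step change at sample 60.
t = np.arange(120)
measured = 95.0 + 0.02 * np.sin(0.3 * t) - 0.7 * (t >= 60)
predicted = 95.0 + 0.02 * np.sin(0.3 * t)
alarms = [k for k, p, label in detect_online(measured, predicted) if label == "fault"]
print("first alarm at sample:", alarms[0] if alarms else None)
```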
Fig. 9 – Results of the Bayes classifier for the selected fault (Fault 10): statistical test versus time, with the 0.5 threshold and the fault and detection times marked.

Table 3 – List of experimental faults.

Number         Fault type
Normal mode    No fault: normal conditions are given in Table 1
Fault 1        Vacuum pressure (Pr) is set equal to atmospheric pressure (Patm)
Fault 2        Vacuum pressure (Pr) falls from 800 mbar to 700 mbar
Fault 3        Heating power (Qb) from 68 to 0%
Fault 4        Heating power (Qb) from 68 to 100%
Fault 5        Feed pump (F) from 80 to 0%
Fault 6        Feed pump (F) from 80 to 100%
Fault 7        Feed preheating power (Qf) from 16 to 0%
Fault 8        Feed preheating power (Qf) from 16 to 100%
Fault 9        Reflux (Rt) from 7 to 100%
Fault 10       Reflux (Rt) from 7 to 0%
Fig. 9 indicates that the Bayes classifier correctly identifies the nature of the class on-line, for both the normal mode and the abnormal one. A good classification of the normal mode is observed (statistical test > 0.5). When the fault occurs, the Bayes classifier indicates that the behaviour of the residual is modified, and the observation is assigned to the class N2 rather than to the class N1 (statistical test < 0.5). This section has described the Bayes approach developed with a simple, reduced and reliable neural model. It shows that FD is essentially a classification problem, where a particular measurement vector in time has to be classified as belonging to one of n a priori determined classes. Given such a problem definition, the decision rule that produces the lowest error rate and maximizes the a posteriori probability is the Bayes classifier, as given in Eq. (2).
4. Conclusion
Fault detection is a vital problem with a large impact on the safety and optimal operation of complex chemical separation units such as distillation columns. The FD problem for industrial systems modelled by ANNs has been addressed in this paper. The proposed reduced model performs the FD task by comparing the actual behaviour of the distillation column with that predicted by a connectionist model of the normal condition operation. This study shows that the combination of the ANN and the Bayes classifier solves the FD problem efficiently. The experimental results show that the one-hidden-layer perceptron network provides promising assignments to the normal and faulty states of the investigated reference process. In the chemical process industry, there is a demand for data-based methods due to the complexity and limited availability of the chemical models. The dedicated example indicates the strength of the
proposed approach for the reliability of the models, the prediction of future data and fault detection. This makes the combination very attractive for solving modelling issues and performing fault detection in the real-time operation of complex plants such as a distillation column.
References

Venkatasubramanian, V., Rengaswamy, R., Yin, K., Kavuri, S., 2003. A review of process fault detection and diagnosis. Part I: Quantitative model-based methods. Comput. Chem. Eng. 27, 293–311.
Patton, R.J., Frank, P.M., Clark, R.N., 2000. Issues of Fault Diagnosis for Dynamic Systems. Springer, London.
Chetouani, Y., 2006. Application of the generalized likelihood ratio test for detecting changes in a chemical reactor. Process Saf. Environ. Prot. 84, 371–377.
Isermann, R., 2006. Fault-Diagnosis Systems. An Introduction from Fault Detection to Fault Tolerance. Springer, Berlin.
Simani, S., Fantuzzi, C., Beghelli, S., 2000. Diagnosis techniques for sensor faults of industrial processes. IEEE Trans. Control Syst. Technol. 8, 848–855.
Korbicz, J., Koscielny, J.M., Kowalczuk, Z., Cholewa, W., 2004. Fault Diagnosis: Models, Artificial Intelligence, Applications, first ed. Springer, Berlin.
Scola, H.R., Nikoukah, R., Delebecque, F., 2003. Test signal design for failure detection: a linear programming approach. Int. J. Appl. Math. Comput. Sci. 13, 515–526.
Witczak, M., Korbicz, J., Mrugalski, M., Patton, R.J., 2007. A GMDH neural network based approach to robust fault detection and its application to solve the DAMADICS benchmark problem. Control Eng. Pract. 14, 671–683.
Chetouani, Y., 2008. Design of a multi-model observer-based estimator for fault detection and isolation (FDI) strategy: application to a chemical reactor. Braz. J. Chem. Eng. 25, 777–788.
Chiang, L.H., Russell, R.E.L., Braatz, R.D., 2001. Fault Detection and Diagnosis in Industrial Systems. Springer, London.
Isermann, R., 2005. Model-based fault-detection and diagnosis – status and applications. Ann. Rev. Control 29, 71–85.
Ferentinos, K.P., 2005. Biological engineering applications of feedforward neural networks designed and parameterized by genetic algorithms. Neural Netw. 18, 934–950.
Bouthiba, T., 2004. Fault location in EHV transmission lines using artificial neural networks. Int. J. Appl. Math. Comput. Sci. 14, 69–78.
Gupta, M.M., Liang, J., Homma, N., 2003. Static and Dynamic Neural Networks. Wiley, Hoboken, New Jersey.
Nelles, O., 2001. Non-linear Systems Identification. From Classical Approaches to Neural Networks and Fuzzy Models. Springer, Berlin.
Tan, S.C., Lim, C.P., Rao, M.V.C., 2007. A hybrid neural network model for rule generation and its application to process fault detection and diagnosis. Eng. Appl. Artif. Intell. 20, 203–213.
Uraikul, V., Chan, W.C., Tontiwachwuthikul, P., 2007. Artificial intelligence for monitoring and supervisory control of process systems. Eng. Appl. Artif. Intell. 20, 115–131.
Vieira, J., Dias, F.M., Mota, A., 2004. Artificial neural networks and neuro-fuzzy systems for modelling and controlling real systems: a comparative study. Eng. Appl. Artif. Intell. 17, 265–273.
Chetouani, Y., 2007. Modelling and prediction of the dynamic behaviour in a reactor-exchanger using NARMAX neural structure. Chem. Eng. Commun. 194, 691–705.
Vapnik, V.N., 1995. The Nature of Statistical Learning Theory. Springer.
Duda, R.O., Hart, P.E., Stork, D.G., 2001. Pattern Classification, second ed. Wiley.
Cover, T., Hart, P., 1967. Nearest neighbor pattern classification. IEEE Trans. Inf. Theory 13, 21–27.
Tiplica, T., Kobi, A., Barreau, A., 2003. Optimisation et maîtrise des processus multivariés. La méthode FNAD. J. Eur. Syst. Automat. 37, 477–500.
Langley, P., Iba, W., Thompson, K., 1992. An analysis of Bayesian classifiers. In: National Conference on Artificial Intelligence.
Sahami, M., 1996. Learning limited dependence Bayesian classifiers. In: Second International Conference on Knowledge Discovery in Databases.
Litovski, V., Andrejević, M., Zwolinski, M., 2006. Analogue electronic circuit diagnosis based on ANNs. Microelectron. Reliab. 46, 1382–1391.
Dupret, G., Koda, M., 2001. Bootstrap re-sampling for unbalanced data in supervised learning. Eur. J. Oper. Res. 134, 141–156.
Hecht-Nielsen, R., 1990. Neurocomputing. Addison-Wesley, Reading, MA.
Babnik, T., Gubina, F., 2002. Two approaches to power transformer fault classification based on protection signals. Int. J. Electr. Power Energy Syst. 24, 459–468.
Sharma, R., Singh, K., Singhal, D., Ghosh, R., 2004. Neural network applications for detecting process faults in packed towers. Chem. Eng. Process. 43, 841–847.
Previdi, F., 2002. Identification of black-box nonlinear models for lower limb movement control using functional electrical stimulation. Control Eng. Pract. 10, 91–99.
Zaknich, A., 2003. Neural Networks for Intelligent Signal Processing. World Scientific, Singapore.
Ljung, L., 1999. System Identification, Theory for the User. Prentice Hall, Englewood Cliffs, NJ.
Chen, T.C., Han, D.J., Au, F.T.K., Tham, L.G., 2003. Acceleration of Levenberg-Marquardt training of neural networks with variable decay rate. In: Proceedings of the International Joint Conference, vol. 3, pp. 1873–1878.
Garson, G.D., 1991. Interpreting neural-network connection weights. AI Expert, 47–51.
Schittenkopf, C., Deco, G., Brauer, W., 1997. Two strategies to avoid overfitting in feedforward networks. Neural Netw. 10, 505–516.
Kramer, M.A., 1991. Nonlinear principal component analysis using autoassociative neural networks. AIChE J. 37, 233–243.
McLain, R.B., Henson, M.A., 2000. Principal component analysis for nonlinear model reference adaptive control. Comput. Chem. Eng. 24, 99–110.
Tholodur, A., Ramirez, W.F., McMillan, J.D., 2000. Interpolated parameter functions for neural network models. Comput. Chem. Eng. 24, 2545–2553.
Haykin, S., 1999. Neural Networks. Prentice Hall.
Billings, S.A., Voon, W.S.F., 1986. Correlation based model validity tests for nonlinear models. Int. J. Control 44, 235–244.
Cammarata, L., Fichera, A., Pagano, A., 2002. Neural prediction of combustion instability. Appl. Energy 72, 513–528.