Accepted Manuscript
Midpoint-radii principal component analysis-based EWMA and application to air quality monitoring network
M. Mansouri, M.-F. Harkat, M. Nounou, H. Nounou
PII: S0169-7439(17)30108-9
DOI: 10.1016/j.chemolab.2018.01.016
Reference: CHEMOM 3589
To appear in: Chemometrics and Intelligent Laboratory Systems
Received Date: 10 February 2017
Revised Date: 27 December 2017
Accepted Date: 27 January 2018
Please cite this article as: M. Mansouri, M.-F. Harkat, M. Nounou, H. Nounou, Midpoint-radii principal component analysis -based EWMA and application to air quality monitoring network, Chemometrics and Intelligent Laboratory Systems (2018), doi: 10.1016/j.chemolab.2018.01.016. This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
ACCEPTED MANUSCRIPT
Midpoint-Radii Principal Component Analysis-based EWMA and Application to Air Quality Monitoring Network
Mansouri M.^a,∗, Harkat M.-F.^a, Nounou M.^a, Nounou H.^b
^a Electrical and Computer Engineering Program, Texas A&M University at Qatar, Doha, Qatar
^b Chemical Engineering Program, Texas A&M University at Qatar, Doha, Qatar
Abstract
Monitoring air quality is crucial for the safety of humans and the environment. Real-world data collected from air quality networks are often affected by different types of errors, such as measurement noise and variability of pollutant concentrations. The uncertainty in the data, which is strictly connected to these errors, may be treated by considering interval-valued data analysis. In practice, the true value of a quantity cannot be measured: the data collected on a process are only approximations given by sensors, and are thus imprecise. This is due mainly to the uncertainties induced by measurement errors or determined by specific experimental conditions. Thus, the main aim of this paper is to develop enhanced monitoring of an air quality network that takes into account the uncertainties in the data. To do so, we develop a new monitoring technique that merges the advantages of midpoint-radii PCA (MRPCA) with the exponentially weighted moving average (EWMA) chart, in order to enhance sensor fault detection in the air quality monitoring process. MRPCA is one of the most popular interval multivariate statistical methods; it is able to tackle uncertainties in the models and thereby improve fault detection abilities. On the other hand, the EWMA statistic assigns exponentially decreasing weights to successive observations, which enables it to detect small and moderate faults. The developed MRPCA-based EWMA method uses MRPCA as the modeling framework for fault detection and EWMA as the detection chart. The proposed MRPCA-based EWMA scheme is illustrated using a simulation example and applied to sensor fault detection in an air quality monitoring network. The monitoring performance of the developed technique is compared to that of classical monitoring techniques: the MRPCA model is compared with the interval PCA models complete-information principal component analysis (CIPCA) and centers PCA (CPCA), and the MRPCA-based EWMA monitoring performance is compared to that of the MRPCA-based Shewhart, generalized likelihood ratio test (GLRT) and squared prediction error (SPE) techniques.

∗ Mansouri M. ([email protected])
Preprint submitted to Chemometrics and Intelligent Laboratory Systems, February 12, 2018
Keywords: Sensor fault detection, Midpoint-radii, Principal component analysis, Exponentially weighted moving average, Air quality monitoring network.

1. Introduction
Air pollution poses a significant threat to human health and quality of life. To protect public health, air quality is monitored through several techniques, the most widely used of which are model-based approaches. Most existing models account for atmospheric chemical reactions and the emissions of primary pollutants. These models therefore involve a large number of parameters, are computationally costly, and need measurements that are seldom available in air quality monitoring networks [1, 2]. Air quality monitoring is thus a sensor data validation problem, which requires i) process modeling, ii) sensor fault detection, iii) sensor fault isolation, and iv) correction. In this work, we focus on the first two tasks: process modeling and sensor fault detection. Principal component analysis (PCA)-based fault detection is a well-established data-driven approach that has long been valued for its performance. However, data are often affected by different types of errors and uncertainties, including measurement noise, sensor imprecision and variability of the measured quantity. These uncertainties degrade the identified PCA model and, consequently, the fault detection performance. To represent real data more precisely, this uncertainty can be treated by using an interval representation instead of a single-valued one. In this case, the PCA model must be identified with techniques adapted to interval-valued data. The first interval principal component analysis (IPCA) methods were the centers and vertices methods proposed by Cazes et al. [3] and Chouakria [4]. The centers PCA (CPCA) method uses the matrix of interval centers to compute the principal components. Thus, the centers method only utilizes the variation between intervals, while ignoring the variation within each interval. Lauro and Palumbo [5] proposed the symbolic object and range-transformation methods to eliminate some of the shortcomings of the vertices method. The symbolic object method introduces an additional Boolean transformation matrix, whose purpose is to remove any interdependency between the vertices. However, this approach still uses only the variance between the vertices, thereby ignoring part of the internal variance of the dataset. The work in [6] proposes new techniques that take into consideration the internal structure of symbolic variables. The authors of [7] proposed a three-way PCA of interval data to extract the main dynamic features of the copper futures market and reduce the dimension of the variable space. A new interval PCA method with an enhanced covariance matrix calculation, called complete-information principal component analysis (CIPCA), is proposed in [8]. The authors of [9] introduced a method based on both interval centers and interval ranges, called midpoints-radii PCA (MRPCA), which enhances CPCA by including the radii of the data. In this work, we use the CIPCA, CPCA and MRPCA methods to address the modeling problem for process monitoring purposes.

Fault detection is a central task in process monitoring, and several fault detection techniques based on single-valued data have been developed in the literature [10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23].
Different detection charts are used to detect faults in systems, such as the cumulative sum (CUSUM) chart, the Shewhart chart [24], the exponentially weighted moving average (EWMA) chart [25], the generalized likelihood ratio test (GLRT) [17] and the squared prediction error (SPE) index [26]. The main weakness of the Shewhart chart is that it uses only the last data point and ignores older ones, i.e., it has no memory. As a result, small shifts in the mean of a random variable are unlikely to be detected quickly. In contrast, the CUSUM and EWMA charts perform better than the Shewhart chart in detecting small faults. The CUSUM statistic assigns equal weight to every observation and is generally slow to respond to large shifts. Regarding fault detection with interval-valued data, the authors of [27, 28] used the SPE index as well as the GLRT chart based on an interval PCA model for detecting mean shifts. On the other hand, the EWMA chart has shown improved abilities over the SPE and GLRT charts for detecting small and moderate faults, since it assigns exponentially decreasing weights to successive observations [29]. EWMA is used to monitor variables in statistical process control; unlike control charts that treat each data point individually, it accumulates information from past observations [30, 31]. The advantage of the EWMA chart in fault detection is mainly due to its extensive process memory and its resulting ability to detect small faults. Therefore, the contribution of this paper is to extend techniques proposed for single-valued data to interval-valued data, and to develop a new sensor fault detection method that combines the benefits of the EWMA chart with those of the interval-valued MRPCA model. The rest of the paper is organized as follows. Section 2 describes the MRPCA method, which is used with the EWMA chart for fault detection. Section 3 presents the developed MRPCA-based EWMA technique, which integrates the MRPCA model with the EWMA chart. Section 4 studies the fault detection performance using two examples: synthetic data and air quality data. Finally, conclusions are presented in Section 5.

2. Midpoint-radii PCA (MRPCA) method description
2.1. Interval-valued data description
Air quality monitoring network systems are often affected by different types of uncertainties, due mainly to measurement noise and variability of pollutant concentrations. These uncertainties may be treated by considering interval-valued data. When a quantity is measured using a sensor, the actual value $x_j^*(k)$ of the quantity can differ from the measured result $x_j^c(k)$, the measurement error being $\delta x_j(k) = x_j^c(k) - x_j^*(k)$. Once a measurement result $x_j^c(k)$ is obtained, we know that the actual (unknown) value $x_j^*(k)$ of the measured quantity belongs to the interval $[x_j(k)] = [x_j^-(k), x_j^+(k)]$, where $x_j^-(k) = x_j^c(k) - |\delta x_j(k)|$ and $x_j^+(k) = x_j^c(k) + |\delta x_j(k)|$. An interval-valued datum $[x(k)]$ is a set of real numbers enclosed in an interval on the real line, usually expressed as $[x(k)] = [x^-(k), x^+(k)]$, where $x^-(k), x^+(k) \in \mathbb{R}$ and $x^-(k) \le x^+(k)$.

We now describe the properties of interval-valued variables [32]. An interval-valued variable $[X_j] \subset \mathbb{R}$ is represented by a series of sets of values delimited by ordered pairs of bounds, referred to as minimum and maximum: $[X_j] = \{[x_j(1)], [x_j(2)], \dots, [x_j(n)]\}$, where $[x_j(k)] \equiv [x_j^-(k), x_j^+(k)]$ for all $k \in \{1, \dots, n\}$ and $x_j^-(k) \le x_j^+(k)$. The generic interval $[x_j(k)]$ can also be expressed by the pair $\{x_j^c(k), x_j^r(k)\}$ of its midpoint and radius, and this relationship is one-to-one (biunivocal), where:

$$x_j^c(k) = \frac{1}{2}\left(x_j^+(k) + x_j^-(k)\right) \tag{1}$$

and

$$x_j^r(k) = \frac{1}{2}\left(x_j^+(k) - x_j^-(k)\right) \tag{2}$$

Before presenting PCA methods for interval-valued data, let us define an interval-valued data matrix. Let $[X]$ be an $n \times m$ data matrix:

$$[X] = \begin{pmatrix} [x_1^-(1), x_1^+(1)] & \cdots & [x_m^-(1), x_m^+(1)] \\ \vdots & \ddots & \vdots \\ [x_1^-(n), x_1^+(n)] & \cdots & [x_m^-(n), x_m^+(n)] \end{pmatrix} \tag{3}$$

where $x_j^-(k) \le x_j^+(k)$ for all $k = 1, 2, \dots, n$ and $j = 1, 2, \dots, m$.
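The midpoint-radius representation of eqs. (1) and (2), and its inverse, can be sketched in a few lines of Python. This is an illustrative sketch rather than the authors' code; the helper names (`interval_from_measurement`, `to_midpoint_radius`, `from_midpoint_radius`) and the example reading and error bound are hypothetical.

```python
from typing import Tuple

Interval = Tuple[float, float]  # (lower bound x^-, upper bound x^+)


def interval_from_measurement(x_meas: float, delta: float) -> Interval:
    """Interval [x^c - |delta|, x^c + |delta|] around a measured value x^c."""
    half_width = abs(delta)
    return (x_meas - half_width, x_meas + half_width)


def to_midpoint_radius(iv: Interval) -> Tuple[float, float]:
    """Eqs. (1)-(2): midpoint x^c = (x^+ + x^-)/2 and radius x^r = (x^+ - x^-)/2."""
    lo, hi = iv
    if lo > hi:
        raise ValueError("lower bound must not exceed upper bound")
    return ((hi + lo) / 2.0, (hi - lo) / 2.0)


def from_midpoint_radius(center: float, radius: float) -> Interval:
    """Inverse map back to the bounds; with eqs. (1)-(2) it shows the
    one-to-one correspondence between {x^-, x^+} and {x^c, x^r}."""
    return (center - radius, center + radius)


# Hypothetical reading of 40.0 with an assumed error bound of 2.5.
iv = interval_from_measurement(40.0, 2.5)   # (37.5, 42.5)
c, r = to_midpoint_radius(iv)               # (40.0, 2.5)
assert from_midpoint_radius(c, r) == iv
```

Storing intervals as (midpoint, radius) pairs is what the MRPCA method below works with, while the (lower, upper) form is what the residuals and charts in Section 3 use.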
2.2. Classical PCA
Let $x(k) = [x_1(k)\; x_2(k)\; \dots\; x_m(k)]^T$ denote the sample measurement vector at time $k$. Assuming that there are $N$ samples for each sensor, a data matrix $X = [x(1)\; x(2)\; \dots\; x(N)]^T \in \mathbb{R}^{N \times m}$ is formed, with each row representing a sample $x(k)^T$. PCA determines a transformation of the data matrix $X$ that maximizes the variance of the projections:

$$X = T P^T, \qquad \Sigma = P \Lambda P^T \tag{4}$$

with $T = [t_1\; t_2\; \dots\; t_m] \in \mathbb{R}^{N \times m}$ the matrix of principal components, $\Sigma$ the covariance matrix of $X$, and

$$T = X P, \qquad P P^T = P^T P = I_m \tag{5}$$

where $\Lambda = \operatorname{diag}(\lambda_1, \dots, \lambda_m)$ is a diagonal matrix whose diagonal elements are sorted in decreasing order: $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_m$. The eigenvalue, eigenvector and principal component matrices can be partitioned as:

$$\Lambda = \begin{pmatrix} \hat{\Lambda}_\ell & 0 \\ 0 & \tilde{\Lambda}_{m-\ell} \end{pmatrix} \tag{6}$$

$$P = \begin{pmatrix} P_\ell & P_{m-\ell} \end{pmatrix}, \qquad T = \begin{pmatrix} T_\ell & T_{m-\ell} \end{pmatrix} \tag{7}$$

where $\ell$ is the number of principal components retained in the PCA model. Equation (4) can be rewritten as:

$$X = T_\ell P_\ell^T + T_{m-\ell} P_{m-\ell}^T = \hat{X} + \tilde{X} \tag{8}$$

with

$$\hat{X} = X C_\ell, \qquad \tilde{X} = X C_{m-\ell} \tag{9}$$

where $C_\ell = P_\ell P_\ell^T$ and $C_{m-\ell} = I_m - C_\ell$ constitute the PCA model. The matrices $\hat{X}$ and $\tilde{X}$ represent, respectively, the modeled and unmodeled variations of $X$ using the first $\ell$ principal components ($\ell < m$). A sample vector $x(k) \in \mathbb{R}^m$ can be projected on the principal subspace as

$$\hat{x}(k) = P_\ell t_\ell(k) = C_\ell x(k) \tag{10}$$

where

$$t_\ell(k) = P_\ell^T x(k) \in \mathbb{R}^\ell \tag{11}$$

is the vector of the scores of the $\ell$ latent variables. The residual vector, i.e., the projection on the residual subspace, is given by:

$$r(k) = x(k) - \hat{x}(k) = (I_m - C_\ell)\, x(k) \tag{12}$$

2.3. Midpoints-Radii PCA Method
Midpoints-radii PCA (MRPCA) is one of the best-known data-driven techniques for process modeling with interval data. First, an MRPCA model is built off-line using fault-free data. From the obtained MRPCA model, interval residuals are generated and used for process monitoring. Problems with the statistical analysis of interval data using standard interval arithmetic can be avoided by representing intervals by their midpoints and ranges. MRPCA for interval-valued data, introduced in [33, 9], is a hybrid method that improves on CPCA by including the radii. MRPCA is formulated in terms of the midpoints ($X^c$) and the radii, or midranges ($X^r$), given in equations (1) and (2), and of their interconnection.
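MRPCA applies the classical PCA model of Section 2.2 to the midpoint and radius matrices, so it helps to see that model in code first. The NumPy sketch below is an illustration, not the authors' implementation: it builds $C_\ell$ from the covariance eigen-decomposition (eqs. (4) to (9)) and splits a sample into its modeled part and residual (eqs. (10) and (12)). The function names and the synthetic data are hypothetical.

```python
import numpy as np


def fit_pca_model(X: np.ndarray, n_components: int):
    """Classical PCA model of Section 2.2 (eqs. (4)-(9)).

    X is an N x m matrix assumed to be already scaled (zero mean, unit variance).
    Returns the sorted eigenvalues, the loadings P_ell and C_ell = P_ell P_ell^T.
    """
    n_samples = X.shape[0]
    sigma = X.T @ X / (n_samples - 1)          # covariance matrix Sigma
    eigvals, eigvecs = np.linalg.eigh(sigma)   # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]          # lambda_1 >= ... >= lambda_m
    P_ell = eigvecs[:, order[:n_components]]   # first ell loading vectors
    C_ell = P_ell @ P_ell.T
    return eigvals[order], P_ell, C_ell


def project(C_ell: np.ndarray, x: np.ndarray):
    """Eqs. (10) and (12): modeled part x_hat and residual r of one sample x."""
    x_hat = C_ell @ x
    return x_hat, x - x_hat                    # residual r = (I - C_ell) x


# Synthetic, nearly rank-one data (illustrative only).
rng = np.random.default_rng(0)
t = rng.normal(size=500)
X = np.column_stack([t, 2 * t, -t]) + 0.01 * rng.normal(size=(500, 3))
X = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
lam, P1, C1 = fit_pca_model(X, n_components=1)
x_hat, r = project(C1, X[0])
```

Because $C_\ell$ is an orthogonal projector, the residual $r$ is orthogonal to the retained loadings ($P_\ell^T r = 0$), which is what makes it a natural quantity to monitor.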
According to MRPCA [9], two independent PCAs are applied to these two matrices. The solutions are given by the following eigen-systems:

$$X^c\, \Sigma^{-1} P^c = \Lambda^c P^c \tag{13}$$

$$X^r\, \Sigma^{-1} P^r = \Lambda^r P^r \tag{14}$$

where $\Lambda^c, P^c$ and $\Lambda^r, P^r$ are, respectively, the eigenvalues and eigenvectors of the two partial eigen-decompositions of the midpoint and radius matrices, and $\Sigma$ is the covariance matrix given by:

$$\Sigma = X^{cT} X^c + X^{rT} X^r + X^{cT} X^r + X^{rT} X^c \tag{15}$$

i.e., $\Sigma = (X^c + X^r)^T (X^c + X^r)$. To obtain a coherent graphical representation of the statistical units in the MRPCA model, the rotated radius coordinates are superimposed on the midpoint principal components as supplementary points. This can be achieved by maximizing the Tucker congruence coefficient between midpoints and radii [34], or by using a rotation matrix $A = Q P^T$ [9] obtained from the following singular value decomposition:

$$X^{cT} X^r = P \Lambda^{cr} Q^T \tag{16}$$

Let $C_\ell^c = P_\ell^c P_\ell^{cT}$ and $C_\ell^r = P_\ell^r P_\ell^{rT}$ be the PCA models of the midpoint and radius data matrices, respectively. The interval-valued estimations based on the MRPCA model are then given by:

$$\hat{x}^c(k) = C_\ell^c\, x^c(k), \qquad \hat{x}^r(k) = C_\ell^r\, x^r(k) \tag{17}$$

and

$$\hat{x}^-(k) = \hat{x}^c(k) - A\,\hat{x}^r(k), \qquad \hat{x}^+(k) = \hat{x}^c(k) + A\,\hat{x}^r(k) \tag{18}$$

In the proposed MRPCA-based EWMA technique, MRPCA performs the modeling phase, while the EWMA chart is used to detect faults. The EWMA chart is described in the next section.
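A simplified NumPy sketch of the MRPCA estimation step follows. It is an assumption-laden illustration rather than the method of [9]: it runs an ordinary PCA on $X^c$ and $X^r$ instead of solving the $\Sigma$-weighted eigen-systems (13) and (14), then builds the rotation $A = QP^T$ from the SVD (16) and forms the interval estimates (17) and (18). The function names (`pca_projector`, `fit_mrpca`, `estimate_bounds`) and the test data are hypothetical.

```python
import numpy as np


def pca_projector(M: np.ndarray, n_components: int) -> np.ndarray:
    """C_ell = P_ell P_ell^T from an ordinary eigen-decomposition of M^T M / (n - 1)."""
    eigvals, eigvecs = np.linalg.eigh(M.T @ M / (M.shape[0] - 1))
    P_ell = eigvecs[:, np.argsort(eigvals)[::-1][:n_components]]
    return P_ell @ P_ell.T


def fit_mrpca(lower: np.ndarray, upper: np.ndarray, n_components: int):
    """Simplified MRPCA fit from n x m lower/upper bound matrices (scaled, fault-free).

    Assumption: an ordinary PCA is run on each of X^c and X^r; the Sigma-weighted
    eigen-systems (13)-(14) are not reproduced.
    """
    Xc = (upper + lower) / 2.0              # midpoints, eq. (1)
    Xr = (upper - lower) / 2.0              # radii, eq. (2)
    Cc = pca_projector(Xc, n_components)
    Cr = pca_projector(Xr, n_components)
    U, _, Vt = np.linalg.svd(Xc.T @ Xr)     # Xc^T Xr = P Lambda^cr Q^T, eq. (16)
    A = Vt.T @ U.T                          # A = Q P^T with P = U, Q = Vt^T
    return Cc, Cr, A


def estimate_bounds(model, x_lower: np.ndarray, x_upper: np.ndarray):
    """Eqs. (17)-(18): interval estimate [x_hat^-, x_hat^+] of one sample."""
    Cc, Cr, A = model
    xc_hat = Cc @ ((x_upper + x_lower) / 2.0)
    xr_hat = Cr @ ((x_upper - x_lower) / 2.0)
    return xc_hat - A @ xr_hat, xc_hat + A @ xr_hat


# Illustrative synthetic intervals: correlated centers, small radii.
rng = np.random.default_rng(1)
t = rng.normal(size=(300, 1))
centers = np.hstack([t, 2 * t, t + 0.05 * rng.normal(size=(300, 1))])
radii = 0.1 + 0.01 * rng.random(size=(300, 3))
model = fit_mrpca(centers - radii, centers + radii, n_components=1)
lo_hat, hi_hat = estimate_bounds(model, centers[0] - radii[0], centers[0] + radii[0])
```

`np.linalg.svd` returns $U$, the singular values and $V^T$ with $X^{cT}X^r = U\,\mathrm{diag}(s)\,V^T$, so the code maps $P = U$ and $Q = V$; the resulting $A$ is orthogonal.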
3. Exponentially weighted moving average for interval-valued data
3.1. Exponentially weighted moving average for single-valued data
The EWMA chart was introduced by Roberts in 1959 under the name geometric moving average (GMA) chart [35]. The GMA chart later became popularly known as the EWMA chart [36]. Like the CUSUM chart [37], the EWMA chart is capable of detecting smaller shifts in the mean than the Shewhart chart [24]. The single-valued EWMA statistic $Z$ may be calculated as [38]:

$$Z_i = \lambda X_i + (1-\lambda) Z_{i-1}, \qquad i = 1, \dots, N \tag{19}$$

where $\lambda$ ($0 < \lambda \le 1$) is a smoothing parameter that sets the memory of the detection statistic and $X_i$ is the value of the $i$-th individual observation. The initial value $Z_0$ is set equal to the in-control process mean, or target value, $\mu_0$. The EWMA statistic detects a fault in the process when $Z_i$ exceeds the control limits. The upper (UCL) and lower (LCL) control limits of the EWMA chart may be calculated as [39]:

$$UCL = \mu_0 + L\sigma\sqrt{\frac{\lambda}{2-\lambda}\left[1-(1-\lambda)^{2i}\right]} \tag{20}$$

$$LCL = \mu_0 - L\sigma\sqrt{\frac{\lambda}{2-\lambda}\left[1-(1-\lambda)^{2i}\right]} \tag{21}$$

where $L$ is the control width of the EWMA chart and $\sigma$ is the in-control standard deviation of $X$. As $i$ increases, the term $[1-(1-\lambda)^{2i}]$ tends to unity, and the steady-state limits become [39]:

$$UCL = \mu_0 + L\sigma\sqrt{\frac{\lambda}{2-\lambda}} \tag{22}$$

$$LCL = \mu_0 - L\sigma\sqrt{\frac{\lambda}{2-\lambda}} \tag{23}$$

When the EWMA statistic lies between the control limits, the null hypothesis (no fault) is accepted; when it exceeds a limit, a fault is declared in the system.
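Eqs. (19) to (23) translate directly into code. The plain-Python sketch below computes the EWMA recursion, the time-varying or steady-state control limits, and flags samples that fall outside the band; the helper names `ewma` and `ewma_limits` and the numbers in the usage lines are illustrative assumptions.

```python
import math
from typing import List, Optional, Sequence, Tuple


def ewma(values: Sequence[float], lam: float, z0: float) -> List[float]:
    """Eq. (19): Z_i = lam * X_i + (1 - lam) * Z_{i-1}, with Z_0 = mu_0."""
    if not 0.0 < lam <= 1.0:
        raise ValueError("lam must lie in (0, 1]")
    z, out = z0, []
    for x in values:
        z = lam * x + (1.0 - lam) * z
        out.append(z)
    return out


def ewma_limits(mu0: float, sigma: float, lam: float, L: float,
                i: Optional[int] = None) -> Tuple[float, float]:
    """(LCL, UCL) from eqs. (20)-(21) at sample i, or eqs. (22)-(23) if i is None."""
    factor = lam / (2.0 - lam)
    if i is not None:
        factor *= 1.0 - (1.0 - lam) ** (2 * i)
    half_width = L * sigma * math.sqrt(factor)
    return mu0 - half_width, mu0 + half_width


# A 3-sigma step after two in-control samples (illustrative values).
z = ewma([0.0, 0.0, 3.0, 3.0], lam=0.5, z0=0.0)             # [0.0, 0.0, 1.5, 2.25]
lcl, ucl = ewma_limits(mu0=0.0, sigma=1.0, lam=0.5, L=3.0)  # about (-1.732, 1.732)
alarms = [not (lcl <= zi <= ucl) for zi in z]               # [False, False, False, True]
```

With $\lambda = 0.5$ the step is only flagged at the second faulty sample, which illustrates the memory effect: a smaller $\lambda$ smooths more and reacts to small persistent shifts, whereas $\lambda = 1$ reduces the chart to a Shewhart chart.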
3.2. Exponentially weighted moving average for interval-valued data
In the interval-valued EWMA, the interval residuals $r^-(k)$ and $r^+(k)$ are obtained from the MRPCA model as:

$$\begin{aligned} r^-(k) &= x^-(k) - \hat{x}^-(k) \\ r^+(k) &= x^+(k) - \hat{x}^+(k) \end{aligned} \tag{24}$$

The EWMA statistic for interval-valued data is then computed from the interval residuals as in the classical case (19). This yields an interval with a lower bound $Z^-(k)$ and an upper bound $Z^+(k)$, corresponding respectively to the lower and upper bounds of the residuals:

$$\begin{aligned} Z^-(k) &= \lambda r^-(k) + (1-\lambda) Z^-(k-1) \\ Z^+(k) &= \lambda r^+(k) + (1-\lambda) Z^+(k-1) \end{aligned} \tag{25}$$
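The interval recursion (25) runs the single-valued update on each residual bound separately. In the sketch below (hypothetical helper names), each call handles one residual variable; how the $m$ residual components are combined into the single LB/UB chart shown in the figures is not specified in the text, so that choice is left open. The initial values $Z^-(0) = Z^+(0) = 0$ are an assumption corresponding to zero-mean in-control residuals.

```python
from typing import List, Sequence, Tuple


def interval_residuals(x_lo: float, x_hi: float,
                       xhat_lo: float, xhat_hi: float) -> Tuple[float, float]:
    """Eq. (24) for one variable at one sample: (r^-, r^+)."""
    return x_lo - xhat_lo, x_hi - xhat_hi


def interval_ewma(r_lo: Sequence[float], r_hi: Sequence[float], lam: float,
                  z0_lo: float = 0.0, z0_hi: float = 0.0) -> Tuple[List[float], List[float]]:
    """Eq. (25): an independent EWMA recursion on each residual bound."""
    z_lo, z_hi = z0_lo, z0_hi
    out_lo: List[float] = []
    out_hi: List[float] = []
    for a, b in zip(r_lo, r_hi):
        z_lo = lam * a + (1.0 - lam) * z_lo
        z_hi = lam * b + (1.0 - lam) * z_hi
        out_lo.append(z_lo)
        out_hi.append(z_hi)
    return out_lo, out_hi


# Hypothetical residual bounds of one monitored variable over three samples.
z_lo, z_hi = interval_ewma([0.0, 0.0, 1.0], [0.0, 0.0, 2.0], lam=0.5)
# z_lo == [0.0, 0.0, 0.5], z_hi == [0.0, 0.0, 1.0]
```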
The corresponding control limits of the interval EWMA chart ($UCL^-$ and $LCL^-$: upper and lower control limits of the lower-bound chart; $UCL^+$ and $LCL^+$: upper and lower control limits of the upper-bound chart) can be computed as [39]:

$$UCL^- = \mu_0^- + L\sigma^-\sqrt{\frac{\lambda}{2-\lambda}\left[1-(1-\lambda)^{2k}\right]} \tag{26}$$

$$LCL^- = \mu_0^- - L\sigma^-\sqrt{\frac{\lambda}{2-\lambda}\left[1-(1-\lambda)^{2k}\right]} \tag{27}$$

and

$$UCL^+ = \mu_0^+ + L\sigma^+\sqrt{\frac{\lambda}{2-\lambda}\left[1-(1-\lambda)^{2k}\right]} \tag{28}$$

$$LCL^+ = \mu_0^+ - L\sigma^+\sqrt{\frac{\lambda}{2-\lambda}\left[1-(1-\lambda)^{2k}\right]} \tag{29}$$

where $\mu_0^-$ and $\mu_0^+$ are the means of the lower and upper bounds of the EWMA chart, respectively, and $\sigma^-$ and $\sigma^+$ are the in-control standard deviations of the lower and upper bounds of $X$, respectively. At steady state, $[1-(1-\lambda)^{2k}]$ tends to unity, and the following steady-state limits are obtained [39]:

$$UCL^- = \mu_0^- + L\sigma^-\sqrt{\frac{\lambda}{2-\lambda}} \tag{30}$$

$$LCL^- = \mu_0^- - L\sigma^-\sqrt{\frac{\lambda}{2-\lambda}} \tag{31}$$

and

$$UCL^+ = \mu_0^+ + L\sigma^+\sqrt{\frac{\lambda}{2-\lambda}} \tag{32}$$

$$LCL^+ = \mu_0^+ - L\sigma^+\sqrt{\frac{\lambda}{2-\lambda}} \tag{33}$$

Next, the developed MRPCA-based EWMA technique is validated through two examples: the first uses simulated data and the second an air quality monitoring network.

4. MRPCA-based EWMA and Applications
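The steady-state bands (30) to (33) and the resulting alarm rule complete the MRPCA-based EWMA detector applied in the examples below. The sketch assumes that $\mu_0^\pm$ and $\sigma^\pm$ are estimated from fault-free residual bounds and that a fault is declared when either bound chart leaves its band; both are assumptions, and the helper names `steady_state_band` and `interval_alarms` are hypothetical.

```python
import math
import statistics
from typing import List, Sequence, Tuple

Band = Tuple[float, float]  # (LCL, UCL)


def steady_state_band(train: Sequence[float], lam: float, L: float) -> Band:
    """Eqs. (30)-(33) for one bound: mu0 -/+ L * sigma * sqrt(lam / (2 - lam)).

    Assumption: mu0 and sigma are estimated from fault-free residuals of that bound.
    """
    mu0 = statistics.fmean(train)
    sigma = statistics.stdev(train)
    half_width = L * sigma * math.sqrt(lam / (2.0 - lam))
    return mu0 - half_width, mu0 + half_width


def interval_alarms(z_lo: Sequence[float], z_hi: Sequence[float],
                    band_lo: Band, band_hi: Band) -> List[bool]:
    """Alarm at sample k if Z^-(k) leaves [LCL^-, UCL^-] or Z^+(k) leaves
    [LCL^+, UCL^+]; combining the two charts with OR is an assumption."""
    alarms = []
    for a, b in zip(z_lo, z_hi):
        out_lo = not (band_lo[0] <= a <= band_lo[1])
        out_hi = not (band_hi[0] <= b <= band_hi[1])
        alarms.append(out_lo or out_hi)
    return alarms


# Illustrative fault-free residuals with mean 0 and sample std sqrt(4/3).
band = steady_state_band([-1.0, 1.0, -1.0, 1.0], lam=0.5, L=3.0)  # about (-2.0, 2.0)
flags = interval_alarms([0.0, 0.5], [0.0, 2.5], band, band)        # [False, True]
```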
4.1. Simulation example
The efficiency of the proposed scheme is first validated through a numerical example based on 7 variables, $j = 1, \dots, 7$, and $n = 1000$ measurements. The monitored variables at instant $k$ are described by the following relations:

$$\begin{aligned} x_1(k) &= u_1(k) + e_1(k) \\ x_2(k) &= u_2(k) \\ x_3(k) &= x_2(k) + e_3(k) \\ x_4(k) &= 2x_1(k) + x_3(k) + e_4(k) \\ x_5(k) &= x_2(k) + x_3(k) + e_5(k) \\ x_6(k) &= 2x_1(k) + x_2(k) + e_6(k) \\ x_7(k) &= x_1(k) + 2x_3(k) \end{aligned} \qquad e_i(k) \sim \mathcal{N}(0, 0.05) \tag{34}$$

where $u_1(k)$ and $u_2(k)$ are independent randomly generated variables and $e_1, \dots, e_6$ are independent Gaussian noises $\mathcal{N}(0, 0.05)$. The training data are generated under normal conditions. To obtain the interval-valued data matrix $[X]$, a variation $\delta x_j(k)$, $j = 1, \dots, 7$, which simulates the presence of uncertainties, is added to each variable. The intervals are thus constructed as:

$$[x_j(k)] = [x_j(k) - \delta x_j(k),\; x_j(k) + \delta x_j(k)]$$

Figure 1 shows the time evolution of the interval-valued variables $[x_1]$, $[x_4]$ and $[x_7]$ of the simulation example.

Figure 1: Time evolution of simulated data. (Three panels: interval-valued data $[x_1]$, $[x_4]$ and $[x_7]$ versus sample number, 1 to 1000.)

Before applying MRPCA modeling, the data are scaled to zero mean and unit variance. The mean square estimation error (MSE) criterion is used to select the best model, which is then applied for monitoring. The MSE criterion measures the distance between the fault-free interval-valued data and their estimates. It is expressed as:

$$MSE_j = \frac{1}{N}\sum_{k=1}^{N} \big\|[r_j(k)]\big\|^2, \qquad j = 1, \dots, m \tag{35}$$

where

$$\big\|[r_j(k)]\big\|^2 = \frac{1}{3}\left[(r_j^-(k))^2 + r_j^-(k)\, r_j^+(k) + (r_j^+(k))^2\right] \tag{36}$$

Model selection is based on minimizing this estimation error over three interval models: CPCA, CIPCA and MRPCA. Table 1 shows the estimation MSE values obtained with each method. The MRPCA approach provides the lowest estimation MSE (best modeling): its MSE is equal to or lower than that of CPCA and CIPCA for every variable. For the rest of the paper, the MRPCA technique is used for modeling and the EWMA chart is used to detect faults in interval-valued data. The detection performance of MRPCA-based EWMA is compared with that of the MRPCA-based Shewhart, MRPCA-based GLRT and MRPCA-based SPE techniques.

Table 1: MSE using CPCA, CIPCA and MRPCA models

| Method | MSE1 | MSE2 | MSE3 | MSE4 | MSE5 | MSE6 | MSE7 |
|---|---|---|---|---|---|---|---|
| CPCA | 0.013 | 0.039 | 0.025 | 0.020 | 0.020 | 0.026 | 0.030 |
| CIPCA | 0.013 | 0.039 | 0.025 | 0.020 | 0.020 | 0.026 | 0.030 |
| MRPCA | 0.009 | 0.038 | 0.024 | 0.020 | 0.019 | 0.026 | 0.030 |

To test the fault detection performance of the presented MRPCA-based EWMA, two faults are simulated: one in variable $x_2$ from sample 300 to 500 and one in $x_3$ from sample 700 to 800. The EWMA control width ($L$) and smoothing parameter ($\lambda$) are set to 3 and 0.95, respectively. Figures 2, 3, 4 and 5 show the time evolution of the MRPCA-based SPE, MRPCA-based Shewhart, MRPCA-based GLRT and MRPCA-based EWMA statistics, respectively, for the lower (LB) and upper (UB) bounds. These figures show that MRPCA-based EWMA provides the best detection performance. Table 2 summarizes the false alarm (FA) and missed detection (MD) rates and the ARL1 values of the four approaches.
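A NumPy sketch of the simulated data of eq. (34) and of the interval MSE of eqs. (35) and (36) follows. The distribution of $u_1, u_2$ (uniform on $[1, 2]$), the reading of 0.05 as a standard deviation, and the constant half-width $\delta x_j(k) = 0.05$ are assumptions made only for illustration, as are the helper names. The factor $\frac{1}{3}(a^2 + ab + b^2)$ in (36) is the average of $r^2$ for $r$ uniformly distributed on $[a, b]$, since $\int_a^b r^2\,dr/(b-a) = (b^3 - a^3)/(3(b-a))$.

```python
import numpy as np


def simulate_intervals(n: int = 1000, noise_std: float = 0.05,
                       delta: float = 0.05, seed: int = 0):
    """Eq. (34) followed by [x_j(k) - delta, x_j(k) + delta].

    Assumptions not stated in the text: u1, u2 ~ Uniform(1, 2); noise_std is
    the standard deviation of e_i; delta is a constant half-width.
    """
    rng = np.random.default_rng(seed)
    u1, u2 = rng.uniform(1.0, 2.0, size=(2, n))
    e = rng.normal(0.0, noise_std, size=(6, n))  # rows e[0]..e[5] stand for e1..e6
    x1 = u1 + e[0]
    x2 = u2                                      # e2 (row e[1]) is unused, as in (34)
    x3 = x2 + e[2]
    x4 = 2 * x1 + x3 + e[3]
    x5 = x2 + x3 + e[4]
    x6 = 2 * x1 + x2 + e[5]
    x7 = x1 + 2 * x3
    X = np.column_stack([x1, x2, x3, x4, x5, x6, x7])
    return X - delta, X + delta


def interval_mse(r_lo: np.ndarray, r_hi: np.ndarray) -> np.ndarray:
    """Eqs. (35)-(36): per-variable MSE of N x m interval residuals."""
    sq_norm = (r_lo ** 2 + r_lo * r_hi + r_hi ** 2) / 3.0
    return sq_norm.mean(axis=0)


lower, upper = simulate_intervals()
centers = (lower + upper) / 2.0
```

In a model comparison such as Table 1, `interval_mse` would be applied to the residual bounds of each fitted interval model on fault-free data, and the model with the smallest values retained.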
Figure 2: The time evolution of the MRPCA-based SPE statistic in the presence of faults in $x_2$ and $x_3$. (Plot: MRPCA-SPE statistic versus sample number, with the 95% control limit.)
Figure 3: The time evolution of the MRPCA-based Shewhart statistic in the presence of faults in $x_2$ and $x_3$. (Plot: MRPCA-based Shewhart statistic for the upper (UB) and lower (LB) bounds in the faulty case, with control limits, versus observation number.)

Figure 4: The time evolution of the MRPCA-based GLRT statistic in the presence of faults in $x_2$ and $x_3$. (Plot: MRPCA-based GLRT statistic for UB and LB in the faulty case, with the threshold, versus observation number.)

Figure 5: The time evolution of the MRPCA-based EWMA statistic in the presence of faults in $x_2$ and $x_3$. (Plot: MRPCA-based EWMA statistic for UB and LB in the faulty case, with control limits, versus observation number.)

Table 2: Summary of Missed Detection (%), False Alarms (%) and ARL1.

| Chart | MDs (%) | FAs (%) | ARL1 |
|---|---|---|---|
| MRPCA-based SPE | 14.5695 | 11.6071 | 2 |
| MRPCA-based Shewhart | 10.9272 | 1.7857 | 1 |
| MRPCA-based GLRT | 0 | 66.96436 | 1 |
| MRPCA-based EWMA | 1.6026 | 0 | 1 |

4.2. Air quality monitoring network
To support air quality management, air quality monitoring networks have the following missions: (i) producing data (pollutant concentrations and a range of meteorological parameters related to pollution events), including network management; (ii) disseminating data to keep the population and public authorities permanently informed; and (iii) surveillance with respect to regulatory norms. Because these networks lie at the crossroads of economic, health, ecological, social, scientific and technical interests, the validity of the data and the credibility of the delivered information are essential. Sensor fault detection is therefore of great importance for developing reliable environmental monitoring and management systems. Until now, sensor fault detection has been performed either with "outlier" detection methods, which only identify extreme values outside the measurement range, or manually by an operator. Unfortunately, these approaches are too subjective and impractical in real time, owing to the high dimensionality of the network and the large amount of collected data. In this work, six measurement stations are considered. The data matrix $X$ contains 18 variables, $x_1$ to $x_{18}$, corresponding, respectively, to the concentrations of ozone (O3), nitrogen dioxide (NO2) and nitrogen monoxide (NO) at each station. For process monitoring, 1080 observations are used. Figure 6 shows the time evolution of ozone concentrations as single-valued and interval-valued data. Figures 7 and 8 present, respectively, the measurements and MRPCA estimations of O3 concentrations at stations 1 and 3. Station 1 is a peri-urban station with the highest ozone levels, and Station 3 behaves like the other peri-urban stations. Using the identified MRPCA model, the interval-valued measurements are well estimated.

Figure 6: Ozone concentrations for single-valued data and interval data. (Two panels: single-valued and interval-valued O3 concentrations in µg/m3 versus sample number, with the training and testing data sets marked.)

Figure 7: Measurements and estimations of O3 at station 1. (Measured and estimated upper and lower bounds, O3 in µg/m3, versus sample number.)

Figure 8: Measurements and estimations of O3 at station 3. (Measured and estimated upper and lower bounds, O3 in µg/m3, versus sample number.)

To illustrate the performance of the proposed fault detection approach, a sensor fault is introduced on variable $x_7$ (O3 of the third station) between samples 1500 and 2000. The magnitude of the fault is 30% of the range of variation of this variable. Figures 9, 10, 11 and 12 show the fault detection results of the MRPCA-based SPE, MRPCA-based Shewhart, MRPCA-based GLRT and MRPCA-based EWMA techniques. These figures show that the SPE, Shewhart and GLRT charts detect the ozone fault only with high false alarm or missed detection rates, whereas the MRPCA-based EWMA technique achieves a good detection rate for the fault in O3 (see Figure 12). Table 3 compares the monitoring performance of the four charts and shows the detection effectiveness of the MRPCA-based EWMA technique in terms of false alarm (FA) and missed detection (MD) rates and ARL1 values.
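The FA, MD and ARL1 figures in Tables 2 and 3 can be computed from a binary alarm sequence and the known fault window. The text does not define these metrics, so the sketch below uses common definitions, stated in its docstring, as an assumption; `detection_metrics` is a hypothetical helper and the example sequence is illustrative.

```python
from typing import Optional, Sequence, Tuple


def detection_metrics(alarms: Sequence[bool],
                      faulty: Sequence[bool]) -> Tuple[float, float, Optional[int]]:
    """Return (FA %, MD %, ARL1) for one monitored sequence.

    Assumed definitions: FA = share of fault-free samples flagged; MD = share of
    faulty samples not flagged; ARL1 = samples from the first faulty sample to the
    first alarm, counting the onset as 1 (None if the fault is never flagged).
    """
    alarms, faulty = list(alarms), list(faulty)
    if len(alarms) != len(faulty):
        raise ValueError("alarms and faulty must have the same length")
    n_fault = sum(faulty)
    n_normal = len(faulty) - n_fault
    false_alarms = sum(a and not f for a, f in zip(alarms, faulty))
    missed = sum(f and not a for a, f in zip(alarms, faulty))
    fa = 100.0 * false_alarms / n_normal if n_normal else 0.0
    md = 100.0 * missed / n_fault if n_fault else 0.0
    arl1: Optional[int] = None
    if n_fault:
        onset = faulty.index(True)
        for k in range(onset, len(alarms)):
            if alarms[k]:
                arl1 = k - onset + 1
                break
    return fa, md, arl1


# Illustrative case: fault on the last four of eight samples.
faulty = [False] * 4 + [True] * 4
alarms = [False, True, False, False, False, True, True, True]
print(detection_metrics(alarms, faulty))  # (25.0, 25.0, 2)
```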
Table 3: Summary of Missed Detection (%), False Alarms (%) and ARL1.

| Chart | MDs (%) | FAs (%) | ARL1 |
|---|---|---|---|
| MRPCA-based SPE | 79.5 | 6.61 | 3 |
| MRPCA-based Shewhart | 9.92 | 68.87 | 1 |
| MRPCA-based GLRT | 12.33 | 3.96 | 1 |
| MRPCA-based EWMA | 0.71 | 3.26 | 1 |
Figure 9: The time evolution of the MRPCA-based SPE statistic in the presence of faults in O3.

Figure 10: The time evolution of the MRPCA-based Shewhart statistic in the presence of faults in O3. (Plot: MRPCA-based Shewhart statistic for UB and LB in the faulty case, with control limits, versus observation number.)

Figure 11: The time evolution of the MRPCA-based GLRT statistic in the presence of faults in O3. (Plot: MRPCA-based GLRT statistic for UB and LB in the faulty case, with control limits, versus observation number.)

Figure 12: The time evolution of the MRPCA-based EWMA statistic in the presence of faults in O3. (Plot: MRPCA-based EWMA statistic for UB and LB in the faulty case, with control limits, versus observation number.)

5. Conclusion
In this paper, a midpoint-radii PCA (MRPCA)-based EWMA technique is proposed for monitoring an air quality network using interval-valued data. The developed monitoring technique uses the MRPCA method for the modeling phase and the EWMA detection chart for monitoring. The uncertainties in the sensor measurements are taken into account through the MRPCA method, which improves the monitoring abilities. The monitoring performance of the proposed approach is assessed and validated using a simulation example and air quality data. The MRPCA-based EWMA method provided good modeling and monitoring performance, although small modeling errors and false alarm rates remain. These are attributed to the linear nature of PCA, which assumes linear relationships between variables and may therefore not be the most appropriate method of analysis for nonlinear processes. Therefore, as future work, we propose to extend the interval PCA modeling technique to more practical process models, such as input-output models (partial least squares (PLS)), nonlinear models (kernel PCA and kernel PLS) and dynamic models, to take into account the nonlinear and dynamic nature of real processes.

Acknowledgment
This work was supported by Qatar National Research Fund (a member of Qatar Foundation) under the NPRP grant NPRP9-330-2-140.
References
M AN U
300
[1] M. G´omez-Carracedo, J. Andrade, P. L´opez-Mah´ıa, S. Muniategui, D. Prada, A practical comparison of single and multiple imputation methods to handle complex missing data in air quality datasets, Chemometrics and Intelligent Laboratory Systems 134 (2014) 23–33. [2] I. Stanimirova, V. Simeonov, Modeling of environmental four-way data from air quality control, Chemometrics and intelligent laboratory systems 77 (1) (2005) 115–121. [3] P. Cazes, A. Chouakria, E. Diday, Y. Schektman, Extension de l’analyse en composantes principales `a des donn´ees de type intervalle, Revue de Statistique appliqu´ee 45 (3) (1997) 5–24.
TE
D
305
[4] A. Douzal-Chouakria, Extension des m´ethodes d’analyse factorielles a` des donn´ees de type intervalle, Ph.D. thesis, Paris IX Dauphine (1998).
315
EP
[5] C. N. Lauro, F. Palumbo, Principal component analysis of interval data: a symbolic data analysis approach, Computational statistics 15 (1) (2000) 73–87.
AC C
310
[6] J. Le-Rademacher, Principal component analysis for interval-valued and histogram-valued data and likelihood functions and some maximum likelihood estimators for symbolic data, Ph.D. thesis, Doctoral Dissertation. University of Georgia (2008). [7] M. Jie, Three-way pca of interval data for dynamic features extraction in futures market, in: 2008 Chinese Control and Decision Conference, IEEE, 2008, pp. 1083–1086.
19
ACCEPTED MANUSCRIPT
325
[8] H. Wang, R. Guan, J. Wu, CIPCA: Complete-information-based principal component analysis for interval-valued data, Neurocomputing 86 (2012) 158–169. doi:10.1016/j.neucom.2012.01.018.
[9] F. Palumbo, C. N. Lauro, A PCA for interval-valued data based on midpoints and radii, in: New Developments in Psychometrics, Springer, 2003, pp. 641–648.
[10] R. Dunia, S. Joe Qin, Subspace approach to multidimensional fault identification and reconstruction, AIChE Journal 44 (8) (1998) 1813–1831. doi:10.1002/aic.690440812.
[11] R. Dunia, S. J. Qin, Joint diagnosis of process and sensor faults using principal component analysis, Control Engineering Practice 6 (4) (1998) 457–469. doi:10.1016/s0967-0661(98)00027-6.
[12] R. Dunia, S. J. Qin, A unified geometric approach to process and sensor fault identification and reconstruction: the unidimensional fault case, Computers & Chemical Engineering 22 (7) (1998) 927–943.
[13] Y. Pan, C. Yang, R. An, Y. Sun, Fault detection with improved principal component pursuit method, Chemometrics and Intelligent Laboratory Systems 157 (2016) 111–119.
[14] H. Zhang, Y. Qi, L. Wang, X. Gao, X. Wang, Fault detection and diagnosis of chemical process using enhanced KECA, Chemometrics and Intelligent Laboratory Systems 161 (2017) 61–69.
[15] M. Mansouri, M. N. Nounou, H. N. Nounou, Improved statistical fault detection technique and application to biological phenomena modeled by S-systems, IEEE Transactions on NanoBioscience 16 (6) (2017) 504–512.
[16] Y. Chetouani, Model selection and fault detection approach based on Bayes decision theory: Application to changes detection problem in a distillation column, Process Safety and Environmental Protection 92 (3) (2014) 215–223.
[17] M. Mansouri, M. Nounou, H. Nounou, K. Nazmul, Kernel PCA-based GLRT for nonlinear fault detection of chemical processes, Journal of Loss Prevention in the Process Industries 26 (1) (2016) 129–139.
[18] M. Z. Sheriff, M. Mansouri, M. N. Karim, H. Nounou, M. Nounou, Fault detection using multiscale PCA-based moving window GLRT, Journal of Process Control 54 (2017) 47–64.
[19] R. Fezai, M. Mansouri, O. Taouali, M. F. Harkat, N. Bouguila, Online reduced kernel principal component analysis for process monitoring, Journal of Process Control 61 (2018) 1–11.
[20] C. Botre, M. Mansouri, M. Nounou, H. Nounou, M. N. Karim, Kernel PLS-based GLRT method for fault detection of chemical processes, Journal of Loss Prevention in the Process Industries 43 (2016) 212–224.
[21] L. Cai, X. Tian, A new fault detection method for non-Gaussian process based on robust independent component analysis, Process Safety and Environmental Protection 92 (6) (2014) 645–658.
[22] C. Botre, M. Mansouri, M. N. Karim, H. Nounou, M. Nounou, Multiscale PLS-based GLRT for fault detection of chemical processes, Journal of Loss Prevention in the Process Industries 46 (2017) 143–153.
[23] M. Mansouri, M. N. Nounou, H. N. Nounou, Multiscale kernel PLS-based exponentially weighted-GLRT and its application to fault detection, IEEE Transactions on Emerging Topics in Computational Intelligence.
[24] M. Hart, R. Hart, Shewhart control charts for individuals with time-ordered data, in: Frontiers in Statistical Quality Control 4, Springer, 1992, pp. 123–137.
[25] G. J. Ross, N. M. Adams, D. K. Tasoulis, D. J. Hand, Exponentially weighted moving average charts for detecting concept drift, Pattern Recognition Letters 33 (2) (2012) 191–198.
[26] H. Lahdhiri, I. Elaissi, O. Taouali, M. F. Harkat, H. Messaoud, Nonlinear process monitoring based on new reduced rank-KPCA method, Stochastic Environmental Research and Risk Assessment 1–16.
[27] M.-F. Harkat, M. Mansouri, M. Nounou, H. Nounou, Enhanced data validation strategy of air quality monitoring network, Environmental Research 160 (2018) 183–194.
[28] T. Ait-Izem, M.-F. Harkat, M. Djeghaba, F. Kratz, On the application of interval PCA to process monitoring: A robust strategy for sensor FDI with new efficient control statistics, Journal of Process Control 63 (2018) 29–46.
[29] C. A. Lowry, W. H. Woodall, C. W. Champ, S. E. Rigdon, A multivariate exponentially weighted moving average control chart, Technometrics 34 (1) (1992) 46–53.
[30] A. K. Patel, J. Divecha, Modified exponentially weighted moving average (EWMA) control chart for an analytical process data, Journal of Chemical Engineering and Materials Science 2 (1) (2011) 12–20.
[31] L. Corominas, K. Villez, D. Aguado, L. Rieger, C. Rosén, P. A. Vanrolleghem, Performance evaluation of fault detection methods for wastewater treatment processes, Biotechnology and Bioengineering 108 (2) (2011) 333–344.
[32] H.-H. Bock, E. Diday, Analysis of symbolic data: exploratory methods for extracting statistical information from complex data, Springer Science & Business Media, 2012.
[33] C. N. Lauro, F. Palumbo, Principal component analysis of interval data: A symbolic data analysis approach, Computational Statistics 15 (1) (2000) 73–87. doi:10.1007/s001800050038.
[34] C. N. Lauro, F. Palumbo, Principal component analysis on subpopulations: an interval data approach, in: IMPS Conference’01, Osaka (Japan), 2001.
[35] S. W. Roberts, Control chart tests based on geometric moving averages, Technometrics 1 (3) (1959) 239–250.
[36] S. V. Crowder, M. D. Hamilton, An EWMA for monitoring a process standard deviation, Journal of Quality Technology 24 (1) (1992) 12–21.
[37] E. S. Page, Continuous inspection schemes, Biometrika (1954) 100–115.
[38] J. S. Hunter, The exponentially weighted moving average, Journal of Quality Technology 18 (4) (1986) 203–210.
[39] D. C. Montgomery, Introduction to statistical quality control, John Wiley & Sons, New York.
Midpoint-Radii Principal Component Analysis-based EWMA and Application to Air Quality Monitoring Network
Highlights:
1. Develop an interval PCA (MRPCA)-based EWMA method.
2. Use the developed MRPCA-based EWMA method for fault detection (FD).
3. Apply the developed MRPCA-based EWMA method to interval FD of a simulation example and an air quality monitoring network.
4. Examples show the effectiveness of the developed MRPCA-based EWMA method over conventional techniques.
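The two-stage scheme summarized above (an interval PCA model as the modeling framework, an EWMA chart as the detection statistic) can be illustrated with a minimal Python sketch. It is a simplified illustration under stated assumptions, not the authors' implementation: the PCA model is fitted to the interval midpoints only (the radii, which the full MRPCA also exploits, are simulated but cancel out of the midpoints), the monitored residual is the squared prediction error (SPE), and the EWMA uses the asymptotic upper limit mu0 + L*sigma0*sqrt(lambda/(2 - lambda)) estimated from fault-free training data. The function names, the simulated four-sensor process, the bias fault on the fourth sensor and the tuning values lambda = 0.2 and L = 3 are all hypothetical.

```python
import numpy as np


def fit_midpoint_pca(lower, upper, n_components):
    """Fit PCA to standardized interval midpoints.

    Simplifying assumption: a stand-in for MRPCA that ignores the radii.
    """
    mid = (lower + upper) / 2.0                      # interval midpoints
    mean = mid.mean(axis=0)
    std = mid.std(axis=0, ddof=1)
    z = (mid - mean) / std
    eigval, eigvec = np.linalg.eigh(np.cov(z, rowvar=False))
    order = np.argsort(eigval)[::-1]                 # largest variance first
    loadings = eigvec[:, order[:n_components]]       # retained principal directions
    return mean, std, loadings


def spe(lower, upper, model):
    """Squared prediction error (Q statistic) of the midpoints under the PCA model."""
    mean, std, loadings = model
    z = ((lower + upper) / 2.0 - mean) / std
    residual = z - z @ loadings @ loadings.T         # part not explained by the model
    return np.sum(residual ** 2, axis=1)


def ewma(stat, lam, start):
    """EWMA recursion z_t = lam * x_t + (1 - lam) * z_{t-1}, starting from `start`."""
    out = np.empty(len(stat))
    prev = start
    for t, value in enumerate(stat):
        prev = lam * value + (1.0 - lam) * prev
        out[t] = prev
    return out


def ewma_upper_limit(train_stat, lam, width):
    """Asymptotic upper limit mu0 + L * sigma0 * sqrt(lam / (2 - lam))."""
    mu0 = train_stat.mean()
    sigma0 = train_stat.std(ddof=1)
    return mu0 + width * sigma0 * np.sqrt(lam / (2.0 - lam))


# Hypothetical process: 4 sensors driven by 2 latent sources plus noise.
rng = np.random.default_rng(0)
MIXING = np.array([[1.0, 0.8, 0.0, 0.5],
                   [0.0, 0.6, 1.0, 0.7]])


def make_intervals(n, bias=None):
    """Simulate interval readings [x - r, x + r] with an assumed random radius r."""
    x = rng.normal(size=(n, 2)) @ MIXING + 0.1 * rng.normal(size=(n, 4))
    if bias is not None:
        x = x + bias                                 # additive sensor bias fault
    radius = 0.05 + 0.02 * np.abs(rng.normal(size=x.shape))
    return x - radius, x + radius


train_lo, train_hi = make_intervals(500)
model = fit_midpoint_pca(train_lo, train_hi, n_components=2)
train_q = spe(train_lo, train_hi, model)

lam, width = 0.2, 3.0
limit = ewma_upper_limit(train_q, lam, width)

normal_lo, normal_hi = make_intervals(200)
fault = np.array([0.0, 0.0, 0.0, 0.5])               # bias on the fourth sensor
fault_lo, fault_hi = make_intervals(200, bias=fault)
lower = np.vstack([normal_lo, fault_lo])
upper = np.vstack([normal_hi, fault_hi])

chart = ewma(spe(lower, upper, model), lam, start=train_q.mean())
alarms = chart > limit
print("false alarm rate (samples 0-199):", alarms[:200].mean())
print("detection rate (samples 200-399):", alarms[200:].mean())
```

Under these assumptions the EWMA chart should stay mostly below its limit on the fault-free segment and cross it shortly after the bias appears. A smaller lambda smooths more, which favors small persistent faults at the cost of a slower reaction to abrupt ones.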