CHAPTER 7

Conclusions and perspectives

Contents
7.1. Conclusions
7.2. Perspectives and research proposals
7.2.1 Project 1: water distribution networks: modeling, sensor placement, leak and quality monitoring
7.2.2 Project 2: enhanced operation of wastewater treatment plants
7.2.3 Project 3: enhanced monitoring of photovoltaic systems
7.2.4 Project 4: enhanced data validation of an air quality monitoring networks

Data-Driven and Model-Based Methods for Fault Detection and Diagnosis
https://doi.org/10.1016/B978-0-12-819164-4.00016-9
Copyright © 2020 Elsevier Inc. All rights reserved.

7.1 Conclusions

The following broad conclusions are reached from the analysis of the simulation and experimental results:

• Developed latent variable-based hypothesis testing fault detection techniques for cases where the process model is not available. These techniques can enhance the monitoring of processes represented by linear or nonlinear input-space models (such as PCA) or input–output models (such as PLS), and they widen the applicability of hypothesis testing-based fault detection in practice. Kernel PCA (kPCA) and kernel PLS (kPLS) were used to deal with nonlinearities in the process data.
• Developed multiscale latent variable-based hypothesis testing fault detection techniques, which use a multiscale representation to help deal with uncertainty in the data and minimize its effect on fault detection.
• Developed interval PCA (IPCA) and interval PLS (IPLS) fault detection methods to account for uncertainty in the data. The advantages of IPCA and IPLS were combined with those of hypothesis testing by developing interval PCA- and PLS-based hypothesis testing fault detection methods, which allowed incorporating prior knowledge about the data variability into the monitoring problem to further enhance the quality of fault detection.
• Developed model-based detection techniques that can improve process monitoring using state estimation-based fault detection approaches. When the process model is available, the state variables are estimated using a particle filter (PF). Enhanced univariate charts based on the exponentially weighted moving average (EWMA) are then applied to the monitored residuals obtained from the PF for fault detection purposes.
• Demonstrated the effectiveness of the proposed strategies by conducting simulation and experimental studies on the following problems:
  1. Simulated examples:
     – Synthetic data
     – Continuously stirred tank reactor
     – Tennessee Eastman process
     – Wastewater treatment plant
     – Cad system in E. coli
     – Simulated distillation column
     – Simulated photovoltaic systems
  2. Real air quality monitoring network data
  3. Real photovoltaic data
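The EWMA-based residual monitoring mentioned in the conclusions can be illustrated with a short sketch. It assumes Gaussian residuals with a known in-control mean `mu0` and standard deviation `sigma0`. The function name `ewma_chart` and the values `lam = 0.2` and `L = 3` are illustrative choices, not settings taken from this book.

```python
import math


def ewma_chart(residuals, lam=0.2, L=3.0, mu0=0.0, sigma0=1.0):
    """Apply a two-sided EWMA chart to a residual sequence.

    Returns the EWMA statistic z_t and a list of alarm flags.
    lam is the smoothing parameter, L the control limit width (assumed values).
    """
    z_prev = mu0
    z_values, alarms = [], []
    for t, r in enumerate(residuals, start=1):
        # Recent residuals receive more weight than older ones.
        z_t = lam * r + (1.0 - lam) * z_prev
        # Exact (time-varying) standard deviation of z_t for i.i.d. residuals.
        sd = sigma0 * math.sqrt(lam / (2.0 - lam) * (1.0 - (1.0 - lam) ** (2 * t)))
        alarms.append(abs(z_t - mu0) > L * sd)
        z_values.append(z_t)
        z_prev = z_t
    return z_values, alarms


if __name__ == "__main__":
    # Hypothetical residuals: fault-free for 20 samples, then a mean shift of 2.
    data = [0.0] * 20 + [2.0] * 20
    z, flags = ewma_chart(data)
    print("first alarm at sample index:", flags.index(True))
```

A smaller `lam` gives the chart a longer memory, which helps with small shifts but delays the response to large ones. That trade-off is why the smoothing parameter is treated as a tuning choice.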

7.2 Perspectives and research proposals

7.2.1 Project 1: water distribution networks: modeling, sensor placement, leak and quality monitoring

Proposal summary

Environmental, safety, and health issues have gained great importance worldwide. These issues are closely related to the availability and quality of water that can be used in several industrial and domestic applications. Water is a unique commodity because nothing else can substitute for it, especially in areas exposed to drought conditions. Also, in most water distribution systems (WDS), an estimated 10% to 30% of the water is lost in transportation from treatment plants to consumers. Therefore proper operation of these systems is crucial to maintain the desired efficiency of water distribution. This project thus aims at enhancing the operation of WDS by developing innovative hydraulic and water quality modeling, leak and contaminant monitoring, and sensor placement techniques that are capable of improving the performance of these systems. Specifically, four main parts (Part A, Part B, Part C, and Part D) will be addressed in this project.

The objective of Part A is to enhance hydraulic and water quality modeling in WDS by developing various multivariate, nonlinear, uncertain, and multiscale modeling frameworks. The hydraulic model describes the behavior of water pressure, flow, and consumer demand, whereas the water quality model describes the behavior of the contaminant concentrations (i.e., chlorine, pH, turbidity). For hydraulic modeling, empirical data-based
modeling techniques will be developed. Latent variable regression (LVR) methods are well-known empirical data-based modeling techniques. They are multivariate techniques that aim to reduce data dimensionality and rely on a linear transformation of the data through an orthonormal matrix calculated from the dataset itself. However, most practical systems are nonlinear, multivariate, and uncertain. To extend these methods to nonlinear systems, kernel LVR (KLVR) models, which are widely used nonlinear models, will be applied. The multiscale nature of the WDS data provides a representation that can be made robust to noise and errors and has a great impact on the quality of monitoring. Hence we propose to combine the modeling frameworks with a wavelet-based multiscale representation of WDS data to improve the effectiveness of hydraulic modeling. Therefore LVR, KLVR, and multiscale KLVR will be applied to enhance the hydraulic modeling performance.

Water quality systems are known to be dynamic. Therefore dynamic LVR and KLVR modeling methods will be developed to take into consideration the dynamic nature of the network. In scenarios where the process model is available and has a predefined structure, the state variables are estimated using state estimation techniques, which include the extended Kalman filter (EKF), unscented Kalman filter (UKF), and particle filter (PF). The PF has shown good performance, provides a significant advantage over the EKF and UKF, and can be applied to nonlinear models with non-Gaussian errors. However, the classical PF ignores the information given by the most recent measurements when the sampling phase is performed. Thus an improved PF will be developed in this project to incorporate information from recent process measurements into the sampling phase, which will help improve on the classical PF.
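The limitation of the classical PF noted above is easiest to see in code. The sketch below is a minimal bootstrap particle filter for a hypothetical scalar random-walk state observed in Gaussian noise. This model is chosen purely for illustration and is not the WDS model of the proposal. The particles are propagated from the prior (transition) density alone, so the new measurement enters only afterward, through the weights. All function names and noise levels are assumptions.

```python
import math
import random


def simulate(n_steps=200, q_std=0.1, r_std=1.0, seed=1):
    """Simulate a random-walk state and noisy measurements (toy model)."""
    rng = random.Random(seed)
    x, states, meas = 0.0, [], []
    for _ in range(n_steps):
        x += rng.gauss(0.0, q_std)
        states.append(x)
        meas.append(x + rng.gauss(0.0, r_std))
    return states, meas


def bootstrap_pf(measurements, n_particles=500, q_std=0.1, r_std=1.0, seed=0):
    """Classical (bootstrap) particle filter; returns posterior-mean estimates."""
    rng = random.Random(seed)
    particles = [rng.gauss(0.0, 1.0) for _ in range(n_particles)]
    estimates = []
    for y in measurements:
        # Sampling phase: draw from the prior only; y is not used here.
        particles = [x + rng.gauss(0.0, q_std) for x in particles]
        # Weighting phase: the measurement enters through the likelihood.
        weights = [math.exp(-0.5 * ((y - x) / r_std) ** 2) for x in particles]
        total = sum(weights)
        if total == 0.0:
            weights = [1.0 / n_particles] * n_particles
        else:
            weights = [w / total for w in weights]
        estimates.append(sum(w * x for w, x in zip(weights, particles)))
        # Multinomial resampling to limit weight degeneracy.
        particles = rng.choices(particles, weights=weights, k=n_particles)
    return estimates


def rmse(a, b):
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)) / len(a))


if __name__ == "__main__":
    states, meas = simulate()
    est = bootstrap_pf(meas)
    print("RMSE of raw measurements:", round(rmse(meas, states), 3))
    print("RMSE of PF estimates:    ", round(rmse(est, states), 3))
```

An improved filter of the kind proposed would change only the sampling line, replacing the prior draw with a proposal that also depends on `y`, and would adjust the weights to match.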
Moreover, the data collected from WDS are often affected by noise and measurement uncertainties. These uncertainties have a negative impact on the established models and thus on the monitoring performance. To represent the real data more precisely, this uncertainty can be treated by considering an interval-valued representation instead of a single-valued one. In this case, determining the model requires new approaches adapted to interval-valued data. Therefore we will extend the developed hydraulic and water quality modeling methods to deal with interval-valued data.

In Part B, we propose enhanced data-based and model-based leak and contaminant monitoring techniques for WDS. The first objective of Part B is to develop a new technique for leak or contaminant detection in WDS. The developed technique exploits the benefits
of the exponentially weighted filter with the generalized likelihood ratio test (GLRT) and multiobjective optimization to detect the leak or the contaminant source in the network. The idea is to compute a new chart that takes into consideration both the current and the previous data, giving more weight to the more recent data. Once it has been determined that there are leaks or contaminants in the network, they must be located. For example, the developed chart combined with sensitivity analysis will be used to identify the leak location. To identify the contamination source, an enhanced technique that integrates the advantages of the developed chart with a genetic algorithm will be developed.

The objective of Part C is to optimize the sensor placement in WDS. To do that, we will develop a new sensor location technique that combines the advantages of the chart and multiobjective optimization. The optimal sensor location is selected so that the detection rate, detection redundancy, and isolation rate are maximized and the detection speed is minimized for leak and contaminant monitoring.

In Part D, software for modeling, monitoring, and sensor placement will be developed. The software will be available at the end of this project for use by researchers interested in continuing work in this direction and by practitioners who would like to use it to improve the performance of WDS. The proposed frameworks will be evaluated through three applications: the first using the EPANET simulator, the second using real data from ASTAD water plants at Qatar Foundation, and the third using a benchmark WDS available at TAMU-Qatar.

Technical objectives

Water distribution systems are essential for providing quality water for domestic and industrial applications. Proper operation of water distribution systems requires good understanding of their behavior and tight monitoring of their key variables to achieve the desired effectiveness of operation (to produce the sought water quality) and to ensure maintaining the desired safety standards and protocols. Therefore the main objective of this project is to propose a general framework for enhancing the operation of water distribution systems by developing:

• Part A: Enhanced hydraulic and quality modeling techniques in water distribution systems:
  1. by developing various multivariate, nonlinear, interval-valued, and multiscale hydraulic modeling frameworks.
  2. by developing various multivariate, nonlinear, interval-valued, and multiscale water quality modeling techniques.
• Part B: Enhanced leak and contaminant monitoring techniques in water distribution systems as follows:
  1. Develop new leak detection and isolation techniques using optimized exponentially weighted-GLRT-based sensitivity analysis.
  2. Develop new contamination event detection and isolation techniques using optimized exponentially weighted-GLRT-based genetic algorithms.
• Part C: Optimal sensor placement techniques in water distribution systems.
• Part D: Software implementation for modeling, monitoring, and sensor placement in water distribution systems.

To achieve the first objective of Part A, which is enhancing the hydraulic models in water distribution networks, multivariate, nonlinear, and multiscale methods will be developed. Several modeling techniques that can accurately predict the behavior of water distribution networks will be developed. These techniques include latent variable regression (LVR) methods, such as principal component analysis (PCA) and partial least squares (PLS). Both PLS and PCA assume that the data lie on a linear subspace; however, most practical systems are nonlinear, multivariate, and uncertain. To extend these methods to nonlinear systems, kernel LVR (KLVR) models, including kernel PCA (KPCA) and kernel PLS (KPLS), will be used. The multiscale nature of the data collected from water distribution networks has a great impact on the quality of monitoring. Thus we propose to merge the developed hydraulic models with a wavelet-based multiscale representation to enhance the modeling performance. For example, in this project, multiscale LVR and multiscale KLVR methods will be developed to enhance the hydraulic models. The second objective of Part A is to propose enhanced water quality modeling techniques.
To do that, the developed LVR, KLVR, multiscale LVR, and multiscale KLVR methods can also be applied to water quality modeling. Since the contaminant variables have dynamic behavior and the techniques presented above are only suitable for use under static or weakly dynamic conditions, dynamic LVR, KLVR, multiscale LVR, and multiscale KLVR methods will be developed. To deal with scenarios where the process model is available, state estimation techniques will be used. Particle filtering (PF) is a nonlinear and non-Gaussian state estimation technique, which has been successfully applied in various applications and has shown
better performance when compared to linearization-based techniques, such as the extended Kalman filter (EKF) and unscented Kalman filter (UKF). However, the classical PF uses the prior distribution in the sampling phase, which ignores the information given by the likelihood function and by recent process measurements and thus degrades the estimation performance. Therefore, in this project, an improved PF (IPF) will be developed to address this issue and improve the estimation convergence and accuracy. The proposed IPF will be based on the Kullback–Leibler divergence (KLD) to generate a better importance sampling distribution and thus will be referred to as the KLD-particle filter.

Collected data, however, are generally affected by uncertainties in the system parameters, measurement errors, sensor inaccuracies, and computation errors. These challenges can affect the modeling abilities in water distribution systems, and such data can be studied using interval-valued data techniques. Therefore the developed hydraulic and water quality modeling methods based on single-valued data will be extended to interval-valued data. Thus interval KLVR, interval multiscale KLVR, interval dynamic KLVR, and interval dynamic multiscale KLVR modeling methods will be developed to account for uncertainties in water distribution systems.

In Part B, to achieve the first task, which is enhancing leak monitoring performance, a new improved chart that combines the advantages of the exponentially weighted (EW) filter with those of the GLRT chart will be developed for leak detection. However, the EW filter has a main parameter λ (the smoothing parameter) that must be specified by the user to improve detection for a specific change size. Thus an optimized EW (OEW) filter based on the best selection of the smoothing parameter will be proposed. Multiobjective optimization (MOO) is applied to compute the optimal value λ̂.
The MOO is addressed using three objective functions: i) missed detection rate (MDR), ii) false alarm rate (FAR), and iii) average run length (ARL1) values. The developed chart (called OEW-GLRT) will provide quick detection with good detection rates. The idea behind the developed OEW-GLRT is to compute a new chart that takes into account the current and previous data in an exponentially decreasing fashion, giving more weight to the more recent data. Therefore, in this task, single- and interval-valued model-based OEW-GLRT techniques will be developed to detect leaks in water distribution networks. Once it has been determined that there is a leak in the network, it should be isolated. Thus we will develop new techniques that aim at identifying the leak location using the developed OEW-GLRT-based sensitivity matrix analysis. The sensitivity matrix is determined from pressure residuals
generated considering all possible leak locations. The residuals are obtained from the comparison of the observed pressures and the estimated ones given by the model. The most similar behavior between the actual residuals and the leak sensitivity matrix, which contains the effect of each possible leak, determines the most probable leak location. Therefore, to identify the location of the leak, angle, correlation, Euclidean distance, and least-square optimization metrics will be used. The second task in Part B focuses on water quality monitoring. The contaminant detection and isolation problem is often used for water quality monitoring. Thus the developed single- and interval-valued model-based OEW-GLRT chart will be used to enhance the contaminant detection abilities. To localize the contaminant source, a multiobjective Pareto optimization-based pollution matrix will be used. The pollution matrix, which presents the impact of different contaminants on the nodes, is computed using the measured contaminant concentrations and their estimations obtained from the developed water quality models. The objective of Part C is optimizing the sensor placement for leak and contaminant monitoring in water distribution systems. In fact, the optimal positions of sensors must be well selected to enhance monitoring performances. Thus, in this project, we develop new optimization techniques that exploit the advantages of the developed modeling/monitoring techniques and genetic algorithms to enhance the performances of water distribution systems. The developed sensor placement algorithms will help make improvements in monitoring water distribution systems. To select the optimal sensor placement, we build a multiobjective optimization scheme containing four objective functions (detection rate (DR), detection speed (DS), detection redundancy (DRE), and isolation rate (IR)). This scheme gives a tradeoff between the four metrics. 
Thus the optimal sensor location is selected so that DR is maximized, DS is minimized, DRE is maximized, and IR is maximized. To deal with modeling uncertainties, parameter uncertainty of pipe roughness coefficients, and uncertainty in water demands, the developed sensor placement methods based on single-valued data will be extended to interval-valued data.

The last part of this proposal is to implement software for modeling, monitoring, and sensor placement in water distribution systems. Since modeling, monitoring, and sensor placement are inherent parts of water distribution systems, a software implementation of more advanced techniques helps to operate these systems more efficiently. It will be shown that the developed software is capable of producing compact
modeling, monitoring, and sensor placement frameworks that are especially tailored for WDS operation purposes. The developed modeling, monitoring, and sensor placement frameworks will be evaluated using the EPANET simulator and real data from ASTAD water plants at Qatar Foundation. Also, the developed techniques will be validated using a benchmark water distribution system available at Texas A&M University at Qatar, under the operation of Prof. Ahmed Abdel-Wahab.

The proposed project falls under Technology Readiness Level (TRL) 4, “Lab Testing/Validation of Alpha Prototype Component/Process”. A framework for enhancing the hydraulic and water quality modeling, leak and contaminant monitoring, and sensor placement will be developed in this project. The developed software will be available at the end of this project for use by researchers interested in continuing work in this direction and by practitioners who would like to use it to improve the performance of water distribution systems. This application will be facilitated through our collaborators and consultants from the Technical University of Catalonia (UPC) and the Massachusetts Institute of Technology (MIT).
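The leak isolation step described for Part B matches the observed pressure residuals against a leak sensitivity matrix. It can be sketched with the angle (cosine) metric, which is one of the similarity measures the proposal lists. The code below is a hedged illustration only: the function names and the three-sensor, three-candidate sensitivity matrix are hypothetical, and in practice the matrix would come from simulating a leak at each candidate node in a hydraulic model.

```python
import math


def cosine(u, v):
    """Cosine of the angle between two residual vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def locate_leak(residual, sensitivity):
    """Return (best candidate index, similarity scores).

    sensitivity[j] holds the pressure-residual signature expected at the
    sensors when a leak occurs at candidate node j.
    """
    scores = [cosine(residual, signature) for signature in sensitivity]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores


if __name__ == "__main__":
    # Hypothetical signatures: a leak lowers pressure most at the nearest sensor.
    sensitivity = [
        [-1.0, -0.2, -0.1],  # leak at candidate node 0
        [-0.2, -1.0, -0.3],  # leak at candidate node 1
        [-0.1, -0.3, -1.0],  # leak at candidate node 2
    ]
    observed = [-0.45, -2.1, -0.55]  # measured minus model-predicted pressure
    node, scores = locate_leak(observed, sensitivity)
    print("most probable leak node:", node)
```

The angle metric ignores the magnitude of the residual, so the ranking does not depend on the unknown leak size. Correlation, Euclidean distance, or a least-squares fit could be swapped in by replacing `cosine`.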

7.2.2 Project 2: enhanced operation of wastewater treatment plants

Proposal summary

Environmental, health, and safety concerns are of major importance worldwide. These concerns are closely tied to the availability and quality of water that can be used in various domestic and industrial applications. Water is a precious commodity, especially in regions that are exposed to drought or severe weather conditions, such as Qatar. Also, the high demand for water due to the rapidly increasing rate in world population and the global economic development is another key factor that greatly affects the availability of water. The water resources in Qatar continue to decrease and are becoming insufficient to meet its civil and industrial needs. Due to the low levels of precipitation (annual average of 82 mm) and high evaporation rate (average of 2,200 mm per annum), Qatar relies on groundwater sources (which are not abundant), desalinated seawater (which is expensive), and treated water. Thus the reuse of treated wastewater is becoming an absolute necessity, not only to preserve the environment, but also to avoid being completely dependent on the limited groundwater resources or desalinated
seawater. Wastewater treatment offers significant water savings while providing great financial benefits as it is much less expensive than desalinated seawater. Wastewater treatment is used by various governmental and industrial entities in Qatar, such as Ashghal and Qatar Shell, which operate wastewater treatment plants that utilize different technologies, such as biological treatment or reverse osmosis. Proper operation of these wastewater treatment plants is crucial to maintain the sought effectiveness and desirable water quality. Therefore the main objective of this proposal is to develop a general framework for modeling, monitoring, and control techniques that aim at enhancing the operation of wastewater treatment plants. Specifically, the following objectives will be sought. First, different modeling techniques that can accurately predict the behavior of wastewater treatment plants will be developed. Some of these modeling techniques will be based on a predefined model structure, which is derived using material and energy balances, for which the model parameters are estimated from measurements of the plant variables using state estimation techniques, such as particle filtering (PF). In fact, an improved PF method will be developed to better handle the nonlinear and high-dimensional state estimation problem involved in modeling wastewater treatment plants. When such a model structure is not available, another empirical modeling strategy will be developed, where the model will be estimated entirely from wastewater plant data. Some of the well-known empirical modeling techniques include the latent variable regression (LVR) models. In this proposal, we propose to develop dynamic and multiscale LVR models that account for the dynamic nature and the uncertainty of measurements obtained from wastewater treatment plants. 
Even though the developed modeling techniques can be applied to any type of wastewater treatment plant, in this project, they will be validated using a biological treatment process that has been widely used in the water quality research community as a benchmark process model.

The second objective of this project is to develop effective monitoring techniques that can ensure normal and safe operation of wastewater treatment plants. Two types of monitoring strategies will be developed. One strategy will rely on fault detection techniques, where anomalies in the process variables are quickly detected to ensure safe handling of such faults. Some of the proposed fault detection techniques include LVR-based multivariate detection indices (e.g., Q statistic) and generalized likelihood ratio test (GLRT)-based methods. In fact, GLRT fault detection methods that can detect changes in the mean or variance of the measured variables will be developed. The other monitoring strategy will aim at developing techniques that detect
drifts in the operating conditions away from the nominal operating region in which the process was intended to operate. These techniques will rely on monitoring the model parameters to make sure that they remain close to their nominal values. The developed modeling and monitoring techniques will be validated using both simulated data (to theoretically assess their effectiveness) and real data from a biological treatment pilot plant at Qatar Shell.

Technical objectives

Wastewater treatment is essential for providing quality water for domestic and industrial applications, and it is performed by governmental and industrial entities all over the world. Proper operation of wastewater treatment plants requires a good understanding of their behavior and tight monitoring and control of their key variables to achieve the desired effectiveness of operation (to produce the sought water quality) and to maintain the desired safety standards and protocols. Therefore the main objective of this project is to develop a general framework for enhancing the operation of wastewater treatment plants by developing:

• modeling techniques that can accurately predict the behavior of wastewater treatment plants and help understand their behavior, and
• monitoring techniques that can detect sensor faults or deviations from a plant’s intended operating region.

To achieve the first objective, which is developing accurate wastewater treatment plant modeling techniques, two modeling approaches will be followed. The first modeling approach will utilize a predefined dynamic model structure derived from first principles (i.e., mass and energy balances) and then estimate the model parameters from measurements of the process variables using state estimation techniques, such as particle filtering (PF). Particle filtering is a Bayesian estimation method that has been successfully utilized in various applications and has shown advantages over other classical state estimation techniques, such as the extended Kalman filter (EKF) and unscented Kalman filter (UKF), especially when dealing with high-dimensional problems. In fact, an improved PF will be developed in this project to deal with the nonlinear, complex, and high-dimensional models of wastewater treatment plants.
The idea behind the proposed improved PF is that it will utilize a sampling distribution that incorporates information from recent process measurements, which will help improve the convergence and accuracy of the estimated model parameters. This modeling approach, however, is feasible only when a dynamic model
structure can be derived using the conservation laws and our physical understanding of the process. In some cases, however, deriving such dynamic model structures for complex systems, such as wastewater treatment plants, is a challenging task. In such cases, we propose another, empirical modeling approach, which relies on the availability of measurements of the process variables. Specifically, we propose to develop linear and nonlinear dynamic latent variable regression (LVR) models that can account for the multivariate, dynamic, and nonlinear nature of wastewater treatment plants. For example, in this project, dynamic kernel principal component analysis (DkPCA) and dynamic kernel partial least squares (DkPLS) models will be developed to deal with scenarios where the process variables may not all be measurable. Furthermore, the performance of all developed LVR modeling techniques will be enhanced using multiscale representation, which is a powerful data analysis tool that has been shown to improve several types of models. Even though the developed models (using both proposed modeling approaches) are applicable to a wide range of wastewater treatment plants, they will be validated in this project using simulated and real biological wastewater treatment plant data. The simulated biological plant model used in this project is a benchmark model developed by the European Cooperation in the field of Scientific and Technical Research (COST) and has been extensively used in water quality research.

The second objective of this project is to develop effective monitoring techniques that can help detect anomalies in key measured wastewater treatment plant variables (such as the chemical oxygen demand, dissolved oxygen concentration, ammonia concentration, and others) and also detect drifts in the process operating conditions from their normal or nominal values.
To detect sensor faults or anomalies in the process measurements, different fault detection indices will be utilized, which include the LVR-based Q statistic and the generalized likelihood ratio test (GLRT). GLRT is a statistical fault detection method that relies on maximizing the probability of detection for a given false alarm rate. In this project, various GLRT fault detection statistics will be developed. These include indices that aim at detecting shifts in the mean of the model residuals (which will help detect malfunctioning sensors that may get stuck at some point during the process operation) and indices that aim at detecting changes in the variance of the residuals (which will help detect changes in the uncertainty of the measuring devices). Another monitoring objective is to detect drifts in the wastewater treatment process from its nominal operating region even if there are no sensor faults occurring. This can be done by monitoring the values of estimated model parameters
obtained using state estimation. So, control charts (which are sensitive to small and slow changes in the process operating conditions), such as exponentially weighted moving average (EWMA), multivariate EWMA, and multiscale EWMA charts, will be developed and used in this regard. Again, the developed monitoring techniques will be validated using simulated and real wastewater plant data obtained from the COST benchmark wastewater plant model and Qatar Shell, respectively.
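The mean-shift GLRT mentioned above has a simple closed form in the textbook case, sketched below. The sketch assumes i.i.d. Gaussian residuals with zero mean under fault-free operation and a known standard deviation `sigma`. The unknown shift is replaced by its maximum likelihood estimate (the window mean), which gives the statistic N·mean²/σ². The function names are illustrative, and the threshold 3.841 is the 95% quantile of the chi-square distribution with one degree of freedom, corresponding to an assumed 5% false alarm rate.

```python
CHI2_1DOF_95 = 3.841  # 95% quantile of chi-square with 1 degree of freedom


def glrt_mean_shift(residuals, sigma):
    """GLRT statistic (2 * log-likelihood ratio) for a mean shift.

    H0: residuals ~ N(0, sigma^2); H1: residuals ~ N(theta, sigma^2), theta unknown.
    Maximizing over theta gives theta_hat = window mean.
    """
    n = len(residuals)
    mean = sum(residuals) / n
    return n * mean ** 2 / sigma ** 2


def detect_fault(residuals, sigma, threshold=CHI2_1DOF_95):
    """Declare a fault when the GLRT statistic exceeds the threshold."""
    return glrt_mean_shift(residuals, sigma) > threshold


if __name__ == "__main__":
    healthy = [0.1, -0.1] * 5               # hypothetical fault-free residuals
    biased = [r + 1.0 for r in healthy]     # same residuals with a sensor bias
    print("healthy window flagged:", detect_fault(healthy, sigma=1.0))
    print("biased window flagged: ", detect_fault(biased, sigma=1.0))
```

A variance-change GLRT follows the same pattern, with the unknown variance replaced by its maximum likelihood estimate instead of the mean.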

7.2.3 Project 3: enhanced monitoring of photovoltaic systems

Proposal summary

Effective operation of various engineering systems requires tight monitoring of some of their key process variables. For example, detection of anomalies in photovoltaic (PV) power systems is crucial for their efficient application to convert solar energy to usable power. Most practical processes, however, are multivariate, that is, they involve many variables that need to be monitored at the same time. In a previous research effort, we developed PCA- and kernel PCA (kPCA)-based GLRT fault detection schemes, in which PCA and kPCA were used as the modeling framework for fault detection. In this project, our objective is threefold: to improve the performance of the GLRT, to extend its applicability to a wide range of practical systems, and to apply the developed techniques to enhance the monitoring of PV systems.

First, to improve the performance of the GLRT, a new statistical fault detection method based on combining the advantages of the exponentially weighted moving average (EWMA) filter with those of the GLRT will be developed. The developed method, called EWMA-based GLRT, will provide improved properties, such as smaller missed detection and false alarm rates and a smaller average run length. The second objective of this project is to extend the applicability of the developed GLRT methods to a wide range of practical systems. Most real systems are nonlinear and multivariate and are best represented by input–output models. Latent variable models, such as partial least squares (PLS), have been widely used to represent such systems. Therefore, in this project, linear and nonlinear PLS-based GLRT and EWMA-based GLRT methods will be developed to widen the applicability of these techniques in practice. For nonlinear systems, kernel PLS (kPLS), which is capable of dealing with high-dimensional nonlinear data, will be used to make such an extension.
Also, in most practical situations, fault detection is needed online, that is, as the data are measured. The nonlinear latent variable models, however, are batch, that is, they require the entire data sets to be available a priori.

Therefore recursive kPCA and kPLS modeling schemes will be developed to extend the advantages of the GLRT methods for online systems. Also, practical data are usually contaminated with measurement errors. Therefore multiscale representation of data, which is an effective tool for dealing with measurement noise, will be used to further enhance the performances of the fault detection techniques developed in this project. Finally, the developed fault detection techniques will be utilized in practice to help improve various applications. First, they will be used to enhance monitoring the operation of grid-connected photovoltaic power systems through monitoring some of the key variables involved in these systems. Validation of the developed techniques will be made using real PV system data obtained in the “Smart Grid Center” at Texas A&M University at Qatar. Also, the univariate fault detection techniques will be used to provide more accurate detection of aberrations in genomic copy number data, which will help pave the road to better patient-specific diagnosis of diseases and more personalized medicine.

Technical objectives

Process monitoring is becoming increasingly important in various applications to ensure safe, reliable, efficient, and economical operation of many engineering systems. For example, a failure in a power system is not only inefficient from a productivity point of view but can also be disastrous from a safety perspective. Also, effective detection of aberrations in genomic data has been shown to help provide better diagnosis of major diseases, such as cancer, which can lead to more targeted (or personalized) medicine. Various fault detection techniques have been developed and used in practice. Statistical hypothesis testing-based fault detection methods, such as the generalized likelihood ratio test (GLRT), have been shown to be among the most effective univariate techniques. In a previous effort, we extended the GLRT to handle multivariate and nonlinear systems by developing principal component analysis (PCA)- and kernel PCA (kPCA)-based GLRT methods. This project has three main objectives:
• Enhance the performance of the GLRT by developing a new GLRT statistic with improved fault detection abilities,
• Widen the applicability of the GLRT methods to a broad range of systems and model structures by developing various multivariate, nonlinear, recursive, and multiscale modeling frameworks,
• Utilize the developed fault detection techniques in important practical applications by enhancing the monitoring of photovoltaic (PV) power
systems and by making a step forward toward better genomic data-based personalized medicine.

To achieve the first objective, enhancing the effectiveness of the GLRT method, a new method that combines the advantages of the exponentially weighted moving average (EWMA) filter with those of the GLRT will be developed. The developed method, referred to as EWMA-based GLRT, will provide fast and effective detection while maintaining a low false alarm rate. The idea behind the EWMA-based GLRT is to compute a new GLRT statistic that integrates current and previous data with exponentially decreasing weights, so that more recent data receive more weight. This will provide a more accurate estimate of the GLRT statistic and a longer memory, which will enable better decisions about fault detection.

The second objective of this proposal is to widen the applicability of the GLRT methods to a wide range of systems. Most practical systems are nonlinear and multivariate and are best described using input–output models, such as partial least squares (PLS) regression models. Therefore, to achieve this objective, linear and nonlinear PLS-based GLRT fault detection methods will be developed. To make the extension to nonlinear systems, kernel PLS (kPLS) will be used. Kernel latent variable regression (LVR) models rely on transforming the data into a higher-dimensional space in which the data can be described by a linear model, which makes the kernel-based approach an attractive choice for modeling nonlinear systems. Unfortunately, LVR models are batch models, that is, they require the process data to be available before the model is constructed. In most situations, however, fault detection is needed online, that is, as the data are collected from the process. Therefore recursive kPCA and kPLS modeling techniques will be developed to extend the advantages of the GLRT to online processes.
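The exponential weighting idea described above can be illustrated with a small sketch. It is only one plausible reading of the description, not the method proposed in this project: it assumes a univariate Gaussian residual with known standard deviation `sigma`, uses the squared standardized residual as the single-sample GLRT statistic for a mean shift of unknown size, and smooths that statistic with an EWMA. The function names, the forgetting factor `lam`, the threshold, and the toy residuals are all hypothetical.

```python
def glrt_statistic(residual, sigma):
    """Single-sample GLRT statistic for an unknown mean shift in Gaussian noise."""
    return (residual / sigma) ** 2


def ewma_glrt(residuals, sigma, lam=0.3):
    """Exponentially weighted GLRT statistic at every time step.

    Each new value mixes the current raw statistic (weight lam) with the
    previous smoothed value (weight 1 - lam), so older data fade out
    geometrically.
    """
    smoothed = []
    z = 1.0  # assumed start: fault-free expectation of the raw statistic
    for r in residuals:
        z = lam * glrt_statistic(r, sigma) + (1.0 - lam) * z
        smoothed.append(z)
    return smoothed


def first_alarm(statistics, threshold):
    """Index of the first statistic above the threshold, or None."""
    for t, value in enumerate(statistics):
        if value > threshold:
            return t
    return None


# Toy residuals (hypothetical): fault-free for five samples, then a mean shift.
residuals = [0.1, -0.2, 0.05, 0.15, -0.1, 1.5, 1.4, 1.6, 1.5, 1.7]
stats = ewma_glrt(residuals, sigma=0.5, lam=0.3)
alarm_index = first_alarm(stats, threshold=3.0)
```

A smaller `lam` gives the statistic a longer memory and fewer false alarms at the cost of a slower reaction; in practice the threshold would be chosen from the fault-free distribution of the statistic rather than fixed by hand as it is here.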
Also, measured data are usually contaminated with errors, which degrade their quality and limit their usefulness. Multiscale representation of data has been shown to improve the performance of various fault detection methods. Therefore multiscale representation will be used to further enhance the effectiveness of the fault detection methods developed in this project.

The third objective of this project is to use the developed fault detection techniques to (1) improve monitoring of the operation of PV systems and (2) better utilize genomic copy number data for more effective diagnosis of diseases. Grid-connected PV systems are among the fastest-growing power technologies. Therefore their proper operation and safe handling are top priorities. Various key variables will be monitored in PV
systems, which include the voltage and frequency of the grid, the voltage and current of the AC and DC converters, and climate data, such as temperature and irradiance. Tight monitoring of these variables will help provide more effective and less interrupted energy supplies. In this application the developed fault detection methods will be applied and validated using real data from the “Smart Grid Center” at Texas A&M University at Qatar, which is operated under Prof. Haitham Abu Rub. One of the PIs on this project is Dr. Mohamed Trabelsi, an assistant research scientist at the Smart Grid Center.

In addition, the univariate fault detection methods developed in this project, such as the EWMA-based GLRT, will be used to help detect abnormal changes (aberrations) in genomic copy number data. Medical practitioners can use such information in the diagnosis of various genetic diseases. Improved detection of such genetic aberrations can therefore help medical doctors provide case-specific treatment plans, which can be an important step toward more personalized medicine.

7.2.4 Project 4: enhanced data validation of an air quality monitoring networks

Proposal summary

Many human activities produce primary pollutants, such as nitrogen oxides (NO2 and NO), sulfur dioxide, and volatile organic compounds; secondary pollutants, such as ozone, are then formed in the lower atmosphere by chemical or photochemical reactions. A number of these pollutants are likely to cause problems for both human health and ecological systems. To support air quality management, air quality monitoring networks have the following missions: the production of data (pollutant concentrations and a range of meteorological parameters related to pollution events), including network management; the dissemination of data to keep the population and public authorities continuously informed; and surveillance with reference to standards. Because these networks sit at the intersection of economic, health, ecological, social, scientific, and technical interests, the validity of the data and the credibility of the delivered information are essential. Sensor data validation is therefore an issue of great importance for the development of reliable environmental monitoring and management systems. Until now, sensor data validation has been performed either with “outlier” detection methods, which identify only extreme values outside the measurement range, or manually by an operator. Unfortunately, this approach is too subjective and impractical in real time because of the high network dimensionality and the large amount of collected
data. However, in the field of fault diagnosis, more modern methods have been developed. Model-based diagnosis relies on the concept of information redundancy: it checks the consistency between the observed behavior of the process, provided by sensors, and the expected behavior, provided by a mathematical representation of the process. Analytical redundancy requires an explicit input–output relationship, which may be difficult to obtain (because of high process nonlinearity, complexity, and dimensionality). As an alternative, data-driven methods based on latent variables (LV), such as PCA, PLS, kernel PCA, and kernel PLS, can be very attractive for fault detection. The LV approach is used to model normal process behavior, and faults are then detected by comparing the observed behavior against this model. LV methods can also handle high-dimensional and correlated process variables.

Through their interactions, the different pollutants constitute a dynamic chemical system that is strongly influenced by atmospheric conditions. The physico-chemical mechanisms taking place are poorly understood, but these processes are clearly multivariable and strongly nonlinear. Furthermore, most existing models take into account an atmospheric chemistry of one hundred reactions, the emissions of primary pollutants, and the vertical and horizontal exchanges linked to movements of the atmosphere. These models therefore combine a large number of equations with numerous parameters whose values are inaccessible or unknown. They are thus very complex, computationally costly, and, above all, need measurements that are seldom available in air quality monitoring networks.
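The latent variable detection idea described above (model normal behavior, then compare new observations against the model) can be sketched with a one-component PCA model and the squared prediction error (SPE). Everything concrete here is an assumption made for illustration: the synthetic three-sensor data, the power-iteration routine used in place of a full eigendecomposition, and the empirical percentile used as the detection threshold.

```python
import math
import random


def column_means(rows):
    """Mean of every variable over the training rows."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def first_loading(centered, iterations=100):
    """Leading principal direction via power iteration on X^T X."""
    dim = len(centered[0])
    v = [1.0 / math.sqrt(dim)] * dim
    for _ in range(iterations):
        scores = [sum(a * b for a, b in zip(row, v)) for row in centered]
        w = [sum(s * row[j] for s, row in zip(scores, centered)) for j in range(dim)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def spe(sample, means, loading):
    """Squared norm of the part of the sample the PCA model cannot explain."""
    x = [a - m for a, m in zip(sample, means)]
    score = sum(a * b for a, b in zip(x, loading))
    return sum((a - score * p) ** 2 for a, p in zip(x, loading))


# Synthetic fault-free data: three correlated sensors driven by one latent factor.
random.seed(0)
training = []
for _ in range(200):
    t = random.gauss(0.0, 1.0)
    training.append([t + random.gauss(0.0, 0.05),
                     2.0 * t + random.gauss(0.0, 0.05),
                     -t + random.gauss(0.0, 0.05)])

means = column_means(training)
centered = [[a - m for a, m in zip(row, means)] for row in training]
loading = first_loading(centered)

# Assumed threshold: roughly the 99th percentile of the training SPE values.
training_spe = sorted(spe(row, means, loading) for row in training)
threshold = training_spe[int(0.99 * len(training_spe)) - 1]

normal_sample = [1.0, 2.0, -1.0]   # consistent with the sensor correlations
faulty_sample = [1.0, 2.0, 1.0]    # third sensor reads far from its expected value
normal_flag = spe(normal_sample, means, loading) > threshold
faulty_flag = spe(faulty_sample, means, loading) > threshold
```

The faulty sample breaks the correlation structure learned from the training data, so its SPE is large even though each individual reading lies within the normal measurement range, which is exactly the kind of fault a range-based outlier check would miss.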
Sensor data validation in an air quality monitoring network therefore requires the following steps:
• Process modeling,
• Sensor fault detection,
• Sensor fault isolation, and
• Fault identification and replacement values for faulty measurements.

In this project the main objectives are to improve the performance of the latent variable process modeling, monitoring, and diagnosis methods, to extend their applicability to a wide range of practical systems, and to apply the developed techniques to enhance the operation and performance of air quality systems. Real situations are more complex than those encountered in simulation or on a laboratory pilot, which is generally well controlled and for which the assumptions made are more easily testable. In addition, a laboratory pilot is easy to instrument, and most variables are accessible to measurement. Models of physical phenomena introduce
many parameters, some of which are difficult to obtain and calibrate from a practical point of view. To overcome this problem, a significant modeling effort is necessary, one that takes into account model and measurement inaccuracies and uncertainties. Therefore, to improve the performance of the linear and nonlinear input-space (PCA and kernel PCA) and input–output (PLS and kernel PLS) modeling techniques, new statistical modeling methods that combine the advantages of LV methods with those of interval-valued methods will be developed. The developed methods, LV (PCA, PLS, kernel PCA, kernel PLS) and interval LV (ILV) (interval PCA, interval PLS, interval kernel PCA, interval kernel PLS), will provide improved properties, such as smaller modeling errors for both single-valued and interval-valued data. Thus the first objective of this proposal is to develop robust LV and interval LV models of the air quality process that can correctly estimate the concentrations of all the variables involved.

The second objective is to improve monitoring performance by using enhanced statistical fault detection methods that combine the advantages of the generalized likelihood ratio test (GLRT) with those of the single- and interval-valued LV methods. The developed methods, LV (PCA, PLS, kernel PCA, kernel PLS)- and interval LV (ILV) (interval PCA, interval PLS, interval kernel PCA, interval kernel PLS)-based GLRT, will provide improved properties, such as lower missed detection and false alarm rates and a shorter average run length.

The third objective is to extend the reconstruction principle proposed in the literature, which is generally used with PCA on single-valued data, to input–output models by using PLS and to nonlinear models by using kernel PCA and kernel PLS. Hence PLS-, kernel PCA-, and kernel PLS-based reconstruction methods will be developed.
Also, the developed methods will be extended to deal with interval-valued data, and thus interval PLS-, interval kernel PCA-, and interval kernel PLS-based reconstruction methods will be proposed to address the fault isolation problem for the air quality process and to correctly estimate the concentrations of all the variables involved.

The fourth objective of this proposal is the correction of the system. Once the faulty variable is identified, a corrective action is necessary to provide replacement values for the faulty measurements. Since the reconstruction approach uses the other variables to reconstruct the considered one, the faulty variable is reconstructed from the other variables (single fault case), and the reconstructions are taken as replacement measurements. It should be noted that this approach is easily extended to the case of multiple simultaneous faults
under some specific conditions (reconstruction conditions). Therefore the developed single and interval PLS-, kernel PCA-, and kernel PLS-based reconstruction methods will be used to carry out the correction phase.

The data collected from air quality monitoring networks (AQMNs) are generally noisy and multiscale in nature, which can greatly affect the quality of monitoring. Hence we propose to integrate wavelet-based multiscale representation of AQMN data with the proposed techniques. The various techniques developed in this project will be applied to enhance air pollution monitoring. We will seek to address the monitoring and diagnosis problems for the concentrations of various air pollutants, such as ozone, sulfur dioxide, carbon monoxide, nitrogen oxides, and dust particles. Practical air pollution data from different countries (e.g., Qatar, United States, and France) will be used to design monitoring systems that can provide early alerts when the concentrations of these pollutants change abnormally.
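The wavelet-based multiscale representation mentioned above can be sketched with the Haar wavelet, the simplest orthonormal wavelet. This is an illustrative assumption only: the text does not say which wavelet, how many decomposition levels, or which thresholding rule would be used, and the function names, hard threshold, and toy signal below are hypothetical. The signal length is assumed to be divisible by two at every level.

```python
import math


def haar_decompose(signal):
    """One level of the orthonormal Haar transform (even-length input)."""
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail


def haar_reconstruct(approx, detail):
    """Invert one level of the Haar transform."""
    s = math.sqrt(2.0)
    signal = []
    for a, d in zip(approx, detail):
        signal.extend([(a + d) / s, (a - d) / s])
    return signal


def multiscale_denoise(signal, levels, threshold):
    """Zero out small detail coefficients at every scale, then reconstruct."""
    approx = list(signal)
    details = []
    for _ in range(levels):
        approx, detail = haar_decompose(approx)
        details.append([d if abs(d) > threshold else 0.0 for d in detail])
    for detail in reversed(details):
        approx = haar_reconstruct(approx, detail)
    return approx


# Toy signal (hypothetical): a step change corrupted by small alternating noise.
clean = [1.0] * 8 + [3.0] * 8
noisy = [c + 0.1 * (-1) ** i for i, c in enumerate(clean)]

denoised = multiscale_denoise(noisy, levels=2, threshold=0.5)
unchanged = multiscale_denoise(noisy, levels=2, threshold=0.0)
```

With a zero threshold the transform reproduces its input, which is a quick check that the decomposition is invertible. With a positive threshold the small fine-scale coefficients are removed while the coarse-scale approximation, which carries the step, is kept; a fault detection statistic would then be computed from the filtered data.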

Technical objectives

Process monitoring is becoming increasingly important in various applications to ensure safe, reliable, efficient, and economical operation of many engineering systems. For example, a failure in environmental processes is not only inefficient from a productivity point of view but can also be disastrous from a safety perspective. Effective detection of aberrations in such processes has been shown to help provide better diagnosis of major diseases, which can lead to the development of more targeted treatments. Abnormal atmospheric pollution levels negatively affect public health, animals, plants, and the climate and damage natural resources. Therefore monitoring air quality is also crucial for the safety of humans and the environment. Thus the main aim of this proposal is to enhance the operation and performance of the air quality monitoring process.

In addition, real-world data analysis is often affected by different types of errors, such as measurement errors, computation errors, and imprecision related to the method adopted for estimating the data. The uncertainty in the data, which is directly connected to these errors, may be treated by considering the interval of values into which each measurement may fall rather than a single value. The true value of a quantity is an idealized concept: in almost all cases it cannot be measured, and the data collected on a process are only approximations given by sensors and are thus imprecise. This is due mainly to the uncertainties induced by measurement errors or determined by specific experimental conditions. Statistical methods have mainly been developed for the analysis of single-valued variables. However, in real life, there are
many situations in which the use of these variables may cause a severe loss of information. For quantitative variables, there are many cases where more complete information can be obtained by describing a set of statistical units in terms of interval data. For example, daily temperatures recorded as minimum and maximum values offer a more realistic view of the variations in weather conditions than simple average values do. Another example is air quality data (O3, NO2, and NO), where each concentration measurement is taken as the mean of several measurements over 15 minutes (the sample time); the minimum and maximum concentrations recorded over those 15 minutes are more relevant information for experts evaluating the trend and variability of pollutant concentrations. Therefore, in this project, we propose to develop enhanced fault detection and isolation methods using latent variable models for single-valued and interval-valued data and then to use these techniques to improve monitoring of the air quality monitoring network (AQMN) process.

The main objective of this project is to enhance the operation and performance of the AQMN through the development and integration of innovative modeling, monitoring, and diagnosis techniques. Several objectives will be pursued. The first objective is to improve the performance of the linear and nonlinear input-space (PCA and kernel PCA) and input–output (PLS and kernel PLS) modeling techniques by using single- and interval-valued methods. The developed methods, LV (PCA, PLS, kernel PCA, kernel PLS) and interval LV (ILV) (interval PCA, interval PLS, interval kernel PCA, interval kernel PLS), will provide improved properties, such as smaller modeling errors for both single-valued and interval-valued data.
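The interval-valued description of measurements discussed above (the minimum and maximum over each sample period alongside the mean) can be sketched as follows. The window length, the readings, and the function names are hypothetical, and the midpoint and radius form is shown only because it is a common way to represent intervals numerically; the text does not state which interval representation the proposed methods would use.

```python
def to_intervals(readings, window):
    """Summarize consecutive, non-overlapping windows of raw readings.

    Each window yields its lower bound, upper bound, and mean; a trailing
    partial window is dropped.
    """
    intervals = []
    for start in range(0, len(readings) - window + 1, window):
        chunk = readings[start:start + window]
        intervals.append({
            "lower": min(chunk),
            "upper": max(chunk),
            "mean": sum(chunk) / len(chunk),
        })
    return intervals


def center_and_radius(interval):
    """Midpoint and half-width of an interval."""
    center = (interval["lower"] + interval["upper"]) / 2.0
    radius = (interval["upper"] - interval["lower"]) / 2.0
    return center, radius


# Hypothetical raw pollutant readings, four per sample period.
readings = [40.0, 42.0, 47.0, 41.0, 60.0, 55.0, 58.0, 90.0]
intervals = to_intervals(readings, window=4)
summary = [center_and_radius(iv) for iv in intervals]
```

In the second window the mean (65.75) hides a short excursion to 90, whereas the interval bounds keep it, which is the loss of information that single-valued averages cause.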
The aim of this first task is therefore to develop robust LV and interval LV models of the air quality process that can correctly estimate the concentrations of all the variables involved.

The second objective of this proposal is to improve monitoring performance using a new statistical fault detection method that combines the advantages of the generalized likelihood ratio test (GLRT) with those of the single- and interval-valued LV methods. The developed PCA-, PLS-, kernel PCA-, and kernel PLS-based GLRT methods and their interval counterparts (interval PCA-, interval PLS-, interval kernel PCA-, and interval kernel PLS-based GLRT) will provide improved properties, such as lower missed detection and false alarm rates and a shorter average run length, for both single-valued and interval-valued data.

The third objective is to develop robust fault identification techniques; to do that, we propose to extend the PCA-based reconstruction method for single-valued data to deal with nonlinear models by using the kernel PCA method
and input–output models by using the PLS and kernel PLS methods. Hence, kernel PCA-, PLS-, and kernel PLS-based reconstruction methods will be developed. Also, the developed methods will be extended to deal with interval-valued data, and thus interval kernel PCA-, interval PLS-, and interval kernel PLS-based reconstruction methods will be proposed to address the fault isolation problem for the air quality process and to correctly estimate the concentrations of all the variables involved.

The fourth objective is the correction of the system. Once the faulty variable is identified, a corrective action is necessary to provide replacement values for the faulty measurements. Since the reconstruction approach uses the other variables to reconstruct the considered one, the faulty variable is reconstructed from the other variables (single fault case), and the reconstructions are taken as replacement measurements. It should be noted that this approach is easily extended to the case of multiple simultaneous faults under some specific conditions (reconstruction conditions). Therefore the developed single and interval kernel PCA-, PLS-, and kernel PLS-based reconstruction methods will be used to carry out the correction phase.

The data collected from AQMNs are generally noisy and multiscale in nature, which can greatly affect the quality of monitoring. Hence we propose to integrate wavelet-based multiscale representation of AQMN data with the proposed techniques. The developed techniques will be used to enhance modeling, monitoring, and diagnosis of the concentration levels of various air pollutants, such as ozone, nitrogen oxides, sulfur oxides, dust, and others. Real air pollution data from various countries (e.g., Qatar, United States, and France) will be used in this important application.
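The single-fault correction step described above can be sketched for the linear PCA case. The text gives no formula, so the sketch assumes the closed form commonly used for PCA-based reconstruction: with projection matrix C = P P^T built from the retained loading vectors, the replacement value for variable i is the sum over j ≠ i of c_ij x_j, divided by (1 − c_ii). The loading vector and the sample are hypothetical, and the sample is assumed to be already mean-centered.

```python
import math


def projection_matrix(loadings):
    """C = P P^T, with the retained loading vectors given as a list."""
    dim = len(loadings[0])
    return [[sum(p[i] * p[j] for p in loadings) for j in range(dim)]
            for i in range(dim)]


def reconstruct_variable(sample, c, index):
    """Estimate one variable from all the others using the PCA model."""
    numerator = sum(c[index][j] * sample[j]
                    for j in range(len(sample)) if j != index)
    denominator = 1.0 - c[index][index]
    if abs(denominator) < 1e-12:
        # Reconstruction condition violated: the model carries no
        # information about this variable from the other variables.
        raise ValueError("variable cannot be reconstructed from the others")
    return numerator / denominator


# Hypothetical one-component model in which x2 = 2 * x1 and x3 = -x1.
norm = math.sqrt(6.0)
loading = [1.0 / norm, 2.0 / norm, -1.0 / norm]
c = projection_matrix([loading])

faulty_sample = [1.0, 2.0, 1.0]      # third sensor is wrong; -1.0 is expected
replacement = reconstruct_variable(faulty_sample, c, index=2)
corrected_sample = faulty_sample[:2] + [replacement]
```

The guard on the denominator is a simple instance of a reconstruction condition: a variable that the model cannot relate to the remaining variables has no usable replacement value. Extending the idea to PLS, kernel, or interval-valued models would need the corresponding model-specific reconstruction, which is not attempted here.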