Comparison of different predictive models for nutrient estimation in a sequencing batch reactor for wastewater treatment

Comparison of different predictive models for nutrient estimation in a sequencing batch reactor for wastewater treatment

Chemometrics and Intelligent Laboratory Systems 84 (2006) 75 – 81 www.elsevier.com/locate/chemolab Comparison of different predictive models for nutr...

362KB Sizes 0 Downloads 50 Views

Chemometrics and Intelligent Laboratory Systems 84 (2006) 75 – 81 www.elsevier.com/locate/chemolab

Comparison of different predictive models for nutrient estimation in a sequencing batch reactor for wastewater treatment D. Aguado a,⁎, A. Ferrer b , A. Seco c , J. Ferrer a a

b

Department of Hydraulic Engineering and Environment, Technical University of Valencia, Camino de Vera s/n, 46022 Valencia, Spain Department of Applied Statistics, Operations Research and Quality, Technical University of Valencia, Camino de Vera s/n, 46022 Valencia, Spain c Department of Chemical Engineering, University of Valencia, Doctor Moliner, 50. 46100-Burjassot, Valencia, Spain Received 16 September 2005; received in revised form 16 March 2006; accepted 30 March 2006 Available online 11 July 2006

Abstract In this paper different predictive models for nutrient estimation in a sequencing batch reactor (SBR) for wastewater treatment are compared: principal component regression (PCR), partial least squares (PLS), and artificial neural networks (ANNs). Two unfolding procedures were used: batch-wise and variable-wise. For the latter unfolding method, X and Y matrix augmentation with lagged variables were used in some models to incorporate process dynamics. The results have shown that batch-wise unfolding PLS models outperform the other approaches. The ANN models are good predictive models, but in this particular case-study, they do not outperform those multivariate projection models that use the same explanatory and response variables. Moreover, the latter models allow assessing if new data is similar to that used for model building (avoiding dangerous extrapolations), are very efficient in handling missing data, exhibit good predictive performance using only data collected from the sensors installed in the SBR and can provide useful information for process understanding. © 2006 Elsevier B.V. All rights reserved. Keywords: ANN; Enhanced biological phosphorus removal; PCR; PLS; Sequencing batch reactor; Soft-sensor; Wastewater

1. Introduction Monitoring and automation of wastewater treatment processes has become essential to make their operation easier, saving manpower and energy. On-line sensors constitute an important source of information about the state of the plant that can be used by plant operators to apply real-time control strategies to optimise process performance. Most of on-line sensors for measuring key quality variables (phosphorus concentration, ammonium concentration, …) are quite expensive, require excessive maintenance and, thus, are not usually available in small wastewater treatment plants (WWTPs). Nevertheless, there are other kinds of on-line sensors (pH, electric conductivity, …) which are inexpensive, reliable and low-maintenance, but their information is not directly related to process performance. ⁎ Corresponding author. Tel.: +34 963879341. E-mail addresses: [email protected] (D. Aguado), [email protected] (A. Ferrer), [email protected] (A. Seco), [email protected] (J. Ferrer). 0169-7439/$ - see front matter © 2006 Elsevier B.V. All rights reserved. doi:10.1016/j.chemolab.2006.03.009

Sequencing batch reactor (SBR) systems have demonstrated that they are specially well suited for nutrient removal in small WWTPs and are a key technology for the removal of organic carbon, nitrogen and phosphorus from domestic and industrial wastewaters [1]. Their flexibility makes them suitable for meeting many different effluent quality limits. Full-scale applications are rapidly increasing in number worldwide. Enhanced biological phosphorus removal (EBPR) is a wastewater treatment process utilized to achieve effluents with low phosphorus concentrations. This process is based on the capability of polyphosphate accumulating organisms (PAOs) to store phosphate in higher quantities than just nutritional requirements. To enrich the biomass with PAOs, it is necessary to alternate the sludge through anaerobic and aerobic conditions and the presence of volatile fatty acids in the anaerobic stage. A detailed description of the EBPR process can be found elsewhere (e.g. [2]). In SBR systems operated for EBPR, the phosphorus concentration profile in each cycle (batch) is considered the main key

76

D. Aguado et al. / Chemometrics and Intelligent Laboratory Systems 84 (2006) 75–81

Table 1 Experimental conditions Runs

SRT (days)

HAc (mg COD l− 1)⁎

P–PO4 (mg l− 1)⁎

K (mg l− 1)⁎

Mg (mg l− 1)⁎

20

8, 9 and 10

80 to 135

3 to 15

18 to 30

19 to 38

* Initial concentrations.

indicator of process performance. This quality variable can be measured on-line by means of special sensors, although due to the aforementioned drawbacks, phosphorus concentration is usually determined off-line from samples taken periodically from the process and analysed in a quality control laboratory. This procedure has several disadvantages, and consequently, there is a strong interest in using the information contained in data recorded on-line from the process in order to develop a soft-sensor for estimating phosphorus concentrations. The use of soft-sensors has both practical and economical advantages since no extra maintenance nor chemical consumption is required. Several researches have been carried out to develop soft-sensors for SBR systems [3,4]. These authors used artificial neural networks (ANNs) as it has proven their ability to model non-linear systems without requiring a structural knowledge of the process. The aim of this work is to develop different predictive models in order to find the most feasible soft-sensor for a SBR operated for EBPR. A comparison has been made between models built using principal component regression (PCR), partial least squares (PLS) and artificial neural networks (ANNs). Two unfolding procedures have been used: batch-wise unfolding and variablewise unfolding. For the latter unfolding method, X and Y matrix augmentation were used in some models to incorporate process dynamics. Although both unfolding methods are commonly used for analysis of batch process data, each one corresponds to looking at a different type of variability [5]. For a detailed description of this issue see Refs. [6] and [7]. 2. Materials and methods

The working volume of the SBR was 7 l and the overall hydraulic retention time 12 h. Biomass was daily wasted from the system to maintain the sludge retention time at the desired value. The reactor was equipped with a mechanical mixer, an air diffuser and pH, electric conductivity (Cond), oxidation reduction potential (ORP), dissolved oxygen (DO) and temperature (Temp) probes. During the aerobic stage, an on–off controller was used to maintain the DO concentration around 3 mg l− 1. The experimental period was divided into 20 runs with different solids retention time (SRT) as well as different initial concentrations of acetate, phosphorus (P), potassium (K) and magnesium (Mg), as it is summarized in Table 1. During this period, the SBR was seeded three times (i.e., three process start-ups) with activated sludge from a full-scale WWTP operated for biological nutrient removal (Quart–Benàger WWTP, Valencia, Spain). The SBR was operated under constant conditions until a steady state was reached. Then it was extensively sampled to characterize the EBPR performance. At each run, samples were withdrawn at regular time intervals (every 15 min, see Fig. 1), and analysed for phosphorus (ascorbic acid method; [8]). After each characterization, the operating conditions of the SBR were changed, and the system operated to reach a new steady state. 2.2. Data set and data analysis Data from the five on-line sensors were collected approximately every 1.06 min (i.e. 340 time points per cycle), while the experimental data (determined off-line in the quality control laboratory) every 15 min (i.e. 19 analysed samples in each run). In this study, data from the 3 main stages (anaerobic, aerobic and settle) were used. As their duration is the same in every cycle, data can be structured in a three-way matrix (X) and a two-way matrix (Y): X consisting of 20 batches by 5 process variables by 340 instances of time and Y consisting of 20 batches by 1 quality variable by 19 instances of time. The three-way matrix was unfolded into a bi-directional structure using the two common ways of unfolding:

2.1. Pilot plant A laboratory SBR was operated with four 6-h cycles per day in a temperature controlled room (20 ± 1 °C). Each cycle consisted of several phases with constant duration: fill (2 min), anaerobic (1.5 h), aerobic (3 h), settle (1.5 h) and draw (2 min). The SBR was automated by means of a personal computer and a time controller. The experiments were carried out using synthetic wastewater.

a) Batch-wise unfolding, in which batch direction is maintained while variable and time modes are unfolded. b) Variable-wise unfolding, in which variable direction is kept unaltered while batch and time modes are unfolded. Some of the predictive models were built using artificial neural networks (ANNs). There are many types of ANNs, but in

Fig. 1. Schematic diagram of the quality data collected in each run.

D. Aguado et al. / Chemometrics and Intelligent Laboratory Systems 84 (2006) 75–81

77

Table 2 Summary of the models' structure Unfolding

Input

Output

Models

Batch-wise

[xT1, … xTK] [xTr1, … xTrK] [xTk ] [xTk,yk−15] [xTk, xTk−1, xTk−2 … xTk−15]

yT yT yk yk yk

[xTk, xTk−1, xTk−2 … xTk−15, yk−15]

yk

[xTk ] [xTk, yk−15]

yk yk

M1 (PLS1700,18) M2 (PLS680,18) M3 (PLS5,1) and M4 (PCR5,1) M5 (PLS6,1) and M6 (PCR6,1) M7 (PLS80,1) and M8 (PCR80,1) M9 (PLS81,1), M10 (PLS81−b,1) and M11 (PCR81,1) M12 (MLP5,1) M13 (MLP6,1)

Variable-wise

the WWTPs field with predictive purposes the most widely used is the feed-forward configuration. Therefore, multilayer perceptrons (MLPs) were used in this application. The optimal architecture for the ANNs was determined by varying the number of neurons in the hidden layer and using cross-validation technique to assess the predictive ability of the developed models. Only one hidden layer architecture was tested since it has been demonstrated by other researchers that any continuous multivariate function can be approximated to any desired degree of accuracy as long as a sufficient number of hidden neurons are provided and non-linear sigmoid functions are used (e.g. [9]). That is, sufficient degrees of freedom can always be provided by changing the number of hidden neurons. During the training process, the mean squared error function was minimised by adjusting the network parameters using the error backpropagation algorithm with momentum. As in this process the ANN can converge in a poor local minima, each ANN was trained with 20 different initialisations (using initial random weights between −1 and 1). For model development, 15 batches from the 3 seedings were used to allow the models to describe all the operating conditions of the system. The remaining 5 batches (also distributed among the 3 seedings) were used to validate and compare all the models in terms of their predictive capability. In one of the validation batches the conductivity trajectory was not available and it was decided to utilize this batch separately from the other 4 validation batches to assess how the different developed models were able to handle missing data. Several statistical software packages have been used in this work: SIMCA-P 9.0, MATLAB 6.1, STATGRAPHICS PLUS 5.1 and Stuttgart Neural Network Simulator (SNNS) 4.2.

output data, and the method used (PCR, PLS or MLP) with the number of input and output variables in brackets. (E.g., the PLS1700,18 model has 1700 explanatory variables and 18 response variables). In this table, xk is the input vector (ΔpH, ΔCond, ΔORP, DO, ΔTemp) at time instant k, K is the last time instant of the batch, xrk is the input vector with the two most predictive sensors (ΔpH, ΔCond) at time instant k, yk is the output variable at time instant k (ΔPk), and y is the output vector which consists of the output variable at every time instant of each batch, that is, 18 values of the quality variable trajectory (note that 19 samples were determined in the quality control laboratory but as the models are built to predict the transformed trajectory, ΔP, the first value is always 0 for every batch). It must be taken into account that the dynamics of the process is modelled when using the batch-wise unfolding, since the autocorrelations and cross-correlations of the variables included in the model are considered. Nevertheless, in the variable-wise unfolding only the instantaneous cross-correlation between the variables is considered. Therefore, in some of the models that use this unfolding method, the X and Y matrices were expanded (lagging the variables) in order to incorporate process dynamics. The difference between M9 and M10 models is related to the data preprocessing. In M9 all variables were scaled to unit variance, while in M10 block scaling was used: the first five blocks consisted of all the lagged variables coming from each of T T T the sensors (xkT, xk−1, , xk−2 … xk−15 ); the last block was the output variable at time instant k − 15 (yk−15). In this way, the more

3. Results and discussion

M1 M2 M3 M4 M5 M6 M7 M8 M9 M10 M11 M12 M13

As it was observed that process and quality variables showed similar trends in every run (batch) but their initial values were quite different, a preliminary study was performed in order to find if a relationship between them could be established. Based on the results of this study it was decided to transform original variables using the difference (Δ) between each variable value and its value at the beginning of the batch. In Table 2 some of the developed models are summarized, indicating the unfolding procedure, the structure of input and

Table 3 Main features of the developed models Model PLS1700,18 PLS680,18 PLS5,1 PCR5,1 PLS6,1 PCR6,1 PLS80,1 PCR80,1 PLS81,1 PLS81−b,1 PCR81,1 MLP5,1 MLP6,1

N Comp

SPE

5 4 3 1; (1) 2 1; (2) 3 8; (6) 3 3 8; (7) 4 1

1092.805 885.240 6651.277 32883.320 2147.108 5271.002 10440.620 8281.894 9120.564 2083.466 1313.392 5768.910 1807.204

78

D. Aguado et al. / Chemometrics and Intelligent Laboratory Systems 84 (2006) 75–81

frequently measured variables do not have a priori more importance in the resulting model. The batch-wise unfolding and the explanatory variable lagging used in the variable-wise unfolding with the multivariate projection models were not considered with the ANNs to avoid model overfitting. A summary of the results of the 13 models is presented in Table 3, including: – The number of components or neurons in the hidden layer (N Comp). For PLS and PCR models, it is the number of statistically significant components according to cross-validation, while for MLP models it is the number of neurons in the hidden layer (the optimal architecture for this data). Particularly, in PCR models two values are shown: firstly, the number of statistically significant components obtained in the PCA stage, and secondly, in brackets the number of statistically significant components in the regression stage. As in the PCR models that include the response variable dynamics this was directly incorporated in the regression stage, the number of statistically significant components in this stage could be higher than the number of components obtained from the PCA model. – The squared prediction error (SPE) for the validation data, specifically for the 4 batches without missing data. This parameter reflects the predictive performance of the developed models because it was calculated for batches which were not used in the model building stage. Note that the number of response variables in the models is different depending on the unfolding procedure used, and for this reason it was not considered appropriate to use the goodness of fit index (R2y) to compare the models. In order to compare the mean value of the SPE in the 13 models developed, an analysis of variance (ANOVA) was carried out. Given the positive skewness of the SPE, a logarithmic transformation (ln) was applied to the SPE variable. In Fig. 2 the least significance difference (LSD) intervals for the average of ln(SPE) are shown. As it can be seen in Table 3 and Fig. 2, M4 model shows the worst performance (from the SPE point of view), which can be due to the fact that only one component resulted statistically significant according to cross-validation in the PCA model. M3 and M12 models using the same inputs show a much better performance. Although the average of ln(SPE) in the ANN model (M12) is lower than in the PLS model (M3), their LSD intervals overlap and, thus, the difference is not statistically significant. When the response variable dynamics was incorporated (M5, M6 and M13), a significant improvement in the predictive performance was obtained in all models, and again there was no statistically significant difference between the PLS (M5) and ANN (M13) models. Therefore, using the same data structure, the predictive performance of the models were similar regardless of the method used (PLS versus ANN). In the case of the PCR models, the results depended on the number of statistically significant components in the PCA stage. M7 and M8 models included the explanatory variables dynamics. No significant improvement in the predictive perfor-

Fig. 2. Least significance difference (LSD) intervals for the average of the logarithm of the squared prediction error (SPE) for the validation data in the 13 models developed.

mance was achieved in the PLS model (M7). However, the PCR model (M8) improved because in this case, more than one component was obtained from the PCA stage (Table 3). The explanatory and response variables dynamics was considered in M9, M10 and M11 models. It can be seen that the more weight was given to the response variable dynamics, the more improved the predictive performance of the model. The excellent performance of the PCR model (M11) was due to the fact that the lagged response variable was directly incorporated in the regression stage. Fig. 2 clearly reveals that M2 and M11 models show the best predictive performance and are statistically equivalent. The following group of models with better results and similar performances is made up of M1, M5, M10 and M13. It should be emphasized that those models that use the variable-wise unfolding require the incorporation of the lagged response variable (and in some cases also the dynamics of the explanatory variables) in order to achieve similar performance to the batchwise unfolding PLS models. As already commented, in these models, the auto-correlations and cross-correlations within the entire batch among the variables included are built in by design. To appreciate the good results of the best models developed (M2 and M11), the predicted values from both models for the four validation runs without missing data are compared to the experimental values in Fig. 3. It should be noted that information on the initial phosphorus concentration is required if phosphorus concentrations is the desired response variable instead of its variations with respect to the concentration at the beginning of the run. Bearing in mind that the experimental data from the validation set were not used for building the models, these results demonstrate that these models can be used to predict with reasonable accuracy the phosphorus behaviour in the SBR. It is important to highlight that M2 model not only shows a good predictive ability but also has the advantage of using as input only data from the on-line, inexpensive, reliable and low maintenance sensors installed in the SBR, which means that it is not necessary to determine in laboratory the trajectory of the quality variable (phosphorus concentration). It should be noted

D. Aguado et al. / Chemometrics and Intelligent Laboratory Systems 84 (2006) 75–81

79

Fig. 3. Experimental and predicted values of phosphorus transformed trajectory (ΔP) in the four validation runs without missing data for the two best models M2 (PLS680,18) and M11 (PCR81,1). Note that ΔP − PO4 is the difference between each P − PO4 value and P − PO4 at the beginning of each run (batch).

that since ΔpH and ΔCond are the only trajectories needed as input in M2 model, these are the trajectories more related to the quality variable (ΔP) and with the highest predictive performance. If phosphorus concentration is the desired response variable, just one analytical determination is needed to know the entire trajectory. Therefore, using M2 as a soft-sensor, traditional laboratory analysis can be significantly reduced and more batches with information on process performance will be available, thus, resulting in a practical and cost-effective tool. When phosphorus concentration trajectory is determined in a given batch, this information can be used to assess model validity, that is, to verify if the model is performing well or its coefficients need to be updated. Additionally, in this model (M2) as well as in the remaining multivariate projection models, it is possible to test if new data is similar to that used for model building by monitoring the observation residuals (in X). This is of great practical utility to detect and avoid dangerous extrapolations. The major drawback of M2 model is that it requires the entire trajectories of the explanatory variables to be known, that is, the predictions are done when the batch has been completed. This is not necessary in any of the variable-wise unfolding models, where on-line prediction is possible in a straightforward way. However, M2 model could be applied on-line using any of the methods proposed by Nomikos and MacGregor [6]: filling in the future observations with zeros, with the current deviation, or estimating the future observations with the fitted model (missing data option, MD). García-Muñoz et al. [10] have shown that the MD option results in the best predictions for the future unknown part of the batch, even from the beginning. At this point, it should be em-

phasized that the stability property of the estimated scores becomes specially important when developing a predictive model, because if these estimations are smooth and nearly constant even at the beginning of the batch, relatively accurate predictions for the response variable will be obtained from the beginning of the batch. Therefore, the use of the MD option for the future unknown part of the batch is the most suitable alternative in this particular application. In this work the implemented MD option was based on the Trimmed Score Regression method (TSR) proposed by Arteaga and Ferrer [11,12], using the X-weighting matrix of the PLS model instead of the PCA loadings used by these authors.

Fig. 4. Experimental and on-line predicted values of phosphorus transformed trajectory (ΔP) in the validation run number 3 using M2 (PLS680,18) model. At each time instant, the future unknown part of the batch was estimated using the missing data option, specifically the TSR method.

80

D. Aguado et al. / Chemometrics and Intelligent Laboratory Systems 84 (2006) 75–81

Fig. 5. Neural network approach to predict the response variable (ΔP) when data from the conductivity sensor is not available.

TSR is a least squared-based method that uses the known score matrix from the training data together with the PLS X-weights and the available measurements to estimate the score vector. A summary of the procedure used in this work is presented next. Firstly, at each particular time instant of an evolving batch the future unknown observations are filled in with zeros and the reconstructed batch is projected onto the PLS X-weight vectors, yielding the trimmed score vector for that particular batch at that particular time instant. Exactly the same procedure is done with the batches from the training data set. Afterwards, these trimmed scores from the training data set are used as explanatory variables to predict the “true scores” via least squares regression. Then, this regression model is used to predict the score estimation for the evolving batch using the trimmed score vector obtained before. Finally, this estimated score together with the PLS Y-weight vectors are used to calculate the predictions for the response trajectory (ΔP). As an example, in Fig. 4 the predicted values from M2 model at different time instants of validation run number 3 are compared to the experimental values. This figure shows that from time instant k = 15 onwards, relatively good predictions for the response trajectory (ΔP) are obtained, and as more on-line data from the sensors are available the estimations get closer to the final predictions (when the batch is completed). As it was previously mentioned, in the last validation batch, the conductivity trajectory was not available and it was decided to use it to assess how the different models were able to predict the response variable in this situation. Due to the fact that missing measurements are common in modern automated processes as a result of different causes (sensor failures, sensor maintenance programs, …), it is important for the soft-sensor to be efficient

dealing with this problem. When using multivariate projection models the occurrence of missing data can be handled, but for the MLP modelling a distinction between two cases can be made. Firstly, model development (training stage) in which the data set should not contain missing data; and secondly, model application, in which different alternatives can be used: – Deleting observations with missing data. This choice is not reasonable since the loss of valuable information is a clear disadvantage. – Using missing data imputation algorithm. In this way, information contained in the training data set is exploited. Within this option, two choices are possible: → Replacing missing data with the mean from the training data. In a batch process, the mean can be calculated either from the variable-wise unfolding or from the batch-wise unfolding. The latter would be the most reasonable choice. → Estimating missing data with a new MLP model. In the data set analysed, the development of a new MLP model resulted statistically equivalent to replacing missing data with the batch-wise mean when using M12, and showed better performance when using M13. For this reason, the results that will be shown are those obtained using the two new MLP models developed (MLP4−c,1 and MLP5−c,1) to estimate the conductivity trajectory. The same 15 batches used to train the previously described models (Table 2) were used to train these neural networks. The new MLP models were used together with M12 and M13 models to predict the response variable (ΔP), as seen in Fig. 5. The estimations of all models for the last validation batch are presented in Fig. 6. It can be seen that when variable-wise

Fig. 6. Experimental and predicted values of phosphorus transformed trajectory of all models in the validation run number 5 (conductivity trajectory was missing): a) the 7 models which gave the best predictions b) the 6 models which gave the worst predictions.

D. Aguado et al. / Chemometrics and Intelligent Laboratory Systems 84 (2006) 75–81

unfolding is used, only those models which incorporate the response variable dynamics are able to give reasonable predictions. Note that accurate predictions using only data from the sensors installed in the SBR were obtained by those models developed using the batch-wise unfolding. These results are again emphasizing the importance of considering the auto-correlations and cross-correlations of process variables during the complete batch in the development of a soft-sensor for the system. 4. Conclusions In this paper different predictive models have been built in order to find the most feasible soft-sensor for a SBR operated for EBPR. A comparison has been made between models built using principal component regression (PCR), partial least squares (PLS) and artificial neural networks (ANNs). Different unfolding methods have been considered. The results have shown that batch-wise unfolding PLS models outperform the other approaches, because they incorporate the structure of auto-correlations and cross-correlations of the variables along the entire batch. Moreover, pH and conductivity trajectories constitute the source of information with more predictive contribution to phosphorus trajectory. Regarding the ANN models, they are good predictive models, but in this particular case-study, they do not outperform those multivariate projection models that use the same explanatory and response variables. Furthermore, the latter models allow assessing if new data is similar to that used for model building by monitoring the observation residuals (in X), are very efficient in handling missing data, and by examining the loadings, useful information about the process can be revealed. On the contrary, the structure of the ANN models is not easily interpretable. A soft-sensor for estimating phosphorus concentration profile in the system studied (SBR operated for EBPR) using as input only data from the on-line, inexpensive, reliable and low maintenance sensors installed in the SBR has been proposed. Therefore, the use of an expensive quality sensor whose cost makes it unaffordable for most small wastewater treatment plants can be

81

avoided. Moreover, by using this predictive model, traditional laboratory analysis can be significantly reduced and more batches with information on process performance will be available, thus, resulting in a practical and cost-effective tool. The main drawback of batch-wise unfolding PLS models is that they have to wait until the end of the batch for calculating the predictions. This inconvenience can be overcome by using missing data imputation methods as it has been illustrated. Research is being carried out to develop control strategies and optimise the process based on the information provided by these soft-sensors. Acknowledgements Financial support from MCYT (project PPQ2002-04043C02-01) is gratefully acknowledged. References [1] P.A. Wilderer, R.L. Irvine, M.C. Goronszy, Sequencing Batch Reactor Technology, IWA Publishing, London, 2001. [2] C.P.L. Grady Jr., G.T. Daigger, H.C. Lim, Biological Wastewater Treatment, 2nd ed., Marcel Dekker, New York, 1999. [3] D.S. Lee, J.M. Park, J. Biotechnol. 75 (1999) 229–239. [4] L. Luccarini, E. Porrà, A. Spagni, P. Ratini, S. Grilli, S. Longhi, G. Bortone, Water Sci. Technol. 45 (4–5) (2002) 101–107. [5] J.A. Westerhuis, T. Kourti, J.F. MacGregor, J. Chemom. 13 (1999) 397–413. [6] P. Nomikos, J.F. MacGregor, Technometrics 37 (1995) 41–59. [7] S. Wold, N. Kettaneh, H. Friden, A. Holmberg, Chemom. Intell. Lab. Syst., 44 (1998) 331–340. [8] American Public Health Association, American Water Works Association, Water Environmental Federation, in: L.S. Clesceri, A.E. Greenberg, A.D. Eaton (Eds.), Standard Methods for the Examination of Water and Wastewater, 20th ed., American Public Health Association, the American Water Works Association and the Water Environment Federation, Washington, D.C., 1998. [9] K. Hornik, M. Stinchcombe, H. White, Neural Netw. 2 (1989) 359–366. [10] S. García-Muñoz, T. Kourti, J.F. MacGregor, Ind. Eng. Chem. Res. 43 (2004) 5929–5941. [11] F. Arteaga, A. Ferrer, J. Chemom. 16 (2002) 408–418. [12] F. Arteaga, A. Ferrer, J. Chemom. 19 (2005) 439–447.