Evaluation of PCA and Gamma test techniques on ANN operation for weekly solid waste prediction


Journal of Environmental Management 91 (2010) 767–771

Contents lists available at ScienceDirect

Journal of Environmental Management journal homepage: www.elsevier.com/locate/jenvman

Roohollah Noori*, Abdulreza Karbassi, Mohammad Salman Sabahi

Department of Environmental Engineering, Graduate Faculty of Environment, University of Tehran, P.O. Box 14155-6135, Tehran, Iran

Article info

Article history: Received 14 April 2009; received in revised form 29 September 2009; accepted 23 October 2009; available online 13 November 2009

Keywords: Solid waste; Artificial neural network; Principal component analysis; Gamma test; Mashhad

Abstract

Artificial neural networks (ANNs) are suitable for modeling solid waste generation. In the present study, four training functions are used: the resilient backpropagation (RP), scaled conjugate gradient (SCG), one step secant (OSS), and Levenberg–Marquardt (LM) algorithms. The main goal of this research is to develop an ANN model with a simple structure and sufficient accuracy. In the first step, an appropriate ANN model with 13 input variables is developed, using the aforementioned algorithms to optimize the network parameters, for weekly solid waste prediction in Mashhad, Iran. Subsequently, principal component analysis (PCA) and Gamma test (GT) techniques are used to reduce the number of input variables. Finally, the operation of the ANN, PCA-ANN, and GT-ANN models is compared. The findings indicate that the PCA-ANN and GT-ANN models give more effective results than the ANN model. These two models decrease the number of input variables from 13 to 7 and 5, respectively. © 2009 Elsevier Ltd. All rights reserved.

1. Introduction

The literature shows that artificial neural network (ANN) models are suitable tools for predicting solid waste generation (Jalili and Noori, 2008; Noori et al., 2009a,b). The Levenberg–Marquardt (LM) optimization method, suggested by Levenberg (1944) and Marquardt (1963), is the best-known training function for ANN models in solid waste management studies (Karaca and Özkaya, 2006; Noori et al., 2009b). Studies in other fields have demonstrated that training functions such as quasi-Newton algorithms, resilient backpropagation (RP), and conjugate gradient (CG) algorithms are also appropriate for optimizing network parameters (Thirumalaiah and Deo, 1998; Ramirez et al., 2005). Unfortunately, most research on waste management using the ANN technique focuses only on the LM optimization method for updating the weights and biases of the network, so studies comparing the effect of different training functions on ANN operation are scarce.

Like any other statistical or mathematical model, ANN models also have some disadvantages. Having a large number of input variables is one of the most common problems in their development, because they are not engineered to eliminate superfluous inputs. Furthermore, in the

* Corresponding author. Tel.: +98 9374230526; fax: +98 2166407719. E-mail address: [email protected] (R. Noori). 0301-4797/$ – see front matter © 2009 Elsevier Ltd. All rights reserved. doi:10.1016/j.jenvman.2009.10.007

case of a high number of input variables, irrelevant, redundant, and noisy variables might be included in the data set, so that meaningful variables could be hidden (Seasholtz and Kowalski, 1993). There are many techniques for reducing the number of input variables, such as principal component analysis (PCA) (Zhang et al., 2006; Zhang, 2007; Noori et al., 2009c) and the Gamma test (GT) (Corcoran et al., 2003; Moghaddamnia et al., 2008). The present work investigates (1) the effect of some important training functions for updating the weights and biases of the ANN model; (2) input selection for the ANN model using both the PCA and GT techniques; and (3) the role of input variable preprocessing by means of PCA and the GT in ANN operation.

2. Material and methods

2.1. Case study and data

Mashhad is located 850 km east of Tehran. It is the second largest city of Iran and lies between latitudes 35°43′ and 37°7′ N and longitudes 59°2′ and 60°38′ E. Mashhad’s overall climate is cool and dry; the mean air temperature is 14.2 °C with an annual precipitation of 255 mm. Mashhad has a population of almost 3 million, which consists mainly of people of Iranian descent. There are also over 20 million tourists visiting the city every year. About 500 thousand tons of waste per day is produced in Mashhad. Seasonal patterns of waste generation play an important role in estimating the amount of waste generated in a city, so a weekly

time-series model of waste generation with 12 lag times (which equals a full season) has been built in order to forecast the amount of waste generated. In this model, the weight of waste in week $t+1$ ($W_{t+1}$) is a function of the waste quantity in weeks $t$ ($W_t$), $t-1$ ($W_{t-1}$), …, and $t-11$ ($W_{t-11}$). Besides these lagged values, a thirteenth input variable is the number of trucks that carry the generated waste in week $t$ ($Tr_t$).

2.2. Principal component analysis

PCA is a multivariate statistical method that can be used to reduce the complexity of the input variables when a large volume of information is available (Camdevyren et al., 2005). PCA transforms the input variables into principal components (PCs), which are independent linear combinations of the input variables (Lu et al., 2003). Instead of using the input variables directly, we transform them into PCs and then use the PCs as input variables. In this method, the information in the input variables is represented by the PCs with minimum loss (Helena et al., 2000). Details of PCA are published elsewhere (Wackernagel, 1995; Tabachnick and Fidell, 2001; Noori et al., 2007).

Table 1 The best architecture for each training function of ANN models.

Training function   Neuron   Training $R^2$   Training AARE   Testing $R^2$   Testing AARE
RP                  6        0.76             2.74            0.76            3.29
SCG                 4        0.75             2.83            0.77            3.09
OSS                 6        0.74             2.78            0.76            3.55
LM                  4        0.78             2.63            0.79            2.95
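The lagged inputs of Section 2.1 and the PCA transformation of Section 2.2 can be sketched as follows. This is only an illustrative sketch, not the authors' implementation: the function names `build_lagged_inputs` and `pca_scores`, the use of NumPy, the eigen-decomposition of the correlation matrix, and the synthetic weekly series are all assumptions introduced here.

```python
import numpy as np


def build_lagged_inputs(waste, trucks, n_lags=12):
    """Return (X, y): each row of X holds W_t ... W_{t-11} and Tr_t, y holds W_{t+1}."""
    waste = np.asarray(waste, dtype=float)
    trucks = np.asarray(trucks, dtype=float)
    rows, targets = [], []
    # t is the index of the current week; the target is week t + 1.
    for t in range(n_lags - 1, len(waste) - 1):
        lags = waste[t - n_lags + 1 : t + 1][::-1]  # W_t, W_{t-1}, ..., W_{t-11}
        rows.append(np.append(lags, trucks[t]))
        targets.append(waste[t + 1])
    return np.array(rows), np.array(targets)


def pca_scores(X, n_components):
    """Standardize X and project it onto its first principal components."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Covariance of standardized data is the correlation matrix of the inputs.
    corr = np.cov(Z, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)
    order = np.argsort(eigvals)[::-1]  # largest variance first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = Z @ eigvecs[:, :n_components]
    explained = eigvals[:n_components] / eigvals.sum()
    return scores, explained


# Synthetic weekly series (hypothetical values), used only to exercise the functions.
rng = np.random.default_rng(0)
weeks = np.arange(155)
waste = 11000 + 1000 * np.sin(2 * np.pi * weeks / 52) + rng.normal(0, 200, weeks.size)
trucks = 300 + 0.01 * waste + rng.normal(0, 5, weeks.size)

X, y = build_lagged_inputs(waste, trucks)
scores, explained = pca_scores(X, n_components=7)
print(X.shape, scores.shape, round(float(explained.sum()), 3))
```

With 155 synthetic weeks the sketch yields 143 input rows of 13 variables each; the PC scores, rather than the raw variables, would then be fed to the network.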

Fig. 1. $R^2$ values for different PCs as input variables in PCA-ANN models in testing stage (x-axis: PCs, PC3 to PC10; series: RP, SCG, OSS, LM).

Table 2 The best architecture of each training function for PCA-ANN models.

Model      Training function   Number of PCs   Neuron   Training $R^2$   Training AARE   Testing $R^2$   Testing AARE
7PCs-RP    RP                  7               4        0.80             2.75            0.80            3.12
7PCs-SCG   SCG                 7               10       0.75             3.03            0.77            3.28
6PCs-OSS   OSS                 6               4        0.73             2.89            0.75            3.17
5PCs-LM    LM                  5               4        0.74             3.10            0.77            3.36

Fig. 2. AARE values for different PCs as input variables in PCA-ANN models in testing stage (x-axis: PCs, PC3 to PC10; series: RP, SCG, OSS, LM).

Table 3 The GT results on the input variables.

Input variables              Gamma value
All inputs                   0.023829
All inputs – $W_t$           0.044064
All inputs – $W_{t-1}$       0.039507
All inputs – $W_{t-2}$       0.006973
All inputs – $W_{t-3}$       0.029370
All inputs – $W_{t-4}$       0.026131
All inputs – $W_{t-5}$       0.033725
All inputs – $W_{t-6}$       0.038756
All inputs – $W_{t-7}$       0.032170
All inputs – $W_{t-8}$       0.031024
All inputs – $W_{t-9}$       0.037740
All inputs – $W_{t-10}$      0.006405
All inputs – $W_{t-11}$      0.034461
All inputs – $Tr_t$          0.063153

2.3. Gamma test

The GT estimates the minimum mean square error (MSE) that can be achieved when modeling unseen data using any continuous nonlinear model. The GT was first reported by Koncar (1997) and Agalbjorn et al. (1997) and later enhanced and discussed in detail by many other researchers (Durrant, 2001; Tsui et al., 2002). The basic idea is quite distinct from earlier attempts at nonlinear analysis. Suppose we have a set of data observations $\{(x_i, y_i),\ 1 \le i \le M\}$, where the input vectors $x_i \in \mathbb{R}^m$ are confined to some closed bounded set $C \subset \mathbb{R}^m$ and, without loss of generality, the corresponding outputs $y_i \in \mathbb{R}$ are scalars. The vectors $x$ contain predictively useful factors influencing the output $y$. The only assumption made is that the underlying relationship of the system is $y = f(x_1, \ldots, x_m) + r$, where $f$ is a smooth function and $r$ is a random variable that represents noise. Without loss of generality it can be assumed that the mean of the distribution of $r$ is zero (since any constant bias can be subsumed into the unknown function $f$) and that the variance of the noise, $\mathrm{Var}(r)$, is bounded. The domain of a possible model is now restricted to the class of smooth functions which have bounded first partial derivatives. The GT is an estimate of the model’s output variance that cannot be accounted for by a smooth data model.

The GT is based on $N[i,k]$, the index of the $k$th ($1 \le k \le p$) nearest neighbour $x_{N[i,k]}$ of each vector $x_i$ ($1 \le i \le M$). Specifically, the GT is derived from the delta function of the input vectors:

$$\delta_M(k) = \frac{1}{M} \sum_{i=1}^{M} \left| x_{N[i,k]} - x_i \right|^2, \quad 1 \le k \le p \qquad (1)$$

where $|\cdot|$ denotes Euclidean distance, and the corresponding Gamma function of the output values:

$$\gamma_M(k) = \frac{1}{2M} \sum_{i=1}^{M} \left| y_{N[i,k]} - y_i \right|^2, \quad 1 \le k \le p$$

where $y_{N[i,k]}$ is the $y$-value corresponding to the $k$th nearest neighbour of $x_i$ in Eq. (1). In order to compute the GT, a least squares regression line is constructed for the $p$ points $(\delta_M(k), \gamma_M(k))$:

$$\gamma = A\delta + \mathrm{GT} \qquad (2)$$

The intercept on the vertical axis ($\delta = 0$) is the GT value, since it can be shown that $\gamma_M(k) \to \mathrm{Var}(r)$ in probability as $\delta_M(k) \to 0$. Calculating the gradient of the regression line can also provide helpful information on the complexity of the system under investigation. A formal mathematical justification of the method can be found in Evans and Jones (2002). The graphical output of this regression line (Eq. (2)) provides very useful information. First, the vertical intercept GT on the $\gamma$ (or Gamma) axis offers an estimate of the best MSE achievable by a modelling technique for unknown smooth functions of continuous variables (Evans and Jones, 2002). Second, the gradient offers an indication of the model’s

complexity (a steeper gradient indicates a model of greater complexity). In practice, the GT can be carried out with the WinGamma software (Durrant, 2001). Noori et al. (in press) applied the GT to select the inputs of an ANN for forecasting the daily average carbon monoxide concentration in the atmosphere of Tehran, Iran.

2.4. Artificial neural networks

An ANN is a mathematical structure consisting of an interconnected assembly of simple processing elements or nodes. The customary ANN architecture is composed of three layers. Many theoretical and experimental works have shown that a single hidden layer is sufficient for an ANN to approximate any complex nonlinear function (Cybenko, 1989; Jalili and Noori, 2008; Noori et al., in press). A major reason for using a single hidden layer is that intermediate cells that are not directly connected to output cells undergo very small changes in their weights and learn very slowly (Gallant, 1993). In this study, a model based on a feedforward neural network with a single hidden layer is used. The backpropagation (BP) algorithm is used to train the network. The following training functions are used to optimize the network weights and biases in the BP algorithm.

2.4.1. Heuristic techniques

Gradient descent, gradient descent with momentum, and RP are the best-known training functions that use heuristic techniques to update the network parameters. Because a sigmoid transfer function is used in the hidden layer of the multilayer ANN with the BP algorithm, RP is the best choice among these training functions. Therefore, only RP is used from this group in the present work.

2.4.2. Standard numerical optimization techniques

2.4.2.1. Conjugate gradient algorithms

In the CG algorithms, a search is performed along conjugate directions to obtain faster convergence. All the CG algorithms start out by searching in the steepest descent direction on the first iteration.

$$P_0 = -g_0 \qquad (3)$$

where $P_0$ and $g_0$ are the initial search direction and the initial gradient, respectively. To determine the optimal distance to move along the current search direction, a line search is then performed:

$$x_{k+1} = x_k + \alpha_k P_k \qquad (4)$$

where $x_{k+1}$ and $x_k$ are the vectors of weights and biases at epochs $k+1$ and $k$, and $\alpha_k$ and $P_k$ are the learning rate and the current search direction, respectively. Then, the next search direction is determined so that it is conjugate to the previous search directions. The general procedure for determining the new search direction is to combine the new steepest descent direction with the previous search direction:

$$P_k = -g_k + \beta_k P_{k-1} \qquad (5)$$

The various versions of the CG algorithm are distinguished by the manner in which the constant $\beta_k$ is computed. In this study, the scaled conjugate gradient (SCG) algorithm is selected, as it is regarded as the best choice among the CG functions in this field. For a discussion of CG algorithms and their application to neural networks, the interested reader is referred to Hagan et al. (1996).

Fig. 3. $R^2$ values for different input variables based on GT results in GT-ANN models in testing stage (x-axis: number of GT inputs, 3 to 11; series: RP, SCG, OSS, LM).

Table 4 Training and testing results of different features of input variables based on GT results.

Model      Training function   Number of variables   Neuron   Training $R^2$   Training AARE   Testing $R^2$   Testing AARE
8GT-RP     RP                  8                     6        0.76             2.83            0.78            3.00
7GT-SCG    SCG                 7                     14       0.77             2.76            0.80            3.12
10GT-OSS   OSS                 10                    10       0.77             2.89            0.79            3.10
5GT-LM     LM                  5                     4        0.79             2.66            0.81            3.02

Fig. 4. AARE values for different input variables based on GT results in GT-ANN models in testing stage (x-axis: number of GT inputs, 3 to 11; series: RP, SCG, OSS, LM).

2.4.2.2. Quasi-Newton algorithms

The basic step of Newton’s method is:

$$x_{k+1} = x_k - A_k^{-1} g_k \qquad (6)$$

where $A_k$ is the Hessian matrix (second derivatives) of the performance index at the current values of the weights and biases. Unfortunately, it is too complex and expensive to compute the Hessian matrix for feedforward neural networks. There is a class of algorithms based on Newton’s method that does not require calculation of the second derivatives. These are called quasi-Newton (or secant) methods. They update an approximate Hessian matrix in each iteration of the algorithm. The update is computed as a function of the gradient. The one step secant (OSS)

method is an appropriate candidate for this purpose. This algorithm does not store the complete Hessian matrix; it assumes that in each iteration the previous Hessian was the identity matrix. This has the additional advantage that the new search direction can be calculated without computing a matrix inverse. The OSS method is described in Battiti (1992).

The CG and quasi-Newton algorithms require that a line search be performed. Several search functions can be used with the CG and quasi-Newton training functions. The Charalambous (CHA) search and the backtracking (BAC) search are the customary choices for CG and quasi-Newton training functions, respectively. Details of the CHA and BAC methods are given in Charalambous (1992) and Dennis and Schnabel (1983), respectively.

2.4.2.3. LM algorithm

The LM algorithm was designed to approach second-order training speed without having to compute the Hessian matrix. When the performance function has the form of a sum of squares (as is typical in training feedforward networks), the Hessian matrix can be approximated as $H = J^T J$ and the gradient can be computed as $g = J^T e$, where $J$ is the Jacobian matrix that contains the first derivatives of the network errors with respect to the weights and biases, and $e$ is a vector of network errors. The Jacobian matrix can be computed through a standard BP algorithm that is much less complex than computing the Hessian matrix.

3. Results and discussion

3.1. ANN model development

The data used for training should be sufficiently large to cover the possible known variations in the problem domain (Kim and Valdes, 2003). Therefore, the 143 data sets were divided into training, validating, and testing sets. The output value of the tan-sigmoid function in the hidden layer is bounded between −1 and 1, so both the input and output data are rescaled to (−1, 1). The default parameter values reported in Demuth and Beale (1998) perform adequately for the training algorithms; therefore, the parameters of each training function are set to their defaults. Furthermore, STA is used for all models investigated in this research. To prevent instability of the network, the number of neurons in the hidden layer is restricted to between 4 and 20 for all models. The best architecture for each training function is given in Table 1.

Fig. 5. Forecasted and observed waste for 5GT-LM model in training stage (x-axis: week, 1 to 77; y-axis: solid waste, ton; series: observation, prediction).

Fig. 6. Forecasted and observed waste for 5GT-LM model in testing stage (x-axis: week, 1 to 41; y-axis: solid waste, ton; series: observation, prediction).
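The rescaling to (−1, 1) described in Section 3.1 and the LM approximations $H = J^T J$ and $g = J^T e$ of Section 2.4.2.3 can be illustrated with a small sketch. It is not the toolbox routine used in the study: the names `rescale`, `predict`, `jacobian`, and `train_lm` are hypothetical, the Jacobian is obtained by finite differences rather than by BP, the damping term `mu` comes from the standard LM update $x_{k+1} = x_k - (J^T J + \mu I)^{-1} J^T e$ rather than from the text above, and the data are synthetic.

```python
import numpy as np


def rescale(a, lo, hi):
    """Map values from [lo, hi] linearly onto [-1, 1]."""
    return 2.0 * (a - lo) / (hi - lo) - 1.0


def predict(params, X, n_hidden):
    """Feedforward pass: tanh hidden layer, single linear output neuron."""
    n_in = X.shape[1]
    w1 = params[: n_in * n_hidden].reshape(n_in, n_hidden)
    b1 = params[n_in * n_hidden : (n_in + 1) * n_hidden]
    w2 = params[(n_in + 1) * n_hidden : (n_in + 2) * n_hidden]
    b2 = params[-1]
    return np.tanh(X @ w1 + b1) @ w2 + b2


def jacobian(params, X, y, n_hidden, eps=1e-6):
    """Forward-difference Jacobian of the errors e = prediction - target."""
    e0 = predict(params, X, n_hidden) - y
    J = np.empty((e0.size, params.size))
    for j in range(params.size):
        shifted = params.copy()
        shifted[j] += eps
        J[:, j] = (predict(shifted, X, n_hidden) - y - e0) / eps
    return J, e0


def train_lm(X, y, n_hidden=4, epochs=50, mu=0.01, seed=0):
    """Levenberg-Marquardt loop using H ~ J^T J and g = J^T e."""
    rng = np.random.default_rng(seed)
    params = rng.normal(0.0, 0.5, (X.shape[1] + 2) * n_hidden + 1)
    history = []  # sum of squared errors at the start of each epoch
    for _ in range(epochs):
        J, e = jacobian(params, X, y, n_hidden)
        sse = float(e @ e)
        history.append(sse)
        H, g = J.T @ J, J.T @ e
        while mu < 1e10:
            step = np.linalg.solve(H + mu * np.eye(params.size), g)
            trial = params - step
            e_trial = predict(trial, X, n_hidden) - y
            if float(e_trial @ e_trial) < sse:
                params, mu = trial, mu / 10.0  # accept: lean towards Gauss-Newton
                break
            mu *= 10.0  # reject: lean towards gradient descent
    return params, history


# Synthetic data (hypothetical), rescaled from an assumed range of 9000-13000.
rng = np.random.default_rng(1)
raw = rng.uniform(9000.0, 13000.0, size=(60, 3))
X = rescale(raw, 9000.0, 13000.0)
y = np.tanh(X @ np.array([0.5, -0.3, 0.2]))
params, history = train_lm(X, y)
print(history[0], history[-1])
```

A step is accepted only when it lowers the sum of squared errors, so the recorded error never increases from one epoch to the next.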

3.2. PCA-ANN model development

To select the optimum number of PCs, i.e. the number that reduces the complexity of the ANN while giving better results, the variables were standardized and different models were then built and investigated using different numbers of input PCs. As in the previous section (Section 3.1), four training functions (RP, SCG, OSS, and LM) were selected for optimizing the network parameters. The best architecture for each training function is presented in Table 2. $R^2$ and AARE values for different PCs as input variables and different training functions are illustrated in Figs. 1 and 2, respectively. Table 2 and Figs. 1 and 2 indicate that the model with 7 PCs as input variables, RP as the training function, and 4 neurons in the hidden layer (7PCs-RP model) is the best choice for $W_{t+1}$ prediction.

3.3. GT-ANN development

The GT can greatly reduce the model development workload and provide guidance on the input data before a model is developed. In this study, different combinations of input data are explored to assess their influence on the modeling of $W_{t+1}$. There are $2^n - 1$ meaningful combinations of $n$ inputs, from which the best one can be determined by observing the Gamma value. To determine the most important variables, the Gamma value is first calculated for the combination of all variables (13 input variables). In the next stage, one of the variables is omitted and the Gamma value is calculated for the combination of the other variables (12 variables). Then, the variable omitted in the previous stage is returned, another variable is omitted from the original combination (13 variables), and the Gamma value is calculated for the new combination of 12 variables. This process is repeated for all variables one by one, and the Gamma value is computed in each step.
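The Gamma statistic of Section 2.3 (Eqs. (1) and (2)) and the omit-one-variable procedure described above can be sketched as follows. The study itself used the WinGamma software, so this is only an illustrative approximation: the function names `gamma_test` and `leave_one_out_ranking`, the choice of `p = 10` neighbours, the brute-force neighbour search, and the synthetic three-variable data set are assumptions made here.

```python
import numpy as np


def gamma_test(X, y, p=10):
    """Return (intercept, slope) of the regression line gamma = A * delta + GT."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Squared Euclidean distances between all pairs of input vectors.
    diff = X[:, None, :] - X[None, :, :]
    d2 = np.einsum("ijk,ijk->ij", diff, diff)
    np.fill_diagonal(d2, np.inf)  # a point is not its own neighbour
    # Column k-1 holds N[i, k], the index of the k-th nearest neighbour of x_i.
    nearest = np.argsort(d2, axis=1)[:, :p]
    rows = np.arange(X.shape[0])[:, None]
    delta = d2[rows, nearest].mean(axis=0)                        # Eq. (1)
    gamma = 0.5 * ((y[nearest] - y[:, None]) ** 2).mean(axis=0)   # Gamma function
    slope, intercept = np.polyfit(delta, gamma, 1)                # Eq. (2)
    return intercept, slope


def leave_one_out_ranking(X, y, names, p=10):
    """Rank inputs by the Gamma value obtained when each one is omitted."""
    baseline, _ = gamma_test(X, y, p)
    omitted = {}
    for j, name in enumerate(names):
        reduced = np.delete(X, j, axis=1)  # all inputs except variable j
        omitted[name], _ = gamma_test(reduced, y, p)
    # A larger Gamma value after omission marks a more important variable.
    ranking = sorted(names, key=lambda n: omitted[n], reverse=True)
    return baseline, omitted, ranking


# Synthetic example (hypothetical): x1 and x2 drive the output, x3 is irrelevant.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(600, 3))
y = np.sin(2 * np.pi * X[:, 0]) + 2.0 * X[:, 1] + rng.normal(0.0, 0.1, 600)
baseline, omitted, ranking = leave_one_out_ranking(X, y, ["x1", "x2", "x3"])
print(baseline, omitted, ranking)
```

In this sketch, omitting a variable that carries information raises the Gamma value well above the all-inputs baseline, which is the criterion applied to Table 3.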
In this method, omitting an important variable increases the Gamma value compared with the original combination. The results for the different combinations are shown in Table 3. This table indicates that $Tr_t$ is the most important variable. The other important variables are, in order, $W_t$, $W_{t-1}$, $W_{t-6}$, $W_{t-9}$, $W_{t-11}$, $W_{t-5}$, $W_{t-7}$, $W_{t-8}$, $W_{t-3}$, and $W_{t-4}$. In the present research, feature sets of 3 to 11 variables are investigated. The training and testing results of these feature sets as the input to the ANN model are presented in Table 4. Furthermore, $R^2$ and AARE values for the different feature sets based on GT results are illustrated in Figs. 3 and 4, respectively, and the forecasted and observed waste for the 5GT-LM model in the training and testing stages is shown in Figs. 5 and 6. According to Fig. 3, there are two candidate models: (1) the model with the first five important variables based on the GT results, 4 neurons in the hidden layer, and LM as training function; and (2) the model with the first eight important variables based on the GT results, 6 neurons in its hidden layer, and RP as training function. It should be noted that the $R^2$ and AARE values for these two choices are approximately equal. In the present investigation we were looking for a simpler ANN structure and it


seems that the first candidate is the most appropriate; it has therefore been chosen as the best model for $W_{t+1}$ prediction.

4. Conclusion

The accurate prediction of waste generation plays an important role in the solid waste management system. For this reason, ANN, PCA-ANN, and GT-ANN are used, and different models are created and tested. Finally, according to the indices applied in this research ($R^2$ and AARE), the ANN models with the first seven PCs (7PCs-ANN) and with the first five important input variables based on GT results (5GT-ANN) are selected as suitable models for the prediction of waste generation in Mashhad. The following results have been achieved in the present investigation:

a. Generally, preprocessing the input variables has a positive effect on ANN operation.

b. Different training functions do not have any noticeable effect on ANN operation.

c. Using the STA method makes it possible to investigate various network structures in a shorter time, but it has no effect on the complexity of the network (reduction of the input data and of the number of neurons in the hidden layer).

d. Comparison of the results of the ANN, 7PCs-ANN, and 5GT-ANN models shows that, although the results are close, preprocessing leads to a simpler network: 7 PCs and 5 input variables are used in the 7PCs-ANN and 5GT-ANN models, respectively, instead of 13 variables.

e. Although the prediction is weekly in this study, owing to the limitations of the available data (from 2004 to 2007), the methodology presented in this work can be extended to longer time ranges in the future by constructing monthly and longer time-series.

f. Finally, PCA and GT techniques are recommended for improving ANN model operation, especially in cases where little is known about the input variables.

References

Agalbjorn, S., Koncar, N., Jones, A.J., 1997. A note on the gamma test. Neural Computing & Applications 5, 131–133.

Battiti, R., 1992. First and second order methods for learning: between steepest descent and Newton’s method. Neural Computation 4, 141–166.

Camdevyren, H., Demyr, N., Kanik, A., Keskyn, S., 2005. Use of principal component scores in multiple linear regression models for prediction of Chlorophyll-a in reservoirs. Ecological Modelling 181, 581–589.

Charalambous, C., 1992. Conjugate gradient algorithm for efficient training of artificial neural networks. IEEE Proceedings 139, 301–310.

Corcoran, J., Wilson, I., Ware, J., 2003. Predicting the geo-temporal variation of crime and disorder. International Journal of Forecasting 19, 623–634.

Cybenko, G., 1989. Approximation by superposition of a sigmoidal function. Mathematics of Control, Signals, and Systems 2, 303–314.

Demuth, H., Beale, M., 1998. Neural Network Toolbox for Use with Matlab. The Mathworks Inc., Natick, Massachusetts.

Dennis, J.E., Schnabel, R.B., 1983. Numerical Methods for Unconstrained Optimization and Nonlinear Equations. Prentice-Hall, Englewood Cliffs, NJ.

Durrant, P.J., 2001. winGamma: a non-linear data analysis and modeling tool with applications to flood prediction. PhD thesis, Department of Computer Science, Cardiff University, Wales, UK.

Evans, D., Jones, A.J., 2002. A proof of the gamma test. Proceedings of the Royal Society A 458, 2759–2799.

Gallant, S.I., 1993. Neural Network Learning and Expert Systems. MIT Press, Cambridge.

Hagan, M.T., Demuth, H.B., Beale, M.H., 1996. Neural Network Design. PWS Publishing, Boston.

Helena, B., Pardo, R., Vega, M., Barrado, E., Fernandez, J.M., Fernandez, L., 2000. Temporal evolution of groundwater composition in an alluvial aquifer (Pisuerga river, Spain) by principal component analysis. Water Research 34, 807–816.

Jalili, M., Noori, R., 2008. Prediction of municipal solid waste generation by use of artificial neural network: a case study of Mashhad. International Journal of Environmental Research 2, 13–22.

Karaca, F., Özkaya, B., 2006. NN-LEAP: a neural network-based model for controlling leachate flow-rate in a municipal solid waste landfill site. Environmental Modelling and Software 21, 1190–1197.

Kim, T.W., Valdes, J.B., 2003. Nonlinear model for drought forecasting based on a conjunction of wavelet transforms and neural networks. Journal of Hydrologic Engineering 6, 319–328.

Koncar, N., 1997. Optimisation methodologies for direct inverse neurocontrol. PhD thesis, Department of Computing, Imperial College of Science, Technology and Medicine, University of London.

Levenberg, K., 1944. A method for the solution of certain non-linear problems in least squares. Quarterly Journal of Applied Mathematics 2, 164–168.

Lu, W.Z., Wang, W.J., Wang, X.K., Xu, Z.B., Leung, A.Y.T., 2003. Using improved neural network to analyze RSP, NOx and NO2 levels in urban air in Mong Kok, Hong Kong. Environmental Monitoring and Assessment 87, 235–254.

Marquardt, D.W., 1963. An algorithm for least-squares estimation of nonlinear parameters. Journal of the Society for Industrial and Applied Mathematics 11, 431–441.

Moghaddamnia, A., Ghafari-Gousheh, M., Piri, J., Amini, S., Han, D., 2008. Evaporation estimation using artificial neural networks and adaptive neuro-fuzzy inference system techniques. Advances in Water Resources. doi:10.1016/j.advwatres.2008.10.005.

Noori, R., Abdoli, M.A., Jalili-Ghazizade, M., Samifard, R., 2009a. Comparison of neural network and principal component-regression analysis to predict the solid waste generation in Tehran. Iranian Journal of Public Health 38, 74–84.

Noori, R., Abdoli, M.A., Farokhnia, A., Abbasi, M., 2009b. Results uncertainty of solid waste generation forecasting by hybrid of wavelet transform-ANFIS and wavelet transform-neural network. Expert Systems with Applications 36, 9991–9999.

Noori, R., Abdoli, M.A., Ameri, A., Jalili-Ghazizade, M., 2009c. Prediction of municipal solid waste generation with combination of support vector machine and principal component analysis: a case study of Mashhad. Environmental Progress & Sustainable Energy 28, 249–258.

Noori, R., Hoshyaripour, G.A., Ashrafi, K., Araabi, B.N. Uncertainty analysis of developed ANN and ANFIS models in prediction of carbon monoxide daily concentration. Atmospheric Environment, in press, doi:10.1016/j.atmosenv.2009.11.005.

Noori, R., Kerachian, R., Khodadadi, A., Shakibayinia, A., 2007. Assessment of importance of water quality monitoring stations using principal component and factor analyses: a case study of the Karoon River. Journal of Water & Wastewater 63, 60–69 (in Persian).

Ramirez, M.C.V., Velho, H.F.C., Ferreira, N.J., 2005. Artificial neural network technique for rainfall forecasting applied to the Sao Paulo region. Journal of Hydrology 301, 146–162.

Seasholtz, M.B., Kowalski, B., 1993. The parsimony principle applied to multivariate calibration. Analytica Chimica Acta 277, 165–177.

Tabachnick, B.G., Fidell, L.S., 2001. Using Multivariate Statistics, third ed. Allyn and Bacon, Boston, London.

Thirumalaiah, K., Deo, M.C., 1998. River stage forecasting using artificial neural networks. Journal of Hydrologic Engineering 3, 26–32.

Tsui, A.P.M., Jones, A.J., deOliveira, A.G., 2002. The construction of smooth models using irregular embeddings determined by a gamma test analysis. Neural Computing & Applications 10, 318–329.

Wackernagel, H., 1995. Multivariate Geostatistics: an Introduction with Applications, second ed. Springer, New York and London.

Zhang, Y.X., 2007. Artificial neural networks based on principal component analysis input selection for clinical pattern recognition analysis. Talanta 73, 68–75.

Zhang, Y., Li, H., Hou, A., Havel, J., 2006. Artificial neural networks based on principal component analysis input selection for quantification in overlapped capillary electrophoresis peaks. Chemometrics and Intelligent Laboratory Systems 82, 165–175.