Available online at www.sciencedirect.com
ScienceDirect Aquatic Procedia 4 (2015) 1099 – 1106
INTERNATIONAL CONFERENCE ON WATER RESOURCES, COASTAL AND OCEAN ENGINEERING (ICWRCOE 2015)
Inference of Water Quality Index using ANFIS and PCA
Mrunmayee M. Sahoo^a,*, K. C. Patra^b, K. K. Khatua^c
^a Department of Civil Engineering, NIT, Rourkela, India
^b Department of Civil Engineering, NIT, Rourkela, India
^c Department of Civil Engineering, NIT, Rourkela, India
Abstract
The River Brahmani is reported to be polluted by effluents discharged from the industries, towns and villages located near its banks. The presence of heavy metals and radioactive material makes its water unsuitable for human use, and fertilizers used for agricultural purposes affect the pH and nitrate content of the water. Evaluating the Water Quality Index (WQI) at gauging stations located near industries is extremely important for planning remedial measures. To this end, the present study proposes an adaptive neuro-fuzzy inference system (ANFIS) for predicting water quality in the Brahmani River. The water quality parameters used in the assessment are usually inter-correlated, which makes the assessment unreliable. Therefore, the parameters are transformed into uncorrelated components using principal component analysis with varimax rotation. The uncorrelated values are fuzzified to account for uncertainty and imprecision in data collection and in their application in ANFIS. An efficient rule base and an optimal distribution of membership functions are constructed with the hybrid learning algorithm of ANFIS in MATLAB. The model performed satisfactorily, with close agreement between actual and predicted water quality data.
© 2015 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/). Peer-review under responsibility of organizing committee of ICWRCOE 2015.
Keywords: ANFIS; Correlation; MATLAB; Membership function; Principal component; WQI
1. Introduction
Water availability refers to the combination of good-quality groundwater and surface water resources available at a locality. While rivers are the lifeblood of most cities, towns and villages across the country, groundwater is also vital to India's people.

* Corresponding author. Tel.: 08596024674.
2214-241X © 2015 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/). Peer-review under responsibility of organizing committee of ICWRCOE 2015 doi:10.1016/j.aqpro.2015.02.139
There is tremendous variation in both the quantity and the quality of discharge from region to region in river basins. With a few exceptions, all the medium and minor river basins originate in the mountains and thus share a common feature: they are fast-flowing and monsoon-fed in the hilly regions, and by the time they reach the plains they are tidal. Treated or untreated discharges from the settlements and industries along their banks invariably find their way into these rivers, whose flow oscillates like a pendulum with the seasons. The surface water of the River Brahmani is extremely variable in chemical composition because the relative contributions of groundwater and surface water sources vary seasonally. The mineral content of river water usually bears an inverse relationship to discharge, and it tends to increase from source to mouth, although the increase may not be continuous or uniform. Other factors, such as the discharge of city wastewater and industrial waste and the mixing of waters, can also affect the nature and concentration of minerals in surface water.

Water quality assessment involves analysing the physical, chemical and biological characteristics of water. Knowledge of water quality and evaluation of the water quality index (WQI) play a significant role in water quality control and management, since the index expresses water quality as a single numerical value. The WQI depends strongly on the parameters selected for the study, which are often correlated with one another, and identifying suitable parameters is critical for an accurate evaluation of the WQI. Water quality is generally ascertained based on guidelines provided by regulatory agencies. The correlation between parameters can be avoided if a data reduction technique such as principal component analysis (PCA) is used to obtain uncorrelated principal components.
The uncorrelated parameters can then be used to assess water quality with standard prediction tools such as neural networks or fuzzy logic toolboxes. The fuzzy logic toolbox is preferred because it can take into account uncertainty and imprecision in the data, while neural networks, with their learning techniques, can be used to learn the fuzzy decision rules. This combination merges the advantages of a fuzzy system and a neural network. Sahu et al. (2011) predicted the water quality index using an adaptive neuro-fuzzy inference system (ANFIS), which is an adaptive fuzzy inference system implemented in the framework of neural networks. In this study, the WQI of the River Brahmani at five gauging sites (Panposh downstream, Talcher upstream, Kamalanga downstream, Aul and Pottamundai), located in urban areas adjacent to industries, is predicted using ANFIS with 11 water quality parameters to improve prediction capability. The correlation among these parameters is studied, and the parameters are converted into uncorrelated principal components with SPSS to serve as inputs to the ANFIS system. The paper presents the prediction of the WQI of the River Brahmani using the ANFIS network architecture to create a set of fuzzy IF-THEN rules and a fuzzy inference system with membership functions. Fuzzy IF-THEN rules resemble the way humans make decisions, which makes them suitable for decision making under uncertainty.

2. Data Collection and Analysis
Fig. 1. Map showing the locations of the gauging stations on the Brahmani River.
In the present study, data from the River Brahmani in Odisha are used for water quality indexing. The Panposh gauging station is located at Rourkela, Odisha. The gauging stations lie close to major industries, including the integrated Rourkela Steel Plant, the NALCO smelter plant and captive power plant, Mahanadi Coalfields Limited and chromite mines. The river water adjacent to these industries is heavily contaminated, resulting in acidity, toxicity and the presence of heavy metals and microbes. Five gauging stations, namely Panposh downstream, Talcher upstream, Kamalanga downstream, Aul and Pottamundai, are selected on the basis of the mining and industrial activities prevalent nearby. At these five stations, data were sampled during the monsoon season from 2003 to 2011. The Pearson correlation matrix of the studied parameters for the monsoon-season data of these nine years is shown in Table 1. TC and NH4-N exhibit slight correlation with pH, with coefficients of 0.147 and 0.175, respectively. COD and BOD show a strong correlation, with a Pearson correlation coefficient of 0.747. BOD and FC show a slight correlation of 0.361. TC, FC and COD show slight correlation with conductivity, with coefficients of 0.325, 0.380 and 0.331, respectively. When parameters exhibit strong or moderate correlation with each other, the WQI may not characterize the quality of water properly. Therefore, it is important to convert correlated parameters into uncorrelated ones for efficient forecasting of water quality, and principal component analysis provides a suitable method for this transformation.

Table 1. Pearson correlation matrix
Parameter    pH        DO        BOD       EC        Nitrate-N TC        FC        COD       NH4-N     TA        TH
pH           1.000
DO           0.030     1.000
BOD          0.120     0.224     1.000
EC           0.073     0.360     0.290     1.000
Nitrate-N    0.160     0.044     0.342     0.001     1.000
TC           0.147     0.362     0.243     0.325     -0.038    1.000
FC           0.126     0.308     0.361     0.380     -0.006    0.764     1.000
COD          0.120     0.220     0.747     0.331     0.227     0.224     0.312     1.000
NH4-N        0.175     0.017     0.048     0.239     -0.175    0.093     0.117     0.128     1.000
TA as CaCO3  0.072     0.142     0.087     0.206     0.062     0.052     0.054     0.122     0.019     1.000
TH as CaCO3  0.137     0.029     0.316     0.173     0.396     0.178     0.229     0.438     0.115     0.274     1.000
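Each entry of Table 1 is an ordinary Pearson product-moment coefficient. The standard-library Python sketch below shows how such a matrix, and the list of noticeably correlated pairs discussed above, could be obtained from raw monitoring data. The three short series are hypothetical placeholders (the station data are not reproduced in this paper), and the 0.3 cut-off is an assumed threshold, chosen only because the text calls coefficients of roughly 0.3 to 0.4 "slight" correlation.

```python
import math


def pearson_r(x, y):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlated_pairs(data, threshold=0.3):
    """List (name_i, name_j, r) for every parameter pair with |r| >= threshold.

    `data` maps a parameter name to its observations; `threshold` is an
    assumed cut-off, not a value prescribed by the paper.
    """
    names = list(data)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson_r(data[a], data[b])
            if abs(r) >= threshold:
                pairs.append((a, b, round(r, 3)))
    return pairs


# Hypothetical monsoon-season readings at one station (illustration only).
sample = {
    "BOD": [1.0, 2.0, 3.0, 4.0, 5.0],
    "COD": [2.0, 4.0, 6.0, 8.0, 10.0],
    "pH": [7.1, 7.3, 7.0, 7.2, 7.1],
}
print(correlated_pairs(sample))  # only the BOD-COD pair passes the cut-off
```

Dedicated statistics packages (such as SPSS, used in this study) compute the same coefficient; the sketch only makes the definition explicit.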
3. Principal Component Analysis (PCA)
The central idea of principal component analysis (PCA) is to reduce the dimensionality of a data set consisting of a large number of interrelated variables while retaining as much as possible of the variation present in the data set. The new variables are uncorrelated with each other, whereas the original, untransformed variables may be correlated to a lesser or greater extent. The new principal component (PC) axes (Y1, Y2, ..., Yp) are therefore mutually orthogonal (e.g. Y1 and Y2 are perpendicular, as shown in Fig. 2). Each principal component is a linear combination of the original X values for the p variables:

$$
\begin{aligned}
PC_1 &= c_{11}X_1 + c_{12}X_2 + c_{13}X_3 + \cdots + c_{1p}X_p \quad (\text{axis } Y_1)\\
PC_2 &= c_{21}X_1 + c_{22}X_2 + c_{23}X_3 + \cdots + c_{2p}X_p \quad (\text{axis } Y_2)\\
&\;\;\vdots\\
PC_p &= c_{p1}X_1 + c_{p2}X_2 + c_{p3}X_3 + \cdots + c_{pp}X_p \quad (\text{axis } Y_p)
\end{aligned}
\tag{1}
$$

where $c_{ab}$ is the component score coefficient for variable b on PC axis $Y_a$, and $X_b$ is the score for variable b. PCA thus converts a multivariate set of variables (X1, X2, ..., Xp) into new variables (Y1, Y2, ..., Yp) that are uncorrelated with each other. The first principal component consists of one coefficient $\alpha_i$ for each of the p variables, chosen (subject to the coefficient vector having unit length) so that the variance of the calculated score over the n cases is maximal. The score for each case is $\alpha_1X_1 + \alpha_2X_2 + \cdots + \alpha_iX_i + \cdots + \alpha_pX_p$, where $X_i$ is the centred value of the ith variable ($X_i$ minus the mean of the ith variable).

3.1 Determination of Principal Components for the Assessment of Water Quality
In Section 2, the interrelations among the water quality parameters pH, DO, BOD, conductivity, Nitrate-N, NH4-N, COD, TC, FC, TA as CaCO3 and TH as CaCO3 were established. Because the WQI is calculated with an additive approach, the parameters considered must be independent of each other for efficient forecasting of the WQI. The number of PCs retained in the study can be judged from the scree plot shown in Fig. 2.
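The variance percentages read from a scree plot are the eigenvalues of the correlation matrix divided by their sum, which equals the number of variables. The standard-library Python sketch below assumes that the PCA is carried out on the correlation matrix, feeds it the rounded coefficients of Table 1, and extracts the eigenvalues with the cyclic Jacobi method, a textbook algorithm swapped in here and not necessarily the routine SPSS uses. Because Table 1 is rounded to three decimals, the printed percentages need not match the SPSS values reported in this section exactly.

```python
import math

# Lower triangle of Table 1 (order: pH, DO, BOD, EC, Nitrate-N, TC, FC, COD, NH4-N, TA, TH).
LOWER = [
    [1.000],
    [0.030, 1.000],
    [0.120, 0.224, 1.000],
    [0.073, 0.360, 0.290, 1.000],
    [0.160, 0.044, 0.342, 0.001, 1.000],
    [0.147, 0.362, 0.243, 0.325, -0.038, 1.000],
    [0.126, 0.308, 0.361, 0.380, -0.006, 0.764, 1.000],
    [0.120, 0.220, 0.747, 0.331, 0.227, 0.224, 0.312, 1.000],
    [0.175, 0.017, 0.048, 0.239, -0.175, 0.093, 0.117, 0.128, 1.000],
    [0.072, 0.142, 0.087, 0.206, 0.062, 0.052, 0.054, 0.122, 0.019, 1.000],
    [0.137, 0.029, 0.316, 0.173, 0.396, 0.178, 0.229, 0.438, 0.115, 0.274, 1.000],
]


def full_matrix(lower):
    """Expand a lower-triangular list of rows into a full symmetric matrix."""
    n = len(lower)
    return [[lower[max(i, j)][min(i, j)] for j in range(n)] for i in range(n)]


def jacobi_eigenvalues(a, tol=1e-12, max_sweeps=100):
    """Eigenvalues of a real symmetric matrix by cyclic Jacobi rotations, largest first."""
    n = len(a)
    m = [row[:] for row in a]  # work on a copy
    for _ in range(max_sweeps):
        off = sum(m[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(m[p][q]) < 1e-15:
                    continue
                # Rotation angle chosen so that the new m[p][q] is zero.
                theta = (m[q][q] - m[p][p]) / (2.0 * m[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # columns p and q: M <- M P
                    mkp, mkq = m[k][p], m[k][q]
                    m[k][p], m[k][q] = c * mkp - s * mkq, s * mkp + c * mkq
                for k in range(n):  # rows p and q: M <- P^T M
                    mpk, mqk = m[p][k], m[q][k]
                    m[p][k], m[q][k] = c * mpk - s * mqk, s * mpk + c * mqk
    return sorted((m[i][i] for i in range(n)), reverse=True)


def explained_variance(eigenvalues):
    """Percentage and cumulative percentage of total variance per component."""
    total = sum(eigenvalues)
    pct = [100.0 * e / total for e in eigenvalues]
    cumulative, running = [], 0.0
    for p in pct:
        running += p
        cumulative.append(running)
    return pct, cumulative


eigs = jacobi_eigenvalues(full_matrix(LOWER))
pct, cum = explained_variance(eigs)
for k, (p, c) in enumerate(zip(pct, cum), start=1):
    print(f"PC{k}: eigenvalue {eigs[k - 1]:.3f}, {p:.2f}% of variance, cumulative {c:.2f}%")
```

Plotting the eigenvalues against the component number gives the scree plot; the retained components are those before the curve flattens.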
Fig. 2. Scree plot of the 11 water quality parameters.
It is observed from the scree plot that four principal components, explaining 65.159% of the total variation, are sufficient for the study. PC1 accounts for 28.485%, PC2 for 16.496%, PC3 for 10.458% and PC4 for 9.719% of the total variation, as computed by SPSS. The component matrix extracted from the data by SPSS for calculating the PCs is given in Table 3, and the PCs in terms of the actual parameters are given in Eqs. (2) to (5), where Cond is conductivity and TA and TH are total alkalinity and total hardness, both as CaCO3.

$$PC_1 = -0.017\,\text{pH} - 0.487\,\text{DO} + 0.734\,\text{BOD} + 0.616\,\text{Cond} + 0.287\,\text{Nitrate-N} + 0.650\,\text{TC} + 0.719\,\text{FC} + 0.745\,\text{COD} + 0.195\,\text{NH}_4\text{-N} + 0.274\,\text{TA} + 0.536\,\text{TH} \tag{2}$$

$$PC_2 = -0.563\,\text{pH} + 0.273\,\text{DO} + 0.328\,\text{BOD} - 0.260\,\text{Cond} + 0.647\,\text{Nitrate-N} - 0.480\,\text{TC} - 0.407\,\text{FC} + 0.315\,\text{COD} - 0.315\,\text{NH}_4\text{-N} + 0.170\,\text{TA} + 0.454\,\text{TH} \tag{3}$$
$$PC_3 = 0.237\,\text{pH} + 0.458\,\text{DO} + 0.018\,\text{BOD} + 0.131\,\text{Cond} - 0.142\,\text{Nitrate-N} - 0.246\,\text{TC} - 0.165\,\text{FC} + 0.171\,\text{COD} + 0.799\,\text{NH}_4\text{-N} + 0.026\,\text{TA} + 0.302\,\text{TH} \tag{4}$$

$$PC_4 = -0.113\,\text{pH} - 0.230\,\text{DO} - 0.281\,\text{BOD} + 0.298\,\text{Cond} - 0.144\,\text{Nitrate-N} - 0.121\,\text{TC} - 0.171\,\text{FC} - 0.183\,\text{COD} - 0.002\,\text{NH}_4\text{-N} + 0.853\,\text{TA} + 0.102\,\text{TH} \tag{5}$$

3.2. Calculation and Formulation of WQI
In calculating the WQI of river water, the importance of each water quality parameter depends on the intended use of the water; here the parameters are assessed from the point of view of suitability for domestic purposes. The standards (permissible values of the water quality parameters) for drinking water are those recommended by the Indian Council of Medical Research (ICMR). Where ICMR standards are not available, the standards of the United States Public Health Service (USPHS), World Health Organization (WHO), Indian Standard Institution (ISI) and European Economic Community (EEC) are used. The water quality rating $q_i$ for the ith water quality parameter is obtained from

$$q_i = 100\,\frac{v_i}{s_i} \tag{6}$$
where $v_i$ is the value of the ith water quality parameter at a given sampling station and $s_i$ is the standard permissible value of the ith parameter. This equation ensures that $q_i = 0$ when a pollutant (the ith water quality parameter) is absent from the water, while $q_i = 100$ when the value of the parameter is exactly equal to its permissible value for drinking water. Thus, the larger the value of $q_i$, the more polluted the river water is with the ith pollutant. The unit weights $W_i$ of the 11 water quality parameters sum to one:

$$\sum_{i=1}^{11} W_i = 1 \tag{8}$$
Table 3. Component matrix extracted by SPSS
Parameter                 PC1       PC2       PC3       PC4
pH                        -0.017    -0.563    0.237     -0.113
DO (mg/l)                 -0.487    0.273     0.458     -0.230
BOD (mg/l)                0.734     0.328     0.018     -0.281
Cond. (mmho/cm)           0.616     -0.260    0.131     0.298
Nitrate-N (mg/l)          0.287     0.647     -0.142    -0.144
TC (MPN/100 ml)           0.650     -0.480    -0.246    -0.121
FC (MPN/100 ml)           0.719     -0.407    -0.165    -0.171
COD (mg/l)                0.745     0.315     0.171     -0.183
NH4-N (mg/l)              0.195     -0.315    0.799     -0.002
T. Alk. as CaCO3 (mg/l)   0.274     0.170     0.026     0.853
TH as CaCO3 (mg/l)        0.536     0.454     0.302     0.102
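Eqs. (2) to (5) are plain dot products of a sample's parameter values with the coefficients of Table 3. The Python sketch below assumes that the parameter values have already been normalized (Section 3.5) and, like the equations, applies the component-matrix loadings directly rather than the separate component-score coefficients SPSS can also export. One coefficient sign, flagged in a code comment, is an assumption.

```python
PARAMS = ["pH", "DO", "BOD", "Cond", "Nitrate-N", "TC", "FC", "COD", "NH4-N", "TA", "TH"]

# Rows: PC1..PC4 coefficients of Eqs. (2)-(5); columns follow PARAMS.
# Assumption: the FC coefficient of PC2 is taken as negative (-0.407), like TC.
COEF = [
    [-0.017, -0.487, 0.734, 0.616, 0.287, 0.650, 0.719, 0.745, 0.195, 0.274, 0.536],
    [-0.563, 0.273, 0.328, -0.260, 0.647, -0.480, -0.407, 0.315, -0.315, 0.170, 0.454],
    [0.237, 0.458, 0.018, 0.131, -0.142, -0.246, -0.165, 0.171, 0.799, 0.026, 0.302],
    [-0.113, -0.230, -0.281, 0.298, -0.144, -0.121, -0.171, -0.183, -0.002, 0.853, 0.102],
]


def pc_scores(sample):
    """Return [PC1, PC2, PC3, PC4] for one sample (dict: parameter -> normalized value)."""
    missing = [p for p in PARAMS if p not in sample]
    if missing:
        raise KeyError(f"missing parameters: {missing}")
    return [sum(c * sample[p] for c, p in zip(row, PARAMS)) for row in COEF]


# Hypothetical normalized sample in which only BOD differs from zero.
demo = {p: 0.0 for p in PARAMS}
demo["BOD"] = 1.0
print(pc_scores(demo))  # reproduces the BOD row of Table 3
```

The four scores per sample are what Section 3.5 normalizes again before passing them to ANFIS.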
The overall WQI of the River Brahmani is then calculated by aggregating these sub-indices (SI) linearly:

$$\mathrm{WQI} = \frac{\sum_{i=1}^{11} q_i W_i}{\sum_{i=1}^{11} W_i} = \sum_{i=1}^{11} q_i W_i \tag{9}$$

where the second equality follows from $\sum_{i=1}^{11} W_i = 1$, as given in (8).
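Eqs. (6), (8) and (9) combine into a weighted arithmetic index, sketched below in Python together with the five quality classes of this section. The paper does not spell out how the unit weights $W_i$ are derived, so the sketch assumes the common choice $W_i = K/s_i$, with K set so that the weights sum to one as Eq. (8) requires. The standards and measured values in the example are hypothetical, and assigning values that fall between the integer class limits (e.g. 25.5) to the lower class is also an assumption.

```python
def sub_index(v, s):
    """Quality rating q_i = 100 * v_i / s_i (Eq. 6)."""
    return 100.0 * v / s


def unit_weights(standards):
    """Assumed unit weights W_i = K / s_i, with K chosen so that sum(W_i) = 1 (Eq. 8)."""
    k = 1.0 / sum(1.0 / s for s in standards)
    return [k / s for s in standards]


def wqi(values, standards):
    """Weighted arithmetic WQI, Eq. (9): sum(q_i * W_i) / sum(W_i)."""
    if len(values) != len(standards):
        raise ValueError("values and standards must have the same length")
    weights = unit_weights(standards)
    ratings = [sub_index(v, s) for v, s in zip(values, standards)]
    return sum(q * w for q, w in zip(ratings, weights)) / sum(weights)


def wqi_class(index):
    """Map a WQI value to the five classes (boundary handling is an assumption)."""
    if index < 0:
        raise ValueError("WQI cannot be negative")
    if index <= 25:
        return "excellent"
    if index <= 50:
        return "good"
    if index <= 75:
        return "poor"
    if index <= 100:
        return "very poor"
    return "unsuitable for drinking"


# Hypothetical standards s_i and measurements v_i for three parameters.
standards = [5.0, 10.0, 45.0]
values = [2.0, 4.0, 18.0]  # each measurement is 40% of its standard
index = wqi(values, standards)
print(round(index, 2), wqi_class(index))  # 40.0 good
```

Because every measurement in the example is 40% of its standard, every $q_i$ equals 40 and the weighted mean is 40 regardless of the weights, which makes the example easy to check by hand.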
Depending on the WQI value, water quality is classified into five categories: excellent (0–25), good (26–50), poor (51–75), very poor (76–100) and unsuitable for drinking and domestic purposes (above 100).

3.3. Adaptive Neuro-Fuzzy Inference System (ANFIS) in MATLAB
An adaptive neuro-fuzzy inference system (ANFIS) couples an artificial neural network (ANN) with a fuzzy inference system (FIS); in this study it is implemented in MATLAB. Neural networks and fuzzy logic are related and complementary technologies. A neural network can learn from data and feedback, but the knowledge it acquires, or the trend in the data, is difficult to interpret. Fuzzy logic models and toolboxes, by contrast, are easy to apply and interpret because they use linguistic IF-THEN rules. An ANFIS consists of five functional building blocks of the fuzzy logic toolbox: (i) a rule base, (ii) a database, (iii) a decision-making unit, (iv) a fuzzification interface and (v) a defuzzification interface.
3.4. Architecture and Basic Learning Rules of the ANFIS System
A typical adaptive network consists of a number of nodes, each characterized by a node function with fixed or adjustable parameters, connected through directional links. The basic learning rule of ANFIS is the back-propagation method, which minimizes the error, usually the sum of squared differences between the network output and the desired output over the data. The learning (training) phase of ANFIS is thus a process of determining the parameter values that best fit the given training data; model performance is then checked on separate data in the testing phase, where a good fit is also expected. Considering a first-order Takagi–Sugeno–Kang (TSK) fuzzy inference system, a neuro-fuzzy model with two rules is given by Sugeno and Kang (1988) as:
Rule 1: If x is A1 and y is B1, then f1 = p1x + q1y + r1
Rule 2: If x is A2 and y is B2, then f2 = p2x + q2y + r2
If f1 and f2 are constants instead of linear equations, the model is a zero-order TSK fuzzy model. All node functions in the same layer belong to the same function family; $O_{j,i}$ denotes the output of the ith node in layer j.

3.5. Training and Testing of Data with the ANFIS GUI Editor
The data are normalized and used as input to the principal component analysis described in Section 3. The principal components extracted by SPSS Version 20 are then normalized and used as inputs to ANFIS. The output for each record is the WQI calculated by the procedure given in Section 3.2. A five-layer ANFIS model is created during training. Starting with two nodes, the number of nodes in the second layer is increased gradually during training, and the error decreases as the number of nodes increases up to three. Hence, the number of nodes in the second layer is fixed at three for the further analysis of the ANFIS model.
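Section 3.5 states that both the raw data and the extracted principal components are normalized, but not which scaling is used. The Python sketch below assumes min–max scaling to [0, 1], a common choice for ANFIS inputs, applied column by column to a table of samples; the PC scores in the example are hypothetical.

```python
def min_max(column):
    """Linearly rescale a list of numbers to [0, 1] (assumed normalization)."""
    lo, hi = min(column), max(column)
    if hi == lo:  # constant column: map every value to 0
        return [0.0] * len(column)
    return [(x - lo) / (hi - lo) for x in column]


def normalize_columns(rows):
    """Apply min_max to each column of a list of equal-length rows."""
    columns = [min_max(list(col)) for col in zip(*rows)]
    return [list(r) for r in zip(*columns)]


# Hypothetical PC scores (PC1, PC2) for three samples.
scores = [[2.0, -1.0], [4.0, 0.0], [6.0, 1.0]]
print(normalize_columns(scores))  # [[0.0, 0.0], [0.5, 0.5], [1.0, 1.0]]
```

In a train/test split, the minimum and maximum would normally be taken from the training data and reused unchanged for the testing data, so that both sets are scaled identically.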
The five layers comprise one input layer, three hidden layers and one output layer. The network is run in MATLAB R2012b (Version 8). A Gaussian membership function (gaussmf) is chosen for the inputs and a constant membership function for the output when generating the fuzzy inference system. The flow chart of the complete approach and the ANFIS algorithm is shown in Fig. 3.
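The five ANFIS layers (fuzzification, rule firing, normalization, rule consequents and summation) can be traced in a few lines of Python. The sketch below is a forward pass only, with no hybrid learning. It assumes the standard ANFIS choices of the product for the fuzzy AND and the normalized weighted average for defuzzification, uses the same Gaussian form as MATLAB's gaussmf, and accepts either constant (zero-order, as chosen for the output here) or linear (first-order, as in Rule 1 and Rule 2 of Section 3.4) consequents. All membership-function parameters and consequents in the example are hypothetical.

```python
import math


def gaussmf(x, sigma, c):
    """Gaussian membership grade exp(-(x - c)^2 / (2 sigma^2)), as in MATLAB gaussmf."""
    return math.exp(-((x - c) ** 2) / (2.0 * sigma ** 2))


def tsk_output(inputs, rules):
    """Forward pass of a Takagi-Sugeno-Kang system laid out as the five ANFIS layers.

    Each rule is (antecedents, consequent):
      antecedents -- one (sigma, c) Gaussian MF per input;
      consequent  -- a number (zero order) or coefficients [p1, ..., pn, r]
                     giving f = p1*x1 + ... + pn*xn + r (first order).
    """
    strengths, outputs = [], []
    for antecedents, consequent in rules:
        # Layers 1-2: membership grades combined with the product (fuzzy AND).
        w = 1.0
        for x, (sigma, c) in zip(inputs, antecedents):
            w *= gaussmf(x, sigma, c)
        strengths.append(w)
        # Layer 4 (before weighting): the rule's consequent f_i.
        if isinstance(consequent, (int, float)):
            outputs.append(float(consequent))
        else:
            *coeffs, r = consequent
            outputs.append(sum(p * x for p, x in zip(coeffs, inputs)) + r)
    total = sum(strengths)
    if total == 0.0:
        raise ValueError("no rule fires for these inputs")
    # Layer 3 (normalized firing strengths) and layer 5 (weighted sum).
    return sum(w / total * f for w, f in zip(strengths, outputs))


# Hypothetical zero-order system: two normalized PC inputs, two rules.
rules = [
    ([(0.3, 0.0), (0.3, 0.0)], 20.0),  # IF PC1 is low AND PC2 is low THEN WQI = 20
    ([(0.3, 1.0), (0.3, 1.0)], 60.0),  # IF PC1 is high AND PC2 is high THEN WQI = 60
]
print(tsk_output([0.5, 0.5], rules))  # equidistant from both rules: 40.0
```

Training (in MATLAB, the hybrid learning of ANFIS) would adjust the (sigma, c) pairs and the consequents; the forward pass itself stays as shown.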
Fig. 3. Flow chart showing the steps of the ANFIS model.
4. Results and discussion
The variation and distribution of the actual and predicted WQI for the training and testing data of the River Brahmani are shown in Fig. 4(a) and (b), respectively. The plots of the training and testing data together with the FIS output show that the predicted values follow the distribution of the actual data closely.
Fig. 4. (a) Distribution of actual and predicted WQI (training data); (b) distribution of actual and predicted WQI (testing data).
Here, the blue dots indicate actual output and the red dots indicate predicted WQI. The surface plot for these data sets is shown in Fig. 5; the surface covers the entire decision space. The complete set of rules generated by the Rule Editor of the ANFIS GUI Editor for predicting the WQI is shown in Fig. 6. The mean absolute percentage error (MAPE) is 0.37 for the training data and 1.09 for the testing data. When the same training and testing data are used to forecast the WQI without transformation into principal components, the MAPE values are 12.84 and 13.52, respectively. The regression relationship between the actual WQI and the WQI predicted by the ANFIS model for the training and testing data is plotted in Fig. 7. The coefficient of determination (R²) is 0.970 for training and 0.792 for testing; these high values indicate that the model fits the data well. Initially, two principal components, explaining 44.98% (about 45%) of the total variation, were used as inputs. Then three and four principal components, explaining 55.43% and 65.15% of the total variation, respectively, were used as inputs to the ANFIS model. The MAPE for the training data is 0.90, 0.86 and 0.37 when two, three and four principal components, respectively, are used as inputs. Thus, as the number of input components decreases, the MAPE increases owing to the loss of information. The ANFIS model therefore performs well when four principal components explaining 65.15% of the total variation are used as inputs in the ANFIS GUI Editor.
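The two error measures used above can be computed as in the Python sketch below. It assumes that MAPE is expressed in percent and that R² is the square of the Pearson correlation between actual and predicted WQI, which equals the coefficient of determination of the least-squares line fitted in a regression plot such as Fig. 7. The sample values are hypothetical.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be non-zero)."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def r_squared(actual, predicted):
    """Coefficient of determination of the least-squares line, i.e. Pearson r squared."""
    n = len(actual)
    ma, mp = sum(actual) / n, sum(predicted) / n
    sap = sum((a - ma) * (p - mp) for a, p in zip(actual, predicted))
    saa = sum((a - ma) ** 2 for a in actual)
    spp = sum((p - mp) ** 2 for p in predicted)
    return sap * sap / (saa * spp)


# Hypothetical actual and predicted WQI values.
actual = [40.0, 50.0]
predicted = [42.0, 45.0]
print(round(mape(actual, predicted), 2))  # 7.5: errors of 5% and 10%
```

Note that R² measures how well a straight line fits the scatter, not whether predictions equal the actual values; a model that systematically doubles every WQI would still score R² = 1, which is why MAPE is reported alongside it.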
Fig. 5. Surface plot.
Fig. 6. Set of rules for prediction of WQI.
Fig. 7. Correlation between predicted and actual WQI for training and testing data.
5. Conclusions
The water quality of the River Brahmani was predicted with four principal components as inputs and the WQI as output. Using Gaussian membership functions, the outputs are classified as excellent, good, poor, very poor or unsuitable for drinking. Treating the principal components of the water quality data as input fuzzy sets provides a statistical foundation for expressing the water quality index in linguistic terms, e.g. very poor, poor, decent, good, very good and excellent. The WQI values predicted by the ANFIS model lie between 21 and 52; from this range, it can be said that the water of the Brahmani River can be used for drinking and domestic purposes to a certain extent. The coefficients of determination (R²) for the regression plots between actual WQI and WQI predicted by the ANFIS model are 0.970 and 0.792 for the training and testing data, respectively, and the mean absolute percentage errors are 0.37 and 1.09, respectively. These results indicate that the ANFIS model predicts the WQI with reasonable accuracy.

References
BIS, 1991. Specification for drinking water, IS: 10500. Bureau of Indian Standards.
Rumelhart, D.E., Hinton, G.E., Williams, R.J., 1986. Learning internal representations by error propagation. In: Parallel Distributed Processing: Explorations in the Microstructure of Cognition. MIT Press, Cambridge, MA, 1–8, 318–362.
Sahu, M., Mohapatra, S.S., Sahu, H.B., Patel, R.K., 2011. Prediction of water quality index using neuro fuzzy inference system. Water Quality, Exposure and Health 3, 175–191.
Singkran, N., Yenpiem, A., Sasitorn, P., 2010. Determining water conditions in the north-eastern rivers of Thailand using time series and water quality index models. Journal of Sustainable Energy & Environment 1, 47–58.
Sugeno, M., Kang, G.T., 1988. Structure identification of fuzzy model. Fuzzy Sets and Systems 28, 942–947.
WHO, 2006. Guidelines for drinking-water quality, first addendum to third edition, Volume 1: Recommendations. Geneva, Switzerland.
Zhou, J., Su, G., Jiang, C., Deng, Y., Li, C., 2007. A face and fingerprint identity authentication system based on multi-route detection. Neurocomputing 70, 922–931.