Expert Systems with Applications 34 (2008) 1038–1043, www.elsevier.com/locate/eswa
Prediction of wastewater treatment plant performance based on wavelet packet decomposition and neural networks

Davut Hanbay a,*, Ibrahim Turkoglu a, Yakup Demir b

a Firat University, Technical Education Faculty, Department of Electronics and Computer Science, 23119, Elazig, Turkey
b Firat University, Department of Electrical-Electronics Engineering, 23119, Elazig, Turkey
Abstract

In this paper, an intelligent wastewater treatment plant model is developed to predict the performance of a wastewater treatment plant (WWTP). The model is based on wavelet packet decomposition, entropy and a neural network. The data used in this work were obtained from a WWTP in Malatya, Turkey: daily records of the plant parameters over one year, taken from the plant laboratory. Wavelet packet decomposition was used to reduce the dimension of the model's input vectors. A suitable neural network architecture was determined through several trial-and-error steps. Because total suspended solids (TSS) is one of the measures of overall plant performance, the model is used to predict the TSS concentration in the plant effluent. According to the test results, the model performs at a desirable level, which makes it an efficient and robust tool for predicting WWTP performance.

© 2006 Elsevier Ltd. All rights reserved.

Keywords: Wavelet packet decomposition; Entropy; Wastewater treatment plant; Total suspended solid
* Corresponding author. Tel.: +90 424 237 4352; fax: +90 424 2184674. E-mail addresses: dhanbay@firat.edu.tr (D. Hanbay), iturkoglu@firat.edu.tr (I. Turkoglu), ydemir@firat.edu.tr (Y. Demir).
0957-4174/$ - see front matter © 2006 Elsevier Ltd. All rights reserved. doi:10.1016/j.eswa.2006.10.030

1. Introduction

In recent years, intelligent modeling of WWTPs has become popular because of rising concern about the environment. Developments in intelligent methods have made them usable for modeling complex systems. Intelligent modeling was first used to increase the robustness of existing models, but it is now used to obtain new models. Complex mathematical models include many biochemical processes, yet, due to the scarcity of measured data, it is almost impossible to obtain reliable estimates of their unknown dynamical parameters. Simpler, manageable models are therefore needed, and considerable effort has been expended to change or modify the traditional wastewater treatment processes (Müller, Noykova, Gyllenberg, & Timmer, 2002). Improper operation of a WWTP may bring about serious environmental and public health problems, as its effluent to a receiving water body can cause or spread various diseases to human beings. Better control of a WWTP can be achieved by developing robust models that predict plant performance from past observations of certain key parameters. However, modeling a WWTP is a difficult task due to the complexity of the treatment processes. The complex physical, biological and chemical processes involved in wastewater treatment exhibit nonlinear behaviors that are difficult to describe with linear mathematical models (Hamed, Khalafallah, & Hassanien, 2004).

In the last decade, many studies of wastewater treatment based on intelligent methods have been carried out. They address the modeling of WWTPs, the prediction of WWTP parameters, process control, and the estimation of the characteristics of WWTP output parameters. Some of these studies are as follows. A novel ANN-based approach designed to provide better predictions of the nitrogen content of treated effluent was reported by Chen, Chang, and Shieh (2003). Total suspended solids (TSS) is an indication of plant performance; simple neural network prediction models for TSS were demonstrated by Belanche, Valdés, Comas, Roda, and Poch (2000). A neural network model to predict long-term fouling of the nanofiltration membranes used to purify contaminated water supplies was developed by Shetty and Chellam (2003). ANN-based models for predicting biological oxygen demand (BOD) and suspended solids (SS) concentrations in plant effluent were presented by Hamed et al. (2004). An ANN model of the evolution of pollutant concentration during irradiation time under various conditions was presented by Göb et al. (1999). Coagulation–flocculation is a major step in the drinking water treatment process, allowing the removal of colloidal particles; an ANN predictor of coagulant dosage that facilitates process operation was reported by Gagnon, Grandjean, and Thibault (1997). Miller, Itoyama, Uda, Takada, and Bhat (1997) applied a simplified hybrid neural net approach to the modeling and subsequent analysis of a chemical wastewater treatment plant in Mizushima, Japan, in order to reduce the occurrences of overflow in the clarifier caused by filamentous bulking and thereby increase wastewater treatment capacity. A soft-sensing method based on neural networks was proposed to detect wastewater treatment quality parameters on-line. In that study, the wastewater treatment technique was analyzed systematically: the parameters that can be measured on-line were taken as secondary variables, and those that cannot were taken as primary variables. A back-propagation (BP) NN for soft sensing was proposed and trained using test data from a practical treatment process.
The simulation results showed that the BPNN-based soft-sensing system for wastewater treatment can correctly estimate the quality parameters in real time (Wan-liang & Min, 2002). In another study, Maier, Morgan, and Chow (2004) used neural network models to model alum dosing of southern Australian surface waters.

In this paper, wavelet packet analysis, entropy and an NN are combined to extract features from the processed data in order to estimate TSS as the output quality parameter. The paper is organized as follows. Section 2 reviews some basic properties of the methods used: the wavelet transform, wavelet packet decomposition and NNs. The intelligent model and the entropy measure are described in Section 3; this method reduces the data size and makes the model efficient. The effectiveness of the proposed model is demonstrated in Section 4. Finally, Section 5 presents the discussion and conclusions.
2. Preliminaries

In this section, the theoretical foundations for the intelligent modeling used in the presented study are given in the following subsections.

2.1. Wavelet transform

Wavelet transforms are finding increasing use in fields as diverse as telecommunications and biology. Because of their suitability for analyzing non-stationary signals, they have become a powerful alternative to Fourier methods in many medical applications, where such signals abound (Daubechies, 1998). The main advantage of wavelets is that they have a varying window size, wide for low frequencies and narrow for high ones, which leads to an optimal time–frequency resolution in all frequency ranges. Furthermore, because the windows are adapted to the transients of each scale, wavelets do not require stationarity. A wavelet expansion is similar to a Fourier series expansion, but it is defined by a two-parameter family of functions:

$$f(x) = \sum_{i,j} c_{i,j}\,\psi_{i,j}(x), \tag{1}$$

where i and j are integers, the functions ψ_{i,j}(x) are the wavelet expansion functions, and the two-parameter expansion coefficients c_{i,j} are called the discrete wavelet transform (DWT) coefficients of f(x). The coefficients are given by

$$c_{i,j} = \int_{-\infty}^{+\infty} f(x)\,\psi_{i,j}(x)\,dx. \tag{2}$$

The wavelet basis functions can be computed from a function ψ(x), called the generating or mother wavelet, through translation and dilation:

$$\psi_{i,j}(x) = 2^{i/2}\,\psi\!\left(2^{i}x - j\right), \tag{3}$$

where j is the translation and i the dilation parameter. The mother wavelet is not unique, but it must satisfy a small set of conditions. One of them is the multi-resolution condition, which is related to the two-scale difference equation

$$\phi(x) = \sqrt{2}\sum_{k} h(k)\,\phi(2x - k), \tag{4}$$

where φ(x) is the scaling function and h(k) must satisfy several conditions to make the wavelet basis functions unique, orthonormal and of a certain degree of regularity. The mother wavelet is related to the scaling function as follows:

$$\psi(x) = \sqrt{2}\sum_{k} g(k)\,\phi(2x - k), \tag{5}$$

where g(k) = (−1)^k h(1 − k). Thus, if a valid h(k) is available, g(k) can be obtained. Note that h and g can be viewed as the filter coefficients of half-band low-pass and high-pass filters, respectively. A J-level wavelet decomposition can be computed with Eq. (6):

$$f_0(x) = \sum_{k} c_{0,k}\,\phi_{0,k}(x) = \sum_{k}\left( c_{J+1,k}\,\phi_{J+1,k}(x) + \sum_{j=0}^{J} d_{j+1,k}\,\psi_{j+1,k}(x) \right), \tag{6}$$
where the coefficients c_{0,k} are given, and the coefficients c_{j+1,n} and d_{j+1,n} at scale j + 1 can be obtained if the coefficients at scale j are available:

$$c_{j+1,n} = \sum_{k} c_{j,k}\,h(k - 2n), \qquad d_{j+1,n} = \sum_{k} c_{j,k}\,g(k - 2n). \tag{7}$$
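To make the relation after Eq. (5) and the recursion of Eq. (7) concrete, the following minimal Python sketch derives the high-pass filter g from a low-pass filter h and performs one analysis step. It assumes the Haar filter pair h(0) = h(1) = 1/√2 (the Haar wavelet coincides with Symlet-1), a finite input and zero samples beyond its ends; the function names are illustrative only and are not taken from the original work.

```python
import math

# Haar low-pass filter h(k), stored as {k: h(k)}; Haar coincides with Symlet-1.
HAAR_H = {0: 1 / math.sqrt(2), 1: 1 / math.sqrt(2)}


def highpass_from_lowpass(h):
    """Return g(k) = (-1)**k * h(1 - k), the relation stated after Eq. (5)."""
    g = {}
    for k_h, value in h.items():
        k = 1 - k_h  # h(1 - k) equals h(k_h) exactly when k = 1 - k_h
        g[k] = (-1) ** k * value
    return g


def analysis_step(c, h, g):
    """One level of Eq. (7).

    c_{j+1,n} = sum_k c_{j,k} h(k - 2n) and d_{j+1,n} = sum_k c_{j,k} g(k - 2n).
    Writing m = k - 2n, each output sums filter tap m times sample c[2n + m];
    samples outside c are treated as zero (no boundary extension).
    """
    approx, detail = [], []
    for n in range(len(c) // 2):
        approx.append(sum(hm * c[2 * n + m] for m, hm in h.items() if 0 <= 2 * n + m < len(c)))
        detail.append(sum(gm * c[2 * n + m] for m, gm in g.items() if 0 <= 2 * n + m < len(c)))
    return approx, detail


g = highpass_from_lowpass(HAAR_H)
a, d = analysis_step([4.0, 2.0, 5.0, 5.0], HAAR_H, g)
print([round(x, 4) for x in a], [round(x, 4) for x in d])
# [4.2426, 7.0711] [1.4142, 0.0]
```

Because the Haar pair is orthonormal, the input energy (16 + 4 + 25 + 25 = 70) is preserved in the approximation and detail coefficients (18 + 50 + 2 = 70). Applying analysis_step again to the approximation only gives the next DWT level.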
2.2. Wavelet packet decomposition

As an extension of the standard wavelets, wavelet packets represent a generalization of multi-resolution analysis and use the entire family of subband decompositions to generate an overcomplete representation of signals (Wang, Teo, & Lin, 2001). Wavelet decomposition exploits the fact that high-frequency components can be resolved within a small time window, while only low-frequency components need large time windows. This is because a low-frequency component completes a cycle over a long time interval, whereas a high-frequency component completes a cycle in a much shorter interval. Therefore, slowly varying components can only be identified over long time intervals, whereas fast-varying components can be identified over short ones. Wavelet decomposition can be regarded as a continuous-time wavelet decomposition sampled at different frequencies at every level or scale. The wavelet decomposition function at level m and time location t_m can be expressed as

$$d_m(t_m) = x(t)\,\Psi_m\!\left(\frac{t - t_m}{2^{m}}\right), \tag{8}$$

where Ψ_m is the decomposition filter at frequency level m. The effect of the decomposition filter is scaled by the factor 2^m at stage m, but otherwise its shape is the same at all scales (Devasahayam, 2000). Wavelet packet analysis is an extension of the discrete wavelet transform (DWT) (Burrus, Gopinath, & Guo, 1998), and it turns out that the DWT is only one of many possible decompositions that could be performed on the signal. Instead of decomposing only the low-frequency component, it is possible to subdivide the whole time–frequency plane into different time–frequency pieces. The advantage of wavelet packet analysis is that different levels of decomposition can be combined to achieve the optimum time–frequency representation of the original signal (Turkoglu, Arslan, & Ilkay, 2003).

2.3. Neural networks

Neural networks (NNs) are biologically inspired and mimic the human brain. They consist of a large number of simple processing elements called neurons. A schematic diagram of an artificial neuron model is shown in Fig. 1. Let X = (X_1, X_2, ..., X_m) represent the m inputs applied to the neuron, let W_i be the weight for input X_i and let b be a bias. The output of the neuron is then given by Eq. (9):

$$u = \sum_{i=1}^{m} x_i w_i - b \quad \text{and} \quad V = f(u). \tag{9}$$

Fig. 1. Artificial neuron model. (Inputs X_1, ..., X_m, weights W_1, ..., W_m, bias b, summing junction Σ producing u, activation function f(·), output V.)

The neurons are connected by links. Each link has a weight that multiplies the signal transmitted through the network, and each neuron has an activation function that determines its output. There are many kinds of activation function; nonlinear activation functions such as the sigmoid and step functions are usually used. NNs are trained by experience: when an unknown input is applied, the network can generalize from past experience and produce a new result (Haykin, 1994).
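Eq. (9) can be sketched as a single Python function. This is a minimal illustration, not the authors' code: the bias is subtracted as in Eq. (9), and the activation defaults to tanh, which is mathematically identical to the tangent-sigmoid function used later in Table 2; the function name is hypothetical.

```python
import math


def neuron_output(x, w, b, f=math.tanh):
    """Artificial neuron of Eq. (9): u = sum_i x_i * w_i - b, V = f(u).

    f defaults to tanh, which equals the tangent-sigmoid function.
    """
    if len(x) != len(w):
        raise ValueError("x and w must have the same length")
    u = sum(xi * wi for xi, wi in zip(x, w)) - b
    return f(u)


# Two inputs with a tanh activation, then with a linear (identity) activation.
print(neuron_output([1.0, 2.0], [0.5, -0.25], 0.0))                # u = 0.0, so V = 0.0
print(neuron_output([1.0, 2.0], [0.5, 0.25], 0.5, f=lambda u: u))  # u = 0.5, so V = 0.5
```

Passing the activation as a parameter mirrors the text's point that many activation functions are possible; a nonlinear f is what lets layers of such neurons model nonlinear plant behavior.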
Neural networks are systems deliberately constructed to make use of organizational principles resembling those of the human brain (Haykin, 1994). They represent a promising new generation of information processing systems. With their remarkable ability to derive meaning from complicated or imprecise data, neural networks can be used to extract patterns and detect trends that are too complex to be noticed by humans or by other computer techniques. They are good at tasks such as pattern matching and classification, function approximation, optimization and data clustering (Bishop, 1996). When designing an NN model, a number of considerations must be taken into account. First, a suitable NN structure must be chosen; then the activation function and the activation values need to be determined, and the number of layers and the number of units in each layer must be chosen. Generally, the desired model consists of several layers. The most general model assumes complete interconnection between all units, and these connections can be bidirectional or unidirectional. The advantages of NNs can be listed as follows:
• They can be implemented electrically or optically, or modeled on a general-purpose computer.
• They are fault tolerant and robust.
• They work in parallel, and special hardware devices that take advantage of this capability are being designed and manufactured.
• Many learning paradigms or algorithms are available in practice.
• They can learn how to do tasks from the data given for training or from initial experience.
An NN can create its own organization or representation of the information it receives during learning. There are many kinds of NN structure; one of them is the multilayer feed-forward NN shown in Fig. 2.

Fig. 2. Multilayer feed forward neural network structure. (Input layer, hidden layer and output layer of summing (Σ) neurons with bias inputs, from network input to output.)

3. Procedure

The realization steps are as follows:

Step 1: First, the parameter database is formed. Records with missing values are discarded. Some of these data are shown in Table 1, where T is temperature, COD is chemical oxygen demand and TN is total nitrate. The data are normalized by

$$s(i) = \frac{s(i) - \min(s)}{\max(s) - \min(s)}. \tag{10}$$

Table 1
A group of data used for training
(Influent and effluent parameters, 15 daily records. Columns: flow (m3/day); total (kWh/day); pH, COD (mg/l), TSS (mg/l), TN (mg/l) and conductivity (µS/cm) of the raw wastewater and of the treated effluent; T (°C).)

Step 2: This step concerns feature extraction and prediction. Fig. 3 shows the wavelet packet and NN structure used for intelligent modeling. Feature extraction is the key process for intelligent methods and arguably the most important component of intelligent modeling. A feature extractor should reduce the input vector (i.e., the original waveform) to a lower dimension that retains most of the useful information in the original vector. The goal of feature extraction is to extract features from these data for reliable intelligent modeling. For feature extraction, the wavelet packet and NN structure was used.

The wavelet packet and NN structure is composed of two layers: the wavelet packet layer and the multilayer perceptron layer.

Wavelet packet layer: This layer is responsible for feature extraction from the input data. The feature extraction process has two stages:

Stage 1 – Wavelet packet decomposition: For the wavelet packet decomposition of the input data, a level-3 decomposition structure was used, as shown in Fig. 4.
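The normalization of Step 1 and the two feature-extraction stages (the level-3 decomposition of Stage 1 and the sure entropy of Eq. (11) in Stage 2) can be sketched as follows. This is a minimal pure-Python illustration under stated assumptions, not the authors' implementation: Symlet-1 is realized as the Haar filter pair, the input length is assumed divisible by 2^3, boundary handling is ignored, Eq. (11) is evaluated exactly as printed, and all names and the sample input are hypothetical. The sketch returns one entropy value per terminal node (eight at level 3); the text does not detail how these become the length-four feature vectors fed to the network, so that step is omitted.

```python
import math

INV_SQRT2 = 1 / math.sqrt(2)


def min_max_normalize(s):
    """Eq. (10): map each s(i) to (s(i) - min(s)) / (max(s) - min(s))."""
    lo, hi = min(s), max(s)
    if hi == lo:
        raise ValueError("a constant signal cannot be min-max normalized")
    return [(v - lo) / (hi - lo) for v in s]


def haar_split(c):
    """One Haar (Symlet-1) analysis step: returns (low-pass, high-pass) halves."""
    if len(c) % 2:
        raise ValueError("node length must be even at every level")
    half = len(c) // 2
    low = [(c[2 * n] + c[2 * n + 1]) * INV_SQRT2 for n in range(half)]
    high = [(c[2 * n] - c[2 * n + 1]) * INV_SQRT2 for n in range(half)]
    return low, high


def wavelet_packet(signal, level=3):
    """Full wavelet packet tree: unlike the DWT, BOTH branches are split again.

    Returns the 2**level terminal nodes in natural tree order (not frequency order).
    """
    nodes = [list(signal)]
    for _ in range(level):
        nodes = [half for node in nodes for half in haar_split(node)]
    return nodes


def sure_entropy(s, p):
    """Eq. (11) as printed: E(s) = sum_i min(s_i**2, p**2), with threshold p > 0."""
    if p <= 0:
        raise ValueError("threshold p must be positive")
    return sum(min(v * v, p * p) for v in s)


def extract_features(signal, p, level=3):
    """Normalize (Eq. 10), decompose (Stage 1), one sure entropy per terminal node (Stage 2)."""
    return [sure_entropy(node, p) for node in wavelet_packet(min_max_normalize(signal), level)]


x = [0.0, 2.0, 4.0, 6.0, 8.0, 6.0, 4.0, 2.0]  # hypothetical 8-sample input
features = extract_features(x, p=10.0)
print(len(features))  # 8 terminal nodes at level 3
```

With a threshold p larger than every coefficient, each entropy reduces to the node energy, so the features sum to the energy of the normalized input; a smaller p caps each squared coefficient at p², which is the role of the threshold in Eq. (11).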
Table 2
MLP architecture and training parameters
Architecture
  The number of layers: 3
  The number of neurons on the layers: Input: 4, Hidden: 25, Output: 1
  The initial weights and biases: Random
  Activation functions: Tangent-sigmoid, Tangent-sigmoid, Linear
Training parameters
  Learning rule: Levenberg–Marquardt back-propagation
  Sum-squared error: 0.0000001

Fig. 3. The structure of intelligent modeling. (Blocks: input parameters; waste water treatment plant, giving the desired output; intelligent plant model consisting of wavelet packet decomposition, entropy and neural network, giving the model output; summing junction (+) joining the two outputs.)
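The architecture in Table 2 can be sketched as a forward pass in plain Python. This is one reading of the table, stated as an assumption: three weight layers of 4, 25 and 1 neurons with tangent-sigmoid, tangent-sigmoid and linear activations, applied to a length-four feature vector. The uniform initialization range, the seed and all names are hypothetical, and training with Levenberg–Marquardt back-propagation to the sum-squared error goal of Table 2 is not shown.

```python
import math
import random


def init_layer(n_in, n_out, rng):
    """Random initial weights and biases; the uniform range [-0.5, 0.5] is an assumption."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    biases = [rng.uniform(-0.5, 0.5) for _ in range(n_out)]
    return weights, biases


def build_mlp(sizes=(4, 4, 25, 1), seed=0):
    """Three weight layers: 4 features -> 4 -> 25 -> 1 (one reading of Table 2)."""
    rng = random.Random(seed)
    return [init_layer(n_in, n_out, rng) for n_in, n_out in zip(sizes[:-1], sizes[1:])]


def mlp_forward(x, layers, activations=(math.tanh, math.tanh, lambda u: u)):
    """Forward pass; each neuron computes f(sum_i w_i x_i - b) as in Eq. (9)."""
    for (weights, biases), f in zip(layers, activations):
        x = [f(sum(w * xi for w, xi in zip(row, x)) - b) for row, b in zip(weights, biases)]
    return x


layers = build_mlp()
y = mlp_forward([0.1, 0.2, 0.3, 0.4], layers)  # hypothetical feature vector
print(len(y))  # one output: the (normalized) TSS estimate
```

The linear output layer lets the network produce an unbounded continuous value, which suits predicting a concentration such as TSS, while the tanh layers supply the nonlinearity.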
Wavelet packet decomposition was applied to the input data using the Symlet-1 wavelet decomposition filters, ψ.

Fig. 4. Total decomposition tree of wavelet packet analysis. (Labels: signal; low-pass and high-pass branches; C and D coefficients; DWT path; terminal nodes.)

Stage 2 – Wavelet entropy: An entropy-based criterion describes information-related properties for an accurate representation of a given signal. Entropy is a common concept in many fields, mainly in signal processing (Quiroga, Roso, & Basar, 1999). A method for measuring entropy therefore appears to be an ideal tool for quantifying the ordering of non-stationary signals. We next calculated the sure entropy of the wavelet packet coefficients, defined as

$$E(s) = \sum_{i} \min\left(s_i^{2},\, p^{2}\right), \qquad |s_i| \le p, \tag{11}$$

where the wavelet entropy E is a real number, s is the terminal node signal and s_i is the i-th sample of the terminal node signal. In the sure entropy, p is the threshold and must be a positive number. During WPNN training, the parameter p was increased in steps of 0.1 while the NN weights were updated randomly. In this way, feature vectors of length four are obtained.

Multi-layer perceptron (MLP) layer: This layer performs the prediction using the features from the wavelet packet layer. The training parameters and the structure of the MLP are shown in Table 2. These were selected for the best performance after several trial-and-error stages covering the number of hidden layers, the size of the hidden layers, the values of the momentum constant and learning rate, and the type of the activation functions. WPNN training performance is shown in Figs. 5 and 6.

Fig. 5. The intelligent model training. (Training (blue) and goal (black) curves on a logarithmic scale versus epochs, 352 epochs; panel title: "Performance is 9.05132e-008, Goal is 1e-007".)

Fig. 6. The intelligent model training performance. (Model outputs and desired outputs for TSS versus sample index, 0–50.)

4. Experimental results

The WPNN model was trained using 50 different input data sets, and another 12 were used for testing. The test results shown in Fig. 7 indicate that the WWTP output characteristic is modeled correctly by the WPNN, which demonstrates the effectiveness and reliability of the proposed approach for extracting features from the input data.

Fig. 7. Test performance of wavelet packet and NN model for TSS. (Model outputs and desired outputs for TSS versus sample index, 0–12.)

5. Discussion and conclusion

This work demonstrates the use of wavelet packets and an NN for feature extraction and prediction in intelligent modeling. The application of wavelet packet entropy and adaptive feature extraction from the input data is shown. Wavelet entropy proved to be a very useful tool for characterizing the input data, which means that this method gives access to new information through an approach different from traditional analysis methods. The most important aspects of the intelligent model are the self-organization of the WPNN without explicit programming and the immediate response of a trained network in real-time applications. These features make the intelligent model suitable for complex systems, and the results point to the feasibility of designing such intelligent models. The test results of the realized model show the advantages of intelligent modeling: it is rapid, easy to operate, non-invasive and inexpensive.

Acknowledgement

We thank the Malatya Municipality, Malatya, Turkey, for providing the WWTP process data.

References
Belanche, L., Valdés, J. J., Comas, J., Roda, I. R., & Poch, M. (2000). Prediction of the bulking phenomenon in wastewater treatment plants. Artificial Intelligence in Engineering, 14(4), 307–317.
Bishop, C. M. (1996). Neural networks for pattern recognition. Oxford: Clarendon Press.
Burrus, C. S., Gopinath, R. A., & Guo, H. (1998). Introduction to wavelet and wavelet transforms. New Jersey, USA: Prentice Hall.
Chen, J. C., Chang, N. B., & Shieh, W. K. (2003). Assessing wastewater reclamation potential by neural network model. Engineering Applications of Artificial Intelligence, 16, 149–157.
Daubechies, I. (1998). Orthogonal bases of compactly supported wavelets. Communications on Pure and Applied Mathematics, 41, 909–996.
Devasahayam, S. R. (2000). Signals and systems in biomedical engineering. Kluwer Academic Publishers.
Gagnon, C., Grandjean, B. P. A., & Thibault, J. (1997). Modelling of coagulant dosage in a water treatment plant. Artificial Intelligence in Engineering, 11, 401–404.
Göb, S., Oliveros, E., Bossmann, S. H., Braun, A. M., Guardani, R., & Nascimento, C. A. O. (1999). Modeling the kinetics of a photochemical water treatment process by means of artificial neural networks. Chemical Engineering and Processing, 38, 373–382.
Hamed, M. M., Khalafallah, M. G., & Hassanien, E. A. (2004). Prediction of wastewater treatment plant performance using artificial neural networks. Environmental Modelling and Software, 19, 919–928.
Haykin, S. (1994). Neural networks, a comprehensive foundation. College Publishing Comp. Inc.
Maier, H. R., Morgan, N., & Chow, C. W. K. (2004). Use of artificial neural networks for predicting optimal alum doses and treated water quality parameters. Environmental Modelling and Software, 19, 485–494.
Miller, R. M., Itoyama, K., Uda, A., Takada, H., & Bhat, N. (1997). Modeling and control of a chemical waste water treatment plant. Computers and Chemical Engineering, 21, 947–952.
Müller, T. G., Noykova, N., Gyllenberg, M., & Timmer, J. (2002). Parameter identification in dynamical models of anaerobic waste water treatment. Mathematical Biosciences, 177–178, 147–160.
Quiroga, R. Q., Roso, O. A., & Basar, E. (1999). Wavelet entropy: a measure of order in evoked potentials. Elsevier Science, Evoked Potentials and Magnetic Fields, September, 49, 298–302.
Shetty, R. G., & Chellam, S. (2003). Predicting membrane fouling during municipal drinking water nanofiltration using artificial neural networks. Journal of Membrane Science, 217(1–2), 69–86.
Turkoglu, I., Arslan, A., & Ilkay, E. (2003). An intelligent system for diagnosis of the heart valve diseases with wavelet packet neural networks. Computers in Biology and Medicine, 33(4), 319–331.
Wang, L., Teo, K. K., & Lin, Z. (2001). Predicting time with wavelet packet neural networks. In Proceedings of the International Joint Conference on Neural Networks (IJCNN'01), INNS-IEEE, Washington DC, vol. 3, pp. 1593–1597.
Wan-liang, W., & Min, R. (2002). Soft-sensing method for wastewater treatment based on BP neural network. In Proceedings of the 4th World Congress on Intelligent Control and Automation, Shanghai, P.R. China.