Available online at www.sciencedirect.com
ScienceDirect Procedia Engineering 186 (2017) 544 – 550
XVIII International Conference on Water Distribution Systems Analysis, WDSA2016
Gene expression programming in long term water demand forecasts using wavelet decomposition
Peyman Yousefi^a, Sina Shabani^a, Hadi Mohammadi^a, Gholamreza Naser^a,*
^a Okanagan School of Engineering, The University of British Columbia, Canada
Abstract
Increasingly frequent droughts and lack of access to potable water reserves have been major risks for water authorities and governments in recent years. Long-term water demand forecasts are therefore receiving much more attention. Instead of conventionally projecting historical water demand, researchers have implemented sophisticated mathematical models to predict water demand. Gene expression programming (GEP), a relatively new forecasting technique, remains to be explored in this endeavor. The main purpose of this research was to assess the performance of GEP models using wavelet decomposition with 2 transfer functions (db2 and haar) and 3 decomposition levels. The results showed that GEP models can be highly sensitive to wavelet decomposition when all combinations of proper lag times are used as model inputs.
© 2016 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
Peer-review under responsibility of the organizing committee of the XVIII International Conference on Water Distribution Systems.
Keywords: Water Demand Forecasting; Gene Expression Programming; Wavelet Decomposition; Lag Time.
* Corresponding author. Tel.: +1-250-807-8464; fax: +1-250-807-9850. E-mail address: [email protected]
1877-7058 © 2016 The Authors. Published by Elsevier Ltd.
doi:10.1016/j.proeng.2017.03.268

1. Introduction
Water demand is coupled to highly uncertain governing factors, which makes water distribution systems (WDSs) among the most complex infrastructures in terms of resource management. Climate change has significantly shifted water availability patterns in recent years [1]. At the same time, rapid economic expansion and increasing urbanization have escalated water demand worldwide [2]. Consequently, governments are expected to be prepared for a highly uncertain future of water demand. Simple projection of historical water usage patterns is no longer adequate, given the complexity of these global drivers of demand. An informed engineering prediction of demand is therefore necessary for efficient management of this valuable resource.
Water demand forecasting has attracted many researchers over the last decade, and a wide range of modeling techniques has been employed for such forecasts. Researchers in this field have compared multiple linear regression [3-5], time-series analysis [6-7], artificial neural networks [8-9], hybrid models [10-11], and support vector machines [12-13]. More recently, gene expression programming has been employed in forecasting models [14]. Unlike most emerging techniques, GEP is not a black-box model. It delivers an explicit formulation of the fitted model, which can give it an edge over conventional models. Coupled wavelet-GEP models have given promising results in other fields such as precipitation prediction [15], runoff forecasting [16], and streamflow prediction [17]. The prime objective of this research is therefore to investigate the performance of this approach in long-term forecasting of water demand in the City of Kelowna district (CKD), Canada.

2. Methodology
2.1. Gene Expression Programming
Inspired by Darwin's theory of evolution, gene expression programming (GEP) was proposed by Ferreira [18] as an alternative or complement to other genetics-based computer programming techniques such as genetic programming (GP) and genetic algorithms (GA). GEP works with two simple entities: 1) chromosomes and 2) expression trees. It starts with the random generation of chromosomes, which are linear strings of fixed length composed of genes. Genetic operators (e.g., replication, recombination, and mutation) can be applied to these linear chromosomes without constraint.
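To make the distinction between linear chromosomes and expression trees concrete, here is a minimal sketch of how a single GEP gene written in Karva notation is read level by level into a tree and then evaluated. This is not the authors' implementation, and it does not perform any evolution. The function set {+, −, ×}, the single-letter terminals, and the names `decode` and `evaluate` are illustrative assumptions.

```python
import operator

# Illustrative function set (assumption): every function here takes two arguments.
ARITY = {"+": 2, "-": 2, "*": 2}
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def decode(gene):
    """Read a K-expression level by level (breadth first) into a tree.

    Each node is a (symbol, children) tuple. Symbols left over once every
    branch ends in a terminal form the non-coding region and are ignored.
    """
    root = (gene[0], [])
    level = [root]
    pos = 1
    while level:
        next_level = []
        for symbol, children in level:
            for _ in range(ARITY.get(symbol, 0)):
                child = (gene[pos], [])
                pos += 1
                children.append(child)
                next_level.append(child)
        level = next_level
    return root


def evaluate(tree, env):
    """Evaluate a decoded tree; `env` maps terminal symbols to values."""
    symbol, children = tree
    if symbol in OPS:
        left, right = (evaluate(child, env) for child in children)
        return OPS[symbol](left, right)
    return env[symbol]


tree = decode("*+-abcdab")  # (a + b) * (c - d); trailing "ab" is non-coding
print(evaluate(tree, {"a": 1, "b": 2, "c": 5, "d": 3}))  # prints 6
```

The trailing non-coding symbols are what let genetic operators modify a linear chromosome freely while still yielding a valid tree.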
Figure 1 shows the expression tree (ET) of a sample candidate solution, illustrating how the encoding differs from GP and GA. Such diagrams should be read from left to right. GEP models are trained by searching, under a selection environment, for the optimum candidate solution ("offspring/children") among the generated population. In this paper, maximum fitness was used as the stopping condition of the developed GEP models. Following previous researchers [15] and the values suggested by Ferreira [19], the model structures used 30 chromosomes, a head size of 8, and 3 genes. Root mean square error (RMSE) was used as the fitness function to fit the models to the target values.
Fig. 1. Structure (expression tree) of a sample GEP candidate solution
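The structural settings above (head size 8, 3 genes) fix the chromosome length through Ferreira's head/tail rule, t = h(n_max − 1) + 1, where n_max is the largest function arity. The sketch below makes three assumptions: a maximum arity of 2, sub-expression trees linked by addition, and a higher-is-better fitness of 1000/(1 + RMSE). The paper states only that RMSE was the fitness function, so these choices and the helper names are illustrative.

```python
import math


def tail_length(head_size, max_arity):
    """Ferreira's rule t = h (n_max - 1) + 1: the tail holds enough terminals
    to complete any tree the head can encode."""
    return head_size * (max_arity - 1) + 1


def chromosome_length(head_size, max_arity, n_genes):
    """Total number of symbols in a multigenic chromosome."""
    return n_genes * (head_size + tail_length(head_size, max_arity))


def link_by_addition(gene_outputs):
    """Assumed linking function: add the outputs of the sub-expression trees."""
    return [sum(values) for values in zip(*gene_outputs)]


def rmse(observed, predicted):
    """Root mean square error between targets and model outputs."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))


def fitness_from_rmse(observed, predicted, scale=1000.0):
    """Assumed higher-is-better mapping of RMSE; a perfect fit scores `scale`."""
    return scale / (1.0 + rmse(observed, predicted))


# Head size 8 and binary functions: tail 9, gene length 17, 3 genes -> 51 symbols.
print(tail_length(8, 2), chromosome_length(8, 2, 3))  # prints 9 51
```

With these assumptions, a 3-gene chromosome has 51 symbols. During evolution, the chromosome with the highest `fitness_from_rmse` (lowest RMSE) would be retained.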
2.2. Wavelet transform
In their raw time domain, time series often hide part of their information. Mathematical transformations are therefore used to extract additional information from such data and to make their underlying dynamics more readily apparent. The Fourier transform (FT) is the most widely used transform among researchers. A major disadvantage of the FT is that it is poorly suited to non-stationary data, because time information is lost when data are transferred from the time domain to the frequency domain [20]. Grossmann and Morlet [21] introduced the wavelet transform (WT) as a remedy for this shortcoming. In wavelet analysis, a fully scalable modulated window solves the signal-cutting problem, which allows the WT to consider time and frequency simultaneously. Long time intervals capture low-frequency information, whereas short intervals capture high-frequency information. The continuous WT is obtained by summing (integrating) over time the signal multiplied by scaled and shifted versions of a mother wavelet. Equation (1) gives the scaled and translated form of the mother wavelet function:
$$\psi_{s,\tau}(t) = \frac{1}{\sqrt{s}}\,\psi\!\left(\frac{t-\tau}{s}\right) \tag{1}$$
where $s^{-1/2}$ is the energy normalization factor, $\tau$ is the temporal translation parameter, and $s$ is the stretching (scale) parameter. In practice, however, the continuous WT is rarely used because of major drawbacks such as redundancy and the lack of analytical solutions for many wavelets. Discrete wavelets are therefore stretched and translated in discrete steps. Equation (2) represents a discrete wavelet function:
$$\psi_{j,k}(t) = \frac{1}{\sqrt{s_0^{\,j}}}\,\psi\!\left(\frac{t - k\,\tau_0\,s_0^{\,j}}{s_0^{\,j}}\right) \tag{2}$$
where $j$ and $k$ are integers, $s_0$ is a fixed stretching (dilation) step ($s_0 > 1$), and $\tau_0$ is the translation parameter ($\tau_0 > 0$). The values $s_0 = 2$ and $\tau_0 = 1$ are usually adopted for the dilation and translation parameters. The signals or time series are decomposed in a stepwise procedure by the Mallat algorithm. At each step, the low-frequency signal obtained at the previous step is further decomposed into high- and low-frequency signals. After the Nth step has been completed, the original signal is decomposed as in Equation (3):
$$X = D_1 + D_2 + \dots + D_N + A_N \tag{3}$$
where $D_1, D_2, \dots, D_N$ are the high-frequency signals obtained at the first, second, ..., Nth steps, respectively, and $A_N$ denotes the low-frequency signal obtained at the Nth step.
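The Mallat pyramid and the additive form of Equation (3) can be illustrated with the Haar wavelet and the dyadic choice s_0 = 2, τ_0 = 1. The pure-Python sketch below is only an illustration. It assumes the series length is divisible by 2^N and ignores boundary handling, and all function names are hypothetical. For db2, a library such as PyWavelets (`pywt.wavedec`) would normally be used instead.

```python
import math

SQRT2 = math.sqrt(2.0)


def haar_step(x):
    """One Mallat step with Haar filters: split x into (approximation, detail)."""
    if len(x) % 2:
        raise ValueError("series length must be even at every level")
    approx = [(x[i] + x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    return approx, detail


def haar_inverse_step(approx, detail):
    """Invert one Haar step, doubling the length."""
    out = []
    for a, d in zip(approx, detail):
        out.extend(((a + d) / SQRT2, (a - d) / SQRT2))
    return out


def haar_wavedec(x, levels):
    """Mallat pyramid: keep splitting the low-frequency part.
    Returns the A_N coefficients and the detail coefficients [d_1, ..., d_N]."""
    approx, details = list(x), []
    for _ in range(levels):
        approx, detail = haar_step(approx)
        details.append(detail)
    return approx, details


def _to_full_length(coeffs, level, is_detail):
    """Reconstruct one band on its own (all other bands set to zero)."""
    zeros = [0.0] * len(coeffs)
    signal = (haar_inverse_step(zeros, coeffs) if is_detail
              else haar_inverse_step(coeffs, zeros))
    for _ in range(level - 1):
        signal = haar_inverse_step(signal, [0.0] * len(signal))
    return signal


def additive_components(x, levels):
    """Full-length sub-series D_1..D_N and A_N with X = D_1 + ... + D_N + A_N."""
    approx, details = haar_wavedec(x, levels)
    d_series = [_to_full_length(d, j, True) for j, d in enumerate(details, start=1)]
    a_series = _to_full_length(approx, levels, False)
    return d_series, a_series
```

Each band is reconstructed separately, and the transform is linear. Summing the sub-series therefore returns the original series, which is exactly the additive form of Equation (3). For example, at level 3 an 8-value series has an $A_3$ that equals the series mean at every time step.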
2.3. Models Development
The high sensitivity of GEP models to the phase space reconstruction of input time series has been shown previously [22]. This paper follows our previous research to assess the effect of the optimum lag time of each variable separately. In addition, WT with 2 transfer functions (db2 and haar) and 3 decomposition levels is applied to improve the performance of the GEP models. Table 1 shows the design combinations, where t denotes the current month, D the water demand in MGl, T the temperature in °C, and P the total precipitation in mm. Average mutual information (AMI) was used to find the optimum lag time for each input variable [22]. Water consumption data for 1996–2010 in the City of Kelowna were partitioned into 80% for training and 20% for testing. Temperature and precipitation data were collected from weather station A, located at Kelowna airport.
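The lag selection and data preparation described above can be sketched as follows. Several details are assumptions because the paper does not specify them: the histogram (equal-width bin) estimator of AMI, the bin count, the first-local-minimum rule for choosing the lag, the chronological order of the 80/20 split, and the helper names.

```python
import math
from collections import Counter


def average_mutual_information(series, lag, bins=8):
    """Histogram (plug-in) estimate, in bits, of the mutual information
    between x_t and x_{t+lag}. Equal-width bins are an assumption."""
    lo, hi = min(series), max(series)
    width = (hi - lo) / bins or 1.0  # guard against a constant series

    def bin_of(value):
        return min(int((value - lo) / width), bins - 1)

    pairs = [(bin_of(series[i]), bin_of(series[i + lag]))
             for i in range(len(series) - lag)]
    n = len(pairs)
    joint = Counter(pairs)
    p_now = Counter(a for a, _ in pairs)
    p_later = Counter(b for _, b in pairs)
    ami = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        ami += p_ab * math.log2(p_ab / ((p_now[a] / n) * (p_later[b] / n)))
    return ami


def first_local_minimum(ami_by_lag):
    """Lag (1-based) at the first local minimum of an AMI curve whose first
    entry corresponds to lag 1, or None if there is no interior minimum."""
    for i in range(1, len(ami_by_lag) - 1):
        if ami_by_lag[i] < ami_by_lag[i - 1] and ami_by_lag[i] <= ami_by_lag[i + 1]:
            return i + 1
    return None


def lagged_design(series_by_name, lags_by_name, target="D"):
    """Rows of lagged inputs and the target value at the current month t.
    For design D4: lags_by_name = {"D": [1, 2, 3], "T": [1, 2, 3], "P": [1, 2, 3]}."""
    max_lag = max(max(lags) for lags in lags_by_name.values())
    n = len(series_by_name[target])
    X, y = [], []
    for t in range(max_lag, n):
        X.append([series_by_name[name][t - lag]
                  for name, lags in lags_by_name.items() for lag in lags])
        y.append(series_by_name[target][t])
    return X, y


def chronological_split(X, y, train_fraction=0.8):
    """First 80% of rows for training, the remaining 20% for testing."""
    cut = int(round(len(y) * train_fraction))
    return (X[:cut], y[:cut]), (X[cut:], y[cut:])
```

Under these assumptions, the lags listed in Table 1 would be passed to `lagged_design`, for example lags 1 to 3 of D, T and P for design D4.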
Table 1. Design combinations
Model ID  Input variables
D1        $D_{t-3}$, $T_{t-1}$, $P_{t-1}$
D2        $D_{t-1}$, $T_{t-3}$, $P_{t-1}$
D3        $D_{t-1}$, $T_{t-1}$, $P_{t-1}$
D4        $D_{t-1}$, $D_{t-2}$, $D_{t-3}$, $T_{t-1}$, $T_{t-2}$, $T_{t-3}$, $P_{t-1}$, $P_{t-2}$, $P_{t-3}$

3. Results and Discussion
The combinations used in this paper were chosen to investigate which design, that is, which use of proper lag times, improves the accuracy of the developed models. Table 2 summarizes the performance of the four GEP combinations with their corresponding function sets. Three performance indices were used to measure accuracy: the coefficient of determination (R²), the root mean square error (RMSE), and the mean absolute error (MAE). For the first combination, based on the optimum lag time of water demand, the first function set {+, −, ×} gave the best performance, with a fairly high R² = 0.8552 (D1F1). Surprisingly, the second combination behaved slightly differently from the first. Table 2 shows that the design based on the optimum lag time of temperature was the superior one at this stage of the research. In this combination, the third function set {+, −, ×, x, x², x³, √x, eˣ, log, ln} outperformed the other two, with R² = 0.8983 (D2F3). Explicit use of the optimum lag time of rainfall resulted in relatively poor designs compared with the other models, and the best of them reached only R² = 0.6432 (D3F2). Finally, the last combination considered all lag times up to the optimum value of water demand. It performed worse than the second combination, with R² = 0.8452 (D4F2).
Table 2. Performance of GEP models
Function  Model ID  Training                   Testing
                    MAE     RMSE    R²         MAE     RMSE    R²
1         D1F1      0.3304  0.445   0.8173     0.2955  0.3895  0.8552
2         D1F2      0.3336  0.4374  0.8079     0.4019  0.4891  0.8102
3         D1F3      0.4256  0.5941  0.724      0.3641  0.5149  0.7647
1         D2F1      0.284   0.3833  0.8567     0.2652  0.3604  0.8754
2         D2F2      0.2863  0.4189  0.8531     0.2608  0.3779  0.8795
3         D2F3      0.2405  0.3557  0.8733     0.2407  0.3232  0.8983
1         D3F1      0.4564  0.6029  0.6478     0.4697  0.6243  0.626
2         D3F2      0.4713  0.6045  0.635      0.4556  0.6007  0.6432
3         D3F3      0.4496  0.5959  0.6435     0.4758  0.6201  0.6199
1         D4F1      0.3497  0.4596  0.7878     0.4036  0.4883  0.8153
2         D4F2      0.32    0.4306  0.8164     0.3165  0.401   0.8452
3         D4F3      0.4094  0.5885  0.7057     0.3802  0.535   0.7643
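The three indices reported in Table 2 can be computed as in the sketch below. The paper does not state which definition of R² it uses, so both the coefficient of determination (1 − SS_res/SS_tot) and the squared Pearson correlation, which some hydrological studies report as "R-square", are shown. The function names are illustrative.

```python
import math


def mae(observed, predicted):
    """Mean absolute error."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def rmse(observed, predicted):
    """Root mean square error."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))


def coefficient_of_determination(observed, predicted):
    """R^2 = 1 - SS_res / SS_tot (assumed definition)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def squared_correlation(observed, predicted):
    """Squared Pearson correlation, an alternative 'R-square' used in some studies."""
    n = len(observed)
    mo, mp = sum(observed) / n, sum(predicted) / n
    cov = sum((o - mo) * (p - mp) for o, p in zip(observed, predicted))
    var_o = sum((o - mo) ** 2 for o in observed)
    var_p = sum((p - mp) ** 2 for p in predicted)
    return cov * cov / (var_o * var_p)
```

The two R² definitions coincide for an unbiased least-squares fit but can differ for GEP outputs. A perfectly correlated but biased prediction, for example, scores 1.0 on `squared_correlation` only.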
In the second part of this research, wavelet decomposition of the input variables was investigated to assess the use of this transformation technique in GEP models for water demand forecasting. Table 3 compares all combinations for the 2 transfer functions (db2 and haar) and 3 decomposition levels used in this study. The first three combinations did not improve their overall performance when the input variables were decomposed by wavelet algorithms. The fourth combination, however, proved highly sensitive to the decomposition of its input variables. D4F2-haar-L1 was selected as the superior model, with R² = 0.9999 in both training and testing.
Table 3. Performance of Wavelet-GEP models
Level  Model ID       Training                   Testing
                      MAE     RMSE    R²         MAE     RMSE    R²
1      D1F1-haar-L1   0.3621  0.4551  0.7942     0.4212  0.5299  0.7737
1      D1F1-db2-L1    0.3171  0.425   0.8196     0.4331  0.5702  0.77
2      D1F1-haar-L2   0.3173  0.4045  0.8359     0.4249  0.5455  0.7519
2      D1F1-db2-L2    0.2595  0.3453  0.8894     0.3679  0.5117  0.762
3      D1F1-haar-L3   0.3447  0.4349  0.8112     0.3974  0.51    0.7633
3      D1F1-db2-L3    0.3272  0.42    0.8481     0.3546  0.4496  0.835
1      D2F3-haar-L1   0.3492  0.5308  0.7546     0.4024  0.5729  0.7598
1      D2F3-db2-L1    0.3663  0.4533  0.8265     0.3562  0.4399  0.8266
2      D2F3-haar-L2   0.2161  0.2876  0.9175     0.3343  0.439   0.8321
2      D2F3-db2-L2    0.3118  0.4079  0.8342     0.3113  0.3981  0.8458
3      D2F3-haar-L3   0.3355  0.4256  0.8215     0.3243  0.4152  0.8509
3      D2F3-db2-L3    0.3341  0.4518  0.8022     0.3524  0.5114  0.7451
1      D3F2-haar-L1   0.4719  0.5968  0.6659     0.5181  0.6852  0.6598
1      D3F2-db2-L1    0.4248  0.5265  0.7303     0.5121  0.6521  0.5806
2      D3F2-haar-L2   0.325   0.4185  0.8306     0.3611  0.488   0.7892
2      D3F2-db2-L2    0.3321  0.4148  0.8297     0.4205  0.5756  0.719
3      D3F2-haar-L3   0.3551  0.4549  0.7943     0.5284  0.8568  0.5322
3      D3F2-db2-L3    0.3309  0.4471  0.8002     0.4397  0.601   0.6872
1      D4F2-haar-L1   0.0018  0.002   0.9999     0.0018  0.002   0.9999
1      D4F2-db2-L1    0.0017  0.002   0.9999     0.017   0.0906  0.9921
2      D4F2-haar-L2   0.0019  0.0024  0.9999     0.0018  0.002   0.9999
2      D4F2-db2-L2    0.0549  0.1271  0.9843     0.0773  0.1522  0.9788
3      D4F2-haar-L3   0.1496  0.1817  0.9692     0.1422  0.178   0.9742
3      D4F2-db2-L3    0.1835  0.2409  0.9424     0.1996  0.2553  0.9368
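In the Wavelet-GEP models, each input series is replaced by its decomposed sub-series. For a level-1 decomposition such as D4F2-haar-L1, each input splits into a detail sub-series d_1 and an approximation sub-series A_1. The sketch below shows one hypothetical way to build such inputs. The helper names, the column naming, the alignment of lags, and decomposing the full record in one pass are all assumptions, not the authors' documented procedure.

```python
def haar_level1_components(x):
    """Full-length level-1 Haar sub-series (d1, A1), with x[t] == d1[t] + A1[t].

    Each consecutive pair (x[2k], x[2k+1]) is split into its mean (A1) and
    +/- half-difference (d1). Because a pair mixes two months, this layout is
    illustrative rather than a strictly causal forecasting pipeline.
    """
    if len(x) % 2:
        raise ValueError("series length must be even")
    d1, a1 = [], []
    for i in range(0, len(x), 2):
        mean = (x[i] + x[i + 1]) / 2.0
        half_diff = (x[i] - x[i + 1]) / 2.0
        a1.extend((mean, mean))
        d1.extend((half_diff, -half_diff))
    return d1, a1


def wavelet_lagged_features(series_by_name, lags):
    """Hypothetical helper: lag the d1 and A1 sub-series of every input."""
    bands = {}
    for name, x in series_by_name.items():
        d1, a1 = haar_level1_components(x)
        bands[("d1", name)] = d1
        bands[("A1", name)] = a1
    max_lag = max(lags)
    n = min(len(x) for x in series_by_name.values())
    columns = [f"{band}({name}_t-{lag})" for (band, name) in bands for lag in lags]
    rows = [
        [bands[key][t - lag] for key in bands for lag in lags]
        for t in range(max_lag, n)
    ]
    return columns, rows


columns, rows = wavelet_lagged_features({"D": [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]}, [1])
print(columns, rows[0])  # prints ['d1(D_t-1)', 'A1(D_t-1)'] [-1.0, 5.0]
```

GEP then searches over these sub-series columns instead of the raw lagged inputs. This is why the evolved formulation of a Wavelet-GEP model is written in terms of $d_1$ and $A_1$ terms.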
Figure 2 illustrates how wavelet decomposition improves the performance of the GEP model that considers all previous time steps up to the optimum lag time of water demand. It compares D4F2 (GEP) with D4F2-haar-L1 (WGEP). The WGEP predictions lie almost on the same line as the actual water demand in the City of Kelowna. Equations (4) and (5) give the formulations of D4F2-haar-L1 and D4F2, respectively. They show that, during learning, GEP models retain only a selection of the input variables, based on the best candidate solutions.
Fig. 2. Comparison of GEP and WGEP models over the testing period
[Equations (4) and (5): explicit formulations of D4F2-haar-L1 ($D_{WGEP}$), expressed in terms of level-1 detail ($d_1$) and approximation ($A_1$) sub-series of the lagged inputs, and of D4F2 ($D_{GEP}$), expressed in terms of $T_{t-1}$, $D_{t-1}$ and $D_{t-3}$.]

4. Conclusion
Time series of explanatory variables such as temperature, precipitation, and water demand are widely used to feed water demand forecasting models. In their original time domain, some of these data fail to convey all the information embedded at their temporal resolution. This has led researchers to apply mathematical transforms to reveal the properties of such variables. GEP models have received little attention in the water demand forecasting literature. The prime objective of this research was therefore to evaluate the performance of these models coupled with wavelet decomposition. This study showed that wavelet decomposition can significantly improve the performance of GEP models, but only when all possible lag times are used among the input variables. The proposed superior model produced very high values of the performance indices (testing R² = 0.9999), which suggests it can outperform most conventional water demand forecasting techniques by a wide margin. These results mark gene expression programming as an emerging technique in water demand forecasting that merits further detailed investigation.
Acknowledgements
The authors received financial support from the Natural Sciences and Engineering Research Council (NSERC) of Canada. The Okanagan Basin Water Board and the City of Kelowna are thanked for providing water consumption data.
References
[1] L. Beck, T. Bernauer, How will combined changes in water demand and climate affect water availability in the Zambezi River Basin?, Global Environmental Change 21(3) (2011) 1061–1072. doi:10.1016/j.gloenvcha.2011.04.001
[2] A.A. Makki, R.A. Stewart, C.D. Beal, L. Panuwatwanich, Novel bottom-up urban water demand forecasting model: revealing the determinants, drivers and predictors of residential indoor end-use consumption, Resources, Conservation and Recycling 95 (2015) 15–37.
[3] L. Brekke, M. Larsen, M. Ausburn, L. Takaichi, Suburban water demand modeling using stepwise regression, Journal of the American Water Works Association 94(10) (2002) 65–75.
[4] A. Polebitski, R. Palmer, P. Waddell, Evaluating water demands under climate change and transitions in the urban environment, Journal of Water Resources Planning and Management 137(3) (2010) 249–257.
[5] S.J. Lee, E.A. Wentz, P. Gober, Space-time forecasting using soft geostatistics: a case study in forecasting municipal water demand for Phoenix, Arizona, Stochastic Environmental Research and Risk Assessment 24(2) (2010) 283–295.
[6] S.L. Zhou, T.A. McMahon, A. Walton, J. Lewis, Forecasting daily urban water demand: a case study of Melbourne, Journal of Hydrology 236(3) (2000) 153–164. doi:10.1016/S0022-1694(00)00287-0
[7] J. Alhumoud, Freshwater consumption in Kuwait: analysis and forecasting, Journal of Water Supply: Research and Technology, AQUA 57(4) (2008) 279–288. doi:10.2166/aqua
[8] L. Jentgen, H. Kiddler, R. Hill, S. Conrad, Energy management strategies use short-term water consumption forecasting to minimize cost of pumping operations, Journal of the American Water Works Association 99(6) (2007) 86–94.
[9] M. Ghiassi, D. Zimbra, H. Saidane, Urban water demand forecasting with a dynamic artificial neural network model, Journal of Water Resources Planning and Management 134(2) (2008) 138–146.
[10] A. Aly, N. Wanakule, Short-term forecasting for urban water consumption, Journal of Water Resources Planning and Management 130(5) (2004) 405–410. doi:10.1061/(ASCE)0733-9496(2004)130:5(405)
[11] X. Wang, Y. Sun, L. Song, C. Mei, An eco-environmental water demand based model for optimising water resources using hybrid genetic simulated annealing algorithms. II: Model application and results, Journal of Environmental Management 90(8) (2009) 2612–2619. doi:10.1016/j.jenvman.2009.02.009
[12] I.S. Msiza, F.V. Nelwamondo, T. Marwala, Water demand prediction using artificial neural networks and support vector regression, Journal of Computers 3(11) (2008) 1–8.
[13] M. Herrera, L. Torgo, J. Izquierdo, R. Pérez-García, Predictive models for forecasting hourly urban water demand, Journal of Hydrology 387(1) (2010) 141–150.
[14] J. Shiri, S. Kim, O. Kisi, Estimation of daily dew point temperature using genetic programming and neural networks approaches, Hydrology Research 45(2) (2014) 165–181.
[15] O. Kisi, J. Shiri, Precipitation forecasting using wavelet-genetic programming and wavelet-neuro-fuzzy conjunction models, Water Resources Management 25(13) (2011) 3135–3152.
[16] M. Shoaib, A. Shamseldin, B.W. Melville, M.M. Khan, Runoff forecasting using hybrid wavelet gene expression programming (WGEP) approach, Journal of Hydrology 527 (2015) 326–344.
[17] S. Karimi, J. Shiri, O. Kisi, A.A. Shiri, Short-term and long-term streamflow prediction by using wavelet-gene expression programming approach, ISH Journal of Hydraulic Engineering (2015) 1–15.
[18] C. Ferreira, Mutation, transposition, and recombination: an analysis of the evolutionary dynamics, in: JCIS (2002) 614–617.
[19] C. Ferreira, Gene Expression Programming: Mathematical Modeling by an Artificial Intelligence, Vol. 21, Springer, 2006.
[20] D. Gabor, Theory of communications. Part 1: The analysis of information, J. Institut. Electr. Engrs. 95(38) (1948) 429–441.
[21] A. Grossmann, J. Morlet, Decomposition of Hardy functions into square integrable wavelets of constant shape, SIAM J. Math. Anal. 15(4) (1984) 723–736.
[22] S. Shabani, P. Yousefi, J.F. Adamowski, Gh. Naser, Intelligent soft computing models in water demand forecasting, in: Water Stress, InTech, 2016.