1 Spatial Analysis of Extreme Rainfall Values Based on Support Vector Machines Optimized by Genetic Algorithms: The Case of Alfeios Basin, Greece
Paraskevas Tsangaratos1, Ioanna Ilia1, Ioannis Matiatos2
1 LABORATORY OF ENGINEERING GEOLOGY AND HYDROGEOLOGY, DEPARTMENT OF GEOLOGICAL STUDIES, SCHOOL OF MINING AND METALLURGICAL ENGINEERING, NATIONAL TECHNICAL UNIVERSITY OF ATHENS, ATHENS, GREECE
2 FACULTY OF GEOLOGY AND GEOENVIRONMENT, NATIONAL AND KAPODISTRIAN UNIVERSITY OF ATHENS, PANEPISTIMIOUPOLI, GREECE
1.1 Introduction
Rainfall intensity-duration-frequency (IDF) curves are among the most commonly used investigational tools in water resources engineering (Alemaw & Chaoka, 2016; Fadhel, Rico-Ramirez, & Han, 2017). IDF curves provide essential information for planning, designing, operating, and protecting water resource projects or engineering projects against floods (Minh Nhat, Tachikawa, & Takara, 2006). In general, an IDF curve is a mathematical relationship between the rainfall intensity i, the duration d, and the return period T. Using IDF curves, one can estimate the return period of an observed rainfall event or the rainfall amount corresponding to a given return period for different aggregation times (Koutsoyiannis, 2003; Koutsoyiannis, Kozonis, & Manetas, 1998). IDF curve estimation at gauged sites requires the analysis of precipitation extremes, which are reported as the annual maximum precipitation amounts measured in time intervals of a predefined duration. However, the low density and sparse distribution of rain gauge sites introduce uncertainty and reduce accuracy when the estimated extreme values are interpolated spatially to produce a spatial distribution map (El-Sayed, 2011; Liew, Raghavan, & Liong, 2014). Moreover, it has been stated that spatial variability of rainfall is
Spatial Modeling in GIS and R for Earth and Environmental Sciences. DOI: https://doi.org/10.1016/B978-0-12-815226-3.00001-6 © 2019 Elsevier Inc. All rights reserved.
sensitive to the location of the rain gauge from which rainfall data are collected (Bell et al., 2002; Looper & Vieux, 2011). It is also well known that, besides elevation, which is considered to be a strong determinant of climate, rainfall may be influenced by the geo-environmental settings of the surroundings, such as geographical location, slope, aspect or bearing of the steepest slope, exposure, wind direction, proximity to the sea or other water bodies, and proximity to the crest or ridge of a mountain range (Agnew & Palutikof, 2000; Al-Ahmadi & Al-Ahmadi, 2013; Alijani, 2008; Buytaert, Celleri, Willems, Bievre, & Wyseure, 2006; Daly et al., 1994; Ding et al., 2014; Eris & Agiralioglu, 2009; Wang et al., 2017; Yao, Yang, Mao, Zhao, & Xu, 2016). The common approaches to spatial interpolation are deterministic methods, geostatistical methods, and regression-based models (Begueria & Vicente-Serrano, 2006; Burrough & McDonnell, 1998; Ly, Charles, & Degre, 2013; Naoum & Tsanis, 2004; Vicente-Serrano, Lanjeri, & Lopez-Moreno, 2007). Several comparative studies concerning the interpolation of extreme rainfall values can be found in the literature. Weisse and Bois (2002) applied geostatistical methods and regression models to estimate extreme precipitation, concluding that geostatistical methods performed better only when the rain-gauging network was dense enough. Begueria and Vicente-Serrano (2006) mapped the hazard of extreme precipitation by linking the theory of extreme value analysis with spatial interpolation techniques. The authors applied geo-regression techniques, including location and other spatially independent parameters as predictors, and reported that they produced significant and well-fitted models. Similarly, Ly et al. (2011) developed different algorithms of spatial interpolation for daily rainfall and compared the outcomes of geostatistical and deterministic approaches.
The authors concluded that spatial interpolation with the geostatistical and inverse distance weighting (IDW) algorithms considerably outperformed interpolation with the Thiessen polygon method. Chen et al. (2017) analyzed and evaluated different methods of spatial rainfall interpolation at annual, daily, and hourly time scales. A regression-based scheme was developed utilizing principal component regression with residual correction (PCRR) and was compared with IDW and multilinear regression (MLR) interpolation methods. The authors report that PCRR showed the lowest streamflow error and the highest correlation with measured values at the daily time scale. Recently, more advanced techniques based on machine learning methods have been applied to spatial rainfall prediction and missing rainfall estimation, such as fuzzy and neuro-fuzzy logic, artificial neural networks (ANNs), support vector machines (SVM), and genetic algorithms (GAs) (Bryan & Adams, 2002; Chang, Lo, & Yu, 2005; Gilardi & Bengio, 2000; Kajornrit, Wong, & Fung, 2014; Kajornrit, Wong, & Fung, 2016; Kisi & Sanikhani, 2015; Paraskevas, Dimitrios, & Andreas, 2014). Chang et al. (2005) proposed a method that combined the inverse distance method and fuzzy theory in order to interpolate precipitation in an area in northern Taiwan. GA was used to determine the parameters of the fuzzy membership functions, which represent the relationship between the location without rainfall records and its surrounding rainfall gauges, with the objective of the optimization process being to minimize the estimated error of precipitation. The results confirmed that the method is flexible, performing much better than traditional methods,
particularly when the differences between the elevations of the rainfall gauge in question and the surrounding rainfall gauges are significant. Paraskevas et al. (2014) developed a multilayer feedforward back-propagation ANN in order to evaluate the spatial distribution of mean annual precipitation in Achaia County, Peloponnesus, Greece. The ANN used the latitude, longitude, and elevation of each observation station as input variables and the mean annual precipitation values as the target variable. The authors reported that the performance and error estimates for the interpolated precipitation data showed an acceptable level of accuracy, suggesting that ANN could be regarded as a spatial interpolation method that improves the accuracy of analysis. Following a different approach, Kajornrit et al. (2016) proposed a methodology to analyze monthly rainfall data in the northeast region of Thailand and to establish an interpretable fuzzy model used as a spatial interpolation method. The developed model, based on a dataset comprising the longitude, latitude, and amount of monthly rainfall, integrates various soft computing techniques including fuzzy systems, ANN, and GA. According to the authors, the results demonstrated that the established models could serve as an alternative technique to create rainfall maps, are capable of providing reasonable interpolation accuracy as well as satisfactory model interpretability, and overall could be useful in understanding the characteristics of the spatial data. In this context, to overcome the low density and sparse distribution of rain gauges, the present study uses ancillary variables together with an advanced spatial interpolation technique that could model more accurately the variations in precipitation over the area in question. The novelty of the present study is the use of an SVM optimized by GA as a spatial interpolation method.
Following the proposed methodology, the spatial distribution of daily extreme rainfall values was estimated and a continuous surface was produced based on the estimation of the optimized SVM-GA model. Eleven topographic indices were selected as independent variables, namely: longitude; latitude; elevation; minimum, maximum, and mean elevation within a 5-km radius around the rain gauge station; slope angle; slope aspect; mean slope angle and mean slope aspect within a 5-km radius around the rain gauge station; and distance from the coastline. The dependent variable was the daily extreme rainfall value for a 5-year return period. The 5-year return period was set as the appropriate return period for flood management applications based on the assumption that repeated extreme rainfall events within a short time period at a given location would alert competent authorities to design appropriate infrastructure to mitigate potential damage caused by those extreme rainfall events. According to Zahiri, Bamba, Famien, Koffi, and Ochou (2016), the same event occurring once every 5 years would inflict more damage, while an event with a 50-year return period is not of interest to many decision makers. The main advantage of SVM over conventional interpolation methods is that SVMs do not make any assumptions regarding the nature of the data and can handle nonlinear relations between the inputs and outputs (Kajornrit et al., 2014, 2016; Kong & Tong, 2008; Nourani et al., 2009). On the other hand, the usage of SVM requires tuning a set of parameters that mainly affect the ability of generalization (estimation of accuracy), such as the cost (C), epsilon (ε), and the kernel parameter gamma (γ), while ideal for the tuning process
are GA, due to their high global search ability (Chen et al., 2017). Several R packages (“e1071,” “GA,” “caret,” “corrplot,” and “raster”) were used in R (R Core Team, 2017), whereas a geographic information system (GIS) (ArcGIS 10.3.1) was utilized to process the spatial data. The Alfeios water basin, Peloponnesus, Greece, was chosen as the test site for implementing the developed methodology.
1.2 The Study Area
The Alfeios water basin occupies an area of approximately 3810 km² located in western Peloponnese, Greece. The basin is bounded to the north by the mountain range of Erymanthos, to the east by the mountains of Artemisiou, to the south by the mountain of Lykaion, and to the west by the Kyparissiakos Gulf (Fig. 1-1). The geomorphological relief of the basin is characterized as mountainous and abrupt in areas with elevation higher than 600 m (52.5% of the entire area), semimountainous and hilly in areas with elevation between 100 and 600 m (36.9% of the entire area), and flat in the coastal zone (10.5% of the entire area). The maximum observed elevation is 2253 m and the mean elevation is 648 m. The mean slope of the basin is approximately 14 degrees.
FIGURE 1-1 The study area.
Concerning the climatic conditions, the coastal and flat areas are characterized by a marine Mediterranean climate, whereas the inland presents a continental and mountainous type. In general, the area is characterized by mild winters and cool summers due to the impact of the sea. The temperature rarely falls below zero in winter and exceeds 40 °C during the summer only in the inland. The relative humidity varies between 67.5% and 70%, with December being the wettest month and July and August the driest. Rainfalls are abundant from October to March, and rain heights are more than twice as high as those in the eastern Peloponnese. The mean annual precipitation averages 1100 mm, whereas the mean annual temperature is 19 °C (MDDWPR, 1996).
1.3 Methodology and Data
The developed methodology could be separated into two phases: (1) the phase of processing data and estimating the correlation coefficients and variable importance of the variables in question and (2) the phase of constructing a continuous surface of extreme rainfall values. Fig. 1-2 illustrates a flowchart of the developed methodology, and details of each phase are presented in the following paragraphs.
FIGURE 1-2 Flowchart of the developed methodology.
In the first phase, the daily extreme rainfall value for a 5-year return period was estimated from the available intensity-duration-frequency (IDF) curves of 38 rain gauge stations located within the research area; this value served as the dependent variable (Table 1-1). The IDF data were obtained from the Ministry of the Environment and Energy (http://floods.ypeka.gr/).

Table 1-1 Daily Extreme Rainfall Values for a 5-Year Return Period

Name                   Longitude   Latitude   Daily Extreme Rainfall Value for a 5-Year Return Period (mm)
Amigdalia              330744      4179447    100.3
Ano Karies             323112      4144856    122.1
Ano Loussoi            336479      4207266    88.0
Araxamites             344471      4145095    88.7
Axladini               303703      4177167    109.2
Basilakio              302189      4168750    120.8
Vitina                 340055      4170528    97.5
Dafni(1)               347146      4136602    99.8
Dafni(2)               326083      4185713    95.3
Desino                 323165      4199962    97.0
Zatouna                325475      4162010    109.3
Zoni                   333349      4147469    95.4
Karatoula              339141      4147786    91.3
Karkalou               330947      4166888    94.0
Karitaina              326645      4150204    92.4
Kastellio              328420      4197095    106.7
Likouria(1)            342934      4192114    64.3
Likouria(2)            342603      4191582    87.3
Mallota                338998      4140389    98.6
Matesi                 316394      4155585    53.0
Neoxori Mantineias     328795      4134800    105.8
Pagrataiika Kalivia    336414      4187688    76.3
Panagitsa              343176      4181659    74.8
Paparis                346307      4136551    102.6
Perdikoneri            323027      4178096    101.8
Peukai                 295974      4171877    84.1
Piana(1)               344624      4159790    89.2
Piana(2)               344494      4159657    108.4
Planitero              338815      4199942    71.5
Potamia                335233      4129023    104.6
Pirgos                 272878      4172841    93.4
Poino                  348222      4160565    102.3
Strefio                284191      4170364    108.4
Tripotama              315242      4193875    87.3
Tropaia                320027      4177703    79.8
Tripith                304080      4160135    93.0
Tselepako              346648      4155044    94.9
Dam Ladona             321212      4180541    105.2
The following step involved formulating the conceptual model used to predict the spatial patterns of extreme rainfall values for events with return periods of 5 years. The independent variables of the model were the longitude, latitude, elevation, slope angle, and slope aspect of the rain gauges; the minimum, mean, and maximum elevation, mean slope angle, and mean slope aspect within a radius of 5 km around the stations; and finally the distance of each station from the coastline. All variables were derived from a DEM file (http://www.opendem.info/) with a grid size of 25 m × 25 m. Afterwards, each variable was normalized using a min-max normalization procedure, so that all variables received equal attention during the training process. The normalized values were limited between 0.1 and 0.9 (Wang & Huang, 2009). During this phase all variables were transformed to ASCII files within the ArcGIS platform using the Conversion ToolBox, and the “raster” package (Hijmans & van Etten, 2012) was used to transform the ASCII files into a format that could be further analyzed by R (Fig. 1-3). Within this phase, the correlation coefficients and variable importance were estimated by utilizing the “corrplot” (Wei & Simko, 2017) and “caret” (Kuhn, 2008) packages. Highlighting the most correlated variables and indicating the most important variables provides additional information about the developed conceptual model. Finally, during the data processing phase and based on spatial functions embodied in the Spatial Analyst Toolbox of ArcGIS 10.3.1, a grid value was extracted from each variable at the locations of the rain gauges, and the database was randomly separated into a training dataset for training the model and a validation dataset for evaluating the performance of the model. The training dataset included 70% of the total data, whereas the remaining 30% was included in the validation dataset.
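The normalization and the random 70/30 split described above can be sketched in a few lines. The chapter performed these steps with ArcGIS and R; the Python below is only an illustration, and the function names, the random seed, and the rounding of the training-set size are assumptions that the chapter does not specify.

```python
import random

def min_max_scale(values, lower=0.1, upper=0.9):
    """Linearly rescale values so that the minimum maps to `lower` and the maximum to `upper`."""
    v_min, v_max = min(values), max(values)
    if v_max == v_min:
        # Constant column: map every value to the midpoint of the target range
        return [(lower + upper) / 2.0 for _ in values]
    span = v_max - v_min
    return [lower + (upper - lower) * (v - v_min) / span for v in values]

def train_validation_split(rows, train_fraction=0.7, seed=42):
    """Shuffle a copy of `rows` and split it into training and validation parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seed is an assumption, used for repeatability
    n_train = round(train_fraction * len(shuffled))  # rounding rule is an assumption
    return shuffled[:n_train], shuffled[n_train:]

# First five rainfall values from Table 1-1 (mm), used purely as sample input
rain = [100.3, 122.1, 88.0, 88.7, 109.2]
scaled = min_max_scale(rain)

# Station indices 0..37 stand in for the 38 rain gauge records
train, valid = train_validation_split(range(38))
```

With 38 stations this rounding yields 27 training and 11 validation records; the chapter reports only the 70%/30% proportions, so the exact counts used by the authors may differ.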
The second phase involved optimizing SVM with GA in order to create a continuous surface that represented the spatial daily extreme rainfall distribution. Brief descriptions of the two algorithms are presented in the following paragraphs. SVMs are nonparametric kernel-based methods which are mainly used in solving linear and nonlinear classification and regression problems (Moguerza & Munoz, 2006; Vapnik, 1998). In order to separate data patterns, SVM fits an optimal linear hyperplane after kernel functions transform the original nonlinear data patterns into a format that is linearly separable in a high-dimensional feature space (Yan, Li, & Ma, 2008). SVM in R was first introduced in the “e1071” package (Dimitriadou, Hornik, Leisch, Meyer, & Weingessel, 2005). The svm() function, which is used to train the SVM, provides a rigid interface to libsvm along with visualization and parameter tuning methods (Karatzoglou, Meyer, & Hornik, 2006). libsvm is an easy-to-use implementation which includes linear, polynomial, RBF, and sigmoid kernels (Chang & Lin, 2001). The SVM for regression problems uses the same principles as the SVM for classification; however, the results are real numbers (Gunn, 2007; Vapnik, Golowich, & Smola, 1997). The main objective when applying SVM regression is to minimize the error between the predicted and actual values. The generalization performance and efficiency of SVM for regression are influenced by the kernel width γ (gamma), the ε (epsilon), and the regularization parameter C (cost) (Smola & Schölkopf, 1998); thus, optimizing these parameters is a necessary process. Although the
FIGURE 1-3 The normalized independent variables: (A) longitude, (B) latitude, (C) elevation, (D) slope angle, (E) slope aspect, (F) minimum elevation within a 5-km radius, (G) mean elevation within a 5-km radius, (H) maximum elevation within a 5-km radius, (I) mean slope angle within a 5-km radius, (J) mean slope aspect within a 5-km radius, (K) distance from coastline.
“e1071” provides a tuning process based on the grid-search method over supplied parameter ranges (Dimitriadou et al., 2005), the overall number of models evaluated in a grid search can become quite large, because it is the product of the number of candidate values supplied for each parameter.
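To make the roles of gamma, epsilon, and the grid size concrete, the sketch below writes out the RBF kernel in the form used by libsvm, exp(-gamma * ||x - z||^2), the epsilon-insensitive loss that SVM regression penalizes, and the number of candidate models in a full grid. The function names and all numeric inputs are illustrative assumptions and are not taken from the chapter.

```python
import math

def rbf_kernel(x, z, gamma):
    """RBF kernel exp(-gamma * ||x - z||^2) for two equal-length vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)

def epsilon_insensitive_loss(y_true, y_pred, epsilon):
    """Errors smaller than epsilon cost nothing; larger errors grow linearly."""
    return max(0.0, abs(y_true - y_pred) - epsilon)

def grid_size(*candidates_per_parameter):
    """Number of parameter combinations in a full grid (before any cross-validation)."""
    total = 1
    for n in candidates_per_parameter:
        total *= n
    return total

# Identical points have similarity 1; distant points have lower similarity
k_same = rbf_kernel([0.2, 0.5], [0.2, 0.5], gamma=0.06)
k_far = rbf_kernel([0.1, 0.1], [0.9, 0.9], gamma=0.06)

# A prediction inside the epsilon tube is not penalized, one outside is
inside_tube = epsilon_insensitive_loss(0.50, 0.55, epsilon=0.1)
outside_tube = epsilon_insensitive_loss(0.50, 0.80, epsilon=0.1)

# Hypothetical grid: 20 candidate values each for cost, epsilon, and gamma
n_models = grid_size(20, 20, 20)
```

Even this modest hypothetical grid contains 8000 combinations, each of which would require at least one SVM fit, which is the practical motivation for a guided search such as GA.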
GA is one of the most well-known evolutionary algorithms and has been widely used in optimization problems (Mitchell, 1996). Holland (1975) presented the GA as an abstraction of biological evolution, which involves bio-inspired operators such as selection, crossover, and mutation, and gave a theoretical framework for adaptation under the GA. The use of GA in optimization problems is based on the presence of a population of “chromosomes,” which are regarded as candidate solutions to the optimization problem and are assessed through a cost function, the fitness function. In our case, a solution is expressed by the three SVM parameters cost, epsilon, and gamma, with the real-valued parameters used directly to form a chromosome, unlike traditional binary GAs, in which solutions must be translated into binary codes. GA in R was introduced by Scrucca (2013), who describes the “GA” package as a collection of general-purpose functions that provide a flexible set of tools for applying a wide range of genetic algorithm methods. The main function in the package is ga(); its most significant arguments are the type of GA to be run, which depends on the nature of the decision variables (binary, real-valued, permutation), and the fitness function, which takes as input a potential solution and returns a numerical value describing its “fitness.” The final step of the second phase was validating the predictive performance of the optimized SVM model and comparing it with the performance of an MLR model. MLR is a statistical technique that consists of finding a linear relationship between a dependent (observed) variable and more than one independent variable (Wilks, 1995).
To validate the outcomes of the optimized SVM and MLR, three statistical metrics were calculated (Willmott et al., 1985): the root mean squared error (RMSE) (a quadratic scoring rule that measures the average magnitude of error), the coefficient of determination (R²) (a measure of how well the observed outcomes are replicated by the model), and the mean squared error (MSE) (the average of the squares of the errors, measuring the quality of the model).
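The three metrics can be written out directly, as in the sketch below. The chapter does not state which definition of R² was used; the form 1 - SS_res/SS_tot shown here is one common choice and is an assumption, as are the function names. The observed values are the first four entries of Table 1-1, while the predicted values are hypothetical numbers chosen only to exercise the functions.

```python
import math

def mse(observed, predicted):
    """Mean squared error: average of the squared differences."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)

def rmse(observed, predicted):
    """Root mean squared error: square root of the MSE, in the units of the data."""
    return math.sqrt(mse(observed, predicted))

def r_squared(observed, predicted):
    """Coefficient of determination as 1 - SS_res / SS_tot (assumed definition)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

observed = [100.3, 122.1, 88.0, 88.7]   # mm, first four stations of Table 1-1
predicted = [98.0, 115.0, 92.0, 90.0]   # hypothetical model output, not from the chapter

scores = {
    "MSE": mse(observed, predicted),
    "RMSE": rmse(observed, predicted),
    "R2": r_squared(observed, predicted),
}
```

Because RMSE is the square root of MSE, the two always rank models identically; reporting both mainly helps readers who prefer an error in the original units (mm).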
1.4 Results
During the first phase, the correlation coefficients and variable importance of the 11 variables were estimated. Fig. 1-4A illustrates the correlations with P-value < .01, which are considered significant. As expected, significant, high correlations appear between the elevation variable and the minimum (0.83), mean (0.91), and maximum elevation within a 5-km radius (0.76). High correlations also appear between longitude and the distance from the coastline (0.76), elevation (0.71), and the minimum (0.78), mean (0.77), and maximum elevation within a 5-km radius (0.76). The mean slope angle within a 5-km radius appears to be significantly correlated with the maximum elevation within a 5-km radius (0.82) and the mean elevation within a 5-km radius (0.74). Overall, the extreme rainfall values appear to be most correlated with the mean slope aspect within a 5-km radius (−0.48), the maximum elevation within a 5-km radius (−0.41), and the mean slope angle within a 5-km radius (−0.39). Concerning the process of ranking the importance of the variables used by the extreme rainfall value model, the results of the nnet algorithm showed that the most
FIGURE 1-4 (A) Correlation matrix. (B) Rank order of variable importance: v1, longitude; v2, latitude; v3, elevation; v4, slope angle; v5, slope aspect; v6, minimum elevation within a 5-km radius; v7, mean elevation within a 5-km radius; v8, maximum elevation within a 5-km radius; v9, mean slope angle within a 5-km radius; v10, mean slope aspect within a 5-km radius; v11, distance from coastline.
important variable was longitude (12.83), followed by distance from the coastline (11.32), mean slope aspect within a 5-km radius (10.72), and elevation (10.49) (Fig. 1-4B). During the second phase, the tuning of the SVM structural parameters cost, epsilon, and gamma was performed by utilizing the “e1071” and “GA” R packages. According to the developed methodology, the fitness function was the accuracy (RMSE value) achieved by the SVM. The search domain for each parameter was set as follows: 10⁻⁴ to 10 for the cost variable, 10⁻² to 2 for the epsilon variable, and 10⁻³ to 2 for the gamma variable, whereas the population size and the number of generations were set to 250 and 100, respectively. The crossover probability was set to 0.80 and the mutation probability to 0.10. The optimal values of cost, epsilon, and gamma were estimated to be 3.39, 0.67, and 0.06, respectively. Based on the optimal parameters, the extreme rainfall value map for the Alfeios basin was constructed (Fig. 1-5). High values were identified in the western areas along the coastline, in places reaching 132 mm/24 hours for a return period of 5 years. These areas are flat with low elevation. For the MLR model, the F-statistic value [F-statistic = 1.67 (P < .1)] indicated that the null hypothesis that the used variables have no effect on the extreme rainfall values could be rejected at the 10% level. The results showed that the variables elevation and slope aspect (with P = .077 and P = .090, respectively) had a significant effect on the daily extreme rainfall values at that level. Fig. 1-6 illustrates the spatial distribution of the daily extreme rainfall values of a 5-year return period based on the MLR method. A significantly different spatial pattern can be observed in comparison with the optimized SVM model.
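The GA-based tuning reported in this section can be illustrated with a minimal real-coded GA over the same search domain. This is a sketch under stated assumptions, not the authors' implementation: it does not reproduce the operators of the “GA” R package, the population size and number of generations are reduced from 250 and 100 to keep the run short, the selection, crossover, and mutation operators are simple textbook choices, and the fitness is a hypothetical stand-in for "train the SVM and return its RMSE". The sketch minimizes the error directly.

```python
import random

# Search domain reported in the chapter: (lower, upper) for each SVM parameter
BOUNDS = {"cost": (1e-4, 10.0), "epsilon": (1e-2, 2.0), "gamma": (1e-3, 2.0)}
NAMES = list(BOUNDS)

def stand_in_rmse(chrom):
    """Hypothetical surrogate for 'fit SVM with these parameters, return RMSE'."""
    target = (3.0, 0.5, 0.1)  # arbitrary optimum, for illustration only
    return sum((c - t) ** 2 for c, t in zip(chrom, target)) ** 0.5

def random_chromosome(rng):
    """One real-valued chromosome: [cost, epsilon, gamma] drawn within bounds."""
    return [rng.uniform(*BOUNDS[n]) for n in NAMES]

def tournament(pop, scores, rng, k=3):
    """Return the lowest-error individual among k randomly chosen ones."""
    best = min(rng.sample(range(len(pop)), k), key=lambda i: scores[i])
    return pop[best]

def crossover(a, b, rng):
    """Arithmetic crossover: a random convex combination, so bounds are kept."""
    w = rng.random()
    return [w * x + (1 - w) * y for x, y in zip(a, b)]

def mutate(chrom, rng):
    """Replace one randomly chosen gene with a fresh draw from its bounds."""
    j = rng.randrange(len(chrom))
    out = list(chrom)
    out[j] = rng.uniform(*BOUNDS[NAMES[j]])
    return out

def run_ga(fitness, pop_size=50, generations=40, p_cross=0.8, p_mut=0.1, seed=1):
    rng = random.Random(seed)
    pop = [random_chromosome(rng) for _ in range(pop_size)]
    for _ in range(generations):
        scores = [fitness(c) for c in pop]
        elite = pop[scores.index(min(scores))]  # elitism: best survives unchanged
        children = [elite]
        while len(children) < pop_size:
            child = list(tournament(pop, scores, rng))
            if rng.random() < p_cross:
                child = crossover(child, tournament(pop, scores, rng), rng)
            if rng.random() < p_mut:
                child = mutate(child, rng)
            children.append(child)
        pop = children
    scores = [fitness(c) for c in pop]
    best = pop[scores.index(min(scores))]
    return dict(zip(NAMES, best)), min(scores)

best_params, best_score = run_ga(stand_in_rmse)
```

In a real application the stand-in would be replaced by a function that trains an SVM on the training data with the candidate parameters and returns its error, which is by far the most expensive step of each generation.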
FIGURE 1-5 Spatial distribution of the daily extreme rainfall values for a 5-year return period, estimated by the optimized SVM model.
1.5 Performance Criteria
The comparison of the results obtained by the optimized SVM model with the results obtained by the MLR model indicates the superiority of the optimized SVM model (Table 1-2). On the training samples, which reflect the learning ability, the optimized SVM model achieved the higher R² value (0.74), indicating good performance and a greater ability to replicate the observed values than the MLR model. The RMSE value was estimated to be 7.83 and the MSE 61.38, both lower than the values of the MLR model (9.23 and 85.58, respectively). The same pattern of accuracy was detected when validating the performance on the validation dataset. The R² value for the optimized SVM-GA model was higher (0.63) than that obtained by the MLR model (0.31), whereas the RMSE value for the optimized SVM-GA model was estimated to be 6.35 and the MSE 40.31, also lower than those obtained by the MLR model (8.69 and 75.57, respectively). Overall, the quality of the outcomes produced by the SVM-GA model, expressed by the R² value, was higher than that achieved by the MLR model. The SVM-GA model fits the training and validation data well, since the differences between the observed values and the model’s predicted values are small and unbiased.
FIGURE 1-6 Spatial distribution of the daily extreme rainfall values for a 5-year return period, estimated by the MLR model.
Table 1-2 Results of Analysis

                           Training Dataset           Validation Dataset
Model                      MSE      RMSE    R2        MSE      RMSE    R2
Multilinear regression     85.58    9.23    0.52      75.57    8.69    0.31
Optimized SVM              61.38    7.83    0.74      40.31    6.35    0.63
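The error values in Table 1-2 can be sanity-checked with simple arithmetic. The first error column is read here as MSE, following the text of Section 1.5; if that reading is right, each value should be close to the square of the corresponding RMSE, up to rounding of the published figures. The helper name and the 1% tolerance below are assumptions made for this check.

```python
# (MSE, RMSE) pairs as reported in Section 1.5 and Table 1-2
reported = {
    ("MLR", "training"): (85.58, 9.23),
    ("MLR", "validation"): (75.57, 8.69),
    ("SVM-GA", "training"): (61.38, 7.83),
    ("SVM-GA", "validation"): (40.31, 6.35),
}

def relative_gap(mse_value, rmse_value):
    """Relative difference between RMSE squared and the reported MSE."""
    return abs(rmse_value ** 2 - mse_value) / mse_value

gaps = {key: relative_gap(*pair) for key, pair in reported.items()}

# True when every pair agrees to within 1%, which is compatible with rounding
consistent = all(gap < 0.01 for gap in gaps.values())
```

All four pairs agree to within 1%, which supports reading the column as a mean squared error rather than a mean absolute error; a mean absolute error could not exceed the RMSE of the same predictions.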
Concerning the time needed for data preparation and for completing the learning phase, both models need approximately the same time. However, the MLR model needs less time to produce a result, since it only requires assigning the calculated coefficients to each variable and estimating the continuous surface through a single-line algebraic expression. On the other hand, the optimized SVM-GA model requires more time to provide a result, since the prediction phase involves transforming the outcome of the SVM-GA model into raster format, a time-consuming process. In the present study, the SVM-GA model produced a result in less than 7 minutes, while the MLR model needed less than 2 minutes [using a desktop PC with an Intel(R) Core(TM) i5-4460 CPU 3.20 GHz processor and 8.0 GB RAM].
1.6 Discussion
Spatial interpolation of precipitation and extreme rainfall values is a highly challenging topic of research which concerns geostatisticians, meteorologists, climatologists, and natural hazards practitioners (Foresti, Pozdnoukhov, Tuia, & Kanevski, 2010). The identification of the spatial and temporal variability of extreme rainfall and the estimation of the probability that rainfall exceeds a given amount are of crucial importance in water resource management and especially in the mitigation and prevention of floods (Jung, Shin, Ahn, & Heo, 2017; Van Ootegem et al., 2016). This study provides a methodological approach which involves identifying the spatial patterns of extreme rainfall values for events with a 5-year return period by utilizing an SVM optimized by GA. SVM was selected on the basis that the available data are characterized by low spatial density and also showed a nonlinear interaction between precipitation and topography, making the usage of nongeostatistical, geostatistical, or spatial statistical methods less attractive, while GA was utilized to optimize the three parameters cost, epsilon, and gamma, due to its high global search ability (Chen et al., 2017). According to the correlation analysis, besides the correlation among the elevation variables (minimum, mean, and maximum elevation within a 5-km radius), maximum elevation, mean slope aspect, and mean slope angle within a 5-km radius appear to be correlated with the daily extreme rainfall values for a 5-year return period. Such a correlation could be attributed to the fact that these three topographic variables may influence the microclimate of the area in question, that is, the radiation, precipitation, and temperature values (Busing, White, & MacKende, 1992; Stage & Salas, 2007).
Johansson and Chen (2003), who investigated whether statistical relationships could be used to describe typical precipitation patterns related to topography and wind, reported that the single most important variable was the location of the rain gauge station with respect to a mountain range. This is in agreement with the outcomes of our study, which indicate maximum elevation within a 5-km radius around the rain gauge station as one of the most correlated variables. Similar findings have also been reported by Hill, Browning, and Bader (1981), who point out that the values of topographic variables within a radius of a few kilometers around the rain gauge station, such as mean elevation and mean slope, are more representative of precipitation amounts than the actual station’s elevation. Concerning the importance of the variables used by the extreme rainfall value model, the present study is in agreement with previous studies, which report the significant importance of elevation, slope, and distance from the coastline. According to Haiden and Pistotnik (2009), for long accumulation periods such as monthly, annual, or interannual, elevation is the main factor of the small-scale precipitation distribution. Similarly, Sanchez-Moreno, Mannaerts, and Jetten (2013), who performed a multivariate linear regression analysis of daily, monthly, and seasonal rainfall against elevation, slope gradient, aspect, and geographic east and west coordinates as predictors in Santiago Island, Cape Verde, reported that elevation explains most of the variance in the rainfall. Variables such as slope, aspect, or coordinates can explain more than 50% of the rainfall variance in cases where rainfall has a low
correlation with elevation. Griffiths and McSaveney (1983) have reported that the distance from a moisture source is a significant parameter that influences the amount of precipitation. Wilson (1997) reported that for a given latitude and elevation, the farther a site is from the coast, the larger the loss of marine moisture and, in general, the lower the average precipitation. Similarly, Zhu and Huang (2007) defined the maximum elevation within a 3-km radius of the site and the distance to Thousand-Islet Lake as important precipitation-influencing factors when estimating precipitation values. The present study highlights the higher predictive performance of the SVM optimized by GA compared with conventional MLR models and the construction of a map of daily extreme rainfall with a 5-year return period. It reveals the existence of a complex and nonlinear variation of the daily extreme rainfall pattern within the research area, which justifies the usage of SVM models. The outcomes are in agreement with previous reports that indicate the presence of a nonlinear relation between precipitation and elevation (Achite, Buttafuoco, Toubal, & Lucà, 2017). The generalization performance of SVM models depends on the metaparameters cost, epsilon, and the kernel parameters (Hannan, Wei, & Wenda, 2011; Wang, Yang, & Dai, 2009; Yan et al., 2008). In general, the kernel parameter gamma is related to the local variability of the data. Specifically, the more the data are locally variable, the smaller the kernel parameter should be. In our case, the optimal gamma was estimated to be 0.06, implying high variability among the training data. The precision parameter epsilon should not reach the difference between the highest and the smallest output values of the training set, in order to avoid misclassifying the data points as acceptable mistakes (Gilardi & Bengio, 2000).
The difference between the highest and smallest output values of the training set after the normalization process was estimated at less than 0.8, while the optimal epsilon value was found to be 0.67. Finally, the optimal cost parameter, which is related to the confidence we have in our data, was estimated as having a value of 3.38, suggesting a rather fair confidence (Jiang & Deng, 2014).
1.7 Conclusions
In the present study, a GA-optimized SVM model was used to estimate the spatial patterns of extreme rainfall values for events with a return period of 5 years in the Alfeios water basin, Peloponnesus, Greece. The R packages “e1071” and “GA” were used to optimize the cost, epsilon, and gamma parameters of the SVM model, whereas GIS was used to process the spatial data and to create a continuous surface representing the spatial distribution of daily extreme rainfall within the research area. As expected, elevation variables were highly correlated, whereas the extreme rainfall values appear to be significantly correlated with the mean slope aspect, the maximum elevation, and the mean slope angle within a 5-km radius around the rain gauge stations. Longitude was identified as the most important variable influencing the spatial distribution of extreme rainfall values, followed by the distance from the coastline, the mean slope aspect within a 5-km radius around the rain gauge stations, and elevation.
The outcomes of the study showed that GA is an effective optimization method and highlighted the significant advantage of the optimized SVM-GA model over the MLR model as a spatial interpolation tool. The accuracy of the optimized SVM-GA model can be explained by the fact that, in principle, SVM neither requires specifying the function to be modeled nor makes any underlying statistical assumptions about the data, as many geostatistical and regression-based models do. It can be concluded that SVM optimized by GA could be used as an alternative spatial interpolation method for estimating the spatial distribution of extreme rainfall values.
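The kind of parameter search carried out with a genetic algorithm can be sketched as follows. This is a minimal illustration in Python, not the R code used in the study: the genetic operators (binary tournament selection, arithmetic crossover, Gaussian mutation, elitism), the search bounds, and the function toy_cv_error are all assumptions introduced here. In practice the objective would be a cross-validated error of an SVM fitted with the candidate cost, epsilon, and gamma; a smooth stand-in is used so that the example runs without external libraries.

```python
import random


def genetic_search(fitness, bounds, pop_size=30, generations=40,
                   crossover_rate=0.8, mutation_rate=0.1, seed=0):
    """Maximize `fitness` over a box [(low, high), ...] with a simple real-valued GA."""
    rng = random.Random(seed)

    def clip(individual):
        # Keep every gene inside its search interval.
        return [min(hi, max(lo, g)) for g, (lo, hi) in zip(individual, bounds)]

    def tournament(scored):
        # Binary tournament: keep the fitter of two randomly drawn individuals.
        a, b = rng.sample(scored, 2)
        return a[1] if a[0] >= b[0] else b[1]

    population = [clip([rng.uniform(lo, hi) for lo, hi in bounds])
                  for _ in range(pop_size)]
    best, best_score = None, float("-inf")

    for gen in range(generations + 1):
        scored = [(fitness(ind), ind) for ind in population]
        top_score, top = max(scored, key=lambda s: s[0])
        if top_score > best_score:
            best, best_score = list(top), top_score
        if gen == generations:
            break
        children = [list(best)]  # elitism: the best individual always survives
        while len(children) < pop_size:
            p1, p2 = tournament(scored), tournament(scored)
            if rng.random() < crossover_rate:
                w = rng.random()  # arithmetic crossover: weighted average of parents
                child = [w * x + (1.0 - w) * y for x, y in zip(p1, p2)]
            else:
                child = list(p1)
            for i, (lo, hi) in enumerate(bounds):
                if rng.random() < mutation_rate:
                    child[i] += rng.gauss(0.0, 0.1 * (hi - lo))  # Gaussian mutation
            children.append(clip(child))
        population = children

    return best, best_score


# Hypothetical search box for (cost, epsilon, gamma); not the study's settings.
BOUNDS = [(0.1, 10.0), (0.01, 1.0), (0.001, 1.0)]


def toy_cv_error(params):
    """Stand-in for a cross-validated SVM error (purely illustrative)."""
    cost, epsilon, gamma = params
    return (cost - 2.0) ** 2 / 100.0 + (epsilon - 0.3) ** 2 + (gamma - 0.5) ** 2


# The search maximizes fitness, so the error is negated.
best_params, best_fitness = genetic_search(lambda p: -toy_cv_error(p), BOUNDS)
print("best (cost, epsilon, gamma):", [round(v, 3) for v in best_params])
print("stand-in error at best:", round(-best_fitness, 5))
```

Elitism guarantees that the best fitness never decreases from one generation to the next, which is convenient when each fitness evaluation is an expensive model fit.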
References
Achite, M., Buttafuoco, G., Toubal, K. A., & Lucà, F. (2017). Precipitation spatial variability and dry areas temporal stability for different elevation classes in the Macta basin (Algeria). Environmental Earth Sciences, 76, 458.
Agnew, D. M., & Palutikof, P. J. (2000). GIS-based construction of baseline climatologies for the Mediterranean using terrain variables. Climatic Research, 14, 115–127.
Al-Ahmadi, K., & Al-Ahmadi, S. (2013). Spatiotemporal variations in rainfall–topographic relationships in southwestern Saudi Arabia. Arabian Journal of Geoscience. Available from https://doi.org/10.1007/s12517-013-1009-z.
Alemaw, B. F., & Chaoka, R. T. (2016). Regionalization of rainfall intensity-duration-frequency (IDF) curves in Botswana. Journal of Water Resource and Protection, 8, 1128–1144.
Alijani, B. (2008). Effect of the Zagros Mountains on the spatial distribution of precipitation. Journal of Materials Science, 5, 218–231.
Begueria, S., & Vicente-Serrano, S. M. (2006). Mapping the hazard of extreme rainfall by peaks over threshold extreme value analysis and spatial regression techniques. Journal of Applied Meteorology and Climatology, 45, 108–124.
Bell, V. A., Moore, R. J., & Brown, V. (2002). Snowmelt forecasting for flood warning in upland Britain. In M. Lees & P. Walsh (Eds.), Flood forecasting: what does current research offer the practitioner? BHS Occasional Paper 12. London: British Hydrological Society.
Bryan, B. A., & Adams, J. M. (2002). Three-dimensional neuro interpolation of annual mean precipitation and temperature surfaces for China. Geographical Analysis, 34, 93–111.
Burrough, P. A., & McDonnell, R. A. (1998). Principles of geographical information systems. Oxford: Oxford University Press.
Busing, R. T., White, P. S., & MacKende, M. D. (1992). Gradient analysis of old spruce-fir forest of the Great Smokey Mountains circa 1935. Canadian Journal of Botany, 71, 951–958.
Buytaert, W., Celleri, R., Willems, P., Bievre, B. D., & Wyseure, G. (2006). Spatial and temporal rainfall variability in mountainous areas: A case study from the south Ecuadorian Andes. Journal of Hydrology, 329(3–4), 413–421.
Chang, C. C., & Lin, C. J. (2001). Libsvm: A library for support vector machines. <http://www.csie.ntu.edu.tw/~cjlin/libsvm>.
Chang, C. L., Lo, S. L., & Yu, S. L. (2005). Applying fuzzy theory and genetic algorithm to interpolate precipitation. Journal of Hydrology, 314(1–4), 92–104.
Chen, T., Ren, L., Yuan, F., Yang, X., Jiang, S., Tang, T., . . . Zhang, L. (2017). Comparison of spatial interpolation schemes for rainfall data and application in hydrological modeling. Water, 9, 342. Available from https://doi.org/10.3390/w9050342.
Daly, C., Neilson, R., & Phillips, D. (1994). A statistical-topographic model for mapping climatological precipitation over mountainous terrain. Journal of Applied Meteorology, 33(2), 140–158.
Dimitriadou, E., Hornik, K., Leisch, F., Meyer, D., & Weingessel, A. (2005). e1071: Misc functions of the Department of Statistics (e1071), TU Wien, Version 1.5-11. <http://CRAN.R-project.org/>.
Ding, H., Greatbatch, R. J., Park, W., Latif, M., Semenov, V. A., & Sun, X. (2014). The variability of the East Asian summer monsoon and its relationship to ENSO in a partially coupled climate model. Climate Dynamics, 42, 367–379.
El-Sayed, E. A. H. (2011). Generation of rainfall intensity-duration-frequency curves for ungauged sites. Nile Basin Water Science and Engineering Journal, 4(1), 112–124.
Eris, E., & Agiralioglu, N. (2009). Effect of coastline configuration on precipitation distribution in coastal zones. Hydrology Process, 23, 3610–3618.
Fadhel, S., Rico-Ramirez, M., & Han, D. (2017). Uncertainty of Intensity-Duration-Frequency (IDF) curves due to varied climate baseline periods. Journal of Hydrology, 547, 600–612.
Foresti, L., Pozdnoukhov, A., Tuia, D., & Kanevski, M. (2010). Extreme precipitation modelling using geostatistics and machine learning algorithms. In P. M. Atkinson & C. D. Lloyd (Eds.), geoENV VII: Geostatistics for environmental applications (pp. 41–52). Netherlands: Springer.
Gilardi, N., & Bengio, S. (2000). Local machine learning models for spatial data analysis. Journal of Geographic Information and Decision Analysis, 4(1), 11–28.
Griffiths, G. A., & McSaveney, M. J. (1983). Distribution of mean annual precipitation across some steepland regions of New Zealand. New Zealand Journal of Science, 26, 197–209.
Gunn, S. (2007). Support vector machines for classification and regression. ISIS technical report. Southampton: University of Southampton, 1998.
Haiden, T., & Pistotnik, G. (2009). Intensity-dependent parameterization of elevation effects in precipitation analysis. Advanced Geoscience, 20, 33–38.
Hannan, M. A., Wei, T. C., & Wenda, A. (2011). Rule-based expert system for PQ disturbances classification using S-Transform and support vector machines. International Review on Modelling and Simulations, 4(6), 3004–3011.
Hijmans, R., & van Etten, J. (2012). Raster: Geographic analysis and modeling with raster data. R package version 2.0-12. <http://CRAN.R-project.org/package=raster>.
Hill, F. F., Browning, K. A., & Bader, M. J. (1981). Radar and raingauge observations of orographic rain over South Wales. Quarterly Journal of the Royal Meteorological Society, 107, 643–670.
Holland, J. H. (1975). Adaptation in natural and artificial systems: An introductory analysis with applications to biology, control, and artificial intelligence. Ann Arbor: University of Michigan Press.
http://floods.ypeka.gr/.
http://www.opendem.info/.
Jiang, L., & Deng, M. (2014). Support vector machine based mobile robot motion control and obstacle avoidance. Robotics: Concepts, Methodologies, Tools, and Applications, by Information Resources Management Association, 1, 85–111. Available from https://doi.org/10.4018/978-1-4666-4607-0.ch006.
Johansson, B., & Chen, D. (2003). The influence of wind and topography on precipitation distribution in Sweden: Statistical analysis and modelling. International Journal of Climatology, 23, 1523–1535.
Jung, Y., Shin, J.-Y., Ahn, H., & Heo, J.-H. (2017). The spatial and temporal structure of extreme rainfall trends in South Korea. Water, 9, 809. Available from https://doi.org/10.3390/w9100809.
Kajornrit, J., Wong, K. W., & Fung, C. C. (2014). A modular spatial interpolation technique for monthly rainfall prediction in the northeast region of Thailand. In S. Boonkrong et al. (Eds.), Recent advances in information and communication technology, advances in intelligent systems and computing 265. Available from https://doi.org/10.1007/978-3-319-06538-0_6.
Kajornrit, J., Wong, K. W., & Fung, C. C. (2016). An interpretable fuzzy monthly rainfall spatial interpolation system for the construction of aerial rainfall maps. Soft Computing, 20, 4631–4643. Available from https://doi.org/10.1007/s00500-014-1456-9.
Karatzoglou, A., Meyer, D., & Hornik, K. (2006). Support vector machines in R. Journal of Statistical Software, 15(9), 1–28.
Kisi, O., & Sanikhani, H. (2015). Prediction of long-term monthly precipitation using several soft computing methods without climatic data. International Journal of Climatology. Available from https://doi.org/10.1002/joc.4273.
Kong, Y. F., & Tong, W. W. (2008). Spatial exploration and interpolation of the surface precipitation data. Geographic Research, 27(5), 1097–1108.
Koutsoyiannis, D. (2003). On the appropriateness of the Gumbel distribution for modelling extreme rainfall. Proceedings of the ESF LESC Exploratory Workshop, Hydrological Risk: Recent advances in peak river flow modelling, prediction and real-time forecasting, assessment of the impacts of land-use and climate changes. Bologna: European Science Foundation, National Research Council of Italy, University of Bologna, October 2003.
Koutsoyiannis, D., Kozonis, D., & Manetas, A. (1998). A mathematical framework for studying rainfall intensity-duration-frequency relationships. Journal of Hydrology, 206, 118–135.
Kuhn, M. (2008). Building predictive models in R using the caret package. Journal of Statistical Software, 28(5), 1–26.
Liew, S. C., Raghavan, S. V., & Liong, S.-Y. (2014). Development of Intensity-Duration-Frequency curves at ungauged sites: Risk management under changing climate. Geoscience Letters, 1, 8.
Looper, J. P., & Vieux, B. E. (2011). An assessment of distributed flash flood forecasting accuracy using radar and rain gauge input for a physics-based distributed hydrologic model. Journal of Hydrology, 412–413, 114–132.
Ly, S., Charles, C., & Degre, A. (2011). Geostatistical interpolation of daily rainfall at catchment scale: The use of several variogram models in the Ourthe and Ambleve catchments, Belgium. Hydrology and Earth System Sciences, 15, 2259–2274.
Ly, S., Charles, C., & Degre, A. (2013). Different methods for spatial interpolation of rainfall data for operational hydrology and hydrological modeling at watershed scale. A review. Biotechnology, Agronomy, Society and Environment, 17(2), 392–406.
MDDWPR (Ministry of Development, Directorate of Water and Physical Resources). (1996). A plan of project management of country water resources. Athens: Ministry of Development, Directorate of Water and Physical Resources. (in Greek).
Minh Nhat, L., Tachikawa, Y., & Takara, K. (2006). Establishment of intensity-duration-frequency curves for precipitation in the monsoon area of Vietnam. Annuals of Disaster Prevention Research Institute, Kyoto University, 49B, 93–103.
Mitchell, M. (1996). An introduction to genetic algorithms. Cambridge, MA: MIT Press. ISBN 9780585030944.
Moguerza, M. J., & Muñoz, A. (2006). Support Vector Machines with Applications. Statistical Science, 21(3), 322–336.
Naoum, S., & Tsanis, I. K. (2004). A multiple linear regression GIS module using spatial variables to model orographic rainfall. Journal of Hydroinformatics, 6, 39–56.
Nourani, V., Alami, M. T., & Aminfar, M. H. (2009). A combined neural-wavelet model for prediction of Ligvanchai watershed precipitation. Engineering Applications of Artificial Intelligence, 22(3), 466–472.
Paraskevas, T., Dimitrios, R., & Andreas, B. (2014). Use of artificial neural network for spatial rainfall analysis. Journal of Earth Systems and Science, 123(3), 457–465.
R Core Team. (2017). R: A Language and Environment for Statistical Computing. Vienna: R Foundation for Statistical Computing. Available from https://www.R-project.org.
Sanchez-Moreno, J. F., Mannaerts, C. M., & Jetten, V. (2013). Influence of topography on rainfall variability in Santiago Island, Cape Verde. International Journal of Climatology, 34, 1081–1097.
Scrucca, L. (2013). GA: a package for genetic algorithms in R. Journal of Statistical Software, 53(4), 1–37.
Smola, A. J., & Schölkopf, B. (1998). A tutorial on support vector regression. NeuroCOLT Technical Report NC-TR-98-030, Royal Holloway College, University of London, UK, 1998.
Stage, A. R., & Salas, C. (2007). Interactions of elevation, aspect, and slope in models of forest species composition and productivity. Forest Science, 53(4), 486–492.
Van Ootegem, L., Van Herck, K., Creten, T., Verhofstadt, E., Foresti, L., Goudenhoofdt, E., . . . Willems, P. (2016). Exploring the potential of multivariate depth-damage and rainfall-damage models. Journal of Flood Risk Management. Available from https://doi.org/10.1111/jfr3.12284.
Vapnik, V. (1998). Statistical learning theory. Berlin: Springer.
Vapnik, V., Golowich, S., & Smola, A. (1997). Support vector method for function approximation, regression estimation and signal processing. In M. Mozer, M. Jordan, & T. Petsche (Eds.), Advances in neural information processing systems 9 (pp. 281–287). Cambridge, MA: MIT Press.
Vicente-Serrano, S. M., Lanjeri, S., & Lopez-Moreno, J. I. (2007). Comparison of different procedures to map reference evapotranspiration using geographical information systems and regression-based techniques. International Journal of Climatology, 27(8), 1103–1118.
Wang, C. M., & Huang, Y. F. (2009). Evolutionary-based feature selection approaches with new criteria for data mining: a case study of credit approval data. Expert Systems With Applications, 36(3), 5900–5908.
Wang, K., Yang, S., & Dai, T. (2009). A approach optimize support vector machine parameters using genetic algorithm. Computer Applications and Software, 26(7), 109–111.
Wang, L., Chen, R., Song, Y., Yang, Y., Liu, J., Han, C., & Liu, Z. (2017). Precipitation–altitude relationships on different timescales and at different precipitation magnitudes in the Qilian Mountains. Available from https://doi.org/10.1007/s00704-017-2316-1.
Wei, T., & Simko, V. (2017). R package “corrplot”: Visualization of a Correlation Matrix (Version 0.84). <https://github.com/taiyun/corrplot>.
Weisse, A. K., & Bois, Ph. (2002). A comparison of methods for mapping statistical characteristics of heavy rainfall in the French Alps: the use of daily information. Hydrological Sciences Journal, 47(5), 739–752.
Wilks, D. S. (1995). Statistical methods in the atmospheric sciences: An Introduction. San Diego, CA: Academic Press.
Willmott, C. J., Ackleson, S. G., Davis, R. E., Feddema, J. J., Klink, K. M., Legates, D. R., . . . Rowe, C. M. (1985). Statistics for the evaluation of model performance. Journal of Geophysical Research, 90(C5), 8995–9005.
Wilson, R. C. (1997). Broad-scale climatic influences on rainfall thresholds for debris flows: Adapting thresholds for northern California to southern California. In R. A. Larson & J. E. Slosson (Eds.), Storm-induced geologic hazards. Available from https://doi.org/10.1130/REG11-p71.
Yan, G., Li, C., & Ma, G. (2008). SVM parameter selection based on hybrid genetic algorithm. Harbin University, 40(5), 688–691.
Yao, J., Yang, Q., Mao, W., Zhao, Y., & Xu, X. (2016). Precipitation trend–elevation relationship in arid regions of the China. Global Planet Change, 143, 1–9.
Zahiri, E.-P., Bamba, I., Famien, A.-M., Koffi, A.-K., & Ochou, A.-D. (2016). Mesoscale extreme rainfall events in West Africa: The cases of Niamey (Niger) and the Upper Ouémé Valley (Benin). Weather and Climate Extremes, 13, 15–34.
Zhu, L., & Huang, J.-F. (2007). Comparison of spatial interpolation method for precipitation of mountain areas in county scale. Transactions of the CSAE, 23(7), 80–85.