Random forest regression evaluation model of regional flood disaster resilience based on the whale optimization algorithm


Journal Pre-proof

Dong Liu, Zhongrui Fan, Qiang Fu, Mo Li, Muhammad Abrar Faiz, Shoaib Ali, Tianxiao Li, Liangliang Zhang, Muhammad Imran Khan

PII: S0959-6526(19)34338-0
DOI: https://doi.org/10.1016/j.jclepro.2019.119468
Reference: JCLP 119468
To appear in: Journal of Cleaner Production
Received Date: 4 August 2019
Revised Date: 4 November 2019
Accepted Date: 25 November 2019

Please cite this article as: Liu D, Fan Z, Fu Q, Li M, Faiz MA, Ali S, Li T, Zhang L, Khan MI, Random forest regression evaluation model of regional flood disaster resilience based on the whale optimization algorithm, Journal of Cleaner Production (2019), doi: https://doi.org/10.1016/j.jclepro.2019.119468. This is a PDF file of an article that has undergone enhancements after acceptance, such as the addition of a cover page and metadata, and formatting for readability, but it is not yet the definitive version of record. This version will undergo additional copyediting, typesetting and review before it is published in its final form, but we are providing this version to give early visibility of the article. Please note that, during the production process, errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain. © 2019 Published by Elsevier Ltd.


Dong Liu2#, Zhongrui Fan1#, Qiang Fu2*, Mo Li1*, Muhammad Abrar Faiz1, Shoaib Ali1, Tianxiao Li1, Liangliang Zhang1, Muhammad Imran Khan3

1 School of Water Conservancy & Civil Engineering, Northeast Agricultural University, Harbin, Heilongjiang 150030, China
2 School of Water Conservancy & Civil Engineering, Northeast Agricultural University, Harbin, Heilongjiang 150030, China; Key Laboratory of Effective Utilization of Agricultural Water Resources of Ministry of Agriculture, Northeast Agricultural University, Harbin, Heilongjiang 150030, China; Heilongjiang Provincial Key Laboratory of Water Resources and Water Conservancy Engineering in Cold Region, Northeast Agricultural University, Harbin, Heilongjiang 150030, China; Key Laboratory of Water-Saving Agriculture of Ordinary University in Heilongjiang Province, Northeast Agricultural University, Harbin, Heilongjiang 150030, China
3 Department of Irrigation and Drainage, University of Agriculture, Faisalabad, Pakistan

# Co-first authors
* Co-corresponding authors


Abstract: This study proposes a flood disaster resilience evaluation model based on an improved random forest model to address the fuzziness problem in resilience evaluations. The model uses the whale optimization algorithm (WOA) to determine the key parameters of the traditional random forest regression (RFR) model and combines it with the evaluation index set constructed with the Driving forces-Pressure-State-Impact-Response (DPSIR) model to output the resilience index of the study area. This approach has advantages in handling the spatiotemporal distribution of disaster resilience and can be used to analyze the temporal and spatial variability of resilience in the study area and its key driving factors. Taking the Jiansanjiang Administration of Heilongjiang Province, China, as an example, the model was used to analyze the flood disaster resilience of the 15 farms under its jurisdiction from 2002 to 2016. The results showed that the level of resilience to flood disasters in the Jiansanjiang Administration generally increased, at a growth rate of 4.175/10a. In addition, the level of flood resilience differed spatially, with a high level of resilience in the southwest and a low level in the northeast. The degree of differentiation between farms increased between 2006 and 2011 and decreased between 2012 and 2016. The study also found that economic and population indicators have a greater impact on the assessment results. Compared with the random forest regression model optimized by particle swarm optimization (PSO-RFR) and the RFR model, the WOA-RFR model has clear advantages in fitting accuracy and generalization performance. The rationality coefficient and stability coefficient of the WOA-RFR model are 0.964 and 0.976, respectively, both of which are at a high level. The proposed WOA-RFR model can be used to perform regional disaster resilience evaluation, provide stable technical support and establish a scientific basis for regional disaster prevention and mitigation to ensure regional production safety and sustainable development.

Keywords: Whale optimization algorithm; Random forest regression; Flood disaster; Resilience evaluation; Sustainable development


1 Introduction

As a type of natural disaster that has a considerable negative impact on regional development, flood disasters are characterized by sudden onset, high frequency of occurrence, uncontrollable factors, and serious damage (Miceli et al., 2008). According to a survey conducted by the International Federation of Red Cross and Red Crescent Societies, there were 1,719 floods worldwide from 2006 to 2015, accounting for 45.79% of the total number of natural disasters, resulting in economic losses of approximately 23.8 billion yuan and affecting at least 800 million people (Chau et al., 2017; Sanderson et al., 2017). Floods have seriously damaged the human production and living environment and hindered social development (Cheng et al., 2002; Fotovatikhah et al., 2018). With sea level rise, the frequent occurrence of extreme rainfall events and the intensification of climate change, the frequency and severity of flood disasters will increase further. Determining how to prevent, reduce and mitigate disasters, ensure the sustainable development of disaster-stricken areas, and promote the harmonious coexistence of humans and nature has become an inevitable task for human development (Aerts et al., 2018; Wan et al., 2014).

The term “resilience” originated from the Latin “resilio”, which means to bounce back or rebound (Alexander et al., 2013). Holling et al. (1973) first introduced the concept of resilience into the field of ecology. In systematic ecology, resilience is used to express the ability of an ecosystem to recover from external stress. Resilience has also been applied to the field of disaster science and has been increasingly used by disaster science researchers as a measure of the nature of disaster systems. Many researchers have tried to define the concept of disaster resilience. Perring et al. (2003) defined disaster resilience as the extent to which affected people respond to potential disasters and prepare for rescue. Adger et al. (2000) defined disaster resilience as the interference or impact of an external stress on infrastructure. Liu et al. (2006) defined disaster resilience as the ability of a system to adjust, adapt, recover and rebuild after a disaster. This paper summarizes disaster resilience in three parts, namely, the ability of the affected body to prepare for disasters before they occur, the ability to adapt to disasters when they occur, and the ability to recover and rebuild after a disaster occurs. The theory of disaster resilience overcomes the shortcomings of traditional disaster prevention theory and has provided new ideas for disaster relief. This approach helps to coordinate the relationship between people and the natural environment and ensure the sustainability of regional development.

With the continuous development of resilience theory, research on resilience has evolved from conceptual studies to more in-depth quantitative studies. Because resilience is complex and abstract, it is difficult to calculate and express intuitively. In the current literature, research on the quantitative expression of resilience is still at an exploratory stage. Some researchers have used qualitative measurement methods, such as questionnaires and field trips, to gain insights into the characteristics of resilience and prepare for its quantification (Heng et al., 2018). Adger et al. (2005) studied the impact of coastal disasters on local socioecological resilience and found that disaster resilience in disaster-affected areas can be enhanced with external interference. In the context of regional disaster response, they also found that multilevel management can improve the ability to cope with uncertain disasters. Lai et al. (2015) studied the factors that influence earthquake resilience in Nepal in 2015 through field trips and policy analysis and found that community connections and organized reconstruction activities are important drivers of regional resilience.


Although qualitative methods can be used to analyze the characteristics of resilience and its influencing factors, they are strongly subjective, their results are rather abstract, and their conclusions are difficult to generalize and apply. Quantitative measurement methods use data mining and statistical analysis to establish evaluation index systems and construct quantitative assessment models of regional disaster recovery (Cimellaro et al., 2010). The results are often more intuitive and easier to analyze. Ge et al. (2011) constructed an evaluation index system with four dimensions (natural, social, economic, and technical) and used principal component analysis (PCA) to evaluate the trend of flood resilience in Changsha, China, from 1980 to 2006. The study found that urban and rural consumption and income levels are key factors that affect flood resilience in the region. Rose et al. (2007) established the adaptive regional input-output (ARIO) model to analyze the changes in post-disaster resilience from an economic perspective. The results showed that there is an inverse relationship between indirect disaster loss and disaster relief. This study provides a good reference for the quantitative expression of resilience. Kotzee et al. (2016) measured and analyzed the spatial distribution characteristics of flood resilience in three municipalities in South Africa by screening 24 resilience indicators and combining PCA with GIS techniques. Liu et al. (2018) used the technique for order preference by similarity to an ideal solution (TOPSIS) to evaluate flood disaster resilience in Hunan Province, China. The research indicated that the resilience level of the northeastern region of Hunan Province is better than that of the southwestern region.

The above methods have achieved certain results in resilience evaluation, but they also have some shortcomings.
PCA may generate ambiguous results regarding the interpretation of principal components. When the factor loadings have both positive and negative signs, the meaning of the comprehensive evaluation function is not clear. The ARIO model focuses on the effects of economic indicators on resilience and ignores the impact of the natural environment. Thus, its results are neither objective nor comprehensive. TOPSIS is strongly subjective in determining the weights of indicators in the evaluation process (Olson et al., 2004). Therefore, when selecting a method for quantifying resilience, the above problems should be avoided and a model with good comprehensive performance should be selected.

The random forest regression (RFR) model is a statistical learning method proposed by Breiman et al. (2001). Because of its excellent performance, it has been widely used in the field of data mining by scholars around the world. The main principle of RFR is to use the bootstrap resampling method to extract several samples from the original data, construct a decision tree for each bootstrap sample, and combine the predictions of all decision trees to obtain the final result, by voting in classification and by averaging in regression. The basic concept of RFR is to combine multiple weak decision trees into one strong model and improve the accuracy of the overall decision through the complementarity among the weak decision trees. Compared with the artificial neural network (ANN), the RFR model has high stability and is not prone to overfitting (Liu et al., 2007; Wang et al., 2015). Compared with particle swarm optimization (PSO), the RFR model converges quickly, has few adjustment parameters and is easy to operate (Wang et al., 2013). Compared with the support vector machine (SVM), the RFR model has higher fitting accuracy and excellent tolerance to noise and outliers in the data (Martens et al., 2007; Yaseen et al., 2019). In addition, the RFR model has strong data mining capabilities and high prediction accuracy and is good at processing multi-feature data. After model training is completed, the indicators that are important to the fitting result can be analyzed, which is beneficial for follow-up research (Chen et al., 2012; Tesfamariam et al., 2010).

In the forest generation process, the number of decision trees N and the candidate split attribute value M are two parameters that notably influence the performance of the model. According to the law of large numbers, as N increases, the generalization error converges, which can effectively avoid overfitting, but increasing N beyond a certain threshold does not further increase the accuracy of the model (Rodriguez-Galiano et al., 2012). An excessive N value also decreases the operating efficiency of the RFR. In the process of generating a decision tree, the M value determines the strength of each decision tree and the correlation among the decision trees. The stronger each decision tree, the higher the accuracy of the model; the greater the correlation between decision trees, the lower the accuracy of the model. When M decreases, both the strength of the trees and the correlation among them decrease; when M increases, both increase. Therefore, finding appropriate values of N and M within an acceptable range of operational efficiency is the key to determining the accuracy of the model. Historically, setting parameters manually and comparing and screening them constituted a huge workload, and the results did not guarantee the efficient and stable operation of the model during the fitting process. Optimization algorithms can be used to improve work efficiency and ensure that the resilience assessment is efficient and stable.

The whale optimization algorithm (WOA) is a new metaheuristic intelligent optimization algorithm that was proposed by Mirjalili et al. (2016).
The algorithm mimics the “spiral bubble net” hunting strategy of humpback whales and captures the prey through shrinking encirclement, spiral position updating and a random search mechanism to obtain the optimal solution of the optimization problem. The advantages of this approach include simple operation, few parameters to adjust, rapid convergence and strong global optimization ability (Khalil et al., 2019; Aziz et al., 2017; Zhou et al.). In addition, the search strategy of the WOA has advantages over other algorithms. Compared with PSO, the WOA only stores global optimal values during iterations, which reduces storage requirements. Compared with the genetic algorithm (GA), the WOA has a variety of optimization paths and its optimization strategy is more comprehensive (Zheng et al., 2019). Medani et al. (2018) applied the WOA to the optimal reactive power dispatching problem in a power system. The results showed that the WOA performed better than PSO and PSO with time-varying acceleration coefficients (PSO-TVAC). Mostafa et al. (2017) applied the WOA to multilevel threshold image segmentation. The research showed that the WOA performs better than social spider optimization (SSO), the firefly algorithm (FA), the sine-cosine algorithm (SCA), and harmony search optimization (HSO). Zhao et al. (2017) applied the WOA to improve a least squares support vector machine (LSSVM) model and predict energy-related CO2 emissions. The research showed that the WOA-LSSVM model is superior in prediction accuracy to the fruit fly optimization algorithm (FOA)-LSSVM model and the LSSVM model. Simhadri et al. (2019) applied the WOA to the parameter optimization problem of a two-degree-of-freedom state feedback controller. Compared with the results of the GA and chaos particle swarm optimization (CPSO), the parameters obtained by the WOA are more reasonable.

Therefore, this paper proposes an improved RFR model that optimizes the split attribute value (M) and the number of decision trees (N) in the RFR model by the WOA, and the model is applied to evaluate regional flood disaster resilience. The objectives of this paper are as follows:
(1) Construct a regional resilience RFR evaluation model based on the WOA;
(2) Analyze the main driving factors that influence flood disaster resilience in the study region and propose improvements;
(3) Analyze the temporal and spatial variation characteristics and possible causes of regional flood resilience;
(4) Evaluate the performance of the WOA-RFR model.


2 Materials and Methodology


2.1 Study Area

The Heilongjiang Provincial Agricultural Reclamation Jiansanjiang Administration in China is located in northeastern Heilongjiang Province, on the northern part of the Jiansanjiang Plain. This area lies in the inter-river zone of the Heilongjiang, Songhua and Wusuli Rivers. The geographical coordinates of the region are 132°31’-134°32’ east longitude and 46°49’-48°12’ north latitude. The area is flat and has fertile black soil resources and abundant water resources. The Honghe Wetland Protection Area of the China National Wetland Reserve is located to the north of the Jiansanjiang Administration and has very high hydrological research value. The Jiansanjiang Administration mainly uses modern mechanized agricultural production, with paddy fields as the main planting type, and is known as “the Green Rice Capital of China” (Liu et al., 2017). The total area of the jurisdiction is 12,400 km², and the total population is 230,000 people. The region includes 15 state-owned farms, whose distribution is shown in Figure 1.

Since 1998, there have been many large-scale floods in the Heilongjiang area, and these events seriously endangered the lives of local people and hindered the development of society. In 1998, the Songhua River suffered the largest flood disaster in 150 years (Weng et al., 2007). In 2013, the Heilongjiang River Basin experienced the largest storm-type flood ever recorded in the basin. A severe flood disaster also occurred in the Songhua River Basin, second in magnitude only to the 1998 flood disaster. The events in the province affected more than 2 million people, and the floods caused direct economic losses of more than 8 billion yuan (Wang et al., 2014). These flood disasters caused different degrees of flooding in Tongjiang City and the surrounding areas, which are only 51.8 km from the Jiansanjiang Administration. Floods have resulted in huge losses to the local government and threatened the lives and property of the people in the surrounding areas.

The Jiansanjiang Administration is adjacent to the Heilongjiang River in the north, the Songhua River in the west and the Wusuli River in the east. When river flow rises during the flood season, the area faces a high risk of flood disasters. In recent years, the number of flood disasters in Heilongjiang Province has continued to increase, and all farms under the jurisdiction of the administration have been affected by disasters to various degrees. Because the water infrastructure of the Jiansanjiang Administration is aging, the disaster-resistant foundation is weak and the ability to withstand agricultural natural disasters is low. Crop production on the farms has been reduced, which seriously jeopardizes the production safety of the Jiansanjiang Administration. Therefore, it is necessary to use the region as a typical case to carry out evaluations and analyses of flood disaster resilience and provide local governments with scientific guidance on disaster prevention to ensure the safety of people's lives and property and of agricultural production in the region.


Fig.1. Administrative division of Jiansanjiang Administration

2.2 Data Source

Data and published statistics were collected through field visits to the 15 farms in the Jiansanjiang Administration of Heilongjiang Province and from the Economic and Social Development of the Jiansanjiang Administration (2002-2016) and the Comprehensive Annual Report of the Jiansanjiang Administration Water Conservancy (2002-2016). The natural, economic and social development indicators of the 15 farms over the 15 years were analyzed to evaluate flood disaster resilience.

2.3 DPSIR Model

The Driving forces-Pressure-State-Impact-Response (DPSIR) model is a conceptual model for characterizing complex causal relationships, and it is widely used to analyze and evaluate ecological problems that are subject to human influence, such as ecological assessment issues and regional sustainable development issues. The model consists of five subsystems, namely, the driving force, pressure, state, impact and response, and the subsystems are linked by causal relationships (Gari et al., 2015). The “driving force” subsystem refers to the potential cause of a regional disaster; the “pressure” subsystem is the stress exerted by human activities on the natural environment that may cause floods; the “state” subsystem refers to the condition of the system under the current pressure; the “impact” subsystem considers the impact of the current system state on human society, that is, the impact of flood disasters on human society; and the “response” subsystem is a measure of the human response to current system changes, that is, the measures taken in the face of flood disasters.

The DPSIR model integrates the advantages of the Pressure-State-Response (PSR) model and the Driving forces-State-Response (DSR) model. Its structure is clear and intuitively reflects the relationships between the subsystems under the current indicator system. In the face of complex problems, especially interactions between humans and the environment, researchers obtain a clear framework of indicators, and the interrelationships between subsystems help to identify how the evaluation indicators work in the overall evaluation system. Compared with the PSR model and the DSR model, the DPSIR model framework is more complete, covers a wider range of fields and involves more comprehensive indicators. Therefore, an analysis based on the DPSIR model considers more factors and yields more comprehensive results.

2.4 Random Forest Regression

RFR is an ensemble model composed of a set of independent regression decision trees {h(x, θ_k), k = 1, 2, 3, …, N}, where θ_k is an independent and identically distributed random variable, x is the input variable, and N is the number of regression decision trees. The mean value of the k regression trees h(x, θ_k) is used as the regression prediction result (Strobl et al., 2009). The algorithm flow is as follows.
(1) The bootstrap method is used to repeatedly and randomly sample the original data, generate k sets θ_1, θ_2, …, θ_k, and generate the corresponding k regression trees h(x, θ_1), h(x, θ_2), …, h(x, θ_k).
(2) At each node of a regression tree, M features are randomly selected from the K features as the candidate split attribute set, and the optimal split is determined to establish the segmentation criterion for the current node.
(3) Each regression tree grows freely without restriction until it cannot be split further.
(4) The k regression trees generated by the above process constitute a random forest.

In the process of generating a random forest, the probability that a given sample is not extracted is $P = (1 - 1/N)^{N}$, where N here denotes the number of samples in the original sample set. When $N \to \infty$, $P \to 1/e \approx 0.368$. Approximately 36.8% of the samples in the total sample set are therefore not selected, and this part of the sample is referred to as out-of-bag (OOB) data. Because the OOB data provide an unbiased estimate, the accuracy of the model can be evaluated by calculating the OOB error.

The RFR model uses the mean squared error of the OOB data to evaluate the degree of influence of an independent variable on the dependent variable, that is, the variable importance measure (VIM) (Gromping et al., 2009; Hemant et al., 2018). The specific implementation of this approach is as follows.
(1) Calculate the mean squared error of the original OOB data, MSE_oob.
(2) Randomly permute the values of the target variable x_i, and calculate the new OOB mean squared error, MSE_oobi.
(3) Calculate the difference between MSE_oob and MSE_oobi and divide by the number of regression trees to obtain the average.
(4) Finally, using standard deviation standardization (z-score standardization), obtain the importance of the target variable x_i.

The mathematical expression of this method is as follows.

$$\mathrm{VIM}_i^{(MSE)} = \left[\frac{1}{u}\sum_{j=1}^{u}\left(MSE_{oob} - MSE_{oob_i}\right)\right] \Big/ \sigma_{SD}, \quad (1 \le i \le K) \tag{1}$$

Here, following steps (3) and (4), u is the number of regression trees and σ_SD is the standard deviation used in the standardization.
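The bootstrap, out-of-bag and importance steps above can be made concrete with a small sketch. The Python code below is an illustrative toy under stated assumptions, not the authors' implementation: each "tree" is a single-split regression stump rather than a freely grown tree, the data are synthetic, and the names `fit_stump`, `mse` and `forest_vim` are hypothetical. The difference is taken as permuted minus original OOB error so that a larger score means a more influential variable, which is an assumption about the intended sign convention.

```python
import random
import statistics


def fit_stump(X, y, candidates):
    """Fit a one-split regression tree using only the candidate features."""
    best = None  # (sse, feature, threshold, left_mean, right_mean)
    for f in candidates:
        values = sorted({row[f] for row in X})
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            left = [t for row, t in zip(X, y) if row[f] <= thr]
            right = [t for row, t in zip(X, y) if row[f] > thr]
            lm, rm = statistics.fmean(left), statistics.fmean(right)
            sse = sum((t - lm) ** 2 for t in left) + sum((t - rm) ** 2 for t in right)
            if best is None or sse < best[0]:
                best = (sse, f, thr, lm, rm)
    if best is None:  # no valid split: fall back to the sample mean
        mean = statistics.fmean(y)
        return lambda row: mean
    _, f, thr, lm, rm = best
    return lambda row: lm if row[f] <= thr else rm


def mse(tree, X, y):
    """Mean squared error of one tree on the given rows."""
    return statistics.fmean([(tree(row) - t) ** 2 for row, t in zip(X, y)])


def forest_vim(X, y, n_trees, m_features, rng):
    """Return (z-scored importance per feature, mean OOB fraction)."""
    n, k = len(X), len(X[0])
    deltas = [[] for _ in range(k)]  # per feature: one MSE change per tree
    oob_fractions = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]   # bootstrap sample
        oob = sorted(set(range(n)) - set(idx))       # rows never drawn
        if not oob:
            continue
        oob_fractions.append(len(oob) / n)
        tree = fit_stump([X[i] for i in idx], [y[i] for i in idx],
                         rng.sample(range(k), m_features))
        X_oob, y_oob = [X[i] for i in oob], [y[i] for i in oob]
        base = mse(tree, X_oob, y_oob)
        for f in range(k):
            column = [row[f] for row in X_oob]
            rng.shuffle(column)                      # permute feature f only
            permuted = [row[:f] + [v] + row[f + 1:] for row, v in zip(X_oob, column)]
            deltas[f].append(mse(tree, permuted, y_oob) - base)
    # Average over trees, then divide by the standard deviation (z-score style).
    vim = [statistics.fmean(d) / (statistics.pstdev(d) or 1.0) for d in deltas]
    return vim, statistics.fmean(oob_fractions)


# Synthetic data (assumption): only feature 0 drives the target.
rng = random.Random(0)
X = [[rng.random() for _ in range(3)] for _ in range(80)]
y = [3.0 * row[0] + rng.gauss(0, 0.1) for row in X]
vim, oob_fraction = forest_vim(X, y, n_trees=60, m_features=2, rng=rng)
print([round(v, 2) for v in vim], round(oob_fraction, 3))
```

In this toy setup the first feature should receive the largest score, and the mean OOB fraction should lie close to the 36.8% limit derived above.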

2.5 Whale Optimization Algorithm

In the WOA, each humpback whale is a candidate solution of the optimization problem. Before the optimization starts, a set of random candidate solutions is constructed, and the optimization mechanism of the algorithm is used to screen the candidate solutions until the global optimal solution of the optimization problem is determined (Alzaidi et al., 2019). The basic principles of the WOA are as follows.

(1) Surround hunting

Because the location of the global optimal solution in the current search space cannot be predicted, the WOA assumes that the current optimal candidate solution (optimal whale individual) is the global optimal solution (target prey position), and the other candidate solutions (candidate whale individuals) surround and approach the current optimal solution by changing their positions. The mathematical model of this surrounding hunting behavior is as follows.

$$\vec{D} = \left|\vec{C} \cdot \vec{X}^{*}(t) - \vec{X}(t)\right| \tag{2}$$

$$\vec{X}(t+1) = \vec{X}^{*}(t) - \vec{A} \cdot \vec{D} \tag{3}$$

where $\vec{D}$ is the distance vector between the current optimal solution and the other candidate solutions, $t$ is the current number of iterations, $\vec{X}^{*}$ is the position vector of the current optimal solution, $\vec{X}$ is the position vector of the other candidate solutions, and $\vec{A}$ and $\vec{C}$ are coefficient vectors. These vectors can be mathematically modeled as

$$\vec{A} = 2\vec{a} \cdot \vec{r} - \vec{a} \tag{4}$$

$$\vec{C} = 2 \cdot \vec{r} \tag{5}$$

where $a$ is the convergence factor, which decreases linearly from 2 to 0 as the number of iterations increases. This factor is expressed as $a = 2 - 2t/t_{max}$, where $t_{max}$ is the maximum number of iterations, and $\vec{r}$ is a random vector in [0,1].

(2) Spiral bubble net hunting

Mathematical models are built based on the bubble net hunting of humpback whales and are divided into the following two strategies.

① Contraction encirclement strategy: The range of fluctuation of $\vec{A}$ is determined by the value of $a$ in equation (4): if $a$ decreases, the range of fluctuation of $\vec{A}$ decreases. By equation (4), the components of $\vec{A}$ are random values in $[-a, a]$ and hence lie within [-2,2]. As the iteration number $t$ increases, the convergence factor $a$ and the range of $\vec{A}$ decrease, the whale hunting route shrinks, and the convergence precision increases. When the value of $\vec{A}$ is in the range [-1,1], the whale individual approaches the prey; otherwise, it moves away from the prey.

② Spiral updating location strategy: A spiral mathematical model involving the position of the whale individual and the position of the prey is created by simulating the behavior of a humpback whale pursuing prey along a spiral path. This relation can be mathematically modeled as

$$\vec{X}(t+1) = \vec{D}' \cdot e^{bl} \cdot \cos(2\pi l) + \vec{X}^{*}(t) \tag{6}$$

$$\vec{D}' = \left|\vec{X}^{*}(t) - \vec{X}(t)\right| \tag{7}$$

where $\vec{D}'$ is the distance between the ith whale individual and the prey, $b$ is the constant used to define the spiral form and $l$ is a random number in the interval [-1,1].

When humpback whales are hunting, surround hunting and spiral bubble net hunting are carried out simultaneously. To simulate this mechanism, it is assumed that each search individual uses a probability $p$ of 50% as the selection threshold, that is, the individual selects one of the behaviors in (1) and (2) to update its position. This process can be mathematically modeled as

$$\vec{X}(t+1) = \begin{cases} \vec{X}^{*}(t) - \vec{A} \cdot \vec{D}, & p < 0.5 \\ \vec{D}' \cdot e^{bl} \cdot \cos(2\pi l) + \vec{X}^{*}(t), & p \ge 0.5 \end{cases} \tag{8}$$

where $p$ is a random number in the range [0,1].

(3) Random search for prey

In addition to the spiral bubble net hunting method, the humpback whale also uses a random search method to find prey. In the mathematical model, the search around the current optimal solution is adjusted by changing the values in the vector $\vec{A}$. Specifically, when $|\vec{A}| \ge 1$, that is, when $\vec{A}$ takes random values in [-2,-1] or [1,2], the whale individual deviates from the current optimal solution and expands the search range to find other optimal solutions. This mechanism enhances the optimization ability of the algorithm, enabling the WOA to search globally and avoid local optima. This process can be mathematically modeled as

$$\vec{D} = \left|\vec{C} \cdot \vec{X}_{rand}(t) - \vec{X}(t)\right| \tag{9}$$

$$\vec{X}(t+1) = \vec{X}_{rand}(t) - \vec{A} \cdot \vec{D} \tag{10}$$

where $\vec{X}_{rand}$ is a position vector randomly selected from the positions of the whale individuals.

2.6 Basic Steps for Tuning the RFR Model Parameters with the WOA

First, the number of decision trees N and the splitting attribute value M of the random forest are taken as the position of a whale individual. By taking the root mean square error (RMSE) of the training samples as the objective function, the WOA-RFR model is constructed to achieve the iterative optimization of N and M. The specific steps in this approach are as follows.

Step 1: The feature data training sample set is collected, including the input data and expected output data.
Step 2: The position of each whale search individual (the number of decision trees N and the splitting attribute value M) is initialized.
Step 3: The fitness function is set to the RMSE, which is mathematically expressed as follows.

$$RMSE = \sqrt{\frac{1}{m}\sum_{i=1}^{m}\left(y_i - y_i'\right)^{2}} \tag{11}$$

Step 4: The algorithm parameters are initialized, including the whale population size G, the maximum number of iterations tmax, and the logarithmic spiral shape constant b.

Step 5: Formula (11) is used to calculate the fitness of each whale search individual, and the current best whale individual X* is found.

Step 6: When p < 0.5, if |A| < 1, the spatial position of the current whale search individual is updated by formula (3); if |A| ≥ 1, an individual Xrand is randomly selected from the current whale group, and the spatial position of the current whale individual is updated by formula (10).

Step 7: When p ≥ 0.5, the spatial position of the current whale individual is updated by formula (6).

Step 8: The fitness of each updated whale individual is calculated by formula (11), and the best whale individual X* in the current group is found. Whether the algorithm satisfies the termination condition is then checked. If it does, go to step 9; otherwise, repeat steps 6-8.

Step 9: Output the optimal whale individual fitness value and the corresponding spatial position X*, which gives the number of decision trees N and the splitting attribute value M in WOA-RFR.

The specific implementation process of the WOA is shown in Figure 2.
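The update rules in the steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: `woa_minimize`, `sphere` and all default parameter values are hypothetical; the coefficient a is assumed to decay linearly from 2 to 0, as in the standard WOA; A and C are drawn as scalars for each whale; and a toy sphere function stands in for the training RMSE of the random forest.

```python
import math
import random


def woa_minimize(objective, lower, upper, n_whales=20, t_max=200, b=1.0, seed=0):
    """Minimize `objective` over a box with a basic whale optimization loop."""
    rng = random.Random(seed)

    def clip(x):
        # Keep a candidate position inside the search box.
        return [min(max(v, lo), hi) for v, lo, hi in zip(x, lower, upper)]

    whales = [[rng.uniform(lo, hi) for lo, hi in zip(lower, upper)]
              for _ in range(n_whales)]
    best = min(whales, key=objective)[:]
    best_f = objective(best)

    for t in range(t_max):
        a = 2.0 * (1.0 - t / t_max)  # assumed linear decay from 2 to 0
        for i, x in enumerate(whales):
            A = 2.0 * a * rng.random() - a  # scalar A in [-a, a]
            C = 2.0 * rng.random()
            p = rng.random()
            if p < 0.5:
                # |A| < 1: encircle the best whale; otherwise follow a random whale.
                ref = best if abs(A) < 1.0 else whales[rng.randrange(n_whales)]
                new = [r - A * abs(C * r - v) for r, v in zip(ref, x)]
            else:
                # Spiral update around the best whale.
                l = rng.uniform(-1.0, 1.0)
                spiral = math.exp(b * l) * math.cos(2.0 * math.pi * l)
                new = [abs(bv - v) * spiral + bv for bv, v in zip(best, x)]
            whales[i] = clip(new)
        # Re-evaluate the population and keep the best position seen so far.
        for x in whales:
            f = objective(x)
            if f < best_f:
                best, best_f = x[:], f
    return best, best_f


def sphere(x):
    # Toy objective standing in for the training RMSE of the forest.
    return sum(v * v for v in x)


best, best_f = woa_minimize(sphere, lower=[-5.0, -5.0], upper=[5.0, 5.0])
print(best, best_f)
```

In a WOA-RFR setting, each position would hold the pair (N, M), the coordinates would presumably be rounded to integers before a forest is trained, and `objective` would return the RMSE of that forest on the training samples.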

Fig. 2. Whale algorithm flow chart

2.7 Theory of Serial Number Summation

To visually compare the performance of different models, this paper uses the theory of serial number summation together with the Spearman rank correlation coefficient to evaluate the rationality and stability of the model. The theory of serial number summation compares model performance by using multiple ranking results of the simulation values from different models; it is simple, efficient and easy to implement in practical applications. Therefore, it is widely used in model performance comparison problems. The principle of the method is as follows.


The theory of serial number summation states that the sorting result of the sum of the serial numbers obtained by different methods is a reasonable sorting result (Guo et al., 2011, Zhang et al., 2019). Therefore, the correlation among the results obtained by each evaluation method and the relatively reasonable ranking results can be used as the basis for the rationality of the evaluation method. The average and standard deviation of the correlation coefficients of each method are calculated after repeated evaluations. The method with the highest average correlation coefficient can be considered more reasonable than other methods. Additionally, the evaluation method with the smallest standard deviation is more stable than the other methods. The Spearman rank correlation coefficient is a classic method of calculating the correlations among variables. Therefore, this paper chooses the Spearman rank correlation coefficient combined with the theory of serial number summation to evaluate the rationality and stability of the model. Spearman’s rank correlation coefficient can be expressed as follows.

$$\rho = 1 - \frac{6 \sum D_i^2}{n(n^2 - 1)} \qquad (12)$$

where Di is the difference between the relatively reasonable ranking number of the flood disaster resilience of farm i and the ranking number given by a certain evaluation method, and n is the number of farms. The closer ρ is to 1, the stronger the correlation between the two ranking results. The rationality coefficient R and the stability coefficient S are calculated as follows.

$$R = \frac{1}{n} \sum_{i=1}^{n} \rho_i \qquad (13)$$

$$S = 1 - \sqrt{\frac{1}{n} \sum_{i=1}^{n} (\rho_i - R)^2} \qquad (14)$$
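The three coefficients above can be illustrated with a short Python sketch. It assumes untied rankings, reads formula (14) as one minus the standard deviation of the correlation coefficients (consistent with the description of stability in the text), and uses hypothetical function names and made-up rankings purely for illustration.

```python
import math


def spearman_from_ranks(ranks_a, ranks_b):
    """Spearman coefficient from two rankings without ties, as in formula (12)."""
    n = len(ranks_a)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
    return 1.0 - 6.0 * d2 / (n * (n ** 2 - 1))


def rationality_and_stability(rhos):
    """Mean of the coefficients (R) and one minus their standard deviation (S)."""
    k = len(rhos)
    r = sum(rhos) / k
    s = 1.0 - math.sqrt(sum((rho - r) ** 2 for rho in rhos) / k)
    return r, s


# Hypothetical rankings of five farms: a reference order and one method's order.
reference = [1, 2, 3, 4, 5]
method = [2, 1, 3, 4, 5]
rho = spearman_from_ranks(reference, method)

# Hypothetical coefficients from three repeated evaluations of one method.
R, S = rationality_and_stability([0.9, 1.0, 0.8])
print(rho, R, S)
```

A method whose repeated evaluations give a higher R agrees more closely with the reference ranking, and a higher S means its agreement fluctuates less between runs.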


3 Results


3.1 Construction of an Evaluation Index System for Flood Disaster Resilience

The DPSIR model is used to construct an evaluation index system for flood disaster resilience in the Jiansanjiang area. According to the actual situation and existing research (Heng et al., 2018; Zhang et al., 2018), 16 evaluation indicators were selected from the environmental, economic, medical care, population and agricultural classes to evaluate the resilience of the Jiansanjiang Administration area to flood disasters. The overall development degree of a region and the coverage of its public facilities generally have a strong influence on regional resilience. This paper uses the gross domestic product (GDP) indicator to represent the overall development level of the region, and the tertiary industry output value to represent the degree of improvement of public facilities. Because the labor force affects the quality of disaster relief activities in the event of a disaster, population indicators are used to indirectly reflect the amount of labor in the region. Households with different income levels can withstand different levels of disasters; therefore, the per capita income and per capita savings are selected as evaluation indicators. The Jiansanjiang Administration has a high level of agricultural development. Because the impact of flood disasters on food security may directly affect local economic development and social stability, the arable land area, grain output and grain reserve capacity indicators are selected to evaluate the resilience to flood disasters. The specific definitions of the 16 evaluation indicators are shown in Table 1.

Table 1 Definition of flood disaster resilience evaluation indicators

The indicators are grouped into the five DPSIR systems: driving force, pressure, state, impact and response.

| Evaluation index | Indicator meaning | Type | Unit |
| --- | --- | --- | --- |
| Precipitation Z1 | Depth of rainwater that has not been evaporated, infiltrated, or lost on a horizontal surface | - | mm |
| Per capita GDP Z2 | Gross domestic product / total population | + | 10^4 yuan |
| GDP growth rate Z3 | GDP growth rate in one year | + | % |
| Population density Z4 | Number of people per square kilometer of land area | + | individual/km² |
| Population growth rate Z5 | Population growth rate due to natural population changes and migration changes within 1 year | + | % |
| Cultivated area Z6 | Area of land that has been sown or planted | + | km² |
| Proportion of young and middle-aged population Z7 | Proportion of residents aged 16 to 59 | + | % |
| Tertiary industry output value Z8 | Gross domestic product in all industries except agriculture, industry and construction | + | 10^4 yuan |
| Grain production Z9 | Production of food crops in 1 year | + | kiloton |
| Per capita net income Z10 | Average value of net income of local residents | + | yuan |
| Per capita savings Z11 | Average value of regional resident savings | + | yuan |
| Food reserve capacity Z12 | Regional granary reserve grain capacity | + | kiloton |
| Number of health institutions Z13 | Number of regional health agencies | + | individual/km² |
| Number of hospital beds Z14 | Number of beds in regional health agencies | + | individual/km² |
| Number of health technicians Z15 | Number of regional health technicians | + | individual/km² |
| Evacuation area Z16 | Areas of places such as squares and parks that can be sheltered during disasters | + | m² |

(Note: “+” and “-” in the table represent the positive and negative indicators, respectively.)


3.2 Flood disaster resilience rating criteria

To facilitate the observation of temporal and spatial variations in the flood disaster resilience of the 15 farms from 2002 to 2016, the evaluation time interval was set to 5 years, and the evaluation data were based on the 5-year averages of the indicator data. The resilience level of the Jiansanjiang Administration was evaluated for 2002 to 2006, 2007 to 2011, and 2012 to 2016. Using the natural breakpoint method (Han et al., 2017), the 15-year data for each indicator were divided into 5 grades and adjusted according to the local actual situation. The classification criteria for each indicator are shown in Table 2.

Table 2 Flooding Resilience Grading Standards

| Evaluation index | I | II | III | IV | V |
| --- | --- | --- | --- | --- | --- |
| Precipitation Z1 | >800 | (700,800] | (600,700] | (500,600] | ≤500 |
| Per capita GDP Z2 | ≤20000 | (20000,40000] | (40000,60000] | (60000,80000] | >80000 |
| GDP growth rate Z3 | ≤0 | (0,3] | (3,5] | (5,10] | >15 |
| Population density Z4 | ≤5 | (5,10] | (10,15] | (15,20] | >20 |
| Population growth rate Z5 | ≤0 | (0,5] | (5,10] | (10,15] | >15 |
| Cultivated area Z6 | ≤300 | (300,400] | (400,500] | (500,600] | >600 |
| Proportion of young and middle-aged population Z7 | ≤0.65 | (0.65,0.7] | (0.7,0.8] | (0.8,0.85] | >0.85 |
| Tertiary industry value Z8 | ≤5000 | (5000,10000] | (10000,20000] | (20000,50000] | >50000 |
| Grain production Z9 | ≤100 | (100,200] | (200,400] | (400,600] | >600 |
| Per capita income Z10 | ≤5000 | (5000,10000] | (10000,20000] | (20000,30000] | >50000 |
| Per capita savings Z11 | ≤5000 | (5000,20000] | (20000,50000] | (50000,100000] | >100000 |
| Food reserve capacity Z12 | ≤10 | (10,50] | (50,500] | (500,1000] | >1000 |
| Number of health institutions Z13 | ≤5 | (5,10] | (10,15] | (15,30] | >30 |
| Number of hospital beds Z14 | ≤20 | (20,50] | (50,100] | (100,300] | >300 |
| Number of health technicians Z15 | ≤20 | (20,50] | (50,100] | (100,300] | >300 |
| Evacuation area Z16 | ≤10000 | (10000,25000] | (25000,50000] | (50000,100000] | >100000 |

3.3 Model implementation

(1) Data generation and parameter setting

Training samples and test samples were randomly generated using the original data set. According to Table 2, five standard level intervals (I-V) of flood disaster resilience were obtained; 400 samples were randomly generated in each level interval, giving a total of 2000 samples. The values 1, 2, 3, 4, and 5 are used as the expected outputs for the five levels of flood disaster resilience. Taking the Grade II standard level interval as an example, the resilience level of a sample is considered Grade II, and the expected output is 2, when the values of each indicator in the sample meet the following criteria: 700
| WOA-RFR model | Training set | Test set | δ |
| --- | --- | --- | --- |
| R² | 0.994 | 0.988 | 0.006 |
| RMSE | 0.112 | 0.156 | 0.044 |

As shown in Table 3, the R² values of the training set and the test set in the WOA-RFR model are 0.994 and 0.988, respectively, and the RMSE values are 0.112 and 0.156, respectively. Additionally, the absolute value δ of the difference between the training set and the test set is small, which indicates that the WOA-RFR model has high evaluation accuracy and excellent generalization ability and can be used for comprehensive evaluations of flood disaster resilience.

(4) Level simulation interval determination

The thresholds of the evaluation index grades are used as the basis for dividing the flood disaster resilience evaluation levels and are input into the WOA-RFR evaluation model established above to obtain the simulation interval of each grade. The results are shown in Table 4.

Table 4 Grade simulation interval of WOA-RFR flood disaster resilience evaluation model

| Grade | I | II | III | IV | V |
| --- | --- | --- | --- | --- | --- |
| Interval | ≤1.363 | (1.363,2.275] | (2.275,3.358] | (3.358,4.162] | >4.162 |
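The grade intervals in Table 4 amount to a lookup from a simulated resilience value to a grade. The sketch below shows one way to express that lookup; the names `THRESHOLDS`, `GRADES` and `resilience_grade` are hypothetical, and the right-closed intervals follow the table.

```python
from bisect import bisect_left

# Upper bounds of Grades I to IV from Table 4; anything above the last is Grade V.
THRESHOLDS = [1.363, 2.275, 3.358, 4.162]
GRADES = ["I", "II", "III", "IV", "V"]


def resilience_grade(value):
    """Map a simulated resilience value to its grade using right-closed intervals."""
    # bisect_left keeps a value equal to a threshold in the lower grade.
    return GRADES[bisect_left(THRESHOLDS, value)]


for value in (2.118, 2.501, 4.395):
    print(value, resilience_grade(value))
```

Using `bisect_left` rather than `bisect_right` is what makes each interval closed on the right, so a value of exactly 1.363 stays in Grade I.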

3.4 Spatio-temporal Variation Characteristics of Flood Disaster Resilience

The evaluation index data for the 15 farms in the Jiansanjiang Administration from 2002-2006, 2007-2011 and 2012-2016 were input into the established WOA-RFR flood disaster resilience evaluation model. The resilience simulation results and evaluation grades for each period and each farm were obtained. The results are shown in Table 5.

Table 5 Simulation results and evaluation grades of flood disaster resilience for each time period of each farm

| Farm | Simulation result 2002-2006 | Simulation result 2007-2011 | Simulation result 2012-2016 | Evaluation grade 2002-2006 | Evaluation grade 2007-2011 | Evaluation grade 2012-2016 |
| --- | --- | --- | --- | --- | --- | --- |
| 859 | 2.717 | 3.364 | 3.821 | III | IV | IV |
| Shengli | 2.854 | 3.149 | 3.355 | III | III | III |
| Qixing | 3.324 | 4.103 | 4.395 | III | IV | V |
| Qindeli | 2.304 | 2.596 | 3.363 | III | III | IV |
| Daxing | 2.851 | 3.041 | 3.595 | III | III | IV |
| Nongjiang | 1.962 | 2.118 | 2.741 | II | II | III |
| Qianjin | 3.090 | 3.419 | 3.652 | III | IV | IV |
| Chuangye | 3.059 | 3.246 | 3.635 | III | III | IV |
| Hongwei | 2.205 | 2.803 | 3.104 | II | III | III |
| Qianshao | 2.182 | 2.361 | 2.918 | II | III | III |
| Qianfeng | 2.554 | 2.965 | 3.683 | III | III | IV |
| Honghe | 2.312 | 2.695 | 3.047 | III | III | III |
| Yalvhe | 1.953 | 2.493 | 2.632 | II | III | III |
| Erdaohe | 2.134 | 2.550 | 3.282 | II | III | III |
| Qinglongshan | 2.010 | 2.314 | 2.810 | II | III | III |
| Average | 2.501 | 2.881 | 3.336 | III | III | III |

Table 5 shows that the average value of the resilience index for the 15 farms in the Jiansanjiang Administration from 2002 to 2006 was 2.501, and the average resilience rating was III. Because the average value of the simulation results is closer to Grade II than to Grade IV, the overall level of resilience of the Jiansanjiang Administration from 2002 to 2006 can be considered Grade III-. Based on the simulation results, the 6 farms of Erdaohe, Qinglongshan, Hongwei, Qianshao, Yalvhe and Nongjiang have low resilience levels, all of which are Grade II, and the remaining 9 farms have medium resilience levels, all of which are Grade III. Among the 15 farms, Yalvhe Farm has the lowest resilience index value of 1.953, and Qixing Farm has the highest resilience index value of 3.324. The difference is 1.371.

The average value of the 15 farm resilience indices from 2007 to 2011 was 2.881, which was 15.19% higher than that from 2002 to 2006, and the average resilience rating was III. Based on the simulation results, Qixing Farm, 859 Farm and Qianjin Farm have high resilience values, all of which are Grade IV. Among them, the Qixing Farm resilience index value is 4.103, ranking first in the simulation results for the 15 farms. The simulation result for Nongjiang Farm is the lowest, with a resilience index value of only 2.118 and a resilience rating of Grade II. The difference between this value and that for Qixing Farm is 1.985. The remaining 11 farms have a resilience rating of III.

The average value of the 15 farm resilience indices from 2012 to 2016 was 3.336, an increase of 15.79% from 2007 to 2011, and the average resilience rating was Grade III. Because the simulation result is closer to Grade IV than to Grade II, the overall resilience level of the Jiansanjiang Administration from 2012 to 2016 is Grade III+. Based on the simulation results, the Qixing Farm resilience index value was 4.395 and the resilience rating was V. According to Table 5, the resilience rating of 859 Farm, Qianjin Farm, Qindeli Farm, Chuangye Farm, Daxing Farm and Qianfeng Farm was Grade IV. The remaining 8 farms are all Grade III. Yalvhe Farm has the lowest resilience index, and the simulation result is 2.632, which is 1.763 lower than that for Qixing Farm.

The standard deviations of the simulated indices for the 15 farms from 2002 to 2006, 2007 to 2011, and 2012 to 2016 were 0.452, 0.522, and 0.476, respectively. The resilience levels of the 15 farms varied most between 2007 and 2011; after 2012, the spread slowly began to return toward the level observed from 2002 to 2006. To illustrate the spatial distribution of the 15-year flood disaster resilience level of the Jiansanjiang Administration, a spatial distribution map of the flood resilience rating of each farm from 2002 to 2016 is plotted according to the data in Table 5, and the results are shown in Figure 3.
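The period averages and percentage increases quoted above can be checked directly from the farm values in Table 5. The sketch below is only an arithmetic illustration with hypothetical variable names; it assumes the percentages are computed from the averages rounded to three decimals, and that the reported spread is a sample standard deviation, which is what appears to reproduce the 2002-2006 value.

```python
from statistics import mean, stdev

# Simulated resilience values of the 15 farms, copied from Table 5.
simulated = {
    "2002-2006": [2.717, 2.854, 3.324, 2.304, 2.851, 1.962, 3.090, 3.059,
                  2.205, 2.182, 2.554, 2.312, 1.953, 2.134, 2.010],
    "2007-2011": [3.364, 3.149, 4.103, 2.596, 3.041, 2.118, 3.419, 3.246,
                  2.803, 2.361, 2.965, 2.695, 2.493, 2.550, 2.314],
    "2012-2016": [3.821, 3.355, 4.395, 3.363, 3.595, 2.741, 3.652, 3.635,
                  3.104, 2.918, 3.683, 3.047, 2.632, 3.282, 2.810],
}

# Period averages rounded to three decimals, as reported in the table.
means = {period: round(mean(values), 3) for period, values in simulated.items()}

# Percentage increase of each period's average over the previous period.
periods = list(simulated)
increases = [
    round((means[b] - means[a]) / means[a] * 100, 2)
    for a, b in zip(periods, periods[1:])
]

# Sample standard deviation of the first period (assumed definition).
spread_first = stdev(simulated["2002-2006"])

print(means, increases, round(spread_first, 3))
```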


Fig. 3. Spatial distribution of flood disaster resilience ratings for each farm in Jiansanjiang Administration from 2002 to 2016: (a) 2002 to 2006, (b) 2007 to 2011, (c) 2012 to 2016

Figure 3(a)-3(c) shows the spatial distribution of the resilience level of the Jiansanjiang Administration farms from 2002 to 2016. From 2002 to 2006, the level of resilience of the northern farms was generally low. In contrast, the resilience of the southern farms was high. Of the northern farms, only Qindeli Farm, Qianfeng Farm, and Honghe Farm had resilience levels of Grade III, and the southern farms, except Hongwei Farm, were Grade III. During the five years from 2007 to 2011, the spatial distribution of the resilience level exhibited a different pattern from that between 2002 and 2006. The overall resilience level was Grade III, and the resilience levels of some individual farms were higher; most of these farms were distributed in the southwestern portion of the Jiansanjiang Administration. The spatial distribution of resilience levels in the five years from 2012 to 2016 was similar to that from 2007 to 2011. The resilience of farms in the southwest was still higher than the overall level in the region, and the level of resilience in the northeast was generally low.


4 Discussion


4.1 Driving Factor Analysis of Flood Disaster Resilience

With the importance function of the evaluation indicators built into the WOA-RFR model, the contribution of each index to the evaluation results is calculated (Lai et al., 2015), and the indicators are sorted by importance from largest to smallest, as shown in Figure 4. The precipitation index has the greatest impact on the evaluation results, followed by the per capita GDP, and their importance in the evaluation process is similar. The indicators ranked 3rd to 5th are the tertiary industry output value, population density, and the proportion of the young and middle-aged population. These five indicators have a considerable influence on the evaluation results and are very important indicators. The impact of the GDP growth rate, per capita savings, number of health institutions, per capita income, and number of hospital beds on the evaluation results is between 15% and 25%, suggesting that these indicators are relatively important. The influence of the number of health technicians, food reserve capacity and grain production on the evaluation results is between 5% and 10%, indicating that these indicators are relatively unimportant. The population growth rate, evacuation area and cultivated area have influences of less than 5% on the accuracy of the model, indicating that these three indicators contribute little to the level of disaster resilience in the study area during the evaluation process and are unimportant indicators.

Based on the ranking of all indicators, precipitation, economic and population indicators have the highest impacts on the assessment of regional flood disaster resilience. This result is similar to the conclusions obtained by Sun et al. (2016) when studying the resilience of Chaohu, which showed that natural and economic indicators are two extremely important factors affecting disaster resilience. In addition, Tang G.J. et al. (2017) and Ge Y. et al. (2011) emphasized the important role of economic indicators in their disaster resilience studies, which is consistent with the conclusions of the driving factor analysis in this paper. Beyond the conclusions of these three studies, this paper suggests that population indicators also play an important role in the assessment process. A review of historical large-scale flood disasters shows that the labor force plays an indispensable role in restoration and reconstruction. Ample labor often means rapid rescue and efficient reconstruction, and areas with sufficient labor are rebuilt faster after disasters than areas where labor is scarce. Areas that lack labor can also greatly enhance their ability to control losses and to restore and rebuild if they are supported by volunteers from other regions after a disaster occurs.
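The importance measure plotted in Figure 4 is labeled MeanDecreaseAccuracy, which is a permutation-based quantity: an indicator is important if shuffling its values degrades the predictions. The sketch below illustrates that idea on a toy predictor. It is an assumption-laden simplification rather than the procedure used in the paper: it permutes each column once on a single data set instead of using per-tree out-of-bag samples, and `permutation_importance`, `toy_predict` and the data are hypothetical.

```python
import random


def mse(y_true, y_pred):
    """Mean squared error between two equally long sequences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, X, y, seed=0):
    """Increase in MSE after shuffling each feature column in turn."""
    rng = random.Random(seed)
    baseline = mse(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        rng.shuffle(column)
        # Rebuild the rows with only feature j replaced by its shuffled values.
        shuffled = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
        importances.append(mse(y, [predict(row) for row in shuffled]) - baseline)
    return importances


def toy_predict(row):
    # Hypothetical model: the output depends only on the first feature.
    return 2.0 * row[0]


X = [[float(i), float((7 * i) % 5)] for i in range(12)]
y = [toy_predict(row) for row in X]
scores = permutation_importance(toy_predict, X, y)
print(scores)
```

In this toy case the second feature receives an importance of zero because the predictor ignores it, while shuffling the first feature raises the error.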

Fig. 4. Measure of importance of each indicator in the WOA-RFR model (horizontal axis: MeanDecreaseAccuracy; vertical axis: evaluation factors)

(Note: The acronym YMAP stands for young and middle-aged population)

Compared with economic and population indicators, medical indicators and agricultural production indicators have less impact on the results. This is likely associated with the low survival rate of victims of major flood disasters when the damage to the affected area is devastating. Moreover, victims who survive until rescue are generally less seriously injured, and their medical needs are relatively low. On the other hand, large flood disasters can overwhelm hospitals, clinics and other medical facilities and reduce the effectiveness of regional medical care. In addition, waterlogging usually has a greater impact on the regional economy but rarely involves casualties, and the demand for regional medical care in such situations is also low. Thus, the impact of the medical level on flood disaster resilience is limited. The impact of agricultural production capacity on regional reconstruction is also very limited. Areas with developed agriculture tend to have sufficient material reserves and certain advantages in dealing with disasters; however, when disasters occur, supplies can also be obtained through social donations and state aid, which weakens the importance of agricultural production capacity.

4.2 Analysis of Resilience Evaluation Results

4.2.1 Time variability analysis of flood disaster resilience

According to Table 5, the average values of the flood disaster resilience of the Jiansanjiang Administration in 2002-2006, 2007-2011, and 2012-2016 were 2.501, 2.881 and 3.336, respectively. The overall resilience growth rate of the Jiansanjiang Administration in 2002-2016 is calculated as 4.175/10a, which indicates a growth trend. The reason for this trend is the steady growth of economic construction in the Jiansanjiang Administration from 2002 to 2016 and the increasing optimization of the farm scale. Taking economic indicators and population indicators as an example, the GDP growth rate and population growth rate of the Jiansanjiang Administration are 9.79% and 1.70%, respectively. In addition, the average annual precipitation of the Jiansanjiang Administration has remained in the range of 500 mm to 650 mm. The fluctuation of precipitation is small, and the disturbance it causes to resilience is also small, which is one of the reasons for the steady growth of the resilience of the Jiansanjiang Administration.

The trend in the differences in resilience between farms differs from that of the overall resilience of the Administration. The standard deviations of the resilience simulation index for the 15 farms from 2002-2006, 2007-2011 and 2012-2016 are 0.452, 0.522 and 0.476, respectively, which indicates that the differences in resilience among the 15 farms over the 15 studied years first increased and then decreased. The main reasons for this phenomenon are as follows. First, the cumulative affected area of the Jiansanjiang Administration in 2002-2006, 2007-2011 and 2012-2016 was 10954.62 km2, 13607.05 km2, and 8032.05 km2, respectively. These data reflect the extent of the disaster and the degree of negative impact in the three periods. When the disaster is severe, poorly developed farms are hit harder than well-developed farms, and the difference in resilience levels naturally increases. Therefore, in years when the disaster is severe, the difference in resilience between farms will increase, and in years when the disaster is less severe, the difference in resilience between farms will decrease. Second, the unbalanced development of the Jiansanjiang area in the early years led to a large gap in the construction scale, economic level and agricultural production capacity of different farms. Qixing Farm and Erdaohe Farm are considered as examples. In 2007, the GDP of Qixing Farm and Erdaohe Farm was 85,015 yuan and 23,739 yuan, respectively, and the GDP of Qixing Farm was 3.69 times that of Erdaohe Farm. In 2016, the GDP of Qixing Farm and Erdaohe Farm was 254,771 yuan and 92,386 yuan, respectively, and the GDP of Qixing Farm was 2.75 times that of Erdaohe Farm. Thus, the economic development of Qixing Farm has gradually leveled off while driving the development of the surrounding farms, which has narrowed the difference in the level of development between farms. In the future, this trend will remain, and the gap in flood disaster recovery capacity will also shrink.

4.2.2 Spatial Variability Analysis of Flood Disaster Resilience

Figure 3(a)-3(c) and the previous analysis show that there is a clear regional pattern in the spatial distribution of flood disaster resilience over the 15 studied years in the Jiansanjiang Administration, and an overall gradually increasing trend is observed from north to south. Farms in the northeast have lower levels of resilience, while farms in the southwest have higher levels of resilience. The reason for this phenomenon is that among the 15 farms under the jurisdiction of the Jiansanjiang Administration, the economic development level of the southwestern farms represented by Qixing Farm has always been at the forefront of the Administration. Taking 2016 as an example, the total GDP of Qixing Farm, Chuangye Farm, Daxing Farm and Qianjin Farm was 726,272 yuan, which accounted for 31.85% of the total GDP of the Jiansanjiang Administration in 2016. The average GDP of the four farms was 181,568 yuan, which greatly exceeded the average GDP of the Jiansanjiang Administration over the same period.
In addition, the population of the four farms in 2016 was 90,321, which accounted for 35.54% of the total population of the Jiansanjiang Administration during the same period. In the face of disasters, these farms have large advantages in human and economic resources compared to the other farms. Field investigations further showed that Qixing Farm is the largest farm in the southwest of the Jiansanjiang Administration. It is also the seat of the Administration's government and has public resource advantages, such as schools, hospitals, public security bureaus and fire brigades that serve all 15 farms. When a flood disaster occurs, these advantages will have a positive impact on disaster relief operations. Chuangye Farm, Qianjin Farm and Daxing Farm are adjacent to and greatly affected by Qixing Farm. The overall development level of this area is at the forefront of the Jiansanjiang Administration; therefore, its level of resilience is relatively high, which also confirms the above analysis. In contrast, the large-scale Qindeli Farm and Qianshao Farm in the northern region were slow to develop before 2012 and could not effectively promote regional economic development. At the same time, Qindeli Farm is located toward the west, so its ability to drive the development of farms in the northeast is limited, which further widens the gap in development between eastern and western farms. In addition, the abundant precipitation on the eastern farms makes them vulnerable to flooding. For example, in 2016, the average precipitation of the eastern farms was 768.4 mm, which far exceeded the average precipitation of 660.8 mm of the Jiansanjiang Administration for the same period. Compared to western farms, eastern farms have a higher risk of flooding and suffer greater losses from floods.
In summary, the combination of high flood disaster risk and limited farm development leads to a significantly lower level of flood disaster resilience for farms in the northeast than for farms in the southwest.


4.3 Model Performance Comparison

To test the stability of the model and the rationality of the evaluation results, an RFR model, a random forest regression model optimized by particle swarm optimization (PSO-RFR) and the WOA-RFR model were constructed. The resilience levels of the 15 farms under the jurisdiction of the Jiansanjiang Administration in 2016 were evaluated. The RFR model was run with default parameters, and the WOA-RFR model parameter settings were consistent with the previous ones. For the PSO-RFR model, the maximum number of iterations was set to 1000, the population size was set to 50, the attenuation coefficient was set to 0.99, the local learning factor and the global learning factor were both set to 2, the velocity limit of each particle was set to [-0.5, 0.5], and the optimization goal was consistent with that of the WOA-RFR model (Ren et al., 2006). All three models were built on the same platform (RFR model) to highlight the impact of the optimization algorithm, effectively avoiding the interference of initial performance differences caused by different platforms. In addition, PSO has been widely used in various evaluation methods. Choosing this classic algorithm for comparison provides more representative results.

4.3.1 Model fitting performance comparison

To compare the fitting ability and precision of the three models, the RMSE, R², mean absolute error (MAE) and mean absolute percent error (MAPE) values of each model sample set were calculated. To avoid randomness, the three models were run 50 times, and the average value of each indicator over the 50 runs was calculated for comparison. The results are shown in Table 6.

Table 6 Comparison of performance indicators of different models

| Model | Sample set | RMSE | R² | MAE | MAPE |
| --- | --- | --- | --- | --- | --- |
| WOA-RFR | Training set | 0.1131 | 0.9934 | 0.0868 | 0.0408 |
| WOA-RFR | Test set | 0.1365 | 0.9907 | 0.0894 | 0.0410 |
| PSO-RFR | Training set | 0.1156 | 0.9932 | 0.0937 | 0.0411 |
| PSO-RFR | Test set | 0.1906 | 0.9817 | 0.1151 | 0.0423 |
| RFR | Training set | 0.1726 | 0.9851 | 0.0909 | 0.0572 |
| RFR | Test set | 0.2277 | 0.9739 | 0.1285 | 0.0580 |
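For reference, the four indicators compared in Table 6 can be computed as in the sketch below. The definitions are the common textbook ones and are assumptions here: R² is taken as one minus the ratio of residual to total sum of squares, and MAPE is expressed as a fraction rather than a percentage. The function name and the sample values are hypothetical.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return RMSE, R2, MAE and MAPE for paired observed and simulated values."""
    m = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mean_true = sum(y_true) / m
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return {
        "RMSE": math.sqrt(ss_res / m),
        "R2": 1.0 - ss_res / ss_tot,
        "MAE": sum(abs(e) for e in errors) / m,
        # MAPE as a fraction; requires nonzero observed values.
        "MAPE": sum(abs(e / t) for e, t in zip(errors, y_true)) / m,
    }


# Hypothetical expected grades and simulated outputs for five samples.
y_true = [1.0, 2.0, 3.0, 4.0, 5.0]
y_pred = [1.1, 1.9, 3.2, 3.8, 5.0]
metrics = regression_metrics(y_true, y_pred)
print(metrics)
```

Because the expected outputs in this study are the grade values 1 to 5, the observed values are never zero, so the MAPE term is well defined.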

As shown in Table 6, the WOA-RFR model and the PSO-RFR model provide the most accurate fitting results for the sample set and display better fitting ability than the RFR model with default parameters. This result suggests that parameter tuning has a considerable impact on the performance of the RFR model. For the training data, the results of the four fitting evaluation indices for the WOA-RFR model and the PSO-RFR model are similar. However, for the test data, the RMSE and MAE of the WOA-RFR model were reduced by 28.80% and 22.61%, respectively, compared to the PSO-RFR model. This result differs from that for the training data: the accuracy of the WOA-RFR model for the test data is clearly higher than that of the PSO-RFR model. This finding suggests that the generalization ability of the WOA-RFR model is better than that of the PSO-RFR model, which is due to the excellent optimization ability of the WOA.

4.3.2 Comparison of the evaluation results

The resilience evaluation index data for the 15 farms in the Jiansanjiang Administration in 2016 were input into the above 3 models, and the simulation results are shown in Table 7.

Table 7 Comparison of evaluation results of flood resilience of farms under different models

| Farm | WOA-RFR | PSO-RFR | RFR |
| --- | --- | --- | --- |
| 859 | IV | IV | III |
| Shengli | III | III | III |
| Qixing | V | V | IV |
| Qindeli | IV | IV | III |
| Daxing | IV | IV | III |
| Nongjiang | III | III | III |
| Qianjin | IV | IV | IV |
| Chuangye | IV | IV | IV |
| Hongwei | IV | III | III |
| Qianshao | III | IV | III |
| Qianfeng | III | III | III |
| Honghe | III | III | III |
| Yalvhe | III | III | III |
| Erdaohe | IV | IV | III |
| Qinglongshan | III | III | III |

As shown in Table 7, the evaluation results of the WOA-RFR model and the PSO-RFR model are similar, which further supports the reliability of the WOA-RFR model. The RFR model yielded lower resilience ratings for 859 Farm, Qixing Farm, Qindeli Farm, Daxing Farm and Erdaohe Farm than the other two models. The indicator importance output by the RFR model shows that it considers the precipitation indicator to be much more important than the other indicators (see Figure 5), and the 2016 precipitation totals of these five farms are all greater than 680 mm. The low resilience ratings of these five farms can therefore be attributed to the weight of the precipitation indicator in the RFR model being too high, while the weights of other indicators, such as per capita GDP, the proportion of the young and middle-aged population, population density, GDP growth rate and tertiary industry output value, are too low. Because the RFR model overemphasizes precipitation and largely ignores the influence of the other indicators, its results differ from those of the other two models. Thus, the RFR model misjudged the resilience ratings of 859 Farm, Qixing Farm, Qindeli Farm, Daxing Farm and Erdaohe Farm.

Both the PSO-RFR model and the RFR model rated the resilience of Hongwei Farm as Grade III, whereas the WOA-RFR model rated it as Grade IV. According to the above driving factor analysis, in 2016 Hongwei Farm had a precipitation total of 679.5 mm, a per capita GDP of 1,129.27 million yuan, a tertiary industry output of 268.512 million, a population density of 17.67 people/km2 and a young and middle-aged population proportion of 74.85%. According to Table 2, the corresponding flood disaster resilience grades of these five indicators are Grade III, Grade V, Grade IV, Grade IV and Grade III, respectively; that is, three of the five main indicators are at Grade IV or above. Moreover, although the PSO-RFR model assigns Grade III, its output resilience index is 3.326, which is very close to the Grade IV boundary. Therefore, the evaluation results of the WOA-RFR model can be considered reliable.
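The indicator importance discussed above is reported in Fig. 5 as MeanDecreaseAccuracy, a permutation-based measure: one indicator at a time is shuffled, and the resulting loss of predictive accuracy is recorded. The sketch below illustrates that idea with a generic `predict` function. The stand-in model, the synthetic data and all function names are hypothetical and are not the authors' RFR model or data.

```python
import random


def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean increase in MSE when each column of X is shuffled in turn."""
    rng = random.Random(seed)
    baseline = mse(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Rebuild the data with only column j permuted.
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            permuted = mse(y, [predict(row) for row in X_perm])
            increases.append(permuted - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


def toy_predict(row):
    """Hypothetical stand-in model: strong on feature 0, weak on 1, ignores 2."""
    return 3.0 * row[0] + 0.5 * row[1]


data_rng = random.Random(1)
X = [[data_rng.random() for _ in range(3)] for _ in range(60)]
y = [toy_predict(row) for row in X]
importances = permutation_importance(toy_predict, X, y)
print([round(v, 4) for v in importances])
```

In this toy setting the dominant feature receives by far the largest importance and the ignored feature receives none, which mirrors how a single dominant indicator such as precipitation can overshadow the others in the importance plot.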

[Figure 5: bar chart of the importance of each evaluation factor, measured by MeanDecreaseAccuracy (axis from 0.00 to 0.50). Evaluation factors shown: Precipitation, Per capita GDP, Proportion of YMAP, Population density, GDP growth rate, Tertiary industry value, Per capita savings, Per capita income, Number of health institutions, Number of hospital beds, Number of health technicians, Food reserve capacity, Population growth rate, Cultivated area, Evacuation area, Grain production. Legend entries recovered for the importance of the indicators: Ultra High, Medium, Low.]

Fig.5. Importance of each indicator in the RFR model (Note: The acronym YMAP stands for young and middle-aged population)

4.3.3 Comparison of the rationality and stability of the evaluation models

Ten farms were randomly selected from the 15 farms, the flood disaster resilience index values obtained by the above three models were ranked, and a relatively reasonable ranking was determined. The results are shown in Table 8.

Table 8 Ranking results of each model and the relatively reasonable ranking

Farm        WOA-RFR    PSO-RFR    RFR    Relatively reasonable ranking
859         9          9          7      9
Qixing      10         10         10     10
Qindeli     5          5          4      5
Daxing      6          6          8      6
Erdaohe     4          4          2      4
Qianshao    2          2          2      1
Qianjin     8          7          9      8
Chuangye    7          8          6      7
Honghe      1          1          5      2
Hongwei     3          3          1      2
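As an illustration of the rank comparison, the sketch below applies the classical Spearman formula, rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1)), to the rankings in Table 8. Equation (12) is not reproduced in this section, so the formula used here, which applies no correction for tied ranks, is an assumption, and the function name is illustrative.

```python
def spearman_rho(rank_a, rank_b):
    """Classical Spearman rank correlation (no tie correction; assumed form)."""
    n = len(rank_a)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


# Rankings from Table 8, in the farm order 859, Qixing, Qindeli, Daxing,
# Erdaohe, Qianshao, Qianjin, Chuangye, Honghe, Hongwei.
reasonable = [9, 10, 5, 6, 4, 1, 8, 7, 2, 2]
rankings = {
    "WOA-RFR": [9, 10, 5, 6, 4, 2, 8, 7, 1, 3],
    "PSO-RFR": [9, 10, 5, 6, 4, 2, 7, 8, 1, 3],
    "RFR": [7, 10, 4, 8, 2, 2, 9, 6, 5, 1],
}

for model, ranks in rankings.items():
    print(f"{model}: {spearman_rho(ranks, reasonable):.3f}")
```

Under this assumption the coefficients come out at about 0.982, 0.970 and 0.842 for the WOA-RFR, PSO-RFR and RFR models, respectively, values that also appear together in Table 9.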

The above procedure was repeated 20 times, and equation (12) was used to calculate Spearman’s rank correlation coefficient between the ranking produced by each model and the relatively reasonable ranking, giving 20 sets of coefficients. The rationality coefficient R and the stability coefficient S of each evaluation method were calculated using equations (13) and (14), respectively, and the results are shown in Table 9.

Table 9 Final evaluation results of rationality and stability of each model

Evaluation model                           WOA-RFR    PSO-RFR    RFR
Spearman’s rank correlation coefficient    0.988      0.982      0.940
                                           0.965      0.933      0.824
                                           0.974      0.979      0.922
                                           0.952      0.920      0.890
                                           0.982      0.970      0.842
                                           0.925      0.893      0.905
                                           0.984      0.977      0.849
                                           0.982      0.936      0.890
                                           0.982      0.970      0.842
                                           0.962      0.912      0.874
                                           0.982      0.895      0.877
                                           0.892      0.967      0.892
                                           0.970      0.928      0.852
                                           0.955      0.919      0.824
                                           0.938      0.934      0.821
                                           0.988      0.976      0.933
                                           0.958      0.878      0.895
                                           0.915      0.908      0.910
                                           0.964      0.893      0.877
                                           0.921      0.861      0.824
Rationality coefficient R                  0.964      0.943      0.874
Stability coefficient S                    0.976      0.969      0.959


From Table 9, the R values of the WOA-RFR, PSO-RFR and RFR models are 0.964, 0.943 and 0.874, respectively. In the 20 evaluations, the WOA-RFR model yielded the highest Spearman’s rank correlation coefficient 18 times, suggesting that its evaluation results are the most reasonable. The rationality ranking of the three models is therefore WOA-RFR model > PSO-RFR model > RFR model. Moreover, the stability coefficients are 0.976, 0.969 and 0.959 for the WOA-RFR, PSO-RFR and RFR models, respectively, so the stability ranking of the evaluation models is also WOA-RFR model > PSO-RFR model > RFR model.
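The count of 18 can be checked directly from the coefficients listed in Table 9. The sketch below assumes that the values of the three models are aligned by repetition in the order in which they appear in the table; the variable names are illustrative.

```python
# Spearman coefficients from Table 9, one entry per repetition (assumed aligned).
woa = [0.988, 0.965, 0.974, 0.952, 0.982, 0.925, 0.984, 0.982, 0.982, 0.962,
       0.982, 0.892, 0.970, 0.955, 0.938, 0.988, 0.958, 0.915, 0.964, 0.921]
pso = [0.982, 0.933, 0.979, 0.920, 0.970, 0.893, 0.977, 0.936, 0.970, 0.912,
       0.895, 0.967, 0.928, 0.919, 0.934, 0.976, 0.878, 0.908, 0.893, 0.861]
rfr = [0.940, 0.824, 0.922, 0.890, 0.842, 0.905, 0.849, 0.890, 0.842, 0.874,
       0.877, 0.892, 0.852, 0.824, 0.821, 0.933, 0.895, 0.910, 0.877, 0.824]

# Count the repetitions in which WOA-RFR is strictly better than both others.
woa_wins = sum(1 for w, p, r in zip(woa, pso, rfr) if w > p and w > r)
print(f"WOA-RFR has the highest coefficient in {woa_wins} of {len(woa)} repetitions")
```

In the two remaining repetitions the PSO-RFR model has the highest coefficient.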

5 Conclusions

In this paper, a novel flood disaster resilience assessment model was constructed by optimizing the RFR model with the WOA, and it was applied to solve the fuzziness problem in the quantitative expression of resilience. The model was used to assess the level of flood disaster resilience of the 15 farms under the Jiansanjiang Administration. The results showed that the overall level of flood disaster resilience of the Jiansanjiang Administration rose steadily from 2002 to 2016, and that the differences between farms increased between 2002 and 2011 and decreased after 2011. Spatially, resilience is high in the southwestern farms and low in the northeastern farms. The driving factor analysis confirmed that the five main indicators affecting the results are precipitation, per capita GDP, tertiary industry output value, population density and the proportion of the young and middle-aged population. To improve the resilience of the northeastern farms and promote the sustainable development of the Jiansanjiang Administration, the following suggestions are proposed: (1) encourage the northwestern farms to promote the economic development of the northeastern farms; (2) strengthen the introduction of population to the northeastern farms; (3) carry out precipitation monitoring and flood hazard warning during the flood season because of the abundant rainfall in the northern farms.

A comparison of the PSO-RFR model, the RFR model and the WOA-RFR model showed that the fitting accuracy and generalization ability of the WOA-RFR model are better than those of the other models. In addition, the WOA-RFR model is more stable in the evaluation process, and its evaluation results are more reasonable. The developed approach can be transferred to other disaster resilience assessments and to regional sustainability assessment. The limited availability of indicators and the limited length of the data series may affect the evaluation results of flood disaster resilience; in future research, additional indicators can be considered to address this limitation. In addition, with the improvement of the natural disaster monitoring network, the complex evolutionary characteristics of flood disaster recovery in disaster-stricken areas and post-disaster sustainable development strategies will become important research directions.


Acknowledgement: This study was supported by the National Natural Science Foundation of China (No. 51579044, No. 41071053), the National Science Fund for Distinguished Young Scholars (No. 51825901), the National Key R&D Program of China (No. 2017YFC0406002) and the Natural Science Foundation of Heilongjiang Province (No. E2017007).

References

Adger, W.N., 2000. Social and ecological resilience: are they related? Prog. Hum. Geog. 24 (3), 347-364.
Adger, W.N., 2005. Social-Ecological Resilience to Coastal Disasters. Science 309 (5737), 1036-1039.
Aerts, J.C.J.H., Botzen, W.J., Clarke, K.C., et al., 2018. Integrating human behaviour dynamics into flood disaster risk assessment. Nat. Clim. Change 8 (3), 193-199.
Alexander, D.E., 2013. Resilience and disaster risk reduction: an etymological journey. Nat. Hazard. Earth Sys. 13 (11), 2707-2716.
Alzaidi, K.M.S., 2019. Multiple DGs for Reducing Total Power Losses in Radial Distribution Systems Using Hybrid WOA-SSA Algorithm. Int. J. Photoenergy 2019, 1-20.
Aziz, M.AE., Ewees, A.A., Hassanien, A.E., 2017. Whale Optimization Algorithm and Moth-Flame Optimization for multilevel thresholding image segmentation. Expert Syst. Appl. 83, 242-256.
Breiman, L., 2001. Random Forests. Mach. Learn. 45, 5-32.
Chau, K.W., 2017. Use of Meta-Heuristic Techniques in Rainfall-Runoff Modelling. Water-Sui 9 (3), 186.
Chen, X., Ishwaran, H., 2012. Random forests for genomic data analysis. Genomics 99 (6).
Cheng, C.T., Chau, K.W., 2002. Three-person multi-objective conflict decision in reservoir flood control. Eur. J. Oper. Res. 142 (3), 625-631.
Cimellaro, G.P., Reinhorn, A.M., Bruneau, M., 2010. Framework for analytical quantification of disaster resilience. Eng. Struct. 32 (11), 3639-3649.
Diao, J.K., Cui, D.W., 2017. Initial Water Right Allocation in Yunnan Province Based on Whale Optimization Algorithm and Projection Pursuit. J. Nat. Resour. 32 (11), 1954-1967.
Fotovatikhah, F., Herrera, M., Shamshirband, S., 2018. Survey of computational intelligence as basis to big flood management: challenges, research directions and future work. Eng. Appl. Comp. Fluid 12 (1), 411-437.
Gari, S.R., Newton, A., Icely, J.D., 2015. A review of the application and evolution of the DPSIR framework with an emphasis on coastal social-ecological systems. Ocean Coast. Manage. 103, 63-77.
Ge, Y., Shi, P.J., Zhou, X., et al., 2011. Research on Assessment of Flood Resilience: A Case Study of Changsha City, Hunan Province. J. B. Nor. Univ. (Nat. Sci.) 47 (02), 197-201.
Gromping, U., 2009. Variable Importance Assessment in Regression: Linear Regression versus Random Forest. Am. Stat. 63 (4), 308-319.
Guo, X.K., 2011. Research on Traffic Volume Forecast of Yu he Bridge Based on Serial Number Sum Theory. Logist. Tech. 34 (04), 11-12.
Han, M., Zhang, C., Lu, G., et al., 2017. Gradient response of wetland landscape pattern of human activity intensity in the Yellow River Delta. Tran. Chin. Soc. Agric. Eng. 33 (06), 265-274.
Hemant, I., Min, L., 2018. Standard errors and confidence intervals for variable importance in random forest regression, classification, and survival. Stat. Med. 38 (4), 558-582.
Heng, C., Lam, N.S.N., Yi, Q., et al., 2018. A Synthesis of Disaster Resilience Measurement Methods and Indices. Int. J. Disast. Risk Re. 31, 844-855.
Holling, C.S., 1973. Resilience and Stability of Ecological Systems. Annu. Rev. Ecol. S. 4, 1-23.
Kaveh, A., Ghazaan, M.I., 2016. Enhanced Whale Optimization Algorithm for Sizing Optimization of Skeletal Structures. Mech. Based Des. Struc. 45 (3), 345-362.
Khalil, Y., Alshayeji, M., 2019. Distributed Whale Optimization Algorithm based on MapReduce. Concurr. Comp. Pract. E. 31, 48-72.
Kotzee, I., Reyers, B., 2016. Piloting a social-ecological index for measuring flood resilience: A composite index approach. Ecol. Indic. 60, 45-53.
Lai, C.G., Chen, X.H., Zhao, S.W., et al., 2015. Flood risk assessment model based on random forest and its application. J. Hydraul. Eng. 46 (01), 58-66.
Lam, L.M., Kuipers, R., 2018. Resilience and Disaster Governance: some insights from the 2015 Nepal Earthquake. Int. J. Disast. Risk Re. 33, 321-331.
Liu, C.Y., Shang, S., Zhao, Q., et al., 2018. Evaluation of resilience after flood disaster in Hunan Province based on GIS and TOPSIS-PSR. Wat. Resour. Pow. 36 (01), 70-73.
Liu, D., Liu, C.L., Fu, Q., et al., 2017. ELM evaluation model of regional groundwater quality based on the crow search algorithm. Ecol. Indic. 81, 302-314.
Liu, J., Shi, P.J., Ge, Y., et al., 2006. A Review of Research Progress on Disaster Resilience. Ear. Sci. (02), 211-218.
Liu, X.P., Li, X., Ye, J.A., et al., 2007. Using ant colony intelligence to mine the conversion rules of geographic cellular automata. China Sci. 2007 (06), 824-834.
Martens, D., Backer, M.D., Haesen, R., et al., 2007. Classification With Ant Colony Optimization. IEEE T. Evolut. Comput. 11 (5), 651-665.
Medani, K.B., 2018. Whale optimization algorithm based optimal reactive power dispatch: A case study of the Algerian power system. Electr. Pow. Syst. Res. 163, 696-705.
Miceli, R., Sotgiu, I., Settanni, M., 2008. Disaster preparedness and perception of flood risk: a study in an alpine valley in Italy. J. Environ. Psychol. 28 (2), 164-173.
Mirjalili, S., Lewis, A., 2016. The whale optimization algorithm. J. Adv. Eng. Softw. 95, 51-67.
Mostafa, A., Hassanien, A.E., Houseni, M., et al., 2017. Liver segmentation in MRI images based on whale optimization algorithm. Multimed. Tools Appl.
Onu, P.U., Quan, X., Xu, L., et al., 2016. Evaluation of sustainable acid rain control options utilizing a fuzzy TOPSIS multi-criteria decision analysis model frame work. J. Clean. Prod. 141, 612-625.
Pelling, M., 2003. The Vulnerability of Cities: Natural Disasters and Social Resilience. Earthscan, London.
Ren, Z.W., San, Z., Chen, J.F., et al., 2006. Improved PSO algorithm and its application in PID parameter tuning. J. Syst. Simul. (10), 2870-2873.
Rodriguez-Galiano, V.F., Ghimire, B., Rogan, J., et al., 2012. An assessment of the effectiveness of a random forest classifier for land-cover classification. ISPRS J. Photogramm. 67, 93-104.
Roozbeh, M., Babak, M., Shahabodin, S., et al., 2018. Coupling a firefly algorithm with support vector regression to predict evaporation in northern Iran. Eng. Appl. Comp. Fluid 12 (1), 584-597.
Rose, A., 2007. Economic resilience to natural and man-made disasters: Multidisciplinary origins and contextual dimensions. Nat. Hazards 7 (4), 383-398.
Sanderson, D., 2017. IFRC: World Disasters Report 2016: Resilience: saving lives today, investing for tomorrow. Int. Fed. Red Cross and Red Crescent Soc.
Simhadri, K.S., 2019. Comparative performance analysis of 2DOF state feedback controller for automatic generation control using whale optimization algorithm. Optim. Contr. Appl. Met. 40 (7), 24-42.
Strobl, C., Malley, J., Tutz, G., 2009. An introduction to recursive partitioning: Rationale, application, and characteristics of classification and regression trees, bagging, and random forests. Psychol. Methods 14 (4), 323-348.
Sun, H.H., Cheng, X.F., Chen, Y.X., et al., 2016. Spatial and temporal evolution of regional flood disaster resilience: Taking the Chaohu Basin as an example. Resour. Environ. Yangtze. Basin. 25 (9), 1384-1394.
Tang, G.J., 2017. Construction and Comprehensive Evaluation of Urban Disaster Resilience Index System. J. Guangzhou Univ. 16 (2), 31-37.
Tesfamariam, S., Liu, Z., 2010. Earthquake induced damage classification for reinforced concrete buildings. Struct. Saf. 32 (2), 154-164.
Wan, Z., Hong, Y., Khan, S., et al., 2014. A cloud-based global flood disaster community cyber-infrastructure: Development and demonstration. Environ. Modell. Softw. 58, 86-94.
Wang, R., Yin, Z.J., Zhu, C.Z., 2014. Analysis of storm floods in heilongjiang and songhua rivers in 2013. Hydrology 34 (06), 67-71+76.
Wang, W.C., Chau, K.W., Lin, Q., 2015. Improving forecasting accuracy of medium and long-term runoff using artificial neural network based on EEMD decomposition. Environ. Res. 139, 46-54.
Wang, Z.L., Chen, X.H., Lai, C.G., et al., 2013. Flood risk assessment model based on particle swarm rule mining algorithm. System Eng. Theor. Prac. 33 (06), 1615-1621.
Wen, K.G., Sun, Y.Z., 2007. China meteorological disaster code heilongjiang. Beijing Meteorological Press 43-146.
Yaseen, Z.M., Zaher, M., Sulaiman, S.O., 2019. An enhanced extreme learning machine model for river flow forecasting: state-of-the-art, practical applications in water resource engineering area and future research direction. J. Hydrol. 569, 387-408.
Zhang, L.L., Li, H., Liu, D., et al., 2019. Identification and application of the most suitable entropy model for precipitation complexity measurement. Atmos. Res. 221, 88-97.
Zhang, M., Xiang, W.B., 2018. Measuring Social Vulnerability to Flood Disasters in China. Sustainability-Basel 10 (8), 2676.
Zhao, H.R., 2017. Energy-Related CO2 Emissions Forecasting Using an Improved LSSVM Model Optimized by Whale Optimization Algorithm. Energies 10 (7), 874.
Zheng, Y., Li, Y., Wang, G., et al., 2019. A Novel Hybrid Algorithm for Feature Selection Based on Whale Optimization Algorithm. IEEE Access 7, 14908-14923.
Zhou, Y., Ling, Y., Luo, Q., 2017. Lévy Flight Trajectory-Based Whale Optimization Algorithm for Global Optimization. IEEE Access, 1-1.

1. Propose an improved regional flood disaster resilience evaluation model (WOA-RFR).
2. The resilience of the study area has increased steadily during the past 15 years.
3. Resilience levels in southern farms are generally higher than in the north.
4. Precipitation has the greatest impact on regional resilience.
5. WOA-RFR model has superior generalization performance and excellent stability.

Declaration of interests

☒ The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

☐ The authors declare the following financial interests/personal relationships which may be considered as potential competing interests: