Journal of Computational and Applied Mathematics 352 (2019) 278–292
Contents lists available at ScienceDirect
Journal of Computational and Applied Mathematics journal homepage: www.elsevier.com/locate/cam
Spatiotemporal interpolation through an extension of differential evolution algorithm for agricultural insurance claims ∗
Ezgi Nevruz , Ş. Kasırga Yıldırak Department of Actuarial Sciences, Hacettepe University, Ankara, Turkey
article
info
Article history: Received 14 May 2018 Received in revised form 24 October 2018 Keywords: Crop-hail insurance Inverse distance weighting Reduction approach Spherical distance
a b s t r a c t Assessment of actuarial risks in agricultural insurance is a specific area which evaluates risks in geographical framework. In order to manage weather related risks, the agricultural insurance portfolio is analyzed according to spatial and temporal characteristics of hazard regions. To reduce the basis risk associated with the location of meteorological stations, inverse distance weighting method with reduction approach is employed to interpolate meteorological values related to the location and the time of the reported agricultural claims. The results are evaluated for the data provided by Turkish Agricultural Insurance Pool, which is a unique agricultural claims data set covering more than 100 products for 5 different hazard types. Since height is a very significant factor that determines the values for the meteorological variables concerned in this study, altitude values are used to choose the best initial sample set. We extend stochastic differential evolution (DE) optimization algorithm to a multivariate setting for the spherical space to measure the closeness of estimated altitudes to actual altitudes. Moreover, we propose an optimal way of choosing the population size for the initialization step as another improvement in relation to DE algorithm. © 2018 Elsevier B.V. All rights reserved.
1. Introduction In this study, we extend the differential evolution (DE) algorithm by moving from one-dimensional distance-based evaluation to angular-based spherical distance calculation in which coordinates of both the observed meteorological values and reported agricultural claims are taken into consideration. We also offer a parsimonious solution to( the ) problem of 7 choosing population size of DE algorithm by taking the advantage of the binomial coefficient M = 4 = 35. After determining the optimal sample set through DE algorithm, we estimate meteorological values related to each reported individual claim of agricultural crop-hail insurance using spatiotemporal interpolation (STI). To assess the appropriateness of the extended algorithm that we propose, we work with a specific data set given by Turkish Agricultural Insurance Pool (TARSIM), which is a government-backed agricultural insurance system. Improvement of agricultural sector depends on the interrelation among agrarian production, general employment, international trade and other sectors of economy. The agricultural industry contributes to economy with a rate of 6.1% in the Turkey’s gross domestic ∗ Corresponding author. E-mail addresses:
[email protected] (E. Nevruz),
[email protected] (Ş.K. Yıldırak). https://doi.org/10.1016/j.cam.2018.11.022 0377-0427/© 2018 Elsevier B.V. All rights reserved.
E. Nevruz and Ş.K. Yıldırak / Journal of Computational and Applied Mathematics 352 (2019) 278–292
279
product. Since many industrial plants use agricultural products as raw materials, agricultural risk management possesses a very significant importance for the economy. Being 1st and 7th biggest agronomy in the EU and the world, respectively, and exporting thousands of different agricultural products make Turkey have a competitive agronomy on a large international scale [1]. These functions of agriculture remain effective as long as the risks which threaten the agricultural production are managed accurately. In this sense, paying attention to analyze the actuarial risks in agricultural insurance is the principal way to improve the agricultural economics. Evaluation and management of risks mostly depend on actuarial and statistical inference in non-life insurance. However, it is different when one deals with environmental risks. In agricultural insurance, the impact of weather and geographical conditions can be very severe in comparison with other areas of insurance. Hence, it is essential to take into account the environmental characteristics of the portfolio. In this regard, the meteorological features related to agricultural claims should be considered during the risk assessment. Classification and clustering of the agricultural risks are very useful for grouping claims having similar characteristics especially when we deal with big data. In environmental sciences, it is difficult to reach exact information about the characteristics of a variable related to its location. In order to overcome this problem, interpolation methods are very useful for estimating the approximate values of unknown variables. In this study, since we have no meteorological information on the exact location of the claims, we interpolate observed values using efficient techniques for the aim of measuring the impacts of their environmental attributes. Spatial interpolation techniques are very functional and extensively appropriate in a wide variety of applications in geographic information system (GIS) [2–5]. Unknown values which are not included in a sample are estimated using measured values at 2-dimensional location and time points through STI techniques. Kriging and inverse distance weighting (IDW) methods are two commonly-used STI algorithms. Latitude, longitude and altitude values of both meteorological stations and the villages, where the claims occur, are available at online batch geo-coding tools of ArcGIS software. Thus, we have obtained these values and have chosen the best initial sample values according to the closeness of the estimated altitudes to the actual altitudes. This optimization problem is studied by Susanto et al. [6] for one-dimensional case. In our study, we extend DE algorithm to the multivariate case for a spherical space using an angular-based distance computation and interpolate the meteorological information both spatially and temporally by using IDW method with reduction approach. Hejazi and Jackson [7] use spatial interpolation under a neural network approach to compute the Solvency Capital Requirement, which is one of the significant risk measures within the Solvency II framework regulating the EU’s insurance market. Spatial interpolation provides a considerable decrease in the complexity of the computation in Monte Carlo (MC) simulation scheme. In their study, by the help of spatial interpolation, a sample set of insurance policies are chosen and a risk measure is computed by MC using this sample set. After that, the MC results are used for estimating the riskiness of other policies in the portfolio with the spatial interpolation method. Likewise, we choose the best initial sample set through the multivariate generation of DE algorithm and use this set as the input of STI method to estimate the meteorological variables related to each insurance claim in an agricultural crop-hail insurance portfolio. Instead of MC simulation scheme, we use DE algorithm as an optimization technique since the sample size of the claim data set is large enough for estimation purposes. DE algorithm has some advantageous features such as straightforward execution, providing dependable, effective and robust results [8], hence it is preferred in many studies such as mechanical and chemical engineering [9,10], real-life applications [11], geophysics [12], artificial neural networks [13] and so on. The paper is organized as follows: In Section 2, we introduce our optimization problem and present the extension of DE algorithm in order to choose the best initial sample values for the STI method. We also explain the IDW method with reduction technique. In Section 3, we present our data set of crop-hail insurance and provide the results of both the DE optimization and the IDW method. In Section 4, concluding remarks are made. 2. Spatiotemporal techniques Susanto et al. [6] summarize different interpolation techniques used in the previous environmental studies. These methods are IDW and various extensions of IDW, clustering assisted regression, kriging and various extensions of kriging, splines and some extensions of splines, random forest and some mixtures of these methods etc. The common assumption of the spatial interpolation techniques is that the closer sample points cause stronger correlation between the sample point and the location to be estimated [14,15]. STI, which estimates the unknown value of a variable belonging to location and time pairs, has become popular in GIS applications as an extension of spatial interpolation techniques [16–20]. The meteorological quantities of a claim could be estimated related to its location and its time by the help of STI. Li et al. [21] claim that combining space and time together gives more accurate results. In general, it is deduced that kriging is the most appropriate technique among these algorithms, however it has a disadvantage of not performing well computationally. For this reason, IDW is preferable since it is more efficient and it
280
E. Nevruz and Ş.K. Yıldırak / Journal of Computational and Applied Mathematics 352 (2019) 278–292
performs better [6]. Any spatial interpolation technique could be the best method for a particular case [22,23]. It is based on the characteristics and the design of the sample, the data set and the relationship among factors having influence on sampling [24]. In our study, we use an IDW-based technique with reduction approach in order to interpolate unknown meteorological values of the reported agricultural claims. Since IDW is a distance-based method, we are able to use this method as an STI technique considering both temporal and spatial distances from the sample points. 2.1. Optimization problem In order to obtain the optimal sample points reflecting the information on unknown points in the most suitable way, we recommend an adaptation of DE algorithm in our study. The random optimization methods may cause a disadvantage that randomly chosen sample points may be gathered in a particular area and the interpolation does not produce reasonable results notably for small samples. However, DE algorithm eliminates this drawback with choosing the optimal sample points using an optimization criterion rather than just randomizing [6]. According to Capozzi et al. [25], mid-altitude values have higher impacts on the hail probability. Since altitude is one of the decisive factors for the environmental risk evaluation studies, we use altitude data to calculate the fitness value, which denotes the best estimation considering the estimated and actual values. Susanto et al. [6] define the optimization problem as maximizing the leave-one-out cross-validation (LOOCV) error of the (k − 1)-dimensional sample when kth element is left out, so that kth element is the best estimator. In our study, we prefer to use sum of absolute errors instead of sum of squared errors used in their study. LOOCV error represents the total absolute error when kth element is excluded and it is defined as eabs k (x −k ) =
∑⏐ ⏐ ⏐xˆ i − oi ⏐ ,
(2.1)
i̸ =k
where eabs is the total absolute error, xˆ i denotes the estimated value at location xi which is obtained by using STI method k with the remaining set x −k = x \{xk }, and oi is the actual value that is observed at location xi . In this procedure, 4 surrounding sample points must be selected to interpolate the unknown value for IDW method with reduction technique. For univariate case, the set of remaining (n − 4) points is the initial set of elements of DE algorithm where the entire sample set is x = {x1 , x2 , . . . , xn }. In our case, we firstly choose m closest sample locations x (1) , x (2) , . . . , x (m) and then we obtain all possible 4-subsets out of the set of m distinct sample points and find the optimal subset that minimizes the (m)value of the cost function defined in X 1 , X 2 , . . . , X N ]. DE algorithm. Therefore, we define X as the input of the optimization having N = elements, i.e. X = [X 4 { } Here, {X k ; k = 1, 2, . . . , N } represents the kth combination from the set x (1) , x (2) , . . . , x (m) and it is defined as
]
[
X k = x k,1 ; x k,2 ; x k,3 ; x k,4 ;
k = 1, 2, . . . , N
; x (jk,2 ) ; x (jk,3 ) ; x (jk,4 ) ; ji,D ∈ {1, 2, . . . , m} for D = 1, 2, 3, 4. Here, X k is the (4 × 2)with x k,1 ; x k,2 ; x k,3 ; x k,4 = x dimensional matrix which consists of locations x k,j having both latitudes latx k,j and longitudes lonx k,j in each row. In addition to LOOCV error, Susanto et al. [6] also include another measure called ‘‘sparsity’’ for computing of fitness value. We redefine this measure as the coefficient of variation (CV) which measures the dispersion of pair-wise distances between m closest sample points. ( ) pw where = ∆x (1)};x (2) , ∆x (1) ;x (3) , . . . , ∆x (1) ;x (m) , . . . , ∆x (m−1) ;x (m) be the pair-wise (distance vector { Let ∆X ) ′ (j) (j′ ) ′ ) ; j ̸ = j ∈ 1, 2, . . . , m represents the Euclidean distance between the location x ∆ = lat , lon and x = (j) (j) (j) (j x x ( x ;x ) latx (j′ ) , lonx (j′ ) .
]
[
[
(jk,1 )
]
Definition 2.1.) In order to calculate the Euclidean distance on a sphere between two locations x = (latx , lonx ) and ( y = laty , lony , the polar coordinates are defined as x′ =
(
y′ =
(
1x
′
) , 2 x′ , 3 x′ = (cos θx , sin θx cos ϕx , sin θx sin ϕx )
and 1y
′
) ( ) , 2 y′ , 3 y′ = cos θy , sin θy cos ϕy , sin θy sin ϕy
π
− latx , θx = lonx , ϕy = π2 − laty and θy = lony . Here, it should be noted that 2 ′ ′ (1 y ) + (2 y )2 + (3 y′ )2 = 1 for a unit sphere, θx , θy ∈ [−π, π] and ϕx , ϕy ∈ [− π2 , π2 ].
where ϕx = √
2
√ (1 x′ )2 + (2 x′ )2 + (3 x′ )2 = 1 and
When one considers the altitudes of the locations, i.e. x = (latx , lonx , altx ), the polar coordinates are defined as x′ =
(
1x
′
) , 2 x′ , 3 x′ = (rx cos θx , rx sin θx cos ϕx , rx sin θx sin ϕx )
E. Nevruz and Ş.K. Yıldırak / Journal of Computational and Applied Mathematics 352 (2019) 278–292
where the radial coordinate is taken as rx = altx . In this case, the condition changes as rx = 1 for the unit sphere. Therefore, the Euclidean distance between the locations x and y is calculated as
∆x ;y = x ′ − y ′
√
281
(1 x′ )2 + (2 x′ )2 + (3 x′ )2 = rx . Here,
(2.2)
for both rx = 1 and rx > 1. Lastly, the coefficient of variation is calculated as
CV (∆X pw ) =
σ (∆X pw ) , µ (∆X pw )
(2.3)
where σ (∆X pw ) and µ (∆X pw ) are the standard deviation and the mean of the vector ∆X pw , respectively. Lastly, the optimization problem turns out to be a minimization problem as
]−1
costk = eabs k (X −k )
[
CV (∆X pw ) ;
k = 1, 2, . . . , N .
(2.4)
where X −k = X \{X k } is the ‘‘leave-one-out’’ set used as the input for DE algorithm. Therefore, our purpose is to minimize the cost function value costk defined as multiplication of inverse LOOCV error and the dispersion, so that kth sample set is the best initial sample set for the interpolation.
2.2. Differential evolution algorithm: An extension The minimization problem defined in Eq. (2.4) is solved by DE algorithm which enables us to optimize nonlinear and nondifferentiable continuous space functions. A minimization method must (i) be capable to manage to optimize nonlinear, nondifferentiable and multimodal cost functions; (ii) be convenient to parallel computing when the cost functions are intensive to calculate; (iii) be practical, i.e. can be run with an acceptable number of control variables; (iv) converge consistently to the global minimum in successive separate trials. Storn and Price [26] propose DE algorithm to fulfill these necessities. This algorithm is a stochastic parallel method which uses several processors in order to calculate a single optimization problem fast. Since it is a direct search (DS) method, the gradient of the cost function is not a requisite [27,28]. To obtain the optimal solution, DS techniques try to find a group of points surrounding a particular point, seeking the one which has a lower cost function value in comparison with the relevant point’s cost function value whereas a large number of optimization methods require details about first or higher derivatives of the cost function. Therefore, DS methods are useful when the cost function is non-differentiable [29]. In this algorithm, the optimal vector among M vectors is determined. M is the population size and xi,G ; i = 1, 2, . . . , M is the D-dimensional parameter vector and is used as Gth generation’s population vector. The initial D-dimensional parameter population is determined randomly from M vectors for each generation whereas the population size M remains the same [26]. According to Storn and Price [26], there are four main steps in DE algorithm: initialization, mutation, crossover, and selection. In the first operation, three parameter populations are determined randomly from uniform distribution defined on the interval (1, M). A transformed vector υ i,G+1 is obtained in the ‘‘mutation’’ step. This vector is mutated by adding the difference between two random population vectors multiplied by a weighting factor to random base vector for the univariate case, where the elements in the parameter vector are one-dimensional. As an extension of the classical DE algorithm, we update this calculation to the multivariate case, i.e. the elements in the parameter vector are two-dimensional, which extends the parameter vector to the parameter matrix. In the ‘‘crossover’’ step, it is aimed at forming the trial vector u i,G+1 using an alternative pre-arranged parameter vector so-called target vector x i,G and the transformed vector υ i,G+1 . This procedure is actually mixing the parameter vectors randomly and guarantees that the trial vector takes at least one element from the mutant vector. For the last ‘‘selection’’ part, the algorithm selects the trial vector or the target vector according to their cost function values. The basic procedure under DE algorithm is given specifically in the following subsections.
2.2.1. Initialization and mutation In univariate case, a vector is mutated for each D-dimensional target vector xi,G ; i = 1, 2, . . . , M of the generation G as
( ) υ i,G+1 = x r1 ,G + x r2 ,G − x r3 ,G F
(2.5)
where x r1 ,G is the random base vector, x r2 ,G and x r3 ,G are the random parameter vectors with random integers r1 , r2 , r3 and F is a constant. Here, rk ∈ {1, 2, . . . , M ( } ; k = 1, 2), 3 are different from each other and also from i, so that M ≥ 4. F ∈ [0, 2] adjusts how much the difference x r2 ,G − x r3 ,G increases. The choice of the values of M and F is discussed in detail in Section 2.2.4.
282
E. Nevruz and Ş.K. Yıldırak / Journal of Computational and Applied Mathematics 352 (2019) 278–292
For the sake of simplicity, we redefine the ‘‘leave-one-out’’ set X −k = X \{X k } as a population set having the size M = N − 1 as X pop = X 1,G , X 2,G , . . . , X M ,G
{
}
where X i,G ; i = 1, 2, . . . , M is the ith target matrix of the generation G. For the initialization step, three parameter matrices are chosen from the population set X pop . The initialization set consisting of three (D × 2)-dimensional X rk ,G parameter matrices all of which includes x rk ,j,G pairs for j = 1, 2, . . . , D is obtained as (init)
X i ,G
{ = The set of X rk ,G ; rk ∈ {1, 2, . . . , M }
for k = 1, 2, 3
}
where r1 ̸ = r2 ̸ = r3 ̸ = i are random integers distributed (1, M). Here, X rk ,G is the rk th element of [ uniformly on the interval ] the population set X pop = X −k = X \{X k }. Also, X i,G = x i,1,G ; x i,2,G ; . . . x i,D,G has D location pairs consisting of latitude and longitude values which are denoted as latx i,j,G and lonx i,j,G , respectively. Adding a weighted Euclidean distance between the nodes x r2 ,j,G and x r3 ,j,G , i.e. ∆x r ,j,G ;x r ,j,G , to x r1 ,j,G for each j = 2 3 1, 2, . . . , D, we adapt the distance calculation from the univariate case of DE algorithm shown in Eq. (2.5) to multivariate (init) case. In other words, this extension can be expressed through X i,G where X r1 ,G is the random base matrix, X r2 ,G and X r3 ,G are the random parameter matrices with random integers r1 , r2 , r3 . Nevruz [30] assumes that the distances between sample points are calculated linearly. In this paper, the mutated vector in Eq. (2.5) is obtained in terms of spherical distances. We hence suggest a multivariate generation of DE algorithm for spherical spaces. Having calculated the polar coordinates of x rk ,j,G ; k = 1, 2, 3, which are shown as x ′r ,j,G , by using Definition 2.1, the polar k coordinates of the mutant matrix is generated through the multivariate extension as follows: t
⏐ ⏐ υ ′i,j,G+1 = t x ′r1 ,j,G + ⏐t x ′r2 ,j,G − t x ′r3 ,j,G ⏐ F
for t = 1, 2, 3
(2.6)
and the spherical coordinates of the mutant matrix is normalized as
υ ′i,j,G+1 . υˆ ′i,j,G+1 = ′ υ i,j,G+1 Therefore, the form of the normalized spherical coordinates of mutant matrix is given as follows:
υˆ ′i,j,G+1
=
(
υ′
1 i,j,G+1
, 2 υi′,j,G+1 , 3 υi′, j,G+1
)
) ( = cos θυ i,j,G+1 , sin θυ i,j,G+1 cos ϕυ i,j,G+1 , sin θυ i,j,G+1 sin ϕυ i,j,G+1 The mutated latitude and longitude values can be obtained by the formulae given below: latυ i,j,G+1 =
π 2
− ϕυ i,j,G+1 =
π 2
( − arctan
υ′
3 i, j,G+1
) (2.7)
υ′
2 i, j,G+1
and
⎛ lonυ i,j,G+1 = θυ i,j,G+1 = arctan ⎝
⎜
′ 2 υi, j,G+1 cos(ϕυ i,j,G+1 )
υ′
1 i, j,G+1
⎞ ⎟ ⎠.
(2.8)
In Eqs. (2.7) and (2.8), we do not need to set the sign of the arctangent due to Turkey’s geographic coordinates. Since ) ( functions π both latitudes and longitudes of the locations lie in the interval 0 , , the inputs of the arctangent functions are positive 2 ( ) and hence arctan(.) ∈ 0, π2 . 2.2.2. Crossover The crossover step is introduced to(ensure the heterogeneity and ) randomness of( the initialization )and mutation process [26]. The trial matrix u i,G+1 = u i,1,G+1 ; u i,2,G+1 ; . . . ; u i,D,G+1 with u i,j,G+1 = latu i,j,G+1 , lonu i,j,G+1 for each j = 1, 2, . . . , D is obtained as u i,j,G+1 =
⎧ ⎨ υ i,j,G+1
if uj ≤ CR ∨ {j = ri }
⎩ x i,j,G
if uj > CR ∧ {j ̸ = ri }
{
}
;
{
}
j = 1, 2, . . . , D
(2.9)
E. Nevruz and Ş.K. Yıldırak / Journal of Computational and Applied Mathematics 352 (2019) 278–292
283
where uj ∈ [0, 1] is a uniform random number generated for each j and CR ∈ [0, 1] is a constant chosen by the user to determine the crossover form. The choice of the values for CR is discussed in detail in Section 2.2.4. ri is a randomly chosen index in {1, 2, . . . , D} which guarantees that u i,G+1 takes at least one element from υ i,G+1 . 2.2.3. Selection The aim of this step is to determine the (G + 1)th generation’s target matrix. In order to do that, the values of fitness functions for both the trial matrix u i,G+1 and the target matrix x i,G are compared. These functional values, which(are )called ( ) as ‘‘score value’’ and ‘‘cost value’’, are calculated using Eq. (2.4). Indicating the values as ‘‘score u i,G+1 ’’ and ‘‘cost x i,G ’’, the next generation’s target matrix is obtained as x i,j,G+1 =
⎧ ⎨ u i,j,G+1
if score u i,G+1 ≤ cost x i,G
⎩ x i,j,G
otherwise
{
(
)
(
)} ;
j = 1, 2, . . . , D.
(2.10)
This equation shows that the trial matrix replaces the target matrix in the next generation if the score value is less than or equal to the cost value. For each element X i,G of the population set X pop , this procedure of four steps is repeated, thus M iterations are run for each generation. In order to summarize DE Algorithm explained in the previous subsections, the flowchart in Fig. 1 is represented.
Fig. 1. Flowchart for the extension of DE algorithm.
2.2.4. Criteria for DE algorithm’s control variables The population size must be M ≥ 4 since one base matrix and two parameter matrices are taken from the population randomly different from i, so that at least 4 elements are needed in order to make certain that this operation yields different (init) initialization sets X i,G for each generation. Storn and Price [26] restrict this criterion as 5D ≤ M ≤ 10D where D is the number of the locations included in the parameter matrix. The interval for the factor F which adjusts the amount of the difference between two random parameter matrices added to the base matrix in the mutation step, so-called ‘‘mutation scaling factor’’ [31], is F ∈ [0, 2]. Since it is a constant, the choice of this factor is discussed in several studies. Storn and Price [26] suggest that F = 0.5 can be preferred as the first choice. After checking whether the convergence of the algorithm is mature or not, it can be increased. It is also expressed in this study that the optimization processes of the cases where F < 0.4 and F > 1 are seldomly effective. Therefore, it could be inferred that the criterion F ∈ [0.4, 1] is more effective. Another control variable used in DE algorithm is the constant CR ∈ [0, 1] which is determined by the user. It is suggested that this constant, which is called as ‘‘crossover rate’’ [31], may be selected as CR = 0.1 for the beginning whereas the values
284
E. Nevruz and Ş.K. Yıldırak / Journal of Computational and Applied Mathematics 352 (2019) 278–292
CR = 0.9 or CR = 1.0 may also be tested in order to see whether a fast convergence can be yielded or not, since higher values of CR usually provide fast convergence [26]. On the other hand, Gämperle et al. [32] recommend the criteria for DE’s control variables are 3D ≤ M ≤ 8D, F = 0.6 and CR ∈ [0.3, 0.9] whereas it is proposed in another study that F = 0.9 is a reasonable for the sake of convergence in probability concurrently [33]. In spite of the different suggestions in the literature, it can be concluded that the setting of control variables may vary according to the objective function and the structure of the data. Our parameter choice for this study is explained in the application part. 2.3. IDW method: Reduction technique Susanto et al. [6] propose an extensive alteration of the IDW-based spatiotemporal algorithm, so called ‘‘the reduction approach’’. The idea behind this technique suggests that the temporal distance is handled as an independent dimension from the spatial distance [34]. In DE algorithm part, the optimal sample set consisting of 4 locations is determined for the IDW method. The meteorological values for different variables are available at these locations at different times. Consider that X = [xx1 ; x 2 ; x 3 ; x 4 ] exists as the optimal sample location set including xi s each of which is used as an input in order to estimate the meteorological value) ( related to an agricultural claim x. Define oi,t as the observed meteorological value at the location x i = latx i , lonx i , altx i and at time t x i . Here, t x i is the vector of the times when the meteorological value at the location x i is measured and let t x i ∈ [tstart , tend ]. Hence, the values tstart and tend vary according to x i for each meteorological variable. In addition, consider that the agricultural claim occurs at the location x = (latx , lonx , altx ) and at time tx . In the IDW method with reduction technique, the unknown meteorological value xˆ related to the location and the time of the agricultural claim is estimated as follows: xˆ =
tend 4 ∑ ∑
wt ,i ot ,i
(2.11)
tstart i=1
where ot ,i is the observed point and wt ,i is the weight related to each observed point. The weights are calculated with respect to the traditional IDW technique using the inverse of spherical distance and one-dimensional temporal distance as us −ut ∆− ∆t s wt ,i = ∑t ∑i 4 i −u −u . s t end i=1 ∆si ∆ti tstart
(2.12)
In this equation, ∆si = ∆x i ;x is the spherical Euclidean distance between the ith known location x i = latx i , lonx i , altx i and the unknown point’s location x = (latx , lonx , altx ). The spatial distances are computed using Eq. (2.2). The temporal distance ∆ti = ∆t xi ;tx is the time difference between the elements of t x i and tx . In order to consider the seasonality in the data set simply, we compute the temporal distances based on (mod 365). Here, the constants us and ut are determined by user and they are called ‘‘spatial distance-decay factor’’ and ‘‘temporal distance-decay factor’’, respectively [6]. The calculation of the weight wt ,i in Eq. (2.12) demonstrates that the function yields division by zero error if ∆si and/or ∆ti is equal to zero. In order to overcome this problem, Susanto et al. [6] recommend to adjust these values to 1 since it is the smallest value that can be assigned to both spatial and temporal distances. In addition, as a result of the discussion for this drawback, it is suggested that the sample value can be taken directly since the distance between unknown point and known point is zero [35]. In our study, in the case of zero distance, the location of the unknown value is replaced with the sample point. Therefore, we determine the distances as follows:
(
)
(i) If ∆ti = 0 and ∆si = 0, then wt ,i = 1, so that xˆ = ot ,i which means that the estimated value is simply the sample point as in [35]. (ii) If ∆ti = 0 and ∆si ̸ = 0, then ∆ti is adjusted as ∆ti = 1 as in [6] and ∆si is calculated as an Euclidean distance. (iii) If ∆ti ̸ = 0 and ∆si = 0, then ∆si is adjusted as ∆si = 1 as in [6] and ∆ti is calculated as the absolute difference between dates. (iv) If ∆ti ̸ = 0 and ∆si ̸ = 0, then ∆si is calculated as an Euclidean distance and ∆ti is calculated as the absolute difference between dates. 3. Application The aim of this study is to obtain the estimated meteorological information of the agricultural claim data. For this aim, we first extend the DE optimization technique for the selection of the optimal sample data set to be used as an input for STI.
E. Nevruz and Ş.K. Yıldırak / Journal of Computational and Applied Mathematics 352 (2019) 278–292
285
IDW method with reduction technique is an efficient approach to estimate the unknown meteorological values for the points related to the occurred claims. We solve the optimization problem proposing an extension of DE algorithm and obtain the estimates of meteorological values for individual crop-hail insurance claims through IDW with reduction technique using Matlab. For displaying the graphs and obtaining some of the numerical results, we use the tools of ArcGIS software.
3.1. Data TARSIM is a governmental institution taking the responsibility for the development of agricultural insurance in Turkey. TARSIM has the insurance lines such as crop insurance, district based drought yield insurance, greenhouse insurance, cattle insurance, sheep and goat insurance, poultry insurance, aquaculture insurance and beehive insurance. Since 94.8% of the policies of agricultural products are crop insurance policies in 2014, we focus on crop insurance [36]. Crop insurance covers the agricultural products exposed to various sources of risk such as hail, frost, storm and flood. Within the crop insurance products, 42.8% and 50.4% of the causes of the paid losses are hail and frost, respectively. Also, the frost hazard is covered together with the hail hazard. Thus, we study crop-hail insurance in order to analyze risks using a data set which is big enough. Having omitted the canceled policies from the recorded data set, we have 975,971 crop policies (including zero claims) caused by hail in 2014. Since the information about the policy holders was not shared by TARSIM, we need to obtain the individual claims through village codes and lot-block codes, together. After merging the same village codes and lot-block codes, the number of policies decreased to 831,325. Among these policies, the number of the recorded claims arose from hail hazard (including zero claim amounts) is 54,113. The variables used for the application are the location (latitude, longitude and altitude) and the date of the claims. There were some problems we had to overcome in this data set. Firstly, the claim dates and the claim amounts are available in different data sets separately given at different times. Thus, an amount recorded as a paid claim in the ‘‘claim amount data set’’ could be recorded as an outstanding claim in the ‘‘claim date data set’’ since the data set including the claim dates is older than the data set of the claim amounts. Besides, these amounts are not always consistent with each other because the claim amounts can change in time due to the contradictions between claim experts and policy-holders, delays of payments, law courts etc. We have overcome this problem considering each single individual claim. Therefore, we have acquired the claim dates of individual claims checking the consistency of the amounts with insurance amounts and considering more recent data. In addition to the dates, we needed the information about the location where the claims occur. The villages where the policies are written are available in the data set, but the spatial information of these villages were not provided. Therefore, the latitudes, the longitudes and the altitudes of 43,090 villages are obtained using online ‘‘batch geo-coding’’ tools which are compatible with ‘‘ArcGIS Geocode Addresses’’. For the sample data, we use the meteorological data given by Turkish State Meteorological Service. This data is arranged from 01.01.1980 to 31.12.2016 and it is recorded at 415 weather stations at different times. The location (latitude, longitude and altitude) of the climatic values and locations of the weather stations are used for the application. The spatial information of the weather stations are also obtained by geo-coding tools. In the following map, the brown circles indicate the location of the claims whereas the yellow ones are for the locations of 415 meteorological stations. Only unique pairs of latitudes and longitudes are shown in Fig. 2.
Fig. 2. The locations of the claims (brown) and the meteorological stations (yellow). (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
286
E. Nevruz and Ş.K. Yıldırak / Journal of Computational and Applied Mathematics 352 (2019) 278–292
As it can be seen from the map, the claims are mostly centered around the west, the middle and the south of the country due to the fact that agricultural activity is higher in these areas. Another reason of this localization is that education, economic conditions and insurance culture is more improved in these regions causing higher number of policies issued. Hence, in order to estimate the unknown meteorological value of the brown circles by STI, we use the known information of the optimal set of yellow circles chosen among the ones surrounding the yellow circles using DE algorithm. The types of the meteorological data that is interpolated for the claims are the ones suggested by experts in order to see their effects on hail events. The results show us that, low or high, all of the variables presented in Table 1 have influences on the claims.
Table 1 The information of the meteorological variables which are used in the application. No.
Variable name
Unit
Abbreviation
1
Daily average cloud
%
avcloud
98,510
2
Daily average humidity
%
avhum
927,254
3
Daily average pressure
hPa
avpres
828,340
4
Daily average 5 cm soil temperature
◦
C
avsoiltemp5
839,143
5
Daily average 10 cm soil temperature
◦
C
avsoiltemp10
836,883
6
Daily average 20 cm soil temperature
◦
C
avsoiltemp20
833,734
7
Daily average 50 cm soil temperature
◦
C
avsoiltemp50
821,938
8
Daily average temperature
◦
C
avtemp
944,670
9
Size
Daily average vapor pressure
hPa
avvappres
804,448
10
Daily insolation force
cal/cm2
insoforce
169,071
11
Daily maximum temperature
◦
C
maxtemp
957,491
12
Daily minimum surface temperature
◦
C
minsurftemp
848,233
13
Daily minimum temperature
◦
C
mintemp
959,525
14
Daily local 100 cm soil temperature
◦
C
soiltemp100
826,804
15
Daily total evaporation
mm
toteva
226,467
16
Daily total global solar radiation
cal/cm2
totglsolrad
169,077
17
Daily total insolation duration
h
totinso
443,246
18
Daily total precipitation
mm
totprec
490,522
3.2. Results of STI through optimization
It is explained in Section 2.2.4 that the criteria for DE algorithm’s control variables are reasonable when they are chosen as: (i) the population size, 5D ≤ M ≤ 10D; (ii) the scaling factor, F = 0.5; and (iii) the crossover ratio, CR = 0.9 or CR = 1.0 for the fast convergence. Therefore, we determine these factors as D = 4, M = 35, F = 0.5 and CR = 0.9. Since we use initial vectors including 4 sample points to interpolate the unknown value as it is explained in Section 2.1, the condition 5D ≤ M ≤ 10D ≡ 20 ≤ M ≤ 40 results in M = 35 =
(7) 4
, not
(6) 4
= 15 or
(8) 4
= 70. Thus, we first choose 7 closest sample
locations to the unknown point and we test all combinations of these locations in order to find the best 4-dimensional set of sample points for IDW method. In addition to DE algorithm control variables, the choice of the spatial and temporal distance-decay factors have a significant influence on the weights determining the IDW interpolation estimation. As it is inferred from Eq. (2.12), the −us
weights wt ,i are proportional to the power of the inverse distances, i.e. ∆si
−u
and ∆ti t . Each weight wt ,i will be the same
when us = ut = 0. If these power values are very high, only a few closest sample points have impact on the estimation. In the Geostatistical Analyst tool of the software ArcGIS, the power functions are taken as greater than 1. We choose us = ut = 2 as a specific power value which is known as inverse distance squared weighted interpolation. Table 2 displays the average errors and the average coefficient of variations, hence the average cost function values for all meteorological variables.
E. Nevruz and Ş.K. Yıldırak / Journal of Computational and Applied Mathematics 352 (2019) 278–292
287
Table 2 The results for the optimal sample sets obtained with DE algorithm.
]−1
eabs k (X −k )
Variable
[
avcloud avhum avpres avsoiltemp5 avsoiltemp10 avsoiltemp20 avsoiltemp50 avtemp avvappres insoforce maxtemp minsurftemp mintemp soiltemp100 toteva totglsolrad totinso totprec
0.0136311999198039 0.0202279440688607 0.0202329292165670 0.0202288248112240 0.0202409529301215 0.0202294047608725 0.0202305046968018 0.0202310969131479 0.0202297019771295 0.0136119645458333 0.0202283032775291 0.0202275629274641 0.0202310924646778 0.0202303582407469 0.0222221044393762 0.0136114754457011 0.0224475594487593 0.0219899156348328
CV (∆X pw )
costk
1.543939215475710 0.352286642772752 0.352286642772752 0.352286642772752 0.352286642772752 0.352286642772752 0.352286642772752 0.352286642772752 0.352286642772752 1.005640107482880 0.352286642772752 0.352286642772752 0.352286642772752 0.352286642772752 0.510189528948739 1.005640107482880 0.489765811403711 0.398166772751478
0.02104574411017340 0.00748242124104953 0.00748339087379037 0.00748214787525899 0.00748595440992934 0.00748292780284032 0.00748320121422957 0.00748393025997700 0.00748261625960084 0.01368873748892430 0.00748205520362331 0.00748230634162909 0.00748419804962788 0.00748289680393535 0.01093404183469770 0.01368824563021470 0.01096106042882250 0.00870299230111245
pw ) and costk values are obtained by finding the mean of the values that are calculated In this table, eabs k (X −k ), CV (∆X according to the optimal sample sets for each claim location. Firstly, the error term eabs k (X −k ), the coefficient of variation
CV (∆X pw ) and the cost value costk are calculated from Eqs. (2.1), (2.3) and (2.4), respectively. Having computed these values for each unique claim location pair, we find the mean value of each array. The reason why CV (∆X pw ) results are identical is that the coefficient of variation is calculated from the pair-wise distances of the whole sample, not from the leave-one-out sample. Since most of the meteorological variables are measured at the same stations, these results are not surprising. After finding the optimal initial input set of sample points with DE algorithm, the meteorological values are estimated using IDW reduction technique. The basic descriptive statistics (minimum, maximum, mean, standard deviation and coefficient of variation) of the estimated values for each meteorological variable are presented in Table 3.
Table 3 The numerical results of the estimated meteorological variables obtained with STI. Variable avcloud avhum avpres avsoiltemp5 avsoiltemp10 avsoiltemp20 avsoiltemp50 avtemp avvappres insoforce maxtemp minsurftemp mintemp soiltemp100 toteva totglsolrad totinso totprec
Minimum 0.112387 5.418993 99.680297 1.638861 1.887984 1.630357 1.689714 1.590005 0.015913 23.655901 0.000894 −2.686932 −1.659817 1.569048 0.006840 5.797273 0.003298 0.007241
Maximum
Mean
Std.Dev.
Coef.Var
8.212468 85.829103 1011.476578 39.848590 37.653357 36.708751 34.431422 31.636077 26.939887 665.757626 39.117486 26.446437 27.273697 32.764589 11.015796 672.247079 12.184676 14.015522
8.212468 53.007691 818.432825 20.665928 20.015126 19.386182 17.885948 17.631339 11.846548 353.390889 23.452979 10.793316 12.074831 16.000496 3.744394 366.594729 5.677811 1.021892
1.081829 10.615359 143.390523 5.961296 5.789628 5.649834 5.144025 4.640650 4.059395 154.450782 5.536492 4.502987 4.435626 4.586364 1.493412 144.470776 2.374340 0.890617
0.131730 0.200261 0.175201 0.288460 0.289263 0.291436 0.287601 0.263205 0.342665 0.437054 0.236068 0.417201 0.367345 0.286639 0.398839 0.394089 0.418179 0.871537
For an accurate interpretation of the choropleth maps and the intervals specified for the varying colors represented in Figs. 4–6, the statistics given in Table 3 and the histogram graphs in Fig. 3 obtained with ArcGIS software are displayed to present the patterns of the estimations.
288
E. Nevruz and Ş.K. Yıldırak / Journal of Computational and Applied Mathematics 352 (2019) 278–292
Fig. 3. The histograms of the estimated meteorological values.
E. Nevruz and Ş.K. Yıldırak / Journal of Computational and Applied Mathematics 352 (2019) 278–292
289
Fig. 4. Choropleth maps for the estimated meteorological values related to the reported agricultural claims using IDW reduction technique: (a)–(f). (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
According to choropleth maps, the colors scatter with respect to the environmental characteristics of the regions. Therefore, the values gathering together in a region could determine a cluster when one needs to analyze the environmental risk through clustering. 4. Concluding remarks Agricultural insurance risks cannot be analyzed only by actuarial and statistical modeling leaving out the meteorological characteristics of the insured areas. As weather related information is very hard to be measured in accurate terms, we seek for an approach which could enhance and facilitate the imputation of the missing data associated with environmental risks. An example of such approaches could be spatiotemporal technique, a more accurate alternative to spatial methods. We offer an alternative way of obtaining the optimal sample set for STI under a predetermined criteria by providing a multivariate generation for one of the most common optimization approaches in the literature, DE algorithm.
290
E. Nevruz and Ş.K. Yıldırak / Journal of Computational and Applied Mathematics 352 (2019) 278–292
Fig. 5. Choropleth maps for the estimated meteorological values related to the reported agricultural claims using IDW reduction technique: (g)–(l). (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
In this paper, we extend DE algorithm suggesting a computation that measures the spherical distance with an angularbased formula. In this formula, the polar coordinates of the known sample points and the ones to be estimated are considered. We also suggest a basic solution for determining the population size which is one of the decisive control variables of the DE optimization. Having proposed an extension for DE algorithm, we take into account of the influence of altitudes using 2-dimensional locations (latitudes and longitudes) all of which are the most significant determinants of weather related risks. The analytic tools may not always be sufficient for the risk management especially when we are interested in environmental risks. In such cases, considering the geographic information of the relevant data provides us an alternative way. Hence, we can use GIS as a tool for risk assessment in such cases. The results of the DE optimization and the STI estimations are represented with statistical measures and choropleth maps.
E. Nevruz and Ş.K. Yıldırak / Journal of Computational and Applied Mathematics 352 (2019) 278–292
291
Fig. 6. Choropleth maps for the estimated meteorological values related to the reported agricultural claims using IDW reduction technique: (m)–(r). (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
Acknowledgments The authors would like to thank General Directorate of Agricultural Research and Policies (TAGEM), Ministry of Food, Agriculture and Livestock, Turkey for providing the data and funding our project (FUK-2015-6321). They are also grateful to the anonymous referee(s) who have contributed to the development of the contents with helpful suggestions. References [1] TARSIM, Agricultural Insurance Pool: Annual Report, Tech. rep., 2016. Available at https://web.tarsim.gov.tr/havuz/homePageEng. [2] K.A. Eldrandaly, M.S. Abu-Zaid, Comparison of six GIS-based spatial interpolation methods for estimating air temperature in Western Saudi Arabia, J. Environ. Informat. 18 (1) (2011) 38–45. [3] R. Murphy, F. Curriero, W. Ball, Comparison of spatial interpolation methods for water quality evaluation in the Chesapeake Bay, J. Environ. Chem. Eng. 136 (2) (2010) 160–171.
292
E. Nevruz and Ş.K. Yıldırak / Journal of Computational and Applied Mathematics 352 (2019) 278–292
[4] M. Ninyerola, X. Pons, J.M. Roure, Objective air temperature mapping for the Iberian Peninsula using spatial interpolation and GIS, Int. J. Climatol. 27 (2007) 1231–1242. [5] Y. Xie, T. Chen, M. Lei, J. Yang, Q. Guo, B. Song, X. Zhou, Spatial distribution of soil heavy metal pollution estimated by different interpolation methods: Accuracy and uncertainty analysis, Chemosphere 82 (3) (2011) 468–476. [6] F. Susanto, P. de Souza, He J., Spatiotemporal interpolation for environmental modelling, Sensors 16 (8) (2016) 1–20. [7] S.A. Hejazi, K.R. Jackson, Efficient valuation of SCR via a neural network approach, J. Comput. Appl. Math. 313 (2017) 427–439. [8] K. Price, R.M. Storn, J.A. Lampinen, Differential Evolution — A Practical Approach to Global Optimization, Springer, Berlin, 2005. [9] R. Joshi, A.C. Sanderson, Minimal representation multisensor fusion using differential evolution, IEEE Trans. Syst. Man Cybern. A 29 (1) (1999) 63–76. [10] F.S. Wang, H.J. Jang, Parameter estimation of a bio-reaction model by hybrid differential evolution, in: Proceedings of the 2000 Congress on Evolutionary Computation, CEC00 (Cat. No.00TH8512), La Jolla, CA, USA (2000) pp. 410–417. [11] S. Das, A. Abraham, U.K. Chakraborty, A. Konar, Differential evolution using a neighborhood-based mutation operator, IEEE Trans. Evol. Comput. 13 (3) (2009) 526–553. [12] X. Li, M. Yin, Application of differential evolution algorithm on self-potential data, PLoS ONE 7 (12) (2012) 1–11. [13] T.J. Choi, C.W. Ahn, An improved differential evolution algorithm and its application to large-scale artificial neural networks, J. Phys. Conf. Ser. 806 (1) (2017) 1–5. [14] L. Li, T. Losser, C. Yorke, R. Piltner, Fast inverse distance weighting-based spatiotemporal interpolation: A web-based application of interpolating daily fine particulate matter PM2.5 in the contiguous U.S. using parallel programming and k-d tree, Int. J. Env. Res. Public Health 11 (9) (2014) 9101–9141. [15] W.R. Tobler, A computer movie simulating urban growth in the Detroit region, Econ. Geogr. 46 (1970) 234–240. [16] L. Li, P. Revesz, A comparison of spatio-temporal interpolation methods, in: Proceedings of the Second International Conference on GIScience, Honolulu, Hawaii, USA (2002). [17] L. Li, P. Revesz, Interpolation methods for spatio-temporal geographic data, Comput. Environ. Urban Syst. 28 (3) (2004) 201–227. [18] L. Li, Constraint databases and data interpolation, in: S. Shekhar, H. Xiong (Eds.), Encyclopedia of Geographic Information System, Springer, Berlin/Heidelberg, Germany, 2008, pp. 144–153. [19] P. Revesz, S. Wu, Spatiotemporal reasoning about epidemiological data, Artif. Intell. Med. 38 (2006) 157–170. [20] H.L. Yu, C.H. Wang, Quantile-based Bayesian maximum entropy approach for spatiotemporal modeling of ambient air quality levels, Environ. Sci. Technol. 47 (3) (2013) 1416–1424. [21] L. Li, X. Zhang, R. Piltner, A spatiotemporal database for ozone in the conterminous U.S., in: Proceedings of the IEEE Thirteenth International Symposium on Temporal Representation and Reasoning, Budapest, Hungary (2006) pp. 168–176. [22] J. Li, A.D. Heap, A Review of Spatial Interpolation Methods for Environmental Scientists, Tech. rep., Geoscience Australia, Australian Government, 2008. [23] P.A. Burrough, R.A. McDonnell, Principles of Geographical Information Systems, Oxford University Press, New York, 1998. [24] J. Li, A.D. Heap, Spatial interpolation methods applied in the environmental sciences: A review, Environ. Model. Softw. 53 (2014) 173–189. [25] V. Capozzi, E. Picciotti, V. Mazzarella, G. Budillon, F.S. Marzano, Hail storm hazard in urban areas: Identification and probability of occurrence by using a single-polarization X-band weather radar, Hydrol. Earth Syst. Sci. Discuss. (2016) 1–22. [26] R. Storn, K. Price, Differential evolution—A simple and efficient heuristic for global optimization over continuous spaces, J. Global Optim. 11 (4) (1997) 341–359. [27] R.M. Lewis, V. Torczon, M.W. Trosset, Direct search methods: Then and now, J. Comput. Appl. Math. 124 (1–2) (2000) 191–207. [28] A. Cohn, K. Scheinberg, L. Vicente, Introduction to Derivative-Free Optimization, in: MOS/SIAM Series on Optimization, Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA, USA, 2009. [29] E. Baeyens, A. Herreros, J.R. Perán, A direct search algorithm for global optimization, Algorithms 9 (2) (2016) 1–22. [30] E. Nevruz, Multivariate stochastic prioritization of dependent actuarial risks under uncertainty (Ph.D. Thesis), Department of Actuarial Sciences, Graduate School of Science and Engineering, Hacettepe University, Ankara, Turkey, 2018. [31] A.W. Mohamed, H.Z. Sabry, M. Khorshid, An alternative differential evolution algorithm for global optimization, J. Adv. Res. 3 (2) (2012) 149–165. [32] R. Gämperle, S.D. Müller, P. Koumoutsakos, A parameter study for differential evolution, in: A. Grmela, N. Mastorakis (Eds.), Advances in Intelligent Systems, Fuzzy Systems, Evolutionary Computation, WSEAS Press, 2002, pp. 293–298. [33] J. Rönkkönen, S. Kukkonen, K. Price, Real-parameter optimization with differential evolution, IEEE Congress on Evolutionary Computation, Edinburgh, Scotland, UK (2005) pp. 506–513. [34] L. Li, Spatiotemporal interpolation methods in GIS (Ph.D. Thesis), The University of Nebraska, Lincoln, NE, USA, 2003. [35] L. De Mesnard, Pollution models and inverse distance weighting: Some critical remarks, Comput. Geosci. 52 (2013) 459–469. [36] Ş. Şahin, U. Karabey, B. Bulut Karageyik, E. Nevruz, Ş.K. Yıldırak, Actuarial premium calculation for wheat crop insurance in Turkey, Turk. J. Agric. Econ. 22 (2) (2016) 37–47.