The application of 0–1 mixed integer nonlinear programming optimization model based on a surrogate model to identify the groundwater pollution source

The application of 0–1 mixed integer nonlinear programming optimization model based on a surrogate model to identify the groundwater pollution source

Journal of Contaminant Hydrology xxx (xxxx) xxx–xxx Contents lists available at ScienceDirect Journal of Contaminant Hydrology journal homepage: www...

1MB Sizes 2 Downloads 36 Views

Journal of Contaminant Hydrology xxx (xxxx) xxx–xxx

Contents lists available at ScienceDirect

Journal of Contaminant Hydrology journal homepage: www.elsevier.com/locate/jconhyd

The application of 0–1 mixed integer nonlinear programming optimization model based on a surrogate model to identify the groundwater pollution source Jia-yuan Guoa,b, Wen-xi Lua,b, , Qing-chun Yanga,b, Tian-sheng Miaoa,b ⁎

a b

Key Laboratory of Groundwater Resources and Environment of Ministry of Education, Jilin University, Changchun 130021, China College of New Energy and Environment, Jilin University, Changchun 130021, China

ARTICLE INFO

ABSTRACT

Keywords: 0–1 mixed integer nonlinear programming Groundwater pollution source identification Groundwater solute transport Surrogate model

The optimization model is presently used for the identification of pollution sources and it is based on non-linear programming optimization. The decision variables in this model are continuous, resulting in a weak recognition of integer variables including pollution source location. In addition, as the number of pollution sources increase, so the calculated load increases exponentially and accuracy decreases. Compared with previous studies, this study makes a series of improvements by adopting a 0–1 mixed integer nonlinear programming optimization model to enable the simultaneous identification of both location (integer variable) and the release intensity (continuous variable) of the pollution source. One of the constraints in the optimization model is a simulation component which requires thousands of calls during the calculation process and therefore requires considerable computational load. To avoid this problem, the Kriging surrogate model is established in this study to reduce computational load, while at the same time ensuring the accuracy of the simulation results. The identification result is solved using a genetic algorithm (GA) and represents the real location of the pollution source, while release intensities are close to actual ones with small relative errors. The Kriging surrogate model is based on a 0–1 mixed integer nonlinear programming optimization model and can simultaneously identify both the location and the release intensity of the pollution source with a high degree of accuracy and by using short computational times.

1. Introduction Evaluating the risks associated with groundwater contamination usually requires knowledge of the characteristics of the contamination source, including the number and location of sources, release history and intensity and mass of contaminant. These characteristics are often not well known, creating great in the groundwater pollution rehabilitation program design, risk assessment, and the identification of parties responsible for pollution (Snodgrass and Kitanidis, 1997; Khan et al., 2004; Lapworth et al., 2012; Om and Bithin, 2013). Identification of the groundwater pollution source is particularly important. The identification of groundwater pollution sources is an inverse problem. There has been a long history of research and application to the forward problem but limited in the inverse problem. Since its rise in the 1960s, both the theory and application of this problem remain in a developmental stage (Li and Yao, 2014), and most of the problems so far addressed are non-linear and ill-conditioned. One reason for this is



that when the inverse problem is solved, the amount of information known is far less than the amount of information sought and this makes the problem difficult (Tikhonov and Arsenin, 1978; Yeh, 1986; Mahinthakumar and Sayeed, 2005). This problem is nevertheless gaining more and more attention due to its extensive and important application and future prospects. A number of applications have been shown in practice, including computed tomography (CT) imaging for medical diagnosis. Several comprehensive reviews on the simultaneous identification of release intensity and pollution source positioning have been published over the last 10 years; for example, Mahinthakumar and Sayeed (2005) proposed a simulation-optimization method based on a hybrid genetic-local search algorithm in order to identify the release intensity and the location of non-point pollution source (Mahinthakumar and Sayeed, 2005). In similar work, Mirghani et al. (2012) used a simulation-optimization method based on a parallel evolutionary strategy to identify both release intensity and pollution source location (Mirghani

Corresponding author at: College of New Energy and Environment, Jilin University, Changchun 130021, China. E-mail address: [email protected] (W.-x. Lu).

https://doi.org/10.1016/j.jconhyd.2018.11.005 Received 19 March 2018; Received in revised form 31 October 2018; Accepted 12 November 2018 0169-7722/ © 2018 Elsevier B.V. All rights reserved.

Please cite this article as: Guo, J.-y., Journal of Contaminant Hydrology, https://doi.org/10.1016/j.jconhyd.2018.11.005

Journal of Contaminant Hydrology xxx (xxxx) xxx–xxx

J.-y. Guo et al.

et al., 2012), while Ayvaz (2010) used a simulation-optimization method based on the harmony search algorithm to synchronously identify the two variables (Ayvaz, 2010). All of these studies used both the location and release intensity of pollution source as continuous variables, while the recognition effect is acceptable when we uniformly treat two different types of variables in the process of simultaneous identification. There is also a significant magnitude of difference between position coordinates and the amounts of pollution released from a source and so the reliability of results is open to question. The 0–1 mixed integer nonlinear optimization model can be used to address this problem as it contains both discrete integer variables (i.e., the location of the pollution source) and continuous variables (i.e., the release intensity of the pollution source), it excludes unreal sources to reduce the number of variables, and it recognizes the release intensity of the real pollution source. The simulation-optimization method utilizes the optimization principle to solve the problem of pollution source identification, and it is presently one of the most complete and widely used mathematical methods in this area. This method also requires tens of thousands of iterations in order to invoke the groundwater solute transport simulation model; thus, because of the relatively large amount of computation and the calculation time involved in the running of the simulation model, both a large calculation load and long calculation time are required to generate an iterative solution (Ayvaz, 2010; Mirghani et al., 2012). Thus, establishing a surrogate of a simulation model is an effective way to reduce computation time (Singh et al., 2004; Kourakos and Mantoglou, 2013) and has a similar function. In other words, given the same input, outputs from the surrogate model are very close to the outputs of the simulation model in the first place and also has much less computational load and takes much less time (Queipo et al., 2005; Sreekanth and Datta, 2010). Such a model is advocated in this paper. This research is important because limited research has so far been carried out on the simultaneous identification of release intensity and pollution source location using a surrogate model that is based on the 0–1 mixed integer nonlinear programming simulation-optimization approach.

function as follows: n

R (x i , x j ) =

Y (x ) = f T

j

+ Z (x ) =

+Z

(4)

f )

(5) 2

In this expression, the estimated value of variance σz is as follows: 2 z

= ({y }

[F ]{ })T [R] 1 ({y }

[F ]{ })/n

(6)

Thus, the parameter θk is given according to the maximum likelihood estimate, in other words when θk > 0, the 1 2 ( n ln( ) + ln |[ R ]|) is maximized, and R is the correlation matrix z 2 between m sample points, as follows:

R=

R (x1 , x1) … R (x1 , xm ) … … … R (xm , x1) … R (xm , xm )

(7)

2.2. The optimization model The establishment of an optimization mathematical model is the basis for solving problems in this area. In addition, decision variables, objective functions, and constraints comprise the three basic elements of the optimization problem mathematic model (Yan et al., 2008; Huang and Meng, 2009). The objective function is a mathematical expression of the optimization goal expressed by the decision variable, and thus the evaluation criterion for the quality of a program. Constraints comprise the values of the decision variable in optimization, and can be applied either as equality, or inequality constraints. In a mathematical model, if some variables are integers, some are continuous, and there is a nonlinear constraint or a nonlinear objective function, such a problem is called mixed integer nonlinear programming. This is expressed as follows:

min f (X ) s. t .

gi (X )

0, i = 1, 2, …, m

hj (X ) = 0, j = 1, 2, …, t

(8) T

In this expression, X = (x1, x2, …, xn) is the decision variable, including some discrete integer variables and some that are continuous, while f(X) is the objective function, gi(X) is the inequality constraint, m is the total number of inequality constraints, hj(X) is the equality constraint, and t is the total number of equality constraints. As the discrete integer variable only takes 0 or 1, this model is a 0–1 mixed integer nonlinear programming model.

(1)

In this expression, fj(x) = [f1(x), f2(x), …, fk(x)] is a deterministic function, while βj = [β1, β2, …, βk]Tdenotes the corresponding undetermined parameter estimated by the response value, and Z(x) denotes the statistical stochastic process with a zero mean. Thus, the variance of Z(x) is σ2, as follows:

cov[Z (x i ), Z (xj )] =

+ r T (x ) R 1 (y

= ([F ]T [R] 1 [F ]) 1 [F ]T [R] 1 {y }

T

2 Z R (xi, xj )

(3)

In this expression, r (x) is the correlation vector between point x and m sample points rT(x) = [R(x, x1), R(x, x2), …, R(x, xm)], while y is the m × 1 vector which contains the response value corresponding to m sample points, and β∗ can be obtained from the optimal linear unbiased estimation, as follows:

k

f j (x )

x jk |pk ]

T

The Kriging method is an optimized interpolation approach that was invented by the French geographer and mathematician Matheron and the South African mining engineer, Krige. In early work, Sacks et al. (1989) used the Kriging method to establish surrogate models for the first time (Sacks et al., 1989), and this approach has since been used widely in surrogate modeling (Ryu et al., 2002; Kleijnen and van Beers, 2005; Hemker et al., 2008; Coetzee et al., 2012). The relationship between the response value and the independent variable in the approximate expression of the Kriging model can be expressed as follows:

j=1

|x ik

In this expression, n is the number of design variables, xi is the kdimensional coordinate of the i − th sample point, θk and pkare undetermined coefficients, θk ≥ 0, 0 ≤ pk ≤ 2, and k is the dimension of x. Thus, the response [y] = (y1, …, yn)T of known n sample points [x] = ({x1}T, …, {xn}T)T can be predicted to be the response of any point {x∗}, as follows:

2.1. The principle of the Kriging model

Y (x ) =

k

k

2. Methodology

fT

[ k=1

2.3. Genetic algorithm (GA)

(2)

A GA is a method for calculating an optimal solution by simulating natural biological evolution and genetic processes. This algorithm starts from any population set and generates a new one via the random use of genetic operators including selection, crossover, and variation. Individuals in a population are more adaptable to the environment than

In this expression, R(xi, xj) denotes the spatial correlation function between the two sample points xi and xj that are themselves related to just the distance between xi and xj, independent of its position (Rosenbaum and Schulz, 2012). Thus, R(xi, xj) usually uses a Gaussian 2

Journal of Contaminant Hydrology xxx (xxxx) xxx–xxx

J.-y. Guo et al.

the above; thus, through the generation of evolving reproduction, the new population will gradually evolve into the search space gradually getting better and better, before finally converging to the most suitable population for the environment, in which the individual is the optimal solution, as follows:

One pollution source and five observation wells are present in this aquifer, and the entire simulation time applied was three years. This area is also divided into three release periods of which each has a duration of one year. The source is assumed to release pollutants during the three release periods; thus, Table 1 and Table 2 exhibits aquifer parameters as well as pollution source characteristics. The flowchart summarizing the approach is shown in Fig. 2.

2.3.1. Coding In practical applications, the solution of a problem is generally a real one and these data (x) are the expression of the GA. Thus, converting real solution data into a binary form is called encoding, and the GA converts data of the solution space into genotype string structure of the genetic space before searching. The different combinations of these string structure data therefore constitute different points.

3.2. The mathematical simulation model Using a conceptual hydrogeological model within the study area, the mathematical model of groundwater flow and groundwater solute transport was therefore established using Eq. (9):

2.3.2. Initial population production In randomly generated N initial string structure data, each string structural component is referred to as an individual, N of which comprise a group. The GA therefore starts the iteration with the N string structure datum as an initial point, so the evolutionary algebra counter is t and the maximum evolutionary algebra is T. A randomly generated number of M individuals are referred to as the initial population, P(0).

x

K (H

z)

H x

+

y

K (H

H y

z)

+W=µ

H t

(x , y )

D

H (x , y, t ) | t = 0 = H0 (x , y ) (x , y ) D H K | S = q (x , y , t ) (x , y ) S1 n 1 H |S = 0 (x , y ) S2 n 2 (9)

2.3.3. The self-fitness evaluation test A fitness function is used to show the merits of an individual or solution. Thus, for different problems, a fitness function is defined in different ways; according to the specific problem, the fitness of each individual in the population P(t) is calculated in this step.

In this expression, D is the area for simulation computation, H is the groundwater table head (m), H0 is the initial water table head (m), z is the elevation of target aquifer floor (m), K is hydraulic conductivity (m d−1), μ is specific yield (dimensionless), W is vertical recharge including the discharge strength of the unconfined aquifer (m d−1), q is the recharge and discharge quantity of aquifer per unit width (m d−1), S1 is the second-type boundary condition, S2 is the no-flux boundary, n is the direction of outward normal on the boundary, and t is the simulation period. The mathematical model of groundwater solute transport can therefore be described by Eq. (20), as follows:

(1) Selection: In this step, the selection operator is applied to the group. (2) Crossover: In this step, the crossover operator is applied to the group. (3) Variation: In this step, the variation operator is applied to the group. The group, P(t), is calculated to obtain the next generation group P(t + 1) after selection, crossover, and variation. (4) Termination condition judgment: In this step, t ≤ T, then t becomes t + 1 and we proceed to step (2), while if t > T, then the maximum fitness of an individual is taken as the optimal solution output obtained through the evolution of the process and then the operation is terminated.

c c c = Dxx + Dyy t x x y y c (x , y, t ) | t = 0 = c0 (x , y ) nDxx

3. Case study

c y

3.1. Site overview

S2

(x , y )

c + cvx | S1 = g (x , y, t ) x

=0

(x , y )

(cu x ) x

(cu y ) y

+I

D (x , y )

S1

S2

(10)

In this expression, C is the concentration of groundwater solute (mg L−1), C0 is the initial concentration of groundwater solute (mg L−1), Dxx and Dyy are the hydrodynamic diffusion coefficients in the x and y directions, respectively (m2 d−1), ux and uy are the components of the actual average flow rate in the x and y directions, respectively (m d−1), I is the source and sink in terms of the unit time per unit liquid volume within the amount of solute quality increase or decrease, g represents the solute mass per unit time through the actual cross section of the boundary, and n is the direction of the outward normal on the S1 boundary.

The study area used in this research is an example of a human hypothesis. Thus, if different from the direct study of practical examples, two relative processes are involved in this hypothetical example of which one is the forward process of knowing the characteristics of groundwater pollution sources which also enables prediction of the distribution of pollutant concentrations at each observation well. The second comprises an inverse process that identifies the characteristics of groundwater pollution sources based on the actual monitoring data of pollutant concentrations at the observation wells. The advantage of this hypothetical example is that a large amount of time is not required to study the complex conditions of the actual contaminated site, eliminating the need to establish an identification process and verify the mathematical simulation model, directly providing the conditions and parameters of the solution. It is possible to emphasize both the key and difficult issues that need to be addressed during the process of identifying pollution sources. The system studied in this research comprised a two-dimensional domain discretized into 9276 grid blocks. As shown in Fig. 1, a secondtype boundary condition is also assigned to the AD and BC boundaries, respectively, while the others are set as no-flux. As the terrain of the study area is high in the southeast and low in the northwest, groundwater flow also occurs in this direction.

3.3. The Kriging surrogate model of the simulation model The surrogate model is an alternative to the input-output response of the mathematical simulation model, it has a small computational load and takes less time to process. Therefore, use of a surrogate model can effectively simplify the calculations inherent to most mathematical simulation models. In this study, the release intensities of pollution source were selected as input variables, including the contaminant concentrations of observation wells as output variables in the surrogate model. A training set of 100 samples and a test set of 20 samples were therefore selected in feasible regions of three input variables via the Latin hypercube 3

Journal of Contaminant Hydrology xxx (xxxx) xxx–xxx

J.-y. Guo et al.

Fig. 1. A conceptual model of the study area.

approximate accuracy of the Kriging surrogate model, as follows: Mean Relative Error (MRE) was calculated as follows:

Table 1 Pollution source release intensity at three periods. Pollution source

S

Release intensity of pollution source at each period(mg/d) SP1

SP2

SP3

115

210

157

MRE =

1 n

n

|yi

yi | yi

i=1

× 100%

(11)

Root Mean Square Error (RMSR) was calculated as follows: n

Table 2 Aquifer parameters.

RMSR =

Parameters

Values

Hydraulic conductivity, K(m/d) Specific yield, μ Effective porosity, n Longitudinal dispersivity, αL(m) Transverse dispersivity, αT(m) Initial concentration (mg/L)

10 0.15 0.3 30 10 115

i=1

yi )2

(yi n

(12)

In these expressions, yi represents the output value of the i sample obtained from the solute transport simulation model, while yi is the output value of the i sample derived from the Kriging surrogate model. In other words, the smaller the MRE and RMSE values, the higher the approximation of the surrogate model; thus, this surrogate model replaces the simulation model when accuracy is achieved. 3.4. The 0–1 mixed integer nonlinear programming optimization model

sampling method, and corresponding output variables were obtained by running the mathematical simulation model. Thus, based on training samples, the parameters which are the key to the Kriging surrogate model are estimated by using GA; when fitness values converge, the corresponding values of θ can be taken as the final results and each observation well corresponds to the set. Once a surrogate model has been established, it cannot be directly used to replace the actual simulation model, but also used to verify the approximate accuracy of the surrogate model with the set of 20 samples. We used two evaluation indexes in this study to analyze the

The identification of groundwater pollution sources is based on limited data. Thus, via the comprehensive utilization of various mathematical methods, simulation models to describe groundwater pollution can be inversely solved (in this case, the surrogate model replaces the original simulation model) to identify the relevant source information in the aquifer. Information about groundwater source therefore includes the number of sources, locations, and release intensity. This information not only includes integer variables, such as the number of sources, locations, and continuous variables, such as release intensity. This study uses a 0–1 mixed integer nonlinear programming 4

Journal of Contaminant Hydrology xxx (xxxx) xxx–xxx

J.-y. Guo et al.

Fig. 2. Flowchart of the simulation approach.

mathematical simulation model for groundwater solute transport. The optimization model is expressed as follows:

Table 3 The parameters of the Kriging surrogate model. Positions of pollution sources

P1

P2

P3

P4

Observation wells

O1 O2 O3 O4 O5 O1 O2 O3 O4 O5 O1 O2 O3 O4 O5 O1 O2 O3 O4 O5

Parameter values

12

θ1

θ2

θ3

0.1789 0.0064 0.7132 0.2658 0.4963 0.1387 0.0982 0.2374 0.7549 0.3247 0.2974 0.0128 0.9786 0.6593 0.5492 0.2668 0.0315 0.2065 0.1653 0.2793

0.1654 0.0089 0.0459 0.0436 0.0519 0.0651 0.0246 0.0283 0.0343 0.0221 0.0516 0.0045 0.0124 0.0475 0.6147 0.0622 0.0049 0.0998 0.2315 0.1968

0.0528 0.0025 0.0012 0.0064 0.0027 0.0023 0.0019 0.0012 0.0005 0.4574 0.0453 0.0459 0.0063 0.0037 0.0469 0.0162 0.0190 0.3116 0.0064 0.0637

min z ( 1 , 2,…,

7 , t, k)

5

=

[(Ckt

t

C k )2]

t=1 k=1

= 0 or 1 , 1+…+ 4 = 1 Ckt = f ( 1 , 2,…, 0 < m < 300, j

j = 1, 2, 3, 4 7 , t, k)

m = 5, 6, 7

(13) t Ck

denote simulated values and obIn this expression, Ckt and servations of the contaminant concentration at time t in the observation well, k. respectively, while σ1, σ2, σ3, σ4 are variables of positions, σ5, σ6, σ7 are variables of release intensities, and f is the Kriging surrogate model of groundwater solute transport simulation model. 4. Results and discussion 4.1. The accuracy of the Kriging surrogate model The Kriging surrogate model was incorporated into a Matlab GA to solve the inverse pollution source problem. The GA was used to search for a release intensity that minimizes discrepancies between actual observations and model calculations, before a series of coefficients of gauss functions θ were applied with determine the precision of the surrogate model (Table 3). To investigate the validity of the surrogate model, 20 sets of test samples were introduced into the groundwater solute transport numerical simulation model and its surrogate counterpart, respectively. Thus, taking the P1 position as an example, the outputs and relative error of the pollutant concentration of the simulation and surrogate model are shown in Fig. 3 and Table 4, respectively. The results of the surrogate and numerical simulation models for groundwater solute transport are compared with the evaluation indexes including MRE and RMSR (Table 5). The data presented in Fig. 3 and Table 4 show that the relative error of the observation wells fall between 0.0120% and 2.8684%, less than

optimization model to achieve the inverse identification of pollution source. Four potential locations of pollution source, P1-P4, were defined randomly, of which P3 is the real location of pollution source, before decision variables were determined. The variables that represent pollution source locations were used as 0–1 integer variables, while the release intensities of the pollution source at each time are used as continuous ones. Seven variables were used for the 0–1 mixed integer nonlinear programming optimization model before it was established to include the objective function and constraint conditions. The objective function is expressed as the minimum error between the actual concentration and the simulated concentration of the observation, while constraint conditions included the emission intensity of pollution source, their number and location, and the surrogate model of 5

Journal of Contaminant Hydrology xxx (xxxx) xxx–xxx

J.-y. Guo et al.

Fig. 3. Test outputs of the simulation and surrogate models in position P1, while a, b, c, d, and e represent the observation wells O1, O2, O3, O4, and O5, respectively.

3%. Thus, as shown in Table 5, the MRE of the five positions are 1.0085%, 0.7652%, 0.1547%, 1.2351%, and 0.4862%, which shows that the computed pollutant concentration based on the Kriging surrogate model lie very close to the simulation model. At the same time, there is no significant difference according to the RMSE of 20 validation samples from each observation well. From Table 6, we can figure out that the relative errors of the whole optimization results by using the surrogate model and the simulation model respectively are both small. Although the calculation accuracy of the surrogate model is lower than that of the simulation model, it is not much different. Therefore, the surrogate model can be used for iterative calculations instead of the simulation model. Calculating each simulation model using GMS took 22 s. Therefore,

during the process of solving the optimization model of groundwater pollution source identification, if the numerical simulation version is called directly then the time required for the optimization algorithm to iterate 6000 times is about 33.33 h while the Kriging surrogate just required 0.84 h. It is noteworthy that although a considerable amount of time has been spent on training, developing, and testing the Kriging surrogate model, less time was spent than on the simulation calls and also has clear advantages. The analysis presented in this paper demonstrates that the Kriging surrogate model can be effectively utilized in pace of a numerical simulation version as it requires less computational and does not entail a sacrifice in accuracy.

6

Journal of Contaminant Hydrology xxx (xxxx) xxx–xxx

J.-y. Guo et al.

Table 4 The relative errors of the five observation wells at position P1. Test samples

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20

Relative errors of the five observation wells (%) O1

O2

O3

O4

O5

0.4123 0.3877 0.3609 0.0630 0.6100 1.0179 0.3171 1.0300 1.1590 0.3588 0.1130 0.1830 0.0740 0.8250 0.0350 0.0508 0.0670 0.0935 0.3135 0.7190

0.4472 1.4590 1.0934 1.8014 1.5066 0.6576 2.6510 2.5250 1.7300 2.8282 1.9443 0.5540 2.7130 0.7180 1.0280 2.0043 1.4077 2.3810 0.4835 0.6218

2.4823 1.2960 2.5820 0.0150 0.6500 0.7482 1.8540 0.2280 2.3170 2.6914 2.4980 1.0310 1.0770 1.8156 0.8727 1.5106 1.9900 2.2780 0.6410 0.9810

0.2167 0.2589 0.2224 0.2943 0.2975 0.2731 0.2468 0.3222 0.1938 0.2468 0.2416 0.2291 0.2849 0.4826 0.2456 0.3890 0.2661 0.3350 0.4280 0.2834

0.7539 0.8210 1.0270 0.0330 1.4317 2.8684 1.9100 0.0120 1.9260 1.2260 0.3400 2.3080 2.6350 1.6155 0.5941 1.0320 0.8200 2.3510 2.1972 0.8879

Fig. 4. Absolute errors between real release and identified release intensity. Table 7 Pollution source identification results. Position P1 0

MRE

RMSR

P1 P2 P3 P4

1.0085% 0.7652% 0.1547% 1.2351%

0.8531 0.5147 1.2562 0.4619

Table 6 The relative errors of the whole optimization results by using the surrogate model and the simulation model respectively. Test samples

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20

Real release intensity

106.00 130.00 251.00 256.00 141.00 272.00 146.00 158.00 127.00 204.00 179.00 263.00 271.00 161.00 235.00 112.00 109.00 137.00 198.00 272.00

277.00 166.00 150.00 81.00 131.00 251.00 97.00 114.00 178.00 160.00 92.00 102.00 196.00 135.00 204.00 183.00 123.00 115.00 85.00 251.00

284.00 177.00 213.00 72.00 44.00 89.00 105.00 92.00 226.00 114.00 57.00 81.00 99.00 107.00 145.00 269.00 158.00 120.00 76.00 89.00

The relative error of simulation model optimization results(%)

The relative error of surrogate model optimization results(%)

1.40 3.24 3.86 1.38 4.82 5.09 1.81 2.56 2.70 3.49 5.06 3.27 2.41 3.42 1.19 2.73 2.31 1.83 3.55 3.33

1.92 3.84 4.28 1.83 5.37 5.76 2.04 2.97 3.17 3.79 5.72 3.84 2.90 4.05 2.37 2.90 3.05 2.14 3.78 3.61

P2 0

P3 1

P4 0

SP1 113.95

SP2 212.74

SP3 157.23

in Table 6 and Fig. 4. The results presented in Table 7 show that just the identification of P3 is 1, while the rest are 0; this analysis indicates that the exact location of pollution source is P3, while the intensities of the three release periods are 113.95 (mg/d), 212.74 (mg/d), and 157.23 (mg/d), respectively. Thus, as shown in Fig. 4, the maximum absolute error of identification result is 2.74 (mg/d), while absolute errors between the real release intensity and the identified release intensity are small. Enhanced identification results show that the 0–1 mixed integer nonlinear programming optimization model has clear applicability to identify both the location and release intensity of the pollution source.

Table 5 Evaluation accuracy of the surrogate model. Position

Release intensity (mg/d)

5. Conclusions Using the simulation-optimization method, this paper presents a comprehensive application of mathematical methods including the 0–1 mixed integer nonlinear programming optimization model, the Kriging method, and a GA to identify the location and release intensity of the pollution source. Several conclusions are therefore clear from this research. First, the results of this study show that the surrogate model established via the Kriging method can be used to replace the actual groundwater solute transport simulation model. This approach avoids huge computational load, and it also has a high fitting accuracy based on the simulation model. This method can overcome the limitations of the conventional coupling technique and greatly reduces the computational load caused by directly invoking the simulation model in the solving process of the optimization model as well as saving a great deal of time. This approach also presents a clear solution to the problem presented in this study from both potential and practical aspects. Second, a 0–1 mixed integer nonlinear programming optimization model contains both integer and continuous variables and it also reduces the number required to improve the accuracy of identification results. Therefore, the 0–1 mixed integer nonlinear programming optimization model can be used to simultaneously identify the location and release intensities of the groundwater pollution source. Although location identification is accurate and the errors of release intensities are small, this approach effectively realizes the simultaneous identification of both location and release intensities and thus avoids a limited ability to identify integer variables inherent to former non-linear optimization models including the location of pollution source.

4.2. Results of pollution source identification analysis A GA approach was used to solve the 0–1 mixed integer nonlinear programming optimization model, with the set main parameters including a population size of 60, fitness scaling ranked, stochastic and uniform selection, mutation set as constraint dependent, crossover scattered, and forward migration. The results of this analysis are shown 7

Journal of Contaminant Hydrology xxx (xxxx) xxx–xxx

J.-y. Guo et al.

Acknowledgements

Application. Science Press, Beijing. Mahinthakumar, G.K., Sayeed, M., 2005. Hybrid genetic algorithm–local search methods for solving groundwater source identification inverse problems. J. Water Resour. Plan. Manag. 131 (1), 45–57. Mirghani, B.Y., Zechman, E.M., Ranjithan, R.S., 2012. Enhanced simulation-optimization approach using surrogate modeling for solving inverse problems. Environ. Forensic 13 (4), 348–363. Om, P., Bithin, D., 2013. Sequential optimal monitoring network design and iterative spatial estimation of pollutant concentration for identification of unknown groundwater pollution source locations [J]. Environ. Monit. Assess. 185 (7), 5611–5626. Queipo, N.V., Haftka, R.T., Shyy, W., et al., 2005. Surrogate-based analysis and optimization. Prog. Aerosp. Sci. 41 (1), 1–28. Rosenbaum, B., Schulz, V., 2012. Comparing sampling strategies for aerodynamic Kriging surrogate models. ZAMM-Journal of Applied Mathematics and Mechanics/Zeitschrift für Angewandte Mathematik und Mechanik 92 (11-12), 852–868. Ryu, J.-S., Kim, M.-S., Cha, K.-J., et al., 2002. Kriging interpolation methods in geostatistics and DACE model. KSME Int. J. 16 (5), 619–632. Sacks, J., Welch, W.J., Mitchell, T.J., et al., 1989. Design and analysis of computer experiments. Stat. Sci. 4 (4), 409–423. Singh, R.M., Datta, B., Jain, A., 2004. Identification of unknown groundwater pollution sources using artificial neural networks. J. Water Resour. Plan. Manag. 130 (6), 506–514. Snodgrass, M.F., Kitanidis, P.K., 1997. A geostatistical approach to contaminant source identification. Water Resour. Res. 33 (4), 537–546. Sreekanth, J., Datta, B., 2010. Multi-objective management of saltwater intrusion in coastal aquifers using genetic programming and modular neural network based surrogate models. J. Hydrol. 393 (3–4), 245–256. Tikhonov, A.N., Arsenin, V.Y., 1978. Solution of ill-posed problems. Math. Comput. 32 (144), 1320–1322. Yan, G.L., Zhang, N., Liu, Y.H., 2008. System Engineering. Machinery Industry Press, Beijing. Yeh, W.G., 1986. Review of parameter identification procedures in groundwater hydrology: the inverse problem. Water Resour. Res. 22 (2), 95–108.

The authors would like to acknowledge the support provided by the National Natural Science Foundation of China (41672232) and Jilin Province Science and Technology Development Project (20170101066JC). References Ayvaz, M.T., 2010. A linked simulation–optimization model for solving the unknown groundwater pollution source identification problems. J. Contam. Hydrol. 117 (1–4), 46–59. Coetzee, W., Coetzer, R.L., Rawatlal, R., 2012. Response surface strategies in constructing statistical bubble flow models for the development of a novel bubble column simulation approach. Comput. Chem. Eng. 36, 22–34. Hemker, T., Fowler, K.R., Farthing, M.W., et al., 2008. A mixed-integer simulation-based optimization approach with surrogate functions in water resources management. Optim. Eng. 9 (4), 341–360. Huang, P., Meng, Y.G., 2009. Optimization Theory and Method. University Press, Beijing: Tsinghua. Khan, F.I., Tahir, H., Ramzi, H., 2004. An overview and analysis of site remediation technologies. J. Environ. Manag. 71 (2), 95–122. Kleijnen, J.P., van Beers, W., 2005. Robustness of Kriging when interpolating in random simulation with heterogeneous variances: some experiments. Eur. J. Oper. Res. 165 (3), 826–834. Kourakos, G., Mantoglou, A., 2013. Development of a multi-objective optimization algorithm using surrogate models for coastal aquifer management. J. Hydrol. 479, 13–23. Lapworth, D.J., Baran, N., Stuart, M.E., et al., 2012. Emerging organic contaminants in groundwater: a review of sources, fate and occurrence. Environ. Pollut. 163 (4), 287–303. Li, G.S., Yao, D., 2014. Source and Sink Identification of Diffusion Model and its

8