A benchmarking framework for simulation-based optimization of environmental models


Environmental Modelling & Software 35 (2012) 19–30


Environmental Modelling & Software journal homepage: www.elsevier.com/locate/envsoft

L. Shawn Matott a,*, Bryan A. Tolson b, Masoud Asadzadeh b

a University at Buffalo, Center for Computational Research, Buffalo, NY 14023, USA
b University of Waterloo, Department of Civil and Environmental Engineering, Waterloo, Ontario, Canada

Article info

Article history: Received 25 October 2010; received in revised form 3 February 2012; accepted 9 February 2012; available online 9 March 2012.

Keywords: Simulation-based optimization; Benchmark problems; Model pre-emption; Sorptive barrier design; Dynamically dimensioned search; Particle swarm optimization

Abstract

Simulation models assist with designing and managing environmental systems. Linking such models with optimization algorithms yields an approach for identifying least-cost solutions while satisfying system constraints. However, selecting the best optimization algorithm for a given problem is non-trivial and the community would benefit from benchmark problems for comparing various alternatives. To this end, we propose a set of six guidelines for developing effective benchmark problems for simulation-based optimization. The proposed guidelines were used to investigate problems involving sorptive landfill liners for containing and treating hazardous waste. Two solution approaches were applied to these types of problems for the first time: a pre-emptive (i.e. terminating simulations early when appropriate) particle swarm optimizer (PSO), and a hybrid discrete variant of the dynamically dimensioned search algorithm (HD-DDS). Model pre-emption yielded computational savings of up to 70% relative to non-pre-emptive counterparts. Furthermore, HD-DDS often identified globally optimal designs while incurring minimal computational expense, relative to alternative algorithms. Results also highlight the usefulness of organizing decision variables in terms of cost values rather than grouping by material type. © 2012 Elsevier Ltd. All rights reserved.

* Corresponding author: L.S. Matott. doi:10.1016/j.envsoft.2012.02.002

1. Introduction

One approach for utilizing environmental models in a management or design context is to incorporate them into a simulation-based optimization framework, in which a process-based environmental model is linked with an optimization search algorithm. The optimization search algorithm iteratively adjusts various simulation model inputs (i.e. decision variables) in order to optimize an application-specific objective function and satisfy problem constraints. The objective function and/or constraints are computed on the basis of simulation model outputs (i.e. response variables). The fundamental output of an optimization algorithm is an ‘optimal’ configuration of design variables along with the corresponding objective function value. Some example applications of simulation-based optimization in an environmental management context include the design of pump-and-treat systems (Bayer and Finkel, 2004; Finsterle, 2006), groundwater supply systems (Mayer et al., 2002), landfill liners (Adams et al., 1998), agricultural watershed management (Cools et al., 2011; Gitau et al., 2006; Peña-Haro et al., 2011), and waste load allocation (Burn and Yulianti, 2001). Calibration of environmental simulation models is also typically formulated as a simulation-based optimization problem, in which uncertain model parameters are adjusted to minimize differences between simulated outputs and corresponding observational data (Duan et al., 1992; Finsterle and Zhang, 2011). Nicklow et al. (2010) provide an extensive review of simulation model-based optimization applications in water resources planning and management.

A very large number of optimization algorithms exist and these can be classified based on a number of problem characteristics, such as constrained versus unconstrained, linear versus nonlinear, and discrete- versus continuous-valued decision variables. Simulation-based optimization of environmental models is generally a nonlinear, constrained optimization problem and the most common types of optimization algorithms applied to solve these problems are metaheuristics. According to Collette and Siarry (2003), metaheuristic algorithms (e.g. simulated annealing, genetic algorithms, tabu search and ant colony optimization) are algorithms that: (1) are stochastic (i.e. not deterministic); (2) can be applied to discrete and, after some minor changes, continuous optimization problems; (3) do not require derivative information; and (4) are inspired by analogies with physics, biology and nature. Evolutionary algorithms (e.g. genetic algorithms) are a subcategory of metaheuristics and can generally be described as algorithms that mimic evolutionary processes and utilize randomized operators to evolve a population of solutions towards improved solutions.

The application of simulation-based optimization is becoming commonplace among the environmental modeling community. For example, Nicklow et al. (2010) document the dramatic increase in the use of evolutionary algorithms for optimization in water resources planning and management over the past two decades. However, the selection and use of the available optimization methods in the majority of environmental modelling publications, including various algorithm comparison studies, appears to have been determined in a more-or-less ad hoc fashion. For example, the choice of algorithm has often been arbitrary and based on a given investigator's particular preference. Furthermore, comparison studies involving more than one optimizer often neglect to consider all aspects of algorithm performance and are typically limited to new study-specific optimization problems that are not easily reproduced. Too many new algorithm development and comparison studies overlook the fact that metaheuristic and evolutionary algorithms are stochastic and thus dependent on randomized operators. As such, any single optimization result is simply a sample from a statistical population of possible results and a single optimization trial cannot adequately characterize algorithm performance.

To improve the robustness and consistency of algorithm comparisons in the face of an ever-expanding field of candidate metaheuristic algorithms, the environmental modelling community is in need of a benchmarking framework (i.e. a set of practical guidelines for developing benchmark problems) for simulation-based optimization of environmental models.
This paper is primarily aimed at highlighting the importance of, and the procedure for, creating a benchmark simulation-optimization problem in environmental modelling. Unlike in the operations research and mathematical optimization communities, the lack of such benchmarks remains a major unaddressed problem in environmental modelling. That is, high-quality, realistic and reproducible benchmark problems are simply not readily available to the modelling community. A sound benchmarking framework and resulting benchmark problems will serve to: (1) objectively inform environmental modellers as to which optimization algorithms are good candidates for their problem; and (2) improve the science behind metaheuristic algorithm development. This improvement would be due first to the quality or realism encapsulated in benchmark optimization problems built with our guidelines in mind, and secondly, because the existence of reproducible benchmarks would enable algorithm developers and modellers to easily compare the performance of a wide array of optimization algorithms.

1.1. Related work

Within the mathematical optimization community, a variety of standard mathematical benchmark problems have long been established. For example, Herrera et al. (2010) provide a recent test suite that incorporates the well-known Schwefel, Rosenbrock, Griewank, Rastrigin, and Ackley test functions. However, algorithm performance observed for these test problems may not be a good predictor of algorithm performance when applied in an application-oriented and simulation-based optimization context. As a result, it is commonplace within the environmental modeling community to explore search algorithm behavior using a combination of mathematical benchmarks and realistic case studies (e.g. Duan et al., 1992; Parno et al., 2009; Tang et al., 2006).

Importantly, benchmarking in a simulation-based optimization context is different from benchmarking using mathematical test functions commonly employed in the optimization literature. When using mathematical test functions, it is relatively easy and computationally feasible to simply code up a given test function and reproduce previously published results. However, unlike classic optimization test problems, replicating the optimization formulation is non-trivial in simulation-based optimization since it involves a complex numerical simulation model. For example, most simulation model studies, let alone simulation-based optimization studies, do not publish comprehensive details on every single model input or associated model equations due to publication space limitations. Even if such information were available, compiling the thousands of lines of simulation model source code with the exact same options to generate precisely the same output is non-trivial. This problem is compounded by the fact that many simulation model source codes are regularly updated and more recent versions of the code can often fail to replicate outputs of older versions (i.e. those utilized in a previous optimization study). Even when a given formulation is replicated exactly (i.e. using the same model, the same mathematical optimization problem statement, and the same optimizer), the computational expense associated with the embedded simulation model makes it impractical for users of the benchmark problem to regenerate previous optimization algorithm results when a new algorithm is investigated. Instead, a database or catalog of all published optimization results for a given simulation-based benchmark problem is required to facilitate a systematic comparison of alternative algorithms and identification of best-in-class algorithms for a given benchmark.
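To illustrate how little code a mathematical test function requires, in contrast to a simulation model, the following minimal sketch implements two of the test functions named in Section 1.1 using their standard textbook definitions. The function names and the plain-list input format are assumptions made for illustration; they are not taken from any published test suite.

```python
import math


def rosenbrock(x):
    """Rosenbrock function; global minimum of 0 at x = (1, ..., 1)."""
    return sum(
        100.0 * (x[i + 1] - x[i] ** 2) ** 2 + (1.0 - x[i]) ** 2
        for i in range(len(x) - 1)
    )


def rastrigin(x):
    """Rastrigin function; global minimum of 0 at x = (0, ..., 0)."""
    return 10.0 * len(x) + sum(
        xi ** 2 - 10.0 * math.cos(2.0 * math.pi * xi) for xi in x
    )


# Evaluate each function at its known global minimizer.
print(rosenbrock([1.0, 1.0, 1.0]))  # 0.0
print(rastrigin([0.0, 0.0, 0.0]))   # 0.0
```

Each function is a few lines and evaluates in microseconds, which is why published results on such problems are easy to regenerate. A simulation-based benchmark offers neither property.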
The need to benchmark algorithm performance using realistic environmental modelling case studies has led to the development or adoption of numerous benchmark problems for various areas of environmental and water resources engineering. Mayer et al. (2002) proposed several ‘community’ (i.e. benchmark) problems for the design of pump-and-treat and groundwater supply systems and these problems have been investigated by a number of researchers from a variety of disciplines (e.g. Fowler et al., 2008). Similarly, the water distribution network optimization community has adopted a few benchmark design problems (the Hanoi water distribution network and the New York water supply system problems are commonly used) for testing new optimization algorithms (Vasan and Simonovic, 2010). In the hydrologic modelling community, the multi-institutional research project MOPEX (the model parameter estimation experiment), as described by Schaake et al. (2006), introduced various benchmark hydrologic modelling case studies. MOPEX case studies were designed to compare alternative overall hydrologic modelling approaches, each of which includes the hydrologic simulation model, the method for estimating model parameters in ungauged basins, and the model calibration (i.e. optimization) methodology. The above benchmarking studies in environmental and water resources modelling are examples of the most advanced kind of benchmarks available and, as such, have been utilized by numerous researchers in a variety of studies. Their success and widespread use can largely be attributed to the fact that each benchmark is readily available to other researchers. Unfortunately, from the perspective of comparing and developing optimization algorithms, even these advanced benchmarks have significant shortcomings.
Most importantly, these benchmarking studies do not specify which specific numerical simulation model should be utilized to simulate the system of interest and instead users of the benchmark problems are free to decide which model to use. In the MOPEX study, for example, researchers were specifically instructed to select their own preferred hydrologic simulation model. Such flexibility means that two different researchers solving the same benchmark problem can generate different solutions due to differences in both the optimization algorithm and the underlying simulation model. Another important shortcoming is that these benchmarks were designed or adopted without specifically considering their usefulness in making algorithm comparisons across multiple independent research efforts. Yet such comparisons would allow for a diverse group of researchers to efficiently and objectively identify best-in-class algorithms.

There are many examples of environmental and water resources modelling case studies that could be viewed as a benchmark case study simply because they have appeared in multiple simulation-based optimization studies. For example, popular case studies for exploring optimal waste load allocation strategies include the Tunga-Bhadra river system in India (Rehana and Mujumdar, 2009), the Moore’s Creek watershed in Virginia (Jia and Culver, 2006), and the Willamette river in Oregon (Burn and Yulianti, 2001; Yandamuri et al., 2006). Similarly, there are multiple examples in the field of automatic calibration of environmental models. Examples include the heavily instrumented Reynolds Creek experimental watershed in Idaho, USA (Slaughter et al., 2001), as utilized by Franz et al. (2008) and Zhang et al. (2009), and the Leaf River watershed in Mississippi, USA (Tang et al., 2006; Vrugt et al., 2005).

With the exception of the community problems proposed by Mayer et al. (2002), none of the aforementioned benchmark problems appear to have been developed for the explicit purpose of continued benchmarking against alternative optimizers and thus they have not used any sort of published guidelines. In contrast, this paper introduces and demonstrates a newly developed benchmarking framework (i.e. a set of guidelines for developing benchmark problems for simulation-based optimization).
By following the benchmarking framework, researchers can develop benchmark problems that are suitable for robust and ongoing comparisons of alternative optimizers spanning multiple research efforts. To demonstrate the benchmarking framework, it was applied to develop a set of simulation-based optimization problems involving the design of multilayer sorptive barriers. These benchmarks were then utilized in a series of numerical experiments that explored the behavior of various optimization algorithms.

1.2. Research objectives

The primary research objective of this work is to introduce a set of guidelines for developing benchmark problems. These guidelines provide a corresponding benchmarking framework for simulation-based optimization of environmental systems. To demonstrate these guidelines, they were applied to develop a new set of benchmark problems for sorptive barrier design. These resulting benchmarks are expected to encourage more widespread exploration of alternative optimization algorithms and help to identify best-in-class approaches for optimal sorptive barrier design.

The usefulness of the benchmark problems and associated benchmarking framework is demonstrated by applying two recently developed optimization approaches in comparison to more commonly used metaheuristics. These new alternative approaches are pre-emption enabled particle swarm optimization (PSO) and hybrid discrete dynamically dimensioned search (HD-DDS). The pre-emption concept and the HD-DDS algorithm are very recent contributions and are presented here with a reasonable amount of detail. Pre-emption enabled PSO has been successfully applied to automatic simulation model calibration problems (Razavi et al., 2010) while HD-DDS was introduced in the context of water distribution network design (Tolson et al., 2009). This is the first application of the generalized HD-DDS algorithm for solving constrained discrete optimization problems in water resources and environmental management. Local search procedures in HD-DDS were designed specifically for known characteristics of the water distribution network design problem and are thus not directly applicable to other optimization problems. This is also the first application of the pre-emption enabled PSO and HD-DDS algorithms to sorptive barrier design. Therefore, these methods were applied to the benchmark problems and their performance was compared against several popular alternative algorithms that were applied to the benchmark problems in earlier studies. As described in Section 2.2, the research also compared performance of algorithms under two definitions of the decision variables.

2. Methods

The methods section is organized as follows: Section 2.1 introduces our proposed guidelines for developing benchmark problems for simulation-based optimization; Section 2.2 describes the example benchmark problems and illustrates the application of the proposed benchmarking framework; Sections 2.3 and 2.4 briefly describe the HD-DDS and pre-emption enabled PSO algorithms; and Section 2.5 summarizes the numerical experiments that were performed.

2.1. Guidelines for developing benchmark problems for simulation-based optimization

Based on our extensive experience with applying optimization algorithms in an environmental modeling context, we feel that high-quality benchmark problems should: (1) be realistic and based on previously published and peer-reviewed work; (2) completely and unambiguously describe the optimization problem and associated constraints; (3) be easily and freely accessible by the community; (4) be easily integrated with a given optimizer via a simplified input/output interface; (5) be accompanied by an available database of optimization results that report multiple aspects of optimizer performance, including efficiency, effectiveness and variability; and (6) provide a convenient means for cataloging new results.
These six guidelines constitute our benchmarking framework and the rationale for each guideline is provided in the following sub-sections.

2.1.1. Guideline #1: use realistic peer-reviewed simulation-based optimization problems

A given benchmark problem should consist of a simulation model and associated case study (preferably a real versus a hypothetical one), an optimization objective function (or multi-objective formulation, if appropriate), and a set of performance constraints. In this regard, it is imperative that appropriate experts have objectively evaluated and certified the merit and realism of a given benchmark. Thus, for a given problem to be a suitable benchmark, it should be previously published in the peer-reviewed literature. Ideally, this peer-review should be performed in a context that is independent of any algorithm comparison or development activity. That is, the primary purpose of the publication should be to introduce the optimization formulation and associated case study. In this ideal situation, separating the problem review from new algorithm developments or comparisons ensures a thorough review of the scientific merit of the underlying optimization problem. In practice, this ideal situation may be hard to achieve given that many relevant optimization problems are often published together with applications and testing of existing or new optimization algorithms. In such cases, it would be reasonable to consider the optimization problems as benchmark problems.

2.1.2. Guideline #2: optimization problem described completely and unambiguously

For a given benchmark problem, robust comparisons of the performance of alternative optimization algorithms require that each algorithm solves the exact same optimization formulation. Differences in optimizer performance should only be attributable to optimizer differences and random variability (when applying stochastic optimizers).
As such, the optimization problem for a given benchmark must be described completely and unambiguously. Mayer et al. (2002) provide users of their community problems with a flexible definition and interpretation of each problem and do not mandate any specific simulation models or optimization cost functions. In contrast, we argue that benchmark developers should explicitly provide the following information: (1) a formalized optimization problem statement expressed in standard mathematical notation; (2) a clear explanation of any performance constraints that are to be evaluated via an environmental simulation model; and (3) identification of the specific modeling code that is to be utilized for evaluation of these constraints. In terms of solving the optimization problem, numerous constraint handling techniques have been proposed (e.g. explicitly handled via feasible direction methods or implicitly handled via penalty function methods) and the most appropriate method can vary depending on the selected optimization algorithm (Hilton and Culver, 2000). Therefore, although constraints and associated simulators should be explicitly prescribed as an intrinsic part of the benchmark problem, users of a benchmark should have the flexibility to handle these constraints any way they wish.
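As an illustration of the implicit, penalty-based constraint handling mentioned above, the following minimal sketch adds a weighted constraint violation to the cost. It shows only one of many penalty variants and is not the scheme prescribed by any particular benchmark; the function name `penalized_cost` and the default `penalty_weight` are hypothetical choices made for this example.

```python
def penalized_cost(cost, performance, limit, penalty_weight=1.0e6):
    """Return the cost plus a penalty proportional to the constraint violation.

    cost           -- objective value f(X) of a candidate design
    performance    -- simulated constraint value g(X)
    limit          -- maximum tolerable value b; a design is feasible if g(X) <= b
    penalty_weight -- assumed scaling factor; must be tuned to the problem
    """
    violation = max(0.0, performance - limit)
    return cost + penalty_weight * violation


# Feasible design (g <= b): the cost is returned unchanged.
feasible = penalized_cost(cost=20.0, performance=3.0, limit=5.0)

# Infeasible design (g > b): the violation of 2.0 is scaled by the weight.
infeasible = penalized_cost(cost=15.0, performance=7.0, limit=5.0)

print(feasible, infeasible)
```

Because the penalty is applied outside the benchmark, two users can handle the same prescribed constraint differently without changing the underlying problem statement.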


Although variations in the optimization formulation may prove useful in better solving the problem, they can easily cloud the comparison between metaheuristic algorithms. Nonetheless, benchmark developers should not ignore the possibility of more effective variant formulations. As such, the freedom to vary the benchmark formulation, if well documented, may warrant consideration. In such a case and to avoid confusing cross-comparisons, we would consider each problem formulation variant a new benchmark problem.

2.1.3. Guideline #3: make benchmarking software easily and freely accessible

In addition to providing an unambiguous description of the benchmark problem (see Guideline #2), benchmark developers should identify all related computer code and, where possible, should provide convenient access to any required executables and input files. Ideally, users should not have to write any software related to the calculation of the cost function or performance constraints; instead these should be provided by the developer of the benchmark. Convenient formats for supplying the necessary software and related files include web download, e-mail request, and ftp server. The philosophy behind this guideline is that potential users of the benchmark should not have to implement the problem in code. Rather, they should just focus on linking their chosen optimizer(s) with the existing benchmark software.

2.1.4. Guideline #4: provide a simple I/O for integrating benchmarks with optimizers

To facilitate the philosophies of benchmark reproducibility (see Guideline #2) and convenient adoption (see Guideline #3), developers should provide a simple input-output (I/O) interface for integrating a given benchmark problem with the user’s chosen optimizer. From a software perspective, this interface should effectively encapsulate the optimization problem as a “black-box” containing a limited set of well-defined inputs (e.g. design variables) and outputs (e.g. objective function and performance constraints). In software terminology, the code associated with this simplified I/O interface is known as a ‘wrapper’. Thus, users who want to apply a new optimization algorithm to a given benchmark need only focus on the input/output details of the wrapper software.

2.1.5. Guideline #5: measure multiple aspects of algorithm performance

Three general measures of algorithm performance are of particular interest: efficiency, effectiveness and variability. Efficiency characterizes the ability of an algorithm to determine high-quality solutions using a minimal amount of computational resources. Efficiency is a critical performance measure when considering that the constraint/objective function evaluation time for a single solution can require minutes to hours or more depending on the environmental simulation model being invoked by the optimizer. Given the extreme variation in computation time associated with solving environmental simulation model-based optimization problems, measuring the speed at which maximum quality is achieved is a vital piece of information for practitioners, each of whom may face vastly different computational burdens. Effectiveness measures the quality of the algorithm’s solution relative to the best known or globally optimal solution. Variability refers to the ability of an algorithm to consistently identify similar quality solutions over multiple trials. Some examples of studies thoroughly comparing multiple aspects of algorithm performance in the context of environmental modelling include Bayer and Finkel (2004) and Moré and Wild (2009). Clearly, there is some degree of subjectivity when precisely defining performance metrics. The important point is that the benchmark developers choose three performance metrics that quantify the performance in terms of the three general measures.
Presumably, the benchmark developers are familiar enough with the case study being optimized to make informed decisions on the most appropriate corresponding measures of algorithm performance. Another important aspect of measuring algorithm performance is the degree of algorithm tuning that is applied to adapt a given optimizer to a selected benchmark optimization problem. Ideally, reported measures of algorithm efficiency should include any significant computational resources (e.g. a large number of pre-optimization simulation model runs) that are used to tune an optimizer to the problem. At the least, published comparisons between algorithms applied to a given benchmark should clearly indicate the extent of algorithm parameter tuning. The scope of this guideline for benchmark development is limited to emphasizing the concept of measuring multiple aspects of algorithm performance. Such measurements can then be utilized in a variety of ways to perform robust algorithm comparisons. While the range of methods available and an overall framework for comparing multiple algorithms are outside the scope of the proposed benchmark development guidelines, a few notes in this regard are useful. First, statistical hypothesis testing is worth considering when comparing algorithms. While parametric tests on algorithm effectiveness are possible (Regis and Shoemaker, 2007), Garcia et al. (2009) argue that parametric procedures are inappropriate with multiple algorithm comparisons and recommend using non-parametric test procedures instead. Alternatively, Taillard et al. (2008) demonstrate comparing algorithm success rates using multiple statistical tests on proportions. Overall, Rardin and Uzsoy (2001) caution that statistical significance is not always equivalent to practical significance in the context of optimization algorithm comparisons and they conclude that while formal statistics may be helpful at times, a well-conceived graph can often suffice.
Future research to formalize an algorithm comparison framework in the context of environmental modelling benchmark problems should consider the current literature on optimization algorithm comparisons (e.g. Ali et al., 1997; Barr et al., 1995; Moré and Wild, 2009; Rardin and Uzsoy, 2001).
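Guidelines #4 and #5 can be illustrated together with a minimal sketch: a black-box wrapper that maps design variables to an objective value, and a stochastic optimizer run over multiple seeded trials so that efficiency, effectiveness and variability can each be summarized. Everything here is an assumption made for illustration: `evaluate` is a toy stand-in for an expensive simulation, `random_search` is a deliberately simple optimizer, and the specific metric definitions in `summarize` are one possible choice rather than those used in this study.

```python
import random
import statistics


def evaluate(design):
    """Hypothetical black-box wrapper: design variables in, objective out.

    Toy stand-in for a simulation model; the best possible cost is 0.
    """
    return sum((x - 3) ** 2 for x in design)


def random_search(seed, budget=200, n_vars=4, low=1, high=6):
    """Run one trial; return (best cost, evaluation count at which it was found)."""
    rng = random.Random(seed)  # a separate seed per trial keeps trials reproducible
    best_cost, found_at = float("inf"), 0
    for n_eval in range(1, budget + 1):
        design = [rng.randint(low, high) for _ in range(n_vars)]
        cost = evaluate(design)
        if cost < best_cost:
            best_cost, found_at = cost, n_eval
    return best_cost, found_at


def summarize(trials, best_known=0.0):
    """Summarize multiple trials using three assumed performance metrics."""
    costs = [cost for cost, _ in trials]
    evals = [n_eval for _, n_eval in trials]
    return {
        "effectiveness_mean_gap": statistics.mean(costs) - best_known,
        "efficiency_mean_evals": statistics.mean(evals),
        "variability_stdev": statistics.stdev(costs),
        "success_rate": sum(c == best_known for c in costs) / len(costs),
    }


trials = [random_search(seed) for seed in range(30)]
summary = summarize(trials)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

Reporting a summary over many seeds, rather than a single run, reflects the point made in Section 1 that any one result from a stochastic optimizer is only a sample from a population of possible results.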

2.1.6. Guideline #6: provide mechanism for cataloging published results on the benchmark

After a given benchmark is applied and new results are obtained and published in the peer-reviewed literature, the developer of the benchmark should provide a central repository for archiving these new results. The associated repository could be an online database, a periodically updated web site or table, a listserv, etc. Alternatively, Wiki technology is a promising means for researchers to perform updates without having to go through the original developer of the benchmark. As discussed previously (see Section 1.1), a mechanism for cataloging published benchmark results is essential when considering optimization problems that use computationally expensive environmental simulation models.

2.2. Benchmark problems for multi-layer sorptive barrier design

In hazardous waste management and environmental remediation, subsurface barrier technologies are commonly employed to prevent contaminant migration and minimize exposure to humans and wildlife. Example barrier technologies include landfill liners, vertical cutoff walls and slurry walls, permeable reactive barriers, and reactive sediment caps. Regulatory guidelines for many barrier systems emphasize the minimization of advective contaminant transport. However, diffusive transport can result in significant contaminant migration through these types of barriers (Foose et al., 2002). Sorptive barriers are a recently developed technology in which the layers of a traditional advection-minimizing barrier are amended with one or more sorptive materials in order to also minimize diffusive transport (Gullick and Weber, 2001).
Laboratory studies suggest that a variety of sorptive materials are promising candidates for inclusion in such barriers, including granular activated carbon (GAC), shale, benzyltriethylammonium (BTEA), and hexadecyltrimethylammonium (HDTMA) (Bartelt-Hunt et al., 2005).Anticipating potential new regulations and guidelines for barrier systems, the design of sorptive barriers was recently formulated as a simulation-based optimization problem (Bartelt-Hunt et al., 2006). The optimization formulation seeks to identify the lowest cost barrier configuration that satisfies performance constraints on the cumulative mass exiting the barrier over a given design lifetime. Rather than building and testing all possible barrier configurations, the formulation utilizes a subsurface contaminant transport model to simulate and evaluate the performance of a given candidate system. To date, the simulation-based optimization of sorptive barriers has been investigated by a small number of researchers in the environmental and water resources engineering community (e.g. Bartelt-Hunt et al., 2006; Lo, 1996; Painter, 2005). This paper expands on these efforts to develop a set of benchmark problems for sorptive barrier design that are consistent with our proposed benchmarking framework. The following sub-sections describe the application of the benchmarking framework to develop a set of benchmark optimization problems for designing sorptive landfill liners. 2.2.1. Problem selection (Guideline #1) The barrier optimization problems proposed by Bartelt-Hunt et al. (2006) were selected for developing a representative suite of benchmark problems. In accordance with our benchmarking framework, these problems were previously peer-reviewed and utilize real-world sorption data sets, making them sufficiently realistic case studies. Furthermore, the initial description of the problems (i.e. Bartelt-Hunt et al. 
(2006)) did not consider algorithm development or comparison, focusing primarily on the development of the optimization problem statement. The selected benchmark problems consist of three simulation-based optimization scenarios which seek to minimize the vertical transport of organic contaminants through a landfill liner. Each problem considers the containment of a different organic contaminant: benzene, trichloroethylene (TCE), or 1,2-dichlorobenzene (1,2-DCB). Each of these contaminants has different physical-chemical properties and will interact differently with a given set of sorptive layers. As a result, optimal liner designs will be different for each contaminant, resulting in three distinct optimization problems.

2.2.2. Problem description (Guideline #2)

Each layer of a landfill liner may be constructed by combining a variety of sorptive materials but, in practice, the range of different layers available is constrained to a discrete set of alternatives. The optimization goal is to identify the optimal configuration of multiple layers for a given organic contaminant. As required by Guideline #2, a formalized problem statement for the benchmark sorptive barrier optimization problems is given in Eq. (1):

minimize

$$
\begin{aligned}
f(X) &= \overbrace{5.625\,(N - 4)}^{\text{opportunity cost}} + \overbrace{\sum_{i=1}^{6} W(X_i)}^{\text{material cost}} \\
\text{such that}\quad & 1 \le X_i \le 13 \quad \text{for } 1 \le i \le 4 \\
& 1 \le X_i \le 14 \quad \text{for } 5 \le i \le 6 \\
& g(X) \le b
\end{aligned}
\tag{1}
$$

where a given liner design is represented by a vector of six integer-valued decision variables (X = [X1, X2, X3, X4, X5, X6]^T) defining the order and material composition of a series of 0.15 m layers; f(X) [$/m²] is the corresponding cost of the landfill liner, consisting of both material and opportunity costs; N is the number of layers in the liner, ranging from 4 to 6; W(Xi) [$/m²] is the cost of the ith layer of the liner; g(X) [g/m²] is the simulated performance of the liner, representing the cumulative amount of contaminant passing through the bottom of the liner over a 100-year design lifetime; and b [g/m²] is the maximum tolerable amount of contaminant passing through the bottom of the liner over the design lifetime.

Regulations for limiting diffusive transport through barriers have not yet been established and the value of the performance constraint (i.e. b) is somewhat arbitrary. For these benchmark problems, a value of 5 μg/m² (5 × 10⁻⁶ g/m²) is prescribed for b. Each layer parameter (Xi) is coded as an integer value, and Table 1 summarizes the corresponding material composition and costs. The fifth and sixth layers include a "no layer" composition, allowing for consideration of a variable number of layers.

Two types of decision variable encoding were considered for the benchmark problems: encoding by layer composition (i.e. the Coded Value column of Table 1) and encoding by material cost (i.e. the Sorted Value column of Table 1). Previous studies have exclusively utilized the somewhat arbitrary layer composition approach and this study explored whether the more objective material cost approach would yield improved results for selected algorithms.

Guideline #2 of the benchmarking framework also requires "a clear explanation of any performance constraints that are to be evaluated via an environmental simulation model." In this regard, the performance (i.e. 
g(X)) of a given liner system can be evaluated using a suitable one-dimensional subsurface contaminant transport numerical simulation model. In previous barrier optimization studies, two different simulation engines were adapted for this purpose: Mouser (Rabideau, 2003) and Nighthawk (Matott and Rabideau, 2010). Both of these modeling codes are general purpose and can support a variety of transport modeling applications. Following Guideline #2, and for the considered benchmark problems, the Nighthawk code was selected as the specific modeling code to be utilized for evaluation of the constraints.

2.2.3. Benchmark software (Guideline #3)

In accordance with Guideline #3 of the benchmarking framework, the benchmark problems have been implemented in readily available software. The software computes the associated cost and performance criteria for a given candidate barrier design. Instruction files are also provided with the software, allowing any interested user to quickly interface the selected benchmark problems with their choice of optimizer. A web-site (http://www.eng.buffalo.edu/wlsmatott/commprobs.html, hereafter referred to as the companion web-site), containing a download link for obtaining the necessary files, has been developed to facilitate distribution of the software. Table 2 provides brief descriptions of the executables and input files provided on the web-site. These files implement the selected landfill liner benchmark problems along with a simple I/O interface known as nZones (described below).

2.2.4. Simplified input-output interface (Guideline #4)

Following Guideline #4, and to support easy exploration of the benchmark problems, a customized wrapper for the Nighthawk engine was created. The wrapper code is called nZones and utilizes a greatly simplified input-output file

Table 1. Integer-valued encoding of decision variables, adapted from Matott et al. (2006).

| Coded Value | Layer Composition | Cost (W [$/m²]) | Sorted Value (a) |
|---|---|---|---|
| 1 | 87% sand, 10% bentonite, 3% BTEA-bentonite | 19.328595 | 7 |
| 2 | 84% sand, 10% bentonite, 6% BTEA-bentonite | 36.495690 | 11 |
| 3 | 81% sand, 10% bentonite, 9% BTEA-bentonite | 53.464785 | 13 |
| 4 | 87% sand, 10% bentonite, 3% HDTMA-bentonite | 20.318595 | 9 |
| 5 | 84% sand, 10% bentonite, 6% HDTMA-bentonite | 37.683690 | 12 |
| 6 | 81% sand, 10% bentonite, 9% HDTMA-bentonite | 53.464785 | 14 |
| 7 | 87% sand, 10% bentonite, 3% shale | 2.081805 | 3 |
| 8 | 84% sand, 10% bentonite, 6% shale | 2.198130 | 4 |
| 9 | 81% sand, 10% bentonite, 9% shale | 2.309505 | 5 |
| 10 | 87% sand, 10% bentonite, 3% GAC | 10.652595 | 6 |
| 11 | 84% sand, 10% bentonite, 6% GAC | 19.341690 | 8 |
| 12 | 81% sand, 10% bentonite, 9% GAC | 28.030785 | 10 |
| 13 | 90% sand, 10% bentonite | 1.963500 | 2 |
| 14 | No layer | 0.00 | 1 |

(a) Coded values if layers are sorted by cost and not arranged by composition.
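To illustrate how Eq. (1) and Table 1 fit together, the following minimal Python sketch evaluates the liner cost f(X) for a design expressed in coded values. It is not part of the nZones software: the names `LAYER_COST`, `NO_LAYER`, `OPPORTUNITY_COST` and `liner_cost` are hypothetical, and the sketch assumes that N is the count of layers whose coded value is not "no layer".

```python
# Hypothetical sketch of the cost function in Eq. (1); not the nZones code.
# Layer costs W [$/m2] keyed by coded value, copied from Table 1.
LAYER_COST = {
    1: 19.328595, 2: 36.495690, 3: 53.464785,     # 3/6/9% BTEA-bentonite
    4: 20.318595, 5: 37.683690, 6: 53.464785,     # 3/6/9% HDTMA-bentonite
    7: 2.081805, 8: 2.198130, 9: 2.309505,        # 3/6/9% shale
    10: 10.652595, 11: 19.341690, 12: 28.030785,  # 3/6/9% GAC
    13: 1.963500,                                 # 90% sand, 10% bentonite
    14: 0.0,                                      # no layer
}
NO_LAYER = 14
OPPORTUNITY_COST = 5.625  # $/m2 for each layer beyond the fourth, per Eq. (1)


def liner_cost(x):
    """Return f(X) in $/m2 for six coded layer values."""
    if len(x) != 6:
        raise ValueError("a design has exactly six decision variables")
    for i, xi in enumerate(x):
        upper = 13 if i < 4 else 14  # only layers 5 and 6 may be 'no layer'
        if not 1 <= xi <= upper:
            raise ValueError(f"X{i + 1}={xi} is outside 1..{upper}")
    # Assumption: N counts the layers that are actually present.
    n_layers = sum(1 for xi in x if xi != NO_LAYER)
    material = sum(LAYER_COST[xi] for xi in x)
    return OPPORTUNITY_COST * (n_layers - 4) + material


# Four-layer design: two sand/bentonite layers, 3% shale, 3% GAC.
print(round(liner_cost([13, 13, 7, 10, 14, 14]), 4))  # 16.6614
```

Only the cost side of the problem is covered here; the constraint g(X) ≤ b still requires a run of the transport simulation.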


Table 2. Software and associated files for the benchmark problems.

| File | Description |
|---|---|
| Nighthawk.exe | Version 1.2 (release 05.15.2009) of the Nighthawk simulation engine. |
| Nighthawk.bat | Batch file that calls Nighthawk.exe. |
| nZones.exe | Simple I/O interface for the benchmark problems; calls Nighthawk.bat. |
| nZones.bat | Batch file that calls nZones.exe and deletes extraneous output files. |
| MyZonesIn.txt | Simple input file for the benchmark problems. |
| MyZonesOut.txt | Example output file for the benchmark problems. |
| README.txt | Help file for configuring the input file and reading results from the output file. |

format, relative to the Nighthawk code. For example, the simplified input file format consists of a simple list of coded layer values followed by an integer value identifying the desired organic contaminant (1: benzene, 2: TCE, and 3: 1,2-DCB). Internally, the nZones code converts this simplified file format into the much more complicated configuration files utilized by the Nighthawk code. Similarly, the nZones code consolidates lengthy output file information generated by Nighthawk into a simple output file that summarizes the liner configuration and reports the total mass exiting the liner after 100 years of transport.

The nZones code also supports two different categories of pre-emption: cost pre-emption and model pre-emption. As described by Razavi et al. (2010), the term pre-emption refers to opportunistically avoiding evaluation of a numerical simulation model during simulation-based optimization. For example, cost pre-emption can be employed when the candidate design is deemed too costly (i.e. f(X) > fPET, where fPET is an algorithm-specific pre-emption threshold) such that constraint evaluation (i.e. calculation of g(X) via running a computationally expensive simulation model) is completely avoidable (Joslin et al., 2006). Similarly, violation of barrier performance (i.e. g(X) > b) may occur in the early stages of simulation, long before complete simulation of the design lifetime (e.g. in this case, constraint violations may occur in year 70 of the 100-year simulation period). If selected by the user, the nZones code is pre-emption enabled: it avoids evaluation of computationally expensive constraints when possible and, when constraint evaluation is required, it continuously monitors the simulation and performs model pre-emption (i.e. terminating the simulation early) as soon as performance constraints are violated. Usage of pre-emption within nZones is accomplished by specifying the cost (i.e. fPET) and mass flux (i.e. 
b) pre-emption thresholds in the input file. These thresholds must be harmonized with the behavior of the selected optimization search algorithm and, as in Razavi et al. (2010), the thresholds are defined to be 'deterministic' such that pre-emption functions only to improve optimizer efficiency and has absolutely no impact on the algorithm search trajectory or final result. Sections 2.3 and 2.4 discuss this harmonization with respect to the PSO and HD-DDS algorithms.

To enable constraint-based model pre-emption, the underlying Nighthawk software was modified to facilitate runtime monitoring of the mass flux constraint. Furthermore, the nZones wrapper was also modified to perform cost-based pre-emption. Thus, enabling model pre-emption for the considered benchmark problem was somewhat more elaborate than a typical "black-box" wrapping effort. Nonetheless, and consistent with Guideline #4 (Section 2.1.4), the resulting input/output interface is simple and user-friendly: enabling model pre-emption requires the inclusion of only two additional lines in the input file (i.e. to specify cost- and constraint-based thresholds). As results will show, enabling pre-emption is certainly worth the effort because it can potentially yield substantial efficiency gains across a variety of optimization algorithms. Razavi et al. (2010) discuss at length guidelines and options for implementing model pre-emption. The pre-emption results presented in the current study should motivate other benchmark problem developers to consider producing software that similarly enables deterministic pre-emption (cost-based and simulation model-based) while still providing a simple and user-friendly software interface.

2.2.5. Measuring algorithm performance (Guideline #5)

As suggested by Guideline #5, we have developed recommended metrics for measuring algorithm performance in terms of efficiency, effectiveness and variability, as applied to the selected benchmark landfill liner design problems. 
The recommended metrics are as follows:

Efficiency: measure in terms of the average number of simulation engine evaluations during one optimization trial (since this is the most computationally demanding part of the optimization);

Effectiveness: express as both the median best cost of multiple trials and the frequency with which the best known solution is identified; and

Variability: describe using the range (i.e. minimum and maximum) of best costs across multiple trials.

For cases where the simulation model runtime accounts for at least 90% of computational resources associated with evaluating a given candidate solution, the most straightforward and hardware-independent measure of efficiency is the count of total simulation model evaluations. In addition, empirical CDF (cumulative distribution function) plots of


algorithm performance for selected results were generated to demonstrate one method for visualizing and comparing multiple aspects of algorithm performance.

2.2.6. Catalog of published results (Guideline #6)

The companion web-site will serve as a central repository for tabulating the results of published applications of the selected benchmark problems. To enhance the usability of the data, the table of results is available in both html and Microsoft Excel file formats. For each published result, the following information is included as a table entry: (1) name of the organic contaminant to be contained by the landfill design; (2) the median (over all optimization trials) optimal cost ($/m²) reported by the selected algorithm (i.e. algorithm effectiveness); (3) the 100-year exit mass (g/m²) performance constraint; (4) the optimal vector of coded layer properties for the median trial; (5) the name of the optimization search algorithm that was employed; (6) the median (over all optimization trials) number of required simulation runs (i.e. algorithm efficiency); (7) the number of optimization trials; (8) the range of reported solutions across all trials (i.e. variability); (9) a reference to the peer-reviewed publication where the results first appear; and (10) any additional comments, as appropriate. Practitioners may request updates to the web-site tables based on their own published findings by contacting the lead author via e-mail.

2.2.7. Limitations of the landfill liner case-study

The landfill liner problem is simple in that it involves one constraint, an easily computed cost function, and just 6 decision variables. However, this simplicity is adequate for this study since the case study primarily serves as a backdrop for demonstrating the proposed guidelines, with a minor emphasis on exploring recently introduced pre-emption strategies. We do not foresee any difficulties with applying our guidelines to the benchmarking of more complex optimization problems. 
Examples of more complicated formulations that should fit within our guidelines include: (1) multi-objective scenarios (e.g. Erickson et al., 2002; Gill et al., 2006; Tang et al., 2006, 2007; Vrugt et al., 2003; Yandamuri et al., 2006; Yapo et al., 1998), (2) optimization under uncertainty (e.g. Baú and Mayer, 2008; Freeze and Gorelick, 1999; Guan and Aral, 2004; Jia and Culver, 2006; Kalwij and Peralta, 2006; Painter and Milke, 2007; Rehana and Mujumdar, 2009), (3) combined economic-engineering-environmental models (e.g. Cools et al., 2011; Draper et al., 2003; Medellín-Azuara et al., 2007; Ximing, 2008), and (4) vast numbers of constraints and/or design variables (e.g. Carroll et al., 2009; Muleta and Nicklow, 2005; Wang et al., 2008; Weng et al., 2010).

2.3. Pre-emption enabled particle swarm optimization

To demonstrate the benchmark problems, several recently developed optimization strategies were applied; namely, a pre-emption enabled particle swarm optimizer (PSO) and a variant of the dynamically dimensioned search algorithm (DDS) called Hybrid Discrete-DDS (HD-DDS). The remainder of this section reviews the pre-emption enabled PSO algorithm, while Section 2.4 provides details on the HD-DDS algorithm.

Particle swarm optimization (PSO) (Beielstein et al., 2002; Kennedy and Eberhart, 1995) is a population-based global search algorithm that incorporates elements of structured randomness. The algorithm is inspired by the cooperative-competitive social behavior of flocking animals, such as birds and fish. In PSO, a swarm (i.e. population) of particles (i.e. candidate solutions) is scattered about the solution space. In search of the optimal set of decision variables, particle positions are updated according to the following vector operations:

$$
\begin{aligned}
v &= c\left[\, w\,v + c_1 r_1 \left(X_p - X\right) + c_2 r_2 \left(X_g - X\right) \right] \\
X &= X + v
\end{aligned}
\tag{2}
$$

where v is the velocity vector; c is the constriction factor; w is the inertia weight; r1 and r2 are uniform random numbers; c1 and c2 are the cognitive and social weights, respectively; Xp is the configuration of the best solution ever visited by the given particle; and Xg is the configuration of the best solution ever visited by any particle (the current global best).

For this study, objective function values (i.e. F(X)) associated with a given candidate PSO solution (X) were computed using a penalty function approach that augments the cost function (i.e. f(X)) to account for violations of the exit mass constraint (b). Two penalty function approaches were considered, the additive penalty method (APM) and the multiplicative penalty method (MPM):

$$
\begin{aligned}
F(X) &= f(X) + p(X) && \text{(APM)} \\
F(X) &= f(X)\,[1 + p(X)] && \text{(MPM)} \\
p(X) &= 10^{6} \times \max[0,\; g(X) - b]
\end{aligned}
\tag{3}
$$

where p(X) is a penalty function that assigns a cost of $10⁶ per g/m² of excess exit mass. As implied by Eq. (2), each particle has limited memory and keeps track of a 'personal' best solution (Xp with objective function Fp) and the overall best solution of the entire swarm (Xg and Fg). As a result, and with respect to previously evaluated candidate solutions, particle movement is only influenced by the personal and global best solutions. For minimization problems Fp ≥ Fg and, for a given particle, candidate solutions that are inferior to the personal best will have no influence on particle movement. Therefore, the pre-emption concept discussed in Section 2.2.4 can be harmonized with particle swarm optimization by assigning a separate cost-based pre-emption threshold for each particle, corresponding to the current personal best solution of each particle (i.e. fPET = Fp). The mass flux threshold for each particle is simply assigned as the maximum allowable mass flux (b), assumed to be 5 × 10⁻⁶ g/m² for this study.

In summary, and for a given candidate liner Xc, the pre-emption concept as applied to PSO operates as a two-step procedure. First, the un-penalized raw cost (fc) of the liner is computed and if fc > Fp then cost-based pre-emption is employed and the simulation engine is not invoked for the candidate solution. Next, if fc < Fp, then the simulation engine is invoked and intermediate values of the cumulative exit mass and associated augmented cost function (Fc) are continuously monitored. During simulation, if Fc > Fp then the simulation is pre-empted (i.e. terminated early).

2.4. Hybrid discrete-dynamically dimensioned search (HD-DDS) algorithm

As mentioned previously, the HD-DDS algorithm was recently introduced for solving discrete water distribution system (WDS) design problems (Tolson et al., 2009). The algorithm combines multi-start global and local search strategies to define a powerful yet simple optimizer. HD-DDS begins with a global search step based on the discrete Dynamically Dimensioned Search (discrete DDS) algorithm, a discrete-variable adaptation of the continuous DDS algorithm originally introduced in Tolson and Shoemaker (2007). DDS is a simple algorithm that has only one algorithm parameter with a well-established default value. The algorithm automatically adapts the search to maximize performance within a user-specified maximum number of objective function evaluations. Since DDS is the core of the HD-DDS algorithm, an overview of the algorithm is warranted.
Briefly, DDS is a direct (derivative-free) search method that stochastically searches around the best solution (decision variable values) identified so far. Candidate solutions are compared to the current best solution to determine if an update is required. DDS starts by searching globally and transitions to a more local search as the number of objective function evaluations approaches the user-specified maximum. The adjustment from global to local search is achieved by dynamically and probabilistically reducing the number of dimensions in the search neighborhood (i.e. the set of decision variables modified from their best value).

Like DDS, in the HD-DDS algorithm candidate solutions are sampled based only on the current best solution and the candidate and current best solutions are compared to determine which is best. This approach instills the HD-DDS algorithm with an intrinsic cost-based pre-emption behavior: a candidate solution with higher costs than a feasible current best solution will not impact the remainder of the search no matter what the corresponding simulation-model based constraint violation status turns out to be. In addition, this aspect of HD-DDS makes handling constraints straightforward without the need for penalty function parameters. Following Deb (2000), the HD-DDS approach defines the objective function such that any infeasible solution always has an objective function value that is worse than the objective function value of any feasible solution. Furthermore, and also following Deb (2000), HD-DDS quantifies the relative magnitude of constraint violations for infeasible solutions so that the relative quality of two infeasible solutions can be compared such that the solution closest to feasibility is deemed to be best.

HD-DDS is a hybrid algorithm in that individual trials of the discrete DDS algorithm are augmented with a sequence of efficient enumerative local search polishing steps. 
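The feasibility-first comparison and the intrinsic cost-based pre-emption described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the HD-DDS source code: the function names are hypothetical, the violation magnitude is assumed to be max(0, g(X) - b), and resolving ties in favour of the candidate follows what the paper reports for DDS (extending that tie rule to equal violations is an assumption).

```python
# Hypothetical sketch of Deb-style constraint handling with cost pre-emption.

def violation(exit_mass, b):
    """Assumed violation magnitude: zero whenever g(X) <= b."""
    return max(0.0, exit_mass - b)


def candidate_wins(cand_cost, cand_viol, best_cost, best_viol):
    """Return True if the candidate should replace the current best."""
    cand_feasible = cand_viol == 0.0
    best_feasible = best_viol == 0.0
    if cand_feasible and best_feasible:
        return cand_cost <= best_cost   # both feasible: lower cost wins
    if cand_feasible != best_feasible:
        return cand_feasible            # feasible always beats infeasible
    return cand_viol <= best_viol       # both infeasible: closer to feasible


def needs_simulation(cand_cost, best_cost, best_viol):
    """Cost-based pre-emption: skip g(X) when it cannot change the search.

    A candidate that costs more than a feasible current best loses whether
    or not it turns out to be feasible, so the simulation can be avoided.
    """
    return not (best_viol == 0.0 and cand_cost > best_cost)


b = 5e-6  # g/m2, the exit-mass limit used in the benchmark problems
print(candidate_wins(100.0, violation(4e-6, b), 12.0, violation(8e-6, b)))
print(needs_simulation(13.0, 12.0, 0.0))
```

No penalty weights appear anywhere in the comparison, which is the practical advantage the text attributes to this style of constraint handling.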
The so-called L1 search polishes a given discrete DDS solution by enumerating all lower cost solutions that differ from the current solution in only one decision variable. The second polishing step, the so-called L2 search, starts at a feasible solution and enumerates over all lower cost solutions that differ by only two decision variables, where one decision variable is increased (cost increases) and the other decision variable is decreased (cost decreases). In both polishing approaches, the current best solution is updated if a lower cost, yet still feasible, solution is found. Unlike the completely general implementation described above, the L2 polishing step in the original HD-DDS algorithm was implemented specifically for water distribution network design problems.

Assuming the current best solution is feasible, the HD-DDS algorithm can take advantage of both cost and model-based pre-emption. For cost-based pre-emption, the pre-emption threshold is simply set equal to the current best solution plus 10⁻⁶ (this small added value is required to ensure HD-DDS functions the same with pre-emption since, in the event of exact ties between the candidate and current best solution in DDS, the algorithm changes the current best solution to the newly evaluated but equivalent quality candidate solution). For model-based pre-emption, the pre-emption threshold is assigned a value of 5 μg/m² according to the constraints specified for the selected benchmark problems (see Section 2.2.2).

2.5. Numerical experiments

Three sets of numerical experiments were performed using the benchmark problems. The first set of experiments examined the use of a pre-emptive PSO algorithm (pePSO). For each of the three benchmark problems, multiple trials of an ordinary PSO implementation were performed and the results were compared against multiple trials of a pre-emptive implementation. Both the ordinary and pre-emptive implementations were configured to use identical algorithm settings. That is, the random seeds, the swarm size, number of generations, constriction factor, inertia weight, cognitive weight, and social weight were the same for corresponding PSO and pePSO experiments.

The next set of numerical experiments examined the performance of the HD-DDS algorithm when applied to the benchmark problems. Multiple HD-DDS trials under the default parameter setting were performed for each benchmark problem and the results were compared to those generated via multiple trials of several alternative heuristic optimization algorithms: a binary-coded genetic algorithm (BGA), a discrete-value simulated annealing algorithm (DSA), and the aforementioned ordinary and pre-emptive PSO implementations. Algorithm parameters for the DSA, BGA and PSO algorithms were assigned using 'tuned' values that were previously established for the selected landfill liner problems (Matott et al., 2006). The penalty function approach to constraint handling used for PSO (see Eq. (3)) was also used for DSA and BGA and the penalty function parameters were also tuned previously. The maximum computational budget input to each of the DSA, BGA and PSO algorithms was also tuned previously and as such, the maximum budget specified in this study varies depending on the algorithm and the benchmark problem. Overall, these prior tuning experiments required a massive computational effort to tune the DSA, BGA and PSO algorithms. However, as that tuning effort was exploratory and designed for appraisal of a new tuning methodology, the associated expense is not included in the results reported here. HD-DDS performance was evaluated at two maximum computation budgets, M = 500 and M = 2000 objective function evaluations per optimization experiment. 
Note that since HD-DDS adapts its search behavior according to the budget (i.e. M), two separate optimization trials were required to evaluate HD-DDS for the two different computational budgets. For the shorter HD-DDS experiment (M = 500) the pre-emption enabled HD-DDS algorithm was compared against HD-DDS with no cost- or model-based pre-emption to estimate computational savings attributable to pre-emption. As results will demonstrate, the utilized computational budget in HD-DDS can be different from the maximum budget specified. For example, the utilized budget can be much smaller than the pre-set maximum budget when the HD-DDS algorithm quickly completes all steps of the search and terminates early. The utilized budget in HD-DDS can also be just slightly more than the maximum budget because HD-DDS is not coded to check the progress against the maximum budget before every single objective function evaluation. All computational budgets reported for HD-DDS are the utilized computational budget (i.e. how long users would wait for a final answer from HD-DDS).

The numerical experiments described previously, with the exception of HD-DDS, utilized the initially proposed layer composition approach to encoding decision variables (see Section 2.2.2). The final set of numerical experiments examined the use of a more objective approach that encodes the decision variables according to ascending values of material cost. The L1 and L2 phases of HD-DDS are coded specifically for the case where decision variables are encoded according to ascending values of costs. A small program called ReMap was created to convert from this material cost encoding (which was used by the optimizers) to the layer composition approach required by the nZones program. Using ReMap as an nZones preprocessor, the previous optimization experiments were repeated using the alternative encoding.
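The internals of ReMap are not described in the paper, so the following Python sketch is only a plausible illustration of the conversion it performs, built from the Coded Value and Sorted Value columns of Table 1. The names `SORTED_TO_CODED`, `CODED_TO_SORTED` and `remap_to_coded` are hypothetical.

```python
# Hypothetical sketch of a ReMap-style preprocessor; not the actual program.
# Keys are Sorted Values (ascending material cost), values are Coded Values
# (layer composition), both taken from Table 1.
SORTED_TO_CODED = {
    1: 14, 2: 13, 3: 7, 4: 8, 5: 9, 6: 10, 7: 1,
    8: 11, 9: 4, 10: 12, 11: 2, 12: 5, 13: 3, 14: 6,
}
# Inverse lookup, useful for reporting results in the cost-sorted encoding.
CODED_TO_SORTED = {coded: srt for srt, coded in SORTED_TO_CODED.items()}


def remap_to_coded(sorted_design):
    """Convert a cost-sorted design vector to the coded values nZones expects."""
    return [SORTED_TO_CODED[value] for value in sorted_design]


# An optimizer working in the cost-sorted encoding proposes this design:
print(remap_to_coded([2, 2, 3, 6, 1, 1]))  # [13, 13, 7, 10, 14, 14]
```

Because the mapping is a one-to-one relabeling, the two encodings describe exactly the same set of designs; only the neighborhood structure seen by the optimizer changes.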

3. Results and discussion

A total of 216 optimization experiments (4 algorithms × 9 trials per algorithm × 2 encoding methods × 3 optimization problems) were completed for the numerical experiments described in Section 2.5 for the PSO, pePSO, DSA and BGA algorithms. An additional 81 optimization experiments (3 different HD-DDS experiments × 9 trials per budget × 1 encoding method × 3 optimization problems) were completed for HD-DDS. Nine optimization trials (replicates) were selected for the current study under the assumption that this would be sufficient to characterize the central tendency of a given algorithm. In general, multiple replicates are required since, for stochastic optimization algorithms, each optimization result is a sample result from the statistical population of possible results. Using an odd number of replicates yields a unique median trial (and associated design variables) and is therefore more convenient than using an even number of replicates.

A performance summary expressed in terms of the efficiency, effectiveness and variability of all the considered algorithms and encoding methods is given in Table 3 through Table 5 for the benzene, TCE, and 1,2-DCB problems, respectively. Additionally, Fig. 1 contains plots of the cumulative distributions of best cost results across all trials for selected algorithms. To maintain consistency across algorithms, the efficiency of

Table 3. Summary of optimization results for the benzene problem (note: cost values have been rounded to two decimal places).

| Algorithm | Effectiveness (a) | Efficiency (b) | Variability (c) |
|---|---|---|---|
| Design variables encoded using layer composition approach | | | |
| PSO | 41.75 (0%) | 10101 | 41.74 to 49.33 |
| pePSO (d) | 41.75 (0%) | 3910 | 41.74 to 49.33 |
| HD-DDS | NA | NA | NA |
| BGA | 41.74 (33%) | 10101 | 41.62 to 41.98 |
| DSA | 49.33 (0%) | 3309 | 41.86 to 49.57 |
| Design variables encoded using material cost approach | | | |
| PSO | 49.33 (0%) | 10101 | 41.74 to 49.56 |
| pePSO (d) | 49.33 (0%) | 6784 | 41.74 to 49.56 |
| HD-DDS (500) | 41.74 (0%) | 518 | 41.74 to 50.20 |
| peHD-DDS (500) | 41.74 (0%) | 425 (e) | 41.74 to 50.20 |
| peHD-DDS (2000) | 41.74 (11%) | <1350 (f) | 41.62 to 42.73 |
| BGA | 41.74 (11%) | 10101 | 41.62 to 41.74 |
| DSA | 41.86 (11%) | 5122 | 41.62 to 49.33 |

(a) Median of best costs over 9 trials; % is the frequency with which the best known solution was found.
(b) Computational budget in terms of number of solutions fully evaluated during the search (constraint + cost).
(c) Range of best costs over 9 trials.
(d) PSO and pePSO used identical random seeds for corresponding trials; the same holds for pairs of HD-DDS and peHD-DDS.
(e) Computed using Eq. (4).
(f) Worst-case (non pre-emption) value (see Section 3 for discussion).
NA: not available.

the pre-emption enabled algorithms is reported as the effective number of model evaluations (or solutions fully evaluated) required and is computed from the observed computational time savings achieved via pre-emption in comparison to the algorithm with no pre-emption as follows:

$$
N_{peAlg} = \left(\frac{t_{peAlg}}{t_{Alg}}\right) \times N_{Alg}
\tag{4}
$$

where NpeAlg is the effective number of model evaluations required by the pre-emption enabled algorithm (in this paper, this includes only PSO and HD-DDS); tpeAlg and tAlg are the computation time (in seconds) required by the pre-emptive and ordinary algorithms, respectively; and NAlg is the number of model evaluations required by the ordinary algorithm.

In Tables 3–5, algorithm efficiency is the computational budget utilized by each algorithm and is measured in terms of the total number of solutions evaluated (objective and constraint evaluation) by the algorithm over the course of the optimization. Since the PSO, BGA and DSA algorithms were not formulated to use convergence criteria, they always ran until their pre-determined user input computational budget was exhausted. As noted earlier, the number

Table 4. Summary of optimization results for the TCE problem (note: cost values have been rounded to two decimal places).

| Algorithm | Effectiveness | Efficiency | Variability |
|---|---|---|---|
| Design variables encoded using layer composition approach | | | |
| PSO | 16.26 (0%) | 4520 | 16.26 to 23.74 |
| pePSO | 16.26 (0%) | 1526 | 16.26 to 23.74 |
| HD-DDS | NA | NA | NA |
| BGA | 8.90 (22%) | 2551 | 8.79 to 16.37 |
| DSA | 23.73 (11%) | 2152 | 8.79 to 23.86 |
| Design variables encoded using material cost approach | | | |
| PSO | 16.26 (11%) | 3547 | 8.79 to 23.85 |
| pePSO | 16.26 (11%) | 1372 | 8.79 to 23.85 |
| HD-DDS (500) | 8.79 (45%) | 374 | 8.79 to 16.26 |
| peHD-DDS (500) | 8.79 (45%) | 115 | 8.79 to 16.26 |
| peHD-DDS (2000) | 8.79 (67%) | <1065 | 8.79 to 16.26 |
| BGA | 8.79 (89%) | 2551 | 8.79 to 16.26 |
| DSA | 8.79 (56%) | 2152 | 8.79 to 16.26 |


of solutions evaluated by HD-DDS is not always precisely equal to the user input computational budget. For the selected benchmark problems, significantly fewer solutions were actually evaluated than the maximum allowable limit for HD-DDS with M = 2000.
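As a worked illustration of Eq. (4), the short Python sketch below converts a measured runtime ratio into an effective number of model evaluations and expresses the result as a fractional saving. The function names are hypothetical and the timings in the first example are invented purely for illustration; only the evaluation counts in the second example (10101 for PSO and 6784 for pePSO) are taken from Table 3.

```python
# Hypothetical helpers illustrating Eq. (4); timings below are made up.

def effective_evaluations(t_pe_alg, t_alg, n_alg):
    """Eq. (4): N_peAlg = (t_peAlg / t_Alg) * N_Alg."""
    if t_alg <= 0:
        raise ValueError("t_alg must be positive")
    return (t_pe_alg / t_alg) * n_alg


def fractional_savings(n_pe_alg, n_alg):
    """Fraction of the ordinary algorithm's budget avoided via pre-emption."""
    return 1.0 - n_pe_alg / n_alg


# Illustrative timings only: a pre-emptive run taking 400 s versus 1000 s
# for the ordinary run with 10101 model evaluations.
print(effective_evaluations(400.0, 1000.0, 10101))

# Table 3 (benzene, material cost encoding): PSO 10101 vs pePSO 6784.
print(round(fractional_savings(6784, 10101), 2))  # 0.33
```

Reporting efficiency this way keeps pre-emptive and ordinary runs on a common scale, at the cost of tying the pre-emptive figure to measured wall-clock time.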

Table 5. Summary of optimization results for the 1,2-DCB problem (note: cost values have been rounded to two decimal places).

| Algorithm | Effectiveness | Efficiency | Variability |
|---|---|---|---|
| Design variables encoded using layer composition approach | | | |
| PSO | 16.95 (0%) | 1051 | 16.83 to 41.56 |
| pePSO | 16.95 (0%) | 342 | 16.83 to 41.56 |
| HD-DDS | NA | NA | NA |
| BGA | 24.19 (0%) | 1051 | 16.94 to 24.31 |
| DSA | 24.20 (11%) | 5122 | 16.65 to 24.49 |
| Design variables encoded using material cost approach | | | |
| PSO | 24.20 (0%) | 1051 | 16.84 to 24.31 |
| pePSO | 24.20 (0%) | 527 | 16.84 to 24.31 |
| HD-DDS (500) | 16.66 (56%) | 419 | 16.66 to 24.19 |
| peHD-DDS (500) | 16.66 (56%) | 126 | 16.66 to 24.19 |
| peHD-DDS (2000) | 16.66 (100%) | <1102 | 16.66 |
| BGA | 16.78 (44%) | 1051 | 16.66 to 24.19 |
| DSA | 16.83 (22%) | 5122 | 16.66 to 16.83 |

3.1. Comparison of PSO vs. pePSO and HD-DDS vs. peHD-DDS

Tables 3–5 compare the efficiency, effectiveness and variability of the pre-emption enabled and ordinary PSO and HD-DDS algorithms. As described earlier, the pre-emption concepts applied here are deterministic in that their use has no impact on the algorithm result; pre-emption only functions to enhance efficiency. Nonetheless, experiments were repeated and all three measures are reported simply to demonstrate empirically that algorithm results (variability and effectiveness) are the same with or without pre-emption. Examination of the PSO and pePSO entries in Tables 3–5 suggests that the computational savings achievable via pre-emption range from 33% to 67%, with the variation due to both the problem (i.e. pollutant) and the decision variable encoding technique. For example, computational savings for the benzene problem are

[Fig. 1: three panels (Benzene, TCE, 1,2-DCB), each plotting Probability of Equal or Better Solution (0.0 to 1.0) against Cost ($/m²) for HD-DDS (2000), BGA, DSA and PSO.]

Fig. 1. Empirical cumulative distribution functions of the best cost based on 9 trials for various algorithms. HD-DDS(2000) e Hybrid Discrete Dynamically Dimensioned Search algorithm with a budget of 2000 function evaluations; BGA e Binary Coded Genetic algorithm; DSA e Discrete Simulated Annealing algorithm; PSO e Particle Swarm Optimization. All PSO results and the BGA/Benzene application correspond to a layer composition approach to design variable encoding. All other results correspond to the material cost approach to design variable encoding. The actual number of solutions evaluated for all algorithms varies and is reported as the efficiency in Tables 3e5.


approximately 61% for the layer composition approach (computed as [1 − (3910 ÷ 10101)]) and 33% for the material cost approach (computed as [1 − (6784 ÷ 10101)]) when using pePSO instead of a standard PSO. However, savings for the TCE (Table 4) and 1,2-DCB (Table 5) problems were somewhat less dependent on the decision variable encoding method, with savings of 66% (TCE) and 67% (1,2-DCB) for the layer composition approach and 61% (TCE) and 50% (1,2-DCB) for the material cost approach. In absolute terms, even the most modest result (i.e. 33% savings for pePSO applied to the benzene problem and using a material cost encoding approach) yielded a dramatic reduction in compute time: for that problem, using pre-emption reduced the average computational cost for a given trial by 14 h (i.e. from 43 h to 29 h). This highlights the usefulness of model pre-emption for even moderately computationally expensive problems. Similar to the PSO results described above, Tables 3–5 demonstrate that at a budget of M = 500 for HD-DDS, computational savings via pre-emption are substantial (an average of 52%) and dependent on the problem. For example, HD-DDS pre-emption savings are 18% (Benzene), 69% (TCE) and 70% (1,2-DCB). HD-DDS pre-emption savings were not evaluated for the budget of M = 2000; at this computational budget, only the pre-emption enabled version of HD-DDS was applied. Our previous experience with DDS and pre-emption suggests that pre-emption savings would be somewhat different for a computational budget of M = 2000.

3.2. Performance of HD-DDS

Relative to the considered alternatives (i.e. BGA, DSA, PSO and pePSO) and across all problems, the HD-DDS algorithm consistently yielded significant improvements in one or more performance categories (i.e. effectiveness, computational efficiency, and variability).

For the benzene problem, HD-DDS (M = 2000) had the best effectiveness (both in terms of median cost and frequency of identifying the best known solution) compared to the four alternatives, despite the fact that the four alternatives utilized a computational budget that was between 2 and 7 times larger than HD-DDS with M = 2000. Performance of the reduced budget HD-DDS with pre-emption (M = 500) was also good relative to the four alternatives, with comparable effectiveness and only a slight increase in variability. This performance advantage is observed despite the fact that the four alternatives utilized a computational budget that was between 6 and 24 times larger than HD-DDS with M = 500. Similarly impressive results for HD-DDS were observed for the TCE problem, with the only difference being that HD-DDS with M = 2000 showed the same effectiveness as the BGA, but the BGA showed a slightly higher frequency of returning the best known solution (89% compared to 67% for HD-DDS). However, it is worth noting that the BGA required approximately twice the computational budget utilized by HD-DDS (M = 2000). In terms of solving the 1,2-DCB problem with a small computational budget (i.e. <1000 solutions evaluated), HD-DDS with M = 500 shows improved performance in all categories (i.e. both measures of effectiveness, as well as efficiency and variability) relative to the four alternative algorithms. Applying a larger computational budget (i.e. >1000 solutions evaluated) to the 1,2-DCB problem (i.e. HD-DDS with M = 2000) resulted in all 9 trials converging on the best known solution. In comparison, the next-best (non-HD-DDS) algorithm for the 1,2-DCB problem in terms of frequency of returning the best known solution was the BGA, which yielded the best known solution in only 44% of the trials.

3.3. Algorithm performance analyzed using cumulative distribution functions

As discussed previously in Section 2.2.5, empirical CDF plots are convenient for visually comparing the performance of alternative algorithms. Selected CDF plots are presented in Fig. 1 and these correspond to the best-performing (in terms of effectiveness) results for a given algorithm. For example, the PSO algorithm yielded its best performance when paired with the layer composition approach to design variable encoding and the corresponding trials were selected for inclusion in the plots. Similar reasoning led to the inclusion of HD-DDS (2000) results and selected BGA and DSA results. Examination of the CDF plots suggests that the PSO algorithm resulted in the greatest variability across trials, with numerous higher-cost outliers being evident in the corresponding Benzene and 1,2-DCB solutions. Furthermore, the failure of the PSO algorithm to converge on an optimal or even nearly-optimal solution to the TCE problem is also readily evident. Conversely, and with the exception of DSA as applied to the Benzene problem, the other algorithms exhibited less variability and have a high likelihood of converging on a solution that is at least nearly optimal. These results suggest that care must be taken when using the PSO algorithm on these types of problems as there is a potential for convergence on significantly inferior solutions.
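To make the quantity on the vertical axis of Fig. 1 concrete, the sketch below builds an empirical CDF from the best cost found in each trial, so that F(c) is the fraction of trials returning a cost equal to or better (lower) than c. The nine cost values are hypothetical and serve only to illustrate the calculation; the helper name `ecdf` is likewise an assumption of this example.

```python
from bisect import bisect_right


def ecdf(best_costs):
    """Return F, where F(c) is the fraction of trials with best cost <= c."""
    ordered = sorted(best_costs)
    n = len(ordered)
    if n == 0:
        raise ValueError("need at least one trial")

    def probability_equal_or_better(cost):
        # bisect_right counts how many trial costs are <= cost.
        return bisect_right(ordered, cost) / n

    return probability_equal_or_better


# Hypothetical best costs ($/m2) from 9 independent trials (illustrative only).
trial_costs = [8.79, 8.79, 8.79, 8.79, 8.79, 8.79, 16.26, 16.26, 16.26]
F = ecdf(trial_costs)

for cost in (8.0, 8.79, 12.0, 16.26):
    print(f"P(best cost <= {cost}) = {F(cost):.2f}")
```

Because minimization is assumed, a curve that lies higher at every cost level indicates the preferable algorithm.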
The CDFs shown in Fig. 1 are compared (pairwise) to determine if they are significantly different using the two-sample, two-sided Kolmogorov–Smirnov (KS) test (see Gibbons and Chakraborti, 1992) and results are summarized in Table 6. This non-parametric test was selected because the probability distribution of best objective function values returned by any optimization algorithm for any of the problems solved in this paper is unknown, discrete and clearly not normal. The non-parametric Wilcoxon (or Mann–Whitney) rank sum test has been used for comparing the performance of two optimization algorithms (Hadka and Reed, in press; Tang et al., 2007) but it is not applicable to discrete random variables. The null hypothesis of the two-sided KS test is that the CDFs of the continuous random variables are equal, while the alternative hypothesis is that they are unequal. The significance level of this test is calculated under the assumption that the random variables are continuous, and this continuous significance level is an upper bound estimate (conservative) for the actual discrete significance level (Goodman, 1954; Noether, 1963; Walsh, 1963).
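A minimal sketch of a two-sample, two-sided KS comparison is given below. It computes the KS statistic D (the largest vertical gap between two empirical CDFs) and an approximate P-value from the asymptotic Kolmogorov series for continuous variables. The function name `ks_two_sample`, the use of the plain asymptotic formula, and the two cost samples are assumptions of this example; they are not necessarily the procedure or data behind Table 6, and with only 9 trials per algorithm the asymptotic P-value should be read as approximate.

```python
import math
from bisect import bisect_right


def ks_two_sample(sample_a, sample_b, terms=100):
    """Return (D, p) for the two-sample, two-sided KS test (asymptotic p-value)."""
    a, b = sorted(sample_a), sorted(sample_b)
    n, m = len(a), len(b)
    if n == 0 or m == 0:
        raise ValueError("both samples must be non-empty")
    # D is the largest absolute difference between the two empirical CDFs,
    # which can only occur at one of the observed values.
    d = max(abs(bisect_right(a, x) / n - bisect_right(b, x) / m) for x in a + b)
    if d == 0.0:
        return 0.0, 1.0
    lam = d * math.sqrt(n * m / (n + m))
    # Kolmogorov series: Q(lam) = 2 * sum_{j>=1} (-1)^(j-1) * exp(-2 j^2 lam^2)
    series = sum(
        (-1) ** (j - 1) * math.exp(-2.0 * (j * lam) ** 2)
        for j in range(1, terms + 1)
    )
    return d, min(1.0, max(0.0, 2.0 * series))


# Hypothetical best costs ($/m2) from 9 trials of two algorithms (illustrative only).
costs_a = [8.79] * 6 + [16.26] * 3
costs_b = [16.26] * 6 + [23.74] * 3

d_stat, p_value = ks_two_sample(costs_a, costs_b)
significant_at_10_percent = p_value < 0.10
print(f"D = {d_stat:.3f}, approximate P-value = {p_value:.4f}")
```

As in Table 6, a significant result only says that the two CDFs differ; deciding which algorithm is preferred still requires inspecting the CDFs themselves.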

Table 6 P-values of pairwise algorithm comparison results (comparing only the CDFs shown in Fig. 1) based on the two-sample Kolmogorov–Smirnov test. Bolded P-values highlight CDFs that are significantly different at the 10% level. Italicized algorithm below each bolded P-value indicates the clearly preferred algorithm result (see Fig. 1).

BGA

DSA

PSO

HD-DDS (2000)

Benzene

TCE

1,2-DCB

Benzene

TCE

1,2-DCB

Benzene

TCE

1,2-DCB

Benzene

TCE

1,2-DCB

BGA

e

e

e

0.603

0.958

e

e

e

e

e

0.001 BGAa 0.603

0.001 BGAa 0.001 DSAa

0.078 BGAa 0.004 DSAa

0.958

DSA

0.077 BGAa e

0.001 BGAa 0.958

PSO

e

e

e

e

e

e

0.078 HD-DDS 0.019 HD-DDS 0.0001 HD-DDS

a Indicates preferred algorithm utilized >10% more computational budget than alternative algorithm in pairwise comparison.

0.078 HD-DDS 0.001 HD-DDS

0.958


Based on the P-values in Table 6, it is clear that many of the differences between algorithm CDFs are statistically significant (at the 10% significance level). The nature of these significant CDF differences (i.e. whether one algorithm is clearly preferred over another) can only be ascertained by visually comparing their CDFs. In all our pairwise comparisons yielding a significant difference between algorithm results, the preferred result is always clear because, for any desired cost level, one CDF (the probability of finding an equal or better cost solution) is always equal to or higher than the CDF of the alternative result (e.g., see BGA vs. PSO in the TCE problem in Fig. 1). Table 6 indicates that the results of the BGA and HD-DDS algorithms are usually significantly different from (and preferred over) the PSO and DSA algorithm results. In all cases where the BGA results were preferred, BGA utilized substantially more computational budget to solve the problem in comparison with the alternative algorithms.

A few notes regarding the use of statistical hypothesis tests for comparing algorithms are warranted. Ideally, when comparing algorithms, it is best to equalize their computational budgets. Otherwise, some of the significant differences may arise from an algorithm having substantially more time to optimize a given problem (e.g., see BGA algorithm in Table 6). We have demonstrated only one of many potential statistical tests for comparing two optimization algorithms.

3.4. Influence of design variable encoding on algorithm performance

As shown in Tables 3–5, the choice of encoding for the discrete-valued layer parameters significantly influenced the performance of several of the considered algorithms. For example, the DSA and BGA algorithms were generally more effective when the material cost method of encoding was used.
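The difference between the two encodings can be sketched as follows. The example assumes, as our reading of the text, that the layer composition approach indexes layer options grouped by amendment type, whereas the material cost approach indexes the same options in order of increasing cost. The option labels, cost values and helper names are hypothetical and are not taken from the benchmark problems.

```python
# Hypothetical layer options: (label, amendment group, cost in $/m2 per layer).
# Listed grouped by amendment type, standing in for a layer composition ordering.
options = [
    ("sand only", "none", 0.5),
    ("shale 3%", "shale", 0.9),
    ("shale 6%", "shale", 1.3),
    ("GAC 3%", "GAC", 2.4),
    ("GAC 6%", "GAC", 4.3),
    ("organoclay 3%", "organoclay", 1.6),
    ("organoclay 6%", "organoclay", 2.7),
]


def encode(ordered_options):
    """Map integer decision-variable values 0..k-1 to layer options."""
    return dict(enumerate(ordered_options))


def max_cost_jump(encoding):
    """Largest cost change caused by moving the decision variable by one step."""
    costs = [encoding[i][2] for i in range(len(encoding))]
    return max(abs(nxt - cur) for cur, nxt in zip(costs, costs[1:]))


by_composition = encode(options)
by_cost = encode(sorted(options, key=lambda option: option[2]))

print("largest one-step cost jump, grouped by type:", max_cost_jump(by_composition))
print("largest one-step cost jump, ordered by cost:", max_cost_jump(by_cost))
```

With the cost-ordered encoding, a small change in a discrete decision variable always moves the cost in a predictable direction, which is one plausible reason why algorithms that perturb integer values locally could search such a space more effectively.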
Conversely, the effectiveness of the PSO and pePSO algorithms was reduced when using the material cost encoding approach instead of the layer composition approach. Since the PSO, DSA and BGA algorithm parameters were tuned based on the layer composition approach, we would expect to see even stronger results in favor of the material cost encoding approach if the PSO, DSA and BGA algorithms were tuned specifically to the material cost encoding approach.

4. Conclusions

This paper introduces a benchmarking framework, expressed as a series of 6 practical guidelines, for simulation-based optimization of environmental models. The framework is intended to address significant deficiencies in the more-or-less ad hoc approach to developing benchmark problems that is currently being employed within the environmental modelling community. The benchmarking framework was demonstrated by using it to develop a set of three benchmark problems for simulation-based optimization of sorptive barriers and a corresponding suite of example optimization algorithm performance comparisons. The benchmark problems and optimization results are both available for download from the companion web-site. The benchmark problems introduced here can be utilized by practitioners designing sorptive barriers either to make sure their preferred optimization algorithm is relatively effective or to select the best performing candidate from the list of previously applied algorithms. More generally, these problems can serve as benchmarks for researchers in global optimization developing algorithms to solve cost minimization problems with discrete decision variables and one or more constraints that are evaluated using the results of a numerical simulation model.

In addition to demonstrating the proposed guidelines for developing benchmark problems for simulation-based optimization, there are three additional findings of note in this work. First of all, the cost-based deterministic pre-emption concept from Tolson et al. (2009) was combined for the first time with the simulation model-based deterministic pre-emption concept first introduced by Razavi et al. (2010). Results show substantial computational savings of 33%–70% (with no change in optimization results) for the two pre-emption enabled algorithms (HD-DDS and PSO) relative to their ordinary versions with no pre-emption. In addition, the recently introduced HD-DDS algorithm was applied to the selected sorptive barrier design problems and compared with a suite of algorithms that have been previously applied to the same sorptive barrier problems (PSO, BGA, DSA). Even though it was applied using a default algorithm parameter setting, the simple HD-DDS algorithm generally outperformed tuned implementations of PSO, pePSO, BGA and DSA in almost all performance measures. For example, HD-DDS with a substantially smaller computational budget than PSO, pePSO, BGA or DSA demonstrated an improved effectiveness for the 1,2-DCB problem. For the Benzene and TCE problems, HD-DDS and BGA performance was very similar despite HD-DDS using between one-eighth (Benzene) and one-half (TCE) of the computational budget used by BGA. Lastly, and unlike previous studies on sorptive barrier design (Bartelt-Hunt et al., 2006; Lo, 1996; Painter, 2005) which defined a somewhat arbitrary order of decision variable options, results here show that defining decision variable options according to their costs generally improves the performance of the considered algorithms.

4.1. Broader implications

In other disciplines, suites of optimization benchmarks (many of which are tabulated at http://www.mat.univie.ac.at/~neum/glopt/test.html) are long-established and their common use provides a consistent basis for evaluating relative algorithm performance for various types of problems.
In our view, the environmental modelling community is a long way from assembling a comparable suite of benchmark environmental simulation–optimization problems each having a known level of problem difficulty. Only after careful and systematic benchmarking of individual problems can the community begin to empirically judge the difficulty of various problems and then form appropriate suites of test problems. As such, the guidelines for benchmark development proposed herein may be viewed as an important preliminary step in the process.

Acknowledgments

This work was performed in part at the University at Buffalo Center for Computational Research (UB CCR, www.ccr.buffalo.edu) and funded in part by Dr. Tolson’s NSERC Discovery Research Grant. The authors thank Dr. Shannon Bartelt-Hunt for approving the development of the benchmark problems and companion web-site. Optimization software used in this study is freely available and can be downloaded from www.groundwater.buffalo.edu (OSTRICH, which implements pePSO, PSO, BGA, and DSA among other algorithms) and www.civil.uwaterloo.ca/btolson/software.htm (HD-DDS).

References

Adams, B.J., Karney, B.W., Cormier, C.J., Lai, A., 1998. Artesian landfill liner system: optimization and numerical analysis. Journal of Water Resources Planning and Management 124 (6), 345–356.
Ali, M.M., Torn, A., Viitanen, S., 1997. A numerical comparison of some modified controlled random search algorithms. Journal of Global Optimization 11 (4), 377–385.
Barr, R., Golden, B., Kelly, J., Resende, M., Stewart, W., 1995. Designing and reporting on computational experiments with heuristic methods. Journal of Heuristics 1 (1), 9–32.
Bartelt-Hunt, S.L., Smith, J.A., Burns, S.E., Rabideau, A.R., 2005. Evaluation of granular activated carbon, shale and two organoclays for use as sorptive amendments in clay landfill liners. Journal of Geotechnical and Geoenvironmental Engineering 131 (7), 848–856.
Bartelt-Hunt, S.L., Culver, T.B., Smith, J.A., Matott, L.S., Rabideau, A.J., 2006. Optimal design of a compacted soil liner containing sorptive amendments. Journal of Environmental Engineering 132 (7), 769–776.
Baú, D.A., Mayer, A.S., 2008. Optimal design of pump-and-treat systems under uncertain hydraulic conductivity and plume distribution. Journal of Contaminant Hydrology 100 (1–2), 30–46.
Bayer, P., Finkel, M., 2004. Evolutionary algorithms for the optimization of advective control of contaminated aquifer zones. Water Resources Research 40 (W06506), 19 pp.
Beielstein, T., Parsopoulos, K.E., Vrahatis, M.N., 2002. Tuning PSO parameters through sensitivity analysis. Technical Report, Reihe Computational Intelligence CI 124/02. Collaborative Research Center, Department of Computer Science, University of Dortmund. Available online at: http://ls11-www.cs.uni-dortmund.de/people/tom/.
Burn, D.H., Yulianti, J.S., 2001. Waste-load allocation using genetic algorithms. Journal of Water Resources Planning and Management 127 (2), 121–129.
Carroll, R.W.H., Pohll, G.M., Hershey, R.L., 2009. An unconfined groundwater model of the Death Valley Regional Flow System and a comparison to its confined predecessor. Journal of Hydrology 373 (3–4), 316–328.
Collette, Y., Siarry, P., 2003. Multiobjective Optimization: Principles and Case Studies. Springer, Berlin, Germany.
Cools, J., Broekx, S., Vandenberghe, V., Sels, H., Meynaerts, E., Vercaemst, P., Seuntjens, P., Van Hulle, S., Wustenberghs, H., Bauwens, W., Huygens, M., 2011. Coupling a hydrological water quality model and an economic optimization model to set up a cost-effective emission reduction scenario for nitrogen. Environmental Modelling & Software 26 (1), 44–51.
Deb, K., 2000. An efficient constraint handling method for genetic algorithms. Computer Methods in Applied Mechanics and Engineering 186 (2–4), 311–338.
Draper, A.J., Jenkins, M.W., Kirby, K.W., Lund, J.R., Howitt, R.E., 2003. Economic-engineering optimization for California water management. Journal of Water Resources Planning and Management 129 (3), 155–164.
Duan, Q., Sorooshian, S., Gupta, V.K., 1992. Effective and efficient global optimization for conceptual rainfall-runoff models. Water Resources Research 28 (4), 1015–1031.
Erickson, M., Mayer, A., Horn, J., 2002. Multi-objective optimal design of groundwater remediation systems: application of the niched Pareto genetic algorithm (NPGA). Advances in Water Resources 25 (1), 51–65.
Finsterle, S., Zhang, Y., 2011. Solving iTOUGH2 simulation and optimization problems using the PEST protocol. Environmental Modelling & Software 26 (7), 959–968.
Finsterle, S., 2006. Demonstration of optimization techniques for groundwater plume remediation using iTOUGH2. Environmental Modelling & Software 21 (5), 665–680.
Foose, G.J., Benson, C.H., Edil, T.B., 2002. Comparison of solute transport in three composite liners. Journal of Geotechnical and Geoenvironmental Engineering 128 (5), 391–403.
Fowler, K.R., Reese, J.P., Kees, C.E., Dennis Jr., J.E., Kelley, C.T., Miller, C.T., Audet, C., Booker, A.J., Couture, G., Darwin, R.W., Farthing, M.W., Finkel, D.E., Gablonsky, J.M., Gray, G., Kolda, T.G., 2008. Comparison of derivative-free optimization methods for groundwater supply and hydraulic capture community problems. Advances in Water Resources 31 (5), 743–757.
Franz, K.J., Hogue, T.S., Sorooshian, S., 2008. Operational snow modeling: addressing the challenges of an energy balance model for National Weather Service forecasts. Journal of Hydrology 360 (1–4), 48–66.
Freeze, R.A., Gorelick, S.M., 1999. Convergence of stochastic optimization and decision analysis in the engineering design of aquifer remediation. Ground Water 37 (6), 934–954.
García, S., Molina, D., Lozano, M., Herrera, F., 2009. A study on the use of nonparametric tests for analyzing the evolutionary algorithms’ behaviour: a case study on the CEC’2005 special session on real parameter optimization. Journal of Heuristics 15, 617–644.
Gibbons, J.D., Chakraborti, S., 1992. Nonparametric Statistical Inference, third ed. Marcel Dekker, Inc., New York.
Gill, M.K., Kaheil, Y.H., Khalil, A., Mckee, M., Bastidas, L., 2006. Multiobjective particle swarm optimization for parameter estimation in hydrology. Water Resources Research 42 (7), W07417. doi:10.1029/2005WR004528.
Gitau, M.W., Veith, T.L., Gburek, W.J., Jarrett, A.R., 2006. Watershed level best management practice selection and placement in the Town Brook watershed, New York. Journal of the American Water Resources Association 42 (6), 1565–1581.
Goodman, L.A., 1954. Kolmogorov–Smirnov test for psychological research. Psychological Bulletin 51, 160–168.
Guan, J., Aral, M.M., 2004. Optimal design of groundwater remediation systems using fuzzy set theory. Water Resources Research 40 (1), W01518. doi:10.1029/2003WR002121.
Gullick, R.W., Weber, W.J., 2001. Evaluation of shale and organoclays as sorbent additives for low-permeability soil containment barriers. Environmental Science & Technology 35 (7), 1523–1530.
Hadka, D., Reed, P., in press. Diagnostic Assessment of Search Controls and Failure Modes in Many-Objective Evolutionary Optimization. Evolutionary Computation, doi:10.1162/EVCO_a_00053.
Herrera, F., Lozano, M., Molina, D., 2010. Test Suite for the Special Issue of Soft Computing on Scalability of Evolutionary Algorithms and Other Metaheuristics for Large Scale Continuous Optimization Problems. University of Granada, Granada, Spain. 12 p. Available from: http://sci2s.ugr.es/eamhco/updatedfunctions1-19.pdf.
Hilton, A.B.C., Culver, T.B., 2000. Constraint handling for genetic algorithms in optimal remediation design. Journal of Water Resources Planning and Management 126 (3), 128–137.
Jia, Y., Culver, T.B., 2006. Robust optimization for total maximum daily load allocations. Water Resources Research 42 (2), W02412.
Joslin, D., Dragovich, J., Hoa, V., Terada, J., 2006. Opportunistic fitness evaluation in a genetic algorithm for civil engineering design optimization. In: Proceedings of the 2006 IEEE Congress on Evolutionary Computation: Vancouver, BC, Canada, pp. 2904–2911.
Kalwij, I.M., Peralta, R.C., 2006. Simulation/optimization modeling for robust pumping strategy design. Ground Water 44 (4), 574–582.
Kennedy, J., Eberhart, R.C., 1995. Particle Swarm Optimization. Proceedings of the IEEE International Conference on Neural Networks, Piscataway, NJ, pp. 1942–1948.
Lo, I.M.-C., 1996. Optimization in thickness of a liner composed of Claymax® and organo-clay. Water Science and Technology 34 (7–8), 421–427.
Matott, L.S., Rabideau, A.J., 2010. NIGHTHAWK – A program for modeling saturated batch and column experiments incorporating equilibrium and kinetic biogeochemistry. Computers & Geosciences 36 (2), 253–256.
Matott, L.S., Bartelt-Hunt, S.L., Rabideau, A.J., Fowler, K.R., 2006. Application of heuristic optimization techniques and algorithm tuning to multi-layered sorptive barrier design. Environmental Science & Technology 40 (20), 6354–6360.
Mayer, A.S., Kelley, C.T., Miller, C.T., 2002. Optimal design for problems involving flow and transport phenomena in saturated subsurface systems. Advances in Water Resources 25 (8–12), 1233–1256.
Medellín-Azuara, J., Mendoza-Espinosa, L.G., Lund, J.R., Ramírez-Acosta, R.J., 2007. The application of economic-engineering optimization for water management in Ensenada, Baja California, Mexico. Water Science & Technology 55 (1–2), 339–347.
Moré, J.J., Wild, S.M., 2009. Benchmarking derivative-free optimization algorithms. SIAM Journal on Optimization 20 (1), 172–191.
Muleta, M.K., Nicklow, J.W., 2005. Sensitivity and uncertainty analysis coupled with automatic calibration for a distributed watershed model. Journal of Hydrology 306, 127–145.
Nicklow, J., Reed, P., Savic, D., Dessalegne, T., Harrell, L., Chan-Hilton, A., Karamouz, M., Minsker, B., Ostfeld, A., Singh, A., Zechman, E., 2010. State of the art for genetic algorithms and beyond in water resources planning and management. Journal of Water Resources Planning and Management 136 (4), 412–432.
Noether, G., 1963. Note on the Kolmogorov statistic in the discrete case. Metrika 7 (1), 115–116.
Painter, B.D.M., Milke, M.W., 2007. Comparison of factorial and scenario analysis methods for assessing uncertainty in the design of permeable reactive barriers. Ground Water Monitoring & Remediation 27 (3), 102–110.
Painter, B.D.M., 2005. Optimization of Permeable Reactive Barriers for the Remediation of Contaminated Groundwater, Natural Resource Engineering Department. Lincoln University, Canterbury, New Zealand.
Parno, M.D., Fowler, K.R., Hemker, T., 2009. Framework for Particle Swarm Optimization with Surrogate Functions. Darmstadt Technical University, Darmstadt, Germany. 11 p.
Peña-Haro, S., Pulido-Velazquez, M., Llopis-Albert, C., 2011. Stochastic hydro-economic modeling for optimal management of agricultural groundwater nitrate pollution under hydraulic conductivity uncertainty. Environmental Modelling & Software 26 (8), 999–1008.
Rabideau, A.J., 2003. MOUSER Version 1 User’s Manual. University at Buffalo, Department of Civil, Structural, and Environmental Engineering, Buffalo, NY. Available from: www.groundwater.buffalo.edu.
Rardin, R.L., Uzsoy, R., 2001. Experimental evaluation of heuristic optimization algorithms: a tutorial. Journal of Heuristics 7, 261–304.
Razavi, S., Tolson, B.A., Matott, L.S., Thomson, N.R., MacLean, A., Seglenieks, F.R., 2010. Reducing the computational cost of automatic calibration through model preemption. Water Resources Research 46 (11), W11523.
Regis, R.G., Shoemaker, C.A., 2007. Improved strategies for radial basis function methods for global optimization. Journal of Global Optimization 37 (1), 113–135.
Rehana, S., Mujumdar, P.P., 2009. An imprecise fuzzy risk approach for water quality management of a river system. Journal of Environmental Management 90 (11), 3653–3664.
Schaake, J., Duan, Q., Andréassian, V., Franks, S., Hall, A., Leavesley, G., 2006. The model parameter estimation experiment (MOPEX). Journal of Hydrology 320 (1–2), 1–2.
Slaughter, C.W., Marks, D., Flerchinger, G.N., Vactor, S.S.V., Burgess, M., 2001. Thirty-five years of research data collection at the Reynolds Creek experimental watershed, Idaho, United States. Water Resources Research 37 (11), 2819–2823.
Taillard, E.D., Waelti, P., Zuber, J., 2008. Few statistical tests for proportions comparisons. European Journal of Operational Research 185 (3), 1336–1350.
Tang, Y., Reed, P., Wagener, T., 2006. How effective and efficient are multiobjective evolutionary algorithms at hydrologic model calibration? Hydrology and Earth System Sciences 10 (2), 289–307.
Tang, Y., Reed, P.M., Kollat, J.B., 2007. Parallelization strategies for rapid and robust evolutionary multiobjective optimization in water resources applications. Advances in Water Resources 30 (3), 335–353.
Tolson, B.A., Shoemaker, C.A., 2007. Dynamically dimensioned search algorithm for computationally efficient watershed model calibration. Water Resources Research 43 (W01413).
Tolson, B.A., Asadzadeh, M., Maier, H.R., Zecchin, A., 2009. Hybrid discrete dynamically dimensioned search (HD-DDS) algorithm for water distribution system design optimization. Water Resources Research 45 (W12416).
Vasan, A., Simonovic, S.P., 2010. Optimization of water distribution network design using differential evolution. Journal of Water Resources Planning and Management 136 (2), 279–287.
Vrugt, J.A., Gupta, H.V., Bastidas, L.A., Bouten, W., Sorooshian, S., 2003. Effective and efficient algorithm for multiobjective optimization of hydrologic models. Water Resources Research 39 (8), 1214. doi:10.1029/2002WR001746.
Vrugt, J.A., Diks, C.G.H., Gupta, H.V., Bouten, W., Verstraten, J.M., 2005. Improved treatment of uncertainty in hydrologic modeling: combining the strengths of global optimization and data assimilation. Water Resources Research 41, W01017. doi:10.1029/2004WR003059.
Walsh, J., 1963. Bounded probability properties of Kolmogorov-Smirnov and similar statistics for discrete data. Annals of the Institute of Statistical Mathematics 15 (1), 153–158.
Wang, L., Fang, L., Hipel, K.W., 2008. Basin-wide cooperative water resources allocation. European Journal of Operational Research 190 (3), 798–817.
Weng, S.Q., Huang, G.H., Li, Y.P., 2010. An integrated scenario-based multi-criteria decision support system for water resources management and planning – A case study in the Haihe River Basin. Expert Systems with Applications 37 (12), 8242–8254.
Ximing, C., 2008. Implementation of holistic water resources-economic optimization models for river basin management – Reflective experiences. Environmental Modelling & Software 23 (1), 2–18.
Yandamuri, S.R.M., Srinivasan, K., Bhallamudi, S.M., 2006. Multiobjective optimal waste load allocation models for rivers using nondominated sorting genetic algorithm-II. Journal of Water Resources Planning and Management 132 (3), 133–143.
Yapo, P.O., Gupta, H.V., Sorooshian, S., 1998. Multi-objective global optimization for hydrologic models. Journal of Hydrology 204 (1–4), 83–97.
Zhang, X., Srinivasan, R., Zhao, K., Liew, M.V., 2009. Evaluation of global optimization algorithms for parameter calibration of a computationally intensive hydrologic model. Hydrological Processes 23 (3), 430–441.