Ecological Modelling 151 (2002) 177– 193 www.elsevier.com/locate/ecolmodel
FEMME, a flexible environment for mathematically modelling the environment Karline Soetaert *, Vanni deClippele, Peter Herman Centre for Estuarine and Coastal Ecology, Netherlands Institute of Ecology, P.O. Box 140, 4400 AC Yerseke, The Netherlands Received 23 April 2001; received in revised form 19 September 2001; accepted 18 October 2001
Abstract A new, FORTRAN-based, simulation environment called FEMME (Flexible Environment for Mathematically Modelling the Environment), designed for implementing, solving and analysing mathematical models in ecology is presented. Three separate phases in ecological modelling are distinguished: (1) the model formulation i.e. the choice and implementation of the constitutive relations of the mathematical model, (2) the choice of a numerical solution scheme and (3) the model application i.e. a particular run or set of runs of the model. The basic characteristic of FEMME is to keep these three phases independent. The software is structured in a highly modular fashion, allowing unlimited combination of each. The modeller can restrict attention mostly to the model formulation, while the numerical solution and model application can be specified at run time without programming effort. The object-oriented design strongly reduces redundancy in the code: the same set of solution procedures is linked to all models, and one model implementation can be used without extra coding in a variety of applications. FEMME contains many functional units, such as a diversity of integration routines, steady-state solvers, fitting routines, input and output facilities and allows running Monte Carlo or sensitivity analyses or performing food web analyses. The ease of interfacing model applications with external data facilitates running different scenarios or fitting the model to observations. © 2002 Elsevier Science B.V. All rights reserved. Keywords: Numerical modelling; Monte Carlo simulation; Sensitivity analysis; Data assimilation; Software frameworks; Coexistence model
1. Introduction Undoubtedly, mathematical modelling has produced acceleration in our understanding of natural processes. It allows to test hypotheses, to estimate ‘unmeasurable’ quantities, to obtain * Corresponding author. Tel.: +31-113-577-487; fax: + 31113-573616. E-mail address:
[email protected] (K. Soetaert).
closed mass or energy budgets or to signal inconsistencies in our comprehension of ecosystems (Jørgensen, 1994). Unfortunately, even for moderately complex systems, the efforts needed for making a mathematical model are often way out of balance with the objectives set. Compared to the time and effort spent in analysing the scientific problem and implementing the relevant equations, a disproportionate effort often goes into implementing
0304-3800/02/$ - see front matter © 2002 Elsevier Science B.V. All rights reserved. PII: S 0 3 0 4 - 3 8 0 0 ( 0 1 ) 0 0 4 6 9 - 0
178
K. Soetaert et al. / Ecological Modelling 151 (2002) 177–193
the basic modelling techniques, finding appropriate mathematical routines for solving a certain problem, and programming. The fact that ecological modelling requires combined expertise in ecology, mathematical techniques, and programming in some language may constitute a barrier for wider application of mathematical models in ecology. When devising a new ecological model, one generally starts by expressing the ecological problem (1) and formulating it in mathematical terms (2). Finding a proper mathematical solution is the next step (3), after which the model is implemented and run on a computer (4), its results inspected and interpreted in terms of ecological principles (5). From the point of view of the ecologist/modeller, phases (1), (2) and (5) are most relevant and exciting, yet steps (3) and (4) are usually the most time-consuming. Even if the basic equations are simple enough, the mathematical problems may turn out to be much more demanding than anticipated, and the use of improper solution routines may cause insufferable overhead or simply provide the wrong answers. An additional bottleneck is that all too often a model is designed with a specific application in mind. As a result the model problem, its solution and the application are closely intermingled in the computer code and it then becomes necessary to implement subtly changed new versions for every new application. Such approach may cause problems of internal consistency when modifications are not carried out on all the existing versions of a model. For instance, when models use external data (forcings, observations or initial conditions) and include code that is directly dependent on the structure of the data, it is impossible to link the model to a different data set without rewriting this part of the code.
2. Basic structure of FEMME FEMME has been designed to overcome these bottlenecks by: 1. Reducing code redundancy and preventing the proliferation of similar codes.
2. Enabling the modeller to focus on the highlevel process of modelling, and be shielded from the implementation details. 3. Providing a wide choice of methods for solving the model as efficiently as possible, and let the modeller decide upon the level of control over all the aspects of the solution procedures. 4. Providing a variety of techniques for analysing the behaviour of models. 5. Facilitating the interaction between the model and external data. The modelling environment is developed for modellers with a basic training in computer programming. As there are no crosschecks that the equations make sense, the units are appropriate and the proper mass balance is computed, the modeller must have knowledge on how to construct a valid set of differential equations. The design of models developed in FEMME (Fig. 1) is based on a distinction between three different aspects of the modelling process: (a) Model formulation. This encompasses the scientific analysis of the problem, leading to the definition of basic components (state variables, output variables, parameters and forcing functions) and the relationships existing between them in terms of mathematical equations. (b) Choice of a numerical solution scheme. Depending on the nature of the problem, on the desired accuracy and on the details of an application (as specified below), different numerical solution schemes may qualify as optimal choices. Flexibility in the choice of a solution scheme is required, but their implementation in the code is made independent of the model formulation step. Examples of such solution methods included in FEMME are integration methods, steady-state solvers and inverse modelling techniques for analysing food webs (Vezina and Platt, 1988; van Oevelen et al., in preparation). (c) Application of the model. This is the step where the model is actually run, and output is produced based on input in the form of forcing functions, parameter values, initial and observed data. One can distinguish different types of applications (e.g. a single run, Monte
K. Soetaert et al. / Ecological Modelling 151 (2002) 177–193
Carlo analysis, sensitivity run or model calibration), and each can be run using different input values. In FEMME, the three steps in model development and use are kept strictly independent and
179
freely combinable. Once a model is formulated and its code implemented, it can be solved with different (pre-programmed) solution schemes, and used for a variety of applications. The software reuse tenet (goal 1) is achieved by
Fig. 1. Basic setup of FEMME.
180
K. Soetaert et al. / Ecological Modelling 151 (2002) 177–193
linking the same FORTRAN code, the core of the environment, to all models implemented in FEMME. FEMME is allowed to take control over the simulation and orchestrates the various applications, without the need for extra coding. As such one set of subroutines solves all models, one model runs all applications (Fig. 1). To achieve goal 2, the model formulation is restricted to the specification (a) of the names, units and (for parameters) the values of the model components in a set of text files, and (b) of the functional relationships between these model components in a set of FORTRAN subroutines and functions. These are generally routines that specify the initial conditions and the dynamics of the system. The explicit declaration of model components in text files ensures proper documentation of the model. The implementation details (formal declarations), necessary to make them accessible in the FORTRAN subroutines and functions, are performed automatically. Technically speaking then, the final result of the model formulation step in FEMME is a FORTRAN programme, that communicates with a set of text files and that must be compiled using a FORTRAN 95 compiler, the modern equivalent of FORTRAN 77. FORTRAN was chosen as a programming language for several reasons. First of all, it is a well-structured language, relatively easy to learn, and programmers may benefit from the powerful and user-friendly environment that some compilers provide. In addition, it has all the basic qualities needed for implementing high-level and complex applications: the functionality of the language is virtually unlimited and the code is guaranteed to be fast. Because of that, it has been the language of choice for running e.g. complex 3-D hydrodynamic models. Coupling of models developed in FEMME to such hydrodynamic codes was one of the objectives. Most importantly though, there is a substantial scientific library available that is well tested and free to use and perfect for solving many problems associated with mathematical modelling. Indeed, some of FEMME’s more complex techniques, e.g. concerned with integration, or least distance programming (used for food web analysis) are based on such numerical codes (LINPACK: http://www.netlib.org/).
By mere specifying the most cost-effective routines, the model can be quickly tailored to run as efficiently as possible (goal 3). The mathematical routines may be complex, but the technicalities are hidden, and for each method, FEMME includes a default option. It is then up to the modeller to decide whether to override this default and use an alternative routine, by writing simple metacommands in a run specification file. After the model has been implemented and a method of solution chosen, the type of application can be chosen by specification of metacommands in a run specification file (goal 4). In these applications, the model can be linked to (a set of) files, containing external data for any of the model components (goal 5). Such external data are for instance used to initialise the model, as forcing functions or consist of observed data. In what follows, the structure of models developed in FEMME and how to trigger some of the possible applications will be demonstrated using a simple ecological model that describes coexistence in two Cladoceran species. Simple dynamic runs, steady-state calculations, Monte Carlo, calibration and sensitivity analysis will be performed. We will then shortly elaborate on the more advanced numerical features of the modelling environment, using examples where we have applied the tool for biogeochemical sediment modelling (Soetaert et al., 1996, 1998), and for coupled physical–ecological-diagenetic modelling (Soetaert et al., 2001) in the marine environment. FEMME has been designed as an ‘open source’ project, the whole package can be downloaded from http:// www.nioo.knaw.nl/cemo/femme and includes many more example models, as well as an extensive manual (Soetaert et al., 2000) explaining in full detail how to use the toolkit.
3. Example model: coexistence of two Cladoceran species This simple model describes the competitive interactions between two Cladoceran species, Daphnia galeata and Bosmina longirostris, grazing on the same food source (Fig. 2A) (Gurney and Nisbet, 1998). Both species have different maxi-
K. Soetaert et al. / Ecological Modelling 151 (2002) 177–193
181
Fig. 2. (A) Structure of the coexistence model of Gurney and Nisbet (1998), (B) grazing of Bosmina and Daphnia, as a function of food concentration, (C) structure of the variable and parameter declaration files of the model.
mal growth rates and different half-saturation constants, such that Daphnia is the superior competitor (higher net growth) at high food concentrations, whereas Bosmina achieves higher growth at low food concentrations (Fig. 2B). They are kept in a culture where they are transferred to new medium with a known amount of food at regular time intervals. The experimental conditions are such that there is no growth of the food in the medium. If the transfer regime is kept constant for a sufficiently long time, the system generally evolves towards a dynamic steady-state, where the periodicity equals the transfer time. It has been shown that under certain conditions of transfer regime and food concentrations, both species may coexist at steady-state, whereas in a different situation either one of the species prevails.
The model equations, expressing the rate of change in time, are simple: dDaphnia = IngestionDaphnia·AssDaphnia dt − RespirationDaphnia dBosmina = IngestionBosmina·AssBosmina dt − RespirationBosmina dFOOD = −IngestionDaphnia dt − IngestionBosmina at transfer time: FOOD = FoodInMedium
182
K. Soetaert et al. / Ecological Modelling 151 (2002) 177–193
where ingestion is limited by food concentration according to a hyperbolic function and respiration is simply first-order with respect to animal biomass. A fixed fraction (the assimilation fraction) of ingested material is converted to animal biomass. For Daphnia, respiration and ingestion are calculated as: IngestionDaphnia FOOD ·DAPHNIA FOOD +ks RespirationDaphnia =respDaphnia·DAPHNIA
= MaxIngDaphnia·
cifies the rate of change (derivative with respect to time) of Daphnia, Bosmina and Food (the three state variables), and calculates the values of the other variables. Transfer to new medium is accomplished using a built in function xTimeExpired() which returns true only if the time requested has expired. Finally, in the main PROGRAM the time unit of the model is set to hours (which is the time unit used for the parameter values) and control is passed to FEMME (CALL XSIMULATE).
3.2. First application, two simple dynamic runs 3.1. The model implementation in FEMME The implementation of this model in FEMME consists of a set of declaration files (plain ASCII text), specifying the model components and FORTRAN source code, describing the mathematical relationships. The model contains three state variables: Daphnia, Bosmina and Food and four ordinary variables: the ingestion and respiration of the two Cladocerans. These components, together with their units and a short description are declared in the variable declaration file (Fig. 2C). Six parameters determine the physiology of the animals: the respiration rate, the maximal ingestion rate and the half-saturation coefficient. Two additional parameters are used to initialise the concentration of the two species. Finally, the concentration of food in the transfer medium and the transfer time determine the set-up of the laboratory culture. The names of these parameters, together with their values and units are declared in the model’s parameter declaration file (Fig. 2C). The parameters and variables form the basic model components that can now be used in the FORTRAN subroutines. The FORTRAN source code that establishes the relationships between these model components consists of two subroutines (Fig. 3). In one subroutine (XSTART), the state variables are initialised, and the periodicity at which FEMME must test for steady-state is imposed via a built-in function xSetTestSteadyPeriod (see below for details). The second subroutine (XSUBMODS) spe-
After compiling FEMME’s subroutines and the model subroutines just described, the model is ready to be run. The specifications for each application are outlined in a run specification file, a text file containing metacommands specifying the numerical solution scheme, links to data and other details of the application. Fig. 4 gives an example of a run specification file that causes the model to run for 20 days. To propagate the model in time, a simple (explicit) integration routine with a time step equal to one model unit is used by default. This method is not always the most effective, and trial and error readily show that this particular model is most efficiently solved, in terms of accuracy and speed, using a 4th-order Runge–Kutta method with a time step of 0.25 h. This more complex routine is toggled on by writing in the run specification file ‘IntegrationType= RungeKutta’ and ‘Timestep= 0.25’. In addition, the number of hours the model has to run is specified (‘Duration = 20*24’), and the file to which the output must be written (‘Runfilename =Firstrun’). In Fig. 4 are the results of two model runs, with output written to a different output file. In the first run, parameter settings were as in Fig. 3, in the second run, the transfer time was changed, in the parameter declaration file, to 40 h. The model results indicate that changing the transfer rate has dramatic implications on the concentrations of the species, more specifically for Bosmina, which increases its biomass only under the rapid transfer regime.
K. Soetaert et al. / Ecological Modelling 151 (2002) 177–193
3.3. Second application, two runs to dynamic steady-state In the previous example, it was demonstrated how the concentration of both species evolves, given an initial concentration in the medium. It is generally more instructive to consider the solution of the model at steady state, as this condition is independent from the initial concentration of the constituents (provided they are not zero). One metacommand (‘Runsteady=true’) instructs FEMME to iterate the model until steady-state has been achieved. In addition, the maximal amount of time that the model may run is set to 1× 106 h, to prevent the model from running forever in case steady state does not exist (‘Duration = 1000000’). Remark that for this model steady-state cannot be
Fig. 3. Model-specific subroutine.
FORTRAN
183
defined as a condition where all variables assume constant values, but rather as a periodic solution where the same time course for the state variables is repeated without change. The periodicity at which tests for recurrent output are made (the default is 0) was specified in the FORTRAN code (subroutine XSTART— see Fig. 3). Two example outputs of steady-state runs are in Fig. 5 with the food concentration in the medium set to 1.0 g C/m3 (left) and to 0.1 g C/ m3 (right). In these model runs, steady state was achieved after running the model for 100 000 and 20 000 simulation hours, which took less than 5 and 1 s on a 500 MHz Pentium, respectively. It is clear that under the high food conditions, only Daphnia can maintain itself, whereas under low food conditions, Bosmina is the only successful competitor.
code of the coexistence model: (left) main program and initialisation routine; (right) dynamic
184
K. Soetaert et al. / Ecological Modelling 151 (2002) 177–193
Fig. 4. Two simple model runs for 20 days, (a) with the default settings, (b) with transfertime changed to 40 h; (above) metacommands required to trigger the run, (below) results.
3.4. Third application, a Monte Carlo analysis The previous finding raises the question under which experimental conditions either one of the species prevails or when they can coexist. To investigate that, the model must be run to steady state with different combinations of the food concentration and transfer time. In FEMME, this has been automated as a Monte Carlo application (Fig. 6). In addition to the previous specifications (as in Fig. 5), the run specification file contains the metacommands ‘application=MonteCarlo’, ‘parameterDrawType = grid’ and ‘NumberOfRuns =529’ which specifies that 529 parameter values must be generated according to a grid design. The names and ranges of the parameters that must be
varied and the variables to be inspected are given in a separate file (the random parameter file). With the settings as in Fig. 6, the model will now run 529 times to steady-state, for different combinations of the parameters FoodInMedium and TransferTime, generated according to a regular grid. For each run, the concentration of Daphnia, Bosmina and Food, averaged over the steady-state period (or the transfer time) is written as a function of the prevailing experimental settings. The figure generated using this output file demonstrates that at intermediate food concentrations ranging between 0.5 and 0.8, both species coexist, but when the species are transferred more frequently, coexistence is achieved at lower food concentrations.
K. Soetaert et al. / Ecological Modelling 151 (2002) 177–193
3.5. Fourth application, a sensiti6ity analysis In the next application, the sensitivity of the model output to any of the four grazing parameters is tested. Suppose that the values of these quantities together with their uncertainty, expressed as the mean and the standard deviation have been estimated and we want to assess how this uncertainty is translated into the model outcome. The run specification file to put this sensitivity analysis into effect is in Fig. 7, both for a run to steady-state and a run for 20 days. The random parameters and random variables are specified in the random parameter and random variable file, as in the previous model application. Remark that the standard deviation is set to be
185
10% of the mean for each parameter and the possible parameter values are limited to positive numbers (by imposing the minimum of 0). Normally distributed parameter values are drawn according to the specifications given for each of the parameters by setting in the run specification file ‘RandomDrawType = Normal’. With the settings as in Fig. 7, the model will run 100 times (‘NumberOfRuns = 100’). The mean and standard deviation of the temporal evolution of each of the requested variables is written to the output. Remark that for the steady-state run, only the time evolution at steady-state is written. It is clear that the variability is relatively large during the 20 days run (Fig. 7C): after 20 days, the resulting standard deviation in the Bosmina concentration
Fig. 5. Two runs to steady-state, (a) with food concentration changed to 1 g C/m3; (b) with the default settings. The time curves for Bosmina, Daphnia (upper line) and food (lower thick line) is given. Above: metacommands required to trigger the application. Below: results.
186
K. Soetaert et al. / Ecological Modelling 151 (2002) 177–193
Fig. 6. Monte Carlo application where the steady-state concentrations are listed as a function of food concentration and transfer period. Left: metacommands required to trigger the application (above) and files with specification of the random parameters (middle) and variables (below). Right: results. Grey area: Daphnia wins; white area: Bosmina wins; Black area: coexistence.
equals about 10% of the mean value. At steadystate however, variability is much lower and the standard deviation amounts to 0.01% of the average only (Fig. 7B).
3.6. Fifth application, fitting the model to obser6ations The impact of relatively small changes in the four grazing parameters on the model outcome during the 20 days run might give the impression that such experiment can be used to estimate the values of these parameters. Our last example consists of a model calibration, in which this assump-
tion will be tested. An ‘observed’ data set has been artificially created, by selecting some output values from the 20 days run (from Fig. 4), the values of the four parameters (maximal grazing rate and half-saturation constant) changed and FEMME instructed to fit the model to the ‘observations’ by fine-tuning the grazing parameters. The specifications ‘Application= Calibration’ puts the fitting routine into effect, while the name of the file containing the observed data is passed by writing ‘ObservedFilename= transfer20.dat’. This data file simply contains what has been measured (the name of the corresponding model variable), when it has been measured and its
K. Soetaert et al. / Ecological Modelling 151 (2002) 177–193
value. In addition, the parameters that must be fine-tuned and the range of possible values (Fig. 8) are specified in the random parameter file, similarly as in the previous example. During a model calibration, FEMME tries to minimise the discrepancy between the model and the data (the model cost) by varying the values of the parameters in a structured way. Here the default ‘model cost’ function (i.e. the sum of squared residuals) and the default calibration method (the Levenberg– Marquardt algorithm) is chosen. In Fig. 8C, this model cost has been plotted for each run performed. The model converges towards a perfect fit in about 400 runs, the comparison between model and data at the end of the calibration is in Fig. 8B. In Fig. 8D are the estimated and true parameter values after performing the model calibration. It is obvious that these are strongly different for the parameters relating to the dynamics of Daphnia, slightly different for Bosmina. The program not only reports the ‘best’ parameter values but also the uncer-
187
tainty in the estimates and the correlation between parameters. As it happens, these uncertainties are extremely large due to the strong correlation between the estimated half-saturation coefficient and the maximal grazing rate. Indeed, larger values of the half-saturation constant, combined with larger values of the maximal grazing rate give almost the same model output such that it is impossible to estimate both with reasonable precision on this one data set. The same analysis may be run, now varying just the half-saturation coefficients of both species (not shown). In this case, convergence is reached after about 50 runs, and the exact values are retrieved.
4. A short description of more advanced numerical methods The previous examples have shown how to run different scenarios, how to perform a Monte Carlo, sensitivity or calibration analysis, how to
Fig. 7. Sensitivity application. (A) File with random parameter settings and variables, (B) sensitivity for steady-state solution, (C) sensitivity during 20 days run, (B) and (C): thick line is the mean over the simulation, thin line is mean 9 standard deviation.
188
K. Soetaert et al. / Ecological Modelling 151 (2002) 177–193
Fig. 8. Calibration. (A) Left: file with metacommands required to trigger the application. Middle: part of the observed data file (‘Transfer20.dat’). Right: parameters that must be fine-tuned, (B) fit of model and data after performing the calibration, (C) discrepancy of model with data (Model cost) as a function of the run performed during calibration, (D) comparison of original parameter values (‘true’) and values after calibration (‘assimilated’).
overrule the default integration settings, and how to deal with observed data. These are but a subset of possible applications. Moreover, the model example was very simple, and FEMME offers some powerful techniques to solve models that are substantially more complex. Some of the more advanced features will now be dealt with (also see Table 1).
ing functions. The data can be provided in relation to other model variables, such as position or depth. For instance, a 1-D model can be initialised using a set of depth profiles, a 1-D forcing function can be specified as a time series of depth profiles. FEMME can be instructed to either interpolate to the exact position or to average or sum the data in an interval.
4.1. Forcing function, obser6ed and initial condition data
4.2. Steady-state
Models in FEMME can be linked to a variety of data files, containing specifications for the variables (initial conditions, observed data), or forc-
In the coexistence model, steady-state was achieved by running dynamically until recurrence of the model outcome. For certain model applications, there exists a much more efficient way to
K. Soetaert et al. / Ecological Modelling 151 (2002) 177–193 Table 1 Main options available in FEMME 1. Application types Single Run. Runs the model once, produces output (the default). Calibration. Fits the model to one or more datasets. Several routines are available (see 2). Sensiti6ity. Creates ranges around model output using multiple runs. Monte Carlo. Establishes cause-effect relationships using multiple runs. Impact analysis. Calculates the gradient of selected variables to selected parameters, i.e. dVAR/dPAR. By default, it is estimated numerically but the adjoint equations can be specified. Experiment runs. Runs a number of similar applications, called ‘experiments’, e.g. a set of dilution or batch cultures. Calibration on such experiments is possible. CheckAdjoint. Checks the implementation of the adjoint equations. Co6ariance. Calculates the covariance structure between the parameters with respect to a selection of variables or with respect to the model cost function. 2. The calibration algorithms implemented in FEMME The Levenberg–Marquardt algorithm (the default). The Marquardt parameter can be fine-tuned. The Broyden–Fletcher–Goldfarb–Shanno algorithm, a quasi-Newton method. The pseudo-random search method. The genetic algorithm. Probability of mutation, crossover, and other parameters can be finetuned. The method of simulated annealing. Several parameters can be finetuned, e.g. to update ‘annealing temperature’. Simple random search. 3. Frequency distributions of parameters Random distribution, within a predefined range (the default) Latin hypercube sampling, within a predefined range Uniform sampling, where each parameter is changed uniformly within a predefined range while all other parameters are kept at their nominal values. Grid sampling, where each parameter is changed uniformly within a predefined range and all parameter combinations are drawn. Normal distribution, specified by mean and standard deviation (if parameters are independent) or by variance-covariance structure of the parameters. Exponential distribution as specified by a mean value. Set of parameter values specified in a file. 4. Variable output during Monte Carlo and impact runs The mean of the variable during the simulation or steady-state period (the default) The minimum or maximum of a variable during the simulation or steady-state period
189
Table 1 (Continued) The timing of the minimum or maximum of a variable during the simulation or steady-state period The final value of the variable (at end of simulation) 5. The model cost function (objective function) implemented in FEMME Variable costs, the discrepancy between a model variable and the data, can be evaluated as: the sum of weighted squared residuals (default) or the sum of the weighted absolute values of the residuals. Each residual can be weighed using either the standard deviation or variance of the data (the default) or using user-specified weighing coefficients, one for each data point. The ultimate model cost is estimated either as the sum of all variable costs (the default) or the maximum of all variable costs. In addition, a parameter cost can be added. The more the value of the parameter deviates from the nominal value, the higher the cost (linearly). 6. Integration routines implemented in FEMME Simple Euler integration (the default); fixed time step can be specified. Simple Euler integration, adaptive time step; (absolute and relative) precision can be specified 4th-order Runge–Kutta integration, fixed time step can be specified 5th-order Runge–Kutta integration, adaptive time step; precision can be specified VODE Implicit integration routine; precision and structure of Jacobian matrix (see 8) can be specified Bulirsch-Stoer implicit integration routine; precision and structure of Jacobian matrix (see 8) can be specified Adams-Moulton predictor-corrector integration routine; precision can be specified 7. Steady-state methods implemented in FEMME Dynamically running to steady-state Period over which to test for steady-state can be specified Variables which must be tested for steady-state can be specified Iteratively calculating steady-state (Newton–Raphson). The structure of the Jacobian can be specified (see 8) 8. The jacobian matrix (implicit integration methods, Newton–Raphson steady-state solver) Jacobian is a full matrix. It is either numerically generated by FEMME (default method) or specified by the user. Jacobian can be written as a banded matrix (which has all non-zero elements close to the diagonal). Such matrix can be efficiently generated and inverted it is either generated numerically by FEMME or specified by the user.
190
K. Soetaert et al. / Ecological Modelling 151 (2002) 177–193
Table 1 (Continued) 9. 1-D Transport solver Various types of boundary conditions (flux, imposed concentration, implicit flux, zero-gradient) Various ways of temporal differentiation (implicit, explicit, weighted) Several differencing schemes for the advective term (upwind, centered, quickest, flux-corrected).
establish the steady-state condition i.e. by direct iteration techniques. FEMME incorporates such a method, the Newton – Raphson method (optimised based on Press et al., 1993). This method was used to solve a biogeochemical model of the sediment, where the vertical profiles of four chemical species and two organic matter fractions are calculated as a function of the deposition of organic matter and bottom water concentrations (Soetaert et al., 1996). Technically, the model comprised several hundred non-linear equations. It was applied amongst others to run an extensive Monte Carlo analysis, consisting of several million steady-state runs (Soetaert et al., 1998; Middelburg et al., 1996). This can be achieved in one day of computer time (on a 500 MHz Pentium).
4.3. Integration Integration is at the heart of any dynamical model, and finding the most optimal method is often of great importance as it can substantially improve model performance. Numerical stability requirements of some integration routines may impose a significant constraint on the resolution of a simulation model. For instance, in spatially explicit models where matter is exchanged between boxes, and when using the more simple routines, one may have to decide to either cut down the time step or reduce the spatial resolution, depending on the so-called Courant number (Abbott and Basco, 1997). In order to allow efficient integration of most dynamic models, FEMME has incorporated several integrators. The default is a simple explicit method (Euler) with a fixed time step. Other, more powerful routines are the 4th- and 5th-order Runge – Kutta methods (adapted from Press et al.,
1993), one predictor-corrector method (the Adams –Moulton algorithm) and two implicit methods (VODE; Brown et al., 1989; and the Bulirsch–Stoer method; Stoer and Bulirsch, 1983). Especially the implicit methods may reduce the computation time significantly in certain model applications, as they are more stable and adapt the time step to the prevailing dynamic conditions. FEMME takes care of the technical intricacies of the implicit techniques (related to the generation and inversion of a so-called Jacobian matrix). If needed, however, the user can specify the structure of this matrix. FEMME also provides solutions when a model consists of sub-models with different numerical properties, by allowing each of these submodels to be integrated with their own settings. An example is an application where the above mentioned sediment model was coupled to a 1-D physical–biological model of the water column, the latter consisting of turbulence and biological equations (Soetaert et al., 2001). The sediment model was solved with an implicit integration method (VODE), the ecological and turbulence equations in the water-column with simple Euler integration. Interaction between both submodels was enforced each day. This split integration reduced the computing time with several orders of magnitude, without notable loss of accuracy. The model was forced with an extensive set of atmospheric data (air humidity, temperature, etc.). The water column constituents were initialised using a set of measured nutrient and chlorophyll profiles, the sediment model was initialised by iteratively solving it to steady-state at the start of the simulation. All this was triggered with a few lines of metacommands in the run specification file.
4.4. Calibration Fitting a model to observed data (data assimilation) is another area where several alternatives are provided, as some algorithms that perform well for certain models may horribly fail for others. The methods that are available for model calibration roughly fall in two categories. In one type, the best fit is searched in a quasi-random fashion. The method of simulated annealing (Kruger,
K. Soetaert et al. / Ecological Modelling 151 (2002) 177–193
1993), a genetic algorithm (Goldberg, 1989), a pseudo-random search algorithm (Price, 1979) and a purely random method, implemented in FEMME, are of this type. In contrast, in so-called directional methods, the minimum is approached iteratively, using the gradient of the discrepancy of model to data as in the quasi-Newton and the Levenberg –Marquardt algorithm (both implementations based on algorithms described in Press et al., 1993). Each of the methods can be finetuned or the default settings may be used. Mathematically skilled modellers may choose to implement the adjoint equations of the model problem as a means to obtain the gradient of the cost function and speed up the calibration (Lamy et al., in press, Lawson et al., 1995), but this is not necessary as by default the gradient is approximated numerically. In addition, it is possible to combine both types of methods, i.e. to search parameter space first using a random-based method, after which a directional method is allowed to finish the job.
4.5. Food web analysis Finally, a special part of FEMME is dedicated to the analysis of food webs (Vezina and Platt, 1988). This is a special, inverse modelling technique in which observations, consisting of measured concentrations and combinations of fluxes, and process knowledge is combined to derive the plausible fluxes between components (van Oevelen et al., in preparation).
5. Discussion There are many basic goals of mathematical modelling, such as prediction, estimation of key parameter values, or simply gaining insight in the constitutive processes. In all these applications, the underlying model equations may be the same, but the orchestration, i.e. the way in which these equations are used, is different. Traditionally, modellers tend to make slightly different versions for each of these applications, which makes difficult to maintain consistency between the various codes within a modelling project. By separat-
191
ing model formulation, solution and application, FEMME has been designed to prevent such proliferation of code. The results presented in previous sections introduce FEMME as a robust simulation tool that can perform a variety of modelling tasks with a relatively minor effort, regardless of the complexity of the equations. For instance, when the aim of the application is to increase insight in the processes, a Monte Carlo application can quickly scan the universe of possible parameter combinations by varying one or more chosen parameters and testing the effect on a selection of model variables. It is possible to inspect either the average of a variable over the simulation time, and/or its final value, maximum, or minimum or the timing of the maximum or minimum. In addition, several patterns can be chosen by which parameter values are generated (Table 1). If prediction is the goal, the same model can be run with different parameter values or forcing function data or initial conditions, just by specifying the name of the file containing the data. Reading these data, interpolating and writing all necessary output is then automatic. In addition, mass budget calculations are made easy as the integral of all variables over the entire output period is automatically recorded. When we desire to fit the model to observations, it suffices to simply pass the name of the file containing the data, choose the parameters that must be fine-tuned and toggle on one of the calibration routines implemented. When successful, a parameter set will be obtained that minimises the summed residuals between model output and data (the model cost). There is no guarantee that the minimum will be a truly global one, but there are ample facilities implemented to increase the robustness of the model calibration. FEMME also provides great flexibility in the interaction with external data, be it initial conditions, forcing functions or observations. These data can be offered to the model in a way that little or no analytical pre-processing is necessary, as interpolation both in time and if requested in space (or some other dimension) is automated. This prevents the overhead commonly associated to changing the (spatial) resolution in a model where the data have to presented conform to the
192
K. Soetaert et al. / Ecological Modelling 151 (2002) 177–193
mesh used, and therefore generated prior to running. Many of FEMME’s applications require the same model to be executed multiple times, such that it becomes of primordial importance to solve the model as efficiently as possible. As models can differ substantially in mathematical complexity and numerical characteristics, none of the existing solution methods are consistently superior. Moreover, as it often happens that simply triggering the optimal method is associated with substantial leaps in performance, for the computationally most expensive modelling tasks, FEMME provides a choice amongst procedures, from very simple to rather complex ones. Seven integration routines and six calibration routines are implemented and can be fine-tuned at will (Table 1). FEMME is one of many software packages designed to make the life of a modeller easier (Legovic, 1997). It is a new contribution to the long series of FORTRAN- or C-based modelling frameworks (e.g. BSIM: Keizer et al., 1987; SONCHES: Gnauck et al., 1990; SENECA: de Hoop et al., 1993; SESAME: Ruardij et al., 1995). The new tool compares best to its predecessor, SENECA (Simulation ENvironment with Ecological Application, de Hoop et al., 1993), but if offers more model analysis tools and mathematical repertoire. Moreover, model components can have any rank (0-D, 1-D, multi-D), whereas SENECA was restricted to 0- or 1-D models only. Other modelling packages have emphasised different aspects of modelling. Examples are frameworks that offer a user-friendly interface (Costanza et al., 1998), ensure proper documentation of models (Benz et al., 2001), or facilitate the integration of different existing simulation models (Villa and Constanza, 2000; Villa, 2001). Still other tools allow development of individual-based models (Mooij and Boersma, 1996; Lorek and Sonnenschein, 1999), or spatially realistic population dynamics (Gathmann and Williams, 1998; Fall and Fall, 2001). There are certain modelling problems that may not benefit from implementation in FEMME. Non-deterministic models, such as individualbased models (Mooij and Boersma, 1996), for instance cannot be combined with the calibration
methods. Nor does FEMME cope efficiently with event-oriented processes. For certain simple problems, the use of FEMME may seem like using an axe to cut butter, and more user-friendly programmes (such as STELLA; Costanza et al., 1998) may be more suited. Finally, for models that impose high memory requirements (ecological formulations imbedded in 3-D hydrodynamic models for instance), the overhead created by FEMME may just be too large and these too may be better off with more specific implementations. 6. Conclusion The need to solve complex environmental problems in a timely fashion demand that the best scientific tools be employed. FEMME is such a tool that not only reduces the overall modelling time and programming expertise required, but does so without sacrificing optimal performance and functionality. The combination of text files that contain the names and units of the model components with FORTRAN code where all technical intricacies are handled by FEMME, ensures a good documentation of the final product, a familiar model code with which to work while offering a substantial repertoire to solve and analyse the model with little effort. Acknowledgements We thank Johannes van Oevelen for constructive comments on this manuscript. This work was performed in the framework of the EU-MASTII supported Ocean Margin EXchange (OMEX) project (MAS 3-CT96-0056). This is publication no 2828 from the NIOO-CEMO. The model used in this paper, as well as other example models and the framework itself can be downloaded from http://www.nioo.knaw.nl/cemo/femme References Abbott, M.B., Basco, D.R., 1997. Computational fluid dynamics. An introduction for engineers. Longman Singapore Publishers, Harlow, p. 425.
K. Soetaert et al. / Ecological Modelling 151 (2002) 177–193 Benz, J., Hoch, R., Legovic, T., 2001. ECOBAS – modelling and documentation. Ecol. Modell. 138, 3 –15. Brown, P.N., Byrne, G.D., Hindmarsh, A.C., 1989. VODE: a variable coefficient ODE solver. SIAM J. Sci. Stat. Comput. 10, 1038 – 1051. Costanza, R., Duplisea, D., Kautsky, U., 1998. Modelling ecological and economic systems with STELLA. Ecol. Modell. 110, 1 – 4. Fall, A., Fall, J., 2001. A domain-specific language for models of landscape dynamics. Ecol. Modell. 137, 1 – 21. Gathmann, F.O., Williams, D.D., 1998. Inter-site: a new tool for the simulation of spatially realistic population dynamics. Ecol. Modell. 113, 125 –139. Gnauck, A., Matthaus, E., Straskraba, M., Affa, I., 1990. The use of SONCHES for aquatic ecosystem modeling. Syst. Anal. Model. Simul. 7, 439 –458. Goldberg, D.E., 1989. Genetic algorithms in search, optimization & machine learning. Addison-Wesley Publishing Company, Inc, p. 412. Gurney, W.S.C., Nisbet, R.M., 1998. Ecological Dynamics. Oxford University Press, Oxford, p. 335. de Hoop, B.J., Herman, P.M.J., Scholten, H., Soetaert, K., 1993. SENECA 2.0: A Simulation Environment for ECological Application. ECOLMOD report EM-6, NIOODGW, p. 224. Jørgensen, S.E., 1994. Fundamentals of Ecological Modelling. In: Developments in Environmental Modelling, second ed. Elsevier, Amsterdam. Keizer, P.D., Gordon, D.C. Jr., Schwinghamer, P., Daborn, G.K., Ebenhoh, W., 1987. Cumberland Basin Ecosystem Model: Structure, Performance and Evaluation. Can. Tech. Report Fish. Aquat. Sci. No. 1547. Kruger, J., 1993. Simulated annealing: a tool for data assimilation into an almost steady model state. J. Phys. Oceanogr. 23, 679 – 688. Lamy, F., Sherwin, T., Soetaert, K., Herman, P.M.J., Torres, R., Estimates of vertical mixing during a Lagrangian experiment off the Galician coast. J. Mar. Syst., in press. Lawson, L.M., Spitz, Y.H., Hofmann, E.E., Long, R.B., 1995. A data assimilation technique applied to a predator-prey model. Bull. Math. Biol. 57 (4), 593 –617. Legovic, T., 1997. Ecological modelling Internet resources. Ecol. Modell. 100, 163 –169. Lorek, H., Sonnenschein, M., 1999. Modelling and simulation software to support individual-based ecological modelling. Ecol. Modell. 115, 199 –216.
193
Middelburg, J.J., Soetaert, K., Herman, P.M.J., Heip, C., 1996. Denitrification in marine sediments: a model study. Global Biogeochem. Cycles 10 (4), 661 – 673. Mooij, W.M., Boersma, M., 1996. An object-oriented framework for individual-based simulations (OSIRIS): Daphnia populatin dynamics as an example. Ecol. Modell. 96, 139 – 153. Press, W.H., Teukolsky, S.A., Vetterling, W.T., Flannery, B.P., 1993. Numerical recipes in FORTRAN. The art of scientific computing, second ed. Cambridge University Press, Cambridge, UK, p. 964. Price, W.L., 1979. A controlled random search procedure for global optimisation. The Computer J. 20, 367 – 370. Ruardij, P., Baretta, J.W., Baretta-Bekker, J.G., 1995. SESAME, a software environment for simulation and analysis of marine ecosystems. Neth. J. Sea Res. 33, 261 – 270. Soetaert, K., Herman, P.M.J., Middelburg, J.J., 1996. Dynamic response of deep-sea sediments to seasonal variations: a model. Limnol. Oceanogr. 41 (8), 1651 – 1668. Soetaert, K., Herman, P.M.J., Middelburg, J.J., Heip, C., Smith, C.L., Tett, P., Wild-Allen, K., 2001. Numerical modelling the shelf break ecosystem: integrating benthic and pelagic field measurements. Deep-Sea Res. II 48, 3141 – 3177. Soetaert, K., Herman, P.M.J., Middelburg, J.J., Heip, C., 1998. Assessing organic matter mineralisation, degradability and mixing rate in an ocean margin sediment (Northeast Atlantic) by diagenetic modeling. J. Marine Res. 56, 519 – 534. Soetaert, K., Declippele, V., Herman, P.M.J., van Oevelen, J., 2000. FEMME, A Flexible Environment for Mathematically Modelling The Environment. MANual. OMEX-II Report, p. 200. Stoer, J., Bulirsch, R., 1983. Introduction to Numerical Analysis. Springer, NewYork, p. 609. Vezina, A.F., Platt, T., 1988. Food web dynamics in the ocean: I. Best-estimates of flow networks using inverse methods. Mar. Ecol. Progr. Ser. 42, 269 – 287. Villa, F., 2001. Integrating modelling architecture: a declarative framework for multi-paradigm, multi-scale ecological modelling. Ecol. Modell. 137, 23 – 42. Villa, F., Constanza, R., 2000. Design of multi-paradigm integrating modelling tools for ecological research. Environ. Modell. Software 15, 169 –177.