A method for aggregating state variables in large ecosystem models

A method for aggregating state variables in large ecosystem models

Available online at www.sciencedirect.com Mathematics and Computers in Simulation 79 (2008) 368–378 A method for aggregating state variables in larg...

240KB Sizes 2 Downloads 75 Views

Available online at www.sciencedirect.com

Mathematics and Computers in Simulation 79 (2008) 368–378

A method for aggregating state variables in large ecosystem models Jock Lawrie ∗ , John Hearne School of Mathematical and Geospatial Sciences, RMIT University, 368-374 Swanston Street, Melbourne, Australia Received 18 November 2006; received in revised form 2 November 2007; accepted 8 January 2008 Available online 17 January 2008

Abstract Simplifying large ecosystem models is essential if we are to understand the underlying causes of observed behaviours. However, such understanding is often employed to achieve simplification. This paper introduces a method for model simplification that can be applied without requiring intimate prior knowledge of the system. Its utility is measured by the resulting values of given model diagnostics relative to those of the original model. The method uses a least-squares criterion to identify sets of state variables that can be aggregated, and then generates a modified model structure and accompanying parameters that enable these variables to be replaced with the aggregates. The method is applied to a model of the nitrogen cycle in Port Phillip Bay, Victoria, Australia. Aside from reducing the model’s order, the method enables the reduced model to retain an ecological interpretation, and reveals insights into the system’s structure. © 2008 IMACS. Published by Elsevier B.V. All rights reserved. Keywords: Large ecosystem models; Aggregation; Partial proper orthogonal decomposition

1. Introduction In recent decades, several large ecosystem models have been developed and widely used. Common examples include Ecopath with Ecosim [7] in aquatic settings, and the model JABOWA [4] in forestry. Despite their utility, many problems are associated with such models [32,28]. For example, large models can be difficult to calibrate and validate [17,30], and their data requirements may be impractical [19]. Also, an observed behaviour may have many possible underlying causes, making it difficult to gain insights into the system and draw reliable conclusions. One way of avoiding these issues is to develop smaller models that serve the same purpose as the large models. Although several examples of this strategy have been documented [10–13], there are no definitive rules for determining the appropriate model size and complexity for a given modelling objective. Indeed, if a model is too simple it may misrepresent the dynamics associated with an observed output, or not be capable of producing the output at all [12,32]. Consequently, some authors have concluded that moderately complex models are the most instructive when investigating ecosystem dynamics [9,32,35]. To this end, this paper discusses a method for reducing the complexity of large ecological models without sacrificing the behaviours that are relevant to the specified modelling objective. The method involves aggregating state variables, thereby reducing the order of the system. ∗

Corresponding author. E-mail addresses: [email protected], [email protected] (J. Lawrie).

0378-4754/$32.00 © 2008 IMACS. Published by Elsevier B.V. All rights reserved. doi:10.1016/j.matcom.2008.01.001

J. Lawrie, J. Hearne / Mathematics and Computers in Simulation 79 (2008) 368–378

369

Early studies of aggregation in an ecological context involved simple models such as linear or Lotka-Volterra forms [1,5]. Aggregates were typically chosen using knowledge of model behaviours such as equilibria, and assessed using error-minimising conditions [5,6,25]. Several formal conditions for achieving zero error (perfect aggregation) have been proposed [1,15,21,22], and conditions for minimal error were put forward by Iwasa et al. [16]. However the resulting computational requirements are considerable in the linear case, and prohibitive otherwise. Furthermore there is no prescribed method for choosing the aggregates. Another common approach involves singular perturbations [2,3,31,18]. However these attempts also assume the existence of behaviours such as equilibria and limit cycles, which are difficult to identify in large simulation models if they exist at all. Moreover, many of these models comprised only a small number of state variables. The model of Kooi et al. [18], for example, had only one equation per trophic level. Although these models can provide insights into the mechanisms built into larger models [24], they cannot act as substitutes for the larger models. To date their has been little analytical work done on large scale ecological models. In this context, variables are typically aggregated after trialling different aggregates within the framework of a given large model, and comparing the outputs of the resulting smaller models [8,12,26]. Although useful, this approach is a trial-and-error procedure that relies on prior understanding of the model in order to select the aggregates. Moreover there is no hint about the effects of such groupings a priori. In this paper we seek an order-reduction method that requires no a priori information so that it can be implemented automatically. We also require that the resulting models have an ecological interpretation, so that insights into the system can be gained. The method discussed in this study applies to models of the form x˙ = f (x(t), u(t), t), y = y(x)

(1)

where x(t) is the n-dimensional state vector, u(t) a vector-valued forcing function and y is an output vector. We also have x(0) = x0 and 0 ≤ t ≤ T for some time frame T. Such a model shall be referred to as the full model, while its order-k approximation, k < n, shall be referred to as a reduced model. The order-reduction method is described in Section 2, and is demonstrated on the model described in Section 3. The results are presented and discussed in Section 4, and conclusions drawn in Section 5. 2. Aggregating state variables The usual approach to aggregating state variables in nonlinear systems is to use previous knowledge of the system to modify the parameters and equations for the aggregates. For example, O’Neill and Rust [25] aggregated two variables based on the long term behaviour of the system (which was an equilibrium). Kooi et al. [18] chose their aggregates using the conservation of mass, while Pinnegar et al. [26] lumped variables according to their positions in the food web. While this approach has been successful, the choice of variables to aggregate and the associated new parameter values is not always obvious, especially in large systems. In particular, it is not clear how many variables should be aggregated for a given modelling purpose. We seek insight into these problems via mathematical analysis of the model output. The technique we use is a modification of proper orthogonal decomposition (POD) [27,29] which we shall call partial proper orthogonal decomposition (PPOD). Before introducing PPOD, we briefly describe POD. 2.1. Proper orthogonal decomposition Proper orthogonal decomposition is a method for approximating models of the form (1) with models of order k < n. It is based on principal components data fitting and involves two steps: 1. Projecting the trajectory x(t) of the full model (1) onto a k-dimensional affine space S with origin x∗ in the original n-dimensional coordinates. The space S is identified by minimising the Euclidean distance between x(t) and S, integrated over time. 2. At the projected points xˆ ∈ S, project the derivative vector f (ˆx, u, t) onto S, so that we have xˆ (t) ∈ S for all 0 ≤ t ≤ T . The resulting reduced model is given by xˆ˙ (t) = Pf (ˆx(t), u(t), t),

(2)

370

J. Lawrie, J. Hearne / Mathematics and Computers in Simulation 79 (2008) 368–378

where xˆ (0) = P(x0 − x∗ ) + x∗ (see Rathinam and Petzold [27] for details). Since this study considers several input vectors {um (t)}, 1 ≤ m ≤ s, we take x∗ to be the mean state  x¯ , where the average is taken over the simulation period T and over the s trajectories. That is, we have x¯ = (1/s) sm=1 x¯ m , where x¯ m is the time-averaged state under the m th input {um (t)}. The projection matrix P is computed from the eigenvectors of the covariance matrix cov of the full-model output. More specifically, we have P = ρT ρ, where the rows of the k × n matrix ρ are the k eigenvectors that correspond to the k largest eigenvalues of cov. Since we seek relationships between state variables that hold under each input {um (t)}, here we have used cov = (1/s) sm=1 covm , where covm is the covariance (centered about x¯ ) under the m th input. The coordinates of the reduced-model output, in the original n-dimensional space, are given by xˆ = P(ˆx − x¯ ) + x¯ .

(3)

2.2. Partial proper orthogonal decomposition Since POD minimises the Euclidean distance between a vector x and its approximation xˆ , it tends to focus on the components with the largest variance. These are typically the variables on the largest scale. That is, the basis vectors of the reduced space S have relatively large weights in the components corresponding to these variables. Even though the smaller scale variables can be effectively approximated using this basis, the relationships involved are only empirical. We gain no insight into the causes of system behaviour. However, we can reduce the effects of scale by projecting the normalised variables (xi − x¯ i )/(σi ), where x¯ i and σi are the mean and standard deviation of xi , respectively. The covariances between these variables are of course correlations. Therefore the only change to the reduced model (Eq. (2)) is that the eigenvectors of the correlation matrix rather than the covariance matrix are used to calculate the projection matrix P. Although POD reduces a model’s order, it also increases the complexity of the approximated derivatives. Specifically, processes that do not directly affect a variable xi can directly affect its approximation xˆ i . For example, suppose that an order-two system is projected onto a line. Then Eq. (2) tells us that the derivative for xˆ 1 is a weighted sum of estimates of the derivatives of x1 and x2 , and vice versa. That is, it contains processes that were not there originally, namely those in the derivative for x2 . For higher order ecological systems, this amounts to connecting every variable to every other variable in both directions, so that the trophic structure is obscured. Ecological interpretation then becomes particularly difficult. This observation leads us to project mutually exclusive subsets of the state variables, so that a given rate only affects one such subset and not all variables. That is, the basis vectors of the reduced space S are found by projecting each subset onto a separate one-dimensional line using POD. Trophic relationships can then be preserved because we are now considering interactions between groups of variables. Moreover, this utilisation of POD allows us to reduce the model’s order without increasing the complexity of the aggregate. It also allows for any variable to remain unaggregated, which is desirable if the variable cannot be appropriately lumped, or if it is required to be explicitly represented. The final modification of POD that we implement concerns estimating the variables that have been aggregated. Despite reducing the effects of scale, an aggregate (represented by the basis vectors of the reduced space) can still be dominated by one or a few of its constituent variables. In this case the reduced-model estimates of the non-dominant variables will behave similarly to the dominant variable estimates, though perhaps on a different scale. Consequently any diagnostics that focus on the non-dominant variables may be significantly affected by the aggregation. In order to overcome this situation we seek aggregates whose dynamics are not dominated by any of its constituent variables, but are significantly affected by each of them. To this end, once a set of variables {xij }, j ≤ α, has been identified for aggregation, we not only project them but also their sum. For example, if xi and xj are aggregated to obtain xi + xj , then we project the three variables (xi , xj , xi + xj ). The resulting aggregate differential equation is given by the last coordinate of Eq. (2), and represents the variable xi + xj . Extending this to the aggregation of α variables xi1 , . . . , xiα is straightforward—apply POD to the α + 1 variables (xi1 , . . . , xiα , xi1 + . . . + xiα ). A consequence of this action is that a given aggregate is not dominated by any of its constituent variables. Indeed, by including xi + xj in the projection, we are minimising the error  T  T (ˆxi − xi )2 + (ˆxj − xj )2 + (ˆxi + xˆ j − xi − xj )2 dt = 2 (ˆxi − xi )2 + (ˆxi − xi )(ˆxj − xj ) + (ˆxj − xj )2 dt 0

0

J. Lawrie, J. Hearne / Mathematics and Computers in Simulation 79 (2008) 368–378

371

Since the magnitude of the second term is always between those of the first and third terms, minimising it ensures that neither of the errors (ˆxi − xi )2 or (ˆxj − xj )2 dominates the other. Thus we have a way of aggregating state variables using output from the full model and some mathematics. This method, which we call PPOD, involves three modifications of the original method, POD: 1. Several projections are identified, one for each aggregate. 2. The projections are calculated from the correlation matrix rather than from the covariance matrix. 3. The variables in an aggregate ({xij }, j ≤ α) are projected along with their sum. That is, a projection is applied to the vector (xi1 , . . . , xiα , xi1 + . . . + xiα ). Once we have chosen variables to lump together, no detailed understanding of the full model is required in order to choose the parameters for the aggregate (the issue of choosing the variables is discussed in the next section). In this case, the order of the full model is reduced by ki=1 (αi − 1), where αi is the number of variables in the ith aggregate, and k is the number of aggregates. The price paid is that we lose information about the aggregate’s constituent variables xij , though these can be estimated using Eq. (3). 2.3. Choosing the aggregates—the PPOD algorithm Although we now know how to aggregate a given set of variables, we are still faced with the problem of choosing such a set. In order to resolve this, we compare the time series of the derivatives of possible aggregates. More specifically, we calculate the time series of the derivative of a given aggregate agg = xi1 + . . . + xiα . This is simply the sum of the time series of the xij . Our first approximation of this time seriesis given by the last coordinate of Eq. (2) with  where agg  = αj=1 xˆ ij . That is, the time series of (d/dt)agg is the vector xˆ replaced by the vector (ˆxi1 , . . . , xˆ iα , agg), approximated by d  = Pα+1 · fagg (ˆx, agg,  u, t), agg (4) dt where P is the projection matrix found by applying PPOD to the vector (xi1 , . . . , xiα ,agg), fagg (x,agg,u, t) is the vector (˙xi1 , . . . , x˙ iα , (d/dt)agg), and α + 1 denotes the last row of P. However, this approximation contains xˆ i for which we have no data. We approximate further by replacing the xˆ i that  This is achieved using Eq. (3) with the vector are not in the aggregate by xi , and write the remaining xˆ i in terms of agg.  and agg  in turn replaced by agg. Therefore the time series of (d/dt)agg is xˆ replaced by the vector (ˆxi1 , . . . , xˆ iα , agg), approximated by d  = Pα+1 · fagg (xagg ,agg,u, t), (5) agg dt  are close, then where xagg is the vector of xi not in the aggregate agg. If the xi and xˆ i are “close”, and if agg and agg this approximation will yield a small least-squares error T  − (d/dt)agg)2 dt ((d/dt)agg Eagg = 0 (6) T 2 0 ((d/dt)agg) dt  is calculated from Eq. (5). Otherwise the error will be large. Thus we measure the validity of aggregating where (d/dt)agg the xij by the error given in Eq. (6). Note that this approach can only be a guide for choosing aggregates. That is, a small error does not guarantee that the approximate aggregate will behave like the true aggregate—its effect on the non-aggregated variables must  a small error is likely be taken into account. However, given the presence of these variables in the derivative of agg, to indicate an appropriate aggregate. We could check this more thoroughly by comparing the time series of the nonaggregated derivatives to their approximations, but this becomes computationally expensive. Moreover, we shall see that the procedure described is adequate. The algorithm for identifying aggregates that was used in this study is shown in Fig. 1. Here aggjm is the list of variables that are candidates for being aggregated with xj under the input um (t), and aggj is the list of variables that are candidates for being aggregated with xj in the reduced model. These sets are identified by requiring that the error

372

J. Lawrie, J. Hearne / Mathematics and Computers in Simulation 79 (2008) 368–378

Fig. 1. The algorithm used for aggregating state variables.

that results from the corresponding aggregations (Eq. (6)) is less than a pre-specified tolerance, Enom . If there is a set of indices J = {i1 , . . . , iα } such that aggia = aggib for all a, b ≤ α, then the aggregate {xi1 + · · · + xiα } was included in the reduced model in place of the individual {xij }, ij ∈ J. The reduced model is then given by:  u, t) (d/dt)ˆxj = fˆ j (ˆx, agg,  i = fˆ i (ˆx, agg,  u, t) (d/dt)agg  is the vector of aggregates. The fˆ j are the where xˆ is the vector of state variables that are not aggregated and agg original fj with the non-aggregated xj replaced by the xˆ j , and each aggregated xj replaced by its expression in terms  i which is found by solving Eq. (3) as before. The approximated aggregate derivatives fˆ i are given of the relevant agg in Eq. (4) for each i. The initial conditions xj (0) are unchanged for the non-aggregated variables, while for each  i , the initial condition is found by solving the last equation of the set (see Eq. (3)). That is, we solve aggregate agg  i (0) = Pα+1 (ˆxi (0) − x¯ i ) + x¯ i , where xˆ i is the vector (ˆxi1 , . . . , xˆ iα , agg  i ), x¯ i is the corresponding vector of means agg and the subscript α + 1 denotes the last row of the projection matrix P. 3. The model The PPOD procedure shall be applied to a biogeochemical model of Port Phillip Bay, Australia. The bay is approximately horseshoe-shaped with an area of about 1930 km2 and a 3 km opening to Bass Strait at the southern end. The model incorporates both physical and biological processes, and was developed in order to investigate the effects of increasing nutrient inputs from the rivers that flow into to the bay. It was originally formulated in the 1990s by the CSIRO (Australia’s government research body) and was called the Port Phillip Bay Integrated Model (PPBIM) [14,23]. Several versions of the PPBIM have since been developed, including the one used for this study. We shall refer to this

J. Lawrie, J. Hearne / Mathematics and Computers in Simulation 79 (2008) 368–378

373

Table 1 The state variables included in the PPBM, together with their units and their symbols used throughout this study Variable name

Units

Water column

Sediment

Ammonia Nitrate (and nitrite) Small phytoplankton Large phytoplankton Dinoflagellates MicrophytoBenthos Small zooplankton Large zooplankton Labile detritus Refractory detritus Dissolved organic nitrogen Macroalgae Seagrass Benthic filter feeders Dissolved silica Detrital silica Water column sand Water column volume Sediment water volume Sediment solid volume

mg N m−3 mg N m−3 mg N m−3 mg N m−3 mg N m−3 mg N m−3 mg N m−3 mg N m−3 mg N m−3 mg N m−3 mg N m−3 mg N m−2 mg N m−2 mg N m−2 mg Si m−3 mg Si m−3 mg WCS m−3 m3 m3 m3

NH NO PS PL DF MB ZS ZL DL DR DON

NHs NOs

Epibenthos

PLs MBs

DLs DRs DONs MA SG BF

Si DSi WCS WCV

Sis DSis

SWV SSV

version as the Port Phillip Bay Model (PPBM). Since it was described in Lawrie [20], we shall only summarise its key features here. The PPBM explicitly represents processes in both the water column and sediment layers of the bay, and includes processes by which the two layers interact. It is essentially a complex N–P–Z model that has been extended to incorporate detritus, benthic plants (SG and MA), and benthic filter feeders (BF). Most of the 29 state variables trace various forms of nitrogen throughout the bay, while the rest involve inorganic silica, water and sand. These variables are listed in Table 1 along with their units and the symbols used throughout the discussion. The subscript “s” denotes a sediment variable. 3.1. Model processes The main biological processes represented in the PPBM are growth, mortality and grazing. Most of these are density-dependent, and many are temperature-dependent (see Murray and Parslow [23] for details). To complete the cycle, plant and animal matter are recycled back to various forms of detritus and nutrients. Aside from the nutrient inputs from Bass Strait, the rivers and the atmosphere, the external forcing processes include tidal action, rainfall and evaporation. These processes are represented as time-dependent functions, and were obtained from auxiliary models also developed at the CSIRO [23,34]. The physical processes within the bay include an exchange of water between layers, and the settling and resuspension of particulates. These are thoroughly described in Walker and Sherwood [33]. Since the interface between the bay and Bass Strait is only 3 km wide, relatively little nutrient exits the bay to Bass Strait. Rather, most nutrient is converted to nitrogen gas within the bay and released to the atmosphere. This process is known as denitrification and is necessary to avoid eutrophication. Thus the ability of the reduced models to accurately represent denitrification is of particular interest. In addition, we consider six other performance measures, identified in the next section. 3.2. The purpose of the model The seven performance measures (yi ) that we consider are listed in Table 2, and are a subset of those considered by Murray and Parslow [23]. The purpose of the model is to investigate the effects that increased nutrient inputs to the bay have on these measures. The values of the measures under the current (base case) load are also given in Table 2.

374

J. Lawrie, J. Hearne / Mathematics and Computers in Simulation 79 (2008) 368–378

Table 2 The system diagnostics, their units and their values under the base case nutrient load Diagnostic

Units

Base case value

y1 . PL production y2 . PS production y3 . ZL grazing PL y4 . ZS grazing PS y5 . Denitrification y6 . SG production y7 . MA production

tonnes N year−1 tonnes N year−1 tonnes N year−1 tonnes N year−1 tonnes N year−1 tonnes N year−1 tonnes N year−1

8026 23,260 4775 19,250 1.05 × 106 27,167 176,847

More specifically, the base case is the input of DIN (dissolved inorganic nitrogen = NO + NH), DON and Si from the Yarra and Patterson rivers, Mordialloc creek and the Western Treatment Plant. The increases that we consider involve scaling the base load by factors of 1–4, so that we have s = 4 input vectors um (t). Also, since each diagnostic is an annual average, the chosen simulation period was T = 365 days. This period is long enough for the processes in the model to affect the diagnostics because it is roughly the residence time of Port Phillip Bay [14]. The error tolerance for the reduced-model estimates of the diagnostics was chosen arbitrarily as Enom = 10%. In practical situations, not all of the performance measures are relevant to management objectives. Therefore we seek several reduced models, each appropriate for different purposes. In particular we seek the simplest model to accurately approximate a given diagnostic. Although the remaining diagnostics are not of concern for such a reduced model, their estimates are nevertheless noted. We shall see that for some models the subset of diagnostics that are within the error tolerance can be expanded. The reduced models obtained shall be coded so that they can be easily referenced. The simplest model obtained that adequately estimates diagnostic yi shall be denoted Model AG.i. If this model is also the simplest model that estimates diagnostic yj , then it shall be denoted Model AG.i.j. Note that Model AG.1 might also estimate diagnostic y5 to within the tolerance, but this would not necessarily be the same as Model AG.1.5. Model AG.1.5 would be the simplest model obtained that estimates both y1 and y5 to within the tolerance. 4. Results 4.1. Early results As a first attempt, the aggregation algorithm was first run with Enom = 10−6 . Two aggregates were identified namely k , see Fig. 1) were of order 10−16 and NH + NO and its sediment counterpart NHs +NOs . The resulting errors (Eagg −15 10 respectively, indicating that aggregating these variables will probably have a negligible effect on the system diagnostics. This was indeed the case, as the largest relative error between the full and reduced model diagnostics was about 0.5% (SG production under twice the base load). Hence the ammonia and nitrate pools in each layer can be combined into DIN pools. Furthermore, the order of the model was reduced by two. There was only one additional aggregate when Enom was increased to 10−5 , namely DONs + Sis . However, the resulting model produced negative states, which are nonsensical. This highlights the fact that the algorithm for choosing aggregates can only be a guide. That is, obtaining a small error in Eq. (6) does not imply that the aggregate in question is appropriate. Rather, it merely narrows the list of possible aggregates. 4.2. Model AG.1.2.3.4.7 After neglecting DONs + Sis as a possible aggregate, the most aggregated model that estimated all seven diagnostics to within 10% was identified with Enom = 0.035. This was also the simplest model that adequately estimated diagnostics 1–4, and diagnostic 7. Therefore, this is Model AG.1.2.3.4.7. This model had five aggregates. In addition to the DIN aggregates, the detritus pools were aggregated in both layers. That is, we obtained the aggregates DL + DR and DLs + DRs . The fifth aggregate was MA + BF. This is unexpected

J. Lawrie, J. Hearne / Mathematics and Computers in Simulation 79 (2008) 368–378

375

Table 3 Percentage errors in the diagnostics for Model AG.5.6 Diagnostic

y1 . PL production y2 . PS production y3 . ZL grazing PL y4 . ZS grazing PS y5 . Denitrification y6 . SG production y7 . MA production

Load 1

2

3

4

18 −32 12 −36 7 −3 20

−10 −24 −23 −24 4 6 16

−23 −20 −39 −12 4 9 18

−31 −14 −49 −2 5 7 23

The seven aggregates are NH + NO, NHs + NOs , DL + DR, DLs + DRs , MA + BF, PL + PS and ZL + ZS.

and difficult to explain, since MA and BF occupy different trophic levels and are involved in different processes. On the other hand, these facts may help explain why MA and BF could be successfully aggregated. In particular, the method of PPOD lumps variables that are approximately linearly related, so that variables that vary in a similar way may be aggregated. Sometimes this may be because there is some causal relationship between the variables. At other times, it may be because there is a common underlying cause of variation (known as a “factor” in the social sciences). Since MA and BF are involved in different processes, it is plausible that they are linked by a common source of variation. Further investigation is required to resolve this hypothesis. The lumping of MA and BF was the only example of aggregation between trophic levels found in this study. Although we have shown that this kind of aggregation is possible, there is not yet sufficient evidence to contradict the finding of Fulton [12] that only variables on the same trophic levels should be aggregated. 4.3. Model AG.5.6 The only other aggregates that were successfully identified involved phytoplankton and zooplankton. That is, the reduced models involving any other aggregates resulted either in negative states or singularities. We shall discuss this further shortly. In the meantime, Table 3 shows the relative errors in the diagnostics when the two phytoplankton groups and the two zooplankton groups were aggregated. That is, the new aggregates in this reduced model were PL + PS and ZL + ZS. The only diagnostics that were adequately estimated were denitrification (y5 ) and SG production (y6 ). Thus we have identified Model AG.5.6. The large (in magnitude) errors of the phytoplankton diagnostics indicate that the dynamics of PS and PL are qualitatively different. Similarly, it seems that the dynamics of ZS and ZL are also different. Moreover, macroalgae production was overestimated, implying that the dynamics of both phytoplankton and zooplankton are important for MA. In particular, we learn that since phytoplankton growth (and hence zooplankton grazing) was underestimated, more nutrient was available in the water column for MA production. Also, seagrass production remains largely unaffected since it draws its nutrients from the sediment. Table 4 Percentage errors in the diagnostics when the aggregate PL + PS was removed from model AG.5.6 Diagnostic

y1 . PL production y2 . PS production y3 . ZL grazing PL y4 . ZS grazing PS y5 . Denitrification y6 . SG production y7 . MA production

Load 1

2

3

4

48 −20 80 −32 3 −1 4

−2 −17 −3 −21 3 −6 11

−23 −17 −35 −10 3 1 16

−34 −15 −51 −2 2 8 21

Thus the aggregates of this model were NH + NO,NHs + NOs ,DL + DR, DLs + DRs , MA + BF and ZL + ZS.

376

J. Lawrie, J. Hearne / Mathematics and Computers in Simulation 79 (2008) 368–378

Table 5 A summary of the reduced models obtained using the method of partial proper orthogonal decomposition Model

Aggregates in the reduced model

Reduced model order

AG.1.2.3.4.7 AG.5.6

MA + BF,NH + NO,NHs + NOs , DL + DR, DLs + DRs MA + BF,NH + NO,NHs + NOs ,DL + DR, DLs + DRs ,PS + PL,ZS + ZL

24 (83%) 22 (76%)

The models are the simplest such that the diagnostics are estimated to within 10%. The order of each model is given, as well as the corresponding percentage of the full model order (which is 29) in parentheses.

Similar results were obtained when either the phytoplankton or zooplankton groups were aggregated, but not both. For example, Table 4 shows the relative errors when only the zooplankton groups were lumped. These indicate that the dynamics of the predator–prey pairs, that is, PS and ZS, and PL and ZL, are tightly coupled and require explicit representation if the corresponding diagnostics are to be adequately estimated. That is, the relationship between the two phytoplankton groups is dynamic and requires explicit representation, as does the relationship between ZS and ZL. They also suggest that PL is an important food source for ZL (ZL also consumes DF and MB). 4.4. Aggregation experiments Besides being useful for identifying possible aggregates, the aggregate-finding algorithm can be used to indicate when our intuition may be flawed. For example, it may seem natural to pool the silica variables (Si, Sis , DSi, DSis ), since these appear only in the PL production and ZL grazing diagnostics and are not of direct interest. Several reduced models were created in which every possible combination of dissolved and detrital silica in both layers were aggregated. However, each of these tended to a singularity, implying that the silica variables cannot be aggregated. The final set of experiments undertaken involved aggregating further the groups obtained so far. In particular, various combinations of the DIN and detritus variables were tried, but each resulting model quickly became unstable and tended to plus or minus infinity. That is, it was impossible to lump the DIN or detritus pools from both layers into one group. Therefore it seems that the separation of the water column and sediment layers cannot be simplified, which in turn implies that the ecological processes occurring in these layers are fundamentally different. Table 5 summarises the models obtained in this study. The first two columns show the diagnostics that were estimated to within 10%, while the third shows the aggregates in the corresponding reduced models. The final column gives the order of the reduced model, as well as the percentage of the full model order (which is 29) in parentheses. These models are the simplest such that adequate estimates were obtained. That is, any further aggregation would lead to inadequate estimates, or a nonsensical model. 5. Conclusion Despite its utility, the method of PPOD does not guarantee success. That is, the aggregates generated by the algorithm are not necessarily appropriate, and can result in poor diagnostic estimates or even nonsensical model output. For example, the third aggregate identified by the algorithm was DONs + Sis . However, the resulting model output involved negative states. Thus it can only act as a guide for lumping variables together; the validity of such a grouping can only be verified by experimentation. Nevertheless, the method enabled significant reduction of the order of the Port Phillip Bay Model. Indeed the order 29 PPBM was reduced by between 5 and 7 state variables. Moreover, several insights into the system were revealed by the reduction process. Thus the method can be useful for simplifying complex systems models and enhancing our understanding of them. In the case of the PPBM, the insights gained include: • The ecological processes occurring in the water column and sediment layers are fundamentally different and so this stratification cannot be simplified via aggregation. • Variables in the aggregates typically occupy the same trophic level. The only exception that was identified was the grouping of macroalgae with the filter feeders. Although surprising, this is not sufficient evidence to support aggregating variables on different trophic levels.

J. Lawrie, J. Hearne / Mathematics and Computers in Simulation 79 (2008) 368–378

377

• The DIN pools and the detritus pools could each be aggregated in both layers while still enabling all seven diagnostics to be estimated to within 10% under all loads. MA and BF could also be lumped. • The silica variables (Si, Sis , DSi, DSis ) could not be aggregated. • Large zooplankton and large phytoplankton are tightly coupled and therefore PL is an important food source for ZL. The same holds for PS and ZS. Consequently, lumping the phytoplankton groups results in poor diagnostic estimates, as does lumping the zooplankton groups. The relationships within these groups are dynamic and require explicit representation. On the other hand, seagrass production (y6 ) and denitrification (y5 ) were adequately approximated when the phytoplankton groups or the zooplankton groups were aggregated. The method was introduced for aggregating state variables in large ecosystem models that does not require a detailed a priori understanding of the system. It is a modification of proper orthogonal decomposition and is called partial proper orthogonal decomposition. Acknowledgements This study was funded by the Australian Research Council and by the CSIRO Postgraduate Scholarship Program. The CSIRO (Commonwealth Scientific and Industrial Research Organisation) also supplied the model and data. The authors are grateful to Beth Fulton at CSIRO Marine and Atmospheric Research for her assistance with the model. References [1] E.D. Almymkulov, N.K. Lukyanov, Approximative separability of variables in dynamic models of ecological systems, Ecol. Model. 57 (1991) 165–172. [2] P. Auger, S. Charles, M. Viala, J.C. Poggiale, Aggregation and emergence in ecological modelling: integration of ecological levels, Ecol. Model. 127 (2000) 11–20. [3] P. Auger, R. Bravo de la Parra, Methods of aggregation of variables in population dynamics, Comptes Rendus l’Acad´emie Sci. S´erie III: Vie 323 (2000) 665–674. [4] H. Bugmann, A review of forest gap models, Climatic Change 51 (2001) 259–305. [5] W.G. Cale, P.L. Odell, Behaviour of aggregate state variables in ecosystem models, Math. Biosci. 49 (1980) 121–137. [6] W.G. Cale, R.V. O’Neill, R.H. Gardner, Aggregation error in nonlinear ecological models, J. Theor. Biol. 100 (1983) 539–550. [7] V. Christensen, C.J. Walters, Ecopath with ecosim: methods, capabilities and limitations, Ecol. Model. 172 (2004) 109–139. [8] K.L. Denman, Modelling planktonic ecosystems: parameterizing complexity, Prog. Oceanogr. (2003) 429–452. [9] E. Friedler, Hypertrophic reservoirs for wastewater storage and reuse, in: Ecology, Performance and Engineering Design, vol. 57, SpringerVerlag, Heidelberg, 1993, pp. 105–144. [10] E.A. Fulton, J.S. Parslow, A.D.M. Smith, C.R. Johnson, Biogeochemical marine ecosystem models. ii. The effect of physiological detail on model performance, Ecol. Model. 173 (2004) 371–406. [11] E.A. Fulton, A.D.M. Smith, C.R. Johnson, Mortality and predation in ecosystem models: is it important how these are expressed? Ecol. Model. 169 (2003) 157–178. [12] E.A. Fulton, The Effects of Model Structure and Complexity on the Behaviour and Performance of Marine Ecosystem Models, PhD Thesis, School of Zoology, Hobart, Australia, 2001. [13] L. H˚akanson, Optimal size of predictive models, Ecol. Model. 78 (1995) 195–204. [14] G. Harris, G. Batley, D. Fox, D. Hall, P. Jernakoff, R. Molloy, A. Murray, B. Newell, J. Parslow, G. Skyring, S.J. Walker, Port Phillip Bay Environmental Study Final Report, CSIRO, Canberra, Australia, 1996. [15] Y. Iwasa, V. Andreasen, S. Levin, Aggregation in model ecosystems. i. Perfect aggregation, Ecol. Model. 37 (1987) 287–302. [16] Y. Iwasa, V. Andreasen, S. Levin, Aggregation in model ecosystems. ii. Approximate aggregation, IMA J. Math. Appl. Med. Biol. 6 (1989) 1–23. [17] S.E. Jorgensen, S. Ray, L. Berec, M. Straskraba, Improved calibration of a eutrophication model by use of the size variation due to succession, Ecol. Model. 153 (2002) 269–277. [18] B.W. Kooi, J.C. Poggiale, P. Auger, Aggregation methods in food chains, Math. Comput. Model. 27 (4) (1998) 109–120. [19] J.N. Kremer, P. Kremer, Three-trophic level estuarine model: synergism of two mechanistic simulations, Ecol. Model. 15 (2) (1982) 145–157. [20] J. Lawrie, A method for simplifying large ecosystem models, Nat. Resour. Model. 21 (2) (2008), in press. [21] N.K. Lukyanov, Y.M. Svirezhev, O.V. Voronkova, Aggregation of variables in simulation models of water ecosystems, Ecol. Model. 18 (1983) 235–240. [22] N.K. Lukyanov, Grouping in simulational models of ecological systems, Eng. Cybernet. 5 (1981) 24–29. [23] A. Murray, J. Parslow, Port phillip bay integrated model: final report, technical report, CSIRO, Canberra, Australia, 1997. [24] A. Murray, J. Parslow, The analysis of alternative formulations in a simple model of a coastal ecosystem, Ecol. Model. 119 (2/3) (1999) 149–166. [25] R.V. O’Neill, B. Rust., Aggregation error in ecological models, Ecol. Model. 7 (1979) 91–105.

378

J. Lawrie, J. Hearne / Mathematics and Computers in Simulation 79 (2008) 368–378

[26] J.K. Pinnegar, J.L. Blanchard, S. Mackinson, R.D. Scott, D.E. Duplisea, Aggregation and removal of weak-links in food-web models: system stability and recovery from disturbance, Ecol. Model. 184 (2005) 229–248. [27] M. Rathinam, L.R. Petzold, A new look at proper orthogonal decomposition, SIAM J. Numer. Anal. 41 (5) (2003) 1893–1925. [28] E. Rexstad, G.S. Innis, Model simplification—three applications, Ecol. Model. 27 (1985) 1–13. [29] C.W. Rowley, T. Colonius, R.M. Murray, Model reduction for compressible flows using pod and galerkin projection, Phys. D: Nonlinear Phenomena 189 (1/2) (2004) 115–129. [30] H. Scholten, M.W. Van Der Tol, A.C. Smaal, Models or measurements, ICES, in: Meet. of the Int. Counc. for the Exploration of the Sea, Cascais, Portugal, Copenhagen, Denmark, September 16–19, 1998. [31] V. Van Breusegem, G. Bastin, Reduced order dynamical modelling of reaction systems: a singular perturbation approach, in: Proceedings of the 30th Conference on Decision and Control, Brighton, England, 1991, pp. 1049–1054. [32] E.H. Van Nes, M. Scheffer, A strategy to improve the contribution of complex simulation models to ecological theory, Ecol. Model. 185 (2005) 153–164. [33] S.J. Walker, C.R. Sherwood, A transport model of port phillip bay, technical report, CSIRO, Canberra, Australia, 1997. [34] S.J. Walker, Hydrodynamic models of port phillip bay, technical report, CSIRO, Canberra, Australia, 1997. [35] J.M. Yearsley, D. Fletcher, Equivalence relationships between stage-structured population models, Math. Biosci. 179 (2) (2002) 131–143.