Optimal size of predictive models



ELSEVIER

Ecological Modelling 78 (1995) 195-204

Optimal size of predictive models

Lars Håkanson

Institute of Earth Sciences, Uppsala University, Norbyv. 18B, 752 36 Uppsala, Sweden

Received 27 April 1993; accepted 16 November 1993

Abstract

This work focuses on models for ecosystems, like lakes and coastal areas. The objective has been to discuss the optimization problem, i.e., the balance between an increasing generality as dynamic and empirical/statistical models account for more processes and factors, and the increase in predictive uncertainty associated with this growth. The apparent predictive power, expressed for example by the r²-value (coefficient of determination between model-predicted values and empirical values), may increase with the number of x-variables (independent variables) accounted for in predictive models. However, every x-variable and/or rate in a predictive model has a certain uncertainty, since there are always problems associated with sampling, transport, storage, analyses, etc. Uncertainties in x-variables may be added or multiplied in the model predictions. The optimal size of predictive models is therefore generally achieved for a (surprisingly) small number of independent variables. The results presented here indicate that predictive models should not have more than two to six x-variables (or compartments). It is, however, evident that there may be many specific cases when more x-variables would do more good than harm, but it is important to note that the potential model uncertainty would then also increase.

Keywords: Lake ecosystems; Monte Carlo tests; Optimal size; Predictive models; Predictive power; Uncertainty

1. Introduction and aim

Dynamic models (or mass-balance models) are based on a causal analysis of the processes governing the fluxes of matter and substances, with calculations that use differential equations. If dynamic models are to be used in practice in monitoring and research, the rates that govern the transport between the various compartments need to be known, simulated or estimated. In dynamic modelling, the analysis of dimension (of each parameter) is very important. Dynamic models are mostly used to study complex interactions and time-dependent variations within defined ecosystems, like lakes. Dynamic models are sometimes difficult to calibrate and validate and they tend to grow indefinitely. If dynamic models are not validated against independent, reliable empirical data, they may yield absolutely wrong predictions. Empirical/statistical models are primarily used to quantify differences between ecosystems/areas from simple, readily accessible data. Empirical models and dynamic models are thus used for

0304-3800/95/$09.50 © 1995 Elsevier Science B.V. All rights reserved. SSDI 0304-3800(93)E0103-A


different purposes; they do not compete, they complement each other. The presuppositions of the models must always be clearly stated.

It is well known for nutrients/eutrophication in lakes that large dynamic models often yield worse predictions than simple empirical/statistical models (see, e.g., Ahlgren et al., 1988; Peters, 1991) and that dynamic models for toxins in lake ecosystems may yield extremely bad predictions with uncertainty limits two orders of magnitude apart (see, e.g., BIOMOVS, 1990). It is simple to give verbal "explanations" for this and very difficult to give clear-cut mathematical explanations where general definitions are used for "predictive power" and "model uncertainty". Peters (1991) has discussed this problem from the viewpoint of principle and stressed the predictive failure of many ecological models when tested for ecosystems other than the one for which the model was calibrated. The aim of this paper is certainly NOT to present the final solution to this problem but rather the first (that the author is aware of) quantitative approach to it.

This work focuses on models for ecosystems, like lakes and coastal areas, i.e., rather homogeneous entities of a certain area (about 1 hectare to 100 km²). One important definition of such ecosystems is that the areal and temporal variability of defined characteristic properties, like zooplankton biomass and Secchi depth, within the ecosystems should be smaller than the variability between ecosystems. If the variability within the ecosystem, e.g., in mean weekly Secchi depth, is great, many samples would be needed to empirically determine a representative mean (or median) value, and it would also imply that such a mean value would likely be difficult to predict by models, since there may be many causes of the variability.

There are benefits and drawbacks with all types of models. The main disadvantages with empirical models are that they generally only apply under restricted conditions, as set by the range of the x-parameters, and that they may give poor insight into the causal mechanisms (see, e.g., Håkanson, 1991). The main disadvantages with dynamic mass-balance models (see, e.g., Vemuri,


1978; Straškraba and Gnauck, 1985; Jørgensen and Johnsen, 1989) are that they may be difficult and very expensive to calibrate and validate, and they tend to be large (the "elephantiasis problem"), which entails both practical and economic problems, as well as an accumulated uncertainty in the prediction.

This work focuses on the optimization problem for predictive models, i.e., in contexts where one or a few interesting y-variables are to be predicted from a set of more readily available x-variables. The basic question is: Is it possible to quantitatively assess the optimization problem of model size, i.e., the balance between an increasing generality as dynamic and empirical models account for more processes (more x-variables) and the increase in predictive uncertainty associated with this growth? Every process, state variable and/or compartment added to a dynamic model entails a certain error, since there is always an uncertainty linked to the method of sampling, transport, storage, handling, analysis and data processing. This problem has been addressed in several contexts (see, e.g., Peters, 1986, 1991). It is evident to many modellers that the risks of predictive failure, as determined from both a decreasing accuracy in the prediction of y and increasing uncertainty limits (e.g., confidence or tolerance limits) around the predicted y-value, will increase if more and more x-parameters (like compartments, boxes, factors, rates, processes, etc.) are accounted for in the model. So, is it possible to give some quantitative examples and perhaps even recommendations concerning the optimal size of predictive models?

First, it should be stressed that ecosystem models constructed with the purpose of describing/understanding interactions, foodwebs, fluxes of contaminants, etc. must generally be extensive, due to the great complexities involved in ecosystems! But it is quite another issue with models for specific predictions of just one or two y-variables. Even if "everything depends on everything else" in ecosystems, it is clear that all x-variables could not have the same predictive weight for one specific y-variable. In principle, there seem to exist two avenues in predictive modelling to make that ranking of


influence: by empirical/statistical methods (correlations, etc.) or by sensitivity analyses.

2. The generality

In the balance between generality and accumulated uncertainty, we will here first treat the generality. From a pedagogical point of view, an analogy with lake hydrographic surveys and the construction of bathymetric maps (see Håkanson, 1981) will be made. Fig. 1A illustrates a shoreline of a lake. From this simple information, nothing about the topography of the lake is known. The information value I is zero, i.e., for n = 0, I = 0. The information value is here a value between 0 and 1; I = 1 means complete and correct information of the entire topography of the lake. Fig. 1B shows two contour lines for the depths 2 and 6 m. From this bathymetric map, one has a much better picture of the topography. The information value is 0.55,


according to a formula (Eq. 1) given and motivated in Håkanson (1978). Fig. 1C shows the same lake with four contour lines. The information about deep and shallow areas, etc. is much better, and the I-value is 0.79. Fig. 1D, finally, gives a map with eight contour lines. Then I = 0.96. That is, for n = 8, we have almost as much information about this lake as it is possible to illustrate in a bathymetric map to be printed in a standard book format. It should be noted that the I-value initially increases very markedly. There is a significant difference in how the I-value increases when n increases from, e.g., two to four as compared to from eight to ten. The relationship between I and n is given by:

$$I = \frac{e^{0.4 n} - 1}{e^{0.4 n} + 0.02} \qquad (1)$$

This formula is depicted in Fig. 2A. From the curve for the I-value, it is clear that the information value, I, asymptotically approaches 1 as n → ∞, and that I initially increases very rapidly.
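As a quick numerical check of Eq. 1, the sketch below evaluates the information value for the four maps in Fig. 1. The function name `information_value` is an illustrative choice, and the constant 0.02 in the denominator is taken from the legend of Fig. 2, since that value reproduces the I-values quoted in the text (0.55, 0.79 and 0.96).

```python
import math


def information_value(n: float) -> float:
    """Information value I for n contour lines (or x-variables), after Eq. 1."""
    growth = math.exp(0.4 * n)
    # The constant 0.02 follows the Fig. 2 legend (assumption, see lead-in).
    return (growth - 1.0) / (growth + 0.02)


# n = 0, 2, 4, 8 correspond to panels A-D of Fig. 1.
for n in (0, 2, 4, 8):
    print(n, round(information_value(n), 2))
# prints: 0 0.0, 2 0.55, 4 0.79, 8 0.96 (one pair per line)
```

The gain from n = 2 to n = 4 is about 0.25, while the gain from n = 8 to n = 10 is below 0.03, which is the rapidly flattening behaviour described above.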

[Figure 1 panels: (A) shoreline only, n = 0, I = 0; (B) two contour lines, n = 2, I = 0.55; (C) four contour lines, n = 4, I = 0.79; (D) eight contour lines, n = 8, I = 0.96.]

Fig. 1. Illustration of how the information value (I) increases with the number of contour lines (n) in a bathymetric map.


We can assume that the same principles would also be valid in contexts of predictive modelling. It is evident that there would be many specific cases when the relationships between n and I (for modelling) would differ from this particular I-value (for bathymetric maps). So, this particular definition may be challenged, and we will do some tests. Table 1 gives three test series (stepwise multiple regressions) for:

Table 1. Illustration of how the r²-value increases with the number of x-variables in three stepwise multiple regression models

A. Biological parameter. Hgpe (mg Hg/kg ww) vs RHg (ng Hg/l) and environmental parameters; n = 25 lakes

Step   Parameters                                                   r²
1      1/pH                                                         0.32
2      1/pH, 1/DR                                                   0.47
3      1/pH, 1/DR, √Rock%                                           0.58
4      1/pH, 1/DR, √Rock%, T                                        0.62
5      1/pH, 1/DR, √Rock%, T, log(CaMg)                             0.66
6      1/pH, 1/DR, √Rock%, T, log(CaMg), AcidN                      0.71
7      1/pH, 1/DR, √Rock%, T, log(CaMg), AcidN, 1/(1 + RHg)         0.73

Hgpe = -0.329 + 2.234/pH + 0.008/DR - 0.256/(1 + RHg) + 0.03√Rock% + 0.001√AcidN - 0.146 log(CaMg) - 0.025T

B. Abiotic parameter. RHg vs environmental parameters; n = 25 lakes

Step   Parameters                                                   r²
1      log(Col)                                                     0.47
2      log(Col), log(Q)                                             0.74
3      log(Col), log(Q), 1/CaMg                                     0.81
4      log(Col), log(Q), 1/CaMg, ADA                                0.84
5      log(Col), log(Q), 1/CaMg, ADA, TillN                         0.86
6      log(Col), log(Q), 1/CaMg, ADA, TillN, AcidN                  0.87

RHg = 0.235 + 0.89 log(Col) + 0.807 log(Q) + 0.078/CaMg - 0.015ADA + 0.006TillN - 0.002AcidN

C. Random parameter. pH12 vs 100 random parameters; n = 25

Step   Parameter added                                              r²
1      rand 16                                                      0.23
2      rand 6                                                       0.40
3      rand 7                                                       0.57
4      rand 55                                                      0.67
5      rand 24                                                      0.73
6      rand 52                                                      0.76
7      rand 51                                                      0.80
8      rand 42                                                      0.85
9      rand 38                                                      0.86
10     rand 41                                                      0.88
11     rand 55 (2nd)                                                0.88
12     rand 17                                                      0.93
13     rand 56                                                      0.95
14     rand 96                                                      0.96
15     rand 44                                                      0.97

A. A biological variable, Hgpe (i.e., the Hg-content in perch fry in mg Hg/kg ww), versus lake pH, lake dynamic ratio (DR), Rock% in the catchment (i.e., percentage of rocks in the catchment area), lake water hardness (CaMg in meq/l), AcidN (i.e., the percentage of acid bedrocks in the so-called near area to the lake; see Nilsson and Håkanson, 1992 for definition) and reactive Hg in lake water (RHg in ng/l). The r²-value increases in 7 steps up to 0.73.
B. An abiotic variable, RHg, versus lake colour (in mg Pt/l), theoretical lake water discharge (Q in m³/s), lake water hardness (CaMg in meq/l), catchment area (ADA in km²), TillN (i.e., the percentage of till in the so-called near area to the lake) and AcidN (i.e., the percentage of acid bedrocks in the so-called near area to the lake). The r²-value increases in 6 steps up to 0.87.
C. Mean annual lake pH (pH12) for 25 lakes versus 100 random parameters. The r²-value increases in 15 steps up to 0.97.


1. A simple biological variable; Hgpe, i.e., the mean lake Hg-content in perch fry in mg/kg wet weight based on data from about 10 fish per lake from 25 Swedish lakes; data from Håkanson et al., 1990; predicted from many different empirical parameters from the catchment area (like percentage of rocks, lakes and mires), the bathymetric map of the lake (like mean depth, Dm, and dynamic ratio, i.e. √area/Dm) and from lake chemical variables (like pH and colour).
2. An abiotic (chemical) variable; RHg, i.e., mean lake reactive Hg from water samples; in ng/l based on about 10 monthly samples per lake from 25 lakes; data from Håkanson et al., 1990; predicted from different empirical lake parameters.
3. Mean annual lake pH predicted from random parameters (as given by a random data generator); pH12 is determined from monthly lake samples from 25 lakes.

It is evident that these selected variables are

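[Figure 2A: degree of explanation (r² or I) plotted against the number of x-variables (0 to 20) for the biological parameter (Hgpe), the abiotic parameter (RHg), the random parameter (pH12, n = 25) and the information value I = (e^(0.4n) - 1)/(e^(0.4n) + 0.02), where n is the number of contour lines or the number of x-variables used to predict y.]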

Fig. 2. (A) Graphical illustration of the relationship between the degree of explanation (r² or I) and the number of x-parameters (n) in a predictive model for a biological variable (Hgpe, Hg-content in perch fry in mg Hg/kg ww), an abiotic variable (RHg, Hg-content in lake water in ng/l; see Lindqvist et al., 1991 for definition), a prediction of mean annual lake pH (pH12) based on random parameters for 25 lakes, and the information value (I) as defined by the given formula (valid for bathymetric maps). (B) The same two curves for Hgpe and RHg as in Fig. 2A but a new curve for the prediction of mean annual lake pH (pH12) based on random parameters for 35 lakes.


not typical of all biotic and abiotic lake variables; they have been used here as examples to illustrate some important principles.
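To make the form of such an empirical model concrete, the final regression of Table 1B (RHg = 0.235 + 0.89 log(Col) + 0.807 log(Q) + 0.078/CaMg - 0.015ADA + 0.006TillN - 0.002AcidN) can be written as a small function. This is only a sketch: the function and argument names are illustrative, "log" is assumed to mean the base-10 logarithm, and the input values in the example are hypothetical rather than data from any of the 25 lakes.

```python
import math


def predict_rhg(colour, discharge, camg, catchment_area, till_near, acid_near):
    """Reactive Hg in lake water (ng/l), final step of Table 1B.

    colour: lake colour (mg Pt/l); discharge: Q (m3/s); camg: hardness (meq/l);
    catchment_area: ADA (km2); till_near, acid_near: TillN and AcidN (%).
    Assumption: log in the table is log10.
    """
    return (0.235
            + 0.89 * math.log10(colour)
            + 0.807 * math.log10(discharge)
            + 0.078 / camg
            - 0.015 * catchment_area
            + 0.006 * till_near
            - 0.002 * acid_near)


# Hypothetical lake, chosen only to exercise the function.
example = predict_rhg(colour=100.0, discharge=1.0, camg=0.1,
                      catchment_area=10.0, till_near=50.0, acid_near=50.0)
print(round(example, 3))  # prints 2.845
```

As with any empirical model, such a function should only be applied within the range of the x-variables for which the regression was derived.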

The following conclusions may be drawn from Table 1.
- These results indicate what ought to be generally true, namely that it should be more difficult to predict biological variables than abiotic variables: as soon as biology enters the arena, predictions often get very difficult. This is evident from this simple example where we have selected an elementary biological parameter, the Hg-concentration in fish: from 7 steps one would here obtain an r²-value of 0.73. Our abiotic variable, a chemical variable as given by a specific fraction of mercury in lake water called reactive Hg, is by definition not a biological parameter. It should, however, be noted that the actual value of RHg in a given lake would depend on many complex biological processes (see, e.g., Lindqvist et al., 1991; Meili, 1991). The message here is that it is often possible when predicting chemical (and also probably physical) lake variables to obtain higher r²-values (here 0.87) from fewer steps (here 6 instead of 7), as compared to most biological variables.
- It is possible to obtain very high r²-values in models based entirely on random parameters, under two presuppositions: (1) the number of observations (n, here lakes) must be small (in Table 1C we have used n = 25 to get comparable data to the results in Table 1A and B), and (2) F must be small (we have set F to 1 in Table 1C; normally F would be set to 4; see any basic textbook in statistics for definition of F, e.g., Neter et al., 1988; Taylor, 1990; Helsel and Hirsch, 1992) so that many steps would be accepted (there are 15 steps in Table 1C).

The results in Table 1 are graphically illustrated in Fig. 2A, where they are compared to the information value, as defined by Eq. 1. From this figure, one can note: (1) that the degree of explanation (r² or I; r² is the coefficient of determination; r = correlation coefficient) increases as the number of x-variables accounted for in the models increases, (2) that the curve for the I-value from Eq. 1 falls between the three other curves, and (3) that the curves for the biological variable (Hgpe) and the random parameter prediction of pH12 (for n = 25) are very close to one another. But this is totally dependent on the choice of n. This is illustrated in Fig. 2B, where n is set to 35. Then we can note that this curve is significantly different from the curve for Hgpe. The reason for this is simply that it is much more unlikely to obtain high r²-values for random parameters if n is large.

To conclude: the predictive power, expressed for example by the r²-value, generally increases with the number of x-variables accounted for in the predictive model, but differently for biological and chemical variables (and for different modelling presuppositions). This has been exemplified here with regression models, but the same principles ought to apply also to dynamic models.
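The effect of random predictors can be reproduced in outline with a few lines of NumPy. The sketch below is not the procedure used for Table 1C: it uses synthetic normal data instead of the measured pH12 values, it adds at each step the candidate that gives the largest r² (a plain forward selection with no F-to-enter threshold), and all function names and seeds are hypothetical.

```python
import numpy as np


def r_squared(X, y):
    """Coefficient of determination of a least-squares fit with intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return 1.0 - ss_res / ss_tot


def forward_selection(X, y, max_steps):
    """Greedy forward selection; returns chosen columns and r2 after each step."""
    chosen, history = [], []
    for _ in range(max_steps):
        remaining = [j for j in range(X.shape[1]) if j not in chosen]
        scores = [r_squared(X[:, chosen + [j]], y) for j in remaining]
        chosen.append(remaining[int(np.argmax(scores))])
        history.append(max(scores))
    return chosen, history


def random_predictor_r2(n_obs, n_candidates=100, max_steps=15, seed=0):
    """r2 per step when y and all candidate x-variables are pure noise."""
    rng = np.random.default_rng(seed)
    y = rng.normal(size=n_obs)
    X = rng.normal(size=(n_obs, n_candidates))
    return forward_selection(X, y, max_steps)[1]


r2_25 = random_predictor_r2(25)  # few observations, as in Table 1C
r2_35 = random_predictor_r2(35)  # more observations, as in Fig. 2B
print([round(v, 2) for v in r2_25])
print([round(v, 2) for v in r2_35])
```

Because each step can only add explanatory terms, the r² sequence never decreases, even though none of the predictors carries any information about y.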

3. The accumulated uncertainty

Every state variable and/or rate in a dynamic model and x-variable in an empirical/statistical model has a certain uncertainty (see, e.g., Håkanson, 1992). Often in dynamic modelling, one would use a mean or a median value for a given ecosystem and apply this value (e.g., sedimentation rate, partitioning coefficient, retention time) as a constant in different model calculations/simulations. Uncertainty analysis means that the uncertainties in all the x-variables in the model are accounted for in the determination of the predicted y-value. (Sensitivity analysis means that one model parameter is varied while the rest of the model parameters are kept constant.) Uncertainties in x-parameters may, in principle, be added or multiplied in the model. From any statistical textbook (see, e.g., Neter et al., 1988 or Helsel and Hirsch, 1992), it is easy to demonstrate that there exists a mathematical and exact way of defining the uncertainty of additive models, since the standard deviation, SD, may be written as:

$$SD_y = SD_x \sqrt{n} \qquad (2)$$

$$y = \sum_{i=1}^{n} x_i \qquad (3)$$

where SD_y = the standard deviation of the y-variable; SD_x = the standard deviation of all the x-variables. The relationship between the number of x-variables, n, in a simple additive model and the uncertainty, as given by SD_y, is given in Fig. 3A (the curve linked to the crosses). In the following, we will use Eq. 2 as a reference and test both additive (summation) models and products (multiplicative models) using x-variables with different standard deviations. It seems to be a rather difficult task to find exact solutions for multiplicative models, so in the following, the Monte Carlo technique will be used to derive numerical solutions for all types of model uncertainties.

Fig. 4A gives one example of a frequency distribution for a given standard x-variable to be used in the following uncertainty exercise. The mean value (MV) is set to 1.0, the standard deviation (SD) to 10% of the mean, and the distribution is assumed to be normal. This is, in fact, a rather low relative standard deviation (V = 100 * SD/MV) in lake ecosystem contexts; Fig. 5 illustrates that many standard lake variables generally appear with significantly higher V-values!

[Figure 4 panels: (A) basic x-distribution, 10,000 trials, MV = 1.0, SD = 0.1, 95% confidence interval 0.80 to 1.19; (B) multiplication of 4 variables, 95% confidence interval 0.65 to 1.44; (C) summation of 4 variables, 95% confidence interval 3.61 to 4.39.]

Fig. 4. Uncertainty analyses for predictive models. (A) The basic uncertainty distribution for the given standard x-variable in this test. MV = 1.0, SD = 0.1, normal distribution, 10000 trials using the Monte Carlo technique. (B) The calculated uncertainty in the y-variable after multiplication of four x-variables. (C) The calculated uncertainty in the y-variable after summation of four x-variables.

Fig. 5. (A) Compilation of relative standard deviations (V in %) for mercury (in fish and sediments), pH, conductivity, hardness (CaMg), Ca, colour (Col), Fe, total-P and alkalinity, which are all linked to limitations/errors in analytical techniques, sampling and laboratory determinations (including intercalibrations). (B) Relative standard deviations for different water parameters (including water temperature). The values are based on mean annual values from natural lakes; data from 75 Swedish lakes. (C) Relative standard deviations for different mercury parameters (in pike, Hgpi; in perch, Hgpe; reactive plus non-reactive Hg in lake water, RIHg; reactive-Hg, RHg; total-Hg, THg; and non-reactive-Hg, IHg). The values expressing Hg in water are based on mean annual values from 25 Swedish lakes. From Håkanson (1992).

The frequency distributions in Fig. 4 have been determined from 10000 trials using the Monte Carlo technique. In the following, we will use this particular frequency distribution in Fig. 4A (SD = 10%), and also distributions where SD is set to 20, 30 and 40% of the mean value. Fig. 4B shows the uncertainty in a y-variable if one multiplies four x-variables with uncertainty distributions of the type given in Fig. 4A (SD = 10%). We can note that the mean is, naturally, still 1 (1 * 1 * 1 * 1), that the standard deviation has increased and that the confidence interval has widened (from 0.65 to 1.44, as compared to 0.80 to 1.19; the 95% confidence interval corresponds to about 2 * SD on either side of the mean). Fig. 4C gives analogous results for a summation of four x-variables (of the type in Fig. 4A), i.e., 1 + 1 + 1 + 1. The mean y-value is 4. The 95% confidence limits vary between 3.61 and 4.39.

Calculations of this type have been made for a whole range of n-values (2, 4, 8, 16, 32) and the results are given in Fig. 3A and B. SD,Prod means a standard deviation obtained from a product where each model variable has a SD of 10%; SD,Sum means the same thing for a summation model; Prod*-V means a relative standard deviation (SD/MV) for a model where the model parameters have SD-values of 10, 20, 30 and/or 40%. The series is:

n = 1: x1
n = 2: x1 * x2
n = 4: x1 * x2 * x3 * x4

with SD1 = 10%, SD2 = 20%, SD3 = 30% and SD4 = 40% of the mean value (MV). The relative standard deviation, V, would, of course, improve the comparison for x-variables with different mean values.

Now, if one plots such SD- and V-values as uncertainty factors for different n, Fig. 3 shows that, as expected, the three curves for SD = 10% are very close to the exact curve (0.1 * √n). Since these three curves depend on the given presuppositions for the x-variable, they may NOT be used generally. This is evident from Fig. 3B, which compares the results for SD = 10% (curve SD,Prod and curve 0.1 * √n) with two curves with more uncertain x-variables (10, 20, 30 and 40%). One may also note that the relative standard deviation (V,Sum) naturally decreases when we add parameters. The reason is that the probability of getting a value which departs 2 * SD (0.2 units) from the mean of 1 would be about 0.02 for the initial x-distribution, but the probability of getting a value which departs (0.2 + 0.2 = 0.4 units) from the new mean of 2 would be much less than 0.02. From these results, one can note that the uncertainty factor increases significantly as n increases.

Fig. 3. (A) Uncertainty factors for the y-variable in a predictive model with different numbers of x-variables (n) determined in two different ways, by summation (SD,Sum and V,Sum) and as a product (SD,Prod and V,Prod) of the x-variables in the model. The curve for the exact expression 0.1 * √n is used as a reference line. All x-variables in this test have a mean value of 1 and a standard deviation (SD) of 0.1. (B) The same results as under (A) but for situations when the standard deviation of the x-variable is varied (0.1, 0.2, 0.3 and 0.4; Prod*-V and Sum*-SD).
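The Monte Carlo exercise described in this section can be sketched as follows. The sketch assumes independent, normally distributed x-variables with MV = 1.0 and SD = 0.1 and uses NumPy's random generator; the function name, the seed and the independence assumption are illustrative choices, since the original simulation settings beyond the 10000 trials are not given.

```python
import numpy as np


def monte_carlo_uncertainty(n_vars, sd=0.1, mean=1.0, trials=10_000, seed=1):
    """Simulate y as a sum and as a product of n_vars independent x-variables."""
    rng = np.random.default_rng(seed)
    x = rng.normal(loc=mean, scale=sd, size=(trials, n_vars))
    results = {}
    for name, y in (("sum", x.sum(axis=1)), ("product", x.prod(axis=1))):
        sd_y = y.std(ddof=1)
        # v is the relative standard deviation SD/MV of the simulated y.
        results[name] = {"mean": y.mean(), "sd": sd_y, "v": sd_y / y.mean()}
    return results


# Compare the simulated SD of the sum with the exact reference 0.1 * sqrt(n).
for n in (2, 4, 8, 16, 32):
    r = monte_carlo_uncertainty(n)
    print(n,
          round(r["sum"]["sd"], 3),
          round(0.1 * np.sqrt(n), 3),
          round(r["product"]["sd"], 3),
          round(r["sum"]["v"], 3))
```

For n = 4 the simulated SD of the sum should lie close to the exact value 0.2 from Eq. 2, while the relative standard deviation of the sum falls as n grows because the mean grows faster than the standard deviation.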

4. The optimal size

To determine the optimal size (= number of x-variables) of a predictive model one can combine the information value (r² or I) from Fig. 2 and any of the uncertainty factors (SD or V) from Fig. 3 in several ways, e.g.:
- maximize I/V (or r²/SD), see Fig. 6A,
- maximize I * (1 - V), see Fig. 6B, or
- maximize I - V, see Fig. 6C and D,
since the I-value should be as large as possible and the V-value as small as possible. If the uncertainty factor (V or SD) approaches zero, i.e., if there is no uncertainty linked to the parameters and state variables in the model, then the model would be better as more compartments and processes are added. The expression (1 - V) could, of course, attain negative values if the standard deviation is larger than the mean value and V larger than 1. So, the expression I * (1 - V) is a constructed expression and the optimal model derived from this expression ought to be less interesting than the results from the expression I/V.

From Fig. 6, one can note that the optimal size for predictive models (under the given conditions) is generally achieved for (surprisingly) small n. The reason is that with these definitions, the predictive power (I or r²) increases rapidly when n is small and the increasing accumulated error (V or SD) presses down the factor to be optimized for higher n. From these presuppositions, one could recommend that predictive dynamic compartment models and empirical/statistical regression models generally should not have more than two to six x-variables (or compartments). It is evident that there may exist many specific cases when more x-variables would do more good than harm, but it is important to note that the uncertainties would

Fig. 6. Calculation of the optimal number of x-variables in a predictive model (under given conditions). I = the information value (or the r²-value), which should be as large as possible. V = the accumulated model uncertainty (the standard deviation SD or the relative standard deviation V = SD/MV), which should be as small as possible. n = the number of compartments or x-variables in the model. (A) The result for a situation to maximize the ratio I/V. V is here equal to V,Prod from Fig. 3A. (B) The results to maximize the product I * (1 - V). (C) The results to maximize I - Prod*-V. (D) The results to maximize I - Sum*-SD.

then increase and the model would appear to be better than it actually is likely to be (a case of “objectivity deodorant”) - the uncertainty limits would get wider. Models should always be related to well-focused questions. The question often dictates the number of variables. These results apply to predictive models where just a few y-variables are requested. These results support the old statement: The simpler the better!
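The three optimization criteria can be explored numerically with a short sketch. It is not a reproduction of Fig. 6: the uncertainty factor is assumed here to follow the additive reference curve V = SD_x * √n instead of the Monte Carlo values behind the figure, the constant 0.02 in the information value follows the Fig. 2 legend, and the names `optimal_size` and `CRITERIA` are hypothetical.

```python
import math


def information_value(n):
    """Information value I after Eq. 1 (constant 0.02 as in the Fig. 2 legend)."""
    growth = math.exp(0.4 * n)
    return (growth - 1.0) / (growth + 0.02)


def uncertainty(n, sd_x):
    """Assumed uncertainty factor: additive reference curve SD_x * sqrt(n)."""
    return sd_x * math.sqrt(n)


# The three expressions to be maximized in Section 4.
CRITERIA = {
    "I/V": lambda i, v: i / v,
    "I*(1-V)": lambda i, v: i * (1.0 - v),
    "I-V": lambda i, v: i - v,
}


def optimal_size(criterion, sd_x, max_n=20):
    """Number of x-variables (1..max_n) that maximizes the chosen criterion."""
    score = CRITERIA[criterion]
    return max(range(1, max_n + 1),
               key=lambda n: score(information_value(n), uncertainty(n, sd_x)))


for name in CRITERIA:
    print(name, [optimal_size(name, sd) for sd in (0.1, 0.2, 0.3, 0.4)])
```

Under these assumptions the ratio I/V peaks at n = 3 whatever the value of SD_x, while the difference I - V peaks at n = 6 for SD_x = 0.2; the optimum moves towards smaller n as the x-variables become more uncertain.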


These results should be considered as the first step towards solving the very important problem of the optimal size of predictive models. These examples illustrate the need to get further insights into the factors that determine "predictive power" and "model uncertainty". Is it possible to derive any general mathematical theory for this problem? Would such a general theory be useful in practice? How valid are these results for dynamic models with positive and negative feedbacks?

References

Ahlgren, I., Frisk, T. and Kamp-Nielsen, L., 1988. Empirical and theoretical models of phosphorus loading, retention and concentration vs. lake trophic state. Hydrobiologia, 170: 285-303.
BIOMOVS, 1990. Scenario A1. Mercury in aquatic ecosystems. Tech. Report 7. National Inst. of Radiation Prot., Stockholm, Sweden.
Håkanson, L., 1978. Optimization of lake hydrographic surveys. Water Resour. Res., 14: 545-560.
Håkanson, L., 1981. A Manual of Lake Morphometry. Springer-Verlag, Berlin, 78 pp.
Håkanson, L., 1991. Ecometric and Dynamic Modelling Exemplified by Cesium in Lakes after Chernobyl. Springer-Verlag, Berlin, 158 pp.
Håkanson, L., 1992. Considerations on representative water quality data. Int. Rev. Ges. Hydrobiol., 77: 497-505.
Håkanson, L., Andersson, P., Andersson, T., Bengtsson, A., Grahn, P., Johansson, J-A., Kvarnäs, H., Lindgren, G. and Nilsson, A., 1990. Measures to reduce mercury in lake fish. Final report from the Liming-mercury-cesium project. Nat. Environ. Prot. Agency, Solna, Sweden, SNV PM 3818, 189 pp.
Helsel, D.R. and Hirsch, R.M., 1992. Statistical Methods in Water Resources. Elsevier, Amsterdam, 522 pp.
Jørgensen, S.E. and Johnsen, J., 1989. Principles of Environmental Science and Technology (2nd edition). Studies in Environmental Science, 33. Elsevier, Amsterdam, 628 pp.
Lindqvist, O., Johansson, K., Aastrup, M., Andersson, A., Bringmark, L., Hovsenius, G., Håkanson, L., Iverfeldt, A., Meili, M. and Timm, B., 1991. Mercury in the Swedish Environment. Water, Air and Soil Pollution, Vol. 55. Kluwer, Dordrecht, 261 pp.
Meili, M., 1991. Mercury in boreal forest lake ecosystems. Acta Univ. Upsaliensis 336. Thesis, Uppsala University.
Neter, J., Wasserman, W. and Whitmore, G.A., 1988. Applied Statistics, 3rd ed. Allyn and Bacon, Boston, 1006 pp.
Nilsson, A. and Håkanson, L., 1992. Relationship between drainage area characteristics and lake water characteristics. Environ. Geol. Water Sci., 19: 75-81.
Peters, R.H., 1986. The role of prediction in limnology. Limnol. Oceanogr., 31: 1143-1159.
Peters, R.H., 1991. A Critique for Ecology. Cambridge Univ. Press, Cambridge, 366 pp.
Straškraba, M. and Gnauck, A., 1985. Freshwater Ecosystems. Modelling and Simulation. Developments in Environmental Modelling, 8. Elsevier, Amsterdam, 310 pp.
Taylor, J.K., 1990. Statistical Techniques for Data Analysis. Lewis, Chelsea, 200 pp.
Vemuri, V., 1978. Modeling of Complex Systems. Academic Press, London, 448 pp.
