How to validate environmental indicators

C. Bockstaller a,*, P. Girardin b

a Association pour la Relance Agronomique en Alsace (ARAA), c/o INRA, BP 507, 68021 Colmar Cedex, France
b INRA, Equipe Agriculture et Environnement, BP 507, 68021 Colmar Cedex, France

Agricultural Systems 76 (2003) 639–653

Accepted 1 May 2002

* Corresponding author. Tel.: +33-3-89-22-49-80; fax: +33-3-89-22-49-33. E-mail address: [email protected] (C. Bockstaller).

Abstract

Different types of environmental indicators have been developed to meet the increasing need for instruments to assess environmental impacts. Like any tool developed by research, indicators must be elaborated according to a scientific approach, and one of the important steps of this elaboration is validation. The overall objective of this paper is to present a methodological framework to validate indicators. According to the definition of an indicator, three kinds of validation are presented: the "design validation", to evaluate whether the indicator is scientifically founded; the "output validation", to assess the soundness of the indicator outputs; and the "end-use validation", to make sure the indicator is useful and used as a decision-aid tool. The output validation is inspired by the validation of simulation models, which is briefly reviewed. Because indicators differ from models in many respects, validation procedures commonly used in modelling have to be adapted.

Keywords: Sustainability; Environment; Indicator; Validation; Nitrogen; Pesticide

1. Introduction

In the last decade, the development of indicators at the national, regional, local or field level has become a common approach to meet the crucial need for assessment tools. Such tools are a prerequisite to the implementation of the concept of sustainability, and especially of its environmental component (Hansen, 1996). The increasing use of indicators to face this challenge may be explained by the impossibility of carrying out direct measurements in many monitoring programs, due to methodological problems or practical reasons of cost and time, and by the lack of feasibility of many simulation models developed as alternatives to direct measurements.


The following definitions of an indicator refer to those considerations: an "indicator is a variable which supplies information on other variables which are difficult to access (. . .) and can be used as benchmarker to take a decision" (Gras et al., 1989); indicators are "alternative measures (. . .) enable us to gain an understanding of a complex system (. . .) so that effective management decisions can be taken that lead towards initial objectives" (Mitchell et al., 1995). Both definitions imply: (1) an informative function, i.e. to supply simplified information about a complex system (e.g. an agrosystem) or an unmeasurable criterion (e.g. biodiversity, sustainability); it should be noticed that those definitions do not require an accurate simulation of a system or the prediction of a variable, as is expected of models; (2) a decision-aid function, i.e. to help achieve the initial objectives, e.g. the sustainability of a farming system.

Indicators may result from a set of measurements, from calculated indices, or they may be based on expert systems. At least two types of indicators may be distinguished (Girardin et al., 1999): simple indicators, resulting from the measurement or the estimation (e.g. by a model) of an indicative variable, and composite indicators, obtained by the aggregation of several variables or simple indicators. Mineral nitrogen content in soil before winter (Schröder et al., 1996; Machet et al., 1997) or bioindicators such as the number of earthworms per surface unit (Smith et al., 2000) are examples of the former type, whereas the index of sustainable economic welfare (ISEW) developed by Stockhammer et al. (1997) belongs to the latter group.

The nature as well as the purpose of an indicator, as mentioned before, differ from those of a simulation model. Many indicators are not aimed at predicting an actual impact but at supplying information about a risk or a potential effect (Halberg, 1999). Indicators can also inform policy makers about the progress being made towards achieving a policy objective (Crabtree and Brouwer, 1999; Vos et al., 2000); thus, the role of the indicator may be to signal positive movements as well as negative ones. Some are also aimed at "raising the alarm", meaning that they should give information on a negative impact before it actually occurs (Reus et al., 1999). In some cases, however, indicators are derived from simulation models in order to make the output easier to understand and to "relay a complex message in a simplified manner" (Fisher, 1998).

In any case, the methodology underlying the elaboration and development of indicators should meet scientific standards, which implies a validation procedure (Girardin et al., 1999). Such a procedure is currently applied in the development of simulation models and is a decisive step in the evaluation of model quality. However, few developers of indicators deal with this issue and propose a detailed procedure. Some authors mention the necessity for indicators to be scientifically valid (Mitchell et al., 1995; Crabtree and Brouwer, 1999; Smith et al., 2000; Vos et al., 2000), but they do not propose a procedure for validation. Nor did van der Werf and Petit (2002) find detailed information on this issue in a review describing and evaluating twelve methods for the assessment of environmental impacts.
A qualitative approach, based on assessing the potential of six methods to yield undesirable outcomes, can be found in Hertwich et al. (1997). Some authors go further: Sharpley (1995) gives an example of the validation of a phosphorus index, based on the comparison of the indicator outputs with field measurements, as is generally done in modelling.


In contrast, Appel et al. (1994) failed to validate in the same way an indicator based on the calculation of a nitrogen budget at field level. Nevertheless, since the nature and the purpose of an indicator differ from those of a predictive model, it is questionable whether the same procedure as in modelling can be implemented to test indicators.

The objective of this article is to propose a methodological framework to validate indicators. We will draw on the experience gained by modellers in validation and will focus on environmental indicators.

2. A general framework for the validation of indicators

From the definition given in the Oxford dictionary and discussed by Addiscott et al. (1995), something is validated if "it is well founded and it achieves the overall objectives or it produces the intended effects". Harrison (1990) and Kirchner et al. (1996) also refer to the last part of this definition and define validity as the adequacy for a specific purpose.

The first point of the given definition requires the assessment of the scientific quality of the construction or design of a given tool. We will call this step a "design validation" (Gilmour, 1973). Concerning the second point, it follows from the definitions given in the introduction that indicators have a double function: to supply information and to help take management decisions. These two aspects refer, respectively, to the soundness of the indicator outputs and to the usefulness of the indicator for potential users, which implies, respectively, an "output validation" (Gilmour, 1973) and an "end-use validation" (Girardin et al., 1999). Thus, an indicator will be validated if it is scientifically designed, if the information it supplies is relevant, and if it is useful and used by the end-users. Three types of indicator validation correspond to these three conditions (Fig. 1).

Fig. 1. A flowchart for the framework of indicator validation.

3. Validation of the indicator design

In modelling, the design validation or conceptual validation (Mitchell and Sheehy, 1997) is carried out by the reviewers of the scientific papers describing a model. However, this first type of validation is often overshadowed by the validation of the model outputs, or empirical validation (Mitchell and Sheehy, 1997), based on a comparison with a set of measured data, which is considered the key step in model validation (Gilmour, 1973).


For indicators, the design validation may be very important when no other possibility of validation exists (see Section 6), as pointed out by Reus et al. (1999). It consists of submitting the design or construction of the indicator to a panel of experts. Van der Werf and Zimmer (1998) applied this procedure to their indicator of pesticide environmental impact, which led them to improve the indicator before publishing it. Another example is given by Taylor et al. (1993). Expert consensus plays a major role in this kind of validation. The design validation may also be carried out a priori, using expert judgements to select the variables to be measured as indicators. This is the case when the choice of indicators is made through a survey among a panel of experts, for instance by means of the Delphi technique (Hess et al., 1999; Smith et al., 2000).
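As an illustration of this a priori selection step, the sketch below aggregates expert ratings in the spirit of a Delphi round. It is a minimal sketch under invented assumptions: the 1-5 relevance scale, the candidate variables, the scores and the consensus rule are all illustrative and are not taken from Hess et al. (1999) or Smith et al. (2000).

```python
import statistics

# Hypothetical Delphi-type round: five experts rate candidate indicator
# variables on a 1-5 relevance scale (variable names and scores invented).
ratings = {
    "soil mineral N before winter": [5, 4, 5, 4, 5],
    "earthworm density":            [4, 3, 4, 5, 3],
    "field slope class":            [3, 3, 2, 4, 3],
}

def consensus(scores, spread_limit=1.0):
    """Median rating plus a crude consensus flag: the panel is taken to
    agree when the interquartile spread stays within spread_limit."""
    q1, med, q3 = statistics.quantiles(scores, n=4)
    return med, (q3 - q1) <= spread_limit

for variable, scores in ratings.items():
    med, agreed = consensus(scores)
    print(f"{variable}: median={med}, "
          f"{'retain' if agreed else 'resubmit in next round'}")
```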

4. Validation of the indicator outputs

The purpose of this procedure, which focuses on the informative function of indicators, is to evaluate their reliability. This is a very common approach in modelling. We will briefly review the validation procedures usually used for models, as this will help to define a specific approach for the output validation of indicators.

4.1. What can be drawn from model validation?

First of all, it is important to know the limits of the objectivity and scientific value of a validation procedure. Based on philosophical and linguistic considerations, Konikow and Bredehoeft (1992) and Addiscott et al. (1995) show that validation never enables one to prove the veracity of a model, but only to invalidate it. If the probability of refuting a model is small, it can be considered "validated". Thus, validation is always linked to a certain element of subjectivity, because the model developer has to decide which level of probability is acceptable. At least three types of procedure have been proposed by modellers for the implementation of validation (Mayer and Butler, 1993):

1. The visual procedure is commonly used by modellers and implemented by means of a plot, either comparing the time-trace of the model with the time series of the measured data, or comparing observed data with model predictions in relation to the line of perfect agreement. Kirchner et al. (1996) strongly criticise validation procedures based only on this kind of test. They demonstrate that an invalid model may satisfy the visual test, and they point out the lack of objectivity of the commonly used and vague statement of "acceptable agreement with the data". Actually, this visual step is only useful to reveal possible biases of a model.


2. The statistical procedure may give more objectivity to the test. In practice, the implementation of statistical tests raises some problems and is not recommended by several authors (Harrison, 1990; Mitchell, 1997). Harrison (1990) proposes to limit model validation to "descriptive (and not inferential) devices". Several statistical variables assessing the deviation are proposed by Mayer and Butler (1993) and Wallach and Goffinet (1989). In any case, Mayer and Butler (1993) state the necessity of using several "validation measures" to "appreciate the whole picture"; a numerical sketch of such measures is given at the end of this sub-section. Thus, Yang et al. (2000) analyse the correlations between some statistical variables commonly used in validation, in order to avoid redundancy. An example of this approach based on several "validation measures" can be found in Mary et al. (1999), who used several statistical variables for the validation of their model. To address the lack of inferential methods, Mitchell (1997) and Mitchell and Sheehy (1997) developed an empirical test consisting of checking graphically whether 95% of the deviations (prediction minus observation) lie within an acceptance envelope on the plot comparing deviations with observed data. The definition of this envelope depends on the purpose of the model and the precision of the observed data (Fig. 2).

Fig. 2. A method of empirical validation of simulation models (after Mitchell, 1997 and Mitchell and Sheehy, 1997). The test consists of checking graphically whether 95% of the deviations (prediction minus observation) lie within an acceptance envelope on the plot comparing deviations with observed data. The acceptance envelope may have another shape or another range of variation (given by k); both depend on the precision of the measured data.

3. The third procedure is based on the judgement of experts. Such an approach, as proposed by Mayer and Butler (1993), consists of selecting a panel of experts in the same field of interest and submitting the outputs of the models and real-world data to them. Some formalised tests, such as Turing-type tests, exist for this kind of procedure. The comparison of model predictions with data from the scientific literature may also be included in this type of approach (Chevalier-Gérard et al., 1994). Because of the subjectivity inherent in procedures based on expert judgement, Mayer and Butler (1993) consider this type of validation rather as a complement to the more objective approaches mentioned previously. However, for specific types of models, such as expert systems for decision aid (Harrison, 1991) or multicriteria analysis models (Qureshi et al., 1999), approaches based on expert judgements have also been recommended and used for validation.
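To make the descriptive "validation measures" and the envelope test concrete, here is a minimal numerical sketch. The paired data are invented, and the constant envelope half-width k is only the simplest of the shapes allowed by Mitchell (1997); this is an illustration, not a prescribed implementation.

```python
import numpy as np

# Invented paired data: model predictions vs. field observations.
observed  = np.array([52.0, 61.0, 74.0, 80.0, 95.0, 110.0])
predicted = np.array([55.0, 58.0, 70.0, 86.0, 99.0, 104.0])

deviations = predicted - observed  # the quantity plotted in Mitchell's test

# Several descriptive validation measures, as Mayer and Butler advise.
rmse = np.sqrt(np.mean(deviations ** 2))   # root mean squared error
bias = np.mean(deviations)                 # mean deviation (systematic error)
ef = 1.0 - np.sum(deviations ** 2) / np.sum((observed - observed.mean()) ** 2)
                                           # modelling efficiency (1 = perfect)

# Empirical envelope test: do at least 95% of the deviations lie within
# the acceptance envelope? A constant half-width k is assumed here; the
# envelope may also vary with the observed value.
k = 8.0
inside_fraction = np.mean(np.abs(deviations) <= k)
print(f"RMSE = {rmse:.2f}, bias = {bias:.2f}, EF = {ef:.3f}")
print(f"{100 * inside_fraction:.0f}% of deviations within +/-{k} -> "
      f"{'acceptable' if inside_fraction >= 0.95 else 'not acceptable'}")
```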


4.2. Application to indicators

A comparison of indicator outputs with measured data, using tests or descriptive variables as proposed in the previous section for modelling, would seem ideal. This is a first type of procedure for output validation: validation through comparison. However, two problems arise rapidly in such an attempt; they concern the type of indicator outputs and the availability of measured data. In some cases, the only possibility is to get a global evaluation of the quality of the indicator values from expert judgements. This can be called a global expert validation.

4.2.1. Validation through comparison

If the indicator results from the transformation of model outputs, its validity will logically depend on the validity of the model. This approach is applied by Pussemier (in Reus et al., 1999) for one part of his pesticide indicator. Some authors follow the approach currently used in modelling and restrict the validation of their indicators to a visual procedure based on the comparison with measured data. This is the case of Brown et al. (1995) for the GUS index (an indicator of pesticide leaching risk) and of Sharpley (1995) for the phosphorus index. The criticisms of this procedure made by Kirchner et al. (1996) also apply to those examples.

In many cases, however, indicators differ from simulation models: they may be based on an expert system, or result from mathematical equations or a system of qualitative ranks or scores. Furthermore, many indicators are not aimed at predicting an actual impact but at supplying information about a potential impact, as mentioned in the introduction. Because of all those differences with simulation models, a linear relation between indicator output and field data cannot be expected. Taking those considerations into account, Girardin et al. (1999) propose a probability test which parallels the graphical test proposed by Mitchell (1997). As shown in Fig. 3, the test consists of (1) defining a "probability area" on a figure representing the indicator versus the field measurements, and (2) testing whether this probability area includes at least 95% of the points, this acceptance level being the one recommended by Mitchell and Sheehy (1997). The definition of the probability area depends on the calculation method of the indicator and on the precision of the measurements. The shape of the probability area given in Fig. 3a may be chosen when the indicator assesses a potential impact; the probability area can take another shape, as shown in Fig. 3b, for a qualitative procedure.


Fig. 3. A "probability test" to compare the output of an indicator with observed data when no linear or non-linear relation is expected a priori (Bockstaller and Girardin, 2000). The test consists of checking whether 95% of the points are found in the probability area. (a) For a quantitative indicator; (b) for a qualitative ordinal indicator.


An example for a nitrogen indicator assessing nitrogen losses by volatilisation or leaching in spring and during winter (Bockstaller and Girardin, 2000) is given in Fig. 4. The risk is considered non-significant below 30 kg N/ha; above this value, the probability area is delimited by a decreasing boundary line with a slope equal to 1/30, corresponding to the transformation of nitrogen losses into values of the indicator. This definition of the probability area means that high values of the nitrogen indicator can only be related to a low environmental risk, which in this case is a risk for groundwater quality. A low value of the nitrogen indicator, however, may be related to a high or to a low risk of winter leaching; this last possibility may be explained by nitrogen losses that occurred before winter (through volatilisation or leaching in spring). The test, carried out with 5 years of measurements, yields a success rate of 97% of the points (74 points out of 76) lying in the probability area, if the points close to the boundary line are included in the probability area and four points with a specific problem are discarded. A code sketch of this test is given after Fig. 4.

Fig. 4. An example of a ‘‘probability test’’ for a nitrogen indicator (Bockstaller and Girardin, 2000). Measurements of soil mineral nitrogen were carried out for 5 years (1994–1998) before winter (after 15 November).
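The following sketch implements a probability test for a nitrogen indicator of this kind. It rests on stated assumptions rather than on the published calculation: a 0-10 indicator scale with 10 standing for the lowest risk, the 30 kg N/ha threshold and 1/30 slope quoted above, and invented data points.

```python
import numpy as np

def boundary(n_loss, scale_max=10.0, threshold=30.0, slope=1.0 / 30.0):
    """Upper limit of the probability area. Below the threshold the risk is
    taken as non-significant, so any indicator value is acceptable; above it,
    the ceiling decreases with the quoted slope (0-10 scale assumed)."""
    return np.where(n_loss <= threshold,
                    scale_max,
                    np.maximum(scale_max - slope * (n_loss - threshold), 0.0))

def probability_test(indicator, n_loss, level=0.95, tol=0.0):
    """Check whether at least `level` of the points fall inside the
    probability area; `tol` lets points on the boundary line count as in."""
    inside = indicator <= boundary(n_loss) + tol
    return inside.mean(), inside.mean() >= level

# Invented points: (indicator value, measured nitrogen loss in kg N/ha).
indicator = np.array([9.5, 8.0, 6.5, 3.0, 2.0, 7.5])
n_loss    = np.array([15.0, 40.0, 90.0, 200.0, 60.0, 75.0])

rate, passed = probability_test(indicator, n_loss)
print(f"{100 * rate:.0f}% of points inside the probability area -> "
      f"{'pass' if passed else 'fail'}")
```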


A second basic problem in implementing a direct comparison with measured data is the lack of available observed data, for different reasons: the impossibility of measuring them (e.g. for sustainability or biodiversity indicators), problems of cost as with pesticides, etc. In this case, we propose to use other sources of data which do not result from measurements or observations. A solution may be to relate the indicator outputs to results obtained by a simulation model that has itself been validated. Pussemier (in Reus et al., 1999) shows, for example, that there is a relation between the GUS index and one output of the PESLA model.

If indicators cannot a priori be related to observed or measured environmental data, other procedures are possible. One is to compare the outputs of the indicator with those of other indicators having the same purpose but constructed in a different way; a sketch of such a rank comparison is given at the end of this section. It is assumed that if those indicators show the "same" results (e.g. the same ranking of risk for pesticides), their reliability is reinforced. This type of work was carried out in the European Concerted Action CAPER (Reus et al., 1999). However, it can be argued that the indicators may all be wrong for the same reason, although the probability of this is low when their constructions are totally different, as was the case in CAPER. If the indicators do not give the same answer, an analysis of the construction of each one is needed, to identify the reason for the discrepancy and which one may be "wrong". In CAPER, one indicator was not linked to the others, and it was clear that this came from its simplified construction, which was not scientifically founded. When measured data for comparison with indicator outputs are lacking, another solution is found in Bockstaller and Girardin (1996), who compared the values for the effect of the preceding crop (one of the components of a crop sequence indicator) with values given by experts.

4.2.2. Global expert validation

If those methods are not possible, expert judgement is a last resort: the outputs of an indicator may be submitted to a panel of experts who evaluate their relevance. For instance, this is proposed by Lewis and Bardon (1998), although they do not give a formalised procedure for it. As part of our ongoing work on agro-ecological indicators (Bockstaller et al., 1997), we developed a first version of an energy indicator assessing the energy cost of cropping systems. We presented the results of the calculations to a group of farmers (having in this case the role of experts). They did not agree with the results on energy consumption for machines in the field, according to their own experience of machine consumption. This revealed a weakness of the indicator for a given variable and led to an improvement of the indicator (Pervanchon et al., 2002).
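The cross-indicator comparison of Section 4.2.1 can be formalised, for instance, as a rank correlation when two indicators are expected to rank the same pesticides similarly. The sketch below is one possible reading: the risk scores are invented, ties are ignored, and the CAPER project itself is not tied to this particular statistic.

```python
import numpy as np

def spearman_rho(x, y):
    """Spearman rank correlation computed from ranks with Pearson's formula
    (no tie handling; adequate for an illustration on distinct scores)."""
    rx = np.argsort(np.argsort(x)).astype(float)
    ry = np.argsort(np.argsort(y)).astype(float)
    rx -= rx.mean()
    ry -= ry.mean()
    return float(np.sum(rx * ry) / np.sqrt(np.sum(rx ** 2) * np.sum(ry ** 2)))

# Invented risk scores for five pesticides, from two differently
# constructed indicators with the same purpose.
indicator_a = np.array([2.1, 5.4, 3.3, 8.8, 1.0])
indicator_b = np.array([1.8, 4.1, 4.9, 9.5, 0.7])

print(f"rank agreement: rho = {spearman_rho(indicator_a, indicator_b):.2f}")
```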

5. "End-use validation"

This part of the validation deals with the usefulness of an indicator as a benchmark for decision making. Several reasons may explain a lack of usefulness of an indicator: a target of great relevance for the user may be missed, some data needed for the calculations may not be available, or the outputs of the indicator may not be understandable or legible. To better appreciate the quality of an indicator, Girardin et al. (1999) propose a "usefulness test" to collect the end-users' opinions. Through a survey, the users can point out the weaknesses of the indicator as a diagnosis or decision-aid tool. This procedure should also help to ensure that end-users understand what is being indicated and that they interpret the results correctly.


This is particularly important for indicators of more complex issues like sustainability. An example of this type of validation is given by Douguet et al. (1999), who performed an end-use validation to evaluate the usefulness of the set of agro-ecological indicators developed by Bockstaller et al. (1997). Charollais and Schaub (1999) tested how easy an indicator assessing field erosion risk was to handle, and the accuracy of the assessments made by the end-users. This last test revealed some problems with the reproducibility of the results between the participants in the test.

6. Discussion

Although several authors point out the difficulty of indicator validation in the sense of model validation (Reus et al., 1999; Lewis and Bardon, 1998), or do not mention this procedure at all, this does not mean that nothing can be done. The step of output validation is crucial to ensure the credibility of an indicator approach, both among scientists involved in modelling and for the future end-users of the tool (governments or other decision-makers, NGOs, farmers, etc.). Nevertheless, these end-users are often less concerned by this point, because they may trust the outcomes of the research, or because they have no choice between several tools. With the increasing number of assessment systems, this type of procedure will become more important, alongside other tests (e.g. of the feasibility and handiness of a tool), in helping a future user to choose the "best" method.

If many indicators cannot be validated in the same way as simulation models, i.e. by comparison with measured data, validation based on expert judgement and expert consensus, concerning the quality of the indicator design as well as the quality of the indicator outputs, is always possible. Such an approach should be considered a minimum requirement for indicator validation. Expert judgement thus plays a more important role than in modelling and may appear more subjective. However, this type of procedure should not be unduly depreciated relative to model validation, which is generally considered an objective procedure: our short review of validation in modelling reveals that validation is never an absolutely objective procedure, and that some model validation procedures lack rigour (Kirchner et al., 1996).

Moreover, we propose several possibilities for going further than this first level of expert judgement (Fig. 5). The availability of measured data on the one hand, and the construction of the indicator on the other, play a major role in deciding how to carry out the comparison between the indicator outputs and data. The different possibilities shown in Fig. 5 are complementary and do not exclude each other. Actually, a procedure of validation is never completely accomplished (Whisler et al., 1986), and several of the alternatives shown in Fig. 5 should be used together in order to broaden the domain of validity of an indicator. In fact, measured data may be available for some situations (e.g. a given climate, soil or cropping system) whereas for other situations only expert data are available.


If no observed data are available, we propose to compare the indicator outputs with model outputs, with the values of other indicators, or with a data set based on expert judgements. In this case, the strength of the validation appears lower than in validation through comparison with measured data. However, it should be noticed that the same situation can occur in model validation: Kirchner et al. (1996) then recommend using measurements obtained from laboratory experiments under simplified conditions, and for specific models which differ from simulation models, even the submission of model outputs to experts has been recommended (Harrison, 1991; Qureshi et al., 1999). Actually, as pointed out by Addiscott et al. (1995), validation does not allow one to verify that a model or an indicator is true, but to assess its probability of refutation and to reject it if this probability is high. The proposed procedure based on comparison makes it possible to reveal discrepancies between the indicator outputs and the corresponding data. If the data do not result from measurements, the assumption is made that if the indicator is not linked to this kind of reference data, it would not be linked to real-world data either (Kirchner et al., 1996).

Concerning the construction of indicators, qualitative indicators, or indicators aimed at revealing a potential impact, seem more problematic to "validate" by a comparison procedure in the modelling sense. For most of those indicators, climatic variables are not taken into account, for reasons of simplification or to simplify the interpretation (Halberg, 1999). Thus, a straightforward relation with measured or other data will obviously not be observed; an example can be found in Appel et al. (1994), who failed to validate their nitrogen budget by looking for a linear relation with measured data.

Fig. 5. A decision tree summarising the possibilities of output validation for indicators (see text).
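Fig. 5 itself is not reproduced here, but its decision logic, as we read it from the text, can be summarised in a few branches. The function, its argument names and the branch ordering below are our assumptions for illustration.

```python
def output_validation_routes(measured_data=False, validated_model=False,
                             comparable_indicators=False, expert_data=False):
    """Possible output-validation routes, following the decision logic
    summarised in Fig. 5 (as reconstructed from the text). The routes are
    complementary and may be combined to broaden the domain of validity."""
    routes = []
    if measured_data:
        routes.append("comparison with measured data (e.g. probability test)")
    if validated_model:
        routes.append("comparison with the outputs of a validated model")
    if comparable_indicators:
        routes.append("comparison with indicators having the same purpose")
    if expert_data:
        routes.append("comparison with a data set based on expert judgement")
    if not routes:  # last resort when no comparison data exist at all
        routes.append("global expert validation of the indicator outputs")
    return routes

print(output_validation_routes(validated_model=True, expert_data=True))
```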


However, it is possible to go further in the validation: we have proposed a procedure of comparison based on a formalised graphical test, the probability test (Girardin et al., 1999), which can be compared with a test proposed in modelling (Mitchell, 1997). The definition of the "probability area", or acceptance area, is a key issue in this test, and it should be done rigorously and a priori. Such a procedure may also be applied to qualitative indicators as well as to observed qualitative data.

Other problems due to the construction of the indicator may arise when the indicator is a composite one taking into account, for instance, different environmental components. The pesticide indicator developed by van der Werf and Zimmer (1998), for example, gives a value of global risk which results from the aggregation of three risks, for "air", "surface water" and "groundwater". If data are available for each criterion, an n-dimensional validation procedure should theoretically be carried out, which requires a high level of mathematical skill (Girardin et al., 1999). Another solution is at least to validate each one-dimensional module of the composite indicator separately. A further hindrance exists when the indicator is built with variables of different types, for example environmental data and toxicological data. Examples can be given for some pesticide indicators based on a risk ratio, i.e. the ratio between the estimated concentration of a pesticide in a given environmental compartment and its toxicity for the relevant organisms (Reus et al., 1999). A direct comparison with measured environmental data then becomes impossible; only the environmental part of the indicator, i.e. the estimated concentration in the environment, can be validated. In the example of the groundwater module of the pesticide indicator developed by van der Werf and Zimmer (1998), the validation can be restricted to the key environmental component of the module: the GUS index, for which we have mentioned two validation procedures (Pussemier in Reus et al., 1999; Brown et al., 1995). A sketch of this risk-ratio construction is given at the end of this discussion.

The last type of validation, the "end-use validation", may appear more original than the design and output validations, both of which are currently used in modelling. A procedure similar to the proposed "end-use validation" is rarely mentioned in modelling, which Cox (1996) regrets, and it is strongly recommended by Harrison (1991) for decision-aid models (Lundkvist, 1997). From the results of this work, Cox (1996) points out the necessity of including in a validation procedure a comparison of the model with other methods, such as currently used technologies or management systems which are not based on the given model. For simulation models, Kirchner et al. (1996) also propose testing them against other decision-making methods, such as expert opinions, in order to know the additional gain which can be expected from using a model rather than another method; an example is given in Meynard (1998). While the necessity of this last stage of validation is clearly stated, its implementation is not formalised: the surveys done by Cox (1996) or Douguet et al. (1999) are rather descriptive and are not based on a formalised test. In any case, such a procedure creates a tie between the developers of the indicators and the users. As mentioned by Fisher (1998), it is crucial to establish such interactive ties with the groups of users involved in conducting environmental risk assessments, in order to obtain user feedback on how to improve the indicator.
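As an illustration of the risk-ratio construction discussed above, the sketch below computes per-compartment ratios and a naive aggregate. All numbers, compartment names and the maximum-based aggregation are invented for illustration; this is not the fuzzy expert system of van der Werf and Zimmer (1998).

```python
# Invented risk-ratio indicator: estimated environmental concentration (EEC)
# of a pesticide divided by a toxicity endpoint for the relevant organisms.
# Units and values are illustrative only.
modules = {
    # compartment: (EEC, toxicity endpoint)
    "groundwater": (0.4, 2.0),
    "surface water": (1.5, 3.0),
    "air": (0.2, 4.0),
}

risk_ratios = {name: eec / tox for name, (eec, tox) in modules.items()}

# A naive aggregation (the maximum ratio) stands in for the real, more
# sophisticated aggregation used by composite indicators.
global_risk = max(risk_ratios.values())

for compartment, ratio in risk_ratios.items():
    print(f"{compartment}: risk ratio = {ratio:.2f}")
print(f"aggregated (max) risk = {global_risk:.2f}")

# Output validation can only address each module's environmental part (the
# EEC, e.g. via the GUS index for groundwater), not the aggregated ratio.
```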


7. Conclusions

With reference to models, the validation of indicators is often requested. However, many indicator developers do not deal with this issue, probably because they think that the long-term acceptance of their indicators by users is sufficient to ensure their credibility. In this paper we have proposed a three-stage approach to the validation of indicators, based on a "design validation", an "output validation", which can be achieved by comparison with measurements or other types of data, and an "end-use validation". This latter stage is particularly relevant for indicators which are built to be used as decision-aid tools, to avoid their remaining confined to the office of the developer; actually, the use of any tool should be thought about from its design onwards. In any case, this procedure needs to be formalised as a real test and not simply limited to descriptive work.

Acknowledgements

This work was sponsored by the EU (Interreg 2 programme), the Land Baden-Württemberg (Germany) and the Alsace Region (France) as part of the ITADA programme. We are grateful to A. Blatz for his work in collecting data for the probability test and the farmers' survey. Thanks also to J.M. Meynard, N. Munier-Jolin and H. van der Werf for their helpful comments on a draft of this paper, and to the colleagues from INRA involved in modelling for their stimulating questions on the validation of indicators.

References

Addiscott, T., Smith, J., Bradbury, N., 1995. Critical evaluation of models and their parameters. Journal of Environmental Quality 24, 803–807.
Appel, T., Schmücker, M., Schultheiss, U., 1994. Möglichkeiten und Grenzen schlagbezogener N-Bilanzen zur Reduzierung der Nitratbelastung in Wasserschutzgebieten. VDLUFA-Schriftenreihe 37, 137–140.
Bockstaller, C., Girardin, P., 1996. The crop sequence indicator: a tool to evaluate crop rotations in relation to the requirements of integrated arable farming systems. Aspects of Applied Biology 47, 405–408.
Bockstaller, C., Girardin, P., 2000. Agro-ecological indicators: instruments to assess sustainability in agriculture. In: Härdtlein, M., Kaltschmitt, M., Lewandowski, I., Wurl, H. (Eds.), Nachhaltigkeit in der Landwirtschaft. Landwirtschaft im Spannungsfeld zwischen Ökologie, Ökonomie und Sozialwissenschaft, Initiativen zum Umweltschutz, Vol. 15. Erich Schmidt Verlag, Stuttgart, pp. 69–83.
Bockstaller, C., Girardin, P., van der Werf, H., 1997. Use of agro-ecological indicators for the evaluation of farming systems. European Journal of Agronomy 7, 261–270.
Brown, C.D., Carter, A.D., Hollis, J.M., 1995. Soils and pesticide mobility. In: Roberts, T.R., Kearney, P.C. (Eds.), Environmental Behaviour of Agrochemicals, Progress in Pesticide Biochemistry and Toxicology, Vol. 9. John Wiley & Sons, Chichester, pp. 131–184.
Charollais, M., Schaub, D., 1999. Erosion: test d'une clé d'appréciation du risque. Revue Suisse d'Agriculture 31, 33–38.
Chevalier-Gérard, C., Denis, J.B., Meynard, J.M., 1994. Perte de rendement due aux maladies cryptogamiques sur blé tendre d'hiver. Construction et validation d'un modèle de l'effet du système de culture. Agronomie 14, 305–318.


Cox, P.G., 1996. Some issues in the design of agricultural decision support systems. Agricultural Systems 52, 355–381.
Crabtree, J.R., Brouwer, F.M., 1999. Discussion and conclusions. In: Brouwer, F.M., Crabtree, J.R. (Eds.), Environmental Indicators and Agricultural Policy. CAB International, Wallingford, pp. 279–285.
Douguet, J.M., O'Connor, M., Girardin, P., 1999. Validation Socio-économique des Indicateurs Agro-écologiques (C3ED). Université de Versailles Saint-Quentin-en-Yvelines, Guyancourt, France.
Fisher, W.S., 1998. Development and validation of ecological indicators: an ORD approach. Environmental Monitoring and Assessment 51, 23–28.
Gilmour, P., 1973. A general validation procedure for computer simulation models. Australian Computer Journal 5, 127–131.
Girardin, P., Bockstaller, C., van der Werf, H.M.G., 1999. Indicators: tools to evaluate the environmental impacts of farming systems. Journal of Sustainable Agriculture 13, 5–21.
Gras, R., Benoit, M., Deffontaines, J.P., Duru, M., Lafarge, M., Langlet, A., Osty, P.L., 1989. Le Fait Technique en Agronomie. Activité Agricole, Concepts et Méthodes d'Étude. Institut National de la Recherche Agronomique, L'Harmattan, Paris, France.
Halberg, N., 1999. Indicators of resource use and environmental impact for use in a decision aid for Danish livestock farmers. Agriculture Ecosystems and Environment 76, 17–30.
Hansen, J.W., 1996. Is agricultural sustainability a useful concept? Agricultural Systems 50, 117–143.
Harrison, S.R., 1990. Regression of a model on real-system output: an invalid test of model validity. Agricultural Systems 34, 183–190.
Harrison, S.R., 1991. Validation of agricultural expert systems. Agricultural Systems 35, 265–285.
Hertwich, E.G., Pease, W.S., Koshland, C.P., 1997. Evaluating the environmental impact of products and production processes: a comparison of six methods. The Science of the Total Environment 196, 13–29.
Hess, B., Schmid, H., Lehmann, B., 1999. Umweltindikatoren: Scharnier zwischen Ökonomie und Ökologie. Agrarforschung 6, 29–32.
Kirchner, J.W., Hooper, R.P., Kendall, C., Neal, C., Leavesley, G., 1996. Testing and validating environmental models. The Science of the Total Environment 183, 33–47.
Konikow, L.F., Bredehoeft, J.D., 1992. Groundwater models cannot be validated. Advances in Water Resources 15, 75–83.
Lewis, K.A., Bardon, K.S., 1998. A computer-based informal environmental management system for agriculture. Environmental Modelling and Software 13, 123–137.
Lundkvist, A., 1997. Weed management models. A literature review. Swedish Journal of Agricultural Research 27, 155–166.
Machet, J.M., Laurent, F., Chapot, J.Y., Dore, T., Dulout, A., 1997. Maîtrise de l'azote dans les intercultures et les jachères. In: Lemaire, G., Nicolardot, B. (Eds.), Maîtrise de l'Azote dans les Agrosystèmes. Les Colloques de l'INRA. INRA, Versailles, pp. 271–288.
Mary, B., Beaudoin, N., Justes, E., Machet, J.M., 1999. Calculation of nitrogen mineralization and leaching in fallow soil using a simple dynamic model. European Journal of Soil Science 50, 1–18.
Mayer, D.G., Butler, D.G., 1993. Statistical validation. Ecological Modelling 68, 21–32.
Meynard, J.M., 1998. La modélisation du fonctionnement de l'agro-système, base de la mise au point d'itinéraires techniques et de systèmes de culture. In: Biarnès, A. (Ed.), La Conduite du Champ Cultivé. Points de Vue d'Agronomes. ORSTOM Éditions, Paris, pp. 29–54.
Mitchell, G., May, A., McDonald, A., 1995. PICABUE: a methodological framework for the development of indicators of sustainable development. International Journal of Sustainable Development and World Ecology 2, 104–123.
Mitchell, P.L., 1997. Misuse of regression for empirical validation of models. Agricultural Systems 54, 313–326.
Mitchell, P.L., Sheehy, J.E., 1997. Comparison of prediction and observations to assess model performance: a method of empirical validation. In: Kropff, M.J., Teng, P.S., Aggarwal, P.K., Bouma, J., Bouman, B.A.M., Jones, J.W., Van Laar, H.H. (Eds.), Applications of Systems Approaches at the Field Level, Vol. 2. Kluwer Academic Publishers, Dordrecht, pp. 437–451.
Pervanchon, F., Bockstaller, C., Girardin, P., 2002. Assessment of energy use in arable farming systems by means of an agro-ecological indicator: the energy indicator. Agricultural Systems 72, 149–172.


Qureshi, M.E., Harrison, S.R., Wegener, M.K., 1999. Validation of multicriteria analysis models. Agricultural Systems 62, 105–116.
Reus, J., Leenderste, P., Bockstaller, C., Fomsgaard, I., Gutsche, V., Lewis, K., Nilsson, C., Pussemier, L., Trevisan, M., van der Werf, H., Alfarroba, F., Blümel, S., Isart, J., McGrath, D., Seppälä, T., 1999. Comparing Environmental Risk Indicators for Pesticides. Results of the European CAPER Project. Centre for Agriculture and Environment, Utrecht.
Schröder, J.J., van Asperen, P., van Dogen, G.J.M., Wijnands, F.G., 1996. Nutrient surpluses on integrated arable farms. European Journal of Agronomy 5, 181–191.
Sharpley, A.N., 1995. Identifying sites vulnerable to phosphorus loss in agricultural runoff. Journal of Environmental Quality 24, 947–951.
Smith, O.H., Petersen, G.W., Needelman, B.A., 2000. Environmental indicators of agroecosystems. Advances in Agronomy 69, 75–97.
Stockhammer, E., Hochreiter, H., Obermayr, B., Steiner, K., 1997. The Index of Sustainable Economic Welfare (ISEW) as an alternative to GDP in measuring economic welfare. The results of the Austrian (revised) ISEW calculation 1955–1992. Ecological Economics 21, 19–34.
Taylor, D.C., Mohamed, Z.A., Shamsudin, M.N., Mohayidin, M.G., Chiew, F.C., 1993. Creating a farmer sustainability index: a Malaysian case study. American Journal of Alternative Agriculture 8, 175–184.
van der Werf, H.G.M., Petit, J., 2002. Evaluation of the environmental impact of agriculture at the farm level: a comparison and analysis of 12 indicator-based methods. Agriculture Ecosystems and Environment (in press).
van der Werf, H.G.M., Zimmer, C., 1998. An indicator of pesticide environmental impact based on a fuzzy expert system. Chemosphere 36, 2225–2249.
Vos, P., Meelis, E., Ter Keurs, W.J., 2000. A framework for the design of ecological monitoring programs as a tool for environmental and nature management. Environmental Monitoring and Assessment 61, 317–344.
Wallach, D., Goffinet, B., 1989. Mean squared error of prediction as a criterion for evaluating and comparing system models. Ecological Modelling 44, 299–306.
Whisler, F.D., Acock, B., Baker, D.N., Fye, R.E., Hodges, H.F., Lambert, J.R., Lemmon, H.E., McKinion, J.M., Reddy, V.R., 1986. Crop simulation models in agronomic systems. Advances in Agronomy 40, 141–208.
Yang, J., Greenwood, D.J., Rowell, D.L., Wadsworth, G.A., Burns, I.G., 2000. Statistical methods for evaluating a crop nitrogen simulation model, N_ABLE. Agricultural Systems 64, 37–53.