The need for validation of ecological indices

Ecological Indicators 84 (2018) 546–552

Contents lists available at ScienceDirect

Ecological Indicators journal homepage: www.elsevier.com/locate/ecolind

P.E. Moriarty (a,⁎), E.E. Hodgson (a), H.E. Froehlich (b), S.M. Hennessey (c), K.N. Marshall (d), K.L. Oken (e,f), M.C. Siple (a), S. Ko (f), L.E. Koehn (a), B.D. Pierce (a), C.C. Stawitz (e)

a School of Aquatic and Fishery Sciences, University of Washington, Box 355020, Seattle, WA 98195, United States
b National Center for Ecological Analysis and Synthesis, University of California, 735 State St. Suite 300, Santa Barbara, CA 93101, United States
c Department of Integrative Biology, Oregon State University, 2701 SW Campus Way, Corvallis, OR 97331, United States
d Cascade Ecology, LLC, PO Box 25104, Seattle, WA 98165, United States
e Quantitative Ecology and Resource Management, University of Washington, Box 352182, Seattle, WA 98195, United States
f Department of Marine and Coastal Sciences, Rutgers University, 71 Dudley Rd., New Brunswick, NJ 08901, United States

ARTICLE INFO

Keywords: validation development criteria

ABSTRACT

Increased recognition of the need for ecosystem-based management has resulted in a growing body of research on the use of indicators to represent and track ecosystem status, particularly in marine environments. While multiple frameworks have been developed for selecting and evaluating indicators, certain types of indicators require additional consideration and validation. In particular, an index, which we define as an aggregation of two or more indicators, may have unique properties and behaviors that can make interpretation difficult, particularly in a management context. We assert that more rigorous validation and testing is required for indices, particularly those used to inform management decisions. To support this point we demonstrate the need for validation and then explore current development and validation processes for ecosystem indices. We also compare how other disciplines (e.g., medicine, economics) validate indices. Validating indices (and indicators) is particularly challenging because they are often developed without an explicit objective in mind. We suggest that exploring the sensitivity of an index to the assumptions made during its development be a pre-requisite to employing such an index.

1. Introduction

Over the past several decades, recognition of the importance of ecosystem approaches to natural resources management has increased (Fulton et al., 2014; Pikitch et al., 2004). In many countries, these approaches have begun to be codified into law and implemented in management strategies. The Marine Strategy Framework Directive in Europe, which provides a legislative framework for protecting the European Union’s marine waters using an ecosystem approach, is one example. Australia has also developed tools for ecosystem-based management, including ecosystem risk assessments and harvest strategy frameworks (Smith et al., 2007). Recent advances in the U.S. include Integrated Ecosystem Assessments (IEAs) (Levin et al., 2009) and fishery ecosystem plans (FEPs) (e.g., Pacific Fishery Management Council, 2013).

Management at the ecosystem scale requires metrics that track ecosystem status and provide easily interpretable information about changes in that status. Indicators are measurable properties that track changes in attributes of ecosystems that cannot be measured directly (Kerschner et al., 2001).

Indicators provide a way to track progress towards management objectives. They are also used in decision analysis to evaluate the impacts of alternative management strategies (Fulton et al., 2005), or to define thresholds and goals for management (Gsell et al., 2016). Existing metrics range from relatively simple ones, such as the abundance of a single species, to complex, multifaceted indices (plural of ‘index’) that combine ecological attributes with economic and social factors, such as the Ocean Health Index (Halpern et al., 2012). Here, we distinguish between an indicator, which involves only one data stream that is directly observable (such as survey counts of a single species), and an index (plural: indices), which is a quantitative aggregation of two or more variables (Mayer, 2008). “Indicator” and “index” are often used interchangeably in the literature, though the two are different concepts and warrant explicit definitions. Indices can be a useful management and communication tool, as they collapse multiple indicators into a single value. However, their simplicity may conceal important ecosystem complexity. Indices have been used to characterize system attributes such as physical drivers of ecosystem processes (e.g., the Pacific Decadal Oscillation (PDO),

Abbreviation: FEC, focal ecosystem component
⁎ Corresponding author. E-mail address: [email protected] (P.E. Moriarty).
http://dx.doi.org/10.1016/j.ecolind.2017.09.028
Received 13 June 2017; Received in revised form 13 September 2017; Accepted 14 September 2017
1470-160X/ © 2017 Elsevier Ltd. All rights reserved.

the ability of an individual species’ biomass to track food web status), which we define here as the indicator’s correlation to the FEC. High quality indicators have high correlation (correlation = 0.7) to the FEC, whereas low quality indicators have low correlation (correlation = 0.5; Fig. 2a,b). We define responsiveness as how abruptly the indicator can change in response to external conditions, which may or may not be related to the focal ecosystem component. Indicators with high responsiveness are more independent from external conditions (autocorrelation = 0.3), whereas indicators with poor responsiveness are more dependent on external conditions (autocorrelation = 0.7; Fig. 2c,d).

We used three performance metrics to evaluate the ability of the index to track the FEC: 1) the average Pearson’s correlation between the index and FEC (from here on referred to as ‘correlation’), 2) the proportion of simulations in which the directions of the trends (positive or negative) of the index and the FEC, calculated over the final 5 years, were in agreement (‘5 year trend’), and 3) the proportion of simulations for which the mean of the last 5 years was more than 1 standard deviation above or below the long-term mean for both the index and the FEC (‘1 SD’). Metrics (2) and (3) are coarser than correlation, and capture the status and trends of ecological indicators represented in several ecosystem status reports developed for U.S. Fishery Management Councils (e.g., Pacific, North Pacific, and Gulf of Mexico) (Pacific Fishery Management Council, 2013).

We used two scenarios to investigate the influence of quality and responsiveness on index performance. The ‘quality scenario’ included indices composed of indicators with different qualities (Kershner et al., 2011; Niemeijer and de Groot, 2008), holding responsiveness constant: 1) four low quality indicators, 2) four high quality indicators, and 3) two low quality and two high quality indicators.
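The three performance metrics defined above can be sketched in Python for a single simulated pair of series. This is an illustrative sketch only, not the authors' code: the function names are hypothetical, the trend is assumed to be the sign of a least-squares slope, and the ‘1 SD’ metric is read here as agreement between the index and the FEC on whether the recent mean lies more than 1 SD above, below, or within the long-term mean. Averaging these per-simulation results over many runs would give the proportions described in the text.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def slope(series):
    """Least-squares slope of a series against its time step (assumed trend measure)."""
    n = len(series)
    t_mean = (n - 1) / 2
    y_mean = sum(series) / n
    num = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(series))
    den = sum((t - t_mean) ** 2 for t in range(n))
    return num / den


def trend_agrees(index, fec, window=5):
    """True if index and FEC trend in the same direction over the last `window` years."""
    return (slope(index[-window:]) > 0) == (slope(fec[-window:]) > 0)


def recent_status(series, window=5):
    """+1 or -1 if the recent mean is more than 1 SD above or below the long-term mean, else 0."""
    n = len(series)
    mean = sum(series) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in series) / (n - 1))
    recent = sum(series[-window:]) / window
    if recent > mean + sd:
        return 1
    if recent < mean - sd:
        return -1
    return 0


def status_agrees(index, fec, window=5):
    """True if index and FEC give the same '1 SD' status (assumed reading of metric 3)."""
    return recent_status(index, window) == recent_status(fec, window)
```

Separating `recent_status` from `status_agrees` makes it easy to inspect which status each series reports before comparing them.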
Similarly, in the ‘responsiveness scenario’, indices had combinations of indicators with different levels of responsiveness (Kershner et al., 2011; Niemeijer and de Groot, 2008), holding quality constant: 1) four poorly responsive indicators, 2) four highly responsive indicators, and 3) two poorly and two highly responsive indicators.

We also investigated how different weighting schemes for combining indicators affect index performance, as the weighting scheme influences index values, and potentially conclusions (Halpern and Fujita, 2013). We summed two high quality indicators and two low quality indicators, while holding responsiveness constant. We then weighted the indicators using three schemes. The first scheme used the default approach of additive weighting (Halpern et al., 2009), where the index value at each time step is an average of component indicators. Second, we used expert judgment weighting (Halpern et al., 2012). Since all time series were randomly generated, we optimistically assumed experts knew the relative quality (correlation to true system state) of the four indicators and weighted them accordingly. Third, we assigned random weights to the four indicators and calculated their weighted sum to simulate failed expert weighting. This may occur if expert judgment was used, but experts judged relative indicator quality poorly.

We found that the coarser metrics (the 5-year trend and standard deviation) were robust to changes in indicator properties (Figs. 3, 4). When using coarser evaluation metrics, indicator quality had no effect and responsiveness had only a small effect on index performance (Figs. 3, 4). These metrics are very robust to changes in the indicators and will only detect changes in the corresponding index if there is a large fluctuation. This may be beneficial if managers are interested in wide fluctuations in the focal ecosystem component, such as regime shifts, or may be cause for concern if smaller fluctuations are important.
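The three weighting schemes described above can be illustrated with a short Python sketch. It is a sketch under stated assumptions rather than the authors' implementation: the function names are hypothetical, weights are normalized to sum to 1, and "expert judgment" is assumed to mean weights proportional to each indicator's known quality, which the text does not specify.

```python
import random


def weighted_index(indicators, weights):
    """Weighted sum of indicators in each year; weights are normalized to sum to 1."""
    total = sum(weights)
    norm = [w / total for w in weights]
    return [sum(w * v for w, v in zip(norm, values)) for values in zip(*indicators)]


def equal_weights(n):
    """Additive weighting: the index is a simple average of its indicators."""
    return [1.0 / n] * n


def expert_weights(qualities):
    """Assumption: experts know each indicator's quality and weight in proportion to it."""
    return list(qualities)


def random_weights(n, rng=None):
    """Random weights, standing in for expert judgment that misjudges quality."""
    rng = rng or random.Random()
    return [rng.random() for _ in range(n)]


# Toy data: four indicators observed over two years.
toy_indicators = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
averaged = weighted_index(toy_indicators, equal_weights(4))
by_quality = weighted_index(toy_indicators, expert_weights([0.5, 0.5, 0.7, 0.7]))
by_chance = weighted_index(toy_indicators, random_weights(4, random.Random(1)))
```

Keeping the combination step (`weighted_index`) separate from the choice of weights means the same simulated indicators can be re-scored under each scheme, which is the kind of sensitivity check the simulation is meant to demonstrate.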
When using the fine-scale metric, ‘correlation’, properties of the indicators affected index performance (Fig. 3a). Summing four low quality indicators resulted in low index quality, with a mean correlation to the true system state of 0.21 (SD = 0.19), and summing four high quality indicators resulted in high index quality, with a mean correlation of 0.93 (SD = 0.02) (Fig. 3a). When an index included two low quality indicators and two high

Mantua and Hare, 2002), inherent properties of an ecosystem (e.g., Shannon biodiversity index), or ecosystem services (e.g., Ocean Health Index, Halpern et al., 2012). An index is easy for managers to understand, but aggregating multiple time series may dampen important features of the attribute the index is designed to represent (Figge, 2004).

We suggest that the aggregate nature of indices mandates additional validation. Validation is the process of establishing that an indicator or index meets performance criteria chosen for the specific circumstances (Rykiel, 1996). This is necessary to gauge how they track the desired attribute of system status over time. A body of literature provides guidance for developing and validating indicators (Samhouri et al., 2012; Samhouri et al., 2009), and indicator development is a key component of IEAs (Levin et al., 2009). However, these guidelines do not differentiate between indicators and indices, though indices require an extra level of scrutiny. Each indicator composing an index must be validated, but additionally, properties of indicators included in an index and the way that indicators are combined likely influence the ecological index value and its temporal dynamics. For example, the variance and amplitude of individual indicators will affect how they act when combined into a single index.

Here, we illustrate the challenge of validating ecological indices and suggest improvements to the practice of validation. We begin with a simulated example to show how indicator properties influence index performance. We then evaluate several widely used ecosystem indices against selection criteria developed for indicators with a focus on validation criteria. Finally, we describe how other disciplines validate indices and compare their methods to those used in marine ecology. We close with suggestions to avoid potential pitfalls in the validation and use of indices to inform management decisions.

2. Why is index validation necessary?
Combining multiple indicators into an index requires the developer to make decisions that may ultimately affect the behavior, performance, and reliability of the index. To explore the consequences of these decisions and provide a rationale for the importance of validation, we use a simulation study. The simulation is intended to be a generic example of how indicator characteristics and methods of combining indicators influence the resulting index. To make this example concrete, we present the simulation in the context of a hypothetical scenario. Consider a management body that wants to track food web status of a marine system (the attribute) and chooses to monitor benthic fish community composition (the index). The management body chooses the status of benthic fish species because they are an important component of marine food webs (Kershner et al., 2011; Levin and Schwing, 2011). Biomass estimates for four species, based on trawl survey samples, are chosen as indicators to represent species status. The index, benthic fish community composition, then aggregates the survey data on all four species: the biomass estimates are averaged to create the index.

For this demonstration, we generated a 50-year time series of a focal ecosystem component (FEC) (e.g., ecosystem status; details of the simulations are in Supplement 1). The time series of the FEC was used to generate four simulated time series that were correlated with the FEC (e.g., individual species’ biomasses, Fig. 1). We modified the strength of correlation and the variance of indicator time series and explored how those factors and the method of combining the indicators influenced the ability of the index to represent the FEC. We considered how the indicator quality and responsiveness affected the performance of the resulting index.
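The setup described above can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' simulation, whose details are in Supplement 1. It assumes the FEC follows a first-order autoregressive process and that each indicator is the standardized FEC plus independent white noise, scaled so that its expected correlation with the FEC equals a chosen `quality`. The function names and the autoregressive coefficient `phi` are hypothetical, and indicator responsiveness (autocorrelation) is not modeled here.

```python
import math
import random


def simulate_fec(n_years=50, phi=0.5, rng=None):
    """Assumed AR(1) process standing in for the focal ecosystem component."""
    rng = rng or random.Random()
    series = [rng.gauss(0.0, 1.0)]
    for _ in range(n_years - 1):
        series.append(phi * series[-1] + rng.gauss(0.0, 1.0))
    return series


def standardize(series):
    """Rescale a series to mean 0 and (population) standard deviation 1."""
    mean = sum(series) / len(series)
    sd = math.sqrt(sum((x - mean) ** 2 for x in series) / len(series))
    return [(x - mean) / sd for x in series]


def simulate_indicator(fec, quality, rng=None):
    """Indicator = quality * standardized FEC + noise; expected correlation is `quality`."""
    rng = rng or random.Random()
    z = standardize(fec)
    noise_sd = math.sqrt(1.0 - quality ** 2)
    return [quality * x + rng.gauss(0.0, noise_sd) for x in z]


def average_index(indicators):
    """Combine indicators into an index by taking their mean in each year."""
    return [sum(values) / len(values) for values in zip(*indicators)]


rng = random.Random(1)
fec = simulate_fec(rng=rng)
# Two low quality (0.5) and two high quality (0.7) indicators, as in the text.
indicators = [simulate_indicator(fec, q, rng) for q in (0.5, 0.5, 0.7, 0.7)]
index = average_index(indicators)
```

Seeding the generator makes a single run repeatable; repeating the run over many seeds would give the distribution of index behavior rather than one realization.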
Quality and responsiveness are two aspects of indicators commonly considered during indicator development and selection (Kennedy and Jacoby, 1999; Kershner et al., 2011; Niemeijer and de Groot, 2008). We define quality as the ability of the indicator to track the FEC of interest (e.g.,

Fig. 1. Definitions of Indicator and Index (plural: indices). To simulate indices, four individual indicators were generated from a focal ecosystem component and then combined into an index. The quality and responsiveness of the indicators were varied, as was the weighting scheme used to combine them into an index.

Fig. 2. Example Time Series. Examples of low quality and high quality indicators (a, b) and poorly responsive and highly responsive indicators (c, d), along with the focal ecosystem component they are intended to represent.

Fig. 3. Indicator Quality Simulations. Four indicators of different qualities (correlation to true system state) were combined into a single index. The index performance was determined using 3 different metrics (a, b, c).

Fig. 4. Indicator Responsiveness Simulations. Four indicators of different responsiveness (autocorrelation) were combined into a single index. The index performance was determined using 3 different metrics (a, b, c).

perception of the FEC. In cases where a threshold point is used for management (e.g., detecting regime shifts; deYoung et al., 2008), it is important to ensure that the index’s properties match its desired use. For instance, if a highly responsive index is being developed, evaluation using a 5-year trend may not be a good choice. In other cases, when the goal is to provide ecological context (e.g., Pacific Fishery Management Council, 2013), coarser evaluation metrics may be more useful for capturing long-term trends in the system. While our simulation approach is simplified and does not assess an array of other index components (e.g., multiplicative weightings, fractions), it highlights ways of exploring index properties and potential trade-offs.

quality indicators, it still performed well, with a correlation of 0.77 (SD = 0.08) (Fig. 3a). Across all three cases, the index had a higher correlation than any of the individual indicator time series, as adding more indicators always increases the information contained within the index. However, correlation between the index and FEC was less affected by differences in responsiveness (Fig. 4a). The coarser metrics were less likely to reveal changes in index performance as indicator properties changed. Indicator quality had no effect on index performance when using the 5-year trend and standard deviation metrics. Responsiveness of indicators affected index performance for only the standard deviation metric, where the index’s ability to track the FEC decreased with lower responsiveness (Fig. 4c).

The weighting scheme used to combine indicators into an index had less of an effect than expected. It only affected the correlation of the index to the FEC (Fig. 5). In particular, expert judgment weighting had the highest correlation (Pearson’s correlation = 0.91) due to experts weighting indicators based on their quality. Though ideal, the application of such a weighting scheme in management relies on experts accurately knowing which indicators have high quality. When experts were incorrect in their judgments about weighting, the correlation was lowest (r² = 0.73 ± 0.34), highlighting the importance of good information when determining weighting schemes. For the coarser metrics, the weighting scheme of the indicators did not affect the performance of the index.

Developing an index requires deciding which indicators to include, how to combine them, and how to detect changes in the index. As we see from the simulation example, the choice of components and the level of knowledge of the system had less influence than expected. A thorough validation process to analyze how decisions made in index development affect index behavior will aid in the interpretation of that index.
In particular, those developing an index need to keep in mind that the metrics for detecting ecosystem change can strongly affect the

3. Are ecosystem indices validated?

Having demonstrated validation as an important component of index development, we now consider whether existing ecosystem indices are validated. We compare several well-known indices against criteria commonly stated in the literature, focusing more heavily on validation criteria. Specifically, we selected eight commonly used indices and compared them against 15 criteria developed for evaluating ecosystem indicators. We collected commonly used indicator selection criteria and attributes from the existing literature on this topic (Kennedy and Jacoby, 1999; Kershner et al., 2011; Niemeijer and de Groot, 2008; Salas et al., 2006) and divided them into four themes: theory, data, use, and validation, to investigate which themes typically had their criteria met (Table 1).

Theory criteria relate to the basis for creating an index. Criteria included here correspond to whether the index has a sound theoretical basis, is operationally feasible, and responds predictably to the underlying driver(s). Data are the basic characteristics of the data sets used in the index (e.g., are the data numerical? Do they have broad spatial coverage?). Use criteria are characteristics that ensure ease of use and interpretation in management decisions and monitoring. Finally, validation criteria describe the process of choosing how to create the index and determining whether it accurately measures the FEC. We particularly focused on validation as it underlies the utility of an index in a management context.

We selected eight well-known indices from the fields of marine ecology and fisheries that range from very simple indices that are a ratio of two variables to complex indices that involve a range of indicators with different units.
Here we include: Pacific Decadal Oscillation (PDO) (Mantua and Hare, 2002), Marine Trophic Index/Mean Trophic Level (Pauly and Watson, 2005), Proportion of Overfished Stocks (Worm et al., 2009), Percent Predatory Fish (also known as the Piscivore/Zooplanktivore Ratio) (Caddy and Garibaldi, 2000), Marine Biotic Index (Borja et al., 2000), Pelagic/Demersal Ratio (Caddy et al., 1998; de Leiva Moreno et al., 2000), Shannon Biodiversity Index (Shannon, 1948), and Ocean Health Index (Halpern et al., 2012).

Fig. 5. Indicator Weighting Simulations. Four indicators, two of low quality and two high quality, were combined into one index by weighting them in different ways. The index performance was compared using three different metrics.

Table 1. The fifteen criteria against which we scored existing indices. Criteria from existing literature were collected, then condensed to remove duplicates and combine comparable ideas. Sources: 1 (Kershner et al., 2011), 2 (Salas et al., 2006), 3 (Kennedy and Jacoby, 1999), 4 (Niemeijer and de Groot, 2008).

Category | Indicator evaluation criterion | Sources | Definition
Theory | Theoretically sound | 1, 2, 3, 4 | Scientific, peer-reviewed findings should demonstrate that the index acts as a reliable surrogate for ecosystem key attribute(s).
Theory | Operationally simple | 1, 2 | The methods for sampling, measuring, processing, and analyzing the indicator data should be feasible.
Theory | Linkable to scientifically-defined reference points and progress targets | 1, 2, 3, 4 | It should be possible to link index values to quantitative or qualitative reference points and target reference points, which imply positive progress toward ecosystem goals.
Validation | Responds predictably and is sufficiently sensitive to changes in specific ecosystem key attributes | 1 | Index should respond unambiguously to variation in the ecosystem key attribute(s) it is intended to measure, in a theoretically- or empirically-expected direction.
Validation | Responds predictably and is sufficiently sensitive to changes in specific management actions or pressures | 1, 2, 3 | Management actions or other human-induced pressures should cause detectable changes in the indicators, in a theoretically- or empirically-expected direction, and it should be possible to distinguish the effects of other factors on the response.
Validation | Spatial and temporal variation understood | 1, 2 | Diel, seasonal, annual, and decadal variability in the index should be understood, as should spatial heterogeneity or patchiness in indicator values.
Validation | Portability | 4 | (Intrinsic) Be repeatable and reproducible in different contexts.
Validation | Integrates the effects of a number of inputs without confounding the identification of the source | 2, 3 | Biological indices should integrate the effects of multiple inputs without confounding identification of their source. This is often referred to as specificity.
Validation | High signal-to-noise ratio | 1 | It should be possible to estimate measurement and process uncertainty associated with each index and to ensure that variability in index values does not prevent detection of significant changes.
Validation | Statistical properties | 4 | Statistical properties that allow unambiguous interpretation.
Data | Concrete | 1 | Component indicators should be directly measurable.
Data | Numerical | 1 | Quantitative measurements are preferred over qualitative, categorical measurements, which in turn are preferred over expert opinions and professional judgments.
Use | Anticipatory or leading indicator | 1, 4 | Signals changes in ecosystem attributes before they occur, and ideally with sufficient lead time to allow for a management response.
Use | Thresholds | 4 | Allows for setting thresholds that can be used to determine when to take action.
Use | Understood by the public and policy makers | 1, 2 | Index should be simple to interpret, easy to communicate, and public understanding should be consistent with technical definitions.

convenience of a single number that can be used to represent the state of a system (general health of a person or a country’s economic status), and they also have clear limitations if such representation is not numerically accurate and/or theoretically sound.

We investigated index validation in these other disciplines to determine if this validation deficiency was unique to ecological indices. To do this, we reviewed literature for developing and assessing indices in disciplines outside of the natural sciences (economics, medicine, engineering, and public health) to determine if and how validation methods are used. Validation, as above, is defined as quantification of how well the calculated index represents the objective attribute (Rykiel, 1996). We found that some fields had established rigorous methods for developing and testing indices, whereas other fields experience challenges similar to those in marine ecology.

Fields with clear objectives and large quantities of independent data with low measurement error have strong validation practices. For instance, medicine and engineering both regularly use indices to achieve specific objectives, which can be characterized by a binary measure of success or failure. In medicine, the response may be whether a patient lives or dies, and in engineering it may be whether a structure holds or fails. The validation of medical indices is improved by a wealth of data; there are often enough independent samples to separate the data into training and testing sets for cross-validation (Aujesky and Fine, 2008; Bombardier et al., 1992; Fine et al., 1997; Lee et al., 1999). Furthermore, it is possible to quantitatively measure multiple characteristics about each study subject. For example, Lee et al. (1999) developed an

Indices were scored against criteria in each of the four categories (see Supplement 2 for details). We also documented which categories of criteria had clear tendencies to be met, or not. To compare the indices against the criteria and attributes, we cross-validated our expert opinions to evaluate their scoring (Supplement 2). That is, for each criterion or attribute, a subset of the authors of this study used their expertise to decide whether the index met the criterion, or whether it could not be determined. Then, a different subset of authors evaluated the first group’s scoring of the criteria for each index. If there was a disagreement in scoring, the two groups discussed the scores until they reached consensus.

At least half of the eight indices met seven out of eight theory, data, and use criteria (Fig. 6). However, validation criteria were not upheld as frequently, with indices meeting fewer of the criteria. Though we only assessed a limited set of well-known marine ecology and fisheries indices in this case study, the consistency of our findings suggests that lack of validation is a common issue.

4. How do other disciplines validate indices?

The use of indices is not unique to natural resource management and ecosystem assessment. Indices are frequently used in other fields, including medicine (e.g., body mass index (Pietrobelli et al., 1998)), economics (e.g., gross domestic product (Fleurbaey, 2009)), and engineering (e.g., crash potential index (Cafiso et al., 2007; Cunto and Saccomanno, 2008)). As in marine ecology, these fields benefit from the

Fig. 6. Index Validation Criteria. The number of indices that meet, don’t meet, or are unclear about meeting, each of 15 criteria, as described in Table 1.

criteria. This is particularly concerning as an index’s tracking ability, if employed without validation, could lead to unhelpful or even harmful management decisions. If data are insufficient for validation, or the goal is to track an intangible ecosystem component, then knowledge of the index’s behavior can provide at least some information on how to best apply and interpret it.

A thorough understanding of the behavior and sensitivity of an index will increase its effectiveness in management scenarios. Typically, an index is used either by looking at the long-term trend (Cury et al., 2005; Gascuel et al., 2005) or by determining a threshold at which point the system is considered to be in a poor state (Jennings, 2005; Link, 2005). Here, we represented these scenarios by two metrics: trend over the last 5 years and whether the mean of the last 5 years of the index was within 1 SD of the long-term mean. To use these types of metrics in management, it is necessary to understand the properties of the indicator time series to know why an index moves outside of the long-term mean or exhibits a trend over several years. This knowledge will increase the index’s potential uses, while helping to minimize errors.

Unfortunately, validation of ecological indices is challenging. When individual indicators are used, it is relatively straightforward to evaluate their behavior, as they are single variables, and the main consideration is how changes in the metric are detected (Large et al., 2013). However, since indices are composed of multiple indicators combined in various ways, the effect of each indicator on the index, the effect of the combination method, and the effect of the metric chosen for evaluating status and trends in the index must all be understood. Currently, the necessary practices to address this are often neglected, despite previous calls for more rigorous development and validation of indices (Cury and Christensen, 2005; Jennings, 2005; Link, 2005).
Simulation analysis is one validation tool that can reveal important aspects of an index’s behavior. Our demonstration revealed that the choices made during development may not influence the index in the way expected. In our example, neither quality nor responsiveness influenced outcomes for our coarser metrics of 5-year trend and 1 standard deviation. Additionally, the coarse metrics only showed agreement between the index and focal ecosystem component about 70-80 percent of the time in our simple scenarios across all indicator qualities and responsiveness that we explored, meaning that 20-30 percent of the time the status or trend implied was incorrect. This kind of information about the behavior of an index would help managers make more informed decisions about how to use indices in management and how much to rely on them. Ecosystem models (e.g., Atlantis, EwE, OSMOSE) are another valuable tool, allowing us to validate indices under a variety of ecosystem conditions. Regardless of the simulation method, a solid understanding of how and why an index value changes will increase its utility in a management scenario.

Compared to other fields, one particular challenge in marine ecology is that indices are often used to track system attributes that do not have a clear objective. Within ecosystem-based management there is a suite of trade-offs that occur depending upon the context and user’s values (Kittinger et al., 2014). Others have also called out the need for specific objectives related to tools, processes, and implementation for ecosystem management (Fogarty, 2013; Lenfest Ocean Program, 2016; Leslie et al., 2015). Discussion amongst index developers needs to include how to form the index and concretely define the property of interest, so that the index is relevant and validation is possible. Indices are a powerful, convenient, and increasingly important tool for communicating the status and trends of ecosystems.
The effective use of an index to inform management decisions relies on understanding its temporal behavior with respect to management objectives. This necessitates validation. While this is not a trivial task, improved validation will help increase the effectiveness of indices to inform management decisions.

index to assess the risk of a cardiac event for patients undergoing major, non-cardiac surgery. Their access to a large dataset that contained information for 4,315 patients enabled them to divide their data into a derivation set and a validation set for cross-validation. Cross-validation allowed the authors to develop the index using one dataset and confirm its predictive ability with another. Additionally, the index had a very specific goal (quantifying the risk of a cardiac event), and each data point (e.g., patient) had a binary outcome. Finally, because the data were divided into training and validation sets, the index’s performance on this subset could be compared to the performance of other indices used to determine the risk of a cardiac event. This provided data to support the use of the new index over alternatives.

Similar to medical professionals, engineers are able to validate the reliability of indices by testing individual system components, using models of reality (Cunto and Saccomanno, 2008), or by applying indices to existing databases of known accident occurrences (Cafiso et al., 2007). In each case, the objective is clear, meaning there is a straightforward definition of success or failure, and measurement of performance is precise. However, unlike in medicine, fewer data are available on outcomes of catastrophic events that engineers work to prevent (e.g., bridge collapses, car crashes for specific makes and models). Instead, engineers validate indices on downsized physical models or numerical simulations (e.g., Cunto and Saccomanno, 2008). Comparably, in ecology, we can use complex ecosystem models for validation of indices that are, by design, less detailed than such models (Fulton et al., 2014).

In contrast, in social science disciplines, as in ecology, objectives are not as well-defined, and index performance is harder to measure, as there is not a single, binary outcome.
For example, the gross domestic product (GDP) has been used for decades to measure economic welfare, despite protests by economists about its faults (Abramovitz, 1959). Since ‘economic welfare’ can be defined in many ways (Fleurbaey, 2009), several alternatives to GDP have arisen, but no single index has been clearly shown to be the best. Some of these alternatives attempt to quantify human ‘well-being’ (Cummins et al., 2003). These well-being indices were originally intended to be objective, though it has more recently been recognized that measures of well-being need to be subjective in order to reflect how people feel about their lives (Cummins et al., 2003). These subjective measures are comparable to ecological indices in their complexity and often ambiguous goal(s). Well-being measures are validated by measuring correlations between indicator components and the overall index (Cummins et al., 2003). While this can help reveal post hoc how indicators affect the index value, it does not address the more in-depth criteria needed to understand and accurately apply an index to future datasets.

When the objective of interest (“economic welfare”, “human well-being”, “ecosystem health”) is poorly defined, developing and validating an index is very challenging and will lead to disagreements about the validity of the index. As with the marine ecological indices assessed above, these vaguely defined response variables make it difficult to apply to data from the ecological or social sciences the more rigorous numerical validation practices used in medicine and engineering.

5. Conclusions

Validation, or determining how an index behaves and whether the index meets its performance goals, is challenging for ecological indices. The aggregation of multiple indicators into one index, while increasing the information it contains, can obscure how changes in the underlying true system state will affect the index.
Without large datasets that can be used to test index accuracy, erroneous assumptions may be made. However, the widespread and increasing use of these indices makes it necessary to continue to improve and to implement validation practices. A number of well-known ecological indices failed to meet validation


Acknowledgments

This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors. We thank Timothy Essington, Isaac Kaplan, and two anonymous reviewers for helpful comments on the manuscript.

Appendix A. Supplementary data

Supplementary data associated with this article can be found, in the online version, at http://dx.doi.org/10.1016/j.ecolind.2017.09.028.

References

Abramovitz, M., 1959. The Allocation of Economic Resources. Stanford University Press.
Aujesky, D., Fine, M.J., 2008. The pneumonia severity index: a decade after the initial derivation and validation. Clinical infectious diseases 47, S133–S139.
Bombardier, C., Gladman, D.D., Urowitz, M.B., Caron, D., Chang, C.H., Austin, A., Bell, A., Bloch, D.A., Corey, P.N., Decker, J.L., et al., 1992. Derivation of the SLEDAI. A disease activity index for lupus patients. Arthritis & Rheumatism 35, 630–640.
Borja, A., Franco, J., Pérez, V., 2000. A Marine Biotic Index to Establish the Ecological Quality of Soft-Bottom Benthos Within European Estuarine and Coastal Environments. Marine Pollution Bulletin 40, 1100–1114.
Caddy, J., Carocci, F., Coppola, S., 1998. Have peak fishery production levels been passed in continental shelf area?: Some perspectives arising from historical trends in production per shelf area. Journal of Northwest Atlantic fishery science 23, 191–220.
Caddy, J.F., Garibaldi, L., 2000. Apparent changes in the trophic composition of world marine harvests: the perspective from the FAO capture database. Ocean & Coastal Management 43, 615–655.
Cafiso, S., Cava, G., Montella, A., 2007. Safety Index for Evaluation of Two-Lane Rural Highways. Transportation Research Record: Journal of the Transportation Research Board 2019, 136–145.
Cummins, R.A., Eckersley, R., Pallant, J., van Vugt, J., Misajon, R., 2003. Developing a National Index of Subjective Wellbeing: The Australian Unity Wellbeing Index. Social Indicators Research 64, 159–190.
Cunto, F., Saccomanno, F.F., 2008. Calibration and validation of simulated vehicle safety performance at signalized intersections. Accident Analysis & Prevention 40, 1171–1179.
Cury, P.M., Christensen, V., 2005. Quantitative ecosystem indicators for fisheries management. ICES Journal of Marine Science: Journal du Conseil 62, 307–310.
Cury, P.M., Shannon, L.J., Roux, J.-P., Daskalov, G.M., Jarre, A., Moloney, C.L., Pauly, D., 2005. Trophodynamic indicators for an ecosystem approach to fisheries. ICES Journal of Marine Science: Journal du Conseil 62, 430–442.
de Leiva Moreno, J.I., Agostini, V.N., Caddy, J.F., Carocci, F., 2000. Is the pelagic-demersal ratio from fishery landings a useful proxy for nutrient availability?: A preliminary data exploration for the semi-enclosed seas around Europe. ICES Journal of Marine Science: Journal du Conseil 57, 1091–1102.
deYoung, B., Barange, M., Beaugrand, G., Harris, R., Perry, R.I., Scheffer, M., Werner, F., 2008. Regime shifts in marine ecosystems: detection, prediction and management. Trends Ecol. Evol. 23, 402–409.
Figge, F., 2004. Bio-folio: applying portfolio theory to biodiversity. Biodiversity & Conservation 13, 827–849.
Fine, M.J., Auble, T.E., Yealy, D.M., Hanusa, B.H., Weissfeld, L.A., Singer, D.E., Coley, C.M., Marrie, T.J., Kapoor, W.N., 1997. A prediction rule to identify low-risk patients with community-acquired pneumonia. New England journal of medicine 336, 243–250.
Fleurbaey, M., 2009. Beyond GDP: The Quest for a Measure of Social Welfare. Journal of Economic Literature 47, 1029–1075.
Fogarty, M.J., 2013. The art of ecosystem-based fishery management. Can J Fish Aquat Sci 71, 479–490.
Fulton, E.A., Smith, A.D.M., Punt, A.E., 2005. Which ecological indicators can robustly detect effects of fishing? ICES Journal of Marine Science: Journal du Conseil 62, 540–551.
Fulton, E.A., Smith, A.D.M., Smith, D.C., Johnson, P., 2014. An Integrated Approach Is Needed for Ecosystem Based Fisheries Management: Insights from Ecosystem-Level Management Strategy Evaluation. PLoS One 9, e84242.
Gascuel, D., Bozec, Y.-M., Chassot, E., Colomb, A., Laurans, M., 2005. The trophic spectrum: theory and application as an ecosystem indicator. ICES Journal of Marine Science: Journal du Conseil 62, 443–452.
Gsell, A.S., Scharfenberger, U., Özkundakci, D., Walters, A., Hansson, L.-A., Janssen, A.B.G., Nõges, P., Reid, P.C., Schindler, D.E., Van Donk, E., Dakos, V., Adrian, R., 2016. Evaluating early-warning indicators of critical transitions in natural aquatic ecosystems. Proceedings of the National Academy of Sciences 113, E8089–E8095.
Halpern, B.S., Fujita, R., 2013. Assumptions, challenges, and future directions in cumulative impact analysis. Ecosphere 4, 1–11.
Halpern, B.S., Kappel, C.V., Selkoe, K.A., Micheli, F., Ebert, C.M., Kontgis, C., Crain, C.M., Martone, R.G., Shearer, C., Teck, S.J., 2009. Mapping cumulative human impacts to California Current marine ecosystems. Conservation Letters 2, 138–148.
Halpern, B.S., Longo, C., Hardy, D., McLeod, K.L., Samhouri, J.F., Katona, S.K., Kleisner, K., Lester, S.E., O’Leary, J., Ranelletti, M., Rosenberg, A.A., Scarborough, C., Selig, E.R., Best, B.D., Brumbaugh, D.R., Chapin, F.S., Crowder, L.B., Daly, K.L., Doney, S.C., Elfes, C., Fogarty, M.J., Gaines, S.D., Jacobsen, K.I., Karrer, L.B., Leslie, H.M., Neeley, E., Pauly, D., Polasky, S., Ris, B., St Martin, K., Stone, G.S., Sumaila, U.R., Zeller, D., 2012. An index to assess the health and benefits of the global ocean. Nature 488, 615–620.
Jennings, S., 2005. Indicators to support an ecosystem approach to fisheries. Fish and Fisheries 6, 212–232.
Kennedy, A.D., Jacoby, C.A., 1999. Biological Indicators of Marine Environmental Health: Meiofauna – A Neglected Benthic Component? Environmental Monitoring and Assessment 54, 47–68.
Kershner, J., Samhouri, J.F., James, C.A., Levin, P.S., 2011. Selecting Indicator Portfolios for Marine Species and Food Webs: A Puget Sound Case Study. PLoS One 6, e25248.
Kittinger, J.N., Koehn, J.Z., Le Cornu, E., Ban, N.C., Gopnik, M., Armsby, M., Brooks, C., Carr, M.H., Cinner, J.E., Cravens, A., D'Iorio, M., Erickson, A., Finkbeiner, E.M., Foley, M.M., Fujita, R., Gelcich, S., Martin, K.S., Prahler, E., Reineman, D.R., Shackeroff, J., White, C., Caldwell, M.R., Crowder, L.B., 2014. A practical approach for putting people in ecosystem-based ocean planning. Frontiers in Ecology and the Environment 12, 448–456.
Large, S.I., Fay, G., Friedland, K.D., Link, J.S., 2013. Defining trends and thresholds in responses of ecological indicators to fishing and environmental pressures. ICES J Mar Sci 70, 755–767.
Lee, T.H., Marcantonio, E.R., Mangione, C.M., Thomas, E.J., Polanczyk, C.A., Cook, E.F., Sugarbaker, D.J., Donaldson, M.C., Poss, R., Ho, K.K., 1999. Derivation and prospective validation of a simple index for prediction of cardiac risk of major noncardiac surgery. Circulation 100, 1043–1049.
Lenfest Ocean Program, 2016. Building Effective Fishery Ecosystem Plans. 63 p.
Leslie, H., Sievanen, L., Crawford, T.G., Gruby, R., Villanueva-Aznar, H.C., Campbell, L.M., 2015. Learning from Ecosystem-Based Management in Practice. Coast Manage 43, 471–497.
Levin, P.S., Fogarty, M.J., Murawski, S.A., Fluharty, D., 2009. Integrated Ecosystem Assessments: Developing the Scientific Basis for Ecosystem-Based Management of the Ocean. PLoS Biology 7, 23–28.
Levin, P.S., Schwing, F.B., 2011. Technical background for an integrated ecosystem assessment of the California Current: Groundfish, salmon, green sturgeon, and ecosystem health.
Link, J.S., 2005. Translating ecosystem indicators into decision criteria. ICES Journal of Marine Science: Journal du Conseil 62, 569–576.
Mantua, N.J., Hare, S.R., 2002. The Pacific Decadal Oscillation. J Oceanogr 58, 35–44.
Mayer, A.L., 2008. Strengths and weaknesses of common sustainability indices for multidimensional systems. Environment International 34, 277–291.
Niemeijer, D., de Groot, R.S., 2008. A conceptual framework for selecting environmental indicator sets. Ecological Indicators 8, 14–25.
Pacific Fishery Management Council, 2013. Pacific Coast Fishery Ecosystem Plan for the U.S. Portion of the California Current Large Marine Ecosystem. Pacific Fishery Management Council.
Pauly, D., Watson, R., 2005. Background and interpretation of the ‘Marine Trophic Index’ as a measure of biodiversity. Philosophical Transactions of the Royal Society of London B: Biological Sciences 360, 415–423.
Pietrobelli, A., Faith, M.S., Allison, D.B., Gallagher, D., Chiumello, G., Heymsfield, S.B., 1998. Body mass index as a measure of adiposity among children and adolescents: A validation study. The Journal of Pediatrics 132, 204–210.
Pikitch, E.K., Santora, C., Babcock, E.A., Bakun, A., Bonfil, R., Conover, D.O., Dayton, P., Doukakis, P., Fluharty, D., Heneman, B., Houde, E.D., Link, J., Livingston, P.A., Mangel, M., McAllister, M.K., Pope, J., Sainsbury, K.J., 2004. Ecosystem-based fishery management. Science 305, 346–347.
Rykiel, E.J., 1996. Testing ecological models: the meaning of validation. Ecological Modelling 90, 229–244.
Salas, F., Marcos, C., Neto, J.M., Patrício, J., Pérez-Ruzafa, A., Marques, J.C., 2006. User-friendly guide for using benthic ecological indicators in coastal and marine quality assessment. Ocean & Coastal Management 49, 308–331.
Samhouri, J.F., Lester, S.E., Selig, E.R., Halpern, B.S., Fogarty, M.J., Longo, C., McLeod, K.L., 2012. Sea sick?: Setting targets to assess ocean health and ecosystem services. Ecosphere 3, 1–18.
Samhouri, J.F., Levin, P.S., Harvey, C.J., 2009. Quantitative Evaluation of Marine Ecosystem Indicator Performance Using Food Web Models. Ecosystems 12, 1283–1298.
Shannon, C.E., 1948. A Mathematical Theory of Communication. Bell System Technical Journal 27, 379–423.
Smith, A.D.M., Fulton, E.J., Hobday, A.J., Smith, D.C., Shoulder, P., 2007. Scientific tools to support the practical implementation of ecosystem-based fisheries management. ICES J Mar Sci 64, 633–639.
Worm, B., Hilborn, R., Baum, J.K., Branch, T.A., Collie, J.S., Costello, C., Fogarty, M.J., Fulton, E.A., Hutchings, J.A., Jennings, S., Jensen, O.P., Lotze, H.K., Mace, P.M., McClanahan, T.R., Minto, C., Palumbi, S.R., Parma, A.M., Ricard, D., Rosenberg, A.A., Watson, R., Zeller, D., 2009. Rebuilding Global Fisheries. Science 325, 578–585.
