Towards optimization of chemical testing under REACH: A Bayesian network approach to Integrated Testing Strategies


Regulatory Toxicology and Pharmacology 57 (2010) 157–167

Contents lists available at ScienceDirect

Regulatory Toxicology and Pharmacology journal homepage: www.elsevier.com/locate/yrtph

Joanna Jaworska a,*, Silke Gabbert b, Tom Aldenberg c

a Modeling & Simulation Biological Systems, Procter and Gamble, Temselaan 100, 1853 Strombeek-Bever, Brussels, Belgium
b Department of Social Sciences, Environmental Economics and Natural Resources Group, Wageningen University, P.O. Box 8130, 6700 EW Wageningen, The Netherlands
c RIVM, Antonie van Leeuwenhoeklaan 9, P.O. Box 1, NL-3720BA Bilthoven, The Netherlands

Article info

Article history:
Received 15 May 2009
Available online 13 February 2010

Keywords:
Integrated Testing Strategies
Conceptual requirements for ITS development
Bayesian inference
Bayesian networks
Quantitative Weight-of-Evidence

Abstract

Integrated Testing Strategies (ITSs) are considered tools for guiding resource-efficient decision-making on chemical hazard and risk management. Originating in the mid-nineties from research initiatives on minimizing animal use in toxicity testing, ITS development still lacks a methodologically consistent framework for incorporating all relevant information, for updating and reducing uncertainty across testing stages, and for handling conditionally dependent evidence. This paper presents a conceptual and methodological proposal for improving ITS development. We discuss methodological shortcomings of current ITS approaches, and we identify conceptual requirements for ITS development and optimization. First, ITS development should be based on probabilistic methods in order to quantify and update various uncertainties across testing stages. Second, reasoning should reflect a set of logic rules for consistently combining probabilities of related events. Third, inference should be hypothesis-driven and should reflect causal relationships in order to coherently guide decision-making across testing stages. To meet these requirements, we propose an information-theoretic approach to ITS development, the "ITS inference framework", which can be made operational by using Bayesian networks. As an illustration, we examine a simple two-test battery for assessing rodent carcinogenicity. Finally, we demonstrate how running the Bayesian network reveals a quantitative measure of Weight-of-Evidence. © 2010 Elsevier Inc. All rights reserved.

1. Introduction

The new European chemicals regulation REACH (CEC, 2006) is considered a paradigm shift regarding risk assessment and risk management of chemicals (Hansen and Blainey, 2006; Führ and Bizer, 2007; Van Leeuwen et al., 2007). The REACH regulation has introduced a single regulatory framework with common standards for all substances produced, or marketed, in amounts above one metric ton per year (Schoerling, 2003; Petry et al., 2006; Hengstler et al., 2006), and has turned the burden of proof for chemical hazard and risk assessment from regulatory agencies to chemical industry (Führ and Bizer, 2007). In addition, REACH explicitly supports developing and using testing schemes providing an alternative to gold standard in vivo testing (CEC, 2006, Title II, Article 13). These conceptual changes, the challenging information needs defined in REACH, and the large number of substances that is expected to be tested have stimulated the discussion on how to evaluate existing data and generate new information in order to meet informational requirements in an optimal way.

* Corresponding author. Fax: +32 2 5683098. E-mail address: [email protected] (J. Jaworska).
0273-2300/$ - see front matter © 2010 Elsevier Inc. All rights reserved. doi:10.1016/j.yrtph.2010.02.003

Integrated Testing Strategies (ITSs) are assumed to speed up hazard and risk assessment of chemicals, while reducing testing costs and animal use (Grindon et al., 2006; Van Leeuwen et al., 2007; Ahlers et al., 2008; Lilienblum et al., 2008). Hence, ITSs are considered tools for guiding resource-efficient decision-making on chemicals’ hazard and risk management. To meet this goal, Blaauboer and Andersen (2007) have pointed to the need for a new toxicity testing and risk analysis paradigm. Looking at the current practice of ITS development, as proposed in the REACH guidance documents of the European Chemicals Agency (ECHA), the selection of tests at a particular stage of an ITS is largely determined by the annual tonnage volume of chemicals (see, for example, the ITSs proposed in ECHA, 2008a,b,c). Furthermore, inference in ITSs is linked to a descriptive Weight-of-Evidence (WoE) approach, which is a step-wise procedure for integration of data and for assessing the equivalence and adequacy of different types of information. This approach aims at ensuring optimal integration of information from different sources and various aspects of uncertainty (Ahlers et al., 2008). Though both current ITS and WoE approaches are undoubtedly useful tools for systemizing chemical hazard and risk assessment, they lack a consistent methodological basis for making inferences based on existing information, for coupling existing information with new data from different sources, and for analyzing test results within and across testing stages in order to meet target information requirements.

The overall aim of this paper, therefore, is to make a conceptual and methodological proposal for improving ITS development. More specifically, three objectives can be spelled out. The first objective of our paper is to describe the current practice of ITS design and to discuss the methodological shortcomings in more detail. This allows, secondly, for identifying conceptual requirements for ITS development and optimization. The third objective of our paper is, then, to suggest an information-theoretic approach to ITS development, the "ITS inference framework", that can be made operational by using Bayesian networks.

The paper is organized as follows. Section 2 briefly surveys the literature on ITS design, addressing also the conceptual characteristics and shortcomings of current ITS development. In Section 3, we define conceptual requirements for ITS development and introduce a probabilistic ITS inference framework that meets these requirements. We propose Bayesian networks as an operational tool for empirically applying the ITS inference framework. To illustrate our conceptual, information-theoretic framework, we present a simplified example of a two-test Bayesian network for rodent carcinogenicity assessment in Section 4. Here we also discuss how the ITS inference framework is related to Weight-of-Evidence. Section 5 concludes and discusses implications for further research.

2. Integrated Testing Strategies

2.1. Definition of ITSs and drivers for ITS development

Conceptually, an ITS is a hierarchical testing scheme consisting of a set of decision nodes, allowing for taking different routes for information gathering and inference for decision-making about a chemical’s hazard or risk (Hengstler et al., 2006; Van Leeuwen et al., 2007). The overall aim of ITSs is to efficiently exploit and integrate existing information with new data, which can be generated by multiple testing and non-testing methods (Bassan and Worth, 2008). In contrast to classic tiered testing schemes, exploiting information from various sources is considered to increase the safety assessment’s quality, to maximize information gains, and to reduce testing costs, testing time and animal use (JRC, 2005; Hoffmann and Hartung, 2006; Vermeire et al., 2007). In addition, the information gained from a preceding test is considered to guide the choice of the next test, or battery of tests, at any particular stage in the testing sequence (Combes, 2007; Gallegos Saliner and Worth, 2007).

ITSs were developed and used for regulatory purposes before REACH existed. They emerged in the mid-nineties from several research initiatives examining how to combine different testing methods in order to reduce, refine and replace animal testing of chemicals (Blaauboer et al., 1999; Hakkinen and Green, 2002; Salem and Katz, 2003; Vermeire et al., 2007). However, REACH has led to a considerable expansion of systematic research on ITS development for a broad range of toxicological endpoints. This has mainly been driven by (1) increased testing needs, (2) increased reliance on alternative methods, and (3) the acknowledgment that information requirements cannot be appropriately satisfied on the basis of a single (alternative) test alone, even if it has high predictive capacity (Hoffmann and Hartung, 2006; Van Leeuwen et al., 2007).
Recently, several ITSs for environmental and human health endpoints have been proposed in the literature. Many of them were suggested by regulatory bodies such as ECHA, in order to provide guidance to chemical industry on how to meet information requirements defined in REACH. Table 1 presents a selection of ITSs, indicating the information outcome envisaged by the respective studies.

Table 1
Selected examples of ITSs proposed in the recent literature.

| Source | Endpoint | Information outcome (a) |
| --- | --- | --- |
| Gerner et al. (2000b) | Eye irritation/corrosion | C&L |
| Höfer et al. (2004) | Eye irritation/corrosion | C&L |
| ECB (2005) | Reproductive toxicity | C&L |
| ECETOC (2005) | Bioconcentration | BCF |
| ECETOC (2005) | Acute toxicity for fish | Toxicity value for fish |
| Gerner et al. (2005) | Eye irritation/corrosion | C&L |
| Walker et al. (2005) | Skin irritation | C&L |
| Grindon et al. (2008) | Eye irritation/corrosion; Skin irritation/corrosion; Skin sensitization; Acute systemic toxicity and toxicokinetics; Chronic and repeated dose toxicity; Developmental and reproductive toxicity; Endocrine disruption; Mutagenicity; Carcinogenicity; Acute fish toxicity; Chronic fish toxicity | C&L and/or RA |
| Van Leeuwen et al. (2007) | Skin sensitization | Classification (based on ± judgment) |
| ECHA (2008a,b,c) | Physicochemical properties; Skin irritation/corrosion; Eye irritation/corrosion; Respiratory irritation; Skin and respiratory sensitization; Acute toxicity; Repeated dose toxicity; Reproductive and developmental toxicity; Mutagenicity; Carcinogenicity; Aquatic pelagic toxicity; Toxicity to sediment organisms; Toxicity to STP microorganisms; Degradation/biodegradation; Aquatic bioaccumulation; Avian toxicity; Effects on terrestrial organisms | Hazard assessment; C&L and/or RA; C&L and/or RA; C&L; C&L and/or RA; C&L; C&L and/or RA; C&L; C&L and/or RA; C&L; RA (CSA); RA; C&L and/or PBT/vPvB; C&L and/or PBT; C&L and/or PBT/vPvB; C&L and/or PBT/vPvB and/or RA (CSA) |

(a) C&L – classification and labeling, RA – risk assessment, BCF – bioconcentration factor, CSA – chemical safety assessment, PBT – persistent, bioaccumulative, toxic, vPvB – very persistent, very bioaccumulative.

The list of ITSs shown in the table is not exhaustive. However, it illustrates the multitude of human health and environmental endpoints for which ITSs have been proposed, and the different information targets envisaged.

2.2. Current practice of ITS development

While the structure and implementation of the ITSs summarized in Table 1 differ considerably across endpoints, we can identify a number of common characteristics concerning information output and the mode of data generation. Most of the reviewed ITSs focus on hazard identification for further use in classification and labeling. In a few instances, hazard information is combined with exposure information in the very last step, as in a classic risk assessment.


The ITSs in Table 1 are mainly organized as decision flow charts with multiple branches. Each ITS starts with gathering and evaluating existing information, with the aim of identifying substances for which no further testing is required and of evaluating whether existing information from different sources may be sufficient for regulatory decision-making, for example for classification and labeling of a substance. If existing information is considered sufficient, a decision on a chemical’s hazard or risk can be made. If the information collected in the first stage is insufficient, the ITS proceeds with a sequence of testing steps (stages) consisting of a combination of six testing and non-testing methods: (1) chemical categories and read-across; (2) structure–activity relationship (SAR) and quantitative structure–activity relationship (QSAR) techniques; (3) thresholds of toxicological concern (TTCs); (4) exposure-based waiving; (5) in vitro methods; and (6) optimized in vivo methods (Hengstler et al., 2006; Van Leeuwen et al., 2007). The overall testing sequence of an ITS aims at reducing the information target uncertainty in order to maximize predictive accuracy across stages and to obtain the best possible information base for decision-making. Usually, initial gathering of information exploits in silico methods where appropriate (Eriksson et al., 2003; Bassan and Worth, 2008). Next, the non-testing methods are followed by a system of experimental tests of increasing biological relevance and decreasing uncertainty. These possible routes are currently defined in advance, reflecting consensus among experts with regard to a given toxicological endpoint.

The way in which different decision nodes are connected in an ITS determines how decision-making is guided across stages. In existing ITSs, we observe two different types of decision nodes. First, the guidance can be purely knowledge-based.
In this case, ITSs are organized as structured flow charts with nodes explaining, in a narrative way, the thought process experts use to assess a given endpoint, how to systematically gather data, as well as what other information to consider to aid decision-making on the hazard or risk information target. (See, for example, the ITSs proposed in ECB, 2005; Walker et al., 2005; Van Leeuwen et al., 2007; Grindon et al., 2008; the ITSs for skin sensitization, acute toxicity, and repeated dose toxicity, mutagenicity and carcinogenicity as suggested in ECHA, 2008a, and the ITS for avian toxicity as suggested in ECHA, 2008c.) In knowledge-based ITSs decision-making is based on expert judgment. A typical knowledge-based detail of an ITS is shown in Fig. 1. Second, decision-making across stages in some ITSs is guided by a combination of knowledge-based and rule-based decision nodes. In the context of reasoning on the properties of chemicals, rule-based nodes typically denote quantitative "if-then" decisions, or nodes where a "yes/no" question is associated with a quantitative threshold (for example a pH value or a LC50 threshold value) (Bassan and Worth, 2008). Some of the ITSs shown in Table 1 contain quantitative rule-based decision nodes (e.g. Gerner et al., 2000a,b; Höfer et al., 2004; Gerner et al., 2005, the ITSs for skin corrosion/irritation in ECHA (2008a), and ITSs for environmental endpoints in ECHA, 2008b,c), but none of the existing ITSs is strictly rule-based.

Fig. 1. Example of a knowledge-based decision node as in the ITS for carcinogenicity Fig. R7.7-2, p.425 (ECHA, 2008a).

2.3. Shortcomings of existing approaches

The current way of structuring the testing process for chemical hazard and risk assessment in ITSs, as outlined in the previous section, suffers from several shortcomings. First, defining possible routes for information gathering and data generation in advance does not provide guidance on how to conduct inference regarding the information target considering all relevant information. This makes it very challenging to maintain and to verify consistency and completeness of reasoning across different branches of an ITS. This problem is particularly exacerbated for complex endpoints that require taking into account multiple lines of evidence. Second, we observe that in current ITSs inference evaluation is limited to local analysis, focused on a single branch of the decision scheme. In many cases, this way of handling inference leads to incorrect overall conclusions, because the overall target uncertainty is not a local, but a global phenomenon, as it strongly depends on the entire available evidence. In addition, due to the use of local inference methods, current ITSs cannot handle conditionally dependent evidence, which can occur both within a single stage and across stages. Third, the uncertainty about the information target is expressed qualitatively and in a static way, with a predefined sequence of testing steps, but not quantified and formally updated across testing steps. Hence, the sequencing of steps in an ITS is not explicitly driven by the aim of reducing uncertainty and, consequently, it cannot be ensured that predictive accuracy is maximized across testing steps in the ITS. Ensuring this requires a dynamic approach for updating uncertainty information after each step.
The fourth shortcoming, resulting from the previous two, is that exposure information is disconnected from hazard information, which makes balancing uncertainty trade-offs between hazard and exposure assessments more difficult. Finally, the current structure of the ITS does not allow for identifying an optimal testing sequence in which information gains are balanced against costs and animal use.

3. A Bayesian probabilistic inference framework for ITS development

3.1. Conceptual requirements of the ITS inference framework

To overcome the shortcomings of current ITS development it was recommended at the workshop on Integrated Approaches to Testing Strategies organized by the OECD (2008) that ITS development should be structured, consistent, transparent and hypothesis-driven. Building upon these recommendations and our discussion of the shortcomings of current ITSs above, we propose to use a formal framework for ITS development, which is called the "ITS inference framework" in the following. The ITS inference framework should allow (1) to integrate existing data, (2) to revise inference when supplemented by new, or additional, evidence, (3) to handle conflicting evidence in a consistent way, and (4) to reason in a consistent manner, both in the case of incomplete data, and for a variety of datasets, based on the entire evidence. That is, the framework should be reusable for making consistent inference based on different sets of evidence for different chemicals. Finally, (5), the framework should allow for evidence maximization, i.e. it should guide testing in such a way that the information content can be updated stepwise and that the choice of subsequent tests is guided by the highest expected uncertainty reduction.
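Requirement (5) can be made concrete with a small sketch. The code below assumes that uncertainty about a binary target is measured by Shannon entropy and that each candidate test is summarized by a sensitivity and a specificity; both choices, the test names and all numbers are illustrative assumptions, not values taken from the paper.

```python
import math

def entropy(p):
    """Shannon entropy (in bits) of a binary variable with Pr(yes) = p."""
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def expected_entropy_reduction(prior, sens, spec):
    """Expected drop in entropy about the target after one binary test."""
    # Probability of each test outcome, averaged over the target state.
    p_pos = prior * sens + (1 - prior) * (1 - spec)
    p_neg = 1 - p_pos
    # Posterior belief in the target after each outcome (Bayes' Theorem).
    post_pos = prior * sens / p_pos
    post_neg = prior * (1 - sens) / p_neg
    expected_posterior = p_pos * entropy(post_pos) + p_neg * entropy(post_neg)
    return entropy(prior) - expected_posterior

# Hypothetical candidate tests: name -> (sensitivity, specificity).
candidate_tests = {"test_A": (0.60, 0.75), "test_B": (0.80, 0.40)}
prior = 0.5  # hypothetical prior belief that the chemical has the property

gains = {name: expected_entropy_reduction(prior, sens, spec)
         for name, (sens, spec) in candidate_tests.items()}
best_test = max(gains, key=gains.get)
print(gains, best_test)
```

Under these assumed numbers the test with the larger expected entropy reduction would be run first. A fuller treatment would also weigh the gain against cost and animal use, as the text requires.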


To meet these objectives, three major conceptual requirements of the ITS inference framework can be identified. First, the inference framework should be probabilistic. Since chemical risk assessment is inherently uncertain due to imperfect understanding of underlying toxicological mechanisms, variations among and between individual species used for testing, and due to measurement and observational errors, we need a systematic approach to quantify this uncertainty in order to reduce it. Probabilistic methods in data analysis provide a formal system for quantifying uncertainty from heterogeneous input sources, relationships between them, and overall target uncertainty. Note that ‘‘probability” in this context is defined as a measure of someone’s degree of belief, rather than a frequency of some kind of event (see Heckermann, 2008 for a discussion of probability concepts). Second, reasoning within the inference framework should be rational. Rational reasoning requires that every piece of evidence will be consistently assessed and coherently used in combination with other pieces of evidence. This requires an isotropic knowledge representation to support consistent reasoning both in causal as well as diagnostic (prediction) directions (Kjaerulff and Madsen, 2008). Probabilistic reasoning achieves rationality as belief assignments that reflect a set of logic rules for consistently combining probabilities of related events as formulated in the axioms of probability theory (Doyle, 1992; Horvitz et al., 1988). While knowledge- and rule-based systems, as manifested in current ITS schemes, typically model the expert’s way of reasoning, probabilistic systems describe dependencies between pieces of evidence (towards an information target) within the modeled domain. This ensures that learning, i.e. updating information over time based on available evidence, can be coherently addressed (Kjaerulff and Madsen, 2008). 
The use of probabilistic approaches for data integration in complex inference situations, such as heterogeneous and conflicting data, is not new: such approaches have been used in human and veterinary medical diagnoses (Horvitz et al., 1988; Gardner, 2002), and in bioinformatics (Troyanskaya et al., 2003), for addressing problems of a similar nature.

Third, the inference should be hypothesis-driven. The hypothesis-driven workflow proceeds from gathering existing data, through computational analysis, towards quantifying uncertainty of the information target and, then, back to potentially new experimentation. Such hypothesis-driven inference is based on a cyclic workflow that basically consists of four steps: (1) Hypothesis formulation based on existing data; (2) Performing new experiments and collecting evidence; (3) Joint analysis of the hypothesis and collected evidence; (4) Refinement of the hypothesis, based on the joint analysis. A cyclic hypothesis-driven workflow cannot be realized in current ITS schemes where inference is fixed in one direction. Hypothesis-driven inference implies that causal relationships are preserved. This goes beyond Hansson and Rudén (2007), who have outlined a decision-theoretic framework for the development of tiered testing systems, but who omit the idea of hypothesis-driven inference.

3.2. Bayesian inference as analytic method for the ITS inference framework

We propose to use Bayesian probabilistic inference as the methodological foundation for data analysis in the ITS inference framework.
Bayesian inference meets the desired conceptual requirements put forward in Section 3.1: it is probabilistic and, therefore, reflects the rationality concept of probability theory; it uses an isotropic knowledge representation in the form of a joint probability function; it is well suited to realize hypothesis-driven inference as it updates a prior hypothesis as new evidence arrives; and it allows for generating a refined hypothesis in terms of a posterior probability distribution. As evidence accumulates, for example across testing stages of an ITS, the degree of belief in a hypothesis will generally change.

In more general terms, Bayesian inference starts from a numerical estimate of the degree of belief (i.e. probability) in a hypothesis before any evidence is available. After evidence has been observed, it allows calculating an updated numerical estimate of the degree of belief in the hypothesis. This process may be repeated iteratively when additional evidence is obtained. Bayes’ Theorem adjusts probabilities given new evidence in the following way (e.g. Pearl, 1988, p. 17):

\[
\Pr(H \mid E) = \frac{\Pr(H)\,\Pr(E \mid H)}{\Pr(E)}, \tag{1}
\]

where H represents a specific hypothesis; Pr(H) is called the prior probability of H that was inferred before new evidence, E, became available; Pr(E|H) is called the conditional probability of seeing the evidence E if the hypothesis H happens to be true. It is also called a likelihood function, when it is considered as a function of H for fixed E; Pr(E) is called the marginal probability of the evidence E: the probability of witnessing the new evidence E under all possible hypotheses. Pr(H|E) is called the posterior probability of H given E. Bayes’ Theorem is the starting point for inference problems using probability theory. Applying Bayes’ Theorem allows for transforming prior beliefs into "post data" beliefs, which is considered a fundamental characteristic of learning processes (Zellner, 1998). Clearly, being a probabilistic, data-driven approach, the outcome of Bayesian inference depends on the quality and relevance of the input information used (see also Sections 4.1 and 4.2). Hence, applying Bayesian inference does not eliminate the need for a thorough expert evaluation of available information. Furthermore, expert knowledge remains vital for carefully assessing each step of the analysis.

Bayesian inference and evidential hypothesis testing adopt a cyclic workflow as described in Section 3.1. It starts with formulating the prior knowledge into a hypothesis (step 1), which, in the context of an ITS, is the hypothesis that a chemical has a certain target property. The hypothesis is based on existing information, prior to conducting any experiments. The hypothesis is expressed as a so-called prior distribution that summarizes all prior knowledge about the target information. Informative priors can be constructed from various sources, for example existing knowledge published in the literature, information gained from experts, or information generated earlier from testing and non-testing methods such as (Q)SARs and read-across methods.
What type of information to use for generating an informative prior depends on the chemical and on the available information. The overall aim is to incorporate the various pieces of prior evidence in a systematic and coherent way. Next, the current information, as summarized in the informative prior, is revised (steps 2 and 3, see Section 3.1) using Eq. (1) based on evidence collected in the testing part of the framework. In Bayesian terms, the prior information is updated using information from in vitro, and if needed, in vivo, experimental evidence to generate a posterior distribution about the assessed property of a chemical. Based on the new evidence, the final step is to generate an updated hypothesis (step 4; see Pearl, 1988; Berger, 1993). This cyclic, hypothesis-driven workflow of Bayesian reasoning supports the conceptual workflow of Evidence Based Toxicology (EBT), as outlined in Guzelian et al. (2005), Hoffmann and Hartung (2006) and Valerio (2008).

In the field of chemical testing, the use of Bayesian inference methods is not a novel approach. Bayesian approaches have been used particularly for making predictions of a test battery’s accuracy in cases where tests are considered jointly (Rosenkranz et al., 1984; Kim and Margolin, 1994; Tarasov et al., 2003). To the best of our knowledge, no attempt has been made so far to apply Bayesian inference to ITS development.
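As a minimal sketch of how Eq. (1) turns a prior into a posterior for a binary hypothesis, the following code updates the belief that a chemical has the target property after one test result. The function name and all numbers are hypothetical and serve only to illustrate the arithmetic.

```python
def bayes_update(prior, p_e_given_h, p_e_given_not_h):
    """Posterior Pr(H|E) from Eq. (1) for a binary hypothesis H."""
    # Marginal probability of the evidence, Pr(E), summed over H and not-H.
    p_e = prior * p_e_given_h + (1.0 - prior) * p_e_given_not_h
    return prior * p_e_given_h / p_e

# Hypothetical inputs, chosen only for illustration.
prior = 0.30            # Pr(H): prior belief, e.g. from (Q)SAR or read-across
sensitivity = 0.80      # Pr(test positive | H true)
false_positive = 0.25   # Pr(test positive | H false)

# Step 4 of the workflow: the refined hypothesis after each possible result.
posterior_if_positive = bayes_update(prior, sensitivity, false_positive)
posterior_if_negative = bayes_update(prior, 1 - sensitivity, 1 - false_positive)
print(posterior_if_positive, posterior_if_negative)
```

A positive result raises the belief above the prior and a negative result lowers it. Reusing the posterior as the prior for a further test is only valid if the next likelihood is conditioned on the evidence already observed, which is the dependence issue the paper stresses.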


3.3. Bayesian networks as an operational tool of the ITS inference framework

The conceptual ITS inference framework needs to be implemented through a computational tool in order to become operational. We propose to use Bayesian networks, which are defined as graphical models of probabilistic relationships between variables of interest in a decision-making context (Pearl, 1988; Heckerman et al., 1995; Jensen and Nielsen, 2007). A Bayesian network is a formal tool that is based on the axioms of probability theory (Doyle, 1992). Its objective is to combine complex, and possibly conflicting, information by probabilistic reasoning and to generate the final result as a posterior probability distribution, which, in our case, is the probability that a chemical compound has a particular property. Bayesian networks can be regarded as decision-support frameworks because of their ability to explain causal relationships and to serve as prediction models (Castillo et al., 1997; Pearl, 2000; Kjaerulff and Madsen, 2008). Bayesian networks emerged in the mid-1980s, and have been shown to be remarkably effective for encoding uncertain knowledge and using available and new data to support decisions (Charniak, 1991; Holmes and Jain, 2008). As a result, they have been widely used for more than two decades in many different fields such as, for example, causal learning (Steyvers et al., 2003; Gopnik et al., 2004; see Pourret et al., 2008 for an overview of applications). Bayesian networks have been frequently applied in medical diagnosis and clinical decision-making (see, for example, Spiegelhalter et al., 1989; Heckerman et al., 1995; Wang et al., 1999; Dendukuri and Joseph, 2001; Georgiadis et al., 2003; Branscum et al., 2005; Engel et al., 2006; Oniśko, 2008; Bhattacharjee et al., 2008; Gevaert et al., 2006). As discussed in Guzelian et al. (2005) and in Hoffmann and Hartung (2006), medical diagnosis and chemical testing show several conceptual similarities.
This suggests the applicability of Bayesian network approaches to support the development of chemical testing strategies.

Bayesian networks offer several merits for structuring safety-related decisions. First, they graphically illustrate causal relationships between the target variable, and the observed, as well as the unobserved, variables. The graph representation, showing explicit dependencies between variables, is a well-developed tool in knowledge acquisition, verification, and explanation processes (Pearl and Paz, 1987; Pearl, 1988, 2000). Second, chemical safety evaluation frequently relies on incomplete data sets that will differ from chemical to chemical. Bayesian networks can handle incomplete and different data sets in order to reach consistent conclusions. Third, an important feature that can be addressed by Bayesian networks is quantification of conditional dependence between tests. As discussed in Section 2.3, current ITS schemes cannot address this issue. Ignoring conditional dependence between tests, however, often leads to an erroneous reduction of apparent uncertainty when presented with information from multiple tests (Gardner et al., 2000). Conditional dependence may also arise when both tests essentially measure the same effect of a compound. In extreme cases of conditional dependence, the situation is much like doing the same test twice: no additional information is obtained beyond what the first result already revealed. Last, the knowledge representation in a Bayesian network is fully isotropic and allows reasoning based on the entire evidence.
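The extreme case of conditional dependence described above can be checked numerically. The sketch below uses hypothetical values for the prior, sensitivity and specificity, and compares two positive results treated as conditionally independent with the case in which the second test simply repeats the first.

```python
def posterior(prior, like_h, like_not_h):
    """Pr(H | evidence) for a binary hypothesis, following Bayes' Theorem."""
    return prior * like_h / (prior * like_h + (1 - prior) * like_not_h)

# Hypothetical values, for illustration only.
prior, sens, spec = 0.5, 0.8, 0.7

# Belief after a single positive result.
one_positive = posterior(prior, sens, 1 - spec)

# Two positive results, wrongly treated as conditionally independent:
# the likelihoods are multiplied as if each result were new information.
naive_two_positives = posterior(prior, sens * sens, (1 - spec) ** 2)

# Extreme dependence: the second test always repeats the first result, so
# Pr(T2 positive | T1 positive, H) = 1 whether or not H holds.
dependent_two_positives = posterior(prior, sens * 1.0, (1 - spec) * 1.0)

print(one_positive, naive_two_positives, dependent_two_positives)
```

With these assumed numbers the naive calculation reports a higher posterior than the evidence supports, while the fully dependent case leaves the belief exactly where the first result put it.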

4. An illustrative example: Bayesian network of 2 in vitro genotoxicity tests for rodent carcinogenicity potential assessment

As stated in the introduction, the main objective of our paper is to make a conceptual proposal for improving ITS development. To better illustrate the useful features of Bayesian networks for chemical safety evaluation and ITS development, we provide in this section a simple example of a Bayesian network for two in vitro tests, used for supporting decision-making on the in vivo activity of a chemical. We use a stylized example which is not meant to represent a complete ITS. Rather, we attempt to exemplify the basic characteristics of Bayesian inference, and we illustrate how the inference framework can support decision-making on chemical hazard and risk in situations as presented in Fig. 1.

Let us assume that two in vitro genotoxicity tests, the Ames test and Mouse Lymphoma Assay (MLA), are potentially available to assess the carcinogenic potential of a chemical (Fig. 2). The selection of tests, to be used as information inputs to a Bayesian network, accounts for the possible mode(s), or mechanism(s), of action. This illustrates the important role of expert knowledge for constructing a Bayesian network.

[Figure: nodes Carcinogenic, T1 (Ames) and T2 (MLA).]
Fig. 2. Bayesian network of two in vitro genotoxicity tests to reason about the unobservable state of in vivo rodent carcinogenicity. The arrows (arcs) model probabilistic causality quantified by conditional probabilities.

In Fig. 2, the direction of the arcs symbolizes probabilistic causality. An arrow from the yes/no (binary) variable ‘Carcinogenic’ to each test indicates that carcinogenicity increases the probability that the test result is positive. Probabilistic causality is expressed as a conditional probability: Pr(T|C), denoting the probability of a test result, given (‘|’) carcinogenicity status. However, we are interested in the probability of carcinogenicity given the results obtained from the in vitro test(s). Using Bayes’ Theorem as shown in Eq. (1) in Section 3.2, this can be expressed as the inverse conditional probability Pr(C|T), which can be calculated by transforming the prior belief (probability) of a chemical being carcinogenic and the conditional probability Pr(T|C) into the posterior probability Pr(C|T). Hence, applying Bayes’ Theorem allows inferential reasoning to go in the opposite direction of the probabilistic causality arrows.
This illustrates the isotropic nature of Bayesian networks. The arc connecting the Ames test with the MLA test (Fig. 2) shows that we consider conditional dependence between the tests, which is reasonable, since neither test is a perfectly accurate gold standard test (Dendukuri and Joseph, 2001) and given the evidence provided by Tennant et al. (1987) and by Kim and Margolin (1994). When running the Bayesian network, it is important to account for conditional dependence between tests even if we expect it to be only weak (for example when the underlying biological mechanisms observed by the tests are different). Ignoring conditional dependence may lead to an overestimation of the posterior probability of the target variable (in our case “Carcinogenicity potential”). When considering conditional dependence between the two tests, there are two options: to either draw an arrow from the Ames test to the MLA test, or the reverse. Note that the two options are entirely equivalent, provided the conditional probabilities are specified in accordance with the chosen direction. In the present configuration of the network (Fig. 2), we have chosen the result of the Ames test to depend on Carcinogenicity only, while the result of the MLA test depends on both Carcinogenicity and the result of the Ames test. This is just a matter of convenience, since the Ames test will more often be available than the MLA test and will usually be conducted first. As a result, the tests will become conditionally correlated and inferential reasoning can go either way.
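The effect of ignoring conditional dependence can be sketched numerically. The following Python fragment is an illustration only, not part of the HUGIN analysis: it uses the cell counts of Table 2 (Kirkland et al., 2005) and a fifty-fifty prior, and the function names are hypothetical. It compares the posterior for two positive tests computed from the joint likelihood with the one obtained by (wrongly) multiplying the marginal likelihoods of each test.

```python
# Counts of joint (Ames, MLA) results from Table 2 (Kirkland et al., 2005).
# Keys are (ames, mla) with 1 = positive, 0 = negative.
CARC = {(1, 1): 123, (1, 0): 12, (0, 1): 74, (0, 0): 34}     # 243 carcinogens
NONCARC = {(1, 1): 20, (1, 0): 7, (0, 1): 44, (0, 0): 34}    # 105 non-carcinogens


def posterior_dependent(ames, mla, prior=0.5):
    """Pr(C+ | Ames, MLA) using the joint likelihood of both test results."""
    like_pos = CARC[(ames, mla)] / sum(CARC.values())
    like_neg = NONCARC[(ames, mla)] / sum(NONCARC.values())
    num = prior * like_pos
    return num / (num + (1 - prior) * like_neg)


def marginal(counts, test, result):
    """Marginal probability of one test result within a class of chemicals."""
    hits = sum(n for key, n in counts.items() if key[test] == result)
    return hits / sum(counts.values())


def posterior_naive(ames, mla, prior=0.5):
    """Same posterior, but treating the tests as conditionally independent."""
    like_pos = marginal(CARC, 0, ames) * marginal(CARC, 1, mla)
    like_neg = marginal(NONCARC, 0, ames) * marginal(NONCARC, 1, mla)
    num = prior * like_pos
    return num / (num + (1 - prior) * like_neg)


dep = posterior_dependent(1, 1)
naive = posterior_naive(1, 1)
print(f"dependent: {dep:.4f}, naive: {naive:.4f}")
```

For these data the naive calculation gives roughly 74.2%, against 72.7% when the dependence is modelled, which is consistent with the overestimation described above.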


J. Jaworska et al. / Regulatory Toxicology and Pharmacology 57 (2010) 157–167

Table 2. Observed cell counts and marginal totals of pair-wise test results for two in vitro tests (Ames and MLA) applied to 243 carcinogens and 105 non-carcinogens, based on Kirkland et al. (2005).

Carcinogens
              MLA = 1   MLA = 0   Total
Ames = 1        123        12      135
Ames = 0         74        34      108
Total           197        46      243

Non-carcinogens
              MLA = 1   MLA = 0   Total
Ames = 1         20         7       27
Ames = 0         44        34       78
Total            64        41      105

Table 3. Conditional Probability Tables (CPTs) for the T1 (Ames) test node (left) and the T2 (MLA) test node (right). 1 – positive test result, 0 – negative test result. Entries are given as counts, with column totals in the last row.

Pr(T1|Carc)
Carcinogenic      +      –
T1 = 1          135     27
T1 = 0          108     78
Total           243    105

Pr(T2|Carc, T1)
Carcinogenic, T1   +, 1   +, 0   –, 1   –, 0
T2 = 1              123     74     20     44
T2 = 0               12     34      7     34
Total               135    108     27     78

[Fig. 3: network monitors, in %. Carcinogenic: 1 = 50.00, 0 = 50.00. T1 Ames: 1 = 40.63, 0 = 59.37. T2 MLA: 1 = 71.01, 0 = 28.99.]

Fig. 3. Bayesian two-test network before any evidence is specified. The numbers in the Ames and MLA monitors represent the probability of obtaining a 1/0 (yes/no) result. All the probabilities are expressed in percentages.

4.1. Informational requirements for inputs to the framework

Before we illustrate the functioning of a Bayesian network for our specific example, we explain the required information inputs to the framework. Essentially, two types of information inputs are needed. First, the network analysis starts with developing a table of observed cell counts of carcinogens and non-carcinogens, each chemical being subjected to both tests. Here we use predictivity data taken from Kirkland et al. (2005), which are cross-classified in Table 2. Second, from the observed cell counts and the marginal totals of pair-wise test results as shown in Table 2, we need to construct the Conditional Probability Tables (CPTs) (Table 3). Since there are two dependent test variables to detect carcinogenicity, there are two tables to specify the probabilistic relationships between tests and carcinogenicity status. Since the Bayesian network is a data-driven approach, the inference generated depends on the quality of the dataset used for the parameterization of the network. Clearly, Bayesian probabilistic inference cannot compensate for biases in the underlying datasets. This calls for carefully evaluating existing datasets, prior to applying the Bayesian inference framework empirically, and, where appropriate, for confining the training set to a homogeneous class of chemicals.

4.2. Running the Bayesian network

In the following, the reasoning on the basis of the Bayesian network, as implemented in HUGIN Lite 7.0® (www.hugin.com), is demonstrated in a series of Figures. In Appendix A, we provide both the background equations for the case of two joint test results, as well as spreadsheet-based calculations for the Bayesian network, leading to the same results. Furthermore, we show how a quantitative measure of Weight-of-Evidence can be derived from the Bayesian analysis. Fig. 3 illustrates the inferential state of the Bayesian network, before any experimental evidence has become available.
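The step from the cell counts of Table 2 to the CPTs of Table 3, and from there to the no-evidence state of Fig. 3, can be sketched in a few lines of Python. This is a minimal illustration under the fifty-fifty prior shown in Fig. 3, not the HUGIN implementation; the dictionary layout and variable names are assumptions made for the example.

```python
# Counts from Table 2, indexed as counts[c][(t1, t2)]:
# c = 1 for carcinogens, 0 for non-carcinogens; t1 = Ames, t2 = MLA.
counts = {
    1: {(1, 1): 123, (1, 0): 12, (0, 1): 74, (0, 0): 34},
    0: {(1, 1): 20, (1, 0): 7, (0, 1): 44, (0, 0): 34},
}

# CPT for T1: Pr(T1 = t1 | C = c), obtained by normalizing the Ames row totals.
cpt_t1 = {}
for c, table in counts.items():
    total = sum(table.values())
    for t1 in (1, 0):
        cpt_t1[(t1, c)] = (table[(t1, 1)] + table[(t1, 0)]) / total

# CPT for T2: Pr(T2 = t2 | C = c, T1 = t1), normalizing within each Ames row.
cpt_t2 = {}
for c, table in counts.items():
    for t1 in (1, 0):
        row_total = table[(t1, 1)] + table[(t1, 0)]
        for t2 in (1, 0):
            cpt_t2[(t2, c, t1)] = table[(t1, t2)] / row_total

prior = {1: 0.5, 0: 0.5}  # fifty-fifty prior on 'Carcinogenic'

# Prior predictive probabilities of a positive result (no evidence entered).
p_ames_pos = sum(prior[c] * cpt_t1[(1, c)] for c in prior)
p_mla_pos = sum(
    prior[c] * cpt_t1[(t1, c)] * cpt_t2[(1, c, t1)]
    for c in prior for t1 in (1, 0)
)
print(f"Pr(Ames=1) = {p_ames_pos:.4f}, Pr(MLA=1) = {p_mla_pos:.4f}")
```

The two printed values correspond to the 40.63% and 71.01% displayed in the Ames and MLA monitors of Fig. 3.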
Each Bayesian calculation allows for the specification of a so-called prior distribution: here, the probability that a (new) chemical is carcinogenic before any test result is known. For simplicity, let us assume a fifty-fifty prior distribution on the target variable ‘Carcinogenic’, which means that we are indifferent towards the carcinogenicity of the chemical. The probabilities of getting positive test results, before these results become available, are calculated in the respective monitors, and can be derived from the prior and the test performance for known compounds, as summarized in the Conditional Probability Tables (CPTs) in Table 3. Consider now the case where one piece of evidence – which is in our case a positive Ames test – has been entered. This is shown in Fig. 4. In this case, the posterior on the target variable, ‘Carcinogenic’, is updated and the chance (i.e. the belief assignment) that the chemical is carcinogenic has increased from 50% to 68.36%. Because (rodent) carcinogenicity is now more likely, the MLA test, although no test result is specified, has a higher probability of displaying a positive result: 85.72%, instead of 71.01% in Fig. 3. This also illustrates that the less evidence is available, the greater the impact of the prior information on the posterior distribution. As evidence increases as a result of additional testing or non-testing information, the impact of the prior on the posterior distribution decreases.

[Fig. 4: network monitors, in %. Carcinogenic: 1 = 68.36, 0 = 31.64. T1 Ames: 1 = 100.00, 0 = 0.00. T2 MLA: 1 = 85.72, 0 = 14.28.]

Fig. 4. Bayesian two-test network after positive evidence on the Ames test was entered, by setting the Ames monitor to 100% for outcome 1. The predictive posterior value for carcinogenicity increased from the prior value of 50% to 68%.
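The update shown in Fig. 4 can be reproduced by hand. The sketch below is an illustration only, with hypothetical variable names: it applies Bayes' Theorem to a positive Ames result using the counts of Tables 2 and 3, and then propagates the updated belief to the MLA node, which has not yet been observed.

```python
# Fifty-fifty prior and the CPT entries needed when a positive Ames test is entered.
# All counts are taken from Tables 2 and 3.
prior_carc = 0.5
p_ames_pos_given_carc = 135 / 243             # Pr(T1 = 1 | C+)
p_ames_pos_given_noncarc = 27 / 105           # Pr(T1 = 1 | C-)
p_mla_pos_given_carc_ames_pos = 123 / 135     # Pr(T2 = 1 | C+, T1 = 1)
p_mla_pos_given_noncarc_ames_pos = 20 / 27    # Pr(T2 = 1 | C-, T1 = 1)

# Bayes' Theorem: posterior belief in carcinogenicity after a positive Ames test.
evidence = (prior_carc * p_ames_pos_given_carc
            + (1 - prior_carc) * p_ames_pos_given_noncarc)
post_carc = prior_carc * p_ames_pos_given_carc / evidence

# Predictive probability of a positive MLA result, which is not yet observed.
pred_mla_pos = (post_carc * p_mla_pos_given_carc_ames_pos
                + (1 - post_carc) * p_mla_pos_given_noncarc_ames_pos)

print(f"Pr(C+ | Ames=1) = {post_carc:.4f}")
print(f"Pr(MLA=1 | Ames=1) = {pred_mla_pos:.4f}")
```

The results agree with the 68.36% and 85.72% reported for Fig. 4.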

[Fig. 5: network monitors, in %. Carcinogenic: 1 = 57.08, 0 = 42.92. T1 Ames: 1 = 49.05, 0 = 50.95. T2 MLA: 1 = 100.00, 0 = 0.00.]

Fig. 5. Bayesian two-test network after positive evidence on the MLA test was provided, by setting the MLA monitor to 100% for outcome 1. No test result for Ames was entered. The predictive posterior value for carcinogenicity increased from the prior value of 50% to only 57.08%.


[Fig. 6: network monitors, in %. Carcinogenic: 1 = 72.66, 0 = 27.34. T1 Ames: 1 = 100.00, 0 = 0.00. T2 MLA: 1 = 100.00, 0 = 0.00.]

Fig. 6. Bayesian two-test network after positive evidence on both the Ames test and the MLA test was provided, by setting the respective monitors to 100% for outcome 1. The posterior value for carcinogenicity increased from the prior value of 50% to 72.66%.

Fig. 5 illustrates the probabilistic inference when a single, but different, piece of evidence was entered: a positive MLA test. The run of the Bayesian network in Fig. 5 shows that, when we have no Ames test result, but do have a positive MLA test result instead, we can consistently reason using the same framework. A positive MLA result has a weaker impact on the posterior value of the endpoint being carcinogenic than a positive Ames result. The belief that the chemical is carcinogenic is only 57.08%, as compared to the posterior value (68.36%) based on a positive Ames test in Fig. 4. Due to the increase of the predictive posterior value of the target (carcinogenicity) variable, the probability of a positive Ames result, although unavailable in this run, has increased from 40.63% (Fig. 3) to 49.05% (Fig. 5). Fig. 6 shows a run with the carcinogenicity Bayesian network, when the two tests both yield positive results. The assumed prior probability for carcinogenicity of 50% is revised to a posterior probability of 72.66% (cf. the posterior value towards carcinogenicity calculated in the spreadsheet version in Appendix A). Finally, Fig. 7 illustrates probabilistic inference after two pieces of evidence were provided, which are conflicting: in this case, a positive Ames test and a negative MLA test. Note that in the case of conflicting evidence, the fifty-fifty prior, although updated, has not changed much. The Bayesian two-test network in Fig. 7 indicates that the joint evidence on the target variable (carcinogenicity) is in disagreement, resulting in a posterior value for the belief that the chemical is carcinogenic (42.55%) which is close to the one given by the indifference prior. This may imply further testing, or may warrant invoking other additional information. Clearly, the approach is not limited to a noninformative prior and can be refined by developing an informative, chemical-specific, prior distribution of carcinogenicity potential.

[Fig. 7: network monitors, in %. Carcinogenic: 1 = 42.55, 0 = 57.45. T1 Ames: 1 = 100.00, 0 = 0.00. T2 MLA: 1 = 0.00, 0 = 100.00.]

Fig. 7. Bayesian two-test network after conflicting evidence was entered: a positive Ames test and a negative MLA test, by setting the respective values in the monitors to 100%. The posterior value for carcinogenicity decreased from the prior value of 50% to 42.55%.
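The runs of Figs. 3–7 differ only in which test results are entered as evidence. As a hedged sketch (plain enumeration of the joint distribution, not the propagation algorithm used by HUGIN; the function name and its arguments are hypothetical), the posterior for any complete or incomplete test battery result can be computed as follows, using the counts of Table 2:

```python
from itertools import product

# Joint test-result counts from Table 2: COUNTS[c][(ames, mla)],
# with c = 1 for carcinogens and c = 0 for non-carcinogens.
COUNTS = {
    1: {(1, 1): 123, (1, 0): 12, (0, 1): 74, (0, 0): 34},
    0: {(1, 1): 20, (1, 0): 7, (0, 1): 44, (0, 0): 34},
}


def posterior_carcinogenic(ames=None, mla=None, prior=0.5):
    """Pr(C+ | evidence) by enumerating the joint distribution.

    A test left as None is treated as not yet performed and is summed out.
    """
    weight = {}
    for c in (1, 0):
        p_c = prior if c == 1 else 1.0 - prior
        total = sum(COUNTS[c].values())
        weight[c] = 0.0
        for t1, t2 in product((1, 0), repeat=2):
            if ames is not None and t1 != ames:
                continue
            if mla is not None and t2 != mla:
                continue
            weight[c] += p_c * COUNTS[c][(t1, t2)] / total
    return weight[1] / (weight[1] + weight[0])


for label, kwargs in [
    ("no evidence", {}),
    ("Ames+", {"ames": 1}),
    ("MLA+", {"mla": 1}),
    ("Ames+, MLA+", {"ames": 1, "mla": 1}),
    ("Ames+, MLA-", {"ames": 1, "mla": 0}),
]:
    print(f"{label:12s} {100 * posterior_carcinogenic(**kwargs):6.2f}%")
```

The five cases correspond to the ‘Carcinogenic’ monitors of Figs. 3 to 7, respectively. Passing a different `prior` shows how an informative, chemical-specific prior would shift each posterior.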


4.3. Bayesian probabilistic inference as a quantitative Weight-of-Evidence

As presented in the above section, running the Bayesian network generates a posterior probability distribution of a chemical having a certain target property (in our example carcinogenic or non-carcinogenic), given a complete, or incomplete, test battery result. In addition, as we explain in more detail in Appendix A, Bayesian probabilistic inference allows for decomposing the posterior evidence of a testing sequence into its additive evidence components, i.e. the prior evidence component and the test component. Hence, we can quantitatively determine the additive contribution of each of these components to the overall evidence. The ITS inference framework can, therefore, be interpreted as a quantitative approach to Weight-of-Evidence (WoE). Compared to the descriptive WoE approaches that are attached to an ITS as a separate step (see, for example, the ITSs proposed in the REACH guidance documents; Ahlers et al., 2008), WoE values are generated endogenously in the proposed ITS inference framework. That is, the inference and the WoE values result from the same conceptual setting. This makes the development process of testing strategies such as ITSs methodologically consistent and transparent.

5. Discussion and conclusions

Regulatory decision-making on chemical hazards and risks is characterized by high complexity and uncertainty. Improving the information basis for decision-making is, therefore, essential. Current approaches to ITS development, however, lack a conceptually consistent and transparent framework for data integration and aggregation across various testing stages. The goal of this paper is to make a theoretical and conceptual contribution to ITS development. We propose a set of core conceptual requirements for systematic and consistent data integration in ITSs.
Furthermore, we suggest Bayesian probabilistic inference as a methodological foundation for developing ITSs in order to meet these requirements. The Bayesian network methodology is a formalized approach for evidential reasoning that has been proven useful in many different domains, including medical diagnosis and testing, and toxicology. For developing toxicological testing strategies a Bayesian network ITS framework offers several appealing features. First, being a quantitative approach, it allows for consistent and coherent reasoning based on different – and even conflicting – sources of information. Hence, the assessment of the target property of a new chemical is reproducible. Second, a Bayesian network ITS framework allows for updating uncertainty information across testing stages. Hence, the design of the ITS is guided by the aim of reducing uncertainty across testing stages. In this respect, the Bayesian inference approach suggested in this paper goes beyond current knowledge-based and rule-based approaches to ITS development. Third, the Bayesian network ITS framework allows for implementing quantitative WoE analysis. In contrast to descriptive WoE analyses, quantitative WoE assessment is integrated in the same methodology that is used for ITS inference. In order to illustrate the basic characteristics of the Bayesian network ITS framework, we present a simple two-test Bayesian network for rodent carcinogenicity assessment. This illustration, of course, does not represent a full ITS and we do not claim to provide a ready-for-use testing strategy. As the purpose of our paper is conceptual, the two-test example is, however, sufficient to demonstrate how the requirements defined above can be put into practice. The Bayesian network ITS framework is a data-driven approach. Thus, its outcomes strongly depend on the quality and the appropriateness of the input information, the choice of tests, and the underlying training set of chemicals. For assessing the quality


and relevance of input information, expert knowledge continues to play an important role. As representative training sets for the considered tests are not readily available for many endpoints, the construction of the Bayesian network may turn out to be difficult in practical applications. Furthermore, the Bayesian network ITS framework, as we have introduced it in this paper, focuses on discrete input and output variables. Extensions of Bayesian networks to continuous variables are only considered for normally distributed variables. Therefore, generalizing the Bayesian ITS framework to other than normally distributed variables is an interesting challenge for further research. In the context of ITS development, Bayesian probabilistic reasoning and inference is still in an experimental phase. Further research is required to apply the approach to more complex ITSs that are beyond the illustrative example presented in the paper. The EU research project “Optimized Strategies for Risk Assessment of Chemicals based on Intelligent Testing – OSIRIS” (2008) aims at developing, testing and applying Bayesian networks to ITS development for various endpoints. The approach is discussed with risk assessors from regulatory agencies and the chemical industry to ensure that the needs of practical users will be met and to explore its scope in different regulatory frameworks. Finally, it should be pointed out that ITSs are considered tools for guiding efficient decision-making on chemical hazard and risk management. While a consistent framework for updating information is a necessary prerequisite, we need to go beyond the pure inference view in order to achieve this goal. In particular, since chemical testing is costly, information gains must be balanced against testing costs. Costs can comprise different components, in particular monetary testing costs, testing time, and animal welfare loss.
A decision-maker who considers performing an additional test needs to know whether the information gained is worth its costs. This requires translating test information outcomes into values for decision-making. Hence, further research should address how the information-theoretic inference framework can be complemented by decision-theoretic optimization procedures.

Acknowledgments

The funding of the European Union 6th Framework OSIRIS Integrated Project (GOCE-037017-OSIRIS) is gratefully acknowledged. We thank R. McDowell, G. Daston and G. Stijntjes and three anonymous referees for helpful comments on an earlier version of this paper.

Appendix A. Two-test Bayesian network in a spreadsheet form

In this appendix, the equations for the two-test Bayesian network calculations in Figs. 6 and 7 are presented. For illustrative purposes, we demonstrate the calculations in spreadsheet form. Furthermore, we demonstrate how a quantitative measure of Weight-of-Evidence can be derived. Bayes’ Theorem, Eq. (1) in the main text, implemented for the two-test battery system to predict carcinogenicity, can be formulated as:

\[
\Pr(C^{+} \mid T_1 = i, T_2 = j) = \frac{\Pr(C^{+}) \cdot \Pr(T_1 = i, T_2 = j \mid C^{+})}{\Pr(C^{+}) \cdot \Pr(T_1 = i, T_2 = j \mid C^{+}) + \Pr(C^{-}) \cdot \Pr(T_1 = i, T_2 = j \mid C^{-})} \tag{2a}
\]

\[
\Pr(C^{-} \mid T_1 = i, T_2 = j) = \frac{\Pr(C^{-}) \cdot \Pr(T_1 = i, T_2 = j \mid C^{-})}{\Pr(C^{+}) \cdot \Pr(T_1 = i, T_2 = j \mid C^{+}) + \Pr(C^{-}) \cdot \Pr(T_1 = i, T_2 = j \mid C^{-})} \tag{2b}
\]

(Rosenkranz et al., 1984; Kim and Margolin, 1994).

Here, the hypothesis H in (1) has two possible values: carcinogenic ($C^{+}$) and non-carcinogenic ($C^{-}$). The evidence E is a particular battery test result with four possible combinations: $\{T_1 = 1, T_2 = 1\}$, $\{T_1 = 1, T_2 = 0\}$, $\{T_1 = 0, T_2 = 1\}$, and $\{T_1 = 0, T_2 = 0\}$. Thus, with $i = 0, 1$ and $j = 0, 1$, Eqs. (2a) and (2b) comprise eight equations in total. In Bayesian jargon, $\{\Pr(C^{+}), \Pr(C^{-})\}$ is the prior distribution on carcinogenicity; $\{\Pr(T_1 = i, T_2 = j \mid C^{+}), \Pr(T_1 = i, T_2 = j \mid C^{-})\}$ is called the likelihood of carcinogenicity for a given test combination $\{i, j\}$; and $\{\Pr(C^{+} \mid T_1 = i, T_2 = j), \Pr(C^{-} \mid T_1 = i, T_2 = j)\}$ is the posterior distribution on carcinogenicity given a test battery result. A convenient shorthand version of Bayes’ Theorem results from dividing Eq. (2a) by Eq. (2b):

\[
\frac{\Pr(C^{+} \mid T_1 = i, T_2 = j)}{\Pr(C^{-} \mid T_1 = i, T_2 = j)} = \frac{\Pr(C^{+})}{\Pr(C^{-})} \cdot \frac{\Pr(T_1 = i, T_2 = j \mid C^{+})}{\Pr(T_1 = i, T_2 = j \mid C^{-})}. \tag{3}
\]

The ratio of the probability of an event to the probability of its denial is called the odds of the event. Thus, Eq. (3) expresses that the posterior odds equals the prior odds times the likelihood ratio (Campbell and Machin, 1993, p. 37; Pepe, 2003, p. 18). A further simplification results when we take the logarithm of Eq. (3), to arrive at:

\[
\ln\left(\frac{\Pr(C^{+} \mid T_1 = i, T_2 = j)}{\Pr(C^{-} \mid T_1 = i, T_2 = j)}\right) = \ln\left(\frac{\Pr(C^{+})}{\Pr(C^{-})}\right) + \ln\left(\frac{\Pr(T_1 = i, T_2 = j \mid C^{+})}{\Pr(T_1 = i, T_2 = j \mid C^{-})}\right). \tag{4}
\]

Hence, the posterior log odds are the sum of the prior log odds and the log likelihood ratio. In other words: we are able to quantitatively determine the additive contribution of the log prior odds and the log likelihood ratio of a given test combination (E) to the log posterior odds towards carcinogenicity (H) given the evidence E. These terms can, therefore, be interpreted as a quantitative approach to Weight-of-Evidence (WoE) (Horvitz et al., 1988; Smith et al., 2002). This, in fact, has a distinguished history, dating back to cryptanalytic work in World War II (Good, 1979, 1985, 1988). Good reports that Alan Turing, the ‘father of computing’, proposed a unit of WoE, analogous to the decibel in acoustics, which he called a deciban:

\[
\mathrm{WoE} = 10 \cdot \log_{10}(\mathrm{odds}) \approx 4.343 \cdot \ln(\mathrm{odds}). \tag{5}
\]

Suppose one expresses the odds in favor of an event as 5 to 4; then we have approximately 1 deciban (db) of evidence. This is regarded as a convenient small value in human reasoning. Thus, we can express the post-test WoE for each of the four test result combinations as:

\[
\mathrm{WoE}^{\text{post-test}}_{ij} = 10 \cdot \log_{10}\left(\frac{\Pr(C^{+} \mid T_1 = i, T_2 = j)}{\Pr(C^{-} \mid T_1 = i, T_2 = j)}\right). \tag{6a}
\]

Similarly, the pre-test (prior) WoE is:

\[
\mathrm{WoE}^{\text{pre-test}} = 10 \cdot \log_{10}\left(\frac{\Pr(C^{+})}{\Pr(C^{-})}\right), \tag{6b}
\]

and the test (likelihood) WoE equals:

\[
\mathrm{WoE}^{\text{test}}_{ij} = 10 \cdot \log_{10}\left(\frac{\Pr(T_1 = i, T_2 = j \mid C^{+})}{\Pr(T_1 = i, T_2 = j \mid C^{-})}\right). \tag{6c}
\]

It follows that Eq. (4) becomes:

\[
\mathrm{WoE}^{\text{post-test}}_{ij} = \mathrm{WoE}^{\text{pre-test}} + \mathrm{WoE}^{\text{test}}_{ij}, \tag{7}
\]

expressed in units of deciban (db). The major advantage is that component evidence terms add up to combined evidence, while positive (negative) evidence takes positive (negative) values. The WoE decomposition can, in principle, be extended to sequential battery testing (Rosenkranz et al., 1984), but we will not pursue that here. Note that the pre-test WoE is independent of the test results. In the Bayesian Network of Figs. 6 and 7, we have assumed the uniform (fifty-fifty) prior distribution:

\[
\{\Pr(C^{+}), \Pr(C^{-})\} = \{0.5, 0.5\}. \tag{8}
\]

It follows that the pre-test WoE equals zero:

\[
\mathrm{WoE}^{\text{pre-test}} = 10 \cdot \log_{10}\left(\frac{0.5}{0.5}\right) = 0, \tag{9}
\]

i.e. we presume no pre-test evidence. In this case, the post-test WoE is just the test WoE itself:

\[
\mathrm{WoE}^{\text{post-test}}_{ij} = \mathrm{WoE}^{\text{test}}_{ij}. \tag{10}
\]

Table 4. Two-test symbolic Bayesian network layout in spreadsheet form. Equations are given in the text. Column E is empty.

Row   A      B            C            D           F      G            H            I
 1    Carcinogens (C+), # chemicals                Non-carcinogens (C−), # chemicals
 2           T2                                           T2
 3    T1     1            0                        T1     1            0
 4    1      m11          m10                      1      n11          n10
 5    0      m01          m00                      0      n01          n00
 6                                     m                                            n
 8    Cell probabilities, sensitivity:             Cell probabilities, specificity:
 9           T2                                           T2
10    T1     1            0                        T1     1            0
11    1      p11          p10          Se1         1      q11          q10          1 − Sp1
12    0      p01          p00          1 − Se1     0      q01          q00          Sp1
13           Se2          1 − Se2      1.0                1 − Sp2      Sp2          1.0
15    Likelihood ratio+ (odds)                     Likelihood WoE+ (db)
16           T2                                           T2
17    T1     1            0                        T1     1            0
18    1      LR+11        LR+10                    1      WoE+11       WoE+10
19    0      LR+01        LR+00                    0      WoE+01       WoE+00
21    Posterior Prob+                              Posterior Prob–
22           T2                                           T2
23    T1     1            0                        T1     1            0
24    1      Pr(C+|1,1)   Pr(C+|1,0)               1      Pr(C−|1,1)   Pr(C−|1,0)
25    0      Pr(C+|0,1)   Pr(C+|0,0)               0      Pr(C−|0,1)   Pr(C−|0,0)

To illustrate the WoE calculation for the Bayesian network of Figs. 6 and 7, we implement the calculations in spreadsheet form. The symbolic layout of a two-test calculation is given in Table 4. The cell expressions are now presented. Rows 4 and 5 of Table 4 contain the raw number of chemicals in each class (carcinogenic and non-carcinogenic) for all joint test results. The total number of chemicals for each class, m and n, is displayed in cells D6 and I6. From these raw numbers of chemicals, we calculate the conditional probabilities of a joint test result for each chemical class in cells B11:C12 of Table 4, and G11:H12, respectively:

\[
\begin{aligned}
p_{ij} &= \Pr(T_1 = i, T_2 = j \mid C^{+}) = m_{ij}/m \\
q_{ij} &= \Pr(T_1 = i, T_2 = j \mid C^{-}) = n_{ij}/n
\end{aligned} \tag{11}
\]

These are also the likelihood values. The sensitivities of T1 and T2 in cells D11 and B13 in Table 4 are (Gardner et al., 2000):

\[
\begin{aligned}
Se_1 &= p_{11} + p_{10} \\
Se_2 &= p_{11} + p_{01}
\end{aligned} \tag{12}
\]

Similarly, the specificities of T1 and T2 in cells I12 and H13 are:

\[
\begin{aligned}
Sp_1 &= q_{01} + q_{00} \\
Sp_2 &= q_{10} + q_{00}
\end{aligned} \tag{13}
\]

The diagnostic likelihood ratio (Pepe, 2003) equals:

\[
LR^{+}_{ij} = \frac{\Pr(T_1 = i, T_2 = j \mid C^{+})}{\Pr(T_1 = i, T_2 = j \mid C^{-})} = \frac{p_{ij}}{q_{ij}}, \tag{14}
\]

calculated in cells B18:C19 in Table 4, which are expressed as odds on carcinogenicity. Weight-of-Evidence towards carcinogenicity (+), given a test result combination, ij, is calculated in units of decibans as:

\[
\mathrm{WoE}^{+}_{ij} = 10 \cdot \log_{10}\left(LR^{+}_{ij}\right) \approx 4.343 \cdot \ln\left(LR^{+}_{ij}\right). \tag{15}
\]

This is done in cells G18:H19 in Table 4. As the pre-test (prior) WoE is taken to be zero (see above), this test WoE is also the post-test (posterior) WoE. With no prior evidence, the posterior probabilities of carcinogenicity and non-carcinogenicity, for a given joint test result, can be conveniently calculated from the likelihood ratio:

\[
\Pr(C^{+} \mid T_1 = i, T_2 = j) = \frac{LR^{+}_{ij}}{1 + LR^{+}_{ij}}, \tag{16a}
\]

\[
\Pr(C^{-} \mid T_1 = i, T_2 = j) = \frac{1}{1 + LR^{+}_{ij}}, \tag{16b}
\]

as is done in cells B24:C25 and G24:H25 from the values in cells B18:C19 in Table 4. A similar expression holds in the case of a nonuniform prior: use the posterior odds from Eq. (3), instead of the likelihood ratios. The results of the calculations are given in Table 5. The observed cell counts from Table 2 in the main text are replicated in rows 1–6 of Table 5. Posterior probabilities in cells B24 and G24 in Table 5 match those in the Bayesian network in Fig. 6. Cells C24 and H24 in Table 5 are equal to the posterior probabilities in Fig. 7. We observe that the WoE with respect to carcinogenicity is approximately 4 db, when both tests are positive, and roughly the same amount, but negative, when both tests are negative.
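As an alternative to the spreadsheet, the same cell expressions can be sketched in Python. The fragment below is an illustration under the assumptions of this appendix (counts from Table 2, uniform prior by default); the function names `decibans` and `woe_table` and the prior value 0.2 in the final check are hypothetical choices made for the example, not part of the original analysis.

```python
import math

# Observed counts m_ij (carcinogens) and n_ij (non-carcinogens) from Table 2,
# keyed by (i, j) = (Ames result, MLA result).
m = {(1, 1): 123, (1, 0): 12, (0, 1): 74, (0, 0): 34}
n = {(1, 1): 20, (1, 0): 7, (0, 1): 44, (0, 0): 34}


def decibans(odds):
    """Eq. (5): Weight-of-Evidence of the given odds, in decibans."""
    return 10.0 * math.log10(odds)


def woe_table(prior_carc=0.5):
    """Per test combination: LR+ (Eq. 14), WoE terms (Eqs. 6a, 6c), posterior (Eq. 3)."""
    m_total, n_total = sum(m.values()), sum(n.values())
    prior_odds = prior_carc / (1.0 - prior_carc)
    rows = {}
    for ij in m:
        lr = (m[ij] / m_total) / (n[ij] / n_total)   # p_ij / q_ij
        post_odds = prior_odds * lr                  # Eq. (3)
        rows[ij] = {
            "LR": lr,
            "WoE_test": decibans(lr),
            "WoE_post": decibans(post_odds),
            "Pr_carc": post_odds / (1.0 + post_odds),
        }
    return rows


table = woe_table()
for ij, row in table.items():
    print(ij, {k: round(v, 4) for k, v in row.items()})

# Additivity of Eq. (7), checked for a hypothetical non-uniform prior of 0.2.
row = woe_table(prior_carc=0.2)[(1, 1)]
assert math.isclose(row["WoE_post"], decibans(0.2 / 0.8) + row["WoE_test"])
```

With the default uniform prior, the printed likelihood ratios, WoE values and posterior probabilities can be compared against rows 18–19 and 24–25 of Table 5. With any other prior, the pre-test WoE of Eq. (6b) is simply added to the test WoE, as Eq. (7) states.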


Table 5. Two-test Bayesian network for a uniform prior in spreadsheet form, based on the data in Table 2. Column E is empty.

Row   A      B         C         D         F      G         H         I
 1    Carcinogens (C+)                     Non-carcinogens (C−)
 2           MLA                                  MLA
 3    Ames   1         0                   Ames   1         0
 4    1      123       12        135       1      20        7         27
 5    0      74        34        108       0      44        34        78
 6           197       46        243              64        41        105
 8    Cell probabilities, sensitivity:     Cell probabilities, specificity:
 9           MLA                                  MLA
10    Ames   1         0                   Ames   1         0
11    1      0.506     0.049     0.556     1      0.190     0.067     0.257
12    0      0.305     0.140     0.444     0      0.419     0.324     0.743
13           0.811     0.189     1.000            0.610     0.390     1.000
15    Likelihood ratio+ (odds)             Likelihood WoE+ (db)
16           MLA                                  MLA
17    Ames   1         0                   Ames   1         0
18    1      2.657     0.741               1      4.24      -1.30
19    0      0.727     0.432               0      -1.39     -3.64
21    Posterior Pr(C+|Ames, MLA)           Posterior Pr(C−|Ames, MLA)
22           MLA                                  MLA
23    Ames   1         0                   Ames   1         0
24    1      0.7266    0.4255              1      0.2734    0.5745
25    0      0.4209    0.3017              0      0.5791    0.6983

When the tests are in conflict, the WoE towards carcinogenicity is negative: –1.3, which means: inconclusive. References Ahlers, J., Stock, F., Werschkun, B., 2008. Integrated testing and intelligent assessment – new challenges under REACH. Environmental Sciences and Pollution Research 15, 565–572. Bassan, A., Worth, A.P., 2008. The integrated use of models for the properties and effects of chemicals by means of a structured workflow. QSAR and Combinatorial Science 27 (1), 6–20. Berger, J.O., 1993. Statistical Decision Theory and Bayesian Analysis. Springer, New York. Bhattacharjee, M., Pritchard, C., Nelson, P., 2008. A Bayesian framework for data and hypothesis driven fusion of high throughput data: application to mouse organogenesis. Pacific Symposium on Biocomputing 13, 178–189. Blaauboer, B.J., Barrat, M.D., Houston, J.B., 1999. The integrated use of alternative methods in toxicological risk evaluation. Alternatives to Laboratory Animals 27, 229–237. Blaauboer, B.J., Andersen, M.E., 2007. The need for a new toxicity testing and risk analysis paradigm to implement REACH or any other large scale testing initiative. Archives of Toxicology 81, 385–387. Branscum, A.J., Gardner, I.A., Johnson, W.O., 2005. Estimation of diagnostic-test sensitivity and specificity through Bayesian modeling. Preventive Veterinary Medicine 68, 145–163. Campbell, M.J., Machin, D., 1993. Medical Statistics. A Commonsense Approach. John Wiley & Sons, Chichester. Castillo, E., Gutiérrez, J.M., Hadi, A.S., 1997. Expert Systems and Probabilistic Network Models. Springer-Verlag, New York. CEC, 2006. Regulation (EC) No 1907/2006 of the European Parliament and of the Council of 18 December 2006 Concerning the Registration, Evaluation, Authorisation and Restriction of Chemicals (REACH). Commission of the European Communities, Brussels. Charniak, E., 1991. Bayesian networks without tears. Artificial Intelligence Magazine 12 (4), 50–63. Combes, R., 2007. 
Developing, validating and using test batteries and tiered (hierarchical) testing schemes. Alternatives to Laboratory Animals 35, 375–378. Dendukuri, N., Joseph, L., 2001. Bayesian approaches to modeling the conditional dependence between multiple diagnostic tests. Biometrics 57, 158–167. Doyle, J., 1992. Rationality and its role in reasoning. Computational Intelligence 8 (2), 376–409. ECB, 2005. Scoping study on the development of a technical guidance document on information requirements on intrinsic properties of substances (RIP 3.3-1). Report prepared by CEFIC, DK-EPA, Environmental Agency of Wales and England, ECETOC, INERIS, Keml and TNO. European Chemicals Bureau, European Commission, Joint Research Centre, Ispra, Italy. ECETOC, 2005. Alternative Testing Approaches. ECETOC Technical Report No. 97. European Centre for Ecotoxicology and Toxicology of Chemicals (ECETOC), Brussels.

ECHA, 2008a. Guidance on Information Requirements and Chemical Safety Assessment. Chapter R.7a: Endpoint Specific Guidance. European Chemicals Agency (ECHA), Helsinki. ECHA, 2008b. Guidance on Information Requirements and Chemical Safety Assessment. Chapter R.7b: Endpoint Specific Guidance. European Chemicals Agency (ECHA), Helsinki. ECHA, 2008c. Guidance on Information Requirements and Chemical Safety Assessment. Chapter R.7c: Endpoint Specific Guidance. European Chemicals Agency (ECHA), Helsinki. Engel, B., Swildens, B., Stegman, A., Buist, W., De Jong, M., 2006. Estimation of sensitivity and specificity of three conditionally dependent diagnostic tests in the absence of a gold standard. Journal of Agricultural, Biological, and Environmental Statistics 11 (4), 360–380. Eriksson, L., Jaworska, J., Worth, A.P., Cronin, M.T.D., McDowell, R.M., Gramatica, P., 2003. Methods for reliability and uncertainty assessment and for applicability evaluations of classification- and regression-based QSARs. Environmental Health Perspectives 111 (10), 1361–1375. Führ, M., Bizer, K., 2007. REACH as a paradigm shift in chemicals policy – responsive regulation and behavioral models. Journal of Cleaner Production 15, 327–334. Gallegos Saliner, A., Worth, A.P., 2007. Testing Strategies for the Prediction of Skin and Eye Irritation and Corrosion for Regulatory Purposes. EU – Scientific and Technical Research Series. Office for Official Publications of the European Communities, Luxembourg. Gardner, I.A., 2002. The utility of Bayes’ theorem and Bayesian inference in veterinary clinical practice and research. Australian Veterinary Journal 80, 758– 761. Gardner, I.A., Stryhn, H.S., Lind, P., Collins, M.T., 2000. Conditional dependence between tests affects the diagnosis and surveillance of animal diseases. Preventive Veterinary Medicine 45, 107–122. Georgiadis, M.P., Johnson, W.O., Gardner, I.A., Singh, R., 2003. 
Correlation-adjusted estimation of sensitivity and specificity of two diagnostic tests. Applied Statistics 52, 63–76. Gerner, I., Graetschel, G., Kahl, J., Schlede, E., 2000a. Development of a decision support system for the introduction of alternative methods into local irritancy/ corrosivity testing strategies. Development of a relational database. Alternatives to Laboratory Animals 28, 11–28. Gerner, I., Liebsch, M., Spielmann, H., 2005. Assessment of the eye irritating properties of chemicals by applying alternatives to the Draize rabbit eye test: the use of QSARs and in vitro tests for the classification of eye irritation. Alternatives to Laboratory Animals 33 (3), 215–237. Gerner, I., Zinke, S., Graetschel, G., Schlede, E., 2000b. Development of a decision support system for the introduction of alternative methods into local irritancy/ corrosivity testing strategies. Creation of fundamental rules for a decision support system. Alternatives to Laboratory Animals 28, 665–698. Gevaert, O., De Smet, F., Timmerman, D., Moreau, Y., De Moor, B., 2006. Predicting the prognosis of breast cancer by integrating clinical and microarray data with Bayesian networks. Bioinformatics 22 (14), 184–190. Good, I.J., 1979. Studies in the history of probability and statistics, XXXVII. A.M. Turing’s statistical work in World War II. Biometrika 66, 393–396. Good, I.J., 1985. Weight-of-Evidence. a brief survey. In: Bernardo, J.M. et al. (Eds.), Bayesian Statistics 2. Elsevier, Amsterdam, pp. 249–270.

J. Jaworska et al. / Regulatory Toxicology and Pharmacology 57 (2010) 157–167 Good, I.J., 1988. Statistical evidence. In: Kotz, S. et al. (Eds.), Encyclopedia of Statistical Sciences, vol. 8. Wiley, New York, pp. 651–656. Gopnik, A., Glymor, C., Sobel, D.M., Schulz, L.E., Kushnir, T., Danks, D., 2004. A theory of causal learning in children: causal learning and Bayes nets. Psychological Review 111 (1), 3–32. Grindon, C., Combes, R., Cronin, M.T.D., Roberts, D.W., Garrod, J.A., 2006. Integrated testing strategies for the use in the EU REACH system. Alternatives to Laboratory Animals 34 (4), 407–427. Grindon, C., Combers, M., Cronin, M.T.D., Roberts, D.W., Garrod, D.W., 2008. Integrated testing strategies for use with respect to the requirements of the REACH legislation. Alternatives to Laboratory Animals 36, 2–27. Guzelian, P.S., Victoroff, M.S., Halmes, N.C., James, R.C., Guzelian, C.P., 2005. Evidence-based toxicology: a comprehensive framework for causation. Human and Experimental Toxicology 24, 161–201. Hakkinen, P.J., Green, D.K., 2002. Alternatives to animal testing: information resources via the internet and world wide web. Toxicology 173, 3–11. Hansen, B.G., Blainey, M., 2006. REACH: a step change in the management of chemicals. RECIEL 15, 270–280. Hansson, S.O., Rudén, C., 2007. Towards a theory of tiered testing. Regulatory Toxicology and Pharmacology 48, 35–44. Heckerman, D., Geiger, D., Checkering, D.M., 1995. Learning Bayesian networks: the combination of knowledge and statistical data. Machine Learning 20, 197–243. Heckermann, D., 2008. A tutorial on learning with Bayesian networks. In: Holmes, D.E., Jain, L.C. (Eds.), Innovations in Bayesian Networks. Springer, Berlin, pp. 33– 82. Hengstler, J.G., Foth, H., Kahl, R., Kramer, P.-J., Lilienblum, W., Schultz, T.W., Schweinfurth, H., 2006. The REACH concept and its impact on toxicological sciences. Toxicology 220, 232–239. 
Höfer, T., Gerner, I., Gundert-Remy, U., Liebsch, M., Schulte, A., Spielmann, H., Vogel, R., Wettig, K., 2004. Animal testing and alternative approaches for the human health risk assessment under the proposed new European chemicals regulation. Archives of Toxicology 78, 549–564.
Hoffmann, S., Hartung, T., 2006. Toward an evidence-based toxicology. Human and Experimental Toxicology 25, 497–513.
Holmes, D.E., Jain, L.C., 2008. Innovations in Bayesian Networks. Theory and Applications. Studies in Computational Intelligence. Springer, Berlin.
Horvitz, E.J., Breese, J.S., Henrion, M., 1988. Decision theory in expert systems and artificial intelligence. International Journal of Approximate Reasoning 2, 247–302.
Jensen, F.V., Nielsen, T.D., 2007. Bayesian Networks and Decision Graphs. Springer, New York.
JRC, 2005. REACH and the Need for Intelligent Testing Strategies. Institute for Health and Consumer Protection, European Commission, Ispra, Italy.
Kim, B.S., Margolin, B.H., 1994. Predicting carcinogenicity by using batteries of dependent short-term tests. Environmental Health Perspectives Supplements 102, 127–130.
Kirkland, D.J., Aardema, M., Henderson, L., Müller, L., 2005. Evaluation of the ability of a battery of three in vitro genotoxicity tests to discriminate rodent carcinogens and non-carcinogens. I. Sensitivity, specificity and relative predictivity. Mutation Research 584, 1–256.
Kjaerulff, U.B., Madsen, A.L., 2008. Bayesian Networks and Influence Diagrams. A Guide to Construction and Analysis. Springer, New York.
Lilienblum, W., Dekant, W., Foth, H., Gebel, T., Hengstler, J.G., Kahl, R., Kramer, P.-J., Schweinfurth, H., Wollin, K.-M., 2008. Alternative methods to safety studies in experimental animals: role in the risk assessment of chemicals under the new European Chemicals Legislation (REACH). Archives of Toxicology 82, 211–236.
OECD, 2008. Workshop on integrated approaches to testing and assessment. OECD Environment Health and Safety Publications. Series on Testing and Assessment No. 88. OECD, Paris.
Oniśko, A., 2008. Medical diagnosis. In: Pourret, O., Naim, P., Marcot, B. (Eds.), Bayesian Networks. A Practical Guide to Applications. John Wiley & Sons, Chichester, pp. 15–32.

Pearl, J., 1988. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann Publishers, San Mateo, CA.
Pearl, J., 2000. Causality. Models, Reasoning, and Inference. Cambridge University Press, Cambridge.
Pearl, J., Paz, A., 1987. Graphoids: a graph-based logic for reasoning about relevance relations. In: Du Boulay, B. et al. (Eds.), Advances in Artificial Intelligence, vol. II. North-Holland, Amsterdam, pp. 357–363.
Pepe, M.S., 2003. The Statistical Evaluation of Medical Tests for Classification and Prediction. Oxford University Press, Oxford.
Petry, T., Knowles, R., Meads, R., 2006. An analysis of the proposed REACH regulation. Regulatory Toxicology and Pharmacology 44, 24–32.
Pourret, O., Naim, P., Marcot, B. (Eds.), 2008. Bayesian Networks. A Practical Guide to Applications. John Wiley & Sons, Chichester.
Rosenkranz, H.S., Klopman, G., Chankong, V., Pet-Edwards, J., Haimes, Y.V., 1984. Prediction of environmental carcinogens: a strategy for the mid-1980s. Environmental Mutagenesis 6, 231–258.
Salem, H., Katz, S.A., 2003. Alternative Toxicological Methods. CRC Press, London.
Schoerling, I., 2003. The Greens perspective on EU chemicals regulation and the white paper. Risk Analysis 23, 405–409.
Smith, E.P., Lipkovich, I., Ye, K., 2002. Weight-of-Evidence (WOE): quantitative estimation of probability of impairment for individual and multiple lines of evidence. Human and Ecological Risk Assessment 8, 1585–1596.
Spiegelhalter, D., Franklin, R., Bull, K., 1989. Assessment, criticism and improvement of imprecise subjective probabilities for a medical expert system. In: Proceedings of the Fifth Workshop on Uncertainty in Artificial Intelligence. Association for Uncertainty in Artificial Intelligence, Mountain View, California, pp. 335–342.
Steyvers, M., Tenenbaum, J.B., Wagenmakers, E.-J., Blum, B., 2003. Inferring causal networks from observations and interventions. Cognitive Science 27, 453–489.
Tarasov, V.A., Abilev, S.K., Velibekov, R.M., Aslanyan, M.M., 2003. Efficiency of batteries of tests for estimating potential mutagenicity of chemicals. Russian Journal of Genetics 39 (10), 1191–1200.
Tennant, R.W., Margolin, B.H., Shelby, M.D., et al., 1987. Prediction of chemical carcinogenicity in rodents from in vitro genetic toxicity assays. Science 236 (4804), 933–941.
Troyanskaya, O.G., Dolinski, K., Owen, A.B., Altman, R.B., Botstein, D., 2003. A Bayesian framework for combining heterogeneous data sources for gene function prediction (in Saccharomyces cerevisiae). Proceedings of the National Academy of Sciences USA 100, 8348–8353.
Valerio, L.G., 2008. Commentary: tools for evidence-based toxicology: computational-based strategies as a viable modality for decision support in chemical safety evaluation and risk assessment. Human and Experimental Toxicology 27, 757–760.
Van Leeuwen, C.J., Patlewicz, G.Y., Worth, A.P., 2007. Intelligent Testing Strategies. In: Van Leeuwen, C.J., Vermeire, T.G. (Eds.), Risk Assessment of Chemicals: An Introduction. Springer, Dordrecht, pp. 467–509.
Vermeire, T.G., Aldenberg, T., Dang, Z., Janer, G., De Knecht, J.A., Van Loveren, H., Peijnenburg, W.J.G.M., Piersma, A.H., Traas, T.P., Verschoor, A.J., Van Zijverden, M., Hakkert, B., 2007. Selected Integrated Testing Strategies (ITS) for the risk assessment of chemicals. RIVM Report. RIVM, Bilthoven, The Netherlands.
Walker, J.D., Gerner, I., Hulzebos, E., Schlegel, K., 2005. The skin irritation corrosion rules estimation tool (SICRET). QSAR and Combinatorial Science 24, 378–384.
Wang, X.-H., Zheng, B., Good, W.F., King, J.L., Chang, Y.-H., 1999. Computer-assisted diagnosis of breast-cancer using a data-driven Bayesian belief network. International Journal of Medical Informatics 54 (2), 115–126.
Zellner, A., 1998. Bayesian inference. In: Eatwell, J., Milgate, M., Newman, P. (Eds.), The New Palgrave. A Dictionary of Economics, vol. 1. MacMillan, London, pp. 208–218.