Toxicology in Vitro Vol. 8, No. 4, pp. 787-791, 1994
Pergamon
0887-2333(94)E0066-3
Copyright © 1994 Elsevier Science Ltd. Printed in Great Britain. All rights reserved. 0887-2333/94 $7.00 + 0.00
Session 4

THE ROLE OF MECHANISTIC TOXICOLOGY IN TEST METHOD VALIDATION

J. M. FRAZIER

The Johns Hopkins Center for Alternatives to Animal Testing, The Johns Hopkins University, 615 N. Wolfe Street, Baltimore, MD 21205, USA

Summary--There are two approaches to in vitro toxicity test validation, phenomenological and mechanistic. The phenomenological approach uses correlative mathematical techniques, with no regard to the identification of mechanistic relationships, to relate in vitro measurements of toxicity to in vivo toxicological responses in order to establish the validity of the methods under consideration. This approach has three major limitations: (1) success or failure of a particular test will depend critically on the selection of test chemicals; (2) the reason why a chemical fails in a particular test is unknown; (3) without additional information there is no rational basis for extrapolation to new cases lying outside the domain of validation. The mechanistic approach addresses all of these issues: (1) mechanistic considerations are included in the selection of chemicals for validation; (2) the failure of a particular test to identify a given toxin means that the toxin does not act through the mechanism evaluated by the test, which is useful toxicological information; (3) any chemical that acts by the mechanism evaluated by the test will be identified. The major limitation of the mechanistic approach is our lack of knowledge concerning in vivo mechanistic toxicology.
Introduction

The term 'mechanism' is open to diverse interpretations by researchers in the field of toxicology. This has often led to confusion in discussions of the validation of new testing methods. The following definition will be used here: a mechanistic explanation of an observed phenomenon is an explanation that describes the processes underlying the phenomenon in terms of events at lower levels of organization. In the case of toxicological phenomena, a description of in vivo lethality in terms of target organ toxicity would be a mechanistic description of lethality at the tissue pathology level. A mechanistic description of the tissue or organ responses would entail a description of the molecular or cellular events that contribute to tissue pathology. An even more basic description would be to describe the molecular initiation reactions (reactions of the toxicant with molecular targets) at the quantum mechanical level. These are all mechanistic descriptions at different levels of biological organization. What we label as mechanistic is entirely determined by the level of biological organization at which we observe a given phenomenon. In the context of toxicity test method validation, more than one mechanism must be considered. First, there is the toxicological mechanism that is evaluated by the test method under consideration. This is mainly determined by the endpoint measurement used in the
test method. For example, if protein synthesis is the endpoint measurement, the test method will evaluate whether the test chemical interferes with cellular macromolecular synthesis, but it does not evaluate effects on cellular energy production/mitochondrial function or stability of cellular DNA directly. The second mechanism to consider is the one by which a chemical produces its toxic effects. Most toxicologists refer to this category of mechanism as the mechanism of action, and associate it with a description of the initial molecular/cellular events that ultimately lead to observable toxic effects. Whether or not the mechanism by which a chemical produces its toxic effects and the mechanism evaluated by a particular test overlap will determine whether that particular test will be appropriate to identify the fundamental toxicity of the chemical in question. It should also be recognized that a mechanistic description of the toxicological effect of a chemical is a qualitative statement. It is a verbalization of our understanding of the causal connections between events at a lower level of biological organization and the ultimate expression of an observable pathological response. For mechanistic descriptions of toxicological phenomena to be of greater value in the validation of new toxicity testing methods it is necessary to quantify the processes involved. The issue of quantification is addressed later in this review; first, I will consider validation of new in vitro methods in toxicology.
Validation

The scientific community has been occupied with the issue of the validation of new in vitro methods for the past few years. Several issues will be reviewed here: (1) definition of validation, (2) procedure for validation, (3) criteria for evaluation, and (4) status.
A practical definition of validation has been proposed in the report of the CAAT/ERGATT Amden workshop (Balls et al., 1990): "Validation is the process by which the reliability and relevance of a new method is established for a specific purpose". The choice of words in this definition was carefully debated by the workshop participants and should be elaborated on. First, validation of a method is a process which, in all likelihood, evolves continuously. Validation never really stops, since any new information concerning the reliability or relevance of the method merely adds to our understanding of its range and limitations (its domain of validity). Secondly, the process consists of two components: reliability and relevance. 'Reliability' refers to establishing how reproducible the results are over space and time; 'relevance' refers to understanding the usefulness of the data produced by the method for the particular purpose for which the test is proposed. This definition suggests what is conceptually needed for validation but certainly does not define a unique procedure to attain the goal or, more importantly, how to decide when the process is complete. As mentioned above, validation is never finished; however, there must be some key landmarks that can indicate progress.

Procedures for validation and criteria for evaluation
How validation is accomplished (the procedure) cannot be given as a unique prescription. Several reports have discussed important components of the process that must be taken into consideration (Balls et al., 1990; Ekwall, 1992; Frazier, 1990 and 1993). The issues of reliability (ruggedness) can be established by intra- and interlaboratory comparative studies. The question of relevance is much more difficult to establish in the realm of toxicity testing and risk assessment. Consider an analogy: suppose a new method for measuring distance was developed in a research laboratory. The validation of the new method would be relatively straightforward. The new candidate method would be used to measure a reference standard under a range of conditions that would be expected in the real world (e.g. at different temperatures, under different conditions of humidity and vibration, with different technicians running the test, etc.). The reference standard used for the validation process in the field is not usually the 'gold' standard, which is kept in a controlled environment under lock and key, but a secondary standard which is an accurate approximation of the 'gold' standard and can be taken out into the field for on-site validation
purposes. The criterion used to evaluate the validity of the method is not an absolute 'yes' or 'no', but in fact consists of establishing the limitations of the method (i.e. what is the precision of the method, under what conditions does the method work and not work, within what range of accuracy does the method agree with the secondary standard and therefore measure reality?). The answers to these questions constitute the end result of the validation process, and anyone who wishes to make a measurement of a length can then decide whether the new method is sufficiently precise and accurate for their intended purposes under the expected conditions of use. The analogy with validation of new in vitro methods is useful for elucidating the problems we face. The first major problem is that we do not have a 'gold' standard, much less a secondary standard, to evaluate the validity of new methods. The 'gold' standard would be a large set of test chemicals with an appropriate distribution of target organ toxicities and a human toxicological database consisting of (1) known mechanisms of action at the molecular/cellular level, and (2) adequate toxicokinetic data (concentration-time profile) to establish an in vivo concentration-response relationship at the target cells. Unfortunately, we do not have a large collection of test chemicals that meet these criteria and can be used to investigate the predictive power of new in vitro methods. In the absence of a 'gold' standard, the secondary standard, by default, is in vivo data from animal models (mostly from rodents). In general, this database is also inadequate, since mechanisms of action are still largely unknown and many in vivo toxicity studies do not collect the data required for validation purposes: early molecular/cellular markers of toxic responses in vivo and plasma concentration-time profiles.
The second major problem is that generally we do not know exactly what the test method under evaluation actually measures in a mechanistic sense. When we run an MTT assay in vitro we know that the endpoint measurement (the optical density of the extracted blue dye) is related to the mitochondrial reducing capacity of the test cells during the course of the assay incubation. If the endpoint response in a cellular test system is inhibited by exposure to the test chemical, does this mean that direct poisoning of the mitochondrial energy pathways is the primary cellular response to the chemical, or does it mean that the chemical killed the cell by some other mechanism and inhibition of mitochondrial function is a secondary effect? In most cases when using single endpoint assays at single time points it is impossible to discriminate between many mechanistic possibilities. Thus, a mechanistic approach to toxicity testing necessarily implies that many toxicity endpoints must be evaluated over time in order to determine adequately critical cellular responses. Therefore we are faced with the situation that if we want to validate new toxicity test methods, we must
establish some measure of relevance, and to do this we must compare two ill-defined sets of data: in vitro test data and in vivo toxicity data. There have been numerous 'validation' projects over the last few years (Frazier, 1993), which have attempted to solve the practical and theoretical problems that arise. All of these projects have evaluated individual in vitro methods, using the test system to generate concentration-response data for a selection of test chemicals, and then compared these data sets with some measure of in vivo toxicity using statistical techniques. Little consideration has been given to the mechanisms of action of the selected chemicals. Thus, from a mechanistic point of view, the selection of chemicals has been essentially random (selection of chemicals belonging to a similar chemical class may minimize this problem in some studies). Given the discussion above concerning the importance of mechanistic knowledge in the validation process, it is to be expected that most good mechanistic tests will give poor statistical correlations under these conditions, since many of the chemicals selected will elicit their toxic effects by mechanisms other than those that the test can detect. The traditional evaluation techniques of determining false positives and false negatives, or simple rank correlations, are not adequate for validation purposes. It is imperative to develop new techniques to evaluate new tests appropriately.
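The false-negative problem described above can be made concrete with a toy calculation (all numbers are invented for illustration, not real assay data): a test that perfectly detects every chemical acting through its own mechanism still appears to "miss" most toxicants when the validation panel mixes mechanisms.

```python
# Hypothetical illustration: a mechanistically specific in vitro test
# evaluated against a mixed panel of in vivo toxicants. The test is
# assumed to flag every chemical acting via its mechanism and none
# acting via other mechanisms. All figures are invented.

def apparent_sensitivity(n_matching, n_other):
    """Fraction of in vivo toxicants the test flags, given n_matching
    chemicals that act via the test's mechanism and n_other that do not."""
    detected = n_matching              # true positives for this mechanism
    total_toxic = n_matching + n_other
    return detected / total_toxic

# Panel of 50 in vivo toxicants; only 15 act via the mechanism the
# test actually measures (hypothetical split).
print(apparent_sensitivity(15, 35))    # 0.3 -> 70% apparent 'false negatives'
print(apparent_sensitivity(15, 0))     # 1.0 on a mechanism-matched panel
```

The point is that the low score says nothing about the quality of the test; it reflects the mechanistic composition of the chemical panel, which is why panel composition must be part of any evaluation criterion.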
Phenomenological approaches to validation

There are several approaches to evaluating quantitatively the validity of in vitro methods. The two approaches considered here are the phenomenological approach and the mechanistic approach. The phenomenological approach involves a direct comparison of the in vitro endpoint measure and an in vivo data set (e.g. a comparison of the EC50 of the in vitro test method with the LD50 data for a set of standard test chemicals). If the LD50s are known with absolute certainty (which is seldom the case), a mathematical model can be proposed to relate the observed in vitro EC50s to the in vivo LD50s, regression analysis can be used to obtain the optimal fit of model parameters, and correlation coefficients can be computed to evaluate goodness of fit. The mathematical relationship chosen for this computation will affect the quantitative outcome. For example, a linear model can be selected, linear regression analysis conducted and a correlation coefficient determined. However, the selection of a non-linear model may result in a better fit, in the sense that a better correlation coefficient may result. The question is which mathematical formulation correctly describes the relationship between the in vitro test results and the in vivo measure of toxicity. A priori, the answer is not apparent from theoretical considerations. The underlying issue is whether the mathematical relationship, which relates the in vitro test response in concentration units to the in vivo response in dosage
units, is a linear function or not. In theory, the mathematical order of the function can be elucidated empirically by studying a large collection of reference chemicals and determining mathematically the best fit. The more chemicals included in the study, the greater the likelihood of correctly determining the order of the relationship. Unfortunately, the success of this approach depends on the assumption that all of the reference chemicals used in such a study fall into a mechanistically similar toxicological category (i.e. produce their adverse effects by a similar mechanism). If the chemicals fall into many different categories of toxic mechanisms, then it should not be expected that they all exhibit the same functional relationship between the EC50 and the LD50. By using a (mechanistically) mixed collection of reference chemicals, the consequence will be significant scatter in the data, hiding the true mathematical order of the functional relationship. The real barrier to success of the phenomenological approach is that toxicology as a science has not satisfactorily established the mechanistic classification of the in vivo toxicity of a sufficient number of chemicals to allow a rational selection of test chemicals for correlative validation to progress. A second form of phenomenological validation is to determine the ranking of test chemicals and compare in vitro with in vivo data using rank correlations (or a concordance evaluation). This approach has the advantage of not requiring a specific choice of the mathematical order of the relationship, but to be successful the 'mapping' of the ranking of the in vitro to the in vivo data must have no crossovers (i.e. the two data sets must in fact have the same inherent ranking of the reference chemicals). This assumption has the same limitation as discussed above.
If all the reference chemicals in a validation collection are mechanistically similar, then there is a rational basis for expecting that the inherent ranking is the same in vitro and in vivo. However, if chemicals of different mechanistic categories are included in the collection, then crossovers would be expected and the evaluation fails, since the essential criterion for success can never be fulfilled. Even if it were possible to obtain several distinct collections of reference chemicals, with all of the chemicals in each class having a similar mechanism of toxicity, and to determine empirically correlative predictor relationships for each class separately, the ultimate limitation of the phenomenological approach for predictive purposes would be the determination of the mechanistic classification of a new test chemical. Without an independent determination of the mechanistic category of the test chemical, it would be impossible to predict the LD50 reliably, since the LD50 value predicted would depend on which predictor relationship was selected. Phenomenological relationships certainly have their value in detecting possible toxicity (hazard identification), and furthermore may be sufficient for certain decisions in
product development (screening tests), but such approaches are not adequate for definitive toxicological decision making since the inherent uncertainties are not quantifiable.
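The crossover argument can be sketched numerically (all data invented for illustration): two hypothetical mechanistic classes that map EC50 to LD50 by different scale factors each rank perfectly on their own, but pooling them introduces crossovers that lower the rank correlation.

```python
# Hypothetical sketch of the rank-correlation ('crossover') argument.
# Class A chemicals are assumed to follow LD50 ~ 10 x EC50, class B
# chemicals LD50 ~ 1000 x EC50. All numbers are illustrative only.

def ranks(vals):
    """1-based ranks of the values (assumes no ties)."""
    order = sorted(range(len(vals)), key=lambda i: vals[i])
    r = [0] * len(vals)
    for place, i in enumerate(order, start=1):
        r[i] = place
    return r

def spearman_rho(xs, ys):
    """Spearman rank correlation for untied data:
    rho = 1 - 6*sum(d^2) / (n*(n^2 - 1))."""
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))

ec50_a, ld50_a = [0.1, 1.0, 10.0], [1.0, 10.0, 100.0]        # class A (invented)
ec50_b, ld50_b = [0.3, 3.0, 30.0], [300.0, 3000.0, 30000.0]  # class B (invented)

print(spearman_rho(ec50_a, ld50_a))                     # each class alone: 1.0
print(spearman_rho(ec50_a + ec50_b, ld50_a + ld50_b))   # pooled: crossovers lower rho
```

Within either class the inherent ranking is preserved (rho = 1), but in the pooled panel a class B chemical with a modest EC50 outranks class A chemicals in vivo, and rho drops to about 0.71 even though both underlying relationships are perfectly orderly.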
Mechanistic approach to validation

The second general approach to validation is mechanistically based. The theoretical basis for developing a mechanistic description of in vivo toxicity lies in an understanding of the toxicological process (Fig. 1) and quantification of this process in terms of the dose-response relationship. The conceptual connection between exposure and a toxicological effect can be segregated into three components: toxicokinetics, initiation and toxicodynamics. There is an underlying causal connection between these components (e.g. initiation of toxicity by the molecular interaction of the toxicant with a molecular target cannot occur before the arrival of the toxicant or its metabolite at the site of action). Similarly, pathological responses at the organ level do not develop before responses at a lower level of biological organization. Thus, the probability P_in vivo(TE,t:E) that a given exposure (E) will result in an observed toxicological effect (TE) at some time, t, can be described as the product of three terms:

P_in vivo(TE,t:E) = P_K(X,t:E) x P_I(CR,t:X) x P_D(TE,t:CR)

where P_K(X,t:E) is the probability that, given a particular exposure (E), a concentration X of the active form of the toxicant will occur at a specific tissue, P_I(CR,t:X) is the probability that, given a particular concentration of the active toxicant (X), there will be a cellular toxic response (CR), and P_D(TE,t:CR) is the probability that, given a particular cellular response, ultimately a measurable toxic effect will be expressed at a higher level of biological organization.

[Fig. 1. Schematic representation of the toxicological process in vivo: exposure in the environment, toxicokinetics (absorption, distribution, storage and excretion), initiation at the molecular level, and toxicodynamics leading from the cellular level to the organism level.]

Any one of these three factors, or a combination, can determine target organ toxicity in vivo. In vitro toxicity testing systems are best suited as models to describe the central events of the toxicological process, initiation (i.e. early molecular and cellular effects). Thus, a mechanistic validation of an in vitro toxicity testing system should compare the responses in the in vitro test with the appropriate in vivo correlates. As an example, suppose an in vitro toxicity test based on human hepatocytes was developed to evaluate the potential of new chemicals to cause acute hepatotoxicity in humans. The appropriate 'gold standard' for validation would be a set of concentration-response curves that describe the relationship between a measure of hepatocyte exposure (possibly the peak plasma concentration of the chemical or the area under the plasma concentration curve) in humans after a single exposure to the chemical versus an early marker for hepatotoxicity, such as a serum enzyme marker, for a collection of reference chemicals with known mechanisms of action. If the mechanistic hypothesis is correct, then the in vitro concentration-response curves would resemble, qualitatively and quantitatively, the in vivo curves, assuming that the endpoint measurement of the in vitro test is an adequate marker for the mechanism of action of the reference chemical set. Again, a major limitation of the validation process is the ability to select a set of reference chemicals that have adequate toxicity data and defined mechanisms of toxicity. The advantage of the mechanistic approach to validation is that factors that confound the overall dose-response relationship (toxicokinetic and toxicodynamic factors) are isolated from the validation process, and the performance of the test is evaluated against the appropriate standard which reflects the mechanistic basis of the test system.

If in vitro methods ultimately are to form the basis of a chemical safety evaluation program, then it will be necessary to develop in vitro tools that can add back into the equation the kinetic and dynamic factors that modulate intrinsic cellular toxicity in vivo.
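The multiplicative decomposition of the in vivo probability can be illustrated with a small numerical sketch (all probability values are invented for illustration): an isolated-cell test probes mainly the initiation term, while in vivo the kinetic and dynamic terms modulate the observed outcome.

```python
# Hypothetical numerical sketch of the decomposition described above:
#   P_in_vivo(TE,t:E) = P_K(X,t:E) * P_I(CR,t:X) * P_D(TE,t:CR)
# The three factor values below are invented, not measured.

def p_in_vivo(p_kinetic, p_initiation, p_dynamic):
    """Probability that exposure E yields toxic effect TE by time t,
    as the product of the toxicokinetic, initiation and toxicodynamic
    terms of the decomposition."""
    for p in (p_kinetic, p_initiation, p_dynamic):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
    return p_kinetic * p_initiation * p_dynamic

# An in vitro test on isolated cells probes mainly the middle term,
# P_I: kinetics and dynamics are effectively bypassed (set to 1).
print(p_in_vivo(1.0, 0.6, 1.0))   # in vitro view of initiation alone: 0.6
print(p_in_vivo(0.5, 0.6, 0.8))   # in vivo: kinetic/dynamic factors reduce it
```

This is the sense in which "adding back" kinetic and dynamic factors matters: the same intrinsic cellular response (here 0.6) yields a smaller in vivo probability once the first and third terms fall below unity.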
Domain of validation/domain of validity

The range of applicability of a validated test is another issue that must be considered. A test is validated for a limited set of reference chemicals. Usually the chemicals selected are chosen on the basis of the availability of in vivo toxicity data, and chemical structural considerations are often neglected, or are only a secondary consideration. The set of chemicals tested in the validation study defines the domain of validation of the method within the universe of all possible chemical structures. The actual domain of validity (i.e. all chemicals that would be correctly evaluated by the test) is presumably much larger. When selecting chemicals for validation of tests there should be an attempt to optimize, on a structural basis, the domain of validation in order to form the basis for extrapolation of the method to its maximum range of applicability. These issues have not been adequately explored in the context of validation. The approach taken for the extrapolation of a method to chemical classes outside the domain of validation of the test will depend on whether a phenomenological or mechanistic validation was applied. Phenomenologically validated tests have no basis for extrapolation to chemical classes outside the domain of validation, since there is no basis for assuming the predictor equation will apply to any new class of chemicals. It is possible to interpolate a phenomenologically validated method to new chemicals that fall within the defined class of reference chemicals used to establish the predictor relationship. However, as discussed above, for a phenomenologically based method to be fully reliable, even for interpolation, an operational procedure must be described to certify that a new chemical to be evaluated does in fact fall into the same mechanistic class as the reference chemicals.
Without some assurance that a new chemical falls within the domain of validation, the uncertainty in the predicted result would make it unacceptable for many decision-making purposes, including regulatory decisions. On the other hand, if the test method is validated in the mechanistic sense (i.e. there is a reasonable level of confidence that a chemical which is positive for a particular mechanistically defined toxicological endpoint can be expected to exhibit the same response in vivo), then the in vitro test will presumably be a marker for that effect regardless of the chemical class of the test material. Thus, a mechanistically validated test will have a domain of validity that is identical to the universe of all chemicals (i.e. out of all chemicals in the universe, the test will identify
those chemicals that act through the mechanistically defined endpoint). As with all model systems, there will always be some exceptions that are attributable to artefacts in the test system, but in principle the method will have universal applicability. This is a major advantage of mechanistic validation.
Conclusions

The validation of new in vitro methods for toxicity testing is still a difficult barrier to surmount. The main reason is that the mechanistic basis of in vivo toxicity is still poorly understood. In pregnancy testing, there is a single biological condition to be ascertained, and definitive biomarkers have been identified that can be determined in vitro. These biomarkers are necessary and sufficient conditions for the biological state to exist. Thus, pregnancy testing is an example of an ideal situation for the replacement of an in vivo test with an in vitro test. Toxicity testing is not so simple: more than one pathological condition can develop; definitive biomarkers (i.e. necessary and sufficient conditions for the development of each of the potential pathological conditions) have not been identified by in vivo toxicological research; and the mechanistic bases of the measurements in many of the in vitro test systems are not fully understood. In the light of these limitations in the scientific basis of in vitro toxicity testing, it is not surprising that validation of the new methodology has presented such a perplexing problem. For these reasons, in vitro toxicity testing methods are still in the early stages of development. Much of the current research is observational and empirical, as would be expected early in the evolution of a new scientific endeavour.

Acknowledgements--This work has been supported by the Johns Hopkins Center for Alternatives to Animal Testing and the National Institutes of Health, National Center for Research Resources (P40-RR05236).
REFERENCES
Balls M., Blaauboer B., Brusick D., Frazier J., Lamb D., Pemberton M., Reinhardt C., Roberfroid M., Rosenkranz H., Schmid B., Spielmann H., Stammati A. and Walum E. (1990) Report and recommendations of the CAAT/ERGATT workshop on the validation of toxicity testing procedures. ATLA 18, 313-336.
Ekwall B. (1992) Validation of in vitro tests for general toxicity. AATEX 1, 127-141.
Frazier J. M. (1990) Scientific Criteria for Validation of In Vitro Toxicity Tests. OECD Monograph No. 36, pp. 1-66. Organisation for Economic Cooperation and Development, Paris.
Frazier J. M. (1993) CAAT Technical Report 5: International Status of In Vitro Test Validation. The Johns Hopkins Center for Alternatives to Animal Testing, Baltimore, MD.