REGULATORY TOXICOLOGY AND PHARMACOLOGY 7, 113-119 (1987)
Strengths and Weaknesses of Long-Term Bioassays¹
M. SCHACH VON WITTENAU
Department of Drug Safety Evaluation, Pfizer Central Research, Groton, Connecticut 06340
Received October 20, 1986
The long-term bioassay for carcinogenic potential is a valuable tool for assessing human hazard, but suffers from severe limitations. Some of these follow directly from the use of animals as surrogates for man; others are created by study design and interpretations. It is recommended that much attention be given to dose selection so as not to destroy the validity of the purported model, and that emphasis be placed upon elucidating the nature of the stimuli the test substance provides to the animal. Only after arriving at an understanding of the primary activity will it be possible to judge whether or not a similar response can be expected from man. The recently instituted change by the National Toxicology Program in expressing their interpretive conclusions promises a shift of the investigator’s focus from the response of a test system to the action of the test substance. © 1987 Academic Press, Inc.
¹ Dedicated to Professor Dr. George Büchi on the occasion of his 65th birthday. Presented at the Second Annual Meeting of the International Society of Regulatory Toxicology and Pharmacology, September 22, 1986, Arlington, VA.
0273-2300/87 $3.00. Copyright © 1987 by Academic Press, Inc. All rights of reproduction in any form reserved.

The organizers of this meeting asked me to discuss the strengths and weaknesses of long-term bioassays. Let me summarize. In principle, for identifying a human carcinogenic hazard, experimentation with animals appears to be the best tool we have. It has to its credit some notable successes. I do, however, believe that we abuse this tool and that many of the problems we encounter with the bioassay as currently used are of our own making, largely because we indulge in undisciplined thinking. Examples of the latter are: (1) we avoid providing a meaningful definition of a carcinogen (how can we agree on the appropriate method if we fail to agree on what we are looking for?); (2) we classify substances not by their inherent characteristics, such as reactivity, but by observed responses of test systems, without knowing how such responses are elicited, but in full knowledge of our experience that such responses are test-system-dependent and often can be manipulated at will; (3) we pretend that the rules of biology are uniquely inapplicable in the area of carcinogenesis, and thus insist that mice are men, that a few molecules in each of many mice have the same consequence as many molecules in a few mice, and that the symptom of tumor formation must have only one cause.

Discussions of the bioassay usually generate much heat and frustration, but seldom, if ever, lead to dispassionate examination of its scientific basis. The stakes are high in terms of economics, and scientific reputations are on the line. Yet, I believe, the views expressed usually are sincerely held by the proponents.

It may be useful to recall two experiments which are symptomatic of past and current attitudes. They can be viewed as symbolizing opposite ends of the spectrum of opinions; they were designed to demonstrate preconceived notions, and, as they succeeded in doing so, are cited in support of the respective positions. Both are valid experiments, and both yielded valid data. Neither alone should be taken as the definitive example, but together they serve to illustrate the scope and limitations of the bioassay. While either one is potentially misleading if accepted alone as describing the truth, awareness of both together will help us to approach the stated goal the bioassay is intended to serve.

The first experiment is familiar to all of you. It was born of a Weltanschauung which perceives carcinogenicity as a phenomenon different from any other biological event, and reflects the thinking which has dominated public policy. I am, of course, referring to the ED01, or the mega-mouse, experiment (Innovations in Cancer Risk Assessment, 1979). A chemical generally thought capable of causing cancer in many species produced cancer in mice in a dose- and time-related fashion. While the study contributed little toward resolving the threshold question, it in no way upset the dogma of the validity of animal models, of the maximum tolerated dose (MTD), of the interrelationship between benign and malignant tumors, of the validity of statistical extrapolations, and of the necessity to use as many animals as possible and for as long a time as feasible.
Even genetic toxicologists were happy. Perhaps the most seductive feature of this study and the philosophy it represents was that it seemed to justify transforming cancer research into a mechanical exercise, which required, other than qualified pathologists, only large quantities of money, technicians well trained in good laboratory practices, and computer software describing the latest statistical model.

A totally different perspective is provided by another experiment, which unfortunately has been widely ignored, although it is as significant as the ED01 study. In 1975, Riley published in Science the observation that mice in a noisy environment developed mammary tumors much earlier than their siblings kept in a quiet surrounding (Riley, 1975). Stress was carcinogenic; no exogenous chemical was needed. The result of this experiment is no more surprising than that of the ED01 study, but it makes a very different point. The validity of the mouse as a model for man, at least in quantitative if not also in qualitative terms, is highly suspect. Dose responsiveness likely extends over a very narrow range only, and extrapolations are not credible. Imagine a risk assessment concluding that one minute in the noisy environment carried a 10⁻⁶ risk for man, or that working three years under those conditions would inevitably result in cancer, and basing public policy on that.

These two experiments can be cited in support of views diametrically opposed. The first example tells us that animal studies may yield information of great importance to man, as has been shown for vinyl chloride (Maltoni et al., 1975), and the second that mere tumor counts can be grossly misleading. Obviously, it is wrong to adhere entirely to either position at the outset when attempting human risk assessment. To cite
the Riley study as evidence that all animal data are useless is as irresponsible as quoting the ED01 experiment as fully justifying unquestioning acceptance of animal tumor data as indicating human hazard.

Where, then, are we today? Let us first refresh our memories as to the objective of the bioassay. As this is the same whether sponsored by government or the private sector, I wish to quote, albeit selectively, from the mission defined for NTP in 1978 by then Secretary of Health Califano. “The broad goal . . . is . . . the testing of chemicals of public health concern . . . to provide information to regulatory . . . agencies. . . .” Obviously, the bioassay should provide information upon which public policy decisions can be based.

Predictive investigations rely on models. Within the context of the bioassay, we may view the models as consisting of two major components. One is the biological system we employ, such as a cell culture or a rat, whose response to a stimulus we observe. The second is the provocation we provide to this biological system. Obviously, as long as we have good reason to doubt that either the provocation or the reaction to it falls within the human situation, we cannot infer anything from the observed response. In other words, for a model to be useful, we must be sure that the stimulus we provide reflects human reality, and we must also understand to what extent the reaction of the biological system we use mimics the response of man. A purported model can mislead because we select an inappropriate test system, because we subject it to inappropriate conditions, or both.

For the evaluation of carcinogenic potential in man we still must rely on data from chronic animal studies. In vitro and short-term in vivo experiments at times are useful adjuncts, but do not and cannot possibly provide the universal solution to the problem. For practical reasons, rodents are the animals of choice, usually rats and mice.
A review of accrued data, described in a 1983 paper with my colleague Paul Estes (Schach von Wittenau and Estes, 1983), and additional results seen since then have convinced us that if a rat experiment is conducted appropriately, a mouse study is not needed for reaching the stated goal. The objective is recognition of a human hazard, not the demonstration that under some circumstances in some test systems it is possible to elicit a tumorigenic response.

As the Riley study demonstrated, the suitability of rodents to function as models for man is limited because a given stimulus may elicit a tumor response from the animal but not from humans. Liver tumors in rodents treated with enzyme inducers (Clemmesen and Hjalgrim-Jensen, 1978), thyroid tumors in rats associated with sulfonamides (Swarm et al., 1973), and leiomyoma in uterine ligaments in rats following chronic exposure to β2-agonists (Physicians’ Desk Reference, 1986) are further examples of instances where animal findings are believed not to be predictive of the human reaction.

In addition to these inherent shortcomings of the animals, the experimental design, if inappropriate, can destroy the potential usefulness of the rodent. Beyond selecting the species to be used, choosing the conditions of exposure is the most important task. Two considerations need to be balanced. The ED01 study seems to suggest that high doses yield more of the same tumors as do low doses, and that greater sensitivity is achieved with high doses. This led to the opinion that doses should be as high as is compatible with the animals’ growth and survival. In an attempt to further strengthen this view, it was pointed out in a recent survey (Haseman, 1985) that of 31 chemicals tested in feeding studies using two doses, only 5 were
declared carcinogenic based upon low- and high-dose data, while an additional 8 were so labeled because of effects seen at the high dose only; i.e., 8 compounds would have escaped “detection” if the high dose had not been used. There are several problems with this reasoning. First, 3 additional substances showed tumor increases at the low dose only. These findings were dismissed as of little relevance, since there was no dose response. If the tumor incidences for these 3 chemicals were elevated due to chance at the low dose, how many of the 8 with high incidences in the high dose only were similar chance occurrences? Second, a perusal of the data obtained with compounds not declared carcinogenic revealed that, in most instances, for some tumor types, the control groups had a statistically significantly higher incidence than the high-dose groups. Do we conclude that we have discovered anticancer drugs? Third, the finding raises the suspicion that very high doses predispose to cancer, and thus mislead. In this context, it is noteworthy that 10 of the 13 putative carcinogens were found positive in one species only. Interpreting the meaning of these observations becomes very difficult indeed.

The Riley experiment unequivocally demonstrates the dependence of tumor emergence upon distortion of the test system’s homeostasis. This, as well as other considerations derived from metabolism studies and generally applicable rules of biology, tells us that great care must be taken in selecting the dose. By definition, a minimum toxic dose affects balances within cells, or organs, or whole animals and thereby may introduce confounding factors, which make the result uninterpretable or outright misleading. To name just a few extreme and well-known examples, stone formation in the urinary tract is often associated with tumors (OSTP, 1984), and excessive doses of selenium result in liver necrosis (Selenium in Animal Feed, 1973), which, in rodents, predisposes to cancer.
Other primary effects less easily recognized, but probably of similar impact, may be distortions of nutritional status or of hormonal homeostasis. In other words, inappropriate doses can provide a stimulus to which the animal responds with tumor formation. Humans may never experience the same provocation but, even if they did, might respond differently.

Inappropriately designed experiments can generate what I have called “pseudocarcinogens.” These are substances which elicit a carcinogenic response only under very specific, highly model-dependent circumstances. When evaluated in more than one species, they often show apparent oncogenic effects only at excessive doses, and frequently little concordance between species is observed. These compounds may provide a variety of stimuli to which the test system responds in a system-specific manner with an increase in tumor incidences. A finding of apparent tumorigenicity is more indicative of the test system’s characteristics than of a general carcinogenic potential of the chemical evaluated. For pseudocarcinogens, extrapolating animal data to human risk is not justified. The OSTP paper makes a point of this (OSTP, 1985).

It is apparent that dose selection is a judgment call, which is particularly difficult because our testing approach has never been properly validated. We do not know how to differentiate the true, transspecies carcinogens from the artifactual pseudocarcinogens created by faulty design of the experiment or from those active in one species only. We know that our past practices have labeled substances carcinogenic, of which 30-50% appear to exhibit such activity only in either the mouse or the rat, but not in both (Schach von Wittenau and Estes, 1983; Haseman, 1985; Purchase, 1980; DiCarlo, 1984; DiCarlo and Fung, 1984). When one rodent is not predictive for another, why should it be for man? To say “most human carcinogens are animal carcinogens,
therefore, animal carcinogens are human carcinogens” is not persuasive. Do we assume all humans to be men because all men are humans?

Careful dose selection and appropriate bioassay designs are so important because the past has come to haunt us. The scientific community is perceived by the public as having taught that carcinogenic activity is an inherent property of matter, different in kind from other potential toxicity; that chemicals apparently carcinogenic in animals represent a carcinogenic hazard to man; that such a chemical is carcinogenic at any dose; and that the bioassay reflects good, thoughtful science. It is wrong to blame laymen for the Delaney Clause and the thinking it expresses; scientists created those concepts, and scientists defended them with an apparent certitude not warranted by the evidence. It is, therefore, not surprising that a “positive” bioassay gets immediate attention, and that the public clamors for regulatory action. If a real hazard has been identified, well and good, but if a “pseudocarcinogen” was created by inappropriate design or interpretation, we have the moral equivalent of crying “fire” in a crowded theater. Consequently, in contrast to other toxicology experiments whose meaning can be calmly elucidated, bioassays must be designed such that they are valid models for the human situation. This is the obligation of the experimenter.

Occasionally, the public does not believe what it is told. The rejection by Congress of the proposed ban of saccharin must be understood as criticism of the scientific community for not providing a credible definition of a carcinogen. Inasmuch as the public is probably correct in this instance, such protest is beneficial because it forces reexamination of doctrines, and facilitates acceptance of less doctrinaire positions. Indeed, we are witnessing change.
The OSTP “Review of the Science and Its Associated Principles” (OSTP, 1985) related to chemical carcinogens has to be considered a milestone because it attempts to delineate the complexity associated with carcinogenic risk assessment. Although some of us might have hoped for a cleaner break with the then prevailing dogmata, in fairness one should say that the paper goes a long way in preparing the ground for different concepts to develop. Most importantly, the message clearly comes through that tumors in animals may have many causes, some relevant to man, some not, and that consequently, the design and interpretation of animal studies must carefully consider those factors that could confound the meaning of observed tumorigenesis. One perceives the beginning of a movement toward merging the tenets of carcinogenesis with the concepts guiding toxicology.

Symptomatic of the same conceptual changes is the recent decision by the NTP to modify the language which describes the conclusions drawn from the bioassay (NTP, 1986). Although it is too early to know how this modification will be implemented, potentially the impact can be profound. The change in wording seems minor, yet it forces a complete reorientation. The old phrase, such as “evidence of carcinogenicity,” merely records the observed response of the test animals; it often appeared to simply reflect the level of statistical significance. The new terminology, “evidence of carcinogenic activity,” has meaning only if it describes what action the test substance exerts upon the animal.
For example, in the instance of selenium, the old system might have stated “some evidence of carcinogenicity,” while the new approach may say “evidence of hepatotoxicity”; for phenobarbital, “some evidence for carcinogenicity” would be replaced by “evidence of P-450 enzyme induction”; and if thyroid tumors in male rats result from administration of an antithyroidal compound, there would be “evidence of antithyroid activity” rather than “evidence of carcinogenicity.” Biologists would not be surprised that the stimuli cited result in tumor proliferation in rodents, yet they would not consider these substances to pose a carcinogenic risk to man under foreseeable conditions of use. Laymen, however, would be spared unnecessary anxiety because a carcinogenic risk to man would no longer be implied. Of course, substances such as 2-acetylaminofluorene, nitrosamines, and aflatoxin, the true transspecies carcinogens, deserve the label “clear evidence of carcinogenic activity.”

The new approach quite properly requires from the experimenter not only greater attention to ancillary data obtainable from the same experiment, but also demands integration of all available data before drawing conclusions. More description of evidence related to possible mechanisms is expected, and, in the end, more judgment is called for. In short, this change in nomenclature hopefully signals the beginning of a transformation of the rote bioassay into a meaningful toxicology study.

In summary, we must identify potential human carcinogenic hazards. We work in a climate which requires that we do not erroneously label chemicals as carcinogenic when they are not. We know that the animals we use can respond with tumor formation to many types of stimuli, some of which do elicit the same reaction from man, others do not, and several will not be experienced by man. It therefore seems to follow that prior to starting the experiment, we have to ask three questions: the first two address the validity of our model, the third its internal soundness.

1. Is the dose within the pharmacokinetic range relevant to man? If at low doses metabolic pathways are different from those at high doses, then at different doses different chemicals are tested. Which situation is relevant to man?

2. Does the dose provide a stimulus not likely to be exerted in man under foreseeable conditions of use? As a variety of stimuli/mechanisms can elicit apparent tumor formation, the model becomes invalid once the animal experiences effects humans do not (excessive pharmacologic activity, toxicity, hormonal imbalance, etc.).

3. At what dose is the physiology of the medicated animals altered to such an extent that they can no longer be legitimately compared to the control animals? In testing an antibiotic, gnotobiotic animals are compared to those with a normal intestinal flora. Each population has its own background incidence of tumors. Which tumors reflect the altered physiological condition, and which result from chemical carcinogenesis?

After completion of the study, we again ask the same three questions and determine whether or not a carcinogenic response can be observed. Statistics may play a useful role at this stage. If there is evidence for a carcinogenic response, our task starts to become challenging; we now search for information which allows us to recognize the stimulus the substance provided; i.e., we try to determine the activity of the chemical or, if you wish, to identify the mechanism. Our conclusion then is expressed by describing the strength of the evidence for carcinogenic activity. Only after we have accomplished that difficult task are we justified in attempting human risk assessment. This latter endeavor is outside of the topic, but it follows from the emphasis I put on mechanism that, in my view, biological considerations are pivotal, while statistical modeling plays a secondary role only (Schach von Wittenau, 1979).
CONCLUSION

The long-term bioassay uses rodents as models for man. The validity of the experiment is limited by different responsiveness of the animals, and by study design. To obtain the optimal information from the experiment, great care must be exercised not to invalidate a potentially useful model by faulty study design. Elucidation of the nature of the test substance’s provocation of the animal is pivotal in judging human hazard, as the responses to such stimuli may differ for rodents and man.

REFERENCES

CLEMMESEN, J., AND HJALGRIM-JENSEN, S. (1978). Is phenobarbital carcinogenic? Ecotoxicol. Environ. Saf. 1, 457-470.
DICARLO, F. J. (1984). Carcinogenesis bioassay data: Correlation by species and sex. Drug Metab. Rev. 15, 409-413.
DICARLO, F. J., AND FUNG, V. A. (1984). Summary of carcinogenicity data generated by the National Cancer Institute/National Toxicology Program. Drug Metab. Rev. 15, 1251-1273.
HASEMAN, J. K. (1985). Issues in carcinogenicity testing: Dose selection. Fundam. Appl. Toxicol. 5, 66-78.
Innovations in Cancer Risk Assessment (ED01 study). (1979). Proceedings of a Symposium Sponsored by the National Center for Toxicological Research, U.S. Food and Drug Administration, and The American College of Toxicology (J. A. Staffa and M. A. Mehlman, Eds.), Pathotox, Park Forest South, IL.
MALTONI, C., CILIBERTI, A., GIANNI, L., AND CHIECO, P. (1975). Carcinogenicity of vinyl chloride administered by the oral route in rats. Osp. Vita 2, IO- 109. [in Italian]
National Toxicology Program (NTP) (1986). Notice of modifications in the levels of evidence of carcinogenicity used to describe evaluative conclusions for NTP long-term toxicology and carcinogenesis studies. Fed. Regist. 51(66), 11843 (April 7, 1986).
Office of Science and Technology Policy (OSTP) (1984). Chemical carcinogens; Notice of review of the science and its associated principles (May 1984). Fed. Regist. 49(100), 21594 (May 22, 1984).
Office of Science and Technology Policy (OSTP) (1985). Chemical carcinogens: A review of the science and its associated principles (February 1985). Fed. Regist. 50, 10371-10442 (March 14, 1985).
PURCHASE, I. F. H. (1980). Interspecies comparisons of carcinogenicity. Br. J. Cancer 41, 454-468.
Physicians’ Desk Reference (1986). Proventil, p. 1650.
RILEY, V. (1975). Mouse mammary tumors: Alteration of incidence as apparent function of stress. Science 189, 465-467.
SCHACH VON WITTENAU, M. (1979). Cancer risk assessment. Science 206, 1258-1260.
SCHACH VON WITTENAU, M., AND ESTES, P. C. (1983). The redundancy of mouse carcinogenicity bioassays. Fundam. Appl. Toxicol. 3, 631-639.
Selenium in Animal Feed (1973). Fed. Regist. 38(81), 10458-10461 (April 27, 1973).
SWARM, R. L., ROBERTS, G. K. S., LEVY, A. C., AND HINES, L. R. (1973). Observations on the thyroid gland in rats following the administration of sulfamethoxazole and trimethoprim. Toxicol. Appl. Pharmacol. 24, 351-363.