Thoughts on limitations of animal models

Thomas Hartung
EU Joint Research Centre, Institute for Health and Consumer Protection (IHCP), European Centre for the Validation of Alternative Methods (ECVAM), 21020 Ispra (VA), Italy
Abstract

Animal models are a fundamental tool in the life sciences. They have advantages and disadvantages compared with other approaches, and in a few instances they represent the only reasonable approach. Increasingly, however, modern methods allow the 3R principle of reducing, refining and replacing animal experiments to be put into practice, as required wherever possible by European legislation. This article summarizes limitations typical of animal models. This does not mean that each and every limitation holds true for all animal models, or that the alternatives have fewer limitations. A serious assessment of the models used is necessary to draw conclusions and make decisions in an evidence-based manner. Only by assessing the performance characteristics of any tool used in research can the results be interpreted and the method employed be complemented properly.

Keywords: Animal models; Validation; Limitations; Evidence
No model is perfect. The term "model" implies deviation from reality, usually by simplifying and reducing variables. This is unavoidable if any progress is to be made, but we tend to forget the compromises made to end up with something feasible. The limitations of in vitro models have been summarized recently [1]. These hold true for animal models as well, especially models of human health effects, e.g. in toxicology, from which key examples are given below.

- Species differences – only rarely can the same experiment be performed on humans and on experimental animals; where this is possible (e.g. skin irritation), about 60% correlation is found. It is, however, easier to subject different laboratory animal species to the same experiment, and even among rodent species correlations of only about 70% are typically found (notably, most laboratory animal species are much more similar to each other than to humans). There is no reason to assume that any species predicts human health effects better than it predicts effects in other species. Even monkeys can differ considerably from humans, as tragically experienced in 2006, when the TeGenero anti-CD28 antibody led to multiple organ failure within hours in six volunteers, despite having been tested in monkeys at 500 times higher concentrations.
- Pharmaco- and toxicokinetics differ in humans – humans are not 70 kg rats: there are tremendous differences from all laboratory animal species, e.g. in metabolic rates; food and water intake; gut pH; body temperature; metabolism of xenobiotics with regard to capacity, velocity and metabolites formed; body fat distribution; elimination routes; carrier proteins and protein binding; defence mechanisms; and life span.
- Inbred strains – the broad use of inbred strains has helped to reduce the variability of experimental results. However, working with what are effectively identical twins means that the variability of the animal population is precisely what is no longer reflected.
- Young animals – typically, for cost and logistic reasons, young animals are used, whereas most health effects are found in the elderly, who often display other diseases in parallel.
- Restriction to one sex – this is not meant as a call for default testing in both sexes, but again the spectrum of sensitivities is reduced, and there should be at least some reasoning behind the choice of sex.
- Group sizes – usually the smallest possible group size is selected, often without a power analysis to establish whether a significant result can even be expected. The problem is aggravated because maximum information is often sought by studying multiple end points or time points without adjusting the statistics for multiple testing (e.g. standard test guidelines for repeat-dose toxicity include up to 40 evaluated end points); a brief sample-size sketch follows this list.
- Costs – maintaining adequate animal husbandry and qualified staff is extremely costly; this can amount to tens or hundreds of thousands of euros, for example for higher apes. As a consequence, the number of replications and repetitions is typically limited.
- Lethality as a side-effect – mortality in animals is often the result of lack of food and water, not only the primary effect of the substance being studied. If small rodents are incapable of feeding, they will die within hours; it is highly likely that many substances would not be toxic if a simple sugar solution were injected. Most probably, it is often translocation of bacteria from the gut that kills the animal, especially if an orally administered substance damages the gastrointestinal barrier. Similarly, loss of body heat most probably contributes to the lethality of many treatments.
- Unrealistic dosages and exposure durations – high-dose treatments are applied (in toxicology often at the maximum tolerated dose, i.e. with up to 10% of the animals dying), which hardly reflect human exposure.
- Many health effects are multifactorial – yet the disease models used are mostly based on a single, often artificial, cause.
- Animals are stressed – there are many sources of stress: transportation, insufficient time to adapt, cages without enrichment (e.g. opportunities for species-specific behaviour), handling, procedures carried out in the daytime on nocturnal species, changes in groups for randomization, and too many male animals caged together.
- Adaptive responses of animals are underestimated – this holds specifically true for the fashionable knockout mouse experiments: we often forget that these animals have had a lifetime to compensate for the gene defect, yet we interpret the results as if only one gene were lacking.
- Organisms are incredibly complex – yet we use them to answer simple questions. Any interpretation is an oversimplification of what has really happened in the experiment, which usually overwhelms our capacity for interpretation.
- Face validity versus validation – we tend to believe in the relevance of animal tests if some features resemble our expectations; validation, in contrast, means systematically challenging the relevance of a model.
- Standardization versus tradition – although standardization is a prerequisite for mutual acceptance of data, especially in the regulatory context (and often also for accumulating sufficient knowledge of the model itself), it needs to be distinguished from the "we have always done it like this" approach (tradition). Models should be modified when there is strong reason to do so, but the modification needs to be documented and disseminated so that it creates a new standard.
- Proprietary data – an enormous scientific limitation is that data from animal tests are not shared, especially in the regulatory context. This is often due to commercial interests (transparency toward competitors, barriers to others making use of the data for their own developments), but sometimes it is simply the hassle of publication and the lack of incentive to publish.
- Acute models of chronic phenomena – there is enormous pressure to generate results quickly, leading us to develop animal models that establish symptoms rapidly even when modelling chronic health effects.
- Post hoc definition of the experimental hypothesis – it is quite common, especially in basic research, to change the hypothesis according to the results obtained in order "to tell a story" in a publication; few investigators and readers are aware of how this affects the statistics to be used.
- Tests are either sensitive or specific – thus we can rely either on their positive or on their negative results. However, we typically accept both outcomes of animal tests with the same confidence.
- Most animal models are not validated – a systematic assessment of their performance characteristics (reproducibility, limit of detection, relevance of results, mechanistic basis) is often not carried out.
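To make the "Group sizes" point concrete, the following minimal sketch in Python shows how quickly the required group size grows once an honest power analysis and a correction for multiple end points are taken into account. The effect size, significance level, power and the Bonferroni-style correction are assumptions chosen purely for illustration; they are not taken from any guideline cited above.

```python
"""Rough sample-size sketch for the 'Group sizes' point (illustrative assumptions only)."""
from statistics import NormalDist


def n_per_group(effect_size: float, alpha: float = 0.05, power: float = 0.8) -> float:
    """Normal-approximation sample size per group for a two-sided,
    two-sample comparison of means with standardized effect size d."""
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2)  # critical value for the two-sided test
    z_beta = z(power)           # quantile corresponding to the desired power
    return 2 * (z_alpha + z_beta) ** 2 / effect_size ** 2


d = 0.8          # assumed 'large' standardized effect (hypothetical)
endpoints = 40   # order of magnitude of end points cited for repeat-dose guidelines

print(f"single end point:           {n_per_group(d):.0f} animals per group")
# Splitting the error rate across all end points (Bonferroni-style) roughly
# doubles the required group size under these assumptions.
print(f"40 end points (Bonferroni): {n_per_group(d, alpha=0.05 / endpoints):.0f} animals per group")
```

Under these assumptions, roughly 25 animals per group would be needed for a single end point and about 50 once the error rate is split across 40 end points, well beyond the group sizes typically used.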
In ecology, ecotoxicology or veterinary medicine, a claim of relevance is often made because testing is carried out in the target species. However, with tens of thousands of fish species alone, modelling one species' sensitivities and specifics with another is most likely no less of a problem than it is between mammals, and most of the limitations listed above also hold true when testing in the target species.

However, even an almost perfect animal test would not help us as a stand-alone test. In the language of diagnostic medicine, the reason is prevalence. We are typically looking for rare properties of substances, both in agent discovery and in toxicology: the proportion of either toxic or effective substances in a panel of test substances is normally in the low percentage range. Thus, even tests with good specificity will produce false positives in a proportion similar to, or higher than, that of the real positives. We have alluded to this previously [2]. The typical combination of tests to increase sensitivity, e.g. in order not to miss any toxic compound, further reduces specificity and thus increases the number of false positives; a numerical sketch of this prevalence effect is given below.
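The following minimal sketch in Python illustrates the prevalence argument with Bayes' rule. The prevalence, sensitivity and specificity values are assumptions chosen only for illustration; they are not values reported in the article.

```python
"""Positive predictive value of a screening test at low prevalence (illustrative assumptions)."""


def positive_predictive_value(prevalence: float, sensitivity: float, specificity: float) -> float:
    """Fraction of positive calls that are true positives (Bayes' rule)."""
    true_pos = prevalence * sensitivity
    false_pos = (1 - prevalence) * (1 - specificity)
    return true_pos / (true_pos + false_pos)


# Hypothetical screening panel: 3% of substances are genuinely toxic,
# and the test is 90% sensitive and 90% specific.
prev, sens, spec = 0.03, 0.90, 0.90
print(f"PPV at {prev:.0%} prevalence: {positive_predictive_value(prev, sens, spec):.0%}")
# prints roughly 22%: only about one positive result in five is a real positive
```

Under these assumed figures, false positives outnumber true positives by roughly four to one, even though the test itself looks respectable.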
So we are living in an imperfect world, working with imperfect tools. What are the prospects of still producing relevant science, protecting the consumer and identifying new agents with a high probability? First, we need to understand the limitations of our tools; where possible, this requires optimization, standardization and validation, which can lead us to an evidence-based approach [3]. Second, we must not rely on one tool only: combining tools with complementary strengths, such as a sensitive and a specific test, can overcome their individual weaknesses (see the sketch below). Integrated testing, especially across different technologies, is most promising, although the tools for composing testing strategies systematically, and for validating them, are only now emerging. Third, we must stop believing and start doubting. There is a tendency to believe a priori in the relevance of animal models ("in vivo veritas"), further reinforced by the desire to reach a definitive conclusion under economic or publication pressure. Believing is probably the most unscientific approach of all.
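As an illustration of combining complementary tests, the sketch below (Python) computes the performance of a hypothetical two-tier strategy in which a sensitive screen is followed by a specific confirmatory test, and a substance is only called positive if both tests agree. All figures, and the assumption that the two tests err independently, are illustrative assumptions rather than values from any validated strategy.

```python
"""Two-tier testing sketch: sensitive screen plus specific confirmation (illustrative assumptions)."""


def serial_combination(sens_screen: float, spec_screen: float,
                       sens_confirm: float, spec_confirm: float) -> tuple[float, float]:
    """Overall sensitivity/specificity when a substance is called positive
    only if both tests are positive (assuming the tests err independently)."""
    sensitivity = sens_screen * sens_confirm                  # both tests must catch a true positive
    specificity = 1 - (1 - spec_screen) * (1 - spec_confirm)  # a false positive requires both to err
    return sensitivity, specificity


# Hypothetical pairing: a sensitive but unspecific screen and a specific confirmatory test.
sens, spec = serial_combination(0.95, 0.70, 0.80, 0.95)
print(f"combined sensitivity: {sens:.2f}, combined specificity: {spec:.3f}")
```

Under these assumptions the combined specificity rises to about 0.985 at the cost of some sensitivity, which is precisely the kind of trade-off an integrated testing strategy has to balance explicitly.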
Conflict of interest

None declared.

Acknowledgement

This article is based on a presentation given at the LIMPE Seminars 2007, "Experimental Models in Parkinson's Disease", held in September 2007 at the "Porto Conte Ricerche" Congress Center in Alghero, Sardinia, Italy.
References

[1] Hartung T. Food for thought ... on cell culture. ALTEX 2007;24:143–7.
[2] Hoffmann S, Hartung T. Diagnosis: toxic! – trying to apply approaches of clinical diagnostics and prevalence in toxicology considerations. Toxicol Sci 2005;85:422–8.
[3] Hoffmann S, Hartung T. Towards an evidence-based toxicology. Hum Exp Toxicol 2006;25:497–513.