Fd Chem. Toxic. Vol. 32, No. 2, pp. 97-101, 1994 Copyright © 1994 Elsevier Science Ltd
Pergamon
Printed in Great Britain. All rights reserved 0278-6915/94 $6.00 + 0.00
OCCLUSIVE PATCH METHOD FOR SKIN SENSITIZATION IN GUINEA PIGS: THE BUEHLER METHOD
E. V. BUEHLER
Hill Top Biolabs, Inc., Cincinnati, OH 45242, USA
Abstract: Currently, the European Community is in the process of discussing the details of how to conduct various protocols for safety assessment. Additionally, there is a desire that these protocols be harmonized internationally. This presentation attempts to describe the critical and non-critical parameters of the protocol for detecting potential contact allergens in the guinea pig and also discusses some controversial issues. It is also emphasized that the identification of a test material as a potential sensitizer is only the first step in the risk-assessment exercise.
I wish to thank the organizers of this symposium for the opportunity to present my views on the conduct of protocols using the guinea pig to predict delayed contact hypersensitivity (DCH). It is particularly appropriate and timely, since the European Community is currently reviewing its regulatory policies with regard to safety-testing protocols. Therefore, I would like to discuss very briefly the basic concepts of the Buehler approach, and then provide you with some critical and non-critical variables as I perceive them. First of all, we must recognize that the human disease of allergic contact dermatitis results only after epicutaneous contact, and does not occur under other exposure conditions. The initial contact stimulates the immune system, and it is only after the second exposure that we see an elicitation response. This response can be either mild or severe depending on a variety of circumstances. Perhaps the most critical immunokinetic parameter is the hapten interaction with host protein to form complete antigen and its subsequent penetration into the skin to contact and stimulate the immune system. Based on these considerations, the exposure conditions that we chose to exaggerate, in order to investigate the development of delayed contact hypersensitivity, were occlusion and exposure time. Occlusion provides several benefits: it hydrates the skin, keeps the test material from evaporating, and mobilizes and increases the numbers of Langerhans cells at the epidermal/dermal junction. Before discussing the critical and non-critical parameters of the test protocol, as well as identifying some controversial issues, we need to review the sequence of events that occurs during the development of allergic dermatitis. In the first place, the antigen or the sensitizer must have skin contact and must be substantive to the skin. It must have
physical/chemical properties that result in diffusion from the vehicle and penetration into the skin. At least theoretically, it needs to react with protein. There are a variety of proteins in the skin: they can be epidermal or dermal and can be soluble or insoluble. The biological characteristics of this hapten/protein (antigen) are perhaps what determine the quality and quantity of the mature immune response. Fourthly, there is a macrophage processing step, and this is primarily accomplished by the Langerhans cell. The Langerhans cell resides at the dermal/epidermal border. It is dendritic, so that it has a large surface area to interact with antigen. This cell is absolutely essential, I think, to the development of delayed contact hypersensitivity. Other macrophage systems (e.g. splenic dendritic cells) may or may not lead to immunological suppression. Restriction probably also plays a role in that class II antigens are necessary for the specific interaction of macrophages and lymphocytes. Once these steps have occurred, the host's immune system responds. Whether this response primarily occurs peripherally (skin) or centrally (spleen and lymph) is perhaps still controversial, but it is obvious that the entire skin-associated lymphoid tissue (SALT) is involved (Streilein, 1983). In the development of an animal model, the most important parameter is the selection of species. The standard species for experimental DCH is, of course, the guinea pig. With current technology, there are basically two ways to exploit the system in order to optimize the probability that contact hypersensitivity will occur. One is to use experimental adjuvants that generally stimulate the various cellular components of the immune system, and the other is to occlude. The advantages and disadvantages of these two approaches have been reviewed elsewhere (Buehler et al., 1981 and 1985).
Abbreviations: DCH = delayed contact hypersensitivity; DNCB = dinitrochlorobenzene; GLP = Good Laboratory Practice.

Table 1. Results from closed patch studies in the guinea pig using three common sensitizers

Sensitizer                  Incidence
2.0% Phenylene diamine      10/10
5.0% Formalin               3/10
50% Benzocaine              2/10

From Buehler (1965).

Directions for the conduct of the Buehler method have been published (Ritz and Buehler, 1980), and they should be followed explicitly. The basic protocol that
we recommend involves three induction exposures: one per week for 3 weeks on a selected site on the flank of the guinea pig. There is a 2-week wait between the last induction exposure and the primary challenge. We propose that the challenge be on a naive test site, and that the resulting reaction be compared with a naive control group. Subsequently, there are three or four additional sites on the guinea pig for rechallenges, which can then be used either for affirming the development of DCH or for doing a number of risk-assessment procedures (Buehler, 1985). Table 1 presents data from the original publication (Buehler, 1965) and illustrates selected results from an array of sensitizers that were used to validate the application of the occlusive patch. These three test materials are presented because they have been used by a French group (Guillot et al., 1983) to investigate several Freund's adjuvant techniques, and were selected on the basis that there was a strong sensitizer (phenylene diamine), a moderate sensitizer (formalin), and a weak sensitizer (benzocaine). The authors had difficulty in establishing benzocaine as a sensitizer with some of the Freund's adjuvant tests but, as you can see, in 1965 we readily sensitized guinea pigs to this material. If there has been a criticism of the Buehler test, it is that it does not identify weak sensitizers. This is an issue I have been trying to dispel for almost 30 years. I think benzocaine is a good example of a weak sensitizer. We use this compound as a positive control when and if it is desirable to demonstrate hypoallergenicity. We have found that by using 20 animals, we are almost assured that at challenge we will get one or two animals that respond in a positive manner. If there is no response at primary challenge, we will rechallenge and, in every instance, we have had a positive response. I will now review the various parameters of the test protocol and identify their importance for a successful outcome.
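The timing of the basic protocol described above (three weekly inductions, then a 2-week wait before the primary challenge) can be laid out as a simple calendar. The following is only a minimal sketch: the function name `buehler_schedule`, its parameters and the start date are hypothetical and chosen for illustration, not taken from the published directions.

```python
from datetime import date, timedelta


def buehler_schedule(first_induction, n_inductions=3, rest_weeks=2):
    """Return (label, date) pairs: weekly inductions, then the primary challenge.

    Defaults follow the text: one induction per week for 3 weeks and a
    2-week wait between the last induction and the primary challenge.
    """
    events = [
        (f"induction {i + 1}", first_induction + timedelta(weeks=i))
        for i in range(n_inductions)
    ]
    last_induction = events[-1][1]
    events.append(("primary challenge", last_induction + timedelta(weeks=rest_weeks)))
    return events


# Hypothetical start date, used only to make the example concrete.
schedule = buehler_schedule(date(1994, 1, 3))
for label, day in schedule:
    print(f"{label}: {day.isoformat()}")
```

Rechallenges are left out of the sketch because the text does not fix their timing.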
Table 2. Critical parameters of the Buehler method

  Vehicle selection: ethanol and acetone
  Highest possible concentration: mass/area
  Occlusion: restraint v. wrapping
  Time of exposure
  Pilots: induction, challenge
  Clipping
  Depilation (?)
  Naive controls

I have listed parameters that I think are absolutely necessary for the investigator to consider and to optimize during the planning stage (Table 2). First
of all, vehicle selection is of absolute importance. It is not always easy to select the right vehicle. Sometimes it is expedient to use the formulation for which the raw material is intended, if it is non-irritating; otherwise there are a number of vehicles that are appropriate. If there is no obvious choice for a vehicle, we recommend a combination of 80% ethanol for induction and acetone for challenge. The reason for switching to acetone at challenge is that early observations indicated that 80% ethanol could cause reactions in both humans and guinea pigs, which could confuse the interpretation of the test (Ritz and Buehler, 1980), even though ethanol is an outstanding vehicle in most instances. Water and dilute surfactant solutions are also appropriate. Mineral oils and petrolatum are variable in their irritating potential and should be avoided if possible. Secondly, it is essential to use the highest possible concentration. It is not appropriate to select lower concentrations of test material, based on safety factors or other considerations, when the intent is to determine the sensitizing potential of a raw material. Safety factors and other risk-assessment parameters need to be determined subsequently. Appropriate concentrations are established with pilot tests. For induction, we will use a concentration that produces only moderate irritation (an occasional grade 1) but we will perhaps accept all the animals showing patchy erythema (grade +). We will then do a second pilot, since we are generally using a second vehicle, and we will select, for challenge, a concentration that is less irritating than that selected for induction. Ideally, the naive control group should have approximately half the animals with skin reactions that do not exceed a patchy erythema (grade +). Occlusion, of course, is absolutely essential during all phases of testing. It should be emphasized that one can wrap an animal and not get occlusion. Restrainers are not expensive.
They do not cost much more than a half-dozen guinea pigs. Wrapping animals to achieve occlusion is inappropriate and will invariably result in an eventual failure to achieve satisfactory results. Wrapping animals rather than using restraint is the most common and most serious deviation from the intended protocol. The time of exposure is also obviously important. With dinitrochlorobenzene (DNCB), we thought that we had established 3 hours as a minimum exposure period. We know now that there are materials that do not require this much exposure. However, we use 6 hours as a standard, simply because this is achievable in a working day. An exposure of 24 hours produces stress on an animal, particularly if there are more than three exposures, and therefore should be avoided. Clipping is essential in order to provide a suitable surface for the adhesion of the occlusive patch. It is also probably adequate for scoring. I present depilation with a question mark, because I believe one should depilate at scoring. We depilate 2 hours prior to scoring and at no other time.
Table 3. Variable parameters of the Buehler method*

  Group sizes
  Patch test system: Hill Top Chamber/Webril
  Size
  Volume
  No. of applications
  Induction intervals
  Scoring intervals
  Vehicle control
  No. of patches at challenges

*These parameters need to be adjusted according to the specific needs of the experiment.
Depilation removes the hair stubble so one can observe even very slight reactions on guinea pig skin. I do not think that strong reactions will be missed if you do not depilate, but with depilation it is much easier to score. It is essential to have naive controls. We will talk a little bit later about vehicle controls, but the naive controls are absolutely essential. The reactions on the naive controls are the baseline data for determining whether there is increased reactivity in any of the test animals. Variable parameters are listed in Table 3, and they need to be adjusted according to the specific needs of the experiment. Group size is an obvious and important consideration. We recommend 20 test animals and 10 controls in most experiments. If you use more than these numbers you will increase your likelihood of detecting a sensitizer and, of course, if you use fewer you will decrease it. For strong sensitizers, 10 is more than adequate. With a material like benzocaine, if you wanted to be safe you might want to increase the test-group size, but 20 is adequate if a rechallenge is planned. We are confident that weak sensitizers with a potency comparable to that of benzocaine can be identified with a test group of 20 animals. Patch-test systems should be optional. The critical part of any patch-test system is the pad that holds the test material. Webril is preferred and perhaps should be required. But whether or not you use the so-called professional pad or the Hill Top Chamber is really not important. You can produce DCH with either system. The size of the patch depends on the selected system, and the volume of the material that is used on that patch should be determined by the size and absorbency of the patch rather than a volume-to-weight or a volume-to-area relationship to the guinea pig. The patch needs to be saturated so that there is maximum concentration at the interface of the patch with the skin.
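The effect of group size discussed above can be illustrated with a simple binomial calculation. This is only a sketch under stated assumptions: each animal is treated as responding independently with the same probability, and the value 0.2 is borrowed from the 2/10 benzocaine incidence in Table 1 purely for illustration. The function name `prob_at_least` is hypothetical.

```python
from math import comb


def prob_at_least(k, n, p):
    """Probability of k or more responders among n animals (binomial model)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Assumption: a per-animal response probability of 0.2 for a weak sensitizer.
p_weak = 0.2
p_one_of_10 = prob_at_least(1, 10, p_weak)
p_one_of_20 = prob_at_least(1, 20, p_weak)
print(f"at least one responder, 10 animals: {p_one_of_10:.3f}")
print(f"at least one responder, 20 animals: {p_one_of_20:.3f}")
```

Under these assumptions the chance of seeing at least one responder rises from roughly 0.89 with 10 animals to roughly 0.99 with 20, which is in line with the remark that 20 animals almost assure one or two positive responses.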
Saturation assures an adequate external reservoir of test material to assure optimum penetration into the epidermis. The number of applications is another variable. Our standard procedure is one induction per week for 3 weeks, and it is primarily based on expediency. We found that with DNCB nine applications were no better than three, and assumed that three was better than one. The rationale was no more complicated than that. Many investigators still prefer nine
applications, and this is certainly acceptable, although it requires more effort. One induction per week for 3 weeks has been proven to be more than adequate and is certainly successful with benzocaine. With 0.1% DNCB only one induction is needed. Intervals between inductions can also be varied: three times a week, once or twice a week, once a week, or once every other week. I do not know that there is a whole lot of criticality in selecting any of these intervals. Scoring intervals can also be adjusted. Delayed hypersensitivity, as a phenomenon, differs from primary irritation in the incidence of susceptible reactors and in its temporal aspects. DCH is delayed in appearance and persists or becomes more severe as time progresses. It is important, therefore, to determine this parameter. Making the observations at exactly 24 and 48 hours is not a critical factor, but whether or not the reaction persists must be considered for interpretation. We recommend a vehicle control when the vehicle is new or different from a historical perspective. Alcohol and acetone have been used so many times that we no longer use a vehicle control, and when water is used we would probably not use a vehicle control. Therefore, use of a vehicle control is a selective process. Naive controls are the essential element. The number of patches at challenge is a variable. We would recommend a single patch at challenge, but up to two or three patches for rechallenges is perfectly acceptable. The primary objective of the experiment should determine the number of challenges and their sequential progression. Non-critical parameters are presented in Table 4. These are specifically written into most protocols and legislative requirements but have no effect on the outcome of the test. First is the selection of sexes. Prior to the development of the current guinea pig assays, it was thought that "female guinea pigs were more susceptible to DCH". That is not so.
Table 4. Non-critical parameters of the Buehler method*

  Selection of sexes
  Age/weight: cage and restrainer size
  Scoring of induction sites
  Depilation (?)
  Patch positions
  Histopathologic interpretation

*These parameters have no effect on the outcome of the test.

Table 5. Controversial issues arising occasionally from the conduct of the Buehler method

  Validation/positive controls
  Vehicle control
  Animal identification
  Scoring system: interpretation
  Equivocal (borderline) results
  Impact of irritation
  Wrapping v. restraint
  Incidence v. severity

There is no indication that males or females differ in their responses. The exact age and weight of the guinea pig are not critical as long as the animals are young adults. It is the size of the cage and the restrainer that limits or determines the size of the guinea pig: if you do not have a cage to hold a 500-gram guinea pig, that is what limits its size. Certainly, the animal needs to be the right size to fit into the restrainer. We have two sizes of restrainers so that we can use larger animals if we get into longer-range programs, but the
age and weight have no effect on the outcome of DCH. Many protocols require the scoring of induction sites. These data are not useful for interpretation of results. If you have a good, strong sensitizer, it will show up during the second or third induction application. It is readily observed but, without the proper controls, cannot be differentiated from cumulative irritation. The quality of the reaction is determined easily at challenge or rechallenge where the proper control (naive) is available; therefore, I think that the scoring of induction sites is superfluous. Depilation is again listed, simply because it has no effect on the quality of the data or on the outcome of the tests. There has been some question as to whether a depilatory could alter the response because it is an irritant and a sensitizer; however, we see no indication of this effect. Depilation aids in the precision of visual scoring. There is no essential difference in the reactivity of the three challenge positions on each flank of the guinea pig. Depending on what needs to be accomplished during the challenge phases, the patch positions should be considered as non-critical. It should be kept in mind when selecting sites that reactions at multiple test sites can be interactive, so that these might well be separated. The last item is histopathological interpretation. This has never been a fruitful area of investigation. As far as I know, there is still no adequate histopathological technique that differentiates between irritation and contact sensitivity. It is recommended that rechallenge be used to confirm the presence or absence of sensitization, rather than a histopathological evaluation of a mature inflammatory process. Table 5 provides a list of controversial issues that have come up from time to time, and I would like to discuss each of these. First, what is required for validation and the use of positive controls?
The history of the laboratory needs to be considered when deciding when and how often to use positive controls in order to validate a procedure. We validate twice a year using DNCB as a standard sensitizer. Animals are sensitized with a moderately high concentration (0.1%), and each of the test groups is then rechallenged with two concentrations (0.1% and 0.01%). The intent is to have one test group with 80-100% of the animals sensitized, and one with 40-50% of the animals sensitized. With this procedure, an array of
skin reactions is produced and each technician scores the animals blindly. All the data are recorded, put into tabular form and looked at for discrepancies. If there are discrepancies among the scorers, a senior technician is selected to review and mediate the process and to re-educate the scorers towards compliance. The guinea pig responds in a standard and almost invariable manner to DNCB, and thus it is not necessary to incorporate a positive control into every experiment. Likewise, a vehicle control is not necessary when standard non-irritating vehicles are used. The vehicle control has been controversial. Animal identification is an issue because of Good Laboratory Practice (GLP). GLP requires specific animal identification in subacute tests. We have resisted tattooing and the use of ear clips because of their theoretical potential to sensitize animals. Other methods of identification have not proven to be adequate. Therefore, we have developed a relatively complex animal identification system which is based on a colour code and receipt date. As animals are moved from the cage to the restrainer, care is taken that the right animal gets to the right spot and back to its cage. This system has been approved by the FDA GLP compliance groups and works very nicely. The scoring system and the interpretation of results are certainly important considerations. Critics suggest that the tests are subjective, although they are no more subjective than others in toxicology. In addition, there is also the question about the impact of irritation on the development of the immune reaction and its effect on interpretation. Since we recommend that slight irritation with irritating materials be produced during the induction phase, it becomes an issue as to whether this procedure interferes with interpretation or adds to the sensitization potential. Irritation is an indication that the test material has penetrated the skin.
I have never seen any indication that irritation per se will alter the course of the immune response, even though we have tried many ways to irritate in order to increase the sensitivity of the test. In my opinion, irritation does not play much of a role, if any, in the development of DCH. At challenge, of course, it is necessary to avoid excessive irritation so that increases in responsiveness of test animals can be detected.

The scoring system is presented in Table 6 and is indeed a subjective system. Its most controversial aspect is the designation of a patchy erythema as + (0.5). This designation covers a wide range of slight reactions, from an effect due to hydration to a more substantive erythema that is still patchy. When the reaction becomes confluent or is a moderately patchy erythema, it will be designated a 1. This is a critical score because in most instances it will be considered to be a positive response, particularly if all of the control animals are less responsive. A 2 is a moderate erythema and 3 is severe. The scoring system, of course, is relative. The most important criterion is whether the test animals are more responsive than the naive controls. An additional problem with the scoring system is that some investigators regard the numerical scores as though they are real numbers. They are not quantitative in the sense that a 2 has twice the reactivity of a 1, and there is no reaction that can be designated as a 1.5. For presentation of data, we average these numbers as a relative indication of severity; the incidence of positive reactors is nonetheless the more important criterion.

Table 6. Scoring system used in the Buehler method

  0 = No reaction
  + = Slight patchy erythema
  1 = Slight, confluent or moderate patchy erythema
  2 = Moderate erythema
  3 = Severe erythema with or without oedema

The next table (Table 7) displays an array of data that indicate a clear-cut positive response to a theoretical test material. The test group and the control have been evaluated at 24 and 48 hours. There are 12 animals in the test group that are more responsive than any of the control animals. Additionally, a few control animals have patchy erythema, indicating that the concentration of test material was appropriate. These data are easy to interpret.

Table 7. Data indicating a clear-cut positive response to a theoretical test material

                            Response grade
Experimental group       0    0.5    1    2    3    Incidence    Severity
Test (24 hr)             6     2     7    4    1      12/20        1.0
Test (48 hr)             8     2     6    4    0                   0.8
Control (24 hr)          7     3     0    0    0      0/10         0.2
Control (48 hr)          9     1     0    0    0                   0.1

Table 8 presents a less clear experiment and needs clarification. This type of data is often noted with weak sensitizers. In this particular illustration, the test animals have shown an impressive increase in the incidence of + reactions. A rechallenge would be necessary to resolve the mechanism involved. It is necessary to incorporate a new set of naive controls and to rechallenge this same set of test animals.
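The averaging of grades into a severity figure, and the counting of positive reactors, can be checked with a few lines of arithmetic. The sketch below uses the grade counts as we read them from Table 7 and treats a grade of 1 or higher as a positive response, which the text says holds "in most instances"; both the reading of the table and the function name `summarize` are assumptions made for illustration.

```python
# Numerical values assigned to the grades 0, +, 1, 2 and 3.
GRADES = (0.0, 0.5, 1.0, 2.0, 3.0)


def summarize(counts):
    """Return (positives, group size, mean grade) for per-grade animal counts."""
    n = sum(counts)
    severity = sum(g * c for g, c in zip(GRADES, counts)) / n
    positives = sum(c for g, c in zip(GRADES, counts) if g >= 1.0)
    return positives, n, severity


# Counts per grade (0, +, 1, 2, 3) as read from Table 7.
test_24 = summarize((6, 2, 7, 4, 1))
test_48 = summarize((8, 2, 6, 4, 0))
control_24 = summarize((7, 3, 0, 0, 0))
control_48 = summarize((9, 1, 0, 0, 0))

for name, (pos, n, sev) in [("test 24 hr", test_24), ("test 48 hr", test_48),
                            ("control 24 hr", control_24), ("control 48 hr", control_48)]:
    print(f"{name}: incidence {pos}/{n}, mean grade {sev:.2f}")
```

The unrounded means (0.95, 0.75, 0.15 and 0.05) agree with the tabulated severities of 1.0, 0.8, 0.2 and 0.1 when rounded half up to one decimal place. As the text stresses, these averages are only a relative indication; the grades are ordinal labels, not measurements.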
Table 8. Data indicating a borderline response to a test material

                            Response grade
Experimental group       0    0.5    1    2    3    Incidence    Severity
Test (24 hr)             1    19     0    0    0      0/20         0.5
Test (48 hr)             1    19     0    0    0                   0.5
Control (24 hr)          5     5     0    0    0      0/10         0.3
Control (48 hr)          6     4     0    0    0                   0.2

We do not call this situation a sensitization reaction at this point, and insist that a material must produce a greater reaction in a test animal than in any control animal in order to be designated as a sensitizer. If the test animals show no difference in reactivity at the rechallenge, we would not assume that sensitization had occurred. However, if any one of these test animals showed greater reactivity at rechallenge, the test material would be designated as a sensitizer. These basic data, however, should not be used for risk assessment. More substantive data in the guinea pig and humans should be used to determine whether a material with sensitizing potential might cause problems in humans under normal exposure conditions. The basic procedures for accomplishing this objective have been presented (Robinson et al., 1989).
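The interpretation rule stated above (a test animal must react more strongly than any naive control) can be sketched as a small function. This is an illustration only: the three labels, the function names, and the use of "a larger share of test animals showing any reaction" as the trigger for a rechallenge are our assumptions, not part of the published method. The per-grade counts are the 24-hour values as we read them from Tables 7 and 8.

```python
# Numerical values assigned to the grades 0, +, 1, 2 and 3.
GRADES = (0.0, 0.5, 1.0, 2.0, 3.0)


def expand(counts):
    """Turn per-grade animal counts into a flat list of individual scores."""
    return [g for g, c in zip(GRADES, counts) for _ in range(c)]


def interpret(test_scores, control_scores):
    """Classify a challenge result against the naive control group."""
    ceiling = max(control_scores)  # strongest reaction seen in any control
    if any(s > ceiling for s in test_scores):
        return "positive"
    # Assumption: an excess of slight reactions alone is treated as borderline,
    # to be resolved by rechallenge with a fresh set of naive controls.
    reacting_test = sum(s > 0 for s in test_scores) / len(test_scores)
    reacting_ctrl = sum(s > 0 for s in control_scores) / len(control_scores)
    if reacting_test > reacting_ctrl:
        return "borderline"
    return "negative"


clear_cut = interpret(expand((6, 2, 7, 4, 1)), expand((7, 3, 0, 0, 0)))
borderline = interpret(expand((1, 19, 0, 0, 0)), expand((5, 5, 0, 0, 0)))
print(clear_cut, borderline)
```

A "borderline" outcome here is not a designation of sensitization; following the text, it only signals that a rechallenge is needed.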
REFERENCES
Buehler E. V. (1965) Delayed contact hypersensitivity in the guinea pig. Archives of Dermatology 91, 171-175.
Buehler E. V. (1985) Methods, approaches for assessment of contact hypersensitivity. In Immunotoxicology and Immunopharmacology. Edited by J. H. Dean, M. I. Luster, A. E. Munson and H. Amos. pp. 123-131. Raven Press, New York.
Buehler E. V., Ritz H. L. and Newmann E. A. (1981) A proposed plan for the detection and identification of potential sensitizers. Regulatory Toxicology and Pharmacology 5, 46-58.
Guillot J. P., Gonnet J. F., Clement C. and Faccini J. M. (1983) Comparative study of methods chosen by the Association Française de Normalisation (AFNOR) for evaluating sensitizing potential in the albino guinea-pig. Food and Chemical Toxicology 21, 795-805.
Ritz H. L. and Buehler E. V. (1980) Planning, conduct and interpretation of guinea pig sensitization patch tests. In Current Concepts in Cutaneous Toxicity. Edited by V. A. Drill and P. Lazar. pp. 25-42. Academic Press, New York.
Robinson M. K., Stotts J., Danneman P. J. and Nusair T. L. (1989) A risk assessment process for allergic contact sensitization. Food and Chemical Toxicology 27, 479-489.
Streilein J. W. (1983) Skin associated lymphoid tissue (SALT): origins and functions. Journal of Investigative Dermatology 80, 12s-16s.