The potential of organ specific toxicity for predicting rodent carcinogenicity


Fundamental and Molecular Mechanisms of Mutagenesis

ELSEVIER

Mutation Research 358 (1996) 37-62

Yongwon Lee a,1, Bruce G. Buchanan a, Gilles Klopman b, Mario Dimayuga b, Herbert S. Rosenkranz c,*

a Intelligent Systems Laboratory, Department of Computer Science, University of Pittsburgh, Pittsburgh, PA 15260, USA
b Department of Chemistry, Case Western Reserve University, Cleveland, OH 44106, USA
c Department of Environmental and Occupational Health, University of Pittsburgh, Pittsburgh, PA 15238, USA

Received 5 March 1996; revised 22 May 1996; accepted 28 May 1996

Abstract

Relationships between organ specific toxicity (specifications of the presence or absence of 43 morphological effects in 32 organs) observed in 13-week subchronic studies and rodent carcinogenicity were investigated both manually, by measuring the concordance of each feature, and automatically, using the RL (Rule Learner) induction program. Of the 32 organs, the presence or absence of any effect in liver or kidney was found to be highly relevant to rodent carcinogenicity. While the concordance of Salmonella genotoxicity with rodent carcinogenicity was only 60%, the battery of liver and kidney was 74% accurate with 75% sensitivity and 71% specificity. Further, using the RL program, rule sets based on organ specific toxicity together with the default predictions based on Salmonella mutagenicity were on average 80% accurate with 83% sensitivity and 82% specificity.

Keywords: Organ specific toxicity; Morphology; Rodent carcinogenicity; Rule induction

* Corresponding author. Tel.: (412) 967-6510; Fax: (412) 624-1289; E-mail: [email protected]
1 Dr. Yongwon Lee is currently at the Lockheed Martin Missiles and Space, Artificial Intelligence Center, O/96-20, B/254G, 3251 Hanover Street, Palo Alto, CA 94304, USA.

0027-5107/96/$15.00 Copyright © 1996 Published by Elsevier Science B.V. PII S0027-5107(96)00110-8

1. Introduction

The intricate nature of the carcinogenic endpoint, coupled with the diversity of carcinogenic compounds, makes it difficult to predict this biological phenomenon. While numerous attempts have been made to explore relationships between a variety of chemical features and carcinogenicity, given the complexity of carcinogenic events (i.e., multiple stages and multiple causes), a perfect set of criteria for distinguishing carcinogens from non-carcinogens remains to be discovered. Heretofore, the majority of the available data have dealt with in vitro or in vivo short-term effects, or with physical and structural chemical features. Recently, the National Institute of Environmental Health Sciences (NIEHS) has made available organ specific toxicity data for a small set of 106 chemicals. For each chemical, the organ specific toxicity data specify the presence or absence of a number of lesions (a number of morphological effects in a number of organs) observed at the end of 13-week subchronic studies. Because these chemicals had already been tested for rodent carcinogenicity in long-term rodent bioassays under the aegis of the National Toxicology Program (NTP), these data present a novel opportunity to explore the relationship between organ specific toxicity and carcinogenicity in rodents. For example, the data could provide an answer to a general question such as “Which morphological effect is most indicative of carcinogenicity?” or a specific one such as “Is the observation of nephropathy indicative of rodent carcinogenicity?” Obviously, such questions have been addressed previously (Hoel et al., 1988; Huff, 1993; Tennant et al., 1991; Haseman and Lockhart, 1993).

In this paper, we describe a study in which the relationship between organ specific toxicity and rodent carcinogenicity and non-carcinogenicity was investigated to test the potential of organ specific toxicities as predictors of rodent carcinogenicity. In the first half of the study, we analyzed the concordance of each specific organ toxicity with rodent carcinogenicity. We also measured and analyzed the concordance of a number of batteries, each consisting of a pair of endpoints (organs and morphological effects) of organ specific toxicity. In the second half of the study, we used the RL (Rule Learner) program (Clearwater and Provost, 1990) to test whether RL could identify useful relationships between organ specific toxicity and rodent carcinogenicity as well as non-carcinogenicity. RL is a rule induction program, which induces a set of IF-THEN rules from specific descriptions of chemicals (e.g., the presence or absence of organ specific toxicity in this study). We also tested whether the responses in the Salmonella mutagenicity assay could be of help when included in addition to organ specific toxicity.

Section 2 of this paper describes the database of 106 chemicals used in our study, including the description of organ toxicity data. Section 3 provides an overview of the present study, including brief descriptions of the two parts of our study. Section 4 describes the first part of our study, the analysis of concordance between organ specific toxicity data and rodent carcinogenicity. The second part of our study, including a brief description of the RL induction program, is described in Section 5, followed by conclusions in Section 6.


2. Database

2.1. Chemicals

The database that NIEHS made available with organ specific toxicity data consisted of 106 chemicals. Eighteen of the 106 chemicals were found equivocal and were excluded from the study, because classifying a chemical as equivocal may be based upon biological plausibility as well as statistical confidence. Thus, 88 chemicals, consisting of 60 carcinogens and 28 non-carcinogens, were used to explore the relationship between organ specific toxicity and rodent carcinogenicity. Table 1 lists the 88 chemicals, including names, TR numbers (NTP Technical Report No.), CAS registry numbers, and carcinogenicity classes, along with the responses in the Salmonella mutagenicity assay. Table 1 also includes the organ specific toxicity of each chemical observed at the end of 13-week subchronic studies, i.e., the number of organs (No. organ) on which it had some effect, and the number of morphological effects (No. morph) it caused. For example, tetrachloroethylene was negative in the Salmonella mutagenicity assay and caused a total of five morphological effects in two organs.

2.2. Organ toxicity data

2.2.1. 124 Lesions

For each chemical, organ specific toxicity data indicate the presence or absence of each of 124 lesions observed at the end of a 13-week subchronic animal study along with the route of administration. A lesion is a specific morphological effect in an organ. The 124 lesions include a total of 43 morphological effects in 32 organs. The names of the 124 lesions and the number of carcinogens and non-carcinogens which caused each lesion are shown in Table 2. For example, four carcinogens and four non-carcinogens caused necrosis in brain tissue. Nephropathy was caused by the largest number of carcinogens (14), and regeneration in kidney and necrosis in liver were each caused by 13 carcinogens. There were three lesions (necrosis in brain, inflammation in nasal cavity, and degeneration in testes) each of which was caused by four non-carcinogens. Overall, of the 124 lesions, 96 were

Table 1 88 chemicals (60 rodent carcinogens and 28 rodent non-carcinogens) used in the present study. SAL refers to responses in the Salmonella mutagenicity assays. ‘No. organ’ and ‘No. morph’ refer to the number of organs a chemical affected and the number of morphological effects a chemical caused, respectively, observed at the end of 13-week subchronic studies

| TR No. | CAS No. | Chemical name | CA | SAL | No. organ | No. morph |
|---|---|---|---|---|---|---|
| TR311 | 127-18-4 | Tetrachloroethylene | | | 2 | 5 |
| TR316 | 513-37-1 | Dimethylvinylchloride | | | 7 | 3 |
| TR319 | 106-46-7 | p-Dichlorobenzene | | | 2 | 5 |
| TR321 | 75-27-4 | Bromodichloromethane | | | 3 | 7 |
| TR337 | 59-87-0 | Nitrofurazone | | | 4 | 4 |
| TR340 | 5634-39-9 | Iodinated glycerol | | | 3 | 6 |
| TR341 | 67-20-9 | Nitrofurantoin | | | 3 | 2 |
| TR342 | 62-73-7 | Dichlorvos | | | 0 | 0 |
| TR349 | 87-86-5 | Pentachlorophenol, dowicide EC-7 | | | 1 | 3 |
| TR351 | 20265-96-7 | p-Chloroaniline HCl | | | 4 | 5 |
| TR358 | 303-47-9 | Ochratoxin A | | | 2 | 5 |
| TR359 | 298-81-7 | 8-Methoxypsoralen | | | 3 | 2 |
| TR361 | 67-72-1 | Hexachloroethane | | | 2 | 3 |
| TR362 | 106-87-6 | 4-Vinyl-1-cyclohexene diepoxide | | | 3 | 5 |
| TR363 | 74-96-4 | Bromoethane | | | 4 | 3 |
| TR366 | 123-31-9 | Hydroquinone | | | 3 | 6 |
| TR367 | 50-33-9 | Phenylbutazone | | | 1 | 3 |
| TR370 | 271-89-6 | Benzofuran | | | 3 | 3 |
| TR374 | 556-52-5 | Glycidol | | | 3 | 2 |
| TR382 | 98-01-1 | Furfural | | | 1 | 2 |
| TR384 | 96-18-4 | 1,2,3-Trichloropropane | | | 7 | 8 |
| TR386 | 509-14-8 | Tetranitromethane | | | 2 | 4 |
| TR390 | 612-82-8 | 3,3'-Dimethylbenzidine 2HCl | | | 7 | 7 |
| TR400 | 96-13-9 | 2,3-Dibromo-1-propanol | | | 3 | 5 |
| TR407 | 2425-85-6 | CI pigment red 3 | | | 4 | 7 |
| TR414 | 1825-21-4 | Pentachloroanisole | | | 2 | 5 |
| TR416 | 91-23-6 | o-Nitroanisole | | | 5 | 14 |
| TR422 | 91-64-5 | Coumarin | | | 1 | 6 |
| TR423 | 119-84-6 | 3,4-Dihydrocoumarin | | | 1 | 1 |
| TR326 | 75-21-8 | Ethylene oxide | | | 1 | 1 |
| TR329 | 106-88-7 | 1,2-Epoxybutane | | | 2 | 4 |
| TR331 | 24382-04-5 | Malonaldehyde | | | 6 | 7 |
| TR332 | 149-30-4 | 2-Mercaptobenzothiazole | | | 0 | 0 |
| TR350 | 75-25-2 | Tribromomethane | | | 1 | 1 |
| TR352 | 924-42-5 | N-Methylolacrylamide | | | 4 | 5 |
| TR372 | 20325-40-0 | 3,3'-Dimethoxybenzidine 2HCl | | | 3 | 4 |
| TR391 | 115-96-8 | Tris(2-chloroethyl) phosphate | | | 2 | 3 |
| TR397 | 2429-74-5 | CI direct blue 15 | | | 2 | 5 |
| TR405 | 6459-94-5 | CI acid red 114 | | | 4 | 6 |
| TR309 | 1163-19-5 | Decabromodiphenyl oxide | | | 0 | 0 |
| TR328 | 598-55-0 | Methyl carbamate | | | 4 | 7 |
| TR378 | 100-52-7 | Benzaldehyde | | | 4 | 5 |
| TR420 | 396-01-0 | Triamterene | | | 2 | 4 |
| TR323 | 756-79-6 | Dimethyl methylphosphonate | | | 1 | 5 |
| TR334 | 121-88-0 | 2-Amino-5-nitrophenol | | | 2 | 2 |
| TR335 | 6373-74-6 | CI acid orange 3 | | | 2 | 7 |
| TR339 | 99-57-0 | 2-Amino-4-nitrophenol | | | 2 | 3 |
| TR346 | 75-00-3 | Chloroethane | | | 0 | 0 |
| TR347 | 5989-27-5 | d-Limonene | | | 1 | 2 |
| TR356 | 54-31-9 | Furosemide | | | 1 | 5 |
| TR360 | 121-69-7 | N,N-Dimethylaniline | | | 5 | 4 |


Table 1 (continued)

| TR No. | CAS No. | Chemical name | CA |
|---|---|---|---|
| TR368 | 389-08-2 | Nalidixic acid | D |
| TR369 | 98-85-1 | α-Methylbenzyl alcohol | D |
| TR376 | 106-92-3 | Allyl glycidyl ether | D |
| TR394 | 103-90-2 | Acetaminophen | D |
| TR401 | 137-09-7 | 2,4-Diaminophenol 2HCl | D |
| TR404 | 57-41-0 | 5,5-Diphenylhydantoin | D |
| TR408 | 7487-94-7 | Mercuric chloride | D |
| TR410 | 91-20-3 | Naphthalene | D |
| TR411 | 6471-49-4 | CI pigment red 23 | D |
| TR314 | 80-62-6 | Methyl methacrylate | F |
| TR317 | 113-92-8 | Chlorpheniramine maleate | F |
| TR322 | 61-76-7 | Phenylephrine HCl | F |
| TR324 | 10043-35-3 | Boric acid | F |
| TR325 | 82-68-8 | Pentachloronitrobenzene | F |
| TR336 | 132-98-9 | Penicillin VK | F |
| TR338 | 643-22-1 | Erythromycin stearate (A) | F |
| TR343 | 100-51-6 | Benzyl alcohol | F |
| TR344 | 64-75-5 | Tetracycline HCl | F |
| TR348 | 41372-08-1 | Methyldopa sesquihydrate | F |
| TR353 | 120-83-2 | 2,4-Dichlorophenol | F |
| TR354 | 828-00-2 | Dimethoxane | F |
| TR357 | 58-93-5 | Hydrochlorothiazide | F |
| TR365 | 78-11-5 | Pentaerythritol tetranitrate | F |
| TR371 | 108-88-3 | Toluene | F |
| TR373 | 108-30-5 | Succinic anhydride | F |
| TR375 | 25013-15-4 | Vinyl toluene | F |
| TR377 | 2698-41-1 | o-Chlorobenzalmalononitrile | F |
| TR380 | 55-31-2 | Epinephrine HCl | F |
| TR381 | 2244-16-8 | d-Carvone | F |
| TR385 | 74-83-9 | Methyl bromide | F |
| TR387 | 60-13-9 | dl-Amphetamine sulfate | F |
| TR389 | 26628-22-8 | Sodium azide | F |
| TR396 | 79-11-8 | Monochloroacetic acid | F |
| TR403 | 108-46-3 | Resorcinol | F |
| TR412 | 7336-20-1 | 4,4'-Diamino-2,2'-stilbenedisulfonic acid | F |
| TR413 | 107-21-1 | Ethylene glycol | F |
| TR417 | 100-02-7 | p-Nitrophenol | F |

SAL

+ + + _ _ _ _ _ _ _ + _ _ _ _ -

No. organ

No. morph

1 1 3 5 5 1 1 0 0 2 0 3 3 0 2 1 1 1 4 2

1 1 5 10 7 3 4 0 0 6 0 3 3 0 4 1

1 2 0 2 1

I 1 5 2 3 6 0 3

I

2

4 6 5

+ M + _ _

3 4 1 4 0 1 2 0 5

-

2 1

2 4

M M -

1 2 0

1 3 0 5

CA (carcinogenicity in rodents) is defined as follows: A, carcinogenic in both rats and mice with tumors at one or more sites; B, carcinogenic in rats or mice with tumors at multiple (> 1) sites; C, carcinogenic in rats or mice with tumors at one site in both sexes; D, carcinogenic in rats or mice with tumors at one site in a single sex; F, non-carcinogenic in both rats and mice (Ashby and Tennant, 1991).

caused by one or more carcinogens but only 57 were caused by non-carcinogens. There were five lesions which none of the 88 chemicals caused. Of the 96 lesions caused by at least one carcinogen, 34 were also associated with at least one non-carcinogen. Thus, 62 were caused only by carcinogens and 23 only by non-carcinogens. Table 2 also shows the concordance of the presence or absence of each lesion with rodent carcinogenicity and non-carcinogenicity, respectively, of the 88 chemicals used in the study. The concordance is discussed in greater detail in the next section.

While the presence or absence of the 124 lesions adequately represented the effects of the chemicals, the lesions were too specific to be useful in our study, because the size of the sample (88) was smaller than the

Table 2 Distributions and concordance of each of 124 lesions specified in the organ specific toxicity data. C and NC refer to carcinogens and non-carcinogens, respectively. AC, SS, and SP stand for accuracy, sensitivity, and specificity, respectively

Lesions

Observed C

Adrenal cytoplasmic vacuolization Bone metaphyseal atrophy Bone marrow cellular depletion Bone marrow hyperplasia Brain degeneration Brain gliosis Brain hemosiderin pigment Brain mineralization Brain necrosis Eye inflammation Forestomach erosion/ulceration Forestomach hyperkeratosis Forestomach hyperplasia Forestomach inflammation Forestomach necrosis

3 1 4 5 3 0 0 3 4 0 0 2 6 4 1

Heart degeneration Heart inflammation Heart mineralization Intestine necrosis Intestine pigmentation Intestine vasculitis Kidney atrophy Kidney casts Kidney cytoplasmic vacuolization Kidney degeneration Kidney dilitation Kidney glomerulopathy Kidney granular Casts Kidney hemosiderin pigment Kidney hyperplasia Kidney inflammation Kidney karyomegaly Kidney mineralization Kidney necrosis

Not observed

Concordance

NC

C

NC

AC

ss

SP

X2

0

57

0

59 56 55 57 60 60 57 56 60 60 58 54 56 59

38 28 25 27 21 27 21 21 24 27 27 27 25 28 28

0.35 0.33 0.33 0.36 0.34 0.31 0.31 0.34 0.32 0.3 1 0.31 0.33 0.35 0.36 0.33

0.05 0.02 0.07 0.08 0.05 0.00 0.00 0.05 0.07 0.00 0.00 0.03 0.10 0.07 0.02

1.00 1.00 0.89 0.96 0.96 0.96 0.96 0.96 0.86 0.96 0.96 0.96 0.89 1.00 1.00

1.45 0.47 0.43 0.68 0.09 2.17 2.17 0.09 1.34 2.17 2.17 0.00 0.01 I .96 0.47

26 27 28 28 28 28 28 28 28 54 58 59 55 58 58 57 52 55 48

0.30 0.3 I 0.32 0.33 0.33 0.33 0.33 0.33 0.33 0.38 0.34 0.33 0.38 0.34 0.34 0.34 0.40 0.36 0.43

0.93 0.96 1.00

4.39 2.17

I .oo 1.00

0.47 0.47 0.47 0.47 0.47 0.47 1.08 0.98 0.47 2.47 0.96 0.96 0.09

2

60 60 60 59 59 59 59 59 59 54 58 59 55 58 58 57 52 55 48

3

I 1 1

I 1 4 1

I I 3 0 0 2 1 0 0 0 0 0 0 0 I 0 0 0 0 0 1

I 1

Kidney nephropathy Kidney pigmentation Kidney protein casts

14 6 6

1 0 1

46 54 54

46 54 54

Kidney regeneration Larynx inflammation Larynx metaplasia Liver cytoplasmic alteration Liver cytoplasmic vacuolization Liver degeneration Liver fibrosis Liver hematopoiesis Liver hemosiderin pigment

13 0 1 2 6 6 1 2 3

1 1 1 0 0 1 0 0 0

41 60 59 58 54 54 59 58 51

47 27 27 28 28 27 28 28 28

0 1 0

52 49 59

28 27 28

Liver hyperplasia Liver hypertrophy Liver inflammation

41

8 11 I

1.oo 1.00

1.oo 1.oo 0.96 1.oo 1.00 1.00

I .oo 1.oo 0.96 0.96 0.96 0.93 0.23 0.10 0.10

0.45 0.3 1 0.32 0.34 0.39 0.38 0.33 0.34 0.35

1.98 0.68 2.36

0.96 1.OO 0.96

5.23 3.00 1.08

0.96 0.96 0.96 1.oo 1.00 0.96

4.61 2.17 0.31 0.96 3.00 1.08 0.47 0.96 1.45

I .oo 1.00

I .oo 1.00 0.96 1.oo


Table 2 (continued) Lesions

Observed

Not observed

Concordance

NC

C

NC

AC

ss

SP

5 2 1

0 0

55 58 59

28 28 28

0.38 0.34 0.33

0.08 0.03 0.02

1.00 1.00 1.00

2.47 0.96 0.47

13

0

47

28

0.47

0.22

1.00

7.12

8 2

0

42 58 59 58 59 58 60 59 59 59 59 60 60 60 58 59 58 59 59 60 57 58 59 57 60 59 60 59 59 60 60 59 59 59 59 59 59 60 59 58 57 52 55 59 59 59

28 26 28 28 28 28 27 28 28 28 28 27 27 27 28 27 28 28 27 21 24 25 25 28 27 28 27 28 28 27 27 28 28 28 27 27 27 27 28 28 28 26 28 28 28 28

0.41 0.32 0.33 0.34 0.33 0.34 0.31 0.33 0.33 0.33 0.33 0.3 1 0.31 0.31 0.32 0.32 0.34 0.33 0.32 0.3 1 0.31 0.31 0.30 0.35 0.31 0.33 0.31 0.33 0.33 0.31 0.31 0.33 0.33 0.33 0.32 0.32 0.32 0.3 1 0.33 0.34 0.35 0.39 0.38 0.33 0.33 0.33

0.13 0.03 0.02 0.03 0.02 0.03 0.00 0.02 0.02 0.02 0.02 0.00 0.00 0.00 0.03 0.02 0.03 0.02 0.02 0.00 0.05 0.03 0.02 0.05 0.02 0.02 0.00 0.02 0.02 0.00 0.00 0.02 0.02 0.02 0.02 0.02 0.02 0.00 0.02 0.03 0.05 0.13 0.08 0.02 0.02 0.02

1.00 0.93 1.00 1.00 1.00 1.00 0.96 1.00 1.00 1.00 1.00 0.96 0.96 0.96 0.93 0.96 1.00 1.00 0.96 0.96 0.86 0.89 0.89 1.00 0.96 1.00 0.96 1.00 1.00 0.96 0.96 1.00 1.00 1.00 0.96 0.96 0.96 0.96 1.00 1.00 1.00 0.93 1.00 1.00 1.00

4.10 0.64 0.47 0.96 0.47 0.96 2.17 0.47 0.47 0.47 0.47 2.17 2.17 2.17 0.64 0.31 0.96 0.47 0.31 2.17 2.25 1.94 3.60 1.45 2.17 0.47 2.17 0.47 0.47 2.17 2.17 0.47 0.47 0.47 0.31 0.31 0.3 1 2.17 0.47 0.96 1.45 0.73 2.47 0.47 0.47 0.47

C Liver karyomegaly Liver mineralization Liver mitotic alteration Liver necrosis Liver pigmentation Liver syncytial alteration Lung degeneration Lung hyperplasia Lung inflammation Lung karyomegaly Lung secondary congestion Lymph node inflammation Lymph node necrosis Lymph node secondary atrophy Mammary gland hypoplasia Muscle degeneration Muscle necrosis Nasal cavity cytologic alteration Nasal cavity degeneration Nasal cavity erosion/ulceration Nasal cavity exudate Nasal cavity fibrosis Nasal cavity hyperostosis Nasal cavity hyperplasia Nasal cavity inflammation Nasal cavity metaplasia Nasal cavity necrosis Ovary atrophy Ovary hypertrophy Ovary necrosis Ovary secondary atrophy Pancreas apoptosis Pancreas atrophy Rectum/anus erosion/ulceration Rectum/anus inflammation Salivary gland metaplasia Sciatic nerve degeneration Skin erosion/ulceration Skin hyperkeratosis Skin hyperplasia Skin inflammation Skin necrosis Spinal cord degeneration Spleen congestion Spleen fibrosis Spleen hematopoiesis Spleen hemosiderin pigment Spleen necrosis Spleen pigmentation Spleen secondary atrophy

1 2 1 2 0 1

1 I 1 0 0 0 2

I 2 1 1 0 3 2 1 3 0 1 0

I 1 0 0 1 1 1 1 1 1 0 1 2 3 8 5

I 1 1

0

2 0 0 0 0 1 0 0 0 0 1

I 1 2 1 0 0

I 1 4 3 3 0 I 0 1 0 0 1 1 0 0 0 1

1 1 1 0 0 0 2 0 0 0 0

I .oo

Table 2 (continued) Observed

Lesions

C

Stomach, glandular cytologic alteration Stomach, glandular erosion/ulceration Stomach, glandular fibrosis Stomach, glandular hyperplasia Stomach, glandular inflammation Stomach, glandular metaplasia Testes degeneration Testes hemosiderin pigment Thymus necrosis Thymus secondary atrophy Thyroid pigmentation Tooth degeneration Trachea hyperplasia Trachea inflammation Trachea metaplasia Urinary bladder calculi Urinary bladder carcinoma Urinary bladder edema Urinary bladder hemorrhage Urinary bladder hyperplasia Urinary bladder inflammation Urinary bladder metaplasia Urinary bladder papilloma Uterus atrophy Uterus secondary atrophy

0 0 0

0 0

12

2 2

4 0

Not observed

Concordance

NC

C

NC

AC

ss

SP

X2

1 0 0 0

60 59 60 60

21 28 28 28

0.31 0.33 0.32 0.32

0.00

0.96 1.oo 1.00 1.00

2.17 0.41 _ _

3 1 4 0 0 0 0 0 1 1 0 1 0 0 0 1 1 0 0 2 1

60 60 48 59 59 58 60 60 60 60 59 60 59 59 59 58 58 59 59 56 60

25 21 56 60 28 28 28 28 21 21 28 21 28 28 28 21 21 28 28 26 21

0.28 0.3 1 0.41 0.33 0.33 0.34 0.32 0.32 0.31 0.31 0.33 0.3 1 0.33 0.33 0.33 0.33 0.33 0.33 0.33 0.34 0.31

0.89 0.96 0.86 1.oo 1.00 1.00 1.00 1.00 0.96 0.96 1.oo 0.96 1.00 1.00 1.00 0.96 0.96 1.00 1.oo 0.93 0.96

6.66 2.17 0.42 0.47 0.41 0.96 _ _

0.02 0.00 0.00 0.00 0.00 0.20 0.02 0.02 0.03 0.00 0.00 0.00 0.00 0.02 0.00 0.02 0.02 0.02 0.03 0.03 0.02 0.02 0.01 0.00

2.11 2.17 0.41 2.17 0.47 0.47 0.47 0.00 0.00 0.47 0.47 0.01 2.17

number of variables (124). The problem with a sample size smaller than the number of variables is that a model composed of the 124 lesions may be too specific to be extrapolated to chemicals not included among the 88, thus reducing the predictive strength of the model. To decrease the number of variables, we simplified the organ specific toxicity data by separating organ specificity and morphological effects as described below.

2.2.2. 32 Organs and 43 morphological effects

The 124 lesions can be reorganized and simplified by omitting the specificity of morphological effects, as shown in Table 3. The omission of morphological effects reduces the presence or absence of 124 lesions to the presence or absence of any morphological effect in each of 32 organs. For example, as shown in Table 3, of the 88 chemicals, five carcinogens and five non-carcinogens caused some effects in brain. Of the 32 organs, for five organs (eye, muscle, rectum/anus, thyroid and tooth), none of the 60 carcinogens caused any morphological effects. For three organs (eye, muscle and rectum/anus), there was one non-carcinogen which caused some morphological effect. Overall, in 12 of the 32 organs, no morphological effects were caused by non-carcinogens.

Similarly, the organ specific toxicity data of the 124 lesions can be simplified by omitting the organ specificity, thus leaving only the 43 morphological effects, as shown in Table 4. For example, 23 carcinogens and 9 non-carcinogens caused necrosis in some organs. Overall, of the 43 morphological effects, 39 were caused by at least one carcinogen (in some organ), and 25 by at least one non-carcinogen.

The purpose of simplifying the organ specific toxicity data of the 124 lesions (Table 2) by separating organ specificity (Table 3) and morphological effects (Table 4) is to generalize the organ toxicity data and thus make them more amenable to analysis. Of course, such generalization may lead to loss of information. For example, while two features, (Kidney, +) and (Bone marrow, +), indicate the presence of some morphological effects in kidney and bone marrow, respectively, they do not indicate which morphological effect occurred or how many different effects occurred. Note that while 17 morphological effects are observed in kidney, only two are seen in bone marrow. However, the use of both 32 organs and 43 morphological effects as well

as the increased generality of the organ toxicity data increases the possibility of producing a more predictive model, and this could compensate for the information loss. In our study, we analyzed the concordance and learned rules using the specificity of 32 organs and 43 morphological effects, rather than the 124 lesions. Moreover, for the reasons stated below, we pooled the data for both sexes and both species.

Table 3 Concordance of the presence or absence of a morphological effect in each of the 32 organs with the rodent carcinogenicity of 88 chemicals used in the study

| Organ | Present C | Present NC | Absent C | Absent NC | AC | SS | SP | χ² | Gain |
|---|---|---|---|---|---|---|---|---|---|
| Adrenal | 3 | 0 | 57 | 28 | 0.35 | 0.05 | 1.00 | 1.45 | 0.02 |
| Bone | 1 | 0 | 59 | 28 | 0.33 | 0.02 | 1.00 | 0.47 | 0.01 |
| Bone-marrow | 9 | 4 | 51 | 24 | 0.38 | 0.15 | 0.86 | 0.01 | 0.00 |
| Brain | 5 | 5 | 55 | 23 | 0.32 | 0.08 | 0.82 | 1.71 | 0.02 |
| Eye | 0 | 1 | 60 | 27 | 0.31 | 0.00 | 0.96 | 2.17 | 0.02 |
| Forestomach | 7 | 3 | 53 | 25 | 0.36 | 0.12 | 0.89 | 0.02 | 0.00 |
| Heart | 0 | 2 | 60 | 26 | 0.30 | 0.00 | 0.93 | 4.39 | 0.04 |
| Intestine | 3 | 0 | 57 | 28 | 0.35 | 0.05 | 1.00 | 1.45 | 0.02 |
| Kidney | 35 | 5 | 25 | 23 | 0.66 | 0.58 | 0.82 | 12.6 | 0.12 |
| Larynx | 1 | 1 | 59 | 27 | 0.32 | 0.02 | 0.96 | 0.31 | 0.00 |
| Liver | 29 | 4 | 31 | 24 | 0.60 | 0.48 | 0.86 | 9.44 | 0.09 |
| Lung | 3 | 1 | 57 | 27 | 0.34 | 0.05 | 0.96 | 0.09 | 0.00 |
| Lymph-node | 3 | 0 | 57 | 28 | 0.35 | 0.05 | 1.00 | 1.45 | 0.02 |
| Mammary | 1 | 0 | 59 | 28 | 0.33 | 0.02 | 1.00 | 0.47 | 0.01 |
| Muscle | 0 | 1 | 60 | 27 | 0.31 | 0.00 | 0.96 | 2.17 | 0.02 |
| Nasal cavity | 5 | 5 | 55 | 23 | 0.32 | 0.08 | 0.82 | 1.71 | 0.02 |
| Ovary | 4 | 2 | 56 | 26 | 0.34 | 0.07 | 0.93 | 0.01 | 0.00 |
| Pancreas | 2 | 0 | 58 | 28 | 0.34 | 0.03 | 1.00 | 0.96 | 0.01 |
| Rectum/anus | 0 | 1 | 60 | 27 | 0.31 | 0.00 | 0.96 | 2.17 | 0.02 |
| Salivary | 1 | 0 | 59 | 28 | 0.33 | 0.02 | 1.00 | 0.47 | 0.01 |
| Sciatic nerve | 1 | 0 | 59 | 28 | 0.33 | 0.02 | 1.00 | 0.47 | 0.01 |
| Skin | 1 | 1 | 59 | 27 | 0.32 | 0.02 | 0.96 | 0.31 | 0.00 |
| Spinal-cord | 1 | 0 | 59 | 28 | 0.33 | 0.02 | 1.00 | 0.47 | 0.01 |
| Spleen | 12 | 2 | 48 | 26 | 0.43 | 0.20 | 0.93 | 2.35 | 0.02 |
| Stomach-glandular | 1 | 3 | 59 | 25 | 0.30 | 0.02 | 0.89 | 3.60 | 0.03 |
| Testes | 13 | 4 | 47 | 24 | 0.42 | 0.22 | 0.86 | 0.67 | 0.01 |
| Thymus | 3 | 0 | 57 | 28 | 0.35 | 0.05 | 1.00 | 1.45 | 0.02 |
| Thyroid | 0 | 0 | 60 | 28 | 0.32 | 0.00 | 1.00 | - | 0.00 |
| Tooth | 0 | 0 | 60 | 28 | 0.32 | 0.00 | 1.00 | - | 0.00 |
| Trachea | 1 | 1 | 59 | 27 | 0.32 | 0.02 | 0.96 | 0.31 | 0.00 |
| Urinary-bladder | 3 | 1 | 57 | 27 | 0.34 | 0.05 | 0.96 | 0.09 | 0.00 |
| Uterus | 4 | 3 | 56 | 25 | 0.33 | 0.07 | 0.89 | 0.43 | 0.00 |

3. Study overview

The objective of our study was to test the potential of organ specific toxicity as predictors for rodent

carcinogenicity as well as non-carcinogenicity. Although the number of available chemicals (88) was relatively small, this was the first time that the organ specific toxicity data had been compiled in an organized and standardized manner. This provided a novel opportunity to explore the relationship between organ specific toxicity and rodent carcinogenicity. Also, while the cost of obtaining organ specific toxicity data is higher than that of short-term assays, it is still considered very economical compared to the enormous cost of a 2-year long-term animal bioassay. Thus, a predictive and significant relationship between organ specific toxicity and rodent carcinogenicity, if any, would be useful in reducing cost and duration.

Table 4 Concordance of each of the 43 morphological effects with the rodent carcinogenicity of 88 chemicals used in the study

| Morphological effect | Observed C | Observed NC | Not observed C | Not observed NC | AC | SS | SP | χ² | Gain |
|---|---|---|---|---|---|---|---|---|---|
| Apoptosis | 1 | 0 | 59 | 28 | 0.33 | 0.02 | 1.00 | 0.47 | 0.01 |
| Atrophy | 6 | 2 | 54 | 26 | 0.36 | 0.10 | 0.93 | 0.19 | 0.00 |
| Calculi | 0 | 1 | 60 | 27 | 0.31 | 0.00 | 0.96 | 2.17 | 0.02 |
| Carcinoma | 1 | 0 | 59 | 28 | 0.33 | 0.02 | 1.00 | 0.47 | 0.01 |
| Casts | 1 | 0 | 59 | 28 | 0.33 | 0.02 | 1.00 | 0.47 | 0.01 |
| Cellular-depletion | 4 | 3 | 56 | 25 | 0.33 | 0.07 | 0.89 | 0.43 | 0.00 |
| Congestion | 2 | 0 | 58 | 28 | 0.34 | 0.03 | 1.00 | 0.96 | 0.01 |
| Cytologic-alteration | 0 | 2 | 60 | 26 | 0.30 | 0.00 | 0.93 | 4.39 | 0.04 |
| Cytoplasmic-alteration | 2 | 0 | 58 | 28 | 0.34 | 0.03 | 1.00 | 0.96 | 0.01 |
| Cytoplasmic-vacuolization | 9 | 0 | 51 | 28 | 0.42 | 0.15 | 1.00 | 4.68 | 0.07 |
| Degeneration | 26 | 10 | 34 | 18 | 0.50 | 0.43 | 0.64 | 0.45 | 0.00 |
| Dilitation | 2 | 0 | 58 | 28 | 0.34 | 0.03 | 1.00 | 0.96 | 0.01 |
| Edema | 1 | 0 | 59 | 28 | 0.33 | 0.02 | 1.00 | 0.47 | 0.01 |
| Erosion/ulceration | 5 | 3 | 55 | 25 | 0.34 | 0.08 | 0.89 | 0.13 | 0.00 |
| Exudate | 2 | 0 | 58 | 28 | 0.34 | 0.03 | 1.00 | 0.96 | 0.01 |
| Fibrosis | 5 | 0 | 55 | 28 | 0.38 | 0.08 | 1.00 | 2.47 | 0.04 |
| Gliosis | 0 | 1 | 60 | 27 | 0.31 | 0.00 | 0.96 | 2.17 | 0.02 |
| Glomerulopathy | 1 | 0 | 59 | 28 | 0.33 | 0.02 | 1.00 | 0.47 | 0.01 |
| Granular-casts | 5 | 0 | 55 | 28 | 0.38 | 0.08 | 1.00 | 2.47 | 0.04 |
| Hematopoiesis | 8 | 2 | 52 | 26 | 0.39 | 0.13 | 0.93 | 0.73 | 0.01 |
| Hemorrhage | 1 | 0 | 59 | 28 | 0.33 | 0.02 | 1.00 | 0.47 | 0.01 |
| Hemosiderin-pigment | 6 | 1 | 54 | 27 | 0.38 | 0.10 | 0.96 | 1.08 | 0.01 |
| Hyperkeratosis | 3 | 2 | 57 | 26 | 0.33 | 0.05 | 0.93 | 0.16 | 0.00 |
| Hyperostosis | 1 | 1 | 59 | 27 | 0.32 | 0.02 | 0.96 | 0.31 | 0.00 |
| Hyperplasia | 21 | 7 | 39 | 21 | 0.48 | 0.35 | 0.75 | 0.88 | 0.01 |
| Hypertrophy | 11 | 2 | 49 | 26 | 0.42 | 0.18 | 0.93 | 1.90 | 0.02 |
| Hypoplasia | 1 | 0 | 59 | 28 | 0.33 | 0.02 | 1.00 | 0.47 | 0.01 |
| Inflammation | 16 | 12 | 44 | 16 | 0.36 | 0.27 | 0.57 | 2.31 | 0.02 |
| Karyomegaly | 13 | 1 | 47 | 27 | 0.45 | 0.22 | 0.96 | 4.67 | 0.05 |
| Metaphyseal-atrophy | 1 | 0 | 59 | 28 | 0.33 | 0.02 | 1.00 | 0.47 | 0.01 |
| Metaplasia | 4 | 4 | 56 | 24 | 0.32 | 0.07 | 0.86 | 1.34 | 0.01 |
| Mineralization | 9 | 2 | 51 | 26 | 0.40 | 0.15 | 0.93 | 1.08 | 0.01 |
| Mitotic-alteration | 1 | 0 | 59 | 28 | 0.33 | 0.02 | 1.00 | 0.47 | 0.01 |
| Necrosis | 23 | 9 | 37 | 19 | 0.48 | 0.38 | 0.68 | 0.32 | 0.00 |
| Nephropathy | 14 | 1 | 46 | 27 | 0.47 | 0.23 | 0.96 | 5.27 | 0.06 |
| Papilloma | 1 | 0 | 59 | 28 | 0.33 | 0.02 | 1.00 | 0.47 | 0.01 |
| Pigmentation | 11 | 0 | 49 | 28 | 0.44 | 0.18 | 1.00 | 5.87 | 0.08 |
| Protein-casts | 6 | 1 | 54 | 27 | 0.38 | 0.10 | 0.96 | 1.08 | 0.01 |
| Regeneration | 13 | 1 | 47 | 27 | 0.45 | 0.22 | 0.96 | 4.67 | 0.05 |
| Secondary-atrophy | 2 | 1 | 58 | 27 | 0.33 | 0.03 | 0.96 | 0.00 | 0.00 |
| Secondary-congestion | 0 | 1 | 60 | 27 | 0.31 | 0.00 | 0.96 | 2.17 | 0.02 |
| Syncytial-alteration | 2 | 2 | 58 | 26 | 0.32 | 0.03 | 0.93 | 0.64 | 0.01 |
| Vasculitis | 1 | 0 | 59 | 28 | 0.33 | 0.02 | 1.00 | 0.47 | 0.01 |

The study presented in this paper consisted of two parts. In the first part, we simply measured the concordance of each variable (124 lesions, 32 organs and 43 morphological effects) with the rodent carcinogenicity and non-carcinogenicity of the 88 chemicals. Four concordance measurements were reported: overall accuracy (AC, the fraction of correct classifications among all classifications), sensitivity (SS, the fraction of carcinogens which cause a lesion), specificity (SP, the fraction of non-carcinogens which do not cause a lesion), and a chi-square (χ²) value, which indicates how likely the observed concordance is to be due to chance (Klopman and Rosenkranz, 1991). The larger the χ² value, the more relevant the presence and absence of a lesion are to rodent carcinogenicity and non-carcinogenicity, respectively. χ² values of 3.84 and 6.63 indicate that there are 5% (i.e., a confidence level of 95%) and 1% (i.e., a confidence level of 99%) probabilities, respectively, that the observed concordance is due to chance. For a given type of lesion, a chemical was classified as a carcinogen if a lesion of that type was observed in tissues exposed to the chemical, and as a non-carcinogen otherwise. At the level of a given organ, a chemical was classified as a carcinogen if one or more morphological effects were observed in the organ, and as a non-carcinogen otherwise. Similarly, for a given morphological effect, a chemical was classified as a carcinogen if it was associated with the given effect in any organ, and as a non-carcinogen otherwise. We also measured the concordance of batteries of two variables with the rodent carcinogenicity of the 88 chemicals. In fact, we tried all possible batteries of two organs, two morphological effects, and pairs of an organ and a morphological effect.
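The four concordance measurements can be illustrated with a short Python sketch. It assumes that the χ² value is the ordinary Pearson statistic for a 2×2 table without continuity correction; the text does not state the exact formula, although this assumption reproduces the value reported for kidney in Table 3. The names `Counts` and `concordance` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass

# Critical chi-square values (1 degree of freedom) quoted in the text.
CHI2_P05 = 3.84
CHI2_P01 = 6.63


@dataclass(frozen=True)
class Counts:
    """2x2 table for one feature (lesion, organ, or morphological effect)."""
    c_present: int   # carcinogens showing the feature (correctly called positive)
    nc_present: int  # non-carcinogens showing the feature (false positives)
    c_absent: int    # carcinogens lacking the feature (false negatives)
    nc_absent: int   # non-carcinogens lacking the feature (correctly called negative)


def concordance(t: Counts) -> dict:
    """Return accuracy (AC), sensitivity (SS), specificity (SP) and chi-square."""
    n = t.c_present + t.nc_present + t.c_absent + t.nc_absent
    carcinogens = t.c_present + t.c_absent
    non_carcinogens = t.nc_present + t.nc_absent
    present = t.c_present + t.nc_present
    absent = t.c_absent + t.nc_absent
    # Assumed form: Pearson chi-square for a 2x2 table, no continuity correction.
    cross = t.c_present * t.nc_absent - t.c_absent * t.nc_present
    denom = carcinogens * non_carcinogens * present * absent
    chi2 = n * cross ** 2 / denom if denom else float("nan")
    return {
        "AC": (t.c_present + t.nc_absent) / n,
        "SS": t.c_present / carcinogens,
        "SP": t.nc_absent / non_carcinogens,
        "chi2": chi2,
    }


# Kidney row of Table 3: 35 C and 5 NC with an effect, 25 C and 23 NC without.
kidney = concordance(Counts(35, 5, 25, 23))
print({name: round(value, 2) for name, value in kidney.items()})
print("exceeds the 1% critical value:", kidney["chi2"] > CHI2_P01)
```

With the kidney counts, the sketch gives AC, SS and SP of 0.66, 0.58 and 0.82 and a χ² of about 12.6, in line with the tabulated row. When a feature is observed in no chemical at all, the denominator is zero and the statistic is undefined, which the sketch reports as NaN.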
In addition, using an information theoretic entropy measure, we also measured the expected amount of information gain (labeled Gain in Tables 3 and 4) that each variable provided in classifying the 88 chemicals.

In the second part of the present study, we used the RL induction program to learn rules predicting rodent carcinogenicity and non-carcinogenicity using the organ specific toxicity as well as the responses in the Salmonella mutagenicity assay. As mentioned previously, with only 88 chemicals available, using the presence or absence of the 124 lesions directly was too specific for RL to learn general rules. To reduce the number of variables and also to make the organ specific toxicity data more general, we reorganized the organ specific toxicity data by separating organ specificity and morphological effects, as discussed in the previous section. This resulted in a total of 75 variables corresponding to 32 organs and 43 morphological effects rather than 124 variables for 124 lesions. It should be noted that in reducing the number of variables, we decided to pool the results irrespective of species and gender. This was done for two reasons: (a) to decrease the number of variables (see above), and (b) because we were interested in developing an algorithm that was predictive of ‘carcinogenicity in rodents’ rather than of carcinogenicity in a specific species or a specific gender thereof. Moreover, we did not address the possible correlation of organ-specific toxicity with organ-specific carcinogenicity, as this has been studied by a number of other investigators, and we were guided by the assumption that while carcinogenicity in rodents might indicate a carcinogenic risk to humans, there appears to be no correlation between the site-specificity of carcinogenicity in rodents and in humans.

The 10-fold cross validation method was used to learn and test rules. The 88 chemicals were divided randomly into ten mutually exclusive sets of approximately equal size. In each trial, rules were learned from nine of the sets and tested on the remaining set, yielding ten trials of learning and testing. In each trial, as the result of testing rules, four performance statistics were measured: sensitivity, specificity, accuracy, and the percentage of chemicals in a test set for which the learned rules made any predictions.
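The partitioning step of the cross validation described above can be sketched as follows. This is only an illustration under stated assumptions: the function name `ten_fold_splits`, the fixed random seed, and the integer placeholders standing in for the 88 chemicals are all hypothetical, and the rule learner itself is not shown.

```python
import random


def ten_fold_splits(items, k=10, seed=0):
    """Yield (train, test) pairs from k disjoint folds of nearly equal size."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # random assignment of items to folds
    folds = [shuffled[i::k] for i in range(k)]  # fold sizes differ by at most one
    for i, test in enumerate(folds):
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, test


chemicals = list(range(88))  # placeholder identifiers for the 88 chemicals
for trial, (train, test) in enumerate(ten_fold_splits(chemicals), start=1):
    # A learner would be fitted on `train` and evaluated on `test` here.
    print(trial, len(train), len(test))
```

With 88 items and ten folds, eight test sets hold nine chemicals and two hold eight, so every chemical is used for testing exactly once.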
The averages of these four measurements over the test sets of the ten trials were then reported as the result of one experiment (cross validation). When learned rules failed to make any predictions, default predictions were made based on the genotoxicity, which in turn was determined by the responses in the Salmonella mutagenicity assay (SAL). That is, the following two rules were used to make default predictions.

IF - (SAL, +) - THEN - (Rodent C)
IF - (SAL, -) - THEN - (Rodent NC)

When the default predictions were not used, the


three performance measurements, accuracy, sensitivity and specificity are referred to as AC,,, SS, and SP, , respectively. When the default predictions were used for cases for which rules learned by RL failed to make predictions, the three measurements are referred to as AC,, SS, and SP,.
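The evaluation protocol described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the chemical records (dictionaries with hypothetical keys `sal` and `label`), the `learn` callable standing in for RL, and the function names are all invented for this sketch and are not part of the RL program. A learned predictor returns "C", "NC", or None when no rule fires; the default rules then fall back on the SAL response.

```python
import random

def ten_fold_splits(n_items, n_folds=10, seed=0):
    """Shuffle item indices and deal them into n_folds disjoint test sets."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    return [indices[i::n_folds] for i in range(n_folds)]

def default_prediction(sal_response):
    """Default rules: SAL '+' -> carcinogen, SAL '-' -> non-carcinogen."""
    return {"+": "C", "-": "NC"}.get(sal_response)  # None if marginal/unknown

def evaluate(test_set, predict, use_default):
    """Sensitivity, specificity, accuracy, and fraction covered by learned rules."""
    tp = tn = fp = fn = covered = 0
    for chem in test_set:
        guess = predict(chem)
        if guess is not None:
            covered += 1                      # a learned rule made a prediction
        elif use_default:
            guess = default_prediction(chem["sal"])
        if guess is None:
            continue                          # no prediction at all
        if chem["label"] == "C":
            tp += guess == "C"
            fn += guess == "NC"
        else:
            tn += guess == "NC"
            fp += guess == "C"
    ratio = lambda num, den: num / den if den else None
    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "predicted": ratio(covered, len(test_set)),
    }

def cross_validate(chemicals, learn, use_default, n_folds=10, seed=0):
    """Average the four statistics over the test sets of all trials."""
    results = []
    for test_idx in ten_fold_splits(len(chemicals), n_folds, seed):
        held_out = set(test_idx)
        train = [c for i, c in enumerate(chemicals) if i not in held_out]
        test = [chemicals[i] for i in test_idx]
        results.append(evaluate(test, learn(train), use_default))
    averages = {}
    for key in results[0]:
        values = [r[key] for r in results if r[key] is not None]
        averages[key] = sum(values) / len(values) if values else None
    return averages
```

Calling `cross_validate` once with `use_default=False` and once with `use_default=True` would give the two sets of performance measurements distinguished in the text.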

4. Concordance with rodent carcinogenicity

4.1. Concordance of SAL

Table 5 shows the concordance between the responses in the Salmonella mutagenicity assay and the rodent carcinogenicity of 88 chemicals. ‘M’ and ‘?’ indicate marginal and unknown responses, respectively. Three non-carcinogens were marginal and for one carcinogen, the response in SAL was not known. These four chemicals were excluded when calculating concordance. Of the remaining 59 rodent carcinogens, only 28 were positive in SAL (47.5% sensitivity). On the other hand, 22 of 25 rodent non-carcinogens were negative in SAL (88.0% specificity). Overall, the responses in SAL were only 60% (50 of 84) accurate in predicting rodent carcinogenicity of the 84 chemicals. While this concordance was significant with a χ² value of 9.48, note that the 60% concordance is below the prevalence of carcinogens among the 84 chemicals, which is 70.2% (59 of 84). There were 31 genotoxic (i.e., positive in SAL) chemicals and most of them (28) were rodent carcinogens (90.3% positive predictive value). However, there were more non-genotoxic (i.e., negative in SAL) carcinogens (31) than non-genotoxic non-carcinogens (22), yielding less than 50% negative predictive value. Thus, the poor concordance of the responses in SAL was mainly due to the large number of non-genotoxic carcinogens. Such findings have been reported previously (Ashby and Tennant, 1991), and also motivated one of our previous studies (Lee et al., 1995).

Table 5
Concordance between the responses in the Salmonella mutagenicity assay (SAL) and rodent carcinogenicity of 88 chemicals (60 carcinogens and 28 non-carcinogens). Four chemicals whose responses in SAL were marginal (‘M’) or unknown (‘?’) are excluded from calculating concordance

Rodent              Responses in SAL
carcinogenicity     +      -      M      ?
C                   28     31     0      1
NC                  3      22     3      0

Sensitivity                  47.5% (28/59)
Specificity                  88.0% (22/25)
Overall accuracy             59.5% (50/84)
Positive predictive value    90.3% (28/31)
Negative predictive value    41.5% (22/53)
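The concordance measures in Table 5 follow directly from the four cell counts. The sketch below recomputes them in Python; the function name `concordance` and its argument names are illustrative only. The χ² statistic is assumed here to be the Pearson statistic for a 2×2 table without continuity correction, an assumption that reproduces the reported value of 9.48.

```python
def concordance(tp, fn, fp, tn):
    """Concordance measures for a 2x2 table.

    tp: carcinogens with a positive response, fn: carcinogens with a negative one,
    fp: non-carcinogens with a positive response, tn: non-carcinogens with a negative one.
    """
    n = tp + fn + fp + tn
    # Pearson chi-square for a 2x2 table, no continuity correction (assumption)
    chi2 = n * (tp * tn - fn * fp) ** 2 / (
        (tp + fn) * (fp + tn) * (tp + fp) * (fn + tn)
    )
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (fp + tn),
        "accuracy": (tp + tn) / n,
        "ppv": tp / (tp + fp),
        "npv": tn / (fn + tn),
        "chi2": chi2,
    }

# Cell counts from Table 5 (marginal and unknown responses excluded)
sal = concordance(tp=28, fn=31, fp=3, tn=22)
for name, value in sal.items():
    print(f"{name}: {value:.3f}")
```

The same function applies to any single variable or battery in the following sections, for example `concordance(tp=35, fn=25, fp=5, tn=23)` for the presence of any effect in kidney.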

4.2. Concordance of 124 lesions

Table 2 shows the concordance of the presence or absence of each of the 124 lesions with the rodent carcinogenicity and non-carcinogenicity of the 88 chemicals. Given a lesion, a chemical is predicted as a rodent carcinogen if it caused that lesion. Otherwise, it is predicted as a non-carcinogen. For example, regeneration in kidney was caused by 13 carcinogens (i.e., 13/60 = 22% sensitivity) and was not caused by 27 non-carcinogens (i.e., 27/28 = 96% specificity). There were only six lesions associated with at least 10 carcinogens each. Nephropathy was associated with the most carcinogens (14) and was also the most accurate (47%); it had a χ² value of 5.23 (i.e., better than the 95% confidence level). Necrosis in liver was also 47% accurate but with an even higher χ² value (7.12). The lesions with χ² values greater than 3.84 (95% confidence level) are shown in bold in Table 2. There were 85 lesions which were caused by more carcinogens than non-carcinogens. Especially for the lesions related to kidney (lesions 22 to 38 in Table 2), liver (lesions 41 to 55), and spleen (lesions 93 to 99), the number of carcinogens causing one or more of those lesions was dominant. For example, of the 60 carcinogens, 35 caused one or more lesions related to kidney, but only 5 (of 28) non-carcinogens did so. This, in fact, indicates that a chemical is likely to be a carcinogen when it causes a lesion in kidney, regardless of the specificity of a morphological effect. Similarly, there are some morphological effects whose observation is significant regardless of the organ site. For example, the observation of karyomegaly was made in three organs, kidney (lesion 32), liver (lesion 50) and lung (lesion


59). However, because only one non-carcinogen caused karyomegaly, and only in kidney, if a chemical caused karyomegaly, it is likely to be a carcinogen, regardless of which organ was affected. In other words, the 124 lesions can be reorganized by separating organ specificity and morphological effect, as discussed in the previous section, and the concordance of each of 32 organs and 43 morphological effects can be measured as discussed below.

4.3. Concordance of 32 organs

Table 3 shows the concordance of the presence or absence of one or more morphological effects in each of the 32 organs with rodent carcinogenicity and non-carcinogenicity. Given an organ, regardless of the type of morphological effects, a chemical causing any effect in that organ is predicted a carcinogen, and if no effects were observed, it is predicted a non-carcinogen. For example, in kidney, 35 of the 60 carcinogens and five of the 28 non-carcinogens caused some effects, yielding 58% sensitivity (35/60), 82% specificity (23/28) and 66% overall concordance (58/88). Assuming that a 95% confidence level (χ² value of 3.84) is the lower limit of an acceptable concordance, those whose χ² values are greater than 3.84 are shown in bold face in Table 3. Of the 32 organs, there were only three organs, i.e., heart, kidney and liver, for which the presence or absence of a morphological effect is indicative of rodent carcinogenicity with χ² values greater than 3.84 (95% confidence level). For the kidney, more than half of the tested carcinogens (35 of 60) caused one or more morphological effects, and the observation of any morphological effect was highly indicative of rodent carcinogenicity. Also, for the liver, 29 of the 60 carcinogens caused one or more morphological effects. While the presence or absence of a morphological effect in the kidney and liver appears indicative of rodent carcinogenicity and non-carcinogenicity (i.e., high χ² values), respectively, neither of them alone is more accurate than the prevalence of carcinogens among the chemicals. That is, while the presence and absence of a morphological effect in kidney and liver were 66% and 60% accurate, respectively, these numbers were lower than the prevalence of carcinogens, 68% (60/88), among the 88 chemicals. Then, one might ask whether using both organs (kidney and liver) will increase the concordance. Or, more generally, is there a battery of two organs which is more accurate than 68%? To answer this, we tested the concordance of all possible batteries of two organs as follows: given two organs, a chemical is predicted to be carcinogenic if it caused any morphological effect in either organ; otherwise, it is predicted as non-carcinogenic. A total of 496 distinct batteries consisting of two organs can be assembled from the 32 organs. The eleven most accurate batteries (i.e., 11 organ pairs with the highest accuracy) are shown in Table 6, together with their concordance measurements (accuracy, specificity, sensitivity and χ² values). The most predictive battery of two organs consisted of kidney and liver. Of the 60 carcinogens, 45 caused one or more morphological effects in either kidney or liver, resulting in a sensitivity of 75%. Of the 28 non-carcinogens, 20 did not cause any effect in either organ, resulting in a specificity of 71.4%. Overall, the battery of kidney and liver was 74% accurate (65/88) with a χ² value of 17.18. All 11 batteries shown in Table 6 included kidney and had high χ² values. However, except for liver (in battery-1), it is hard to conclude that including other organs increased the predictive strength of using the kidney alone. Recall that the kidney alone was 66% accurate (58/88) with a χ² value of 12.6 (Table 3). In other words, the high predictive strength of batteries other than the first one in Table 6 is mainly due to the presence or absence of an effect in the kidney, which is already highly indicative of rodent carcinogenicity and non-carcinogenicity, respectively, and not because the other organs paired with kidney are predictive.

Table 6
The 11 most accurate batteries of two organs and their concordance with rodent carcinogenicity of 88 chemicals used in the study. The concordance is measured such that given two organs, a chemical is predicted as a carcinogen if it causes an effect in either organ and as a non-carcinogen if it does not cause an effect in either organ

Battery of two organs        Observed      Not observed    Concordance
                             C      NC     C      NC       SS      SP      AC      χ²
Kidney, Liver                45     8      15     20       0.75    0.71    0.74    17.18
Kidney, Testes               41     8      19     20       0.68    0.71    0.69    12.23
Kidney, Intestine            37     5      23     23       0.62    0.82    0.68    14.69
Kidney, Adrenal              36     5      24     23       0.60    0.82    0.67    13.63
Kidney, Bone                 36     5      24     23       0.60    0.82    0.67    13.63
Kidney, Spinal-cord          36     5      24     23       0.60    0.82    0.67    13.63
Kidney, Thymus               36     5      24     23       0.60    0.82    0.67    13.63
Kidney, Urinary-bladder      36     5      24     23       0.60    0.82    0.67    13.63
Kidney, Lymph-node           36     5      24     23       0.60    0.82    0.67    13.63
Kidney, Sciatic-nerve        36     5      24     23       0.60    0.82    0.67    13.63
Kidney, Mammary              36     5      24     23       0.60    0.82    0.67    13.63

4.4. Concordance of 43 morphological effects

Table 4 shows the concordance of the presence or absence of each of 43 morphological effects in any organ with rodent carcinogenicity and non-carcinogenicity. That is, given a morphological effect, a chemical is classified as a carcinogen if it caused the effect in one or more organs. Otherwise, it is classified as a non-carcinogen. For example, as shown in Table 4, 26 (of 60) carcinogens caused degeneration in some organs (43% sensitivity), and 18 (of 28) non-carcinogens did not cause degeneration in any organ (64% specificity). Of the 43 morphological effects, there are only six (cytologic alteration, cytoplasmic vacuolization, karyomegaly, nephropathy, pigmentation, regeneration) for which their presence or absence is indicative of rodent carcinogenicity or non-carcinogenicity, with χ² values greater than 3.84. It is interesting to note that some of these six morphological effects, such as pigmentation and cytoplasmic vacuolization, are considered ‘irrelevant’ to carcinogenicity (personal communication). Yet, for the 88 chemicals used in the study, pigmentation and cytoplasmic vacuolization appeared indicative of rodent carcinogenicity, because pigmentation and cytoplasmic vacuolization were associated with 11 and 9 carcinogens, respectively, and not a single non-carcinogen. On the other hand, it should be noted that the 11 carcinogens which caused pigmentation also caused other morphological effects such as nephropathy, necrosis and hyperplasia, in either liver or kidney. Recall that Table 3 showed that any morphological effect in liver or kidney was highly indicative of rodent carcinogenicity. Thus, the reasons the 11 chemicals were found carcinogenic in rodents might not be related to pigmentation, but rather they may result from toxicity in liver or kidney. Also, of the 9 carcinogens causing cytoplasmic vacuolization, 8 caused morphological effects other than cytoplasmic

Table 7
The seven most accurate batteries of two morphological effects and their concordance to rodent carcinogenicity of 88 chemicals used in the study. The concordance is measured such that a chemical is predicted as a carcinogen if it causes either effect in any organ and as a non-carcinogen if it does not cause either effect

Two morphologies               Observed      Not observed    Concordance
                               C      NC     C      NC       SS      SP      AC      χ²
Degeneration, Nephropathy      37     11     23     17       0.62    0.61    0.61    3.86
Hyperplasia, Necrosis          36     13     24     15       0.60    0.54    0.58    1.43
Regeneration, Karyomegaly      24     2      36     26       0.40    0.93    0.57    9.90
Necrosis, Regeneration         32     10     28     18       0.53    0.64    0.57    2.38
Hyperplasia, Nephropathy       30     8      30     20       0.50    0.71    0.57    3.58
Degeneration, Regeneration     32     10     28     18       0.53    0.64    0.57    2.38
Degeneration, Karyomegaly      32     10     28     18       0.53    0.64    0.57    2.38


vacuolization and pigmentation in liver or kidney. The remaining one carcinogen did not cause any effect other than the cytoplasmic vacuolization in the liver. We also measured the concordance of the presence or absence of batteries of two morphological effects in any organ in the following manner: given two morphological effects, if a chemical caused either of them in any organ, it is predicted to be a rodent carcinogen, and otherwise, it is predicted to be a rodent non-carcinogen. There are a total of 903 distinct batteries consisting of two morphological effects and Table 7 shows the concordance of the seven most accurate ones. Of the seven batteries shown in Table 7, degeneration and regeneration were present in three batteries each. In fact, only six morphological effects were used in the seven most accurate batteries. Overall, the predictive strengths of batteries of two morphological effects are inferior to those of the batteries of two organs shown in Table 6. The most accurate battery consisted of degeneration and nephropathy (battery-1) with 61% accuracy (54/88), while the battery of liver and kidney was 74% accurate (65/88). Also, the χ² value of the battery of degeneration and nephropathy was only 3.86, while that for kidney and liver was 17.18. Although not shown in Table 7, the battery of morphological effects with the highest χ² value consisted of cytoplasmic vacuolization and pigmentation, which was 53% accurate (32% sensitivity and 100% specificity) with a χ² value of 11.3. When the concordance measurements of batteries of two morphological effects are compared to those of individual effects shown in Table 4, it is observed that including an additional morphological effect increased the predictive strength, especially sensitivity (SS) and overall accuracy (AC), over using a single morphological effect. For example, both karyomegaly and regeneration were 45% accurate with 22% sensitivity and 96% specificity (Table 4). On the other hand, the battery of these two (battery-3) was 56.8% accurate with 40% sensitivity and 92.9% specificity. Also, the χ² value was increased from 4.67 to 9.90. While the sensitivity and accuracy increased, the specificity in general decreased when the presence or absence of two morphological effects were combined.

4.5. Concordance of batteries of organ and morphology pairs

In the previous sections, we analyzed the concordance of batteries of organs and morphological effects separately. In this section, we examine the predictive strength of batteries, each consisting of an organ and a morphological effect. Note that each lesion also associates an organ with a morphological effect. Given an organ and a morphological effect (which are not necessarily related to each other), a chemical is predicted as carcinogenic if it caused any effect in the given organ or the given effect was observed in any organ. Otherwise, it is predicted non-carcinogenic. There are a total of 1376 distinct batteries consisting of an organ and a morphological effect, and Table 8 shows the five most accurate ones along with their concordance measurements. The most predictive battery consisted of kidney and cytoplasmic vacuolization, which classified 62 (of 88) chemicals correctly (71% accuracy). However, this is lower than the 74% accuracy obtained by the battery consisting of kidney and liver (Table 6). As

Table 8
The five most accurate batteries consisting of an organ and a morphological effect. The concordance is measured such that a chemical is predicted as a carcinogen if it caused any effect in a specified organ or a specified effect in any organ

Organ and morphology pair            Observed      Not observed    Concordance
                                     C      NC     C      NC       SS      SP      AC      χ²
Kidney, Cytoplasmic vacuolization    39     5      21     23       0.65    0.82    0.71    16.97
Kidney, Degeneration                 44     11     16     17       0.73    0.61    0.69    9.44
Liver, Regeneration                  37     4      23     24       0.62    0.86    0.69    17.22
Kidney, Karyomegaly                  37     5      23     23       0.62    0.82    0.68    14.69
Kidney, Hypertrophy                  39     7      21     21       0.65    0.75    0.68    12.24


mentioned previously, cytoplasmic vacuolization is considered ‘irrelevant’ to carcinogenesis (personal communication). Note that while a lesion is also a pair of an organ and a morphological effect, it is not exactly the same as the battery consisting of an organ and a morphological effect. For example, while the battery of kidney and cytoplasmic vacuolization means any effect (including cytoplasmic vacuolization) in kidney or cytoplasmic vacuolization in any organ (including kidney), the lesion ‘kidney cytoplasmic vacuolization’ refers to the observation of cytoplasmic vacuolization only in kidney.

4.6. Information gain

In the previous sections, we measured the concordance of each feature (organ specificity and morphological effects) as well as batteries of two variables. The significance of predictive strength of each feature or battery was measured by χ² values. Tables 3 and 4 also include another measurement called RGain (in the last column) which shows the classification strength of each variable using an information theoretic entropy measure. Let us first briefly describe the information theoretic entropy measure. In general, given training data, T, containing P examples (chemicals in our study) of a class, C, and Q examples of a class, NC, the expected amount of information, I(P,Q), needed to classify the examples correctly is:

$$I(P,Q) = -\frac{P}{P+Q}\log_2\frac{P}{P+Q} - \frac{Q}{P+Q}\log_2\frac{Q}{P+Q}$$

For example, classifying the 88 chemicals consisting of 60 carcinogens and 28 non-carcinogens would require 0.902:

$$I(60,28) = -\frac{60}{60+28}\log_2\frac{60}{60+28} - \frac{28}{60+28}\log_2\frac{28}{60+28} = 0.902$$

Note that the entropy measure can be used for any number of classes. When a variable, M, with values v_1, ..., v_n is used to classify the examples, it generates n features, (M v_1), (M v_2), ..., and (M v_n). These n features actually partition the training data into n disjoint sets, T_1, ..., and T_n, where T_i contains examples whose value of M is v_i. Let P_i and Q_i be the number of examples of classes C and NC, respectively, in T_i. Then, the expected amount of information required to classify the examples in T_i is:

$$I(P_i,Q_i) = -\frac{P_i}{P_i+Q_i}\log_2\frac{P_i}{P_i+Q_i} - \frac{Q_i}{P_i+Q_i}\log_2\frac{Q_i}{P_i+Q_i}$$

The expected amount of information required to classify examples in n disjoint sets (T_1, ..., T_n) partitioned by the values of a variable, M, denoted E(M), is then obtained as the weighted average of information required for each set T_i:

$$E(M) = \sum_{i=1}^{n} \frac{P_i+Q_i}{P+Q}\, I(P_i,Q_i)$$
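The two formulas above, together with the relative information gain (I(P,Q) - E(M)) / I(P,Q) reported in Tables 3 and 4, can be checked numerically. The following Python sketch is illustrative only; the function names are invented here, and the counts are those of the kidney example used in this section (35 carcinogens and 5 non-carcinogens with an effect in kidney, 25 and 23 without).

```python
from math import log2

def information(p, q):
    """I(P,Q): expected information (in bits) to classify P and Q examples."""
    total = p + q
    # Terms with a zero count contribute nothing (0 * log 0 is taken as 0)
    return -sum(k / total * log2(k / total) for k in (p, q) if k > 0)

def expected_information(partitions):
    """E(M): weighted average of I(P_i, Q_i) over the sets (P_i, Q_i)."""
    total = sum(p + q for p, q in partitions)
    return sum((p + q) / total * information(p, q) for p, q in partitions)

def relative_gain(partitions):
    """(I(P,Q) - E(M)) / I(P,Q) for the partition induced by a variable M."""
    p = sum(pi for pi, _ in partitions)
    q = sum(qi for _, qi in partitions)
    base = information(p, q)
    return (base - expected_information(partitions)) / base

# Kidney: '+' partition holds 35 C and 5 NC, '-' partition holds 25 C and 23 NC
kidney = [(35, 5), (25, 23)]
print(f"I(60,28)      = {information(60, 28):.3f}")
print(f"E(kidney)     = {expected_information(kidney):.3f}")
print(f"relative gain = {relative_gain(kidney):.3f}")
```

Under these definitions the sketch agrees with the figures quoted in the text to within rounding.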

If E(M) is less than I(P,Q), it indicates that partitioning examples using a variable M according to its values requires less information. In other words, the distribution of examples in each partition is different from that in the original training data. The lower the value of E(M), the more a variable M contributes to classifying examples. On the other hand, if E(M) is not less than I(P,Q), the attribute M is useless and not helpful in classifying examples. In other words, the difference I(P,Q) - E(M) is the amount of information gained when the variable M is used to classify examples. Measurements shown in Tables 3 and 4 are relative information gain, which is obtained by dividing I(P,Q) - E(M) by I(P,Q). For example, let us consider a variable, kidney, with two possible values {+, -}, where ‘+’ means the presence of a morphological effect in kidney and ‘-’ means no effects in kidney. There are 35 carcinogens and 5 non-carcinogens which caused at least one morphological effect in kidney, and 25 carcinogens and 23 non-carcinogens which did not cause any effect in kidney. Thus, when the 88 chemicals are further partitioned according to the observation of any morphological effect in kidney, the expected information required is:

$$E(\mathrm{kidney}) = \frac{35+5}{60+28}\,I(35,5) + \frac{25+23}{60+28}\,I(25,23) = 0.792$$

Thus, the amount of information gained when the


observation of any morphological effect in kidney is used to partition the 88 chemicals, is 0.903 - 0.792, or 0.111, which is about 12.3% (0.111 of 0.903) of the amount of total information required to classify 60 carcinogens and 28 non-carcinogens. For both information gain and χ² value, the greater the magnitude, the more a variable is relevant to classifying the chemicals. In fact, as seen in Tables 3 and 4, the information gain and χ² values appear correlated (i.e., the higher the information gain, the larger the χ² value). The variable which provides the largest information gain (12.3%) was the observation of any morphological effect in kidney. Previously, we already discussed the presence or absence of any effect in kidney being the most significant among the 75 variables, as shown by its highest χ² value. However, its information gain is only 12%, which indicates that other features are required to be more accurate. In other words, while the observation of a morphological effect in kidney seems significant, by itself it does not contribute much to classifying the 88 chemicals. There are no other variables which provide more than 10% information gain. Of the 75 attributes (32 organs and 43 morphological effects), only 36 attributes have at least 1% information gain. So far in this section, we analyzed the concordance of each variable as well as batteries of two variables. What does, then, the above analysis mean to inductive learning of rules? In general, the RL program searches for rules that are more general (i.e., covering more chemicals correctly) and more accurate (i.e., higher ratio between true positives and false positives). Thus, for rodent carcinogenicity, the features most likely to be used in rules include those organs in which many carcinogens had effects and those morphological effects which were caused by a large number of carcinogens.
These include the presence or absence of an effect in kidney, liver, spleen and testes, in which a relatively large number of carcinogens caused one or more morphological effects. Also, the morphological effects such as degeneration, cytoplasmic vacuolization, hyperplasia, inflammation, karyomegaly, necrosis, nephropathy, pigmentation and regeneration are likely to be used in rules. Note that the observation that a morphological effect appears indicative of rodent carcinogenicity does not necessarily mean that it will always be included in


learned rules. For example, while the pigmentation appeared very relevant to rodent carcinogenicity (Table 4), it may not be used at all in rules learned by RL because all chemicals causing pigmentation also affected either liver or kidney and the observation of a morphological effect in liver or kidney is more indicative of rodent carcinogenicity than the observation of pigmentation. Of the batteries of two variables, the battery of kidney and liver was 74% accurate, which is significantly better than the prevalence of carcinogens among the 88 chemicals (68%) and the concordance of the responses in SAL (60%). Thus, the concordance of the battery of kidney and liver provides the minimum accuracy that a rule set learned by RL should achieve.
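The battery analysis of Sections 4.3 to 4.5 amounts to enumerating all pairs of variables and scoring each pair with an OR rule. A minimal Python sketch follows. The data layout (a dictionary mapping each chemical to the set of variables for which an effect was observed), the function names, and the six toy chemicals are all hypothetical and are not the study's data.

```python
from itertools import combinations

def battery_concordance(effects, labels, pair):
    """Score an OR-battery: predict 'C' if either variable in pair is observed."""
    tp = fn = fp = tn = 0
    for chem, label in labels.items():
        positive = any(var in effects[chem] for var in pair)
        if label == "C":
            tp += positive
            fn += not positive
        else:
            fp += positive
            tn += not positive
    return {
        "pair": pair,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (fp + tn),
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
    }

def rank_batteries(effects, labels, variables):
    """Evaluate every battery of two variables, most accurate first."""
    scored = [battery_concordance(effects, labels, pair)
              for pair in combinations(variables, 2)]
    return sorted(scored, key=lambda s: s["accuracy"], reverse=True)

# Toy data (hypothetical): variables with an observed effect per chemical
effects = {
    "chem_a": {"kidney"}, "chem_b": {"liver"}, "chem_c": {"kidney", "liver"},
    "chem_d": {"spleen"}, "chem_e": set(), "chem_f": {"spleen"},
}
labels = {"chem_a": "C", "chem_b": "C", "chem_c": "C",
          "chem_d": "C", "chem_e": "NC", "chem_f": "NC"}

best = rank_batteries(effects, labels, ["kidney", "liver", "spleen"])[0]
print(best["pair"], round(best["accuracy"], 2))

# Number of distinct two-variable batteries for 32 organs and 43 effects
print(len(list(combinations(range(32), 2))), len(list(combinations(range(43), 2))))
```

For batteries of one organ and one morphological effect, the same scoring function could be applied to pairs drawn from `itertools.product` instead of `combinations`.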

5. Learning rules

5.1. Assumptions

The RL induction program searches for rules by generating and evaluating many combinations of features. It starts with single features and successively specializes rules by adding features or specializing values associated with features. The choice and order of feature combinations are in general dependent on their concordance with training data. For example, a combination of features with higher positive predictive value may be evaluated prior to other combinations of features. On the other hand, it is also possible to provide RL with a set of assumptions which guide RL such that it will include or exclude specific types of rules. For example, in our previous study of predicting rodent carcinogenicity of nongenotoxic chemicals (Lee et al., 1995), RL was provided with an assumption that “rules including the responses in short-term assays in their conditions are preferred to those which do not include the responses in short-term assays”. This in fact was equivalent to assigning a higher importance to the short-term assays. In the present study, we tested a similar assumption involving the presence or absence of a morphological effect in kidney and liver. Rather than trying combinations of any features in rule formation, we provided RL with a strong bias which directed the rule search.


The choice of liver and kidney was made because these organs were the two most accurate variables (Table 3) and the battery of these two organs was even more accurate in classifying the 88 chemicals (Table 6): the battery of kidney and liver correctly classified 45 of 60 carcinogens and 20 of 28 non-carcinogens. Also, we chose the liver and kidney because we wanted to test a rather specific hypothesis: ‘RL can learn a set of rules which is more accurate than the concordance of the battery of liver and kidney.’ Of the 60 carcinogens, 15 carcinogens were misclassified by the battery of liver and kidney, because these 15 chemicals did not cause any effect in either kidney or liver. Also, 8 non-carcinogens were misclassified because they caused some effects in either liver or kidney. Thus, to increase the accuracy, it is necessary to correctly classify carcinogens which caused no effects in either kidney or liver (rather than simply classifying them as non-carcinogens because of no effects), and non-carcinogens which caused some effects in liver or kidney (rather than simply classifying them as carcinogens because of some effects). For this purpose, we provided the RL program with the following assumptions:

- The presence of one or more morphological effects in kidney should not be used to predict that a chemical is a non-carcinogen.
- The presence of one or more morphological effects in liver should not be used to predict that a chemical is a non-carcinogen.
- The absence of a morphological effect in kidney or liver can be used to predict that a chemical is a carcinogen, only if that observation is accompanied with the presence or absence of a morphological effect in other organs, or the presence or absence of a specific morphological effect in any organ.
- For a chemical to be a non-carcinogen, it should have no effects in at least one of liver or kidney.
The above is a strong bias, because it forces RL to learn rules, all of which include either the presence or absence of a morphological effect in liver or kidney in their conditions. In particular, the last assumption seems very strong, because it assumes that for a non-carcinogen to be correctly predicted a non-carcinogen, it should not cause a morphological effect in at least one of liver and kidney. On the other hand, from the analysis provided in the previous section, the above is a reasonable assumption and provides a more direct way for testing if there are other features which can be used together with the observations of morphological effects in kidney and liver, providing better predictive strength.
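One way to read the four assumptions is as a filter on candidate rules. The Python sketch below encodes that reading; it is an interpretation made for illustration, not the way RL's partial domain model actually represents constraints. A rule is assumed to be a dictionary of conditions (feature name to '+' or '-') plus a predicted class, and the names `FOCUS_ORGANS`, `satisfies_assumptions` and `matches` are hypothetical.

```python
FOCUS_ORGANS = ("Kidney", "Liver")

def satisfies_assumptions(conditions, predicted_class):
    """Check a candidate rule against one reading of the four assumptions."""
    focus = {f: v for f, v in conditions.items() if f in FOCUS_ORGANS}
    others = {f: v for f, v in conditions.items() if f not in FOCUS_ORGANS}
    if not focus:
        # Every learned rule must mention kidney or liver in its conditions
        return False
    if predicted_class == "NC":
        # Assumptions 1, 2 and 4: no '+' on kidney or liver may support NC,
        # so at least one of the two organs must be stated as '-'
        return all(value == "-" for value in focus.values())
    if "-" in focus.values():
        # Assumption 3: absence in kidney or liver may support C only when
        # accompanied by a condition on another organ or a specific effect
        return bool(others)
    return True

def matches(conditions, chemical):
    """A rule fires when every condition agrees with the chemical's features."""
    return all(chemical.get(f) == v for f, v in conditions.items())

# Candidate rules (conditions, predicted class); the second and third are rejected
candidates = [
    ({"Liver": "+", "Kidney": "+"}, "C"),
    ({"Kidney": "+"}, "NC"),
    ({"Kidney": "-"}, "C"),
    ({"Kidney": "-", "Spleen": "+"}, "C"),
    ({"Liver": "-"}, "NC"),
]
for conditions, cls in candidates:
    print(conditions, cls, satisfies_assumptions(conditions, cls))
```

Whether a condition on the other focus organ should count as accompaniment under the third assumption is a judgment call; this sketch takes the stricter reading and requires a feature outside kidney and liver.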

5.2. The RL program

RL (Clearwater and Provost, 1990) is a knowledge-based inductive rule learning program that induces one or more IF-condition-THEN-class rules from specific examples of classes. For example, in our study, RL is given a set of carcinogens and non-carcinogens and induces one or more rules which classify the carcinogens and non-carcinogens. RL uses a heuristic rule search which can utilize prior domain knowledge, such as facts, heuristics, or assumptions used by scientists, and can examine a much larger number of identification criteria than can be examined by manual analysis. The main strength of RL is its flexibility. Given a learning problem, many different problem models and assumptions can be tested. The flexibility is partly achieved through the use of a domain model, called the Partial Domain Model, which can guide RL’s search separately from the guidance implicit in the statistics of training examples. The partial domain model contains definitions of attributes to be used in representing examples and rules, a list of classes, assumptions and constraints on rules being sought, and domain knowledge relevant to a particular problem. The values of attributes may be symbolic or numeric, or may be binary. In the present study, all the attributes took binary values. Constraints and domain knowledge usually take the form of preference criteria characterizing desirable properties of rules to be learned. Thus, induction in RL is guided not only by syntactic similarity and dissimilarity of features of examples, but also by constraints and prior domain knowledge in the partial domain model. Given a learning problem, i.e., the names of one or more target classes, a set of their examples, and a partial domain model of the problem, RL searches for rules by examining a large but limited number of combinations of features. An example is represented as a vector of attribute-value pairs, each of which describes a feature of the example. For example, the


representation of o-chlorobenzalmalononitrile is shown below, which means that o-chlorobenzalmalononitrile is associated with degeneration of brain and kidney, necrosis of kidney and liver, but did not cause inflammation of eyes, ..., and is a rodent non-carcinogen.

((Name o-chlorobenzalmalononitrile) (Brain degeneration, +) (Kidney degeneration, +) (Kidney necrosis, +) (Liver necrosis, +) (Eye inflammation, -) ... (Rodent NC))

For most learning problems, RL prefers a general rule, i.e., a rule classifying more training examples correctly, to a specific rule. However, RL can search for more specific rules to the extent specified in the partial domain model. Only those rules which are plausible (with respect to the partial domain model and training examples) are saved in the resulting set of rules. That is, the plausibility of a rule is determined by its performance (how accurately it classifies examples) and its concordance with assumptions, constraints, and domain knowledge. The result of rule search is a disjunction of IF-condition-THEN-class rules, where the condition is a conjunction (‘AND’) of features. For example, the following rule uses two features to predict rodent carcinogenicity:

IF - (Liver, +) and (Kidney, +) - THEN - (Rodent C)

which is interpreted as ‘if a chemical causes any effect in the liver and kidney, then it is a carcinogen’. Such IF-THEN rules are very easy to understand, unlike numerical weights and nodes in a neural network. The comprehensibility of rules permits the facile verification of rules by experts. Unless a learning problem is simple enough to classify all training examples with a single rule, RL finds a disjunctive set of rules, each of which classifies a

subset of training examples. Such rules are then used collectively to make predictions on new cases. RL is a descendant of the Meta-DENDRAL system (Buchanan and Mitchell, 1978), which specialized in finding rules of mass spectrometry in chemistry. However, unlike Meta-DENDRAL, RL is a general purpose learning program that can be applied to many problems in different domains. RL has been applied successfully in several real world problems, including predicting rodent carcinogenicity of nongenotoxic chemicals (Lee et al., 1995), predicting human developmental toxicity from animal toxicity assays (Gomez et al., 1993; Gomez et al., 1994), trigger design in high energy physics (Lee and Clearwater, 1992; Clearwater and Lee, 1993), trouble diagnosis in a telecommunication network (Danyluk and Provost, 1993), analyzing massive quantities of data on infant mortality (Provost and Aronis, 1994), inducing rules for biological macromolecule crystallization (Hennessy et al., 1994), and analyzing data on botanical plant toxicity (Krenzelok et al., 1995).

5.3. Experimental results and discussion

Table 9 shows the results of four different experiments, each an average of a 10-fold cross validation. The second column contains the information (variables) used in rule formation, and the third column specifies whether or not the assumptions regarding kidney and liver (see Section 5.1) were used. In the first two experiments (1 and 2), the assumptions were used and so RL focused on liver and kidney, searching for conditions (features) which could be used together with liver and kidney. Experiments 3

Table 9
The results of four 10-fold cross validation experiments. In two experiments, the responses in the Salmonella mutagenicity assay (SAL) were included. In two experiments, rule sets were learned under the assumptions (regarding the observation of a morphological effect in kidney and liver) provided in the previous section. AC, SS and SP are accuracy, sensitivity and specificity (%); PP is the percentage of test chemicals for which the rule sets made a prediction.

Exp.  Information used                  Assumptions  Rule sets alone               With default predictions
no.                                     used?        AC     SS     SP     PP       AC     SS     SP
1     32 organs, 43 morphologies        Yes          82.5   88.3   71.3   74.7     80.1   82.7   81.5
2     SAL, 32 organs, 43 morphologies   Yes          80.9   87.8   63.0   94.2     80.1   86.6   64.8
3     32 organs, 43 morphologies        No           75.1   86.8   53.7   82.8     73.8   81.6   63.1
4     SAL, 32 organs, 43 morphologies   No           77.9   86.5   62.7   89.7     77.8   87.0   59.6


and 4 did not use the assumptions: no priority was given to liver and kidney, and RL was allowed to learn rules which did not include the liver and kidney. While experiments 1 and 3 only used organ toxicity data, experiments 2 and 4 also included the responses in the Salmonella mutagenicity assay. The last seven columns of Table 9 contain the performance measurements. Note that all measurements shown are averages over 10 trials of testing rules. In the following discussion, we will use the experiment numbers (first column) to refer to experiments.

5.3.1. With or without kidney and liver

The best result was obtained by rule sets learned in experiment 1, in which RL learned rules using only the organ toxicity data, according to the assumptions regarding the use of kidney and liver. The learned rule sets were on average 82.5% accurate with 88.3% sensitivity and 71.3% specificity, when they made predictions. However, rule sets made predictions on only 75% of chemicals in a test set on average. When default predictions based on genotoxicity were made for those chemicals for which the learned rule sets failed to make any predictions, 80.1% accuracy was obtained with 82.7% sensitivity and 81.5% specificity. Does this mean that RL can learn a rule set which is more accurate than simply using the battery of kidney and liver? Recall that the battery of liver and kidney was

74% accurate with 75% sensitivity and 71% specificity. Note that this accuracy was measured when the battery was used to classify all 88 chemicals. Also recall that the concordance of the battery of kidney and liver was measured by classifying a chemical as a carcinogen if it caused a morphological effect in either liver or kidney, and as a non-carcinogen if it did not cause an effect in either liver or kidney. This is actually equivalent to using the following three rules:

IF (Liver +) THEN (Rodent C)
IF (Kidney +) THEN (Rodent C)
IF (Liver -) and (Kidney -) THEN (Rodent NC)

Let us compare the performance of the battery of liver and kidney to each of the two groups of measurements obtained in experiment 1: one with the default predictions and the other without them (the two groups of AC, SS and SP columns in Table 9). Without the default predictions, the learned rule sets made predictions on average for 75% of chemicals in a test set. For the same chemicals for which learned rule sets made predictions, the battery of kidney and liver was actually a little more accurate: 83% accuracy with 90% sensitivity and 70% specificity. On the other hand, with the default rules in addition to rules learned, 80% accuracy was obtained in classifying all chemicals in test sets in a 10-fold cross validation. Of course for all chemicals, the

Table 10
Summary and comparisons between performance of (1) rule sets in experiment 1, (2) genotoxicity, (3) the battery of liver and kidney, and (4) prevalence of carcinogens. The last row shows the performance when the observations of morphological effects in the battery of liver and kidney are used to make default predictions instead of genotoxicity.

AC     SS      SP     Method

On the 75% of chemicals predicted by rule sets learned in experiment 1
82.5   88.3    71.3   Rule sets learned by RL in experiment 1
57.8   55.1    63.9   Genotoxicity (responses in the Salmonella mutagenicity assay)
83.1   90.2    70.3   Battery of liver and kidney
72.7   100.0   0.0    Prevalence of carcinogens

On all chemicals
80.1   82.7    81.5   Rule sets learned by RL in experiment 1 + default predictions based on genotoxicity
59.5   36.7    78.6   Genotoxicity
73.9   75.0    71.4   Battery of liver and kidney
68.1   100.0   0.0    Prevalence of carcinogens
72.9   73.3    75.6   Rule sets learned by RL in experiment 1 + default predictions using the battery of liver and kidney


battery of liver and kidney was only 74% accurate with 75% sensitivity and 71% specificity. In other words, for the chemicals for which learned rule sets made predictions, rule sets alone did not appear more accurate than simply using the battery of kidney and liver. The performance of the rule sets learned in experiment 1 and of the battery of liver and kidney is summarized in Table 10. On the chemicals for which rule sets learned by RL (in experiment 1) made any predictions, both rule sets and the battery of liver and kidney performed similarly, and the classification based on the genotoxicity of chemicals was poor. On the other hand, over all chemicals, rule sets learned by RL along with default predictions resulted in the highest accuracy. Rule sets learned by RL provided an easy way of deciding when to make default predictions, i.e., "apply default rules when a learned rule set fails to make a prediction". On the other hand, for the battery of liver and kidney, if it is applied according to the three rules shown above, there seems to be no clear way of deciding when (or for which chemicals) to incorporate the responses in SAL in making predictions such that better than 74% accuracy can be obtained. The fact that the concordance of the battery of liver and kidney decreased from 83% for the 75% of chemicals for which learned rule sets (in experiment 1) made predictions, to 74% for all chemicals, indicates that the battery of liver and kidney was not as accurate for the 25% of chemicals for which learned rule sets did not make any predictions as it was for the other 75%. In fact, for those 25% of chemicals, the battery of liver and kidney


was only 43% accurate with 15% sensitivity and 80% specificity. This shows the strength of rules learned by RL and the utility of the RL induction program. That is, rules learned in experiment 1 were essentially specializations of the battery of liver and kidney with additional features (i.e., additional organs or morphological effects). While rule sets learned by RL appeared more complex (i.e., more rules) than the simple use of the battery of liver and kidney, and they performed similarly to the battery of liver and kidney, the rule sets identified the chemicals which were accurately predicted by using liver and kidney. In other words, by not making predictions, the rule sets excluded those chemicals which were incorrectly classified by the battery of liver and kidney. While we used genotoxicity to make default predictions for those chemicals for which learned rule sets failed to make predictions, we also measured the performance when default predictions were made based on the observations of morphological effects in liver and kidney. However, as shown in the last row of Table 10, the accuracy and sensitivity were about 8% lower when the battery of liver and kidney was used to make default predictions. Also, the specificity was about 6% lower. Both experiments 1 and 2 focused on the use of liver and kidney and both resulted in higher accuracy. Although the difference was not significant, rule sets learned with the assumptions of focusing on liver and kidney (experiments 1 and 2) were more accurate than those learned without the assumptions (experiments 3 and 4). However, Table 9 shows that while rule sets learned in experiment 1 on average resulted in the

Table 11
Distributions of 60 carcinogens and 28 non-carcinogens according to the responses in SAL and the presence or absence of morphological effects in kidney and liver. 'M' and '?' refer to marginal and unknown response in SAL, respectively.

        Carcinogens                  Non-carcinogens
        Kidney        Liver          Kidney        Liver
SAL     +      -      +      -      +      -      +      -
+       15     13     14     14     1      2      0      3
-       20     11     15     16     4      18     4      18
M       0      0      0      0      0      3      0      3
?       0      1      0      1      0      0      0      0
Total   35     25     29     31     5      23     4      24

highest accuracy, they made fewer predictions than others.
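The policy described in this section ("apply default rules when a learned rule set fails to make a prediction") can be sketched as follows. This is a minimal illustration, not the RL program itself. The rule conditions are those listed for experiment 1 in Table 12; the treatment of unlisted features as '-', the majority vote among fired rules, abstention on ties, and the mapping of any SAL response other than '+' to a non-carcinogen default are all assumptions made for the example.

```python
from typing import Dict, List, Optional, Tuple

Chemical = Dict[str, str]            # feature name -> "+" or "-"
Rule = Tuple[Dict[str, str], str]    # (IF conditions, THEN class)

# Rules 1-3 and 8-12, as listed for experiment 1 in Table 12.
EXPERIMENT_1_RULES: List[Rule] = [
    ({"Kidney": "+", "Degeneration": "-"}, "C"),
    ({"Liver": "+", "Syncytial-alteration": "-"}, "C"),
    ({"Kidney": "+", "Nasal-cavity": "-"}, "C"),
    ({"Liver": "-", "Regeneration": "-", "Inflammation": "+"}, "NC"),
    ({"Kidney": "-", "Liver": "-", "Hyperplasia": "+"}, "NC"),
    ({"Liver": "-", "Regeneration": "-", "Bone-marrow": "+"}, "NC"),
    ({"Liver": "-", "Nasal-cavity": "+", "Necrosis": "+"}, "NC"),
    ({"Kidney": "-", "Brain": "+"}, "NC"),
]


def fires(conditions: Dict[str, str], chemical: Chemical) -> bool:
    """True if every condition matches (assumption: unlisted features are '-')."""
    return all(chemical.get(name, "-") == value
               for name, value in conditions.items())


def rule_set_prediction(chemical: Chemical) -> Optional[str]:
    """Class chosen by the rule set, or None when it abstains."""
    votes = [cls for conditions, cls in EXPERIMENT_1_RULES
             if fires(conditions, chemical)]
    c_votes, nc_votes = votes.count("C"), votes.count("NC")
    if c_votes == nc_votes:
        # No rule fired, or a tie (hypothetical conflict policy): abstain.
        return None
    return "C" if c_votes > nc_votes else "NC"


def predict(chemical: Chemical, sal: str) -> str:
    """Rule-set prediction with a genotoxicity-based default."""
    label = rule_set_prediction(chemical)
    if label is not None:
        return label
    # Default prediction: SAL-positive -> carcinogen, otherwise non-carcinogen.
    return "C" if sal == "+" else "NC"


# Hypothetical chemicals, for illustration only.
kidney_only = {"Kidney": "+", "Necrosis": "+"}
no_effects: Chemical = {}
print(predict(kidney_only, sal="-"))   # decided by the learned rules
print(predict(no_effects, sal="+"))    # decided by the default
```

The point of the design is that abstention is explicit: the rule set returns no label for chemicals outside its coverage, and only those chemicals fall through to the default.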

5.3.2. Effect of SAL

When comparing experiments 1 and 2 as well as experiments 3 and 4, it is observed that including the responses in the Salmonella mutagenicity assay (SAL) in learning rules did not result in more accurate rule sets. This in fact may be an indication that there is very little relationship between genotoxicity and the presence or absence of morphological effects in liver and kidney. Table 11 shows the distributions of 60 carcinogens and 28 non-carcinogens according to their responses in SAL and their effects in kidney and liver. More than 90% (28/31) of genotoxic chemicals are carcinogens (Table 5). Also, 87.5% (35/40) of chemicals causing some effects in kidney are carcinogens and 87.9% (29/33) of chemicals causing some effects in liver are carcinogens. Thus, given a genotoxic chemical, the additional knowledge that it causes some effects in liver or kidney does not provide a significant performance increase. Similarly, given that a chemical causes some morphological effect in liver or kidney, the knowledge that it is genotoxic does not provide a significant performance increase, either. In other words, when a chemical causes a morphological effect in liver or kidney, the genotoxicity does not help, because most such chemicals are already carcinogens (and presumably most such carcinogens are not genotoxic (Ashby and Tennant, 1991)). For example, the numbers of genotoxic carcinogens causing some effects in liver (14) and kidney (15) are almost equal to the numbers of genotoxic carcinogens not causing any effect in liver (14) and kidney (13), respectively. Similarly, the number

Table 12
Four sets of rules learned, corresponding to experiments 1 to 4 of Table 9. Rules are learned from all 88 chemicals. '+' and '-' indicate presence and absence of a morphological effect, respectively: in a single organ when used with an organ, and in any organ when used with a morphological effect. SAL is the Salmonella mutagenicity assay. No. C and No. NC are the numbers of carcinogens and non-carcinogens covered by each rule. The check-mark columns assigning each rule to experiments 1 to 4 are not legible in this copy.

Rules predicting rodent carcinogenicity

Rule no.  No. C  No. NC  Rule condition (IF)
1         18     1       (Kidney +) and (Degeneration -)
2         27     2       (Liver +) and (Syncytial-alteration -)
3         33     3       (Kidney +) and (Nasal-cavity -)
4         13     2       (Kidney +) and (SAL +)
5         29     4       (Liver +)
6         35     5       (Kidney +)
7         12     2       (Spleen +)

Rules predicting rodent non-carcinogenicity

Rule no.  No. C  No. NC  Rule condition (IF)
8         6      12      (Liver -), (Regeneration -) and (Inflammation +)
9         2      6       (Kidney -), (Liver -) and (Hyperplasia +)
10        1      3       (Liver -), (Regeneration -) and (Bone-marrow +)
11        1      4       (Liver -), (Nasal-cavity +) and (Necrosis +)
12        2      4       (Kidney -) and (Brain +)
13        3      0       (Kidney -) and (SAL M)
14        2      10      (Liver -), (SAL -), (Regeneration -) and (Inflammation +)
15        6      15      (Kidney -), (SAL -) and (Liver -)
16        0      3       (Karyomegaly -), (Brain +) and (Degeneration -)
17        1      3       (Spleen -) and (Cellular-depletion +)
18        1      4       (Nasal-cavity +), (Exudate -) and (Necrosis +)
19        1      3       (SAL -) and (Cellular-depletion +)


of non-genotoxic carcinogens causing some effects in liver (15) is also almost equal to the number of non-genotoxic carcinogens not causing an effect in liver (16). The responses in SAL also appear irrelevant to predicting non-carcinogens when used in addition to the presence or absence of morphological effects in liver or kidney. Of the 28 non-carcinogens, 24 and 23 did not cause an effect in liver and kidney, respectively. Also, 31 and 25 carcinogens did not cause an effect in liver and kidney, respectively. Thus, the probabilities that a chemical is a non-carcinogen when there are no morphological effects in liver and in kidney are estimated as 43.6% (24 of 55) and 47.9% (23 of 48), respectively. When the knowledge that a chemical is non-genotoxic is provided, the probabilities of a chemical being a non-carcinogen increase to only 52.9% (18 of 34) with liver and 62% (18 of 29) with kidney. Of course, most non-carcinogens which did not cause any effect in liver and kidney were negative in SAL, but there are also a number of such carcinogens. In experiment 1, in which rules were learned focusing on liver and kidney, the learned rule sets were 71% accurate in predicting non-carcinogens correctly (the specificity without default predictions in Table 9). Including SAL (experiment 2) as a feature in rules actually decreased the specificity by 8%. On the other hand, in experiment 3, in which the assumptions regarding the use of liver and kidney were not used, rule sets were on average only 53% accurate in predicting rodent non-carcinogens correctly, and by including SAL in experiment 4, the specificity was increased by about 10%. Also, in experiments 1 and 3, both of which did not include the responses in SAL, the default predictions based on genotoxicity increased the specificity: in Table 9, the specificity with default predictions exceeds the specificity without them (81.5% versus 71.3% in experiment 1, and 63.1% versus 53.7% in experiment 3). In other words, the genotoxicity was not useful in increasing the predictive strength when used together with the organ specific toxicity in rule conditions.
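The conditional probabilities quoted above follow directly from the counts in Table 11. As a check, the short sketch below recomputes them; the data layout and the function name `p_noncarcinogen` are choices made for this illustration, not part of the original analysis.

```python
# Counts from Table 11, per SAL response:
# (kidney +, kidney -, liver +, liver -)
CARCINOGENS = {
    "+": (15, 13, 14, 14),
    "-": (20, 11, 15, 16),
    "M": (0, 0, 0, 0),
    "?": (0, 1, 0, 1),
}
NON_CARCINOGENS = {
    "+": (1, 2, 0, 3),
    "-": (4, 18, 4, 18),
    "M": (0, 3, 0, 3),
    "?": (0, 0, 0, 0),
}
COLUMN = {"kidney+": 0, "kidney-": 1, "liver+": 2, "liver-": 3}


def count(table, column, sal=None):
    """Number of chemicals in a column, optionally restricted to one SAL response."""
    idx = COLUMN[column]
    rows = table.values() if sal is None else [table[sal]]
    return sum(row[idx] for row in rows)


def p_noncarcinogen(column, sal=None):
    """Estimated P(non-carcinogen | organ observation [and SAL response])."""
    nc = count(NON_CARCINOGENS, column, sal)
    c = count(CARCINOGENS, column, sal)
    return nc / (nc + c)


for column in ("liver-", "kidney-"):
    print(column,
          f"all: {100 * p_noncarcinogen(column):.1f}%",
          f"SAL-negative: {100 * p_noncarcinogen(column, sal='-'):.1f}%")
```

The same helper gives the proportion of carcinogens among chemicals with an effect in kidney or liver as `1 - p_noncarcinogen("kidney+")` and `1 - p_noncarcinogen("liver+")`.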
Genotoxicity was instead more effective when used to make default predictions for chemicals for which rules based on organ specific toxicity failed to make any predictions.

5.3.3. Rules

Let us examine the rules learned in the four experiments. Each experiment was a 10-fold cross validation resulting in 10 sets of rules, each learned


from 90% of chemicals and tested on the remaining 10% with a different test set used in each trial. To obtain a final rule set, RL learned from all 88 chemicals with the same parameters used in the experiments. Table 12 contains four sets of rules, corresponding to experiments 1 to 4. For example, using the same parameters and assumptions used in experiment 1, a total of 8 rules were learned from the 88 chemicals, of which 3 (rules 1, 2 and 3) predicted carcinogenicity and 5 (rules 8, 9, 10, 11 and 12) predicted non-carcinogenicity. Note that since rules were learned under different assumptions, it does not make sense to consider the union of all four rule sets as a coherent set in itself. For each rule, its condition (IF) is shown along with the numbers of carcinogens (No. C) and non-carcinogens (No. NC) in the training set covered by the rule. For example, rule 1 in Table 12 means that if a chemical caused any morphological effect in kidney except degeneration, then it is a carcinogen, and there were 18 (of 60) such carcinogens. But rule 1 also covered 1 (of 28) non-carcinogens. As mentioned before, because experiments 1 and 2 focused on kidney and liver, all rules learned in these two experiments included the observation of a morphological effect in either kidney or liver. Although we specified that RL could learn rules predicting rodent carcinogenicity using the absence of any morphological effect in kidney or liver, no such rules were learned; all rules learned to predict rodent carcinogenicity used the presence of a morphological effect in kidney or liver. While in experiments 3 and 4 RL did not focus on liver and kidney, it still found that the two organs were the most predictive. In fact, unlike experiments 1 and 2, RL did not even further specialize rules (rules 5 and 6) with other features.
Note that in experiments 1 and 2, RL specialized rules by including other features in rule conditions, because RL was asked to correctly classify non-carcinogens which caused some effects in either kidney or liver. In experiments 3 and 4, RL also learned a rule using the presence of a morphological effect in spleen to predict rodent carcinogenicity. The four rule sets also showed that although SAL was included in learning rules in experiments 2 and 4, not many rules learned included the responses in SAL in their condition. For example, in experiment


4, no rules predicting carcinogenicity included the responses in SAL in their conditions. The use of SAL by RL can be well explained by comparing rules 8 and 14. When SAL was excluded in experiments 1 and 3, RL learned rule 8. However, when SAL was included, RL further specialized rule 8 by adding the feature (SAL, -) to the rule condition, resulting in rule 14. Note that negative responses in SAL were used to predict non-carcinogenicity (rules 14, 15 and 19), and the positive responses were used to predict carcinogenicity (rule 4). In general, it is more intuitive to predict rodent carcinogenicity using the observation that a chemical caused a morphological effect in an organ, rather than using the observation that the chemical did not cause any toxic effect or did not affect an organ. Similarly, it is also more intuitive to predict rodent non-carcinogenicity with the observation that a chemical did not cause a morphological effect in an organ, rather than the observation that it caused a morphological effect or affected an organ. We tried several experiments with the RL program in which RL learned rules according to such intuitions, but the performance was very poor. In particular, very few rules were learned for rodent non-carcinogenicity, which in turn reduced the coverage of learned rule sets and the predictive strength. In fact, none of the 12 rules (rules 8 to 19) predicting rodent non-carcinogenicity in Table 12 satisfied such intuition. That is, except for rule 13, they all included at least one feature with '+', meaning the presence of a morphological effect. Rules 1, 2 and 3 used the absence of morphological effects in predicting rodent carcinogenicity, i.e., (Degeneration -), (Syncytial-alteration -) and (Nasal-cavity -), respectively. Also, of the 11 rules predicting non-carcinogenicity, 10 included one or more features which indicated the presence of morphological effects.
Rules 12 and 16 included the presence of a morphological effect in brain when predicting rodent non-carcinogenicity. Also, the observation of inflammation on some organs was used in rules 8 and 14. On the other hand, it is interesting to see that the observation of necrosis in nasal-cavity was used in predicting rodent non-carcinogenicity (rules 11 and 18) while the absence of effects in nasal-cavity was used in predicting rodent carcinogenicity (rule 3).


While the plausibility (both statistically with new chemicals and biologically with expertise and intuitions) of rules shown in Table 12 remains to be evaluated, our initial evaluation is encouraging. Of the 19 rules shown in Table 12, only three rules (rules 17, 18 and 19) included morphological effects which are considered 'irrelevant' to carcinogenesis (personal communication), i.e., cellular-depletion and exudate. On the other hand, some morphologies which are considered relevant to carcinogenesis were not used at all in rules. For example, the observation of carcinoma was not used at all. However, this was mainly due to the fact that there was only one carcinogen which was associated with carcinoma (Table 4). The observations of cytoplasmic-vacuolization and pigmentation, both of which are considered 'irrelevant' (personal communication), were not used at all, despite the fact that they were among the most predictive morphological effects (Table 4) for the 88 chemicals. This is because, as mentioned previously, the 9 and 11 carcinogens which caused cytoplasmic-vacuolization and pigmentation, respectively, did so in kidney or liver or caused some other effects. Thus, the features (Liver +) or (Kidney +) would cover those carcinogens. Finally, let us compare the concordance of the first rule set (rules 1 to 3, and 8 to 12, corresponding to experiment 1) shown in Table 12 on the 88 chemicals (i.e., training data) with the concordance of the battery of liver and kidney. The first rule set contained three rules predicting rodent carcinogenicity and five rules predicting non-carcinogenicity. Although it was learned from all 88 chemicals, it classified only 68 (48 carcinogens and 20 non-carcinogens) of the 88 chemicals. The rule set correctly classified 41 of 48 carcinogens (85.4% sensitivity) and 17 of 20 non-carcinogens (85% specificity), resulting in 85.3% accuracy (58 of 68).
For the same 68 chemicals, the battery of kidney and liver correctly classified 43 of 48 carcinogens (89.6% sensitivity) and 14 of 20 non-carcinogens (70% specificity), resulting in 83.8% accuracy (57 of 68). Thus, the rule set learned by RL classified one more chemical correctly than the simple battery of kidney and liver, which is not statistically significant. When default predictions based on genotoxicity were used for those 20 chemicals for which the first rule set did not make any predictions, 72 of 88 chemicals (81.8% accuracy) were correctly


classified with 78.3% sensitivity (47 of 60) and 89.3% specificity (25 of 28) (χ² value of 35.6). This is more accurate than the simple use of the battery of liver and kidney on the 88 chemicals, which correctly classified 65 of 88 chemicals (χ² value of 17.2).
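The figures in this comparison can be re-derived from the 2 x 2 table of predictions against outcomes. The sketch below is illustrative only: it assumes the reported χ² values are uncorrected Pearson statistics (which does reproduce 35.6 and 17.2), and the battery counts of 45 of 60 carcinogens and 20 of 28 non-carcinogens are inferred from the stated 75% sensitivity and 65 of 88 correct classifications rather than quoted from the text.

```python
def concordance(tp: int, fn: int, fp: int, tn: int):
    """Accuracy, sensitivity, specificity and Pearson chi-square of a 2x2 table.

    tp/fn: carcinogens predicted as carcinogen / non-carcinogen
    fp/tn: non-carcinogens predicted as carcinogen / non-carcinogen
    """
    n = tp + fn + fp + tn
    accuracy = (tp + tn) / n
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    # Uncorrected Pearson statistic for a 2x2 table:
    # n * (ad - bc)^2 / (product of the four marginal totals)
    chi2 = n * (tp * tn - fn * fp) ** 2 / (
        (tp + fn) * (fp + tn) * (tp + fp) * (fn + tn)
    )
    return accuracy, sensitivity, specificity, chi2


# First rule set plus genotoxicity defaults: 47 of 60 and 25 of 28 correct.
rules_plus_default = concordance(tp=47, fn=13, fp=3, tn=25)

# Battery of liver and kidney on all 88 chemicals (counts inferred, see above).
battery = concordance(tp=45, fn=15, fp=8, tn=20)

for name, (ac, ss, sp, chi2) in (("rules + default", rules_plus_default),
                                 ("battery", battery)):
    print(f"{name}: AC={100 * ac:.1f}% SS={100 * ss:.1f}% "
          f"SP={100 * sp:.1f}% chi2={chi2:.1f}")
```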

6. Conclusion

In the present paper, we described a study in which the relationship between organ specific toxicity and rodent carcinogenicity was explored by using both manual analysis and the RL induction program. Our study is not the first one in which a learning program is used to relate organ specific toxicity with rodent carcinogenicity. Bahler and Bristol (1993) used the C4.5 decision tree learning program (Quinlan, 1993) to predict rodent carcinogenicity using organ specific toxicity data in addition to other features such as short-term assays and structural alerts. Also, more recently, Busey and Henry (1995) reported the use of a neural network to predict rodent carcinogenicity using organ specific toxicity. However, it should be noted that both studies included a richer set of features than the organ specific toxicity used in our study, and both used a different database of chemicals. In fact, we applied the C4.5 program to the 88 chemicals using the same organ toxicity data (32 organs and 43 morphological effects) and the result was not as good as in our study: in a 10-fold cross validation, learned decision trees were on average only 61.5% accurate. Let us summarize the results with answers to the following three questions:

* Is organ specific toxicity data relevant to rodent carcinogenicity? The concordance of organ specific toxicity with rodent carcinogenicity was better than the concordance of genotoxicity as well as the prevalence of carcinogens. While the genotoxicity was only 60% accurate and the prevalence of carcinogens among the 88 chemicals was 68%, the battery of liver and kidney was 74% accurate. Also, in a 10-fold cross validation experiment, rule sets learned by RL using the organ specific toxicity, along with the default predictions based on genotoxicity, achieved 80% accuracy. Thus, there is no doubt that organ specific toxicity is relevant to rodent carcinogenicity.

* Can the RL induction program produce a predictive rule set? The experiments with RL did not show that RL could learn a rule set which was more predictive than the simple use of the battery of liver and kidney. On the other hand, with the advantage of rule sets learned by RL in deciding when to use default rules, rule sets together with default predictions based on responses in the Salmonella mutagenicity assay were on average superior to the battery of liver and kidney. Because rules learned in experiment 1 were specializations of the battery of liver and kidney with additional features, the rule sets were able to exclude some chemicals for which the battery of liver and kidney was not accurate. This was shown by the fact that the battery of liver and kidney was 83% accurate for those chemicals for which rule sets learned in experiment 1 made predictions, and only 43% accurate for those for which the rule sets did not make any predictions. The strength of the RL induction program was not limited to its ability to learn predictive rule sets which are also human readable. In our experiments, we in fact demonstrated one of the abilities of RL, i.e., incorporating specific assumptions.

* Are the responses in the Salmonella mutagenicity assay useful when used together with the organ specific toxicity in predicting rodent carcinogenicity as well as non-carcinogenicity? There was little relationship between genotoxicity and organ specific toxicity. Or more precisely, whether or not a chemical is genotoxic was not relevant to whether it caused a morphological effect in kidney or liver. In the experiments with RL, the program did not learn more predictive rule sets by including the responses in SAL in rule conditions in addition to the organ specific toxicity. On the other hand, the experiments showed that genotoxicity could be effective in making default predictions.
The specification of 124 types of lesions in the original organ specific toxicity data was too specific for 88 chemicals to provide any useful training. Thus, we separated organ specificity from morphological effects, resulting in features naming 32 organs and 43 morphological effects, where each had two values: '+' for one or more effects in an organ (or an effect in one or more organs) and '-' for no effects in that organ (or the effect in no organ). The increased generality of variables resulting from this reorganization of organ toxicity data made it possible for RL to learn predictive and general rules. There are other ways to represent or model the organ toxicity data which we have not explored. For example, RL might use the number of organs affected by each morphological effect or the number of morphological effects observed in each organ. Until we try other representations, we speculate that the features used in our study are as predictive for these data as can be found. In some experiments with the RL induction program, RL was provided with the assumptions regarding the use of the observations of morphological effects in liver or kidney such that it could focus more on kidney and liver than on other organs. This, in fact, was equivalent to giving greater weight to kidney and liver than to other organs. Similarly, it is also possible to give different weights to morphological effects. For example, those which are considered relevant to carcinogenesis could be given greater weights, and RL can explore rules including the features with the greater weights. We, in fact, tried this by giving larger weights to two morphological effects, degeneration and nephropathy, because the battery of these two effects was the most accurate among the batteries of two morphological effects (Table 7). But the result was not as good as those we reported in the previous section. A larger database of organ toxicity data is no doubt required for a better understanding of its relationship to rodent carcinogenicity and for higher statistical confidence. Although the number of chemicals available for our study was too small to make definitive conclusions, we believe that the results of our study are encouraging and should trigger the construction of a larger database. This, in fact, is possible as the data reside within the U.S. National Toxicology Program.
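The reorganization of lesion data into separate organ and morphology features, and the count-based alternatives mentioned above, can be sketched as follows. This is an illustration under stated assumptions: the function names and the short organ and morphology vocabularies are hypothetical, and the lesion list reuses only the o-chlorobenzalmalononitrile observations quoted earlier in the paper.

```python
from collections import Counter
from typing import Dict, List, Tuple

Lesion = Tuple[str, str]  # (organ, morphological effect)


def binary_features(lesions: List[Lesion],
                    organs: List[str],
                    morphologies: List[str]) -> Dict[str, str]:
    """'+' if an organ shows any effect, or an effect occurs in any organ."""
    seen_organs = {organ for organ, _ in lesions}
    seen_effects = {effect for _, effect in lesions}
    features = {o: "+" if o in seen_organs else "-" for o in organs}
    features.update({m: "+" if m in seen_effects else "-" for m in morphologies})
    return features


def count_features(lesions: List[Lesion]):
    """Alternative representation: organs per effect and effects per organ."""
    unique = set(lesions)
    organs_per_effect = Counter(effect for _, effect in unique)
    effects_per_organ = Counter(organ for organ, _ in unique)
    return organs_per_effect, effects_per_organ


# Lesions quoted in the text for o-chlorobenzalmalononitrile.
lesions = [("Brain", "Degeneration"), ("Kidney", "Degeneration"),
           ("Kidney", "Necrosis"), ("Liver", "Necrosis")]

# Hypothetical, shortened vocabularies (the study used 32 organs, 43 effects).
organs = ["Brain", "Kidney", "Liver", "Eye"]
morphologies = ["Degeneration", "Necrosis", "Inflammation"]

print(binary_features(lesions, organs, morphologies))
print(count_features(lesions))
```

The binary form discards which effect occurred in which organ, which is what makes the features general enough for 88 chemicals; the count form keeps some of that information at the cost of more feature values.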

Acknowledgements

We thank the NIEHS, in particular Dr. Douglas Bristol, for providing the organ specific toxicity data. We also thank Dr. Hisachi Shinozuka, Department of Pathology, University of Pittsburgh Medical Center, for his expertise in interpreting the organ specific toxicity data. This research was supported in part by the William M. Keck Foundation and the U.S. Department of Defense (Contract No. DAAA21-93-C0046).

References

Ashby, J. and R.W. Tennant (1991) Definitive relationships among chemical structures, carcinogenicity, and mutagenicity for 301 chemicals tested by the U.S. NTP. Mutation Res., 257, 229-306.

Bahler, D. and D.W. Bristol (1993) The induction of rules for predicting chemical carcinogenesis in rodents, in: L. Hunter, J. Shavlik and D. Searls (Eds.), Proceedings of the First International Conference on Intelligent Systems for Molecular Biology, AAAI/MIT Press, Menlo Park, CA, pp. 29-37.

Buchanan, B.G. and T.M. Mitchell (1978) Model directed learning of production rules, in: D.A. Waterman and F. Hayes-Roth (Eds.), Pattern Directed Inference Systems, Academic Press, New York, pp. 297-312.

Busey, W.M. and J.F. Henry (1995) Intelligent toxicology prediction system. Toxicologist, 15, 178-179.

Clearwater, S.H. and Y. Lee (1993) Use of a learning program for trigger sensitivity studies, in: Proceedings of the Third International Workshop on Software Engineering, Artificial Intelligence and Expert Systems for High Energy and Nuclear Physics, World Scientific, pp. 207-212.

Clearwater, S.H. and F.J. Provost (1990) RL4: A tool for knowledge-based induction, in: Proceedings of Tools for Artificial Intelligence 90, IEEE Computer Society Press, pp. 24-30.

Danyluk, A.P. and F.J. Provost (1993) Small disjuncts in action: Learning to diagnose errors in the local loop of the telephone network, in: Proceedings of the Tenth International Conference on Machine Learning, Morgan Kaufmann, Los Altos, CA.

Gomez, J., Y. Lee and D.R. Mattison (1993) Identification of developmental toxicants using a rule learning expert system, in: Program and Abstracts: The Fourteenth Annual Meetings of the American College of Toxicology.

Gomez, J., Y. Lee and D.R. Mattison (1994) RL: An innovative tool for predicting developmental toxicity, in: Program and Abstracts: The 33rd Annual Meeting of the Society of Toxicology.

Haseman, J.K. and A.-M. Lockhart (1993) Correlations between chemically related site-specific carcinogenic effects in longterm studies in rats and mice. Environ. Health Perspect., 101, 50-54.

Hennessy, D., V. Gopalakrishnan, B.G. Buchanan, J.M. Rosenberg and D. Subramanian (1994) Induction of rules for biological macromolecule crystallization, in: Proceedings of the Second International Conference on Intelligent Systems for Molecular Biology, AAAI Press, pp. 179-187.

Hoel, D.G., J.K. Haseman, M.D. Hogan, J. Huff and E.E. McConnell (1988) The impact of toxicity on carcinogenicity studies: implications for risk assessment. Carcinogenesis, 9, 2045-2052.

Huff, J. (1993) Absence of morphologic correlation between chemical toxicity and chemical carcinogenesis. Environ. Health Perspect., 101, 45-54.

Klopman, G. and H.S. Rosenkranz (1991) Quantification of the predictivity of some short-term assays for carcinogenicity in rodents. Mutation Res., 253, 237-240.

Krenzelok, E.P., F.J. Provost, T.D. Jacobsen, J.M. Aronis and B.G. Buchanan (1995) Assessing patient referral patterns to a health care facility in plant exposure patients using computer artificial intelligence, in: European Association of Poisons Centers and Clinical Toxicologists Scientific Meeting.

Lee, Y. and S.H. Clearwater (1992) Tools for automating experiment design: A machine learning approach, in: Proceedings of the Fourth International Conference on Tools with Artificial Intelligence, IEEE Computer Society Press.

Lee, Y., B.G. Buchanan, D.R. Mattison, G. Klopman and H.S. Rosenkranz (1995) Learning rules to predict rodent carcinogenicity of non-genotoxic chemicals. Mutation Res., 328, 127-149.

Quinlan, J.R. (1993) C4.5: Programs for machine learning. Morgan Kaufmann.

Tennant, R.W., M.R. Elwell, J.W. Spalding and R.A. Griesemer (1991) Evidence that toxic injury is not always associated with induction of chemical carcinogenesis. Mol. Carcinogen., 4, 420-440.