Mutation Research, 147 (1985) 139-151
Elsevier MTR08575
Cluster analysis of short-term tests: a new methodological approach
R. Benigni and A. Giuliani
Istituto Superiore di Sanità, Viale Regina Elena 299, Rome (Italy)
(Received 12 July 1984) (Revision received 7 December 1984) (Accepted 11 February 1985)
Summary
A totally data-based approach to the evaluation of short-term tests is proposed. The performances of 22 tests over a range of 42 chemicals (data from literature) were studied by cluster analysis. The comparison between them was performed only on the basis of their responses to the chemicals. Two different clustering methods produced a coincident classification, pointing to a clear resolution of all tests into 3 groups with common characteristics. With respect to carcinogen discrimination, cluster 1 showed the highest sensitivity and the lowest specificity. Cluster 3 had opposite characteristics. The tests of cluster 2 showed intermediate features. As far as cluster membership is concerned, the literature data about the responses to chemicals indicated a strong test system specificity. This apparently overcame both phylogeny and end-point community. A major characteristic of the present approach is the ability to elicit underlying patterns, the knowledge of which can contribute both to hypothesis formulation and to practical applications.
Over the last few years a vast number of short-term test systems have been developed with the purpose of identifying chemicals able to produce mutagenic or carcinogenic effects. In particular, the difficulties in testing chemicals adequately for carcinogenicity, and the wide acceptance of the DNA alteration theory of cancer, have produced a great effort in the search for DNA-damage-based tests that could differentiate between carcinogens and non-carcinogens over a wide range of chemical classes. Because many theories exist for the induction of cancer, the number of short-term assays proposed has markedly increased. This variety of aims and objectives has provided an enormous data base on which to ground scientific judgement, but also a rather complex and confusing situation both for scientists and regulators (de Serres and Ashby, 1981; ICPEMC, 1982, 1984; Purchase, 1982).
A central problem is the elaboration of a framework for comparing the performances of different test systems. But neither the conceptual nor the experimental background is sufficiently clear for this analysis to be made in detail, and there is no general agreement as to the most appropriate classification. This is due, among other reasons, to the failure to define the ways in which the tests differ from each other (ICPEMC, 1982). For the current study, it was decided to investigate how far a purely mathematical-empirical approach might lead in the classification of short-term tests. This approach employed cluster analysis techniques; the similarities between tests were defined only on the basis of their responses to chemicals. Cluster analysis is a mathematical tool devised to sort objects (in our case short-term tests) into groups by their similarities; it takes into consideration simultaneously all the measured attributes of each object when group assignment is made (Van Ryzin, 1977). This procedure is not designed to test hypotheses, but rather to generate them. The importance of the results of this procedure depends on factors such as the scientific or operational usefulness of the new classification scheme. For these characteristics and for the generality of the concepts underlying mathematical classification, cluster analysis has demonstrated a high 'heuristic' power in very different fields (including astronomy, market research, high-energy physics, linguistics, social sciences) and in general in all those fields where an exploratory approach is necessary before the establishment of predictive laws. Moreover, there is always the possibility that such an approach to a biological problem might detect useful new associations that have been overlooked because of preconceived notions. This paper is an attempt to apply such methods to the specific field of the evaluation of short-term tests. For this first approach, the coherent and homogeneous base of data generated by the International Collaborative Study (ICS) (de Serres and Ashby, 1981) was used.

0165-1161/85/$03.30 © 1985 Elsevier Science Publishers B.V. (Biomedical Division)

Methods
Data base
The analysis reported in the present paper was carried out on the data generated by the ICS, as published in de Serres and Ashby (1981). This program was designed to examine the ability of a number of short-term tests to discriminate between carcinogens and non-carcinogens over a consistent range of 42 chemicals (Table 1). For our study 21 test systems were selected (Table 2). When a test was performed in more than one laboratory with different results, we followed the consensus view of the investigators involved in the post-experiment discussions. Table 3 summarizes the data base used for the present analysis.

TABLE 1
TEST CHEMICALS
Code No.  Chemical
 1        4-Dimethylaminoazobenzene (butter yellow)
 2        3,3',5,5'-Tetramethylbenzidine
 3        4-Acetylaminofluorene
 4        1-Naphthylamine
 5        Pyrene
 6        Chloroform
 7        Diethylstilbestrol
 8        4-Nitroquinoline-N-oxide
 9        2-Naphthylamine
10        4-Dimethylaminoazobenzene-4-sulfonic acid Na salt
11        Benzidine
12        2-Acetylaminofluorene
13        Benzo[a]pyrene
14        Hydrazine sulfate
15        3-Methyl-4-nitroquinoline-N-oxide
16        Hexamethylphosphoramide (HMPA)
17        Diphenylnitrosamine
18        Anthracene
19        Dinitrosopentamethylene tetramine
20        Ethylenethiourea
21        γ-Butyrolactone
22        Methionine
23        Safrole
24        Dimethylformamide
25        DL-Ethionine
26        Epichlorohydrin
27        N-Nitrosomorpholine
28        Isopropyl N-(3-chlorophenyl)carbamate
29        β-Propiolactone
30        1,1,1-Trichloroethane
31        Urethane
32        Dimethylcarbamoyl chloride
33        Azoxybenzene
34        Cyclophosphamide
35        3-Aminotriazole
36        4,4'-Methylenebis(2-chloroaniline) (MOCA)
37        Methylazoxymethanolacetate
38        Sugar (sucrose)
39        9,10-Dimethylanthracene
40        Ascorbic acid
41        o-Toluidine
42        Auramine (technical grade)

TABLE 2
LIST OF SHORT-TERM TESTS
Code  Assay
 1    Bacillus subtilis rec
 2    Escherichia coli rec
 3    Escherichia coli polA
 4    Salmonella typhimurium his
 5    Escherichia coli WP2
 6    Escherichia coli 343/113
 7    Schizosaccharomyces pombe ade mutation
 8    Saccharomyces cerevisiae XV185-14C
 9    Saccharomyces cerevisiae mitotic recombination
10    Saccharomyces cerevisiae aneuploidy
11    Saccharomyces cerevisiae differential killing
12    UDS human fibroblasts/HeLa cells
13    SCE CHO cells
14    Chromosome aberrations CHO cells
15    Transformation BHK21 cells
16    Mutation TK +/- L5178Y mouse lymphoblasts
17    Mutation HGPRT CHO/V79 cells
18    Drosophila melanogaster sex-linked recessive lethal
19    Micronucleus mice
20    SCE mice
21    Sperm morphology mice

Cluster analysis
Cluster analysis methods work by grouping together, by their similarities, elements for which a mathematical object called 'distance' between the elements is defined. In our case the elements were the short-term tests, and the axes were the variables measured in the tests (i.e. the experiments on the different chemicals). For a particular test, its coordinate on a specific axis was its response to the corresponding chemical. We calculated the Hamming distances (Bellacicco, 1980) between the statistical units according to the formula:

$$d_{A,B} = \frac{\text{number of chemicals that gave different responses in the two tests } A \text{ and } B}{\text{number of chemicals assayed in both tests}} \times 100$$

This procedure produced a 21 × 21 distance matrix, where the generic element d_ij represents the distance between tests i and j (Table 4). Each row represents a test with the distances from each of the other tests as coordinates. The cluster procedure performed on this basis groups together the tests that not only are similar but also have similar distances from all the other tests. After this transformation we performed our cluster analysis, consisting of the two classical stages: calculation of proximities (in this case secondary proximities) and application of a clustering algorithm. The secondary proximities were generated by calculating the Euclidean distances between the rows of Table 4. The clustering methods used were 'single-linkage' and 'k-means' (Everitt, 1980). The former is a hierarchical agglomerative technique that starts with all the tests distinct from each other and then agglomerates, step by step, the most similar tests. Obviously, after N - 1 steps all the tests will be grouped together in only one cluster. This type of classification, in which the tests are progressively aggregated in groups of progressive hierarchical importance and extension,
resembles the features of classical biological classifications. This type of aggregation imposes a forced hierarchical structure on the data, and can create artifacts if this structure is not inherent in the data. The k-means method is non-hierarchical and has only the constraint that the number of clusters to be formed must be decided in advance. Once the number m of clusters has been decided, the procedure randomly chooses m seeds of aggregation and allocates the tests to the most similar seeds. After all the tests have been classified, the procedure is repeated by taking as new seeds the gravity centers of the clusters formed in the first step and again allocating all the tests. The iterations end when no reallocations occur from one step to the next; at this point the ratio between inter-group and intra-group variances is locally maximized. Cluster analysis was performed on the IBM 4341 computer at the Istituto Superiore di Sanità, using the BMDP Statistical Software Package.

Results
Table 4 shows the proximity data relative to the 21 short-term tests under study. This matrix consists of the dissimilarity coefficients (Hamming distances) between each test and all other tests. It was treated as though it were a multivariate data matrix (Kruskal, 1964, 1977), and secondary proximities were formed by calculating the Euclidean distances among its rows. These new proximity data were then analyzed by two different clustering methods: the single-linkage and k-means algorithms (Everitt, 1980). The results of the single-linkage analysis are shown in Fig. 1. Examination of the dendrogram suggests an optimal partition of most tests into 3 clusters. Because the single-linkage algorithm has built-in structural assumptions that can be satisfactorily met by data with an intrinsic hierarchical structure, we also applied a non-hierarchical method (k-means) that does not impose any structure on the data. As a consequence of the former analysis, it was decided to start with a 3-cluster solution. Test 11 was also eliminated from the analysis. Table 5 shows the results of this procedure. Because there is no unique mathematical way to decide how many clusters may be involved in a set of data, we computed other solutions, each containing one cluster more than the previous one. However, when 4- and 5-cluster solutions were attempted, the new clusters were almost identical to the previous ones (results not shown), adding no clarity to the interpretation. Fig. 2 gives a spatial representation of the clusters generated by the k-means technique. The clear resolution of all tests into 3 clusters supports the significance of this clustering configuration. A close relationship can be seen between the results provided by the single-linkage and k-means procedures (Table 6). Both methods indicated a 3-cluster partition and produced analogous class composition, the single-linkage clusters being the 'strong' cores of the k-means clusters.

TABLE 4
DISSIMILARITY MATRIX
Tests     1     2     3     4     5     6     7     8     9    10    11    12    13    14    15    16    17    18    19    20    21
    1   0.0  35.4  39.0  30.0  50.0  47.4  51.6  27.8  45.1  35.0  43.9  30.8  33.3  26.9  25.7  36.8  53.3  52.0  51.4  33.3  64.3
    2  35.4   0.0  34.5  25.6  32.9  30.0  37.5  30.6  27.4  32.9  48.8  46.2  35.1  22.2  44.4  25.0  16.7  38.5  54.2  42.1  48.2
    3  39.0  34.5   0.0  39.0  36.8  30.0  28.1  38.9  38.1  48.8  45.2  47.5  45.9  40.7  27.8  35.0  60.0  46.2  47.2  68.4  57.1
    4  30.0  25.6  39.0   0.0  20.3  28.9  30.6  37.1  28.0  23.7  48.8  24.4  30.6  19.2  30.0  25.0  26.7  40.4  37.1  41.7  42.9
    5  50.0  32.9  36.8  20.3   0.0  15.8  30.0  42.4  26.3  34.2  60.5  25.0  32.4  23.1  36.4  26.3  28.6  27.3  27.3  38.9  36.0
    6  47.4  30.0  30.0  28.9  15.8   0.0  35.0  21.1  37.5  35.0  50.0  31.6  30.0  35.0  21.1  40.0  33.3  46.7  41.2  57.9  42.9
    7  51.6  37.5  28.1  30.6  30.0  35.0   0.0  42.9  34.4  40.6  46.9  43.3  42.9  38.5  41.4  42.1  28.6  38.1  41.4  52.6  40.9
    8  27.8  30.6  38.9  37.1  42.4  21.1  42.9   0.0  38.9  37.1  47.2  45.7  50.0  44.0  16.7  50.0  46.7  50.0  40.0  44.4  53.8
    9  45.1  27.4  38.1  28.0  26.3  37.5  34.4  38.9   0.0  34.1  47.6  35.0  32.4  24.1  44.4  37.5  20.0  28.8  37.5  31.6  35.7
   10  35.0  32.9  48.8  23.7  34.2  35.0  40.6  37.1  34.1   0.0  39.0  38.5  36.1  37.0  38.9  45.0  21.4  56.0  47.2  47.4  55.6
   11  43.9  48.8  45.2  48.8  60.5  50.0  46.9  47.2  47.6  39.0   0.0  45.0  45.9  51.9  44.4  45.0  53.3  50.0  66.7  63.2  57.1
   12  30.8  46.2  47.5  24.4  25.0  31.6  43.3  45.7  35.0  38.5  45.0   0.0  25.7  24.0  29.4  26.3  28.6  36.0  35.3  33.3  44.4
   13  33.3  35.1  45.9  30.6  32.4  30.0  42.9  50.0  32.4  36.1  45.9  25.7   0.0  20.0  43.8  20.0  15.4  39.1  38.7  31.6  39.1
   14  26.9  22.2  40.7  19.2  23.1  35.0  38.5  44.0  24.1  37.0  51.9  24.0  20.0   0.0  41.7  31.6  28.6  42.1  37.5  26.3  33.3
   15  25.7  44.4  27.8  30.0  36.4  21.1  41.4  16.7  44.4  38.9  44.4  29.4  43.8  41.7   0.0  27.8  69.2  47.6  40.6  50.0  56.5
   16  36.8  25.0  35.0  25.0  26.3  40.0  42.1  50.0  37.5  45.0  45.0  26.3  20.0  31.6  27.8   0.0  37.5  38.9  47.1  35.7  46.2
   17  53.3  16.7  60.0  26.7  28.6  33.3  28.6  46.7  20.0  21.4  53.3  28.6  15.4  28.6  69.2  37.5   0.0  40.0  36.4  36.4  27.3
   18  52.0  38.5  46.2  40.4  27.3  46.7  38.1  50.0  28.8  56.0  50.0  36.0  39.1  42.1  47.6  38.9  40.0   0.0  30.0  35.7  22.2
   19  51.4  54.2  47.2  37.1  27.3  41.2  41.4  40.0  37.5  47.2  66.7  35.3  38.7  37.5  40.6  47.1  36.4  30.0   0.0  37.5  12.0
   20  33.3  42.1  68.4  41.7  38.9  57.9  52.6  44.4  31.6  47.4  63.2  33.3  31.6  26.3  50.0  35.7  36.4  35.7  37.5   0.0  28.6
   21  64.3  48.2  57.1  42.9  36.0  42.9  40.9  53.8  35.7  55.6  57.1  44.4  39.1  33.3  56.5  46.2  27.3  22.2  12.0  28.6   0.0
The dissimilarity coefficients (Hamming distances) for the short-term tests were obtained by the formula: $d_{A,B} = \frac{\text{number of chemicals which gave different responses in the two tests } A \text{ and } B}{\text{number of chemicals assayed in both tests}} \times 100$. The original data were derived from de Serres and Ashby (1981) and are summarized in Table 3. The tests are indicated by the codes shown in Table 2.

Fig. 1. Cluster dendrogram for short-term tests (single-linkage). The short-term tests were aggregated by the single-linkage procedure (see details in the text). The ordinate represents an ordinal scale of dissimilarity. Progressive hierarchical levels of aggregation take place at increasing dissimilarity levels.
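The three computational steps described in the Methods (percentage Hamming distances, secondary Euclidean proximities between the rows of the distance matrix, and single-linkage aggregation) can be illustrated with a minimal Python sketch. This is not the BMDP procedure used in the study. The toy responses, the use of `None` for a chemical not assayed in a test, the function names, and the choice to stop at a requested number of clusters are all assumptions made for illustration.

```python
import math

def hamming_percent(test_a, test_b):
    """Percentage of jointly assayed chemicals on which two tests disagree.

    None marks a chemical that was not assayed in a test (an assumption of
    this sketch); such chemicals are left out of the comparison.
    """
    shared = [(a, b) for a, b in zip(test_a, test_b)
              if a is not None and b is not None]
    if not shared:
        raise ValueError("the two tests share no assayed chemicals")
    differing = sum(1 for a, b in shared if a != b)
    return 100.0 * differing / len(shared)

def dissimilarity_matrix(responses):
    """Primary proximities: Hamming distance between every pair of tests."""
    return [[hamming_percent(r, s) for s in responses] for r in responses]

def secondary_proximities(matrix):
    """Euclidean distances between the rows of the dissimilarity matrix."""
    return [[math.dist(r, s) for s in matrix] for r in matrix]

def single_linkage(dist, n_clusters):
    """Naive single-linkage agglomeration down to n_clusters groups."""
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > n_clusters:
        best = None
        for p in range(len(clusters)):
            for q in range(p + 1, len(clusters)):
                # Single linkage: distance between the closest members.
                d = min(dist[i][j] for i in clusters[p] for j in clusters[q])
                if best is None or d < best[0]:
                    best = (d, p, q)
        _, p, q = best
        clusters[p] = sorted(clusters[p] + clusters[q])
        del clusters[q]
    return clusters

# Hypothetical responses of four tests to five chemicals (not ICS data).
toy_responses = [
    ["+", "+", "-", "-", None],
    ["+", "+", "-", "+", "-"],
    ["-", "-", "+", "+", "+"],
    ["-", "-", "+", "-", "+"],
]
primary = dissimilarity_matrix(toy_responses)
secondary = secondary_proximities(primary)
clusters = single_linkage(secondary, n_clusters=2)
print(clusters)  # [[0, 1], [2, 3]]
```

Clustering on the secondary proximities rather than on the primary matrix groups tests that have similar distance profiles with respect to all the other tests, which is the property emphasized in the Methods.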
TABLE 5
k-MEANS CLUSTER CONFIGURATION
Cluster 1            Cluster 2            Cluster 3
Test   Distance a    Test   Distance a    Test   Distance a
 1     49.71          2     41.98         20     46.70
 3     47.54          4     31.64         21     28.84
 6     46.28          5     39.25         19     31.74
 8     33.23          7     53.83         18     32.61
15     30.92          9     34.95
                     10     50.59
                     12     41.74
                     13     36.60
                     14     34.60
                     16     44.07
                     17     51.38
a.d.   41.54         a.d.   41.88         a.d.   34.97
The clusters were generated by the k-means procedure (see details in the text). a Distance of the elements from the centers of the clusters. a.d., average distance.
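The k-means iteration described in the Methods (random seeds, allocation of each test to the most similar seed, recomputation of the gravity centers, and termination when no reallocation occurs) can be sketched as follows. This is a generic illustration rather than the BMDP implementation; the function name, the `seed` and `max_iter` parameters, and the toy points are assumptions.

```python
import math
import random

def k_means(points, m, seed=0, max_iter=100):
    """Allocate points to m clusters by iterative reallocation.

    Returns (labels, centers). Seeds are drawn at random from the points,
    as in the description of the method; `seed` only fixes the random draw.
    """
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, m)]
    labels = None
    for _ in range(max_iter):
        # Allocation step: each point goes to its nearest center.
        new_labels = [
            min(range(m), key=lambda c: math.dist(p, centers[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # no reallocation from one step to the next
        labels = new_labels
        # Update step: new seeds are the gravity centers of the clusters.
        for c in range(m):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an emptied cluster keeps its previous center
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers

# Hypothetical, well-separated points (not rows of Table 4).
toy_points = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
toy_labels, toy_centers = k_means(toy_points, m=2, seed=1)
print(toy_labels)
```

Because the procedure only reaches a local optimum, different random seeds can give different partitions on less clearly separated data, which is one reason for comparing the result with an independent method such as single-linkage.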
Fig. 2. Planar projection of k-means clusters. Scatter plot of the orthogonal projection of tests into the plane defined by the centers of clusters. The values on the axes are a monotonic transformation of distances of tests from the center of cluster 2. Each test is indicated by the code number of the cluster to which it belongs.
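An orthogonal projection onto the plane defined by three cluster centers, as mentioned in the caption of Fig. 2, can be sketched with a Gram-Schmidt construction. The caption does not specify how the plane's axes were built or which monotonic transformation was applied to them, so the choice of one center as origin, the basis construction, and all names below are assumptions.

```python
import math

def _sub(p, q):
    return [a - b for a, b in zip(p, q)]

def _dot(p, q):
    return sum(a * b for a, b in zip(p, q))

def _unit(v):
    norm = math.sqrt(_dot(v, v))
    if norm == 0:
        raise ValueError("cluster centers must be distinct and not collinear")
    return [a / norm for a in v]

def project_onto_center_plane(points, origin, center_a, center_b):
    """Coordinates of each point in the plane through three cluster centers.

    `origin` plays the role of the reference center; the two axes are an
    orthonormal basis obtained by Gram-Schmidt from the directions towards
    the other two centers.
    """
    u = _unit(_sub(center_a, origin))
    w = _sub(center_b, origin)
    w = [wi - _dot(w, u) * ui for wi, ui in zip(w, u)]  # remove the u component
    v = _unit(w)
    return [(_dot(_sub(p, origin), u), _dot(_sub(p, origin), v)) for p in points]

# Hypothetical three-dimensional example: the third coordinate is discarded
# because the plane through the three centers is the x-y plane.
coords = project_onto_center_plane(
    points=[[1, 1, 5], [0, 0, 0]],
    origin=[0, 0, 0],
    center_a=[2, 0, 0],
    center_b=[1, 3, 0],
)
print(coords)
```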
On the basis of the 3-cluster partition we investigated how this configuration was correlated with the original data base derived from the ICS. For this purpose, coded values were assigned to the responses of the tests to the chemicals; the values 1, 2 and 3 were assigned to negative, dubious and positive results respectively. Then means and standard deviations for each chemical agent within the clusters were calculated. This analysis led to the results shown in Table 7, where markedly differentiated characteristics are shown by each cluster of tests with respect to the responses induced by the agents. Tests in cluster 1 gave the highest number of positive results; cluster 2 had an intermediate behavior between cluster 1 and cluster 3, suggesting an almost linear arrangement of the properties of the tests. Cluster 3 exhibited the smallest number of positive results together with a marked homogeneity of the responses (as one can see from the standard deviation). Five substances were positive and nine were negative in all the clusters (as mean responses of the tests within each cluster). It is worthwhile to note that out of the remaining chemicals, 19 showed a linear scaling of the response in the tests going from cluster 1 (more towards positivity) to cluster 3 (more towards negativity).

To assess whether this classification of tests, based on the data of the ICS, revealed a structure of general validity or was limited only to the 42 substances considered, we performed two further kinds of analysis. The first approach considered the fit to this structure of the results of a short-term test not included in the ICS. For this purpose we chose the neoplastic transformation study performed by Pienta (1980) on 95 compounds, of which 18 were in common with those of the ICS. The new dissimilarity coefficients between this and each of the other 20 tests were computed and added to the original dissimilarity matrix of Table 4 (Table 8). Then the k-means procedure was again undertaken as explained above. As shown in Table 9, the Syrian hamster embryo cell transformation test fitted perfectly into cluster 1 of the previous classification. It must be noted that cluster 1 also contained the BHK21 transformation test. One further analysis was carried out on the data published by ICPEMC (1984). They pertain to results relative to 280 compounds, critically chosen by the authors of the publication. Although the number of chemicals reviewed in this report was considerably higher than in the ICS, the possibility of comparing the tests one by one is generally limited to a few substances, because of the different ranges of chemicals on which each test had been performed. It is evident that in this case a cluster analysis cannot be carried out. Many of the assays, though, presented a substantial number of test results on substances that had also been tested in the Salmonella assay. Using these data, we computed the dissimilarity coefficients between a number of tests and the Salmonella assay. These coefficients were compared with the analogous coefficients derived from the ICS (Table 10). The comparison gave a Spearman rank correlation coefficient r_s = 0.46, suggesting a considerable similarity of the typologies underlying the two data bases.

TABLE 6
COMPARISON BETWEEN THE RESULTS OF THE CLUSTERING PROCEDURES
Method          Cluster 1        Cluster 2                                   Cluster 3
Single-linkage  1, 3, 7, 8, 15   2, 4, 5, 6, 9, 12, 13, 14, 16, 17           18, 19, 20, 21
k-means         1, 3, 6, 8, 15   2, 4, 5, 7, 9, 10, 12, 13, 14, 16, 17       18, 19, 20, 21
The compositions of the clusters derived from the two different approaches are displayed. It is evident that single-linkage clusters are substantially coincident with the k-means ones.

TABLE 7
[Table body not recoverable from the source: the rotated columns of cluster averages, standard deviations and carcinogenicity classes were scrambled in extraction. Chemical names still legible: Dinitrosopentamethylene tetramine; Diethylstilbestrol; 4-Dimethylaminoazobenzene; Hexamethylphosphoramide; o-Toluidine; 9,10-Dimethylanthracene; Azoxybenzene; Dimethylcarbamoyl chloride; β-Propiolactone; Epichlorohydrin; 3-Methyl-4-nitroquinoline-N-oxide; Hydrazine sulfate; 4-Nitroquinoline-N-oxide; Methylazoxymethanolacetate; Benzidine; 2-Naphthylamine; 4,4'-Methylenebis(2-chloroaniline); Cyclophosphamide; N-Nitrosomorpholine; Benzo[a]pyrene; 2-Acetylaminofluorene.]
The table reports the average responses of the clusters of tests to the chemicals, calculated from the original results of de Serres and Ashby (1981). The values 1, 2 and 3 were assigned to negative, dubious and positive results respectively. Then means and standard deviations for each chemical within the clusters were calculated. The cluster composition is that produced by the k-means method. The line delimits the positive results. a, average; S.D., standard deviation; C, carcinogen; N, non-carcinogen (according to de Serres and Ashby, 1981).

TABLE 8
DISSIMILARITY INDEXES FOR THE SYRIAN HAMSTER EMBRYO CELL TRANSFORMATION ASSAY
Tests     1     2     3     4     5     6     7     8     9    10    12    13    14    15    16    17    18    19    20    21
SHE    23.5  50.0  29.4  37.5  41.2  36.4  37.5  13.3  50.0  41.2  35.3  46.7  35.7  20.0  55.6  62.5  66.7  33.3  50.0  50.0
On the basis of the 18 chemicals assayed both in the ICS and by Pienta (1980), we calculated the dissimilarity coefficients between the Syrian hamster embryo transformation test (SHE) and the other tests considered in this study, according to the procedure illustrated in the text.

TABLE 9
k-MEANS CLUSTERS INCLUDING THE SYRIAN HAMSTER EMBRYO TRANSFORMATION TEST
Cluster 1            Cluster 2            Cluster 3
Test   Distance      Test   Distance      Test   Distance
 1     45.27          2     42.01         18     36.63
 3     49.95          4     31.60         19     35.86
 8     34.37          5     37.97         20     46.69
15     30.81          6     52.48         21     28.84
22 a   33.63          7     53.14
                      9     37.15
                     10     49.58
                     12     42.69
                     13     38.47
                     14     37.49
                     17     56.54
                     16     45.93
a.d.   38.80         a.d.   43.76         a.d.   37.00
a Syrian hamster embryo cell transformation assay. For details, see text and legend to Table 3.

TABLE 10
DISTANCES BETWEEN SALMONELLA ASSAY AND OTHER SHORT-TERM TESTS
Data source     Test codes
                 1    3    5   12   14   15   17   18   19
ICPEMC, 1984    35   32    7   21   29   26   22   24   42
ICS             30   39   20   24   19   30   27   40   37
r_s = 0.46
The table shows the Hamming distances (× 100) between the Salmonella assay and other tests, calculated on two different data bases. r_s, Spearman rank correlation coefficient.

Discussion
Since classification is the ordering of objects by their similarities (Sneath and Sokal, 1973), the definition of the criteria by which the objects are compared is obviously of the greatest importance. In the context of the present paper a purely operational criterion has been assumed. The assumption that we made is very simple: the more similar two tests, the more similar their responses to the chemicals. This assumption is totally data based and does not include any unverified inference about the mechanistic characteristics of the various biological end-points. A major problem in this study was the selection of an adequate data base for comparing the performances of the most widely used tests. As stressed in a recent report of ICPEMC (1984), only a fraction of the published literature can be assumed to meet rigorous criteria, with the result that there are large gaps in information about the genotoxicity of substances characterized as carcinogens or non-carcinogens. For example, in the list mentioned the number of known carcinogens and non-carcinogens for which results are available with most of the assays is relatively small. As our intention was to compare the different tests over the responses to an adequate range of chemicals, we decided to perform the analysis on the data base generated by the ICS. The number of chemicals tested in this study was not very high (42), but it provided a body of data with characteristics essential for the analysis. First of all, the chemicals were carefully chosen to include examples of the major classes of chemical carcinogens, with 14 structurally related carcinogen/non-carcinogen pairs. The chemicals assayed by the different laboratories were of common origin, coded and prepared in as pure a form as possible. Finally, the carcinogenicity data were critically evaluated and the results of the tests were discussed by the scientists participating in the study. For all these reasons, we decided to use such a coherent base of data for our analysis. As we did not want to fit the data to some preconceived models but rather to make an exploratory analysis, we chose two clustering methods with different (and in a certain sense opposite) underlying statistical models: the single-linkage and k-means procedures. In our case, both analyses resulted in coincident clusters (Fig. 1, Tables 5 and 6), where the single-linkage classes formed the 'nuclei' of the k-means classes. A characteristic of such classifications is that they lend themselves, at least in principle, to measurement of how well they have contributed to the specific purpose of comprehending the data more clearly. In our case what we wanted to know was whether such classification had led to classes with distinguishable characteristics with respect to the response of the tests to the chemicals. As shown in Table 7, our mathematical classification clearly had this discriminant power. The tests classified in cluster 1 gave, on average, the highest percentage of positive responses; cluster 3 exhibited the smallest number of positive results, while cluster 2 had an intermediate behavior. Of course, another way of verifying the clarity of separation between clusters is to look at the dissimilarity matrix (Table 4).
For example, it is easy to note that test 8 is nearer to test 15, which is the most central one in cluster 1, than to tests 4 or 21, which are the units nearest to the centers of clusters 2 and 3 respectively (Table 5). An analysis of the results of Table 7 also gives a reason why the data fitted into the hierarchical model underlying the single-linkage algorithm (Fig. 1). It is evident that there are substances negative or positive in all the groups: these 'make the communality' at the top of the dendrogram. Then there are 'discriminant' chemicals exhibiting a differential (nearly scalar) behavior; they 'make the difference' between the groups. While the class partition we obtained by applying the clustering algorithms is a solid representation of 'what there is' in the data base generated by the ICS, whether it is of more general validity for the universe of chemical agents is open to further analysis. Because of the difficulties in finding a wider and reliable base of data, we began this task with two limited but informative approaches. First, we demonstrated that a test not included in the ICS (the Syrian hamster embryo transformation test (Pienta, 1980)) fitted perfectly into the cluster configuration without significant alterations to it (Table 9). It is important to note that the Syrian hamster embryo transformation test was allocated to cluster 1, very close to the BHK21 transformation test. Secondly, we computed the relative distances (dissimilarity coefficients) between the Salmonella assay and a number of tests on a base of data independent of the ICS. This was the list published in ICPEMC (1984). These distances were compared with those generated by the ICS results (Table 10). The value of the Spearman rank correlation coefficient (r_s = 0.46) suggests a notable similarity between the structures of the two data sets, in spite of their very different origin and nature. Of course, even if these pieces of evidence may indicate that the pattern underlying the ICS results is a natural structure of more general validity, they cannot suffice to give the results of the present paper the status of a definitive statement. For this purpose, it is highly desirable that the large gaps in information in data bases such as that of ICPEMC (1984) should be filled before gathering data on additional chemicals or setting up new tests.
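The Spearman coefficient quoted for Table 10 can be checked with a short sketch that ranks the two rows of distances (averaging tied ranks) and computes the Pearson correlation of the ranks. The nine value pairs are those listed in Table 10; their column alignment with the test codes (in particular, that the stray code 5 belongs to the third column) is an assumption of this reconstruction, although the ICS row agrees with row 4 of Table 4 under that reading. The function names are illustrative.

```python
def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman rank correlation as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5

# Distances between the Salmonella assay and nine other tests (Table 10).
icpemc_distances = [35, 32, 7, 21, 29, 26, 22, 24, 42]
ics_distances = [30, 39, 20, 24, 19, 30, 27, 40, 37]

r_s = spearman(icpemc_distances, ics_distances)
print(round(r_s, 2))  # 0.46
```

With only nine pairs, a coefficient of this size indicates a moderate positive association between the two sets of distances.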
The overall biological profile of the classification produced by cluster analysis confirmed many current ideas and also yielded some notable new findings. Note the clear separation between in vitro and in vivo tests. The in vivo tests were all grouped together in cluster 3, which is characterized by the smallest number of positive responses and a remarkable homogeneity. This confirms the previous assessments about the particularly low sensitivity of these tests. It also confirms what was already known about the relative similarity of responses of the SCE in vitro test with both mammalian mutation assays and chromosomal aberrations (Carrano et al., 1978; Gebhart, 1981). In fact these tests have low distances between them (Table 4) and group together in cluster 2 (Fig. 1). Toxicologists will easily detect many other examples of convergence between the previous notions and the findings of the present analysis. At the same time, some new and partially unsuspected elements emerged from the ICS data. The main one is the division of the in vitro tests into two well characterized clusters (Table 6). Neither phylogeny nor community of end-points seems to be the rule of this partition; rather, a strong test system specificity apparently overcomes any other factor. In other words, it seems very difficult to predict the specific responses of the assays to the chemicals on the basis of those concepts. In this respect, the allocation of the BHK21 transformation test to cluster 1, next to the S. cerevisiae XV185-14C test, is particularly noteworthy. The Syrian hamster embryo transformation test, too, confirmed this finding by being allocated very closely (Tables 6 and 9). In light of these results, the current divisions of tests into groups on the basis of the type of organism and end-point are not much supported by the ICS data and do not help much with an understanding of the phenomena. As stressed above, it must be kept in mind that the characteristics revealed by cluster analysis are not the product of an inferential process; on the contrary, the membership of clusters was determined in a natural way by the ICS data themselves, as though they had been subjected to a 'radiography'. Obviously, a cluster analysis can only be based on the variables that were provided. It is possible, at least in principle, that the repetition of the study on a different and wider base of data would result in a notably different classification. Such an occurrence would give rise to new problems that would not be easy to solve.
Probably, the selection of the chemicals should be questioned first. In the case of the ICS, the test chemicals were a very carefully chosen sample, representative of most of the chemical classes of interest and including both carcinogens and non-carcinogens. It is difficult, at present, to imagine how such a subset of chemicals could produce a very distorted representation of the general properties of the tests. In any case, these considerations indicate that much must be learned before our understanding of genotoxicity may be considered sufficiently solid. In our opinion, the topics discussed in this paper emphasize the usefulness of applying cluster analysis, and in general automatic classification techniques, to the results of genotoxicity research. A major feature of this approach is the ability to bring out underlying structures. The ready visualization of these patterns can be regarded as a quick and easy first step in data analysis and can contribute both to hypothesis formulation and to practical purposes. For example, cluster analysis can help to form stratified subsamples to reduce the bulk of the data. Or, if the objects in one cluster really have a different character from the objects in another, it can be useful to perform subsequent, more sophisticated analyses separately (Kruskal, 1977). The construction of test batteries able to discriminate between carcinogens and non-carcinogens might be an immediate application of this approach. Once clusters of well validated tests had been obtained on a base of adequately representative data, it would be very easy to design a suitable combination of tests belonging to the cluster best able to identify carcinogenic properties correctly. In the case of the ICS data, Table 7 shows that cluster 1 and cluster 3 have the highest and lowest sensitivities respectively. They also have opposite specificities; this is low for cluster 1 and high for cluster 3. Cluster 2 has more balanced characteristics, in spite of the presence of 11 'problem' carcinogens. In conclusion, we think that in the past years genotoxicity research has produced enough data to try to give a coherent form to them. The present work may be of value in assessing the correct methodological approach to this problem.
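The kind of summary reported in Table 7, and the sensitivity and specificity comparison drawn from it, can be illustrated with a small sketch. The coding of negative, dubious and positive results as 1, 2 and 3 follows the text. Everything else is an assumption made for illustration: the toy responses, the use of the sample standard deviation (n - 1 denominator), and above all the cutoff on the mean response used to call a cluster 'positive' for a chemical, since the criterion behind the line that delimits positive results in Table 7 is not stated.

```python
from statistics import mean, stdev

# Coding used in the text: negative = 1, dubious = 2, positive = 3.
CODES = {"-": 1, "?": 2, "+": 3}

def cluster_profile(responses_by_chemical):
    """Mean and standard deviation of the coded responses per chemical."""
    profile = {}
    for chemical, responses in responses_by_chemical.items():
        coded = [CODES[r] for r in responses]
        sd = stdev(coded) if len(coded) > 1 else 0.0  # sample S.D. (assumed)
        profile[chemical] = (mean(coded), sd)
    return profile

def sensitivity_specificity(profile, carcinogens, cutoff=2.0):
    """Fractions of carcinogens called positive and non-carcinogens negative.

    `cutoff` is a hypothetical threshold on the mean coded response.
    """
    positive = {chem for chem, (avg, _) in profile.items() if avg >= cutoff}
    non_carcinogens = set(profile) - set(carcinogens)
    sensitivity = len(positive & set(carcinogens)) / len(carcinogens)
    specificity = len(non_carcinogens - positive) / len(non_carcinogens)
    return sensitivity, specificity

# Hypothetical responses of the four tests of one cluster (not ICS data).
toy_cluster = {
    "carcinogen X": ["+", "+", "+", "-"],
    "carcinogen Y": ["-", "-", "-", "+"],
    "non-carcinogen Z": ["-", "-", "-", "-"],
    "non-carcinogen W": ["+", "+", "-", "?"],
    "non-carcinogen V": ["-", "-", "?", "-"],
}
toy_profile = cluster_profile(toy_cluster)
toy_sens, toy_spec = sensitivity_specificity(
    toy_profile, carcinogens={"carcinogen X", "carcinogen Y"}
)
print(toy_profile["carcinogen X"], toy_sens, toy_spec)
```

Running the same summary separately for each cluster would give the cluster-by-cluster contrast discussed above, for whatever cutoff is considered appropriate.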
Acknowledgement
We thank Dr. G.A. Zapponi for helpful discussions.
References
Bellacicco, A. (1980) Metodologia e Tecnica della Classificazione Matematica, La Goliardica Editrice, Roma.
Carrano, A.V., L.H. Thompson, P.A. Lindl and J.L. Minkler (1978) Sister-chromatid exchange as an indicator of mutagenesis, Nature (London), 271, 551-553.
de Serres, F.J., and J. Ashby (Eds.) (1981) Evaluation of Short-Term Tests for Carcinogens, Report of the International Collaborative Program, Progress in Mutation Research, Vol. 1, Elsevier/North-Holland, Amsterdam.
Everitt, B. (1980) Cluster Analysis, Halsted Press, New York.
Gebhart, E. (1981) Sister chromatid exchange (SCE) and structural chromosome aberrations in mutagenicity testing, Hum. Genet., 58, 235-254.
International Commission for Protection against Environmental Mutagens and Carcinogens (ICPEMC) (1982) Committee No. 2 final report: Mutagenesis testing as an approach to carcinogenesis, Mutation Res., 99, 73-91.
International Commission for Protection against Environmental Mutagens and Carcinogens (ICPEMC) (1984) Task Group No. 5. Report on the differentiation between genotoxic and non genotoxic carcinogens, Mutation Res., 133, 1-49.
Kruskal, J.B. (1964) Non metric multidimensional scaling: a numerical method, Psychometrika, 29, 115-129.
Kruskal, J.B. (1977) The relationship between multidimensional scaling and clustering, in: Classification and Clustering, Academic Press, New York.
Pienta, R.J. (1980) Evaluation and relevance of the Syrian hamster embryo cell system, in: G.M. Williams, R. Kroes, H.W. Waaijers and K.V. Van de Pol (Eds.), The Predictive Value of Short-Term Screening Tests in Carcinogenicity Evaluation, Elsevier/North-Holland, Amsterdam, pp. 149-169.
Purchase, I.F.H. (ICPEMC) (1982) An appraisal of predictive tests for carcinogenicity, Mutation Res., 99, 53-71.
Sneath, P.H.A., and R.R. Sokal (1973) Numerical Taxonomy, Freeman, San Francisco, CA.
Van Ryzin, J. (Ed.) (1977) Classification and Clustering, Academic Press, New York.