Analytica Chimica Acta, 133 (1981) 603-613
Computer Techniques and Optimization
Elsevier Scientific Publishing Company, Amsterdam - Printed in The Netherlands
COMPUTER-ASSISTED STRUCTURE-CARCINOGENICITY STUDIES ON POLYCYCLIC AROMATIC HYDROCARBONS BY PATTERN RECOGNITION METHODS
YOSHIKATSU MIYASHITA, TOMOKO SEKI, YOSHIMASA TAKAHASHI, SHIN-ICHI DAIBA, YUICHIRO TANAKA, YASUHIKO YOTSUI**, HIDETSUGU ABE and SHIN-ICHI SASAKI*
School of Materials Science, Toyohashi University of Technology, Tempaku, Toyohashi, Aichi 440 (Japan)
(Received 23rd January 1981)
SUMMARY
Pattern recognition methods are applied to the study of structure-carcinogenicity relationships in 25 representative polycyclic aromatic hydrocarbons (PAHs). On the basis of presumed metabolic transformation, a variety of reactivity indices taken from simple Hückel molecular orbital theory, for not only the parent PAH but also later metabolites, are used to investigate the carcinogenic process. In order to display the 12-dimensional molecular descriptor space, a Karhunen-Loève plot in two-dimensional space is employed; 92.1% of the variance is retained. The data structure shows asymmetric character. Carcinogens are clustered, whereas non-carcinogens are scattered. Linear discriminant functions for carcinogenicity are developed by using multiple linear regression analysis. The most significant equations suggest the importance of metabolic pathways.
Numerous attempts have been made to explore structure-carcinogenicity relationships in polycyclic aromatic hydrocarbons (PAHs). The K- and L-region theory of Pullman and Pullman [1] and the bay-region theory of Jerina et al. [2, 3] have been applied to this problem. Most of the theoretical models have focused attention on properties of the parent PAHs. Recently, however, the metabolic transformations of benzo[a]pyrene [4, 5] and benz[a]anthracene [6] have been studied. Benzo[a]pyrene and benz[a]anthracene are metabolically activated and transformed in vivo from precarcinogen to ultimate carcinogen. On the basis of this presumed transformation, Smith et al. [7] have qualitatively examined the relationships between carcinogenicity and a variety of reactivity indices taken from simple Hückel molecular orbital theory in 25 unsubstituted PAHs. Pullman [8] has critically reviewed recent discoveries on the metabolic transformations of PAHs.

Pattern recognition methods have been applied to structure-activity studies [9, 10]. These techniques have also been used for structure-carcinogenicity studies [11-15]. In this report, pattern recognition methods are applied to the study of structure-carcinogenicity relationships using theoretical indices relating to a series of metabolic transformations obtained by Smith et al. [7].

**Present address: Research Institute, Daiichi Seiyaku Co. Ltd., Edogawa-ku, Tokyo 132, Japan.
0378-4304/81/0000-0000/$02.50 © 1981 Elsevier Scientific Publishing Company

DATA SET
As a result of a variety of studies, the preliminary path by which benzo[a]pyrene is metabolically activated and transformed in vivo from precarcinogen to ultimate carcinogen is believed to consist of the stages shown in Fig. 1. Only one stereoisomeric form is illustrated. In step (a) of Fig. 1, benzo[a]pyrene is converted to its 7,8-epoxide IIa at the A-region. This is the presumptive initial epoxidation site on the terminal ring of the bay region (called the M-region by Pullman [8]). In step (b), the epoxide IIa is transformed to the 7,8-dihydrodiol IIb. Saturation of the 7,8-bond activates the 9,10-bond, the B-region. This is the site of final epoxidation on the terminal ring of the bay region (called the N-region by Pullman [8]). Compound IIb is transformed to the 7,8-dihydrodiol-9,10-epoxide III in step (c). In step (d), the diol-epoxide III converts spontaneously to the triol carbonium ion IV. It has been suggested [16] that carbonium ions such as IV act as ultimate carcinogens via electrophilic attack on critical cellular nucleophiles, e.g., DNA. Compounds IIa and IIb are equivalent in Hückel molecular orbital theory; they have the same theoretical indices.

The carcinogenicity indices of Arcos and Argus [17] and of Jerina et al. [2, 3] for the PAHs are shown in Table 1. Fourteen molecular structure descriptors are shown in Table 2. Thus the chemical structures of a PAH and its metabolites are represented by a 14-dimensional pattern vector. Table 3 shows which descriptor is associated with each stage in the series of metabolic transformations.

CLUSTERING OF POLYCYCLIC AROMATIC HYDROCARBONS
The preprocessing method is autoscaling, which weights all descriptors equally. This method provides zero mean and unit standard deviation for all descriptors. The correlation coefficients between P_A and ΔE(1), and between Q_b and ΔE_deloc, are both 0.999. Therefore, P_A and Q_b are omitted and the remaining twelve-dimensional data are analyzed. The correlation matrix for the 12 descriptors is shown in Table 4.

A hierarchical clustering method [18] was applied to the 25 PAHs. Here, the distance between two clusters is determined by the nearest neighbours in the two clusters. The result is shown as a branching-tree diagram in Fig. 2. It is clear that carcinogenic compounds are clustered, whereas noncarcinogenic compounds are scattered. This result suggests the asymmetric nature of the molecular descriptor space.

The K-nearest neighbor classification rule (K = 1) was applied to 24 PAHs using the carcinogenicity data of Arcos and Argus [17]. On the basis of the 12 autoscaled structural descriptors, the predictive ability for classifying PAHs as carcinogens or noncarcinogens is 79.2%.

Fig. 1. Metabolic conversion of benzo[a]pyrene.

TABLE 1
Carcinogenicity indices for 25 polycyclic aromatic hydrocarbons

Compound  Name
 1        Naphthalene
 2        Anthracene
 3        Tetracene
 4        Pentacene
 5        Hexacene
 6        Benz[a]anthracene
 7        Benzo[a]tetracene
 8        Phenanthrene
 9        Benzo[c]phenanthrene
10        Chrysene
11        Benzo[b]chrysene
12        Picene
13        Triphenylene
14        Benzo[g]chrysene
15        Dibenz[a,c]anthracene
16        Dibenz[a,j]anthracene
17        Dibenz[a,h]anthracene
18        Naphtho[2,3-a]pyrene
19        Benzo[a]pyrene
20        Benzo[e]pyrene
21        Dibenzo[a,l]pyrene
22        Dibenzo[a,i]pyrene
23        Dibenzo[a,e]pyrene
24        Dibenzo[a,h]pyrene
25        Tribenzo[a,e,i]pyrene

Carcinogenicity index, Arcos and Argus [17]: 0 0 0 0 5 0 4 3 0 0 17 3 4 26 27 73 2 33 74 50 70 16
Carcinogenicity index, Jerina et al. [2, 3]: ? + + + 2+ + + 2+ 2+ 4+ + 2+ 4+ 3+ 4+ 2+
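The autoscaling, nearest-neighbour cluster distance and K = 1 classification described in this section can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' program: the function names, the use of the sample standard deviation, the Euclidean metric, the leave-one-out evaluation and the toy data are all assumptions, since the text does not specify them. For orientation, a predictive ability of 79.2% on 24 compounds corresponds to 19 correct classifications (19/24 = 0.792).

```python
import numpy as np

def autoscale(X):
    """Scale each descriptor (column) to zero mean and unit standard deviation."""
    X = np.asarray(X, dtype=float)
    # Assumption: sample standard deviation (ddof=1); the text does not say which form is used.
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

def pairwise_distances(A, B):
    """Euclidean distances between every row of A and every row of B."""
    diff = A[:, None, :] - B[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))

def single_linkage_distance(A, B):
    """Cluster distance defined by the nearest neighbours in the two clusters."""
    return float(pairwise_distances(A, B).min())

def loo_1nn_accuracy(X, y):
    """Leave-one-out accuracy of the 1-nearest-neighbour rule (assumed evaluation scheme)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    d = pairwise_distances(X, X)
    np.fill_diagonal(d, np.inf)      # a compound may not be its own neighbour
    nearest = d.argmin(axis=1)       # index of the closest other compound
    return float(np.mean(y[nearest] == y))

# Hypothetical toy data (two descriptors, two classes); not the PAH data set.
X_toy = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                  [5.0, 5.0], [5.0, 6.0], [6.0, 5.0]])
y_toy = np.array([0, 0, 0, 1, 1, 1])

X_scaled = autoscale(X_toy)
print(loo_1nn_accuracy(X_scaled, y_toy))
print(single_linkage_distance(X_toy[:3], X_toy[3:]))
```

Repeatedly merging the two clusters with the smallest `single_linkage_distance` would produce a branching tree of the kind shown in Fig. 2.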
TABLE 2
Molecular structure descriptors for polycyclic aromatic hydrocarbons

No.  Descriptor  Definition
 1   ln P        log of partition coefficient P for parent compound
 2   L_A         sum of two atomic superdelocalizabilities involved in an A-regional bond for parent compound
 3   HOMO        highest occupied molecular orbital energy for parent compound
 4   L_B'        bond superdelocalizability for A-region dihydrodiol form
 5   P_B'        bond order for the A-region dihydrodiol form
 6   ΔE_deloc    change in delocalization energy
 7   F_b         carbonium ion free valence
 8   S_b         carbonium ion atomic superdelocalizability
 9   ΔE(1)       energy loss in going from the parent compound to the A-region epoxide or dihydrodiol
10   ΔE(2)       energy loss in forming the dihydrodiol-epoxide from the A-region dihydrodiol
11   ΔE(3)       energy change in forming the trihydrotriol carbonium ion from the dihydrodiol-epoxide
12   N_C         no. of carbon atoms
13   P_A         bond order for the A-region parent compound
14   Q_b         carbonium ion charge density at the benzylic carbon position

TABLE 3
Metabolites and descriptors

Stage                                                        Descriptors
Parent compound                                              ln P, N_C, L_A, HOMO, P_A
Parent compound to A-region epoxide or dihydrodiol           ΔE(1)
A-region dihydrodiol                                         L_B', P_B'
A-region dihydrodiol to dihydrodiol-epoxide                  ΔE(2)
Dihydrodiol-epoxide to trihydrotriol carbonium ion           ΔE(3)
Carbonium ion                                                ΔE_deloc, S_b, F_b, Q_b
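Several of the descriptors in Table 2 (the HOMO energy and the bond orders, for example) are simple Hückel quantities. As a hedged illustration of how such indices arise, the sketch below runs a Hückel calculation for the parent compound naphthalene (Compound 1). It is not the procedure of Smith et al.: the function name `huckel`, the atom numbering, and the convention of reporting energies as the coefficient x in E = α + xβ are assumptions made for this example.

```python
import numpy as np

# Carbon skeleton of naphthalene. Illustrative numbering (an assumption):
# indices 0..3 = C1..C4, 4 = C4a, 5..8 = C5..C8, 9 = C8a.
NAPHTHALENE_BONDS = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 6),
                     (6, 7), (7, 8), (8, 9), (9, 0), (4, 9)]

def huckel(n_atoms, bonds, n_electrons):
    """Simple Hückel MO calculation.

    Returns the HOMO coefficient x (energy = alpha + x * beta) and the
    pi bond-order matrix P_rs = 2 * sum over occupied MOs of c_r * c_s.
    """
    A = np.zeros((n_atoms, n_atoms))
    for r, s in bonds:
        A[r, s] = A[s, r] = 1.0
    x, c = np.linalg.eigh(A)          # eigenvalues in ascending order
    order = np.argsort(-x)            # beta < 0, so the largest x is the lowest energy
    x, c = x[order], c[:, order]
    n_occ = n_electrons // 2          # doubly occupied orbitals
    homo = float(x[n_occ - 1])
    occupied = c[:, :n_occ]
    P = 2.0 * occupied @ occupied.T
    return homo, P

homo, P = huckel(10, NAPHTHALENE_BONDS, 10)
print(round(homo, 3))                 # HOMO coefficient in units of beta
print(round(float(P[0, 1]), 3))       # C1-C2 pi bond order
print(round(float(P[1, 2]), 3))       # C2-C3 pi bond order
```

For naphthalene this reproduces the familiar textbook values: a HOMO at α + 0.618β and π bond orders of about 0.725 (C1-C2) and 0.603 (C2-C3). Indices for the metabolites would be obtained in the same way after removing the saturated carbons from the conjugated skeleton.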
TABLE 4
Correlation matrix for twelve structure descriptors

           ln P    L_A     HOMO    L_B'    P_B'    ΔE_deloc F_b     S_b     ΔE(1)   ΔE(2)   ΔE(3)   N_C
ln P       1.000
L_A        0.690   1.000
HOMO      -0.790  -0.842   1.000
L_B'       0.664  -0.013  -0.374   1.000
P_B'      -0.313   0.370   0.014  -0.909   1.000
ΔE_deloc   0.707   0.125  -0.491   0.981  -0.834   1.000
F_b       -0.434   0.256   0.139  -0.952   0.988  -0.903    1.000
S_b        0.876   0.473  -0.735   0.792  -0.502   0.890   -0.620   1.000
ΔE(1)      0.164   0.798  -0.562  -0.350   0.600  -0.275    0.533   0.021   1.000
ΔE(2)     -0.233   0.457  -0.063  -0.869   0.994  -0.771    0.968  -0.415   0.653   1.000
ΔE(3)      0.474  -0.212  -0.209   0.950  -0.970   0.903   -0.985   0.635  -0.478  -0.947   1.000
N_C        0.922   0.263  -0.574   0.710  -0.557   0.783   -0.641   0.796  -0.170  -0.499   0.656   1.000

Fig. 2. Branching-tree diagram for 25 PAHs.

Display of data structure
The advantage of the display method is that it offers easy understanding of the complicated data structure in a high-dimensional space [19, 20]. If the intrinsic dimensionality of the data is high, the nonlinear mapping method is preferred. However, it should be remembered that the axes given by nonlinear transformation have no physical meaning with respect to the original axes. In contrast, if the intrinsic dimensionality of the data is low, linear mapping is preferred. The Karhunen-Loève transform is one of the linear mapping methods. Each new transformed axis is a linear combination of the descriptors $x_k$ and is orthogonal to the other axes. The method starts by diagonalizing the covariance matrix $C$ to obtain the eigenvector matrix $T$ and eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_d$.
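\[ T' C T = \Lambda = \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_d) \tag{1} \]
with $\lambda_1 > \lambda_2 > \ldots > \lambda_d$ and $T = (t^{(1)}, t^{(2)}, \ldots, t^{(d)})$. Next, for display purposes in two-dimensional space, the eigenvectors $t^{(1)}$ and $t^{(2)}$ associated with the two largest eigenvalues $\lambda_1$ and $\lambda_2$ are used for calculating new axes, $Z_1$ and $Z_2$:
\[ Z_1 = \sum_k t_k^{(1)} x_k \quad \text{and} \quad Z_2 = \sum_k t_k^{(2)} x_k \]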
This K-L plot in two-dimensional space is the best projection in the sense that it has the minimum mean squared error. The reliability of interpretation using this plot is given by the percent variance (%Var) retained by this method: $\%\mathrm{Var} = [(\lambda_1 + \lambda_2)/\sum_k \lambda_k] \times 100$. If %Var is near 100, the plot is satisfactory for displaying the data structure.

In order to display the multidimensional data structure, the Karhunen-Loève (K-L) transform is applied to the autoscaled molecular descriptors of PAHs and their metabolites. The K-L plot in two-dimensional space ($Z_1$, $Z_2$) for 25 PAHs is shown in Fig. 3. Since the percent variances retained by the K-L plot in two- and three-dimensional space are 92.1 and 96.7, respectively, the reliability of interpretation of the two-dimensional map is very high. This map is helpful in understanding the multidimensional data structure. It seems that the $Z_1$ and $Z_2$ axes are related to the length and roundness of the PAH, respectively. An asymmetric type of data structure can be observed [21]. As the carcinogenic process for PAHs involves a series of metabolic transformations, it would be reasonable to suppose that carcinogenic compounds and their metabolites have similar properties. The noncarcinogenicity may be due to several factors.
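The K-L projection and the percent variance defined above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function name `kl_transform` and the small data matrix are hypothetical, and the covariance matrix is diagonalized with NumPy's symmetric eigensolver. Passing autoscaled descriptors, as the paper does, makes the covariance matrix equal to the correlation matrix.

```python
import numpy as np

def kl_transform(X, n_axes=2):
    """Project the rows of X onto the leading eigenvectors of the covariance matrix.

    Returns the new coordinates Z, the eigenvalues in decreasing order,
    and the percent variance retained by the first n_axes axes.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                 # centre each descriptor
    C = np.cov(Xc, rowvar=False)            # covariance matrix C
    lam, T = np.linalg.eigh(C)              # symmetric eigenproblem, ascending order
    order = np.argsort(lam)[::-1]           # reorder so lambda_1 > lambda_2 > ...
    lam, T = lam[order], T[:, order]
    Z = Xc @ T[:, :n_axes]                  # Z_i = sum_k t_k^(i) x_k
    pct_var = 100.0 * lam[:n_axes].sum() / lam.sum()
    return Z, lam, float(pct_var)

# Hypothetical, nearly one-dimensional data; not the PAH descriptors.
X_demo = np.array([[1.0, 2.0, 3.0],
                   [2.0, 4.1, 6.1],
                   [3.0, 5.9, 8.9],
                   [4.0, 8.2, 12.2],
                   [5.0, 9.8, 15.0]])

Z, lam, pct_var = kl_transform(X_demo)
print(Z.shape, pct_var)
```

The variance of each new axis equals the corresponding eigenvalue, which is why the ratio of eigenvalue sums measures how much of the data structure the two-dimensional map retains.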
Fig. 3. K-L plot of 25 PAHs.
\[ L = 11.611\,\Delta E_{\mathrm{deloc}} - 6.628 \qquad (r = 0.736,\ s = 0.817,\ F = 15.36) \tag{22} \]
\[ L = 11.931\,L_A - 107.406\,\Delta E^{(\ldots)} - 283.610 \qquad (r = 0.899,\ s = 0.550,\ F = 25.27) \tag{23} \]
\[ L = 11.145\,L_A - 184.581\,P_{B'} + 146.493 \qquad (r = 0.896,\ s = 0.558,\ F = 24.41) \tag{24} \]
\[ L = 9.760\,L_A + 21.427\,\Delta E^{(\ldots)} - 34.431 \qquad (r = 0.877,\ s = 0.604,\ F = 19.98) \tag{25} \]
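The statistics quoted with each discriminant function (r, s and F) are the usual outputs of a multiple linear regression. The sketch below shows one conventional way to obtain them by least squares; it is an illustration under assumptions, not the authors' program. The function names and toy data are hypothetical, s is taken to be the standard error of estimate, and F is computed from the textbook relation F = (r²/p) / ((1 - r²)/(n - p - 1)) for p descriptors and n compounds. The number of compounds is not stated in this section; n = 15 is an inference from the fact that it reproduces the reported F values from the reported r values.

```python
import numpy as np

def f_from_r(r, n, p):
    """F statistic implied by a multiple correlation coefficient r (n samples, p descriptors)."""
    r2 = r * r
    return (r2 / p) / ((1.0 - r2) / (n - p - 1))

def fit_mlr(X, y):
    """Least-squares fit of y = X @ b + b0; returns (coefficients, r, s, F)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    n, p = X.shape
    A = np.column_stack([X, np.ones(n)])          # last column carries the intercept
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    sse = float(resid @ resid)                    # residual sum of squares
    sst = float(((y - y.mean()) ** 2).sum())      # total sum of squares
    r = float(np.sqrt(1.0 - sse / sst))
    s = float(np.sqrt(sse / (n - p - 1)))         # assumed meaning of s
    return coef, r, s, f_from_r(r, n, p)

# Consistency check against the reported statistics, assuming n = 15 (inferred, not stated).
for r, p in [(0.736, 1), (0.899, 2), (0.896, 2), (0.877, 2)]:
    print(p, round(f_from_r(r, 15, p), 2))

# Hypothetical toy fit with one descriptor.
x_toy = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y_toy = np.array([1.1, 2.9, 5.2, 6.8, 9.0])
coef, r, s, F = fit_mlr(x_toy, y_toy)
print(coef, r, s, F)
```

Under the n = 15 assumption the computed F values agree with those printed for Eqns. (22)-(25) to within rounding of r.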
The discriminant functions for the carcinogenicity indices of Jerina et al. [2, 3] are very similar to those for the carcinogenicity indices of Arcos and Argus [17].

DISCUSSION
Cluster analysis and linear mapping are used for understanding the multidimensional structure descriptor space. The intrinsic dimensionality of the data is about three. The quantitative analysis to develop discriminant functions shows the importance of the reactivity of the A-region of parent PAHs and the indices for the later metabolites. The result that more significant linear discriminant functions are obtained for only the carcinogenic compounds shows the asymmetric nature of the descriptor data space structure [21]. Smith et al. [7] could find no good explanation for the carcinogenicities of benzo[a]tetracene (Compound 7) and naphtho[2,3-a]pyrene (Compound 18). It can be seen from Table 5 that the calculated carcinogenicity for Compound 18 is in fair agreement with the observed value. This success obtained by using multiple descriptors strongly supports the presumed model of metabolic transformations. The prediction of non-carcinogenic compounds in this study also could not explain the non-carcinogenicity of Compound 7. An explanation may be that the model for non-carcinogenicity would require deactivating processes such as the L-region model.

REFERENCES
1 A. Pullman and B. Pullman, Adv. Cancer Res., 3 (1955) 117.
2 D. M. Jerina and R. E. Lehr, in V. Ullrich, I. Roots, A. G. Hildebrandt, R. W. Estabrook and A. H. Conney (Eds.), Microsomes and Drug Oxidations, Pergamon, Oxford, 1977, 709pp.
3 D. M. Jerina, R. Lehr, M. Schaefer-Ridder, H. Yagi, J. M. Karle, D. R. Thakker, A. W. Wood, A. Y. H. Lu, D. Ryan, S. West, W. Levin and A. H. Conney, in H. Hiatt, J. D. Watson and I. Winsten (Eds.), Origins of Human Cancer, Cold Spring Harbor, NY, 1977, 639pp.
4 P. Sims, P. L. Grover, A. Swaisland, K. Pal and A. Hewer, Nature, 252 (1974) 326.
5 A. Borgen, H. Darvey, N. Castagnoli, T. T. Crocker, R. E. Rasmussen and I. Y. Wang, J. Med. Chem., 16 (1973) 502.
6 A. W. Wood, W. Levin, A. Y. H. Lu, D. Ryan, S. B. West, R. E. Lehr, M. Schaefer-Ridder, D. M. Jerina and A. H. Conney, Biochem. Biophys. Res. Commun., 72 (1976) 680.
7 I. A. Smith, G. D. Berger, P. G. Seybold and M. Servé, Cancer Res., 38 (1978) 2968.
8 B. Pullman, Int. J. Quant. Chem., 16 (1979) 669.
9 G. L. Kirschner and B. R. Kowalski, in E. J. Ariëns (Ed.), Drug Design, Vol. 8, Academic, New York, 1979, 73pp.
10 A. J. Stuper, W. E. Brugger and P. C. Jurs, Computer-Assisted Studies of Chemical Structure and Biological Function, Wiley, New York, 1973.
11 B. Nordén, U. Edlund and S. Wold, Acta Chem. Scand., Sect. B, 32 (1978) 602.
12 W. J. Dunn III and S. Wold, J. Med. Chem., 21 (1978) 1001.
13 P. C. Jurs, J. T. Chou and M. Yuan, J. Med. Chem., 22 (1979) 476.
14 J. T. Chou and P. C. Jurs, J. Med. Chem., 22 (1979) 792.
15 M. Yuan and P. C. Jurs, Toxicol. Appl. Pharmacol., 52 (1980) 294.
16 P. B. Hulbert, Nature, 256 (1975) 146.
17 J. C. Arcos and M. F. Argus, Adv. Cancer Res., 11 (1968) 305.
18 Y. Takahashi, Y. Miyashita, H. Abe, S. Sasaki, Y. Yotsui and M. Sano, Anal. Chim. Acta, 122 (1980) 241.
19 B. R. Kowalski and C. F. Bender, J. Am. Chem. Soc., 94 (1972) 5632; 95 (1973) 686.
20 Y. Miyashita, Y. Takahashi, Y. Yotsui, H. Abe and S. Sasaki, Proceedings of 7th International CODATA Conference, No. 41, Pergamon, Oxford, 1981, p. 37.
21 W. J. Dunn III and S. Wold, J. Med. Chem., 23 (1980) 595.