Chemical Physics ELSEVIER
Chemical Physics 209 (1996) 19-29
Theoretical determination of the solvation free energy in water and chloroform of the nucleic acid bases Modesto Orozco a, *, Caries Colominas a, Francisco J. Luque * ,b a Departament de Bioqufmica i Biologfa Molecular. Facultat de Qufmica, Universitat de Barcelona, Marti i Fraru!u~s 1, Barcelona 08028, Spain b Departament de Farm~tcia, Unitat Fisicoqufmica, Facultat de Farmhcia, Universitat de Barcelona, Avgda Diagonal s / n , Barcelona 08028, Spain Received 31 January 1996
Abstract
The free energy of solvation in H20 and CHCI3 of the N-methyl derivatives of the five nucleic acid bases has been determined from self-consistent reaction field (SCRF) calculations and free energy perturbation (FEP) simulations. Theoretical estimates of the respective water/chloroform partition coefficients (log P) were determined from the solvation free energies in the two solvents. Comparison of the results with experimental data suggests a reliable estimation of the free energies of solvation for the nucleic acid bases. 1. Introduction
The free energy of solvation, AGsolv, is the work needed to transfer a molecule from the gas phase into solution. It plays a key role in the understanding of the chemical behaviour of molecules in condensed phases. This is clearly stated by the widespread use of thermodynamic cycles (Fig. l) in the study of reactions in solution. The availability of experimental and theoretical techniques able to evaluate the magnitude of AGso~v accurately is essential to rationalize chemical reactivity in condensed phases. Unfortunately, the experimental determination of AGso~v is not straightforward and this has promoted the development of different theoretical methods to estimate AGso~v. Among them, free energy perturbation (FEP) and self-consistent reaction field (SCRF) are
* Corresponding authors.
the most popular, mainly due to the accuracy of the results obtained upon a detailed parametrization of these methods. Examples can be found in the literature [1], but the large number of studies precludes a thorough revision. Evaluation of AGsolv for complex molecules can be performed from high-quality SCRF and FEP techniques. Particularly, a large amount of research effort has been focused on the nucleic acid bases, since the free energy of hydration is presumably expected to have a considerable influence on the structure and reactivity of the bases and, accordingly, of the nucleic acids. Thus, since the seminal FEP calculations performed in the framework of molecular dynamics (MD) by Kollman's group [2], numerous studies have reported estimates of the hydration free energy for the nucleic bases: MD-FEP simulations [3], classical SCRF calculations [4], semiempirical [5] and ab initio [6] quantum mechanical (QM) SCRF calculations, and mixed QM and molecular mechanical
0301-0104/96515.00 Copyright © 1996 Elsevier Science B.V. All rights reserved. PII S 0 3 0 1 - 0 1 0 4 ( 9 6 ) 0 0 1 1 2 - 7
M. Orozco et al./ Chemical Physics 209 (1996) 19-29
20
aG^.,s "~
A(gas phase)
g
B(gas phase)
AGtolvA
AGsolv B
AGA..:,.Bs°luti°n A(solution)
B-
B(solution)
large errors in the partition coefficient. Moreover, the agreement between theoretical and experimental estimates of log P would also give confidence to the theoretical simulation, thus supporting the reliability of the free energies of solvation determined in two different solvents. Eventually, approximate values of the absolute free energies of solvation may be envisaged from inspection of the different theoretical estimates.
AGA.>Bs°luti°n = AGA.>B vac + AOtolvB - AGiolvA = AGA.>Bvae + AAOtolvA'>a
Fig. 1. Thermodynamiccycle used to determine free energy changes in solution. The free energy change in gas phase is typicallycalculatedat the ab initio level.
2. M e t h o d s
2.1. Gas phase calculations (MC/QM-MM) simulations [7]. Nevertheless, the hydration free energies determined from these methods exhibit rather large discrepancies. Indeed, less attention has been paid to the solvation of nucleic acid bases in non-polar solvents, like chloroform [8]. We have recently reported the ab initio [9] and semiempirical [10] optimized versions of the highlevel SCRF method proposed by Miertus et al. (MST [11]) to determine AG~oI~ in aqueous [9,10], carbon tetrachloride [12] and chloroform [13] solutions for neutral solutes. The results are quite satisfactory, with average RMS deviations for the free energy of solvation in the range of 0.7-1.0 kcal/mol for water, and less than 0.5 kcal/mol for the apolar solvents. The accuracy of the results encouraged us to study the solvation of nucleic acid bases in water and in chloroform. In this study results derived from SCRF calculations are compared with values of AG~ol, reported previously and with estimates derived from MC-FEP calculations. Moreover, assuming that the mixing between water and chloroform is negligible, the SCRF and MC-FEP free energies of solvation are used to estimate the water/chloroform partition coefficient (log P; Eq. (1)), which can be compared with experimental data [14]. 1
. [ At'z,_H 2 0
2.2. MST-SCRF calculations The free energy of solvation is computed as the addition of three contributions: cavitation (AGony), van der Waals term (AGvw) and electrostatic (AGcle) free energies (Eq. (2)). AGso,v = AGc. v + AGvw + AGel ~
log P (water/CHCl3) __
Gas phase geometry optimizations at the ab initio 6-31G* level [15] were performed for the N-methyl derivatives of adenine (A), guanine (G), uracil (U), cytosine (C) and thymine (T). The amino group tends to be out of the plane in the optimized geometries, even though a planar geometry is expected in a polar solvent. Since these changes in geometry might be important in the determination of AG~oj~, the geometrical parameters were also optimized imposing molecular planarity. Both planar and non-planar geometries were used in ab initio MST-SCRF calculations, while only the planar geometry was considered in MC-FEP simulations (see below). The AM1 [16] gas phase optimized geometry was used in AM1-MST calculations. According to previous studies, no large errors are expected from the use of gas phase geometries in MC-FEP or MST-SCRF calculations in water and chloroform [lc,d],[12],[13],[17].
A :';'. C H C I 3 ~
2.303RT~,.,sol~--Vsol~
!
(1)
Comparison of theoretical and experimental partition coefficients is very challenging, since moderate errors in the theoretical values of AG~ol~ may induce
(2)
The AGe, v is determined using Pierotti's scaled particle theory adapted to non-spherical solutes [18]. The AGvw is evaluated from Eq. (3), where S i stands for the surface of atom-/ and the parameter ~i defines the strength of the van der Waals interaction [9,10,12,13,19]. The solute/solvent interface, which
M. Orozco et al. / Chemical Physics 209 (1996) 19-29
is a critical factor in the reliability of SCRF methods, was built up using the GEPOL-92 algorithm [18] from the scaling of van der Waals atomic radii by a factor of 1.25 (6-31G*) or 1.20 (AM1) for aqueous solutions [9,20,21], and 1.60 (6-31G * and AM1) for chloroform [13]. This cavity definition was used for evaluation of all the free energy components. However, other versions of the MST method which uses different radii for the different components of the solvation free energy N
AGvw = E ~iSi
(3)
i=1
The AGete was computed using the standard MST method (Eq. (4); see Ref. [22] for additional details). The reaction field operator, VR (Eq. (5)), is determined from the surface charge distribution, tr(s), induced at the solute/solvent interface by the solute charge distribution. The partitioning of the cavity surface into M small elements, Si, allows us to rewrite Eq. (5) in terms of the imaginary "solvation" charges spread over the cavity surface (Eq. (6)). Such charges are determined by solving the Laplace equation which leads to Eq. (7), in which both solute ( p ) and solvent (tr) contributions to the electrostatic potential at the cavity surface, defined by the unitary vector n, are considered. The solute contribution to the potential was rigorously computed at the SCF level as the expectation value of the r-~ operator [23]. In semiempirical calculations the ortho procedure [24] was chosen to evaluate the electrostatic potential due to its advantages with respect to other algorithms [10]. Dielectric constants of 78.4 and 4.717 were used for water and chloroform at 298 K, respectively [25]. AGete = (~bIH ° + ITRI~) - ( q,01,q01~0 > - 1/2((~bIVRI~b) + fpnueV~(s)ds)
(5)
VR = fs I r " ~ _rl dS
" -o'( -rj)Sj = ~
E Irj-rl jzl qj
=
M
:=
q--J [rj-rl
~ - - l (0(V,r+Vp)) -- Sj --~--~~ On n
2.3. MC-FEP calculations Simulations were performed to estimate the relative AGsolv of the methylated bases in chloroform. The mutations performed were G ~ A, G ~ T, T U and T --->C. These simulations imply small perturbations, since the volume of the solute is not largely altered. Accordingly, they are expected to have less statistical uncertainties than direct calculation of solvation free energy obtained through complete annihilation of molecules, as done in previous studies [2],[8b]. In all cases the solute was placed into a pre-equilibrated box of 264 CHC13 molecules [26]. A nonbonded cutoff of 9.0 A was used. In addition, simulations were performed to determine the absolute AG~o~v in water and chloroform. For this purpose 1-methylthymine was mutated to pyridine (T ~ pyr), for which the experimental AGso~v in water and chloroform is available [27]. The MC-FEP estimate of the AQot~ for a given base (X) can then be determined according to Eq. (8). The simulation in chloroform was performed using the procedure noted above, whereas in water the solute was inserted into a cubic box of 502 TIP3P [28] water molecules. Values of AAG~o~ ( T ~ X) in aqueous solution were taken from MD-FEP results reported previously [3], which were determined using identical simulation protocol, force-field, water model and cutoff (see below). This strategy is expected to reduce largely the uncertainties in the FEP estimates of AG~ol~, since the most difficult perturbation (the complete annihilation of the reference molecule) is avoided. AG~olv(X) = AGsoj~(pyr) - A AG~o.v(T ~ pyr)
+ a AQo,v(a" --, x)
(4)
(6)
(7)
21
(8)
All the simulations were performed at the isobaric-isothermic ensemble (NPT; 1 atm, 298 K) using periodic boundary conditions (PBC). The geometry of the planar base was mutated in the simulation, but no sampling of the internal degrees of freedom was performed and no intra-molecular contributions to the free energy change were considered. Parameters determining volume changes and solute rotations and translations were adjusted until an acceptance level around 40% was obtained. The OPLS
M. Orozco et al. / Chemical Physics 209 (1996) 19-29
22
force-field was used for both solute and solvent molecules [8,26,28]. This force-field has been previously used successfully for the calculation of water/chloroform partition coefficients [26]. The relative free energy of solvation, AAG~olv, was determined using FEP theory (Eq. (9) [29]). The number of intervals and the length of the simulation varied with the perturbation. For the simplest mutation T ~ U only 6 windows were used, each with 2- 106 configurations for both equilibration and averaging. The perturbations G ~ A, G ~ T and T ~ C were carried out in 11 windows (3- 106 configurations for both equilibration and averaging). The mutation T ~ pyr was done in 15 windows for chloroform (2- 106 configurations for equilibration; 3 • 106 configurations for averaging), and in 21 windows for water (3 • 106 configurations for equilibration; 4- 106 configurations for averaging). Double-wide sampling was used in all cases. Inspection of the free energy profiles and of the statistical analysis of the simulations confirms the suitability of the perturbation protocol (see below). AGi_~ j = - kaT
ln(exp - ( H i - H i ) / k a T ) i
(9)
where i, j stand for two states of the system, and H is the Hamiltonian. The goodness of the simulations was analyzed from the computation of the hysteresis error and the standard deviation. The hysteresis was measured by the difference between the average (forward and reverse) free energy value and that obtained for either the forward or reverse simulation. The standard deviation (SD) for each window was determined from independent averaging every 5 - 1 0 5 configurations, and was propagated throughout the simulation using Eq. (10). SOn =
~ / j = l~
SO:
(10)
where n is the total number of windows. Ab initio gas phase calculations were performed using Gaussian 92 [30]. MST-SCRF calculations were performed using a locally modified versions of MonsterGauss [31] and MOPAC-93 Rev. 2 programs [32]. MC simulations were performed using BOSS 3.4 computer program [33]. All the simulations were performed on a Cray Y-MP at the Centre de Super-
computaci6 de Catalunya, and on SGI and HP workstations in our laboratory.
3. Results and discussion
Ab initio and semiempirical MST-SCRF calculations give large free energies of solvation (Table 1), which range typically between - 1 0 and - 2 0 kcal/mol in water, and between - 1 0 and - 1 5 kcal/mol in chloroform. The values are nearly insensitive to the geometry. Thus, the largest difference due to the use of a planar or non-planar geometry is found for 9-methylguanine and amounts to 0.5 kcal/mol. This confirms the reduced dependence of AGsotv on geometrical parameters for small rigid molecules, and gives confidence to the use of planar structures in MC-FEP simulations. The AM1-MST results are surprisingly similar to the ab initio values, the difference being less than 1 kcal/mol in most cases. The largest deviation is found for methylated guanine in aqueous solution, where the ab initio AGsolv is 3.0 kcal/mol smaller (in absolute terms) than the AM1 value. The reasons for this discrepancy, which is also found in methylated adenine, are unclear, especially when AM1 Table 1 Free energies o f solvation ( k c a l / m o l ) o f the m e t h y l a t e d derivatives o f nucleic acid bases f r o m M S T (AM1 a n d 6 - 3 1 G " ) calculations. Values for p l a n a r (p) geometries are also given for adenine, g u a n i n e and cytosine. See text for details Base A A(p) A A(p) G G(p) G G(p) T T U U C C(p) C C(p)
Solvent H20 H20 CHCI 3 CHCl 3 H 20 HzO CHCI 3 CHCI 3 H20 CHC13 H20 CHCI 3 H20 H20 CHC! 3 CHCI 3
AGsolv
AGml v (6-31G" )
(AMI)
-8.7 -8.5 - 10.5 - 10.5 - 18.0 - 18.1 - 15.3 - 15.4 - 10.3 - 10.2 - 10.9 - 10.0 - 15.0 - 15.1 - 12.4 - 12.4
- 10.8 - 10.8 - 10.4 - 10.4 - 20.6 -21.1 - 14.0 - 14.2 - 10.1 - 9.5 - 10.4 -9.1 - 16.0 - 16.1 - 11.3 - 11.3
M. Orozco et a l . / Chemical Physics 209 (1996) 19-29 Table 2 Electrostatic component (kcal/mol) of the free energy of solvation from MST (AMI and 6-31G * ) calculations. The contribution (in percentage) to the free energy of solvation is given in parenthesis. Values for planar geometries Base
Solvent
AGete (6-31G * )
AGete (AMI)
A
H20 CHC13 H20 CHCl 3 H20 CHCI 3 H20 CHC13
- 11.9 (140) - 3.3 (31) - 21.9 (121) - 7 . 9 (51) - 13.6 (132) - 4.0 (39) - 14.1 (129) - 4.4 (44) - 18.1 (120) - 6.4 (52)
- 14.5 (134) - 3.0 (29) - 25.4 (120) - 6 . 5 (46) - 13.5 (133) - 3.2 (34) - 13.8 (133) - 3.4 (37) - 19.5 (121) - 5 . 2 (46)
G T U C
H20
CHCI 3
and 6-31G * results are very similar for chloroform and when a close agreement is observed for the other bases. Inspection of the contributions to the AG~ol~ shows the relevance of electrostatic interactions (Table 2). Thus, in aqueous solution the AGe]e is around 130%
4.0
o
,
,
.
G->A . .
of AG~olv, while the steric term (cavitation and van der Waals) does not favour the transfer from gas phase to water. In contrast, in chloroform the electrostatic component is around 40% of AGso~v, and the steric term makes a favourable contribution to the transfer from gas phase into solution. This finding strongly argues against the assumption that electrostatic interactions a minor role in the solvation in apolar solvents. Moreover, both the magnitude of AGete and its relative contribution to AGsoiv are rather insensitive to the method (AM1 or ab initio 6-31G * ) used in MST calculations. The calculation of the MC-FEP estimates of AGsol~ was done in two steps (see Methods). First, relative free energies of solvation were determined from mutations between the bases both in water and in chloroform, and then the absolute free energies of solvation were evaluated from the mutation of the methylated thymine to pyridine, a reference molecule whose AQoiv in water and chloroform is known experimentally [27]. Due to the nature of this mutation, the error in the absolute AGsoiv of the bases is G->T
.
4.0
20
2.0 /-~
,'
AG=-3.2 keal/mol hyst=0.17 kcaYmol std.=0.15 keal/mol
0.0-
-2"00.0
23
02
04
0.6
hyst-~0.35kcaYmol std.----0.22keaYmol
0.0-
0.8
-2.0
1.0
0.0
I
I
!
0.2
0.4
0.6
T->C
1.0
4.0 A ~ . 8 3 kcal/mol hyst--O.02keal/mol std.=O.12 l~al/mol
2.0
0.0-
I
-2"00.0
I
0.8
T->U
4.0
"~
I
0.2
,
I
I
0.4
0.6
,
I
0.8
o
2.0
~
0.0-
--2.0
1.0
0.0
ACt=1.5 kcal/mol hyst.--0.11 kcal/mol I
J
t
0.2
0.4
0.6
,
I
0.8
1.0
L Fig. 2. Free energy perturbation profile for the mutation between the methylated bases in chloroform. Forward (X ~ Y) and reverse (X ,-- Y) directions were considered for each simulation, but only the average value is shown. Error bars indicate the standard deviation for each window. The relative free energy of solvation (AAGsotv) and its standard deviation and hysteresis errors are given in the insert.
24
M. Orozco et aL / Chemical Physics 209 (1996) 19-29
expected to be greater than the errors in AAQolv between the bases. However, these errors are expected to be smaller than those arising from a direct annihilation of nucleic acid bases. Fig. 2 shows the AAGsolv profiles for the mutations G ~ A, G ~ T, T ~ U and T ~ C in chloroform. The profiles are smooth, with no discontinuity that suggests insufficient sampling. The hysteresis between forward and reverse directions is small, typically around 0.2 kcal/mol, with a maximum error of 0.35 kcal/mol. Therefore, the forward and reverse estimates of AAGsotv differ, on average, by less than 0.5 kcal/mol. The standard deviation is also moderate, around 0.2 kcal/mol, which gives a 95% confidence interval of +0.4 kcal/mol for the average AAGsolv. Accordingly, the statistical analysis suggests that the simulations are quite accurate, and that the MC-FEP estimates are unlikely to contain severe errors originated from the simulation protocol. It is worth noting that this was also anticipated by the excellent results found by Richards and co-workers using MD-FEP techniques with a similar protocol [3]. Table 3 shows the MC-FEP values of A AG~olv relative to 9-methylguanine in water and chloroform. The similarity between the MST and MC-FEP estimates of AAG~olv in chloroform is notable. Both semiempirical and ab initio MST results suggest the following ordering for the free energy of transfer from gas phase into chloroform: G > C > A > T > U. This ordering is also reproduced by MC-FEP calculations, except for the methyl derivatives of adenine and thymine, which exchange their respective ordering. The discrepancy is small, especially considering that the OPLS-FEP simulation gives a difference of only 0.3 kcal/mol between adenine and thymine, which is clearly below the accuracy of the simulation. From the deviation between MST and FEP values, as estimated from the root mean square error (RMS), the AGsol~ estimated in chloroform should differ by around 1.1-1.6 kcal/mol. Keeping in mind the errors inherent to MST and FEP calculations, the agreement is encouraging. It is also noticeable the agreement between our FEP estimates of AAG~oI~ in chloroform and those obtained by Pranata and Jorgensen [8b] from direct annihilation. The error (RMS around 0.7 kcal/mol) is smaller than the expected hysteresis errors [8b], which supports the
Table 3 Differences of solvation free energies (kcal/mol) in water and chloroform for the methylated derivatives of nucleic acid bases relative to 9-methylguanine Base
Method
AAGsolv (H20)
AAGsolv(CHCi3)
A
OPLS-FEP a AMI-MST 6-31G *-MST AMBER-FEP b AM1-SM2 c 6-31G *-SCRF d FDPB e OPLS-FEP AM1-MST 6-31G *-MST AMBER-FEP AM I-SM2 6-31G *-SCRF FDPB OPLS-FEP AM I-MST 6-31G *-MST AMI-SM2 6-31G *-SCRF MC-QM/MM OPLS-FEP AM 1-MST 6-31G *-MST AMBER-FEP AMI-SM2 6-31G *-SCRF FDPB
10.1 10.3 9.6 7.0 3.4 9.6 8.9 8.6 11.0 7.8 12.1 11.0 7.2 9.3 8.9 10.7 7.2 9.5 6.1 3.6 1.6 5.0 3.0 6.9 5.6 3.1 2.9
3.2 (3.7) 3.8 4.9
T
U
C
2.9 4.7 5.2 4.4 (4.7) 5.1 5.4 -
2.1 (3.1) 2.9 3.0 -
a Results in water are from Ref. [3]. Values in parentheses are taken from Ref. [8b]. b Ref. [2]. c Ref. [5b]; relaxed geometries are considered. d Ref. [6]. e Ref. [4].
goodness of the simulation protocol followed by the authors to perform a very drastic perturbation. In aqueous solution the discrepancies between the different computational methods are greater. Nevertheless, several conclusions can be reached. All the methods suggest that the methylated guanine has the most favourable transfer from gas phase into solution. Indeed, with exception of the AM1-SM2, 1methylcytosine follows the methylated guanine in such a preference. Values for the methyl derivatives of adenine, thymine and uracil lie in a close range, around 1-2 kcal/mol for most methods, which makes it difficult, if not impossible, to establish a
25
M. Orozco et al./ Chemical Physics 209 (1996) 19-29 AGX~ot,,(mo)
L X v~
&GXso]v(CHCl3) Ii
AAGX->Yvac
yvac
AGYtolv(CItCI3)
xCllCI3
'
G CHCI3->H20 .
X m°
AAGX'>Yroo
AAGX->YcHct3
I~ yCHCI3
AAGYcHcI3.>H20 ID
I
ymO
t AGY~I~H20) IogP x - logPy = -1/2.303RT (AAGX'>YcHcB - AAGX->Ymo)
Fig. 3. Thermodynamic cycle used to determine the Alog P(water/chloroform) for the N-methylated bases relative to 9methylguanine. clear, definite ordering. However, it seems that either A or T has the least favoured transfer from gas phase to water. Despite these discrepancies, a detailed comparison of data in Table 3 yields qualitative information, since a general agreement is found between the different estimates ~. Thus, results (with the exception of the AM1-SM2 value) reasonably agree for 9-methyladenine, whose AG~olv is larger by 9 - 1 0 k c a l / m o l than that of the methylated guanine, the A M B E R - F E P value being slightly lower (7 kcal/mol). For 1-methylthymine the AAQo]v is around 8-11 kcal/mol. In this case the AMBER-FEP estimate is slightly higher (12.1 kcal/mol). Values of the AAGsotv for U and C are less clearly defined, but a relatively broad range of 6 - 1 0 and 3 - 5 kcal/mol, respectively, can be obtained. For 1-methylcytosine the AMBER-FEP value (6.9 kcal/mol) seems to be too large, while the OPLS-FEP estimate (1.4 kcal/mol) is probably too small. In summary, the relative free energies of solvation for the methylated bases relative to 9-methylguanine seems to lie in the following range: A ( 8 - 1 0 kcal/mol), T (8-11 kcal/mol), U ( 6 - 1 0 kcal/mol)
1 MC-QM/MM results by J. Gao are not included in Table 3 since they were obtained for the non-methylated bases. For a qualitative comparisonthe reader is addressed to Ref. [7].
and C ( 3 - 5 kcal/mol). The AM1-SM2 method differs clearly in the AAGhyd for the methylated adenine, but the remaining results are not very different to those obtained from other methods. Finally, AMBER-FEP results, which are not clearly outside the range of expected values, are slightly different in most cases from the results provided by the rest of methods, which can be attributed to the simulation protocol used by Kollman and co-workers in their seminal work [2]. The relative free energies of solvation in water and chloroform can be used to estimate differences in partition coefficients, Alog P(water/chloroform), following the thermodynamic cycle shown in Fig. 3 and assuming that the mixing of water and chloroform is negligible. Comparison of differences in partition coefficients is interesting, since errors are presumably largely accumulated in the theoretical calculations. Results in Table 4 show that, on average, AM1-MST values deviate 1.7 (log P units) from experimental data, while OPLS-FEP deviates 1.5 and the 6-31G*-MST values only 0.9, which considering the large source of errors is a reasonable agreement. The present results allow us to determine the hydrophilicity ordering a priori, even though the small differences to be reproduced (around 0.4 k c a l / m o l according to experimental data) make it difficult to obtain reliable results. The AM1-MST method gives the ordering of hydrophilicity (G > C > U > A > T) correctly, with exception of the relative hydrophilicity of 1-methylthymine, which differs from the value of 9-methyladenine by only 0.2 (log P units). The 6-31G*-MST fails in the relative ordering of 1-methylthymine, and the OPLS-FEP simulations fail simultaneously in the ordering of methylated cytosine and adenine.
Table 4 Differences of partition coefficients (water/chloroform) of the methylated derivativesof nucleic acid bases relative to 9-methylguanine Base FEP AMI/MST 6-31G*/MST Exp a A T U C
-5.1 -4.2 -3.3 +0.4
aRe~[l~.
-4.8 -4.6 -4.1 -1.5
-3.4 -1.9 -1.3 0.0
-2.7 -3.1 -2.3 -0.5
M. Orozco et al./ Chemical Physics 209 (1996) 19-29
26
An additional point which merits attention is the ability of these methods to reproduce absolute free energies of solvation. This is especially challenging,
(a)
9-methyla deninecu°3
AG t
/
Chloroformslmukttom
9-methylguaninecHc~
AG~ 1-mcthylthymine cncB
I -methyluracil c n ° 3
~G6 1.
Pyridine ''~
Pyridin¢ cncB .
1-me~ylcytosine c u ° 3
AG~aIv(A) = AG6 - 6 0 5 + AG2 + AG{ AGselv(G) = AG6 - AG5 + AG2 AGm{v(U) = AG 6 - AG 5 + AG 3
AG~Iv(C) = AG6 - AG5 + AG4 AG,o{~CI')= AG6 - AG5
(b)
9-methyladenine m °
AGI water dmulalions 1-methylurncil n2°
j
AG2
AG5 l - m c t h y l t h y m ~ H~O
AG 3
1-methylcytosinem°
&G4
AG6 Pyridine m °
,u
Pyridine T M
since the AGsolvis expected to have the largest error. The calculation of absolute AGso,v is straightforward within the MST formalism (Table 1), but difficult within the FEP framework owing to problems in the sampling of configurational space during the annihilation process of the bases. In order to minimize the error a mixed strategy, in which 1-methylthymine was mutated to pyridine, was used (see above). Since the AGsolv of pyridine in water and CHC13 is known from experimental data, the absolute AG,o,v of the bases can be easily determined from the scheme shown in Fig. 4. The free energy profiles for the mutation T ~ pyr in water and chloroform are shown in Fig. 5. The simulation in water requires much more computational effort (a total of 1.47 • 108 configurations for a system of 502 molecules) than in chloroform (7.5107 configurations for 264 molecules), and still exhibits the largest uncertainties. However, the profiles are smooth, with small hysteresis (0.24 for chloroform versus 0.51 kcal/mol for water) and standard deviation (0.19 and 0.38 kcal/mol for chloroform and water, respectively). Therefore, no relevant errors are expected in the AAGsol~ due to the simulation protocol. Fig. 5 shows an increment of AGsotv of 8.4 (water) and 7.9 (chloroform) kcal/mol, which indicates a better solvation of 1-methylthymine in both solvents with respect to pyridine. The absolute FEP-free energy of solvation in chloroform is given in Table 5, where selected MST values (Table 1) are included for comparison. Once again, the agreement with the FEP results obtained by direct annihilation by Pranata and Jorgensen is excellent, confirming the goodness of those simulations. Results in Table 5 state the qualitative agreement between MST and FEP results. The MC-FEP values are generally larger (in absolute terms) by 2 - 4 kcal/mol than the MST estimates. This small, but systematic deviation can be a priori due to
9-methylguanine m °
AG~I,,(A) = &G6 - AGs
- &G2 - AGI
&Gsoj,(G) = AG6 - AGs
+ &G3 - AG4
AG~o{ffU)= &G6- &G~ - AG2 AG~,tv(C)= &G6- AG5 + AG3 AG~o~I')= AGe- •Gs
Fig. 4. T h e r m o d y n a m i c pathway used to compute the free energy of solvation (AGso~v) in chloroform (a) and water (b) from FEP calculations. AGL2.3.4 in chloroform were determined in this
study, but in waterthey were taken from Ref. [3] (computedfrom OPLS-MD-FEP simulations). AG5 was calculated in this study. AG6 was taken from experimentaldata in Ref. [27].
M. Orozco et al. / Chemical Physics 209 (1996) 19-29
12.0 ~
•
,
•
4.0 ~ 2 00
.
--~
I
,
,
r 0.2
,
i 0.3
i 0.4
,
•
•
. . . .
Table 6 Polarization contribution ( k c a l / m o l ) to the free energy of solvation (AGpo 1) ill water and chloroform of the methylated derivatives of the nucleic acid bases. For technical details see text and Ref. [34]. MST was used in all cases
,
hys=0.51 kcaYmoi
f
,
10,0 t 8.0 ~
•
~ 0
r,-~7-, ~ 0.0 0.1
12.0
T->Pyr(water) . , ,
,
s t d ~ 0 . 3 8 kcal/mol
.
.
,
i 0,5 k
i 0.6
,
~
~ 0.7
,
i 0.8
,
t 0.9
, 1.0
Base
Method
AGpo I ( H 2 0 )
AGpo ~ (CHCI3)
G
6-31G * AM 1 6-31G* AM1 6-31G * AM1 6-31G * AM 1 6-31G* AMI
-3.2 - 5.3 -1.0 - 1.8 - 1.6 -2.1 - 1.8 - 2.3 -2.8 -4.1
-0.7 - 0.7 -0.2 -0.2 -0.3 -0.2 -0.3 - 0.3 -0.5 -0.5
A
T->Pyr(chJ.oroform) . . ,
.
.
.
.
.
T
AG=7.9 kcaYmol hys=0,24 keel/tool
~
::x: :
:
t
:
~
U ~
4.0 2.0
C
0.0 0.0
0.1
27
0.2
0.3
0.4
0.5
0.6
0,7
0.8
0.9
1.0
Fig. 5. Free energy perturbation profile for the mutation l-methylthymine ~ pyridine in chloroform and water. Forward (T ~ pyr) and reverse (T *-- pyr) directions were considered in the simulation, but only the average value is shown. Error bars indicate the standard deviation for each window. The relative free energy of solvation (AAGsolv) and its standard deviation and hysteresis errors are given in the insert.
different reasons, mainly intrinsic shortcomings in the MST algorithm, and inaccuracy of the nonpolarizable force-field parameters for the bases. It is worth noting that the errors intrinsic to the FEPsimulation protocol are not expected to be greater than 1 kcal/mol according to the hysteresis and standard deviations. It is difficult at this point to determine the errors due to the neglect of explicit treatment of solvent molecules in SCRF calculations, but it is possible to estimate a priori the uncertainties due to the use of non-polarisable force-fields in FEP calculations by inspection of Table 6, which contains the polarization contribution to the free energy of solvation in water and chloroform [34]. The polarization contribuTable 5 Free energies of solvation ( k c a l / m o l ) in chloroform for the methylated derivatives of nucleic acid bases. MST values for planar geometries. Values in parentheses are from Ref. [8b] Base
FEP
AM 1/MST
6-3 IG * / M S T
G A T U C
-17.4(-18.3) -- 14.2 ( -- 14.6) - 14.5 - 13.0 ( - 13.6) - 15.3 ( - 15.2)
-14.2 -- 10.4 -9.5 -9.1 - 11.3
-15.4 -- 10.5 - 10.2 -10.0 - 12.4
tion in water is large: typically from 2 to 5 kcal/mol, which represents around 25% the total free energy of hydration. Such a contribution is less than 1 kcal/mol in chloroform, which represents around 2 - 4 % the total free energy of solvation. This means that FEP calculations using a common set of charges for water and chloroform have a theoretical range of uncertainty of 2 - 4 kcal/mol due to the neglect of solvent polarization. If charges are parametrized to capture water polarization in an average way, the polarization effects in chloroform should be overestimated and the resulting log P should be too small. On the contrary, if water polarization is not fully captured in the parametrization, the free energy of hydration should be underestimated and the log P should be too large. The use of the FEP-OPLS AAGsolv for the mutation T ~ pyr in water (Fig. 5), in conjunction with the experimental AGsolv of pyridine and the values of AAGso]v reported for the bases in water [3], allows us to determine the AGsolv in aqueous s o l u t i o n (AGhyd). Results are given in Table 7, which includes other estimates of AGsolv found in the literature and present MST values for comparison. Large free energies of solvation are found. In general the most negative estimates are provided by the AM1-SM2 method. With the exception of this method, t h e AGhyd for the reference base, 9-methylguanine, is estimated to be around 17-21 kcal/mol. As a reference, the MST methods give a value of -21.1 (AM1) and - 18.1 kcal/mol (6-31G * ), and the FEP-OPLS gives - 21.7 kcal/mol.
28
M. Orozco et al./ Chemical Physics 209 (1996) 19-29
Table 7 Free e n e r g y o f solvation ( k c a l / m o l ) in a q u e o u s solution o f the m e t h y l a t e d derivatives o f nucleic acid b a s e s Method
G
A
T
U
C
OPLS/FEPa AMBER/FEP b AMI/SM2 c 6-31G*/SCRF d FDPB e
-21.7 -- 19.6 -24.3 -16.1 - 19.7
- 11.6 -- 12.6 -20.9 -6.5 - 10.8
-13.1 --7.5 -13.3 -8.9 - 10.4
-12.8 -14.8 -10.0 -
-20.1 -- 12.7 -18.7 -13.0 - 16.8
AMI-MST 6-31G*-MST
-21.1 -18.1
-10.8 -8.5
-10.1 -10.3
-10.4 -10.9
-16.1 -15.1
a Ref. [3]. b Ref. [2]. ¢ Ref. [5b]; relaxed geometries are considered. a Ref. [6]. e Ref. [4].
4. Conclusions
MST and FEP-OPLS results in Tables 5 and 7 can be used according to Eq. (1) to obtain the partition coefficients, log P(water/chloroform), which can be compared with experimental data [14] (Table 8). FEP-OPLS calculations tend to underestimate the log P, which is probably due to an overestimation of the AGsolv in the apolar phase (see Table 5). This results can be understood considering that OPLS atomic charges for the bases were obtained [8a] by fitting ab initio interaction energies between the bases and water molecules, which implies that the effect of water polarization was indirectly captured in the parametrization process [8a]. As noted above, these water-polarized charges should overestimate polarization effects in chloroform (see Table 6). We do not believe that technical errors like improper samplings, use of rigid models for the bases or neglect of long range effects can explain the results. Despite the numerical uncertainties, it is encouraging to note that the deviation (RMS of 1.5 kcal/mol) from experimental data is not far from the Table 8 W a t e r / c h l o r o f o r m partition coefficient for the m e t h y l a t e d derivatives o f the nucleic acid bases Method
G
A
T
U
C
OPLS-FEP A M 1- M S T 6-31G *-MST Experimental a
3.2 5.1 2.0 3.5
- 1.9 0.3 - 1.5 0.8
- 1.0 0.4 0.1 0.4
- 0.1 1.0 0.7 1.2
3.5 3.5 2.0 3.0
a Ref. [14].
statistical uncertainties in the results. The 6-31G*MST method seems to underestimate the partition coefficient slightly, but uniformly, even though the agreement with experimental data (RMS of 1.3 kcal/mol) is better than that obtained for FEP-OPLS. Surprisingly, the best fitting is found for the AM1MST, which reproduces accurately the range of partition coefficients, as noted in the RMS of only 0.8 kcal/mol. This finding, which might be due to fortuitous cancellation of errors, provides nevertheless confidence in the method.
The "state of the art" calculations performed here and the critical review of available data reported in the literature have provided detailed information on the solvation of nucleic acid bases in both water and chloroform. Furthermore, to our knowledge this is the first time that the chloroform ~ water transfer free energies for the nucleic bases have been calculated a priori, thanks to the use of thermodynamic cycles. FEP simulations and MST-SCRF calculations at the semiempirical and ab initio levels have shown their ability to reproduce reasonably the relative hydrophilicity of the bases. All the methods yield quite correct predictions of absolute log P values, but the MST model (specially at the AM1 level) performs better than the FEP simulation. The reasons for the errors in the FEP-OPLS calculations are suggested to be related to problems derived from the neglect of specific polarization interactions whose magnitude can be very different in water and chloroform. The present results show the suitability of these techniques to study the affinities of complex molecules for different environments, but also highlights the weaknesses and limitations of these computational tools. Further work may help to eliminate these shortcomings and to develop more sophisticated theoretical models able to describe the influence of solvation on the chemical properties and reactivity in condensed phases.
Acknowledgements We are indebted to Professor J. Tomasi for providing us the original MST routines, which were
M. Orozco et al. / Chemical Physics 209 (1996) 19-29
modified to perform this work, and to Dr. W.L. Jorgensen for a copy of BOSS3.4 computer program. This work was supported by the Centre de Supercomputaci6 de Catalunya (CESCA, Mol. Recog. Project) and by the Spanish DGICYT (PB93-0779 and PB94-0940). We thank R. Rycroft for helpful assistance in the preparation of the manuscript. References [1] (a) J. Tomasi and M. Persico, Chem. Rev. 94 (1994) 2027; (b) P.A. Kollman, Chem. Rev. 93 (1993) 2395; (c) M. Orozco, W.L. Jorgensen and F.J. Lnque, J. Comput. Chem. 14 (1993) 1498; (d) H. Carlson, T.B. Nguyen, M. Orozco and W.L. Jorgesen, J. Comput. Chem. 14 (1993) 1240; (e) C.J. Cramer and D.G. Truhlar, J. Am. Chem. Soc. 6 (1992) 629; (f) J.W. Essex, J.A. Reynolds and W.G. Richards, J. Am. Chem. SOC. 114 (1992) 3634; (g) J. Gao, J. Phys. Chem. 96 (1992) 537; (h) W.L. Jorgensen, Chemtracts-Org. Chem. 4 (1991) 91; (i) I. Tufi6n, M.F. Ruiz-L6pez, D. Rinaldi and J. Bertnin, J. Comput. Chem. 17 (1996) 148; (j) I. Tufi6n, M.T.C. Martins-Costa, C. Millot, M.F. Ruiz-L6pez and J.L. Rivail, J. Comput. Chem. 17 (1996) 19; (k) C.J. Cramer and D.G. Truhlar, Rev. Comput. Chem., 6 (1991) 1. [2] P.A. Bash, U.C. Singh, R. Langridge and P.A. Kollman, Science 236 (1987) 564. [3] A.H. Elcock and W.G. Richards, J. Am. Chem. Soc. 115 (1993) 7930. [4] V. Mohan, M.E. Davis, J.A. McCammon and B.M. Pettitt, J. Phys. Chem. 96 (1992) 6428. [5] (a) M. Orozco and F.J. Luque, Biopolymers 35 (1993) 1851; (b) C.J. Cramer and D.G. Truhlar, Chem. Phys. Lett. 198 (1992) 74; (c) C.J. Cramer and D.G. Truhlar, Chem. Phys. Leu. 202 (1992) 567. [6] P.E. Young and I.H. Hillier, Chem. Phys. Left. 215 (1993) 405. [7] J. Gao, Biophys. Chem. 51 (1994) 253. [8] (a) J. Pranata, S.G. Wierschke and W.L. Jorgensen, J. Am. Chem. Soc. 113 (1991) 2810; (b) J. Pranata and W.L. Jorgensen, Tetrahedron 47 (1991) 2491. [9] (a) M. Bachs, F.J. Luque and M. Orozco, J. Comput. Chem. 15 (1994) 446; (b) M. Orozco and F.J. Luque, Chem. Phys. 182 (1994) 237. [10] (a) F.J. Luque, M. Bachs and M. Orozco, J. Comput. Chem. 15 (1994) 847; (b) M. Orozco, M. Bachs and F.J. Luque, J. Comput. Chem. 16 (1995) 563. [11] S. Miertus, E. Scrocco and J. Tomasi, Chem. Phys. 55 (1981) 117. [12] F.J. Luque, C. Alem~in, M. Bachs and M. Orozco, J. Comput. Chem. 17 (1996) 806. [13] F.J. Luque, Y. Zhang, C. Alem~n, M. Bachs, J. Gao and M. Orozco, J, Phys. Chem. 100 (1996) 4269. [14] P.M. Cullis and R. Wolfenden, Biochemistry 20 (1981) 3024.
29
[15] P.C. Hariharan and J.A. Pople, Theoret. Chim. Acta 28 (1978) 213. [16] M.J.S. Dewar, E.G. Zoebisch, E.F. Healy and J.J.P. Stewart, J. Am. Chem. Soc. 107 (1985) 3902. [17] M. Orozco and F.J. Luque, J. Am. Chem. Soc. 117 (1995) 1378. [18] P.A. Pierotti, Chem. Rev. 76 (1976) 717. [19] (a) B. yon Freyberg and W. Braun, J. Comput. Chem. 14 (1993) 510; (b) L. Wesson and D. Eisenberg, Protein Sci. 1 (1992) 227; (c) T. Ooi, M. Oobatake, G. N6methy and H.A. Scheraga, Proc. Natl. Acad. Sci. USA 84 (1987) 3086; (d) D. Eisenberg and A.D. McLachlan, Nature 319 (1986) 199. [20] (a) J.L. Pascual-Ahuir, E. Silla, J. Tomasi and R. Bonaccorsi, J. Comput. Chem. 8 (1987) 778; (b) J.L. Pascual-Ahuir and E. SiUa, J. Comput. Chem. 9 (1990) 1047; (c) E. Silla, I. Tufi6n and J.L. Pascual-Ahuir, J. Comput. Chem. 12 (1991) 1077. [21] F.J. Luque, M. Negre and M. Orozco, J. Phys. Chem. 97 (1993) 4386. [22] (a) J. Tomasi, G. Alagona, R. Bonaccorsi, G. Ghio and R. Cammi, in: Theoretical models of chemical bonding, ed. Z.B. Maksic, Vol. 4 (Springer, Berlin, 1991) p. 546; (b) J. Tomasi, R. Bonaccorsi, R. Cammi and F.J. Olivares del Valle, J. Mol. Struct. (THEOCHEM) 234 (1991) 401; (c) R. Bonaccorsi, F. Floris and J. Tomasi, J. Mol. Liq. 47 (1990) 25. [23] (a) E. Scrocco and J. Tomasi, J. Topics Curr. Chem. 42 (1973) 95; (b) F.J. Luque, F. Illas and M. Orozco, J. Comput. Chem. 11 (1990) 416. [24] (a) C. Alhambra, F.J. Luque and M. Orozco, J, Comput. Chem. 15 (1994) 12; (b) G.G. Ferenczy, C.A. Reynolds and W.G. Richards, J. Comput. Chem. 11 (1990) 159. [25] D.R. Lide, ed., Handbook of chemistry and physics, 74th edn. (CRC Press, Boca Raton, FL, 1993-94). [26] W.L. Jorgensen, J.M. Briggs and M.L. Contreras, J. Phys. Chem. 94 (1990) 1683. [27] (a) S. Cabani, P. Giani, V. MoUica and L. Lepori, J. Solut. Chem. 10 (1981) 563; (b) M.G. Koehler, S. Grigoras and W.J. Dunn, Quant. Struc.-Act. Relat. 7 (1988) 150. [28] W.L. Jorgensen, J. Chandrasekhar, J.D. Madura, R. Impey and M. Klein, J. Chem. Phys. 79 (1993) 296. [29] R.W. Zwanzig, J. Chem. Phys. 22 (1954) 1420. [30] M.J. Frisch, G.W. Trucks, M. Head-Gordon, P.M.W. Gill, M.W. Wong, J.B. Foresman, B.G. Johnson, H.B. Schlegel, M.A. Robb, E.S. Replogle, R. Gomperts, J.L. Andres, K. Raghavachari, J.S. Binkley, C. Gonzalez, R.L. Martin, D.J. Fox, D.J. Defrees, J. Baker, J.J.P. Stewart and J.A. Pople, Gaussian 92 (Gaussian Inc., Pittsburgh, PA, 1992). [31] M. Peterson and R. Poirier, MonsterGanss, (Department of Biochemistry, University of Toronto, Canada, version modified by R. Cammi, R. Bonaccorsi and J. Tomasi, 1987). [32] J.J.P. Stewart, MOPAC93 Rev. 2 (Fujitsu Ltd., 1993, version modified by F.J. Luque and M. Orozco, M. 1994). [33] W.L. Jorgensen, BOSS, Version 3.4 (Yale University, New Haven, CT, 1990). [34] M. Orozco, F.J. Luque, D. Habibollahzadeh and J. Gao, J. Chem. Phys. 102 (1995) 6145.