Chemical Physics Letters 394 (2004) 293–297 www.elsevier.com/locate/cplett
An efficient linear scaling method for ab initio calculation of electron density of proteins Ai M. Gao, Da W. Zhang, John Z.H. Zhang *, Yingkai Zhang Department of Chemistry, New York University, New York, NY 10003, USA Received 11 June 2004 Available online 28 July 2004
Abstract We present an efficient linear scaling method for direct calculation of electron density of proteins based on the molecular fractionation with conjugate caps (MFCC) approach. In the present approach, the MFCC method is first applied to computing electron densities of individual protein fragments and conjugate caps (concap). The total electron density of a protein molecule is obtained by summing over all individual electron densities of protein fragments subtracted by electron densities of concap molecules. Similar combination also directly yields electrostatic potential and dipole moment of protein from calculations of separate fragments. Detailed numerical tests on two peptides show that MFCC computed densities and dipole moments are in excellent agreement with those obtained from standard full system calculations at various ab initio levels. 2004 Elsevier B.V. All rights reserved.
Accurate determination of electron density and electrostatic potential of molecular systems is of both fundamental importance and important practical utility. For example, the spatial distribution of density or electrostatic potential is often used to understand chemical structure, reaction, binding, catalysis, and solvation [1–8]. The electrostatic potential is of particular interest in rational drug design for optimization of lead drug candidates and pharmacophore searches [9,10]. It is also widely useful in areas such as force-field parametrization [11–15] and quantitative structure–activity relationships (QSAR) [16,17]. For small molecular systems, the standard quantum chemistry methods such as Hartree–Fock (HF) or density functional theory (DFT) can be routinely performed to generate electron densities (or electrostatic potentials). However, for molecules as large as proteins with thousands of atoms, standard quantum chemistry methods could not be directly applied to such large sys-
*
Corresponding author. Fax: +1 212 260 7905. E-mail address:
[email protected] (J.Z.H. Zhang).
0009-2614/$ - see front matter 2004 Elsevier B.V. All rights reserved. doi:10.1016/j.cplett.2004.06.137
tems due to steep scalings of computational cost with the number of electrons of the molecular system. As a rule of thumb, computational cost of a promising quantum mechanical method for large molecular systems should scale linearly with respect to the size of the system being treated. A linear scaling approach is primarily based on the local property of electron interaction because the effect of energy perturbation in one area is generally localized within its vicinity and decays rapidly going away from it. In this approach, the divideand-conquer (DAC) and similar methods are commonly employed in theoretical calculations [18–27]. Although these methods scale linearly with the size of the system, applications are largely limited to calculations using semi-empirical methods for proteins due to huge computational cost. In this communication, we present an efficient linear-scaling method for direct calculation of total electron density of protein based on the molecular fractionation with conjugate caps (MFCC) approach which was developed recently for computing protein–ligand interaction energy [28]. In the MFCC approach, a protein molecule is cut into amino acid fragments along the backbone and a pair of conjugate caps is inserted at
294
A.M. Gao et al. / Chemical Physics Letters 394 (2004) 293–297
locations of cut to cap the protein fragments [29,30]. This method has been successfully applied to computing interaction energy of protein–ligand systems containing thousands or more atoms [31–34]. In this Letter, we develop a new MFCC ansatz for efficient, linear-scaling calculation of total electron density as well as electrostatic potential and dipole moment of proteins and report benchmark numerical results. To simplify discussion without loss of generality, let us first treat a binary system comprised of A and B components as shown in Fig. 1. By applying the MFCC approach [28], this system is cut into A and B fragments that are capped with two conjugate caps as shown in Fig. 1. Now the original system AB is replaced by three new subsystems or fragments, A–C, C*–B and the concap C*–C as shown in Fig. 1. Here, we propose a new MFCC ansatz for total electron density of proteins. The new ansatz states that the total electron density of the original AB system q can be obtained, to a very good approximation, by the following relation (in exactly the same way as for protein–ligand interaction energy) [28]: q ¼ qA þ qB qcc ;
ð1Þ
where qA and qB are, respectively, electron densities of the capped A (A–C) and B (C*–B) fragments and qcc is the electron density of the concap fragment. Eq. (1) shows that by calculating electron density of individual protein fragment separately and independently, we can directly obtain total electron density of the whole system through Eq. (1). It is important to note that Eq. (1) has a number of desirable features. First, if one of the conjugate caps C (or C*) includes the entire B (or A), Eq. (1) gives the exact electron density. Secondly, if classical point charge models are employed to represent electron density, Eq. (1) would also be exact. Thus, the new MFCC ansatz of Eq. (1) has the correct limiting behavior. Eq. (1) (a)
(b)
R1
O
N
C
C
H
H
R1
O
N
C
C
H
H
R2
O
N
C
C
H
H
R'2 N
C
H
H
H
H
can be easily generalized to a polymer such as protein comprised of many components. For a protein molecule with N fragments, Eq. (1) is generalized to q¼
N X
O
C
C
H
H
R'1
O
C
C
H
C
H
H
Nd X
qdc k ;
ð2Þ
k¼1
where qk is the density of the kth protein fragment, qcc k is the density of the kth concap, qdc is the density of the k kth disulfide concap and Nd is the number of disulfide bonds in protein which are cut in the MFCC approach [30]. Thus, the total electron density of a protein can be obtained through simple combination of individual densities of amino-acid fragments and concap species that can be calculated independently. It is useful to point out that the computational cost for concap species is minimal compared to that for larger capped protein fragments. It is easy to verify that the total electron density obtained from Eq. (2) is correctly normalized Z
q dr ¼
N Z X
qk dr
k¼1
¼
N 1 Z X
qcc k dr
Nk
k¼1
N 1 X
N cc k
qdc k dr
k¼1
k¼1
N X
Nd Z X
Nd X
N dc k ¼ N total ;
ð3Þ
k¼1
k¼1
where Nk and N cc k are, respectively, the number of electron of the kth protein fragment (capped) and concap, and N dc k is the number of electron of the kth disulfide concap. Thus, the density in Eq. (2) automatically has the correct normalization. Since the electrostatic potential is obtained from the density by the formula (excluding contribution from nuclei) Z qðr0 Þ /ðrÞ ¼ dr0 ; ð4Þ jr r0 j we can obtain the total electrostatic potential of a protein by the formula similar to Eq. (2) /¼
R2
O
N
C
C
H
H
R'2 N
qcc k
k¼1
N X
/k
k¼1
R'1
N 1 X
k¼1
H
Fig. 1. The MFCC scheme in which: (a) a peptide bond is cut along the backbone, (b) a pair of concap is inserted to cap the fragments and (c) the concap is fused to form molecular concap species.
N 1 X
/cc k
Nd X
/dc k :
ð5Þ
k¼1
k¼1
Thus, the total electrostatic potential is also directly given through simple combination of individual electrostatic potentials of the protein fragments. Similarly, one can obtain the total dipole moment of the protein by l¼
N X k¼1
(c)
qk
lk
N 1 X k¼1
lcc k
Nd X
ldc k ;
ð6Þ
k¼1
where l is the total dipole moment of the protein and the fragment dipole moments are defined similarly as fragment electron densities given above. In the present MFCC approach, we cut the peptide bond along the backbone [34] and use the concap represented by R01 CH2 CONHCH2 R02 , where R01 and R02
A.M. Gao et al. / Chemical Physics Letters 394 (2004) 293–297
denote two choices of side chains as shown in Fig. 1. Two different choices for the side chain groups are tested: R01 ¼ R02 ¼ H (concap 1) and R01 ¼ R1 ; R02 ¼ R2 (concap 2). For proline residues, the side chain groups are slightly different to account for the covalent bond formed between the side chain and the nitrogen atom.
295
We performed numerical calculations for two peptides, a 6-residue peptide (Pro-Ala-Phe-Gly-Tyr-Val) constructed by using Insight II program and a 9-residue peptide from protein data bank (PDB) id 1JJQ. To facilitate comparison of MFCC calculated electron density with that from the standard full system ab initio calculation, we choose a box surrounding the peptide as shown in Fig. 2 and compare calculated electron densities within the box for the 9-aa peptide (PDB: 1JJQ). ˚ 3 and contains 69 120 evenly The box size is 18 · 24 · 20 A spaced grid points. Fig. 3 shows the absolute deviation of the MFCC density with the full system result on discrete grid points calculated at the level of HF/ 6-31G* using concap 1. As we can see that density
Table 1 Root mean square deviation of MFCC electron density from full system result for peptide 1 (Pro-Ala-Phe-Gly-Tyr-Val) and peptide 2 (PDB: 1JJQ) with concap 1 ðR01 ¼ R02 ¼ HÞ and concap 2 ðR01 ¼ R1 ; R02 ¼ R2 Þ System
Basis set
Method
RMSD
Peptide 1
HF/3-21G
Cap1 Cap2
9.24 · 105 1.62 · 10 5
HF/6-31G*
Cap1 Cap2
9.07 · 10 5 1.72 · 105
HF/3-21G
Cap1 Cap2
8.10 · 105 6.10 · 105
HF/6-31G*
Cap1 Cap2
7.55 · 105 5.99 · 105
Peptide 2
Fig. 2. A three-dimensional box in which electron densities obtained from MFCC and full system ab initio calculations are compared.
Fig. 3. Absolute deviation of MFCC calculated electron density from standard full system ab initio calculation for a 9-aa peptide (PDB: 1JJQ) at HF/6-31G* using concap 1.
296
A.M. Gao et al. / Chemical Physics Letters 394 (2004) 293–297
Table 2 Comparison of dipole moment between full system and MFCC ab initio calculations for peptide 1 (Pro-Ala-Phe-Gly-Tyr-Val) and peptide 2 (PDB: 1JJQ) with concap 1 ðR01 ¼ R02 ¼ HÞ and concap 2 ðR01 ¼ R1 ; R02 ¼ R2 Þ System
Basis set
Method
X
Y
Z
HF/3-21G
Full system MFCC cap1 MFCC cap2
0.9738 0.8245 0.9466
0.9512 0.9526 1.0015
0.7207 0.9593 0.7218
1.5403 1.5835 1.5557
HF/6-31G*
Full system MFCC cap1 MFCC cap2
1.0495 0.9495 1.0083
0.9737 1.0615 1.0350
0.4959 0.7265 0.5049
1.5151 1.5988 1.5306
HF/3-21G
Full system MFCC cap1 MFCC cap2
1.7835 1.5988 2.0838
11.5923 10.5578 10.3857
1.0937 1.3984 1.3149
11.7796 10.7693 10.6740
HF/6-31G*
Full system MFCC cap1 MFCC cap2
1.8360 1.7864 2.2230
10.8096 9.8421 9.6211
1.2152 1.4888 1.4620
11.0316 10.1131 9.9822
Peptide 1
Peptide 2
deviation from the full system result is quite small. Detailed analysis of results shows that most density deviation is just a few percent or smaller. Table 1 lists the root mean square deviation (RMSD) of the calculated density for the 9-aa peptide defined as vffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi uN grid uX 2 RMSD ¼ t DqðmÞ =N grid ; ð7Þ m¼1
where Ngrid is the total number of grid points within the box of Fig. 2. These deviations are generally small although it should be kept in mind that the above definition of RMSD is box-dependent. One should note in Table 1 that use of concap 2 (including two neighboring side chains) does give more accurate density than the smaller concap 1 in which the neighboring side chains are replaced by hydrogen atoms. Another good measure of accuracy of density is the dipole moment. Table 2 lists the comparison of dipole moments of these two peptides calculated using both MFCC and standard full system methods at HF levels. It is interesting to note that for the 6-aa peptide (peptide 1), the MFCC method gives dipole moment with only 1% deviation from the corresponding full system result. The deviation is relatively larger for the 9-aa peptide (peptide 2) with about 10% (1 Debye). However, close examination of peptide 2 (1JJQ) reveals that there is a intra-peptide hydrogen bond which is not treated in the current MFCC calculation and it caused local distortion of electron density surrounding the hydrogen bond. The handling of hydrogen-bond in the MFCC treatment is worth mentioning. So far, intra-protein hydrogen-bond does not seem to have much effect on the interaction energy between a peptide and a ligand and has been neglected in the MFCC treatment. In calculating local electron densities and dipole moment, however,
Total
these intra-protein hydrogen bonds do have some effect. This is especially true if a protein has a network of hydrogen bonds such as in a helix and b sheet. An intraprotein hydrogen bond causes some local polarization of electron density surrounding the hydrogen bond. It is therefore desirable to include this effect in the MFCC treatment in order to give more accurate electron density. Work is currently in progress to deal with this issue as well as other intra-protein electrostatic effects. The current new MFCC ansatz provides a direct way to calculate total electron density and related physical quantities of protein or other polymers without any need for diagonalization of large Hamiltonian matrix of the entire system. The method is efficient, linearscaling and highly parallelizable. At this stage, it is important to note another important result from the current MFCC ansatz on electron density: since the ground state electron energy of a molecular system is completely determined by the electron density q based on the fundamental theorem in DFT, determination of protein electron density using the current MFCC method implies that total (intra-) protein energy can also be determined accordingly. This will provide a highly efficient approach for calculating total protein energy. Work along this line is in progress.
References [1] J.D. Madura, M.E. Davis, M.K. Gilson, R.C. Wade, B.A. Luty, J.A. McCammon, in: K.B. Lipkowitz, D.B. Boyd (Eds.), Reviews in Computational Chemistry, vol. 5, VCH, New York, 1994, p. 229. [2] P. Politzer, J.S. Murray, in: J.S. Murray, K. Sen (Eds.), Molecular Electrostatic Potentials: Concepts and Applications, Elsevier, Amsterdam, 1996, p. 649. [3] G. Naray-Szabo, in: P.v.R. Schleyer, N.L. Allinger, T. Clark, J. Gasteiger, P.A. Kollman, H.F. Schaefer III, P.R. Schreiner
A.M. Gao et al. / Chemical Physics Letters 394 (2004) 293–297
[4] [5]
[6] [7] [8] [9] [10] [11] [12] [13] [14] [15] [16]
(Eds.), The Encyclopedia of Computational Chemistry, Wiley, Chichester, p. 905. M.K. Gilson, B. Honig, Proteins 4 (1988) 7. J. Tomasi, B. Mennucci, R. Cammi, in: J.S. Murray, K. Sen (Eds.), Molecular Electrostatic Potentials: Concepts and Applications, Elsevier, Amsterdam, 1996, p. 1. G. Hummer, L.R. Pratt, A.E. Garcia, B.J. Berne, S.W. Rick, J. Phys. Chem. B 101 (1997) 3017. D.J. Tobias, Curr. Opin. Struct. Biol. 11 (2001) 253. M.E. Matrin, M.L. Sanchez, F.J. Olivares del Valle, M.A. Aguilar, J. Chem. Phys. 116 (2002) 1613. M. Orozco, E.I. Canela, R. Franco, Eur. J. Biochem. 188 (1990) 155. J. Rogriguez, F. Manault, F. Sanz, J. Comput. Chem. 14 (1993) 922. B.H. Besler, K.M. Merz Jr., P.A. Kollman, J. Comput. Chem. 11 (1990) 431. W.D. Cornell, P. Cieplak, C.I. Bayly, P.A. Kollman, J. Am. Chem. Soc. 115 (1993) 9620. C.E. Dykstra, Chem. Rev. 93 (1993) 3. W. Wang, O. Donini, C.M. Reges, P.A. Kollman, Annu. Rev. Biophys. Biomol. Struct. 30 (2001) 211. T.A. Halgren, W. Damm, Curr. Opin. Struct. Biol. 11 (2001) 236. G. Naray-Szabo, P.R. Surjan, in: G. Naray-Szabo (Ed.), Theoretical Chemistry of Biological Systems, Elsevier, Amsterdam, 1986, p. 1.
297
[17] A. Warshel, J. Aqvist, Annu. Rev. Biophys. Biophys. Chem. 20 (1991) 267. [18] W. Yang, Phys. Rev. Lett. 66 (1991) 1438. [19] T.-S. Lee, D.M. York, W. Yang, J. Chem. Phys. 105 (1996) 2744. [20] W. Yang, T.-S. Lee, J. Chem. Phys. 103 (1995) 5674. [21] W. Kohn, Phys. Rev. Lett. 76 (1996) 3168. [22] S.L. Dixon, K.M. Merz Jr., J. Chem. Phys. 104 (1996) 6643. [23] V. Gogonea, L.M. Westerhoff, K.M. Merz Jr., J. Chem. Phys. 113 (2000) 5604. [24] J.J.P. Stewart, Int. J. Quantum Chem. 58 (1996) 133. [25] A.D. Daniels, J.M. William, G.E. Scuseria, J. Chem. Phys. 107 (1997) 425. [26] A.D. Daniels, G.E. Scuseria, J. Chem. Phys. 110 (1999) 1321. [27] G.E. Scuseria, J. Phys. Chem. A 103 (1999) 4782. [28] D.W. Zhang, J.Z.H. Zhang, J. Chem. Phys. 119 (2003) 3599. [29] D.W. Zhang, X.H. Chen, J.Z.H. Zhang, J. Comput. Chem. 24 (2003) 1846. [30] X.H. Chen, D.W. Zhang, J.Z.H. Zhang, J. Chem. Phys. 120 (2004) 839. [31] D.W. Zhang, J.Z.H. Zhang, J. Theoret. Comput. Chem. 3 (2004) 43. [32] D.W. Zhang, Y. Xiang, J.Z.H. Zhang, J. Phys. Chem. B 107 (2003) 12039. [33] D.W. Zhang, Y. Xiang, A.M. Gao, J.Z.H. Zhang, J. Chem. Phys. 120 (2004) 1145. [34] Y. Xiang, J.Z.H. Zhang, J. Comput. Chem. 25 (2004) 1431.