Chemical Physics Letters 474 (2009) 195–198
Contents lists available at ScienceDirect
Chemical Physics Letters journal homepage: www.elsevier.com/locate/cplett
Fragment molecular orbital calculation using the RI-MP2 method Takeshi Ishikawa, Kazuo Kuwata * Division of Prion Research, Center for Emerging Infectious Disease, Gifu University, 1-1 Yanagido, Gifu 501-1194, Japan CREST Project, Japan Science and Technology Agency, 4-1-8 Honcho, Kawaguchi, Saitama 332-0012, Japan
a r t i c l e
i n f o
Article history: Received 13 March 2009 In final form 15 April 2009 Available online 19 April 2009
a b s t r a c t The resolution of the identity second-order Møller–Plesset perturbation theory (RI-MP2) was introduced into the fragment molecular orbital (FMO) method, where the program of the RI-MP2 method was newly developed. After some test calculations with a small peptide, the performance of the RI-MP2 method with the FMO scheme was demonstrated for two biomolecular systems: the human immunodeficiency virus type 1 protease with the lopinavir molecule, and the prion protein with the GN8 molecule. These calculations showed the great advantage of FMO calculations using the RI-MP2 method over ordinary FMO calculations. Ó 2009 Elsevier B.V. All rights reserved.
1. Introduction Evaluation of the four-center electron repulsion integrals is a time-consuming procedure in general quantum chemical calculations. The resolution of the identity (RI) is one of the most efficient approaches to reduce the computational cost of this process, where the four-center integrals are approximately calculated from threeand two-center electron repulsion integrals using auxiliary basis sets [1–4]. This approach has been utilized for various methods in quantum chemistry, including Hartree–Fock (HF) [5,6], secondorder Møller–Plesset perturbation theory (MP2) [7–12], coupled cluster theory [13], and multi-configurational self-consistent field theory [14,15]. The first usage of the RI approximation with the MP2 method (RI-MP2) was reported by Feyereisen et al. [7], in which RI-MP2 calculations were about 10 times faster than conventional MP2 calculations and the error was within 0.1 kcal/mol in the water dimer. Parallel computations of the RI-MP2 method were performed by Bernholdt and Harrison, and a good parallel efficiency was shown [8]. The quality of the auxiliary basis sets is very important for the accuracy of RI-MP2 calculations. Several auxiliary basis sets have been provided by Weigend et al. [10,16] and Hättig [17], which were optimized for the atomic basis sets of SVP, TZVP, and TZVPP [18,19], and a series of correlation consistent basis sets [20–22]. Local MP2 (LMP2) method is another approach for saving the computational cost of the MP2 calculations [23–25], in which formally linear scaling can be achieved by introducing the localized molecular orbitals and using two individual approximations, i.e., restriction of the virtual space and selection of the correlated orbital pairs. Recently, the more promising approach have been devel-
oped by Werner et al. [26], where the RI approximation and locality of the localized molecular orbital could be utilized. The fragment molecular orbital (FMO) method [27–30] is an efficient approach for quantum mechanical treatments of large molecules, including proteins and nucleic acids, in which computational effort can be reduced by dividing a target molecule into small fragments. The FMO scheme has been extended to the MP2 method [31–33], thereby enabling us to obtain computational results including dispersion interaction. Many studies for biomolecular systems using the MP2 method with the FMO scheme have so far been conducted [34–37]. In recent days, demonstrative calculations of the FMO scheme at the MP2 level was reported by Mochizuki et al. [38], where the calculation of a large biomolecular system constructed with 921 amino acid residues was completed in 53.4 min with 4096 vector processors. However, using standard modern computer systems, FMO calculations of biomolecular systems at the MP2 level entail a high computational cost. Although the LMP2 method have been incorporated with the FMO scheme by Ishikawa et al. [39,40], its aim is not to save the cost of calculations but to analyze the dispersion interaction. In this Letter, we introduce the RI-MP2 method into the FMO scheme to reduce the computational effort that results from MP2 calculations. We provide brief descriptions about the RI-MP2 method and the FMO scheme in the subsequent section. Algorithm and implementation of our program are presented in Section 3. Finally, we demonstrate the performance of the FMO calculation using the RI-MP2 method with two illustrative examples. 2. Method 2.1. RI-MP2 method
* Corresponding author. Address: Division of Prion Research, Center for Emerging Infectious Disease, Gifu University, 1-1 Yanagido, Gifu 501-1194, Japan. E-mail address:
[email protected] (K. Kuwata). 0009-2614/$ - see front matter Ó 2009 Elsevier B.V. All rights reserved. doi:10.1016/j.cplett.2009.04.045
In RI approximation, the four-center electron repulsion integrals of the atomic basis functions are calculated with the three- and
196
T. Ishikawa, K. Kuwata / Chemical Physics Letters 474 (2009) 195–198
two-center electron repulsion integrals as the following equation [5]:
X ðlmjPÞV 1 PQ ðQ jkrÞ;
ðmljkrÞ
ð1Þ
PQ
where P and Q are indices of the auxiliary basis sets, and V 1 PQ is the element of the inverse matrix of V, which is defined as
V PQ ¼ ðPjQ Þ:
ð2Þ
If the auxiliary basis sets span the entire space made by the pair products of atomic basis functions, the right-hand side is equivalent to the left-hand side in Eq. (1). As is well known, the MP2 correlation energy of a closed shell molecule is calculated as
ðiajjbÞf2ðiajjbÞ ðibjjaÞg ; i þ j a b
Ecorr are calculated with the RI-MP2 method instead of the convenIJ tional MP2 method. 3. Algorithm and implementation In this section, we discuss the implementation of the RI-MP2 method with the FMO scheme. In order to represent the cost of calculations, we introduce four notations, N; O; V, and X, which represent the numbers of atomic basis functions, occupied orbitals, virtual orbitals, and auxiliary basis functions, respectively. The loop structure of our program is given in Fig. 1. The transformation of Eq. (5) is performed under the outer loop over the auxiliary basis functions ðQ Þ. This process is divided into three transformations:
ð3Þ
ðimjQ Þ ¼
where i; j and a; b are indices of occupied and virtual orbitals, respectively. In the case of the RI-MP2 method, ðiajjbÞ are approximately calculated with the following equations [9,10]:
ðiajQ Þ ¼
Ecorr ¼
ðiajjbÞ
X
BPia
¼
l
BPia BPjb ;
m
ð4Þ
C li C mi ðlm
1=2 jQ ÞV QP ;
ð5Þ
Q
1=2 where C li is the orbital coefficient and V QP is the matrix element of 1=2 . V
2.2. FMO method In the FMO method, the total properties of target molecules are evaluated using the calculations of fragments (referred to as ‘monomers’) and pairs of fragments (referred to as ‘dimers’) [27,28]. Fragmentation is achieved using the projection operator [28], and each calculation is performed with environmental electrostatic potential (ESP) [27,28]. The ESP is determined by achieving self-consistency in the iterative procedure called ‘monomer self-consistent charge’ (SCC). The total energy of the HF level is calculated using the following equation [30]:
EHF total ¼
X
E0HF þ I
X
I
ð9Þ
l
X ðimjQ ÞC ma ;
ð10Þ
m
X 1=2 BPia ¼ ðiajQ ÞV QP ;
ð11Þ
Q
P
XXX
X ðlmjQ ÞC li ;
DEHF IJ ;
where the cost of each step is ON2 X; OVXN, and OVX 2 , respectively. The calculated BPia is stored in the memory. After the above transformations, ðiajjbÞ are evaluated using Eq. (4) and the MP2 correlation energy is accumulated under the outer loop over the occupied orbital pairs. Although the cost of calculations of ðiajjbÞ is O2 V 2 X (fifth power of system size), this step can be performed by the DGEMM function of the basic linear algebra subroutines (BLAS) [41]. At first glance, the memory requirement of BPia ðXOVÞ may appear to cause computational problems for large molecules. However, such problems are not serious in this study where the FMO scheme is employed. As shown in the previous section, the target molecule is divided into small fragments and only calculations of the monomer and the dimer are required in the FMO scheme. Therefore,
ð6Þ
I>J
where E0HF is the energy of the monomer without the ESP and DEHF I IJ is the inter fragment interaction energy (IFIE). Formulation of the IFIE has been described in detail by Nakano et al. [30]. For the dimer constructed by largely separated fragments, DEHF IJ is approximately calculated only from the contributions of electrostatic interaction between two monomers. Such treatment is called ‘dimer-es’ approximation [30]. In the MP2 level of theory, the total energy is calculated as [31–33] HF EMP2 total ¼ Etotal þ
X I
Ecorr þ I
X
DEcorr IJ ;
ð7Þ
I>J
where Ecorr is the electron correlation energy obtained from the I is the IFIE associated with monomer MP2 calculation and DEcorr IJ the electron correlation. This value is obtained by the following equation:
DEcorr ¼ Ecorr Ecorr Ecorr ; IJ IJ I J Ecorr IJ
ð8Þ
where is the electron correlation energy obtained from the dimer MP2 calculation. For the fragment pairs treated with the dimeris set to be zero. That is to say, dispersion es approximation, DEcorr IJ interaction between largely separated fragments is neglected. and When using the RI-MP2 method with the FMO scheme, Ecorr I
Fig. 1. The loop structure of our implementation of the RI-MP2 method. The first outer loop is over the auxiliary basis sets ðQ Þ and the second outer loop is over the occupied orbital pairs ðijÞ. In the cases of the parallel computations, the index P is divided into several ranges (each range is denoted by P 0 ) and distributed to the 0 CPUs, where only BPia are calculated and stored in the memory (at position A). The 0 contribution of the BPia is summed up into ðiajjbÞ at position B, and it is completed via the Allreduce function of the MPI [42] at position C. Finally, the electron correlation energy is evaluated using ðiajjbÞ.
197
T. Ishikawa, K. Kuwata / Chemical Physics Letters 474 (2009) 195–198
even if the entire system is large, the sizes of the calculations are limited to those of the dimers, and the memory requirement of BPia remains small. In the cases of calculations containing large fragments, BPia can be distributed into several CPUs and stored in the memory. As shown in Fig. 1, our program was parallelized via the message-passing-interface (MPI) [42]. Here, one should note that the RI-MP2 calculations in the FMO scheme will be parallelized only when the memory requirement is too large, that is to say, the parallel computation is not used in order to speed-up the RI-MP2 calculations. The reason for this arises from a feature of the FMO scheme: the most efficient speed-up of the FMO method can be achieved by the parallel computation for the indices of the fragments or pairs of fragments, because their calculations are completely independent of each other. Our program code was implemented in the program package of PAICS [43], which was recently developed in our laboratory for FMO calculations.
Table 2 Time (min) taken using the RI-MP2 method and the conventional MP2 method in the calculations of Trp–Trp. Here M shows the memory requirement of BPia (MB) with the RI-MP2 method. The cc-pVDZ and cc-pVTZ [20] were employed together with the optimized auxiliary basis sets [16]. Calculations were performed using Xeon E5420 (2.5 GHz) in which 2.0 GB memory is available per core.
T MP2 T RIMP2 M a b c
cc-pVDZ
cc-pVTZ
282.7a 13.6a 450
12 130.5b 72.6c 1827
One core was used. Four cores were used. Two cores were used.
4.2. FMO calculations of biomolecular systems
Prior to the demonstrative calculations of biomolecular systems, some test calculations are given in this subsection. A small peptide containing two tryptophans (Trp–Trp) was selected for our test calculations. Since amino acid residues are treated as a single fragment, and tryptophan is the largest residue, Trp–Trp is one of the largest dimers in FMO calculations of typical proteins. The structure of this peptide was picked up from a classical MD trajectory with the AMBER 10 program [44]. The calculations were performed using cc-pVDZ and cc-pVTZ [20] with the optimized auxiliary basis sets by Weigend et al. [16]. The electron correlation energies obtained by the RI-MP2 method are shown in Table 1 along with those by the conventional MP2 method. The energy differences are 0.00026 and 0.00059 hartree (0.16 and 0.37 kcal/mol) in cc-pVDZ and cc-pVTZ, respectively. These errors are considered to be small for general discussions in chemistry. We next focus our attention to the timings summarized in Table 2. In cc-pVDZ, the RI-MP2 calculation was about 20 times faster than the conventional MP2 calculation. For the RI-MP2 method, the memory requirement of BPia was 450 MB, which is sufficiently small for standard modern computer systems. On the other hand, the integral transformation of the conventional MP2 method was divided into five batches using our computer system (2.0 GB memory is available per core). In cc-pVTZ, RI-MP2 took 72.6 min with two cores. The total memory required for BPia was 1827 MB, and therefore, the memory requirement per core was 914 MB for the parallel calculation with two cores. This computational cost is sufficiently reasonable to be applied to the FMO scheme. In the case of the conventional MP2 method, the integral transformation required 74 batches and took 12130.5 min with four cores. This result indicates that FMO calculations of biomolecular systems employing cc-pVTZ levels are extremely difficult when using the conventional MP2 method.
In this subsection, we show the performance of RI-MP2 calculations with the FMO scheme using two biomolecular systems. One is the human immunodeficiency virus type 1 protease (HIV-1 PR) complexed with the lopinavir (LPV) molecule. HIV-1 PR is an important molecule in the infection of Acquired Immune Deficiency Syndrome (AIDS), and LPV is an inhibitor. Detailed descriptions about the atomic coordinates of this system can be found in previous studies [39,40]. Another example of a biomolecular system is the complex of the prion protein (PrP) with the GN8 molecule. PrP is considered to be a key molecule in prion diseases, and GN8 is a potential curative agent that was discovered in our laboratory [45]. Details of the atomic coordinates of this complex were presented in a previous paper [43]. FMO calculations of two biomolecular systems were performed with cc-pVDZ, where eight cores were employed and 2.0 GB memory was available per core. A parallel computation was applied only for the indices of the fragments and pairs of fragments, that is to say, each calculation of the RIMP2 method was performed using a single core (see Section 3). The electron correlation energies obtained with the RI-MP2 method and the conventional MP2 method are provided in Table 3. The differences in energies between the two methods were 0.01380 hartree (8.7 kcal/mol) in the HIV-1 PR and 0.00614 hartree (3.9 kcal/mol) in the PrP. These errors are 0.0058% and 0.0045% of the total electron correlation energies, respectively. We next discuss the timings of the calculations, which are summarized in Table 4. In the case of the HIV-1 PR, the dimer MP2 (conventional) took 7385.9 min and the total time of the FMO calculation with conventional MP2 was 11 595.8 min, i.e., 63.7% was spent on the dimer MP2 calculations. On the other hand, when using the RI-MP2 method, the dimer MP2 took 717.0 min, which was only 14.7% of the total time of the FMO calculation (4870.5 min). Thus, the total FMO calculation could be performed 2.38 times faster by introducing the RI-MP2 method. In the calculation of the PrP, the time taken for the dimer MP2 (conventional) was 70.4% of the total FMO time, in contrast, the time taken with the RI-MP2 method was only 18.6%. Therefore, the speed-up of the FMO calculation was 2.81 times. As a result that computational cost of MP2 calculations was reduced, the dimer SCF procedure became the most time-consuming
Table 1 The electron correlation energies (hartree) of Trp–Trp obtained by the RI-MP2 method and the conventional MP2 method. Here D denotes the difference between two energies. The cc-pVDZ and cc-pVTZ [20] were employed together with the optimized auxiliary basis sets [16].
Table 3 The electron correlation energies (hartree) of two biomolecular systems obtained by the RI-MP2 method and the conventional MP2 method with the FMO scheme. Here D denotes the difference between two energies. The cc-pVDZ [20] were employed together with the optimized auxiliary basis sets [16].
4. Results and discussion 4.1. Test calculations without FMO scheme
EMP2 ERI-MP2 D
cc-pVDZ
cc-pVTZ
4.12586 4.12560 +0.00026
5.04831 5.04772 +0.00059
ERI-MP2 EMP2 D
HIV-1
PrP
237.29583 237.30963 +0.01380
135.36924 135.35738 +0.00614
198
T. Ishikawa, K. Kuwata / Chemical Physics Letters 474 (2009) 195–198
Table 4 Details of the time (min) taken in FMO calculations of two biomolecular systems. These calculations were performed using eight cores of Xeon E5420 (2.5 GHz) in which 2.0 GB memory is available per core. HIV-1 Monomer SCCa Monomer MP2 (conventional) Monomer MP2 (RI-MP2) Dimer ES Dimer SCF Dimer MP2 (conventional) Dimer MP2 (RI-MP2) Etc. Total (conventional MP2)b Total (RI-MP2)c a b c
ence of the National Institute of Biomedical Innovation. Calculations necessary in this study were performed using the computational resources at Gifu University and the Research Center for Computational Science, Okazaki, Japan.
PrP
1085.3 62.8 6.4 706.9 2329.9 7385.9 717.0 25.0
560.1 41.7 4.7 198.6 1221.6 4846.9 455.8 12.3
References
11 595.8 4870.5
6881.2 2453.1
[7] [8] [9] [10] [11] [12] [13] [14] [15] [16] [17] [18] [19] [20] [21] [22] [23] [24] [25] [26] [27]
Monomer RHF is included. Total time with conventional MP2. Total time with RI-MP2.
step of the FMO scheme, which undertook 47.8 % in the HIV-1 PR and 49.8 % in the PrP. Thus, the acceleration of the SCF calculations is naturally hoped, and it will be accomplished by the low scaling methods reported by the several studies, e.g., the approaches using the fast multipole method [46–48]. It is expected that the superiority of the RI-MP2 method in the FMO scheme is more significant in the calculation using large basis sets (cf. our test calculations of the Trp–Trp). In this study, however, we could not complete the calculations of the two biomolecular systems using cc-pVTZ because another problem arose, i.e., poor convergence of the iterative procedure in monomer SCC. Thus, at this stage, the calculation of biomolecular systems using ccpVTZ is a topic for future study along with resolving the aforementioned convergence problem. 5. Summary
[1] [2] [3] [4] [5] [6]
[28] [29] [30] [31] [32]
In this Letter, we reported that the cost of calculations of the FMO scheme could be reduced by introducing the RI-MP2 method. The program for the RI-MP2 method has been newly developed and implemented with PAICS. Since only calculations of fragments and pairs of fragments are required in the FMO scheme, the intermediate values ðBPia Þ are easily stored in the memory even if a target molecule is large. After some test calculations with a small peptide, we performed demonstrative calculations using the RIMP2 method in the context of the FMO scheme, in which two real biomolecular systems were calculated as illustrative examples. These calculations showed that the computational effort resulting from MP2 calculations could be extremely reduced and that the total time of FMO calculations decreased by more than half. Acknowledgements The study reported here was supported by a Grant-In-Aid for Young Scientists (Start-up) No. 20850020 from the Japan Society for the Promotion of Science. This study was also supported by the Program for Promotion of Fundamental Studies in Health Sci-
[33] [34] [35] [36] [37] [38] [39] [40] [41] [42] [43] [44] [45] [46] [47] [48]
J.L. Whitten, J. Chem. Phys. 58 (1973) 4496. E.J. Baerends, D.E. Ellis, P. Ros, Chem. Phys. 2 (1973) 41. B.I. Dunlap, J.W.D. Connolly, J.R. Sabin, J. Chem. Phys. 71 (1979) 3396. R.A. Kendall, H.A. Fruchtl, Theor. Chem. Acc. 97 (1997) 158. O. Vahtras, J. Almlöf, M.W. Feyereisen, Chem. Phys. Lett. 213 (1993) 514. H.A. Früchtl, R.A. Kendall, R.J. Harrison, K.G. Dyall, Int. J. Quant. Chem. 64 (1997) 63. M. Feyereisen, G. Fitzgerald, A. Komornicki, Chem. Phys. Lett. 208 (1993) 359. D.E. Bernholdt, B.J. Harrison, Chem. Phys. Lett. 205 (1996) 477. F. Weigend, M. Häser, Theor. Chem. Acc. 97 (1997) 331. F. Weigend, M. Häser, H. Patzelt, R. Ahlrichs, Chem. Phys. Lett. 294 (1998) 143. A. Köhn, C. Hättig, Chem. Phys. Lett. 358 (2002) 350. Y. Jung, Y. SHAO, M. Head-Gordon, J. Comput. Chem. 28 (2007) 1953. A.P. Rendell, T.J. Lee, J. Chem. Phys. 101 (1994) 400. S. Ten-no, S. Iwata, J. Chem. Phys. 105 (1996) 3604. S. Ten-no, S. Iwata, Int. J. Quant. Chem. Symp. 30 (1996) 107. F. Weigend, A. Köhn, C. Hättig, J. Chem. Phys. 116 (2002) 3175. C. Hättig, Phys. Chem. Chem. Phys. 7 (2005) 59. A. Schäfer, H. Horn, R. Ahlrichs, J. Chem. Phys. 97 (1992) 2571. A. Schäfer, C. Huber, R. Ahlrichs, J. Chem. Phys. 100 (1994) 5829. T.H. Dunning, J. Chem. Phys. 90 (1989) 1007. D.E. Woon, T.H. Dunning, J. Chem. Phys. 100 (1994) 2975. K.A. Peterson, T.H. Dunning, J. Chem. Phys. 117 (2002) 10548. G. Rauhut, P. Pulay, H.-J. Werner, J. Comput. Chem. 19 (1988) 1241. G.E. Scuseria, P.Y. Ayara, J. Chem. Phys. 111 (1999) 8330. J.E. Subotnik, M. Head-Gordon, J. Chem. Phys. 122 (2005) 034109. H.-J. Werner, F.R. Manby, P.J. Knowles, J. Chem. Phys. 118 (2003) 8149. K. Kitaura, T. Sawai, T. Asada, T. Nakano, M. Uebayasi, Chem. Phys. Lett. 312 (1999) 319. K. Kitaura, E. Ikeo, T. Asada, T. Nakano, M. Uebayasi, Chem. Phys. Lett. 313 (1999) 701. T. Nakano, T. Kaminuma, T. Sato, Y. Akiyama, M. Uevayasi, K. Kitaura, Chem. Phys. Lett. 318 (2000) 416. T. Nakano, T. Kaminuma, T. Sato, K. Fukuzawa, Y. Akiyama, M. Uebayasi, K. Kitaura, Chem. Phys. Lett. 351 (2002) 475. Y. Mochizuki, T. Nakano, S. Koikegami, S. Tanimori, Y. Abe, U. Nagashima, K. Kitaura, Theor. Chem. Acc. 112 (2004) 442. Y. Mochizuki, S. Koikegami, T. Nakano, S. Amari, K. Kitaura, Chem. Phys. Lett. 396 (2004) 473. D.G. Fedorov, K. Kitaura, J. Chem. Phys. 121 (2004) 2483. K. Fukuzawa, Y. Mochizuki, S. Tanaka, K. Kitaura, T. Nakano, J. Phys. Chem. B 110 (2006) 16102. M. Ito, K. Fukuzawa, Y. Mochizuki, T. Nakano, S. Tanaka, J. Phys. Chem. B 111 (2007) 3525. M. Ito, K. Fukuzawa, Y. Mochizuki, T. Nakano, S. Tanaka, J. Phys. Chem. A 112 (2008) 1986. M. Ito, K. Fukuzawa, T. Ishikawa, Y. Mochizuki, T. Nakano, S. Tanaka, J. Phys. Chem. B 112 (2008) 12081. Y. Mochizuki et al., Chem. Phys. Lett. 457 (2008) 396. T. Ishikawa, Y. Mochizuki, S. Amari, T. Nakano, H. Tokiwa, S. Tanaka, K. Tanaka, Theor. Chem. Acc. 118 (2007) 937. T. Ishikawa, Y. Mochizuki, S. Amari, T. Nakano, S. Tanaka, K. Tanaka, Chem. Phys. Lett. 463 (2008) 189. Basic Linear Algebra Subroutines,
. Message Passing Interface (MPI),
. T. Ishikawa, T. Ishikura, K. Kuwata, J. Comput. Chem., in preparation. D.A. Case et al., AMBER 10, University of California, San Francisco, 2008. K. Kuwata et al., Proc. Natl. Acad. Sci. USA 104 (2007) 11921. C.A. White, B.G. Johnson, P.M.W. Gill, M. Head-Gordon, Chem. Phys. Lett. 230 (1994) 8. C.A. White, B.G. Johnson, P.M.W. Gill, M. Head-Gordon, Chem. Phys. Lett. 253 (1996) 268. M.C. Strain, G.E. Scuseria, M.J. Frisch, Science 271 (1996) 51.