Journal of Molecular Structure: THEOCHEM 948 (2010) 102–107
Gas phase isomerization enthalpies of organic compounds: A semiempirical, density functional theory, and ab initio post-Hartree–Fock theoretical study
Sierra Rayne a,*, Kaya Forest b
a Ecologica Research, Penticton, BC, Canada V2A 8J3
b Department of Chemistry, Okanagan College, Penticton, BC, Canada V2A 8E1
Article info
Article history: Received 27 January 2010; Received in revised form 26 February 2010; Accepted 27 February 2010; Available online 6 March 2010
Keywords: Isomerization enthalpies; Hydrocarbons; Semiempirical; Density functional theory; Ab initio post-Hartree–Fock
Abstract
A database of gas phase standard state isomerization enthalpies was constructed for 562 pure and nitrogen-, oxygen-, sulfur-, and halogen-containing hydrocarbon reactions. The PM6 and PDDG semiempirical methods, B3LYP and M062X hybrid density functionals with the 6-311++G(d,p) and 6-311+G(3df,3p) basis sets, and the CBS-Q//B3 and G4MP2 ab initio post-Hartree–Fock composite methods were examined for prediction accuracy within each class of isomerization reactions. At much lower computational cost, the PM6 and PDDG semiempirical methods offer modest isomerization enthalpy prediction performance approximately comparable to the B3LYP density functional. The M062X density functional provides nearly equivalent accuracy to the higher level CBS-Q//B3 and G4MP2 methods across all hydrocarbon classes. Increasing basis set size from 6-311++G(d,p) to 6-311+G(3df,3p) with the B3LYP and M062X density functionals does not influence their respective isomerization enthalpy prediction accuracies. Using the 6-311+G(3df,3p) basis set, the M062X functional also achieves near CBS-Q//B3 quality accuracy for enthalpies of formation using the atomization approach. © 2010 Elsevier B.V. All rights reserved.
1. Introduction
Isomerization enthalpies of organic compounds have attracted increasing computational interest [1–4] in recent years because of the insights they provide into molecular structure and bonding, their use in predicting synthetic pathways, their applicability to the fields of polymer science and energy conversion, and their ability to help benchmark theoretical methods. Earlier investigations often focused on the known hydrocarbon branching errors of density functional theory (DFT) methods (particularly the B3LYP functional), whereby linear to branched isomerizations (e.g., n-octane to tetramethylbutane) were incorrectly predicted to be quantitatively less exothermic than indicated by the well established experimental datasets, and qualitative errors were often present as well [2,5,6]. Recent studies have extended the analysis of these isomerization enthalpy errors to a broad range of bonding situations, including oxygen-, nitrogen-, and halogen-containing hydrocarbons and silanes, as well as various bridged and cyclic structures and functional group conversions (e.g., acyclic to cyclic, aromatic to aliphatic, carboxylic acid/dione to ester, ketone to ether, diol to peroxide, etc.) [1,7,8]. In this work, we have investigated the pre-existing isomerization enthalpy validation datasets with levels of theory not previously considered in the literature, as well as compiled larger sets of hydrocarbon isomerization reactions with experimental data, including new classes of sulfur and halogen substituted derivatives, and have applied various semiempirical, DFT, and higher level ab initio post-Hartree–Fock composite methods to these broader datasets.
* Corresponding author. Tel.: +1 250 487 0166. E-mail address: [email protected] (S. Rayne). doi:10.1016/j.theochem.2010.02.030
2. Materials and methods
Calculations were conducted using Gaussian-09 [9] on the Western Canada Research Grid (WestGrid; project 100185 [K. Forest]) and the Shared Hierarchical Academic Research Computing Network (SHARCNET; project sn4612 [K. Forest]), both within the umbrella of Compute/Calcul Canada. All calculations used the same gas phase starting geometries obtained with the PM6 semiempirical method [10] as implemented in MOPAC 2009 (http://www.openmopac.net/; v. 9.099). Semiempirical calculations used the AM1 [11], PM3/PM3MM [12], PM6 [10], and PDDG [5] methods as reimplemented [13] in Gaussian-09. Density functional theory (DFT) calculations used the B3LYP [14,15] and M062X [16] hybrid density functionals with the 6-311++G(d,p) [17–19], 6-311+G(3df,3p) [17–19], and aug-cc-pVDZ [20–22] basis sets. Ab initio Hartree–Fock (HF) calculations were conducted using the 6-311++G(d,p) and aug-cc-pVDZ basis sets. Complete Basis Set (CBS) calculations used the CBS-Q//B3 [23,24] method. Gaussian-n calculations used the G4MP2 [25,26] method.
S. Rayne, K. Forest / Journal of Molecular Structure: THEOCHEM 948 (2010) 102–107
All optimized structures were confirmed as true minima by vibrational analysis at the same level. Isomerization enthalpies include zero point and thermal corrections. Only the lowest energy conformation of each molecule was considered for isomerization enthalpy calculations.
Experimental gas phase standard state (298.15 K, 1 atm) enthalpies of formation (ΔfH°(g)) were used to calculate experimental gas phase standard state (298.15 K, 1 atm) isomerization enthalpies (ΔisomH°(g)). ΔfH°(g) data were obtained from Refs. [27–34]. Chemical accuracy is operationally defined herein as a ΔisomH°(g) error of less than 1.5 kcal/mol based on the assumption of a typical 1 kcal/mol error in each ΔfH°(g) datapoint used for a ΔisomH°(g) calculation. Because of the wide range of ΔfH°(g) errors reported in the literature (from less than 0.5 kcal/mol up to 5 kcal/mol), a chemical accuracy definition of 1.5 kcal/mol for ΔisomH°(g) is only for convenience. In practice, each calculated experimental ΔisomH°(g) value will have a unique error defined by the respective errors of the ΔfH°(g) measurements that comprise the isomerization reaction, and rigorous chemical accuracy would need to be defined on a case-by-case basis.
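The bookkeeping described above can be illustrated with a minimal sketch. The function names and the two enthalpies of formation are hypothetical placeholders rather than values from this work, and combining the two 1 kcal/mol uncertainties in quadrature is only one plausible way to rationalize the 1.5 kcal/mol threshold; it is an assumption, not a procedure stated in the text.

```python
import math

def isomerization_enthalpy(dfh_reactant, dfh_product):
    """Isomerization enthalpy (kcal/mol) for reactant -> product, computed
    from gas phase standard state enthalpies of formation (kcal/mol)."""
    return dfh_product - dfh_reactant

def propagated_uncertainty(sigma_reactant, sigma_product):
    """Uncertainty of the difference, assuming independent errors that
    combine in quadrature (an assumption made for this illustration)."""
    return math.hypot(sigma_reactant, sigma_product)

# Hypothetical enthalpies of formation (kcal/mol), for illustration only.
dfh_linear = -30.0
dfh_branched = -32.0

d_isom = isomerization_enthalpy(dfh_linear, dfh_branched)
sigma = propagated_uncertainty(1.0, 1.0)

print(f"isomerization enthalpy = {d_isom:.1f} kcal/mol")
print(f"propagated uncertainty = {sigma:.2f} kcal/mol")
```

Under this assumption, two 1 kcal/mol uncertainties combine to roughly 1.4 kcal/mol, which is close to the operational 1.5 kcal/mol threshold; larger individual uncertainties would widen the threshold accordingly.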
3. Results and discussion
Work by Sattelmeyer et al. [1], Grimme et al. [3], and Tirado-Rives and Jorgensen [8] established a standard set of ΔisomH°(g) for various pure hydrocarbons (n = 13; Table S1), nitrogen-containing hydrocarbons (n = 10; Table S2), and oxygen-containing hydrocarbons (n = 11; Table S3) that have been used to benchmark and investigate various levels of theory. These studies have collectively shown that while low mean signed errors (MSEs; ≤1 kcal/mol) can be obtained using various levels of semiempirical, Hartree–Fock (HF) ab initio, and DFT methods, the results only appear favorable due to error cancellations. Consideration of mean unsigned errors (MUEs) and root mean square errors (RMSEs) in these reports shows that various semiempirical (AM1, PM3, PDDG), HF, DFT (BLYP, BHLYP, B2LYP, B3LYP, O3LYP, mPW2PLYP, BMK, BP86, B97D, PBE/PBE0, TPSS, MPW1k, MPWB1k, and SCC-DFTB), and MP2/SCS-MP2 model chemistries generally yield MUEs/RMSEs on the order of 1–12 kcal/mol, with typically lower MUEs using higher levels of theory (Table 1). We obtain similar errors on these datasets with the semiempirical AM1, PM3, PM6, and PDDG methods as implemented in Gaussian-09, as well as the HF level of theory with representative Pople (6-311++G(d,p)) and Dunning-type (aug-cc-pVDZ) basis sets, but find that near chemical accuracy (MUEs of ≤1.5 kcal/mol for each of the three standard isomerization enthalpy sets) can be achieved using the M062X density functional with the 6-311++G(d,p), 6-311+G(3df,3p), and aug-cc-pVDZ basis sets (Table 2). The B3LYP functional with the 6-311++G(d,p), 6-311+G(3df,3p), and aug-cc-pVDZ basis sets does not perform well with either the pure hydrocarbon set (MUEs of 3.2, 3.4, and 2.8 kcal/mol, respectively) or with the oxygen-containing hydrocarbon dataset (MUEs of 2.2, 2.2, and 1.4 kcal/mol, respectively), but displays remarkably low MUEs of 0.8–0.9 kcal/mol using all three basis sets on the nitrogen-containing hydrocarbon dataset.
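The distinction between the three error measures can be sketched as follows. The sign convention (calculated minus experimental) and the numerical values are assumptions chosen for illustration; the data are hypothetical and deliberately constructed so that the signed errors cancel.

```python
import math

def error_statistics(calculated, experimental):
    """Return (MSE, MUE, RMSE) with error defined as calculated - experimental."""
    errors = [c - e for c, e in zip(calculated, experimental)]
    n = len(errors)
    mse = sum(errors) / n                             # mean signed error
    mue = sum(abs(x) for x in errors) / n             # mean unsigned error
    rmse = math.sqrt(sum(x * x for x in errors) / n)  # root mean square error
    return mse, mue, rmse

# Hypothetical isomerization enthalpies (kcal/mol), for illustration only.
experimental = [-2.0, 1.0, -4.0, 3.0]
calculated = [1.0, -2.0, -1.0, 0.0]

mse, mue, rmse = error_statistics(calculated, experimental)
print(f"MSE = {mse:.1f}, MUE = {mue:.1f}, RMSE = {rmse:.1f} kcal/mol")
```

In this constructed case every prediction is off by 3 kcal/mol, yet the MSE is zero because positive and negative errors cancel, which is why the MUE and RMSE are the more informative measures of accuracy.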
This B3LYP error profile is similar to the previous isomerization enthalpy reports with this functional using other basis sets (see Refs. [1,3,8] and the literature summaries in Table 1). The aug-cc-pVDZ basis set yields slightly better MUEs than the Pople-type basis sets with the B3LYP functional, but the opposite trend with the M062X functional, and conflicting trends for the HF model chemistry. Increasing the basis set size from 6-311++G(d,p) to 6-311+G(3df,3p) has no significant effect on the ΔisomH°(g) prediction accuracy for either the M062X or B3LYP functionals. Similar to the work of Grimme et al. [3] using higher level MP2 and CCSD(T) methods, we find that the CBS-Q//B3 and G4MP2 methods display the highest MUEs on the oxygen-containing hydrocarbon dataset relative to their corresponding chemical accuracies on the pure hydrocarbon and nitrogen-containing hydrocarbon datasets, potentially highlighting systematic errors in the oxygen-containing hydrocarbon experimental ΔfH°(g) database. For all three datasets, the CBS-Q//B3 and G4MP2 methods yield the lowest MUEs reported in the literature to date for these isomerization reactions.
To these standard isomerization datasets of Sattelmeyer et al. [1], Grimme et al. [3], and Tirado-Rives and Jorgensen [8], we added further isomerization reactions in each class to bring the total number of isomerizations to 311 for the pure hydrocarbons (Table S4; note that the isomerization reactions with the literature ΔfH°(g) data from Schreiner et al. [35] and Zhao and Truhlar [36] are included in this dataset, as are some isomerization reactions examined by Taskinen [37]), 73 for the nitrogen-containing hydrocarbons (Table S5), 116 for the oxygen-containing hydrocarbons (Table S6), as well as 39 isomerization enthalpies for sulfur-containing hydrocarbons (Table S7) and 23 for halogen-containing hydrocarbons (Table S8), for a total of 562 isomerization enthalpies across these functional group classes. We omitted the N-methylacetamide → dimethylformamide, tetrahydro-2H-pyran-2-one → acetylacetone, and hexanoic acid → methyl pivalate isomerization reactions from the new composite dataset due to uncertainty regarding their experimental ΔfH°(g) values. Dimethylformamide does not have an experimental ΔfH°(g) in the NIST database, and the reported experimental ΔfH°(g) values for acetylacetone and methyl pivalate range over 44 and 20 kJ/mol, respectively. Calculations using the PM6 and PDDG semiempirical methods, B3LYP/6-311++G(d,p), B3LYP/6-311+G(3df,3p), M062X/6-311++G(d,p), and M062X/6-311+G(3df,3p) density functional levels of theory, and the CBS-Q//B3 and G4MP2 methods were then conducted on these composite datasets.
The representative 6-311++G(d,p) and 6-311+G(3df,3p) Pople-type basis sets were chosen for the DFT calculations because no clear and consistent increase in accuracy was obtained across both density functionals for all compound classes using the validation set results discussed above with the 6-311++G(d,p), 6-311+G(3df,3p), and aug-cc-pVDZ basis sets.
Neither of the semiempirical methods provided chemical accuracy, having ΔisomH°(g) MUEs ranging from about 2 to 3 kcal/mol and RMSEs from about 3 to 5 kcal/mol (Table 3 and Fig. 1). The B3LYP/6-311++G(d,p) and B3LYP/6-311+G(3df,3p) methods performed with modestly better accuracy than the two semiempirical methods for the nitrogen, oxygen, sulfur, and halogen derivatives, yielding MUEs from 1.2 to 2.3 kcal/mol and RMSEs from 1.6 to 2.9 kcal/mol, but provided much lower accuracy on the pure hydrocarbons (MUE = 5.5 kcal/mol for both basis sets and RMSE = 6.6–6.8 kcal/mol, respectively). In comparison, the M062X/6-311++G(d,p) and M062X/6-311+G(3df,3p) levels of theory provided nearly equivalent accuracies to the more computationally expensive CBS-Q//B3 and G4MP2 methods for all compound classes. The M062X/6-311++G(d,p), M062X/6-311+G(3df,3p), CBS-Q//B3, and G4MP2 methods achieved effective ΔisomH°(g) chemical accuracy for all hydrocarbon classes. Thus, while a modest ΔisomH°(g) accuracy increase can be obtained by using the more expensive CBS-Q//B3 method over the M062X/6-311++G(d,p) and M062X/6-311+G(3df,3p) levels of theory, there appears to be no ΔisomH°(g) accuracy increase in adding further computational costs at the G4MP2 level relative to the cheaper CBS-Q//B3 method. Similarly, there is no error reduction for ΔisomH°(g) in using the more expensive 6-311+G(3df,3p) basis set, versus its cheaper 6-311++G(d,p) counterpart, with either the B3LYP or M062X functionals.
Table 1. Literature derived calculated ΔisomH°(g) errors for the validation sets of pure hydrocarbons, nitrogen-containing hydrocarbons, and oxygen-containing hydrocarbons developed by Sattelmeyer et al. [1], Grimme et al. [3], and Tirado-Rives and Jorgensen [8] at various levels of semiempirical, Hartree–Fock ab initio, and density functional theory. Values are in kcal/mol and presented as mean unsigned error [root mean squared error] (mean signed error).
Level of theory | Pure hydrocarbons | Nitrogen-containing hydrocarbons | Oxygen-containing hydrocarbons
--- | --- | --- | ---
TM^a/HF/TZV(2df,2pd)//B3LYP/TZV(d,p)^b | 3.6 [5.5] (0.1) | 1.8 [2.7] (0.4) | 3.7 [4.5] (1.2)
TM/B3LYP/TZV(2df,2pd)//B3LYP/TZV(d,p)^b | 3.5 [5.0] (0.0) | 1.3 [2.0] (0.8) | 2.6 [3.2] (0.4)
TM/BMK/TZV(2df,2pd)//B3LYP/TZV(d,p)^b | 1.9 [2.7] (0.9) | 1.8 [2.8] (1.3) | 2.6 [2.9] (0.6)
TM/B2LYP/TZV(2df,2pd)//B3LYP/TZV(d,p)^b | 2.1 [3.1] (0.3) | 1.2 [1.8] (0.5) | 1.9 [2.6] (0.3)
TM/PBE0/TZV(2df,2pd)//B3LYP/TZV(d,p)^b | 2.7 [3.6] (1.4) | 1.8 [2.5] (1.0) | 2.8 [3.5] (1.4)
TM/PBE/TZV(2df,2pd)//B3LYP/TZV(d,p)^b | 2.8 [3.6] (1.3) | 1.6 [2.3] (1.2) | 2.7 [3.1] (0.1)
TM/BHLYP/TZV(2df,2pd)//B3LYP/TZV(d,p)^b | 2.8 [4.4] (0.0) | 1.3 [2.1] (0.5) | 2.5 [3.0] (0.9)
TM/BP86/TZV(2df,2pd)//B3LYP/TZV(d,p)^b | 2.8 [3.9] (0.9) | 1.6 [2.2] (1.2) | 2.5 [3.1] (0.5)
TM/B97D/TZV(2df,2pd)//B3LYP/TZV(d,p)^b | 2.4 [3.5] (0.2) | 1.4 [2.2] (1.2) | 2.8 [3.6] (1.3)
TM/TPSS/TZV(2df,2pd)//B3LYP/TZV(d,p)^b | 3.0 [4.0] (1.3) | 2.2 [2.6] (1.9) | 3.5 [4.4] (2.4)
TM/O3LYP/TZV(2df,2pd)//B3LYP/TZV(d,p)^b | 3.7 [5.5] (1.7) | 1.6 [2.2] (1.3) | 2.6 [3.6] (0.3)
TM/BLYP/TZV(2df,2pd)//B3LYP/TZV(d,p)^b | 4.3 [5.8] (0.1) | 1.7 [2.4] (1.0) | 4.1 [4.8] (1.9)
TM/mPW2PLYP/TZV(2df,2pd)//B3LYP/TZV(d,p)^b | 2.0 [3.0] (0.3) | 1.3 [1.8] (0.5) | 2.0 [2.5] (0.5)
G03^c/MPW1k/TZV(2df,2pd)//B3LYP/TZV(d,p)^b | 2.7 [3.7] (1.3) | 2.0 [2.6] (0.8) | 3.5 [4.4] (2.1)
G03/MPWB1k/TZV(2df,2pd)//B3LYP/TZV(d,p)^b | 2.4 [3.2] (1.5) | 1.9 [2.7] (1.0) | 3.2 [3.9] (1.4)
TM/MP2/TZV(2df,2pd)//B3LYP/TZV(d,p)^b | 1.4 [2.3] (0.6) | 1.7 [2.4] (0.3) | 2.8 [3.3] (1.4)
TM/SCS-MP2/TZV(2df,2pd)//B3LYP/TZV(d,p)^b | 1.3 [1.6] (0.1) | 1.4 [2.0] (0.5) | 2.7 [2.9] (0.6)
MP^d/CCSD(T)/6-31G(d)//B3LYP/TZV(d,p)^b | 1.0 [1.4] (0.6) | 2.5 [3.1] (1.6) | 3.2 [4.0] (1.2)
MP/CCSD(T)/cc-pVTZ//B3LYP/TZV(d,p)^b | 0.7 [1.0] (0.3) | 1.5 [2.0] (0.7) | 2.2 [2.5] (0.5)
MOPAC/AM1^e | 6.6 [8.9] (1.1) | 4.2 [5.0] (1.1) | 10.3 [11.8] (5.1)
MOPAC/PM3^e | 4.3 [5.1] (0.6) | 3.8 [4.3] (1.3) | 6.0 [7.1] (2.6)
MOPAC/PDDG/PM3^e | 2.4 [2.8] (0.0) | 2.6 [3.4] (0.9) | 3.1 [3.3] (0.7)
G03/B3LYP/6-31G(d)^e | 3.1 [4.7] (0.2) | 2.5 [3.2] (1.3) | 2.9 [4.8] (2.3)
SCC-DFTB^e | 5.0 [7.0] (3.3) | 4.7 [7.3] (0.0) | 4.7 [5.8] (3.9)
MOPAC/PDDG/PM3//(SCC-DFTB and B3LYP/6-31G(d)^e)^f | 1.7 [2.4] (0.0) | 2.7 [3.4] (0.1) | 2.7 [3.1] (0.9)
G03/HF/6-31G(d)//(SCC-DFTB and B3LYP/6-31G(d)^f)^g | 3.3 [5.2] (0.5) | 2.8 [3.4] (0.9) | 3.4 [3.9] (1.1)
G03/B3LYP/6-31G(d)//(SCC-DFTB and B3LYP/6-31G(d)^f)^g | 3.2 [4.8] (0.2) | 2.9 [3.6] (1.4) | 3.0 [4.6] (2.0)
G03/HF/6-31G(d)//B3LYP/6-31G(d)^g | 3.1 [4.7] (0.2) | 2.5 [3.2] (1.3) | 2.9 [4.8] (2.3)
G03/B3LYP/6-31G(d,p)//(SCC-DFTB and B3LYP/6-31G(d)^f)^g | 3.1 [4.7] (0.1) | 1.9 [2.2] (0.8) | 2.1 [2.8] (0.8)
G03/B3LYP/6-31+G(d,p)//(SCC-DFTB and B3LYP/6-31G(d)^f)^g | 3.4 [4.7] (0.1) | 1.1 [1.2] (0.5) | 1.7 [2.2] (0.2)
a Turbomole.
b From Ref. [3] and adjusted for error calculation using experimental data from Refs. [1,8].
c Gaussian-03.
d MOLPRO.
e From Ref. [1].
f Single point energies obtained from SCC-DFTB optimized geometries with the exception of B3LYP/6-31G(d) optimized geometries for propene, 2-methylpropene, and penta-1,3-diene.
g From Ref. [8].
Table 2. Calculated ΔisomH°(g) errors for the validation sets of pure hydrocarbons, nitrogen-containing hydrocarbons, and oxygen-containing hydrocarbons developed by Sattelmeyer et al. [1], Grimme et al. [3], and Tirado-Rives and Jorgensen [8] at various levels of semiempirical, Hartree–Fock ab initio, density functional, and ab initio post-Hartree–Fock theory. Values are in kcal/mol and presented as mean unsigned error [root mean squared error] (mean signed error).
Level of theory | Pure hydrocarbons | Nitrogen-containing hydrocarbons | Oxygen-containing hydrocarbons
--- | --- | --- | ---
AM1 | 6.7 [8.6] (0.7) | 3.9 [4.7] (0.8) | 8.0 [9.6] (3.9)
PM3 | 4.2 [5.0] (0.4) | 4.1 [4.7] (1.2) | 5.5 [6.5] (1.3)
PM6 | 3.6 [5.5] (3.2) | 3.5 [5.1] (3.3) | 3.9 [4.8] (0.3)
PDDG | 1.8 [2.3] (0.1) | 3.1 [3.5] (0.6) | 3.2 [3.7] (0.5)
HF/6-311++G(d,p) | 3.4 [4.9] (0.4) | 1.4 [1.9] (0.1) | 2.8 [3.2] (0.0)
HF/aug-cc-pVDZ | 3.0 [4.7] (0.1) | 1.4 [1.8] (0.1) | 6.1 [11.0] (0.3)
B3LYP/6-311++G(d,p) | 3.2 [4.3] (0.1) | 0.8 [1.0] (0.1) | 2.2 [2.4] (1.2)
B3LYP/6-311+G(3df,3p) | 3.4 [4.6] (0.1) | 0.9 [1.1] (0.5) | 2.2 [2.6] (1.5)
B3LYP/aug-cc-pVDZ | 2.8 [4.0] (0.2) | 0.8 [1.0] (0.5) | 1.4 [1.7] (0.7)
M062X/6-311++G(d,p) | 1.1 [1.7] (0.5) | 1.0 [1.4] (0.5) | 1.2 [1.7] (0.9)
M062X/6-311+G(3df,3p) | 1.4 [2.0] (0.7) | 1.0 [1.5] (1.0) | 1.5 [1.9] (1.2)
M062X/aug-cc-pVDZ | 1.5 [2.0] (0.8) | 1.3 [1.9] (0.6) | 1.5 [1.8] (0.3)
CBS-Q//B3 | 0.5 [0.8] (0.4) | 0.7 [0.9] (0.5) | 1.5 [1.8] (0.6)
G4MP2 | 0.6 [0.8] (0.2) | 0.7 [0.8] (0.5) | 1.1 [1.4] (0.6)
Statistically insignificant (p > 0.05) or generally weak linear regressions (|r| < 0.75) with low values for the slope (m) of calculated ΔisomH°(g) error against the experimental ΔisomH°(g) were observed within each combination of hydrocarbon class and level of theory (slopes not provided where regressions are not significant):
PM6: pure HCs (r = 0.65, p < 10^-38, m = 0.16 (kcal/mol)/(kcal/mol)), nitrogen HCs (r = 0.29, p = 0.01, m = 0.05), oxygen HCs (r = 0.28, p < 0.01, m = 0.08), sulfur HCs (r = 0.26, p = 0.11), and halogen HCs (r = 0.26, p = 0.24).
PDDG: pure HCs (r = 0.24, p < 10^-4, m = 0.04), nitrogen HCs (r = 0.03, p = 0.77), oxygen HCs (r = 0.35, p < 10^-4, m = 0.11), sulfur HCs (r = 0.49, p < 0.01, m = 0.47), and halogen HCs (r = 0.34, p = 0.11).
B3LYP/6-311++G(d,p): pure HCs (r = 0.11, p = 0.05), nitrogen HCs (r = 0.19, p = 0.11), oxygen HCs (r = 0.14, p = 0.14), sulfur HCs (r = 0.73, p < 10^-6, m = 0.51), and halogen HCs (r = 0.27, p = 0.22).
B3LYP/6-311+G(3df,3p): pure HCs (r = 0.16, p < 0.01, m = 0.04), nitrogen HCs (r = 0.17, p = 0.14), oxygen HCs (r = 0.32, p < 0.001, m = 0.06), sulfur HCs (r = 0.68, p < 10^-5, m = 0.53), and halogen HCs (r = 0.22, p = 0.31).
Table 3. Calculated ΔisomH°(g) errors for the extended composite isomerization datasets of pure hydrocarbons, nitrogen-containing hydrocarbons, and oxygen-containing hydrocarbons using the PM6 and PDDG semiempirical methods, the B3LYP/6-311++G(d,p), B3LYP/6-311+G(3df,3p), M062X/6-311++G(d,p), and M062X/6-311+G(3df,3p) density functional levels of theory, and the CBS-Q//B3 and G4MP2 ab initio post-Hartree–Fock composite methods. Values are in kcal/mol and presented as mean unsigned error [root mean squared error] (mean signed error).
Compound class | n^a | PM6 | PDDG | B3LYP/6-311++G(d,p) | B3LYP/6-311+G(3df,3p) | M062X/6-311++G(d,p) | M062X/6-311+G(3df,3p) | CBS-Q//B3 | G4MP2
--- | --- | --- | --- | --- | --- | --- | --- | --- | ---
Pure HCs^b | 311 | 3.1 [5.1] (0.8) | 2.2 [3.1] (0.3) | 5.5 [6.6] (4.2) | 5.5 [6.8] (4.3) | 1.3 [2.1] (0.1) | 1.6 [2.5] (0.4) | 1.1 [1.7] (0.4) | 1.1 [1.6] (0.3)
Nitrogen-containing HCs | 73 | 2.7 [3.7] (1.2) | 3.2 [4.6] (0.9) | 1.7 [2.4] (0.6) | 1.9 [2.7] (0.3) | 1.7 [2.3] (0.4) | 1.7 [2.4] (0.5) | 1.2 [1.6] (0.1) | 1.1 [1.5] (0.2)
Oxygen-containing HCs | 116 | 2.8 [4.1] (1.2) | 3.1 [4.6] (0.9) | 2.2 [2.8] (0.5) | 2.3 [2.9] (0.3) | 1.1 [1.6] (0.2) | 1.4 [1.8] (0.4) | 1.2 [1.6] (0.3) | 1.0 [1.4] (0.1)
Sulfur-containing HCs | 39 | 2.8 [4.2] (1.0) | 1.9 [2.9] (0.3) | 1.8 [2.5] (1.3) | 2.0 [2.7] (1.3) | 1.0 [1.3] (0.4) | 0.9 [1.2] (0.4) | 0.8 [1.1] (0.5) | 0.8 [1.1] (0.6)
Halogen-containing HCs | 23 | 2.8 [3.7] (0.7) | 3.0 [3.7] (0.6) | 1.2 [1.6] (0.3) | 1.2 [1.6] (0.2) | 1.1 [1.5] (0.6) | 1.1 [1.5] (0.4) | 1.3 [1.6] (0.3) | 1.2 [1.5] (0.3)
Total | 562 | 3.0 [4.6] (0.9) | 2.5 [3.7] (0.2) | 3.9 [5.2] (2.6) | 4.0 [5.4] (2.5) | 1.3 [1.9] (0.1) | 1.5 [2.2] (0.0) | 1.1 [1.6] (0.1) | 1.1 [1.5] (0.1)
a Number of isomerization reactions in the composite dataset.
b Hydrocarbons.
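As a consistency check on how the "Total" row of Table 3 relates to the per-class rows, the following sketch pools the PM6 class statistics by reaction count. The (n, MUE, RMSE) values are transcribed from Table 3; the function names are illustrative, and small deviations from the tabulated totals are expected because the class values are rounded to one decimal place.

```python
import math

# (n, MUE, RMSE) for PM6 in each compound class, transcribed from Table 3
# (errors in kcal/mol).
pm6_by_class = {
    "pure": (311, 3.1, 5.1),
    "nitrogen": (73, 2.7, 3.7),
    "oxygen": (116, 2.8, 4.1),
    "sulfur": (39, 2.8, 4.2),
    "halogen": (23, 2.8, 3.7),
}

def pooled_mue(classes):
    """Count-weighted mean of the class MUEs."""
    total_n = sum(n for n, _, _ in classes.values())
    return sum(n * mue for n, mue, _ in classes.values()) / total_n

def pooled_rmse(classes):
    """RMSE pooled over classes: square root of the count-weighted mean square."""
    total_n = sum(n for n, _, _ in classes.values())
    return math.sqrt(sum(n * rmse ** 2 for n, _, rmse in classes.values()) / total_n)

total_n = sum(n for n, _, _ in pm6_by_class.values())
print(f"n = {total_n}")
print(f"pooled MUE  = {pooled_mue(pm6_by_class):.2f} kcal/mol")
print(f"pooled RMSE = {pooled_rmse(pm6_by_class):.2f} kcal/mol")
```

The pooled values round to the 3.0 [4.6] kcal/mol reported for PM6 across all 562 reactions, and they show that the totals are dominated by the pure hydrocarbon class, which contributes more than half of the reactions.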
Fig. 1. Histograms showing the signed error in calculated ΔisomH°(g) across all hydrocarbon classes (n = 562) using the PM6 and PDDG semiempirical methods, the B3LYP/6-311++G(d,p), B3LYP/6-311+G(3df,3p), M062X/6-311++G(d,p), and M062X/6-311+G(3df,3p) density functional levels of theory, and the CBS-Q//B3 and G4MP2 ab initio post-Hartree–Fock composite methods. Normal distribution best fits (solid lines) are shown for comparison. [Panels: PM6, PDDG, B3LYP/6-311++G(d,p), M062X/6-311++G(d,p), B3LYP/6-311+G(3df,3p), M062X/6-311+G(3df,3p), CBS-Q//B3, G4MP2; x-axis: signed error in ΔisomH°(g); y-axis: % frequency.]
M062X/6-311++G(d,p): pure HCs (r = 0.31, p < 10^-7, m = 0.03), nitrogen HCs (r = 0.44, p < 0.001, m = 0.05), oxygen HCs (r = 0.14, p = 0.14), sulfur HCs (r = 0.06, p = 0.72), and halogen HCs (r = 0.42, p = 0.18).
M062X/6-311+G(3df,3p): pure HCs (r = 0.39, p < 10^-11, m = 0.05), nitrogen HCs (r = 0.31, p < 0.01, m = 0.04), oxygen HCs (r = 0.39, p < 10^-4, m = 0.05), sulfur HCs (r = 0.15, p = 0.36), and halogen HCs (r = 0.22, p = 0.31).
CBS-Q//B3: pure HCs (r = 0.02, p = 0.78), nitrogen HCs (r = 0.20, p = 0.09), oxygen HCs (r = 0.15, p = 0.10), sulfur HCs (r = 0.14, p = 0.39), and halogen HCs (r = 0.23, p = 0.29).
G4MP2: pure HCs (r = 0.14, p = 0.01, m = 0.01), nitrogen HCs (r = 0.19, p = 0.10), oxygen HCs (r = 0.08, p = 0.39), sulfur HCs (r = 0.13, p = 0.42), and halogen HCs (r = 0.22, p = 0.31).
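The regression statistics reported above (correlation coefficient r and slope m of the calculated error against the experimental value) can be reproduced with an ordinary least squares fit. The sketch below is a minimal pure-Python version using hypothetical data; the function names are illustrative, and the p-values are omitted because they additionally require a Student t distribution (available, for example, as the `pvalue` field returned by `scipy.stats.linregress`).

```python
import math

def regress_error(experimental, errors):
    """Least squares fit of error = m * experimental + b.

    Returns (r, m): the Pearson correlation coefficient and the slope."""
    n = len(experimental)
    mean_x = sum(experimental) / n
    mean_y = sum(errors) / n
    sxx = sum((x - mean_x) ** 2 for x in experimental)
    syy = sum((y - mean_y) ** 2 for y in errors)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(experimental, errors))
    return sxy / math.sqrt(sxx * syy), sxy / sxx

def is_weak(r, threshold=0.75):
    """Apply the |r| < 0.75 criterion used in the text for a weak regression."""
    return abs(r) < threshold

# Hypothetical experimental enthalpies and signed errors (kcal/mol).
experimental = [-10.0, -5.0, 0.0, 5.0, 10.0]
errors = [-1.0, 0.5, 0.0, 1.5, 1.0]

r, m = regress_error(experimental, errors)
print(f"r = {r:.2f}, m = {m:.2f} (kcal/mol)/(kcal/mol), weak = {is_weak(r)}")
```

A slope near zero means the error does not grow with the magnitude of the isomerization enthalpy, so a linear correction of the calculated values would have little to work with even where the regression is statistically significant.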
We do not view these correlations as sufficiently reliable to allow correction of any of the theoretical methods in order to achieve improved ΔisomH°(g) accuracy.
As a further check on data quality and sources of error, we calculated gas phase standard state (298.15 K, 1 atm) enthalpies of formation (ΔfH°(g)) using the atomization enthalpy approach [38] for all compounds comprising our ΔisomH°(g) dataset (Table 4, Fig. 2, and Tables S9–S14). Consistent with previous reports, the two DFT methods with the 6-311++G(d,p) basis set systematically overestimated ΔfH°(g) using the atomization approach (although the M062X/6-311++G(d,p) errors are about one-half those of B3LYP/6-311++G(d,p)), whereas the CBS-Q//B3 and G4MP2 methods gave substantially lower errors. The G4MP2 method provided ΔfH°(g) chemical accuracy for all functional group classes. With the 6-311+G(3df,3p) basis set, the B3LYP ΔfH°(g) errors are decreased by about half relative to the 6-311++G(d,p) basis set, although they remain well beyond chemical accuracy (i.e., 17 kcal/mol ΔfH°(g) MUE for all compounds). In contrast, moving from the 6-311++G(d,p) to 6-311+G(3df,3p) basis set with the M062X functional decreases the ΔfH°(g) error by about fourfold on average. This increase in basis set completeness brings the M062X/6-311+G(3df,3p) ΔfH°(g) prediction accuracy with the atomization approach to only about 1.5 kcal/mol less accurate than the CBS-Q//B3 method, and about 3 kcal/mol less accurate on average than the G4MP2 level. Consequently, where such DFT methods achieve accurate ΔisomH°(g) estimates with the 6-311++G(d,p) basis set, the favorable results depend on error cancellations in the ΔfH°(g) values for each component of the isomerization reaction. Using the 6-311+G(3df,3p) basis set, the M062X functional achieves accurate ΔisomH°(g) estimates via the desired mode of accurate ΔfH°(g) estimates. In contrast, the B3LYP functional achieves modest ΔisomH°(g) prediction performance via ΔfH°(g) error cancellation, even with the 6-311+G(3df,3p) basis set. The G4MP2 method, similar to its Gaussian-n precursors, appears capable of accurately estimating ΔfH°(g) using the atomization method across a broad range of organic compounds with varying functional groups and molecular masses. Thus, the G4MP2, CBS-Q//B3, and M062X/6-311+G(3df,3p) methods achieve quality ΔisomH°(g) predictions due to fundamentally correct underlying ΔfH°(g) estimates.
Table 4. Calculated ΔfH°(g) errors for all compounds in the extended composite isomerization datasets of pure hydrocarbons, nitrogen-containing hydrocarbons, and oxygen-containing hydrocarbons using the PM6 and PDDG semiempirical methods, the B3LYP/6-311++G(d,p), B3LYP/6-311+G(3df,3p), M062X/6-311++G(d,p), and M062X/6-311+G(3df,3p) density functional levels of theory, and the CBS-Q//B3 and G4MP2 ab initio post-Hartree–Fock composite methods via the atomization enthalpy approach. Values are in kcal/mol and presented as mean unsigned error [root mean squared error] (mean signed error).
Compound class | n^a | B3LYP/6-311++G(d,p) | B3LYP/6-311+G(3df,3p) | M062X/6-311++G(d,p) | M062X/6-311+G(3df,3p) | CBS-Q//B3 | G4MP2
--- | --- | --- | --- | --- | --- | --- | ---
Pure hydrocarbons | 336 | 34.8 [36.7] (34.8) | 22.4 [24.4] (22.4) | 17.5 [18.1] (17.5) | 5.5 [6.1] (5.3) | 3.1 [3.5] (3.1) | 1.0 [1.4] (0.2)
Nitrogen-containing hydrocarbons | 102 | 20.1 [21.8] (20.1) | 7.5 [9.2] (7.3) | 13.6 [14.2] (13.6) | 2.0 [2.5] (0.8) | 1.8 [2.1] (0.9) | 1.0 [1.3] (0.5)
Oxygen-containing hydrocarbons | 157 | 27.0 [28.7] (27.0) | 13.7 [15.1] (13.7) | 13.8 [14.4] (13.8) | 1.9 [2.3] (0.2) | 1.1 [1.4] (0.2) | 0.9 [1.2] (0.8)
Sulfur-containing hydrocarbons | 55 | 28.6 [30.1] (28.6) | 16.9 [18.2] (16.9) | 14.7 [15.3] (14.7) | 2.6 [2.9] (2.5) | 0.9 [1.3] (0.5) | 1.1 [1.4] (0.9)
Halogen-containing hydrocarbons | 37 | 23.5 [24.7] (23.5) | 11.7 [12.5] (11.7) | 9.5 [9.9] (9.5) | 2.7 [3.3] (2.1) | 1.9 [2.3] (1.8) | 1.0 [1.6] (0.6)
Total | 687 | 29.7 [32.0] (29.7) | 17.2 [19.7] (17.2) | 15.4 [16.2] (15.4) | 3.8 [4.6] (2.8) | 2.2 [2.7] (1.7) | 1.0 [1.3] (0.3)
a Number of compounds in the composite dataset.
Fig. 2. Histograms showing the signed error in calculated ΔfH°(g) via the atomization enthalpy approach across all hydrocarbon classes (n = 687) using the B3LYP/6-311++G(d,p), B3LYP/6-311+G(3df,3p), M062X/6-311++G(d,p), and M062X/6-311+G(3df,3p) density functional theory methods and the CBS-Q//B3 and G4MP2 ab initio post-Hartree–Fock composite methods. Normal distribution best fits (solid lines) are shown for comparison. [Panels: B3LYP/6-311++G(d,p), M062X/6-311++G(d,p), B3LYP/6-311+G(3df,3p), M062X/6-311+G(3df,3p), CBS-Q//B3, G4MP2; x-axis: signed error in ΔfH°(g); y-axis: % frequency.]
Similar to our statistical analyses of the ΔisomH°(g) estimates, linear regressions of the calculated ΔfH°(g) error against the experimental ΔfH°(g) for the various compound class/theoretical method combinations gave either statistically insignificant (p > 0.05) correlations or a poor quality of fit (|r| < 0.75) with low values for the regression slope:
B3LYP/6-311++G(d,p): pure HCs (r = 0.46, p < 10^-17, m = 0.12 (kcal/mol)/(kcal/mol)), nitrogen HCs (r = 0.12, p = 0.21), oxygen HCs (r = 0.29, p < 0.001, m = 0.08), sulfur HCs (r = 0.63, p < 10^-6, m = 0.32), and halogen HCs (r = 0.29, p = 0.08).
B3LYP/6-311+G(3df,3p): pure HCs (r = 0.54, p < 10^-25, m = 0.12), nitrogen HCs (r = 0.07, p = 0.46), oxygen HCs (r = 0.32, p < 10^-4, m = 0.06), sulfur HCs (r = 0.66, p < 10^-7, m = 0.24), and halogen HCs (r = 0.47, p < 0.01, m = 0.05).
M062X/6-311++G(d,p): pure HCs (r = 0.58, p < 10^-31, m = 0.07), nitrogen HCs (r = 0.07, p = 0.49), oxygen HCs (r = 0.29, p < 0.001, m = 0.04), sulfur HCs (r = 0.64, p < 10^-6, m = 0.15), and halogen HCs (r = 0.11, p = 0.51).
M062X/6-311+G(3df,3p): pure HCs (r = 0.74, p < 10^-57, m = 0.05), nitrogen HCs (r = 0.06, p = 0.53), oxygen HCs (r = 0.07, p = 0.41), sulfur HCs (r = 0.68, p < 10^-8, m = 0.05), and halogen HCs (r = 0.14, p = 0.39).
CBS-Q//B3: pure HCs (r = 0.03, p = 0.64), nitrogen HCs (r = 0.46, p < 10^-5, m = 0.02), oxygen HCs (r = 0.33, p < 10^-4, m = 0.01), sulfur HCs (r = 0.01, p = 0.97), and halogen HCs (r = 0.01, p = 0.99).
G4MP2: pure HCs (r = 0.30, p < 10^-7, m = 0.01), nitrogen HCs (r = 0.32, p < 0.001, m = 0.01), oxygen HCs (r = 0.31, p < 10^-4, m = 0.01), sulfur HCs (r = 0.06, p = 0.68), and halogen HCs (r = 0.57, p < 0.001, m = 0.02).
Similar regression of the ΔfH°(g) error against molecular mass for all compounds within a particular computational method yielded reasonably strong correlations and modest positive slopes for the B3LYP/6-311++G(d,p) (r = 0.88, p < 10^-226, m = 0.35 (kcal/mol)/(g/mol)), B3LYP/6-311+G(3df,3p) (r = 0.75, p < 10^-125, m = 0.24), and M062X/6-311++G(d,p) (r = 0.80, p < 10^-156, m = 0.13) levels of theory. Much weaker regression qualities of fit and lower slopes were found for the M062X/6-311+G(3df,3p) (r = 0.27, p < 10^-12, m = 0.03), CBS-Q//B3 (r = 0.25, p < 10^-11, m = 0.02), and G4MP2 (r = 0.10, p = 0.01, m = 0.004) methods.
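To make the link between atomization-based ΔfH°(g) values and isomerization enthalpies concrete, the following minimal sketch uses the common bookkeeping in which a calculated atomization enthalpy is combined with experimental atomic enthalpies of formation. This is an assumed, simplified form that may differ in detail from the procedure of Ref. [38]; the function name and all numerical inputs are hypothetical placeholders, not results from this work or from any real calculation.

```python
HARTREE_TO_KCAL = 627.5095  # approximate hartree -> kcal/mol conversion factor

def formation_enthalpy(h_molecule, composition, h_atoms_calc, dfh_atoms_exp):
    """Enthalpy of formation (kcal/mol) via an atomization approach.

    h_molecule    : calculated molecular enthalpy (hartree)
    composition   : element -> number of atoms, e.g. {"C": 4, "H": 10}
    h_atoms_calc  : element -> calculated atomic enthalpy (hartree)
    dfh_atoms_exp : element -> experimental atomic enthalpy of formation (kcal/mol)
    """
    atoms_calc = sum(n * h_atoms_calc[el] for el, n in composition.items())
    atoms_exp = sum(n * dfh_atoms_exp[el] for el, n in composition.items())
    atomization = (atoms_calc - h_molecule) * HARTREE_TO_KCAL
    return atoms_exp - atomization

# Placeholder inputs for two hypothetical C4H10 isomers (illustration only).
composition = {"C": 4, "H": 10}
h_atoms_calc = {"C": -37.8, "H": -0.5}
dfh_atoms_exp = {"C": 171.0, "H": 52.0}
h_isomer_a = -158.400
h_isomer_b = -158.403

dfh_a = formation_enthalpy(h_isomer_a, composition, h_atoms_calc, dfh_atoms_exp)
dfh_b = formation_enthalpy(h_isomer_b, composition, h_atoms_calc, dfh_atoms_exp)

d_isom = dfh_b - dfh_a                                   # via formation enthalpies
direct = (h_isomer_b - h_isomer_a) * HARTREE_TO_KCAL     # via molecular enthalpies
print(f"from formation enthalpies: {d_isom:.3f} kcal/mol")
print(f"from molecular enthalpies: {direct:.3f} kcal/mol")
```

Because both isomers share the same composition, the atomic terms drop out of the difference. Any error that depends only on composition therefore cancels in ΔisomH°(g) while remaining in each individual ΔfH°(g), which is one simple way to picture how a method can show large, size-dependent ΔfH°(g) errors yet still give reasonable isomerization enthalpies.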
We note that the molecular mass scaling error is nearly absent at the M062X/6-311+G(3df,3p) level, having a ΔfH°(g) error against molecular mass slope more than fivefold lower than with the 6-311++G(d,p) basis set and a substantial reduction in the quality of fit. In contrast, the mass scaling error is only reduced modestly in terms of both magnitude and quality of fit in moving from the 6-311++G(d,p) to 6-311+G(3df,3p) basis set with the B3LYP functional. The findings are consistent with the known ΔfH°(g) scaling error of various DFT methods using atomization approaches which is generally absent when higher level composite methods or modern DFT functionals with more balanced basis sets are employed [30,31,37–47].
In conclusion, the PM6 and PDDG semiempirical methods offer modest ΔisomH°(g) prediction performance approximately comparable to the B3LYP density functional, but at much lower computational cost. For pure hydrocarbons, both semiempirical methods significantly outperform the B3LYP functional for ΔisomH°(g) estimation. The M062X density functional offers nearly equivalent ΔisomH°(g) prediction accuracy to the higher level, and much more expensive, CBS-Q//B3 and G4MP2 methods across all hydrocarbon classes. Thus, for very large systems where composite methods are too expensive, pure and functionalized hydrocarbon derivative isomerization enthalpies can be reliably estimated with the M062X density functional. With the 6-311+G(3df,3p) basis set, the M062X functional also provides ΔfH°(g) estimation accuracy using the atomization method near that of the composite CBS-Q//B3 method.
Acknowledgements
This work was made possible by the facilities of the Western Canada Research Grid (WestGrid: www.westgrid.ca; project 100185), the Shared Hierarchical Academic Research Computing Network (SHARCNET: www.sharcnet.ca; project sn4612), and Compute/Calcul Canada. Thanks are extended to an anonymous reviewer whose helpful suggestions improved the quality of the work.
Appendix A. Supplementary material
Supplementary data associated with this article can be found, in the online version, at doi:10.1016/j.theochem.2010.02.030.
References
[1] K.W. Sattelmeyer, J. Tirado-Rives, W.L. Jorgensen, J. Phys. Chem. A 110 (2006) 13551.
[2] S. Grimme, Angew. Chem., Int. Ed. 45 (2006) 4460.
[3] S. Grimme, M. Steinmetz, M. Korth, J. Org. Chem. 72 (2007) 2118.
[4] T. Schwabe, S. Grimme, Phys. Chem. Chem. Phys. 9 (2007) 3397.
[5] M.P. Repasky, J. Chandrasekhar, W.L. Jorgensen, J. Comput. Chem. 23 (2002) 1601.
[6] J. Stewart, J. Mol. Model. 10 (2004) 6.
[7] S. Grimme, M. Steinmetz, M. Korth, J. Chem. Theor. Comput. 3 (2007) 42.
[8] J. Tirado-Rives, W.L. Jorgensen, J. Chem. Theor. Comput. 4 (2008) 297.
[9] M.J. Frisch, G.W. Trucks, H.B. Schlegel, G.E. Scuseria, M.A. Robb, J.R. Cheeseman, G. Scalmani, V. Barone, B. Mennucci, G.A. Petersson, H. Nakatsuji, M. Caricato, X. Li, H.P. Hratchian, A.F. Izmaylov, J. Bloino, G. Zheng, J.L. Sonnenberg, M. Hada, M. Ehara, K. Toyota, R. Fukuda, J. Hasegawa, M. Ishida, T. Nakajima, Y. Honda, O. Kitao, H. Nakai, T. Vreven, J.A. Montgomery, J.E. Peralta, F. Ogliaro, M. Bearpark, J.J. Heyd, E. Brothers, K.N. Kudin, V.N. Staroverov, R. Kobayashi, J. Normand, K. Raghavachari, A. Rendell, J.C. Burant, S.S. Iyengar, J. Tomasi, Gaussian-09, Revision A.02, Gaussian, Inc., Wallingford, CT, 2009.
[10] J. Stewart, J. Mol. Model. 13 (2007) 1173.
[11] M.J.S. Dewar, E.G. Zoebisch, E.F. Healy, J.J.P. Stewart, J. Am. Chem. Soc. 107 (1985) 3902.
[12] J.J.P. Stewart, J. Comput. Chem. 10 (1989) 209.
[13] M. Frisch, G. Scalmani, T. Vreven, G. Zheng, Mol. Phys. 107 (2009) 881.
[14] A.D. Becke, J. Chem. Phys. 98 (1993) 5648.
[15] C. Lee, W. Yang, R.G. Parr, Phys. Rev. B 37 (1988) 785.
[16] Y. Zhao, D. Truhlar, Theor. Chem. Acc. 120 (2008) 215.
[17] R. Ditchfield, W.J. Hehre, J.A. Pople, J. Chem. Phys. 54 (1971) 724.
[18] W.J. Hehre, R. Ditchfield, J.A. Pople, J. Chem. Phys. 56 (1972) 2257.
[19] V. Rassolov, J.A. Pople, M. Ratner, P.C. Redfern, L.A. Curtiss, J. Comput. Chem. 22 (2001) 976.
[20] T.H. Dunning, J. Chem. Phys. 90 (1989) 1007.
[21] R.A. Kendall, T.H. Dunning, R.J. Harrison, J. Chem. Phys. 96 (1992) 6796.
[22] E.R. Davidson, Chem. Phys. Lett. 260 (1996) 514.
[23] J.A. Montgomery, M.J. Frisch, J.W. Ochterski, G.A. Petersson, J. Chem. Phys. 110 (1999) 2822.
[24] J.A. Montgomery, M.J. Frisch, J.W. Ochterski, G.A. Petersson, J. Chem. Phys. 112 (2000) 6532.
[25] L.A. Curtiss, P.C. Redfern, K. Raghavachari, J. Chem. Phys. 126 (2007) 84108.
[26] L.A. Curtiss, P.C. Redfern, K. Raghavachari, J. Chem. Phys. 127 (2007) 124105.
[27] J.B. Pedley, Thermochemical Data and Structures of Organic Compounds, Texas A & M University, College Station, TX, 1994.
[28] D.R. Lide, CRC Handbook of Chemistry and Physics, Taylor & Francis, Boca Raton, FL, 2008.
[29] H. Afeefy, J. Liebman, S. Stein, NIST Chemistry WebBook, NIST Standard Reference Database Number 69, National Institute of Standards and Technology, Gaithersburg, MD, USA, 2009.
[30] D. Bond, J. Org. Chem. 72 (2007) 5555.
[31] D. Bond, J. Org. Chem. 72 (2007) 7313.
[32] A.A. Zavitsas, N. Matsunaga, D.W. Rogers, J. Phys. Chem. A 112 (2008) 5734.
[33] A.A. Voityuk, J. Chem. Theor. Comput. 4 (2008) 1877.
[34] D.W. Scott, Chemical Thermodynamic Properties of Hydrocarbons and Related Substances. Properties of the Alkane Hydrocarbons, C1 through C10 in the Ideal Gas State from 0 to 1500 K (Bulletin 666), United States Bureau of Mines, Washington, DC, 1974.
[35] P.R. Schreiner, A.A. Fokin, R.A. Pascal, A. de Meijere, Org. Lett. 8 (2006) 3635.
[36] Y. Zhao, D. Truhlar, Org. Lett. 8 (2006) 5753.
[37] E. Taskinen, J. Phys. Org. Chem. 22 (2009) 632.
[38] M. Saeys, M. Reyniers, G. Marin, V. Van Speybroeck, M. Waroquier, J. Phys. Chem. A 107 (2003) 9147.
[39] M.D. Wodrich, C. Corminboeuf, P. von Rague Schleyer, Org. Lett. 8 (2006) 3631.
[40] M.D. Wodrich, C. Corminboeuf, P.R. Schreiner, A.A. Fokin, P. von Rague Schleyer, Org. Lett. 9 (2007) 1851.
[41] L. Curtiss, K. Raghavachari, P. Redfern, J. Pople, J. Chem. Phys. 112 (2000) 7374.
[42] P. Redfern, P. Zapol, L. Curtiss, K. Raghavachari, J. Phys. Chem. A 104 (2000) 5850.
[43] N. Haworth, M. Smith, G. Bacskay, J. Mackie, J. Phys. Chem. A 104 (2000) 7600.
[44] G. Tasi, R. Izsak, G. Matisz, A. Csaszar, M. Kallay, B. Ruscic, J. Stanton, Chem. Phys. Chem. 7 (2006) 1664.
[45] M. Sabbe, M. Saeys, M. Reyniers, G. Marin, V. Van Speybroeck, M. Waroquier, J. Phys. Chem. A 109 (2005) 7466.
[46] J. Gomes, M. Ribeiro da Silva, J. Phys. Chem. A 108 (2004) 11684.
[47] G. Blanquart, H. Pitsch, J. Phys. Chem. 111 (2007) 6510.