Machine learning enabled high-throughput screening of hydrocarbon molecules for the design of next generation fuels


Fuel 265 (2020) 116968

Contents lists available at ScienceDirect

Fuel journal homepage: www.elsevier.com/locate/fuel

Full Length Article




Guozhu Li a,b,⁎, Zheng Hu a, Fang Hou a, Xinyu Li a, Li Wang a,b, Xiangwen Zhang a,b

a Key Laboratory for Green Chemical Technology of Ministry of Education, School of Chemical Engineering and Technology, Tianjin University, Tianjin 300072, China
b Collaborative Innovation Center of Chemical Science and Engineering (Tianjin), Tianjin 300072, China

ARTICLE INFO

Keywords: Machine learning; Fuel; High-throughput screening; Hydrocarbon; DFT; Group-contribution

ABSTRACT

Next-generation high energy density hydrocarbon (HEDH) fuels are urgently needed to extend the range of propulsion systems and to meet the additional requirements of new engines. We develop a facile and efficient methodology based on machine learning enabled high-throughput screening to accelerate the design of next-generation fuels, and present a proof-of-concept study for discovering new HEDH fuels. The approach screens 319,895 hydrocarbon molecules using threshold values on key fuel properties, and identifies a group of 28 highly promising hydrocarbon molecules with high net heat of combustion, high specific impulse, high density, and low melting point. The discovered molecules possess distinctive ring compositions and unique spatial structures, which can direct synthetic efforts toward next-generation HEDH fuels. This strategy not only uncovers a new group of polycyclic molecules as competitive fuel candidates but also accelerates the development of new HEDH fuels.

⁎ Corresponding author at: Key Laboratory for Green Chemical Technology of Ministry of Education, School of Chemical Engineering and Technology, Tianjin University, Tianjin 300072, China. E-mail address: [email protected] (G. Li).

https://doi.org/10.1016/j.fuel.2019.116968 Received 21 October 2019; Received in revised form 5 December 2019; Accepted 27 December 2019 0016-2361/ © 2019 Elsevier Ltd. All rights reserved.


1. Introduction

High energy density hydrocarbon (HEDH) fuels are key materials for aircraft, rockets, and launchers, extending their flight range and increasing their payload [1–3]. Many qualified HEDH fuels have been designed and synthesized to date, including the widely used JP-10 fuel [4], polycyclic RJ-5 and RJ-7 [2], strained and caged quadricyclane [5,6], and renewable fuels from biomass [7–9]. The propulsion performance of an engine depends strongly on the properties of the fuel. For instance, aerospace vehicles with turbine, turbofan, ramjet, rocket, or combined engines use different fuels because their performance requirements differ. New HEDH fuels therefore have to be developed continuously to extend the range of propulsion systems and to meet the changing requirements imposed by new engine designs.

In recent years, high-throughput computational estimation has emerged as a major driver for designing new HEDH fuels [10–12]. It typically combines group-contribution approaches [13–16] with first-principles quantum-mechanical calculations [17] to quickly predict the physicochemical and energy properties of various hydrocarbon molecules. The search for new hydrocarbon structures as candidates for next-generation HEDH fuels should be accelerated to keep pace with rapidly developing engine technologies, so a robust method for efficiently discovering new HEDH fuels would be of great significance.

Many groups have successfully employed modern machine-learning models to predict target properties for the discovery of new materials, e.g., electrochemical anodes [18], conducting materials [19], antimicrobial peptides [20], and superhard materials [21]. For the prediction of fuel properties, machine-learning methods can bypass time-consuming group-contribution analyses and expensive quantum-mechanical calculations [22]. Structural descriptors can be used to predict fuel properties more efficiently [23–25]. For instance, multiple molecular properties have been predicted simultaneously by a single neural network. We recently initiated a research effort to predict multiple molecular properties with various neural networks [26]. However, discovering new HEDH fuels additionally requires database establishment and high-throughput screening of hydrocarbon molecules, which remain to be addressed.

In this work, a facile and efficient methodology based on machine learning has been developed to accelerate the design of next-generation fuels, and a proof-of-concept study has been conducted for discovering new HEDH fuels. A small database containing the molecular structures and properties of 342 hydrocarbon fuels was first established, from which the structure-property relationship was mined by machine learning. Then, a large structure-property database containing 319,895 hydrocarbon molecules was built, in which high-throughput screening for new HEDH fuels was carried out. The schematic diagram of this work is shown in Scheme 1.

2. Materials and methods

2.1. The small database of 342 hydrocarbon molecules

A training dataset containing 342 saturated hydrocarbon molecules was first established. Molecular structures were optimized at the B3LYP/6-31G(d,p) level of DFT using Gaussian 09. Atomic positions are given as Cartesian coordinates in XYZ format. The physicochemical and energy properties of all hydrocarbon molecules in the database were calculated using the group-contribution method and the DFT approach. Density at 298 K (ρ), melting point (Tm), boiling point (Tb), critical temperature (Tc), critical pressure (Pc), critical volume (Vc), standard enthalpy of vaporization at 298 K (ΔvapH⁰), standard enthalpy of fusion at 298 K (ΔfusH⁰), and flash point in air at atmospheric pressure (FP) were calculated from the values of the corresponding groups using the group-contribution method (Table S1) [13–15]. Details are described in the supporting information.

The sum of electronic and zero-point energies (U0), the sum of electronic and thermal energies (U), the sum of electronic and thermal enthalpies (H), and the sum of electronic and thermal free energies (G) were calculated for all molecules at the B3LYP/6-31G(d,p) level using Gaussian 09. Using the calculated sum of electronic and thermal enthalpies, the standard formation enthalpy of the vapor phase at 298 K (ΔfH⁰₂₉₈(g)) was calculated according to homodesmotic reactions for thermochemistry [27]. The net heat of combustion (NHOC) was then calculated from it. The specific impulse (Isp) of the fuel was calculated by an empirical equation given in the supplemental text. The calculated properties of 8 HEDH fuels are compared with their experimental values in Table S2. The agreement indicates that the calculation methods employed in this work are accurate enough for database establishment. The database containing the molecular structures and properties of the 342 hydrocarbon molecules was thus established.

2.2. Data representation

The Coulomb matrix has previously been shown to be an effective representation for predicting molecular properties. A Coulomb matrix containing molecular structure and nuclear charge information was selected as the input in this work. It was calculated using the formulas shown in Fig. 1, which also displays the Coulomb matrix tensor of JP-10. Here Ri denotes the Cartesian coordinates and Zi the nuclear charge of atom i. Off-diagonal elements correspond to the Coulomb repulsion between atoms i and j, while diagonal elements encode a polynomial fit (0.5 Zi^2.4) of atomic energies to nuclear charges. The molecules in our database contain from 9 to 55 atoms. Smaller matrices were padded with zeros to obtain matrices of uniform dimension. All molecules were thus converted to a new data set consisting of a molecular information tensor (342 × 55 × 55) and a tensor of calculated properties. The “CoulombMatrix” module in the “molml” package was employed to convert our input data to Coulomb matrices [28,29]. Previous studies showed that the Coulomb matrix gives excellent results in energy prediction [30,31].

In our database, each Coulomb matrix contains 3025 (55 × 55) elements. Flattening it would give a 3025-dimensional array, and the computational cost would be high. We therefore chose a singular value decomposition (SVD) transformation, which sacrifices a little information but yields a much lower dimension (55, the square root of 3025), as shown in Eq. (1). The SVD results can be regarded as generalized eigenvalues of the matrix and retain much of the information in the original matrix.

$$ \mathbf{C}\,\mathbf{v}_i = \lambda_i\,\mathbf{v}_i, \qquad i = 1, \dots, 55 \tag{1} $$

where C is the Coulomb matrix with elements Cij.

Scheme 1. Schematic diagram for discovering new fuels via machine learning enabled high-throughput screening.

Fig. 1. The Coulomb matrix tensor of JP-10. Dark red color means larger value, and dark green indicates smaller value. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

3. Results and discussions

3.1. Machine learning

Machine learning was first conducted on a self-built database containing 342 hydrocarbon structures and their properties, including melting point (Tm), boiling point (Tb), critical temperature (Tc), critical pressure (Pc), critical volume (Vc), standard enthalpy of vaporization at 298 K (ΔvapH⁰), standard enthalpy of fusion at 298 K (ΔfusH⁰), flash point in air at atmospheric pressure (FP), density at 298 K (ρ), the sum of electronic and zero-point energies (U0), the sum of electronic and thermal energies (U), the sum of electronic and thermal enthalpies (H), the sum of electronic and thermal free energies (G), the standard formation enthalpy of the vapor phase at 298 K (ΔfH⁰₂₉₈(g)), the net heat of combustion (NHOC), volumetric NHOC (VNHOC), and specific impulse (Isp). The database contains typical hydrocarbon fuels, including chain paraffins (e.g., n-octane, n-dodecane), multi-cyclic fuels (e.g., RJ-4, RJ-5, JP-10, tetracyclic heptane), and strained and caged fuels (e.g., cubane, adamantane). Based on these typical structures, new hydrocarbon derivatives were designed via substitution by methyl, ethyl, isopropyl, or cyclopropyl groups. Statistical analyses of carbon atom number and hydrocarbon unsaturation for the 342 molecules in the database are shown in Fig. 2.

The Coulomb matrix, containing molecular structure and nuclear charge information, was selected as the input for machine learning (Fig. 1). The single layer neural network model used in this work is shown schematically in Fig. S1. Fig. 3 shows the quality of fit for the target melting point (Tm) using a single layer neural network (SLNN). The other properties (FP, ΔfusH⁰, G, H, Isp, NHOC, Pc, Tb, Tc, ρ, U, U0, ΔvapH⁰, Vc, VNHOC) were also learned and predicted, as shown in Figs. S2–S16. Data points distributed near the diagonal (y = x) indicate good prediction. The error analysis shows that the single layer neural network performed well on all three datasets (training, validation, and test). Similar performance on the training and test datasets indicates that overfitting is very weak even though the dataset is small. Notably, the energy properties U0, U, H, and G were predicted with high accuracy (R > 0.9999) by the as-developed neural networks. This is ascribed to the energy form of the Coulomb matrix, which guarantees its good correlation with the energy properties.

The mean absolute errors (MAEs) of prediction are summarized in Table 1. The prediction errors have been effectively suppressed. It can be concluded that a function mapping structure information (eigenvalues of the Coulomb matrix) to a given property was constructed. The errors are acceptable for our screening process, in which a group of candidates is generated by comparing relative values in the database. The SLNN has acceptable accuracy, requires few parameters, and computes quickly, which makes it an efficient tool for building a large database. Therefore, the as-developed SLNNs were employed to generate target molecular properties from given molecular structures.
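The representation described in Section 2.2 can be illustrated with a short NumPy sketch. It is not the authors' code: the text states that the molml package was used, whereas `coulomb_matrix` and `singular_values` below are hypothetical helpers written for illustration. The off-diagonal form Zi·Zj/|Ri − Rj| is the standard Coulomb-matrix definition of Ref. [30] and is assumed here, since the formulas themselves appear only in Fig. 1. The methane geometry is an approximate toy input, and no unit conversion of the coordinates is attempted.

```python
import numpy as np

PADDED_SIZE = 55  # largest atom count in the 342-molecule database (from the text)


def coulomb_matrix(charges, coords, size=PADDED_SIZE):
    """Zero-padded Coulomb matrix of one molecule (illustrative helper).

    charges: nuclear charges Z_i, length n
    coords:  Cartesian coordinates R_i, shape (n, 3)
    """
    z = np.asarray(charges, dtype=float)
    r = np.asarray(coords, dtype=float)
    n = len(z)
    if n > size:
        raise ValueError("molecule has more atoms than the padded size")
    # Pairwise distances |R_i - R_j|
    dist = np.linalg.norm(r[:, None, :] - r[None, :, :], axis=-1)
    np.fill_diagonal(dist, 1.0)            # placeholder, diagonal is overwritten below
    cm = np.outer(z, z) / dist             # off-diagonal: Z_i * Z_j / |R_i - R_j|
    np.fill_diagonal(cm, 0.5 * z ** 2.4)   # diagonal: 0.5 * Z_i^2.4
    padded = np.zeros((size, size))
    padded[:n, :n] = cm                    # fill the remainder with zeros
    return padded


def singular_values(matrix):
    """Singular values in descending order: 55 numbers instead of 3025."""
    return np.linalg.svd(matrix, compute_uv=False)


# Toy input (assumption): methane with an approximate tetrahedral geometry.
charges = [6, 1, 1, 1, 1]
coords = [
    [0.000, 0.000, 0.000],
    [0.629, 0.629, 0.629],
    [-0.629, -0.629, 0.629],
    [-0.629, 0.629, -0.629],
    [0.629, -0.629, -0.629],
]

cm = coulomb_matrix(charges, coords)
features = singular_values(cm)
print(cm.shape, features.shape)   # (55, 55) (55,)
print(np.round(features[:5], 3))  # at most the first 5 values are non-zero for 5 atoms
```

Because the Coulomb matrix is symmetric, its singular values equal the absolute values of its eigenvalues, which is consistent with the text's description of the SVD result as a generalized eigenvalue.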

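The single layer neural network of Section 3.1 can be sketched in plain NumPy as follows. The actual architecture is given only in Fig. S1, so everything specific here is an assumption made for illustration: 16 tanh hidden units, full-batch gradient descent on the mean squared error, a 70/15/15 split, and synthetic data standing in for the 55 singular-value features and a standardized target property. The function names `train_slnn`, `predict`, `mae`, and `pearson_r` are hypothetical.

```python
import numpy as np


def train_slnn(x, y, hidden=16, lr=0.02, epochs=1500, seed=0):
    """Fit y ~ tanh(x W1 + b1) . w2 + b2 by full-batch gradient descent on the MSE."""
    rng = np.random.default_rng(seed)
    w1 = 0.1 * rng.standard_normal((x.shape[1], hidden))
    b1 = np.zeros(hidden)
    w2 = 0.1 * rng.standard_normal(hidden)
    b2 = 0.0
    losses = []
    for _ in range(epochs):
        h = np.tanh(x @ w1 + b1)                      # hidden-layer activations
        err = h @ w2 + b2 - y                         # prediction error
        losses.append(float(np.mean(err ** 2)))
        g_out = 2.0 * err / len(y)                    # d(MSE)/d(prediction)
        g_hid = np.outer(g_out, w2) * (1.0 - h ** 2)  # backpropagate through tanh
        g_w2, g_b2 = h.T @ g_out, g_out.sum()
        g_w1, g_b1 = x.T @ g_hid, g_hid.sum(axis=0)
        w2 -= lr * g_w2
        b2 -= lr * g_b2
        w1 -= lr * g_w1
        b1 -= lr * g_b1
    return (w1, b1, w2, b2), losses


def predict(params, x):
    w1, b1, w2, b2 = params
    return np.tanh(x @ w1 + b1) @ w2 + b2


def mae(y_true, y_pred):
    return float(np.mean(np.abs(y_true - y_pred)))


def pearson_r(y_true, y_pred):
    return float(np.corrcoef(y_true, y_pred)[0, 1])


# Synthetic stand-in data (assumption): 300 samples, 8 features, standardized target.
rng = np.random.default_rng(1)
x = rng.standard_normal((300, 8))
y = x @ rng.standard_normal(8)
y = (y - y.mean()) / y.std()

splits = {"train": slice(0, 210), "validation": slice(210, 255), "test": slice(255, 300)}
params, losses = train_slnn(x[splits["train"]], y[splits["train"]])

for name, idx in splits.items():
    pred = predict(params, x[idx])
    print(f"{name:10s} MAE = {mae(y[idx], pred):.4f}  R = {pearson_r(y[idx], pred):.4f}")
```

Comparing MAE and R across the three splits mirrors the check described in the text: similar errors on the training and test sets suggest weak overfitting.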
3.2. Construction of a large database

Molecular structures in the hydrocarbon subset of GDB-13 were then input into the neural networks. Using a loop, we applied the trained neural-network functions to all arrays in the GDB library and calculated a list of properties for each molecule. This process was run on a personal computer (i5 7th-generation CPU, 8 GB RAM) and took 13.4 s (Table S3). Finally, a new database containing 319,895 molecular structures and their properties was established.

The data were visualized to gain a global view. Fig. S17 shows the Isp values for the 319,895 molecules in the database. Some molecules exhibit clearly higher Isp values than the others and are potential candidates for the design of next-generation HEDH fuels. It should be noted that our Isp results were defined mainly for a specific kind of rocket engine, as reported by Savos’kin et al. [32]. Specific impulse depends on the structure of the engine, so its value may change when a different engine is used. When three properties are considered for each molecule, a two-dimensional heat-map figure can be drawn. In Fig. S18, the melting points are shown together with the NHOC and Isp values for all the molecules. Fig. S19 displays the values of Tb, NHOC and Isp for all the molecules. A rough screening of the hydrocarbon molecules can be done on both figures. When the four properties describing fuel performance, i.e., ρ, Tm, NHOC and Isp, are displayed simultaneously, Fig. 4 is obtained. Based on the distribution of the data points, molecules with outstanding properties can be selected, while unsuitable molecules are quickly ruled out.

Fig. 2. The statistical data of (a) carbon atom number and (b) hydrocarbon unsaturation for the 342 molecules in the database.

3.3. High-throughput screening

We ranked the molecules in the database by several key properties and identified the top-10 molecules for each property (density, melting point, net heat of combustion, and specific impulse). The best molecules differ from property to property, and no single hydrocarbon molecule is superior to the others in all properties. In practice, a qualified HEDH fuel should possess one or two outstanding properties while its other properties just meet the basic requirements. Therefore, high-throughput screening of potential HEDH fuels was conducted in the database. The filter criteria, i.e., density, melting point, net heat of combustion, and specific impulse, are the key parameters for screening a group of HEDH fuels. By changing the threshold values, various candidates with different performances can be discovered. Each threshold was set either to a minimum percentage of the highest value in the database or to a specific value that should be achieved or exceeded. Herein, the melting point criterion was set to below 273.15 K to find potential liquid fuels. When the net heat of combustion was additionally required to exceed 85% of the maximum value and the specific impulse to exceed 80% of the maximum value (NHOC > 85%*max(NHOC), Isp > 80%*max(Isp)), 28 molecules were obtained, all of which are new hydrocarbon structures compared with those of traditional HEDH fuels. Their molecular structures are displayed in Fig. 5, and the corresponding ball-and-stick models after structure optimization are summarized in Fig. S20.

Half of the 28 screened molecules have four carbon rings (the first two rows of Fig. 5), and the other half have five rings (the last two rows of Fig. 5). The most common ring is the five-membered carbon ring, which appears in 96.4% of the molecules. Moreover, 92.9% of the molecules have a three-membered carbon ring, and 60.7% and 35.7% of the molecules have four-membered and six-membered carbon rings, respectively. The ring composition of both types of molecules (five-ring and four-ring structures) was also analyzed, as shown in Fig. 6. In the five-ring molecules, the contents of three-, four-, five-, and six-membered carbon rings are 32.9%, 20.0%, 40.0% and 5.7%, respectively. In the four-ring molecules, the contents of three-, four-, five-, and six-membered carbon rings are 30.4%, 14.3%, 39.3% and 10.7%, respectively. Besides their unique spatial structure, another distinguishing feature of the newly discovered fuel candidates is the presence of three-membered carbon rings, which are seldom found in traditional HEDH fuels. Cyclopropanation is therefore suggested as a general route to synthesize new HEDH fuels or to improve the performance of existing fuels.

When stricter standards were applied during screening (Tm < 273.15 K, NHOC > 90%*max(NHOC), Isp > 80%*max(Isp), and ρ > 80%*max(ρ)), only one molecule (No. 251579) met all the requirements. Shown in the first position of Fig. 5, molecule 251579 has an elegant and concise structure: it is a tetracyclic molecule consisting of one three-membered ring, one four-membered ring, one five-membered ring and one six-membered ring.

To validate our screening results, the properties of the 28 molecules were also calculated by DFT and group-contribution methods. Fig. S21 compares the values of melting point, flash point, net heat of combustion, and specific impulse of the 28 molecules predicted by machine learning (black squares) and calculated by DFT and group contribution (red dots). Even though the absolute values differ, the data trends from both methods are consistent. The distinctive molecular structures together with the verifiable data trends of the properties indicate that our screening method is effective. The properties of the 28 hydrocarbon molecules calculated by group contribution and DFT are summarized in Table 2. These molecules have advantages in one or two properties, while their other properties are comparable to those of JP-10 and quadricyclane (QC). In addition, the density and flash point of the fuel candidates are generally higher than those of JP-10 and QC. Owing to their unique structures and outstanding properties, the 28 discovered molecules are competitive candidates for next-generation HEDH fuels.

In the future, the screened compounds will be synthesized, and their properties can be measured experimentally to confirm our predictions. As the accuracy and quality of the initial dataset improve and the prediction errors of the neural networks decrease, the combinatorial screening procedure will be employed to explore broader compositions and structures of hydrocarbon molecules for wider applications. In addition, the high-throughput screening methodology based on machine learning is easily extended to other fields to assist the design of new molecules. It is anticipated that this method would also accelerate the discovery of other functional molecules, e.g., explosives, lubricants, and additives.

Fig. 3. Predictions of melting point (Tm) by neural network via training, validation and test.

Table 1. MAE values for the prediction of various molecular properties using SLNNs.

Property    Unit       MAE
Tm          K          11.47
Tb          K          7.510
Tc          K          11.51
Pc          bar        2.525
Vc          cm³/mol    10.51
ΔvapH⁰      kJ/mol     1.275
ΔfusH⁰      kJ/mol     1.992
FP          °C         4.029
ρ           g/cm³      0.05152
U0          Ha         0.09222
U           Ha         0.09365
H           Ha         0.2885
G           Ha         1.075
NHOC        MJ/kg      0.3651
VNHOC       MJ/L       2.602
Isp         s          1.447

Fig. 4. The values of NHOC (x axis), Isp (y axis), ρ (z axis) and Tm (color depth of the dots) for the 319,895 molecules. The deeper color of the dots means higher melting temperature.

Fig. 5. Molecular structures of the as-screened 28 hydrocarbon molecules.

Fig. 6. Ring composition of the 28 hydrocarbon molecules. Cx (x = 3, 4, 5, 6, 7) means x-membered cyclic carbon ring.

Table 2. Calculated properties of the new hydrocarbon molecules discovered by high-throughput screening and of the traditional fuels JP-10 and quadricyclane (QC).

No.      Formula   Tm/K    Isp/s   NHOC/MJ·kg−1   ρ/g·cm−3   FP/°C
705      C13H18    292.7   341.4   42.44          1.36       84.0
706      C13H18    292.7   341.6   42.48          1.36       84.0
5375     C13H18    280.5   341.4   42.43          1.39       78.1
6377     C13H18    304.0   341.2   42.38          1.32       89.7
9813     C13H20    299.1   341.9   43.10          1.19       90.3
20429    C13H18    289.6   341.1   42.35          1.33       76.8
22185    C13H18    301.1   341.3   42.40          1.29       82.7
26771    C13H18    311.8   340.8   42.29          1.26       88.3
33743    C13H18    289.6   341.7   42.51          1.33       76.8
33744    C13H20    293.1   340.2   42.67          1.15       76.3
41131    C13H20    293.1   338.7   42.29          1.15       76.3
53278    C13H20    275.5   340.2   42.68          1.14       80.3
55744    C13H18    284.9   341.0   42.32          1.22       86.5
65884    C13H20    303.7   359.5   47.66          1.08       80.8
81434    C13H18    293.4   343.9   43.06          1.22       85.3
81974    C13H18    296.7   342.4   42.67          1.18       74.5
82301    C13H18    281.3   339.9   42.06          1.25       79.4
82621    C13H20    291.5   340.0   42.61          1.08       75.5
82630    C13H20    288.4   341.2   42.93          1.07       68.0
118415   C13H20    275.5   338.6   42.28          1.14       80.3
153954   C13H20    220.4   339.9   42.59          1.19       76.8
178067   C13H20    231.5   340.3   42.70          1.08       74.6
250609   C12H16    280.7   348.2   43.95          1.37       70.3
251579   C12H18    274.9   340.0   42.47          1.23       71.2
257866   C12H18    271.4   340.2   42.55          1.21       63.5
261824   C12H18    277.7   340.2   42.53          1.16       66.2
268141   C12H18    279.4   341.6   42.89          1.08       61.0
304408   C11H14    280.9   338.9   41.40          1.34       62.1
JP-10    C10H16    269.3   337.4   42.17          1.04       55.7
QC       C7H8      230.0   347.4   43.02          1.09       −0.4
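The NHOC values tabulated above derive from formation enthalpies, as stated in Section 2.1, although the text does not spell out that step. A minimal sketch of it follows, assuming complete combustion of CxHy to CO2 and gaseous H2O and commonly tabulated reference enthalpies of formation. The function `net_heat_of_combustion` is hypothetical, and the fuel formation enthalpy used in the example (−100 kJ/mol for a C10H16 molecule) is a placeholder rather than a value from this work.

```python
# Standard enthalpies of formation at 298 K in kJ/mol (commonly tabulated values).
DHF_CO2_GAS = -393.51
DHF_H2O_GAS = -241.83
# Atomic masses in g/mol.
M_C, M_H = 12.011, 1.008


def net_heat_of_combustion(n_c, n_h, dhf_fuel_kj_mol):
    """NHOC in MJ/kg for CxHy + (x + y/4) O2 -> x CO2 + (y/2) H2O(g).

    The formation enthalpy of O2 is zero, so only the fuel and the two
    products contribute to the reaction enthalpy.
    """
    dh_comb = n_c * DHF_CO2_GAS + 0.5 * n_h * DHF_H2O_GAS - dhf_fuel_kj_mol  # kJ/mol, negative
    molar_mass = n_c * M_C + n_h * M_H                                       # g/mol
    return -dh_comb / molar_mass                                             # kJ/g equals MJ/kg


# Placeholder input (assumption): a C10H16 fuel with a formation enthalpy of -100 kJ/mol.
example = net_heat_of_combustion(10, 16, -100.0)
print(round(example, 2))  # 42.35
```

With this placeholder input the result falls in the same range as the NHOC column, which is only a plausibility check and not a reproduction of the reported values.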

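The threshold screening of Section 3.3 amounts to combining a few boolean masks over the property arrays. The sketch below uses the thresholds quoted in the text (Tm < 273.15 K, NHOC and Isp above a fraction of their database maxima, and an optional density criterion for the stricter screen). The function name `screen` and the five-molecule toy data are hypothetical, chosen only to make the behaviour easy to follow.

```python
import numpy as np


def screen(tm, nhoc, isp, rho=None, tm_max=273.15,
           nhoc_frac=0.85, isp_frac=0.80, rho_frac=0.80):
    """Return the indices of molecules that pass every criterion."""
    tm, nhoc, isp = (np.asarray(a, dtype=float) for a in (tm, nhoc, isp))
    mask = tm < tm_max                          # liquid at 273.15 K
    mask &= nhoc > nhoc_frac * nhoc.max()       # fraction of the best NHOC
    mask &= isp > isp_frac * isp.max()          # fraction of the best Isp
    if rho is not None:                         # optional density criterion
        rho = np.asarray(rho, dtype=float)
        mask &= rho > rho_frac * rho.max()
    return np.flatnonzero(mask)


# Toy database of five molecules (made-up numbers, not data from this work).
tm = [250.0, 300.0, 260.0, 270.0, 240.0]    # K
nhoc = [43.0, 44.0, 36.0, 42.5, 41.0]       # MJ/kg
isp = [340.0, 345.0, 342.0, 270.0, 338.0]   # s
rho = [1.2, 1.3, 1.1, 1.0, 0.9]             # g/cm3

basic = screen(tm, nhoc, isp)
strict = screen(tm, nhoc, isp, rho=rho, nhoc_frac=0.90)
print(basic.tolist())   # [0, 4]
print(strict.tolist())  # [0]
```

Because each criterion is an independent mask, further property thresholds can be added in the same way, which reflects the point that different threshold values yield different candidate sets.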
4. Conclusions

In summary, we developed a new strategy to accelerate the design of next-generation fuels based on machine learning, and conducted a proof-of-concept study for discovering new HEDH fuels. A small database containing the molecular structures and properties of 342 molecules was established, on which machine learning was carried out. We show that machine learning can map the intricate structure-property relationship and enable accurate prediction for the construction of a large database. We then generated a property library of 319,895 hydrocarbon molecules using the as-developed neural networks with the molecular structures in the GDB-13 database as input. A large-scale data-driven search for candidates for new HEDH fuels was carried out. A group of 28 hydrocarbon molecules was identified, which were found to be highly promising, with high density, high net heat of combustion, high specific impulse, and low melting point. The high-throughput screening strategy based on machine learning is a promising technique for the search for or design of new fuels. The screening procedure can be viewed as a general, systematic, and efficient method that can incorporate any fuel-property criteria of interest for the design of next-generation fuels.

Author contributions

G.L. conceived the project, carried out the analyses, and wrote the manuscript. L.W., X.Z., and G.L. supervised the project. F.H. calculated the properties using group-contribution and DFT methods, and built the small database. Z.H. conducted machine learning, built the large database and did high-throughput screening. X.L. examined the database. All authors read and approved the manuscript.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgements

The authors gratefully acknowledge financial support from the National Key Research and Development Program of China (2016YFB0600305) and the National Natural Science Foundation of China (21306132). The DFT calculations (Gaussian 09) were performed on TianHe-1(A) at National Supercomputer Center in Tianjin.

Appendix A. Supplementary data

Supplementary data to this article can be found online at https://doi.org/10.1016/j.fuel.2019.116968.

References

[1] Zhang X, Pan L, Wang L, Zou J-J. Review on synthesis and properties of high-energy-density liquid fuels: Hydrocarbons, nanofluids and energetic ionic liquids. Chem Eng Sci 2018;180:95–125.
[2] Chung HS, Chen CSH, Kremer RA, Boulton JR, Burdette GW. Recent developments in high-energy density liquid hydrocarbon fuels. Energy Fuels 1999;13(3):641–9.
[3] Pan L, Feng R, Peng H, X-t-f E, Zou J-J, Wang L, et al. A solar-energy-derived strained hydrocarbon as an energetic hypergolic fuel. RSC Adv 2014;4(92):50998–1001.
[4] Bruno TJ, Huber ML, Laesecke AD, Lemmon EW, Perkins RA. Thermochemical and Thermophysical Properties of JP-10. NIST Interagency/Internal Report (NISTIR) 6640; 2006.
[5] Dubonosov AD, Bren VA, Chernoivanov VA. Norbornadiene-quadricyclane as an abiotic system for the storage of solar energy. Russ Chem Rev 2002;71(11):917–27.
[6] Bren VA, Dubonosov AD, Minkin VI, Chernoivanov VA. Norbornadiene–quadricyclane: an effective molecular system for the storage of solar energy. Russ Chem Rev 1991;60(5):451–69.
[7] Harvey BG, Wright ME, Quintana RL. High-density renewable fuels based on the selective dimerization of pinenes. Energy Fuels 2010;24(1):267–73.
[8] Meylemans HA, Quintana RL, Goldsmith BR, Harvey BG. Solvent-free conversion of linalool to methylcyclopentadiene dimers: a route to renewable high-density fuels. ChemSusChem 2011;4(4):465–9.
[9] Xie J, Zhang X, Pan L, Nie G, Xiu-Tian-Feng E, Liu Q, et al. Renewable high-density spiro-fuels from lignocellulose-derived cyclic ketones. Chem Commun 2017;53(74):10303–5.
[10] Rokni HB, Gupta A, Moore JD, McHugh MA, Bamgbade BA, Gavaises M. Purely predictive method for density, compressibility, and expansivity for hydrocarbon mixtures and diesel and jet fuels up to high temperatures and pressures. Fuel 2019;236:1377–90.
[11] Shi X, Li H, Song Z, Zhang X, Liu G. Quantitative composition-property relationship of aviation hydrocarbon fuel based on comprehensive two-dimensional gas chromatography with mass spectrometry and flame ionization detector. Fuel 2017;200:395–406.
[12] Wang Y, Ding Y, Wei W, Cao Y, Davidson DF, Hanson RK. On estimating physical and chemical properties of hydrocarbon fuels using mid-infrared FTIR spectra and regularized linear models. Fuel 2019;255:115715.
[13] Osmont A, Catoire L, Gökalp I. Physicochemical properties and thermochemistry of propellanes. Energy Fuels 2008;22(4):2241–57.
[14] Osmont A, Gökalp I, Catoire L. Evaluating missile fuels. Propellants Explos Pyrotech 2006;31(5):343–54.
[15] Marrero J, Gani R. Group-contribution based estimation of pure component properties. Fluid Phase Equilib 2001;183–184:183–208.
[16] Saldana DA, Starck L, Mougin P, Rousseau B, Pidol L, Jeuland N, et al. Flash point and cetane number predictions for fuel compounds using quantitative structure property relationship (QSPR) methods. Energy Fuels 2011;25(9):3900–8.
[17] Jain A, Shin Y, Persson KA. Computational predictions of energy materials using density functional theory. Nat Rev Mater 2016;1:15004.
[18] Ahmad Z, Xie T, Maheshwari C, Grossman JC, Viswanathan V. Machine learning enabled computational screening of inorganic solid electrolytes for suppression of dendrite formation in lithium metal anodes. ACS Cent Sci 2018;4(8):996–1006.
[19] Sendek AD, Cubuk ED, Antoniuk ER, Cheon G, Cui Y, Reed EJ. Machine learning-assisted discovery of solid Li-ion conducting materials. Chem Mater 2019;31(2):342–52.
[20] Yoshida M, Hinkley T, Tsuda S, Abul-Haija YM, McBurney RT, Kulikov V, et al. Using evolutionary algorithms and machine learning to explore sequence space for the discovery of antimicrobial peptides. Chem 2018;4(3):533–43.
[21] Mansouri Tehrani A, Oliynyk AO, Parry M, Rizvi Z, Couper S, Lin F, et al. Machine learning directed search for ultraincompressible, superhard materials. J Am Chem Soc 2018;140(31):9844–53.
[22] Liu G, Wang L, Qu H, Shen H, Zhang X, Zhang S, et al. Artificial neural network approaches on composition–property relationships of jet fuels based on GC–MS. Fuel 2007;86(16):2551–9.
[23] Saldana DA, Starck L, Mougin P, Rousseau B, Ferrando N, Creton B. Prediction of density and viscosity of biofuel compounds using machine learning methods. Energy Fuels 2012;26(4):2416–26.
[24] Saldana DA, Starck L, Mougin P, Rousseau B, Creton B. On the rational formulation of alternative fuels: melting point and net heat of combustion predictions for fuel compounds using machine learning methods. SAR QSAR Environ Res 2013;24(4):259–77.
[25] Saldana DA, Starck L, Mougin P, Rousseau B, Creton B. Prediction of flash points for fuel mixtures using machine learning and a novel equation. Energy Fuels 2013;27(7):3811–20.
[26] Hou F, Wu Z, Hu Z, Xiao Z, Wang L, Zhang X, et al. Comparison study on the prediction of multiple molecular properties by various neural networks. J Phys Chem A 2018;122(46):9128–34.
[27] Wheeler SE, Houk KN, Schleyer PVR, Allen WD. A hierarchy of homodesmotic reactions for thermochemistry. J Am Chem Soc 2009;131(7):2547–60.
[28] Collins CR, Gordon GJ, Lilienfeld OAV, Yaron DJ. Constant size descriptors for accurate machine learning models of molecular properties. J Chem Phys 2018;148(24):241718.
[29] Landrum G. RDKit: Open-source cheminformatics; Available from: http://www.rdkit.org.
[30] Rupp M, Tkatchenko A, Müller K-R, von Lilienfeld OA. Fast and accurate modeling of molecular atomization energies with machine learning. Phys Rev Lett 2012;108(5):058301.
[31] Hansen K, Montavon G, Biegler F, Fazli S, Rupp M, Scheffler M, et al. Assessment and validation of machine learning methods for predicting molecular atomization energies. J Chem Theory Comput 2013;9(8):3404–19.
[32] Savoskin MV, Kapkan LM, Vaiman GE, Vdovichenko AN, Gorkunenko OA, Yaroshenko AP, et al. New approaches to the development of high-performance hydrocarbon propellants. Russ J Appl Chem 2007;80(1):31–7.