Prediction of cellulose dissolution in ionic liquids using molecular descriptors based QSAR model


Journal of Molecular Liquids 215 (2016) 541–548

Contents lists available at ScienceDirect

Journal of Molecular Liquids journal homepage: www.elsevier.com/locate/molliq

Ngoc Lan Mai a,b, Chan Kyung Kim c, Byungho Park c, Heon-Jin Park d, Sang Huyn Lee e, Yoon-Mo Koo b,⁎

a Faculty of Applied Sciences, Ton Duc Thang University, Ho Chi Minh, Viet Nam
b Department of Marine Science and Biological Engineering, Inha University, Incheon, Republic of Korea
c Department of Chemistry, Inha University, Incheon, Republic of Korea
d Department of Mathematics and Statistics, Inha University, Incheon, Republic of Korea
e Department of Microbial Engineering, Konkuk University, Seoul, Republic of Korea

Article info

Article history: Received 20 April 2015; Received in revised form 6 January 2016; Accepted 11 January 2016; Available online xxxx

Keywords: Ionic liquid; Cellulose; Molecular descriptors; Prediction; QSAR

Abstract

The dissolution of lignocellulose by ionic liquids has attracted much attention during the last decade. However, the experimental screening and selection of a large number of potential ionic liquids for biomass processing is a challenging task. In this study, the prediction of cellulose dissolution in ionic liquids was evaluated by quantitative structure–activity relationship (QSAR) models using molecular descriptors of the ionic liquids' constituent ions derived from the CODESSA program. Models based on cellulose molar solubility exhibited better correlation (R² of 0.93 vs 0.88) and predictability (R² of 0.89 vs 0.83) than those based on mass percentage solubility. In addition, models developed by the multivariate adaptive regression spline (MARS) method employed fewer variables (13 vs 57–59) and showed better predictability (R² of 0.83–0.89 vs 0.45–0.51) than those developed by the multiple linear regression (MLR) technique. The results indicated that the molecular descriptors of ILs could be effectively used to develop QSAR models for facilitating the in silico and a priori screening/selection of ILs customized for specific applications. © 2016 Elsevier B.V. All rights reserved.

1. Introduction

Cellulose is the most abundant bio-renewable organic material on Earth. It is the structural component of the primary cell wall of green plants and can also be produced by a large variety of living organisms such as algae and bacteria [1]. Because it is non-toxic, biodegradable, and modifiable, cellulose is considered the most promising material for a variety of applications, such as daily commodities (e.g. paper, fiber and textile) and the pharmaceutical and food industries. In addition, more and more attention is being paid to the utilization of this inexhaustible material for the production of biocompatible products and fuels. Dissolution of cellulose is required for the production of biofuels and platform chemicals from biomass as well as for the production of man-made cellulose fibers. Only a few solvent systems can effectively dissolve cellulose, such as the N,N-dimethylacetamide/lithium chloride mixture (DMAc/LiCl), the mixture of N,N-dimethylformamide and dinitrogen tetroxide (DMF/N2O4), N-methylmorpholine-N-oxide (NMMO), and the mixture of dimethyl sulfoxide and tetrabutylammonium fluoride (DMSO/TBAF). However, these solvents are highly toxic, unstable and/or difficult to recover and, in many cases, alter cellulose to some extent [2,3].

⁎ Corresponding author at: Department of Biological Engineering, Inha University, 100 Inharo, Nam-gu, Incheon 402–751, Republic of Korea. E-mail address: [email protected] (Y.-M. Koo).

http://dx.doi.org/10.1016/j.molliq.2016.01.040 0167-7322/© 2016 Elsevier B.V. All rights reserved.

Recently, ionic liquids (ILs), which consist entirely of ions and have low melting points, have been used as alternative solvents for cellulose and lignocellulose pretreatment [4,5]. In comparison to traditional molecular solvents, ILs exhibit attractive properties such as a broad liquid temperature range, high thermal stability and negligible vapor pressure [6]. It is commonly acknowledged that carbohydrates and lignin can be dissolved in ILs. As a result of dissolution in ILs, the intricate networks of non-covalent interactions among cellulose, hemicellulose, and lignin are effectively disrupted while the formation of degradation products is minimized [5]. The dissolution of various types of cellulose in ILs has been studied by many researchers [4,7–12]. Since the first report of cellulose dissolution in ILs, many kinds of ILs or IL-like solvents able to dissolve cellulose have been found or developed [1,13]. Both the cation and the anion of an IL affect the dissolution of cellulose. However, while the role of the anion seems to be clear, with the trends in solubility following basicity, the same cannot be said of the influence of the cation [1]. Anions that are good hydrogen bond acceptors, such as acetate, formate and chloride, are more effective in dissolving cellulose. As a result, the hydrogen bond acceptor ability (or basicity) of ILs has been used as an indicator for the experimental screening of ILs for cellulose dissolution. However, the properties of ILs can be easily modified by changing the structure of the cations or anions [14], and the number of potential cation and anion combinations has been estimated at 10¹⁸ different ILs [15]. Given the huge number of potential ILs, experimental measurement of properties/activities for all ILs of interest


Table 1. Cellulose solubility in ionic liquids.

| Entry | Ionic liquid | Temp. (°C) | Experimental solubility (wt.%) | Experimental solubility (g/mol) | Predicted solubility (wt.%) | Predicted solubility (g/mol) | Ref. |
|---|---|---|---|---|---|---|---|
| 1 | [DEME][Ala] | 100 | 12 | 28.1 | 7.7 | 26.0 | [10] |
| 2 | [DEME][Arg] | 100 | 1 | 3.2 | 1.5 | 3.2 | [10] |
| 3 | [DEME][Asn] | 100 | 5 | 13.9 | 5.2 | 12.4 | [10] |
| 4 | [DEME][Asp] | 100 | 1 | 2.8 | 3.3 | 4.9 | [10] |
| 5 | [Admim][Br] | 100 | 12 | 26.1 | 11.1 | 30.9 | [29] |
| 6 | [C2mim][Br] | 100 | 1 | 1.9 | 4.6 | 8.9 | [30] |
| 7 | [C3mim][Br]⁎ | 100 | 1 | 2.1 | 1.8 | 2.7 | [30] |
| 8 | [C4mim][Br] | 100 | 2 | 4.4 | 2.7 | 10.9 | [30] |
| 9 | [C5mim][Br] | 100 | 1 | 2.3 | 1.0 | 7.5 | [30] |
| 10 | [C6mim][Br] | 100 | 1 | 2.5 | −0.2 | 10.0 | [30] |
| 11 | [C7mim][Br] | 100 | 1 | 2.6 | 0 | 5.8 | [30] |
| 12 | [C8mim][Br] | 100 | 1 | 2.8 | 0.3 | 5.0 | [30] |
| 13 | [DEMB][Butyrate] | 100 | 0 | 0 | −0.6 | 2.5 | [31] |
| 14 | [DEME][Butyrate] | 100 | 3 | 7.0 | 2.3 | 9.1 | [31] |
| 15 | [C2mim][Cl] | 100 | 14 | 20.5 | 10.7 | 24.1 | [30] |
| 16 | [C2mim][Cl] | 90 | 12 | 17.6 | 10.7 | 18.9 | [29] |
| 17 | [C3mim][Cl]⁎ | 100 | 0.5 | 0.8 | 1.2 | 1.9 | [32] |
| 18 | [C4mim][Cl] | 110 | 10 | 17.5 | 7.9 | 18.7 | [9] |
| 19 | [C5mim][Cl] | 100 | 1 | 1.9 | 6.0 | 1.9 | [30] |
| 20 | [C6mim][Cl] | 100 | 6 | 12.2 | 4.8 | 7.5 | [30] |
| 21 | [C7mim][Cl] | 100 | 5 | 10.8 | 5.0 | 10.0 | [30] |
| 22 | [C8mim][Cl] | 100 | 4 | 9.2 | 5.3 | 8.8 | [30] |
| 23 | [C9mim][Cl]⁎ | 100 | 2 | 4.9 | 2.8 | 5.0 | [32] |
| 24 | [C10mim][Cl]⁎ | 100 | 0.5 | 1.3 | 0.2 | 1.8 | [32] |
| 25 | [H(OEt)2-Mim][Cl] | 110 | 1 | 2.1 | 1.0 | 2.5 | [9] |
| 26 | [Me(OEt)2-Et-IM][Cl] | 110 | 2 | 4.7 | 3.2 | 13.7 | [9] |
| 27 | [DEME][Cys] | 100 | 1 | 2.7 | 2.2 | 9.1 | [10] |
| 28 | [C4mim][DCA] | 110 | 1 | 2.1 | 0.2 | 0.2 | [9] |
| 29 | [C2mim][DEP] | 100 | 14 | 37.0 | 14.8 | 35.9 | [30] |
| 30 | [DEME][DMA] | 100 | 5 | 13.2 | 3.7 | 17.1 | [10] |
| 31 | [C1mim][DMP] | 100 | 10 | 22.2 | 8.1 | 21.1 | [30] |
| 32 | [C2mim][F] | 100 | 2 | 2.6 | 2.1 | 8.9 | [30] |
| 33 | [C4mim][Formate] | 110 | 8 | 14.7 | 10 | 10.9 | [9] |
| 34 | [TBA][Formate] | 110 | 1.5 | 4.3 | 3.7 | 10.3 | [9] |
| 35 | [TBP][Formate] | 110 | 6 | 18.3 | 4.4 | 11.4 | [9] |
| 36 | [DEME][Gln] | 100 | 3 | 8.7 | 0.9 | 10.2 | [10] |
| 37 | [DEME][Glu] | 100 | 0 | 0 | 2.6 | 3.7 | [10] |
| 38 | [DEME][Gly] | 100 | 4 | 8.8 | 9.4 | 12.8 | [10] |
| 39 | [DEME][His] | 100 | 1 | 3.0 | 1.4 | 3.5 | [10] |
| 40 | [DEME][ILe] | 100 | 7 | 19.3 | 6.5 | 9.6 | [10] |
| 41 | [DEME][Leu] | 100 | 5 | 13.8 | 5.9 | 17.9 | [10] |
| 42 | [DEME][Lys] | 100 | 11 | 32.1 | 9.4 | 28.6 | [10] |
| 43 | [DEME][MDEPA] | 100 | 18 | 60.7 | 18.9 | 66.8 | [10] |
| 44 | [DEME][MeO(H)PO2] | 100 | 7 | 16.9 | 4.9 | 14.4 | [10] |
| 45 | [C2mim][MEPA] | 100 | 15 | 38.7 | 15.7 | 49.8 | [31] |
| 46 | [DEMB][MEPA] | 100 | 21 | 61.2 | 22.6 | 56.2 | [31] |
| 47 | [DEME][MEPA] | 100 | 24 | 70.4 | 21.6 | 63.0 | [31] |
| 48 | [DEME][Met] | 100 | 5 | 14.7 | 6.6 | 18.0 | [10] |
| 49 | [DEMB][MTA] | 100 | 8 | 18.7 | 8.2 | 18.6 | [31] |
| 50 | [DEME][MTA] | 100 | 25 | 58.8 | 24.8 | 58.9 | [31] |
| 51 | [DEME][MTEPA] | 100 | 16 | 61.0 | 15.5 | 58.7 | [31] |
| 52 | [(MeOEt)2NH2][OAc] | 110 | 0.5 | 1.0 | 3.2 | 3.8 | [9] |
| 53 | [C2mim][OAc] | 100 | 8 | 13.6 | 8.1 | 8.9 | [30] |
| 54 | [C4mim][OAc] | 100 | 12 | 23.8 | 7.7 | 10.9 | [30] |
| 55 | [C8mim][OAc] | 110 | 1 | 2.5 | 3.2 | 5.0 | [9] |
| 56 | [DEME][OAc] | 100 | 7 | 14.4 | 10.3 | 16.3 | [10] |
| 57 | [H(OEt)2-Me-IM][OAc] | 110 | 5 | 11.5 | 3.2 | 9.0 | [9] |
| 58 | [H(OEt)3-Me-IM][OAc] | 110 | 2 | 5.5 | 4.1 | 7.4 | [9] |
| 59 | [Me(MeOEt)2NH][OAc] | 110 | 0.5 | 1.0 | −1.8 | 1.1 | [9] |
| 60 | [Me(OEt)2-Et3N][OAc] | 110 | 10 | 26.3 | 8.6 | 25.3 | [9] |
| 61 | [Me(OEt)2-Et-Im][OAc] | 110 | 12 | 31.0 | 11.3 | 33.6 | [9] |
| 62 | [Me(OEt)3-Bu-Im][OAc]⁎ | 110 | 0.5 | 1.7 | 0.3 | 1.4 | [9] |
| 63 | [Me(OEt)3-Et3N][OAc] | 110 | 10 | 30.7 | 7.2 | 30.3 | [9] |
| 64 | [Me(OEt)3-Et-Im][OAc] | 110 | 12 | 36.3 | 9.4 | 29.6 | [9] |
| 65 | [Me(OEt)3-MeOEtOMe-Im][OAc] | 110 | 0.5 | 1.8 | 2.3 | 1.9 | [9] |
| 66 | [Me(OEt)4-Et-Im][OAc] | 110 | 10 | 34.6 | 7.8 | 31.6 | [9] |
| 67 | [Me(OEt)7-Et-Im][OAc] | 110 | 3 | 14.4 | 5.1 | 13.1 | [9] |
| 68 | [Me(OPr)3-Et-Im][OAc] | 110 | 0.5 | 1.7 | 3.7 | 1.5 | [9] |
| 69 | [MM(EtOH)NH][OAc] | 110 | 0.5 | 0.8 | 0.8 | 2.1 | [9] |
| 70 | [MM(MeOEt)NH][OAc] | 110 | 0.5 | 0.8 | −2.5 | 0.7 | [9] |
| 71 | [DEME][OH] | 100 | 5 | 8.2 | 4.9 | 6.1 | [10] |
| 72 | [DEME][Orn] | 100 | 8 | 22.2 | 7.5 | 24.2 | [10] |
| 73 | [DEME][Phe] | 100 | 5 | 15.5 | 4.3 | 16.5 | [10] |
| 74 | [DEME][Pro] | 100 | 1 | 2.6 | 0.1 | 1.3 | [10] |
| 75 | [DEME][Propionate] | 100 | 6 | 13.2 | 7.5 | 10.5 | [31] |
| 76 | [DEME][Ser] | 100 | 4 | 10.0 | 4.9 | 10.8 | [10] |
| 77 | [DEME][Thr] | 100 | 7 | 18.5 | 6.4 | 14.6 | [10] |
| 78 | [DEME][Trp] | 100 | 5 | 17.5 | 5.6 | 15.2 | [10] |
| 79 | [DEME][Tyr] | 100 | 5 | 16.3 | 3.7 | 12.3 | [10] |
| 80 | [DEME][Val] | 100 | 5 | 13.1 | 6.2 | 13.7 | [10] |

⁎ Data used in the testing set.

is unrealistic. An alternative is to develop predictive models from which the properties/activities of ILs can be estimated a priori. For instance, Kahlen and colleagues modeled cellulose solubilities in ILs using the conductor-like screening model for realistic solvation (COSMO-RS) [16]. They found that COSMO-RS is well suited for a relatively quick, qualitative screening of a large number of ILs potentially dissolving cellulose, although a quantitative prediction of cellulose solubility in ILs is not yet possible. On the other hand, several quantitative structure–property relationship (QSPR) or quantitative structure–activity relationship (QSAR) models employing molecular descriptors (structural

Table 2. Predictive models for cellulose solubility in ionic liquids.

Model based on mass percentage solubility
R²: 0.88, MAE: 1.53 (training set); R²: 0.83, MAE: 0.52 (testing set)

Y (wt.%) = 6.355871
+ 23.80586 ∗ max(0, C166 + 3.5179)
+ 3.87114 ∗ max(0, −3.5179 − C166)
+ 324.2795 ∗ max(0, A084 − 0.7929)
− 236.3384 ∗ max(0, 0.7929 − A084)
+ 6.779582 ∗ max(0, A163 + 1.4877)
− 5.968241 ∗ max(0, −1.4877 − A163)
+ 0.00767791 ∗ max(0, A175 − 109.6)
+ 0.2432445 ∗ max(0, 109.6 − A175)
− 0.597909 ∗ max(0, A189 + 79.8106)
+ 0.03178639 ∗ max(0, −79.8106 − A189)
− 14409.9 ∗ max(0, C005 − 0.6667) ∗ max(0, A163 + 1.4877)
− 18.55 ∗ max(0, 0.6667 − C005) ∗ max(0, A163 + 1.4877)
− 0.27522 ∗ max(0, −3.5179 − C166) ∗ max(0, C219 − 13.8087)
+ 4.683934 ∗ max(0, −3.5179 − C166) ∗ max(0, 13.8087 − C219)
− 0.1630504 ∗ max(0, −3.5179 − C166) ∗ max(0, A161 − 18)
+ 1.42589 ∗ max(0, −3.5179 − C166) ∗ max(0, 18 − A161)
+ 0.0630375 ∗ max(0, C186 − 947.7028) ∗ max(0, 0.7929 − A084)
+ 0.6670725 ∗ max(0, 947.7028 − C186) ∗ max(0, 0.7929 − A084)
− 0.152008 ∗ max(0, C196 − 1.5474) ∗ max(0, 109.6 − A175)
− 0.3919169 ∗ max(0, 1.5474 − C196) ∗ max(0, 109.6 − A175)
− 5.270999 ∗ max(0, C200 − 0.8266) ∗ max(0, 109.6 − A175)
+ 3.360952 ∗ max(0, 0.8266 − C200) ∗ max(0, 109.6 − A175)
− 4.957844 ∗ max(0, A005 − 0.4545) ∗ max(0, A175 − 109.6)
− 0.4087298 ∗ max(0, 0.4545 − A005) ∗ max(0, A175 − 109.6)
+ 115.6391 ∗ max(0, A035 − 87.0973) ∗ max(0, A163 + 1.4877)
− 0.1060982 ∗ max(0, 87.0973 − A035) ∗ max(0, A163 + 1.4877)

Molecular descriptors and their importance, evaluated by their contribution to the regression model (based on the number of model subsets that include the variable):

| Descriptor | Description | Importance |
|---|---|---|
| A084 | ZX Shadow/ZX Rectangle | 26 |
| A163 | HOMO-1 energy | 25 |
| C186 | WPSA-2 Weighted PPSA (PPSA2 ∗ TMSA / 1000) [Quantum-Chemical PC] | 24 |
| A035 | Molecular weight | 21 |
| C200 | FHDSA Fractional HDSA (HDSA/TMSA) [Quantum-Chemical PC] | 21 |
| A175 | PNSA-1 Partial negative surface area [Quantum-Chemical PC] | 21 |
| C166 | LUMO + 1 energy | 21 |
| A005 | Relative number of H atoms | 20 |
| C005 | Relative number of H atoms | 18 |
| C219 | HA dependent HDCA-1 [Quantum-Chemical PC] | 16 |
| A161 | No. of occupied electronic levels | 15 |
| C196 | RPCS Relative positive charged SA (SAMPOS ∗ RPCG) [Quantum-Chemical PC] | 15 |
| A189 | PNSA-3 Atomic charge weighted PNSA [Quantum-Chemical PC] | 7 |

Model based on molar solubility
R²: 0.93, MAE: 3.16 (training set); R²: 0.89, MAE: 0.56 (testing set)

Y (g/mol) = 58.09079
+ 0.06919651 ∗ max(0, C060 − 54.0133)
− 0.9901619 ∗ max(0, 54.0133 − C060)
− 8.933019 ∗ max(0, A058 − 8.5094)
− 3.82879 ∗ max(0, 8.5094 − A058)
+ 531.608 ∗ max(0, A084 − 0.7929)
− 386.8704 ∗ max(0, 0.7929 − A084)
− 11.86868 ∗ max(0, A163 + 1.193)
− 10.84949 ∗ max(0, −1.193 − A163)
+ 0.3382362 ∗ max(0, A175 − 109.6)
− 1.139212 ∗ max(0, 109.6 − A175)
+ 12827.31 ∗ max(0, C026 − 0.0323) ∗ max(0, 0.7929 − A084)
− 14.80011 ∗ max(0, C060 − 54.0133) ∗ max(0, C243 − 0.0655)
− 14.82191 ∗ max(0, C060 − 54.0133) ∗ max(0, 0.0655 − C243)
− 63.02275 ∗ max(0, 54.0133 − C060) ∗ max(0, A069 − 0.6563)
− 1.183504 ∗ max(0, 54.0133 − C060) ∗ max(0, 0.6563 − A069)
+ 157.7688 ∗ max(0, C090 − 0.0085566) ∗ max(0, 109.6 − A175)
+ 48.14375 ∗ max(0, 0.0085566 − C090) ∗ max(0, 109.6 − A175)
− 32.22669 ∗ max(0, C091 + 0.0297) ∗ max(0, 109.6 − A175)
− 96.48308 ∗ max(0, −0.0297 − C091) ∗ max(0, 109.6 − A175)
+ 0.1453646 ∗ max(0, 8.5094 − A058) ∗ max(0, A238 − 44.4138)
+ 0.07017021 ∗ max(0, 8.5094 − A058) ∗ max(0, 44.4138 − A238)
− 0.4723991 ∗ max(0, A075 − 2.695) ∗ max(0, A175 − 109.6)
− 0.2630441 ∗ max(0, 2.695 − A075) ∗ max(0, A175 − 109.6)
− 14.70788 ∗ max(0, A084 − 0.753) ∗ max(0, 109.6 − A175)
+ 27.77304 ∗ max(0, 0.753 − A084) ∗ max(0, 109.6 − A175)
− 17.66268 ∗ max(0, 0.7929 − A084) ∗ max(0, A189 + 79.8106)
+ 4.328824 ∗ max(0, 0.7929 − A084) ∗ max(0, −79.8106 − A189)

Molecular descriptors and their importance, evaluated by their contribution to the regression model (based on the number of model subsets that include the variable):

| Descriptor | Description | Importance |
|---|---|---|
| A084 | ZX Shadow/ZX Rectangle | 27 |
| A075 | Balaban index | 26 |
| A175 | PNSA-1 Partial negative surface area [Quantum-Chemical PC] | 26 |
| C091 | Min partial charge for a C atom [Zefirov's PC] | 25 |
| C090 | Max partial charge for a C atom [Zefirov's PC] | 21 |
| C060 | Information content (order 1) | 20 |
| C243 | Principal moment of inertia A | 20 |
| A069 | Average Structural Information content (order 2) | 17 |
| A058 | Bonding Information content (order 0) | 16 |
| A163 | HOMO-1 energy | 10 |
| A189 | PNSA-3 Atomic charge weighted PNSA [Quantum-Chemical PC] | 10 |
| C026 | Relative number of double bonds | 10 |
| A238 | ALFA polarizability (DIP) | 9 |

A: molecular descriptors of IL's anions; C: molecular descriptors of IL's cations.


information) of ILs have been developed to quantitatively predict their physical properties [17–23], gas solubility in ILs [24] and toxicity [25–28] with acceptable accuracy. To the best of our knowledge, no publication has attempted to develop a QSAR model to predict cellulose solubility in ILs. In this study, QSAR models for predicting cellulose solubility in ILs were investigated using molecular descriptors of the ILs' constituent ions.

2. Data and methodology

2.1. Data collection

To derive a QSAR between cellulose solubility and the molecular structural characteristics of ILs, it is necessary to collect data in a systematic way. In this work, the solubility of Avicel in 80 different ILs at temperatures of 90–110 °C was collected from the literature. The ILs employed in this study are summarized in Table 1 together with their cellulose dissolution abilities.
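Table 1 reports each solubility both as wt.% and as g/mol, but the text does not spell out how the two columns are related. The sketch below is an inferred reading, not a statement from the source: it assumes that wt.% means grams of cellulose per 100 g of IL and that g/mol means grams of cellulose per mole of IL. Under that assumption, the two [C2mim][Cl] entries of Table 1 (14 and 12 wt.%) convert to the tabulated 20.5 and 17.6 g/mol. The function name and the molar mass of 146.62 g/mol for [C2mim][Cl] are introduced here for illustration only.

```python
def wt_percent_to_g_per_mol(wt_percent: float, il_molar_mass: float) -> float:
    """Convert g cellulose per 100 g IL into g cellulose per mol IL.

    Assumed relation (not stated in the source):
        g/mol = wt.% * M_IL / 100
    """
    return wt_percent * il_molar_mass / 100.0


# Molar mass of [C2mim][Cl] in g/mol (value supplied for this example).
C2MIM_CL_MOLAR_MASS = 146.62

# Entries 15 and 16 of Table 1: 14 wt.% and 12 wt.%.
for wt in (14.0, 12.0):
    converted = wt_percent_to_g_per_mol(wt, C2MIM_CL_MOLAR_MASS)
    print(f"{wt} wt.% corresponds to {converted:.1f} g/mol")
```

If this reading is right, the molar scale weights each IL by its molar mass, which would be one reason why the two solubility scales lead to different regression models.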

2.2. Computational methods

In a three-dimensional QSAR study, the first step is to obtain proper molecular structures under the experimental conditions; these structures are then used to generate various molecular descriptors. Computationally, it is not possible to determine a discrete structure for an IL, because each ion interacts with long-range neighboring counter-ions through strong Coulomb interactions. Alternatively, the cations and anions can be considered separately. In this work, the structures of all individual ions were optimized without constraints at the B3LYP/6-31G** level in the Gaussian 03 package [33] and verified as minima by frequency calculations. If a structure showed any imaginary frequency, the positions of the atoms concerned were adjusted along the corresponding normal modes of vibration. After reoptimization of these new structures, none of the structures showed an imaginary frequency, which indicates that they correspond to minima on the potential energy surface. To calculate molecular

Fig. 1. Scatter plots of predicted vs experimental values of cellulose solubility in ionic liquids.


descriptors of the cations and anions employed in this work, all the optimized structures were read into the CODESSA package [34]. CODESSA calculates various classes of descriptors (constitutional, topological, geometrical, electrostatic, quantum-chemical, and thermodynamic) from the three-dimensional structures of molecules. The molecular descriptors of an IL are defined as the combination of the molecular descriptors of its cation and anion. Four hundred thirty-eight molecular descriptors were calculated for the 80 different ILs in the data set. Because of the small size of the data set, 75 data points were used as the training set to develop predictive models and 5 data points were used as the testing set to evaluate the predictability of the models. The data set was split into training and testing sets randomly in R. Because of the nonlinearity between cellulose solubility and IL molecular descriptors, MARS


(Multivariate Adaptive Regression Splines), proposed by Friedman, was employed for regression analysis and development of the correlation model [35]. A model built by MARS has, in general, the following form:

$$f(x) = \sum_{i=1}^{k} C_i B_i(x)$$

where C_i is a constant coefficient and B_i(x) is a basis function, which can be a constant, a hinge function of the form max(0, x − constant) or max(0, constant − x), or a product of two or more hinge functions. MARS automatically selects the explanatory variables (predictors) in the model and uses linear splines to reflect nonlinearity. In this study, the MARS implementation in the earth package of the R statistical program was used [36].
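To make the model form concrete, the following minimal Python sketch evaluates a sum of coefficient-weighted hinge functions and hinge products. It is an illustration of the general MARS form only, not the R/earth code used in the study; the function names, the term encoding, and the toy coefficients and knots are all hypothetical.

```python
from math import prod


def hinge(x: float, knot: float, sign: int) -> float:
    """Hinge function: sign=+1 gives max(0, x - knot), sign=-1 gives max(0, knot - x)."""
    return max(0.0, sign * (x - knot))


def mars_predict(intercept, terms, x):
    """Evaluate f(x) = intercept + sum_i C_i * B_i(x).

    terms: list of (coefficient, factors), where factors is a list of
           (variable_name, knot, sign) tuples; one factor gives a single
           hinge, several factors give a product of hinges (an interaction).
    x:     dict mapping variable names to descriptor values.
    """
    return intercept + sum(
        coef * prod(hinge(x[var], knot, sign) for var, knot, sign in factors)
        for coef, factors in terms
    )


# Toy model with hypothetical coefficients and knots (not taken from Table 2).
toy_terms = [
    (2.0, [("a", 1.0, +1)]),                    # 2.0 * max(0, a - 1)
    (-0.5, [("a", 1.0, -1)]),                   # -0.5 * max(0, 1 - a)
    (3.0, [("a", 1.0, +1), ("b", 4.0, -1)]),    # 3.0 * max(0, a - 1) * max(0, 4 - b)
]

print(mars_predict(1.0, toy_terms, {"a": 2.0, "b": 3.0}))
```

Each hinge is zero on one side of its knot, so a pair of mirrored hinges gives a piecewise-linear response with a change of slope at the knot, and products of hinges act only in the region where all their factors are non-zero. The models in Table 2 have exactly this structure, with descriptor codes such as A084 in place of the toy variables.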

Fig. 2. Linear effect of molecular descriptor on the cellulose solubility in ionic liquids. All other molecular descriptors were at their median value.


The predictive capability of the models was evaluated by the mean absolute error (MAE), defined as

$$\mathrm{MAE} = \frac{1}{N} \sum_{i=1}^{N} |P_i - X_i|$$

where N, P_i and X_i are the number of data points, the predicted value and the experimental value, respectively. Multiple linear regression (MLR) models were also developed for comparison. Stepwise regression based on Akaike's information criterion (AIC) [37], implemented by the stepAIC function in the MASS package in R [38], was used to select the MLR model that best explained the data with a minimum of free parameters (or variables).
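The MAE definition can be checked by hand on the five testing-set ILs. The sketch below is a plain-Python illustration with a hypothetical function name; the numbers are the rounded wt.% values listed in Table 1 for the entries marked as testing data (entries 7, 17, 23, 24 and 62). Because the table values are rounded to one decimal, the result may differ slightly from the MAE values reported in Table 2.

```python
def mean_absolute_error(predicted, experimental):
    """MAE = (1/N) * sum(|P_i - X_i|) over N paired values."""
    if len(predicted) != len(experimental) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - x) for p, x in zip(predicted, experimental)) / len(predicted)


# Testing-set wt.% solubilities from Table 1 (entries 7, 17, 23, 24, 62).
experimental_wt = [1.0, 0.5, 2.0, 0.5, 0.5]
predicted_wt = [1.8, 1.2, 2.8, 0.2, 0.3]

mae_wt = mean_absolute_error(predicted_wt, experimental_wt)
print(f"Testing-set MAE (wt.%): {mae_wt:.2f}")
```

With only five testing points, a single poorly predicted IL shifts the MAE noticeably, which is worth keeping in mind when comparing the testing-set statistics of the different models.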

3. Results and discussion

Two QSAR correlation models for cellulose solubility in ILs were developed, one based on mass percentage and one on molar solubility. Both models include 13 molecular descriptors (variables or predictors) and were reliable, as indicated by the considerably high R² values (0.83–0.93) for both training and testing data presented in Table 2 and Fig. 1. The model based on molar solubility showed better correlation (R² of 0.93 vs 0.88) and prediction (R² of 0.89 vs 0.83) than that based on mass percentage. Seven and eight molecular descriptors of the IL anions contributed to the regression models based on mass percentage and molar solubility, respectively. However, only 4 molecular descriptors are common to both QSAR models (i.e. ZX shadow/ZX rectangle, HOMO-1 energy, PNSA-1 partial negative surface area [Quantum-Chemical PC], and PNSA-3

Fig. 3. Nonlinear effect of molecular descriptor interactions on the cellulose solubility in ionic liquids. All other molecular descriptors were at their median value.


atomic charge weighted PNSA [Quantum-Chemical PC]), and all of them are descriptors of the IL anion. This can be explained by the importance of the IL anion for the dissolution of cellulose [12,16,39]. The importance of the molecular descriptors in the models, estimated by MARS according to their contribution to the models, is also presented in Table 2. The importance of a variable in a MARS model was evaluated by counting the number of model subsets that include the variable, so variables included in more subsets were considered more important. The geometrical descriptor ZX shadow/ZX rectangle of the anion was the most important molecular descriptor in both QSAR models. This descriptor is a normalized shadow area calculated as the ratio ZX shadow/(Zmax Xmax), where ZX shadow is the area of the shadow of the molecule projected on the ZX plane and Zmax and Xmax are the maximum dimensions of the molecule along the corresponding axes [40]. In addition, many electrostatic descriptors, which reflect characteristics of the charge distribution of the molecule, contributed to the models: WPSA-2 Weighted PPSA (PPSA2 ∗ TMSA / 1000) [Quantum-Chemical PC], FHDSA Fractional HDSA (HDSA/TMSA) [Quantum-Chemical PC], PNSA-1 partial negative surface area [Quantum-Chemical PC], HA dependent HDCA-1 [Quantum-Chemical PC], No. of occupied electronic levels, RPCS relative positive charged SA (SAMPOS ∗ RPCG) [Quantum-Chemical PC], and PNSA-3 atomic charge weighted PNSA [Quantum-Chemical PC] in the mass percentage model, and PNSA-1 partial negative surface area [Quantum-Chemical PC], Min partial charge for a C atom [Zefirov's PC], Max partial charge for a C atom [Zefirov's PC], and PNSA-3 atomic charge weighted PNSA [Quantum-Chemical PC] in the molar solubility model. Their contribution indicates the importance of the charge and surface area of ILs for cellulose dissolution.
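The subset-counting measure of variable importance described in this paragraph can be sketched in a few lines of Python. This is only an illustration of the counting logic under stated assumptions: the candidate subsets and the variable names x1 to x4 are hypothetical, whereas in the study the counts in Table 2 were produced by the MARS software itself.

```python
from collections import Counter


def count_subset_membership(model_subsets):
    """Count, for each variable, how many candidate model subsets contain it."""
    counts = Counter()
    for subset in model_subsets:
        counts.update(set(subset))  # each variable counts once per subset
    return counts


# Hypothetical sequence of candidate subsets of increasing size.
candidate_subsets = [
    {"x1"},
    {"x1", "x2"},
    {"x1", "x2", "x3"},
    {"x1", "x3", "x4"},
]

importance = count_subset_membership(candidate_subsets)
for variable, n_subsets in importance.most_common():
    print(variable, n_subsets)
```

A variable that appears in every candidate subset, like x1 here, receives the highest count, which corresponds to the way ZX shadow/ZX rectangle of the anion tops both importance lists in Table 2.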
One explanation for the importance of charge and surface area could be that highly polar ions in the solvent are required to disrupt the hydrogen bonding network between cellulose chains in order to dissolve cellulose [16]. For instance, a larger negative surface area of the IL anion (a higher value of PNSA-1 partial negative surface area), corresponding to a more polar character of the ion [41], results in higher cellulose solubility in ILs, as depicted in Fig. 2. In addition, it is well accepted that the dissolution of cellulose is, in general, favored in low-viscosity IL media [12], and the cation–anion interaction plays an important role in the viscosity of ILs. Bini et al. pointed out the important role of the molecular descriptor FNSA-3 Fractional PNSA (PNSA-3/TMSA), where PNSA-3 is the atomic charge


weighted partial negatively charged molecular surface area and TMSA is the total molecular surface area, in the QSPR correlation for the viscosities of ILs [42]. The contribution of the PNSA-3 descriptor to both QSAR models is therefore in agreement with the importance of IL viscosity for cellulose dissolution.

For comparison, two MLR models (for mass percentage and molar solubility) were also developed (data not shown). However, the large number of variables (molecular descriptors) involved in these models (57 and 59 variables for the models based on mass percentage and molar solubility, respectively) led to over-fitting. Both models showed R² values comparable to those of the MARS models (0.89 and 0.93 for the training data set). However, their predictability for the testing data set was very poor (R² of 0.45 and 0.51 and MAE of 5.32 and 11.76 for the mass percentage and molar solubility models, respectively). The low predictability can be explained by the lack of interactions between variables in the MLR models. MARS, on the other hand, automatically models nonlinearities and interactions between variables, which results in more predictive models. Figs. 2 and 3 present the linear effects of individual descriptors and the nonlinear effects of descriptor interactions on cellulose solubility in ILs, respectively. Both figures show the important effects of a geometrical descriptor (i.e. ZX shadow/ZX rectangle of the anion) and electrostatic descriptors (i.e. PNSA-1 partial negative surface area [Quantum-Chemical PC] and PNSA-3 atomic charge weighted PNSA [Quantum-Chemical PC]) on cellulose solubility in ILs, as captured by the QSAR models.

4. Conclusions

Two QSAR models have been developed for cellulose solubility in ILs using molecular descriptors of the ILs' constituent ions. The models developed by the MARS method showed better predictability than those developed by the MLR technique. In addition, the MARS model based on molar solubility showed better predictive ability for cellulose solubility in ILs than the model based on mass percentage. The results indicate that the molecular descriptors of IL ions can be used to develop predictive models for activities/properties (e.g. cellulose solubility), which might facilitate the in silico and a priori screening/selection of ILs suitable for specific applications.

Abbreviations

Abbreviation: Name
[Cnmim]+ (n = 1, 2, 3, …, 10): 1-Alkyl-3-methylimidazolium
[TBA]+: Tetrabutylammonium
[TBP]+: Tetrabutylphosphonium
[DEME]+: Diethylmethylmethoxyethylammonium
[DEMB]+: Diethylmethylbutylammonium
[DMA]−: Dimethylalanine
[DMP]−: Dimethylphosphate
[DEP]−: Diethylphosphate
[DCA]−: Dicyanamide
[MTA]−: Methoxyacetate
[MEPA]−: Methoxyethoxypropionate
[MDEPA]−: Methoxydiethoxypropionate
[MTEPA]−: Methoxytriethoxypropionate
[Formate]−: Formate
[OAc]−: Acetate
[Propionate]−: Propionate
[Butyrate]−: Butyrate

Acknowledgment

This work was supported by the Global Excellent Technology Innovation funded by the Ministry of Trade, Industry & Energy (MOTIE) of Korea (KC000618). This work was also partly supported by Inha University, Korea.

References

[1] H. Wang, G. Gurau, R.D. Rogers, Ionic liquid processing of cellulose, Chem. Soc. Rev. 41 (2012) 1519–1537.
[2] T. Heinze, T. Liebert, Unconventional methods in cellulose functionalization, Prog. Polym. Sci. 26 (2001) 1689–1762.
[3] C. Liu, R. Sun, A. Zhang, W. Li, Dissolution of cellulose in ionic liquids and its application for cellulose processing and modification, in: Cellulose Solvents: For Analysis, Shaping and Chemical Modification, ACS, 2010, pp. 287–297.
[4] R.P. Swatloski, S.K. Spear, J.D. Holbrey, R.D. Rogers, Dissolution of cellulose with ionic liquids, J. Am. Chem. Soc. 124 (2002) 4974–4975.
[5] P. Maki-Arvela, I. Anugwom, P. Virtanen, R. Sjoholm, J.P. Mikkola, Dissolution of lignocellulosic materials and its constituents using ionic liquids-a review, Ind. Crop. Prod. 32 (2010) 175–201.
[6] R.D. Rogers, K.R. Seddon, Ionic Liquids as Green Solvents: Progress and Prospects, American Chemical Society, Washington, DC, 2003.
[7] S. Zhu, Y. Wu, Q. Chen, Z. Yu, C. Wang, S. Jin, Y. Ding, G. Wu, Dissolution of cellulose with ionic liquids and its application: a mini-review, Green Chem. 8 (2006) 325–327.
[8] B. Kosan, C. Michels, F. Meister, Dissolution and forming of cellulose with ionic liquids, Cellulose 15 (2008) 59–66.
[9] H. Zhao, G.A. Baker, Z. Song, O. Olubajo, T. Crittle, D. Peters, Designing enzyme-compatible ionic liquids that can dissolve carbohydrates, Green Chem. 10 (2008) 696–705.
[10] K. Ohira, Y. Abe, M. Kawatsura, K. Suzuki, M. Mizuno, Y. Amano, T. Itoh, Design of cellulose dissolving ionic liquids inspired by nature, ChemSusChem 5 (2012) 388–391.
[11] S.H. Ha, N.L. Mai, G. An, Y.-M. Koo, Microwave-assisted pretreatment of cellulose in ionic liquid for accelerated enzymatic hydrolysis, Bioresour. Technol. 102 (2011) 1214–1219.
[12] N.L. Mai, S.H. Ha, Y.-M. Koo, Efficient pretreatment of lignocellulose in ionic liquids/co-solvent for enzymatic hydrolysis enhancement into fermentable sugars, Process Biochem. 49 (2014) 1144–1151.
[13] A. Brandt, J. Grasvik, J.P. Hallett, T. Welton, Deconstruction of lignocellulosic biomass with ionic liquids, Green Chem. 15 (2013) 550–583.
[14] H. Matsumoto, M. Yanagida, K. Tanimoto, M. Nomura, Y. Kitagawa, Y. Miyazaki, Highly conductive room temperature molten salts based on small trimethylalkylammonium cations and bis(trifluoromethylsulfonyl)imide, Chem. Lett. 29 (2000) 922–923.
[15] S.A. Forsyth, J.M. Pringle, D.R. MacFarlane, Ionic liquids — an overview, Aust. J. Chem. 57 (2004) 113–119.
[16] J. Kahlen, K. Masuch, K. Leonhard, Modelling cellulose solubilities in ionic liquids using COSMO-RS, Green Chem. 12 (2010) 2172–2181.
[17] A.R. Katritzky, A. Lomaka, R. Petrukhin, R. Jain, M. Karelson, A.E. Visser, R.D. Rogers, QSPR correlation of the melting point for pyridinium bromides, potential ionic liquids, J. Chem. Inf. Comput. Sci. 42 (2001) 71–74.
[18] A.R. Katritzky, R. Jain, A. Lomaka, R. Petrukhin, M. Karelson, A.E. Visser, R.D. Rogers, Correlation of the melting points of potential ionic liquids (imidazolium bromides and benzimidazolium bromides) using the CODESSA program, J. Chem. Inf. Comput. Sci. 42 (2002) 225–231.
[19] D.M. Eike, J.F. Brennecke, E.J. Maginn, Predicting melting points of quaternary ammonium ionic liquids, Green Chem. 5 (2003) 323–328.
[20] G. Carrera, J. Aires-de-Sousa, Estimation of melting points of pyridinium bromide ionic liquids with decision trees and neural networks, Green Chem. 7 (2005) 20–27.
[21] S. Trohalaki, R. Pachter, Prediction of melting points for ionic liquids, QSAR Comb. Sci. 24 (2005) 485–490.
[22] S. Trohalaki, R. Pachter, G.W. Drake, T. Hawkins, Quantitative structure–property relationships for melting points and densities of ionic liquids, Energy Fuel 19 (2005) 279–284.
[23] M. Sattari, F. Gharagheizi, P. Ilani-Kashkouli, A.H. Mohammadi, D. Ramjugernath, A chemical structure based model for the determination of speed of sound in ionic liquids, J. Mol. Liq. 196 (2014) 7–13.
[24] Z. Dashtbozorgi, H. Golmohammadi, W.E. Acree Jr., Prediction of gas-to-ionic liquid partition coefficient of organic solutes dissolved in 1-(2-methoxyethyl)-1-methylpyrrolidinium tris(pentafluoroethyl)trifluorophosphate using QSPR approaches, J. Mol. Liq. 201 (2015) 21–29.
[25] F. Stock, J. Hoffmann, J. Ranke, R. Stormann, B. Ondruschka, B. Jastorff, Effects of ionic liquids on the acetylcholinesterase — a structure–activity relationship consideration, Green Chem. 6 (2004) 286–290.
[26] A.-M. Lacrămă, M. Putz, V. Ostafe, A spectral-SAR model for the anionic-cationic interaction in ionic liquids: application to Vibrio fischeri ecotoxicity, Int. J. Mol. Sci. 8 (2007) 842–863.
[27] P. Luis, A. Garea, A. Irabien, Quantitative structure–activity relationships (QSARs) to estimate ionic liquids ecotoxicity EC50 (Vibrio fischeri), J. Mol. Liq. 152 (2010) 28–33.
[28] S. Bruzzone, C. Chiappe, S.E. Focardi, C. Pretti, M. Renzi, Theoretical descriptor for the correlation of aquatic toxicity of ionic liquids by quantitative structure–toxicity relationships, Chem. Eng. J. 175 (2011) 17–23.
[29] S. Barthel, T. Heinze, Acylation and carbanilation of cellulose in ionic liquids, Green Chem. 8 (2006) 301–306.
[30] J. Vitz, T. Erdmenger, C. Haensch, U.S. Schubert, Extended dissolution studies of cellulose in imidazolium based ionic liquids, Green Chem. 11 (2009) 417–424.
[31] T. Itoh, Y. Hamada, K. Yoshida, R.-i. Asai, Design of ionic liquids for direct extraction of lignin from wood chips, 5th Congress on Ionic Liquids, Algarve, Portugal, 2013.
[32] T. Erdmenger, C. Haensch, R. Hoogenboom, U.S. Schubert, Homogeneous tritylation of cellulose in 1-butyl-3-methylimidazolium chloride, Macromol. Biosci. 7 (2007) 440–445.
[33] M.J. Frisch, G.W. Trucks, H.B. Schlegel, G.E. Scuseria, M.A. Robb, J.R. Cheeseman, J.A. Montgomery, T. Vreven, K.N. Kudin, J.C. Burant, J.M. Millam, S.S. Iyengar, J. Tomasi, V. Barone, B. Mennucci, M. Cossi, G. Scalmani, N. Rega, G.A. Petersson, H. Nakatsuji, M. Hada, M. Ehara, K. Toyota, R. Fukuda, J. Hasegawa, M. Ishida, T. Nakajima, Y. Honda, O. Kitao, H. Nakai, M. Klene, X. Li, J.E. Knox, H.P. Hratchian, J.B. Cross, V. Bakken, C. Adamo, J. Jaramillo, R. Gomperts, R.E. Stratmann, O. Yazyev, A.J. Austin, R. Cammi, C. Pomelli, J.W. Ochterski, P.Y. Ayala, K. Morokuma, G.A. Voth, P. Salvador, J.J. Dannenberg, V.G. Zakrzewski, S. Dapprich, A.D. Daniels, M.C. Strain, O. Farkas, D.K. Malick, A.D. Rabuck, K. Raghavachari, J.B. Foresman, J.V. Ortiz, Q. Cui, A.G. Baboul, S. Clifford, J. Cioslowski, B.B. Stefanov, G. Liu, A. Liashenko, P. Piskorz, I. Komaromi, R.L. Martin, D.J. Fox, T. Keith, M.A. Al-Laham, C.Y. Peng, A. Nanayakkara, M. Challacombe, P.M.W. Gill, B. Johnson, W. Chen, M.W. Wong, C. Gonzalez, J.A. Pople, Gaussian 03, Revision C.02, Gaussian, Inc., Wallingford CT, 2004.
[34] A.R. Katritzky, M. Karelson, V.S. Lobanov, R. Dennignton, T.A. Keith, CODESSA, 1995–2009.
[35] J.H. Friedman, Multivariate adaptive regression splines, Ann. Stat. 19 (1991) 1–67.
[36] S. Milborrow, T. Hastie, R. Tibshirani, Earth: multivariate adaptive regression spline models, R package Version 3.2-6, 2011.
[37] H. Akaike, A new look at the statistical model identification, IEEE Trans. Autom. Control 19 (1974) 716–723.
[38] B. Ripley, B. Venables, D.M. Bates, K. Hornik, A. Gebhardt, D. Firth, Support functions and datasets for Venables and Ripley's MASS, 2015.
[39] A. Casas, J. Palomar, M.V. Alonso, M. Oliet, S. Omar, F. Rodriguez, Comparison of lignin and cellulose solubilities in ionic liquids by COSMO-RS analysis and experimental validation, Ind. Crop. Prod. 37 (2012) 155–163.
[40] R.H. Rohrbaugh, P.C. Jurs, Molecular shape and the prediction of high-performance liquid chromatographic retention indexes of polycyclic aromatic hydrocarbons, Anal. Chem. 59 (1987) 1048–1054.
[41] N. Schwierz, D. Horinek, R.R. Netz, Reversed anionic Hofmeister series: the interplay of surface charge and surface polarity, Langmuir 26 (2010) 7370–7379.
[42] R. Bini, M. Malvaldi, W.R. Pitner, C. Chiappe, QSPR correlation for conductivities and viscosities of low-temperature melting ionic liquids, J. Phys. Org. Chem. 21 (2008) 622–629.