Norm descriptors for predicting the hydrophile-lipophile balance (HLB) and critical micelle concentration (CMC) of anionic surfactants


Colloids and Surfaces A 583 (2019) 123967

Contents lists available at ScienceDirect

Colloids and Surfaces A journal homepage: www.elsevier.com/locate/colsurfa




Yajuan Shi a, Fangyou Yan a,*, Qingzhu Jia b, Qiang Wang a

a School of Chemical Engineering and Material Science, Tianjin University of Science and Technology, 13St. 29, TEDA, Tianjin 300457, PR China
b School of Marine and Environmental Science, Tianjin University of Science and Technology, 13St. 29, TEDA, Tianjin 300457, PR China

ARTICLE INFO

Keywords: Norm descriptors; QSPR; Hydrophile-lipophile balance (HLB); Critical micelle concentration (CMC); Anionic surfactants

ABSTRACT

The hydrophile-lipophile balance (HLB) and critical micelle concentration (CMC) are key indexes for quantifying the performance of surfactants. Herein, based on the concept of the norm index, new norm descriptors were proposed for describing the HLB and CMC of anionic surfactants, and two QSPR models were built to calculate these two properties. The norm descriptor-based models predicted both properties satisfactorily, with R² of 0.9983 for HLB and R² of 0.9130 for CMC. Several validations demonstrated that the models were stable and reliable and had good predictive ability. Moreover, for anionic surfactants without experimental values, the HLB of 19 anionic surfactants and the CMC of 101 anionic surfactants were predicted using the proposed models. These results indicated that the norm index was suitable for evaluating the HLB and CMC.

* Corresponding author. E-mail address: [email protected] (F. Yan).

https://doi.org/10.1016/j.colsurfa.2019.123967 Received 3 July 2019; Received in revised form 9 September 2019; Accepted 10 September 2019 Available online 11 September 2019 0927-7757/ © 2019 Elsevier B.V. All rights reserved.

1. Introduction

Surface-active agents [1], a class of substances that are hydrophilic at one end and hydrophobic at the other, can be used for washing, emulsifying, foaming, wetting, soaking and dispersing [2]. They have important applications in traditional industrial and agricultural production owing to their solubility in both water and organic solvents [3]. They also have many applications in emerging fields, such as the preparation of nanoparticles [4] and membrane materials for molecular recognition [5]. Surfactants can be classified into anionic, cationic, and nonionic surfactants, depending on the type of charge at the hydrophilic end. Anionic surfactants, an important type of surfactant, are widely applied in industry and agriculture [6].

The HLB [7,8] and CMC [9] are both basic properties of surfactants. The HLB number reflects the water and oil affinity of a surfactant and the stability of an emulsion [10,11]. Low HLB numbers indicate that the surface-active agent is lipophilic, while high HLB numbers indicate that it is hydrophilic [12]. The CMC is the lowest concentration at which the surface-active agent begins to form micelles [13], and CMC values are influenced by many factors, such as temperature and pH [14]. Many studies have focused on determining HLB and CMC values experimentally or computationally. Griffin [15] first proposed the HLB concept for surface-active agents and made a great effort to determine HLB numbers. Then, Davies [16] assigned numbers to functional groups for calculating HLB numbers, on the assumption that the HLB number is additive. Partly based on Davies' group contribution method (GCM), Rong et al. [17] introduced values for new groups to predict the HLB numbers of 224 nonionic surfactants by the effective chain length (ECL) method. Their comparison showed that the ECL method produced better results than Davies' method for most surfactants, while for other surfactants, such as Span and Tween, the values calculated by the two methods were almost the same. Although the group contribution method is widely used for calculating HLB numbers, some parameters are not easily obtained and only some group contribution values are currently available; as a result, the HLB value cannot be calculated when a group contribution value is missing. For the CMC, many experimental methods can be used, including surface tension, conductometry and calorimetry measurements [18].

Klevens [19] developed a classical empirical formula for calculating log10 CMC, in which the number of carbon atoms in the alkyl part is the variable. Over the years, quantitative structure-property relationships (QSPR) have proved very successful in predicting the physicochemical properties of surface-active agents, such as CMC [20], surface tension [21] and HLB numbers [22–24]. Katritzky et al. [6] proposed a linear model for calculating the logCMC of 181 anionic surfactants with molecular and fragment descriptors, with a statistical parameter R² of 0.897. For sugar-based surfactants, Gaudin et al. [25] and Baghban et al. [26] developed QSPR models for estimating the CMC using molecular descriptors based on 83 data points, and both models performed well. The CMC of sugar-based surfactants [14] was also described by our group through norm descriptors; the satisfactory results (R² = 0.9545, n = 83) indicated that the norm index was suitable for describing the physicochemical properties of surfactants. Surface tension, another important property of surfactants, was predicted by Wang et al. [21] and Gaudin et al. [27] with the QSPR approach. Moreover, Chen et al. [22] presented a QSPR model to calculate the HLB numbers of 90 nonionic compounds. Their comparison demonstrated that the QSPR method was easier to use than the ECL method and that the HLB numbers estimated by the QSPR model were more accurate. Gad and Khairou [24] presented a QSPR model with R² of 0.9825 and F of 1301 for calculating the HLB values of nonionic compounds. For anionic surfactants [28], Luan et al. [23] developed a multiple linear regression (MLR) model and a nonlinear model using the QSPR method to estimate the HLB numbers of 73 samples. Their results showed that the nonlinear model performed better than their linear model.

In this work, new norm descriptors were proposed to describe the HLB and CMC of anionic surface-active agents based on the concept of the norm index. Then, two QSPR models were built for calculating the HLB and CMC. Both models were verified by several validation approaches. The good performance of the models supported the generality of norm descriptors.

2. Method

2.1. Dataset

The experimental HLB numbers of 73 anionic surfactants were taken from Luan et al.'s work [23], and the experimental CMC values of 155 anionic surfactants were collected from Katritzky et al.'s work [6]. The HLB values were consistent with Luan et al.'s work. Because temperature affects the experimental CMC, only the experimental CMC values at 40 °C in Katritzky et al.'s work were used to develop the QSPR model. The dataset included three anionic groups (−SO4−, −SO3− and −COO−) and two cations (Na+ and K+). It covered 10 types of anionic surfactants (linear alkyl acetate, linear alkylsulfonate, linear alkylsulfate, branched alkylsulfate, branched alkylsulfonate, alkyl benzenesulfonate, dodecyl polyoxyethylene sulfate, alkyl ester sulfonate, alkyl-vinyl sulfonate and fluorinated linear alkylsulfonate). The anionic surfactants, together with their types and observed values, are listed in Table S1 in the Supplementary Material.

2.2. Atomic distribution matrices and norm descriptors

The norm index has been successfully used to describe physicochemical properties, including the viscosity of ionic liquids at variable temperatures and pressures (253.15–573.00 K, 0.06–300.00 MPa) [29], the heat capacity of organic compounds at variable temperatures (50–1500 K) [30], the flash point of multi-component mixtures [31], the solubility of nanomaterials [32], the thermal conductivity of ionic liquids at variable temperatures (273.15–355.07 K) [33], and the CMC of sugar-based surfactants [14]. In this work, the norm index was employed to build the QSPR models for anionic surfactants. First, on the basis of the optimized structure of each anionic surfactant, eight atomic distribution matrices (M1–M8) were developed to reflect the connectivity of the atoms and the specific contribution of each atom. Then, the norm descriptors (Ip) were calculated from the atomic distribution matrices (M) through the following formulas (Eqs. (9)–(12)). The atomic distribution matrices and the corresponding calculation of their norm descriptors are listed in Table 1. Four new norm descriptors (I_HLB,1 to I_HLB,4) were proposed to describe the HLB numbers, and another four (I_CMC,1 to I_CMC,4) were proposed to describe the CMC values.

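$$M_1 = [m_{ij}], \quad m_{ij} = \begin{cases} r_j, & p_{ij} = 1 \\ 0, & p_{ij} \neq 1 \end{cases} \tag{1}$$

$$M_2 = [m_{ij}], \quad m_{ij} = \begin{cases} ac_i \times ac_j \times E_i^{-1} \times E_j^{-1}, & p_{ij} = 1 \\ 0, & p_{ij} \neq 1 \end{cases} \tag{2}$$

$$M_3 = [m_{ij}], \quad m_{ij} = \begin{cases} ac_j \times E_j^{-1}, & p_{ij} = 3 \\ 0, & p_{ij} \neq 3 \end{cases} \tag{3}$$

$$M_4 = [m_{ij}], \quad m_{ij} = \begin{cases} ip_i \times ip_j, & p_{ij} = 3 \\ 0, & p_{ij} \neq 3 \end{cases} \tag{4}$$

$$M_5 = [m_{ij}], \quad m_{ij} = ac_i \times ac_j \times E_i^{-1} \times E_j^{-1} \times p_{ij} \tag{5}$$

$$M_6 = [m_{ij}], \quad m_{ij} = ac_j \times p_{ij} \tag{6}$$

$$M_7 = [m_{ij}], \quad m_{ij} = \begin{cases} \dfrac{z_j}{d_{ij}}, & p_{ij} = 3 \\ 0, & p_{ij} \neq 3 \end{cases} \tag{7}$$

$$M_8 = [m_{ij}], \quad m_{ij} = \begin{cases} \dfrac{1}{(\lg E_i + 1) \times d_{ij}}, & p_{ij} = 3 \\ 0, & p_{ij} \neq 3 \end{cases} \tag{8}$$

Table 1 The corresponding calculation of norm descriptors.

p    I_HLB,p       I_CMC,p
1    ||M1||_1      ||M5||_2
2    ||M2||_2      ||M6||_3
3    ||M3||_2      ||M7||_3
4    ||M4||_3      ||M8||_4

$$\lVert M \rVert_1 = \max_j \Big( \sum_i \lvert (M)_{ij} \rvert \Big) \tag{9}$$

$$\lVert M \rVert_2 = \sqrt{\max\big(\lambda_i (M^{H} \times M)\big)} \tag{10}$$

$$\lVert M \rVert_3 = \sqrt{\sum_i \sum_j (M)_{ij}^{2}} \tag{11}$$

$$\lVert M \rVert_4 = \frac{1}{n} \Big( \sum_i \sum_j \lvert (M)_{ij} \rvert \Big) \tag{12}$$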

where p_ij is the path between atoms i and j; λ_i refers to the eigenvalues of the matrix M^H × M; and r_i, ac_i, ip_i, E_i and z_i represent the van der Waals radius, atomic charge, ionization potential, electronegativity, and proton number of atom i in a molecule, respectively. The stable structures of the anionic surfactants were optimized with HyperChem 7.0 software using the ab initio Restricted Hartree-Fock (RHF) method at the STO-3G level.

2.3. Model validation

The norm descriptors produced above were used to develop the QSPR models for calculating the HLB and CMC. Three statistical parameters, R², RMSE and F (Fisher's criterion), were employed to assess the models' performance. Moreover, internal validation, external validation and the Williams plot [34,35] were used to verify their robustness, predictability and reliability.

Fig. 1. The experimental values vs. calculated values for HLB (a) and logCMC (b).
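As an illustration of Section 2.2, the four norms of Eqs. (9)–(12) and the matrix M1 of Eq. (1) could be evaluated as in the following minimal sketch. It is not the authors' implementation. It assumes that p_ij is the number of bonds between atoms i and j, that n in Eq. (12) is the matrix order, and that the matrices are real (so M^H is the transpose). The function names and the three-atom toy chain are hypothetical, and the script uses only the Python standard library.

```python
import math

def norm_1(m):
    """Eq. (9): maximum over columns j of the sum over rows i of |m_ij|."""
    return max(sum(abs(row[j]) for row in m) for j in range(len(m[0])))

def norm_2(m, iters=500):
    """Eq. (10): square root of the largest eigenvalue of M^T M (real M assumed)."""
    k = len(m[0])
    # Gram matrix G = M^T M is symmetric and positive semidefinite.
    g = [[sum(row[i] * row[j] for row in m) for j in range(k)] for i in range(k)]
    # Power iteration; assumes the start vector is not orthogonal
    # to the dominant eigenvector of G.
    v = [1.0] * k
    lam = 0.0
    for _ in range(iters):
        w = [sum(g[i][j] * v[j] for j in range(k)) for i in range(k)]
        lam = math.sqrt(sum(x * x for x in w))
        if lam == 0.0:
            return 0.0
        v = [x / lam for x in w]
    return math.sqrt(lam)

def norm_3(m):
    """Eq. (11): square root of the sum of squared entries."""
    return math.sqrt(sum(x * x for row in m for x in row))

def norm_4(m):
    """Eq. (12): sum of absolute entries divided by n (assumed: matrix order)."""
    return sum(abs(x) for row in m for x in row) / len(m)

def build_m1(path, radius):
    """Eq. (1): m_ij = r_j when p_ij = 1 (directly bonded atoms), else 0."""
    n = len(path)
    return [[radius[j] if path[i][j] == 1 else 0.0 for j in range(n)]
            for i in range(n)]

# Toy C-C-O chain; entries of `path` count the bonds between two atoms.
path = [[0, 1, 2],
        [1, 0, 1],
        [2, 1, 0]]
radius = [1.70, 1.70, 1.52]  # van der Waals radii in angstrom, illustrative inputs
m1 = build_m1(path, radius)
print("||M1||_1 =", norm_1(m1))
print("other norms of the same matrix:", norm_2(m1), norm_3(m1), norm_4(m1))
```

In a real calculation the path matrix, charges and distances would come from the optimized structure; the toy values here only show how a descriptor such as I_HLB,1 = ||M1||_1 follows from a matrix.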

3. Results and discussion

3.1. The HLB and logCMC models

The developed model for predicting HLB numbers was expressed as:

$$\mathrm{HLB} = -1.1286 \times 10^{3} + 160.3323\, I_{HLB,1} - 321.6018\, I_{HLB,2} + 29.1126\, I_{HLB,3} - 0.3915\, I_{HLB,4} \tag{13}$$

n = 73, R² = 0.9983, F = 10,012.8, RMSE = 0.5203
Q²_LOO = 0.9981, Q²_5-fold = 0.9982, Q²_10-fold = 0.9981

The developed model for predicting logCMC values was expressed as:

$$\log \mathrm{CMC} = 6.3221 + 5.6126\, I_{CMC,1} - 0.0730\, I_{CMC,2} - 0.2741\, I_{CMC,3} - 2.5884\, I_{CMC,4} \tag{14}$$

n = 155, R² = 0.9130, F = 393.33, RMSE = 0.2569
Q²_LOO = 0.9015, Q²_5-fold = 0.8976, Q²_10-fold = 0.9006

where I_p represents the norm descriptors. The predicted values of HLB and logCMC are listed in Table S1. The scatter diagrams of experimental versus calculated values for HLB and logCMC are plotted in Fig. 1. Anionic surfactants containing the −SO4− group had high HLB numbers. Fig. 1 also indicated that the calculated values of both properties matched the experimental values well. All residuals lay within [−2, 2] for HLB numbers, and for logCMC values most compounds were within a margin of error of ±0.5. The R² values of 0.9983 and 0.9130 for HLB and logCMC, respectively, showed that the models fitted the data well. In addition, to explain the calculation of the norm descriptors in detail, a worked example for calculating the HLB value was added to the Supplementary Material. This example (anionic surfactant 167, C6F13COO−) proceeds from the molecular structure to the atomic distribution matrices, then to the norm descriptors, and finally to the predicted value obtained from the established model.

3.2. Internal validation

Three cross-validations (LOO-CV, 5-fold CV and 10-fold CV) were used to assess the models' robustness. In the leave-one-out cross-validation (LOO-CV) procedure, given n compounds, n reduced models were calculated. Each of these models was developed with the remaining n−1 compounds and used to predict the response of the deleted compound. In the 5-fold (10-fold) CV procedure, the data were divided into five (ten) parts; each part was selected once as the testing set, and the other four (nine) parts were used as the training set. The experimental versus predicted HLB and logCMC values (LOO-CV, 5-fold CV, 10-fold CV) are plotted in Fig. 2, which shows that the values predicted by the three cross-validation methods almost coincide. Moreover, the error distributions of the two models for LOO-CV, 5-fold CV and 10-fold CV are shown in Fig. 3; the errors were normally distributed and relatively concentrated, indicating that the accuracy of the models was high. The statistical results of the three cross-validations are listed in Table 2. The parameters Q²_LOO (0.9981), Q²_5-fold (0.9982) and Q²_10-fold (0.9981) for HLB numbers, and Q²_LOO (0.9015), Q²_5-fold (0.8976) and Q²_10-fold (0.9006) for logCMC values, were all greater than 0.5, which demonstrated the reasonable robustness of the models [36,37]. In addition, the "r_m²" metrics were used to judge the quality of the models [38,39]. The "r_m²" metrics included two variants: r_m² and Δr_m². For HLB values, r_m² and Δr_m² were 0.9975 and 0.0015. For CMC values, they were 0.8736 and 0.0788. Both r_m² values were greater than 0.5 and both Δr_m² values were less than 0.2, which indicated that the two models had good predictive ability. The details of the two parameters for the two models are provided in the Supplementary Material.

Fig. 2. The experimental vs. calculated values for HLB (a) and logCMC (b) of LOO-CV, 5-fold CV and 10-fold CV.

Fig. 3. The error distribution of the model for internal validation.
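The fit statistics of Section 3.1 and the cross-validated Q² of Section 3.2 can be sketched as follows. This is a hedged illustration, not the authors' code: the paper does not print its formulas, so the sketch assumes the usual definitions Q² = 1 − PRESS/SS_tot, RMSE = sqrt(SS_res/n) and F = (R²/p)/((1 − R²)/(n − p − 1)). The two-descriptor synthetic data and all function names are hypothetical. With the rounded R² = 0.9130, n = 155 and p = 4, the assumed F formula gives about 393.5, close to the reported 393.33.

```python
import math
import random

def fit_ols(X, y):
    """Least-squares coefficients [b0, b1, ..., bp] for y = b0 + sum_k b_k * x_k."""
    rows = [[1.0] + list(x) for x in X]
    k = len(rows[0])
    # Augmented normal equations [A^T A | A^T y], reduced by Gauss-Jordan elimination.
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(k)]
    for c in range(k):
        piv = max(range(c, k), key=lambda r: abs(a[r][c]))
        a[c], a[piv] = a[piv], a[c]
        for r in range(k):
            if r != c:
                f = a[r][c] / a[c][c]
                a[r] = [u - f * v for u, v in zip(a[r], a[c])]
    return [a[i][k] / a[i][i] for i in range(k)]

def predict(coef, x):
    return coef[0] + sum(b * v for b, v in zip(coef[1:], x))

def r_squared(y, y_hat):
    mean_y = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot

def rmse(y, y_hat):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y))

def f_statistic(r2, n, p):
    """Fisher's criterion from R^2 for n samples and p descriptors (assumed form)."""
    return (r2 / p) / ((1.0 - r2) / (n - p - 1))

def q2_cv(X, y, k, seed=0):
    """Cross-validated Q^2 = 1 - PRESS / SS_tot; k = len(y) gives leave-one-out."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    press = 0.0
    for fold in (idx[f::k] for f in range(k)):
        held = set(fold)
        train = [i for i in idx if i not in held]
        coef = fit_ols([X[i] for i in train], [y[i] for i in train])
        press += sum((y[i] - predict(coef, X[i])) ** 2 for i in fold)
    mean_y = sum(y) / len(y)
    return 1.0 - press / sum((v - mean_y) ** 2 for v in y)

# Synthetic two-descriptor data (illustrative, not from the paper).
rng = random.Random(1)
X = [[rng.uniform(0, 5), rng.uniform(0, 5)] for _ in range(30)]
y = [1.0 + 2.0 * a - 3.0 * b + rng.gauss(0, 0.1) for a, b in X]

coef = fit_ols(X, y)
y_hat = [predict(coef, x) for x in X]
r2_all = r_squared(y, y_hat)
q2_loo = q2_cv(X, y, k=len(y))
q2_5 = q2_cv(X, y, k=5)
print("R2 =", r2_all, "RMSE =", rmse(y, y_hat), "F =", f_statistic(r2_all, len(y), 2))
print("Q2_LOO =", q2_loo, "Q2_5-fold =", q2_5)
```

For an ordinary least-squares model, PRESS can never be smaller than the residual sum of squares, so Q² stays at or below R²; a large gap between the two is the usual warning sign of overfitting.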

3.3. External validation

In this work, external validation was used to assess the models' predictive performance. For the HLB model, to make the comparison clearer, the division into training and testing sets was consistent with Luan et al.'s work. For the logCMC model, the data were divided randomly into training and testing sets in a ratio of approximately 3:1 [40]. The results are provided in Table 2. The calculated versus experimental values of both properties for the training and testing sets are plotted in Fig. 4. R²_training and R²_testing were 0.9982 and 0.9988 for HLB numbers and 0.9141 and 0.9141 for logCMC values, almost the same as the R² of the whole dataset. Fig. 4 showed that the data for both properties were close to the bisector, which indicated that the models had good predictive performance. Several external validation parameters [37], Q²_F1 (0.9988), Q²_F2 (0.9988) and Q²_F3 (0.9988) for HLB values, and Q²_F1 (0.9061), Q²_F2 (0.9001) and Q²_F3 (0.9277) for logCMC values, indicated that the two models had good predictivity. Moreover, for anionic surfactants without experimental values, the predicted values were calculated using the proposed models and are listed in Table S1.

3.4. Y-randomized validation

In this work, Y-randomized validation [41] was repeated 1000 times to check for the possibility of chance correlation in the models. The average R²_Y and Q²_Y over the 1000 runs were 0.0577 and 0.0455 for HLB and 0.0259 and 0.0308 for logCMC, respectively. These values were far lower than the original R² and Q²_LOO. Accordingly, there was no chance correlation in the modeling process.
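The Y-randomization test of Section 3.4 can be sketched as below: the responses are shuffled, the model is refitted, and the R² of the scrambled fits is averaged. This is a minimal sketch under stated assumptions, not the authors' procedure in detail. The data are synthetic, the function names are hypothetical, and only R²_Y is computed (Q²_Y would follow by cross-validating each scrambled fit). As a sanity check on the reported numbers, the expected R² of a least-squares fit with p descriptors on permuted responses is p/(n − 1), i.e. 4/72 ≈ 0.056 for HLB and 4/154 ≈ 0.026 for logCMC, which is in line with the averages of 0.0577 and 0.0259 quoted above.

```python
import random

def fit_ols(X, y):
    """Least-squares coefficients [b0, b1, ..., bp] via the normal equations."""
    rows = [[1.0] + list(x) for x in X]
    k = len(rows[0])
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(k)]
    for c in range(k):  # Gauss-Jordan elimination with partial pivoting
        piv = max(range(c, k), key=lambda r: abs(a[r][c]))
        a[c], a[piv] = a[piv], a[c]
        for r in range(k):
            if r != c:
                f = a[r][c] / a[c][c]
                a[r] = [u - f * v for u, v in zip(a[r], a[c])]
    return [a[i][k] / a[i][i] for i in range(k)]

def r2_of_fit(X, y):
    """R^2 of the least-squares model fitted to (X, y)."""
    coef = fit_ols(X, y)
    y_hat = [coef[0] + sum(b * v for b, v in zip(coef[1:], x)) for x in X]
    mean_y = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot

def y_randomization(X, y, runs=1000, seed=0):
    """Average R^2 over `runs` refits on randomly permuted responses."""
    rng = random.Random(seed)
    shuffled = list(y)
    total = 0.0
    for _ in range(runs):
        rng.shuffle(shuffled)  # breaks any real descriptor-response link
        total += r2_of_fit(X, shuffled)
    return total / runs

# Synthetic data: 40 samples, 2 descriptors (illustrative, not from the paper).
rng = random.Random(2)
X = [[rng.uniform(0, 5), rng.uniform(0, 5)] for _ in range(40)]
y = [0.5 + 1.5 * a - 2.0 * b + rng.gauss(0, 0.1) for a, b in X]

r2_orig = r2_of_fit(X, y)
mean_r2_y = y_randomization(X, y, runs=200)
print("original R2 =", r2_orig)
print("mean R2 after shuffling =", mean_r2_y, "(expected about", 2 / 39, ")")
```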

Table 2 Validation results for this work.

Property                  HLB        CMC
Samples                   73         155
R²                        0.9983     0.9130
F                         10012.8    393.33
RMSE                      0.5203     0.2569
Internal validation
  Q²_LOO                  0.9981     0.9015
  Q²_5-fold               0.9982     0.8976
  Q²_10-fold              0.9981     0.9006
External validation
  n_training              58         116
  R²_training             0.9982     0.9141
  n_testing               15         39
  R²_testing              0.9988     0.9141
  Q²_F1                   0.9988     0.9061
  Q²_F2                   0.9988     0.9001
  Q²_F3                   0.9988     0.9277
  CCC                     0.9945     0.9090
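The external validation entries of Table 2 (Q²_F1, Q²_F2, Q²_F3 and CCC) can be computed from observed and predicted test-set values as in the sketch below. The paper does not print the formulas, so the commonly used definitions summarized in ref. [37] are assumed here. The function names and the tiny data set are hypothetical and chosen so that the results can be checked by hand.

```python
def mean(values):
    return sum(values) / len(values)

def press(y_obs, y_pred):
    """Sum of squared prediction errors on the test set."""
    return sum((o - p) ** 2 for o, p in zip(y_obs, y_pred))

def q2_f1(y_train, y_test, y_pred):
    """1 - PRESS / sum of squared deviations of test data from the TRAINING mean."""
    m = mean(y_train)
    return 1.0 - press(y_test, y_pred) / sum((o - m) ** 2 for o in y_test)

def q2_f2(y_test, y_pred):
    """1 - PRESS / sum of squared deviations of test data from the TEST mean."""
    m = mean(y_test)
    return 1.0 - press(y_test, y_pred) / sum((o - m) ** 2 for o in y_test)

def q2_f3(y_train, y_test, y_pred):
    """1 - (PRESS / n_test) / (training sum of squares / n_train)."""
    m = mean(y_train)
    ss_train = sum((o - m) ** 2 for o in y_train)
    return 1.0 - (press(y_test, y_pred) / len(y_test)) / (ss_train / len(y_train))

def ccc(y_obs, y_pred):
    """Concordance correlation coefficient between observed and predicted values."""
    mo, mp = mean(y_obs), mean(y_pred)
    cov = sum((o - mo) * (p - mp) for o, p in zip(y_obs, y_pred))
    so = sum((o - mo) ** 2 for o in y_obs)
    sp = sum((p - mp) ** 2 for p in y_pred)
    return 2.0 * cov / (so + sp + len(y_obs) * (mo - mp) ** 2)

# Hand-checkable toy numbers (illustrative, not from the paper).
y_train = [1.0, 2.0, 3.0, 4.0, 5.0]
y_test = [2.0, 4.0]
y_pred = [2.5, 3.5]
print(q2_f1(y_train, y_test, y_pred))  # PRESS = 0.5, denominator = 2
print(q2_f2(y_test, y_pred))
print(q2_f3(y_train, y_test, y_pred))
print(ccc(y_test, y_pred))
```

Because the test-set mean minimizes the sum of squared deviations of the test data, Q²_F2 defined this way cannot exceed Q²_F1 for the same predictions.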

Fig. 4. The experimental vs. predicted values of external validation for HLB (a) and logCMC (b).

Fig. 5. Three-dimensional Williams plot for HLB (a) and logCMC (b).

3.5. Application domain (AD) analysis

The Williams plot is often employed to visualize the AD. The Williams plots are shown in Fig. 5. Fig. 5a showed that, for the HLB model, almost all the anionic surfactants were located in the application domain bounded by the critical leverage value (h* = 0.2586) and three standard deviation units. Only one anionic surfactant (C14H29COO(OC2H4)SO3−, molecule 163) had a standardized residual slightly outside the range of three standard deviation units [−3, 3], while its h was less than h*. For this compound, the high error may be due to an unreasonable experimental value [42]. As shown in Fig. 5b, for the logCMC model there were four "good high leverage points" (molecules 34, 35, 36 and 122), which might make the model more stable [43]. Besides, two response outliers (molecules 11 and 149) and one structural outlier (molecule 30) appeared in the logCMC model. The former may be caused by errors in the experimental values, and the latter may be due to structural reasons; one possible reason is that the error increased with the number of ethoxy groups. The results indicated that both models covered a wide applicability domain. Therefore, it can be concluded that the MLR models can estimate the HLB and logCMC of many kinds of anionic surfactants with reliable predictions.
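The quantities behind a Williams plot can be sketched as follows. This is a hedged illustration rather than the authors' code: it assumes the leverage h_i = x_i^T (X^T X)^-1 x_i with an intercept column, the commonly used warning leverage h* = 3(p + 1)/n, and residuals standardized by sqrt(SS_res/(n − p − 1)). With p = 4 descriptors and n = 58 training compounds, the assumed h* formula gives 15/58 ≈ 0.2586, the value quoted above for the HLB model. The function names and the one-descriptor toy data are hypothetical.

```python
import math

def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [list(row) + [bi] for row, bi in zip(a, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(n):
            if r != c:
                f = m[r][c] / m[c][c]
                m[r] = [u - f * v for u, v in zip(m[r], m[c])]
    return [m[i][n] / m[i][i] for i in range(n)]

def warning_leverage(p, n):
    """Critical leverage h* = 3 (p + 1) / n (assumed definition)."""
    return 3.0 * (p + 1) / n

def williams(X, y):
    """Return leverages, standardized residuals and h* for an OLS model."""
    rows = [[1.0] + list(x) for x in X]      # design matrix with intercept
    n, k = len(rows), len(rows[0])           # k = p + 1 fitted parameters
    gram = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    rhs = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    coef = solve(gram, rhs)
    resid = [yi - sum(c * v for c, v in zip(coef, r)) for r, yi in zip(rows, y)]
    s = math.sqrt(sum(e * e for e in resid) / (n - k))
    # h_i = x_i^T (X^T X)^-1 x_i, obtained by solving (X^T X) z = x_i.
    lev = [sum(v * z for v, z in zip(r, solve(gram, r))) for r in rows]
    return lev, [e / s for e in resid], warning_leverage(k - 1, n)

# Toy data: eleven evenly spaced points plus one far-away point (illustrative).
X = [[float(v)] for v in list(range(1, 12)) + [40]]
y = [1.0 + 2.0 * x[0] + 0.1 * (-1) ** i for i, x in enumerate(X)]

lev, std_res, h_star = williams(X, y)
for i, (h, r) in enumerate(zip(lev, std_res)):
    flag = "high leverage" if h > h_star else ("response outlier" if abs(r) > 3 else "")
    print(i, round(h, 4), round(r, 3), flag)
```

A compound with h above h* but a small standardized residual corresponds to what the text calls a "good high leverage point"; a compound with |standardized residual| > 3 is a response outlier.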

3.6. Model comparison with references

To evaluate the performance of this work, the HLB model was compared with Luan et al.'s work [23], which performed well in calculating the HLB; the results are presented in Table 3. Our HLB model produced higher R² (0.9983) and F (10012.8) and lower RMSE (0.5203) than Luan et al.'s MLR model (R² = 0.981, F = 3729.452, RMSE = 1.6601), which demonstrated that our MLR model for calculating HLB numbers was more reasonable and the predicted HLB was more precise. Importantly, the results of our linear model were even slightly better than their nonlinear ANN model for the overall model, training set and testing set. In short, the HLB model showed reasonable robustness and predictability. For the CMC model, as shown in Table 3, several researchers have developed QSPR models for the CMC with good results. In this work, the CMC data of anionic surfactants at 40 °C were collected from Katritzky et al.'s article [6]; the R² of this work was 0.9130, while Katritzky et al.'s was 0.897. The stability of this model (Q²_LOO = 0.9015) was higher than that of Katritzky et al.'s model (Q²_LOO = 0.877). In addition, the R² values of the models of Roberts [44] and Li et al. [45] were indeed higher, but πh, one of the descriptors in Roberts' work, was calculated from another computed value, logP, which would increase the error for surfactants not contained in the dataset. After comprehensive consideration, it could be concluded that this CMC model was well suited to predicting the CMC of anionic surfactants.

Table 3 Model comparison.

Property            HLB               HLB               HLB         CMC           CMC             CMC                   CMC
Reference           Luan et al. [23]  Luan et al. [23]  This model  Roberts [44]  Li et al. [45]  Katritzky et al. [6]  This model
Algorithm           ANN               MLR               MLR         MLR           MLR             MLR                   MLR
n                   73                73                73          133           98              181                   155
Descriptors number  –                 4                 4           2             3               5                     4
R²                  0.996             0.981             0.9983      0.976         0.980           0.897                 0.9130
F                   17322.59          3729.452          10012.8     5360          1505.23         303.7115              393.329
RMSE                0.8080            1.6601            0.5203      –             –               –                     0.2569
Q²_LOO              –                 –                 0.9981      –             0.978           0.877                 0.9015
Training set
  n_training        58                58                58          –             –               121                   116
  R²_training       0.9972            0.9829            0.9982      –             –               0.921                 0.9141
  F_training        19647.91          763.69            7223.4      –             –               –                     295.184
  RMSE              0.6950            1.7309            0.5422      –             –               –                     0.2634
Testing set
  n_testing         15                15                15          –             –               60                    39
  R²_testing        0.9912            0.989             0.9988      –             –               0.853                 0.9141
  F_testing         1453.52           1142.12           2108.2      –             –               –                     60.279
  RMSE              1.1895            1.3509            0.4302      –             –               –                     0.2416

4. Conclusion

In this work, two new sets of norm descriptors were proposed to describe the properties of surfactants, and two QSPR models were built to calculate the HLB and logCMC of anionic surfactants. These descriptors could properly reflect the electrostatic and steric interactions of anionic surfactants. The statistical parameters, R² (0.9983) and F (10012.8) for HLB and R² (0.9130) and F (393.329) for logCMC, showed that both models fitted the data well. Similarly, the results of internal validation (Q²_LOO = 0.9981 for HLB, Q²_LOO = 0.9015 for logCMC) indicated that the models had good stability. Meanwhile, the results of external validation demonstrated the models' good predictive ability. The Y-randomization and AD analyses indicated that the two models were reliable. Therefore, it could be concluded that the norm index, with its generality, was successful in describing the HLB and CMC of anionic surfactants, and that the QSPR models were reasonable for predicting the HLB and logCMC of anionic surfactants. Furthermore, for anionic surfactants without experimental HLB and CMC values, the developed models could be used to calculate these properties, which might provide guidance for designing new surfactants.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgement

This work was financially supported by the National Natural Science Foundation of China (Nos. 21306137, 21676203 and 21808167).

Appendix A. Supplementary data

Supplementary material related to this article can be found, in the online version, at https://doi.org/10.1016/j.colsurfa.2019.123967.

References

[1] J. Mondal, M. Mahanthappa, A. Yethiraj, Self-assembly of gemini surfactants: a computer simulation study, J. Phys. Chem. B 117 (2013) 4254–4262.
[2] S. Ross, E.S. Chen, P. Becher, H.J. Ranauto, Spreading coefficients and hydrophile-lipophile balance of aqueous solutions of emulsifying agents, J. Phys. Chem. 63 (1959) 1681–1683.
[3] J. Hu, X. Zhang, Z. Wang, A review on progress in QSPR studies for surfactants, Int. J. Mol. Sci. 11 (2010).
[4] J.H. Fendler, Atomic and molecular clusters in membrane mimetic chemistry, Chem. Rev. 87 (1987) 877–899.
[5] J.E. Riviere, R.E. Baynes, X.R. Xia, Membrane-coated fiber array approach for predicting skin permeability of chemical mixtures from different vehicles, Toxicol. Sci. 99 (2007) 153–161.
[6] A.R. Katritzky, L. Pacureanu, D. Dobchev, M. Karelson, QSPR study of critical micelle concentration of anionic surfactants using computational molecular descriptors, J. Chem. Inf. Model. 47 (2007) 782–793.
[7] I.J. Lin, Hydrophile-lipophile balance (hlb) of fluorocarbon surfactants and its relation to the critical micelle concentration (cmc), J. Phys. Chem. 76 (1972) 2019–2023.
[8] M.T. Lima, V.J. Spiering, S.N. Kurt-Zerdeli, D.C. Brüggemann, M. Gradzielski, R. Schomäcker, The hydrophilic-lipophilic balance of carboxylate and carbonate modified nonionic surfactants, Colloids Surf. A Physicochem. Eng. Asp. 569 (2019) 156–163.
[9] P.D.T. Huibers, V.S. Lobanov, A.R. Katritzky, D.O. Shah, M. Karelson, Prediction of critical micelle concentration using a quantitative structure−property relationship approach. 1. Nonionic surfactants, Langmuir 12 (1996) 1462–1470.
[10] H. Schott, Hydrophile-lipophile balance and cloud points of nonionic surfactants, J. Pharm. Sci. 58 (1969) 1443–1449.
[11] V. Verdinelli, P.V. Messina, P.C. Schulz, B. Vuano, Hydrophile–lipophile balance (HLB) of n-alkane phosphonic acids and theirs salts, Colloids Surf. A Physicochem. Eng. Asp. 316 (2008) 131–135.
[12] B. Creton, C. Nieto-Draghi, N. Pannacci, Prediction of surfactants' properties using multiscale molecular modeling tools: a review, Oil Gas Sci. Technol. – Rev. IFP Energ. Nouvelles 67 (2012) 969–982.
[13] Z. Wang, G. Li, X. Zhang, R. Wang, A. Lou, A quantitative structure-property relationship study for the prediction of critical micelle concentration of nonionic surfactants, Colloids Surf. A Physicochem. Eng. Asp. 197 (2002) 37–45.
[14] Y. Wang, F. Yan, Q. Jia, Q. Wang, Quantitative structure-property relationship for critical micelles concentration of sugar-based surfactants using norm indexes, J. Mol. Liq. 253 (2018) 205–210.
[15] W.C. Griffin, Classification of surface-active agents by "HLB", J. Soc. Cosmet. Chem. 1 (1949) 311–326.
[16] J.T. Davies, A quantitative kinetic theory of emulsion type. I. Physical chemistry of the emulsifying agent. Gas/liquid and liquid/liquid interfaces, Proceedings of 2nd International Congress Surface Activity (1957) 426–438.
[17] X. Guo, Z. Rong, X. Ying, Calculation of hydrophile–lipophile balance for polyethoxylated surfactants by group contribution method, J. Colloid Interface Sci. 298 (2006) 441–450.
[18] M. Anna, R.-R. Bozenna, Prediction of critical micelle concentration of nonionic surfactants by a quantitative structure-property relationship, Comb. Chem. High Throughput Screen. 13 (2010) 39–44.
[19] H. Klevens, Structure and aggregation in dilute solution of surface active agents, J. Am. Oil Chem. Soc. 30 (1953) 74–80.
[20] K. Roy, H. Kabir, QSPR with extended topochemical atom (ETA) indices: exploring effects of hydrophobicity, branching and electronic parameters on logCMC values of anionic surfactants, Chem. Eng. Sci. 87 (2013) 141–151.
[21] Z.W. Wang, J.L. Feng, H.J. Wang, Z.G. Cui, G.Z. Li, Effectiveness of surface tension reduction by nonionic surfactants with quantitative structure-property relationship approach, J. Dispers. Sci. Technol. 26 (2005) 441–447.
[22] M.-L. Chen, Z.-W. Wang, H.-J. Duan, QSPR for HLB values of nonionic surfactants using two simple descriptors, J. Dispers. Sci. Technol. 30 (2009) 1481–1485.
[23] F. Luan, H. Liu, Y. Gao, Q. Li, X. Zhang, Y. Guo, Prediction of hydrophile–lipophile balance values of anionic surfactants using a quantitative structure–property relationship, J. Colloid Interface Sci. 336 (2009) 773–779.
[24] E.A.M. Gad, K.S. Khairou, QSPR for HLB of nonionic surfactants based on polyoxyethylene group, J. Dispers. Sci. Technol. 29 (2008) 940–947.
[25] T. Gaudin, P. Rotureau, I. Pezron, G. Fayet, New QSPR models to predict the critical micelle concentration of sugar-based surfactants, Ind. Eng. Chem. Res. 55 (2016) 11716–11726.
[26] A. Baghban, J. Sasanipour, M. Sarafbidabad, A. Piri, R. Razavi, On the prediction of critical micelle concentration for sugar-based non-ionic surfactants, Chem. Phys. Lipids 214 (2018) 46–57.
[27] T. Gaudin, P. Rotureau, I. Pezron, G. Fayet, Investigating the impact of sugar-based surfactants structure on surface tension at critical micelle concentration with structure-property relationships, J. Colloid Interface Sci. 516 (2018) 162–171.
[28] E. Riccardi, T. Tichelkamp, Calcium ion effects on the water/oil interface in the presence of anionic surfactants, Colloids Surf. A Physicochem. Eng. Asp. 573 (2019) 246–254.
[29] F. Yan, W. He, Q. Jia, Q. Wang, S. Xia, P. Ma, Prediction of ionic liquids viscosity at variable temperatures and pressures, Chem. Eng. Sci. 184 (2018) 134–140.
[30] J. Yin, Q. Jia, F. Yan, Q. Wang, Predicting heat capacity of gas for diverse organic compounds at different temperatures, Fluid Phase Equilib. 446 (2017) 1–8.
[31] Y. Wang, F. Yan, Q. Jia, Q. Wang, Distributive structure-properties relationship for flash point of multiple components mixture, Fluid Phase Equilib. 474 (2018) 1–5.
[32] X. Xu, L. Li, F. Yan, Q. Jia, Q. Wang, P. Ma, Predicting solubility of fullerene C60 in diverse organic solvents using norm indexes, J. Mol. Liq. 223 (2016) 603–610.
[33] W. He, F. Yan, Q. Jia, S. Xia, Q. Wang, Description of the thermal conductivity λ(T, P) of ionic liquids using the structure–property relationship method, J. Chem. Eng. Data 62 (2017) 2466–2472.
[34] K. Roy, S. Kar, P. Ambure, On a simple approach for determining applicability domain of QSAR models, Chemom. Intell. Lab. Syst. 145 (2015) 22–29.
[35] K. Roy, P. Ambure, R.B. Aher, How important is to detect systematic error in predictions and understand statistical applicability domain of QSAR models? Chemom. Intell. Lab. Syst. 162 (2017) 44–54.
[36] A. Tropsha, P. Gramatica, V.K. Gombar, The importance of being earnest: validation is the absolute essential for successful application and interpretation of QSPR models, QSAR Comb. Sci. 22 (2003) 69–77.
[37] P. Gramatica, A. Sangion, A historical excursus on the statistical validation parameters for QSAR models: a clarification concerning metrics and terminology, J. Chem. Inf. Model. 56 (2016) 1127–1131.
[38] K. Roy, P. Chakraborty, I. Mitra, P.K. Ojha, S. Kar, R.N. Das, Some case studies on application of "rm2" metrics for judging quality of quantitative structure–activity relationship predictions: emphasis on scaling of response data, J. Comput. Chem. 34 (2013) 1071–1082.
[39] K. Roy, S. Kar, R.N. Das, Chapter 7 – validation of QSAR models, in: K. Roy, S. Kar, R.N. Das (Eds.), Understanding the Basics of QSAR for Applications in Pharmaceutical Sciences and Risk Assessment, Academic Press, Boston, 2015, pp. 231–289.
[40] A. Tropsha, Best practices for QSAR model development, validation, and exploitation, Mol. Inform. 29 (2010) 476–488.
[41] C. Rücker, G. Rücker, M. Meringer, Y-randomization and its variants in QSPR/QSAR, J. Chem. Inf. Model. 47 (2007) 2345–2357.
[42] P. Gramatica, Principles of QSAR models validation: internal and external, QSAR Comb. Sci. 26 (2007) 694–701.
[43] J. Jaworska, N. Jeliazkova, T. Aldenberg, QSAR applicability domain estimation by projection of the training set descriptor space: a review, (2005).
[44] D.W. Roberts, Application of octanol/water partition coefficients in surfactant science: a quantitative structure−property relationship for micellization of anionic surfactants, Langmuir 18 (2002) 345–352.
[45] X. Li, G. Zhang, J. Dong, X. Zhou, X. Yan, M. Luo, Estimation of critical micelle concentration of anionic surfactants with QSPR approach, J. Mol. Struct. Theochem 710 (2004) 119–126.