Data mining assisted prediction of liquidus temperature for primary crystallization of different electrolyte systems


Chemometrics and Intelligent Laboratory Systems xxx (xxxx) xxx

Contents lists available at ScienceDirect

Chemometrics and Intelligent Laboratory Systems journal homepage: www.elsevier.com/locate/chemometrics

Hui Lu a, b, Xiaojun Hu a, *, Bin Cao b, Liang Ma b, Wanqiu Chai b, Yunchuan Yang b

a State Key Laboratory of Advanced Metallurgy, University of Science and Technology Beijing, Beijing, 100083, China
b Guiyang Aluminium Magnesium Design & Research Institute Co., Ltd, Guiyang, 550081, China

ARTICLE INFO

Keywords: Aluminium electrolysis industry; Electrolyte; Liquidus temperature; Support vector machine; Back-propagation artificial neural network; Random forest regression; Gradient boosting regression

ABSTRACT

The liquidus temperature for primary crystallization is an important physicochemical property of an electrolyte system. It plays a crucial role in the stability of the electrolytic cell during production. Accurately predicting this temperature from the electrolyte composition is therefore a meaningful research subject. This work proposes a data-mining-assisted method for predicting the liquidus temperature for primary crystallization of electrolyte systems.

The essential differences between a complex industrial electrolyte system and an electrolyte system prepared in the laboratory were revealed by comparing their micro-morphology, phase composition and thermal analysis results. To some extent, this comparison verifies that an empirical formula is not transferable between the two kinds of electrolyte system.

Prediction models of the liquidus temperature for primary crystallization were built for different electrolyte systems using four algorithms: SVM (support vector machine), BPANN (back-propagation artificial neural network), RFR (random forest regression) and GBR (gradient boosting regression). The electrolyte systems studied are Na3AlF6(CR)-Al2O3–AlF3–CaF2, Na3AlF6(CR)-Al2O3–MgF2–CaF2–LiF, Na3AlF6(CR)-Al2O3-MgF2-CaF2-KF-LiF and Na3AlF6(CR)-Al2O3-AlF3-CaF2-MgF2-LiF-KF-NaF.

For each electrolyte system, the BPANN, SVM, RFR and GBR models all perform well and can effectively predict the liquidus temperature for primary crystallization. For some electrolyte systems, the BPANN, SVM and RFR models clearly outperform the empirical formulas described in the literature. These results indicate that data mining has good prospects for predicting the liquidus temperature for primary crystallization of electrolyte systems.

This work provides a new method for predicting the liquidus temperature for primary crystallization of different electrolyte systems from electrolyte composition datasets.

1. Introduction

The industrial aluminium electrolyte has been based on the cryolite-alumina system since the Hall-Héroult process was proposed in 1886. To improve the physical and chemical properties of the electrolyte system, salts referred to as additives are usually added to the electrolyte. The main additives that meet the relevant requirements include AlF3, CaF2, MgF2, LiF, KF, NaCl and BaCl2. These additives generally lower the liquidus temperature of the electrolyte and increase its conductivity. However, they also reduce the solubility of alumina during aluminium electrolysis [1,2].

At present, the industrial electrolyte system usually contains:
- cryolite (about 80%),
- AlF3 (about 6%–12%),
- Al2O3 (about 3%–4%),
- the additives CaF2, MgF2 and LiF (5%–7% in total).

In this system, alumina is the raw material, while cryolite and AlF3 act as the dissolving carrier [3].

The liquidus temperature for primary crystallization is one of the most important physical and chemical properties of an aluminium electrolyte. The industrial electrolysis temperature is directly determined by this liquidus temperature together with the superheat. The relationship among the three parameters can be expressed as follows [4]:

$$T_e = T_l + \Delta t \tag{1}$$

where Te is the electrolysis temperature, Tl is the liquidus temperature for primary crystallization of the electrolyte, and Δt is the superheat.

* Corresponding author. Room 1105, Metallurgical Ecological Building, Xueyuan Road, Beijing, 100083, China. E-mail address: [email protected] (X. Hu). https://doi.org/10.1016/j.chemolab.2019.103885 Received 7 July 2019; Received in revised form 26 October 2019; Accepted 27 October 2019 Available online xxxx 0169-7439/© 2019 Elsevier B.V. All rights reserved.

Please cite this article as: H. Lu et al., Data mining assisted prediction of liquidus temperature for primary crystallization of different electrolyte systems, Chemometrics and Intelligent Laboratory Systems, https://doi.org/10.1016/j.chemolab.2019.103885

H. Lu et al.

Superheat is also an important parameter in the electrolysis process. It is directly related to the stability of the electrolytic cell and to the thickness of the side crust (ledge) on the cell sidewall. The relationship between the superheat and the thickness of the side crust can be expressed as follows [5]:

$$Q = h A \Delta t \tag{2}$$

$$X = \frac{K A \left(T_l - T_s - Q R_w\right)}{Q} \tag{3}$$

where:
- Q is the heat flux (W),
- h is the heat transfer coefficient (W/(m²·K)),
- A is the interfacial area (m²),
- Δt is the superheat (K),
- X is the thickness of the side crust (m),
- K is the thermal conductivity of the crust (W/(m·K)),
- Tl is the liquidus temperature for primary crystallization of the electrolyte (K),
- Ts is the temperature of the crust (K),
- Rw is the thermal resistance of the sidewall (K/W).

The electrolysis temperature is therefore determined jointly by the liquidus temperature and the superheat. Accurate prediction of the liquidus temperature for primary crystallization is the basis for determining the electrolysis temperature and setting a reasonable superheat.

At present, there are two main methods for obtaining the liquidus temperature for primary crystallization of an electrolyte: thermal analysis and empirical formulas.

Thermal analysis has several shortcomings: long detection time, high cost and a heavy workload. To measure the liquidus temperature by thermal analysis, the solid electrolyte must be heated to about 1000 °C until it is molten, and its temperature is then recorded during natural cooling. With current testing equipment, the approach is expensive because it consumes a lot of energy and many testing thermocouples. Moreover, if the cooling rate is not controlled properly, the step on the cooling curve becomes difficult to observe, which leads directly to large measurement deviations.

Empirical formulas are also used to calculate the liquidus temperature for primary crystallization. A reliable empirical formula must be based on reliable experimental data and an appropriate modeling method. However, most formulas are valid only within a certain composition range, which greatly limits their applicability and makes them poorly adaptable [6–10].

In the early period, the electrolyte system was relatively simple and contained few kinds of additives. Many researchers derived formulas for predicting the liquidus temperature for primary crystallization of simple binary, ternary and quaternary systems [11–14]. As the grade of bauxite has declined, low-grade bauxite containing multiple elements has gradually come into use. This makes the composition of the electrolyte in the electrolysis process increasingly complex [15–17].

Predicting the liquidus temperature for primary crystallization of such complex electrolyte systems is very difficult, so machine learning algorithms have begun to play an important role in regression models for liquidus temperature prediction. Prediction of related physicochemical properties or parameters using support vector machines and other algorithms has been reported previously [18–20]. Data-mining-based prediction of the liquidus temperature for primary crystallization of complex electrolyte systems has also been reported [21].

In this work, we employ different data mining methods to predict the liquidus temperature for primary crystallization of the Na3AlF6(CR)-Al2O3–AlF3–CaF2, Na3AlF6(CR)-Al2O3–MgF2–CaF2–LiF, Na3AlF6(CR)-Al2O3-MgF2-CaF2-KF-LiF and Na3AlF6(CR)-Al2O3-AlF3-CaF2-MgF2-LiF-KF-NaF systems. We also compare the prediction performance of the different models. Finally, the contribution of the different components to the liquidus temperature for primary crystallization of each electrolyte system is evaluated.

2. Methods

2.1. Theory

2.1.1. Back-propagation artificial neural networks (BPANN)

The back-propagation network is one of the most widely used artificial neural networks [22,23]. It is a dynamic system whose topology is a directed graph, and it can also be regarded as a nonlinear mapping in a high-dimensional space. Iteratively optimizing the network weights brings the actual input-output mapping into agreement with the expected mapping. The gradient descent algorithm minimizes the objective function by adjusting the weights of each layer. The computational steps of the BPANN algorithm are as follows:

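1. Initialize the weights:

$$w^{l}_{ji},\quad i = 0, 1, \ldots, pot(l-1);\; j = 0, 1, \ldots, pot(l);\; l = 2, \ldots, L \tag{4}$$

2. Randomly select a sample and calculate the network estimate of its target:

$$\hat{E}_j,\quad j = 1, 2, \ldots, M \tag{5}$$

3. Calculate the error terms backwards, layer by layer:

$$\delta^{l}_{j},\quad j = 0, 1, \ldots, pot(l);\; l = L, L-1, \ldots, 2 \tag{6}$$

$$\delta^{l}_{j} = \begin{cases} f'\left(Net^{L}_{j}\right)\left(\hat{E}_j - E_j\right), & l = L,\; j = 1, \ldots, M \\ f'\left(Net^{l}_{j}\right)\displaystyle\sum_{i=1}^{pot(l+1)} \delta^{l+1}_{i}\, w^{l+1}_{ij}, & l = L-1, \ldots, 2 \end{cases} \tag{7}$$

4. Modify the weights:

$$w^{l}_{ji}(t+1) = w^{l}_{ji}(t) - \eta\,\delta^{l}_{j}\,Out^{l-1}_{i} + \alpha\left[w^{l}_{ji}(t) - w^{l}_{ji}(t-1)\right] \tag{8}$$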

The symbols in Eqs. (4)–(8) are defined as follows:
- t is the number of iterations,
- η is the learning efficiency (learning rate),
- α is the momentum term,
- Ê_j is the estimated value of the target E_j,
- Net^l_j and Out^l_j are the net input and output of node j in layer l,
- f is the transfer function,
- pot(l) (l = 1, 2, …, L) is the number of nodes in layer l.

5. Repeat steps 2, 3 and 4 until the given convergence condition is met.

2.1.2. Support vector machine (SVM)

SVM is a powerful supervised learning method for nonlinear classification and regression and has been widely used in various fields [18–20]. Nonlinear models usually require a sufficiently large dataset. A nonlinear mapping projects the data into a high-dimensional feature space in which linear regression can be performed. Kernel functions avoid the "curse of dimensionality" that may otherwise arise when the dimension is raised.

Given the sample set (y_1, x_1), …, (y_l, x_l), with x ∈ R^n and y ∈ R, the nonlinear support vector regression (SVR) solution with the ε-insensitive loss function is obtained by solving the dual problem

$$\max_{\alpha,\alpha^*} W(\alpha,\alpha^*) = \max_{\alpha,\alpha^*}\left\{-\frac{1}{2}\sum_{i=1}^{l}\sum_{j=1}^{l}\left(\alpha_i-\alpha_i^*\right)\left(\alpha_j-\alpha_j^*\right)K\left(x_i,x_j\right)+\sum_{i=1}^{l}\left[\alpha_i^*\left(y_i-\varepsilon\right)-\alpha_i\left(y_i+\varepsilon\right)\right]\right\} \tag{9}$$

subject to the constraints

$$0 \le \alpha_i \le C,\qquad 0 \le \alpha_i^* \le C,\qquad i = 1,\ldots,l,\qquad \sum_{i=1}^{l}\left(\alpha_i^*-\alpha_i\right) = 0$$

Once the Lagrange multipliers α_i and α_i^* have been obtained, the regression function takes the form

$$f(x) = \sum_{SV}\left(\alpha_i^*-\alpha_i\right)K\left(x_i,x\right) \tag{10}$$

where the sum runs over the support vectors (SV).

2.1.3. Random forest regression (RFR)

Random forest regression is an ensemble learning algorithm that can compete with the gradient boosting decision tree (GBDT). In particular, it can easily be trained in parallel, which is attractive in the era of big data [24–26]. Its advantage over a single decision tree is that it corrects the decision tree's tendency to overfit the training dataset. The principles and procedure of the RFR algorithm are as follows:

Input: sample set D = {(x_1, y_1), (x_2, y_2), …, (x_m, y_m)}  (11)
Output: strong regressor f(x).

1. For t = 1, 2, …, T:
(a) Randomly sample the training set m times with replacement (the t-th bootstrap sampling) to obtain a sampling set D_t containing m samples.
(b) Train the t-th decision tree model G_t(x) on D_t. When splitting each node, randomly select a subset of all the sample features at that node. Then choose the optimal feature from this random subset to divide the node into left and right subtrees.
2. Average the regression results of the T weak learners arithmetically to obtain the final output model.

RFR is therefore an ensemble of unpruned regression trees built from bootstrap samples of the training dataset, with random feature selection during tree induction. For regression, it outputs the mean prediction of the individual trees. Because of these advantages, RFR has become a well-known algorithm and has been applied in a wide variety of prediction areas.

2.1.4. Gradient boosting regression (GBR)

Gradient boosting regression learns from its mistakes. It combines many weak learners, each fitted to the residual errors of the current model, into a strong learner [27–30]. The computational steps are as follows:

Input: training dataset T = {(x_1, y_1), (x_2, y_2), …, (x_N, y_N)}, x_i ∈ R^n  (12)
Output: regression tree f(x).

1. Initialization:

$$f_0(x) = \arg\min_{c}\sum_{i=1}^{N} L\left(y_i, c\right) \tag{13}$$

2. For m = 1, 2, …, M:
(a) For i = 1, 2, …, N, calculate

$$r_{mi} = -\left[\frac{\partial L\left(y_i, f(x_i)\right)}{\partial f(x_i)}\right]_{f(x)=f_{m-1}(x)} \tag{14}$$

(b) Fit a regression tree to the residuals r_mi to obtain the leaf node regions R_mj, j = 1, 2, …, J, of the m-th tree.
(c) For j = 1, 2, …, J, calculate

$$c_{mj} = \arg\min_{c}\sum_{x_i \in R_{mj}} L\left(y_i, f_{m-1}(x_i) + c\right) \tag{15}$$

(d) Update

$$f_m(x) = f_{m-1}(x) + \sum_{j=1}^{J} c_{mj}\, I\left(x \in R_{mj}\right) \tag{16}$$

3. The final regression tree is obtained as

$$f(x) = f_M(x) = f_0(x) + \sum_{m=1}^{M}\sum_{j=1}^{J} c_{mj}\, I\left(x \in R_{mj}\right) \tag{17}$$

2.2. Sample and instrument setup

The micromorphology of electrolytes prepared from chemically pure reagents was compared with that of industrial electrolytes in the molten state. For this comparison, an electrolyte from the Guizhou Aluminum Plant and an electrolyte sample prepared from chemically pure reagents were used to test the liquidus temperature for primary crystallization.

After the high-temperature measurement, the stainless steel crucible containing the molten electrolyte was quenched in water. Quenching preserves, as far as possible, the phase structure and micromorphology of the molten electrolyte. The quenched samples were removed from the crucible, ground, and analyzed by scanning electron microscopy (SEM); the results are shown in Fig. 1. To investigate the distribution of elements in the electrolytes further, the samples were analyzed by EPMA (electron probe microanalysis); the images are shown in Fig. 2. The distribution of elements in industrial electrolytes is reported in the literature [21].

Fig. 1 clearly shows the difference in microscopic morphology between the electrolyte sample prepared from chemically pure reagents and the industrial electrolyte sample. For the prepared sample (Fig. 1A), different compounds are distributed on the electrolyte matrix in granular form. In the high-magnification images (Fig. 1B, C and D), the matrix surface shows a crisscross, overlapping dendritic distribution. As the magnification increases, the dendritic morphology becomes clearer, and small grains appear among the interlaced dendrites.

Observation of the rough morphology of the industrial electrolyte with a laser microscope reveals impurities such as black carbon slag in its crystal structure (Fig. 1E). The microstructure and appearance of the cooled electrolyte sample can be seen clearly in Fig. 1F, G and H. The grain size is unevenly distributed in the sample, and the maximum particle diameter is less than 100 μm in the area scanned by SEM. The surfaces of the massive particles are compact, with sharp edges and corners, and a white crystalline substance covers the surface of the sample.

The distribution and embedding patterns of the different elements are shown in Fig. 2. The EPMA images of the electrolyte sample prepared from chemically pure reagents show that the main elements (fluorine, sodium and aluminium) are evenly distributed without obvious aggregation. The other trace elements are also evenly distributed in the prepared electrolyte. The distribution and embedding patterns of industrial electrolyte samples have been investigated previously [21]. The elements are therefore well distributed during testing, and the measured liquidus temperature for primary crystallization should not be affected by uneven element distribution in the electrolyte.

Fig. 1. SEM images of the electrolyte sample prepared from chemically pure reagents (A–D) and the industrial electrolyte sample used by the aluminium industry (E–H).

Fig. 2. EPMA images of the electrolyte sample prepared from chemically pure reagents (A: SEM image of the selected area; B–G: distribution patterns of the elements F, Na, Mg, Al, K and Ca).

To investigate the phase structure of the different electrolyte systems in the molten state, XRD analysis was carried out on four electrolyte samples: two industrial electrolytes and two analytically pure electrolytes. The results are shown in Fig. 3. The Na3AlF6 phase was observed in the XRD patterns of the electrolyte samples.

Fig. 3A shows that Na3AlF6 is the predominant phase in the 2.5NaF⋅AlF3-1% LiF electrolyte system (CR = 2.5, LiF content 1%). The sharp, high-intensity peaks in Fig. 3A indicate that the molten electrolyte crystallized well after quenching, with few impurity phases and relatively large crystals.

Fig. 3B shows the XRD pattern of the 2.5NaF⋅AlF3-9% LiF electrolyte system (CR = 2.5, LiF content 9%). This pattern differs considerably from that in Fig. 3A. Possible causes are the change in electrolyte composition, measurement factors, or a slower cooling rate. In Fig. 3B, the spectrum is noisier and the peak intensity is lower, indicating poor crystallinity.

The XRD patterns of industrial electrolyte samples from aluminium producers in central and northern China are shown in Fig. 3C and D. In both, the predominant phases are Na3AlF6 and CaF2. The diffraction peaks are sharp and intense, with a relatively small full width at half maximum (FWHM), indicating higher crystallinity and larger crystals. The phase structures of these two electrolyte samples are therefore typical and similar to those of electrolytes used in production plants and laboratories. The measurement of their liquidus temperature for primary crystallization is thus broadly representative and of practical significance.

Fig. 3. XRD patterns of the electrolyte samples prepared from chemically pure reagents (A–B) and the industrial electrolyte samples (C–D).

Fig. 4. Experimental versus predicted values of the liquidus temperature in LOOCV.

Table 1 RMSE and MRE of the liquidus temperature for primary crystallization of the Na3AlF6(CR)-Al2O3–AlF3–CaF2 electrolyte system in the LOOCV test of the BPANN, SVM, RFR and GBR models.

No.   Model   RMSE    MRE (%)
1     BPANN   10.58   0.65
2     SVM     9.99    0.63
3     RFR     8.27    0.67
4     GBR     8.31    0.65

Table 3 RMSE and MRE of the liquidus temperature for primary crystallization of the Na3AlF6(CR)-Al2O3–MgF2–CaF2–LiF electrolyte system in the LOOCV test of the BPANN, SVM, RFR and GBR models.

No.   Model   RMSE    MRE (%)
1     BPANN   8.30    0.66
2     SVM     6.98    0.57
3     RFR     16.69   1.48
4     GBR     17.01   1.40

Table 2 Results of the hyper-parameter optimization for the RFR model.

No.   Algorithm parameter   Hyper-parameter value
1     Estimators            100
2     Max features          0.05
3     Min samples leaf      0.01
4     Training RMSE         7.32
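The BPANN steps of Section 2.1.1 (Eqs. (4)–(8)) can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the implementation used on the computing platform. It assumes one hidden layer, a single output, sigmoid units, targets min-max scaled to (0, 1), and a fixed epoch budget in place of the convergence test of step 5. The class and function names (`BPNetwork`, `train`, `mse`) and the learning rate of 0.5 are hypothetical. The 3 hidden nodes and momentum of 0.4 echo the settings reported for Section 3.1.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class BPNetwork:
    """Hypothetical 1-hidden-layer BP network with sigmoid units and momentum.

    w_hidden[j][i] links input i to hidden node j; w_out[j] links hidden node j
    to the single output. Index 0 of each weight row is the bias weight.
    """

    def __init__(self, n_in, n_hidden, eta=0.5, alpha=0.4, seed=0):
        rng = random.Random(seed)
        self.eta, self.alpha = eta, alpha
        # Step 1 (Eq. 4): small random initial weights.
        self.w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                         for _ in range(n_hidden)]
        self.w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
        # Previous changes w(t) - w(t-1), needed by the momentum term of Eq. (8).
        self.dw_hidden = [[0.0] * (n_in + 1) for _ in range(n_hidden)]
        self.dw_out = [0.0] * (n_hidden + 1)

    def forward(self, x):
        xb = [1.0] + list(x)  # bias input followed by the features
        hb = [1.0] + [sigmoid(sum(w * v for w, v in zip(row, xb)))
                      for row in self.w_hidden]
        out = sigmoid(sum(w * v for w, v in zip(self.w_out, hb)))
        return xb, hb, out

    def train_step(self, x, target):
        xb, hb, out = self.forward(x)  # Step 2 (Eq. 5): network estimate
        # Step 3 (Eqs. 6-7): output delta first, then hidden deltas.
        # For the sigmoid, f'(Net) = f(Net) * (1 - f(Net)).
        d_out = out * (1.0 - out) * (out - target)
        d_hid = [hb[j + 1] * (1.0 - hb[j + 1]) * self.w_out[j + 1] * d_out
                 for j in range(len(self.w_hidden))]
        # Step 4 (Eq. 8): gradient step plus alpha * (previous change).
        for j in range(len(self.w_out)):
            self.dw_out[j] = -self.eta * d_out * hb[j] + self.alpha * self.dw_out[j]
            self.w_out[j] += self.dw_out[j]
        for j, row in enumerate(self.w_hidden):
            for i in range(len(row)):
                self.dw_hidden[j][i] = (-self.eta * d_hid[j] * xb[i]
                                        + self.alpha * self.dw_hidden[j][i])
                row[i] += self.dw_hidden[j][i]

    def predict(self, x):
        return self.forward(x)[2]


def train(net, X, y, epochs=1000, seed=0):
    """Step 5: repeat steps 2-4; a fixed epoch budget replaces the convergence test."""
    rng = random.Random(seed)
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)  # samples are visited in random order
        for k in order:
            net.train_step(X[k], y[k])


def mse(net, X, y):
    return sum((net.predict(x) - t) ** 2 for x, t in zip(X, y)) / len(X)


# Toy, already-scaled data (hypothetical), since the sigmoid output lies in (0, 1).
X = [[a / 4, b / 4] for a in range(5) for b in range(5)]
y = [0.2 + 0.3 * x0 + 0.3 * x1 for x0, x1 in X]
net = BPNetwork(n_in=2, n_hidden=3)
before = mse(net, X, y)
train(net, X, y)
after = mse(net, X, y)
```

Real liquidus temperatures (about 900–1000 °C) would have to be scaled into the sigmoid's output range before training and scaled back after prediction.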

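The SVR formulation of Section 2.1.2 can be made concrete with a short sketch. It evaluates the pieces around the optimization:
- the RBF kernel,
- the ε-insensitive loss,
- the dual objective W(α, α*) of Eq. (9) and its constraints,
- the regression function of Eq. (10).

Solving the dual quadratic program itself is left to a dedicated solver and is not shown. All function names are hypothetical. The RBF form K(x, z) = exp(−g‖x − z‖²) is an assumption chosen to match the "g" hyper-parameter used later; like Eq. (10), the sketch carries no bias term.

```python
import math


def rbf_kernel(x, z, g):
    """RBF kernel K(x, z) = exp(-g * ||x - z||^2)."""
    return math.exp(-g * sum((a - b) ** 2 for a, b in zip(x, z)))


def eps_insensitive_loss(y_true, y_pred, eps):
    """Errors inside the eps-tube cost nothing; outside it the cost is |error| - eps."""
    return max(0.0, abs(y_true - y_pred) - eps)


def dual_objective(alpha, alpha_star, X, y, eps, g):
    """W(alpha, alpha*) of Eq. (9), to be maximized subject to the constraints."""
    n = len(X)
    quad = sum((alpha[i] - alpha_star[i]) * (alpha[j] - alpha_star[j])
               * rbf_kernel(X[i], X[j], g)
               for i in range(n) for j in range(n))
    lin = sum(alpha_star[i] * (y[i] - eps) - alpha[i] * (y[i] + eps)
              for i in range(n))
    return -0.5 * quad + lin


def is_feasible(alpha, alpha_star, C, tol=1e-9):
    """Box constraints 0 <= alpha, alpha* <= C and sum(alpha* - alpha) = 0."""
    in_box = all(-tol <= a <= C + tol for a in list(alpha) + list(alpha_star))
    balanced = abs(sum(s - a for a, s in zip(alpha, alpha_star))) <= tol
    return in_box and balanced


def svr_predict(x, X, alpha, alpha_star, g, tol=1e-12):
    """Eq. (10): only support vectors (alpha* - alpha != 0) contribute."""
    return sum((s - a) * rbf_kernel(xi, x, g)
               for xi, a, s in zip(X, alpha, alpha_star)
               if abs(s - a) > tol)
```

For example, with the hypothetical multipliers `alpha = [0.5, 0.0]` and `alpha_star = [0.0, 0.5]` on the points `[[0.0], [1.0]]`, the constraints hold for C = 1, and `svr_predict([1.0], ...)` with g = 1 returns 0.5 − 0.5·e⁻¹.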

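The two RFR steps of Section 2.1.3 (bootstrap sampling with a random feature subset per node, then arithmetic averaging) can be sketched without external libraries.

This is a minimal sketch under stated assumptions, not the library implementation used in the study:
- All names are hypothetical.
- `max_features` is read as a fraction of the features, matching the text's reading of "Max features = 1" as 100% of the features.
- The depth cap `max_depth` is an extra safeguard that the text does not mention.

```python
import random
import statistics


def _sse(values):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not values:
        return 0.0
    m = statistics.fmean(values)
    return sum((v - m) ** 2 for v in values)


def build_tree(rows, y, n_sub, min_leaf, depth, max_depth, rng):
    """Regression tree whose nodes each try only n_sub randomly chosen features."""
    if depth >= max_depth or len(rows) < 2 * min_leaf:
        return statistics.fmean(y)  # leaf value: mean target
    best = None  # (sse, feature, threshold, left indices, right indices)
    for f in rng.sample(range(len(rows[0])), n_sub):
        for thr in sorted({r[f] for r in rows})[:-1]:
            left = [k for k, r in enumerate(rows) if r[f] <= thr]
            right = [k for k, r in enumerate(rows) if r[f] > thr]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            sse = _sse([y[k] for k in left]) + _sse([y[k] for k in right])
            if best is None or sse < best[0]:
                best = (sse, f, thr, left, right)
    if best is None:
        return statistics.fmean(y)
    _, f, thr, left, right = best

    def grow(idx):
        return build_tree([rows[k] for k in idx], [y[k] for k in idx],
                          n_sub, min_leaf, depth + 1, max_depth, rng)

    return (f, thr, grow(left), grow(right))


def predict_tree(node, x):
    while isinstance(node, tuple):
        f, thr, left, right = node
        node = left if x[f] <= thr else right
    return node


def fit_forest(X, y, n_trees=10, max_features=1.0, min_leaf=1, max_depth=10, seed=0):
    """Step 1: T bootstrap samples (m draws with replacement), one tree each."""
    rng = random.Random(seed)
    m = len(X)
    n_sub = max(1, round(max_features * len(X[0])))
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(m) for _ in range(m)]
        trees.append(build_tree([X[k] for k in idx], [y[k] for k in idx],
                                n_sub, min_leaf, 0, max_depth, rng))
    return trees


def predict_forest(trees, x):
    """Step 2: arithmetic mean of the T tree predictions."""
    return statistics.fmean([predict_tree(t, x) for t in trees])
```

Averaging over trees grown on different bootstrap samples is what smooths out the overfitting of any single unpruned tree.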

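The GBR steps of Section 2.1.4 take a particularly simple form for the squared loss L(y, f) = (y − f)²/2, which is assumed here because the text does not name the loss:
- Eq. (13) reduces to the mean of y.
- Eq. (14) reduces to the ordinary residual y − f.
- Eq. (15) reduces to the mean residual in each leaf.

For brevity, the sketch fits stumps (J = 2 leaf regions) rather than depth-3 trees. The function names are hypothetical. A `learning_rate` of 1.0 reproduces Eq. (16) as written; a smaller value would add shrinkage, which the text does not describe.

```python
import statistics


def _sse(values):
    if not values:
        return 0.0
    m = statistics.fmean(values)
    return sum((v - m) ** 2 for v in values)


def fit_stump(X, r):
    """Step 2(b): regression tree with J = 2 leaves fitted to the residuals r."""
    best = None  # (sse, feature, threshold)
    for f in range(len(X[0])):
        for thr in sorted({x[f] for x in X})[:-1]:
            left = [ri for x, ri in zip(X, r) if x[f] <= thr]
            right = [ri for x, ri in zip(X, r) if x[f] > thr]
            sse = _sse(left) + _sse(right)
            if best is None or sse < best[0]:
                best = (sse, f, thr)
    if best is None:
        raise ValueError("need at least two distinct feature values")
    return best[1], best[2]


def stage_value(stage, x):
    """c_mj * I(x in R_mj) summed over the two leaf regions of one stage."""
    f, thr, c_left, c_right = stage
    return c_left if x[f] <= thr else c_right


def fit_gbr(X, y, n_estimators=100, learning_rate=1.0):
    f0 = statistics.fmean(y)                      # Eq. (13) for squared loss
    pred = [f0] * len(y)
    stages = []
    for _ in range(n_estimators):
        r = [yi - pi for yi, pi in zip(y, pred)]  # Eq. (14): negative gradient
        f, thr = fit_stump(X, r)
        c_left = statistics.fmean([ri for x, ri in zip(X, r) if x[f] <= thr])   # Eq. (15)
        c_right = statistics.fmean([ri for x, ri in zip(X, r) if x[f] > thr])
        stage = (f, thr, learning_rate * c_left, learning_rate * c_right)
        stages.append(stage)
        pred = [p + stage_value(stage, x) for p, x in zip(pred, X)]  # Eq. (16)
    return f0, stages


def predict_gbr(model, x):
    """Eq. (17): f0 plus the contributions of all M stages."""
    f0, stages = model
    return f0 + sum(stage_value(s, x) for s in stages)
```

On the toy data `X = [[0.0], [1.0], [2.0], [3.0]]`, `y = [1.0, 1.0, 5.0, 5.0]`, the first stump already removes all residuals. Later stages therefore add zero, and the model reproduces the targets.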

Fig. 5. Experimental versus predicted values of the liquidus temperature for primary crystallization from the BPANN, SVM, RFR and GBR models for the 11 independent test samples.

Fig. 6. Experimental versus predicted values of the liquidus temperature in LOOCV.

Fig. 7. RMSE in LOOCV versus C and Epsilon with RBF kernel function (A), RMSE in LOOCV versus g and Epsilon with RBF kernel function (B), RMSE in LOOCV versus g and C with RBF kernel function (C).

Table 4 Hyper-parameters of the SVM model optimized by the grid search method.

No.   Algorithm parameter (SVM)   Hyper-parameter value
1     C                           23
2     Epsilon                     0.03
3     g                           0.5
4     Kernel function             RBF
5     Training RMSE               6.029

2.3. Implementation

In this work, the computations were performed on the Online Computational Platform of Material Data Mining (OCPMDM) [31], available at https://www.mgedata.cn.

3. Results and discussion

Many studies have shown a definite mathematical and statistical relationship between the liquidus temperature for primary crystallization and the composition of the electrolyte system. At present, research is mostly limited to simpler electrolyte systems:
- binary systems, such as Na3AlF6–Al2O3, Na3AlF6–AlF3 and Na3AlF6–CaF2 [32–36];
- ternary systems, such as Na3AlF6–AlF3–Al2O3, Na3AlF6–AlF3–CaF2 and Na3AlF6–AlF3–MgF2;
- quaternary systems, such as Na3AlF6–AlF3–CaF2–Al2O3 and AlF3–Al2O3–CaF2–LiF.

There are also many studies of quinary and senary electrolyte systems.

In this paper, data mining methods are used to build prediction models of the liquidus temperature for primary crystallization from the composition of different electrolyte systems. The generalization ability and prediction performance of these models are evaluated, and the models are also compared with traditional regression formulas.

Leave-one-out cross-validation (LOOCV) is used to evaluate the prediction performance of the different models. In each LOOCV round, the dataset of N samples is divided into two disjoint subsets: a training set of N − 1 samples and a test set of one sample. A model is developed on the training set and used to predict the held-out sample, and the difference between the predicted and experimental values is calculated. This is repeated so that every sample is held out once.

The root mean squared error (RMSE) and the mean relative error (MRE) are used to evaluate the generalization and prediction performance of the models. Generally, the lower the RMSE, the better the model [37–44]. The RMSE is defined as:

$$RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(p_i - e_i\right)^2} \tag{18}$$

The MRE is defined as follows:

$$MRE = \frac{1}{n}\sum_{i=1}^{n}\frac{\left|p_i - e_i\right|}{e_i}\times 100\% \tag{19}$$

where e_i is the experimental value of sample i, p_i is the predicted value of sample i, and n is the total number of samples.

Fig. 8. Experimental versus predicted values of the liquidus temperature for primary crystallization from the BPANN, SVM, RFR and GBR models for the 10 independent test samples.

Fig. 9. Experimental versus predicted values of the liquidus temperature for the trained models and in the LOOCV test on the training datasets.

Table 5 RMSE and MRE of the liquidus temperature for primary crystallization of the Na3AlF6(CR)-Al2O3-MgF2-CaF2-KF-LiF electrolyte system in the LOOCV test of the BPANN, SVM, RFR and GBR models.

No.   Model   RMSE   MRE (%)
1     BPANN   2.83   0.25
2     SVM     3.70   0.28
3     RFR     6.52   0.54
4     GBR     6.06   0.45

Fig. 10. RMSE in LOOCV versus the number of hidden-layer nodes and the learning efficiency from the input layer to the hidden layer (A); versus the number of hidden-layer nodes and the learning efficiency from the hidden layer to the output layer (B); versus the number of hidden-layer nodes and the momentum (C); and versus the learning efficiency from the hidden layer to the output layer and the momentum (D).

Table 6 Hyper-parameters of the ANN model optimized by the grid search method.

No.   Network parameter                                        Parameter value
1     Number of input layer nodes                              6
2     Number of hidden layer nodes                             5
3     Number of output layer nodes                             1
4     Learning efficiency from input layer to hidden layer     0.52
5     Learning efficiency from hidden layer to output layer    0.42
6     Momentum                                                 0.4
7     Training MRE                                             0.165
8     Training RMSE                                            1.856

3.1. Na3AlF6(CR)-Al2O3–AlF3–CaF2 system

In the present work, the mathematical model was established using an industrial electrolyte composition dataset [8], and its generalization ability was validated by LOOCV. CR and the contents of Al2O3, AlF3 and CaF2 were set as the independent variables, and the liquidus temperature for primary crystallization was the dependent variable. The dataset consists of 46 samples with compositions taken from the literature [8]. It is the original data, without dimensionality reduction or other pretreatment. The back-propagation artificial neural network (BPANN), support vector machine (SVM), random forest regression (RFR) and gradient boosting regression (GBR) algorithms were used to establish the prediction models [45–52].

The default settings of the four models were as follows:
- BPANN: 8 input-layer nodes, 3 hidden nodes, 1 output-layer node, a sigmoid transfer function, a momentum term of 0.4 and a maximum of 250,000 training iterations.
- SVM: the RBF kernel function with capacity parameter C = 10 and gamma = 1.0, and a maximum of 10,000 optimization iterations.
- RFR: 10 estimators and max features of 1 (each subtree uses 100% of the features), min samples leaf of 1 and min samples split of 2.
- GBR: max depth 3, 100 estimators, subsample 1.0, min samples split 2 and min samples leaf 1.

These parameters were set from computational experience, and hyper-parameter optimization was then explored to obtain better predictions. The algorithm settings of the models in Sections 3.2, 3.3 and 3.4 are the same as in Section 3.1.

Fig. 4 (A, B, C, D) shows the experimental versus predicted liquidus temperatures for primary crystallization from the BPANN, SVM, RFR and GBR models with these default parameters, respectively. The performance of the different models on the training dataset is given in Table 1. In LOOCV on the training dataset, the correlation coefficient (R), RMSE and MRE of the RFR model are 0.8466, 8.27 and 0.67, respectively. The RFR model has the lowest LOOCV RMSE of the four models.

This RFR result was obtained with the default parameters. To further optimize the model and improve its prediction performance, a grid search was carried out over the RFR hyper-parameters (estimators, max features, min samples leaf); Table 2 lists the optimized values.

W. Lian et al. [8] used the above dataset as an external independent validation dataset to predict the liquidus temperature for primary crystallization. However, the training dataset used to construct their model equation cannot be found in the literature. In this work, 11 independent test samples were used to validate the performance of each model, and the predictions are shown in Fig. 5. The predicted values of four samples are clearly higher than the experimental values and lie far from the ideal line; these may be abnormal data values. Fig. 5B compares the test errors of the four models. The RFR model with optimized hyper-parameters gives the smallest mean relative error for the 11 independent test samples, 0.543%.
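The LOOCV protocol and the two error measures of Eqs. (18) and (19), used throughout Section 3, can be sketched in plain Python. The function names are hypothetical. The mean-value "model" is only a placeholder: any of the four algorithms would be plugged in through the `fit` and `predict` arguments. The temperatures in the usage example are invented for illustration.

```python
import math


def rmse(pred, exp):
    """Eq. (18): root mean squared error."""
    return math.sqrt(sum((p - e) ** 2 for p, e in zip(pred, exp)) / len(exp))


def mre(pred, exp):
    """Eq. (19): mean relative error in percent (experimental values must be non-zero)."""
    return 100.0 * sum(abs(p - e) / e for p, e in zip(pred, exp)) / len(exp)


def loocv_predictions(fit, predict, X, y):
    """Predict each of the N samples with a model fitted on the other N - 1."""
    preds = []
    for k in range(len(X)):
        model = fit(X[:k] + X[k + 1:], y[:k] + y[k + 1:])
        preds.append(predict(model, X[k]))
    return preds


# Hypothetical placeholder model: always predicts the training-set mean.
def fit_mean(X, y):
    return sum(y) / len(y)


def predict_mean(model, x):
    return model


# Invented compositions (CR only) and liquidus temperatures, for illustration.
X = [[2.5], [2.4], [2.3]]
y = [950.0, 960.0, 970.0]
preds = loocv_predictions(fit_mean, predict_mean, X, y)  # [965.0, 960.0, 955.0]
print(rmse(preds, y), mre(preds, y))                     # about 12.25 and 1.04 (%)
```

Because each held-out sample never influences the model that predicts it, the LOOCV RMSE estimates generalization error rather than training error.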

Fig. 11. Experimental versus predicted values of the liquidus temperature for primary crystallization from the literature equation and the BPANN and SVM models for the 15 test samples.

Fig. 12. Experimental versus predicted values of the liquidus temperature for the trained models in the LOOCV test on the training datasets, and verification results of the different models for the 23 monitoring samples.

3.2. Na3AlF6(CR)-Al2O3–MgF2–CaF2–LiF system

of Al2O3,MgF2, CaF2 and LiF were set as independent variable, the liquidus temperature for primary crystallization is the dependent variable. The above variables are adopted to establish the mathematical model by

In the progress of constructing the prediction model,CR, the content 10

H. Lu et al.

Chemometrics and Intelligent Laboratory Systems xxx (xxxx) xxx

using data mining algorithms, and the generalization ability of the prediction model is validated by means of the LOOCV method. The dataset were collected from the Ref [53], consiting of 43 electrolyte samples with the CR, content information of Al2O3,MgF2, CaF2 and LiF. The dataset used is the original data without dimensionality reduction or other pretreatments. Fig. 6 illustrates the experimental values versus predicted values of liquidus temperatures for primary crystallization in LOOCV by using BPANN,SVM, RFR and GBR models basing on the default parameters, respectively. The performance of different models on training datasets can be seen in Table 3 . It was found that the generalization ability of SVM model (R ¼ 0.9762,RMSE ¼ 6.98,MRE ¼ 0.57, respectively) outperformed the other models in terms of the R, RMSE and MRE values in Table 3. Generally speaking, the hyper-parameters can be optimized to achieve the ideal performance of model. In this work, grid search method was used to optimize hyper-parameters of the SVM algorithms to further improve the prediction accuracy. The results of hyper-parameters optimization are shown in Fig. 7 and Table 4. In order to obtained the suitable model, the modeling parameters are optimized for SVM model. The RMSE value was computed under different parameters (C, Epsilon and g) with RBF kernel function by using the LOOCV of SVM. The results were shown in Fig. 7. Fig. 7A illustrates RMSE in LOOCV versus C(C ¼ 0–100, step ¼ 2) and Epsilon(ε ¼ 0–0.1, step ¼ 0.02) with RBF kernel function. Fig. 7B illustrates RMSE in LOOCV versus g(g ¼ 0.4–1.5, step ¼ 0.1) and Epsilon(ε ¼ 0–0.1, step ¼ 0.02) with RBF kernel function. Fig. 7C illustrates RMSE in LOOCV versus g(g ¼ 0.4–1.5, step ¼ 0.1) and C(C ¼ 0–100, step ¼ 2) with RBF kernel function. The optimal parameters of SVM model with the RBF kernel function is shown in Table 4. 
Furthermore, the optimized SVM model with the selected hyper-parameters and the other models were used to predict the liquidus temperature for primary crystallization of 10 independent test samples; the results are shown in Fig. 8. The optimized SVM model gives the smallest mean relative error, 0.320%.
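The R, RMSE and MRE values quoted in this section can be computed as below. This is a minimal sketch assuming R is the Pearson correlation between measured and predicted values and MRE is the mean of |predicted − measured| / measured expressed as a percentage, which is consistent with the "MRE %" column of Table 7; the paper's own definitions are not given in this section.

```python
import math


def pearson_r(y_true, y_pred):
    """Pearson correlation coefficient R between measured and predicted values."""
    n = len(y_true)
    mt = sum(y_true) / n
    mp = sum(y_pred) / n
    cov = sum((t - mt) * (p - mp) for t, p in zip(y_true, y_pred))
    st = math.sqrt(sum((t - mt) ** 2 for t in y_true))
    sp = math.sqrt(sum((p - mp) ** 2 for p in y_pred))
    return cov / (st * sp)


def rmse(y_true, y_pred):
    """Root-mean-square error, in the units of the target (here °C)."""
    return math.sqrt(sum((p - t) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mre_percent(y_true, y_pred):
    """Mean relative error in percent: 100/n * sum(|pred - true| / |true|)."""
    return 100.0 * sum(abs(p - t) / abs(t) for t, p in zip(y_true, y_pred)) / len(y_true)
```

For three hypothetical samples measured at 950, 960 and 970 °C and predicted at 952, 958 and 970 °C, these functions give R ≈ 0.982, RMSE ≈ 1.63 °C and MRE ≈ 0.140%.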

3.3. Na3AlF6(CR)-Al2O3-MgF2-CaF2-KF-LiF system

In this work, the CR value and the contents of Al2O3, MgF2, CaF2, KF and LiF were set as the independent variables, and the liquidus temperature for primary crystallization as the dependent variable. The dataset was collected from Ref. [54] and consists of 50 electrolyte samples with the CR and the contents of Al2O3, MgF2, CaF2, KF and LiF; the original data were used without dimensionality reduction or other pretreatment. The dataset was randomly divided into a training set of 35 samples and an external test set of 15 samples. BPANN, SVM, RFR and GBR were again used to build prediction models of the liquidus temperature for primary crystallization from these six variables, and their generalization ability was evaluated in the LOOCV test. The computation results are given in Fig. 7 and Table 4, and the predictions of the trained models in Fig. 7 for the 15 samples are shown in Fig. 8. Fig. 9 plots the experimental versus predicted liquidus temperatures for primary crystallization for the BPANN, SVM, RFR and GBR models, and Table 5 lists the corresponding RMSE and MRE of the Na3AlF6(CR)-Al2O3-MgF2-CaF2-KF-LiF electrolyte system in the LOOCV test. As Fig. 9 and Table 5 show, the BPANN model with default parameters has the best generalization ability among the models on the training dataset, with R, RMSE and MRE of 0.9650, 2.83 and 0.25, respectively. To obtain more accurate predictions, the hyper-parameters of the BPANN model were then optimized; the results are shown in Fig. 10 and Table 6. To compare the performance of the different models, the 15 independent test samples were used to verify the predictive performance of the models obtained from the BPANN optimization. Fig. 11 shows the experimental versus predicted liquidus temperatures for primary crystallization from the literature equation, the BPANN model and the SVM model. In this electrolyte system the AlF3 content has a significant influence on the liquidus temperature for primary crystallization, and the lack of information about some components may reduce the accuracy of the prediction models. For this dataset, the multiple regression model (the Zhang Yanli empirical formula) predicts more accurately than the ANN and SVM models.

3.4. Na3AlF6(CR)-Al2O3-AlF3-CaF2-MgF2-LiF-KF-NaF system

In constructing the prediction model, the CR value and the contents of Al2O3, AlF3, CaF2, MgF2, LiF, KF and NaF were set as the independent variables, and the liquidus temperature for primary crystallization as the dependent variable. These variables were used to build models with data mining algorithms, and the generalization ability of each model was validated by LOOCV. The experimental data for 203 samples come from our own laboratory tests and include the CR value and the contents of Al2O3, AlF3, CaF2, MgF2, LiF, KF and NaF; the original data were used without dimensionality reduction or other pretreatment. The dataset was randomly divided into a training set of 180 samples and an external test set of 23 samples. The four models were evaluated by LOOCV, as shown in Fig. 12 and Table 7; Fig. 12A–D plot the experimental versus predicted liquidus temperatures for primary crystallization in LOOCV for the BPANN, SVM, RFR and GBR models, respectively. On the training dataset, the SVM model has the best generalization ability, with the highest R (0.9061) and the lowest RMSE (7.33) and MRE (0.54). BPANN is second, with R = 0.8976, RMSE = 7.61 and MRE = 0.56. Because all four models in Fig. 12 use default parameters, a grid search was applied to optimize the hyper-parameters of the ANN model; the results are shown in Table 8, and the minimum RMSE achieved on the training dataset is 6.774. Fig. 13A and B show the predictions of the different models for the 23 test samples. The ANN with optimized hyper-parameters and the SVM generalize better than the other two models, with mean relative errors of 0.09551 and 0.10016 for the 23 samples, respectively. The optimized BPANN model and the empirical formulas in the literature were also used to predict the liquidus temperature for primary crystallization of the 23 samples; the results are shown in Fig. 13C, D and E. The BPANN model has the smallest mean relative error, and its predictions are clearly better than those of the empirical formulas in the literature [8,10,12,20,53,55]. It therefore has important application value in the aluminium industrial production process.

Table 7 RMSE and MRE of liquidus temperature for primary crystallization of Na3AlF6(CR)-Al2O3-MgF2-CaF2-KF-LiF-NaF electrolyte system in LOOCV test of BPANN, SVM, RFR and GBR models.

| No. | Model | RMSE (°C) | MRE (%) |
|---|---|---|---|
| 1 | BPANN | 7.61 | 0.56 |
| 2 | SVM | 7.33 | 0.54 |
| 3 | RFR | 7.87 | 0.60 |
| 4 | GBR | 7.96 | 0.60 |

Table 8 The optimized hyper-parameters of the ANN model obtained by grid search.

| No. | Network parameter | Value |
|---|---|---|
| 1 | Number of input layer nodes | 8 |
| 2 | Number of hidden layer nodes | 4 |
| 3 | Number of output layer nodes | 1 |
| 4 | Learning rate from input layer to hidden layer | 0.68 |
| 5 | Learning rate from hidden layer to output layer | 0.71 |
| 6 | Momentum | 0.4 |
| 7 | Training MRE | 0.499 |
| 8 | Training RMSE (°C) | 6.774 |

Fig. 13. Comparison of prediction results of the liquidus temperature for primary crystallization by ANN model and traditional empirical formulas.
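Sections 3.3 and 3.4 both split the data into training and external test sets, train a BPANN and tune it by grid search; Table 8 lists an 8-4-1 topology with two layer-wise learning rates and a momentum term. A minimal NumPy sketch of such a network follows. It assumes a sigmoid hidden layer, a linear output unit, full-batch gradient descent on 0.5·MSE and internal standardization of inputs and target; none of these details are stated in the paper. The data are synthetic, the 180/23 split mirrors Section 3.4, and for brevity the grid (hidden nodes × momentum) is scored on a validation split rather than by LOOCV. `BPANN`, `select_and_fit` and `synthetic_data` are hypothetical names.

```python
import numpy as np


class BPANN:
    """Back-propagation network: sigmoid hidden layer, linear output unit.

    Full-batch gradient descent with momentum on 0.5 * MSE. The
    input->hidden and hidden->output layers have separate learning rates,
    mirroring Table 8. Inputs and target are standardized internally
    (an assumption; the paper does not describe scaling).
    """

    def __init__(self, n_hidden=4, lr_ih=0.68, lr_ho=0.71, momentum=0.4,
                 epochs=1500, seed=0):
        self.n_hidden, self.lr_ih, self.lr_ho = n_hidden, lr_ih, lr_ho
        self.momentum, self.epochs, self.seed = momentum, epochs, seed

    def _hidden(self, Xs):
        return 1.0 / (1.0 + np.exp(-(Xs @ self.W1 + self.b1)))

    def fit(self, X, y):
        rng = np.random.default_rng(self.seed)
        self.x_mu, self.x_sd = X.mean(axis=0), X.std(axis=0)
        self.y_mu, self.y_sd = y.mean(), y.std()
        Xs = (X - self.x_mu) / self.x_sd
        ys = ((y - self.y_mu) / self.y_sd)[:, None]
        n, d = Xs.shape
        self.W1 = rng.normal(0.0, 0.5, (d, self.n_hidden))
        self.b1 = np.zeros(self.n_hidden)
        self.W2 = rng.normal(0.0, 0.5, (self.n_hidden, 1))
        self.b2 = np.zeros(1)
        params = (self.W1, self.b1, self.W2, self.b2)
        rates = (self.lr_ih, self.lr_ih, self.lr_ho, self.lr_ho)
        velocity = [np.zeros_like(p) for p in params]
        for _ in range(self.epochs):
            h = self._hidden(Xs)
            err = h @ self.W2 + self.b2 - ys            # dLoss/dOutput, shape (n, 1)
            dh = (err @ self.W2.T) * h * (1.0 - h)      # back-propagated through sigmoid
            grads = (Xs.T @ dh / n, dh.mean(axis=0), h.T @ err / n, err.mean(axis=0))
            for v, p, g, lr in zip(velocity, params, grads, rates):
                v *= self.momentum                      # momentum term
                v -= lr * g
                p += v                                  # in-place weight update
        return self

    def predict(self, X):
        h = self._hidden((X - self.x_mu) / self.x_sd)
        return (h @ self.W2 + self.b2)[:, 0] * self.y_sd + self.y_mu


def rmse(pred, true):
    return float(np.sqrt(np.mean((pred - true) ** 2)))


def synthetic_data(n=203, seed=0):
    """Hypothetical 8-input data (CR plus seven contents, rescaled to [0, 1])."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(0.0, 1.0, size=(n, 8))
    coef = np.array([60.0, -10.0, -15.0, -5.0, -8.0, -20.0, -12.0, 6.0])
    y = 930.0 + X @ coef + 15.0 * np.sin(3.0 * X[:, 0]) + rng.normal(0.0, 2.0, n)
    return X, y


def select_and_fit(X, y, hidden_grid=(3, 4, 6), momentum_grid=(0.2, 0.4), n_val=30):
    """Grid-search (n_hidden, momentum) on a hold-out split, then refit on all of X."""
    Xf, yf, Xv, yv = X[:-n_val], y[:-n_val], X[-n_val:], y[-n_val:]
    scores = {(h, m): rmse(BPANN(n_hidden=h, momentum=m).fit(Xf, yf).predict(Xv), yv)
              for h in hidden_grid for m in momentum_grid}
    best = min(scores, key=scores.get)
    return BPANN(n_hidden=best[0], momentum=best[1]).fit(X, y), best


if __name__ == "__main__":
    X, y = synthetic_data()
    order = np.random.default_rng(1).permutation(len(y))
    train, test = order[:180], order[180:]          # 180 / 23 split as in Section 3.4
    model, best = select_and_fit(X[train], y[train])
    pred = model.predict(X[test])
    mre = 100.0 * np.mean(np.abs(pred - y[test]) / y[test])
    print(f"best (n_hidden, momentum) = {best}, test RMSE = {rmse(pred, y[test]):.2f}, MRE = {mre:.3f}%")
```

Replacing the validation split with leave-one-out scoring only changes `select_and_fit`, but each grid point then costs one training run per sample.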

3.5. Sensitivity analysis

In this paper, liquidus temperature prediction models with good performance were constructed for electrolyte systems of different compositions. However, these models do not by themselves reveal how the liquidus temperature varies with a single component. To analyze the relationship between each component and the liquidus temperature of the Na3AlF6(CR)-Al2O3-AlF3-CaF2-MgF2-LiF-KF-NaF electrolyte system, a sensitivity analysis was used to study how the components affect the target value [56–63]. The back-propagation artificial neural network was used to analyze the single-factor sensitivity based on the 180-sample training dataset of Section 3.4, and the results are shown in Fig. 14.

Fig. 14A shows that the liquidus temperature increases with CR in the range 2–5.6 when the other factors are fixed (Al2O3 2.7%, AlF3 8.9%, LiF 2.72%, MgF2 0.66%, KF 1.89%, CaF2 4.54%). Fig. 14B shows that the liquidus temperature decreases as the Al2O3 concentration increases from 0.44% to 5.62% when the other factors are fixed (CR = 2.46, AlF3 8.9%, LiF 2.72%, MgF2 0.66%, KF 1.89%, CaF2 4.54%). Fig. 14C shows that the liquidus temperature varies with the AlF3 concentration in the range 33.67%–42.42% and reaches a maximum of about 937 °C at an AlF3 concentration of about 36% when the other factors are fixed (CR = 2.46, Al2O3 2.7%, LiF 2.72%, MgF2 0.66%, KF 1.89%, CaF2 4.54%). Fig. 14D shows that the liquidus temperature decreases as the LiF concentration increases from 0.14% to 6.01% when the other factors are fixed (CR = 2.46, Al2O3 2.7%, AlF3 38.9%, MgF2 0.66%, KF 1.89%, CaF2 4.54%). Fig. 14E shows that the liquidus temperature decreases as the MgF2 concentration increases from 0.32% to 3.73% when the other factors are fixed (CR = 2.46, Al2O3 2.7%, AlF3 38.9%, LiF 2.72%, KF 1.89%, CaF2 4.54%). Fig. 14F shows that the liquidus temperature decreases as the KF concentration increases from 0.18% to 4.92%. In Fig. 14G, the liquidus temperature reaches its minimum at a CaF2 concentration of 4.4% and increases with the CaF2 concentration between 4.4% and 6.5%. From this analysis, for local changes around the base setting point in the experiment, an electrolyte with a lower liquidus temperature may be obtained either by decreasing the CR and the NaF concentration or by increasing the concentrations of Al2O3, AlF3, LiF, MgF2, KF and CaF2.

Fig. 14. The results of sensitivity analysis. A: LT versus CR; B: LT versus Al2O3 concentration; C: LT versus AlF3 concentration; D: LT versus LiF concentration; E: LT versus MgF2 concentration; F: LT versus KF concentration; G: LT versus CaF2 concentration.
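The one-factor-at-a-time procedure behind Fig. 14 can be sketched as follows: a fitted model is evaluated on a grid in which one input sweeps its range while all other inputs stay at the base point, and the resulting curve is classified by its trend. This is an illustration under assumptions: `predict` can be any fitted model's prediction function (in the paper it is the trained BPANN), and `toy_predict` is a hypothetical stand-in with generic inputs x0, x1 and x2, not a model of any real electrolyte.

```python
import numpy as np


def one_factor_sweep(predict, base, index, values):
    """Vary input `index` over `values`, holding every other input at `base`.

    `predict` maps an (n, d) array to n predictions, e.g. a trained
    network's predict method. Returns the predicted response curve.
    """
    grid = np.tile(np.asarray(base, dtype=float), (len(values), 1))
    grid[:, index] = values
    return np.asarray(predict(grid), dtype=float)


def trend(curve, tol=1e-9):
    """Label a response curve as increasing, decreasing or non-monotonic."""
    steps = np.diff(curve)
    if np.all(steps > tol):
        return "increasing"
    if np.all(steps < -tol):
        return "decreasing"
    return "non-monotonic"


def toy_predict(X):
    """Hypothetical stand-in model with inputs x0, x1, x2 (not the paper's BPANN)."""
    return 900.0 + 30.0 * X[:, 0] - 6.0 * X[:, 1] - 0.8 * (X[:, 2] - 3.0) ** 2


if __name__ == "__main__":
    base = [2.5, 3.0, 1.0]  # hypothetical base setting point
    sweeps = {0: np.linspace(2.0, 5.6, 10),
              1: np.linspace(0.5, 5.5, 11),
              2: np.linspace(0.0, 6.0, 61)}
    for i, values in sweeps.items():
        curve = one_factor_sweep(toy_predict, base, i, values)
        print(f"x{i}: {trend(curve)}, max at {values[np.argmax(curve)]:.2f}")
```

Because each sweep holds the other inputs at a single base point, the trends it reports are local, which matches the paper's caveat that the conclusions apply to changes around the base setting point.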

4. Conclusion

The essential differences between the complex industrial electrolyte system and the electrolyte system prepared in the laboratory were revealed by comparing their micro-morphology, phase composition and thermal analysis results. Different empirical formulas are suitable only for particular electrolyte systems and application scenarios. BPANN, SVM, RFR and GBR models were employed to predict the liquidus temperature for primary crystallization of different types of electrolyte systems, and a suitable prediction model can be obtained with data mining algorithms. The relationship between the liquidus temperature for primary crystallization and the electrolyte composition was revealed by data mining methods, and soft-measurement models of the liquidus temperature for primary crystallization were established for different types of electrolyte systems. These models help address the high cost of measuring the liquidus temperature for primary crystallization and the large prediction errors of traditional formulas, and provide a new way to predict this property for industrial electrolytes. The proposed method therefore has good application prospects in the electrolytic aluminium industry.

Declaration of competing interest

We declare that we have no financial or personal relationships with other people or organizations that could inappropriately influence our work, and no professional or other personal interest of any nature or kind in any product, service and/or company that could be construed as influencing the position presented in, or the review of, this manuscript.

Acknowledgements

The authors gratefully acknowledge the financial support of the National Natural Science Foundation of China (Grant No. 51734002), the China Aluminum Star Plan program (Grant No. 2017MXJH05) and the research program of the China Development Fund of Aluminum International Engineering Co., Ltd. (Grant No. CJ2015JS-05). We are also grateful to Yi-feng Lan, laboratory director of Zunyi Aluminum Co. Ltd., for the electrolyte composition testing.

References

[1] Z. Qiu, Aluminium Electrolysis by Prebaked Anode Cell, Metallurgical Industry Press, Beijing, 2005.
[2] N. Feng, Aluminium Electrolysis, Chemical Industry Press, Beijing, 2006.
[3] W. Fu, Z. Wang, Y. Yang, et al., Study on the liquidus temperature of Na3AlF6-AlF3-Al2O3-CaF2-KCl aluminum electrolyte, J. Metall. Anal. 33 (2013) 28–34.
[4] Y. Liu, J. Li, Modern Aluminium Electrolysis, Metallurgical Industry Press, Beijing, 2008.
[5] H. Warren, The liquidus enigma, J. Warrendale Miner. Met. Mater. Soc. (1992) 477–480.
[6] J. Chen, D. Li, Molten salts properties and electrolyte compositions with same solubility of alumina at 20 °C above liquidus of aluminium electrolyte for Na3AlF6-AlF3-LiF-MgF2-CaF2 system, J. Light Met. 1 (2009) 22–26.
[7] H. Kan, Z. Wang, Y. Ban, Z. Shi, Z. Qiu, Effect of NaCl and LiF on liquidus temperature of molten cryolite-based aluminum electrolyte, J. Metall. Anal. 27 (2007) 13–17.
[8] F. Ren, H. Li, Z. Cai, Y. You, Study on new device for testing liquidus temperature of aluminium electrolyte and mathematical model of liquidus temperature, J. Metall. Anal. 25 (2005) 9–12.
[9] Q. Xu, Z. Qiu, Y. Yu, Multiple regressive analysis and prediction of liquidus temperatures of cryolite with additives, J. Nonferrous Metals 47 (1995) 70–73.
[10] A. Rotum, A. Solheim, A. Sterten, Phase diagram data in the system Na3AlF6-Li3AlF6-AlF3-Al2O3, J. Warrendale Miner. Met. Mater. Soc. (1990) 311–316.
[11] S. Rolseth, P. Verstreken, O. Kohbeltvedt, Liquidus temperature determination in molten salts, J. Light Met. (1998) 359–365.
[12] A. Solheim, S. Rolseth, E. Skybakmoen, et al., Liquidus temperature and alumina solubility in the system Na3AlF6-AlF3-LiF-CaF2-MgF2-KF-Al2O3, J. Light Met. (1995) 451–460.
[13] D.A. Chin, E. Hollingshead, A.J. Electrochem, The liquidus temperature of K3AlF6-Na3F6 system, J. Am. Ceram. Soc. (1966) 113–119.
[14] G. Tarce, Determination of factors affecting current efficiency in commercial Hall cells using controlled potential coulometry and statistical experiments, J. Light Met. (1991) 453–459.
[15] D.P. Ray, T.T. Alton, Liquidus curves for the cryolite-AlF3-CaF2-Al2O3 system in aluminum cell electrolytes, J. Warrendale Miner. Met. Mater. Soc. (1987) 383–388.
[16] S. Rolseth, H. Gudbrandsen, J. Thonstad, et al., Low temperature aluminum electrolysis in a high density electrolyte, J. Alum. 81 (2005) 448–450.
[17] G. Tsirlina, E. Antipov, D. Glukhov, A. Gusev, V. Laurinavichute, R. Nazmutdinov, D. Simakov, S. Vassiliev, T. Zinkicheva, Specific molecular features of potassium-containing cryolite melts, J. Light Met. (2012) 787–791.
[18] A. Solheim, Liquidus temperature depression in cryolitic melts, J. Metall. Mater. Trans. B 43B (2012) 995–999.
[19] A. Solheim, S. Rolseth, E. Skybakmoen, L. Stoen, S. Sterten, T. Store, Liquidus temperatures for primary crystallization of cryolite in molten salt systems of interest for aluminum electrolysis, J. Metall. Mater. Trans. B 27B (1996) 739–744.
[20] A. Solheim, S. Rolseth, E. Skybakmoen, L. Stoen, Liquidus temperature and alumina solubility in the system Na3AlF6-AlF3-LiF-CaF2-MgF2-KF-Al2O3, J. Light Met. (1995) 73–82.
[21] D. Rumelhart, J. McClelland, Parallel Distributed Processing: Explorations in the Microstructure of Cognition, vols. 1 and 2, Bradford Books, MIT Press, Cambridge, 1986.
[22] R. Ruffini, et al., Using neural network for springback minimization in a channel forming process, SAE Trans. J. Mater. Manuf. 107 (1998) 65–69.
[23] H. Lu, X. Hu, B. Cao, W. Chai, F. Yan, Prediction of liquidus temperature for complex electrolyte systems Na3AlF6-AlF3-CaF2-MgF2-Al2O3-KF-LiF based on the machine learning methods, J. Chemom. Intell. 189 (2019) 110–120.
[24] L. Breiman, Random forests, Mach. Learn. 45 (2001) 5–32.
[25] I. Vatolkin, M. Preuß, G. Rudolph, M. Eichhoff, C. Weihs, Multi-objective evolutionary feature selection for instrument recognition in polyphonic audio mixtures, J. Soft Comput. 16 (2012) 2027–2047.
[26] C.C. Yeh, D.J. Chi, Y.R. Lin, Going-concern prediction using hybrid random forests and rough set approach, J. Inf. Sci. 254 (2014) 98–110.
[27] F. Chen, H. Howard, An alternative model for the analysis of detecting electronic industries earnings management using stepwise regression, random forest, and decision tree, J. Soft Comput. 20 (2016) 1945–1960.
[28] N. Duffy, D. Helmbold, Boosting methods for regression, Mach. Learn. 47 (2002) 153–200.
[29] Y. Song, H. Zhou, P. Wang, M. Yang, Prediction of clathrate hydrate phase equilibria using gradient boosted regression trees and deep neural networks, J. Chem. Thermodyn. 135 (2019) 86–96.
[30] C. Persson, P. Bacher, Multi-site solar power forecasting using gradient boosted regression trees, J. Sol. Energy 105 (2017) 423–436.
[31] B. Gao, S. Wang, X. Hu, Liquidus temperatures of Na3AlF6-AlF3-CaF2-KF-LiF-Al2O3 melts, J. Chem. Eng. Data 55 (2010) 5214–5215.
[32] A. Apisarov, A. Dedyukhin, E. Nikolaeva, P. Tinghaev, O. Tkacheva, A. Redkin, Y. Zaikov, Liquidus temperatures of cryolite melts with low cryolite ratio, J. Metall. Mater. Trans. B 42B (2011) 236–242.
[33] A. Sterten, O. Skar, Some binary Na3AlF6-MxOy phase diagrams, J. Alum. 64 (1988) 1051–1054.
[34] K. Hongmin, W. Zhaowen, S. Zhongning, B. Yungang, C. Xiaozhou, Y. Shaohua, Q. Zhuxian, Liquidus temperature, density and electrical conductivity of temperature electrolyte for aluminum electrolysis, J. Light Met. (2007) 531–535.
[35] H. Yan, J. Yang, W. Li, S. Chen, Alumina solubility in KF-NaF-AlF3-based low-temperature electrolyte, J. Met. Trans. B 42B (2011) 1065–1070.
[36] A. Redkin, O. Tkacheva, Y. Zaikov, A. Apisarov, Modeling of cryolite-alumina melts properties and experimental investigation of low melting electrolytes, J. Light Met. (2007) 513–517.
[37] R. Burbidge, M. Trotter, B. Buxton, S. Holden, Drug design by machine learning: support vector machines for pharmaceutical data analysis, Comput. Chem. 26 (2001) 5–14.
[38] P. Raccuglia, K.C. Elbert, P.D. Adler, et al., Machine-learning-assisted materials discovery using failed experiments, J. Nat. 533 (2016) 73–76.
[39] Q. Su, W. Lu, D. Du, et al., Prediction of the aquatic toxicity of aromatic compounds to Tetrahymena pyriformis through support vector regression, J. Oncotarget 8 (2017) 49359–49369.
[40] G. Mountrakis, J. Im, C. Ogole, Support vector machines in remote sensing: a review, ISPRS J. Photogrammetry 66 (2011) 247–259.
[41] R. Burbidge, M. Trotter, B. Buxton, S. Holden, Drug design by machine learning: support vector machines for pharmaceutical data analysis, J. Comput. Chem. 26 (2001) 5–14.
[42] H. Li, Q. Xu, Y. Liang, LibPLS: an integrated library for partial least squares regression and linear discriminant analysis, J. Chemom. Intell. 176 (2018) 34–43.
[43] J. Cheng, H. Yang, M. Liu, W. Su, P. Feng, H. Ding, W. Chen, H. Lin, Prediction of bacteriophage proteins located in the host cell using hybrid features, J. Chemom. Intell. 180 (2018) 64–69.
[44] Q. Zhang, D. Chang, X. Zhai, W. Lu, OCPMDM: Online computation platform for materials data mining, J. Chemom. Intell. Lab 177 (2018) 26–34.
[45] M.E. Tipping, Sparse Bayesian learning and the relevance vector machine, J. Mach. Learn. Res. 1 (2001) 211–244.
[46] N. Chen, W. Lu, R. Chen, C. Li, P. Qin, Chemometric methods applied to industrial optimization and materials optimal design, J. Chemom. Intell. Lab 45 (1999) 329–333.
[47] K. Fujimura, A. Seko, Y. Koyama, A. Kuwabara, I. Kishida, K. Shitara, et al., Accelerated materials design of lithium superionic conductors based on first-principles calculations and machine learning algorithms, J. Adv. Energy Mat. 3 (2013) 980–985.
[48] D. Xue, P.V. Balachandran, J. Hogden, et al., Accelerated search for materials with targeted properties by adaptive design, J. Nat. Commun. 7 (2016) 1–9.
[49] M. Attarian Shandiz, R. Gauvin, Application of machine learning methods for the prediction of crystal system of cathode materials in lithium-ion batteries, J. Comput. Mat. Sci. 117 (2016) 270–278.
[50] M. Nazemi, A. Heidaripanah, Support vector machine to predict the indirect tensile strength of foamed bitumen-stabilised base course materials, Road Mater. Pavement Des. 17 (2016) 768–778.
[51] T.O. Owolabi, K.O. Akande, S.O. Olatunji, Application of computational intelligence technique for estimating superconducting transition temperature of YBCO superconductors, Appl. Soft Comput. 43 (2016) 143–149.
[52] W. Yu, Z. Jiang, J. Wang, R. Tao, Using feature selection technique for drug-target interaction networks prediction, Curr. Med. Chem. 18 (2011) 5687–5693.
[53] Y. Liu, F. Ren, Application of BP neural network to predicting value of bath ratio in aluminum electrolyte, J. Metall. Anal. 26 (2006) 28–31.
[54] Y. Zhang, S. Qiu, Research on the liquidus temperature of NaF-AlF3-Al2O3-CaF2-LiF-MgF2-KF industrial aluminum electrolyte, J. Light Met. 1 (2019) 34–39.
[55] D. Li, X. Wang, Research on mathematical models of liquidus temperature and conductivity of electrolyte in aluminium electrolysis, J. BGRIMM 12 (1993) 59–64.
[56] W. Lu, X. Ji, M. Li, L. Liu, B. Yue, L. Zhang, Using support vector machine for materials design, J. Adv. Manuf. 1 (2013) 151–159.
[57] V. Vapnik, Statistical Learning Theory, Wiley, New York, 1998.
[58] R.R. Nazmutdinov, et al., A spectroscopic and computational study of Al(III) complexes in sodium cryolite melts: ionic composition in a wide range of cryolite ratios, J. Spectrochim. Acta 75 (2010) 1244–1252.
[59] X. Wang, Z. Chen, C. Wang, R. Yan, Z. Zhang, J. Song, Predicting residue-residue contacts and helix-helix interactions in transmembrane proteins using an integrative feature-based random forest approach, PLoS One 6 (2011) 1–11.
[60] M.P.S. Brown, W.N. Grundy, D. Lin, N. Cristianini, C.W. Sugnet, T.S. Furey, M. Ares, D. Haussler, Knowledge-based analysis of microarray gene expression data by using support vector machines, Proc. Natl. Acad. Sci. U. S. A. 97 (2000) 262–267.
[61] C. Chi, L.T. Yi, S. Ming, J.B.K. Hsu, H. Jorng-Tzong, H. Po-Chiang, W. Yuan, H. Hsien-Da, P. Long, Incorporating support vector machine for identifying protein tyrosine sulfation sites, J. Comput. Chem. 30 (2009) 2526–2537.
[62] M. Wang, K. Wang, A. Yan, C. Yu, Classification of HCV NS5B polymerase inhibitors using support vector machine, J. Mol. Sci. Int. Ed. 13 (2012) 4033–4047.
[63] X. Zhai, M. Chen, W. Lu, Accelerated search for perovskite materials with higher Curie temperature based on the machine learning methods, J. Comput. Mater. Sci. 151 (2018) 41–48.