Accepted Manuscript
Data mining-aided materials discovery and optimization
Wencong Lu, Ruijuan Xiao, Jiong Yang, Hong Li, Wenqing Zhang
PII: S2352-8478(17)30061-8
DOI: 10.1016/j.jmat.2017.08.003
Reference: JMAT 104
To appear in: Journal of Materiomics
Received Date: 1 August 2017
Revised Date: 7 August 2017
Accepted Date: 7 August 2017
Please cite this article as: Lu W, Xiao R, Yang J, Li H, Zhang W, Data mining-aided materials discovery and optimization, Journal of Materiomics (2017), doi: 10.1016/j.jmat.2017.08.003. This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
Graphical Abstract
Data Mining-aided Materials Discovery and Optimization
Wencong Lu1*, Ruijuan Xiao2, Jiong Yang1, Hong Li2*, Wenqing Zhang1*
Fig. 10(a) The correlation between Nv*K* and the calculated power factors;50 (b) the pseudocubic structure and converged band structures in tetragonal chalcopyrite compounds.51
Highlights—review paper
Both qualitative and quantitative methods widely adopted in materials data mining (MDM) are systematically reviewed with respect to different tasks in materials discovery and optimization.
A novel qualitative method, the optimal projection recognition technique, is reviewed in detail for the controllable synthesis of dendritic Co3O4 superstructures based on a pattern recognition classification diagram.
The detailed MDM process is demonstrated in a case study on the materials design of layered double hydroxide with desired basal spacing, based on the quantitative modelling method called relevance vector machine.
The state of the art in data mining-aided battery materials discovery and thermoelectric materials design is reviewed, indicating that the MDM approach may play a more important role in discovering novel materials in the future.
Data Mining-aided Materials Discovery and Optimization
Wencong Lu1*, Ruijuan Xiao2, Jiong Yang1, Hong Li2*, Wenqing Zhang1*
1 Materials Genome Institute of Shanghai University, and Shanghai Materials Genome Institute, Shanghai 200444, China
2 Institute of Physics, Chinese Academy of Sciences, Beijing, China
Tel.: 86-021-66132406
E-mail: [email protected]; [email protected]; [email protected]
Abstract
Recent developments in data mining-aided materials discovery and optimization are reviewed in this paper, and an introduction to the materials data mining (MDM) process is provided using case studies. Both qualitative and quantitative methods in machine learning can be adopted in the MDM process to accomplish different tasks in materials discovery, design, and optimization. State-of-the-art techniques in data mining-aided materials discovery and optimization are demonstrated by reviewing the controllable synthesis of dendritic Co3O4 superstructures, materials design of layered double hydroxide, battery materials discovery, and thermoelectric materials design. The results of the case studies indicate that MDM is a powerful approach for use in materials discovery and innovation, and will play an important role in the development of the Materials Genome Initiative and Materials Informatics.
Keywords: Data mining, materials design, Co3O4 superstructures, layered double hydroxide, battery materials, thermoelectric materials, materials genome initiative
1. Introduction
The Materials Genome Initiative (MGI) global competition was proposed in June 2011 to encourage the development of an infrastructure to shorten the materials development cycle.1 This challenge provides an incentive for scientists to develop, manufacture, and deploy advanced materials as fast as possible, in contrast to the traditional discovery and optimization of materials, which is time-consuming, labor-intensive, complex, and expensive. It is very difficult to solve most of the complicated problems in materials exploration by using only first principles, i.e., quantum mechanics and statistical mechanics, although many first-principles approaches are truly helpful in materials discovery and optimization.2 Besides the first-principles strategy, data mining or machine learning is a semi-empirical strategy that uses known data about the properties and descriptors (including both computational and experimental parameters) of some materials to find semi-empirical rules, and uses these rules to predict and evaluate the properties of unknown materials.3
In recent years, materials data mining (MDM) approaches have developed rapidly, as known data about some materials can be used not only to establish criteria for the formation of new materials, but also to construct quantitative structure-property relationship (QSPR) models for predicting the properties of unknown and optimal materials. For example, Xue et al. demonstrated an efficient approach based on an adaptive design strategy that could accelerate the search for materials with targeted properties by using MDM. In this study, inference and global optimization were considered simultaneously, and Ni-Ti-based shape memory alloys with the lowest thermal hysteresis were successfully found.4 Raccuglia et al. reported that the support vector machine (SVM) technique in machine learning could be applied to predict reaction outcomes for the crystallization of templated vanadium selenites with better accuracy than predictions obtained from experts.5 Fischer et al. predicted crystal structures by combining modern quantum mechanical methods with machine learning techniques.6 Shi et al. reviewed the applications of multi-scale computation methods in lithium-ion battery research and development, in which calculations and experiments are linked by a large shared database, enabling accelerated development of the whole industrial chain.7 Lu et al. demonstrated the application of SVM to the formability of the perovskite or BaNiO3 structure, the prediction of energy gaps of binary compounds, the prediction of the sintered cold modulus of sialon-corundum castable, the optimization of electric resistances of VPTC semiconductors, and the thickness control of In2O3 semiconductor film preparation.3
In this review paper, we briefly introduce data mining methods that can be used for materials discovery and optimization in Section 2; specifically, we describe the methods used in the case studies discussed later. Two typical case studies implemented in our labs are described in detail in Section 3; these demonstrate the process of data mining-aided materials discovery and optimization. State-of-the-art data mining-aided battery and thermoelectric materials discovery are reviewed in Sections 4 and 5, respectively. Some of the issues encountered in data mining approaches to materials discovery and optimization are briefly discussed in Section 6.
2. Machine Learning for Materials Discovery and Optimization
Machine learning algorithms can be classified by learning style into supervised learning (Decision Tree, Boosting, Artificial Neural Network, SVM, etc.), unsupervised learning (Clustering, Association Rules, etc.), and semi-supervised learning techniques (Generative models, Low-density separation, etc.).8 Supervised learning methods are widely used in MDM, especially in the construction of QSPR models. In this approach, labeled data are used to train a machine learning model that predicts the relationship between targeted properties and features. Choosing an appropriate algorithm for practical implementation can be a great challenge in MDM, since many machine learning algorithms are available. In MDM, different kinds of qualitative and quantitative machine learning methods are used for materials discovery and optimization. Unsupervised machine learning is the task of inferring a function that describes hidden structure from unlabeled data (no classification or categorization is included in the observations). In semi-supervised learning, unlabeled data are used along with the labeled data to improve the accuracy of the models. In this paper, we describe only the most popular and commonly used methods and demonstrate their application using case studies.
2.1 Qualitative methods used for materials discovery
Pattern recognition methods are qualitative methods used frequently in materials discovery.9 Qualitative here means that different kinds of material samples can be distinguished by using a classification diagram. Widely used pattern recognition methods include K nearest neighbor (KNN), principal component analysis (PCA), Fisher vector, and partial least squares (PLS), among others.10 Independent variables (often called features or descriptors) that influence the target (dependent variables) are used to span multidimensional spaces. Material samples of different classes are represented as sample points with different symbols in these spaces. Then, pattern recognition methods are used to construct the distribution zones of the different kinds of sample points. In this way, a mathematical model that describes the regularities of the pattern recognition diagram is obtained. Next, we introduce the optimal projection recognition (OPR) algorithm used in case study 1, which is described in Section 3.1.11 The OPR method is a novel pattern recognition technique developed by our research group that selects the projection map with the best separability among the various classification diagrams of pattern recognition. In the OPR process, features are used to span a multidimensional space, and the representative points of the samples are plotted in this space. An optimal region that encloses the optimal sample points is then obtained on the projection diagram. The criteria corresponding to the borderlines of the optimal region describe the zone where the known optimal samples are distributed. Therefore, the OPR model can be expressed by a series of inequalities that describe the boundaries of the different kinds of samples. Since the projection map obtained is the one with the best separability among the various pattern recognition diagrams, the separation is usually satisfactory, provided that the original classification of the different classes in the multidimensional space is well defined. The trial version of the HyperMiner software containing the OPR method can be downloaded from the website of the Laboratory of Material Data Mining at Shanghai University (http://chemdata.shu.edu.cn:8080/MyLab/Lab/download.jsp).
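The idea of projecting samples onto a plane and describing the optimal zone by inequalities can be illustrated with a minimal sketch. The sketch is not the OPR algorithm itself: it assumes a plain PCA projection as a stand-in, bounds the target class by an axis-aligned box in the projected plane, and uses synthetic data. The function names `fit_projection_box` and `in_optimal_zone` are hypothetical.

```python
import numpy as np

def fit_projection_box(X, labels, target_class=1):
    """Project samples onto two principal axes and box in one class."""
    mean, std = X.mean(axis=0), X.std(axis=0)
    Z = (X - mean) / std                          # standardize each descriptor
    # Principal axes are the right singular vectors of the standardized data.
    _, _, vt = np.linalg.svd(Z, full_matrices=False)
    axes = vt[:2]                                 # shape (2, n_features)
    P = Z @ axes.T                                # coordinates in the projection plane
    inside = P[labels == target_class]
    return {"mean": mean, "std": std, "axes": axes,
            "lower": inside.min(axis=0), "upper": inside.max(axis=0)}

def in_optimal_zone(model, x):
    """Check lower <= a . z <= upper for both projection axes (linear inequalities)."""
    p = ((x - model["mean"]) / model["std"]) @ model["axes"].T
    return bool(np.all(p >= model["lower"]) and np.all(p <= model["upper"]))

# Synthetic demonstration data (not from the paper): two well-separated classes.
rng = np.random.default_rng(0)
class1 = rng.normal(loc=0.0, scale=1.0, size=(20, 3))
class2 = rng.normal(loc=8.0, scale=1.0, size=(20, 3))
X = np.vstack([class1, class2])
labels = np.array([1] * 20 + [2] * 20)

model = fit_projection_box(X, labels, target_class=1)
print(in_optimal_zone(model, class1.mean(axis=0)))
print(in_optimal_zone(model, class2.mean(axis=0)))
```

Each bound of the box corresponds to one linear inequality in the original descriptors, which is the same form as the criteria reported for an OPR model. A real application would choose the projection for class separability rather than for variance.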
2.2 Quantitative methods used for materials optimization
Quantitative methods commonly used in materials optimization include multiple linear or nonlinear regression, artificial neural networks (ANN), support vector regression (SVR), and relevance vector machine (RVM), among others.12 These approaches are especially suitable for constructing QSPR models dealing with multivariate and nonlinear problems. Next, we introduce the RVM algorithm adopted in case study 2, which is described in Section 3.2. RVM is a machine learning technique that uses Bayesian inference to obtain parsimonious solutions for regression.13 RVM was introduced by Tipping as an alternative to the popular SVR algorithm. It operates in a framework similar to generalized linear regression, but uses $y(X, W) = \sum_{i=1}^{n} w_i K(X, X_i) + w_0$ as the generative model, where $K(X, X_i)$ is the kernel function, $W = [w_1, w_2, w_3, \ldots, w_n]$ is the weight vector, and $w_0$ is the bias. In the RVM, the training dataset is expressed as $T = \{X_n, t_n\}$, where $X_n$ is the input vector and $t_n$ is the target value. The target values, which are assumed to include additive noise, can be represented as $t_n = y(X_n, W) + \varepsilon_n$, where $\varepsilon_n$ is the error, modelled as zero-mean Gaussian noise with variance $\sigma^2$. Since the noise $\varepsilon_n$ is assumed to follow a Gaussian distribution, the target values also obey a Gaussian distribution with mean $y(X_n)$ and variance $\sigma^2$, i.e., $p(t_n \mid X_n) \sim N(t_n \mid y(X_n), \sigma^2)$. This problem can be solved by using the maximum likelihood (ML) method.
2.3 Flow chart of the MDM process
Fig. 1 illustrates the flow chart of the MDM process, which comprises data preparation, descriptor selection, model selection, model evaluation, and model application. The main functions of the different modules are explained to illustrate the MDM process.
Fig. 1 Flow chart of the MDM process
(1) Data preparation
The dataset comprises dependent and independent variables associated with material samples. Dependent variables refer to the target properties of materials, which are affected by independent variables called features or descriptors. The descriptors include parameters such as chemical compositions, atomic or molecular parameters, and the process conditions used to prepare the materials. Candidate descriptors are initially selected by using domain knowledge from materials science and engineering.
(2) Descriptor selection
Descriptor selection is a great challenge in MDM because key descriptors are the decisive factors in the construction of the machine learning model. The purpose of descriptor selection is to find the most influential features that model the target properties without redundancy. Therefore, descriptor selection should reduce the dimensionality of the input space without loss of important information. There are two types of descriptor selection techniques: those that analyze correlations of descriptors without involving any modelling method, and those that are combined with a modelling method. The Pearson correlation map, which does not involve any modelling method, is commonly used to analyze correlations of candidate descriptors. Another technique to analyze correlations of descriptors without involving any modelling method is called minimal-Redundancy-Maximal-Relevance (mRMR).14 The descriptor selection process combined with a modelling method mainly consists of two parts: feature subset generation and feature subset evaluation. Feature subset generation is a search procedure that generates subsets of features for evaluation. Feature subset evaluation determines which features should be eliminated or retained. In our case study in Section 3.2, the advanced feature subset generation method called genetic algorithm (GA)15 was used to find a compact set of key features among the candidate descriptors, while the results of descriptor selection were evaluated based on the root mean squared errors (RMSEs) of leave-one-out cross-validation (LOOCV) obtained by applying the RVM method to the feature subsets generated by GA individuals.16 The GA is a metaheuristic inspired by the process of natural selection and belongs to a larger class of evolutionary algorithms that are used to generate high-quality solutions to global optimization and search problems.
(3) Model selection
It is very important for researchers to construct an optimal model that fits the available data without overfitting or underfitting. Qualitative or quantitative models that relate targeted properties to their candidate descriptors should be constructed using different machine learning methods. The optimal model among these is selected based on cross-validation results. In our case study in Section 3.2, the model with the lowest RMSE of LOOCV is considered to outperform the other models. Besides RMSE, the mean squared error (MSE) and the coefficient of determination (R2), among others, can also be used to evaluate the performance of machine learning models in cross-validation.
(4) Model evaluation
Following descriptor selection, the performance of the constructed machine learning model is evaluated in terms of the predictive results of test sets.
The model developed with the selected descriptors and the training set is applied to predict the targeted properties of the test set. In general, the lower the RMSE obtained for the test set, the better the model.
(5) Model application
Based on the data mining results, the developed machine learning model can be applied to predict the targeted properties of unknown samples when their features are input into the model. Therefore, new materials with desired properties can be chosen from hypothetical samples, based on the predicted results, before they are synthesized. At present, machine learning models expressed by complicated functions or nonlinear maps are not convenient to use directly. To facilitate the application of the developed machine learning models, we suggest their use in online services for predicting targeted properties. This will ensure that new materials with desired properties can be obtained after a user inputs the necessary information about the unknown sample. The MDM process may be repeated to update the MDM model if more data are obtained during research work. Xue et al. (2016) developed an adaptive design loop that combines machine learning with experiment and/or computation to accelerate the discovery of new materials with targeted properties.4 An adaptive design loop is an iterative feedback loop that incorporates data mining, design, and experiment. The iteration continues until the targeted properties are achieved.
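The interplay between feature subset generation and feature subset evaluation described in the descriptor selection and model selection steps can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure used in the case studies: greedy forward selection stands in for the genetic algorithm, ordinary least squares stands in for RVM, and the data are synthetic. The function names `loocv_rmse` and `forward_select` are hypothetical.

```python
import numpy as np

def loocv_rmse(X, y):
    """Leave-one-out RMSE of ordinary least squares with an intercept."""
    n = len(y)
    A = np.column_stack([np.ones(n), X])
    errors = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i                  # hold out sample i
        coef, *_ = np.linalg.lstsq(A[keep], y[keep], rcond=None)
        errors[i] = y[i] - A[i] @ coef
    return float(np.sqrt(np.mean(errors ** 2)))

def forward_select(X, y):
    """Greedily add the descriptor that lowers the LOOCV RMSE the most."""
    selected, best = [], np.inf
    remaining = list(range(X.shape[1]))
    while remaining:
        # Subset generation: try each remaining descriptor in turn.
        scores = {j: loocv_rmse(X[:, selected + [j]], y) for j in remaining}
        j_best = min(scores, key=scores.get)
        # Subset evaluation: stop when LOOCV RMSE no longer improves.
        if scores[j_best] >= best:
            break
        best = scores[j_best]
        selected.append(j_best)
        remaining.remove(j_best)
    return selected, best

# Synthetic data: only descriptors 0 and 2 influence the target.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 5))
y = 5.0 * X[:, 0] + 1.0 * X[:, 2] + 0.01 * rng.normal(size=60)

selected, rmse = forward_select(X, y)
print(selected, rmse)
```

A greedy search can miss descriptor combinations that a population-based search such as a GA would explore, which is one reason a GA may be preferred when descriptors interact.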
3. Case Studies
3.1 Pattern recognition of dendritic Co3O4 superstructures11
Specific morphologies and crystallographic phases of transition metal oxide nanostructured materials are being studied owing to their optical, magnetic, and electric properties. The ability to tune the structure, size, and shape of inorganic materials is an important goal in state-of-the-art material syntheses.17 However, finding the optimal conditions for controllable synthesis is still an open challenge because of the complicated systems and processes involved.18 Cobalt (II, III) oxide (Co3O4) is an important semiconductor oxide, and its synthesis and properties have been of interest owing to possible applications in several fields, such as energy storage and conversion, catalysis, and gas sensing.19 As a unique class of structured materials, dendritic Co3O4 superstructures have attracted research efforts owing to their specific morphology and potential applications. However, shape-controlled synthesis of dendritic Co3O4 superstructures is a very difficult task in materials design. In our study, we applied OPR to predict the formation of the dendritic Co3O4 superstructures shown in Fig. 2, and the test results were verified in our lab. The optimal zone where dendritic Co3O4 is located can be obtained by using the OPR method. New samples predicted to be dendritic Co3O4 were analyzed using inverse projection20 based on the OPR method. The predicted results agreed well with our experiments. Therefore, we concluded that our work is useful in materials design and controllable synthesis of nanomaterials.
Fig. 2 Dendritic Co3O4 superstructures
The multi-dimensional pattern space in this case was spanned by features including the ratio of Co(NO3)2•6H2O to trisodium citrate (Co2+/TSC), the reaction temperature (T), the reaction duration time (t), and the amounts of deionized water (W) and acetone (Ace). The samples were classified into two kinds depending on whether or not their morphologies are dendritic structures. Samples with dendritic structures were grouped into “class 1,” while the others were grouped into “class 2.” The classification diagram obtained by using the OPR method is shown in Fig. 3. The following inequalities were used as the criteria for identifying dendritic structures using the OPR method.
4536 ≤ 29.57[T] + 6.747[Co2+/TSC] - 61.93[t] + 1.382[W] - 1.382[Ace] ≤ 5651
4021 ≤ 25.64[T] - 37.58[Co2+/TSC] + 54.15[t] - 5.895[W] + 5.896[Ace] ≤ 5825
●: dendritic Co3O4 superstructures; ○: non-dendritic samples; ▲: samples designed and predicted to be dendritic Co3O4 superstructures
Fig. 3 Classification diagram using OPR method
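As a small illustration of how the two OPR criteria above can be applied to a candidate synthesis recipe, the following sketch evaluates both inequalities with the coefficients reported in the text. The function names and the example input values are hypothetical, and the inputs must be expressed in the units used to build the model, which are not stated in this section.

```python
def opr_projections(T, co_tsc, t, W, Ace):
    """Return the two linear projections that appear in the OPR criteria."""
    p1 = 29.57 * T + 6.747 * co_tsc - 61.93 * t + 1.382 * W - 1.382 * Ace
    p2 = 25.64 * T - 37.58 * co_tsc + 54.15 * t - 5.895 * W + 5.896 * Ace
    return p1, p2

def predicted_dendritic(T, co_tsc, t, W, Ace):
    """True when both projections fall inside the reported optimal zone."""
    p1, p2 = opr_projections(T, co_tsc, t, W, Ace)
    return 4536 <= p1 <= 5651 and 4021 <= p2 <= 5825

# Hypothetical recipes chosen only to exercise the criteria (not from the paper).
inside = predicted_dendritic(T=180, co_tsc=1, t=12, W=20, Ace=20)
outside = predicted_dendritic(T=100, co_tsc=1, t=12, W=20, Ace=20)
print(inside, outside)
```

Because the coefficients of W and Ace are nearly equal in magnitude and opposite in sign, both criteria depend mainly on the difference between the two amounts rather than on each amount separately.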
Two new samples (labelled ▲ in Fig. 3) were designed by using inverse projection20 to test the correctness of the OPR model. The experimental results confirmed our predictions. Therefore, the controllable synthesis of dendritic Co3O4 superstructures was realized by controlling the reaction temperature (T), the reaction duration time (t), and the ratio of Co(NO3)2•6H2O to trisodium citrate (Co2+/TSC).
3.2 Materials design of layered double hydroxide (LDH) with desired basal spacing21
Layered double hydroxide (LDH), which can be represented by the general formula $[M^{2+}_{1-x}M^{3+}_{x}(OH)_2]^{x+}(A^{n-}_{x/n})^{x-}$, is a material of interest because of its applications in sustained drug release, wastewater treatment, and adsorbents. In all these applications, the performance of LDH is closely related to its basal spacing.22 The basal spacing is defined as $d_{spacing} = d_{layer} + d_{inter}$, where $d_{layer}$ represents the thickness of the hydrotalcite brucite-like LDH sheet and $d_{inter}$ denotes the length of the intercalated species and absorptive water in the interlayer. The basal spacing of LDH can be adjusted by designing its composition. Therefore, accurate prediction of the basal spacing of LDH is a very rewarding task in materials design. Next, we demonstrate the materials design of LDH with desired basal spacing; this also serves as a step-by-step summary of the main processes in MDM.
(1) Data preparation
The dataset consisted of 36 LDH compounds with basal spacing between 7.5 and 8.0 Å, collected from 10 references.21 The candidate descriptors were proposed based on domain knowledge of LDH materials. The basal spacing of LDH is mainly related to the geometrical factor, whether the metal ions are divalent or trivalent, and the weight of each component. Therefore, it is reasonable to suggest that atomic parameters such as ionic radius, electronegativity, number of interlayer water molecules, and their functions are candidate descriptors that affect the basal spacing of LDH. Here, 36 LDH materials with known basal spacing and 23 atomic parameters that serve as candidate descriptors were prepared for data mining.
(2) Descriptor selection
The genetic algorithm (GA)23 based on the cross-validation results obtained by using RVM was implemented to find a compact set of features among the 23 atomic parameters. RVM was applied to develop the prediction models for each feature subset generated by a GA individual, and the prediction accuracies were evaluated by using the LOOCV strategy.16 Finally, the seven best descriptors were selected for the QSPR model. These are na (number of divalent metal ions), nb (number of trivalent metal ions), nz (number of anions), nh (number of interlayer water molecules), xa (electronegativity of the divalent metal element), ra weighted (weighted ionic radius of the divalent metal ion), and za/ra (ratio of the number of valence electrons to the ionic radius for the divalent metal element).
(3) Model selection
Four different machine learning methods, namely, multiple linear regression (MLR), PLS, back propagation artificial neural networks (BPANN), and RVM, were used to construct QSPR models correlating the basal spacing with the candidate descriptors. The aim was to select an optimal model based on the cross-validation results. It was found that the RMSEs of LOOCV using the MLR, PLS, BPANN, and RVM models were 0.0992, 0.1005, 0.1149, and 0.0911, respectively. The results indicated that the RVM model outperformed the other models based on the RMSEs of the LOOCV tests. Therefore, the RVM model was selected to predict the basal spacing of the LDH material.
(4) Model evaluation
After descriptor selection, the RVM model was further evaluated with respect to its performance in cross-validation and its prediction results for the test set. The RVM model with the selected descriptors was constructed by using the training set and applied to predict the basal spacing of the test set. Fig. 4 shows the plots of experimental basal spacing versus predicted basal spacing of LDH materials for the training set in LOOCV and for the test set, respectively. It was found that the best RVM model, with the lowest RMSE of LOOCV, was achieved for the training model when gamma of the Gaussian kernel function was 0.9.
Fig. 4 Comparison of experimental basal spacing and predicted basal spacing
(5) Model application
Based on our data mining results, we conclude that the developed RVM model can predict the basal spacing of LDH given the atomic parameters.
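For reference, the RVM regression model with a Gaussian kernel can be written out as follows. This is a sketch of the standard formulation introduced by Tipping, not a transcription of the settings used in the case study. In particular, the kernel parametrization in terms of $\gamma$ is an assumption, since the text reports only the value of gamma, and the design matrix $\Phi$ and the hyperparameter vector $\boldsymbol{\alpha}$ are notation introduced here.

```latex
% Assumed Gaussian kernel parametrization (the text reports gamma = 0.9)
K(X, X_i) = \exp\!\left(-\gamma \,\lVert X - X_i \rVert^{2}\right)

% Gaussian likelihood of the N training targets t = (t_1, ..., t_N)^T,
% where row n of \Phi is [1, K(X_n, X_1), ..., K(X_n, X_N)]
p(\mathbf{t} \mid W, \sigma^{2}) = (2\pi\sigma^{2})^{-N/2}
  \exp\!\left(-\frac{\lVert \mathbf{t} - \Phi W \rVert^{2}}{2\sigma^{2}}\right)

% Zero-mean Gaussian prior with one precision hyperparameter per weight
p(W \mid \boldsymbol{\alpha}) = \prod_{i=0}^{N}
  \mathcal{N}\!\left(w_i \mid 0, \alpha_i^{-1}\right)
```

In Tipping's formulation, the hyperparameters $\boldsymbol{\alpha}$ and $\sigma^{2}$ are estimated by maximizing the marginal likelihood. Many $\alpha_i$ grow very large during this optimization, which drives the corresponding weights to zero and leaves a small set of relevance vectors. This is the origin of the parsimonious solutions attributed to RVM.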
Fig. 5 XRD pattern of new LDH material
In this case study, the new material Ni0.67Al0.33(OH)2(CO3)0.17·0.8H2O was predicted to be an LDH with a basal spacing of 7.61 Å. Fig. 5 shows the X-ray diffraction (XRD) pattern of the Ni0.67Al0.33(OH)2(CO3)0.17 LDH material. According to the 003 diffraction peaks from the Scherrer equation, the basal spacing of the Ni0.67Al0.33(OH)2(CO3)0.17 LDH is obtained as 7.68 Å. Therefore, the relative error of the basal spacing for the Ni-Al-CO3 LDH was 0.78%. To facilitate the application of the developed RVM model, an online service for predicting the basal spacing of LDH can be developed based on the RVM model; the service takes as user input the necessary information about the formula of the LDH. The prediction web service can assist researchers in efficiently synthesizing LDH materials with desired basal spacing.21
4. Data Mining-Aided Battery Materials Discovery
In this section, the use of data mining techniques in the discovery of lithium battery materials is discussed. The aim here is to explore new cathodes and solid electrolytes for next-generation lithium batteries. By combining high-throughput screening and machine learning, we try to understand the mechanisms of diffusion phenomena in lithium batteries, and analyze experimental data to extract important descriptors for the structure-property relationships of battery materials.
4.1 Discovery of new battery materials by high-throughput screening
With the rapid increase in computational power and the development of efficient theoretical methods, high-throughput calculations have become a promising way to discover new materials. Using this screening method, researchers have explored several new materials that can potentially be applied in lithium batteries. Previous work in this field includes the screening and design of high-capacity cathodes,24-26 low-strain cathodes,27 anodes,28 solid state electrolytes,29-30 and electrolyte additives.31
The high-throughput screening criteria and procedures for analyzing lithium battery materials from the Inorganic Crystal Structure Database (ICSD)32 are presented in Refs. 21-26. In 2011, Ceder’s group proposed high-throughput ab initio calculations to discover high-capacity cathodes.24-25 By substituting for host elements in both phosphates and polyanions, new structures were created and their capacity, voltage, specific energy, energy density, and thermal stability were evaluated. Using high-throughput structure creation and calculations, 270 new structures and constituents were constructed and some potential cathode materials such as Li3Mn(CO3)(PO4), Li2V(CO3)(PO4), and Li3V(CO3)(SiO4) were identified. Besides oxides, lithium-containing sulfides are also potential materials for lithium battery cathodes. Ouyang’s group26 screened the ICSD database and found three potential sulfide cathodes: LiInS2, LiYS2, and LiGaS2. By searching a wide chemical compositional space using density functional theory calculations, low-strain cathode materials with the LiFePO4 structure were discovered, and excellent cycle-life performance was realized in the optimized materials.33
Lithium-containing solid-state electrolytes with high ionic conductivity and electrochemical stability are key materials for next-generation all-solid-state lithium secondary batteries. The approaches used to understand ionic migration in solids include space topology analysis,34 bond-valence estimation,35 and quantum mechanical simulation based on the transition-state method.36 Among these methods, the computational cost increases as the accuracy improves. Xiao et al. proposed a high-throughput design and optimization scheme that combines simulation techniques of different levels of accuracy.30 Using this scheme, more than 1000 compounds were screened,37 and doping derivatives of β-Li3PS4 with lower Li+ migration energy barriers were found.30, 38
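The benefit of combining methods of different accuracy can be illustrated with a minimal two-tier screening sketch: a cheap estimate filters the candidate pool, and the expensive calculation is run only on the survivors. Everything in this sketch is hypothetical, including the candidate names, the barrier values, the cutoffs, and the function `tiered_screen`. It is not the scheme of Xiao et al.; the lookup tables merely stand in for a fast empirical estimate and a costly quantum mechanical calculation.

```python
def tiered_screen(candidates, cheap_estimate, accurate_estimate,
                  cheap_cutoff, accurate_cutoff):
    """Filter with a cheap estimate first, then refine the survivors."""
    # Tier 1: inexpensive estimate applied to every candidate.
    shortlist = [c for c in candidates if cheap_estimate(c) <= cheap_cutoff]
    # Tier 2: expensive estimate applied only to the shortlist.
    refined = {c: accurate_estimate(c) for c in shortlist}
    hits = sorted((c for c, v in refined.items() if v <= accurate_cutoff),
                  key=refined.get)
    return hits, len(shortlist)

# Hypothetical migration barriers in eV (illustrative numbers only).
cheap = {"cand-A": 0.9, "cand-B": 0.4, "cand-C": 0.5, "cand-D": 1.2}
accurate = {"cand-A": 0.8, "cand-B": 0.25, "cand-C": 0.45, "cand-D": 1.0}

hits, n_expensive = tiered_screen(list(cheap), cheap.get, accurate.get,
                                  cheap_cutoff=0.6, accurate_cutoff=0.3)
print(hits, n_expensive)
```

In this toy run the expensive estimate is evaluated for two of the four candidates. The cheap cutoff is deliberately looser than the accurate one so that the approximate method is less likely to discard a good candidate.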
4.2 Discovery of new battery materials by machine learning
Advanced machine learning methods are important in the field of data mining as they have the potential to solve many complex problems, including the prediction of physical properties, crystal structures, and applications for specific materials.39 Fujimura et al. used machine learning techniques to combine theoretical and experimental datasets to predict the ionic conductivity of LISICONs with the general formula Li8-cAaBbO4.29 The theoretical data were taken from systematic sets of first-principles calculations based on cluster expansion and first-principles molecular dynamics simulations, while the experimental data were obtained from measurements at 373 K. Several compositions that exhibited higher ionic conductivities than traditional LISICON were discovered based on the prediction, as shown in Fig. 6.
Fig. 6 Predicted ionic conductivities at 373 K, σ373, for 72 compositions in the system Li8−cAaBbO4, where Am+ = Zn, Mg, Al, Ga, P, or As, Bn+ = Ge or Si, and c = ma + nb. Values of σ373 were obtained by iterative analysis of calculated datasets and experimental datasets.29
Machine learning techniques are also used to create new derivatives from known structures. Shandiz and Gauvin used classification methods in machine learning to identify the three major crystal systems of silicate-based cathodes with Li-Si-(Mn,Fe,Co)-O compositions.39 Apart from the prediction of the crystal system, the relationship between the types of crystal structures and material properties was also studied. The efficiency and accuracy of several algorithms were evaluated for crystal system classification.
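How the accuracy of a classifier might be evaluated for a three-class problem of this kind can be sketched with the K nearest neighbor method mentioned in Section 2.1. This is an illustration only: the source does not state that KNN was among the algorithms compared, the two descriptors are synthetic, and the class labels 0, 1, and 2 are placeholders for crystal systems. The function names are hypothetical.

```python
import numpy as np

def knn_predict(X_train, y_train, x, k=3):
    """Majority vote among the k nearest training samples (Euclidean distance)."""
    dist = np.linalg.norm(X_train - x, axis=1)
    nearest = y_train[np.argsort(dist)[:k]]
    values, counts = np.unique(nearest, return_counts=True)
    return values[np.argmax(counts)]

def loo_accuracy(X, y, k=3):
    """Leave-one-out classification accuracy."""
    hits = 0
    for i in range(len(y)):
        keep = np.arange(len(y)) != i             # hold out sample i
        hits += int(knn_predict(X[keep], y[keep], X[i], k) == y[i])
    return hits / len(y)

# Synthetic two-descriptor data with three well-separated classes.
rng = np.random.default_rng(1)
centers = np.array([[0.0, 0.0], [6.0, 0.0], [0.0, 6.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(15, 2)) for c in centers])
y = np.repeat(np.arange(3), 15)

accuracy = loo_accuracy(X, y)
print(accuracy)
```

Comparing several algorithms then amounts to replacing `knn_predict` with another classifier while keeping the same cross-validation loop, so that all methods are scored on identical held-out samples.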
4.3 Deepening understanding via data mining and molecular dynamics simulations
Besides discovering new structures with potential applications in batteries, data mining methods are also helpful in understanding complex phenomena in lithium batteries. The key process during the charge and discharge of lithium batteries is the transport of lithium ions. The diffusion mechanisms, including the Li+ migration pathways and barriers, play a vital role in the performance of batteries. However, it is difficult to arrive at a comprehensive physical description of the ionic conductivity. This prompted researchers to study the structure-property relationships in fast ionic conductors. Jalem et al. surveyed promising compositions with extremely low Li migration energy within the newly discovered favorite structure LiMTO4F (M3+ - T5+ and M2+ - T6+ pairs; M is a non-transition metal).40 They adopted the nudged-elastic-band (NEB) method based on transition-state theory to obtain the Li+ migration energy (ME). Candidate solid electrolytes with ME less than 0.30 eV were identified within the favorite structure, and an informatics-based neural network model was employed to elucidate the effects of the bottleneck size, covalency, and local lattice distortion on the migration energy, as shown in Fig. 7.40
Fig. 7. (a) Fitting quality of the final neural network model for ME prediction, (b) calculated ME causal index (CI) values of input neuron variables, and (c) schematic representation of selected variables with high CI values and their spatial relationship with the Li ion bottleneck pathway.40
The calculations based on the NEB method are helpful in extracting the structural characteristics and their effects on the Li+ migration energy. Molecular dynamics simulations, which are a powerful tool for investigating kinetic properties, were used by Chen et al. to unravel the Li diffusion mechanism within the garnet-type Li7La3Zr2O12 (LLZO) crystal lattice.41 By carrying out density-based clustering of the trajectories computed using molecular dynamics, the Li hops contributing to the ionic conductivity were found, and a back-and-forth jumping mode that makes no contribution to the diffusivity was also discovered.
Both solid electrolytes and low-strain electrodes are key for next-generation all-solid-state lithium batteries. The smaller the volume change of the electrodes during the insertion and extraction of Li ions, the higher the mechanical stability expected at the electrode/electrolyte interface. To improve the cyclability of solid-state lithium batteries, the discovery of low-strain electrodes is essential. Wang et al. developed QSPR formulations of cathode volume changes using a combination of ab initio calculations and PLS analysis.27 The variable importance analysis indicates that the radius of the transition metal ions and the distortion of the transition metal octahedron contribute most to the volume change of the cathode during delithiation, which provides information on low-strain cathode design.
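The density-based clustering of molecular dynamics trajectories mentioned in this section can be illustrated with a minimal sketch. It is not the workflow of Chen et al.; it is a plain DBSCAN written with the Python standard library, applied to a synthetic single-ion trajectory. The trajectory, the parameters `eps` and `min_pts`, and the helper names `make_frames` and `classify_hops` are all assumptions made for illustration. Frames are grouped into sites, and hops between sites are then counted, with immediate returns (A to B to A) flagged as back-and-forth jumps.

```python
import math

def dbscan(points, eps, min_pts):
    """Plain DBSCAN. Returns one label per point; -1 marks noise."""
    labels = [None] * len(points)

    def neighbours(i):
        # indices of all points within eps of point i (including i itself)
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = -1          # provisionally noise
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise becomes a border point
            if labels[j] is not None:
                continue             # already assigned, do not expand again
            labels[j] = cluster
            nb = neighbours(j)
            if len(nb) >= min_pts:   # core point: grow the cluster
                queue.extend(nb)
    return labels

def classify_hops(site_sequence):
    """Collapse per-frame site labels into visits, then count all hops and
    those hops that immediately return to the previous site (A -> B -> A)."""
    visits = []
    for s in site_sequence:
        if s == -1:
            continue                 # frame in transit between sites
        if not visits or visits[-1] != s:
            visits.append(s)
    hops = max(len(visits) - 1, 0)
    back_and_forth = sum(1 for k in range(2, len(visits))
                         if visits[k] == visits[k - 2])
    return hops, back_and_forth

def make_frames(centre_x, n=5):
    # deterministic small vibration around a site centre (synthetic data)
    return [(centre_x + 0.05 * ((k % 3) - 1), 0.0, 0.0) for k in range(n)]

# Synthetic path: site at x=0, hop to x=2, return to x=0, then hop to x=4.
trajectory = (make_frames(0.0) + make_frames(2.0)
              + make_frames(0.0) + make_frames(4.0))
labels = dbscan(trajectory, eps=0.3, min_pts=3)
hops, back_and_forth = classify_hops(labels)
print(labels, hops, back_and_forth)
```

In this toy case the two visits to x = 0 fall into the same cluster, so the return hop is recognized as a back-and-forth jump. A real analysis would need periodic boundary conditions and many ions, which this sketch deliberately omits.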
4.4 Experimental data analysis aided by data mining
High-throughput methodology includes both theoretical and experimental techniques. Using high-throughput experiments, a series of samples spanning a composition space can be synthesized and characterized. For example, a high-throughput physical vapor deposition system provides a way to produce a thin-film sample library for the perovskite ion conductor Li3xLa2/3-xTiO3.42 Synthetic parameters such as elemental composition, thickness, and deposition and annealing temperatures can be optimized by using data analysis techniques from data mining. Furthermore, the analysis of raw diffraction data by PCA helps build the phase distribution diagram in the composition space. Aoun et al. introduced correlation functions and a statistical scedasticity formalism to analyze high-throughput in-situ high-energy X-ray diffraction data.43 Their research demonstrated that Pearson’s correlation function can readily reveal all major phase transitions and minor structural changes from a series of diffraction patterns.
5 Data Mining-Aided Novel Thermoelectric Materials Design
Thermoelectric (TE) materials are a class of energy materials that can convert heat into electricity and vice versa. The efficiency of thermoelectrics is measured by the dimensionless figure of merit ZT = S²σT/κ, where S is the Seebeck coefficient, σ is the electrical conductivity, T is the absolute temperature, and κ is the thermal conductivity. Good TE materials require good electrical transport properties, as represented by the power factor S²σ, and low thermal conductivity. Understanding all these processes and their relations with the structures, i.e., the QSPR, is crucial for the design of new TE candidates. In this section, we review the high-throughput efforts implemented in the TE field and the QSPR based on MDM methods.
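The figure of merit ZT = S²σT/κ used for thermoelectric materials in Section 5 is simple to evaluate once the four quantities are expressed in SI units. The short sketch below shows the arithmetic. The function names and the numerical inputs are illustrative assumptions and do not correspond to any compound discussed in this review.

```python
def power_factor(seebeck, sigma):
    """Power factor S^2 * sigma, in W m^-1 K^-2 for S in V/K and sigma in S/m."""
    return seebeck ** 2 * sigma

def figure_of_merit(seebeck, sigma, temperature, kappa):
    """Dimensionless ZT = S^2 * sigma * T / kappa (all inputs in SI units)."""
    if temperature <= 0 or kappa <= 0:
        raise ValueError("temperature and thermal conductivity must be positive")
    return power_factor(seebeck, sigma) * temperature / kappa

# Illustrative inputs only (not measured data):
# S = 200 microvolt/K, sigma = 1e5 S/m, T = 300 K, kappa = 1.5 W/(m K)
pf = power_factor(200e-6, 1e5)
zt = figure_of_merit(200e-6, 1e5, 300.0, 1.5)
print(pf, zt)
```

With these inputs the power factor is about 4 × 10⁻³ W m⁻¹ K⁻² and ZT is about 0.8. Halving κ at fixed S and σ doubles ZT, which is why both good electrical transport and low thermal conductivity are sought.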
5.1 High-throughput search for new candidates
The transport properties of TE compounds are mainly determined by the quantum mechanics of charge carriers. This has motivated several high-throughput implementations in the TE field based on first-principles calculations of electronic structures, and new candidates have been proposed ever since. The first high-throughput search for TE candidates can be traced back to 2006. Madsen screened Sb-based compounds from the ICSD and narrowed the search down to one new candidate, LiZnSb. The predicted ZT is approximately 2.0, based on a parameterized electronic relaxation time and lattice thermal conductivity.44 In 2008, Yang et al. carried out electrical transport calculations to evaluate the power factor and the optimal doping levels of half-Heusler alloys with a 1:1:1 atomic ratio in the ICSD.45 The samples were selected from a larger compound pool based on several selection rules. As a result, new half-Heusler compounds with good electrical transport properties were proposed; these are shown in Fig. 8(a). The large power factor of one of the proposed candidates, NbFeSb, was experimentally verified years later, and it became one of the best p-type half-Heusler compounds.46
Researchers from the Materials Project designed a workflow for screening new TE candidates, including calculations of electrical transport properties, minimum thermal conductivity, and doping capability. XYZ2 compounds were predicted to have good figures of merit, and TmAgTe2 in particular was intensively studied.47 Although this particular compound was not so promising experimentally, an isoelectronic substitution, YCuTe2, was proposed by experimentalists and was found to possess better TE properties; this is shown in Fig. 8(b).48
Fig. 8 (a) High-throughput screening of the thermoelectric half-Heusler compounds;45 (b) Optimization of YCuTe2 proposed by the high-throughput calculations48
5.2 High-throughput transport algorithms based on big data
Microscopically, both electrical and thermal transport processes consist of a carrier dispersion part (the electronic structure and the phonon spectrum) and a scattering part. If only the commonly accepted dominant scattering mechanisms are considered, i.e., the electron-phonon interaction for electrical transport and the phonon-phonon interaction for lattice thermal transport, all the transport parameters are calculable within density functional (perturbation) theory. Readers can refer to Ref. [49] for calculation details. However, when dealing with a large number of samples, as in high-throughput implementations, the computational cost of the phonon spectra and of any scattering process is formidable. On the other hand, big data results offer the possibility of developing and examining high-throughput transport algorithms.
For heat transport, Toher et al. adopted the quasiharmonic Debye approximation to calculate the Debye temperature and lattice thermal conductivity.50 Their method, called Automatic Gibbs Library, is significantly cheaper than the full first-principles method and was evaluated on a variety of crystalline structures, as shown in Fig. 9(a). The Pearson correlation coefficients between the lattice thermal conductivities so obtained and the experimental values range from 0.27 to 0.98, with an average of 0.88. Carrete et al. aimed at a more accurate high-throughput algorithm for computing the lattice thermal conductivity of half-Heusler compounds (75 thermodynamically stable entries).51 Three different strategies were adopted. The first one used anharmonic force constants from another compound Mg2Si, which has a similar structure and whose harmonic parts are fully calculated. This method separates compounds with low and high lattice thermal conductivities. The second strategy follows the standard MDM procedure, i.e., using 32 (out of the 75) fully calculated entries as the training set to obtain a fitted model using random-forest regression. The last strategy performs PCA of the independent anharmonic force constants and finds the four dominant components. The lattice thermal conductivities based on this strategy are very accurate when compared with the fully calculated values; this is shown in Fig. 9(b). Apart from the aforementioned efforts, Miller et al. also provided a lattice thermal conductivity model for high-throughput prediction based on a material data set from the literature.52
Fig. 9 (a) The lattice thermal conductivities predicted by Automatic Gibbs Library;50 (b) lattice thermal conductivity model generated by PCA analysis in half-Heusler compounds51
The development of the electrical transport algorithm is two-fold. On the one hand, the calculation of the dispersion part based on the explicit electronic structure has been widely accepted.53-54 This enhances our understanding of the QSPR and helps material design from the perspective of the band structure (this will be discussed in Section 5.3). On the other hand, with regard to the scattering of electrons, Yan et al. proposed an empirical formula for the carrier mobility based on the bulk modulus and the single-valley effective mass, and determined the corresponding exponents by using MDM.55 Overall, carrier scattering is far less studied owing to its extremely high computational cost, and explicit scattering calculations are not yet suitable for high-throughput purposes.
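To illustrate how the exponents of an empirical mobility formula of this kind can be determined from data, the sketch below assumes a power law μ = A·B^s·(m*)^(−t) in the bulk modulus B and the effective mass m*, and fits A, s, and t by ordinary least squares in log space. The functional form as written here, the function names, and the synthetic data are assumptions for illustration; they are not the fitted values or the exact procedure of Yan et al.

```python
import math

def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):                      # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

def fit_power_law(bulk_moduli, masses, mobilities):
    """Least-squares fit of ln(mu) = ln(A) + s*ln(B) - t*ln(m*).
    Returns (A, s, t)."""
    rows = [[1.0, math.log(b), math.log(m)]
            for b, m in zip(bulk_moduli, masses)]
    y = [math.log(mu) for mu in mobilities]
    # normal equations (X^T X) beta = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(3)]
    ln_a, s, minus_t = solve_linear(xtx, xty)
    return math.exp(ln_a), s, -minus_t

# Synthetic, noise-free data generated from known parameters (not real materials).
true_a, true_s, true_t = 10.0, 1.0, 1.5
bulk = [50.0, 80.0, 120.0, 200.0, 60.0, 150.0]
mass = [0.1, 0.5, 0.2, 1.0, 2.0, 0.3]
mobility = [true_a * b ** true_s * m ** (-true_t) for b, m in zip(bulk, mass)]

a_fit, s_fit, t_fit = fit_power_law(bulk, mass, mobility)
print(a_fit, s_fit, t_fit)
```

Because the synthetic data contain no noise, the fit recovers the generating parameters to within rounding error. With real data the residuals in log space would indicate how well a single power law describes the compounds in the training set.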
5.3 QSPR and materials design
Understanding the QSPR is also crucial for designing new functional materials. In TE, both the crystalline and electronic structures determine the transport properties. The QSPR is usually one of the topics addressed in high-throughput studies of TE materials. Wang et al. revealed several QSPRs based on their high-throughput study. These included positive correlations between power factors and the number of atoms in one primitive cell, effective masses, and energy gaps.56 The revealed QSPRs were consistent with those derived from simple models, though no a priori assumptions were made in their work. Gibbs et al. proposed an equivalent quantity for the Fermi surface complexity, Nv*K*, where Nv* is the valley degeneracy and K* is the valley anisotropy.57 The quantity can be calculated from first-principles transport calculations, and shows good correlation with calculated power factors (Fig. 10(a)). As mentioned by the authors, Nv*K* can be used as a simple indicator for TE compounds with complex Fermi surfaces and good electrical transport properties.
Another good example of QSPR in TE is the “unity-η” rule found in chalcopyrite diamond-like compounds. Zhang et al. revealed a strong correlation between the band degeneracy and the structural parameter η = c/2a in tetragonal chalcogenides.58 The TE performance of these compounds and their structural parameter are connected through this QSPR, and the “unity-η” structure (pseudocubic structure) results in converged bands and enhanced power factors, as shown in Fig. 10(b). Consequently, a simple rule is proposed to screen for promising candidates in the crystal database. The authors further proposed alloying between compounds with η > 1 and η < 1 to achieve better power factors and ZTs, and tested this approach in their experiments.
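The “unity-η” screening rule lends itself to a very small sketch: compute η = c/2a for each tetragonal entry and rank the entries by their distance from η = 1. The example below does this for placeholder lattice parameters, which are invented for illustration and do not describe real compounds. The helper `alloy_fraction_for_unity` additionally assumes that η of an alloy varies linearly with composition; that linear-mixing rule is an assumption of this sketch, not a result stated in the review.

```python
def eta(a, c):
    """Structural parameter eta = c / (2a) of a tetragonal cell."""
    return c / (2.0 * a)

def rank_by_pseudocubicity(lattices):
    """lattices maps a name to (a, c). Returns (name, eta) pairs sorted so
    that the entry closest to eta = 1 comes first."""
    etas = {name: eta(a, c) for name, (a, c) in lattices.items()}
    return sorted(etas.items(), key=lambda item: abs(item[1] - 1.0))

def alloy_fraction_for_unity(eta_1, eta_2):
    """Fraction x of compound 1 such that x*eta_1 + (1 - x)*eta_2 = 1,
    assuming eta mixes linearly with composition (an assumption)."""
    if eta_1 == eta_2 or (eta_1 - 1.0) * (eta_2 - 1.0) > 0:
        raise ValueError("need one compound with eta > 1 and one with eta < 1")
    return (1.0 - eta_2) / (eta_1 - eta_2)

# Placeholder lattice parameters in angstrom (hypothetical, not real compounds).
candidates = {
    "compound_A": (5.60, 11.00),   # eta slightly below 1
    "compound_B": (5.50, 11.22),   # eta slightly above 1
    "compound_C": (6.00, 12.00),   # eta exactly 1 (pseudocubic)
}
ranking = rank_by_pseudocubicity(candidates)
x_unity = alloy_fraction_for_unity(0.98, 1.02)
print(ranking, x_unity)
```

The ranking places the pseudocubic entry first, and mixing η = 0.98 with η = 1.02 under the linear assumption gives a 50:50 alloy. In practice the composition dependence of η would have to be measured or calculated rather than assumed.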
Fig. 10 (a) Correlation between Nv*K* and calculated power factors;57 (b) pseudocubic structure and converged band structures in tetragonal chalcopyrite compounds58
6. Concluding Remarks
From the work reviewed in this paper, it can be concluded that MDM methods are very helpful in materials discovery and optimization because of their relatively good performance, speed, and simplicity, both in obtaining classification diagrams and in constructing QSPR models. Based on the issues faced in materials discovery and optimization, MDM tasks can be classified into at least four categories. The first type of task aims to solve “formability problems,” i.e., to find a mathematical model or criterion for the stability of unknown materials. The second is associated with “property prediction,” i.e., the construction of QSPR models for predicting the properties of new materials (or the inverse problem: searching for unknown new materials with pre-assigned properties). The third type aims to solve “optimization problems,” i.e., to find conditions for optimizing some properties of materials. The last type of task deals with the “problem of control,” i.e., finding a mathematical model to keep some index of a material within a desired range. Different data mining techniques should be adopted for these different purposes.
In the development of novel materials using MDM, it is time-consuming and costly to obtain sufficient experimental samples. Therefore, efficient learning from a limited number of samples is becoming increasingly important for shortening the materials development cycle. Although the available data on known materials used in the training sets may be rather limited, MDM methods such as RVM and SVM can still be implemented based on statistical learning theory.59 At the same time, a large volume of materials data is expected to be generated by high-throughput computations and high-throughput experiments, which will provide a great opportunity for MDM. Professor Zadeh, the famous scientist who proposed fuzzy mathematics, stated that different techniques of soft computing are usually synergistic rather than competitive.60 This point of view agrees with our experience in data mining-aided materials discovery and optimization.
With the rapid development of the MGI, it is imperative that the global materials community build materials databases and share valuable data on both successful and failed materials, with clear descriptions of the data generation conditions. The MDM approach is expected to play an important role in discovering novel materials in half the time and at half the cost.
Acknowledgements
Financial support for this work from the National Key Research and Development Program of China (No. 2016YFB0700504, 2017YFB0701600) and the Science and Technology Commission of Shanghai Municipality of China (No. 15DZ2260300 and No. 16DZ2260600) is gratefully acknowledged.
References
1. Materials Genome Initiative for Global Competitiveness. National Science and Technology Council, Ed. Washington, D.C., 2011.
2. Ceder, G., Opportunities and challenges for first-principles materials design and applications to Li battery materials. MRS Bull. 2011, 35 (9), 693-701.
3. Lu, W.-C.; Ji, X.-B.; Li, M.-J.; Liu, L.; Yue, B.-H.; Zhang, L.-M., Using support vector machine for materials design. Adv. in Manuf. 2013, 1 (2), 151-159.
4. Xue, D.; Balachandran, P. V.; Hogden, J.; Theiler, J.; Xue, D.; Lookman, T., Accelerated search for materials with targeted properties by adaptive design. Nat. Commun. 2016, 7, 11241.
5. Raccuglia, P.; Elbert, K. C.; Adler, P. D.; Falk, C.; Wenny, M. B.; Mollo, A.; Zeller, M.; Friedler, S. A.; Schrier, J.; Norquist, A. J., Machine-learning-assisted materials discovery using failed experiments. Nature 2016, 533 (7601), 73-76.
6. Fischer, C. C.; Tibbetts, K. J.; Morgan, D.; Ceder, G., Predicting crystal structure by merging data mining with quantum mechanics. Nat. Mater. 2006, 5 (8), 641-646.
7. Siqi, S.; Jian, G.; Yue, L.; Yan, Z.; Qu, W.; Wangwei, J.; Chuying, O.; Ruijuan, X., Multi-scale computation methods: Their applications in lithium-ion battery research and development. Chinese Phys. B 2016, 25 (1), 018212.
8. Mountrakis, G.; Im, J.; Ogole, C., Support vector machines in remote sensing: A review. ISPRS J. Photogramm. 2011, 66 (3), 247-259.
9. Sapatinas, T., Discriminant Analysis and Statistical Pattern Recognition. J. Roy. Stat. Soc. A STA 2005, 168 (3), 635-636.
10. Jain, A. K.; Duin, R. P. W.; Jianchang, M., Statistical pattern recognition: a review. IEEE T. Pattern Anal. 2000, 22 (1), 4-37.
11. Mi Lin Wu, L. M. Z., Tian Hong Gu, Na Qian, Wen Jing Ma, Wen Cong Lu, Shape-Controlled Synthesis and Pattern Recognition of Dendritic Co3O4 Superstructures. Adv. Mater. Res. 2013, 652-654, 352-355.
12. Croux, C.; Gallopoulos, E.; Van Aelst, S.; Zha, H., Machine Learning and Robust Data Mining. Comput. Stat. Data An. 2007, 52 (1), 151-154.
13. Tipping, M. E., Sparse Bayesian Learning and the Relevance Vector Machine. J. Mach. Learn. Res. 2001, 1, 211-244.
14. Hanchuan, P.; Fuhui, L.; Ding, C., Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy. IEEE T. Pattern Anal. 2005, 27 (8), 1226-1238.
15. Goldberg, D. E., Genetic Algorithms in Search, Optimization and Machine Learning. Addison-Wesley Longman Publishing Co., Inc.: 1989; p 372.
16. Gu, T.; Lu, W.; Bao, X.; Chen, N., Using support vector regression for the prediction of the band gap and melting point of binary and ternary compound semiconductors. Solid State Sci. 2006, 8 (2), 129-136.
17. Wang, X.; Tian, W.; Zhai, T.; Zhi, C.; Bando, Y.; Golberg, D., Cobalt(ii,iii) oxide hollow structures: fabrication, properties and applications. J. Mater. Chem. 2012, 22 (44), 23310-23326.
18. Ren, Y.; Ma, Z.; Bruce, P. G., Ordered mesoporous metal oxides: synthesis and applications. Chem. Soc. Rev. 2012, 41 (14), 4909-4927.
19. Meher, S. K.; Rao, G. R., Effect of Microwave on the Nanowire Morphology, Optical, Magnetic, and Pseudocapacitance Behavior of Co3O4. J. Phys. Chem. C 2011, 115 (51), 25543-25556.
20. Nianyi, C.; Wencong, L.; Ruiliang, C.; Chonghe, L.; Pei, Q., Chemometric methods applied to industrial optimization and materials optimal design. Chemometr. Intell. Lab. 1999, 45 (1), 329-333.
21. Zhang, Q.; Zhai, X.; Xiong, P.; Kou, L.; Ji, X.; Lu, W., Prediction and synthesis of novel layered double hydroxide with desired basal spacing based on relevance vector machine. Mater. Res. Bull. 2017, 93, 123-129.
22. Pahalagedara, M. N.; Samaraweera, M.; Dharmarathna, S.; Kuo, C.-H.; Pahalagedara, L. R.; Gascón, J. A.; Suib, S. L., Removal of Azo Dyes: Intercalation into Sonochemically Synthesized NiAl Layered Double Hydroxide. J. Phys. Chem. C 2014, 118 (31), 17801-17809.
23. Long, T.; McDougal, O. M.; Andersen, T., GAMPMS: Genetic algorithm managed peptide mutant screening. J. Comput. Chem. 2015, 36 (17), 1304-1310.
24. Hautier, G.; Jain, A.; Chen, H.; Moore, C.; Ong, S. P.; Ceder, G., Novel mixed polyanions lithium-ion battery cathode materials predicted by high-throughput ab initio computations. J. Mater. Chem. 2011, 21 (43), 17147-17153.
25. Hautier, G.; Jain, A.; Ong, S. P.; Kang, B.; Moore, C.; Doe, R.; Ceder, G., Phosphates as Lithium-Ion Battery Cathodes: An Evaluation Based on High-Throughput ab Initio Calculations. Chem. Mater. 2011, 23 (15), 3495-3508.
26. Ling S G, G. J., Chu G, Huang J, Xiao R J, Ouyang C Y, Li H and Chen L Q, Application of High-Throughput Calculations for Screening Lithium Battery Materials. Mater. China 2015, 34, 272-281.
27. Wang X L, X. R. J., Li H, Chen L Q, Quantitative structure-property relationship study of cathode volume changes in lithium ion batteries using ab-initio and partial least squares analysis. J. Materiomics 2017, (in press).
28. Kirklin, S.; Meredig, B.; Wolverton, C., High-Throughput Computational Screening of New Li-Ion Battery Anode Materials. Adv. Energy Mater. 2013, 3 (2), 252-262.
29. Fujimura, K.; Seko, A.; Koyama, Y.; Kuwabara, A.; Kishida, I.; Shitara, K.; Fisher, C. A. J.; Moriwake, H.; Tanaka, I., Accelerated Materials Design of Lithium Superionic Conductors Based on First-Principles Calculations and Machine Learning Algorithms. Adv. Energy Mater. 2013, 3 (8), 980-985.
30. Xiao, R.; Li, H.; Chen, L., High-throughput design and optimization of fast lithium ion conductors by the combination of bond-valence method and density functional theory. Sci. Rep. 2015, 5, 14227.
31. Halls, M. D.; Tasaki, K., High-throughput quantum chemistry and virtual screening for lithium ion battery electrolyte additives. J. Power Sources 2010, 195 (5), 1472-1478.
32. Inorganic Crystal Structure Database, ICSD. Karlsruhe: Fachinformationszentrum, 2008.
33. Nishijima, M.; Ootani, T.; Kamimura, Y.; Sueki, T.; Esaki, S.; Murai, S.; Fujita, K.; Tanaka, K.; Ohira, K.; Koyama, Y.; Tanaka, I., Accelerated discovery of cathode materials with prolonged cycle life for lithium-ion battery. Nat. Commun. 2014, 5, 4553.
34. Polyakov, V. I., Visualization of conduction channels and the dynamics of ion transport in superionic conductors. Phys. Solid State 2001, 43 (4), 655-662.
35. Adams, S.; Swenson, J., Pathway models for fast ion conductors by combination of bond valence and reverse Monte Carlo methods. Solid State Ionics 2002, 154, 151-159.
36. Ouyang C Y, C. L. Q., Physics towards next generation Li secondary batteries materials: A short review from computational materials design perspective. Sci. China-Phys. Mech. Astron. 2013, 56, 2278-2292.
37. Xiao, R.; Li, H.; Chen, L., Candidate structures for inorganic lithium solid-state electrolytes identified by high-throughput bond-valence calculations. J. Materiomics 2015, 1 (4), 325-332.
38. Wang, X.; Xiao, R.; Li, H.; Chen, L., Oxygen-driven transition from two-dimensional to three-dimensional transport behaviour in β-Li3PS4 electrolyte. Phys. Chem. Chem. Phys. 2016, 18 (31), 21269-21277.
39. Attarian Shandiz, M.; Gauvin, R., Application of machine learning methods for the prediction of crystal system of cathode materials in lithium-ion batteries. Comp. Mater. Sci. 2016, 117, 270-278.
40. Jalem, R.; Kimura, M.; Nakayama, M.; Kasuga, T., Informatics-Aided Density Functional Theory Study on the Li Ion Transport of Tavorite-Type LiMTO4F (M3+–T5+, M2+–T6+). J. Chem. Inf. Model. 2015, 55 (6), 1158-1168.
41. Chen, C.; Lu, Z.; Ciucci, F., Data mining of molecular dynamics data reveals Li diffusion characteristics in garnet Li7La3Zr2O12. Sci. Rep. 2017, 7, 40769.
42. Beal, M. S.; Hayden, B. E.; Le Gall, T.; Lee, C. E.; Lu, X.; Mirsaneh, M.; Mormiche, C.; Pasero, D.; Smith, D. C. A.; Weld, A.; Yada, C.; Yokoishi, S., High Throughput Methodology for Synthesis, Screening, and Optimization of Solid State Lithium Ion Electrolytes. ACS Comb. Sci. 2011, 13 (4), 375-381.
43. Aoun, B.; Yu, C.; Fan, L.; Chen, Z.; Amine, K.; Ren, Y., A generalized method for high throughput in-situ experiment data analysis: An example of battery materials exploration. J. Power Sources 2015, 279, 246-251.
44. Madsen, G. K., Automated search for new thermoelectric materials: the case of LiZnSb. Journal of the American Chemical Society 2006, 128 (37), 12140-12146.
45. Yang, J.; Li, H.; Wu, T.; Zhang, W.; Chen, L.; Yang, J., Evaluation of Half-Heusler Compounds as Thermoelectric Materials Based on the Calculated Electrical Transport Properties. Advanced Functional Materials 2008, 18 (19), 2880-2888.
46. Fu, C.; Zhu, T.; Liu, Y.; Xie, H.; Zhao, X., Band engineering of high performance p-type FeNbSb based half-Heusler thermoelectric materials for figure of merit zT > 1. Energy & Environmental Science 2015, 8 (1), 216-220.
47. Zhu, H.; Hautier, G.; Aydemir, U.; Gibbs, Z. M.; Li, G.; Bajaj, S.; Pöhls, J.-H.; Broberg, D.; Chen, W.; Jain, A., Computational and experimental investigation of TmAgTe2 and XYZ2 compounds, a new group of thermoelectric materials identified by first-principles high-throughput screening. Journal of Materials Chemistry C 2015, 3 (40), 10554-10565.
48. Aydemir, U.; Pöhls, J.-H.; Zhu, H.; Hautier, G.; Bajaj, S.; Gibbs, Z. M.; Chen, W.; Li, G.; Ohno, S.; Broberg, D., YCuTe2: a member of a new class of thermoelectric materials with CuTe4-based layered structure. Journal of Materials Chemistry A 2016, 4 (7), 2461-2472.
49. Yang, J.; Xi, L.; Qiu, W.; Wu, L.; Shi, X.; Chen, L.; Yang, J.; Zhang, W.; Uher, C.; Singh, D. J., On the tuning of electrical and thermal transport in thermoelectrics: an integrated theory–experiment perspective. npj Computational Materials 2016, 2, 15015.
50. Toher, C.; Plata, J. J.; Levy, O.; de Jong, M.; Asta, M.; Nardelli, M. B.; Curtarolo, S., High-throughput computational screening of thermal conductivity, Debye temperature, and Grüneisen parameter using a quasiharmonic Debye model. Physical Review B 2014, 90 (17), 174107.
51. Carrete, J.; Li, W.; Mingo, N.; Wang, S.; Curtarolo, S., Finding unprecedentedly low-thermal-conductivity half-Heusler semiconductors via high-throughput materials modeling. Physical Review X 2014, 4 (1), 011019.
52. Miller, S. A.; Gorai, P.; Ortiz, B. R.; Goyal, A.; Gao, D.; Barnett, S. A.; Mason, T. O.; Snyder, G. J.; Lv, Q.; Stevanovic, V., Capturing Anharmonicity in a Lattice Thermal Conductivity Model for High-Throughput Predictions. Chemistry of Materials 2017, 29 (6), 2494-2501.
53. Madsen, G. K.; Singh, D. J., BoltzTraP. A code for calculating band-structure dependent quantities. Computer Physics Communications 2006, 175 (1), 67-71.
54. Yang, J.; Xi, L.; Zhang, W.; Chen, L.; Yang, J., Electrical transport properties of filled CoSb3 skutterudites: a theoretical study. Journal of Electronic Materials 2009, 38 (7), 1397-1401.
55. Yan, J.; Gorai, P.; Ortiz, B.; Miller, S.; Barnett, S. A.; Mason, T.; Stevanović, V.; Toberer, E. S., Material descriptors for predicting thermoelectric performance. Energy & Environmental Science 2015, 8 (3), 983-994.
56. Wang, S.; Wang, Z.; Setyawan, W.; Mingo, N.; Curtarolo, S., Assessing the thermoelectric properties of sintered compounds via high-throughput ab-initio calculations. Physical Review X 2011, 1 (2), 021012.
57. Gibbs, Z. M.; Ricci, F.; Li, G.; Zhu, H.; Persson, K.; Ceder, G.; Hautier, G.; Jain, A.; Snyder, G. J., Effective mass and Fermi surface complexity factor from ab initio band structure calculations. npj Computational Materials 2017, 3 (1), 8.
58. Zhang, J.; Liu, R.; Cheng, N.; Zhang, Y.; Yang, J.; Uher, C.; Shi, X.; Chen, L.; Zhang, W., High-Performance Pseudocubic Thermoelectric Materials from Non-cubic Chalcopyrite Compounds. Advanced Materials 2014, 26 (23), 3848-3853.
59. Vapnik, V., Statistical learning theory. Wiley: New York, 1998.
60. Zadeh, L. A., Toward a theory of fuzzy information granulation and its centrality in human reasoning and fuzzy logic. Fuzzy Set. Syst. 1997, 90 (2), 111-127.
Prof. Wencong Lu Department of Chemistry, and Materials Genome Institute, Shanghai University, 99 Shangda Road, Shanghai 200444, China
Dr. Wencong Lu, a professor in physical chemistry, graduated from Tsinghua University in 1986 and obtained a Ph.D. degree from the Shanghai Institute of Metallurgy, Chinese Academy of Sciences, in 2000. He joined the faculty of Shanghai University in 1986 and became a full professor in 2002. Prof. Lu’s research interests cover materials data mining, computer chemistry, and industrial optimization. He is the chairman of the Molecular Science Society of Shanghai (2002–present) and a commissioner of the Computer Chemistry Council, Chemical Society of China (2000–present). In the past 20 years, he has published more than 200 academic papers and 3 academic books.
Prof. Wenqing Zhang Materials Genome Institute, Shanghai University, 99 Shangda Road, Shanghai 200444, China
Dr. Wenqing Zhang, a professor in physics and materials science, graduated from Xiamen University and obtained a Ph.D. degree from the Shanghai Institute of Optics and Fine Mechanics, Chinese Academy of Sciences, in 1992. Since then, he has worked in several research institutions in Europe and America. He joined the faculty of the Institute of Ceramics, CAS, under the special CAS “one-hundred-talent plan” in 2003, and moved to the Materials Genome Institute of Shanghai University as a full professor in 2014. Prof. Zhang’s research interests cover energy conversion and storage materials, including thermoelectrics and Li battery materials, computational materials science, and interface-related phenomena. In 2013, Dr. Zhang was elected an APS Fellow for his pioneering work in thermoelectric material design.